K Number
K243688
Device Name
Saige-Dx (3.1.0)
Manufacturer
Date Cleared
2024-12-19 (20 days)

Product Code
Regulation Number
892.2090
Panel
RA
Reference & Predicate Devices
Intended Use

Saige-Dx analyzes digital breast tomosynthesis (DBT) mammograms to identify the presence or absence of soft tissue lesions and calcifications that may be indicative of cancer. For a given DBT mammogram, Saige-Dx analyzes the DBT image stacks and the accompanying 2D images, including full field digital mammography and/or synthetic images. The system assigns a Suspicion Level, indicating the strength of suspicion that cancer may be present, for each detected finding and for the entire case. The outputs of Saige-Dx are intended to be used as a concurrent reading aid for interpreting physicians on screening mammograms with compatible DBT hardware.

Device Description

Saige-Dx is a software device that processes screening mammograms using artificial intelligence to aid interpreting radiologists. By automatically detecting the presence or absence of soft tissue lesions and calcifications in mammography images, Saige-Dx can help improve reader performance, while also reducing time. The software takes as input a set of x-ray mammogram DICOM files from a single digital breast tomosynthesis (DBT) study and generates finding-level outputs for each image analyzed, as well as an aggregate case-level assessment. Saige-Dx processes both the DBT image stacks and the associated 2D images (full-field digital mammography (FFDM) and/or synthetic 2D images) in a DBT study. For each image, Saige-Dx outputs bounding boxes circumscribing any detected findings and assigns a Finding Suspicion Level to each finding, indicating the degree of suspicion that the finding is malignant. Saige-Dx uses the results of the finding-level analysis to generate a Case Suspicion Level, indicating the degree of suspicion for malignancy across the case. Saige-Dx encapsulates the finding and case-level results into a DICOM Structured Report (SR) object containing markings that can be overlaid on the original mammogram images using a viewing workstation and a DICOM Secondary Capture (SC) object containing a summary report of the Saige-Dx results.
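The two-level output described above (per-finding results plus a case-level assessment) can be pictured with a small data model. This is a minimal sketch, not the device's actual schema or algorithm: the class names, field names, numeric score range, and the "take the maximum" aggregation rule are all assumptions made for illustration. The document says only that the Case Suspicion Level is generated from the finding-level analysis, without stating how.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    """One detected finding on one image (hypothetical structure)."""
    image_id: str                    # hypothetical label for the DBT stack or 2D image
    box: Tuple[int, int, int, int]   # bounding box as (x_min, y_min, x_max, y_max), pixels
    suspicion: float                 # assumed numeric score in [0, 1]; the real scale is not documented


@dataclass
class CaseResult:
    """All findings for one DBT study plus a case-level summary."""
    study_id: str
    findings: List[Finding] = field(default_factory=list)

    def case_suspicion(self) -> float:
        # ASSUMPTION: the case score is the highest finding score.
        # This is a placeholder rule, not the device's documented method.
        return max((f.suspicion for f in self.findings), default=0.0)


# Made-up example values, for illustration only.
example = CaseResult(
    study_id="study-001",
    findings=[
        Finding("L-CC-DBT", (120, 340, 210, 430), 0.35),
        Finding("L-MLO-2D", (98, 410, 175, 492), 0.80),
    ],
)
```

In the real device these results are packaged as a DICOM Structured Report and a Secondary Capture object; the sketch only illustrates the relationship between finding-level and case-level outputs.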

AI/ML Overview

The provided text describes the Saige-Dx (v.3.1.0) device and its performance testing as part of an FDA 510(k) submission (K243688). However, it does not contain specific acceptance criteria values or the quantitative results of the device's performance against those criteria. It states that "All tests met the pre-specified performance criteria," but does not list those criteria or the measured performance metrics.

Accordingly, a table of acceptance criteria and reported device performance with specific values cannot be compiled from the document.

The information that is available is summarized below:

1. A table of acceptance criteria and the reported device performance

  • Acceptance Criteria: Not explicitly stated in quantitative terms. The document only mentions that "All tests met the pre-specified performance criteria."
  • Reported Device Performance: Not explicitly stated in quantitative terms (e.g., specific sensitivity, specificity, AUC values, or improvements in human reader performance).

2. Sample size used for the test set and the data provenance (e.g. country of origin of the data, retrospective or prospective)

  • Test Set Sample Size: Not explicitly stated for the validation performance study. The text mentions "Validation of the software was previously conducted using a multi-reader multi-case (MRMC) study and standalone performance testing conducted under approved IRB protocols (K220105 and K241747)." It also mentions that the tests included "DBT screening mammograms with Hologic standard definition and HD images, GE images, exams with unilateral breasts, and from patients with breast implants (on implant displaced views)."
  • Data Provenance: The training data was collected from "multiple vendors including GE and Hologic equipment" and from "diverse practices with the majority from geographically diverse areas within the United States, including New York and California." No equivalent details (country of origin, retrospective or prospective collection) are given for the test set alone. The reference to approved IRB protocols shows only that the validation studies were conducted under ethics oversight; it does not indicate whether the data were collected retrospectively or prospectively.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g. radiologist with 10 years of experience)

  • Not explicitly stated for the test set. The document indicates that a Multi-Reader Multi-Case (MRMC) study was performed, which implies the involvement of expert readers, but the number of experts and their qualifications are not detailed.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set

  • Not explicitly stated for the test set. The involvement of an MRMC study suggests a structured interpretation process, potentially including adjudication, but the method (e.g., consensus, majority rule with an adjudicator) is not described.

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs without AI assistance

  • Yes, an MRMC study was done: "Validation of the software was previously conducted using a multi-reader multi-case (MRMC) study..."
  • Effect Size: The document does not provide the quantitative effect size of how much human readers improved with AI vs. without AI assistance. It broadly states that Saige-Dx "can help improve reader performance, while also reducing time."

6. Whether a standalone study (i.e. algorithm-only performance without human-in-the-loop) was done

  • Yes, standalone performance testing was done: "...and standalone performance testing conducted under approved IRB protocols..."
  • Results: The document states that "All tests met the pre-specified performance criteria" for the standalone performance, but does not provide the specific quantitative results (e.g., sensitivity, specificity, AUC).

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)

  • Not explicitly stated. For a device identifying "soft tissue lesions and calcifications that may be indicative of cancer," ground truth would typically involve a combination of biopsy/pathology results, clinical follow-up, and potentially expert consensus on imaging in cases without definitive pathology. However, the document doesn't specify the exact method for establishing ground truth for either the training or test sets.

8. The sample size for the training set

  • Training Set Sample Size: "A total of nine datasets comprising 141,768 patients and 316,166 studies were collected..."

9. How the ground truth for the training set was established

  • Not explicitly stated. The document mentions the collection of diverse datasets for training but does not detail how the ground truth for these 141,768 patients and 316,166 studies was established (e.g., through radiologists' interpretations, pathology reports, clinical outcomes).

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
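
The diagnostic accuracy measures named in paragraphs (b)(1)(iii) and (b)(1)(iv) follow from a 2x2 table of true/false positives and negatives. The sketch below shows the standard formulas, together with intersection-over-union (IoU) as one common way to score bounding-box localization. The function names and example counts are hypothetical, IoU is an assumed choice rather than a metric the regulation prescribes, and none of the numbers are results for Saige-Dx, whose quantitative performance is not reported in the document.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _ratio(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def diagnostic_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Standard accuracy measures from confusion-matrix counts."""
    sens = _ratio(tp, tp + fn)   # true positive rate
    spec = _ratio(tn, tn + fp)   # true negative rate
    fpr = _ratio(fp, fp + tn)    # 1 - specificity
    fnr = _ratio(fn, tp + fn)    # 1 - sensitivity
    lr_pos = _ratio(sens, fpr) if sens is not None and fpr is not None else None
    lr_neg = _ratio(fnr, spec) if fnr is not None and spec is not None else None
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),   # positive predictive value
        "npv": _ratio(tn, tn + fn),   # negative predictive value
        "lr_positive": lr_pos,        # sensitivity / (1 - specificity)
        "lr_negative": lr_neg,        # (1 - sensitivity) / specificity
    }


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 if they do not overlap)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Made-up counts for illustration only; not device results.
measures = diagnostic_measures(tp=80, fp=100, tn=900, fn=20)
```

Predictive values depend on disease prevalence in the test set, which is one reason the regulation limits enrichment of the test dataset and asks for subset-level estimates with confidence intervals.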