
510(k) Data Aggregation

    K Number
    K241747
    Device Name
    Saige-Dx
    Manufacturer
    Date Cleared
    2024-11-18

    (153 days)

    Product Code
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Predicate For
    Intended Use

    Saige-Dx analyzes digital breast tomosynthesis (DBT) mammograms to identify the presence or absence of soft tissue lesions and calcifications that may be indicative of cancer. For a given DBT mammogram, Saige-Dx analyzes the DBT image stacks and the accompanying 2D images, including full field digital mammography and/or synthetic images. The system assigns a Suspicion Level, indicating the strength of suspicion that cancer may be present, for each detected finding and for the entire case. The outputs of Saige-Dx are intended to be used as a concurrent reading aid for interpreting physicians on screening mammograms with compatible DBT hardware.

    Device Description

    Saige-Dx is a software device that processes screening mammograms using artificial intelligence to aid interpreting radiologists. By automatically detecting the presence or absence of soft tissue lesions and calcifications in mammography images, Saige-Dx can help improve reader performance while also reducing reading time. The software takes as input a set of x-ray mammogram DICOM files from a single digital breast tomosynthesis (DBT) study and generates finding-level outputs for each image analyzed, as well as an aggregate case-level assessment. Saige-Dx processes both the DBT image stacks and the associated 2D images (full-field digital mammography (FFDM) and/or synthetic 2D images) in a DBT study. For each image, Saige-Dx outputs bounding boxes circumscribing any detected findings and assigns a Finding Suspicion Level to each finding, indicating the degree of suspicion that the finding is malignant. Saige-Dx uses the results of the finding-level analysis to generate a Case Suspicion Level, indicating the degree of suspicion for malignancy across the case. Saige-Dx encapsulates the finding- and case-level results into a DICOM Structured Report (SR) object, containing markings that can be overlaid on the original mammogram images using a viewing workstation, and a DICOM Secondary Capture (SC) object, containing a summary report of the Saige-Dx results.
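To make the described data flow concrete, here is a minimal Python sketch of the finding-to-case aggregation. All names (Finding, CaseResult, aggregate_case) are hypothetical, and the max-based aggregation rule is an assumption; the 510(k) summary does not disclose how Saige-Dx actually combines finding-level scores.

```python
# Hypothetical sketch of the pipeline described above; names and the
# aggregation rule are illustrative, not the manufacturer's implementation.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Finding:
    image_uid: str                           # SOP Instance UID of the analyzed image
    bounding_box: Tuple[int, int, int, int]  # (x, y, width, height) around the finding
    suspicion_level: float                   # Finding Suspicion Level

@dataclass
class CaseResult:
    findings: List[Finding]
    case_suspicion_level: float              # aggregate Case Suspicion Level

def aggregate_case(findings: List[Finding]) -> CaseResult:
    # Assumption: take the most suspicious finding as the case-level score.
    case_level = max((f.suspicion_level for f in findings), default=0.0)
    return CaseResult(findings=findings, case_suspicion_level=case_level)
```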

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

    1. A table of acceptance criteria and the reported device performance

    • Endpoint: Substantial equivalence, demonstrating non-inferiority of the subject device (Saige-Dx) on compatible exams compared to the predicate device's performance on previously compatible exams.
      Performance: The study endpoint was met. The lower bound of the 95% CI around the delta AUC between Hologic and GE cases, compared to Hologic-only exams, was greater than the non-inferiority margin. Case-level AUC on compatible exams: 0.910 (95% CI: 0.886, 0.933).
    • Endpoint: Generalizable standalone performance across confounders for GE and Hologic exams.
      Performance: Demonstrated generalizable standalone performance on GE and Hologic exams across patient age, breast density, breast size, race, ethnicity, exam type, pathology classification, lesion size, and modality.
    • Endpoint: Performance on Hologic HD images.
      Performance: Met pre-specified performance criteria.
    • Endpoint: Performance on unilateral breasts.
      Performance: Met pre-specified performance criteria.
    • Endpoint: Performance on breast implants (implant displaced views).
      Performance: Met pre-specified performance criteria.
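For intuition on the first endpoint, the sketch below shows one standard way to bootstrap a 95% CI around a delta AUC and compare its lower bound to a non-inferiority margin. The margin value, the paired-scores setup, and the resampling scheme are assumptions for illustration; the submission does not specify them.

```python
# Illustrative non-inferiority check: the endpoint passes if the lower bound
# of the bootstrap 95% CI on the AUC difference exceeds -margin.
# The 0.05 margin and paired-scores setup are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

def delta_auc_ci(y, scores_new, scores_ref, n_boot=2000, seed=0):
    """Bootstrap 95% CI for AUC(new) - AUC(ref) on the same cases."""
    rng = np.random.default_rng(seed)
    y, scores_new, scores_ref = map(np.asarray, (y, scores_new, scores_ref))
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():   # resample must contain both classes
            continue
        deltas.append(roc_auc_score(y[idx], scores_new[idx])
                      - roc_auc_score(y[idx], scores_ref[idx]))
    return np.percentile(deltas, [2.5, 97.5])

# lo, hi = delta_auc_ci(labels, scores_subject, scores_predicate)
# endpoint_met = lo > -0.05   # assumed non-inferiority margin
```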

    2. Sample size used for the test set and the data provenance

    • Sample Size: 1,804 women (236 cancer exams and 1,568 non-cancer exams).
    • Data Provenance: Collected from 12 clinical sites across the United States. The dataset is retrospective: cancer exams were confirmed by biopsy pathology, and non-cancer exams by negatively interpreted subsequent screens.

    3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

    • Number of Experts: At least two independent truthers, plus an additional adjudicator if needed (implying a minimum of two, potentially three).
    • Qualifications of Experts: MQSA qualified, breast imaging specialists.

    4. Adjudication method for the test set

    • Adjudication Method: "Briefly, each cancer exam and supporting medical reports were reviewed by two independent truthers, plus an additional adjudicator if needed." This describes a 2+1 adjudication method.
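A 2+1 design reduces to a few lines of logic: two truthers read independently, and an adjudicator is consulted only on disagreement. The sketch below is illustrative; the function and label names are not from the submission.

```python
# Illustrative 2+1 adjudication: agreement between two truthers is final;
# otherwise an adjudicator's read decides. Names are hypothetical.
from typing import Optional

def adjudicate(truther_1: str, truther_2: str,
               adjudicator: Optional[str] = None) -> str:
    if truther_1 == truther_2:
        return truther_1                  # the two readers agree
    if adjudicator is None:
        raise ValueError("Truthers disagree; an adjudicator read is required.")
    return adjudicator                    # third reader breaks the tie

# adjudicate("malignant", "malignant")                    -> "malignant"
# adjudicate("malignant", "benign", adjudicator="benign") -> "benign"
```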

    5. If a multi-reader, multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance

    • The provided text describes a standalone performance study ("The pivotal study compared the standalone performance between the subject device..."). It does not mention an MRMC comparative effectiveness study, and therefore no effect size for human reader improvement with AI assistance is reported. The device is intended as a concurrent reading aid, but the reported study focused on the algorithm's standalone performance.

    6. If a standalone (i.e., algorithm-only, without human-in-the-loop) performance study was done

    • Yes, a standalone performance study was done. The text states: "Validation of the software was performed using standalone performance testing..." and "The pivotal study compared the standalone performance between the subject device...".
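As a point of reference, standalone evaluation amounts to scoring each case with the algorithm alone and computing the AUC against ground truth, with no reader in the loop. The sketch below uses synthetic stand-in data; only the reported AUC of 0.910 comes from the submission.

```python
# Minimal standalone (algorithm-only) evaluation: case-level AUC of the
# model's Case Suspicion Levels against ground-truth labels.
# Data here is synthetic; the study reported a case-level AUC of 0.910.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)           # 1 = cancer exam, 0 = non-cancer exam
y_score = 0.6 * y_true + rng.random(200)   # stand-in Case Suspicion Levels

print(f"Case-level standalone AUC: {roc_auc_score(y_true, y_score):.3f}")
```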

    7. The type of ground truth used

    • For Cancer Exams: Confirmed by biopsy pathology.
    • For Non-Cancer Exams: Confirmed by a negatively interpreted exam on the subsequent screen and without malignant biopsy pathology.
    • For Lesions: Lesions for cancer exams were established by MQSA qualified breast imaging specialists, likely based on radiological findings and pathology reports.
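The labeling rules above can be stated as a simple decision function. The parameter names are hypothetical, and the handling of exams that satisfy neither rule is an assumption (such exams would presumably be excluded from the test set).

```python
# Encodes the ground-truth rules described above; parameter names are
# hypothetical, and the "indeterminate" branch is an assumption.
def ground_truth_label(biopsy_pathology: str, subsequent_screen: str) -> str:
    if biopsy_pathology == "malignant":
        return "cancer"          # cancer exams: biopsy-proven malignancy
    if subsequent_screen == "negative":
        return "non-cancer"      # non-cancer: negative follow-up screen, no malignant biopsy
    return "indeterminate"       # neither rule applies; presumably excluded
```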

    8. The sample size for the training set

    • Sample Size: 121,348 patients and 122,252 studies.

    9. How the ground truth for the training set was established

    • The document does not explicitly detail the method for establishing ground truth for the training set. It mentions the training dataset was "robust and diverse." However, given the rigorous approach described for the test set's ground truth (biopsy pathology, negative subsequent screens, expert review), it is reasonable to infer a similar, if not identical, standard was applied to the training data. The text emphasizes "no exam overlap between the training and testing datasets," indicating a careful approach to data separation.
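A common way to enforce the "no exam overlap" property is to split at the patient level and assert that the resulting exam sets are disjoint, as in this sketch. GroupShuffleSplit is a standard scikit-learn tool and not necessarily what the manufacturer used.

```python
# Patient-level split with an explicit no-overlap check, as a stand-in for
# the "no exam overlap between training and testing" guarantee above.
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(exam_ids, patient_ids, test_size=0.1, seed=0):
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(exam_ids, groups=patient_ids))
    train = {exam_ids[i] for i in train_idx}
    test = {exam_ids[i] for i in test_idx}
    assert train.isdisjoint(test), "exam overlap between training and testing"
    return train, test
```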