Saige-Q is a software workflow tool designed to aid radiologists in prioritizing exams within the standard-of-care image worklist for compatible full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) screening mammograms. Saige-Q uses an artificial intelligence algorithm to generate a code for a given mammogram, indicative of the software's suspicion that the mammogram contains at least one suspicious finding. Saige-Q makes the assigned codes available to a PACS/EPR/RIS/workstation for worklist prioritization or triage.
Saige-Q is intended for passive notification only and does not provide any diagnostic information beyond triage and prioritization. Thus, it is not intended to replace the review of images or be used on a stand-alone basis for clinical decision-making. The decision to use Saige-Q codes, and how to use them, is ultimately up to the interpreting radiologist, who still reviews each exam on a diagnostic viewer and evaluates each patient according to the current standard of care.
Saige-Q is a software workflow device that processes Digital Breast Tomosynthesis (DBT) and Full-Field Digital Mammography (FFDM) screening mammograms using artificial intelligence to act as a prioritization tool for interpreting radiologists. By automatically indicating whether a given mammogram is suspicious for malignancy, Saige-Q can help the user prioritize or triage cases in their worklist (or queue) that may benefit from prioritized review.
Saige-Q takes as input a set of x-ray mammogram DICOM files from a single screening mammography study (FFDM or DBT). The software first checks that the study is appropriate for Saige-Q analysis and then extracts, processes, and analyzes the DICOM images using an artificial intelligence algorithm. As a result of the analysis, the software generates a Saige-Q code indicating the software's suspicion of the presence of findings suggestive of breast cancer. For mammograms given a Saige-Q code of "Suspicious," the software also generates a compressed preview image, which is for informational purposes only and is not intended for diagnostic use.
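To make that workflow concrete, here is a minimal sketch of the kind of pipeline the description implies. Everything in it is an assumption for illustration: the `model` object, the threshold, the study-level max aggregation, and the non-suspicious code name are hypothetical, not DeepHealth's actual implementation.

```python
import pydicom  # open-source DICOM library, assumed for this sketch

def triage_study(dicom_paths, model, threshold=0.5):
    """Hypothetical sketch of the described workflow; model, threshold,
    and code names are illustrative, not the vendor's."""
    datasets = [pydicom.dcmread(p) for p in dicom_paths]

    # Step 1: check the study is appropriate for analysis
    # (e.g., a supported mammography modality).
    if any(ds.Modality != "MG" for ds in datasets):
        return {"code": None, "preview": None}  # study not analyzed

    # Step 2: extract and analyze pixel data with the AI model;
    # here the most suspicious image drives the study-level score.
    study_score = max(model.predict(ds.pixel_array) for ds in datasets)

    # Step 3: map the score to a triage code. For "Suspicious" studies,
    # a compressed, non-diagnostic preview image would also be produced.
    if study_score >= threshold:
        return {"code": "Suspicious", "preview": "preview.jpg"}
    return {"code": "Not Suspicious", "preview": None}  # name hypothetical
```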
The Saige-Q code can be viewed by radiologists on a picture archiving and communication system (PACS), Electronic Patient Record (EPR), and/or Radiology Information System (RIS) worklist and can be used to reorder the worklist. As a software-only device, Saige-Q can be hosted on a compatible host server connected to the necessary clinical IT systems such that DICOM studies can be received and the resulting outputs returned where they can be incorporated into the radiology worklist.
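This implies a standard DICOM integration: the host server receives studies as a Storage SCP and returns results to the worklist systems. As a hedged illustration only, the sketch below uses the open-source pynetdicom library to accept incoming studies; the AE title, port, and storage path are arbitrary choices, and the result-return path is omitted.

```python
import os
from pynetdicom import AE, evt, StoragePresentationContexts

os.makedirs("incoming", exist_ok=True)

def handle_store(event):
    """Persist each received DICOM instance; a real service would queue
    the study for analysis once all of its instances have arrived."""
    ds = event.dataset
    ds.file_meta = event.file_meta
    ds.save_as(os.path.join("incoming", f"{ds.SOPInstanceUID}.dcm"))
    return 0x0000  # DICOM "Success" status

ae = AE(ae_title="TRIAGE_SCP")  # hypothetical AE title
ae.supported_contexts = StoragePresentationContexts
ae.start_server(("0.0.0.0", 11112), block=True,
                evt_handlers=[(evt.EVT_C_STORE, handle_store)])
```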
The Saige-Q codes can be used for triage or prioritization. For example, "Suspicious" studies could be given prioritized review. With a worklist that supports sorting, batches of mammograms could also be sorted based on the Saige-Q code.
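As a trivial illustration of the sorting use case, a worklist can key on the code; the entries and field names below are made up.

```python
# Hypothetical worklist entries; only "code" comes from the triage device.
worklist = [
    {"accession": "A1001", "received": "08:02", "code": "Suspicious"},
    {"accession": "A1002", "received": "08:05", "code": None},  # not analyzed
    {"accession": "A1003", "received": "08:07", "code": "Suspicious"},
]

# Sort "Suspicious" studies first, preserving arrival order within groups.
PRIORITY = {"Suspicious": 0}
worklist.sort(key=lambda e: (PRIORITY.get(e["code"], 1), e["received"]))
```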
Here's a breakdown of the acceptance criteria and the study demonstrating that the device meets them, based on the provided text:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criterion | Saige-Q FFDM Performance (Reported Value) | Saige-Q DBT Performance (Reported Value) | BCSC Data (Baseline/Target) | Predicate Device (cmTriage) |
| --- | --- | --- | --- | --- |
| Overall AUC | 0.966 (95% CI: [0.957, 0.975]) | 0.985 (95% CI: [0.979, 0.990]) | >0.95 (QFM product code requirement for effective triage) | Meets or exceeds predicate performance |
| Specificity at 86.9% sensitivity | 92.2% (95% CI: [90.2%, 93.8%]) | 98.3% (95% CI: [97.3%, 99.0%]) | >80% (CI lower bound) | - |
| Sensitivity at 88.9% specificity | 91.2% (95% CI: [88.4%, 93.4%]) | 95.7% (95% CI: [93.6%, 97.2%]) | >80% (CI lower bound) | - |
| Median processing time | 15.5 seconds | 196.8 seconds | Within clinical operational expectations | - |
| AUC by lesion type (soft tissue densities) | 0.964 (95% CI: [0.954, 0.974]) | 0.983 (95% CI: [0.977, 0.990]) | Similar performance across subcategories | - |
| AUC by lesion type (calcifications) | 0.973 (95% CI: [0.958, 0.988]) | 0.989 (95% CI: [0.983, 0.996]) | Similar performance across subcategories | - |
| AUC by breast density (dense) | 0.959 (95% CI: [0.945, 0.973]) | 0.980 (95% CI: [0.971, 0.988]) | Similar performance across subcategories | - |
| AUC by breast density (non-dense) | 0.972 (95% CI: [0.961, 0.984]) | 0.988 (95% CI: [0.981, 0.996]) | Similar performance across subcategories | - |
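For context on how figures like these are produced: AUC and its confidence interval are typically computed from per-exam suspicion scores against the reference standard, often with a bootstrap. The sketch below shows that common pattern with scikit-learn on simulated data; it is not the study's actual analysis code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Simulated stand-in data: 1 = malignant, 0 = normal, plus model scores.
y = rng.integers(0, 2, size=1000)
scores = y * rng.normal(2.0, 1.0, size=1000) + rng.normal(0.0, 1.0, size=1000)

auc = roc_auc_score(y, scores)

# Percentile bootstrap for a 95% CI, resampling exams with replacement.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) < 2:  # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC {auc:.3f} (95% CI: [{lo:.3f}, {hi:.3f}])")
```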
2. Sample Size Used for the Test Set and Data Provenance
- FFDM Study Test Set:
- Malignant Exams: 501
- Normal Exams: 832
- Total: 1333
- DBT Study Test Set:
- Malignant Exams: 517
- Normal Exams: 1011
- Total: 1528
- Data Provenance:
- Country of Origin: United States (across two states)
- Retrospective or Prospective: Retrospective
- Sites: Data was collected from eight clinical sites for FFDM and six clinical sites for DBT. DeepHealth had never collected data from these sites prior to this study for either training or testing, ensuring an independent test set.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- Number of Experts: Two independent expert radiologists.
- Qualifications of Experts: The document does not explicitly state the qualifications (e.g., years of experience) of the expert radiologists.
4. Adjudication Method for the Test Set
- Adjudication Method: 2+1 (Two independent expert radiologists reviewed each case. If discordance was observed between the two initial readers, an adjudicator was used to establish the final reference standard).
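The 2+1 scheme reduces to a simple rule, sketched below for illustration only.

```python
def reference_standard(reader1, reader2, adjudicator):
    """2+1 adjudication: two independent reads; a third reader
    resolves the case only when the first two disagree."""
    if reader1 == reader2:
        return reader1
    return adjudicator()  # adjudicator consulted only on discordance

# Example: the readers agree, so the adjudicator is never invoked.
label = reference_standard("malignant", "malignant",
                           adjudicator=lambda: "normal")
```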
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No, the document describes retrospective, blinded, multi-center studies to evaluate the standalone performance of Saige-Q. It does not mention a comparative effectiveness study involving human readers with and without AI assistance.
- Effect Size of Human Improvement with AI vs. Without AI Assistance: Not applicable, as no MRMC study was conducted to assess human reader improvement with AI assistance.
6. Standalone (Algorithm Only) Performance Study
- Was a standalone study done? Yes, the document explicitly states: "DeepHealth conducted two retrospective, blinded, multi-center studies to evaluate the standalone performance of Saige-Q..."
7. Type of Ground Truth Used
- Ground Truth Type:
- Malignant Exams: Confirmed using pathology reports from biopsied lesions.
- Normal Exams: Confirmed with a negative clinical interpretation (BI-RADS 1 or 2) followed by another negative clinical interpretation at least two years later.
- Expert Consensus: Each case in the test set was reviewed by two independent expert radiologists (and an adjudicator if discordance was observed) to establish the reference standard for each case, building upon the pathology/clinical follow-up.
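For illustration only, the reference-standard rules above can be encoded as a labeling function; the exam field names and the 730-day cutoff for "at least two years" are assumptions.

```python
from datetime import timedelta

def ground_truth_label(exam):
    """Hypothetical encoding of the reference-standard rules above."""
    # Malignant: pathology-proven cancer from a biopsied lesion.
    if exam.get("pathology") == "malignant":
        return "malignant"
    # Normal: BI-RADS 1 or 2 now, and another negative clinical
    # interpretation at least two years later.
    follow_up_gap = exam["followup_date"] - exam["exam_date"]
    if (exam["birads"] in (1, 2)
            and exam["followup_birads"] in (1, 2)
            and follow_up_gap >= timedelta(days=730)):
        return "normal"
    return "indeterminate"  # would be excluded from the test set
```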
8. Sample Size for the Training Set
- The document states that the AI algorithm was trained on "large numbers of mammograms where cancer status is known." However, it does not provide a specific sample size for the training set.
9. How the Ground Truth for the Training Set Was Established
- The document implies the ground truth for the training set was established based on "cancer status is known" for the mammograms used for training. While not explicitly detailed, this would typically involve a combination of:
- Pathology reports for confirmed cancers.
- Long-term clinical follow-up for confirmed benign cases.
It's also mentioned that the AI algorithm uses "deep neural networks that have been trained on large numbers of mammograms where cancer status is known," suggesting similar rigorous ground truth establishment as for the test set, but no specific methodology for the training set's ground truth is provided.
§ 892.2080 Radiological computer aided triage and notification software.

(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.