MammoScreen™ is intended for use as a concurrent reading aid for interpreting physicians, to help identify findings on screening FFDM acquired with compatible mammography systems and assess their level of suspicion. Output of the device includes marks placed on findings on the mammogram and level of suspicion scores. The findings could be soft tissue lesions or calcifications. The level of suspicion score is expressed at the finding level, for each breast and overall for the mammogram. Patient management decisions should not be made solely on the basis of analysis by MammoScreen™.
MammoScreen is a software-only device that aids interpreting physicians in identifying focal findings suspicious for breast cancer in screening FFDM (full-field digital mammography) acquired with compatible mammography systems. The product consists of a processing server and a web interface. The software applies algorithms for the recognition of suspicious calcifications and soft tissue lesions; these algorithms have been trained on large databases of biopsy-proven examples of breast cancer, benign lesions, and normal tissue. MammoScreen automatically processes FFDM, and the output of the device can be used by radiologists concurrently with the reading of mammograms.

The user interface of MammoScreen has several functions:

a) Activation of computer-aided detection (CAD) marks to highlight locations, known as findings, where the device detected calcifications or soft tissue lesions suspicious for cancer.

b) Association of each finding with a score, known as the MammoScreen Score, which characterizes findings on a 1-10 scale of increasing suspicion. Only the most suspicious findings (those with a MammoScreen Score equal to or greater than 5) are marked initially, to limit the number of findings to review; the user can also request display of findings with a score of 4 or lower.

c) Indication, with matching markers, when marks in multiple views of the FFDM correspond to the same finding.

MammoScreen is configured as a DICOM Web compliant node in a network and receives its input images from another DICOM node, called "the DICOM Web Server". The MammoScreen output is displayed on the screen of a personal computer compliant with the requirements specified in the User Manual. The image analysis unit includes machine learning components trained to detect positive findings (calcifications and soft tissue lesions).
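To make the display logic in (b) and (c) above concrete, here is a minimal sketch of how a reviewing client could filter marks by MammoScreen Score and group marks that refer to the same finding across views. Everything here is a hypothetical illustration (the `Finding` record, the `marks_to_display` and `group_matched_marks` functions, and the view labels are invented for this sketch, not MammoScreen's actual API); only the 1-10 scale, the default display threshold of 5, and the cross-view matching behavior come from the description above.

```python
from dataclasses import dataclass

# Hypothetical finding record; field names are illustrative, not MammoScreen's API.
@dataclass
class Finding:
    finding_id: str  # stable ID shared by matched marks across views
    view: str        # e.g. "L-CC", "L-MLO", "R-CC", "R-MLO"
    score: int       # MammoScreen Score, 1 (low) to 10 (high suspicion)
    kind: str        # "calcification" or "soft_tissue_lesion"

DEFAULT_DISPLAY_THRESHOLD = 5  # per the description: scores >= 5 shown by default

def marks_to_display(findings, show_low_score=False):
    """Return the marks to draw, applying the default score threshold.

    Findings with score >= 5 are always shown; lower-score findings are
    shown only when the user explicitly requests them (show_low_score).
    """
    return [f for f in findings
            if f.score >= DEFAULT_DISPLAY_THRESHOLD or show_low_score]

def group_matched_marks(findings):
    """Group marks that refer to the same finding seen in multiple views,
    so the UI can render them with matching markers."""
    groups = {}
    for f in findings:
        groups.setdefault(f.finding_id, []).append(f)
    return groups

if __name__ == "__main__":
    findings = [
        Finding("f1", "L-CC", 7, "soft_tissue_lesion"),
        Finding("f1", "L-MLO", 8, "soft_tissue_lesion"),  # same lesion, other view
        Finding("f2", "R-CC", 3, "calcification"),        # hidden by default
    ]
    shown = marks_to_display(findings)
    assert all(f.score >= DEFAULT_DISPLAY_THRESHOLD for f in shown)
    for fid, marks in group_matched_marks(shown).items():
        print(fid, [m.view for m in marks])
```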
Here's a breakdown of the acceptance criteria and the study demonstrating that the device meets them, based on the provided text:
Acceptance Criteria and Device Performance
The provided document defines acceptance criteria primarily through comparison with a predicate device and through the results of a clinical reader study. The core acceptance criterion for the clinical study appears to be an improvement in radiologist performance when using MammoScreen assistance compared to unaided reading.
Table of Acceptance Criteria and Reported Device Performance
| Criterion Type | Specific Criterion | Reported Device Performance (MammoScreen) | Met? |
| --- | --- | --- | --- |
| **Premarket Equivalence (vs. Predicate Device K181704, Transpara)** | | | |
| Classification Regulation | 21 CFR 892.2090 | Same | Yes |
| Medical Device Class | Class II | Same | Yes |
| Product Code | QDQ | Same | Yes |
| Level of Concern | Moderate | Same | Yes |
| Intended Use | Concurrent reading aid for physicians interpreting screening FFDM to identify findings and assess their level of suspicion. | Same | Yes |
| Target Patient Population | Women undergoing FFDM screening mammography. | Same | Yes |
| Target User Population | Physicians interpreting FFDM screening mammograms. | Same | Yes |
| Design | Software-only device. | Same | Yes |
| Scoring System | Although not identical, the principle (level of suspicion from low to high) should be substantially equivalent. | 10-point scale vs. the predicate's 1-100 scale; the manufacturer claims interpretability benefits; an exam-level score is provided. Deemed "substantially equivalent." | Yes |
| Finding Discovery | Reduce the number of findings the user has to review. | Default display for scores ≥ 5; display on user request for scores ≤ 4. Deemed "equivalent." | Yes |
| Performance Comparison | Overall performance gains should be comparable and should not raise new safety/effectiveness questions. | AUC: unaided = 0.769, assisted = 0.798 (difference = 0.028; P = 0.035). The predicate reported unaided = 0.866, assisted = 0.887. Deemed "still comparable." | Yes |
| Fundamental Scientific Technology | Medical image processing and machine learning, in particular deep learning, for detection of suspicious findings. | Same | Yes |
| **Clinical Performance (Reader Study)** | | | |
| Radiologist Performance | Radiologist performance with MammoScreen assistance is superior to unaided performance (main objective). | Average AUC improved from 0.769 (unaided) to 0.798 (with MammoScreen) (difference = 0.028; P = 0.035). | Yes |
| Reading Time | Should not increase significantly. | Average reading time increased by 14% for exams with scores > 4 and decreased by 2% for exams with scores ≤ 4 in the second session; the maximum overall increase did not exceed 15 s. | Yes |
| Standalone Performance | Non-inferior to average unaided radiologist performance. | Standalone AUC = 0.790, non-inferior to the average unaided radiologist AUC of 0.770 (no statistically significant difference, P > 0.05, and lower CI bound of the AUC difference above the -0.03 margin). | Yes |
| Sensitivity | Reader sensitivity tended to increase with the use of MammoScreen without decreasing specificity (conclusion statement). | The reported overall improvement was statistically significant at the breast (AUC) and lesion (pAUC) levels, confirming the trend; specific sensitivity/specificity values are not stated as acceptance criteria. | Yes |
Study Details for Device Acceptance
- Sample Size Used for the Test Set and Data Provenance:
- Test Set Size: 240 screening mammograms (cases).
- Data Provenance: Acquired at a US center. The text states "US FFDM acquired on Hologic® devices, and performance comparison with FFDM acquired on GE® devices," indicating that the images came from at least two major mammography system manufacturers in the US.
- Retrospective/Prospective: Retrospective. The study "collected" images after they were acquired, and "For each exam, the cancer status has been verified... and used as gold standard."
- Number of Experts Used to Establish the Ground Truth for the Test Set, and Their Qualifications:
- The document does not explicitly state the number of experts used to establish the ground truth or their qualifications (e.g., years of experience). It states only that "the cancer status has been verified by either biopsy results (for all cancer positive cases and some of the negative cases) or an adequate follow-up (for negative cases only) and used as gold standard." This implies that clinical data and follow-up, rather than consensus among a panel of experts, served as the primary ground truth.
- Adjudication Method for the Test Set:
- The document does not explicitly describe an adjudication method for establishing ground truth from multiple expert reads. Ground truth was established via biopsy or adequate follow-up, which are objective clinical outcomes, not subjective reader interpretations.
- Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- Yes, an MRMC study was performed.
- Effect Size of Human Reader Improvement with vs. Without AI Assistance:
- Average AUC: Increased from 0.769 (unaided) to 0.798 (with MammoScreen assistance).
- Difference: 0.028 (P = 0.035), indicating a statistically significant improvement.
- The AUC was higher with MammoScreen aid for 11 of the 14 radiologists.
- Performance improvement was also statistically significant at the breast (in terms of AUC) and lesion (in terms of pAUC) levels; a simplified sketch of the reader-averaged AUC comparison follows below.
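As a rough illustration of the reader-averaged comparison underlying an MRMC result like this, the sketch below computes each reader's unaided and aided AUC on synthetic data and reports the mean paired difference. This is a deliberate simplification with made-up scores: a real MRMC analysis (e.g., the Obuchowski-Rockette or Dorfman-Berbaum-Metz methods commonly used in reader studies) also models reader and case variance components to obtain the P-value, which this sketch omits.

```python
import random

def auc(scores_pos, scores_neg):
    """Empirical AUC via the Mann-Whitney U statistic:
    fraction of (positive, negative) pairs ranked correctly, ties counted half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

random.seed(0)
n_readers, n_pos, n_neg = 14, 120, 120  # 14 readers, 240 cases as in the test set

diffs = []
for _ in range(n_readers):
    # Made-up reader scores; aided reads get a small synthetic boost on positives.
    unaided_pos = [random.gauss(1.0, 1.0) for _ in range(n_pos)]
    unaided_neg = [random.gauss(0.0, 1.0) for _ in range(n_neg)]
    aided_pos = [s + 0.15 for s in unaided_pos]  # toy effect of CAD assistance
    aided_neg = unaided_neg
    diffs.append(auc(aided_pos, aided_neg) - auc(unaided_pos, unaided_neg))

mean_diff = sum(diffs) / n_readers
print(f"mean paired AUC difference across readers: {mean_diff:+.3f}")
print(f"readers improved: {sum(d > 0 for d in diffs)}/{n_readers}")
```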
- Standalone (Algorithm-Only, Without Human-in-the-Loop) Performance Study:
- Yes, a standalone performance study was conducted.
- Standalone Performance: MammoScreen's standalone performance (AUC = 0.790) was found to be non-inferior to the average performance of unaided radiologists (AUC = 0.770): the lower confidence bound of the AUC difference remained above the non-inferiority margin of -0.03, and no statistically significant difference was observed (P > 0.05).
- Detailed standalone performance metrics were also provided at the mammogram, breast, and finding levels (soft tissue lesions and calcifications), including ROC AUC, sensitivity, and specificity for the Hologic, GE, and combined datasets. The non-inferiority check is sketched below.
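The non-inferiority logic described above reduces to a simple check: the claim holds when the lower confidence bound of the (standalone minus readers) AUC difference stays above the pre-specified margin of -0.03. A minimal sketch follows; the confidence-bound value passed in is a placeholder, since the document reports only the point estimates (0.790 vs. 0.770) and the margin.

```python
NONINFERIORITY_MARGIN = -0.03  # pre-specified margin from the study

def non_inferior(auc_standalone, auc_readers, ci_lower_of_diff,
                 margin=NONINFERIORITY_MARGIN):
    """Non-inferiority holds when the lower CI bound of the
    (standalone - readers) AUC difference exceeds the margin."""
    point_diff = auc_standalone - auc_readers
    return ci_lower_of_diff > margin, point_diff

# Point estimates from the summary; the CI lower bound here is a placeholder
# chosen only to match the reported conclusion, not a figure from the document.
ok, diff = non_inferior(auc_standalone=0.790, auc_readers=0.770,
                        ci_lower_of_diff=-0.01)
print(f"AUC difference = {diff:+.3f}; non-inferior: {ok}")
```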
- Type of Ground Truth Used:
- Clinical Outcomes Data: The primary ground truth was established by:
- Biopsy results (for all cancer-positive cases and some negative cases).
- Adequate follow-up (for negative cases only).
- Sample Size for the Training Set:
- The document states that the algorithms "have been trained on large databases of biopsy proven examples of breast cancer, benign lesions and normal tissue." However, it does not specify the exact sample size of the training set.
- How the Ground Truth for the Training Set Was Established:
- The ground truth for the training set was established using "biopsy proven examples of breast cancer, benign lesions and normal tissue." This implies a similar methodology to the test set, relying on objective clinical outcomes (histopathology from biopsy) rather than expert consensus on images.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.

(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population, with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.

(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system-level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
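For reference, the diagnostic accuracy measures named in paragraphs (b)(1)(iii)-(iv) are simple functions of the confusion counts. A minimal sketch with standard textbook definitions follows (the counts are toy values, and the code is not tied to any evaluation code from the submission):

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion counts."""
    sens = tp / (tp + fn)       # sensitivity (true positive rate)
    spec = tn / (tn + fp)       # specificity (true negative rate)
    ppv = tp / (tp + fp)        # positive predictive value
    npv = tn / (tn + fn)        # negative predictive value
    lr_pos = sens / (1 - spec)  # positive diagnostic likelihood ratio
    lr_neg = (1 - sens) / spec  # negative diagnostic likelihood ratio
    return {"sensitivity": sens, "specificity": spec, "PPV": ppv,
            "NPV": npv, "LR+": lr_pos, "LR-": lr_neg}

# Toy counts for illustration only.
print(diagnostic_accuracy(tp=90, fp=30, tn=90, fn=30))
```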