K Number
K211541
Device Name
MammoScreen 2.0
Manufacturer
Date Cleared
2021-11-26

(191 days)

Product Code
Regulation Number
892.2090
Panel
RA
Reference & Predicate Devices
Intended Use

MammoScreen® is intended for use as a concurrent reading aid for interpreting physicians, to help identify findings on screening FFDM and DBT acquired with compatible mammography systems and assess their level of suspicion. Output of the device includes marks placed on findings on the mammogram and level of suspicion scores. The findings could be soft tissue lesions or calcifications. The level of suspicion score is expressed at the finding level, for each breast and overall for the mammogram. Patient management decisions should not be made solely on the basis of analysis by MammoScreen®.

Device Description

MammoScreen 2.0 automatically processes the four views (one CC and one MLO per breast) of standard screening FFDM or DBT, and outputs a corresponding report on a separate screen, alongside the monitors used for reading. This report is designed to be easily readable with very few interactions required by providing an overall level of suspicion of each exam and giving explicit visual indications when highly suspicious exams are detected.

MammoScreen 2.0 detects and characterizes findings on a scale from one to ten, referred to as the MammoScreen score. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion.

Furthermore, MammoScreen 2.0 provides a high level of interpretability. Results are by construction consistent at the finding, breast and mammogram level. A breast takes on the highest score of its detected findings, and the level of suspicion for the exam is driven by the breast(s) with the highest score. Therefore, it is always possible to track a high suspicion of malignancy for an exam to the corresponding breast(s), and to a specific finding within the breast(s).
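
The aggregation rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the device's implementation: the `Finding` class, the `breast_score` and `exam_score` functions, and the example scores are hypothetical. The sketch reads "driven by the breast(s) with the highest score" as taking the maximum, and because the text does not say how a breast without detected findings is scored, it returns `None` in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    """Hypothetical container for one detected finding."""
    breast: str  # "left" or "right"
    score: int   # MammoScreen score, 1 (lowest suspicion) to 10 (highest)


def breast_score(findings: List[Finding], breast: str) -> Optional[int]:
    """A breast takes on the highest score among its detected findings."""
    scores = [f.score for f in findings if f.breast == breast]
    # Assumption: the source does not state how a breast with no findings is scored.
    return max(scores) if scores else None


def exam_score(findings: List[Finding]) -> Optional[int]:
    """Exam-level score, taken here as the maximum of the two breast scores."""
    per_breast = [breast_score(findings, b) for b in ("left", "right")]
    known = [s for s in per_breast if s is not None]
    return max(known) if known else None


# Hypothetical example exam with three findings (not data from the study).
findings = [Finding("left", 3), Finding("left", 8), Finding("right", 2)]
left = breast_score(findings, "left")
right = breast_score(findings, "right")
overall = exam_score(findings)
```

Because each level is a maximum over the level below it, a high exam score can always be traced back to a specific breast and then to a specific finding, which is the consistency property the description emphasizes.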

AI/ML Overview

The acceptance criteria, and the study supporting that the device meets them, are summarized below based on the provided text:

1. Table of Acceptance Criteria and Reported Device Performance

Performance Metric | Acceptance Criteria (Implicit) | Reported Device Performance (FFDM) | Reported Device Performance (DBT)
Radiologist Performance with Aid (AUC) | Superior to unaided radiologist performance | Increased from 0.77 to 0.80 | Increased from 0.79 to 0.83
Standalone Performance (AUC) | Non-inferior to unaided radiologist performance | 0.79 (non-inferior to 0.77 unaided) | 0.84 (superior to 0.79 unaided)
Standalone Performance vs. Predicate (FFDM) | Non-inferior to predicate device | Achieved non-inferior performance | Not applicable

2. Sample Size Used for the Test Set and Data Provenance

  • Sample Size (FFDM & DBT): 240 cases (enriched sample set)
  • Data Provenance: The country of origin is not stated. The text also does not specify whether the cases were collected prospectively or retrospectively; it describes the studies only as "reader studies" on an enriched sample set.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

  • Number of Experts: 14 for the 2D (FFDM) study and 20 for the 3D (DBT) study.
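  • Qualifications: "MQSA-qualified and ABR-certified readers." (Qualification under the Mammography Quality Standards Act (MQSA) and certification by the American Board of Radiology (ABR) are US credentials for interpreting physicians, suggesting a US context for the experts.)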

4. Adjudication Method for the Test Set

The provided text does not explicitly state the adjudication method used to establish the ground truth for the test set. It mentions an "enriched sample set" and "MQSA-qualified and ABR-certified readers," suggesting expert consensus, but the specific process (e.g., 2+1, 3+1) is not detailed.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance

  • Yes, an MRMC study was done. Clinical validation included two reader studies (one for FFDM and one for DBT) using a multi-reader multi-case (MRMC) cross-over design.
  • Effect Size of Improvement:
    • FFDM: Average AUC for radiologists increased from 0.77 (without AI) to 0.80 (with AI). (Improvement: 0.03 AUC)
    • DBT: Average AUC for radiologists increased from 0.79 (without AI) to 0.83 (with AI). (Improvement: 0.04 AUC)
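
The improvement figures above are simple differences of the reported average AUCs. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative only, and the AUC values are the ones quoted in the text.

```python
# Average reader AUCs as reported in the text: (without AI, with AI).
reported_auc = {
    "FFDM": (0.77, 0.80),
    "DBT": (0.79, 0.83),
}


def auc_improvement(unaided: float, aided: float) -> float:
    """Absolute change in AUC, rounded to the two-decimal precision of the report."""
    return round(aided - unaided, 2)


improvements = {
    modality: auc_improvement(unaided, aided)
    for modality, (unaided, aided) in reported_auc.items()
}
```

These are absolute differences in average AUC only. The text gives no confidence intervals or p-values, so the sketch does not attempt to reproduce the statistical test behind the superiority claim.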

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

  • Yes, standalone performance was evaluated. The objectives of the studies included determining: "Whether the performance of MammoScreen standalone is superior to unaided radiologist performance" and "Whether the performance of MammoScreen standalone is non-inferior to aided radiologist performance."
  • Standalone Performance Results:
    • FFDM: AUC = 0.79 (found to be non-inferior to the average unaided radiologists' performance of 0.77).
    • DBT: AUC = 0.84 (found to be superior to the average unaided radiologists' performance of 0.79).
    • Additionally, standalone performance tests for MammoScreen 2.0 (FFDM) demonstrated non-inferiority compared to the predicate device.
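
For readers unfamiliar with the metric, the sketch below shows one common way to estimate an AUC from per-case suspicion scores: the fraction of (cancer, non-cancer) case pairs in which the cancer case receives the higher score, with ties counted as one half. The text does not say how the AUCs were estimated in the reader studies, so this is an assumption for illustration; the function name and the toy scores are hypothetical and are not study data.

```python
from typing import Sequence


def auc_from_scores(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Empirical AUC via pairwise comparison (Mann-Whitney U statistic).

    Returns the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case, counting ties as 0.5.
    """
    if not positive or not negative:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical toy scores on the 1-10 scale (not data from the study).
cancer_scores = [9, 7, 8, 4]
normal_scores = [2, 4, 1, 5, 3]
auc = auc_from_scores(cancer_scores, normal_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported standalone values of 0.79 (FFDM) and 0.84 (DBT).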

7. The Type of Ground Truth Used

The text implicitly suggests expert consensus based on the mention of "MQSA-qualified and ABR-certified readers." It also references the training of deep learning modules with "biopsy-proven examples of breast cancer and normal tissue," indicating that biopsy (pathology) results were used as the ultimate ground truth to establish the benign/malignant status of lesions in the training data, and likely in the test set's ground truth development as well. The study assesses performance in the "detection of breast cancer," linking the ground truth directly to malignancy.

8. The Sample Size for the Training Set

The document states that the deep learning modules were "trained with very large databases of biopsy-proven examples of breast cancer and normal tissue." However, a specific numerical sample size for the training set is not provided.

9. How the Ground Truth for the Training Set Was Established

The ground truth for the training set was established using "biopsy-proven examples of breast cancer and normal tissue." This indicates that histopathological (pathology) results from biopsies served as the definitive ground truth for classifying cases as cancerous or normal during the training of the AI model.

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
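
The diagnostic accuracy measures named in paragraphs (b)(1)(iii) and (b)(1)(iv) can all be derived from a 2x2 table of test results against ground truth. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and do not come from the regulation or from the MammoScreen studies.

```python
from typing import Dict


def diagnostic_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard accuracy measures from true/false positive/negative counts."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    # Assumption: every denominator below is non-zero; degenerate tables are rejected.
    if 0 in (tp + fn, tn + fp, tp + fp, tn + fn) or fp == 0 or tn == 0:
        raise ValueError("degenerate 2x2 table: a measure would be undefined")
    sensitivity = tp / (tp + fn)  # fraction of diseased cases flagged
    specificity = tn / (tn + fp)  # fraction of non-diseased cases cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),    # positive predictive value
        "npv": tn / (tn + fn),    # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only.
measures = diagnostic_measures(tp=80, fp=10, tn=90, fn=20)
```

Note that predictive values depend on disease prevalence in the test set, so values computed on an enriched dataset do not transfer directly to a screening population, whereas sensitivity, specificity, and likelihood ratios do not depend on prevalence in the same way.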