
510(k) Data Aggregation

    K Number: K243679
    Date Cleared: 2025-07-03 (216 days)
    Regulation Number: 892.2090
    Predicate For: N/A
    Reference Devices: K192854, K211541

    Intended Use

    MammoScreen® 4 is a concurrent reading and reporting aid for physicians interpreting screening mammograms. It is intended for use with compatible full-field digital mammography and digital breast tomosynthesis systems. The device can also use compatible prior examinations in the analysis.

    Output of the device includes graphical marks of findings as soft-tissue lesions or calcifications on mammograms along with their level of suspicion scores. The lesion type is characterized as mass/asymmetry, distortion, or calcifications for each detected finding. The level of suspicion score is expressed at the finding level, for each breast, and overall for the mammogram.

    The location of findings, including quadrant, depth, and distance from the nipple, is also provided. This adjunctive information is intended to assist interpreting physicians during reporting.

    Patient management decisions should not be made solely based on the analysis by MammoScreen 4.

    Device Description

    MammoScreen 4 is a concurrent reading medical software device using artificial intelligence to assist radiologists in the interpretation of mammograms.

    MammoScreen 4 processes the mammogram(s) and detects findings suspicious for breast cancer. Each detected finding gets a score called the MammoScreen Score™. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion. For each mammogram, MammoScreen 4 outputs the detected findings with their associated score, a score per breast, driven by the highest finding score for each breast, and a score per case, driven by the highest finding score overall. The MammoScreen Score goes from one to ten.
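The max-driven score aggregation described above can be sketched as follows. This is a hypothetical illustration, not the vendor's implementation; only the one-to-ten score range and the "highest finding score drives the breast and case scores" rule come from the document.

```python
def aggregate_scores(findings):
    """Roll finding-level MammoScreen-style scores up to per-breast and
    per-case scores (hypothetical sketch).

    `findings` is a list of (breast, score) pairs, e.g. ("L", 7), with
    scores on a 1-to-10 scale. The per-breast score is the highest finding
    score in that breast; the per-case score is the highest score overall.
    """
    per_breast = {}
    for breast, score in findings:
        per_breast[breast] = max(per_breast.get(breast, 1), score)
    per_case = max(per_breast.values(), default=1)
    return per_breast, per_case

# Example: two findings in the left breast, one in the right.
breast_scores, case_score = aggregate_scores([("L", 3), ("L", 7), ("R", 5)])
# breast_scores == {"L": 7, "R": 5}; case_score == 7
```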

    MammoScreen 4 is available for 2D (FFDM images) and 3D processing (FFDM & DBT or 2DSM & DBT). Optionally, MammoScreen 4 can use prior examinations in the analysis.

    The results indicating potential breast cancer, identified by MammoScreen 4, are accessible via a dedicated user interface and can seamlessly integrate into DICOM viewers (using DICOM-SC and DICOM-SR). Reporting aid outputs can be incorporated into the practice's reporting system to generate a preliminary report.

    Note that the MammoScreen 4 outputs should be used as complementary information by radiologists while interpreting mammograms. For all cases, the medical professional interpreting the mammogram remains the sole decision-maker.

    AI/ML Overview

    The provided text describes the acceptance criteria and the studies demonstrating that MammoScreen® 4 meets them. A breakdown of the requested information follows:


    Acceptance Criteria and Device Performance

    1. Table of Acceptance Criteria and Reported Device Performance

    Rationale for using "MammoScreen 2" data for comparison: The document states that the standalone testing for MammoScreen 4 compared its performance against "MammoScreen 2 on Dimension". While MammoScreen 3 is the predicate device, the provided performance data in the standalone test section specifically refers to MammoScreen 2. The PCCP section later references performance targets for MammoScreen versions 1, 2, and 3, but the actual "Primary endpoint" results for the current device validation are given in comparison to MammoScreen 2. Therefore, the table below uses the reported performance against MammoScreen 2 as per the "Primary endpoint" section.

    Primary objective: non-inferiority in standalone cancer detection performance compared to the previous version of MammoScreen (specifically MammoScreen 2 on Dimension). Result: achieved.

    Acceptance criterion (all three AUC metrics): the lower bound of the 95% CI of the difference in the endpoint between MammoScreen 4 (MS4) and MammoScreen 2 (MS2) must be positive.

    • AUC at the mammogram level: MS4 0.894 (0.870, 0.919); MS2 0.867 (0.839, 0.896); Δ 0.027 (0.002, 0.052), p < 0.0001. Lower bound of the difference (0.002) is positive; criterion met.
    • AUC at the breast level: MS4 0.919 (0.897, 0.941); MS2 0.895 (0.871, 0.920); Δ 0.023 (0.002, 0.045), p < 0.0001. Lower bound of the difference (0.002) is positive; criterion met.
    • AUC (LROC) at the finding level: MS4 0.891 (0.862, 0.921); MS2 0.837 (0.797, 0.877); Δ 0.055 (0.032, 0.077), p < 0.0001. Lower bound of the difference (0.032) is positive; criterion met.
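The "positive lower bound of the 95% CI of the AUC difference" criterion can be illustrated with a paired percentile-bootstrap sketch. The document does not describe the statistical method actually used in the submission, so the function names and the bootstrap approach below are assumptions for illustration only.

```python
import random

def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as half."""
    pairs = len(pos_scores) * len(neg_scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / pairs

def bootstrap_auc_diff_ci(labels, new_scores, old_scores,
                          n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the paired AUC difference (new minus old).

    The acceptance criterion described above is met when the returned
    lower bound is positive. Hypothetical sketch, not the submission's
    actual statistical analysis.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        pos = [i for i in idx if labels[i] == 1]
        neg = [i for i in idx if labels[i] == 0]
        if not pos or not neg:
            continue  # a valid resample must contain both classes
        d_new = auc([new_scores[i] for i in pos], [new_scores[i] for i in neg])
        d_old = auc([old_scores[i] for i in pos], [old_scores[i] for i in neg])
        diffs.append(d_new - d_old)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy check: a perfectly separating new model vs. a constant old model.
labels = [1] * 5 + [0] * 5
lo, hi = bootstrap_auc_diff_ci(labels, [1.0] * 5 + [0.0] * 5, [0.5] * 10)
# lo > 0, so the non-inferiority criterion would be met here.
```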

    Study Details

    2. Sample size used for the test set and the data provenance

    • Sample Size: 1,475 patients, leading to 2,950 included studies (each patient underwent a DBT acquisition with two Hologic mammography systems).
    • Data Provenance: The document explicitly mentions "Data provenance" as a considered subgroup for analysis but does not specify the country of origin. It indicates that the data for standalone performance testing only belonged to the "test group," which means it was "unseen data" from sources entirely left out during training and tuning. The study appears to be retrospective as it uses existing patient data.

    3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

    The document states that for the clinical testing (MRMC studies), "MQSA-qualified and ACR-certified readers" were used. However, for the standalone performance testing (which is where the ground truth for the algorithm's performance is established), the document only describes the "Truthing process" and does not specify the number or qualifications of experts involved in establishing the ground truth.

    4. Adjudication method (e.g., 2+1, 3+1, none) for the test set

    The document describes the "Truthing process" for the standalone performance testing but does not specify an adjudication method involving multiple readers. The ground truth establishment is described as:

    • Positive cases: biopsy-proven presence of cancer.
    • Benign cases: confirmed either by biopsy result or by imaging follow-up.
    • Negative cases: verified by imaging follow-up.

    This indicates a reliance on clinical outcomes/pathology rather than reader consensus for ground truth for the standalone performance data.
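The outcome-based truthing rules above can be sketched as a simple labeling function. The function name, argument names, and the "indeterminate" fallback are hypothetical; only the three evidence-to-label rules come from the document.

```python
def assign_ground_truth(biopsy_result, followup_clear):
    """Map case-level clinical evidence to a ground-truth label
    (hypothetical sketch of the truthing rules described above).

    biopsy_result: "malignant", "benign", or None (no biopsy performed)
    followup_clear: True if imaging follow-up showed no cancer
    """
    if biopsy_result == "malignant":
        return "positive"       # biopsy-proven presence of cancer
    if biopsy_result == "benign":
        return "benign"         # benign status confirmed by pathology
    if followup_clear:
        return "negative"       # normal, verified by imaging follow-up
    return "indeterminate"      # insufficient evidence (assumed handling)

label = assign_ground_truth("malignant", None)  # "positive"
```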

    5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of human reader improvement with AI assistance versus without

    • Was an MRMC study done? Yes, "Clinical Testing" section explicitly states: "The clinical validation of MammoScreen 4 includes three multi-reader multi-case (MRMC) studies: One for FFDM, One for DBT, One for combined DBT and 2D mammograms (FFDM or 2DSM), and using prior examinations."
    • Effect size of improvement: The document states, "The studies demonstrated the superiority of the Area Under the Receiver Operating Characteristic Curve of the radiologist using the MammoScreen algorithm compared to the unaided radiologist." However, specific effect sizes (e.g., AUC difference, confidence intervals) for the human reader performance improvement with AI assistance versus without AI assistance are not provided in the excerpt. Only the result of superiority is mentioned, not the quantitative measure of that superiority.

    6. If a standalone study (i.e., algorithm-only performance, without a human in the loop) was done

    • Was a standalone study done? Yes. The section "The standalone performance testing carried out to validate the device is summarized in what follows:" directly addresses this.

    7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)

    For the standalone performance testing:

    • Positive cases: Biopsy-proven presence of cancer.
    • Benign cases: Confirmed either by biopsy result or by imaging follow-up.
    • Negative cases: Verified by imaging follow-up.

    This indicates a mix of pathology (biopsy) and outcomes data (imaging follow-up) as the ground truth.

    8. The sample size for the training set

    The document states: "Sources in the training/tuning group may only be used for model training and tuning. Sources in the test group may only be used for external validation of the model's performances on unseen data (i.e., from sources entirely left out during training and tuning)." However, it does not provide a specific sample size for the training set, noting only that the models are trained with "very large databases."
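The source-level separation described in that quote (entire sources assigned to either the training/tuning group or the test group, never both) can be sketched as a grouped split. The function and variable names are hypothetical; only the "sources entirely left out during training" requirement comes from the document.

```python
import random

def split_by_source(case_sources, test_fraction=0.2, seed=0):
    """Partition cases so every imaging source (site) lands entirely in
    either the training/tuning group or the test group, making the test
    data 'unseen' at the source level (hypothetical sketch).

    case_sources: dict mapping case ID -> source ID.
    """
    sources = sorted(set(case_sources.values()))
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_test = max(1, round(test_fraction * len(sources)))
    test_sources = set(sources[:n_test])
    train = [c for c, s in case_sources.items() if s not in test_sources]
    test = [c for c, s in case_sources.items() if s in test_sources]
    return train, test

# Example: 20 cases drawn from 5 sites; one whole site becomes the test group.
cases = {f"case{i}": f"site{i % 5}" for i in range(20)}
train_cases, test_cases = split_by_source(cases)
```

Splitting by source rather than by case prevents leakage from site-specific acquisition characteristics (scanner model, processing settings) between training and validation.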

    9. How the ground truth for the training set was established

    The document states: "These modules are trained with very large databases of biopsy-proven examples of breast cancer and normal tissue." This implies that the ground truth for the training set was primarily established through biopsy results for cancerous cases and likely outcomes/clinical confirmation for normal or benign cases, similar to the test set ground truth. However, detailed methodology on training set ground truth establishment is not provided beyond "biopsy-proven examples."
