510(k) Data Aggregation
(266 days)
MammoScreen BD
MammoScreen® BD is a software application intended for use with compatible full-field digital mammography and digital breast tomosynthesis systems. MammoScreen BD evaluates the breast tissue composition to provide an ACR BI-RADS 5th Edition breast density category. The device is intended to be used in the population of asymptomatic women undergoing screening mammography who are at least 40 years old.
MammoScreen BD only produces adjunctive information to aid interpreting physicians in the assessment of breast tissue composition. It is not diagnostic software.
Patient management decisions should not be made solely based on analysis by MammoScreen BD.
MammoScreen BD is a software-only device (SaMD) using artificial intelligence to assist radiologists in the interpretation of mammograms. The purpose of the MammoScreen BD software is to automatically process a mammogram to assess the density of the breasts.
MammoScreen BD processes the standard views of 2D mammograms (CC and/or MLO from FFDM, and/or the 2D synthesized mammogram (2DSM) from DBT) to assess breast density.
For each examination, MammoScreen BD outputs the breast density following the ACR BI-RADS 5th Edition breast density category.
MammoScreen BD outputs can be integrated with compatible third-party software such as MammoScreen Suite. Results may be displayed in a web UI, as a DICOM Structured Report, a DICOM Secondary Capture Image, or within patient worklists by the third-party software.
MammoScreen BD takes as input a folder of images in DICOM format and outputs the breast density assessment in the form of a JSON file.
Note that the MammoScreen BD outputs should be used as complementary information by radiologists while interpreting breast density. Patient management decisions should not be made solely on the basis of analysis by MammoScreen BD; the medical professional interpreting the mammogram remains the sole decision-maker.
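The input/output contract described above (a folder of DICOM images in, a JSON density assessment out) can be illustrated with a short sketch. The JSON field names and the use of pydicom below are assumptions made for illustration; the device's actual schema and implementation are not described in the document.

```python
# Minimal sketch of the I/O contract described above: a folder of DICOM images in,
# a JSON breast density assessment out. The JSON field names and the use of pydicom
# are illustrative assumptions, not the device's actual schema or implementation.
import json
from pathlib import Path

import pydicom  # third-party: pip install pydicom


def write_density_report(dicom_dir: str, density_category: str, out_path: str) -> dict:
    """Collect basic view metadata from a DICOM folder and write a JSON summary.

    `density_category` stands in for the model's BI-RADS output ("A".."D");
    the proprietary classifier itself is not reproduced here.
    """
    views = []
    for f in sorted(Path(dicom_dir).glob("*.dcm")):
        ds = pydicom.dcmread(f, stop_before_pixels=True)  # header only, no pixel data
        views.append({
            "file": f.name,
            "laterality": getattr(ds, "ImageLaterality", None),   # e.g. "L" / "R"
            "view_position": getattr(ds, "ViewPosition", None),   # e.g. "CC" / "MLO"
        })

    result = {"views": views, "breast_density": density_category}  # hypothetical schema
    with open(out_path, "w") as fp:
        json.dump(result, fp, indent=2)
    return result
```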
Here's a breakdown of the acceptance criteria and the study that proves MammoScreen BD meets them, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Device Performance Study
The study primarily focuses on the standalone performance of MammoScreen BD in assessing breast density against an expert consensus ground truth. The key metric for performance is the quadratically weighted Cohen's kappa ($\kappa$).
1. Table of Acceptance Criteria and Reported Device Performance
Acceptance Criteria | Reported Device Performance |
---|---|
Primary Objective: Superiority in standalone performance for density assignment of MammoScreen BD compared to a pre-determined reference value ($\kappa_{\text{reference}} = 0.85$). | Hologic: $\kappa_{\text{quadratic}} = 89.03$ [95% CI: 87.43 – 90.56] |
Acceptance Criteria (Statistical): The one-sided p-value for the test $H_0: \kappa \leq 0.85$ is less than the significance level ($\alpha = 0.05$) AND the lower bound of the 95% confidence interval for kappa is greater than 0.85, indicating that the observed weighted kappa is statistically significantly greater than 0.85. | Hologic Envision: $\kappa_{\text{quadratic}} = 89.54$ [95% CI: 86.88 – 91.69] |
 | GE: $\kappa_{\text{quadratic}} = 93.19$ [95% CI: 90.50 – 94.92] |
All reported kappa values (expressed here on a ×100 scale, so 89.03 corresponds to $\kappa = 0.8903$) exceed the reference value of 0.85, and the lower bounds of their 95% confidence intervals are likewise above it, satisfying the acceptance criteria.
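As a worked illustration of the acceptance check above, the sketch below computes the quadratically weighted Cohen's kappa between the algorithm's density labels and the consensus labels, together with a bootstrap 95% confidence interval whose lower bound is compared against the 0.85 reference. The percentile bootstrap and the scikit-learn implementation are assumptions; the submission's exact statistical procedure is not reproduced in the document.

```python
# Illustrative check of the acceptance criterion: quadratically weighted Cohen's kappa
# between the algorithm's density labels and the consensus labels, with a bootstrap
# 95% CI whose lower bound must exceed the 0.85 reference. The percentile bootstrap is
# an assumption; the submission's exact statistical procedure is not reproduced here.
import numpy as np
from sklearn.metrics import cohen_kappa_score

LABELS = ["A", "B", "C", "D"]  # ACR BI-RADS 5th Edition density categories


def quadratic_kappa_with_ci(algo_labels, consensus_labels, n_boot=2000, seed=0):
    algo = np.asarray(algo_labels)
    consensus = np.asarray(consensus_labels)
    kappa = cohen_kappa_score(algo, consensus, labels=LABELS, weights="quadratic")

    rng = np.random.default_rng(seed)
    n = len(algo)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample studies with replacement
        boot.append(cohen_kappa_score(algo[idx], consensus[idx],
                                      labels=LABELS, weights="quadratic"))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return kappa, (lo, hi)


# Acceptance check against the reference value:
# kappa, (lo, hi) = quadratic_kappa_with_ci(algo_labels, consensus_labels)
# passed = lo > 0.85
```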
2. Sample Size and Data Provenance
Test Set:
- Hologic (original dataset): 922 patients / 1,155 studies
- Hologic Envision (new system for subject device): 500 patients / 500 studies
- GE (new system for subject device): 376 patients / 490 studies
Data Provenance:
- Hologic (original dataset):
- USA: 658 studies (distributed as A:85, B:269, C:241, D:63)
- EU: 447 studies (distributed as A:28, B:169, C:214, D:86)
- Hologic Envision: USA: 500 studies (distributed as A:50, B:200, C:200, D:50)
- GE:
- USA: 359 studies (distributed as A:38, B:155, C:139, D:31)
- EU: 129 studies (distributed as A:4, B:45, C:61, D:19)
All test-set data appear to be retrospective; the document states that the "Data used for the standalone performance testing only belongs to the test group" and that it is distinct from the training data.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: 5 breast radiologists
- Qualifications: At least 10 years of experience in breast imaging interpretation.
4. Adjudication Method for the Test Set
The ground truth was established by majority rule among the assessments of the 5 breast radiologists. This implies that at least 3 of the 5 readers had to agree on a breast density category for it to be assigned as the ground truth.
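A minimal sketch of that majority rule follows; the document does not say how cases without 3-reader agreement were handled, so returning no consensus in that situation is an assumption.

```python
# Sketch of the 3-of-5 majority rule described above. The document does not say how
# cases with no 3-reader agreement were resolved, so returning None here is an assumption.
from collections import Counter
from typing import List, Optional


def consensus_density(reads: List[str]) -> Optional[str]:
    """reads: five BI-RADS density categories, e.g. ["B", "B", "C", "B", "C"]."""
    category, votes = Counter(reads).most_common(1)[0]
    return category if votes >= 3 else None  # None = no majority (handling unspecified)


assert consensus_density(["B", "B", "C", "B", "C"]) == "B"
```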
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
There is no mention of an MRMC comparative effectiveness study being performed to assess how much human readers improve with AI vs. without AI assistance. The study focuses solely on the standalone performance of the AI algorithm. The device is described as "adjunctive information to aid interpreting physicians," but its effect on radiologist performance isn't quantified in this document.
6. Standalone Performance (Algorithm Only)
Yes, a standalone performance study was explicitly conducted. The results for the quadratically weighted Cohen's Kappa presented in the table above (89.03 for Hologic, 89.54 for Hologic Envision, and 93.19 for GE) are all for the algorithm's performance only ("MammoScreen BD against the radiologist consensus assessment").
7. Type of Ground Truth Used
The ground truth used was expert consensus based on the visual assessment of 5 breast radiologists.
8. Sample Size for the Training Set
- Total number of studies: 108,775
- Total number of patients: 32,368
9. How the Ground Truth for the Training Set was Established
The document states that the training modules are "trained with very large databases of annotated mammograms." While "annotated" implies ground truth was established, the specific method for establishing ground truth for the training set is not detailed in the provided text. It only specifies the ground truth establishment method for the test set (majority rule of 5 radiologists). It's common for training data to use various methods for annotation, which might differ from the rigorous expert consensus used for the test set.
(216 days)
MammoScreen® (4)
MammoScreen® 4 is a concurrent reading and reporting aid for physicians interpreting screening mammograms. It is intended for use with compatible full-field digital mammography and digital breast tomosynthesis systems. The device can also use compatible prior examinations in the analysis.
Output of the device includes graphical marks of findings as soft-tissue lesions or calcifications on mammograms along with their level of suspicion scores. The lesion type is characterized as mass/asymmetry, distortion, or calcifications for each detected finding. The level of suspicion score is expressed at the finding level, for each breast, and overall for the mammogram.
The location of findings, including quadrant, depth, and distance from the nipple, is also provided. This adjunctive information is intended to assist interpreting physicians during reporting.
Patient management decisions should not be made solely based on the analysis by MammoScreen 4.
MammoScreen 4 is a concurrent reading medical software device using artificial intelligence to assist radiologists in the interpretation of mammograms.
MammoScreen 4 processes the mammogram(s) and detects findings suspicious for breast cancer. Each detected finding gets a score called the MammoScreen Score™. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion. For each mammogram, MammoScreen 4 outputs the detected findings with their associated score, a score per breast, driven by the highest finding score for each breast, and a score per case, driven by the highest finding score overall. The MammoScreen Score goes from one to ten.
MammoScreen 4 is available for 2D (FFDM images) and 3D processing (FFDM & DBT or 2DSM & DBT). Optionally, MammoScreen 4 can use prior examinations in the analysis.
The results indicating potential breast cancer, identified by MammoScreen 4, are accessible via a dedicated user interface and can seamlessly integrate into DICOM viewers (using DICOM-SC and DICOM-SR). Reporting aid outputs can be incorporated into the practice's reporting system to generate a preliminary report.
Note that the MammoScreen 4 outputs should be used as complementary information by radiologists while interpreting mammograms. For all cases, the medical professional interpreting the mammogram remains the sole decision-maker.
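The scoring scheme described above (a 1-10 MammoScreen Score per finding, a per-breast score equal to the highest finding score in that breast, and a case score equal to the highest score overall) can be sketched as a simple roll-up; the `Finding` fields below are hypothetical and purely illustrative.

```python
# Sketch of the score roll-up described above: each finding carries a 1-10 MammoScreen
# Score, each breast takes the highest score among its findings, and the case score is
# the highest breast score. The Finding fields are hypothetical and purely illustrative.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    laterality: str   # "L" or "R"
    lesion_type: str  # "mass/asymmetry", "distortion", or "calcifications"
    score: int        # MammoScreen Score, 1 (low suspicion) to 10 (high suspicion)


def roll_up_scores(findings: List[Finding]) -> Dict[str, object]:
    breast_scores: Dict[str, int] = {}
    for f in findings:
        breast_scores[f.laterality] = max(breast_scores.get(f.laterality, 0), f.score)
    case_score = max(breast_scores.values(), default=0)
    return {"per_breast": breast_scores, "case": case_score}


# Example: a score-7 mass on the left and a score-3 calcification cluster on the right
# give breast scores {"L": 7, "R": 3} and a case score of 7.
```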
The provided text describes the acceptance criteria and a study to prove that MammoScreen® 4 meets these criteria. Here is a breakdown of the requested information:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
Rationale for using "MammoScreen 2" data for comparison: The document states that the standalone testing for MammoScreen 4 compared its performance against "MammoScreen 2 on Dimension". While MammoScreen 3 is the predicate device, the provided performance data in the standalone test section specifically refers to MammoScreen 2. The PCCP section later references performance targets for MammoScreen versions 1, 2, and 3, but the actual "Primary endpoint" results for the current device validation are given in comparison to MammoScreen 2. Therefore, the table below uses the reported performance against MammoScreen 2 as per the "Primary endpoint" section.
Metric | Acceptance Criteria | Reported Device Performance (MammoScreen 4 vs. MammoScreen 2) |
---|---|---|
Primary Objective | Non-inferiority in standalone cancer detection performance compared to the previous version of MammoScreen (specifically MammoScreen 2 on Dimension). | Achieved. |
AUC at the mammogram level | Positive lower bound of the 95% CI of the difference in endpoints between MammoScreen 4 and MammoScreen 2. | MS4: 0.894 (0.870, 0.919); MS2: 0.867 (0.839, 0.896); Δ: 0.027 (0.002, 0.052), p |
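As a worked illustration of the primary endpoint in the table above, the sketch below estimates the difference in mammogram-level AUC between two algorithm versions scored on the same cases, together with a 95% confidence interval for that difference; acceptance corresponds to a positive lower bound. The paired percentile bootstrap is an assumption, since the submission's exact confidence-interval method is not reproduced here.

```python
# Illustrative version of the primary-endpoint check above: difference in mammogram-level
# AUC between the new and the previous algorithm version on the same cases, with a paired
# bootstrap 95% CI. Acceptance corresponds to a positive lower bound of that CI. The
# percentile bootstrap is an assumption; the submission's exact CI method is not given here.
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_difference_ci(y_true, scores_new, scores_old, n_boot=2000, seed=0):
    y_true = np.asarray(y_true)          # 1 = cancer, 0 = non-cancer, per mammogram
    scores_new = np.asarray(scores_new)  # case scores from the new version
    scores_old = np.asarray(scores_old)  # case scores from the previous version

    delta = roc_auc_score(y_true, scores_new) - roc_auc_score(y_true, scores_old)

    rng = np.random.default_rng(seed)
    n = len(y_true)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample cases, keeping the pairing intact
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip degenerate resamples containing only one class
        deltas.append(roc_auc_score(y_true[idx], scores_new[idx])
                      - roc_auc_score(y_true[idx], scores_old[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return delta, (lo, hi)


# Acceptance: lo > 0, i.e. the new version is not worse at the mammogram level.
```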
(124 days)
MammoScreen BD
MammoScreen® BD is a software application intended for use with compatible full field digital mammography and digital breast tomosynthesis systems. MammoScreen BD evaluates the breast tissue composition to provide an ACR BI-RADS 5th Edition breast density category. The device is intended to be used in the population of asymptomatic women undergoing screening mammography who are at least 40 years old.
MammoScreen BD only produces adjunctive information to aid interpreting physicians in the assessment of breast tissue composition. It is not diagnostic software.
Patient management decisions should not be made solely based on analysis by MammoScreen BD.
MammoScreen BD is a software-only device (SaMD) using artificial intelligence to assist radiologists in the interpretation of mammograms. The MammoScreen BD software automatically processes a mammogram to assess the density of the breasts.
For each examination, MammoScreen BD outputs the breast density in accordance with the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) Atlas 5th Edition breast density categories "A" through "D".
MammoScreen BD takes as input a folder of images in DICOM format and outputs a breast density assessment in the form of a JSON file. MammoScreen BD outputs can be integrated with compatible third-party software such as the MammoScreen Web-UI interface, a PACS viewer (using DICOM Structured Report or DICOM Secondary Capture SOP Class UIDs), patient worklists, or reporting software.
Here is a detailed breakdown of the acceptance criteria and study information for MammoScreen BD, based on the provided document:
1. Table of Acceptance Criteria and Reported Device Performance
The primary acceptance criteria for the initial clearance of MammoScreen BD were related to the accuracy and agreement with ground truth established by radiologists for classifying breast density into four BI-RADS categories.
Acceptance Criteria (from PCCP section for future modifications) | Primary Objective Reported Device Performance (4-class task) | Primary Objective Reported Device Performance (Binary task) |
---|---|---|
Quadratic Kappa on GE mammograms superior to 0.85 | Quadratic Cohen's Kappa: 89.03 (95% CI: 87.43 - 90.56) | Quadratic Cohen's Kappa: 84.50 (95% CI: 81.46, 87.36) |
Linear Kappa, Accuracy, and Density Bins (A, B, C, D) | Accuracy: 84.68 (95% CI: 82.68, 86.67) | Accuracy: 92.29 (95% CI: 90.82, 93.77) |
Note: The document explicitly states "Acceptance criteria of the updated device" only under the PCCP for future modifications and does not state separate acceptance criteria for the initial clearance. The reported performance metrics (quadratic Cohen's kappa and accuracy for both the 4-class and binary classification tasks) are therefore implicitly the metrics against which the device's performance was judged for its initial clearance, demonstrating its effectiveness against the ground truth.
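For the binary (dense vs. non-dense) task reported above, the conventional ACR grouping is assumed below: categories A and B count as non-dense, and C and D as dense (the document itself does not spell out the mapping). A minimal sketch:

```python
# Sketch of the binary (dense vs. non-dense) task reported above. The document does not
# spell out the mapping, so the conventional ACR grouping is assumed: categories A and B
# count as "non-dense", C and D as "dense".
from sklearn.metrics import accuracy_score, cohen_kappa_score

DENSE = {"C", "D"}


def to_binary(categories):
    return ["dense" if c in DENSE else "non-dense" for c in categories]


def binary_metrics(algo_labels, consensus_labels):
    algo_b = to_binary(algo_labels)
    gt_b = to_binary(consensus_labels)
    return {
        "accuracy": accuracy_score(gt_b, algo_b),
        "kappa": cohen_kappa_score(gt_b, algo_b),
    }
```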
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Test Set: 922 women / 922 exams (each exam comprising 4 views).
- Data Provenance: Retrospectively collected from two US screening centers and one French screening center.
- 52.6% of cases (485 patients) originated from the USA.
- 47.4% of cases (437 patients) originated from France.
- The test data came from clinical centers that did not contribute to algorithm development, mitigating center-induced bias.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Number of Experts: 5 breast radiologists.
- Qualifications of Experts: The document specifies "5 breast radiologists" but does not provide details on their years of experience or specific board certifications.
4. Adjudication Method for the Test Set
- Adjudication Method: "Consensus among the visual assessment of 5 breast radiologists." The exact method (e.g., majority vote, sequential review with tie-breaking) is not explicitly detailed beyond "consensus."
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- No, a multi-reader multi-case (MRMC) comparative effectiveness study evaluating human readers with AI assistance versus without AI assistance was not conducted or reported in this document. The study focuses on the standalone performance of the AI algorithm against expert consensus.
6. Standalone (Algorithm Only) Performance Study
- Yes, a standalone performance study was conducted. The "Primary Objectives" and "Performance Data" sections directly evaluate "the accuracy and the reproducibility of MammoScreen BD algorithm in assessing the breast density category" in terms of agreement with the ground truth established by the consensus of 5 radiologists.
- For the 4-class task, the algorithm achieved a quadratic Cohen's kappa of 89.03 and an accuracy of 84.68%.
- For the binary classification task (dense vs. non-dense), the algorithm achieved a quadratic Cohen's kappa of 84.50 and an accuracy of 92.29%.
7. Type of Ground Truth Used
- Type of Ground Truth: Expert Consensus. Specifically, "ground truth (GT) established by consensus among the visual assessment of 5 breast radiologists."
8. Sample Size for the Training Set
- Sample Size for Training Set: 32,368 patients, comprising 108,775 studies.
9. How Ground Truth for the Training Set Was Established
- The document states that the training data was derived from "De-identified screening mammograms... retrospectively collected from 32,368 patients in 2 different US sites."
- It does not explicitly state how the ground truth for the training set was established. It only describes the density distribution (A: 12.79%, B: 34.58%, C: 42.94%, D: 9.38%) within the training data, implying these were pre-existing labels. It's common for such labels to be derived from radiologist reports or existing clinical records, but the specific method of ground truth establishment for the training set is not detailed.
(182 days)
MammoScreen® (3)
MammoScreen® 3 is a concurrent reading and reporting aid for physicians interpreting screening mammograms. It is intended for use with compatible full-field digital mammography and digital breast tomosynthesis systems. The device can also use compatible prior examinations in the analysis.
Output of the device includes graphical marks of findings as soft-tissue lesions or calcifications on mammograms along with their level of suspicion scores. The lesion type is characterized as mass/asymmetry, distortion, or calcifications for each detected finding. The level of suspicion score is expressed at the finding level, for each breast, and overall for the mammogram.
The location of findings including quadrant, depth, and distance from the nipple, is also provided. This adjunctive information is intended to assist interpreting physicians during reporting.
Patient management decisions should not be made solely based on the analysis by MammoScreen 3.
MammoScreen is a concurrent reading medical software device using artificial intelligence to assist radiologists in the interpretation of mammograms.
MammoScreen processes the mammogram(s) and detects findings suspicious for breast cancer. Each detected finding gets a score called the MammoScreen Score™. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion. For each mammogram, MammoScreen outputs detected findings with their associated score, a score per breast, driven by the highest finding score for each breast, and a score per case, driven by the highest finding score overall. The MammoScreen Score goes from one to ten.
MammoScreen is available for 2D (FFDM images) and 3D processing (FFDM & DBT or 2DSM & DBT). Optionally, MammoScreen can use prior examinations in the analysis.
MammoScreen can also aid in the reporting process by populating an initial report with chosen findings, including lesion type and position (quadrant, depth and distance to nipple).
The results indicating potential breast cancer, identified by MammoScreen, are accessible via a dedicated user interface and can seamlessly integrate into DICOM viewers (using DICOM-SC and DICOM-SR). Reporting aid outputs can be incorporated into the practice's reporting system to generate a preliminary report. Additionally, certain outputs like the case score can be reported into the patient management worklist.
Note that the MammoScreen outputs should be used as complementary information by radiologists while interpreting mammograms. For all cases, the medical professional interpreting the mammogram remains the sole decision-maker.
Here's a summary of the acceptance criteria and the study that proves the device meets them, based on the provided text:
Acceptance Criteria and Reported Device Performance
The acceptance criteria are not explicitly listed in a separate table within the document. However, the clinical and standalone performance studies establish benchmarks and demonstrate achievement of certain levels of accuracy, sensitivity, and specificity. The criteria are implied through the statement "MammoScreen 3 achieved superior performance compared to the predicate device" and the detailed statistical results provided.
Table of Performance Results
Given that specific "acceptance criteria" (e.g., "AUROC must be > X") are not explicitly stated, I will present the reported performance of MammoScreen 3 in both co-reading and standalone modes, along with improvements (effect sizes) in the co-reading scenario.
Performance Metric | Acceptance Criteria (Implied) | MammoScreen 3 (Co-reading with Radiologists) | MammoScreen 3 (Standalone) | Notes |
---|---|---|---|---|
Radiologist Performance (Co-reading) | Superior to unaided radiologist performance | |||
Average AUROC (aided) | Higher than unaided | 0.871 [0.829 - 0.912] | N/A | Unaided: 0.797 [0.752 - 0.843] |
Average Sensitivity (aided) | Higher than unaided | 0.793 [0.725 - 0.860] | N/A | Unaided: 0.706 [0.633 - 0.780] |
Average Specificity (aided) | Higher than unaided | 0.836 [0.805 - 0.867] | N/A | Unaided: 0.815 [0.782 - 0.848] |
Standalone Performance (overall mammogram level) | Superior to unaided radiologists; Non-inferior to aided radiologists | N/A | 0.883 [0.837 - 0.929] | Superior to unaided: ΔAUROC = +0.085 (p |
(191 days)
MammoScreen 2.0
MammoScreen® is intended for use as a concurrent reading aid for interpreting physicians, to help identify findings on screening FFDM and DBT acquired with compatible mammography systems and assess their level of suspicion. Output of the device includes marks placed on findings on the mammogram and level of suspicion scores. The findings could be soft tissue lesions or calcifications. The level of suspicion score is expressed at the finding level, for each breast and overall for the mammogram. Patient management decisions should not be made solely on the basis of analysis by MammoScreen®.
MammoScreen 2.0 automatically processes the four views (one CC and one MLO per breast) of standard screening FFDM or DBT, and outputs a corresponding report on a separate screen, alongside the monitors used for reading. This report is designed to be easily readable with very few interactions required by providing an overall level of suspicion of each exam and giving explicit visual indications when highly suspicious exams are detected.
MammoScreen 2.0 detects and characterizes findings on a scale from one to ten, referred to as the MammoScreen score. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion.
Furthermore, MammoScreen 2.0 provides a high level of interpretability. Results are by construction consistent at the finding, breast and mammogram level. A breast takes on the highest score of its detected findings, and the level of suspicion for the exam is driven by the breast(s) with the highest score. Therefore, it is always possible to track a high suspicion of malignancy for an exam to the corresponding breast(s), and to a specific finding within the breast(s).
Here's a breakdown of the acceptance criteria and the study that proves the device meets them based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
Performance Metric | Acceptance Criteria (Implicit) | Reported Device Performance (FFDM) | Reported Device Performance (DBT) |
---|---|---|---|
Radiologist Performance with AID (AUC) | Superior to unaided radiologist performance | Increased from 0.77 to 0.80 | Increased from 0.79 to 0.83 |
Standalone Performance (AUC) | Non-inferior to unaided radiologist performance | 0.79 (non-inferior to 0.77 unaided) | 0.84 (superior to 0.79 unaided) |
Standalone Performance vs. Predicate (FFDM) | Non-inferior to predicate device | Achieved non-inferior performance | Not applicable |
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size (FFDM & DBT): 240 cases (enriched sample set)
- Data Provenance: Not explicitly stated regarding country of origin. The studies are described as "reader studies," implying prospective collection for the purpose of the study or a curated retrospective selection. The text doesn't specify if it's purely retrospective or prospective.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: 14 for the 2D (FFDM) study and 20 for the 3D (DBT) study.
- Qualifications: "MOSA-qualified and ABR-certified readers." (MOSA and ABR are common certifications for radiologists in the US, suggesting a US context for the experts).
4. Adjudication Method for the Test Set
The provided text does not explicitly state the adjudication method used to establish the ground truth for the test set. It mentions an "enriched sample set" and "MQSA-qualified and ABR-certified readers," suggesting expert consensus, but the specific process (e.g., 2+1, 3+1) is not detailed.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance
- Yes, an MRMC study was done. Clinical validation included two reader studies (one for FFDM and one for DBT) using a multi-reader multi-case (MRMC) cross-over design.
- Effect Size of Improvement:
- FFDM: Average AUC for radiologists increased from 0.77 (without AI) to 0.80 (with AI). (Improvement: 0.03 AUC)
- DBT: Average AUC for radiologists increased from 0.79 (without AI) to 0.83 (with AI). (Improvement: 0.04 AUC)
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
- Yes, standalone performance was evaluated. The objectives of the studies included determining: "Whether the performance of MammoScreen standalone is superior to unaided radiologist performance" and "Whether the performance of MammoScreen standalone is non-inferior to aided radiologist performance."
- Standalone Performance Results:
- FFDM: AUC = 0.79 (found to be non-inferior to the average unaided radiologists' performance of 0.77).
- DBT: AUC = 0.84 (found to be superior to the average unaided radiologists' performance of 0.79).
- Additionally, standalone performance tests for MammoScreen 2.0 (FFDM) demonstrated non-inferiority compared to the predicate device.
7. The Type of Ground Truth Used
The text implicitly suggests expert consensus based on the mention of "MOSA-qualified and ABR-certified readers." It also references the training of deep learning modules with "biopsy-proven examples of breast cancer and normal tissue," indicating that biopsy (pathology) results were used as the ultimate ground truth to establish the benign/malignant status of lesions in the training data, and likely in the test set's ground truth development as well. The study assesses performance in the "detection of breast cancer," linking the ground truth directly to malignancy.
8. The Sample Size for the Training Set
The document states that the deep learning modules were "trained with very large databases of biopsy-proven examples of breast cancer and normal tissue." However, a specific numerical sample size for the training set is not provided.
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set was established using "biopsy-proven examples of breast cancer and normal tissue." This indicates that histopathological (pathology) results from biopsies served as the definitive ground truth for classifying cases as cancerous or normal during the training of the AI model.
(173 days)
MammoScreen
MammoScreen™ is intended for use as a concurrent reading aid for interpreting physicians, to help identify findings on screening FFDM acquired with compatible mammography systems and assess their level of suspicion. Output of the device includes marks placed on findings on the mammogram and level of suspicion scores. The findings could be soft tissue lesions or calcifications. The level of suspicion score is expressed at the finding level, for each breast and overall for the mammogram. Patient management decisions should not be made solely on the basis of analysis by MammoScreen™.
MammoScreen is a software-only device for aiding interpreting physicians in identifying focal findings suspicious for breast cancer in screening FFDM (full-field digital mammography) acquired with compatible mammography systems. The product consists of a processing server and a web interface. The software applies algorithms for recognition of suspicious calcifications and soft tissue lesions. These algorithms have been trained on large databases of biopsy-proven examples of breast cancer, benign lesions, and normal tissue. MammoScreen automatically processes FFDM, and the output of the device can be used by radiologists concurrently with the reading of mammograms.
The user interface of MammoScreen has several functions:
a) Activation of computer-aided detection (CAD) marks to highlight locations, known as findings, where the device detected calcifications or soft tissue lesions suspicious for cancer.
b) Association of findings with a score, known as the MammoScreen Score, which characterizes findings on a 1-10 scale with increasing level of suspicion. Only the most suspicious findings (with a MammoScreen Score of 5 or greater) are initially marked to limit the number of findings to review. The user shall also review findings with a score of 4 or lower.
c) Indication, with matching markers, when the same finding is detected in multiple views of the FFDM.
MammoScreen is configured as a DICOM Web compliant node in a network and receives its input images from another DICOM node, called "the DICOM Web Server". The MammoScreen output is displayed on the screen of a personal computer compliant with requirements specified in the User Manual. The image analysis unit includes machine learning components trained to detect positive findings (calcifications and soft tissue lesions).
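The default-display rule described above (findings with a MammoScreen Score of 5 or more are marked initially, lower-scoring findings only on request) can be sketched as follows; the finding representation is a hypothetical illustration.

```python
# Sketch of the default-display rule described above: findings scoring 5 or more are
# marked initially, while findings scoring 4 or lower are shown only on user request.
# The (score, description) tuple representation is a hypothetical illustration.
DEFAULT_DISPLAY_THRESHOLD = 5


def marks_to_display(findings, show_low_scores=False):
    """findings: iterable of (score, description) tuples, scores on the 1-10 scale."""
    findings = list(findings)
    if show_low_scores:
        return findings
    return [f for f in findings if f[0] >= DEFAULT_DISPLAY_THRESHOLD]
```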
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:
Acceptance Criteria and Device Performance
The provided document defines acceptance criteria primarily through comparison with a predicate device and through the results of a clinical reader study. The core acceptance criterion for the clinical study appears to be an improvement in radiologist performance when using MammoScreen assistance compared to unaided reading.
Table of Acceptance Criteria and Reported Device Performance
Criterion Type | Specific Criterion | Reported Device Performance (MammoScreen) | Met? |
---|---|---|---|
Premarket Equivalence (vs. Predicate Device K181704 Transpara) | |||
Classification Regulation | 21 CFR 892.2090 | SAME | Yes |
Medical Device Class | Class II | SAME | Yes |
Product Code | QDQ | SAME | Yes |
Level of Concern | Moderate | SAME | Yes |
Intended Use | Concurrent reading aid for physicians interpreting screening FFDM to identify findings and assess their level of suspicion. | SAME | Yes |
Target patient population | Women undergoing FFDM screening mammography. | SAME | Yes |
Target user population | Physicians interpreting FFDM screening mammograms. | SAME | Yes |
Design | Software-only device. | SAME | Yes |
Scoring System | While not identical, the principle (level of suspicion from low to high) should be substantially equivalent. | 10-point scale vs. predicate's 1-100. Manufacturer claims interpretability benefits. Exam-level score provided. Deemed "substantially equivalent." | Yes |
Finding Discovery | Reducing the number of findings the user has to review. | Default display for scores ≥ 5, user request for scores ≤ 4. Deemed "equivalent." | Yes |
Performance Comparison | Overall performance gains should be comparable and not raise new safety/effectiveness questions. | AUC: unaided = 0.769, assisted = 0.798 (Difference: 0.028; P = 0.035). Predicate reported unaided = 0.866, assisted = 0.887. Deemed "still comparable." | Yes |
Fundamental Scientific Technology | Involves medical image processing and machine learning, particularly deep learning for suspicious findings. | SAME | Yes |
Clinical Performance (Reader Study) | |||
Radiologist Performance | Radiologist performance with MammoScreen assistance is superior to unaided performance (main objective). | Average AUC improved from 0.769 (unaided) to 0.798 (with MammoScreen) (Difference = 0.028; P = 0.035). | Yes |
Reading Time | Should not significantly increase. | Average reading time increased by 14% for scores > 4, but decreased by 2% for scores ≤ 4 in the second session. Overall, maximum increase did not exceed 15s. | Yes |
Standalone Performance | Non-inferior to average unaided radiologist performance. | Standalone AUC = 0.790; Non-inferior to average unaided radiologist AUC = 0.770 (absence of statistical effect (p>0.05) and lower CI of diff > -0.03). | Yes |
Sensitivity | Sensitivity of readers tended to increase with the use of MammoScreen without decreasing specificity (conclusion statement). | Reported overall performance improvement was statistically significant at breast (AUC) and lesion (pAUC) level, confirming trend. Specific values not explicitly in acceptance criteria here. | Yes |
Study Details for Device Acceptance
Sample Size Used for the Test Set and Data Provenance:
- Test Set Size: 240 mammographic screening images (cases).
- Data Provenance: Acquired at a US center. The text states "US FFDM acquired on Hologic® devices, and performance comparison with FFDM acquired on GE® devices," indicating images from at least two major mammography system manufacturers in the US.
- Retrospective/Prospective: Retrospective. The study "collected" images after they were acquired, and "For each exam, the cancer status has been verified... and used as gold standard."
Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts:
- The document does not explicitly state the number of experts used to establish the ground truth or their specific qualifications (e.g., years of experience). It only states that "the cancer status has been verified by either biopsy results (for all cancer positive cases and some of the negative cases) or an adequate follow-up (for negative cases only) and used as gold standard." This implies clinical data and follow-up was the primary ground truth, not consensus of a specific number of experts.
Adjudication Method for the Test Set:
- The document does not explicitly describe an adjudication method for establishing ground truth from multiple expert reads. Ground truth was established via biopsy or adequate follow-up, which are objective clinical outcomes, not subjective reader interpretations.
Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- Yes, an MRMC study was performed.
- Effect Size of Human Readers Improvement with AI vs. Without AI Assistance:
- Average AUC: Increased from 0.769 (unaided) to 0.798 (with MammoScreen assistance).
- Difference: 0.028 (P = 0.035), indicating a statistically significant improvement.
- The AUC was higher with MammoScreen aid for 11 of the 14 radiologists.
- Performance improvement was also statistically significant at the breast (in terms of AUC) and lesion (in terms of pAUC) level.
Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study:
- Yes, a standalone performance study was conducted.
- Standalone Performance: MammoScreen's standalone performance (AUC = 0.790) was found to be non-inferior to the average performance of unaided radiologists (AUC = 0.770). The lower confidence interval of the difference of AUC was equal to or superior to the effect size (-0.03), and the P-value was >0.05, confirming non-inferiority.
- Detailed standalone performance metrics were also provided for mammogram, breast, and finding levels (soft tissue lesions and calcifications), including ROC AUC, sensitivity, and specificity for Hologic, GE, and combined datasets.
Type of Ground Truth Used:
- Clinical Outcomes Data: The primary ground truth was established by:
- Biopsy results (for all cancer-positive cases and some negative cases).
- Adequate follow-up (for negative cases only).
Sample Size for the Training Set:
- The document states that the algorithms "have been trained on large databases of biopsy proven examples of breast cancer, benign lesions and normal tissue." However, it does not specify the exact sample size of the training set.
How the Ground Truth for the Training Set Was Established:
- The ground truth for the training set was established using "biopsy proven examples of breast cancer, benign lesions and normal tissue." This implies a similar methodology to the test set, relying on objective clinical outcomes (histopathology from biopsy) rather than expert consensus on images.