510(k) Data Aggregation

    K Number: K200905
    Device Name: HealthMammo
    Date Cleared: 2020-07-16 (101 days)
    Product Code:
    Regulation Number: 892.2080
    Reference & Predicate Devices:

    Intended Use

    The Zebra HealthMammo is a passive notification-only, parallel-workflow software tool used by MQSA-qualified interpreting physicians to prioritize patients with suspicious findings in the medical care environment. HealthMammo utilizes an artificial intelligence algorithm to analyze 2D FFDM screening mammograms and flags those that are suggestive of the presence of at least one suspicious finding at the exam-level. HealthMammo produces an exam-level output to a PACS/Workstation for flagging the suspicious case and allows worklist prioritization.

    MQSA-qualified interpreting physicians are responsible for reviewing each exam on a display approved for use in mammography according to the current standard of care. The HealthMammo device is limited to the categorization of exams, does not provide any diagnostic information beyond triage and prioritization, does not remove images from the interpreting physician's worklist, and should not be used in lieu of full patient evaluation or relied upon to make or confirm a diagnosis.

    The HealthMammo device is intended for use with complete 2D FFDM mammography exams acquired using validated FFDM systems only.

    Device Description

    Zebra's HealthMammo solution is a software product that automatically analyzes 2D FFDM screening mammograms and notifies the PACS/workstation of the presence of suspicious findings in the scan. This passive notification allows for worklist prioritization of the specific scan and assists clinicians in viewing prioritized scans before others. The device's aim is to aid in the prioritization and triage of radiological medical images only. It is a software tool for MQSA-qualified interpreting physicians reading mammograms and does not replace complete evaluation according to the standard of care.

    Zebra's HealthMammo device works in parallel to and in conjunction with the standard-of-care workflow. After a mammogram has been performed, a copy of the study is automatically retrieved and processed by the HealthMammo device. The device performs the analysis of the study and returns a notification about a suspected finding to the PACS/workstation, which flags it through the worklist interface; alternatively, the Zebra Worklist notifies the user through a desktop application. The clinician is then able to review the study earlier than in the standard-of-care workflow.

    The primary benefit of the product is the ability to reduce the time it takes to alert physicians to the presence of a suspicious finding. The software does not recommend treatment or provide a diagnosis. It is meant as a tool to assist in improved workload prioritization of suspicious cases. The final diagnosis is provided by a clinician after reviewing the scan itself.
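
    The prioritization described above boils down to reordering a reading worklist so that flagged exams surface first. The snippet below is a minimal sketch of that idea only, not Zebra's implementation; the class names, priority values, and accession numbers are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: flagged (suspicious) exams sort ahead of
# routine ones; ties fall back to arrival order (FIFO).
FLAGGED, ROUTINE = 0, 1

@dataclass(order=True)
class WorklistEntry:
    priority: int
    arrival_index: int
    accession_number: str = field(compare=False)

class Worklist:
    """Toy reading worklist: exams the algorithm flags are surfaced first."""

    def __init__(self) -> None:
        self._heap: list[WorklistEntry] = []
        self._counter = itertools.count()

    def add_exam(self, accession_number: str, flagged: bool) -> None:
        entry = WorklistEntry(
            priority=FLAGGED if flagged else ROUTINE,
            arrival_index=next(self._counter),
            accession_number=accession_number,
        )
        heapq.heappush(self._heap, entry)

    def next_exam(self) -> str:
        return heapq.heappop(self._heap).accession_number

# Two routine exams arrive, then one the algorithm marks as suspicious.
wl = Worklist()
wl.add_exam("ACC-001", flagged=False)
wl.add_exam("ACC-002", flagged=False)
wl.add_exam("ACC-003", flagged=True)   # exam-level "suspicious" notification
print(wl.next_exam())                  # -> ACC-003 is read first
```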

    The following modules compose the HealthMammo software (a simplified pipeline sketch follows the list):

    Data input and validation: Following retrieval of a study, the validation feature assesses the input data (i.e., age, modality, view) to ensure compatibility for processing by the algorithm.

    HealthMammo algorithm: Once a study has been validated, the algorithm analyzes the 2D FFDM screening mammogram for detection of suspected findings.

    IMA Integration feature: The results of a successful study analysis are provided to IMA, which then sends them to the PACS/workstation for prioritization.

    Error codes feature: If a study fails during data validation or during analysis by the algorithm, an error code is returned to the system.
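
    As a rough illustration of how these four modules fit together, the sketch below wires a validation step, an algorithm placeholder, an integration hand-off, and error codes into one processing function. All function names, error codes, and validation rules are assumptions made for illustration; the actual HealthMammo interfaces (IMA, DICOM handling, the proprietary model) are not described here in enough detail to reproduce.

```python
from typing import Optional

SUPPORTED_MODALITY = "MG"            # 2D FFDM mammography
REQUIRED_VIEWS = {"CC", "MLO"}       # standard screening views (assumed rule)

def validate_study(study: dict) -> Optional[str]:
    """Data input and validation: reject studies the algorithm cannot process."""
    if study.get("modality") != SUPPORTED_MODALITY:
        return "ERR_UNSUPPORTED_MODALITY"
    if not REQUIRED_VIEWS.issubset(study.get("views", set())):
        return "ERR_MISSING_VIEWS"
    if study.get("patient_age") is None:
        return "ERR_MISSING_AGE"
    return None                      # study is valid

def run_algorithm(study: dict) -> bool:
    """HealthMammo algorithm placeholder: exam-level suspicious / not-suspicious call."""
    raise NotImplementedError("proprietary model not described in the summary")

def process_study(study: dict) -> dict:
    """End-to-end flow: validate, analyze, and hand the result to the integration layer."""
    error = validate_study(study)
    if error is not None:
        return {"status": "error", "code": error}          # error codes feature
    try:
        suspicious = run_algorithm(study)
    except Exception:
        return {"status": "error", "code": "ERR_ANALYSIS_FAILED"}
    # IMA integration feature: forward the exam-level result so the
    # PACS/workstation can prioritize the worklist entry.
    return {"status": "ok", "suspicious": suspicious}
```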

    AI/ML Overview

    Here's an analysis of the acceptance criteria and the study demonstrating that the HealthMammo device meets them, based on the provided text:

    1. Table of Acceptance Criteria and Reported Device Performance

    The FDA document doesn't explicitly present a formal "acceptance criteria" table with distinct thresholds for each metric. However, it implicitly defines performance goals by comparing to a predicate device (CmTriage, K183285) and the Breast Cancer Surveillance Consortium (BCSC) study. The key performance metric highlighted for the algorithm's standalone performance is the Area Under the Receiver Operating Characteristic (ROC) curve (AUC), along with sensitivity and specificity at different operating points.

    Here's a table summarizing the reported performance values; the implicit acceptance criteria are performance comparable to the predicate and the BCSC study, with an AUC above 0.95 for effective triage. A sketch of how these metrics are derived from exam-level scores follows the table.

    Metric (Operating Point) | Acceptance Criteria (Implicit) | Reported Device Performance (HealthMammo)
    --- | --- | ---
    Area Under ROC Curve (AUC) | > 0.95 (for effective triage, comparable to predicate) | 0.9661 (95% CI: [0.9552, 0.9769])
    Sensitivity (Standard Mode) | Comparable to BCSC study/predicate | 89.89% (95% CI: [86.69%; 92.38%])
    Specificity (Standard Mode) | Comparable to BCSC study/predicate | 90.75% (95% CI: [87.51%; 93.21%])
    Sensitivity (High Sensitivity) | Comparable to BCSC study/predicate | 94.02% (95% CI: [91.39%; 95.89%])
    Specificity (High Sensitivity) | Comparable to BCSC study/predicate | 83.50% (95% CI: [79.55%; 86.82%])
    Sensitivity (High Specificity) | Comparable to BCSC study/predicate | 84.14% (95% CI: [80.41%; 87.27%])
    Specificity (High Specificity) | Comparable to BCSC study/predicate | 94.00% (95% CI: [91.23%; 95.94%])
    Average Processing Time | Comparable to predicate | 2.9 minutes
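
    The sketch below computes AUC and the sensitivity/specificity pair at three thresholds from exam-level scores, mirroring the idea of Standard, High Sensitivity, and High Specificity operating points. The labels and scores are synthetic and the thresholds arbitrary; only the case counts (435 positive, 400 negative) come from the submission, so the printed numbers will not match HealthMammo's.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_pos, n_neg = 435, 400                        # case counts from the test cohort
y_true = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
y_score = np.concatenate([rng.beta(8, 2, n_pos), rng.beta(2, 8, n_neg)])  # synthetic scores

auc = roc_auc_score(y_true, y_score)           # area under the ROC curve

def sens_spec(y_true, y_score, threshold):
    """Sensitivity/specificity at one operating point (score >= threshold -> flagged)."""
    flagged = y_score >= threshold
    sensitivity = np.sum(flagged & (y_true == 1)) / np.sum(y_true == 1)
    specificity = np.sum(~flagged & (y_true == 0)) / np.sum(y_true == 0)
    return sensitivity, specificity

# A lower threshold gives a "high sensitivity" mode, a higher one a
# "high specificity" mode; standard mode sits in between.
print(f"AUC = {auc:.4f}")
for name, thr in [("high sensitivity", 0.35), ("standard", 0.50), ("high specificity", 0.65)]:
    sens, spec = sens_spec(y_true, y_score, thr)
    print(f"{name:>16}: sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```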

    2. Sample Size Used for the Test Set and Data Provenance

    • Sample Size: 835 anonymized 2D FFDM screening mammograms.

    • Data Provenance: Retrospective cohort from the USA, UK, and Israel.

      • 435 cases positive with biopsy-confirmed cancers (this count is reused in the confidence-interval sketch after this list).
      • 400 cases negative for breast cancer (BIRADS 1 and BIRADS 2 with a two-year follow-up of a negative diagnosis).
      • The test set was constructed to address confounding factors such as Lesion Type, Breast Density, Age, and Histology Type to ensure consistency with the population undergoing breast cancer screening.
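
    The reported confidence intervals are consistent with binomial intervals on these per-case counts, although the submission does not state which interval method was actually used. As an assumption-labeled example, a Wilson score interval for a sensitivity of roughly 391/435 (≈ 89.89%) comes out close to the reported [86.69%; 92.38%]:

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# A sensitivity of 89.89% on the 435 biopsy-confirmed positives corresponds to
# roughly 391 correctly flagged exams (391 / 435 ≈ 0.8989).
low, high = wilson_ci(391, 435)
print(f"sensitivity ≈ {391 / 435:.2%}, 95% CI ≈ [{low:.2%}, {high:.2%}]")
```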

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

    The document does not explicitly state the number of experts or their qualifications used to establish the ground truth for the test set. It mentions "biopsy confirmed cancers" for positive cases and "two-year follow-up of a negative diagnosis" for negative cases, implying a medical gold standard rather than consensus reads.

    4. Adjudication Method for the Test Set

    The document does not describe an adjudication method for the test set, as the ground truth appears to be based on biopsy results and long-term follow-up rather than expert reader consensus that would typically require adjudication.

    5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and the Effect Size of Human Readers Improving with AI vs. Without AI Assistance

    No, an MRMC comparative effectiveness study involving human readers and AI assistance was not reported or described in this document. The study described is a standalone performance validation of the AI algorithm. The device is intended as a triage tool that operates in parallel to the standard workflow and does not remove cases from the radiologist's worklist.

    6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done

    Yes, a standalone performance study was done. The document states: "The stand-alone detection and triage accuracy was measured on this cohort versus the ground truth." All the reported performance metrics (AUC, sensitivity, specificity) pertain to the algorithm's performance alone.

    7. The Type of Ground Truth Used

    The ground truth used was a combination of:

    • Pathology Data: "biopsy confirmed cancers" for positive cases.
    • Outcomes Data: "BIRADS 1 and 2 normal cases with a two-year follow-up of a negative diagnosis" for negative cases. This represents a clinical outcome used as ground truth.

    8. The Sample Size for the Training Set

    The document does not specify the sample size for the training set. It only describes the test set and the performance validation on it.

    9. How the Ground Truth for the Training Set Was Established

    The document does not describe how the ground truth for the training set was established. It focuses solely on the validation test set.
