K Number
K200905
Device Name
HealthMammo
Date Cleared
2020-07-16

(101 days)

Product Code
Regulation Number
892.2080
Panel
RA
Reference & Predicate Devices
CmTriage (K183285)
Intended Use

The Zebra HealthMammo is a passive notification-only, parallel-workflow software tool used by MQSA-qualified interpreting physicians to prioritize patients with suspicious findings in the medical care environment. HealthMammo utilizes an artificial intelligence algorithm to analyze 2D FFDM screening mammograms and flags those that are suggestive of the presence of at least one suspicious finding at the exam-level. HealthMammo produces an exam-level output to a PACS/Workstation for flagging the suspicious case and allows worklist prioritization.

MQSA-qualified interpreting physicians are responsible for reviewing each exam on a display approved for use in mammography according to the current standard of care. The HealthMammo device is limited to the categorization of exams; it does not provide any diagnostic information beyond triage and prioritization, does not remove images from the interpreting physician's worklist, and should not be used in lieu of full patient evaluation or relied upon to make or confirm a diagnosis.

The HealthMammo device is intended for use with complete 2D FFDM mammography exams acquired using validated FFDM systems only.

Device Description

Zebra's HealthMammo solution is a software product that automatically analyzes 2D FFDM screening mammograms and notifies the PACS/workstation of the presence of suspicious findings in the scan. This passive notification allows for worklist prioritization of the specific scan and assists clinicians in viewing prioritized scans before others. The device's aim is to aid in prioritization and triage of radiological medical images only. It is a software tool for MQSA interpreting physicians reading mammograms and does not replace complete evaluation according to the standard of care.

Zebra's HealthMammo device works in parallel to, and in conjunction with, the standard-of-care workflow. After a mammogram has been performed, a copy of the study is automatically retrieved and processed by the HealthMammo device. The device analyzes the study and returns a notification about the suspected finding to the PACS/workstation, which flags it through the worklist interface; alternatively, the Zebra Worklist notifies the user through a desktop application. The clinician is then able to review the study earlier than in the standard-of-care workflow.

The primary benefit of the product is the ability to reduce the time it takes to alert physicians to the presence of a suspicious finding. The software does not recommend treatment or provide a diagnosis. It is meant as a tool to assist in improved workload prioritization of suspicious cases. The final diagnosis is provided by a clinician after reviewing the scan itself.
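
To make the prioritization mechanism concrete, here is a minimal sketch of the worklist as a priority queue, assuming a simple in-memory model; the Worklist class, the priority levels, and the exam IDs are hypothetical illustrations, not Zebra's or any PACS vendor's actual interface.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: flagged exams sort ahead of routine ones.
FLAGGED, ROUTINE = 0, 1

@dataclass(order=True)
class WorklistEntry:
    priority: int                       # FLAGGED or ROUTINE
    arrival: int                        # tiebreaker: preserve FIFO order
    exam_id: str = field(compare=False)

class Worklist:
    """Reorders exams for review; triage never removes anything from it."""
    def __init__(self) -> None:
        self._heap: list[WorklistEntry] = []
        self._counter = itertools.count()

    def add_exam(self, exam_id: str, flagged: bool = False) -> None:
        priority = FLAGGED if flagged else ROUTINE
        heapq.heappush(self._heap,
                       WorklistEntry(priority, next(self._counter), exam_id))

    def next_exam(self) -> str:
        return heapq.heappop(self._heap).exam_id

# Three exams arrive in order; the second is flagged by the algorithm,
# so it is presented first, while the others keep their arrival order.
wl = Worklist()
wl.add_exam("exam-001")
wl.add_exam("exam-002", flagged=True)
wl.add_exam("exam-003")
print([wl.next_exam() for _ in range(3)])  # ['exam-002', 'exam-001', 'exam-003']
```

A real deployment would do this reordering inside the PACS worklist rather than in application code, but the invariant is the same: flagging only changes review order, never worklist membership.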

The following modules compose the HealthMammo software (a sketch of how they fit together follows the list):

Data input and validation: Following retrieval of a study, the validation feature assesses the input data (i.e., age, modality, view) to ensure compatibility for processing by the algorithm.

HealthMammo algorithm: Once a study has been validated, the algorithm analyzes the 2D FFDM screening mammogram for detection of suspected findings.

IMA Integration feature: The results of a successful study analysis are provided to IMA, which then sends them to the PACS/workstation for prioritization.

Error codes feature: If a study fails during data validation or during analysis by the algorithm, an error code is returned to the system.
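
Putting the four modules together, a minimal sketch of the flow might look like the following; the validation rules, error codes, and the analyze/notify interfaces are illustrative assumptions, not Zebra's actual API.

```python
from typing import Callable, Optional

SUPPORTED_VIEWS = {"CC", "MLO"}  # standard 2D FFDM screening views (assumption)
MIN_AGE = 18                     # assumed lower bound for a screening exam

def validate_study(study: dict) -> Optional[str]:
    """Data input and validation: return an error code, or None if valid."""
    if study.get("modality") != "MG":
        return "ERR_MODALITY"
    if not SUPPORTED_VIEWS.issuperset(study.get("views", [])):
        return "ERR_VIEW"
    if study.get("age", 0) < MIN_AGE:
        return "ERR_AGE"
    return None

def process_study(study: dict,
                  analyze: Callable[[dict], bool],
                  notify: Callable[[dict], None]) -> None:
    """Validate, run the algorithm, and route the result (or an error) to IMA."""
    error = validate_study(study)
    if error is None:
        try:
            suspicious = analyze(study)        # HealthMammo algorithm module
        except Exception:
            error = "ERR_ANALYSIS"             # error codes feature
        else:
            # IMA integration feature: forward the exam-level flag to PACS.
            notify({"exam": study["id"], "flag": suspicious})
            return
    notify({"exam": study["id"], "error": error})
```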

AI/ML Overview

Here's an analysis of the acceptance criteria and study proving the HealthMammo device meets them, based on the provided text:

1. Table of Acceptance Criteria and Reported Device Performance

The FDA document doesn't explicitly present a formal "acceptance criteria" table with distinct thresholds for each metric. However, it implicitly defines performance goals by comparing to a predicate device (CmTriage, K183285) and the Breast Cancer Surveillance Consortium (BCSC) study. The key performance metric highlighted for the algorithm's standalone performance is the Area Under the Receiver Operating Characteristic (ROC) curve (AUC), along with sensitivity and specificity at different operating points.

Here's a table summarizing the reported performance values; the implicit acceptance criteria are performance comparable to the predicate and the BCSC study, and an AUC greater than 0.95 for effective triage. (A sketch showing how the operating points map to thresholds on a ROC curve follows the table.)

| Metric (Operating Point) | Acceptance Criteria (Implicit) | Reported Device Performance (HealthMammo) |
| --- | --- | --- |
| Area Under ROC Curve (AUC) | > 0.95 (for effective triage, comparable to predicate) | 0.9661 (95% CI: [0.9552, 0.9769]) |
| Sensitivity (Standard Mode) | Comparable to BCSC study/predicate | 89.89% (95% CI: [86.69%, 92.38%]) |
| Specificity (Standard Mode) | Comparable to BCSC study/predicate | 90.75% (95% CI: [87.51%, 93.21%]) |
| Sensitivity (High Sensitivity Mode) | Comparable to BCSC study/predicate | 94.02% (95% CI: [91.39%, 95.89%]) |
| Specificity (High Sensitivity Mode) | Comparable to BCSC study/predicate | 83.50% (95% CI: [79.55%, 86.82%]) |
| Sensitivity (High Specificity Mode) | Comparable to BCSC study/predicate | 84.14% (95% CI: [80.41%, 87.27%]) |
| Specificity (High Specificity Mode) | Comparable to BCSC study/predicate | 94.00% (95% CI: [91.23%, 95.94%]) |
| Average Processing Time | Comparable to predicate | 2.9 minutes |
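
The three operating modes in the table are different thresholds on the same underlying ROC curve. The scikit-learn sketch below illustrates this with synthetic scores standing in for the actual 835-exam cohort (435 positives, 400 negatives); the score distributions and the rule of picking thresholds by target sensitivity are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Synthetic stand-in for the test cohort: 435 positives, 400 negatives,
# with score distributions chosen so the AUC lands near the reported ~0.97.
y_true = np.concatenate([np.ones(435), np.zeros(400)])
scores = np.concatenate([rng.normal(2.6, 1.0, 435), rng.normal(0.0, 1.0, 400)])

print(f"AUC = {roc_auc_score(y_true, scores):.4f}")

fpr, tpr, thresholds = roc_curve(y_true, scores)

def operating_point(target_sensitivity: float):
    """Return the first ROC point whose sensitivity reaches the target."""
    i = int(np.argmax(tpr >= target_sensitivity))
    return thresholds[i], tpr[i], 1.0 - fpr[i]

for mode, target in [("High Specificity", 0.84),
                     ("Standard", 0.90),
                     ("High Sensitivity", 0.94)]:
    thr, sens, spec = operating_point(target)
    print(f"{mode:>16}: threshold={thr:.2f}  "
          f"sensitivity={sens:.3f}  specificity={spec:.3f}")
```

Raising the target sensitivity moves the threshold down and costs specificity, which is exactly the trade-off visible across the table's three modes.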

2. Sample Size Used for the Test Set and Data Provenance

  • Sample Size: 835 anonymized 2D FFDM screening mammograms.

  • Data Provenance: Retrospective cohort from the USA, UK, and Israel.

    • 435 positive cases with biopsy-confirmed cancers.
    • 400 negative cases (BI-RADS 1 and BI-RADS 2, with a two-year follow-up confirming a negative diagnosis).
    • The test set was constructed to address confounding factors such as lesion type, breast density, age, and histology type, ensuring consistency with the population undergoing breast cancer screening. (A sketch reproducing the reported confidence intervals from these case counts follows.)
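
The document does not state how the confidence intervals were computed, but the reported bounds are consistent with Wilson score intervals given the stated case counts. The sketch below reproduces the standard-mode CIs with statsmodels; the counts 391/435 and 363/400 are back-calculated from the reported percentages and are therefore an assumption.

```python
from statsmodels.stats.proportion import proportion_confint

# Standard-mode sensitivity: 89.89% of 435 positives ~= 391 detected cancers.
lo, hi = proportion_confint(count=391, nobs=435, alpha=0.05, method="wilson")
print(f"sensitivity 95% CI: [{lo:.2%}, {hi:.2%}]")   # -> [86.69%, 92.38%]

# Standard-mode specificity: 90.75% of 400 negatives ~= 363 correctly cleared.
lo, hi = proportion_confint(count=363, nobs=400, alpha=0.05, method="wilson")
print(f"specificity 95% CI: [{lo:.2%}, {hi:.2%}]")   # -> [87.51%, 93.21%]
```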

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

The document does not explicitly state the number of experts or their qualifications used to establish the ground truth for the test set. It mentions "biopsy confirmed cancers" for positive cases and "two-year follow-up of a negative diagnosis" for negative cases, implying a medical gold standard rather than consensus reads.

4. Adjudication Method for the Test Set

The document does not describe an adjudication method for the test set, as the ground truth appears to be based on biopsy results and long-term follow-up rather than expert reader consensus that would typically require adjudication.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and the Effect Size of Human Readers Improving with AI vs. Without AI Assistance

No, an MRMC comparative effectiveness study involving human readers and AI assistance was not reported or described in this document. The study described is a standalone performance validation of the AI algorithm. The device is intended as a triage tool that operates in parallel to the standard workflow and does not remove cases from the radiologist's worklist.

6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done

Yes, a standalone performance study was done. The document states: "The stand-alone detection and triage accuracy was measured on this cohort versus the ground truth." All the reported performance metrics (AUC, sensitivity, specificity) pertain to the algorithm's performance alone.

7. The Type of Ground Truth Used

The ground truth used was a combination of the following (an illustrative labeling sketch follows the list):

  • Pathology/Outcomes Data: "biopsy confirmed cancers" for positive cases.
  • Outcomes Data: "BIRADS 1 and 2 normal cases with a two-year follow-up of a negative diagnosis" for negative cases. This represents a clinical outcome used as ground truth.
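
To make the labeling rule concrete, here is an illustrative sketch; the record fields (biopsy_confirmed_cancer, birads, negative_followup_years) are hypothetical stand-ins, not the actual study database schema.

```python
from typing import Optional

def ground_truth_label(exam: dict) -> Optional[str]:
    """Return 'positive', 'negative', or None (excluded from the test set)."""
    if exam.get("biopsy_confirmed_cancer"):
        return "positive"
    if exam.get("birads") in (1, 2) and exam.get("negative_followup_years", 0) >= 2:
        return "negative"
    return None  # neither criterion met -> not eligible for this cohort

assert ground_truth_label({"biopsy_confirmed_cancer": True}) == "positive"
assert ground_truth_label({"birads": 2, "negative_followup_years": 2}) == "negative"
assert ground_truth_label({"birads": 3}) is None
```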

8. The Sample Size for the Training Set

The document does not specify the sample size for the training set. It only describes the test set and the performance validation on it.

9. How the Ground Truth for the Training Set Was Established

The document does not describe how the ground truth for the training set was established. It focuses solely on the validation test set.

§ 892.2080 Radiological computer aided triage and notification software.

(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.