K Number: K191994
Manufacturer:
Date Cleared: 2019-10-04 (70 days)
Product Code:
Regulation Number: 892.2090
Panel: RA (Radiology)
Reference & Predicate Devices:
Intended Use

ProFound™ AI V2.1 Software is a computer-assisted detection and diagnosis (CAD) software device intended to be used concurrently by interpreting physicians while reading digital breast tomosynthesis (DBT) exams from compatible DBT systems. The system detects soft tissue densities (masses, architectural distortions and asymmetries) and calcifications in the 3D DBT slices. The detections and Certainty of Finding and Case Scores assist interpreting physicians in identifying soft tissue densities and calcifications that may be confirmed or dismissed by the interpreting physician.

Device Description

ProFound AI V2.1 detects malignant soft-tissue densities and calcifications in digital breast tomosynthesis (DBT) images. ProFound AI V2.1 performs the same as ProFound AI V2.0 on the DBT systems cleared for use with that version, and adds support for additional DBT systems.

The ProFound AI V2.1 software allows a radiologist to quickly identify suspicious soft-tissue densities (masses, architectural distortions, and asymmetries) and calcifications by marking the detected areas in the tomosynthesis images. When ProFound AI V2.1 marks are displayed, they appear as overlays on the 3D tomosynthesis images. Depending on the functionality offered by the viewing/reading application, the marks may also serve as a navigation tool, because each mark can be linked to the tomosynthesis slice where the detection was identified.

Each detected region is assigned a Certainty of Finding score corresponding to the ProFound AI V2.1 algorithm's confidence that the detected region is malignant, and each case is assigned a Case Score corresponding to the algorithm's confidence that the case is malignant. Both scores are integers in the range 0 to 100; the higher the score, the more likely the detected region or case is to be malignant.
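To make the output structure concrete, here is a minimal Python sketch of the per-finding and per-case data the description implies. The class and field names are illustrative assumptions, not the vendor's actual interfaces:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Finding:
    """One CAD detection, per the description above (hypothetical layout)."""
    kind: str                       # "mass", "architectural distortion", "asymmetry", or "calcification"
    slice_index: int                # DBT slice where the detection was identified (navigation link)
    outline: List[Tuple[int, int]]  # overlay polygon in image coordinates
    certainty: int                  # Certainty of Finding, 0-100; higher = more likely malignant

@dataclass
class CaseResult:
    findings: List[Finding] = field(default_factory=list)
    case_score: int = 0             # 0-100 confidence that the case as a whole is malignant

def overlay_marks(result: CaseResult, min_certainty: int = 0):
    """Yield marks in slice order, optionally filtered by certainty, so a
    viewer can draw overlays and jump to the linked slice."""
    for f in sorted(result.findings, key=lambda f: f.slice_index):
        if f.certainty >= min_certainty:
            yield f.slice_index, f.kind, f.certainty, f.outline
```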

AI/ML Overview

Here’s a summary of the acceptance criteria and the study details for the ProFound™ AI Software V2.1, based on the provided FDA 510(k) summary.

1. Table of Acceptance Criteria and Reported Device Performance

The document states that "Case-Level Sensitivity, Lesion-Level Sensitivity, FP Rate in Non-Cancer Cases, and Specificity met design specifications" for both Siemens Standard and Empire Reconstruction datasets. However, the specific numerical acceptance criteria are not explicitly provided in the text. The document refers to "design specifications" and "the detailed results are in the User Manual," implying these numerical targets exist but are not included in the 510(k) summary provided.

For the comparison studies, the acceptance criterion was "the difference between the control group [Hologic] and the test group [Siemens Standard/Empire] is within the margin of non-inferiority for Sensitivity and AUC, and FPPI." The reported performance was that "Each of the three measures produced differences that were within the margin of non-inferiority." Again, specific numerical margins for non-inferiority are not detailed.

| Acceptance Criteria (not stated numerically; implied by design specifications) | Reported Device Performance (met criteria) |
|---|---|
| Standalone performance: | |
| Case-level sensitivity meets design specifications | Met design specifications (both Siemens Standard and Empire Reconstruction) |
| Lesion-level sensitivity meets design specifications | Met design specifications (both Siemens Standard and Empire Reconstruction) |
| FP rate in non-cancer cases meets design specifications | Met design specifications (both Siemens Standard and Empire Reconstruction) |
| Specificity meets design specifications | Met design specifications (both Siemens Standard and Empire Reconstruction) |
| Non-inferiority comparison (vs. Hologic): | |
| Difference in sensitivity (Siemens vs. Hologic) within non-inferiority margin | Within the margin of non-inferiority (both Siemens Standard and Empire Reconstruction) |
| Difference in FPPI (Siemens vs. Hologic) within non-inferiority margin | Within the margin of non-inferiority (both Siemens Standard and Empire Reconstruction) |
| Difference in AUC (Siemens vs. Hologic) within non-inferiority margin | Within the margin of non-inferiority (both Siemens Standard and Empire Reconstruction) |
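
The summary states the non-inferiority criterion without giving the margins. The sketch below shows how such a one-sided check on a difference in proportions (e.g., sensitivity) is commonly performed; the margin, confidence level, and counts are hypothetical assumptions, not values from the submission:

```python
import math

def noninferior(p_test, n_test, p_ctrl, n_ctrl, margin, z=1.645):
    """One-sided non-inferiority check for a difference in proportions
    (test minus control), using a normal approximation. For measures where
    higher is better (sensitivity, AUC), require the lower confidence bound
    of the difference to sit above -margin. For FPPI (lower is better),
    flip the comparison: require the upper bound to sit below +margin."""
    diff = p_test - p_ctrl
    se = math.sqrt(p_test * (1 - p_test) / n_test + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff - z * se > -margin

# Made-up example: 90% vs. 91% sensitivity with a 6-percentage-point margin
print(noninferior(0.90, 238, 0.91, 240, margin=0.06))  # True with these toy numbers
```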

2. Sample Size Used for the Test Set and Data Provenance

  • Siemens Standard Reconstruction Dataset:
    • Sample Size: 694 cases (238 cancer, 456 non-cancer)
    • Provenance: Not explicitly stated (e.g., country of origin). The study is described as a "screening population dataset," and the statement that a "stratified bootstrap procedure was used to estimate performance over a screening patient population" suggests the dataset is representative of a screening population (a sketch of such a procedure follows this list). Whether collection was retrospective or prospective is not explicitly stated, though "dataset consisted of" typically implies retrospective collection for testing.
  • Siemens Empire Reconstruction Dataset:
    • Sample Size: 322 cases (140 cancer, 182 non-cancer)
    • Provenance: Not explicitly stated. As with the Standard Reconstruction dataset, it is described as a "screening population dataset," and retrospective vs. prospective collection is not explicitly stated.
  • Hologic (Control Group for Comparison): The document references the "baseline performance of ProFound AI for DBT V2.0 with Hologic DBT images." While a control group is mentioned, the specific sample size for the Hologic dataset used in the comparison is not provided in this excerpt; its performance served only as the reference for non-inferiority.
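
The summary cites a stratified bootstrap over a screening patient population without procedural detail. The sketch below is one common form of that idea, resampling the cancer and non-cancer strata separately; the record layout and detection rates are invented, with only the stratum sizes (238/456) taken from the Standard Reconstruction dataset:

```python
import random

def stratified_bootstrap_ci(cancers, normals, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a case-level metric.
    Each replicate resamples the cancer and non-cancer strata separately,
    preserving the stratum sizes. `metric` maps (cancers, normals) to a number."""
    rng = random.Random(seed)
    stats = sorted(
        metric([rng.choice(cancers) for _ in cancers],
               [rng.choice(normals) for _ in normals])
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Toy data: True means CAD flagged the case; the rates here are invented
cancers = [True] * 210 + [False] * 28   # 238 cancer cases
normals = [False] * 400 + [True] * 56   # 456 non-cancer cases
sensitivity = lambda c, n: sum(c) / len(c)
print(stratified_bootstrap_ci(cancers, normals, sensitivity))
```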

3. Number of Experts Used to Establish Ground Truth and Qualifications

The document does not explicitly state the number of experts used or their qualifications for establishing ground truth for the test sets.

4. Adjudication Method for the Test Set

The document does not explicitly state the adjudication method used for the test sets (e.g., 2+1, 3+1).

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study (readers with vs. without AI assistance) is not described in this document. The studies presented are standalone performance evaluations of the AI system and non-inferiority comparisons of its performance across different DBT acquisition systems. The phrase "concurrently by interpreting physicians" in the indications for use implies a human-in-the-loop workflow, but an MRMC study quantifying reader improvement with AI is not detailed here.

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study

Yes, standalone (algorithm only without human-in-the-loop performance) studies were done.

  • The "ProFound AI for DBT V2.1 Siemens Standard Screening Population Dataset" study explicitly states: "Standalone testing was performed on tomosynthesis slices only."
  • Similarly, the "ProFound AI for DBT V2.1 Siemens Empire Screening Population Dataset" study states: "Standalone testing was performed on tomosynthesis slices only."
  • The comparison studies ("Standalone Hologic Comparison Test Results") also involve comparing "the standalone performance of ProFound AI for DBT V2.0 with Hologic DBT images to the performance of ProFound AI for DBT V2.1 with Siemens Standard/Empire Reconstruction DBT images."
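
For concreteness, here is a hypothetical sketch of how the case-level standalone measures named above could be computed from per-case results; the record layout is an assumption, not the actual test harness:

```python
def standalone_metrics(cases):
    """Case-level measures named in the summary, from per-case records.
    Hypothetical record layout: 'cancer' (bool), 'detected' (bool, any truthed
    lesion marked), 'fp_marks' (int, marks not matching a truthed lesion),
    'n_images' (int, DBT views in the case)."""
    cancers = [c for c in cases if c["cancer"]]
    normals = [c for c in cases if not c["cancer"]]
    sensitivity = sum(c["detected"] for c in cancers) / len(cancers)
    specificity = sum(c["fp_marks"] == 0 for c in normals) / len(normals)
    fppi = sum(c["fp_marks"] for c in normals) / sum(c["n_images"] for c in normals)
    return sensitivity, specificity, fppi

# Toy example: two cancer cases (one detected) and two normal 4-view cases
cases = [
    {"cancer": True,  "detected": True,  "fp_marks": 0, "n_images": 4},
    {"cancer": True,  "detected": False, "fp_marks": 1, "n_images": 4},
    {"cancer": False, "detected": False, "fp_marks": 0, "n_images": 4},
    {"cancer": False, "detected": False, "fp_marks": 2, "n_images": 4},
]
print(standalone_metrics(cases))  # (0.5, 0.5, 0.25)
```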

7. Type of Ground Truth Used

The type of ground truth used is not explicitly stated in this excerpt. However, in the context of screening population datasets for cancer detection, ground truth is typically established by:

  • Pathology (biopsy results) for positive cases.
  • Long-term follow-up (e.g., 1-2 years of negative imaging) for negative cases.
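
A minimal sketch of such a truthing rule, purely illustrative since the summary does not state how ground truth was established; field names and the follow-up threshold are assumptions:

```python
def ground_truth_label(case: dict) -> str:
    """Assign a case-level label: pathology-proven malignancy is positive,
    sufficiently long negative imaging follow-up is negative, and anything
    else is indeterminate (typically excluded from the test set)."""
    if case.get("biopsy_malignant"):
        return "cancer"
    if case.get("negative_followup_months", 0) >= 12:  # 1-2 years is typical
        return "non-cancer"
    return "indeterminate"

print(ground_truth_label({"biopsy_malignant": False, "negative_followup_months": 24}))
```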

8. Sample Size for the Training Set

The document does not specify the sample size used for the training set.

9. How the Ground Truth for the Training Set Was Established

The document does not specify how the ground truth for the training set was established. It only mentions that the "ProFound AI 2.1 algorithm uses deep learning technology to process feature computations and uses pattern recognition to identify suspicious breast lesions." This implies a training process based on labeled data, but details about the origin and establishment of those labels are not provided in this excerpt.

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.

(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
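
As an illustration only (not part of the regulation text), the diagnostic accuracy measures that paragraphs (b)(1)(iii)-(iv) name can all be derived from a 2x2 confusion table; a minimal Python sketch:

```python
def diagnostic_accuracy(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy measures listed in (b)(1)(iii)-(iv), from a 2x2 table.
    Assumes no cell that would cause division by zero."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "LR+": sens / (1 - spec),   # positive diagnostic likelihood ratio
        "LR-": (1 - sens) / spec,   # negative diagnostic likelihood ratio
    }

print(diagnostic_accuracy(tp=90, fp=40, tn=410, fn=10))
```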
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.