ProFound AI® V3.0 is a computer-assisted detection and diagnosis (CAD) software device intended to be used concurrently by interpreting physicians while reading digital breast tomosynthesis (DBT) exams from compatible DBT systems. The system detects soft tissue densities (masses, architectural distortions) and calcifications in the 3D DBT slices. The detections, together with the Certainty of Finding and Case Scores, assist interpreting physicians in identifying soft tissue densities and calcifications that may be confirmed or dismissed by the interpreting physician.
The ProFound AI® V3.0 device detects malignant soft-tissue densities and calcifications in digital breast tomosynthesis (DBT) images. The software allows an interpreting physician to quickly identify suspicious soft tissue densities and calcifications by marking the detected areas in the tomosynthesis images. When the ProFound AI V3.0 marks are displayed by a user, they appear as overlays on the tomosynthesis images. Each detected finding is also assigned a score, the Certainty of Finding, that corresponds to the ProFound AI V3.0 algorithm's confidence that the detected finding is a cancer. Certainty of Finding scores are percentages in the range of 0% to 100% indicating the CAD algorithm's confidence that the finding is malignant. ProFound AI V3.0 also assigns a score to each case (Case Score), a percentage in the range of 0% to 100% indicating the CAD algorithm's confidence that the case contains malignant findings. The higher the Certainty of Finding or Case Score, the higher the confidence that the detected finding is a cancer or that the case has malignant findings.
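As a concrete illustration of how per-finding and per-case scores like these could be represented, here is a minimal Python sketch. The data model, field names, and the rule that the Case Score tracks the most suspicious finding are illustrative assumptions only; the submission does not describe ProFound AI's internal scoring logic.

```python
# Illustrative data model for CAD findings and an assumed Case Score rule.
# Field names and the max() aggregation are assumptions, not vendor internals.
from dataclasses import dataclass

@dataclass
class Finding:
    """A single CAD detection marked in a DBT volume (hypothetical model)."""
    slice_index: int           # 3D slice on which the overlay is drawn
    x: int                     # overlay position in pixels
    y: int
    kind: str                  # e.g. "soft_tissue_density" or "calcification"
    certainty_of_finding: int  # 0-100 (%): confidence the finding is malignant

def case_score(findings: list[Finding]) -> int:
    """Assumed rule: the case is as suspicious as its most suspicious
    finding. ProFound AI's actual aggregation is not documented here."""
    return max((f.certainty_of_finding for f in findings), default=0)

exam = [
    Finding(slice_index=23, x=512, y=301, kind="soft_tissue_density",
            certainty_of_finding=71),
    Finding(slice_index=40, x=220, y=655, kind="calcification",
            certainty_of_finding=34),
]
print(case_score(exam))  # -> 71
```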
The provided text describes specific acceptance criteria and the study conducted to demonstrate that ProFound AI® Software V3.0 meets these criteria.
1. Table of Acceptance Criteria and Reported Device Performance
The document states that the "Indications for Use" remain unchanged from the Predicate UNMODIFIED Device ProFound AI V2.1, and that the "technological characteristics of Modified Device, ProFound AI V3.0 remain unchanged from Unmodified Device ProFound AI V2.1 as the predicate." The key improvement for V3.0 is "software improvements leading to improved specificity for GE and Hologic modalities."
While specific numerical acceptance criteria (e.g., minimum sensitivity or minimum specificity) are not stated as explicit target thresholds in a table, performance is assessed relative to the predicate device (ProFound AI V2.0/V2.1). The primary performance improvement demonstrated is in specificity.
| Acceptance Criterion (implicit, based on predicate equivalence) | Reported Device Performance (ProFound AI V3.0) |
| --- | --- |
| Non-inferiority in case sensitivity vs. ProFound AI V2.0/V2.1 | Hologic DBT: Non-inferiority of the standalone performance of V3.0 on a Hologic DBT screening population, compared to the baseline performance of V2 on a Hologic DBT screening population, was concluded in terms of case sensitivity, FP rate per 3D volume, and AUC; claims established in the original Reader Study (K182373) apply to V3.0 with Hologic DBT. GE DBT: Non-inferiority of the standalone performance of V3.0 on a GE DBT screening population, compared to the baseline performance of V2 on a Hologic DBT screening population, was concluded for the same endpoints; claims established in the original Reader Study (K182373) apply to V3.0 with GE DBT. |
| Improved specificity for GE and Hologic modalities | Paired comparisons demonstrated a significant increase in specificity from V2.0 to V3.0 for both Hologic DBT and GE DBT (see the sketch following the table). |
| Retention of original Indications for Use | Unchanged from ProFound AI V2.1. |
| No new questions of safety and effectiveness | "These changes do not raise different questions of safety and effectiveness." |
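For intuition about the specificity rows above: a paired comparison of two algorithm versions on the same non-cancer cases is commonly analyzed with a test for paired binary outcomes such as McNemar's test. The sketch below uses simulated data and assumed specificity values; the submission does not report which statistical test, sample sizes, or operating points were actually used.

```python
# Hypothetical paired specificity comparison on the same non-cancer cases,
# analyzed with an exact McNemar test. All numbers are simulated.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
n = 1000                                          # hypothetical non-cancer cases
v2_correct = rng.random(n) < 0.62                 # V2.0 leaves no FP mark (~62%)
flip_up = (~v2_correct) & (rng.random(n) < 0.25)  # cases V3.0 newly clears
flip_down = v2_correct & (rng.random(n) < 0.03)   # cases V3.0 newly flags
v3_correct = (v2_correct | flip_up) & ~flip_down

# Only discordant pairs carry information in a paired design.
b = int(np.sum(v2_correct & ~v3_correct))         # correct under V2.0 only
c = int(np.sum(~v2_correct & v3_correct))         # correct under V3.0 only

# Exact McNemar: under H0 (no change), discordant pairs split 50/50.
p = binomtest(min(b, c), n=b + c, p=0.5).pvalue
print(f"specificity: V2.0={v2_correct.mean():.3f}, V3.0={v3_correct.mean():.3f}")
print(f"discordant pairs: b={b}, c={c}, two-sided exact McNemar p={p:.3g}")
```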
2. Sample Size Used for the Test Set and Data Provenance
The document refers to a "ProFound AI V2 Pivotal Reader Study Clinical Study Report (CSR) (K182373)". This reader study, performed for the predicate device, is the basis for the claims applicable to V3.0 regarding non-inferiority in sensitivity.
For the Hologic DBT Non-clinical Validation Testing and GE DBT Non-clinical Validation Testing for V3.0 itself, the document states, "A paired comparison assessed the performance of ProFound AI V3.0 on [Hologic/GE] DBT images to the performance of ProFound AI V2.0 on the same set of [Hologic/GE] DBT images," implying that each specificity comparison ran both V2.0 and V3.0 on a single fixed set of images from the respective modality (Hologic or GE).
- Sample Size: The exact number of cases or images in the test set specifically for the V3.0 validation studies (Hologic and GE paired comparisons) is not explicitly stated in the provided text. The non-inferiority claims rely on the original K182373 study, but its sample size is also not detailed here.
- Data Provenance: The document does not specify the country of origin of the data. Since the device is U.S. FDA cleared, the data are plausibly from the US, but this is not confirmed. The studies are described as "Non-clinical Validation Testing" and "Supplemental Standalone Study," suggesting they were retrospective.
3. Number of Experts Used to Establish Ground Truth and Qualifications
The provided text does not explicitly state the number of experts used to establish the ground truth or their specific qualifications for the test sets. It references the original Reader Study described in document 0074-6003, "PowerLook® Tomo Detection V2 Pivotal Reader Study Clinical Study Report (CSR)" (K182373), which would have involved radiologists, but details are not provided here.
4. Adjudication Method for the Test Set
The adjudication method for establishing ground truth for the test sets is not explicitly stated in the provided text.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? The document states that the claims established in the original Reader Study (K182373) for the predicate device (V2) apply to ProFound AI V3.0; that original study was likely an MRMC study supporting the human-in-the-loop performance of V2. For V3.0 itself, validation focused on standalone comparisons between V2.0 and V3.0 to demonstrate non-inferiority in sensitivity and improvement in specificity; no new human reader study was conducted specifically for V3.0.
- Effect size of human readers improving with AI vs. without AI assistance: Not provided for V3.0, since the validation focused on the algorithm's standalone performance and its non-inferiority/specificity improvement over its predecessor (a simplified illustration follows this list). The predicate device's MRMC study (K182373) would contain this information for V2.
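For context on what such an effect size would look like, the sketch below is a deliberately simplified stand-in for an MRMC analysis: it estimates the reader-averaged AUC difference (aided minus unaided) with a case-level bootstrap on simulated data. Real MRMC studies use dedicated methods (e.g., Obuchowski-Rockette or Dorfman-Berbaum-Metz) that also account for reader variability.

```python
# Simplified stand-in for an MRMC effect-size estimate: reader-averaged AUC
# difference (aided minus unaided) with a case-level bootstrap CI.
# Simulated data; real studies use Obuchowski-Rockette / DBM methodology.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_readers, n_cases = 8, 200
y = rng.random(n_cases) < 0.5                  # case truth (True = cancer)
signal = np.where(y, 1.2, 0.0)                 # latent suspicion of true cancers
unaided = signal + rng.normal(0, 1.0, (n_readers, n_cases))
aided = signal * 1.3 + rng.normal(0, 1.0, (n_readers, n_cases))

def reader_avg_auc(scores: np.ndarray) -> float:
    return float(np.mean([roc_auc_score(y, s) for s in scores]))

effect = reader_avg_auc(aided) - reader_avg_auc(unaided)

deltas = []
for _ in range(2000):                          # bootstrap over cases only
    idx = rng.integers(0, n_cases, n_cases)
    if y[idx].all() or not y[idx].any():       # need both classes in resample
        continue
    deltas.append(
        np.mean([roc_auc_score(y[idx], s[idx]) for s in aided])
        - np.mean([roc_auc_score(y[idx], s[idx]) for s in unaided]))
lo, hi = np.percentile(deltas, [2.5, 97.5])
print(f"reader-averaged delta AUC = {effect:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```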
6. Standalone (i.e., algorithm only without human-in-the-loop performance) Study
- Yes, standalone studies were done. The document explicitly refers to "ProFound AI V3.0 Hologic supplemental Standalone Study" and "ProFound AI V3.0 GE Supplemental Standalone Study." These studies compared the performance of V3.0 with V2.0/V2.1 on the same image sets.
- The performance metrics assessed in these standalone studies included (see the sketch after this list):
  - Case sensitivity
  - False positive (FP) rate per 3D volume
  - Area under the localized receiver operating characteristic (ROC) curve (AUC)
  - Specificity (which was shown to have a significant increase)
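To make these metrics concrete, here is a hedged sketch of how case sensitivity, FP rate per 3D volume, and a case-level AUC could be computed from per-case outputs. The record layout and numbers are hypothetical, and the localized ROC analysis used in the actual studies (which credits a detection only if it hits the lesion) is simplified here to a case-level AUC on the Case Score.

```python
# Illustrative computation of the standalone metrics named above.
# Inputs are hypothetical per-case records, not ProFound AI output formats.
import numpy as np
from sklearn.metrics import roc_auc_score

cases = [
    # (is_cancer, case_score 0-100, lesion_localized, n_false_positive_marks)
    (True, 88, True, 0),
    (True, 45, False, 1),   # cancer present but lesion missed
    (False, 12, False, 0),
    (False, 63, False, 2),
    (False, 7, False, 0),
]
truth = np.array([c[0] for c in cases])
score = np.array([c[1] for c in cases])
hit = np.array([c[2] for c in cases])
fp_marks = np.array([c[3] for c in cases])

# Case sensitivity: fraction of cancer cases whose lesion was detected.
case_sensitivity = hit[truth].mean()

# FP rate per 3D volume: average false-positive marks on non-cancer volumes
# (one volume per case here for simplicity; real exams include several views).
fp_rate = fp_marks[~truth].mean()

# Case-level AUC from the Case Score (a simplification of localized ROC).
auc = roc_auc_score(truth, score)

print(f"sensitivity={case_sensitivity:.2f}, FP/volume={fp_rate:.2f}, AUC={auc:.2f}")
```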
7. Type of Ground Truth Used
The type of ground truth used is not explicitly detailed in the provided text. However, for breast cancer detection, ground truth for such studies typically involves:
- Expert Consensus: Multiple radiologists reviewing cases and reaching agreement.
- Pathology: Biopsy-proven presence or absence of malignancy.
- Follow-up Outcomes Data: Clinical follow-up over time to confirm benign or malignant status.
Given that the device detects "malignant soft-tissue densities and calcifications," it is highly likely that pathology (biopsy results) and/or expert radiologist consensus with follow-up were used to establish definitive ground truth regarding the presence and nature of cancers.
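As an illustration of how such a composite truth standard is often operationalized, the sketch below resolves a per-case label using the common hierarchy pathology > imaging follow-up > expert consensus. The field names, the two-year stability rule, and the fallback order are conventions assumed for illustration, not details taken from the submission.

```python
# Illustrative ground-truth resolution for one case, following the common
# hierarchy: pathology > follow-up outcome > expert consensus.
from typing import Optional

def resolve_ground_truth(pathology: Optional[str],
                         followup_stable_years: Optional[float],
                         consensus_malignant: Optional[bool]) -> Optional[bool]:
    """Return True (malignant), False (benign/normal), or None (indeterminate)."""
    if pathology is not None:                 # biopsy-proven result wins
        return pathology == "malignant"
    if followup_stable_years is not None and followup_stable_years >= 2.0:
        return False                          # assumed 2-year stability => benign
    if consensus_malignant is not None:       # fall back to expert consensus
        return consensus_malignant
    return None                               # exclude from the test set

print(resolve_ground_truth("malignant", None, None))   # True
print(resolve_ground_truth(None, 2.5, None))           # False
print(resolve_ground_truth(None, None, None))          # None
```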
8. Sample Size for the Training Set
The document does not specify the sample size used for the training set of ProFound AI V3.0. It mentions that V3.0 uses "deep learning technology to process feature computations and uses pattern recognition to identify suspicious breast lesions," which implies a training phase, but details about the training data are absent.
9. How the Ground Truth for the Training Set Was Established
The document does not specify how the ground truth for the training set was established. Similar to the test set, it would likely involve expert annotations, pathology, and/or follow-up data.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a)
Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b)
Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (
e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (
e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
e.g., poor image quality or for certain subpopulations), as applicable.(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.