(130 days)
Genius AI Detection is a computer-aided detection and diagnosis (CADe/CADx) software device intended to be used with compatible digital breast tomosynthesis (DBT) systems to identify and mark regions of interest including soft tissue densities (masses, architectural distortions and asymmetries) and calcifications in DBT exams from compatible DBT systems and provide confidence scores that offer assessment for Certainty of Findings and a Case Score. The device intends to aid in the interpretation of digital breast tomosynthesis exams in a concurrent fashion, where the interpreting physician confirms or dismisses the findings during the reading of the exam.
Genius AI Detection is a software device intended to identify potential abnormalities in breast tomosynthesis images. Genius AI Detection analyzes each standard mammographic view in a digital breast tomosynthesis examination using deep learning networks. For each detected lesion, Genius AI Detection produces CAD results that include the location of the lesion, an outline of the lesion and a confidence score for that lesion. Genius AI Detection also produces a case score for the entire tomosynthesis exam.
Genius AI Detection packages all CAD findings derived from the corresponding analysis of a tomosynthesis exam into a DICOM Mammography CAD SR object and distributes it for display on DICOM compliant review workstations. The interpreting physician will have access to the CAD findings concurrently with the reading of the tomosynthesis exam. In addition, a combination of peripheral information such as the number of marks and case scores may be used on the review workstation to enhance the interpreting physician's workflow by offering a better organization of the patient worklist.
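As a rough illustration of the outputs described above, the following Python sketch models a per-lesion finding (location, outline, confidence score) and a per-exam case score, then orders a worklist by case score and number of marks. All class, field, and function names are hypothetical, the score scale is an assumption, and the structure is not the DICOM Mammography CAD SR schema or the vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CadFinding:
    """One detected lesion (hypothetical structure, not the DICOM SR schema)."""
    view: str                            # standard view label, e.g. "RCC" or "LMLO"
    location: Tuple[float, float, int]   # assumed (x, y, slice index)
    outline: List[Tuple[float, float]]   # assumed polygon vertices
    confidence: float                    # per-lesion confidence score (scale assumed)


@dataclass
class ExamResult:
    """CAD output for one tomosynthesis exam (hypothetical structure)."""
    exam_id: str
    case_score: float
    findings: List[CadFinding] = field(default_factory=list)

    @property
    def num_marks(self) -> int:
        # Number of CAD marks in the exam.
        return len(self.findings)


def order_worklist(exams: List[ExamResult]) -> List[ExamResult]:
    """Sort exams by case score, then by number of marks, highest first."""
    return sorted(exams, key=lambda e: (e.case_score, e.num_marks), reverse=True)
```

The sort key is one possible policy; a review workstation could equally filter or group exams by these values.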
Genius AI Detection 2.0 adds the CC-MLO Correlation feature. This feature correlates a suspected lesion in one view with a like finding in the other view and also provides a workflow and navigation aid for the interpreting physician.
The provided text describes the regulatory clearance of a medical device, "Genius AI Detection 2.0 with CC-MLO Correlation." While it mentions "acceptance criteria" through the lens of safety and effectiveness, it does not explicitly list quantitative acceptance criteria for the device's performance (e.g., a specific sensitivity or specificity threshold). Instead, it describes internal validation and a standalone evaluation study to demonstrate that the device is "safe and effective."
Here's a breakdown of the requested information based on the provided text:
1. A table of acceptance criteria and the reported device performance
As mentioned, explicit quantitative acceptance criteria are not provided in the document. The document states that the "verification testing showed that the software application satisfied the software requirements." For the standalone evaluation of the CC-MLO Correlation feature, the performance was "estimated in both groups by scoring the detection pairs against the truth pairs and by evaluating the expert radiologist's response, respectively." However, specific performance metrics (e.g., accuracy percentages, sensitivity, specificity for the CC-MLO correlation) are not reported in this summary.
2. Sample size used for the test set and the data provenance (e.g., country of origin of the data, retrospective or prospective)
- Test Set Sample Size:
- For the standalone evaluation of the CC-MLO Correlation feature, the dataset included:
- 106 biopsy-proven malignant cases.
- 561 screening negative cases.
- Additionally, the detection pairs generated by the CC-MLO correlation feature were reviewed on 658 screening negative and biopsied benign cases. (It's unclear if this "658 cases" is a subset or superset of the "561 screening negative cases" mentioned earlier, or an entirely separate review of negative/benign cases for correlation specifically.)
- Data Provenance: The document does not specify the country of origin of the data or whether it was retrospective or prospective.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g., radiologist with 10 years of experience)
- Number of Experts: An "expert radiologist" is mentioned in the singular ("an expert radiologist" and "the expert radiologist's response"). This suggests that one expert radiologist was primarily responsible for establishing ground truth for the malignant cases and reviewing detection pairs.
- Qualifications of Experts: The document specifies "expert radiologist" but does not provide details on their specific qualifications, such as years of experience.
4. Adjudication method (e.g., 2+1, 3+1, none) for the test set
- The text describes ground truth for malignant cases being established by "an expert radiologist by generating ground truth marks and truth pairs." For the CC-MLO correlation feature, generated detection pairs were "reviewed by an expert radiologist."
- This suggests a single-reader ground truth establishment and review without an explicit multi-reader adjudication method (like 2+1 or 3+1). It seems to be "none" in terms of multi-reader consensus for the test set ground truth.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done; if so, what was the effect size of how much human readers improve with AI vs. without AI assistance
- The document states, "Standalone evaluation testing was also conducted." It focuses on the performance of the algorithm itself and its ability to correlate findings.
- There is no mention of an MRMC comparative effectiveness study where human readers' performance with and without AI assistance was evaluated. Therefore, no effect size for human reader improvement is provided. The device "intends to aid in the interpretation... in a concurrent fashion, where the interpreting physician confirms or dismisses the findings," implying human-in-the-loop, but no study of this combined performance is detailed here.
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
- Yes, a standalone evaluation was done. The document explicitly states: "Standalone evaluation testing was also conducted." The performance of the CC-MLO Correlation feature was "estimated... by scoring the detection pairs against the truth pairs."
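The summary does not say how detection pairs were scored against truth pairs. The Python sketch below shows one plausible way to do it, under the assumption that a detection pair counts as correct when its CC mark falls inside the truth region on the CC view and its MLO mark falls inside the truth region of the same lesion on the MLO view. The bounding-box truth regions, the matching rule, and all names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    """Axis-aligned truth region (assumed representation of a truth mark)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1


class TruthPair(NamedTuple):
    cc: Box    # truth mark on the CC view
    mlo: Box   # truth mark for the same lesion on the MLO view


class DetectionPair(NamedTuple):
    cc: Point   # CAD mark centroid on the CC view
    mlo: Point  # correlated CAD mark centroid on the MLO view


def pair_matches(det: DetectionPair, truth: TruthPair) -> bool:
    """Assumed rule: both marks must hit the same truth pair."""
    return truth.cc.contains(det.cc) and truth.mlo.contains(det.mlo)


def score_pairs(detections: List[DetectionPair],
                truths: List[TruthPair]) -> Tuple[int, int]:
    """Return (truth pairs hit, detection pairs matching no truth pair)."""
    hit = sum(any(pair_matches(d, t) for d in detections) for t in truths)
    unmatched = sum(not any(pair_matches(d, t) for t in truths) for d in detections)
    return hit, unmatched
```

Under this rule, a detection pair that links a true lesion on one view to an unrelated location on the other view is counted as unmatched.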
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)
- For the malignant cases (106 cases): Ground truth was established by "an expert radiologist by generating ground truth marks and truth pairs." It also mentions these were "biopsy proven malignant cases," indicating pathology was also part of the ground truth for these malignant cases. The "truth pairs" were essentially expert annotations of pathologically confirmed lesions on both orthogonal views.
- For the screening negative cases (561 and 658 cases reviewed): The ground truth was presumably based on their screening negativity, validated by clinical follow-up or expert review. The review of detection pairs was against "the expert radiologist's response," implying expert judgment as ground truth for these negative/benign cases.
8. The sample size for the training set
- The sample size for the training set is not provided in this document. The description focuses solely on the "standalone evaluation of the CC-MLO Correlation feature" which used a specific test dataset.
9. How the ground truth for the training set was established
- Like the training set sample size, the method used to establish ground truth for the training set is not detailed in this document. It only mentions the use of "deep learning networks," which implies a trained model, but the specifics of its training data and ground truth establishment are absent from this summary.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
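Several of the diagnostic accuracy measures named in paragraphs (b)(1)(iii) and (b)(1)(iv) follow directly from case-level confusion counts. The Python sketch below computes sensitivity, specificity, positive and negative predictive values, and the positive and negative likelihood ratios using their standard definitions. The class and function names are hypothetical, and the counts in the usage example are made up for illustration; they are not results from the device summary.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diseased cases flagged by the device
    fp: int  # non-diseased cases flagged by the device
    tn: int  # non-diseased cases not flagged
    fn: int  # diseased cases not flagged


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den != 0 else math.nan


def accuracy_measures(c: ConfusionCounts) -> Dict[str, float]:
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "lr_positive": _ratio(sensitivity, 1 - specificity),
        "lr_negative": _ratio(1 - sensitivity, specificity),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in accuracy_measures(ConfusionCounts(tp=90, fp=20, tn=80, fn=10)).items():
        print(f"{name}: {value:.3f}")
```

Predictive values depend on disease prevalence in the test set, so values computed on an enriched dataset do not transfer directly to a screening population.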