The ScreenPoint Transpara™ system is intended for use as a concurrent reading aid for physicians interpreting screening mammograms, to identify regions suspicious for breast cancer and assess their likelihood of malignancy. Output of the device includes marks placed on suspicious soft tissue lesions and suspicious calcifications; region-based scores, displayed upon the physician's query, indicating the likelihood that cancer is present in specific regions; and an overall score indicating the likelihood that cancer is present on the mammogram. Patient management decisions should not be made solely on the basis of analysis by Transpara™.
Transpara™ is a software-only device for aiding radiologists in the detection and diagnosis of breast cancer in mammograms. The product consists of a processing server and an optional viewer. The software applies algorithms for the recognition of suspicious calcifications and soft tissue lesions, which are trained with large databases of biopsy-proven examples of breast cancer, benign lesions and normal tissue. Processing results of Transpara™ can be transmitted to external destinations, such as medical imaging workstations or archives, using the DICOM mammography CAD SR protocol. This allows PACS workstations to implement the Transpara™ interface in mammography reading applications.
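As a hedged illustration of this transmission path, the sketch below (assuming the pydicom library and a hypothetical file name; it is not ScreenPoint's implementation) walks the content tree of a Mammography CAD SR object and prints each content item:

```python
# Minimal sketch, assuming pydicom is installed; the file name is hypothetical.
# Walks the content tree of a DICOM Mammography CAD SR object and prints each
# item's value type, concept name, and any numeric measurement.
import pydicom

MAMMO_CAD_SR = "1.2.840.10008.5.1.4.1.1.88.50"  # Mammography CAD SR Storage UID

def walk(items, depth=0):
    for item in items:
        name = ""
        if "ConceptNameCodeSequence" in item:
            name = item.ConceptNameCodeSequence[0].CodeMeaning
        value = ""
        if item.ValueType == "NUM" and "MeasuredValueSequence" in item:
            value = item.MeasuredValueSequence[0].NumericValue
        print("  " * depth + f"{item.ValueType}: {name} {value}")
        if "ContentSequence" in item:
            walk(item.ContentSequence, depth + 1)

ds = pydicom.dcmread("transpara_cad.sr.dcm")  # hypothetical output file
if ds.SOPClassUID == MAMMO_CAD_SR:
    walk(ds.ContentSequence)
```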
Transpara™ automatically processes mammograms, and the output of the device can be used by radiologists concurrently with the reading of mammograms. The user interface of Transpara™ provides the following functions:
a) Activation of computer-aided detection (CAD) marks to highlight locations where the device detected suspicious calcifications or soft tissue lesions. Only the most suspicious soft tissue lesions are marked, to achieve a very low false-positive rate.
b) Regions can be queried with a pointer for interactive decision support. When the queried location corresponds to a Transpara™ finding, the suspiciousness level of the region, computed by the device's algorithms, is displayed. When Transpara™ has identified a corresponding region in another view of the same breast, that region is also displayed to minimize the interactions required from the user (see the sketch after this list).
c) Display of the exam-based Transpara™ Score, which categorizes exams on a scale of 1-10 with increasing likelihood of cancer.
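The query interaction in function b) can be modeled as a hit test against detected regions. The sketch below is illustrative only (all names and the 0-100 score scale are hypothetical, not the vendor's API):

```python
# Illustrative sketch: match a queried pointer location against detected
# regions and return the region score plus any linked region in the other
# view of the same breast. All structures here are invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Region:
    view: str                          # e.g. "LCC" or "LMLO"
    x: float
    y: float
    radius: float
    score: float                       # suspiciousness level from the algorithms
    linked: Optional["Region"] = None  # corresponding region in the other view

def query(regions: list[Region], view: str, px: float, py: float) -> Optional[Region]:
    """Return the detected region containing the queried point, if any."""
    for r in regions:
        if r.view == view and (px - r.x) ** 2 + (py - r.y) ** 2 <= r.radius ** 2:
            return r
    return None

mlo = Region("LMLO", 700, 900, 60, 82)
detections = [Region("LCC", 510, 640, 55, 82, linked=mlo)]

hit = query(detections, "LCC", 512, 641)
if hit:
    print(f"region score: {hit.score}")            # shown on the queried image
    if hit.linked:
        print(f"also shown in {hit.linked.view}")  # minimizes user interaction
```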
Transpara™ is configured as a DICOM node in a network and receives its input images from another DICOM node, such as a mammography device or a PACS archive. The image analysis unit includes machine learning components trained to detect calcifications and soft tissue lesions, and a pre-processing component that normalizes images so that images from different vendors can be processed by the same algorithms.
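The pre-processing step is not specified in the document; the following is a minimal sketch of the kind of vendor harmonization it describes, under assumed choices (pydicom/numpy, a common target pixel spacing, z-score normalization), not ScreenPoint's documented pipeline:

```python
# Hedged sketch of vendor-harmonizing pre-processing; the target spacing and
# normalization are assumptions. Requires pydicom (2.x import path) and numpy.
import numpy as np
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut

TARGET_SPACING_MM = 0.1  # assumed common pixel pitch

def preprocess(path: str) -> np.ndarray:
    ds = pydicom.dcmread(path)
    img = apply_voi_lut(ds.pixel_array, ds).astype(np.float32)
    if getattr(ds, "PhotometricInterpretation", "") == "MONOCHROME1":
        img = img.max() - img  # invert so higher values are brighter
    spacing = getattr(ds, "ImagerPixelSpacing", None) or ds.PixelSpacing
    row_mm, col_mm = float(spacing[0]), float(spacing[1])
    # Nearest-neighbour resample to the common pixel spacing (simplified).
    fy, fx = row_mm / TARGET_SPACING_MM, col_mm / TARGET_SPACING_MM
    ys = (np.arange(int(img.shape[0] * fy)) / fy).astype(int)
    xs = (np.arange(int(img.shape[1] * fx)) / fx).astype(int)
    img = img[np.ix_(ys, xs)]
    # Standardize intensities so the same model weights apply across vendors.
    return (img - img.mean()) / (img.std() + 1e-8)
```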
1. Acceptance Criteria and Reported Device Performance:
The document primarily focuses on the clinical study's primary effectiveness endpoint: a significant increase in the area under the receiver operating characteristic (ROC) curve (AUC) when radiologists read with Transpara™ compared to reading unaided.
| Acceptance Criteria (stated as primary effectiveness endpoint) | Reported Device Performance (mean difference) |
|---|---|
| Significant increase of the area under the ROC curve (AUC) | +0.020 (95% CI: 0.010-0.030, P = 0.0019) |
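For illustration only, here is a minimal sketch of how such an endpoint can be computed: AUC per reading condition plus a case-level bootstrap confidence interval on the paired difference. All scores are synthetic; the study's actual analysis methodology is not reproduced here.

```python
# Numpy-only sketch: Mann-Whitney AUC per reading condition and a case-level
# bootstrap CI on the paired difference. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 240  # matches the study's case count; scores below are invented

def auc(labels, scores):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

labels = rng.integers(0, 2, n)
unaided = labels + rng.normal(0.0, 1.0, n)       # synthetic unaided scores
aided = 1.15 * labels + rng.normal(0.0, 1.0, n)  # synthetic aided scores

diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, n)                  # resample cases with replacement
    diffs.append(auc(labels[idx], aided[idx]) - auc(labels[idx], unaided[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"bootstrap 95% CI for the AUC difference: [{lo:+.3f}, {hi:+.3f}]")
```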
2. Sample Size Used for the Test Set and Data Provenance:
- Sample Size: 240 cases. The study set included 100 cases with cancer, 40 false-positive recalls from screening, and 100 normal exams.
- Data Provenance: Retrospectively collected digital mammograms from two clinical centers in the United States, acquired with Lorad Selenia (Hologic) and Mammomat Inspiration (Siemens) devices. The cases were drawn from consecutive series.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Their Qualifications:
The document doesn't explicitly state the number of experts used to establish the ground truth for the test set read by the radiologists in the MRMC study. However, it does state that the device's algorithms were "trained with large databases of biopsy-proven examples of breast cancer, benign lesions and normal tissue." This implies that the ground truth for the training data, and potentially for the test set (at least for the cancer cases), was based on biopsy results.
For the MRMC study itself, the "experts" were the readers, who were MQSA-qualified radiologists.
4. Adjudication Method for the Test Set:
The document doesn't explicitly describe an adjudication method for establishing the final ground truth of the test set cases. For the MRMC study, 14 MQSA-qualified radiologists read each case twice (once with and once without Transpara™), and their performance was evaluated against the established truth of the cases. The "large databases of biopsy-proven examples" used for training suggest pathology was a key component of ground truth.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
Yes, an MRMC comparative effectiveness study was done.
- Effect Size of Human Reader Improvement with AI vs. Without AI Assistance: Radiologists significantly improved their detection performance with Transpara™. The average AUC increased from 0.866 to 0.886, a mean increase of +0.020 (95% CI: 0.010-0.030, P = 0.0019).
- For soft tissue lesions, AUC increased from 0.886 to 0.902 (mean difference = +0.016).
- For calcifications, AUC increased from 0.878 to 0.898 (mean difference = +0.020).
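A toy sketch of the reader-averaged effect size follows (synthetic scores; a real MRMC analysis would use a dedicated method such as Obuchowski-Rockette, which is not reproduced here):

```python
# Sketch of the reader-averaged effect: mean over readers of
# (aided AUC - unaided AUC). Synthetic data; requires scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score

n_readers, n_cases = 14, 240
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, n_cases)
unaided = labels + rng.normal(0.0, 1.0, (n_readers, n_cases))
aided = 1.1 * labels + rng.normal(0.0, 1.0, (n_readers, n_cases))

gains = [roc_auc_score(labels, aided[r]) - roc_auc_score(labels, unaided[r])
         for r in range(n_readers)]
print(f"mean AUC gain across {n_readers} readers: {np.mean(gains):+.3f}")
```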
6. Standalone (Algorithm Only) Performance:
Yes, a standalone performance evaluation was conducted.
- Results: The standalone breast cancer detection performance of Transpara™ (AUC=0.887) approached the average performance of the clinical study radiologists when reading mammograms unaided (radiologists' AUC = 0.866).
- Compared with individual radiologists, Transpara™ had a higher AUC than eleven of the fourteen radiologists (by 1.5-9.3%, relative) and a lower AUC than the other three (by 1.7-4.7%).
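The quoted percentage ranges are relative AUC differences; the arithmetic below reconstructs them with hypothetical reader AUCs chosen to roughly match the reported ranges:

```python
# Toy arithmetic only: relative AUC differences between the standalone
# algorithm (AUC = 0.887, from the document) and *hypothetical* reader AUCs.
algorithm_auc = 0.887
reader_aucs = [0.811, 0.842, 0.851, 0.858, 0.860, 0.863, 0.866,
               0.870, 0.872, 0.874, 0.874, 0.902, 0.915, 0.930]

for a in reader_aucs:
    rel = (algorithm_auc - a) / a * 100.0
    side = "higher" if rel > 0 else "lower"
    print(f"reader AUC {a:.3f}: algorithm is {abs(rel):.1f}% {side}")
```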
7. Type of Ground Truth Used:
The document mentions:
- Biopsy-proven examples: This is explicitly stated for the training of the algorithms.
- Screen-detected cancers: Used in the context of the Transpara™ Score.
- False-positive recalls from screening: This indicates clinical follow-up for these cases.
- The overall context of mammography suggests that pathology (from biopsies), clinical follow-up, and possibly expert consensus/review (for normal cases and false positives) were used to establish the ground truth for the cases.
8. Sample Size for the Training Set:
The document states that the algorithms were "trained with large databases of biopsy-proven examples of breast cancer, benign lesions and normal tissue." However, it does not provide a specific sample size for the training set.
9. How the Ground Truth for the Training Set Was Established:
The ground truth for the training set was established using "large databases of biopsy-proven examples of breast cancer, benign lesions and normal tissue." This indicates that pathological confirmation was the primary method for establishing the ground truth of the training data.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnosis software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.

(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
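For reference, a minimal sketch (with invented counts) of the diagnostic accuracy measures named in paragraphs (b)(1)(iii)-(iv), computed from a 2x2 confusion table:

```python
# Hedged sketch with synthetic counts: the accuracy measures listed in the
# special controls, derived from a 2x2 confusion table.
tp, fp, fn, tn = 90, 20, 10, 120  # hypothetical case counts

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                      # positive predictive value
npv = tn / (tn + fn)                      # negative predictive value
lr_pos = sensitivity / (1 - specificity)  # positive diagnostic likelihood ratio
lr_neg = (1 - sensitivity) / specificity  # negative diagnostic likelihood ratio

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} PPV={ppv:.2f} "
      f"NPV={npv:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```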