Transpara software is intended for use as a concurrent reading aid for physicians interpreting screening full-field digital mammography exams and digital breast tomosynthesis exams from compatible FFDM and DBT systems, to identify regions suspicious for breast cancer and assess their likelihood of malignancy. Output of the device includes locations of calcification groups and soft-tissue regions, with scores indicating the likelihood that cancer is present, and an exam score indicating the likelihood that cancer is present in the exam. Patient management decisions should not be made solely on the basis of analysis by Transpara.
Transpara is a software-only application designed to be used by physicians to improve interpretation of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT). Deep learning algorithms are applied to images for recognition of suspicious calcifications and soft tissue lesions (including densities, masses, architectural distortions, and asymmetries). Algorithms are trained with a large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue.
Transpara offers the following functions which may be used at any time in the reading process, to improve detection and characterization of abnormalities and enhance workflow:
- AI findings for display in the images to highlight locations where the device detects suspicious calcifications or soft tissue lesions, along with region scores per finding on a scale ranging from 1-100, with higher scores indicating a higher level of suspicion.
- Links between corresponding regions in different views of the breast, which may be utilized to enhance user interfaces and workflow.
- An exam-based score which categorizes exams by increasing likelihood of cancer on a scale of 1-10, or in three risk categories labeled 'low', 'intermediate', or 'elevated'.
The concurrent-use indication means that it is up to the user to decide how to incorporate Transpara into the reading process; its functions can be used before, during, or after visual interpretation of an exam.
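As an illustration only, the outputs listed above could be modeled on a receiving workstation roughly as in the following sketch; the field names, coordinate conventions, and class layout are assumptions made for this example, not Transpara's documented output schema.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not the vendor's published interface.
from dataclasses import dataclass
from typing import List

@dataclass
class Finding:
    view: str           # e.g. "L-CC" or "R-MLO"
    x: float            # location of the detected region in image coordinates
    y: float
    kind: str           # "calcification group" or "soft tissue lesion"
    region_score: int   # 1-100; higher means more suspicious

@dataclass
class ExamResult:
    exam_score: int         # 1-10; higher means more suspicious
    risk_category: str      # "low", "intermediate", or "elevated"
    findings: List[Finding]
```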
Results of Transpara are computed in a standalone processing appliance which accepts mammograms in DICOM format as input, processes them, and sends the processing output to a destination using the DICOM protocol in a standardized mammography CAD DICOM format. Common destinations are medical workstations, PACS and RIS. The system can be configured using a service interface. Implementation of a user interface for end users in a medical workstation is to be provided by third parties.
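Because the output is delivered as a DICOM mammography CAD object, a downstream system can inspect it with a generic DICOM toolkit. Below is a minimal sketch using pydicom that walks a structured-report content tree and prints numeric results (scores) and graphic regions; the file name and the assumption that scores appear as NUM content items are illustrative, not taken from the vendor's documentation.

```python
# Minimal sketch: recursively walk a DICOM SR content tree and print numeric
# results (e.g. scores) and spatial coordinates (detected regions).
# "cad_sr.dcm" is a placeholder for a CAD SR object received from the appliance.
import pydicom

def walk(items, depth=0):
    for item in items:
        name = ""
        if "ConceptNameCodeSequence" in item:
            name = item.ConceptNameCodeSequence[0].CodeMeaning
        value_type = getattr(item, "ValueType", "")
        if value_type == "NUM" and "MeasuredValueSequence" in item:
            # a numeric result, e.g. a region or exam score (assumed encoding)
            print("  " * depth, name, "=", item.MeasuredValueSequence[0].NumericValue)
        elif value_type == "SCOORD":
            # image coordinates of a detected region
            print("  " * depth, name, item.GraphicType, list(item.GraphicData))
        else:
            print("  " * depth, name, value_type)
        if "ContentSequence" in item:
            walk(item.ContentSequence, depth + 1)

ds = pydicom.dcmread("cad_sr.dcm")
walk(ds.ContentSequence)
```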
The provided text describes the acceptance criteria and a study demonstrating that the device, Transpara (2.1.0), meets these criteria.
Here's an organized breakdown of the information requested:
Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the reported performance metrics. The study aims to demonstrate non-inferiority and superiority to the predicate device, Transpara 1.7.2. The key metrics reported are sensitivity at various specificity levels and Exam-based Area Under the Receiver Operating Characteristic Curve (AUC).
Table 1: Acceptance Criteria (Implied by Performance Goals) and Reported Device Performance (Standalone without Temporal Analysis)
Metric | Acceptance Criteria (Implied/Target) | Reported Performance (FFDM) | Reported Performance (DBT) |
---|---|---|---|
Sensitivity (Sensitive Mode @ 70% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 (quantitative value not specified, but implied by comparison) | 97.4% (96.3 - 98.5) | 96.9% (95.5 - 98.3) |
Sensitivity (Specific Mode @ 80% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 | 95.2% (93.7 - 96.7) | 95.1% (93.3 - 96.8) |
Sensitivity (Elevated Risk @ 97% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 | 80.8% (78.0 - 83.6) | 78.4% (75.1 - 81.7) |
Exam-based AUC | Non-inferior & Superior to Predicate Device 1.7.2 | 0.960 (0.953 - 0.966) | 0.955 (0.947 - 0.963) |
Table 2: Acceptance Criteria (Implied by Performance Goals) and Reported Device Performance (Standalone with Temporal Analysis - TA)
Metric | Acceptance Criteria (Implied/Target) | Reported Performance (FFDM with TA) | Reported Performance (DBT with TA) |
---|---|---|---|
Sensitivity (Sensitive Mode @ 70% Specificity) | Superior to performance without temporal comparison | 95.7% (93.7 - 97.6) | 94.6% (91.2 - 98.0) |
Sensitivity (Specific Mode @ 80% Specificity) | Superior to performance without temporal comparison | 95.4% (93.4 - 97.4) | 91.0% (86.7 - 95.4) |
Sensitivity (Elevated Risk @ 97% Specificity) | Superior to performance without temporal comparison | 82.7% (79.1 - 86.4) | 74.9% (68.3 - 81.4) |
Exam-based AUC | Superior to performance without temporal comparison | 0.958 (0.946 - 0.969) | 0.941 (0.921 - 0.958) |
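As a minimal sketch of how the operating points in the tables above could be derived from exam-level scores, the snippet below computes sensitivity at a fixed specificity and the exam-based AUC using scikit-learn. The scores and class sizes here are synthetic stand-ins (not the study data), and the helper function is illustrative rather than the study's actual analysis code.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def sensitivity_at_specificity(labels, scores, target_specificity):
    """Highest sensitivity achievable while meeting the target specificity."""
    fpr, tpr, _ = roc_curve(labels, scores)
    ok = (1.0 - fpr) >= target_specificity   # specificity = 1 - FPR
    return tpr[ok].max() if ok.any() else 0.0

# Synthetic exam scores with class sizes mirroring the main test set
# (8,587 normal + 270 benign = 8,857 non-cancer; 1,350 cancer).
rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(8857), np.ones(1350)])
scores = np.concatenate([rng.normal(2, 1, 8857), rng.normal(5, 1, 1350)])

print("Exam-based AUC:", roc_auc_score(labels, scores))
for spec in (0.70, 0.80, 0.97):
    print(f"Sensitivity @ {spec:.0%} specificity:",
          sensitivity_at_specificity(labels, scores, spec))
```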
Study Details
- Sample Size Used for the Test Set and Data Provenance:
  - Main Test Set (without temporal analysis): 10,207 exams (5,730 FFDM, 4,477 DBT).
    - Normal: 8,587 exams
    - Benign: 270 exams
    - Cancer: 1,350 exams (750 FFDM, 600 DBT)
  - Temporal Analysis Test Set: 5,724 exams (4,266 FFDM, 1,458 DBT).
    - Normal: 4,998 exams
    - Benign: 83 exams
    - Cancer: 643 exams (471 FFDM, 172 DBT)
  - Data Provenance: Independent dataset acquired from multiple centers in seven EU countries and the US. The dataset was retrospective, was not used for algorithm development, and included normal exams with at least one year of follow-up. The data included images from various manufacturers (Hologic, GE, Philips, Siemens, Fujifilm).
- Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts:
  - The document states that the cancer cases in the test set were "biopsy-proven cancer." It does not specify the number or qualifications of experts used to establish the ground truth for the entire test set (including normal and benign cases, and detailed lesion characteristics). The mechanism for establishing the "normal" and "benign" status is not explicitly detailed beyond "normal follow-up of at least one year."
- Adjudication Method for the Test Set:
  - The document does not explicitly describe an adjudication method involving multiple readers for establishing ground truth for the test set. The ground truth for cancer cases is stated as "biopsy-proven."
- If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done:
  - No, the document does not describe a Multi-Reader Multi-Case (MRMC) comparative effectiveness study. The performance assessment is a standalone evaluation of the algorithm's performance, not a human-in-the-loop study comparing human readers with and without AI assistance.
- If a Standalone (i.e., algorithm only, without human-in-the-loop) Performance Assessment was done:
  - Yes, standalone performance tests were conducted. The results presented in the tables above reflect the algorithm's performance only.
- The Type of Ground Truth Used:
  - The primary ground truth for cancer cases is biopsy-proven cancer. For normal exams within the test set, the ground truth was established by "a normal follow-up of at least one year," implying outcomes data (absence of diagnosed cancer over a follow-up period).
- The Sample Size for the Training Set:
  - The document does not explicitly state the sample size of the training set. It mentions "Deep learning algorithms are applied to images for recognition of suspicious calcifications and soft tissue lesions... Algorithms are trained with a large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue."
- How the Ground Truth for the Training Set Was Established:
  - The ground truth for the training set was established using a "large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue." This implies a similar methodology to the test set for cancer cases (biopsy verification) and likely clinical follow-up or expert consensus for benign/normal cases, though not explicitly detailed for the training set.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnosis software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
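For reference, the diagnostic accuracy measures named in the special controls above have the following standard definitions (a summary added here, not part of the regulation text), with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{PPV} = \frac{TP}{TP + FP}, \qquad
\text{NPV} = \frac{TN}{TN + FN}, \qquad
LR^{+} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}, \qquad
LR^{-} = \frac{1 - \text{Sensitivity}}{\text{Specificity}}
```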