cmTriage is a passive-notification, prioritization-only, parallel-workflow software tool used by radiologists to prioritize specific patients within the standard-of-care image worklist for 2D FFDM screening mammograms. cmTriage uses an artificial intelligence algorithm to analyze 2D FFDM screening mammograms and flags those that are suggestive of the presence of at least one suspicious finding at the exam level. These flags are viewed by the radiologist via their Picture Archiving and Communication System (PACS) worklist. The decision to use cmTriage codes, and how to use them, is ultimately up to the radiologist. cmTriage does not send a proactive alert directly to the radiologist.
Radiologists are responsible for reviewing each exam on a diagnostic viewer according to the current standard of care.
cmTriage is limited to the categorization of exams; it does not provide any diagnostic information beyond triage and prioritization, does not remove images from the radiologist's worklist, and should not be used in lieu of full patient evaluation or relied upon to make or confirm a diagnosis.
cmTriage is for prescription use only.
CureMetrix's cmTriage is a radiological computer-assisted triage and notification software device. Digital two-dimensional (2D) mammograms are captured by a Full-Field Digital Mammography (FFDM) system and deposited on the PACS. The CureMetrix image forwarding software (cmEdge), acting as a PACS listener, receives a copy of the mammography DICOM image(s), creates a local copy of the image(s), de-identifies the local copy, transmits the local copy to the CureMetrix cloud, and then deletes the local copy.
Within the CureMetrix cloud, the cmTriage service receives the DICOM image(s), groups them by study, analyzes the images within each study, and produces a result for each study in the form of a DICOM Structured Report (SR) file containing the cmTriage result.
The result file is encrypted and transmitted from the CureMetrix cloud back to cmEdge where it is decrypted and reassociated with the original study. The DICOM SR is then routed to the PACS.
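To make this round trip concrete, here is a minimal sketch of the listener-side flow described above, assuming pydicom for tag handling. The endpoint, the `upload` helper, and the specific tags blanked are illustrative assumptions, not CureMetrix's actual implementation (which, per the text, also encrypts the payload in transit).

```python
import os
import shutil

import pydicom

CLOUD_ENDPOINT = "https://cloud.example/api/studies"  # placeholder, not a real URL


def upload(path: str, endpoint: str) -> None:
    """Stub for the encrypted transport step; the real mechanism is unspecified."""
    raise NotImplementedError


def forward_image(received_path: str, work_dir: str) -> None:
    """Copy a received DICOM image, de-identify the copy, upload it, delete it."""
    local_copy = os.path.join(work_dir, os.path.basename(received_path))
    shutil.copy(received_path, local_copy)

    ds = pydicom.dcmread(local_copy)
    # Minimal de-identification for illustration; a real system would follow
    # a full DICOM PS3.15 de-identification profile.
    for keyword in ("PatientName", "PatientID", "PatientBirthDate"):
        if keyword in ds:
            setattr(ds, keyword, "")
    ds.remove_private_tags()
    ds.save_as(local_copy)

    upload(local_copy, CLOUD_ENDPOINT)
    os.remove(local_copy)  # the local copy is deleted after transmission
```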
Once the PACS receives the DICOM SR, the file is opened, the cmTriage code ("Impression Description") is extracted for the exam, and the worklist column for the exam is updated. The cmTriage code will either indicate "Suspicious" or "" (blank).
Within the PACS worklist, the cmTriage code can be displayed in a separate column. Each PACS may have different features and functionality depending on the manufacturer, which are outside the scope and control of CureMetrix and cmTriage. In general, however, the user is at a minimum able to sort the worklist based on column values. This sorting functionality (if present) allows the radiologist to group Suspicious exams together. The device does not alter the original medical image and is not intended to be used as a diagnostic device.
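As a rough illustration of the PACS-side step, the sketch below walks a DICOM SR's content tree looking for a TEXT item named "Impression Description" and returns its value. The actual SR template cmTriage emits is not specified in this document, so the flat structure assumed here is hypothetical.

```python
import pydicom


def extract_triage_code(sr_path: str) -> str:
    """Return the cmTriage code ("Suspicious" or "") from a result SR file."""
    ds = pydicom.dcmread(sr_path)
    for item in ds.get("ContentSequence", []):
        name_seq = item.get("ConceptNameCodeSequence", [])
        if name_seq and name_seq[0].CodeMeaning == "Impression Description":
            return item.get("TextValue", "")
    return ""
```

Grouping then reduces to an ordinary sort, e.g. `worklist.sort(key=lambda row: row.triage_code != 'Suspicious')`, which floats flagged exams to the top.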
The standard of care for breast cancer screening in the US is quickly becoming one in which both FFDM and digital breast tomosynthesis (DBT) images are acquired during the exam. However, cmTriage operates only on 2D images. A site that does not acquire 2D FFDM images and instead uses only DBT will not be able to use the current device.
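In practice this constraint amounts to filtering inbound images by SOP Class. The sketch below assumes the standard DICOM SOP Class UIDs for 2D digital mammography and breast tomosynthesis; the actual filtering logic cmTriage uses is not described in the document.

```python
# Standard DICOM SOP Class UIDs (assumed here as the filtering criterion).
DIGITAL_MAMMO_FOR_PRESENTATION = "1.2.840.10008.5.1.4.1.1.1.2"
DIGITAL_MAMMO_FOR_PROCESSING = "1.2.840.10008.5.1.4.1.1.1.2.1"
BREAST_TOMOSYNTHESIS = "1.2.840.10008.5.1.4.1.1.13.1.3"


def is_eligible_2d_ffdm(ds) -> bool:
    """True for 2D FFDM images (pydicom Dataset); DBT volumes are skipped."""
    return str(ds.SOPClassUID) in (
        DIGITAL_MAMMO_FOR_PRESENTATION,
        DIGITAL_MAMMO_FOR_PROCESSING,
    )
```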
In summary, the cmTriage device is intended to provide a passive notification through the PACS to the radiologist indicating the existence of a case that may potentially benefit from that radiologist's prioritization.
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:
Acceptance Criteria and Reported Device Performance
| Acceptance Criteria (Target) | Reported Device Performance |
|---|---|
| **Primary Endpoint 1: Sensitivity.** The lower bound of the 95% CI for sensitivity must be above the lower bound of the 80% CI reported in the BCSC (80.7%). | Mean: 86.9%. 95% CI: 83.6% to 90.2%; the lower bound (83.6%) exceeded the BCSC 80% CI lower bound (80.7%). |
| **Primary Endpoint 2: Specificity.** At a sensitivity of 86.9% (the BCSC median sensitivity), specificity should be comparable to the BCSC median specificity (88.9%) and above the bottom of the BCSC 80% CI (82.6%). | Mean: 88.5% at 86.9% sensitivity. 95% CI: 86.4% to 90.7%; the lower bound (86.4%) exceeded the BCSC 80% CI lower bound (82.6%) and was comparable to the BCSC median (88.9%). |
| **Primary Endpoint 3: Population-Adjusted Mark Rate (Recall Rate).** Below the target recall rate of 9.60% for radiologists at 84.4% sensitivity. | 6.37% at 84.4% sensitivity, well below the 9.60% target recall rate. |
| **Secondary Endpoint: Time Performance.** Mammograms can be processed and notification results returned for use by radiologists within clinically acceptable minutes. | Average: 3.35 minutes (at a network speed of 10 Mbit/s upload and 37 Mbit/s download), stated as "well within the clinical operations of breast cancer screening." |
| **Overall Performance Metric (AUC).** Not explicitly stated as a numerical acceptance criterion; performance reported. | AUC: 0.951 (95% CI: 0.937 to 0.964). By density: Density 1: 0.964 (95% CI: 0.934 to 0.994); Density 2: 0.964 (95% CI: 0.946 to 0.981); Density 3: 0.940 (95% CI: 0.917 to 0.963); Density 4: 0.958 (95% CI: 0.92 to 0.995). By lesion type: Mass: 0.941 (95% CI: 0.923 to 0.959); Calcifications: 0.972 (95% CI: 0.958 to 0.985). |
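For readers who want to see how these figures relate, the sketch below computes sensitivity, specificity, Wilson 95% confidence intervals, AUC, and a population-adjusted mark rate from synthetic scores. The 400/855 split mirrors the test set, but the score distributions, operating threshold, screening prevalence, and adjustment formula are all assumptions for illustration, not the sponsor's statistical analysis plan.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


rng = np.random.default_rng(0)
# Synthetic stand-in for per-exam suspicion scores: 400 cancers, 855 normals,
# mirroring the test set composition.
labels = np.concatenate([np.ones(400), np.zeros(855)])
scores = np.concatenate([rng.beta(5, 2, 400), rng.beta(2, 5, 855)])

auc = roc_auc_score(labels, scores)

threshold = 0.5  # assumed operating point; the device's threshold is not stated
flags = scores >= threshold
tp = int(np.sum(flags & (labels == 1)))
tn = int(np.sum(~flags & (labels == 0)))
sens = tp / 400
spec = tn / 855

# "Population-adjusted mark rate": reweight the enriched test set to an assumed
# screening prevalence so the flag rate is comparable to a 9.60% recall rate.
prevalence = 0.005
mark_rate = prevalence * sens + (1 - prevalence) * (1 - spec)

print(f"AUC={auc:.3f}")
print(f"sensitivity={sens:.3f}, 95% CI={wilson_ci(tp, 400)}")
print(f"specificity={spec:.3f}, 95% CI={wilson_ci(tn, 855)}")
print(f"population-adjusted mark rate={mark_rate:.4f}")
```

As a back-of-envelope check on the timing endpoint: at 10 Mbit/s (about 1.25 MB/s) upload, a four-view FFDM study on the order of 100 MB (an assumed size) would take roughly 80 seconds to transmit, consistent in magnitude with the reported 3.35-minute average round trip.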
Study Details Proving Device Meets Acceptance Criteria:
**Sample sizes used for the test set and the data provenance:**
- Test Set Sample Size: 1255 mammographic studies (exams).
- 400 biopsy-proven cancer studies (320 soft-tissue density, 122 microcalcifications). Note: 320 + 122 = 442, which exceeds 400; the source text does not resolve this discrepancy (some studies may contain both lesion types, or it may be a typo), so the numbers are reported as written.
- 855 normal studies (BIRADS 1 and 2 with two-year follow-up of negative diagnosis).
- Data Provenance: Quarantined test data obtained from multiple clinical sites. The text does not specify the country of origin but implies a multi-center study. It was a retrospective study.
**Number of experts used to establish the ground truth for the test set and the qualifications of those experts:**
- The document does not specify the number of experts or their qualifications used to establish the ground truth for the test set. It mentions "biopsy-proven cancer studies" and "normal studies (BIRADS 1 and 2 with two-year follow-up of negative diagnosis)" as the ground truth. This implies that the ground truth was established through clinical outcomes (biopsy results and follow-up) rather than de novo expert reading for the purpose of the study.
**Adjudication method (e.g., 2+1, 3+1, none) for the test set:**
- The document does not specify an adjudication method for the test set. Given that ground truth was established by biopsy and clinical follow-up for the test set, a reader adjudication process would typically not be explicitly needed for establishing the definitive truth labels.
**Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance:**
- No, an MRMC comparative effectiveness study involving human readers with and without AI assistance was not reported in this document. The study presented focused on the standalone performance of the cmTriage software. The comparison made (e.g., population-adjusted mark rate against radiologists' recall rate) is a comparison to a general historical standard, not a direct human reader study for improvement.
**Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance study was done:**
- Yes, a standalone performance study of the algorithm was conducted. The results for sensitivity, specificity, AUC, and mark rate are all based on the algorithm's performance in isolation.
**The type of ground truth used (expert consensus, pathology, outcomes data, etc.):**
- The ground truth for the test set was based on pathology (biopsy-proven cancer) and outcomes data (two-year follow-up of negative diagnosis for normal studies). A minimal sketch of how such labels could be encoded appears after this list.
**The sample size for the training set:**
- The document does not explicitly state the sample size for the training set. It only describes the test set.
**How the ground truth for the training set was established:**
- The document does not explicitly state how the ground truth for the training set was established, as the training set details are not provided. However, typically, ground truth for training similarly relies on confirmed diagnoses (e.g., pathology, long-term follow-up) or expert annotations.
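As promised above, here is a minimal sketch of how the two ground-truth sources described for the test set could be encoded as binary labels. The record fields and eligibility rule are hypothetical reconstructions of the stated criteria, not the study's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExamRecord:
    biopsy_malignant: Optional[bool]   # pathology result, if a biopsy was performed
    birads: Optional[int]              # screening BIRADS assessment
    negative_followup_years: float     # years of cancer-free follow-up


def ground_truth(record: ExamRecord) -> Optional[int]:
    """1 = cancer (biopsy-proven); 0 = normal (BIRADS 1/2 with >= 2-year
    negative follow-up); None = not eligible for the test set."""
    if record.biopsy_malignant:
        return 1
    if record.birads in (1, 2) and record.negative_followup_years >= 2.0:
        return 0
    return None
```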
§ 892.2080 Radiological computer aided triage and notification software.

(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.
e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.