QOCA® image Smart CXR Image Processing System is software as a medical device (SaMD) that uses artificial intelligence/deep learning technology to analyze chest X-ray images of adult patients and identify cases with suspected pneumothorax. This product shall be used in conjunction with the hospital's Picture Archiving and Communication System (PACS). It automatically analyzes DICOM files pushed from PACS and makes a notation next to cases with suspected pneumothorax. This product is intended only to remind radiologists to prioritize reviewing cases with suspected pneumothorax. Its results cannot substitute for a diagnosis by a radiologist, nor can they be used on a stand-alone basis for clinical decision-making.
This product, QOCA® image Smart CXR Image Processing System, is a web-based medical device using a locked artificial intelligence algorithm. It provides features such as case sorting and image viewing, and supports multiple simultaneous users.
After connecting to the hospital's Picture Archiving and Communication System (PACS), this product automatically analyzes posteroanterior (PA) view or anteroposterior (AP) erect view chest X-ray images pushed from PACS. Once a case with suspected pneumothorax is identified, a notation is made next to the case in question, so the radiologist can prioritize reviewing it in the Viewer Page. The product does not, however, indicate specific regions or anomalies on the image itself.
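To make the workflow concrete, the following is a minimal sketch of this kind of PACS-driven triage loop. It is illustrative only: `run_model` and `flag_case_in_worklist` are hypothetical placeholders standing in for the vendor's locked algorithm and worklist notation, which the document does not describe at the code level; only the DICOM parsing via pydicom is standard.

```python
# Illustrative sketch only: run_model and flag_case_in_worklist are
# hypothetical placeholders, not the vendor's actual interfaces.
from pathlib import Path

import pydicom

ACCEPTED_VIEWS = {"PA", "AP"}  # device analyzes PA or AP erect chest views only


def run_model(pixels) -> bool:
    """Placeholder for the locked AI model; True means suspected pneumothorax."""
    raise NotImplementedError


def flag_case_in_worklist(study_uid: str) -> None:
    """Placeholder: annotate the case in the worklist (no markup on the image)."""
    print(f"Suspected pneumothorax flagged for study {study_uid}")


def triage_study(dicom_path: Path) -> None:
    ds = pydicom.dcmread(dicom_path)
    if getattr(ds, "ViewPosition", "") not in ACCEPTED_VIEWS:
        return  # out of scope: only PA / AP erect chest X-rays are analyzed
    if run_model(ds.pixel_array):
        flag_case_in_worklist(ds.StudyInstanceUID)
```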
Here's a breakdown of the acceptance criteria and the study details for the QOCA® image Smart CXR Image Processing System:
1. Acceptance Criteria and Reported Device Performance
| Metric | Acceptance Criteria (Predicate Device K190362 Performance) | Reported Device Performance (QOCA® image Smart CXR) | Criteria Met? |
|---|---|---|---|
| AUC | 98.3% (95% CI: [97.40%, 99.02%]) | 97.8% (95% CI: [97.0%, 98.5%]) | Met |
| Sensitivity | 93.15% (95% CI: [87.76%, 96.67%]) | 92.5% (95% CI: [90.5%, 94.2%]) | Met |
| Specificity | 92.99% (95% CI: [90.19%, 95.19%]) | 94.0% (95% CI: [93.9%, 94.6%]) | Met |
| Average Performance Time | 22.1 seconds | 4.94 seconds | Met |
Note: The reported device performance is the overall performance across both the MIMIC and Taiwanese datasets; individual performance for each dataset is also provided in the document.
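For reference, metrics of this kind are computed from the standalone model outputs against the ground-truth labels. Below is a minimal sketch using scikit-learn; the bootstrap for the AUC confidence interval is an assumption on my part, since the document does not state which CI method was used.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def standalone_metrics(y_true, y_score, threshold=0.5, n_boot=2000, seed=0):
    """AUC, sensitivity, and specificity with a bootstrap 95% CI for AUC.
    The CI method is an assumption; the 510(k) summary does not state it."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    fp = np.sum(y_pred & (y_true == 0))
    rng = np.random.default_rng(seed)
    aucs = []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:
            continue  # a bootstrap resample needs both classes for AUC
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return {"auc": roc_auc_score(y_true, y_score), "auc_95ci": (lo, hi),
            "sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}
```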
2. Sample Size Used for the Test Set and Data Provenance
The device's performance was assessed using two separate pivotal studies/datasets:
- MIMIC Dataset:
  - Sample Size: 3,105 radiographs (336 pneumothorax-positive cases, 2,769 pneumothorax-negative cases).
  - Data Provenance: US patient population (MIMIC dataset), collected from a medical institution independent of those used for training.
- Taiwanese Dataset:
  - Sample Size: 2,947 radiographs (472 pneumothorax-positive cases, 2,475 pneumothorax-negative cases).
  - Data Provenance: a Taiwanese hospital, also independent of the institutions used for training.
Overall Test Set: 6,052 radiographs (3,105 from MIMIC + 2,947 from Taiwan). Both datasets were retrospective.
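The pooled arithmetic can be sanity-checked directly; this trivial sketch uses only the counts reported above.

```python
# Sanity check of the pooled test-set counts reported above.
mimic = {"positive": 336, "negative": 2769}   # 3,105 radiographs
taiwan = {"positive": 472, "negative": 2475}  # 2,947 radiographs

assert sum(mimic.values()) == 3105 and sum(taiwan.values()) == 2947
total = sum(mimic.values()) + sum(taiwan.values())
print(total)  # 6052 radiographs overall
prevalence = (mimic["positive"] + taiwan["positive"]) / total
print(f"pooled pneumothorax prevalence: {prevalence:.1%}")  # ~13.4%
```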
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
For both the MIMIC dataset and the Taiwanese dataset:
- Number of Experts: Three radiologists.
- Qualifications: The document states "truthed by three radiologists" without specifying their years of experience or sub-specialty.
4. Adjudication Method for the Test Set
The document does not explicitly state the adjudication method (e.g., 2+1, 3+1). It only mentions that the datasets were "truthed by three radiologists," implying a consensus-based approach, but the specific process for resolving disagreements is not detailed.
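As a point of reference only, one common consensus scheme for three readers is a simple majority vote; this is a guess at what "truthed by three radiologists" could mean in practice, not a method stated in the document.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Majority label among an odd number of readers (e.g., three radiologists).
    This is only one plausible adjudication scheme; the document does not
    state which was actually used."""
    label, _count = Counter(reads).most_common(1)[0]
    return label


print(majority_vote(["pneumothorax", "pneumothorax", "normal"]))  # pneumothorax
```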
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
There is no mention of a Multi-Reader Multi-Case (MRMC) comparative effectiveness study being performed to assess how much human readers improve with AI vs. without AI assistance. The study focused on the standalone performance of the AI algorithm.
6. Standalone Performance Study
Yes, a standalone performance study was done. The document states: "Based on the results of the standalone performance assessment, this product achieves identification accuracy of AUC > 95% with Sensitivity > 91% and Specificity > 92%." The performance metrics provided in section 1 (AUC, sensitivity, specificity) reflect the algorithm's performance without a human in the loop.
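Expressed as a check of the overall figures from section 1 against the quoted thresholds, the claim amounts to the following trivial sketch.

```python
# Check the reported overall performance against the quoted thresholds.
reported = {"auc": 0.978, "sensitivity": 0.925, "specificity": 0.940}
thresholds = {"auc": 0.95, "sensitivity": 0.91, "specificity": 0.92}

for metric, floor in thresholds.items():
    assert reported[metric] > floor, f"{metric} below threshold"
print("All standalone acceptance thresholds met.")
```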
7. Type of Ground Truth Used
The ground truth for the test sets (MIMIC and Taiwanese) was established by "three radiologists," which indicates expert consensus diagnosis.
8. Sample Size for the Training Set
The document states: "The training dataset is used to train the model, and divided into three sets: the training set, the validation set, and the test set." However, the specific sample size for the entire training dataset (including training, validation, and its own internal test set used during development) is not provided in the summary. It only indicates that it was "collected from two hospitals, and additional data from the US National Institutes of Health (NIH) was added to the test set to improve its US patient population representativeness during training."
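The three-way split itself is standard practice; below is a minimal sketch of a stratified split using scikit-learn. The 70/15/15 ratio is an illustrative assumption, since the summary gives no set sizes.

```python
from sklearn.model_selection import train_test_split


def three_way_split(images, labels, seed=0):
    """Stratified train/validation/test split. The 70/15/15 ratio is an
    illustrative assumption; the 510(k) summary does not give the sizes."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.30, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```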
9. How the Ground Truth for the Training Set Was Established
The document states that the "model training dataset was collected from two hospitals, and additional data from the US National Institutes of Health (NIH) was added to the test set." While it implies the data was labeled for training, it does not explicitly describe how the ground truth for the training set was established (e.g., whether it was also by expert radiologists, pathology, etc.).
§ 892.2080 Radiological computer aided triage and notification software.
(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.