The device is designed to aid the clinical assessment of adult chest X-ray cases with features suggestive of pneumothorax and tension pneumothorax in the medical care environment. The device analyses cases using an artificial intelligence algorithm to identify findings. It makes case-level output available to a PACS for worklist prioritization or triage. The device is intended to be used by trained clinicians who are qualified to interpret chest X-rays as part of their scope of practice. The device is not intended to direct attention to specific portions of an image or to anomalies other than pneumothorax and tension pneumothorax. Its results are not intended to be used on a standalone basis for clinical decision making, nor is it intended to rule out pneumothorax or tension pneumothorax, or otherwise preclude clinical assessment of X-ray cases.
Annalise Enterprise CXR Triage Pneumothorax is a software workflow tool that interfaces with RIS and PACS to obtain chest X-ray images for processing. The artificial intelligence (AI) algorithm within the device, a convolutional neural network trained using deep learning techniques, identifies the presence of pneumothorax and tension pneumothorax. The AI results output from the device are sent to the reporting worklist software (RIS or PACS) to enable AI-assisted triage of the reporting worklist; the exact functionality available depends on the worklist software being used. The triage functionality uses the findings detected in each study by the AI model to provide information to the worklist software, enabling prioritization of the reporting worklist. Each organization can specify which findings will result in triage and the priority assigned to each finding. Importantly, the device never decreases a study's existing priority, so worklist items are never downgraded by the AI software.
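To illustrate the never-downgrade behaviour described above, the following minimal Python sketch shows one way a worklist integration could apply AI-driven priorities. The finding names, priority scale, and function are hypothetical and are not taken from the device documentation.

```python
# Hypothetical sketch of AI-assisted worklist prioritization.
# Priority scale, finding names, and configuration are illustrative only.

# Organization-specific mapping of AI findings to worklist priorities
# (higher number = more urgent).
FINDING_PRIORITY = {
    "tension_pneumothorax": 3,
    "pneumothorax": 2,
}

def apply_ai_triage(current_priority, ai_findings):
    """Return the updated worklist priority for a study.

    The AI result can only raise a study's priority; an existing
    higher priority is always preserved (never downgraded).
    """
    ai_priority = max(
        (FINDING_PRIORITY.get(f, 0) for f in ai_findings),
        default=0,
    )
    return max(current_priority, ai_priority)

# Example: a routine study (priority 1) flagged for pneumothorax moves to 2,
# while a study already marked urgent (priority 3) keeps its priority.
assert apply_ai_triage(1, ["pneumothorax"]) == 2
assert apply_ai_triage(3, ["pneumothorax"]) == 3
```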
Acceptance Criteria and Device Performance Study
The Annalise Enterprise CXR Triage Pneumothorax device was evaluated through two core studies: a Standalone Performance Assessment (accuracy) and a Triage Effectiveness Assessment (turn-around time).
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are derived from the requirements set forth in the Special Controls per 21 CFR 892.2080 for product code QFM, which include AUC > 0.95 and sensitivity and specificity > 80%. The reported device performance is based on the device's default balanced sensitivity and specificity operating point.
| Metric | Acceptance Criteria (Product Code QFM) | Reported Performance: Pneumothorax | Reported Performance: Tension Pneumothorax |
|---|---|---|---|
| AUC | > 0.95 | 0.979 (95% CI: 0.970-0.986) | 0.988 (95% CI: 0.981-0.993) |
| Sensitivity | > 80% | 93.9% (95% CI: 91.8-96.1) | 94.3% (95% CI: 90.2-98.4) |
| Specificity | > 80% | 92.2% (95% CI: 89.9-94.4) | 95.8% (95% CI: 94.3-97.1) |
No explicit acceptance criterion for triage effectiveness (turn-around time) was stated; however, the reported performance was deemed "substantially equivalent to the total performance time published for the predicate device."
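For reference, the sketch below shows how point estimates and approximate 95% confidence intervals of the kind reported in the table can be computed from case-level counts. The Wilson score interval is used here as one common choice, not necessarily the method used in the study, and the counts are chosen only to be consistent with the reported point estimates rather than taken from the study report.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Point estimate and Wilson 95% CI for a proportion (e.g. sensitivity)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half

# Illustrative counts only (chosen to match the reported point estimates):
# true positives / positives -> sensitivity, true negatives / negatives -> specificity.
sens, sens_lo, sens_hi = proportion_ci(successes=388, total=413)
spec, spec_lo, spec_hi = proportion_ci(successes=494, total=536)
print(f"Sensitivity {sens:.1%} (95% CI {sens_lo:.1%}-{sens_hi:.1%})")
print(f"Specificity {spec:.1%} (95% CI {spec_lo:.1%}-{spec_hi:.1%})")
```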
2. Sample Size for Test Set and Data Provenance
- Sample Size: 949 CXR cases from unique adult patients.
- Positive pneumothorax: n=413 (including tension pneumothorax n=123 as a subset)
- Negative pneumothorax: n=536
- Data Provenance: The test dataset was acquired retrospectively and consisted of eligible chest x-ray cases collected consecutively from 4 US hospital network sites. No data had previously been collected from these sites for training or testing of the device's AI algorithm, ensuring independence from the training dataset.
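A quick composition check of the counts listed above (a minimal sketch; the prevalence figures are derived arithmetically from these counts and are not a claim about clinical prevalence in routine practice):

```python
# Composition check for the test set described above (counts from the summary).
positives = 413          # pneumothorax-positive cases (incl. 123 tension pneumothorax)
negatives = 536          # pneumothorax-negative cases
total = positives + negatives
assert total == 949

# Enriched prevalence of the test set, not the clinical prevalence.
print(f"Pneumothorax prevalence in test set: {positives / total:.1%}")    # ~43.5%
print(f"Tension pneumothorax share of positives: {123 / positives:.1%}")  # ~29.8%
```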
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: At least two American Board of Radiology (ABR)-certified radiologists for initial annotation, with a third radiologist for disagreement resolution.
- Qualifications of Experts: All radiologists were ABR-certified and protocol-trained.
4. Adjudication Method for Test Set
The adjudication method used was a 2+1 consensus method. Each deidentified CXR case was annotated in a blinded fashion by at least two ABR-certified and protocol-trained radiologists. In the event of disagreement between the first two radiologists, a third radiologist determined the consensus label.
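A minimal sketch of the 2+1 consensus rule described above, assuming a binary label per case; the function and parameter names are illustrative and not taken from the study protocol.

```python
def adjudicate_2_plus_1(reader1, reader2, reader3=None):
    """Apply a 2+1 consensus rule to binary case labels.

    If the first two readers agree, their label is the ground truth;
    otherwise a third reader's label decides.
    """
    if reader1 == reader2:
        return reader1
    if reader3 is None:
        raise ValueError("Third reader required to resolve disagreement")
    return reader3

# Example: two readers disagree on pneumothorax, the third adjudicates.
assert adjudicate_2_plus_1(True, True) is True
assert adjudicate_2_plus_1(True, False, reader3=False) is False
```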
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
The provided text does not mention a Multi-Reader Multi-Case (MRMC) comparative effectiveness study being conducted to assess the effect size of how much human readers improve with AI vs. without AI assistance. The study focuses on the standalone performance of the AI device and its triage effectiveness.
6. Standalone (Algorithm Only) Performance
Yes, a standalone (algorithm-only, without human-in-the-loop) performance study was done. The Standalone Performance Assessment (accuracy) directly evaluates the device's ability to identify pneumothorax and tension pneumothorax without human intervention. The results for AUC, sensitivity, and specificity presented above are from this standalone assessment.
7. Type of Ground Truth Used for Test Set
The ground truth used for the test set was expert consensus. This was established by at least two (and potentially three in case of disagreement) ABR-certified radiologists reviewing and annotating the deidentified CXR cases.
8. Sample Size for Training Set
The training dataset included over 1,500,000 chest x-ray images.
9. How Ground Truth for Training Set Was Established
Cases in the training dataset were labeled by at least three qualified radiologists for the presence or absence of radiographic features suggestive of pneumothorax or tension pneumothorax.
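The summary states only that at least three qualified radiologists labeled each training case; the rule for combining their labels is not described. The sketch below assumes a simple majority vote purely for illustration.

```python
from collections import Counter

def aggregate_training_label(reader_labels):
    """Aggregate at-least-three reader labels into a single training label.

    Majority vote is assumed here for illustration; the actual aggregation
    rule used for the training dataset is not described in the summary.
    """
    if len(reader_labels) < 3:
        raise ValueError("At least three reader labels are required")
    counts = Counter(reader_labels)
    return counts[True] > counts[False]

# Example: two of three radiologists label the feature as present.
assert aggregate_training_label([True, True, False]) is True
```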
§ 892.2080 Radiological computer aided triage and notification software.
(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).
(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.
(iv) Stand-alone performance testing protocols and results of the device.
(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use;
(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;
(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;
(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;
(v) Device operating instructions; and
(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.