(35 days)
Annalise Enterprise is a device designed to be used in the medical care environment to aid in triage and prioritization of studies with features suggestive of the following findings:
- pleural effusion* [1]
- pneumoperitoneum* [2]
- pneumothorax
- tension pneumothorax
- vertebral compression fracture* [3]
*See additional information below.
The device analyzes studies using an artificial intelligence algorithm to identify findings. It makes study-level output available to an order and imaging management system for worklist prioritization or triage.
The device is not intended to direct attention to specific portions of an image and only provides notification for suspected findings.
Its results are not intended:
- to be used on a standalone basis for clinical decision making
- to rule out specific findings, or otherwise preclude clinical assessment of chest X-ray studies
Intended modality:
Annalise Enterprise identifies suspected findings in digitized (CR) or digital (DX) chest X-ray studies.
Intended user:
The device is intended to be used by trained clinicians who are qualified to interpret chest X-ray studies as part of their scope of practice.
Intended patient population:
The intended population is patients who are 22 years or older.
Additional information:
The following additional information relates to the findings listed above:
[1] Pleural effusion
- specificity may be reduced in the presence of scarring and/or pleural thickening
- standalone performance evaluation was performed on a dataset that included supine and erect positioning
- use of this device with prone positioning may result in differences in performance
[2] Pneumoperitoneum
- standalone performance evaluation was performed on a dataset that included supine and erect positioning where most cases were of unilateral right-sided and bilateral pneumoperitoneum
- use of this device with prone positioning and for unilateral left-sided pneumoperitoneum may result in differences in performance
[3] Vertebral compression fracture
- intended for prioritization or triage of worklists of Bone Health and Fracture Liaison Service program clinicians
- standalone performance evaluation was performed on a dataset that included only erect positioning
- use of this device with supine positioning may result in differences in performance
Annalise Enterprise is a software workflow tool which uses an artificial intelligence (AI) algorithm to identify suspected findings on chest X-ray studies in the medical care environment. The findings identified by the device include pneumothorax, tension pneumothorax, pleural effusion, pneumoperitoneum and vertebral compression fracture.
Radiological findings are identified by the device using an AI algorithm – a convolutional neural network trained using deep-learning techniques. Images used to train the algorithm were sourced from datasets that covered a range of equipment manufacturers. The training data, which contained over 750,000 chest X-ray imaging studies, were labelled by trained radiologists for the presence of the findings of interest.
The performance of the device's AI algorithm was validated in a standalone performance evaluation, in which the case-level output from the device was compared with a reference standard ('ground truth'). This was determined by two ground truthers, with a third truther used in the event of disagreement. All truthers were US board-certified radiologists.
The device interfaces with image and order management systems (such as PACS/RIS) to obtain chest X-ray studies for processing by the AI algorithm. Following processing, if any of the clinical findings of interest are identified in the study, the device provides a notification to the image and order management system for prioritization of that study in the worklist. This enables users to review the studies containing features suggestive of these clinical findings earlier than in the standard clinical workflow. The device never decreases a study's existing priority in the worklist, so worklist items are never downgraded on the basis of AI results.
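The "never downgrade" rule described above can be illustrated with a minimal Python sketch. The priority scale, the `WorklistItem` type, and the `apply_ai_result` function are hypothetical names chosen for illustration; the source does not describe how the device or the worklist system actually represents priorities.

```python
from dataclasses import dataclass

# Hypothetical priority scale (an assumption): a larger number is more urgent.
ROUTINE, URGENT, STAT = 0, 1, 2


@dataclass(frozen=True)
class WorklistItem:
    study_id: str
    priority: int


def apply_ai_result(item: WorklistItem, finding_suspected: bool,
                    ai_priority: int = URGENT) -> WorklistItem:
    """Raise the priority when a finding is suspected; never lower it."""
    if not finding_suspected:
        # No notification: the study keeps its existing priority.
        return item
    # max() guarantees the existing priority is never decreased.
    return WorklistItem(item.study_id, max(item.priority, ai_priority))
```

Under these assumptions, a routine study with a suspected finding moves up to `URGENT`, while a study already marked `STAT` stays at `STAT`.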
The device workflow is performed parallel to and in conjunction with the standard clinical workflow for interpretation of chest X-ray studies. The device is intended to aid in prioritization and triage of radiological medical images only.
The Annalise Enterprise device is designed to aid in the triage and prioritization of chest X-ray studies by identifying features suggestive of several findings. The following outlines the acceptance criteria and the study conducted to prove the device meets these criteria.
1. Table of Acceptance Criteria and Reported Device Performance
Explicit numeric acceptance thresholds are not stated. Acceptance is implied by the reported Area Under the Curve (AUC), Sensitivity, and Specificity values, with the aim of flagging positive cases reliably while limiting false positives. The reported device performance for each finding at each operating point is:
| Finding | Product Code | AUC (95% CI) | Operating Point (Threshold) | Sensitivity % (Se) (95% CI) | Specificity % (Sp) (95% CI) |
|---|---|---|---|---|---|
| Pneumothorax | QFM | 0.984 (0.976, 0.990) | 0.200 | 97.1 (95.5, 98.6) | 88.2 (85.4, 90.8) |
| | | | 0.250 | 96.2 (94.3, 98.1) | 91.9 (89.5, 94.1) |
| | | | 0.300 | 95.0 (92.8, 97.1) | 94.1 (91.9, 95.9) |
| | | | 0.350 | 93.1 (90.7, 95.5) | 95.6 (93.7, 97.2) |
| | | | 0.400 | 90.7 (88.0, 93.3) | 96.7 (95.0, 98.2) |
| Tension Pneumothorax | QFM | 0.989 (0.984, 0.994) | 0.225 | 96.0 (92.0, 99.2) | 94.0 (92.3, 95.6) |
| | | | 0.250 | 95.2 (91.2, 98.4) | 94.6 (93.1, 96.2) |
| | | | 0.300 | 93.6 (88.8, 97.6) | 95.6 (94.1, 96.9) |
| | | | 0.350 | 89.6 (84.0, 94.4) | 96.6 (95.3, 97.8) |
| | | | 0.400 | 87.2 (80.8, 92.8) | 97.5 (96.4, 98.6) |
| Pneumoperitoneum | QAS | 0.987 (0.976, 0.994) | 0.250 | 96.2 (92.4, 99.0) | 87.9 (83.2, 92.1) |
| | | | 0.300 | 94.3 (89.5, 98.1) | 90.5 (86.3, 94.2) |
| | | | 0.350 | 92.4 (86.7, 97.1) | 93.7 (90.0, 96.8) |
| | | | 0.400 | 91.4 (85.7, 96.2) | 95.8 (92.6, 98.4) |
| | | | 0.450 | 87.6 (81.0, 93.3) | 98.4 (96.3, 100.0) |
| Pleural Effusion | QFM | 0.977 (0.969, 0.984) | 0.380 | 96.7 (95.0, 98.1) | 86.8 (83.6, 89.5) |
| | | | 0.425 | 94.4 (92.3, 96.5) | 89.5 (86.8, 92.1) |
| | | | 0.450 | 92.9 (90.7, 95.0) | 91.3 (88.6, 93.7) |
| | | | 0.475 | 89.8 (87.1, 92.3) | 93.7 (91.5, 95.9) |
| | | | 0.500 | 87.6 (84.6, 90.5) | 95.5 (93.5, 97.0) |
| Vertebral Compression Fracture | QFM | 0.972 (0.960, 0.982) | 0.460 | 93.4 (90.1, 96.0) | 85.8 (82.1, 89.6) |
| | | | 0.500 | 92.6 (89.3, 95.6) | 90.9 (87.7, 93.7) |
| | | | 0.550 | 87.1 (83.1, 90.8) | 94.7 (91.8, 96.9) |
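The table shows the usual trade-off: raising the threshold lowers sensitivity and raises specificity. As a rough illustration of how a site might reason about the reported operating points, the Python sketch below picks the most specific point that still meets a minimum sensitivity. The selection rule, the `OperatingPoint` type, and the `pick_operating_point` function are assumptions made for illustration; the source does not say how operating points are chosen or configured. The data are the pneumothorax rows from the table above.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    threshold: float
    sensitivity: float  # percent, point estimate
    specificity: float  # percent, point estimate


# Pneumothorax point estimates copied from the table above.
PNEUMOTHORAX = [
    OperatingPoint(0.200, 97.1, 88.2),
    OperatingPoint(0.250, 96.2, 91.9),
    OperatingPoint(0.300, 95.0, 94.1),
    OperatingPoint(0.350, 93.1, 95.6),
    OperatingPoint(0.400, 90.7, 96.7),
]


def pick_operating_point(points: Sequence[OperatingPoint],
                         min_sensitivity: float) -> Optional[OperatingPoint]:
    """Return the most specific point whose sensitivity meets the floor.

    Returns None when no listed point reaches the requested sensitivity.
    """
    eligible = [p for p in points if p.sensitivity >= min_sensitivity]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p.specificity)
```

With a 95% sensitivity floor this rule selects the 0.300 threshold; tightening the floor to 96% selects 0.250. The sketch uses point estimates only and ignores the confidence intervals.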
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: The standalone performance evaluation was conducted on a total dataset of 3,252 cases.
- Data Provenance: The data were collected retrospectively and anonymized. Cases were collected consecutively from four US hospital network sites. The datasets covered a range of patient demographics (gender, age, ethnicity, race) and technical parameters (imaging equipment make and model), with equipment from Agfa, Carestream, Fujifilm, GE Healthcare, Kodak, Konica Minolta, McKesson, Philips, Siemens, and Varian.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: At least two ABR-certified radiologists were used for each de-identified case. A third radiologist was used in the event of disagreement.
- Qualifications: All truthers were US board-certified radiologists who interpret chest X-rays as part of their regular clinical practice and were protocol-trained.
4. Adjudication Method for the Test Set
The adjudication method used was 2+1 consensus. Each deidentified case was annotated by at least two ground truthers (radiologists), and consensus was determined by these two. In the event of disagreement between the first two, a third ground truther was used to resolve the discrepancy and establish the final ground truth.
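The 2+1 rule described above can be sketched in a few lines of Python. This is a simplified illustration that assumes exactly two primary reads with binary labels per finding; the function name `adjudicate` and its signature are hypothetical, and the source notes only that "at least two" ground truthers annotated each case.

```python
from typing import Optional


def adjudicate(reader1: bool, reader2: bool,
               reader3: Optional[bool] = None) -> bool:
    """Return the ground-truth label for one finding under 2+1 consensus."""
    if reader1 == reader2:
        # The two primary readers agree, so no third read is needed.
        return reader1
    if reader3 is None:
        raise ValueError("readers disagree; a third read is required")
    # The third reader breaks the tie.
    return reader3
```

For example, `adjudicate(True, False, reader3=True)` resolves a disagreement in favour of "finding present".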
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
The provided information does not indicate that a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done to assess how human readers improve with AI vs. without AI assistance. The study focuses on the standalone performance of the AI algorithm and its impact on triage effectiveness (turn-around time).
6. Standalone Performance (Algorithm Only without Human-in-the-Loop Performance)
Yes, a standalone performance evaluation was done. The key results table and associated metrics (AUC, Sensitivity, Specificity) are specifically for the device's AI algorithm independent of human intervention. The study describes "case-level output from the device was compared with a reference standard ('ground truth')", confirming a standalone evaluation.
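As a sketch of what comparing case-level output with ground truth involves, the Python function below computes sensitivity and specificity at one threshold. It assumes the device emits a per-case score and that a case is flagged when the score is at or above the threshold; the comparison direction, the function name, and the toy data in the usage note are illustrative assumptions, not details from the source.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float],
                            labels: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for case-level scores.

    labels holds the ground truth: True means the finding is present.
    """
    tp = fn = tn = fp = 0
    for score, present in zip(scores, labels):
        flagged = score >= threshold  # assumed flagging rule
        if present and flagged:
            tp += 1
        elif present:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

With made-up scores `[0.9, 0.8, 0.3, 0.1, 0.6, 0.2]`, labels `[True, True, True, False, False, False]`, and a threshold of 0.5, two of three positives are flagged and two of three negatives are not, so both values are 2/3.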
7. Type of Ground Truth Used
The ground truth used was expert consensus, established by multiple US board-certified radiologists using a 2+1 adjudication method.
8. Sample Size for the Training Set
The training dataset used to train the Convolutional Neural Network (CNN) algorithm contained over 750,000 chest X-ray imaging studies.
9. How the Ground Truth for the Training Set was Established
The studies in the training dataset were labelled by trained radiologists regarding the presence of the findings of interest. The document does not specify the exact number of radiologists or the specific consensus or adjudication method used for the training set, only that they were "trained radiologists".
§ 892.2080 Radiological computer aided triage and notification software.
(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).
(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.
(iv) Stand-alone performance testing protocols and results of the device.
(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use;
(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;
(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;
(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;
(v) Device operating instructions; and
(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.