Annalise Enterprise CXR Triage Trauma is a software workflow tool designed to aid the clinical assessment of adult chest X-ray cases with features suggestive of vertebral compression fracture in the medical care environment.
The device analyzes cases using an artificial intelligence algorithm to identify findings. It makes case-level output available to a PACS or RIS for worklist prioritization or triage intended for clinicians in Bone Health and Fracture Liaison Service programs.
The device is intended to be used by trained clinicians who are qualified to interpret chest X-rays as part of their scope of practice.
The device is not intended to direct attention to specific portions of an image or to anomalies other than vertebral compression fracture.
Its results are not intended to be used on a standalone basis for clinical decision making, nor are they intended to rule out specific critical findings or otherwise preclude clinical assessment of X-ray cases.
Standalone performance evaluation of the device was performed on a dataset that included only erect positioning. Use of this device with supine positioning may result in differences in performance.
Annalise Enterprise CXR Triage Trauma is a software workflow tool which uses an artificial intelligence (AI) algorithm to identify suspected findings on chest X-ray (CXR) studies in the medical care environment. The findings identified by the device include vertebral compression fractures.
Radiological findings are identified by the device using an AI algorithm: a convolutional neural network trained using deep-learning techniques. Images used to train the algorithm were sourced from datasets across three continents, covering a range of equipment manufacturers and models. The performance of the device's AI algorithm was validated in a standalone performance evaluation, in which the case-level output from the device was compared with a reference standard ('ground truth'). The reference standard was determined by two ground truthers, with a third truther used in the event of disagreement; all truthers were US board-certified radiologists.
The device interfaces with image and order management systems (such as PACS/RIS) to obtain CXR studies for processing by the AI algorithm. Following processing, if the clinical finding of interest is identified in a CXR study, the device sends a notification to the image and order management system so that the study can be prioritized in the worklist. This enables users to review studies containing features suggestive of the clinical finding earlier than in the standard clinical workflow. Note that the device never decreases a study's existing priority: AI results can raise a worklist item's priority but never downgrade it.
The device workflow is performed parallel to and in conjunction with the standard clinical workflow for interpretation of CXRs. The device is intended to aid in prioritization and triage of radiological medical images only.
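The "raise but never downgrade" prioritization rule described above can be sketched as a simple function. This is an illustrative assumption of how such logic might look, not the vendor's actual implementation; the `Priority` levels and function name are hypothetical:

```python
from enum import IntEnum

class Priority(IntEnum):
    """Hypothetical worklist priority levels (higher = reviewed sooner)."""
    ROUTINE = 0
    ELEVATED = 1
    STAT = 2

def apply_ai_triage(current: Priority, finding_suspected: bool,
                    ai_priority: Priority = Priority.ELEVATED) -> Priority:
    """Return the study's new worklist priority.

    A positive AI result can only raise the priority (to `ai_priority`);
    it never lowers an existing priority, mirroring the rule that worklist
    items are never downgraded based on AI results.
    """
    if finding_suspected:
        return max(current, ai_priority)
    return current
```

Taking the maximum of the current and AI-suggested priorities guarantees monotonicity: a STAT study stays STAT even if the AI flags it at a lower level.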
The provided text describes the Annalise Enterprise CXR Triage Trauma device, an AI-powered software tool designed to aid in the clinical assessment and triage of adult chest X-ray cases for vertebral compression fracture. Here's a breakdown of its acceptance criteria and the study supporting its performance:
Acceptance Criteria and Reported Device Performance
| Finding | Acceptance Criteria (Metric) | Reported Device Performance |
|---|---|---|
| Vertebral compression fracture | AUC (Area Under the Curve) | 0.954 (95% CI: 0.939-0.968) |
| Vertebral compression fracture | Sensitivity (Se) at operating point 0.3849 | 89.3% (85.7-93.0%) |
| Vertebral compression fracture | Specificity (Sp) at operating point 0.3849 | 89.0% (85.8-92.1%) |
| Vertebral compression fracture | Sensitivity (Se) at operating point 0.4834 | 85.3% (80.9-89.3%) |
| Vertebral compression fracture | Specificity (Sp) at operating point 0.4834 | 90.9% (87.7-94.0%) |
| Triage Turn-Around Time | Demonstrates effective triage (implicitly compared to predicate device) | Average 30.0 seconds |
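Metrics like those above follow mechanically from per-case model scores once an operating point is fixed. The sketch below is illustrative only (the function names are assumptions, and the Wilson score interval is shown as one common choice of confidence interval; the submission does not state which method the sponsor used). It shows how a threshold such as 0.3849 binarizes continuous scores into case-level outputs, and how sensitivity, specificity, and a 95% CI are computed:

```python
import math

def confusion_at_threshold(scores, labels, threshold):
    """Binarize model scores at an operating point and count TP/FP/TN/FN."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold  # case flagged positive at this operating point
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sens_spec(tp, fp, tn, fn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half
```

Lowering the threshold (0.4834 → 0.3849) trades specificity for sensitivity, which is consistent with the two operating points reported in the table.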
Study Details
- Sample Size and Data Provenance:
- Test Set Sample Size: 589 CXR cases (272 positive for vertebral compression fracture, 317 negative).
- Data Provenance: Retrospective, anonymized study; cases were collected consecutively from four U.S. hospital network sites spanning a variety of data sources and geographical locations.
- Number of Experts and Qualifications for Ground Truth:
- Number of Experts: At least two ABR-certified radiologists for initial annotation, with a third radiologist for disagreement resolution.
- Qualifications: All truthers were U.S. board-certified radiologists who were protocol-trained.
- Adjudication Method for Test Set:
- Consensus was determined by two ground truthers. A third ground truther was used in the event of disagreement. This is commonly referred to as a 2+1 adjudication method.
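The 2+1 adjudication rule can be expressed compactly. This is a minimal sketch of the general method, assuming binary (present/absent) reads; the function and parameter names are illustrative, not from the submission:

```python
def adjudicate_2plus1(reader1: bool, reader2: bool, tiebreaker=None) -> bool:
    """2+1 adjudication: two primary readers label a case independently;
    a third reader is consulted only when the first two disagree.

    `tiebreaker` is a zero-argument callable (e.g. a deferred third read)
    invoked solely on disagreement, so the third reader's label is only
    obtained when it is actually needed.
    """
    if reader1 == reader2:
        return reader1  # consensus between the two primary readers
    if tiebreaker is None:
        raise ValueError("Readers disagree; a third reader is required.")
    return tiebreaker()
```

Under this scheme the ground-truth label is always a majority of at most three protocol-trained readers.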
- Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- No MRMC comparative effectiveness study was explicitly mentioned or detailed in the provided text regarding how human readers improve with AI vs without AI assistance. The study described is primarily a standalone performance evaluation of the AI algorithm. The device's role is described as a "workflow tool" for worklist prioritization or triage, implying indirect assistance, rather than direct human-AI collaborative interpretation.
- Standalone Performance:
- Yes, a standalone (algorithm only without human-in-the-loop) performance evaluation was done. The performance results (AUC, Sensitivity, Specificity) listed in the table above pertain to this standalone evaluation.
- Type of Ground Truth Used:
- Expert consensus (blinded annotations by ABR-certified radiologists with adjudication).
- Training Set Sample Size:
- The exact sample size for the training set is not explicitly stated. However, it is mentioned that "Images used to train the algorithm were sourced from datasets across three continents, including a range of equipment manufacturers and models." The test dataset was "newly acquired and independent from the training dataset used in model development."
- How Ground Truth for Training Set Was Established:
- The document states that the AI algorithm was a "convolutional neural network trained using deep-learning techniques." While it mentions the source of the training data (datasets across three continents), it does not explicitly detail the method for establishing ground truth for the training set (e.g., expert review, pathology, or other means). The focus of the provided text is on the validation of the test set performance.
§ 892.2080 Radiological computer aided triage and notification software.
(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.