icobrain aria is a computer-assisted detection (CADe) and diagnosis (CADx) software device to be used as a concurrent reading aid to help trained radiologists in the detection, assessment and characterization of Amyloid Related Imaging Abnormalities (ARIA) from a set of brain MR images. The software provides information about the presence, location, size, severity and changes of ARIA-E (brain edema or sulcal effusions) and ARIA-H (hemosiderin deposition, including microhemorrhage and superficial siderosis). Patient management decisions should not be made solely on the basis of analysis by icobrain aria.
icobrain aria is a software-only device for assisting radiologists with the detection of amyloid-related imaging abnormalities (ARIA) on brain MRI scans of Alzheimer's disease patients undergoing amyloid beta-directed antibody therapy. The device utilizes 2D fluid-attenuated inversion recovery (FLAIR) for the detection of ARIA-E (edema/sulcal effusion) and 2D T2* gradient echo (T2*-GRE) for the detection of ARIA-H (hemosiderin deposition).
icobrain aria automatically processes input brain MRI scans in DICOM format from two time points and generates annotated DICOM images and an electronic report.
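As a rough illustration of the two-time-point input structure described above, the sketch below loads a baseline and a follow-up DICOM series into NumPy volumes using pydicom. The directory layout, the sorting by InstanceNumber, and the intensity-difference stand-in for the vendor's ARIA analysis are assumptions made for illustration only, not icometrix's actual pipeline.

```python
from pathlib import Path

import numpy as np
import pydicom


def load_series(series_dir: Path) -> np.ndarray:
    """Read all DICOM slices in a directory and stack them into a 3D volume.

    Slices are ordered by InstanceNumber; a real pipeline would typically sort
    by ImagePositionPatient, but InstanceNumber keeps the sketch short.
    """
    slices = [pydicom.dcmread(p) for p in sorted(series_dir.glob("*.dcm"))]
    slices.sort(key=lambda ds: int(ds.InstanceNumber))
    return np.stack([ds.pixel_array for ds in slices])


# Hypothetical folder layout: one directory per time point and sequence.
baseline_flair = load_series(Path("baseline/flair"))
followup_flair = load_series(Path("followup/flair"))

# The vendor's ARIA detection model would run here; a simple intensity
# difference stands in for it purely to show the two-time-point structure.
change_map = followup_flair.astype(np.float32) - baseline_flair.astype(np.float32)
print("FLAIR volume shape:", followup_flair.shape, "max change:", change_map.max())
```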
Here is a summary of the acceptance criteria and of the studies demonstrating that the device meets them, based on the provided text:
icobrain aria: Acceptance Criteria and Performance Study Summary
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are not explicitly listed in a single, dedicated table with pass/fail thresholds. Instead, they are implicitly defined by the statistically significant improvements demonstrated in the clinical (MRMC) study, and the "in line with human experts" conclusion from standalone performance. The document focuses on showing the effect size of the improvement rather than pre-defined absolute thresholds for sensitivity, specificity, or AUC for human-AI combined performance. For standalone metrics, it reports specific values and concludes they are "in line with the performance of human experts," suggesting the internal acceptance criteria were met.
Therefore, the table below will summarize the reported performance results from the clinical study, which implicitly met the acceptance criteria by demonstrating significant improvement over unassisted reading.
Performance Metric | Acceptance Criteria (Implicit, based on study outcomes) | Reported Device Performance (Assisted) | Reported Device Performance (Unassisted) | Result |
---|---|---|---|---|
ARIA-E Detection (AUC) | Significant improvement over unassisted reading | 0.873 (95% CI [0.835, 0.911]) | 0.822 | Significant Improvement (+0.051 AUC, p=0.001) |
ARIA-E Detection (Sensitivity) | Increase over unassisted reading | 86.5% | 70.9% | Significant Increase |
ARIA-E Detection (Specificity) | Maintain above 80% with assisted reading | 83.0% | 91.7% | Maintained above 80% (slight decrease compared to unassisted, but still high) |
Pooled ARIA-H Detection (AUC) | Significant improvement over unassisted reading | 0.825 (95% CI [0.781, 0.869]) | 0.781 | Significant Improvement (+0.044 AUC, p=0.001) |
Pooled ARIA-H Detection (Sensitivity) | Increase over unassisted reading | 79.0% | 68.7% | Significant Increase |
Pooled ARIA-H Detection (Specificity) | Maintain above 80% with assisted reading | 80.3% | 82.8% | Maintained above 80% (slight decrease compared to unassisted, but still high) |
ARIA-H Microhemorrhages Detection (AUC) | Significant improvement over unassisted reading | 0.808 (95% CI [0.760, 0.855]) | 0.779 | Significant Improvement (+0.029 AUC, p=0.032) |
ARIA-H Microhemorrhages Detection (Sensitivity) | Increase over unassisted reading | 79.6% | 69.3% | Significant Increase |
ARIA-H Microhemorrhages Detection (Specificity) | Maintain above 80% with assisted reading | 76.7% | 83.1% | Below 80% for this specific subtype |
ARIA-H Superficial Siderosis Detection (AUC) | Significant improvement over unassisted reading | 0.784 (95% CI [0.732, 0.836]) | 0.721 | Significant Improvement (+0.063 AUC, p=0.003) |
ARIA-H Superficial Siderosis Detection (Sensitivity) | Increase over unassisted reading | 59.9% | 49.7% | Significant Increase |
ARIA-H Superficial Siderosis Detection (Specificity) | Maintain above 80% with assisted reading | 95.6% | 92.7% | Maintained and improved |
Localization Performance | Significant improvement in accuracy for spatial distribution | Significantly better for assisted reads | N/A | Met |
ARIA Severity Measurement Accuracy | Significantly lower absolute differences vs. ground truth | Significantly lower assisted vs. unassisted | N/A | Met |
Inter-reader Variability (Kendall's Coefficient of Concordance) | Significantly lower variability (i.e., higher concordance) for assisted reads | ARIA-E: 0.809 (assisted) vs. 0.720 (unassisted); ARIA-H: 0.799 (assisted) vs. 0.656 (unassisted) | N/A | Significant Reduction in Variability |
Reading Time | Faster with assisted reading | Median 2:21 min (assisted) | Median 2:34 min (unassisted) | Faster |
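For orientation, the AUC values and confidence intervals in the table are the kind of quantity that can be reproduced from case-level reader scores. The sketch below computes a single reader's AUC with a percentile-bootstrap 95% CI on synthetic data; it is a simplified stand-in, not the multi-reader statistical analysis used in the study, and every value in it is made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy data standing in for one reader's case-level ARIA-E scores:
# y_true is the reference standard (1 = ARIA present), y_score the reader's
# confidence rating while using the device.
y_true = rng.integers(0, 2, size=199)
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, size=199), 0, 1)

auc = roc_auc_score(y_true, y_score)

# Percentile bootstrap over cases for a rough 95% CI (an actual MRMC analysis
# would account for reader and case variability jointly, unlike this shortcut).
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) < 2:
        continue  # skip resamples containing only one class
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
ci = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {auc:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```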
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: 199 cases.
- Data Provenance: MRI datasets from subjects diagnosed with Alzheimer's disease. To guarantee independence, test data subjects were not included in the training set.
- Country of Origin: More than 100 sites in 20 countries. Approximately half the data originated from the US and the other half from outside the US.
- Retrospective/Prospective: The study used retrospective data from clinical trials (aducanumab clinical trials PRIME (NCT02677572), EMERGE (NCT02484547), and ENGAGE (NCT02477800)). This data provenance applies to both training and testing datasets.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: A consensus of 3 experts was used for the clinical (MRMC) study ground truth. For standalone testing, the ground truth was established by unspecified "expert neuroradiologists."
- Qualifications of Experts:
- Clinical Study (MRMC): Experts who performed "safety ARIA reading in clinical trials for Aβ-directed antibody therapies in AD."
- Standalone Testing: "expert neuroradiologists (with experience performing safety ARIA reading in clinical trials for Aβ-directed antibody therapies in AD) manually segmented both ARIA-H findings." This indicates they had prior, relevant experience.
4. Adjudication Method for the Test Set
- Adjudication Method: "A consensus of 3 experts" was used to establish the ground truth for the clinical (MRMC) study. The specific consensus method (e.g., majority vote, discussion to agreement) is not detailed, but the term "consensus" implies a collective agreement process.
5. If a Multi Reader Multi Case (MRMC) Comparative Effectiveness Study was Done, and Effect Size of Improvement
- MRMC Study Done: Yes, a fully-crossed MRMC retrospective reader study was conducted.
- Effect Size (AUC difference, Assisted vs. Unassisted):
- ARIA-E Detection: +0.051 AUC (95% CI [0.020, 0.083]), p=0.001
- Pooled ARIA-H Detection: +0.044 AUC (95% CI [0.017, 0.070]), p=0.001
- ARIA-H Microhemorrhages: +0.029 AUC (95% CI [0.002, 0.055]), p=0.032
- ARIA-H Superficial Siderosis: +0.063 AUC (95% CI [0.023, 0.102]), p=0.003
Readers also showed significant increases in sensitivity, significant decreases in inter-reader variability, and were on average faster when assisted.
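Kendall's coefficient of concordance (W), used above to quantify inter-reader variability, can be computed directly from a readers-by-cases matrix of scores. The sketch below is a minimal implementation on toy ratings; it omits the tie correction and the significance testing that a formal analysis would apply.

```python
import numpy as np
from scipy.stats import rankdata


def kendalls_w(scores: np.ndarray) -> float:
    """Kendall's coefficient of concordance W for a readers x cases score matrix.

    W ranges from 0 (no agreement) to 1 (perfect agreement); higher assisted
    values therefore correspond to lower inter-reader variability.
    """
    m, n = scores.shape                        # m readers, n cases
    ranks = np.vstack([rankdata(row) for row in scores])
    rank_sums = ranks.sum(axis=0)              # rank sum R_i per case
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))  # tie correction omitted


# Toy severity ratings from 3 hypothetical readers on 6 cases.
ratings = np.array([[1, 2, 3, 3, 4, 5],
                    [1, 2, 2, 3, 4, 5],
                    [2, 1, 3, 3, 5, 4]])
print(f"Kendall's W = {kendalls_w(ratings):.3f}")
```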
6. If a Standalone (i.e. Algorithm only without human-in-the-loop performance) was Done
- Standalone Study Done: Yes, "icometrix conducted standalone performance assessments."
- Standalone Performance Highlights (Main Test Set on 199 cases):
- ARIA-E Diagnosis: Sensitivity 0.94, Specificity 0.67, AUC 0.84
- ARIA-H Diagnosis: Sensitivity 0.87, Specificity 0.66, AUC 0.81
- ARIA-E Finding-level: True Positive Rate 69.1%, False Positive findings per case 0.7
- ARIA-H New Microhemorrhages Finding-level: True Positive Rate 66.1%, False Positive findings per case 0.9
- ARIA-H New Superficial Siderosis Finding-level: True Positive Rate 62.5%, False Positive findings per case 0.1
- The document concludes that standalone performance was "in line with the performance of human experts."
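The finding-level metrics reported above (true positive rate and false positive findings per case) depend on a rule for matching detected findings to ground-truth findings. The sketch below uses a hypothetical centroid-distance criterion on made-up coordinates purely to show how such metrics are tallied; the actual matching rule used in the standalone assessment is not described in the source.

```python
import numpy as np

# Hypothetical per-case findings: each finding is a 3D centroid in mm.
gt = {
    "case_1": [np.array([10.0, 22.0, 31.0]), np.array([40.0, 8.0, 15.0])],
    "case_2": [np.array([5.0, 5.0, 5.0])],
}
pred = {
    "case_1": [np.array([11.0, 21.0, 30.0]), np.array([90.0, 90.0, 90.0])],
    "case_2": [],
}

MATCH_MM = 10.0  # assumed matching radius; the real criterion is not given

tp = fp = fn = 0
for case, gt_findings in gt.items():
    matched = set()
    for p in pred.get(case, []):
        dists = [np.linalg.norm(p - g) for g in gt_findings]
        hits = [i for i, d in enumerate(dists) if d <= MATCH_MM and i not in matched]
        if hits:
            matched.add(hits[0])
            tp += 1
        else:
            fp += 1
    fn += len(gt_findings) - len(matched)

tpr = tp / (tp + fn)
fp_per_case = fp / len(gt)
print(f"finding-level TPR = {tpr:.1%}, FP per case = {fp_per_case:.1f}")
```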
7. The Type of Ground Truth Used
- Ground Truth Type: Expert consensus for the clinical study (MRMC) and expert manual annotations for the standalone testing.
- Details: For standalone testing, "expert neuroradiologists ... manually segmented both ARIA-E and ARIA-H findings. Ground truth ARIA measurements were derived from the expert manual annotated masks." For the MRMC study, ground truth was obtained via "a consensus of 3 experts."
8. The Sample Size for the Training Set
- Training Set Sample Size:
- FLAIR images (for ARIA-E): 475 image pairs from 172 subjects.
- T2*-GRE images (for ARIA-H): 326 image pairs from 177 subjects.
9. How the Ground Truth for the Training Set Was Established
- Ground Truth Establishment for Training Set: The data used for developing the algorithms "have been manually annotated by expert neuroradiologists with prior experience of reading ARIA in clinical trials of amyloid beta-directed antibody drugs." This implies manual annotation by experts served as the ground truth for training.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.
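For reference, the diagnostic accuracy measures named in (b)(1)(iii) and (b)(1)(iv) reduce to simple ratios over a 2x2 confusion table. The sketch below computes them from arbitrary example counts; it is illustrative only and not tied to the icobrain aria results.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard accuracy measures from a 2x2 confusion table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),   # positive diagnostic likelihood ratio
        "lr_negative": (1 - sens) / spec,   # negative diagnostic likelihood ratio
    }


# Example with arbitrary counts chosen only to exercise the formulas.
for name, value in diagnostic_measures(tp=86, fp=17, fn=14, tn=82).items():
    print(f"{name}: {value:.2f}")
```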