K Number
K200921

Device Name
qER

Date Cleared
2020-06-17 (72 days)

Product Code
QAS

Regulation Number
892.2080

Panel
RA (Radiology)
Intended Use

qER is a radiological computer aided triage and notification software indicated for use in the analysis of non-contrast head CT images.

The device is intended to assist hospital networks and trained medical specialists in workflow triage by flagging the following suspected positive findings of pathologies in head CT images: intracranial hemorrhage, mass effect, midline shift and cranial fracture.

qER uses an artificial intelligence algorithm to analyze images on a standalone cloud-based application in parallel to the ongoing standard of care image interpretation. The user is presented with notifications for cases with suspected findings. Notifications include non-diagnostic preview images that are meant for informational purposes only. The device does not alter the original medical image and is not intended to be used as a diagnostic device.

The results of the device are intended to be used in conjunction with other patient information, and based on professional judgment, to assist with triage/prioritization of medical images. Notified clinicians are responsible for viewing full images per the standard of care.

Device Description

Qure.ai's head CT scan interpretation software, qER, is a deep-learning-based software device that analyzes head CT scans for signs of intracranial hemorrhage, midline shift, mass effect, or cranial fractures in order to prioritize them for clinical review. The standalone software device consists of an on-premise module and a cloud module. qER accepts non-contrast adult head CT scan DICOM files as input and provides a priority flag indicating critical scans. Additionally, the software provides a preview of critical scans to the medical specialist.
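The triage flow described above (per-finding analysis producing a single priority flag, with notification only) can be sketched as follows. This is a minimal illustration: the finding names, score format, and 0.5 threshold are hypothetical assumptions, not details taken from Qure.ai's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical per-finding classifier scores; names and threshold are
# illustrative only, not from the 510(k) summary.
TARGET_FINDINGS = ("intracranial_hemorrhage", "midline_shift",
                   "mass_effect", "cranial_fracture")

@dataclass
class TriageResult:
    flagged: bool     # True -> notify the clinician worklist
    suspected: tuple  # findings whose score crossed the threshold

def triage(scores: dict, threshold: float = 0.5) -> TriageResult:
    """Flag a scan if any target finding's score meets the threshold.

    The device only notifies: it does not alter the image and does not
    remove the case from the standard-of-care reading queue.
    """
    suspected = tuple(f for f in TARGET_FINDINGS
                      if scores.get(f, 0.0) >= threshold)
    return TriageResult(flagged=bool(suspected), suspected=suspected)
```

A flagged result would drive a worklist notification with a non-diagnostic preview; an unflagged scan simply proceeds through the standard reading queue.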

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study demonstrating that the qER device meets them, based on the provided text:

Acceptance Criteria and Device Performance

The core purpose of the qER device is workflow triage by identifying suspected positive findings of pathologies in head CT images. The performance data presented focuses on the device's ability to accurately detect these pathologies in a standalone setting.

1. Table of Acceptance Criteria and Reported Device Performance

The document does not state numerical acceptance criteria as explicit thresholds; it notes only that performance "exceeded the predefined success criteria, as well as the required performance criteria for triage and notification software as per the special controls for QAS." The reported sensitivities and specificities for each pathology therefore serve as the demonstrated "acceptance" level the device achieved.

| Abnormality | Acceptance Criteria (Implied Success) | Sensitivity % [95% CI] | Specificity % [95% CI] | AUC % [95% CI] |
|---|---|---|---|---|
| Intracranial Hemorrhage | High sensitivity & specificity for triage | 96.98 (95.32 - 98.17) | 93.92 (91.87 - 95.58) | 98.53 (98.00 - 99.15) |
| Cranial Fracture | High sensitivity & specificity for triage | 96.77 (93.74 - 98.60) | 92.72 (91.00 - 94.21) | 97.66 (96.88 - 98.57) |
| Mass Effect | High sensitivity & specificity for triage | 96.39 (94.28 - 97.88) | 96.00 (94.45 - 97.21) | 99.09 (98.73 - 99.52) |
| Midline Shift | High sensitivity & specificity for triage | 97.34 (95.30 - 98.67) | 95.36 (93.79 - 96.64) | 99.09 (98.74 - 99.51) |
| Any of the 4 target abnormalities | High sensitivity & specificity for triage | 98.53 (97.45 - 99.24) | 91.22 (88.39 - 93.55) | NA |
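The sensitivity and specificity figures above are binomial proportions reported with 95% confidence intervals. As an illustration, here is a minimal sketch of how such a point estimate and interval can be computed with the Wilson score method; the document does not state which CI method the study actually used, and the counts in the usage example are hypothetical.

```python
import math

def proportion_with_ci(successes: int, n: int, z: float = 1.96):
    """Point estimate and Wilson score interval for a binomial
    proportion, all expressed in percent (z = 1.96 for a 95% CI)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return 100 * p, 100 * (centre - half), 100 * (centre + half)

# Hypothetical counts: 96 true positives among 100 positive scans.
sens, sens_lo, sens_hi = proportion_with_ci(96, 100)
```

Sensitivity uses true positives over all positives (TP / (TP + FN)); specificity uses true negatives over all negatives (TN / (TN + FP)); both plug into the same helper.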

Additionally, a key performance metric for a triage device is the time to notification:

| Parameter | Acceptance Criteria (Implied improvement over std. care) | Mean [95% CI] | Median [95% CI] |
|---|---|---|---|
| Time to open exam in the standard of care | Benchmark for comparison | 65.54 (59.14 - 71.76) min | 60.01 (54.57 - 77.63) min |
| Time-to-notification with qER | Significantly lower than standard of care | 2.11 (1.45 - 2.61) min | 1.21 (1.12 - 1.25) min |
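Confidence intervals on a median, as in the table above, are commonly obtained by resampling. A minimal percentile-bootstrap sketch follows; the document does not say how these particular intervals were derived, and the sample times in the test are hypothetical.

```python
import random
import statistics

def bootstrap_ci(samples, stat=statistics.median,
                 n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap (1 - alpha) CI for a
    summary statistic over a list of observed times."""
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(samples, k=len(samples)))
                   for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return stat(samples), lo, hi
```

Passing `stat=statistics.mean` gives the corresponding interval for the mean time-to-notification.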

Study Details

2. Sample Size Used for the Test Set and Data Provenance

  • Sample Size: 1320 head CT scans.
  • Data Provenance: Retrospective, multicenter study. Data originated from multiple locations within the United States.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications

  • Number of Experts: 3 board-certified radiologists.
  • Qualifications: The document explicitly states "board-certified radiologists." No further details on years of experience are provided.

4. Adjudication Method for the Test Set

  • The text states that the ground truth was established by "3 board-certified radiologists reading the scans." It does not explicitly mention an adjudication method (e.g., 2+1, 3+1 consensus). It is implied that their readings defined the ground truth, but the process of resolving discrepancies among the three readers is not detailed.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done, and what was the effect size of how much human readers improve with AI vs without AI assistance

  • The provided text does not describe an MRMC comparative effectiveness study where human readers' performance with and without AI assistance was directly compared. The study primarily focuses on the standalone performance of the qER algorithm and its ability to reduce the "time to notification" compared to standard of care "time to open." While the "time-to-notification" analysis suggests a significant workflow improvement when using qER for triage (2.11 mins vs. 65.54 mins), this is not a direct measure of human reader diagnostic accuracy improvement with AI assistance.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done

  • Yes, a standalone performance study was done. The "Performance Data" section explicitly states, "A retrospective, multicenter, blinded clinical study was conducted to test the accuracy of qER at triaging head CT scans... Sensitivity and specificity exceeded the predefined success criteria... demonstrating the ability of the qER device to effectively triage studies containing one of these conditions." The results in Table 2 are for the qER algorithm's accuracy independently.

7. The Type of Ground Truth Used

  • Expert Consensus: The ground truth for the pathologies (Intracranial hemorrhage, cranial fractures, mass effect, midline shift, and absence of these abnormalities) was established by "3 board-certified radiologists reading the scans." This indicates an expert consensus approach to defining the ground truth.
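A simple majority vote over the three readers is one plausible reading of "3 board-certified radiologists reading the scans," though the document does not confirm the adjudication mechanics. A minimal sketch under that assumption:

```python
from collections import Counter

def consensus_label(reads):
    """Majority vote over one finding's reader labels.

    With three readers and a binary finding, a 2-of-3 majority always
    exists, so no further adjudication step is modeled here.
    """
    return Counter(reads).most_common(1)[0][0]

def ground_truth(case_reads):
    """Map each scan id to its majority label across readers."""
    return {scan: consensus_label(reads)
            for scan, reads in case_reads.items()}
```

In practice each of the four target abnormalities would get its own per-scan consensus label, yielding the reference standard against which sensitivity and specificity are scored.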

8. The Sample Size for the Training Set

  • The document does not specify the sample size used for the training set. It mentions that the qER software uses "a pre-trained artificial intelligence algorithm" and "a pre-trained classification convolutional neural network (CNN) that has been trained to detect a specific abnormality from head CT scan images." However, the size of the dataset used for this training is not disclosed in the provided text.

9. How the Ground Truth for the Training Set was Established

  • The document does not explicitly describe how the ground truth for the training set was established. It only states that the CNN was "pre-trained" on medical images to detect specific abnormalities. It is common practice for such training to also rely on expert annotations, but this is not detailed for the training set in this document.

§ 892.2080 Radiological computer aided triage and notification software.

(a) Identification. Radiological computer aided triage and notification software is an image processing prescription device intended to aid in prioritization and triage of radiological medical images. The device notifies a designated list of clinicians of the availability of time sensitive radiological medical images for review based on computer aided image analysis of those images performed by the device. The device does not mark, highlight, or direct users' attention to a specific location in the original image. The device does not remove cases from a reading queue. The device operates in parallel with the standard of care, which remains the default option for all cases.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the notification and triage algorithms and all underlying image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, how the algorithm affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide effective triage (e.g., improved time to review of prioritized images for pre-specified clinicians).

(iii) Results from performance testing that demonstrate that the device will provide effective triage. The performance assessment must be based on an appropriate measure to estimate the clinical effectiveness. The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, associated diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals for these individual subsets can be characterized with the device for the intended use population and imaging equipment.

(iv) Stand-alone performance testing protocols and results of the device.

(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use;

(ii) A detailed description of the intended user and user training that addresses appropriate use protocols for the device;

(iii) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality for certain subpopulations), as applicable;

(iv) A detailed description of compatible imaging hardware, imaging protocols, and requirements for input images;

(v) Device operating instructions; and

(vi) A detailed summary of the performance testing, including: test methods, dataset characteristics, triage effectiveness (e.g., improved time to review of prioritized images for pre-specified clinicians), diagnostic accuracy of algorithms informing triage decision, and results with associated statistical uncertainty (e.g., confidence intervals), including a summary of subanalyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.