
510(k) Data Aggregation

    K Number: K231068
    Device Name: autoSCORE
    Manufacturer: Holberg EEG AS
    Date Cleared: 2024-01-07 (268 days)
    Product Code:
    Regulation Number: 882.1400
    Reference Devices: K171720, K151929

    Intended Use
    1. autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
    2. The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old. The autoSCORE component has not been assessed for intracranial recordings.
    3. autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
    4. This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
    Device Description

    autoSCORE is a software-only decision support product intended to be used with compatible electroencephalography (EEG) review software. It is intended to assist the user when reviewing EEG recordings by assessing the probability that previously acquired sections of EEG recordings contain abnormalities, and classifying these into pre-defined types of abnormality. autoSCORE sends this information to the EEG software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset.

    autoSCORE also provides an overview of the probability that EEG recordings and sections of EEG recordings include abnormalities, and which type(s) of abnormality they include. This is performed by identifying spikes of epileptiform abnormalities (Focal Epileptiform and Generalized Epileptiform) as well as identifying non-epileptiform abnormalities (Focal Non-epileptiform and Diffuse Non-epileptiform). The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease. autoSCORE cannot detect or classify seizures. The recorded EEG activity is not altered by the information provided by autoSCORE. autoSCORE is not intended to provide information for diagnosis but to assist clinical workflow when using the EEG software.
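
    To make the described integration concrete, here is a minimal sketch of the kind of recording-level payload a decision-support engine like autoSCORE might hand to compatible EEG review software. All names, fields, and types below are illustrative assumptions, not Holberg EEG's actual interface:

```python
from dataclasses import dataclass, field

# Hypothetical payload for an engine that has no UI of its own and must tell
# the EEG review software where to place abnormality markers. The four
# abnormality categories mirror those named in the device description.
ABNORMALITY_TYPES = (
    "focal_epileptiform",
    "generalized_epileptiform",
    "focal_non_epileptiform",
    "diffuse_non_epileptiform",
)

@dataclass
class AbnormalityMarker:
    """One marker to be rendered by the EEG review software."""
    onset_s: float        # offset into the recording, in seconds
    duration_s: float     # length of the marked span, in seconds
    channels: list[str]   # scalp electrodes involved, e.g. ["F3", "C3"]
    abnormality: str      # one of ABNORMALITY_TYPES
    probability: float    # model confidence in [0, 1]

@dataclass
class RecordingAssessment:
    """Recording-level summary passed alongside the individual markers."""
    recording_id: str
    p_abnormal: float            # overall probability the EEG is abnormal
    p_by_type: dict[str, float]  # per-category probabilities
    markers: list[AbnormalityMarker] = field(default_factory=list)
```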

    AI/ML Overview

    The FDA 510(k) summary for Holberg EEG AS's autoSCORE device provides extensive information on its acceptance criteria and on the study demonstrating that those criteria were met. Here's a breakdown of the requested information:

    Acceptance Criteria and Device Performance for autoSCORE

    The acceptance criteria for autoSCORE are established by its performance relative to human expert assessments and predicate devices. The device is intended to assist medical practitioners in the review, monitoring, and analysis of EEG recordings by identifying and classifying abnormalities, particularly epileptiform and non-epileptiform abnormalities.

    1. Table of Acceptance Criteria and Reported Device Performance

    The acceptance criteria are implicitly defined by the performance metrics (Sensitivity, Specificity, PPV, NPV, Correlation Coefficient) shown to be comparable to or exceeding those of human experts or predicate devices. Since specific numeric thresholds for acceptance are not explicitly stated, the reported performance metrics are presented as evidence of meeting acceptable clinical performance.
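
    For orientation, the four agreement metrics in Table 1 are standard confusion-matrix quantities. A minimal sketch, assuming one binary algorithm call and one binary expert reference label per recording (with both classes present, so no denominator is zero):

```python
def recording_level_metrics(algo: list[bool], truth: list[bool]) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV for binary recording-level labels.

    algo[i]  -- True if the algorithm called recording i abnormal.
    truth[i] -- True if the expert reference standard called it abnormal.
    """
    tp = sum(a and t for a, t in zip(algo, truth))
    tn = sum(not a and not t for a, t in zip(algo, truth))
    fp = sum(a and not t for a, t in zip(algo, truth))
    fn = sum(not a and t for a, t in zip(algo, truth))
    return {
        "sensitivity": tp / (tp + fn),  # abnormal EEGs the algorithm flagged
        "specificity": tn / (tn + fp),  # normal EEGs the algorithm passed
        "ppv": tp / (tp + fp),          # flagged EEGs that are truly abnormal
        "npv": tn / (tn + fn),          # passed EEGs that are truly normal
    }
```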

    Table 1: Reported Performance of autoSCORE (Summarized from document)

    Dataset (recording-level) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Correlation Coeff.
    Normal/Abnormal, all ages (Part 2, n=100) | 100 | 88.4 [77.8, 97.4] | 92.0 [84.5, 98.3] | 100 | 0.96
    Normal/Abnormal, all ages (Part 1, n=4850) | 83.1 [81.3, 84.8] | 91.8 [90.8, 92.8] | 84.9 [83.2, 86.6] | 90.8 [89.8, 91.8] | 0.99
    Normal/Abnormal, all ages (Part 5, n=1315) | 87.8 [85.0, 90.5] | 89.4 [87.2, 91.6] | 86.0 [83.0, 88.8] | 90.9 [88.8, 92.9] | 0.99
    Focal epileptiform (Part 2, n=100) | 73.9 [54.5, 91.3] | 88.3 [80.8, 94.9] | 65.4 [45.8, 83.3] | 91.9 [85.1, 97.4] | 0.85
    Generalized epileptiform (Part 2, n=100) | 100 [100, 100] | 94.1 [88.6, 98.8] | 75.1 [54.5, 93.8] | 100 [100, 100] | 0.83
    Diffuse non-epileptiform (Part 2, n=100) | 87.5 [72.7, 100] | 82.8 [74.0, 90.9] | 61.7 [44.8, 78.0] | 95.5 [89.7, 100] | 0.93
    Focal non-epileptiform (Part 2, n=100) | 61.5 [42.1, 80] | 93.2 [86.8, 98.6] | 76.1 [56.2, 94.1] | 87.4 [79.5, 94.1] | 0.84
    Epileptiform, autoSCORE vs. predicate (Part 3, n=100) | 90.0 [77.8, 100] | 87.1 [78.8, 94.4] | 75.0 [60.0, 88.9] | 95.3 [89.5, 100] | N/A
    Epileptiform, autoSCORE vs. predicate (Part 4, n=58) | 93.3 [83.3, 100] | 96.4 [88.0, 100] | 96.6 [88.5, 100] | 93.1 [82.6, 100] | N/A

    Note: For detailed confidence intervals and marker-level performance, refer to Tables 4, 5, 6, 7, and 8 in the original document.
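
    The document does not say how the bracketed confidence intervals were computed. One common choice for recording-level metrics is a nonparametric percentile bootstrap over recordings; the sketch below assumes that method (and the recording_level_metrics helper above) purely for illustration:

```python
import random

def bootstrap_ci(algo, truth, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a recording-level metric.

    `metric` maps (algo, truth) label lists to a float. The bootstrap here
    is an assumed method, not necessarily the one used in the submission;
    resamples that drop one class entirely would need guarding in real use.
    """
    rng = random.Random(seed)
    n = len(algo)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([algo[i] for i in idx], [truth[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# e.g. a 95% interval for sensitivity:
# lo, hi = bootstrap_ci(algo, truth,
#                       lambda a, t: recording_level_metrics(a, t)["sensitivity"])
```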

    2. Sample Sizes Used for the Test Set and Data Provenance

    The clinical validation was performed across five separate datasets:

    • Part 1 (Single-Center): 4,850 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in a hospital setting. Retrospective.
    • Part 2 (Multi-Center): 100 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in various hospital settings. Retrospective.
    • Part 3 (Direct Comparison to Primary Predicate): Same 100 EEGs as Part 2. Retrospective.
    • Part 4 (Benchmarking against Primary and Secondary Predicates): 58 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment. Retrospective.
    • Part 5 (Hold-out Dataset, Two Centers): 1,315 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in two hospital settings. Retrospective.

    None of the EEGs used in the validation were used for the development of the AI model. The document does not explicitly state the country of origin for the data, but the company address in Norway suggests a European origin.

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts

    • Part 1 & 5: Ground truth established by multiple Human Experts (HEs), with a single HE reviewer per EEG.
      • Part 1: 9 HEs, each assessing more than 1% of the EEGs.
      • Part 5: 15 HEs, each assessing more than 1% of the EEGs.
      • Qualifications: "Qualified medical practitioners" or "neurologists" who exercise professional judgment. In Parts 1 and 5, their assessments were part of "routine EEG assessment in their respective hospitals," implying they are experienced clinicians.
    • Part 2 & 3: Ground truth established by HE consensus.
      • Part 2 & 3: 11 independent HEs reviewed 100 EEGs.
      • Qualifications: "Independent human experts." Implied to be qualified clinical practitioners.
    • Part 4: Ground truth established by HE consensus.
      • Part 4: 3 HEs.
      • Qualifications: "HEs." Implied to be qualified clinical practitioners.

    4. Adjudication Method for the Test Set

    • Parts 1 & 5 (recording and marker level): Each EEG's ground truth came from a single HE reviewer. Multiple HEs contributed across the dataset, but no per-EEG adjudication was performed, so this is effectively a single-reader (no-adjudication) design for each individual EEG.
    • Parts 2 & 3 (recording level): Ground truth was based on the consensus of 11 HEs, who assessed whether each EEG was normal/abnormal and which specific abnormality categories it contained. This implies a majority or agreement-based adjudication among the 11 experts. The granularity of the probability grouping was 9 percentage points.
    • Part 4 (recording and marker level): Ground truth was the majority consensus score of 3 HEs (see the sketch after this list).
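
    As a concrete illustration of the majority-consensus rule described for Part 4 (and, in spirit, Parts 2 and 3), here is a minimal sketch; the tie-break behavior is an assumption, since the document does not describe one:

```python
from collections import Counter

def majority_consensus(labels: list[str]) -> str | None:
    """Majority-vote consensus across expert labels for one recording.

    Returns the label chosen by a strict majority of experts, or None when
    no strict majority exists (the document does not describe a tie-break).
    """
    (label, votes), = Counter(labels).most_common(1)
    return label if votes > len(labels) / 2 else None

# e.g. three experts scoring one EEG, as in Part 4:
assert majority_consensus(["abnormal", "abnormal", "normal"]) == "abnormal"
assert majority_consensus(["focal_epi", "gen_epi", "normal"]) is None
```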

    5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, What Was the Effect Size of Human Reader Improvement with AI vs. Without AI Assistance?

    The study described is a direct comparison of autoSCORE's performance against human experts and predicate devices, i.e., an evaluation of the AI's standalone performance rather than of how much human readers improve when assisted by AI. The document states that autoSCORE is a "decision support product intended to be used with compatible electroencephalography (EEG) review software" and that the "user is required to review the EEG and exercise their clinical judgement to independently make a conclusion." However, it does not present an MRMC comparative effectiveness study quantifying reader improvement with versus without AI assistance.

    6. Was a Standalone Study (i.e., Algorithm Only, Without Human-in-the-Loop Performance) Done?

    Yes, the study primarily assesses the standalone performance of the autoSCORE algorithm by comparing its outputs directly against human expert assessments (considered the ground truth) and outputs from predicate devices. The tables summarizing sensitivity, specificity, PPV, NPV, and correlation coefficients directly reflect the algorithm's performance.
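
    A sketch of what such a standalone evaluation loop might look like, reusing recording_level_metrics from above; the 0.5 operating threshold and the use of Pearson's r for the table's correlation coefficient are both assumptions, as the document specifies neither:

```python
def standalone_eval(probabilities: list[float], truth: list[bool],
                    threshold: float = 0.5) -> dict[str, float]:
    """Threshold the algorithm's recording-level P(abnormal), then score it
    against the expert reference standard -- no human in the loop."""
    algo = [p >= threshold for p in probabilities]
    return recording_level_metrics(algo, truth)

def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation, e.g. between algorithm probabilities and
    the fraction of experts calling each EEG abnormal (an assumed pairing)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5
```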

    7. The Type of Ground Truth Used (Expert Consensus, Pathology, Outcomes Data, etc.)

    The ground truth used for the test sets was primarily human expert assessment (or consensus of human experts).

    • Parts 1 and 5 used individual HE assessments (routine clinical assessments) as the reference standard.
    • Parts 2, 3, and 4 used expert consensus as the reference standard.

    No pathology or outcomes data were used to establish the ground truth.

    8. The Sample Size for the Training Set

    The document explicitly states that "None of the EEGs used in the validation were used in the development of the AI model." However, the specific sample size of the training set is not given; the document only mentions that "autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset."

    9. How the Ground Truth for the Training Set Was Established

    The document does not explicitly describe how the ground truth for the training set was established. It only refers to "standard deep learning principles" and a "large training dataset." It notes that the HEs providing the reference standards for the validation phase (Studies 1, 2, 3, and 4) were different from those who participated in the development portion of the process. This implies that human experts were involved in creating the ground truth for the training data, but the method (e.g., single expert, multi-expert consensus, specific rules) is not detailed.
