510(k) Data Aggregation
(126 days)
• autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
• The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old for EEGs <4 hours and at least two years old for EEGs >4 hours. The autoSCORE component has not been assessed for intracranial recordings.
• autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform and non-epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
• This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
autoSCORE is a software-only device.
autoSCORE is an AI model that has been trained with standard deep learning principles using a large training dataset. The model will be locked in the field, so it cannot learn from data to which it is exposed when in use. It can only be used with a compatible electroencephalogram (EEG) reviewing software, which acquires and displays the EEG. The model has no user interface. The form of the visualization of the annotations is determined and provided by the EEG reviewing software.
autoSCORE has been trained to identify and then indicate to the user sections of EEG which may include abnormalities and to provide the level of probability of the presence of an abnormality. The algorithm also provides categorization of identified areas of abnormality into the four predefined types of abnormalities, again including a probability of that predefined abnormality type. This is performed by identifying epileptiform abnormalities/spikes (Focal Epileptiform and Generalized Epileptiform) as well as identifying non-epileptiform abnormalities (Focal Non-Epileptiform and Diffuse Non-Epileptiform).
This data is then provided by the algorithm to the EEG reviewing software, for it to display as part of the EEG output for the clinician to review. autoSCORE does not provide any diagnostic conclusion about the patient's condition nor treatment options to the user, and does not replace visual assessment of the EEG by the user. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
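For orientation, the following sketch shows one plausible shape for the per-section output described above: a time interval within the EEG, one of the four predefined abnormality categories, and a probability. The field names and structure are illustrative assumptions only; the clearance summary does not specify the actual interface between autoSCORE and the EEG reviewing software.

```python
from dataclasses import dataclass

# Hypothetical marker structure; field names are illustrative and are not
# taken from the autoSCORE documentation.
@dataclass
class AbnormalityMarker:
    start_s: float      # start of the marked EEG section, seconds from recording start
    end_s: float        # end of the marked EEG section, seconds
    category: str       # one of the four predefined abnormality types
    probability: float  # model-estimated probability that the section is abnormal

# What a reviewing application might receive and render as on-screen markers.
markers = [
    AbnormalityMarker(120.0, 123.5, "focal_epileptiform", 0.91),
    AbnormalityMarker(410.2, 415.0, "diffuse_non_epileptiform", 0.74),
]

for m in markers:
    print(f"{m.start_s:.1f}-{m.end_s:.1f} s  {m.category}  p={m.probability:.2f}")
```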
Acceptance Criteria and Study for autoSCORE (V 2.0.0)
This response outlines the acceptance criteria for autoSCORE (V 2.0.0) and the study conducted to demonstrate the device meets these criteria, based on the provided FDA 510(k) clearance letter.
1. Table of Acceptance Criteria and Reported Device Performance
The FDA clearance document does not explicitly present a table of predefined acceptance criteria (e.g., minimum PPV of X%, minimum Sensitivity of Y%). Instead, the regulatory strategy appears to be a demonstration of substantial equivalence through comparison to predicate devices and human expert consensus. The "Performance Validation" section (Section 7) outlines the metrics evaluated, and the "Validation Summary" (Section 7.2.6) states the conclusion of similarity.
Therefore, the "acceptance criteria" are implied to be that the device performs similarly to the predicate devices and/or to human experts, particularly in terms of Positive Predictive Value (PPV), as this was deemed clinically critical.
Here’s a table summarizing the reported device performance, which the manufacturer concluded met the implicit "acceptance criteria" by demonstrating substantial equivalence:
| Performance Metric (Category) | autoSCORE V2 (Reported Performance) | Primary Predicate (encevis) (Reported Performance) | Secondary Predicate (autoSCORE V1.4) (Reported Performance) | Note on Comparison & Implied Acceptance |
|---|---|---|---|---|
| Recording Level - Accuracy (Abnormal) | 0.912 (0.850, 0.963) | - | 0.950 (0.900, 0.990) | autoSCORE v2 comparable to autoSCORE v1.4. encevis value not provided for "Abnormal." |
| Recording Level - Sensitivity (Abnormal) | 0.926 (0.859, 0.985) | - | 1.000 (1.000, 1.000) | autoSCORE v2 slightly lower than v1.4, but still high. |
| Recording Level - Specificity (Abnormal) | 0.833 (0.583, 1.000) | - | 0.884 (0.778, 0.974) | autoSCORE v2 comparable to v1.4. |
| Recording Level - PPV (Abnormal) | 0.969 (0.922, 1.000) | - | 0.920 (0.846, 0.983) | autoSCORE v2 high PPV, comparable to v1.4. |
| Recording Level - Accuracy (IED) | 0.875 (0.800, 0.938) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | IED (Interictal Epileptiform Discharges) combines Focal Epi and Gen Epi. autoSCORE v2 significantly higher accuracy than encevis. |
| Recording Level - Sensitivity (IED) | 0.939 (0.864, 1.000) | 1.000 (1.000, 1.000) | IED not provided for v1.4 | autoSCORE v2 high Sensitivity, similar to encevis. |
| Recording Level - Specificity (IED) | 0.774 (0.618, 0.914) | 0.000 (0.000, 0.000) | IED not provided for v1.4 | autoSCORE v2 significantly higher Specificity than encevis (encevis had 0.000 specificity for IED). |
| Recording Level - PPV (IED) | 0.868 (0.769, 0.952) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | autoSCORE v2 significantly higher PPV than encevis (considered a key clinical metric). |
| Marker Level - PPV (Focal Epi) | 0.560 (0.526, 0.594) | - | 0.626 (0.616, 0.637) (Part 1) / 0.716 (0.701, 0.732) (Part 5) | autoSCORE v2 PPV slightly lower than v1.4 in some instances, but within general range. Comparison is against earlier validation parts of autoSCORE v1.4. |
| Marker Level - PPV (Gen Epi) | 0.446 (0.405, 0.486) | - | 0.815 (0.802, 0.828) (Part 1) / 0.825 (0.799, 0.849) (Part 5) | autoSCORE v2 PPV significantly lower than v1.4. This is a point of difference. |
| Marker Level - PPV (Focal Non-Epi) | 0.823 (0.794, 0.852) | - | 0.513 (0.506, 0.520) (Part 1) / 0.570 (0.556, 0.585) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (Diff Non-Epi) | 0.849 (0.822, 0.876) | - | 0.696 (0.691, 0.702) (Part 1) / 0.537 (0.520, 0.554) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (IED) | 0.513 (0.486, 0.539) | 0.257 (0.166, 0.349) | 0.389 (0.281, 0.504) | autoSCORE v2 significantly higher PPV than encevis and autoSCORE v1.4. This is a key finding highlighted. |
| Correlation (Prob. vs. TP Markers) | p-value < 0.05 (for positive correlation) | Not applicable | Not applicable | The validation states a "significant positive correlation" (p-value < 0.05) was the criterion, and this was met. |
Key takeaway on Acceptance: The "Validation Summary" directly states: "autoSCORE demonstrated a higher PPV overall compared to the predicate device encevis and a similar PPV compared to autoSCORE v1.4... autoSCORE was found to have a safety and effectiveness profile that is similar to the predicate devices." This conclusion, particularly the superior/similar PPV results, formed the basis for deeming the device "as safe, as effective, and performs as well as" the predicates.
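For readers less familiar with the recording-level metrics in the table above, the minimal sketch below shows how accuracy, sensitivity, specificity, and PPV are defined from true/false positive and negative counts. The counts used here are invented for illustration; the clearance summary reports only the resulting estimates and confidence intervals.

```python
def recording_level_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard definitions of the metrics reported at the recording level."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # abnormal recordings correctly flagged
        "specificity": tn / (tn + fp),  # normal recordings correctly left unflagged
        "ppv": tp / (tp + fp),          # flagged recordings that are truly abnormal
    }

# Illustrative counts only; not taken from the validation study.
print(recording_level_metrics(tp=50, fp=3, tn=15, fn=4))
```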
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 80 EEGs (40 Long Term Monitoring EEGs (LTMs) and 40 Ambulatory EEGs (AEEGs)).
- Data Provenance: Retrospective, de-identified data. The source hospitals/organizations anonymized the data, retaining only age and gender. No specific country of origin is mentioned, suggesting a general pool of collected clinical data. The time periods for data collection are:
- SET 1 LTMs: 39 EEGs – June-September 2024; 1 EEG: November 2021-December 2021.
- SET 2 AEEGs: 40 EEGs: June-October 2024.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Three Human Experts (HEs) were used for consensus per EEG.
- Qualifications: The document describes them as "Human Experts (HEs)," a "suitably trained professional who is qualified to clinically review EEG recordings," and "qualified medical practitioners." While specific experience levels (e.g., "neurologist with 10 years of experience") are not provided, the context implies they are board-certified neurologists or equivalent specialists highly proficient in EEG interpretation.
4. Adjudication Method for the Test Set
- Adjudication Method: Consensus of three Human Experts (HEs) was used as the reference standard.
- For recording-level validation, HEs independently labeled each EEG segment.
- For marker-level validation (PPV), each autoSCORE-placed marker was reviewed by HEs, and a marker was classified as a True Positive (TP) if at least two out of three HEs agreed it correctly identified the abnormality type; if fewer than two HEs agreed, it was considered a False Positive (FP) (see the sketch after this list).
- To prevent bias, HEs evaluating recording-level were blinded to autoSCORE output, and HEs evaluating marker-level had not participated in the initial recording-level assessment of the same EEG. All HEs were blinded to patient metadata (except age/gender) and autoSCORE outputs.
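A minimal sketch of the 2-of-3 marker adjudication rule described above; the boolean vote encoding is an assumption made purely for illustration.

```python
def adjudicate_marker(expert_votes: list[bool]) -> str:
    """Classify one autoSCORE-placed marker from three independent expert reviews.

    Each element of expert_votes is True if that expert agreed the marker
    correctly identified the labelled abnormality type.
    """
    assert len(expert_votes) == 3, "three human experts review each marker"
    return "TP" if sum(expert_votes) >= 2 else "FP"

print(adjudicate_marker([True, True, False]))   # TP: two of three experts agree
print(adjudicate_marker([True, False, False]))  # FP: fewer than two agree
```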
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
- The study design described is a standalone performance evaluation against human expert consensus and predicate devices. Although human experts established the ground truth and the device's performance was compared to their consensus, this was not a Multi-Reader Multi-Case (MRMC) comparative effectiveness study in which human readers using the AI are compared to human readers without AI assistance. Therefore, no effect size for human readers improving with AI assistance is provided.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Yes, a standalone performance evaluation of the autoSCORE algorithm was conducted. The study assessed the algorithm's ability to identify and categorize abnormalities in EEG recordings by comparing its outputs directly against human expert consensus (ground truth) and against predicate devices. The validation focused on the algorithm's output metrics (Accuracy, Sensitivity, Specificity, PPV, NPV) at both recording and marker levels.
7. The Type of Ground Truth Used
- The ground truth used was Expert Consensus. Specifically, a consensus agreement of three Human Experts (HEs) served as the reference standard for both recording-level analysis (presence/absence of abnormalities, and their types) and marker-level validation (correctness of autoSCORE-placed markers and their assigned abnormality types). This approach also included a "gold standard" where HEs, blinded to autoSCORE outputs, independently marked abnormalities in the EEG segments.
8. The Sample Size for the Training Set
- The document states that autoSCORE "has been trained with standard deep learning principles using a large training dataset." However, the exact sample size for the training set is not provided in the given FDA 510(k) clearance letter.
9. How the Ground Truth for the Training Set Was Established
- The document does not explicitly detail how the ground truth for the training set was established; it only states that the model was trained "with standard deep learning principles using a large training dataset." Ground truth establishment is described only for the test set (expert consensus). It is common for training data to be annotated by experts, but the specifics are not included in this document.
(268 days)
- autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
- The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old. The autoSCORE component has not been assessed for intracranial recordings.
- autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
- This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
autoSCORE is a software-only decision support product intended to be used with compatible electroencephalography (EEG) review software. It is intended to assist the user when reviewing EEG recordings, by assessing the probability that previously acquired sections of EEG recordings contain abnormalities, and classifying these into pre-defined types of abnormality. autoSCORE sends this information to the EEG software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset. autoSCORE also provides an overview of the probability that EEG recordings and sections of EEG recordings include abnormalities, and which type(s) of abnormality they include. This is performed by identifying spikes of epileptiform abnormalities (Focal Epileptiform and Generalized Epileptiform) as well as identifying non-epileptiform abnormalities (Focal Non-Epileptiform and Diffuse Non-Epileptiform). The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease. autoSCORE cannot detect or classify seizures. The recorded EEG activity is not altered by the information provided by autoSCORE. autoSCORE is not intended to provide information for diagnosis but to assist clinical workflow when using the EEG software.
The FDA 510(k) summary for Holberg EEG AS's autoSCORE device provides extensive information regarding its acceptance criteria and the study proving it meets these criteria. Here's a breakdown of the requested information:
Acceptance Criteria and Device Performance for autoSCORE
The acceptance criteria for autoSCORE are established by its performance metrics in comparison to human expert assessments and predicate devices. The device is intended to assist medical practitioners in the review, monitoring, and analysis of EEG recordings by identifying and classifying abnormalities, particularly epileptic and non-epileptic events.
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the performance metrics (Sensitivity, Specificity, PPV, NPV, Correlation Coefficient) shown to be comparable to or exceeding those of human experts or predicate devices. Since specific numeric thresholds for acceptance are not explicitly stated, the reported performance metrics are presented as evidence of meeting acceptable clinical performance.
Table 1: Reported Performance of autoSCORE (Summarized from document)
| Metric (Recording-Level) | Normal/Abnormal (All Ages; Part 2, n=100) | Normal/Abnormal (All Ages; Part 1, n=4850) | Normal/Abnormal (All Ages; Part 5, n=1315) | Focal Epi (Part 2, n=100) | Gen Epi (Part 2, n=100) | Diff Non-Epi (Part 2, n=100) | Focal Non-Epi (Part 2, n=100) | Epi (AutoSCORE vs. Predicate; Part 3, n=100) | Epi (AutoSCORE vs. Predicate; Part 4, n=58) |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity (%) | 100 | 83.1 [81.3, 84.8] | 87.8 [85.0, 90.5] | 73.9 [54.5, 91.3] | 100 [100, 100] | 87.5 [72.7, 100] | 61.5 [42.1, 80] | 90.0 [77.8, 100] | 93.3 [83.3, 100] |
| Specificity (%) | 88.4 [77.8, 97.4] | 91.8 [90.8, 92.8] | 89.4 [87.2, 91.6] | 88.3 [80.8, 94.9] | 94.1 [88.6, 98.8] | 82.8 [74.0, 90.9] | 93.2 [86.8, 98.6] | 87.1 [78.8, 94.4] | 96.4 [88.0, 100] |
| PPV (%) | 92.0 [84.5, 98.3] | 84.9 [83.2, 86.6] | 86.0 [83.0, 88.8] | 65.4 [45.8, 83.3] | 75.1 [54.5, 93.8] | 61.7 [44.8, 78.0] | 76.1 [56.2, 94.1] | 75.0 [60.0, 88.9] | 96.6 [88.5, 100] |
| NPV (%) | 100 | 90.8 [89.8, 91.8] | 90.9 [88.8, 92.9] | 91.9 [85.1, 97.4] | 100 [100, 100] | 95.5 [89.7, 100] | 87.4 [79.5, 94.1] | 95.3 [89.5, 100] | 93.1 [82.6, 100] |
| Correlation Coeff. | 0.96 | 0.99 | 0.99 | 0.85 | 0.83 | 0.93 | 0.84 | N/A | N/A |
Note: For detailed confidence intervals and marker-level performance, refer to Tables 4, 5, 6, 7, and 8 in the original document.
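The bracketed values in Table 1 are 95% confidence intervals for the reported proportions. The document does not state how these intervals were computed; the sketch below uses the Wilson score interval as one common choice, purely to illustrate how an interval of this kind is derived from a proportion and a sample size.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion.

    Illustrative choice only; the interval method actually used in the
    validation is not stated in the document.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Illustrative: 92 correct positive calls out of 100 flagged recordings.
print(wilson_interval(92, 100))
```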
2. Sample Sizes Used for the Test Set and Data Provenance
The clinical validation was performed across five separate datasets:
- Part 1 (Single-Center): 4,850 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in a hospital setting. Retrospective.
- Part 2 (Multi-Center): 100 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in various hospital settings. Retrospective.
- Part 3 (Direct Comparison to Primary Predicate): Same 100 EEGs as Part 2. Retrospective.
- Part 4 (Benchmarking against Primary and Secondary Predicates): 58 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment. Retrospective.
- Part 5 (Hold-out Dataset, Two Centers): 1,315 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in two hospital settings. Retrospective.
None of the EEGs used in the validation were used for the development of the AI model. The document does not explicitly state the country of origin for the data, but the company address in Norway suggests a European origin.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts
- Part 1 & 5: Ground truth established by multiple Human Experts (HEs), with a single HE reviewer per EEG.
- Part 1: 9 HEs, each assessing more than 1% of the EEGs.
- Part 5: 15 HEs, each assessing more than 1% of the EEGs.
- Qualifications: "Qualified medical practitioners" or "neurologists" who exercise professional judgment. In Parts 1 and 5, their assessments were part of "routine EEG assessment in their respective hospitals," implying they are experienced clinicians.
- Part 2 & 3: Ground truth established by HE consensus.
- Part 2 & 3: 11 independent HEs reviewed 100 EEGs.
- Qualifications: "Independent human experts." Implied to be qualified clinical practitioners.
- Part 4: Ground truth established by HE consensus.
- Part 4: 3 HEs.
- Qualifications: "HEs." Implied to be qualified clinical practitioners.
4. Adjudication Method for the Test Set
- Part 1 & 5 (Recording and Marker Level): Ground truth was established by a single HE reviewer per EEG. While multiple HEs contributed across the dataset, each EEG had a single "reference standard" HE assessment, so this amounts to a "none" or "single-reader" adjudication for any individual EEG.
- Part 2 & 3 (Recording Level): Ground truth was based on HE consensus of 11 HEs, who assessed whether EEGs were normal/abnormal and contained specific abnormality categories. This implies a majority-consensus or agreement-based adjudication among the 11 experts. The granularity of the probability grouping was 9 percentage points, consistent with each of the 11 experts contributing 1/11 ≈ 9 percentage points to the consensus fraction (see the sketch after this list).
- Part 4 (Recording and Marker Level): Ground truth was majority consensus scoring of 3 HEs.
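As a small illustration of why an 11-expert panel yields probability groupings of roughly 9 percentage points, the sketch below turns a set of independent normal/abnormal votes into a consensus fraction; the vote encoding is an assumption made for illustration only.

```python
def consensus_fraction(votes_abnormal: list[bool]) -> float:
    """Fraction of experts who called the recording abnormal.

    With 11 experts, each vote shifts the fraction by 1/11, i.e. about
    9 percentage points, matching the granularity noted above.
    """
    return sum(votes_abnormal) / len(votes_abnormal)

# Illustrative panel of 11 votes (not actual study data).
votes = [True] * 8 + [False] * 3
print(f"consensus fraction = {consensus_fraction(votes):.2f}")  # 0.73
```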
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, If So, What was the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance?
The study described is a direct comparison of autoSCORE's performance against human experts and predicate devices, effectively evaluating the AI's standalone performance rather than the improvement of human readers when assisted by AI. The document states autoSCORE is a "decision support product intended to be used with compatible electroencephalography (EEG) review software" and that the "user is required to review the EEG and exercise their clinical judgement to independently make a conclusion." However, it does not present an MRMC comparative effectiveness study that quantifies the improvement of human readers with versus without AI assistance.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done?
Yes, the study primarily assesses the standalone performance of the autoSCORE algorithm by comparing its outputs directly against human expert assessments (considered the ground truth) and outputs from predicate devices. The tables summarizing sensitivity, specificity, PPV, NPV, and correlation coefficients directly reflect the algorithm's performance.
7. The Type of Ground Truth Used (Expert Consensus, Pathology, Outcomes Data, etc.)
The ground truth used for the test sets was primarily human expert assessment (or consensus of human experts).
- Parts 1 and 5 used individual HE assessments as the reference standard (routine clinical assessments).
- Parts 2, 3, and 4 used expert consensus as the reference standard.
No pathology or outcomes data were used to establish the ground truth.
8. The Sample Size for the Training Set
The document explicitly states that "None of the EEGs used in the validation were used in the development of the AI model." However, the specific sample size of the training set is not provided in the document; it only mentions that "autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset."
9. How the Ground Truth for the Training Set Was Established
The document does not explicitly describe how the ground truth for the training set was established. It only refers to "standard deep learning principles" and a "large training dataset." It notes that the HEs providing the reference standards for the validation phase (Studies 1, 2, 3, and 4) were different from those who participated in the development portion of the process. This implies that human experts were involved in creating the ground truth for the training data, but the method (e.g., single expert, multi-expert consensus, specific rules) is not detailed.