• autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
• The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old for EEGs <4 hours and at least two years old for EEGs >4 hours. The autoSCORE component has not been assessed for intracranial recordings.
• autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform and non-epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
• This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
autoSCORE is a software-only device.
autoSCORE is an AI model that has been trained with standard deep learning principles using a large training dataset. The model will be locked in the field, so it cannot learn from data to which it is exposed when in use. It can only be used with a compatible electroencephalogram (EEG) reviewing software, which acquires and displays the EEG. The model has no user interface. The form of the visualization of the annotations is determined and provided by the EEG reviewing software.
autoSCORE has been trained to identify and then indicate to the user sections of EEG which may include abnormalities, and to provide the level of probability of the presence of an abnormality. The algorithm also categorizes identified areas of abnormality into the four predefined types of abnormalities, again with a probability for that predefined abnormality type. This is performed by identifying epileptiform abnormalities/spikes (focal epileptiform and generalised epileptiform) as well as non-epileptiform abnormalities (focal non-epileptiform and diffuse non-epileptiform).
This data is then provided by the algorithm to the EEG reviewing software, which displays it as part of the EEG output for the clinician to review. autoSCORE does not provide any diagnostic conclusion about the patient's condition or treatment options to the user, and does not replace visual assessment of the EEG by the user. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
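The description above implies a simple data contract between autoSCORE and the EEG reviewing software: a recording-level probability of abnormality plus per-section markers, each with an abnormality type and probability. The sketch below is purely illustrative; the actual interface is not described in the clearance letter, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical structures; the real autoSCORE/reviewer interface is not
# documented in the 510(k) summary. Field names are illustrative only.

ABNORMALITY_TYPES = (
    "focal_epileptiform",
    "generalised_epileptiform",
    "focal_non_epileptiform",
    "diffuse_non_epileptiform",
)

@dataclass
class Marker:
    start_s: float          # section start, seconds from recording onset
    end_s: float            # section end
    abnormality_type: str   # one of ABNORMALITY_TYPES
    probability: float      # model's probability for that abnormality type

@dataclass
class RecordingResult:
    abnormal_probability: float   # recording-level probability of any abnormality
    markers: List[Marker]         # sections for the reviewing software to display
```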
Acceptance Criteria and Study for autoSCORE (V 2.0.0)
This summary outlines the acceptance criteria for autoSCORE (V 2.0.0) and the study conducted to demonstrate that the device meets them, based on the provided FDA 510(k) clearance letter.
1. Table of Acceptance Criteria and Reported Device Performance
The FDA clearance document does not explicitly present a table of predefined acceptance criteria (e.g., minimum PPV of X%, minimum Sensitivity of Y%). Instead, the regulatory strategy appears to be a demonstration of substantial equivalence through comparison to predicate devices and human expert consensus. The "Performance Validation" section (Section 7) outlines the metrics evaluated, and the "Validation Summary" (Section 7.2.6) states the conclusion of similarity.
Therefore, the "acceptance criteria" are implied to be that the device performs similarly to the predicate devices and/or to human experts, particularly in terms of Positive Predictive Value (PPV), as this was deemed clinically critical.
Here’s a table summarizing the reported device performance, which the manufacturer concluded met the implicit "acceptance criteria" by demonstrating substantial equivalence:
| Performance Metric (Category) | autoSCORE V2 (Reported Performance) | Primary Predicate (encevis) (Reported Performance) | Secondary Predicate (autoSCORE V1.4) (Reported Performance) | Note on Comparison & Implied Acceptance |
|---|---|---|---|---|
| Recording Level - Accuracy (Abnormal) | 0.912 (0.850, 0.963) | - | 0.950 (0.900, 0.990) | autoSCORE v2 comparable to autoSCORE v1.4; encevis data not provided for "Abnormal." |
| Recording Level - Sensitivity (Abnormal) | 0.926 (0.859, 0.985) | - | 1.000 (1.000, 1.000) | autoSCORE v2 slightly lower than v1.4, but still high. |
| Recording Level - Specificity (Abnormal) | 0.833 (0.583, 1.000) | - | 0.884 (0.778, 0.974) | autoSCORE v2 comparable to v1.4. |
| Recording Level - PPV (Abnormal) | 0.969 (0.922, 1.000) | - | 0.920 (0.846, 0.983) | autoSCORE v2 high PPV, comparable to v1.4. |
| Recording Level - Accuracy (IED) | 0.875 (0.800, 0.938) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | IED (Interictal Epileptiform Discharges) combines Focal Epi and Gen Epi. autoSCORE v2 significantly higher accuracy than encevis. |
| Recording Level - Sensitivity (IED) | 0.939 (0.864, 1.000) | 1.000 (1.000, 1.000) | IED not provided for v1.4 | autoSCORE v2 sensitivity is high, slightly below encevis (1.000). |
| Recording Level - Specificity (IED) | 0.774 (0.618, 0.914) | 0.000 (0.000, 0.000) | IED not provided for v1.4 | autoSCORE v2 significantly higher Specificity than encevis (encevis had 0.000 specificity for IED). |
| Recording Level - PPV (IED) | 0.868 (0.769, 0.952) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | autoSCORE v2 significantly higher PPV than encevis (considered a key clinical metric). |
| Marker Level - PPV (Focal Epi) | 0.560 (0.526, 0.594) | - | 0.626 (0.616, 0.637) (Part 1) / 0.716 (0.701, 0.732) (Part 5) | autoSCORE v2 PPV slightly lower than v1.4 in some instances, but within general range. Comparison is against earlier validation parts of autoSCORE v1.4. |
| Marker Level - PPV (Gen Epi) | 0.446 (0.405, 0.486) | - | 0.815 (0.802, 0.828) (Part 1) / 0.825 (0.799, 0.849) (Part 5) | autoSCORE v2 PPV significantly lower than v1.4. This is a point of difference. |
| Marker Level - PPV (Focal Non-Epi) | 0.823 (0.794, 0.852) | - | 0.513 (0.506, 0.520) (Part 1) / 0.570 (0.556, 0.585) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (Diff Non-Epi) | 0.849 (0.822, 0.876) | - | 0.696 (0.691, 0.702) (Part 1) / 0.537 (0.520, 0.554) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (IED) | 0.513 (0.486, 0.539) | 0.257 (0.166, 0.349) | 0.389 (0.281, 0.504) | autoSCORE v2 significantly higher PPV than encevis and autoSCORE v1.4. This is a key finding highlighted. |
| Correlation (Prob. vs. TP Markers) | p-value < 0.05 (for positive correlation) | Not applicable | Not applicable | The validation states a "significant positive correlation" (p-value < 0.05) was the criterion, and this was met. |
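The final row of the table refers to the correlation criterion. A hedged sketch of such a check follows; the document does not name the correlation statistic, so Spearman's rank correlation is assumed here, and the per-recording values are hypothetical.

```python
from scipy.stats import spearmanr

# Hypothetical per-recording data: model probability vs. count of TP markers.
probabilities = [0.12, 0.35, 0.48, 0.61, 0.72, 0.83, 0.91, 0.97]
tp_marker_counts = [0, 1, 1, 3, 2, 5, 6, 9]

rho, p_value = spearmanr(probabilities, tp_marker_counts)
criterion_met = (rho > 0) and (p_value < 0.05)   # "significant positive correlation"
print(f"rho={rho:.2f}, p={p_value:.4f}, criterion met: {criterion_met}")
```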
Key takeaway on Acceptance: The "Validation Summary" directly states: "autoSCORE demonstrated a higher PPV overall compared to the predicate device encevis and a similar PPV compared to autoSCORE v1.4... autoSCORE was found to have a safety and effectiveness profile that is similar to the predicate devices." This conclusion, particularly the superior/similar PPV results, formed the basis for deeming the device "as safe, as effective, and performs as well as" the predicates.
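The point estimates in the table above are reported with 95% confidence intervals, but the document does not state how those intervals were computed. A minimal sketch using the Wilson score interval for a proportion (an assumption, not necessarily the method used in the validation) is shown below; the counts in the example are hypothetical.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """95% Wilson score interval for a proportion (assumed method;
    the 510(k) summary does not specify how its CIs were derived)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical example: 54 of 58 abnormal recordings correctly flagged
print(wilson_ci(54, 58))   # -> roughly (0.84, 0.97)
```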
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 80 EEGs (40 Long Term Monitoring EEGs (LTMs) and 40 Ambulatory EEGs (AEEGs)).
- Data Provenance: Retrospective, de-identified data. The original source hospitals/organizations anonymized the data, retaining only age and gender. No specific country of origin is mentioned, suggesting a general pool of collected clinical data. The time periods for data collection are:
- SET 1 (LTMs): 39 EEGs, June-September 2024; 1 EEG, November-December 2021.
- SET 2 (AEEGs): 40 EEGs, June-October 2024.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Three Human Experts (HEs) were used for consensus per EEG.
- Qualifications: The document describes them as "Human Experts (HEs)", each a "suitably trained professional who is qualified to clinically review EEG recordings", i.e., "qualified medical practitioners." While specific experience levels (e.g., years in practice or board certification) are not provided, the context implies they are board-certified neurologists or equivalent specialists highly proficient in EEG interpretation.
4. Adjudication Method for the Test Set
- Adjudication Method: Consensus of three Human Experts (HEs) was used as the reference standard.
- For recording-level validation, HEs independently labeled each EEG segment.
- For marker-level validation (PPV), each autoSCORE-placed marker was reviewed by HEs, and a marker was classified as a True Positive (TP) if at least two of the three HEs agreed that it correctly identified the abnormality type; if fewer than two HEs agreed, it was counted as a False Positive (FP). A sketch of this rule follows the list below.
- To prevent bias, HEs performing the recording-level assessment were blinded to autoSCORE output, and HEs performing the marker-level review had not participated in the initial recording-level assessment of the same EEG. All HEs were blinded to patient metadata (except age/gender) and autoSCORE outputs.
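A hedged sketch of the 2-of-3 adjudication rule and the marker-level PPV it yields is shown below (illustrative Python; the actual tooling used in the validation is not described, and the votes are hypothetical).

```python
from typing import List

def adjudicate_marker(expert_agreements: List[bool]) -> bool:
    """True positive if at least 2 of the 3 human experts agree the marker
    correctly identifies the abnormality type, otherwise false positive."""
    return sum(expert_agreements) >= 2

def marker_level_ppv(all_markers: List[List[bool]]) -> float:
    """PPV over device-placed markers = TP / (TP + FP); every placed marker
    is either a TP or an FP under this scheme."""
    tp = sum(adjudicate_marker(votes) for votes in all_markers)
    return tp / len(all_markers) if all_markers else 0.0

# Hypothetical votes for five autoSCORE-placed markers (3 expert votes each)
votes = [
    [True, True, True],    # TP
    [True, True, False],   # TP (2 of 3 agree)
    [True, False, False],  # FP
    [False, False, False], # FP
    [True, True, True],    # TP
]
print(marker_level_ppv(votes))   # 0.6
```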
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
- The study described is a standalone performance evaluation against human expert consensus and the predicate devices. Although human experts established the ground truth and the device's output was compared to their consensus, the study was not a Multi-Reader Multi-Case (MRMC) comparative effectiveness study in which human readers aided by the AI are compared to human readers without AI assistance. Therefore, no effect size for human readers improving with AI assistance is reported.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Yes, a standalone performance evaluation of the autoSCORE algorithm was conducted. The study assessed the algorithm's ability to identify and categorize abnormalities in EEG recordings by comparing its outputs directly against human expert consensus (ground truth) and against predicate devices. The validation focused on the algorithm's output metrics (Accuracy, Sensitivity, Specificity, PPV, NPV) at both recording and marker levels.
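As a rough illustration of the recording-level standalone analysis (algorithm output compared against the expert-consensus label, with no human in the loop), the sketch below derives the reported metric set from a binary confusion matrix. The labels and counts are hypothetical, not taken from the validation data.

```python
def recording_level_metrics(preds: list, truths: list) -> dict:
    """Accuracy, sensitivity, specificity, PPV and NPV from binary
    per-recording decisions against the expert-consensus reference."""
    tp = sum(p and t for p, t in zip(preds, truths))
    tn = sum((not p) and (not t) for p, t in zip(preds, truths))
    fp = sum(p and (not t) for p, t in zip(preds, truths))
    fn = sum((not p) and t for p, t in zip(preds, truths))
    safe = lambda num, den: num / den if den else float("nan")
    return {
        "accuracy":    safe(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe(tp, tp + fn),
        "specificity": safe(tn, tn + fp),
        "ppv":         safe(tp, tp + fp),
        "npv":         safe(tn, tn + fn),
    }

# Hypothetical example: 10 recordings, consensus label vs. device output
truths = [True, True, True, True, True, True, False, False, False, False]
preds  = [True, True, True, True, True, False, False, False, True, False]
print(recording_level_metrics(preds, truths))
```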
7. The Type of Ground Truth Used
- The ground truth used was Expert Consensus. Specifically, a consensus agreement of three Human Experts (HEs) served as the reference standard for both recording-level analysis (presence/absence of abnormalities, and their types) and marker-level validation (correctness of autoSCORE-placed markers and their assigned abnormality types). This approach also included a "gold standard" where HEs, blinded to autoSCORE outputs, independently marked abnormalities in the EEG segments.
8. The Sample Size for the Training Set
- The document states that autoSCORE "has been trained with standard deep learning principles using a large training dataset." However, the exact sample size for the training set is not provided in the given FDA 510(k) clearance letter.
9. How the Ground Truth for the Training Set Was Established
- The document does not explicitly detail how the ground truth for the training set was established; it states only that autoSCORE was "trained with standard deep learning principles using a large training dataset" and describes ground-truth establishment only for the test set (expert consensus). It is common for training data to be annotated by experts, but the specifics are not included in this document.