510(k) Data Aggregation
(268 days)
- autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
- The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old. The autoSCORE component has not been assessed for intracranial recordings.
- autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
- This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
autoSCORE is a software-only decision support product intended to be used with compatible electroencephalography (EEG) review software. It is intended to assist the user when reviewing EEG recordings by assessing the probability that previously acquired sections of EEG recordings contain abnormalities, and classifying these into pre-defined types of abnormality. autoSCORE sends this information to the EEG software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset. autoSCORE also provides an overview of the probability that EEG recordings and sections of EEG recordings include abnormalities, and which type(s) of abnormality they include. This is performed by identifying spikes of epileptiform abnormalities (Focal epileptiform and Generalized epileptiform) as well as identifying non-epileptiform abnormalities (Focal Non-epileptiform and Diffuse Non-epileptiform). The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease. autoSCORE cannot detect or classify seizures. The recorded EEG activity is not altered by the information provided by autoSCORE. autoSCORE is not intended to provide information for diagnosis but to assist clinical workflow when using the EEG software.
The FDA 510(k) summary for Holberg EEG AS's autoSCORE device provides extensive information regarding its acceptance criteria and the study proving it meets these criteria. Here's a breakdown of the requested information:
Acceptance Criteria and Device Performance for autoSCORE
The acceptance criteria for autoSCORE are established by its performance metrics in comparison to human expert assessments and predicate devices. The device is intended to assist medical practitioners in the review, monitoring, and analysis of EEG recordings by identifying and classifying abnormalities, particularly epileptic and non-epileptic events.
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the performance metrics (Sensitivity, Specificity, PPV, NPV, Correlation Coefficient) shown to be comparable to or exceeding those of human experts or predicate devices. Since specific numeric thresholds for acceptance are not explicitly stated, the reported performance metrics are presented as evidence of meeting acceptable clinical performance.
Table 1: Reported Performance of autoSCORE (Summarized from document)
Metric (Recording-Level) | Normal/Abnormal (All Ages; Part 2, n=100) | Normal/Abnormal (All Ages; Part 1, n=4850) | Normal/Abnormal (All Ages; Part 5, n=1315) | Focal Epi (Part 2, n=100) | Gen Epi (Part 2, n=100) | Diff Non-Epi (Part 2, n=100) | Focal Non-Epi (Part 2, n=100) | Epi (autoSCORE vs. Predicate; Part 3, n=100) | Epi (autoSCORE vs. Predicate; Part 4, n=58) |
---|---|---|---|---|---|---|---|---|---|
Sensitivity (%) | 100 | 83.1 [81.3, 84.8] | 87.8 [85.0, 90.5] | 73.9 [54.5, 91.3] | 100 [100, 100] | 87.5 [72.7, 100] | 61.5 [42.1, 80] | 90.0 [77.8, 100] | 93.3 [83.3, 100] |
Specificity (%) | 88.4 [77.8, 97.4] | 91.8 [90.8, 92.8] | 89.4 [87.2, 91.6] | 88.3 [80.8, 94.9] | 94.1 [88.6, 98.8] | 82.8 [74.0, 90.9] | 93.2 [86.8, 98.6] | 87.1 [78.8, 94.4] | 96.4 [88.0, 100] |
PPV (%) | 92.0 [84.5, 98.3] | 84.9 [83.2, 86.6] | 86.0 [83.0, 88.8] | 65.4 [45.8, 83.3] | 75.1 [54.5, 93.8] | 61.7 [44.8, 78.0] | 76.1 [56.2, 94.1] | 75.0 [60.0, 88.9] | 96.6 [88.5, 100] |
NPV (%) | 100 | 90.8 [89.8, 91.8] | 90.9 [88.8, 92.9] | 91.9 [85.1, 97.4] | 100 [100, 100] | 95.5 [89.7, 100] | 87.4 [79.5, 94.1] | 95.3 [89.5, 100] | 93.1 [82.6, 100] |
Correlation Coeff. | 0.96 | 0.99 | 0.99 | 0.85 | 0.83 | 0.93 | 0.84 | N/A | N/A |
Note: For detailed confidence intervals and marker-level performance, refer to Tables 4, 5, 6, 7, and 8 in the original document.
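The recording-level metrics in Table 1 are standard confusion-matrix quantities, and the bracketed ranges are 95% confidence intervals. As an illustration only (the counts below are hypothetical and not taken from the 510(k) summary, and the submission does not state which interval method was used), a minimal sketch using the Wilson score interval:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95% CI)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return (center - half, center + half)

def binary_metrics(tp, fp, tn, fn):
    """Point estimate and 95% CI for each recording-level metric."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }
```

For example, `binary_metrics(tp=54, fp=5, tn=38, fn=0)` reproduces the pattern visible in the Part 2 column: sensitivity and NPV are 100% whenever there are no false negatives.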
2. Sample Sizes Used for the Test Set and Data Provenance
The clinical validation was performed across five separate datasets:
- Part 1 (Single-Center): 4,850 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in a hospital setting. Retrospective.
- Part 2 (Multi-Center): 100 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in various hospital settings. Retrospective.
- Part 3 (Direct Comparison to Primary Predicate): Same 100 EEGs as Part 2. Retrospective.
- Part 4 (Benchmarking against Primary and Secondary Predicates): 58 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment. Retrospective.
- Part 5 (Hold-out Dataset, Two Centers): 1,315 EEGs. Data provenance not explicitly stated but implied to be from routine EEG assessment in two hospital settings. Retrospective.
None of the EEGs used in the validation were used for the development of the AI model. The document does not explicitly state the country of origin for the data, but the company address in Norway suggests a European origin.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts
- Part 1 & 5: Ground truth established by multiple Human Experts (HEs), with a single HE reviewer per EEG.
- Part 1: 9 HEs, each assessing more than 1% of the EEGs.
- Part 5: 15 HEs, each assessing more than 1% of the EEGs.
- Qualifications: "Qualified medical practitioners" or "neurologists" who exercise professional judgment. In Parts 1 and 5, their assessments were part of "routine EEG assessment in their respective hospitals," implying they are experienced clinicians.
- Part 2 & 3: Ground truth established by HE consensus.
- Part 2 & 3: 11 independent HEs reviewed 100 EEGs.
- Qualifications: "Independent human experts." Implied to be qualified clinical practitioners.
- Part 4: Ground truth established by HE consensus.
- Part 4: 3 HEs.
- Qualifications: "HEs." Implied to be qualified clinical practitioners.
4. Adjudication Method for the Test Set
- Part 1 & 5 (Recording and Marker Level): Ground truth was established by a single HE reviewer per EEG. Although multiple HEs contributed across each dataset, every EEG had a single "reference standard" HE assessment, so no adjudication was applied to individual EEGs ("single-reader"), even though the dataset as a whole was reviewed by multiple HEs.
- Part 2 & 3 (Recording Level): Ground truth was based on HE consensus of 11 HEs, assessing if EEGs were normal/abnormal and contained specific abnormality categories. This implies a form of majority consensus or agreement-based adjudication among the 11 experts. The granularity of probability grouping was 9 percentage points.
- Part 4 (Recording and Marker Level): Ground truth was majority consensus scoring of 3 HEs.
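A consensus ground truth of the kind described for Parts 2-4 can be sketched as a majority vote over expert labels. This is a hypothetical illustration; the summary does not state the exact tie-handling rule:

```python
from collections import Counter

def majority_consensus(labels):
    """Return the majority label among expert assessments, or None on a tie."""
    top = Counter(labels).most_common(2)
    if len(top) > 1 and top[0][1] == top[1][1]:
        return None  # no majority; the submission's tie-breaking rule is not stated
    return top[0][0]
```

With 11 experts (Parts 2 and 3), `majority_consensus(["abnormal"] * 7 + ["normal"] * 4)` yields `"abnormal"`; with 3 experts (Part 4), any 2-of-3 agreement decides.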
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, If So, What was the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance?
The study described is a direct comparison of autoSCORE's performance against human experts and predicate devices, effectively evaluating the AI's standalone or augmented performance rather than the improvement of human readers when assisted by AI. The document states autoSCORE is a "decision support product intended to be used with compatible electroencephalography (EEG) review software" and that the "user is required to review the EEG and exercise their clinical judgement to independently make a conclusion." However, it does not present an MRMC comparative effectiveness study that quantifies the improvement of human readers assisted by AI versus without AI assistance.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done?
Yes, the study primarily assesses the standalone performance of the autoSCORE algorithm by comparing its outputs directly against human expert assessments (considered the ground truth) and outputs from predicate devices. The tables summarizing sensitivity, specificity, PPV, NPV, and correlation coefficients directly reflect the algorithm's performance.
7. The Type of Ground Truth Used (Expert Consensus, Pathology, Outcomes Data, etc.)
The ground truth used for the test sets was primarily human expert assessment (or consensus of human experts).
- Parts 1 and 5 used individual HE assessments as the reference standard (routine clinical assessments).
- Parts 2, 3, and 4 used expert consensus as the reference standard.
No pathology or outcomes data were used to establish the ground truth.
8. The Sample Size for the Training Set
The document explicitly states that "None of the EEGs used in the validation were used in the development of the AI model." However, the specific sample size of the training set is not given in the document, which mentions only that "autoSCORE uses an algorithm that has been trained with standard deep learning principles using a large training dataset."
9. How the Ground Truth for the Training Set Was Established
The document does not explicitly describe how the ground truth for the training set was established. It only refers to "standard deep learning principles" and a "large training dataset." It notes that the HEs providing the reference standards for the validation phase (Studies 1, 2, 3, and 4) were different from those who participated in the development portion of the process. This implies that human experts were involved in creating the ground truth for the training data, but the method (e.g., single expert, multi-expert consensus, specific rules) is not detailed.
(311 days)
- encevis is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
- The seizure detection component of encevis is intended to mark previously acquired sections of adult (greater than or equal to 18 years) EEG recordings that may correspond to electrographic seizures, in order to assist qualified clinical practitioners in the assessment of EEG traces. EEG recordings should be obtained with a full scalp montage according to the standard 10/20-system.
- The spike detection component of encevis is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The Spike Detection component is intended to be used in adult patients greater than or equal to 18 years. encevis Spike Detection performance has not been assessed for intracranial recordings.
- encevis includes the calculation and display of a set of quantitative measures intended to monitor and analyze the EEG waveform. These include frequency bands, rhythmic and periodic patterns and burst suppression. These quantitative EEG measures should always be interpreted in conjunction with review of the original EEG waveforms.
- The aEEG functionality included in encevis is intended to monitor the state of the brain.
- encevis provides notifications on an on-screen display for seizure detection, spike detection, quantitative EEG and aEEG that can be used when processing a record during acquisition. Delays of up to several minutes can occur between the beginning of a seizure, the occurrence of a spike or detection of quantitative EEG features and when the encevis notifications will be shown to a user. encevis notifications cannot be used as a substitute for real time monitoring of the underlying EEG by a trained expert.
- encevis PureEEG (Artifact Reduction) is intended to reduce artifacts in a standard 10-20 EEG recording. PureEEG does not remove the entire artifact signal, and is not effective for other types of artifacts. PureEEG may modify portions of waveforms representing cerebral activity. Waveforms must still be read by a qualified medical practitioner trained in recognizing artifact, and any interpretation or diagnosis must be made with reference to the original waveforms.
- This device does not provide any diagnostic conclusion about the patient's condition to the user.
encevis combines several modalities for viewing and analyzing EEG data in one integrated software package. encevis consists of the following modalities: encevis EEG-Viewer, encevis artifact reduction (PureEEG), encevis seizure detection (EpiScan), encevis spike detection (EpiSpike), encevis rhythmic and periodic patterns, encevis aEEG, encevis frequency bands, and encevis Burst Suppression.
Here's a breakdown of the acceptance criteria and the studies performed to prove the device meets these criteria, based on the provided document. The information is organized according to your requested points.
encevis Device Performance Study Summary
1. Table of Acceptance Criteria and Reported Device Performance
The document presents performance metrics for various components of the encevis device, often comparing them to a predicate device (Persyst). The acceptance criteria are implicitly defined by the non-inferiority margins used in statistical testing.
Feature (Component) | Acceptance Criterion (Implicit) | Reported Device Performance (encevis) | Comparison to Predicate (Persyst) |
---|---|---|---|
Seizure Detection (EpiScan) | Non-inferiority to predicate with margins: PPA 10%; NDR 1 false detection/24 h | Average PPA: 75.40% (95% CI [64.5, 86.3]); average NDR: 7.01 false detections/24 h (95% CI [5.9, 8.2]) | PPA non-inferior (predicate PPA: 75.94%, 95% CI [65.5, 86.4]); NDR non-inferior (predicate NDR: 10.61 false detections/24 h, 95% CI [6.8, 14.5]) |
Spike Detection (EpiSpike) | Non-inferiority to predicate with margins: PPA 3%; NPA 3%; PLPA 3% | Average PPA: 61.02% (95% CI [47.9, 74.1]); average NPA: 98.58% (95% CI [97.9, 99.3]); average PLPA: 96.09% (95% CI [88.8, 103.4]) | PPA non-inferior (predicate PPA: 8.7%, 95% CI [4.4, 13.0]); NPA non-inferior (predicate NPA: 99.69%, 95% CI [99.4, 99.9]); PLPA non-inferior (predicate PLPA: 93.97%, 95% CI [83.6, 104.3]) |
Artifact Reduction (PureEEG) | Non-inferiority to predicate with margins: relative suppression of true EEG 1 dB (delta CI [-0.09, -0.04]); signal-to-noise ratio (SNR) after artifact removal 0.10 (delta CI [4.78, 6.60]) | Relative suppression of clean EEG lower (better) than predicate in 82/111 cases; SNR after artifact removal higher (better) than predicate in 75/80 cases | Both parameters found non-inferior |
Rhythmic and Periodic Patterns (NeuroTrend) | Sensitivity and specificity for pattern detection relative to expert annotations (no explicit thresholds stated) | ANY (all patterns): sensitivity 77.59% (95% CI [75.5, 79.7]), specificity 86.50% (95% CI [85.8, 87.2]); PD: sensitivity 63.37% (95% CI [60.7, 66.0]), specificity 96.57% (95% CI [96.2, 96.9]); ARA (RTA, RAA, SW): sensitivity 92.72% (95% CI [88.2, 97.2]), specificity 94.76% (95% CI [94.4, 95.2]); RDA: sensitivity 92.56% (95% CI [87.5, 97.7]), specificity 90.44% (95% CI [89.9, 91.0]) | Not directly compared to a predicate for these metrics; inter-reader agreement for localization was moderate (Kappa = 0.45, CI [0.43, 0.48]) |
aEEG (NeuroTrend) | Functional equivalence to the proposed method (Zhang and Ding, 2013) and good accordance with predicate (Persyst) | Frequency response very similar to the published version; both hemispheres show the same characteristic; stop-band suppression -30 dB or higher; slope in pass band approx. -12 dB/decade; good accordance with Persyst and raw EEG values | Demonstrated good accordance with predicate (Persyst) |
Frequency Bands (NeuroTrend) | Correct assignment to frequency bands (Delta, Theta, Alpha, Beta) with amplitude measurement error below 5%; correct identification of the globally dominant background frequency | Correctly assigns test signals to the corresponding frequency bands; amplitude measurement error below 5%; relative proportion corresponding to the true frequency band > 50% | Not directly compared to a predicate |
Burst Suppression (NeuroTrend) | Validated by sensitivity, specificity, PPV, NPV relative to expert consensus | Sensitivity: 87.28% (95% CI [90.8, 92.3]); specificity: 92.16% (95% CI [91.3, 92.9]); PPV: 61.09% (95% CI [57.9, 64.2]); NPV: 98.09% (95% CI [97.6, 98.5]); ACC: 91.56% (95% CI [90.8, 92.3]) | Not directly compared to a predicate |
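The non-inferiority logic behind the seizure-detection row can be illustrated with the point estimates. This is a simplified sketch; the actual statistical testing operates on confidence bounds for the paired difference, not on raw point estimates:

```python
def ppa_non_inferior(device_ppa, predicate_ppa, margin):
    """PPA: higher is better, so the device may sit at most `margin` points below the predicate."""
    return device_ppa >= predicate_ppa - margin

def ndr_non_inferior(device_ndr, predicate_ndr, margin):
    """NDR (false detections/24 h): lower is better, so the device may sit at most `margin` above."""
    return device_ndr <= predicate_ndr + margin
```

With the table's values, `ppa_non_inferior(75.40, 75.94, 10)` and `ndr_non_inferior(7.01, 10.61, 1)` both hold, matching the "non-inferior" conclusions reported.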
2. Sample Size Used for the Test Set and Data Provenance
- Seizure Detection:
  - Sample Size: 55 subjects (50 patients with seizure events, 5 subjects without epilepsy).
  - Data Provenance: Scalp-EEG recordings from video-EEG monitoring in an epilepsy monitoring unit. Patients were 18 years of age or older. The specific country of origin is not stated, but context suggests a European setting (AIT Austrian Institute of Technology). The data is retrospective.
  - Total EEG data reviewed: 1,619 hours of EEG, with a maximum of 30 hours per subject.
- Spike Detection:
  - Sample Size: 23 patients (18 subjects >= 18 years with spike events, 5 subjects >= 18 years diagnosed with no epilepsy).
  - Data Provenance: Scalp-EEG recordings from video-EEG monitoring in an epilepsy monitoring unit. The specific country of origin is not stated, but context suggests a European setting. The data is retrospective.
- Artifact Reduction:
  - Sample Size: 128 EEG data records: 60 records from epilepsy monitoring units (31 seizure segments from 31 subjects, 33 spike segments from 6 subjects) and 65 from ICU patients (65 subjects).
  - Data Provenance: From different patient groups, covering adult patients in epilepsy monitoring and in critical care. The specific country of origin is not stated, but context suggests a European setting. The data is retrospective. Each record was 10 seconds of data.
- Rhythmic and Periodic Patterns:
  - Sample Size: 83 long-term EEGs.
  - Data Provenance: Prospectively recorded from ICU patients at two different centers. The specific country of origin is not stated, but context suggests a European setting.
  - Total Annotation Segments: 11,935 common annotation segments (first minute of each hour, split into three 20-second segments).
- aEEG (NeuroTrend):
  - Sample Size: Not explicitly stated for the real-EEG comparison with Persyst, beyond "real EEG data were used."
  - Data Provenance: Not explicitly stated.
- Frequency Bands (NeuroTrend):
  - Sample Size: Not explicitly stated for the manually selected EEGs from epilepsy or ICU patients, beyond "Each of these EEG samples is representative..."
  - Data Provenance: Manually selected from epilepsy or ICU patients.
- Burst Suppression (NeuroTrend):
  - Sample Size: 83 long-term EEGs.
  - Data Provenance: Recorded from intensive care patients at two different centers. The specific country of origin is not stated, but context suggests a European setting.
  - Total Annotation Segments: 3,978 valid annotation segments (first minute of each hour).
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Seizure Detection: 3 independent neurologists for blinded review. Qualifications not explicitly detailed (e.g., years of experience).
- Spike Detection: 3 independent neurologists for blinded review. Qualifications not explicitly detailed.
- Artifact Reduction: 3 independent epileptologists or neurologists for blinded review. Qualifications not explicitly detailed.
- Rhythmic and Periodic Patterns: 2 clinical neurophysiologists who were naive to these EEGs. Qualifications not explicitly detailed.
- Burst Suppression: 2 clinical neurophysiologists who were naive to these EEGs. Qualifications not explicitly detailed.
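For the two-reviewer studies, inter-reader agreement such as the moderate localization agreement reported for rhythmic and periodic patterns (Kappa = 0.45) is conventionally Cohen's kappa. A minimal sketch with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```

For instance, two raters agreeing on 3 of 4 items, with labels `["a", "a", "b", "b"]` versus `["a", "a", "b", "a"]`, give kappa = 0.5.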
4. Adjudication Method for the Test Set
- Seizure Detection: An event was considered a "true seizure" if the time interval of two out of three reviewers overlapped by at least 1 second. The seizure epoch was defined as the overlapping time range of these two reviewers. This is a 2+1 consensus method.
- Spike Detection: An event was considered a "true spike" only if the time interval of two out of three reviewers overlapped. For localization, the 3D-coordinates of the electrode next to the spike maximum, averaged over reviewers, were used. This implies a 2+1 consensus for detection and a form of consensus/averaging for localization.
- Artifact Reduction: The document states that "annotations of clean EEG recordings without any artifacts, and moreover annotations of artifacts that can be superimposed to the clean recordings" were needed, and that "three independent epileptologists or neurologists for blinded review of the EEG data" were engaged. It does not explicitly state the adjudication method (e.g., 2+1, 3+1) for these annotations, but a consensus approach to labeling "clean" versus "artifact" segments is implied.
- Rhythmic and Periodic Patterns: Annotations had to be consistent between both reviewers to be used in the sensitivity and specificity measurement. This implies 100% agreement between 2 reviewers.
- Burst Suppression: The performance was analyzed for consensus annotations of the two reviewers, meaning annotation segments where both reviewers showed the same decision. This implies 100% agreement between 2 reviewers.
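The 2-of-3 interval-overlap rule used for seizure and spike ground truth can be sketched as follows. This is a simplified illustration assuming each reviewer supplies `(start, end)` times in seconds; merging of events confirmed by more than one reviewer pair is omitted:

```python
def overlap(a, b, min_overlap=1.0):
    """Overlapping sub-interval of two (start, end) annotations, if >= min_overlap seconds."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if end - start >= min_overlap else None

def two_of_three_events(r1, r2, r3, min_overlap=1.0):
    """Events confirmed by at least two of three reviewers."""
    events = []
    for x, y in ((r1, r2), (r1, r3), (r2, r3)):
        for a in x:
            for b in y:
                ov = overlap(a, b, min_overlap)
                if ov is not None:
                    events.append(ov)  # epoch = overlapping range of the two reviewers
    return events
```

E.g., with reviewer annotations `[(10.0, 40.0)]`, `[(12.0, 38.0)]`, and `[]`, the confirmed seizure epoch is `(12.0, 38.0)`, mirroring the rule that the epoch is the overlapping time range of the two agreeing reviewers.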
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
No, an MRMC study comparing human readers with AI assistance versus without AI assistance was not explicitly conducted or reported. The studies focused on the standalone performance of the AI components and their non-inferiority to a predicate device (another software), not on the improvement of human readers with AI assistance.
6. If a Standalone Performance (Algorithm Only Without Human-in-the-Loop Performance) Was Done
Yes, standalone performance was assessed for all tested components. The clinical performance data presented (seizure detection, spike detection, artifact reduction, rhythmic and periodic patterns, aEEG, frequency bands, burst suppression) evaluated the algorithm's output against established ground truth without a human in the loop interpreting the AI's results. For instance, seizure detection compared the algorithm's detected seizure time points to the expert consensus.
7. The Type of Ground Truth Used
- Seizure Detection: Expert consensus (2 out of 3 neurologists).
- Spike Detection: Expert consensus (2 out of 3 neurologists).
- Artifact Reduction: Expert annotations (epileptologists/neurologists) for clean EEG and artifact patterns.
- Rhythmic and Periodic Patterns: Expert consensus (2 clinical neurophysiologists with 100% agreement).
- aEEG: Based on the proposed method of Zhang and Ding (2013) for frequency response, and comparison to another FDA-approved software (Persyst) for real EEG data.
- Frequency Bands: Based on predefined frequency borders (Delta, Theta, Alpha, Beta) and manual selection of EEG recordings representative of specific background-EEG-frequency bands by experts.
- Burst Suppression: Expert consensus (2 clinical neurophysiologists with 100% agreement) for burst suppression patterns.
8. The Sample Size for the Training Set
The document does not explicitly provide the sample size for the training set for any of the algorithms. It often refers to "large amount of EEG data from different centers" for bench testing, but does not distinguish between training and test data or specify training set sizes.
9. How the Ground Truth for the Training Set Was Established
Similarly, the document does not explicitly describe how ground truth for the training set was established, as it does not detail the training process or the data used for it. The focus is on the validation/test set and its ground truth establishment.