Acceptance Criteria and Study for autoSCORE (V 2.0.0)
This response outlines the acceptance criteria for autoSCORE (V 2.0.0) and the study conducted to demonstrate the device meets these criteria, based on the provided FDA 510(k) clearance letter.
1. Table of Acceptance Criteria and Reported Device Performance
The FDA clearance document does not explicitly present a table of predefined acceptance criteria (e.g., minimum PPV of X%, minimum Sensitivity of Y%). Instead, the regulatory strategy appears to be a demonstration of substantial equivalence through comparison to predicate devices and human expert consensus. The "Performance Validation" section (Section 7) outlines the metrics evaluated, and the "Validation Summary" (Section 7.2.6) states the conclusion of similarity.
Therefore, the "acceptance criteria" are implied to be that the device performs similarly to the predicate devices and/or to human experts, particularly in terms of Positive Predictive Value (PPV), as this was deemed clinically critical.
Here’s a table summarizing the reported device performance, which the manufacturer concluded met the implicit "acceptance criteria" by demonstrating substantial equivalence:
| Performance Metric (Category) | autoSCORE V2 (Reported Performance) | Primary Predicate (encevis) (Reported Performance) | Secondary Predicate (autoSCORE V1.4) (Reported Performance) | Note on Comparison & Implied Acceptance |
|---|---|---|---|---|
| Recording Level - Accuracy (Abnormal) | 0.912 (0.850, 0.963) | - | 0.950 (0.900, 0.990) | autoSCORE v2 comparable to autoSCORE v1.4. encevis not provided for "Abnormal." |
| Recording Level - Sensitivity (Abnormal) | 0.926 (0.859, 0.985) | - | 1.000 (1.000, 1.000) | autoSCORE v2 slightly lower than v1.4, but still high. |
| Recording Level - Specificity (Abnormal) | 0.833 (0.583, 1.000) | - | 0.884 (0.778, 0.974) | autoSCORE v2 comparable to v1.4. |
| Recording Level - PPV (Abnormal) | 0.969 (0.922, 1.000) | - | 0.920 (0.846, 0.983) | autoSCORE v2 high PPV, comparable to v1.4. |
| Recording Level - Accuracy (IED) | 0.875 (0.800, 0.938) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | IED (Interictal Epileptiform Discharges) combines Focal Epi and Gen Epi. autoSCORE v2 significantly higher accuracy than encevis. |
| Recording Level - Sensitivity (IED) | 0.939 (0.864, 1.000) | 1.000 (1.000, 1.000) | IED not provided for v1.4 | autoSCORE v2 sensitivity high, slightly below encevis. |
| Recording Level - Specificity (IED) | 0.774 (0.618, 0.914) | 0.000 (0.000, 0.000) | IED not provided for v1.4 | autoSCORE v2 significantly higher Specificity than encevis (encevis had 0.000 specificity for IED). |
| Recording Level - PPV (IED) | 0.868 (0.769, 0.952) | 0.613 (0.500, 0.713) | IED not provided for v1.4 | autoSCORE v2 significantly higher PPV than encevis (considered a key clinical metric). |
| Marker Level - PPV (Focal Epi) | 0.560 (0.526, 0.594) | - | 0.626 (0.616, 0.637) (Part 1) / 0.716 (0.701, 0.732) (Part 5) | autoSCORE v2 PPV slightly lower than v1.4 in some instances, but within general range. Comparison is against earlier validation parts of autoSCORE v1.4. |
| Marker Level - PPV (Gen Epi) | 0.446 (0.405, 0.486) | - | 0.815 (0.802, 0.828) (Part 1) / 0.825 (0.799, 0.849) (Part 5) | autoSCORE v2 PPV significantly lower than v1.4. This is a point of difference. |
| Marker Level - PPV (Focal Non-Epi) | 0.823 (0.794, 0.852) | - | 0.513 (0.506, 0.520) (Part 1) / 0.570 (0.556, 0.585) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (Diff Non-Epi) | 0.849 (0.822, 0.876) | - | 0.696 (0.691, 0.702) (Part 1) / 0.537 (0.520, 0.554) (Part 5) | autoSCORE v2 PPV significantly higher than v1.4. |
| Marker Level - PPV (IED) | 0.513 (0.486, 0.539) | 0.257 (0.166, 0.349) | 0.389 (0.281, 0.504) | autoSCORE v2 significantly higher PPV than encevis and autoSCORE v1.4. This is a key finding highlighted. |
| Correlation (Prob. vs. TP Markers) | p-value < 0.05 (for positive correlation) | Not applicable | Not applicable | The validation states a "significant positive correlation" (p-value < 0.05) was the criterion, and this was met. |
Key takeaway on Acceptance: The "Validation Summary" directly states: "autoSCORE demonstrated a higher PPV overall compared to the predicate device encevis and a similar PPV compared to autoSCORE v1.4... autoSCORE was found to have a safety and effectiveness profile that is similar to the predicate devices." This conclusion, particularly the superior/similar PPV results, formed the basis for deeming the device "as safe, as effective, and performs as well as" the predicates.
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 80 EEGs (40 Long Term Monitoring EEGs (LTMs) and 40 Ambulatory EEGs (AEEGs)).
- Data Provenance: Retrospective, de-identified data. The source hospitals/organizations anonymized the recordings, removing all patient metadata except age and gender. No country of origin is stated, suggesting a general pool of collected clinical data. The time periods for data collection are:
- SET 1 LTMs: 39 EEGs – June-September 2024; 1 EEG: November 2021-December 2021.
- SET 2 AEEGs: 40 EEGs: June-October 2024.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Three Human Experts (HEs) were used for consensus per EEG.
- Qualifications: The document describes them as "Human Experts (HEs)," "suitably trained professional[s] who [are] qualified to clinically review EEG recordings," and "qualified medical practitioners." Specific experience levels (e.g., "neurologist with 10 years of EEG experience") are not provided, but the context implies board-certified neurologists or equivalent specialists proficient in EEG interpretation.
4. Adjudication Method for the Test Set
- Adjudication Method: Consensus of three Human Experts (HEs) was used as the reference standard.
- For recording-level validation, HEs independently labeled each EEG segment.
- For marker-level validation (PPV), each autoSCORE-placed marker was reviewed by HEs; a marker was classified as a True Positive (TP) if at least two of the three HEs agreed it correctly identified the abnormality type, and as a False Positive (FP) otherwise (see the sketch after this list).
- To prevent bias, HEs evaluating recording-level were blinded to autoSCORE output, and HEs evaluating marker-level had not participated in the initial recording-level assessment of the same EEG. All HEs were blinded to patient metadata (except age/gender) and autoSCORE outputs.
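To make the two-out-of-three adjudication concrete, the following is a minimal sketch in Python (illustrative only; the variable names and vote data are hypothetical, not taken from the submission):

```python
# Minimal sketch of the 2-of-3 marker adjudication described above.
# `markers` is hypothetical: one entry per autoSCORE marker, with each
# expert's judgement of whether the assigned abnormality type is present.
markers = [
    {"marker_id": 1, "votes": [True, True, False]},   # 2/3 agree -> TP
    {"marker_id": 2, "votes": [True, False, False]},  # 1/3 agree -> FP
    {"marker_id": 3, "votes": [True, True, True]},    # 3/3 agree -> TP
]

tp = sum(1 for m in markers if sum(m["votes"]) >= 2)
fp = len(markers) - tp
ppv = tp / (tp + fp)
print(f"TP={tp}, FP={fp}, PPV={ppv:.3f}")  # TP=2, FP=1, PPV=0.667
```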
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
- No. The study described is a standalone performance evaluation: human experts established the ground truth, and the algorithm's output was compared against their consensus and against the predicate devices. It was not a Multi-Reader Multi-Case (MRMC) comparative effectiveness study in which human readers interpret cases with and without AI assistance. Therefore, no effect size for human readers improving with AI assistance is reported.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Yes, a standalone performance evaluation of the autoSCORE algorithm was conducted. The study assessed the algorithm's ability to identify and categorize abnormalities in EEG recordings by comparing its outputs directly against human expert consensus (ground truth) and against predicate devices. The validation focused on the algorithm's output metrics (Accuracy, Sensitivity, Specificity, PPV, NPV) at both recording and marker levels.
7. The Type of Ground Truth Used
- The ground truth used was Expert Consensus. Specifically, a consensus agreement of three Human Experts (HEs) served as the reference standard for both recording-level analysis (presence/absence of abnormalities, and their types) and marker-level validation (correctness of autoSCORE-placed markers and their assigned abnormality types). This approach also included a "gold standard" where HEs, blinded to autoSCORE outputs, independently marked abnormalities in the EEG segments.
8. The Sample Size for the Training Set
- The document states that autoSCORE "has been trained with standard deep learning principles using a large training dataset." However, the exact sample size for the training set is not provided in the given FDA 510(k) clearance letter.
9. How the Ground Truth for the Training Set Was Established
- The document does not describe how the ground truth for the training set was established; it only details ground truth establishment for the test set (expert consensus). Training data for such models are typically annotated by experts, but the specifics are not included in this document.
FDA 510(k) Clearance Letter - autoSCORE (V 2.0.0)
April 9, 2025
Holberg EEG AS
Smriti Franklin
QARA Director
Fjøsangerveien 70A
Bergen, Bergen 5068
Norway
Re: K243743
Trade/Device Name: autoSCORE (V 2.0.0)
Regulation Number: 21 CFR 882.1400
Regulation Name: Electroencephalograph
Regulatory Class: Class II
Product Code: OMB
Dated: December 4, 2024
Received: March 10, 2025
Dear Smriti Franklin:
We have reviewed your section 510(k) premarket notification of intent to market the device referenced above and have determined the device is substantially equivalent (for the indications for use stated in the enclosure) to legally marketed predicate devices marketed in interstate commerce prior to May 28, 1976, the enactment date of the Medical Device Amendments, or to devices that have been reclassified in accordance with the provisions of the Federal Food, Drug, and Cosmetic Act (the Act) that do not require approval of a premarket approval application (PMA). You may, therefore, market the device, subject to the general controls provisions of the Act. Although this letter refers to your product as a device, please be aware that some cleared products may instead be combination products. The 510(k) Premarket Notification Database available at https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm identifies combination product submissions. The general controls provisions of the Act include requirements for annual registration, listing of devices, good manufacturing practice, labeling, and prohibitions against misbranding and adulteration. Please note: CDRH does not evaluate information related to contract liability warranties. We remind you, however, that device labeling must be truthful and not misleading.
If your device is classified (see above) into either class II (Special Controls) or class III (PMA), it may be subject to additional controls. Existing major regulations affecting your device can be found in the Code of Federal Regulations, Title 21, Parts 800 to 898. In addition, FDA may publish further announcements concerning your device in the Federal Register.
Additional information about changes that may require a new premarket notification are provided in the FDA guidance documents entitled "Deciding When to Submit a 510(k) for a Change to an Existing Device" (https://www.fda.gov/media/99812/download) and "Deciding When to Submit a 510(k) for a Software Change to an Existing Device" (https://www.fda.gov/media/99785/download).
Your device is also subject to, among other requirements, the Quality System (QS) regulation (21 CFR Part 820), which includes, but is not limited to, 21 CFR 820.30, Design controls; 21 CFR 820.90, Nonconforming product; and 21 CFR 820.100, Corrective and preventive action. Please note that regardless of whether a change requires premarket review, the QS regulation requires device manufacturers to review and approve changes to device design and production (21 CFR 820.30 and 21 CFR 820.70) and document changes and approvals in the device master record (21 CFR 820.181).
Please be advised that FDA's issuance of a substantial equivalence determination does not mean that FDA has made a determination that your device complies with other requirements of the Act or any Federal statutes and regulations administered by other Federal agencies. You must comply with all the Act's requirements, including, but not limited to: registration and listing (21 CFR Part 807); labeling (21 CFR Part 801); medical device reporting (reporting of medical device-related adverse events) (21 CFR Part 803) for devices or postmarketing safety reporting (21 CFR Part 4, Subpart B) for combination products (see https://www.fda.gov/combination-products/guidance-regulatory-information/postmarketing-safety-reporting-combination-products); good manufacturing practice requirements as set forth in the quality systems (QS) regulation (21 CFR Part 820) for devices or current good manufacturing practices (21 CFR Part 4, Subpart A) for combination products; and, if applicable, the electronic product radiation control provisions (Sections 531-542 of the Act); 21 CFR Parts 1000-1050.
All medical devices, including Class I and unclassified devices and combination product device constituent parts are required to be in compliance with the final Unique Device Identification System rule ("UDI Rule"). The UDI Rule requires, among other things, that a device bear a unique device identifier (UDI) on its label and package (21 CFR 801.20(a)) unless an exception or alternative applies (21 CFR 801.20(b)) and that the dates on the device label be formatted in accordance with 21 CFR 801.18. The UDI Rule (21 CFR 830.300(a) and 830.320(b)) also requires that certain information be submitted to the Global Unique Device Identification Database (GUDID) (21 CFR Part 830 Subpart E). For additional information on these requirements, please see the UDI System webpage at https://www.fda.gov/medical-devices/device-advice-comprehensive-regulatory-assistance/unique-device-identification-system-udi-system.
Also, please note the regulation entitled, "Misbranding by reference to premarket notification" (21 CFR 807.97). For questions regarding the reporting of adverse events under the MDR regulation (21 CFR Part 803), please go to https://www.fda.gov/medical-devices/medical-device-safety/medical-device-reporting-mdr-how-report-medical-device-problems.
For comprehensive regulatory information about medical devices and radiation-emitting products, including information about labeling regulations, please see Device Advice (https://www.fda.gov/medical-devices/device-advice-comprehensive-regulatory-assistance) and CDRH Learn (https://www.fda.gov/training-and-continuing-education/cdrh-learn). Additionally, you may contact the Division of Industry and Consumer Education (DICE) to ask a question about a specific regulatory topic. See the DICE website (https://www.fda.gov/medical-devices/device-advice-comprehensive-regulatory-assistance/contact-us-division-industry-and-consumer-education-dice) for more information or contact DICE by email (DICE@fda.hhs.gov) or phone (1-800-638-2041 or 301-796-7100).
Sincerely,
Jay R. Gupta -S
Jay Gupta
Assistant Director
DHT5A: Division of Neurosurgical,
Neurointerventional, and
Neurodiagnostic Devices
OHT5: Office of Neurological and
Physical Medicine Devices
Office of Product Evaluation and Quality
Center for Devices and Radiological Health
Enclosure
DEPARTMENT OF HEALTH AND HUMAN SERVICES
Food and Drug Administration
Indications for Use
Submission Number (if known)
K243743
Device Name
autoSCORE (V 2.0.0)
Indications for Use (Describe)
• autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
• The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old for EEGs <4 hours and at least two years old for EEGs >4 hours. The autoSCORE component has not been assessed for intracranial recordings.
• autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities, including epileptiform and non-epileptiform abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
• This device does not provide any diagnostic conclusion about the patient's condition to the user. The device is not intended to detect or classify seizures.
Type of Use (Select one or both, as applicable)
☒ Prescription Use (Part 21 CFR 801 Subpart D) ☐ Over-The-Counter Use (21 CFR 801 Subpart C)
510(k) Summary
1. 510K SUBMITTER
Holberg EEG AS
Fjøsangerveien 70A
5068 Bergen, Norway
Phone: +47 926 44 261
Contact Person: Smriti Franklin
Date Prepared: 26th November 2024
2. DEVICE IDENTIFICATION
Name of Device: autoSCORE V2.0
Common or Usual Name: autoSCORE
Classification Name and Regulation Number: Electroencephalograph, 21 CFR 882.1400
Regulatory Class: II
Product code: OMB
3. PREDICATE DEVICES
4.1 Primary Predicate Device
Trade/Device Name: encevis
Model Number: 1.6
510(K) Submitter/Holder: AIT Austrian Institute of Technology GmbH
510(K) Reference: K171720
4.2 Additional Predicate Device
Trade/Device Name: autoSCORE
Model Number: autoSCORE V1.4
510(K) Submitter/Holder: Holberg EEG AS
510(K) Reference: K231068
No reference devices have been used in this submission.
4. Device Description
autoSCORE is a software only device.
autoSCORE is an AI model that has been trained with standard deep learning principles using a large training dataset. The model will be locked in the field, so it cannot learn from data to which it is exposed when in use. It can only be used with a compatible electroencephalogram (EEG) reviewing software, which acquires and displays the EEG. The model has no user interface. The form of the visualization of the annotations is determined and provided by the EEG reviewing software.
autoSCORE has been trained to identify and then indicate to the user sections of EEG which may include abnormalities and to provide the level of probability of the presence of an abnormality. The algorithm also provides categorization of identified areas of abnormality into the four predefined types of abnormalities, again including a probability of that predefined abnormality type. This is performed by identifying epileptiform abnormalities/spikes (Focal epileptiform and generalised epileptiform) as well identifying non-epileptiform abnormalities (Focal non-epileptiform and Diffuse Non-Epileptiform).
This data is then provided by the algorithm to the EEG reviewing software, for it to display as part of the EEG output for the clinician to review. autoSCORE does not provide any diagnostic conclusion about the patient's condition nor treatment options to the user, and does not replace visual assessment of the EEG by the user. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
5.1 Intended Use of the Device
Detailed Intended Use
autoSCORE is a software-only decision support product intended to be used with compatible EEG software. It is intended to assist the user when reviewing EEG recordings by assessing the probability that the previously acquired sections of EEG recordings contain abnormalities and classifying these into predefined types of abnormality. autoSCORE sends this information to the EEG software to indicate where markers indicating abnormality are to be placed in the EEG.
autoSCORE also provides an overview of the probabilities that EEG recordings between 14 minutes and 4 hours include any abnormalities and the probabilities of specific predefined type of abnormalities they include. For EEG recordings of duration more than 4 hours, autoSCORE indicates the number of segments with duration of 2-4 hours that include any abnormalities and the total number of analyzed segments. The overview for EEG recordings of duration more than 4 hours also provides the number of segments that include specific pre-defined types of abnormalities and the total number of analyzed segments.
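As a rough illustration of the two overview formats described above, a hypothetical output structure could look like the sketch below; the field names and values are illustrative assumptions, not the device's actual interface:

```python
# Hypothetical illustration of the two overview formats described above.
# Field names and values are illustrative only, not the device's actual output.

overview_short = {                      # recording between 14 minutes and 4 hours
    "abnormal_probability": 0.91,
    "type_probabilities": {
        "FocalEpi": 0.72, "GeneralizedEpi": 0.10,
        "FocalNonEpi": 0.35, "DiffuseNonEpi": 0.20,
    },
}

overview_long = {                       # recording longer than 4 hours
    "segments_analyzed": 6,             # 2-4 hour segments
    "segments_with_any_abnormality": 4,
    "segments_per_type": {
        "FocalEpi": 3, "GeneralizedEpi": 0,
        "FocalNonEpi": 2, "DiffuseNonEpi": 1,
    },
}
```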
The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
autoSCORE cannot detect or classify seizures. The recorded EEG activity is not altered by the information provided by autoSCORE. autoSCORE is not intended to provide information for diagnosis but to assist clinical workflow when using the EEG software.
5.2 Intended Users
The intended user is a suitably trained professional who is qualified to clinically review EEG recordings.
5.3 Indications for Use
autoSCORE can be used wherever EEG data must be evaluated. This includes in particular neurological wards, epilepsy monitoring units and neurological practices.
5.3.1 Indications for Use Statement
- autoSCORE is intended for the review, monitoring and analysis of EEG recordings made by electroencephalogram (EEG) devices using scalp electrodes and to aid neurologists in the assessment of EEG. This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information.
- The spike detection component of autoSCORE is intended to mark previously acquired sections of the patient's EEG recordings that may correspond to spikes, in order to assist qualified clinical practitioners in the assessment of EEG traces. The spike detection component is intended to be used in patients at least three months old for EEGs <4 hours and at least two years old for EEGs >4 hours. The autoSCORE component has not been assessed for intracranial recordings.
- autoSCORE is intended to assess the probability that previously acquired sections of EEG recordings contain abnormalities, and classifies these into pre-defined types of abnormalities. autoSCORE does not have a user interface. autoSCORE sends this information to the EEG reviewing software to indicate where markers indicating abnormality are to be placed in the EEG. autoSCORE also provides the probability that EEG recordings include abnormalities, and the type of abnormalities. The user is required to review the EEG and exercise their clinical judgement to independently make a conclusion supporting or not supporting brain disease.
- This device does not provide any diagnostic conclusion about the patient's condition to the user.
The Indications for Use statement for autoSCORE is identical to that of the secondary predicate device. It is not identical to that of the primary predicate device, as autoSCORE does not include certain encevis features such as seizure detection, burst suppression detection or calculation of quantitative measures. Points 1, 2 and 4 of the Indications for Use statement are identical to the respective parts of the primary predicate device's statement. Point 3 describes autoSCORE's technological characteristics that differ from the primary predicate device, as was the case for autoSCORE V1.4. These differences do not alter the intended use of the device, nor do they affect the safety and effectiveness of the device relative to the primary predicate. Both the subject and predicate devices have the same intended use: analysing electroencephalograph data, detecting events such as spikes, and outputting detected parameters for interpretation by a qualified user.
5.4 autoSCORE Software Technology
autoSCORE is a decision support software that assists trained healthcare professionals with the clinical reviewing of human scalp EEG recordings acquired from patients aged 3 months or older for EEG <4 hours and 2 years and older for EEGs >4 hours. It is a locked algorithm using Deep Learning principles to assess the probability that previously acquired sections of EEG contain abnormalities. Deep Learning is a subset of the Artificial Intelligence and Machine Learning methodologies, which uses artificial neural networks for data analysis.
autoSCORE assesses epileptiform as well as non-epileptiform abnormalities in the patient's EEG. It categorizes the assessed abnormalities into predefined types including Focal Epileptiform, Generalized Epileptiform, Focal Non-Epileptiform and Diffuse Non-Epileptiform abnormalities. The probability of abnormality is assessed for each type of abnormality on the level of the EEG recording as well as for individual markers within the EEG recording.
autoSCORE cannot detect or classify seizures.
autoSCORE is designed to integrate with compatible EEG reviewing software through an integration layer. Users do not need to connect autoSCORE to the EEG Reviewing software and it cannot be purchased by an individual physician without an integration with the EEG reviewing software. autoSCORE shall be available as a feature in the compatible EEG reviewing software. autoSCORE receives EEG data and EEG metadata as input from the compatible EEG reviewing software, including the patient's age, gender, and the electrode sensor labels of the EEG recording.
autoSCORE assesses the EEG using the autoSCORE AI model and automatically annotates the EEG where an abnormality is identified (including the type of abnormality and its probability). This annotation, categorization and probability output is generated and sent to the compatible EEG reviewing software. The output is then presented in the electronic user interface of the compatible EEG reviewing software to a qualified medical professional for independent assessment. The recorded EEG activity and the EEG metadata used as input are not altered by the information provided by autoSCORE. autoSCORE does not store any input or output data. Input data are merely utilized by autoSCORE for the purpose of generating output data, which are then sent to the EEG reviewing software.
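The input/output exchange described above can be pictured with a hypothetical wrapper such as the sketch below; the function signature and field names are assumptions for illustration and do not represent Holberg EEG's actual integration interface:

```python
import numpy as np

# Hypothetical sketch of the data flow described above. Names and signatures
# are illustrative; the real exchange is defined by the integration layer of
# the compatible EEG reviewing software, not by this code.

def analyze_eeg(eeg: np.ndarray, sample_rate_hz: float, age_years: float,
                gender: str, electrode_labels: list[str]) -> list[dict]:
    """Return annotation markers for the EEG reviewing software to display."""
    # A real model would run inference here; this stub returns one example
    # marker with an abnormality type and probability, mirroring the output
    # described in the text. Input data are not modified or stored.
    return [{"start_s": 120.0, "end_s": 121.5,
             "type": "FocalEpi", "probability": 0.93}]

markers = analyze_eeg(np.zeros((19, 256 * 60)), 256.0, 34, "F",
                      ["Fp1", "Fp2", "F3", "F4"])
```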
5. Device Comparison Table
The device comparison table outlines the differences and similarities between autoSCORE and the predicate devices including technological characteristics.
Table 1: Comparison of autoSCORE against predicate devices.
| | encevis | autoSCORE v1.4 | autoSCORE V2 | Comments |
|---|---|---|---|---|
| Device Description and Features | ||||
| Device Type | Software-only Device | Software-only Device | Software-only Device | Identical |
| General Device Description | EEG Review and Analysis Software | EEG Review and Analysis Software | EEG Review and Analysis Software | Identical |
| Identifies Spikes | Yes | Yes | Yes | Identical |
| Assessment and categorization of abnormalities including probability in previously acquired sections of EEG | No | Yes | Yes | Different for primary predicate. Same for secondary predicate device. |
| Device Operation | ||||
| Type of EEG | Scalp EEG | Scalp EEG | Scalp EEG | Identical |
| Population age | Adults (age > 18) | > 3 months | > 3 months for EEGs <4 hours and > 2 years for EEGs >4 hours | Minimum patient age higher than secondary predicate device. |
| Design Input | Raw EEG signal | Raw EEG Signal | Raw EEG Signal | Identical |
| Design Input files | Calculation is based on EEG data recorded by external EEG systems. They are either read from the EEG file provided by the EEG system or can be sent to encevis using the interface provided by AIT (AITInterfaceDLL) | Calculation is based on EEG data recorded by external EEG systems. They are read from the EEG data provided by the EEG system | Calculation is based on EEG data recorded by external EEG systems. They are read from the EEG data provided by the EEG system | Identical (No AIT interface) |
| Algorithm | Convolutional Neural Network | Convolutional Neural Network | Convolutional Neural Network | Identical |
| User-defined parameters | No parameters in spike detection algorithm can be changed by the user | No parameters in spike detection algorithm can be changed by the user | No parameters in spike detection algorithm can be changed by the user | |
| Type of EEG Analysis | Post-hoc analysis | Post-hoc analysis | Post-hoc analysis | Identical |
| Design Output | Spike Detection component makes the results available to the user in form of markers | Spike Detection component makes the results available to the user in form of markers | Spike Detection component makes the results available to the user in form of markers | Identical |
| Output Files | Results are stored in a database and/or sent over the AITInterfaceDLL interface to an external EEG system. User output is given by graphical user interfaces | Results are returned back to the host software after analysis. | Results are returned back to the host software after analysis. | Similar |
| Diagnostic conclusion | This device does not provide any diagnostic conclusion about the patient's condition to the user. | This device does not provide any diagnostic conclusion about the patient's condition to the user. | This device does not provide any diagnostic conclusion about the patient's condition to the user. | Identical |
| User | This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information. | This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information. | This device is intended to be used by qualified medical practitioners who will exercise professional judgment in using the information. | Identical |
| Compliance | No standard data format available in the industry | No standard data format available in the industry | No standard data format available in the industry | Identical |
| Compatible and interoperable Equipment and software | encevis can read and process EEG data from several EEG vendors. A list of compatible EEG systems can be found on http://www.encevis.com | autoSCORE can read and process EEG data from any compatible/interoperable EEG systems. https://www.holbergeeg.com/compatible-eeg-reviewing-software | autoSCORE can read and process EEG data from any compatible/interoperable EEG systems. https://www.holbergeeg.com/compatible-eeg-reviewing-software | Similar |
There are technological differences between the subject device (autoSCORE V2) and the primary predicate device (encevis) that have been highlighted in Table 1 above. The primary predicate device has additional features, such as seizure detection, analysis of quantitative features, a user interface and aEEG functionality, that are outside the intended use of the subject device. These features are completely independent functions that do not impact the spike detection component. The absence of these features only makes the output given by the subject device (autoSCORE V2) lower risk to the patient than the output provided by the primary predicate device (encevis).
encevis and autoSCORE V2 detect spikes (epileptiform abnormalities). In addition to the spike detection of epileptiform abnormalities, autoSCORE V2 and autoSCORE V1.4 also detect non-epileptiform abnormalities. autoSCORE V2 and V1.4 also give the probability of a detected abnormality being an epileptiform abnormality (Focal epileptiform, Generalized epileptiform) or a non-epileptiform abnormality (Focal non-epileptiform, Diffuse non-epileptiform). The identification of additional abnormalities and the categorization of these abnormalities does not pose any additional risks beyond the information provided by the predicate devices, as evidenced through performance validation.
There are no technological differences between the subject device (autoSCORE V2) and the secondary predicate device (autoSCORE V1.4). There are some minor design changes, but all major technological characteristics, including the AI model, are the same for autoSCORE V1.4 and V2.0.
7. Performance Validation
autoSCORE Performance Validation was conducted to evaluate autoSCORE performance in two parts.
- Non-Clinical Validation – To validate autoSCORE outputs against defined autoSCORE Inputs and User requirements. Verification and validation activities established the safety and performance characteristics of the subject device with respect to the predicate device.
- Clinical Validation – To validate autoSCORE performance against Independent Human Experts and predicate devices.
These validations have been summarised below.
7.1 Non-Clinical Performance Validation
Software verification and validation testing was conducted and documented in accordance with FDA Guidance for Industry and FDA Staff, Guidance for the Content of Software Contained in Medical Devices. Product Design and Software Requirements Traceability has been documented and verified against verification and validation test results.
Verification and validation testing includes:
- Code Review
- Unit level testing
- System level testing
- Integration level testing
Verification and validation activities established the safety and performance characteristics of the subject device with respect to the predicate device. The following performance data have been provided in support of the substantial equivalence determination.
Table 2: Type of performance test per feature
autoSCORE features: identification and categorization of the following abnormalities.
| Verification Tests Performed | Normal EEG | Focal Epileptiform (Spike Detection) | Generalized Epileptiform (Spike Detection) | Focal Non-Epileptiform | Diffuse Non-Epileptiform |
|---|---|---|---|---|---|
| Software Verification and Validation Testing | x | x | x | Not available in predicate | Not available in predicate |
7.2 Clinical Performance Validation
7.2.1 Clinical Performance Evaluation
A retrospective non-interventional comprehensive clinical validation was performed using de-identified data to evaluate the performance of all autoSCORE features against Human Experts (HEs) and predicate devices to establish substantial equivalence.
The following performance data have been provided in support of the substantial equivalence determination.
Table 3: Type of performance test per feature
autoSCORE features: identification and categorization of the following abnormalities.
| Validation Tests Performed | Normal EEG | Focal Epileptiform (Spike Detection) | Generalized Epileptiform (Spike Detection) | Focal Non-Epileptiform | Diffuse Non-Epileptiform |
|---|---|---|---|---|---|
| Direct Comparison Against Predicate Device | x | x | x | autoSCORE V1.4 | autoSCORE V1.4 |
| Comparison with Human Expert Evaluation | x | x | x | x | x |
For performance evaluation of the autoSCORE spike detection device, the study was conducted to measure outputs of autoSCORE V2 against the spike detection from encevis and autoSCORE V1.4, using HE consensus as the reference standard.
7.2.2 Study Population
40 Long Term Monitoring EEGs (LTMs) and 40 Ambulatory EEGs (AEEGs) were included, ensuring a broad distribution of age, gender, patient setting (excluding ICU and neonatal recordings) and types of abnormalities. The EEG recordings used in this validation were anonymized by the source hospital/organization; all patient metadata were removed except age and gender.
The following distribution of EEGs was used in this validation.
Figure 1: The figure shows the distribution of normal/abnormal (including abnormality types) EEGs in LTM and ambulatory settings for adult and pediatric EEGs. (NOTE - Each EEG may contain multiple abnormalities.)
| | Normal | Focal Epi | Generalized Epi | Focal Non-Epi | Diffuse Non-Epi |
|---|---|---|---|---|---|
| AEEG Adult | 4 | 8 | 4 | 3 | 6 |
| AEEG Paediatric | 4 | 10 | 7 | 4 | 4 |
| LTM Adult | 4 | 8 | 5 | 5 | 4 |
| LTM Paediatric | 4 | 7 | 8 | 7 | 4 |
7.2.3 Reference Standard
A consensus of three HEs was used as the reference standard for all calculations. Each segment was prepared in two forms:
- Without any markers placed by autoSCORE v 2.0 for recording level validation
- With autoSCORE v2.0 markers and their assigned type of abnormality for marker level validation.
To prevent bias in HE assessment, no HE evaluated the same EEG segment in both recording-level (without markers) and marker-level (with markers) formats. Each HE was assigned a distinct set of EEG segments and was blinded to the autoSCORE output for their assigned recording level validation segments. EEG segments and markers were distributed to ensure a three-HE consensus per EEG. HEs were blinded to patient metadata, with the exception of age and gender, and to the outputs of autoSCORE.
While reviewing EEGs, HEs were permitted to change montages, filters, gain, and time resolution. For recording-level validation, HEs independently labelled each EEG segment using the same predefined abnormality types as autoSCORE and inserted markers into the EEG where abnormalities could be found.
For marker-level validation, HEs reviewed autoSCORE v 2.0 markers, retaining those where they agreed that the given abnormality type was present within the markers' boundaries and removing markers if the given abnormality type was absent.
7.2.4 Analytical Methods
The analytical methods employed in this validation are described in 7.2.4.1-7.2.4.5 below.
Figure 1 – This flowchart shows the hierarchical organization of the autoSCORE outputs, including the thresholds used to classify recordings into categories such as normal or abnormal, specific abnormality types, and associated output for recordings with duration of four hours and longer. The arrows indicate dependencies, for example: a marker of the type "Focal Epi" is only given if the corresponding segment-level output also exceeds the threshold for "Focal Epi".
7.2.4.1 Recording Level Validation
The binary metrics given in Table 4, in section 7.2.5, were computed independently for each feature (Normal/Abnormal, Focal Epi, Gen Epi, Focal Non-Epi, Diffuse Non-Epi) with 95% symmetric confidence intervals. The following definitions were used for the binary metrics for the recording segment level:
TP – HE consensus indicated that the condition is present and autoSCORE also indicates that the condition is present.
FP - HE consensus indicated that the condition is not present but autoSCORE indicates that the condition is present.
TN - HE consensus indicated that the condition is not present and autoSCORE also indicates that the condition is not present.
FN - HE consensus indicated that the condition is present but autoSCORE indicates that the condition is not present.
Values from the contingency tables were used to calculate the following performance metrics, with 95% confidence intervals computed using bootstrap resampling: Sensitivity (TPR), Specificity (TNR), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Accuracy.
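For illustration, a minimal sketch of how these metrics follow from the contingency counts (the counts below are made up, not study data):

```python
# Illustrative recording-level metrics from a 2x2 contingency table.
# The counts are invented for the example, not taken from the validation.
tp, fp, tn, fn = 50, 3, 20, 7

sensitivity = tp / (tp + fn)                  # TPR
specificity = tn / (tn + fp)                  # TNR
ppv = tp / (tp + fp)                          # precision
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(sensitivity, specificity, ppv, npv, accuracy)
```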
7.2.4.2 Marker Level Validation
In the current study, marker-level validation was performed using multiple approaches to evaluate the performance of autoSCORE in detecting and annotating EEG abnormalities. Given the practical limitations of HEs in marking every abnormality within lengthy EEG recordings, different methods were employed to compute the relevant performance metrics: Positive Predictive Value (PPV), True Positive Rate (TPR), False Positive Rate (FPR), and Negative Predictive Value (NPV).
Positive Predictive Value (PPV):
PPV was calculated using the assessments of autoSCORE markers by a consensus of three HEs. Each marker placed by autoSCORE was reviewed by HEs who had not participated in the initial recording-level assessment of the same EEG. A marker was classified as a True Positive (TP) if at least two HEs agreed that it correctly identified the abnormality type. Conversely, if fewer than two HEs agreed, the marker was considered a False Positive (FP). This approach allowed us to compute the PPV as:
PPV = TP / (TP + FP)
The resulting PPV values are given in Table 5 in section 7.2.5.
From a clinical perspective, avoiding false positives is generally considered more critical than avoiding false negatives. For this reason PPV was chosen as the acceptance criterion for evaluating autoSCORE's performance [1, 2]. The most robust method for calculating the PPV of autoSCORE markers involved presenting the markers to HEs, who were blinded to the recording-level autoSCORE outputs as outlined above.
Table 6 summarises the overlap between autoSCORE markers, encevis markers, HE-placed markers, and the HE consensus.
7.2.4.3 Validation of Probability Output
The probability outputs assigned to the markers by autoSCORE were validated by analyzing the relationship between these probabilities and the correctness of the markers. The validation process was conducted as follows:
- All autoSCORE markers were categorized into 5-percentage-points bins based on their assigned probabilities.
- For each bin, the average probability and the number of True Positives (TPs) were calculated.
- A Pearson correlation coefficient was computed to assess the relationship between the average probabilities and the number of TPs across all bins (See table 6).
The criterion for validation was a significant positive correlation (p-value < 0.05) between higher probabilities and the likelihood of the markers being TPs. The correlation coefficients and corresponding p-values are provided in Table 6.
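A sketch of this binning-and-correlation check, assuming a marker table with a probability column and a boolean true-positive flag (the column names and simulated data are illustrative):

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Sketch of the probability-output check described above. The table layout
# (columns `probability`, `is_tp`) and the simulated data are assumptions.
rng = np.random.default_rng(0)
prob = rng.uniform(0.5, 1.0, size=500)
markers = pd.DataFrame({
    "probability": prob,
    "is_tp": rng.random(500) < prob,   # higher probability -> more often a TP
})

# 5-percentage-point bins; per bin: average probability and number of TPs.
bins = np.arange(0.50, 1.0001, 0.05)
markers["bin"] = pd.cut(markers["probability"], bins)
per_bin = markers.groupby("bin", observed=True).agg(
    mean_prob=("probability", "mean"), n_tp=("is_tp", "sum"))

r, p_value = pearsonr(per_bin["mean_prob"], per_bin["n_tp"])
print(f"Pearson r={r:.2f}, p={p_value:.3g}")  # criterion: positive r, p < 0.05
```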
7.2.4.4 Comparison with Predicate Device (encevis)
The predicate device, encevis, was also evaluated using similar methods. encevis places "Epi-Spike" markers in the EEG to identify epileptiform activity but does not distinguish between focal and generalized epileptiform discharges and does not detect non-epileptiform abnormalities.
For encevis, we:
- Merged focal and generalized epileptiform abnormalities into a binary category (epileptiform yes/no) for comparison.
- Interpreted the presence of at least one "Epi-Spike" marker as a positive assessment on recording-level.
- Repeated measurement of PPV, TPR, Accuracy, NPV, and FPR using the same methods as for autoSCORE for the recording-level output (see Table 4 and the sketch after this list).
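A minimal sketch of this recording-level mapping (illustrative per-recording inputs, not actual study outputs):

```python
# Sketch of the recording-level mapping used for the encevis comparison.
# The inputs are illustrative per-recording values, not study outputs.

def autoscore_ied(focal_epi: bool, generalized_epi: bool) -> bool:
    """Merge focal and generalized epileptiform findings into one binary IED call."""
    return focal_epi or generalized_epi

def encevis_ied(epi_spike_marker_count: int) -> bool:
    """Treat at least one 'Epi-Spike' marker as a positive recording."""
    return epi_spike_marker_count >= 1

print(autoscore_ied(True, False), encevis_ied(0))  # True False
```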
For marker-level output, PPV calculation for encevis was not possible, as encevis markers were not visible to the HEs for direct assessment. The PPV value provided in Table 5 originates from the validation of autoSCORE V1.4 for routine EEGs.
These analyses allowed for a relevant comparison between autoSCORE and encevis, accounting for their different functionalities.
7.2.4.5 Statistical Analysis
All performance metrics, except for the marker probability correlation, were reported with 95% confidence intervals computed using bootstrap resampling. This statistical approach provides an estimate of the variability of the metrics and enhances the robustness and reliability of the findings. All computations were performed in Python, using the numpy, pandas, and scipy.stats libraries.
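As an illustration of the approach, a generic percentile-bootstrap 95% confidence interval for a metric such as PPV might be computed as sketched below (a standard percentile bootstrap on simulated marker outcomes, not necessarily the exact resampling scheme used in the validation):

```python
import numpy as np

# Generic percentile-bootstrap 95% CI for PPV over per-marker outcomes.
# `is_tp` is simulated (True = TP, False = FP); this sketches the general
# approach rather than the validation's exact procedure.
rng = np.random.default_rng(1)
is_tp = rng.random(800) < 0.56          # toy data with PPV around 0.56

boot = []
for _ in range(2000):
    sample = rng.choice(is_tp, size=is_tp.size, replace=True)
    boot.append(sample.mean())          # PPV of the resampled markers

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"PPV={is_tp.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```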
7.2.5 Results of Performance Evaluation
7.2.5.1 Recording level
Table 4 below presents the summary results for autoSCORE v2.0 in categorizing LTM and AEEG segments as Abnormal and classifying them into four abnormality types using the methods presented above. Table 4 also includes results from the current study for the primary predicate device, encevis, as well as previously reported results for secondary predicate device, autoSCORE v1.4.
Subgroup analyses at the recording level for Abnormal/Normal, different parts of the LTM (first, middle and last), different settings (hospital and ambulatory), and different age groups (adult and pediatric) have been documented in HB-002280-RA - autoSCORE V2 Validation Report Appendix A Output 2. These analyses of autoSCORE v2.0 revealed no significant differences across these subgroups.
Table 4. Summary results for autoSCORE v2.0 and the predicate devices at the recording level including results for output 2 and output 5.
Source of data: autoSCORE v2.0 results, encevis results, and HE analysis. Reference standard: HE consensus. Time period data was collected: (SET 1 LTMs: 39 EEGs – June-September 2024; 1 EEG: November 2021-December 2021. SET 2 AEEGs: 40 EEGs: June-October 2024). Method used to obtain 95% confidence intervals: Bootstrap analysis. Margin of error: Up to 11%. The autoSCORE v1.4 results are taken from HB-001948-PP - Appendix H autoSCORE Output 5.
| Device | Category | Accuracy (ACC) | Sensitivity (TPR) | Specificity (TNR) | Precision (PPV) | NPV | Prevalence |
|---|---|---|---|---|---|---|---|
| autoSCORE V2 | Abnormal | 0.912 (0.850, 0.963) | 0.926 (0.859, 0.985) | 0.833 (0.583, 1.000) | 0.969 (0.922, 1.000) | 0.666 (0.412, 0.905) | 0.850 |
| | Focal Epi | 0.787 (0.700, 0.875) | 0.765 (0.613, 0.900) | 0.804 (0.682, 0.913) | 0.743 (0.590, 0.882) | 0.822 (0.704, 0.929) | 0.425 |
| | Gen Epi | 0.925 (0.863, 0.975) | 0.964 (0.880, 1.000) | 0.904 (0.816, 0.980) | 0.844 (0.706, 0.964) | 0.979 (0.932, 1.000) | 0.35 |
| | Non-Epi Diff | 0.850 (0.762, 0.925) | 0.680 (0.483, 0.857) | 0.927 (0.852, 0.983) | 0.809 (0.625, 0.957) | 0.864 (0.771, 0.947) | 0.3125 |
| | Non-Epi Focal | 0.838 (0.750, 0.912) | 0.667 (0.500, 0.824) | 0.957 (0.891, 1.000) | 0.917 (0.789, 1.000) | 0.804 (0.695, 0.902) | 0.4125 |
| | IED | 0.875 (0.800, 0.938) | 0.939 (0.864, 1.000) | 0.774 (0.618, 0.914) | 0.868 (0.769, 0.952) | 0.889 (0.758, 1.000) | 0.6125 |
| autoSCORE v1.4 (from the validation of autoSCORE v1.4 part 2) | Abnormal | 0.950 (0.900, 0.990) | 1.000 (1.000, 1.000) | 0.884 (0.778, 0.974) | 0.920 (0.846, 0.983) | 1.000 (1.000, 1.000) | 0.570 |
| | Focal Epi | 0.850 (0.780, 0.920) | 0.739 (0.545, 0.913) | 0.883 (0.808, 0.949) | 0.654 (0.458, 0.833) | 0.919 (0.851, 0.974) | 0.230 |
| | Gen Epi | 0.950 (0.900, 0.990) | 1.000 (1.000, 1.000) | 0.941 (0.886, 0.988) | 0.751 (0.545, 0.938) | 1.000 (1.000, 1.000) | 0.150 |
| | Non-Epi Diff | 0.840 (0.770, 0.910) | 0.875 (0.727, 1.000) | 0.828 (0.740, 0.909) | 0.617 (0.448, 0.780) | 0.955 (0.897, 1.000) | 0.240 |
| | Non-Epi Focal | 0.850 (0.780, 0.920) | 0.615 (0.421, 0.800) | 0.932 (0.868, 0.986) | 0.761 (0.562, 0.941) | 0.874 (0.795, 0.941) | 0.260 |
| encevis | IED | 0.613 (0.500, 0.713) | 1.000 (1.000, 1.000) | 0.000 (0.000, 0.000) | 0.613 (0.500, 0.713) | n/a | 0.6125 |
For autoSCORE v1.4, the targeted prevalence of abnormal versus normal EEGs in the multicenter validation study was 60% abnormal and 40% normal. Additionally, the goal was to include at least 15% of each abnormality type: FocalEpi, GeneralizedEpi, FocalNonEpi, and DiffuseNonEpi. This validation emphasized differentiating normal from abnormal EEGs, leading to a relatively high proportion of normal recordings. The final clinical validation dataset for v1.4, based on the consensus of 11 experts, comprised 57% abnormal and 43% normal EEGs. Since each EEG could include multiple abnormality types, their prevalence exceeded the targeted 15% for most categories, reaching 23%, 15%, 24%, and 26% for FocalEpi, GeneralizedEpi, FocalNonEpi, and DiffuseNonEpi, respectively.
For autoSCORE v2.0, the validation focused on long-duration LTM and ambulatory EEG recordings. Due to the substantial effort required for expert review, the number of EEGs in this study (N=80) was lower than for the validation of the shorter EEGs (N=100). To ensure a good representation and sufficient statistical power of all abnormality types, the targeted abnormal-to-normal EEG distribution was adjusted to 80% and 20%, respectively, with at least 20% for each abnormality type. The final clinical validation dataset, based on the consensus of three experts per EEG, resulted in 85% abnormal and 15% normal EEGs. As in v1.4, multiple abnormality types per EEG led to a prevalence exceeding the targeted 20%, reaching 43%, 35%, 31%, and 41% for FocalEpi, GeneralizedEpi, FocalNonEpi, and DiffuseNonEpi, respectively. The prevalences of Interictal Epileptiform Discharges (IED) were 30% in the v1.4 validation Part 3, 52% in the v1.4 validation Part 4, and 61% in the v2.0 validation.
The IED abnormality type includes both FocalEpi and GeneralizedEpi categories. An EEG is classified as including IED if it contains either or both of these abnormality types. Consequently, the prevalence of IED is higher than that of either individual category but remains lower than the sum of the individual prevalences.
The autoSCORE IED category was introduced to facilitate comparison between autoSCORE and encevis, as encevis does not distinguish between FocalEpi and GeneralizedEpi. However, it was not intended for comparison between autoSCORE v2.0 and v1.4, as IED is not an autoSCORE output category. For this reason, IED results from the v1.4 validation were not included in Table 4.
The expected prevalence of abnormal versus normal EEGs, as well as the distribution of specific abnormality types, varies across hospitals and EEG labs depending on their specialization. Some facilities are highly specialized epilepsy centers, primarily handling complex, refractory epilepsy cases referred by other specialists. Others focus on broader diagnostic workflows, distinguishing epilepsy from non-epileptic conditions.
These prevalence differences are evident in the clinical validation studies for autoSCORE v1.4 and v2.0. To be included in the assessment of abnormal cases, a sufficient number of abnormalities must be present. However, no single abnormality type is disproportionately weighted by prevalence, as all fall within the range of 31% to 43%. Such variation is expected, given that the final distribution is based on human consensus among blinded raters. Despite the observed variation in prevalence, the results are consistent between the v1.4 and v2.0 validations, showing similar autoSCORE PPVs across the two validations and superior PPVs for autoSCORE compared with encevis. The "Abnormal" and "IED" categories are combinations of abnormality types, leading to higher overall prevalences.
7.2.5.2 Marker level
The evaluation of autoSCORE v2.0 marker placement and correct type assignment, based on HE consensus, is presented in Table 5. Previously reported results for the predicate devices, encevis and autoSCORE v1.4, are also included. Table 5 reports PPV as the primary performance metric. From a clinical perspective, avoiding false positives is generally considered more critical than avoiding false negatives [1, 2]. The PPV values presented were obtained using the most robust method, as described above, while other performance parameters were obtained with methods that are more explorative and qualitative.
Table 6 provides an overview of the overlap between autoSCORE v2.0 markers, the encevis spikes and the HE-placed markers, as well as the HE consensus assessment of markers as a function of the probability assigned to the autoSCORE markers. The highest agreement between the devices was observed for markers assigned a higher probability, indicating that markers assigned the highest probabilities are the most likely to indicate the presence of an abnormality.
Subgroup analyses at the marker level for different types of abnormalities, different parts of the LTM (first, middle, and last), different settings (hospital and ambulatory), and different age groups (adult and pediatric) revealed no significant differences across these subgroups.
Table 5. Summary results for autoSCORE v2.0 and the predicate devices at the marker level: the PPV and associated contingency data for autoSCORE v2.0 markers identified within the first 36 minutes of each segment, across different abnormality types. The results for autoSCORE v1.4 and encevis are sourced from the validation of autoSCORE v1.4; see HB-001946-PP - Appendix J autoSCORE Output 7 and 8 (Marker level) for a detailed description. Data Source: autoSCORE v2.0 results and HE analysis. Reference Standard: HE consensus. Time period data was collected: (SET 1 LTMs: 39 EEGs – June-September 2024; 1 EEG: November 2021-December 2021. SET 2 AEEGs: 40 EEGs: June-October 2024). Method used to obtain 95% confidence intervals: Bootstrap analysis. Margin of error: 2.6%-4.1%.
| Device | Marker Type | Number of Samples | FP | TP | PPV |
|---|---|---|---|---|---|
| autoSCORE v2.0 | Focal Epi | 807 | 355 | 452 | 0.560 (0.526, 0.594) |
| | Gen Epi | 568 | 315 | 253 | 0.446 (0.405, 0.486) |
| | Focal Non-Epi | 667 | 118 | 549 | 0.823 (0.794, 0.852) |
| | Diff Non-Epi | 664 | 100 | 564 | 0.849 (0.822, 0.876) |
| | IED | 1375 | 670 | 705 | 0.513 (0.486, 0.539) |
| encevis (from the validation of autoSCORE v1.4 Part 4) | IED | 805 | 597 | 208 | 0.257 (0.166, 0.349) |
| autoSCORE v1.4 (from the validation of autoSCORE v1.4 Part 4) | IED | 509 | 311 | 198 | 0.389 (0.281, 0.504) |
| autoSCORE v1.4 (from the validation of autoSCORE v1.4 Part 1) | Focal Epi | 172043 | 3192 | 5354 | 0.626 (0.616, 0.637) |
| | Gen Epi | 168441 | 643 | 2839 | 0.815 (0.802, 0.828) |
| | Diff Non-Epi | 204119 | 9481 | 21761 | 0.696 (0.691, 0.702) |
| | Focal Non-Epi | 179700 | 10101 | 10644 | 0.513 (0.506, 0.520) |
| autoSCORE v1.4 (from the validation of autoSCORE v1.4 Part 5) | Focal Epi | 29989 | 886 | 2238 | 0.716 (0.701, 0.732) |
| | Gen Epi | 27521 | 159 | 748 | 0.825 (0.799, 0.849) |
| | Diff Non-Epi | 28986 | 1510 | 1753 | 0.537 (0.520, 0.554) |
| | Focal Non-Epi | 30168 | 1974 | 2621 | 0.570 (0.556, 0.585) |
From a clinical perspective, markers with the highest probabilities are the most impactful, as they are designed to help direct users to the most relevant findings by sorting them in descending order of probability. The average PPV of markers with a probability between 90% and 100% is 0.83, compared to 0.67 for all markers. For IEDs, the PPV using overlap with encevis as reference increases from 0.51 for all markers to 0.82 for those in the 90%–100% probability range.
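The probability-stratified comparison reported above can be outlined as follows, assuming a marker table with probability and true-positive columns (the data are illustrative, not the study results):

```python
import pandas as pd

# Sketch of the probability-stratified PPV comparison described above.
# The DataFrame contents are illustrative, not the study data.
markers = pd.DataFrame({
    "probability": [0.95, 0.92, 0.97, 0.60, 0.55, 0.75],
    "is_tp":       [True, True, False, False, True, False],
})

ppv_all = markers["is_tp"].mean()
high = markers[markers["probability"] >= 0.90]
ppv_high = high["is_tp"].mean()
print(f"PPV (all markers): {ppv_all:.2f}; PPV (90-100% band): {ppv_high:.2f}")
```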
Table 6. Overview of the overlap between autoSCORE markers and those identified by encevis and three HEs in full-length EEG segments (2–4 hours) across different probability ranges. It also includes an evaluation of autoSCORE markers from the first 36 minutes of each segment, based on HE consensus. The probability correlation and p-value were calculated based on 5% probability ranges. Comprehensive details can be found in HB-002235-PP – autoSCORE v2 Clinical Validation Protocol. Data Source: autoSCORE v2.0 results, encevis results, and HE analysis. Reference Standard: Markers identified by encevis, individual HEs, or HE consensus. Time period data was collected: (SET 1 LTMs: 39 EEGs – June-September 2024; 1 EEG: November 2021-December 2021. SET 2 AEEGs: 40 EEGs: June-October 2024).
7.2.6 Validation Summary
In this clinical performance validation, autoSCORE demonstrated a higher PPV overall compared to the predicate device encevis and a similar PPV compared to autoSCORE v1.4. These results indicate that autoSCORE's output performance is similar to both encevis and autoSCORE v1.4.
Since autoSCORE and encevis do not provide a diagnosis, and the results from autoSCORE are reviewed by medical professionals for independent assessment, any major adverse effects to patients are unlikely. An incorrect autoSCORE result may lead to minor injury to the patient. However, based on the clinical performance documented in this validation, autoSCORE was found to have a safety and effectiveness profile that is similar to the predicate devices. autoSCORE's additional technological characteristics that differ from the predicate devices were also found to have a safety and effectiveness profile similar to that of the HEs. Therefore, autoSCORE is as safe, as effective, and performs as well as encevis and autoSCORE v1.4.
7 Biocompatibility, Electrical Safety and electromagnetic Compatibility (EMC), Mechanical and acoustic testing and Animal Study
autoSCORE is a software-only device. Biocompatibility, electrical safety, electromagnetic compatibility, and mechanical and acoustic testing are not applicable. No animal studies were performed for this submission.
8 Statement of Substantial Equivalence
Since the predicate devices were cleared based in part on the results of clinical studies, and since the comparison of bench testing to clinical outcomes is still not well understood for this type of device, clinical testing was required to support substantial equivalence.
The non-clinical data support the safety of the device, and the software verification and validation demonstrate that the autoSCORE device should perform as intended in the specified use conditions. The clinical data demonstrate that the subject device (autoSCORE) performs as well as the predicate devices that are currently marketed for the same intended use.
Therefore, autoSCORE is substantially equivalent to the predicate devices in its intended use. Any differences between the subject and predicate devices have no significant influence on safety or effectiveness. autoSCORE is at least as safe and effective as the legally marketed predicate devices, as established through performance testing, and raises no new issues of safety or effectiveness when compared to the predicate devices.
9 References
1. Benbadis, S. and P. Kaplan, The dangers of over-reading an EEG. Journal of Clinical Neurophysiology, 2019. 36(4): p. 249.
2. Amin, U. and S.R. Benbadis, The role of EEG in the erroneous diagnosis of epilepsy. Journal of Clinical Neurophysiology, 2019. 36(4): p. 294-297.
§ 882.1400 Electroencephalograph.
(a) Identification. An electroencephalograph is a device used to measure and record the electrical activity of the patient's brain obtained by placing two or more electrodes on the head.
(b) Classification. Class II (performance standards).