SleepStageML is intended for assisting the diagnostic evaluation by a qualified clinician to assess sleep quality from level 1 polysomnography (PSG) recordings in a clinical environment in patients aged 18 and older.

SleepStageML is a software-only medical device to be used to analyze physiological signals and automatically score sleep stages. All outputs are subject to review by a qualified clinician.

Device Description

SleepStageML is an Artificial Intelligence/Machine Learning (AI/ML)-enabled software-only medical device that analyzes polysomnography (PSG) recordings and automatically scores sleep stages. It is intended for assisting the diagnostic evaluation by a qualified clinician to assess sleep quality in patients aged 18 and older.

Qualified clinicians (also referred to as clinical users), such as sleep physicians, sleep technicians, or registered PSG technologists (RPSGTs) who are qualified to review PSG studies, provide PSG recordings in European Data Format (EDF) through a secure file transfer system to Beacon Biosignals. SleepStageML automatically analyzes the provided PSG recording and returns to the clinical user an EDF file containing the original PSG recording with software-generated sleep stage annotations (i.e., Wake (W), non-REM 1 (N1), non-REM 2 (N2), non-REM 3 (N3), and REM (R)). The EDF files containing PSG signals as well as sleep stage annotations are referred to as EDF+. The returned EDF+ files can then be reviewed by the qualified clinicians via the users' PSG viewing software. The recordings processed by SleepStageML are level 1 PSG recordings obtained in an attended setting in accordance with American Academy of Sleep Medicine (AASM) recommendations with respect to minimum sampling rate, electroencephalography (EEG) channels, and EEG locations. SleepStageML uses only the EEG signals in the provided PSGs and does not consider electromyography (EMG) or electrooculography (EOG) signals when performing sleep staging. The sleep stage outputs of SleepStageML are intended to be comparable to sleep stages as defined by AASM guidelines. SleepStageML software outputs are subject to review by a qualified clinician.

AI/ML Overview

The following summarizes the acceptance criteria and the study showing that the device meets them, based on the FDA 510(k) summary for SleepStageML:

Acceptance Criteria and Reported Device Performance

Sleep Staging Comparisons	Acceptance Criteria (Predicate Reference: Sleep Profiler, K153412, N=43 subjects)	Reported Device Performance (SleepStageML, N=100 subjects)
Overall Agreement (OA)
W	89%	96.1% (95% CI: 95.4%, 96.8%)
N1	89%	94.5% (95% CI: 93.7%, 95.2%)
N2	81%	87.1% (95% CI: 85.9%, 88.3%)
N3	91%	92.9% (95% CI: 91.8%, 93.8%)
R	95%	97.3% (95% CI: 96.7%, 97.9%)
Positive Agreement (PA)
W	73%	88.9% (95% CI: 86.5%, 91.2%)
N1	25%	58.4% (95% CI: 54.2%, 62.4%)
N2	77%	79.8% (95% CI: 77.7%, 81.8%)
N3	76%	93.0% (95% CI: 89.8%, 95.7%)
R	74%	93.1% (95% CI: 91.5%, 94.5%)
Negative Agreement (NA)
W	94%	98.5% (95% CI: 98.2%, 98.8%)
N1	93%	96.2% (95% CI: 95.4%, 96.9%)
N2	84%	94.2% (95% CI: 93.2%, 95.0%)
N3	94%	92.9% (95% CI: 91.7%, 93.9%)
R	97%	98.0% (95% CI: 97.3%, 98.6%)
Multi-stage Agreement	Not explicitly stated for the predicate in a comparable form	84.02% (N=100 subjects; 86,983 epochs overall; 2,289 with no consensus)
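
The summary does not define how OA, PA, and NA are computed. The sketch below assumes the conventional one-vs-rest definitions for each stage: OA as the fraction of epochs where device and reference agree on whether the epoch is that stage, PA as the fraction of reference-positive epochs the device also labels with that stage, and NA as the fraction of reference-negative epochs the device also labels as another stage. The function name `stage_agreement` and the use of `None` to mark excluded epochs are illustrative assumptions, not part of the submission.

```python
import math
from typing import Dict, Optional, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def stage_agreement(
    reference: Sequence[Optional[str]],
    predicted: Sequence[str],
    stage: str,
) -> Dict[str, float]:
    """One-vs-rest agreement for a single stage (assumed definitions).

    `reference` holds the expert consensus stage per epoch; an entry of
    None marks an epoch without consensus, which is skipped.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref is None:  # hypothetical marker for an excluded epoch
            continue
        if ref == stage and pred == stage:
            tp += 1
        elif ref == stage:
            fn += 1
        elif pred == stage:
            fp += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    return {
        "OA": _ratio(tp + tn, total),  # overall agreement
        "PA": _ratio(tp, tp + fn),     # positive agreement
        "NA": _ratio(tn, tn + fp),     # negative agreement
    }
```

Calling `stage_agreement(reference, predicted, "N1")` once per stage would produce one row of each block in a table like the one above; the data passed in here would be per-epoch labels, which the summary itself does not publish.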

Study Details:

Sample sizes used for the test set and data provenance:
- Test Set Sample Size: 100 patients.
- Data Provenance: Retrospective pivotal validation study using previously collected clinical polysomnography (PSG) recordings. The recordings were randomly selected from three Level 1 clinical PSG data sources. The document does not specify the country of origin of the data.
Number of experts used to establish the ground truth for the test set and their qualifications:
- Number of Experts: Three (3) registered PSG technologists (RPSGTs).
- Qualifications: Each RPSGT had at least 5 years of experience in clinical scoring of sleep studies.
Adjudication method for the test set:
- Method: 2/3 majority scoring. Expert consensus sleep stages were constructed using the stage per epoch where at least 2 of the 3 experts agreed. Epochs where all 3 RPSGTs disagreed were excluded.
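
The 2/3 majority rule can be sketched in a few lines. This is a minimal illustration that assumes each scorer's output is a list of per-epoch stage labels of equal length; the function names and the use of `None` for excluded epochs are hypothetical and are not taken from the 510(k) summary.

```python
from collections import Counter
from typing import List, Optional, Sequence


def consensus_stage(votes: Sequence[str]) -> Optional[str]:
    """Return the stage chosen by at least 2 scorers, or None if all differ."""
    stage, count = Counter(votes).most_common(1)[0]
    return stage if count >= 2 else None


def build_consensus(
    scorer_a: Sequence[str],
    scorer_b: Sequence[str],
    scorer_c: Sequence[str],
) -> List[Optional[str]]:
    """Build per-epoch consensus labels from three scorers.

    Epochs where all three scorers disagree are returned as None so that
    a downstream comparison can exclude them.
    """
    if not (len(scorer_a) == len(scorer_b) == len(scorer_c)):
        raise ValueError("all scorers must label the same number of epochs")
    return [consensus_stage(votes) for votes in zip(scorer_a, scorer_b, scorer_c)]
```

For example, `build_consensus(["W", "N1", "N2"], ["W", "N2", "N3"], ["N1", "N2", "R"])` gives `["W", "N2", None]`: the first two epochs have a two-scorer majority, and the third is excluded because all three scorers disagree.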
If a multi-reader multi-case (MRMC) comparative effectiveness study was done:
- No, an MRMC study comparing human readers with AI vs. without AI assistance was not explicitly detailed. The study focused on the standalone performance of the AI algorithm against human expert consensus to demonstrate non-inferiority to a predicate device. The device's indication for use explicitly states, "All outputs are subject to review by a qualified clinician," indicating a human-in-the-loop design, but the described performance study is primarily a standalone evaluation.
If a standalone study (i.e., algorithm-only performance without a human in the loop) was done:
- Yes, the clinical validation test evaluated the SleepStageML software's performance "against the expert consensus sleep stages" in a standalone manner. The device's outputs are intended to be reviewed by a clinician, but the performance metrics reported are for the algorithm's direct output compared to ground truth.
The type of ground truth used:
- Type: Expert Consensus. The ground truth was established by three RPSGTs, with a 2/3 majority rule for consensus.
The sample size for the training set:
- The document states, "SleepStageML uses a deep learning algorithm based on convolutional neural networks, which was trained on a large and diverse set of PSG recordings with sleep staging labels." However, a specific sample size for the training set is not provided in the summary.
How the ground truth for the training set was established:
- The document states the training was on "PSG recordings with sleep staging labels." It does not explicitly detail the method for establishing ground truth for the training set (e.g., if it was also expert consensus, single expert, or another method). However, given the nature of sleep staging, it is highly likely that these labels were also derived from expert annotations, similar to the test set, though possibly not with the same rigorous 3-expert consensus and adjudication for every record.
