Search Results

Found 27 results

510(k) Data Aggregation

    K Number: K250058
    Device Name: NEAT 001
    Date Cleared: 2025-04-10 (90 days)
    Product Code: OLZ
    Regulation Number: 882.1400
    Intended Use

    Automatic scoring of sleep EEG data to identify stages of sleep according to the American Academy of Sleep Medicine definitions, rules, and guidelines. It is to be used with adult populations.

    Device Description

    The Neurosom EEG Assessment Technology (NEAT) is a medical device software application that allows users to perform sleep staging post-EEG acquisition. NEAT allows users to review sleep stages on scored MFF files and perform sleep scoring on unscored MFF files.

    NEAT software is designed in a client-server model and comprises a User Interface (UI) that runs on a Chrome web browser in the client computer and a Command Line Interface (CLI) software that runs on a Forward-Looking Operations Workflow (FLOW) server.

    The user interacts with the NEAT UI through the FLOW front-end application to initiate the NEAT workflow on unscored MFF files and visualize sleep-scoring results. Sleep stages are scored by the containerized neat-cli software on the FLOW server using the EEG data. The sleep stages are then added to the input MFF file as an event track file in XML format. Once the new event track file is created, the NEAT UI component retrieves the sleep events from the FLOW server and displays a hypnogram (visual representation of sleep stages over time) on the screen, along with sleep statistics and other subject details. Additionally, a summary of the sleep scoring is automatically generated and added to the same participant in the FLOW server in PDF format.

    AI/ML Overview

    The FDA 510(k) Clearance Letter for NEAT 001 provides information about the device's acceptance criteria and the study conducted to prove its performance.

    Acceptance Criteria and Device Performance

    The core acceptance criteria for NEAT 001, as demonstrated by the comparative clinical study, are based on its ability to classify sleep stages (Wake, N1, N2, N3, REM) with performance comparable to the predicate device, EnsoSleep, and within the variability observed among expert human raters.

    Table of Acceptance Criteria and Reported Device Performance

    The document does not explicitly state pre-defined numerical "acceptance criteria" for each metric (Sensitivity, Specificity, Overall Agreement) that NEAT 001 had to meet. Instead, the approach was a comparative effectiveness study against a predicate device (EnsoSleep), with the overarching criterion being "substantial equivalence" as interpreted by performance falling within the range of differences expected among expert human raters.

    Therefore, the "acceptance criteria" are implied by the findings of substantial equivalence. The "reported device performance" is given in terms of the comparison between NEAT and EnsoSleep, and their differences relative to human agreement variability.

    | Metric / Sleep Stage | NEAT Performance (vs. Predicate EnsoSleep) | Acceptance Criteria (Implied) |
    |---|---|---|
    | Wake (W) | Equivalent performance (1-2% difference) | Difference within range of human agreement variability |
    | REM (R) | EnsoSleep performed better (3-4% difference) | Difference within range of human agreement variability (stated as 3% for the CSF data set) |
    | N1 (Overall Performance) | EnsoSleep better (4-7%) | Difference within range of human agreement variability (only in the BEL data set was this difference larger than human agreement) |
    | N1 (Sensitivity) | NEAT substantially better (8-20%) | Not a primary equivalence metric, but noted as an area where NEAT excels |
    | N1 (Specificity) | EnsoSleep better (5-9%) | Not a primary equivalence metric, but noted |
    | N2 (Overall Performance) | EnsoSleep marginally better (5%) for the BEL data set | Difference within range of human agreement variability |
    | N2 (Sensitivity) | EnsoSleep more sensitive (22%) | Not a primary equivalence metric, but noted |
    | N2 (Specificity) | EnsoSleep less specific (9-11%) | Not a primary equivalence metric, but noted |
    | N3 (Overall Performance) | Equivalent (1% difference overall) | Difference within range of human agreement variability |
    | N3 (Sensitivity) | NEAT substantially better (15-39%) | Not a primary equivalence metric, but noted as an area where NEAT excels |
    | N3 (Specificity) | EnsoSleep marginally better (3-4%) | Not a primary equivalence metric, but noted |
    | General Conclusion | Statistically significant differences, but practically within the range of differences expected among expert human raters | Substantial equivalence to the predicate device |

    Study Details

    Here's a breakdown of the study details based on the provided text:

    1. Sample Size and Data Provenance

    • Test Set Sample Size: The exact number of participants or EEG recordings in the test set is not explicitly stated. The document refers to "two data sets" (referred to as "BEL data set" and "CSF data set") used for testing both NEAT and EnsoSleep. The large resampling number (R=2000 resamples for bootstrapping) suggests a dataset size sufficient to yield small confidence intervals.
    • Data Provenance:
      • Country of Origin: Not explicitly stated.
      • Retrospective or Prospective: Not explicitly stated, but the mention of "All data files were scored by EnsoSleep" and "All data files were scored by NEAT" implies these were pre-existing datasets, making them retrospective.
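The bootstrap procedure mentioned above (R = 2000 resamples) can be sketched as a percentile confidence interval over epoch-level agreement indicators. This is an illustrative sketch with toy data, not values or code from the submission:

```python
import random

def bootstrap_ci(values, stat=lambda v: sum(v) / len(v), r=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic (e.g., a per-stage agreement rate)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement r times and collect the statistic each time
    stats = sorted(stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(r))
    lo = stats[int((alpha / 2) * r)]
    hi = stats[int((1 - alpha / 2) * r) - 1]
    return stat(values), (lo, hi)

# Toy epoch-level agreement indicators (1 = device matched the gold standard)
agree = [1] * 870 + [0] * 130
mean, (lo, hi) = bootstrap_ci(agree)
```

With a dataset of this size, the resulting interval is narrow, which is consistent with the document's point that R = 2000 resamples over a sufficiently large test set yields small confidence intervals.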

    2. Number of Experts and Qualifications for Ground Truth

    • Number of Experts: Not explicitly stated. The study refers to "established gold standard" and "human agreement variability" among "expert human raters," implying multiple experts.
    • Qualifications of Experts: Not explicitly stated beyond "expert human raters." No details are provided regarding their specific medical background (e.g., neurologists, sleep specialists), years of experience, or board certifications.

    3. Adjudication Method for the Test Set

    • Adjudication Method: Not explicitly stated. The document simply refers to "the established gold standard." It does not mention whether this gold standard was derived from a single expert, consensus among multiple experts, or a specific adjudication process (like 2+1 or 3+1).

    4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • Was an MRMC study done? A direct MRMC comparative effectiveness study involving human readers assisting with AI vs. without AI assistance was not explicitly described. The study primarily focuses on comparing the standalone performance of NEAT (the AI) against the standalone performance of the predicate device (EnsoSleep), and then interpreting these differences in the context of human-to-human agreement variability.
    • Effect Size of Human Reader Improvement: Since a direct MRMC study with human readers assisting AI was not detailed, there is no information provided on the effect size of how much human readers improve with AI vs. without AI assistance.

    5. Standalone Performance (Algorithm Only)

    • Was a standalone study done? Yes. The study evaluated the "segment-by-segment" performance of NEAT and EnsoSleep algorithms directly against the "established gold standard." This is a measure of the algorithm's standalone performance without human input during the scoring process.

    6. Type of Ground Truth Used

    • Type of Ground Truth: The ground truth for the test set was based on an "established gold standard" for sleep stage classification. This strongly implies expert consensus or expert scoring of the EEG data according to American Academy of Sleep Medicine definitions, rules, and guidelines. Pathology or outcomes data were not used for sleep staging ground truth.

    7. Training Set Sample Size

    • Training Set Sample Size: The sample size for the training set is not explicitly stated in the provided document.

    8. How Ground Truth for Training Set Was Established

    • How Ground Truth for Training Set Was Established: The document states that neat-cli "leverages Python libraries for identifying stages of sleep on MFF files using Machine Learning (ML)." However, it does not explicitly describe how the ground truth for the training set was established. Typically, for ML models, the training data's ground truth would also be established by expert annotation or consensus, similar to the test set ground truth, but this is not confirmed in the provided text.

    K Number: K241960
    Device Name: DeepRESP
    Date Cleared: 2025-03-14 (254 days)
    Product Code: OLZ
    Regulation Number: 882.1400

    Intended Use

    DeepRESP is an aid in the diagnosis of various sleep disorders; subjects are often evaluated during the initiation or follow-up of treatment for such disorders. The recordings to be analyzed by DeepRESP can be performed in a hospital, patient home, or an ambulatory setting. It is indicated for use with adults (22 years and above) in a clinical environment by or on the order of a medical professional.

    DeepRESP is intended to mark sleep study signals to aid in the identification of events and annotation of traces; automatically calculate measures obtained from recorded signals (e.g., magnitude, time, frequency, and statistical measures of marked events); infer sleep staging with arousals with EEG and in the absence of EEG. All output is subject to verification by a medical professional.

    Device Description

    DeepRESP is a cloud-based software as a medical device (SaMD), designed to perform analysis of sleep study recordings, with and without EEG signals, providing data for the assessment and diagnosis of sleep-related disorders. Its algorithmic framework provides the derivation of sleep staging including arousals, scoring of respiratory events and key parameters such as the Apnea-Hypopnea Index (AHI).

    DeepRESP is hosted on a serverless stack. It consists of:

    • A web Application Programming Interface (API) intended to interface with a third-party client application, allowing medical professionals to access DeepRESP's analytical capabilities.
    • Predefined sequences called Protocols that run data analyses, including artificial intelligence and rule-based models for the scoring of sleep studies, and a parameter calculation service.
    • A Result storage using an object storage service to temporarily store outputs from the DeepRESP Protocols.
    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study details for the DeepRESP device, based on the provided FDA 510(k) summary:

    1. Table of Acceptance Criteria & Reported Device Performance:

    The document doesn't explicitly state "acceptance criteria" as a separate table, but it compares DeepRESP's performance against manual scoring and predicate devices. I've extracted the performance metrics that effectively serve as acceptance criteria given the "non-inferiority" and "superiority" claims against established devices.

    | Metric (vs. Manual Scoring) | DeepRESP Performance (95% CI) | Predicate Performance (Nox Sleep System K192469) | Superiority/Non-inferiority Claim | Study Type |
    |---|---|---|---|---|
    | Severity Classification (AHI ≥ 5) | | | | |
    | PPA% | 87.5 [86.2, 89.0] | 73.6 | Superiority | Type I/II |
    | NPA% | 91.9 [87.4, 95.8] | 65.8 | Non-inferiority | Type I/II |
    | OPA% | 87.9 [86.6, 89.3] | 73.0 | Superiority | Type I/II |
    | Severity Classification (AHI ≥ 15) | | | | |
    | PPA% | 74.1 [72.0, 76.5] | 54.5 | Superiority | Type I/II |
    | NPA% | 94.7 [93.2, 96.2] | 89.8 | Non-inferiority | Type I/II |
    | OPA% | 81.5 [79.9, 83.3] | 67.2 | Superiority | Type I/II |
    | Respiratory Events | | | | |
    | PPA% | 72.0 [70.9, 73.2] | 58.5 | Non-inferiority (superiority claimed for OPA) | Type I/II |
    | NPA% | 94.2 [94.0, 94.5] | 95.4 | Non-inferiority | Type I/II |
    | OPA% | 87.2 [86.8, 87.5] | 81.7 | Superiority | Type I/II |
    | Sleep State Estimation (Wake) | | | | |
    | PPA% | 95.4 [95.1, 95.6] | 56.7 | Non-inferiority | Type I/II |
    | NPA% | 94.6 [94.4, 94.9] | 98.1 | Non-inferiority | Type I/II |
    | OPA% | 94.8 [94.6, 95.0] | 89.8 | Non-inferiority | Type I/II |
    | Arousal Events | | | | |
    | ArI ICC (vs. Sleepware G3, K202142) | 0.63 | 0.794 | Non-inferiority | Type I/II |
    | PPA% | 62.2 [61.2, 63.1] | N/A (manual scoring for primary predicate) | N/A | Type I/II |
    | NPA% | 89.3 [88.8, 89.7] | N/A | N/A | Type I/II |
    | OPA% | 81.4 [81.1, 81.7] | N/A | N/A | Type I/II |
    | Type III Severity Classification (AHI ≥ 5) | | | | |
    | PPA% | 93.1 [92.2, 93.9] | 82.4 | Superiority | Type III |
    | NPA% | 81.1 [75.1, 86.6] | 56.6 | Non-inferiority | Type III |
    | OPA% | 92.5 [91.7, 93.3] | 81.1 | Non-inferiority | Type III |
    | Type III Respiratory Events | | | | |
    | PPA% | 75.4 [74.6, 76.1] | 58.5 | Superiority | Type III |
    | NPA% | 87.8 [87.4, 88.1] | 95.4 | Non-inferiority | Type III |
    | OPA% | 83.7 [83.4, 84.0] | 81.7 | Superiority | Type III |
    | Type III Arousal Events | | | | |
    | ArI ICC (vs. Sleepware G3, K202142) | 0.76 | 0.73 | Non-inferiority | Type III |
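As a reader aid, PPA, NPA, and OPA are standard percent-agreement metrics computed over paired epoch-level binary labels (candidate scoring vs. reference manual scoring). A minimal sketch with toy data, not values from the submission:

```python
def agreement_metrics(reference, candidate):
    """PPA/NPA/OPA between a candidate scoring and reference (manual) scoring,
    over paired binary labels (1 = event present in that epoch)."""
    tp = sum(1 for r, c in zip(reference, candidate) if r == 1 and c == 1)
    fn = sum(1 for r, c in zip(reference, candidate) if r == 1 and c == 0)
    tn = sum(1 for r, c in zip(reference, candidate) if r == 0 and c == 0)
    fp = sum(1 for r, c in zip(reference, candidate) if r == 0 and c == 1)
    ppa = 100 * tp / (tp + fn)              # positive percent agreement
    npa = 100 * tn / (tn + fp)              # negative percent agreement
    opa = 100 * (tp + tn) / len(reference)  # overall percent agreement
    return ppa, npa, opa

# Toy epoch labels
ref = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cand = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
ppa, npa, opa = agreement_metrics(ref, cand)  # 75.0, ~83.3, 80.0
```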

    2. Sample Size Used for the Test Set and Data Provenance:

    • Type I/II Studies (EEG present): 2,224 sleep recordings
    • Type III Studies (No EEG): 3,488 sleep recordings (including 2,213 Type I recordings and 1,275 Type II recordings, processed to utilize only Type III relevant signals).
    • Provenance: Retrospective study. Data originated from sleep clinics in the United States, collected as part of routine clinical work for patients suspected of sleep disorders. The patient population showed diversity in age, BMI, and race/ethnicity (Caucasian or White, Black or African American, Other, Not Reported) and was considered representative of patients seeking medical services for sleep disorders in the United States.

    3. Number of Experts and Qualifications for Ground Truth:

    The document explicitly states that the studies used "manually scored sleep recordings" but does not specify the number of experts or their qualifications (e.g., an RPSGT with 10 years of experience). It implicitly relies on the quality of "manual scoring" from routine clinical work in US sleep clinics as the ground truth.

    4. Adjudication Method for the Test Set:

    The document does not describe any specific adjudication method (e.g., 2+1, 3+1). It refers to "manual scoring" as the established ground truth.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:

    No, an MRMC comparative effectiveness study was not reported. The study design was a retrospective data analysis comparing the algorithm's performance against existing manual scoring (ground truth) and established predicate devices. There is no information about human readers improving with AI vs. without AI assistance. The device is intended to provide automatic scoring subject to verification by a medical professional.

    6. Standalone (Algorithm Only) Performance:

    Yes, the study report describes the standalone performance of the DeepRESP algorithm. The reported PPA, NPA, OPA percentages, and ICC values represent the agreement of the automated scoring by DeepRESP compared to the manual ground truth. The device produces output "subject to verification by a medical professional," but the performance metrics provided are for the algorithmic output itself.

    7. Type of Ground Truth Used:

    The ground truth used was expert consensus (manual scoring). The document states "It used manually scored sleep recordings... The studies were done by evaluating the agreement in scoring and clinical indices resulting from the automatic scoring by DeepRESP compared to manual scoring."

    8. Sample Size for the Training Set:

    The document does not explicitly state the sample size used for the training set. The clinical validation study is described as a "retrospective study" used for validation, but details about the training data are not provided in this summary.

    9. How the Ground Truth for the Training Set Was Established:

    The document does not specify how the ground truth for the training set was established. It only describes the ground truth for the validation sets as "manually scored sleep recordings" from routine clinical work.


    K Number: K242094
    Device Name: Dreem 3S
    Date Cleared: 2024-11-22 (128 days)
    Product Code: OLZ
    Regulation Number: 882.1400

    Intended Use

    The Dreem 3S is intended for prescription use to measure, record, display, transmit and analyze the electrical activity of the brain to assess sleep and awake in the home or healthcare environment. The Dreem 3S can also output a hypnogram of sleep scoring by 30-second epoch and summary of sleep metrics derived from this hypnogram.

    The Dreem 3S is used for the assessment of sleep on adult individuals (22 to 65 years old). The Dreem 3S allows for the generation of user/predefined reports based on the subject's data.
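The device's hypnogram output (one stage label per 30-second epoch) and the summary metrics derived from it can be illustrated with a minimal sketch. The stage labels, metric names, and data here are generic assumptions for illustration, not Dreem's actual output format:

```python
EPOCH_SEC = 30  # AASM scoring epoch length

def sleep_summary(hypnogram):
    """Summary metrics from a per-epoch hypnogram of stage labels
    ('W', 'N1', 'N2', 'N3', 'R'), one label per 30-second epoch."""
    sleep_epochs = [s for s in hypnogram if s != 'W']
    tst_min = len(sleep_epochs) * EPOCH_SEC / 60        # total sleep time
    trt_min = len(hypnogram) * EPOCH_SEC / 60           # total recording time
    efficiency = 100 * len(sleep_epochs) / len(hypnogram)  # sleep efficiency (%)
    stage_min = {st: hypnogram.count(st) * EPOCH_SEC / 60
                 for st in ('W', 'N1', 'N2', 'N3', 'R')}
    return {'TST_min': tst_min, 'TRT_min': trt_min,
            'efficiency_pct': efficiency, 'stage_min': stage_min}

# Toy 50-minute recording (100 epochs)
hyp = ['W'] * 20 + ['N1'] * 10 + ['N2'] * 40 + ['N3'] * 20 + ['R'] * 10
report = sleep_summary(hyp)
```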

    Device Description

    The Dreem 3S headband contains microelectronics, within a flexible case made of plastic, foam, and fabric. It includes 6 EEG electrodes and a 3D accelerometer sensor.

    The EEG signal is measured by two electrodes in the frontal position and two at the back of the head (occipital position), along with one reference electrode and one ground electrode.

    The 3D accelerometer is embedded in the top of the headband to ensure accurate measurements of the wearer's head movement during the night. The raw EEG and accelerometer data are transferred to Dreem's servers for further analysis after the night is over.

    The device includes a bone-conduction speaker with volume control to provide notifications to the wearer, and a power button circled by a multicolor LED light.

    The device generates a sleep report that includes a sleep staging for each 30-second epoch during the night. This output is produced using an algorithm that analyzes data from the headband EEG and accelerometer sensors. A raw data file is also available in EDF format.

    AI/ML Overview

    The provided text is a 510(k) summary for the Dreem 3S device. It does not contain a comprehensive study detailing acceptance criteria and device performance. Instead, it states that no new testing was performed because the current submission is primarily for the inclusion of a Predetermined Change Control Plan (PCCP). It relies on the performance characteristics previously reported for the predicate device (K223539).

    Therefore, I cannot provide a table of acceptance criteria with reported performance, or details about the sample sizes and ground truth for a new study, as none was conducted or reported in this document.

    However, based on the information for the predicate device, and the intent behind the PCCP, I can infer and summarize what would typically be expected for such a device and what the PCCP aims to maintain:

    Inferred Acceptance Criteria based on Predicate Device (K223539) and PCCP:
    The document states, "clinical performance validation will also be repeated, and will require that the performance of any modification to Dreem 3S to be non-inferior to the all previously released versions of the Dreem 3S device." This indicates that the primary acceptance criterion for any future algorithmic updates under the PCCP is non-inferiority to the performance established in the original clearance (K223539). While the specific metrics are not detailed in this current summary, for a sleep staging device, these would typically include accuracy metrics like Cohen's Kappa, Sensitivity, Specificity, and overall accuracy for differentiating sleep stages (Wake, NREM1, NREM2, NREM3, REM).
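Cohen's kappa, named above as a typical sleep-staging accuracy metric, measures agreement between two per-epoch stagings corrected for chance. An illustrative sketch with toy stagings, not study data:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two per-epoch sleep stagings of the same recording."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    stages = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of epochs where both raters match
    po = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of each rater's marginal stage frequencies
    pe = sum((rater_a.count(s) / n) * (rater_b.count(s) / n) for s in stages)
    return (po - pe) / (1 - pe)

a = ['W', 'W', 'N2', 'N2', 'N2', 'N3', 'R', 'R']
b = ['W', 'N1', 'N2', 'N2', 'N3', 'N3', 'R', 'W']
kappa = cohens_kappa(a, b)  # ≈ 0.52
```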

    Regarding Study Information (based on the original clearance of K223539, not detailed here):

    Since the provided document explicitly states, "No bench testing, animal testing, or clinical testing was performed to support this submission," I cannot fill in the details for a new study. The performance information relates to the predicate device (K223539).

    However, based on the Predetermined Change Control Plan (PCCP) section, which outlines how future algorithmic modifications will be validated, I can describe the methodology for future performance validation under that plan:


    Inferred Acceptance Criteria and Future Performance Validation Methodology (based on PCCP)

    1. Table of Acceptance Criteria and Reported Device Performance:

    | Acceptance Criterion (Inferred from PCCP) | Reported Device Performance (From K223539; not detailed in this document) |
    |---|---|
    | Non-inferiority of sleep staging performance to previously cleared versions | Specific performance metrics (e.g., Kappa, accuracy, sensitivity, specificity for sleep stages) measured in K223539 |
    | Maintained performance across specific sleep stages (Wake, N1, N2, N3, REM) | Stage-specific performance metrics from K223539 |
    | Robustness to signal preprocessing, ML model, and postprocessing updates | Performance maintained within non-inferiority margins after updates |

    Note: The actual numerical performance metrics for the predicate device (K223539) are not provided in this document. They would have been part of the original K223539 submission. The PCCP ensures that future algorithmic changes meet these same (or non-inferior) performance levels.

    2. Sample Size Used for the Test Set and Data Provenance:

    • For future updates under PCCP: The PCCP states, "Recordings that are used for any purpose (e.g., training, tuning, failure analysis, etc.) that might lead to direct or indirect insight regarding the performance of a modified sleep staging algorithm on this recording, other than execution of the clinical performance validation per the methods specified in the PCCP, are excluded from the test dataset." This implies that a new, independent test set will be used for each validation under the PCCP.
    • Sample Size: Not specified for future PCCP validations, but it is stated that "Quality checks will ensure that the test data are sufficiently high quality and representative of the intended use population."
    • Data Provenance: Not explicitly stated, but for sleep studies, typically involves polysomnography (PSG) data. The "human variability estimated from comparison of expert scoring from 284 American Academy of Sleep Medicine (AASM) compliant polysomnography recordings" suggests a U.S. or internationally recognized standard for data interpretation. The fact that the device assesses adult individuals (22 to 65 years old) means the test set would be composed of data from this age demographic. Retrospective or prospective is not specified, but typically retrospective datasets are used for initial clearances.

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications:

    • For future updates under PCCP: "Non-inferiority margins were selected based on the level of human variability estimated from comparison of expert scoring from 284 American Academy of Sleep Medicine (AASM) compliant polysomnography recordings." This strongly implies that the ground truth for validation (both for K223539 and subsequent PCCP validations) is expert consensus scoring based on AASM guidelines.
    • Number of Experts: Not explicitly stated, but "expert scoring" typically implies one or more certified sleep technologists or sleep physicians. The mention of "human variability" often means comparison between at least two independent expert scorings.
    • Qualifications: "American Academy of Sleep Medicine (AASM) compliant polysomnography recordings" strongly suggests that the experts would be board-certified sleep physicians or registered polysomnographic technologists (RPSGTs) with experience in AASM sleep staging. The number of years of experience is not specified.

    4. Adjudication Method for the Test Set:

    • Not explicitly defined in the provided text. However, for "expert scoring" and estimating "human variability," common adjudication methods include:
      • Consensus: Multiple experts independently score, and a final consensus is reached (e.g., by discussion or a third adjudicator if initial scores differ significantly).
      • Majority vote: If more than two experts, the majority decision prevails.
      • Pairwise agreement: Often used to quantify inter-rater variability for tasks like sleep staging.

    5. Multi Reader Multi Case (MRMC) Comparative Effectiveness Study:

    • The document does not report on an MRMC comparative effectiveness study where human readers improve with AI vs. without AI assistance for this specific submission (K242094). This submission is for a PCCP and relies on the predicate's performance.

    6. Standalone (Algorithm Only) Performance Study:

    • Yes, the document implies that a standalone performance study was conducted for the predicate device (K223539). The algorithm "analyzes data from the headband EEG and accelerometer sensors" and "uses raw EEG data and accelerometer data to provide automatic sleep staging according to the AASM classification." The PCCP is about maintaining and improving this algorithm's standalone performance.
    • The "clinical performance validation will also be repeated, and will require that the performance of any modification to Dreem 3S to be non-inferior" to previous versions. This directly refers to the algorithm's standalone performance.

    7. Type of Ground Truth Used:

    • Expert Consensus: The phrase "automatic sleep staging according to the AASM classification" and "comparison of expert scoring from 284 American Academy of Sleep Medicine (AASM) compliant polysomnography recordings" strongly indicates that the ground truth is established by expert scoring conforming to AASM guidelines. This is the standard for sleep staging.

    8. Sample Size for the Training Set:

    • Not specified in this document. This refers to the original training data used for the predicate device (K223539). For future updates, the PCCP mentions "Retraining with an updated training/tuning dataset" but does not specify the size of these datasets.

    9. How the Ground Truth for the Training Set Was Established:

    • Not explicitly specified for the training set itself, but it is highly probable that the ground truth for the training set was established through expert consensus scoring according to AASM guidelines, similar to how the test set's ground truth is (or will be for PCCP updates) established. This is standard practice for supervised machine learning models in this domain.

    K Number: K233438
    Device Name: SleepStageML
    Date Cleared: 2024-03-08 (147 days)
    Product Code: OLZ
    Regulation Number: 882.1400

    Intended Use

    SleepStageML is intended for assisting the diagnostic evaluation by a qualified clinician to assess sleep quality from level 1 polysomnography (PSG) recordings in a clinical environment in patients aged 18 and older.

    SleepStageML is a software-only medical device to be used to analyze physiological signals and automatically score sleep stages. All outputs are subject to review by a qualified clinician.

    Device Description

    SleepStageML is an Artificial Intelligence/Machine Learning (AI/ML)-enabled software-only medical device that analyzes polysomnography (PSG) recordings and automatically scores sleep stages. It is intended for assisting the diagnostic evaluation by a qualified clinician to assess sleep quality in patients aged 18 and older.

    Qualified clinicians (also referred to as clinical users), such as sleep physicians, sleep technicians, or registered PSG technologists (RPSGTs) who are qualified to review PSG studies, provide PSG recordings in European Data Format (EDF) through a secure file transfer system to Beacon Biosignals. SleepStageML automatically analyzes the provided PSG recording and returns an EDF file containing the original PSG recording with software-generated sleep stage annotations (i.e., Wake (W), non-REM 1 (N1), non-REM 2 (N2), non-REM 3 (N3), and REM (R)) to the clinical user. EDF files containing PSG signals as well as sleep stage annotations are referred to as EDF+. The returned EDF+ files can then be reviewed by the qualified clinicians via their PSG viewing software. The recordings processed by SleepStageML are level-1 PSG recordings obtained in an attended setting in accordance with American Academy of Sleep Medicine (AASM) recommendations with respect to minimum sampling rate, electroencephalography (EEG) channels, and EEG locations. SleepStageML uses only the EEG signals in provided PSGs and does not consider electromyography (EMG) or electrooculography (EOG) signals when performing sleep staging. The sleep stage outputs of SleepStageML are intended to be comparable to sleep stages as defined by AASM guidelines. SleepStageML software outputs are subject to a qualified clinician's review.

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA 510(k) summary for SleepStageML:

    Acceptance Criteria and Reported Device Performance

    | Sleep Staging Comparison | Acceptance Criteria (Predicate: Sleep Profiler, K153412, N=43 subjects) | Reported Device Performance (SleepStageML, N=100 subjects) |
    |---|---|---|
    | Overall Agreement (OA) | | |
    | W | 89% | 96.1% (95% CI: 95.4%, 96.8%) |
    | N1 | 89% | 94.5% (95% CI: 93.7%, 95.2%) |
    | N2 | 81% | 87.1% (95% CI: 85.9%, 88.3%) |
    | N3 | 91% | 92.9% (95% CI: 91.8%, 93.8%) |
    | R | 95% | 97.3% (95% CI: 96.7%, 97.9%) |
    | Positive Agreement (PA) | | |
    | W | 73% | 88.9% (95% CI: 86.5%, 91.2%) |
    | N1 | 25% | 58.4% (95% CI: 54.2%, 62.4%) |
    | N2 | 77% | 79.8% (95% CI: 77.7%, 81.8%) |
    | N3 | 76% | 93.0% (95% CI: 89.8%, 95.7%) |
    | R | 74% | 93.1% (95% CI: 91.5%, 94.5%) |
    | Negative Agreement (NA) | | |
    | W | 94% | 98.5% (95% CI: 98.2%, 98.8%) |
    | N1 | 93% | 96.2% (95% CI: 95.4%, 96.9%) |
    | N2 | 84% | 94.2% (95% CI: 93.2%, 95.0%) |
    | N3 | 94% | 92.9% (95% CI: 91.7%, 93.9%) |
    | R | 97% | 98.0% (95% CI: 97.3%, 98.6%) |
    | Multi-stage Agreement | Not explicitly stated for predicate in a comparable way, but implied | 84.02% (86,983 total epochs across 100 subjects; 2,289 excluded for no consensus) |

    Study Details:

    1. Sample sizes used for the test set and data provenance:

      • Test Set Sample Size: 100 patients.
      • Data Provenance: Retrospective pivotal validation study using previously collected clinical polysomnography (PSG) recordings. The recordings were randomly selected from three Level 1 clinical PSG data sources. The document does not specify the country of origin of the data.
    2. Number of experts used to establish the ground truth for the test set and their qualifications:

      • Number of Experts: Three (3) registered PSG technologists (RPSGTs).
      • Qualifications: Each RPSGT had at least 5 years of experience in clinical scoring of sleep studies.
    3. Adjudication method for the test set:

      • Method: 2/3 majority scoring. Expert consensus sleep stages were constructed using the stage per epoch where at least 2 of the 3 experts agreed. Epochs where all 3 RPSGTs disagreed were excluded.
    4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done:

      • No, an MRMC study comparing human readers with AI vs. without AI assistance was not explicitly detailed. The study focused on the standalone performance of the AI algorithm against human expert consensus to demonstrate non-inferiority to a predicate device. The device's indication for use explicitly states, "All outputs are subject to review by a qualified clinician," indicating a human-in-the-loop design, but the described performance study is primarily a standalone evaluation.
    5. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:

      • Yes, the clinical validation test evaluated the SleepStageML software's performance "against the expert consensus sleep stages" in a standalone manner. The device's outputs are intended to be reviewed by a clinician, but the performance metrics reported are for the algorithm's direct output compared to ground truth.
    6. The type of ground truth used:

      • Type: Expert Consensus. The ground truth was established by three RPSGTs, with a 2/3 majority rule for consensus.
    7. The sample size for the training set:

      • The document states, "SleepStageML uses a deep learning algorithm based on convolutional neural networks, which was trained on a large and diverse set of PSG recordings with sleep staging labels." However, a specific sample size for the training set is not provided in the summary.
    8. How the ground truth for the training set was established:

      • The document states the training was on "PSG recordings with sleep staging labels." It does not explicitly detail the method for establishing ground truth for the training set (e.g., if it was also expert consensus, single expert, or another method). However, given the nature of sleep staging, it is highly likely that these labels were also derived from expert annotations, similar to the test set, though possibly not with the same rigorous 3-expert consensus and adjudication for every record.
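    The 2/3 majority adjudication described in item 3 above can be sketched as follows (stage labels are illustrative):

```python
# Build expert-consensus sleep stages from three scorers using the 2/3
# majority rule; epochs where all three disagree carry no consensus (None).
from collections import Counter

def consensus(scorer_a, scorer_b, scorer_c):
    out = []
    for votes in zip(scorer_a, scorer_b, scorer_c):
        stage, count = Counter(votes).most_common(1)[0]
        out.append(stage if count >= 2 else None)  # None = excluded epoch
    return out

a = ["W", "N1", "N2", "R"]
b = ["W", "N2", "N2", "N3"]
c = ["W", "N3", "N2", "R"]
labels = consensus(a, b, c)  # second epoch has no 2/3 agreement
```

    Epochs labeled None here correspond to the 2,289 no-consensus epochs the study excluded.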

    K Number
    K223539
    Device Name
    Dreem 3S
    Date Cleared
    2023-08-18

    (268 days)

    Product Code
    OLZ
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Intended Use

    The Dreem 3S is intended for prescription use to measure, record, display, transmit and analyze the electrical activity of the brain to assess sleep and awake in the home or healthcare environment.

    The Dreem 3S can also output a hypnogram of sleep scoring by 30-second epoch and summary of sleep metrics derived from this hypnogram.

    The Dreem 3S is used for the assessment of sleep on adult individuals (22 to 65 years old). The Dreem 3S allows for the generation of user/predefined reports based on the subject's data.

    Device Description

    The Dreem 3S headband contains microelectronics, within a flexible case made of plastic, foam, and fabric. It includes 6 EEG electrodes and a 3D accelerometer sensor.

    The EEG signal is measured by two electrodes in the frontal position and two at the back of the head (occipital position), along with one reference electrode and one ground electrode.

    The 3D accelerometer is embedded in the top of the headband to ensure accurate measurements of the wearer's head movement during the night. The raw EEG and accelerometer data are transferred to Dreem's servers for further analysis after the night is over.

    The device includes a bone-conduction speaker with volume control to provide notifications to the wearer, and a power button circled by a multicolor LED light.

    The device generates a sleep report that includes a sleep staging for each 30-second epoch during the night. This output is produced using an algorithm that analyzes data from the headband EEG and accelerometer sensors. A raw data file is also available in EDF format.

    The algorithm uses raw EEG data and accelerometer data to provide automatic sleep staging according to the AASM classification. The algorithm is implemented with an artificial neural network. Frequency spectrums are computed from raw data and then passed to several neural network layers including recurrent layers and attention layers. The algorithm outputs prediction for several epochs of 30 seconds at the same time, every 30 seconds. The various outputs for a single epoch of 30 seconds are combined to provide robust sleep scoring.
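    The summary states that several overlapping predictions for one 30-second epoch are combined into a single score. One common way to do this, shown here as a schematic sketch (the actual Dreem 3S combination rule is not disclosed), is to average the per-window class probabilities and take the most likely stage:

```python
# Combine several probability estimates for the same 30-s epoch by averaging,
# then pick the most likely stage (schematic; not the vendor's disclosed rule).

STAGES = ["W", "N1", "N2", "N3", "R"]

def combine(epoch_probs):
    """epoch_probs: list of per-window probability vectors over STAGES."""
    n = len(epoch_probs)
    mean = [sum(p[i] for p in epoch_probs) / n for i in range(len(STAGES))]
    return STAGES[mean.index(max(mean))]

# Three overlapping windows scored the same epoch slightly differently:
votes = [
    [0.10, 0.20, 0.60, 0.05, 0.05],
    [0.05, 0.10, 0.70, 0.10, 0.05],
    [0.20, 0.10, 0.50, 0.10, 0.10],
]
stage = combine(votes)  # averaged probabilities favor N2
```

    Averaging several shifted-window outputs smooths out single-window errors, which is one plausible reading of "combined to provide robust sleep scoring."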

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and study details for the Dreem 3S device based on the provided text:

    Acceptance Criteria and Device Performance

    Measure (acceptance criteria implicit from study results) | Reported Device Performance (Dreem 3S vs. expert-scored PSG)
    Wake classification | Positive Agreement (PA): 88.5% (CI: 85.1%, 91.3%)
    N1 classification | PA: 58.0% (CI: 52.7%, 63.0%)
    N2 classification | PA: 83.4% (CI: 80.7%, 85.7%)
    N3 classification | PA: 98.2% (CI: 96.73%, 99.3%)
    REM classification | PA: 91.57% (CI: 86.63%, 95.72%)
    EEG data quality for manual scoring | 96.6% of epochs per night of recording were acceptable for manual scoring and sleep staging by at least two of three reviewers
    Minimum scoreable data | all recordings reviewed had ≥4 hours of data considered scoreable by at least two of three reviewers
    Usability in the home setting | the device could be successfully used and was tolerated by study subjects
    Note: The document primarily presents performance results rather than explicitly stating pre-defined acceptance criteria with numerical thresholds the device needed to meet. The "acceptance criteria" listed above are inferred from the demonstrated performance that supported substantial equivalence.

    Study Details

    1. Sample size used for the test set and the data provenance:

      • Sample Size: 38 subjects
      • Data Provenance: The study was a "clinical investigation... completed... in a sleep lab setting." Subjects ranged from 23 to 66 years old, equally split between male and female, and included individuals self-identified as White, Black African American, Asian, Hispanic, and some not identified. This suggests a prospective study with diverse participants, likely conducted in a single country, though the specific country of origin is not explicitly stated. The study included a total of 36,447 epochs, corresponding to about 303 hours and 43 minutes of sleep.
    2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

      • The ground truth for sleep staging was established by "expert-scored sleep stages from a cleared device." EEG data quality was assessed by "at least two out of three reviewers qualified to read EEG and/or PSG data." The specific number of experts used for the primary sleep staging ground truth is not explicitly stated (e.g., whether it was one expert or a consensus of multiple), and their exact qualifications (e.g., years of experience) are not detailed beyond "expert-scored" and "qualified to read EEG and/or PSG data."
    3. Adjudication method (e.g., 2+1, 3+1, none) for the test set:

      • For the primary sleep staging (Table 2), the ground truth is referred to as "Consensus from manual staging" or "expert-scored PSG." The specific adjudication method (e.g., 2+1, 3+1) is not detailed.
      • For EEG data quality, acceptability was determined if "at least two out of three reviewers qualified to read EEG and/or PSG data" agreed. This implies a 2-out-of-3 consensus (similar to a 2+1 method if one reviewer was the primary and two others adjudicated).
    4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

      • No, a multi-reader multi-case (MRMC) comparative effectiveness study focusing on human reader improvement with AI assistance was not done. This study solely evaluated the standalone performance of the Dreem 3S algorithm against expert-scored PSG.
    5. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:

      • Yes, a standalone study was done. The clinical performance evaluation directly compares the "Dreem 3S (Automated analysis)" output to "Consensus from manual staging" (expert-scored PSG), indicating the performance of the algorithm without human intervention in the loop.
    6. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

      • Expert consensus of manual staging from a 510(k)-cleared PSG system.
    7. The sample size for the training set:

      • The sample size for the training set is not specified in the provided document. The document only mentions "The algorithm is implemented with an artificial neural network. Frequency spectrums are computed from raw data and then passed to several neural network layers including recurrent layers and attention layers," which implies a training process, but no details on the training data size are given.
    8. How the ground truth for the training set was established:

      • How the ground truth for the training set was established is not specified in the provided document.

    K Number
    K223922
    Manufacturer
    Date Cleared
    2023-08-16

    (229 days)

    Product Code
    OLZ
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Intended Use

    SOMNUM is a computer program (software) intended for use as an aid for the diagnosis of sleep and respiratory related sleep disorders. SOMNUM is intended to be used for analysis (automatic scoring and manual re-scoring), display, redisplay (retrieve), summarization, and report generation of digital data collected by monitoring devices typically used to evaluate sleep and respiratory related sleep disorders. The device is to be used under the supervision of a physician. Use is restricted to files obtained from adult patients.

    For respiratory events - Sleep Disordered Breathing (Apneas)- obstructive, central, mixed apneas, and hypopneas must be manually scored by physician. The device does not output specific apnea or hypopnea events and therefore should not be used for management decisions.

    Device Description

    SOMNUM is a standalone software application that analyzes previously recorded physiological data obtained during level 1 sleep studies, referred to as polysomnography (PSG) records. The SOMNUM software can analyze any EDF file. Automated algorithms are applied to the raw signals in order to identify the occurrence of certain events. The software automates recognition of:

    • Sleep Stage Events: Wake, Stage N1, Stage N2, Stage N3, Stage REM
    • Respiratory Events: Sleep Disordered Breathing (the device output does not distinguish between apneas and hypopneas; obstructive, central, and mixed apneas and hypopneas must be manually scored by a physician)
    • Arousal Events
    • Leg Movement Events: Periodic Leg Movements during Sleep (PLMs)

    The SOMNUM software can be used as a stand-alone application for use on Windows 10 operating system platform. All processing, scoring, and analysis of signal data occurs on local desktop PC.
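    Scorers of this kind operate on fixed 30-second epochs of the recorded signals, so a first processing step is windowing each channel. A generic sketch (not Honeynaps' implementation):

```python
# Split a single PSG channel into non-overlapping 30-second epochs,
# dropping any trailing partial epoch (generic sketch, not vendor code).

EPOCH_SECONDS = 30

def epochs(samples, sampling_rate_hz):
    n = EPOCH_SECONDS * sampling_rate_hz   # samples per epoch
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

# A 65-second channel sampled at 2 Hz yields two full 30-s epochs:
signal = list(range(130))
eps = epochs(signal, 2)
```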

    AI/ML Overview

    The Honeynaps Co., Ltd. Somnum (v.1.1.2) device underwent a non-clinical performance test to establish substantial equivalence to predicate devices for the analysis of sleep and respiratory-related sleep disorders.

    1. Acceptance Criteria and Reported Device Performance

    The acceptance criteria are not explicitly stated as numerical thresholds for specific metrics before the results table. However, the "Discussion" sections following the results implicitly define acceptable performance based on comparisons to predicate devices and acceptable error ranges in clinical practice. The reported device performance is shown in the tables below.

    Endpoint 1: Performance for Detecting Each Event Type (Sleep Stage, Arousal, SDB, PLMs)

    Unless otherwise noted, the implied acceptance criterion (based on the discussion) was performance comparable to the predicate and within the clinically acceptable error range of approximately 15%. Overall sleep-stage OPA and Kappa are reported with reference to predicate K162627; SDB and PLMS rows also cite reference data attributed to K112102 (ref [1]) and K162627 (ref [4]).

    Event Type | Metric | SOMNUM (95% CI) | Predicate Reference Data | Notes
    Sleep Stage (overall) | OPA | 87.5% (87.2, 87.8) | 91% (91, 92) |
    Sleep Stage (overall) | Kappa | 82.1% (81.6, 82.5) | N/A |
    Wake | PPA | 89.9% (89.2, 90.6) | 86% (82, 88) |
    Wake | NPA | 97.4% (97.3, 97.6) | 97% (95, 98) |
    N1 | PPA | 77.9% (77.0, 78.9) | 41% (33, 48) |
    N1 | NPA | 94.6% (94.4, 94.9) | 94% (93, 96) |
    N2 | PPA | 91.1% (90.7, 91.5) | 77% (73, 81) |
    N2 | NPA | 92.4% (92.0, 92.7) | 87% (85, 90) |
    N3 | PPA | 84.1% (82.6, 85.6) | 81% (74, 88) |
    N3 | NPA | 99.1% (99.0, 99.2) | 93% (91, 95) |
    REM | PPA | 84.7% (83.8, 85.4) | 79% (72, 84) |
    REM | NPA | 98.9% (98.8, 99.0) | 99% (98, 99) |
    Arousal | OPA | 82.5% (82.2, 82.9) | 87% (85, 88) |
    Arousal | PPA | 82% (81.4, 82.6) | 66% (61, 71) |
    Arousal | NPA | 82.8% (82.4, 83.2) | 90% (88, 91) | NPA 8% lower than predicate, considered within the ~15% clinically acceptable error range
    SDB | OPA | 92.3% (92.1, 92.6) | 93.0% (ref [1]); 91% (90, 92) (ref [4]) |
    SDB | PPA | 94.2% (93.8, 94.5) | 75.5% (ref [4]); 67% (58, 75) (ref [1]) | exceeds predicate performance
    SDB | NPA | 91.3% (91.0, 91.6) | 98.1% (ref [4]); 93% (92, 94) (ref [1]) | NPA 1.7% lower than K162627 and 6.8% lower than K112102, considered acceptable given overall PPA/OPA and the 15% manual-scorer agreement error range
    PLMS | OPA | 94.1% (93.9, 94.4) | 95.7% (ref [4]); 89% (87, 90) (ref [1]) |
    PLMS | PPA | 92.9% (92.1, 93.6) | 78.4% (ref [4]); 71% (60, 80) (ref [1]) | exceeds predicate performance
    PLMS | NPA | 94.3% (94.1, 94.5) | 97.6% (ref [4]); 90% (89, 92) (ref [1]) | lower NPA considered within the clinically acceptable range

    Endpoint 2: Performance for Summary Variables (Absolute Max Difference/LOA)

    For each summary variable, acceptance (per the discussion) required SOMNUM's limits of agreement (LOA) to be narrower than the reference range or within the target; several variables also showed smaller absolute error than the target. "U" and "L" denote upper and lower limits.

    Variable | Limit Type | SOMNUM Abs. MAX | Target (Reference) Abs. MAX | SOMNUM LOA Range | Reference LOA Range
    TST | U | 20 | 120 (Ref [5]) | U: 7.8, L: -14.21 | U: 35, L: -70
    SE | U | 5 | 13 (Ref [7]) | U: 1.69, L: -3.31 | U: 10, L: -12
    SOL | U | 23 | 40 (Ref [7]) | U: 2.93, L: -4.21 | U: 15, L: -11
    ROL | U | 120 | 170 (Ref [5]) | U: 62.01, L: -60.69 | U: 70, L: -90
    Wake | U | 22 | 60 (Ref [7]) | U: 14.21, L: -7.80 | U: 70, L: -45
    N1 | U | 45 | 80 (Ref [5]) | U: 25.30, L: -17.29 | U: 30, L: -30
    N2 | U | 65 | 120 (Ref [5]) | U: 31.61, L: -29.10 | U: 10, L: -75
    N1_N2 | U | 55 | 70 (Ref [5]) | U: 31.58, L: -19.17 | U: 30, L: -75
    N3 | U | 40 | 140 (Ref [5]) | U: 18.83, L: -30.45 | U: 65, L: -5
    REM | U | 40 | 80 (Ref [5]) | U: 6.07, L: -31.95 | U: 18, L: -55
    Arousal Index | U | 30 | - | U: 23.91, L: -15.21 | -
    PLMS Index | U | 43 | 7 (Ref [6]) | U: 12.24, L: -0.37 | U: 13, L: -15
    AHI Index | U | 18 | 7 (Ref [5]) | U: 8.79, L: -3.28 | U: 4, L: -2
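    The LOA ranges above are Bland–Altman limits of agreement: the mean of the per-subject differences between automated and manual values, plus or minus 1.96 standard deviations. A minimal sketch with illustrative numbers:

```python
# Bland-Altman limits of agreement for a summary variable (e.g., TST),
# computed from per-subject automated-vs-manual differences (illustrative data).
import math

def limits_of_agreement(auto, manual):
    diffs = [a - m for a, m in zip(auto, manual)]
    mean = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))
    return mean - 1.96 * sd, mean + 1.96 * sd  # (lower, upper)

auto   = [412.0, 388.5, 420.0, 365.0]  # minutes, hypothetical
manual = [405.0, 392.0, 415.0, 370.0]
lower, upper = limits_of_agreement(auto, manual)
```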

    Conclusion: The study concludes that SOMNUM passed all pass/fail criteria for both Endpoint 1 and Endpoint 2, demonstrating substantial equivalence.

    2. Sample Size and Data Provenance

    • Sample Size (Test Set): N=48 subjects
    • Data Provenance: The data was recorded in a sleep laboratory. The country of origin is not explicitly stated. The study design is described as a "cross-sectional experimental design," which implies it was specifically conducted for this evaluation. It is not explicitly stated whether the study was retrospective or prospective, but the phrasing "representative N=48 subjects of data recorded in the sleep laboratory" suggests existing data were selected for the study.

    3. Number of Experts and Qualifications for Ground Truth

    • Number of Experts: Three technologists.
    • Qualifications of Experts: "Medical professionals certified on PSG recording and analysis". Specific years of experience are not provided.

    4. Adjudication Method

    • Adjudication Method: 2/3 majority rule. This means at least two out of the three experts had to agree on the presence of an event within an epoch for it to be considered ground truth.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • A MRMC comparative effectiveness study was not conducted to measure the improvement of human readers with AI assistance versus without AI assistance. The study focuses on the standalone performance of the AI.

    6. Standalone Performance Study

    • Yes, a standalone study was conducted. The performance test evaluated "SOMNUM device performance using a cross-sectional experimental design." The comparison was between SOMNUM's scoring and the ground truth established by expert consensus.

    7. Type of Ground Truth Used

    • Type of Ground Truth: Expert consensus (2/3 majority rule of three certified medical professionals).

    8. Sample Size for the Training Set

    • The document does not provide specific details regarding the sample size used for the training set. It only describes the performance test (test set).

    9. How Ground Truth for Training Set was Established

    • The document does not provide details on how the ground truth for the training set was established. It only describes the ground truth establishment for the test set.

    K Number
    K221179
    Device Name
    SomnoMetry
    Date Cleared
    2022-09-21

    (149 days)

    Product Code
    OLZ
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Intended Use

    SomnoMetry is intended for use for the diagnostic evaluation by a physician to assess sleep quality and as an aid for the diagnosis of sleep and respiratory-related sleep disorders in adults only. SomnoMetry is a software-only medical device to be used to analyze physiological signals and automatically score sleep study results, including the staging of sleep, AHI, and detection of sleep-disordered breathing events including obstructive apneas. It is intended to be used under the supervision of a clinician in a clinical environment. All automatically scored events are subject to verification by a qualified clinician.

    Device Description

    The SomnoMetry is an Artificial Intelligent/Machine Learning (AI/ML)-enabled Software as a Medical Device (SaMD) that automatically scores sleep study results by analyzing polysomnography (PSG) signals recorded during sleep studies. It is intended to be used under the supervision of a clinician in clinical environments to aid in the diagnosis of sleep and respiratory related sleep disorders.

    All scored events that are analyzed, displayed, and summarized can be manually marked or edited by a qualified clinician during review and verification.

    SomnoMetry consists of:

    • A web Application Programming Interface (API) to allow authenticated users to upload PSG files to the SomnoMetry Platform
    • A database to store the input, intermediate output, final output, and associated data
    • A database API to access the database and store/retrieve the output
    • A dashboard to display, retrieve, manage, edit, verify, and summarize the output
    • An AI/ML Engine using AI/ML algorithms/approaches to analyze PSG data
    • A reporting API to generate sleep reports

    AI/ML Overview

    Here's an analysis of the acceptance criteria and study that proves the device meets them, based on the provided text:

    Device: SomnoMetry (Neumetry Medical Inc.)

    Regulatory Class: Class II (Product Code: OLZ)
    Intended Use: Diagnostic evaluation to assess sleep quality and aid in diagnosing sleep and respiratory-related sleep disorders in adults. Automatically scores sleep study results (sleep staging, AHI, obstructive apneas). Used under clinician supervision; all automatically scored events are subject to verification.

    1. Table of Acceptance Criteria and Reported Device Performance

    The document doesn't explicitly define 'acceptance criteria' in a numerical table for the SomnoMetry device against specific thresholds (e.g., "accuracy must be > 90%"). Instead, it frames the performance evaluation as demonstrating non-inferiority and substantial equivalence to a predicate device (EnsoSleep K162627) based on clinical performance metrics. The implicit acceptance criteria are that the SomnoMetry's performance for sleep staging and sleep apnea diagnosis is statistically equivalent to, or better than, the predicate device.

    Table 1: Implicit Acceptance Criteria and SomnoMetry's Reported Performance (vs. Predicate)

    Endpoint 1: Sleep Staging Performance. The acceptance criterion was non-inferiority to the AASM gold standard (manual scoring); the predicate's per-stage values are not directly comparable, and performance is stated as "substantially equivalent to the predicate device." The confusion matrix shows how well the algorithm predicts each sleep stage given the true stage; higher percentages on the diagonal (true W predicted as W, etc.) indicate better performance.

    Stage | SomnoMetry: True Stage Predicted as Same Stage (95% CI)
    Wake (W) | 92.7% (91.8, 93.6)
    Stage 1 (N1) | 47.1% (46.1, 48.8)
    Stage 2 (N2) | 88.3% (87.4, 89.1)
    Stage 3 (N3) | 80.8% (79.8, 81.7)
    REM (R) | 94.5% (93.5, 95.5)

    Endpoint 2: Sleep Apnea Diagnostic Agreement at AHI thresholds. The acceptance criterion was no statistically significant difference from the predicate device; cells give values at AHI ≥ 5 / ≥ 15 / ≥ 30.

    Metric | Population | SomnoMetry | Predicate
    Positive Agreement (PA) | All | 90.6% / 89.1% / 83.3% | 91% / 95% / N/A
    Positive Agreement (PA) | REM | 85.6% / 80.0% / 78.8% | 83% / 79% / N/A
    Negative Agreement (NA) | All | 92.2% / 94.9% / 97.5% | 76% / 98% / N/A
    Negative Agreement (NA) | REM | 94.7% / 94.7% / 95.6% | 89% / 96% / N/A
    Overall Agreement (OA) | All | 91.2% / 92.8% / 95.6% | 85% / 97% / N/A
    Overall Agreement (OA) | REM | 88.9% / 88.9% / 92.4% | 86% / 92% / N/A
    Likelihood Ratio (+) | All | 11.62 / 17.47 / 33.32 | 3.76 / 52.25 / N/A
    Likelihood Ratio (+) | REM | 16.15 / 15.09 / 17.91 | 7.71 / 22.0 / N/A
    Likelihood Ratio (-) | All | 0.10 / 0.11 / 0.17 | 0.12 / 0.05 / N/A
    Likelihood Ratio (-) | REM | 0.15 / 0.21 / 0.22 | 0.19 / 0.22 / N/A
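    The Endpoint 2 figures follow from a 2×2 table at each AHI threshold, with PA acting as sensitivity and NA as specificity, so that LR(+) = PA / (1 − NA) and LR(−) = (1 − PA) / NA (e.g., 90.6 / (100 − 92.2) ≈ 11.6, matching the reported 11.62). A sketch with illustrative counts:

```python
# Diagnostic agreement at an AHI cutoff from a 2x2 table of automated vs.
# manual classifications (illustrative counts, not the study data).

def ahi_agreement(tp, fp, fn, tn):
    pa = tp / (tp + fn)              # positive agreement (sensitivity analog)
    na = tn / (tn + fp)              # negative agreement (specificity analog)
    oa = (tp + tn) / (tp + fp + fn + tn)
    lr_pos = pa / (1 - na)           # likelihood ratio (+)
    lr_neg = (1 - pa) / na           # likelihood ratio (-)
    return pa, na, oa, lr_pos, lr_neg

# e.g., automated vs. manual AHI >= 15 on 100 hypothetical subjects:
pa, na, oa, lr_pos, lr_neg = ahi_agreement(tp=45, fp=3, fn=5, tn=47)
```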

    2. Sample Size and Data Provenance for Test Set

    • Sample Size: N = 201 adult subjects.
    • Data Provenance: Retrospective clinical data obtained from 2 AASM accredited Sleep Testing Facilities in California, USA.
      • The data was verified to meet specified disease spectrum, medical condition, medication, and demographic requirements.
      • Randomized sampling with proportionate allocation across each sleep apnea disease severity quantile (normative, mild, moderate, and severe sleep apnea) and sleep cycles was used.
      • Age of subjects ranged from 20 to 84 years.
      • No race/ethnicity information was collected.

    3. Number of Experts and Qualifications for Ground Truth

    The document explicitly states the ground truth was established by "manually scored PSG data" which aligns with the "AASM gold standard". While it doesn't specify the number of experts, it implies these were the standard scores provided by the AASM accredited facilities. The qualifications are implicitly that they are "qualified clinicians" following "American Academy of Sleep Medicine scoring manual and guidelines."

    4. Adjudication Method for Test Set

    The document does not describe a formal expert adjudication method (e.g., 2+1, 3+1). The ground truth is stated as "AASM gold standard of manually scored PSG data," which suggests a single, accepted manual score per PSG study generated by the sleep facility.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    No, a MRMC comparative effectiveness study was not conducted. The study evaluated the standalone performance of the AI algorithm against the AASM gold standard and then compared this algorithm performance to the predicate device's algorithm performance. It did not assess how human readers improve with AI assistance versus without. The device's intended use note ("All automatically scored events are subject to verification by a qualified clinician") implies a human-in-the-loop workflow, but the study itself wasn't designed to measure the impact of AI assistance on human reader performance.

    6. Standalone (Algorithm Only) Performance Study

    Yes, a standalone (algorithm only) performance study was performed. The "retrospective clinical performance testing" evaluated the SomnoMetry AI/ML algorithms directly, comparing their output to the AASM gold standard. The results presented in Table 2 (Confusion Matrix for Sleep Staging) and Table 3 (Sleep Apnea Diagnostic Agreement) are for the algorithm's performance.

    7. Type of Ground Truth Used

    The type of ground truth used was expert consensus / AASM (American Academy of Sleep Medicine) gold standard based on "manually scored PSG data." This is considered the clinical standard for sleep study interpretation.

    8. Sample Size for the Training Set

    The document does not specify the sample size for the training set. It only describes the test set (N=201).

    9. How Ground Truth for the Training Set Was Established

    The document does not describe how the ground truth for the training set was established. It focuses solely on the clinical evaluation (test set) and states that data was obtained from AASM accredited facilities. It can be inferred that similar "manually scored PSG data" would have been used for training, given the expertise required for such scoring and the device's reliance on AASM guidelines.


    K Number
    K210034
    Device Name
    EnsoSleep
    Manufacturer
    Date Cleared
    2021-06-16

    (161 days)

    Product Code
    OLZ
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Intended Use

    EnsoSleep is intended for use in the diagnostic evaluation by a physician to assess sleep quality and as an aid for physicians in the diagnosis of sleep disorders and respiratory related sleep disorders in patients as follows:

    • Pediatric patients 13 years and older with polysomnography (PSG) tests obtained in a Hospital or Sleep Clinic
    • Adult patients with PSGs obtained in a Hospital or Sleep Clinic
    • Adult patients with Home Sleep Tests

    EnsoSleep is a software-only medical device to be used under the supervision of a clinician to analyze physiological signals and automatically score sleep study results, including the staging of sleep, arousals, leg movements, and sleep disordered breathing events including obstructive apneas (OSA), central sleep apneas (CSA), and hypopneas.

    All automatically scored events and physiological signals which are retrieved, analyzed, displayed, and summarized are subject to verification by a qualified clinician. Central sleep apneas (CSA) should be manually reviewed and modified as appropriate by a clinician.

    All events can be manually marked or edited within records during review.

    Photoplethysmography (PPG) total sleep time is not intended for use when electroencephalograph (EEG) data is recorded. PPG total sleep time is not intended to be used as the sole or primary basis for diagnosing any sleep related breathing disorder, prescribing treatment, or determining whether additional diagnostic assessment is warranted.
    Device Description

    EnsoSleep is a software-only medical device that analyzes previously recorded physiological signals obtained during sleep. Users of EnsoSleep are consistent with the roles required to run a sleep clinic: sleep physicians, sleep technicians, clinic operations managers, and IT administrators. EnsoSleep can analyze at-home and in-lab sleep studies for both adult and pediatric patients who are at least 13 years old. Automated algorithms are applied to the raw signals in order to derive additional signals and interpret the raw and derived signal information. The software automates recognition of the following: respiratory events, sleep staging events, arousal events, movement events, cardiac events, derived signals, and calculated indices. EnsoSleep does not interpret the results, nor does it suggest a diagnosis. The device only marks events of interest for review by a physician who is responsible for diagnoses. The device does not analyze data that are different from those analyzed by human scorers.
    The signals and automated analyses can be visually inspected and edited prior to the results being integrated into a sleep study report.
    The software consists of 4 major components:

    • The Application Platform runs on local clinic workstations and manages the detection, upload, and download of study records and scoring to and from the Storage Platform
    • The Processing Platform accepts raw physiological signals as inputs in order to recognize events, derive signals, and calculate indices
    • The Storage Platform facilitates file and database storage in the EnsoSleep cloud through an API
    • The Dashboard is a web-based user interface to support configuration, clinic management, and sleep study scoring
    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:

    1. Table of Acceptance Criteria and Reported Device Performance

    The text describes that "the acceptance criteria were selected based on PA, NA, OA, MAE, Deming Regression coefficient, Bland-Altman mean difference and limits of agreement performance criteria that validate performance substantially equivalent to, or greater than, but not lesser than by more than 10% or similarly defined criteria in any category of the predicate 510(k) reported device performance across all endpoints respectively."

    For detailed performance metrics, the document presents tables for specific endpoints. Below is a structured representation focusing on the key performance indicators mentioned and a comparison to the predicate device where available.

    Endpoint 1: Sleep Staging Event Detection (EnsoSleep K210034 vs. Predicate K162627)

    | Metric | Acceptance Criteria (Conceptual) | EnsoSleep (K210034) Adult Sample (Reported) | EnsoSleep (K210034) Pediatric Sample (Reported) | Predicate (K162627) Adult Sample (Reported) |
    | --- | --- | --- | --- | --- |
    | Positive Agreement (PA) | ≥ Predicate PA − 10% (ideally ≥ Predicate PA) | Wake: 93.5% | Wake: 93.1% | Wake: 86% |
    | | | N1: 37.0% | N1: 43.2% | N1: 41% |
    | | | N2: 88.3% | N2: 92.6% | N2: 77% |
    | | | N3: 80.0% | N3: 92.3% | N3: 81% |
    | | | REM: 90.9% | REM: 80.9% | REM: 79% |
    | Negative Agreement (NA) | ≥ Predicate NA − 10% (ideally ≥ Predicate NA) | Wake: 97.2% | Wake: 99.2% | Wake: 97% |
    | | | N1: 98.3% | N1: 98.8% | N1: 94% |
    | | | N2: 89.3% | N2: 89.4% | N2: 87% |
    | | | N3: 96.3% | N3: 97.5% | N3: 93% |
    | | | REM: 99.3% | REM: 99.1% | REM: 99% |
    | Overall Agreement (OA) | ≥ Predicate OA − 10% (ideally ≥ Predicate OA) | Wake: 96.1% | Wake: 97.9% | Wake: 94% |
    | | | N1: 95.0% | N1: 97.0% | N1: 91% |
    | | | N2: 88.8% | N2: 90.9% | N2: 83% |
    | | | N3: 95.0% | N3: 96.6% | N3: 92% |
    | | | REM: 98.3% | REM: 97.0% | REM: 96% |
    | Total (Overall) OA (Pooled) | ≥ Predicate Total OA − 10% (ideally ≥ Predicate Total OA) | Adult: 86.6% | Pediatric: 89.7% | Adult: 78% |

    Conclusion on Acceptance Criteria (Endpoint 1): All EnsoSleep PA, NA, and OA point estimates vs. 2/3-majority scoring were observed to be greater than the predicate device PA, NA, and OA point estimates in some events, with statistically significant results in terms of higher agreement for several stages (e.g., Adult REM; Pediatric Wake, N3, Total). None were observed 10% or lower than 2/3 majority.
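    The summary reports per-stage PA, NA, and OA but does not spell out the formulas. Conventionally these are computed one-vs-rest over the 30-second epochs of the hypnogram: PA is agreement on epochs the reference scored as the stage in question, NA on the remaining epochs, and OA on all epochs. A minimal sketch with invented labels (not the vendor's implementation):

```python
def stage_agreement(reference, candidate, stage):
    """One-vs-rest epoch agreement for a single sleep stage.

    reference, candidate: per-epoch stage labels ('Wake', 'N1', ...).
    Returns (PA, NA, OA): agreement on reference-positive epochs,
    on reference-negative epochs, and on all epochs respectively.
    """
    tp = fp = tn = fn = 0
    for ref, cand in zip(reference, candidate):
        if ref == stage:
            tp += cand == stage
            fn += cand != stage
        else:
            tn += cand != stage
            fp += cand == stage
    pa = tp / (tp + fn)
    na = tn / (tn + fp)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return pa, na, oa

# Eight hypothetical 30-second epochs
ref = ['Wake', 'Wake', 'N1', 'N2', 'N2', 'N3', 'REM', 'Wake']
alg = ['Wake', 'N1',   'N1', 'N2', 'N3', 'N3', 'REM', 'Wake']
pa, na, oa = stage_agreement(ref, alg, 'Wake')
```

    For the Wake stage in this toy example, PA = 2/3, NA = 1.0, and OA = 7/8; the reported tables apply the same idea per stage across the pooled epochs.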

    Endpoint 2: Sleep Apnea Diagnostic Agreement (Per-Patient AHI)

    | Metric | Acceptance Criteria (Conceptual) | EnsoSleep (K210034) Adult Sample (Reported) | EnsoSleep (K210034) Pediatric Sample (Reported) | Predicate (K162627) Adult Sample (Reported) |
    | --- | --- | --- | --- | --- |
    | Positive Percent Agreement (PA) | ≥ Predicate PA − 10% (ideally ≥ Predicate PA) | AHI ≥ 5: 94.4% | AHI ≥ 1: 94.4% | AHI ≥ 5: 91% |
    | | | AHI ≥ 15: 94.0% | AHI ≥ 5: 90.5% | AHI ≥ 15: 95% |
    | | | REM AHI ≥ 5: 86.7% | AHI ≥ 10: 78.6% | REM AHI ≥ 5: 83% |
    | | | REM AHI ≥ 15: 81.5% | AHI ≥ 15: 85.7% | REM AHI ≥ 15: 79% |
    | Negative Percent Agreement (NA) | ≥ Predicate NA − 10% (ideally ≥ Predicate NA) | AHI ≥ 5: 89.7% | AHI ≥ 1: 77.8% | AHI ≥ 5: 76% |
    | | | AHI ≥ 15: 96.3% | AHI ≥ 5: 100.0% | AHI ≥ 15: 98% |
    | | | REM AHI ≥ 5: 83.0% | AHI ≥ 10: 94.9% | REM AHI ≥ 5: 89% |
    | | | REM AHI ≥ 15: 93.3% | AHI ≥ 15: 100.0% | REM AHI ≥ 15: 96% |
    | Overall Percent Agreement (OA) | ≥ Predicate OA − 10% (ideally ≥ Predicate OA) | AHI ≥ 5: 93.0% | AHI ≥ 1: 89.4% | AHI ≥ 5: 85% |
    | | | AHI ≥ 15: 95.0% | AHI ≥ 5: 95.7% | AHI ≥ 15: 97% |
    | | | REM AHI ≥ 5: 85.0% | AHI ≥ 10: 91.5% | REM AHI ≥ 5: 86% |
    | | | REM AHI ≥ 15: 90.0% | AHI ≥ 15: 97.9% | REM AHI ≥ 15: 92% |

    Conclusion on Acceptance Criteria (Endpoint 2): EnsoSleep PA, NA, and OA point estimates vs. 2/3-majority scoring were observed to be greater than the predicate device for some OSA severity categories in both adult and pediatric samples. Only one instance (Pediatric AHI > 10 PA) was within 10% of the predicate. All met or exceeded objective performance goals.
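    Per-patient diagnostic agreement at an AHI cutoff reduces to two steps: compute each study's AHI (scored respiratory events per hour of sleep) under both the reference and the automatic scoring, then check whether the two values fall on the same side of the cutoff. A sketch with invented numbers (the underlying study data are not included in this summary):

```python
def ahi(event_count, total_sleep_hours):
    """Apnea-Hypopnea Index: scored apneas + hypopneas per hour of sleep."""
    return event_count / total_sleep_hours

def threshold_agreement(ref_ahi, alg_ahi, cutoff):
    """PA/NA/OA for a dichotomized per-patient AHI comparison."""
    pairs = list(zip(ref_ahi, alg_ahi))
    pos = [a for r, a in pairs if r >= cutoff]   # reference-positive patients
    neg = [a for r, a in pairs if r < cutoff]    # reference-negative patients
    pa = sum(a >= cutoff for a in pos) / len(pos)
    na = sum(a < cutoff for a in neg) / len(neg)
    oa = (sum(a >= cutoff for a in pos) + sum(a < cutoff for a in neg)) / len(pairs)
    return pa, na, oa

ref = [3.2, 7.5, 18.0, 41.0, 4.9, 22.3]   # manually scored AHI per patient
alg = [4.1, 6.8, 14.2, 39.5, 6.0, 25.0]   # automatically scored AHI
pa, na, oa = threshold_agreement(ref, alg, 15)
```

    One patient near the cutoff (18.0 vs. 14.2) flips category, which is why agreement is typically reported at several thresholds (AHI ≥ 1, 5, 10, 15) rather than one.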

    Endpoint 3: Sleep Scoring Event Detection (EnsoSleep K210034 vs. Predicate K162627 and Reference K112102)

    | Metric | Acceptance Criteria (Conceptual) | EnsoSleep (K210034) Adult Sample (Reported) | EnsoSleep (K210034) Pediatric Sample (Reported) | Predicate (K162627) Adult Sample (Reported) | Reference (K112102) Adult Sample (Reported) |
    | --- | --- | --- | --- | --- | --- |
    | Positive Agreement (PA) | ≥ Reference PA − 10% (ideally ≥ Reference PA) | SDB: 75.4% | SDB: 72.7% | SDB: 67% | N/A |
    | | | Hypopnea: 66.3% | Hypopnea: 68.8% | Hypopnea: 60.3% | Hypopnea: 60.3% |
    | | | Obstructive Apnea: 74.1% | Obstructive Apnea: 45.5% | Obstructive Apnea: 53% | N/A |
    | | | Central Apnea: 65.3% | Central Apnea: 68.9% | Central Apnea: 63.8% | Central Apnea: 63.8% |
    | | | Arousal: 73.6% | Arousal: 78.6% | Arousal: 66% | N/A |
    | | | Leg Movement: 82.0% | Leg Movement: 66.0% | Leg Movement: 71% | N/A |
    | Negative Agreement (NA) | ≥ Reference NA − 10% (ideally ≥ Reference NA) | SDB: 97.0% | SDB: 98.6% | SDB: 93% | N/A |
    | | | Hypopnea: 97.1% | Hypopnea: 98.9% | Hypopnea: 97.6% | Hypopnea: 97.6% |
    | | | Obstructive Apnea: 99.3% | Obstructive Apnea: 99.7% | Obstructive Apnea: 97% | N/A |
    | | | Central Apnea: 99.5% | Central Apnea: 99.7% | Central Apnea: 99.6% | Central Apnea: 99.6% |
    | | | Arousal: 95.6% | Arousal: 97.0% | Arousal: 90% | N/A |
    | | | Leg Movement: 92.4% | Leg Movement: 95.5% | Leg Movement: 90% | N/A |
    | Overall Agreement (OA) | ≥ Reference OA − 10% (ideally ≥ Reference OA) | SDB: 94.9% | SDB: 97.6% | SDB: 91% | N/A |
    | | | Hypopnea: 95.5% | Hypopnea: 98.0% | Hypopnea: N/R | Hypopnea: N/R |
    | | | Obstructive Apnea: 98.8% | Obstructive Apnea: 99.5% | Obstructive Apnea: 96% | N/A |
    | | | Central Apnea: 98.9% | Central Apnea: 99.5% | Central Apnea: N/R | Central Apnea: N/R |
    | | | Arousal: 93.2% | Arousal: 95.5% | Arousal: 87% | N/A |
    | | | Leg Movement: 91.7% | Leg Movement: 94.5% | Leg Movement: 89% | N/A |

    Conclusion on Acceptance Criteria (Endpoint 3): EnsoSleep PA, NA, and OA point estimates were observed to be greater than, or within 5% of, the reference device performance for all event types, with statistically significant differences (greater performance) in a majority of cases. All met or exceeded objective performance goals.
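    Unlike epoch-based sleep staging, events such as apneas, arousals, and leg movements are time intervals, so agreement first requires pairing scored events. The summary does not state the pairing rule; a common convention (an assumption here, not EnsoSleep's documented method) counts a reference event as detected when any algorithm event overlaps it in time:

```python
def overlap_positive_agreement(reference, detected):
    """Fraction of reference events matched by at least one detected event.

    reference, detected: lists of (start_seconds, end_seconds) intervals.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    matched = sum(any(overlaps(r, d) for d in detected) for r in reference)
    return matched / len(reference)

# Hypothetical apnea events (seconds from recording start)
ref_events = [(10, 25), (100, 130), (400, 415)]
alg_events = [(12, 24), (320, 340), (402, 416)]
pa = overlap_positive_agreement(ref_events, alg_events)
```

    Here two of the three reference events overlap a detection, so PA = 2/3. Negative agreement additionally needs a convention for non-event time (e.g. unmatched epochs), which varies between studies.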

    Endpoint 4: Total Sleep Time (TST) and Respiratory Rate (RR)

    | Metric | Acceptance Criteria (Conceptual) | EnsoSleep PPG-TST (K210034) RR Sample (Reported) | EnsoSleep EEG-TST (K210034) RR Sample (Reported) | EnsoSleep EEG-TST (K210034) Adult Sample (Reported) | EnsoSleep EEG-TST (K210034) Pediatric Sample (Reported) |
    | --- | --- | --- | --- | --- | --- |
    | Deming Regression Slope ($β_1$) | Near unity (0.90 | | | | |
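    The Endpoint 4 criteria rely on Deming regression (which, unlike ordinary least squares, allows measurement error in both the reference and device TST) and on Bland-Altman mean difference and limits of agreement. The summary gives no formulas; the standard textbook forms, assuming an error-variance ratio `delta`, can be sketched as follows (illustrative helper functions and data, not the submission's analysis code):

```python
import math

def deming(x, y, delta=1.0):
    """Deming regression slope and intercept (delta = error-variance ratio)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

def bland_altman(x, y):
    """Mean difference (y - x) and 95% limits of agreement."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = (sum((d - mean) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return mean, mean - 1.96 * sd, mean + 1.96 * sd

manual_tst = [312.0, 355.5, 401.0, 447.5]   # hypothetical minutes of TST
device_tst = [318.0, 350.0, 410.0, 452.0]
slope, intercept = deming(manual_tst, device_tst)
mean_diff, loa_low, loa_high = bland_altman(manual_tst, device_tst)
```

    A slope near 1 with an intercept near 0 and narrow limits of agreement is what "near unity" acceptance criteria of this kind are checking for.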

    K Number
    K202142
    Device Name
    Sleepware G3
    Manufacturer
    Date Cleared
    2020-10-29

    (90 days)

    Product Code
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Why did this record match?
    Product Code :

    OLZ

    Intended Use

    Sleepware G3 is a software application used for analysis (automatic and manual scoring), display, retrieval, summarization, report generation, and networking of data received from monitoring devices used to categorize sleep related events that help aid in the diagnosis of sleep-related disorders. It is indicated for use with adults (18 and older) and infant patients (one year old or less) in a clinical environment by or on the order of a physician.

    The optional Somnolyzer scoring algorithms are for use with adults (18 and older) to generate an output that is ready for review and interpretation by a physician. Cardio-Respiratory Sleep Staging (CReSS) is an additional capability of Somnolyzer that uses standard Home Sleep Apnea Test (HSAT) signals (in the absence of EEG signals) to infer sleep stage.

    Device Description

    Sleepware G3 software is a polysomnography scoring application, used by trained clinical professionals, for managing data from sleep diagnostic devices using a personal computer. Sleepware G3 is able to configure sleep diagnostic device parameters, transfer data stored in sleep diagnostic device memory to the personal host computer, process and auto-score data to display graphical and statistical analyses, provide aid to clinical professionals for evaluating the physiological data waveforms relevant to sleep monitoring, and create unique patient reports.

    Sleepware G3 includes an optional Somnolyzer plug-in. The auto-scoring algorithms of the Somnolyzer Inside software can be used in addition to, or in the place of, the auto-scoring algorithms that are included in Sleepware G3.

    Sleepware G3 remains unchanged in function and fundamental scientific technology from the Sleepware G3 that was cleared under K142988.

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study proving the device meets those criteria, based on the provided text:

    Acceptance Criteria and Reported Device Performance

    The acceptance criteria for the Somnolyzer scoring algorithms are based on demonstrating non-inferiority to manual expert scoring. The reported device performance indicates that all primary and secondary endpoints were met.

    | Acceptance Criterion (Non-Inferiority to Manual Expert Scoring) | Reported Device Performance |
    | --- | --- |
    | Full PSG Acquisition: | |
    | Sleep stages according to AASM criteria | Non-inferior (all primary and secondary endpoints met) |
    | Arousals during sleep according to AASM criteria | Non-inferior (all primary and secondary endpoints met) |
    | Apneas and hypopneas during sleep according to AASM criteria | Non-inferior (all primary and secondary endpoints met) |
    | Periodic limb movements during sleep according to AASM criteria | Non-inferior (all primary and secondary endpoints met) |
    | HST Acquisition: | |
    | Apneas and hypopneas according to AASM criteria | Non-inferior (all primary and secondary endpoints met) |
    | Cardio-Respiratory Sleep Staging (CReSS): | |
    | REI based on cardio-respiratory feature-based sleep time is superior to REI based on monitoring time (for HST acquisition) | Evidence provided that REI calculated using CReSS is a more accurate estimate of AHI than REI calculated using total recording time. Accuracy further improved with additional signals: mean difference between REI and AHI reduced from −6.6 events/hour (95% CI −7.51 to −5.71) to −1.76 events/hour (95% CI −2.27 to −1.24). |
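    The CReSS claim rests on simple arithmetic: the respiratory event index (REI) divides event counts by total monitoring time, whereas AHI divides by total sleep time, so whenever the patient is awake for part of the recording, REI underestimates AHI. Substituting an inferred sleep time shrinks the denominator toward the true sleep time. A toy illustration with invented numbers:

```python
# Hypothetical HSAT: 48 respiratory events over 8 h of recording,
# of which the patient actually slept 6 h.
events = 48
recording_hours = 8.0
inferred_sleep_hours = 6.0  # e.g. from cardio-respiratory sleep staging

rei_over_recording = events / recording_hours          # underestimates AHI
rei_over_inferred_sleep = events / inferred_sleep_hours  # closer to true AHI
```

    Here the recording-time REI is 6.0 events/hour versus 8.0 events/hour over inferred sleep; this is the mechanism behind the reported shrinkage of the REI-AHI mean difference from −6.6 to −1.76 events/hour.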

    Detailed Study Information:

    1. Sample sizes used for the test set and the data provenance:

      • Test Set Sample Size: A total of 1,204 polysomnography (PSG) and home sleep apnea test (HSAT) files were used in the five clinical studies.
      • Data Provenance: The document does not explicitly state the country of origin. The studies are described as using a "large, diverse sample... collected via a number of different platforms," suggesting diverse sources but not specifying geographical location. The studies were likely retrospective, as they involved validating algorithms against existing manual scoring, but this is not explicitly stated.
    2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

      • Number of Experts: Not explicitly stated how many individual experts were used across all studies. However, the non-inferiority margin for comparisons was "set at the lower-margin of the agreement observed across expert technologists." This implies multiple experts were involved in defining the range of agreement for ground truth.
      • Qualifications of Experts: The experts are referred to as Registered Polysomnographic Technologists (RPSGT). This indicates their professional qualification in sleep study scoring.
    3. Adjudication method for the test set:

      • The document implies a form of consensus or agreement among experts was utilized to set the non-inferiority margin, but it does not explicitly describe a specific adjudication method like 2+1 or 3+1 for individual cases within the test set. The focus is on comparing the algorithm's performance against the established range of agreement among experts.
    4. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

      • No, an MRMC comparative effectiveness study was not the primary focus described. The study design primarily involved a standalone evaluation of the AI algorithm (Somnolyzer) against human expert scoring, demonstrating its non-inferiority.
      • The document states that Somnolyzer's output is "ready for review and interpretation by a physician," implying it assists human readers by providing a pre-scored output. However, it does not quantify the improvement in human reader performance with AI assistance versus without AI assistance.
    5. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

      • Yes, a standalone evaluation was performed. The clinical studies "validated the Somnolyzer and CReSS algorithms against manual scoring." The non-inferiority claims ("Somnolyzer scoring... is non-inferior to manual expert scoring") directly refer to the algorithm's performance without human intervention in the scoring process.
    6. The type of ground truth used:

      • The ground truth was expert consensus scoring. The document states that the algorithms were validated "against manual scoring" by "expert technologists" (RPSGTs). The non-inferiority margin was based on "the agreement observed across expert technologists."
    7. The sample size for the training set:

      • The document does not provide information on the training set sample size. The provided text focuses solely on the clinical performance testing for validation.
    8. How the ground truth for the training set was established:

      • As the training set information is not provided, the method for establishing its ground truth is also not described in the document.

    K Number
    K192469
    Device Name
    Nox Sleep System
    Manufacturer
    Date Cleared
    2019-11-13

    (65 days)

    Product Code
    Regulation Number
    882.1400
    Reference & Predicate Devices
    Why did this record match?
    Product Code :

    OLZ

    Intended Use

    The Nox Sleep System is used as an aid in the diagnosis of different sleep disorders and for the assessment of sleep.

    The Nox Sleep System is used to measure, record, display, organize, analyze, summarize, and retrieve physiological parameters during sleep and wake.

    The Nox Sleep System allows the user to decide on the complexity of the study by varying the number and types of physiological signals measured.

    The Nox Sleep System allows for generation of user/pre-defined reports based on subject's data.

    The user of the Nox Sleep System are medical professionals who have received training in the areas of hospital/clinical procedures, physiological monitoring of human subjects, or sleep disorder investigation.

    The intended environments are hospitals, institutions, sleep clinics, or other test environments, including patient's home.

    Device Description

    The Nox Sleep System is intended for patients undergoing physiological measurements, for the assessment of sleep quality and the screening for sleep disorders.

    The Nox Sleep System does not provide any alarms and is not intended to be used for continuous monitoring where failure to operate can cause injuries or death of the patient.

    The basic Nox Sleep System consists of two recording/acquisition devices (Nox A1 Recorder and Nox C1 Access Point), PC software (Noxturnal PSG), an Android application (Noxturnal App) running on a mobile platform, along with sensors and accessories. The system supports full polysomnography (PSG) studies in both ambulatory and online/attended setups, as well as simpler sleep study setups recording only a few channels. Ambulatory sleep studies may take place in the clinic or in the home environment, but online/attended sleep studies are conducted only in the clinical environment.

    The Nox A1 Recorder is a small battery-operated recording unit that is worn by the patient during the study. It records signals from patient-applied sensors that connect to the unit and also supports recording of signals from auxiliary devices over Bluetooth. The Nox A1 Recorder communicates over Bluetooth with the Noxturnal App during ambulatory setup and with the Nox C1 Access Point during online setup. The recorder is intended to be worn over clothing.

    New accessories and sensors in this submission are the Nox A1 EEG 5 Lead Gold Electrode Cable and the Nox A1 EEG Head Cable, which are used for recording EEG/EOG. These components are in direct contact with the patient.

    The Nox C1 Access Point is a separate mains powered unit located remotely from the patient that allows for recording of signals from auxiliary devices. It supports communication over LAN/Ethernet to the Noxturnal PSG, and communication with the Nox A1 Recorder and Noxturnal App over Bluetooth. The Nox C1 Access Point is only used for online study setup and is thus not intended to be used in the home environment.

    The Noxturnal App is used as a mobile interface to the Nox A1 Recorder and Nox C1 Access Point. The communications are via Bluetooth link. The app is normally used in the beginning of a sleep study, for basic tasks such as device configuration, starting a recording, checking the signal quality of signals being recorded and marking events during bio calibration.

    The Noxturnal PSG is used for configuration of the Nox recording/acquisition devices, to download a study from an ambulatory recording, or to collect an online study. The software supports viewing, retrieving, storing, and processing of the recorded/collected data, manual and automatic analysis, and reporting on the results of the recorded studies. The purpose of the automatic scoring function in Noxturnal PSG is to assist the trained physician in the diagnosis of a patient; it is not intended to provide the trained physician with a diagnostic result. The types of events scored by the automatic analysis in Noxturnal PSG include: Sleep Stages (Wake, N1, N2, N3, REM), Apneas, Hypopneas, Apnea Classification (Obstructive, Mixed, and Central Apneas), Limb Movements, Periodic Limb Movements, SpO2 Desaturation Events, and potential Bruxism-Related Events.

    The result of the automatic analysis/scoring must always be manually verified by the trained physician prior to diagnosis.

    AI/ML Overview

    The Nox Sleep System is designed to aid in the diagnosis of sleep disorders and assess sleep quality by measuring, recording, displaying, organizing, analyzing, summarizing, and retrieving physiological parameters during sleep and wake. The system includes automatic scoring functionalities for various sleep events, which are intended to assist trained medical professionals in diagnosis.

    Here's an analysis of the acceptance criteria and the study proving the device meets them:

    1. A table of acceptance criteria and the reported device performance

    The document presents separate sections for the performance of different automatic scoring algorithms rather than a single consolidated table. However, the information can be extracted and presented as follows:

    | Automatic Scoring Algorithm | Acceptance Criteria (Safety Endpoint/Justification) | Reported Device Performance |
    | --- | --- | --- |
    | Bruxism Analysis | Detect at least 90% of oromandibular movements considered by a human expert to be bruxism-related events, with 95% confidence (sensitivity). | Sensitivity: 95.7% (95% CI 93.2%–97.4%) |
    | | | Specificity: 61.0% (95% CI 58.9%–63.0%) |
    | | | PPV: 34.6% (95% CI 32.0%–37.3%) |
    | | | NPV: 98.5% (95% CI 97.7%–99.1%) |
    | PLM Analysis | Intraclass correlation (ICC) of 0.61 or greater and a bias unlikely to impact a diagnosis for the Periodic Limb Movement Index. | ICC for Periodic Limb Movement Index: 0.87 |
    | Respiratory Flow Analysis (AHI) | Not classifying patients with an AHI below 5 as having an AHI ≥ 15 (95% confidence), and not classifying patients with an AHI ≥ 15 as having an AHI below 5 (95% confidence); Cohen's kappa also reported. | Cohen's Kappa for AHI (cannula): 0.78 |
    | | | Cohen's Kappa for AHI (RIP flow): 0.62 (95% CI 0.59–0.66) |
    | | | Cohen's Kappa for AHI (cRIP flow): 0.62 (95% CI 0.59–0.66) |
    | Respiratory Flow Analysis (ODI) | Implicitly similar to AHI, with Cohen's kappa reported. | Cohen's Kappa for ODI: 0.87 |
    | Apnea Classification | ICC comparable to what has been reported in the scientific literature for the Central Apnea Index (0.46). | ICC for Central Apnea Index: 0.91 |
    | | | Cohen's Kappa for Central Apnea Index: 0.89 |
    | Sleep Staging Analysis | Average accuracy of at least 60% when scoring wake epochs, to ensure total sleep time measurement with 10% error or less (assuming 80% sleep efficiency). | Cohen's Kappa: |
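    Several of these Nox endpoints are expressed as Cohen's kappa, which corrects raw epoch agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from each scorer's marginal label frequencies. A small self-contained sketch with toy labels (not study data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    pe = sum(ca[l] * cb[l] for l in labels) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

rater1 = ['W', 'W', 'N2', 'N2', 'N3', 'REM', 'W', 'N2']
rater2 = ['W', 'N1', 'N2', 'N2', 'N3', 'REM', 'W', 'N2']
kappa = cohens_kappa(rater1, rater2)
```

    In this toy example the two scorings agree on 7 of 8 epochs (p_o = 0.875), chance predicts p_e = 17/64, and κ = 39/47 ≈ 0.83; values above roughly 0.6, like those reported here, are conventionally read as substantial agreement.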
