The MICHELE Sleep Scoring System is a computer program (software) intended for use as an aid for the diagnosis of sleep and respiratory related sleep disorders.
The MICHELE Sleep Scoring System is intended to be used for the analysis (automatic scoring and manual rescoring), display, redisplay (retrieval), summarization, report generation, and networking of digital data collected by monitoring devices typically used to evaluate sleep and respiratory related sleep disorders.
The device is to be used under the supervision of a physician. Use is restricted to files obtained from adult patients.
The MICHELE Sleep Scoring System (MICHELE) is a software system that scans physiological data obtained during level 1 sleep studies, referred to as polysomnography (PSG) records, and applies a variety of analytical approaches to identify the occurrence of certain events that relate to the presence and type of sleep state, breathing abnormalities and limb movements. The system scores Sleep Stages, Arousals, Respiratory Events and Leg Movements. At the end of the analysis the system generates a PSG Report that includes tables and graphs typical of those generated following manual scoring of PSG records by certified technologists. The results of the automated scoring may be displayed using a PSG Scoring Viewer application, which allows manual editing of the results and generation of a revised PSG Report.
The device does not analyze data that are different from those analyzed by human scorers. It also neither interprets the results nor suggests a diagnosis.
Here's a detailed breakdown of the MICHELE Sleep Scoring System's acceptance criteria and the study that proves its performance, based on the provided text:
MICHELE Sleep Scoring System Performance Study Analysis
The MICHELE Sleep Scoring System is software intended to aid in the diagnosis of sleep and respiratory-related sleep disorders by automatically scoring polysomnography (PSG) records.
1. Acceptance Criteria and Reported Device Performance
The acceptance criteria for the MICHELE system were established by comparing its performance against a consensus of human technologists and, in some cases, against predicate devices. The performance was evaluated using epoch-by-epoch agreement and agreement for clinically relevant data.
Table 1: Acceptance Criteria and Reported Device Performance (Epoch-by-Epoch Agreement)
| Scoring Function | Metric | Acceptance Criteria (Implied: Better than Predicate Alice 5) | MICHELE Performance (vs. 2/3 Consensus) | Predicate Alice 5 Performance (vs. 2/3 Consensus) |
|---|---|---|---|---|
| All Sleep Stages | Overall % Agreement | > 30.5% | 82.6% | 30.5% |
| All Sleep Stages | Kappa | > 5.9% | 76.5% | 5.9% |
| Awake | APPA | > 5.4% | 89.9% | 5.4% |
| Awake | ANPA | > 85.1% | 96.4% | 85.1% |
| N1 | APPA | > 2.3% | 50.4% | 2.3% |
| N1 | ANPA | > 94.6% | 94.7% | 94.6% |
| N2 | APPA | > 42.1% | 82.9% | 42.1% |
| N2 | ANPA | > 51.2% | 89.6% | 51.2% |
| N3 | APPA | > 34.7% | 82.9% | 34.7% |
| N3 | ANPA | > 73.9% | 97.5% | 73.9% |
| REM | APPA | > 7.5% | 89.8% | 7.5% |
| REM | ANPA | > 93.1% | 98.5% | 93.1% |
| Arousals | Overall % Agreement | > 57.9% | 89.9% | 57.9% |
| Arousals | Kappa | > 10.0% | 54.2% | 10.0% |
| Arousals | APPA (Yes) | > 28.1% | 60.0% | 28.1% |
| Arousals | ANPA (None) | > 70.3% | 94.1% | 70.3% |
| Periodic Leg Movements (PLMs) | Overall % Agreement | > 88.3% | 95.7% | 88.3% |
| Periodic Leg Movements (PLMs) | Kappa | > 38.2% | 68.7% | 38.2% |
| Periodic Leg Movements (PLMs) | APPA (Yes) | > 44.7% | 78.4% | 44.7% |
| Periodic Leg Movements (PLMs) | ANPA (None) | > 93.4% | 97.6% | 93.4% |
| Respiratory Events (Criteria A) | Overall % Agreement | > 78.0% | 94.0% | 78.0% |
| Respiratory Events (Criteria A) | Kappa | > 24.7% | 74.2% | 24.7% |
| Respiratory Events (Criteria B) | Overall % Agreement | > 75.9% | 93.0% | 75.9% |
| Respiratory Events (Criteria B) | Kappa | > 23.1% | 70.4% | 23.1% |
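The per-row metrics in the table above can be reproduced from paired epoch labels. The submission does not give formulas for APPA/ANPA; the sketch below assumes the standard "average positive/negative percent agreement" definitions (2·TP / (2·TP + FP + FN) and its negative counterpart) alongside Cohen's kappa, and the function name `epoch_agreement` is illustrative:

```python
from collections import Counter

def epoch_agreement(auto, ref, positive):
    """Epoch-by-epoch agreement between automated and reference scoring.

    auto, ref: equal-length sequences of per-epoch labels.
    positive:  the label treated as "positive" (e.g. "REM").
    Returns (overall agreement, Cohen's kappa, APPA, ANPA) as fractions.
    """
    n = len(auto)
    # Overall fraction of epochs where both scorings match.
    agree = sum(a == r for a, r in zip(auto, ref)) / n
    # Cohen's kappa: agreement corrected for chance (marginal label rates).
    pa, pr = Counter(auto), Counter(ref)
    pe = sum((pa[l] / n) * (pr[l] / n) for l in set(pa) | set(pr))
    kappa = (agree - pe) / (1 - pe) if pe < 1 else 1.0
    # Confusion counts for the chosen positive label.
    tp = sum(a == positive and r == positive for a, r in zip(auto, ref))
    tn = sum(a != positive and r != positive for a, r in zip(auto, ref))
    fp = sum(a == positive and r != positive for a, r in zip(auto, ref))
    fn = sum(a != positive and r == positive for a, r in zip(auto, ref))
    appa = 2 * tp / (2 * tp + fp + fn)   # average positive percent agreement
    anpa = 2 * tn / (2 * tn + fp + fn)   # average negative percent agreement
    return agree, kappa, appa, anpa
```

Multiplying the returned fractions by 100 yields percentages in the same form as the table.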
Table 2: Acceptance Criteria and Reported Device Performance (Clinically Relevant Data - ICC)
| Variable | Acceptance Criteria (Implied: Not significantly different from human variability & > Predicate Alice 5) | MICHELE ICC vs. Ave. Techs. | Predicate Alice 5 ICC vs. Ave. Techs. |
|---|---|---|---|
| Total sleep time (min) | > -0.226 | 0.983 | -0.226 |
| Sleep efficiency (%) | > -0.243 | 0.985 | -0.243 |
| Sleep-onset latency (min) | > -0.118 | 0.950 | -0.118 |
| REM-onset latency (min) | N/A (not reported for Alice 5) | 0.923 | N/A |
| Stage wake (min) | > -0.229 | 0.986 | -0.229 |
| Stage 1 (min) | > -0.219 | 0.876 | -0.219 |
| Stage 2 (min) | > -0.288 | 0.922 | -0.288 |
| Stage 1+2 (min) | > -0.192 | 0.923 | -0.192 |
| Stage delta (min) | > -0.132 | 0.869 | -0.132 |
| Stage REM (min) | > -0.282 | 0.951 | -0.282 |
| Arousal Index (hr⁻¹) | > -0.251 | 0.566 | -0.251 |
| PLM Index (hr⁻¹) | > 0.589 | 0.958 | 0.589 |
| AHI A (hr⁻¹) | > 0.369 | 0.982 | 0.369 |
| AHI B (hr⁻¹) | > 0.384 | 0.971 | 0.384 |
| Average ICC | - | 0.918 | - |
Note: The acceptance criteria are implicitly defined by demonstrating superior or comparable performance to the predicate devices and strong agreement with human expert consensus.
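The submission does not state which ICC model was used for the clinically relevant variables. As a minimal sketch, the following assumes the two-way random, single-measure form commonly denoted ICC(2,1), computed from a subjects-by-raters table of summary values (e.g., total sleep time per study, per scorer):

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: n x k list of lists, one row per subject (PSG study),
    one column per rater (e.g., MICHELE and each technologist).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA decomposition of total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)                 # between-subjects mean square
    ms_c = ss_cols / (k - 1)                 # between-raters mean square
    ms_e = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

Because ICC(2,1) treats rater bias as part of the disagreement, a systematic offset between scorers lowers the coefficient even when their rankings agree perfectly.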
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Test Set: 30 full night studies.
- Data Provenance: Retrospective, from the sleep laboratory of a tertiary care facility (Foothills Hospital, Calgary, Canada).
- Patient Characteristics: Included 19 patients with sleep apnea (15 moderate to severe), 9 with PLMs, 2 with severe sleep fragmentation, and 7 with normal sleep. Total Sleep Time ranged from 2.6 to 7.8 hours, sleep efficiency from 37% to 99%, and arousal index from 9 to 97 hr⁻¹. A total of 24,967 thirty-second epochs were scored.
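The reported epoch count can be sanity-checked against the number of studies by simple arithmetic (values taken from the figures above):

```python
# Sanity check on the reported test-set size.
EPOCH_SECONDS = 30
epochs = 24_967       # total scored epochs reported above
studies = 30          # full-night studies in the test set

total_hours = epochs * EPOCH_SECONDS / 3600   # total scored recording time
hours_per_study = total_hours / studies       # average per full-night study
print(round(total_hours, 1), round(hours_per_study, 1))  # prints: 208.1 6.9
```

An average of roughly 6.9 hours per study is consistent with the reported Total Sleep Time range of 2.6 to 7.8 hours plus wake time within each recording.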
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Three technologists.
- Qualifications of Experts: Each technologist was Board certified and had at least 15 years of hands-on experience in scoring polysomnograms.
4. Adjudication Method for the Test Set
- Adjudication Method: 2+1 (consensus of two out of three scorers). This means that for a ground truth label to be established, at least two of the three technologists had to agree on the scoring of an epoch or event.
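The 2-of-3 consensus rule can be sketched as a per-epoch majority vote. The function name and the `None` convention for epochs without a majority are illustrative, not taken from the submission:

```python
from collections import Counter

def consensus_2of3(s1, s2, s3):
    """Per-epoch 2/3 consensus ground truth.

    s1, s2, s3: equal-length sequences of epoch labels from three scorers.
    An epoch gets a consensus label only when at least two scorers agree;
    otherwise it is marked None (no majority).
    """
    out = []
    for labels in zip(s1, s2, s3):
        label, count = Counter(labels).most_common(1)[0]
        out.append(label if count >= 2 else None)
    return out
```

Epochs where all three scorers disagree have no 2/3 consensus and would need to be excluded or adjudicated separately before computing epoch-by-epoch agreement.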
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? A direct MRMC comparative effectiveness study "with AI vs. without AI assistance" to measure human reader improvement was not explicitly reported in the provided text. The study focused on evaluating the standalone performance of the MICHELE system against human consensus and comparing it to predicate devices.
- However, the study implicitly compares the effectiveness of the automated system to unassisted human scoring, and then highlights that MICHELE's performance matches or exceeds human inter-reader variability and the predicate devices. For instance, the average ICC for MICHELE vs. the average of the three technologists (0.918) was only marginally below that of technologist S1 (0.954) and not significantly different from S2 or S3 (each technologist was compared against the average of all three, including themselves). This suggests MICHELE performs at a level comparable to individual human experts.
6. Standalone (Algorithm-Only) Performance
- Was a standalone study done? Yes, the entire performance evaluation described is for the standalone (algorithm-only) performance of the MICHELE Sleep Scoring System. The system processes PSG records and applies analytical approaches to identify and score events, then generates a report. While the results may be displayed using a PSG Scoring Viewer application, which allows manual editing of the results, the performance metrics reported in Tables 6-1, 6-2, and 6-3 are for the automated scoring before any potential manual rescoring.
7. Type of Ground Truth Used
- Type of Ground Truth: Expert consensus. Specifically, a 2/3 consensus of three Board-certified technologists was used for epoch-by-epoch agreement and for calculating summary variables in the clinical report. The scoring was done according to the standard guidelines of the American Academy of Sleep Medicine (AASM) described in The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, American Academy of Sleep Medicine, Westchester, IL, 2007.
8. Sample Size for the Training Set
- The document does not explicitly state the sample size for a separate training set. The descriptions of the study focus on the evaluation of the device performance using the 30 full night studies. It's possible that these 30 studies or a subset were used for internal development/tuning, or that the system was developed using other proprietary data not detailed in this 510(k) summary.
9. How the Ground Truth for the Training Set Was Established
- As the training set size and its specific use are not detailed, the method for establishing ground truth for a training set (if distinct from the test set) is also not provided in this document. It is implied that the development and validation adhere to AASM guidelines, similar to how the test set ground truth was established by certified technologists.
§ 868.2375 Breathing frequency monitor.
(a) Identification. A breathing (ventilatory) frequency monitor is a device intended to measure or monitor a patient's respiratory rate. The device may provide an audible or visible alarm when the respiratory rate, averaged over time, is outside operator settable alarm limits. This device does not include the apnea monitor classified in § 868.2377.
(b) Classification. Class II (performance standards).