Search Results

The Hypertension Notification Feature (HTNF) is a software-only mobile medical application that analyzes photoplethysmography (PPG) data opportunistically collected by Apple Watch to identify patterns that are suggestive of hypertension and provides a notification to the user.

The feature is intended for over-the-counter (OTC) use by adults age 22 and over who have not been previously diagnosed with hypertension. It is not intended to replace traditional methods of diagnosis, to monitor hypertension treatment effect, or to be used as a method of blood pressure surveillance. It is not intended for use during pregnancy. The absence of a notification does not indicate the absence of hypertension.

Device Description

The Hypertension Notification Feature (HTNF) is an over-the-counter mobile medical application that is intended to analyze data collected from the PPG sensor of the Apple Watch (a general purpose computing platform), over multiple days to surface a notification to users who may have hypertension. The feature is intended for adults who have not been previously diagnosed with hypertension. The feature is not intended for use during pregnancy. The feature is not intended to replace traditional methods of diagnosis, to monitor hypertension treatment effect, or to be used as a method of blood pressure surveillance.

Absence of a notification does not indicate the absence of hypertension. HTNF cannot identify every instance of hypertension. In addition, HTNF will not surface a notification if insufficient data is collected.

HTNF comprises the following features:
• A software feature on the Apple Watch ("Software Feature on Watch"), and
• A pair of software features on the iOS device ("Software Feature on iPhone" and "Software Feature on iPad")

On the Apple Watch, HTNF uses PPG data and qualification information from the watch platform. The Software Feature on watch incorporates a machine-learning model that gives each qualified PPG signal a score associated with risk of hypertension.

On the iPhone, HTNF incorporates an algorithm that aggregates qualified hypertension risk scores and identifies patterns suggestive of hypertension. If hypertension patterns are identified, the feature surfaces a notification to users that they may have hypertension. The feature includes a user interface (UI) framework to enable user on-boarding and display educational materials and hypertension notification history in the Hypertension Notification room in the Health app.

On the iPad, HTNF provides a data viewing framework to display hypertension notification history in the Hypertension Notification room in Health app.

AI/ML Overview

Here's a summary of the acceptance criteria and the study that proves the Apple Hypertension Notification Feature (HTNF) meets them, based on the provided FDA 510(k) clearance letter:

Apple Hypertension Notification Feature (HTNF) - Acceptance Criteria and Study Summary

1. Table of Acceptance Criteria and Reported Device Performance

Metric	Acceptance Criteria (Explicitly Stated Goals)	Reported Device Performance (Clinical Validation)
Overall Sensitivity	"met all pre-determined primary endpoints" (implies a specific target was met, but the value itself is not given as the criteria here)	41.2% (95% CI [37.2, 45.3])
Overall Specificity	"met all pre-determined primary endpoints" (implies a specific target was met, but the value itself is not given as the criteria here)	92.3% (95% CI [90.6, 93.7])
Hypertension Definition	Average systolic blood pressure ≥ 130 mmHg OR diastolic blood pressure ≥ 80 mmHg (America Heart Association guidelines)	Used as the ground truth for hypertension status
Sensitivity for Stage 2 HTN	Not explicitly stated as an acceptance criterion/primary endpoint, but analyzed	53.7% (95% CI [47.7, 59.7])
Specificity for Normotensive	Not explicitly stated as an acceptance criterion/primary endpoint, but analyzed	95.3% (95% CI [93.7, 96.5])
Long-term Specificity (Non-Hypertensives)	Not explicitly stated as an acceptance criterion/primary endpoint, but observed	86.4% (95% CI [80.2%, 92.5%]) after 2 years
Long-term Specificity (Normotensives)	Not explicitly stated as an acceptance criterion/primary endpoint, but observed	92.5% (95% CI [86.8%, 98.3%]) after 2 years

Note: The document states that the feature "met all pre-determined primary endpoints" for overall sensitivity and specificity, but the specific numerical targets for these endpoints are not directly listed as "acceptance criteria" in the provided text. The reported performance values are the results from the clinical study that met these implicit criteria.

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size:
- Clinical Validation Study: 2,229 enrolled subjects, with 1,863 subjects providing at least 15 days of usable data for the primary endpoint analysis.
- Longitudinal Performance Evaluation: 187 non-hypertensive subjects.
Data Provenance: The document does not explicitly state the country of origin for the data. However, it indicates subjects were "enrolled from diverse demographic groups" and "representative of the intended use population." The study described is a prospective clinical validation study where subjects wore an Apple Watch and measured blood pressure.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

The document does not specify the use of "experts" to establish the ground truth for the test set.

Ground Truth Method: Hypertension status was defined based on objective measurements from an FDA-cleared home blood pressure monitor. Specifically, "Hypertension is established as average systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 80 mmHg by America Heart Association." Therefore, expert consensus was not the primary method for ground truth determination in the principal clinical study.

4. Adjudication Method for the Test Set

Not applicable, as the ground truth was based on objective blood pressure monitor readings against established guidelines, not expert review requiring adjudication.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and Effect Size

No, an MRMC comparative effectiveness study was not conducted. The HTNF is an "algorithm only" device designed to provide notifications to lay users, not an assistive tool for human readers in a diagnostic setting.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

Yes, the primary clinical validation study assessed the standalone performance of the HTNF algorithm. The device "analyzes photoplethysmography (PPG) data... to identify patterns that are suggestive of hypertension and provides a notification to the user," without human intervention in the interpretation of the PPG data for notification generation.

7. The Type of Ground Truth Used

The ground truth used for the clinical validation study was objective outcome data (blood pressure measurements). Specifically, "Hypertension is established as average systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 80 mmHg by America Heart Association" using an FDA-cleared home blood pressure monitor as the reference.

8. The Sample Size for the Training Set

The document describes the algorithm development dataset as follows:

Self-supervised learning for deep-learning (DL) model: "large-scale unlabeled data... included Apple Watch sensor data collected over 86,000 participants."
Linear model training for classification: "included Apple Watch sensor data and home blood pressure reference measurements collected over 9,800 participants."

These datasets were pooled and split into Training, Train Dev, Test Dev, and Test sets for model development.

9. How the Ground Truth for the Training Set Was Established

For the linear model that provides specific hypertension classifications (hypertensive vs. non-hypertensive), the ground truth for the training set was established using home blood pressure reference measurements. For the self-supervised deep learning model, it used "large-scale unlabeled data" where ground truth for hypertension status wasn't required for pre-training.

Ask a Question

Ask a specific question about this device

K Number

K250143

Device Name

Digital Prism Correction Feature (DPCF)

Manufacturer

Apple Inc.

Date Cleared

2025-06-23

(157 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K240585

Predicate For

N/A

Intended Use

The Digital Prism Correction Feature (DPCF) is software that is intended to provide digital image adjustments in Apple Vision Pro in accordance with a user's prism prescription.

DPCF is available over-the-counter (OTC) for users with prism in their eyeglass prescription. When a prescription also includes other parts (e.g., sphere, cylinder, ADD), which can be fulfilled by optical inserts, the DPCF fulfills the prism part of the prescription while using Apple Vision Pro.

DPCF supports prism prescriptions up to 7.75 Prism Diopters (PD) in the horizontal and/or vertical dimension (i.e., base-up, base-down, base-in, base-out), per eye.

Device Description

The Digital Prism Correction Feature (DPCF) is intended to provide a high quality visual experience in Apple Vision Pro spatial computing applications for users with a prism prescription. Specifically, the DPCF is software that is intended to provide digital image adjustments in Apple Vision Pro in accordance with a user's prism prescription, in the horizontal and/or vertical dimensions. DPCF fulfills the prism part of an eyeglass prescription. When an eyeglass prescription also includes other parts (e.g., sphere, cylinder, ADD), DPCF fulfills the prism part of the prescription, while prescription optical inserts fulfill the other parts of the prescription.

The DPCF achieves its intended use by converting a user's prism prescription into digital image adjustment parameters that are utilized by the spatial computing image system to automatically provide digital image adjustments in horizontal and/or vertical dimensions in accordance with a user's prism prescription. At this time, DPCF supports prism prescriptions up to 7.75 Prism Diopters (PD) in the horizontal and/or vertical dimensions (i.e, base-up, base-down, base-in, base-out), per eye.

The DPCF is available over-the-counter (OTC).

AI/ML Overview

The provided FDA 510(k) clearance letter and summary for the Apple Digital Prism Correction Feature (DPCF) primarily discuss its substantial equivalence to a predicate device and its move from prescription to over-the-counter (OTC) use. It does not contain an in-depth study proving the device meets acceptance criteria in the typical sense of a clinical trial for diagnostic AI.

However, based on the information provided, we can extract details about the acceptance criteria and the type of study conducted to support the device's performance, particularly focusing on the "Summary of Non-Clinical Testing."

Here's an analysis of the requested information:

Acceptance Criteria and Reported Device Performance

Acceptance Criteria (Target)	Reported Device Performance
Meets standardized prism tolerance requirements (ISO 8980-1:2017)	"demonstrated DPCF meets prism tolerance requirements specified in ISO 8980-1:2017"
Acceptable use-related risks for OTC use	"demonstrate that the use-related risks are acceptable"

Study Details

2. Sample size used for the test set and the data provenance:

Test Set Sample Size: For the human factors and usability study, the sample size was 30 subjects.
Data Provenance: The document does not explicitly state the country of origin. The study was conducted as non-clinical testing to assess "self-selection and use-related risks associated with use of the DPCF as an OTC device." The nature of "bench validation testing" mentioned for prism tolerance requirements suggests it's a controlled engineering test rather than a patient data-driven study.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

The document does not specify the number or qualifications of experts involved in establishing ground truth for the human factors study. For “bench validation testing” which measured prism tolerance, the ground truth would be based on metrological standards and precision instruments, rather than expert judgment.

4. Adjudication method for the test set:

The document does not describe any adjudication method. The human factors study assessed self-selection and use-related risks, which would typically involve observing user interactions and collecting feedback, rather than a diagnostic accuracy adjudication process. The bench testing involves direct measurement against a standard.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

No, an MRMC comparative effectiveness study was not explicitly mentioned or performed. The DPCF is a "Digital Prism Correction Feature" designed to "provide digital image adjustments" based on a user's existing prism prescription. It's not a diagnostic AI device that assists human readers in interpreting medical images or data. Therefore, the concept of "human readers improving with AI vs without AI assistance" does not directly apply to the described function of this device, which seems to be a corrective rather than diagnostic tool.

6. If a standalone (i.e. algorithm only without human-in-the loop performance) was done:

Yes, in essence. The "bench validation testing that demonstrated DPCF meets prism tolerance requirements specified in ISO 8980-1:2017" represents a standalone evaluation of the algorithm's output (the digital image adjustment) against a predefined standard, independent of human interaction for interpretation. The human factors study, while involving humans, evaluates the usability and safety of the interface for an OTC product, not the diagnostic performance of an algorithm.

7. The type of ground truth used:

For prism tolerance requirements: The ground truth is based on standardized metrological requirements as specified in ISO 8980-1:2017. This is a technical standard for ophthalmic optics.
For human factors and usability: The "ground truth" would be the assessment of use-related risks against predefined safety and usability thresholds, determined through observation and feedback collection in a controlled user study.

8. The sample size for the training set:

The document does not provide information regarding a training set sample size. This is consistent with the device being a "Digital Prism Correction Feature" that applies a known optical principle (prism correction) digitally, rather than a machine learning model trained on a large dataset to perform a diagnostic or predictive task. The device likely uses algorithms based on physics and geometry to implement the prism correction, rather than being a trained AI model in the typical sense.

9. How the ground truth for the training set was established:

As no training set is mentioned in the context of machine learning, there is no information provided on how ground truth for a training set was established. The device functions as a digital implementation of an established optical correction principle.

Summary of Device Functionality Context: The DPCF appears to be a software feature that digitally applies prism correction within the Apple Vision Pro. Its "AI" component, if any, is not a diagnostic or predictive model in the typical sense seen in many FDA-cleared AI pathology or radiology devices. Instead, it seems to be a precise digital enactment of a known optical principle to correct for existing prism prescriptions. The studies described focus on whether this digital implementation meets optical accuracy standards and whether its use as an over-the-counter device is safe and usable.

Ask a Question

Ask a specific question about this device

K Number

K242058

Device Name

Digital Prism Correction Feature (DPCF)

Manufacturer

Apple Inc.

Date Cleared

2024-10-21

(98 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Intended Use

The Digital Prism Correction Feature (DPCF) is software that is intended to provide digital image adjustments in Apple Vision Pro in accordance with a user's prism prescription.

DPCF is available for users with prism in their eyeglass prescription also includes other parts (e.g., sphere, cylinder, ADD), which can be fulfilled by optical inserts, the DPCF fulfills the prism part of the prescription while using Apple Vision Pro.

DPCF supports prism prescriptions up to 7.75 Prism Diopters (PD) in the horizontal and/or vertical dimensions (i.e., baseup, base-down, base-in, base-out), per eye.

Device Description

AI/ML Overview

The provided document, a 510(k) summary for Apple's Digital Prism Correction Feature (DPCF), outlines the acceptance criteria and the study conducted to prove the device meets these criteria.

Here's the breakdown:

1. Table of Acceptance Criteria and Reported Device Performance:

Acceptance Criteria (What the device must achieve)	Reported Device Performance (How the device performed)
Provide prismatic adjustments in accordance with a prism prescription.	The bench validation testing demonstrated that the DPCF provides prismatic adjustments in accordance with a prism prescription.
Meet the prism tolerance requirements specified in ISO 8980-1:2017.	The results validate that the digital image adjustments provided by DPCF meet the prism tolerance requirements specified in ISO 8980-1:2017.
Provide reliable and acceptable prism adjustments for the available prism adjustment range (up to 7.75 Prism Diopters (PD) in horizontal and/or vertical dimensions, per eye).	The results demonstrate that the DPCF provides reliable and acceptable prism adjustments for the available prism adjustment range. (Maximum supported range is 7.75 PD horizontal and/or vertical, per eye).
Perform as intended with and without optical inserts.	The results demonstrate that the DPCF performs as intended with and without optical inserts.
The general purpose spatial computing platform (Apple Vision Pro) and its "other functions" do not adversely affect DPCF's ability to meet standardized prism tolerances.	The impact of the general purpose spatial computing platform on DPCF was assessed as part of the feature's risk management, verification, and validation activities, and determined to be acceptable. (This implies it does not adversely affect performance).
Software appropriately designed, verified, and validated (based on FDA guidance "Content of Premarket Submissions for Device Software Functions").	Software verification and validation was conducted in accordance with Apple's robust quality system and documented to address the recommendations in FDA's "Content of Premarket Submissions for Device Software Functions" guidance document. DPCF was determined to be a Basic Documentation Level. Apple's good software engineering practices, as demonstrated by the 510(k) submission's documentation, supports a conclusion that DPCF was appropriately designed, verified, and validated.

Summary of the Study Proving Device Meets Acceptance Criteria:

The study conducted was Non-Clinical Testing, specifically focusing on Bench Validation Testing and Software Verification and Validation. No clinical testing was performed or submitted.

2. Sample Size Used for the Test Set and Data Provenance:

Sample Size for Test Set: The document does not explicitly state a sample size for the bench validation testing. It mentions "results" from the testing but doesn't quantify the number of instances or measurements taken.
Data Provenance: The document does not specify the country of origin for the data or whether it was retrospective or prospective. It implies the testing was conducted by Apple Inc., an American company, but no further details are given. The nature of "bench validation testing" suggests prospective testing conducted specifically for this submission.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:

The document does not mention the use of experts to establish ground truth for the bench validation testing. The ground truth was established by standardized prism tolerance requirements (ISO 8980-1:2017), which are technical and objective metrics, not requiring expert human adjudication in the typical sense for medical imaging AI.

4. Adjudication Method for the Test Set:

None. As the ground truth was established by objective technical standards (ISO 8980-1:2017), there was no need for human adjudication (e.g., 2+1, 3+1). The testing involved measuring the device's output against these predefined technical tolerances.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done:

No, an MRMC study was NOT done. The document explicitly states: "No clinical testing data has been submitted." The device is a "software-only" device that modifies digital images based on a user's prism prescription, and its performance was evaluated against technical standards, not by human readers comparing performance with and without AI assistance.

6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) was Done:

Yes, in essence. The "Bench Validation Testing" evaluated the DPCF's ability to meet "standardized prism tolerance requirements" as an independent function. While the DPCF operates within the Apple Vision Pro, the testing focused on the device's algorithmic output (digital image adjustments) against an objective standard, rather than its effect on human performance. The assessment that the "general purpose spatial computing platform... do not adversely affect the DPCF's ability to meet standardized prism tolerances" further supports this standalone assessment of the DPCF's core function.

7. The Type of Ground Truth Used:

The ground truth used was objective technical standards/specifications, specifically the prism tolerance requirements specified in ISO 8980-1:2017. This is a well-established international standard for ophthalmic optics.

8. The Sample Size for the Training Set:

The document does not specify a training set sample size. The DPCF is described as software that converts a user's prism prescription into digital image adjustment parameters. This implies a rule-based or calculative approach based on optical principles rather than a machine learning model that would typically require a large training dataset. The "software engineering practices" and "bench validation testing" validate the implementation of these optical principles.

9. How the Ground Truth for the Training Set Was Established:

Not applicable in the typical sense for machine learning training sets. Given the nature of the device (applying prism corrections based on a prescription), the "ground truth" for its function is rooted in established optical principles and formulas for prism correction in optics. The software's "design" (as mentioned in "design controls") would incorporate these principles. The validation then confirms that the software's output aligns with these physical optical truths as defined by ISO standards.

Ask a Question

Ask a specific question about this device

K Number

K240929

Device Name

Sleep Apnea Notification Feature (SANF)

Manufacturer

Apple Inc.

Date Cleared

2024-09-13

(162 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K192469,DEN230041

Predicate For

N/A

Intended Use

The Sleep Apnea Notification Feature (SANF) is a software-only mobile medical application that analyzes Apple Watch sensor data to identify patterns of breathing disturbances suggestive of moderate-to-severe sleep apnea and provides a notification to the user. This feature is intended for over-the-counter (OTC) use by adults age 18 and over who have not previously received a sleep apnea diagnosis and is not intended to diagnose, treat, or aid in the management of sleep apnea. The absence of a notification is not intended to indicate the absence of sleep apnea.

Device Description

The Sleep Apnea Notification Feature (SANF) is an over-the-counter mobile medical application (MMA) intended to identify patterns of breathing disturbances suggestive of moderate-to-severe sleep apnea and provide a notification to the user. SANF is intended to run on compatible iOS (e.g. iPhone, iPad) and Apple Watch platforms. Users set up SANF and view their health data on the iOS platform. Prior to use, users must undergo educational onboarding. SANF uses accelerometer sensor data collected by the Apple Watch to calculate breathing disturbance values while a user is asleep. Breathing disturbances describe transient changes in breathing patterns, such as temporary breathing interruptions.

Breathing disturbance data is analyzed in discrete, consecutive 30-day evaluation windows, If patterns consistent with moderate-to-severe sleep apnea are identified within the 30-day evaluation window, the user is notified. SANF provides visualizations depicting the user's breathing disturbance data over various time scales. SANF is not intended to provide instantaneous measurements. Instead, once activated, SANF runs opportunistically in the background receiving signals from Apple Watch sensors for processing.

AI/ML Overview

Here's a summary of the acceptance criteria and study details for the Sleep Apnea Notification Feature (SANF), based on the provided FDA 510(k) summary:

1. Table of Acceptance Criteria and Reported Device Performance

Metric	Acceptance Criteria (Stated Goal)	Reported Device Performance (95% CI)
Sensitivity	Optimized for high specificity given SANF is designed as an opportunistic detection feature.	66.3% [62.2%, 70.3%] for moderate-to-severe sleep apnea (AHI ≥ 15)
Specificity	Optimized for high specificity given SANF is designed as an opportunistic detection feature.	98.5% [98.0%, 99.0%] for normal-to-mild sleep apnea (AHI < 15)
False Positives	SANF did not falsely notify any subjects with normal AHI (AHI < 5).	0% (implicitly, based on the statement above)
Breathing Disturbance Estimates (Proportion within pre-specified performance zone)	Not explicitly stated as a numerical acceptance criterion, but implicitly that it demonstrates effectiveness.	91.4% (1,193 out of 1,305 subjects)

Note: The document emphasizes that performance was "optimized for high specificity" given the opportunistic detection nature of the device. This implies that while a specific numerical sensitivity might not have been a hard "acceptance criterion" per se, the reported sensitivity alongside high specificity demonstrated sufficient effectiveness for clearance.

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Test Set (Clinical Study):
- Notification Performance Analysis: 1,278 subjects
- Breathing Disturbance Performance Analysis: 1,305 subjects
- Total Subjects Enrolled: 1,499 subjects (some had insufficient data for analysis)
Data Provenance:
- Country of Origin: United States (from "several sites across the United States").
- Retrospective or Prospective: Prospective. The study "enrolling 1,499 subjects" suggests a prospective collection of data specifically for this validation study.

3. Number of Experts Used to Establish the Ground Truth and Qualifications

The document refers to the "Nox T3s home sleep apnea testing (HSAT) device (K192469) as a reference device" for ground truth. The HSAT device itself provides the AHI (Apnea-Hypopnea Index) which is the clinical standard for sleep apnea diagnosis.

Thus, the ground truth was established by the HSAT device, not by human experts directly adjudicating each case. The output of the HSAT device is the ground truth measure (AHI).

4. Adjudication Method for the Test Set

The ground truth was established by the Nox T3s HSAT device, which is an objective measurement device. Therefore, a human expert adjudication method (like 2+1 or 3+1) was not explicitly mentioned or performed for the primary clinical endpoint, as the HSAT device is considered the reference standard. The AHI values derived from the HSAT device served as the diagnostic ground truth.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

There is no mention of an MRMC comparative effectiveness study involving human readers with or without AI assistance. The study focuses on the standalone performance of the SANF device against a reference standard (HSAT).

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance)

Yes, a standalone performance study was done. The reported sensitivity and specificity values are for the algorithm's performance in identifying patterns suggestive of moderate-to-severe sleep apnea and providing a notification, without human intervention in the interpretation or decision-making process based on the device's output. The device itself "provides a notification to the user," implying direct algorithm output.

7. Type of Ground Truth Used

The ground truth used was objective diagnostic data derived from a medical device: The Nox T3s home sleep apnea testing (HSAT) device, which provides the Apnea-Hypopnea Index (AHI). This is considered a gold standard for diagnosing and classifying the severity of sleep apnea in a home setting.

8. Sample Size for the Training Set

The algorithm development dataset included over 11,000 nights of concurrent reference and watch sensor data.

9. How the Ground Truth for the Training Set Was Established

The ground truth for the training set was established using concurrent in-lab polysomnography (PSG) and Home Sleep Apnea Test (HSAT) reference recordings. These are the gold standard diagnostic tests for sleep apnea, providing objective measures like the Apnea-Hypopnea Index (AHI). The document also mentions that the distribution of sleep apnea classifications (normal, mild, moderate, severe) was broad in this dataset.

Ask a Question

Ask a specific question about this device

K Number

DEN230081

Device Name

Hearing Aid Feature (HAF)

Manufacturer

Apple Inc.

Date Cleared

2024-09-12

(283 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Intended Use

The Hearing Aid Feature is a software-only mobile medical application that is intended to be used with compatible wearable electronic products. The feature is intended to amplify sound for individuals 18 years of age or older with perceived mild to moderate hearing impairment. The Hearing Aid Feature utilizes a self-fitting strategy and is adjusted by the user to meet their hearing needs without the assistance of a hearing healthcare professional. The device is intended for Over-the-Counter use.

Device Description

The Hearing Aid Feature (HAF) is a software-only device that is comprised of a pair of software modules which operate on two separate required products: (1) HAF iOS Application on a compatible iOS product, and (2) HAF software (i.e., firmware) on the Apple AirPods Pro 2. Refer to Figure I, middle and right, respectively, The AirPods Pro 2, formerly named AirPods Pro (2nd generation), supported this granting and are hereafter simply referred to as "AirPods Pro" in this document.

The HAF iOS Application guides users through the onboarding and setup process for the HAF. The process is self-guided by the user and includes step-by-step instructions and informational content (e.g. warnings, instructions for use). To initiate HAF setup, the user must select a saved audiogram from the iOS HealthKit.

Once the audiogram has been imported by the HAF, the feature will configure the amplification for the user's audiogram based upon Apple's proprietary fitting formula. Once the initial set-up is complete, users can listen with the HAF using the AirPods Pro and refine their settings. Fine tuning is facilitated by user controls on the iOS device that can adjust amplification, tone, and balance. A user can access the fine tuning settings at any time after setting up the HAF.

The HAF settings are transferred to the HAF Firmware Module on the AirPods Pro. The HAF Firmware Module utilizes the general purpose computing platform features of the AirPods Pro, including the microphone, speakers, amplifiers, and audio processing software, to process incoming sound and provide amplification at a specific frequency and gain based on the user's custom settings. The user's custom settings are stored on the HAF Firmware Module and will be available even when the AirPods Pro are not connected to the iOS device.

AI/ML Overview

Acceptance Criteria and Device Performance for Apple's Hearing Aid Feature (HAF)

1. Table of Acceptance Criteria and Reported Device Performance

Acceptance Criteria (Regulatory Standard)	Reported Device Performance
Output Limits (21 CFR 800.30(d))
1. General output limit (111 dB SPL)	N/A (input-controlled compression device)
2. Output limit for input-controlled compression (117 dB SPL)	Max OSPL90: 105.93 dB SPL
Electroacoustic performance limits (21 CFR 800.30(e))
1. Output distortion control limits (Total harmonic distortion + noise ≤ 5%)	Harmonic distortion does not exceed 1% for any test frequency
2. Self-generated noise level limits (Self-generated noise ≤ 32 dBA)	Max Self-Generated Noise: 28.20 dBA
3. Latency (Latency ≤ 15 ms)	Median Latency: 3.15 ms
4. Frequency response bandwidth (Lower cutoff ≤ 250 Hz, upper cutoff ≥ 5 kHz)	Frequency bandwidth: 100 - 10,000 Hz
5. Frequency response smoothness (No single peak in one-third-octave response > 12 dB relative to average levels of adjacent bands)	All peaks < 12dB relative to adjacent bands, within frequency response bandwidth
6. Acoustic coupler choice (2-cubic centimeter (cm³) acoustic coupler when compatible)	2 cm³ acoustic coupler used
Design requirements (21 CFR 800.30(f))
1. Insertion depth (> 10 mm from tympanic membrane)	>10mm gap from tympanic membrane. Verified via multiple ear tip sizes, instructions, and usability testing.
2. Use of atraumatic materials	AirPods Pro platform verified to use atraumatic patient-contacting materials.
3. Proper physical fit	Met for AirPods Pro platform, refer to insertion depth verification.
4. Tools, tests, or software permit lay user control and customization	HAF fitting customized based on input audiogram; three fine-tuning sliders (amplification, tone, balance) for user customization.
5. User-adjustable volume control	HAF has an amplification fine-tuning slider to adjust volume.
6. Adequate reprocessing	Adequacy of reprocessing for AirPods Pro platform verified via instructions and design mitigations.
Clinical Performance - Non-inferiority
IOI-HA score of Self-Fit group is no more than 3 points below that of Professionally-Fit group.	FAS/CCAS set: Mean Difference (Pro-Fit - Self-Fit) = 1.17 (SD 3.34), 95% CI (-0.05, 2.39). P-value = 0.0036. Pass.
	PP set: Mean Difference (Pro-Fit - Self-Fit) = 1.23 (SD 3.34), 95% CI (0.01, 2.46). P-value = 0.0050. Pass.
Supplemental Clinical Data: Apple Hearing Test Feature Validation
HTF derived audiograms' pure-tone average similar to professionally derived audiograms.	Demonstrated similar pure-tone average for HTF derived audiograms as professionally derived audiograms for the same users (n=202).
Gain values generated by HAF for HTF vs. professionally-derived audiograms are within +/- 5 dB for >90% of differences.	Output gains across all test frequencies were within +/- 5 dB for >98% of gain differences (for subset of n=173 subjects with mild to moderate hearing loss).

2. Sample Sizes Used for the Test Set and Data Provenance

Bench/Non-Clinical Tests:

Performance Testing (21 CFR 800.30(d) & (e)): No specific sample size (n) is provided, but the tests refer to "all test frequencies" and compliance with ANSI/ASA S3.22 or ANSI/CTA 2051:2017 clauses. This implies comprehensive testing across the specified parameters, rather than a limited sample.
Human Factors Formative Testing: 39 subjects.
Audiogram Input Risk and Mitigation Study: No specific sample size (n) for the study itself, but refers to the Hearing Test Feature (HTF) validation study dataset.

Clinical Study:

Overall Clinical Study (HAF Self-Fit vs. Professionally-Fit): 118 total participants (59 in Self-Fit group, 59 in Professionally-Fit group for FAS/CCAS; 59 in Self-Fit, 58 in Professionally-Fit for PP analysis).
Data Provenance: Prospective, non-significant risk study from three sites across the United States.

Supplemental Clinical Data (Apple Hearing Test Feature Validation):

Comparison of HTF outputs to professionally derived audiograms: n = 202.
Gain analysis for HAF with HTF vs. professionally-derived audiograms: n = 173 (subset with mild to moderate hearing loss from the n=202 dataset).

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

Bench/Non-Clinical Tests:

Performance Testing: Ground truth is established by well-defined regulatory standards (21 CFR 800.30) and industry standards (ANSI/ASA S3.22, ANSI/CTA 2051:2017). Expertise is inherent in the test methodologies themselves.
Human Factors Formative Testing: No explicitly stated "experts" establishing ground truth in the context of diagnostic accuracy. The testing assessed use-related risks, and findings led to software design modifications.
Audiogram Input Risk and Mitigation Study: No explicitly stated "experts" for ground truth on this specific study, but the study for the Hearing Test Feature Validation (see below) involved professional audiograms which would have been established by qualified audiologists.

Clinical Study (HAF Self-Fit vs. Professionally-Fit):

Ground Truth for Professional-Fit (PF) Group: The "Professionally-Fit" group had their hearing aids fitted by an audiologist and underwent an optional audiologist fine-tuning session. This implies a number of audiologists (not specified but plural) provided this professional fit, thus establishing a "ground truth" reference for professional care. The study design intrinsically compares the self-fit approach to professional audiologist care, using the latter as the benchmark for a successful fit in terms of patient-perceived benefit.

Supplemental Clinical Data (Apple Hearing Test Feature Validation):

Comparison to Professionally Derived Audiograms: These would have been established by qualified hearing healthcare professionals, such as audiologists. The exact number of such professionals establishing these audiograms for the 202 subjects is not specified but the term "professionally derived" implies expertise.

4. Adjudication Method for the Test Set

Bench/Non-Clinical Tests:

Performance Testing: Not applicable; compliance is determined by direct measurement against pre-defined numerical thresholds in regulatory and industry standards.
Human Factors Formative Testing: Not applicable; the output is identification of use-related risks and subsequent design modifications.
Audiogram Input Risk and Mitigation Study: Not applicable.

Clinical Study (HAF Self-Fit vs. Professionally-Fit):

Primary Endpoint (IOI-HA score): Not applicable in the sense of expert adjudication of a diagnostic finding. The primary outcome was a patient-reported outcome measure (IOI-HA score), a subjective assessment collected directly from participants. The comparison was statistical (non-inferiority margin) between the two groups.
Objective Measures (QuickSIN, REM): These are objective measurements and do not require adjudication.

Supplemental Clinical Data (Apple Hearing Test Feature Validation):

Audiogram Comparison & Gain Analysis: Not applicable. The comparison was quantitative (pure-tone average, gain differences) between HTF-derived and professionally-derived audiograms.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No, a traditional MRMC comparative effectiveness study, as typically seen in diagnostic imaging where multiple readers interpret cases with and without AI assistance, was not performed for the core Hearing Aid Feature.
Instead, a clinical study compared two groups:
- Self-Fit (SF): Users applying the HAF's self-fitting algorithm.
- Professionally-Fit (PF): Users whose devices were fitted by an audiologist using the NAL-NL2 formula.
This study evaluated the effectiveness of the HAF's self-fitting approach directly against professional care by assessing patient-reported outcomes (IOI-HA) and objective measures (QuickSIN, REM).
Effect Size of Human Readers Improve with AI vs without AI assistance: This metric is not applicable as the study design was a comparison of a self-fitting AI system against professional human fitting, not a study of human readers improving with AI assistance. The study concluded that the HAF Self-Fit group achieved non-inferior perceived benefit (IOI-HA scores) compared to the Professionally-Fit group, indicating equivalent patient outcomes without the direct involvement of a hearing healthcare professional in the fitting process.

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study

Yes, a standalone study was inherently performed to assess the performance of the HAF's self-fitting algorithm.
The "Self-Fit (SF)" group in the clinical study directly represents the standalone performance of the algorithm. These users utilized the HAF's automatic fitting algorithm and then could adjust amplification, tone, and balance themselves. The device's performance, as measured by IOI-HA scores, QuickSIN, and REM, was attributed to this self-fitting strategy.
The comparison to the "Professionally-Fit (PF)" group served as a benchmark for what a human expert (audiologist) would achieve.
Therefore, the clinical study's SF arm is a direct measure of the algorithm's standalone performance in a real-world setting.

7. Type of Ground Truth Used

Bench/Non-Clinical Tests:

Regulatory and Industry Standards: Ground truth is defined by explicit numerical thresholds and methodologies prescribed by 21 CFR 800.30 and ANSI/ASA/CTA standards.
Human Factors: "Ground truth" is the identification of potential use errors and associated risks, which is derived from observing user interactions and conducting risk analysis.

Clinical Study (HAF Self-Fit vs. Professionally-Fit):

Expert Consensus / Professional Practice: The "Professionally-Fit" group, whose devices were fitted by audiologists using a standard clinical fitting formula (NAL-NL2), served as the gold standard or ground truth for best clinical practice in hearing aid fitting. The HAF's performance was evaluated against this professional benchmark.
Patient-Reported Outcomes (IOI-HA): The primary ground truth for effectiveness was the subjective perception of benefit, satisfaction, and quality of life as reported by the patients themselves via the IOI-HA questionnaire.
Objective Outcomes Data (QuickSIN, REM): These objective functional measures also served as ground truth regarding speech intelligibility and actual gain.

Supplemental Clinical Data (Apple Hearing Test Feature Validation):

Expert Consensus / Professional Practice: The comparison was made against "professionally derived audiograms," implying that these audiograms, established by trained hearing healthcare professionals, served as the ground truth.

8. Sample Size for the Training Set

The document does not explicitly state the sample size for the training set used to develop Apple's proprietary fitting formula within the HAF. The description only refers to the clinical study as establishing safety and effectiveness, and the HAF's fitting formula is described as "Apple's proprietary fitting formula." This formula would have been developed using a separate dataset prior to the validation study.

9. How the Ground Truth for the Training Set Was Established

Since the training set size and characteristics are not provided, the method for establishing its ground truth is also not explicitly stated in this document.

However, based on the nature of hearing aid fitting algorithms and the validation study design, it is highly probable that the proprietary fitting formula was developed and refined using:

Large datasets of audiogram data: Likely anonymized audiograms from various sources, potentially including those collected by Apple's own Hearing Test Feature over time or from research collaborations.
Patient-reported outcomes data: To correlate objective audiometric data with subjective patient benefit and preference.
Expert knowledge/models: Incorporating established audiological principles, validated fitting targets (e.g., NAL-NL2, DSL v5), and clinical experience synthesized into an algorithmic form.
Iterative development and testing: The "proprietary fitting formula" would have undergone extensive internal testing, likely including simulations and pilot studies with real users, where the "ground truth" would be established by comparing algorithm outputs to professional recommendations or patient preferences.

Ask a Question

Ask a specific question about this device

K Number

K231173

Device Name

Irregular Rhythm Notification Feature (IRNF)

Manufacturer

Apple Inc.

Date Cleared

2023-07-21

(87 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K212516

Predicate For

N/A

Intended Use

The IRNF is a software-only mobile medical application that is intended to be used with the Apple Watch. The feature analyzes pulse rate data to identify episodes of irregular heart rhythms suggestive of atrial fibrillation (AFib) and provides a notification to the user. The feature is intended for over-the-counter (OTC) use. It is not intended to provide a notification on every episode of irregular rhythm suggestive of AFib and the absence of a notification is not intended to indicate no disease process is present; rather the feature is intended to opportunistically surface a notification of possible AFib when sufficient data are available for analysis. These data are only captured when the user is still. Along with the user's risk factors the feature can be used to supplement the decision for AFib screening. The feature is not intended to replace traditional methods of diagnosis or treatment.

The feature has not been tested for and is not intended for use in people under 22 years of age. It is also not intended for use in individuals previously diagnosed with AFib.

Device Description

IRNF 2.0 is comprised of a pair of mobile medical apps - One on Apple Watch and the other on the iPhone.

IRNE 2.0 is intended to analyze pulse rate data collected by the Apple Watch PPG sensor on Apple Watch Series 3-8, Series SE, and Apple Watch Ultra to identify episodes of irregular heart rhythms consistent with AFib and provides a notification to the user. It is a background screening tool and there is no way for a user to initiate analysis of pulse rate data. IRNF 2.0 iPhone App is part of the Health App, which allows users to store, manage, and share health and fitness data, and comes pre-installed on every iPhone.

IRNF 2.0 Watch App refers to the tachogram classification algorithm, confirmation cycle algorithm, and the AF notification generation. If an irreqular heart rhythm consistent with AFib is identified, IRNF 2.0 Watch App will transfer the AFib notification to IRNF 2.0 iPhone App through HealthKit sync. In addition to indicating the finding of signs of AFib, the notification will encourage the user to seek medical care.

IRNF 2.0 iPhone App contains the on-boarding and educational materials that a user must review prior to enabling AFib notifications. IRNF 2.0 iPhone App is designed to work in combination with IRNF 2.0 Watch App and will display a history of all prior AFib notifications. The user is also able to view a list of times when each of the irregular tachograms contributing to the notification was generated.

AI/ML Overview

The provided text describes the Irregular Rhythm Notification Feature (IRNF) 2.0. However, the document provided is a 510(k) summary and clearance letter for a Predetermined Change Control Plan (PCCP) for IRNF 2.0, rather than a standalone study proving the device meets acceptance criteria for initial clearance.

The document indicates that the subject device (IRNF 2.0) is identical to its predicate device (also IRNF 2.0, K212516), with the only difference being the implementation of a PCCP. This PCCP outlines anticipated modifications to the software and the methods for implementing those changes. Therefore, the acceptance criteria and study data for the initial clearance of IRNF 2.0 (K212516) would be the most relevant information, which is not entirely detailed in this document.

However, the PCCP does specify test methods and acceptance criteria that will be used to demonstrate substantial equivalence for future modifications made under the plan. I will extract information primarily related to these future modification criteria and the study that would be performed to meet them.

Here's a breakdown based on the provided text, focusing on the PCCP and what it implies for future studies:

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria described here are for future modifications to the algorithm under the PCCP, showing substantial equivalence to the performance of the existing IRNF 2.0. The document does not provide the absolute performance of IRNF 2.0 itself in this section, but rather the performance target for modified algorithms relative to IRNF 2.0.

Category of Change	Acceptance Criteria	Reported Device Performance (as described for future modifications)
Modifications to Tachogram Classification Algorithm	Substantial equivalence in sensitivity and specificity when compared to the performance of IRNF 2.0	To be demonstrated in future validation activities under the PCCP, by meeting the specified substantial equivalence in sensitivity and specificity criteria.
Modifications to Confirmation Cycle Algorithm	Substantial equivalence in positive predictive value relative to IRNF 2.0	To be demonstrated in future validation activities under the PCCP, by meeting the specified substantial equivalence in positive predictive value criteria.

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Test Set: The document states that for future modifications under the PCCP, "each will meet minimum demographic requirements for age, sex, race, and skin tone derived from the demographics of the United States." It does not specify an exact numerical sample size for the test set.
Data Provenance: The document implies that validation test datasets will be "representative of the intended use population" and mentions "demographics of the United States." This suggests the data will primarily be from the United States. It does not explicitly state whether the data will be retrospective or prospective for these future validation activities.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts

The document does not specify the number of experts or their qualifications for establishing ground truth, either for the initial clearance of IRNF 2.0 or for the future modifications under the PCCP.

4. Adjudication Method for the Test Set

The document does not specify an adjudication method (e.g., 2+1, 3+1, none) for the test set, either for the initial clearance of IRNF 2.0 or for the future modifications under the PCCP.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done

The document does not mention a Multi-Reader Multi-Case (MRMC) comparative effectiveness study. The IRNF is described as a "software-only mobile medical application" providing notifications to the user, not a tool for human readers to interpret.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

Yes, the document implies that standalone performance studies were done (or will be done for future modifications). The device is described as "software-only" and "analyzes pulse rate data... and provides a notification to the user." The acceptance criteria for future modifications explicitly refer to the algorithm's sensitivity, specificity, and positive predictive value, which are metrics of standalone algorithm performance.

7. The Type of Ground Truth Used

The document does not explicitly state the type of ground truth used (e.g., expert consensus, pathology, outcomes data). In the context of "irregular heart rhythms suggestive of atrial fibrillation (AFib)," the ground truth would typically be established by a gold standard method such as a 12-lead ECG interpreted by a cardiologist, or a continuous ECG monitor.

8. The Sample Size for the Training Set

The document states that for future modifications to the tachogram classification algorithm, the plan is to "retrain algorithm with additional datasets." It does not specify the sample size for the training set, either for the original IRNF 2.0 or for the "additional datasets" mentioned for future retraining.

9. How the Ground Truth for the Training Set Was Established

The document does not specify how the ground truth for the training set was established, either for the original IRNF 2.0 or for future retraining datasets.

Ask a Question

Ask a specific question about this device

K Number

K213971

Device Name

Atrial Fibrillation History Feature

Manufacturer

Apple Inc.

Date Cleared

2022-06-03

(165 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K212516

Predicate For

N/A

Intended Use

The Atrial Fibrillation (AFib) History Feature is an over-the-counter ("OTC") software-only mobile medical application intended for users 22 years of age and over who have a diagnosis of atrial fibrillation (AFib). The feature opportunistically analyzes pulse rate data to identify episodes of irregular heart rhythms suggestive of AFib and provides the user with a retrospective estimate of AFib burden (a measure of the spent in AFib during past Apple Watch wear).

The feature also tracks and trends estimated AFib burden over time, and includes lifestyle data visualizations to enable users to understand the impact of certain aspects of their AFib. It is not intended to provide individual irregular rhythm notifications or to replace traditional methods of diagnosis, treatment, or MFib.

The feature is intended for use with the Apple Watch and the Health app on iPhone.

Device Description

The Atrial Fibrillation History Feature (AFib History Feature) is comprised of a pair of mobile medical apps - one on Apple Watch and the other on the iPhone.

The AFib History Feature is intended to analyze pulse rate data collected by the Apple Watch PPG sensor on Apple Watch Series 4. Series 5. and SE to identify episodes of irregular heart rhythms consistent with AFib and provides the user with a retrospective estimate of AFib burden (a measure of the amount of time spent in AFib during past Apple Watch wear).

The AFib History Feature uses PPG pulse rhythm data from compatible Apple Watches. Apple Watch uses green LED lights paired with light-sensitive photodiodes to detect relative changes in the amount of blood flowing through a user's wrist at any given moment. When the heart beats it sends a pressure wave down the vasculature, causing a momentary increase in blood volume when it passes by the sensor. By monitoring these changes in blood flow, the sensor detects individual pulses when they reach the peripherv and thereby measure beat-to-beat intervals.

The AFib History Feature iPhone App is part of the Health App. which allows users to store, manage, and share health and fitness data, and comes pre-installed on every iPhone.

The AFib History Feature provides users visualizations of AFib burden estimate data alongside clinically relevant lifestyle data and presents estimates of AFib burden in three different ways. These visualizations empower users to observe and understand the impact of lifestyle on their AFib burden, and to better understand their condition generally.

· Weekly Estimate an estimate of the amount of time a user was in Atrial Fibrillation over the past calendar week during watch wear, presented to the user as a percentage.
Day of Week Estimate an estimate of the amount of time a user was in Atrial Fibrillation on each day of the week over the previous 42 days during watch wear, presented to the user as a percentage. That is, all Mondays over the past 42 days, all Tuesdays over the past 42 days.
Time of Day Estimate an estimate of the amount of time a user was in Atrial Fibrillation on 4-hour segments of the day over the previous 42 days during watch wear, presented to the user as a percentage. That is, all 12 am - 4 am segments over the past 42 days, all 4 am - 8 am segments over the past 42 days.

The AFib History Feature is intended to serve as an extension of the predicate Irregular Rhythm Notification feature, but has been optimized for users with a diagnosis of Afib.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA document.

The document describes the Atrial Fibrillation History Feature, a software-only mobile medical application intended for users 22 years and older with a diagnosed Atrial Fibrillation (AFib). The feature analyzes pulse rate data from Apple Watch to provide a retrospective estimate of AFib burden.

1. Table of Acceptance Criteria and Reported Device Performance

The document doesn't explicitly list "acceptance criteria" in a definitive table format with pass/fail thresholds. Instead, it presents performance metrics from development and clinical studies for the classification algorithm and for the AFib burden estimation.

Implicit Acceptance Criteria (Derived from Performance Data and Context):

Rhythm Classification Algorithm Performance: High sensitivity and specificity in differentiating AFib from non-AFib rhythms.
AFib Burden Estimation Accuracy: Close agreement between the device's weekly AFib burden estimate and the reference method.

Reported Device Performance:

Metric	Reported Device Performance (AFib History Feature)	Comparator (IRNF 2.0 - Predicate Device)
Rhythm Classification Algorithm (Development Studies)
Sensitivity	97%	79.6%
Specificity	99.0%	99.9%
Rhythm Classification Algorithm (Clinical Validation Study)
Sensitivity	92.6%	85.5%
Specificity	98.8%	99.6%
Weekly AFib Burden Estimation (Clinical Validation Study)
Bland-Altman Limits of Agreement (Lower/Upper 2 SD)	-11.4% and 12.8%	N/A (The predicate device provides irregular rhythm notifications, not AFib burden estimates, so this metric is not directly comparable. The document states the AFib History Feature is an "extension of the predicate Irregular Rhythm Notification feature, but has been optimized for users with a diagnosis of Afib." The performance comparison for the classification algorithm shows how the underlying algorithm was adapted.)
Average difference (device vs. reference)	0.67%	N/A
% of subjects with weekly AFib burden differences within ±5%	92.9% (260/280)	N/A
% of subjects with weekly AFib burden estimates within ±10%	95.7% (268/280)	N/A

2. Sample Size for the Test Set and Data Provenance

The document refers to two main test sets:

Development Studies Test Set: Data used for evaluating the model during development and as a "last test" on a "Sequestration set" (which functions as a final test set after model locking).
- Sample Size: Part of "over 2500 subjects" and "over 3 million pulse rate recordings". The exact number of subjects or recordings specifically in the test/sequestration sets for the classification algorithm performance is not individually quantified beyond being a split of the total development data.
- Data Provenance: Not explicitly stated (e.g., country of origin, specific institutions). It mentions "demographically diverse populations" in recruitment. The studies are described as "development studies," implying they were specifically conducted for the purpose of algorithm development. The data appears to be prospective given it was collected from recruited subjects.
Clinical Validation Study Test Set:
- Sample Size: 413 enrolled subjects for the study. 280 subjects contributed data to the primary endpoint analysis for AFib burden estimation.
- Data Provenance: Not explicitly stated (e.g., country/region). It describes "enrolled subjects wore an Apple Watch and a reference electrocardiogram (ECG) patch concurrently for up to 13 days," indicating a prospective clinical study specifically for this validation.

3. Number of Experts Used to Establish Ground Truth and Qualifications

The document refers to "reference electrocardiogram (ECG) patch concurrently" and "reference weekly burden" for establishing ground truth, both in the development studies and the clinical validation study.

Number of Experts: Not explicitly stated.
Qualifications of Experts: Not explicitly stated. However, the use of "reference electrocardiogram (ECG)" strongly implies that medical professionals (e.g., cardiologists, electrophysiologists) would be involved in reading and interpreting these ECGs to establish the ground truth for AFib presence and burden.

4. Adjudication Method for the Test Set

Adjudication Method: Not explicitly stated. The document refers to "reference electrocardiogram (ECG)" as the ground truth. It's common practice for ECG interpretations, especially in clinical studies, to involve review by multiple experts or an adjudication committee, but the specific method (e.g., 2+1, 3+1) is not detailed.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

Was an MRMC Comparative Effectiveness Study Done? No, an MRMC study comparing human readers with and without AI assistance was not described. The study focuses on the standalone performance of the device's algorithm against a reference standard (ECG).
Effect Size of Human Reader Improvement: Not applicable, as no MRMC study was performed.

6. Standalone Performance (Algorithm Only)

Was a Standalone Study Done? Yes. The performance metrics presented for both the rhythm classification algorithm and the AFib burden estimation in the development and clinical studies are indicative of the device's standalone (algorithm-only, without human-in-the-loop performance) performance. The device is a "software-only mobile medical application," and its output (AFib burden estimate) is directly compared to the reference standard.

7. Type of Ground Truth Used

Type of Ground Truth: The primary ground truth for both the classification algorithm development/testing and the clinical validation of AFib burden estimation was established using reference electrocardiogram (ECG) data. This is considered a high-fidelity diagnostic standard for cardiac rhythm.

8. Sample Size for the Training Set

Training Set Sample Size: The algorithm was "trained extensively using data collected in a number of development studies. In total, the studies included over 2500 subjects and collected over 3 million pulse rate recordings." The training set was a portion of this total dataset, with the data "split into four sets...: Training, Validation, Test, and Sequestration sets." The exact number of subjects or recordings specifically in the training set is not provided separately.

9. How the Ground Truth for the Training Set Was Established

Ground Truth Establishment for Training Set: "The rhythm classification algorithm uses a convolutional neural network based architecture and was trained extensively using data collected in a number of development studies." Similar to the test sets, the ground truth for rhythms in the training data would also have been established using reference electrocardiogram (ECG) data. The document implies that these development studies also involved the collection of "pulse rate recordings on a variety of rhythms including: atrial fibrillation, normal sinus rhythm, sinus arrhythmia, and other ectopic beats (PVCs, PACs)," which would necessitate ECG interpretation to label these rhythms for training the algorithm.

Ask a Question

Ask a specific question about this device

K Number

K212516

Device Name

IRNF App

Manufacturer

Apple Inc.

Date Cleared

2021-10-22

(73 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

DEN180042

Predicate For

K213971,K231173

Intended Use

The Irregular Rhythm Notification Feature is a software-only mobile medical application that is intended to be used with the Apple Watch. The feature analyzes pulse rate data to identify episodes of irregular heart rhythms suggestive of atrial fibrillation (AFib) and provides a notification to the user. The feature is intended for over-the-counter (OTC) use. It is not intended to provide a notification on every episode of irregular rhythm suggestive of AFib and the absence of a notification is not intended to indicate no disease process is present; rather is intended to opportunistically surface a notification of possible AFib when sufficient data are available for analysis. These data are only captured when the user is still. Along with the user's risk factors the feature can be used to supplement the decision for AFib screening. The feature is not intended to replace traditional methods of diagnosis or treatment.

The feature has not been tested for and is not in people under 22 years of age. It is also not intended for use in individuals previously diagnosed with AFib.

Device Description

Irregular Rhythm Notification Feature 2.0 (IRNF 2.0) is comprised of a pair of mobile medical apps - One on Apple Watch and the other on the iPhone.

IRNF 2.0 is intended to analyze pulse rate data collected by the Apple Watch PPG sensor on Apple Watch Series 3. Series 4. Series 5. and SE to identify exisodes of irreqular heart rhythms consistent with AFib and provide a notification to the user. It is a background screening tool and there is no way for a user to initiate analysis of pulse rate data. IRNF 2.0 iPhone App is part of the Health App. which allows users to store, manage, and share health and fitness data, and comes pre-installed on every iPhone.

IRNF 2.0 Watch App refers to the rhythm classification algorithm, confirmation cvcle algorithm, and the AFib notification generation. If an irreqular heart rhythm consistent with Afib is identified and confirmed through the confirmation cycle, IRNF 2.0 Watch app will notify the user and transfer the AFib notification to the iPhone App through HealthKit sync. In addition to indicating the finding of signs of AFib, the notification will encourage the user to seek medical care.

IRNF 2.0 iPhone App contains the onboarding and educational materials that a user must review prior to use. IRNF 2.0 iPhone App is designed to work in combination with IRNF 2.0 Watch App and will display a history of all prior AFib notifications. The user is also able to view a list of times of the irreqular rhythms contributing to the notification.

AI/ML Overview

The provided text describes the Irregular Rhythm Notification Feature (IRNF) 2.0 app, a software-only mobile medical application for detecting irregular heart rhythms suggestive of Atrial Fibrillation (AFib) using Apple Watch pulse rate data. Below is a detailed breakdown of the acceptance criteria and study proving the device meets them, based on the provided document:

Acceptance Criteria and Reported Device Performance

The document states that the clinical performance of the IRNF 2.0 app was assessed, with specific metrics reported:

Metric	Acceptance Criteria (Implied)	Reported Device Performance
Person-level Sensitivity (AFib)	Not explicitly stated but inferred as non-inferior to predicate	88.6%
Person-level Specificity (AFib)	Not explicitly stated but inferred as non-inferior to predicate	99.3%

Note: The document states "IRNF 2.0 person-level sensitivity (88.6%) and specificity (99.3%) were both demonstrated to be non-inferior to those of the predicate device." While a specific numerical acceptance criterion for sensitivity and specificity isn't explicitly listed, the demonstration of non-inferiority to the predicate device (which had a reported 78.9% sensitivity for concordant AFib and 98.2% for AFib and other clinically relevant arrhythmias) serves as the implicit acceptance benchmark.

Study Details

Here's the information about the study that proves the device meets the acceptance criteria:

2. Sample Size Used for the Test Set and Data Provenance:

Test Set Sample Size: The clinical validation study (referred to as the "clinical study" under 5.7 Clinical Performance) involved 573 participants.
- For the primary endpoint analysis, 432 participants contributed data to determine sensitivity, with 140 of these presenting with AFib.
- 292 participants contributed data to the analysis of device specificity.
Data Provenance: The document does not explicitly state the country of origin of the data. It is implied to be a prospective study, as it involved enrolled subjects wearing an Apple Watch and a reference ECG patch concurrently.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts:

The document does not specify the number or qualifications of experts used to establish the ground truth for the test set. It states that the reference was an "electrocardiogram (ECG) patch concurrently." This implies a medical standard for AFib diagnosis, but the human interpretation component by experts is not detailed.

4. Adjudication Method for the Test Set:

The document does not describe a specific adjudication method (e.g., 2+1, 3+1) for the test set. The ground truth appears to be established directly from the reference ECG patch data.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done:

No, an MRMC comparative effectiveness study involving human readers assisting with or without AI was not described for this device. The study primarily focuses on the standalone performance of the IRNF 2.0 app against a reference standard.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done:

Yes, a standalone performance study was done. The reported sensitivity (88.6%) and specificity (99.3%) refer to the device's performance in identifying irregular heart rhythms suggestive of AFib based on pulse rate data, without human interpretation in the loop for the primary performance metrics. The feature provides a notification to the user, encouraging them to seek medical care, but its performance metrics are of the algorithm only.

7. The Type of Ground Truth Used:

The ground truth for the clinical study was established using a reference electrocardiogram (ECG) patch, worn concurrently by participants. AFib was "identified on the reference ECG patch."

8. The Sample Size for the Training Set:

The document states that the new rhythm classification algorithm "was trained extensively using data collected in a number of development studies." These studies "included over 2500 subjects." While the exact size of the training set itself is not broken out numerically, it's part of this larger "over 2500 subjects" pool. The data was split into Training, Validation, Testing, and Sequestration sets.

9. How the Ground Truth for the Training Set Was Established:

The ground truth for the training set was established from "data collected in a number of development studies" that included "over 3 million pulse rate recordings on a variety of rhythms including: atrial fibrillation, normal sinus rhythm, sinus arrhythmia, and other ectopic beats (PVCs, PACs)."
It's inferred that these "rhythms" were definitively classified to serve as ground truth for training the convolutional neural network. While the exact method of establishing ground truth for the training data (e.g., direct ECG correlation, expert annotation of ECGs) is not explicitly detailed, the mention of specific rhythm types suggests a medically validated classification.

Ask a Question

Ask a specific question about this device

K Number

K201525

Device Name

ECG App

Manufacturer

Apple Inc.

Date Cleared

2020-10-08

(122 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

DEN180044

Predicate For

K243236,K240795

Intended Use

The ECG app is a software-only mobile medical application intended for use with the Apple Watch to create, record, store, transfer, and display a single channel electrocardiogram (ECG) similar to a Lead I ECG. The ECG app determines the presence of atrial fibrillation (AFib), sinus rhythm, and high heart rate (no detected AF with heart rate 100-150 bpm) on a classifiable waveform. The ECG app is not recommended for users with other known arrhythmias.

The ECG app is intended for over-the-counter (OTC) use. The ECG data displayed by the ECG app is intended for informational use only. The user is not intended to interpret or take clinical action based on the device output without consultation of a qualified healthcare professional. The ECG waveform is meant to supplement rhythm classification for the purposes of discriminating AFib from sinus rhythm and is not intended to replace traditional methods of diagnosis or treatment.

The ECG app is not intended for use by people under 22 years old.

Device Description

The ECG 2.0 app comprises a pair of mobile medical apps - one on Apple Watch and the other on the iPhone.

The ECG Watch app analyzes data collected by the integrated electrical sensors on a compatible Apple Watch to generate an ECG waveform similar to a Lead I. calculate average heart rate, and provide a rhythm classification to the user for a given 30 second session. When a user opens the ECG Watch app while wearing the Watch on one wrist, and places the finger of the opposite hand on the digital crown, they are completing the circuit across the heart which begins a recording session.

Once the recording session is complete, the ECG Watch app performs signal processing, feature extraction and rhythm classification to generate a session result.

The resulting classification and average heart rate for the session, along with educational information, will be displayed to the user within the ECG Watch app.

The ECG iPhone app contains the on-boarding and educational materials that a user must review prior to taking an ECG reading. The ECG iPhone app is included in the Health App, which allows users to store, manage, and share health and fitness data, and comes pre-installed on every iPhone. The ECG 2.0 app expands the classifiable heart range, introduces new classification results, and introduces minor, non-userfacing algorithm updates. These changes will be reflected in both the Apple Watch app, and also on the corresponding iPhone app within the Health App.

AI/ML Overview

Here's a summary of the acceptance criteria and the study that proves the device meets them, based on the provided text:

1. Table of acceptance criteria and the reported device performance

Acceptance Criteria	Device Performance (ECG 2.0 App)
AFib Classification Sensitivity (HR 50-150 bpm)	98.5%
Sinus Rhythm Classification Specificity (HR 50-150 bpm)	99.3%
PQRST Waveform Visual Acceptability	100% pass rating
R-wave Amplitude Assessment	97.2% total pass rating

2. Sample size used for the test set and the data provenance

Sample size: Approximately 546 subjects.
- 305 subjects were in the Atrial Fibrillation cohort.
- 241 subjects were in the normal sinus rhythm cohort.
Data provenance: Prospective, multi-center clinical trial. The country of origin is not explicitly stated, but it is a "multi-center" trial, implying diverse participant recruitment.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

Number of experts: Not explicitly stated, but the ground truth was established by "a cardiologist." This implies at least one, likely a panel or multiple, to ensure robustness, though the exact number isn't quantified.
Qualifications of experts: "Cardiologist." Years of experience are not specified.

4. Adjudication method for the test set

The text states: "Rhythm classification of a 12-lead ECG by a cardiologist was compared to the rhythm classification of a simultaneously collected ECG from the ECG 2.0 app." This indicates that the cardiologist's interpretation of a 12-lead ECG served as the ground truth. It does not explicitly describe an adjudication method like 2+1 or 3+1 if multiple cardiologists were involved. It implies a single definitive classification by the cardiologist.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and the effect size of how much human readers improve with AI vs without AI assistance

No, a multi-reader multi-case (MRMC) comparative effectiveness study comparing human readers with AI assistance versus without AI assistance was not conducted or reported in the provided text. The study focused on the standalone performance of the ECG 2.0 app against a cardiologist's interpretation.

6. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done

Yes, a standalone performance study was done. The reported performance metrics (sensitivity, specificity, waveform acceptability) reflect the algorithm's direct classification capabilities compared to the ground truth established by a cardiologist. The device is intended for over-the-counter use, and its performance in classifying AFib and sinus rhythm was assessed directly.

7. The type of ground truth used

The ground truth used was expert consensus / diagnosis from a cardiologist's interpretation of a 12-lead ECG.

8. The sample size for the training set

The document does not explicitly state the sample size for the training set. It only mentions the test set (clinical trial of 546 subjects).

9. How the ground truth for the training set was established

The document does not explicitly describe how the ground truth for the training set was established. It primarily focuses on the validation study. However, given that it states "Apple conducted database testing using a previously adjudicated dataset" for "ECG Database Testing per EC57," it is highly probable that the training data's ground truth was also established by expert cardiologists adjudicating ECGs in a similar manner to the test set, but this is not explicitly detailed for the training set.

Ask a Question

Ask a specific question about this device

Page 1 of 1