The Hearing Aid Feature is a software-only mobile medical application that is intended to be used with compatible wearable electronic products. The feature is intended to amplify sound for individuals 18 years of age or older with perceived mild to moderate hearing impairment. The Hearing Aid Feature utilizes a self-fitting strategy and is adjusted by the user to meet their hearing needs without the assistance of a hearing healthcare professional. The device is intended for Over-the-Counter use.
The Hearing Aid Feature (HAF) is a software-only device that comprises a pair of software modules operating on two separate required products: (1) the HAF iOS Application on a compatible iOS product, and (2) HAF software (i.e., firmware) on the Apple AirPods Pro 2. Refer to Figure I, middle and right, respectively. The AirPods Pro 2, formerly named AirPods Pro (2nd generation), supported this granting and are hereafter referred to simply as "AirPods Pro" in this document.
The HAF iOS Application guides users through the onboarding and setup process for the HAF. The process is self-guided by the user and includes step-by-step instructions and informational content (e.g., warnings, instructions for use). To initiate HAF setup, the user must select a saved audiogram from iOS HealthKit.
Once the audiogram has been imported, the HAF configures amplification for the user's audiogram based on Apple's proprietary fitting formula. After the initial setup is complete, users can listen with the HAF using the AirPods Pro and refine their settings. Fine-tuning is facilitated by user controls on the iOS device that adjust amplification, tone, and balance. A user can access the fine-tuning settings at any time after setting up the HAF.
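Apple's fitting formula itself is proprietary and not described in the document. As a purely illustrative sketch of how a per-frequency gain prescription can be derived from an audiogram, the following applies the classic half-gain rule to a hypothetical audiogram; all frequencies, thresholds, and names are assumptions, not Apple's implementation:

```python
# Illustrative only: Apple's fitting formula is proprietary. This sketch
# uses the classic "half-gain rule" (insertion gain approximately half the
# hearing threshold at each frequency) on a hypothetical audiogram.

AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 8000]

def prescribe_gain(thresholds_db_hl):
    """Map per-frequency hearing thresholds (dB HL) to insertion gain (dB)."""
    if len(thresholds_db_hl) != len(AUDIOGRAM_FREQS_HZ):
        raise ValueError("expected one threshold per audiogram frequency")
    return {f: round(t / 2.0, 1)
            for f, t in zip(AUDIOGRAM_FREQS_HZ, thresholds_db_hl)}

# Hypothetical mild-to-moderate hearing loss audiogram.
gains = prescribe_gain([20, 25, 30, 40, 45, 50])
```

A real self-fitting formula would additionally account for compression, output limiting, and the user's fine-tuning adjustments.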
The HAF settings are transferred to the HAF Firmware Module on the AirPods Pro. The HAF Firmware Module utilizes the general-purpose computing platform features of the AirPods Pro, including the microphone, speakers, amplifiers, and audio processing software, to process incoming sound and apply frequency-specific gain based on the user's custom settings. The user's custom settings are stored on the HAF Firmware Module and remain available even when the AirPods Pro are not connected to the iOS device.
Acceptance Criteria and Device Performance for Apple's Hearing Aid Feature (HAF)
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criteria (Regulatory Standard) | Reported Device Performance |
|---|---|
| Output Limits (21 CFR 800.30(d)) | |
| 1. General output limit (111 dB SPL) | N/A (input-controlled compression device) |
| 2. Output limit for input-controlled compression (117 dB SPL) | Max OSPL90: 105.93 dB SPL |
| Electroacoustic performance limits (21 CFR 800.30(e)) | |
| 1. Output distortion control limits (total harmonic distortion + noise ≤ 5%) | Harmonic distortion does not exceed 1% at any test frequency |
| 2. Self-generated noise level limits (self-generated noise ≤ 32 dBA) | Max self-generated noise: 28.20 dBA |
| 3. Latency (≤ 15 ms) | Median latency: 3.15 ms |
| 4. Frequency response bandwidth (lower cutoff ≤ 250 Hz, upper cutoff ≥ 5 kHz) | Frequency bandwidth: 100-10,000 Hz |
| 5. Frequency response smoothness (no single peak in one-third-octave response > 12 dB relative to average levels of adjacent bands) | All peaks within the 12 dB limit |
| Device design requirements (21 CFR 800.30(c)) | |
| 1. Insertion depth limit (device tip > 10 mm from the tympanic membrane) | Met for AirPods Pro platform; see insertion depth verification |
| 2. Use of atraumatic materials | AirPods Pro platform verified to use atraumatic patient-contacting materials |
| 3. Proper physical fit | Met for AirPods Pro platform; refer to insertion depth verification |
| 4. Tools, tests, or software permit lay user control and customization | HAF fitting customized based on input audiogram; three fine-tuning sliders (amplification, tone, balance) for user customization |
| 5. User-adjustable volume control | HAF provides an amplification fine-tuning slider to adjust volume |
| 6. Adequate reprocessing | Adequacy of reprocessing for AirPods Pro platform verified via instructions and design mitigations |
| Clinical Performance - Non-inferiority | |
| IOI-HA score of the Self-Fit group is no more than 3 points below that of the Professionally-Fit group | FAS/CCAS set: mean difference (Pro-Fit - Self-Fit) = 1.17 (SD 3.34), 95% CI (-0.05, 2.39), p = 0.0036; pass. PP set: mean difference (Pro-Fit - Self-Fit) = 1.23 (SD 3.34), 95% CI (0.01, 2.46), p = 0.0050; pass. |
| Supplemental Clinical Data: Apple Hearing Test Feature Validation | |
| HTF-derived audiograms' pure-tone average similar to professionally derived audiograms | Similar pure-tone averages demonstrated for HTF-derived audiograms and professionally derived audiograms for the same users (n = 202) |
| Gain values generated by the HAF for HTF-derived vs. professionally derived audiograms are within ±5 dB for >90% of differences | Output gains across all test frequencies were within ±5 dB for >98% of gain differences (subset of n = 173 subjects with mild to moderate hearing loss) |
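The output and electroacoustic limits above are fixed numerical thresholds, so conformance can be expressed as simple comparisons. A minimal sketch using the reported values from the table and the 21 CFR 800.30 limits (key names are illustrative, not from the filing):

```python
# Sketch: encode the 21 CFR 800.30 output and electroacoustic limits as
# pass/fail checks against the reported measurements from the table above.

LIMITS = {
    "max_ospl90_db_spl": ("<=", 117.0),  # input-controlled compression limit
    "thd_plus_noise_pct": ("<=", 5.0),
    "self_noise_dba": ("<=", 32.0),
    "latency_ms": ("<=", 15.0),
    "lower_cutoff_hz": ("<=", 250.0),
    "upper_cutoff_hz": (">=", 5000.0),
}

REPORTED = {  # values taken from the table above
    "max_ospl90_db_spl": 105.93,
    "thd_plus_noise_pct": 1.0,
    "self_noise_dba": 28.20,
    "latency_ms": 3.15,
    "lower_cutoff_hz": 100.0,
    "upper_cutoff_hz": 10000.0,
}

def check(limits, reported):
    """Return a dict of criterion name -> bool (True means within limit)."""
    results = {}
    for key, (op, bound) in limits.items():
        value = reported[key]
        results[key] = value <= bound if op == "<=" else value >= bound
    return results

assert all(check(LIMITS, REPORTED).values())  # every reported value passes
```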
2. Sample Sizes Used for the Test Set and Data Provenance
Bench/Non-Clinical Tests:
- Performance Testing (21 CFR 800.30(d) & (e)): No specific sample size (n) is provided, but the tests refer to "all test frequencies" and compliance with ANSI/ASA S3.22 or ANSI/CTA 2051:2017 clauses. This implies comprehensive testing across the specified parameters, rather than a limited sample.
- Human Factors Formative Testing: 39 subjects.
- Audiogram Input Risk and Mitigation Study: No specific sample size (n) for the study itself, but refers to the Hearing Test Feature (HTF) validation study dataset.
Clinical Study:
- Overall Clinical Study (HAF Self-Fit vs. Professionally-Fit): 118 total participants (59 in Self-Fit group, 59 in Professionally-Fit group for FAS/CCAS; 59 in Self-Fit, 58 in Professionally-Fit for PP analysis).
- Data Provenance: Prospective, non-significant risk study from three sites across the United States.
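Assuming a standard two-sample design (n = 59 per group, pooled SD 3.34, non-inferiority margin of 3 IOI-HA points), the reported 95% CI for the FAS/CCAS set can be approximately reproduced. The t critical value for df = 116 is hardcoded here so no statistics library is needed; this is a back-of-the-envelope check, not the study's actual analysis code:

```python
import math

def noninferiority_ci(mean_diff, sd, n1, n2, t_crit=1.981):
    """Approximate 95% CI for a two-sample mean difference with pooled SD.

    t_crit ~1.981 is the two-sided 95% critical value for df = n1 + n2 - 2
    = 116 (hardcoded assumption; use a t-table or stats library in practice).
    """
    se = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean_diff - t_crit * se, mean_diff + t_crit * se

lo, hi = noninferiority_ci(1.17, 3.34, 59, 59)
# Upper bound is ~2.39, below the 3-point margin, so Self-Fit is non-inferior.
assert hi < 3.0
```

This reproduces the reported interval of (-0.05, 2.39) to within rounding.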
Supplemental Clinical Data (Apple Hearing Test Feature Validation):
- Comparison of HTF outputs to professionally derived audiograms: n = 202.
- Gain analysis for HAF with HTF vs. professionally-derived audiograms: n = 173 (subset with mild to moderate hearing loss from the n=202 dataset).
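The ±5 dB gain-agreement metric reduces to the fraction of per-frequency gain differences within tolerance. A sketch with hypothetical gain values, since the per-subject data are not provided in the document:

```python
# Sketch of the +/-5 dB gain-agreement metric from the HTF validation:
# percentage of per-frequency gain differences (HTF-derived vs.
# professionally derived audiogram inputs) within tolerance.
# All gain values below are hypothetical.

def pct_within_tolerance(gains_htf, gains_pro, tol_db=5.0):
    """Percent of paired gain differences with |difference| <= tol_db."""
    diffs = [abs(a - b) for a, b in zip(gains_htf, gains_pro)]
    within = sum(d <= tol_db for d in diffs)
    return 100.0 * within / len(diffs)

# Hypothetical gains (dB) across test frequencies for one subject.
htf = [10.0, 12.5, 15.0, 20.0, 22.0]
pro = [11.0, 14.0, 16.5, 19.0, 28.5]
print(pct_within_tolerance(htf, pro))  # -> 80.0 (one pair differs by 6.5 dB)
```

In the reported analysis, this percentage was computed across all test frequencies and all 173 subjects, yielding >98% agreement.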
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
Bench/Non-Clinical Tests:
- Performance Testing: Ground truth is established by well-defined regulatory standards (21 CFR 800.30) and industry standards (ANSI/ASA S3.22, ANSI/CTA 2051:2017). Expertise is inherent in the test methodologies themselves.
- Human Factors Formative Testing: No explicitly stated "experts" establishing ground truth in the context of diagnostic accuracy. The testing assessed use-related risks, and findings led to software design modifications.
- Audiogram Input Risk and Mitigation Study: No explicitly stated "experts" for ground truth on this specific study, but the study for the Hearing Test Feature Validation (see below) involved professional audiograms which would have been established by qualified audiologists.
Clinical Study (HAF Self-Fit vs. Professionally-Fit):
- Ground Truth for Professionally-Fit (PF) Group: The "Professionally-Fit" group had their hearing aids fitted by an audiologist, with an optional audiologist fine-tuning session. The number of audiologists involved is not specified but is implied to be plural. The study design intrinsically compares the self-fit approach to professional audiologist care, using the latter as the benchmark for a successful fit in terms of patient-perceived benefit.
Supplemental Clinical Data (Apple Hearing Test Feature Validation):
- Comparison to Professionally Derived Audiograms: These would have been established by qualified hearing healthcare professionals, such as audiologists. The exact number of such professionals establishing these audiograms for the 202 subjects is not specified, but the term "professionally derived" implies expertise.
4. Adjudication Method for the Test Set
Bench/Non-Clinical Tests:
- Performance Testing: Not applicable; compliance is determined by direct measurement against pre-defined numerical thresholds in regulatory and industry standards.
- Human Factors Formative Testing: Not applicable; the output is identification of use-related risks and subsequent design modifications.
- Audiogram Input Risk and Mitigation Study: Not applicable.
Clinical Study (HAF Self-Fit vs. Professionally-Fit):
- Primary Endpoint (IOI-HA score): Not applicable in the sense of expert adjudication of a diagnostic finding. The primary outcome was a patient-reported outcome measure (IOI-HA score), a subjective assessment collected directly from participants. The comparison was statistical (non-inferiority margin) between the two groups.
- Objective Measures (QuickSIN, REM): These are objective measurements and do not require adjudication.
Supplemental Clinical Data (Apple Hearing Test Feature Validation):
- Audiogram Comparison & Gain Analysis: Not applicable. The comparison was quantitative (pure-tone average, gain differences) between HTF-derived and professionally-derived audiograms.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- No. A traditional MRMC comparative effectiveness study, as typically seen in diagnostic imaging where multiple readers interpret cases with and without AI assistance, was not performed for the core Hearing Aid Feature.
- Instead, a clinical study compared two groups:
- Self-Fit (SF): Users applying the HAF's self-fitting algorithm.
- Professionally-Fit (PF): Users whose devices were fitted by an audiologist using the NAL-NL2 formula.
- This study evaluated the effectiveness of the HAF's self-fitting approach directly against professional care by assessing patient-reported outcomes (IOI-HA) and objective measures (QuickSIN, REM).
- Effect size of human reader improvement with vs. without AI assistance: This metric is not applicable, as the study design compared a self-fitting system against professional human fitting rather than measuring human readers' improvement with AI assistance. The study concluded that the HAF Self-Fit group achieved non-inferior perceived benefit (IOI-HA scores) compared to the Professionally-Fit group, indicating equivalent patient outcomes without the direct involvement of a hearing healthcare professional in the fitting process.
6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study
- Yes, a standalone study was inherently performed to assess the performance of the HAF's self-fitting algorithm.
- The "Self-Fit (SF)" group in the clinical study directly represents the standalone performance of the algorithm. These users utilized the HAF's automatic fitting algorithm and then could adjust amplification, tone, and balance themselves. The device's performance, as measured by IOI-HA scores, QuickSIN, and REM, was attributed to this self-fitting strategy.
- The comparison to the "Professionally-Fit (PF)" group served as a benchmark for what a human expert (audiologist) would achieve.
- Therefore, the clinical study's SF arm is a direct measure of the algorithm's standalone performance in a real-world setting.
7. Type of Ground Truth Used
Bench/Non-Clinical Tests:
- Regulatory and Industry Standards: Ground truth is defined by explicit numerical thresholds and methodologies prescribed by 21 CFR 800.30 and ANSI/ASA/CTA standards.
- Human Factors: "Ground truth" is the identification of potential use errors and associated risks, which is derived from observing user interactions and conducting risk analysis.
Clinical Study (HAF Self-Fit vs. Professionally-Fit):
- Expert Consensus / Professional Practice: The "Professionally-Fit" group, whose devices were fitted by audiologists using a standard clinical fitting formula (NAL-NL2), served as the gold standard or ground truth for best clinical practice in hearing aid fitting. The HAF's performance was evaluated against this professional benchmark.
- Patient-Reported Outcomes (IOI-HA): The primary ground truth for effectiveness was the subjective perception of benefit, satisfaction, and quality of life as reported by the patients themselves via the IOI-HA questionnaire.
- Objective Outcomes Data (QuickSIN, REM): These objective functional measures also served as ground truth regarding speech intelligibility and actual gain.
Supplemental Clinical Data (Apple Hearing Test Feature Validation):
- Expert Consensus / Professional Practice: The comparison was made against "professionally derived audiograms," implying that these audiograms, established by trained hearing healthcare professionals, served as the ground truth.
8. Sample Size for the Training Set
The document does not explicitly state the sample size for the training set used to develop Apple's proprietary fitting formula within the HAF. The description only refers to the clinical study as establishing safety and effectiveness, and the HAF's fitting formula is described as "Apple's proprietary fitting formula." This formula would have been developed using a separate dataset prior to the validation study.
9. How the Ground Truth for the Training Set Was Established
Since the training set size and characteristics are not provided, the method for establishing its ground truth is also not explicitly stated in this document.
However, based on the nature of hearing aid fitting algorithms and the validation study design, it is highly probable that the proprietary fitting formula was developed and refined using:
- Large datasets of audiogram data: Likely anonymized audiograms from various sources, potentially including those collected by Apple's own Hearing Test Feature over time or from research collaborations.
- Patient-reported outcomes data: To correlate objective audiometric data with subjective patient benefit and preference.
- Expert knowledge/models: Incorporating established audiological principles, validated fitting targets (e.g., NAL-NL2, DSL v5), and clinical experience synthesized into an algorithmic form.
- Iterative development and testing: The "proprietary fitting formula" would have undergone extensive internal testing, likely including simulations and pilot studies with real users, where the "ground truth" would be established by comparing algorithm outputs to professional recommendations or patient preferences.