(324 days)
Natural Cycles is a stand-alone software application, intended for women 18 years and older, to monitor their fertility. Natural Cycles can be used for preventing a pregnancy (contraception) or planning a pregnancy (conception).
Natural Cycles is an over-the-counter web and mobile-based standalone software application that monitors a woman's menstrual cycle using information entered by the user and informs the user about her past, current and future fertility status. The following information is entered into the application by the user:
- daily basal body temperature (BBT) measurements
- information about the user's menstruation cycle (i.e., start date, number of days)
- optional ovulation or pregnancy test results
A proprietary algorithm evaluates the data and returns the user's fertility status.
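The algorithm itself is proprietary and is not described here. As a rough illustration of how daily BBT readings can indicate that ovulation has passed, the sketch below applies the textbook "three over six" rule from fertility awareness methods. This rule is a swapped-in technique, not the Natural Cycles algorithm; the function name, parameters, and sample readings are all hypothetical.

```python
from typing import Optional, Sequence


def first_sustained_rise(temps_c: Sequence[float],
                         baseline_days: int = 6,
                         rise_days: int = 3) -> Optional[int]:
    """Return the index of the first day of a sustained temperature rise.

    Textbook "three over six" rule: the rise is confirmed when `rise_days`
    consecutive readings are all higher than every one of the preceding
    `baseline_days` readings. Returns None if no such rise is found.
    """
    for start in range(baseline_days, len(temps_c) - rise_days + 1):
        baseline_max = max(temps_c[start - baseline_days:start])
        if all(t > baseline_max for t in temps_c[start:start + rise_days]):
            return start
    return None


# Hypothetical readings in degrees Celsius, one per day (not study data).
readings = [36.4, 36.5, 36.4, 36.3, 36.5, 36.4, 36.5, 36.8, 36.9, 36.8]
shift_day = first_sustained_rise(readings)
print(shift_day)  # index of the first elevated day, or None
```

A real product would also have to handle missing days, measurement noise, and cycle-to-cycle variation, which this sketch ignores.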
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criteria (Special Control) | Reported Device Performance |
|---|---|
| 1. Clinical performance testing must demonstrate the contraceptive effectiveness of the software in the intended use population, specifically with respect to the unintended pregnancy rate. | Clinical study results (v.3 algorithm, September 2017 to April 2018). Method failure rate: 0.6 per 100 woman-years, meaning that about 0.6 of every 100 women using the application for one year become pregnant because the application incorrectly displayed a green day when the woman was fertile. Perfect use Pearl Index: 1 per 100 woman-years, which includes method failures and failures of the contraceptive method chosen on red days. Typical use Pearl Index: 6.5 per 100 woman-years (95% CI: 5.9-7.1), which accounts for all possible reasons for pregnancy, including user behavior (e.g., unprotected intercourse on red days), failure of the contraceptive method used on red days, and method failure. Subgroup analysis (typical use Pearl Index): recent hormonal birth control use (within 60 days), 8.6 (7.2-10.0); no hormonal birth control use (within 12 months), 5.0 (4.3-5.7). The study enrolled 15,570 women for a total exposure of 7,353 woman-years; 475 pregnancies were observed (584 in the worst case). The fraction of days that were green was 48.8%. |
| 2. Human factors performance evaluation must be provided to demonstrate that the intended users can self-identify that they are in the intended use population and can correctly use the application, based solely on reading the directions for use for contraception. | A usability study was conducted with (b) (4) users. The study confirmed the following. Age: 98.9% of users were within the intended age range (18-45). Sexual activity on red days: 29% of women had sex on red days; of these, 49% used condoms, 25% withdrawal, 9% abstention, etc., and only 4% used no protection and took the risk. Reasons for using no protection on red days: responses showed that a high percentage understood the directive (e.g., trying to conceive, mistakenly confirming withdrawal, IUD in place, sex not penetrative); only 2% stated they did not know that red meant fertile. Mode comparison: pregnancy rates for "Prevent Mode" (contraception) users versus "Plan Mode" (conception) users showed that users understand the labeling and behave accordingly (low pregnancy rate in Prevent Mode, high in Plan Mode). Generalizability: the study was conducted OUS but was deemed generalizable to the US population because of similar education levels, user ages, temperature variation, and cycle lengths. |
| 3. Software verification, validation, and hazard analysis must be performed... a. A cybersecurity vulnerability and management process... b. A description of the technical parameters of the software, including the algorithm... | Software documentation: Major level of concern, with submitted documentation including Software/Firmware Description, Device Hazard Analysis, Software Requirement Specifications, Architecture Design Chart, Software Design Specifications, Traceability, Software Development Environment Description, Revision Level History, and Unresolved Anomalies. Risk analysis: comprehensive, addressing hazards, causes, severity, and control methods. Verification and validation: acceptable protocols provided for the unit, integration, and system levels. Cybersecurity: addressed data confidentiality, integrity, availability, DoS attacks, and malware, with controls and evidence of performance. Technical parameters/algorithm: full characterization provided, including a description of the algorithm that analyzes BBT and menstrual cycle data to detect ovulation and determine fertility status. |
| 4. Labeling must include specific warnings, instructions, and a summary of clinical validation. | Labeling is provided via the Instructions for Use manual (downloadable and in-app). Warnings/precautions: included (e.g., no contraceptive method is 100% effective, use another form of contraception on specified days, factors affecting accuracy, cannot protect against STIs). Hardware/OS requirements: not explicitly detailed in the provided text, but implied as part of the software documentation. Instructions: identify and explain how to use the application, including required user inputs and how to interpret outputs. Clinical summary: provides a summary of the clinical validation study and results, including effectiveness and comparison to other methods. |
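The typical use Pearl Index in the table can be checked against the reported exposure and pregnancy counts. The sketch below assumes the standard definition (pregnancies per 100 woman-years of exposure) and a simple ratio; the function name is hypothetical, and the worst-case figure it prints is only the same formula applied to the 584 count, not a value quoted in the text.

```python
def pearl_index(pregnancies: int, woman_years: float) -> float:
    """Pregnancies per 100 woman-years of exposure."""
    if woman_years <= 0:
        raise ValueError("woman_years must be positive")
    return 100.0 * pregnancies / woman_years


EXPOSURE_WOMAN_YEARS = 7353   # total exposure reported for the study
OBSERVED_PREGNANCIES = 475    # pregnancies observed
WORST_CASE_PREGNANCIES = 584  # worst-case pregnancy count

typical_use_pi = pearl_index(OBSERVED_PREGNANCIES, EXPOSURE_WOMAN_YEARS)
worst_case_pi = pearl_index(WORST_CASE_PREGNANCIES, EXPOSURE_WOMAN_YEARS)

print(f"Typical use Pearl Index: {typical_use_pi:.1f} per 100 woman-years")
print(f"Worst-case Pearl Index:  {worst_case_pi:.1f} per 100 woman-years")
```

With the reported counts, the first figure rounds to 6.5, which agrees with the typical use Pearl Index in the table.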
2. Sample size used for the test set and the data provenance
- Sample Size for Test Set: 15,570 women
- Data Provenance:
  - Country of Origin: 37 countries outside of the United States (OUS), with the majority from Sweden.
  - Retrospective or Prospective: Prospective. Women were followed prospectively from September 1, 2017, to April 30, 2018. A retrospective analysis was also conducted to validate ovulation identification.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
The provided text does not specify the number of experts or their qualifications for establishing ground truth for the test set's clinical outcomes (pregnancies).
4. Adjudication method for the test set
The provided text does not specify an explicit adjudication method for the test set. Pregnancies were detected via "pregnancy tests, via email follow-up or via the algorithm." The "worst case" pregnancy count was calculated by assuming pregnancy in women who left early with unknown status or where data indicated possible pregnancy without confirmation. This suggests a blend of user reporting and algorithmic inference for pregnancy detection, but not a formal expert adjudication process for each case.
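The difference between the observed and worst-case pregnancy counts can be illustrated with a small tally. This is a minimal sketch of the counting rule as described above; the status categories, names, and the toy cohort are hypothetical and are not taken from the study data.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class Status(Enum):
    CONFIRMED_PREGNANT = "confirmed_pregnant"
    POSSIBLY_PREGNANT = "possibly_pregnant"  # data suggest pregnancy, unconfirmed
    LEFT_UNKNOWN = "left_unknown"            # left early, pregnancy status unknown
    NOT_PREGNANT = "not_pregnant"


# Statuses counted as pregnancies under the worst-case assumption.
WORST_CASE_STATUSES = (
    Status.CONFIRMED_PREGNANT,
    Status.POSSIBLY_PREGNANT,
    Status.LEFT_UNKNOWN,
)


def pregnancy_counts(statuses: Iterable[Status]) -> Tuple[int, int]:
    """Return (observed, worst_case) pregnancy counts for a cohort."""
    counts = Counter(statuses)
    observed = counts[Status.CONFIRMED_PREGNANT]
    worst_case = sum(counts[s] for s in WORST_CASE_STATUSES)
    return observed, worst_case


# Toy cohort for illustration only.
cohort = (
    [Status.CONFIRMED_PREGNANT] * 3
    + [Status.POSSIBLY_PREGNANT] * 1
    + [Status.LEFT_UNKNOWN] * 2
    + [Status.NOT_PREGNANT] * 14
)
observed, worst_case = pregnancy_counts(cohort)
print(observed, worst_case)
```

The worst-case count is always at least as large as the observed count, so it gives an upper bound on the pregnancy rate under this assumption.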
5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance
No, a multi-reader multi-case (MRMC) comparative effectiveness study was not explicitly described. The study evaluates the device's standalone effectiveness as a contraceptive, which women use independently. It doesn't assess how human healthcare providers improve their diagnostic or decision-making ability with or without the AI's assistance.
6. Whether a standalone study (i.e., algorithm-only performance without a human in the loop) was done
Yes, a standalone study was done. The entire clinical study described (with the Pearl Index calculations) focuses on the Natural Cycles application's performance as a "stand-alone software application" for contraception. Women interact directly with the app, entering data, and the app provides fertility status. The effectiveness rates (method failure, perfect use, typical use Pearl Index) directly reflect the algorithm's performance in real-world use without a healthcare provider actively interpreting the algorithm's output for the user's daily contraceptive decisions.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
The ground truth for the primary outcome (contraceptive effectiveness) was outcomes data: specifically, pregnancy detection. This was identified "via pregnancy tests, via email follow-up or via the algorithm."
For the retrospective analysis on ovulation identification: the ground truth was "ovulation day was correctly identified whether using temperature plus LH test results or just temperature alone." This implies comparison to either a gold standard of combined physiological markers (temperature + LH tests) or an internal standard from the algorithm itself for validation.
8. The sample size for the training set
The provided text does not explicitly state the sample size for the training set used to develop the Natural Cycles algorithm. It mentions that Natural Cycles "utilized real-world data to evaluate the effectiveness of the current version of the algorithm (v.3)," referring to the 15,570 women study as the evaluation of the algorithm, rather than its training.
9. How the ground truth for the training set was established
The provided text does not explicitly describe how the ground truth for the training set was established. It states: "Natural Cycles has provided a full characterization of the technical parameters of the software, including a description of the algorithm that analyzes the patient's basal body temperature and menstrual cycle data to detect the day of ovulation and, by accounting for various sources of uncertainty, to determine the fertility status." This implies a biologically-based ground truth related to ovulation and fertile windows, likely established through extensive physiological research and potentially validated against various methods (e.g., hormone levels, ultrasound, BBT, LH tests) over time. However, the specific methodology for the training data ground truth is not detailed.
§ 884.5370 Software application for contraception.
(a) Identification. A software application for contraception is a device that provides user-specific fertility information for preventing a pregnancy. This device includes an algorithm that performs analysis of patient-specific data (e.g., temperature, menstrual cycle dates) to distinguish between fertile and non-fertile days, then provides patient-specific recommendations related to contraception.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Clinical performance testing must demonstrate the contraceptive effectiveness of the software in the intended use population.
(2) Human factors performance evaluation must be provided to demonstrate that the intended users can self-identify that they are in the intended use population and can correctly use the application, based solely on reading the directions for use for contraception.
(3) Software verification, validation, and hazard analysis must be performed. Documentation must include the following:
(i) A cybersecurity vulnerability and management process to assure software functionality; and
(ii) A description of the technical parameters of the software, including the algorithm used to determine fertility status and alerts for user inputs outside of expected ranges.
(4) Labeling must include:
(i) The following warnings and precautions:
(A) A statement that no contraceptive method is 100% effective.
(B) A statement that another form of contraception (or abstinence) must be used on days specified by the application.
(C) Statements of any factors that may affect the accuracy of the contraceptive information.
(D) A warning that the application cannot protect against sexually transmitted infections.
(ii) Hardware platform and operating system requirements.
(iii) Instructions identifying and explaining how to use the software application, including required user inputs and how to interpret the application outputs.
(iv) A summary of the clinical validation study and results, including effectiveness of the application as a stand-alone contraceptive and how this effectiveness compares to other forms of legally marketed contraceptives.
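Special control (3)(ii) refers to alerts for user inputs outside of expected ranges. The sketch below shows one way such a check could look for a daily temperature entry. The limits, class, and function names are hypothetical and chosen only for illustration; neither the regulation nor the summary specifies actual ranges.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical plausibility limits in degrees Celsius (illustration only).
BBT_MIN_C = 35.0
BBT_MAX_C = 38.0


@dataclass
class DailyEntry:
    """One day's user input; the temperature may be missing."""
    temperature_c: Optional[float] = None


def range_alerts(entry: DailyEntry) -> List[str]:
    """Return alert messages for inputs outside the expected range."""
    alerts: List[str] = []
    temp = entry.temperature_c
    if temp is not None and not (BBT_MIN_C <= temp <= BBT_MAX_C):
        alerts.append(
            f"Temperature {temp:.2f} C is outside the expected range "
            f"{BBT_MIN_C:.1f}-{BBT_MAX_C:.1f} C; please check the reading."
        )
    return alerts


print(range_alerts(DailyEntry(temperature_c=36.6)))  # no alerts expected
print(range_alerts(DailyEntry(temperature_c=39.5)))  # one alert expected
```

A missing temperature produces no alert in this sketch; whether a missing entry should itself trigger a prompt is a separate design decision.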