Search Results
Found 4 results
510(k) Data Aggregation
(242 days)
CHLOE BLAST is indicated to provide adjunctive information on events occurring during embryo development that may predict further development to the blastocyst stage on Day 5 of development. This adjunctive information aids in the selection of embryo(s) for transfer on Day 3, when, following morphological assessment, there are multiple embryos deemed suitable for transfer or freezing.
CHLOE BLAST is to be used only for the analysis of images captured by the EmbryoScope version D incubator system.
CHLOE BLAST is a decision support tool designed to automatically analyze time lapse videos of developing embryos, retrieved from EmbryoScope (version D) Time Lapse Incubators (TLI) system. It is intended to provide adjunctive information on developmental events up to Day 3 that may predict progression to the blastocyst stage by Day 5.
CHLOE BLAST is a cloud-based software as a medical device (SaMD) that uses a convolutional neural network (CNN) to analyze TLI videos from insemination to Day 3. The output is the "CHLOE Score", which is a blastocyst development prediction value associated with the likelihood of the embryo reaching blastocyst stage at Day 5.
This information aids in the selection of embryo(s) for transfer on Day 3, when, following morphological assessment, there are multiple normally fertilized embryos deemed suitable for transfer or freezing. In a clinical setting, the CHLOE score is intended to be used by the embryologist as adjunctive information, to be used only after the embryologists complete their independent morphological assessments based on the lab's standard of care (e.g., Istanbul Consensus Grading).
The main user interaction is via the graphic user interface (GUI) available via Chrome browsers. It includes screens for treatments overview, manual embryo assessment, and score presentation, and integrates with the day-to-day normal operation in IVF clinics using TLI.
Here's a breakdown of the acceptance criteria and the study proving the device meets those criteria, based on the provided FDA clearance letter:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
Note: The document presents acceptance criteria primarily as "AUC lower bound >0.8" for various performance metrics. It also establishes an Odds Ratio (OR) greater than 1 as the primary endpoint for clinical utility.
| Metric / Test | Acceptance Criterion | Reported Device Performance | Meets Criterion? |
|---|---|---|---|
| Non-Clinical Performance - Algorithm Validation | |||
| Morphokinetic Events Detection Accuracy (Overall) | N/A (Accuracy reported, not AUC) | 0.82 (95% CI: 0.81, 0.84) | N/A |
| Morphokinetic Events Detection Accuracy (2PNs) | N/A (Accuracy reported, not AUC) | 0.84 (95% CI: 0.83, 0.85) | N/A |
| Morphokinetic Events Detection (Overall AUC) | AUC lower bound >0.8 | N/A (Accuracy reported, not AUC for overall) | Yes (Implicitly, as sub-model AUCs are mentioned in relation to this criterion) |
| Morphokinetic Events Detection (2PNs Sub-model AUC) | AUC lower bound >0.8 | 0.84 (95% CI: 0.83, 0.85). The text reports this figure as accuracy ("Accuracy of the sub-model... was 0.84"), yet it immediately follows the statement "AUC lower bound >0.8 were met." The document is therefore inconsistent about which metric the value represents. | Yes (assuming 0.84 refers to AUC) |
| Morphokinetic Events Detection (Sub-groups: Age <35, ≥41) | AUC lower bound >0.8 | Not met (Performance was not consistent, indicating some subgroups might not have met the criterion, though specific AUC values for these subgroups are not provided) | No (Stated in text) |
| Morphokinetic Events Detection (Sub-groups: Underweight, Obese BMI) | AUC lower bound >0.8 | Not met (Performance was not consistent, indicating some subgroups might not have met the criterion, though specific AUC values for these subgroups are not provided) | No (Stated in text) |
| Blast Prediction (Overall AUC) | AUC lower bound >0.8 | 0.88 (95% CI: 0.86, 0.90) | Yes |
| Blast Prediction (All Subgroups except Obese BMI) | AUC lower bound >0.8 | AUC similar and higher than 0.8 | Yes |
| Blast Prediction (Obese BMI Subgroup AUC) | AUC lower bound >0.8 | Not met (However, specific AUC for this subgroup is not provided, only that it "was not met") | No (Stated in text) |
| Blast Prediction (2PN embryos AUC) | N/A (Reduction in AUC observed, but no specific criterion for this subgroup) | 0.81 (95% CI: 0.78, 0.83) | N/A (But still > 0.8) |
| Blast Prediction (Good/Fair embryos AUC) | N/A (Reduction in AUC observed, but no specific criterion for this subgroup) | 0.74 (95% CI: 0.69, 0.78) | N/A (Lower than 0.8, but explanation given for clinical study focusing on this subgroup) |
| Non-Clinical Performance - Reproducibility Test | |||
| AUC with Optical Augmentations | AUC lower bound >0.8 | All AUCs > 0.89, CI lower bound > 0.87 | Yes |
| Clinical Performance - Primary Endpoint | |||
| Odds Ratio (OR) for Good/Fair Embryos (CHLOE-assisted) | OR > 1 | 5.67 (95% CI: 4.6, 6.99) | Yes |
| Clinical Performance - Secondary Endpoints (Highlights) | |||
| OR for All Embryos (CHLOE-assisted) | N/A (Secondary endpoint) | 8.51 (95% CI: 6.97, 10.38) | N/A |
| Sensitivity (CHLOE-assisted) | N/A (Performance measure) | 0.846 | N/A |
| Specificity (CHLOE-assisted) | N/A (Performance measure) | 0.444 | N/A |
| PPV (CHLOE-assisted) | N/A (Performance measure) | 0.629 | N/A |
| NPV (CHLOE-assisted) | N/A (Performance measure) | 0.721 | N/A |
| OR for Individual Embryologists (CHLOE-assisted) | OR > 1 | Improved and > 1 for all embryologists | Yes |
| OR in Subgroups (Age and BMI) (CHLOE-assisted) | OR > 1 | OR > 1 in all subgroups (lower bound of CI > 1 in all but one age and one BMI category) | Yes (Mostly) |
| Subject-level Sensitivity (CHLOE-assisted) | N/A (Performance measure) | 87.50% to 92.86% | N/A |
| Top 2 Embryo Analysis OR (CHLOE-assisted) | N/A (Performance measure) | 10.73 (95% CI: 6.19, 18.60) | N/A |
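The acceptance rule in the table compares the lower bound of the 95% confidence interval, not the point estimate, with 0.8. The short Python sketch below applies that rule to three blast-prediction rows whose intervals are reported above. The `AucResult` type and the `meets_lower_bound` helper are illustrative names, not part of the submission. Applying the rule to the 2PN and Good/Fair rows is an assumption made only for illustration, since the table lists no criterion for those subgroups.

```python
from dataclasses import dataclass

THRESHOLD = 0.8  # "AUC lower bound >0.8", as stated in the table above


@dataclass(frozen=True)
class AucResult:
    name: str
    auc: float
    ci_low: float
    ci_high: float


def meets_lower_bound(result: AucResult, threshold: float = THRESHOLD) -> bool:
    """Return True when the 95% CI lower bound strictly exceeds the threshold."""
    return result.ci_low > threshold


# Values copied from the table; the type and helper names are hypothetical.
results = [
    AucResult("Blast prediction, overall", 0.88, 0.86, 0.90),
    AucResult("Blast prediction, 2PN embryos", 0.81, 0.78, 0.83),
    AucResult("Blast prediction, Good/Fair embryos", 0.74, 0.69, 0.78),
]

for r in results:
    verdict = "meets" if meets_lower_bound(r) else "does not meet"
    print(f"{r.name}: AUC {r.auc:.2f} (CI low {r.ci_low:.2f}) {verdict} the rule")
```

The 2PN row shows why the distinction matters: its point estimate of 0.81 is above 0.8, but its lower bound of 0.78 is not.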
Study Details Proving Device Meets Acceptance Criteria
2. Sample Sizes and Data Provenance
- Non-Clinical Performance (Algorithm Validation):
- Morphokinetic Events Detection: 1,094 embryos from 143 slides, collected from two sites: one in the US and one in Norway. The data provenance appears to be retrospective; the document describes a "test dataset... entirely independent from the dataset utilized in the CHLOE BLAST clinical study."
- Blast Prediction: 1,726 embryos from 233 slides. Collected from two sites: one in the US and one in Norway. The data provenance is retrospective.
- Clinical Performance (CHLOE BLAST Clinical Study):
- 703 embryos from 59 mothers.
- Data collected from three different sites located in the United States.
- Data provenance: Prospective collection for the purpose of this study (described as a "pivotal, multicenter, single arm, observational, prospective assessment study").
3. Number of Experts and Qualifications for Ground Truth
- Non-Clinical Performance (Algorithm Validation):
- Morphokinetic Stages and Blast Annotations: Three independent embryologists.
- Qualifications: "The annotators were not involved in the training or tuning of the model and were blinded to each other's labels." No explicit years of experience are stated for these annotators.
- Clinical Performance (CHLOE BLAST Clinical Study):
- Morphology Grading (Assessors): Three embryologists.
- Qualifications: "blinded to CHLOE information," and performed grading according to SART standards. No explicit years of experience are stated.
- Clinical Assessment (Panelists): Five independent embryologists.
- Qualifications: All "in practice during the study period and from a range of geographical areas within the United States." 3 were senior embryologists with over 10 years of clinical embryology experience each, and the other 2 were junior embryologists with less than 3 years of clinical embryology experience.
4. Adjudication Method for the Test Set
- Non-Clinical Performance (Algorithm Validation):
- Ground Truth: "Each embryo video was viewed by three independent embryologists who provided their morphokinetic stages and Blast annotations based on the time-lapse videos. The annotators were not involved in the training or tuning of the model and were blinded to each other's labels." This suggests a consensus-based approach, but the document does not say how disagreements were resolved. It states only that "The TLI videos were annotated at a frame level with the ground truth of one of the morphokinetic stages and at a video level with blastulation results."
- Clinical Performance (CHLOE BLAST Clinical Study):
- Morphology Grading (Assessors): "Then, the following parameters were categorized by majority agreement (at least 2 of 3 Assessors): Severe asymmetry (yes/no), Fragmentation > 25% (yes/no), Number of cells (1 through 8, 9≤)." This is a clear 2 out of 3 (2+1) consensus method for specific parameters.
- Clinical Assessment (Panelists): No explicit adjudication method is stated for the Panelists' predictions. Each Panelist performed their own independent predictions, and the study analyzed the collective performance as well as individual improvements.
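The "at least 2 of 3 Assessors" rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the `majority_label` function is a hypothetical name, the example gradings are invented, and returning `None` when no two assessors agree is an assumption, since the document does not say how such cases were handled.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_label(labels: Sequence[Hashable], min_votes: int = 2) -> Optional[Hashable]:
    """Return the label chosen by at least `min_votes` assessors, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


# Invented gradings from three assessors for a single embryo.
print(majority_label(["yes", "no", "yes"]))  # severe asymmetry (yes/no)
print(majority_label([8, 8, 7]))             # number of cells
print(majority_label([6, 7, 8]))             # no two assessors agree
```

For the two yes/no parameters, three votes always produce a majority. Only the multi-valued cell count can end without agreement.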
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Yes, an MRMC comparative effectiveness study was done as part of the clinical performance study.
- Effect Size of Human Readers Improvement with AI vs. without AI assistance:
- The primary endpoint focused on the Odds Ratio (OR) for predicting blastocyst formation in Good/Fair embryos.
- Without AI assistance (Morphology Only): OR = 3.77 (95% CI: 2.97, 4.79)
- With AI assistance (Morphology + CHLOE Score): OR = 5.67 (95% CI: 4.6, 6.99)
- This represents an improvement in the Odds Ratio from 3.77 to 5.67 for the primary endpoint.
- For all embryos, the OR improved from 6.93 (without CHLOE) to 8.51 (with CHLOE).
- For subject-level sensitivity, it improved from 80.36%-83.93% (traditional morphology) to 87.50%-92.86% (with CHLOE).
- For Top 2 Embryo analysis, the OR improved from 3 (without CHLOE) to 10.73 (with CHLOE).
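Confidence intervals for odds ratios are usually built on the log scale, so the point estimate should sit near the geometric mean of the two bounds. The Python sketch below runs that consistency check on the primary-endpoint figures quoted above and recovers the implied standard error of log(OR). It assumes the intervals are symmetric in log(OR) and use z = 1.96; the document does not state how the intervals were computed, and `log_scale_summary` is a hypothetical helper name.

```python
import math


def log_scale_summary(ci_low: float, ci_high: float, z: float = 1.96):
    """Return (geometric mean of the bounds, implied SE of log(OR)).

    Assumes the interval is symmetric in log(OR), which the source
    document does not confirm.
    """
    geo_mean = math.sqrt(ci_low * ci_high)
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return geo_mean, se_log_or


# Primary-endpoint values reported above (Good/Fair embryos).
reported = {
    "morphology only": (3.77, 2.97, 4.79),
    "morphology + CHLOE": (5.67, 4.6, 6.99),
}

for label, (or_point, low, high) in reported.items():
    geo_mean, se = log_scale_summary(low, high)
    print(f"{label}: reported OR {or_point}, geometric mean of CI {geo_mean:.2f}, "
          f"SE(log OR) {se:.3f}, CI excludes 1: {low > 1}")
```

Both reported estimates agree with the geometric mean of their bounds to two decimal places. The sketch does not test whether the difference between the two conditions is itself significant, which would need the paired reader-level data.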
6. Standalone (Algorithm Only without Human-in-the-Loop) Performance
- Yes, a standalone performance assessment was done as part of the "Non-Clinical Performance – Algorithm Validation" section.
- The algorithm's performance in predicting blastocyst formation was assessed independently, yielding an AUC of 0.88 (95% CI: 0.86, 0.90). This demonstrates the algorithm's capability on its own.
7. Type of Ground Truth Used
- For Non-Clinical Performance (Algorithm Validation):
- Expert Consensus: Morphokinetic stages and blast annotations were established by three independent embryologists.
- Outcomes Data: The "blastulation results" (blastocyst Yes/No) are actual outcomes.
- For Clinical Performance (CHLOE BLAST Clinical Study):
- Expert Consensus: Morphology grading by three "Assessors" with majority agreement (2 out of 3).
- Outcomes Data: The "actual blastocyst outcome" (Yes/No) which the algorithm and human readers are predicting.
8. Sample Size for the Training Set
- The document states: "The study dataset included data collected specifically for the purpose of this study according to the predefined inclusion and exclusion criteria and was segregated from algorithm training and verification datasets."
- "The dataset used for the performance test was entirely independent from the dataset utilized in the CHLOE BLAST clinical study described in section 9, and the clinics that provided data for the performance dataset were not used to collect data for the clinical study."
- The specific sample size for the training set is NOT PROVIDED in this document. It only clearly states that the various test sets were independent from the training data.
9. How Ground Truth for the Training Set Was Established
- The document implies that the training data exists and was used to develop the CNN, but it does NOT specify how the ground truth for the training set was established. It only focuses on how ground truth was established for the independent testing and clinical validation sets.
(290 days)
The KIDScore D3 tool provides decision support for prediction of embryos developing to the blastocyst stage by scoring them according to their statistical viability.
Adjunctive information provided by KIDScore D3 aids in the selection of embryo(s) for either transfer on Day 3, freezing or continued embryo development when, following morphological assessment on Day 3, there are multiple embryos deemed suitable for transfer or freezing.
The KIDScore D3 tool is only to be used with the EmbryoScope timelapse incubator systems.
The KIDScore D3 decision support tool is an adjunctive algorithm that is designed to support embryologists in their decision about which embryos are suitable for transfer. The tool is an optional accessory to the EmbryoViewer software. It is used in the "Compare & Select" function. The "D3" in the name refers to the use of the algorithm on Day 3 for aiding the embryologist in preparing for transfer of the embryo to the female patient.
KIDScore D3 utilizes the following manually annotated parameters to aid in identifying embryos that are suitable for transfer:
- Pronuclei (number of pronuclei)
- tPNf (time from insemination until pronuclei fading)
- t2 (time from insemination to complete division to two cells)
- t3 (time from insemination to complete division to three cells)
- t4 (time from insemination to complete division to four cells)
- t5 (time from insemination to complete division to five cells)
- t8 (time from insemination to complete division to eight cells)
The KIDScore D3 assigns scores by comparing the parameters above in embryos to the model criteria, one criterion at a time until the process stops either because the embryo did not pass one of the criteria in the sequence or because the last criterion in the model was reached. From the information available at day three of incubation, the KIDScore D3 divides embryos into five score groups (1-5, as described below):
- 0 = The embryo is not 2PN
- 1 = Initial development was too fast or the embryo displayed a direct cleavage from one to three cells
- 2 = The embryo was slow to develop
- 3 = Embryo development was irregular and the development pace increased from day two to day three
- 4 = Embryo development was irregular and the development pace slowed from day two to day three, and/or the number of cells annotated at 66 hours was not as expected
- 5 = The embryo passed all of the avoidance criteria included in the model.
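The one-criterion-at-a-time scoring described above is a first-failure cascade: the embryo receives the score attached to the first criterion it fails, or the top score if it passes them all. The Python sketch below shows only that control flow. Every numeric cut-off is a placeholder invented for illustration, the names `Annotations`, `CRITERIA`, and `score_embryo` are hypothetical, and only the first three criteria are modelled. The actual KIDScore D3 criteria are not given in the document.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Annotations:
    pronuclei: int  # number of pronuclei
    t2: float       # hours from insemination to two cells
    t3: float       # hours from insemination to three cells


# Each entry: (score assigned on failure, test the embryo must pass).
# All thresholds are PLACEHOLDERS, not the real model criteria.
Criterion = Tuple[int, Callable[[Annotations], bool]]

CRITERIA: List[Criterion] = [
    (0, lambda a: a.pronuclei == 2),    # embryo is 2PN
    (1, lambda a: a.t3 - a.t2 >= 5.0),  # not too fast / no direct cleavage
    (2, lambda a: a.t2 <= 30.0),        # not slow to develop
    # Criteria for scores 3 and 4 are omitted from this sketch.
]


def score_embryo(a: Annotations, criteria: List[Criterion] = CRITERIA,
                 top_score: int = 5) -> int:
    """Return the score of the first failed criterion, else the top score."""
    for failure_score, passes in criteria:
        if not passes(a):
            return failure_score
    return top_score


print(score_embryo(Annotations(pronuclei=3, t2=25.0, t3=36.0)))
print(score_embryo(Annotations(pronuclei=2, t2=25.0, t3=26.0)))
print(score_embryo(Annotations(pronuclei=2, t2=25.0, t3=36.0)))
```

Because evaluation stops at the first failure, the order of the criteria determines the score of an embryo that would fail more than one of them.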
One or more computers running the EmbryoViewer software may be connected to the ES Server. KIDScore D3 is stored on the computer running the ES Server software. Calculations related to the model in KIDScore D3 are performed on the computer running the ES Server software.
This document describes the KIDScore D3 device, an adjunctive algorithm designed to support embryologists in selecting suitable embryos for transfer by predicting their likelihood of developing to the blastocyst stage.
Here's an analysis of the acceptance criteria and the study proving the device meets them:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the primary endpoint of the clinical study, which required the blastocyst Odds Ratio (OR) for the adjunct prediction to be statistically significantly greater than 1 for Good/Fair embryos (graded A, B, or C using Day 3 morphology).
| Acceptance Criterion / Reported Statistic | Reported Device Performance (KIDScore D3) | Reported Device Performance (Eeva System, Predicate) |
|---|---|---|
| Blastocyst Odds Ratio (OR) for adjunct prediction (for Good/Fair embryos) statistically significantly greater than 1 | 4.13 | 2.57 |
| 95% Confidence Interval for OR | 3.48 - 4.9 | 1.88 - 3.51 |
| P-value for OR | <0.0001 | <0.0001 |
Conclusion: The KIDScore D3 device's reported performance (OR of 4.13, 95% CI 3.48 to 4.9, p-value <0.0001) meets the acceptance criterion, since the entire confidence interval lies above 1. This supports its ability to predict blastocyst outcome when used as an adjunct to morphology.
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 81 patients, including their embryos (number of embryos not explicitly stated for the test set, but mentioned as "embryos from 81 patients").
- Data Provenance: Retrospective. The data was a subset taken from a total collection of 4152 embryos from 1338 treatments carried out in European IVF clinics between 2009 and 2014.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
The document does not explicitly state the number of experts used to establish the ground truth for the test set nor their specific qualifications (e.g., years of experience). It mentions:
- "Embryologists were masked to imaging data and evaluation was only based on morphology and KIDScore D3 scores." This implies embryologists were involved in the morphological grading, which is part of the overall assessment.
- The "adjunct prediction of blastocyst outcome and the actual blastocyst outcome" were assessed. This "actual blastocyst outcome" serves as the ground truth.
4. Adjudication Method for the Test Set
The document does not explicitly describe an adjudication method (like 2+1 or 3+1) for resolving discrepancies in ground truth establishment for the test set. It states that "Embryologists were masked to imaging data and evaluation was only based on morphology and KIDScore D3 scores," which suggests individual assessments were made.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done, If So, What Was the Effect Size of How Much Human Readers Improve with AI vs Without AI Assistance
No, a multi-reader multi-case (MRMC) comparative effectiveness study evaluating human readers' improvement with and without AI assistance was not explicitly conducted or reported. The study focused on the device's ability to predict blastocyst outcome as an adjunct to morphology, and its performance was compared to the predicate device, not necessarily against human readers in an assisted vs. unassisted paradigm for their performance improvement. The study evaluated "the utilization of established morphology methods with adjunct outcome of an algorithm (KIDScore D3)" and concluded that "the adjunct use of KIDScore D3 improved the selection of embryos for transfer compared with morphology alone." However, it does not quantify the effect size of how much human readers themselves improved.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was Done
Yes, a standalone performance of the algorithm was implicitly evaluated. The primary endpoint, "the association between the adjunct prediction of blastocyst outcome and the actual blastocyst outcome," assesses the KIDScore D3's predictive capability in conjunction with morphology. While it's adjunctive information, the study's results (Odds Ratio for KIDScore D3) clearly reflect the algorithm's standalone predictive power within the context of the study design. It was evaluated for its ability to predict which embryos are most likely to develop to blastocyst stage. The phrase "Embryologists were masked to imaging data and evaluation was only based on morphology and KIDScore D3 scores" means the KIDScore D3 scores were provided directly to the embryologists as part of their evaluation, indicating the algorithm's output was central to the assessment.
7. The Type of Ground Truth Used
The ground truth used was actual blastocyst outcome (i.e., whether an embryo developed to the blastocyst stage). This can be considered a form of outcome data directly observed during the IVF process.
8. The Sample Size for the Training Set
The document does not explicitly state the sample size for the training set. It mentions the "data included embryos from 81 patients" for the clinical study (test set), and that this "data are a subset taken from a total collection of 4152 embryos from 1338 treatments where all sibling embryos have been annotated for the morphokinetic events required by KIDScore D3." This larger collection of 4152 embryos is likely the source from which the algorithm was developed and presumably trained, but the exact training set size is not segregated.
9. How the Ground Truth for the Training Set Was Established
The document does not explicitly state how the ground truth for the training set (if it was a separate, formally defined set) was established. However, given the nature of the device, it's highly probable that similar to the test set, the ground truth for the training data (from the "total collection of 4152 embryos") would have been established by observing whether the embryos developed to the blastocyst stage, potentially combined with expert morphological assessment done during the standard clinical practice at the time the data was collected (2009-2014). The manual annotation of parameters like pronuclei, tPNf, t2, t3, t4, t5, and t8 would have been linked to these observed blastocyst outcomes to develop the KIDScore D3 model.
(111 days)
The Eeva System is indicated to provide adjunctive information on events occurring during the first two days of development that may predict further development to the blastocyst stage on Day 5 of development. This adjunctive information aids in the selection of embryo(s) for transfer on Day 3 when, following morphological assessment on Day 3, there are multiple embryos deemed suitable for transfer or freezing. The device may also be used to collect additional time-lapse images until Day 5 of development for embryos not selected for transfer, to allow monitoring of continued embryo development.
The Eeva™ System is an Assisted Reproduction Embryo Image Assessment System (21 CFR 884.6195), installed in an IVF lab and used by embryologists and other IVF professionals. None of the System components have an individual, prior 510(k) clearance. Eeva System, Model EVS210 requires the use of the 12-microwell configuration of the Eeva™ Dish (K141663, also referred to as the "dish"), which is placed on the Eeva Scope (an assisted reproductive microscope). The Eeva Scopes are placed in commercially-available standard-sized incubators. The microscope employs high resolution time-lapse imaging to record an embryo's development during its first two days of incubation. Automated measurements of cell division timing parameters and the Eeva Test results are provided to the user after approximately 42 hours predicting the likelihood of whether an embryo will develop to the blastocyst stage. In Eeva System, Model EVS2210, image recording may continue through Day 5 of embryo development.
The Eeva™ System (EVS2210) provides adjunctive information on early embryo development (first two days) to predict progression to the blastocyst stage by Day 5. This information assists in selecting embryos for transfer on Day 3, especially when multiple suitable embryos are identified through morphological assessment.
1. Table of Acceptance Criteria and Reported Device Performance
The provided document does not explicitly list acceptance criteria for specific performance metrics (like sensitivity, specificity, PPV, NPV) with predefined thresholds. However, it states that "simulated clinical testing (mechanical analysis) demonstrates that the Eeva System Model EVS2210 is informative, and the average specificity, sensitivity, positive predictive value, and negative predictive value performance are substantially equivalent in the adjunctive use of the subject and predicate devices." This implies that the performance of the Eeva System (EVS2210) closely matched that of its predicate device, Eeva System (EVS2000).
The "Algorithm Software Validation" and "Simulated Clinical Use" tests aimed to evaluate the ability of the Eeva System software to predict blastocyst formation and its clinical performance, including determination of sensitivity, specificity, positive predictive value, negative predictive value, and odds ratio.
Since the document asserts substantial equivalence, the implied acceptance criteria are that the EVS2210's performance metrics (sensitivity, specificity, PPV, NPV, odds ratio) should be comparable to or not worse than those of the predicate device (EVS2000).
| Metric | Acceptance Criteria (Implied, relative to predicate) | Reported Device Performance (Implied) |
|---|---|---|
| Blastocyst Prediction | "informative" and "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
| Sensitivity | "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
| Specificity | "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
| Positive Predictive Value | "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
| Negative Predictive Value | "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
| Odds Ratio | "substantially equivalent" to predicate device EVS2000 | Met, based on substantial equivalence claim |
2. Sample size used for the test set and the data provenance
The document mentions "Simulated Clinical Use" for evaluation but does not explicitly state the sample size used for the test set or the data provenance (e.g., country of origin, retrospective/prospective). It refers to "clinical data submitted for the predicate device" as being "representative of expected safety and effectiveness of the Eeva System Model EVS2210," but this doesn't specify if new data was used for the EVS2210's simulated clinical use or if it entirely leveraged the predicate's data.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
The document does not specify the number or qualifications of experts used to establish the ground truth for the "Simulated Clinical Use" test set.
4. Adjudication method for the test set
The document does not describe any adjudication method used for the test set.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance
The document does not mention a multi-reader multi-case (MRMC) comparative effectiveness study, nor does it describe an effect size for human reader improvement with or without AI assistance. The device provides "adjunctive information" to aid embryologists, suggesting it's intended for human-in-the-loop use, but a formal MRMC study is not detailed.
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
The "Simulated Clinical Use" is described as evaluating "clinical performance of the Eeva System software including determination of sensitivity, specificity, positive predictive value, negative predictive value, and odds ratio." This suggests a standalone evaluation of the algorithm's performance in predicting blastocyst formation. The device provides "adjunctive information," implying its output is then used by an embryologist. Therefore, a standalone evaluation of the software's predictive capability appears to have been performed.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
The ground truth used for the "Algorithm Software Validation" and "Simulated Clinical Use" tests was the "blastocyst formation." This is an objective biological outcome (whether an embryo develops to the blastocyst stage by Day 5).
8. The sample size for the training set
The document does not provide information regarding the sample size for the training set used for the Eeva System's algorithm.
9. How the ground truth for the training set was established
The document does not provide information on how the ground truth for the training set was established. It only mentions that the device evaluates cell division timing parameters to predict blastocyst formation.
(651 days)
The Eeva System is indicated to provide adjunctive information on events occurring during the first two days of development that may predict further development to the blastocyst stage on Day 5 of development. This adjunctive information aids in the selection of embryo(s) for transfer on Day 3 when, following morphological assessment on Day 3, there are multiple embryos deemed suitable for transfer or freezing.
The Eeva System provides image recording and automated analysis of cell division from high resolution time-lapse images collected until day 3 (72 hours) of development. Results of cell division timing parameters (time from first to second mitosis and time from second to third mitosis) are provided to the user in addition to a prediction of the likelihood that an embryo will develop to the blastocyst stage. These timing parameters are based on those published in a study by Wong et al. (2010).
The Eeva System incorporates: (1) a set of up to four time-lapse image microscopes that automatically take darkfield microscopy images of embryos at regular intervals (every 5 minutes) while the embryos remain in the incubator environment, (2) Eeva Computer and other components (Control Box, Station, Scope Screen and Printer), (3) system software for image capture and recording, user interface, and patient database and (4) image analysis software that automatically identifies embryo development events, compares their times to specified timing parameters and makes a prediction of embryo development to the blastocyst stage. The system is installed in an In Vitro Fertilization (IVF) laboratory, and is to be used as an adjunct to the traditional morphological method to identify the embryos that are more likely to develop into blastocysts.
The Eeva™ System is indicated as an adjunct to traditional morphology evaluation to aid in the selection of embryo(s) for transfer on Day 3 when multiple embryos are deemed suitable for transfer or freezing. It helps predict the likelihood of an embryo developing to the blastocyst stage on Day 5/6.
1. Table of Acceptance Criteria & Reported Device Performance:
The document primarily focuses on clinical performance characteristics rather than specific hard-coded acceptance thresholds for every metric. However, for "Software Validation," a clear acceptance criterion is defined and met. For the primary and secondary endpoints in the Pivotal Adjunct Use Study, the outcome of statistical significance against stated objectives serves as the "acceptance."
| Metric | Acceptance Criteria | Reported Device Performance |
|---|---|---|
| Software Validation | Specificity of Eeva System software non-inferior to embryologist measurements; Lower limit of 95% CI for specificity of Eeva System software ≥ 65%. | Eeva System specificity: 85.12%. Embryologist specificity: 82.64%. Lower limit of 95% CI for Eeva specificity: 77.71%. Met. |
| Pivotal Adjunct Use Study (Primary Endpoint) | Blastocyst Odds Ratio (OR) for adjunct prediction (for Good/Fair embryos) statistically significantly greater than 1. | Overall OR for adjunct prediction: 2.56 (95% CI: [1.75, 3.74], p<.0001) using pre-specified analysis; 2.57 (95% CI: [1.88, 3.51]) using GLMM. Both significantly greater than 1. Met. |
| Pivotal Adjunct Use Study (Secondary Endpoints - Accuracy Measures) | Improve specificity and maintain acceptable sensitivity while showing improved PPV and comparable NPV/NLR. The primary goal was to enhance selection among already good/fair embryos. | Specificity (from morphology alone to adjunct): 39% to 76% (37 percentage point improvement). Sensitivity (from morphology alone to adjunct): 72% to 45% (27 percentage point decrease). PPV (from morphology alone to adjunct): 43% to 54% (11 percentage point improvement). NPV (from morphology alone to adjunct): 68% to 68% (no change). PLR (from morphology alone to adjunct): 1.21 to 1.86 (improvement). NLR (from morphology alone to adjunct): 0.73 to 0.73 (no change). Overall, the results indicate improved selection effectiveness where intended. |
| Simulated Use (System Operation) | The Eeva System shall operate successfully in a simulated use procedure and incubator door opening shall not impact image capture or embryo prediction. | Pass |
| Algorithm Reproducibility | The software must generate repeatable outputs across multiple Eeva Systems, given the same set of input image data. | Pass |
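The likelihood ratios in the table follow from sensitivity and specificity, so they can be roughly cross-checked. The sketch below assumes the standard definitions PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. The function name is illustrative, and the inputs are the rounded percentages from the table, not the underlying study counts.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return plr, nlr


# Rounded values taken from the secondary-endpoint row of the table.
morphology = likelihood_ratios(sensitivity=0.72, specificity=0.39)
adjunct = likelihood_ratios(sensitivity=0.45, specificity=0.76)

print(f"Morphology alone: PLR={morphology[0]:.2f}, NLR={morphology[1]:.2f}")
print(f"Adjunct use:      PLR={adjunct[0]:.2f}, NLR={adjunct[1]:.2f}")
```

Because the inputs are rounded, the recomputed values land close to, but not exactly on, the reported 1.21, 1.86, and 0.73.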
2. Sample Sizes and Data Provenance:
- Test Set (Pivotal Adjunct Use Study):
- Number of Subjects: 54 subjects.
- Number of Embryos: The total count for the test set is not stated; the document refers only to "a subject's cohort of embryos" and "multiple embryos per subject."
- Data Provenance: Prospective, multi-center clinical study conducted at five sites in the United States. Data was collected on embryos cultured to cleavage stage (Day 3) or blastocyst (Day 5/6) stage. This was a non-interventional study, meaning the Eeva output was not used for real-time patient management.
3. Number of Experts and Qualifications for Ground Truth:
- Software Validation Phase (for Eeva System Software's ability to predict blastocyst formation):
- Number of Experts: A panel of three embryologists.
- Qualifications: "Embryologists" are mentioned, with no further specific details on their years of experience or board certification. Their role was to review image series and identify start/stop times of development parameters.
- Pivotal Adjunct Use Study (for morphological and adjunct assessments):
- Number of Experts: A panel of 5 clinical embryologists.
- Qualifications: "Currently in practice, representing a range of geographical areas and level of experience." No further specific expertise (e.g., years of experience) is provided.
4. Adjudication Method for the Test Set:
- Software Validation: The document states that a panel of three embryologists reviewed the image series to identify start/stop times. The results of their measurements were compared to the Eeva System measurements. This implies a comparison against the consensus or individual expert findings, but it doesn't explicitly detail a formal adjudication method (e.g., majority vote, or a specific process for resolving disagreements between the three). It presents the embryologists' aggregate specificity for comparison.
- Pivotal Adjunct Use Study: For the morphological and adjunct assessments, each of the five panelists independently provided their assessments and embryo selections. There is no mention of an adjudication method to establish a single "expert ground truth" for these assessments among the five panelists. Instead, the study analyzed the impact of the Eeva System on each panelist's predictions and selections, and then reported overall odds ratios and performance metrics that appear to aggregate or model across the panelists' individual responses. The primary endpoint analysis used complex statistical procedures (pre-specified method and GLMM) to address the "complex structure of the data (multiple embryos per subject, five panelists evaluating all embryos by both traditional morphology and sequential adjunctive use of Eeva)."
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- Yes, a form of MRMC was done. The "Pivotal Adjunct Use Study" serves as this, comparing human readers (embryologists) with and without AI assistance (Eeva System). The "readers" are the 5 clinical embryologists.
- Effect Size:
- Odds Ratio for Blastocyst Prediction:
- Traditional Morphology: 1.66 (95% CI: [0.78, 3.51]) by pre-spec analysis; 1.68 (95% CI: [1.29, 2.19]) by GLMM.
- Adjunct Prediction (with Eeva): 2.56 (95% CI: [1.75, 3.74], p<.0001) by pre-spec analysis; 2.57 (95% CI: [1.88, 3.51]) by GLMM.
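- Improvement: As reported, each odds ratio measures how strongly a prediction is associated with actual blastocyst formation. That ratio rose from 1.66-1.68 with traditional morphology alone to 2.56-2.57 with adjunct use, so the adjunct prediction is the more informative of the two. Under the pre-specified analysis, the morphology-only interval [0.78, 3.51] includes 1, whereas the adjunct interval [1.75, 3.74] does not.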
- Specificity: Improved by 37 percentage points (from 39% to 76%) with adjunct prediction.
- Positive Predictive Value (PPV): Improved by 11 percentage points (from 43% to 54%) with adjunct prediction.
- Positive Likelihood Ratio (PLR): Improved (from 1.21 to 1.86) with adjunct prediction.
- Note on Sensitivity: Sensitivity decreased by 27 percentage points (from 72% to 45%). However, the document argues this trade-off is expected and acceptable because the Eeva System's role is to aid in selecting among already morphologically suitable embryos, making specificity and PPV more critical for this use case.
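To make the odds ratio figures concrete, the following minimal sketch computes an odds ratio and a Wald 95% confidence interval from a 2x2 table of predictions against outcomes. The counts are hypothetical and are not taken from the study. The calculation also treats embryos as independent, which the study's own analyses (the pre-specified method and the GLMM) did not assume, since several embryos came from each subject and each was rated by five panelists.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a: predicted blastocyst, became blastocyst
    b: predicted blastocyst, arrested
    c: predicted no blastocyst, became blastocyst
    d: predicted no blastocyst, arrested
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
odds_ratio, ci_lower, ci_upper = odds_ratio_wald_ci(a=60, b=40, c=30, d=70)

# The primary endpoint asks whether the OR is significantly greater than 1,
# which corresponds to the lower confidence limit exceeding 1.
significantly_above_one = ci_lower > 1.0
print(f"OR={odds_ratio:.2f}, 95% CI=[{ci_lower:.2f}, {ci_upper:.2f}], "
      f"significant={significantly_above_one}")
```

Read this way, the adjunct interval [1.75, 3.74] meets the endpoint because its lower limit exceeds 1.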
6. Standalone (Algorithm Only) Performance Study:
- Yes, in the "Software Validation" phase. This phase directly assessed the Eeva System Software's ability to predict blastocyst formation independently.
- The Eeva System software’s specificity was 85.12%, compared to the embryologists’ 82.64%. The lower limit of the 95% CI for Eeva was 77.71%. This demonstrates the algorithm's standalone prediction capability against expert "measurements."
- The document states, "The results of the embryologists' measurements were compared to the Eeva System measurements to validate the software." This indicates a direct comparison of the software's output against the expert observations.
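The software validation criterion (lower limit of the 95% CI for specificity of at least 65%) can be illustrated with a short sketch. The summary states neither the underlying counts nor the interval method, so both are assumptions here: the counts are hypothetical, chosen only so that the point estimate rounds to the reported 85.12%, and a Wilson score interval stands in for whatever method the sponsor used.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts (assumption): 103 correct "no blastocyst" calls
# out of 121 embryos that did not reach blastocyst.
HYPOTHETICAL_TRUE_NEGATIVES = 103
HYPOTHETICAL_NON_BLASTOCYSTS = 121
ACCEPTANCE_THRESHOLD = 0.65  # from the stated acceptance criterion

lower, upper = wilson_interval(HYPOTHETICAL_TRUE_NEGATIVES, HYPOTHETICAL_NON_BLASTOCYSTS)
meets_criterion = lower >= ACCEPTANCE_THRESHOLD

print(f"Specificity={HYPOTHETICAL_TRUE_NEGATIVES / HYPOTHETICAL_NON_BLASTOCYSTS:.4f}")
print(f"95% CI=[{lower:.4f}, {upper:.4f}], meets criterion: {meets_criterion}")
```

The criterion depends on the lower confidence limit, not on the point estimate, so a small sample with the same specificity could fail it.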
7. Type of Ground Truth Used:
- For Blastocyst Formation: The ultimate ground truth for blastocyst formation was the actual biological outcome – whether the embryo progressed to the blastocyst stage (Day 5/6) or arrested. This is an objective outcome based on embryo development.
- For Software Validation of Timing Parameters: The ground truth for validating the software's ability to identify cell division start/stop times was based on the measurements/observations of the panel of three embryologists.
- For Clinical Study Assessments: The ground truth for the panels' predictions was the actual blastocyst outcome. The panels were given morphological data collected by clinical site embryologists and, in the adjunct arm, the Eeva parameters.
8. Sample Size for the Training Set:
- Software Development Phase: 63 subjects. This data was used to "further develop the Eeva System Software" and identify the parameters for P2 (time from first to second mitosis) and P3 (time from second to third mitosis) that were implemented. This effectively served as the training/development set for the algorithm's specific predictive windows.
9. How the Ground Truth for the Training Set was Established:
- For the "Software Development" phase (training set), "Imaging data was collected on embryos cultured to cleavage stage (Day 3) or blastocyst stage (Day 5/6)." This implies the ground truth for blastocyst formation was established by the actual development outcome of these embryos (i.e., did they reach blastocyst stage or not). This outcome data was then used by the developers to determine the optimal timing parameters (P2 and P3) that formed the basis of the Eeva System's prediction model.