The OVA1 Next Generation test is a qualitative serum test that combines the results of five immunoassays into a single numeric result. It is indicated for women who meet all of the following criteria: over age 18; an ovarian adnexal mass present for which surgery is planned; and not yet referred to an oncologist.
The OVA1 Next Generation test is an aid to further assess the likelihood that malignancy is present when the physician's independent clinical and radiological evaluation does not indicate malignancy. The test is not intended as a screening or stand-alone diagnostic assay.
The OVA1 Next Generation (NG) test consists of software, instruments, assays and reagents. The software incorporates the results of serum biomarker concentrations from five immunoassays to calculate a single, unitless numeric result indicating a low or high risk of ovarian malignancy.
The assays used to generate the numeric result (OVA1 NG test result) are APO, CA 125 II, FSH, HE4 and TRF.
Biomarker values are determined using assays on the Roche cobas® 6000 system, which is a fully automated, software-controlled system for clinical chemistry and immunoassay analysis. The biomarker assays are run according to the manufacturer's instructions as detailed in the package insert for each reagent.
The OVA1 NG software (OvaCalc v4.0.0) contains a proprietary algorithm that utilizes the results (values) from the five biomarker assays (APO, CA 125 II, FSH, HE4, and TRF). The assay values from the cobas 6000 system are either imported into OvaCalc through a .csv file or manually entered into the OvaCalc user interface to generate an OVA1 NG test result between 0.0 and 10.0. A low- or high-risk result is then determined by comparing the software-generated risk score to a single cutoff value.
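The scoring algorithm itself is proprietary, but the final steps described above (importing the five biomarker values from a .csv file and comparing a 0.0–10.0 score to a single cutoff) can be sketched as follows. This is a minimal illustration only: the cutoff value, column names, and the direction of the comparison are hypothetical placeholders, not details from the submission.

```python
import csv
import io

# Hypothetical cutoff; the actual OVA1 NG cutoff value is not stated in this excerpt.
CUTOFF = 5.0

def classify(score: float, cutoff: float = CUTOFF) -> str:
    """Map a 0.0-10.0 risk score to a low/high risk call.

    Direction (scores at or above the cutoff = high risk) is assumed here.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be within 0.0-10.0")
    return "high risk" if score >= cutoff else "low risk"

# Illustrative .csv import of the five biomarker values (column names assumed).
sample_csv = "APO,CA125II,FSH,HE4,TRF\n92.1,35.0,12.4,60.2,210.0\n"
for row in csv.DictReader(io.StringIO(sample_csv)):
    values = {name: float(v) for name, v in row.items()}
    # A real implementation would pass `values` to the proprietary algorithm;
    # here we only demonstrate the cutoff comparison on a placeholder score.
    print(classify(3.7))  # -> low risk
```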
Here's an analysis of the acceptance criteria and study findings for the OVA1 Next Generation device, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria for the OVA1 Next Generation device in terms of specific performance thresholds (e.g., "Sensitivity must be >X%"). Instead, the study focuses on demonstrating substantial equivalence to a predicate device (the original OVA1) and showing improvements in certain metrics.
However, based on the clinical performance evaluation and comparisons to the predicate, we can infer the primary goal was to at least maintain sensitivity while significantly improving specificity and positive predictive value. The "clinically small" and "clinically significant" definitions also act as implicit criteria for comparison with the predicate.
Here's a table summarizing the reported comparative performance:
| Metric (vs. Predicate OVA1) | OVA1 Next Generation (with PA) | OVA1 Next Generation (Standalone) | Goal / Inferred Acceptance Criterion |
|---|---|---|---|
| Specificity (Overall) | Improved ~14% (64.8% vs 50.9%) | Improved ~16% (69.1% vs 53.6%) | Significantly improved specificity while maintaining sensitivity |
| Specificity (Postmenopausal) | Improved ~23% (60.9% vs 37.8%) | Improved ~24% (65.4% vs 41.0%) | Significantly improved specificity |
| Specificity (Premenopausal) | Improved ~8% (67.3% vs 59.2%) | Improved ~10% (71.4% vs 61.6%) | Significantly improved specificity |
| Sensitivity (Overall) | "Clinically small" difference (~−2.17%; 93.5% vs 95.7%) | "Clinically small" difference (~−1.09%; 91.3% vs 92.4%) | Maintain similar sensitivity |
| Sensitivity (Postmenopausal) | "Clinically small" difference (~−1.64%; 95.1% vs 96.7%) | Identical (91.8% vs 91.8%) | Maintain similar sensitivity |
| Sensitivity (Premenopausal) | "Clinically small" difference (~−3.23%; 90.3% vs 93.5%) | "Clinically small" difference (~−3.23%; 90.3% vs 93.5%) | Maintain similar sensitivity |
| Positive Predictive Value (PPV, Overall) | N/A (only standalone data provided for PPV comparison) | Improved by 9% (40.4% vs 31.4%) | Significantly improved PPV |
| Negative Predictive Value (NPV, Overall) | "Substantially equivalent" (~0.35%; 97.7% vs 98.1%) | "Substantially equivalent" (~0.35%; 97.2% vs 96.8%) | Maintain similar NPV |
| Precision (%CV) | 1.54% (Overall) | 1.54% (Overall) | Better than or equivalent to predicate (4.09%) |
| Reproducibility (%CV) | 1.63% (Overall) | 1.63% (Overall) | Better than or equivalent to predicate (2.80%) |
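The metrics in the table follow the standard diagnostic-performance definitions. As a minimal sketch (the confusion-matrix counts below are hypothetical and not taken from the study), the four headline metrics can be computed as:

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic performance metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for illustration only:
m = diagnostic_metrics(tp=85, fp=124, tn=278, fn=8)
print(f"sensitivity={m['sensitivity']:.1%}, specificity={m['specificity']:.1%}")
```

Note that sensitivity and NPV depend on disease prevalence only through the counts themselves, whereas PPV in particular is strongly prevalence-dependent, which is why PPV comparisons between studies require similar populations.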
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size:
- Clinical Performance Evaluation: 493 evaluable subjects (from an initial 519 enrolled). Split into 276 premenopausal and 217 postmenopausal.
- Clinical Specificity - Healthy Women Study: 152 healthy women (68 premenopausal, 84 postmenopausal).
- Clinical Specificity - Other Cancers and Disease States: 401 samples from women with various non-ovarian cancers and benign conditions.
- Method Comparison (Archived Samples): 133 samples (28 primary ovarian malignancies, 105 benign ovarian conditions) for a direct comparison with the predicate.
- Data Provenance: The primary clinical study used a banked sample set from a prospective, multi-site pivotal study of OVA1 – the OVA500 Study. The archived samples were used to conduct a side-by-side clinical validation for Substantial Equivalence purposes. The method comparison study (Table 10) also used archived samples collected from selected larger prospective studies, tested within one year of collection. The document does not specify the country of origin for the OVA500 study, but it is implied to be a US study given the FDA submission.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
The ground truth for the clinical performance evaluation was established by postoperative pathology diagnosis, which was recorded at each enrolling site and independently reviewed.
The document does not specify the number of experts or their explicit qualifications (e.g., "radiologist with 10 years of experience") for this pathology review. However, "postoperative pathology diagnosis" generally implies review by trained pathologists.
4. Adjudication Method for the Test Set
The document explicitly states that postoperative pathology diagnosis was "independently reviewed." However, the exact adjudication method (e.g., 2+1, 3+1, none) for discrepancies in pathology results is not detailed in the provided text.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was not done in the traditional sense of evaluating multiple human readers' performance with and without AI assistance.
The study compared the performance of:
- The OVA1 Next Generation device itself (standalone).
- The original OVA1 device (predicate) itself (standalone).
- A "dual assessment" combining Physician Assessment (PA) OR the device result (OVA1 Next Generation or original OVA1).
While Physician Assessment (PA) involves human readers (clinicians), the study design evaluates the addition of the device's score to the physician's assessment, rather than directly measuring improvement of human readers performing a task with AI assistance vs without.
- Effect Size of Human Readers Improvement with AI vs. Without AI Assistance: Not directly measurable from the provided data in the context of an MRMC study. The "with PA" results show the combined performance, where the device acts as an "aid" to PA.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
Yes, a standalone (algorithm only) performance evaluation was done.
Table 6 and Table 7 explicitly present "Standalone Specificity" and "Standalone Sensitivity" for the OVA1 Next Generation device without Physician Assessment (PA) in the risk calculation.
7. The Type of Ground Truth Used
The primary ground truth used for the clinical performance evaluation was pathology (postoperative pathology diagnosis), which was independently reviewed.
8. The Sample Size for the Training Set
The document does not provide the sample size for the training set for the OVA1 Next Generation algorithm. It describes the device's algorithm and its inputs but focuses on the validation study using a banked sample set without detailing the original training cohort.
9. How the Ground Truth for the Training Set Was Established
Since the sample size for the training set is not provided, the method for establishing its ground truth is also not detailed. It is implied that the algorithm was developed (and likely trained/tuned) using similar pathology-confirmed data, but specifics are absent in this document.
§ 866.6050 Ovarian adnexal mass assessment score test system.
(a) Identification. An ovarian/adnexal mass assessment test system is a device that measures one or more proteins in serum or plasma. It yields a single result for the likelihood that an adnexal pelvic mass in a woman, for whom surgery is planned, is malignant. The test is for adjunctive use, in the context of a negative primary clinical and radiological evaluation, to augment the identification of patients whose gynecologic surgery requires oncology expertise and resources.
(b) Classification. Class II (special controls). The special control for this device is FDA's guidance document entitled "Class II Special Controls Guidance Document: Ovarian Adnexal Mass Assessment Score Test System." For the availability of this guidance document, see § 866.1(e).
(c) Black box warning. Under section 520(e) of the Federal Food, Drug, and Cosmetic Act these devices are subject to the following restriction: A warning statement must be placed in a black box and must appear in all advertising, labeling, and promotional material for these devices. That warning statement must read: