Search Results
Found 2 results
510(k) Data Aggregation
(375 days)
VERMILLION, INC.
The OVA1 Next Generation test is a qualitative serum test that combines the results of five immunoassays into a single numeric result. It is indicated for women who meet the following criteria: over age 18, ovarian adnexal mass present for which surgery is planned, and not yet referred to an oncologist.
The OVA1 Next Generation test is an aid to further assess the likelihood that malignancy is present when the physician's independent clinical and radiological evaluation does not indicate malignancy. The test is not intended as a screening or stand-alone diagnostic assay.
The OVA1 Next Generation (NG) test consists of software, instruments, assays and reagents. The software incorporates the results of serum biomarker concentrations from five immunoassays to calculate a single, unitless numeric result indicating a low or high risk of ovarian malignancy.
The assays used to generate the numeric result (OVA1 NG test result) are apolipoprotein A-1 (APO), CA 125 II, follicle-stimulating hormone (FSH), human epididymis protein 4 (HE4), and transferrin (TRF).
Biomarker values are determined using assays on the Roche cobas® 6000 system, which is a fully automated, software-controlled system for clinical chemistry and immunoassay analysis. The biomarker assays are run according to the manufacturer's instructions as detailed in the package insert for each reagent.
The OVA1 NG software (OvaCalc v4.0.0) contains a proprietary algorithm that utilizes the results (values) from the five biomarker assays (APO, CA 125 II, FSH, HE4 and TRF). The assay values from the cobas 6000 system are either imported into OvaCalc through a .csv file or manually entered into the OvaCalc user interface to generate an OVA1 NG test result between 0.0 and 10.0. A low- or high-risk result is then determined by comparing the software-generated risk score to a single cutoff (a score below the cutoff is reported as low risk; a score at or above it, as high risk).
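The excerpt truncates before giving the cutoff value, so the classification step can only be sketched with the cutoff as a parameter. A minimal sketch in Python, assuming scores below the cutoff map to low risk (the function name and example values are illustrative, not from the submission):

```python
def classify_risk(score: float, cutoff: float) -> str:
    """Classify an OVA1 NG result (0.0-10.0) against a single cutoff.

    The cutoff value is not stated in the excerpt, so it is taken as a
    parameter rather than hard-coded. Assumption: scores below the cutoff
    are low risk, scores at or above it are high risk.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("OVA1 NG results are reported on a 0.0-10.0 scale")
    return "low risk" if score < cutoff else "high risk"
```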
Here's an analysis of the acceptance criteria and study findings for the OVA1 Next Generation device, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria for the OVA1 Next Generation device in terms of specific performance thresholds (e.g., "Sensitivity must be >X%"). Instead, the study focuses on demonstrating substantial equivalence to a predicate device (the original OVA1) and showing improvements in certain metrics.
However, based on the clinical performance evaluation and comparisons to the predicate, we can infer the primary goal was to at least maintain sensitivity while significantly improving specificity and positive predictive value. The "clinically small" and "clinically significant" definitions also act as implicit criteria for comparison with the predicate.
Here's a table summarizing the reported comparative performance:
Metric (vs. Predicate OVA1) | OVA1 Next Generation Performance (with PA) | OVA1 Next Generation Performance (Standalone) | Goal/Inferred Acceptance Criterion |
---|---|---|---|
Specificity (Overall) | Improved by ~14 percentage points (pp) (64.8% vs 50.9%) | Improved by ~16 pp (69.1% vs 53.6%) | Significantly improved specificity while maintaining sensitivity |
Specificity (Postmenopausal) | Improved by ~23 pp (60.9% vs 37.8%) | Improved by ~24 pp (65.4% vs 41.0%) | Significantly improved specificity |
Specificity (Premenopausal) | Improved by ~8 pp (67.3% vs 59.2%) | Improved by ~10 pp (71.4% vs 61.6%) | Significantly improved specificity |
Sensitivity (Overall) | Differences "clinically small" (~-2.17 pp) (93.5% vs 95.7%) | Differences "clinically small" (~-1.09 pp) (91.3% vs 92.4%) | Maintain similar sensitivity |
Sensitivity (Postmenopausal) | Differences "clinically small" (~-1.64 pp) (95.1% vs 96.7%) | Identical (91.8% vs 91.8%) | Maintain similar sensitivity |
Sensitivity (Premenopausal) | Differences "clinically small" (~-3.23 pp) (90.3% vs 93.5%) | Differences "clinically small" (~-3.23 pp) (90.3% vs 93.5%) | Maintain similar sensitivity |
Positive Predictive Value (PPV) (Overall) | N/A (only standalone data provided for the PPV comparison) | Improved by ~9 pp (40.4% vs 31.4%) | Significantly improved PPV |
Negative Predictive Value (NPV) (Overall) | Differences "substantially equivalent" (~0.35 pp) (97.7% vs 98.1%) | Differences "substantially equivalent" (~0.35 pp) (97.2% vs 96.8%) | Maintain similar NPV |
Precision (%CV) | 1.54% (Overall) | 1.54% (Overall) | Better than or equivalent to predicate (Predicate was 4.09%) |
Reproducibility (%CV) | 1.63% (Overall) | 1.63% (Overall) | Better than or equivalent to predicate (Predicate was 2.80%) |
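The specificity and sensitivity deltas in the table are simple percentage-point differences between the paired rates, which can be checked directly; a minimal arithmetic check in Python (variable names are mine):

```python
# Paired (OVA1 NG, predicate OVA1) rates from the table above, in percent.
comparisons = {
    "specificity, overall (with PA)": (64.8, 50.9),
    "specificity, overall (standalone)": (69.1, 53.6),
    "sensitivity, overall (with PA)": (93.5, 95.7),
    "sensitivity, overall (standalone)": (91.3, 92.4),
}

for name, (ng, predicate) in comparisons.items():
    delta = ng - predicate  # difference in percentage points
    print(f"{name}: {delta:+.1f} pp")
# Prints +13.9, +15.5, -2.2, -1.1 pp, consistent with the ~14, ~16,
# ~-2.17 and ~-1.09 pp figures quoted above (up to rounding).
```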
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size:
- Clinical Performance Evaluation: 493 evaluable subjects (from an initial 519 enrolled). Split into 276 premenopausal and 217 postmenopausal.
- Clinical Specificity - Healthy Women Study: 152 healthy women (68 premenopausal, 84 postmenopausal).
- Clinical Specificity - Other Cancers and Disease States: 401 samples from women with various non-ovarian cancers and benign conditions.
- Method Comparison (Archived Samples): 133 samples (28 primary ovarian malignancies, 105 benign ovarian conditions) for a direct comparison with the predicate.
- Data Provenance: The primary clinical study used a banked sample set from a prospective, multi-site pivotal study of OVA1 – the OVA500 Study. The archived samples were used to conduct a side-by-side clinical validation for Substantial Equivalence purposes. The method comparison study (Table 10) also used archived samples collected from selected larger prospective studies, tested within one year of collection. The document does not specify the country of origin for the OVA500 study, but it is implied to be a US study given the FDA submission.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
The ground truth for the clinical performance evaluation was established by postoperative pathology diagnosis, which was recorded at each enrolling site and independently reviewed.
The document does not specify the number of experts or their explicit qualifications (e.g., "radiologist with 10 years of experience") for this pathology review. However, "postoperative pathology diagnosis" generally implies review by trained pathologists.
4. Adjudication Method for the Test Set
The document explicitly states that postoperative pathology diagnosis was "independently reviewed." However, the exact adjudication method (e.g., 2+1, 3+1, none) for discrepancies in pathology results is not detailed in the provided text.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was not done in the traditional sense of evaluating multiple human readers' performance with and without AI assistance.
The study compared the performance of:
- The OVA1 Next Generation device itself (standalone).
- The original OVA1 device (predicate) itself (standalone).
- A "dual assessment" combining Physician Assessment (PA) OR the device result (OVA1 Next Generation or original OVA1).
While Physician Assessment (PA) involves human readers (clinicians), the study design evaluates the addition of the device's score to the physician's assessment, rather than directly measuring improvement of human readers performing a task with AI assistance vs without.
- Effect size of human-reader improvement with vs. without AI assistance: Not directly measurable from the provided data in the context of an MRMC study. The "with PA" results show the combined performance, where the device acts as an "aid" to PA, as sketched below.
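Both the OVA1 NG and predicate "with PA" figures reflect a disjunctive combination: a case is treated as positive if either the physician's assessment or the device result is positive. A minimal sketch of that rule in Python (the function and parameter names are mine, not from the submission):

```python
def dual_assessment(pa_positive: bool, device_high_risk: bool) -> bool:
    """Dual-assessment rule: positive if Physician Assessment (PA) OR the
    device flags the case.

    An OR rule can only add positives relative to either input alone,
    which is why combined sensitivity rises (or holds) while combined
    specificity falls (or holds).
    """
    return pa_positive or device_high_risk
```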
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
Yes, a standalone (algorithm only) performance evaluation was done.
Table 6 and Table 7 explicitly present "Standalone Specificity" and "Standalone Sensitivity" for the OVA1 Next Generation device without Physician Assessment (PA) in the risk calculation.
7. The Type of Ground Truth Used
The primary ground truth used for the clinical performance evaluation was pathology (postoperative pathology diagnosis), which was independently reviewed.
8. The Sample Size for the Training Set
The document does not provide the sample size for the training set for the OVA1 Next Generation algorithm. It describes the device's algorithm and its inputs but focuses on the validation study using a banked sample set without detailing the original training cohort.
9. How the Ground Truth for the Training Set Was Established
Since the sample size for the training set is not provided, the method for establishing its ground truth is also not detailed. It is implied that the algorithm was developed (and likely trained/tuned) using similar pathology-confirmed data, but specifics are absent in this document.
(51 days)
VERMILLION
The OVA1™ Test is a qualitative serum test that combines the results of five immunoassays into a single numerical score. It is indicated for women who meet the following criteria: over age 18; ovarian adnexal mass present for which surgery is planned; and not yet referred to an oncologist. The OVA1™ Test is an aid to further assess the likelihood that malignancy is present when the physician's independent clinical and radiological evaluation does not indicate malignancy. The test is not intended as a screening or stand-alone diagnostic assay.
PRECAUTION: The OVA1™ Test should not be used without an independent clinical/radiological evaluation and is not intended to be a screening test or to determine whether a patient should proceed to surgery. Incorrect use of the OVA1™ Test carries the risk of unnecessary testing, surgery, and/or delayed diagnosis.
The OVA1™ Test uses OvaCalc Software to incorporate the values for 5 analytes from separately run immunoassays (described below) into a single numerical score between 0.0 and 10.0.
The cleared test system consists of the software, instruments, assays and reagents used to obtain the OVA1™ Test result. The immunoassays and reagents are sold separately from the OvaCalc Software. Users are instructed to use only those lots identified by Vermillion. The immunoassays are performed according to the manufacturers' directions detailed in each product insert. The analytes and corresponding tests and calibrators used in the OVA1™ Test are:
Analyte | Device (Assay and Calibrator) | Instrument |
---|---|---|
CA 125 | Elecsys CA 125 II; CA125 II CalSet | Roche Elecsys 2010 |
Prealbumin | N Antisera to Human Prealbumin and Retinol-binding Protein; N Protein Standard SL (human) | Siemens BN II |
Apolipoprotein A-1 | N Antisera to Human Apolipoprotein A-1 and Apolipoprotein B; N Apolipoprotein Standard Serum (human) | Siemens BN II |
β2-microglobulin | Human Beta-2 Microglobulin Latex Enhanced Nephelometric Kit (Binding Site) | Siemens BN II |
Transferrin | N Antisera to Human Transferrin and Haptoglobin; N Protein Standard SL (human) | Siemens BN II |
The user enters results of the five analytes manually into an Excel spreadsheet together with the headers needed by OvaCalc Software. There is no physical or electronic connection between the immunoassay devices and the OvaCalc Software. Using an algorithm and the values of these 5 analytes, the OvaCalc Software generates a single unit-less numerical score from 0.0 to 10.0.
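Because OvaCalc reads the five analyte results from a spreadsheet with specific headers, the manual entry step amounts to building a small, structured table. A minimal sketch in Python that writes one sample to a CSV file (which Excel can open); the header names are hypothetical placeholders, since the actual headers required by OvaCalc are not given in this excerpt:

```python
import csv

# Hypothetical column headers; the real headers required by OvaCalc are
# specified in Vermillion's instructions, not in this excerpt.
FIELDS = ["CA125_II", "Prealbumin", "ApoA1", "Beta2_Microglobulin", "Transferrin"]

def write_ovacalc_input(path: str, values: dict[str, float]) -> None:
    """Write one sample's five analyte results, with headers, for import."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerow({field: values[field] for field in FIELDS})

# Example usage with made-up analyte values:
write_ovacalc_input("sample_001.csv", {
    "CA125_II": 35.0, "Prealbumin": 20.0, "ApoA1": 120.0,
    "Beta2_Microglobulin": 1.8, "Transferrin": 250.0,
})
```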
Here's a breakdown of the acceptance criteria and study detailed in the provided document:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria in terms of target performance metrics (e.g., minimum sensitivity or specificity values). Instead, it presents the performance characteristics observed in the clinical validation study. The closest thing to an acceptance criterion for clinical performance is the demonstration that the "True Positive Rate (TPR) exceeded the False Positive Rate (FPR)" with statistical significance. The remaining criteria concern analytical performance, such as precision and stability, and are supported by the reported analytical results.
Criterion Type | Acceptance Criteria (Implicit/Explicit) | Reported Device Performance |
---|---|---|
Analytical Performance | ||
Precision (Total %CV) | Acceptable within-run, between-run, between-day, between-operator, and between-site %CV | |
Interference (Rheumatoid Factor, RF) | | RF > 250 RU/mL caused significant interference; specimens with RF > 250 RU/mL are not appropriate for the test |
Clinical Performance | ||
Statistical Informativeness | True Positive Rate (TPR) must exceed False Positive Rate (FPR) with statistical significance for combined data, pre-menopausal subjects, and post-menopausal subjects. | All combined data: TPR (87.5%) > FPR (49.2%); difference of 38.3% (95% CI: 26.5% to 47.8%) statistically significant. Pre-menopausal: TPR (80.8%) > FPR (43.2%); difference of 37.6% (95% CI: 16.7% to 52.2%) statistically significant. Post-menopausal: TPR (91.3%) > FPR (58.2%); difference of 33.1% (95% CI: 17.3% to 46.1%) statistically significant. |
Adjunctive Information Value (Dual Assessment for Non-GO) | Dual assessment (Physician's pre-surgical assessment + OVA1™ Test) should provide additional information compared to physician's assessment alone, specifically by increasing sensitivity for malignancy and maintaining (or improving) NPV. The benefit of detecting additional true positive cases should outweigh the additional false positives for the intended use population. | Sensitivity: Increased from 72.2% (single assessment) to 91.7% (dual assessment). Specificity: Decreased from 82.7% to 41.6%. PPV: Decreased from 60.5% to 36.5%. NPV: Increased from 89.1% to 93.2%. (95% CI for the 4.1% increase in NPV was -0.5% to 8.7%, borderline statistical significance). Conclusion notes "sufficient benefit". |
Adjunctive Information Value (Dual Assessment for GO) | Corroborative results to non-GO analysis regarding additional information provided by dual assessment. | Sensitivity: Increased from 77.5% (single assessment) to 98.9% (dual assessment). Specificity: Decreased from 74.7% to 25.9%. PPV: Decreased from 63.3% to 42.9%. NPV: Increased from 85.5% to 97.6% (95% CI for the 12.1% increase in NPV was 5.7% to 18.6%, statistically significant). Conclusion notes "corroborative, but not dispositive" for intended use. |
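The statistical-informativeness row can be sanity-checked by recomputing the TPR - FPR differences from the reported rates (reproducing the confidence intervals would require the underlying counts, which this excerpt does not give). A minimal check in Python:

```python
# (TPR %, FPR %) pairs reported for the OVA1 Test.
groups = {
    "combined": (87.5, 49.2),
    "pre-menopausal": (80.8, 43.2),
    "post-menopausal": (91.3, 58.2),
}

for name, (tpr, fpr) in groups.items():
    print(f"{name}: TPR - FPR = {tpr - fpr:.1f} percentage points")
# combined: 38.3, pre-menopausal: 37.6, post-menopausal: 33.1,
# matching the differences quoted in the table.
```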
2. Sample Sizes and Data Provenance
Test Set (Clinical Validation Study):
- Total Enrolled: 743 patients.
- Training Set (from enrollment): 146 subjects were set aside for training, with 21 not evaluable, leaving 125 for training.
- Final Evaluable Test Set: 516 subjects/samples (after excluding training set and those with missing info/lack of sample).
- Non-GO Physician Evaluated Subset: 269 patients.
- GO Physician Evaluated Subset: 247 patients.
- Data Provenance: Prospective, multicenter, double-blind clinical study. Samples were collected from 27 demographically mixed subject enrollment sites, presumably in the US; the country is implied by the FDA submission context rather than stated explicitly.
Training Set:
- Training Set 1: 284 pre-operative serum samples from the University of Kentucky.
- Complete laboratory data for 274 samples (109 malignant, 175 benign).
- Training Set 2: A randomly selected subset of 146 pre-operative serum samples collected under a clinical trial specimen repository.
- 21 not evaluable, leaving 125 samples (89 benign, 10 LMPs, 19 EOCs, 1 primary and 3 non-primary ovarian cancers, and 3 other malignancies).
3. Number of Experts and Qualifications for Test Set Ground Truth
The document does not specify a "number of experts" used to establish ground truth in the sense of independent review of imaging or clinical data. Instead, the ground truth for malignancy status in the clinical validation study was established through histopathology results from tissue samples obtained during surgical intervention.
The clinical assessments made by physicians (non-GO and GO) were used for comparative purposes against the device and as part of the "dual assessment" scenario, but these were not explicitly designated as "expert ground truth" for the device's diagnostic accuracy. The ultimate ground truth for classification of benign vs. malignant was pathology.
4. Adjudication Method for the Test Set
The document does not describe an "adjudication method" in the typical sense of multiple expert readers reviewing cases and resolving disagreements.
- The OVA1™ Test results were generated by an algorithm from five immunoassay values.
- The clinical pre-surgical assessments were made by individual physicians (non-GO or GO).
- The ground truth outcome was histopathology.
The comparison was between these individual components or combinations, not between adjudicated expert readings.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No, a multi-reader multi-case (MRMC) comparative effectiveness study, as typically understood for imaging devices where human readers interpret cases with and without AI assistance, was not performed.
This study compared:
- The OVA1™ Test alone.
- The physician's (non-GO or GO) pre-surgical assessment alone.
- A "Dual Assessment" (physician's assessment OR OVA1™ Test positive).
The "effect size of how much human readers improve with AI vs without AI assistance" is not reported in the traditional MRMC sense. Instead, the document quantifies the change in clinical performance metrics (Sensitivity, Specificity, PPV, NPV) when the OVA1™ Test is combined with the physician's pre-surgical assessment, rather than directly measuring physician improvement while using AI.
For non-GO physicians, when using dual assessment:
- Sensitivity for malignancy increased from 72.2% (single assessment) to 91.7% (dual assessment).
- NPV increased from 89.1% to 93.2%.
For GO physicians, when using dual assessment:
- Sensitivity for malignancy increased from 77.5% (single assessment) to 98.9% (dual assessment).
- NPV increased from 85.5% to 97.6%.
6. Standalone (Algorithm Only) Performance
Yes, a standalone performance assessment (algorithm only without human-in-the-loop performance) was done for the OVA1™ Test.
The "Performance Characteristics of the OVA1™ Test Alone" section directly presents its sensitivity, specificity, NPV, and PPV compared to histopathology for patients evaluated by non-GO physicians (and similarly for GO physicians, though these were deemed less relevant for the intended use population).
Standalone Performance (Non-GO Physician Population):
- Sensitivity: 87.5% (63/72)
- Specificity: 50.8% (100/197)
- NPV: 91.7% (100/109)
- PPV: 39.4% (63/160)
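Since this section reports raw fractions alongside the percentages, the four metrics can be reproduced directly from the counts; a minimal check in Python:

```python
# Counts reported for the non-GO population (numerator, denominator).
metrics = {
    "Sensitivity": (63, 72),    # TP / (TP + FN)
    "Specificity": (100, 197),  # TN / (TN + FP)
    "NPV": (100, 109),          # TN / (TN + FN)
    "PPV": (63, 160),           # TP / (TP + FP)
}

for name, (num, den) in metrics.items():
    print(f"{name}: {num}/{den} = {100 * num / den:.1f}%")
# Sensitivity 87.5%, Specificity 50.8%, NPV 91.7%, PPV 39.4%,
# matching the reported values.
```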
7. Type of Ground Truth Used (for Clinical Studies)
The primary ground truth used for establishing clinical performance was histopathology results from tissue samples obtained during surgical intervention. Malignancy status was determined based on these reports.
8. Sample Size for the Training Set
The algorithm was derived using two independent training datasets:
- Training Set 1: 284 pre-operative serum samples (274 evaluable: 109 malignant, 175 benign).
- Training Set 2: 146 pre-operative serum samples (125 evaluable: 89 benign, 10 LMPs, 19 EOCs, 1 primary and 3 non-primary ovarian cancers, 3 other malignancies).
9. How the Ground Truth for the Training Set Was Established
The document states that the training sets consisted of "pre-operative serum samples" classified into categories such as "benign diseases," "ovarian tumors of low malignant potential (LMP)," and "epithelial ovarian cancers." Such classification is generally based on histopathology of the surgical specimens obtained after mass removal, although the document does not state this explicitly.