K Number
DEN090004
Device Name
OVA1 TEST
Manufacturer
Date Cleared
2009-09-11 (51 days)

Product Code
Regulation Number
866.6050
Panel
IM
Reference & Predicate Devices
N/A
Intended Use

The OVA1™ Test is a qualitative serum test that combines the results of five immunoassays into a single numerical score. It is indicated for women who meet the following criteria: over age 18; ovarian adnexal mass present for which surgery is planned, and not yet referred to an oncologist. The OVA1™ Test is an aid to further assess the likelihood that malignancy is present when the physician's independent clinical and radiological evaluation does not indicate malignancy. The test is not intended as a screening or stand-alone diagnostic assay.

PRECAUTION: The OVA1™ Test should not be used without an independent clinical/radiological evaluation and is not intended to be a screening test or to determine whether a patient should proceed to surgery. Incorrect use of the OVA1™ Test carries the risk of unnecessary testing, surgery, and/or delayed diagnosis.

Device Description

The OVA1™ Test uses OvaCalc Software to incorporate the values for 5 analytes from separately run immunoassays (described below) into a single numerical score between 0.0 and 10.0.

The cleared test system consists of the software, instruments, assays and reagents used to obtain the OVA1™ Test result. The immunoassays and reagents are sold separately from the OvaCalc Software. Users are instructed to use only those lots identified by Vermillion. The immunoassays are performed according to the manufacturers' directions detailed in each product insert. The analytes and corresponding tests and calibrators used in the OVA1™ Test are:

| Analyte | Device (Assay and Calibrator) | Instrument |
|---|---|---|
| CA 125 | Elecsys CA 125 II; CA125 II CalSet | Roche Elecsys 2010 |
| Prealbumin | N Antisera to Human Prealbumin and Retinol-binding Protein; N Protein Standard SL (human) | Siemens BN II |
| Apolipoprotein A-1 | N-Antisera to Human Apolipoprotein A-1 and Apolipoprotein B; N Apolipoprotein Standard Serum (human) | Siemens BN II |
| β2-microglobulin | Human Beta-2 Microglobulin Latex Enhanced Nephelometric Kit (Binding Site) | Siemens BN II |
| Transferrin | N Antisera to Human Transferrin and Haptoglobin; N Protein Standard SL (human) | Siemens BN II |

The user enters results of the five analytes manually into an Excel spreadsheet together with the headers needed by OvaCalc Software. There is no physical or electronic connection between the immunoassay devices and the OvaCalc Software. Using an algorithm and the values of these 5 analytes, the OvaCalc Software generates a single unit-less numerical score from 0.0 to 10.0.

AI/ML Overview

The acceptance criteria and the supporting study described in the document are summarized below.

1. Table of Acceptance Criteria and Reported Device Performance

The document does not explicitly state pre-defined acceptance criteria in terms of target performance metrics (e.g., minimum sensitivity or specificity values). Instead, it presents the performance characteristics observed in the clinical validation study. The closest thing to acceptance criteria for the clinical performance seems to be the demonstration that the "True Positive Rate (TPR) exceeded the False Positive Rate (FPR)" with statistical significance. The other criteria relate to analytical performance, such as precision and stability, which are met through demonstrated performance.

| Criterion Type | Acceptance Criteria (Implicit/Explicit) | Reported Device Performance |
|---|---|---|
| Analytical Performance | | |
| Precision (Total %CV) | Acceptable within-run, between-run, between-day, between-operator, and between-site %CV. (Explicitly stated [...] | [...] 250 RU/mL caused significant interference; specimens with RF > 250 RU/mL are not appropriate for the test. |
| Clinical Performance | | |
| Statistical Informativeness | True Positive Rate (TPR) must exceed False Positive Rate (FPR) with statistical significance for combined data, pre-menopausal subjects, and post-menopausal subjects. | All combined data: TPR (87.5%) > FPR (49.2%); difference of 38.3% (95% CI: 26.5% to 47.8%) statistically significant. Pre-menopausal: TPR (80.8%) > FPR (43.2%); difference of 37.6% (95% CI: 16.7% to 52.2%) statistically significant. Post-menopausal: TPR (91.3%) > FPR (58.2%); difference of 33.1% (95% CI: 17.3% to 46.1%) statistically significant. |
| Adjunctive Information Value (Dual Assessment for Non-GO) | Dual assessment (Physician's pre-surgical assessment + OVA1™ Test) should provide additional information compared to physician's assessment alone, specifically by increasing sensitivity for malignancy and maintaining (or improving) NPV. The benefit of detecting additional true positive cases should outweigh the additional false positives for the intended use population. | Sensitivity: Increased from 72.2% (single assessment) to 91.7% (dual assessment). Specificity: Decreased from 82.7% to 41.6%. PPV: Decreased from 60.5% to 36.5%. NPV: Increased from 89.1% to 93.2%. (95% CI for the 4.1% increase in NPV was -0.5% to 8.7%, borderline statistical significance). Conclusion notes "sufficient benefit". |
| Adjunctive Information Value (Dual Assessment for GO) | Corroborative results to non-GO analysis regarding additional information provided by dual assessment. | Sensitivity: Increased from 77.5% (single assessment) to 98.9% (dual assessment). Specificity: Decreased from 74.7% to 25.9%. PPV: Decreased from 63.3% to 42.9%. NPV: Increased from 85.5% to 97.6% (95% CI for the 12.1% increase in NPV was 5.7% to 18.6%, statistically significant). Conclusion notes "corroborative, but not dispositive" for intended use. |
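
The differences and intervals quoted in the clinical rows above can be re-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the percentages exactly as quoted, and it assumes that "statistically significant" corresponds to a 95% confidence interval that excludes zero, which the document does not state explicitly. The names `rows` and `excludes_zero` are made up for this example.

```python
# Reported values from the table above, in percentage points.
# Each entry: (label, higher value, lower value, CI low, CI high).
rows = [
    ("TPR - FPR, all combined", 87.5, 49.2, 26.5, 47.8),
    ("TPR - FPR, pre-menopausal", 80.8, 43.2, 16.7, 52.2),
    ("TPR - FPR, post-menopausal", 91.3, 58.2, 17.3, 46.1),
    ("NPV gain, non-GO dual vs single", 93.2, 89.1, -0.5, 8.7),
    ("NPV gain, GO dual vs single", 97.6, 85.5, 5.7, 18.6),
]


def excludes_zero(ci_low, ci_high):
    """Return True when the interval lies entirely above or below zero."""
    return ci_low > 0 or ci_high < 0


results = {}
for label, high, low, ci_low, ci_high in rows:
    diff = round(high - low, 1)  # recompute the quoted difference
    results[label] = (diff, excludes_zero(ci_low, ci_high))
    print(f"{label}: difference = {diff:.1f}, CI excludes zero: {results[label][1]}")
```

Under that assumption, only the non-GO NPV gain has an interval that includes zero, which is consistent with the table describing it as borderline.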

2. Sample Sizes and Data Provenance

Test Set (Clinical Validation Study):

  • Total Enrolled: 743 patients.
  • Training Set (from enrollment): 146 subjects were set aside for training, with 21 not evaluable, leaving 125 for training.
  • Final Evaluable Test Set: 516 subjects/samples (after excluding training set and those with missing info/lack of sample).
    • Non-GO Physician Evaluated Subset: 269 patients.
    • GO Physician Evaluated Subset: 247 patients.
  • Data Provenance: Prospective, multicenter, double-blind clinical study. Samples were collected at 27 demographically mixed subject enrollment sites. The document does not name the country; a US setting is an inference from the FDA submission context.
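
The subject accounting in this list can be tallied as follows. This is a small arithmetic sketch using only the counts quoted above; the variable names are illustrative, and the number of exclusions is derived here rather than quoted from the document. It assumes that every subject outside the training and evaluable sets was excluded for missing information or lack of sample.

```python
# Counts quoted in the list above.
enrolled = 743
set_aside_for_training = 146
evaluable_test_set = 516
non_go_subset = 269
go_subset = 247

# Subjects left once the training subjects are removed.
remaining_after_training = enrolled - set_aside_for_training

# Derived, not quoted: subjects dropped for missing info or lack of sample.
excluded = remaining_after_training - evaluable_test_set

# The two physician subsets should partition the evaluable test set.
subsets_total = non_go_subset + go_subset

print(remaining_after_training, excluded, subsets_total)
```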

Training Set:

  • Training Set 1: 284 pre-operative serum samples from the University of Kentucky (109 malignant, 175 benign; these counts sum to 284).
    • Complete laboratory data were available for 274 of these samples.
  • Training Set 2: A randomly selected subset of 146 pre-operative serum samples collected under a clinical trial specimen repository.
    • 21 not evaluable, leaving 125 samples (89 benign, 10 tumors of low malignant potential (LMP), 19 epithelial ovarian cancers (EOC), 1 primary and 3 non-primary ovarian cancers, 3 other malignancies).

3. Number of Experts and Qualifications for Test Set Ground Truth

The document does not specify a "number of experts" used to establish ground truth in the sense of independent review of imaging or clinical data. Instead, the ground truth for malignancy status in the clinical validation study was established through histopathology results from tissue samples obtained during surgical intervention.

The clinical assessments made by physicians (non-GO and GO) were used for comparative purposes against the device and as part of the "dual assessment" scenario, but these were not explicitly designated as "expert ground truth" for the device's diagnostic accuracy. The ultimate ground truth for classification of benign vs. malignant was pathology.

4. Adjudication Method for the Test Set

The document does not describe an "adjudication method" in the typical sense of multiple expert readers reviewing cases and resolving disagreements.

  • The OVA1™ Test results were generated by an algorithm from five immunoassay values.
  • The clinical pre-surgical assessments were made by individual physicians (non-GO or GO).
  • The ground truth outcome was histopathology.

The comparison was between these individual components or combinations, not between adjudicated expert readings.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No, a multi-reader multi-case (MRMC) comparative effectiveness study, as typically understood for imaging devices where human readers interpret cases with and without AI assistance, was not performed.

This study compared:

  1. The OVA1™ Test alone.
  2. The physician's (non-GO or GO) pre-surgical assessment alone.
  3. A "Dual Assessment" (physician's assessment OR OVA1™ Test positive).

No MRMC-style effect size (reader improvement with versus without AI assistance) is reported. Instead, the document quantifies the change in clinical performance metrics (sensitivity, specificity, PPV, NPV) when the OVA1™ Test result is combined with the physician's pre-surgical assessment.

For non-GO physicians, when using dual assessment:

  • Sensitivity for malignancy increased from 72.2% (single assessment) to 91.7% (dual assessment).
  • NPV increased from 89.1% to 93.2%.

For GO physicians, when using dual assessment:

  • Sensitivity for malignancy increased from 77.5% (single assessment) to 98.9% (dual assessment).
  • NPV increased from 85.5% to 97.6%.
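
The dual-assessment rule described in this section (positive if either the physician's assessment or the OVA1™ Test is positive) amounts to a logical OR. The sketch below is a minimal illustration, not the study's analysis code: the function names are hypothetical and the six toy cases are invented for the example, not study data. It shows why an OR combination can only add positive calls, so sensitivity cannot fall and specificity cannot rise relative to the physician alone.

```python
def dual_assessment(physician_positive: bool, ova1_positive: bool) -> bool:
    """Positive when either component assessment is positive."""
    return physician_positive or ova1_positive


def sensitivity(calls, truth):
    """Fraction of malignant cases that were called positive."""
    malignant_calls = [c for c, t in zip(calls, truth) if t]
    return sum(malignant_calls) / len(malignant_calls)


def specificity(calls, truth):
    """Fraction of benign cases that were called negative."""
    benign_calls = [c for c, t in zip(calls, truth) if not t]
    return sum(not c for c in benign_calls) / len(benign_calls)


# Hypothetical toy cases: (physician_positive, ova1_positive, malignant).
cases = [
    (True, True, True),
    (False, True, True),
    (False, False, True),
    (False, True, False),
    (True, False, False),
    (False, False, False),
]

truth = [malignant for _, _, malignant in cases]
single_calls = [physician for physician, _, _ in cases]
dual_calls = [dual_assessment(physician, ova1) for physician, ova1, _ in cases]

single_sens = sensitivity(single_calls, truth)
dual_sens = sensitivity(dual_calls, truth)
single_spec = specificity(single_calls, truth)
dual_spec = specificity(dual_calls, truth)

print(f"sensitivity: {single_sens:.2f} -> {dual_sens:.2f}")
print(f"specificity: {single_spec:.2f} -> {dual_spec:.2f}")
```

The same direction of change appears in the reported results: sensitivity rises and specificity falls under dual assessment for both physician groups.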

6. Standalone (Algorithm Only) Performance

Yes, a standalone performance assessment (algorithm only without human-in-the-loop performance) was done for the OVA1™ Test.

The "Performance Characteristics of the OVA1™ Test Alone" section directly presents its sensitivity, specificity, NPV, and PPV compared to histopathology for patients evaluated by non-GO physicians (and similarly for GO physicians, though these were deemed less relevant for the intended use population).

Standalone Performance (Non-GO Physician Population):

  • Sensitivity: 87.5% (63/72)
  • Specificity: 50.8% (100/197)
  • NPV: 91.7% (100/109)
  • PPV: 39.4% (63/160)
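
The four percentages above follow directly from the 2×2 counts implied by the quoted fractions. The sketch below recomputes them; the variable names and the `pct` helper are illustrative, and the false-negative and false-positive counts are derived from the denominators rather than quoted in the document.

```python
# Counts implied by the fractions above (non-GO population, 72 + 197 = 269).
tp = 63            # malignant, test positive
fn = 72 - tp       # malignant, test negative (derived)
tn = 100           # benign, test negative
fp = 197 - tn      # benign, test positive (derived)


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


metrics = {
    "sensitivity": pct(tp, tp + fn),  # true positives / all malignant
    "specificity": pct(tn, tn + fp),  # true negatives / all benign
    "npv": pct(tn, tn + fn),          # true negatives / all test negatives
    "ppv": pct(tp, tp + fp),          # true positives / all test positives
}

for name, value in metrics.items():
    print(f"{name}: {value}%")
```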

7. Type of Ground Truth Used (for Clinical Studies)

The primary ground truth used for establishing clinical performance was histopathology results from tissue samples obtained during surgical intervention. Malignancy status was determined based on these reports.

8. Sample Size for the Training Set

The algorithm was derived using two independent training datasets:

  • Training Set 1: 284 pre-operative serum samples (109 malignant, 175 benign), of which 274 had complete laboratory data.
  • Training Set 2: 146 pre-operative serum samples (125 evaluable: 89 benign, 10 LMPs, 19 EOCs, 1 primary and 3 non-primary ovarian cancers, 3 other malignancies).
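
Summing the category counts quoted for each training set gives a quick consistency check. The sketch below uses only the numbers listed above; the dictionary keys are shorthand chosen for this example.

```python
# Category counts as quoted for each training set.
training_set_1 = {"malignant": 109, "benign": 175}
training_set_2 = {
    "benign": 89,
    "LMP": 10,
    "EOC": 19,
    "primary ovarian cancer": 1,
    "non-primary ovarian cancer": 3,
    "other malignancy": 3,
}

total_1 = sum(training_set_1.values())
total_2 = sum(training_set_2.values())

print(total_1, total_2)
```

The first total is 284, which matches the number of samples collected rather than the 274 with complete laboratory data; the second total is 125, which matches the evaluable count for Training Set 2.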

9. How the Ground Truth for the Training Set Was Established

The document states that the training sets consisted of "pre-operative serum samples" which were classified into categories like "benign diseases," "ovarian tumors of low malignant potential (LMP)," "epithelial ovarian cancers," etc. The document does not say how these categories were assigned; the classification was presumably based on histopathology of the surgical specimens obtained after mass removal.

§ 866.6050 Ovarian adnexal mass assessment score test system.

(a) Identification. An ovarian/adnexal mass assessment test system is a device that measures one or more proteins in serum or plasma. It yields a single result for the likelihood that an adnexal pelvic mass in a woman, for whom surgery is planned, is malignant. The test is for adjunctive use, in the context of a negative primary clinical and radiological evaluation, to augment the identification of patients whose gynecologic surgery requires oncology expertise and resources.

(b) Classification. Class II (special controls). The special control for this device is FDA's guidance document entitled “Class II Special Controls Guidance Document: Ovarian Adnexal Mass Assessment Score Test System.” For the availability of this guidance document, see § 866.1(e).

(c) Black box warning. Under section 520(e) of the Federal Food, Drug, and Cosmetic Act these devices are subject to the following restriction: A warning statement must be placed in a black box and must appear in all advertising, labeling, and promotional material for these devices. That warning statement must read: