K Number: K192916
Date Cleared: 2020-12-11 (423 days)
Product Code:
Regulation Number: 866.5100
Panel: IM (Immunology)
Reference & Predicate Devices:
Intended Use

NOVA Lite® DAPI dsDNA Crithidia luciliae is an indirect immunofluorescent assay for the qualitative and/or semi-quantitative determination of anti-double stranded DNA (dsDNA) IgG antibodies in human serum by the NOVA View Automated Fluorescence Microscope or by manual fluorescence microscopy. The presence of anti-dsDNA antibodies can be used in conjunction with other serological and clinical findings to aid in the diagnosis of systemic lupus erythematosus (SLE). All results generated with the NOVA View device must be confirmed by a trained operator.

Device Description

The NOVA Lite DAPI dsDNA Crithidia luciliae Kit is an indirect immunofluorescence assay for the qualitative detection and semi-quantitative determination of anti-dsDNA antibodies (IgG) in human serum. Samples are diluted 1:10 in PBS and incubated with the antigen substrate (dsDNA on glass microscope slides). After incubation, unbound antibodies are washed off. The substrate is then incubated with anti-human IgG-FITC conjugate. The conjugate contains a DNA-binding blue fluorescent dye, 4',6-diamidino-2-phenylindole (DAPI), that is required for NOVA View use. The blue dye is not visible with a traditional fluorescence microscope at the wavelength at which FITC fluorescence is viewed. Unbound reagent is washed off. Stained slides are read with a manual fluorescence microscope or scanned with the NOVA View Automated Fluorescence Microscope, and the resulting digital images are reviewed and interpreted on the computer monitor. dsDNA-positive samples exhibit apple-green fluorescence corresponding to areas of the substrate where autoantibody has bound.
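The sections below grade results on a 0 to 4+ reactivity scale and note that NOVA View's automated calls must be confirmed by an operator. As a minimal sketch of that reading flow, assuming a hypothetical light-intensity cutoff and grade bins (the summary does not disclose NOVA View's actual thresholds or image-analysis method):

```python
# Minimal sketch of the reading flow described above, NOT Inova's software.
# LIU_CUTOFF and GRADE_BINS are hypothetical illustration values.

LIU_CUTOFF = 100                    # hypothetical positivity threshold, light intensity units (LIU)
GRADE_BINS = [100, 300, 700, 1500]  # hypothetical lower bounds for grades 1+ through 4+

def reactivity_grade(liu: float) -> int:
    """Map a measured fluorescence intensity to a 0 to 4+ reactivity grade."""
    return sum(liu >= lower for lower in GRADE_BINS)

def automated_call(liu: float) -> str:
    """NOVA View-style preliminary call. Per the intended use, a trained
    operator must review the digital image and confirm (or amend) this call."""
    return "positive" if liu >= LIU_CUTOFF else "negative"
```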

AI/ML Overview

Here's a breakdown of the acceptance criteria and study details for the NOVA Lite DAPI dsDNA Crithidia luciliae Kit, based on the provided text:

Acceptance Criteria and Reported Device Performance

Precision
• Acceptance criteria: Differences between reactivity grades within one run (between replicates) are within ± one reactivity grade, and the average reactivity grade difference between any two runs is within ± one reactivity grade.
• Reported performance: For digital image reading, grades were within ± one reactivity grade within one run (within triplicates), and the average grade differed by no more than one reactivity grade between runs. (Numerical results were tabulated; e.g., Sample 1: 93% positive, grade range 3-4 for NOVA View; 100% positive, grade 4 for manual reading; 100% positive, grade 4 for digital reading.)

Reproducibility (between sites/instruments)
• Acceptance criteria: 90% agreement between operators and between sites.
• Reported performance: Manual reading: most samples showed 100% positive/negative agreement across both readers at all three sites; Sample 4 showed some variability (e.g., Reader 1 at Site 1 reported 7% negative results for a positive sample, others 0%). Digital reading: all samples showed 100% positive/negative agreement across both readers at all three sites. Operator agreement per site: manual reading 99.7% (Site 1), 100.0% (Site 2), 100.0% (Site 3); digital reading 100.0% at all three sites.

Reproducibility (between lots)
• Acceptance criteria: Positive, negative, and total qualitative agreement ≥ 90%; grade agreement ≥ 90% within ± 1 reactivity grade.
• Reported performance: Qualitative agreement: NOVA View positive agreement 91.7%-100.0%, negative agreement 96.4%-100.0%, total agreement 95.0%-100.0%; manual reading 100% positive, negative, and total agreement; digital reading 92.9% positive, 100% negative, and 97.5% total agreement. Grade agreement: manual 100% within ± 1 reactivity grade; digital 98% within ± 1 reactivity grade.

Linearity
• Acceptance criteria: Not explicitly stated as a pass/fail criterion; the expectation is that dilutions follow a predictable pattern.
• Reported performance: The results show a clear progression of decreasing intensity with serial dilution for all three samples across NOVA View, manual, and digital interpretations, supporting linearity.

Interference
• Acceptance criteria: Grades obtained on samples with interfering substances are within ± 1 reactivity grade of those obtained on control samples spiked with diluent.
• Reported performance: No interference was detected with hemoglobin (up to 200 mg/dL), bilirubin (up to 100 mg/dL), triglycerides (up to 1,000 mg/dL), cholesterol (up to 224.3 mg/dL), rheumatoid factor (up to 28.02 IU/mL), or various medications (azathioprine, cyclophosphamide, hydroxychloroquine, ibuprofen, methotrexate, methylprednisolone, mycophenolate, naproxen, rituximab, and belimumab) at the specified concentrations.

Sample stability and handling
• Acceptance criteria: NOVA View results (positive/negative) do not change category and are not different from the control sample; manual-reading and digital-image reactivity grades are within ± 1 grade of the control sample.
• Reported performance: All samples fulfilled the acceptance criteria at each time point (up to 21 days at 2-8°C, up to 48 hours at room temperature, and up to 3 freeze/thaw cycles) for each condition.

Reagent stability (shelf life)
• Acceptance criteria: Reactivity grades of all samples/reagent controls run are within ± 1 reactivity grade of the control condition (week 0) for both manual and digital image interpretation, for all three lots.
• Reported performance: The acceptance criteria were met with the accelerated lots tested, supporting a two-year preliminary expiration dating; all samples were within ± 1 reactivity grade of the control kit. Real-time stability results to date (up to 24, 15, and 19 months for the different lots) were within the acceptance limits.

Reagent stability (in-use/open vial; conjugate and controls)
• Acceptance criteria: Appearance: clear liquid, free from foreign matter. Grades: within ± 1 grade of each other. Fluorescence grading: > 3+ for the undiluted positive control, 0 for the undiluted negative control. Testing: comparable to control.
• Reported performance: The acceptance criteria were met for all 8 weeks tested for both conjugate and controls.

Single well titer (SWT)
• Acceptance criteria: SWT is within ± 2 dilution steps of both the manual end-point titer and the digital titer.
• Reported performance: In the initial validation (31 samples), 80.6% of SWT results were within ± 1 dilution step of the manual titer and 83.9% within ± 1 dilution step of the digital titer; 93.3% were within ± 2 dilution steps of the manual titer and 93.5% within ± 2 dilution steps of the digital titer (2 of 31 samples fell outside the ± 2 step range). In the between-sites reproducibility study, 100% of SWT results at two external sites were within ± 1 dilution step of the manual titer (14/14) and 92.9% within ± 1 dilution step of the digital titer (13/14); 100% were within ± 2 dilution steps of both.
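Several of the criteria above reduce to simple agreement computations. Below is a sketch of those calculations, assuming paired qualitative calls ("pos"/"neg") and integer reactivity grades; the function names and example thresholds are illustrative, not from the submission.

```python
from typing import Sequence

def qualitative_agreement(ref: Sequence[str], test: Sequence[str]) -> dict[str, float]:
    """Positive, negative, and total percent agreement of `test` against `ref`,
    where each element is "pos" or "neg"."""
    pairs = list(zip(ref, test))
    pos = [(r, t) for r, t in pairs if r == "pos"]
    neg = [(r, t) for r, t in pairs if r == "neg"]
    return {
        "positive": 100 * sum(t == "pos" for _, t in pos) / len(pos),
        "negative": 100 * sum(t == "neg" for _, t in neg) / len(neg),
        "total": 100 * sum(r == t for r, t in pairs) / len(pairs),
    }

def grade_agreement(ref: Sequence[int], test: Sequence[int], tol: int = 1) -> float:
    """Percent of paired reactivity grades differing by at most `tol`."""
    ok = sum(abs(r - t) <= tol for r, t in zip(ref, test))
    return 100 * ok / len(ref)

# Between-lot acceptance maps to: every qualitative agreement >= 90% and
# grade_agreement(..., tol=1) >= 90%.
```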

Study Details:

  1. Sample sizes used for the test set and the data provenance:

    • Precision Study: 6 samples (2 negative, 2 borderline, 2 positive), each processed in triplicate across 14 runs (2 runs/day for 7 days), resulting in 42 data points per sample.
    • Reproducibility Studies (Between sites/instruments): 10 samples (3 negative, 7 positive), each tested in triplicate, twice a day for 5 days at each of 3 sites. This yields 30 data points per sample per site, or 90 data points per sample across all sites. Total data points for this study: 10 samples × 30 data points/sample × 3 sites = 900 data points (see the arithmetic check after this list).
    • Reproducibility (Between lots): 20 clinically and/or analytically characterized samples, tested in duplicate.
    • Linearity Study: 3 positive samples (high, medium, low), serially diluted from 1:10 up to 1:5120. (Number of replicates not specified for this part, but results are given for each dilution).
    • Interference Study: 3 specimens (one negative, one positive, one strong positive) for each interferent, with interfering substances spiked at three different concentrations in 10% of total specimen volume. Samples assessed in triplicates.
    • Sample Stability and Handling: 3 samples (negative, cut-off, positive), tested in duplicates for various conditions (up to 21 days at 2-8°C, up to 48 hours at room temperature, up to 3 freeze/thaw cycles).
    • Reagent Stability (Shelf-life): 3 lots of the kit, tested over 4 weeks accelerated stability (each week = 6 months real time). Real-time stability data was available up to 24, 15, and 19 months for the respective lots at the time of submission.
    • Reagent Stability (In-use/Open Vial): Not detailed how many units/tests were performed each week for 8 weeks.
    • Clinical Performance (Initial Study): 766 clinically characterized serum samples (391 SLE, 375 other diseases). No explicit country of origin is stated, but given this is an FDA submission for Inova Diagnostics, Inc. in San Diego, California, it is reasonable to infer a US-centric data provenance. The study appears to be retrospective based on "clinically characterized serum samples."
    • Clinical Performance (3-Site Study): 269 clinically characterized samples (100 SLE, 169 non-SLE) tested at each of three sites, yielding 300 SLE and 507 non-SLE results (807 in total) across the sites. The data provenance is likely multi-center, potentially within the US. The description "clinically characterized samples" suggests the samples were collected and diagnosed before the study, implying a retrospective design.
    • Expected Values: 120 samples from apparently healthy subjects (60 females, mean age 41, range 18-73).
    • Comparison with Predicate Device: 744 of the serum samples from the initial clinical study (391 SLE, 353 other diseases).
    • SWT Validation: 31 positive samples for initial validation. 7 positive samples in the between-sites reproducibility study.
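As a quick arithmetic check, the sample counts quoted above are internally consistent; the snippet below uses only numbers stated in this summary.

```python
# Consistency checks on counts quoted in this summary (no new data).

# Between-sites reproducibility: 3 replicates x 2 runs/day x 5 days = 30
# data points per sample per site; 10 samples x 30 x 3 sites = 900 in total.
assert 3 * 2 * 5 == 30
assert 10 * 30 * 3 == 900

# Initial clinical study: 391 SLE + 375 other diseases = 766 samples.
assert 391 + 375 == 766

# Three-site clinical study: 100 SLE + 169 non-SLE = 269 samples per site;
# across 3 sites, 300 SLE + 507 non-SLE = 807 results.
assert 100 + 169 == 269
assert 269 * 3 == 807 == 300 + 507
```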
  2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

    • For clinical studies: The ground truth is described as "clinically characterized serum samples" with an established "clinical diagnosis," implying diagnoses made by medical professionals (e.g., rheumatologists for SLE patients). The text does not specify the number of experts or their qualifications; because this is an immunoassay rather than an imaging device, ground truth rests on clinical diagnosis rather than expert image readers.
    • For analytical studies (precision, reproducibility, linearity, interference, stability): The ground truth (expected result/expected grade) for control or known samples was established by the manufacturer, typically based on prior characterization or established laboratory practice. Interpretation of manual and digital reading results is performed by trained operators.
  3. Adjudication method for the test set:

    • For analytical results (precision, reproducibility, linearity, interference, stability): The text notes that "digital images were interpreted and confirmed" in several sections (e.g., linearity, interference, sample stability). In the between-sites/instruments reproducibility study, manual and digital reading were performed by two operators at each site to assess between-operator reproducibility, and the acceptance criteria focus on agreement percentages between operators. Disagreements, if any, appear to be reflected directly in the summary agreement percentages; no formal adjudication method such as 2+1 or 3+1 is explicitly stated.
    • For clinical results: The clinical samples were "clinically characterized," meaning their diagnosis served as the ground truth. There's no indication of an adjudication process for these clinical diagnoses within the context of this device study.
  4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI versus without AI assistance:

    • This is not a traditional MRMC comparative effectiveness study involving AI assistance improving human readers. The study compares three modes of interpretation:
      • Manual Reading: Human interpretation using a traditional fluorescence microscope.
      • Digital Reading: Human interpretation of NOVA View generated images on a computer monitor.
      • NOVA View (Software): Automated interpretation by the device's software (algorithm only), which is then confirmed by a trained operator.
    • Therefore, the setup is more of a comparison between manual microscopy, human interpretation of digital images, and the device's automated output. The device itself (NOVA View) is not presented as an AI-assistance tool for human readers but as an alternative interpretation method that still requires human confirmation.
    • The "effect size of how much human readers improve with AI vs without AI assistance" is not directly measured in this context because the "NOVA View" results are the algorithm's output, not a human reader assisted by the algorithm. The "Digital Reading" is human interpretation of the images produced by the NOVA View device, which might be considered an "assisted" or "different modality" reading but not in the typical AI-driven improvement sense.

    Sensitivity and specificity illustrate the comparison between manual reading, digital reading (human interpretation of digital images), and NOVA View (the algorithm):

    Initial Clinical Study (N=766)

    • Sensitivity (on SLE):
      • Manual: 48.1% (43.2-53.0)
      • Digital: 48.1% (43.2-53.0)
      • NOVA View: 57.0% (52.1-61.8)
    • Specificity:
      • Manual: 91.2% (87.9-93.7)
      • Digital: 92.3% (89.1-94.6)
      • NOVA View: 88.8% (85.2-91.6)

    Clinical Studies 3 Sites (N=807)

    • Sensitivity (on SLE):
      • Manual Reading: 32.7% (27.6-38.2)
      • Digital Reading: 34.0% (28.9-39.5)
      • NOVA View: 40.0% (34.6-45.6)
    • Specificity:
      • Manual Reading: 95.5% (93.3-97.0)
      • Digital Reading: 95.5% (93.3-97.0)
      • NOVA View: 85.4% (82.1-88.2)

    In both clinical studies, the NOVA View algorithm demonstrates higher sensitivity for SLE detection compared to manual or digital human readings, but lower specificity. This highlights a performance difference, not an improvement of human readers with AI assistance.
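The intervals reported above are consistent with Wilson score confidence intervals computed from back-calculated counts (e.g., 48.1% of 391 SLE samples is 188). Both the counts and the CI method are assumptions here, since the summary reports only percentages; a sketch:

```python
# Sketch reproducing a reported sensitivity and its 95% CI. The count 188/391
# is back-calculated from the quoted 48.1%, and the Wilson score interval is
# an assumed CI method; the summary itself states only the percentages.

from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - margin) / denom, (centre + margin) / denom

lo, hi = wilson_ci(188, 391)  # manual reading, initial study
print(f"sensitivity {188 / 391:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
# -> sensitivity 48.1%, 95% CI 43.2%-53.0% (matches the values above)
```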

  5. If a standalone (i.e., algorithm-only, without human-in-the-loop) performance study was done:

    • Yes, the "NOVA View" performance reported in the tables (e.g., sensitivity, specificity, qualitative agreements in reproducibility studies) represents the standalone algorithm's performance.
    • The text explicitly states: "All results generated with NOVA View device must be confirmed by a trained operator." This indicates that while the software generates automated classifications, the final clinical interpretation includes a human-in-the-loop for confirmation. The reported performance metrics for "NOVA View" specifically reflect the device's automated output.
  6. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

    • Clinical Ground Truth: "Clinical diagnosis" for patient-derived samples (e.g., Systemic Lupus Erythematosus (SLE), Drug Induced Lupus, Infectious Disease, etc.). This is likely based on a combination of clinical findings and serological tests by clinicians. It is stated as "clinically characterized serum samples."
    • Analytical Ground Truth: For the precision, reproducibility, linearity, interference, and stability studies, the "Expected Result" or "Expected Grade" for the tested samples serves as the ground truth. These are typically reference materials or well-characterized samples with known positive/negative status or reactivity grades established by the manufacturer.
  7. The sample size for the training set:

    • The document does not explicitly state the sample size of a training set for the NOVA View algorithm. It describes validation studies (test sets) for the kit and the performance of the NOVA View device, but not how the algorithm itself was developed or trained.
    • What is mentioned is that for the SWT (Single Well Titer) feature, "The SWT function was established using 22 dsDNA positive samples that represent various levels of antibodies." However, this refers to establishing the intensity curves for titer determination, not necessarily a broad 'training set' for the overall positive/negative classification logic or image analysis.
  8. How the ground truth for the training set was established:

    • Since the training set is not described, the method for establishing its ground truth is likewise not stated.
    • For the 22 dsDNA positive samples used to establish SWT intensity curves, it is implied that manual and digital readings (human interpretations) served as comparison points for establishing the LIU (Light Intensity Units) to titer relationship. The validation of SWT compares its output to manual and digital end-point titers.
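The summary does not describe the SWT algorithm itself, only that intensity curves were established from 22 positive samples. Below is a hypothetical sketch of the kind of lookup this implies: a calibration curve mapping one well's LIU to an end-point titer by log-linear interpolation, snapped to doubling-dilution steps. Every curve value, name, and the snapping rule are illustrative assumptions.

```python
# Hypothetical single-well-titer lookup, illustrating the kind of LIU-to-titer
# mapping described above. CURVE, the interpolation, and the snapping rule are
# assumptions for illustration; Inova's actual SWT curves are not disclosed.

from bisect import bisect_left
from math import log2

# Assumed calibration points: (measured LIU, end-point titer denominator).
CURVE = [(120, 10), (260, 40), (540, 160), (1100, 640), (2300, 2560)]

def single_well_titer(liu: float) -> int:
    """Interpolate log2(titer) from one well's LIU and snap to the nearest
    doubling-dilution step (1:10, 1:20, 1:40, ...)."""
    xs = [x for x, _ in CURVE]
    i = min(max(bisect_left(xs, liu), 1), len(CURVE) - 1)
    (x0, t0), (x1, t1) = CURVE[i - 1], CURVE[i]
    frac = (liu - x0) / (x1 - x0)
    k = log2(t0) + frac * (log2(t1) - log2(t0))  # interpolated log2 titer
    steps = max(round(k - log2(10)), 0)          # doubling steps above 1:10
    return 10 * 2 ** steps

def step_difference(titer_a: int, titer_b: int) -> float:
    """Dilution steps between two titers; the SWT acceptance criterion above
    requires this to be <= 2 versus the manual and digital end-point titers."""
    return abs(log2(titer_a) - log2(titer_b))
```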

§ 866.5100 Antinuclear antibody immunological test system.

(a) Identification. An antinuclear antibody immunological test system is a device that consists of the reagents used to measure by immunochemical techniques the autoimmune antibodies in serum, other body fluids, and tissues that react with cellular nuclear constituents (molecules present in the nucleus of a cell, such as ribonucleic acid, deoxyribonucleic acid, or nuclear proteins). The measurements aid in the diagnosis of systemic lupus erythematosus (a multisystem autoimmune disease in which antibodies attack the victim's own tissues), hepatitis (a liver disease), rheumatoid arthritis, Sjögren's syndrome (arthritis with inflammation of the eye, eyelid, and salivary glands), and systemic sclerosis (chronic hardening and shrinking of many body tissues).

(b) Classification. Class II (performance standards).