FDA 510(k) Search - By Innolitics (SaMD and AI Experts)

The Helix Laboratory Platform is a qualitative in vitro diagnostic device intended for exome sequencing and detection of single nucleotide variants (SNVs) and small insertions and deletions (indels) in human genomic DNA extracted from saliva samples collected with Oragene® Dx OGD-610. The Helix Laboratory Platform is only intended for use with other devices that are germline assays authorized by FDA for use with this device. The device is performed at the Helix laboratory in San Diego, CA.

Device Description

The Helix Laboratory Platform (HLP) is a high throughput DNA sequencing platform for targeted sequencing of an individual's whole exome for use with a genetic test application. Genetic test applications may be third party partner ("Partner") genetic test applications or a Helix genetic test application such as the Helix Genetic Health Risk App (HRA; K192073). The DNA sequence generated by this device is intended as input to clinical germline DNA assays intended for use with this device that have FDA marketing authorization. A brief overview of the commercialized workflow is shown in Figure 1 (refer to the section titled "Test Principle" for more specific information regarding the commercialized workflow). HLP consists of a HiSeq sequencing instrument, cBot system, library preparation reagents, sequencing reagents, and data analysis software. The Helix Laboratory Platform also interacts with the Helix Laboratory Automation Systems and Content Mapping Systems, which serve as repositories for the data and do not perform data analysis. The test detects single nucleotide variants (SNVs) and insertions and deletions (indels) up to 20 base pairs (bp), and is limited to making high-confidence variant calls that meet prespecified quality metrics (i.e., the analytical range) within the reportable range. Sequencing is performed at the Helix clinical laboratory in San Diego, CA.
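The reporting limits stated in this summary (SNVs and indels up to 20 bp, with the performance table further noting that indels in regions with GC content >65% are excluded and that indels ≥ 6 bp require independent validation) can be illustrated with a minimal Python sketch. This is not Helix's pipeline: the `Variant` class, the `classify` function, the returned labels, and the `region_gc` input are hypothetical names introduced here, and the sketch does not model the prespecified quality metrics that a call must also pass.

```python
from dataclasses import dataclass

# Limits taken from this summary; how the device applies them is not described.
MAX_INDEL_BP = 20         # indels up to 20 bp are detected
INDEL_VALIDATION_BP = 6   # indels >= 6 bp require independent validation
INDEL_MAX_GC = 0.65       # indels in regions with GC content > 65% are excluded


@dataclass(frozen=True)
class Variant:
    ref: str          # reference allele
    alt: str          # alternate allele
    region_gc: float  # GC fraction of the surrounding region (hypothetical input)


def classify(v: Variant) -> str:
    """Return an illustrative reporting label for a single variant."""
    if len(v.ref) == len(v.alt):
        # Equal-length alleles: an SNV if one base long, otherwise out of scope.
        return "reportable" if len(v.ref) == 1 else "not an SNV or indel"
    size = abs(len(v.ref) - len(v.alt))  # indel length in base pairs
    if size > MAX_INDEL_BP:
        return "outside detected indel size"
    if v.region_gc > INDEL_MAX_GC:
        return "excluded (indel in high-GC region)"
    if size >= INDEL_VALIDATION_BP:
        return "reportable, requires independent validation"
    return "reportable"


print(classify(Variant("A", "G", 0.70)))           # SNV in a high-GC region
print(classify(Variant("A", "A" + "T" * 8, 0.40)))  # 8 bp insertion
print(classify(Variant("ACGT", "A", 0.70)))         # 3 bp deletion, high GC
```

Note that the GC exclusion in this sketch applies only to indels, matching the wording of the performance table; SNV performance is reported as meeting criteria across GC content ranges.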

AI/ML Overview

Acceptance Criteria and Device Performance Study

This document describes the acceptance criteria for the Helix Laboratory Platform (HLP) and the studies conducted to demonstrate that the device meets these criteria. The HLP is a qualitative in vitro diagnostic device intended for exome sequencing and detection of single nucleotide variants (SNVs) and small insertions and deletions (indels) in human genomic DNA extracted from saliva samples, for use with other FDA-authorized germline assays.

1. Table of Acceptance Criteria and Reported Device Performance

Variant Type / Study	Acceptance Criteria (PPA, TPPV, NPA)	Reported Device Performance (Summary Across Studies)	Notes on Performance & Exclusions
SNV	PPA ≥ 99.5%, TPPV ≥ 99.5%, NPA ≥ 99.99%	PPA: 99.91% - 99.98% (across various studies and stratifications); TPPV: 99.93% - 99.99% (across various studies and stratifications); NPA: ≥ 99.99% (consistently reported as 1.0000 in various contexts)	All overall SNV performance metrics (PPA, TPPV, NPA) consistently met or exceeded acceptance criteria across precision, between-lot reproducibility, and accuracy studies, and across different regions (Coding, Mendeliome, Priority) and GC content ranges.
Indel (all sizes)	PPA ≥ 99.0%, TPPV ≥ 99.0%	PPA: 98.63% - 99.98% (varies by size and study); TPPV: 91.92% - 99.92% (varies by size and study)	Overall indel performance met criteria. However, for indels ≥ 6bp, particularly insertions, PPA and TPPV were sometimes below the 99.0% threshold (e.g., as low as 92.12% PPA and 91.92% TPPV for NA12878 in Precision study). These indels ≥ 6bp are noted to require independent validation as per the Instructions for Use. Indels in regions with GC content >65% are excluded from reporting due to observed suboptimal performance.
Exogenous Interference (Food)	NPA ≥ 99.99%, PPA ≥ 99.5%, TPPV ≥ 99.5% (all with 95% CI lower bound of 99.0%)	Immediately after food: Mean PPA 0.9988 (lower bound 0.9986), Mean NPA 0.9999 (lower bound 0.9999), Mean TPPV 0.9362 (lower bound 0.9355)	Performance immediately after food failed to meet acceptance criteria for mean PPA and TPPV, attributed to one poorly performing sample. This indicates that saliva samples should be collected at least 30 minutes after consuming food. The "30 minutes after food" condition met all criteria.
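The agreement metrics in the table can be illustrated with a short Python sketch. The summary does not define PPA, NPA, or TPPV, so the formulas below are the conventional ones and are an assumption here: PPA = TP/(TP+FN), NPA = TN/(TN+FP), TPPV = TP/(TP+FP). The summary also does not say how the 95% confidence bounds were computed; the Wilson score bound shown is one common choice, not necessarily the method used in the submission. The counts in `example` are made up for illustration and are not study data.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def ppa(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)


def npa(c: Counts) -> float:
    return c.tn / (c.tn + c.fp)


def tppv(c: Counts) -> float:
    return c.tp / (c.tp + c.fp)


def wilson_lower(successes: int, total: int, z: float = 1.96) -> float:
    """Lower limit of the two-sided 95% Wilson score interval (assumed method)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


METRICS = {"PPA": ppa, "TPPV": tppv, "NPA": npa}

# Acceptance criteria as listed in the table above.
SNV_CRITERIA = {"PPA": 0.995, "TPPV": 0.995, "NPA": 0.9999}
INDEL_CRITERIA = {"PPA": 0.99, "TPPV": 0.99}


def evaluate(c: Counts, criteria: dict) -> dict:
    """Map each metric name to (value, meets_criterion)."""
    results = {}
    for name, threshold in criteria.items():
        value = METRICS[name](c)
        results[name] = (value, value >= threshold)
    return results


# Hypothetical counts, for illustration only.
example = Counts(tp=49_950, fp=20, fn=50, tn=30_000_000)
for name, (value, ok) in evaluate(example, SNV_CRITERIA).items():
    print(f"{name}: {value:.6f} {'pass' if ok else 'fail'}")
print(f"PPA 95% lower bound (Wilson): {wilson_lower(example.tp, example.tp + example.fn):.6f}")
```

Because true negatives vastly outnumber variant sites in an exome, NPA sits very close to 1 even with some false positives, which is why TPPV is reported alongside it.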

2. Sample Sizes Used for the Test Set and Data Provenance

The major testing was performed across several studies:

Precision (Cell lines): 6 unique reference cell line samples (NA12877, NA12878, NA24385, NA24149, NA24143, NA24631). Each was tested with 72 replicates, for a total of 432 replicates.
Precision (Clinical Specimens): 18 unique saliva (clinical) samples, each intended for 72 replicates, for a total of 1296 intended replicates. Due to QC failures, 118 replicates were not evaluable, leaving 1178 evaluable replicates across 17 samples (all replicates of one sample failed).
Between-Lot Reproducibility: 24 samples (6 cell lines and 18 saliva-derived DNAs). Each sample produced 54 replicate sequences with combinatorial sets of reagents, totaling 1296 intended replicates. 1287 evaluable replicates passed QC.
DNA Input: 20 unique samples with known variants. Tested at 35ng, 50ng, 70ng, and 100ng DNA input, each in triplicate (totaling 240 intended samples). 219 samples were evaluable.
Index Swapping - Barcoding: 48 saliva samples with known variants. Run in triplicate, totaling 160 libraries. 157 were used in analysis.
Interfering Substances (Endogenous): 60 donors, each providing 3 aliquots (no treatment, plus 2 different endogenous substances). 180 no-treatment libraries and 120 treatment libraries were generated. 299 out of 300 samples were evaluable.
Interfering Substances (Exogenous): 22 donors (originally 20, 2 added for food group), each providing samples for various conditions (before, immediately after, 30 min after consumption of food, drink, gum, mouthwash). 198 intended samples. Number of evaluable samples varied by condition.
Interfering Substances (Microbial): 6 cell line DNA samples across 5 bacterial content conditions (0%, 10%, 20%, 30%, 50%), each in triplicate (totaling 90 samples). 81 samples evaluable. Also, fresh saliva from 3 donors tested across 3 conditions (baseline, bacteria spiked-in, yeast spiked-in), each in triplicate (totaling 27 samples).
Interfering Substances (Smoking): 5 donors, each providing samples for 3 conditions (before, immediately after, 30 min after smoking), each in triplicate (totaling 45 samples).
Accuracy Study 1 (Reference Cell Lines): 6 well-characterized cell lines (same as Precision study).
Accuracy Study 2 (Clinical Specimens): 1002 clinical samples and 96 unique cell line samples.
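The replicate totals listed above follow from multiplying out each study design, which a short Python sketch can check. Only studies whose design factors are fully stated in this summary are included; the dictionary layout and the `summarize` function are illustrative names, and evaluable counts are given only where the summary states them.

```python
from math import prod

# name -> (design factors, evaluable results stated in the summary or None)
DESIGNS = {
    "Precision (cell lines)": ((6, 72), None),               # samples x replicates
    "Precision (clinical specimens)": ((18, 72), 1178),
    "Between-lot reproducibility": ((24, 54), 1287),
    "DNA input": ((20, 4, 3), 219),                           # samples x inputs x triplicate
    "Microbial interference (cell line DNA)": ((6, 5, 3), 81),
    "Microbial interference (fresh saliva)": ((3, 3, 3), None),
    "Smoking": ((5, 3, 3), None),
}


def summarize(designs: dict) -> dict:
    """Return name -> (intended, evaluable, evaluable fraction or None)."""
    rows = {}
    for name, (factors, evaluable) in designs.items():
        intended = prod(factors)
        rate = None if evaluable is None else evaluable / intended
        rows[name] = (intended, evaluable, rate)
    return rows


summary = summarize(DESIGNS)
for name, (intended, evaluable, rate) in summary.items():
    shown = "not stated" if rate is None else f"{evaluable} ({rate:.1%})"
    print(f"{name}: intended={intended}, evaluable={shown}")
```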

Data Provenance:
The reference cell line samples (NA12877, NA12878, NA24385, NA24149, NA24143, NA24631) are publicly available samples characterized by the Genome in a Bottle (GIAB) consortium and the Platinum Genomes project, primarily representing Northern European (Utah) and Ashkenazi Jewish ancestry. One GIAB sample (NA24631) is of Chinese ancestry.
Clinical samples were saliva samples collected from donors within the Helix lab's specimen collection. These are therefore retrospective samples. The country of origin is not explicitly stated for the clinical samples but is assumed to be the USA, where the Helix lab is located.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

The ground truth for the reference cell lines (NA12877, NA12878, NA24385, NA24149, NA24143, NA24631) was established by publicly available reference datasets from the Genome in a Bottle (GIAB) consortium and the Platinum Genomes project. These consortia involve multiple expert groups and diverse sequencing technologies to establish highly confident variant calls. No specific number of, or qualifications for, individual experts are listed, as the ground truth relies on these highly vetted, community-accepted reference standards.

For the clinical samples in Accuracy Study 2, a validated Sanger sequencing method was used as the comparator method to confirm the accuracy of specific variants. This implies expert interpretation of Sanger sequencing results, but the number and qualifications of these experts are not explicitly stated.

4. Adjudication Method for the Test Set

The ground truth for reference cell lines was based on publicly available, highly vetted datasets (GIAB, Platinum Genomes), which typically involve a consensus-based approach from multiple sequencing technologies and analyses rather than active, real-time adjudication by a small group of experts for this specific study.

For samples where a reference sequence was generated within a study (e.g., Precision, DNA Input, Endogenous/Exogenous Substances, Microbial Interference, Smoking studies), it was established by majority call comparison over multiple replicates of a sample within the study. This implies an internal consensus mechanism rather than external expert adjudication.
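A majority call across replicates can be sketched in a few lines of Python. The summary does not give the exact rule (for example, how ties or no-calls were handled), so the strict-majority threshold, the treatment of a missing site as a no-call, and the names `majority_call` and `build_reference` are assumptions made for illustration. The sites and genotypes in the example are invented.

```python
from collections import Counter


def majority_call(genotypes, min_fraction=0.5):
    """Return the genotype seen in more than min_fraction of called replicates.

    None entries are treated as no-calls and ignored (an assumption).
    Returns None when no genotype reaches a strict majority.
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return None
    genotype, count = Counter(calls).most_common(1)[0]
    return genotype if count / len(calls) > min_fraction else None


def build_reference(replicates):
    """Build a per-site reference from a list of {site: genotype} dicts."""
    sites = set().union(*(r.keys() for r in replicates))
    return {
        site: majority_call([r.get(site) for r in replicates])
        for site in sorted(sites)
    }


# Hypothetical replicates keyed by (chromosome, position, ref, alt).
replicates = [
    {("chr1", 1000, "A", "G"): "0/1", ("chr2", 500, "C", "T"): "1/1"},
    {("chr1", 1000, "A", "G"): "0/1", ("chr2", 500, "C", "T"): "0/1"},
    {("chr1", 1000, "A", "G"): "0/1"},
]
reference = build_reference(replicates)
print(reference)
```

In this example the first site is called consistently in all three replicates, while the second site has two conflicting calls and one no-call, so no majority genotype is assigned.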

For Accuracy Study 2 (clinical samples), Sanger sequencing was used as the comparator. The document does not describe a formal adjudication method (e.g., 2+1, 3+1) involving external experts for discrepancies between HLP and Sanger sequencing.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was performed or described in this document. The evaluation focuses on the standalone analytical performance of the Helix Laboratory Platform and its concordance with established ground truth or comparator methods, not on how human readers' performance improves with or without AI assistance from this platform.

6. Standalone Performance Study

Yes, extensive standalone performance studies were conducted. The entire "Performance Characteristics" section (Section L) details the intrinsic analytical performance of the Helix Laboratory Platform (HLP) without human intervention for variant calling and quality assessment. The reported PPA, NPA, and TPPV values across various studies (Precision, Between-Lot Reproducibility, DNA Input, Index Swapping, Interfering Substances, Accuracy) demonstrate the algorithm's performance in detecting SNVs and indels when operating independently.

7. Type of Ground Truth Used

The primary types of ground truth used were:

Expert Consensus / Community Standards: For reference cell lines, publicly available datasets from the Genome in a Bottle (GIAB) consortium and the Platinum Genomes project were used. These are highly confident, multi-platform consensus truth sets.
Validated Comparator Method: For clinical samples where specific variants were assessed (e.g., Accuracy Study 2), a validated Sanger sequencing method served as the comparator ground truth.
Internal Majority Call: For various precision and interference studies where multiple replicates of a sample were generated and analyzed, a "majority call" across these replicates was used to establish an internal reference sequence for performance comparison.

8. Sample Size for the Training Set

The document does not explicitly state the sample size used for training the Helix bioinformatics pipeline's algorithms. It mentions optimization processes in Section L.1, such as "Optimization of variant read depth, allele fraction and callability thresholds" and "Establishment of filter and QC threshold for variant calling." These optimizations were "based on historical reference sample runs" and "analyzed with [...redacted...] of the bioinformatics pipeline representing different conditions relative to the quality metric criteria." While this indicates that data was used for optimizing and establishing parameters, specific training set sizes are not provided.

9. How the Ground Truth for the Training Set Was Established

Similar to point 8, the document does not explicitly detail how the ground truth for any training set was established. However, the optimization efforts heavily relied on "historical reference sample runs" and "reference samples." It is highly likely that these reference samples would have included publicly available, well-characterized control genomes like those from the GIAB and Platinum Genomes projects, for which the "truth" variants are established through extensive, multi-platform sequencing and expert consensus by those consortia. The process described for establishing QC thresholds (e.g., in Section L.1.b) implies a comparison against these known reference datasets to fine-tune filtering parameters and improve accuracy.

Summary

Regulation Number and Section

§ 866.6000 Whole exome sequencing constituent device.

(a) Identification. A whole exome sequencing constituent device is for germline whole exome sequencing of genomic deoxyribonucleic acid (DNA) isolated from human specimens. The DNA sequence generated by this device is intended as input for clinical germline DNA assays that have FDA marketing authorization and are intended for use with this device.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) The intended use on the device's label and labeling required under § 809.10 of this chapter must include:
(i) The indicated variant types for which acceptable, as determined by FDA, validation data has been provided. Distinct variant types are considered as single nucleotide variant, insertion, deletion, tandem repeats, copy number variants, or gene rearrangements, and validated for specific sizes and lengths, as applicable.
(ii) The indicated specimen type(s) for which acceptable, as determined by FDA, validation data has been provided.
(2) The labeling required under § 809.10(b) of this chapter must include:
(i) The identification of, or the specifications for, the collection device or devices to be used for sample collection, as applicable.
(ii) A description of the reportable range, which is the region of the genome for which the assay is intended to provide results, as well as a description of the targeted regions of the genome that have enhanced coverage. This must include a description of any genomic regions that are excluded from the reportable region due to unacceptable risk of erroneous results, or for other reasons. A description of the clinically relevant genes excluded from the reportable range must also be included, if applicable.
(iii) A description of the design features and control elements, including the quality metrics and thresholds which are used for reporting the analytical range (the genomic DNA in the reportable range that passed the quality metrics in the run required for reporting to the user) that are incorporated into the testing procedure, that mitigate the risk of incorrect clinical results. The following metrics are considered applicable in the generation of high confidence data and the established thresholds for these metrics for reporting must be described and be determined to be acceptable by FDA: cluster density and percent of cluster pass quality filter, percent of bases meeting the minimum base quality score, average coverage of reads, percent of reads mapped on target, percent of reportable region with coverage meeting the minimum requirement, percent of unassigned read indices, percent of reads for non-human DNA, allele fraction, and strand bias. Any alternate metrics used must be described and an acceptable, as determined by FDA, rationale for applicability must be provided.
(iv) A representative sample of the device output report(s) provided to users, which must include any relevant limitations of the device, as determined applicable by FDA.
(3) Design verification and validation must include:
(i) A detailed description of the impact of any software, including software applications and hardware-based devices that incorporate software, on the device's function.
(ii) Acceptable data, as determined by FDA, demonstrating how the key quality metrics and quality metric thresholds in the list in paragraph (b)(2)(iii) of this section for reporting were established and optimized for accuracy using appropriate DNA standards with established reference genomic sequence. Data must include, as applicable, base quality score, allele fraction for heterozygosity and coverage, and other applicable metrics.
(iii) Data demonstrating acceptable, as determined by FDA, analytical device performance using patient specimens representing the full spectrum of expected variant types reported across the genome and in genomic regions that are difficult to sequence. The number of specimens tested must be sufficient to obtain estimates of device performance that are representative of the device performance that can be expected for the reportable region and clinically relevant subsets of the reportable region, as applicable. For each study, data must include a summary of the key quality metric data; the number and percentage of true positives (TP), false positives (FP), and false negatives (FN); number and percentage of no-calls; positive percent agreement (PPA); negative percent agreement (NPA); positive predictive value (PPV); technical positive percent value (TPPV); and non-reference concordance (NRC). These data must be provided per sample and stratified by variant type. The variant data must also be further stratified by size and zygosity (homozygous common allele, heterozygous, homozygous rare allele). Data demonstrating the accuracy assay based on guanine and cytosine (GC) content, pseudogenes, and proximity to short tandem repeats must also be presented. The data must be presented for the entire exome and also for clinically relevant subsets of the reportable region. For each study, the number of run failures and repeat/requeued specimens must be summarized.
(iv) Documentation of acceptance criteria that are applied to analytical and clinical validation studies, which must be justified based on the estimated risk of erroneous results on clinically significant genes and variants and must be clinically acceptable, as determined by FDA. The acceptance criteria must be pre-specified prior to clinical and analytical validation studies, and all validation testing results must be documented with respect to those acceptance criteria.
(v) Analytical validation must be demonstrated by conducting studies that provide:
(A) Data demonstrating acceptable, as determined by FDA, accuracy based on agreement with an acceptable, as determined by FDA, comparator method(s) that has been validated to have high accuracy and reproducibility. Accuracy of the test shall be evaluated with reference standards and clinical specimens for each indicated specimen type of a number determined acceptable by FDA, collected and processed in a manner consistent with the test's instructions for use.
(B) Data demonstrating acceptable, as determined by FDA, precision from a precision study using clinical samples to adequately evaluate intra-run, inter-run, and total variability across operator, instrument, lot, day, and site, as applicable. The samples must include the indicated range of DNA input. Precision, including repeatability and reproducibility, must be assessed by agreement between replicates, and also supported by sequencing quality metrics for targeted regions across the panel. Precision must be demonstrated per specimen and in aggregate. Precision data must be calculated and presented with and without no calls/invalid results.
(C) Data demonstrating acceptable, as determined by FDA, accuracy in the presence of clinically relevant levels of potential interfering substances that are present in the specimen type and intended use population, including, for example, endogenous substances, exogenous substances, and microbes, as applicable.
(D) Data demonstrating the absence of sample cross contamination due to index swapping (misassignment).
(E) Data demonstrating that the pre-analytical steps such as DNA extraction are robust such that sources of variability in these steps and procedures do not diminish the accuracy and precision of the device.
(F) Data demonstrating that acceptable, as determined by FDA, device performance is maintained across the range of claimed DNA input concentrations for the assay.
(vi) Design verification and validation for software within the whole exome sequencing constituent device must include the following:
(A) Detailed description of the software, including specifications and requirements for the format of data input and output, such that users can determine if the device conforms to user needs and intended uses.
(B) Device design must include a detailed strategy to ensure cybersecurity risks that could lead to loss of genetic data security, are adequately addressed and mitigated (including device interface specifications and how safe reporting of the genetic test is maintained when software is updated). Verification and validation must include security testing to demonstrate effectiveness of the associated controls.
(C) Device design must ensure that a record of critical events, including a record of all genetic test orders using the whole exome sequencing constituent device, device malfunctions, and associated acknowledgments, is stored and accessible for an adequate period to allow for auditing of communications between the whole exome sequencing constituent device and downstream clinical genetic tests, and to facilitate the sharing of pertinent information with the responsible parties for those devices.
(vii) A protocol reviewed and determined acceptable by FDA, that specifies the verification and validation activities that will be performed for anticipated bioinformatic software modifications to reevaluate performance claims or performance specifications. This protocol must include a process for assessing whether a modification to the bioinformatics software could significantly affect the safety or effectiveness of the device. The protocol must include assessment metrics, acceptance criteria, and analytical methods for the performance testing of changes, as applicable. The protocol must also include the process for communicating to developers of downstream clinical genetic tests the impact of the bioinformatics software change on the whole exome sequencing constituent system genetic data output so they may implement appropriate corresponding actions.