The MSK-IMPACT assay is a qualitative in vitro diagnostic test that uses targeted next-generation sequencing of formalin-fixed paraffin-embedded tumor tissue matched with normal specimens from patients with solid malignant neoplasms to detect tumor gene alterations in a broad multi-gene panel. The test is intended to provide information on somatic mutations (point mutations and small insertions and deletions) and microsatellite instability for use by qualified health care professionals in accordance with professional guidelines, and is not conclusive or prescriptive for labeled use of any specific therapeutic product. MSK-IMPACT is a single-site assay performed at Memorial Sloan Kettering Cancer Center.
A description of required equipment, software, reagents, vendors, and storage conditions was provided and is described in the product labeling (MSK-IMPACT manual). MSK assumes responsibility for the device.
Below is a breakdown of the acceptance criteria and the studies demonstrating that the device meets them, based on the provided text for the MSK-IMPACT assay.
Device Name: MSK-IMPACT (Integrated Mutation Profiling of Actionable Cancer Targets)
Type of Test: Next generation sequencing tumor profiling test
Purpose: Qualitative in vitro diagnostic test for detecting somatic mutations (point mutations, small insertions and deletions) and microsatellite instability (MSI) in formalin-fixed paraffin-embedded (FFPE) tumor tissue matched with normal specimens from patients with solid malignant neoplasms.
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are generally embedded within the "Performance" section and the "Reporting" section's "Table 3. Sample Level Quality Control Metrics." The reported performance is found throughout the "Performance" section.
| Acceptance Criteria (from text) | Reported Device Performance (from text) |
|---|---|
| Specimen Requirements: | |
| Minimum Tumor Proportion: >10% of tumor cells; >20% viable tumor preferred; >25% for MSI testing. | The minimum tumor proportion required for the MSI assay was established as 25% (based on CRC specimens; MSI calls remained qualitatively reproducible down to 8% tumor proportion, though the score showed a decreasing quantitative trend) (Table 1). The DNA extraction method was validated with historical data from >10,000 specimens, demonstrating invalid rates of 7.2% to 18.4% and supporting performance across FFPE tumor types (Table 5). |
| Quality Control Metrics (Table 3): | |
| Average target coverage: > 200X | For normal samples, mean coverage across all targeted exons was 571X (SD = 373X). Analysis of normal samples showed that with mean sample coverage of 571X, 98% of exons are sequenced with coverage greater than 306X (or normalized coverage >0.54), leading to a conservative threshold of 200X mean sample coverage. In silico downsampling to 203X coverage detected 94% of mutations with 10% VAF (Performance L.1.b and Table 3). |
| Coverage Uniformity: ≥ 98% target exons above 100X coverage | 99.5% of exons were sequenced to a depth of 100X or greater, and 98.6% to 250X or greater. It’s expected that 98% of exons will be sequenced to >100X coverage when mean sample coverage is 185X. (Performance L.1.b) |
| Base Quality: > 80% of bases with quality scores above Q30 | Not explicitly detailed in the performance section but stated as a QC metric in Table 3. Implicitly met if overall performance is approved. |
| % Cluster passing filter (Cluster PF): > 80% | Not explicitly detailed in the performance section but stated as a QC metric in Table 3. Implicitly met if overall performance is approved. |
| % Reads passing filter (Reads PF): > 80% | Not explicitly detailed in the performance section but stated as a QC metric in Table 3. Implicitly met if overall performance is approved. |
| Hotspot Mutation calling threshold: DP ≥ 20, AD ≥ 8, VF ≥ 2% | Filtering scheme designed to reject false positives while maintaining detection capability. Example: pre-filter SNVs (hotspot) had 1 false positive, post-filter 0 (Table 4). LoD confirmation: 5 replicates for 6 SNVs at 5% MAF showed 100% positive call rates, except one replicate failing on PTEN exon 6 due to low read depth below 5% (Performance L.2.b.ii and Table 11). |
| Non-hotspot Mutation threshold: DP > 20, AD ≥ 10, VF ≥ 5% | Filtering scheme designed to reject false positives while maintaining detection capability. Example: pre-filter SNVs (non-hotspot) had 342 false positives, post-filter 0 (Table 4). LoD study showed most mutations detected at low VAFs (e.g., 2-9% in Tables 10A-J). Confirmed LoD study (Part 2) for various mutations showed 100% positive call rates for variant types except one discordant case (PTEN exon 8 deletion) at 3.6-7.9% VF (Table 11). |
| Indels: Fewer than 20% of samples in an established 'standard normal' database (This seems to be a filtering criteria for indels, not a reporting metric.) | Indels had 40,793 pre-filter false positives, reduced to 8 post-filter (Rejection Rate 0.999) (Table 4). LoD confirmation: 5 replicates for 3 deletions and 4 insertions at 5% MAF showed 100% positive call rates, except one deletion (PTEN exon 6), which also failed read depth (Performance L.2.b.ii and Table 11). |
| Positive Run Control: The difference between the observed and expected frequencies for the known mutations should be within 5%. | Mixed positive control sample with expected VFs: Results reviewed to confirm known mutations called and observed frequencies match expected values within 5% (Controls, b). |
| Negative Run Control: The correlation between expected and observed mutation frequencies should be 0.9 or higher. | Pooled negative control: Observed mutation frequencies compared against expected for 862 common SNPs; correlation expected to be 0.9 or higher (Controls, c). Figure 2 shows correlation of 0.975 (with slope 0.971 and intercept -0.004) for observed vs. expected variant frequency, establishing consistent correlation >0.9. |
| Sample-Mix up QC: Flagged if pairs of samples from the same patient with > 5% discordance and from different patients with < 5% discordance. | Pipeline computes 'percent discordance'; expected discordance between tumors and matched normal should be low (<5%), between different patients high (~25%). Samples flagged if >5% for same patient ("unexpected mismatches") or <5% for different patients ("unexpected matches") (Device Description, 4.e.i). |
| Major Contamination QC: % heterozygous sites at fingerprint SNPs < 55%; Average MAF at homozygous fingerprint SNPs < 2%. | Samples flagged if average minor allele frequency at homozygous SNP sites exceeds 2% (Device Description, 4.e.ii). |
| Criteria for calling test failure: If a sample presents with mean coverage across all exons < 50x and no mutations are detected due to the low overall coverage, the test is deemed "failed" for the sample. | Not explicitly detailed in the performance section but stated in Table 3. Implicitly met if overall performance is approved. |
| Analytical Performance (General): | |
| Precision (Within-run, Between-run, Total Variability): Using clinical samples, covering all mutation types (positive/negative), including samples near LoD. Assessed by agreement within replicates and sequencing quality metrics. | Panel-Wide Reproducibility: 69 mutations in clinical specimens and 13 in commercial cell line (total 82). All mutations showed 100% concordance except 4 in clinical specimens and 3 in commercial sample. Discordant cases were in repetitive regions, or had low frequencies near 2% (Performance L.2.a.ii and Table 7). Positive call rates varied per mutation and specimen (Table 7 and 8). Per Specimen Precision: (N=5 replicates). Overall positive call rates ranged from 80% to 100% across various specimens (Table 8). Intra-assay repeatability: All results concordant except for ARID1B exon 2 insertion and BRAF V600M point mutation (commercial control). Reference Material (NA20810): 23 replicates. Zygosity results were 100% concordant. Difference between expected and mean observed mutation frequencies was very small (absolute difference = 0.09%±0.45%), providing supplemental evidence of reproducibility (Performance L.2.a.iv). MSI Precision: 12 specimens (6 MSI-H, 6 MSS) tested with 3 inter- and 3 intra-run replicates. All samples had 100% agreement between calls (Performance L.2.a.v and Table 9). |
| Analytical Sensitivity (LoD): Defined as mutant allele fraction at which 95% of replicates are reliably detected. Confirmed with multiple replicates. | Part 1 (Dilution Series): Serial dilutions of patient samples were used to identify lowest reliable mutant fraction. Most mutations were called at lowest dilution (e.g., BRAF V600E at 2% VF, KRAS G12D at 6% VF, EGFR ins at 3% VF), except PIK3CA (PIK3CA Exon 2 (R88Q) was WT at 1:16 dilution) (Tables 10A-J). Part 2 (Confirmation): 5 replicates tested for 3 deletions, 4 insertions, and 6 SNVs at 5% minor allele frequency. All variants had 100% positive call rates except one replicate for a PTEN exon 6 deletion (mutation read depth below estimated LoD of 5%) (Performance L.2.b.ii and Table 11). LoD is stated as 2% for hotspot and 5% for non-hotspot mutations (Assay Cut-off). |
| Analytical Specificity: Maintained by paired tumor/matched normal sequencing. | Established during assay optimization; paired tumor/matched normal sequencing minimizes interference (Performance L.2.g). |
| Accuracy (Method Comparison): Using clinical specimens representing intended specimen type and range of tumor types. Specific criteria for SNV/MNVs, insertions, deletions, and MSI. | Overall Accuracy: 432 out of 433 cases (99.8% with 95% CI (98.7%, 100.0%)) successfully detected known mutations compared to orthogonal methods. One discordant case (EGFR exon 20 duplication) was identified due to filtering algorithm, which was subsequently modified. (Performance L.2.i.i) PPA by Mutation Type/Gene: SNV/MNVs showed 100% PPA for all listed genes (Table 15A). Insertions showed PPA from 93.8% (EGFR) to 100% (Table 15B). Deletions showed 100% PPA for all listed genes (Table 15C). Wildtype Calls (Supplemental Study): 95 specimens with 109 mutations and 3026 wild-type calls across 33 hotspots in 10 genes. Variant-level concordance: PPA was 100% (96.7%, 100.0% CI), NPA was 100% (99.9%, 100.0% CI) (Performance L.2.i.ii). MSI Accuracy (MSIsensor): CRC/EC (Training): Cut-off of 10 established based on concordance with MSI-PCR or MMR IHC using 138 CRC and 40 EC specimens. CRC (Validation): 135 CRC patients, 66 with both MSK-IMPACT MSI and IHC results. PPV = 92.3% (12/13, 95% CI 64.0%-99.8%), NPV = 98.1% (52/53, 95% CI 90.0%-100.0%) (Table 16). Non-CRC/EC: 119 non-CRC/EC samples assessed by MSIsensor and MSI-PCR. Excluding missing data: PPV = 93.9% (46/49, 83.1%-98.7% CI), NPV = 96.7% (58/60, 88.5%-99.6% CI). Including missing data: PPV = 78.0% (46/59, 65.3%-87.7% CI), NPV = 96.7% (58/60, 88.5%-99.6% CI). (Table 17). |
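The variant-calling thresholds and run-control checks in the table above can be sketched in code. This is a minimal illustration of the stated rules, not the actual MSK-IMPACT pipeline; the function names and the exact boundary handling (e.g., whether a value exactly at a cut-off passes) are assumptions.

```python
import math

def passes_variant_filter(depth, alt_reads, vaf, is_hotspot):
    """Apply the reported mutation-calling thresholds (hypothetical helper).

    Hotspot calls require DP >= 20, AD >= 8, VF >= 2%;
    non-hotspot calls require DP > 20, AD >= 10, VF >= 5%.
    """
    if is_hotspot:
        return depth >= 20 and alt_reads >= 8 and vaf >= 0.02
    return depth > 20 and alt_reads >= 10 and vaf >= 0.05

def negative_control_passes(expected_vf, observed_vf, min_r=0.9):
    """Negative run control: Pearson correlation between expected and
    observed variant frequencies at control SNPs must be >= 0.9."""
    n = len(expected_vf)
    mx = sum(expected_vf) / n
    my = sum(observed_vf) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(expected_vf, observed_vf))
    sx = math.sqrt(sum((x - mx) ** 2 for x in expected_vf))
    sy = math.sqrt(sum((y - my) ** 2 for y in observed_vf))
    return cov / (sx * sy) >= min_r
```

For example, a non-hotspot variant with 9 supporting reads would be rejected even at adequate depth and frequency, while the same read support would pass the hotspot threshold.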
2. Sample Size Used for the Test Set and Data Provenance
Test Set Sample Sizes:
- Precision Studies:
- 10 samples (9 FFPE specimens, 1 commercial cell line) used for panel-wide reproducibility (Table 6).
- Well-characterized reference standard (HapMap cell line NA20810) in 23 replicates for sequencing error rates and reproducibility.
- 12 specimens (6 MSI-H, 6 MSS) for MSI precision.
- Analytical Sensitivity (LoD):
- Part 1 (Dilution Series): Patient samples (number not specified, but for 5 validation exons, implying at least 5 patients) with 5-8 serial dilutions.
- Part 2 (Confirmation): Unspecified number of samples, providing variants for 3 deletions, 4 insertions, and 6 SNVs, each tested with 5 replicates.
- MSI LoD: CRC specimens (number not specified, but 5 replicates run).
- DNA Input Assessment: Unspecified number of samples (historical data from >10,000 samples mentioned in pre-analytical performance context). Table 13 presents data by DNA input amounts but not sample count for each bin.
- Accuracy (Method Comparison):
- 267 unique mutations in 433 FFPE tumor specimens for the main comparison (Table 14).
- 95 specimens for the supplemental wildtype calls study.
- 138 colorectal cancer (CRC) and 40 endometrial carcinoma (EC) specimens (training set) for MSI cutoff establishment.
- 135 CRC patients (66 with both MSK-IMPACT and IHC) for MSI cutoff validation.
- 119 unique non-CRC and non-EC tumor-normal pair samples for MSI comparison in other cancer types.
Data Provenance:
- General: The device is performed at Memorial Sloan Kettering Cancer Center (MSK), indicating the data likely originates from their patient population.
- Retrospective/Prospective:
- The pre-analytical performance (specimen invalid rates) used historical data from >10,000 specimens, implying a retrospective chart review.
- The MSI validation study (CRC patients) was a retrospective-prospective chart review.
- The clinical performance section mentions a large-scale, prospective clinical sequencing initiative using MSK-IMPACT involving >10,000 patients, whose data are publicly accessible. This cohort likely informed the broader context and understanding of the device but was not explicitly stated as the test set for the analytical validation.
- The analytical performance studies (precision, LoD, accuracy) used clinical samples/specimens, which could be retrospective or prospectively collected for the purpose of the study. The text doesn't explicitly state for each study.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and the Qualifications of Those Experts
The text does not specify the number of experts used to establish the ground truth for the test set, nor their specific qualifications (e.g., "radiologist with 10 years of experience").
However, it does indicate:
- For the accuracy studies, results were compared to "original results obtained with the validated orthogonal methods." This implies that the ground truth was established by these validated orthogonal methods, which are presumably performed and interpreted by qualified personnel using established clinical diagnostics.
- For MSI, the MSIsensor results were compared to "a validated MSI-PCR or MMR IHC test," a "commercially available PCR assay," or a "validated IHC panel (MLH1, MSH2, MSH6 and PMS2)." Again, this suggests ground truth from established, clinical laboratory methods.
- The "Clinical Evidence Curation" section mentions that "OncoKB undergoes periodic updates through the review of new information by a panel of experts," which informs the clinical interpretation of detected mutations. This expert panel contributes to the broader clinical context of the mutations, but not directly the ground truth for the analytical test set itself.
4. Adjudication Method (e.g., 2+1, 3+1, none) for the Test Set
The text does not describe a formal adjudication method (like 2+1 or 3+1 consensus with experts) for establishing the ground truth of the test set cases. Instead, the ground truth was derived from "validated orthogonal methods."
For example, in the accuracy study, the MSK-IMPACT results were "compared to the original results obtained with the validated orthogonal methods." This indicates that the results from the comparison methods served as the reference standard, rather than requiring an additional expert adjudication process on top of those existing validated methods.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done
No, an MRMC comparative effectiveness study was not described. This document pertains to the analytical validation of a genetic sequencing assay, which inherently does not involve human readers interpreting images in a multi-reader, multi-case setup. Therefore, a comparative effectiveness study measuring human reader improvement with AI assistance (which is typical for imaging AI) is not applicable here.
6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done
Yes, the analytical performance studies (precision, analytical sensitivity, and analytical accuracy) described are all measures of the standalone performance of the MSK-IMPACT assay, which relies on its sequencing and bioinformatics pipeline without direct human-in-the-loop diagnostic interpretation to produce the raw mutation calls.
- The "Mutation calling SNVs and Indels" section and "Summary of mutation filtering scheme" (Figure 1) describe the automated pipeline for identifying mutations.
- The "Performance" section details how characteristics like precision, LoD, and accuracy were determined for the assay itself by comparing its outputs to known or established results from other validated methods. These do not involve a human interpreting the device's output to make a diagnosis within the performance evaluation but rather assess the accuracy of the device's genomic calls directly.
7. The Type of Ground Truth Used
The primary type of ground truth used was:
- Orthogonal Methods / Comparator Assays: For the accuracy studies, the MSK-IMPACT results were compared against "original results obtained with the validated orthogonal methods." This included comparison to:
- Validated orthogonal methods for SNVs and indels.
- Established MSI-PCR or MMR IHC tests for Microsatellite Instability status.
- Known Reference Material: For precision, a "well characterized reference standard (HapMap cell line NA20810)" was used, with reference genotypes obtained from the 1000 Genomes database.
- Expected Values/Dilution Series: For Limit of Detection studies, serial dilutions of patient samples with "known mutations" and "expected frequencies" were used.
Therefore, the ground truth is a combination of established methods, known reference materials, and empirically derived expected values.
8. The Sample Size for the Training Set
The document explicitly mentions training data primarily in the context of the MSI cutoff:
- MSI Cutoff Training: A "training specimen dataset consisting of 138 colorectal cancer (CRC) and 40 endometrial carcinoma (EC) specimens with matched normal and having MSI status results from a validated MSI-PCR or MMR IHC test."
For the mutation calling pipeline (SNVs and indels), the text refers to:
- Optimization of thresholds: "The threshold values for the filtering criteria were established based on paired-sample mutation analysis on replicates of normal FFPE samples, and optimized to reject all false positive SNVs and almost all false positive indel calls from the reference dataset." The size of this "reference dataset" or "replicates of normal FFPE samples" used for training/optimization of filtering thresholds is not explicitly stated as a defined "training set sample size" for the SNV/indel calling. It implies an internal dataset used during development.
9. How the Ground Truth for the Training Set Was Established
For the MSI cutoff training set:
- The ground truth was established by "validated MSI-PCR or MMR IHC test" results. These are existing, established clinical diagnostic methods for determining MSI status.
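As an illustration of how such a training set is used, the reported MSIsensor cut-off of 10 amounts to a simple thresholding rule whose agreement with the reference MSI-PCR/IHC calls can be tabulated. The helper names and the handling of scores exactly at the cut-off below are assumptions, not the submission's code:

```python
def msi_call(msisensor_score, cutoff=10.0):
    # MSIsensor cut-off of 10 as reported; treating the boundary value
    # itself as MSI-H is an assumption of this sketch.
    return "MSI-H" if msisensor_score >= cutoff else "MSS"

def concordance(scores, reference_calls, cutoff=10.0):
    # Fraction of specimens where the score-derived call matches the
    # reference (MSI-PCR or MMR IHC) call.
    calls = [msi_call(s, cutoff) for s in scores]
    return sum(c == r for c, r in zip(calls, reference_calls)) / len(calls)
```

During training, a cut-off would be chosen to maximize this concordance over the 178 CRC/EC specimens with known reference status.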
For the SNV/indel pipeline optimization/threshold establishment:
- The ground truth for optimizing filtering thresholds was based on "paired-sample mutation analysis on replicates of normal FFPE samples" and a "reference dataset." This suggests that the true status of these calls (true positive, false positive, etc.) was known or definitively determined through external means, such as high-confidence calls from a different, perhaps more laborious or more deeply sequenced, method, or a known characteristic of the normal FFPE samples. However, the specific method for establishing this ground truth for the filtering optimization is not explicitly detailed beyond its origin in a "reference dataset."