(322 days)
NP Screen is a semi-quantitative in vitro diagnostic test that uses real-time PCR to determine the level of Epstein-Barr Virus Nuclear Antigen-1 (EBNA-1) DNA in nasopharyngeal cellular specimens collected using the NP Screen Trans-Oral Nasopharyngeal Brush. The test is intended for use in conjunction with endoscopy and other clinical information to assess the likelihood that EBV-associated nasopharyngeal carcinoma (NPC) is present. The test is indicated for use in adults of Chinese descent with signs and symptoms of nasopharyngeal carcinoma.
NP Screen is a semi-quantitative in vitro test for the detection of the Epstein-Barr Virus Nuclear Antigen-1 gene (EBNA-1) in nasopharyngeal epithelial specimens. The clinician collects the specimen with the NP Screen Trans-Oral Collection Brush and places it into the Transport Medium for shipping. The test is performed in a single laboratory, Primex Clinical Laboratories, Inc. Total nucleic acid is extracted from the specimen, and the dsDNA is then quantitated. If the amount of extracted DNA does not meet the minimum required for testing, the specimen is rejected and a new specimen must be collected. Extracted specimen DNA that meets the specification is normalized to a pre-defined concentration. The normalized DNA is tested in duplicate using real-time polymerase chain reaction (real-time PCR) and nucleic acid hybridization to detect the target EBV EBNA-1 DNA. Simultaneous amplification and detection of human RNase P DNA serves as the internal control for assessing all steps of the NP Screen assay. Low and High Positive External Controls are included with each run.
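The laboratory workflow described above is essentially a sequence of quality-control gates. The Python sketch below illustrates that sequence under stated assumptions: the minimum DNA amount and normalization concentration (`MIN_TOTAL_DNA_NG`, `TARGET_CONC_NG_PER_UL`) are hypothetical placeholders, normalization is assumed to be a simple C1*V1 = C2*V2 dilution, and both the order of the gates and the rule that both duplicate reactions must show the RNase P internal control are illustrative assumptions, not the laboratory's documented procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple

# HYPOTHETICAL placeholder values: the summary does not state the actual
# minimum DNA input or the pre-defined normalization concentration.
MIN_TOTAL_DNA_NG = 50.0
TARGET_CONC_NG_PER_UL = 5.0


@dataclass
class Replicate:
    """One of the duplicate real-time PCR reactions for a specimen."""
    ebna1_signal: bool    # EBNA-1 target amplified (interpreted downstream)
    rnase_p_signal: bool  # human RNase P internal control amplified


def normalization_volumes(conc_ng_per_ul: float, final_ul: float,
                          target: float = TARGET_CONC_NG_PER_UL) -> Tuple[float, float]:
    """Return (extract_ul, diluent_ul) so that C1*V1 = C2*V2 yields `target`.

    Assumes normalization is done by simple dilution of the extract.
    """
    if conc_ng_per_ul < target:
        raise ValueError("extract is below the target concentration")
    extract_ul = target * final_ul / conc_ng_per_ul
    return extract_ul, final_ul - extract_ul


def gate_specimen(total_dna_ng: float, external_controls_ok: bool,
                  replicates: List[Replicate]) -> str:
    """Apply the QC gates in an assumed order and return a status string."""
    if total_dna_ng < MIN_TOTAL_DNA_NG:
        return "REJECTED: collect a new specimen"
    if not external_controls_ok:  # Low/High Positive External Controls (assumed rule)
        return "RUN INVALID"
    if len(replicates) != 2:
        raise ValueError("each specimen is tested in duplicate")
    if not all(r.rnase_p_signal for r in replicates):
        return "INVALID: internal control (RNase P) not detected"
    return "REPORTABLE"
```

For example, `normalization_volumes(20.0, 50.0)` returns `(12.5, 37.5)`: 12.5 uL of a 20 ng/uL extract brought to 50 uL gives the hypothetical 5 ng/uL target.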
The provided text describes the evaluation of the NP Screen assay, a real-time PCR test for Epstein-Barr Virus Nuclear Antigen-1 (EBNA-1) DNA to assess the likelihood of EBV-associated nasopharyngeal carcinoma (NPC) in adults of Chinese descent with signs and symptoms of NPC.
Here's an analysis of the acceptance criteria and study details:
1. A table of acceptance criteria and the reported device performance:
The document doesn't explicitly define "acceptance criteria" in a typical table format with specific performance thresholds for sensitivity, specificity, or predictive values against a predefined benchmark. Instead, it focuses on the device's ability to provide clinically relevant risk stratification when used in conjunction with endoscopy. The performance is presented as post-test risks of NPC.
Therefore, I will present the key clinical performance data demonstrating the device's utility in risk assessment; these results can be read as showing that the device fulfills its intended purpose and thereby meets implicit "acceptance criteria" for a De Novo classification:
| Endoscopy (Degree of Suspicion) | NP Screen Result | Reported Device Performance (Risk of NPC) | Benefit/Impact on Risk (Compared to Pre-Test Risk) |
|---|---|---|---|
| High Suspicion (Pre-Test Risk: 79.6%) | Positive | 100% (33/33) [95% CI: 91.4%; 100%] | Statistically significantly higher than pre-test risk, confirming high risk. |
| | Negative | 10% (1/10) [95% CI: 0.7%; 37.8%] | Statistically significantly lower than pre-test risk, reducing estimated risk. |
| Intermediate Suspicion (Pre-Test Risk: 16.7%) | Positive | 90.0% (9/10) [95% CI: 62.1%; 99.4%] | Statistically significantly higher than pre-test risk, increasing estimated risk. |
| | Negative | 0.0% (0/40) [95% CI: 0.0%; 7.7%] | Statistically significantly lower than pre-test risk; no NPC cases observed (observed risk 0%). |
| Low Suspicion (Pre-Test Risk: 1.7%) | Positive | 66.7% (6/9) [95% CI: 39.4%; 88.0%] | Statistically significantly higher than pre-test risk, greatly increasing estimated risk. |
| | Negative | 0.0% (0/342) [95% CI: 0.0%; 1.0%] | Statistically significantly lower than pre-test risk; no NPC cases observed (observed risk 0%). |
| No Suspicion (Pre-Test Risk: 0.149%) | Positive | 16.7% (1/6) [95% CI: 1.6%; 33.6%] | Statistically significantly higher than pre-test risk, a notable increase despite the low baseline. |
| | Negative | 0.0% (0/640) [95% CI: 0.0%; 0.146%] | Statistically significantly lower than pre-test risk; no NPC cases observed (observed risk 0%). |
| Combined (Pre-Test Risk: 4.8%) | Positive | 84.5% (49/58) [95% CI: 74.0%; 91.7%] | Substantially increases estimated risk. |
| | Negative | 0.10% (1/1,032) [95% CI: 0.01%; 0.91%] | Substantially decreases estimated risk. |
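As a consistency check on the table, the per-category counts can be pooled to reproduce the Combined rows. This is a minimal Python sketch that recomputes point estimates only; it does not attempt to reproduce the 95% confidence intervals, because the interval method is not stated. The Positive and Negative rows together cover 1,090 patients, fewer than the 1,138 evaluable patients, so the pre-test risks (which presumably also count Equivocal results) cannot be recomputed from these rows alone. The dictionary and function names are illustrative.

```python
from typing import Tuple

# (NPC cases, patients) per endoscopy category, copied from the table.
STRATA = {
    "High":         {"Positive": (33, 33), "Negative": (1, 10)},
    "Intermediate": {"Positive": (9, 10),  "Negative": (0, 40)},
    "Low":          {"Positive": (6, 9),   "Negative": (0, 342)},
    "None":         {"Positive": (1, 6),   "Negative": (0, 640)},
}


def pooled(result: str) -> Tuple[int, int]:
    """Sum NPC cases and patients across endoscopy categories for one NP Screen result."""
    cases = sum(s[result][0] for s in STRATA.values())
    total = sum(s[result][1] for s in STRATA.values())
    return cases, total


def risk_pct(cases: int, total: int, ndigits: int = 1) -> float:
    """Observed risk of NPC (cases / patients) as a rounded percentage."""
    return round(100 * cases / total, ndigits)


for result in ("Positive", "Negative"):
    cases, total = pooled(result)
    print(result, f"{cases}/{total}", risk_pct(cases, total, 2))
```

Running it prints `Positive 49/58 84.48` and `Negative 1/1032 0.1`, which match the Combined rows (84.5% and 0.10%) after rounding.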
2. Sample size used for the test set and the data provenance (e.g. country of origin of the data, retrospective or prospective)
- Sample Size for Test Set: 1,138 evaluable patients (a total of 1,146 patients were enrolled; 8 were excluded due to invalid NP Screen results).
- Data Provenance: Prospective clinical study. The collection site was located in Toronto, Canada.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g. radiologist with 10 years of experience)
The document primarily relies on biopsy results to establish the ground truth for NPC diagnosis, especially for patients with High Suspicion endoscopy findings or those for whom biopsy was recommended at follow-up. For patients with Intermediate or Low Suspicion who did not receive a biopsy after 3 months, or patients with No Suspicion at baseline, their clinical status was considered negative for NPC based on the observed clinical course without further intervention.
While endoscopy findings categorized by "degree of suspicion" were used as a pre-screening step and as part of the clinical study design, the final determination of NPC presence for analysis relied on biopsy. The endoscopy examinations were performed by "ENT surgeon[s]" according to "standard clinical practice," implying qualified medical professionals, but the document specifically identifies biopsy results as the definitive ground truth where applicable. Neither the number of ENT surgeons performing endoscopy nor the number of pathologists reading the biopsy slides is specified.
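The ground-truth rule described in this section amounts to a composite reference standard: pathology when a biopsy was performed, clinical course otherwise. A minimal Python sketch of that rule follows. The function name and category labels are hypothetical, and returning `None` for a High Suspicion patient who was never biopsied is an assumption, since the document does not say how such cases were handled.

```python
from typing import Optional

# Categories whose unbiopsied patients were treated as NPC-negative after
# the 3-month follow-up, per the text.
FOLLOW_UP_CATEGORIES = {"Intermediate", "Low"}


def reference_status(endoscopy: str, biopsied: bool,
                     biopsy_positive: Optional[bool] = None) -> Optional[bool]:
    """Return True (NPC), False (no NPC), or None (not determinable by these rules).

    endoscopy: degree of suspicion, one of "High", "Intermediate", "Low", "None".
    biopsied: whether a biopsy was performed at baseline or at follow-up.
    """
    if biopsied:
        if biopsy_positive is None:
            raise ValueError("biopsied patients need a pathology result")
        return biopsy_positive  # pathology is the reference standard
    if endoscopy in FOLLOW_UP_CATEGORIES:
        return False  # no biopsy after follow-up: considered negative
    if endoscopy == "None":
        return False  # no suspicion at baseline: considered negative
    return None  # e.g. High Suspicion without biopsy (handling not stated)
```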
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set
The document does not describe an explicit adjudication method for reconciling disagreements between readers or interpretations, primarily because the clinical status (ground truth) for patients who underwent biopsy was determined by the biopsy result itself. For patients not biopsied, their clinical status was determined by the absence of further recommendations for biopsy after observation or discharge.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, what was the effect size of how much human readers improve with AI vs. without AI assistance
No, a multi-reader multi-case (MRMC) comparative effectiveness study was not done. This device is an in-vitro diagnostic (IVD) test measuring EBV DNA levels, not an AI-powered image analysis tool that would typically involve human readers. The study evaluates the NP Screen assay in conjunction with endoscopy (human assessment), but not how human readers' performance changes with or without AI assistance from this specific device.
6. If a standalone performance study (i.e., algorithm only, without human-in-the-loop) was done
Yes, the NP Screen assay essentially performs as a standalone algorithm: it generates a quantitative EBV DNA Detection Level (EDL) and categorizes it as "Positive," "Equivocal," or "Negative." The clinical study then integrates these standalone assay results with human endoscopy findings to present post-test risks. The performance figures (risks of NPC given an NP Screen result) represent the direct output of the assay, alone or in combination with the endoscopy category, demonstrating its standalone diagnostic utility within the context of the clinical pathway.
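The two-cut-off categorization into Positive, Equivocal, and Negative can be sketched as follows. The cut-off values are not given in this summary, so they appear here as hypothetical parameters; treating a value exactly at the upper cut-off as Positive (and exactly at the lower cut-off as Equivocal) is also an assumption.

```python
def categorize_edl(edl: float, negative_below: float, positive_at_or_above: float) -> str:
    """Map an EBV DNA Detection Level (EDL) to a result category.

    Both cut-offs are HYPOTHETICAL parameters; the real values and the
    handling of values exactly at a cut-off are not given in the summary.
    """
    if negative_below > positive_at_or_above:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if edl < negative_below:
        return "Negative"
    if edl >= positive_at_or_above:
        return "Positive"
    return "Equivocal"
```

Only the Positive and Negative categories appear in the risk table, so Equivocal results would need separate handling in any downstream risk lookup.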
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
The primary type of ground truth used for determining the presence of NPC was pathology (biopsy results). For patients who were not biopsied (e.g., those discharged or considered negative after follow-up), the ground truth was effectively outcomes data / clinical follow-up, where the absence of a confirmed NPC diagnosis over time was considered as negative for NPC.
8. The sample size for the training set
The document does not explicitly describe a separate "training set" for the NP Screen assay in the context of a machine learning or AI model development. The assay is an in-vitro diagnostic (real-time PCR) with established cut-off points. These cut-off points were determined using dilutions of a material traceable to the 1st WHO International Standard (Section L.1.f) and were "clinically validated in the clinical study." This implies that the clinical study described (N=1138 evaluable patients) served as the primary validation set to demonstrate the clinical utility of the pre-defined cut-offs, rather than a dataset used for iterative model training.
9. How the ground truth for the training set was established
As noted above, a distinct "training set" in the machine learning sense is not described. The cut-off points for the NP Screen assay were established based on EBV concentrations traceable to the 1st WHO International Standard for Epstein-Barr Virus for Nucleic Acid Amplification Techniques. This is an analytical method rather than a clinical ground truth establishment for a training set. The clinical study then validated the performance of these pre-defined cut-offs against biopsy and clinical follow-up outcomes.
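The document does not describe how the WHO-traceable dilutions were used to set the cut-offs. For readers unfamiliar with the general technique, a common way a dilution series is used in real-time PCR is to build a standard curve relating cycle threshold (Ct) to log10 concentration; the Python sketch below illustrates that generic approach only. The stock concentration, dilution factor, and Ct values are hypothetical, and nothing here should be read as the NP Screen procedure.

```python
import math
from typing import List, Tuple


def dilution_series(stock: float, fold: float, steps: int) -> List[float]:
    """Concentrations of a serial dilution: stock, stock/fold, stock/fold**2, ..."""
    return [stock / fold ** k for k in range(steps)]


def fit_standard_curve(concs: List[float], cts: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope: float) -> float:
    """Standard qPCR relation E = 10**(-1/slope) - 1 (1.0 means 100%)."""
    return 10 ** (-1 / slope) - 1


def ct_to_conc(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the concentration for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)
```

With ideal doubling per cycle the slope is about -3.32 (that is, -1/log10 2) and the efficiency is 1.0; a fitted curve like this is what lets a concentration-based cut-off be expressed on the Ct scale, or vice versa.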
§ 866.3236 Device to detect or measure nucleic acid from viruses associated with head and neck cancers.
(a) Identification. A device to detect or measure nucleic acid from viruses associated with head and neck cancers is an in vitro diagnostic test for prescription use in the detection of viral nucleic acid in nasopharyngeal or oropharyngeal cellular specimens from patients with signs and symptoms of head and neck cancer. The test result is intended to be used in conjunction with other clinical information to aid in assessing the clinical status of virus-associated head and neck cancers and/or the likelihood that head and neck cancer is present.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Any device used for specimen collection and transport must be FDA-cleared, -approved, or -classified as 510(k) exempt (standalone or as part of a test system) for the collection of human specimens; alternatively, the sample collection device must be cleared in a premarket submission as a part of this device.
(2) The labeling required under § 809.10(b) of this chapter must include, as determined to be appropriate by FDA:
(i) An intended use statement that includes the following:
(A) The analyte(s) detected by the device;
(B) Data output of the device (qualitative, semiquantitative, or quantitative);
(C) The specimen types with which the device is intended for use;
(D) The clinical indications appropriate for test use (e.g., in conjunction with endoscopy);
(E) The intended use populations (e.g., signs and symptoms, ethnicity); and
(F) The intended use location(s) (e.g., specific name and location of testing facility or facilities).
(ii) A detailed device description, including reagents, instruments, ancillary materials, specimen collection and transport devices, controls, and a detailed explanation of the methodology, including all pre-analytical methods for processing of specimens.
(iii) A detailed explanation of the interpretation of results.
(iv) Limiting statements indicating:
(A) The device is not intended for use in screening for head and neck cancer in asymptomatic populations.
(B) Results of the device are not predictive of a patient's future risk of head and neck cancer.
(C) Patients who test negative for the virus should be managed in accordance with the standard of care, based on the assessment of endoscopy and/or other clinical information by a licensed healthcare professional.
(D) Results of the device are not intended to be used as the sole basis for determining the need for biopsy or for any other patient management decision.
(3) Design verification and validation must include the following:
(i) A detailed device description including pre-analytical specimen processing, assay technology, target region, primer/probe sequences, reagents, controls, instrument requirements, and the computational path from collected raw data to reported result.
(ii) Detailed documentation and results from analytical performance studies, including characterization of the cutoff(s), limit of detection, limit of quantitation, precision (including multisite reproducibility, if applicable), inclusivity, cross-reactivity, interference, carryover/cross-contamination, reagent stability, and specimen/sample stability, as determined to be appropriate by FDA.
(iii) Detailed documentation of a clinical performance study that includes patients from the intended use population, including the clinical study protocol, with a predefined statistical analysis plan, and a clinical study report with testing results and results of all statistical analyses.
(iv) A detailed description of the impact of any software, including software applications and software incorporated in hardware-based devices, on the device's functions.