
For the quantitative measurement of N-terminal pro Brain Natriuretic Peptide (NT-proBNP) in human serum and plasma (K2 EDTA or Lithium Heparin) using the VITROS 3600 Immunodiagnostic System to aid in the diagnosis of heart failure. The test can also be used in the assessment of heart failure severity in patients diagnosed with heart failure.

Device Description

The VITROS NT-proBNP II test is performed using the VITROS NT-proBNP II Reagent Pack and the VITROS NT-proBNP II Calibrators on the VITROS Systems.

The VITROS NT-proBNP II test uses a one-step immunometric bridging assay design. A well is pushed from the pack, and patient sample is dispensed into the coated well, followed by the assay reagent and the conjugate reagent. NT-proBNP present in the sample forms a bridge between the horseradish peroxidase (HRP)-labeled antibody conjugate and the biotinylated anti-NT-proBNP capture antibody. The biotinylated antibody, in turn, binds to the streptavidin-coated microwell. The well is incubated for 8 minutes before unbound materials are removed by washing.

The bound HRP conjugate is measured by a luminescent reaction. A reagent containing a luminogenic substrate (a luminol derivative and a peracid salt) and an electron transfer agent is added to the wells. The HRP in the bound conjugate catalyzes the oxidation of the luminol derivative, producing light. The electron transfer agent (a substituted acetanilide) increases the level of light produced and prolongs its emission. The System reads the light signals. The amount of HRP conjugate bound is directly proportional to the concentration of NT-proBNP present.
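The proportionality statement implies that, once a blank signal is subtracted, light output scales linearly with NT-proBNP concentration. A minimal sketch of that relationship follows. It assumes a single hypothetical calibrator and a strictly proportional response. The document does not describe how the VITROS System actually converts signal to concentration from the VITROS NT-proBNP II Calibrators. The function name and all numbers are illustrative.

```python
def concentration_from_signal(signal: float, blank_signal: float,
                              calibrator_signal: float,
                              calibrator_conc_pg_ml: float) -> float:
    """Hypothetical single-point, through-origin conversion (illustration only).

    Assumes net light output (signal minus blank) is strictly proportional to
    bound HRP conjugate, which is proportional to NT-proBNP concentration.
    """
    # Net signal per pg/mL, fixed by one hypothetical calibrator.
    slope = (calibrator_signal - blank_signal) / calibrator_conc_pg_ml
    if slope <= 0:
        raise ValueError("calibrator signal must exceed blank signal")
    # Invert the proportional model for the patient sample.
    return (signal - blank_signal) / slope


# Hypothetical readings: blank 100 counts; a 1,000 pg/mL calibrator reads 50,100 counts.
print(concentration_from_signal(25_100, blank_signal=100,
                                calibrator_signal=50_100,
                                calibrator_conc_pg_ml=1_000))  # prints 500.0
```

Real immunoassay systems typically fit a multi-point calibration curve. The single-point model only makes the "directly proportional" claim concrete.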

AI/ML Overview

The provided document describes the analytical and clinical performance of the VITROS Immunodiagnostic Products NT-proBNP II Reagent Pack, an in vitro diagnostic device used to aid in the diagnosis and assessment of heart failure.

Here's an analysis of the acceptance criteria and the study proving the device meets them:

1. Table of acceptance criteria and the reported device performance

The document does not explicitly present a table of acceptance criteria, that is, pre-defined performance thresholds the device had to meet for clearance. Instead, it describes design goals and reports the observed performance. The table below therefore pairs the reported performance with implicit criteria drawn from the document's design goals, the predicate device, or general IVD practice. FDA clearance indicates that the actual criteria were met.

Test Category	Specific Test / Metric	Acceptance Criteria (Implicit/Design Goal from predicate or general IVD standards)	Reported Device Performance
Analytical Performance
Precision	Repeatability (within-run) %CV	Typically < 10% for quantitative assays; a looser limit is usual near the low end of the range.	Range: 1.0% - 2.1% (SD 0.52 to 370 pg/mL)
	Within-calibration %CV	Typically < 15%	Range: 1.9% - 5.0% (SD 1.63 to 710 pg/mL)
	Within-lab %CV	Typically < 20%	Range: 2.4% - 5.7% (SD 1.91 to 730 pg/mL)
	Total precision %CV (additional analysis)	Consistent with product performance expectations.	Range: 2.62% - 8.96% (SD 2.86 to 761 pg/mL)
Limit of Detection (LoD)	Lowest concentration reliably distinguished from a blank sample	Designed to be ≤ 30.0 pg/mL	0.46 pg/mL (observed); 0.49 pg/mL (claimed)
Limit of Quantitation (LoQ)	Lowest concentration measured with ≤ 20% CV	Designed to be ≤ 30.0 pg/mL at 20% CV	0.46 pg/mL at 20% CV (observed); 20.0 pg/mL (claimed, to maintain linearity)
Linearity	Across measuring range	Linear over the measuring range	Linear from 20.0 to 30,000 pg/mL
Matrix Comparison	Serum vs. EDTA plasma vs. lithium heparin plasma	No significant matrix effect; meets acceptance criteria.	All three tube types suitable for use (met acceptance criteria).
Analytical Specificity	Bias from potentially interfering substances	Bias < 10%	No bias > 10% for most tested compounds; cefoxitin sodium and sodium azide showed > 10% bias.
Cross-Reactivity	Related peptides (e.g., ANP, proBNP, BNP32)	Low cross-reactivity desired.	< 1.0% to 39.1% (highest for non-glycosylated proBNP); most are < 1%.
High Dose Hook Effect	Hook effect at high concentrations	No hook effect up to a specified concentration	No high dose hook effect up to 300,000 pg/mL.
Clinical Performance
Aid in HF Diagnosis (ED Setting)	AUC (Overall)	Not explicitly stated, but typically high (e.g., > 0.85 or 0.90 for this type of test)	0.920 (95% CI: 0.909-0.931)
	AUC (Age-stratified)		Ranged from 0.904 to 0.954
	AUC (Clinical Subgroups)		Ranged from 0.899 to 0.945
	Posttest Probability of HF (Positive result)	High positive predictive value/posttest probability for rule-in.	Range: 80.4% - 85.7% across age groups
	Posttest Probability of non-HF (Negative result)	High negative predictive value/posttest probability for rule-out.	Range: 96.5% - 98.3% across age groups
	Likelihood Ratio Positive (LR+)	High (e.g., > 5-10 for strong rule-in)	Range: 4.52 - 6.84 across age groups
	Likelihood Ratio Negative (LR-)	Low (e.g., < 0.1-0.2 for strong rule-out)	Range: 0.01 - 0.05 across age groups
Aid in HF Diagnosis (Outpatient Setting)	AUC (Overall)		0.880 (95% CI: 0.822 to 0.937)
	AUC (Clinical Subgroups)		Ranged from 0.838 to 0.940
	Sensitivity (Rule-out cutoff 125 pg/mL)	High (for rule-out, e.g., >90%)	91.7% (44/48)
	Specificity (Rule-out cutoff 125 pg/mL)	Reasonable (for rule-out, may be lower)	67.2% (490/729)
	NPV (Rule-out cutoff 125 pg/mL)	High (for rule-out, e.g., >90%)	99.2% (490/494)
	PPV (Rule-out cutoff 125 pg/mL)	Not a primary goal for a rule-out cutoff; expected to be lower	15.6% (44/283)
Correlation with NYHA	Statistical significance of relationship with HF severity	Statistically significant trend.	Jonckheere-Terpstra test p < 0.0001 (statistically significant trend across NYHA classes).
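The outpatient rows contain enough counts to rebuild the full 2×2 table at the 125 pg/mL rule-out cutoff. There were 48 adjudicated HF cases (44 test-positive) and 729 non-HF subjects (490 test-negative), 777 subjects in total. The sketch below is a Python illustration, not anything from the submission. It derives sensitivity, specificity, predictive values, and likelihood ratios from those counts. It also shows how posttest probability follows from a likelihood ratio through Bayes' theorem in odds form (posttest odds = pretest odds × LR). The FP and FN counts (239 and 4) are inferred from the reported denominators. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for a diagnostic 2x2 table at one cutoff."""
    tp: int  # adjudicated HF, test positive
    fn: int  # adjudicated HF, test negative
    fp: int  # adjudicated non-HF, test positive
    tn: int  # adjudicated non-HF, test negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.total

    def lr_positive(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()


def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds * LR."""
    posttest_odds = pretest / (1 - pretest) * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Outpatient setting, 125 pg/mL rule-out cutoff. FP = 283 - 44 and
# FN = 48 - 44 are inferred from the reported denominators.
outpatient = TwoByTwo(tp=44, fn=4, fp=239, tn=490)
pre = outpatient.prevalence()

print(f"n = {outpatient.total}, prevalence = {pre:.2%}")
print(f"sensitivity = {outpatient.sensitivity():.2%}")
print(f"specificity = {outpatient.specificity():.2%}")
print(f"PPV = {outpatient.ppv():.2%}, NPV = {outpatient.npv():.2%}")
print(f"LR+ = {outpatient.lr_positive():.2f}, LR- = {outpatient.lr_negative():.2f}")
# Posttest probability of HF after a positive / negative result.
print(f"P(HF | positive) = {posttest_probability(pre, outpatient.lr_positive()):.2%}")
print(f"P(HF | negative) = {posttest_probability(pre, outpatient.lr_negative()):.2%}")
```

At the cohort's own prevalence, the odds-form calculation reproduces the PPV, and it gives one minus the NPV for a negative result. This makes it a quick consistency check on reported figures.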

2. Sample size used for the test set and the data provenance

ED Setting (Diagnosis of Heart Failure):
- Sample Size: 2200 subjects.
- Data Provenance: Multi-center prospective study, 20 collection sites across the United States.
Outpatient Setting (Diagnosis of Heart Failure):
- Sample Size: 777 subjects.
- Data Provenance: Multi-center prospective study, 10 collection sites across the United States.
Correlation with NYHA Functional Classification:
- Sample Size: 1143 subjects with heart failure.
- Data Provenance: The document does not state whether this was a separate collection or a subset of the ED/outpatient studies. Because the cohort consists of "subjects with heart failure," it likely derives from similar clinical populations.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

For both the ED and Outpatient settings: The final clinical diagnosis (ground truth) was adjudicated by independent cardiologists or ED physicians experienced in diagnosing HF.
Number of Experts: Not specified. The document only mentions "independent cardiologists or ED physicians," implying more than one, but not a precise number.
Qualifications: "Experienced in diagnosing HF." No specific number of years of experience or board certifications are mentioned.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set

The document states that the final clinical diagnosis was "adjudicated by independent cardiologists or ED physicians experienced in diagnosing HF." It does not specify a quantitative adjudication method like "2+1" or "3+1." This suggests that the adjudication process was qualitative and relied on the clinical expertise of the adjudicators to reach a consensus diagnosis for each patient, rather than a fixed number of readers and a tie-breaking rule.

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with vs. without AI assistance

No, an MRMC comparative effectiveness study was not done. This document describes the performance validation of an in vitro diagnostic (IVD) immunoassay kit (VITROS NT-proBNP II Reagent Pack), which measures a blood biomarker. It is not an AI/imaging device where human readers would typically be assisted by AI. The product is a laboratory test, and its performance is evaluated on its accuracy in measuring the biomarker and on its diagnostic utility compared with clinical truth.

6. Whether standalone (i.e., algorithm-only, without human-in-the-loop) performance testing was done

Yes, this is effectively a standalone performance study. The VITROS NT-proBNP II Reagent Pack provides a quantitative measurement of NT-proBNP. Its performance is assessed on the analytical accuracy of that measurement and on how its results relate to the clinical reference standards (adjudicated HF diagnosis, NYHA classification). No human interpretation step sits between the sample and the reported numerical value; it is a direct measurement, not an interpretive AI. Clinicians later interpret the values against established cutoffs to aid diagnosis, but that interpretation is not part of the device's measured performance.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)

The primary ground truth for heart failure diagnosis was expert clinical judgment/adjudication. For the diagnostic studies, "The final clinical diagnosis was adjudicated by independent cardiologists or ED physicians experienced in diagnosing HF."
For the assessment of heart failure severity, the ground truth was New York Heart Association (NYHA) Functional Classification, which is a clinical classification based on patient symptoms and physical activity levels.
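The NYHA analysis relies on a Jonckheere-Terpstra test. This test checks whether a measurement tends to increase across groups with a natural order, here NYHA classes I to IV. A minimal sketch of the statistic and its large-sample normal approximation follows. The group values are made-up illustrative numbers, not study data. The function name is hypothetical, and the variance formula assumes no tied values.

```python
import math
from itertools import combinations
from typing import Sequence


def jonckheere_terpstra(groups: Sequence[Sequence[float]]) -> tuple[float, float, float]:
    """Return (J, z, one-sided p) for an increasing trend across ordered groups.

    Uses the large-sample normal approximation. The variance formula assumes
    no ties; ties still contribute 0.5 each to J.
    """
    j = 0.0
    # combinations() preserves input order, so each pair is (lower, higher) class.
    for lower, higher in combinations(groups, 2):
        for x in lower:
            for y in higher:
                if x < y:
                    j += 1.0
                elif x == y:
                    j += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(k * k for k in sizes)) / 4
    var = (n * n * (2 * n + 3) - sum(k * k * (2 * k + 3) for k in sizes)) / 72
    z = (j - mean) / math.sqrt(var)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return j, z, p_one_sided


# Hypothetical NT-proBNP values (pg/mL) for NYHA I, II, III, IV; illustration only.
nyha_groups = [
    [300, 450, 500, 800],
    [600, 900, 1200, 1500],
    [1400, 2500, 3000, 4200],
    [3500, 5000, 8000, 12000],
]
j_stat, z_score, p_value = jonckheere_terpstra(nyha_groups)
print(f"J = {j_stat:.1f}, z = {z_score:.2f}, one-sided p = {p_value:.2e}")
```

Real analyses with many tied or censored values would need a tie-corrected variance or an exact or permutation p-value.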

8. The sample size for the training set

This document describes the validation of an in vitro diagnostic device (a reagent pack and system) for measuring a biomarker, not a machine learning or AI algorithm in the typical sense that would involve a "training set" for model development. The assays (immunoassays) are based on chemical reactions rather than statistical models trained on large datasets.
The "training" of such a device primarily involves rigorous analytical development and characterization, ensuring the chemical processes are precise and accurate. Therefore, the concept of a separate "training set" with ground truth data, as used in AI/ML, is not directly applicable here. The document focuses on the performance validation of the developed assay.

9. How the ground truth for the training set was established

As noted above, the concept of a "training set" with ground truth for an AI/ML algorithm doesn't directly apply to this type of IVD immunoassay. The analytical methods and performance characteristics (precision, linearity, LoD, LoQ, analytical specificity, etc.) are established through laboratory experiments and characterization studies, not by training on a clinical dataset with ground truth in the AI/ML sense. Clinical performance is then validated against independently established clinical diagnoses (expert adjudication).
