Search Results

The NeoBase™ 2 Non-derivatized MSMS kit is intended for the measurement and evaluation of amino acid, succinylacetone, free carnitine, acylcarnitine, nucleoside and lysophospholipid concentrations (Table 1) with a tandem mass spectrometer from newborn heel prick blood specimens dried on filter paper. Quantitative analysis of these analytes and their relationship with each other is intended to provide analyte concentration profiles that may aid in screening newborns for metabolic disorders.

Device Description

Not Found

AI/ML Overview

The provided text describes the acceptance criteria and study results for the NeoBase 2 Non-derivatized MSMS kit.

1. Table of Acceptance Criteria and Reported Device Performance

The document does not explicitly state numerical acceptance criteria for the screening performance studies (e.g., minimum sensitivity or specificity targets). Instead, it states that "All verification studies were successfully concluded and met the respective study's predetermined acceptance criteria." The clinical studies for screening performance are presented as agreement between the new device (NeoBase 2) and the predicate device (NeoBase). The agreement is presented as contingency tables (e.g., "Screening positive" vs "Screening negative" for both devices).

The performance is demonstrated by the agreement between the NeoBase 2 Non-derivatized MSMS kit and the predicate device, NeoBase Non-derivatized MSMS kit, in detecting various metabolic disorders in newborn screening. The results are presented in terms of the number of positive and negative screens detected by each device, along with the number of confirmed positive specimens.
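As a rough illustration of how agreement between a new device and a predicate can be summarized from a 2x2 contingency table, the sketch below computes positive, negative, and overall percent agreement. The function name and the counts are hypothetical and are not taken from Tables A-D of the submission.

```python
def percent_agreement(both_pos, new_pos_only, pred_pos_only, both_neg):
    """Agreement of a new device with a predicate, from 2x2 counts.

    both_pos      : screening positive by both devices
    new_pos_only  : positive by the new device, negative by the predicate
    pred_pos_only : negative by the new device, positive by the predicate
    both_neg      : screening negative by both devices
    """
    total = both_pos + new_pos_only + pred_pos_only + both_neg
    ppa = both_pos / (both_pos + pred_pos_only)  # share of predicate positives also flagged
    npa = both_neg / (both_neg + new_pos_only)   # share of predicate negatives also cleared
    opa = (both_pos + both_neg) / total          # share of all specimens classified alike
    return {"PPA": 100 * ppa, "NPA": 100 * npa, "OPA": 100 * opa}


# Hypothetical counts, for illustration only.
example = percent_agreement(both_pos=40, new_pos_only=10, pred_pos_only=5, both_neg=945)
for name, value in example.items():
    print(f"{name}: {value:.1f}%")
```

Because the predicate is not a clinical truth standard, these figures describe agreement between devices rather than sensitivity or specificity.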

Summary of Device Performance (from Tables A, B, C, D):

Disorder Group	Cut-off Type (Percentile)	NeoBase 2 Screening Positive (with Predicate Positive)	NeoBase 2 Screening Negative (with Predicate Negative)	Total Specimens	Confirmed Positive Specimens (detected by both methods)
Study 1
Amino acid disorders	99th	621	1591	1751	15
Amino acid disorders	99.5th	452	1645	1751	15
Amino acid disorders	1st	161	1687	1737	1 (OTCD)
Fatty acid oxidation	99th	801	1581	1746	10
Fatty acid oxidation	99.5th	451	1661	1746	10
Fatty acid oxidation	Low Percentile	1732	1386	1738	2 (CUD)
Organic acid condition	99th	571	1660	1751	15
Organic acid condition	99.5th	361	1697	1751	15
ADA-SCID	99th	2	1661	1738	2
ADA-SCID	99.5th	2	1700	1738	2
X-ALD	99th	2	1724	1738	2
X-ALD	99.5th	2	1731	1738	2
Study 2
Amino acid disorders	99th	1161	2353	2648	19
Amino acid disorders	99.5th	782	2474	2648	18
Amino acid disorders	1st	141	2571	2631	2 (OTCD)
Fatty acid oxidation	99th	1601	2326	2641	12
Fatty acid oxidation	99.5th	1081	2442	2641	12
Fatty acid oxidation	Low Percentile	1581	2363	2632	3
Organic acid condition	99th	861	2479	2642	13
Organic acid condition	99.5th	422	2561	2642	12
ADA-SCID	99th	2	2563	2631	2
ADA-SCID	99.5th	2	2578	2631	2
X-ALD	99th	2	2626	2631	2
X-ALD	99.5th	2	2628	2631	2

2. Sample Size Used for the Test Set and Data Provenance

Study 1 Sample Size:
- Amino acid disorders, Fatty acid oxidation, Organic acid conditions: 1751 samples (for 99th and 99.5th percentile cut-offs) and 1737-1746 samples (for 1st and low percentile cut-offs).
- ADA-SCID and X-ALD: 1738 samples.
Study 2 Sample Size:
- Amino acid disorders, Fatty acid oxidation, Organic acid conditions: 2631-2648 samples.
- ADA-SCID and X-ALD: 2631 samples.
Data Provenance: The data was obtained from "routine newborn screening" in "two CLIA-certified state laboratories." The confirmed positive specimens were described as "retrospective" for Study 2. This suggests a retrospective study design using existing samples and accompanying diagnostic information. The country of origin is not explicitly stated but is implied to be the US due to "CLIA-certified state laboratories."

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts

The document does not specify the number or qualifications of experts used to establish the ground truth for the test set. It mentions "confirmed positive specimens," implying a definitive diagnostic process was followed to establish the true disease status of these samples, but details on the experts involved are not provided.

4. Adjudication Method for the Test Set

The document does not describe an adjudication method for the test set, such as 2+1 or 3+1. The acceptance is based on the agreement between the new device and the predicate device, using established cut-offs derived from routine newborn screening data.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was done. This device is a diagnostic kit measuring analyte concentrations, not an AI system assisting human readers. Therefore, the concept of "how much human readers improve with AI vs without AI assistance" is not applicable.

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study

The study described is a comparison of the new device (NeoBase 2) to a predicate device (NeoBase) in obtaining analyte concentrations. While not explicitly stated as an "algorithm only" study, it's a standalone performance comparison of two test kits. The results (analyte concentrations and screening positive/negative classifications) are derived directly from the kit's operation with a tandem mass spectrometer, without human interpretation being part of the primary measurement process itself. The interpretation of the analyte profiles to aid in screening for metabolic disorders would typically involve medical professionals, but the performance data presented is on the analytical and classification output of the device.

7. Type of Ground Truth Used

The ground truth for the test set was based on "confirmed positive specimens." This implies that the true disease status of these specimens was established through clinical diagnosis and follow-up, which would typically involve a combination of clinical outcomes, biochemical testing, and/or genetic testing, ultimately confirmed by clinical experts. For ADA-SCID and X-ALD, it explicitly states "comparing the result... to the clinical condition."

8. Sample Size for the Training Set

The document does not explicitly mention a "training set" in the context of machine learning or AI. It states that "cut-offs for both methods were determined by calculating the 99.5th and 99th percentile for all analytes" using "data from routine newborn screening." This large volume of routine newborn screening data could be considered analogous to a training or reference population used to establish the operating characteristics of the screening test. The specific sample size for this cut-off determination is not given, but it is implied to be a large dataset from the "two CLIA-certified state laboratories."

9. How the Ground Truth for the Training Set Was Established

As discussed in point 8, there isn't a traditional "training set" for an AI model. However, the cut-off values (e.g., 99th, 99.5th, 1st, 10th percentiles) used to define "screening positive" or "screening negative" were established using "data from routine newborn screening." This means the ground truth for establishing these cut-offs would inherently come from the statistical distribution of analyte levels in a large, presumably healthy and general newborn population, along with the understanding of what analyte levels are indicative of various metabolic disorders. The document states that the cut-off values "only apply to these studies."
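A minimal sketch of percentile-based cut-offs, assuming a nearest-rank percentile definition and a strict comparison against the cut-off (the document specifies neither). The reference values below are a synthetic stand-in for routine screening data, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest value such that at least pct% of the data lie at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def screen(value, low_cutoff=None, high_cutoff=None):
    """Flag a result above a high cut-off or below a low cut-off (assumed rule)."""
    if high_cutoff is not None and value > high_cutoff:
        return "screening positive"
    if low_cutoff is not None and value < low_cutoff:
        return "screening positive"
    return "screening negative"


# Synthetic reference population: the integers 1..1000 stand in for analyte levels.
reference = list(range(1, 1001))
high_cutoff = percentile_nearest_rank(reference, 99.5)  # used for elevated analytes
low_cutoff = percentile_nearest_rank(reference, 1)      # used for decreased analytes

print(high_cutoff, low_cutoff)
print(screen(996, high_cutoff=high_cutoff))
print(screen(5, low_cutoff=low_cutoff))
```

Other percentile conventions (for example, linear interpolation between ranks) would give slightly different cut-offs on real data.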


K Number

K173829

Device Name

NeoLSD MSMS kit

Manufacturer

Wallac Oy, a subsidiary of PerkinElmer

Date Cleared

2018-07-18

(212 days)

Product Code

PQW,PQT,PQU,PQV,QCL,QCM

Regulation Number

862.1488

Type

Traditional

Panel

Clinical Chemistry

Reference & Predicate Devices

DEN150035

Predicate For

K190266

Intended Use

The NeoLSD MSMS Kit is intended for the quantitative measurement of the activity of the enzymes acid-β-glucocerebrosidase (ABG), acid-sphingomyelinase (ASM), acid-α-glucosidase (GAA), β-galactocerebrosidase (GALC), α-galactosidase A (GLA) and α-L-iduronidase (IDUA) in dried blood spots (DBS) from newborn babies. The analysis of the enzymatic activity is intended as an aid in screening newborns for the following lysosomal storage disorders (LSD), respectively: Gaucher Disease, Niemann-Pick A/B Disease, Pompe Disease, Krabbe Disease, Fabry Disease, and MPS I Disease.

Device Description

The NeoLSD MSMS test system uses mass spectrometry to quantitatively measure the activity of six lysosomal enzymes simultaneously from a dried blood spot sample. The NeoLSD MSMS test system is comprised of:

NeoLSD MSMS kit, including substrates, internal standards, solutions and controls
Waters TQD MSMS instrument comprised of,
a. Waters 1525 sample pump
b. Waters 2777c autosampler
c. Waters MassLynx v4.1 firmware C.
d. Power cables, tubing, syringes, connection cables
Waters NeoLynx v4.1 software and computer with monitor
PerkinElmer MSMS Workstation Software

The NeoLSD MSMS kit evaluates enzyme activities by measuring the product generated when an enzyme reacts with a synthesized substrate to create a specific end product. The activities of the six lysosomal enzymes present in a 3.2 mm punch from a dried blood spot (DBS) are simultaneously measured by the NeoLSD MSMS kit. The punches are incubated with the assay reagent mixture, which contains:
- six substrates, one corresponding to each lysosomal enzyme
- six stable-isotope mass-labeled internal standards (IS), each designed to chemically resemble each product generated
- a buffer to maintain the reaction pH and to carry inhibitors that limit activity from competing enzymes, if present, and additives that enhance the targeted enzyme reactions.

AI/ML Overview

The NeoLSD MSMS Kit is intended for the quantitative measurement of the activity of six lysosomal enzymes (acid-β-glucocerebrosidase (ABG), acid-sphingomyelinase (ASM), acid-α-glucosidase (GAA), β-galactocerebrosidase (GALC), α-galactosidase A (GLA), and α-L-iduronidase (IDUA)) in dried blood spots (DBS) from newborn babies. The analysis of enzymatic activity serves as an aid in screening newborns for Gaucher Disease, Niemann-Pick A/B Disease, Pompe Disease, Krabbe Disease, Fabry Disease, and MPS I Disease.

Here's an analysis of the acceptance criteria and the study that proves the device meets them:

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria are generally implied by the performance metrics reported, such as linearity ranges, precision (reproducibility %CV), and LoQ values. The screening performance, particularly sensitivity and specificity, are key for a screening tool.

Performance Characteristic	Acceptance Criteria (Implied)	Reported Device Performance (NeoLSD MSMS Kit)
Linear Range	Broad enough to cover physiological and pathological ranges	IDUA: 0.34 – 17.2 µmol/L/h; GAA: 0.44 – 24.2 µmol/L/h; ABG: 0.69 – 20.1 µmol/L/h; GLA: 0.97 – 20.9 µmol/L/h; ASM: 0.90 – 20.5 µmol/L/h; GALC: 0.63 – 6.3 µmol/L/h
Lower Limit of Quantitation (LoQ)	Low enough to detect deficient enzyme activity (within acceptable CV%)	IDUA: 0.44 µmol/L/h (CV% at LoQ: 18.2%); GAA: 0.63 µmol/L/h (CV% at LoQ: 17.5%); ABG: 0.69 µmol/L/h (CV% at LoQ: 21.7%); GLA: 0.97 µmol/L/h (CV% at LoQ: 17.5%); ASM: 0.90 µmol/L/h (CV% at LoQ: 20.0%); GALC: 0.34 µmol/L/h (CV% at LoQ: 20.6%)
Reproducibility (%CV)	Within acceptable limits for a diagnostic assay (e.g., <20-30%)	Within-Laboratory CV% Range: IDUA 4.7 – 6.9%; GAA 4.2 – 5.5%; ABG 11.6 – 13.8%; GLA 5.0 – 13.3%; ASM 7.3 – 11.0%; GALC 7.9 – 19.5%. Between-Laboratory CV% Range: IDUA 4.4 – 8.1%; GAA 3.5 – 7.6%; ABG 4.7 – 15.8%; GLA 5.6 – 8.4%; ASM 1.8 – 6.6%; GALC 2.1 – 7.0%. Overall Reproducibility CV% Range: IDUA 6.9 – 10.0%; GAA 5.6 – 9.4%; ABG 13.0 – 21.0%; GLA 8.6 – 15.7%; ASM 7.6 – 11.4%; GALC 9.3 – 20.7%
Sensitivity (overall)	High, to minimize false negatives in screening (e.g., >90%)	92.9% (76.5%-99.1%) (excluding invalid and lost-to-follow-up, including 2 Fabry females that were false negatives) With female Fabry subjects excluded, the test system has no false negative results for any of the enzymes.
Specificity (overall)	High, to minimize false positives (e.g., >95%)	99.4% (99.1%-99.6%) (excluding invalid and lost-to-follow-up)
False Positive Rate (overall)	Low, to minimize unnecessary follow-up (e.g., <5%)	0.6% (0.4% - 0.9%)
False Negative Rate (overall)	Very low, critical for screening (e.g., <1%)	7.1%* (0.9% - 23.5%) (*includes 2 Fabry females). When female Fabry subjects are excluded, the test system has no false negative results for any of the enzymes.
Interference	Minimal, or clearly identified and manageable	Several potential interferents identified (e.g., Glucose, Hematocrit, Hemoglobin, Triglycerides, EDTA), with their effects and implications described. For most, the interferences are not pronounced enough to impair affected/unaffected separation or occur at clinically irrelevant concentrations. Specific warnings are provided for high glucose, hematocrit, and triglyceride levels near cut-off values.

Note: The document provides performance metrics, implying these are the acceptance criteria that the device has met or is expected to meet for its intended use as a newborn screening aid.
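The reported sensitivity and its interval can be sanity-checked with an exact (Clopper-Pearson) binomial confidence interval. The sketch below is an assumption-laden illustration: the document does not state which interval method was used, and the counts of 26 detected out of 28 evaluable confirmed positives are inferred here only because they reproduce the reported 92.9% with 2 false negatives. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion x/n."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


# Assumed counts: 26 true positives among 28 evaluable confirmed positives.
detected, confirmed = 26, 28
sens_lo, sens_hi = clopper_pearson(detected, confirmed)
print(f"Sensitivity {100 * detected / confirmed:.1f}% "
      f"({100 * sens_lo:.1f}%-{100 * sens_hi:.1f}%)")

# The false negative rate is the complement, so its interval mirrors the one above.
fn_lo, fn_hi = clopper_pearson(confirmed - detected, confirmed)
print(f"False negative rate {100 * (confirmed - detected) / confirmed:.1f}% "
      f"({100 * fn_lo:.1f}%-{100 * fn_hi:.1f}%)")
```

With so few confirmed positives the interval is wide, which is why the lower bound on sensitivity sits well below the point estimate.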

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Screening Performance Study:
- Routine Samples: 4011 newborn specimens (retrospective, 4 years old, used for follow-up of clinical status).
- Confirmed LSD Positive Samples (enriched): 30 newborn DBS specimens (from the site's biobank, ranging from 5.8 to 17.6 years of age).
- Total Test Set: 4041 specimens (4011 routine + 30 confirmed positive).
Data Provenance:
- Routine Samples: Retrospective routine newborn screening samples, 4 years old, from an EU (European) newborn screening laboratory.
- Confirmed LSD Positive Samples: From the site's biobank (likely the same EU lab), with ages ranging from 5.8 to 17.6 years.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications

The document does not explicitly state the number or qualifications of experts used to establish the ground truth for the test set.

Instead, the ground truth for the 4011 routine samples was established based on:

Clinical outcome: "Clinical outcome was used as a comparator for all samples, including the 4011 routine screening samples, as derived from the civil registry status and national hospital registry. Subject's survival at 4 years of age without LSD diagnosis or clinical signs suggestive of an LSD was used as clinical confirmation of an unaffected newborn."
For the 30 confirmed LSD positive samples: Their status was "known" as "confirmed LSD positive newborn DBS specimens."

Therefore, the ground truth relies on clinical follow-up data and prior confirmed diagnoses, rather than a panel of experts adjudicating each case for the study.

4. Adjudication Method for the Test Set

No explicit "adjudication method" in the sense of expert review (e.g., 2+1, 3+1) is described for the test set. The ground truth was established by:

Clinical outcome and registry data for routine samples to determine "unaffected" status.
Known (prior confirmed) diagnoses for the "confirmed positive" samples.
Screening algorithm: For routine samples, those below the initial cut-off were re-tested in duplicate to classify as normal, presumptive positive, or invalid.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done

No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was not done. This device is a diagnostic kit that quantitatively measures enzyme activity, not an interpretative imaging AI tool that assists human readers. Therefore, the concept of "human readers improve with AI vs without AI assistance" does not apply here.

6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) was done

Yes, the screening performance study essentially represents a standalone (algorithm only) performance for the NeoLSD MSMS kit. The device measures enzyme activity, and the "screening results" (positive, negative, invalid) are derived directly from these quantitative measurements compared against predefined cut-off values and a re-testing algorithm. While a human laboratory technician performs the assay, the interpretation of the results as "screen positive" or "screen negative" is determined by the device's output and the established algorithm, without human interpretative judgment affecting the individual sample classification.

7. The Type of Ground Truth Used

The ground truth used was primarily:

Outcomes data/Clinical Confirmation: For the 4011 routine samples, "Subject's survival at 4 years of age without LSD diagnosis or clinical signs suggestive of an LSD was used as clinical confirmation of an unaffected newborn." This is a form of clinical outcome data.
Pathology/Confirmed Diagnosis: For the 30 enriched samples, they were "confirmed LSD positive" specimens, indicating a definitive medical diagnosis.

8. The Sample Size for the Training Set

The document describes studies for establishing reference ranges and calibration, but it does not explicitly describe a "training set" in the context of machine learning model development. The development of this assay likely involved extensive analytical validation (e.g., linearity, LoQ, interference) and establishing reference ranges using large sample sets, which might be considered analogous to a training or development phase for defining assay parameters and cut-offs.

Reference Range Establishment:
- EU site: 5041 newborn samples were tested to establish cut-off values. These were "retrospective routine newborn screening samples" from newborns 0-30 days of age.
- US Site A: 5251 newborn DBS specimens, newborns ≤ 4 days.
- US Site B: 5053 newborn DBS specimens, newborns ≤ 7 days.

These large cohorts were used to determine population distributions, medians, and percentiles to set initial and retest cut-off values. While not a "training set" for an AI algorithm, they serve a similar purpose in defining the operational parameters for the device's classification logic.

9. How the Ground Truth for the Training Set Was Established

Given that there isn't a "training set" for an AI model, the "ground truth" for establishing the reference ranges and cut-offs was based on:

Population Distribution: Statistical analysis of enzyme activity levels in large cohorts of presumably healthy newborns (5041 from EU, 5251 from US Site A, 5053 from US Site B).
Expert-defined Percentiles: The initial cut-off values were based conservatively on "0.1 - 0.3 percentile of enzyme activity distribution and converted to a percentage of population median activity," which reflects expert consensus on appropriate thresholds for screening. The "retest cut-off values were set 5% lower from the initial cut-off percentage."

This process is standard for establishing normal ranges and screening cut-offs for diagnostic assays and involves statistical methods and clinical expert judgment in setting initial thresholds.
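A minimal sketch of how a cut-off might be expressed as a percentage of the population median, under two assumptions that the document does not confirm: that "5% lower" means five percentage points below the initial cut-off percentage, and that a sample is retested when its activity falls below the initial cut-off. All names and numbers are hypothetical.

```python
def cutoff_percentages(percentile_activity, median_activity, retest_offset_points=5.0):
    """Return (initial, retest) cut-offs as percentages of the population median.

    percentile_activity  : enzyme activity at the chosen low percentile
    median_activity      : population median activity, in the same units
    retest_offset_points : assumed gap, in percentage points, between the two cut-offs
    """
    initial_pct = 100 * percentile_activity / median_activity
    retest_pct = initial_pct - retest_offset_points
    return initial_pct, retest_pct


def percent_of_median(activity, median_activity):
    """Express one sample's activity relative to the population median."""
    return 100 * activity / median_activity


# Hypothetical population values in µmol/L/h.
median_activity = 10.0
initial_pct, retest_pct = cutoff_percentages(2.0, median_activity)

sample_pct = percent_of_median(1.8, median_activity)
needs_retest = sample_pct < initial_pct
print(initial_pct, retest_pct, needs_retest)
```

Normalizing to the median lets laboratories with different absolute activity levels share the same percentage-based thresholds.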


K Number

DEN140010

Device Name

PERKINELMER ENLITE NEONATAL TREC TEST SYSTEM

Manufacturer

WALLAC OY

Date Cleared

2014-12-15

(299 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Intended Use

The EnLite™ Neonatal TREC Kit is an in vitro diagnostic device intended for the semiquantitative determination of TREC (T-cell receptor excision circle) DNA in blood specimens dried on filter paper. The test is for use on the VICTOR™ EnLite instrument. The test is indicated for use as an aid in screening newborns for severe combined immunodeficiency disorder (SCID).

This test is not intended for use as a diagnostic test or for screening of SCID-like Syndromes, such as DiGeorge Syndrome, or Omenn Syndrome. It is also not intended to screen for less acute SCID syndromes such as leaky-SCID or variant SCID.

Device Description

The EnLite™ Neonatal TREC Kit is comprised of the EnLite™ Neonatal TREC Kit, the VICTOR™ EnLite instrument and the EnLite™ workstation software. The EnLite™ Neonatal TREC Kit contains reagents sufficient for 384 reactions or 1152 reactions, and multi-level, dried blood spot (DBS) calibrators and controls. The DBS calibrators and DBS controls have been prepared from porcine whole blood with a hematocrit value of 48-55%, and contain purified salmon-sperm, TREC, and beta-actin DNA.

AI/ML Overview

The EnLite™ Neonatal TREC Kit is an in vitro diagnostic device for semi-quantitative determination of T-cell receptor excision circles (TRECs) in dried blood specimens, used as an aid in screening newborns for severe combined immunodeficiency disorder (SCID).

1. Acceptance Criteria and Reported Device Performance

The acceptance criteria for the EnLite™ Neonatal TREC Kit are outlined in the regulatory information, specifically within the "Special Controls" section (Section T, point 1(iii)). These criteria detail the required analytical and clinical performance characteristics for the device. The reported device performance is presented throughout the "Performance Characteristics" section (Section M).

Here's a table summarizing key acceptance criteria and reported performance, focusing on the clinical validation study as that directly addresses the intended use of screening:

Table of Acceptance Criteria and Reported Device Performance

Performance Characteristic	Acceptance Criteria (from Special Controls)	Reported Performance (from Clinical Study)
Clinical Validity	Data demonstrating clinical validity using well-characterized prospectively or retrospectively obtained clinical specimens representative of the intended use population. A minimum of 10-15 confirmed positive specimens from more than one site, with relevant annotation, and SCID diagnosis by flow cytometry or clinically meaningful information regarding subject status at one year or beyond. Additional specimens characterized by other disorders with low/absent TREC (e.g., other T-cell lymphopenic disorders) to supplement the range of results. Pre-specified clinical decision point (cut-off) before studies. Results summarized in tabular format comparing interpretation to reference method. Point estimates and 95% CIs for PPA, NPA, and OPA. Data must include retest rate, false positive rate before retest, final false positive rate, and false negative rate.	The primary clinical study objective was to demonstrate the EnLite™ Neonatal TREC Kit's screening performance in the intended use population and its ability to discriminate between normal and SCID cases. The study was conducted retrospectively. SCID Positive Specimens: 17 archived confirmed SCID positive DBS specimens were obtained from newborn screening laboratories in the US. All 17 were confirmed for SCID by flow cytometry. These enriched the study due to the low incidence of SCID. Other Low TREC Specimens: An additional 9 DBS specimens from babies with low TREC values (0 to 20 TREC Copies/uL) were included. Comparator: For routine clinical study specimens, the comparator was the clinical assessment from medical records at one year of age or older (365 days), confirming the newborn was not identified with SCID, was not deceased from SCID-related complications, and was apparently healthy. For confirmed SCID cases, the comparator was the reference test results for SCID confirmation. Pre-specified Cut-off: The cut-off for TREC was pre-determined to be 36 copies/uL and for beta-actin as 56 copies/uL, based on the 2.5th percentile of normal distribution data from a separate cut-off confirmation study using 2846 archived, retrospective newborn specimens from the Danish Newborn Screening Biobank. Retest Rate: The retest rate was 1.9%. False Positive Rate: The false positive rate using the cut-off of 36 in the first round of testing was 1.5%. After repeat testing on follow-up cases, the final false positive rate was 0.5%. False Negative Rate: The clinical data indicates 0 false negative results among the 16 confirmed SCID positives classified after the final testing round (Table 14). Performance (from Table 14, excluding invalid results): Overall Percent Agreement (OPA): 99.7% (95% CI: 99.4% to 99.8%); Positive Percent Agreement (PPA): 100% (95% CI: 79.4% to 100%); Negative Percent Agreement (NPA): 99.7% (95% CI: 99.4% to 99.8%). Note: One SCID positive specimen in the clinical study was classified as an invalid result, leading to 16 confirmed SCID positives being used for final agreement calculations.
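The lower confidence limit reported for PPA can be reproduced with a short calculation, assuming an exact (Clopper-Pearson) binomial interval; the document does not name the method. When all n of n specimens agree, the exact lower limit has the closed form (alpha/2)^(1/n) and the upper limit is 100%. The function name is hypothetical.

```python
def exact_lower_limit_all_agree(n, alpha=0.05):
    """Clopper-Pearson lower limit when n of n cases agree (upper limit is 1)."""
    return (alpha / 2) ** (1 / n)


# 16 of 16 confirmed SCID positives were classified as presumptive positive.
ppa_lower = exact_lower_limit_all_agree(16)
print(f"PPA = 100% (95% CI: {100 * ppa_lower:.1f}% to 100%)")  # lower limit ≈ 79.4%
```

The wide interval reflects the small number of confirmed positives rather than any observed misses.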

2. Sample Sizes and Data Provenance

Test Set Sample Size:
- Clinical Study: A total of 6,471 neonatal specimens were run, with 6,373 included in the final analysis. The 6,471 specimens comprised 6,389 routine DBS samples from the Danish newborn screening biobank and 82 enrichment samples (17 confirmed SCID positive samples, 9 confirmed low-level TREC specimens, and 56 samples used for blinding purposes). The final agreement calculations used 5,454 specimens (after some exclusions and loss to follow-up), or 5,442 after removing invalid results (16 confirmed SCID positives and 5,426 normal/presumptive normal).
- Cut-off Establishment Study: 3,243 archived, retrospective newborn specimens initially, with 2,846 included in the analysis after exclusions.
- Analytical Performance Studies (Examples):
  - Reproducibility (Site-to-Site): 90 measurements per sample (6 unique TREC levels, 10 runs x 3 laboratories x 3 replicates/sample).
  - Precision: 27 runs performed over 20 days. For TREC precision, 10 samples were assessed with 4 replicates/sample. For beta-actin, 7 samples were used.
  - LoB/LoD/LoQ: 5 samples for LoB (60 results per sample); 5 samples for LoD/LoQ (108 results per sample).
Data Provenance:
- Clinical Study & Cut-off Establishment:
  - Country of Origin: Denmark (samples from the Danish Newborn Screening Biobank, comprising the Danish population).
  - Retrospective/Prospective: All samples were archived, retrospective.
- SCID Enrichment Samples: 17 confirmed SCID positive DBS specimens were obtained from newborn screening laboratories in the US (retrospective).

3. Number of Experts and Qualifications for Ground Truth

The document does not explicitly state the number of experts used to establish the ground truth for the test set, nor their specific qualifications (e.g., "radiologist with 10 years of experience").

However, the ground truth for the 17 confirmed SCID positive specimens was established by flow cytometry, which is a specialized laboratory test requiring expert interpretation, presumably by qualified clinical immunologists or pathologists.
For the routine newborn specimens, the comparator for ground truth was the clinical assessment of the study subjects obtained from their medical records at one year of age or older (365 days), confirming they were not identified with SCID, were not deceased from SCID-related complications, and were apparently healthy. This clinical assessment would implicitly involve input from various medical professionals (pediatricians, specialists).
The "expert" component primarily comes from the reference method (flow cytometry) for SCID diagnosis and the subsequent clinical follow-up for the larger cohort.

4. Adjudication Method for the Test Set

The adjudication method for the test set was not explicitly described as a multi-expert consensus process like "2+1" or "3+1" that is common in medical imaging studies. Instead, the ground truth for SCID confirmation was primarily based on:

Laboratory Confirmation: Flow cytometry for the 17 confirmed SCID cases.
Clinical Outcomes: Medical record review at one year of age or older for the large cohort of routine newborns to determine the absence of SCID.

The device itself has an internal retesting algorithm (Section P.4, Figure 8). Initial results below the cut-off are "presumptive positive" and are retested in duplicate. This internal retesting acts as a form of "internal adjudication" for the device's own classification, but it is not external expert adjudication of the ground truth.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was mentioned. This device is a laboratory diagnostic kit and not an AI-assisted diagnostic tool for human readers (like a CAD system for radiologists). Therefore, a study to measure how much human readers improve with AI vs. without AI assistance is not applicable to this type of device.

6. Standalone Performance

The study primarily assessed the standalone performance of the device/kit (EnLite™ Neonatal TREC Kit, VICTOR™ EnLite instrument, and EnLite™ workstation software) in classifying samples as "presumptive positive" or "normal" based on its quantitative TREC and beta-actin measurements and the predefined cut-offs. The results (PPA, NPA, OPA) reflect the performance of the integrated system in a laboratory setting, without direct human cognitive interpretation of raw data for diagnosis. The output from the device is a quantitative TREC value, which is then used with a hard cut-off.

7. Type of Ground Truth Used

The ground truth used was a combination of:

Laboratory Test Confirmation: For the known SCID positive cases, flow cytometry was used to confirm SCID.
Clinical Outcomes Data: For the large cohort of routine newborn screening samples, the absence of SCID was determined through medical record review (vaccination records, national patient registry, civil registration system) at one year of age or older, looking for signs of SCID or SCID-related complications/death.

8. Sample Size for the Training Set

The document describes the evaluation of an already developed device/kit, not a machine learning model. Therefore, there is no explicit "training set" in the context of machine learning model development. The data used for establishing the clinical cut-off (2,846 samples from the Danish Newborn Screening Biobank) could be considered analogous to a "development" or "calibration" dataset, which informed the final cut-off value used in the pivotal study (test set).

9. How the Ground Truth for the Training Set Was Established

As noted above, there isn't a "training set" for a machine learning model. For the dataset used to establish the clinical cut-off (2,846 samples):

These were archived, retrospective newborn specimens from the Danish Newborn Screening Biobank.
The ground truth in this context was based on the distribution of TREC and beta-actin values in this "normal newborn population". The 2.5th percentile of this distribution was then chosen as the clinical cut-off for TREC (36 copies/uL) and beta-actin (56 copies/uL). This is a statistical approach to defining "normal" for screening purposes, rather than a direct disease diagnosis for each individual sample.


K Number

K133652

Device Name

GSP NEONATAL TOTAL GALACTOSE KIT

Manufacturer

WALLAC OY, A SUBSIDIARY OF PERKINELMER, INC.

Date Cleared

2014-04-28

(152 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K090846,K071649

Predicate For

K190335

Intended Use

The GSP Neonatal Total Galactose kit is intended for the quantitative determination of total galactose (galactose and galactose-1-phosphate) concentrations in blood specimens dried on filter paper as an aid in screening newborns for galactosemia using the GSP® instrument.

Device Description

The GSP Neonatal Total Galactose kit contains sufficient reagents to perform 1152 assays. The GSP Neonatal Total Galactose test system measures total galactose, i.e. both galactose and galactose-1-phosphate, using a fluorescent galactose oxidase method. The fluorescence is measured using an excitation wavelength of 505 nm and an emission wavelength of 580 nm. The kit contains Neonatal Total Galactose Assay Reagent 1, Neonatal Total Galactose Assay Reagent 2, Neonatal Total Galactose Assay Buffer, Neonatal Total Galactose Assay Reconstitution Solution, and Neonatal Extraction Solution. Calibrators and Controls are also included.

AI/ML Overview

1. Table of Acceptance Criteria and Reported Device Performance

Performance Characteristic	Reported Device Performance
Precision (Total Variation)	Ranged from 9.3% to 14.1% CV.
Limit of Blank (LoB)	0.34 mg/dL
Limit of Detection (LoD)	0.97 mg/dL
Limit of Quantitation (LoQ)	1.15 mg/dL (defined as the lowest concentration with a total CV equal to or less than 20%).
Linearity	Demonstrated linear performance throughout the measuring range (from 1.15 mg/dL to 50 mg/dL).
Recovery	Average recovery of 109% for galactose, 117% for galactose-1-phosphate, and 103% for both combined from three contrived dried blood spot samples.
Interference	Acetaminophen: Concentrations above 2.75 mg/dL caused a significant decrease (>15%) in measured total galactose. Maximum tested (5.5 mg/dL) caused a decrease of ~20-22%. Conjugated Bilirubin: Concentrations above 16.6 mg/dL caused a significant decrease (>15%) in measured total galactose. At 24.9 mg/dL and above, the decrease was 100% at some total galactose concentrations. Intralipid: Concentrations above 250 mg/dL (at 5 and 10 mg/dL total galactose) or 375 mg/dL (at 15 mg/dL total galactose) caused a significant increase (>15%) in measured total galactose. Maximum tested (1500 mg/dL) caused an increase of ~52-77%. Hemoglobin (with Bilirubin): Hemoglobin levels at 198 g/L and above in combination with an elevated bilirubin level of 15 mg/dL caused a significant increase (>15%) in measured total galactose at certain total galactose concentrations. For example, at 5 mg/dL total galactose, 198 g/L Hb led to a 26.3% increase. Non-Interfering Substances: Unconjugated bilirubin (20 mg/dL), β-Nicotinamide adenine dinucleotide (100 µmol/L), Glutathione (3 mmol/L), Human Serum Albumin (30 mg/mL), Ascorbate (6 mg/dL), D-glucose (1000 mg/dL), D-mannose (100 mg/dL), D-fructose (18 mg/dL), Ampicillin (152 µmol/L), Lithium heparin (0.375 mg/mL), and Hematocrit levels from 30% to 66% (102-230 g/L Hemoglobin) were found not to interfere.
Screening Performance vs. Predicate (95th percentile)	Overall percent agreement = 96.0%; Positive percent agreement = 63.6%; Negative percent agreement = 97.9%
Screening Performance vs. Predicate (99th percentile)	Overall percent agreement = 98.8%; Positive percent agreement = 53.3%; Negative percent agreement = 99.4%
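
The three agreement figures in the last two rows come from a 2x2 comparison of the new kit against the predicate. The underlying counts are not given in this summary, so the sketch below uses hypothetical counts purely to show how overall, positive, and negative percent agreement are derived; the function name `percent_agreement` is also illustrative.

```python
def percent_agreement(both_pos, new_only_pos, predicate_only_pos, both_neg):
    """Compute overall, positive and negative percent agreement versus a predicate.

    both_pos:           screen-positive on both methods
    new_only_pos:       positive on the new method, negative on the predicate
    predicate_only_pos: negative on the new method, positive on the predicate
    both_neg:           screen-negative on both methods
    """
    total = both_pos + new_only_pos + predicate_only_pos + both_neg
    overall = 100 * (both_pos + both_neg) / total
    # Positive agreement: share of predicate positives that the new method also flags.
    positive = 100 * both_pos / (both_pos + predicate_only_pos)
    # Negative agreement: share of predicate negatives that the new method also clears.
    negative = 100 * both_neg / (both_neg + new_only_pos)
    return overall, positive, negative


# Hypothetical counts, NOT taken from the submission.
overall, positive, negative = percent_agreement(8, 3, 2, 87)
print(f"OPA = {overall:.1f}%, PPA = {positive:.1f}%, NPA = {negative:.1f}%")
```

Because few specimens screen positive, a handful of discordant results can move positive percent agreement substantially while overall agreement stays high, which is consistent with the pattern in the table.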

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Precision Study: 7 samples, with 216 total measurements per sample (4 replicates per sample, in 27 runs over 21 days using 3 kit lots and 3 GSP instruments).
Sample Size for LoD: 216 determinations of 4 low-level samples.
Sample Size for LoB: 150 blank samples.
Sample Size for Recovery: 3 contrived dried blood spot samples.
Sample Size for Interference Studies: Not explicitly stated for each concentration, but involved various concentrations of interfering substances at three total galactose concentrations (5, 10, and 15 mg/dL).
Sample Size for Internal Method Comparison: 141 routine screening and spiked blood spot specimens.
Sample Size for Screening Performance Study: 2320 samples (6 confirmed positive samples and 2314 routine samples).
Data Provenance: The screening performance study was conducted at "one newborn screening laboratory in the United States." Other non-clinical studies (precision, linearity, LoB/LoD/LoQ, recovery, interference) appear to be internal laboratory studies without specific geographic provenance mentioned, but presumably also conducted in the US or Finland (Wallac Oy headquarters). The studies were retrospective, using banked samples and contrived samples.
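
The LoQ definition in the table above (the lowest concentration with a total CV of 20% or less) can be sketched as follows. The replicate values are hypothetical, and a plain sample CV stands in for the total CV, which in an actual precision study would normally come from a variance-components analysis across runs, lots, and instruments.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation over mean."""
    return stdev(values) / mean(values) * 100


def limit_of_quantitation(levels, max_cv=20.0):
    """Return the lowest nominal concentration whose CV is at or below max_cv.

    levels maps a nominal concentration to its list of replicate measurements.
    Returns None if no level meets the criterion.
    """
    passing = [conc for conc, reps in levels.items() if cv_percent(reps) <= max_cv]
    return min(passing) if passing else None


# Hypothetical replicate measurements (mg/dL), NOT data from the submission.
hypothetical_levels = {
    0.5: [0.3, 0.7, 0.5, 0.6],
    1.0: [0.9, 1.1, 1.0, 1.2],
    2.0: [1.9, 2.1, 2.0, 2.0],
}

for conc, reps in hypothetical_levels.items():
    print(f"{conc} mg/dL: CV = {cv_percent(reps):.1f}%")
print("LoQ =", limit_of_quantitation(hypothetical_levels))
```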

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts

The document does not mention the use of experts to establish ground truth for the test set. For the screening performance study, "6 confirmed positive samples" are mentioned, implying prior clinical diagnosis as the ground truth. The qualifications of who confirmed these positive cases or how the "routine samples" were classified as normal are not specified.

4. Adjudication Method for the Test Set

Not applicable. The document does not describe any expert adjudication process for the test set. Ground truth for confirmed positive samples likely came from clinical diagnosis.

5. Whether a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done and, If So, the Effect Size of How Much Human Readers Improve with AI vs. without AI Assistance

Not applicable. This device is an in-vitro diagnostic test kit (laboratory assay) for quantitative determination, not an imaging device or AI-assisted diagnostic tool that would involve human readers interpreting results in an MRMC study.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done

This refers to the performance of the assay itself. The entire submission details the standalone performance of the GSP Neonatal Total Galactose kit (assay only) without human-in-the-loop interpretation beyond standard laboratory procedures and reporting. The "GSP® instrument" is automated, as stated in the comparison chart ("GSP instrument, automated").

7. The Type of Ground Truth Used

For Analytical Performance (Precision, LoB, LoD, LoQ, Linearity, Recovery, Interference): Ground truth was established by preparing samples with known concentrations of total galactose or specific interfering substances. For example, for recovery, the "recovery of galactose, galactose-1-phosphate, and both combined was determined from three contrived dried blood spot samples," meaning these samples were prepared with known amounts.
For Screening Performance Study: The ground truth for the 6 positive samples was "confirmed positive." This implies a clinical diagnosis of galactosemia, likely through follow-up diagnostic testing. The other 2314 samples are referred to as "routine samples" and classified as "normal" in the context of screening performance, likely based on either their negative predicate device result or their actual clinical status. The document also compares the new device's results against the predicate device's results as a reference for "Manual result."

8. The Sample Size for the Training Set

Not applicable in the conventional sense of machine learning training sets. This is a chemical assay, not an AI/ML device that requires a distinct training set for model development. The development and optimization of the assay would involve various experiments, but these are not referred to as "training sets" in this context.

9. How the Ground Truth for the Training Set Was Established

Not applicable, as there is no "training set" in the context of an AI/ML model for this chemical assay. The development of the calibrators and controls (which are prepared with known concentrations of galactose and galactose-1-phosphate) serves an analogous function in ensuring accuracy and consistency of the assay.

K Number

K131284

Device Name

GSP NEONATAL BIOTINIDASE KIT

Manufacturer

WALLAC OY

Date Cleared

2013-11-14

(192 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K090123

Predicate For

N/A

Intended Use

The GSP Neonatal Biotinidase kit is intended for the quantitative in vitro determination of human biotinidase activity in blood specimens dried on filter paper as an aid in screening newborns for biotinidase deficiency using the GSP instrument.

Device Description

The GSP Neonatal Biotinidase kit contains sufficient reagents to perform 1152 assays. The GSP Neonatal Biotinidase test system measures biotinidase activity, combining an enzyme reaction with a solid phase time-resolved immunofluorescence assay. The GSP Neonatal Biotinidase assay is based on the ability of the biotinidase enzyme to cleave the amide bond in Eu-labeled biotin. The enzyme reaction is stopped by addition of streptavidin which has high affinity for biotin (either Eu-labeled or free biotin). The streptavidin-biotin complexes are captured by the solid phase monoclonal antibody directed against streptavidin. DELFIA Inducer dissociates the molecules into the solution where the europium fluorescence is measured. The measured fluorescence is inversely proportional to the biotinidase activity of the sample.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:

Acceptance Criteria and Device Performance

Acceptance Criteria Category	Specific Criteria	Reported Device Performance
Precision (Variation)	Total variation (CV) for dried blood spot samples across 3 kit lots and 3 GSP instruments.	Total variation ranged from 7.5% to 12.7% CV.
Analytical Sensitivity	Limit of Blank (LoB)	LoB = 9.5 U/dL (95th percentile of blank samples, n=150)
	Limit of Detection (LoD)	LoD = 14.8 U/dL (based on 360 determinations of five low-level samples)
	Limit of Quantitation (LoQ)	LoQ = 14.8 U/dL (lowest activity with total CV ≤ 20%)
Linearity	Demonstrate linearity throughout the measuring range.	Demonstrated linear throughout the measuring range (from 14.8 U/dL to 325 U/dL).
Interference	Ampicillin, sulfisoxazole, glutathione, unconjugated bilirubin, and conjugated bilirubin/triglyceride effects.	Interfering: Ampicillin (≥1.4 mg/dL at low biotinidase, 2.8 mg/dL at high biotinidase), Sulfisoxazole (≥7.5 mg/dL at low biotinidase), Glutathione (>30 mg/dL), Unconjugated bilirubin (10 mg/dL at low biotinidase), Conjugated bilirubin (≥2.5 mg/dL), Triglyceride (≥250 mg/dL). Non-interfering: adrenocorticotropic hormone, ascorbic acid, biotin, gammaglobulin, gentamicin sulphate, hemoglobin, human serum albumin, kanamycin sulphate, penicillin G, phenytoin, phenobarbital, sulfamethoxazole, trimethoprim, valproic acid, vitamin K1 (at specified concentrations).
Clinical Performance	Agreement with predicate device for screening newborns for biotinidase deficiency.	Overall percent agreement: 99.6% (CI 99.2% - 99.8%) Positive percent agreement: 92.0% (CI 74.0% - 99.0%) Negative percent agreement: 99.7% (CI 99.3% - 99.9%)
	Classification of confirmed biotinidase deficient samples.	All 20 retrospective confirmed biotinidase deficiency specimens were classified as screening positive by the predicate. The GSP method initially classified 19/20 as positive, with one initially negative specimen testing positive in 4 repeat tests.
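
The confidence intervals attached to the agreement figures above are binomial intervals on a proportion. The summary does not state the counts or the interval method, so the sketch below makes two assumptions: it uses the exact (Clopper-Pearson) interval, and it uses 23 agreeing results out of 25 as one hypothetical pair of counts that yields 92.0%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=100):
    """Bisection on [0, 1] for func(p) = target, where func increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical counts (23 of 25), assumed only because they give 92.0%.
low, high = clopper_pearson(23, 25)
print(f"92.0% (CI {100 * low:.1f}% - {100 * high:.1f}%)")
```

Under these assumed counts the exact interval comes out close to the one reported for positive percent agreement, and its width illustrates how little precision a small number of positive specimens provides.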

Study Details

Sample size used for the test set and the data provenance:
- Clinical Study Test Set: 2008 specimens (1988 routine newborn screening specimens and 20 retrospective confirmed biotinidase deficiency specimens).
- Provenance: This information is not explicitly stated, but it can be inferred that these are human blood specimens from newborn screening programs. The "retrospective confirmed biotinidase deficiency specimens" suggest historical data, implying a retrospective study design for these specific samples. The routine screening specimens would be prospective in nature from a screening program. No specific country of origin is mentioned.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document does not explicitly state the number of experts or their qualifications for establishing the ground truth.
- For the "retrospective confirmed biotinidase deficiency specimens," the ground truth is implied to be established through confirmation methods for biotinidase deficiency, but no details on expert involvement are provided.
Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- The document does not describe any specific adjudication method for establishing the ground truth of the clinical test set. The term "confirmed biotinidase deficiency specimens" implies that a definitive diagnosis was reached, but the process is not detailed.
- For the one discrepant clinical case, it was subjected to "multiple (4) repeat tests." This could be considered a form of internal adjudication/re-testing rather than external expert consensus.
Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- No MRMC or human-in-the-loop study was conducted or described. This device is an in vitro diagnostic kit, meaning it is an automated assay, not an AI assisting human readers. The comparison is between the new automated device and a predicate manual method.
If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:
- Yes, the core of the clinical study involved evaluating the performance of the GSP Neonatal Biotinidase kit (an automated instrument-based assay) as a standalone device in comparison to the existing manual predicate device. The results are reported as direct comparisons between the two methods on the same samples.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The ground truth for the clinical study was based on the classification of specimens as "biotinidase deficient" or "normal." For the 20 known deficient specimens, the ground truth was "retrospective confirmed biotinidase deficiency." For the routine screening specimens, it is implied that the predicate device's result (manual Neonatal Biotinidase method) served as a comparative reference, which itself would have a ground truth based on established clinical diagnostic criteria for biotinidase deficiency. The cut-offs used for both devices (0.5th percentile for GSP and 30% of mean + 2SD for the predicate) are tied to population distributions expected for the condition.
The sample size for the training set:
- The document describes the GSP Neonatal Biotinidase kit as a diagnostic assay, and its development would typically involve internal validation and optimization data. The provided text, being a 510(k) summary, primarily focuses on the test set performance to demonstrate substantial equivalence.
- There is no explicit mention of a separate "training set" in the context of machine learning or AI models, as this is an in vitro diagnostic kit based on an enzymatic reaction. The calibrators and controls used in the kit are standardized and prepared from human whole blood, aiding in the assay's calibration and ongoing quality control during operation.
How the ground truth for the training set was established:
- Not applicable in the context of a "training set" for a traditional in vitro diagnostic assay.
- For the calibrators, they were "calibrated against in-house primary calibrators (dried blood spots, stored at -80 to -60°C) prepared using adult human blood (endogenous biotinidase activity in serum) and washed red blood cells as blood matrices." This describes how the reference values for the assay's internal calibration curve are established.

K Number

K110274

Device Name

AUTODELFIA NEONATAL IRT KIT

Manufacturer

WALLAC OY, A SUBSIDIARY OF PERKINELMER, INC.

Date Cleared

2011-06-10

(130 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Intended Use

The AutoDELFIA Neonatal IRT kit is intended for the quantitative determination of human immunoreactive trypsin(ogen) (IRT) in blood specimens dried on filter paper as an aid in screening newborns for cystic fibrosis using the 1235 AutoDELFIA® automatic immunoassay system.

Device Description

The AutoDELFIA Neonatal IRT assay is a solid phase, two-site fluoroimmunometric assay based on the direct sandwich technique in which two monoclonal antibodies (derived from mice) are directed against two separate antigenic determinants on the IRT molecule. Calibrators, controls and test specimens containing IRT are reacted simultaneously with immobilized monoclonal antibodies directed against a specific antigenic site on the IRT molecule and europium-labeled monoclonal antibodies (directed against a different antigenic site) in assay buffer. The assay buffer elutes IRT from the dried blood on filter paper disks. The complete assay requires only one incubation step. Enhancement Solution dissociates europium ions from the labeled antibody into solution where they form highly fluorescent chelates with components of the Enhancement Solution. The fluorescence in each well is then measured. The fluorescence of each sample is proportional to the concentration of IRT in the sample.

AI/ML Overview

The provided text describes a 510(k) premarket notification for the AutoDELFIA Neonatal IRT kit. This submission focuses on demonstrating substantial equivalence to a predicate device, rather than providing a detailed study report with specific acceptance criteria and performance data in the format typically used for AI/ML device evaluations. As such, many of the requested fields for AI/ML device studies are not directly applicable or explicitly stated in this document.

However, I can extract information related to the device's analytical performance characteristics, which serve as a form of acceptance criteria for this type of in-vitro diagnostic device.

Here's an attempt to populate the table and answer the questions based on the provided text, indicating where information is not available.

1. Table of Acceptance Criteria and the Reported Device Performance

For this in-vitro diagnostic device, "acceptance criteria" are generally established by demonstrating performance characteristics that are comparable to or better than a legally marketed predicate device, and that meet the required analytical performance for its intended use.

Characteristic (Feature)	Acceptance Criteria (from Predicate Device)	Reported Device Performance (New Device: B005-212/B005-204)
Measuring Range	4 (as defined by LoB) to 500 (as defined by upper calibrator) ng/mL blood	16 to 480 ng/mL blood
Linearity Range	No claims for linearity in labeling.	16 to 480 ng/mL blood
Analytical Sensitivity / Limit of Blank (LoB)	< 4 ng/mL blood	0.53 ng/mL blood
Limit of Detection (LoD)	Not explicitly stated, implied to be around 4 ng/mL blood (from LoB)	2.9 ng/mL blood
Antibody Cross-Reactions	α2-macroglobulin < 4 ng/mL blood, α1-antitrypsin < 4 ng/mL blood, Phospholipase A2 < 4 ng/mL blood, Chymotrypsin < 4 ng/mL blood, Human IgG < 4 ng/mL blood, Uropepsinogen < 4 ng/mL blood	α2-macroglobulin < 4 ng/mL blood, α1-antitrypsin < 4 ng/mL blood, Phospholipase A2 < 4 ng/mL blood, Chymotrypsin < 4 ng/mL blood, Human IgG < 4 ng/mL blood, Uropepsinogen < 4 ng/mL blood (All "Same" as predicate, which explicitly lists these values)
Hook effect	No hook effect has been found with IRT concentrations up to 40,000 ng/mL	No hook effect has been found with IRT concentrations up to 40,000 ng/mL
Precision (Total Variation CV%)	42.6 ng/mL blood CV% 9.3, 98.8 ng/mL blood CV% 10.0, 266 ng/mL blood CV% 9.6	16.7 ng/mL blood CV% 8.7, 22.5 ng/mL blood CV% 9.6, 48.0 ng/mL blood CV% 9.1, 104 ng/mL blood CV% 8.0, 247 ng/mL blood CV% 8.3, 401 ng/mL blood CV% 8.4, 449 ng/mL blood CV% 9.4

Note on "Acceptance Criteria": For this 510(k) submission, the "acceptance criteria" are implied to be analytical performance characteristics that are comparable to or better than those of the predicate device, thereby demonstrating substantial equivalence. The table shows that the new device generally performs comparably or better (e.g., a lower LoB, an explicit linearity claim, and precision data reported at more concentration levels), although its stated measuring range (16 to 480 ng/mL blood) is narrower than the predicate's (4 to 500 ng/mL blood).

Regarding the study proving the device meets the acceptance criteria:

The document describes the submission as a 510(k) for an in-vitro diagnostic kit. The "study" here refers to the analytical performance evaluation conducted by the manufacturer to demonstrate substantial equivalence to the predicate device. The information provided is a summary of the device's analytical characteristics.

2. Sample size used for the test set and the data provenance:

Sample Size: Not explicitly stated in terms of number of individual patient samples. The precision data lists several concentration levels (e.g., 16.7 ng/mL, 22.5 ng/mL, etc.), implying multiple measurements were taken at each level. The cross-reactivity and hook effect studies would have involved specific spiked samples.
Data Provenance: Not explicitly stated (e.g., country of origin). It's an in-house analytical validation, likely conducted at the manufacturer's facility. It is a retrospective analysis of laboratory-prepared samples or collected blood spots rather than a prospective clinical study involving external patient recruitment.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

This question is more applicable to AI/ML devices that rely on expert interpretation for ground truth. For this in-vitro diagnostic assay, the "ground truth" for reported values (e.g., IRT concentration) is established by the analytical method itself and calibration against known standards. There's no mention of external expert consensus for establishing ground truth for the analytical performance data.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:

Not applicable for this type of analytical performance study of an in-vitro diagnostic kit. Adjudication methods like 2+1 or 3+1 are typically used in clinical studies where multiple human readers interpret medical images or clinical data, and a disagreement resolution process is needed to establish a definitive ground truth.

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:

Not applicable. This is not an AI/ML device designed to assist human readers. It's an automated immunoassay system for quantitative measurement.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

No, this is not an AI/ML algorithm. It is an automated immunoassay kit where the "algorithm" is the biochemical reaction and the instrument's measurement and calculation of IRT concentration. The device operates in a standalone analytical capacity to measure IRT.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc):

The ground truth for the analytical performance characteristics (such as concentration, linearity, limit of blank, limit of detection, cross-reactivity, hook effect, and precision) would be established by:
- Reference materials/known standards: For calibration, linearity, and determining accurate concentrations.
- Spiked samples: For cross-reactivity and hook effect studies where known interferents or high concentrations are added.
- Repeated measurements: For precision studies.

8. The sample size for the training set:

Not explicitly stated, and the concept of a "training set" as understood in AI/ML is not directly applicable. For this type of device, development involves optimizing the assay components and conditions, which is an iterative process using various samples (e.g., patient samples, spiked samples, controls) but not typically referred to as a discrete "training set" in the AI/ML context.

9. How the ground truth for the training set was established:

As above, the concept of a "training set" with established ground truth in the AI/ML sense is not relevant here. Ground truth for internal development and optimization would be based on the known biochemical properties of the reagents, reference standards, and performance evaluation criteria.

K Number

K103484

Device Name

GSP NEONATAL THYROXINE (T4)

Manufacturer

WALLAC OY, A SUBSIDIARY OF PERKINELMER, INC.

Date Cleared

2011-04-22

(147 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K943416

Predicate For

N/A

Intended Use

The GSP Neonatal Thyroxine (T4) kit is intended for the quantitative determination of human thyroxine (T4) in blood specimens dried on filter paper as an aid in screening newborns for congenital (neonatal) hypothyroidism using the GSP instrument.

Device Description

The GSP Neonatal T4 assay is a solid phase time-resolved fluoroimmunoassay based on the competitive reaction between europium-labeled T4 and sample T4 for a limited amount of binding sites on T4 specific monoclonal antibodies (derived from mice). The use of 8-anilino-1-naphthalenesulfonic acid (ANS) and salicylate in the T4 Assay Buffer facilitates the release of T4 from the binding proteins. Thus the assay measures the total amount of T4 in the test specimen. A second antibody, directed against mouse IgG, is coated to the solid phase, and binds the IgG-thyroxine complex, giving convenient separation of the antibody-bound and free antigen. DELFIA Inducer dissociates europium ions from the labeled antibody into solution where they form highly fluorescent chelates with components of DELFIA Inducer. The fluorescence in each well is then measured. The fluorescence of each sample is inversely proportional to the concentration of T4 in the sample.

AI/ML Overview

The provided text describes a 510(k) premarket notification for an in vitro diagnostic device, the GSP Neonatal Thyroxine (T4) kit. This type of submission focuses on demonstrating substantial equivalence to a legally marketed predicate device rather than conducting a full clinical study with specific acceptance criteria and ground truth for disease diagnosis in the same way an AI/ML powered device might.

Therefore, the requested information regarding "acceptance criteria" for an AI device, "sample size for the test set," "number of experts," "adjudication method," "MRMC study," "standalone performance," and "ground truth for training/testing" in the context of an AI/ML study does not directly apply to this submission.

However, I can extract the closest analogous information available within this document, focusing on the performance characteristics presented to demonstrate equivalence.

Here's an attempt to answer the questions based on the provided document, interpreting "acceptance criteria" as performance metrics for this diagnostic kit.

1. Table of Acceptance Criteria and Reported Device Performance

For an in-vitro diagnostic kit like this, "acceptance criteria" are typically defined by demonstrating that the new device performs comparably to or within acceptable ranges relative to a predicate device and established analytical performance specifications. The document provides a comparison of various features and performance characteristics between the new GSP Neonatal T4 kit and its predicate device, AutoDELFIA Neonatal T4 Kit.

Performance Characteristic	Predicate Device (AutoDELFIA T4) Performance (Analogous to "Acceptance Criteria" for comparison)	GSP Neonatal T4 Kit Reported Performance (Analogous to "Device Performance")
Precision (CVs)	Control 1; 3.95 µg/dL serum - Intra-assay variation 14.9 % - Inter-assay variation 10.0 % - Total variation 18.0 % Control 2; 8.08 µg/dL serum - Intra-assay variation 10.6 % - Inter-assay variation 7.1 % - Total variation 12.7 % Control 3; 18.2 µg/dL serum - Intra-assay variation 8.2% - Inter-assay variation 4.3% - Total variation 9.3 %	Sample 1; 2.0 µg/dL - Within run 1.0% - Within lot 15.5% - Total variation 15.8% Sample 2; 4.8 µg/dL - Within run 7.3% - Within lot 10.7% - Total variation 11.4% Sample 3; 7.5 µg/dL - Within run 6.5% - Within lot 8.4% - Total variation 8.6% Sample 4; 16.6 µg/dL - Within run 4.5% - Within lot 7.8% - Total variation 8.5% Sample 5; 19.8 µg/dL - Within run 7.2% - Within lot 9.9% - Total variation 10.3% Sample 6; 21.4 µg/dL - Within run 7.1% - Within lot 9.8% - Total variation 10.1%
Measuring Range	1.5 µg/dL to the highest level calibrator	1.6 to 30 µg/dL serum
Limit of Blank (LoB)	< 1.5 µg/dL	0.457 µg/dL
Limit of Detection (LoD)	Not available	0.99 µg/dL
Limit of Quantitation (LoQ)	Not available	1.61 µg/dL
Interference	Bilirubin at 20 mg/dL has no significant effect.	Icteric (unconjugated bilirubin ≤ 342 µmol/L, equivalent to 20 mg/dL in serum, and conjugated bilirubin ≤ 237 µmol/L, equivalent to 20 mg/dL in serum), Lipemic (Intralipid¹ ≤ 15 mg/mL in serum), and Hemoglobin up to 15 g/L samples do not interfere. [¹Intralipid is a registered trademark of Fresenius Kabi AB.]
Cross-reactivity	LT3: 0.89%; 3,3',5-Triiodothyroacetic acid: 0.45%; 3,5-Diiodo-L-thyronine: < 0.1%; 3,5-Diiodotyrosine (DIT): < 0.1%; 5,5-Diphenylhydantoin: < 0.1%; 3-iodo-L-tyrosine (MIT): < 0.1%; Phenylbutazone: < 0.1%; 6-n-Propyl-2-thiouracil: < 0.1%; Methimazole: < 0.1%; L-Tyrosine: < 0.1%; Acetylsalicylic acid: < 0.1%	LT3: 1.67%; 3,3',5-Triiodothyroacetic acid: 0.14%; 3,5-Diiodo-L-thyronine: < 0.1%; 3,5-Diiodotyrosine (DIT) dihydrate: < 0.1%; 5,5-Diphenylhydantoin: < 0.1%; 3-iodo-L-tyrosine (MIT): < 0.1%; Phenylbutazone: < 0.1%; 6-n-Propyl-2-thiouracil: < 0.1%; Methimazole: < 0.1%; L-Tyrosine: < 0.1%; Acetylsalicylic acid: < 0.01%

2. Sample size used for the test set and the data provenance

The document does not specify a separate "test set" sample size in the context of an AI/ML algorithm validation. Instead, it describes analytical performance studies.

Precision study: The precision data (Within run, Within lot, Total variation) is presented for six different samples at various concentration levels (2.0, 4.8, 7.5, 16.6, 19.8, 21.4 µg/dL). The number of replicates or runs for each sample is not explicitly stated.
Interference study: Not explicitly stated, but the document mentions testing with specific concentrations of bilirubin, Intralipid, and hemoglobin.
Cross-reactivity study: Not explicitly stated, but specific substances and their cross-reactivity percentages are listed.
Data Provenance: The document does not specify the country of origin or whether the data was retrospective or prospective. Given the nature of a premarket submission for an IVD kit, these studies are typically conducted by the manufacturer as part of the validation process.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

This question is not applicable. For an immunoassay kit like this, the "ground truth" is established by the direct measurement of T4 concentration using the device itself, calibrated against known standards. There are no human "experts" establishing ground truth through image review or clinical assessment in the way an AI/ML device for diagnosis would require. The "qualification" of personnel pertains to "adequately trained laboratory personnel" running the assay.

4. Adjudication method for the test set

This question is not applicable. Adjudication methods are relevant for studies where multiple independent human readers or algorithms are assessing the same data, and their results need to be reconciled (e.g., in medical image interpretation). This is an analytical performance study of an immunoassay.

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance

This question is not applicable. This is an immunoassay kit, not an AI/ML-powered device intended for human-in-the-loop assistance in clinical decision-making or image interpretation. Therefore, no MRMC study was performed.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done

This question is not applicable. The device itself is a standalone diagnostic kit (reagents and instrument) that produces quantitative T4 measurements. It is not an AI/ML algorithm.

7. The type of ground truth used

For an immunoassay, the "ground truth" for evaluating the device's performance is typically established through:

Known Calibrator Concentrations: The device's measurements are compared against a standard curve generated from calibrators with precisely known T4 concentrations.
Reference Methods: If available, comparison to a gold standard reference method for T4 measurement would be part of validation (though not explicitly detailed as "ground truth" in this summary).
Control Materials: Use of quality control materials with target T4 concentrations.

The document discusses "calibrators" and "controls" which are used to establish and verify the accuracy of the measurements.
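As a rough illustration of how a standard curve turns an instrument signal into a concentration, the sketch below interpolates linearly between neighbouring calibrators. This is an assumption for illustration only: the summary does not describe the kit's actual curve-fitting method, and the function name and calibrator values are hypothetical.

```python
def concentration_from_signal(signal, calibrators):
    """Estimate concentration by piecewise-linear interpolation.

    calibrators: list of (known_concentration, measured_signal) pairs
    with distinct signals. Works for rising or falling curves because
    the points are ordered by signal before interpolating.
    """
    points = sorted(calibrators, key=lambda c: c[1])  # order by signal
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal outside the calibrated range")
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            fraction = (signal - s0) / (s1 - s0)
            return c0 + fraction * (c1 - c0)


# Hypothetical calibrators: (concentration, signal counts).
example_calibrators = [(0.0, 1000.0), (10.0, 2000.0), (50.0, 6000.0)]
print(concentration_from_signal(1500.0, example_calibrators))
```

Signals outside the calibrated range raise an error rather than being extrapolated, which mirrors the idea of a defined measuring range.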

8. The sample size for the training set

This question is not applicable. This is not an AI/ML device that requires a "training set" for model development. The performance data presented relates to the validation of the manufactured kit through analytical studies.

9. How the ground truth for the training set was established

This question is not applicable as there is no "training set" in the context of an AI/ML model for this type of medical device submission.

K Number

K102419

Device Name

GSP NEONATAL IRT KIT (3306-001U)

Manufacturer

WALLAC OY, SUBSIDIARY OF PERKINELMER

Date Cleared

2010-12-16

(113 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Intended Use

The GSP Neonatal IRT kit is intended for the quantitative determination of IRT in blood specimens dried on filter paper as an aid in screening newborns for cystic fibrosis using the GSP® instrument.

Device Description

The GSP Neonatal IRT assay is a solid phase, two-site fluoroimmunometric assay based on the direct sandwich technique in which two monoclonal antibodies (derived from mice) are directed against two separate antigenic determinants on the IRT molecule. Calibrators, controls or test specimens containing IRT are reacted simultaneously with immobilized monoclonal antibodies directed against a specific antigenic site on the IRT molecule and europium-labeled monoclonal antibodies (directed against a different antigenic site) in assay buffer. The assay buffer elutes IRT from dried blood on filter paper disks. The complete assay requires only one incubation step. DELFIA Inducer dissociates europium ions from the labeled antibody into solution where they form highly fluorescent chelates with components of the DELFIA Inducer. The fluorescence in each well is then measured. The fluorescence of each sample is proportional to the concentration of IRT in the sample.

AI/ML Overview

The provided text describes the GSP Neonatal IRT kit, a device for screening newborns for cystic fibrosis. It compares the new device to a predicate device (AutoDELFIA Neonatal IRT kit) and presents information on its performance characteristics.

Here’s a breakdown of the acceptance criteria and study information:

1. Table of Acceptance Criteria and Reported Device Performance

Characteristic	Acceptance Criteria (Predicate Device K0003668)	Reported Device Performance (GSP Neonatal IRT kit)
Intended Use / Indications for Use	Quantitative determination of human IRT in blood specimens dried on filter paper as an aid in screening newborns for cystic fibrosis using the 1235 AutoDELFIA automatic immunoassay system.	Quantitative determination of IRT in blood specimens dried on filter paper as an aid in screening newborns for cystic fibrosis using the GSP® instrument.
Instrument	1235 AutoDELFIA Instrument	GSP Instrument
Dissociation solution	Enhancement Solution	DELFIA Inducer
Antibody Cross-Reactions
α2-macroglobulin	< 4 ng/ml blood	0.000%
α1-antitrypsin	< 4 ng/ml blood	0.000%
Phospholipase A2	< 4 ng/ml blood	0.014%
Chymotrypsin	< 4 ng/ml blood	0.959%
Human IgG	< 4 ng/ml blood	0.000%
Pepsinogen	< 4 ng/ml blood (Uro)Pepsinogen	-0.056%
Complement Factor I	NA	0.000%
Measuring Range	4 (as defined by LoB) to 500 (as defined by upper calibrator) ng/mL blood	9 to 500 ng/mL blood
Tracer	Anti-IRT-Eu tracer stock solution, approximate concentration of ~50 µg/ml mouse monoclonal	Anti-IRT-Eu tracer stock solution, approximate concentration of ~40 µg/ml mouse monoclonal
Analytical Sensitivity / Limit of Blank	< 4 ng/mL blood	Limit of Blank: 0.76 ng/mL blood
Limit of Detection	Not explicitly stated for predicate; implied by LoB	2.2 ng/mL blood
Limit of Quantitation	Not explicitly stated for predicate; implied by LoB	2.2 ng/mL blood
Precision (Total Variation)	42.6 ng/mL blood CV% 9.3; 98.8 ng/mL blood CV% 10.0; 266 ng/mL blood CV% 9.6	10.9 ng/mL blood CV% 7.3; 22.2 ng/mL blood CV% 7.2; 28.5 ng/mL blood CV% 7.0; 40.0 ng/mL blood CV% 8.2; 50.2 ng/mL blood CV% 8.0; 61.6 ng/mL blood CV% 7.8; 93.5 ng/mL blood CV% 7.2; 302.3 ng/mL blood CV% 7.4; 449 ng/mL blood CV% 7.5

Study Proving Device Meets Acceptance Criteria:

The document describes a substantial equivalence claim, not a separate clinical study with independent acceptance criteria beyond demonstrating equivalence to the predicate device. The "study" here is the comparison and characterization of the new device's performance against the established predicate device (AutoDELFIA Neonatal IRT kit, K0003668). The acceptance criteria are largely implicitly defined by the performance characteristics of the predicate device, where the new device is shown to have similar or improved performance in analytical parameters.

2. Sample Size Used for the Test Set and Data Provenance

The provided text does not explicitly state a sample size for a test set in the context of a clinical study. The performance characteristics reported (e.g., precision, analytical sensitivity, measuring range) are typically derived from analytical verification and validation studies in a laboratory setting.

Sample Size: Not explicitly stated for a "test set" in a clinical context. The precision data lists various IRT concentrations tested, and each CV% (Coefficient of Variation) would have been derived from replicate measurements.
Data Provenance: The document does not specify the country of origin of the data or whether the studies generating these performance characteristics were retrospective or prospective. It describes the device's analytical performance.
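To make the CV% figures concrete, here is a minimal sketch of how a coefficient of variation is computed from replicate measurements. The replicate values are hypothetical, and a real "total variation" estimate would normally combine within-run and between-run components rather than pool replicates as done here.

```python
import statistics


def cv_percent(replicates):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation (n - 1)
    return 100.0 * sd / mean


# Hypothetical replicate results in ng/mL blood, for illustration only.
print(cv_percent([9.0, 10.0, 11.0]))
```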

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts

N/A. This is an in-vitro diagnostic device for measuring a biomarker (IRT). The "ground truth" for its performance is based on analytical measurements and comparison to a known standard, not on expert clinical evaluation or interpretation of images.

4. Adjudication Method for the Test Set

N/A. Adjudication methods like 2+1 or 3+1 are typically used in clinical studies involving interpretation of data (e.g., images) by multiple human readers, not in the analytical validation of an in-vitro diagnostic assay measuring a biomarker.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done

N/A. An MRMC study is not applicable to the analytical performance evaluation of this type of in-vitro diagnostic device. This device measures a quantitative biomarker and does not involve human interpretation of cases or "readers" in the traditional sense, nor does it aim to improve human reader performance.

6. If a Standalone (i.e. algorithm only without human-in-the-loop performance) was Done

The device itself is a "standalone" assay in the sense that it quantifies IRT levels directly from a blood spot sample using an instrument. There is no human interpretation of an algorithm output, but rather a direct measurement that aids in screening. The GSP® instrument performs the assay automatically.

7. The Type of Ground Truth Used

The ground truth for the device's performance (especially for parameters like measuring range, sensitivity, and precision) would be established by:

Reference materials/calibrators: Samples with known concentrations of IRT (calibrated using gravimetric methods, as mentioned).
Established analytical methods: Comparison to recognized analytical techniques for IRT measurement.
This is not "expert consensus, pathology, or outcomes data" in the clinical sense, but rather a robust analytical validation against known standards.

8. The Sample Size for the Training Set

N/A. This document pertains to an in-vitro diagnostic kit for measuring a biomarker, not a machine learning or AI algorithm that requires a "training set" in the computational sense. The device's performance is based on its chemical and immunological assay design.

9. How the Ground Truth for the Training Set was Established

N/A. As stated above, this is not an AI/ML device that uses a "training set."

K Number

K100682

Device Name

GSP NEONATAL 17A-OH-PROGESTERONE KIT MODEL: 3305-001U

Manufacturer

WALLAC OY

Date Cleared

2010-07-23

(135 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K081922

Predicate For

N/A

Intended Use

The GSP Neonatal 17α-OH-progesterone kit is intended for the quantitative determination of human 17α-OH-progesterone in blood specimens dried on filter paper as an aid in screening newborns for congenital adrenal hyperplasia (CAH) using the GSP™ instrument.

Device Description

The GSP Neonatal 17α-OH-progesterone (17-OHP) assay is a solid phase, time-resolved fluoroimmunoassay based on the competitive reaction between europium-labeled 17-OHP and sample 17-OHP for a limited amount of binding sites on 17-OHP specific polyclonal antibodies (derived from rabbit). Danazol facilitates the release of 17-OHP from the binding proteins. A second antibody, directed against rabbit IgG, is coated to the solid phase, giving convenient separation of the antibody-bound and free antigen. DELFIA Inducer dissociates europium ions from the labeled antigen into solution where they form highly fluorescent chelates with components of DELFIA Inducer. The fluorescence in each well is then measured. The fluorescence of each sample is inversely proportional to the concentration of 17-OHP in the sample.

AI/ML Overview

The provided text describes the GSP Neonatal 17α-OH-progesterone kit and its comparison to a predicate device, the AutoDELFIA Neonatal 17α-OH-progesterone kit. The primary study presented focuses on the "Screening Efficacy" of the new GSP kit by comparing its results to those of the predicate device using various percentile cut-offs.

Here's an analysis of the acceptance criteria and study information:

1. Table of Acceptance Criteria and Reported Device Performance

The document does not explicitly state formal "acceptance criteria" in a quantitative manner (e.g., "The device must achieve X% accuracy"). Instead, it demonstrates performance by comparing the new device (GSP kit) to the legally marketed predicate device (AutoDELFIA kit) and showing high agreement.

The screening efficacy tables (Tables 1-9) implicitly demonstrate the device's acceptable performance based on its agreement with the predicate device. For the purpose of this analysis, we will consider the "Overall percent agreement" with the predicate device as the key performance metric presented.

Characteristic	Acceptance Criteria (Implicit, based on predicate comparison)	Reported Device Performance (GSP Neonatal 17α-OH-progesterone kit)
Screening Efficacy (Overall Percent Agreement with Predicate Device, AutoDELFIA kit)
- 90% cutoff, ≥2500g	High agreement with predicate device	95.9% (CI 94.9%-96.8%)
- 90% cutoff, 1250g-2249g	High agreement with predicate device	98.6% (CI 97.0%-99.5%)
- 90% cutoff, <1250g	High agreement with predicate device	96.1% (CI 93.3%-98.0%)
- 95% cutoff, ≥2500g	High agreement with predicate device	98.2% (CI 97.4%-98.7%)
- 95% cutoff, 1250g-2249g	High agreement with predicate device	98.6% (CI 97.0%-99.5%)
- 95% cutoff, <1250g	High agreement with predicate device	96.8% (CI 94.1%-98.4%)
- 99% cutoff, ≥2500g	High agreement with predicate device	99.8% (CI 99.5%-100%)
- 99% cutoff, 1250g-2249g	High agreement with predicate device	98.2% (CI 96.4%-99.2%)
- 99% cutoff, <1250g	High agreement with predicate device	100% (CI 98.8%-100%)
Method Comparison (Regression with Predicate Device)	Y = mx + c, with high R-value	Y= 0.97x + 0.27; r = 0.96 (for 2567 samples)
Analytical Sensitivity (Limit of Detection)	Comparable to predicate device	1.4 ng/mL serum
Analytical Specificity (Cross-Reactions)	Comparable to predicate device	Similar percentages for various interfering substances
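The method-comparison row reports a fitted line and correlation coefficient (Y = 0.97x + 0.27; r = 0.96). A minimal sketch of how such figures can be obtained from paired results is shown here, assuming ordinary least squares; the summary does not say which regression method was used (method-comparison studies often use Deming or Passing-Bablok regression instead), and the paired values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope, intercept and Pearson r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical paired results (predicate kit on x, new kit on y), ng/mL.
predicate = [2.0, 5.0, 10.0, 25.0, 60.0]
new_kit = [2.3, 5.0, 10.1, 24.2, 58.9]
print(fit_line(predicate, new_kit))
```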

2. Sample size used for the test set and the data provenance

Test Set Sample Size: A total of 2589 specimens were evaluated across all weight categories and cut-offs.
- ≥2500g: 1842 specimens (Tables 1, 4, 7)
- 1250g-2249g: 439 specimens (Tables 2, 5, 8)
- <1250g: 308 specimens (Tables 3, 6, 9)
- Additionally, 23 known CAH cases were included within these 2589 samples.
- For the general method comparison (Y= 0.97x + 0.27; r = 0.96), 2567 samples were compared.
Data Provenance: The study used retrospective archived specimens and leftover samples from specimens submitted for routine screening. The document does not specify the country of origin of the data, but it states the study was conducted "in one newborn screening laboratory."
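The confidence intervals in the agreement table are not labelled with a method. As a rough check, an exact binomial (Clopper-Pearson) two-sided 95% interval for 308 agreements out of 308 specimens has a lower bound of 0.025^(1/308), about 98.8%, which is consistent with the reported 100% (CI 98.8%-100%) for the <1250g group at the 99% cutoff. The sketch below assumes that method; the function names are illustrative and not taken from the document.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for k in range(x + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    def bisect(f):
        # f must be increasing in p and change sign once on (0, 1).
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


low, high = clopper_pearson(308, 308)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Because the agreement counts behind the other rows are not given in this summary, only the 100% row can be checked this way.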

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

The document does not mention the use of experts to establish ground truth for this comparison study. Instead, the "ground truth" for the screening efficacy study is implicitly defined by the results of the predicate device (AutoDELFIA Neonatal 17α-OH-progesterone kit). The 23 "known CAH cases" would have had their diagnosis established through other means, likely clinical follow-up and definitive diagnostic tests, which are not detailed in this report.

4. Adjudication method for the test set

There is no mention of an adjudication method in the context of comparing the GSP kit to the AutoDELFIA kit. The comparison directly uses the test results from both devices.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, what was the effect size of how much human readers improve with AI vs. without AI assistance

This is not applicable. The device described is a quantitative in vitro diagnostic (IVD) kit for measuring 17α-OH-progesterone, not an AI-assisted diagnostic tool that involves human readers interpreting images or results. Therefore, no MRMC study or AI assistance effect size is discussed.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done

Yes, in the sense that the performance presented for the GSP Neonatal 17α-OH-progesterone kit is standalone device performance. The kit quantitatively determines 17α-OH-progesterone levels in blood specimens using the GSP™ instrument, and the workflow described for the direct comparison involves no human interpretation. No AI/ML algorithm is involved.

7. The type of ground truth used

The primary "ground truth" against which the new device's performance is measured is the predicate device's performance (AutoDELFIA Neonatal 17α-OH-progesterone kit). This is a common approach for demonstrating substantial equivalence for new IVD devices.

Additionally, the study included 23 known CAH cases, indicating that these cases had a definitive diagnosis of Congenital Adrenal Hyperplasia, likely established through a combination of clinical outcomes, follow-up testing, and potentially genetic analysis, although the specific details are not provided. These cases served as a validation point within the overall sample set.

8. The sample size for the training set

The document describes a device comparison and does not explicitly reference a "training set" in the context of machine learning or algorithm development. This is an IVD kit, where performance is typically established through analytical and clinical validation studies rather than machine learning training.

9. How the ground truth for the training set was established

As there is no mention of a separate "training set" for an algorithm in this context, information on how its ground truth was established is not provided. The calibrators and controls used for the kit itself would be part of the manufacturing quality control and calibration process, established using gravimetric methods and human blood matrix (as mentioned in Table 1).

K Number

K081922

Device Name

AUTODELFIA NEONATAL 17A-OH-PROGESTERONE KIT

Manufacturer

WALLAC OY

Date Cleared

2009-04-16

(283 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K042425

Predicate For

K100682

Intended Use

The AutoDELFIA® Neonatal 17α-OH-progesterone kit is intended for the quantitative determination of human 17α-OH-progesterone in blood specimens dried on filter paper as an aid in screening newborns for congenital adrenal hyperplasia (CAH) using the 1235 AutoDELFIA® automatic immunoassay system.

Device Description

The AutoDELFIA Neonatal 17α-OH-progesterone (17-OHP) assay is a solid phase, time-resolved fluoroimmunoassay based on the competitive reaction between europium-labeled 17-OHP and sample 17-OHP for a limited amount of binding sites on 17-OHP specific polyclonal antibodies (derived from rabbit). Danazol facilitates the release of 17-OHP from the binding proteins. A second antibody, directed against rabbit IgG, is coated to the solid phase, giving convenient separation of the antibody-bound and free antigen. Enhancement Solution dissociates europium ions from the labeled antigen into solution where they form highly fluorescent chelates with components of the Enhancement Solution. The fluorescence in each well is then measured. The fluorescence of each sample is inversely proportional to the concentration of 17-OHP in the sample.

AI/ML Overview

Here's a summary of the acceptance criteria and the study that proves the device meets them, based on the provided text:

Device: AutoDELFIA® Neonatal 17α-OH-progesterone kit (B024)
Intended Use: Quantitative determination of human 17α-OH-progesterone in blood specimens dried on filter paper as an aid in screening newborns for congenital adrenal hyperplasia (CAH) using the 1235 AutoDELFIA® automatic immunoassay system.

1. Table of Acceptance Criteria and Reported Device Performance

The document describes the proposed device (B024) as substantially equivalent to a predicate device (K042425, B015) and highlights differences in performance characteristics. The acceptance criteria are implicitly defined by demonstrating similar or improved performance compared to the predicate device, especially regarding cross-reactivity and screening efficacy.

Characteristic	Acceptance Criteria (Implied by Predicate)	Reported Proposed Device (B024) Performance
Antibody Cross-Reactions	Lower cross-reactivity with physiologically important steroids in neonates than predicate device.	17α-OH pregnenolone sulfate: 0.78 % (Predicate: 2.0 %) 11-Deoxycortisol: 0.62 % (Predicate: 1.82 %) 17α-OH pregnenolone: 0.83 % (Predicate: 1.20 %) Progesterone: 0.37 % (Predicate: 0.47 %)
Analytical Sensitivity / Limit of Blank (LoB)	1.3 ng/mL serum (Predicate)	0.37 ng/mL serum (Improved)
Analytical Sensitivity / Limit of Detection (LoD)	Not explicitly stated for predicate in table, but overall analytical sensitivity is desired to be good.	0.84 ng/mL serum (Improved over predicate's LoB)
Analytical Sensitivity / Limit of Quantitation (LoQ)	Not explicitly stated for predicate in table.	1.4 ng/mL serum
Precision (Total Variation, full calibration curve)	CV% values for various concentrations (e.g., 13.2% for 25.9 ng/mL, 10.8% for 53.0 ng/mL, 10.9% for 114 ng/mL)	Range of CV% values (e.g., 13.0% for 2.12 ng/mL, 9.8% for 4.69 ng/mL, 14.8% for 7.52 ng/mL, 8.3% for 27.0 ng/mL, 9.2% for 54.4 ng/mL, 10.8% for 109 ng/mL, 9.1% for 182 ng/mL)
Precision (Total Variation, one calibration curve per 4 plates)	CV% values for various concentrations (e.g., 14.0% for 25.8 ng/mL, 12.4% for 52.9 ng/mL, 11.8% for 115 ng/mL)	Range of CV% values (e.g., 14.0% for 2.25 ng/mL, 12.0% for 4.89 ng/mL, 15.8% for 7.79 ng/mL, 9.7% for 27.7 ng/mL, 10.5% for 55.7 ng/mL, 12.7% for 113 ng/mL, 11.3% for 188 ng/mL)
Screening Efficacy (CAH case detection)	Must detect known CAH cases similarly or better than predicate device.	Detected all known CAH cases in studies at appropriate cut-off with one exception for very high percentiles. For instance, in Study 1, ≥ 2250 g, 90th percentile, 13 out of 13 CAH cases were detected. In Study 2, ≥ 2250 g, 90th percentile, 13 out of 13 CAH cases were detected. (See * below for exception)
Median Values in Newborn Screening	Comparable patterns to predicate, potentially with lower absolute values due to reduced cross-reactivity.	For Studies 1 & 2, median values for various weight categories were consistently lower for B024 compared to B015, supporting the claim of reduced cross-reactivity and increased specificity.

* In Study 1, < 1250 g, 95th percentile, the new kit detected 1 out of 2 CAH cases (the missed case was attributed to maternal CAH treatment affecting the results). In Study 2, ≥ 2250 g, using percentiles higher than the 90th (for example, the 95th) resulted in one false negative out of 13 clinically confirmed CAH samples. The document notes that laboratories should consider this when setting screening cut-offs.
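The effect of moving a percentile cut-off can be sketched as follows. This is a simplified illustration with made-up numbers, assuming a nearest-rank percentile and that a result above the cut-off counts as screening positive; neither choice is stated in the summary, and the function names are hypothetical.

```python
import math


def percentile_cutoff(values, pct):
    """Nearest-rank percentile of the screened population."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def detected_cases(case_values, cutoff):
    """Count confirmed cases whose result is above the cut-off."""
    return sum(1 for v in case_values if v > cutoff)


# Hypothetical data for illustration only; not taken from the 510(k) summary.
population = list(range(1, 101))  # 100 routine results, arbitrary units
confirmed = [92, 120, 300]        # results of three confirmed cases

for pct in (90, 95):
    cutoff = percentile_cutoff(population, pct)
    print(pct, cutoff, detected_cases(confirmed, cutoff))
```

With these made-up numbers, raising the cut-off from the 90th to the 95th percentile turns the borderline case at 92 into a false negative, which is the kind of trade-off the note above describes.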

The Study Proving Device Meets Acceptance Criteria

The device's performance was evaluated through screening efficacy studies performed in two newborn screening laboratories, comparing the proposed kit (B024) to the predicate device (B015).

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Sizes:
- Study 1:
  - < 1250 g: 364 subjects (Tables 2 & 3)
  - 1250-2249 g: 500 subjects (Tables 4 & 5)
  - ≥ 2250 g: 1328 subjects (Tables 6 & 7)
  - Total Study 1: 2192 subjects
- Study 2:
  - < 1250 g: 168 subjects (Tables 8 & 9)
  - 1250-2249 g: 372 subjects (Tables 10 & 11)
  - ≥ 2250 g: 1299 subjects (Tables 12 & 13)
  - Total Study 2: 1839 subjects
- Total Samples for Screening Efficacy: 2192 + 1839 = 4031 subjects (pooled studies, across different weight categories and percentiles).
- Known CAH Cases:
  - Study 1: 17 confirmed CAH cases.
  - Study 2: 13 confirmed CAH cases (all in ≥ 2250g category).
Data Provenance: Retrospective specimens and excess samples. The country of origin is not explicitly stated. The submitter is Wallac Oy, Finland, but the summary does not say where the two newborn screening laboratories are located.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

The document refers to "confirmed CAH case samples" and "clinically confirmed CAH samples." It does not specify the number of experts or their qualifications directly involved in establishing the ground truth for the individual test samples. The implication is that the CAH diagnoses were established clinically prior to the samples being used for the study.

4. Adjudication Method for the Test Set

The document does not describe an explicit adjudication method (e.g., 2+1, 3+1) for the ground truth of the test set samples. It relies on previously "confirmed" or "clinically confirmed" CAH diagnoses.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done

No, an MRMC comparative effectiveness study was not done. This device is an in-vitro diagnostic (IVD) kit that provides quantitative measurements, not an imaging AI algorithm requiring human reader interpretation with or without AI assistance. The comparison is between the performance of two IVD kits.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done

Yes, this was a standalone performance study. The device (immunoassay kit) directly produces a quantitative result (17-OHP concentration), which is then compared against established cut-offs. There is no "human-in-the-loop" component in the interpretation of the kit's raw result in the context of this performance assessment, although human laboratory personnel perform the assay and clinical interpretation of results based on screening guidelines.

7. The Type of Ground Truth Used

The ground truth used was clinical diagnosis/outcomes data for Congenital Adrenal Hyperplasia (CAH). The document refers to "confirmed CAH case samples" and "clinically confirmed CAH samples."

8. The Sample Size for the Training Set

The document does not provide details of a specific "training set" in the context of machine learning. For an IVD kit like this, "training" typically refers to the assay development and validation process (e.g., antibody selection, calibration curve establishment, optimization of reagents), which is not quantified by a sample size in the same way as an AI algorithm's training data. The description indicates a "new antiserum in B024," which suggests a re-optimization or re-development that would have involved internal validation and method development.

9. How the Ground Truth for the Training Set was Established

As noted above, a distinct "training set" with ground truth in the AI/ML sense is not applicable here. The ground truth for developing and validating the assay's components (like the new antiserum) would have involved:

Characterization of pure steroid compounds for cross-reactivity assessment.
Use of spiked samples with known concentrations of 17-OHP to establish linearity, analytical sensitivity, and precision.
Reference methods or established clinical samples during assay development and optimization to ensure it accurately measures 17-OHP.
