This in vitro diagnostic method is intended to quantitatively measure ferritin (an iron-storage protein) in human serum on the Bayer Immuno 1 System. Measurements of ferritin aid in the diagnosis of diseases affecting iron metabolism, such as hemochromatosis (iron overload) and iron deficiency anemia.
This diagnostic method is not intended for use on any other system.
The Bayer Immuno 1 Ferritin (FER) Method uses a homogenous sandwich immunoassay format. Samples are reacted with Ferritin Antibody Conjugate R1 (antibody linked to FITC) and Ferritin Antibody Conjugate R2 (antibody linked to calf intestine alkaline phosphatase) and incubated on the Immuno 1 system at 37°C. The anti-ferritin antibody conjugates combine with sample ferritin to form a sandwich complex. The monoclonal Immuno-Magnetic Particle (mIMP) Reagent is then added and binds the antibody complexes. After further incubation, the mIMP/antibody complex is washed and a para-nitrophenyl phosphate substrate, which reacts with the enzyme conjugate, is added. The resulting para-nitrophenoxide is monitored at 405 nm and 450 nm using a filter switch protocol. The dose/response curve is proportional to the amount of ferritin in the sample.
This document describes the validation of a change to the Bayer Immuno 1™ System Ferritin Assay, specifically an increase in the concentration of the Level 6 calibrator from 1000 ng/mL to 2500 ng/mL. The primary goal of the study was to demonstrate that this change does not negatively impact the assay's performance and that the new assay performance is equivalent to previously demonstrated performance.
1. Acceptance Criteria and Reported Device Performance
The document implies that the acceptance criteria for the new assay performance are met if key performance characteristics like precision, analytical sensitivity, and correlation with the existing method remain comparable or within acceptable ranges. The specific acceptance criteria themselves are not explicitly listed in a single table, but rather embedded within the description of each study.
| Performance Metric | Acceptance Criteria (Implied/Existing) | Reported Device Performance (with L6 = 2500 ng/mL) |
|---|---|---|
| Imprecision | No change from existing method sheet imprecision estimates. NCCLS document EP5-T2 guidelines followed. | Conclusion: "The proposed new Level 6 calibrator does not impact imprecision estimates: no change will be made to the method sheet." The reported numbers appear comparable, and in some cases the %CVs for the proposed L6 are lower than those of the current method, supporting the claim of no impact. Detailed estimates follow this table. |
| Analytical Sensitivity | Unchanged from current expected value (0.3 ng/mL). | "This value was determined to be 0.3 ng/mL, unchanged from our current expected value." |
| Correlation (Samples > 1000 ng/mL) | Good correlation with diluted current method samples considered. | y = 0.975x + 291.1, r = 0.931, Sy.x = 162.2, n = 30. Range: 812 - 2412. (Current samples were diluted 1:5). |
| Correlation (All Samples Combined) | Good overall correlation. | y = 1.162x - 9.21, r = 0.990, Sy.x = 101.7, n = 102. Range: 30.2 - 2412. (Includes neat and diluted samples). |
| Dilution of Over-Range Samples | No statistical difference between recoveries, regression analysis r ≥ 0.975, slope ≥ 0.975. | T-test: t Stat = -1.725 (magnitude less than t Critical one-tail 2.015 and two-tail 2.571), meaning no statistical difference. Regression analysis: y = 0.991x + 185.7, r = 0.998, Sy.x = 201.9, n = 6. All samples were diluted 1:10. |
| Specificity/Cross-Reactivity | No impact / "No change will be made to the method sheet." | "No reformulations were made to Reagents and the proposed new Level 6 calibrator will not impact specificity." "No reformulations were made to Reagents and the proposed new Level 6 calibrator will not impact cross-reactivity." (No specific new data presented due to no change in reagents). |
| Reference Intervals | No impact / "no new claim is being made." | "The change to a higher Level 6 calibrator does not impact sample recoveries, so no new claim is being made." (No specific new data presented as no impact was expected/observed). |
| Calibrator Stability | Stability must be determined and verified. | "Data for real-time stability will continue to be generated for at least two years. Additional performance testing will occur at specific time points to verify on-system stability claims for the life-time of the product." (Study is ongoing). |

Imprecision estimates, proposed L6 calibrator at 2500 ng/mL:

| Sample | Mean | Within SD | Within %CV | Total SD | Total %CV |
|---|---|---|---|---|---|
| Pooled Human Serum (Pool-1) | 25.2 | 0.5 | 1.9 | 0.6 | 2.6 |
| Bayer TESTpoint Controls (Control 1) | 28.8 | 0.5 | 1.8 | 0.6 | 2.2 |
| Bayer TESTpoint Controls (Control 2) | 104.6 | 0.5 | 0.5 | 1.4 | 1.3 |
| Pooled Human Serum (Pool-2) | 206.7 | 8.6 | 4.1 | 8.6 | 4.1 |
| Bayer TESTpoint Controls (Control 3) | 237.4 | 6.0 | 2.5 | 6.0 | 2.5 |

Imprecision estimates, current method sheet (for comparison):

| Sample | Mean | Within SD | Within %CV | Total SD | Total %CV |
|---|---|---|---|---|---|
| Control 1 | 21.1 | 0.2 | 1.0 | 1.5 | 7.1 |
| Control 2 | 148.1 | 3.2 | 2.1 | 7.4 | 5.0 |
| Control 3 | 344.2 | 8.5 | 2.5 | 17.1 | 5.0 |
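The Within and Total SD/%CV figures reported for imprecision come from replicate measurements repeated across days. As a minimal sketch of how such estimates can be derived, the following assumes a balanced design (the same number of replicates on every day) and a simplified one-way variance-components model with days as the only grouping factor. The full NCCLS EP5 protocol also separates a between-run component, which is omitted here. The function name `imprecision_estimates` and the sample data are hypothetical and do not come from the study.

```python
from math import sqrt
from statistics import mean


def imprecision_estimates(days):
    """Estimate within-run and total imprecision from a balanced design.

    `days` is a list of per-day replicate lists, e.g. 5 days x 4 replicates.
    Simplified one-way model: total variance = within-run + between-day.
    """
    d = len(days)
    n = len(days[0]) if days else 0
    if d < 2 or n < 2 or any(len(day) != n for day in days):
        raise ValueError("need a balanced design: >= 2 days, >= 2 replicates per day")

    day_means = [mean(day) for day in days]
    grand_mean = mean(day_means)  # equals the overall mean when balanced

    # Pooled within-day mean square (within-run variance estimate).
    ms_within = sum(
        (x - m) ** 2 for day, m in zip(days, day_means) for x in day
    ) / (d * (n - 1))

    # Between-day mean square and variance component (floored at zero).
    ms_between = n * sum((m - grand_mean) ** 2 for m in day_means) / (d - 1)
    var_between = max(0.0, (ms_between - ms_within) / n)

    sd_within = sqrt(ms_within)
    sd_total = sqrt(ms_within + var_between)
    return {
        "mean": grand_mean,
        "within_sd": sd_within,
        "within_cv": 100.0 * sd_within / grand_mean,
        "total_sd": sd_total,
        "total_cv": 100.0 * sd_total / grand_mean,
    }


# Hypothetical replicate results: 5 days x 4 replicates per day.
example = [
    [25.1, 25.6, 24.8, 25.3],
    [24.9, 25.4, 25.7, 25.0],
    [25.8, 25.2, 25.5, 26.0],
    [24.6, 25.1, 24.9, 25.3],
    [25.4, 25.0, 25.6, 25.2],
]
result = imprecision_estimates(example)
print({name: round(value, 2) for name, value in result.items()})
```

In this model, a between-day component estimated as zero makes Total SD equal to Within SD, which is one way to read the identical Within and Total values reported for Pool-2 and Control 3 with the proposed calibrator.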
2. Sample Size and Data Provenance (Test Set)
- Imprecision Study (Test Set):
  - Sample Size:
    - Replicate analyses of human serum (Pools 1 & 2) and Bayer TESTpoint Ligand Controls (Controls 1 to 3).
    - For the 2500 ng/mL L6 calibrator, data was generated over 5 days using four replicates per day (total 20 replicates for each sample/control).
    - For the current L6 calibrator, estimates were collected over 20 days using two replicates per day (total 40 replicates for each sample/control) based on NCCLS document EP5-T2.
  - Data Provenance: Not explicitly stated, but implies laboratory-generated data using human serum pools and commercial controls. Likely retrospective from an internal database for the "current" method and prospective for the "proposed" method within the company's R&D facilities.
- Correlation Data (Test Set):
  - Sample Size: 102 samples.
    - 72 samples with concentrations less than 1000 ng/mL.
    - 30 samples with concentrations greater than 1000 ng/mL but less than 2500 ng/mL.
  - Data Provenance: "obtained from outside sources." This suggests external patient samples, potentially from a variety of sources, but specific countries are not mentioned. The data is retrospective in the sense that these were pre-existing samples used for comparison.
- Dilution of Over-Range Samples (Test Set):
  - Sample Size: 6 samples with concentrations greater than 2500 ng/mL.
  - Data Provenance: Not explicitly stated, but these were "patient samples." Likely laboratory-generated from clinical samples.
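The dilution comparison for these six over-range samples is summarized in the source by a t statistic and two critical values. The quoted critical values (2.015 one-tail, 2.571 two-tail) match a Student t distribution with 5 degrees of freedom at α = 0.05, which is what a paired test on 6 samples would use. That the study used a paired test is an assumption here, as are the function names and all sample values below. A minimal sketch under those assumptions:

```python
from math import sqrt
from statistics import mean, stdev

# Student t critical values for 5 degrees of freedom at alpha = 0.05,
# as quoted in the source.
T_CRIT_ONE_TAIL = 2.015
T_CRIT_TWO_TAIL = 2.571


def dilution_corrected(measured, dilution_factor=10):
    """Scale results measured on diluted samples back to neat concentration."""
    return [m * dilution_factor for m in measured]


def paired_t_statistic(a, b):
    """Paired t statistic: mean difference over its standard error."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with >= 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / sqrt(len(diffs)))


def no_significant_difference(t_stat, t_critical):
    """True when the magnitude of t stays below the critical value."""
    return abs(t_stat) < t_critical


# Hypothetical comparison values and 1:10 diluted readings (ng/mL).
comparison = [2800.0, 3500.0, 4200.0, 5100.0, 6300.0, 7600.0]
measured_diluted = [275.0, 356.0, 415.0, 520.0, 622.0, 771.0]

recovered = dilution_corrected(measured_diluted)
t_stat = paired_t_statistic(recovered, comparison)
print(round(t_stat, 3), no_significant_difference(t_stat, T_CRIT_TWO_TAIL))
```

Applied to the reported statistic, `no_significant_difference(-1.725, T_CRIT_TWO_TAIL)` is true because 1.725 is below 2.571, consistent with the source's conclusion of no statistical difference.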
3. Number of Experts and Qualifications (Ground Truth for Test Set)
This document describes the validation of an in vitro diagnostic (IVD) assay. The concept of "experts" to establish ground truth as typically understood in AI/imaging studies (e.g., radiologists interpreting images) does not directly apply here. For IVDs, the "ground truth" is typically established by:
- Reference Methods: Comparisons to established, validated methods or reference materials.
- Certified Reference Materials/Standards: Calibrators are linked to international standards (e.g., WHO 1st International Standard (IS 80/602) for Ferritin).
- Laboratory Procedures: Rigorous, standardized laboratory procedures and quality control.
In this context, the "experts" are the biomedical scientists and engineers at Bayer who designed, developed, and validated the assay, and who interpret the analytical performance data against regulatory and scientific standards (e.g., NCCLS guidelines). No specific number or qualifications of "experts" for ground truth establishment on the test set samples are given, as the "ground truth" is derived from the established analytical validity of the reference methods (the current assay) and the quantitative results themselves.
4. Adjudication Method (Test Set)
Not applicable in the context of an IVD assay validation as described. Adjudication methods (like 2+1 reader consensus) are typically used for subjective interpretations by human readers, often in medical imaging or pathology. For quantitative assays like this, performance is measured against numerical values and statistical comparisons, not expert consensus on individual sample interpretations.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
Not applicable. This is a validation study for an in vitro diagnostic assay, not a study involving human readers interpreting cases with and without AI assistance.
6. Standalone Performance (Algorithm Only without Human-in-the-Loop)
Yes, the studies described are standalone performance evaluations of the assay system itself. The system measures ferritin levels, and its performance (imprecision, sensitivity, correlation, dilution linearity) is assessed based on its quantitative output. There is no "human-in-the-loop" component in the direct measurement of ferritin by the Immuno 1 system; the human element is in operating the instrument, performing quality control, and interpreting the numerical results.
7. Type of Ground Truth Used
- Reference Standard / Certified Material: The calibrators are standardized to the "World Health Organization (WHO) 1st International Standard (IS 80/602)." This is the ultimate ground truth for concentration assignment.
- Comparative Method: For correlation studies, the "current Bayer Immuno 1 Ferritin Assay (L6 = 1000 ng/mL)" served as the comparator or de facto "ground truth" against which the proposed assay (with L6 = 2500 ng/mL) was evaluated.
- Known Concentrations: For imprecision, analytical sensitivity, and dilution studies, laboratory-prepared human serum pools and commercial controls with their expected or established concentrations are used as the reference points.
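Comparisons against the current assay are reported in the source as a regression line with a correlation coefficient r and a standard error of the estimate Sy.x. The sketch below shows one way to compute those four numbers, assuming ordinary least squares with the comparator method on the x axis. The source does not say which regression model was used, and the function name and paired values are hypothetical.

```python
from math import sqrt
from statistics import mean


def method_comparison(x, y):
    """Ordinary least-squares fit of y (proposed) on x (comparator).

    Returns slope, intercept, Pearson r, and Sy.x (standard error of the
    estimate, using n - 2 degrees of freedom).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with >= 3 pairs")

    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)

    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sy_x = sqrt(sse / (n - 2))
    return {"slope": slope, "intercept": intercept, "r": r, "sy_x": sy_x}


# Hypothetical paired results (ng/mL): comparator vs. proposed calibration.
comparator = [35.0, 120.0, 410.0, 780.0, 1150.0, 1900.0, 2300.0]
proposed = [33.0, 128.0, 455.0, 890.0, 1320.0, 2210.0, 2650.0]

fit = method_comparison(comparator, proposed)
print({name: round(value, 3) for name, value in fit.items()})
```

A slope near 1 and an intercept near 0 indicate agreement between the two calibrations, while Sy.x expresses the typical scatter of individual samples around the line in concentration units.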
8. Sample Size for the Training Set
The document does not explicitly delineate a "training set" in the context of machine learning. For an IVD assay, the equivalent of a "training set" would be the data used during the initial development and optimization of the assay reagents, instrument parameters, and calibration curve generation procedures.
The section "4.0 CALIBRATORS OVERVIEW" mentions: "Each production lot of calibrators is anchored against a Master Lot and value assigned by comparative analysis of twenty "new" calibrator replicates nested within twenty "master" calibrator replicates." This process of generating master lots and assigning values could be considered analogous to "training" the system's calibration.
However, a specific "training set" sample size for developing the entire assay, beyond this calibrator assignment process, is not given.
9. How the Ground Truth for the Training Set was Established
As noted in point 8, the assay development process does not typically involve a separate "training set" with distinct "ground truth" establishment as in machine learning. However, for the crucial aspect of calibration:
- International Standard Linkage: "Bayer SETpoint Ferritin Calibrators were first standardized to the World Health Organization (WHO) 1st International Standard (IS 80/602) and the linkage has been perpetuated by nested testing matches of subsequent master lots." This WHO standard forms the primary ground truth for ferritin concentration.
- Master Lot Anchoring: "Each production lot of calibrators is anchored against a Master Lot and value assigned by comparative analysis of twenty "new" calibrator replicates nested within twenty "master" calibrator replicates." This indicates a hierarchical system where new calibrator lots are assigned values relative to established master lots, which themselves trace back to the international standard.
- Control and Serum Pool Verification: "A calibration curve is then generated with the new calibrators and acceptable control and serum pool recoveries are verified before release." This ensures that the established calibration curve provides accurate results for known controls and serum pools.
§ 866.5340 Ferritin immunological test system.
(a) Identification. A ferritin immunological test system is a device that consists of the reagents used to measure by immunochemical techniques the ferritin (an iron-storing protein) in serum and other body fluids. Measurements of ferritin aid in the diagnosis of diseases affecting iron metabolism, such as hemochromatosis (iron overload) and iron deficiency anemia.
(b) Classification. Class II (performance standards).