Search Results

The ADVIA Centaur Anti-Thyroid Peroxidase II (aTPOII) assay is for in vitro diagnostic use in the quantitative measurement of autoantibodies against thyroid peroxidase in human serum and plasma (EDTA and lithium heparin) using the ADVIA Centaur XP system.

Anti-thyroid peroxidase (aTPO) measurements are used, in conjunction with a clinical assessment, as an aid in the diagnosis of autoimmune thyroiditis and/or Graves' disease.

Device Description

The ADVIA Centaur Anti-Thyroid Peroxidase II (aTPOII) consists of:

aTPOII ReadyPack® primary reagent pack (Lite Reagent, Solid Phase)
aTPOII CAL
Devices sold separately and included in the ADVIA Centaur® Anti-Thyroid Peroxidase II (aTPOII) are:
ADVIA Centaur aTPOII MCM (MCM 1, MCM 2–4)
ADVIA Centaur aTPOII QC
ADVIA Centaur aTPOII DIL ReadyPack ancillary reagent pack

AI/ML Overview

N/A

K Number

K252608

Device Name

AI-Rad Companion Prostate MR

Manufacturer

Siemens Healthcare GmbH

Date Cleared

2025-09-09

(22 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K241770, K193283

Predicate For

N/A

Intended Use

AI-Rad Companion Prostate MR is indicated for the processing and annotation of DICOM MR prostate images acquired in adult male populations that demonstrate indications of oncological abnormalities in the prostate.

The AI-Rad Companion Prostate MR software aims to support the radiologist and provides the following functionality:
• Viewing, analyzing, evaluating prostate MR images including DCE, ADC, T2 and DWI
• Hosting application for and provides interface to external Prostate MR AI plug-in device
• Accept/reject/edit the results generated by the plug-in software Prostate MR AI

Device Description

AI-Rad Companion Prostate MR is a diagnostic aid in the interpretation of prostate MRI examinations acquired according to the PI-RADS standard.

AI-Rad Companion Prostate MR provides quantitative and qualitative information based on bi- or multiparametric prostate MR DICOM images. It displays information on the segmented gland, prostate volume, and segmented lesions along with their classifications. This information can be used to support the reading and reporting of prostate MR studies, as well as the planning of prostate biopsies in the case of ultrasound-guided MR-US fusion biopsies of the prostate gland.

The primary features of AI-Rad Companion Prostate MR include:
• Display of Automatic Segmentation and volume of the prostate gland as well as display of automatic segmentation, quantification and classification of lesions
• Manual adjustment of gland and lesion segmentation and editing of lesion scores, diameter, and localization of the automatically generated lesions
• Marking of new lesions
• Export of results as RTSS format for import into supporting ultrasound or fusion biopsy planning systems

AI/ML Overview

Based on the provided FDA 510(k) clearance letter for AI-Rad Companion Prostate MR (K252608), there is no specific study described that proves the device meets predefined acceptance criteria for performance metrics (e.g., sensitivity, specificity, accuracy). The document primarily focuses on demonstrating substantial equivalence to a predicate device (AI-Rad Companion Prostate MR K193283) and adherence to non-clinical verification and validation standards for software development and risk management.

The document explicitly states: "No clinical tests were conducted to test the performance and functionality of the modifications introduced within AI-Rad Companion Prostate MR."

Therefore, a table of acceptance criteria and reported device performance, information about sample sizes, expert ground truth establishment, adjudication methods, multi-reader multi-case studies, standalone performance, and training set details are not available in this document as no clinical performance study for the modified device was performed.

The document emphasizes that modifications and improvements were verified and validated through non-clinical tests (software verification and validation, unit, system, and integration tests), which demonstrated conformity to industry standards and the predicate device's existing safety and effectiveness.

Here’s a breakdown of what is stated in the document regarding testing:

1. A table of acceptance criteria and the reported device performance:

Not provided. The document does not include a table of specific clinical acceptance criteria (e.g., target sensitivity or specificity values) or reported device performance metrics against such criteria. The focus is on demonstrating that software enhancements do not adversely affect safety and effectiveness, assuming the predicate device's performance was already acceptable.

2. Sample size used for the test set and the data provenance (e.g. country of origin of the data, retrospective or prospective):

Not provided. Since no clinical performance study was conducted for this specific submission, details on test set sample sizes and data provenance are not presented.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g. radiologist with 10 years of experience):

Not applicable. As no clinical study is reported, this information is not available.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:

Not applicable.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, what was the effect size of how much human readers improve with AI vs. without AI assistance:

Not done. The document explicitly states "No clinical tests were conducted."

6. If a standalone (i.e. algorithm-only, without human-in-the-loop) performance study was done:

Not explicitly stated for the modified device. While the device description mentions automatic segmentation and classification, the overall context emphasizes a "diagnostic aid" that "aims to support the radiologist" and has functionality to "Accept/reject/edit the results generated by the plug-in software Prostate MR AI." This suggests an interactive workflow where standalone performance is not the primary claim for this particular submission. The separate product, "Prostate MR AI (K241770)," which performs the core AI tasks, is likely where standalone performance would be detailed, but not in this document.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

Not applicable for this submission, as no new clinical performance study is detailed for the modified device. The original predicate device's performance would have relied on a ground truth, but that information is not part of this document.

8. The sample size for the training set:

Not provided. Since this submission is for an updated version of an already cleared device and no new clinical performance study is detailed, the training set size for the underlying AI model (likely part of K241770 or the predicate K193283) is not included here.

9. How the ground truth for the training set was established:

Not provided. This information would typically be detailed in the original submission for the AI algorithm (likely K241770 or K193283), not in this update focused on software enhancements and substantial equivalence.

In summary, the provided document focuses on demonstrating that the enhancements and modifications to the AI-Rad Companion Prostate MR do not adversely affect the safety and effectiveness of the existing predicate device. It relies on non-clinical software verification and validation, and substantial equivalence arguments, rather than presenting a de novo clinical performance study with new acceptance criteria and results.

K Number

K242981

Device Name

Atellica IM Thyroglobulin (Tg)

Manufacturer

Siemens Healthcare Diagnostics, Inc.

Date Cleared

2025-06-20

(267 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K241423

Predicate For

N/A

Intended Use

The Atellica IM Thyroglobulin (Tg) assay is for in vitro diagnostic use in the quantitative measurement of thyroglobulin in human serum and plasma (EDTA and lithium heparin) using the Atellica IM Analyzer.

Thyroglobulin measurements are used as an aid in monitoring differentiated thyroid cancer patients who have undergone thyroidectomy with or without radioiodine ablation.

Device Description

The Atellica IM Thyroglobulin (Tg) assay includes:

Tg ReadyPack primary reagent pack:
- Lite Reagent: mouse monoclonal anti-human Tg antibody labeled with acridinium ester (~1.13 μg/mL); bovine serum albumin (BSA); mouse IgG; buffer; stabilizers; preservatives (7.5 mL/reagent pack).
- Solid Phase: streptavidin-coated paramagnetic microparticles preformed with biotinylated mouse monoclonal antihuman Tg antibody (~267 μg/mL); BSA; mouse IgG; buffer; stabilizers; preservatives (15.0 mL/reagent pack).
Ancillary Well Reagent: BSA; bovine gamma globulin; buffer; preservatives (6.0 mL/reagent pack).
Tg CAL: After reconstitution, human thyroglobulin; BSA; buffer; stabilizers; preservatives (2.0 mL/vial).

The following devices are sold separately:

Atellica IM Tg MCM:
- MCM 1: After reconstitution, bovine serum albumin (BSA); buffer; stabilizers; preservatives (1.0 mL/vial).
- MCM 2–5: After reconstitution, various levels of human thyroglobulin; BSA; buffer; stabilizers; preservatives (1.0 mL/vial).

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA 510(k) summary for the Atellica IM Thyroglobulin (Tg) assay:

Device: Atellica IM Thyroglobulin (Tg) Assay
Purpose: Quantitative measurement of thyroglobulin in human serum and plasma as an aid in monitoring differentiated thyroid cancer patients who have undergone thyroidectomy with or without radioiodine ablation.

1. Table of Acceptance Criteria and Reported Device Performance

The provided document describes various performance characteristics, which serve as acceptance criteria for the device. The reported performance is directly from the summary.

Acceptance Criteria Category	Specific Acceptance Criteria (implicit from study design)	Reported Device Performance
Detection Capability	LoB, LoD, LoQ determined per CLSI EP17-A2	LoB: 0.039 ng/mL (0.059 pmol/L); LoD: 0.044 ng/mL (0.067 pmol/L); LoQ: 0.050 ng/mL (0.076 pmol/L)
Precision	Precision determined per CLSI EP05-A3 (within-laboratory and repeatability)	Repeatability (CV%): 1.2% - 6.4% across various concentrations; Within-Laboratory Precision (CV%): 2.3% - 9.0% across various concentrations
Reproducibility	Reproducibility determined per CLSI EP05-A3 (across sites, runs, days)	Reproducibility (CV%): 1.9% - 5.8% across various concentrations
Linearity	Linearity determined per CLSI EP06-ed2 within stated assay range	Linear for 0.050–150 ng/mL (0.076–227 pmol/L)
Specimen Equivalence	Performance equivalence across serum, EDTA plasma, lithium heparin plasma	Performance confirmed equivalent across serum, EDTA plasma, lithium heparin plasma, and associated gel barrier tubes.
Interferences (HIL)	Bias < 10% for Hemoglobin, Bilirubin, Lipemia at specified concentrations	No bias > 10% observed for tested HIL substances.
Interferences (Other Substances)	Bias < 10% for various common substances/medications/biomarkers at specified concentrations	No bias > 10% observed for tested other substances.
Cross-Reactivity	Cross-reactivity < 1.0% for specified substances (T3, T4, TSH, Galectin-3, T2)	Cross-reactivity < 1.0% for tested substances.
Reagent Stability	Defined on-board and reconstituted calibrator stability	28 days on-board; Calibrators stable 45 days (2-8°C) / 60 days (≤ -20°C, thaw once).
Sample Stability	Defined stability for various sample types and storage conditions	Stable 3-4 days (2-8°C), 4 days (RT), 12-24 months (frozen); ≤ 4 freeze-thaw cycles.
High Dose Hook Effect	No hook effect within a specified concentration range	No hook effect up to 80,000 ng/mL (121,200 pmol/L).
Expected Values	Reference intervals established per CLSI EP28-A3c	Healthy adults: 2.44–74.9 ng/mL; Post-thyroidectomy adults: < 1.27 ng/mL
Clinical Performance	Sensitivity and specificity, with confidence intervals, calculated by comparing assay results to structural disease (SD) status at a defined cut-off (0.2 ng/mL).	Sensitivity: 98.2% (95% CI: 94.6%, 100.0%); Specificity: 53.4% (95% CI: 47.8%, 58.0%); PPV: 10.0% (95% CI: 8.7%, 11.2%); NPV: 99.8% (95% CI: 99.5%, 100.0%)
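
The predictive values in the Clinical Performance row depend on disease prevalence as well as on sensitivity and specificity. As a minimal sketch (not taken from the 510(k) summary), the relationship can be expressed with Bayes' rule. The function name `predictive_values` and the 5% prevalence are assumptions chosen for illustration; the text quoted here does not state the prevalence in the study population.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed from sensitivity, specificity and prevalence.

    All three arguments are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are the reported values from the table.
# The prevalence of 0.05 is an ASSUMPTION, not a figure from the document.
ppv, npv = predictive_values(sensitivity=0.982, specificity=0.534, prevalence=0.05)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under the assumed 5% prevalence, the computed values come out close to the reported PPV and NPV. This illustrates how a low PPV can coexist with a high sensitivity when structural disease is uncommon in the tested population.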

2. Sample Size Used for the Test Set and Data Provenance

Clinical Performance Test Set Sample Size: 291 serum samples collected from 189 subjects.
Data Provenance:
- The document states "A prospective, multi-center study was conducted." This indicates prospective data collection across multiple sites.
- The country of origin is not explicitly stated in the provided text.
- All samples were from subjects diagnosed with differentiated thyroid cancer, 6 or more weeks following thyroidectomy or radioiodine ablation.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

The document does not specify the number of experts or their qualifications for establishing the ground truth (structural disease). It simply states: "SD [Structural Disease] was established and classified as either positive or negative by cross-sectional or functional imaging results."
This suggests that the ground truth was derived from standard clinical imaging reports rather than a consensus of independent expert readers specifically for this study.

4. Adjudication Method for the Test Set

The document does not describe an adjudication method for the test set's ground truth (structural disease). It implies that the imaging results themselves provided the classification. This means there was no adjudication process as typically seen with multiple human readers reviewing images.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done

No, an MRMC comparative effectiveness study was not done.
This study is for an in vitro diagnostic (IVD) assay (a lab test), not an AI-assisted imaging or diagnostic tool where human readers work with or without AI. The performance metrics presented are for the analytical and clinical performance of the assay itself, comparing its results to a ground truth (structural disease status), not to human reader performance or improvement with AI.

6. If a Standalone Performance Study Was Done

Yes, this is effectively a standalone (algorithm only) performance study.
The Atellica IM Tg assay is an automated in vitro diagnostic device. Its performance characteristics (sensitivity, specificity, precision, linearity, etc.) are evaluated intrinsically, independent of human interpretation of the assay result values. The output is a quantitative measurement of thyroglobulin.

7. The Type of Ground Truth Used

Ground truth for clinical performance: Structural disease (SD) status obtained from "cross-sectional or functional imaging results."
Ground truth for analytical performance (LoB, LoD, LoQ, Precision, etc.): Established through laboratory protocols and reference materials (e.g., CLSI guidelines, certified reference materials like BCR CRM 457, spiked samples, control materials).

8. The Sample Size for the Training Set

The document does not specify a separate training set or its sample size for the Atellica IM Tg assay.
For IVD assays like this, the "training" is typically inherent in the assay's development and optimization process (e.g., reagent formulation, calibration curve development), which uses various known samples and standards, rather than a distinct, labeled "training dataset" as would be seen for a machine learning algorithm. The performance characteristics studies presented are akin to a "verification/validation set."

9. How the Ground Truth for the Training Set Was Established

As a traditional IVD assay, there isn't a "training set" in the sense of a machine learning model.
Ground truth for assay development and calibration: This would have been established using reference materials (like BCR CRM 457), characterized control samples, and potentially a large panel of clinically characterized patient samples used during the assay's development and optimization phases. These activities are part of the broader product development lifecycle rather than a distinct "training set" with ground truth generated by experts in the context of a clinical study for submission. Standardization is explicitly noted as traceable to BCR CRM 457, which serves as a primary standard for establishing the quantitative accuracy of the assay.

K Number

K250443

Device Name

MAGNETOM Avanto Fit; MAGNETOM Skyra Fit; MAGNETOM Sola Fit; MAGNETOM Viato.Mobile

Manufacturer

Siemens Healthcare GmbH

Date Cleared

2025-06-16

(122 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K231587, K232535, K213693, K153343, K220151, K220589, K221733, K240608, K191040

Predicate For

N/A

Intended Use

The MAGNETOM system is indicated for use as a magnetic resonance diagnostic device (MRDD) that produces transverse, sagittal, coronal and oblique cross sectional images, spectroscopic images and/or spectra, and that displays the internal structure and/or function of the head, body, or extremities. Other physical parameters derived from the images and/or spectra may also be produced. Depending on the region of interest, contrast agents may be used. These images and/or spectra and the physical parameters derived from the images and/or spectra when interpreted by a trained physician yield information that may assist in diagnosis.

The MAGNETOM system may also be used for imaging during interventional procedures when performed with MR compatible devices such as in-room displays and MR Safe biopsy needles.

Device Description

The subject device, MAGNETOM Avanto Fit with software syngo MR XA70A, consists of new and modified software and hardware that is similar to what is currently offered on the predicate device, MAGNETOM Avanto Fit with syngo MR XA50A (K220151).

A high-level summary of the new and modified hardware and software is provided below:

For MAGNETOM Avanto Fit with syngo MR XA70:

Hardware

New Hardware:
myExam 3D Camera
BM Head/Neck 20

Modified Hardware:
Sanaflex (cushions for patient positioning)

Software

New Features and Applications:
myExam Autopilot Brain
myExam Autopilot Knee
3D Whole Heart
HASTE_interactive
GRE_PC
Open Recon
Deep Resolve Gain
Fleet Reference Scan
Physio logging
complex averaging
AutoMate Cardiac
Ghost Reduction
BLADE diffusion
Beat Sensor
Deep Resolve Sharp
Deep Resolve Boost and Deep Resolve Boost (TSE)
Deep Resolve Boost HASTE
Deep Resolve Boost EPI Diffusion

Modified Features and Applications:
SPACE improvement (high band)
SPACE improvement (incr grad)
Brain Assist
Eco power mode
myExam Angio Advanced Assist (Test Bolus)

The subject device, MAGNETOM Skyra Fit with software syngo MR XA70A, consists of new and modified software and hardware that is similar to what is currently offered on the predicate device, MAGNETOM Skyra Fit with syngo MR XA50A (K220589).

A high-level summary of the new and modified hardware and software is provided below:

For MAGNETOM Skyra Fit with syngo MR XA70:

Hardware

New Hardware:
myExam 3D Camera

Modified Hardware:
Sanaflex (cushions for patient positioning)

Software

New Features and Applications:
Beat Sensor
HASTE_interactive
GRE_PC
3D Whole Heart
Deep Resolve Gain
Open Recon
Ghost Reduction
Fleet Reference Scan
BLADE diffusion
HASTE diffusion
Physio logging
complex averaging
Deep Resolve Swift Brain
Deep Resolve Sharp
Deep Resolve Boost and Deep Resolve Boost (TSE)
Deep Resolve Boost HASTE
Deep Resolve Boost EPI Diffusion
AutoMate Cardiac
SVS_EDIT

Modified Features and Applications:
SPACE improvement (high band)
SPACE improvement (incr grad)
Brain Assist
Eco power mode
myExam Angio Advanced Assist (Test Bolus)

The subject device, MAGNETOM Sola Fit with software syngo MR XA70A, consists of new and modified software and hardware that is similar to what is currently offered on the predicate device, MAGNETOM Sola Fit with syngo MR XA51A (K221733).

A high-level summary of the new and modified hardware and software is provided below:

For MAGNETOM Sola Fit with syngo MR XA70:

Hardware

New Hardware:
myExam 3D Camera

Modified Hardware:
Sanaflex (cushions for patient positioning)

Software

New Features and Applications:
GRE_PC
3D Whole Heart
Ghost Reduction
Fleet Reference Scan
BLADE diffusion
Physio logging
Open Recon
Complex averaging
Deep Resolve Sharp
Deep Resolve Boost and Deep Resolve Boost (TSE)
Deep Resolve Boost HASTE
Deep Resolve Boost EPI Diffusion
AutoMate Cardiac
Implant suite

Modified Features and Applications:
SPACE improvement (high band)
SPACE improvement (incr grad)
Brain Assist
Eco power mode

The subject device, MAGNETOM Viato.Mobile with software syngo MR XA70A, consists of new and modified software and hardware that is similar to what is currently offered on the predicate device, MAGNETOM Viato.Mobile with syngo MR XA51A (K240608).

A high-level summary of the new and modified hardware and software is provided below:

For MAGNETOM Viato.Mobile with syngo MR XA70:

Hardware

New Hardware:
n.a.

Modified Hardware:
Sanaflex (cushions for patient positioning)

Software

Modified Features and Applications:
SPACE improvement (high band)
SPACE improvement (incr grad)
Brain Assist
Eco power mode

Furthermore, the following minor updates and changes were conducted for the subject devices:

Low SAR Protocol minor update (for all subject devices except MAGNETOM Skyra Fit): the goal of the SAR-adaptive protocols is to allow knee, spine, heart, and brain examinations at 50% of the maximum allowed SAR values in normal mode for head and whole-body SAR. The SAR reduction was achieved by adapting parameters such as flip angle, TR, RF pulse type, turbo factor, and concatenations. For cardiac examinations, clinically accepted alternative imaging contrasts are used (submitted with K232494).

Implementation of image sorting prepare for PACS (submitted with K231560).

Implementation of improved DICOM color support (submitted with K232494).

Needle intervention AddIn was added to all subject devices (submitted with K232494).

Inline Image Filter switchable for users: in the subject device, users have the ability to switch the "Inline image filter" (implicit filter) on or off. This filter is an image-based filter that can be applied to specific pulse sequence types. The function of the filter remains unchanged from the previous device MAGNETOM Sola with syngo MR XA61A (K232535).

SVS_EDIT is newly added for MAGNETOM Skyra Fit, but without any changes (submitted with K203443).

Brain Assist received an improvement and is identical to that of syngo MR XA61A (K232535).

Open Recon is introduced for all systems. The function of Open Recon remains unchanged from the previous submissions (submitted with K221733).

Lock TR and FA in BOLD received a minor UI update.

Implant Suite is newly introduced for MAGNETOM Sola Fit and MAGNETOM Viato.Mobile, but without any changes (submitted with K232535)

myExam Autopilot Brain and myExam Autopilot Knee are newly introduced for the subject device MAGNETOM Avanto Fit and are unchanged from previous submissions (submitted with K221733).

myExam Angio Advanced Assist (Test Bolus) received a bug fix and minimal UI improvements.

AI/ML Overview

The provided text is an FDA 510(k) clearance letter for various MAGNETOM MRI Systems. While it details new and modified software and hardware features, it does not include specific acceptance criteria or a study that "proves the device meets the acceptance criteria" in terms of performance metrics like sensitivity, specificity, or accuracy for a diagnostic task.

Instead, the document focuses on demonstrating substantial equivalence to predicate devices. This is achieved by:

Stating that the indications for use are the same.
Listing numerous predicate and reference devices.
Detailing hardware and software changes.
Mentioning non-clinical tests like software verification and validation, sample clinical images, and image quality assessment to show that the new features maintain an "equivalent safety and performance profile" to the predicate devices.
Referencing scientific publications for certain features to support their underlying principles and utility.
Briefly describing the training and validation data for two AI features: Deep Resolve Boost and Deep Resolve Sharp, but without performance acceptance criteria or detailed results.

Therefore, much of the requested information cannot be extracted from this document because it is not a study report detailing clinical performance against predefined acceptance criteria for a specific diagnostic outcome.

However, I can extract the information related to the AI features as best as possible from the "AI Features/Applications training and validation" section (Page 16).

Acceptance Criteria and Study Details (Limited to AI Features)

1. Table of Acceptance Criteria and Reported Device Performance

Feature	Acceptance Criteria	Reported Device Performance
Deep Resolve Boost	(Not explicitly stated in the provided document as specific numerical thresholds, but implied through evaluation metrics.)	"The impact of the network has been characterized by several quality metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Most importantly, the performance was evaluated by visual comparisons to evaluate e.g., aliasing artifacts, image sharpness and denoising levels." (Exact numerical results not provided).
Deep Resolve Sharp	(Not explicitly stated in the provided document as specific numerical thresholds, but implied through evaluation metrics and verification activities.)	"The impact of the network has been characterized by several quality metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and perceptual loss. In addition, the feature has been verified and validated by inhouse tests. These tests include visual rating and an evaluation of image sharpness by intensity profile comparisons of reconstructions with and without Deep Resolve Sharp." (Exact numerical results not provided).
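
PSNR, one of the quality metrics named in the table above, has a standard definition based on the mean squared error between a reference image and a test image. The following is a minimal sketch of that definition in plain Python. The function name `psnr`, the flat-list image representation, and the default `max_value` of 1.0 are assumptions for illustration; the document does not describe how the manufacturer computed the metric, and SSIM and perceptual loss are not covered here.

```python
import math


def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel intensities.
    `max_value` is the largest possible intensity (assumed 1.0 here).
    """
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel example: each pixel is off by 0.1.
print(psnr([0.0, 1.0], [0.1, 0.9]))
```

A higher PSNR means the test image is numerically closer to the reference. The document notes that visual comparison was treated as the most important evaluation, so a metric like this would be one input among several.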

2. Sample size used for the test set and the data provenance

Deep Resolve Boost:
- Test Set Sample Size: Not explicitly stated as a separate "test set" size. The document mentions "training and validation data" for over 25,000 TSE slices, over 10,000 HASTE slices (for refinement), and over 1,000,000 EPI Diffusion slices. It's unclear what proportion of this was used specifically for final testing, or if the "validation" mentioned includes the final performance evaluation.
- Data Provenance: Retrospective, described as "Input data was retrospectively created from the ground truth by data manipulation and augmentation." Country of origin is not specified.
Deep Resolve Sharp:
- Test Set Sample Size: Not explicitly stated as a separate "test set" size. The document mentions "training and validation" on more than 10,000 high resolution 2D images. Similar to Deep Resolve Boost, it's unclear what proportion was specifically for final testing.
- Data Provenance: Retrospective, described as "Input data was retrospectively created from the ground truth by data manipulation." Country of origin is not specified.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

This information is not provided in the document. The definition of "ground truth" for the AI features refers to the acquired datasets themselves rather than expert-labeled annotations. Visual comparisons are mentioned as part of the evaluation, but without details on expert involvement or qualifications.

4. Adjudication method for the test set

This information is not provided in the document. While "visual comparisons" and "visual rating" are mentioned, no specific adjudication method (e.g., 2+1, 3+1) is described.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, what was the effect size of how much human readers improve with AI vs. without AI assistance

No, an MRMC comparative effectiveness study demonstrating human reader improvement with AI assistance is not described in this document. The AI features (Deep Resolve Boost and Deep Resolve Sharp) target image reconstruction and image quality enhancement (denoising, sharpness) rather than assisting human readers in a diagnostic task that can be quantified by an effect size.

6. If a standalone (i.e. algorithm-only, without human-in-the-loop) performance study was done

Yes, the evaluation of Deep Resolve Boost and Deep Resolve Sharp, based on metrics like PSNR, SSIM, and perceptual loss, and "visual comparisons" or "visual rating" appears to be an assessment of the algorithm's performance in enhancing image quality in a standalone capacity, without direct human-in-the-loop interaction for diagnosis.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)

Deep Resolve Boost: "The acquired datasets (as described above) represent the ground truth for the training and validation." This implies the original, full-quality, unaltered MRI scan data. Further, "Input data was retrospectively created from the ground truth by data manipulation and augmentation. This process includes further under-sampling of the data by discarding k-space lines, lowering of the SNR level by addition of noise and mirroring of k-space data."
Deep Resolve Sharp: "The acquired datasets represent the ground truth for the training and validation." Similar to Boost, this refers to original, high-resolution MRI scan data. For training, "k-space data has been cropped such that only the center part of the data was used as input. With this method corresponding low-resolution data as input and high-resolution data as output / ground truth were created for training and validation."
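
Two of the Deep Resolve Boost degradation steps quoted above, discarding k-space lines and adding noise, can be illustrated with a toy example. This is a sketch under stated assumptions, not the manufacturer's pipeline: the function name `degrade_kspace`, the regular "keep every n-th line" pattern, the Gaussian noise model, and all parameter values are hypothetical, and the mirroring step is omitted.

```python
import random


def degrade_kspace(kspace, keep_every=2, noise_sigma=0.05, seed=0):
    """Create a degraded copy of k-space from fully sampled ground truth.

    kspace      -- list of lines, each a list of complex samples
    keep_every  -- keep one line out of every `keep_every` (assumed pattern)
    noise_sigma -- standard deviation of the added complex Gaussian noise
    """
    rng = random.Random(seed)
    degraded = []
    for index, line in enumerate(kspace):
        if index % keep_every != 0:
            # Discarded line: replaced by zeros (under-sampling).
            degraded.append([0j] * len(line))
        else:
            # Kept line: add noise to lower the SNR.
            degraded.append([
                value + complex(rng.gauss(0, noise_sigma), rng.gauss(0, noise_sigma))
                for value in line
            ])
    return degraded


# Hypothetical 6-line k-space with 4 samples per line.
ground_truth = [[1 + 0j] * 4 for _ in range(6)]
network_input = degrade_kspace(ground_truth)
print(network_input[1])  # a discarded line, all zeros
```

The pair (`network_input`, `ground_truth`) then plays the role of a training example: the degraded data is the input and the untouched acquisition is the target.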

8. The sample size for the training set

Deep Resolve Boost:
- TSE: more than 25,000 slices
- HASTE (for refinement): more than 10,000 slices
- EPI Diffusion: more than 1,000,000 slices
Deep Resolve Sharp: more than 10,000 high resolution 2D images.

9. How the ground truth for the training set was established

Deep Resolve Boost: The ground truth was established by the "acquired datasets" themselves (full-quality MRI scans). The training input data was then derived from this ground truth by simulating degraded images (e.g., under-sampling, adding noise).
Deep Resolve Sharp: Similarly, the ground truth was the "acquired datasets" (high-resolution MRI scans). The training input data was derived by cropping k-space data to create corresponding low-resolution inputs.
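
The k-space cropping described for Deep Resolve Sharp can be sketched in one dimension: keeping only the central (low-frequency) part of k-space and transforming back yields a lower-resolution version of the same signal. The code below is an illustrative sketch using a plain discrete Fourier transform. The function names `dft` and `crop_kspace_center`, the 1D setting, and the requirement that `keep` be odd are assumptions made to keep the example short; the document does not describe the actual implementation.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            signal[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n)
            for j in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def crop_kspace_center(signal, keep):
    """Return a low-resolution signal of length `keep` from a real signal.

    Only the `keep` lowest-frequency coefficients (the k-space center)
    are retained; `keep` must be odd so the retained band is symmetric.
    """
    n = len(signal)
    if keep % 2 == 0 or keep > n:
        raise ValueError("keep must be odd and no larger than the signal length")
    half = keep // 2
    spectrum = dft(signal)
    cropped = [0j] * keep
    for k in range(half + 1):
        cropped[k] = spectrum[k]             # zero and positive frequencies
    for k in range(1, half + 1):
        cropped[keep - k] = spectrum[n - k]  # negative frequencies
    scale = keep / n                         # preserve the intensity scale
    return [(value * scale).real for value in dft(cropped, inverse=True)]


# Hypothetical high-resolution signal with 8 samples, reduced to 3.
high_res = [2.0] * 8
low_res = crop_kspace_center(high_res, keep=3)
print(len(low_res))
```

In a training setup of the kind the document describes, `low_res` would serve as the network input and `high_res` as the ground-truth output.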

K Number

K243570

Device Name

Dimension LOCI Thyroid Stimulating Hormone Flex reagent cartridge (TSHL); Dimension LOCI Free Thyroxine Flex reagent cartridge (FT4L)

Manufacturer

Siemens Healthcare Diagnostics

Date Cleared

2025-04-25

(158 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K081074, K073604

Predicate For

N/A

Intended Use

The TSHL method is an in vitro diagnostic test for the quantitative measurement of Thyroid Stimulating Hormone (TSH, thyrotropin) in human serum and plasma on the Dimension® EXL™ integrated chemistry system with LOCI® Module. Measurements of TSH are used in the diagnosis and monitoring of thyroid disease.

The FT4L method is an in vitro diagnostic test for the quantitative measurement of Free Thyroxine in human serum and plasma on the Dimension® EXL™ integrated chemistry system with LOCI® Module. Measurements of free thyroxine are used in the diagnosis and monitoring of thyroid disease.

Device Description

The Dimension® LOCI® Thyroid Stimulating Hormone Flex® reagent cartridge (TSHL) and Dimension® LOCI® Free Thyroxine Flex® reagent cartridge (FT4L) assays were cleared under K081074 and K073604, respectively. The components of the cleared assays were modified to reduce biotin interference.

The modified Assays are comprised of the following components:

Dimension® LOCI® Thyroid Stimulating Hormone Flex® reagent cartridge (TSHL): prepackaged liquid reagents in a plastic eight-well cartridge. Wells 1-2 contain Biotinylated TSH antibody (7.5 µg/mL mouse monoclonal), wells 3-4 contain TSH antibody coated Chemibeads (200 µg/mL mouse monoclonal), and wells 5-6 contain Streptavidin Sensibeads (1400 µg/mL recombinant E. coli). Wells 1-6 contain buffers, stabilizers and preservatives. Wells 7-8 are empty.

Dimension® LOCI® Free Thyroxine Flex® reagent cartridge (FT4L): prepackaged liquid reagents in a plastic eight-well cartridge. Wells 1-2 contain Streptavidin Sensibeads (225 µg/mL recombinant E. coli), wells 3-4 contain T3 Chemibeads (200 µg/mL), and wells 5-6 contain FT4 Biotinylated antibody (50 ng/mL mouse monoclonal). Wells 1-6 contain buffers, stabilizers and preservatives. Wells 7-8 are empty.

Test Principle: Both devices use a homogeneous chemiluminescent immunoassay based on LOCI® technology.
For TSHL, it's a sandwich immunoassay where sample is incubated with biotinylated antibody and Chemibeads to form bead-TSH-biotinylated antibody sandwiches. Sensibeads are added and bind to the biotin to form bead-pair immunocomplexes. Illumination at 680 nm generates singlet oxygen from Sensibeads which diffuses into Chemibeads, triggering a chemiluminescent reaction. The resulting signal is measured at 612 nm and is a direct function of TSH concentration.
For FT4L, it's a sequential immunoassay where sample is incubated with biotinylated antibody. T3 Chemibeads are added and form bead/biotinylated antibody immunocomplexes with the non-saturated fraction of the biotinylated antibody. Sensibeads are then added and bind to the biotin to form bead pair immunocomplexes. Illumination at 680 nm generates singlet oxygen from Sensibeads which diffuses into the Chemibeads, triggering a chemiluminescent reaction. The resulting signal is measured at 612 nm and is an inverse function of FT4 concentration.

AI/ML Overview

The document provided is a 510(k) clearance letter from the FDA for two in-vitro diagnostic (IVD) devices: Dimension® LOCI® Thyroid Stimulating Hormone Flex® reagent cartridge (TSHL) and Dimension® LOCI® Free Thyroxine Flex® reagent cartridge (FT4L). It describes the devices, their intended use, and the performance characteristics tested to demonstrate substantial equivalence to previously cleared predicate devices.

However, it's crucial to understand that this document describes a reagent cartridge, which is a laboratory assay, not an AI/ML-driven device or an imaging device. Therefore, many of the requested criteria (e.g., sample size for training/test sets for AI, data provenance like country of origin for AI, ground truth establishment by experts, adjudication methods, MRMC studies, standalone AI performance) are not applicable to this type of device. The document details the performance of the assay itself in measuring biomarker concentrations, not an AI's ability to interpret images or assist human readers.

I will interpret the request based on the information provided for this specific IVD device, noting where certain requested details are not relevant to the nature of the device.

Acceptance Criteria and Study to Prove Device Meets Criteria (for an IVD Reagent Cartridge)

The device in question, a reagent cartridge for quantitative measurement of TSH and FT4, is a laboratory assay, not an AI/ML or imaging interpretation device. Therefore, the "acceptance criteria" and "study" are focused on analytical performance characteristics (accuracy, precision, linearity, interference, detection limits, etc.) compared to a predicate device, rather than diagnostic accuracy metrics of an AI.

1. Table of Acceptance Criteria and Reported Device Performance

For an IVD reagent cartridge, "acceptance criteria" are typically defined by ranges, limits, or statistical agreements demonstrating analytical performance comparable or superior to the predicate device and meeting relevant clinical or analytical standards (e.g., CLSI guidelines). The reported performance demonstrates that the modified devices meet these standards.

Performance Characteristic	Acceptance Criteria (Implicit from CLSI Guidelines/Predicate Comparison)	Reported Device Performance (TSHL)	Reported Device Performance (FT4L)
Detection Limits	Meet/Be comparable to predicate; within acceptable analytical ranges.	LoB: 0.003 µIU/mL; LoD: 0.005 µIU/mL; LoQ: 0.007 µIU/mL	LoB: 0.03 ng/dL; LoD: 0.05 ng/dL; LoQ: 0.06 ng/dL
Linearity / Measuring Interval	Linear across the claimed measuring range with acceptable bias.	0.007 – 100 µIU/mL	0.1 – 8.0 ng/dL
Method Comparison (vs. Predicate)	High correlation (r close to 1), slope close to 1, small y-intercept.	N=145 serum samples; y = 0.99x + 0.039 µIU/mL (correlation coefficient not reported in this summary; a slope near 1 and a small intercept indicate close average agreement)	N=146 serum samples; y = 1.02x + 0.03 ng/dL (correlation coefficient not reported in this summary; a slope near 1 and a small intercept indicate close average agreement)
Precision (Repeatability)	Within-run and total precision (SD/CV) within acceptable clinical laboratory limits.	TSHL: Levels 0.110-88.676 µIU/mL; Within-Run %CV: 2.6-4.4%; Total %CV: 1.1-3.0% (Note: in Table 5 the "Total" %CV for Level 1 is 2.6%, matching the within-run %CV, but for the other levels it is lower. This may be a typo in the table, since total CV is typically at least as large as within-run CV.)	FT4L: Levels 0.81-6.41 ng/dL; Within-Run %CV: 2.2-2.6%; Total %CV: 0.9-1.1%
Precision (Reproducibility)	Total reproducibility (SD/CV) across lots and systems within acceptable clinical laboratory limits.	TSHL: Levels 0.094-81.372 µIU/mL; Reproducibility %CV: 4.6-7.6%	FT4L: Levels 0.70-6.49 ng/dL; Reproducibility %CV: 1.8-2.4%
Recovery (Dilution)	For TSHL, diluted samples should show recovery close to 100% of the true value.	TSHL:Recovery ranged from 100% to 106% for various samples diluted 5x.	N/A (FT4L not described for dilution recovery)
Interference (Biotin)	Modified assay shows significantly reduced interference compared to predicate.	TSHL & FT4L: Specimens with biotin up to 1200 ng/mL demonstrate ≤10% change in results (significant improvement from predicate's 250 ng/mL for TSHL and 100 ng/mL for FT4L).	TSHL & FT4L: Specimens with biotin up to 1200 ng/mL demonstrate ≤10% change in results.
Reference Range Verification	Results from healthy samples confirm the established reference intervals.	TSHL: Verified for adults (0.358-3.74 µIU/mL) and pediatric populations.	FT4L: Verified for adults (0.76-1.46 ng/dL) and pediatric populations.
Matrix Comparison	Comparable performance across different sample matrices.	Comparable values to serum samples for lithium heparin, sodium heparin, and K2-EDTA plasma.	Same as TSHL.
Hook Effect	No significant hook effect within specified range.	No hook effect observed up to 30,000 µIU/mL.	N/A (FT4L not described for hook effect)
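To make the method-comparison rows concrete, the regression equations reported above can be turned into a predicted difference between the modified and predicate assays at any concentration. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and evaluating the bias at the adult reference limits is a choice made here, not something the clearance document reports.

```python
def predicted_bias(x, slope, intercept):
    """Difference (modified minus predicate) implied by y = slope * x + intercept."""
    return (slope * x + intercept) - x

def percent_bias(x, slope, intercept):
    """Predicted bias as a percentage of the predicate value x."""
    return 100.0 * predicted_bias(x, slope, intercept) / x

# Regression coefficients as reported in the table above.
TSHL = {"slope": 0.99, "intercept": 0.039}  # µIU/mL
FT4L = {"slope": 1.02, "intercept": 0.03}   # ng/dL

# Adult reference limits from the same table, used here only as example inputs.
for label, fit, points in (
    ("TSHL", TSHL, (0.358, 3.74)),
    ("FT4L", FT4L, (0.76, 1.46)),
):
    for x in points:
        bias = predicted_bias(x, **fit)
        pct = percent_bias(x, **fit)
        print(f"{label} at {x}: bias {bias:+.4f} ({pct:+.1f}%)")
```

A fitted line summarizes average agreement only; it says nothing about scatter around the line, which is why a correlation coefficient or difference plot is normally reported alongside it.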

2. Sample Sizes and Data Provenance for the Test Set

The concept of a "test set" in the context of an IVD reagent cartridge refers to the set of samples used for various analytical performance studies. These are not typically split into "training" and "test" sets as in AI/ML.

Method Comparison:
- TSHL: 145 patient samples (serum)
- FT4L: 146 patient samples (serum)
Precision (Repeatability): 5 serum samples (TSHL), 3 serum samples (FT4L)
Precision (Reproducibility): 5 serum samples (TSHL), 3 serum samples (FT4L)
Linearity: Low and high human serum pools used to create dilution series (TSHL: 12 levels, FT4L: 10 levels)
Interference (Biotin and HIL): Samples spiked with interferents, specific TSH/FT4 levels tested.
Dilution Recovery: 7 samples (TSHL)
Reference Range Verification: "Apparently healthy samples" (specific N not provided, but typically a statistically significant number for verification per CLSI EP28-A3C).
Matrix Comparison: Samples of various tube types (Serum, lithium heparin, sodium heparin, K2-EDTA plasma)

Data Provenance: The document does not specify the country of origin of the patient samples. The studies are described as analytical performance studies rather than clinical outcome studies; samples were tested in the laboratory, and no prospective patient follow-up is described.

3. Number of Experts and Qualifications for Ground Truth

This is not applicable as the device is a quantitative IVD assay (reagent cartridge), not an AI/ML device requiring expert interpretation of complex clinical data or images. The "ground truth" for this device is the actual concentration of TSH or FT4 in the sample, typically established either by:

Reference methods (e.g., mass spectrometry, although not explicitly stated as the ground truth method here).
The predicate device itself (as used in method comparison studies, where the predicate is the "comparison assay").
Spiking known concentrations into matrices.

4. Adjudication Method for the Test Set

This is not applicable for a quantitative IVD reagent. Adjudication methods (e.g., 2+1, 3+1) are typically used in scenarios where human experts interpret data (like medical images), and their disagreements need to be resolved to establish a definitive ground truth for AI model evaluation.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

This is not applicable. An MRMC study is designed to evaluate the diagnostic performance of human readers, often with and without AI assistance, on a set of cases. This device is a reagent cartridge that provides a quantitative measurement, not an AI that assists human interpretation.

6. Standalone Performance (Algorithm Only Without Human-in-the-Loop)

This is not applicable. This device is a reagent cartridge that runs on an automated system, providing a quantitative result. It's inherently "standalone" in providing the measurement, but it's not an "algorithm only" in the sense of an AI interpreting complex data. The performance metrics listed (precision, accuracy relative to predicate, linearity, etc.) are its "standalone" performance.

7. Type of Ground Truth Used

The "ground truth" for this type of quantitative diagnostic test is based on:

Comparison to a legally marketed predicate device: The current, FDA-cleared versions of the TSHL and FT4L assays (K081074 and K073604) acted as the "gold standard" or comparison method for the method comparison studies.
Known concentrations: For linearity, recovery, and interference studies, samples were prepared with known concentrations or spiked with known amounts of analytes or interferents.
Analytically verified samples: Samples used for precision studies have mean values derived from repeated measurements.

8. Sample Size for the Training Set

This is not applicable as the device is a non-AI/ML IVD reagent cartridge. There is no concept of a "training set" for this type of product. The development and optimization of the reagent formulation are internal processes, but they don't involve "training" a model on a dataset in the AI sense.

9. How Ground Truth for the Training Set Was Established

This is not applicable for the same reason as point 8.

K Number

K242551

Device Name

syngo Dynamics (Version VA41D)

Manufacturer

Siemens Healthcare GmbH

Date Cleared

2025-04-03

(219 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K222428

Predicate For

N/A

Intended Use

syngo Dynamics is a multimodality, vendor agnostic Cardiology image and information system intended for medical image management and processing that provides capabilities relating to the review and digital processing of medical images.

syngo Dynamics supports clinicians by providing post image processing functions for image manipulation, and/or quantification that are intended for use in the interpretation and analysis of medical images for disease detection, diagnosis, and/or patient management within the healthcare institution's network.

syngo Dynamics is not intended to be used for display or diagnosis of digital mammography images in the U.S.

Device Description

syngo Dynamics is a software only medical device which is used with common IT hardware. Recommended configurations are defined for the hardware required to run the device, and hardware is not considered as part of the medical device.

syngo Dynamics is intended to be used by trained healthcare professionals in a professional healthcare facility to review, edit, and manipulate image data, as well as to generate quantitative data, qualitative data, and diagnostic reports.

syngo Dynamics is a digital image display and reporting system with flexible deployment – it can function as a standalone medical device that includes a DICOM Server or as an integrated module within an Electronic Health Record (EHR) System with a DICOM Archive that receives images from digital image acquisition devices such as ultrasound and x-ray angiography machines. There are three deployments: Standalone, EHR/EHS Integrated, and Multi-Modality Cardiovascular (MMCV). The MMCV deployment functions as a standalone medical device with the capability to natively support 2D and 3D CT and MR image types.

The use of syngo Dynamics is focused on cardiac ultrasound (echocardiography), angiography (x-ray), cardiac nuclear medicine (NM), CT and MR studies that cover both adult and pediatric medicine. Also supported is vascular ultrasound and ultrasound in Obstetrics/Gynecology and Maternal Fetal Medicine (fetal echocardiography during pregnancy).

syngo Dynamics is based on a client-server architecture. The syngo Dynamics server processes the data from the connected imaging modalities, and stores data and images to a DICOM server and routes them for permanent storage, printing, and review. The client provides the user interface for interactive image viewing, reporting, and processing; and can be installed on network connected workstations.

syngo Dynamics provides various semi-automated anatomical visualization tools.

syngo Dynamics offers multiple access strategies: A Workplace that provides full functionality for reading and reporting; A Remote Workplace that provides additionally compressed images with access to full fidelity images for reading and reporting; and a browser based WebViewer that provides access to additionally compressed images and reports from compatible devices (including mobile devices).

In the United States, monitors (displays) should not be used for diagnosis, unless the monitor (display) has specifically received 510(k) clearance for this purpose.

AI/ML Overview

This FDA 510(k) clearance letter pertains to syngo Dynamics (Version VA41D), a Medical Image Management and Processing System (MIMPS). While the document broadly discusses the device's substantial equivalence to a predicate device (syngo Dynamics VA40F) and its general functionalities, the only specific AI/ML-enabled function for which performance data and acceptance criteria are detailed is the Auto EF algorithm for calculating left ventricular ejection fraction from ultrasound images.

Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based specifically on the Auto EF algorithm information provided:

1. Table of Acceptance Criteria and Reported Device Performance (Auto EF Algorithm)

The document states that "Additional acceptance criteria were defined with a total of 12 predetermined acceptance criteria," but only explicitly details one primary statistical criterion and provides summarized performance for a few other aspects.

Acceptance Criterion	Reported Device Performance (syngo Dynamics VA41D)
Pearson's correlation coefficient (r) between biplane EF generated by Auto EF and ground truth ≥ 0.800	0.822 (compared to 0.826 for predicate VA40F)
Increased percentage of cases with biplane EF results	93.3% (140 of 150 cases, compared to 92.0% for predicate VA40F)
Bias of absolute EF	Minimal, -0.2% (unchanged from predicate VA40F)
Percentage of cases where absolute biplane EF delta between Auto EF and GT ≤ 10%	87.9% (compared to 83.7% for predicate VA40F)
All 12 predetermined acceptance criteria	Exceeded all 12 defined acceptance criteria.
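The metrics in this table are simple to compute once paired Auto EF and ground-truth values are available. The following is a minimal sketch, assuming paired lists of EF values in percentage points; the function names and the sample numbers are hypothetical and are not taken from the submission.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mean_bias(auto, truth):
    """Mean signed difference (Auto EF minus ground truth), in EF points."""
    return sum(a - t for a, t in zip(auto, truth)) / len(auto)

def fraction_within(auto, truth, tol=10.0):
    """Share of cases whose absolute EF difference is at most `tol` points."""
    return sum(abs(a - t) <= tol for a, t in zip(auto, truth)) / len(auto)

def meets_correlation_criterion(r, threshold=0.800):
    """The one acceptance threshold the summary states explicitly."""
    return r >= threshold

# Hypothetical paired EF values (percent), for illustration only.
auto_ef = [55.0, 60.0, 35.0, 48.0, 70.0]
truth_ef = [57.0, 58.0, 40.0, 45.0, 66.0]

r = pearson_r(auto_ef, truth_ef)
print(f"r = {r:.3f}, passes: {meets_correlation_criterion(r)}")
print(f"bias = {mean_bias(auto_ef, truth_ef):+.1f} EF points")
print(f"within 10 points: {100 * fraction_within(auto_ef, truth_ef):.1f}%")
```

The yield figure in the table follows from the reported counts: 140 of 150 cases is 93.3%.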

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Test Set: n = 150 cases.
Data Provenance: The test data originated from 3 sites in the U.S., representing geographic diversity from 2 different regions. The data were collected retrospectively and were independent of the training data. The document states the test set is "representative of the intended use population for Auto EF" and balanced for gender, covering ages 21-93 years and BMIs 16.5-48.8. It also included data from three ultrasound manufacturers (Philips, GE, and Siemens).

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

Number of Experts: 2 experienced sonographers.
Qualifications: "experienced sonographers." Specific details regarding their years of experience or board certifications are not provided in the document.

4. Adjudication Method for the Test Set

Adjudication Method: The two sonographers worked independently to establish the ground truth. There is no mention of a formal adjudication process (e.g., 2+1, 3+1), arbitration by a third expert, or a consensus meeting after independent readings. They "did not have access to Auto EF when establishing the ground truth."

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

MRMC Study: No. The document explicitly states, "No clinical studies were carried out for syngo Dynamics (Version VA41D). All performance testing was conducted in a non-clinical fashion as part of the verification and validation activities for the medical device." The evaluation focused on the algorithm's performance against ground truth, not on human reader performance with or without AI assistance.
Effect Size of Human Reader Improvement: Not applicable, as no MRMC study was performed.

6. Standalone (Algorithm-Only) Performance Study

Standalone Study: Yes. The performance validation of the Auto EF algorithm was conducted in a standalone manner. The "Auto EF results with the subject device" were compared directly against the established ground truth. The algorithm processed the images and generated biplane EF values without human intervention in the calculation process, although the system allows users to "review, edit or reject the results."

7. Type of Ground Truth Used

Type of Ground Truth: Expert manual measurement using the conventional "Method of Disks" (MOD), also known as the modified Simpson's rule. The ground truth was established by two sonographers working independently to calculate left ventricular volumes and ejection fraction; no consensus step is described.
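For readers unfamiliar with the Method of Disks, the textbook biplane formula sums elliptical disks stacked along the ventricular long axis, V = (π/4) · Σ aᵢ·bᵢ · (L/n), and ejection fraction is then (EDV − ESV) / EDV. The sketch below implements that standard formula; it is an assumption that the sonographers' measurements follow exactly this form, and the function names and example values are hypothetical.

```python
import math

def biplane_disk_volume(diam_a, diam_b, length):
    """Left ventricular volume by the biplane method of disks.

    diam_a, diam_b: disk diameters (cm) measured in two orthogonal apical
                    views, one pair per disk.
    length:         long-axis length of the ventricle (cm).
    Returns the volume in mL (cm^3).
    """
    if not diam_a or len(diam_a) != len(diam_b):
        raise ValueError("need the same non-zero number of diameters per view")
    n = len(diam_a)
    return (math.pi / 4.0) * (length / n) * sum(a * b for a, b in zip(diam_a, diam_b))

def ejection_fraction(edv, esv):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv - esv) / edv

# Sanity check with a hypothetical cylinder: 20 disks, 4 cm diameter, 8 cm long.
volume = biplane_disk_volume([4.0] * 20, [4.0] * 20, 8.0)
print(f"cylinder volume: {volume:.2f} mL")
print(f"EF for EDV=120 mL, ESV=48 mL: {ejection_fraction(120.0, 48.0):.1f}%")
```

The document does not say how the two sonographers' independent results were combined into a single reference value, so that step is deliberately left out.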

8. Sample Size for the Training Set

Training Set Sample Size: Not explicitly stated. The document mentions the algorithm was "re-trained with more training data" compared to the predicate device, but does not provide a specific number.

9. How the Ground Truth for the Training Set Was Established

Training Set Ground Truth Establishment: Not explicitly detailed. The document only states that the "LV auto contouring algorithm has been updated with pre-training and additional annotated training data." It does not specify the method (e.g., expert consensus, manual contouring) or the number/qualifications of experts involved in annotating the training data. Given that the test set ground truth was established by sonographers using the Method of Disks, a similar methodology for the training annotations is plausible, but the document does not confirm it.

K Number

K242952

Device Name

INNOVANCE Antithrombin

Manufacturer

Siemens Healthcare Diagnostic Products GmbH

Date Cleared

2025-03-28

(184 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K081769,K933125

Predicate For

N/A

Intended Use

INNOVANCE Antithrombin is a chromogenic assay for the automated quantitation of functionally active antithrombin in human citrated plasma and can be used as an aid in the diagnosis of antithrombin deficiency.

INNOVANCE Antithrombin is indicated as an aid in monitoring antithrombin activity to support QFITLIA (fitusiran) dosing in adult and pediatric patients aged 12 years and older with hemophilia A or B with or without factor VIII or IX inhibitors.

Device Description

The INNOVANCE Antithrombin assay is suitable for the determination of physiologically active antithrombin on automatic analyzers and enables the diagnosis of inherited or acquired antithrombin deficiencies. The INNOVANCE Antithrombin assay utilizes a chromogenic measuring principle. An excess of human factor Xa is added to citrated plasma. In the presence of heparin, a portion of the enzyme is complexed and inactivated by the antithrombin present in the sample. Excess, uninhibited factor Xa then cleaves a specific chromogenic substrate, causing the release of a dye. The substrate cleavage is determined by the increase in the absorbance value at 405 nm. The release of dye is inversely proportional to the inhibiting activity of antithrombin in the plasma sample, i.e., the smaller the concentration of functionally active antithrombin, the higher the absorbance signal per time unit.

AI/ML Overview

The provided text describes the analytical and clinical performance of the INNOVANCE Antithrombin assay, particularly in relation to guiding QFITLIA (fitusiran) dosing. It's important to note that this document is for an in vitro diagnostic (IVD) device (an assay), not a software-based AI/ML device that typically involves human readers and image analysis. Therefore, some of the requested information (like number of experts for ground truth, adjudication methods, MRMC studies, or training set details for an AI algorithm) is not directly applicable or not found in this type of submission.

However, I will extract and infer the closest applicable information based on the provided text.

Here's a breakdown of the acceptance criteria and study details:

Acceptance Criteria and Reported Device Performance

For an IVD device like the INNOVANCE Antithrombin, acceptance criteria are typically related to analytical performance characteristics that demonstrate the assay's reliability and accuracy. The document highlights precision (repeatability/reproducibility), analytical specificity (interference), and detection capabilities (limit of quantitation).

Acceptance Criteria Category	Specific Acceptance Criterion (Implicit/Explicit)	Reported Device Performance (INNOVANCE Antithrombin)
Precision	Repeatability (within-run precision)	Pathological Plasma Pool 1 (mean: 15.73 % of norm): Repeatability CV: 8.77 %; Pathological Plasma Pool 2 (mean: 9.75 % of norm): Repeatability SD: 1.36 % of Norm
	Within-device/lab precision	Pathological Plasma Pool 1 (mean: 15.73 % of norm): Within-Device/Lab CV: 9.65 %; Pathological Plasma Pool 2 (mean: 9.75 % of norm): Within-Device/Lab SD: 1.59 % of Norm
	Total precision (combined lots)	Pathological Plasma Pool 1 (mean: 15.73 % of norm): Total combined lots CV: 9.95 %; Pathological Plasma Pool 2 (mean: 9.75 % of norm): Total combined lots SD: 1.59 % of Norm
	Reproducibility (multi-site/between-lab)	Pathological Plasma Pool 1 (mean: 15.54 % of norm): Reproducibility CV: 8.85 %; Pathological Plasma Pool 2 (mean: 11.05 % of norm): Reproducibility Lab SD: 2.19 % of Norm
Analytical Specificity	Interference (from common substances & therapeutics)	No interference up to: Triglycerides 211 mg/dL; Hemoglobin 1000 mg/dL; Bilirubin 60 mg/dL. No interference from therapeutics up to: Desmopressin 0.0144 µg/mL; Tranexamic Acid 0.48 mg/mL; Recombinant Factor VIIa 2.16 µg/mL; Coagulation Factor VIII 0.96 IU/mL; Coagulation Factor IX 1.44 IU/mL; Activated Prothrombin Complex Concentrate (aPCC) 2.4 IU/mL
Detection Capabilities	Limit of Quantitation (LoQ)	LoQ was determined as 7.32% of norm. (Calculated based on a Total Error goal of not exceeding 4% of norm).
Clinical Performance	Aid in monitoring AT activity for QFITLIA dosing to achieve target range	Clinical data from the ATLAS-OLE study demonstrated that individualized QFITLIA AT-DR using the INNOVANCE Antithrombin assay was successful at achieving AT levels within the targeted AT activity range of 15-35%. Median observed annualized bleeding rate (IQR) for treated bleeds was 3.7 (0.0; 7.5) overall, 1.9 (0.0; 5.6) in inhibitor patients and 3.8 (0.0; 11.2) in non-inhibitor patients. This supports its use for safe and effective dosing.
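The precision rows mix two statistics: a CV for Pool 1 and an SD for Pool 2. The two are related by CV = 100 · SD / mean, so either can be converted to the other for a like-for-like comparison. The sketch below applies that identity to the repeatability figures in the table; the helper names are hypothetical, and the converted values are derived here rather than reported in the submission.

```python
def cv_percent(sd, mean):
    """Coefficient of variation (%) from a standard deviation and a mean."""
    return 100.0 * sd / mean

def sd_from_cv(cv, mean):
    """Standard deviation from a coefficient of variation (%) and a mean."""
    return cv * mean / 100.0

# Repeatability figures from the table above (units: % of norm).
pool1_mean, pool1_cv = 15.73, 8.77
pool2_mean, pool2_sd = 9.75, 1.36

pool1_sd = sd_from_cv(pool1_cv, pool1_mean)   # derived, not reported
pool2_cv = cv_percent(pool2_sd, pool2_mean)   # derived, not reported

print(f"Pool 1 repeatability SD: {pool1_sd:.2f} % of norm")
print(f"Pool 2 repeatability CV: {pool2_cv:.1f} %")
```

Expressed this way, the two pools have similar absolute imprecision, while the relative imprecision is higher for the lower-activity pool.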

Study Information

Sample sizes used for the test set and the data provenance:
- Analytical Performance Studies (Test Set):
  - Precision (Single Site): n=240 determinations (3 reagent lots, 20 days, 2 runs/day, 2 samples/run for 2 pathological plasma pools).
  - Reproducibility (Multi-Site): Each of the three internal study sites performed 5 runs per day with 3 replicates of each of the two pathological plasma pools per run (3x5x2x3). The total number of unique samples or determinations isn't given as a single 'n' for the entire reproducibility study, but it involves multiple runs and replicates across sites.
  - Analytical Specificity/Interference: Panel of exogenous substances tested on two plasma pools (low AT activity ~10-15% of norm, and high AT activity ~90% of norm). Specific 'n' values for samples tested per interferent are not given, but described as "paired-difference experiments".
  - Limit of Quantitation (LoQ): n=120 determinations (4 replicates of 5 patient samples, run once per day for 3 days using 2 reagent lots on one BCS XP System).
  - Data Provenance: The analytical studies were performed "internally at the Siemens company site in Germany (Site 1)" and "three (3) internal study sites (Sites 1, 2, and 3)". This implies prospective data collection for these specific validation studies. The clinical data used for drug dosing came from a multicenter clinical trial (ATLAS-OLE study).
- Clinical Validation Data (Test Set for new indication):
  - QFITLIA ATLAS-OLE Study: A total of 227 patients were treated with QFITLIA. Of these, 213 patients were transitioned to an AT-DR (Antithrombin-based Dosing Regimen) with the target AT activity range of 15-35%. 199 patients started at the 50 mg dose every other month with dosing guided by INNOVANCE Antithrombin.
  - Data Provenance: This was a multicenter, open-label extension study (ATLAS-OLE, ClinicalTrials.gov Identifier NCT03754790). The text indicates AT activity was measured "at baseline (prior to QFITLIA initiation) as well as after QFITLIA exposure throughout the ATLAS-OLE study". This is prospective clinical trial data. The country of origin for the patient data is not explicitly stated but is typically international for large clinical trials.
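The determination counts quoted for the analytical studies are products of the design factors, which makes them easy to verify. The sketch below multiplies out the factors as listed above; the function name is hypothetical, and for the multi-site study the result is only the product of the stated "3x5x2x3" factors, since the summary gives no total and does not say whether that product applies per pool.

```python
import math

def determinations(*factors):
    """Total determinations for a fully crossed design: the product of its factors."""
    return math.prod(factors)

# Single-site precision: 3 reagent lots x 20 days x 2 runs/day x 2 samples/run.
single_site = determinations(3, 20, 2, 2)

# LoQ: 4 replicates x 5 patient samples x 3 days x 2 reagent lots.
loq = determinations(4, 5, 3, 2)

# Multi-site reproducibility: product of the factors written as "3x5x2x3".
reproducibility_listed = determinations(3, 5, 2, 3)

print(single_site, loq, reproducibility_listed)
```

The first two products reproduce the reported n=240 and n=120.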
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- This is an IVD assay clearance, not an AI/ML device in image analysis. Therefore, there are no "experts" in the sense of human readers interpreting clinical data for an algorithm's ground truth.
- The "ground truth" for an IVD device's performance is typically established by:
  - Reference Methods: Highly accurate, established laboratory methods.
  - Clinical Outcomes/Patient Status: For the clinical validation, the "ground truth" is the patient's actual antithrombin activity level using the assay itself, and subsequently, the clinical outcomes related to bleeding rates and successful maintenance of AT levels within the therapeutic range under QFITLIA dosing. The QFITLIA clinical trial (ATLAS-OLE) provides this clinical outcome data.
- The expertise lies in the chemists, biochemists, and clinical laboratory scientists who designed and ran the analytical validation studies, and the clinical investigators (e.g., hematologists) involved in the QFITLIA clinical trial who managed patient care and assessed clinical outcomes. Their qualifications are inherent to conducting such studies, but not explicitly detailed as "experts establishing ground truth" in the same way as for AI.
Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- Not applicable for this type of IVD device. Adjudication methods like 2+1 (two readers agree, third resolves discrepancy) are common in AI/ML clinical validation studies involving human interpretation (e.g., radiology reads). For an assay, measurements are quantitative and discrepancies would be resolved through re-testing, calibration checks, or instrument troubleshooting, not human adjudication of a qualitative decision.
Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of human reader improvement with vs. without AI assistance:
- Not applicable as this is an IVD assay, not an AI/ML device assisting human readers.
Whether a standalone (algorithm-only, without human-in-the-loop) performance study was done:
- Not applicable in the AI sense: there is no algorithm, only a chemical reaction and a spectrophotometric measurement. The analytical performance studies (precision, interference, LoQ) characterize the assay's standalone ability to quantify antithrombin activity. The clinical validation then shows that this measurement (AT activity) is useful in guiding QFITLIA dosing; the dosing decisions themselves are still made by clinicians based on the assay output.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Analytical Ground Truth: For precision, the "ground truth" samples were "pathological plasma pools" with known/target mean AT activities. For interference, the "ground truth" for "no interference" was defined by comparing results with and without interferent, with acceptance based on a predefined allowable difference (e.g., within X% of the control). For LoQ, calibration standards and accepted statistical methods (CLSI document EP17-A2) were used to determine the lowest reliable measurable concentration.
- Clinical Ground Truth: For the clinical validation, the "ground truth" for the device's utility in QFITLIA dosing was clinical outcomes data from the ATLAS-OLE study, specifically:
  - The ability to achieve and maintain AT activity levels within the target therapeutic range (15-35%).
  - The observed annualized bleeding rates in patients whose dosing was guided by the assay.
The sample size for the training set:
- This is an IVD assay, not an AI/ML algorithm that requires a "training set" in the machine learning sense. The assay works based on established chemical principles, not on being trained on a dataset.
- The closest analogy might be the samples used for initial assay development, calibration, and internal optimization, but these are not referred to as a "training set" in this context.
How the ground truth for the training set was established:
- Not applicable, as there is no "training set" for this IVD assay in the AI/ML sense.

K Number

K242745

Device Name

AI-Rad Companion Organs RT

Manufacturer

Siemens Healthcare GmbH

Date Cleared

2025-03-27

(197 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K231765,K223774,K211881,K232899

Predicate For

N/A

Intended Use

AI-Rad Companion Organs RT is a post-processing software intended to automatically contour DICOM CT and MR pre-defined structures using deep-learning-based algorithms.

Contours that are generated by AI-Rad Companion Organs RT may be used as input for clinical workflows including external beam radiation therapy treatment planning. AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept contours generated by AI-Rad Companion Organs RT.

The outputs of AI-Rad Companion Organs RT are intended to be used by trained medical professionals.

The software is not intended to automatically detect or contour lesions.

Device Description

AI-Rad Companion Organs RT provides automatic segmentation of pre-defined structures such as Organs-at-risk (OAR) from CT or MR medical series, prior to dosimetry planning in radiation therapy. AI-Rad Companion Organs RT is not intended to be used as a standalone diagnostic device and is not a clinical decision-making software.

CT or MR series of images serve as input for AI-Rad Companion Organs RT and are acquired as part of a typical scanner acquisition. Once processed by the AI algorithms, generated contours in DICOM RTSTRUCT format are reviewed in a confirmation window, allowing the clinical user to confirm or reject the contours before sending them to the target system. Optionally, the user may choose to transfer the contours directly to a configurable DICOM node (e.g., the Treatment Planning System (TPS), which is the standard location for the planning of radiation therapy).

AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept the automatically generated contours. Then the output of AI-Rad Companion Organs RT must be reviewed and, where necessary, edited with appropriate software before accepting generated contours as input to treatment planning steps. The output of AI-Rad Companion Organs RT is intended to be used by qualified medical professionals, who can perform a complementary manual editing of the contours or add any new contours in the TPS (or any other interactive contouring application supporting DICOM-RT objects) as part of the routine clinical workflow.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:

Acceptance Criteria and Device Performance Study for AI-Rad Companion Organs RT

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria for the AI-Rad Companion Organs RT device, particularly for the enhanced CT contouring algorithm, are based on comparing its performance to the predicate device and relevant literature/cleared devices. The primary metrics used are Dice coefficient and Absolute Symmetric Surface Distance (ASSD).
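
The Dice coefficient named above measures the overlap between an automatic contour and a reference contour: twice the number of shared voxels divided by the total number of voxels in both masks. The following is a minimal sketch of that calculation, assuming masks are supplied as flat sequences of 0/1 values; the function name and the toy masks are illustrative and are not taken from the submission.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Foreground voxel count of each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention,
        # not something the source document specifies).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 8-voxel masks, for illustration only.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 0, 1, 1]

score = dice_coefficient(predicted, reference)
print(f"Dice = {score:.3f} ({100 * score:.1f}%)")
```

The tables below report Dice as a percentage, so a value of 0.75 from this function would correspond to 75.0%.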

Table 3: Acceptance Criteria of AIRC Organs RT VA50

Validation Testing Subject	Acceptance Criteria	Reported Device Performance (Summary)
Organs in Predicate Device	All organs segmented in the predicate device are also segmented in the subject device.	Confirmed. The device continued to segment all organs previously handled by the predicate.
	The average (AVG) Dice score difference between the subject and predicate device is < 3%.	Confirmed. "For existing organs, the average (AVG) Dice score difference between the subject device and predicate device is smaller than 3%."
New Organs for Subject Device	The subject device in the selected reference metric has a higher value than the defined baseline value.	Confirmed. "The performance results of the subject device for the new CT organs are comparable to the reference literature & cleared devices. Here equivalence for the new organs is defined such that the selected reference metric has a higher value than the defined baseline."

Table 3: Performance Summary of the Subject Device CT Contouring (Overall Average Dice Coefficients)

Anatomic Region	Avg Dice (%)	Std Dice (%)	95% CI
Head & Neck	76.1	14.3	[75.1, 77.2]
Head & Neck lymph nodes	69.3	13.9	[68.7, 70.0]
Thorax	76.9	15.8	[76.2, 77.6]
Abdomen	87.3	10.1	[86.3, 88.2]
Pelvis	85.7	9.6	[85.0, 86.5]
Cardiac	75.6	15.1	[74.1, 77.1]

Table 4: Detailed Performance Evaluation of the New Organs in the Subject Device (Selected Examples)

Organ Name	No.	AVG Dice (%)	STD Dice (%)	MED Dice (%)	95%CI Dice	AVG ASSD (mm)	STD ASSD (mm)	MED ASSD (mm)	95%CI ASSD
Left Breast	30	90.4	3.8	91	[89, 91.8]	2.4	2.2	1.8	[1.5, 3.2]
Right Breast	30	90.2	3.7	90.8	[88.8, 91.5]	1.9	0.7	1.8	[1.7, 2.2]
Bowel Bag	33	95	3.6	96.5	[93.7, 96.3]	1.9	1.5	1.4	[1.4, 2.5]
Pituitary	30	75.8	7.4	77	[73.1, 78.6]	0.7	0.3	0.6	[0.5, 0.8]
Brainstem	30	88.4	2.5	88.8	[87.5, 89.3]	1	0.3	0.9	[0.9, 1.1]
Esophagus	30	85.6	4.2	86	[84, 87.2]	0.6	0.3	0.6	[0.5, 0.7]
MEDIASTINAL LN 9L	31	38.3	21.1	42.9	[30.6, 46.1]	5.3	4.4	3.7	[3.7, 6.9]

(Note: The full Table 4 from the document provides detailed performance for all 37 new organs. This table includes a selection for illustrative purposes.)
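
The document does not state how the 95% confidence intervals in Table 4 were computed. As a rough plausibility check, the sketch below applies a normal-approximation interval (mean ± 1.96 · SD / √n) to the Left Breast row (n = 30, AVG Dice 90.4, STD Dice 3.8). The method is an assumption chosen for illustration; other rows may not match it exactly, which would be consistent with a different method (for example, bootstrapping) having been used.

```python
import math


def normal_ci(mean, std, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


# Left Breast row of Table 4: n = 30, AVG Dice 90.4 %, STD Dice 3.8 %.
low, high = normal_ci(mean=90.4, std=3.8, n=30)
print(f"approximate 95% CI: [{low:.1f}, {high:.1f}]")
```

For this row the approximation gives roughly [89.0, 91.8], in line with the reported [89, 91.8].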

2. Sample Sizes and Data Provenance

Test Set Sample Size:
- CT Contouring Algorithm: N = 579 cases
- MR Contouring Algorithm: The MR algorithm is unchanged from the predicate, so its performance is unchanged. The predicate was validated using 66 cases.
Data Provenance (CT Contouring Algorithm Test Set):
- Geographic Origin (Overall N=579): Data from multiple clinical sites across North America, South America, Asia, Australia, and Europe.
- Example Cohorts (Table 5: Validation Testing Data Information based on Cohort):
  - Cohort A.1 (N=73): Germany (14), Brazil (59)
  - Cohort A.2 (N=40): Canada (40)
  - Cohort A.3 (N=301): South/North America (184), EU (44), Asia (33), Australia (28), Unknown (12)
  - Cohort B (N=165): South/North America (100), EU (51), Asia (6), Australia (3), Unknown (5)
- Retrospective/Prospective: "retrospective performance study on CT data previously acquired for RT treatment planning."
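
The cohort breakdown above can be cross-checked against the stated totals. This is a small consistency sketch using only the counts listed in the cohort bullets; the dictionary layout and variable names are illustrative.

```python
# Per-cohort counts as listed in the summary above.
cohorts = {
    "A.1": {"stated_n": 73, "regions": {"Germany": 14, "Brazil": 59}},
    "A.2": {"stated_n": 40, "regions": {"Canada": 40}},
    "A.3": {"stated_n": 301, "regions": {
        "South/North America": 184, "EU": 44, "Asia": 33,
        "Australia": 28, "Unknown": 12}},
    "B": {"stated_n": 165, "regions": {
        "South/North America": 100, "EU": 51, "Asia": 6,
        "Australia": 3, "Unknown": 5}},
}

# Each cohort's regional counts should add up to its stated N.
cohort_ok = {
    name: sum(info["regions"].values()) == info["stated_n"]
    for name, info in cohorts.items()
}

# The cohort sizes together should add up to the overall CT test set size.
total_cases = sum(info["stated_n"] for info in cohorts.values())

print(cohort_ok)
print("total cases:", total_cases)
```

With the figures as quoted, every cohort is internally consistent and the four cohorts together account for the 579 CT test cases.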

3. Number of Experts Used to Establish Ground Truth and Qualifications

Number of Experts for Ground Truth: Initial manual annotation was performed by "a team of experienced annotators mentored by radiologists or radiation oncologists"; "a board-certified radiation oncologist" then performed a quality assessment, including review and correction of each annotation. The document describes these roles and qualifications but does not state how many individuals were involved.
Qualifications of Experts:
- "experienced annotators"
- "radiologists or radiation oncologists" (mentors for annotators)
- "board-certified radiation oncologist" (for quality assessment/review)

4. Adjudication Method for the Test Set

The document describes the ground truth establishment process as: "manual annotation" by experienced annotators mentored by radiologists/radiation oncologists, followed by a "quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist." This indicates a hierarchical review/correction process rather than a multi-reader consensus adjudication between equally-weighted readers (e.g., 2+1 or 3+1). The final accepted contour after the board-certified radiation oncologist's review served as the ground truth.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was described. The study focused on the standalone performance of the AI algorithm against established ground truth and comparison with a predicate device and literature. The document does not mention an effect size of how much human readers improve with AI vs. without AI assistance. The intended use specifies that the AI-generated contours must be reviewed, edited, and accepted by trained medical professionals, implying a human-in-the-loop workflow, but the validation study presented focuses on the AI's autonomous segmentation accuracy.

6. Standalone (Algorithm Only) Performance Study

Yes, a standalone performance study was done. The performance metrics (Dice coefficient, ASSD) and the comparison to an expert-established ground truth demonstrate the algorithm's autonomous segmentation capability. The study validated the "autocontouring algorithms" and their performance.

7. Type of Ground Truth Used

The ground truth used for the test set was expert manual annotation based on clinical guidelines. Specifically: "Ground truth annotations were established following RTOG and clinical guidelines using manual annotation." Each annotation was then reviewed and corrected by a board-certified radiation oncologist.

8. Sample Size for the Training Set

The document provides the sample sizes for the training set for new organs introduced:

Table 6: Training Dataset Characteristics (Examples):
- Lacrimal Glands Left/Right: 247
- Pituitary Gland: 247
- Humeral Head Left/Right: 207
- Bowel Bag: 544
- Pelvic Bone Left/Right: 160
- Sacrum: 160
- Mediastinal LN (various): 136
- Femoral Head Left/Right: 160
- Brainstem: 247
- Esophagus: 247
- Breast Left/Right: 172
- Supraglottic Larynx: 247
- Glottis: 247

The total training set size for all organs is not explicitly summed, but these numbers indicate the scale of the training data used for the specific new organs.

9. How the Ground Truth for the Training Set Was Established

"In both the annotation process for the training and validation testing data, the annotation protocols for the OAR were defined following the applicable guidelines. The ground truth annotations were drawn manually by a team of experienced annotators mentored by radiologists or radiation oncologists using an internal annotation tool. Additionally, a quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist using validated medical image annotation tools."

This indicates the same rigorous process of expert manual annotation and review was applied to establish ground truth for the training set as for the test set. The validation testing and training data were explicitly stated to be independent.

K Number

K241770

Device Name

Prostate MR AI (VA10A)

Manufacturer

Siemens Healthcare GmbH

Date Cleared

2025-03-05

(258 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K212783,K181704

Predicate For

N/A

Intended Use

Prostate MR AI is a plug-in Radiological Computer Assisted Detection and Diagnosis Software device intended to be used
• with a separate hosting application
• as a concurrent reading aid to assist radiologists in the interpretation of a prostate MRI examination acquired according to the PI-RADS standard
• in adult men (40 years and older) with suspected cancer in treatment naïve prostate glands

The plug-in software analyzes non-contrast T2 weighted (T2W) and diffusion weighted image (DWI) series to segment the prostate gland and to provide an automatic detection and segmentation of regions suspicious for cancer. For each suspicious region detected, the algorithm moreover provides a lesion Score, by way of PI-RADS interpretation suggestion. Outputs of the device should be interpreted consistently with ACR recommendations using all available MR data (e.g., dynamic contrast enhanced images [if available]). Patient management decisions should not be made solely based on analysis by the Prostate MR AI algorithm.

Device Description

This premarket notification addresses the Siemens Healthineers Prostate MR AI (VA10A) Radiological Computer Assisted Detection and Diagnosis Software (CADe/CADx). Prostate MR AI is a Computer Assisted Detection and Diagnosis algorithm designed to plug into a hosting workflow that assists radiologists in the detection of suspicious lesions and their classification. It is used as a concurrent reading aid to assist radiologists in the interpretation of a prostate MRI examination acquired according to the PI-RADS standard. The automatic lesion detection requires transversal T2W and DWI series as inputs. The device automatically exports a list of detected prostate regions that are suspicious for cancer (each list entry consists of contours and a classification by Score and Level of Suspicion (LoS)), a computed suspicion map, and a per-case LoS. The results of the Prostate MR AI plug-in (with the case-level LoS, lesion center points, lesion diameters, lesion ADC median, lesion 10th percentile, suspicion map, and non-PZ segmentation considered optional) are to be shown in a hosting application that allows the radiologist to view the original case, as well as confirm, reject, or edit lesion candidates with their contours and Scores as generated by the Prostate MR AI plug-in. Moreover, the radiologist can add lesions with contours and PI-RADS scores and finalize the case. In addition, the outputs include an automatically computed prostate segmentation, as well as sub-segmentations of the peripheral zone and the rest of the prostate (non-PZ). The algorithm will augment the prostate workflow of currently cleared syngo.MR General Engine if activated via a separate license on the General Engine.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

Acceptance Criteria and Reported Device Performance

Acceptance Criteria	Reported Device Performance
Automatic Prostate Segmentation
Median Dice score between AI algorithm results and ground truth masks exceeds 0.9.	The median of the Dice score between the AI algorithm results and the corresponding ground truth masks exceeds the threshold of 0.9.
Median normalized volume difference between algorithm results and ground truth masks is within ±5%.	The median of the normalized volume difference between the algorithm results and the corresponding ground truth masks is within a ±5% range.
AI algorithm results are statistically non-inferior to individual reader variability (5% margin of error, 5% significance level).	The AI algorithm results as compared to any individual reader are statistically non-inferior based on variabilities that existed among the individual readers within the 5% margin of error and 5% significance level.
Prostate Lesion Detection and Classification
Case-level sensitivity of lesion detection ≥ 0.80 for both radiology and pathology ground truth.	The case-level sensitivity of the lesion detection is equal or greater than 0.80 for both radiology and pathology ground truth.
False positive rate per case of lesion detection < 1 false positive per case for radiology ground truth.	The false positive rate per case of the lesion detection is smaller than one false positive per case for radiology ground truth.
Accuracy of PI-RADS classification of radiology ground truth lesions (detected by algorithm) ≥ 0.8.	The accuracy of the PI-RADS classification of radiology ground truth lesions detected by the algorithm is equal or greater than 0.8.
Non-inferior performance in GE vs Siemens and African American vs non-African American cases, and in cases with peripheral zone vs non-peripheral lesions.	The non-inferior performance of the subject device in GE vs Siemens and African American vs non-African American cases, and in cases with peripheral zone vs non-peripheral lesions was demonstrated. (Note: Specific metrics for this non-inferiority are not explicitly stated as distinct numerical criteria but are stated as "met".)
Clinical Performance (Reader Study - Case-level discrimination of Gleason Grade Group ≥ 1)
Statistically significant improvement in AUROC for aided reading vs unaided reading.	Fully Inclusive Analysis: AUROC improved from 0.6758 (unaided) to 0.7010 (aided), difference of 0.0252 (95% C.I. [0.0011, 0.0493]; P=0.040). Maximally Restrictive Analysis: AUROC improved from 0.6579 (unaided) to 0.6948 (aided), difference of 0.0368 (95% C.I. [0.0108, 0.0628]; P=0.006). In both analyses, the improvement was statistically significant and the primary endpoint thus met.
Clinical Performance (Reader Study - Lesion-level reading performance)
Statistically significant improvement in AUwAFROC for aided reading vs unaided reading.	Fully Inclusive Analysis: AUwAFROC improved in aided reading by 0.0350 (95% C.I.: [0.0020, 0.0681], P=0.037). Maximally Restrictive Analysis: AUwAFROC improved in aided vs. unaided reading by 0.0302 (95% C.I.: [0.0080, 0.0520], P=0.008). In both analyses, the improvement was statistically significant and the secondary endpoint thus met.
Statistically significant improvement in Fleiss' Kappa for interreader agreement in per-case PI-RADS scores for aided reading vs unaided reading.	Fleiss' Kappa improved from 0.283 (unaided) to 0.371 (aided), with a difference of 0.087 (95% C.I. [0.051, 0.125]). The improvement was statistically significant (P<0.0001).
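
The reader-study rows above each report a point estimate with a 95% confidence interval. A quick way to sanity-check such figures is to confirm that each estimate lies inside its own interval and that the interval excludes zero, which is what a statistically significant improvement at the 5% level corresponds to. The sketch below does this with the values quoted in the table; the lesion-level maximally restrictive difference is taken as 0.0302, the value consistent with its interval. The data layout and function name are illustrative.

```python
# (label, reported difference, CI lower bound, CI upper bound)
endpoints = [
    ("AUROC, fully inclusive", 0.0252, 0.0011, 0.0493),
    ("AUROC, maximally restrictive", 0.0368, 0.0108, 0.0628),
    ("AUwAFROC, fully inclusive", 0.0350, 0.0020, 0.0681),
    ("AUwAFROC, maximally restrictive", 0.0302, 0.0080, 0.0520),
    ("Fleiss' kappa", 0.087, 0.051, 0.125),
]


def check_endpoint(diff, low, high):
    """Return basic plausibility flags for a reported difference and its CI."""
    return {
        "estimate_inside_ci": low <= diff <= high,
        "ci_excludes_zero": low > 0 or high < 0,
    }


results = {label: check_endpoint(d, lo, hi) for label, d, lo, hi in endpoints}
for label, flags in results.items():
    print(label, flags)
```

A check like this cannot confirm the underlying analysis, but it does catch transcription slips such as a misplaced decimal point in a reported difference.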

Study Information

2. Sample size used for the test set and the data provenance:

Automatic Prostate Segmentation: 222 transversal T2 series.
- Provenance: More than 10 clinical sites.
- Retrospective/Prospective: Not explicitly stated; the comparison against radiologist-generated ground truth suggests retrospective use of existing scans.
Prostate Lesion Detection and Classification (Standalone Performance):
- 105 cases from 6 sites (against radiology ground truth).
- 115 cases from 6 sites (against pathology ground truth).
- 340 cases from the multi-reader multi-case study (also used for standalone evaluation; the cases were retrospectively selected for the reader study).
- Provenance: 6 sites (for 105 and 115 cases), and two US sites (for 340 cases).
- Retrospective/Prospective: The cases for the lesion detection and classification evaluation were used to compare against established ground truths, suggesting retrospective analysis of existing data. The cases for the reader study were retrospectively selected.
Multi-Reader Multi-Case (MRMC) Study: 340 cases.
- Provenance: Two US sites. Cases were consecutive and specifically included additional consecutive patient cases from men of African descent to ensure at least 13% Black or African American ethnicity.
- Retrospective/Prospective: Cases were selected retrospectively.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

Automatic Prostate Segmentation: 3 expert radiologists. The document describes them only as "expert" and gives no years of experience or subspecialty.
Prostate Lesion Detection and Classification (Radiology Ground Truth): 3 expert radiologists in prostate MRI reading.
MRMC Study (Lesion-level reference standard): 3 experienced radiologists acting as Truthers.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:

Automatic Prostate Segmentation: Pixel-wise consensus among the 3 expert radiologists.
Prostate Lesion Detection and Classification (Radiology Ground Truth): Consensus reading of the 3 expert radiologists.
MRMC Study (Case-level reference standard): Biopsy results (Gleason Grade Group GGG ≥ 1), or for cases without biopsy, PSA density and follow-up data.
MRMC Study (Lesion-level reference standard): Consensus lesions with a consensus PI-RADS of at least 3 from majority voting among the 3 experienced radiologists. (This implies a form of consensus/majority vote).

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:

Yes, an MRMC study was done with a paired split-plot design, combining two fully-crossed MRMC sub-studies.

Case-level AUROC improvement (discriminating Gleason Grade Group ≥ 1):
- Fully Inclusive Analysis: +0.0252 (from 0.6758 unaided to 0.7010 aided).
- Maximally Restrictive Analysis: +0.0368 (from 0.6579 unaided to 0.6948 aided).
Lesion-level AUwAFROC improvement:
- Fully Inclusive Analysis: +0.0350.
- Maximally Restrictive Analysis: +0.0302.
Fleiss' Kappa (interreader agreement in per-case PI-RADS scores) improvement: +0.087 (from 0.283 unaided to 0.371 aided).
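
Fleiss' kappa, used above to quantify interreader agreement, compares the observed proportion of agreeing rater pairs with the proportion expected by chance. The following is a minimal sketch of the standard formula, assuming every case is scored by the same number of readers; the case data are hypothetical and are not taken from the reader study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-case category counts.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must have been rated by the same number of raters (at least 2).
    Note: the result is undefined when all ratings fall into a single category.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 cases, 3 readers, PI-RADS categories 1 to 5.
example_counts = [
    [0, 0, 3, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [3, 0, 0, 0, 0],
]
print(f"kappa = {fleiss_kappa(example_counts):.3f}")
```

On this scale, the reported change from 0.283 to 0.371 indicates that readers agreed with one another more often when aided, although both values remain well below perfect agreement (1.0).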

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

Yes, standalone performance was evaluated for:

Automatic Prostate Segmentation: Compared algorithm results to ground truth generated by radiologists.
Prostate Lesion Detection and Classification: Compared automatic detection and classification results to radiology ground truth and pathology ground truth.
MRMC Study (AI Standalone reference): The ROC curves shown graphically include a "grey curve [that] denotes AI standalone performance."

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

For Automatic Prostate Segmentation: Pixel-wise consensus from 3 expert radiologists.
For Prostate Lesion Detection and Classification:
- Consensus reading of 3 expert radiologists (radiology ground truth).
- Biopsy results for the same patient (pathology ground truth).
For MRMC Study (Case-level): Biopsy results (Gleason Grade Group GGG ≥ 1), and in cases where biopsy was unavailable, PSA density and follow-up (12 months negative by PSA or MRI).
For MRMC Study (Lesion-level): Consensus lesions with a consensus PI-RADS of at least 3 from majority voting among 3 experienced radiologists.

8. The sample size for the training set:

The document states: "The cases for the reader study were kept completely separate from those used for the training of the Prostate MR AI algorithm." However, it does not specify the sample size for the training set. It only mentions that the AI algorithm was "trained on a database of prostate MR image series acquired according to the PI-RADS standard (non-contrast T2W and DWI image series), and corresponding radiological and/or biopsy findings."

9. How the ground truth for the training set was established:

The ground truth for the training set was established based on "corresponding radiological and/or biopsy findings." Specific details on the adjudication method (e.g., number of experts, consensus process) for the training set are not provided in this document, only the source of the ground truth.

K Number

K242685

Device Name

Atellica® CH Creatinine_3 (Crea3)

Manufacturer

Siemens Healthcare Diagnostics Inc.

Date Cleared

2024-12-04

(89 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K161494

Predicate For

N/A

Intended Use

The Atellica® CH Creatinine_3 (Crea3) assay is for in vitro diagnostic use in the quantitative determination of creatinine in human serum, plasma (lithium heparin, dipotassium EDTA, and sodium heparin), and urine using the Atellica® CH Analyzer. Such measurements are used in the diagnosis and treatment of renal diseases, and in monitoring renal dialysis.

Device Description

The Atellica CH Crea3 assay is based on the reaction of picrate with creatinine in an alkaline medium to produce a red chromophore creatinine picrate complex. The rate of complex formation is measured at 505/571 nm and is proportional to the creatinine concentration. The Atellica CH Crea3 assay is a modification of the Jaffe method, using rate blanking and intercept correction. Rate blanking is used to minimize bilirubin interference. Also, because non-specific serum/plasma protein interactions with this reagent have been found to produce a positive bias of approximately 0.3 mg/dL (26.5 µmol/L), serum/plasma measurements are automatically corrected by subtracting 0.3 mg/dL (26.5 µmol/L) from each result.
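
The description gives the protein correction in both unit systems (0.3 mg/dL and 26.5 µmol/L). The sketch below shows how the two figures relate, using the conventional creatinine conversion factor of 88.4 µmol/L per mg/dL, and applies the stated offset to a hypothetical raw serum value. The function names and the raw value are illustrative; the analyzer's actual internal calculation is not described in the source.

```python
# Conventional conversion factor for creatinine (molar mass about 113.12 g/mol).
UMOL_PER_L_PER_MG_DL = 88.4

# Offset stated in the device description for serum/plasma results.
PROTEIN_OFFSET_MG_DL = 0.3


def to_umol_per_l(mg_per_dl):
    """Convert a creatinine concentration from mg/dL to µmol/L."""
    return mg_per_dl * UMOL_PER_L_PER_MG_DL


def corrected_serum_result(raw_mg_per_dl):
    """Subtract the stated protein offset from a raw serum/plasma result."""
    return raw_mg_per_dl - PROTEIN_OFFSET_MG_DL


offset_si = to_umol_per_l(PROTEIN_OFFSET_MG_DL)
print(f"offset: {offset_si:.1f} µmol/L")

# Hypothetical raw measurement, for illustration only.
raw = 1.30
print(f"corrected: {corrected_serum_result(raw):.2f} mg/dL")
```

Converting 0.3 mg/dL this way gives about 26.5 µmol/L, matching the value in the description.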

AI/ML Overview

The provided text describes the performance characteristics and studies for the Atellica® CH Creatinine_3 (Crea3) assay, a new in vitro diagnostic device for quantitative determination of creatinine. It compares this new device to a predicate device, the Atellica® CH Creatinine_2 (Crea_2) assay.

Here's an analysis of the acceptance criteria and the study proving the device meets them, based on the provided text:

Important Note: The document focuses on establishing substantial equivalence for an in vitro diagnostic (IVD) test, which relies primarily on analytical performance characteristics rather than on clinical outcome studies or the multi-reader multi-case (MRMC) comparative effectiveness studies typically seen with imaging AI devices. Therefore, some items (such as the number of experts for ground truth, adjudication methods, MRMC studies, and training set details for an AI model) are not directly applicable to, or not provided in, this type of submission.

Acceptance Criteria and Reported Device Performance

The acceptance criteria for this device are established through various analytical performance studies, primarily comparing it to a legally marketed predicate device (Atellica® CH Creatinine_2). The acceptance criteria are implicitly defined by the successful demonstration of equivalence or meeting pre-defined performance goals for each characteristic.

Here's a table summarizing the acceptance criteria (inferred from the "designed to have" or "determined in accordance with" statements and the reported results meeting these) and the reported device performance:

Performance Characteristic	Acceptance Criteria (Implicit)	Reported Device Performance (Atellica® CH Creatinine_3 (Crea3))
Detection Capability	LoB: ≤ LoD for serum and urine samples. LoD: ≤ 0.15 mg/dL for serum/plasma; ≤ 3.00 mg/dL for urine. LoQ: ≤ 0.15 mg/dL for serum/plasma with ≤ 0.10 mg/dL total analytical error; ≤ 3.00 mg/dL for urine with ≤ 1.50 mg/dL total analytical error.	Serum/plasma: LoB: 0.05 mg/dL LoD: 0.10 mg/dL LoQ: 0.15 mg/dL Urine: LoB: 0.50 mg/dL LoD: 1.00 mg/dL LoQ: 3.00 mg/dL (All results meet the stated design goals/acceptance criteria).
Precision	Determined in accordance with CLSI Document EP05-A3 (indicates adherence to specific statistical targets for repeatability and within-lab precision, implicitly accepted if within CLSI guidelines for the assay's use).	Serum Samples (n=80 each): - Serum 1 (0.38 mg/dL): Repeatability SD 0.006, CV 1.6%; Within-Lab SD 0.012, CV 3.2% - Serum 2 (0.73 mg/dL): Repeatability SD 0.023, CV 3.2%; Within-Lab SD 0.029, CV 4.0% - Serum 3 (0.73 mg/dL): Repeatability SD 0.006, CV 0.8%; Within-Lab SD 0.019, CV 2.6% - Serum 4 (1.18 mg/dL): Repeatability SD 0.007, CV 0.6%; Within-Lab SD 0.019, CV 1.6% - Serum QC 1 (1.85 mg/dL): Repeatability SD 0.007, CV 0.4%; Within-Lab SD 0.024, CV 1.3% - Serum QC 2 (6.21 mg/dL): Repeatability SD 0.011, CV 0.2%; Within-Lab SD 0.067, CV 1.1% - Serum 5 (17.39 mg/dL): Repeatability SD 0.035, CV 0.2%; Within-Lab SD 0.189, CV 1.1% - Serum 6 (28.54 mg/dL): Repeatability SD 0.056, CV 0.2%; Within-Lab SD 0.317, CV 1.1% Urine Samples (n=80 each): - Urine 1 (56.74 mg/dL): Repeatability SD 0.102, CV 0.2%; Within-Lab SD 0.746, CV 1.3% - Urine 2 (135.80 mg/dL): Repeatability SD 0.206, CV 0.2%; Within-Lab SD 1.601, CV 1.2% - Urine QC 1 (195.79 mg/dL): Repeatability SD 0.253, CV 0.1%; Within-Lab SD 2.376, CV 1.2% (All results demonstrate low CVs, indicating good precision).
Reproducibility	Determined in accordance with CLSI Document EP05-A3 (implies meeting specific statistical targets for variability components across different days, lots, and instruments).	Serum Samples (n=225 each): Overall CV (%) for reproducibility ranges from 1.0% to 5.0%. Urine Samples (n=225 each): Overall CV (%) for reproducibility ranges from 1.4% to 1.6%. (All results demonstrate good reproducibility across conditions).
Assay Comparison	Serum: Correlation coefficient ≥ 0.950 and slope of 1.00 ± 0.05, compared to predicate (Atellica CH Creatinine 2), using Weighted Deming regression. Urine: Correlation coefficient ≥ 0.950 and slope of 0.000 ± 3.00, compared to predicate (Atellica CH Creatinine 2), using Weighted Deming regression.	Serum (n=151): Regression equation y = 1.00x - 0.04 mg/dL, correlation coefficient (r) = 1.000. Sample range 0.44 to 28.64 mg/dL. Urine (n=113): Regression equation y = 1.00x + 0.14 mg/dL, correlation coefficient (r) = 1.000. Sample range 12.60 to 237.06 mg/dL. (Both serum and urine results meet the acceptance criteria for correlation and slope).
Specimen Equivalence	Determined using Weighted Deming regression (implicitly, the regression line should demonstrate equivalence, i.e., close to y=x, with high correlation coefficient).	Sodium Heparin (n=50): y = 1.00x + 0.00 mg/dL, r=0.999. Lithium Heparin (n=50): y = 0.99x + 0.06 mg/dL, r=0.999. Dipotassium EDTA (n=50): y = 0.98x + 0.04 mg/dL, r=0.998. (All demonstrate strong equivalence to serum reference).
Interferences (HIL)	≤ 10% interference from hemoglobin, bilirubin, and lipemia. Bias > 10% or 0.15 mg/dL (whichever is greater for serum/plasma) is considered interference.	Reported biases for Hemoglobin (1000 mg/dL), Conjugated Bilirubin (40-45 mg/dL), Unconjugated Bilirubin (45-60 mg/dL), and Lipemia (2250-3000 mg/dL) are all within the ±10% or ±0.15 mg/dL threshold for the tested analyte concentrations, demonstrating acceptable interference profiles.
Interfering Substances	Bias ≤ 10% or ±0.15 mg/dL for Serum/plasma samples. Bias ≤ 10% for Urine samples (for listed substances).	Most tested substances (e.g., Acetaminophen, Ascorbic Acid, etc.) show negligible bias, meeting the criteria. Substances showing bias beyond acceptance criteria for Serum: - Cefoxitin: Significant interference (e.g., 243.6% and 947.9% bias at high concentrations). - Cephalothin: Shows significant bias (e.g., 44.0% bias at 180 mg/dL). - Glucose: Shows bias beyond 10% at higher concentrations (e.g., 11.5% at 500 mg/dL and 22.5% at 1000 mg/dL). - Total Protein: Shows bias beyond 0.15 mg/dL at 15 g/dL (0.45 mg/dL). - Acetohexamide: Shows bias beyond 10% at 2.0 mg/dL (10.4%). - Hydroxocobalamin (Cyanokit): Shows significant bias (e.g., 14.5% and 49.3% at higher concentrations). Substances showing bias beyond acceptance criteria for Urine: - Cefoxitin: Shows bias beyond 10% at higher concentrations (e.g., 11.3% and 15.4%). (The document explicitly lists these substances under "Interference beyond ±10% for Serum" and "Interference beyond ±10% for Urine," indicating that they failed the non-interference criteria at the tested concentrations. This is typical for IVD submissions, where known interferences are identified for labeling purposes).
Standardization	The assay shall be traceable to the reference material SRM967, from the National Institute of Standards and Technology (NIST).	Statement confirms the assay is traceable to NIST SRM967.
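
The interference rows in the table above describe a combined limit for serum/plasma: a bias counts as interference when it exceeds 10% or 0.15 mg/dL, whichever is greater. The sketch below shows one reading of that rule, in which the allowable bias is the larger of 10% of the control result and 0.15 mg/dL. This interpretation, the function name, and the sample values are assumptions for illustration and are not taken from the submission.

```python
def exceeds_interference_limit(control_mg_dl, test_mg_dl,
                               pct_limit=10.0, abs_limit_mg_dl=0.15):
    """Flag interference using a 'percent or absolute, whichever is greater' limit.

    control_mg_dl: result without the interferent
    test_mg_dl:    result with the interferent added
    """
    bias = test_mg_dl - control_mg_dl
    # Allowable bias is the larger of the percentage-based and absolute limits.
    allowed = max(pct_limit / 100.0 * control_mg_dl, abs_limit_mg_dl)
    return abs(bias) > allowed


# Hypothetical low-level sample: 0.12 mg/dL bias is 12 % but under 0.15 mg/dL.
low_level_flag = exceeds_interference_limit(1.00, 1.12)

# Hypothetical high-level sample: 0.60 mg/dL bias exceeds 10 % of 5.00 mg/dL.
high_level_flag = exceeds_interference_limit(5.00, 5.60)

print("low level flagged: ", low_level_flag)
print("high level flagged:", high_level_flag)
```

Under this reading the absolute limit governs at low creatinine concentrations and the percentage limit governs at higher ones, which is why a 12% bias on a 1.00 mg/dL sample is not flagged here.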

Study Details:

Sample Size and Data Provenance:
- Test Set Sample Sizes:
  - Detection Capability: Not explicitly stated as "sample size" but data points obtained according to CLSI EP17-A2.
  - Precision: 80 data points per serum/urine sample type (duplicate measurements in each of 2 runs per day for 20 days).
  - Reproducibility: 225 data points per serum/urine sample type (n=5 in 1 run for 5 days using 3 instruments and 3 reagent lots).
  - Assay Comparison: 151 serum samples and 113 urine samples.
  - Specimen Equivalence: 50 samples for each plasma type (Sodium Heparin, Lithium Heparin, Dipotassium EDTA) compared to serum.
  - Interference (HIL & Non-Interfering Substances): Not explicitly stated as a total sample size, but experiments are designed to test specific analyte concentrations with and without interferents, following CLSI EP07-ED3.
- Data Provenance: Not explicitly stated in terms of country of origin. Given the manufacturer (Siemens Healthcare Diagnostics Inc. in Tarrytown, New York, USA) and FDA submission, it's highly probable the studies were conducted in the US or in compliance with US regulatory standards. The studies described are retrospective in the sense that they use pre-collected or prepared samples to assess the analytical performance of the device under controlled conditions, not prospective in tracking patient outcomes in a clinical trial.
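
The replicate counts quoted for the precision and reproducibility studies follow directly from the study designs listed above, and each coefficient of variation in the precision results is the standard deviation divided by the mean. The short sketch below reproduces that arithmetic from figures in this summary, assuming fully crossed designs; the variable names are illustrative.

```python
# Precision design: 2 replicates per run, 2 runs per day, 20 days.
precision_n = 2 * 2 * 20

# Reproducibility design: 5 replicates, 1 run per day, 5 days,
# 3 instruments, 3 reagent lots.
reproducibility_n = 5 * 1 * 5 * 3 * 3


def cv_percent(sd, mean):
    """Coefficient of variation as a percentage of the mean."""
    return 100.0 * sd / mean


# Serum 1 repeatability from the precision results: mean 0.38 mg/dL, SD 0.006.
serum1_cv = cv_percent(sd=0.006, mean=0.38)

print("precision n:", precision_n)
print("reproducibility n:", reproducibility_n)
print(f"Serum 1 repeatability CV: {serum1_cv:.1f}%")
```

These figures agree with the reported n=80, n=225, and 1.6% repeatability CV for Serum 1.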
Number of experts used to establish the ground truth for the test set and qualifications of those experts:
- For an in vitro diagnostic (IVD) device measuring a quantitative analyte like creatinine, "ground truth" is typically established by reference methods or established laboratory standards and calibrators, not by human expert consensus or labeling of medical images.
- The "ground truth" for creatinine concentration in this context is based on traceable reference materials (NIST SRM 967) and established laboratory measurement principles, and the performance is compared against a legally marketed predicate device.
- Therefore, this question (relevant for AI/imaging devices) does not directly apply to this type of IVD submission.
Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- Not applicable. Adjudication methods are typically used in clinical trials or image labeling pipelines where there's human interpretation involved and a need to resolve disagreements among multiple readers; this is an analytical performance study of an IVD assay.
Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- Not applicable. This is not an AI/imaging device. It's an in vitro diagnostic assay.
Whether a standalone study (i.e., algorithm-only performance without a human in the loop) was done:
- This is an automated IVD assay performed on the Atellica® CH Analyzer. Its intended use is quantitative determination of creatinine. Therefore, the performance described (precision, accuracy, interference, etc.) is its standalone performance without a human in the loop for the analytical measurement itself, though a human still interacts with the instrument and interprets the results in a clinical context.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The primary "ground truth" for the Atellica® CH Creatinine_3 assay's performance is traceability to NIST SRM 967 (a certified reference material for creatinine) and comparison to a legally marketed predicate device (Atellica® CH Creatinine_2) using method comparison validated against CLSI guidelines. This is a form of analytical reference standard and comparative performance to an established method.
The sample size for the training set:
- This device is an analytical chemistry assay, not a machine learning/AI algorithm that requires a "training set" in the computational sense. The "development" or "optimization" of the assay would involve various experimental data, but it's not codified as a "training set" for an algorithm.
How the ground truth for the training set was established:
- Not applicable, as there is no "training set" in the AI/ML context for this type of device. The assay development would rely on scientific principles of analytical chemistry, reagent formulation, and instrument calibration against known standards.
