K Number: K223343
Date Cleared: 2023-03-28 (147 days)
Product Code:
Regulation Number: 892.1000
Panel: RA
Reference & Predicate Devices
Intended Use

The MAGNETOM system is indicated for use as a magnetic resonance diagnostic device (MRDD) that produces transverse, sagittal, coronal and oblique cross sectional images, spectroscopic images and/or spectra, and that displays the internal structure and/or function of the head, body, or extremities. Other physical parameters derived from the images and/or spectra may also be produced. Depending on the region of interest, contrast agents may be used. These images and/or spectra and the physical parameters derived from the images and/or spectra when interpreted by a trained physician yield information that may assist in diagnosis.

The MAGNETOM system may also be used for imaging during interventional procedures when performed with MR compatible devices such as in-room displays and MR Safe biopsy needles.

Device Description

MAGNETOM Amira and MAGNETOM Sempra with syngo MR XA50M include new and modified features compared to the predicate devices, MAGNETOM Amira and MAGNETOM Sempra with syngo MR XA12M (K183221, cleared on February 14, 2019).

AI/ML Overview

The provided document is a 510(k) summary for the Siemens MAGNETOM Amira and Sempra MR systems, detailing their substantial equivalence to predicate devices. It describes new and modified hardware and software features, including AI-powered "Deep Resolve Boost" and "Deep Resolve Sharp."

However, the document does not contain the detailed information necessary to fully answer the specific questions about acceptance criteria and a study proving the device meets those criteria, particularly in the context of AI performance. The provided text is a summary for regulatory clearance, not a clinical study report.

Specifically, it lacks:

  • Concrete, quantifiable acceptance criteria for the AI features (e.g., a specific PSNR threshold that defines "acceptance").
  • A comparative effectiveness study (MRMC) to show human reader improvement with AI assistance.
  • Stand-alone algorithm performance metrics for the AI features (beyond general quality metrics like PSNR/SSIM, which are not explicitly presented as acceptance criteria).
  • Details on expert involvement, adjudication, or ground truth establishment for a test set used for regulatory acceptance; the "test statistics and test results" section refers to quality metrics, visual inspection, and "clinical settings with cooperation partners" rather than a formal test set for the regulatory submission.

The "Test statistics and test results" section for Deep Resolve Boost states: "After successful passing of the quality metrics tests, work-in-progress packages of the network were delivered and evaluated in clinical settings with cooperation partners." It also cites "seven peer-reviewed publications," covering 427 patients, which "concluded that the work-in-progress package and the reconstruction algorithm can be beneficially used for clinical routine imaging." This indicates real-world evaluation but does not provide specific acceptance criteria or detailed study results for the regulatory submission itself.

Based on the provided text, here's what can be extracted and what is missing:

1. Table of acceptance criteria and reported device performance:

The document does not explicitly state quantifiable "acceptance criteria" for the AI features (Deep Resolve Boost and Deep Resolve Sharp) that were used for regulatory submission. Instead, it describes general successful evaluation methods:

Deep Resolve Boost

Acceptance Criteria (Inferred/Methods Used):
  • Successful passing of quality metrics tests (PSNR, SSIM)
  • Visual inspection to detect potential artifacts
  • Evaluation in clinical settings with cooperation partners
  • No misinterpretation, alteration, suppression, or introduction of anatomical information reported

Reported Device Performance (Summary):
  • Impact characterized by PSNR and SSIM; visual inspection conducted for artifacts.
  • Evaluated in clinical settings with cooperation partners.
  • Seven peer-reviewed publications (427 patients on 1.5T and 3T systems, covering prostate, abdomen, liver, knee, hip, ankle, shoulder, hand, and lumbar spine).
  • Publications concluded beneficial use for clinical routine imaging.
  • No reported cases of misinterpreted, altered, suppressed, or introduced anatomical information.
  • Significant time savings reported in most cases by enabling faster image acquisition.

Deep Resolve Sharp

Acceptance Criteria (Inferred/Methods Used):
  • Successful passing of quality metrics tests (PSNR, SSIM, perceptual loss)
  • In-house visual rating
  • Evaluation of image sharpness by intensity profile comparisons of reconstructions with and without Deep Resolve Sharp

Reported Device Performance (Summary):
  • Impact characterized by PSNR, SSIM, and perceptual loss.
  • Verified and validated by in-house tests, including visual rating and evaluation of image sharpness by intensity profile comparisons.
  • Both tests showed increased edge sharpness.
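The submission names PSNR and SSIM but gives no formulas or thresholds. For context, the sketch below shows how these standard image-similarity metrics are conventionally computed with NumPy; the function names `psnr` and `ssim_global` are illustrative, and this SSIM variant uses a single global window rather than the sliding Gaussian window of the original formulation.

```python
import numpy as np

def psnr(ref, test, data_range=None):
    """Peak signal-to-noise ratio in dB between reference and test images."""
    ref, test = ref.astype(np.float64), test.astype(np.float64)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - test) ** 2)  # mean squared error
    return 10 * np.log10(data_range ** 2 / mse)

def ssim_global(ref, test, data_range=None):
    """Simplified SSIM computed over one global window.

    Production implementations (e.g., scikit-image) average SSIM
    over local sliding windows instead.
    """
    ref, test = ref.astype(np.float64), test.astype(np.float64)
    if data_range is None:
        data_range = ref.max() - ref.min()
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the
    c2 = (0.03 * data_range) ** 2  # standard SSIM definition
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov_xy = np.mean((ref - mu_x) * (test - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Example: compare a synthetic "ground truth" with a noisy version of it.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = np.clip(ref + 0.05 * rng.standard_normal((64, 64)), 0, 1)
print(psnr(ref, noisy), ssim_global(ref, noisy))
```

Higher PSNR (in dB) and SSIM closer to 1 indicate closer agreement with the reference; an acceptance criterion would fix a threshold on such values, which this document does not do.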

2. Sample size used for the test set and the data provenance:

The document describes "training" and "validation" datasets but does not identify a separate "test set" for regulatory evaluation with clear sample sizes for that purpose. The "Test statistics and test results" section refers to general evaluations and published studies.

  • "Validation" Datasets (internal validation, not explicitly a regulatory test set):
    • Deep Resolve Boost: 1,874 2D slices
    • Deep Resolve Sharp: 2,057 2D slices
  • Data Provenance (Training/Validation):
    • Source: For Deep Resolve Boost: "in-house measurements and collaboration partners." For Deep Resolve Sharp: "in-house measurements."
    • Origin: Not specified by country.
    • Retrospective/Prospective: "Input data was retrospectively created from the ground truth by data manipulation and augmentation" (for Boost) and "retrospectively created from the ground truth by data manipulation" (for Sharp). This implies the underlying acquired datasets were retrospective.
  • "Clinical Settings" / Publications (Implied real-world evaluation, not a regulatory test set):
    • Deep Resolve Boost: "a total of seven peer-reviewed publications" covering 427 patients
    • Data Provenance: Not specified by origin or retrospective/prospective for these external evaluations.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

This information is not provided in the document. It mentions "visual inspection" and "visual rating," but does not detail the number or qualifications of experts involved in these processes for the "validation" sets or any dedicated regulatory "test set." For the "seven peer-reviewed publications," the expertise of the authors is implied but not detailed as part of the regulatory submission.

4. Adjudication method (e.g., 2+1, 3+1, none) for the test set:

This information is not provided in the document.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance:

A formal MRMC comparative effectiveness study demonstrating human reader improvement with AI assistance is not described in this document. The document focuses on the technical performance of the AI features themselves and their general clinical utility as reported in external publications (e.g., faster imaging, no misinterpretation), but not a comparative study of human performance with and without the AI.

6. If standalone (i.e., algorithm-only, without human-in-the-loop) performance testing was done:

Yes. The "Test statistics and test results" sections for both Deep Resolve Boost and Deep Resolve Sharp describe evaluation of the algorithms' performance using quality metrics (PSNR, SSIM, perceptual loss) and visual/intensity-profile comparisons, which implies standalone algorithm evaluation. However, no quantifiable results for these metrics are reported as acceptance criteria; the document states only that the tests were passed and that Deep Resolve Sharp showed increased edge sharpness.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc):

The ground truth for the AI training and validation datasets is described as:

  • Deep Resolve Boost: "The acquired datasets represent the ground truth for the training and validation. Input data was retrospectively created from the ground truth by data manipulation and augmentation." This implies that the original, full-quality MR images serve as the ground truth.
  • Deep Resolve Sharp: "The acquired datasets represent the ground truth for the training and validation. Input data was retrospectively created from the ground truth by data manipulation." Similarly, the original, high-resolution MR images are the ground truth.

This indicates the ground truth is derived directly from the originally acquired (presumably high-quality/standard) MRI data, rather than an independent clinical assessment like pathology or expert consensus. The AI's purpose is to reconstruct a high-quality image from manipulated or undersampled input, so the "truth" is the original high-quality image.

8. The sample size for the training set:

  • Deep Resolve Boost: 24,599 2D slices
  • Deep Resolve Sharp: 11,920 2D slices

Note that the document states: "due to reasons of data privacy, we did not record how many individuals the datasets belong to. Gender, age and ethnicity distribution was also not recorded during data collection."

9. How the ground truth for the training set was established:

As described in point 7:

  • Deep Resolve Boost: The "acquired datasets" (original, full-quality MR images) served as the ground truth. Input data for the AI model was then "retrospectively created from the ground truth by data manipulation and augmentation," including undersampling, adding noise, and mirroring k-space data.
  • Deep Resolve Sharp: The "acquired datasets" (original MR images) served as the ground truth. Input data was "retrospectively created from the ground truth by data manipulation," specifically by cropping k-space data so only the center part was used as low-resolution input, with the original full data as the high-resolution output/ground truth.
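The k-space cropping described for Deep Resolve Sharp can be illustrated with a small NumPy sketch. The function name `lowres_from_ground_truth` and the `keep_fraction` parameter are hypothetical; the actual cropping parameters and pipeline are not disclosed in the submission.

```python
import numpy as np

def lowres_from_ground_truth(image, keep_fraction=0.5):
    """Simulate a low-resolution training input by keeping only the
    central region of k-space, with the original image serving as the
    high-resolution ground truth (illustrative sketch only)."""
    # Transform to k-space and center the DC component.
    k = np.fft.fftshift(np.fft.fft2(image))
    ny, nx = k.shape
    # Zero out everything outside a central rectangle of k-space.
    mask = np.zeros_like(k)
    cy, cx = ny // 2, nx // 2
    hy, hx = int(ny * keep_fraction / 2), int(nx * keep_fraction / 2)
    mask[cy - hy:cy + hy, cx - hx:cx + hx] = 1
    # Back to image space; the result is a blurred, low-resolution input.
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k * mask)))

# Example training pair: (low-resolution input, original ground truth).
rng = np.random.default_rng(0)
ground_truth = rng.random((32, 32))
lowres_input = lowres_from_ground_truth(ground_truth, keep_fraction=0.5)
```

Because both halves of each training pair are derived from one acquisition, no separate expert annotation is required: the network learns to map the cropped (low-resolution) input back toward the fully sampled original.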

§ 892.1000 Magnetic resonance diagnostic device.

(a) Identification. A magnetic resonance diagnostic device is intended for general diagnostic use to present images which reflect the spatial distribution and/or magnetic resonance spectra which reflect frequency and distribution of nuclei exhibiting nuclear magnetic resonance. Other physical parameters derived from the images and/or spectra may also be produced. The device includes hydrogen-1 (proton) imaging, sodium-23 imaging, hydrogen-1 spectroscopy, phosphorus-31 spectroscopy, and chemical shift imaging (preserving simultaneous frequency and spatial information).

(b) Classification. Class II (special controls). A magnetic resonance imaging disposable kit intended for use with a magnetic resonance diagnostic device only is exempt from the premarket notification procedures in subpart E of part 807 of this chapter subject to the limitations in § 892.9.