Sonic DL is a deep-learning-based image reconstruction technique available on GE Healthcare 1.5T and 3.0T MR systems. It reconstructs MR images from highly under-sampled data, thereby enabling highly accelerated acquisitions, and is intended for cardiac imaging in patients of all ages.
Sonic DL is a new software feature for GE Healthcare MR systems. It consists of a deep-learning-based reconstruction algorithm applied to data from MR cardiac cine exams acquired with a highly accelerated acquisition technique.
Sonic DL is an optional feature, integrated into the MR system software and activated through a purchasable software option key.
Here's a breakdown of the acceptance criteria and the study details for the Sonic DL device, based on the provided document:
Sonic DL Acceptance Criteria and Study Details
1. Table of Acceptance Criteria and Reported Device Performance
The document describes the performance of Sonic DL in comparison to conventional ASSET Cine images. While explicit numerical acceptance criteria for regulatory clearance are not stated, the studies aim to demonstrate non-inferiority or superiority in certain aspects. The implicit acceptance criteria are:
- Diagnostic Quality: Sonic DL images must be rated as being of diagnostic quality.
- Functional Measurement Agreement: Functional cardiac measurements (e.g., LV volumes, EF, CO) from Sonic DL images must agree closely with those from conventional ASSET Cine images, ideally within typical inter-reader variability.
- Reduced Scan Time: Sonic DL must provide significantly shorter scan times.
- Preserved Image Quality: Image quality must be preserved despite higher acceleration.
- Single Heartbeat Imaging (Functional): Enable functional imaging in a single heartbeat.
- Rapid Free-Breathing Functional Imaging: Enable rapid functional imaging without breath-holds.
| Implicit Acceptance Criterion | Reported Device Performance |
|---|---|
| Diagnostic Quality | "on average the Sonic DL images were rated as being of diagnostic quality" (second reader study). |
| Functional Measurement Agreement | "the inter-method variability (coefficient of variability comparing functional measurements taken with Sonic DL images versus measurements using the conventional ASSET Cine images) was smaller than the inter-observer intra-method variability for the conventional ASSET Cine images for all parameters, indicating that Sonic DL is suitable for performing functional cardiac measurements" (first reader study). "Functional measurements using Sonic DL 1 R-R free breathing images from 10 subjects were compared to functional measurements using the conventional ASSET Cine breath hold images, and showed close agreement" (additional clinical testing for 1 R-R free breathing). |
| Reduced Scan Time | "providing a significant reduction in scan time compared to the conventional ASSET Cine images" (second reader study). "the Sonic DL feature provided significantly shorter scan times than the conventional Cine imaging" (overall conclusion). |
| Preserved Image Quality | "capable of reconstructing Cine images from highly under sampled data that are similar to the fully sampled Cine images in terms of image quality and temporal sharpness" (nonclinical testing). "the image quality of 13 Sonic DL 1 R-R free breathing cases was evaluated by a U.S. board certified radiologist, and scored higher than the corresponding conventional free breathing Cine images from the same subjects" (additional clinical testing for 1 R-R free breathing). |
| Single Heartbeat Functional Imaging | "Sonic DL is capable of achieving a 12 times acceleration factor and obtaining free-breathing images in a single heartbeat (1 R-R)" (additional clinical testing). |
| Rapid Free-Breathing Functional Imaging | "Sonic DL is capable of... obtaining free-breathing images in a single heartbeat (1 R-R)" (additional clinical testing). |
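The Functional Measurement Agreement criterion hinges on a coefficient of variability: agreement between the two methods must be tighter than agreement between readers using the conventional method alone. A minimal sketch of that comparison, assuming a Bland-Altman-style paired coefficient of variation; the function name and all ejection-fraction values are illustrative, not from the document:

```python
import numpy as np

def coefficient_of_variation(a, b):
    """Paired coefficient of variation (%): within-pair standard deviation
    of two sets of measurements on the same cases, divided by the overall mean."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b
    within_sd = np.sqrt(np.mean(diff ** 2) / 2)  # RMS of differences / sqrt(2)
    return 100 * within_sd / np.mean((a + b) / 2)

# Hypothetical illustrative data (not from the document):
ef_sonic_r1 = [55.0, 62.0, 48.0]   # EF (%) from Sonic DL images, reader 1
ef_asset_r1 = [54.0, 63.0, 49.0]   # EF from conventional ASSET Cine, reader 1
ef_asset_r2 = [56.0, 60.0, 47.0]   # EF from conventional ASSET Cine, reader 2

inter_method = coefficient_of_variation(ef_sonic_r1, ef_asset_r1)
inter_observer = coefficient_of_variation(ef_asset_r1, ef_asset_r2)
# Acceptance logic reported in the study: inter_method < inter_observer
```

With this framing, the new method "passes" when swapping methods perturbs the measurement less than swapping readers does.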
2. Sample Size Used for the Test Set and Data Provenance
The document describes two primary reader evaluation studies and additional clinical testing.
- First Reader Study (Functional Measurements):
- Sample Size: 107 image series from 57 unique subjects (46 patients, 11 healthy volunteers).
  - Data Provenance: Data from 7 sites: 2 GE Healthcare facilities and 5 external clinical collaborators, indicating a multi-center data set. The geographic origin of the sites and whether collection was prospective or retrospective are not explicitly stated.
- Second Reader Study (Image Quality Assessment):
- Sample Size: 127 image sets, which included a subset of the subjects from the first study.
- Data Provenance: Same as the first reader study (clinical sites and healthy volunteers at GE Healthcare facilities).
- Additional Clinical Testing (1 R-R Free Breathing):
- Functional Measurements: 10 subjects.
- Image Quality Evaluation: 13 subjects.
- Data Provenance: In vivo cardiac cine images from 19 healthy volunteers. This implies prospective collection or a subset of prospectively collected healthy volunteer data.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- First Reader Study (Functional Measurements): Three radiologists. Qualifications are not explicitly stated, but their role in making quantitative measurements implies expertise in cardiac MRI.
- Second Reader Study (Image Quality Assessment): Three radiologists. Qualifications are not explicitly stated, but their role in blinded image quality assessments implies expertise in cardiac MRI interpretation.
- Additional Clinical Testing (1 R-R Free Breathing Image Quality): One U.S. board certified radiologist.
4. Adjudication Method for the Test Set
The document does not explicitly state an adjudication method (like 2+1, 3+1, or none) for either the functional measurements or the image quality assessments. For the first study, it mentions "inter-method variability" and "inter-observer intra-method variability," suggesting that the readings from the three radiologists were compared against each other and against the conventional method, but not necessarily adjudicated to establish a single "ground truth" per case. For the second study, "blinded image quality assessments" were performed, and ratings were averaged, but no adjudication process is described.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done, and the effect size
A clear MRMC comparative effectiveness study, in the sense of measuring human reader improvement with AI vs. without AI assistance, is not explicitly described.
The studies compare the performance of Sonic DL images (algorithm output) against conventional images, with human readers evaluating both.
- The first reader study compares quantitative measurements from Sonic DL images to those from conventional images: the inter-method variability was smaller than the inter-observer intra-method variability for the conventional images, indicating that Sonic DL measurements are at least as consistent as repeated conventional measurements across readers.
- The second reader study involves blinded image quality assessments of both conventional and Sonic DL images, confirming that Sonic DL images were rated as diagnostic quality.
- The additional clinical testing for 1 R-R free breathing shows that Sonic DL images were "scored higher than the corresponding conventional free breathing Cine images" by a U.S. board-certified radiologist.
These are comparisons of the image quality and output from the AI system versus conventional imaging, interpreted by readers, rather than measuring human reader performance assisted by the AI system.
Therefore, the effect size of how much human readers improve with AI vs. without AI assistance is not provided because the studies were designed to evaluate the image output quality and measurement agreement of the AI-reconstructed images themselves, not to assess an AI-assisted workflow for human readers.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
Yes, standalone performance was assessed for image quality metrics.
- Nonclinical Testing: "Model accuracy metrics such as Peak-Signal-to-Noise (PSNR), Root-Mean-Square Error (RMSE), Structural Similarity Index Measure (SSIM), and Mean Absolute Error (MAE) were used to compare simulated Sonic DL images with different levels of acceleration and numbers of phases to the fully sampled images." This is a standalone evaluation of the algorithm's output quality against a reference.
- In Vivo Testing: "model accuracy and temporal sharpness evaluations were conducted using in vivo cardiac cine images obtained from 19 healthy volunteers." This is also a standalone technical evaluation of the algorithm's output on real data.
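The four model accuracy metrics named in the nonclinical testing all compare a reconstruction against a fully sampled reference image. A minimal sketch of how they are computed, assuming single-channel magnitude images; the SSIM here is a global single-window version for brevity, whereas standard implementations average over local sliding windows:

```python
import numpy as np

def rmse(ref, img):
    """Root-Mean-Square Error between reference and reconstruction."""
    return np.sqrt(np.mean((ref - img) ** 2))

def mae(ref, img):
    """Mean Absolute Error."""
    return np.mean(np.abs(ref - img))

def psnr(ref, img):
    """Peak Signal-to-Noise Ratio in dB, using the reference's peak value."""
    mse = np.mean((ref - img) ** 2)
    return 10 * np.log10(ref.max() ** 2 / mse)

def ssim(ref, img):
    """Global (single-window) SSIM; real pipelines use local windows."""
    L = ref.max()                              # dynamic range
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # standard stabilizing constants
    mu_x, mu_y = ref.mean(), img.mean()
    var_x, var_y = ref.var(), img.var()
    cov = np.mean((ref - mu_x) * (img - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Higher PSNR and SSIM, and lower RMSE and MAE, indicate that the accelerated reconstruction is closer to the fully sampled reference.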
7. The Type of Ground Truth Used
- Nonclinical Testing (Simulated Data): The ground truth was the "fully sampled images" generated from an MRXCAT phantom and a digital phantom.
- Clinical Testing (Reader Studies):
- Functional Measurements: The "ground truth" for comparison was the measurements taken from the "conventional ASSET Cine images." The variability of these conventional measurements across readers also served as a baseline for comparison. This is a form of clinical surrogate ground truth (comparing to an established accepted method).
- Image Quality Assessments: The "ground truth" was the expert consensus/opinion of the radiologists during their blinded assessments of diagnostic quality.
- Additional Clinical Testing (1 R-R Free Breathing): Functional measurements were compared to "conventional ASSET Cine breath hold images" (clinical surrogate ground truth). Image quality was based on the scoring by a "U.S. board certified radiologist" (expert opinion).
No pathology or outcomes data were used as ground truth. The ground truth in the clinical setting was primarily based on established imaging techniques (conventional MR) and expert radiologist assessments.
8. The Sample Size for the Training Set
The document does not explicitly state the sample size for the training set used for the deep learning model. It only describes the data used for testing the device.
9. How the Ground Truth for the Training Set Was Established
Since the training set size is not provided, the method for establishing its ground truth is also not described in the provided text. Typically, for deep learning reconstruction, the "ground truth" for training often involves fully sampled or high-quality reference images corresponding to the undersampled input data.
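While the submission gives no training details, the common pattern this paragraph alludes to can be sketched: retrospectively under-sample a fully sampled acquisition so that each fully sampled image serves as its own training target. Everything below is an illustrative assumption, not a description of GE's pipeline; the function name, acceleration factor, and calibration-region size are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(image, acceleration=12, center_lines=8):
    """Build one (under-sampled input, fully sampled target) pair by
    retrospectively masking phase-encode lines in k-space.
    `image` is a 2-D magnitude image; returns (zero-filled input, target)."""
    kspace = np.fft.fftshift(np.fft.fft2(image))
    ny = kspace.shape[0]
    # Randomly keep roughly 1/acceleration of the phase-encode lines ...
    mask = np.zeros(ny, dtype=bool)
    mask[rng.choice(ny, size=max(1, ny // acceleration), replace=False)] = True
    # ... plus a fully sampled low-frequency (calibration) band in the center.
    c = ny // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True
    under = np.where(mask[:, None], kspace, 0)
    zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(under)))
    return zero_filled, image  # network input, training target
```

Under this scheme the "ground truth" label requires no manual annotation: it is simply the image reconstructed from the complete k-space data.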
§ 892.1000 Magnetic resonance diagnostic device.
(a) Identification. A magnetic resonance diagnostic device is intended for general diagnostic use to present images which reflect the spatial distribution and/or magnetic resonance spectra which reflect frequency and distribution of nuclei exhibiting nuclear magnetic resonance. Other physical parameters derived from the images and/or spectra may also be produced. The device includes hydrogen-1 (proton) imaging, sodium-23 imaging, hydrogen-1 spectroscopy, phosphorus-31 spectroscopy, and chemical shift imaging (preserving simultaneous frequency and spatial information).
(b) Classification. Class II (special controls). A magnetic resonance imaging disposable kit intended for use with a magnetic resonance diagnostic device only is exempt from the premarket notification procedures in subpart E of part 807 of this chapter subject to the limitations in § 892.9.