The SubtleSYNTH device is a software as a medical device consisting of a machine learning software algorithm that synthesizes a SynthSTIR contrast image of a case from T1-weighted and T2-weighted MR images. It is post-processing software that does not directly interact with the MR scanner. Once an MR scan is acquired, a technologist sends the study from the scanner to a compatible medical device data system (MDDS) via the DICOM protocol. The compatible MDDS then makes the images available to SubtleSYNTH for processing.
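To make the hand-off concrete, here is a minimal sketch of the kind of DICOM C-STORE push described above, written with the open-source pydicom/pynetdicom libraries. The host, port, AE titles, and file name are hypothetical placeholders; the summary does not describe the actual MDDS integration:

```python
# Minimal sketch: push one MR image to a DICOM node (e.g., an MDDS) via C-STORE.
# Host, port, AE titles, and file name are placeholders, not SubtleSYNTH specifics.
from pydicom import dcmread
from pynetdicom import AE
from pynetdicom.sop_class import MRImageStorage

ae = AE(ae_title="SCANNER")
ae.add_requested_context(MRImageStorage)

# Associate with the receiving node (values below are hypothetical).
assoc = ae.associate("mdds.example.local", 11112, ae_title="MDDS")
if assoc.is_established:
    ds = dcmread("t1_sag_001.dcm")      # one slice of the acquired study
    status = assoc.send_c_store(ds)     # 0x0000 indicates success
    print(f"C-STORE status: 0x{status.Status:04X}")
    assoc.release()
else:
    print("Association rejected or destination unreachable")
```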
SubtleSYNTH uses a convolutional network-based algorithm to synthesize an image with desired contrast weighting from other, previously obtained sequences such as T1- and T2-weighted images. The image processing can be performed on MRI images with predefined or specific acquisition protocol settings.
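The 510(k) summary does not disclose the network architecture, so the following is purely an illustrative PyTorch sketch of the general technique: a convolutional network that maps channel-stacked T1- and T2-weighted inputs to a single synthesized contrast image.

```python
# Illustrative only: a toy convolutional network mapping a 2-channel input
# (T1w, T2w slices stacked channel-wise) to a 1-channel synthesized image.
# The actual SubtleSYNTH architecture is not described in the 510(k) summary.
import torch
import torch.nn as nn

class ToySynthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):                  # x: (batch, 2, H, W)
        return self.net(x)                 # (batch, 1, H, W) synthesized contrast

t1 = torch.rand(1, 1, 256, 256)            # placeholder T1w slice
t2 = torch.rand(1, 1, 256, 256)            # placeholder T2w slice
synth = ToySynthNet()(torch.cat([t1, t2], dim=1))
print(synth.shape)                          # torch.Size([1, 1, 256, 256])
```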
The SynthSTIR image is created by SubtleSYNTH and sent back to the picture archiving and communication system (PACS) or other DICOM node by the compatible MDDS for clinical review.
Because the software runs in the background, it has no user interface. It is intended to be used by radiologists in an imaging center, clinic, or hospital.
Note that, depending on the functionality of the compatible MDDS, SubtleSYNTH can be used within the facility's network or remotely. The SubtleSYNTH device itself is not networked and therefore does not increase its users' cybersecurity risk. Cybersecurity recommendations are provided to users in the labeling.
Acceptance Criteria and Device Performance Study for SubtleSYNTH (1.x)
This summary outlines the acceptance criteria and the study demonstrating that the SubtleSYNTH (1.x) device meets these criteria, based on the provided 510(k) summary.
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criteria Category | Specific Acceptance Criteria | Reported Device Performance | Study |
|---|---|---|---|
| Quantitative Image Fidelity | Root Mean Square Error (RMSE) between reference STIR and SynthSTIR below a defined threshold; cosine similarity above a defined threshold. | All elements > 0.9 (cosine similarity) | Bench Study |
| Quantitative Image Fidelity | Bland-Altman analysis for bias (mean intensity difference between SynthSTIR and acquired STIR) for key tissues (Bone, Disc, CSF, Spinal Cord, Fat) showing no significant bias. | Bias samples randomly distributed near the zero line with no trend; 99% CI analysis implies no significant bias. | Bench Study |
| Interchangeability (Primary Endpoint) | Interchangeability between acquired STIR and SynthSTIR images not significantly greater than 10%. | Study A: interchangeability = 2.12% (95% CI [-1.31%, 5.88%]); Study B: interchangeability = 0.63% (95% CI [-4.19%, 5.9%]) | Interchangeability Studies (A & B) |
| Interchangeability Sub-analyses | Primary endpoint met for all Primary Categories (Degenerative, Infection, Trauma, Cord Lesion, Non-cord Lesion, Vascular, Hemorrhage, Normal) and all scanner vendors. | Primary endpoint met for all Primary Categories and all scanner vendors in both Study A and Study B. | Interchangeability Studies (A & B) |
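For illustration, the bench-study fidelity metrics named above (RMSE and cosine similarity) can be computed for a SynthSTIR/acquired-STIR pair along the following lines. This is a sketch only; the summary does not specify intensity normalization or the threshold values.

```python
# Sketch of the quantitative fidelity metrics named in the bench study:
# RMSE and cosine similarity between an acquired STIR and a SynthSTIR image.
# Normalization and thresholds are not specified in the summary.
import numpy as np

def rmse(reference: np.ndarray, synthesized: np.ndarray) -> float:
    """Root mean square error over all voxels."""
    return float(np.sqrt(np.mean((reference - synthesized) ** 2)))

def cosine_similarity(reference: np.ndarray, synthesized: np.ndarray) -> float:
    """Cosine of the angle between the flattened image vectors."""
    a, b = reference.ravel(), synthesized.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

acquired_stir = np.random.rand(256, 256)    # placeholder images
synth_stir = acquired_stir + 0.01 * np.random.randn(256, 256)
print(rmse(acquired_stir, synth_stir), cosine_similarity(acquired_stir, synth_stir))
```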
2. Sample Size for Test Set and Data Provenance
- Quantitative Bench Study Test Set:
- Sample Size: 80 acquired studies.
- Data Provenance: Retrospective, sourced from clinical sites/hospitals in California, USA, and New York, USA.
- MRI Scanners: GE, Fonar, Philips, Siemens, Toshiba. Field strengths: 0.3T, 0.6T, 1.0T, 1.5T (42 series), 3T (35 series).
- Patient Demographics (of the 80 studies): Ages 16-89 years, 40 females, 36 males, 4 unknown sex. Ethnicity unknown due to anonymization.
- Clinical Categories: 8 cord lesions, 20 degenerative diseases, 10 infections, 15 non-cord lesions, 17 trauma, 10 normal series.
- Interchangeability Studies (A & B) Test Set:
- Sample Size: 104 cases (common to both studies).
- Data Provenance: Retrospective; selected from 269 gathered cases sourced from populations in California, USA, and New York, USA.
- MRI Scanners: GE, Hitachi, Philips, Siemens, Toshiba. Field strengths: 0.3T (6 cases), 1.5T (56 cases), 3T (42 cases).
- Patient Demographics (of the 104 cases): Ages 1-89 years, 51 females, 53 males. Ethnicity unknown due to anonymization.
- Clinical Categories: 12 cord lesions, 12 degenerative disease, 12 hemorrhage, 12 infection, 12 non-cord lesions, 12 normal, 12 vascular, 20 trauma.
3. Number of Experts for Ground Truth and Qualifications
- Quantitative Bench Study:
- Number of Experts: Not explicitly stated for ROI labeling. However, "an in-house radiologist" assigned clinical categories to collected images.
- Qualifications: "in-house radiologist" (specific experience level not provided in the document).
- Interchangeability Studies (A & B):
- Number of Experts: Not explicitly stated for ground truth establishment. "an in-house radiologist" assigned clinical categories to the 104 cases for case selection.
- Qualifications: "in-house radiologist" (specific experience level not provided). The studies involved human readers for image interpretation, but their classifications were compared against each other and across the acquired STIR and SynthSTIR images, not against a definitive expert-established ground truth of pathology for reader performance evaluation.
4. Adjudication Method for the Test Set
The document does not explicitly describe an adjudication method (e.g., 2+1 or 3+1) for establishing a definitive ground truth for the test set before the readers evaluated the images in the interchangeability studies.
In the interchangeability studies, readers themselves were making classifications, and their consistency across image types (acquired STIR vs. SynthSTIR) was evaluated. The primary endpoint focused on the interchangeability (disagreement rate) between interpretations derived from SynthSTIR versus acquired STIR, rather than comparing reader interpretations to a gold standard ground truth of disease presence/absence.
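The summary does not specify the estimator behind the reported percentages. One plausible reading, sketched below purely for illustration, treats interchangeability as the excess disagreement introduced when a reader's SynthSTIR interpretation replaces their acquired-STIR interpretation, relative to ordinary inter-reader disagreement on acquired STIR, with a percentile-bootstrap confidence interval. All reads below are simulated placeholders, not study data.

```python
# Rough sketch (not the study's actual estimator, which the summary does not
# specify): interchangeability as excess disagreement when one reader's
# SynthSTIR read replaces their acquired-STIR read, relative to ordinary
# inter-reader disagreement on acquired STIR, with a percentile bootstrap CI.
import numpy as np

rng = np.random.default_rng(0)
n_cases = 104
# Placeholder categorical reads: reader 1 and reader 2 on STIR, reader 1 on SynthSTIR.
r1_stir = rng.integers(0, 8, n_cases)
r2_stir = np.where(rng.random(n_cases) < 0.9, r1_stir, rng.integers(0, 8, n_cases))
r1_synth = np.where(rng.random(n_cases) < 0.95, r1_stir, rng.integers(0, 8, n_cases))

def interchangeability(idx: np.ndarray) -> float:
    cross = np.mean(r1_synth[idx] != r2_stir[idx])   # SynthSTIR vs other reader
    within = np.mean(r1_stir[idx] != r2_stir[idx])   # STIR vs other reader
    return 100.0 * (cross - within)                  # excess disagreement, in %

idx = np.arange(n_cases)
point = interchangeability(idx)
boot = [interchangeability(rng.choice(idx, n_cases)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"interchangeability = {point:.2f}% (95% CI [{lo:.2f}%, {hi:.2f}%])")
```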
For the assignment of primary and secondary categories by readers, they were provided recommendations on how to prioritize conditions when more than one was present. There is no mention of an adjudication process if readers disagreed on these classifications when making their assessments.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No explicit MRMC comparative effectiveness study (comparing human readers with AI assistance vs. without AI assistance) was described.
The "interchangeability studies" (Study A and Study B) involved multiple readers evaluating two different image modalities (SynthSTIR vs. acquired STIR). These studies assessed if readers could make similar classifications using the synthesized images as they could with the acquired images. This is a form of reader study, but it is not framed as an AI-assisted vs. unassisted reader study to determine an effect size of improvement by AI assistance. Instead, it aims to demonstrate that the AI-generated images can be used interchangeably with gold-standard images for diagnostic classification.
6. Standalone Performance Study
Yes, a standalone performance study was done.
The quantitative bench testing section describes an "algorithm only" or "standalone" performance evaluation of SubtleSYNTH. This involved:
- Comparing the SynthSTIR output directly against the acquired STIR for RMSE, cosine similarity, and Bland-Altman analysis (a Bland-Altman sketch follows this list).
- This evaluation focused on the intrinsic image quality and fidelity of the synthesized images produced by the algorithm.
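As referenced in the list above, here is a sketch of the Bland-Altman portion: per-tissue bias as the mean intensity difference (SynthSTIR minus acquired STIR) within an ROI. The ROI mask and images are random placeholders; the summary does not describe how ROIs were drawn.

```python
# Sketch of a Bland-Altman bias check for one tissue ROI (e.g., CSF):
# mean intensity difference (SynthSTIR - acquired STIR) against the pairwise
# mean. Images and ROI mask here are random placeholders.
import numpy as np

def bland_altman(acquired: np.ndarray, synthesized: np.ndarray, mask: np.ndarray):
    a, s = acquired[mask], synthesized[mask]
    diff, mean = s - a, (s + a) / 2.0
    bias = float(diff.mean())
    loa = 1.96 * float(diff.std())          # limits of agreement around the bias
    return mean, diff, bias, (bias - loa, bias + loa)

rng = np.random.default_rng(1)
acq = rng.random((256, 256))
syn = acq + 0.005 * rng.standard_normal((256, 256))
csf_mask = rng.random((256, 256)) < 0.02    # placeholder ROI for one tissue
_, _, bias, limits = bland_altman(acq, syn, csf_mask)
print(f"bias = {bias:.4f}, limits of agreement = {limits}")
```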
7. Type of Ground Truth Used
- Quantitative Bench Study: The "ground truth" for the quantitative assessment was the acquired STIR image from the MRI scanner. The SynthSTIR image was compared against this acquired STIR image (which is considered the clinical standard for STIR imaging).
- Interchangeability Studies (A & B): The "ground truth" for evaluating interchangeability was not a definitive disease diagnosis (e.g., pathology report). Instead, the studies assessed the agreement or interchangeability of classifications made by radiologists when viewing SynthSTIR images compared to when viewing acquired STIR images. An in-house radiologist assigned initial clinical categories to cases for selection, but this wasn't detailed as definitive ground truth for individual lesion diagnosis.
8. Sample Size for the Training Set
- Training Set Sample Size: 424 cases.
9. How the Ground Truth for the Training Set Was Established
The document states that the training dataset consists of "Sag T1w, Sag T2w, and Sag STIR images." It implies that the Sag STIR images serve as the ground truth or target for the SubtleSYNTH algorithm to learn how to synthesize a SynthSTIR image from Sag T1w and Sag T2w inputs.
The training data was collected from a variety of sources:
- MRI Scanners: GE, Hitachi, Philips, and Siemens. Field strengths: 0.7T, 1T, 1.2T, 1.5T (254 cases), 2T, 3T (160 cases).
- Patient Demographics: Ages 14-89 years, 193 females, 176 males, 55 unknown sex. Ethnicity unknown.
- Provenance: Sourced from populations throughout the USA.
In this context, "establishing ground truth" for the training set largely means having a reliably acquired set of gold-standard STIR images that the model learns to replicate from the other input sequences. There is no mention of expert labeling or pathology for individual findings within the training set; rather, the acquired STIR image itself acts as the reference truth for the synthesis task.
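As a sketch of this supervised setup: the acquired Sag STIR acts as the regression target for a network fed channel-stacked Sag T1w/Sag T2w inputs. The loss (L1 here), optimizer, and stand-in network are assumptions for illustration only; none are disclosed in the summary.

```python
# Illustrative training sketch: the acquired STIR acts as the regression target
# for a network fed stacked T1w/T2w inputs. Loss choice (L1), optimizer, and
# network are assumptions; the 510(k) summary does not disclose them.
import torch
import torch.nn as nn

model = nn.Sequential(                      # stand-in for the real network
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
l1 = nn.L1Loss()

for step in range(3):                       # placeholder batches
    t1t2 = torch.rand(4, 2, 128, 128)       # stacked Sag T1w + Sag T2w slices
    stir = torch.rand(4, 1, 128, 128)       # acquired Sag STIR = training target
    loss = l1(model(t1t2), stir)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: L1 loss = {loss.item():.4f}")
```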
§ 892.2050 Medical image management and processing system.
(a) Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b) Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).