K Number
K240697
Date Cleared
2024-09-09

(179 days)

Product Code
Regulation Number
892.2090
Panel
RA
Reference & Predicate Devices
Intended Use

See-Mode Augmented Reporting Tool, Thyroid (SMART-T) is a stand-alone reporting software to assist trained medical professionals in analyzing thyroid ultrasound images of adult (>=22 years old) patients who have been referred for an ultrasound examination.

Output of the device includes regions of interest (ROIs) placed on the thyroid ultrasound images assisting healthcare professionals to localize nodules in thyroid studies. The device also outputs ultrasonographic lexicon-based descriptors based on ACR TI-RADS. The software generates a report based on the image analysis results to be reviewed and approved by a qualified clinician after performing quality control.

SMART-T may also be used as a structured reporting software for further ultrasound studies. The software includes tools for reading measurements and annotations from the images that can be used for generating a structured report.

Patient management decisions should not be made solely on the basis of analysis by See-Mode Augmented Reporting Tool, Thyroid.

Device Description

See-Mode Augmented Reporting Tool, Thyroid (SMART-T) is a stand-alone, web-based image processing and reporting software for localization, characterization and reporting of thyroid ultrasound images.

The software analyzes thyroid ultrasound images and uses machine learning algorithms to extract specific information. The algorithms can identify and localize suspicious soft tissue nodules and also generate lexicon-based descriptors, which are classified according to ACR TI-RADS (composition, echogenicity, shape, margin, and echogenic foci) with a calculated TI-RADS level according to the ACR TI-RADS chart.
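The filing does not include the scoring logic, but the ACR TI-RADS chart it references is public: each descriptor maps to a point value, the points are summed, and the total maps to a risk level (TR1–TR5). A minimal sketch of that chart per the 2017 ACR white paper follows; the category strings, function name, and the handling of rare 1-point totals are illustrative conventions, not the device's internal implementation:

```python
# Minimal sketch of ACR TI-RADS scoring (2017 ACR white paper).
# Descriptor strings and the function name are illustrative, not the
# device's API. Echogenic foci points are additive; the other four
# descriptors each contribute a single value.

COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed cystic and solid": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very hypoechoic": 3}
SHAPE = {"wider-than-tall": 0, "taller-than-wide": 3}
MARGIN = {"smooth": 0, "ill-defined": 0, "lobulated or irregular": 2,
          "extra-thyroidal extension": 3}
ECHOGENIC_FOCI = {"none": 0, "comet-tail artifacts": 0, "macrocalcifications": 1,
                  "peripheral calcifications": 2, "punctate echogenic foci": 3}

def tirads_level(composition, echogenicity, shape, margin, foci):
    """Return (points, TR level) for one nodule's descriptors."""
    points = (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
              + SHAPE[shape] + MARGIN[margin]
              + sum(ECHOGENIC_FOCI[f] for f in foci))  # foci are summed
    # ACR chart cut-offs: 0 -> TR1, 2 -> TR2, 3 -> TR3, 4-6 -> TR4, >=7 -> TR5.
    # Mapping a (rare) total of 1 to TR2 is a convention chosen here.
    if points == 0:
        level = "TR1"
    elif points <= 2:
        level = "TR2"
    elif points == 3:
        level = "TR3"
    elif points <= 6:
        level = "TR4"
    else:
        level = "TR5"
    return points, level

# A solid, hypoechoic, wider-than-tall nodule with smooth margins and
# punctate echogenic foci: 2 + 2 + 0 + 0 + 3 = 7 points -> TR5
print(tirads_level("solid", "hypoechoic", "wider-than-tall", "smooth",
                   ["punctate echogenic foci"]))
```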

SMART-T may also be used as a structured reporting software for further ultrasound studies. The software includes tools for reading measurements and annotations from the images that can be used for generating a structured report.

The software then generates a report based on the image analysis results to be reviewed and approved by a qualified clinician after performing quality control. Any information within this report can be changed and modified by the clinician if needed during quality control and before finalizing the report.

The software runs on a standard "off-the-shelf" computer and can be accessed within the client web browser to perform the reporting of ultrasound images. Input data and images for the software are acquired through DICOM-compliant ultrasound imaging devices.
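The summary does not describe the ingestion code, but reading a DICOM-compliant ultrasound image with pydicom illustrates the kind of input handling involved. This is a sketch only; the file path and the sanity checks are assumptions:

```python
# Illustrative only: reading a DICOM ultrasound image with pydicom.
# The path and checks are assumptions, not the device's code.
import pydicom

ds = pydicom.dcmread("study/IMG0001.dcm")  # hypothetical file path

# Basic checks an ingestion layer might perform
assert ds.Modality == "US", "expected an ultrasound image"
pixels = ds.pixel_array  # NumPy array of the image frame(s)
print(ds.get("Manufacturer", "unknown"), pixels.shape)
```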

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study details for the See-Mode Augmented Reporting Tool, Thyroid (SMART-T) device, based on the provided text:

Acceptance Criteria and Device Performance

| Acceptance Criteria Category | Specific Metric | Acceptance Criteria (Explicitly Stated or Inferred) | Reported Device Performance (Aided) | Reported Device Performance (Unaided) | Standalone Performance (Algorithm Only) |
| --- | --- | --- | --- | --- | --- |
| Nodule Localization | AULROC (IOU > 0.5) | Improvement over unaided performance | 0.758 (0.711, 0.803) | 0.736 (0.693, 0.780) | 0.703 (0.642, 0.762) |
| Nodule Localization | AULROC (IOU > 0.6) | Improvement over unaided performance | 0.734 (0.682, 0.781) | 0.682 (0.632, 0.730) | N/A |
| Nodule Localization | AULROC (IOU > 0.7) | Improvement over unaided performance | 0.686 (0.629, 0.740) | 0.548 (0.490, 0.610) | N/A |
| Nodule Localization | AULROC (IOU > 0.8) | Improvement over unaided performance | 0.593 (0.529, 0.658) | 0.356 (0.293, 0.423) | N/A |
| Nodule Localization | Localization Accuracy (bounding box IOU > 0.5) | Superior to unaided performance | 95.6% (94.1, 97.0) | 93.6% (92.1, 95.0) | 95.1% |
| TI-RADS Descriptors | Composition Accuracy | Significant improvement over unaided performance | 84.9% (82.2, 87.5) | 80.4% (77.3, 83.4) | 86.7% |
| TI-RADS Descriptors | Echogenicity Accuracy | Significant improvement over unaided performance | 77.4% (74.4, 80.3) | 70.0% (67.0, 72.8) | 68.2% |
| TI-RADS Descriptors | Shape Accuracy | Significant improvement over unaided performance | 90.8% (88.2, 93.1) | 86.4% (83.7, 88.8) | 93.4% |
| TI-RADS Descriptors | Margin Accuracy | Significant improvement over unaided performance | 73.5% (70.2, 76.7) | 57.3% (53.3, 61.2) | 58.4% |
| TI-RADS Descriptors | Echogenic Foci Accuracy | Significant improvement over unaided performance | 75.2% (71.9, 78.5) | 71.1% (67.1, 74.9) | 70.3% |
| TI-RADS Level Agreement | Overall TI-RADS Level Agreement | Significant improvement over unaided performance | 60.0% (56.8, 63.3) | 51.1% (47.8, 54.5) | 63.8% (60.0, 67.7) |
| TI-RADS Level Agreement | TR-1 | Improvement over unaided performance | 59.0% (42.3, 74.9) | 52.9% (37.3, 68.3) | 61.9% (40.0, 82.6) |
| TI-RADS Level Agreement | TR-2 | Improvement over unaided performance | 38.1% (31.1, 45.6) | 31.2% (24.6, 38.1) | 41.1% (31.7, 50.4) |
| TI-RADS Level Agreement | TR-3 | Significant improvement over unaided performance | 68.9% (62.6, 74.9) | 58.8% (52.2, 65.4) | 71.7% (64.9, 78.3) |
| TI-RADS Level Agreement | TR-4 | Significant improvement over unaided performance | 61.4% (56.5, 66.3) | 52.1% (47.2, 57.0) | 65.5% (59.1, 71.6) |
| TI-RADS Level Agreement | TR-5 | Significant improvement over unaided performance | 71.3% (61.8, 80.5) | 62.0% (52.2, 71.5) | 77.0% (66.1, 87.3) |

Note: The acceptance criteria are largely inferred from the study's objective to demonstrate "superior performance," "significant improvement," and "consistent performance" compared to unaided reading, and "on-par" with aided use for standalone. Exact numerical thresholds for acceptance were not explicitly stated as distinct acceptance criteria.
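The localization metrics above count a detection as correct only when the intersection-over-union (IOU) between a predicted bounding box and the ground-truth box exceeds the stated threshold. A minimal sketch of that computation, with the box format and names assumed rather than taken from the filing:

```python
# Minimal IoU sketch for axis-aligned bounding boxes given as
# (x_min, y_min, x_max, y_max). Names and box format are assumptions.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes do not intersect)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

pred, truth = (10, 10, 50, 50), (20, 20, 60, 60)
print(iou(pred, truth) > 0.5)  # False: IoU ~0.39 misses the 0.5 threshold
```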


Study Details

2. Sample size used for the test set and the data provenance:

  • Test Set Sample Size: 600 cases from 600 unique patients.
  • Data Provenance: Retrospective collection of thyroid ultrasound images; 74% of the data was acquired in the United States. The cases in the MRMC study were sourced from institutions or sources not part of the model training or development datasets, to ensure generalizability.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

  • Number of Experts: Three. Two expert U.S. board-certified radiologists established the labels, and one additional U.S. board-certified radiologist served as adjudicator.
  • Qualifications: All were U.S. board-certified radiologists; the adjudicator was the one with "the most years of experience."

4. Adjudication method (e.g., 2+1, 3+1, none) for the test set:

  • Adjudication Method: 2+1 (two expert radiologists read each case independently, with a third expert radiologist adjudicating disagreements; see the sketch after this item). Specifically, the text states "consensus labels of two expert US-board certified radiologists and an adjudicator (also US-board certified radiologist with the most years of experience)."
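The 2+1 scheme is simple to express: the adjudicator's read is consulted only where the two primary readers disagree. A sketch assuming categorical labels per nodule (all names hypothetical):

```python
# Sketch of 2+1 adjudication for categorical labels (e.g., one TI-RADS
# descriptor per nodule). Reader and adjudicator inputs are hypothetical.

def adjudicate_2_plus_1(reader1, reader2, adjudicator):
    """Consensus where the two readers agree; the adjudicator breaks ties."""
    return [r1 if r1 == r2 else adj
            for r1, r2, adj in zip(reader1, reader2, adjudicator)]

print(adjudicate_2_plus_1(["solid", "mixed"], ["solid", "solid"], ["cystic", "mixed"]))
# -> ['solid', 'mixed']  (case 1 agreed; case 2 went to the adjudicator)
```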

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, if so, what was the effect size of how much human readers improve with AI vs without AI assistance:

  • MRMC Study Done: Yes.
  • Effect Size of Improvement (Aided vs. Unaided), computed as simple differences of the reported point estimates (see the sketch after this list):
    • AULROC (IOU > 0.5): +0.022 (0.758 aided vs. 0.736 unaided)
    • AULROC (IOU > 0.6): +0.052 (0.734 aided vs. 0.682 unaided)
    • AULROC (IOU > 0.7): +0.138 (0.686 aided vs. 0.548 unaided)
    • AULROC (IOU > 0.8): +0.237 (0.593 aided vs. 0.356 unaided)
    • Localization Accuracy: +2.0 percentage points (95.6% aided vs. 93.6% unaided)
    • TI-RADS Descriptor Accuracy, in percentage points:
      • Composition: +4.5 (84.9% vs. 80.4%)
      • Echogenicity: +7.4 (77.4% vs. 70.0%)
      • Shape: +4.4 (90.8% vs. 86.4%)
      • Margin: +16.2 (73.5% vs. 57.3%)
      • Echogenic Foci: +4.1 (75.2% vs. 71.1%)
    • Overall TI-RADS Level Agreement: +8.9 percentage points (60.0% vs. 51.1%)
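These deltas are plain differences of the point estimates above. The sketch below shows the arithmetic, plus a Wilson score interval one could attach to a single accuracy; the n = 600 and the 95% level are illustrative assumptions, since the excerpt does not state how the filing's confidence intervals were computed:

```python
# Sketch: aided-minus-unaided deltas and a Wilson score interval for a
# binomial proportion. n = 600 is an assumption (600 test cases), not
# the filing's stated CI method.
from math import sqrt

def wilson_ci(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Point estimates taken from the table above
aided = {"margin_accuracy": 0.735, "overall_tirads_agreement": 0.600}
unaided = {"margin_accuracy": 0.573, "overall_tirads_agreement": 0.511}

for key in aided:
    delta_pp = (aided[key] - unaided[key]) * 100
    lo, hi = wilson_ci(aided[key], n=600)
    print(f"{key}: +{delta_pp:.1f} pp; aided ~95% CI ({lo:.3f}, {hi:.3f})")
```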

6. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:

  • Standalone Study Done: Yes. The text explicitly states: "To evaluate the standalone performance of our device, where the output of the models are directly compared against ground truth labels."

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

  • Nodule Benign/Malignant Status: Established against a reference standard of fine needle aspiration (FNA) or, for benign cases, 2-year follow-up (pathology/outcomes data).
  • Localization, ACR TI-RADS Lexicon Descriptors, and TI-RADS Level Agreement: Expert consensus based on the labels of two expert US-board certified radiologists and an adjudicator.

8. The sample size for the training set:

  • The document states that the cases in the MRMC study were sourced from institutions or sources not part of the model training or development datasets. However, the specific sample size for the training set is not provided in the given text.

9. How the ground truth for the training set was established:

  • The document implies that the training data was distinct from the test set, but it does not explicitly describe how the ground truth for the training set was established. It only details the ground truth establishment for the test set used in the standalone and MRMC studies.

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.

(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.

(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.

(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use.

(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.

(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.

(iv) A detailed description of the device inputs and outputs.

(v) A detailed description of compatible imaging hardware and imaging protocols.

(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.