K Number
K231668
Date Cleared
2023-07-07 (30 days)

Product Code
Regulation Number
892.2050
Panel
RA
Reference & Predicate Devices
Predicate: Spine CAMP™ v1.0. Reference: QMA (used for spinopelvic measurements).
Intended Use

Spine CAMP™ is a fully-automated software that analyzes X-ray images of the spine to produce reports that contain static and/or motion metrics. Spine CAMP™ can be used to obtain metrics from sagittal plane radiographs of the lumbar and/or cervical spine and it can be used to visualize intervertebral motion via an image registration method referred to as "stabilization." The radiographic metrics can be used to characterize and assess spinal health in accordance with established guidance. For example, common clinical uses include assessing spinal stability, alignment, degeneration, fusion, motion preservation, and implant performance. The metrics produced by Spine CAMP are intended to be used to support qualified and licensed professional healthcare practitioners in clinical decision making for skeletally mature patients aged 18 and above.
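
The submission does not describe how "stabilization" is implemented. As a purely illustrative sketch (the landmark data and function names below are hypothetical, not from the 510(k) summary), a minimal landmark-based rigid registration can be built on the Kabsch method: solve for the rotation and translation that hold one vertebra fixed across frames, so motion at the adjacent level becomes visible relative to it.

```python
import numpy as np

# Hypothetical sketch only: the 510(k) summary names a registration step
# ("stabilization") but does not describe its algorithm. This assumes each
# vertebra is annotated with corner landmarks (x, y) in pixel coordinates.

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i
    (Kabsch method for 2D point sets)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t

# Same vertebra's corner landmarks in a neutral and a flexion frame (pixels).
neutral = np.array([[102.0, 210.0], [165.0, 204.0], [105.0, 245.0], [168.0, 240.0]])
flexion = np.array([[110.0, 215.0], [172.0, 214.0], [111.0, 251.0], [174.0, 249.0]])

R, t = rigid_transform(flexion, neutral)
stabilized = flexion @ R.T + t  # flexion landmarks mapped into the neutral frame
print(np.round(stabilized, 1))
```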

Device Description

Spine CAMP™ is a fully-automated image processing software device. It is designed to be used with X-ray images and is intended to aid medical professionals in the measurement and assessment of spinal parameters. Spine CAMP™ is capable of calculating distances, angles, linear displacements, angular displacements, and mathematical combinations of these metrics to characterize the morphology, alignment, and motion of the spine. These analysis results are presented in the form of reports, annotated images, and visualizations of intervertebral motion to support their interpretation.
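
The summary does not disclose the landmark scheme or formulas behind these metrics. A minimal sketch, assuming four hypothetical corner landmarks per vertebral body, of how an intervertebral (Cobb-style) angle and an anterior-posterior displacement might be derived:

```python
import math

# Hypothetical illustration only: Spine CAMP's actual landmark scheme and
# formulas are not disclosed. Each vertebral body is assumed annotated with
# four corners in image coordinates: [ant_sup, post_sup, ant_inf, post_inf].

Point = tuple[float, float]

def endplate_angle(p1: Point, p2: Point) -> float:
    """Angle of the line through two endplate landmarks, in degrees
    (image coordinates, y increasing downward)."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

def intervertebral_angle(upper: list[Point], lower: list[Point]) -> float:
    """Cobb-style angle between the inferior endplate of the upper vertebra
    and the superior endplate of the lower vertebra."""
    inferior = endplate_angle(upper[2], upper[3])  # ant_inf -> post_inf
    superior = endplate_angle(lower[0], lower[1])  # ant_sup -> post_sup
    return inferior - superior

def ap_displacement(upper: list[Point], lower: list[Point], mm_per_pixel: float) -> float:
    """Anterior-posterior offset between body centroids, scaled to millimetres
    via an image calibration factor."""
    cx_u = sum(p[0] for p in upper) / len(upper)
    cx_l = sum(p[0] for p in lower) / len(lower)
    return (cx_u - cx_l) * mm_per_pixel

# Example: two adjacent lumbar vertebrae on one radiograph (pixels).
L4 = [(102.0, 210.0), (165.0, 204.0), (105.0, 245.0), (168.0, 240.0)]
L5 = [(100.0, 258.0), (166.0, 255.0), (101.0, 295.0), (170.0, 294.0)]
print(f"intervertebral angle: {intervertebral_angle(L4, L5):.1f} deg")
print(f"AP displacement:      {ap_displacement(L4, L5, 0.30):.2f} mm")
```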

AI/ML Overview

The provided text describes the Spine CAMP™ v1.1 device, an automated software application for analyzing X-ray images of the spine, and refers to performance data used to demonstrate substantial equivalence to a predicate device. However, the text contains neither a detailed table of acceptance criteria nor a comprehensive study report with specific performance metrics (e.g., accuracy, sensitivity, specificity) compared against such criteria. It focuses primarily on the comparison to the predicate device and general claims of equivalence.

Based on the information provided, here's what can be extracted and inferred regarding the acceptance criteria and study:

1. Table of Acceptance Criteria and Reported Device Performance

The text does not explicitly provide a table of acceptance criteria with specific quantitative thresholds (e.g., "accuracy > X%, sensitivity > Y%") nor detailed reported device performance against such criteria. Instead, it states that "Statistical correlations and equivalence tests were performed by directly comparing vertebral landmark coordinates, image calibration, and intervertebral measurements between Spine CAMP™ v1.1 and the predicate device as well as spinopelvic measurements between Spine CAMP™ v1.1 and the reference device. This analysis demonstrated correlation and statistical equivalence for all variables evaluated."

This implies that the acceptance criteria were based on demonstrating statistical equivalence or strong correlation to the predicate device (Spine CAMP™ v1.0) and a "reference device" (QMA for spinopelvic measurements) for the identified variables. The exact metrics and their thresholds for establishing "statistical equivalence" are not detailed.
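The submission does not name its equivalence procedure or margins. One common choice for paired measurements is the two one-sided tests (TOST) procedure; the sketch below, with simulated data and an assumed ±1.0° margin, illustrates the idea but is not the submission's actual analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical sketch: the 510(k) summary does not specify its equivalence
# procedure or margins. A common choice for paired measurements is TOST
# (two one-sided tests) on per-image differences against a pre-specified
# equivalence margin (here +/- 1.0 degree, an assumed value).

def paired_tost(a: np.ndarray, b: np.ndarray, margin: float, alpha: float = 0.05):
    """TOST for equivalence of paired measurements within +/- margin."""
    diff = a - b
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)
    t_lower = (diff.mean() + margin) / se   # H0: mean diff <= -margin
    t_upper = (diff.mean() - margin) / se   # H0: mean diff >= +margin
    p_lower = 1 - stats.t.cdf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    p = max(p_lower, p_upper)
    return p, p < alpha  # equivalent only if both one-sided tests reject

rng = np.random.default_rng(0)
v10 = rng.normal(12.0, 4.0, size=232)        # predicate outputs (simulated)
v11 = v10 + rng.normal(0.05, 0.3, size=232)  # new-version outputs (simulated)

p, equivalent = paired_tost(v11, v10, margin=1.0)
r = np.corrcoef(v10, v11)[0, 1]
print(f"Pearson r = {r:.3f}, TOST p = {p:.4f}, equivalent: {equivalent}")
```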

What is present regarding "performance":

  • Performance Goal: demonstrate correlation and statistical equivalence to the predicate and reference devices for all variables evaluated.
  • Variables Evaluated: vertebral landmark coordinates, image calibration, intervertebral measurements, and spinopelvic measurements.
  • Result: "This analysis demonstrated correlation and statistical equivalence for all variables evaluated."

Without an explicit table, we cannot populate one.

2. Sample Size for the Test Set and Data Provenance

  • Sample Size for Test Set:
    • 215 lateral cervical spine radiographs
    • 232 lateral lumbar spine radiographs
  • Data Provenance: The text does not state the country of origin. It indicates that the dataset "had not been used to train any of the AI models," confirming it served as a held-out test set. Whether the data were retrospective or prospective is not specified, but the use of an existing dataset (previously analyzed by Spine CAMP™ v1.0) suggests a retrospective design.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications of those Experts

  • Number of Experts: Five experienced operators.
  • Qualifications of Experts: Described as "experienced operators." No specific qualifications like "radiologist with 10 years of experience" are provided. It's implied they were trained professionals capable of using the "reference device, QMA, for spinopelvic measurements."

4. Adjudication Method for the Test Set

The text states: "this dataset was analyzed by five experienced operators using the reference device, QMA, for spinopelvic measurements." This implies that the measurements from these five operators using the QMA device were used to establish the reference standard (ground truth) for spinopelvic measurements. It does not specify an adjudication method such as 2+1 or 3+1 for resolving discrepancies among the operators, nor whether their measurements were averaged or otherwise pooled to form the ground truth. It simply states that they "analyzed" the data.
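
A common convention when multiple operators measure the same images is to average their values into a per-image reference and report the inter-operator spread; whether the submission did this is unknown. A minimal sketch with simulated measurements:

```python
import numpy as np

# Hypothetical sketch: the summary does not state how the five operators'
# QMA measurements were combined into a reference standard. One common
# convention is to average the operators per image and report the
# inter-operator variability alongside the mean.

rng = np.random.default_rng(1)
truth = rng.normal(15.0, 6.0, size=(10, 1))            # simulated pelvic tilt, deg
readings = truth + rng.normal(0.0, 1.2, size=(10, 5))  # rows = images, cols = operators

reference = readings.mean(axis=1)      # per-image reference value
spread = readings.std(axis=1, ddof=1)  # inter-operator variability
print("per-image reference:", np.round(reference, 1))
print("inter-operator SD:  ", np.round(spread, 2))
```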

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done

  • MRMC Study: Not in the formal sense. Five experienced operators analyzed the dataset with the reference device (QMA) to provide a human-generated reference standard for the spinopelvic measurements, but the study's primary comparisons were between Spine CAMP™ v1.1 and the predicate device (Spine CAMP™ v1.0), and between Spine CAMP™ v1.1 and the operator-generated QMA data.
  • Effect Size of Human Readers Improving with AI vs. Without AI Assistance: Not provided. The study was not designed as an MRMC study comparing human reader performance with and without AI assistance; it compared the algorithm's output to established methods (the predicate device and the human-operated reference device).

6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done

Yes, the testing described appears to be a standalone evaluation of the Spine CAMP™ v1.1 algorithm. It mentions "evaluating Spine CAMP™ v1.1 performance on a large dataset" and comparing its outputs to those of the predicate device and the reference device (QMA operated by humans). This focuses on the algorithm's output directly, rather than how it changes human workflow or decision-making.
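
Standalone algorithm-versus-reference comparisons are often summarized with agreement statistics such as Bland-Altman bias and limits of agreement; the submission does not report these, so the sketch below is purely illustrative with simulated data.

```python
import numpy as np

# Illustrative only: the 510(k) summary reports "correlation and statistical
# equivalence" but not the underlying agreement statistics. Bland-Altman bias
# and 95% limits of agreement are a standard way to summarize a standalone
# algorithm-vs-reference comparison.

def bland_altman(algorithm: np.ndarray, reference: np.ndarray):
    diff = algorithm - reference
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, (bias - half_width, bias + half_width)

rng = np.random.default_rng(2)
ref = rng.normal(50.0, 10.0, size=215)      # e.g. operator-derived values
alg = ref + rng.normal(0.2, 1.0, size=215)  # simulated algorithm outputs

bias, (lo, hi) = bland_altman(alg, ref)
print(f"bias = {bias:.2f}, 95% limits of agreement = [{lo:.2f}, {hi:.2f}]")
```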

7. The Type of Ground Truth Used

  • For Intervertebral & Image Calibration Measurements: The ground truth appears to be implicitly established by the predicate device's (Spine CAMP™ v1.0) outputs. The study performed "statistical correlations and equivalence tests... between Spine CAMP™ v1.1 and the predicate device."
  • For Spinopelvic Measurements: The ground truth was established by the measurements from five experienced operators using a "reference device" (QMA). This can be considered a form of "expert consensus" or "expert measurement" acting as the reference standard.
  • No mention of pathology or outcomes data as ground truth.

8. The Sample Size for the Training Set

The text states: "Spine CAMP's primary component, the AI Engine, was updated by retraining its AI models with more imaging for improved generalization and performance." However, the specific sample size for the training set is not provided.

9. How the Ground Truth for the Training Set Was Established

The text mentions that the AI models were "retrained." It does not describe how the ground truth for this training data was established. It only states that the test dataset "had not been used to train any of the AI models," implying separate data for training and testing.

§ 892.2050 Medical image management and processing system.

(a) Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.

(b) Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).