VELYS™ Hip Navigation is an image-processing software indicated to assist in the positioning of total hip replacement components. It is intended to assist in precisely positioning total hip replacement components intra-operatively by measuring their positions relative to the bone structures of interest provided that the points of interest can be identified from radiology images.
VELYS™ Hip Navigation is also indicated for assisting healthcare professionals in preoperative planning and postoperative analysis of orthopaedic surgery in Total Hip Replacement and Total Knee Replacement. The device allows for overlaying of prosthesis templates on radiological images and includes tools for performing measurements on the image and for positioning the template. Clinical judgment and experience are required to properly use the software. The software is not for primary image interpretation. The software is not for use on mobile phones.
VELYS™ Hip Navigation (VHN) is a Software as a Medical Device that provides the clinician with intra-operative measurements and visuals of acetabular cup orientation, femoral component leg length and offset calculations, and implant constructs, all derived from bony-anatomy landmark points whose default positions are set by a machine learning (ML) model and which remain fully user-adjustable.
VHN includes a machine learning model that determines the default position of each landmark; the user retains full control to reposition landmarks after placement. The model takes x-ray or fluoroscopy images as input and outputs a default location for the landmark annotation tool. The model operates human-in-the-loop: the user is expected to position the annotation as they see fit.
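As a rough illustration of this human-in-the-loop pattern, the sketch below uses entirely hypothetical names and coordinates (the 510(k) summary does not describe the software's internals): ML-proposed defaults that the clinician may override.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Landmark:
    name: str
    x: float
    y: float
    user_adjusted: bool = False  # tracks whether the clinician moved it

def place_default_landmarks(predictions: Dict[str, Tuple[float, float]]) -> Dict[str, Landmark]:
    """Turn raw model output (landmark name -> pixel coordinates) into
    editable landmark objects whose positions start at the ML defaults."""
    return {name: Landmark(name, x, y) for name, (x, y) in predictions.items()}

def adjust_landmark(landmarks: Dict[str, Landmark], name: str, new_xy: Tuple[float, float]) -> None:
    """Human-in-the-loop step: the user repositions a landmark as they see fit."""
    landmarks[name].x, landmarks[name].y = new_xy
    landmarks[name].user_adjusted = True

# Hypothetical model output for one fluoroscopy frame:
landmarks = place_default_landmarks({"lesser_trochanter": (412.0, 873.5),
                                     "teardrop": (655.2, 790.1)})
adjust_landmark(landmarks, "teardrop", (658.0, 792.4))  # clinician correction
```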
The provided FDA 510(k) clearance letter and summary for VELYS™ Hip Navigation contain information about its acceptance criteria and some aspects of the study proving its performance. However, several requested details are not explicitly stated in the document.
Here's a breakdown of the available information:
1. Table of Acceptance Criteria and Reported Device Performance
The document describes one main performance study, the "Human vs AI System Output Validation."
| Acceptance Criteria | Reported Device Performance |
|---|---|
| For each of the 5 respective outputs (leg length, femoral offset, total offset, cup inclination, and cup anteversion), the machine learning model generated data point was within the range of the human operator data points on each of the test images. | The study resulted in a "Full Pass," satisfying the acceptance criterion: the machine learning model generated data points fell within the range of the human operator data points for the Leg Length, Femoral Offset, Total Offset, Inclination, and Anteversion outputs. |
| (Implicit) Workflow Efficiency: Reduced time to complete workflows when AI-Assisted Landmarks are enabled. | Human vs AI System Output Validation testing showed the time to complete the workflows took less time than the manual cases when AI-Assisted Landmarks are enabled. |
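The acceptance criterion in the first row amounts to a per-image, per-output range-containment check. A minimal sketch of that check follows, with hypothetical function and variable names; the document does not specify how the comparison was implemented.

```python
# Per-image, per-output check mirroring the stated criterion: the ML value
# must fall within the span of the human operator values.
OUTPUTS = ["leg_length", "femoral_offset", "total_offset",
           "cup_inclination", "cup_anteversion"]

def within_human_range(ml_value, human_values):
    """Pass if the ML-generated measurement lies inside the range spanned
    by the human operator measurements for the same image and output."""
    return min(human_values) <= ml_value <= max(human_values)

def full_pass(results):
    """results: {image_id: {output_name: (ml_value, [human_values])}}.
    A "Full Pass" requires every output on every test image to pass."""
    return all(within_human_range(*per_image[out])
               for per_image in results.values()
               for out in OUTPUTS)

# Toy example with made-up measurements:
results = {"img_001": {out: (10.0, [9.5, 10.2, 10.8]) for out in OUTPUTS}}
print(full_pass(results))  # -> True
```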
2. Sample size used for the test set and the data provenance
- Test Set Sample Size: The document does not explicitly state the sample size for the "Human vs AI System Output Validation" test set. It mentions "each of the test cases" but not the total number of cases.
- Data Provenance: The x-ray and fluoroscopic image data used for model training were extracted from user data from the production VHN software. It also states that "Clinical institutions geographically spread across the US were strategically selected in an effort to capture the widest range of patient populations and minimize bias." This suggests retrospective data collected from real-world clinical usage.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- Number of Experts: Unspecified. The acceptance criteria state "human operator data points," implying multiple human operators, but the exact number isn't quantified.
- Qualifications of Experts: Unspecified. They are referred to as "human operators" but their specific qualifications (e.g., orthopedic surgeons, radiologists, years of experience) are not provided in this document.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set
- Adjudication Method: Not explicitly stated. The acceptance criteria refer to the "range of the human operator data points," which suggests a comparison against a collective outcome of human operators, but a specific adjudication method like 2+1 or 3+1 is not detailed.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance
- MRMC Comparative Effectiveness Study: The document describes a "Human vs AI System Output Validation" that compares AI-assisted performance (the ML model generates default landmark positions, which the user confirms or adjusts) against manual performance (human operators selecting landmarks by hand). This is a comparative study of sorts, but it is not explicitly labeled as a standard MRMC reader-improvement study.
- Effect Size of Human Improvement (with AI vs. without AI): An effect size for improvement in accuracy is not provided. The study did show an improvement in workflow efficiency: "When AI-Assisted Landmarks are enabled, Human vs AI System Output Validation testing showed the time to complete the workflows took less time than the manual cases." The magnitude of this time reduction (the effect size) is not quantified; the sketch below shows how such an effect size could be computed if per-case timings were available.
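A minimal sketch, assuming paired per-case timings; the numbers below are entirely made up for illustration, as the 510(k) summary reports only the direction of the effect.

```python
from statistics import mean, stdev

# Entirely hypothetical paired timings (seconds) for the same workflows
# performed manually vs. with AI-Assisted Landmarks enabled.
manual   = [312, 298, 345, 330, 301]
assisted = [250, 241, 270, 262, 248]

diffs = [m - a for m, a in zip(manual, assisted)]
print(f"mean time saved: {mean(diffs):.1f} s")                 # absolute effect
print(f"Cohen's d (paired): {mean(diffs) / stdev(diffs):.2f}") # standardized effect size
```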
6. If standalone (i.e., algorithm-only, without human-in-the-loop) performance testing was done
- Standalone Performance: Not explicitly stated for the user-facing output metrics (leg length, offset, inclination, anteversion); the "Human vs AI System Output Validation" explicitly involves human operators. The "Model Performance Metrics" (mAR, mAP, mAP@0.75) reported for the OneTrial, CupCheckGuidedBilateral, and CupCheckGuidedUnilateral models may represent standalone algorithm performance prior to the human-in-the-loop interaction, but they are not directly tied to the final output values of the VHN software (see the sketch below for what these metrics measure).
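mAP and mAR are standard object-detection metrics: mAP@0.75 counts a prediction as correct only when it overlaps a ground-truth annotation with intersection-over-union (IoU) of at least 0.75, while unsuffixed mAP/mAR are typically averaged over a range of IoU thresholds (e.g., 0.50:0.95 in the COCO convention). The document does not say how they were computed here; the sketch below is a generic, simplified single-class AP at one fixed IoU threshold (no precision interpolation).

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truths, thr=0.75):
    """Single-class AP at one IoU threshold (the "@0.75" in mAP@0.75).
    detections: list of (confidence_score, box); ground_truths: list of boxes.
    mAP averages this over classes; COCO-style mAP also averages over thresholds."""
    detections = sorted(detections, key=lambda d: -d[0])  # highest confidence first
    matched, tp, fp = set(), 0, 0
    ap, prev_recall = 0.0, 0.0
    for _score, box in detections:
        # Greedily match each detection to the best unmatched ground truth.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths):
            if j not in matched and iou(box, gt) > best_iou:
                best_j, best_iou = j, iou(box, gt)
        if best_iou >= thr:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
        precision, recall = tp / (tp + fp), tp / len(ground_truths)
        ap += precision * (recall - prev_recall)  # area under the P-R steps
        prev_recall = recall
    return ap
```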
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)
- Type of Ground Truth (for Human vs AI System Output Validation): The ground truth for the "Human vs AI System Output Validation" appears to be the "range of the human operator data points." This falls under a form of expert consensus/reference range, where human operators' established measurements serve as the benchmark.
8. The sample size for the training set
- Training Set Sample Size: Model development used a total of 18,550 images, which was split as follows (see the arithmetic sketch after this list):
- 90% for training: 0.90 × 18,550 = 16,695 images
- 5% for validation
- 5% for testing (this test set is independent of the training/validation data, and separate from the "Human vs AI System Output Validation" test set described above)
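Note that 5% of 18,550 is 927.5, so the stated percentages cannot all be exact. A minimal sketch of the arithmetic (the rounding is an assumption; the exact validation/test counts are not stated in the document):

```python
total = 18_550
train = round(total * 0.90)  # 16,695 images, matching the summary's figure
val = round(total * 0.05)    # 928 images (0.05 * 18,550 = 927.5, so some
test = total - train - val   # 927 images  rounding must occur; exact counts unknown)
print(train, val, test)      # -> 16695 928 927
```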
9. How the ground truth for the training set was established
- Ground Truth for Training Set: "Training data was pulled from user data from the production VHN software. Clinicians used the software and annotated the patient images in a clinical setting. Therefore, the reference standard is the clinician's annotations for each image which was pulled with the image." This indicates that the ground truth for the training set was established by clinician annotations (expert annotations) during routine clinical use of the production software.