OsteoDetect is a software device intended to assist clinicians in detecting distal radius fractures during review of posterior-anterior (PA) and lateral (LAT) radiographs of adult wrists. It uses deep learning to analyze the PA and LAT views and to identify and highlight suspected distal radius fractures.
1. Table of Acceptance Criteria and Reported Device Performance
Standalone Performance
Performance Metric | Acceptance Criteria (Implicit) | Reported Device Performance (Estimate) | 95% Confidence Interval
---|---|---|---
AUC of ROC | High | 0.965 | (0.953, 0.976)
Sensitivity | High | 0.921 | (0.886, 0.946)
Specificity | High | 0.902 | (0.877, 0.922)
PPV | High | 0.813 | (0.769, 0.850)
NPV | High | 0.961 | (0.943, 0.973)
Localization accuracy (mean pixel distance) | Small | 33.52 pixels (SD 30.03) | Not reported for the mean distance
Generalizability (AUC across subgroups) | High | ≥ 0.926 (lowest subgroup: post-surgical radiographs) | Reported per subgroup in the text
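The standalone estimates above follow directly from case counts at the chosen operating point. As a minimal sketch (not the manufacturer's code), assuming 0/1 ground-truth labels, model scores, and a threshold, the tabulated metrics could be computed as follows; the localization helper assumes a box-center definition of pixel distance, which the document does not specify:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def standalone_metrics(y_true, y_score, threshold):
    """Tabulated metrics from 0/1 ground truth, model scores, and a
    fixed operating point (all inputs hypothetical)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "AUC": roc_auc_score(y_true, y_score),  # threshold-free accuracy
        "Sensitivity": tp / (tp + fn),  # fractures correctly flagged
        "Specificity": tn / (tn + fp),  # normals correctly cleared
        "PPV": tp / (tp + fp),          # reliability of positive calls
        "NPV": tn / (tn + fn),          # reliability of negative calls
    }

def center_distance(pred_box, truth_box):
    """Pixel distance between the centers of two boxes given as
    (x_min, y_min, x_max, y_max); this distance definition is an
    assumption, not stated in the document."""
    (px, py), (tx, ty) = (
        ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in (pred_box, truth_box)
    )
    return float(np.hypot(px - tx, py - ty))
```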
MRMC (Reader Study) Performance - Aided vs. Unaided Reads
Performance Metric | Acceptance Criterion (Implicit: Aided Superior) | OD-Aided (95% CI) | OD-Unaided (95% CI) | p-value
---|---|---|---|---
AUC of ROC | AUC_aided - AUC_unaided > 0 | 0.889 | 0.840 | 0.0056 (difference: 0.049; 95% CI: 0.019, 0.080)
Sensitivity | Aided superior | 0.803 (0.785, 0.819) | 0.747 (0.728, 0.765) | Not reported; non-overlapping CIs imply significance
Specificity | Aided superior | 0.914 (0.903, 0.924) | 0.889 (0.876, 0.900) | Not reported; non-overlapping CIs imply significance
PPV | Aided superior | 0.883 (0.868, 0.896) | 0.844 (0.826, 0.859) | Not reported; non-overlapping CIs imply significance
NPV | Aided superior | 0.853 (0.839, 0.865) | 0.814 (0.800, 0.828) | Not reported; non-overlapping CIs imply significance
2. Sample Size and Data Provenance for Test Set
Standalone Performance Test Set:
- Sample Size: 1000 images (500 PA, 500 LAT)
- Data Provenance: Retrospective. Randomly sampled from an existing validation database of consecutively collected images from patients receiving wrist radiographs at the (b) (4) from November 1, 2016 to April 30, 2017. The study population included images from the US.
MRMC (Reader Study) Test Set:
- Sample Size: 200 cases.
- Data Provenance: Retrospective. Randomly sampled from the same validation database used for the standalone performance study. The data includes cases from the US.
3. Number of Experts and Qualifications for Ground Truth
Standalone Performance Test Set and MRMC (Reader Study) Test Set:
- Number of Experts: Three.
- Qualifications: U.S. board-certified orthopedic hand surgeons.
4. Adjudication Method for Test Set
Standalone Performance Test Set:
- Adjudication Method (Binary Fracture Presence/Absence): Majority opinion of at least 2 of the 3 clinicians.
- Adjudication Method (Localization - Bounding Box): The union of the bounding boxes drawn by each clinician who identified the fracture (illustrated in the sketch below).
MRMC (Reader Study) Test Set:
- Adjudication Method: Majority opinion of three U.S. board-certified orthopedic hand surgeons. (Note: this was defined on a per-case basis, considering PA, LAT, and oblique images if available).
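Both adjudication rules are simple to state computationally. A minimal sketch, assuming an (x_min, y_min, x_max, y_max) box convention and reading "union" as the smallest rectangle enclosing all annotated boxes (neither convention is specified in the document):

```python
def majority_label(reader_labels):
    """Binary fracture truth: positive when at least 2 of the 3
    clinicians called the case positive."""
    return int(sum(reader_labels) >= 2)

def union_box(boxes):
    """Localization truth: smallest rectangle enclosing every
    clinician's box (one reading of 'union'); boxes are
    (x_min, y_min, x_max, y_max)."""
    xs_min, ys_min, xs_max, ys_max = zip(*boxes)
    return (min(xs_min), min(ys_min), max(xs_max), max(ys_max))

# Example: a 2-of-3 positive case with two annotated boxes.
assert majority_label([1, 1, 0]) == 1
assert union_box([(10, 20, 50, 60), (30, 10, 70, 55)]) == (10, 10, 70, 60)
```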
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? Yes.
- Effect Size (Improvement of Human Readers with AI vs. without AI assistance):
- The least squares mean difference in AUC between OsteoDetect-aided and OsteoDetect-unaided reads was 0.049 (95% CI: 0.019, 0.080), a statistically significant improvement of 0.049 in AUC when readers were aided by OsteoDetect (a simplified computational sketch follows this list).
- Sensitivity: Improved from 0.747 (unaided) to 0.803 (aided), an improvement of 0.056.
- Specificity: Improved from 0.889 (unaided) to 0.914 (aided), an improvement of 0.025.
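The 0.049 effect size summarizes a reader-averaged AUC difference. The actual analysis fit a least-squares MRMC model that accounts for reader and case variability, which the following simplified sketch does not reproduce; all arrays are hypothetical (scores indexed as [reader][case]):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc_difference(y_true, aided_scores, unaided_scores):
    """Average over readers of (aided AUC - unaided AUC).
    aided_scores[r] and unaided_scores[r] hold reader r's per-case
    suspicion scores under each reading condition."""
    diffs = [
        roc_auc_score(y_true, aided) - roc_auc_score(y_true, unaided)
        for aided, unaided in zip(aided_scores, unaided_scores)
    ]
    return float(np.mean(diffs))
```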
6. Standalone (Algorithm Only) Performance Study
- Was a standalone study done? Yes.
7. Type of Ground Truth Used
Standalone Performance Test Set:
- Type of Ground Truth: Expert consensus (majority opinion of three U.S. board-certified orthopedic hand surgeons).
MRMC (Reader Study) Test Set:
- Type of Ground Truth: Expert consensus (majority opinion of three U.S. board-certified orthopedic hand surgeons).
8. Sample Size for Training Set
The document does not explicitly state the sample size for the training set. It mentions "randomly withheld subset of the model's training data" for setting the operating point, implying a training set existed, but its size is not provided.
9. How Ground Truth for Training Set Was Established
The document does not explicitly state how the ground truth for the training set was established. It only refers to a "randomly withheld subset of the model's training data" during the operating point setting.
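For illustration only: one common way to set an operating point on a withheld subset is to pick the threshold that reaches a target sensitivity with the best specificity. The document does not state the rule actually used, so the criterion below is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_operating_point(y_true, y_score, min_sensitivity=0.90):
    """Among thresholds on the withheld subset that reach the target
    sensitivity, return the one with the lowest false positive rate.
    The 0.90 target is hypothetical."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    ok = tpr >= min_sensitivity  # assumes at least one threshold qualifies
    return thresholds[ok][np.argmin(fpr[ok])]
```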
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.