K Number
K222176
Device Name
BoneView
Manufacturer
Date Cleared
2023-03-02 (223 days)

Product Code
Regulation Number
892.2090
Panel
RA
Reference & Predicate Devices
K212365 (predicate)
Intended Use

BoneView 1.1-US is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs of: Ankle, Foot, Knee, Tibia/Fibula, Wrist, Hand, Elbow, Forearm, Humerus, Shoulder, Clavicle, Pelvis, Hip, Femur, Ribs, Thoracic Spine, Lumbosacral Spine. BoneView 1.1-US is intended for use as a concurrent reading aid during the interpretation of radiographs. BoneView 1.1-US is for prescription use only.

Device Description

BoneView 1.1-US is a software-only device intended to assist clinicians in the interpretation of limb radiographs of children/adolescents and of limb, pelvis, rib cage, and dorsolumbar vertebra radiographs of adults. BoneView 1.1-US can be deployed on premises or in the cloud and connected to several computing and X-ray imaging platforms, such as X-ray radiographic systems or PACS.

After the radiographs are acquired from the patient and stored in the DICOM Source, they are automatically received by BoneView 1.1-US from the user's DICOM Source through an intermediate DICOM node. Once received, the radiographs are automatically processed by the AI algorithm to identify regions of interest. Based on the processing result, BoneView 1.1-US generates result files in DICOM format. These result files consist of a summary table and result images (annotations on a copy of the original images, or annotations that can be toggled on/off). BoneView 1.1-US does not alter the original images, nor does it change the order of the original images or delete any image from the DICOM Source. Once available, the result files are sent by BoneView 1.1-US to the DICOM Destination through the same intermediate DICOM node. The DICOM Destination can be used to visualize the result files provided by BoneView 1.1-US or to transfer them to another DICOM host for visualization. The users then use the results as a concurrent reading aid when providing their diagnosis.
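
To make the result-file workflow concrete, here is a minimal sketch of the "annotate a copy, never the original" pattern using pydicom. The detect_fractures function is a hypothetical placeholder standing in for the proprietary AI algorithm (it is not BoneView's method), and the example assumes an uncompressed, single-frame MONOCHROME2 radiograph.

```python
# Minimal sketch of the "copy, annotate, forward" pattern described above.
# Assumptions: pydicom is installed, the input radiograph is uncompressed
# MONOCHROME2, and detect_fractures is a hypothetical stand-in for the
# proprietary AI algorithm (it is NOT BoneView's algorithm).

import copy
import pydicom
from pydicom.uid import generate_uid


def detect_fractures(pixels):
    """Hypothetical detector: returns a list of (row0, col0, row1, col1) boxes."""
    return [(100, 120, 180, 210)]  # placeholder result


def annotate_copy(src_path, dst_path):
    original = pydicom.dcmread(src_path)   # the original file is never modified
    result = copy.deepcopy(original)        # annotations go on a copy

    pixels = result.pixel_array.copy()
    high = int(pixels.max())
    for r0, c0, r1, c1 in detect_fractures(pixels):
        # burn a simple rectangular outline into the copied pixel data
        pixels[r0:r1 + 1, c0] = high
        pixels[r0:r1 + 1, c1] = high
        pixels[r0, c0:c1 + 1] = high
        pixels[r1, c0:c1 + 1] = high

    result.PixelData = pixels.tobytes()
    # new UIDs so the result is stored alongside, not over, the source image
    result.SOPInstanceUID = generate_uid()
    result.SeriesInstanceUID = generate_uid()
    result.SeriesDescription = "AI result (annotated copy)"
    result.save_as(dst_path)


if __name__ == "__main__":
    annotate_copy("wrist.dcm", "wrist_result.dcm")
```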

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria are not explicitly stated as numerical targets in a table. Instead, the study aims to demonstrate that the device performs with "high sensitivity and high specificity" and that its performance on children/adolescents is "similar" to its performance on adults. For the clinical study, the implicit acceptance criterion is that the diagnostic accuracy of readers aided by BoneView be superior to that of unaided readers.

However, the document provides the performance metrics for both standalone testing and the clinical study.

Standalone Performance (Children/Adolescents Clinical Performance Study Dataset)

| Operating Point | Metric | Value (95% Clopper-Pearson CI) | Description |
|---|---|---|---|
| High-sensitivity (DOUBT FRACT) | Sensitivity | 0.909 [0.889 - 0.926] | Probability that the device correctly identifies a fracture when one is present. This operating point is designed to be highly sensitive to possible fractures, potentially including subtle ones, and is indicated by a dotted bounding box. |
| High-sensitivity (DOUBT FRACT) | Specificity | 0.821 [0.796 - 0.844] | Probability that the device correctly identifies the absence of a fracture when none is present. |
| High-specificity (FRACT) | Sensitivity | 0.792 [0.766 - 0.817] | Probability that the device correctly identifies a fracture when one is present. This operating point is designed to be highly specific, giving a high degree of confidence that a detected fracture is indeed a fracture, and is indicated by a solid bounding box. |
| High-specificity (FRACT) | Specificity | 0.965 [0.952 - 0.976] | Probability that the device correctly identifies the absence of a fracture when none is present. |
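
For reference, exact (Clopper-Pearson) intervals like those quoted above can be computed directly from true/false positive and negative counts. The counts below are hypothetical placeholders, since the 510(k) summary reports only the resulting proportions; the sketch assumes scipy is available.

```python
# Clopper-Pearson exact two-sided confidence interval for a binomial proportion.
# The counts used here are illustrative only, not the actual study counts.

from scipy.stats import beta


def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided CI for a binomial proportion."""
    if trials == 0:
        raise ValueError("no trials")
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return successes / trials, (lo, hi)


# sensitivity = TP / (TP + FN), specificity = TN / (TN + FP); hypothetical counts
sens, sens_ci = clopper_pearson(successes=909, trials=1000)
spec, spec_ci = clopper_pearson(successes=821, trials=1000)
print(f"sensitivity {sens:.3f}, 95% CI ({sens_ci[0]:.3f}, {sens_ci[1]:.3f})")
print(f"specificity {spec:.3f}, 95% CI ({spec_ci[0]:.3f}, {spec_ci[1]:.3f})")
```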

Comparative Standalone Performance (Children/Adolescents vs. Adult)

| Operating Point | Dataset | Sensitivity (95% CI) | Specificity (95% CI) | 95% CI on the difference (Sensitivity) | 95% CI on the difference (Specificity) |
|---|---|---|---|---|---|
| High-sensitivity (DOUBT FRACT) | Adult clinical performance study | 0.928 [0.919 - 0.936] | 0.811 [0.800 - 0.821] | -0.019 [-0.039 to 0.001] | 0.010 [-0.016 to 0.037] |
| High-sensitivity (DOUBT FRACT) | Children/adolescents clinical performance study | 0.909 [0.889 - 0.926] | 0.821 [0.796 - 0.844] | | |
| High-specificity (FRACT) | Adult clinical performance study | 0.841 [0.829 - 0.853] | 0.932 [0.925 - 0.939] | -0.049 [-0.079 to -0.021] | 0.033 [0.019 to 0.046] |
| High-specificity (FRACT) | Children/adolescents clinical performance study | 0.792 [0.766 - 0.817] | 0.965 [0.952 - 0.976] | | |
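
The summary does not name the method behind the "95% CI on the difference" columns. As an illustration only, a simple Wald interval for the difference of two independent proportions could look like the sketch below; the denominators are hypothetical because the per-dataset positive/negative case counts are not reported.

```python
# Wald-style approximate CI for the difference of two independent proportions.
# This is NOT necessarily the method used in the submission; it is a sketch.

import math


def wald_diff_ci(p1, n1, p2, n2, z=1.96):
    """Approximate 95% CI for p1 - p2 (independent samples)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)


# sensitivities at the high-sensitivity operating point; positive-case counts
# per dataset are not given in the summary, so these denominators are hypothetical
diff, ci = wald_diff_ci(0.909, 700, 0.928, 3500)
print(f"pediatric - adult sensitivity: {diff:+.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```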

Clinical Study Performance (MRMC - Reader Performance with/without AI assistance)

| Metric | Unaided Performance (95% bootstrap CI) | Aided Performance (95% bootstrap CI) | Increase |
|---|---|---|---|
| Specificity | 0.906 (0.898 - 0.913) | 0.956 (0.951 - 0.960) | +5.0 percentage points |
| Sensitivity | 0.648 (0.640 - 0.656) | 0.752 (0.745 - 0.759) | +10.4 percentage points |
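
The reader-study intervals above are described as bootstrap CIs. The sketch below shows a plain percentile bootstrap over cases on toy data; the actual MRMC analysis would resample from the fully crossed reader-by-case design, which the summary does not detail.

```python
# Percentile-bootstrap CI over cases, on toy data (not the study data).

import numpy as np

rng = np.random.default_rng(0)

# toy per-case calls: 1 = fracture called, 0 = no fracture; 480 cases as in the study
truth = rng.integers(0, 2, size=480)
calls = np.where(rng.random(480) < 0.8, truth, 1 - truth)  # ~80% agreement, toy only


def sensitivity(truth, calls):
    pos = truth == 1
    return (calls[pos] == 1).mean()


boot = []
n = len(truth)
for _ in range(2000):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    boot.append(sensitivity(truth[idx], calls[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"sensitivity {sensitivity(truth, calls):.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```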

2. Sample sizes used for the test set and data provenance:

  • Standalone Performance Test Set:
    • Children/Adolescents: 2,000 radiographs (52.8% male; age range 2 – 21 years; mean 11.54 ± 4.7). The anatomical areas of interest included all those in the Indications for Use for this population group.
    • Adults (cited from predicate device K212365): 8,918 radiographs (47.2% male; age range 21 – 113 years; mean 52.5 ± 19.8). The anatomical areas of interest included all those in the Indications for Use for this population group.
  • Clinical Study Test Set (MRMC): 480 cases (31.9% male; age range 21 – 93 years; mean 59.2 ± 16.4). These cases were from all anatomical areas of interest included in BoneView's Indications for Use.
  • Data Provenance: The document states "various manufacturers" (e.g., Canon, Fujifilm, GE Healthcare, Konica Minolta, Philips, Primax, Samsung, Siemens for standalone data; GE Healthcare, Kodak, Konica Minolta, Philips, Samsung for clinical study data). The general context implies a European or North American source for the regulatory submission (France for the manufacturer, FDA for the review). It is explicitly stated that these datasets were independent of training data. The studies are described as retrospective.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

  • Clinical Study (MRMC Test Set): Ground truth was established by a panel of three U.S. board-certified radiologists. No further details on their years of experience are provided, only their certification.
  • Standalone Test Sets (Children/Adolescents & Adult): The document doesn't explicitly state the number or qualifications of experts used to establish ground truth for the standalone test sets. However, it indicates these datasets were used for "diagnostic performances," implying a definitive ground truth. Given the rigorous nature of FDA submissions, it's highly probable that board-certified radiologists or other qualified medical professionals established this ground truth.

4. Adjudication method (e.g., 2+1, 3+1, none) for the test set:

  • Clinical Study (MRMC Test Set): The ground truth was established by a panel of three U.S. board-certified radiologists. The adjudication method (e.g., majority vote, discussion to consensus) is not explicitly detailed; the document only states that the panel "assigned a ground truth label." This suggests a consensus or majority-based method from the panel of three, rather than a 2+1 or 3+1 scheme with a tie-breaker (see the sketch after this list).
  • Standalone Test Sets: Not explicitly stated, though a panel or consensus method is standard for robust ground truth establishment.
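
Because the adjudication rule is not stated, the following is only a sketch of the simplest plausible scheme: a majority vote over the three radiologists' labels.

```python
# Simple majority-vote adjudication over an odd-sized reader panel.
# This is an assumption for illustration; the submission does not state the rule.

from collections import Counter


def majority_label(labels):
    """Return the label agreed on by the majority of an odd-sized panel."""
    if len(labels) % 2 == 0:
        raise ValueError("use an odd-sized panel to avoid ties")
    return Counter(labels).most_common(1)[0][0]


print(majority_label(["fracture", "no fracture", "fracture"]))  # -> fracture
```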

5. If a multi-reader, multi-case (MRMC) comparative effectiveness study was done, what was the effect size of how much human readers improve with AI vs. without AI assistance:

  • Yes, a fully-crossed multi-reader, multi-case (MRMC) retrospective reader study was conducted.
  • Effect Size of Improvement with AI Assistance:
    • Specificity: improved by 5.0 percentage points (from 0.906 unaided to 0.956 aided).
    • Sensitivity: improved by 10.4 percentage points (from 0.648 unaided to 0.752 aided).
    • The study found that "the diagnostic accuracy of readers in the intended use population is superior when aided by BoneView than when unaided by BoneView."
    • Subgroup analysis also found that "Sensitivity and Specificity were higher for Aided reads versus Unaided reads for all of the anatomical areas of interest."

6. If a standalone study (i.e., algorithm-only performance, without a human in the loop) was done:

  • Yes, standalone performance testing was conducted for both the children/adolescent population and the adult population (the latter referencing the predicate device's data). The results are provided in the tables under section 1.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

  • Expert Consensus: The ground truth for the clinical MRMC study was established by a "panel of three U.S. board-certified radiologists who assigned a ground truth label indicating the presence of a fracture and its location." For the standalone testing, although not explicitly stated, it is commonly established by expert interpretation of the radiographs, often through consensus, to determine the presence or absence of fractures.

8. The sample size for the training set:

  • The training of BoneView was performed on a training dataset of 44,649 radiographs, representing 151,096 images. This dataset covered all anatomical areas of interest in the Indications for Use and was sourced from various manufacturers.

9. How the ground truth for the training set was established:

  • The document implies that the "training was performed on a training dataset... for all anatomical areas of interest." While it doesn't explicitly state how ground truth was established for this massive training set, it is standard practice for medical imaging AI that ground truth for training data is established through expert annotation (e.g., radiologists, orthopedic surgeons) of the images, typically through a labor-intensive review process.

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.