K Number
K212365
Device Name
BoneView
Manufacturer
Gleamer
Date Cleared
2022-03-01

(214 days)

Product Code
Regulation Number
892.2090
Panel
RA
Reference & Predicate Devices
Intended Use

BoneView is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs of:

Study Type (Anatomical Area of Interest) | Compatible Radiographic View(s)
Ankle | Frontal, Lateral, Oblique
Foot | Frontal, Lateral, Oblique
Knee | Frontal, Lateral
Tibia/Fibula | Frontal, Lateral
Femur | Frontal, Lateral
Wrist | Frontal, Lateral, Oblique
Hand | Frontal, Oblique
Elbow | Frontal, Lateral
Forearm | Frontal, Lateral
Humerus | Frontal, Lateral
Shoulder | Frontal, Lateral, Axillary
Clavicle | Frontal
Pelvis | Frontal
Hip | Frontal, Frog Leg Lateral
Ribs | Frontal Chest, Rib series
Thoracic Spine | Frontal, Lateral
Lumbosacral Spine | Frontal, Lateral

BoneView is intended for use as a concurrent reading aid during the interpretation of radiographs. BoneView is for prescription use only and is indicated for adults only.

Device Description

BoneView is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs.

BoneView can be deployed on-premises or in the cloud and can be connected to various computing and X-ray imaging platforms, such as X-ray radiographic systems or PACS. More precisely, BoneView can be deployed:

  • In the cloud with a PACS as the DICOM Source
  • On-premises with a PACS as the DICOM Source
  • On-premises with an X-ray system as the DICOM Source

After the radiographs of the patient are acquired and stored in the DICOM Source, they are automatically received by BoneView from the user's DICOM Source through an intermediate DICOM node (for example, a specific Gateway or a dedicated API). The DICOM Source can be the user's image storage system (for example, the Picture Archiving and Communication System, or PACS) or other radiological equipment (for example, X-ray systems).

Once received by BoneView, the radiographs are automatically processed by the AI algorithm to identify regions of interest. Based on the processing result, BoneView generates result files in DICOM format. These result files consist of a summary table and result images (annotations on a copy of the original images or annotations to be toggled on/off). BoneView does not alter the original images, nor does it change the order of original images or delete any image from the DICOM Source.

Once available, the result files are sent by BoneView to the DICOM Destination through the same intermediate DICOM node. Similar to the DICOM Source, the DICOM Destination can be the user's image storage system (for example, the Picture Archiving and Communication System, or PACS), or other radiological equipment (for example X-ray systems). The DICOM Source and the DICOM Destination are not necessarily identical.

The DICOM Destination can be used to visualize the result files provided by BoneView or to transfer the results to another DICOM host for visualization. Users can then use the results as a concurrent reading aid when providing their diagnosis.
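For orientation only, the routing just described can be pictured in a few lines of Python. This is not Gleamer's implementation: it assumes pydicom is available, uses plain folders to stand in for the DICOM Source and Destination nodes, and the detect_fractures function is a hypothetical placeholder for the AI step.

```python
# Minimal sketch of the routing described above (illustrative, not Gleamer's code).
# Assumes pydicom; folders stand in for the DICOM Source/Destination nodes.
import copy
from pathlib import Path

import pydicom


def detect_fractures(dataset: pydicom.Dataset) -> list:
    """Hypothetical placeholder for the BoneView AI step; returns regions of interest."""
    return []  # stub


def process_study(source_dir: Path, destination_dir: Path) -> None:
    for path in sorted(source_dir.glob("*.dcm")):
        original = pydicom.dcmread(path)             # image received from the DICOM Source
        regions = detect_fractures(original)         # AI processing
        result = copy.deepcopy(original)             # annotate a copy, never the original
        result.SeriesDescription = f"BoneView result ({len(regions)} ROI)"
        # ...bounding-box overlays and the summary table would be generated here...
        result.save_as(destination_dir / path.name)  # result object sent to the DICOM Destination
    # The originals in source_dir are never modified, reordered, or deleted.
```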

The general layout of images processed by BoneView comprises the following (a mock-up of this layout is sketched after the list):

(1) The "summary table" – it is a first image that is derived from the detected regions of interest in the following result images and that displays the results of the overall study along with the Gleamer – BoneView logo. This summary can be configured to be present or not.

(2) The result images – they are provided for all the images that were processed by BoneView and contain:

  • Around the Regions of Interest (if any), a rectangle with a solid or dotted line depending on the confidence of the algorithm (see below)
  • Around the entire image, a white frame showing that the images were processed by BoneView
  • Below the image:
    • The Gleamer BoneView logo
    • The number of Regions of Interest that are displayed in the result image
    • (if any) The caution message, if it was identified that the image was not part of the indication for use of BoneView
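As an illustration of that layout, the sketch below draws solid bounding boxes, adds the white frame, and writes a footer with the logo text and the ROI count. It is a mock-up using Pillow with made-up coordinates, not the actual rendering code; dotted boxes for lower-confidence findings are omitted for brevity.

```python
# Mock-up of the result-image layout (white frame, ROI boxes, footer); not Gleamer's code.
from PIL import Image, ImageDraw, ImageOps


def render_result(radiograph: Image.Image, regions: list[tuple[int, int, int, int]]) -> Image.Image:
    annotated = radiograph.convert("RGB")
    draw = ImageDraw.Draw(annotated)
    for box in regions:                                # (left, top, right, bottom) in pixels
        draw.rectangle(box, outline="white", width=3)  # solid box; dotted boxes would mark
                                                       # lower-confidence findings
    framed = ImageOps.expand(annotated, border=24, fill="white")  # white frame around the image
    footer = ImageDraw.Draw(framed)
    footer.text((28, framed.height - 20),
                f"Gleamer BoneView | Regions of interest: {len(regions)}",
                fill="black")
    return framed


# Example with a made-up radiograph file and a single made-up region:
# result = render_result(Image.open("wrist_lat.png"), [(120, 80, 260, 210)])
```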

The training of BoneView was performed on a training dataset of 44,649 radiographs, representing 151,096 images (52.4% males; age range [0 – 109]; mean 42.4 +/- 24.6), covering all anatomical areas of interest in the Indications for Use and sourced from various manufacturers. BoneView has been designed to address the problem of missed fractures, including subtle fractures, and thus detects fractures with high sensitivity. In this regard, the display of findings is triggered by a "high-sensitivity operating point" (DOUBT FRACT) that enables the display of a dotted-line bounding box around the region of interest. Additionally, users need to be confident that when BoneView identifies a fracture, it is actually a fracture. In this regard, additional information is presented to the user with a "high-specificity operating point" (FRACT).

These two operating points are implemented in the User Interface as follows:

  • Dotted-line Bounding Box: suspicious area / subtle fracture (when the level of confidence of the AI algorithm associated with the finding is above the "high-sensitivity operating point" and below the "high-specificity operating point"), displayed as a dotted bounding box around the area of interest

  • Solid-line Bounding Box: definite or unequivocal fracture (when the level of confidence of the AI algorithm associated with the finding is above the "high-specificity operating point"), displayed as a solid bounding box around the area of interest

BoneView can provide 4 levels of results (a minimal sketch of this logic follows the list):

  • FRACT: BoneView identified at least one solid-line bounding box on the result images,

  • DOUBT FRACT: BoneView did not identify any solid-line bounding box on the result images but it identified at least one dotted-line bounding box in the result images,

  • NO FRACT: BoneView did not identify any bounding box at all in the result images,

  • NOT AVAILABLE: BoneView identified that the original images are out of its Indications for Use.
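A compact way to picture how the two operating points drive the display and the four result levels is sketched below. The numeric thresholds are not disclosed in the 510(k) summary, so the values used here are assumptions for illustration only.

```python
# Illustrative mapping from per-finding confidence to box style and study-level result.
# The actual operating-point values are not disclosed; these numbers are assumed.
HIGH_SENSITIVITY_POINT = 0.30  # assumed threshold (triggers dotted-line display)
HIGH_SPECIFICITY_POINT = 0.70  # assumed threshold (triggers solid-line display)


def box_style(confidence: float) -> str | None:
    if confidence >= HIGH_SPECIFICITY_POINT:
        return "solid"   # definite or unequivocal fracture
    if confidence >= HIGH_SENSITIVITY_POINT:
        return "dotted"  # suspicious area / subtle fracture
    return None          # finding is not displayed


def study_result(confidences: list[float], within_indications: bool = True) -> str:
    if not within_indications:
        return "NOT AVAILABLE"
    styles = {box_style(c) for c in confidences}
    if "solid" in styles:
        return "FRACT"
    if "dotted" in styles:
        return "DOUBT FRACT"
    return "NO FRACT"


# e.g. study_result([0.82, 0.15]) -> "FRACT"; study_result([0.45]) -> "DOUBT FRACT"
```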

AI/ML Overview

Here is a summary of the acceptance criteria and the study demonstrating that the device meets them, based on the provided text:


1. Table of Acceptance Criteria and Reported Device Performance

The document does not explicitly present a table of acceptance criteria (i.e., predefined thresholds that the device must meet). Instead, it shows the reported performance of the device from standalone testing and a clinical study. The reported performance is presented below; these metrics implicitly serve to demonstrate effectiveness.

Standalone Performance (High-Sensitivity Operating Point - DOUBT FRACT):

Metric | Global Performance (95% CI)
Specificity | 0.811 [0.8 - 0.821]
Sensitivity | 0.928 [0.919 - 0.936]

Standalone Performance (High-Specificity Operating Point - FRACT):

Metric | Global Performance (95% CI)
Specificity | 0.932 [0.925 - 0.939]
Sensitivity | 0.841 [0.829 - 0.853]

Clinical Study (Reader Performance with AI vs. Without AI Assistance):

Metric | Unaided (95% CI) | Aided (95% CI)
Specificity | 0.906 [0.898 - 0.913] | 0.956 [0.951 - 0.960]
Sensitivity | 0.648 [0.640 - 0.656] | 0.752 [0.745 - 0.759]
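For readers who want to reproduce metrics of this form, the sketch below computes sensitivity and specificity with 95% confidence intervals from raw confusion counts. The submission does not state which interval method was used, so a Wilson score interval is assumed here, and the counts in the example are made up rather than taken from the study.

```python
# Sensitivity/specificity with 95% CIs from confusion counts (Wilson interval assumed;
# the interval method actually used in the submission is not stated).
from math import sqrt


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) of a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Made-up counts, NOT taken from the BoneView study:
tp, fn, tn, fp = 90, 10, 80, 20
sensitivity = wilson_ci(tp, tp + fn)  # approx. (0.90, 0.83, 0.94)
specificity = wilson_ci(tn, tn + fp)  # approx. (0.80, 0.71, 0.87)
```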

2. Sample Sizes Used for the Test Set and Data Provenance

  1. Standalone Performance Test Set:

    • Sample Size: 8,918 radiographs (n(positive)=3,886, n(negative)=5,032).
    • Data Provenance: The dataset was independent of the data used for model training and establishment of device operating points. It included all anatomical areas of interest, in adults (age range [21-113]; mean 52.5 +/- 19.8; 47.2% males). Images were sourced from various manufacturers (Agfa, Fujifilm, GE Healthcare, Kodak, Konica Minolta, Philips, Primax, Samsung, Siemens). No specific country of origin is mentioned, but the variety of manufacturers suggests a diverse dataset. The study description implies it is a retrospective analysis of existing radiographs.
  2. Clinical Study (MRMC) Test Set:

    • Sample Size: 480 cases (31.9% males, age range [21-93]; mean 59.2 +/- 16.4). It covered all anatomical areas of interest listed in BoneView's Indications for Use.
    • Data Provenance: The dataset was independent of the data used for model training and establishment of device operating points. Images were from various manufacturers (GE Healthcare, Kodak, Konica Minolta, Philips, Samsung). The study implies it's a retrospective analysis of existing radiographs.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Their Qualifications

  • Standalone Performance Test Set: The document does not explicitly state how the ground truth was established for the standalone test set (e.g., number of experts). However, given the nature of the clinical study, it's highly probable that similar expert review was used.
  • Clinical Study (MRMC) Test Set:
    • Number of Experts: A panel of three experts.
    • Qualifications: U.S. board-certified radiologists. The document does not specify their years of experience.

4. Adjudication Method for the Test Set

  • Clinical Study (MRMC) Test Set: Ground truth was assigned by a panel of three U.S. board-certified radiologists. The method implies a consensus or majority rule (e.g., 2+1 or 3+1), as a "ground truth label indicating the presence or absence of a fracture and its location" was assigned per case. The specific adjudication method (e.g., majority vote, independent reads then consensus) is not detailed, but the use of a panel suggests a robust method to establish ground truth.
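Since the summary does not describe the panel's adjudication rule, the following is only a generic illustration of one common approach, a simple majority vote among the three readers on fracture presence.

```python
# Generic majority-vote adjudication sketch; NOT the rule described in the submission,
# which does not detail how the three-reader panel resolved disagreements.
def majority_fracture_present(votes: list[bool]) -> bool:
    """votes: one fracture-present/absent read per expert (here, three readers)."""
    return sum(votes) > len(votes) / 2


assert majority_fracture_present([True, True, False]) is True
assert majority_fracture_present([False, True, False]) is False
```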

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

  • Yes, an MRMC study was done.
  • Effect Size of Human Readers' Improvement with AI vs. Without AI Assistance (based on the reported deltas):
    • Specificity Improvement: +5.0 percentage points (from 0.906 unaided to 0.956 aided).
    • Sensitivity Improvement: +10.4 percentage points (from 0.648 unaided to 0.752 aided).
    • The study found that "the diagnostic accuracy of readers...is superior when aided by BoneView than when unaided."

6. Standalone (Algorithm Only) Performance

  • Yes, a standalone performance study was done.
  • The results are detailed in the "Bench Testing" section (7.4) and summarized in the table above for both "high-sensitivity operating point" and "high-specificity operating point." This evaluation used 8,918 radiographs and assessed the detection of fractures with high sensitivity and high specificity.

7. Type of Ground Truth Used

  • For the Clinical Study (MRMC) and likely for the Standalone Test Set: Expert consensus (a panel of three U.S. board-certified radiologists assigned the ground truth label for presence or absence and location of a fracture).

8. Sample Size for the Training Set

  • Training Set Sample Size: 44,649 radiographs, representing 151,096 images.
  • Patient Demographics for Training Set: 52.4% males, age range [0-109]; mean 42.4 +/- 24.6.
  • The training data covered "all anatomical areas of interest in the Indications for Use and from various manufacturers."

9. How the Ground Truth for the Training Set Was Established

  • The document states that the training of BoneView was performed on this dataset. However, it does not explicitly detail how the ground truth for this training set was established. It is implied that fractures were somehow labeled for the supervised deep learning methodology, but the process (e.g., specific number of radiologists, their qualifications, adjudication method) is not described for the training data.

§ 892.2090 Radiological computer-assisted detection and diagnosis software.

(a)
Identification. A radiological computer-assisted detection and diagnosis software is an image processing device intended to aid in the detection, localization, and characterization of fractures, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b)
Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.