K Number
K234042
Date Cleared
2024-06-07

(169 days)

Product Code
Regulation Number
892.2050
Panel
RA
Reference & Predicate Devices
Predicate For
N/A
Intended Use

EFAI BONESUITE XR BONE AGE PRO ASSESSMENT SYSTEM (EFAI BAPXR) is designed to view and quantify bone age from 2D Posterior Anterior (PA) view of left-hand radiographs using deep learning techniques to aid in the analysis of bone age assessment of patients between 2 to 16 years old for pediatric radiologists. The results should not be relied upon alone by pediatric radiologists to make diagnostic decisions. The images shall be with left hand and wrist fully visible within the field of view, and shall be without any major bone destruction, deformity, fracture, excessive motion, or other major artifacts.

Device Description

The device is a software designed to aid the quantification of bone age for patients between 2 to 16 years old. The software uses deep learning techniques to analyze posterior-anterior (PA) radiographs of the left-hand according to the Greulich-Pyle (GP) method.

AI/ML Overview

The acceptance criteria and the study supporting the device's performance, as described in the 510(k) summary, are broken down below:

EFAI Bonesuite XR Bone Age Pro Assessment System (BAP-XR-100) Performance Study

1. A table of acceptance criteria and the reported device performance

The acceptance criteria are based on a Deming regression between the device's output (EFAI BAPXR) and the ground truth (GT): both the intercept and the slope of the regression line must fall within the range of the highest acceptable bias. The summary does not state the numerical range of the "highest acceptable bias", but it reports that the observed results met the criteria.
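As an illustration of the analysis named above, here is a minimal Deming regression sketch in plain Python. It is not the sponsor's implementation. The function name `deming_regression`, the choice of GT as the x variable, and the default error-variance ratio `delta=1.0` (orthogonal regression) are all assumptions, since the summary does not give these details. Confidence intervals are not computed here.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Fit y = intercept + slope * x by Deming regression.

    delta is the ASSUMED ratio of the error variance in y to the error
    variance in x; delta=1.0 corresponds to orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance.
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")

    # Closed-form Deming slope, then the intercept through the centroid.
    diff = s_yy - delta * s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up bone ages in years, for illustration only (not study data).
ground_truth = [2.0, 5.0, 8.0, 11.0, 14.0]
device_output = [2.1, 4.9, 8.0, 11.2, 13.9]
intercept, slope = deming_regression(ground_truth, device_output)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Unlike ordinary least squares, Deming regression allows for measurement error in both variables, which fits a comparison where the reference readings are themselves imperfect.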

| Metric | Acceptance Criteria (General) | Reported Device Performance (EFAI BAPXR vs. GT) |
|---|---|---|
| Deming regression intercept | Fall within the range of the highest acceptable bias | -0.07 (95% CI: [-0.13, -0.01]) |
| Deming regression slope | Fall within the range of the highest acceptable bias | 1.00 (95% CI: [0.99, 1.00]) |
| Percentage of cases with bone age difference | | 88% |
| Bland-Altman 95% limits of agreement (EFAI BAPXR vs. GT) | Not explicitly stated as a primary acceptance criterion, but reported as an indicator of high consistency | -0.517 to 0.743 (with CIs in gray dashed lines) |
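The Bland-Altman 95% limits of agreement reported above are conventionally computed as the mean paired difference plus or minus 1.96 sample standard deviations. The sketch below shows that conventional calculation. The summary does not state the exact formula used, so the function name `limits_of_agreement`, the sign convention (device minus GT), and the `z=1.96` multiplier are assumptions.

```python
import statistics


def limits_of_agreement(device, ground_truth, z=1.96):
    """Return (lower, bias, upper) for paired measurements.

    bias is the mean of (device - ground_truth); the limits are
    bias -/+ z times the sample standard deviation of the differences.
    """
    if len(device) != len(ground_truth) or len(device) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")

    diffs = [d - g for d, g in zip(device, ground_truth)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias - z * sd, bias, bias + z * sd


# Made-up bone ages in years, for illustration only (not study data).
lower, bias, upper = limits_of_agreement(
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 1.9, 3.2, 3.8],
)
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f})")
```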

2. Sample size used for the test set and the data provenance (e.g., country of origin of the data, retrospective or prospective)

  • Test Set (Clinical Study): 600 cases
  • Data Provenance: Retrospectively collected from 27 locations across multiple states and multiple clinical organizations in the United States.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g., radiologist with 10 years of experience)

  • Number of Experts: Four (4)
  • Qualifications of Experts: U.S. board-certified expert radiologists. Specific experience level (e.g., years) is not mentioned.

4. Adjudication method (e.g., 2+1, 3+1, none) for the test set

The ground truth for the test set was established through a "Ground Truthing Workflow" involving multiple stages:

  • Bone Age Assessment: Individual assessments by the four expert radiologists.
  • Consensus Via Grading: Implies a process of evaluating and potentially assigning grades to assessments based on predetermined criteria (e.g., differences).
  • Majority Voting: Most likely used when assessments differed, to reach an initial consensus.
  • Final Adjudication: This step suggests a process where discrepancies or remaining disagreements after majority voting were resolved by a final decision-making body or method. The flowchart indicates a systematic process to ensure consistency and consensus, though the exact rules for "Final Adjudication" (e.g., if a lead adjudicator made a final decision or if all 4 radiologists had to agree) are not explicitly detailed beyond "consensus among all readers reviewing the radiographs."

This detailed workflow suggests a robust, multi-reader consensus approach for ground truthing, rather than a simple 'none' or majority vote without further review.
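Because the summary does not spell out the voting or adjudication rules, the following is only a hypothetical sketch of how a majority vote with an adjudication fallback could be structured. The names `majority_vote`, `consensus`, and `adjudicate`, the strict-majority rule, and the representation of readings as bone ages in months are all assumptions, not details from the source.

```python
from collections import Counter


def majority_vote(readings):
    """Return the reading chosen by a strict majority of readers, else None."""
    if not readings:
        raise ValueError("at least one reading is required")
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


def consensus(readings, adjudicate):
    """Use the majority reading; fall back to adjudication when there is none.

    adjudicate is a HYPOTHETICAL callback standing in for the unspecified
    "Final Adjudication" step.
    """
    voted = majority_vote(readings)
    return voted if voted is not None else adjudicate(readings)


# Hypothetical readings in months from four readers.
print(majority_vote([120, 120, 120, 132]))  # three of four agree
print(majority_vote([120, 120, 132, 132]))  # tie, so no strict majority
```

With four readers, a strict majority needs at least three matching readings, so a two-versus-two split would go to the adjudication step in this sketch.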

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance

No, a multi-reader multi-case (MRMC) comparative effectiveness study (human readers with AI vs. without AI assistance) was not explicitly described. The clinical study was a standalone performance study of the EFAI BAPXR device itself, comparing its output to ground truth established by expert radiologists, not measuring human reader improvement with AI assistance.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done

Yes, a standalone performance study was done. The description states: "EFAI conducted a standalone performance study with the proposed device EFAI BAPXR..." This study measured the performance of the EFAI BAPXR algorithm directly against the established ground truth.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)

The ground truth was expert consensus. Four U.S. board-certified expert radiologists assessed the radiographs against the Greulich-Pyle Atlas, following a structured "Ground Truthing Workflow" of individual assessment, consensus via grading, majority voting, and final adjudication.

8. The sample size for the training set

The training set comprised 23,578 cases.

9. How the ground truth for the training set was established

For the training set, the ground truth was established as the average of the bone age assessments independently done by three board-certified radiologists.
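The averaging step described above can be sketched in a few lines. The function name `training_label` and the requirement of exactly three reads per case are assumptions drawn from the description; the unit of the readings is not stated in the source.

```python
import statistics


def training_label(reads):
    """Average three independent bone age reads into one training label."""
    if len(reads) != 3:
        raise ValueError("expected exactly three independent reads")
    return statistics.mean(reads)


# Made-up reads for one case, for illustration only.
print(training_label([10.0, 10.5, 11.0]))
```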

§ 892.2050 Medical image management and processing system.

(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.

(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).