Rayvolve is a computer-assisted detection and diagnosis (CAD) software device intended to assist radiologists and emergency physicians in detecting fractures during the review of radiographs of the musculoskeletal system. Rayvolve is indicated for adult and pediatric populations (≥ 2 years).
Rayvolve is indicated for radiographs of the following industry-standard radiographic views and study types:

| Study Type (Anatomic Area of Interest) | Radiographic Views* Supported |
|---|---|
| Ankle | AP, Lateral, Oblique |
| Clavicle | AP, AP Angulated View |
| Elbow | AP, Lateral |
| Forearm | AP, Lateral |
| Hip | AP, Frog-leg Lateral |
| Humerus | AP, Lateral |
| Knee | AP, Lateral |
| Pelvis | AP |
| Shoulder | AP, Lateral, Axillary |
| Tibia/fibula | AP, Lateral |
| Wrist | PA, Lateral, Oblique |
| Hand | PA, Lateral, Oblique |
| Foot | AP, Lateral, Oblique |

*Definitions of anatomic area of interest and radiographic views are consistent with the ACR-SPR-SSR Practice Parameter for the Performance of Radiography of the Extremities guideline.
The medical device is called Rayvolve. It is a standalone software that uses deep learning techniques to detect and localize fractures on osteoarticular X-rays. Rayvolve is intended to be used as an aided-diagnosis device and does not operate autonomously.
Rayvolve has been developed to use the current edition of the DICOM image standard. DICOM is the international standard for transmitting, storing, printing, processing, and displaying medical imaging.
Using the DICOM standard allows Rayvolve to interact with existing DICOM Node servers (e.g., PACS) and clinical-grade image viewers. The device is designed to run on premises or on a cloud platform, connected to the radiology center's local network, where it can interact with the DICOM Node server.
When remotely connected to a medical center's DICOM Node server, Rayvolve interacts directly with the DICOM files to output its prediction (potential presence or absence of fracture). The initial image appears first, followed by the image processed by Rayvolve.
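For illustration, the sketch below shows how a CAD service could push a processed, fracture-annotated radiograph back to a DICOM node such as a PACS. This is a hypothetical example using the open-source pydicom and pynetdicom libraries, not Rayvolve's actual implementation; the AE titles, host, and port are placeholder assumptions.

```python
# Hypothetical sketch: C-STORE a processed DICOM image to a PACS (DICOM node).
# Not Rayvolve's actual code; the connection details below are placeholders.
from pydicom import dcmread
from pynetdicom import AE, StoragePresentationContexts

PACS_HOST, PACS_PORT, PACS_AET = "pacs.example.org", 11112, "PACS"  # assumed values

def send_processed_image(path: str) -> None:
    """Send a fracture-annotated DICOM file back to the DICOM node via C-STORE."""
    ds = dcmread(path)                          # load the processed DICOM file
    ae = AE(ae_title="CAD_SKETCH")              # hypothetical calling AE title
    ae.requested_contexts = StoragePresentationContexts
    assoc = ae.associate(PACS_HOST, PACS_PORT, ae_title=PACS_AET)
    if assoc.is_established:
        status = assoc.send_c_store(ds)         # push the image to the PACS
        if status:
            print(f"C-STORE completed with status 0x{status.Status:04X}")
        assoc.release()
    else:
        print("Association with the DICOM node could not be established")
```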
Rayvolve is not intended to replace medical doctors. The instructions for use are strictly and systematically transmitted to each user and are used to train them in Rayvolve's use.
Here is a breakdown of the acceptance criteria and the studies demonstrating that the device meets them, based on the provided FDA 510(k) summary for Rayvolve:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are not explicitly listed in a single table with defined thresholds. However, based on the performance data presented, the implicit acceptance criteria for standalone performance appear to be:
- High Sensitivity, Specificity, and AUC for fracture detection.
- Non-inferiority of the retrained algorithm (including pediatric population) compared to the predicate device, specifically by ensuring the lower bound of the difference in AUCs (Retrained - Predicate) for each anatomical area is greater than -0.05.
- Superior diagnostic accuracy of readers when aided by Rayvolve compared to unaided readers, as measured by AUC in an MRMC study.
- Improved sensitivity and specificity for readers when aided by Rayvolve.
Table: Acceptance Criteria (Implicit) and Reported Device Performance
| Acceptance Criterion (Implicit) | Reported Device Performance (Standalone & MRMC Studies) |
|---|---|
| Standalone Performance (Pediatric Population Inclusion) | |
| High sensitivity for fracture detection in the pediatric population (implicitly > 0.90, based on predicate). | 0.9611 (95% CI: 0.9480; 0.9710) |
| High specificity for fracture detection in the pediatric population (implicitly > 0.80, based on predicate). | 0.8597 (95% CI: 0.8434; 0.8745) |
| High AUC for fracture detection in the pediatric population (implicitly > 0.90, based on predicate). | 0.9399 (95% Bootstrap CI: 0.9330; 0.9470) |
| Non-Inferiority of Retrained Algorithm (compared to Predicate for adult & pediatric) | |
| Lower bound of the difference in AUCs (Retrained - Predicate) > -0.05 for all anatomical areas. | "The lower bounds of the differences in AUCs for the Retrained model compared to the Predicate model are all greater than -0.05, indicating that the Retrained model's performance is not inferior to the Predicate model across all organs." (Specific values for each organ are not provided, only the conclusion that they meet the criterion.) The Total AUC for the Retrained model is 0.98781 (0.98247; 0.99048) compared to the Predicate's 0.98607 (0.98104; 0.99058). Overlapping CIs and the non-inferiority statement support this. This suggests the inclusion of pediatric data did not degrade performance on adult data. |
| MRMC Clinical Reader Study | |
| Diagnostic accuracy (AUC) of readers aided by Rayvolve is superior to that of unaided readers. | Reader AUC improved from 0.84602 to 0.89327, a difference of 0.04725 (95% CI: 0.03376; 0.061542) (p=0.0041). This demonstrates statistically significant superiority. |
| Reader sensitivity is improved with Rayvolve assistance. | Reader sensitivity improved from 0.86561 (95% Wilson's CI: 0.84859, 0.88099) to 0.9554 (95% Wilson's CI: 0.94453, 0.96422). |
| Reader specificity is improved with Rayvolve assistance. | Reader specificity improved from 0.82645 (95% Wilson's CI: 0.81187, 0.84012) to 0.83116 (95% Wilson's CI: 0.81673, 0.84467). |
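The standalone and reader metrics in the table can be reproduced conceptually with standard tools. The following is a minimal sketch, run on toy data, of how sensitivity and specificity with Wilson score intervals and the AUC could be computed; it is my own illustration with hypothetical variable names, not the submission's evaluation code.

```python
# Minimal sketch (toy data): sensitivity/specificity with Wilson CIs, plus AUC.
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.proportion import proportion_confint

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])                       # ground-truth labels
y_score = np.array([0.9, 0.8, 0.2, 0.55, 0.7, 0.1, 0.45, 0.3])    # model fracture scores
y_pred = (y_score >= 0.5).astype(int)                              # binarized at an example threshold

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))
n_pos, n_neg = int(y_true.sum()), int((1 - y_true).sum())

sens, spec = tp / n_pos, tn / n_neg
sens_ci = proportion_confint(tp, n_pos, alpha=0.05, method="wilson")
spec_ci = proportion_confint(tn, n_neg, alpha=0.05, method="wilson")
auc = roc_auc_score(y_true, y_score)

print(f"Sensitivity {sens:.4f} (95% Wilson CI {sens_ci[0]:.4f}, {sens_ci[1]:.4f})")
print(f"Specificity {spec:.4f} (95% Wilson CI {spec_ci[0]:.4f}, {spec_ci[1]:.4f})")
print(f"AUC {auc:.4f}")
```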
2. Sample Sizes and Data Provenance
- Test Set (Pediatric Standalone Study):
  - Sample Size: 3016 radiographs.
  - Data Provenance: Not explicitly stated regarding country of origin. The study was retrospective.
- Test Set (Adult Predicate Standalone Study, for comparison):
  - Sample Size: 2626 radiographs.
  - Data Provenance: Not explicitly stated regarding country of origin.
- Test Set (MRMC Clinical Reader Study):
  - Sample Size: 186 cases.
  - Data Provenance: Not explicitly stated regarding country of origin. The study was retrospective.
- Training Set:
  - Sample Size: 150,000 osteoarticular radiographs (expanded from 115,000 for the predicate device).
  - Data Provenance: Not explicitly stated regarding country of origin.
3. Number of Experts and Qualifications for Ground Truth (Test Set)
- Number of Experts: A panel of three (3) US board-certified MSK radiologists.
- Qualifications of Experts: US board-certified MSK (Musculoskeletal) radiologists. Years of experience are not specified, but board certification implies a certain level of expertise.
4. Adjudication Method for the Test Set (Ground Truth Establishment)
- Method: "Each case had been previously evaluated by a panel of three US board-certified MSK radiologists to provide ground truth binary labeling the presence or absence of fracture and the localization information for fractures." This implies a consensus-based ground truth, likely achieved through discussion and agreement among the three radiologists. The term "panel" suggests a collaborative review. No specific "2+1" or "3+1" rule is mentioned, but "panel of three" indicates a rigorous approach to consensus.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was it done?: Yes, a fully crossed multi-reader, multi-case (MRMC) retrospective reader study was done.
- Effect Size of Improvement:
- AUC Improvement: Reader AUC was significantly improved from 0.84602 (unaided) to 0.89327 (aided), resulting in a difference (effect size) of 0.04725 (95% CI: 0.03376; 0.061542) (p=0.0041).
- Sensitivity Improvement: Reader sensitivity improved from 0.86561 (unaided) to 0.9554 (aided).
- Specificity Improvement: Reader specificity improved from 0.82645 (unaided) to 0.83116 (aided).
6. Standalone (Algorithm Only) Performance Study
- Was it done?: Yes, standalone performance assessments were conducted for both the pediatric population inclusion and the retrained algorithm.
- Pediatric Standalone Study: Sensitivity (0.9611), Specificity (0.8597), and AUC (0.9399) were reported.
- Retrained Algorithm Standalone Study: Non-inferiority was assessed by comparing AUCs against the predicate device's standalone performance, showing improvements or non-inferiority across body parts (e.g., Total AUC for retrained was 0.98781 vs. predicate 0.98607).
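As a purely illustrative sketch (an assumed approach, not the sponsor's statistical code), the non-inferiority check described above could be implemented by bootstrapping the per-case AUC difference and comparing its lower confidence bound to the -0.05 margin:

```python
# Assumed approach: percentile-bootstrap lower bound on the AUC difference
# (retrained - predicate), compared against a -0.05 non-inferiority margin.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
MARGIN = -0.05

def auc_diff_lower_bound(y_true, scores_retrained, scores_predicate, n_boot=2000):
    """Return the 2.5th percentile of the bootstrapped AUC difference."""
    y_true = np.asarray(y_true)
    s_new, s_old = np.asarray(scores_retrained), np.asarray(scores_predicate)
    n, diffs = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:      # AUC needs both classes present
            continue
        diffs.append(roc_auc_score(y_true[idx], s_new[idx])
                     - roc_auc_score(y_true[idx], s_old[idx]))
    return float(np.percentile(diffs, 2.5))

# Toy data; in the actual study this would be run per anatomical area.
y = rng.integers(0, 2, 500)
s_retrained = 0.6 * y + rng.normal(0, 0.4, 500)
s_predicate = 0.5 * y + rng.normal(0, 0.4, 500)
lb = auc_diff_lower_bound(y, s_retrained, s_predicate)
print(f"Lower bound of AUC difference: {lb:.4f}; non-inferior: {lb > MARGIN}")
```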
7. Type of Ground Truth Used
- For Test Sets (Standalone & MRMC): Expert consensus by a panel of three US board-certified MSK radiologists. They provided binary labeling (presence/absence of fracture) and localization information (bounding boxes) for fractures. This is a form of expert consensus.
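For illustration only, the ground truth described above (a binary fracture label plus bounding-box localization per radiograph) might be represented with a simple structure like the following; the field names are hypothetical, not the submission's annotation schema.

```python
# Hypothetical ground-truth record: binary label plus fracture bounding boxes.
from dataclasses import dataclass, field

@dataclass
class FractureBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float  # pixel coordinates of one annotated fracture region

@dataclass
class GroundTruthCase:
    image_id: str                                            # identifies the radiograph
    fracture_present: bool                                    # consensus binary label
    boxes: list[FractureBox] = field(default_factory=list)   # localization information

# Example: a positive case with a single annotated fracture region.
case = GroundTruthCase("case-001", True, [FractureBox(120, 300, 220, 410)])
print(case.fracture_present, len(case.boxes))
```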
8. Sample Size for the Training Set
- Sample Size: 150,000 osteoarticular radiographs.
9. How Ground Truth for the Training Set was Established
The document states that the "training dataset for the subject device was expanded to include 150,000 osteoarticular radiographs". While it confirms the size and composition (mixed adult/pediatric, osteoarticular radiographs), it does not explicitly describe how the ground truth for this training set was established. It mentions that the "previous truthed predicate test dataset was strictly walled off and not included in the new training dataset," implying that the training data was "truthed," but the method (e.g., expert review, automated labeling, etc.) is not detailed. Given the large training set size, it is common for such datasets to be curated through a combination of established clinical reports, expert review, or semi-automated processes, but the specific methodology is not provided in this summary.
§ 892.2090 Radiological computer-assisted detection and diagnosis software.
(a) Identification. A radiological computer-assisted detection and diagnostic software is an image processing device intended to aid in the detection, localization, and characterization of fracture, lesions, or other disease-specific findings on acquired medical images (e.g., radiography, magnetic resonance, computed tomography). The device detects, identifies, and characterizes findings based on features or information extracted from images, and provides information about the presence, location, and characteristics of the findings to the user. The analysis is intended to inform the primary diagnostic and patient management decisions that are made by the clinical user. The device is not intended as a replacement for a complete clinician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithm, including a description of the algorithm inputs and outputs, each major component or block, how the algorithm and output affects or relates to clinical practice or patient care, and any algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will provide improved assisted-read detection and diagnostic performance as intended in the indicated user population(s), and to characterize the standalone device performance for labeling. Performance testing includes standalone test(s), side-by-side comparison(s), and/or a reader study, as applicable.
(iii) Results from standalone performance testing used to characterize the independent performance of the device separate from aided user performance. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Devices with localization output must include localization accuracy testing as a component of standalone testing. The test dataset must be representative of the typical patient population with enrichment made only to ensure that the test dataset contains a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant disease, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Results from performance testing that demonstrate that the device provides improved assisted-read detection and/or diagnostic performance as intended in the indicated user population(s) when used in accordance with the instructions for use. The reader population must be comprised of the intended user population in terms of clinical training, certification, and years of experience. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, positive and negative predictive values, and diagnostic likelihood ratio). Test datasets must meet the requirements described in paragraph (b)(1)(iii) of this section.
(v) Appropriate software documentation, including device hazard analysis, software requirements specification document, software design specification document, traceability analysis, system level test protocol, pass/fail criteria, testing results, and cybersecurity measures.
(2) Labeling must include the following:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the device instructions for use, including the intended reading protocol and how the user should interpret the device output.
(iii) A detailed description of the intended user, and any user training materials or programs that address appropriate reading protocols for the device, to ensure that the end user is fully aware of how to interpret and apply the device output.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics and medical history, user experience, and imaging equipment.