
Search Results

Found 9 results

510(k) Data Aggregation

    K Number
    K242171
    Device Name
    TechCare Trauma
    Manufacturer
    Date Cleared
    2025-01-17

    (177 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    TechCare Trauma is intended to analyze 2D X-ray radiographs using techniques to aid in the detection, localization, and characterization of fractures and/or elbow joint effusion during the review of commonly acquired radiographs of: Ankle, Foot, Knee, Leg (includes Tibia/Fibula), Femur, Wrist, Hand/Finger, Elbow, Forearm, Arm (includes Humerus), Shoulder, Clavicle, Pelvis, Hip, Thorax (includes ribs).

    TechCare Trauma can provide results for fracture in neonates and infants (from birth to less than 2 years), children and adolescents (aged 2 to less than 22 years) and adults (aged 22 years and over).

    TechCare Trauma can provide results for elbow joint effusions in children and adolescents (aged 2 to less than 22 years) and adults (aged 22 years and over).

    The intended users of TechCare Trauma are clinicians with the authority to diagnose fractures and/or elbow joint effusions in various settings including primary care (e.g., family practice, internal medicine), emergency medicine, urgent care, and specialty care (e.g., orthopedics), as well as radiologists who review radiographs across settings.

    TechCare Trauma results are not intended to be used on a stand-alone basis for clinical decision-making. Primary diagnostic and patient management decisions are made by the clinical user.

    Device Description

    The TechCare Trauma device is Software as a Medical Device (SaMD). More specifically, it is defined as a "radiological computer-assisted detection and diagnostic software for suspected fractures".

    As a CADe/x software, TechCare Trauma is an image processing device intended to aid in the detection and localization of fractures and elbow joint effusions on acquired medical images (2D X-ray radiographs).

    TechCare Trauma uses an artificial intelligence algorithm to analyze acquired medical images (2D X-ray radiographs) for features suggestive of fractures and elbow joint effusions.

    TechCare Trauma can provide results for fractures in neonates and infants (from birth to less than 2 years), children and adolescents (aged 2 to less than 22 years) and adults (aged 22 years and over) regardless of their condition.

    TechCare Trauma can provide results for elbow joint effusions in children and adolescents (aged 2 to less than 22 years) and adults (aged 22 years and over). The device detects and identifies fractures and elbow joint effusions based on a visual model's analysis of images and provides information about the presence and location of these prespecified findings to the user.

    It relies solely on images provided by DICOM sources. Once integrated into existing networks, TechCare Trauma automatically receives and processes these images without any manual intervention. The processed results, which consist of one or more images derived from the original inputs, are then sent to specified DICOM destinations. This ensures that the results can be seamlessly viewed on any compatible DICOM viewer, allowing smooth integration into medical imaging workflows.

    TechCare Trauma can be deployed on-premises or in the cloud and connected to multiple DICOM sources/destinations (including but not limited to DICOM storage platforms, PACS, VNA, and radiological equipment such as X-ray systems), ensuring easy integration into existing clinical workflows.
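
    The paragraphs above describe a typical DICOM listen-process-forward service. Below is a minimal, hypothetical sketch of that pattern using the open-source pydicom/pynetdicom libraries; the AE titles, host, port, and the run_inference() stub are illustrative placeholders and are not taken from the TechCare Trauma submission.

```python
# Minimal sketch of a DICOM "receive, analyze, forward results" node of the kind
# described above. All names (AI_NODE, AI_RESULTS, pacs.example.local, 11112) and
# the run_inference() stub are hypothetical, not vendor implementation details.
from pynetdicom import AE, evt, AllStoragePresentationContexts

DEST_HOST, DEST_PORT = "pacs.example.local", 104   # hypothetical DICOM destination


def run_inference(ds):
    """Hypothetical placeholder: a real system would run the AI model here and
    build a derived result image; this stub simply forwards the original."""
    return ds


def handle_store(event):
    ds = event.dataset                 # incoming radiograph
    ds.file_meta = event.file_meta
    result = run_inference(ds)

    # Forward the derived result image to the configured DICOM destination.
    sender = AE(ae_title="AI_RESULTS")
    sender.add_requested_context(result.SOPClassUID)
    assoc = sender.associate(DEST_HOST, DEST_PORT)
    if assoc.is_established:
        assoc.send_c_store(result)
        assoc.release()
    return 0x0000                      # C-STORE success status


# Listen for radiographs pushed by the DICOM source (e.g., an X-ray system or PACS).
ae = AE(ae_title="AI_NODE")
ae.supported_contexts = AllStoragePresentationContexts
ae.start_server(("0.0.0.0", 11112), evt_handlers=[(evt.EVT_C_STORE, handle_store)])
```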

    AI/ML Overview

    Here's a detailed breakdown of the acceptance criteria and study findings for the TechCare Trauma device, based on the provided text:

    Acceptance Criteria and Device Performance

    The acceptance criteria for the TechCare Trauma device appear to be based on achieving high diagnostic accuracy, specifically measured by the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for both standalone performance and multi-reader multi-case (MRMC) comparative studies. The study demonstrated successful performance against these implied criteria.

    Table of Acceptance Criteria and Reported Device Performance

    | Metric | Acceptance Criteria (Implied/Study Goal) | Reported Device Performance (Standalone) | Reported Device Performance (MRMC with AI vs. without AI) |
    |---|---|---|---|
    | Standalone Performance (Image-level ROC-AUC) | High accuracy (specific threshold not explicitly stated but implied by achievement across all categories) | Fracture - Adult: 0.962 [0.957 - 0.967]; Fracture - Pediatric: 0.962 [0.955 - 0.969]; EJE - Adult: 0.965 [0.936 - 0.986]; EJE - Pediatric: 0.976 [0.963 - 0.986] (further detailed by anatomical region, age, gender, image view, and imaging hardware manufacturer, all showing high AUCs) | Not applicable (standalone algorithm only) |
    | Reader Performance (MRMC ROC-AUC) | Superior to unaided reader performance (statistically significant improvement) | Not applicable (human reader performance) | Adult Fracture: Improved from 0.865 to 0.955 (Δ 0.090, p |
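
    For context, the standalone figures above are image-level ROC-AUC values with confidence intervals. The snippet below is an illustrative sketch (not the submission's analysis code) of how such an AUC and a percentile-bootstrap CI are commonly computed; the labels and scores are made-up placeholders.

```python
# Illustrative image-level ROC-AUC with a percentile bootstrap CI.
# y_true / y_score are hypothetical placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth labels
y_score = np.array([0.9, 0.2, 0.8, 0.6, 0.3, 0.1, 0.7, 0.4])   # model confidence scores

auc = roc_auc_score(y_true, y_score)

rng = np.random.default_rng(0)
boot = []
for _ in range(2000):                           # resample cases with replacement
    idx = rng.integers(0, len(y_true), len(y_true))
    if y_true[idx].min() == y_true[idx].max():  # need both classes to compute an AUC
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC {auc:.3f} [95% bootstrap CI {lo:.3f} - {hi:.3f}]")
```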

    K Number
    K240712
    Device Name
    icobrain aria
    Manufacturer
    icometrix
    Date Cleared
    2024-11-07

    (237 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    N/A
    Intended Use

    icobrain aria is a computer-assisted detection (CADe) and diagnosis (CADx) software device to be used as a concurrent reading aid to help trained radiologists in the detection, assessment and characterization of Amyloid Related Imaging Abnormalities (ARIA) from a set of brain MR images. The software provides information about the presence, location, size, severity and changes of ARIA-E (brain edema or sulcal effusions) and ARIA-H (hemosiderin deposition, including microhemorrhage and superficial siderosis). Patient management decisions should not be made solely on the basis of analysis by icobrain aria.

    Device Description

    icobrain aria is a software-only device for assisting radiologists with the detection of amyloid-related imaging abnormalities (ARIA) on brain MRI scans of Alzheimer's disease patients under an amyloid beta-directed antibody therapy. The device utilizes 2D fluid-attenuated inversion recovery (FLAIR) for the detection of ARIA-E (edema/sulcal effusion) and 2D T2* gradient echo (T2*-GRE) for the detection of ARIA-H (hemosiderin deposition).

    icobrain aria automatically processes input brain MRI scans in DICOM format from two time points and generates annotated DICOM images and an electronic report.

    AI/ML Overview

    Here's a summary of the acceptance criteria and study that proves the device meets them, based on the provided text:

    icobrain aria: Acceptance Criteria and Performance Study Summary

    1. Table of Acceptance Criteria and Reported Device Performance

    The acceptance criteria are not explicitly listed in a single, dedicated table with pass/fail thresholds. Instead, they are implicitly defined by the statistically significant improvements demonstrated in the clinical (MRMC) study, and the "in line with human experts" conclusion from standalone performance. The document focuses on showing the effect size of the improvement rather than pre-defined absolute thresholds for sensitivity, specificity, or AUC for human-AI combined performance. For standalone metrics, it reports specific values and concludes they are "in line with the performance of human experts," suggesting the internal acceptance criteria were met.

    Therefore, the table below will summarize the reported performance results from the clinical study, which implicitly met the acceptance criteria by demonstrating significant improvement over unassisted reading.

    | Performance Metric | Acceptance Criteria (Implicit, based on study outcomes) | Reported Device Performance (Assisted) | Reported Device Performance (Unassisted) | Result |
    |---|---|---|---|---|
    | ARIA-E Detection (AUC) | Significant improvement over unassisted reading | 0.873 (95% CI [0.835, 0.911]) | 0.822 | Significant Improvement (+0.051 AUC, p=0.001) |
    | ARIA-E Detection (Sensitivity) | Increase over unassisted reading | 86.5% | 70.9% | Significant Increase |
    | ARIA-E Detection (Specificity) | Maintain above 80% with assisted reading | 83.0% | 91.7% | Maintained above 80% (slight decrease compared to unassisted, but still high) |
    | Pooled ARIA-H Detection (AUC) | Significant improvement over unassisted reading | 0.825 (95% CI [0.781, 0.869]) | 0.781 | Significant Improvement (+0.044 AUC, p=0.001) |
    | Pooled ARIA-H Detection (Sensitivity) | Increase over unassisted reading | 79.0% | 68.7% | Significant Increase |
    | Pooled ARIA-H Detection (Specificity) | Maintain above 80% with assisted reading | 80.3% | 82.8% | Maintained above 80% (slight decrease compared to unassisted, but still high) |
    | ARIA-H Microhemorrhages Detection (AUC) | Significant improvement over unassisted reading | 0.808 (95% CI [0.760, 0.855]) | 0.779 | Significant Improvement (+0.029 AUC, p=0.032) |
    | ARIA-H Microhemorrhages Detection (Sensitivity) | Increase over unassisted reading | 79.6% | 69.3% | Significant Increase |
    | ARIA-H Microhemorrhages Detection (Specificity) | Maintain above 80% with assisted reading | 76.7% | 83.1% | Below 80% for this specific subtype |
    | ARIA-H Superficial Siderosis Detection (AUC) | Significant improvement over unassisted reading | 0.784 (95% CI [0.732, 0.836]) | 0.721 | Significant Improvement (+0.063 AUC, p=0.003) |
    | ARIA-H Superficial Siderosis Detection (Sensitivity) | Increase over unassisted reading | 59.9% | 49.7% | Significant Increase |
    | ARIA-H Superficial Siderosis Detection (Specificity) | Maintain above 80% with assisted reading | 95.6% | 92.7% | Maintained and improved |
    | Localization Performance | Significant improvement in accuracy for spatial distribution | Significantly better for assisted reads | N/A | Met |
    | ARIA Severity Measurement Accuracy | Significantly lower absolute differences vs. ground truth | Significantly lower assisted vs. unassisted | N/A | Met |
    | Inter-reader Variability (Kendall's Coeff. of Concordance) | Significantly lower for assisted reads | ARIA-E: 0.809 (assisted) / 0.720 (unassisted); ARIA-H: 0.799 (assisted) / 0.656 (unassisted) | N/A | Significant Reduction |
    | Reading Time | Faster with assisted reading | Median 2:21 min (assisted) | Median 2:34 min (unassisted) | Faster |
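
    For reference, the inter-reader agreement statistic quoted in the table above is Kendall's coefficient of concordance; the standard (tie-free) definition for m readers ranking n cases is shown below. Higher W means better agreement, i.e., lower inter-reader variability. Whether the study applied a tie correction is not stated in the summary.

```latex
% Kendall's coefficient of concordance W (tie-free form):
W = \frac{12 \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2}{m^{2}\,(n^{3} - n)},
\qquad R_i = \text{sum of ranks given to case } i,
\qquad \bar{R} = \frac{m\,(n+1)}{2}
```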

    2. Sample Size Used for the Test Set and Data Provenance

    • Test Set Sample Size: 199 cases.
    • Data Provenance: MRI datasets from subjects diagnosed with Alzheimer's disease. To guarantee independence, test data subjects were not included in the training set.
      • Country of Origin: More than 100 sites in 20 countries. Approximately half the data originated from the US and the other half from outside the US.
      • Retrospective/Prospective: The study used retrospective data from clinical trials (aducanumab clinical trials PRIME (NCT02677572), EMERGE (NCT02484547), and ENGAGE (NCT02477800)). This data provenance applies to both training and testing datasets.

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

    • Number of Experts: A consensus of 3 experts was used for the clinical (MRMC) study ground truth. For standalone testing, the ground truth was established by unspecified "expert neuroradiologists."
    • Qualifications of Experts:
      • Clinical Study (MRMC): Experts who performed "safety ARIA reading in clinical trials for Aβ-directed antibody therapies in AD."
      • Standalone Testing: "expert neuroradiologists (with experience performing safety ARIA reading in clinical trials for Aβ-directed antibody therapies in AD) manually segmented both ARIA-E and ARIA-H findings." This indicates they had prior, relevant experience.

    4. Adjudication Method for the Test Set

    • Adjudication Method: "A consensus of 3 experts" was used to establish the ground truth for the clinical (MRMC) study. The specific consensus method (e.g., majority vote, discussion to agreement) is not detailed, but the term "consensus" implies a collective agreement process.

    5. If a Multi Reader Multi Case (MRMC) Comparative Effectiveness Study was Done, and Effect Size of Improvement

    • MRMC Study Done: Yes, a fully-crossed MRMC retrospective reader study was conducted.
    • Effect Size (AUC difference, Assisted vs. Unassisted):
      • ARIA-E Detection: +0.051 AUC (95% CI [0.020, 0.083]), p=0.001
      • Pooled ARIA-H Detection: +0.044 AUC (95% CI [0.017, 0.070]), p=0.001
      • ARIA-H Microhemorrhages: +0.029 AUC (95% CI [0.002, 0.055]), p=0.032
      • ARIA-H Superficial Siderosis: +0.063 AUC (95% CI [0.023, 0.102]), p=0.003

    Readers also showed significant increases in sensitivity, significant decreases in inter-reader variability, and were on average faster when assisted.

    6. If a Standalone (i.e. Algorithm only without human-in-the-loop performance) was Done

    • Standalone Study Done: Yes, "icometrix conducted standalone performance assessments."
      • Standalone Performance Highlights (Main Test Set on 199 cases):
        • ARIA-E Diagnosis: Sensitivity 0.94, Specificity 0.67, AUC 0.84
        • ARIA-H Diagnosis: Sensitivity 0.87, Specificity 0.66, AUC 0.81
        • ARIA-E Finding-level: True Positive Rate 69.1%, False Positive findings per case 0.7
        • ARIA-H New Microhemorrhages Finding-level: True Positive Rate 66.1%, False Positive findings per case 0.9
        • ARIA-H New Superficial Siderosis Finding-level: True Positive Rate 62.5%, False Positive findings per case 0.1
      • The document concludes that standalone performance was "in line with the performance of human experts."
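
    The finding-level figures above (true positive rate and false positives per case) are computed by matching each predicted finding to the expert ground-truth findings. The sketch below is a simplified, hypothetical illustration of that kind of scoring; the IoU-based matching rule and the example boxes are assumptions, not the criteria used in the icobrain aria study.

```python
# Simplified finding-level scoring: match predicted boxes to ground-truth boxes
# (here via an IoU threshold, which is an assumption), then report the
# true-positive rate and false positives per case. Example data is hypothetical.
def iou(a, b):
    """Intersection-over-union of two (r0, c0, r1, c1) boxes."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    area = lambda x: (x[2] - x[0]) * (x[3] - x[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def finding_level_metrics(cases, iou_threshold=0.3):
    matched = false_pos = n_truth = 0
    for preds, truths in cases:                  # one (predictions, ground truths) pair per case
        n_truth += len(truths)
        matched += sum(any(iou(p, t) >= iou_threshold for p in preds) for t in truths)
        false_pos += sum(not any(iou(p, t) >= iou_threshold for t in truths) for p in preds)
    return matched / n_truth, false_pos / len(cases)   # (finding-level TPR, FPs per case)

cases = [
    ([(10, 10, 50, 50)], [(12, 12, 48, 52)]),    # hypothetical case with one detected finding
    ([(5, 5, 20, 20)], []),                      # hypothetical case with one false positive
]
print(finding_level_metrics(cases))              # -> (1.0, 0.5)
```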

    7. The Type of Ground Truth Used

    • Ground Truth Type: Expert consensus for the clinical study (MRMC) and expert manual annotations for the standalone testing.
      • Details: For standalone testing, "expert neuroradiologists ... manually segmented both ARIA-E and ARIA-H findings. Ground truth ARIA measurements were derived from the expert manual annotated masks." For the MRMC study, ground truth was obtained via "a consensus of 3 experts."

    8. The Sample Size for the Training Set

    • Training Set Sample Size:
      • FLAIR images (for ARIA-E): 475 image pairs from 172 subjects.
      • T2*-GRE images (for ARIA-H): 326 image pairs from 177 subjects.

    9. How the Ground Truth for the Training Set Was Established

    • Ground Truth Establishment for Training Set: The data used for developing the algorithms "have been manually annotated by expert neuroradiologists with prior experience of reading ARIA in clinical trials of amyloid beta-directed antibody drugs." This implies manual annotation by experts served as the ground truth for training.

    K Number
    K240845
    Device Name
    Rayvolve
    Manufacturer
    Date Cleared
    2024-07-17

    (112 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    Rayvolve is a computer-assisted detection and diagnosis (CAD) software device to assist radiologists and emergency physicians in detecting fractures during the review of radiographs of the musculoskeletal system. Rayvolve is indicated for adult and pediatric populations (≥ 2 years).

    Rayvolve is indicated for radiographs of the following industry-standard radiographic views and study types.

    | Study Type (Anatomic Area of Interest) | Radiographic Views* Supported |
    |---|---|
    | Ankle | AP, Lateral, Oblique |
    | Clavicle | AP, AP Angulated View |
    | Elbow | AP, Lateral |
    | Forearm | AP, Lateral |
    | Hip | AP, Frog-leg Lateral |
    | Humerus | AP, Lateral |
    | Knee | AP, Lateral |
    | Pelvis | AP |
    | Shoulder | AP, Lateral, Axillary |
    | Tibia/Fibula | AP, Lateral |
    | Wrist | PA, Lateral, Oblique |
    | Hand | PA, Lateral, Oblique |
    | Foot | AP, Lateral, Oblique |

    • Definitions of anatomic area of interest and radiographic views are consistent with the ACR-SPR-SSR Practice Parameter for the Performance of Radiography of the Extremities guideline.
    Device Description

    The medical device is called Rayvolve. It is a standalone software that uses deep learning techniques to detect and localize fractures on osteoarticular X-rays. Rayvolve is intended to be used as an aided-diagnosis device and does not operate autonomously.

    Rayvolve has been developed to use the current edition of the DICOM image standard. DICOM is the international standard for transmitting, storing, printing, processing, and displaying medical imaging.

    Using the DICOM standard allows Rayvolve to interact with existing DICOM Node servers (e.g., PACS) and clinical-grade image viewers. The device is designed to run on-premises or on a cloud platform connected to the radiology center's local network, and can interact with the DICOM Node server.

    When remotely connected to a medical center's DICOM Node server, Rayvolve directly interacts with the DICOM files to output its prediction (potential presence or absence of fracture); the initial image appears first, followed by the image processed by Rayvolve.

    Rayvolve is not intended to replace medical doctors. The instructions for use are strictly and systematically transmitted to each user and used to train them on Rayvolve's use.

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA 510(k) summary for Rayvolve:

    1. Table of Acceptance Criteria and Reported Device Performance

    The acceptance criteria are not explicitly listed in a single table with defined thresholds. However, based on the performance data presented, the implicit acceptance criteria for standalone performance appear to be:

    • High Sensitivity, Specificity, and AUC for fracture detection.
    • Non-inferiority of the retrained algorithm (including pediatric population) compared to the predicate device, specifically by ensuring the lower bound of the difference in AUCs (Retrained - Predicate) for each anatomical area is greater than -0.05.
    • Superior diagnostic accuracy of readers when aided by Rayvolve compared to unaided readers, as measured by AUC in an MRMC study.
    • Improved sensitivity and specificity for readers when aided by Rayvolve.

    Table: Acceptance Criteria (Implicit) and Reported Device Performance

    | Acceptance Criterion (Implicit) | Reported Device Performance (Standalone & MRMC Studies) |
    |---|---|
    | Standalone Performance (Pediatric Population Inclusion) | |
    | High Sensitivity for fracture detection in pediatric population (implicitly > 0.90 based on predicate) | 0.9611 (95% CI: 0.9480; 0.9710) |
    | High Specificity for fracture detection in pediatric population (implicitly > 0.80 based on predicate) | 0.8597 (95% CI: 0.8434; 0.8745) |
    | High AUC for fracture detection in pediatric population (implicitly > 0.90 based on predicate) | 0.9399 (95% Bootstrap CI: 0.9330; 0.9470) |
    | Non-inferiority of Retrained Algorithm (compared to Predicate for adult & pediatric) | |
    | Lower bound of difference in AUCs (Retrained - Predicate) > -0.05 for all anatomical areas | "The lower bounds of the differences in AUCs for the Retrained model compared to the Predicate model are all greater than -0.05, indicating that the Retrained model's performance is not inferior to the Predicate model across all organs." (Specific values for each organ are not provided, only the conclusion that they meet the criterion.) The Total AUC for the Retrained model is 0.98781 (0.98247; 0.99048) compared to the Predicate's 0.98607 (0.98104; 0.99058). Overlapping CIs and the non-inferiority statement support this. This suggests the inclusion of pediatric data did not degrade performance on adult data. |
    | MRMC Clinical Reader Study | |
    | Diagnostic accuracy (AUC) of readers aided by Rayvolve is superior to unaided readers | Reader AUC improved from 0.84602 to 0.89327, a difference of 0.04725 (95% CI: 0.03376; 0.061542) (p=0.0041). This demonstrates statistically significant superiority. |
    | Reader sensitivity is improved with Rayvolve assistance | Reader sensitivity improved from 0.86561 (95% Wilson's CI: 0.84859, 0.88099) to 0.9554 (95% Wilson's CI: 0.94453, 0.96422). |
    | Reader specificity is improved with Rayvolve assistance | Reader specificity improved from 0.82645 (95% Wilson's CI: 0.81187, 0.84012) to 0.83116 (95% Wilson's CI: 0.81673, 0.84467). |
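
    The non-inferiority criterion quoted above (lower bound of the AUC difference, Retrained minus Predicate, greater than -0.05) can be checked with a paired bootstrap over the same test cases. The sketch below is illustrative only; the simulated scores and the exact resampling procedure are assumptions, not the study's statistical plan.

```python
# Illustrative paired-bootstrap check of a non-inferiority margin on an AUC
# difference (retrained minus predicate). All data below is simulated.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)                                     # simulated ground truth
s_pred = np.clip(0.60 * y + rng.normal(0.2, 0.25, 500), 0, 1)   # predicate model scores
s_new = np.clip(0.65 * y + rng.normal(0.2, 0.25, 500), 0, 1)    # retrained model scores

MARGIN = -0.05
deltas = []
for _ in range(2000):                          # resample the same cases for both models
    idx = rng.integers(0, len(y), len(y))
    if y[idx].min() == y[idx].max():
        continue
    deltas.append(roc_auc_score(y[idx], s_new[idx]) - roc_auc_score(y[idx], s_pred[idx]))
lower = np.percentile(deltas, 2.5)
print(f"AUC difference lower bound: {lower:.3f}; non-inferior: {lower > MARGIN}")
```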

    2. Sample Sizes and Data Provenance

    • Test Set (Pediatric Standalone Study):

      • Sample Size: 3016 radiographs.
      • Data Provenance: Not explicitly stated regarding country of origin. The study was retrospective.
    • Test Set (Adult Predicate Standalone Study - for comparison):

      • Sample Size: 2626 radiographs.
      • Data Provenance: Not explicitly stated regarding country of origin.
    • Test Set (MRMC Clinical Reader Study):

      • Sample Size: 186 cases.
      • Data Provenance: Not explicitly stated regarding country of origin. The study was retrospective.
    • Training Set:

      • Sample Size: 150,000 osteoarticular radiographs. (Expanded from 115,000 for the predicate device).
      • Data Provenance: Not explicitly stated regarding country of origin.

    3. Number of Experts and Qualifications for Ground Truth (Test Set)

    • Number of Experts: A panel of three (3) US board-certified MSK radiologists.
    • Qualifications of Experts: US board-certified MSK (Musculoskeletal) radiologists. Years of experience are not specified, but board certification implies a certain level of expertise.

    4. Adjudication Method for the Test Set (Ground Truth Establishment)

    • Method: "Each case had been previously evaluated by a panel of three US board-certified MSK radiologists to provide ground truth binary labels for the presence or absence of fracture and localization information for fractures." This implies a consensus-based ground truth, likely achieved through discussion and agreement among the three radiologists. The term "panel" suggests a collaborative review. No specific "2+1" or "3+1" rule is mentioned, but "panel of three" indicates a rigorous approach to consensus.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • Was it done?: Yes, a fully crossed multi-reader, multi-case (MRMC) retrospective reader study was done.
    • Effect Size of Improvement:
      • AUC Improvement: Reader AUC was significantly improved from 0.84602 (unaided) to 0.89327 (aided), resulting in a difference (effect size) of 0.04725 (95% Cl: 0.03376; 0.061542) (p=0.0041).
      • Sensitivity Improvement: Reader sensitivity improved from 0.86561 (unaided) to 0.9554 (aided).
      • Specificity Improvement: Reader specificity improved from 0.82645 (unaided) to 0.83116 (aided).

    6. Standalone (Algorithm Only) Performance Study

    • Was it done?: Yes, standalone performance assessments were conducted for both the pediatric population inclusion and the retrained algorithm.
      • Pediatric Standalone Study: Sensitivity (0.9611), Specificity (0.8597), and AUC (0.9399) were reported.
      • Retrained Algorithm Standalone Study: Non-inferiority was assessed by comparing AUCs against the predicate device's standalone performance, showing improvements or non-inferiority across body parts (e.g., Total AUC for retrained was 0.98781 vs. predicate 0.98607).

    7. Type of Ground Truth Used

    • For Test Sets (Standalone & MRMC): Expert consensus by a panel of three US board-certified MSK radiologists. They provided binary labeling (presence/absence of fracture) and localization information (bounding boxes) for fractures. This is a form of expert consensus.

    8. Sample Size for the Training Set

    • Sample Size: 150,000 osteoarticular radiographs.

    9. How Ground Truth for the Training Set was Established

    The document states that the "training dataset for the subject device was expanded to include 150,000 osteoarticular radiographs". While it confirms the size and composition (mixed adult/pediatric, osteoarticular radiographs), it does not explicitly describe how the ground truth for this training set was established. It mentions that the "previous truthed predicate test dataset was strictly walled off and not included in the new training dataset," implying that the training data was "truthed," but the method (e.g., expert review, automated labeling, etc.) is not detailed. Given the large training set size, it is common for such datasets to be curated through a combination of established clinical reports, expert review, or semi-automated processes, but the specific methodology is not provided in this summary.


    K Number
    K223491
    Date Cleared
    2023-05-25

    (185 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    Critical Care Suite with Pneumothorax Detection AI Algorithm is a computer-aided triage, notification, and diagnostic device that analyzes frontal chest X-ray images for the presence of a pneumothorax. Critical Care Suite identifies and highlights images with a pneumothorax to enable case prioritization or triage and assist as a concurrent reading aide during interpretation of radiographs.

    Intended users include qualified independently licensed healthcare professionals (HCPs) trained to independently assess the presence of pneumothoraxes in radiographic images and radiologists.

    Critical Care Suite should not be used in-lieu of full patient evaluation or solely relied upon to make or confirm a diagnosis. It is not intended to replace the review of the X-ray image by a qualified physician. Critical Care Suite is indicated for adults and Transitional Adolescents (18 to

    Device Description

    Critical Care Suite is a suite of AI algorithms for the automated image analysis of frontal chest X-rays acquired on a digital x-ray system for the presence of critical findings. Critical Care Suite with Pneumothorax Detection AI Algorithm is indicated for adults and transitional adolescents (18 to

    AI/ML Overview

    Here's a summary of the acceptance criteria and study details for the GE Medical Systems, LLC Critical Care Suite with Pneumothorax Detection AI Algorithm, based on the provided document:

    1. Table of Acceptance Criteria and Reported Device Performance

    The document primarily focuses on reporting the device's performance against its own established criteria rather than explicitly listing pre-defined "acceptance criteria" tables. However, we can infer the acceptance criteria from the reported performance goals.

    | Metric | Acceptance Criteria (Implied from Performance) | Reported Device Performance (Standalone) | Reported Device Performance (MRMC with AI Assistance vs. Non-Aided) |
    |---|---|---|---|
    | Pneumothorax Detection (Standalone Algorithm) | Detect pneumothorax in frontal chest X-ray images, with high diagnostic accuracy | AUC of 96.1% (94.9%, 97.2%) | Not Applicable |
    | Sensitivity (Overall) | High sensitivity for overall pneumothorax detection | 84.3% (80.6%, 88.0%) | Not Applicable |
    | Specificity (Overall) | High specificity for overall pneumothorax detection | 93.2% (90.8%, 95.6%) | Not Applicable |
    | Sensitivity (Large Pneumothorax) | High sensitivity for large pneumothoraxes | 96.3% (93.1%, 99.2%) | Not Applicable |
    | Sensitivity (Small Pneumothorax) | High sensitivity for small pneumothoraxes | 75.0% (69.2%, 80.8%) | Not Applicable |
    | Pneumothorax Localization (Standalone Algorithm) | Localize suspected pneumothoraxes effectively | Partially localized 98.1% (96.6%, 99.6%) of actual pneumothorax within an image (apical, lateral, inferior regions) | Not Applicable |
    | | Full agreement between regions | 67.8% (62.7%, 73.0%) | Not Applicable |
    | | Overlap with true pneumothorax area | DICE Similarity Coefficient of 0.705 (0.683, 0.724) | Not Applicable |
    | Reader Performance Improvement (MRMC Study) | Improve reader performance for pneumothorax detection | Mean AUC improved by 14.5% (7.0%, 22.0%; p=.002) from 76.8% (non-aided) to 91.3% (aided) | 14.5% improvement in mean AUC |
    | Reader Sensitivity Improvement | Increase reader sensitivity | Reader sensitivity increased by 16.3% (13.1%, 19.5%; p | |
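
    The DICE Similarity Coefficient reported for localization overlap above has the standard definition 2|A∩B| / (|A|+|B|) for a predicted and a ground-truth region. Below is a minimal sketch with hypothetical binary masks (not study data).

```python
# Dice similarity coefficient between a predicted and a ground-truth binary mask.
# The example masks are hypothetical.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2*|A ∩ B| / (|A| + |B|) for boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom

pred = np.zeros((8, 8), dtype=bool);  pred[2:6, 2:6] = True    # hypothetical predicted region
truth = np.zeros((8, 8), dtype=bool); truth[3:7, 3:7] = True   # hypothetical true region
print(f"Dice = {dice(pred, truth):.3f}")                       # -> 0.562
```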

    K Number
    K222176
    Device Name
    BoneView
    Manufacturer
    Date Cleared
    2023-03-02

    (223 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    BoneView 1.1-US is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs of: Ankle, Foot, Knee, Tibia/Fibula, Wrist, Hand, Elbow, Forearm, Humerus, Shoulder, Clavicle, Pelvis, Hip, Femur, Ribs, Thoracic Spine, Lumbosacral Spine. BoneView 1.1-US is intended for use as a concurrent reading aid during the interpretation of radiographs. BoneView 1.1-US is for prescription use only.

    Device Description

    BoneView 1.1-US is a software-only device intended to assist clinicians in the interpretation of limb radiographs of children/adolescents and of limb, pelvis, rib cage, and dorsolumbar vertebra radiographs of adults. BoneView 1.1-US can be deployed on-premises or in the cloud and be connected to several computing platforms and X-ray imaging platforms such as X-ray radiographic systems or PACS. After the acquisition of the radiographs on the patient and their storage in the DICOM Source, the radiographs are automatically received by BoneView 1.1-US from the user's DICOM Source through an intermediate DICOM node. Once received by BoneView 1.1-US, the radiographs are automatically processed by the AI algorithm to identify regions of interest. Based on the processing result, BoneView 1.1-US generates result files in DICOM format. These result files consist of a summary table and result images (annotations on a copy of the original images or annotations to be toggled on/off). BoneView 1.1-US does not alter the original images, nor does it change the order of original images or delete any image from the DICOM Source. Once available, the result files are sent by BoneView 1.1-US to the DICOM Destination through the same intermediate DICOM node. The DICOM Destination can be used to visualize the result files provided by BoneView 1.1-US or to transfer the results to another DICOM host for visualization. The users are then able to use them as a concurrent reading aid to provide their diagnosis.

    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

    1. Table of Acceptance Criteria and Reported Device Performance

    The acceptance criteria are not explicitly stated as numerical targets in a table. Instead, the study aims to demonstrate that the device performs with "high sensitivity and high specificity" and that its performance on children/adolescents is "similar" to that on adults. For the clinical study, the acceptance criteria are implicitly that the diagnostic accuracy of readers aided by BoneView is superior to that of readers unaided.

    However, the document provides the performance metrics for both standalone testing and the clinical study.

    Standalone Performance (Children/Adolescents Clinical Performance Study Dataset)

    | Operating Point | Metric | Value (95% Clopper-Pearson CI) | Description |
    |---|---|---|---|
    | High-sensitivity (DOUBT FRACT) | Sensitivity | 0.909 [0.889 - 0.926] | The probability that the device correctly identifies a fracture when a fracture is present. This operating point is designed to be highly sensitive to possible fractures, potentially including subtle ones, and is indicated by a dotted bounding box. |
    | High-sensitivity (DOUBT FRACT) | Specificity | 0.821 [0.796 - 0.844] | The probability that the device correctly identifies the absence of a fracture when no fracture is present. |
    | High-specificity (FRACT) | Sensitivity | 0.792 [0.766 - 0.817] | The probability that the device correctly identifies a fracture when a fracture is present. This operating point is designed to be highly specific, meaning it provides a high degree of confidence that a detected fracture is indeed a fracture, and is indicated by a solid bounding box. |
    | High-specificity (FRACT) | Specificity | 0.965 [0.952 - 0.976] | The probability that the device correctly identifies the absence of a fracture when no fracture is present. |
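
    The values in the table above are proportions with exact (Clopper-Pearson) binomial confidence intervals. The snippet below is an illustrative sketch of how sensitivity and specificity and their Clopper-Pearson intervals are computed from confusion-matrix counts; the counts are hypothetical, not those of the study.

```python
# Sensitivity/specificity with exact (Clopper-Pearson) 95% confidence intervals.
# The confusion-matrix counts below are hypothetical, not study data.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided binomial CI for k successes out of n trials."""
    lo = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lo, hi

tp, fn, tn, fp = 900, 90, 820, 180              # hypothetical counts
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print("Sensitivity", round(sensitivity, 3), clopper_pearson(tp, tp + fn))
print("Specificity", round(specificity, 3), clopper_pearson(tn, tn + fp))
```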

    Comparative Standalone Performance (Children/Adolescents vs. Adult)

    | Operating Point | Dataset | Sensitivity (95% CI) | Specificity (95% CI) | 95% CI on the difference (Sensitivity) | 95% CI on the difference (Specificity) |
    |---|---|---|---|---|---|
    | High-sensitivity (DOUBT FRACT) | Adult clinical performance study | 0.928 [0.919 - 0.936] | 0.811 [0.8 - 0.821] | -0.019 [-0.039 - 0.001] | 0.010 [-0.016 - 0.037] |
    | High-sensitivity (DOUBT FRACT) | Children/adolescents clinical performance | 0.909 [0.889 - 0.926] | 0.821 [0.796 - 0.844] | | |
    | High-specificity (FRACT) | Adult clinical performance study | 0.841 [0.829 - 0.853] | 0.932 [0.925 - 0.939] | -0.049 [-0.079 - -0.021] | 0.033 [0.019 - 0.046] |
    | High-specificity (FRACT) | Children/adolescents clinical performance | 0.792 [0.766 - 0.817] | 0.965 [0.952 - 0.976] | | |

    Clinical Study Performance (MRMC - Reader Performance with/without AI assistance)

    | Metric | Unaided Performance (95% bootstrap CI) | Aided Performance (95% bootstrap CI) | Increase |
    |---|---|---|---|
    | Specificity | 0.906 (0.898-0.913) | 0.956 (0.951-0.960) | +5% |
    | Sensitivity | 0.648 (0.640-0.656) | 0.752 (0.745-0.759) | +10.4% |

    2. Sample sizes used for the test set and data provenance:

    • Standalone Performance Test Set:
      • Children/Adolescents: 2,000 radiographs (52.8% males, age range [2 – 21]; mean 11.54 +/- 4.7). The anatomical areas of interest included all those in the Indications for Use for this population group.
      • Adults (cited from predicate device K212365): 8,918 radiographs (47.2% males, age range [21 – 113]; mean 52.5 +/- 19.8). The anatomical areas of interest included all those in the Indications for Use for this population group.
    • Clinical Study Test Set (MRMC): 480 cases (31.9% males, age range [21 – 93]; mean 59.2 +/- 16.4). These cases were from all anatomical areas of interest included in BoneView's Indications for Use.
    • Data Provenance: The document states "various manufacturers" (e.g., Canon, Fujifilm, GE Healthcare, Konica Minolta, Philips, Primax, Samsung, Siemens for standalone data; GE Healthcare, Kodak, Konica Minolta, Philips, Samsung for clinical study data). The general context implies a European or North American source for the regulatory submission (France for the manufacturer, FDA for the review). It is explicitly stated that these datasets were independent of training data. The studies are described as retrospective.

    3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

    • Clinical Study (MRMC Test Set): Ground truth was established by a panel of three U.S. board-certified radiologists. No further details on their years of experience are provided, only their certification.
    • Standalone Test Sets (Children/Adolescents & Adult): The document doesn't explicitly state the number or qualifications of experts used to establish ground truth for the standalone test sets. However, it indicates these datasets were used for "diagnostic performances," implying a definitive ground truth. Given the rigorous nature of FDA submissions, it's highly probable that board-certified radiologists or other qualified medical professionals established this ground truth.

    4. Adjudication method (e.g., 2+1, 3+1, none) for the test set:

    • Clinical Study (MRMC Test Set): The ground truth was established by a panel of three U.S. board-certified radiologists. The method of adjudication (e.g., majority vote, discussion to consensus) is not explicitly detailed, but it states they "assigned a ground truth label." This strongly suggests a consensus or majority-based method from the panel of three, rather than just 2+1 or 3+1 with a tie-breaker.
    • Standalone Test Sets: Not explicitly stated, though a panel or consensus method is standard for robust ground truth establishment.

    5. If a multi reader multi case (MRMC) comparative effectiveness study was done, if so, what was the effect size of how much human readers improve with AI vs without AI assistance:

    • Yes, a fully-crossed multi-reader, multi-case (MRMC) retrospective reader study was conducted.
    • Effect Size of Improvement with AI Assistance:
      • Specificity: Improved by +5% (from 0.906 unaided to 0.956 aided).
      • Sensitivity: Improved by +10.4% (from 0.648 unaided to 0.752 aided).
      • The study found that "the diagnostic accuracy of readers in the intended use population is superior when aided by BoneView than when unaided by BoneView."
      • Subgroup analysis also found that "Sensitivity and Specificity were higher for Aided reads versus Unaided reads for all of the anatomical areas of interest."

    6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

    • Yes, standalone performance testing was conducted for both the children/adolescent population and the adult population (the latter referencing the predicate device's data). The results are provided in the tables under section 1.

    7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

    • Expert Consensus: The ground truth for the clinical MRMC study was established by a "panel of three U.S. board-certified radiologists who assigned a ground truth label indicating the presence of a fracture and its location." For the standalone testing, although not explicitly stated, it is commonly established by expert interpretation of the radiographs, often through consensus, to determine the presence or absence of fractures.

    8. The sample size for the training set:

    • The training of BoneView was performed on a training dataset of 44,649 radiographs, representing 151,096 images. This dataset covered all anatomical areas of interest in the Indications for Use and was sourced from various manufacturers.

    9. How the ground truth for the training set was established:

    • The document implies that the "training was performed on a training dataset... for all anatomical areas of interest." While it doesn't explicitly state how ground truth was established for this massive training set, it is standard practice for medical imaging AI that ground truth for training data is established through expert annotation (e.g., radiologists, orthopedic surgeons) of the images, typically through a labor-intensive review process.

    K Number
    K220164
    Device Name
    Rayvolve
    Manufacturer
    Date Cleared
    2022-06-02

    (133 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    Rayvolve is a computer-assisted detection and diagnosis (CAD) software device to assist radiologists and emergency physicians in detecting fractures during the review of radiographs of the musculoskeletal system. Rayvolve is indicated for adults only (≥ 22 years old). Rayvolve is indicated for radiographs of the following industry-standard radiographic views and study types.

    | Study Type (Anatomic Area of Interest) | Radiographic Views Supported |
    |---|---|
    | Ankle | Frontal, Lateral, Oblique |
    | Clavicle | Frontal |
    | Elbow | Frontal, Lateral |
    | Forearm | Frontal, Lateral |
    | Hip | Frontal, Frog Leg Lateral |
    | Humerus | Frontal, Lateral |
    | Knee | Frontal, Lateral |
    | Pelvis | Frontal |
    | Shoulder | Frontal, Lateral, Axillary |
    | Tibia/Fibula | Frontal, Lateral |
    | Wrist | Frontal, Lateral, Oblique |
    | Hand | Frontal, Lateral |
    | Foot | Frontal, Lateral |

    *For the purposes of this table, "Frontal" is considered inclusive of both posteroanterior (PA) and anteroposterior (AP) views.

    +Definitions of anatomic area of interest and radiographic views are consistent with the American College of Radiology (ACR) standards and guidelines.

    Device Description

    The medical device is called Rayvolve. It is a standalone software that uses deep learning techniques to detect and localize fractures on osteoarticular X-rays. Rayvolve is intended to be used as an aided-diagnosis device and does not operate autonomously. It is intended to work in combination with Picture Archiving and Communication System (PACS) servers. When remotely connected to a medical center PACS server, Rayvolve directly interacts with the DICOM files to output the prediction (potential presence of fracture). Rayvolve is not intended to replace medical doctors. The instructions for use are strictly and systematically transmitted to each user and used to train them on Rayvolve's use.

    AI/ML Overview

    Here's a summary of the acceptance criteria and the study proving the device meets them, based on the provided text:

    1. Table of Acceptance Criteria and Reported Device Performance

    | Acceptance Criterion (Primary Endpoint) | Reported Device Performance | Study Type |
    |---|---|---|
    | Standalone Study: Characterize the detection accuracy of Rayvolve for detecting adult patient fractures (AUC, Sensitivity, Specificity) | AUC: 0.98607 (95% CI: 0.98104; 0.99058); Sensitivity: 0.98763 (95% CI: 0.97559; 0.99421); Specificity: 0.88558 (95% CI: 0.87119; 0.89882) | Standalone Bench Testing |
    | MRMC Study: Diagnostic accuracy of readers aided by Rayvolve is superior to unaided readers (AUC of ROC curve comparison). H0: T-test for p (no statistical difference) > 0.05; H1: T-Test for p (statistical difference) | | |

    K Number
    K212365
    Device Name
    BoneView
    Manufacturer
    Gleamer
    Date Cleared
    2022-03-01

    (214 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    Intended Use

    BoneView is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs of:

    | Study Type (Anatomical Area of Interest) | Compatible Radiographic View(s) |
    |---|---|
    | Ankle | Frontal, Lateral, Oblique |
    | Foot | Frontal, Lateral, Oblique |
    | Knee | Frontal, Lateral |
    | Tibia/Fibula | Frontal, Lateral |
    | Femur | Frontal, Lateral |
    | Wrist | Frontal, Lateral, Oblique |
    | Hand | Frontal, Oblique |
    | Elbow | Frontal, Lateral |
    | Forearm | Frontal, Lateral |
    | Humerus | Frontal, Lateral |
    | Shoulder | Frontal, Lateral, Axillary |
    | Clavicle | Frontal |
    | Pelvis | Frontal |
    | Hip | Frontal, Frog Leg Lateral |
    | Ribs | Frontal Chest, Rib series |
    | Thoracic Spine | Frontal, Lateral |
    | Lumbosacral Spine | Frontal, Lateral |

    BoneView is intended for use as a concurrent reading aid during the interpretation of radiographs. BoneView is for prescription use only and is indicated for adults only.

    Device Description

    BoneView is intended to analyze radiographs using machine learning techniques to identify and highlight fractures during the review of radiographs.

    BoneView can be deployed on-premises or in the cloud and be connected to several computing platforms and X-ray imaging platforms such as X-ray radiographic systems or PACS. More precisely, BoneView can be deployed:

    • In the cloud with a PACS as the DICOM Source
    • On-premises with a PACS as the DICOM Source
    • On-premises with an X-ray system as the DICOM Source

    After the acquisition of the radiographs on the patient and their storage in the DICOM Source, the radiographs are automatically received by BoneView from the user's DICOM Source through an intermediate DICOM node (for example, a specific Gateway, or a dedicated API). The DICOM Source can be the user's image storage system (for example, the Picture Archiving and Communication System, or PACS), or other radiological equipment (for example X-ray systems).

    Once received by BoneView, the radiographs are automatically processed by the AI algorithm to identify regions of interest. Based on the processing result, BoneView generates result files in DICOM format. These result files consist of a summary table and result images (annotations on a copy of the original images or annotations to be toggled on/off). BoneView does not alter the original images, nor does it change the order of original images or delete any image from the DICOM Source.

    Once available, the result files are sent by BoneView to the DICOM Destination through the same intermediate DICOM node. Similar to the DICOM Source, the DICOM Destination can be the user's image storage system (for example, the Picture Archiving and Communication System, or PACS), or other radiological equipment (for example X-ray systems). The DICOM Source and the DICOM Destination are not necessarily identical.

    The DICOM Destination can be used to visualize the result files provided by BoneView or to transfer the results to another DICOM host for visualization. The users are then able to use them as a concurrent reading aid to provide their diagnosis.

    The general layout of images processed by BoneView comprises:

    (1) The "summary table" – it is a first image that is derived from the detected regions of interest in the following result images and that displays the results of the overall study along with the Gleamer – BoneView logo. This summary can be configured to be present or not.

    (2) The result images – they are provided for all the images that were processed by BoneView and contain:

    • Around the Regions of Interest (if any), a rectangle with a solid or dotted line depending on the confidence of the algorithm (see below)
    • Around the entire image, a white frame showing that the images were processed by BoneView
    • Below the image:
      • The Gleamer BoneView logo
      • The number of Regions of Interest that are displayed in the result image
      • (if any) The caution message if it was identified that the image was not part of the indication for use of BoneView

    The training of BoneView was performed on a training dataset of 44,649 radiographs, representing 151,096 images (52.4% males, age range [0 – 109]; mean 42.4 +/- 24.6), for all anatomical areas of interest in the Indications for Use and from various manufacturers. BoneView has been designed to solve the problem of missed fractures, including subtle fractures, and thus detects fractures with a high sensitivity. In this regard, the display of findings is triggered by a "high-sensitivity operating point" (DOUBT FRACT) that will enable the display of a dotted-line bounding box around the region of interest. Additionally, the users need to be confident that when BoneView identifies a fracture, it is actually a fracture. In this regard, additional information is presented to the user with a "high-specificity operating point" (FRACT).

    These two operating points are implemented in the User Interface as follows:

    • Dotted-line Bounding Box: suspicious area / subtle fracture (when the level of confidence of the AI algorithm associated with the finding is above the "high-sensitivity operating point" and below the "high-specificity operating point"), displayed as a dotted bounding box around the area of interest

    • Solid-line Bounding Box: definite or unequivocal fractures (when the level of confidence of the AI algorithm associated with the finding is above the "high-specificity operating point"), displayed as a solid bounding box around the area of interest

    BoneView can provide 4 levels of results:

    • FRACT: BoneView identified at least one solid-line bounding box on the result images,

    • DOUBT FRACT: BoneView did not identify any solid-line bounding box on the result images but it identified at least one dotted-line bounding box in the result images,

    • NO FRACT: BoneView did not identify any bounding box at all in the result images,

    • NOT AVAILABLE: BoneView identified that the original images are out of its Indications for Use
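
    A minimal sketch of the two-operating-point logic described above, mapping per-finding confidence scores to the dotted/solid boxes and the four study-level results. The numeric thresholds are placeholders; BoneView's actual operating points are not disclosed in the summary.

```python
# Two-operating-point result logic as described above. Threshold values are
# hypothetical placeholders, not BoneView's actual operating points.
HIGH_SENS_OP = 0.30   # "high-sensitivity operating point" (assumed value)
HIGH_SPEC_OP = 0.70   # "high-specificity operating point" (assumed value)

def study_result(finding_scores, in_indications: bool = True) -> str:
    if not in_indications:
        return "NOT AVAILABLE"                      # image outside the Indications for Use
    solid = any(s >= HIGH_SPEC_OP for s in finding_scores)                   # solid-line boxes
    dotted = any(HIGH_SENS_OP <= s < HIGH_SPEC_OP for s in finding_scores)   # dotted-line boxes
    if solid:
        return "FRACT"
    if dotted:
        return "DOUBT FRACT"
    return "NO FRACT"

print(study_result([0.82, 0.12]))   # -> FRACT
print(study_result([0.45]))         # -> DOUBT FRACT
print(study_result([0.05]))         # -> NO FRACT
```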

    AI/ML Overview

    Here's a summary of the acceptance criteria and the study that proves the device meets them, based on the provided text:


    1. Table of Acceptance Criteria and Reported Device Performance

    The document does not explicitly present a table of acceptance criteria (i.e., predefined thresholds that the device must meet). Instead, it shows the reported performance of the device from standalone testing and a clinical study. I will present the reported performance, which implicitly are the metrics used to demonstrate effectiveness.

    Standalone Performance (High-Sensitivity Operating Point - DOUBT FRACT):

    | Metric | Global Performance (95% CI) |
    |---|---|
    | Specificity | 0.811 [0.8 - 0.821] |
    | Sensitivity | 0.928 [0.919 - 0.936] |

    Standalone Performance (High-Specificity Operating Point - FRACT):

    | Metric | Global Performance (95% CI) |
    |---|---|
    | Specificity | 0.932 [0.925 - 0.939] |
    | Sensitivity | 0.841 [0.829 - 0.853] |

    Clinical Study (Reader Performance with AI vs. Without AI Assistance):

    | Metric | Unaided (95% CI) | Aided (95% CI) |
    |---|---|---|
    | Specificity | 0.906 [0.898-0.913] | 0.956 [0.951-0.960] |
    | Sensitivity | 0.648 [0.640-0.656] | 0.752 [0.745-0.759] |

    2. Sample Sizes Used for the Test Set and Data Provenance

    1. Standalone Performance Test Set:

      • Sample Size: 8,918 radiographs (n(positive)=3,886, n(negative)=5,032).
      • Data Provenance: The dataset was independent of the data used for model training and establishment of device operating points. It included full anatomical areas of interest for adults (age range [21-113]; mean 52.5 +/- 19.8, 47.2% males). Images were sourced from various manufacturers (Agfa, Fujifilm, GE Healthcare, Kodak, Konica Minolta, Philips, Primax, Samsung, Siemens). No specific country of origin is mentioned, but the variety of manufacturers suggests a diverse dataset. The study description implies it's a retrospective analysis of existing radiographs.
    2. Clinical Study (MRMC) Test Set:

      • Sample Size: 480 cases (31.9% males, age range [21-93]; mean 59.2 +/- 16.4). It covered all anatomical areas of interest listed in BoneView's Indications for Use.
      • Data Provenance: The dataset was independent of the data used for model training and establishment of device operating points. Images were from various manufacturers (GE Healthcare, Kodak, Konica Minolta, Philips, Samsung). The study implies it's a retrospective analysis of existing radiographs.

    3. Number of Experts Used to Establish Ground Truth for the Test Set and Their Qualifications

    • Standalone Performance Test Set: The document does not explicitly state how the ground truth was established for the standalone test set (e.g., number of experts). However, given the nature of the clinical study, it's highly probable that similar expert review was used.
    • Clinical Study (MRMC) Test Set:
      • Number of Experts: A panel of three experts.
      • Qualifications: U.S. board-certified radiologists. The document does not specify their years of experience.

    4. Adjudication Method for the Test Set

    • Clinical Study (MRMC) Test Set: Ground truth was assigned by a panel of three U.S. board-certified radiologists. The method implies a consensus or majority rule (e.g., 2+1 or 3+1), as a "ground truth label indicating the presence or absence of a fracture and its location" was assigned per case. The specific adjudication method (e.g., majority vote, independent reads then consensus) is not detailed, but the use of a panel suggests a robust method to establish ground truth.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • Yes, an MRMC study was done.
    • Effect Size of Human Readers' Improvement with AI vs. Without AI Assistance (based on the reported deltas):
      • Specificity Improvement: +5% increase (from 0.906 unaided to 0.956 aided).
      • Sensitivity Improvement: +10.4% increase (from 0.648 unaided to 0.752 aided).
      • The study found that "the diagnostic accuracy of readers...is superior when aided by BoneView than when unaided."

    6. Standalone (Algorithm Only) Performance

    • Yes, a standalone performance study was done.
    • The results are detailed in the "Bench Testing" section (7.4) and summarized in the table above for both "high-sensitivity operating point" and "high-specificity operating point." This evaluation used 8,918 radiographs and assessed the detection of fractures with high sensitivity and high specificity.

    7. Type of Ground Truth Used

    • For the Clinical Study (MRMC) and likely for the Standalone Test Set: Expert consensus (a panel of three U.S. board-certified radiologists assigned the ground truth label for presence or absence and location of a fracture).

    8. Sample Size for the Training Set

    • Training Set Sample Size: 44,649 radiographs, representing 151,096 images.
    • Patient Demographics for Training Set: 52.4% males, age range [0-109]; mean 42.4 +/- 24.6.
    • The training data covered "all anatomical areas of interest in the Indications for Use and from various manufacturers."

    9. How the Ground Truth for the Training Set Was Established

    • The document states that the training of BoneView was performed on this dataset. However, it does not explicitly detail how the ground truth for this training set was established. It is implied that fractures were somehow labeled for the supervised deep learning methodology, but the process (e.g., specific number of radiologists, their qualifications, adjudication method) is not described for the training data.

    K Number
    K193417
    Date Cleared
    2020-07-30

    (234 days)

    Product Code
    QBS
    Regulation Number
    892.2090
    Reference & Predicate Devices
    N/A
    Intended Use

    FractureDetect (FX) is a computer-assisted detection and diagnosis (CAD) software device to assist clinicians in detecting fractures during the review of radiographs of the musculoskeletal system. FX is indicated for adults only.

    FX is indicated for radiographs of the following industry-standard radiographic views and study types.

    | Study Type (Anatomic Area of Interest⁺) | Radiographic View(s) Supported* |
    |------------------------------------------|---------------------------------|
    | Ankle | Frontal, Lateral, Oblique |
    | Clavicle | Frontal |
    | Elbow | Frontal, Lateral |
    | Femur | Frontal, Lateral |
    | Forearm | Frontal, Lateral |
    | Hip | Frontal, Frog Leg Lateral |
    | Humerus | Frontal, Lateral |
    | Knee | Frontal, Lateral |
    | Pelvis | Frontal |
    | Shoulder | Frontal, Lateral, Axillary |
    | Tibia / Fibula | Frontal, Lateral |
    | Wrist | Frontal, Lateral, Oblique |

    *For the purposes of this table, "Frontal" is considered inclusive of both posteroanterior (PA) and anteroposterior (AP) views.

    +Definitions of anatomic area of interest and radiographic views are consistent with the American College of Radiology (ACR) standards and guidelines.

    Device Description

    FractureDetect (FX) is a computer-assisted detection and diagnosis (CAD) software device designed to assist clinicians in detecting fractures during the review of commonly acquired adult radiographs. FX does this by analyzing radiographs and providing relevant annotations, assisting clinicians in the detection of fractures within their diagnostic process at the point of care. FX was developed using robust scientific principles and industry-standard deep learning algorithms for computer vision.

    FX creates, as its output, a DICOM overlay with annotations indicating the presence or absence of fractures. If any fracture is detected by FX, the output overlay is composed to include the text annotation "Fracture: DETECTED" and to include one or more bounding boxes surrounding any fracture site(s). If no fracture is detected by FX, the output overlay is composed to include the text annotation "Fracture: NOT DETECTED" and no bounding box is included. Whether or not a fracture is detected, the overlay includes a text annotation identifying the radiograph as analyzed by FX and instructions for users to access labeling. The FX overlay can be toggled on or off by the clinicians within their PACS viewer, allowing for uninhibited concurrent review of the original radiograph.
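
    The toggleable annotation described above corresponds to a DICOM overlay plane (group 60xx). The sketch below shows, in a hypothetical and simplified form, how such an overlay carrying a bounding box could be written with pydicom; the file names, box coordinates, and attribute choices are illustrative and are not the vendor's implementation.

```python
# Hypothetical sketch: add a toggleable DICOM overlay plane (group 0x6000) with a
# bounding-box annotation to a radiograph. Paths and coordinates are placeholders.
import numpy as np
import pydicom

ds = pydicom.dcmread("radiograph.dcm")                # hypothetical input file
rows, cols = int(ds.Rows), int(ds.Columns)

overlay = np.zeros((rows, cols), dtype=np.uint8)
r0, c0, r1, c1 = 100, 120, 220, 260                   # hypothetical fracture bounding box
overlay[r0, c0:c1] = overlay[r1, c0:c1] = 1           # top and bottom edges
overlay[r0:r1, c0] = overlay[r0:r1, c1] = 1           # left and right edges

# Standard Overlay Plane module attributes (group 0x6000).
ds.add_new(0x60000010, "US", rows)                    # Overlay Rows
ds.add_new(0x60000011, "US", cols)                    # Overlay Columns
ds.add_new(0x60000022, "LO", "Fracture: DETECTED")    # Overlay Description
ds.add_new(0x60000040, "CS", "G")                     # Overlay Type (graphics)
ds.add_new(0x60000050, "SS", [1, 1])                  # Overlay Origin (top-left, 1-based)
ds.add_new(0x60000100, "US", 1)                       # Overlay Bits Allocated
ds.add_new(0x60000102, "US", 0)                       # Overlay Bit Position
bits = np.packbits(overlay.flatten(), bitorder="little")  # 1 bit per pixel, LSB first
ds.add_new(0x60003000, "OW", bits.tobytes())          # Overlay Data

ds.save_as("radiograph_with_overlay.dcm")             # PACS viewers can toggle this overlay
```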

    AI/ML Overview

    Here's a detailed breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

    Acceptance Criteria and Device Performance

    Standalone Performance

    | Acceptance Criteria | Reported Device Performance |
    |---------------------|-----------------------------|
    | Overall Sensitivity | 0.951 (95% Wilson's CI: 0.940, 0.960) |
    | Overall Specificity | 0.893 (95% Wilson's CI: 0.886, 0.898) |
    | Overall Area Under the Curve (AUC) | 0.982 (95% Bootstrap CI: 0.9790, 0.9850) |
    | AUC per Study Type: Ankle | 0.983 (0.972, 0.991) |
    | AUC per Study Type: Clavicle | 0.962 (0.948, 0.975) |
    | AUC per Study Type: Elbow | 0.964 (0.940, 0.982) |
    | AUC per Study Type: Femur | 0.989 (0.983, 0.994) |
    | AUC per Study Type: Forearm | 0.987 (0.977, 0.995) |
    | AUC per Study Type: Hip | 0.982 (0.962, 0.995) |
    | AUC per Study Type: Humerus | 0.983 (0.974, 0.991) |
    | AUC per Study Type: Knee | 0.996 (0.993, 0.998) |
    | AUC per Study Type: Pelvis | 0.982 (0.973, 0.989) |
    | AUC per Study Type: Shoulder | 0.962 (0.938, 0.982) |
    | AUC per Study Type: Tibia / Fibula | 0.994 (0.991, 0.997) |
    | AUC per Study Type: Wrist | 0.992 (0.988, 0.996) |

    MRMC Comparative Effectiveness (Reader Performance with AI vs. without AI)

    | Acceptance Criteria | Reported Device Performance |
    |---------------------|-----------------------------|
    | Reader AUC (FX-Aided vs. FX-Unaided) | Improved from 0.912 to 0.952, a difference of 0.0406 (95% CI: 0.0127, 0.0685; p = 0.0043) |
    | Reader Sensitivity (FX-Aided vs. FX-Unaided) | Improved from 0.819 (95% Wilson's CI: 0.794, 0.842) to 0.900 (95% Wilson's CI: 0.880, 0.917) |
    | Reader Specificity (FX-Aided vs. FX-Unaided) | Improved from 0.890 (95% Wilson's CI: 0.879, 0.900) to 0.918 (95% Wilson's CI: 0.908, 0.927) |
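
    The sensitivity and specificity bounds above are Wilson score intervals. As a back-of-the-envelope illustration (the counts below are hypothetical, not the study's actual 2x2 table), such an interval can be computed as follows:

```python
# Wilson score interval for a binomial proportion; hypothetical counts only.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion of `successes` out of `n`."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# A sensitivity of ~0.951 observed on a hypothetical 4,000 fracture-positive
# radiographs yields bounds in the neighborhood of the (0.940, 0.960) above.
print(wilson_ci(3804, 4000))
```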

    Study Details

    2. Sample Size Used for the Test Set and Data Provenance

    • Test Set Sample Size:
      • Standalone Study: 11,970 radiographs.
      • MRMC Reader Study: 175 cases.
    • Data Provenance: Not explicitly stated, but the experts establishing ground truth are specified as U.S. board-certified, suggesting the data is likely from the U.S. There is no indication whether the data was retrospective or prospective, but for an FDA submission of this nature, historical retrospective data is common.

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

    • Number of Experts: A panel of three experts was used for the MRMC study's ground truth.
    • Qualifications: "U.S. board-certified orthopedic surgeons or U.S. board-certified radiologists." Specific years of experience are not mentioned.

    4. Adjudication Method for the Test Set

    • Adjudication Method: A "panel of three" experts assigned a binary ground-truth label (presence or absence of fracture). The exact rule is not stated (e.g., 2-out-of-3 majority, or further adjudication in case of disagreement), but the phrasing implies a consensus of the three experts, with the majority opinion taken as the ground truth.

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • Was an MRMC study done? Yes.
    • Effect Size (Improvement with AI vs. without AI assistance):
      • Readers' AUC significantly improved by 0.0406 (from 0.912 to 0.952).
      • Readers' sensitivity improved by 0.081 (from 0.819 to 0.900).
      • Readers' specificity improved by 0.028 (from 0.890 to 0.918).

    6. Standalone (Algorithm Only) Performance Study

    • Was a standalone study done? Yes.
    • Performance:
      • Sensitivity: 0.951
      • Specificity: 0.893
      • Overall AUC: 0.982
      • Performance remained high across study types and across potential confounders such as image brightness and X-ray manufacturer.

    7. Type of Ground Truth Used

    • Standalone Study: The ground truth for the standalone study is not explicitly detailed, but it most likely relied on the same expert-consensus approach used for the MRMC study.
    • MRMC Study: Expert Consensus by a panel of three U.S. board-certified orthopedic surgeons or U.S. board-certified radiologists.

    8. Sample Size for the Training Set

    • The document does not explicitly state the sample size for the training set. It only mentions "robust scientific principles and industry-standard deep learning algorithms for computer vision" were used for development.

    9. How the Ground Truth for the Training Set Was Established

    • The document does not explicitly describe how the ground truth for the training set was established. It only mentions "Supervised Deep Learning" as the methodology, which implies labeled data was used for training, but the process of obtaining these labels is not detailed.

    K Number
    DEN180005
    Device Name
    OsteoDetect
    Date Cleared
    2018-05-24

    (108 days)

    Product Code
    Regulation Number
    892.2090
    Type
    Direct
    Reference & Predicate Devices
    Why did this record match?
    Product Code :

    QBS

    AI/ML, SaMD, IVD (In Vitro Diagnostic), Therapeutic, Diagnostic, is PCCP Authorized, Third-party, Expedited review
    Intended Use

    OsteoDetect analyzes wrist radiographs using machine learning techniques to identify and highlight distal radius fractures during the review of posterior-anterior (PA) and lateral (LAT) radiographs of adult wrists.

    Device Description

    OsteoDetect is a software device designed to assist clinicians in detecting distal radius fractures during the review of posterior-anterior (PA) and lateral (LAT) radiographs of adult wrists. The software uses deep learning techniques to analyze wrist radiographs (PA and LAT views) for distal radius fracture in adult patients.

    AI/ML Overview

    1. Table of Acceptance Criteria and Reported Device Performance

    Standalone Performance

    | Performance Metric | Acceptance Criteria (Implicit) | Reported Device Performance (Estimate) | 95% Confidence Interval |
    |--------------------|--------------------------------|----------------------------------------|-------------------------|
    | AUC of ROC | High | 0.965 | (0.953, 0.976) |
    | Sensitivity | High | 0.921 | (0.886, 0.946) |
    | Specificity | High | 0.902 | (0.877, 0.922) |
    | PPV | High | 0.813 | (0.769, 0.850) |
    | NPV | High | 0.961 | (0.943, 0.973) |
    | Localization Accuracy (average pixel distance) | Small | 33.52 pixels | Not provided for the average distance itself; a standard deviation of 30.03 pixels is reported |
    | Generalizability (AUC for all subgroups) | High | ≥ 0.926 (lowest subgroup: post-surgical radiographs) | Not explicitly provided for all subgroups; individual subgroup CIs are available in the text |
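
    The first five rows above are all derived from the same 2x2 confusion matrix. A minimal sketch (with hypothetical counts, not the study's data) shows how sensitivity, specificity, PPV, and NPV relate:

```python
# Hypothetical confusion-matrix counts for illustration only.
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true fractures flagged
        "specificity": tn / (tn + fp),  # fraction of normals left unflagged
        "ppv": tp / (tp + fp),          # probability a flagged case is a fracture
        "npv": tn / (tn + fn),          # probability an unflagged case is normal
    }

print(confusion_metrics(tp=350, fp=80, tn=520, fn=30))
```

    Unlike sensitivity and specificity, PPV and NPV depend on the fracture prevalence in the test set, which is why they are reported separately above.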

    MRMC (Reader Study) Performance - Aided vs. Unaided Reads

    | Performance Metric | Acceptance Criteria (Implicit: Superiority of Aided) | OD-Aided (95% CI) | OD-Unaided (95% CI) | p-value for Difference |
    |--------------------|------------------------------------------------------|-------------------|---------------------|------------------------|
    | AUC of ROC | AUC_aided - AUC_unaided > 0 | 0.889 (individual CI not given) | 0.840 (individual CI not given) | 0.0056; 95% CI of the difference: (0.019, 0.080) |
    | Sensitivity | Superior aided | 0.803 (0.785, 0.819) | 0.747 (0.728, 0.765) | Not given; non-overlapping CIs imply significance |
    | Specificity | Superior aided | 0.914 (0.903, 0.924) | 0.889 (0.876, 0.900) | Not given; non-overlapping CIs imply significance |
    | PPV | Superior aided | 0.883 (0.868, 0.896) | 0.844 (0.826, 0.859) | Not given; non-overlapping CIs imply significance |
    | NPV | Superior aided | 0.853 (0.839, 0.865) | 0.814 (0.800, 0.828) | Not given; non-overlapping CIs imply significance |

    2. Sample Size and Data Provenance for Test Set

    Standalone Performance Test Set:

    • Sample Size: 1000 images (500 PA, 500 LAT)
    • Data Provenance: Retrospective. Randomly sampled from an existing validation database of consecutively collected images from patients receiving wrist radiographs at the (b) (4) from November 1, 2016 to April 30, 2017. The study population included images from the US.

    MRMC (Reader Study) Test Set:

    • Sample Size: 200 cases.
    • Data Provenance: Retrospective. Randomly sampled from the same validation database used for the standalone performance study. The data includes cases from the US.

    3. Number of Experts and Qualifications for Ground Truth

    Standalone Performance Test Set and MRMC (Reader Study) Test Set:

    • Number of Experts: Three.
    • Qualifications: U.S. board-certified orthopedic hand surgeons.

    4. Adjudication Method for Test Set

    Standalone Performance Test Set:

    • Adjudication Method (Binary Fracture Presence/Absence): Majority opinion of at least 2 of the 3 clinicians.
    • Adjudication Method (Localization - Bounding Box): The union of the bounding box of each clinician identifying the fracture.

    MRMC (Reader Study) Test Set:

    • Adjudication Method: Majority opinion of three U.S. board-certified orthopedic hand surgeons. (Note: this was defined on a per-case basis, considering PA, LAT, and oblique images if available).
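
    A minimal sketch of these two adjudication rules, a 2-out-of-3 majority for the binary label and the union of expert boxes for localization, is given below. It is an assumption about implementation for illustration only, not taken from the submission; boxes are (top, left, bottom, right) pixel coordinates.

```python
# Hypothetical adjudication of three expert reads into a single ground truth.
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)

def adjudicate(labels: Sequence[bool],
               boxes: Sequence[Optional[Box]]) -> Tuple[bool, Optional[Box]]:
    """Return the 2-of-3 majority label and the union of boxes drawn by the
    experts who marked a fracture (None if the majority label is negative)."""
    fracture = sum(labels) >= 2
    drawn = [b for b in boxes if b is not None]
    if not (fracture and drawn):
        return fracture, None
    union = (min(b[0] for b in drawn), min(b[1] for b in drawn),
             max(b[2] for b in drawn), max(b[3] for b in drawn))
    return fracture, union

# e.g. two of three hand surgeons mark a distal radius fracture:
print(adjudicate([True, True, False],
                 [(100, 80, 180, 160), (95, 90, 175, 150), None]))
```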

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    • Was an MRMC study done? Yes.
    • Effect Size (Improvement of Human Readers with AI vs. without AI assistance):
      • The least squares mean difference between the AUCs for OsteoDetect-aided and OsteoDetect-unaided reads is 0.049 (95% CI: 0.019, 0.080), a statistically significant increase in reader diagnostic accuracy (AUC) when readers were aided by OsteoDetect.
      • Sensitivity: Improved from 0.747 (unaided) to 0.803 (aided), an improvement of 0.056.
      • Specificity: Improved from 0.889 (unaided) to 0.914 (aided), an improvement of 0.025.

    6. Standalone (Algorithm Only) Performance Study

    • Was a standalone study done? Yes.

    7. Type of Ground Truth Used

    Standalone Performance Test Set:

    • Type of Ground Truth: Expert consensus (majority opinion of three U.S. board-certified orthopedic hand surgeons).

    MRMC (Reader Study) Test Set:

    • Type of Ground Truth: Expert consensus (majority opinion of three U.S. board-certified orthopedic hand surgeons).

    8. Sample Size for Training Set

    The document does not explicitly state the sample size for the training set. It mentions a "randomly withheld subset of the model's training data" used to set the operating point, which implies a training set existed, but its size is not provided.

    9. How Ground Truth for Training Set Was Established

    The document does not explicitly state how the ground truth for the training set was established. It only refers to a "randomly withheld subset of the model's training data" during the operating point setting.

