510(k) Data Aggregation

    K Number: K250670
    Date Cleared: 2025-06-30 (117 days)
    Product Code:
    Regulation Number: 892.2050

    Reference & Predicate Devices
    Predicate For: N/A
    Intended Use

    EchoConfidence is Software as a Medical Device (SaMD) that displays images from a Transthoracic Echocardiogram, and assists the user in reviewing the images, making measurements and writing a report.

    The intended medical indication is for patients requiring review or analysis of echocardiographic images acquired for assessment of their cardiac anatomy, structure and function. This includes automatic view classification; segmentation of cardiac structures including the left and right ventricles, chamber walls, left and right atria, and great vessels; measures of cardiac function; and Doppler assessments.

    The intended patient population is both healthy individuals and patients in whom an underlying cardiac disease is known or suspected; the intended age range covers adults (>= 22 years old) and adolescents aged 18 to 21 years.

    Device Description

    EchoConfidence is Software as a Medical Device (SaMD) that displays images from a Transthoracic Echocardiogram, and assists the user in reviewing the images, making measurements and writing a report.

    AI/ML Overview

    The following analysis summarizes the acceptance criteria and performance study described in the FDA 510(k) clearance letter for EchoConfidence (USA):

    Acceptance Criteria and Device Performance Study for EchoConfidence (USA)

    The EchoConfidence (USA) device, a Software as a Medical Device (SaMD) for reviewing, measuring, and reporting on Transthoracic Echocardiogram images, underwent a clinical evaluation to demonstrate its performance against predefined acceptance criteria.

    1. Acceptance Criteria and Reported Device Performance

    The primary acceptance criteria for EchoConfidence were based on the mean absolute error (MAE) of the AI's measurements relative to the measurements of three human experts. The reported performance indicates that the device met these criteria.

    Primary Criteria (AI vs. Human Expert MAE)
    Acceptance Criteria: The upper bound of the 95% confidence interval for the difference between the MAE of the AI (against the 3 human experts) and the MAE of the 3 human experts (against each other) must be less than +25%.
    Reported Performance: In the majority of cases, the point estimate of the difference between the AI's MAE and the inter-expert MAE was substantially below 0%, indicating that the AI agreed with the human experts more closely than the experts agreed with each other. The upper bound of the 95% confidence interval was consistently below 0%, well under the +25% criterion.

    Subgroup Analysis (Consistency)
    Acceptance Criteria: The performance criteria must be met across demographic and technical subgroups to demonstrate robust, generalizable performance.
    Reported Performance: Across 20 subgroups (age, gender, ethnicity, cardiac pathology, ultrasound equipment vendor/model, year of scan, and qualitative image quality), the finding was consistent: the point estimate showed the AI agreeing with the human experts more closely than the experts agreed with each other, and the upper bound of the 95% confidence interval was below 0%, well under the +25% criterion.
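    Stated more formally (a reconstruction from the description above; the notation is introduced here and does not come from the clearance letter), let $m^{AI}_c$ be the AI's measurement on case $c$ and $m^{(i)}_c$ the measurement of expert $i$, over $N$ test cases and 3 experts:

    $$\mathrm{MAE}_{AI} = \frac{1}{3}\sum_{i=1}^{3}\frac{1}{N}\sum_{c=1}^{N}\left|m^{AI}_c - m^{(i)}_c\right|, \qquad \mathrm{MAE}_{exp} = \frac{1}{3}\sum_{1 \le i < j \le 3}\frac{1}{N}\sum_{c=1}^{N}\left|m^{(i)}_c - m^{(j)}_c\right|$$

    The acceptance criterion then requires the upper bound of the 95% confidence interval for $(\mathrm{MAE}_{AI} - \mathrm{MAE}_{exp})/\mathrm{MAE}_{exp}$ to lie below $+0.25$, presumably evaluated per measured parameter.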

    2. Sample Size and Data Provenance

    • Test Set Sample Size: 200 echocardiographic cases from 200 different patients.
    • Data Provenance: All cases were delivered via a US Echocardiography CoreLab. The data used for validation was derived from non-public, US-based sources and was kept on servers controlled by the CoreLab, specifically to prevent it from entering the training dataset. The study was retrospective.

    3. Number and Qualifications of Experts for Ground Truth

    • Number of Experts: Three (3) human experts.
    • Qualifications of Experts: The experts were US accredited and US-based, employed by the US CoreLab that supplied the data. While specific years of experience are not mentioned, their accreditation and employment by a CoreLab imply significant expertise in echocardiography and clinical measurements.

    4. Adjudication Method for the Test Set

    The ground truth was established by having each of the three human experts independently perform the measurements for each echocardiogram, as if for clinical use. A physician then reviewed and, where needed, adjusted approximately 10% of the measurements. This can be interpreted as a three-expert read with a final physician review/adjudication of a subset of cases. The primary analysis, however, preserved each expert's individual measurements rather than averaging them: the AI's MAE was computed against each expert's reads and compared with the inter-expert MAE.
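    The submission does not include analysis code; the following is a minimal sketch of how such an endpoint could be computed, assuming per-parameter measurement arrays and a case-level bootstrap (the function name and interface are hypothetical):

        import numpy as np
        from itertools import combinations

        def relative_mae_difference(ai, experts, n_boot=2000, seed=0):
            """Point estimate and bootstrap 95% CI for
            (MAE_ai - MAE_expert) / MAE_expert for one measured parameter.

            ai      : (N,) AI measurements across N cases.
            experts : (K, N) independent reads from K human experts;
                      individual reads are preserved, never averaged.
            """
            rng = np.random.default_rng(seed)
            K, N = experts.shape

            def rel_diff(idx):
                a, e = ai[idx], experts[:, idx]
                # AI MAE: average of the AI's MAE against each expert.
                mae_ai = np.mean([np.mean(np.abs(a - e[i])) for i in range(K)])
                # Inter-expert MAE: average over all expert pairs.
                mae_exp = np.mean([np.mean(np.abs(e[i] - e[j]))
                                   for i, j in combinations(range(K), 2)])
                return (mae_ai - mae_exp) / mae_exp

            point = rel_diff(np.arange(N))
            boots = [rel_diff(rng.integers(0, N, N)) for _ in range(n_boot)]
            lo, hi = np.percentile(boots, [2.5, 97.5])
            return point, lo, hi

        # Acceptance check: the upper 95% CI bound must fall below +25%.
        # point, lo, hi = relative_mae_difference(ai_vals, expert_vals)
        # passed = hi < 0.25

    The same check would be repeated within each of the 20 reported subgroups to establish the consistency finding described above.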

    5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

    The provided text does not explicitly describe an MRMC comparative effectiveness study in which human readers' performance with AI assistance is compared to their performance without it (i.e., an effect size for reader improvement). Rather, the study compares the AI's performance directly to that of human experts and measures inter-expert variability. The device is described as assisting the user in reviewing images, making measurements, and writing reports, which suggests a human-in-the-loop application, but no MRMC study measuring reader improvement with AI assistance is detailed.

    6. Standalone (Algorithm Only) Performance

    Yes, a standalone performance study was done. The primary acceptance criteria directly evaluate the "mean absolute error" (MAE) of the AI against the 3 human expert reads. This directly assesses the algorithm's performance in generating measurements without human intervention during the measurement process, assuming the output measurements are directly from the AI. The comparison with inter-expert variability helps contextualize this standalone AI performance.

    7. Type of Ground Truth Used

    The ground truth used was expert consensus / expert measurements. The process involved three human experts independently performing measurements, with a physician reviewing and potentially adjusting ~10% of these measurements. This establishes a "clinical expert gold standard" based on their interpretation and measurement.

    8. Sample Size for the Training Set

    The sample size for the training set is not explicitly stated in the provided document. It only mentions that the dataset used for development and internal testing was derived from a separate source and was not from the US-based CoreLab that provided the validation data.

    9. How Ground Truth for the Training Set Was Established

    The method for establishing ground truth for the training set is not explicitly described in the provided document. It only states that the development dataset was separate from the validation dataset and that within the development dataset, source patients were specifically tagged as being used for either training or internal testing.
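    The tagging mechanism is not described in the document; one common pattern, shown here as a hypothetical sketch only, is to tag deterministically at the patient level so that no source patient contributes studies to both training and internal testing:

        import hashlib

        def split_tag(patient_id: str, test_fraction: float = 0.2) -> str:
            """Deterministically tag a source patient as 'train' or 'internal_test'.

            Hashing the patient ID (rather than the study ID) keeps every
            echocardiogram from the same patient on one side of the split,
            preventing leakage between training and internal testing.
            """
            digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
            return "internal_test" if bucket < test_fraction else "train"

        # Example: split_tag("patient-00123") -> 'train' or 'internal_test'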
