K Number
K240013
Manufacturer
Date Cleared
2024-09-23

(265 days)

Product Code
Regulation Number
870.2200
Panel
CV
Reference & Predicate Devices
Intended Use

EchoGo Heart Failure 2.0 is an automated machine learning-based decision support system, indicated as a diagnostic aid for patients undergoing routine functional cardiovascular assessment using echocardiography. When utilised by an interpreting clinician, this device provides information that may be useful in detecting heart failure with preserved ejection fraction (HFpEF).

EchoGo Heart Failure 2.0 is indicated in adult populations over 25 years of age. Patient management decisions should not be made solely on the results of the EchoGo Heart Failure 2.0 analysis.

EchoGo Heart Failure 2.0 takes as input an apical 4-chamber view of the heart that has been captured and assessed to have an ejection fraction ≥50%.

Device Description

EchoGo Heart Failure 2.0 takes as input a 2D echocardiogram of an apical four chamber tomographic view and reports as output a binary classification suggestive of the presence, or absence of heart failure with preserved ejection fraction (HFpEF). EchoGo Heart Failure 2.0 also provides users with an EchoGo Score ranging from 0 to 100% to support the binary classification. The EchoGo Score informs the binary classification when referenced against the pre-determined decision threshold (50%).

To aid in the interpretation of the EchoGo Score, a comparative visual analysis is provided. A histogram format displays the reported EchoGo Score output against a population of patients with known disease status (Independent Testing Dataset). This allows the user to interpret the EchoGo Score relative to the decision threshold of 50%.
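
The relationship between the EchoGo Score and the binary classification described above can be sketched in a few lines of Python. This is only an illustration, not the device's implementation: the function name `classify`, the label strings, and the choice to treat a score exactly at the threshold as positive are assumptions, since the text states only that the score is referenced against a 50% decision threshold. The conditions under which the device returns "No Classification" are not described and are not modelled here.

```python
DECISION_THRESHOLD = 50.0  # percent; the pre-determined threshold stated in the text


def classify(echogo_score: float, threshold: float = DECISION_THRESHOLD) -> str:
    """Map an EchoGo Score (0 to 100%) to a binary label.

    Assumption: scores at or above the threshold are treated as positive.
    The source does not say how a score exactly at the threshold is handled.
    """
    if not 0.0 <= echogo_score <= 100.0:
        raise ValueError("EchoGo Score must lie between 0 and 100")
    if echogo_score >= threshold:
        return "suggestive of HFpEF"
    return "not suggestive of HFpEF"


# Hypothetical scores, chosen only to exercise both sides of the threshold.
for score in (12.5, 50.0, 87.0):
    print(f"{score:5.1f}% -> {classify(score)}")
```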

EchoGo Heart Failure 2.0 should receive an input echocardiogram that was acquired without contrast and contains at least one full cardiac cycle.

EchoGo Heart Failure 2.0 is fully automated and does not comprise a graphical user interface.

EchoGo Heart Failure 2.0 is intended to be used by an interpreting clinician as an aid to diagnosis for HFpEF. The ultimate diagnostic decision remains the responsibility of the interpreting clinician using patient presentation, medical history, and the results of available diagnostic tests, one of which may be EchoGo Heart Failure 2.0.

EchoGo Heart Failure 2.0 is a prescription only device.

AI/ML Overview

The acceptance criteria and the study demonstrating that the device meets them are summarized below, based on the provided text.

Acceptance Criteria and Reported Device Performance

I. Device Performance (Sensitivity & Specificity)

    • Acceptance Limit: Implicit within reporting of performance. The device must demonstrate sufficient sensitivity and specificity for detecting HFpEF as a diagnostic aid. The specific acceptance limits are not stated as numerical thresholds but are demonstrated by the reported performance being "substantively equivalent to the predicate device and met pre-specified levels of performance."
    • Reported Device Performance: Sensitivity: 90.3% (95% CI: 88.5, 92.4%) when removing "no classification" studies; 84.9% (95% CI: 83.0, 87.5%) when including "no classification" studies. Specificity: 86.1% (95% CI: 83.4, 88.3%) when removing "no classification" studies; 78.6% (95% CI: 75.3, 81.1%) when including "no classification" studies.

II. Accuracy of EchoGo Score (AUROC & Goodness-of-Fit)

    • Acceptance Limit: Implicit within reporting of performance. The EchoGo Score must be accurate and align with known and expected proportions of HFpEF. A non-significant Hosmer-Lemeshow Test result (p-value > 0.05) and a sufficiently high AUROC are expected.
    • Reported Device Performance: Area Under the Receiver Operating Characteristic Curve (AUROC): 0.947 (95% CI: 0.934, 0.958) when removing "no classification" studies; 0.937 (95% CI: 0.924, 0.949) when considering all studies. Hosmer-Lemeshow Test for goodness-of-fit: p=0.304 (not significant, indicating acceptable fit).

III. Proportion of Non-Diagnostic Outputs

    • Acceptance Limit: A priori acceptance limits. The proportion of "no classification" outputs must be within pre-specified limits (the exact numerical limit is not provided, but the text states it was "within a priori acceptance limits").
    • Reported Device Performance: 7.4% (116 out of 1,578 studies) were categorized as "No Classification."

IV. Precision (Repeatability and Reproducibility)

    • Acceptance Limit: Implicit within reporting of performance. The device must demonstrate high repeatability and acceptable reproducibility in its classification output.
    • Reported Device Performance: Repeatability: 100% in all measures. Reproducibility: 82.6% Positive Agreement and 82.4% Negative Agreement.
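
The two reporting conventions used above (removing versus including "no classification" studies) can be illustrated with a short Python sketch. It assumes that, when "no classification" studies are included, they stay in the denominator and count against the device. That assumption is consistent with the lower figures reported above, but the summary does not spell it out. The `Counts` class, the function names, and the toy tallies are hypothetical; only the 116 out of 1,578 figure comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Tallies of device outputs against ground truth (toy values are hypothetical)."""
    tp: int           # cases classified as suggestive of HFpEF
    fn: int           # cases classified as not suggestive of HFpEF
    tn: int           # controls classified as not suggestive of HFpEF
    fp: int           # controls classified as suggestive of HFpEF
    nc_cases: int     # cases returned as "no classification"
    nc_controls: int  # controls returned as "no classification"


def sensitivity(c: Counts, include_nc: bool) -> float:
    # Assumption: included "no classification" cases count as missed detections.
    denominator = c.tp + c.fn + (c.nc_cases if include_nc else 0)
    return c.tp / denominator


def specificity(c: Counts, include_nc: bool) -> float:
    # Assumption: included "no classification" controls count as incorrect.
    denominator = c.tn + c.fp + (c.nc_controls if include_nc else 0)
    return c.tn / denominator


def no_classification_rate(n_no_classification: int, n_total: int) -> float:
    return n_no_classification / n_total


# Hypothetical tallies for 100 cases and 100 controls, not the study data.
toy = Counts(tp=90, fn=6, tn=80, fp=12, nc_cases=4, nc_controls=8)

for include_nc in (False, True):
    label = "including" if include_nc else "removing"
    print(
        f"{label} no-classification: "
        f"sensitivity={sensitivity(toy, include_nc):.3f}, "
        f"specificity={specificity(toy, include_nc):.3f}"
    )

# The one figure taken from the text: 116 of 1,578 studies.
print(f"no-classification rate: {no_classification_rate(116, 1578):.1%}")
```

Under this convention, both metrics can only stay the same or fall when non-diagnostic outputs are included, which matches the direction of the differences in the reported values.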

Study Details

  1. Sample size used for the test set and the data provenance:

    • Test Set Sample Size: 1,578 patients (785 controls and 793 cases).
    • Data Provenance: Retrospective case-control study. The data were collected from multiple independent clinical sites spanning five US states.
  2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

    • The document states that the ground truth was established by "ground truth classifications of cases (HFpEF) or controls," but it does not specify the number or qualifications of experts who established this ground truth for the test set.
  3. Adjudication method for the test set:

    • The document does not explicitly describe an adjudication method (e.g., 2+1, 3+1). It only refers to "ground truth classifications," implying these were already established.
  4. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance:

    • No, a multi-reader multi-case (MRMC) comparative effectiveness study evaluating human readers with and without AI assistance was not done. The study focuses on the standalone performance of the device. The device is intended as a "diagnostic aid" for use "by an interpreting clinician," but its performance evaluation presented here is not an MRMC study.
  5. Whether a standalone study (i.e., algorithm-only performance without a human in the loop) was done:

    • Yes, a standalone performance study was done. The reported sensitivity, specificity, AUROC, and precision values are for the device (algorithm) itself without human intervention in the classification output for the test set. The device provides a "binary classification suggestive of the presence, or absence of heart failure with preserved ejection fraction (HFpEF)" and an "EchoGo Score."
  6. The type of ground truth used (expert consensus, pathology, outcomes data, etc):

    • The ground truth was based on "ground truth classifications of cases (HFpEF) or controls," and "known and expected proportions of HFpEF." While not explicitly stated as "expert consensus," this terminology strongly implies clinical diagnoses were used to establish the HFpEF status for each patient in the dataset. It does not mention pathology or outcomes data specifically for ground truth.
  7. The sample size for the training set:

    • The sample size for the training set is not stated in the provided text. The text notes that the "Subject device AI model was trained on more data and with additional preprocessing steps and data augmentations" compared to the predicate device, and that the testing data cohort represented a "22.9% increase in data beyond the testing data cohort utilized for the 510k submission of EchoGo Heart Failure 1.0."
  8. How the ground truth for the training set was established:

    • The document does not explicitly describe how the ground truth for the training set was established. It only states that the AI model was "trained on more data" with "additional preprocessing steps and data augmentations." It is highly probable it was established similarly to the test set ground truth (i.e., using clinical diagnoses or expert classifications), given the nature of the diagnostic task.

§ 870.2200 Adjunctive cardiovascular status indicator.

(a) Identification. The adjunctive cardiovascular status indicator is a prescription device based on sensor technology for the measurement of a physical parameter(s). This device is intended for adjunctive use with other physical vital sign parameters and patient information and is not intended to independently direct therapy.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Software description, verification, and validation based on comprehensive hazard analysis must be provided, including:
(i) Full characterization of technical parameters of the software, including any proprietary algorithm(s);
(ii) Description of the expected impact of all applicable sensor acquisition hardware characteristics on performance and any associated hardware specifications;
(iii) Specification of acceptable incoming sensor data quality control measures; and
(iv) Mitigation of impact of user error or failure of any subsystem components (signal detection and analysis, data display, and storage) on accuracy of patient reports.
(2) Scientific justification for the validity of the status indicator algorithm(s) must be provided. Verification of algorithm calculations and validation testing of the algorithm using a data set separate from the training data must demonstrate the validity of modeling.
(3) Usability assessment must be provided to demonstrate that risk of misinterpretation of the status indicator is appropriately mitigated.
(4) Clinical data must be provided in support of the intended use and include the following:
(i) Output measure(s) must be compared to an acceptable reference method to demonstrate that the output measure(s) represent(s) the predictive measure(s) that the device provides in an accurate and reproducible manner;
(ii) The data set must be representative of the intended use population for the device. Any selection criteria or limitations of the samples must be fully described and justified;
(iii) Agreement of the measure(s) with the reference measure(s) must be assessed across the full measurement range; and
(iv) Data must be provided within the clinical validation study or using equivalent datasets to demonstrate the consistency of the output and be representative of the range of data sources and data quality likely to be encountered in the intended use population and relevant use conditions in the intended use environment.
(5) Labeling must include the following:
(i) The type of sensor data used, including specification of compatible sensors for data acquisition;
(ii) A description of what the device measures and outputs to the user;
(iii) Warnings identifying sensor reading acquisition factors that may impact measurement results;
(iv) Guidance for interpretation of the measurements, including warning(s) specifying adjunctive use of the measurements;
(v) Key assumptions made in the calculation and determination of measurements;
(vi) The measurement performance of the device for all presented parameters, with appropriate confidence intervals, and the supporting evidence for this performance; and
(vii) A detailed description of the patients studied in the clinical validation (e.g., age, gender, race/ethnicity, clinical stability) as well as procedural details of the clinical study.