510(k) Data Aggregation

    K Number
    K223347
    Manufacturer
    UltraSight Inc.
    Date Cleared
    2023-07-24 (265 days)
    Product Code
    Regulation Number
    892.2100
    Reference & Predicate Devices
    N/A
    Intended Use

    The UltraSight AI Guidance is intended to assist medical professionals (not including expert sonographers) in acquiring cardiac ultrasound images. UltraSight AI Guidance is an accessory to compatible general-purpose diagnostic ultrasound systems. UltraSight AI Guidance is indicated for use in two-dimensional transthoracic echocardiography (2D-TTE) for adult patients, specifically in the acquisition of the following standard views: Parasternal Long-Axis (PLAX), Parasternal Short-Axis at the Aortic Valve (PSAX-AV), Parasternal Short-Axis at the Mitral Valve (PSAX-MV), Parasternal Short-Axis at the Papillary Muscle (PSAX-PM), Apical 4-Chamber (AP4), Apical 5-Chamber (AP5), Apical 2-Chamber (AP2), Apical 3-Chamber (AP3), Subcostal 4-Chamber (SubC4), and Subcostal Inferior Vena Cava (SC-IVC).

    Device Description

    UltraSight AI Guidance is a mobile application based on machine learning that uses artificial intelligence (AI) to provide dynamic, real-time guidance on the position and orientation of the transducer, helping non-expert users acquire diagnostic-quality tomographic views of the heart. The system provides guidance for ten standard cardiac views.

    AI/ML Overview

    Here's a detailed breakdown of the acceptance criteria and the studies performed for the UltraSight AI Guidance device, based on the provided text:

    1. Table of Acceptance Criteria and Reported Device Performance

    For Standalone AI Performance (Algorithm Only):

    | Performance Metric | Acceptance Criteria | Reported Device Performance (Mean) | 95% Confidence Interval | Additional Notes |
    |---|---|---|---|---|
    | Quality Bar | AUC > 0.8 | 0.86 | [0.85, 0.87] | Shows good classification performance. |
    | Quality Bar | PPV > 0.75 | 0.93 | [0.92, 0.94] | Shows good classification performance. |
    | View Detection | AUC > 0.8 | 0.988 | [0.985, 0.990] | Good classification performance for "Hold position" vs. "Navigate" and "Hold position" vs. "No heart". Stratified analysis also met this. |
    | Probe Guidance | AUC > 0.8 | 0.821 | [0.813, 0.827] | Good classification performance for guiding probe movements. Stratified tests showed acceptable individual classifiers. |
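
    For orientation, metrics like those above can be reproduced with standard tooling. The sketch below is a minimal, hypothetical example: the labels and scores are synthetic stand-ins (the submission's test data are not public), the 0.5 operating point for PPV is an assumption, and a bootstrap CI is shown only as one common way to obtain intervals like those reported.

```python
# Illustrative sketch only: synthetic stand-ins for per-clip labels and model
# scores -- not actual UltraSight data. n = 312 mirrors the Quality Bar test set.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=312)                  # 1 = diagnosable, 0 = not
scores = np.clip(0.6 * labels + rng.normal(0.3, 0.2, 312), 0.0, 1.0)

auc = roc_auc_score(labels, scores)
ppv = precision_score(labels, (scores >= 0.5).astype(int))  # assumed 0.5 operating point

# Bootstrap 95% CI for AUC -- one common method; the submission does not state
# how its confidence intervals were derived.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(labels), size=len(labels))
    if labels[idx].min() != labels[idx].max():         # resample must contain both classes
        boot.append(roc_auc_score(labels[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"AUC {auc:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], meets AUC > 0.80: {auc > 0.80}")
print(f"PPV {ppv:.3f}, meets PPV > 0.75: {ppv > 0.75}")
```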

    For Clinical Performance (Human-in-the-loop with AI guidance):

    | Performance Metric (Visual Quality for Assessment, Majority Agreement) | Acceptance Criteria (Implicit: demonstrate non-experts can acquire diagnostic quality) | Reported Device Performance (Non-expert Users with AI Guidance) | Additional Notes |
    |---|---|---|---|
    | LV size | N/A (comparison to sonographer performance implicitly desired) | 93-100% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | LV function | N/A | 93-100% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | RV size | N/A | 93-100% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | Non-trivial pericardial effusion | N/A | 93-100% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | MV structure | N/A | 98% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | RV function | N/A | 94% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | Left atrium size | N/A | 94% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | AV structure | N/A | 89% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | TV structure | N/A | 74% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | IVC size | N/A | 67% of cases (Pivotal Study) | Sufficient visual quality for assessment. |
    | Diagnostic Quality Score >= 3 (ACEP scale) for specific views | N/A | Pilot Study: 78.6-97.9% of clips | Most clips taken by non-expert users met this. |

    Note: The clinical study explicitly states the goal was to evaluate whether non-expert users could acquire diagnostic quality images with the AI guidance. The results demonstrate this effectively, comparing favorably to the predicate. The "acceptance criteria" for clinical performance are implicitly met by successful completion of these clinical endpoints at high percentages.

    2. Sample Size Used for the Test Set and Data Provenance

    For Standalone AI Performance Testing:

    • Quality Bar Test Set: 312 clips
    • View Detection & Probe Guidance Test Set: 75 subjects, totaling 2.3 million frames of ultrasound images.
    • Data Provenance: The performance-testing (test set) data were collected at sites geographically separated from those used for algorithm development, from a population representative of the intended use population. The data appear to be retrospective, as the test set was pre-collected. The country of origin is not explicitly stated.

    For Clinical Performance (Pivotal Study):

    • Test Set (Subjects): 240 subjects.
    • Data Provenance: Prospective, multi-center study. The country of origin is not explicitly stated, but the submission is to the US FDA, suggesting the study likely included US sites or data relevant to the US population. The comparison group was scans by cardiac sonographers without AI guidance.

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

    For Standalone AI Performance Testing (Quality Bar):

    • Number of Experts: 3.
    • Qualifications: Cardiologists. No further details on their years of experience are provided.

    For Clinical Performance (Pilot & Pivotal Studies):

    • Number of Experts: 5.
    • Qualifications: Expert cardiologists. No further details on their years of experience are provided.

    4. Adjudication Method for the Test Set

    For Standalone AI Performance Testing:

    • Quality Bar: Each clip was annotated with a "diagnosable / non-diagnosable" label by three cardiologists. The final ground truth is based on their inputs, but the exact adjudication method (e.g., simple majority, weighted majority, or a consensus meeting) is not explicitly detailed. Given that majority agreement was used for the clinical parameters in the clinical study (below), a similar approach was likely used for the standalone ground truth.
    • View Detection: Ground truth labels were defined at the frame level using annotations by expert sonographers. No specific number of experts or adjudication method is described beyond "expert sonographers."
    • Probe Guidance: Similar to view detection, ground truth for guidance cues was established by experts, but the exact method or number of experts for adjudication is not detailed.

    For Clinical Performance (Pilot & Pivotal Studies):

    • Adjudication Method: Cardiologists reviewed clips and their assessments were based on majority agreement for visual quality of cardiac parameters. They were blinded to whether the clip was acquired by a non-expert user or a sonographer and to each other's evaluations. Cohen's kappa coefficient was used to assess intra-cardiologist variability.
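
    As a concrete illustration of the majority-agreement adjudication described above, the following sketch derives per-case labels from five blinded reads and computes Cohen's kappa for one reader pair. The reads are synthetic; only the case and reader counts come from the study description, and which reader pairs were compared in the actual analysis is not stated.

```python
# Hypothetical illustration of the adjudication scheme: five blinded cardiologist
# reads per case (1 = adequate visual quality, 0 = not); synthetic data, with the
# case and reader counts taken from the pivotal study description.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
reads = rng.integers(0, 2, size=(240, 5))       # 240 cases x 5 cardiologists

# Majority agreement: a case counts as adequate when >= 3 of 5 readers say so.
majority = (reads.sum(axis=1) >= 3).astype(int)
print(f"Adequate by majority agreement: {100 * majority.mean():.1f}% of cases")

# Cohen's kappa for one reader pair (chance-corrected pairwise agreement).
kappa = cohen_kappa_score(reads[:, 0], reads[:, 1])
print(f"Cohen's kappa, reader 1 vs reader 2: {kappa:.2f}")
```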

    5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance

    Yes, a prospective, multi-center pivotal clinical study with multiple expert readers and multiple cases (an MRMC-style design) was conducted.

    • Comparison: Non-expert users with UltraSight AI Guidance vs. Cardiac sonographers without AI guidance (using the same hardware).

    • Effect Size (Improvement with AI for Non-expert Users): The study demonstrated that non-expert users with AI guidance achieved diagnostic-quality scans comparable to those performed by sonographers. For the four co-primary endpoints (LV size, LV function, RV size, and non-trivial pericardial effusion), non-expert users with AI guidance acquired scans deemed to have adequate visual quality in 93-100% of cases, based on majority agreement of expert cardiologists. Note that the pivotal study did not include a non-expert-without-AI arm; the comparator was expert sonographers without AI guidance. The implicit premise is that non-experts could not reach diagnostic quality unaided, so the practical effect of the AI guidance is to raise non-expert performance to near-expert levels.

      The Pilot Study provides further context: "The exams performed by the non-expert users had sufficient visual quality in 100% of cases based on majority agreement to assess LV size and function, RV size, and pericardial effusion." This further reinforces the high effectiveness.
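
    To attach uncertainty to endpoint percentages like the 93-100% figures above, a Wilson score interval for a binomial proportion is one standard option; the sketch below uses a hypothetical count of 224 of 240 subjects, since per-endpoint counts are not given in the text, and the submission does not state its statistical method.

```python
# Wilson score 95% CI for a binomial proportion -- shown as one standard choice;
# the submission does not state how its endpoint percentages were analyzed.
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical count: an endpoint judged adequate in 224 of 240 subjects (93.3%),
# chosen only to fall inside the reported 93-100% range.
lo, hi = wilson_ci(224, 240)
print(f"93.3% adequate, 95% CI [{100 * lo:.1f}%, {100 * hi:.1f}%]")
```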

    6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

    Yes, standalone performance testing of the AI algorithms was conducted. The results for the Quality Bar, View Detection, and Probe Guidance features, with their respective AUC and PPV scores against defined acceptance criteria, are presented under "Non-Clinical Standalone Performance Testing of AI algorithms."

    7. The Type of Ground Truth Used

    • For Standalone AI Performance (Quality Bar): Expert cardiologist annotation ("diagnosable / non-diagnosable") based on ACEP guidelines.
    • For Standalone AI Performance (View Detection & Probe Guidance): Expert sonographer annotations.
    • For Clinical Performance (Pilot & Pivotal Studies): Expert cardiologist consensus (majority agreement) on the visual quality for assessing various cardiac parameters. This is effectively expert consensus.

    8. The Sample Size for the Training Set

    The document notes "Algorithm development" and lists a "number of subjects" and "number of samples." While not explicitly called "training set," these numbers represent the data used for the algorithm's development.

    • Number of Subjects: 580
    • Number of Samples: 5 million frames of ultrasound images

    9. How the Ground Truth for the Training Set Was Established

    The document states that the performance-testing (test set) data were collected at "different sites, geographically separated, from the sites used for collection of the algorithm development data." This implies that ground truth for the algorithm development (training) data was established through a labeling process suitable for training the deep learning models, presumably expert annotation similar to that used for the test set. However, the specific methods are not described in the provided text: no details are given on the number of experts, their qualifications, or the adjudication methods for the training data.
