Search Results
Found 2 results
510(k) Data Aggregation
FETOLY-HEART (119 days)
FETOLY-HEART is intended to analyse fetal ultrasound images and clips using machine learning techniques to automatically detect heart views and quality criteria within the views. The device is intended for use as a concurrent reading aid during the acquisition and interpretation of fetal ultrasound images.
FETOLY-HEART is indicated for use during routine fetal heart examination of 2nd and 3rd trimester pregnancy (gestational age: from 17 to 40 weeks).
FETOLY-HEART is a software that aims at helping sonographers, obstetricians, radiologists, maternal-fetal medicine specialists, and pediatric cardiologists (designated as healthcare professionals i.e. HCPs) to perform fetal ultrasound examinations of the fetal heart in real-time. FETOLY-HEART can be used by HCPs during fetal ultrasound examinations in the second and third trimesters (gestational age window: from 17 to 40 weeks). The software is intended to assist HCPs in the completeness assessment of the fetal heart ultrasound examination in accordance with national and international guidelines.
To utilize FETOLY-HEART, the software needs to be installed on a hardware device which is connected to an Ultrasound Machine through an HDMI connection. The software receives ultrasound images captured by the connected Ultrasound Machine in real-time. The software's frozen deep learning algorithm, which was trained by supervised learning, analyzes images of this ultrasound image stream to detect heart views and quality criteria within those views. The software provides the following user-accessible information:
- Examination completeness: the software displays in real-time which heart views and quality criteria have been verified by the software during the examination. This is the main and principal output of the FETOLY-HEART device. The verified heart views and quality criteria are accessible to clinicians at any moment of the ultrasound examination, in real-time.
- Completeness illustration: the software selects an image subset that illustrates the verified views and quality criteria. Clinicians can review these images to confirm the presence of the views and criteria. This is a secondary output of the FETOLY-HEART device. Optionally, clinicians can display the detected quality criteria's localization on selected images.
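
Conceptually, the loop described above runs frozen-model inference on each incoming frame and accumulates the verified views plus the best illustrating images. A minimal sketch under that reading, using a hypothetical `frame_source` iterator and `model.predict` helper rather than any actual FETOLY-HEART API:

```python
# Minimal sketch of the real-time completeness loop described above.
# `frame_source` and `model` are hypothetical stand-ins, not FETOLY-HEART APIs.

REQUIRED_VIEWS = {"Abdomen", "Four Chamber", "LVOT", "RVOT", "Three Vessels"}

def track_examination(frame_source, model):
    """Yield the running completeness state after each incoming frame."""
    verified_views = set()
    illustrations = {}  # view -> (confidence, frame) for the completeness illustration
    for frame in frame_source:  # live stream from the ultrasound machine
        # Assumed output: a view label, a dict of criterion -> bool, a confidence.
        view, criteria, confidence = model.predict(frame)
        if view in REQUIRED_VIEWS and all(criteria.values()):
            verified_views.add(view)
            # Keep the highest-confidence frame as the illustrating image.
            if view not in illustrations or confidence > illustrations[view][0]:
                illustrations[view] = (confidence, frame)
        yield verified_views, illustrations  # shown to the HCP in real time
```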
Here's a breakdown of the acceptance criteria and the study proving FETOLY-HEART meets them, based on the provided FDA 510(k) summary:
Acceptance Criteria and Performance Study for FETOLY-HEART
The FETOLY-HEART device uses machine learning to automatically detect fetal heart views and quality criteria within those views from ultrasound images. The acceptance criteria focus on the device's accuracy in these detections, measured by sensitivity, specificity, and mean Intersection over Union (mIoU) for bounding box localization.
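
These three metrics have standard definitions. As a reference point only (not the manufacturer's evaluation code), a minimal Python sketch, assuming per-view true/false positive and negative counts and bounding boxes given as `(x1, y1, x2, y2)` tuples:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def iou(box_a, box_b) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def mean_iou(pred_boxes, gt_boxes) -> float:
    """mIoU for one criterion: average IoU over images where it is present."""
    return sum(iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)
```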
1. Table of Acceptance Criteria and Reported Device Performance
Measured Metric | Acceptance Criteria | Reported Device Performance (Point Estimate) | Bootstrap CI (95%)
---|---|---|---
Fetal Heart View Detection | | |
Sensitivity (for each view) | ≥ 85% | Abdomen: 0.976; Four Chamber: 0.987; LVOT: 0.983; RVOT: 0.987; Three Vessels: 0.981 | Abdomen: (0.960, 0.990); Four Chamber: (0.974, 0.997); LVOT: (0.969, 0.994); RVOT: (0.974, 0.996); Three Vessels: (0.965, 0.993)
Specificity (for each view) | ≥ 85% | Abdomen: 0.998; Four Chamber: 1.00; LVOT: 0.999; RVOT: 0.998; Three Vessels: 0.998 | Abdomen: (0.996, 1.000); Four Chamber: (1.000, 1.000); LVOT: (0.998, 1.000); RVOT: (0.996, 1.000); Three Vessels: (0.997, 1.000)
Quality Criteria Detection | | |
Sensitivity (per criterion) | ≥ 90% | Ranges from 0.903 (Abdomen - Left rib) to 0.990 (Four Chamber - Left atrium) | All reported CI lower bounds met the ≥ 0.90 acceptance criterion
Specificity (per criterion) | ≥ 90% | Ranges from 0.990 (Four Chamber - Connection between crux and atrial septum) to 1.00 (Four Chamber - Right atrium/Foramen ovale flap) | All reported CI lower bounds met the ≥ 0.90 acceptance criterion
Quality Criteria Bounding Box Localization | | |
Mean Intersection over Union (mIoU, per criterion) | ≥ 50% | Ranges from 0.512 (Abdomen - Inferior vena cava) to 0.792 (Three Vessels - Spine) | All reported CI lower bounds met the ≥ 0.50 acceptance criterion
Note: For the detailed range of sensitivities, specificities, and mIoU for each of the 52 quality criteria, refer to the tables provided in the original document (pages 15-17).
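
The 95% intervals in the table are bootstrap CIs. A minimal sketch of a percentile bootstrap over per-image binary outcomes, which is one common way to obtain such intervals; the summary does not specify the resampling unit (image vs. patient case), so that choice is an assumption here:

```python
import random

def bootstrap_ci(outcomes, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat` over a list of per-case outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = sorted(
        stat([outcomes[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Example: sensitivity from per-image hits (1 = view detected, 0 = missed).
# Counts are illustrative only, not the study's raw data.
hits = [1] * 488 + [0] * 12
lo, hi = bootstrap_ci(hits, stat=lambda xs: sum(xs) / len(xs))
print(f"95% bootstrap CI: ({lo:.3f}, {hi:.3f})")
```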
2. Sample Size and Data Provenance
- Test Set Sample Size: 2,288 fetal ultrasound images across 480 patient cases.
- Data Provenance: The data originated from 7 distinct clinical sites in the United States. The data was collected retrospectively in reverse chronological order. It includes full examination still images, cardiac clip frames, and full examination video frames. The cases are stated to be representative of the intended use population.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: Six annotators and additional adjudicators.
- Qualifications of Experts: 3 sonographers and 3 OB/GYN doctors. Specific experience levels (e.g., "10 years of experience") are not provided, but their professional titles indicate clinical expertise in relevant fields.
4. Adjudication Method for the Test Set
- View Classification: A 2+1 ground truth procedure was used (a code sketch of both adjudication rules follows this list).
    - Images were assigned to pairs of annotators.
    - If the two annotators agreed on the view classification, that label was taken as the ground truth.
    - If they disagreed, an adjudicator reviewed the image and made the final decision.
- Quality Criteria Classification and Localization (Bounding Boxes):
    - Each image was annotated by a pair of annotators who drew bounding boxes.
    - Agreement on localization: if the two bounding boxes had at least 50% overlap, their coordinates were averaged to form the ground truth.
    - Disagreement on presence or localization: if the overlap was lower, or the annotators disagreed on the criterion's presence, an adjudicator reviewed the boxes.
    - The final decision on the criterion's presence was made by majority consensus among the adjudicator and the two annotators.
    - The final decision on the localization (bounding box) was the adjudicator's: keep one of the annotators' boxes or draw a new one.
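
A minimal sketch of the two adjudication rules above. The `overlap` measure is an assumption (the summary says only "at least 50% overlap" without defining it; IoU is one plausible reading), and `adjudicate` is a placeholder for the human adjudicator's decision:

```python
def view_ground_truth(label_a, label_b, adjudicate):
    """2+1 procedure for view classification: agree, else ask the adjudicator."""
    return label_a if label_a == label_b else adjudicate(label_a, label_b)

def box_ground_truth(box_a, box_b, overlap, adjudicate, threshold=0.5):
    """Ground-truth box from two annotators' boxes (x1, y1, x2, y2).

    `overlap` stands in for the undefined overlap measure, and `adjudicate`
    models the adjudicator keeping one box or drawing a new one.
    """
    if overlap(box_a, box_b) >= threshold:
        # Agreement: average the two annotators' coordinates.
        return tuple((a + b) / 2 for a, b in zip(box_a, box_b))
    # Disagreement: defer to the adjudicator.
    return adjudicate(box_a, box_b)
```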
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- The document does not indicate that an MRMC comparative effectiveness study was done to evaluate how much human readers improve with AI vs. without AI assistance. The study described is a standalone performance test of the algorithm itself.
6. Standalone Performance (Algorithm Only) Study
- Yes, a standalone performance study was conducted. The results presented in the tables (sensitivity, specificity, mIoU) are for the FETOLY-HEART algorithm's performance without integration into a human reading workflow or human-in-the-loop performance measurement.
7. Type of Ground Truth Used
- The ground truth used was expert consensus based on a 2+1 adjudication method by a panel of sonographers and OB/GYN doctors. For quality criteria localization, it involved expert-drawn bounding boxes with an adjudication process.
8. Sample Size for the Training Set
- The document states that the testing dataset originated from clinical sites distinct from those that supplied the data used during model development (training/validation), ensuring independence of the test data. However, the sample size for the training set is not explicitly provided in the given text.
9. How the Ground Truth for the Training Set Was Established
- The document states, "The software's frozen deep learning algorithm, which was trained by supervised learning...". While it confirms supervised learning was used (implying labeled data), it does not explicitly detail the method for establishing ground truth for the training set. It only describes the ground truth establishment for the test set. It is common for ground truth for training data to be established by experts, potentially through similar consensus or adjudication processes, but this specific document does not describe it for the training set.
Voluson SWIFT, Voluson SWIFT+ (132 days)
The Voluson SWIFT, Voluson SWIFT+ is a general-purpose diagnostic ultrasound system intended for use by qualified and trained healthcare professionals who are legally authorized or licensed in the country, state or other local municipality in which they practice, for ultrasound imaging, measurement, display and analysis of the human body and fluids. The users may or may not be working under the supervision or authority of a physician. Voluson SWIFT, Voluson SWIFT+ clinical applications include: Fetal Obstetrics; Abdominal (including renal and GYN/Pelvic); Pediatric; Small Organ (Breast, Testes, Thyroid, etc.); Neonatal Cephalic; Cardiac (Adult and Pediatric); Peripheral Vascular (PV); Musculo-skeletal Conventional and Superficial; Transrectal (including Urology/Prostate) (TR); Transvaginal (TV).
Modes of operation include: B, M, AMM (Anatomical M-Mode), PW Doppler, CW Doppler, Color Doppler, Color M Doppler, Power Doppler, HD-Flow (High Definition Flow), Harmonic Imaging, Coded Pulse, 3D/4D Imaging mode, Elastography, B-Flow, and Combined modes: B/M, B/Color, B/PWD, B/Power/PWD. The Voluson SWIFT / Voluson SWIFT+ are intended to be used in a hospital or medical clinic.
The subject device is a Track 3 device, primarily intended for general-purpose radiology evaluation and specialized for OB/GYN, with particular features for real-time 3D/4D acquisition. The Voluson SWIFT, Voluson SWIFT+ provide digital acquisition, processing and display capability. They consist of a mobile console with a control panel, a full-touch monitor, and optional image storage and printing devices. They provide high-performance ultrasound imaging and analysis and have comprehensive networking and DICOM capability. They utilize a variety of linear, curved linear, and matrix phased array transducers, including mechanical and electronic scanning transducers, which provide accurate real-time three-dimensional imaging supporting all standard acquisition modes.
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:
1. Acceptance Criteria and Device Performance for Auto-Caliper:

Acceptance Criteria | Reported Device Performance
---|---
Success rate of the AI feature (Auto-Caliper) for 2D caliper placement is 70% or higher. | A single success-rate percentage is not reported against the 70% criterion. Instead, the summary reports the distribution of the absolute difference between the AI-predicted diameter and the ground-truth diameter: 5.3% of cases differ by less than 0.1 mm, and 85.4% of cases fall within 2.0 mm (5.3 + 32.8 + 22.1 + 25.2 = 85.4%). If "success" is defined by a tolerance such as 2 mm, performance is high, but the mapping to the stated 70% criterion is not made explicit.
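
A minimal sketch of how such an error distribution can be tabulated from paired AI and manual diameters. The bucket edges beyond "less than 0.1 mm" and "within 2.0 mm" are not given in the excerpt, so the intermediate edges below are illustrative assumptions:

```python
def error_distribution(pred_mm, truth_mm, edges=(0.1, 0.5, 1.0, 2.0)):
    """Bucket absolute diameter errors (in mm) by the given upper edges.

    Returns the percentage of cases per bucket; the last bucket collects
    errors above the largest edge. Edges other than 0.1 and 2.0 are
    illustrative, not taken from the 510(k) summary.
    """
    buckets = [0] * (len(edges) + 1)
    for p, t in zip(pred_mm, truth_mm):
        err = abs(p - t)
        for i, edge in enumerate(edges):
            if err <= edge:
                buckets[i] += 1
                break
        else:
            buckets[-1] += 1  # error exceeds the largest edge
    n = len(pred_mm)
    return [100.0 * b / n for b in buckets]
```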
Study Details for Auto-Caliper Feature:
2. Sample Size Used for the Test Set and Data Provenance:
* Test Set Sample Size: 67 volumes, with a total of 134 follicles evaluated (2 follicles per volume).
* Data Provenance: Data collected across multiple geographical sites: Germany, India, Spain, United Kingdom, USA. The data was collected from women examined in regular clinical practice. It was de-identified by external clinical partners.
* Retrospective/Prospective: The data appears to be retrospective: it was collected from women examined in regular clinical practice and then de-identified.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:
* Number of Experts: Not explicitly stated as a single number. The verification for the Auto-Caliper AI feature was performed by "clinical experts" following a specific protocol. The "truthing process" for training data mentions "clinical experts" and a "senior sonographer" reviewing a random subset. It is reasonable to infer that experts with similar qualifications were used for the test set ground truth.
* Qualifications: "Clinical experts" and a "senior sonographer." Specific experience levels (e.g., "10 years of experience") are not provided.
4. Adjudication Method for the Test Set:
* Method: The outputs were evaluated by the clinical expert and the assessment was documented as Pass/No result/Fail. This suggests a qualitative assessment by individual experts rather than a specific multi-reader consensus method like 2+1 or 3+1 for resolving discrepancies. However, the quantitative evaluation involved calculating the deviation from manual measurements, which likely served as the definitive ground truth for performance metrics.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
* Was one done? No. The document does not describe an MRMC comparative effectiveness study where human readers' performance with and without AI assistance was compared.
6. Standalone (Algorithm Only) Performance Study:
* Was one done? Yes. The provided data is a standalone performance evaluation of the Auto-Caliper AI feature, comparing its predictions against ground truth (manual measurements by experts). The evaluation directly reports the deviation of the AI predictions from the manual measurements.
7. Type of Ground Truth Used:
* Type: Expert consensus / Manual measurements. The ground truth was established by "clinical experts" who manually placed calipers following a specific protocol. The "deviation of the measurements predicted by the Auto-Caliper tool from the manual measurements" confirms that expert manual measurements served as the reference.
8. Sample Size for the Training Set:
* Sample Size: 223 volumes.
9. How the Ground Truth for the Training Set Was Established:
* Method: A "curation protocol has been developed by clinical experts to be followed by curators." Additionally, "during and after the data curation process, a senior sonographer reviewed a random subset of the curated dataset for clinical accuracy." This indicates a structured process involving clinical expert input and review to establish the ground truth for the training data.