Search Results

The Philips Lumify Diagnostic Ultrasound System is intended for diagnostic ultrasound imaging in B (2D), Color Doppler, Combined (B+Color), Pulsed Wave Doppler (PWD), and M-modes.

It is indicated for diagnostic ultrasound imaging and fluid flow analysis in the following applications: Fetal/Obstetric, Abdominal, Pediatric, Cephalic, Urology, Gynecological, Cardiac Fetal Echo, Small Organ, Musculoskeletal, Peripheral Vessel, Carotid, Cardiac, Lung,

The Lumify system is a transportable ultrasound system intended for use in environments where healthcare is provided by healthcare professionals.

Device Description

The Philips Lumify Diagnostic Ultrasound System (Lumify) is a mobile, durable, and reusable, software-controlled medical device, which is intended to acquire high-resolution ultrasound data and to display the data in B (2D), Pulsed Wave Doppler, Color Doppler, Combined (B+Color), and M modes.

The Lumify Diagnostic Ultrasound System (Android) utilizes:

A commercial off-the-shelf (COTS) Android mobile device (smart phone or tablet)
The Philips Ultrasound Lumify software running as an application on the COTS device
The Philips C5-2 Curved array USB transducer
The Philips L12-4 Linear array USB transducer
The Philips S4-1 Sector array USB transducer
Lumify Micro B Transducer Cable
Lumify Micro C Transducer Cable

The Lumify system is compatible with iOS or Android operating systems. The Lumify system software provides various imaging features, including an Android-specific feature with artificial intelligence (AI) based, Auto EF Quantification (ejection fraction) technology during cardiac imaging.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study details for the Philips Lumify Diagnostic Ultrasound System with Auto EF Quantification, based on the provided FDA 510(k) summary:

1. Table of Acceptance Criteria and Reported Device Performance

Acceptance Criteria	Reported Device Performance
Correlation of LVivo EF Ejection Fraction (EF) measurements with average manual tracing results.	Strong correlation demonstrated: r = 0.82, 95% CI (0.72, 0.88). The endpoint criteria were met.
Correlation of LVivo EF End-Diastolic Volume (EDV) measurements with average manual tracing results.	Strong correlation demonstrated: r = 0.95, 95% CI (0.91, 0.96).
Correlation of LVivo EF End-Systolic Volume (ESV) measurements with average manual tracing results.	Strong correlation demonstrated: r = 0.94, 95% CI (0.90, 0.96).
Percentage of clips successfully processed automatically by LVivo EF.	76 out of 80 clips (95%) were automatically processed.
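
The summary reports each correlation with a 95% confidence interval but does not say how the interval was computed. As a rough illustration only, the Python sketch below computes a Pearson r and an approximate interval using the Fisher z-transform, which is one common method. The function names and the EF values are hypothetical and are not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r via the Fisher z-transform.

    Assumes n > 3 and |r| < 1. This is an assumed method; the 510(k)
    summary does not state how its intervals were derived.
    """
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical EF values (percent), for illustration only; not study data.
auto_ef = [55.0, 38.0, 62.0, 47.0, 30.0, 58.0]
manual_ef = [57.0, 35.0, 60.0, 50.0, 33.0, 61.0]

r = pearson_r(auto_ef, manual_ef)
low, high = fisher_ci(r, len(auto_ef))
print(f"r = {r:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Applied to r = 0.82 with a sample size between 76 and 80 clips, this method gives an interval close to the reported one, although the method and the n actually used in the submission are not stated.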

2. Sample Size Used for the Test Set and Data Provenance

Sample Size for Test Set: 80 patients' Apical 4 Chamber (A4CH) view clips.
Data Provenance: The data were acquired with the Lumify Diagnostic Ultrasound System, specifically for this clinical performance study. Patients were selected based on eligibility, and data were acquired consecutively for patients with normal and impaired LV function. This suggests a prospective acquisition for the purpose of the study. The document does not specify the country of origin for the data.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

The ground truth was established by "manual tracing performed by sonographers."

Number of Experts: The document refers to "sonographers" in the plural, but does not specify the exact number of sonographers involved in the manual tracing.
Qualifications of Experts: The document implies that these were qualified "sonographers" experienced in echocardiographic LV function evaluation, but does not provide specific qualifications (e.g., years of experience, board certification). It can be inferred that they are healthcare professionals who routinely perform this task.

4. Adjudication Method for the Test Set

The ground truth for EF, EDV, and ESV was established by the "average results by manual tracing." This implies that if multiple sonographers performed the manual tracings, their results were averaged, which is a quantitative aggregation rather than an adjudicated consensus. The document does not explicitly state an adjudication method such as 2+1 or 3+1 (where discrepancies are resolved by a third expert or consensus). If only one sonographer performed the tracing for each case, no adjudication would be necessary.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

There is no mention of a Multi-Reader Multi-Case (MRMC) comparative effectiveness study being done to evaluate how much human readers improve with AI vs. without AI assistance. The study focuses on the agreement between the AI's automated measurements and a "manual tracing" ground truth, not on reader performance with and without AI assistance.

6. Standalone (Algorithm Only) Performance

Yes, a standalone study was done. The clinical performance study directly compared the "automated EF evaluation by LVivo EF" (the algorithm's performance) against "Ejection Fraction (EF) evaluation by manual tracing performed by sonographers" (the ground truth). The results (correlation coefficients) reflect the algorithm's performance without a human-in-the-loop scenario. The LVivo EF automatically processed 95% of the clips, indicating its standalone capability.

7. Type of Ground Truth Used

The ground truth used was expert consensus/manual tracing. Specifically, it was defined as "Ejection Fraction (EF) evaluation by manual tracing performed by sonographers" and "the average results by manual tracing" for EF, EDV, and ESV. This is considered an expert-derived ground truth based on conventional, established methods for echocardiographic LV function evaluation.

8. Sample Size for the Training Set

The document explicitly states: "The data used for clinical performance study were completely distinct from that used during training of the algorithm, and there was no overlap between the two data sets." However, it does not provide the sample size for the training set used to develop the LVivo EF algorithm.

9. How the Ground Truth for the Training Set was Established

The document states that the clinical performance study data were "completely distinct from that used during training of the algorithm," but it does not describe how the ground truth for the training set was established. It implies that such a training process occurred ("training of the algorithm"), but details on its ground truth are not provided in this summary.

K Number

K220068

Device Name

Butterfly iQ/iQ+ Ultrasound System

Manufacturer

Butterfly Network, Inc.

Date Cleared

2023-03-31

(445 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K203406, K170714, K202406

Predicate For

K232808

Intended Use

The Butterfly iQ+ Ultrasound System is indicated for use by trained healthcare professionals in environments where healthcare is provided to enable diagnostic ultrasound imaging and measurement of anatomical structures and fluids of adult and pediatric patients for the following clinical applications: Peripheral Vessel (including carotid, deep vein thrombosis and arterial studies), Procedural Guidance, Small Organs (including thyroid, scrotum and breast), Cardiac, Abdominal, Lung, Urology, Fetal/Obstetric, Gynecological, Musculoskeletal (conventional), Musculoskeletal (superficial) and Ophthalmic. Modes of operation include B-mode + M-mode, B-mode + Color Doppler, B-mode + Power Doppler, Spectral Pulsed Wave Doppler.

Device Description

The Butterfly iQ/Butterfly iQ+ Ultrasound System is a hand-held general-purpose diagnostic imaging system for use by trained healthcare professionals in environments where healthcare is provided to enable visualization and measurement of anatomical structures and fluid of adult and pediatric patients. The system consists of a single transducer with broad imaging capabilities connected to a standard handheld commercial off the shelf (COTS) mobile device compatible with the Butterfly iQ/iQ+ mobile application (app). The subject device introduces the Auto B-line Counter, a software application backed by an image analysis algorithm. The purpose of the Auto B-line Counter is to provide automated detection and automatic calculation of the number of B-lines to a user in a given rib space; it also lets users review the detected B-lines (via visual overlays). The overlay of B-lines does not mark images for detection of specific pathologies. The Auto B-line Counter enables the automated identification and count of B-lines during a lung scan and is integrated into the existing Butterfly iQ/iQ+ mobile application for use with the Butterfly iQ or iQ+ transducers.

AI/ML Overview

The provided text describes the Butterfly iQ/iQ+ Ultrasound System, which introduces an "Auto B-line Counter" software application. The information below summarizes the acceptance criteria and the study that proves the device meets these criteria.

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria for the Auto B-line Counter algorithm's performance are primarily established through analytical validation and clinical performance evaluation.

Metric	Acceptance Criteria (Non-inferiority to clinician annotators)	Reported Device Performance
Analytical Validation	Algorithm performance non-inferior to clinician annotators (Ground Truth)	Met acceptance criteria for all tests. Performance assessed by: - Intraclass Correlation Coefficient (ICC) between annotators for Quality Indicator. - Dice Coefficient Score (DSC) for conformance of automatic B-line segmentation to ground truth. (Specific numerical thresholds for ICC and DSC for acceptance are not provided in the text, but the claim is that criteria were met.)
Clinical Performance Evaluation	Algorithm performance non-inferior to clinician annotator ground truth	Demonstrated non-inferiority. Performance assessed by calculating the Intraclass Correlation Coefficient (ICC) between the tool and the ground truth. Algorithm's performance was consistent among clinically meaningful subgroups: age, gender, and BMI. (Specific numerical thresholds for ICC for acceptance are not provided, but the claim is that non-inferiority was shown).
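
The table names the Dice score as the measure of how well the automatic B-line segmentation conforms to the ground truth. As a minimal sketch of that metric, assuming both segmentations are available as same-shaped binary masks, the Python example below computes twice the overlap divided by the total foreground count. The function name and the toy masks are hypothetical, and the summary does not describe the actual mask format or any acceptance threshold.

```python
def dice_coefficient(pred_mask, truth_mask):
    """Dice score for two same-shaped binary masks given as nested lists of 0/1."""
    intersection = 0
    pred_total = 0
    truth_total = 0
    for pred_row, truth_row in zip(pred_mask, truth_mask):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p:
                pred_total += 1
            if t:
                truth_total += 1
    if pred_total + truth_total == 0:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / (pred_total + truth_total)

# Toy masks for illustration only; not study data.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
annotated = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(dice_coefficient(predicted, annotated))  # 4 shared pixels, 5 + 5 foreground -> 0.8
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.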

2. Sample Sizes Used for the Test Set and Data Provenance

Analytical Testing Test Set:
- Sample Size: 6000 de-identified cines.
- Data Provenance: Acquired from 253 sites. The datasets spanned many demographic variables including gender (male, female, and unidentified), age (20-90 years), and ethnicity via collection from a multitude of clinical sites with diverse and distinct racial patient populations. The data included various clinical subgroups and confounders such as congestive heart failure, heart failure with reduced ejection fraction, diabetes (with and without chronic complications), myocardial infarction, peripheral vascular disease, and renal disease. This suggests a retrospective, multi-center data set. The country of origin is not stated, and the reference to "diverse and distinct racial patient populations" does not by itself establish that the sites were multi-national.
Clinical Performance Evaluation Test Set:
- Sample Size: 99 subjects.
- Data Provenance: Not explicitly detailed beyond being used for clinical performance evaluation. Given the context, it is likely also retrospective data from a similar pool as the analytical test set, or specifically collected for this evaluation.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

Number of Experts: The text refers to "expert annotators" (plural) for determining ground truth locations of B-lines and for establishing ICC. While a specific number isn't provided, the use of "annotators" and "experts" in plural implies a group.
Qualifications of Experts: The text states, "The ground truthing for B-line counts was determined by the ICC among expert annotators presented with lung cines and instructions to determine the maximum number of B-Lines using the instant percent method. The ground truth locations of B-lines were then determined by expert annotator segmentations." The exact qualifications (e.g., number of years of experience, specialty like radiologist or emergency physician) are not detailed.

4. Adjudication Method for the Test Set

Adjudication Method: The ground truthing involved assessing the "Intraclass Correlation Coefficient (ICC) among expert annotators." This suggests that multiple experts independently provided assessments, and their agreement (measured by ICC) was used to establish the ground truth or validate its reliability. It doesn't explicitly state a specific adjudication method like 2+1 or 3+1 for resolving discrepancies, but rather implies consensus or high agreement as the basis for ground truth.
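
The summary does not say which form of the ICC was used. As an illustration only, the Python sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a table of ratings in which each row is a cine and each column is a rater. The choice of ICC(2,1), the function name, and the B-line counts are assumptions made for this example and are not taken from the submission.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows (subjects), each a list of per-rater scores.

    Uses the two-way ANOVA mean squares: between subjects (msr),
    between raters (msc), and residual (mse).
    """
    n = len(ratings)       # number of subjects (here: cines)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical maximum B-line counts per cine from three annotators;
# for illustration only, not study data.
b_line_counts = [
    [0, 0, 1],
    [3, 2, 3],
    [5, 5, 4],
    [1, 2, 1],
    [4, 4, 5],
]
print(f"ICC(2,1) = {icc_2_1(b_line_counts):.2f}")
```

The same function can be applied to a two-column table (tool versus ground truth) to mirror the tool-to-ground-truth comparison described in the clinical performance evaluation, again under the assumption that an absolute-agreement ICC is the relevant form.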

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

The document does not describe a Multi-Reader Multi-Case (MRMC) comparative effectiveness study where human readers' performance with and without AI assistance is compared. The studies described focus on the standalone performance of the AI algorithm against human annotator ground truth.

6. Standalone (Algorithm Only, Without Human-in-the-Loop) Performance Study

Yes, standalone performance was evaluated. The analytical validation and clinical performance evaluation sections explicitly describe testing the "Auto B-line Counter algorithm performance" as being "non-inferior to clinician annotators (Ground Truth)" and calculating the ICC between "the tool and the ground truth." This indicates a direct comparison of the algorithm's output against established ground truth, representing standalone performance.

7. Type of Ground Truth Used

Expert Consensus / Expert Annotation: The ground truth for B-line counts was determined by the "ICC among expert annotators" and by "expert annotator segmentations" for the locations of B-lines. This strongly indicates an expert consensus or expert annotation approach, where human experts interpret the images to establish the reference standard.

8. Sample Size for the Training Set

The document does not specify the sample size for the training set. It only states that the "data used for verification is completely distinct from that used during the training process and there is no overlap between the two."

9. How the Ground Truth for the Training Set Was Established

The document does not explicitly describe how the ground truth for the training set was established. It states only that the training and test data are independent, and it describes the ground truth process for the verification/test sets alone.
