AI Platform 2.0 is intended for noninvasive processing of ultrasound images to detect, measure, and calculate relevant medical parameters of the structures and function of patients with suspected disease. In addition, it can provide Quality Score feedback to assist healthcare professionals who are trained and qualified to conduct echocardiography and lung ultrasound scans in the current standard of care while acquiring ultrasound images. The device is intended to be used on images of adult patients.
Exo AI Platform 2.0 (AIP 2.0) is software as a medical device (SaMD) that helps qualified users with image-based assessment of ultrasound examinations in adult patients. It is designed to simplify workflow by helping trained healthcare providers evaluate, quantify, and generate reports for ultrasound images. AIP 2.0 accepts as input images in the Digital Imaging and Communications in Medicine (DICOM) format from a specified range of ultrasound scanners and allows users to detect, measure, and calculate relevant medical parameters of the structures and function of patients with suspected disease. It also provides frame and clip quality scores in real time for the Left Ventricle from the apical four-chamber and parasternal long-axis views of the heart, and for lung scans. The AI modules are additionally provided as software components that another manufacturer's developers can integrate into their legally marketed ultrasound imaging device; in effect, the Algorithm and API modules are medical device accessories.
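As context for the DICOM input path described above, here is a minimal sketch of loading an ultrasound clip with pydicom; the file name is hypothetical, and AIP 2.0's actual ingestion pipeline is not described in the document.

```python
# Minimal sketch: read a (possibly multi-frame) ultrasound DICOM clip.
# "cardiac_a4c_clip.dcm" is a hypothetical file name.
import pydicom

ds = pydicom.dcmread("cardiac_a4c_clip.dcm")
print(ds.Modality)                       # 'US' for ultrasound
frames = ds.pixel_array                  # (frames, rows, cols[, channels]) for multi-frame clips
print(getattr(ds, "NumberOfFrames", 1), "frames, array shape", frames.shape)
```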
Key features of the software are:
- Lung AI: An AI-assisted tool that suggests the presence of lung structures and artifacts, namely A-lines, on ultrasound images. Additionally, a per-frame and per-clip quality score is generated for each lung scan.
- Cardiac AI: An AI-assisted tool for quantifying Left Ventricular Ejection Fraction (LVEF), myocardial wall thickness (Interventricular Septum (IVSd), Posterior Wall (PWd)), and IVC diameter on cardiac ultrasound images. Additionally, a per-frame and per-clip quality score is generated for each Apical and PLAX cardiac scan (one way frame scores might roll up into a clip score is sketched after this list).
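The document states that per-frame and per-clip quality scores are produced but does not say how the clip score is derived from the frame scores. The sketch below shows one plausible aggregation (mean of frame scores against a diagnostic threshold); the 0-5 scale, the threshold, and the function name are assumptions for illustration, not Exo's published method.

```python
# Hypothetical aggregation of per-frame quality scores into a clip score.
# The 0-5 scale and 3.0 threshold are assumptions, not the device's method.
from statistics import mean

def clip_quality(frame_scores: list[float], threshold: float = 3.0) -> tuple[float, bool]:
    """Return (clip score, whether minimum diagnostic criteria are met)."""
    score = mean(frame_scores)
    return score, score >= threshold

score, diagnostic = clip_quality([2.8, 3.4, 3.9, 3.1])
print(f"clip score = {score:.2f}, diagnostic = {diagnostic}")
```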
The provided text describes the acceptance criteria and the study demonstrating that the device, AI Platform 2.0 (AIP 2.0), meets those criteria for specific functionalities. The device is software as a medical device (SaMD) intended for processing ultrasound images of adult patients, including detecting, measuring, and calculating medical parameters, and providing quality score feedback during image acquisition.
Here's a breakdown of the requested information:
1. A table of acceptance criteria and the reported device performance
The document specifies performance metrics for two main functionalities tested: Left Ventricle wall thickness and Inferior Vena Cava (IVC) measurements, and Quality AI (for frames and clips). The acceptance criterion is implicitly a high correlation with expert measurements, indicated by high Intraclass Correlation Coefficient (ICC) values; a sketch of how such an ICC can be computed follows the table.
| Functionality/Measurement | Acceptance Criteria (Implicit) | Reported Device Performance (ICC with 95% CI) |
|---|---|---|
| **LV Wall Thickness** | High correlation with experts | |
| InterVentricular Septum (IVSd) | | 0.93 (0.89 – 0.96) |
| Posterior Wall (PWd) | | 0.94 (0.89 – 0.97) |
| **Inferior Vena Cava (IVC)** | High correlation with experts | |
| IVC Dmin | | 0.93 (0.90 – 0.95) |
| IVC Dmax | | 0.94 (0.90 – 0.96) |
| **Quality AI** | High agreement with experts | |
| Overall agreement (frames) | | 0.94 (0.94 – 0.95) |
| Overall agreement (clips) | | 0.94 (0.92 – 0.95) |
| Diagnostic classification | >95% agreement with experts (ACEP score >=3) | 98.3% of clips rated ACEP >=3 by experts received at least "Minimum criteria met for diagnosis" from the Clip Quality AI; 98.0% of scans rated "Minimal criteria met for diagnosis" or "good" by the Quality AI were deemed diagnostic by experts (ACEP score of 3 or higher). |
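As a concrete illustration of the ICC metric above, here is a minimal sketch of computing an ICC with a 95% CI using the pingouin library. The paired measurements are hypothetical, and treating the AI and the expert average as two "raters" under a two-way random-effects, single-rater ICC (ICC2) is an assumption; the document does not specify which ICC variant was used.

```python
# Minimal sketch: ICC with 95% CI between AI and expert measurements.
# Values are hypothetical; pingouin expects long-format (target, rater, rating) data.
import pandas as pd
import pingouin as pg

ai_mm     = [9.1, 10.4, 8.7, 11.2, 9.8]   # hypothetical AI IVSd measurements (mm)
expert_mm = [9.3, 10.1, 8.9, 11.5, 9.6]   # hypothetical expert-average measurements (mm)

n = len(ai_mm)
df = pd.DataFrame({
    "subject": list(range(n)) * 2,
    "rater":   ["ai"] * n + ["expert"] * n,
    "value":   ai_mm + expert_mm,
})

icc = pg.intraclass_corr(data=df, targets="subject", raters="rater", ratings="value")
row = icc.set_index("Type").loc["ICC2"]   # two-way random effects, absolute agreement
print(f"ICC = {row['ICC']:.2f}, 95% CI = {row['CI95%']}")
```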
2. Sample size used for the test set and the data provenance
- LV Wall Thickness and IVC measurements: 100 subjects.
- Quality AI (Section a): 184 patients, resulting in 226 clips (29,732 frames).
- Quality AI (Section b, real-time scanning): 396 lung and cardiac scans.
- Data Provenance: The test data encompassed diverse demographic variables (gender, age, ethnicity) from multiple sites in metropolitan cities with racially diverse patient populations. The text states that the test data was entirely separated from the training/tuning datasets. The studies were retrospective for the initial quality evaluation (comparison against previously acquired data rated by sonographers) and prospective for the real-time Quality AI evaluation (data acquired while users of varying experience used the AI in real time).
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- LV Wall Thickness and IVC measurements: Ground truth was established as the average measurement of three experts. Their specific qualifications (e.g., years of experience, specialty) are not explicitly stated beyond "experts."
- Quality AI (Section a): Ground truth was established by "experienced sonographers." Their number and specific qualifications are not detailed beyond "experienced."
- Quality AI (Section b, real-time scanning): Ground truth for diagnostic classification was established by "expert readers" (ACEP score of 3 or above). Their number and specific qualifications are not detailed beyond "expert readers."
4. Adjudication method for the test set
- LV Wall Thickness and IVC measurements: The adjudication method was taking the average measurement of three experts, which amounts to a consensus by central tendency for the ground truth (see the sketch after this list).
- Quality AI (Section a): Ground truth was based on "quality rating by experienced sonographers on each frame and the entire clip." No adjudication method beyond this is stated, implying that individual expert ratings or a single consensus was used rather than a formal multi-reader adjudication process such as 2+1 or 3+1.
- Quality AI (Section b): Ground truth was based on "ACEP quality of 3 or above by expert readers." As in Section a, no specific adjudication method beyond "expert readers" is detailed.
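A minimal sketch of the averaging adjudication described above, with hypothetical values: the ground truth for each subject is the mean of the three experts' measurements.

```python
# Hypothetical example: per-subject ground truth as the mean of three experts.
import numpy as np

expert_mm = np.array([
    [9.2, 9.5, 9.0],     # subject 1: IVSd (mm) from experts A, B, C
    [11.1, 10.8, 11.4],  # subject 2
])
ground_truth = expert_mm.mean(axis=1)   # consensus value per subject
print(ground_truth)                     # [ 9.23333333 11.1       ]
```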
5. Whether a multi-reader, multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance
The document does not explicitly describe a traditional MRMC comparative effectiveness study that directly quantifies the improvement of human readers with AI assistance versus without AI assistance.
The Quality AI section (b) indicates that 26 users (including 18 novice users) conducted 396 lung and cardiac scans using the real-time Quality AI feedback. This suggests an evaluation of the AI's ability to guide users toward acquiring diagnostic-quality images, an indirect measure of assistance to human performance; however, it does not provide an effect size for how much human readers improve in their interpretation or diagnosis with AI assistance. The study focuses on the AI's ability to help users acquire diagnostic-quality images.
6. Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance evaluation was done
Yes, standalone performance was evaluated for the following:
- Left Ventricle Wall Thickness and IVC measurements: The performance (ICC) was calculated directly between the AI's measurements and the expert-derived ground truth. This is a standalone performance metric.
- Quality AI (Section a): The overall agreement (ICC) between the Quality AI and quality ratings by experienced sonographers was calculated. This also represents standalone performance of the AI's quality assessment function, as do the directional diagnostic-classification percentages in the table above (sketched below).
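The 98.3% and 98.0% figures from the table are directional percent-agreement metrics. Here is a minimal sketch of how such percentages can be computed from paired labels; the data, the function name, and the mapping of the AI output to a boolean "diagnostic" flag are assumptions for illustration.

```python
# Hypothetical directional agreement between expert ACEP scores and a
# boolean AI "minimum criteria met for diagnosis" output.
def directional_agreement(expert_acep: list[int], ai_diagnostic: list[bool]) -> tuple[float, float]:
    expert_dx = [s >= 3 for s in expert_acep]             # expert "diagnostic" at ACEP >= 3
    both = sum(e and a for e, a in zip(expert_dx, ai_diagnostic))
    pct_of_expert_dx = both / sum(expert_dx)              # cf. the reported 98.3%
    pct_of_ai_dx = both / sum(ai_diagnostic)              # cf. the reported 98.0%
    return pct_of_expert_dx, pct_of_ai_dx

print(directional_agreement([4, 3, 2, 5, 3], [True, True, False, True, False]))
```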
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
The ground truth used for the evaluated functionalities was expert consensus/measurement:
- LV Wall Thickness and IVC measurements: Average measurement of three experts.
- Quality AI: Quality ratings by experienced sonographers (Section a) and ACEP quality scores by expert readers (Section b).
No mention of pathology or outcomes data as ground truth.
8. The sample size for the training set
The document explicitly states: "The test data was entirely separated from the training/tuning datasets and was not used for any part of the training/tuning." However, it does not provide the specific sample size for the training set.
9. How the ground truth for the training set was established
The document does not explicitly describe how the ground truth for the training set was established. It only mentions that the AI models use "non-adaptive machine learning algorithms trained with clinical data." The Predetermined Change Control Plan also refers to "new training data" and augmenting the training dataset, but without details on ground truth establishment for these training datasets.
§ 892.2050 Medical image management and processing system.
(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).