510(k) Data Aggregation (53 days)
BrainInsight is intended for automatic labeling, spatial measurement, and volumetric quantification of brain structures from a set of low-field MR images and returns annotated and segmented images, color overlays and reports.
BrainInsight is a fully automated MR imaging post-processing medical software that provides image alignment, whole brain segmentation, ventricle segmentation, and midline shift measurements of brain structures from a set of MR images. The BrainInsight processing architecture includes a proprietary automated internal pipeline based on machine learning tools. The output annotated and segmented images are provided in standard image format using segmented color overlays and reports that can be displayed on third-party workstations and FDA-cleared Picture Archiving and Communication Systems (PACS). The high-throughput capability makes the software suitable for use in routine patient care as a support tool for clinicians in the assessment of low-field (0.064 T) structural MRIs. BrainInsight provides overlays and reports based on 0.064 T 3D MRI series of T1 Gray/White, T2-Fast, and FLAIR images.
Here's a breakdown of the acceptance criteria and the study details for the BrainInsight™ device, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria were defined based on non-inferiority testing, aiming for the model performance to be no worse than the average annotator's discrepancy.
Midline Shift Discrepancy (Lower is Better)
| Application | Modality | Acceptance Criteria (Model <= Mean Annotator) | Reported Device Performance (Model Discrepancy) | Reported Mean Annotator Discrepancy |
|---|---|---|---|---|
| Midline Shift | T1 | Model <= 1.42 | 0.99 | 1.42 |
| Midline Shift | T2 | Model <= 1.00 | 0.76 | 1.00 |
| Midline Shift | T2-Fast | Model <= 1.38 | 1.00 | 1.38 |
| Midline Shift | FLAIR | Model <= 1.21 | 0.90 | 1.21 |
Lateral Ventricle Segmentation Discrepancy (Lower is Better)
| Application | Modality | Acceptance Criteria (Model <= Mean Annotator) | Reported Device Performance (Model Discrepancy) | Reported Mean Annotator Discrepancy |
|---|---|---|---|---|
| Lateral Ventricle Left | T1 | Model <= 0.18 | 0.17 | 0.18 |
| Lateral Ventricle Left | T2 | Model <= 0.24 | 0.20 | 0.24 |
| Lateral Ventricle Left | T2-Fast | Model <= 0.18 | 0.16 | 0.18 |
| Lateral Ventricle Left | FLAIR | Model <= 0.12 | 0.12 | 0.12 |
| Lateral Ventricle Right | T1 | Model <= 0.19 | 0.19 | 0.19 |
| Lateral Ventricle Right | T2 | Model <= 0.24 | 0.22 | 0.24 |
| Lateral Ventricle Right | T2-Fast | Model <= 0.16 | 0.15 | 0.16 |
| Lateral Ventricle Right | FLAIR | Model <= 0.13 | 0.13 | 0.13 |
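The acceptance rule behind the two discrepancy tables above reduces to a per-row check that the model's discrepancy does not exceed the mean annotator discrepancy. A minimal sketch of that check in Python, restating a few of the reported rows (an illustrative re-statement of the tables, not the submission's actual statistical test):

```python
# Non-inferiority acceptance check: for each application/modality pair,
# the model discrepancy must be <= the mean annotator discrepancy.
# Values are copied from the reported tables (a subset, for brevity).

results = [
    # (application, modality, model_discrepancy, mean_annotator_discrepancy)
    ("Midline Shift", "T1", 0.99, 1.42),
    ("Midline Shift", "T2", 0.76, 1.00),
    ("Midline Shift", "T2-Fast", 1.00, 1.38),
    ("Midline Shift", "FLAIR", 0.90, 1.21),
    ("Lateral Ventricle Left", "FLAIR", 0.12, 0.12),
    ("Lateral Ventricle Right", "T1", 0.19, 0.19),
]

def meets_criterion(model, annotator):
    """Non-inferiority: model no worse than the mean annotator."""
    return model <= annotator

all_pass = all(meets_criterion(m, a) for _, _, m, a in results)
print(all_pass)  # True: every listed pair satisfies Model <= Mean Annotator
```

Note that ties (e.g. FLAIR left ventricle at 0.12 vs 0.12) still pass, since the criterion is "no worse than", not "strictly better than", the mean annotator.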
Mean Absolute Error for Midline Shift (Lower is Better)
No explicit numeric acceptance criterion is stated for this metric; clinical acceptability is implied by meeting the non-inferiority criteria above.
| Application | Modality | Acceptance Criteria | Reported Device Performance (Error) |
|---|---|---|---|
| Midline Shift | T1 | Not explicitly stated | 1.01 mm |
| Midline Shift | T2 | Not explicitly stated | 0.80 mm |
| Midline Shift | T2-Fast | Not explicitly stated | 0.89 mm |
| Midline Shift | FLAIR | Not explicitly stated | 0.75 mm |
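Mean absolute error here is the average absolute difference between the model's midline-shift measurement and the ground-truth (annotator-averaged) shift over the test cases. A minimal sketch with hypothetical values (the case-level measurements are not provided in the document):

```python
# Mean absolute error (MAE) between model midline-shift measurements
# and ground-truth shifts. Example values below are hypothetical, in mm.

def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired measurements."""
    assert len(predicted) == len(reference) and predicted
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

model_shifts = [2.1, 0.0, 4.8, 1.2]  # hypothetical model outputs (mm)
truth_shifts = [1.5, 0.4, 5.0, 1.0]  # hypothetical annotator-averaged truth (mm)

print(mean_absolute_error(model_shifts, truth_shifts))  # 0.35
```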
Dice Overlap and Volume Differences for Segmentation (Higher Dice, Lower Volume Difference are Better)
No explicit numeric acceptance criteria are stated for these metrics; performance was expected to be clinically acceptable and comparable to the annotators.
| Application | Modality | Performance Metric | Acceptance Criteria | Device Performance | Annotator Performance |
|---|---|---|---|---|---|
| Left Ventricle | T1 | Dice Overlap (%) | Not explicitly stated | 85 | 90 |
| Right Ventricle | T1 | Dice Overlap (%) | Not explicitly stated | 83 | 90 |
| Whole Brain | T1 | Dice Overlap (%) | Not explicitly stated | 95 | 97 |
| Left Ventricle | T1 | Volume Differences (%) | Not explicitly stated | 25 | 9 |
| Right Ventricle | T1 | Volume Differences (%) | Not explicitly stated | 26 | 11 |
| Whole Brain | T1 | Volume Differences (%) | Not explicitly stated | 3 | 2 |
| Left Ventricle | T2 | Dice Overlap (%) | Not explicitly stated | 84 | 88 |
| Right Ventricle | T2 | Dice Overlap (%) | Not explicitly stated | 82 | 87 |
| Whole Brain | T2 | Dice Overlap (%) | Not explicitly stated | 96 | 97 |
| Left Ventricle | T2 | Volume Differences (%) | Not explicitly stated | 27 | 21 |
| Right Ventricle | T2 | Volume Differences (%) | Not explicitly stated | 26 | 20 |
| Whole Brain | T2 | Volume Differences (%) | Not explicitly stated | 5 | 5 |
| Left Ventricle | T2-Fast | Dice Overlap (%) | Not explicitly stated | 86 | 91 |
| Right Ventricle | T2-Fast | Dice Overlap (%) | Not explicitly stated | 86 | 92 |
| Left Ventricle | T2-Fast | Volume Differences (%) | Not explicitly stated | 26 | 17 |
| Right Ventricle | T2-Fast | Volume Differences (%) | Not explicitly stated | 23 | 13 |
| Left Ventricle | FLAIR | Dice Overlap (%) | Not explicitly stated | 89 | 93 |
| Right Ventricle | FLAIR | Dice Overlap (%) | Not explicitly stated | 88 | 94 |
| Left Ventricle | FLAIR | Volume Differences (%) | Not explicitly stated | 9 | 7 |
| Right Ventricle | FLAIR | Volume Differences (%) | Not explicitly stated | 11 | 8 |
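The two segmentation metrics in the table measure different things: Dice overlap captures spatial agreement between two binary masks (twice the intersection over the sum of the two volumes), while volume difference compares total volumes regardless of where they overlap. A minimal sketch on toy flattened masks (the submission's exact volume-difference convention is not stated; the reference-relative percent difference below is one common choice):

```python
# Dice overlap and percent volume difference for binary segmentation
# masks, represented here as flat 0/1 lists (illustrative only).

def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|), as a percentage."""
    inter = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 100.0 * 2 * inter / total if total else 100.0

def volume_diff_pct(a, b):
    """Absolute volume difference relative to the reference volume b."""
    va, vb = sum(a), sum(b)
    return 100.0 * abs(va - vb) / vb if vb else 0.0

model = [1, 1, 1, 1, 0, 0, 0, 0]
truth = [0, 1, 1, 1, 1, 0, 0, 0]

print(dice(model, truth))            # 75.0
print(volume_diff_pct(model, truth))  # 0.0 (equal volumes, shifted overlap)
```

The toy example shows why both metrics are reported: the two masks have identical volumes (0% volume difference) yet only 75% Dice overlap, so volume agreement alone would overstate spatial accuracy.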
Summary of Device Performance against Acceptance Criteria:
The document states: "The test results show high accuracy of BrainInsight performance as compared to the reference and annotators and the subject device met all acceptance criteria." This implies that for all metrics where non-inferiority criteria were set (Midline Shift Discrepancy and Lateral Ventricle Discrepancy), the model performed as well as or better than the mean annotator. For other metrics, the performance was presented as being accurate and acceptable.
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Test Set: The exact numerical sample size for the test set is not explicitly stated. However, the document notes that each model and application was validated using an appropriate sample size to yield statistically significant results.
- Data Provenance:
- Country of Origin: Not specified.
- Retrospective or Prospective: Not specified.
- Acquisition Device: All test images were acquired using Hyperfine Swoop Portable MR imaging system with software versions 8.3 and 8.4.
- Test Set Distribution:
- Age: >2 to 12 years (20.6%), >12 to <18 years (8.8%), >18 to 90 years (70.6%)
- Gender: 33% Female / 41% Male / 25% Anonymized
- Pathology: Stroke (Infarct), Hydrocephalus, Hemorrhage (SAH, SDH, IVH, IPH), Mass/Edema, Tumor.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: The document states that the training and validation datasets were annotated by "multiple experts," and that "The entire group of training image sets was divided into segments and each segment was given to a single expert." The phrasing is ambiguous with respect to the test set: multiple experts were clearly involved in establishing ground truth overall, but the document does not state how many experts independently evaluated each test case, or whether the single-expert-per-segment approach also applied to the test set ground truth.
- Qualifications of Experts: Not specified beyond being referred to as "experts" and "annotators."
4. Adjudication Method for the Test Set
The adjudication method varies by application:
- Midline Shift: Ground truth was determined based on the average shift distance of all annotators. This implies a form of consensus or averaging method rather than a strict adjudication by a senior expert.
- Segmentation (Lateral Ventricles, Whole Brain): Ground truth for segmentation was calculated using Simultaneous Truth and Performance Level Estimation (STAPLE). STAPLE is an algorithm that estimates a "true" segmentation from multiple segmentations, weighting them based on their estimated performance. This is an algorithmic adjudication method.
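The STAPLE idea can be sketched as an EM loop: estimate a per-voxel consensus probability from the raters' masks (E-step), then re-estimate each rater's sensitivity and specificity against that consensus (M-step), and repeat. A heavily simplified, illustrative version for binary masks follows; real implementations (e.g. the STAPLE filter in ITK/SimpleITK) handle priors, convergence testing, and multi-label segmentations far more carefully.

```python
# Heavily simplified STAPLE-style EM for binary segmentations
# (illustrative sketch only, not a production implementation).

def staple(segmentations, prior=0.5, iters=10):
    """Estimate a per-voxel consensus probability from rater masks.

    segmentations: list of equal-length 0/1 lists, one per rater.
    Returns the estimated probability that each voxel's true label is 1.
    """
    n_vox = len(segmentations[0])
    n_raters = len(segmentations)
    p = [0.9] * n_raters  # per-rater sensitivity estimates
    q = [0.9] * n_raters  # per-rater specificity estimates
    w = [prior] * n_vox   # per-voxel P(true label = 1)
    for _ in range(iters):
        # E-step: posterior probability of a true "1" at each voxel,
        # combining every rater's vote weighted by their reliability.
        for i in range(n_vox):
            a, b = prior, 1.0 - prior
            for r in range(n_raters):
                if segmentations[r][i] == 1:
                    a *= p[r]
                    b *= 1.0 - q[r]
                else:
                    a *= 1.0 - p[r]
                    b *= q[r]
            w[i] = a / (a + b) if a + b > 0 else 0.5
        # M-step: re-estimate each rater's sensitivity and specificity
        # against the current consensus.
        sum_w = sum(w)
        sum_not_w = n_vox - sum_w
        for r in range(n_raters):
            hits = sum(w[i] for i in range(n_vox) if segmentations[r][i] == 1)
            rejects = sum(1 - w[i] for i in range(n_vox) if segmentations[r][i] == 0)
            p[r] = hits / sum_w if sum_w > 0 else p[r]
            q[r] = rejects / sum_not_w if sum_not_w > 0 else q[r]
    return w

# Three raters; rater 3 disagrees at voxels 2 and 5.
segs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
]
consensus = staple(segs)
print([round(x, 3) for x in consensus])
```

The consensus follows the majority at the disagreement voxels, but unlike simple majority voting, a rater who disagrees often is automatically down-weighted via the estimated sensitivity/specificity, which is the core appeal of STAPLE as an algorithmic adjudication method.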
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No. A traditional MRMC comparative effectiveness study, measuring how human readers improve with AI assistance versus without, was not described for this submission. The study focuses on the standalone performance of the AI model against expert annotations and the "mean annotator" performance.
- Effect Size of Human Improvement (if applicable): Not applicable, as an MRMC comparative effectiveness study was not detailed.
6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study
- Was a standalone study done? Yes, the described performance evaluation appears to be a standalone (algorithm only) study. The device's performance is compared directly against the ground truth established by annotators, and against the mean discrepancy of the annotators themselves. There is no mention of human readers using the AI output to improve their performance compared to a baseline.
7. Type of Ground Truth Used
The type of ground truth used varies by the measurement:
- Midline Shift: Expert consensus, calculated as the average shift distance of all annotators.
- Segmentation (Lateral Ventricles, Whole Brain): Algorithmic consensus, calculated using Simultaneous Truth and Performance Level Estimation (STAPLE) based on expert annotations.
- General: It is based on expert annotations of images acquired from the Hyperfine Swoop portable MRI system.
8. Sample Size for the Training Set
- Sample Size for Training Set: The exact numerical sample size for the training set is not explicitly stated. The document only mentions that the data collection for the training and validation datasets was done at "multiple sites."
9. How the Ground Truth for the Training Set Was Established
- The data collection for the training and validation datasets was done at multiple sites.
- The datasets were annotated by multiple experts.
- The "entire group of training image sets was divided into segments and each segment was given to a single expert."
- "The expert's determination became the ground truth for each image set in their segment." This implies a form of single-reader ground truth for each segmented batch, rather than multi-reader consensus for every single case within the training set.