510(k) Data Aggregation
(174 days)
Neurophet AQUA AD Plus is intended for automatic labeling, visualization, and volumetric quantification of segmentable brain structures and lesions, as well as SUVR quantification from a set of MR and PET images. Volumetric measurements may be compared to reference percentile data.
Neurophet AQUA AD Plus is a software device intended for the automatic labeling of brain structures, visualization, and volumetric quantification of segmented brain regions and lesions, as well as standardized uptake value ratio (SUVR) quantification using MR and PET images. The volumetric outcomes are compared to normative reference data to support the evaluation of neurodegeneration and cognitive impairment.
The device is designed to assist physicians in clinical evaluation by streamlining the clinical workflow from patient registration through image analysis, analysis result archiving, and report generation using software-based functionalities. The device provides percentile-based results by comparing an individual's imaging-derived quantitative analysis results to reference populations. Percentile-based results are provided for reference only and are not intended to serve as a standalone basis for diagnostic decision-making. Clinical interpretation must be performed by qualified healthcare professionals.
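The percentile comparison described above is not detailed further in the summary. As a point of reference only, the sketch below shows one plausible way an imaging-derived volume could be placed against an age-matched normative reference; the normative table, age bands, and the assumption of an approximately normal reference distribution are illustrative and not taken from the submission.

```python
# Hypothetical sketch: percentile of a measured volume against an age-matched
# normative reference (assumes a roughly normal reference distribution; the
# normative values below are made up for illustration).
from scipy.stats import norm

# (age_low, age_high) -> (mean_volume_ml, std_volume_ml) for one structure
NORMATIVE_TABLE = {
    (55, 64): (3.6, 0.4),
    (65, 74): (3.4, 0.4),
    (75, 90): (3.1, 0.5),
}

def volume_percentile(volume_ml: float, age: int) -> float:
    """Return the percentile of `volume_ml` within the age-matched reference."""
    for (lo, hi), (mu, sd) in NORMATIVE_TABLE.items():
        if lo <= age <= hi:
            return 100.0 * norm.cdf((volume_ml - mu) / sd)
    raise ValueError("age outside the normative range")

print(f"{volume_percentile(2.9, 72):.1f}th percentile")
```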
Here's a breakdown of the acceptance criteria and study details for the Neurophet AQUA AD Plus, based on the provided FDA 510(k) Clearance Letter:
Acceptance Criteria and Device Performance for Neurophet AQUA AD Plus
The Neurophet AQUA AD Plus employs multiple AI modules for automated segmentation and quantitative analysis of brain structures and lesions using MR and PET images. The device's performance was validated against predefined acceptance criteria for each module.
1. Table of Acceptance Criteria and Reported Device Performance
| AI Module | Performance Metric | Acceptance Criteria | Reported Device Performance |
|---|---|---|---|
| T1-SegEngine (T1-weighted structural MRI segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | 95% CI of DSC: [0.750, 0.850] for major cortical brain structures; 95% CI of DSC: [0.800, 0.900] for major subcortical brain structures | Cortical regions: mean DSC 0.83 ± 0.04 (95% CI: 0.82–0.84); subcortical regions: mean DSC 0.87 ± 0.03 (95% CI: 0.86–0.88) |
| | Reproducibility (Average Volume Difference Percentage - AVDP) | Equivalence range: 1.0–5.0% for both subcortical and cortical regions | Subcortical regions: mean AVDP 2.50 ± 0.93% (95% CI: 2.26–2.74); cortical regions: mean AVDP 1.79 ± 0.74% (95% CI: 1.60–1.98) |
| FLAIR-SegEngine (T2-FLAIR hyperintensity segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | Mean DSC ≥ 0.80 | Mean DSC: 0.90 ± 0.04 (95% CI: 0.89–0.91) |
| | Reproducibility (Mean AVDP and Absolute Lesion Volume Difference) | Absolute difference < 0.25 cc; mean AVDP < 2.5% | Mean AVDP: 0.99 ± 0.66%; mean absolute lesion volume difference: 0.08 ± 0.06 cc |
| PET-Engine (SUVR and Centiloid quantification) | SUVR Accuracy (Intraclass Correlation Coefficient - ICC) | ICC ≥ 0.60 across Alzheimer's-relevant regions (compared to FDA-cleared reference product K221405) | ICC ≥ 0.993 across seven Alzheimer's-relevant regions |
| | Centiloid Classification (Kappa value for amyloid positivity) | κ ≥ 0.70 (indicating substantial agreement with consensus expert visual reads) | Kappa values met or exceeded the criterion (specific values not provided, but noted as meeting/exceeding) |
| ED-SegEngine (edema-like T2-FLAIR hyperintensity segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | DSC ≥ 0.70 | Mean DSC: 0.91 ± 0.09 (95% CI: 0.89–0.93) |
| HEM-SegEngine (GRE/SWI hypointense lesion segmentation) | Accuracy (F1-score / DSC) | F1-score ≥ 0.60 | Median F1-score (DSC): 0.860 (95% CI: 0.824–0.902) |
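The table reports Dice Similarity Coefficient (DSC) and Average Volume Difference Percentage (AVDP) values without defining the metrics. The sketch below uses the standard DSC formulation and an assumed AVDP definition (absolute volume difference between repeated scans relative to their mean); neither is quoted from the submission.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

def avdp(volume_scan1: float, volume_scan2: float) -> float:
    """Average Volume Difference Percentage between repeated scans of one subject
    (assumed definition: absolute difference relative to the mean volume, in %)."""
    mean_vol = 0.5 * (volume_scan1 + volume_scan2)
    return 100.0 * abs(volume_scan1 - volume_scan2) / mean_vol

# Example: two toy 3D masks and a pair of repeated hippocampus volumes (in mm^3)
a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), dtype=bool); b[1:3, 1:3, :3] = True
print(round(dice_coefficient(a, b), 3), round(avdp(3510.0, 3420.0), 2))
```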
2. Sample Sizes and Data Provenance for the Test Set
- T1-SegEngine (Accuracy): 60 independent T1-weighted MRI cases. Data provenance is not explicitly stated, but the cases implicitly come from public repositories (e.g., ADNI, AIBL, PPMI) and institutional clinical sites, as described for the training data, and are distinct from the training set.
- T1-SegEngine (Reproducibility): 60 subjects with paired T1-weighted scans (120 scans total). Data provenance not explicitly stated.
- FLAIR-SegEngine (Accuracy): 136 independent T2-FLAIR cases. Data provenance not explicitly stated, but distinct from training data.
- FLAIR-SegEngine (Reproducibility): Paired T2-FLAIR scans (number not specified). Data provenance not explicitly stated.
- PET-Engine (SUVR accuracy): 30 paired MRI–PET datasets. Data provenance not explicitly stated, but implicitly from multi-center studies including varied tracers and sites.
- PET-Engine (Centiloid classification): 176 paired T1-weighted MRI and amyloid PET scans from ADNI and AIBL. These are public repositories, likely involving diverse geographical data (e.g., USA, Australia). Data is retrospective.
- ED-SegEngine (Accuracy): 100 T2-FLAIR scans collected from U.S. and U.K. clinical sites. Data is retrospective.
- HEM-SegEngine (Accuracy): 106 GRE/SWI scans from U.S. clinical sites. Data is retrospective.
For all modules, validation datasets were fully independent from training datasets at the subject level, drawn from distinct sites and/or repositories where applicable.
The validation cohorts covered adult subjects across a broad age range (approximately 40–80+ years), with both females and males represented.
Racial/ethnic composition included White, Asian, and Black/African American subjects, depending on the underlying public and institutional datasets.
Clinical subgroups included clinically normal, mild cognitive impairment, and Alzheimer's disease for structural, FLAIR, and PET modules, and cerebrovascular/amyloid‑related pathologies for ED‑ and HEM‑SegEngines.
3. Number of Experts and Qualifications for Ground Truth
For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines):
- Number of Experts: Not explicitly stated as a specific number, but "subspecialty-trained neuroradiologists" were used.
- Qualifications: "Subspecialty-trained neuroradiologists." Specific years of experience are not mentioned.
For Centiloid classification in the PET-Engine:
- Number of Experts: "Consensus expert visual reads." The exact number is not specified, but this implies multiple experts.
- Qualifications: "Experts" trained in established amyloid PET reading criteria. Specific qualifications beyond "expert" and training in criteria are not detailed.
4. Adjudication Method for the Test Set
For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines):
- "Consensus/adjudication procedures and internal quality control to ensure consistency" were used for establishing reference segmentations. The specific 2+1, 3+1, or other detailed method is not provided.
For Centiloid classification in the PET-Engine:
- "Consensus expert visual interpretation" was used. The specific method details are not provided.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
The provided text does not indicate that an MRMC comparative effectiveness study was done to compare human readers with AI assistance versus without AI assistance. The performance studies primarily focus on the standalone (algorithm-only) performance of the device against expert-derived ground truth or a cleared reference product.
6. Standalone (Algorithm-Only) Performance Study
Yes, a standalone (algorithm only without human-in-the-loop performance) study was done for all AI modules. The text explicitly states: "Standalone performance tests were conducted for each module using validation datasets that were completely independent from those used for model development and training." The results presented in the table above reflect this standalone performance.
7. Type of Ground Truth Used
- Expert Consensus:
- For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines), reference segmentations were generated by "subspecialty-trained neuroradiologists using predefined anatomical and lesion‑labeling criteria, with consensus/adjudication procedures."
- For Centiloid classification in the PET-Engine, reference labels were derived from "consensus expert visual interpretation using established amyloid PET reading criteria."
- Comparison to Cleared Reference Product:
- For SUVR quantification in the PET-Engine, reference values were obtained from an "FDA‑cleared reference product (K221405)" (Neurophet SCALE PET).
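Neither SUVR nor the Centiloid scale is defined in the summary. The sketch below follows the commonly used formulations: SUVR as a target-to-reference uptake ratio, and Centiloid as a linear rescaling anchored at tracer- and pipeline-specific amyloid-negative and typical-AD levels. The reference-region choice and anchor constants shown are placeholders, not values from the submission.

```python
import numpy as np

def suvr(pet: np.ndarray, target_mask: np.ndarray, reference_mask: np.ndarray) -> float:
    """Standardized uptake value ratio: mean target uptake / mean reference uptake
    (the reference is typically whole cerebellum or cerebellar gray matter)."""
    return float(pet[target_mask].mean() / pet[reference_mask].mean())

def centiloid(suvr_value: float, suvr_young_controls: float, suvr_typical_ad: float) -> float:
    """Linear Centiloid mapping: 0 anchored at amyloid-negative young controls,
    100 at a typical AD level. Anchor values are tracer/pipeline specific."""
    return 100.0 * (suvr_value - suvr_young_controls) / (suvr_typical_ad - suvr_young_controls)

# Placeholder anchor values for illustration only
print(round(centiloid(suvr_value=1.45, suvr_young_controls=1.05, suvr_typical_ad=2.05), 1))
```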
8. Sample Size for the Training Set
The exact sample size for the training set is not explicitly stated as a single number. However, the document mentions:
- "The AI-based modules (T1‑SegEngine, FLAIR‑SegEngine, PET‑Engine, ED‑SegEngine, HEM‑SegEngine) were trained using multi-center MRI and PET datasets collected from public repositories (e.g., ADNI, AIBL, PPMI) and institutional clinical sites."
- "Training data covered:
- Adult subjects across a broad age range (approximately 20–80+ years), with both sexes represented and including multiple racial/ethnic groups (e.g., White, Asian, Black).
- A spectrum of clinical conditions relevant to the intended use, including clinically normal, mild cognitive impairment, and Alzheimer's disease, as well as patients with cerebrovascular and amyloid‑related pathologies for lesion-segmentation modules.
- MRI acquired on major vendor platforms (GE, Siemens, Philips) at 1.5T and 3T... and amyloid PET acquired on multiple PET systems with commonly used tracers (Amyvid, Neuraceq, Vizamyl)."
This indicates a large and diverse training set, although a precise count of subjects or images isn't provided.
9. How the Ground Truth for the Training Set Was Established
The document implies that the training data included "manual labels" as it states: "No images or manual labels from the training datasets were reused in the validation datasets." However, it does not explicitly detail the process by which these "manual labels" or ground truth for the training set were established (e.g., number of experts, qualifications, adjudication method for training data). It's reasonable to infer that similar expert-driven processes were likely used for training ground truth as for validation, but this is not explicitly confirmed in the provided text.
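The summary stresses that validation data were independent of training data at the subject level. A minimal sketch of how such a subject-level split can be enforced in practice is shown below; the subject IDs, file names, and hold-out fraction are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical records: one row per scan; several scans can share a subject_id.
subject_ids = np.array(["s01", "s01", "s02", "s03", "s03", "s04", "s05", "s06"])
scan_paths = np.array([f"scan_{i:02d}.nii.gz" for i in range(len(subject_ids))])

# Hold out ~25% of *subjects* (not scans), so no subject appears in both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(scan_paths, groups=subject_ids))

assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
print("train scans:", scan_paths[train_idx].tolist())
print("held-out scans:", scan_paths[test_idx].tolist())
```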
(210 days)
Surgical Reality Viewer is a medical imaging visualization software intended to assist trained healthcare professionals with preoperative and intraoperative visualizations, by displaying 2D and 3D renderings of DICOM compliant patient images and normal anatomic segmentations derived from patient images as well as functions for manipulation of segmentations and 3D models.
Surgical Reality Viewer assists the trained healthcare professional who is responsible for making all final patient management decisions.
The machine learning algorithms in use by Surgical Reality Viewer are intended for use on adult patients aged 22 years and over.
Surgical Reality Viewer is medical imaging visualization software that accepts DICOM compliant images (e.g. CT-scans or MR images) and segmentation files in various 3D object file formats (e.g. NIfTI, OBJ, MHD, STL, etc.). The device can generate preliminary segmentations of normal anatomy on demand using machine learning and computer vision algorithms. It provides tools for editing and/or creating segmentations using various built-in 2D and 3D image manipulation functions. The software generates a 3D segmented view of the loaded patient data, either on a supported 2D or 3D screen, and offers features such as pre-operative (re)viewing of DICOM data overlaid with segmentation, (intra/post)operative visualization of anatomical structures, 2D-viewing, volume rendering, surface rendering, immersive and interactive 3D-viewing, 2D and 3D measuring of DICOM image data, storing on a local device, anatomic labelling including segmentation tools, and tools for annotations, brushing or carving of anatomical structures. Surgical Reality Viewer runs on a dedicated computer within the customer environment, meeting specific hardware requirements including a Windows operating system (version 10 or higher), GPU (Nvidia GeForce 2070), CPU (Intel i7), 16GB RAM, and at least 100GB free hard drive space.
Here's a breakdown of the acceptance criteria and study details for the Surgical Reality Viewer, based on the provided FDA 510(k) clearance letter and summary:
Acceptance Criteria and Reported Device Performance
The provided document details the performance of the machine learning algorithms for various anatomical segmentations using the Sørensen–Dice coefficient (DSC). Additionally, it describes a qualitative assessment of suitability.
Table of Acceptance Criteria (Implicit) and Reported Device Performance
| Anatomical Structure | Metric (Implicit Acceptance Criteria) | Reported Device Performance |
|---|---|---|
| Lobe segmentation | Average Sørensen–Dice coefficient (DSC) | 0.97 |
| - LUL | DSC | 0.98 |
| - LLL | DSC | 0.98 |
| - RUL | DSC | 0.98 |
| - RLL | DSC | 0.98 |
| - RML | DSC | 0.96 |
| Vessel segmentation | Average Sørensen–Dice coefficient (DSC) | 0.84 |
| - Artery | DSC | 0.84 |
| - Vein | DSC | 0.83 |
| Airway segmentation | Sørensen–Dice coefficient (DSC) | 0.96 |
| Aorta segmentation | Sørensen–Dice coefficient (DSC) | 0.96 |
| Pulmonary segmentation | Average Sørensen–Dice coefficient (DSC) | 0.85 |
| - Left segments | DSC | 0.85 |
| - Right segments | DSC | 0.85 |
| Qualitative Scores (Suitability) | (Score 1-5, higher is better) | Reported Scores: |
| Airways segmentations | Suitability score | 4.8 |
| Artery segmentations | Suitability score | 4.8 |
| Vein segmentations | Suitability score | 4.9 |
| Lobe Segmentations | Suitability score | 5.0 |
| Pulmonary lobe segments | Suitability score | 4.7 |
| Aorta segmentations | Suitability score | 5.0 |
Note on Acceptance Criteria: The document directly presents the performance metrics (DSC and qualitative scores). While explicit numerical acceptance criteria (e.g., "must be >= 0.95 DSC") are not stated, the reported high performance figures implicitly demonstrate the device meets acceptable levels for these metrics.
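The table above reports per-structure averages of the Dice coefficient alongside mean qualitative suitability scores. The sketch below illustrates that kind of aggregation with synthetic per-case values; the structure names, scores, and group definitions are illustrative only.

```python
import numpy as np

# Hypothetical per-case results (synthetic): structure -> Dice scores across test cases,
# and structure -> 1-5 suitability ratings from the qualitative review.
dice_by_structure = {
    "LUL": [0.98, 0.97, 0.99],
    "RML": [0.95, 0.96, 0.97],
    "Artery": [0.83, 0.85, 0.84],
}
suitability_by_structure = {"LUL": [5, 5, 5], "RML": [5, 4, 5], "Artery": [5, 5, 4]}

for name, scores in dice_by_structure.items():
    print(f"{name}: mean DSC {np.mean(scores):.2f}, "
          f"mean suitability {np.mean(suitability_by_structure[name]):.1f}")

# Group-level average, e.g. "lobe segmentation" = mean over the individual lobes
lobe_average = np.mean([np.mean(dice_by_structure[s]) for s in ("LUL", "RML")])
print(f"average lobe DSC: {lobe_average:.2f}")
```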
Study Details
1. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 102 CT images (Each study belonged uniquely to a single patient subject).
- Data Provenance: 60 (n=60) scans were obtained from the United States. The remaining 42 scans' country of origin is not specified, but the document mentions "geographical location" as a subgroup for generalizability.
- Retrospective/Prospective: Not explicitly stated, but the mention of "curated datasets" and "clinical testing dataset" without ongoing patient enrollment suggests a retrospective study.
2. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Not explicitly stated as a specific number. The document mentions "trained professionals" who generated the initial segmentations and "thoracic surgeons with a minimum of 2 years professional working experience" who verified these segmentations. This implies at least two distinct groups of experts were involved, potentially multiple individuals within each group.
- Qualifications of Experts:
- Initial Segmentation Generation: "Trained professionals." (Specific professional background and experience level not detailed).
- Segmentation Verification: "Thoracic surgeons with a minimum of 2 years professional working experience."
3. Adjudication Method (for the Test Set)
- Adjudication Method: Not explicitly stated. The process described is "segmented by trained professionals and the segmentations were verified by thoracic surgeons." This suggests a single ground truth was established after the verification step, but the specific process for resolving discrepancies (e.g., consensus, tie-breaking by a third expert) is not detailed. It does not mention a 2+1 or 3+1 method.
4. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
- MRMC Study: No, an MRMC comparative effectiveness study was not explicitly described. The study focuses on the standalone performance of the algorithm against ground truth, and separate qualitative scoring of the suitability of segmentations. There is no mention of comparing human readers with and without AI assistance to determine an "effect size" of improvement.
5. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
- Standalone Study: Yes, a standalone performance study was done. The "Performance was verified by comparing segmentations generated by the machine learning models against ground truth segmentations generated by trained professionals." This directly assesses the algorithm's performance without a human in the loop for generating the primary segmentation output being evaluated for accuracy.
6. The Type of Ground Truth Used
- Type of Ground Truth: The ground truth for the quantitative analysis (DSC) was established by "expert consensus" (or at least expert-verified segmentations). Specifically, "segmentations generated by trained professionals and the segmentations were verified by thoracic surgeons." For the qualitative assessment, "medical professionals were tasked to qualitatively score the suitability of the segmentations provided through the Viewer," which is also an expert-based evaluation of the AI output.
7. The Sample Size for the Training Set
- Training Set Sample Size: Not explicitly stated. The document mentions "Each of the algorithms has been trained and tuned on curated datasets representative of the intended patient population," but does not provide a specific number for the training set. It only states that a "CT image was either part of the tuning or testing dataset and not in both," indicating that the 102 CT images used for testing were separate from the training/tuning data.
8. How the Ground Truth for the Training Set Was Established
- Training Set Ground Truth: Not explicitly stated. The document mentions "trained and tuned on curated datasets representative of the intended patient population." While not explicitly detailed, it's reasonable to infer that a similar expert-driven process (like the ground truth establishment for the test set) would have been used for creating the ground truth in the training dataset to ensure high-quality training data.
(59 days)
AV Vascular is indicated to assist users in the visualization, assessment and quantification of vascular anatomy on CTA and/or MRA datasets, in order to assess patients with suspected or diagnosed vascular pathology and to assist with pre-procedural planning of endovascular interventions.
AV Vascular is a post-processing software application intended for visualization, assessment, and quantification of vessels in computed tomography angiography (CTA) and magnetic resonance angiography (MRA) data with a unified workflow for both modalities.
AV Vascular includes the following functions:
- Advanced visualization: the application provides all relevant views and interactions for CTA and MRA image review: 2D slices, MIP, MPR, curved MPR (cMPR), stretched MPR (sMPR), path-aligned views (cross-sectional and longitudinal MPRs), 3D volume rendering (VR).
- Vessel segmentation: automatic bone removal and vessel segmentation for head/neck and body CTA data, automatic vessel centerline, lumen and outer wall extraction and labeling for the main branches of the vascular anatomy in head/neck and body CTA data, semi-automatic and manual creation of vessel centerline and lumen for CTA and MRA data, interactive two-point vessel centerline extraction and single-point centerline extension.
- Vessel inspection: enables inspection of an entire vessel using the cMPR or sMPR views as well as inspection of a vessel locally using vessel-aligned views (cross-sectional and longitudinal MPRs) by selecting a position along a vessel of interest.
- Measurements: ability to create and save measurements of vessel and lumen inner and outer diameters and area, as well as vessel length and angle measurements.
- Measurements and tools that specifically support pre-procedural planning: manual and automatic ring marker placement for specific anatomical locations, length measurements of the longest and shortest curve along the aortic lumen contour, angle measurements of aortic branches in clock-position style, saving viewing angles in C-arm notation, and configurable templates.
- Saving and export: saving and export of batch series and customizable reports.
This summarization is based on the provided 510(k) clearance letter for Philips Medical Systems' AV Vascular device.
Acceptance Criteria and Device Performance for Aorto-iliac Outer Wall Segmentation
| Metrics | Acceptance Criteria | Reported Device Performance (Mean with 98.75% confidence intervals) |
|---|---|---|
| 3D Dice Similarity Coefficient (DSC) | > 0.9 | 0.96 (0.96, 0.97) |
| 2D Dice Similarity Coefficient (DSC) | > 0.9 | 0.96 (0.95, 0.96) |
| Mean Surface Distance (MSD) | < 1.0 mm | 0.57 mm (0.485, 0.68) |
| Hausdorff Distance (HD) | < 3.0 mm | 1.68 mm (1.23, 2.08) |
| ∆Dmin (difference in minimum diameter) | > 95% of cases with \|∆Dmin\| < 5 mm | 98.8% (98.3–99.2%) |
| ∆Dmax (difference in maximum diameter) | > 95% of cases with \|∆Dmax\| < 5 mm | 98.5% (97.9–98.9%) |
The reported device performance for all primary and secondary metrics meets the predefined acceptance criteria.
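Mean Surface Distance (MSD) and Hausdorff Distance (HD) are reported above without definitions. The sketch below shows one common way to compute them between two binary segmentations using distance transforms; the boundary extraction via erosion and the isotropic default spacing are assumptions, not details from the submission.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels of a binary mask (the mask minus its erosion)."""
    return mask & ~binary_erosion(mask)

def surface_distances(pred: np.ndarray, ref: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Distances from predicted surface voxels to the reference surface, and vice versa."""
    pred_surf, ref_surf = surface(pred.astype(bool)), surface(ref.astype(bool))
    dist_to_ref = distance_transform_edt(~ref_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    return dist_to_ref[pred_surf], dist_to_pred[ref_surf]

def mean_surface_distance(pred, ref, spacing=(1.0, 1.0, 1.0)) -> float:
    d_pr, d_rp = surface_distances(pred, ref, spacing)
    return float(np.concatenate([d_pr, d_rp]).mean())

def hausdorff_distance(pred, ref, spacing=(1.0, 1.0, 1.0)) -> float:
    d_pr, d_rp = surface_distances(pred, ref, spacing)
    return float(max(d_pr.max(), d_rp.max()))
```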
Study Details for Aorto-iliac Outer Wall Segmentation Validation
1. Sample Size used for the Test Set and Data Provenance:
- Sample Size: 80 patients
- Data Provenance: Retrospectively collected from 7 clinical sites in the US, 3 European hospitals, and one hospital in Asia.
- Independence from Training Data: All performance testing datasets were acquired from clinical sites distinct from those which provided the algorithm training data. The algorithm developers had no access to the testing data, ensuring complete independence.
- Patient Characteristics: At least 80% of patients had thoracic and/or abdominal aortic diseases and/or iliac artery diseases (e.g., thoracic/abdominal aortic aneurysm, ectasia, dissection, and stenosis). At least 20% had been treated with stents.
- Demographics:
- Geographics: North America: 58 (72.5%), Europe: 3 (3.75%), Asia: 19 (23.75%)
- Sex: Male: 59 (73.75%), Female: 21 (26.25%)
- Age (years): 21-50: 2 (2.50%), 51-70: 31 (38.75%), >71: 45 (56.25%), Not available: 2 (2.5%)
2. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:
- Number of Experts: Three
- Qualifications: US-board certified radiologists.
3. Adjudication Method for the Test Set:
- The three US-board certified radiologists independently performed manual contouring of the outer wall along the aorta and iliac arteries on cross-sectional planes for each CT angiographic image.
- After quality control, these three aortic and iliac arterial outer wall contours were averaged to serve as the reference standard contour. This can be considered a form of consensus/averaging after independent readings.
4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- The provided document does not indicate that a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done to measure human reader improvement with AI assistance. The study focused on the standalone performance of the AI algorithm compared to an expert-derived ground truth.
5. Standalone (Algorithm Only Without Human-in-the-Loop Performance):
- Yes, the performance data provided specifically describes the standalone performance of the AI-based algorithm for aorto-iliac outer wall segmentation. The algorithm's output was compared directly against the reference standard without human intervention in the segmentation process.
6. Type of Ground Truth Used:
- Expert Consensus/Averaging: The ground truth was established by averaging the independent manual contouring performed by three US-board certified radiologists.
7. Sample Size for the Training Set:
- The document states that the testing data were independent of the training data and that developers had no access to the testing data. However, the exact sample size for the training set is not specified in the provided text.
8. How the Ground Truth for the Training Set Was Established:
- The document implies that training data were used, but it does not describe how the ground truth for the training set was established. It only ensures that the testing data did not come from the same clinical sites as the training data and that algorithm developers had no access to the testing data.
(122 days)
AI-Rad Companion Brain MR is a post-processing image analysis software that assists clinicians in viewing, analyzing, and evaluating MR brain images.
AI-Rad Companion Brain MR provides the following functionalities:
• Automated segmentation and quantitative analysis of individual brain structures and white matter hyperintensities
• Quantitative comparison of each brain structure with normative data from a healthy population
• Presentation of results for reporting that includes all numerical values as well as visualization of these results
AI-Rad Companion Brain MR runs two distinct and independent algorithms for Brain Morphometry analysis and White Matter Hyperintensities (WMH) segmentation, respectively. Overall, it comprises four main algorithmic features:
• Brain Morphometry
• Brain Morphometry follow-up
• White Matter Hyperintensities (WMH)
• White Matter Hyperintensities (WMH) follow-up
The feature for Brain Morphometry has been available since the first version of the device (VA2x), while segmentation of White Matter Hyperintensities was added in VA4x and the follow-up analysis for both has been available since VA5x. The brain morphometry and brain morphometry follow-up features have not been modified and remain identical to the previous VA5x mainline version.
AI-Rad Companion Brain MR VA60 is an enhancement to the predicate, AI-Rad Companion Brain MR VA50 (K232305). Just as in the predicate, the brain morphometry feature of AI-Rad Companion Brain MR addresses the automatic quantification and visual assessment of the volumetric properties of various brain structures based on T1 MPRAGE datasets. From a predefined list of brain structures (e.g. Hippocampus, Caudate, Left Frontal Gray Matter, etc.) volumetric properties are calculated as absolute and normalized volumes with respect to the total intracranial volume. The normalized values are compared against age-matched mean and standard deviations obtained from a population of healthy reference subjects. The deviation from this reference population can be visualized as 3D overlay map or out-of-range flag next to the quantitative values.
Additionally, identical to the predicate, the white matter hyperintensities feature addresses the automatic quantification and visual assessment of white matter hyperintensities on the basis of T1 MPRAGE and T2 weighted FLAIR datasets. The detected WMH can be visualized as a 3D overlay map and the quantification in count and volume as per 4 brain regions in the report.
Here's a structured overview of the acceptance criteria and study details for the AI-Rad Companion Brain MR, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Reported Device Performance
| Acceptance Criteria | Reported Device Performance (AI-Rad Companion Brain MR WMH Feature) | Reported Device Performance (AI-Rad Companion Brain MR WMH Follow-up Feature) |
|---|---|---|
| WMH Segmentation Accuracy | Pearson correlation coefficient between WMH volumes and ground-truth annotation: 0.96; intraclass correlation coefficient between WMH volumes and ground-truth annotation: 0.94; Dice score: 0.60; F1-score: 0.67. Detailed Dice scores for WMH segmentation: mean 0.60, median 0.62, STD 0.14, 95% CI [0.57, 0.63]. Detailed ASSD scores for WMH segmentation: mean 0.05, median 0.00, STD 0.15, 95% CI [0.02, 0.08] | |
| New or Enlarged WMH Segmentation Accuracy (Follow-up) | | Pearson correlation coefficient between new or enlarged WMH volumes and ground-truth annotation: 0.76; average Dice score: 0.59; average F1-score: 0.71. Detailed Dice scores for new/enlarged WMH segmentation (by vendor): Siemens mean 0.64, median 0.67, STD 0.15, 95% CI [0.60, 0.69]; GE mean 0.56, median 0.60, STD 0.14, 95% CI [0.51, 0.61]; Philips mean 0.55, median 0.59, STD 0.16, 95% CI [0.50, 0.61]. Detailed ASSD scores for new/enlarged WMH segmentation (by vendor): Siemens mean 0.02, median 0.00, STD 0.06, 95% CI [0.00, 0.04]; GE mean 0.09, median 0.01, STD 0.23, 95% CI [0.03, 0.19]; Philips mean 0.04, median 0.00, STD 0.11, 95% CI [0.00, 0.08] |
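Volume agreement in this table is summarized with Pearson and intraclass correlation coefficients. A minimal sketch of both, for predicted versus manually annotated WMH volumes, follows; the ICC form shown (two-way random effects, absolute agreement, single measurement, often written ICC(2,1)) is an assumption, since the submission does not state which variant was used, and the volumes are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `ratings` has shape (n_subjects, n_raters); here the 'raters' are (algorithm, ground truth)."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ms_rows = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand_mean) ** 2).sum() / (k - 1)
    ss_err = ((ratings - row_means[:, None] - col_means[None, :] + grand_mean) ** 2).sum()
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Synthetic WMH volumes (ml): algorithm output vs. manual ground truth
gt = np.array([3.2, 10.5, 0.8, 22.1, 5.6, 14.3])
pred = np.array([3.0, 11.0, 1.1, 21.0, 5.9, 13.5])
r, _ = pearsonr(pred, gt)
print(round(r, 3), round(icc_2_1(np.column_stack([pred, gt])), 3))
```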
Study Details
1. Sample Size Used for the Test Set and Data Provenance:
- White Matter Hyperintensities (WMH) Feature: 100 subjects (Multiple Sclerosis patients (MS), Alzheimer's patients (AD), cognitive impaired (CI), and healthy controls (HC)).
- White Matter Hyperintensities (WMH) Follow-up Feature: 165 subjects (Multiple Sclerosis patients (MS) and Alzheimer's patients (AD)).
- Data Provenance: Data acquired from Siemens, GE, and Philips scanners. Testing data had a balanced distribution with respect to patient gender and age according to the target patient population, and field strength (1.5T and 3T). This indicates a retrospective, multi-vendor dataset; the countries of origin are not stated.
2. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications:
- Number of Experts: Three radiologists.
- Qualifications: Not explicitly stated beyond "radiologists." It is not specified if they are board-certified, or their years of experience.
3. Adjudication Method for the Test Set:
- For each dataset, three sets of ground truth annotations were created manually.
- Each set was annotated by a disjoint group consisting of an annotator, a reviewer, and a clinical expert.
- The clinical expert was randomly assigned per case to minimize annotation bias.
- The clinical expert reviewed and corrected the initial annotation of the changed WMH areas according to a specified annotation protocol. Significant corrections led to re-communication with the annotator and re-review.
- This suggests a 3+1 Adjudication process, where three initial annotations are reviewed by a clinical expert.
4. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done:
- No, an MRMC comparative effectiveness study comparing human readers with and without AI assistance was not done. The study focuses on the standalone performance of the AI algorithm against expert ground truth.
5. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Study Was Done:
- Yes, a standalone performance study was done. The "Accuracy was validated by comparing the results of the device to manual annotated ground truth from three radiologists." This evaluates the algorithm's performance directly.
6. The Type of Ground Truth Used:
- Expert Consensus / Manual Annotation: The ground truth for both WMH and WMH follow-up features was established through "manual annotated ground truth from three radiologists" and involved a "standard annotation process" with annotators, reviewers, and clinical experts.
7. The Sample Size for the Training Set:
- The document states that the "training data used for the fine tuning the hyper parameters of WMH follow-up algorithm is independent of the data used to test the white matter hyperintensity algorithm follow up algorithm." However, the specific sample size for the training set is not provided in the given text.
8. How the Ground Truth for the Training Set Was Established:
- The document implies that the WMH follow-up algorithm "does not include any machine learning/ deep learning component," suggesting a rule-based or conventional image processing algorithm. Therefore, "training" might refer to parameter tuning rather than machine learning model training.
- For the "fine-tuning the hyper parameters of WMH follow-up algorithm," the ground truth establishment method for this training data is not explicitly detailed in the provided text. It only states that this data was "independent of the data used to test" the algorithm.
(149 days)
Imagine® Enterprise Suite (IES) is a medical diagnostic device that receives, stores, and shares the medical images from and to DICOM-compliant entities such as imaging modalities (such as X-ray Angiograms (XA), Echocardiograms (US), MRI, CT, CR, DR, IVUS, OCT, PET and SPECT), external PACS, and other diagnostic workstations. It is used in the display and quantification of medical images, after image acquisition from modalities, for post-procedure clinical decision support. It constitutes a PACS for the communication and storage of medical images and provides a worklist of stored medical images that can be used to open patient studies in one of its image viewers. It is intended to display images and related information that are interpreted by trained professionals to render findings and/or diagnosis, but it does not directly generate any diagnosis or potential findings. Not intended for primary diagnosis of mammographic images. Not intended for intra-procedural or real-time use. Not intended for diagnostic use on mobile devices.
The Imagine® Enterprise Suite (IES) has, as its backbone, the IES PACS – a DICOM stack for the communication and storage of medical images. It is based on its predecessor, the HCP DICOM Net® PACS (K023467). The IES is made up of the following modules:
IES_EntViewer: This viewer module can be launched from the IES PACS Worklist and is intended primarily for the review and manipulation of angiographic X-ray images. It also supports the review of images from other modalities in single or combination views, thereby serving as a general-purpose multi-modality viewer.
IES_EchoViewer: This viewer module can be launched from the IES Worklist and is intended for specialized viewing, manipulation, and measurements of Echocardiography images.
IES_RadViewer: This viewer module can be launched from the IES Worklist and is intended for specialized viewing, manipulation, and measurements of Radiological images. It also supports the fusion of Radiological images (such as MRI and CT) with Nuclear Medicine images (such as PET and SPECT).
IES_ZFPViewer: This viewer is intended for non-diagnostic review of medical images over a web browser. It supports an independent worklist and a viewing component that requires no installation for the end user. It works within an intranet or over the internet via user-provided VPN or static IP.
AngioQuant: This module can be launched from the IES_EntViewer to perform automatic quantification of coronary arteries. It uses, as input, the cardiac angiogram studies stored on the IES PACS. It is intended for display and quantification of X-ray angiographic images after image acquisition in the cathlab, for post-procedure clinical decision support within the cathlab workflow. It is not intended for intra-procedural or real-time use. The Imagine® Enterprise Suite (IES) is integrated with ML only for the segmentation of coronary vessels from X-ray angiographic images and uses deep learning methodology for image analysis.
Here's a breakdown of the acceptance criteria and study details for the Imagine® Enterprise Suite, specifically focusing on the AngioQuant module's machine learning component, as described in the provided 510(k) summary:
1. Table of Acceptance Criteria and Reported Device Performance
The 510(k) summary provides a narrative description of the performance evaluation rather than a direct table of acceptance criteria with corresponding performance metrics for every criterion. However, it explicitly states that the performance of the IES_AngioQuant module's machine learning-based coronary vessel segmentation function was evaluated using several metrics and compared against an FDA-cleared predicate device.
| Acceptance Criterion (Inferred from Study Design) | Reported Device Performance (IES_AngioQuant ML component) |
|---|---|
| Quantitative Performance Metrics for Coronary Vessel Segmentation | Evaluated using: |
| Jaccard Index (Intersection over Union) | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Dice Score | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Precision | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Accuracy | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Recall | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Visual Assessment of Segmentation | Conducted in conjunction with quantitative metrics. |
| Comparative Performance to Predicate Device | Performance was compared against the FDA-cleared predicate device, CAAS Workstation (510(k) No. K232147). |
| Reproducibility/Consistency of Ground Truth (Implicit for verification) | Verification performed by two independent board-certified interventional cardiologists. |
Note: The specific numerical values for Jaccard Index, Dice Score, Precision, Accuracy, and Recall are not provided in the summary. The summary highlights that these metrics were used for evaluation.
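The overlap metrics listed above (Jaccard index, Dice score, precision, recall, accuracy) are all derived from the same pixel-wise counts. For reference, a small sketch of those relationships, including the fixed Dice-to-Jaccard conversion J = D / (2 − D), is shown below with synthetic masks.

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, ref: np.ndarray) -> dict:
    """Pixel-wise overlap metrics for binary segmentations."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    tn = np.logical_and(~pred, ~ref).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    assert np.isclose(jaccard, dice / (2 - dice))  # fixed relationship J = D / (2 - D)
    return {
        "dice": dice,
        "jaccard": jaccard,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Tiny synthetic example
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(overlap_metrics(a, b))
```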
2. Sample Size and Data Provenance
- Test Set Sample Size: An independent external test set comprising 30 patient studies was used.
- Data Provenance: The dataset consisted of anonymized angiographic studies sourced from multiple U.S. and international clinical sites. It was a retrospective dataset. The dataset included adult patients of mixed gender and represented a range of age, body habitus, and diverse race and ethnicity. Clinically relevant variability, including lesion severity, vessel anatomy, image quality, and imaging equipment vendors, was represented.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: Two independent board-certified interventional cardiologists.
- Qualifications of Experts: Each expert had more than 10 years of clinical experience.
4. Adjudication Method for the Test Set
The summary does not explicitly state a formal adjudication method like "2+1" or "3+1" for differences between the experts. However, it states that the ground truth (reference standard) was established using the FDA-cleared Medis QAngio XA (K182611) software, with verification performed by the two independent board-certified interventional cardiologists. This implies that the experts reviewed and confirmed the ground truth generated by the predicate software, rather than independently generating it and then adjudicating differences.
5. MRMC Comparative Effectiveness Study
An MRMC comparative effectiveness study was not explicitly described in the summary. The performance comparison was primarily an algorithm-only comparison against a predicate device (CAAS Workstation) for the ML component. The summary does not mention how much human readers improve with or without AI assistance.
6. Standalone (Algorithm Only) Performance
Yes, a standalone (algorithm only without human-in-the-loop performance) study was done for the IES_AngioQuant module's machine learning-based coronary vessel segmentation function. Its performance was evaluated using quantitative metrics and visual assessment, and then compared against the FDA-cleared predicate device (CAAS Workstation).
7. Type of Ground Truth Used
The ground truth was established using an FDA-cleared software (Medis QAngio XA, K182611), with its output verified by expert consensus of two independent board-certified interventional cardiologists.
8. Sample Size for the Training Set
A total of 762 anonymized angiographic studies were used for training, validation, and internal testing sets combined. The summary does not provide an exact breakdown of how many studies were specifically in the training set versus the validation and internal testing sets.
9. How the Ground Truth for the Training Set Was Established
The summary states that the ground truth ("truthing") for the dataset (which includes the training, validation, and internal testing sets) was established using the FDA-cleared Medis QAngio XA (K182611) software, with verification performed by two independent board-certified interventional cardiologists, each with more than 10 years of clinical experience. Implicitly, this same method was used for establishing ground truth for the training set.
(47 days)
TruSPECT is intended for acceptance, transfer, display, storage, and processing of images for detection of radioisotope tracer uptakes in the patient's body. The device uses various processing modes supported by the various clinical applications and various features designed to enhance image quality. The emission computerized tomography data can be coupled with registered and/or fused CT/MR scans and with physiological signals in order to depict, localize, and/or quantify the distribution of radionuclide tracers and anatomical structures in scanned body tissue for clinical diagnostic purposes. The acquired tomographic image may undergo emission-based attenuation correction.
Visualization tools include segmentation, colour coding, and polar maps. Analysis tools include Quantitative Perfusion SPECT (QPS), Quantitative Gated SPECT (QGS) and Quantitative Blood Pool Gated SPECT (QBS) measurements, Multi Gated Acquisition (MUGA) and Heart-to-Mediastinum activity ratio (H/M).
The system also includes reporting tools for formatting findings and user selected areas of interest. It is capable of processing and displaying the acquired information in traditional formats, as well as in three-dimensional renderings, and in various forms of animated sequences, showing kinetic attributes of the imaged organs.
TruSPECT is based on Windows operating system. Due to special customer requirements and the clinical focus the TruSPECT can be configured with different combinations of Windows OS based software options and clinical applications which are intended to assist the physician in diagnosis and/or treatment planning. This includes commercially available post-processing software packages.
TruSPECT is a processing workstation primarily intended for, but not limited to cardiac applications. The workstation can be integrated with the D-SPECT cardiac scanner system or used as a standalone post-processing station.
The TruSPECT Processing Station is a software-only medical device (SaMD) designed to operate on a dedicated, high-performance computer platform. It is distributed as pre-installed medical imaging software intended to support image visualization, quantitation, analysis, and comparison across multiple imaging modalities and acquisition time points. The software supports both functional imaging modalities, such as Single Photon Emission Computed Tomography (SPECT) and Nuclear Medicine (NM), as well as anatomical imaging modalities, such as Computed Tomography (CT).
The system enables integration, display, and analysis of multimodal image datasets to assist qualified healthcare professionals in image review and interpretation within the clinical workflow. The software is intended for use by trained medical professionals and assists in image assessment for various clinical applications, including but not limited to cardiology, electrophysiology, and organ function evaluation. The software does not perform automated diagnosis and does not replace the clinical judgment of the user.
The TruSPECT software operates on the Microsoft Windows® operating system and can be configured with various software modules and clinical applications according to user requirements and intended use. The configuration may include proprietary Spectrum Dynamics modules and commercially available third-party post-processing software packages operating within the TruSPECT framework.
The modified TruSPECT system integrates the TruClear AI application as part of its software suite. The TruClear AI module is a software-based image processing component designed to assist in the enhancement of SPECT image data acquired on the TruSPECT system. The module operates within the existing reconstruction and review workflow and does not alter the system's intended use, indications for use, or fundamental technology.
Verification and validation activities were performed to confirm that the addition of the TruClear AI module functions as intended and that overall system performance remains consistent with the previously cleared TruSPECT configuration. These activities included performance evaluations using simulated phantom datasets and representative clinical image data, conducted in accordance with FDA guidance. The results demonstrated that the modified TruSPECT system incorporating TruClear AI meets all predefined performance specifications and continues to operate within the parameters of its intended clinical use.
Here's a breakdown of the acceptance criteria and study details for the TruClear AI module of the TruSPECT Processing Station, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Reported Device Performance
| Parameter | Acceptance Criteria | Reported Device Performance (Key Performance Results) |
|---|---|---|
| LVEF | Bland Altman Mean: ±3% | Strong correlation (r=0.94); Bland–Altman analyses showed mean differences within the pre-specified acceptance criteria. |
| | Bland Altman SD: ≤ 4% | (Implicitly met, as mean differences were within criteria) |
| | Regression r (min): > 0.8 | r = 0.94 |
| | Slope (range): 0.9 – 1.1 | (Implicitly met, as mean differences were within criteria) |
| | Intercept (limit): ± 10% | (Implicitly met, as mean differences were within criteria) |
| EDV | Bland Altman Mean: ± 5 ml | Strong correlation (r=0.98); Bland–Altman analyses showed mean differences within the pre-specified acceptance criteria. |
| | Bland Altman SD: ≤ 8 ml | (Implicitly met, as mean differences were within criteria) |
| | Regression r (min): > 0.8 | r = 0.98 |
| | Slope (range): 0.9 – 1.1 | (Implicitly met, as mean differences were within criteria) |
| | Intercept (limit): ± 10 ml | (Implicitly met, as mean differences were within criteria) |
| Perfusion Volume | Bland Altman Mean: ± 5 ml | Strong correlation; Bland–Altman analyses showed mean differences within the pre-specified acceptance criteria. |
| | Bland Altman SD: ≤ 8 ml | (Implicitly met, as mean differences were within criteria) |
| | Regression r (min): > 0.8 | (Implicitly met, as a strong correlation was noted) |
| | Slope (range): 0.9 – 1.1 | (Implicitly met, as mean differences were within criteria) |
| | Intercept (limit): ± 10 ml | (Implicitly met, as mean differences were within criteria) |
| TPD | Bland Altman Mean: ± 3% | Strong correlation (r=0.98); Bland–Altman analyses showed mean differences within the pre-specified acceptance criteria. |
| | Bland Altman SD: ≤ 5% | (Implicitly met, as mean differences were within criteria) |
| | Regression r (min): > 0.8 | r = 0.98 |
| | Slope (range): 0.9 – 1.1 | (Implicitly met, as mean differences were within criteria) |
| | Intercept (limit): ± 10% | (Implicitly met, as mean differences were within criteria) |
| Visual Similarity (Denoised vs. Reference) | (Not explicitly quantified as a numeric acceptance criterion range, but implied) | Denoised images were rated "similar" to the reference, consistent with high inter-reader agreement. |
| Inter-observer Agreement (Visual Comparison) | (Not explicitly quantified as an acceptance criterion) | 97–100% after dichotomization (scores ≥3 vs <3) across key metrics. |
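The acceptance criteria above combine Bland–Altman statistics (mean difference and SD of differences) with linear-regression checks (r, slope, intercept). A compact sketch of how those quantities are typically computed from paired denoised low-count and high-count reference measurements follows; the paired LVEF values are synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic paired LVEF values: denoised low-count vs. high-count reference (%)
reference = np.array([58, 62, 45, 70, 51, 66, 39, 55], dtype=float)
denoised = np.array([57, 63, 46, 69, 52, 65, 41, 54], dtype=float)

# Bland-Altman statistics
diffs = denoised - reference
ba_mean, ba_sd = diffs.mean(), diffs.std(ddof=1)
limits_of_agreement = (ba_mean - 1.96 * ba_sd, ba_mean + 1.96 * ba_sd)

# Linear regression of denoised values against the reference
regression = stats.linregress(reference, denoised)

print(f"Bland-Altman mean {ba_mean:+.2f}, SD {ba_sd:.2f}, LoA {limits_of_agreement}")
print(f"r {regression.rvalue:.3f}, slope {regression.slope:.3f}, intercept {regression.intercept:.2f}")
```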
Study Details
1. Sample size used for the test set and the data provenance:
- Test Set Sample Size: 24 patients (8 female, 16 male), which yielded 74 images.
- Data Provenance: Multi-center, retrospective dataset from three hospitals in the UK and Germany.
2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Two (2)
- Qualifications of Experts: Independent, board-certified nuclear medicine physicians.
3. Adjudication method for the test set:
- The document states "two independent, board-certified nuclear medicine physicians visually compared denoised low-count images to the high-count reference using a 5-point Likert scale; inter-observer percent agreement after dichotomization (scores ≥3 vs <3) was 97–100% across key metrics." This suggests a consensus-based approach for establishing some aspect of the ground truth, particularly for the visual similarity assessment, though not explicitly a formal 2+1 or 3+1 adjudication for defining disease status. The reference standard itself was the high-count image, and the experts were comparing the derived AI-processed images to this reference.
4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- An MRMC comparative effectiveness study was not explicitly described in terms of human readers improving with AI vs. without AI assistance. The study focused on validating the AI algorithm's output against a reference standard (high-count image) using visual and quantitative assessment. The two nuclear medicine physicians visually compared the denoised images to the reference, not their own diagnostic performance with and without AI.
5. If a standalone (i.e., algorithm only without human-in-the-loop) performance study was done:
- Yes, a standalone performance assessment of the algorithm was conducted. The quantitative evaluation using the FDA-cleared Cedars-Sinai QPS/QGS to derive perfusion and functional parameters (TPD, volume, EDV, LVEF) directly compared the algorithm's output on low-count images (after denoising) to the high-count reference images. The Bland-Altman and correlation analyses are indicators of standalone performance.
6. The type of ground truth used:
- The primary reference standard (ground truth) for the study was the clinical routine high-count SPECT image (~1.0 MCounts) acquired under standard D-SPECT protocols.
- For quantitative parameters, FDA-cleared Cedars-Sinai QPS/QGS was used on the high-count reference images to derive the ground truth values for perfusion and functional parameters (TPD, volume, EDV, LVEF).
- For visual assessment, the "high-count reference" images served as the ground truth for comparison.
7. The sample size for the training set:
- The total dataset was 352 patients. The training/tuning set consisted of a portion of these patients; specifically, the "held-out test set" was 24 patients, meaning the remaining 328 patients (352 - 24) were used for training and tuning the algorithm.
8. How the ground truth for the training set was established:
- The document implies the same ground truth methodology was used for the training set as for the test set. The algorithm was trained to transform low-count images to effectively match the characteristics of the clinical routine high-count SPECT image as the "gold standard." The Cedars-Sinai QPS/QGS would also have been used on these high-count images to generate the quantitative targets for training, allowing the AI to learn to derive similar quantitative parameters from denoised low-count images.
(104 days)
PeekMed web is a system designed to help healthcare professionals carry out pre-operative planning for several surgical procedures, based on their imported patients' imaging studies. Experience in usage and a clinical assessment are necessary for the proper use of the system in the revision and approval of the output of the planning. The multi-platform system works with a database of digital representations related to surgical materials supplied by their manufacturers.
This medical device consists of a decision support tool for qualified healthcare professionals to quickly and efficiently perform the pre-operative planning for several surgical procedures, using medical imaging with the additional capability of planning the 2D or 3D environment. The system is designed for the medical specialties within surgery, and no specific use environment is mandatory, whereas the typical use environment is a room with a computer. The patient target group is adult patients who have an injury or disability diagnosed previously. There are no other considerations for the intended patient population.
PeekMed web is a system designed to help healthcare professionals carry out pre-operative planning for several surgical procedures, based on their imported patients' imaging studies. Experience in usage and a clinical assessment are necessary for the proper use of the system in the revision and approval of the output of the planning.
The multi-platform system works with a database of digital representations related to surgical materials supplied by their manufacturers.
As the PeekMed web is capable of representing medical images in a 2D or 3D environment, performing relevant measurements on those images, and also capable of adding templates, it can then provide a total overview of the surgery. Being software, it does not interact with any part of the body of the user and/or patient.
The acceptance criteria and study proving device performance are described below, based on the provided FDA 510(k) clearance letter for PeekMed web (K252856).
1. Table of Acceptance Criteria and Reported Device Performance
The provided document lists the acceptance criteria but does not explicitly state the reported device performance for each metric from the validation studies. It only states that the efficacy results "met the acceptance criteria for ML model performance." Therefore, the "Reported Device Performance" column reflects this general statement.
| ML model | Acceptance Criteria | Reported Device Performance |
|---|---|---|
| Segmentation | DICE is no less than 90%; HD-95 is no more than 8; STD DICE is between +/- 10%; Precision is more than 85%; Recall is more than 90% | Met the acceptance criteria for ML model performance |
| Landmarking | MRE is no more than 7 mm; STD MRE is between +/- 5 mm | Met the acceptance criteria for ML model performance |
| Classification | Accuracy is no less than 90%; Precision is no less than 85%; Recall is no less than 90%; F1 score is no less than 90% | Met the acceptance criteria for ML model performance |
| Detection | MAP is no less than 90%; Precision is no less than 85%; Recall is no less than 90% | Met the acceptance criteria for ML model performance |
| Reconstruction | DICE is no less than 90%; HD-95 is no more than 8; STD DICE is between +/- 10%; Precision is more than 85%; Recall is more than 90% | Met the acceptance criteria for ML model performance |
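MRE (mean radial error) for landmarking and the classification metrics above are not defined in the summary. The sketch below uses the usual formulations assumed here: mean Euclidean distance between predicted and reference landmarks, and accuracy/precision/recall/F1 from binary labels; all coordinates and labels are synthetic.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def mean_radial_error(pred_pts: np.ndarray, ref_pts: np.ndarray) -> float:
    """Mean Euclidean distance (e.g., in mm) between predicted and reference landmarks.
    Both arrays have shape (n_landmarks, n_dims)."""
    return float(np.linalg.norm(pred_pts - ref_pts, axis=1).mean())

# Synthetic landmark coordinates in mm
ref = np.array([[10.0, 22.0], [40.5, 18.2], [33.0, 60.1]])
pred = np.array([[11.2, 21.4], [39.9, 19.0], [34.5, 58.8]])
print(f"MRE = {mean_radial_error(pred, ref):.2f} mm")

# Synthetic binary classification labels
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```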
2. Sample Size Used for the Test Set and Data Provenance
The document distinguishes between a "testing" dataset (used for internal evaluation during development) and an "external validation" dataset. The external validation dataset serves as the independent test set for assessing final model performance.
- Test Set (External Validation):
- Segmentation ML model: 672 unique datasets
- Landmarking ML model: 561 unique datasets
- Classification ML model: 367 unique datasets
- Detection ML model: 198 unique datasets
- Reconstruction ML model: 87 unique datasets
- Data Provenance: The document states that ML models were developed with datasets "from multiple sites." It does not specify the country of origin of the data nor explicitly state whether the data was retrospective or prospective, though "external validation datasets were collected independently of the development data" and "labeled by a separate team," suggesting a retrospective approach to data collection for the validation.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
The document mentions that the "External validation...was employed to provide an accurate assessment of the model's performance." and that the dataset was "labeled by a separate team". It does not specify the number of experts used or their specific qualifications (e.g., "radiologist with 10 years of experience").
4. Adjudication Method for the Test Set
The document states that the ground truth for the external validation dataset was "labeled by a separate team." It does not specify an adjudication method such as 2+1, 3+1, or if multiple experts were involved and how discrepancies were resolved.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
No, the document does not indicate that a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done to evaluate how much human readers improve with AI vs. without AI assistance. The testing focused on the standalone performance of the ML models against a predefined ground truth.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
Yes, a standalone (algorithm-only, without human-in-the-loop) performance evaluation was done. The performance data section describes the "efficacy results of the [specific] ML model using the testing and external validation datasets against the predefined ground truth," i.e., an assessment of the algorithm's output independent of human interaction during measurement. The device is described as a "decision support tool" requiring "clinical assessment... for the proper use of the system in the revision and approval of the output," so a clinician reviews the output in clinical use, but the performance testing described here was performed on the raw algorithm output.
7. The Type of Ground Truth Used
The ground truth used for both the training and test sets is referred to as "predefined ground truth" and established by "labeling" or a "separate team" for the external validation sets. This implies a human-generated expert consensus or annotation-based ground truth, although the specific expertise and method of consensus are not detailed. It is not explicitly stated as pathology or outcomes data.
8. The Sample Size for the Training Set
The ML models were trained with datasets from multiple sites totaling:
- 2852 X-ray datasets
- 2073 CT scans
- 209 MRIs
These total datasets were split as follows:
- Training Set: 80% of the total dataset for each modality.
- X-ray: 0.80 * 2852 = 2281.6 (approx. 2282)
- CT scans: 0.80 * 2073 = 1658.4 (approx. 1658)
- MRIs: 0.80 * 209 = 167.2 (approx. 167)
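As a quick sanity check of the 80% split arithmetic above, the totals come from the document, while the rounding convention is an assumption (the letter reports the fractional values 2281.6, 1658.4, and 167.2 without stating how they were rounded):

```python
# Totals per modality as stated in the document; rounding to whole cases is assumed.
totals = {"X-ray": 2852, "CT": 2073, "MRI": 209}
for modality, n in totals.items():
    train = round(0.80 * n)   # 2282, 1658, 167
    held_out = n - train
    print(f"{modality}: {train} training cases, {held_out} held out")
```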
9. How the Ground Truth for the Training Set Was Established
The document states, "ML models were developed with datasets...We trained the ML models with 80% of the dataset..." and refers to a "predefined ground truth." It does not explicitly describe how the training labels were created, but because supervised ML models require labeled data, the training data presumably also carried human-generated ground truth (annotations/labels), similar to the validation data. The document notes that "leakage between development and validation data sets did not occur" and that the external validation set was "labeled by a separate team," suggesting the training data were labeled by experts under the "internal procedures" mentioned for ML model development.
(116 days)
Alzevita is intended for use by neurologists and radiologists experienced in the interpretation and analysis of brain MRI scans. It enables automated labelling, visualization, and volumetric measurement of the hippocampus from high-resolution T1-weighted MRI images. The software facilitates comparison of hippocampal volume against a normative dataset derived from MRI scans of healthy control subjects aged 55 to 90 years, acquired using standardized imaging protocols on 1.5T/3T MRI scanners.
Alzevita is a cloud-based, AI-powered medical image processing software as a medical device intended to assist neurologists and radiologists with expertise in the analysis of 3D brain MRI scans. The software performs fully automated segmentation and volumetric quantification of the hippocampus, a brain structure involved in memory and commonly affected by neurodegenerative conditions.
Alzevita is designed to replace manual hippocampal segmentation workflows with a fast, reproducible, and standardized process. It provides quantitative measurements of hippocampal volume, enabling consistent outputs that can assist healthcare professionals in evaluating structural brain changes.
The software operates through a secure web interface and is compatible with commonly used operating systems and browsers. It accepts 3D MRI scans in DICOM or NIfTI format and displays the MRI image in the MRI viewer allowing trained healthcare professionals to view, zoom, and analyze the MRI scan alongside providing a visual and tabular volumetric analysis report.
The underlying algorithm used in Alzevita is locked, meaning it does not modify its behavior at runtime or adapt to new inputs. This ensures consistent performance and reproducibility of results across users and imaging conditions. Any future modifications to the algorithm including performance updates or model re-training will be submitted to the FDA for review and clearance prior to deployment, in compliance with FDA regulatory requirements and applicable guidance for AI/ML-based SaMD.
Here's a detailed description of the acceptance criteria and the study proving the Alzevita device meets those criteria, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
| Metric | Acceptance Criteria | Reported Device Performance (Alzevita 95% Confidence Intervals) | Result (Pass/Fail) |
|---|---|---|---|
| Overall Dice Score | ≥ 75% | (0.85, 0.86) | Pass |
| Overall Hausdorff Distance | ≤ 6.1 mm | (1.43, 1.59) | Pass |
| Overall Correlation Coefficient | ≥ 0.82 | Not explicitly given as CI, but stated as met | Pass |
| Overall Relative Volume Difference | ≤ 24.6% | Not explicitly given as CI, but stated as met | Pass |
| Overall Bland-Altman Mean Difference (Total Hippocampus Volume) | ≤ 1010 mm³ | Not explicitly given as CI, but stated as met | Pass |
| Subgroup Dice Score (Clinical Subgroups) | ≥ 83% (implied from results) | Control: (0.87, 0.88); MCI: (0.84, 0.85); AD: (0.82, 0.84) | Pass |
| Subgroup Hausdorff Distance (Clinical Subgroups) | ≤ 3 mm (implied from results) | Control: (1.32, 1.41); MCI: (1.44, 1.62); AD: (1.48, 2.10) | Pass |
| Subgroup Dice Score (Gender) | ≥ 83% (implied) | Female: (0.85, 0.87); Male: (0.84, 0.86) | Pass |
| Subgroup Hausdorff Distance (Gender) | ≤ 3 mm (implied) | Female: (1.40, 1.57); Male: (1.41, 1.66) | Pass |
| Subgroup Dice Score (Magnetic Field Strength) | ≥ 83% (implied) | 3T: (0.86, 0.87); 1.5T: (0.83, 0.85) | Pass |
| Subgroup Hausdorff Distance (Magnetic Field Strength) | ≤ 3 mm (implied) | 3T: (1.38, 1.47); 1.5T: (1.45, 1.79) | Pass |
| Subgroup Dice Score (Slice Thickness) | ≥ 83% (implied) | 1 mm: (0.87, 0.88); 1.2 mm: (0.84, 0.85) | Pass |
| Subgroup Hausdorff Distance (Slice Thickness) | ≤ 3 mm (implied) | 1 mm: (1.35, 1.43); 1.2 mm: (1.47, 1.72) | Pass |
| Subgroup Dice Score (US Geographical Region) | ≥ 83% (implied) | East US: (0.84, 0.86); West US: (0.85, 0.87); Central US: (0.85, 0.87); Canada: (0.82, 0.88) | Pass |
| Subgroup Hausdorff Distance (US Geographical Region) | ≤ 3 mm (implied) | East US: (1.44, 1.71); West US: (1.35, 1.55); Central US: (1.35, 1.47); Canada: (1.07, 2.34) | Pass |
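Two of the agreement metrics in the table above, the Bland-Altman mean difference and the relative volume difference, are straightforward to compute from paired automated and manual volumes. The sketch below is illustrative only; the array names are hypothetical and the formulas are the generic definitions, not code from the submission.

```python
import numpy as np

def bland_altman(auto_vol: np.ndarray, manual_vol: np.ndarray):
    """Bland-Altman mean difference and 95% limits of agreement for paired volumes."""
    diff = auto_vol - manual_vol
    mean_diff = diff.mean()
    sd = diff.std(ddof=1)
    return mean_diff, (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)

def relative_volume_difference(auto_vol: np.ndarray, manual_vol: np.ndarray) -> np.ndarray:
    """Per-case absolute volume difference as a percentage of the reference volume."""
    return 100.0 * np.abs(auto_vol - manual_vol) / manual_vol
```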
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Test Set: 298 subjects.
- Data Provenance: The test set data was collected from the publicly available ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset. It is retrospective and sampled using stratified random sampling, with subjects recruited from ADNI 1 & ADNI 3 datasets.
- Geographical Distribution: Approximately equal geographical distribution within the USA (East coast, Central US regions, West coast) and Canada.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Number of Experts: Three certified radiologists.
- Qualifications of Experts: They are described as "certified radiologists in India, adhering to widely recognized and standardized segmentation protocols." Specific experience level (e.g., years of experience) is not provided.
4. Adjudication Method for the Test Set
- Adjudication Method: A consensus ground truth was established by integrating individual delineations from the three certified radiologists into a single consensus mask for each case. This integration was performed using the STAPLE (Simultaneous Truth and Performance Level Estimation) algorithm.
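STAPLE itself estimates each rater's performance and a probabilistic consensus via expectation-maximization, which is beyond the scope of a short sketch. As a much cruder stand-in that only illustrates the idea of fusing three raters' masks into one consensus mask, a simple majority vote might look like this (function and variable names are hypothetical):

```python
import numpy as np

def majority_vote(masks: list) -> np.ndarray:
    """Label a voxel as foreground when more than half of the raters marked it.
    Simplified stand-in for STAPLE, which instead weights raters by estimated
    sensitivity/specificity via expectation-maximization."""
    stack = np.stack([m.astype(bool) for m in masks], axis=0)
    return stack.sum(axis=0) > (len(masks) / 2)

# Hypothetical usage with three raters' binary hippocampus masks:
# consensus = majority_vote([rater1_mask, rater2_mask, rater3_mask])
```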
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was a MRMC study done? No, the document describes a standalone performance evaluation of the Alzevita algorithm against a consensus ground truth. There is no mention of a human-in-the-loop study comparing human readers with and without AI assistance.
- Effect size of human readers improvement: Not applicable, as no MRMC study was conducted.
6. Standalone Performance Study
- Was a standalone performance study done? Yes. The entire validation study described evaluates the Alzevita algorithm's performance in segmenting the hippocampus and calculating its volume against a ground truth, without human intervention in the segmentation process.
7. Type of Ground Truth Used
- Type of Ground Truth: Expert consensus. Specifically, it was established through manual segmentation by three certified radiologists, with their individual segmentations integrated via the STAPLE algorithm. This STAPLE-derived consensus mask served as the ground truth.
8. Sample Size for the Training Set
- Sample Size for Training Set: 200 cases.
9. How the Ground Truth for the Training Set Was Established
- Training Set Ground Truth Establishment: "Expert radiologists manually segmented the hippocampus to create the ground truth, which is then used as input for training the Alzevita segmentation model." The number and specific qualifications of the expert radiologists for the training set's ground truth are not detailed beyond "expert radiologists." There is no mention of an adjudication method like STAPLE for the training set ground truth, suggesting individual expert segmentation or an unspecified consensus process.
(172 days)
AI-CVD® is an opportunistic AI-powered quantitative imaging tool that provides automated CT-derived anatomical and density-based measurements for clinician review. The device does not provide diagnostic interpretation or risk prediction. It is solely intended to aid physicians and other healthcare providers in determining whether additional diagnostic tests are appropriate for implementing preventive healthcare plans. AI-CVD® has a modular structure where each module is intended to report quantitative imaging measurements for each specific component of the CT scan. AI-CVD® quantitative imaging measurement modules include coronary artery calcium (CAC) score, aortic wall calcium score, aortic valve calcium score, mitral valve calcium score, cardiac chambers volumetry, epicardial fat volumetry, aorta and pulmonary artery sizing, lung density, liver density, bone mineral density, and muscle & fat composition.
Using AI-CVD® quantitative imaging measurements and their clinical evaluation, healthcare providers can investigate patients who are unaware of their risk of coronary heart disease, heart failure, atrial fibrillation, stroke, osteoporosis, liver steatosis, diabetes, and other adverse health conditions that may warrant additional risk assessment, monitoring or follow-up. AI-CVD® quantitative imaging measurements are to be reviewed by radiologists or other medical professionals and should only be used by healthcare providers in conjunction with clinical evaluation.
AI-CVD® is not intended to rule out the risk of cardiovascular diseases. AI-CVD® opportunistic screening software can be applied to non-contrast thoracic CT scans such as those obtained for CAC scans, lung cancer screening scans, and other chest diagnostic CT scans. Similarly, AI-CVD® opportunistic screening software can be applied to contrast-enhanced CT scans such as coronary CT angiography (CCTA) and CT pulmonary angiography (CTPA) scans. AI-CVD® opportunistic bone density module and liver density module can be applied to CT scans of the abdomen and pelvis. All volumetric quantitative imaging measurements from the AI-CVD® opportunistic screening software are adjusted by body surface area (BSA) and reported both in cubic centimeter volume (cc) and percentiles by gender reference data from people who participated in the Multi-Ethnic Study of Atherosclerosis (MESA) and Framingham Heart Study (FHS). Except for coronary artery calcium scoring, other AI-CVD® modules should not be ordered as a standalone CT scan but instead should be used as an opportunistic add-on to existing and new CT scans.
AI-CVD® is an opportunistic AI-powered modular tool that provides automated quantitative imaging reports on CT scans and outputs the following measurements:
- Coronary Artery Calcium Score
- Aortic Wall and Valves Calcium Scores
- Mitral Valve Calcium Score
- Cardiac Chambers Volume
- Epicardial Fat Volume
- Aorta and Main Pulmonary Artery Volume and Diameters
- Liver Attenuation Index
- Lung Attenuation Index
- Muscle and Visceral Fat
- Bone Mineral Density
The above quantitative imaging measurements enable care providers to take necessary actions to prevent adverse health outcomes.
AI-CVD® modules are installed by trained personnel only. AI-CVD® is executed via parent software which provides the necessary inputs and receives the outputs. The software itself does not offer user controls or access.
AI-CVD® reads a CT scan (in DICOM format) and extracts scan specific information like acquisition time, pixel size, scanner type, etc. AI-CVD® uses trained AI models that automatically segment and report quantitative imaging measurements specific to each AI-CVD® module. The output of each AI-CVD® module is inputted into the parent software which exports the results for review and confirmation by a human expert.
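To illustrate the kind of scan-specific metadata extraction described above, a minimal pydicom sketch follows. The file path and the particular tags read are assumptions for illustration, not details from the submission.

```python
import pydicom

ds = pydicom.dcmread("slice_0001.dcm")  # hypothetical path to one CT slice

# Common DICOM attributes a post-processing tool might read before analysis.
scan_info = {
    "manufacturer": getattr(ds, "Manufacturer", None),
    "acquisition_time": getattr(ds, "AcquisitionTime", None),
    "pixel_spacing_mm": getattr(ds, "PixelSpacing", None),
    "slice_thickness_mm": getattr(ds, "SliceThickness", None),
    "kvp": getattr(ds, "KVP", None),
}
print(scan_info)
```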
AI-CVD® is a post-processing tool that works on existing and new CT scans.
AI-CVD® passes if the human expert confirms that the segmentation highlighted by the AI-CVD® module is correctly placed on the target anatomical region. For example, the software passes if the human expert sees that the AI-CVD® cardiac chamber volumetry module has highlighted the heart anatomy.
AI-CVD® fails if the human expert sees that the segmentation highlighted by the AI-CVD® module is not correctly placed on the target anatomical region. For example, the software fails if the human expert sees that the AI-CVD® cardiac chamber volumetry module has highlighted the lung anatomy, a portion of the sternum, or any adjacent organs. Furthermore, the software fails if the human expert sees that the quality of the CT scan is compromised by image artifacts, severe motion, or excessive noise.
The user cannot change or edit the segmentation or results of the device. The user must accept or reject the segmentation where the AI-CVD® quantitative imaging measurements are performed.
AI-CVD® is an AI-powered post-processing tool that works on non-contrast and contrast-enhanced CT scans of chest and abdomen.
AI-CVD® is a multi-module deep learning-based software platform developed to automatically segment and quantify a broad range of cardiovascular, pulmonary, musculoskeletal, and metabolic biomarkers from standard chest or whole-body CT scans. AI-CVD® system builds upon the open-source TotalSegmentator as its foundational segmentation framework, incorporating additional supervised learning and model training layers specific to each module's clinical task.
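The open-source TotalSegmentator package referenced above exposes a Python API as well as a command-line tool. A minimal invocation might look like the following; the paths are placeholders, the available options depend on the installed version, and the module-specific fine-tuning described in the submission is proprietary and not reproduced here.

```python
# pip install totalsegmentator   (open-source foundational framework named in the submission)
from totalsegmentator.python_api import totalsegmentator

# Segment a CT volume into the package's standard anatomical structures.
totalsegmentator("ct_volume.nii.gz", "segmentations/")
```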
The provided FDA 510(k) Clearance Letter for AI-CVD® outlines several modules, each with its own evaluation. However, the document does not provide a single, comprehensive table of acceptance criteria with reported device performance for all modules. Instead, it describes clinical validation studies and agreement analyses, generally stating "acceptable bias and reproducibility" or "acceptable agreement and reproducibility" without specific numerical thresholds or metrics. Similarly, detailed information on sample sizes, ground truth establishment methods (beyond general "manual reference standards" or "human expert knowledge"), and expert qualifications is quite limited for most modules.
Here's an attempt to extract and synthesize the information based on the provided text, recognizing the gaps:
Acceptance Criteria and Study Details for AI-CVD®
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state numerical acceptance criteria for each module. Instead, it describes performance in terms of agreement with manual measurements or gold standard references, generally stating "acceptable bias and reproducibility" or "comparable performance." The table below summarizes what is reported.
| AI-CVD® Module | Acceptance Criteria (Implicit/General) | Reported Device Performance |
|---|---|---|
| Coronary Artery Calcium Score | Comparative safety and effectiveness with expert manual measurements. | Demonstrated comparative safety and effectiveness between expert manual measurements and both automated Agatston CAC scores and AI-derived relative density-based calcium scores. |
| Aortic Wall & Aortic Valve Calcium Scores | Acceptable bias and reproducibility compared to manual reference standards. | Bland-Altman agreement analyses demonstrated acceptable bias and reproducibility across imaging protocols. |
| Mitral Valve Calcium Score | Reproducible quantification compared to manual measurements. | Agreement analyses demonstrated reproducible mitral valve calcium quantification across imaging protocols. |
| Cardiac Chambers Volume | Based on previously FDA-cleared technology (AutoChamber™ K240786). | (No new performance data presented for this specific module as it leverages a cleared predicate). |
| Epicardial Fat Volume | Acceptable agreement and reproducibility with manual measurements. | Agreement studies comparing AI-derived epicardial fat volumes with manual measurements and across non-contrast and contrast-enhanced CT acquisitions demonstrated acceptable agreement and reproducibility. |
| Aorta & Main Pulmonary Artery Volume & Diameters | Low bias and comparable performance with manual reference measurements. | Agreement studies comparing AI-derived measurements with manual reference measurements demonstrated low bias and comparable performance across gated and non-gated CT acquisitions. Findings support reliability. |
| Liver Attenuation Index | Acceptable reproducibility across imaging protocols. | Agreement analysis comparing AI-derived liver attenuation measurements across imaging protocols demonstrated acceptable reproducibility. |
| Lung Attenuation Index | Reproducible measurements across CT acquisitions. | Agreement studies demonstrated reproducible lung density measurements across gated and non-gated CT acquisitions. |
| Muscle & Visceral Fat | Acceptable reproducibility across imaging protocols. | Agreement analyses between AI-derived fat and muscle measurements demonstrated acceptable reproducibility across imaging protocols. |
| Bone Mineral Density | Based on previously FDA-cleared technology (AutoBMD K213760). | (No new performance data presented for this specific module as it leverages a cleared predicate). |
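For reference, the Agatston coronary calcium score mentioned in the table above weights each calcified lesion's area by its peak attenuation. The simplified per-slice sketch below assumes a calibrated HU image, a labeled lesion mask, and the standard Agatston density thresholds; minimum-lesion-area filtering and slice-thickness normalization are omitted, and none of the code is taken from the device.

```python
import numpy as np

def agatston_density_weight(peak_hu: float) -> int:
    """Standard Agatston density weighting by a lesion's peak attenuation (HU)."""
    if peak_hu >= 400: return 4
    if peak_hu >= 300: return 3
    if peak_hu >= 200: return 2
    if peak_hu >= 130: return 1
    return 0

def agatston_slice_score(hu_slice: np.ndarray, lesion_labels: np.ndarray,
                         pixel_area_mm2: float) -> float:
    """Sum of (lesion area x density weight) over labeled calcified lesions in one slice."""
    score = 0.0
    for label in np.unique(lesion_labels):
        if label == 0:
            continue  # background
        lesion = lesion_labels == label
        area_mm2 = lesion.sum() * pixel_area_mm2
        score += area_mm2 * agatston_density_weight(hu_slice[lesion].max())
    return score
```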
2. Sample Size and Data Provenance for the Test Set
- Coronary Artery Calcium (CAC) Score:
- Sample Size: 913 consecutive coronary calcium screening CT scans.
- Data Provenance: "Real-world" data acquired across three community imaging centers. This suggests a retrospective collection from a U.S. or similar healthcare system, though the specific country of origin is not explicitly stated. The term "consecutive" implies that selection bias was minimized.
- Other Modules (Aortic Wall/Valve, Mitral Valve, Epicardial Fat, Aorta/Pulmonary Artery, Liver, Lung, Muscle/Visceral Fat):
- The document refers to "agreement analyses" and "agreement studies" but does not specify the sample size for the test sets used for these individual modules.
- Data Provenance: The document generally states that "clinical validation studies were performed based upon retrospective analyses of AI-CVD® measurements performed on large population cohorts such as the Multi-Ethnic Study of Atherosclerosis (MESA) and Framingham Heart Study (FHS)." It is unclear if these cohorts were solely used for retrospective analysis, or if the "real-world" data mentioned for CAC was also included for other modules. MESA and FHS are prospective, longitudinal studies conducted primarily in the U.S.
3. Number of Experts and Qualifications for Ground Truth
- Coronary Artery Calcium (CAC) Score:
- Number of Experts: Unspecified, referred to as "expert manual measurements."
- Qualifications: Unspecified, but implied to be human experts capable of performing manual Agatston scoring.
- Other Modules:
- Number of Experts: Unspecified, generally referred to as "manual reference standards" or "manual measurements."
- Qualifications: Unspecified.
4. Adjudication Method for the Test Set
The document does not describe a specific adjudication method (e.g., 2+1, 3+1) for establishing ground truth on the test set. It mentions "expert manual measurements" or "manual reference standards," suggesting that the ground truth was established by human experts, but the process of resolving discrepancies among multiple experts (if any were used) is not detailed.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No, the document does not describe an MRMC comparative effectiveness study where human readers' performance with and without AI assistance was evaluated. The performance data presented focus on the standalone AI performance compared to human expert measurements.
- Effect Size of Human Reader Improvement: Not applicable, as an MRMC study was not described.
6. Standalone (Algorithm Only) Performance Study
- Was a standalone study done? Yes, the described performance evaluations for all modules (where new performance data was presented) are standalone performance studies. The studies compare the AI-CVD® algorithm's output directly against manual measurements or established reference standards.
7. Type of Ground Truth Used
- Coronary Artery Calcium Score: Expert manual measurements (Agatston scores).
- Aortic Wall and Aortic Valve Calcium Scores: Manual reference standards.
- Mitral Valve Calcium Score: Manual measurements.
- Epicardial Fat Volume: Manual measurements.
- Aorta and Main Pulmonary Artery Volume and Diameters: Manual reference measurements.
- Liver Attenuation Index: (Implicitly) Manual reference measurements or established methods for hepatic attenuation.
- Lung Attenuation Index: (Implicitly) Manual reference measurements or established methods for lung density.
- Muscle and Visceral Fat: (Implicitly) Manual reference measurements.
- Cardiac Chambers Volume & Bone Mineral Density: Leveraged previously cleared predicate devices, suggesting the ground truth for their original clearance would apply.
8. Sample Size for the Training Set
The document provides information on the foundational segmentation framework (TotalSegmentator) and hints at customization for AI-CVD® modules:
- TotalSegmentator (Foundational Framework):
- General anatomical segmentation: 1,139 total body CT cases.
- High-resolution cardiac structure segmentation: 447 coronary CT angiography (CCTA) scans.
- AI-CVD® Custom Datasets: The document states that "Custom datasets were constructed for coronary artery calcium scoring, aortic and valvular calcifications, cardiac chamber volumetry, epicardial and visceral fat quantification, bone mineral density assessment, liver fat estimation, muscle mass and quality, and lung attenuation analysis." However, it does not provide the specific sample sizes for these custom training datasets for each AI-CVD® module.
9. How Ground Truth for the Training Set Was Established
- TotalSegmentator (Foundational Framework): The architecture utilizes nnU-Net, which was trained on the described CT cases. Implicitly, these cases would have had expert-derived ground truth segmentations for training the neural network.
- AI-CVD® Custom Datasets: "For each module, iterative model enhancement was applied: human reviewers evaluated model-generated segmentations and corrected any inaccuracies, and these corrections were looped back into the training process to improve performance and generalizability." This indicates that human experts established and refined the ground truth by reviewing and correcting model-generated segmentations, which were then used for retraining. The qualifications of these "human reviewers" are not specified.
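The iterative correction workflow described above, in which reviewers fix model-generated segmentations and the fixes feed back into training, can be sketched as a simple loop. Everything in the sketch below (function names, the callables, the number of rounds) is illustrative and not the manufacturer's pipeline.

```python
from typing import Callable, List, Tuple

def iterative_refinement(
    predict: Callable,        # hypothetical model inference: scan -> proposed mask
    correct: Callable,        # hypothetical reviewer step: (scan, mask) -> corrected mask
    retrain: Callable,        # hypothetical training step: labeled pairs -> updated predict fn
    unlabeled_scans: List,
    rounds: int = 3,
):
    """Illustrative human-in-the-loop refinement loop. Each round: the current model
    proposes segmentations, reviewers correct them, and the corrected masks are added
    to the training pool before retraining."""
    training_pool: List[Tuple] = []
    for _ in range(rounds):
        proposals = [(scan, predict(scan)) for scan in unlabeled_scans]
        training_pool.extend((scan, correct(scan, mask)) for scan, mask in proposals)
        predict = retrain(training_pool)
    return predict
```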
(266 days)
The MediAI-BA is designed to view and quantify bone age from 2D Posterior Anterior (PA) view of left-hand radiographs using deep learning techniques to aid in the analysis of bone age assessment of patients between 2 to 18 years old for pediatric radiologists. The results should not be relied upon alone by pediatric radiologists to make diagnostic decisions. The images shall be with left hand and wrist fully visible within the field of view, and shall be without any major bone destruction, deformity, fracture, excessive motion, or other major artifacts.
Limitations:
- This software is not intended for use in patients with growth disorders caused by congenital anomalies (e.g., Down syndrome, Noonan syndrome, congenital adrenal hyperplasia, methylmalonic acidemia, skeletal dysplasia, chronic renal disease, or prior long-term steroid exposure), as these conditions may cause complex skeletal changes beyond bone maturation.
- Images showing anatomical variations or notable abnormalities (e.g., bone tumors, sequelae of fractures, or congenital deformities) in the region required for interpretation are excluded from the intended use.
This AI-based software utilizes an internal algorithm that integrates global skeletal maturity features extracted from the whole hand radiograph with local skeletal maturity features derived from key Regions of Interest (ROIs). By synthesizing these skeletal maturity features, the software determines the accurate final bone age.
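As a rough illustration of the global-plus-local feature fusion described above, a two-branch regressor might be structured as in the PyTorch sketch below. The layer sizes, ROI count, and fusion strategy are assumptions for illustration only; the submission does not disclose the actual architecture.

```python
import torch
import torch.nn as nn

class GlobalLocalBoneAgeNet(nn.Module):
    """Illustrative two-branch regressor: one encoder for the whole-hand radiograph,
    one shared encoder applied to each ROI crop, with fused features regressing bone age.
    All sizes are placeholders, not taken from the submission."""
    def __init__(self, n_rois: int = 4, feat_dim: int = 128):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
            )
        self.global_enc = encoder()
        self.roi_enc = encoder()   # shared across all ROI crops
        self.head = nn.Linear(feat_dim * (1 + n_rois), 1)

    def forward(self, whole_hand, roi_crops):
        # whole_hand: (B, 1, H, W); roi_crops: list of n_rois tensors, each (B, 1, h, w)
        feats = [self.global_enc(whole_hand)] + [self.roi_enc(r) for r in roi_crops]
        return self.head(torch.cat(feats, dim=1))  # predicted bone age in years
```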
MediAI-BA provides an optional heatmap visualization that highlights regions contributing to the AI model output. The heatmap is intended only as supplementary, qualitative information to illustrate internal AI operations and is not intended for clinical interpretation, growth plate localization, or independent bone age assessment.
The confidence score graph is an internal model visualization intended only to illustrate the relative sharpness of the model's output distribution. It is not calibrated to clinical likelihood, has not been clinically validated, and is not intended to support diagnostic decisions or selection of a specific bone age.
Here's a breakdown of the acceptance criteria and study details for the MediAI-BA device, based on the provided FDA 510(k) clearance letter:
MediAI-BA Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance:
| Acceptance Criteria (Performance Metric) | Target (Implicit from "no significant bias" and "high consistency") | Reported Device Performance and Confidence Intervals |
|---|---|---|
| Deming Regression - Slope | Close to 1 (indicating no proportional bias) | 1.000 (95% CI: 0.989–1.002) |
| Deming Regression - Intercept | Close to 0 (indicating no systematic bias) | 0.08 (95% CI: −0.004–0.158) |
| Bland-Altman Analysis - 95% Limits of Agreement | Narrow range (demonstrating high consistency and agreement) | −0.66 (−1.96 SD) to 0.71 (+1.96 SD) |
| Frequency Distribution of Differences - Mean | Close to 0 (indicating negligible average difference) | 0.026 years |
| Frequency Distribution of Differences - Standard Deviation | Low (indicating high precision) | 0.3505 years |
| Frequency Distribution of Differences - Cases within 0.5 years | High percentage (indicating strong agreement for a large majority of cases) | 89% of all cases |
| Heatmap Consistency (SSIM) | ≥ 0.85 (for most evaluation cases under normal conditions) | Most of 30 evaluation cases met criteria under brightness adjustment and Gaussian noise. All 5 cases met criteria under rotation. |
| Heatmap Accuracy | Bone age changes observed when highlighted region is masked (indicating region's contribution to output) | Bone age changes observed in 27 out of 30 cases when the highlighted region of the heatmap was masked. |
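Deming regression, used in the table above to check for proportional and systematic bias, fits a line while allowing measurement error in both the AI and reference readings. With an assumed error-variance ratio of 1 (orthogonal regression), the slope and intercept have the closed form sketched below; the data arrays are hypothetical and the implementation is generic, not taken from the submission.

```python
import numpy as np

def deming_regression(x: np.ndarray, y: np.ndarray, delta: float = 1.0):
    """Deming regression slope/intercept. `delta` is the ratio of the error variance
    of y to that of x (delta=1 gives orthogonal regression). Assumes non-degenerate data."""
    x_mean, y_mean = x.mean(), y.mean()
    s_xx = np.mean((x - x_mean) ** 2)
    s_yy = np.mean((y - y_mean) ** 2)
    s_xy = np.mean((x - x_mean) * (y - y_mean))
    slope = (s_yy - delta * s_xx
             + np.sqrt((s_yy - delta * s_xx) ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical paired readings: reference bone age (x) vs. MediAI-BA output (y), in years.
# slope, intercept = deming_regression(reference_years, model_years)
```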
Study Details
2. Sample size used for the test set and the data provenance:
- Sample Size: 600 cases.
- Data Provenance:
- Country of Origin: United States.
- Collection Sites: Five sites across multiple states and multiple clinical organizations.
- Retrospective/Prospective: Not explicitly stated, but the description of "collected from five sites" suggests a retrospective collection of existing images for this study. The phrase "None of the cases used in this study were utilized for training or development of the MediAI-BA model" reinforces that these were untouched test cases.
- Demographics: 50.0% males and 50.0% females. Racial/ethnic composition included White, Hispanic, Black, Asian & Pacific Islander, among others.
- Image Sources: X-ray scanner manufacturers included Samsung Electronics, Carestream Health, Kodak, Siemens, and Konica Minolta.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Four evaluators.
- Qualifications of Experts: Not explicitly stated, but the context of "pediatric radiologists" in the Indications for Use and the assertion that the device "demonstrated performance comparable to bone age readings obtained by human evaluators using the GP atlas method" strongly imply that these evaluators were pediatric radiologists experienced in bone age assessment using the GP (Greulich and Pyle) atlas method.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- The document states that ground truth was "established by four evaluators." It does not specify the exact adjudication method (e.g., whether it was consensus, average, or majority rule among the four).
5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- No, an MRMC comparative effectiveness study was not explicitly described. The study compared the device's standalone performance against the ground truth established by human evaluators. It did not evaluate how human readers' performance might improve when assisted by the AI.
6. If a standalone (i.e. algorithm only without human-in-the loop performance) was done:
- Yes, a standalone performance study was done. The performance metrics (Deming regression, Bland-Altman, frequency distribution of differences) directly compare the "software's bone age analysis results" and "MediAI-BA outputs" against the "ground truth." This is a direct measurement of the algorithm's standalone performance.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc):
- The ground truth was established by "four evaluators" using the "GP atlas method." This indicates expert consensus/interpretation using a recognized standard (GP atlas).
8. The sample size for the training set:
- Not specified in the provided text. The document explicitly states that "None of the cases used in this study were utilized for training or development of the MediAI-BA model," but does not give details about the training set itself.
9. How the ground truth for the training set was established:
- Not specified in the provided text.