RevealAI-Lung Software is a computer-aided diagnostic (CADx) software application intended for the characterization of incidentally-detected lung nodules on computed tomography (CT) scans. When a nodule is identified, the Software automatically compares the nodule characteristics with a clinically established database of lung nodules and provides a similarity score to assist clinicians' assessment of patients' cancer risk.
The mSI score is indicated for the evaluation of incidentally-detected pulmonary nodules of diameter 6-15mm in patients aged 18 years or above. In cases where multiple abnormalities are present, the mSI score can be used to assess each abnormality independently. Risk should be interpreted on an individual patient level and mSI is a relative risk score, not a percentage cancer risk.
Note that mSI is not indicated for lung cancer screening. The validation data excluded CT images with missing slices.
The RevealAI-Lung device is a post-processing software program that analyzes patient lung computed tomography (CT) images and is designed to provide computer-aided diagnostic (CADx) information about lung nodules to radiologists.
The user opens the patient's lung CT image from a third-party acquisition device in an existing medical device viewing system and scrolls through the image slices as in their normal workflow. The user identifies a lung nodule on the CT image and evaluates that nodule for cancer risk and the potential need for follow-up using existing known risk factors, clinical management guidelines, and the RevealAI-Lung-provided mSI score. In cases where multiple nodules are present, RevealAI-Lung can be used to assess each nodule independently.
Here's a breakdown of the acceptance criteria and the study proving RevealAI-Lung meets them, based on the provided FDA 510(k) Clearance Letter:
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criterion | Reported Device Performance |
|---|---|
| Primary Endpoint (Multi-Reader Multi-Case (MRMC) Study): Improvement in radiologists' ability to discriminate between malignant and benign pulmonary nodules from CT images with and without the aid of the mSI. Measured as the difference in Area Under the Receiver Operating Characteristic Curve (AUC). | Average AUC improvement: 0.181 (from 0.538 unassisted to 0.719 with RevealAI-Lung assistance). This difference was statistically significant (p < 0.0001). |
| Consistency of Performance Across Readers: Every radiologist must improve their performance when using RevealAI-Lung. | Achieved: Every radiologist (10/10) improved their performance when using RevealAI-Lung. Individual AUC improvements ranged from 0.106 to 0.258. |
| Sensitivity Improvement (at 5% malignancy likelihood threshold): Increase in sensitivity when using RevealAI-Lung. | Increased sensitivity by 14 points (from 0.68 ± 0.039 to 0.82 ± 0.036). |
| Specificity Improvement (at 5% malignancy likelihood threshold): Increase in specificity when using RevealAI-Lung. | Increased specificity by 12 points (from 0.344 ± 0.041 to 0.467 ± 0.043). |
| Standalone Performance: Ability of RevealAI-Lung to discriminate between benign and malignant nodules. | Achieved: Standalone testing of RevealAI-Lung demonstrated it performed as expected in discriminating between benign and malignant nodules. (Specific quantitative metrics for standalone AUC are not explicitly provided, but "performed as expected" is stated.) |
| Validation on External Populations: Consistent device performance across additional incidental nodule populations. | Achieved: Tested on three additional populations (US, Canada, UK). Each study produced performance with an AUC > 0.8, and demonstrated follow-up decisions would be improved compared to clinical guidelines. |
| Consistency Across Subgroups: Performance improvements consistent across patient, nodule, and technical parameters. | Achieved: Results were independent of radiologist experience, patient demographics (age, sex, race/ethnicity), scan characteristics (contrast, scan date, manufacturer), and nodule parameters (size, lobe, opacity). Range of improvement in subgroups: 0.12 - 0.30. |
| Software Quality System Compliance: Adherence to FDA guidance for software in medical devices, 21 CFR §892.2060 special controls, human factors, usability, and cybersecurity. | Achieved: Design, validation, and verification were planned, executed, and documented according to FDA guidance. Assessed as Moderate Level of Concern. Usability evaluations confirmed safety and effectiveness. Cybersecurity activities and risk management were performed. |
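The sensitivity and specificity rows above are evaluated at a fixed 5% malignancy-likelihood operating point. As a minimal sketch of how such operating-point metrics are computed (the scores and labels below are illustrative, not data from the submission):

```python
import numpy as np

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity of `scores` against binary `labels`
    at a fixed operating threshold (score >= threshold -> positive call)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    calls = scores >= threshold
    sensitivity = np.mean(calls[labels])    # TP / (TP + FN), over malignant cases
    specificity = np.mean(~calls[~labels])  # TN / (TN + FP), over benign cases
    return sensitivity, specificity

# Illustrative malignancy-likelihood scores (0-1) and ground-truth labels
scores = [0.02, 0.10, 0.40, 0.90, 0.03, 0.07]
labels = [0,    1,    1,    1,    0,    0]
sens, spec = sens_spec(scores, labels, threshold=0.05)
```

Assisted and unassisted reads would each be scored this way at the same threshold, and the reported "points" of improvement are the difference in these proportions.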
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Clinical Performance Testing (MRMC Study): 108 cases (patients) with incidental lung nodules. The cases included size-matched benign and malignant nodules.
- Sample Size for Validation Testing on External Populations: 675 patients with incidental lung nodules (276 with cancer).
- Data Provenance:
- MRMC Study: Sourced from 3 US sites and 1 in Canada.
- External Validation Studies: One each from the US, Canada, and the UK.
- Retrospective or Prospective: Both the MRMC study and the external validation studies appear to be based on retrospective data, as they used "CT series... from patients in routine practice where lung nodules had been noted incidentally on the original radiology report" and involved "following the patients for at least 5 years" for ground truth (where pathology was not available).
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
The document specifies that the ground truth for the test sets (both MRMC and external validation) was established with a "strict requirement for diagnostic certainty (either pathologic confirmation or two-year radiologic monitoring to confirm benign nodules)."
While it does not explicitly state the number of experts who established the ground truth, the reliance on "pathologic confirmation" or "two-year radiologic monitoring" implies standard clinical practice involving pathologists and/or radiologists in the diagnostic process. The MRMC study itself involved 10 radiologists reading the cases; while they assessed malignancy likelihood, the ground truth for those cases was pre-established using the methods described.
4. Adjudication Method for the Test Set
The adjudication method for establishing the ground truth (pathologic confirmation or two-year radiological monitoring) is not explicitly detailed in terms of expert consensus (e.g., 2+1, 3+1). However, the "strict requirement for diagnostic certainty" implies a high standard of clinical diagnosis.
For the MRMC study's reader evaluations, there was no direct adjudication of reader disagreement against each other. Instead, each reader's interpretation (with and without AI) was compared against the pre-established ground truth for each case. Each case was read twice by each reader, separated by a 28-day washout period, with AI use randomized for the second read.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, What Was the Effect Size of How Much Human Readers Improve with AI vs. Without AI Assistance
Yes, an MRMC comparative effectiveness study was done.
- Reader Improvement: Radiologists improved their accuracy for the diagnosis of pulmonary nodules by an average of 18 points (0.181 AUC).
- Average AUC without the device: 0.538
- Average AUC with the device: 0.719
- Statistical Significance: This difference was statistically significant (p < 0.0001; Dorfman-Berbaum-Metz ANOVA random-reader random-case (RRRC) with jackknife (Wilcoxon)).
- Consistent Improvement: Every radiologist (10 out of 10) improved their performance when using RevealAI-Lung, with individual improvements ranging from 0.11 to 0.26 AUC points.
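The per-reader AUC values behind these figures can be estimated nonparametrically with the Mann–Whitney rank statistic, which is consistent with the "(Wilcoxon)" note in the analysis description above. A minimal sketch with illustrative scores (not data from the submission; the actual significance testing used DBM RRRC ANOVA):

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC as the probability that a randomly chosen malignant case is
    scored higher than a randomly chosen benign case (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Illustrative unassisted vs. AI-assisted scores for one reader
labels     = [1, 1, 1, 0, 0, 0]
unassisted = [0.6, 0.4, 0.5, 0.5, 0.3, 0.7]
assisted   = [0.8, 0.6, 0.7, 0.4, 0.2, 0.5]
delta = auc_mann_whitney(assisted, labels) - auc_mann_whitney(unassisted, labels)
```

The reported effect size (0.181) is the average of such per-reader deltas across the 10 readers and 108 cases.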
6. If a Standalone (i.e., Algorithm Only Without Human-in-the-Loop Performance) Was Done
Yes, standalone testing was done.
- Performance: "Standalone testing of RevealAI-Lung demonstrated that it performed as expected in discriminating between benign and malignant nodules."
- Additional Validation: "Validation of RevealAI-Lung was performed to determine device performance against the ground truth using pre-established acceptance criteria. The device was subsequently tested on incidental nodules from three additional populations (one each US, Canada, and the UK). Each of these studies produced performance with an AUC > 0.8, and demonstrated follow-up decisions would be improved compared to clinical guidelines." This indicates strong standalone performance on external datasets.
7. The Type of Ground Truth Used (Expert Consensus, Pathology, Outcomes Data, etc.)
The ground truth for both training and validation sets was established with "strict requirement for diagnostic certainty":
- Pathologic Confirmation: For malignant nodules, this would typically involve biopsy results.
- Two-Year Radiologic Monitoring: For benign nodules, this means a stable appearance over two years of follow-up CT scans, indicating a non-cancerous nature.
- Outcome Data: The phrase "following the patients for at least 5 years" for confidently matched diagnoses used in training, and implied in validation, points to long-term outcomes data to confirm the definitive diagnosis.
8. The Sample Size for the Training Set
- Training Dataset: RevealAI-Lung was trained on "radiologist-identified lung nodules from 4-30mm in diameter."
- Specific Sample Size: The exact number of cases or nodules in the training set is not explicitly stated in the provided document, beyond the characteristics of the subjects (median age 63, 43% female).
9. How the Ground Truth for the Training Set Was Established
- Method: "Only nodules that were confidently matched to a definitive diagnosis were used for training, including following the patients for at least 5 years."
- This implies a combination of pathology (for malignant cases) and long-term radiologic stability/outcomes (for benign cases) to ensure diagnostic certainty, similar to the method described for the test sets. The mention of "radiologist-identified lung nodules" for the training set likely refers to how the nodules were initially marked or selected, while the "confidently matched to a definitive diagnosis" over 5 years is how their ground truth was ultimately confirmed.
Neurophet AQUA AD Plus is intended for automatic labeling, visualization, and volumetric quantification of segmentable brain structures and lesions, as well as SUVR quantification from a set of MR and PET images. Volumetric measurements may be compared to reference percentile data.
Neurophet AQUA AD Plus is a software device intended for the automatic labeling of brain structures, visualization, and volumetric quantification of segmented brain regions and lesions, as well as standardized uptake value ratio (SUVR) quantification using MR and PET images. The volumetric outcomes are compared to normative reference data to support the evaluation of neurodegeneration and cognitive impairment.
The device is designed to assist physicians in clinical evaluation by streamlining the clinical workflow from patient registration through image analysis, analysis result archiving, and report generation using software-based functionalities. The device provides percentile-based results by comparing an individual's imaging-derived quantitative analysis results to reference populations. Percentile-based results are provided for reference only and are not intended to serve as a standalone basis for diagnostic decision-making. Clinical interpretation must be performed by qualified healthcare professionals.
Here's a breakdown of the acceptance criteria and study details for the Neurophet AQUA AD Plus, based on the provided FDA 510(k) Clearance Letter:
Acceptance Criteria and Device Performance for Neurophet AQUA AD Plus
The Neurophet AQUA AD Plus employs multiple AI modules for automated segmentation and quantitative analysis of brain structures and lesions using MR and PET images. The device's performance was validated against predefined acceptance criteria for each module.
1. Table of Acceptance Criteria and Reported Device Performance
| AI Module | Performance Metric | Acceptance Criteria | Reported Device Performance |
|---|---|---|---|
| T1-SegEngine (T1-weighted structural MRI segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | 95% CI of DSC: [0.750, 0.850] for major cortical brain structures 95% CI of DSC: [0.800, 0.900] for major subcortical brain structures | Cortical Regions: Mean DSC: 0.83 ± 0.04 (95% CI: 0.82–0.84) Subcortical Regions: Mean DSC: 0.87 ± 0.03 (95% CI: 0.86–0.88) |
| | Reproducibility (Average Volume Difference Percentage - AVDP) | Equivalence range: 1.0–5.0% for both subcortical and cortical regions | Subcortical Regions: Mean AVDP: 2.50 ± 0.93% (95% CI: 2.26–2.74) Cortical Regions: Mean AVDP: 1.79 ± 0.74% (95% CI: 1.60–1.98) |
| FLAIR-SegEngine (T2-FLAIR hyperintensity segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | Mean DSC ≥ 0.80 | Mean DSC: 0.90 ± 0.04 (95% CI: 0.89–0.91) |
| | Reproducibility (Mean AVDP and Absolute Lesion Volume Difference) | Absolute difference < 0.25 cc Mean AVDP < 2.5% | Mean AVDP: 0.99 ± 0.66% Mean absolute lesion volume difference: 0.08 ± 0.06 cc |
| PET-Engine (SUVR and Centiloid quantification) | SUVR Accuracy (Intraclass Correlation Coefficient - ICC) | ICC ≥ 0.60 across Alzheimer's-relevant regions (compared to FDA-cleared reference product K221405) | ICC ≥ 0.993 across seven Alzheimer's-relevant regions |
| | Centiloid Classification (Kappa value for amyloid positivity) | κ ≥ 0.70 (indicating substantial agreement with consensus expert visual reads) | Kappa values met or exceeded the criterion (specific values not provided) |
| ED-SegEngine (edema-like T2-FLAIR hyperintensity segmentation) | Accuracy (Dice Similarity Coefficient - DSC) | DSC ≥ 0.70 | Mean DSC: 0.91 ± 0.09 (95% CI: 0.89–0.93) |
| HEM-SegEngine (GRE/SWI hypointense lesion segmentation) | Accuracy (F1-score / DSC) | F1-score ≥ 0.60 | Median F1-score (DSC): 0.860 (95% CI: 0.824–0.902) |
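The Dice similarity coefficient and AVDP metrics in the table can be sketched for binary segmentation masks as follows. The masks and volumes are hypothetical, and the AVDP denominator (mean of the paired volumes) is an assumption, since the letter does not spell out the exact formula:

```python
import numpy as np

def dice(a, b):
    """Sørensen–Dice similarity of two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def avdp(vol_a, vol_b):
    """Absolute volume difference as a percentage of the mean paired volume,
    as used for test-retest reproducibility (denominator is an assumption)."""
    return 100.0 * abs(vol_a - vol_b) / ((vol_a + vol_b) / 2.0)

# Hypothetical 4x4 masks: model output vs. reference label
seg = np.zeros((4, 4), bool); seg[1:3, 1:3] = True
ref = np.zeros((4, 4), bool); ref[1:3, 1:4] = True
overlap = dice(seg, ref)            # agreement of the two masks
repro = avdp(100.0, 98.0)           # e.g. 100 cc vs. 98 cc on a repeat scan
```

In practice the masks are 3D voxel arrays and volumes are voxel counts scaled by voxel size; the logic is unchanged.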
2. Sample Sizes and Data Provenance for the Test Set
- T1-SegEngine (Accuracy): 60 independent T1-weighted MRI cases. Data provenance is not explicitly stated, but the cases implicitly come from the same public repositories (e.g., ADNI, AIBL, PPMI) and institutional clinical sites mentioned for the training data, while remaining distinct from the training set.
- T1-SegEngine (Reproducibility): 60 subjects with paired T1-weighted scans (120 scans total). Data provenance not explicitly stated.
- FLAIR-SegEngine (Accuracy): 136 independent T2-FLAIR cases. Data provenance not explicitly stated, but distinct from training data.
- FLAIR-SegEngine (Reproducibility): Paired T2-FLAIR scans (number not specified). Data provenance not explicitly stated.
- PET-Engine (SUVR accuracy): 30 paired MRI–PET datasets. Data provenance not explicitly stated, but implicitly from multi-center studies including varied tracers and sites.
- PET-Engine (Centiloid classification): 176 paired T1-weighted MRI and amyloid PET scans from ADNI and AIBL. These are public repositories, likely involving diverse geographical data (e.g., USA, Australia). Data is retrospective.
- ED-SegEngine (Accuracy): 100 T2-FLAIR scans collected from U.S. and U.K. clinical sites. Data is retrospective.
- HEM-SegEngine (Accuracy): 106 GRE/SWI scans from U.S. clinical sites. Data is retrospective.
For all modules, validation datasets were fully independent from training datasets at the subject level, drawn from distinct sites and/or repositories where applicable.
The validation cohorts covered adult subjects across a broad age range (approximately 40–80+ years), with both females and males represented.
Racial/ethnic composition included White, Asian, Black, and African American subjects, depending on the underlying public and institutional datasets.
Clinical subgroups included clinically normal, mild cognitive impairment, and Alzheimer's disease for structural, FLAIR, and PET modules, and cerebrovascular/amyloid‑related pathologies for ED‑ and HEM‑SegEngines.
3. Number of Experts and Qualifications for Ground Truth
For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines):
- Number of Experts: Not explicitly stated as a specific number, but "subspecialty-trained neuroradiologists" were used.
- Qualifications: "Subspecialty-trained neuroradiologists." Specific years of experience are not mentioned.
For Centiloid classification in the PET-Engine:
- Number of Experts: "Consensus expert visual reads." The exact number is not specified, but the wording implies multiple experts.
- Qualifications: "Experts" trained in established amyloid PET reading criteria. Specific qualifications beyond "expert" and training in criteria are not detailed.
4. Adjudication Method for the Test Set
For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines):
- "Consensus/adjudication procedures and internal quality control to ensure consistency" were used for establishing reference segmentations. The specific 2+1, 3+1, or other detailed method is not provided.
For Centiloid classification in the PET-Engine:
- "Consensus expert visual interpretation" was used. The specific method details are not provided.
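Agreement between device amyloid-positivity calls and consensus visual reads, as in the κ ≥ 0.70 acceptance criterion above, is conventionally quantified with Cohen's kappa. A minimal sketch with hypothetical positive/negative calls:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters: (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is chance agreement."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)                          # observed agreement
    p_pos = np.mean(a) * np.mean(b)                # chance both call positive
    p_neg = (1 - np.mean(a)) * (1 - np.mean(b))    # chance both call negative
    p_e = p_pos + p_neg
    return (p_o - p_e) / (1 - p_e)

# Hypothetical amyloid-positivity calls: device (Centiloid-based) vs. visual read
device = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
visual = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
```

By the usual Landis–Koch bands, κ between 0.61 and 0.80 is "substantial" agreement, which is why κ ≥ 0.70 is a plausible acceptance threshold.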
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
The provided text does not indicate that an MRMC comparative effectiveness study was done to compare human readers with AI assistance versus without AI assistance. The performance studies primarily focus on the standalone (algorithm-only) performance of the device against expert-derived ground truth or a cleared reference product.
6. Standalone (Algorithm-Only) Performance Study
Yes, a standalone (algorithm only without human-in-the-loop performance) study was done for all AI modules. The text explicitly states: "Standalone performance tests were conducted for each module using validation datasets that were completely independent from those used for model development and training." The results presented in the table above reflect this standalone performance.
7. Type of Ground Truth Used
- Expert Consensus:
- For structural and lesion segmentation modules (T1-, FLAIR-, ED-, HEM-SegEngines), reference segmentations were generated by "subspecialty-trained neuroradiologists using predefined anatomical and lesion‑labeling criteria, with consensus/adjudication procedures."
- For Centiloid classification in the PET-Engine, reference labels were derived from "consensus expert visual interpretation using established amyloid PET reading criteria."
- Comparison to Cleared Reference Product:
- For SUVR quantification in the PET-Engine, reference values were obtained from an "FDA‑cleared reference product (K221405)" (Neurophet SCALE PET).
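Agreement of SUVR values with the cleared reference product (the ICC ≥ 0.60 criterion) can be sketched with a one-way random-effects intraclass correlation. The letter does not state which ICC form was used, so ICC(1,1) and the SUVR values below are assumptions for illustration:

```python
import numpy as np

def icc_1_1(x):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_measurements)
    array: (MSB - MSW) / (MSB + (k-1) * MSW)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    msb = k * np.sum((subj_means - grand) ** 2) / (n - 1)          # between subjects
    msw = np.sum((x - subj_means[:, None]) ** 2) / (n * (k - 1))   # within subjects
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical SUVR pairs: [device, reference product] per region/subject
suvr = np.array([[1.10, 1.12], [1.45, 1.44], [0.98, 1.01],
                 [1.80, 1.78], [1.25, 1.26]])
agreement = icc_1_1(suvr)
```

With within-pair differences this small relative to the between-subject spread, the ICC is close to 1, consistent with the reported ICC ≥ 0.993.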
8. Sample Size for the Training Set
The exact sample size for the training set is not explicitly stated as a single number. However, the document mentions:
- "The AI-based modules (T1‑SegEngine, FLAIR‑SegEngine, PET‑Engine, ED‑SegEngine, HEM‑SegEngine) were trained using multi-center MRI and PET datasets collected from public repositories (e.g., ADNI, AIBL, PPMI) and institutional clinical sites."
- "Training data covered:
- Adult subjects across a broad age range (approximately 20–80+ years), with both sexes represented and including multiple racial/ethnic groups (e.g., White, Asian, Black).
- A spectrum of clinical conditions relevant to the intended use, including clinically normal, mild cognitive impairment, and Alzheimer's disease, as well as patients with cerebrovascular and amyloid‑related pathologies for lesion-segmentation modules.
- MRI acquired on major vendor platforms (GE, Siemens, Philips) at 1.5T and 3T... and amyloid PET acquired on multiple PET systems with commonly used tracers (Amyvid, Neuraceq, Vizamyl)."
This indicates a large and diverse training set, although a precise count of subjects or images isn't provided.
9. How the Ground Truth for the Training Set Was Established
The document implies that the training data included "manual labels" as it states: "No images or manual labels from the training datasets were reused in the validation datasets." However, it does not explicitly detail the process by which these "manual labels" or ground truth for the training set were established (e.g., number of experts, qualifications, adjudication method for training data). It's reasonable to infer that similar expert-driven processes were likely used for training ground truth as for validation, but this is not explicitly confirmed in the provided text.
Surgical Reality Viewer is a medical imaging visualization software intended to assist trained healthcare professionals with preoperative and intraoperative visualizations, by displaying 2D and 3D renderings of DICOM compliant patient images and normal anatomic segmentations derived from patient images as well as functions for manipulation of segmentations and 3D models.
Surgical Reality Viewer assists the trained healthcare professional who is responsible for making all final patient management decisions.
The machine learning algorithms in use by Surgical Reality Viewer are intended for use on adult patients aged 22 years and over.
Surgical Reality Viewer is medical imaging visualization software that accepts DICOM-compliant images (e.g., CT scans or MR images) and segmentation files in various 3D object file formats (e.g., NIfTI, OBJ, MHD, STL). The device can generate preliminary segmentations of normal anatomy on demand using machine learning and computer vision algorithms, and provides tools for editing and/or creating segmentations using various built-in 2D and 3D image manipulation functions. The software generates a 3D segmented view of the loaded patient data on a supported 2D or 3D screen. Features include pre-operative (re)viewing of DICOM data overlaid with segmentations; (intra/post)operative visualization of anatomical structures; 2D viewing, volume rendering, surface rendering, and immersive, interactive 3D viewing; 2D and 3D measurement of DICOM image data; local storage; anatomic labelling, including segmentation tools; and tools for annotation, brushing, or carving of anatomical structures. Surgical Reality Viewer runs on a dedicated computer within the customer environment that meets specific hardware requirements: a Windows operating system (version 10 or higher), an Nvidia GeForce 2070 GPU, an Intel i7 CPU, 16 GB RAM, and at least 100 GB of free hard drive space.
Here's a breakdown of the acceptance criteria and study details for the Surgical Reality Viewer, based on the provided FDA 510(k) clearance letter and summary:
Acceptance Criteria and Reported Device Performance
The provided document details the performance of the machine learning algorithms for various anatomical segmentations using the Sørensen–Dice coefficient (DSC). Additionally, it describes a qualitative assessment of suitability.
Table of Acceptance Criteria (Implicit) and Reported Device Performance
| Anatomical Structure | Metric (Implicit Acceptance Criteria) | Reported Device Performance |
|---|---|---|
| Lobe segmentation | Average Sørensen–Dice coefficient (DSC) | 0.97 |
| - LUL | DSC | 0.98 |
| - LLL | DSC | 0.98 |
| - RUL | DSC | 0.98 |
| - RLL | DSC | 0.98 |
| - RML | DSC | 0.96 |
| Vessel segmentation | Average Sørensen–Dice coefficient (DSC) | 0.84 |
| - Artery | DSC | 0.84 |
| - Vein | DSC | 0.83 |
| Airway segmentation | Sørensen–Dice coefficient (DSC) | 0.96 |
| Aorta segmentation | Sørensen–Dice coefficient (DSC) | 0.96 |
| Pulmonary segmentation | Average Sørensen–Dice coefficient (DSC) | 0.85 |
| - Left segments | DSC | 0.85 |
| - Right segments | DSC | 0.85 |
| Qualitative Scores (Suitability) | (Score 1-5, higher is better) | Reported Scores: |
| Airways segmentations | Suitability score | 4.8 |
| Artery segmentations | Suitability score | 4.8 |
| Vein segmentations | Suitability score | 4.9 |
| Lobe Segmentations | Suitability score | 5.0 |
| Pulmonary lobe segments | Suitability score | 4.7 |
| Aorta segmentations | Suitability score | 5.0 |
Note on Acceptance Criteria: The document directly presents the performance metrics (DSC and qualitative scores). While explicit numerical acceptance criteria (e.g., "must be >= 0.95 DSC") are not stated, the reported high performance figures implicitly demonstrate the device meets acceptable levels for these metrics.
Study Details
1. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 102 CT images (Each study belonged uniquely to a single patient subject).
- Data Provenance: 60 (n=60) scans were obtained from the United States. The remaining 42 scans' country of origin is not specified, but the document mentions "geographical location" as a subgroup for generalizability.
- Retrospective/Prospective: Not explicitly stated, but the mention of "curated datasets" and "clinical testing dataset" without ongoing patient enrollment suggests a retrospective study.
2. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Not explicitly stated as a specific number. The document mentions "trained professionals" who generated the initial segmentations and "thoracic surgeons with a minimum of 2 years professional working experience" who verified these segmentations. This implies at least two distinct groups of experts were involved, potentially multiple individuals within each group.
- Qualifications of Experts:
- Initial Segmentation Generation: "Trained professionals." (Specific professional background and experience level not detailed).
- Segmentation Verification: "Thoracic surgeons with a minimum of 2 years professional working experience."
3. Adjudication Method (for the Test Set)
- Adjudication Method: Not explicitly stated. The process described is "segmented by trained professionals and the segmentations were verified by thoracic surgeons." This suggests a single ground truth was established after the verification step, but the specific process for resolving discrepancies (e.g., consensus, tie-breaking by a third expert) is not detailed. It does not mention a 2+1 or 3+1 method.
4. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
- MRMC Study: No, an MRMC comparative effectiveness study was not explicitly described. The study focuses on the standalone performance of the algorithm against ground truth, and separate qualitative scoring of the suitability of segmentations. There is no mention of comparing human readers with and without AI assistance to determine an "effect size" of improvement.
5. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
- Standalone Study: Yes, a standalone performance study was done. The "Performance was verified by comparing segmentations generated by the machine learning models against ground truth segmentations generated by trained professionals." This directly assesses the algorithm's performance without a human in the loop for generating the primary segmentation output being evaluated for accuracy.
6. The Type of Ground Truth Used
- Type of Ground Truth: The ground truth for the quantitative analysis (DSC) was established by "expert consensus" (or at least expert-verified segmentations). Specifically, "segmentations generated by trained professionals and the segmentations were verified by thoracic surgeons." For the qualitative assessment, "medical professionals were tasked to qualitatively score the suitability of the segmentations provided through the Viewer," which is also an expert-based evaluation of the AI output.
7. The Sample Size for the Training Set
- Training Set Sample Size: Not explicitly stated. The document mentions "Each of the algorithms has been trained and tuned on curated datasets representative of the intended patient population," but does not provide a specific number for the training set. It only states that a "CT image was either part of the tuning or testing dataset and not in both," indicating that the 102 CT images used for testing were separate from the training/tuning data.
8. How the Ground Truth for the Training Set Was Established
- Training Set Ground Truth: Not explicitly stated. The document mentions "trained and tuned on curated datasets representative of the intended patient population." While not explicitly detailed, it's reasonable to infer that a similar expert-driven process (like the ground truth establishment for the test set) would have been used for creating the ground truth in the training dataset to ensure high-quality training data.
Seg Pro V3 is a software device intended to assist trained radiation oncology professionals, including, but not limited to, radiation oncologists, medical physicists, and dosimetrists, during their clinical workflows of radiation therapy treatment planning by providing initial contours of organs at risk on DICOM images. Seg Pro V3 is intended to be used on adult patients only.
The contours are generated by deep-learning algorithms and then transferred to radiation therapy treatment planning systems. Seg Pro V3 must be used in conjunction with a DICOM-compliant treatment planning system to review and edit results generated. Seg Pro V3 is not intended to be used for decision making or to detect lesions.
Seg Pro V3 is an adjunct tool and is not intended to replace a clinician's judgment and manual contouring of the normal organs on DICOM images. Clinicians must not use the software generated output alone without review as the primary interpretation.
The proposed device, Seg Pro V3, is a standalone software that is designed to be used by trained radiation oncology professionals to automatically delineate (segment/contour) organs-at-risk (OARs) on DICOM images. This auto-contouring of OARs is intended to facilitate radiation therapy workflows.
The device receives images in DICOM format as input and automatically generates the contours of OARs, which are stored in DICOM format and in RTSTRUCT modality. The device must be used in conjunction with a DICOM-compliant treatment planning system (TPS) to review and edit results. Once data is routed to Seg Pro V3, the data will be processed and no user interaction is required, nor provided.
The deployment environment is recommended to be a local network with an existing hospital-grade IT system in place. Seg Pro V3 should be installed on a specialized server supporting deep-learning processing. The following configurations are operated only by the manufacturer:
- Local network settings for input and output destinations.
- Presentation of labels and their colors.
- Processed image management and output (RTSTRUCT) file management.
Here's an analysis of the acceptance criteria and study proving the device meets those criteria, based on the provided FDA 510(k) clearance letter for Seg Pro V3 (RT-300):
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criteria (Metric) | Threshold (for large, medium, small volume structures) | Reported Device Performance (Mean DSC for respective sizes) |
|---|---|---|
| Dice Similarity Coefficient (DSC) | > 0.80 for large-volume structures | 0.90 |
| Dice Similarity Coefficient (DSC) | > 0.65 for medium-volume structures | 0.86 |
| Dice Similarity Coefficient (DSC) | > 0.50 for small-volume structures | 0.73 |
| Overall Mean DSC | (N/A - overall performance reported) | 0.85 |
| Overall Median 95% Hausdorff Distance (HD) | (N/A - overall performance reported) | 2.62 mm |
| Median 95% HD for large-volume structures | (N/A - specific threshold not defined) | 3.01 mm |
| Median 95% HD for medium-volume structures | (N/A - specific threshold not defined) | 2.57 mm |
| Median 95% HD for small-volume structures | (N/A - specific threshold not defined) | 2.27 mm |
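The DSC and 95% Hausdorff distance (HD95) metrics reported above are standard segmentation-agreement measures. As a minimal sketch (not the manufacturer's implementation), both can be computed for a pair of binary masks with NumPy and SciPy; the `spacing` parameter is an assumed voxel-spacing input used to express distances in millimeters:

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    """Dice Similarity Coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def surface(mask):
    """Boundary voxels of a binary mask (the mask minus its erosion)."""
    m = mask.astype(bool)
    return m & ~binary_erosion(m)

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance between mask surfaces (mm)."""
    sa, sb = surface(a), surface(b)
    # distance_transform_edt assigns each nonzero voxel its distance to the
    # nearest zero voxel, so invert the target surface to get distances to it.
    d_to_b = distance_transform_edt(~sb, sampling=spacing)
    d_to_a = distance_transform_edt(~sa, sampling=spacing)
    dists = np.concatenate([d_to_b[sa], d_to_a[sb]])
    return float(np.percentile(dists, 95))
```

DSC rewards volumetric overlap and is relatively insensitive to small boundary excursions, which is why a complementary boundary metric such as HD95 is typically reported alongside it.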
Study Details Proving Device Meets Acceptance Criteria
2. Sample size used for the test set and the data provenance:
- Sample Size: 175 cases.
- Data Provenance: Consecutively collected from the Cancer Imaging Archive (TCIA) datasets. The data was acquired independently from product development training and internal testing. Race and ethnic distribution within the study data patient population was unavailable.
- Geographic Origin (inferred): TCIA is primarily a US-based resource, so data is likely from the United States or a diverse international collection.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Three.
- Qualifications of Experts: Board-certified radiation oncologists.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- Adjudication Method: Not explicitly stated. The document says "Each OAR contour used as ground truth (GT) was independently generated by three board-certified radiation oncologists," which suggests a three-way consensus among the experts rather than a formal adjudication scheme such as 2+1.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, what was the effect size of how much human readers improve with AI versus without AI assistance:
- MRMC Study: No. The study primarily evaluated the standalone performance of the AI algorithm. The clinical validation mentions that Seg Pro V3 "operates as intended within a clinical workflow and supports its intended use as an adjunct tool," but it does not present data from an MRMC study comparing human reader performance with and without AI assistance.
6. If a standalone performance evaluation (i.e., algorithm only, without human-in-the-loop) was done:
- Standalone Performance: Yes. "a standalone performance evaluation was conducted to assess the Organ-at-Risk (OAR) contouring capabilities of Seg Pro V3. The observed results indicated that Seg Pro V3 by itself, in the absence of any interaction with a clinician, can contour developed OARs with satisfactory results." The reported DSC and HD metrics are from this standalone evaluation.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Ground Truth Type: Expert consensus. The ground truth (GT) for each OAR contour was "independently generated by three board-certified radiation oncologists."
8. The sample size for the training set:
- The document explicitly states that the 175 cases used for the standalone performance evaluation were "acquired independently from product development training and internal testing." However, the document does not specify the sample size of the training set used to develop the deep learning models.
9. How the ground truth for the training set was established:
- The document does not specify how the ground truth for the training set was established. It only describes the ground truth establishment for the test set.
(285 days)
BriefCase-Triage is a radiological computer aided triage and notification software indicated for use in the analysis of contrast-enhanced CT images that include the brain, in adults or transitional adolescents aged 18 and older. The device is intended to assist hospital networks and appropriately trained medical specialists in workflow triage by flagging and communication of suspected positive cases of Brain Aneurysm (BA) findings that are 3.0 mm or larger.
BriefCase-Triage uses an artificial intelligence algorithm to analyze images and flag suspect cases in parallel to the ongoing standard of care image interpretation. The user is presented with notifications for suspect cases. Notifications include compressed preview images that are meant for informational purposes only and not intended for diagnostic use beyond notification. The device does not alter the original medical image and is not intended to be used as a diagnostic device.
The results of BriefCase-Triage are intended to be used in conjunction with other patient information and based on professional judgment, to assist with triage/prioritization of medical images. Notified clinicians are responsible for viewing full images per the standard of care.
BriefCase-Triage is a radiological computer-assisted triage and notification software device.
The software is based on an algorithmic component and is intended to run on a Linux-based server in a cloud environment.
The BriefCase-Triage receives filtered DICOM images and processes them chronologically by running the algorithms on each series to detect suspected cases. Following the AI processing, the output of the algorithm analysis is transferred to an image review software (desktop application). When a suspected case is detected, the user receives a pop-up notification and is presented with a compressed, low-quality, grayscale preview image captioned "not for diagnostic use, for prioritization only." This preview is meant for informational purposes only, does not contain any marking of the findings, and is not intended for primary diagnosis beyond notification.
Here's a breakdown of the acceptance criteria and study details for the BriefCase-Triage device, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Reported Device Performance
| Metric | Acceptance Criteria (Performance Goal) | Reported Device Performance |
|---|---|---|
| Primary Endpoints | | |
| Sensitivity | 80% | 87.8% (95% CI: 83.1%-91.6%) |
| Specificity | 80% | 91.6% (95% CI: 87.9%-94.5%) |
| Secondary Endpoints | | |
| Time-to-Notification (mean) | Comparable to predicate device | 44.8 seconds (95% CI: 41.4-48.2) |
| Negative Predictive Value (NPV) | N/A | 98.9% (95% CI: 98.4%-99.2%) |
| Positive Predictive Value (PPV) | N/A | 47.6% (95% CI: 38.4%-57.1%) |
| Positive Likelihood Ratio (PLR) | N/A | 10.5 (95% CI: 7.2-15.3) |
| Negative Likelihood Ratio (NLR) | N/A | 0.13 (95% CI: 0.1-0.19) |
Note on Additional Operating Points (AOPs): The device also met performance goals (80% sensitivity and specificity) for three additional operating points (AOP1, AOP2, AOP3) with slightly varying sensitivity/specificity trade-offs (e.g., AOP3: Sensitivity 86.2%, Specificity 93.6%).
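All of the endpoint metrics above derive from a single 2x2 confusion matrix. A minimal sketch of how they relate (the counts below are illustrative only, not the study's actual data):

```python
def triage_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 screening metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)          # sensitivity (true-positive rate)
    spec = tn / (tn + fp)          # specificity (true-negative rate)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),     # negative predictive value
        "plr": sens / (1 - spec),  # positive likelihood ratio
        "nlr": (1 - sens) / spec,  # negative likelihood ratio
    }

# Illustrative counts only -- not the BriefCase-Triage confusion matrix.
m = triage_metrics(tp=180, fp=35, fn=25, tn=304)
```

Note that PPV and NPV depend on disease prevalence in the test population, which is why a device can meet high sensitivity/specificity goals while reporting a PPV below 50%, as here.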
Study Details
1. Sample size used for the test set and the data provenance:
- Sample Size: 544 cases
- Data Provenance: Retrospective, blinded, multicenter study from 6 US-based clinical sites. The cases were distinct in time or center from those used for algorithm training.
2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Three (3) senior board-certified radiologists.
- Qualifications: "Senior board-certified radiologists." (Specific number of years of experience not detailed in the provided text).
3. Adjudication method (e.g., 2+1, 3+1, none) for the test set:
- The text states the ground truth was "determined by three senior board-certified radiologists." It doesn't explicitly describe an adjudication method like "2+1" or "3+1." This implies a consensus approach where all three radiologists agreed, or a majority rule, but the exact mechanism for resolving discrepancies (if any) is not specified.
4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, what was the effect size of how much human readers improve with AI versus without AI assistance:
- No, an MRMC comparative effectiveness study was NOT done. The study's primary objective was to evaluate the standalone performance of the BriefCase-Triage software. The secondary endpoint compared the device's time-to-notification to that of the predicate device, but not its impact on human reader performance.
5. If a standalone performance study (i.e., algorithm only, without human-in-the-loop) was done:
- Yes, a standalone performance study was done. The primary endpoints (sensitivity and specificity) measure the algorithm's performance in identifying Brain Aneurysm (BA) findings.
6. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Expert Consensus: The ground truth was "determined by three senior board-certified radiologists."
7. The sample size for the training set:
- Not explicitly stated. The document mentions the algorithm was "trained during software development on images of the pathology" and that "critical findings were tagged in all CTs in the training data set." However, the specific sample size for this training data is not provided.
8. How the ground truth for the training set was established:
- Manually labeled ("tagged") images: The text states, "As is customary in the field of machine learning, deep learning algorithm development consisted of training on manually labeled ('tagged') images. In that process, critical findings were tagged in all CTs in the training data set." It does not specify who performed the tagging or their qualifications, nor the method of consensus if multiple taggers were involved.
(268 days)
It is used by radiation oncology departments to segment CT images and to generate information needed for treatment planning, treatment evaluation, and treatment adaptation.
The proposed device, AccuContour 4.0 Family, is a standalone software application with the following variants: AccuContour and AccuContour-Lite. The functions of AccuContour-Lite are a subset of those of AccuContour.
AccuContour:
It is used by oncology departments to register multi-modality images and segment (non-contrast) CT images, to generate information needed for treatment planning, treatment evaluation, and treatment adaptation.
The product has three image-processing functions:
- Deep learning contouring: automatic contouring of organs-at-risk in the head and neck, thorax, abdomen, and pelvis (for both male and female patients);
- Automatic registration: rigid and deformable registration; and
- Manual contouring.
It also has the following general functions:
- Receive, add/edit/delete, transmit, input/export, medical images and DICOM data;
- Patient management;
- Review tool of processed images;
- Extension tool;
- Plan evaluation and plan comparison;
- Dose analysis.
AccuContour-Lite:
It is used by oncology departments to segment (non-contrast) CT images, to generate information needed for treatment planning, treatment evaluation, and treatment adaptation.
The product has one image-processing function:
- Deep learning contouring: automatic contouring of organs-at-risk in the head and neck, thorax, abdomen, and pelvis (for both male and female patients).
It also has the following general functions:
- Receive, add/edit/delete, transmit, input/export, medical images and DICOM data;
- Patient management;
- Review tool of processed images.
Here's an analysis of the acceptance criteria and study details for the AccuContour 4.0, extracted and organized from the provided FDA 510(k) clearance letter.
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are derived from the "Pass Criteria" columns in Tables 1, 2, 3, and 4, which specify minimum DSC and maximum HD95 values. The reported device performance is represented by the "Lower Bound 95% CI" for both DSC and HD95, and the "Average Rating" for clinical applicability.
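The tables report the "Lower Bound 95% CI" of each metric rather than the raw mean. The letter does not specify the exact construction; a common choice, sketched here as an assumption, is the lower endpoint of a two-sided 95% t-interval over per-case scores:

```python
import numpy as np
from scipy import stats

def lower_bound_95ci(scores) -> float:
    """Lower endpoint of a two-sided 95% t-interval for the mean.

    One common way to compute a "lower bound 95% CI" over per-case DSC
    scores; the clearance letter does not state which construction was used.
    """
    x = np.asarray(scores, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
    t = stats.t.ppf(0.975, df=n - 1)         # two-sided 95% critical value
    return float(x.mean() - t * se)
```

Because the lower bound shrinks toward the mean as the sample grows, comparing it (rather than the mean) against the pass criterion is the more conservative test.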
Table A: Performance for Synthetic CT (sCT) Contouring Function (Derived from MR Images)
| Organ & Structure | Size | DSC Pass Criteria | HD95 Pass Criteria (mm) | Reported DSC (Lower Bound 95% CI) | Reported HD95 (Lower Bound 95% CI, mm) | Average Rating (1-5) | Meet Criteria? (DSC) | Meet Criteria? (HD95) |
|---|---|---|---|---|---|---|---|---|
| TemporalLobe_L | Medium | 0.65 | N/A | 0.886 | 4.319 (N/A criteria) | 4.5 | Yes | N/A |
| TemporalLobe_R | Medium | 0.65 | N/A | 0.878 | 4.382 (N/A criteria) | 4.6 | Yes | N/A |
| Brain | Large | 0.8 | N/A | 0.986 | 1.877 (N/A criteria) | 4.7 | Yes | N/A |
| BrainStem | Medium | 0.65 | N/A | 0.843 | 4.999 (N/A criteria) | 4.5 | Yes | N/A |
| SpinalCord | Medium | 0.65 | N/A | 0.867 | 3.030 (N/A criteria) | 4.8 | Yes | N/A |
| OpticChiasm | Small | 0.5 | N/A | 0.804 | 4.771 (N/A criteria) | 4.1 | Yes | N/A |
| OpticNerve_L | Small | 0.5 | N/A | 0.822 | 2.235 (N/A criteria) | 4.1 | Yes | N/A |
| OpticNerve_R | Small | 0.5 | N/A | 0.794 | 2.422 (N/A criteria) | 4.2 | Yes | N/A |
| InnerEar_L | Small | 0.5 | N/A | 0.843 | 2.164 (N/A criteria) | 4.2 | Yes | N/A |
| InnerEar_R | Small | 0.5 | N/A | 0.806 | 2.102 (N/A criteria) | 4.4 | Yes | N/A |
| MiddleEar_L | Small | 0.5 | N/A | 0.824 | 3.580 (N/A criteria) | 4.5 | Yes | N/A |
| MiddleEar_R | Small | 0.5 | N/A | 0.792 | 3.700 (N/A criteria) | 4.4 | Yes | N/A |
| Eye_L | Small | 0.5 | N/A | 0.906 | 1.659 (N/A criteria) | 4.8 | Yes | N/A |
| Eye_R | Small | 0.5 | N/A | 0.897 | 1.584 (N/A criteria) | 4.9 | Yes | N/A |
| Lens_L | Small | 0.5 | N/A | 0.836 | 3.368 (N/A criteria) | 4.5 | Yes | N/A |
| Lens_R | Small | 0.5 | N/A | 0.841 | 3.379 (N/A criteria) | 4.2 | Yes | N/A |
| Pituitary | Small | 0.5 | N/A | 0.801 | 2.267 (N/A criteria) | 4.4 | Yes | N/A |
| Mandible | Small | 0.5 | N/A | 0.913 | 1.844 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_L | Small | 0.5 | N/A | 0.830 | 2.819 (N/A criteria) | 4.4 | Yes | N/A |
| TMJ_R | Small | 0.5 | N/A | 0.817 | 2.722 (N/A criteria) | 4.5 | Yes | N/A |
| OralCavity | Medium | 0.65 | N/A | 0.916 | 3.677 (N/A criteria) | 4.7 | Yes | N/A |
| Larynx | Medium | 0.65 | N/A | 0.795 | 2.196 (N/A criteria) | 4.4 | Yes | N/A |
| Trachea | Medium | 0.65 | N/A | 0.870 | 2.452 (N/A criteria) | 4.5 | Yes | N/A |
| Esophagus | Medium | 0.65 | N/A | 0.800 | 2.680 (N/A criteria) | 4.7 | Yes | N/A |
| Parotid_L | Medium | 0.65 | N/A | 0.851 | 2.386 (N/A criteria) | 4.6 | Yes | N/A |
| Parotid_R | Medium | 0.65 | N/A | 0.868 | 2.328 (N/A criteria) | 4.6 | Yes | N/A |
| Submandibular_L | Medium | 0.65 | N/A | 0.833 | 4.920 (N/A criteria) | 4.5 | Yes | N/A |
| Submandibular_R | Medium | 0.65 | N/A | 0.783 | 2.348 (N/A criteria) | 4.3 | Yes | N/A |
| Thyroid | Medium | 0.65 | N/A | 0.803 | 1.911 (N/A criteria) | 4.8 | Yes | N/A |
| BrachialPlexus_L | Medium | 0.65 | N/A | 0.828 | 5.347 (N/A criteria) | 4.4 | Yes | N/A |
| BrachialPlexus_R | Medium | 0.65 | N/A | 0.800 | 5.062 (N/A criteria) | 4.3 | Yes | N/A |
| Lung_L | Large | 0.8 | N/A | 0.968 | 1.635 (N/A criteria) | 4.5 | Yes | N/A |
| Lung_R | Large | 0.8 | N/A | 0.976 | 1.516 (N/A criteria) | 4.7 | Yes | N/A |
| Heart | Large | 0.8 | N/A | 0.959 | 2.496 (N/A criteria) | 4.5 | Yes | N/A |
| Liver | Large | 0.8 | N/A | 0.941 | 2.439 (N/A criteria) | 4.0 | Yes | N/A |
| Kidney_L | Large | 0.8 | N/A | 0.892 | 2.748 (N/A criteria) | 4.7 | Yes | N/A |
| Kidney_R | Large | 0.8 | N/A | 0.895 | 2.797 (N/A criteria) | 4.5 | Yes | N/A |
| Stomach | Large | 0.8 | N/A | 0.782 | 4.754 (N/A criteria) | 4.1 | No* | N/A |
| Pancreas | Medium | 0.65 | N/A | 0.827 | 6.271 (N/A criteria) | 4.0 | Yes | N/A |
| Duodenum | Medium | 0.65 | N/A | 0.815 | 6.447 (N/A criteria) | 4.1 | Yes | N/A |
| Rectum | Medium | 0.65 | N/A | 0.796 | 2.047 (N/A criteria) | 3.9 | Yes | N/A |
| BowelBag | Large | 0.8 | N/A | 0.808 | 7.380 (N/A criteria) | 4.0 | Yes | N/A |
| Bladder | Large | 0.8 | N/A | 0.943 | 2.082 (N/A criteria) | 4.5 | Yes | N/A |
| Marrow | Large | 0.8 | N/A | 0.889 | 1.842 (N/A criteria) | 4.6 | Yes | N/A |
| FemurHead_L | Medium | 0.65 | N/A | 0.950 | 2.261 (N/A criteria) | 4.5 | Yes | N/A |
| FemurHead_R | Medium | 0.65 | N/A | 0.941 | 2.466 (N/A criteria) | 4.6 | Yes | N/A |
*Note: For Stomach, the reported DSC (0.782) is below the pass criteria (0.8). However, the document states, "The results indicate that the auto-segmentation performance of the AccuContour system for sCT images derived from both CBCT and MR modalities meets the requirements for geometric accuracy." This suggests there might be an overall or combined assessment, or other factors led to acceptance despite this single instance. The average clinical rating is 4.1, which is above the threshold of 3.
Table B: Performance for Synthetic CT (sCT) Contouring Function (Derived from CBCT Images)
| Organ & Structure | Size | DSC Pass Criteria | HD95 Pass Criteria (mm) | Reported DSC (Lower Bound 95% CI) | Reported HD95 (Lower Bound 95% CI, mm) | Average Rating (1-5) | Meet Criteria? (DSC) | Meet Criteria? (HD95) |
|---|---|---|---|---|---|---|---|---|
| TemporalLobe_L | Medium | 0.65 | N/A | 0.854 | 3.451 (N/A criteria) | 4.8 | Yes | N/A |
| TemporalLobe_R | Medium | 0.65 | N/A | 0.859 | 3.258 (N/A criteria) | 4.6 | Yes | N/A |
| Brain | Large | 0.8 | N/A | 0.986 | 1.804 (N/A criteria) | 4.7 | Yes | N/A |
| BrainStem | Medium | 0.65 | N/A | 0.903 | 4.678 (N/A criteria) | 4.5 | Yes | N/A |
| SpinalCord | Medium | 0.65 | N/A | 0.869 | 2.088 (N/A criteria) | 4.8 | Yes | N/A |
| OpticChiasm | Small | 0.5 | N/A | 0.795 | 5.252 (N/A criteria) | 4.4 | Yes | N/A |
| OpticNerve_L | Small | 0.5 | N/A | 0.815 | 2.373 (N/A criteria) | 4.2 | Yes | N/A |
| OpticNerve_R | Small | 0.5 | N/A | 0.816 | 2.210 (N/A criteria) | 4.1 | Yes | N/A |
| InnerEar_L | Small | 0.5 | N/A | 0.800 | 2.144 (N/A criteria) | 4.5 | Yes | N/A |
| InnerEar_R | Small | 0.5 | N/A | 0.794 | 2.171 (N/A criteria) | 4.2 | Yes | N/A |
| MiddleEar_L | Small | 0.5 | N/A | 0.800 | 3.301 (N/A criteria) | 4.5 | Yes | N/A |
| MiddleEar_R | Small | 0.5 | N/A | 0.797 | 3.888 (N/A criteria) | 4.5 | Yes | N/A |
| Eye_L | Small | 0.5 | N/A | 0.944 | 1.553 (N/A criteria) | 4.8 | Yes | N/A |
| Eye_R | Small | 0.5 | N/A | 0.941 | 1.678 (N/A criteria) | 4.9 | Yes | N/A |
| Lens_L | Small | 0.5 | N/A | 0.820 | 3.532 (N/A criteria) | 4.5 | Yes | N/A |
| Lens_R | Small | 0.5 | N/A | 0.821 | 3.370 (N/A criteria) | 4.7 | Yes | N/A |
| Pituitary | Small | 0.5 | N/A | 0.802 | 2.496 (N/A criteria) | 4.4 | Yes | N/A |
| Mandible | Small | 0.5 | N/A | 0.870 | 2.227 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_L | Small | 0.5 | N/A | 0.774 | 2.775 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_R | Small | 0.5 | N/A | 0.800 | 2.791 (N/A criteria) | 4.5 | Yes | N/A |
| OralCavity | Medium | 0.65 | N/A | 0.885 | 3.794 (N/A criteria) | 4.8 | Yes | N/A |
| Larynx | Medium | 0.65 | N/A | 0.793 | 2.827 (N/A criteria) | 4.8 | Yes | N/A |
| Trachea | Medium | 0.65 | N/A | 0.873 | 2.545 (N/A criteria) | 4.5 | Yes | N/A |
| Esophagus | Medium | 0.65 | N/A | 0.800 | 2.811 (N/A criteria) | 4.5 | Yes | N/A |
| Parotid_L | Medium | 0.65 | N/A | 0.891 | 2.415 (N/A criteria) | 4.6 | Yes | N/A |
| Parotid_R | Medium | 0.65 | N/A | 0.894 | 2.525 (N/A criteria) | 4.6 | Yes | N/A |
| Submandibular_L | Medium | 0.65 | N/A | 0.745 | 5.026 (N/A criteria) | 4.8 | Yes | N/A |
| Submandibular_R | Medium | 0.65 | N/A | 0.797 | 2.192 (N/A criteria) | 4.7 | Yes | N/A |
| Thyroid | Medium | 0.65 | N/A | 0.823 | 2.182 (N/A criteria) | 4.8 | Yes | N/A |
| BrachialPlexus_L | Medium | 0.65 | N/A | 0.805 | 3.922 (N/A criteria) | 4.4 | Yes | N/A |
| BrachialPlexus_R | Medium | 0.65 | N/A | 0.823 | 3.529 (N/A criteria) | 4.2 | Yes | N/A |
| Lung_L | Large | 0.8 | N/A | 0.947 | 1.587 (N/A criteria) | 4.5 | Yes | N/A |
| Lung_R | Large | 0.8 | N/A | 0.971 | 1.635 (N/A criteria) | 4.3 | Yes | N/A |
| Heart | Large | 0.8 | N/A | 0.896 | 1.823 (N/A criteria) | 4.5 | Yes | N/A |
| Liver | Large | 0.8 | N/A | 0.914 | 2.595 (N/A criteria) | 4.6 | Yes | N/A |
| Kidney_L | Large | 0.8 | N/A | 0.922 | 2.645 (N/A criteria) | 4.7 | Yes | N/A |
| Kidney_R | Large | 0.8 | N/A | 0.906 | 2.611 (N/A criteria) | 4.5 | Yes | N/A |
| Stomach | Large | 0.8 | N/A | 0.858 | 4.681 (N/A criteria) | 4.2 | Yes | N/A |
| Pancreas | Medium | 0.65 | N/A | 0.822 | 5.548 (N/A criteria) | 4.4 | Yes | N/A |
| Duodenum | Medium | 0.65 | N/A | 0.818 | 5.252 (N/A criteria) | 4.1 | Yes | N/A |
| Rectum | Medium | 0.65 | N/A | 0.797 | 4.253 (N/A criteria) | 4.3 | Yes | N/A |
| BowelBag | Large | 0.8 | N/A | 0.850 | 5.028 (N/A criteria) | 4.0 | Yes | N/A |
| Bladder | Large | 0.8 | N/A | 0.926 | 3.322 (N/A criteria) | 4.7 | Yes | N/A |
| Marrow | Large | 0.8 | N/A | 0.837 | 2.148 (N/A criteria) | 4.7 | Yes | N/A |
| FemurHead_L | Medium | 0.65 | N/A | 0.893 | 1.639 (N/A criteria) | 4.8 | Yes | N/A |
| FemurHead_R | Medium | 0.65 | N/A | 0.927 | 1.807 (N/A criteria) | 4.9 | Yes | N/A |
Table C: Performance for 4DCT Registration Function (Rigid Registration)
| Organ & Structure | Size | DSC Pass Criteria | Reported DSC (Lower Bound 95% CI) | Average Rating (1-5) | Meet Criteria? |
|---|---|---|---|---|---|
| Trachea | Medium | 0.65 | 0.888 | 4.5 | Yes |
| Esophagus | Medium | 0.65 | 0.836 | 4.5 | Yes |
| Lung_L | Large | 0.8 | 0.932 | 4.7 | Yes |
| Lung_R | Large | 0.8 | 0.929 | 4.8 | Yes |
| Lung_All | Large | 0.8 | 0.930 | 4.8 | Yes |
| Heart | Large | 0.8 | 0.917 | 4.6 | Yes |
| SpinalCord | Medium | 0.65 | 0.943 | 4.6 | Yes |
| Liver | Large | 0.8 | 0.888 | 4.6 | Yes |
| Stomach | Large | 0.8 | 0.791 | 4.5 | No* |
| A_Aorta | Large | 0.8 | 0.917 | 4.4 | Yes |
| Spleen | Large | 0.8 | 0.786 | 4.5 | No* |
| Body | Large | 0.8 | 0.995 | 4.9 | Yes |
*Note: For Stomach (0.791) and Spleen (0.786), the reported DSC is below the pass criteria (0.8). However, the document states, "According to the results, the accuracy of 4DCT image registration images meets the requirements and all structure models demonstrating that only minor edits would be required in order to make the structure models acceptable for clinical use." The average clinical rating for both is 4.5, above the threshold of 3.
Table D: Performance for 4DCT Registration Function (Deformable Registration)
| Organ & Structure | Size | DSC Pass Criteria | Reported DSC (Lower Bound 95% CI) | Average Rating (1-5) | Meet Criteria? |
|---|---|---|---|---|---|
| Trachea | Medium | 0.65 | 0.940 | 4.7 | Yes |
| Esophagus | Medium | 0.65 | 0.866 | 4.6 | Yes |
| Lung_L | Large | 0.8 | 0.966 | 4.7 | Yes |
| Lung_R | Large | 0.8 | 0.949 | 4.5 | Yes |
| Lung_All | Large | 0.8 | 0.954 | 4.8 | Yes |
| Heart | Large | 0.8 | 0.931 | 4.6 | Yes |
| SpinalCord | Medium | 0.65 | 0.920 | 4.6 | Yes |
| Liver | Large | 0.8 | 0.936 | 4.5 | Yes |
| Stomach | Large | 0.8 | 0.889 | 4.5 | Yes |
| A_Aorta | Large | 0.8 | 0.947 | 4.6 | Yes |
| Spleen | Large | 0.8 | 0.913 | 4.8 | Yes |
| Body | Large | 0.8 | 0.997 | 4.9 | Yes |
2. Sample Size Used for the Test Set and Data Provenance
- Synthetic CT (sCT) Contouring Function:
- Sample Size: 247 synthetic CT images (116 generated from MR, 131 generated from CBCT).
- Data Provenance:
- Demographic Distribution: 57% male, 43% female. Age distribution: 13% (21-40), 44.1% (41-60), 36.8% (61-80), 6.1% (81-100). Race: 78% White, 12% Black or African American, 10% Others.
- Imaging Equipment: MR images from GE (21.6%), Philips (56.9%), Siemens (21.6%). CBCT images from Varian (58.8%), Elekta (41.2%).
- Retrospective/Prospective: Not explicitly stated, but the description of demographic and equipment distribution from a "sample" indicates retrospective data collection from existing patient records.
- Country of Origin: The racial distribution explicitly mentions "U.S. clinical radiotherapy practice," suggesting the data is primarily from the United States.
- 4DCT Registration Function:
- Sample Size: 30 4DCT image sets.
- Data Provenance:
- Imaging Equipment: Siemens (90.0%), Philips (10.0%) scanners.
- Demographic Distribution: 17 males (56.7%), 13 females (43.3%). Age: 33-82 years, with majority in 51-65 (40.0%) and 66-80 (43.3%) year brackets.
- Image Characteristics: Uniform 3mm slice thickness (100%).
- Sourcing Location: Most images (90.0%) from Drexel Town Square Health Center/Community Memorial Hospital, remainder from Froedtert Hospital.
- Retrospective/Prospective: Not explicitly stated, but implies retrospective data from patient archives of the mentioned hospitals.
- Country of Origin: Based on the hospital names (Drexel Town Square Health Center, Community Memorial Hospital, Froedtert Hospital), the data is from the United States.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Not explicitly stated. The text mentions "clinical experts evaluate the clinical applicability" and "RTStruct contoured by the professional physician as the gold standard." This implies at least one, and likely multiple, qualified medical professionals.
- Qualifications of Experts: The experts are described as "clinical experts" and "professional physician(s)." Their specific qualifications (e.g., "radiologist with 10 years of experience") are not provided. They are implied to be clinically qualified radiotherapy personnel.
4. Adjudication Method for the Test Set
- Adjudication Method: Not explicitly stated. The ground truth for segmentation is stated to be "RTStruct contoured by the professional physician". For clinical applicability, "clinical experts evaluate the clinical applicability" and assign a 1-5 scale score. This suggests a single expert (or group consensus without specific adjudication rules like 2+1) established the ground truth segmentation, and separate clinical experts evaluated the results. There is no mention of a formal adjudication process for disagreements in ground truth labeling if multiple experts were involved in its creation.
5. Multi Reader Multi Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No.
- Effect Size of Human Improvement (if applicable): Not applicable, as no MRMC study comparing human readers with and without AI assistance was reported. The testing focused solely on the algorithm's performance against expert-generated ground truth and expert evaluation of the algorithm's output.
6. Standalone Performance
- Was a standalone performance study done? Yes. The entire report details the "Performance Test Report on Synthetic CT (sCT) Contouring Function" and "Performance Test Report on 4DCT Registration Function," measuring the algorithm's performance (DSC, HD95) against gold standard contours and qualitative evaluation by clinical experts. This reflects the algorithm's performance independent of human interaction during the contouring process.
7. Type of Ground Truth Used
- Ground Truth: For the synthetic CT contouring and 4DCT registration functions, the ground truth was "RTStruct contoured by the professional physician" (i.e., expert consensus or expert-generated contours).
8. Sample Size for the Training Set
- Training Set Sample Size: Not provided in the document.
9. How the Ground Truth for the Training Set was Established
- Training Set Ground Truth Establishment: Not provided in the document. The document only details the ground truth used for the validation/test set.
(59 days)
AV Vascular is indicated to assist users in the visualization, assessment and quantification of vascular anatomy on CTA and/or MRA datasets, in order to assess patients with suspected or diagnosed vascular pathology and to assist with pre-procedural planning of endovascular interventions.
AV Vascular is a post-processing software application intended for visualization, assessment, and quantification of vessels in computed tomography angiography (CTA) and magnetic resonance angiography (MRA) data with a unified workflow for both modalities.
AV Vascular includes the following functions:
- Advanced visualization: the application provides all relevant views and interactions for CTA and MRA image review: 2D slices, MIP, MPR, curved MPR (cMPR), stretched MPR (sMPR), path-aligned views (cross-sectional and longitudinal MPRs), 3D volume rendering (VR).
- Vessel segmentation: automatic bone removal and vessel segmentation for head/neck and body CTA data, automatic vessel centerline, lumen and outer wall extraction and labeling for the main branches of the vascular anatomy in head/neck and body CTA data, semi-automatic and manual creation of vessel centerline and lumen for CTA and MRA data, interactive two-point vessel centerline extraction and single-point centerline extension.
- Vessel inspection: enable inspection of an entire vessel using the cMPR or sMPR views as well as inspection of a vessel locally using vessel-aligned views (cross-sectional and longitudinal MPRs) by selecting a position along a vessel of interest.
- Measurements: ability to create and save measurements of vessel and lumen inner and outer diameters and area, as well as vessel length and angle measurements.
- Measurements and tools that specifically support pre-procedural planning: manual and automatic ring marker placement for specific anatomical locations, length measurements of the longest and shortest curve along the aortic lumen contour, angle measurements of aortic branches in clock-position style, saving viewing angles in C-arm notation, and configurable templates.
- Saving and export: saving and export of batch series and customizable reports.
This summarization is based on the provided 510(k) clearance letter for Philips Medical Systems' AV Vascular device.
Acceptance Criteria and Device Performance for Aorto-iliac Outer Wall Segmentation
| Metrics | Acceptance Criteria | Reported Device Performance (Mean with 98.75% confidence intervals) |
|---|---|---|
| 3D Dice Similarity Coefficient (DSC) | > 0.9 | 0.96 (0.96, 0.97) |
| 2D Dice Similarity Coefficient (DSC) | > 0.9 | 0.96 (0.95, 0.96) |
| Mean Surface Distance (MSD) | < 1.0 mm | 0.57 mm (0.485, 0.68) |
| Hausdorff Distance (HD) | < 3.0 mm | 1.68 mm (1.23, 2.08) |
| ∆Dmin (difference in minimum diameter) | > 95% of measurements with \|∆Dmin\| < 5 mm | 98.8% (98.3%-99.2%) |
| ∆Dmax (difference in maximum diameter) | > 95% of measurements with \|∆Dmax\| < 5 mm | 98.5% (97.9%-98.9%) |
The reported device performance for all primary and secondary metrics meets the predefined acceptance criteria.
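The ∆Dmin/∆Dmax criteria are framed as the fraction of paired diameter measurements falling within a 5 mm tolerance of the reference standard. A minimal sketch of that acceptance computation (function name and values are illustrative, not from the submission):

```python
import numpy as np

def pct_within_tolerance(device_mm, reference_mm, tol_mm=5.0):
    """Percentage of paired measurements with |device - reference| < tol_mm."""
    diff = np.abs(np.asarray(device_mm, dtype=float)
                  - np.asarray(reference_mm, dtype=float))
    return 100.0 * float(np.mean(diff < tol_mm))

# Illustrative values only: minimum-diameter measurements in mm.
pct = pct_within_tolerance([22.0, 18.5, 30.1], [21.0, 25.0, 30.4])
```

The acceptance test then checks whether the resulting percentage (here, the lower bound of its confidence interval) exceeds the 95% goal.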
Study Details for Aorto-iliac Outer Wall Segmentation Validation
1. Sample Size used for the Test Set and Data Provenance:
- Sample Size: 80 patients
- Data Provenance: Retrospectively collected from 7 clinical sites in the US, 3 European hospitals, and one hospital in Asia.
- Independence from Training Data: All performance testing datasets were acquired from clinical sites distinct from those which provided the algorithm training data. The algorithm developers had no access to the testing data, ensuring complete independence.
- Patient Characteristics: At least 80% of patients had thoracic and/or abdominal aortic diseases and/or iliac artery diseases (e.g., thoracic/abdominal aortic aneurysm, ectasia, dissection, and stenosis). At least 20% had been treated with stents.
- Demographics:
- Geographics: North America: 58 (72.5%), Europe: 3 (3.75%), Asia: 19 (23.75%)
- Sex: Male: 59 (73.75%), Female: 21 (26.25%)
- Age (years): 21-50: 2 (2.50%), 51-70: 31 (38.75%), >71: 45 (56.25%), Not available: 2 (2.5%)
2. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:
- Number of Experts: Three
- Qualifications: US-board certified radiologists.
3. Adjudication Method for the Test Set:
- The three US-board certified radiologists independently performed manual contouring of the outer wall along the aorta and iliac arteries on cross-sectional planes for each CT angiographic image.
- After quality control, these three aortic and iliac arterial outer wall contours were averaged to serve as the reference standard contour. This can be considered a form of consensus/averaging after independent readings.
4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- The provided document does not indicate that a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done to measure human reader improvement with AI assistance. The study focused on the standalone performance of the AI algorithm compared to an expert-derived ground truth.
5. Standalone (Algorithm Only Without Human-in-the-Loop Performance):
- Yes, the performance data provided specifically describes the standalone performance of the AI-based algorithm for aorto-iliac outer wall segmentation. The algorithm's output was compared directly against the reference standard without human intervention in the segmentation process.
Type of Ground Truth Used:
- Expert Consensus/Averaging: The ground truth was established by averaging the independent manual contouring performed by three US-board certified radiologists.
Sample Size for the Training Set:
- The document states that the testing data were independent of the training data and that developers had no access to the testing data. However, the exact sample size for the training set is not specified in the provided text.
How the Ground Truth for the Training Set Was Established:
- The document implies that training data were used, but it does not describe how the ground truth for the training set was established. It only ensures that the testing data did not come from the same clinical sites as the training data and that algorithm developers had no access to the testing data.
(122 days)
AI-Rad Companion Brain MR is a post-processing image analysis software that assists clinicians in viewing, analyzing, and evaluating MR brain images.
AI-Rad Companion Brain MR provides the following functionalities:
• Automated segmentation and quantitative analysis of individual brain structures and white matter hyperintensities
• Quantitative comparison of each brain structure with normative data from a healthy population
• Presentation of results for reporting that includes all numerical values as well as visualization of these results
AI-Rad Companion Brain MR runs two distinct, independent algorithms: one for brain morphometry analysis and one for White Matter Hyperintensities (WMH) segmentation. Overall, the device comprises four main algorithmic features:
• Brain Morphometry
• Brain Morphometry follow-up
• White Matter Hyperintensities (WMH)
• White Matter Hyperintensities (WMH) follow-up
The Brain Morphometry feature has been available since the first version of the device (VA2x), segmentation of White Matter Hyperintensities was added in VA4x, and the follow-up analysis for both has been available since VA5x. The brain morphometry and brain morphometry follow-up features have not been modified and remain identical to the previous VA5x mainline version.
AI-Rad Companion Brain MR VA60 is an enhancement to the predicate, AI-Rad Companion Brain MR VA50 (K232305). Just as in the predicate, the brain morphometry feature of AI-Rad Companion Brain MR addresses the automatic quantification and visual assessment of the volumetric properties of various brain structures based on T1 MPRAGE datasets. From a predefined list of brain structures (e.g. Hippocampus, Caudate, Left Frontal Gray Matter, etc.) volumetric properties are calculated as absolute and normalized volumes with respect to the total intracranial volume. The normalized values are compared against age-matched mean and standard deviations obtained from a population of healthy reference subjects. The deviation from this reference population can be visualized as 3D overlay map or out-of-range flag next to the quantitative values.
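The normalization-and-comparison logic described above can be sketched as follows. The 2-SD out-of-range cutoff and the reference numbers are illustrative assumptions, not values from the clearance letter:

```python
def morphometry_flag(volume_ml, tiv_ml, ref_mean_pct, ref_sd_pct, z_cut=2.0):
    """Normalize a brain-structure volume by total intracranial volume (TIV),
    compare against an age-matched reference mean/SD (in % of TIV), and
    return an out-of-range flag. The 2-SD cutoff is an illustrative
    assumption, not the device's documented threshold."""
    norm_pct = 100.0 * volume_ml / tiv_ml          # volume as % of TIV
    z = (norm_pct - ref_mean_pct) / ref_sd_pct     # deviation from reference
    return norm_pct, z, abs(z) > z_cut

# e.g. hippocampus 3.2 mL, TIV 1400 mL, reference 0.26% +/- 0.02% (made-up numbers)
norm, z, flagged = morphometry_flag(3.2, 1400.0, 0.26, 0.02)
```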
Additionally, identical to the predicate, the white matter hyperintensities feature addresses the automatic quantification and visual assessment of white matter hyperintensities on the basis of T1 MPRAGE and T2 weighted FLAIR datasets. The detected WMH can be visualized as a 3D overlay map and the quantification in count and volume as per 4 brain regions in the report.
Here's a structured overview of the acceptance criteria and study details for the AI-Rad Companion Brain MR, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Reported Device Performance
| Acceptance Criteria | Reported Device Performance (AI-Rad Companion Brain MR WMH Feature) | Reported Device Performance (AI-Rad Companion Brain MR WMH Follow-up Feature) |
|---|---|---|
| WMH Segmentation Accuracy | Pearson correlation coefficient between WMH volumes and ground truth annotation: 0.96; interclass correlation coefficient between WMH volumes and ground truth annotation: 0.94; Dice score: 0.60; F1-score: 0.67. Dice scores for WMH segmentation: mean 0.60, median 0.62, STD 0.14, 95% CI [0.57, 0.63]. ASSD scores for WMH segmentation: mean 0.05, median 0.00, STD 0.15, 95% CI [0.02, 0.08] | |
| New or Enlarged WMH Segmentation Accuracy (Follow-up) | | Pearson correlation coefficient between new or enlarged WMH volumes and ground truth annotation: 0.76; average Dice score: 0.59; average F1-score: 0.71. Dice scores for new/enlarged WMH segmentation by vendor: Siemens mean 0.64, median 0.67, STD 0.15, 95% CI [0.60, 0.69]; GE mean 0.56, median 0.60, STD 0.14, 95% CI [0.51, 0.61]; Philips mean 0.55, median 0.59, STD 0.16, 95% CI [0.50, 0.61]. ASSD scores by vendor: Siemens mean 0.02, median 0.00, STD 0.06, 95% CI [0.00, 0.04]; GE mean 0.09, median 0.01, STD 0.23, 95% CI [0.03, 0.19]; Philips mean 0.04, median 0.00, STD 0.11, 95% CI [0.00, 0.08] |
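As a rough sketch of the two headline metric types, per-case Dice measures voxelwise overlap between the device's WMH mask and the annotated mask, while Pearson correlation compares per-case WMH volumes across the cohort. This is illustrative, not the submission's evaluation code:

```python
def dice(pred, truth):
    """Dice overlap for two binary masks given as flat 0/1 sequences."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0

def pearson(xs, ys):
    """Pearson correlation of paired per-case volume measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

d = dice([0, 1, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0])  # 2*2 / (3+3)
r = pearson([1.0, 2.0, 3.0], [2.1, 3.9, 6.0])     # near-linear toy volumes
```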
Study Details
Sample Size Used for the Test Set and Data Provenance:
- White Matter Hyperintensities (WMH) Feature: 100 subjects (Multiple Sclerosis patients (MS), Alzheimer's patients (AD), cognitive impaired (CI), and healthy controls (HC)).
- White Matter Hyperintensities (WMH) Follow-up Feature: 165 subjects (Multiple Sclerosis patients (MS) and Alzheimer's patients (AD)).
- Data Provenance: Data acquired from Siemens, GE, and Philips scanners. Testing data had a balanced distribution with respect to patient gender and age according to the target patient population, and field strength (1.5T and 3T). This indicates a retrospective, multi-vendor dataset; the countries of origin are not stated.
Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications:
- Number of Experts: Three radiologists.
- Qualifications: Not explicitly stated beyond "radiologists." It is not specified if they are board-certified, or their years of experience.
Adjudication Method for the Test Set:
- For each dataset, three sets of ground truth annotations were created manually.
- Each set was annotated by a disjoint group consisting of an annotator, a reviewer, and a clinical expert.
- The clinical expert was randomly assigned per case to minimize annotation bias.
- The clinical expert reviewed and corrected the initial annotation of the changed WMH areas according to a specified annotation protocol. Significant corrections led to re-communication with the annotator and re-review.
- This suggests an annotate-review-correct workflow: each of the three annotation sets was produced by an annotator, checked by a reviewer, and corrected by a clinical expert, rather than a simple numeric (e.g., 2+1) adjudication scheme.
If a Multi Reader Multi Case (MRMC) Comparative Effectiveness Study Was Done:
- No, an MRMC comparative effectiveness study comparing human readers with and without AI assistance was not done. The study focuses on the standalone performance of the AI algorithm against expert ground truth.
If a Standalone (i.e. algorithm only without human-in-the loop performance) Was Done:
- Yes, a standalone performance study was done. The "Accuracy was validated by comparing the results of the device to manual annotated ground truth from three radiologists." This evaluates the algorithm's performance directly.
The Type of Ground Truth Used:
- Expert Consensus / Manual Annotation: The ground truth for both WMH and WMH follow-up features was established through "manual annotated ground truth from three radiologists" and involved a "standard annotation process" with annotators, reviewers, and clinical experts.
The Sample Size for the Training Set:
- The document states that the "training data used for the fine tuning the hyper parameters of WMH follow-up algorithm is independent of the data used to test the white matter hyperintensity algorithm follow up algorithm." However, the specific sample size for the training set is not provided in the given text.
How the Ground Truth for the Training Set Was Established:
- The document implies that the WMH follow-up algorithm "does not include any machine learning/ deep learning component," suggesting a rule-based or conventional image processing algorithm. Therefore, "training" might refer to parameter tuning rather than machine learning model training.
- For the "fine-tuning the hyper parameters of WMH follow-up algorithm," the ground truth establishment method for this training data is not explicitly detailed in the provided text. It only states that this data was "independent of the data used to test" the algorithm.
(206 days)
qXR-Detect is a computer-assisted detection (CADe) software device that analyzes chest radiographs and highlights suspicious regions of interest (ROIs). The device is intended to identify, highlight, and categorize suspicious regions of interest (ROI). Any suspicious ROI is highlighted by qXR-Detect and categorized into one of six categories (lung, pleura, bone, Mediastinum & Hila & Heart, hardware and other). The device is intended for use as a concurrent reading aid. qXR-Detect is indicated for adults only.
qXR-Detect is an adjunct tool and is not intended to replace a clinician's review of the radiograph or his/her clinical judgment. Users must not use the qXR-Detect generated output as the primary interpretation.
The qXR-Detect device is intended to generate a secondary digital radiographic image that facilitates the confirmation of the presence of suspicious region of interest within the categories on a chest X-Ray.
The software works with DICOM chest X-ray images and can be deployed on a secure cloud server. De-identified chest X-rays are sent to qXR-Detect via HTTPS from the client's software. Results are fetched by the client's software and then forwarded to their PACS or any other systems including but not limited to specified radiology software database once the processing is complete or to the console of the digital radiographic processing system.
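The "de-identified chest X-rays are sent ... via HTTPS" step implies stripping patient-identifying attributes before a study leaves the client's network. The sketch below is a hypothetical simplification: it filters a plain dict of DICOM keywords, whereas a real deployment would apply a full DICOM de-identification profile to actual DICOM datasets.

```python
# Hypothetical subset of patient-identifying DICOM attribute keywords;
# a production system would use a complete de-identification profile.
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "PatientAddress", "ReferringPhysicianName", "InstitutionName",
}

def deidentify(header: dict) -> dict:
    """Return a copy of a DICOM-header dict with PHI attributes removed."""
    return {k: v for k, v in header.items() if k not in PHI_KEYWORDS}

header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "CR",
    "BodyPartExamined": "CHEST",
}
clean = deidentify(header)  # safe to transmit over HTTPS
```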
Here's a breakdown of the acceptance criteria and the study details for the qXR-Detect device, based on the provided FDA clearance letter:
Acceptance Criteria and Reported Device Performance
Device Name: qXR-Detect
Type: Computer-assisted detection (CADe) software device
| Category | Acceptance Criteria (Standalone Performance) - AUC (95% CI) | Reported Device Performance (Standalone Performance) - AUC (95% CI) |
|---|---|---|
| Lung | Not explicitly stated as a minimum threshold in the provided text, but implied as satisfactory performance. | 0.893 (0.879-0.907) |
| Pleura | Not explicitly stated | 0.95 (0.94-0.96) |
| Mediastinum/Hila | Not explicitly stated | 0.891 (0.875-0.907) |
| Bone | Not explicitly stated | 0.879 (0.854-0.905) |
| Hardware | Not explicitly stated | 0.958 (0.95-0.966) |
| Other | Not explicitly stated | 0.915 (0.895-0.935) |
| Category | Acceptance Criteria (Standalone Performance) - wAFROC (95% CI) | Reported Device Performance (Standalone Performance) - wAFROC (95% CI) |
|---|---|---|
| Lung | Implied that wAFROC should be above 0.8 for most categories | 0.831 (0.816-0.846) |
| Pleura | Implied that wAFROC should be above 0.8 for most categories | 0.89 (0.875-0.905) |
| Mediastinum & Hila & Heart | Implied that wAFROC should be above 0.8 for most categories | 0.867 (0.85-0.883) |
| Bone | Implied that wAFROC should be above 0.8 for most categories | 0.821 (0.789-0.852) |
| Hardware | Implied that wAFROC should be above 0.8 for most categories | 0.771 (0.759-0.782) |
| Others | Implied that wAFROC should be above 0.8 for most categories | 0.871 (0.845-0.897) |
| Aggregate | Implied that wAFROC should be above 0.8 for most categories | 0.839 (0.824-0.854) |
| Category | Acceptance Criteria (Clinical Performance) - wAFROC Improvement | Reported Device Performance (Clinical Performance) - wAFROC Improvement |
|---|---|---|
| Overall wAFROC | Not explicitly stated as a minimum threshold, but statistical significance (P value < 0.001) and improvement was targeted. | Improved from 0.6894 (unaided) to 0.7505 (aided), an improvement of 0.0611 (P < 0.001). |
| All Categories | Improvement expected | All categories showed improvement. |
| False Positives per Image | Reduction expected | Reduced from 0.4182 (unaided) to 0.3300 (aided). |
| Category | Acceptance Criteria (Clinical Performance) - AUC Improvement | Reported Device Performance (Clinical Performance) - AUC Improvement |
|---|---|---|
| Overall AUC | Not explicitly stated as a minimum threshold, but statistical significance and improvement was targeted. | Improved from 0.8466 (unaided) to 0.8720 (aided). |
| All Categories | Improvement expected | All categories showed improvement. |
Study Details for Device Performance
1. Acceptance Criteria and Reported Device Performance (See table above)
2. Sample size used for the test set and the data provenance:
- Standalone Performance Study Test Set: The exact sample size for the standalone test set is not explicitly given, but the text states "Most of the scans for the study were obtained from across the US spanning 40 states and 5 regions in the US."
- Clinical Performance Study Test Set: 301 samples were used.
- Data Provenance (Clinical Performance Test Set): Not explicitly stated, but given the training data provenance and the testing context, it is likely that the clinical test set also included data from across the US (40 states and 5 regions). The data was retrospective, as it was described as a "multireader multicase study conducted on 301 samples."
- Data Characteristics: Well-balanced in terms of gender (approx. 50-50 male-to-female distribution). Age distribution from 22 to over 85 years.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Standalone Performance Study: 3 ground truthers annotated the chest X-ray scans. Their specific qualifications are not detailed beyond "ground truthers," but it's implied they are experts in medical image interpretation for the purpose of establishing ground truth for suspicious ROIs.
- Clinical Performance Study: The ground truth for the clinical study was established by the same process as the standalone study (3 ground truthers), though it's not explicitly re-stated in the clinical study section. The "readers" in the clinical study (who read images with and without AI assistance) were 18 professionals including radiologists, ER physicians, and family medicine practitioners. Their years of experience are not specified.
4. Adjudication method for the test set:
- Standalone Performance Study (Ground Truth Establishment): The method described implies a consensus-based approach, though not a specific numerical adjudication. "If there is at least one ground truth boundary for a particular category, the scan is considered to be positive for that category." This suggests that even a single expert identifying a boundary contributed to the ground truth, rather than requiring a majority consensus in all cases for each boundary. However, the overarching "ground truth established by 3 ground truthers" suggests collective expert input. A more precise adjudication method (e.g., 2-out-of-3 majority) is not explicitly stated for individual boundary decisions.
- Clinical Performance Study: The ground truth for judging reader performance against was the same as established for the standalone study. The comparison of reader performance (unaided vs. aided) implicitly uses this established ground truth.
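The quoted positivity rule ("at least one ground truth boundary ... positive for that category") amounts to a union over readers. A sketch with hypothetical annotation records:

```python
def scan_positive(annotations, category):
    """A scan is positive for a category if any ground truther drew at
    least one boundary of that category (sketch of the quoted rule,
    not the submission's code)."""
    return any(
        box["category"] == category
        for reader in annotations
        for box in reader
    )

# Hypothetical per-reader boundary lists for one scan
readers = [
    [{"category": "lung", "bbox": (10, 10, 40, 40)}],   # ground truther 1
    [],                                                 # ground truther 2
    [{"category": "pleura", "bbox": (5, 60, 30, 90)}],  # ground truther 3
]
```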
5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- Yes, an MRMC comparative effectiveness study was done.
- Effect Size of Improvement:
- Overall wAFROC: Improved by 0.0611 (from 0.6894 unaided to 0.7505 aided). This improvement was statistically significant (P < 0.001).
- False Positives per image: Reduced from 0.4182 (unaided) to 0.3300 (aided).
- Overall AUC: Improved from 0.8466 (unaided) to 0.8720 (aided).
- Individual Reader Improvement: 17 out of 18 readers showed improvement in wAFROC-FOM across all categories. All 18 readers improved in detecting and localizing suspicious lung ROIs.
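For reference, AUC figures like those quoted above are typically the empirical (Mann-Whitney) statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted half. A minimal sketch:

```python
def auc(labels, scores):
    """Empirical AUC: fraction of positive/negative pairs in which the
    positive case outranks the negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```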
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance study was done. The results are presented in "Table 2 Standalone Performance Testing Results for qXR-Detect" (AUC metrics) and "Table 3 Standalone Performance Testing Results for localization - qXR-Detect" (wAFROC metrics).
7. The type of ground truth used:
- The ground truth was established by expert consensus/annotation. "3 ground truthers annotated the chest X-ray scans for the presence of suspicious ROI categories."
8. The sample size for the training set:
- The underlying algorithm was trained on a large dataset of ~2.5 million Chest X-Ray scans.
9. How the ground truth for the training set was established:
- The document states that the training data "consisted of various abnormal regions of interest." While it doesn't explicitly detail the methodology for establishing ground truth for the training set, given the detailed ground truth process for the test set, it's highly probable that the training set ground truth was also established through expert radiological review and annotation, similar to the process described for the test set.
(149 days)
Imagine® Enterprise Suite (IES) is a medical diagnostic device that receives, stores, and shares the medical images from and to DICOM-compliant entities such as imaging modalities (such as X-ray Angiograms (XA), Echocardiograms (US), MRI, CT, CR, DR, IVUS, OCT, PET and SPECT), external PACS, and other diagnostic workstations. It is used in the display and quantification of medical images, after image acquisition from modalities, for post-procedure clinical decision support. It constitutes a PACS for the communication and storage of medical images and provides a worklist of stored medical images that can be used to open patient studies in one of its image viewers. It is intended to display images and related information that are interpreted by trained professionals to render findings and/or diagnosis, but it does not directly generate any diagnosis or potential findings. Not intended for primary diagnosis of mammographic images. Not intended for intra-procedural or real-time use. Not intended for diagnostic use on mobile devices.
The Imagine® Enterprise Suite (IES) has, as its backbone, the IES PACS – a DICOM stack for the communication and storage of medical images. It is based on its predecessor, the HCP DICOM Net® PACS (K023467). The IES is made up of the following modules:
IES_EntViewer: This viewer module can be launched from the IES PACS Worklist and is intended primarily for the review and manipulation of angiographic X-ray images. It also supports the review of images from other modalities in single or combination views, thereby serving as a general-purpose multi-modality viewer.
IES_EchoViewer: This viewer module can be launched from the IES Worklist and is intended for specialized viewing, manipulation, and measurements of Echocardiography images.
IES_RadViewer: This viewer module can be launched from the IES Worklist and is intended for specialized viewing, manipulation, and measurements of Radiological images. It also supports the fusion of Radiological images (such as MRI and CT) with Nuclear Medicine images (such as PET and SPECT).
IES_ZFPViewer: This viewer is intended for non-diagnostic review of medical images over a web browser. It supports an independent worklist and a viewing component that requires no installation for the end user. It works within an intranet or over the internet via user-provided VPN or static IP.
AngioQuant: This module can be launched from the IES_EntViewer to perform automatic quantification of coronary arteries. It uses, as input, the cardiac angiogram studies stored on the IES PACS. It is intended for display and quantification of X-ray angiographic images after image acquisition in the cathlab, for post-procedure clinical decision support within the cathlab workflow. It is not intended for intra-procedural or real-time use. The Imagine® Enterprise Suite (IES) is integrated with ML only for the segmentation of coronary vessels from X-ray angiographic images and uses deep learning methodology for image analysis.
Here's a breakdown of the acceptance criteria and study details for the Imagine® Enterprise Suite, specifically focusing on the AngioQuant module's machine learning component, as described in the provided 510(k) summary:
1. Table of Acceptance Criteria and Reported Device Performance
The 510(k) summary provides a narrative description of the performance evaluation rather than a direct table of acceptance criteria with corresponding performance metrics for every criterion. However, it explicitly states that the performance of the IES_AngioQuant module's machine learning-based coronary vessel segmentation function was evaluated using several metrics and compared against an FDA-cleared predicate device.
| Acceptance Criterion (Inferred from Study Design) | Reported Device Performance (IES_AngioQuant ML component) |
|---|---|
| Quantitative Performance Metrics for Coronary Vessel Segmentation | Evaluated using: |
| Jaccard Index (Intersection over Union) | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Dice Score | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Precision | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Accuracy | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Recall | Value not explicitly stated, but was among the comprehensive set of metrics used for evaluation. |
| Visual Assessment of Segmentation | Conducted in conjunction with quantitative metrics. |
| Comparative Performance to Predicate Device | Performance was compared against the FDA-cleared predicate device, CAAS Workstation (510(k) No. K232147). |
| Reproducibility/Consistency of Ground Truth (Implicit for verification) | Verification performed by two independent board-certified interventional cardiologists. |
Note: The specific numerical values for Jaccard Index, Dice Score, Precision, Accuracy, and Recall are not provided in the summary. The summary highlights that these metrics were used for evaluation.
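The five metrics named above all reduce to pixelwise confusion-matrix counts between the predicted and reference segmentation masks. A minimal sketch (not the submission's evaluation code):

```python
def seg_metrics(pred, truth):
    """Pixelwise Jaccard, Dice, precision, recall, and accuracy for two
    binary masks given as flat 0/1 sequences of equal length."""
    tp = sum(p & t for p, t in zip(pred, truth))
    fp = sum(p & (1 - t) for p, t in zip(pred, truth))
    fn = sum((1 - p) & t for p, t in zip(pred, truth))
    tn = len(pred) - tp - fp - fn
    return {
        "jaccard":   tp / (tp + fp + fn),
        "dice":      2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),
        "accuracy":  (tp + tn) / len(pred),
    }

m = seg_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```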
2. Sample Size and Data Provenance
- Test Set Sample Size: An independent external test set comprising 30 patient studies was used.
- Data Provenance: The dataset consisted of anonymized angiographic studies sourced from multiple U.S. and international clinical sites. It was a retrospective dataset. The dataset included adult patients of mixed gender and represented a range of age, body habitus, and diverse race and ethnicity. Clinically relevant variability, including lesion severity, vessel anatomy, image quality, and imaging equipment vendors, was represented.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: Two independent board-certified interventional cardiologists.
- Qualifications of Experts: Each expert had more than 10 years of clinical experience.
4. Adjudication Method for the Test Set
The summary does not explicitly state a formal adjudication method like "2+1" or "3+1" for differences between the experts. However, it states that the ground truth (reference standard) was established using the FDA-cleared Medis QAngio XA (K182611) software, with verification performed by the two independent board-certified interventional cardiologists. This implies that the experts reviewed and confirmed the ground truth generated by the predicate software, rather than independently generating it and then adjudicating differences.
5. MRMC Comparative Effectiveness Study
An MRMC comparative effectiveness study was not explicitly described in the summary. The performance comparison was primarily an algorithm-only comparison against a predicate device (CAAS Workstation) for the ML component. The summary does not mention how much human readers improve with or without AI assistance.
6. Standalone (Algorithm Only) Performance
Yes, a standalone (algorithm only without human-in-the-loop performance) study was done for the IES_AngioQuant module's machine learning-based coronary vessel segmentation function. Its performance was evaluated using quantitative metrics and visual assessment, and then compared against the FDA-cleared predicate device (CAAS Workstation).
7. Type of Ground Truth Used
The ground truth was established using an FDA-cleared software (Medis QAngio XA, K182611), with its output verified by expert consensus of two independent board-certified interventional cardiologists.
8. Sample Size for the Training Set
A total of 762 anonymized angiographic studies were used for training, validation, and internal testing sets combined. The summary does not provide an exact breakdown of how many studies were specifically in the training set versus the validation and internal testing sets.
9. How the Ground Truth for the Training Set Was Established
The summary states that the ground truth ("truthing") for the dataset (which includes the training, validation, and internal testing sets) was established using the FDA-cleared Medis QAngio XA (K182611) software, with verification performed by two independent board-certified interventional cardiologists, each with more than 10 years of clinical experience. Implicitly, this same method was used for establishing ground truth for the training set.