
510(k) Data Aggregation

    K Number: K230685
    Manufacturer: Radformation
    Date Cleared: 2023-04-14 (32 days)
    Regulation Number: 892.2050
    Device Name: AutoContour Model RADAC V3

    Intended Use

    AutoContour is intended to assist radiation treatment planners in contouring structures within medical images in preparation for radiation therapy treatment planning.

    Device Description

    As with AutoContour Model RADAC V2, the AutoContour Model RADAC V3 device is software that uses DICOM-compliant image data (CT or MR) as input to: (1) automatically contour various structures of interest for radiation therapy treatment planning using machine-learning-based contouring; (2) allow the user to review and modify the resulting contours; and (3) generate DICOM-compliant structure set data that can be imported into a radiation therapy treatment planning system. The deep-learning-based structure models are trained on imaging datasets consisting of anatomical organs of the head and neck, thorax, abdomen, and pelvis for adult male and female patients.

    AutoContour Model RADAC V3 consists of 3 main components:

      1. A .NET client application, designed to run on the Windows operating system, that allows the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, and review, edit, and export the structure set.
      2. A local "agent" service, designed to run on the Windows operating system, that the user configures to monitor a network storage location for new CT and MR datasets to be automatically contoured.
      3. A cloud-based automatic contouring service that produces initial contours from image sets sent by the user from the .NET client application.
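
    To make the second component concrete, here is a minimal sketch of a folder-watching agent in Python using pydicom. The polling loop, the WATCH_DIR path, and submit_for_contouring are illustrative assumptions; the document does not describe the agent's internals.

```python
import time
from pathlib import Path

import pydicom  # third-party DICOM parser

WATCH_DIR = Path(r"\\storage\incoming")  # hypothetical monitored network share
POLL_SECONDS = 30

def is_ct_or_mr(path: Path) -> bool:
    """Read only the DICOM header and accept CT/MR, the device's supported inputs."""
    try:
        ds = pydicom.dcmread(path, stop_before_pixels=True)
    except Exception:
        return False  # unreadable or non-DICOM file
    return getattr(ds, "Modality", None) in {"CT", "MR"}

def submit_for_contouring(path: Path) -> None:
    """Stub for the upload step; the real agent sends the dataset to the cloud service."""
    print(f"queued {path.name} for automatic contouring")

def run_agent() -> None:
    seen: set[Path] = set()
    while True:
        for f in WATCH_DIR.glob("**/*.dcm"):
            if f not in seen and is_ct_or_mr(f):
                seen.add(f)
                submit_for_contouring(f)
        time.sleep(POLL_SECONDS)  # a production agent might use filesystem events instead
```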
    AI/ML Overview

    Here's a breakdown of the acceptance criteria and the study demonstrating that the device meets them, based on the provided document:

    1. Table of Acceptance Criteria & Reported Device Performance

| Feature/Metric | Acceptance Criteria | Reported Device Performance (Mean DSC / Rating) |
|---|---|---|
| **CT Structures** | | |
| Large volume DSC | >= 0.8 | Initial Validation: 0.88 ± 0.06; External Validation: 0.90 ± 0.09 |
| Medium volume DSC | >= 0.65 | Initial Validation: 0.88 ± 0.08; External Validation: 0.83 ± 0.12 |
| Small volume DSC | >= 0.5 | Initial Validation: 0.75 ± 0.12; External Validation: 0.79 ± 0.11 |
| Clinical appropriateness (1-5 scale, 5 best) | Average score >= 3 | Average rating of 4.5 |
| **MR Structures** | | |
| Medium volume DSC | >= 0.65 | Initial Validation: 0.87 ± 0.07; External Validation: 0.87 ± 0.07 |
| Small volume DSC | >= 0.5 | Initial Validation: 0.74 ± 0.07; External Validation: 0.74 ± 0.07 |
| Clinical appropriateness (1-5 scale, 5 best) | Average score >= 3 | Average rating of 4.4 |
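
    For reference, the Dice Similarity Coefficient (DSC) used in these criteria measures the overlap between a predicted mask A and a ground-truth mask B: DSC = 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows; the function name and the empty-mask convention are illustrative, not from the document.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty; convention chosen here, not specified in the document
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```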

    2. Sample Sizes Used for the Test Set and Data Provenance

    • CT Test Set (Internal Validation): Approximately 10% of the training images were held out, averaging 50 test images per structure model (a simple version of this split is sketched after this list).
      • Provenance: Retrospective. The document reports that "among the patients used for CT training and testing 51.7% were male and 48.3% female. Patient ages range 11-30 : 0.3%, 31-50 : 6.2%, 51-70 : 43.3%, 71-100 : 50.3%. Race 84.0% White, 12.8% Black or African American, 3.2% Other." No specific country of origin is mentioned, but the description implies internal company data.
    • CT Test Set (External Clinical Validation): Variable per structure model, ranging from 19 to 63 images.
      • Provenance: Publicly available CT datasets from The Cancer Imaging Archive (TCIA). This suggests diverse, likely multi-national origin, though exact countries are not specified; the studies cited are primarily from US institutions (e.g., Memorial Sloan Kettering Cancer Center, MD Anderson Cancer Center). The data are retrospective.
    • MR Test Set (Internal Validation):
      • Brain models: 92 testing images (from TCIA GLIS-RT dataset).
      • Pelvis models: Sample size not explicitly stated for testing; the document refers to the "Prostate-MRI-US-Biopsy" dataset.
      • Provenance: TCIA datasets (implying diverse origin, likely US-centric as above), retrospective.
    • MR Test Set (External Clinical Validation):
      • Brain models: 20 MR T1 Ax post (BRAVO) image scans acquired from a clinical partner (no specific country mentioned, but likely US given the context).
      • Pelvis models: 19 images from a publicly available Gold Atlas Data set. The Gold Atlas project has references indicating collaboration across European and US institutions (e.g., Medical Physics - Europe/US).
      • Provenance: Retrospective.
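
    The roughly 10% per-model hold-out described above could be produced with a split along these lines. This is a sketch under stated assumptions: the document does not describe the actual sampling procedure, so the random shuffle, seed, and function name are illustrative.

```python
import random

def holdout_split(image_sets: list[str], test_fraction: float = 0.10, seed: int = 0):
    """Hold out ~test_fraction of a structure model's image sets for internal validation."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(image_sets)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training sets, held-out test sets)
```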

    3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

    • Number of Experts: Three (3)
    • Qualifications: "clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist." No specific years of experience are mentioned.

    4. Adjudication Method for the Test Set

    • Method: "Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts." This implies a consensus-based approach among the three experts; the document does not describe a formal adjudication protocol (e.g., 2+1) for resolving disagreements.

    5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done

    • No, an MRMC comparative effectiveness study was not explicitly performed to measure how much human readers improve with AI assistance versus without it.
      • The study focuses on the performance of the AI algorithm itself (standalone performance) and its clinical appropriateness as rated by experts. The "External Reviewer Average Rating" indicates how much editing would be required by a human, rather than directly measuring human reader performance improvement with assistance.
      • "independent reviewers (not employed by Radformation) were used to evaluate the clinical appropriateness of structure models as they would be evaluated for the purposes of treatment planning. This external review was performed as a replacement to intraobserver variability testing done with the RADAC V2 structure models as it better quantified the usefulness of the structure model outputs in an unbiased clinical setting." This suggests an assessment of the usability of the AI-generated contours for human review and modification, but not a direct MRMC study comparing assisted vs. unassisted human performance.

    6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

    • Yes, standalone performance was done.
      • The Dice Similarity Coefficient (DSC) metrics presented are a measure of the algorithm's performance in generating contours when compared to expert-defined ground truth, without human intervention during the contour generation process. The "External Reviewer Average Rating" also evaluates the standalone output's quality before any human editing.
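
    In code terms, the standalone check reduces to averaging per-image DSC scores (as could be computed by the dice() sketch in section 1) for each structure model and comparing the mean to its volume-class threshold. The function name and score representation here are assumptions.

```python
import numpy as np

# Volume-class acceptance thresholds from the table in section 1.
DSC_THRESHOLDS = {"large": 0.80, "medium": 0.65, "small": 0.50}

def model_meets_criterion(dsc_scores: list[float], volume_class: str) -> bool:
    """True if a structure model's mean test-set DSC meets its volume-class threshold."""
    return float(np.mean(dsc_scores)) >= DSC_THRESHOLDS[volume_class]

# Example: the reported CT large-volume internal-validation mean of 0.88 clears the 0.80 bar.
assert model_meets_criterion([0.88], "large")
```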

    7. The Type of Ground Truth Used

    • Type of Ground Truth: Expert consensus.
      • "Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist."

    8. The Sample Size for the Training Set

    • CT Training Set: Average of 373 training image sets per structure model.
    • MR Training Set:
      • Brain models: Average of 274 training image sets.
      • Pelvis models: Sample size not explicitly stated for training; the document refers to the "Prostate-MRI-US-Biopsy" dataset.
      • It's important to note that specific numbers vary per structure, as shown in Table 4 and Table 8.

    9. How the Ground Truth for the Training Set Was Established

    • The document implies that the training data and their corresponding ground truths were prepared internally prior to the testing phase. While it does not explicitly state how the ground truth for the training set was established, it strongly suggests an expert-driven approach similar to the one described for the test sets.
    • "The test datasets were independent from those used for training and consisted of approximately 10% of the number of training image sets used as input for the model." This indicates that ground truth was established for both training and testing datasets.
    • "Publically available CT datasets from The Cancer Imaging Archive (TCIA archive) were used and both AutoContour and manually added ground truth contours following the same structure guidelines used for structure model training were added to the image sets." This suggests that for publicly available datasets used for both training and external validation, ground truth was added following the same NRG/RTOG guidelines. For proprietary training data, a similar expert-based ground truth creation likely occurred.