510(k) Data Aggregation
AutoContour (Model RADAC V4)
(90 days)
AutoContour is intended to assist radiation treatment planners in contouring and reviewing structures within medical images in preparation for radiation therapy treatment planning.
As with AutoContour Model RADAC V3, the AutoContour Model RADAC V4 device is software that uses DICOM-compliant image data (CT or MR) as input to: (1) automatically contour various structures of interest for radiation therapy treatment planning using machine learning based contouring. The deep-learning based structure models are trained using imaging datasets consisting of anatomical organs of the head and neck, thorax, abdomen and pelvis for adult male and female patients, (2) allow the user to review and modify the resulting contours, and (3) generate DICOM-compliant structure set data that can be imported into a radiation therapy treatment planning system.
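For illustration only (the submission does not describe how the export is implemented), a DICOM RT Structure Set of the kind described above can be inspected with the open-source pydicom library; the file name below is hypothetical:

```python
# Sketch: list the contoured structures in an exported DICOM RT
# Structure Set. Assumes the pydicom package; the file name is
# hypothetical.
import pydicom

ds = pydicom.dcmread("autocontour_export.dcm")
assert ds.Modality == "RTSTRUCT", "expected an RT Structure Set"

# Each item in StructureSetROISequence names one contoured structure.
for roi in ds.StructureSetROISequence:
    print(roi.ROINumber, roi.ROIName)
```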
AutoContour Model RADAC V4 consists of 3 main components:

- A .NET client application designed to run on the Windows Operating System, allowing the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, and review, edit, and export the structure set.
- A local "agent" service designed to run on the Windows Operating System that the user configures to monitor a network storage location for new CT and MR datasets to be automatically contoured (a minimal sketch of this pattern follows the list).
- A cloud-based automatic contouring service that produces initial contours based on image sets sent by the user from the .NET client application.
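The document does not describe the agent's internals; as a rough sketch of the folder-monitoring pattern it implements, here is a minimal polling loop. The network path, poll interval, and submit_for_contouring() handoff are all hypothetical:

```python
# Sketch: poll a watched network folder for new DICOM files and hand
# them off for automatic contouring. Purely illustrative; the real
# agent is a Windows service whose design is not described here.
import time
from pathlib import Path

WATCHED = Path(r"\\storage\incoming_dicom")  # hypothetical share
seen: set[Path] = set()

def submit_for_contouring(path: Path) -> None:
    # Placeholder for the upload to the cloud contouring service.
    print(f"would upload {path}")

while True:
    for f in sorted(WATCHED.glob("*.dcm")):
        if f not in seen:
            seen.add(f)
            submit_for_contouring(f)
    time.sleep(30)  # poll every 30 seconds
```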
Here's an analysis of the acceptance criteria and study findings for the Radformation AutoContour (Model RADAC V4) device, based on the provided text:
1. Acceptance Criteria and Reported Device Performance
The primary acceptance criterion for the automated contouring models is the Dice Similarity Coefficient (DSC), which measures the spatial overlap between the AI-generated contour and the ground truth contour. The criteria vary based on the estimated size of the anatomical structure. Additionally, for external clinical testing, an external reviewer rating was used to assess clinical appropriateness.
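For reference, the DSC between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal NumPy sketch, offered as an illustration rather than the vendor's implementation:

```python
# Sketch: Dice Similarity Coefficient between two binary masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks count as perfect agreement.
    return 2.0 * intersection / denom if denom else 1.0

# Example: an 8-voxel prediction fully inside a 12-voxel ground truth.
a = np.zeros((4, 4, 4), bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), bool); b[1:4, 1:3, 1:3] = True
print(round(dice(a, b), 3))  # 0.8
```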
Acceptance Criteria Category | Metric | Performance Criterion | Reported Device Performance (Mean ± Std Dev)
---|---|---|---
Contouring accuracy (CT models) | Mean DSC, large volume structures | ≥ 0.80 | 0.92 ± 0.06
Contouring accuracy (CT models) | Mean DSC, medium volume structures | ≥ 0.65 | 0.85 ± 0.09
Contouring accuracy (CT models) | Mean DSC, small volume structures | ≥ 0.50 | 0.81 ± 0.12
Clinical appropriateness (CT models) | External reviewer rating (1-5 scale, higher is better) | Average score ≥ 3 | 4.57 (mean rating across all CT models)
Contouring accuracy (MR models) | Mean DSC, large volume structures | ≥ 0.80 | 0.96 ± 0.03 (training data); 0.80 ± 0.09 (external data)
Contouring accuracy (MR models) | Mean DSC, medium volume structures | ≥ 0.65 | 0.84 ± 0.07 (training data); 0.84 ± 0.09 (external data)
Contouring accuracy (MR models) | Mean DSC, small volume structures | ≥ 0.50 | 0.74 ± 0.09 (training data); 0.61 ± 0.14 (external data)
Clinical appropriateness (MR models) | External reviewer rating (1-5 scale, higher is better) | Average score ≥ 3 | 4.6 (mean rating across all MR models)
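The pass/fail logic implied by this table is a size-dependent threshold comparison. A small sketch using the thresholds quoted above (how a structure is assigned to a size category is not detailed in the document, so that mapping is hypothetical):

```python
# Sketch: check mean DSC against the size-dependent acceptance
# criteria from the table above.
THRESHOLDS = {"large": 0.80, "medium": 0.65, "small": 0.50}

def passes(size_category: str, mean_dsc: float) -> bool:
    return mean_dsc >= THRESHOLDS[size_category]

print(passes("large", 0.92))  # True (CT large-volume result)
print(passes("small", 0.61))  # True (MR small-volume, external data)
```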
2. Sample Size Used for the Test Set and Data Provenance
- CT Models Test Set:
  - Sample Size: For individual CT structure models, the number of testing sets ranged from 10 to 116 for internal validation (Table 4) and from 13 to 82 for external clinical testing (Table 6). The document states that "approximately 10% of the number of training image sets" were used for testing in the internal validation, averaging 54 testing image sets per CT structure model.
  - Data Provenance: Imaging data for training was gathered from 4 institutions in 2 countries (United States and Switzerland). External clinical testing data for CT models was sourced from various TCIA (The Cancer Imaging Archive) datasets (Pelvic-Ref, Head-Neck-PET-CT, Pancreas-CT-CB, NSCLC, LCTSC, QIN-BREAST) and shared by several unidentified institutions in the United States. The data was retrospective (previously acquired images reused for model validation).
- MR Models Test Set:
  - Sample Size: For internal validation (Table 8), an average of 45 testing image sets per MR Brain model and 77 per MR Pelvis model were used; for external clinical testing (Table 10), testing sets ranged from 5 to 45 per structure model.
  - Data Provenance: Imaging data for training and internal testing was acquired from the Cancer Imaging Archive GLIS-RT dataset (Brain models) and from two open-source datasets plus one institution in the United States (Pelvis models). External clinical testing data for MR models came from a clinical partner (Brain models), publicly available datasets (Prostate-MRI-US-Biopsy, Gold Atlas Pelvis, SynthRad), and two institutions using MR Linacs for image acquisition. The data was retrospective.
- General Note: Test datasets were independent from those used for training.
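As an aside, a holdout of this kind ("approximately 10% of the number of training image sets") can be sketched as below; the split procedure actually used is not described, so this is only an assumption-laden illustration:

```python
# Sketch: hold out ~10% of the image sets of one structure model for
# testing. Hypothetical; the document states only that test data were
# independent of training data and roughly 10% of its size.
import random

def split_holdout(image_ids: list[str], frac: float = 0.10, seed: int = 0):
    rng = random.Random(seed)
    ids = image_ids[:]          # copy so the input is left untouched
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * frac))
    return ids[n_test:], ids[:n_test]  # (train, test)

train, test = split_holdout([f"case_{i:03d}" for i in range(540)])
print(len(train), len(test))  # 486 54 (cf. ~54 test sets per CT model)
```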
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Three (3) experts were used.
- Qualifications of Experts: The ground truth was established by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist.
4. Adjudication Method for the Test Set
- Method: Ground truthing of each test data set was generated manually using consensus (NRG/RTOG) guidelines as appropriate by the three experts. This implies an expert consensus method, likely involving discussion and agreement among the three. The document does not specify a quantitative adjudication method like "2+1" or "3+1" but rather a "consensus" guided by established clinical guidelines.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done
- The document does not report an MRMC comparative effectiveness study comparing human readers with AI assistance versus without AI assistance. The study focuses purely on the AI's performance and its clinical appropriateness as rated by external reviewers.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was Done
- Yes, a standalone performance evaluation was done. The core of the performance data presented (Dice Similarity Coefficient) is a measure of the algorithm's direct output compared to the ground truth, without a human in the loop during the contour generation phase. The external reviewer ratings also assess the standalone performance of the AI-generated contours regarding their clinical utility for subsequent editing and approval.
7. The Type of Ground Truth Used
- Type: The ground truth used was expert consensus, specifically from three clinically experienced experts (2 radiation therapy physicists and 1 radiation dosimetrist), guided by NRG/RTOG guidelines.
8. The Sample Size for the Training Set
- CT Models Training Set: For CT structure models, there was an average of 341 training image sets.
- MR Models Training Set: For MR Brain models, there was an average of 149 training image sets. For MR Pelvis models, there was an average of 306 training image sets.
9. How the Ground Truth for the Training Set Was Established
The document states that the deep-learning based structure models were "trained using imaging datasets consisting of anatomical organs" and that the "test datasets were independent from those used for training." It details how ground truth was established for the test sets (manual contouring by three experts following consensus NRG/RTOG guidelines) but does not explicitly describe how ground truth was established for the training sets. Given how deep-learning segmentation models are built, the training data almost certainly carried expert-annotated contours as well, likely produced by a similarly rigorous process across the contributing institutions and public datasets. The consistent model architecture and training methodology (e.g., a "very similar CNN architecture was used to train these new CT models") further suggests a standardized approach to data preparation, including ground truth generation, for both training and testing.
AutoContour Model RADAC V3
(32 days)
AutoContour is intended to assist radiation treatment planners in contouring structures within medical images in preparation for radiation therapy treatment planning.
As with AutoContour Model RADAC V2, the AutoContour Model RADAC V3 device is software that uses DICOM-compliant image data (CT or MR) as input to: (1) automatically contour various structures of interest for radiation therapy treatment planning using machine learning based contouring. The deep-learning based structure models are trained using imaging datasets consisting of anatomical organs of the head and neck, thorax, abdomen and pelvis for adult male and female patients, (2) allow the user to review and modify the resulting contours, and (3) generate DICOM-compliant structure set data that can be imported into a radiation therapy treatment planning system.
AutoContour Model RADAC V3 consists of 3 main components:

- A .NET client application designed to run on the Windows Operating System allowing the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, as well as review, edit, and export the structure set.
- A local "agent" service designed to run on the Windows Operating System that is configured by the user to monitor a network storage location for new CT and MR datasets that are to be automatically contoured.
- A cloud-based automatic contouring service that produces initial contours based on image sets sent by the user from the .NET client application.
Here's a breakdown of the acceptance criteria and the study proving the device's performance, based on the provided document:
1. Table of Acceptance Criteria & Reported Device Performance
Feature/Metric | Acceptance Criterion | Reported Device Performance (Mean DSC / Rating)
---|---|---
CT structures | |
Large volume DSC | ≥ 0.80 | Initial validation: 0.88 ± 0.06; external validation: 0.90 ± 0.09
Medium volume DSC | ≥ 0.65 | Initial validation: 0.88 ± 0.08; external validation: 0.83 ± 0.12
Small volume DSC | ≥ 0.50 | Initial validation: 0.75 ± 0.12; external validation: 0.79 ± 0.11
Clinical appropriateness (1-5 scale, 5 best) | Average score ≥ 3 | Average rating of 4.5
MR structures | |
Medium volume DSC | ≥ 0.65 | Initial validation: 0.87 ± 0.07; external validation: 0.87 ± 0.07
Small volume DSC | ≥ 0.50 | Initial validation: 0.74 ± 0.07; external validation: 0.74 ± 0.07
Clinical appropriateness (1-5 scale, 5 best) | Average score ≥ 3 | Average rating of 4.4
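The "mean ± std" figures in this table are summary statistics over per-image DSC values within each category; the aggregation amounts to the following sketch (sample values invented, and whether population or sample standard deviation was used is not stated in the document):

```python
# Sketch: aggregate per-image DSC values into "mean +/- std" summaries
# like those reported above. The values below are made up.
import numpy as np

dsc_by_category = {
    "large":  [0.91, 0.88, 0.90, 0.86],
    "medium": [0.84, 0.81, 0.86],
    "small":  [0.70, 0.76, 0.79],
}

for category, values in dsc_by_category.items():
    v = np.asarray(values)
    # ddof=1 gives the sample standard deviation; the document does not
    # say which convention the vendor used.
    print(f"{category}: {v.mean():.2f} +/- {v.std(ddof=1):.2f}")
```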
2. Sample Sizes Used for the Test Set and Data Provenance
- CT Test Set (Internal Validation): Approximately 10% of the training images, averaging 50 test images per structure model.
- Provenance: Retrospective. Of the patients used for CT training and testing, 51.7% were male and 48.3% female; ages were 11-30: 0.3%, 31-50: 6.2%, 51-70: 43.3%, 71-100: 50.3%; race was 84.0% White, 12.8% Black or African American, 3.2% Other. No specific country of origin is mentioned, but the description implies internal company data.
- CT Test Set (External Clinical Validation): Variable per structure model, ranging from 19 to 63 images.
- Provenance: Publicly available CT datasets from The Cancer Imaging Archive (TCIA archive). This suggests diverse, likely multi-national origin, but exact countries are not specified. The studies cited are primarily from US institutions (e.g., Memorial Sloan Kettering Cancer Center, MD Anderson Cancer Center). This data is retrospective.
- MR Test Set (Internal Validation):
- Brain models: 92 testing images (from TCIA GLIS-RT dataset).
- Pelvis models: Sample size not explicitly stated for testing, but refers to "Prostate-MRI-US-Biopsy dataset."
- Provenance: TCIA datasets (implying diverse origin, likely US-centric as above), retrospective.
- MR Test Set (External Clinical Validation):
- Brain models: 20 MR T1 Ax post (BRAVO) image scans acquired from a clinical partner (no specific country mentioned, but likely US given the context).
- Pelvis models: 19 images from the publicly available Gold Atlas dataset. The Gold Atlas project's references indicate collaboration across European and US institutions.
- Provenance: Retrospective.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- Number of Experts: Three (3)
- Qualifications: "clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist." No specific years of experience are mentioned.
4. Adjudication Method for the Test Set
- Method: "Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts". This implies a consensus-based approach, likely 3-way consensus. If initial contours differed, discussions and adjustments would lead to a final agreed-upon ground truth.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done
- No, an MRMC comparative effectiveness study was not explicitly done to measure improvement for human readers with AI vs without AI assistance.
- The study focuses on the performance of the AI algorithm itself (standalone performance) and its clinical appropriateness as rated by experts. The "External Reviewer Average Rating" indicates how much editing would be required by a human, rather than directly measuring human reader performance improvement with assistance.
- "independent reviewers (not employed by Radformation) were used to evaluate the clinical appropriateness of structure models as they would be evaluated for the purposes of treatment planning. This external review was performed as a replacement to intraobserver variability testing done with the RADAC V2 structure models as it better quantified the usefulness of the structure model outputs in an unbiased clinical setting." This suggests an assessment of the usability of the AI-generated contours for human review and modification, but not a direct MRMC study comparing assisted vs. unassisted human performance.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
- Yes, standalone performance was done.
- The Dice Similarity Coefficient (DSC) metrics presented are a measure of the algorithm's performance in generating contours when compared to expert-defined ground truth, without human intervention during the contour generation process. The "External Reviewer Average Rating" also evaluates the standalone output's quality before any human editing.
7. The Type of Ground Truth Used
- Type of Ground Truth: Expert consensus.
- "Ground truthing of each test data set were generated manually using consensus (NRG/RTOG) guidelines as appropriate by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist."
8. The Sample Size for the Training Set
- CT Training Set: Average of 373 training image sets per structure model.
- MR Training Set:
- Brain models: Average of 274 training image sets.
- Pelvis models: Sample size not explicitly stated for training, but refers to "Prostate-MRI-US-Biopsy dataset."
- It's important to note that specific numbers vary per structure, as shown in Table 4 and Table 8.
9. How the Ground Truth for the Training Set Was Established
- The document implies that the training data and their corresponding ground truths were prepared internally prior to the testing phase. While it doesn't explicitly state how the ground truth for the training set was established, it strongly suggests a similar rigorous, expert-driven approach as described for the test sets.
- "The test datasets were independent from those used for training and consisted of approximately 10% of the number of training image sets used as input for the model." This indicates that ground truth was established for both training and testing datasets.
- "Publically available CT datasets from The Cancer Imaging Archive (TCIA archive) were used and both AutoContour and manually added ground truth contours following the same structure guidelines used for structure model training were added to the image sets." This suggests that for publicly available datasets used for both training and external validation, ground truth was added following the same NRG/RTOG guidelines. For proprietary training data, a similar expert-based ground truth creation likely occurred.
AutoContour Model RADAC V2
(175 days)
AutoContour is intended to assist radiation treatment planners in contouring structures within medical images in preparation for radiation therapy treatment planning.
As with AutoContour RADAC, the AutoContour RADAC V2 device is software that uses DICOM-compliant image data (CT or MR) as input to: (1) automatically contour various structures of interest for radiation therapy treatment planning using machine learning based contouring. The deep-learning based structure models are trained using imaging datasets consisting of anatomical organs of the head and neck, thorax, abdomen and pelvis for adult male and female patients, (2) allow the user to review and modify the resulting contours, and (3) generate DICOM-compliant structure set data that can be imported into a radiation therapy treatment planning system.
AutoContour RADAC V2 consists of 3 main components:

- A .NET client application designed to run on the Windows Operating System allowing the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, as well as review, edit, and export the structure set.
- A local "agent" service designed to run on the Windows Operating System that is configured by the user to monitor a network storage location for new CT and MR datasets that are to be automatically contoured.
- A cloud-based automatic contouring service that produces initial contours based on image sets sent by the user from the .NET client application.
The provided text describes the acceptance criteria and the study that proves the device, AutoContour RADAC V2, meets these criteria. Here's a breakdown of the requested information:
1. A table of acceptance criteria and the reported device performance
The acceptance criterion for contouring accuracy is measured by the Mean Dice Similarity Coefficient (DSC), which varies based on the estimated volume of the structure.
Structure Size Category | DSC Acceptance Criterion (Mean) | Reported Device Performance (Mean DSC ± SD)
---|---|---
Large volume structures | > 0.80 | 0.94 ± 0.03
Medium volume structures | > 0.65 | 0.82 ± 0.09
Small volume structures | > 0.50 | 0.61 ± 0.14
The document also provides detailed DSC results for each contoured structure, which all meet or exceed their respective size category's acceptance criteria. For example, for "A_Aorta" (Large), the reported DSC Mean is 0.91, which is >0.80. For "Brainstem" (Medium), the reported DSC Mean is 0.90, which is >0.65. For "OpticChiasm" (Small), the reported DSC Mean is 0.63, which is >0.50.
2. Sample size used for the test set and the data provenance
- CT Test Set:
- Sample Size: An average of 140 test image sets per CT structure model, constituting 20% of the training images. The specific number of test data sets for each CT structure is provided in the table (e.g., A_Aorta: 60, Bladder: 372).
- Data Provenance:
- Country of Origin: Not explicitly stated; the patient demographics and mix of cancers suggest the data is likely from the US. Acquisition used a Philips Big Bore CT simulator.
- Retrospective or Prospective: Not explicitly stated, but as is typical for such validation studies, the data was presumably retrospective patient data.
- Demographics: 51.7% male, 48.3% female. Age range: 11-30 (0.3%), 31-50 (6.2%), 51-70 (43.3%), 71-100 (50.3%). Race: 84.0% White, 12.8% Black or African American, 3.2% Other.
- Clinical Relevance: Data spanned across common radiation therapy treatment subgroups (Prostate, Breast, Lung, Head and Neck cancers).
- MR Test Set:
- Sample Size: An average of 16 test image sets per MR structure model. Specific numbers are not provided for each MR structure, but the total validation set for sensitivity and specificity was 16 datasets.
- Data Provenance:
- Country of Origin: Massachusetts General Hospital, Boston, MA.
- Retrospective or Prospective: The text states "These training sets consisted primarily of glioblastoma and astrocytoma cases from the Cancer Imaging Archive (TCIA) Glioma data set." and that "The testing dataset was acquired at a different institution using a different scanner and sequence parameters", implying retrospective data collection from existing archives/institutions.
- Demographics: 56% Male and 44% Female patients, with ages ranging from 20-80. No Race or Ethnicity data was provided.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- Number of Experts: Three clinically experienced experts.
- Qualifications: Two radiation therapy physicists and one radiation dosimetrist.
4. Adjudication method for the test set
- Method: Ground truthing of each test dataset was generated manually using consensus (NRG/RTOG) guidelines, as appropriate, by the three clinically experienced experts. This implies a form of expert consensus adjudication.
5. If a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done, and if so, what was the effect size of how much human readers improve with AI vs. without AI assistance
- MRMC Study: No, an MRMC comparative effectiveness study involving human readers with and without AI assistance was not conducted. The performance data focuses on the software's standalone accuracy (Dice Similarity Coefficient, sensitivity, and specificity). The text states: "As with the Predicate Device, no clinical trials were performed for AutoContour RADAC V2."
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
- Standalone Performance: Yes, the primary performance evaluation provided is for the software's standalone performance, measured by the Dice Similarity Coefficient (DSC), sensitivity, and specificity of the auto-generated contours against expert-established ground truth. The study explicitly states, "Further tests were performed on independent datasets from those included in training and validation sets in order to validate the generalizability of the machine learning model." This is a validation of the algorithm's performance.
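Alongside DSC, the V2 validation cites voxel-level sensitivity and specificity. A minimal sketch of those metrics for binary masks, again illustrative rather than the vendor's code:

```python
# Sketch: voxel-wise sensitivity and specificity of a predicted mask
# against ground truth.
import numpy as np

def sensitivity_specificity(pred: np.ndarray, truth: np.ndarray):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return tp / (tp + fn), tn / (tn + fp)

# Example: prediction covers half of an 8-voxel truth, no false positives.
truth = np.zeros((4, 4, 4), bool); truth[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), bool); pred[1:3, 1:3, 1:2] = True
sens, spec = sensitivity_specificity(pred, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # 0.50, 1.00
```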
7. The type of ground truth used
- Type of Ground Truth: Expert consensus of manually contoured structures, established using NRG/RTOG (Radiation Therapy Oncology Group) guidelines. This is a form of expert consensus.
8. The sample size for the training set
- CT Training Set: An average of 700 training image sets per CT structure model. The specific number of training data sets for each CT structure is provided in the table (e.g., A_Aorta: 240, Bladder: 1000).
- MR Training Set: An average of 81 training image sets for MR structure models.
9. How the ground truth for the training set was established
- The document implies that the ground truth for the training set was also established manually, similar to the test set, as it states "Datasets used for testing were removed from the training dataset pool before model training began, and used exclusively for testing." It is standard practice for medical imaging AI to train on expertly contoured data. While not explicitly detailed for the training set, the consistency in ground truth methodology for both training and testing in such submissions suggests expert manual contouring based on established guidelines would have been used for training as well.
- Source for MR Training Data: Primarily glioblastoma and astrocytoma cases from The Cancer Imaging Archive (TCIA) Glioma data set.
AutoContour
(263 days)
AutoContour is intended to assist radiation treatment planners in contouring structures within medical images in preparation for radiation therapy treatment planning.
AutoContour consists of 3 main components:
- An "agent" service designed to run on the Windows Operating System that is configured by the user to monitor a network storage location for new CT datasets that are to be automatically uploaded to:
- A cloud-based AutoContour automatic contouring service that produces initial contours and
- A web application accessed via web browser which allows the user to perform registration with other image sets as well as review, edit, and export the structure set containing the contours.
The provided text describes the acceptance criteria and study proving the device meets those criteria. Here's a breakdown of the requested information:
1. Table of Acceptance Criteria & Reported Device Performance
The document states that formal acceptance criteria and reported device performance are detailed in "Radformation's AutoContour Complete Test Protocol and Report." However, this specific report is not included in the provided text. The summary only generally states that "Nonclinical tests were performed... which demonstrates that AutoContour performs as intended per its indications for use" and "Verification and validation tests were performed to ensure that the software works as intended and pass/fail criteria were used to verify requirements."
Therefore, a table of acceptance criteria and reported device performance cannot be constructed from the provided text.
2. Sample Size Used for the Test Set and Data Provenance
The document mentions that "tests were performed on independent datasets from those included in training and validation sets in order to validate the generalizability of the machine learning model." However, the sample size for the test set is not explicitly stated.
Regarding data provenance:
- The document implies the data used was medical image data (specifically CT, and for registration purposes, MR and PET).
- The country of origin is not specified.
- The terms "training and validation sets" and "independent datasets" suggest these were retrospective datasets used for model development and evaluation. There is no mention of prospective data collection.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
The document does not provide any information about the number of experts used to establish ground truth for the test set or their qualifications.
4. Adjudication Method for the Test Set
The document does not specify any adjudication method (e.g., 2+1, 3+1, none) used for the test set.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done, and if so, what was the effect size of how much human readers improve with AI vs. without AI assistance?
The document explicitly states: "As with the Predicate Devices, no clinical trials were performed for AutoContour." This indicates that an MRMC comparative effectiveness study involving human readers and AI assistance was not conducted. Therefore, no effect size for human reader improvement is reported.
6. If a Standalone (i.e. algorithm only without human-in-the-loop performance) was done
The document mentions "tests were performed on independent datasets from those included in training and validation sets in order to validate the generalizability of the machine learning model." This strongly suggests that standalone performance of the algorithm was evaluated. Although specific metrics for this standalone performance are not detailed in the provided text, the validation of a machine learning model against independent datasets implies a standalone evaluation.
7. The Type of Ground Truth Used
The document mentions that AutoContour is intended to "assist radiation treatment planners in contouring structures within medical images." Given this, the ground truth for the contours would typically be expert consensus or expert-annotated contours. However, the document itself does not explicitly state the type of ground truth used (e.g., expert consensus, pathology, outcomes data).
8. The Sample Size for the Training Set
The document mentions "training and validation sets" but does not provide the sample size for the training set.
9. How the Ground Truth for the Training Set Was Established
The document mentions "training and validation sets" but does not detail how the ground truth for the training set was established. Similar to the test set, it would likely involve expert contouring, but this is not explicitly stated.