Search Results

Synapse 3D Base Tools is medical imaging software that is intended to provide trained medical professionals with tools to aid them in reading, interpreting, reporting, and treatment planning. Synapse 3D Base Tools accepts DICOM compliant medical images acquired from a variety of imaging devices including, CT, MR, CR, US, NM, PT, and XA, etc.

This product is not intended for use with or for the primary diagnostic interpretation of Mammography images. Synapse 3D Base Tools provides several levels of tools to the user: Basic imaging tools for general images, including 2D viewing, volume rendering and 3D volume viewing, orthogonal / oblique / curved Multi-Planar Reconstructions (MPR), Maximum (MIP), Average (RaySum) and Minimum (MinIP) Intensity Projection, 4D volume viewing, image fusion, image subtraction, surface rendering, sector and rectangular shape MPR image viewing, MPR for dental images, creating and displaying multiple MPR images along an object, time-density distribution, basic image processing, noise reduction, CINE, measurements, annotations, reporting, printing, storing, distribution, and general image management and administration tools, etc.

• Tools for regional segmentation of anatomical structures within the image data, path definition through vascular and other tubular structures, and boundary detection.
• Image viewing tools for modality specific images, including CT PET fusion and ADC image viewing for MR studies.
• Imaging tools for CT images including virtual endoscopic viewing, dual energy image viewing.
• Imaging tools for MR images including delayed enhancement image viewing, diffusion-weighted MRI image viewing.

The intended patient population for all applications implemented as base tools is limited to adult population (over 22 years old).

Device Description

The 3D image analysis software Synapse 3D Base Tools (V7.0) is medical application software running on Windows server/client configuration installed on commercial general-purpose Windows-compatible computers. It offers software tools which can be used by trained professionals to interpret medical images obtained from various medical devices, to create reports, or to develop treatment plans.

AI/ML Overview

The provided text details the FDA 510(k) clearance for Synapse 3D Base Tools (V7.0). It primarily focuses on demonstrating substantial equivalence to a predicate device and includes information on nonclinical and certain clinical performance testing for newly added deep-learning-based organ segmentation features.

Here's an analysis of the acceptance criteria and study that proves the device meets them, based on the provided text:

Acceptance Criteria and Reported Device Performance

The core of acceptance criteria for this 510(k) submission appears to be demonstrating substantial equivalence to a predicate device (Synapse 3D Base Tools V6.6 K221677) and proving the safety and effectiveness of new features, particularly those utilizing deep learning for automatic or semi-automatic organ extraction.

While no explicit "acceptance criteria" table is provided in the document in terms of specific thresholds for the overall device functionality, the performance section for the deep learning models serves as such for those specific features. The acceptance criterion for these features is implicitly showing a high Dice Similarity Coefficient (DICE) score, indicating strong agreement between the automated segmentation and the ground truth.

Table of Acceptance Criteria and Reported Device Performance (for Deep Learning Segmentation)

Segmented Structure (Modality)	Number of Cases	Acceptance Criteria (Implicit) - High DICE Score	Reported Device Performance (Average DICE)
Duodenum (CT)	30	High DICE	0.85
Stomach (CT)	30	High DICE	0.96
Lung section (Left S1S2) (CT)	30	High DICE	0.92
Lung section (Left S3) (CT)	30	High DICE	0.88
Lung section (Left S4) (CT)	30	High DICE	0.75
Lung section (Left S5) (CT)	30	High DICE	0.81
Lung section (Left S6) (CT)	30	High DICE	0.9
Lung section (Left S8) (CT)	30	High DICE	0.85
Lung section (Left S9) (CT)	30	High DICE	0.73
Lung section (Left S10) (CT)	30	High DICE	0.87
Lung section (Right S1) (CT)	30	High DICE	0.89
Lung section (Right S2) (CT)	30	High DICE	0.89
Lung section (Right S3) (CT)	30	High DICE	0.91
Lung section (Right S4) (CT)	30	High DICE	0.88
Lung section (Right S5) (CT)	30	High DICE	0.85
Lung section (Right S6) (CT)	30	High DICE	0.9
Lung section (Right S7) (CT)	30	High DICE	0.8
Lung section (Right S8) (CT)	30	High DICE	0.84
Lung section (Right S9) (CT)	30	High DICE	0.71
Lung section (Right S10) (CT)	30	High DICE	0.83
Pancreas section (Body) (CT)	29	High DICE	0.91
Pancreas section (Head) (CT)	29	High DICE	0.95
Pancreas section (Tail) (CT)	29	High DICE	0.99
Spleen (CT)	35	High DICE	0.95
Pancreas duct (CT)	29	High DICE	0.74
Pancreas (CT)	30	High DICE	0.86
ROI (CT)*	29	High DICE	0.85
Liver section (S1) (CT)	31	High DICE	0.99
Liver section (S2) (CT)	31	High DICE	0.99
Liver section (S3) (CT)	31	High DICE	0.97
Liver section (S4) (CT)	31	High DICE	0.97
Liver section (S5) (CT)	31	High DICE	0.92
Liver section (S6) (CT)	31	High DICE	0.94
Liver section (S7) (CT)	31	High DICE	0.98
Liver section (S8) (CT)	31	High DICE	0.97
Gall bladder (CT)	37	High DICE	0.92
Bronchus (CT)	30	High DICE	0.87
Lung lobe (Left Lower) (CT)	30	High DICE	0.99
Lung lobe (Left Upper) (CT)	30	High DICE	0.99
Lung lobe (Right Lower) (CT)	30	High DICE	0.99
Lung lobe (Right Middle) (CT)	30	High DICE	0.97
Lung lobe (Right Upper) (CT)	30	High DICE	0.99
Pulmonary Arteries (CT)	30	High DICE	0.83
Pulmonary Veins (CT)	30	High DICE	0.85
Pancreas vessel (CT)	30	High DICE	0.9
Prostate (MRI)	30	High DICE	0.9
Rectal ROI (tumor) (MRI)*	27	High DICE	0.75
Ureter (T2) (MRI)	33	High DICE	0.63
Bladder (MRI)	35	High DICE	0.93
Pelvis (MRI)	34	High DICE	0.94
Seminal vesicle (MRI)	32	High DICE	0.7
Ureter (T1Dynamic) (MRI)	33	High DICE	0.76
Prostate tumor (DWI) (MRI)*	36	High DICE	0.65
Prostate tumor (T2) (MRI)*	39	High DICE	0.6
Kidney tumor (MRI)*	31	High DICE	0.88
Left Kidney (MRI)	31	High DICE	0.97
Right Kidney (MRI)	31	High DICE	0.98
ROI (MRI)*	133	High DICE	0.72
Rectal muscularis propria (MRI)	32	High DICE	0.91
Mesorectum (MRI)	32	High DICE	0.9
Pelvic vessel (Artery) (MRI)	30	High DICE	0.81
Pelvic vessel (Vein) (MRI)	30	High DICE	0.8
Kidney vessel (Artery) (MRI)	32	High DICE	0.92
Kidney vessel (Vein) (MRI)	32	High DICE	0.86
Pelvic nerve (MRI)	30	High DICE	0.7
Levator ani muscle (MRI)	30	High DICE	0.77
Overall (Total cases)	1086	Consistent and Acceptable Performance	Range of 0.60 to 0.99 (Average DICE)

Note: For items marked with an asterisk (*), the extraction is performed semi-automatically. All others are executed automatically. The acceptance criterion is "High DICE," as no specific quantitative threshold is given, but the reported values generally indicate good agreement. "Additional distance based metrics 95% Hausdorff Distance and Mean Surface Distance were also reported along with the subgroup analysis. Detailed results are reported in the labeling."

Study that Proves the Device Meets Acceptance Criteria

The study described is a performance testing for the new deep-learning-based automatic or semi-automatic organ extraction functions.

Sample size used for the test set and the data provenance:
- Sample Size: 1086 cases were collected for performance testing.
- Data Provenance: The data was collected newly from US patient populations across various regions: US_East (295 cases), US_Midwest (175 cases), US_Southeast (185 cases), US_Southwest (73 cases), and US_Northwest (4 cases). This indicates a prospective data collection specifically for this testing, originating from the US. The text also mentions the test data is "independence from training data."
- Demographics: The test set included 672 men, 414 women, and a range of ages from 22 to 120+ years old. Modalities covered CT and MRI from various major manufacturers (SIEMENS, GE, PHILIPS, CANON, FUJIFILM).
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document does not specify the number of experts or their qualifications used to establish the ground truth. It only states that the performance testing used an "average DICE" score, implying a comparison against some form of expertly derived ground truth.
Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- The document does not specify any adjudication method for establishing the ground truth.
If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- No, an MRMC comparative effectiveness study was not explicitly mentioned or performed as part of this submission for demonstrating substantial equivalence. The clinical testing mentioned focused on the standalone performance of the new deep learning features (i.e., automatic or semi-automatic segmentation accuracy) rather than human reader performance with or without AI assistance. The submission states, "The subject of this 510(k) notification, Synapse 3D Base Tools does not require clinical studies to support safety and effectiveness of the software."
If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance evaluation was done for the automatic (and semi-automatic) deep learning segmentation functions. The Dice Similarity Coefficient (DICE) scores provided are a measure of the algorithm's performance in segmenting anatomical structures compared to a ground truth, without human intervention in the segmentation process itself, although some extractions are noted as "semi-automatic" where human interaction would refine the output.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The text implies the ground truth for the segmentation tasks was established by expert consensus/manual annotation (as DICE is a metric comparing algorithmic output to a reference segmentation, typically derived from expert outlines). However, the specific method (e.g., single expert, multi-expert consensus) is not detailed. It mentions "Additional distance based metrics 95% Hausdorff Distance and Mean Surface Distance were also reported," which are also used for comparing segmentation masks to a ground truth.
The sample size for the training set:
- The document does not explicitly provide the sample size for the training set. It only states that the test data was "independence from training data," implying a separate training dataset was used.
How the ground truth for the training set was established:
- The document does not provide details on how the ground truth for the training set was established. However, for deep learning segmentation, it is typically established through manual annotation by qualified experts.

Ask a Question

Ask a specific question about this device

K Number

K242925

Device Name

MR Contour DL

Manufacturer

GE HealthCare

Date Cleared

2025-04-01

(189 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K220598,K213976,K230082

Predicate For

N/A

Intended Use

MR Contour DL generates a Radiotherapy Structure Set (RTSS) DICOM with segmented organs at risk which can be used by trained medical professionals. It is intended to aid in radiation therapy planning by generating initial contours to accelerate workflow for radiation therapy planning. It is the responsibility of the user to verify the processed output contours and user-defined labels for each organ at risk and correct the contours/labels as needed. MR Contour DL is intended to be used with images acquired on MR scanners, in adult patients.

Device Description

MR Contour DL is a post processing application intended to assist a clinician by generating contours of organ at risk (OAR) from MR images in the form of a DICOM Radiotherapy Structure Set (RTSS) series. MR Contour DL is designed to automatically contour the organs in the head/neck, and in the pelvis for Radiation Therapy (RT) planning of adult cases. The output of the MR Contour DL is intended to be used by radiotherapy (RT) practitioners after review and editing, if necessary, and confirming the accuracy of the contours for use in radiation therapy planning.

MR Contour DL uses customizable input parameters that define RTSS description, RTSS labeling, organ naming and coloring. MR Contour DL does not have a user interface of its own and can be integrated with other software and hardware platforms. MR Contour DL has the capability to transfer the input and output series to the customer desired DICOM destination(s) for review.

MR Contour DL uses deep learning segmentation algorithms that have been designed and trained specifically for the task of generating organ at risk contours from MR images. MR Contour DL is designed to contour 37 different organs or structures using the deep learning algorithms in the application processing workflow.

The input of the application is MR DICOM images in adult patients acquired from compatible MR scanners. In the user-configured profile, the user has the flexibility to choose both the covered anatomy of input scan and the specific organs for segmentation. The proposed device has been tested on GE HealthCare MR data.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) clearance letter for MR Contour DL:

1. Table of Acceptance Criteria and Reported Device Performance

Device: MR Contour DL

Metric	Organ Anatomy Region	Acceptance Criteria	Reported Performance (Mean)	Outcome
DICE Similarity Coefficient (DSC)	Small Organs (e.g., chiasm, inner-ear)	≥ 50%	67.4% - 98.8% (across all organs)	Met
	Medium Organs (e.g., brainstem, eye)	≥ 65%	79.6% - 95.5% (across relevant organs)	Met
	Large Organs (e.g., bladder, head-body)	≥ 80%	90.3% - 99.3% (across relevant organs)	Met
95th percentile Hausdorff Distance (HD95) Comparison	All Organs	Improved or Equivalent to Predicate Device	Improved or Equivalent in 24/28 organs analyzed; average HD95 of 4.7 mm (< predicate average)	Met
Likert Score (Reader Study)	All Organs	Mean Likert Score ≥ 3.0 (where 3 = good, some correction needed)	3.0 - 4.5 (across all organs)	Met

Note: The HD95 values for specific organs are provided in Table 4 of the document, showing individual comparisons (Improved, Not-Improved, Equivalent, N/A). The overall performance for HD95 is summarized as met based on the text "improved or equivalent HD95 value in 24/28 of the organs analyzed and an average HD95 performance of 4.7 mm, which is smaller than the average corresponding HD95 values of the predicate device."

2. Sample Sizes and Data Provenance

Test Set (Non-Clinical/Bench Testing):
- Total Cases: 105 retrospectively collected exams.
  - Head/Neck: 50 cases (23 from independently collected cohorts, 27 separated from development data)
  - Pelvis: 55 cases (32 from independently collected cohorts, 23 separated from development data)
- Data Provenance:
  - Country of Origin: USA (72% Head/Neck, 58% Pelvis) and Europe (NL 28% Head/Neck, UK 42% Pelvis)
  - Retrospective/Prospective: Retrospectively collected
Test Set (Clinical/Reader Study):
- Total Cases: 70 cases (a subset of the non-clinical test data).
  - Head/Neck: 30 cases
  - Pelvis: 40 cases
- Data Provenance: Same as non-clinical testing, as it was a subset.
Training Set: Not explicitly stated. The document mentions "separated from the development data cohorts before the models were trained," implying a training set existed but its size is not given.

3. Number of Experts and Qualifications for Ground Truth (Test Set)

Number of Experts: Three (3) board-certified radiation oncologists.
Qualifications: Two (2) from the USA, one (1) from Europe. All certified radiation oncologists. Experience level (e.g., "10 years experience") is not specified beyond "board certified."

4. Adjudication Method (Test Set)

Non-Clinical/Bench Testing (Ground Truth Generation):
1. Manual contours delineated by GEHC operators trained using international guidelines (DAHANCA, RTOG).
2. Manual contours were revised (corrected and approved) by the three board-certified radiation oncologists.
3. All three independently validated ground-truth contours were incorporated in the performance evaluation. This suggests a form of consensus or a voting mechanism, but the exact "adjudication" (e.g., 2+1, averaging) is not detailed. It implies that the final ground truth was derived from the combination of all three expert reviews.
Clinical/Reader Study:
- Automated contours were scored by the three certified radiation oncologists.
- Readers completed their assessments independently and were blinded to the results of other readers' assessments.
- All three independently provided Likert Scores were incorporated in the performance evaluation. Similar to ground truth generation, the exact method of combining scores beyond "incorporating" is not specified, but the final reported value is a "Likert score MEAN."

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

Was it done? Yes, a multi-reader study was conducted to assess the adequacy of the contours. The readers were radiologists providing an assessment (Likert score) of the AI-generated contours.
Effect Size of Human Readers' Improvement with AI vs. without AI Assistance: This study was structured to evaluate the adequacy of AI-generated contours for use in RT planning, with human readers providing an assessment of these pre-generated AI contours. It was not a comparative effectiveness study designed to measure the improvement in human reader performance when assisted by AI versus unassisted human performance (e.g., human-only contouring vs. AI-assisted human contouring). The study aimed to show that the AI's output is acceptable for human review and correction, not how much faster or more accurate humans become with the AI.

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance)

Was it done? Yes, the "Non-Clinical Testing" or "Bench Testing" section directly assesses the algorithm's standalone performance using DSC and HD95 metrics, comparing its output to expert-generated ground truth. The algorithm generates the initial contours, which are then evaluated for accuracy against the established ground truth.

7. Type of Ground Truth Used

Non-Clinical/Bench Testing: Expert consensus (manual contours by trained operators, revised and approved by three board-certified radiation oncologists).
Clinical/Reader Study: Expert opinion/assessment (Likert scores provided by three board-certified radiation oncologists on the adequacy of the AI-generated contours).

8. Sample Size for the Training Set

The sample size for the training set is not explicitly provided in the document. It only states that the test data cases (27 head/neck, 23 pelvis) were "separated from the development data cohorts before the models were trained."

9. How the Ground Truth for the Training Set was Established

The method for establishing ground truth for the training set is not explicitly detailed. It can be inferred that it followed a similar process to the test set ground truth (manual contouring by trained operators, potentially reviewed by experts), as it mentions "development data cohorts," but the specifics are absent.

Ask a Question

Ask a specific question about this device

K Number

K221305

Device Name

AI-Rad Companion Organs RT

Manufacturer

Siemens Medical Solutions USA, Inc

Date Cleared

2022-10-14

(162 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K213976,K193562

Predicate For

K232928

Intended Use

AI-Rad Companion Organs RT is a post-processing software intended to automatically contour DICOM CT imaging data using deep-learning-based algorithms.

Contours that are generated by AI-Rad Companion Organs RT may be used as input for clinical workflows including external beam radiation therapy treatment planning. AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept contours generated by AI-Rad Companion Organs RT.

The output of AI-Rad Companion Organs RT in the format of RTSTRUCT objects are intended to be used by trained medical professionals.

The software is not intended to automatically detect or contour lesions. Only DICOM images of adult patients are considered to be valid input.

Device Description

AI-Rad Companion Organs RT is a post-processing software used to automatically contour DICOM CT imaging data using deep-learning-based algorithms. AI-Rad Companion Organs RT contouring workflow supports CT input data and produces RTSTRUCT outputs. The configuration of the organ database and organ templates defining the organs and structures to be contoured based on the input DICOM data is managed via a configuration interface. Contours that are generated by AI-Rad Companion Organs RT may be used as input for clinical workflows including external beam radiation therapy treatment planning.

The output of AI-Rad Companion Organs RT, in the form of RTSTRUCT objects, are intended to be used by trained medical professionals. The output of AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept contours generated by AI-Rad Companion Organs RT application.

At a high-level, AI-Rad Companion Organs RT includes the following functionality:

1. Automated contouring of Organs at Risk (OAR) workflow
- a. Input -DICOM CT
- b. Output DICOM RTSTRUCT
1. Organ Templates configuration (incl. Organ Database)
1. Web-based preview of contouring results to accept or reject the generated contours

AI/ML Overview

Here's a breakdown of the acceptance criteria and study details for the AI-Rad Companion Organs RT device, based on the provided text:

1. Table of Acceptance Criteria & Reported Device Performance:

Validation Testing Subject	Acceptance Criteria	Reported Device Performance (Median)
Organs in Predicate Device	1. All organs segmented in the predicate device are also segmented in the subject device.	Met (all predicate organs are segmented in the subject device, implied by comparison tables).
	2. The lower bound of the 95th percentile CI of the segmentation (subject device) is greater than 0.1 Dice lower than the mean of the predicate device segmentation.	DICE: Subject: 0.85 (CI: [80.23, 84.61]) vs. Predicate: 0.85 (implied CI close to median). The statement "performance of the subject device and predicate device are comparable in DICE and ASSD" implies this criterion was met.
		ASSD: Subject: 0.93 (CI: [0.86, 1.14]) vs. Predicate: 0.94 (implied CI close to median). The statement "performance of the subject device and predicate device are comparable in DICE and ASSD" implies this criterion was met.
Head & Neck Lymph Nodes	1. The overall fail rate of each organ/anatomical structure is smaller than 15%.	Not explicitly stated for each organ/anatomical structure, but generally implied by acceptable DICE and ASSD.
	2. The lower bound of the 95th percentile CI of the segmentation (subject device) is greater than 0.1 Dice lower than the mean of the reference device segmentation.	DICE: Subject (Head and Neck lymph node class): Avg 81.32 (CI: [80.32, 82.12]) vs. Reference (Pelvic lymph node class): Avg 80. The statement "performance of the subject device for non-overlapping organs is comparable in DICE to the reference device" and the specific values show that 80.32 is not more than 0.1 lower than 80 (it's higher by 0.32), so this criterion appears met.
		ASSD: Subject (Head and Neck lymph node class): Avg 1.06 (CI: [0.99, 1.19]) vs. Reference: N.A. (No direct comparison for ASSD).

Note: The text did not explicitly state the "fail rate" for the Head & Neck Lymph Nodes, only that it should be "smaller than 15%". The conclusion implies all acceptance criteria were met. The confidence intervals for the predicate device's DICE and ASSD are missing in Table 4, but the statement "performance of the subject device and predicate device are comparable" suggests the criteria were acceptable.

2. Sample Size Used for the Test Set and Data Provenance:

Sample Size: N = 113 retrospective performance study on CT data.
- This N=113 is composed of:
  - Cohort A: 73 subjects (14 from Germany, 59 from Brazil)
  - Cohort B: 40 subjects (Canada: 40)
Data Provenance: Multiple clinical sites across North America (Canada) and Europe (Germany, Brazil – often considered part of South America, but grouped with "Europe" in the text for data collection context). The study used previously acquired CT data (retrospective).

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts:

Number of Experts: Not explicitly stated as a specific number. The text mentions "a team of experienced annotators" and "a board-certified radiation oncologist".
Qualifications:
- Annotators: "experienced annotators mentored by radiologists or radiation oncologists".
- Review/Correction: "board-certified radiation oncologist".

4. Adjudication Method for the Test Set:

The ground truth annotations were drawn manually by a team of experienced annotators and then underwent a "quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist". This suggests a method where initial annotations are created by multiple individuals and then reviewed/corrected by a single, highly qualified expert. This could be interpreted as a form of expert review/adjudication.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done:

No, a MRMC comparative effectiveness study was not explicitly stated as having been done. The performance evaluation focused on comparing the AI algorithm's output to expert-generated ground truth and comparing the device's performance to predicate/reference devices, not on how human readers improve with or without AI assistance.

6. If a Standalone (i.e. algorithm only without human-in-the loop performance) was done:

Yes, a standalone performance study was done. The study "validated the AI-Rad Companion Organs RT software from clinical perspective" by evaluating its auto-contouring algorithm, and calculating metrics like DICE coefficients and ASSD against ground truth annotations. The device's output "must be used in conjunction with appropriate software... to review, edit, and accept contours", indicating its standalone output is then reviewed by a human, but the validation of its generation of contours is standalone.

7. The Type of Ground Truth Used:

Expert Consensus/Manual Annotation with Expert Review (following guidelines): "Ground truth annotations were established following RTOG and clinical guidelines using manual annotation. The ground truth annotations were drawn manually by a team of experienced annotators mentored by radiologists or radiation oncologists using an internal annotation tool. Additionally, a quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist..." This indicates a robust expert-derived ground truth.

8. The Sample Size for the Training Set:

160 datasets (for Head & Neck specifically, other organs might have different training data, but this is the only training set sample size provided).

9. How the Ground Truth for the Training Set was Established:

"In both the annotation process for the training and validation testing data, the annotation protocols for the OAR were defined following the NRG/RTOG guidelines. The ground truth annotations were drawn manually by a team of experienced annotators mentored by radiologists or radiation oncologists using an internal annotation tool. Additionally, a quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist using validated medical image annotation tools."
- This is the same process as for the test set, ensuring consistency in ground truth establishment.

Ask a Question

Ask a specific question about this device

K Number

K220598

Device Name

AutoContour Model RADAC V2

Manufacturer

Radformation, Inc.

Date Cleared

2022-08-24

(175 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

K230685

Intended Use

AutoContour is intended to assist radiation treatment planners in contouring structures within medical images in preparation for radiation therapy treatment planning.

Device Description

As with AutoContour RADAC, the AutoContour RADAC V2 device is software that uses DICOM-compliant image data (CT or MR) as input to (1) automatically contour various structures of interest for radiation therapy treatment planning using machine learning based contouring. The deep-learning based structure models are trained using imaging datasets consisting of anatomical orqans of the head and neck, thorax, abdomen and pelvis for adult male and female patients.(2) allow the user to review and modify the resulting contours, and (3) generate DICOM-compliant structure set data the can be imported into a radiation therapy treatment planning system

AutoContour RADAC V2 consists of 3 main components:

1. A .NET client application designed to run on the Windows Operating System allowing the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, as well as review, edit, and export the structure set.
1. A local "agent" service designed to run on the Windows Operating System that is configured by the user to monitor a network storage location for new CT and MR datasets that are to be automatically contoured.
1. A cloud-based automatic contouring service that produces initial contours based on image sets sent by the user from the .NET client application.

AI/ML Overview

The provided text describes the acceptance criteria and the study that proves the device, AutoContour RADAC V2, meets these criteria. Here's a breakdown of the requested information:

1. A table of acceptance criteria and the reported device performance

The acceptance criterion for contouring accuracy is measured by the Mean Dice Similarity Coefficient (DSC), which varies based on the estimated volume of the structure.

Structure Size Category	DSC Acceptance Criteria (Mean)	Reported Device Performance (Mean DSC +/- STD)
Large volume structures	> 0.80	0.94 +/- 0.03
Medium volume structures	> 0.65	0.82 +/- 0.09
Small volume structures	> 0.50	0.61 +/- 0.14

The document also provides detailed DSC results for each contoured structure, which all meet or exceed their respective size category's acceptance criteria. For example, for "A_Aorta" (Large), the reported DSC Mean is 0.91, which is >0.80. For "Brainstem" (Medium), the reported DSC Mean is 0.90, which is >0.65. For "OpticChiasm" (Small), the reported DSC Mean is 0.63, which is >0.50.

2. Sample size used for the test set and the data provenance

CT Test Set:
- Sample Size: An average of 140 test image sets per CT structure model, constituting 20% of the training images. The specific number of test data sets for each CT structure is provided in the table (e.g., A_Aorta: 60, Bladder: 372).
- Data Provenance:
  - Country of Origin: Not explicitly stated, but the patient demographics suggest diverse origins, likely within the US, given the prevalence of specific cancers and racial demographics. The acquisition was done using a Philips Big Bore CT simulator.
  - Retrospective or Prospective: Not explicitly stated, but common in such validation studies, the data is typically retrospective patient data.
  - Demographics: 51.7% male, 48.3% female. Age range: 11-30 (0.3%), 31-50 (6.2%), 51-70 (43.3%), 71-100 (50.3%). Race: 84.0% White, 12.8% Black or African American, 3.2% Other.
  - Clinical Relevance: Data spanned across common radiation therapy treatment subgroups (Prostate, Breast, Lung, Head and Neck cancers).
MR Test Set:
- Sample Size: An average of 16 test image sets per MR structure model. Specific numbers are not provided for each MR structure, but the total validation set for sensitivity and specificity was 16 datasets.
- Data Provenance:
  - Country of Origin: Massachusetts General Hospital, Boston, MA.
  - Retrospective or Prospective: The text states "These training sets consisted primarily of glioblastoma and astrocytoma cases from the Cancer Imaging Archive (TCIA) Glioma data set." and that "The testing dataset was acquired at a different institution using a different scanner and sequence parameters", implying retrospective data collection from existing archives/institutions.
  - Demographics: 56% Male and 44% Female patients, with ages ranging from 20-80. No Race or Ethnicity data was provided.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts

Number of Experts: Three clinically experienced experts.
Qualifications: Two radiation therapy physicists and one radiation dosimetrist.

4. Adjudication method for the test set

Method: Ground truthing of each test dataset was generated manually using consensus (NRG/RTOG) guidelines, as appropriate, by the three clinically experienced experts. This implies a form of expert consensus adjudication.

5. If a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done, if so, what was the effect size of how much human readers improve with AI vs without AI assistance

MRMC Study: No, an MRMC comparative effectiveness study involving human readers with and without AI assistance was not conducted. The performance data focuses on the software's standalone accuracy (Dice Similarity Coefficient, sensitivity, and specificity). The text states: "As with the Predicate Device, no clinical trials were performed for AutoContour RADAC V2."

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done

Standalone Performance: Yes, the primary performance evaluation provided is for the software's standalone performance, measured by the Dice Similarity Coefficient (DSC), sensitivity, and specificity of the auto-generated contours against expert-established ground truth. The study explicitly states, "Further tests were performed on independent datasets from those included in training and validation sets in order to validate the generalizability of the machine learning model." This is a validation of the algorithm's performance.

7. The type of ground truth used

Type of Ground Truth: Expert consensus of manually contoured structures, established using NRG/RTOG (Radiation Therapy Oncology Group) guidelines. This is a form of expert consensus.

8. The sample size for the training set

CT Training Set: An average of 700 training image sets per CT structure model. The specific number of training data sets for each CT structure is provided in the table (e.g., A_Aorta: 240, Bladder: 1000).
MR Training Set: An average of 81 training image sets for MR structure models.

9. How the ground truth for the training set was established

The document implies that the ground truth for the training set was also established manually, similar to the test set, as it states "Datasets used for testing were removed from the training dataset pool before model training began, and used exclusively for testing." It is standard practice for medical imaging AI to train on expertly contoured data. While not explicitly detailed for the training set, the consistency in ground truth methodology for both training and testing in such submissions suggests expert manual contouring based on established guidelines would have been used for training as well.
- Source for MR Training Data: Primarily glioblastoma and astrocytoma cases from The Cancer Imaging Archive (TCIA) Glioma data set.

Ask a Question

Ask a specific question about this device

Page 1 of 1