Seg Pro V3 is a software device intended to assist trained radiation oncology professionals, including, but not limited to, radiation oncologists, medical physicists, and dosimetrists, during their clinical workflows of radiation therapy treatment planning by providing initial contours of organs at risk on DICOM images. Seg Pro V3 is intended to be used on adult patients only.
The contours are generated by deep-learning algorithms and then transferred to radiation therapy treatment planning systems. Seg Pro V3 must be used in conjunction with a DICOM-compliant treatment planning system to review and edit results generated. Seg Pro V3 is not intended to be used for decision making or to detect lesions.
Seg Pro V3 is an adjunct tool and is not intended to replace a clinician's judgment and manual contouring of the normal organs on DICOM images. Clinicians must not use the software generated output alone without review as the primary interpretation.
The proposed device, Seg Pro V3, is a standalone software that is designed to be used by trained radiation oncology professionals to automatically delineate (segment/contour) organs-at-risk (OARs) on DICOM images. This auto-contouring of OARs is intended to facilitate radiation therapy workflows.
The device receives images in DICOM format as input and automatically generates contours of OARs, which are stored in DICOM format in the RTSTRUCT modality. The device must be used in conjunction with a DICOM-compliant treatment planning system (TPS) to review and edit results. Once data is routed to Seg Pro V3, it is processed automatically; no user interaction is required or provided.
The recommended deployment environment is a local network with an existing hospital-grade IT system in place. Seg Pro V3 should be installed on a specialized server that supports deep-learning processing. Configuration is performed only by the manufacturer and covers:
- Local network setting of input and output destinations.
- Presentation of labels and their color.
- Processed image management and output (RTSTRUCT) file management.
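The configurable items above might look like the following fragment. This is purely illustrative: the clearance letter does not disclose Seg Pro V3's actual configuration format, and all AE titles, hosts, ports, label names, and retention values here are invented for the example.

```json
{
  "network": {
    "input":  { "ae_title": "SEGPRO_IN", "host": "10.0.0.12", "port": 104 },
    "output": { "ae_title": "TPS_STORE", "host": "10.0.0.20", "port": 11112 }
  },
  "labels": [
    { "name": "Heart",      "color": "#FF0000" },
    { "name": "SpinalCord", "color": "#00FF00" }
  ],
  "retention": { "processed_images_days": 30, "rtstruct_days": 30 }
}
```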
Here's an analysis of the acceptance criteria and study proving the device meets those criteria, based on the provided FDA 510(k) clearance letter for Seg Pro V3 (RT-300):
Acceptance Criteria and Reported Device Performance
| Acceptance Criteria (Metric) | Threshold (for large, medium, small volume structures) | Reported Device Performance (Mean DSC for respective sizes) |
|---|---|---|
| Dice Similarity Coefficient (DSC) | > 0.80 for large-volume structures | 0.90 |
| Dice Similarity Coefficient (DSC) | > 0.65 for medium-volume structures | 0.86 |
| Dice Similarity Coefficient (DSC) | > 0.50 for small-volume structures | 0.73 |
| Overall Mean DSC | (N/A - overall performance reported) | 0.85 |
| Overall Median 95% Hausdorff Distance (HD) | (N/A - overall performance reported) | 2.62 mm |
| Median 95% HD for large-volume structures | (N/A - specific threshold not defined) | 3.01 mm |
| Median 95% HD for medium-volume structures | (N/A - specific threshold not defined) | 2.57 mm |
| Median 95% HD for small-volume structures | (N/A - specific threshold not defined) | 2.27 mm |
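For reference, the two metrics in the table can be computed on binary segmentation masks as sketched below. This is a minimal voxel-based implementation assuming NumPy/SciPy; the submission's exact method is not disclosed, and clinical tools usually compute HD95 on surface points, so values may differ slightly.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(a, b):
    """Dice Similarity Coefficient between two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hd95(a, b):
    """Simplified symmetric 95th-percentile Hausdorff distance (voxel units).

    Uses Euclidean distance transforms: for every voxel of one mask, the
    distance to the nearest voxel of the other mask, pooled symmetrically.
    """
    a, b = a.astype(bool), b.astype(bool)
    d_to_b = distance_transform_edt(~b)  # distance of each voxel to mask b
    d_to_a = distance_transform_edt(~a)
    return float(np.percentile(np.concatenate([d_to_b[a], d_to_a[b]]), 95))
```

As a sanity check, two 10x10 squares offset by two columns overlap in 80 voxels, giving DSC = 2*80/200 = 0.80.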
Study Details Proving Device Meets Acceptance Criteria
2. Sample size used for the test set and the data provenance:
- Sample Size: 175 cases.
- Data Provenance: Consecutively collected from the Cancer Imaging Archive (TCIA) datasets. The data was acquired independently from product development training and internal testing. Race and ethnic distribution within the study data patient population was unavailable.
- Geographic Origin (inferred): TCIA is primarily a US-based resource, so data is likely from the United States or a diverse international collection.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Three.
- Qualifications of Experts: Board-certified radiation oncologists.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- Adjudication Method: "Each OAR contour used as ground truth (GT) was independently generated by three board-certified radiation oncologists." This suggests the ground truth was defined by agreement among all three experts, effectively a three-way consensus. The document does not explicitly state a formal adjudication scheme such as 2+1.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- MRMC Study: No. The study primarily evaluated the standalone performance of the AI algorithm. The clinical validation mentions that Seg Pro V3 "operates as intended within a clinical workflow and supports its intended use as an adjunct tool," but it does not present data from an MRMC study comparing human reader performance with and without AI assistance.
6. If a standalone (i.e., algorithm-only, without human-in-the-loop) performance evaluation was done:
- Standalone Performance: Yes. "a standalone performance evaluation was conducted to assess the Organ-at-Risk (OAR) contouring capabilities of Seg Pro V3. The observed results indicated that Seg Pro V3 by itself, in the absence of any interaction with a clinician, can contour developed OARs with satisfactory results." The reported DSC and HD metrics are from this standalone evaluation.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Ground Truth Type: Expert consensus. The ground truth (GT) for each OAR contour was "independently generated by three board-certified radiation oncologists."
8. The sample size for the training set:
- The document explicitly states that the 175 cases used for the standalone performance evaluation were "acquired independently from product development training and internal testing." However, the document does not specify the sample size of the training set used to develop the deep learning models.
9. How the ground truth for the training set was established:
- The document does not specify how the ground truth for the training set was established. It only describes the ground truth establishment for the test set.
It is used by the radiation oncology department to segment CT images and to generate the information needed for treatment planning, treatment evaluation, and treatment adaptation.
The proposed device, AccuContour 4.0 Family, is a standalone software with the following variants: AccuContour and AccuContour-Lite. The functions of AccuContour-Lite are a subset of those of AccuContour.
AccuContour:
It is used by the oncology department to register multi-modality images and segment (non-contrast) CT images, to generate the information needed for treatment planning, treatment evaluation, and treatment adaptation.
The product has three image processing functions:
- Deep learning contouring: it can automatically contour organs-at-risk in the head and neck, thorax, abdomen, and pelvis (for both male and female) areas,
- Automatic registration: rigid and deformable registration, and
- Manual contouring.
It also has the following general functions:
- Receive, add/edit/delete, transmit, input/export, medical images and DICOM data;
- Patient management;
- Review tool of processed images;
- Extension tool;
- Plan evaluation and plan comparison;
- Dose analysis.
AccuContour-Lite:
It is used by the oncology department to segment (non-contrast) CT images, to generate the information needed for treatment planning, treatment evaluation, and treatment adaptation.
The product has one image processing function:
- Deep learning contouring: it can automatically contour organs-at-risk in the head and neck, thorax, abdomen, and pelvis (for both male and female) areas.
It also has the following general functions:
- Receive, add/edit/delete, transmit, input/export, medical images and DICOM data;
- Patient management;
- Review tool of processed images.
Here's an analysis of the acceptance criteria and study details for the AccuContour 4.0, extracted and organized from the provided FDA 510(k) clearance letter.
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are derived from the "Pass Criteria" columns in Tables 1, 2, 3, and 4, which specify minimum DSC and maximum HD95 values. The reported device performance is represented by the "Lower Bound 95% CI" for both DSC and HD95, and the "Average Rating" for clinical applicability.
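The letter reports performance as the "Lower Bound 95% CI" of each metric but does not state how that bound was computed. A common choice, sketched here purely as an assumption, is the lower endpoint of a two-sided 95% t interval on the per-case scores:

```python
import math
from statistics import mean, stdev
from scipy.stats import t

def ci_lower_bound(values, confidence=0.95):
    """Lower endpoint of a two-sided t confidence interval for the mean.

    Assumption: a t-based interval on per-case DSC/HD95 scores; the actual
    method used in the submission is not disclosed in the clearance letter.
    """
    n = len(values)
    margin = t.ppf(1 - (1 - confidence) / 2, n - 1) * stdev(values) / math.sqrt(n)
    return mean(values) - margin
```

For example, five per-case DSC values of 0.80, 0.90, 0.85, 0.95, and 0.90 have a mean of 0.88 but a 95% lower bound near 0.81, which is the figure that would be compared against the pass criterion.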
Table A: Performance for Synthetic CT (sCT) Contouring Function (Derived from MR Images)
| Organ & Structure | Size | DSC Pass Criteria | HD95 Pass Criteria (mm) | Reported DSC (Lower Bound 95% CI) | Reported HD95 (Lower Bound 95% CI, mm) | Average Rating (1-5) | Meet Criteria? (DSC) | Meet Criteria? (HD95) |
|---|---|---|---|---|---|---|---|---|
| TemporalLobe_L | Medium | 0.65 | N/A | 0.886 | 4.319 (N/A criteria) | 4.5 | Yes | N/A |
| TemporalLobe_R | Medium | 0.65 | N/A | 0.878 | 4.382 (N/A criteria) | 4.6 | Yes | N/A |
| Brain | Large | 0.8 | N/A | 0.986 | 1.877 (N/A criteria) | 4.7 | Yes | N/A |
| BrainStem | Medium | 0.65 | N/A | 0.843 | 4.999 (N/A criteria) | 4.5 | Yes | N/A |
| SpinalCord | Medium | 0.65 | N/A | 0.867 | 3.030 (N/A criteria) | 4.8 | Yes | N/A |
| OpticChiasm | Small | 0.5 | N/A | 0.804 | 4.771 (N/A criteria) | 4.1 | Yes | N/A |
| OpticNerve_L | Small | 0.5 | N/A | 0.822 | 2.235 (N/A criteria) | 4.1 | Yes | N/A |
| OpticNerve_R | Small | 0.5 | N/A | 0.794 | 2.422 (N/A criteria) | 4.2 | Yes | N/A |
| InnerEar_L | Small | 0.5 | N/A | 0.843 | 2.164 (N/A criteria) | 4.2 | Yes | N/A |
| InnerEar_R | Small | 0.5 | N/A | 0.806 | 2.102 (N/A criteria) | 4.4 | Yes | N/A |
| MiddleEar_L | Small | 0.5 | N/A | 0.824 | 3.580 (N/A criteria) | 4.5 | Yes | N/A |
| MiddleEar_R | Small | 0.5 | N/A | 0.792 | 3.700 (N/A criteria) | 4.4 | Yes | N/A |
| Eye_L | Small | 0.5 | N/A | 0.906 | 1.659 (N/A criteria) | 4.8 | Yes | N/A |
| Eye_R | Small | 0.5 | N/A | 0.897 | 1.584 (N/A criteria) | 4.9 | Yes | N/A |
| Lens_L | Small | 0.5 | N/A | 0.836 | 3.368 (N/A criteria) | 4.5 | Yes | N/A |
| Lens_R | Small | 0.5 | N/A | 0.841 | 3.379 (N/A criteria) | 4.2 | Yes | N/A |
| Pituitary | Small | 0.5 | N/A | 0.801 | 2.267 (N/A criteria) | 4.4 | Yes | N/A |
| Mandible | Small | 0.5 | N/A | 0.913 | 1.844 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_L | Small | 0.5 | N/A | 0.830 | 2.819 (N/A criteria) | 4.4 | Yes | N/A |
| TMJ_R | Small | 0.5 | N/A | 0.817 | 2.722 (N/A criteria) | 4.5 | Yes | N/A |
| OralCavity | Medium | 0.65 | N/A | 0.916 | 3.677 (N/A criteria) | 4.7 | Yes | N/A |
| Larynx | Medium | 0.65 | N/A | 0.795 | 2.196 (N/A criteria) | 4.4 | Yes | N/A |
| Trachea | Medium | 0.65 | N/A | 0.870 | 2.452 (N/A criteria) | 4.5 | Yes | N/A |
| Esophagus | Medium | 0.65 | N/A | 0.800 | 2.680 (N/A criteria) | 4.7 | Yes | N/A |
| Parotid_L | Medium | 0.65 | N/A | 0.851 | 2.386 (N/A criteria) | 4.6 | Yes | N/A |
| Parotid_R | Medium | 0.65 | N/A | 0.868 | 2.328 (N/A criteria) | 4.6 | Yes | N/A |
| Submandibular_L | Medium | 0.65 | N/A | 0.833 | 4.920 (N/A criteria) | 4.5 | Yes | N/A |
| Submandibular_R | Medium | 0.65 | N/A | 0.783 | 2.348 (N/A criteria) | 4.3 | Yes | N/A |
| Thyroid | Medium | 0.65 | N/A | 0.803 | 1.911 (N/A criteria) | 4.8 | Yes | N/A |
| BrachialPlexus_L | Medium | 0.65 | N/A | 0.828 | 5.347 (N/A criteria) | 4.4 | Yes | N/A |
| BrachialPlexus_R | Medium | 0.65 | N/A | 0.800 | 5.062 (N/A criteria) | 4.3 | Yes | N/A |
| Lung_L | Large | 0.8 | N/A | 0.968 | 1.635 (N/A criteria) | 4.5 | Yes | N/A |
| Lung_R | Large | 0.8 | N/A | 0.976 | 1.516 (N/A criteria) | 4.7 | Yes | N/A |
| Heart | Large | 0.8 | N/A | 0.959 | 2.496 (N/A criteria) | 4.5 | Yes | N/A |
| Liver | Large | 0.8 | N/A | 0.941 | 2.439 (N/A criteria) | 4.0 | Yes | N/A |
| Kidney_L | Large | 0.8 | N/A | 0.892 | 2.748 (N/A criteria) | 4.7 | Yes | N/A |
| Kidney_R | Large | 0.8 | N/A | 0.895 | 2.797 (N/A criteria) | 4.5 | Yes | N/A |
| Stomach | Large | 0.8 | N/A | 0.782 | 4.754 (N/A criteria) | 4.1 | No* | N/A |
| Pancreas | Medium | 0.65 | N/A | 0.827 | 6.271 (N/A criteria) | 4.0 | Yes | N/A |
| Duodenum | Medium | 0.65 | N/A | 0.815 | 6.447 (N/A criteria) | 4.1 | Yes | N/A |
| Rectum | Medium | 0.65 | N/A | 0.796 | 2.047 (N/A criteria) | 3.9 | Yes | N/A |
| BowelBag | Large | 0.8 | N/A | 0.808 | 7.380 (N/A criteria) | 4.0 | Yes | N/A |
| Bladder | Large | 0.8 | N/A | 0.943 | 2.082 (N/A criteria) | 4.5 | Yes | N/A |
| Marrow | Large | 0.8 | N/A | 0.889 | 1.842 (N/A criteria) | 4.6 | Yes | N/A |
| FemurHead_L | Medium | 0.65 | N/A | 0.950 | 2.261 (N/A criteria) | 4.5 | Yes | N/A |
| FemurHead_R | Medium | 0.65 | N/A | 0.941 | 2.466 (N/A criteria) | 4.6 | Yes | N/A |
*Note: For Stomach, the reported DSC (0.782) is below the pass criteria (0.8). However, the document states, "The results indicate that the auto-segmentation performance of the AccuContour system for sCT images derived from both CBCT and MR modalities meets the requirements for geometric accuracy." This suggests there might be an overall or combined assessment, or other factors led to acceptance despite this single instance. The average clinical rating is 4.1, which is above the threshold of 3.
Table B: Performance for Synthetic CT (sCT) Contouring Function (Derived from CBCT Images)
| Organ & Structure | Size | DSC Pass Criteria | HD95 Pass Criteria (mm) | Reported DSC (Lower Bound 95% CI) | Reported HD95 (Lower Bound 95% CI, mm) | Average Rating (1-5) | Meet Criteria? (DSC) | Meet Criteria? (HD95) |
|---|---|---|---|---|---|---|---|---|
| TemporalLobe_L | Medium | 0.65 | N/A | 0.854 | 3.451 (N/A criteria) | 4.8 | Yes | N/A |
| TemporalLobe_R | Medium | 0.65 | N/A | 0.859 | 3.258 (N/A criteria) | 4.6 | Yes | N/A |
| Brain | Large | 0.8 | N/A | 0.986 | 1.804 (N/A criteria) | 4.7 | Yes | N/A |
| BrainStem | Medium | 0.65 | N/A | 0.903 | 4.678 (N/A criteria) | 4.5 | Yes | N/A |
| SpinalCord | Medium | 0.65 | N/A | 0.869 | 2.088 (N/A criteria) | 4.8 | Yes | N/A |
| OpticChiasm | Small | 0.5 | N/A | 0.795 | 5.252 (N/A criteria) | 4.4 | Yes | N/A |
| OpticNerve_L | Small | 0.5 | N/A | 0.815 | 2.373 (N/A criteria) | 4.2 | Yes | N/A |
| OpticNerve_R | Small | 0.5 | N/A | 0.816 | 2.210 (N/A criteria) | 4.1 | Yes | N/A |
| InnerEar_L | Small | 0.5 | N/A | 0.800 | 2.144 (N/A criteria) | 4.5 | Yes | N/A |
| InnerEar_R | Small | 0.5 | N/A | 0.794 | 2.171 (N/A criteria) | 4.2 | Yes | N/A |
| MiddleEar_L | Small | 0.5 | N/A | 0.800 | 3.301 (N/A criteria) | 4.5 | Yes | N/A |
| MiddleEar_R | Small | 0.5 | N/A | 0.797 | 3.888 (N/A criteria) | 4.5 | Yes | N/A |
| Eye_L | Small | 0.5 | N/A | 0.944 | 1.553 (N/A criteria) | 4.8 | Yes | N/A |
| Eye_R | Small | 0.5 | N/A | 0.941 | 1.678 (N/A criteria) | 4.9 | Yes | N/A |
| Lens_L | Small | 0.5 | N/A | 0.820 | 3.532 (N/A criteria) | 4.5 | Yes | N/A |
| Lens_R | Small | 0.5 | N/A | 0.821 | 3.370 (N/A criteria) | 4.7 | Yes | N/A |
| Pituitary | Small | 0.5 | N/A | 0.802 | 2.496 (N/A criteria) | 4.4 | Yes | N/A |
| Mandible | Small | 0.5 | N/A | 0.870 | 2.227 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_L | Small | 0.5 | N/A | 0.774 | 2.775 (N/A criteria) | 4.3 | Yes | N/A |
| TMJ_R | Small | 0.5 | N/A | 0.800 | 2.791 (N/A criteria) | 4.5 | Yes | N/A |
| OralCavity | Medium | 0.65 | N/A | 0.885 | 3.794 (N/A criteria) | 4.8 | Yes | N/A |
| Larynx | Medium | 0.65 | N/A | 0.793 | 2.827 (N/A criteria) | 4.8 | Yes | N/A |
| Trachea | Medium | 0.65 | N/A | 0.873 | 2.545 (N/A criteria) | 4.5 | Yes | N/A |
| Esophagus | Medium | 0.65 | N/A | 0.800 | 2.811 (N/A criteria) | 4.5 | Yes | N/A |
| Parotid_L | Medium | 0.65 | N/A | 0.891 | 2.415 (N/A criteria) | 4.6 | Yes | N/A |
| Parotid_R | Medium | 0.65 | N/A | 0.894 | 2.525 (N/A criteria) | 4.6 | Yes | N/A |
| Submandibular_L | Medium | 0.65 | N/A | 0.745 | 5.026 (N/A criteria) | 4.8 | Yes | N/A |
| Submandibular_R | Medium | 0.65 | N/A | 0.797 | 2.192 (N/A criteria) | 4.7 | Yes | N/A |
| Thyroid | Medium | 0.65 | N/A | 0.823 | 2.182 (N/A criteria) | 4.8 | Yes | N/A |
| BrachialPlexus_L | Medium | 0.65 | N/A | 0.805 | 3.922 (N/A criteria) | 4.4 | Yes | N/A |
| BrachialPlexus_R | Medium | 0.65 | N/A | 0.823 | 3.529 (N/A criteria) | 4.2 | Yes | N/A |
| Lung_L | Large | 0.8 | N/A | 0.947 | 1.587 (N/A criteria) | 4.5 | Yes | N/A |
| Lung_R | Large | 0.8 | N/A | 0.971 | 1.635 (N/A criteria) | 4.3 | Yes | N/A |
| Heart | Large | 0.8 | N/A | 0.896 | 1.823 (N/A criteria) | 4.5 | Yes | N/A |
| Liver | Large | 0.8 | N/A | 0.914 | 2.595 (N/A criteria) | 4.6 | Yes | N/A |
| Kidney_L | Large | 0.8 | N/A | 0.922 | 2.645 (N/A criteria) | 4.7 | Yes | N/A |
| Kidney_R | Large | 0.8 | N/A | 0.906 | 2.611 (N/A criteria) | 4.5 | Yes | N/A |
| Stomach | Large | 0.8 | N/A | 0.858 | 4.681 (N/A criteria) | 4.2 | Yes | N/A |
| Pancreas | Medium | 0.65 | N/A | 0.822 | 5.548 (N/A criteria) | 4.4 | Yes | N/A |
| Duodenum | Medium | 0.65 | N/A | 0.818 | 5.252 (N/A criteria) | 4.1 | Yes | N/A |
| Rectum | Medium | 0.65 | N/A | 0.797 | 4.253 (N/A criteria) | 4.3 | Yes | N/A |
| BowelBag | Large | 0.8 | N/A | 0.850 | 5.028 (N/A criteria) | 4.0 | Yes | N/A |
| Bladder | Large | 0.8 | N/A | 0.926 | 3.322 (N/A criteria) | 4.7 | Yes | N/A |
| Marrow | Large | 0.8 | N/A | 0.837 | 2.148 (N/A criteria) | 4.7 | Yes | N/A |
| FemurHead_L | Medium | 0.65 | N/A | 0.893 | 1.639 (N/A criteria) | 4.8 | Yes | N/A |
| FemurHead_R | Medium | 0.65 | N/A | 0.927 | 1.807 (N/A criteria) | 4.9 | Yes | N/A |
Table C: Performance for 4DCT Registration Function (Rigid Registration)
| Organ & Structure | Size | DSC Pass Criteria | Reported DSC (Lower Bound 95% CI) | Average Rating (1-5) | Meet Criteria? |
|---|---|---|---|---|---|
| Trachea | Medium | 0.65 | 0.888 | 4.5 | Yes |
| Esophagus | Medium | 0.65 | 0.836 | 4.5 | Yes |
| Lung_L | Large | 0.8 | 0.932 | 4.7 | Yes |
| Lung_R | Large | 0.8 | 0.929 | 4.8 | Yes |
| Lung_All | Large | 0.8 | 0.930 | 4.8 | Yes |
| Heart | Large | 0.8 | 0.917 | 4.6 | Yes |
| SpinalCord | Medium | 0.65 | 0.943 | 4.6 | Yes |
| Liver | Large | 0.8 | 0.888 | 4.6 | Yes |
| Stomach | Large | 0.8 | 0.791 | 4.5 | No* |
| A_Aorta | Large | 0.8 | 0.917 | 4.4 | Yes |
| Spleen | Large | 0.8 | 0.786 | 4.5 | No* |
| Body | Large | 0.8 | 0.995 | 4.9 | Yes |
*Note: For Stomach (0.791) and Spleen (0.786), the reported DSC is below the pass criteria (0.8). However, the document states, "According to the results, the accuracy of 4DCT image registration images meets the requirements and all structure models demonstrating that only minor edits would be required in order to make the structure models acceptable for clinical use." The average clinical rating for both is 4.5, above the threshold of 3.
Table D: Performance for 4DCT Registration Function (Deformable Registration)
| Organ & Structure | Size | DSC Pass Criteria | Reported DSC (Lower Bound 95% CI) | Average Rating (1-5) | Meet Criteria? |
|---|---|---|---|---|---|
| Trachea | Medium | 0.65 | 0.940 | 4.7 | Yes |
| Esophagus | Medium | 0.65 | 0.866 | 4.6 | Yes |
| Lung_L | Large | 0.8 | 0.966 | 4.7 | Yes |
| Lung_R | Large | 0.8 | 0.949 | 4.5 | Yes |
| Lung_All | Large | 0.8 | 0.954 | 4.8 | Yes |
| Heart | Large | 0.8 | 0.931 | 4.6 | Yes |
| SpinalCord | Medium | 0.65 | 0.920 | 4.6 | Yes |
| Liver | Large | 0.8 | 0.936 | 4.5 | Yes |
| Stomach | Large | 0.8 | 0.889 | 4.5 | Yes |
| A_Aorta | Large | 0.8 | 0.947 | 4.6 | Yes |
| Spleen | Large | 0.8 | 0.913 | 4.8 | Yes |
| Body | Large | 0.8 | 0.997 | 4.9 | Yes |
2. Sample Size Used for the Test Set and Data Provenance
- Synthetic CT (sCT) Contouring Function:
- Sample Size: 247 synthetic CT images (116 generated from MR, 131 generated from CBCT).
- Data Provenance:
- Demographic Distribution: 57% male, 43% female. Age distribution: 13% (21-40), 44.1% (41-60), 36.8% (61-80), 6.1% (81-100). Race: 78% White, 12% Black or African American, 10% Others.
- Imaging Equipment: MR images from GE (21.6%), Philips (56.9%), Siemens (21.6%). CBCT images from Varian (58.8%), Elekta (41.2%).
- Retrospective/Prospective: Not explicitly stated, but the description of demographic and equipment distribution from a "sample" indicates retrospective data collection from existing patient records.
- Country of Origin: The racial distribution explicitly mentions "U.S. clinical radiotherapy practice," suggesting the data is primarily from the United States.
- 4DCT Registration Function:
- Sample Size: 30 4DCT image sets.
- Data Provenance:
- Imaging Equipment: Siemens (90.0%), Philips (10.0%) scanners.
- Demographic Distribution: 17 males (56.7%), 13 females (43.3%). Age: 33-82 years, with majority in 51-65 (40.0%) and 66-80 (43.3%) year brackets.
- Image Characteristics: Uniform 3mm slice thickness (100%).
- Sourcing Location: Most images (90.0%) from Drexel Town Square Health Center/Community Memorial Hospital, remainder from Froedtert Hospital.
- Retrospective/Prospective: Not explicitly stated, but implies retrospective data from patient archives of the mentioned hospitals.
- Country of Origin: Based on the hospital names (Drexel Town Square Health Center, Community Memorial Hospital, Froedtert Hospital), the data is from the United States.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: Not explicitly stated. The text mentions "clinical experts evaluate the clinical applicability" and "RTStruct contoured by the professional physician as the gold standard." This implies at least one, and likely multiple, qualified medical professionals.
- Qualifications of Experts: The experts are described as "clinical experts" and "professional physician(s)." Their specific qualifications (e.g., "radiologist with 10 years of experience") are not provided. They are implied to be clinically qualified radiotherapy personnel.
4. Adjudication Method for the Test Set
- Adjudication Method: Not explicitly stated. The ground truth for segmentation is stated to be "RTStruct contoured by the professional physician". For clinical applicability, "clinical experts evaluate the clinical applicability" and assign a 1-5 scale score. This suggests a single expert (or group consensus without specific adjudication rules like 2+1) established the ground truth segmentation, and separate clinical experts evaluated the results. There is no mention of a formal adjudication process for disagreements in ground truth labeling if multiple experts were involved in its creation.
5. Multi Reader Multi Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No.
- Effect Size of Human Improvement (if applicable): Not applicable, as no MRMC study comparing human readers with and without AI assistance was reported. The testing focused solely on the algorithm's performance against expert-generated ground truth and expert evaluation of the algorithm's output.
6. Standalone Performance
- Was a standalone performance study done? Yes. The entire report details the "Performance Test Report on Synthetic CT (sCT) Contouring Function" and "Performance Test Report on 4DCT Registration Function," measuring the algorithm's performance (DSC, HD95) against gold standard contours and qualitative evaluation by clinical experts. This reflects the algorithm's performance independent of human interaction during the contouring process.
7. Type of Ground Truth Used
- Ground Truth: For the synthetic CT contouring and 4DCT registration functions, the ground truth was "RTStruct contoured by the professional physician" (i.e., expert consensus or expert-generated contours).
8. Sample Size for the Training Set
- Training Set Sample Size: Not provided in the document.
9. How the Ground Truth for the Training Set was Established
- Training Set Ground Truth Establishment: Not provided in the document. The document only details the ground truth used for the validation/test set.
The primary function of ARTAssistant is to facilitate image processing with image registration and synthetic CT (sCT) generation in adaptive radiation therapy. This enables users to meticulously design ART plans based on the processed images.
ARTAssistant is a standalone software positioned as an adaptive radiotherapy auxiliary system. It aims to provide a complete solution for assisting the implementation of adaptive radiotherapy, helping hospitals deliver adaptive radiotherapy on ordinary image-guided accelerators. The system is mainly used to assist with the image processing of online adaptive radiotherapy, thereby helping users complete the design of the daily adaptive radiotherapy plan based on the processed images.
The product has three main functions on image processing:
- Automatic registration: rigid and deformable registration, and
- Image conversion: generation of synthetic CT from CBCT or MR, and
- Image contouring: it provides manual contouring of organs-at-risk in the head and neck, thorax, abdomen, and pelvis (for both male and female) areas, with assisted contouring tools.
It also has the following general functions:
- Receive, add/edit/delete, transmit, input/export medical images and DICOM data;
- Patient management;
- Review of processed images.
Here's an analysis of the ARTAssistant device, focusing on its acceptance criteria and the study that proves it meets those criteria, based on the provided FDA 510(k) clearance letter:
There is no specific table of acceptance criteria or reported device performance for ARTAssistant directly included in the provided 510(k) summary. The summary primarily focuses on comparing ARTAssistant's technological characteristics to predicate and reference devices and describes the performance tests conducted rather than explicit pass/fail criteria or quantitative results against those criteria.
However, based on the performance test descriptions, we can infer the intent of the acceptance criteria and how the device performance was evaluated.
Inferred Acceptance Criteria and Reported Device Performance
| Acceptance Criteria Category | Inferred/Stated Acceptance Criteria | Reported Device Performance |
|---|---|---|
| Automatic Rigid Registration | Non-inferiority in Normalized Mutual Information (NMI) and Hausdorff Distance (HD) compared to predicate device K221706. | "NMI and HD values of the proposed device was non-inferiority compares with that of the predicate device." |
| Automatic Deformable Registration | Non-inferiority in Normalized Mutual Information (NMI) and Hausdorff Distance (HD) compared to predicate device K221706. | "NMI and HD values of the proposed device was non-inferiority compares with that of the predicate device." |
| Image Conversion (sCT Generation) - Dosimetric Accuracy | Gamma Pass Rate within the acceptable range of AAPM TG-119 when comparing RTDose and sRTDose. | "Gamma Pass Rate of all test results is within the acceptable range of AAPM TG-119, which demonstrates the accuracy of the image conversion function." |
| Image Conversion (sCT Generation) - Anatomic/Geometric Accuracy | Segmentation results of ROIs on sCT compared to CBCT/MR demonstrate required geometric accuracy (evaluated by Dice similarity coefficient). | "The results indicate that the geometric accuracy of sCT images generated from both CBCT and MR meets the requirements." |
| Software Verification & Validation | Meet user needs and intended use, pass all software V&V tests. | "ARTAssistant passed all software verification and validation tests." |
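For context, a gamma comparison of the kind referenced by AAPM TG-119 combines a dose-difference criterion with a distance-to-agreement (DTA) criterion. Below is a simplified 1D global-gamma sketch assuming NumPy; real evaluations are performed in 3D, and the exact criteria (e.g., 3%/3 mm) used in this submission are not stated in the letter.

```python
import numpy as np

def gamma_pass_rate(ref, measured, spacing_mm, dose_tol=0.03, dta_mm=3.0):
    """Simplified 1D global gamma analysis (default 3%/3 mm).

    ref, measured: dose profiles on the same grid; spacing_mm: grid spacing.
    The dose-difference criterion is normalized to the reference maximum
    (global gamma). A point passes when its minimum gamma value is <= 1.
    """
    positions = np.arange(len(ref)) * spacing_mm
    dose_crit = dose_tol * ref.max()
    gammas = np.empty(len(ref))
    for i in range(len(ref)):
        dist_term = (positions - positions[i]) / dta_mm
        dose_term = (measured - ref[i]) / dose_crit
        gammas[i] = np.sqrt(dist_term**2 + dose_term**2).min()
    return float((gammas <= 1.0).mean())
```

Identical reference and measured profiles yield a pass rate of 1.0, while a profile scaled well outside the dose tolerance drives the rate below 1.0.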
Study Details:
1. Sample Size Used for the Test Set and Data Provenance:
- Automatic Rigid & Deformable Registration Functions:
- Sample Size: Not explicitly stated; the document refers to "multi-modality image sets from different patients" without giving a count of sets or patients.
- Data Provenance: All fixed and moving images were generated in healthcare institutions in the U.S. Retrospective or prospective is not specified, but typically, such datasets are retrospective.
- Image Conversion Function:
- Sample Size: 247 testing image sets.
- Data Provenance: All test images were generated in the U.S. The data provenance is retrospective.
- Patient Demographics: 57% male, 43% female. Ages: 21-40 (13%), 41-60 (44.1%), 61-80 (36.8%), 81-100 (6.1%). Race: 78% White, 12% Black or African American, 10% Other.
- Cancer Types: Covers 6 cancer types (Intracranial tumor, nasopharyngeal carcinoma, esophagus cancer, lung cancer, liver cancer, cervical cancer) with specific distributions for both MR/CT and CBCT/CT test datasets.
- Scanner Models:
- CT: GE (28.3%), Philips (41.7%), Siemens (30%)
- MR: GE (21.6%), Philips (56.9%), Siemens (21.6%)
- CBCT: Varian (58.8%), Elekta (41.2%)
- Slice Thicknesses: Distributed as 1mm (19%), 2mm (22.8%), 2.5mm (17.4%), 3mm (17%), 5mm (23.8%).
2. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts:
- The document does not explicitly state the number of experts or their qualifications used to establish ground truth for the test set.
- For the Image Conversion Dosimetric Accuracy, the AAPM TG-119 method is mentioned, which implies established phantom-based criteria or expert-derived dose distributions as a reference.
- For the Image Conversion Anatomic/Geometric Accuracy (Dice coefficient), the "segmentation results of each ROI on CBCT/MR" were compared, implying these "true" segmentations would likely have been established by qualified medical professionals, but this is not confirmed.
3. Adjudication Method for the Test Set:
- The document does not explicitly state an adjudication method (such as 2+1 or 3+1) for the test set. The evaluation methods described (NMI, HD, Gamma Pass Rate, Dice coefficient) are quantitative metrics compared against either a predicate device's output or established physical/dosimetric accuracy standards.
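The NMI metric referenced above can be computed from the joint intensity histogram of the two images. The submission's exact NMI definition and bin settings are not disclosed; this sketch assumes the common Studholme form NMI = (H(X) + H(Y)) / H(X, Y), which is 2.0 for identical images and approaches 1.0 for independent ones.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability array, ignoring zero bins."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def nmi(x, y, bins=32):
    """Normalized mutual information via a joint intensity histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())
```

In a registration test like the one described, NMI would be computed between the fixed image and the transformed moving image, with higher values indicating better intensity alignment.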
4. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done:
- No, an MRMC comparative effectiveness study was not explicitly mentioned or performed.
- The performance tests focused on the algorithm's standalone performance in comparison to either a predicate device's algorithm or established accuracy standards, not on how human readers improve with AI assistance.
5. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done:
- Yes, a standalone performance evaluation was conducted. The described "Performance Test Report on Rigid Registration Function," "Performance Test Report on Deformable Registration Function," and "Performance Test Report on Image Conversion Function" all relate to the algorithm's direct output and quantitative measurements without human intervention being part of the primary performance evaluation.
6. The Type of Ground Truth Used:
- For Rigid and Deformable Registration: The ground truth for comparison was the performance metrics (NMI and HD) of the predicate device (AccuContour, K221706). This indicates a comparative ground truth rather than an absolute biological or pathological ground truth.
- For Image Conversion (Dosimetric Accuracy): The ground truth was based on the AAPM TG-119 method, implying a phantom-based or established dosimetric standard against which the sRTDose was compared to the RTDose derived from true CT.
- For Image Conversion (Anatomic/Geometric Accuracy): The ground truth was the segmentation results of ROIs on the original CBCT/MR images, against which the segmentations on the sCT images were compared using the Dice similarity coefficient. This suggests expert consensus or manually established contours on the original images as ground truth.
7. The Sample Size for the Training Set:
- For the deep learning model for image conversion: There were 560 training image sets.
- The document does not specify training set sizes for the rigid or deformable registration algorithms.
8. How the Ground Truth for the Training Set Was Established:
- For the deep learning model for image conversion: The document does not explicitly detail how the ground truth for the 560 training image sets was established. For synthetic CT generation, training ground truth typically consists of paired input images (e.g., MR/CBCT) and corresponding reference CT scans, likely derived from clinical data aligned and processed for model training; however, the process of establishing the correctness of these pairs (e.g., precise anatomical alignment, image quality) is not elaborated.
- Data Provenance (Training Set): The training image set source is from China.
(103 days)
TAIMedImg DeepMets is a software device intended to assist trained medical professionals by providing initial object contours on axial T1-weighted contrast-enhanced (T1WI+C) brain magnetic resonance (MR) images to accelerate workflow for radiation therapy treatment planning.
TAIMedImg DeepMets is intended only for patients with known (imaging diagnosed) brain metastases (BM) when cancer cells spread from primary site to the brain. It is not intended to be used with images of other brain tumors or other body parts. The software is intended for use with BM lesions with a diameter of ≥ 10 mm.
TAIMedImg DeepMets uses an artificial intelligence algorithm to contour images and offers automated segmentation for Gross Tumor Volume (GTV) contours of brain metastases. The software is an adjunctive tool and not intended for replacing the users' current standard practice of manual contouring process. All automatic output generated by the software shall be thoroughly reviewed by a trained medical professional prior to delivering any therapy or treatment. The physician retains the ultimate responsibility for making the final diagnosis and treatment decision.
TAIMedImg DeepMets is intended to be used by medical professionals trained in the use of the device.
Only DICOM images of adult patients are considered valid input. DeepMets does not support DICOM images of patients who meet any of the following exclusion criteria:
- (i) presence of prior craniotomy
- (ii) patients with clinical imaging diagnosis of brain tumors other than BM
- (iii) Images with patient motion: excessive motion leading to artifacts that make the scan technically inadequate
Medical professionals must finalize (confirm or modify) the contours generated by TAIMedImg DeepMets, as necessary, using an external platform available at the facility that supports DICOM-RT viewing/editing functions, such as image visualization software and treatment planning system.
TAIMedImg DeepMets is a software application system intended for use in the contouring (segmentation) of brain magnetic resonance (MR) images. The device comprises an AI inference module and a DICOM Radiotherapy Structure Sets (RTSS, or RTSTRUCT) converter module.
The AI inference module consists of image preprocessing, deep learning neural networks, and postprocessing components, and is intended to contour brain metastasis on the axial T1-weighted contrast-enhanced (T1WI+C) MR images. It utilizes deep learning neural networks to generate contours and annotations for the diagnosed brain metastases.
The DICOM RTSS converter module converts the contours, annotations, along with metadata, into a standard DICOM-RTSTRUCT file, making it compatible with radiotherapy treatment planning systems.
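The conversion step described above maps contour polygons into the nested DICOM RT Structure Set hierarchy (StructureSetROISequence, ROIContourSequence, ContourSequence). A minimal sketch of that mapping using plain dictionaries follows; the contour data are hypothetical, and a real converter would emit actual DICOM attributes (including frame-of-reference and referenced SOP instance UIDs) with a library such as pydicom.

```python
def contours_to_rtstruct(rois: dict) -> dict:
    """Map {roi_name: [contour, ...]} (each contour a list of (x, y, z) points
    in mm) into the nested shape of a DICOM RT Structure Set. Illustrative only."""
    structure_set_rois, roi_contours = [], []
    for number, (name, contours) in enumerate(rois.items(), start=1):
        structure_set_rois.append({"ROINumber": number, "ROIName": name})
        roi_contours.append({
            "ReferencedROINumber": number,
            "ContourSequence": [
                {
                    "ContourGeometricType": "CLOSED_PLANAR",
                    "NumberOfContourPoints": len(contour),
                    # ContourData is a flat x1,y1,z1,x2,y2,z2,... list in DICOM.
                    "ContourData": [coord for point in contour for coord in point],
                }
                for contour in contours
            ],
        })
    return {
        "SOPClassUID": "1.2.840.10008.5.1.4.1.1.481.3",  # RT Structure Set Storage
        "Modality": "RTSTRUCT",
        "StructureSetROISequence": structure_set_rois,
        "ROIContourSequence": roi_contours,
    }

rtss = contours_to_rtstruct(
    {"Brainstem": [[(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (5.0, 5.0, 10.0)]]}
)
print(rtss["ROIContourSequence"][0]["ContourSequence"][0]["ContourData"])
# [0.0, 0.0, 10.0, 5.0, 0.0, 10.0, 5.0, 5.0, 10.0]
```

The SOP Class UID shown is the standard identifier for RT Structure Set Storage, which is what makes the output consumable by treatment planning systems.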
Here's a breakdown of the acceptance criteria and study details for TAIMedImg DeepMets, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Device Performance
| Metric | Reported Device Performance (Mean) | 95% Confidence Interval | Acceptance Criteria | Source |
|---|---|---|---|---|
| Lesion-Wise Sensitivity (Se) (%) | 89.97 | (86.51, 93.43) | > 80 | Deep learning |
| False-Positive Rate (FPR) (FPs/case) | 0.354 | (0.215, 0.481) | < 0.5 | Deep learning |
| Dice Similarity Coefficient (DSC) | 0.70 | (0.67, 0.72) | ≥ 0.65 | Estimated |
| Hausdorff Distance (HD) (mm) | 6.66 | (5.86, 7.41) | ≤ 8.0 | Estimated |
| Centroid Distance (CD) (mm) | 1.75 | (1.33, 2.11) | ≤ 2.0 | Estimated |
Note: "Deep learning" in the Source column indicates comparisons to similar FDA-cleared deep learning devices. "Estimated" indicates acceptance criteria were based on literature and clinical justification.
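Lesion-wise sensitivity and false-positive rate differ from voxel-wise metrics: each predicted lesion is matched against ground-truth lesions and then counted. A minimal sketch follows, under the assumption that a predicted lesion counts as a detection when its Dice overlap with a ground-truth lesion exceeds a threshold; the 0.1 threshold and the toy masks are illustrative, as the submission does not state its matching rule.

```python
import numpy as np

def lesion_metrics(gt_lesions, pred_lesions, overlap_thresh=0.1):
    """Per-case lesion-wise sensitivity and false-positive count.
    gt_lesions / pred_lesions: lists of binary masks, one mask per lesion.
    A predicted lesion is a true positive if its Dice overlap with any
    ground-truth lesion exceeds overlap_thresh (matching rule is assumed)."""
    def dice(a, b):
        s = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / s if s else 0.0

    detected, false_positives = set(), 0
    for pred in pred_lesions:
        hits = [i for i, gt in enumerate(gt_lesions) if dice(gt, pred) > overlap_thresh]
        if hits:
            detected.update(hits)
        else:
            false_positives += 1
    sensitivity = len(detected) / len(gt_lesions) if gt_lesions else 1.0
    return sensitivity, false_positives

# Two GT lesions; the model finds one and adds one spurious detection.
gt1 = np.zeros((12, 12), bool); gt1[1:4, 1:4] = True
gt2 = np.zeros((12, 12), bool); gt2[8:11, 8:11] = True
pred_hit = np.zeros((12, 12), bool); pred_hit[1:4, 2:5] = True  # overlaps gt1
pred_fp = np.zeros((12, 12), bool); pred_fp[6:8, 1:3] = True    # overlaps nothing
se, fp = lesion_metrics([gt1, gt2], [pred_hit, pred_fp])
print(se, fp)  # 0.5 1
```

The reported FPR of 0.354 FPs/case corresponds to averaging the per-case false-positive count over all 158 test cases.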
Study Information
2. Sample size used for the test set and the data provenance:
- Sample Size: 158 MRI scans from 158 patients, containing 289 measurable lesions (≥ 10 mm in diameter, as defined by RANO-BM criteria).
- Data Provenance: The test set was an independent U.S. dataset collected from 16 imaging facilities, acquired using scanners from GE, Philips, Siemens, and Toshiba. It was completely independent and not used in any stage of algorithm development. The data is retrospective.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Three (3) clinically experienced radiologists/neuroradiologists.
- Qualifications: "Clinically experienced radiologists/neuroradiologists." Specific years of experience are not mentioned.
4. Adjudication method for the test set:
- Adjudication Method: Ground truth annotations were established by the three experts following consensus NRG/RTOG clinical guidelines, implying consensual agreement among the three.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- The provided document does not mention a multi-reader multi-case (MRMC) comparative effectiveness study evaluating human reader improvement with AI assistance. The performance testing described is a standalone evaluation of the algorithm against expert-defined ground truth.
6. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance testing was conducted. The results in the table above reflect the algorithm's performance without human intervention after the initial contour generation.
7. The type of ground truth used:
- Expert Consensus: Ground truth annotations were manually established based on consensus NRG/RTOG clinical guidelines by three clinically experienced radiologists/neuroradiologists.
8. The sample size for the training set:
- Initial Training: 1,029 patients.
- Further Tuning: 559 patients.
- Total Training/Tuning Sample Size: 1,029 + 559 = 1,588 patients.
9. How the ground truth for the training set was established:
- The document states the initial training dataset was collected from a major medical center in Taiwan between 1993 and 2017. For the further tuning dataset, an additional dataset from a nationwide healthcare database (2018-2019) was used. However, the document does not explicitly describe how the ground truth for the training dataset was established (e.g., by experts, pathology, etc.). It only mentions that the model was "trained on a retrospective dataset...".
(189 days)
MR Contour DL generates a Radiotherapy Structure Set (RTSS) DICOM with segmented organs at risk which can be used by trained medical professionals. It is intended to aid in radiation therapy planning by generating initial contours to accelerate workflow for radiation therapy planning. It is the responsibility of the user to verify the processed output contours and user-defined labels for each organ at risk and correct the contours/labels as needed. MR Contour DL is intended to be used with images acquired on MR scanners, in adult patients.
MR Contour DL is a post processing application intended to assist a clinician by generating contours of organ at risk (OAR) from MR images in the form of a DICOM Radiotherapy Structure Set (RTSS) series. MR Contour DL is designed to automatically contour the organs in the head/neck, and in the pelvis for Radiation Therapy (RT) planning of adult cases. The output of the MR Contour DL is intended to be used by radiotherapy (RT) practitioners after review and editing, if necessary, and confirming the accuracy of the contours for use in radiation therapy planning.
MR Contour DL uses customizable input parameters that define RTSS description, RTSS labeling, organ naming and coloring. MR Contour DL does not have a user interface of its own and can be integrated with other software and hardware platforms. MR Contour DL has the capability to transfer the input and output series to the customer desired DICOM destination(s) for review.
MR Contour DL uses deep learning segmentation algorithms that have been designed and trained specifically for the task of generating organ at risk contours from MR images. MR Contour DL is designed to contour 37 different organs or structures using the deep learning algorithms in the application processing workflow.
The input of the application is MR DICOM images in adult patients acquired from compatible MR scanners. In the user-configured profile, the user has the flexibility to choose both the covered anatomy of input scan and the specific organs for segmentation. The proposed device has been tested on GE HealthCare MR data.
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) clearance letter for MR Contour DL:
1. Table of Acceptance Criteria and Reported Device Performance
Device: MR Contour DL
| Metric | Organ Anatomy Region | Acceptance Criteria | Reported Performance (Mean) | Outcome |
|---|---|---|---|---|
| DICE Similarity Coefficient (DSC) | Small Organs (e.g., chiasm, inner-ear) | ≥ 50% | 67.4% - 98.8% (across all organs) | Met |
| | Medium Organs (e.g., brainstem, eye) | ≥ 65% | 79.6% - 95.5% (across relevant organs) | Met |
| | Large Organs (e.g., bladder, head-body) | ≥ 80% | 90.3% - 99.3% (across relevant organs) | Met |
| 95th percentile Hausdorff Distance (HD95) Comparison | All Organs | Improved or Equivalent to Predicate Device | Improved or Equivalent in 24/28 organs analyzed; average HD95 of 4.7 mm (< predicate average) | Met |
| Likert Score (Reader Study) | All Organs | Mean Likert Score ≥ 3.0 (where 3 = good, some correction needed) | 3.0 - 4.5 (across all organs) | Met |
Note: The HD95 values for specific organs are provided in Table 4 of the document, showing individual comparisons (Improved, Not-Improved, Equivalent, N/A). The overall performance for HD95 is summarized as met based on the text "improved or equivalent HD95 value in 24/28 of the organs analyzed and an average HD95 performance of 4.7 mm, which is smaller than the average corresponding HD95 values of the predicate device."
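The 95th-percentile Hausdorff distance used above is more robust to single outlier points than the maximum Hausdorff distance: it pools every point's distance to the nearest point of the other surface, in both directions, and takes the 95th percentile. A pure-NumPy sketch on 2D point sets follows; real evaluations operate on mm-scaled 3D surface voxels, so the square contours here are only illustrative.

```python
import numpy as np

def hd95(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance between two (N, D) point sets.
    Nearest-neighbor distances from both directions are pooled before taking
    the percentile, making the metric symmetric."""
    # Pairwise Euclidean distances via broadcasting: shape (len(a), len(b)).
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    all_nearest = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(all_nearest, 95))

# Unit square vs. the same square shifted 1 mm in x: shared corners contribute
# distance 0, the far corners distance 1, and the 95th percentile lands on 1.0.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(hd95(square, square + [1.0, 0.0]))
```

Dense surfaces make this O(N·M) distance matrix large; production implementations typically use a KD-tree or distance transform instead, but the metric itself is the same.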
2. Sample Sizes and Data Provenance
- Test Set (Non-Clinical/Bench Testing):
- Total Cases: 105 retrospectively collected exams.
- Head/Neck: 50 cases (23 from independently collected cohorts, 27 separated from development data)
- Pelvis: 55 cases (32 from independently collected cohorts, 23 separated from development data)
- Data Provenance:
- Country of Origin: USA (72% Head/Neck, 58% Pelvis) and Europe (NL 28% Head/Neck, UK 42% Pelvis)
- Retrospective/Prospective: Retrospectively collected
- Test Set (Clinical/Reader Study):
- Total Cases: 70 cases (a subset of the non-clinical test data).
- Head/Neck: 30 cases
- Pelvis: 40 cases
- Data Provenance: Same as non-clinical testing, as it was a subset.
- Training Set: Not explicitly stated. The document mentions "separated from the development data cohorts before the models were trained," implying a training set existed but its size is not given.
3. Number of Experts and Qualifications for Ground Truth (Test Set)
- Number of Experts: Three (3) board-certified radiation oncologists.
- Qualifications: Two (2) from the USA, one (1) from Europe. All certified radiation oncologists. Experience level (e.g., "10 years experience") is not specified beyond "board certified."
4. Adjudication Method (Test Set)
- Non-Clinical/Bench Testing (Ground Truth Generation):
- Manual contours delineated by GEHC operators trained using international guidelines (DAHANCA, RTOG).
- Manual contours were revised (corrected and approved) by the three board-certified radiation oncologists.
- All three independently validated ground-truth contours were incorporated in the performance evaluation. This suggests a form of consensus or a voting mechanism, but the exact "adjudication" (e.g., 2+1, averaging) is not detailed. It implies that the final ground truth was derived from the combination of all three expert reviews.
- Clinical/Reader Study:
- Automated contours were scored by the three certified radiation oncologists.
- Readers completed their assessments independently and were blinded to the results of other readers' assessments.
- All three independently provided Likert Scores were incorporated in the performance evaluation. Similar to ground truth generation, the exact method of combining scores beyond "incorporating" is not specified, but the final reported value is a "Likert score MEAN."
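The reader-study acceptance rule described above (mean Likert score ≥ 3.0 per organ, pooled across the three independent readers) can be sketched as a simple aggregation. The scores below are made-up illustrations, not values from the study.

```python
import statistics

def likert_pass(scores_by_reader: dict, threshold: float = 3.0) -> dict:
    """For each organ (indexed by position), pool the Likert scores from all
    readers and check whether the mean meets the acceptance threshold
    (3 = good, some correction needed, on the study's 1-5 scale)."""
    organs = {}
    for reader_scores in scores_by_reader.values():
        for organ_index, score in enumerate(reader_scores):
            organs.setdefault(organ_index, []).append(score)
    return {i: statistics.mean(s) >= threshold for i, s in organs.items()}

# Hypothetical scores for two organs from three blinded readers.
scores = {"reader_1": [4, 2], "reader_2": [5, 3], "reader_3": [4, 3]}
print(likert_pass(scores))  # {0: True, 1: False}
```

Organ 0 pools to a mean of 13/3 ≈ 4.33 and passes; organ 1 pools to 8/3 ≈ 2.67 and fails, mirroring how a per-organ mean below 3.0 would flag an organ in the reported evaluation.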
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was it done? Yes, a multi-reader study was conducted to assess the adequacy of the contours: three certified radiation oncologists provided Likert-score assessments of the AI-generated contours.
- Effect Size of Human Readers' Improvement with AI vs. without AI Assistance: This study was structured to evaluate the adequacy of AI-generated contours for use in RT planning, with human readers providing an assessment of these pre-generated AI contours. It was not a comparative effectiveness study designed to measure the improvement in human reader performance when assisted by AI versus unassisted human performance (e.g., human-only contouring vs. AI-assisted human contouring). The study aimed to show that the AI's output is acceptable for human review and correction, not how much faster or more accurate humans become with the AI.
6. Standalone (Algorithm Only Without Human-in-the-Loop Performance)
- Was it done? Yes, the "Non-Clinical Testing" or "Bench Testing" section directly assesses the algorithm's standalone performance using DSC and HD95 metrics, comparing its output to expert-generated ground truth. The algorithm generates the initial contours, which are then evaluated for accuracy against the established ground truth.
7. Type of Ground Truth Used
- Non-Clinical/Bench Testing: Expert consensus (manual contours by trained operators, revised and approved by three board-certified radiation oncologists).
- Clinical/Reader Study: Expert opinion/assessment (Likert scores provided by three board-certified radiation oncologists on the adequacy of the AI-generated contours).
8. Sample Size for the Training Set
- The sample size for the training set is not explicitly provided in the document. It only states that the test data cases (27 head/neck, 23 pelvis) were "separated from the development data cohorts before the models were trained."
9. How the Ground Truth for the Training Set was Established
- The method for establishing ground truth for the training set is not explicitly detailed. It can be inferred that it followed a similar process to the test set ground truth (manual contouring by trained operators, potentially reviewed by experts), as it mentions "development data cohorts," but the specifics are absent.
(197 days)
AI-Rad Companion Organs RT is a post-processing software intended to automatically contour DICOM CT and MR pre-defined structures using deep-learning-based algorithms.
Contours that are generated by AI-Rad Companion Organs RT may be used as input for clinical workflows including external beam radiation therapy treatment planning. AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept contours generated by AI-Rad Companion Organs RT.
The outputs of AI-Rad Companion Organs RT are intended to be used by trained medical professionals.
The software is not intended to automatically detect or contour lesions.
AI-Rad Companion Organs RT provides automatic segmentation of pre-defined structures such as Organs-at-risk (OAR) from CT or MR medical series, prior to dosimetry planning in radiation therapy. AI-Rad Companion Organs RT is not intended to be used as a standalone diagnostic device and is not a clinical decision-making software.
CT or MR series of images serve as input for AI-Rad Companion Organs RT and are acquired as part of a typical scanner acquisition. Once processed by the AI algorithms, generated contours in DICOMRTSTRUCT format are reviewed in a confirmation window, allowing clinical user to confirm or reject the contours before sending to the target system. Optionally, the user may select to directly transfer the contours to a configurable DICOM node (e.g., the Treatment Planning System (TPS), which is the standard location for the planning of radiation therapy).
AI-Rad Companion Organs RT must be used in conjunction with appropriate software such as Treatment Planning Systems and Interactive Contouring applications, to review, edit, and accept the automatically generated contours. Then the output of AI-Rad Companion Organs RT must be reviewed and, where necessary, edited with appropriate software before accepting generated contours as input to treatment planning steps. The output of AI-Rad Companion Organs RT is intended to be used by qualified medical professionals, who can perform a complementary manual editing of the contours or add any new contours in the TPS (or any other interactive contouring application supporting DICOM-RT objects) as part of the routine clinical workflow.
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided text:
Acceptance Criteria and Device Performance Study for AI-Rad Companion Organs RT
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria for the AI-Rad Companion Organs RT device, particularly for the enhanced CT contouring algorithm, are based on comparing its performance to the predicate device and relevant literature/cleared devices. The primary metrics used are Dice coefficient and Absolute Symmetric Surface Distance (ASSD).
Table 3: Acceptance Criteria of AIRC Organs RT VA50
| Validation Testing Subject | Acceptance Criteria | Reported Device Performance (Summary) |
|---|---|---|
| Organs in Predicate Device | All organs segmented in the predicate device are also segmented in the subject device. | Confirmed. The device continued to segment all organs previously handled by the predicate. |
| | The average (AVG) Dice score difference between the subject and predicate device is < 3%. | Confirmed. "For existing organs, the average (AVG) Dice score difference between the subject device and predicate device is smaller than 3%." |
| New Organs for Subject Device | The subject device in the selected reference metric has a higher value than the defined baseline value. | Confirmed. "The performance results of the subject device for the new CT organs are comparable to the reference literature & cleared devices. Here equivalence for the new organs is defined such that the selected reference metric has a higher value than the defined baseline." |
Performance Summary of the Subject Device CT Contouring (Overall Average Dice Coefficients)
| Anatomic Region | Avg Dice (%) | Std Dice (%) | 95% CI |
|---|---|---|---|
| Head & Neck | 76.1 | 14.3 | [75.1, 77.2] |
| Head & Neck lymph nodes | 69.3 | 13.9 | [68.7, 70.0] |
| Thorax | 76.9 | 15.8 | [76.2, 77.6] |
| Abdomen | 87.3 | 10.1 | [86.3, 88.2] |
| Pelvis | 85.7 | 9.6 | [85.0, 86.5] |
| Cardiac | 75.6 | 15.1 | [74.1, 77.1] |
Table 4: Detailed Performance Evaluation of the New Organs in the Subject Device (Selected Examples)
| Organ Name | No. | AVG Dice (%) | STD Dice (%) | MED Dice (%) | 95%CI Dice | AVG ASSD (mm) | STD ASSD (mm) | MED ASSD (mm) | 95%CI ASSD |
|---|---|---|---|---|---|---|---|---|---|
| Left Breast | 30 | 90.4 | 3.8 | 91 | [89, 91.8] | 2.4 | 2.2 | 1.8 | [1.5, 3.2] |
| Right Breast | 30 | 90.2 | 3.7 | 90.8 | [88.8, 91.5] | 1.9 | 0.7 | 1.8 | [1.7, 2.2] |
| Bowel Bag | 33 | 95 | 3.6 | 96.5 | [93.7, 96.3] | 1.9 | 1.5 | 1.4 | [1.4, 2.5] |
| Pituitary | 30 | 75.8 | 7.4 | 77 | [73.1, 78.6] | 0.7 | 0.3 | 0.6 | [0.5, 0.8] |
| Brainstem | 30 | 88.4 | 2.5 | 88.8 | [87.5, 89.3] | 1 | 0.3 | 0.9 | [0.9, 1.1] |
| Esophagus | 30 | 85.6 | 4.2 | 86 | [84, 87.2] | 0.6 | 0.3 | 0.6 | [0.5, 0.7] |
| MEDIASTINAL LN 9L | 31 | 38.3 | 21.1 | 42.9 | [30.6, 46.1] | 5.3 | 4.4 | 3.7 | [3.7, 6.9] |
(Note: The full Table 4 from the document provides detailed performance for all 37 new organs. This table includes a selection for illustrative purposes.)
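ASSD (absolute/average symmetric surface distance), used alongside Dice in the table above, averages every surface point's distance to the nearest point of the other surface, pooled over both directions. A pure-NumPy sketch on point sets follows; the toy square contours are illustrative, as real evaluations use mm-spaced surface voxels extracted from the segmentation masks.

```python
import numpy as np

def assd(surface_a: np.ndarray, surface_b: np.ndarray) -> float:
    """Average symmetric surface distance between two (N, D) surface point sets:
    the mean of all nearest-neighbor distances pooled from both directions."""
    d = np.linalg.norm(surface_a[:, None, :] - surface_b[None, :, :], axis=-1)
    return float(np.concatenate([d.min(axis=1), d.min(axis=0)]).mean())

# Identical surfaces give ASSD 0; shifting this sparse square contour by a
# uniform 2 mm in x makes every nearest-neighbor distance 2, so ASSD = 2.0.
contour = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
print(assd(contour, contour))                 # 0.0
print(assd(contour, contour + [2.0, 0.0]))    # 2.0
```

Unlike HD95, which reports a high percentile of the distance distribution, ASSD reports its mean, so it is less sensitive to localized contour errors and complements the overlap-based Dice score.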
2. Sample Sizes and Data Provenance
- Test Set Sample Size:
- CT Contouring Algorithm: N = 579 cases
- MR Contouring Algorithm: The MR algorithm is unchanged from the predicate, so its performance is unchanged. The predicate was validated using 66 cases.
- Data Provenance (CT Contouring Algorithm Test Set):
- Geographic Origin (Overall N=579): Data from multiple clinical sites across North America, South America, Asia, Australia, and Europe.
- Example Cohorts (Table 5: Validation Testing Data Information based on Cohort):
- Cohort A.1 (N=73): Germany (14), Brazil (59)
- Cohort A.2 (N=40): Canada (40)
- Cohort A.3 (N=301): South/North America (184), EU (44), Asia (33), Australia (28), Unknown (12)
- Cohort B (N=165): South/North America (100), EU (51), Asia (6), Australia (3), Unknown (5)
- Retrospective/Prospective: "retrospective performance study on CT data previously acquired for RT treatment planning."
3. Number of Experts Used to Establish Ground Truth and Qualifications
- Number of Experts for Ground Truth: "a team of experienced annotators mentored by radiologists or radiation oncologists" for initial manual annotation. "a board-certified radiation oncologist" performed a quality assessment including review and correction of each annotation. The document does not specify an exact number of individuals for these teams, but describes the roles and qualifications.
- Qualifications of Experts:
- "experienced annotators"
- "radiologists or radiation oncologists" (mentors for annotators)
- "board-certified radiation oncologist" (for quality assessment/review)
4. Adjudication Method for the Test Set
The document describes the ground truth establishment process as: "manual annotation" by experienced annotators mentored by radiologists/radiation oncologists, followed by a "quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist." This indicates a hierarchical review/correction process rather than a multi-reader consensus adjudication between equally-weighted readers (e.g., 2+1 or 3+1). The final accepted contour after the board-certified radiation oncologist's review served as the ground truth.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No MRMC comparative effectiveness study was described. The study focused on the standalone performance of the AI algorithm against established ground truth and comparison with a predicate device and literature. The document does not mention an effect size of how much human readers improve with AI vs. without AI assistance. The intended use specifies that the AI-generated contours must be reviewed, edited, and accepted by trained medical professionals, implying a human-in-the-loop workflow, but the validation study presented focuses on the AI's autonomous segmentation accuracy.
6. Standalone (Algorithm Only) Performance Study
Yes, a standalone performance study was done. The performance metrics (Dice coefficient, ASSD) and the comparison to an expert-established ground truth demonstrate the algorithm's autonomous segmentation capability. The study validated the "autocontouring algorithms" and their performance.
7. Type of Ground Truth Used
The ground truth used for the test set was expert consensus / manual annotation based on clinical guidelines. Specifically: "Ground truth annotations were established following RTOG and clinical guidelines using manual annotation." This was further reviewed and corrected by a board-certified radiation oncologist.
8. Sample Size for the Training Set
The document provides the sample sizes for the training set for new organs introduced:
- Table 6: Training Dataset Characteristics (Examples):
- Lacrimal Glands Left/Right: 247
- Pituitary Gland: 247
- Humeral Head Left/Right: 207
- Bowel Bag: 544
- Pelvic Bone Left/Right: 160
- Sacrum: 160
- Mediastinal LN (various): 136
- Femoral Head Left/Right: 160
- Brainstem: 247
- Esophagus: 247
- Breast Left/Right: 172
- Supraglottic Larynx: 247
- Glottis: 247
The total training set size for all organs is not explicitly summed, but these numbers indicate the scale of the training data used for the specific new organs.
9. How the Ground Truth for the Training Set Was Established
"In both the annotation process for the training and validation testing data, the annotation protocols for the OAR were defined following the applicable guidelines. The ground truth annotations were drawn manually by a team of experienced annotators mentored by radiologists or radiation oncologists using an internal annotation tool. Additionally, a quality assessment including review and correction of each annotation was done by a board-certified radiation oncologist using validated medical image annotation tools."
This indicates the same rigorous process of expert manual annotation and review was applied to establish ground truth for the training set as for the test set. The validation testing and training data were explicitly stated to be independent.
(151 days)
OncoStudio provides deep-learning-based automatic contouring to organs at risk in DICOM-RT format from CT images. This software could be used as an initial contouring for the clinicians to be confirmed by the radiation oncology department for treatment planning or other professions where a segmented mask of organs is needed.
- Deep learning contouring from Head & Neck, Thorax, Abdomen, and Pelvis
- Generates DICOM-RT structure of contoured objects
- Manual Contouring
- Receive, transmit, store, retrieve, display, and process medical images and DICOM objects
OncoStudio is a standalone software that provides deep-learning-based automatic contouring to organs at risk in DICOM-RT format from CT images. This software could be used as an initial contouring for the clinicians to be confirmed by the radiation oncology department for treatment planning or other professions where a segmented mask of organs is needed.
- Deep learning contouring from Head & Neck, Thorax, Abdomen, and Pelvis
- Generates DICOM-RT structure of contoured objects
- Manual Contouring
- Receive, transmit, store, retrieve, display, and process medical images and DICOM objects
It also has the following general functions:
- Patient management;
- Review of processed images;
- Open and Save of files.
Based on the provided text, here's a description of the acceptance criteria and the study that proves the device meets those criteria for OncoStudio (OS-01):
The submission details a standalone performance test conducted to demonstrate the contouring capabilities of OncoStudio, an AI-powered software for automatic organ at risk contouring from CT images. The primary evaluation metric for acceptance was the Dice coefficient (DSC).
1. Acceptance Criteria and Reported Device Performance
The text explicitly states: "For the structures being compared, the mean Dice coefficient (DSC) of structures for each anatomical region (Head & Neck, Thorax, Abdomen, and Pelvis) should meet the established criteria." However, the specific numerical established criteria for the mean Dice coefficient for each anatomical region (Head & Neck, Thorax, Abdomen, and Pelvis) are not reported in the provided document. Similarly, the actual reported device performance (the mean DSC achieved for each region) is not explicitly stated in the visible sections.
To fully answer this, a table would look like this, but with missing data based on the provided text:
| Anatomical Region | Acceptance Criteria (Mean Dice Coefficient) | Reported Device Performance (Mean Dice Coefficient) |
|---|---|---|
| Head & Neck | Not specified in text | Not reported in text |
| Thorax | Not specified in text | Not reported in text |
| Abdomen | Not specified in text | Not reported in text |
| Pelvis | Not specified in text | Not reported in text |
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: 310 CT images.
- 140 images from Yonsei Severance Hospital (Republic of Korea)
- 121 images from OneMedNet (U.S.A.)
- 49 images from University Hospital Basel (Switzerland)
- Data Provenance: The data is from South Korea, U.S.A., and Switzerland. The text specifies it was "collected from Yonsei Severance Hospital (Republic of Korea), OneMedNet (U.S.A.), and University Hospital Basel (Switzerland)". The data from OneMedNet is a "purchased set of CT data, mainly comprised of U.S.A. population." Yonsei Severance Hospital is in South Korea, and the Basel data is known as the TotalSegmentator dataset.
- Retrospective or Prospective: Not explicitly stated, but the description of data collection "from the years 2012, 2016, and 2020 from the University Hospital Basel through picture archiving and communication system (PACS)" implies a retrospective collection for at least part of the dataset.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Three radiation oncologists established the ground truth segmentations for the test set.
- Qualifications of Experts (for Yonsei Severance Hospital and OneMedNet data): The radiation oncologists had "3-20 years of clinical practice," and included "associate professor, assistant professor, and radiation oncologist resident from two institutions (Yonsei Cancer Center, Samsung Seoul Hospital)."
- Qualifications of Experts (for University Hospital Basel data): The ground truth segmentation was "supervised by two physicians with 3 (M.S.) and 6 years (H.B.) of experience in body imaging, respectively." (This refers to the public TotalSegmentator dataset from Basel. For the test set overall, the text states only that "Ground truth segmentations were established by three radiation oncologists following international clinical guidelines," without distinguishing the origin of the test-set ground truth by expert type; the former expert group likely applied to the test set as well.)
4. Adjudication Method for the Test Set
The ground truthing process for the Yonsei Severance Hospital and OneMedNet data (which largely comprises the test set) was:
- "First, the 1 radiation oncologist manually delineated the organs."
- "Second, segmentation results generated by 1 radiation oncologist are sequentially edited and confirmed by 2 radiation oncologists. In this editing process, the first radiation oncologist makes corrections, and the corrected results are received and finalized by another radiation oncologist."
This indicates a sequential review and confirmation process rather than a strict 2+1 or 3+1 consensus, with an initial delineator and then two subsequent reviewers/editors, likely leading to a consensus by the end of the process.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study comparing human readers with AI assistance vs. without AI assistance was not mentioned in the provided text. The study described is a standalone performance test of the algorithm.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
Yes, a standalone performance test was done. The text explicitly states: "A standalone performance test was conducted to compare the contouring capabilities of OncoStudio."
7. The Type of Ground Truth Used
The ground truth used was expert consensus/manual annotation by radiation oncologists/physicians following international clinical guidelines (RTOG and clinical guidelines).
8. The Sample Size for the Training Set
- Total Training Data: 2,128 images.
- 731 images from Yonsei Severance Hospital (Republic of Korea)
- 194 images from OneMedNet (U.S.A.)
- 1,203 images from University Hospital Basel (Switzerland)
The total collected data comprised 2,438 datasets (315 US, 871 Korea, 1,252 Europe). From this, 310 datasets were allocated to the test set, and the remaining 2,128 were used for training.
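As a sanity check on the reported split, the per-site counts can be tallied in a few lines (a minimal sketch; all counts are taken from the summary above, and the dictionary keys are labels chosen here for readability):

```python
# Sanity-check the dataset split reported for Seg Pro V3.
train_by_site = {
    "Yonsei Severance Hospital (Korea)": 731,
    "OneMedNet (U.S.A.)": 194,
    "University Hospital Basel (Switzerland)": 1203,
}
collected_by_region = {"US": 315, "Korea": 871, "Europe": 1252}
test_count = 310

train_total = sum(train_by_site.values())
collected_total = sum(collected_by_region.values())

assert train_total == 2128                          # stated training total
assert collected_total == 2438                      # stated collected total
assert collected_total - test_count == train_total  # 2438 - 310 = 2128
print(train_total, collected_total)  # → 2128 2438
```

The per-site training counts and the per-region collection counts both reconcile with the stated totals, so the reported split is internally consistent.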
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set was established similarly to the test set:
- For Yonsei Severance Hospital (Korea) and OneMedNet (U.S.) data: Established by three radiation oncologists with 3-20 years of clinical practice following RTOG and clinical guidelines using manual annotation. The process involved initial manual delineation by one radiation oncologist, followed by sequential editing and confirmation by two other radiation oncologists.
- For University Hospital Basel (Europe) data (TotalSegmentator dataset): This is public data where ground truth was established by manual segmentation and refinement supervised by two physicians with 3 and 6 years of experience in body imaging.
(27 days)
Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:
· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.
· Segmenting anatomical structures across a variety of CT anatomical locations.
· Segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.
Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.
Contour ProtégéAI+ is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAI+ is deployed on a remote server using the MIMcloud service for data management and transfer, or locally on the workstation or server running MIM software.
Here's a breakdown of Contour ProtégéAI+'s acceptance criteria and study information, based on the provided text:
Acceptance Criteria and Device Performance
The acceptance criteria for each structure's inclusion in the final models were a combination of statistical tests and user evaluation:
| Acceptance Criteria | Reported Device Performance (Contour ProtégéAI+) |
|---|---|
| Statistical non-inferiority of the Dice score compared with the reference predicate (MIM Atlas). | For most structures, the Contour ProtégéAI+ Dice score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined as the lower 95th percentile confidence bound of Contour ProtégéAI+ falling no more than 0.1 Dice below the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence. |
| Statistical non-inferiority of the Mean Distance Accuracy (MDA) score compared with the reference predicate (MIM Atlas). | For most structures, the Contour ProtégéAI+ MDA score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined analogously to the Dice criterion, using the 95th percentile confidence bound of Contour ProtégéAI+ relative to the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence. |
| Average user evaluation of 2 or higher (on a three-point scale: 1=negligible, 2=moderate, 3=significant time savings). | The "External Evaluation Score" (Table 2) consistently shows scores of 2 or higher across all listed structures, indicating moderate to significant time savings. |
| (For models as a whole) Statistically non-inferior cumulative Added Path Length (APL) compared to the reference predicate. | For all 4.2.0 CT models (Thorax, Abdomen, Female Pelvis, SurePlan MRT), equivalence in cumulative APL was demonstrated (Table 3), with Contour ProtégéAI+ showing lower mean APL values than MIM Atlas. |
| (For localization accuracy) No specific passing criterion, but results are included. | Localization accuracy results (Table 4) are provided as percentages of images successfully localized for both "Relevant FOV" and "Whole Body CT," ranging from 77% to 100% depending on the structure and model. |
Note: Cells highlighted in orange in the original document indicate non-demonstrated equivalence (not reproducible in markdown), and cells marked with '**' indicate that equivalence was not demonstrated because the minimum sample size was not met for that contour.
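The non-inferiority rule for Dice scores can be sketched as a simple check. This is an illustrative reconstruction, not MIM's actual statistical code: the 0.1 Dice margin and the use of a 95th percentile confidence bound come from the text above, while the bootstrap method, seed, and per-case scores are assumptions made here.

```python
import random

def lower_conf_bound_95(scores, n_boot=2000, seed=0):
    """Bootstrap the lower 95% confidence bound of the mean score."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(scores) for _ in scores]
        means.append(sum(resample) / len(resample))
    means.sort()
    return means[int(0.05 * n_boot)]  # 5th percentile of bootstrap means

def non_inferior(subject_dice, reference_dice, margin=0.1):
    """Subject passes if its lower confidence bound falls no more than
    `margin` Dice below the reference (atlas) mean."""
    ref_mean = sum(reference_dice) / len(reference_dice)
    return lower_conf_bound_95(subject_dice) > ref_mean - margin

# Hypothetical per-case Dice scores for one structure:
subject = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.94]
reference = [0.85, 0.80, 0.83, 0.86, 0.82, 0.84, 0.81, 0.85]
print(non_inferior(subject, reference))  # → True for these illustrative scores
```

Note the asymmetry of the margin: the subject device only needs its confidence bound to stay within 0.1 Dice of the reference mean, so it can pass while being slightly worse, equal, or better than the atlas.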
Study Details
- Sample size used for the test set and the data provenance:
- Test Set Sample Size: The Contour ProtégéAI+ subject device was evaluated on a pool of 770 images.
- Data Provenance: The images were gathered from 32 institutions. The verification data used for testing came from a set of institutions entirely disjoint from the datasets used to train each model. Patient demographics for the testing data: 53.4% female, 31.3% male, 15.3% unknown; 0.3% ages 0-20, 4.7% ages 20-40, 20.9% ages 40-60, 50.0% ages 60+, 24.1% unknown; varying scanner manufacturers (GE, Siemens, Philips, Toshiba, unknown). The data is retrospective, originating from clinical treatment plans according to the training set description.
- Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document implies that Dice and MDA for both Contour ProtégéAI+ and MIM Maestro were measured against "original ground-truth contours." Expert qualifications are stated explicitly only for the training set ground truth, which suggests a similar standard was applied to the test set.
- Ground truth (for training/re-segmentation) was established by:
- Consultants (physicians and dosimetrists) specifically for this purpose, outside of clinical practice.
- Initial segmentations were reviewed and corrected by radiation oncologists.
- Final review and correction by qualified staff at MIM Software (MD or licensed dosimetrists).
- All segmenters and reviewers were instructed to ensure the highest quality training data according to relevant published contouring guidelines.
- Adjudication method for the test set:
- The document doesn't explicitly describe a specific adjudication method like "2+1" or "3+1" for the test set ground truth. However, it does state that "Detailed instructions derived from relevant published contouring guidelines were prepared for the dosimetrists. The initial segmentations were then reviewed and corrected by radiation oncologists against the same standards and guidelines. Qualified staff at MIM Software (MD or licensed dosimetrists) then performed a final review and correction." This process implies a multi-expert review and correction process to establish the ground truth used for both training and evaluation, ensuring a high standard of accuracy.
- If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- A direct MRMC comparative effectiveness study measuring human readers' improvement with AI versus without AI assistance (i.e., human-in-the-loop performance) is not explicitly described in terms of effect size.
- Instead, the study evaluates the standalone performance of the AI device (Contour ProtégéAI+) against a reference device (MIM Maestro atlas segmentation) and user evaluation of time savings.
- The "Average user evaluation of 2 or higher" on a three-point scale (1=negligible, 2=moderate, 3=significant time savings) provides qualitative evidence of perceived improvement in workflow rather than a quantitative measure of diagnostic accuracy improvement due to AI assistance. "Preliminary user evaluation conducted as part of testing demonstrated that Contour ProtégéAI+ yields comparable time-saving functionality when creating contours as other commercially available automatic segmentation products."
- If a standalone (i.e., algorithm only without human-in-the-loop) performance evaluation was done:
- Yes, a standalone performance evaluation was conducted. The primary comparisons for Dice score, MDA, and cumulative APL are between the Contour ProtégéAI+ algorithm's output and the ground truth, benchmarked against the predicate device's (MIM Maestro atlas segmentation) standalone performance. The results in Table 2 and Table 3 directly show the algorithm's performance.
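The MDA metric used in this standalone comparison is commonly computed as a symmetric mean surface distance between sampled contour boundaries. The sketch below is a simplified 2-D illustration with made-up points (the point lists and function names are assumptions here; a real implementation operates on 3-D contour surfaces, typically with spatial indexing rather than brute-force nearest-neighbor search):

```python
from math import dist

def mean_distance(points_a, points_b):
    """Average distance from each point in A to its nearest point in B."""
    return sum(min(dist(a, b) for b in points_b) for a in points_a) / len(points_a)

def mean_distance_to_agreement(contour_a, contour_b):
    """Symmetric mean surface distance between two sampled contours."""
    return 0.5 * (mean_distance(contour_a, contour_b) +
                  mean_distance(contour_b, contour_a))

# Two hypothetical contours sampled as (x, y) points, one unit apart:
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (1, 1), (2, 1)]
print(mean_distance_to_agreement(a, b))  # → 1.0
```

Unlike Dice, which rewards volumetric overlap, this distance-based metric penalizes boundary deviations directly, which is why the study reports both.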
- The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Expert Consensus Contour (and review): The ground truth was established by expert re-segmentation of images (by consultants, physicians, and dosimetrists) specifically for this purpose, reviewed and corrected by radiation oncologists, and then subjected to a final review and correction by qualified MIM Software staff (MD or licensed dosimetrists). This indicates a robust expert consensus process based on established clinical guidelines.
- The sample size for the training set:
- The document states that the CT images for the "training set were obtained from clinical treatment plans for patients prescribed external beam or molecular radiotherapy". However, it does not provide a specific numerical sample size for the training set, only for the test set (770 images). It only mentions being "re-segmented by consultants... specifically for this purpose".
- How the ground truth for the training set was established:
- The ground truth for the training set was established through a multi-step expert process:
- CT images from clinical treatment plans were re-segmented by consultants (physicians and dosimetrists), explicitly for the purpose of creating training data, outside of clinical practice.
- Detailed instructions from relevant published contouring guidelines were provided to the dosimetrists.
- Initial segmentations were reviewed and corrected by radiation oncologists against the same standards and guidelines.
- A final review and correction was performed by qualified staff at MIM Software (MD or licensed dosimetrists).
- All experts were instructed to spend additional time to ensure the highest quality training data, contouring all specified OAR structures on all images according to referenced standards.
(90 days)
AutoContour is intended to assist radiation treatment planners in contouring and reviewing structures within medical images in preparation for radiation therapy treatment planning.
As with AutoContour Model RADAC V3, the AutoContour Model RADAC V4 device is software that uses DICOM-compliant image data (CT or MR) as input to: (1) automatically contour various structures of interest for radiation therapy treatment planning using machine-learning based contouring (the deep-learning based structure models are trained using imaging datasets consisting of anatomical organs of the head and neck, thorax, abdomen, and pelvis for adult male and female patients), (2) allow the user to review and modify the resulting contours, and (3) generate DICOM-compliant structure set data that can be imported into a radiation therapy treatment planning system.
AutoContour Model RADAC V4 consists of 3 main components:
1. A .NET client application designed to run on the Windows Operating System allowing the user to load image and structure sets for upload to the cloud-based server for automatic contouring, perform registration with other image sets, as well as review, edit, and export the structure set.
2. A local "agent" service designed to run on the Windows Operating System that is configured by the user to monitor a network storage location for new CT and MR datasets that are to be automatically contoured.
3. A cloud-based automatic contouring service that produces initial contours based on image sets sent by the user from the .NET client application.
Here's an analysis of the acceptance criteria and study findings for the Radformation AutoContour (Model RADAC V4) device, based on the provided text:
1. Acceptance Criteria and Reported Device Performance
The primary acceptance criterion for the automated contouring models is the Dice Similarity Coefficient (DSC), which measures the spatial overlap between the AI-generated contour and the ground truth contour. The criteria vary based on the estimated size of the anatomical structure. Additionally, for external clinical testing, an external reviewer rating was used to assess clinical appropriateness.
| Acceptance Criteria Category | Metric (for AI performance) | Performance Criteria (for AI performance) | Reported Device Performance (Mean ± Std Dev) for CT Models | Reported Device Performance (Mean ± Std Dev) for MR Models | Reported Device Performance (Mean External Reviewer Rating 1-5, higher is better) |
|---|---|---|---|---|---|
| Contouring Accuracy (CT Models) | Mean Dice Similarity Coefficient (DSC) | Large Volume Structures: ≥ 0.8 | 0.92 ± 0.06 | 0.96 ± 0.03 | N/A |
| | | Medium Volume Structures: ≥ 0.65 | 0.85 ± 0.09 | 0.84 ± 0.07 | N/A |
| | | Small Volume Structures: ≥ 0.5 | 0.81 ± 0.12 | 0.74 ± 0.09 | N/A |
| Clinical Appropriateness (CT Models) | External Reviewer Rating (1-5 scale) | Average Score ≥ 3 | N/A | N/A | 4.57 (across all CT models) |
| Contouring Accuracy (MR Models) | Mean Dice Similarity Coefficient (DSC) | Large Volume Structures: ≥ 0.8 | N/A | 0.96 ± 0.03 (training data), 0.80 ± 0.09 (external data) | N/A |
| | | Medium Volume Structures: ≥ 0.65 | N/A | 0.84 ± 0.07 (training data), 0.84 ± 0.09 (external data) | N/A |
| | | Small Volume Structures: ≥ 0.5 | N/A | 0.74 ± 0.09 (training data), 0.61 ± 0.14 (external data) | N/A |
| Clinical Appropriateness (MR Models) | External Reviewer Rating (1-5 scale) | Average Score ≥ 3 | N/A | N/A | 4.6 (across all MR models) |
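The DSC metric and the size-tiered pass criteria in the table can be sketched as follows. The voxel sets are invented for illustration, and the tier labels are assumptions: the document states the thresholds per tier but not the volume cut-offs that assign a structure to a tier.

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty contours agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))

def dsc_threshold(size_category):
    # Pass criteria from the table above; tier assignment is up to the reader.
    return {"large": 0.8, "medium": 0.65, "small": 0.5}[size_category]

# Hypothetical voxel sets for an AI contour vs. ground truth:
ai = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
gt = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice(ai, gt)
print(round(score, 3), score >= dsc_threshold("medium"))  # → 0.75 True
```

The looser thresholds for small structures reflect that Dice is volume-sensitive: a fixed boundary error of one or two voxels costs far more overlap on a small organ than on a large one.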
2. Sample Size Used for the Test Set and Data Provenance
- CT Models Test Set:
- Sample Size: For individual CT structure models, the number of testing sets ranged from 10 to 116 for the internal validation (Table 4) and 13 to 82 for the external clinical testing (Table 6). The document states "approximately 10% of the number of training image sets" were used for testing in the internal validation, with an average of 54 testing image sets per CT structure model.
- Data Provenance: Imaging data for training was gathered from 4 institutions in 2 different countries (United States and Switzerland). External clinical testing data for CT models was sourced from various TCIA (The Cancer Imaging Archive) datasets (Pelvic-Ref, Head-Neck-PET-CT, Pancreas-CT-CB, NSCLC, LCTSC, QIN-BREAST) and shared from several unidentified institutions in the United States. Data was retrospective, as it was acquired and then used for model validation.
- MR Models Test Set:
- Sample Size: For individual MR structure models, the number of testing sets was 45 for internal validation (Table 8) and ranged from 5 to 45 for external clinical testing (Table 10). The document reports an average of 45 testing image sets per MR Brain model and 77 testing image sets per MR Pelvis model for internal validation.
- Data Provenance: Imaging data for training and internal testing was acquired from the Cancer Imaging Archive GLIS-RT dataset (for Brain models) and two open-source datasets plus one institution in the United States (for Pelvis models). External clinical testing data for MR models was from a clinical partner (for Brain models), two publicly available datasets (Prostate-MRI-U-S-Biopsy, Gold Atlas Pelvis, SynthRad), and two institutions utilizing MR Linacs for image acquisitions. Data was retrospective.
- General Note: Test datasets were independent from those used for training.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Three (3) experts were used.
- Qualifications of Experts: The ground truth was established by three clinically experienced experts consisting of 2 radiation therapy physicists and 1 radiation dosimetrist.
4. Adjudication Method for the Test Set
- Method: Ground truthing of each test data set was generated manually using consensus (NRG/RTOG) guidelines as appropriate by the three experts. This implies an expert consensus method, likely involving discussion and agreement among the three. The document does not specify a quantitative adjudication method like "2+1" or "3+1" but rather a "consensus" guided by established clinical guidelines.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was Done
- The document does not report an MRMC comparative effectiveness study comparing human readers with AI assistance versus without AI assistance. The study focuses purely on the AI's performance and its clinical appropriateness as rated by external reviewers.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was Done
- Yes, a standalone performance evaluation was done. The core of the performance data presented (Dice Similarity Coefficient) is a measure of the algorithm's direct output compared to the ground truth, without a human in the loop during the contour generation phase. The external reviewer ratings also assess the standalone performance of the AI-generated contours regarding their clinical utility for subsequent editing and approval.
7. The Type of Ground Truth Used
- Type: The ground truth used was expert consensus, specifically from three clinically experienced experts (2 radiation therapy physicists and 1 radiation dosimetrist), guided by NRG/RTOG guidelines.
8. The Sample Size for the Training Set
- CT Models Training Set: For CT structure models, there was an average of 341 training image sets.
- MR Models Training Set: For MR Brain models, there was an average of 149 training image sets. For MR Pelvis models, there was an average of 306 training image sets.
9. How the Ground Truth for the Training Set Was Established
The document states that the deep-learning based structure models were "trained using imaging datasets consisting of anatomical organs" and that the "test datasets were independent from those used for training." While it details how ground truth was established for the test sets (manual generation by three experts using consensus and NRG/RTOG guidelines), it does not explicitly describe how ground truth was established for the training sets. Given the nature of deep-learning segmentation models, the training data most likely carried expert-annotated contours produced through a similarly rigorous process, drawn from the contributing institutions and public datasets. The consistency of the model architecture and training methodology (e.g., "very similar CNN architecture was used to train these new CT models") suggests a standardized approach to data preparation, including ground truth generation, for both training and testing.