ART-Plan is indicated for cancer patients for whom radiation treatment has been planned. It is intended to be used by trained medical professionals including, but not limited to, radiologists, radiation oncologists, dosimetrists, and medical physicists.
ART-Plan is a software application intended to display and visualize 3D multi-modal medical image data. The user may import, define, display, transform and store DICOM 3.0 compliant datasets (including regions of interest structures). These images, contours and objects can subsequently be exported/distributed within the system, across computer networks and/or to radiation treatment planning systems. Supported modalities include CT, PET-CT, CBCT, 4D-CT and MR images.
ART-Plan supports AI-based contouring on CT and MR images and offers semi-automatic and manual tools for segmentation.
To help the user assess changes in image data and to obtain combined multi-modal image information, ART-Plan allows the registration of anatomical and functional images and display of fused and non-fused images to facilitate the comparison of patient image data by the user.
With ART-Plan, users are also able to generate, visualize, evaluate and modify pseudo-CT from MRI images.
The ART-Plan application consists of two key modules: SmartFuse and Annotate, allowing the user to display and visualize 3D multi-modal medical image data. The user may process, render, review, store, display and distribute DICOM 3.0 compliant datasets within the system and/or across computer networks. Supported modalities cover static and gated CT (computerized tomography including CBCT and 4D-CT), PET (positron emission tomography) and MR (magnetic resonance).
The ART-Plan technical functionalities claimed by TheraPanacea are the following:
- Proposing automatic solutions to the user, such as automatic delineation and automatic multimodal image fusion, towards improving standardization of processes and performance, and reducing tedious, time-consuming user involvement.
- Offering the user a set of tools for semi-automatic delineation and semi-automatic registration, towards manually modifying/editing automatically generated structures, adding new or removing undesired structures, or imposing user-provided correspondence constraints on the fusion of multimodal images.
- Presenting to the user a set of visualization methods for the delineated structures and registration fusion maps.
- Saving the delineated structures / fusion results for use in the dosimetry process.
- Enabling rigid and deformable registration of patient image sets to combine information contained in different or the same modalities.
- Allowing the users to generate, visualize, evaluate and modify pseudo-CT from MRI images.
ART-Plan offers deep-learning based automatic segmentation for the following localizations:
- head and neck (on CT images)
- thorax/breast (for male/female and on CT images)
- abdomen (on CT images and MR images)
- pelvis male (on CT images and MR images)
- pelvis female (on CT images)
- brain (on CT images and MR images)
ART-Plan offers deep-learning based synthetic CT-generation from MR images for the following localizations:
- pelvis male
- brain
The provided text describes the acceptance criteria and the study conducted to prove that the ART-Plan v1.10.1 device meets these criteria. Note that this submission is a Special 510(k) for modifications to an already cleared device (ART-Plan v1.10.0), focusing on the addition of 48 new structures to existing localizations and 8 bug fixes. The performance studies primarily validate these new structures.
Here's the detailed breakdown:
1. Table of Acceptance Criteria and Reported Device Performance
The device ART-Plan v1.10.1 is an AI-based contouring tool. The acceptance criteria and reported performance for the new structures are categorized into two main types: quantitative (using Dice Similarity Coefficient) and qualitative.
For Auto-segmentation Models (New Structures):
| Acceptance Criteria Type | Acceptance Criteria | Reported Device Performance (Examples from Table 4) | Pass/Fail |
|---|---|---|---|
| Quantitative | a) DSC (mean) ≥ 0.8 (AAPM standard) | (Not explicitly shown for new structures, but implied passed) | Pass |
| | b) DSC (mean) ≥ 0.54 OR DSC (mean) ≥ mean (DSC inter-expert) + 5% | Carina: DICE diff inter-expert = 6.58% | Pass |
| | | Lad coronary: DICE diff inter-expert = 15.56% | Pass |
| | | Left bronchia: DICE diff inter-expert = 14.75% | Pass |
| | | Right cochlea: DICE diff inter-expert = 29.22% | Pass |
| Qualitative | c) A+B % ≥ 85% (clinically acceptable without modifications or with minor corrections) | Ascending aorta: A+B = 100% | Pass |
| | | Left atrium: A+B = 100% | Pass |
| | | Left main coronary artery: A+B = 93% | Pass |
| | | Sigmoid: A+B = 100% | Pass |
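The quantitative criterion is based on the Dice Similarity Coefficient (DSC), which measures volumetric overlap between an automatic contour and a reference contour. The document does not describe the vendor's implementation; as a minimal illustrative sketch, DSC over voxel sets and the acceptance rule b) from the table can be expressed as:

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two collections of voxel indices.

    DSC = 2 * |A ∩ B| / (|A| + |B|); 1.0 means perfect overlap, 0.0 none.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # both structures empty: treat as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def passes_dice_criterion(dsc_mean, inter_expert_mean):
    """Acceptance rule b) from the table: mean DSC >= 0.54, OR
    mean DSC >= mean inter-expert DSC + 5 percentage points."""
    return dsc_mean >= 0.54 or dsc_mean >= inter_expert_mean + 0.05


# Example: two 10x10 voxel structures on one slice, shifted by 2 voxels
auto = [(x, y, 0) for x in range(10) for y in range(10)]
expert = [(x, y, 0) for x in range(2, 12) for y in range(10)]
print(round(dice_coefficient(auto, expert), 2))  # → 0.8
```

The second function is a direct transcription of rule b); the voxel-set representation is an assumption for illustration (clinical software typically operates on binary mask arrays).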
For Synthetic-CT Generation Tool (General, not specifically for new features in this submission):
| Acceptance Criteria Type | Acceptance Criteria | Reported Device Performance | Pass/Fail |
|---|---|---|---|
| Quantitative | a) A median 2%/2mm gamma passing criteria of ≥95% | (Not explicitly shown in this document, but implied passed for prior clearance) | Pass |
| | b) A median 3%/3mm gamma passing criteria of ≥99.0% | (Not explicitly shown in this document, but implied passed for prior clearance) | Pass |
| | c) A mean dose deviation (pseudo-CT compared to standard CT) of ≤2% in ≥88% of patients | (Not explicitly shown in this document, but implied passed for prior clearance) | Pass |
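Criteria a) and b) use gamma analysis, which combines a dose-difference tolerance (e.g., 2%) with a distance-to-agreement tolerance (e.g., 2 mm): a point passes if some nearby evaluated point agrees in dose within the combined tolerance. The document does not describe the implementation; the following is a simplified, illustrative 1D global gamma pass rate only, not the tool's actual dosimetric pipeline:

```python
import math

def gamma_pass_rate(ref_dose, eval_dose, spacing_mm, dd_frac=0.02, dta_mm=2.0):
    """Simplified 1D global gamma analysis (illustrative sketch only).

    For each reference point i, gamma is the minimum over evaluated points j of
    sqrt((dose_diff / (dd_frac * max_ref))^2 + (distance / dta_mm)^2);
    the point passes when gamma <= 1. Returns the passing rate in percent.
    """
    max_ref = max(ref_dose)  # global normalization to the reference maximum
    passed = 0
    for i, d_ref in enumerate(ref_dose):
        gammas = []
        for j, d_eval in enumerate(eval_dose):
            dose_term = (d_eval - d_ref) / (dd_frac * max_ref)
            dist_term = (j - i) * spacing_mm / dta_mm
            gammas.append(math.sqrt(dose_term ** 2 + dist_term ** 2))
        if min(gammas) <= 1.0:
            passed += 1
    return 100.0 * passed / len(ref_dose)


# Hypothetical dose profiles (arbitrary units, 1 mm spacing):
ref = [0.0, 10.0, 50.0, 100.0, 50.0, 10.0, 0.0]   # dose planned on standard CT
ev = [0.0, 10.5, 49.0, 101.0, 50.5, 10.0, 0.0]    # dose recomputed on pseudo-CT
print(gamma_pass_rate(ref, ev, spacing_mm=1.0))   # → 100.0
```

Clinical gamma analysis runs in 3D with interpolation and dose thresholds; this sketch only conveys the dose/distance trade-off behind the 2%/2mm and 3%/3mm criteria.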
2. Sample Size Used for the Test Set and Data Provenance
The document indicates that for the new structures, the sample sizes for the test set varied:
- For quantitative evaluations (Dice difference inter-expert):
  - Minimum sample size for the evaluation method: 20
  - Reported sample size for most structures (e.g., Carina, Lad coronary, Left bronchia, Right cochlea): 33
  - Reported sample size for some Brain T1 (MR) structures (e.g., Anterior cerebellum, Left cochlea): 30
- For qualitative evaluations (A+B %):
  - Minimum sample size for the evaluation method: 15
  - Reported sample size for most structures (e.g., Ascending aorta, Left atrium, Left main coronary artery): 20
  - Reported sample size for Left cervical lymph node IVB and Right cervical lymph node IVB: 15
  - Reported sample size for Sigmoid: 30
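The qualitative criterion grades each auto-contour as clinically acceptable without modification (A), acceptable with minor corrections (B), or requiring larger edits; the pass rate is the A+B fraction against the 85% threshold. A minimal sketch of that tally (the grade labels beyond A and B are assumptions for illustration):

```python
def qualitative_pass_rate(grades):
    """Percentage of cases graded A (acceptable as-is) or B (minor edits)."""
    ab = sum(1 for g in grades if g in ("A", "B"))
    return 100.0 * ab / len(grades)


def passes_qualitative(grades, threshold=85.0):
    """Acceptance rule c) from the table: A+B % >= 85%."""
    return qualitative_pass_rate(grades) >= threshold


# 20 hypothetical expert ratings for one structure
ratings = ["A"] * 14 + ["B"] * 5 + ["C"]
print(qualitative_pass_rate(ratings))  # → 95.0
print(passes_qualitative(ratings))     # → True
```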
Data Provenance: The data used for training and testing are described as "real-world retrospective data which were initially used for treatment of cancer patients." The document mentions that the data originated from various centers, with a statistical analysis of imaging vendors in EU & USA to represent the market share. It also states that the data demographic distribution (gender, age) aligns with cancer incidence statistics in the US, UK, and globally.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
The document explicitly states that the "truthing" process "includes a mix of data created by different delineators (clinical experts) and assessment of intervariability, ground truth contours provided by the centers and validated by a second expert of the center, and qualitative evaluation and validation of the contours."
- Number of Experts: For the inter-expert variability comparison, at least two experts are implied (one for ground truth, and comparison to other delineators or a second expert validation). For qualitative evaluations, "experts" (plural) are mentioned.
- Qualifications of Experts: The document states "trained medical professionals including, but not limited to, radiation oncologists, dosimetrists, and medical physicists." The ground truth contours were "provided by the centers and validated by a second expert of the center," indicating a high level of clinical expertise.
4. Adjudication Method for the Test Set
The adjudication method is implied to be a form of expert consensus or validation. The "truthing process" includes:
- "data created by different delineators (clinical experts)"
- "assessment of intervariability"
- "ground truth contours provided by the centers and validated by a second expert of the center"
- "qualitative evaluation and validation of the contours"
This suggests that the reference standard was established through expert consensus and validation: multiple experts contributed contours, and a validation step, often involving a second expert, was performed. For comparing the AI model's performance to human experts, the model was evaluated against "inter-expert variability" or validated qualitatively by "experts." This is not a strict "2+1" or "3+1" adjudication applied to every case, but rather a process combining consensus, validation, and inter-variability analysis among clinical experts, both to establish the ground truth and to evaluate the AI's performance against that truth and against other expert interpretations.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No explicit Multi-Reader Multi-Case (MRMC) comparative effectiveness study comparing human readers with AI assistance vs. without AI assistance is detailed in the provided text. The studies focus on the standalone performance of the AI model against the established ground truth and inter-expert variability.
6. Standalone Performance Study
Yes, a standalone performance study was done for the algorithm without human-in-the-loop. The tables and descriptions of acceptance criteria and results (Dice Similarity Coefficient, A+B% for qualitative evaluation) directly assess the performance of the AI-based contouring (Annotate module) in generating contours.
7. Type of Ground Truth Used
The ground truth used is primarily expert consensus/delineation. It is described as:
- "data created by different delineators (clinical experts)"
- "ground truth contours provided by the centers and validated by a second expert of the center"
- "qualitative evaluation and validation of the contours"
The contouring guidelines followed were confirmed with the data-providing centers, and the process aimed to be representative of delineation practice across centers and international guidelines.
8. Sample Size for the Training Set
- Training samples: 299,142
- Validation samples: 75,018
- Total samples: 374,160
Although the total number of samples is 374,160, the document clarifies that "The total number of patients used for training (8736) is lower than the number of samples (374160)." This indicates that one patient can contribute to multiple images and multiple structures, leading to a higher number of "samples" for training an AI model.
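The reported dataset figures are internally consistent; a quick check (the arithmetic below only restates numbers from the document):

```python
# Reported dataset figures from the submission
training_samples = 299_142
validation_samples = 75_018
total_samples = 374_160
patients = 8_736

# Training and validation samples sum to the reported total
assert training_samples + validation_samples == total_samples

# Roughly an 80/20 train/validation split at the sample level
print(round(100 * validation_samples / total_samples, 1))  # → 20.0

# Each patient contributes many (image, structure) samples on average
print(round(total_samples / patients, 1))  # → 42.8
```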
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set was established through "real-world retrospective data," where contours were generated by clinical experts. The process included:
- Contouring guidelines confirmed with data-providing centers.
- A mix of data created by different delineators (clinical experts).
- Ground truth contours provided by the centers and validated by a second expert of the center.
- Qualitative evaluation and validation of the contours to ensure representativeness of delineation practice and adherence to international guidelines.
This rigorous process aimed to account for expert annotation variability and ensure the training data was clinically relevant and accurate.
§ 892.2050 Medical image management and processing system.
(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).