Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:
· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.
· Segmenting anatomical structures across a variety of CT anatomic locations.
· Segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.
Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.
Contour ProtégéAI is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAI is deployed either on a remote server using the MIMcloud service for data management and transfer, or locally on the workstation or server running MIM software.
Acceptance Criteria and Device Performance Study for Contour ProtégéAI
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria for Contour ProtégéAI were based on a non-inferiority study comparing its segmentation performance (measured by Dice coefficient) to a predicate device, MIM Maestro (K071964), specifically using atlases built from the same training data. The key acceptance criterion was:
Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI Dice score is greater than the mean MIM atlas segmentation (reference device) Dice minus 0.1; that is, the AI's lower confidence bound may sit at most 0.1 Dice below the reference device's mean performance.
Under this criterion, Contour ProtégéAI is either equivalent to or better than the MIM Maestro atlas segmentation reference device within a 0.1-Dice non-inferiority margin. Acceptance was demonstrated at the p = 0.05 significance level.
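The submission does not specify how the lower confidence bound was computed, so the sketch below is illustrative only: it uses a bootstrap percentile bound on the mean Dice score and applies the stated non-inferiority rule. The function names and the per-case Dice values are hypothetical, not taken from the study data.

```python
import random
import statistics

def lower_conf_bound(scores, alpha=0.05, n_boot=5000, seed=0):
    """One-sided lower (1 - alpha) bootstrap confidence bound on the mean."""
    rng = random.Random(seed)
    n = len(scores)
    boot_means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot)
    )
    return boot_means[int(alpha * n_boot)]

def non_inferior(ai_scores, atlas_mean, margin=0.1):
    """Accept if the AI's lower bound exceeds the atlas mean minus the margin."""
    return lower_conf_bound(ai_scores) > atlas_mean - margin

# Illustrative (made-up) per-case Dice scores for one structure:
ai_dice = [0.86, 0.84, 0.88, 0.83, 0.87, 0.85, 0.86, 0.84, 0.89, 0.85]
print(non_inferior(ai_dice, atlas_mean=0.81))
```

A parametric bound (e.g., a one-sided t-interval) would serve the same purpose; the bootstrap is used here only because it needs no distributional assumptions.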
The table below summarizes the reported mean ± standard deviation Dice coefficients for both the MIM Atlas (predicate) and Contour ProtégéAI, along with the lower 95th percentile confidence bound for Contour ProtégéAI, for various anatomical structures in the 4.0.0 CT Model. The asterisk (*) next to Contour ProtégéAI performance indicates that equivalence was demonstrated at p=0.05.
Note: The document presents a single large table for all structures and models. For clarity, a few representative examples from each CT Model are extracted below to illustrate the reported performance against the acceptance criteria. The full table from the document should be consulted for comprehensive results.
**4.0.0 CT Model:**

| Region | Structure | MIM Atlas (Mean ± Std Dice) | Contour ProtégéAI (Mean ± Std Dice, Lower 95th Percentile Bound) | Acceptance Met? |
|---|---|---|---|---|
| Head and Neck | Bone_Mandible | 0.81 ± 0.07 | 0.85 ± 0.07 (0.82) * | Yes |
| Head and Neck | Brain | 0.97 ± 0.01 | 0.98 ± 0.01 (0.97) * | Yes |
| Head and Neck | SpinalCord | 0.66 ± 0.14 | 0.63 ± 0.16 (0.57) * | Yes |
| Thorax | Esophagus | 0.49 ± 0.16 | 0.70 ± 0.15 (0.65) * | Yes |
| Thorax | Heart | 0.88 ± 0.08 | 0.90 ± 0.07 (0.88) * | Yes |
| Thorax | Lung_L | 0.95 ± 0.02 | 0.96 ± 0.02 (0.96) * | Yes |
| Abdomen | Bladder | 0.72 ± 0.23 | 0.91 ± 0.12 (0.81) * | Yes |
| Abdomen | Liver | 0.84 ± 0.12 | 0.92 ± 0.08 (0.86) * | Yes |
| Pelvis | Prostate | 0.74 ± 0.12 | 0.85 ± 0.06 (0.82) * | Yes |
| Pelvis | Rectum | 0.63 ± 0.18 | 0.83 ± 0.11 (0.79) * | Yes |
| SurePlan MRT | Bone | 0.76 ± 0.08 | 0.87 ± 0.05 (0.74) * | Yes |
| SurePlan MRT | Spleen | 0.72 ± 0.10 | 0.95 ± 0.03 (0.87) * | Yes |
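As a quick sanity check, the stated criterion (lower bound greater than atlas mean minus 0.1) can be applied directly to a few rows of the table above; the numbers below are copied from those rows:

```python
# Each tuple: (structure, MIM atlas mean Dice, ProtégéAI lower 95th percentile bound).
rows = [
    ("Bone_Mandible", 0.81, 0.82),
    ("SpinalCord",    0.66, 0.57),
    ("Esophagus",     0.49, 0.65),
    ("Bone",          0.76, 0.74),
]

for name, atlas_mean, ai_bound in rows:
    # Acceptance criterion: AI lower bound must exceed atlas mean minus 0.1 Dice.
    assert ai_bound > atlas_mean - 0.1, name

print("all listed structures meet the criterion")
```

Note that SpinalCord and Bone pass even though the AI's lower bound falls below the atlas mean, which is exactly what the 0.1-Dice non-inferiority margin permits.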
2. Sample Size and Data Provenance for the Test Set
- Sample Size for Test Set: 819 independent images.
- Data Provenance: The images were gathered from 10 institutions. The document explicitly states that the test set institutions are "totally disjoint from the training datasets used to train each model." The countries of origin for the test set are not explicitly detailed, but since the training data included multiple countries (USA, Hong Kong, Australia), it's implied the test set could also be diverse. The data was retrospective clinical data, re-segmented for this specific purpose.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: The ground truth for the test set was established by "consultants (physicians and dosimetrists)." The exact number is not specified, but it implies a team. These initial segmentations were then "reviewed and corrected by a radiation oncologist." Finally, "Qualified staff at MIM Software (M.D. or licensed dosimetrists) then performed a final review and correction."
- Qualifications of Experts:
- Consultants: Physicians and dosimetrists.
- Review and Correction: Radiation oncologist.
- Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).
- All segmenters and reviewers were given "detailed instructions derived from relevant published clinical contouring guidelines" and instructed to ensure the "highest quality training data."
4. Adjudication Method for the Test Set
The adjudication method involved a multi-stage process:
- Initial Segmentation: Done by consultants (physicians and dosimetrists).
- First Review & Correction: By a radiation oncologist.
- Final Review & Correction: By qualified staff (M.D. or licensed dosimetrists) at MIM Software.
This indicates a sequential review process, rather than a specific (e.g., 2+1, 3+1) consensus model among peers at the same stage.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No MRMC comparative effectiveness study was explicitly described comparing human readers with AI assistance versus without AI assistance. The study focused on the algorithm's standalone performance compared to an atlas-based predicate device, and a preliminary user evaluation for time-saving was mentioned, but not in the context of an MRMC study.
6. Standalone (Algorithm Only) Performance
Yes, a standalone (algorithm only) performance study was conducted. The Dice coefficient results presented in the table demonstrate the performance of the Contour ProtégéAI algorithm compared to the MIM Maestro atlas-based segmentation, without human intervention in the segmentation process being evaluated. The document explicitly states the "performance of both segmentation devices was measured by calculating the Dice score of the novel segmentations with the original ground-truth contours."
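The Dice score used throughout is the standard overlap metric, 2|A∩B| / (|A| + |B|). A minimal sketch on binary voxel masks follows; the function name and the toy masks are illustrative, not part of the submission:

```python
def dice_score(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks
    given as equal-length flat sequences of 0/1 voxel labels."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    size_a, size_b = sum(mask_a), sum(mask_b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * intersection / (size_a + size_b)

# Two toy 8-voxel masks agreeing on 3 of the 4 labeled voxels in each:
print(dice_score([1, 1, 1, 1, 0, 0, 0, 0],
                 [1, 1, 1, 0, 1, 0, 0, 0]))  # → 0.75
```

A score of 1.0 indicates perfect overlap with the ground-truth contour and 0.0 indicates none, which is why the acceptance criterion is stated on the Dice scale.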
7. Type of Ground Truth Used
The ground truth used was expert-derived contours, established through sequential expert review rather than a formal consensus panel. The multi-stage review and correction process involved physicians, dosimetrists, a radiation oncologist, and qualified MIM Software staff, who re-segmented images "specifically for this purpose, outside of clinical practice" and were instructed to adhere to "relevant published clinical contouring guidelines."
8. Sample Size for the Training Set
The training set consisted of 326 CT images gathered from 37 clinical sites across multiple countries (USA, Hong Kong, Australia).
9. How the Ground Truth for the Training Set was Established
The ground truth for the training set was established through a rigorous, multi-step expert review process:
- CT images (from clinical treatment plans) were re-segmented by consultants (physicians and dosimetrists).
- These initial segmentations were then reviewed and corrected by a radiation oncologist against the same standards and guidelines.
- A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
All involved in ground truth establishment were given "detailed instructions derived from relevant published clinical contouring guidelines" and were explicitly asked "to spend additional time to ensure the highest quality training data" and to contour all specified structures "according to referenced standards."
§ 892.2050 Medical image management and processing system.
(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).