Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:
· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.
· Segmenting anatomical structures across a variety of CT anatomic locations.
· Segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.
Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.
Contour ProtégéAI is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAI is deployed either on a remote server, using the MIMcloud service for data management and transfer, or locally on the workstation or server running MIM software.
Here's a detailed breakdown of the acceptance criteria and the study that demonstrates the device meets them, based on the provided FDA 510(k) summary:
Acceptance Criteria and Reported Device Performance for Contour ProtégéAI
1. Table of Acceptance Criteria and Reported Device Performance
| Acceptance Criteria Category | Acceptance Criteria | Reported Device Performance (Contour ProtégéAI) |
|---|---|---|
| Individual Structure Performance | A structure is deemed acceptable if it passes two or more of the following three tests: (1) statistical non-inferiority of the Dice score compared with the reference predicate (MIM Maestro atlas segmentation); (2) statistical non-inferiority of the MDA score compared with the reference predicate; (3) an average user evaluation score of 2 or higher on a 3-point scale. | Dice score: for all reported structures in the Head and Neck, Thorax, and Whole Body - Physiological Uptake Organs CT models, Contour ProtégéAI generally showed higher mean Dice scores (better overlap with ground truth) and often superior lower 95th percentile confidence bounds than MIM Atlas; equivalence (defined as the lower 95th percentile confidence bound of the ProtégéAI Dice lying above the MIM Atlas mean minus 0.1 Dice) was demonstrated for most structures, often with outright improvement. MDA score: for most reported structures, Contour ProtégéAI showed lower mean MDA scores (better boundary accuracy, i.e., smaller distance to ground truth) and often superior upper 95th percentile confidence bounds than MIM Atlas; equivalence was demonstrated for most structures, again often with outright improvement. External evaluation score: all reported structures achieved an average user evaluation score of 2 or higher (range 2.0 to 3.0), indicating moderate to significant time savings. Overall, the summary states that "Contour ProtégéAI results were equivalent or had better performance than the MIM Maestro atlas segmentation reference device" and that "only structures that pass two or more of the following three tests could be included in the final models," indicating successful performance against the criteria for all included structures. (A minimal computation sketch of these metrics follows the table.) |
| Model-as-a-Whole Performance | Statistically non-inferior cumulative Added Path Length (APL) compared with the reference predicate. | Cumulative APL (mm): Head and Neck CT, MIM Atlas 38.69 ± 33.36 vs. Contour ProtégéAI 28.61 ± 29.59; Thorax CT, MIM Atlas 89.24 ± 82.73 vs. Contour ProtégéAI 65.44 ± 68.85; Whole Body - Physiological Uptake Organs CT, MIM Atlas 138.06 ± 142.42 vs. Contour ProtégéAI 98.20 ± 127.11. Equivalence was demonstrated for all three models, indicating lower or equivalent APL and therefore less expected editing time for the model as a whole. |
| Localization Accuracy (Informational) | No passing criterion; results included for user understanding. | The percentage of images successfully localized by Contour ProtégéAI is reported for each structure and model. Most structures show 100% localization within their relevant FOV for the Head and Neck and Thorax models; some structures (e.g., Cochlea_L/R, OpticChiasm, Pancreas) show slightly lower percentages, indicating instances where the structure was not localized. For the Whole Body CT model, most structures also show 100%, with a few exceptions (e.g., Bladder 95%, LN_Iliac 64%). |
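To make the per-structure metrics in the table concrete, the following is a minimal illustrative sketch, not the manufacturer's implementation, of how Dice, a symmetric mean surface distance (one common reading of "MDA"), a published formulation of APL, and the two-of-three acceptance rule could be computed for binary 3D masks on a shared voxel grid. The function names, the specific APL formulation, and the treatment of edge cases are assumptions for illustration only.

```python
# Illustrative sketch (author's assumption, not the manufacturer's code) of the
# per-structure metrics named in the table, for non-empty binary 3D masks.
import numpy as np
from scipy import ndimage


def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, truth).sum() / denom)


def _surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels of a binary mask (mask minus its erosion)."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def mean_distance_to_agreement(pred, truth, spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric mean surface distance in mm (one common reading of 'MDA')."""
    ps, ts = _surface(pred), _surface(truth)
    d_to_truth = ndimage.distance_transform_edt(~ts, sampling=spacing)  # dist to truth surface
    d_to_pred = ndimage.distance_transform_edt(~ps, sampling=spacing)   # dist to pred surface
    return float(np.concatenate([d_to_truth[ps], d_to_pred[ts]]).mean())


def added_path_length(auto: np.ndarray, edited: np.ndarray, pixel_size_mm: float = 1.0) -> float:
    """One published formulation of APL: length of the edited contour's boundary
    that is not shared with the automatic contour (boundary voxels * pixel size).
    The 510(k) summary does not give the exact definition used."""
    return float((_surface(edited) & ~_surface(auto)).sum()) * pixel_size_mm


def structure_accepted(dice_noninferior: bool, mda_noninferior: bool, mean_user_score: float) -> bool:
    """Per the summary, a structure is included if it passes two or more of the three tests."""
    return sum([dice_noninferior, mda_noninferior, mean_user_score >= 2.0]) >= 2
```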
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: 754 independent images.
- Data Provenance: Gathered from 27 institutions. The document does not explicitly state the countries of origin for the test set; for the training set, it mentions data gathered "across multiple continents" and lists the USA, Hong Kong, and Australia, so it is reasonable to infer the test set also came from diverse institutions and countries. The data are retrospective, as they were gathered from existing clinical treatment plans.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
The ground truth for the test set was established by a multi-stage process involving:
- Initial Segmentation: Consultants (physicians and dosimetrists).
- Review and Correction: A radiation oncologist.
- Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).
While the exact number of experts is not specified, it involved multiple individuals with specialized qualifications (physicians, dosimetrists, radiation oncologists, M.D.s, licensed dosimetrists).
4. Adjudication Method for the Test Set
The ground truth generation involved a multi-stage review and correction process:
- Initial segmentations by consultants (physicians and dosimetrists).
- Review and correction by a radiation oncologist against established standards and guidelines.
- Final review and correction by qualified staff at MIM Software (M.D. or licensed dosimetrists).
This indicates a sequential refinement process, closer to "expert review and correction" (or a cascading consensus) than to a numeric adjudication scheme such as 2+1 or 3+1 for resolving disagreements among multiple initial segmenters. The explicit mention of correction at multiple stages suggests initial segmentations were iteratively refined based on expert review.
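As a purely illustrative way to read this staged workflow (the document describes a human process, not software; the types and names below are hypothetical), the ground-truth generation can be thought of as an ordered sequence of review-and-correct stages applied to an initial contour set:

```python
# Hypothetical sketch of the sequential review-and-correct workflow described above;
# the reviewer roles come from the summary, the code structure is an assumption.
from dataclasses import dataclass
from typing import Callable, Dict, List

ContourSet = Dict[str, object]  # structure name -> contour data (placeholder type)


@dataclass
class ReviewStage:
    reviewer_role: str                            # e.g. "consultant", "radiation oncologist"
    correct: Callable[[ContourSet], ContourSet]   # returns the reviewed/edited contour set


def build_ground_truth(initial: ContourSet, stages: List[ReviewStage]) -> ContourSet:
    """Apply each stage in order, so each later reviewer sees the earlier corrections."""
    current = initial
    for stage in stages:
        current = stage.correct(current)
    return current
```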
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
No. The provided text does not describe a traditional MRMC comparative effectiveness study comparing human readers with and without AI assistance to measure an effect size on human performance.
Instead, the study focused on the standalone performance of the AI model (Contour ProtégéAI) compared to an existing atlas-based segmentation system (MIM Maestro), using quantitative metrics (Dice, MDA, APL) and a user evaluation of time-savings functionality. The user evaluation (average score of 2 or higher on a three-point scale for time savings) provides an indirect measure of the AI's utility, but it is not a direct MRMC study of human reader improvement with AI assistance.
6. If a Standalone Study Was Done
Yes, a standalone study was done.
- Contour ProtégéAI (the algorithm under review) was evaluated in comparison to a reference predicate device, MIM Maestro (K071964), which uses an atlas-based segmentation approach.
- The comparison involved quantitative metrics like Dice score, MDA, and cumulative APL, as well as a qualitative user evaluation. The goal was to show that Contour ProtégéAI was equivalent or superior in performance to the reference predicate in a standalone capacity.
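The 510(k) text does not give the exact statistical procedure behind "statistical non-inferiority." As a hedged illustration only, a one-sided check of a lower confidence bound against a reference mean minus a margin could look like the sketch below; the 0.1 Dice margin mirrors the equivalence definition quoted in the table, while the bootstrap formulation, function name, and seed are assumptions.

```python
# Illustrative non-inferiority check (assumed bootstrap formulation, not the study's method).
import numpy as np


def dice_noninferior(ai_dice, reference_mean, margin=0.1, n_boot=10_000, seed=0) -> bool:
    """True if the one-sided lower 95% bootstrap bound of the mean AI Dice score
    stays above the reference mean minus the non-inferiority margin."""
    rng = np.random.default_rng(seed)
    ai_dice = np.asarray(ai_dice, dtype=float)
    boot_means = rng.choice(ai_dice, size=(n_boot, ai_dice.size), replace=True).mean(axis=1)
    lower_bound = np.percentile(boot_means, 5)  # one-sided 95% lower confidence bound
    return bool(lower_bound > reference_mean - margin)
```

For example, `dice_noninferior(per_image_dice_scores, reference_mean=0.78)` would implement the "lower confidence bound above the reference mean minus 0.1 Dice" reading of the equivalence criterion; the analogous MDA check would compare an upper bound against the reference mean plus a margin.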
7. The Type of Ground Truth Used
The ground truth used for the test set was expert consensus / expert-derived segmentation.
- It was derived from clinical treatment plans, but the original segmentations were not used.
- The images were re-segmented by consultants (physicians and dosimetrists) specifically for this purpose, following detailed clinical contouring guidelines.
- These initial segmentations were then reviewed and corrected by a radiation oncologist.
- A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
- All segmenters were instructed to ensure the "highest quality training data" and contour according to referenced standards.
8. The Sample Size for the Training Set
- CT Models: A total of 550 CT images from 41 clinical sites.
- The document implies that these 550 images were used to train the final 4.1.0 neural network models for CT. It does not explicitly state a separate training set size for the MR models.
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set was established through a rigorous, multi-stage expert-driven process, identical to the description for the test set ground truth:
- Initial Segmentation: Performed by consultants (physicians and dosimetrists) following detailed instructions derived from published clinical contouring guidelines.
- Review and Correction: By a radiation oncologist against the same standards and guidelines.
- Final Review and Correction: By qualified staff at MIM Software (M.D. or licensed dosimetrists).
- The goal was "to ensure the highest quality training data."
- Segmenters were asked to contour all specified OAR structures on all images according to referenced standards, regardless of proximity to the treatment field.
§ 892.2050 Medical image management and processing system.
(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).