Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:
· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.
· Segmenting normal structures across a variety of CT anatomical locations.
· Segmenting normal structures of the prostate, seminal vesicles, and urethra within T2-weighted MR images.
Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.
Contour ProtégéAI is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAI is deployed either on a remote server using the MIMcloud service for data management and transfer, or locally on the workstation or server running MIM software.
Here's a breakdown of the requested information and where to find it in the document:
- A table of acceptance criteria and the reported device performance: This will primarily come from the "Testing and Performance Data" section, specifically the table comparing MIM Atlas and Contour ProtégéAI Dice coefficients and the equivalence definition.
- Sample sizes used for the test set and the data provenance: Found in the "Testing and Performance Data" section.
- Number of experts used to establish the ground truth for the test set and the qualifications of those experts: Found in the "Testing and Performance Data" section.
- Adjudication method (e.g. 2+1, 3+1, none) for the test set: Found in the "Testing and Performance Data" section regarding ground truth generation.
- Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance: The document describes a comparison between the AI (Contour ProtégéAI) and an atlas-based segmentation (MIM Maestro reference device), not a human-in-the-loop study comparing reader performance with and without AI assistance.
- Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance study was done: The reported data compare the algorithm's output against a ground truth and against an atlas-based reference algorithm. The requirement that "appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI" implies an AI-assisted workflow, but the testing itself was an algorithmic comparison.
- The type of ground truth used (expert consensus, pathology, outcomes data, etc.): Found in the "Testing and Performance Data" section.
- The sample size for the training set: Found in the "Device Description" and "Testing and Performance Data" sections.
- How the ground truth for the training set was established: Found in the "Testing and Performance Data" section.
Here's the detailed response based on the provided document:
Acceptance Criteria and Study Proving Device Performance
The study evaluated the performance of Contour ProtégéAI, specifically its new 3.0.0 CT neural network models, by comparing its segmentation accuracy (Dice coefficient) against a reference atlas-based segmentation device, MIM Maestro (K071964).
1. Table of Acceptance Criteria and Reported Device Performance:
Item | Acceptance Criteria | Reported Device Performance and Equivalence |
---|---|---|
Equivalence | Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is no more than 0.1 Dice below the mean MIM atlas segmentation reference device performance. That is: Contour ProtégéAI_LB95 > MIM_Atlas_Mean − 0.1 | "Contour ProtégéAI results were equivalent or had better performance than the MIM Maestro atlas segmentation reference device." Equivalence was demonstrated at the p=0.05 significance level for all structures. A sample of reported Dice coefficients appears below; * indicates equivalence demonstrated. |
Structure | MIM Atlas | Contour ProtégéAI |
---|---|---|
A_Aorta_Desc | 0.73 ± 0.15 | 0.78 ± 0.07 (0.68) * |
Bladder | 0.80 ± 0.12 | 0.94 ± 0.02 (0.86) * |
Bone | 0.80 ± 0.03 | 0.83 ± 0.05 (0.76) * |
Bone_Mandible | 0.79 ± 0.16 | 0.83 ± 0.04 (0.74) * |
Bowel † | 0.60 ± 0.13 | 0.75 ± 0.07 (0.68) * |
Colon_Sigmoid | 0.08 ± 0.09 | 0.50 ± 0.19 (0.33) * |
Esophagus | 0.43 ± 0.17 | 0.56 ± 0.19 (0.47) * |
Liver | 0.84 ± 0.12 | 0.93 ± 0.04 (0.87) * |
LN_Pelvic | 0.76 ± 0.03 | 0.80 ± 0.04 (0.77) * |
Lung_L | 0.94 ± 0.03 | 0.95 ± 0.02 (0.93) * |
Lung_R | 0.95 ± 0.02 | 0.95 ± 0.02 (0.94) * |
Prostate | 0.71 ± 0.12 | 0.82 ± 0.06 (0.74) * |
Rectum | 0.67 ± 0.14 | 0.76 ± 0.08 (0.67) * |
SeminalVes | 0.58 ± 0.15 | 0.70 ± 0.08 (0.60) * |
Spinal_Cord | 0.76 ± 0.10 | 0.82 ± 0.07 (0.78) * |
Spleen | 0.78 ± 0.14 | 0.91 ± 0.07 (0.80) * |
Stomach | 0.45 ± 0.20 | 0.79 ± 0.09 (0.69) * |
(Mean ± std Dice coefficient, with the lower 95th percentile confidence bound, based on a normal distribution, in parentheses. Equivalence demonstrated at the p=0.05 significance level between Contour ProtégéAI and MIM Atlas.) Source: modified from the "Testing and Performance Data" table.
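The Dice coefficient and the equivalence criterion above can be sketched numerically. The following is a minimal illustration only, not MIM's implementation; the function names (`dice`, `equivalent`) and the use of NumPy are assumptions made for this sketch.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two binary segmentation masks:
    2|A ∩ B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def equivalent(ai_lb95: float, atlas_mean: float, margin: float = 0.1) -> bool:
    """Equivalence per the summary's definition: the AI's lower 95th
    percentile confidence bound must exceed the atlas reference mean
    minus a 0.1 Dice margin."""
    return ai_lb95 > atlas_mean - margin
```

For example, the Prostate row (atlas mean 0.71, ProtégéAI lower bound 0.74) satisfies `equivalent(0.74, 0.71)` because 0.74 > 0.71 − 0.1.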
2. Sample size used for the test set and the data provenance:
- Test Set Size: 739 independent images.
- Data Provenance: Gathered from 12 institutions. The countries of origin for the test set are not explicitly stated; the training data (from which test subjects were explicitly excluded) came from Australia, France, Hong Kong, and the USA. Whether collection was retrospective or prospective is not stated, but independence of the test set was ensured by excluding patients from the test-set institutions from the training data.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Not explicitly stated as a fixed number.
- Qualifications of Experts: Ground truth segmentations were generated by a "trained user (typically, a dosimetrist or radiologist)" and then reviewed and approved by a "supervising physician (typically, a radiation oncologist or a radiologist)."
4. Adjudication method for the test set:
- The ground truth generation process involved: initial segmentation by a trained user, followed by review and approval by a supervising physician. If necessary, the data was sent back for re-segmentation and re-review. This constitutes an iterative consensus-building method rather than a strict 2+1 or 3+1 type of adjudication.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance:
- No, an MRMC comparative effectiveness study involving human readers' improvement with AI vs. without AI assistance was not conducted or reported in this summary. The study focused on the standalone algorithmic performance of the AI tool (Contour ProtégéAI) compared to an existing atlas-based automatic segmentation method (MIM Maestro). The device is intended as a "tool to assist" and mandates review/editing by users, but the performance study itself was not a human-in-the-loop clinical trial.
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:
- Yes, the primary study reported is a standalone algorithmic performance comparison. The Dice coefficients were calculated for the algorithm's output directly against the established ground truth, and then compared to the performance of the MIM Maestro atlas segmentation reference device.
7. The type of ground truth used:
- The ground truth used was expert consensus segmentation, established by trained users (dosimetrists or radiologists) and approved by supervising physicians (radiation oncologists or radiologists).
8. The sample size for the training set:
- Training Set Size: 4061 CT images.
9. How the ground truth for the training set was established:
- The ground-truth segmentations used for both training and validation (test set) were established by the same method: they were generated by a "trained user (typically, a dosimetrist or radiologist)" and then "reviewed and approved by a supervising physician (typically, a radiation oncologist or a radiologist) and sent back for re-segmentation and re-review as necessary."
§ 892.2050 Medical image management and processing system.
(a)
Identification. A medical image management and processing system is a device that provides one or more capabilities relating to the review and digital processing of medical images for the purposes of interpretation by a trained practitioner of disease detection, diagnosis, or patient management. The software components may provide advanced or complex image processing functions for image manipulation, enhancement, or quantification that are intended for use in the interpretation and analysis of medical images. Advanced image manipulation functions may include image segmentation, multimodality image registration, or 3D visualization. Complex quantitative functions may include semi-automated measurements or time-series measurements.
(b)
Classification. Class II (special controls; voluntary standards—Digital Imaging and Communications in Medicine (DICOM) Std., Joint Photographic Experts Group (JPEG) Std., Society of Motion Picture and Television Engineers (SMPTE) Test Pattern).