Search Results

Trained medical professionals use Contour ProtégéAl as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAl supports the following indications:

· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.

· Segmenting anatomical structures across a variety of CT anatomical locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAl+ is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl+ is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

Here's a breakdown of Contour ProtégéAI+'s acceptance criteria and study information, based on the provided text:

Acceptance Criteria and Device Performance

The acceptance criteria for each structure's inclusion in the final models were a combination of statistical tests and user evaluation:

Acceptance Criteria	Reported Device Performance (Contour ProtégéAI+)
Statistical non-inferiority of the Dice score compared with the reference predicate (MIM Atlas).	For most structures, the Contour ProtégéAI+ Dice score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined as the lower 95th percentile confidence bound of Contour ProtégéAI+ being greater than 0.1 Dice lower than the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence.
Statistical non-inferiority of the Mean Distance Accuracy (MDA) score compared with the reference predicate (MIM Atlas).	For most structures, the Contour ProtégéAI+ MDA score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined as the lower 95th percentile confidence bound of Contour ProtégéAI+ being greater than 0.1 Dice lower than the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence.
Average user evaluation of 2 or higher (on a three-point scale: 1=negligible, 2=moderate, 3=significant time savings).	The "External Evaluation Score" (Table 2) consistently shows scores of 2 or higher across all listed structures, indicating moderate to significant time savings.
(For models as a whole) Statistically non-inferior cumulative Added Path Loss (APL) compared to the reference predicate.	For all 4.2.0 CT models (Thorax, Abdomen, Female Pelvis, SurePlan MRT), equivalence in cumulative APL was demonstrated (Table 3), with Contour ProtégéAI+ showing lower mean APL values than MIM Atlas.
(For localization accuracy) No specific passing criterion, but results are included.	Localization accuracy results (Table 4) are provided as percentages of images successfully localized for both "Relevant FOV" and "Whole Body CT," ranging from 77% to 100% depending on the structure and model.

Note: Cells highlighted in orange in the original document indicate non-demonstrated equivalence (not reproducible in markdown), and cells marked with '**' indicate that equivalence was not demonstrated because the minimum sample size was not met for that contour.

Study Details

Sample size used for the test set and the data provenance:
- Test Set Sample Size: The Contour ProtégéAI+ subject device was evaluated on a pool of 770 images.
- Data Provenance: The images were gathered from 32 institutions. The verification data used for testing is from a set of institutions that are totally disjoint from the datasets used to train each model. Patient demographics for the testing data are: 53.4% female, 31.3% male, 15.3% unknown; 0.3% ages 0-20, 4.7% ages 20-40, 20.9% ages 40-60, 50.0% ages 60+, 24.1% unknown; varying scanner manufacturers (GE, Siemens, Phillips, Toshiba, unknown). The data is retrospective, originating from clinical treatment plans according to the training set description.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document implies that the ground truth for the test set was validated against "original ground-truth contours" when measuring Dice and MDA against MIM Maestro. However, the expert qualifications are explicitly stated for the training set ground truth, which often implies a similar standard for the test set.
- Ground truth (for training/re-segmentation) was established by:
  - Consultants (physicians and dosimetrists) specifically for this purpose, outside of clinical practice.
  - Initial segmentations were reviewed and corrected by radiation oncologists.
  - Final review and correction by qualified staff at MIM Software (MD or licensed dosimetrists).
  - All segmenters and reviewers were instructed to ensure the highest quality training data according to relevant published contouring guidelines.
Adjudication method for the test set:
- The document doesn't explicitly describe a specific adjudication method like "2+1" or "3+1" for the test set ground truth. However, it does state that "Detailed instructions derived from relevant published contouring guidelines were prepared for the dosimetrists. The initial segmentations were then reviewed and corrected by radiation oncologists against the same standards and guidelines. Qualified staff at MIM Software (MD or licensed dosimetrists) then performed a final review and correction." This process implies a multi-expert review and correction process to establish the ground truth used for both training and evaluation, ensuring a high standard of accuracy.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- A direct MRMC comparative effectiveness study measuring human readers' improvement with AI versus without AI assistance (i.e., human-in-the-loop performance) is not explicitly described in terms of effect size.
- Instead, the study evaluates the standalone performance of the AI device (Contour ProtégéAI+) against a reference device (MIM Maestro atlas segmentation) and user evaluation of time savings.
- The "Average user evaluation of 2 or higher" on a three-point scale (1=negligible, 2=moderate, 3=significant time savings) provides qualitative evidence of perceived improvement in workflow rather than a quantitative measure of diagnostic accuracy improvement due to AI assistance. "Preliminary user evaluation conducted as part of testing demonstrated that Contour ProtégéAI+ yields comparable time-saving functionality when creating contours as other commercially available automatic segmentation products."
If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance evaluation was conducted. The primary comparisons for Dice score, MDA, and cumulative APL are between the Contour ProtégéAI+ algorithm's output and the ground truth, benchmarked against the predicate device's (MIM Maestro atlas segmentation) standalone performance. The results in Table 2 and Table 3 directly show the algorithm's performance.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Expert Consensus Contour (and review): The ground truth was established by expert re-segmentation of images (by consultants, physicians, and dosimetrists) specifically for this purpose, reviewed and corrected by radiation oncologists, and then subjected to a final review and correction by qualified MIM Software staff (MD or licensed dosimetrists). This indicates a robust expert consensus process based on established clinical guidelines.
The sample size for the training set:
- The document states that the CT images for the "training set were obtained from clinical treatment plans for patients prescribed external beam or molecular radiotherapy". However, it does not provide a specific numerical sample size for the training set, only for the test set (770 images). It only mentions being "re-segmented by consultants... specifically for this purpose".
How the ground truth for the training set was established:
- The ground truth for the training set was established through a multi-step expert process:
  - CT images from clinical treatment plans were re-segmented by consultants (physicians and dosimetrists), explicitly for the purpose of creating training data, outside of clinical practice.
  - Detailed instructions from relevant published contouring guidelines were provided to the dosimetrists.
  - Initial segmentations were reviewed and corrected by radiation oncologists against the same standards and guidelines.
  - A final review and correction was performed by qualified staff at MIM Software (MD or licensed dosimetrists).
  - All experts were instructed to spend additional time to ensure the highest quality training data, contouring all specified OAR structures on all images according to referenced standards.

Ask a Question

Ask a specific question about this device

K Number

K231765

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2023-11-08

(145 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964

Predicate For

K250035

Intended Use

Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:

· Segmenting anatomical structures across a variety of CT anatomic locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAI is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

Here's a detailed breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) summary:

Acceptance Criteria and Reported Device Performance for Contour ProtégéAI

1. Table of Acceptance Criteria and Reported Device Performance

Acceptance Criteria Category	Acceptance Criteria	Reported Device Performance (Contour ProtégéAI)
Individual Structure Performance	1. Statistical non-inferiority of the Dice score compared with the reference predicate (MIM Maestro atlas segmentation). 2. Statistical non-inferiority of the MDA score compared with the reference predicate (MIM Maestro atlas segmentation). 3. Average user evaluation score of 2 or higher (on a 3-point scale). A structure is deemed acceptable if it passes two or more of these three tests.	Dice Score: For all reported structures in the Head and Neck, Thorax, and Whole Body - Physiological Uptake Organs CT models, Contour ProtégéAI generally showed higher mean Dice scores (indicating better overlap with ground truth) and often superior lower 95th percentile confidence bounds compared to MIM Atlas. Equivalence (defined as lower 95th percentile confidence bound of ProtégéAI Dice > 0.1 Dice lower than MIM Atlas mean) was demonstrated for most structures, often with direct improvement. MDA Score: For most reported structures, Contour ProtégéAI showed lower mean MDA scores (indicating better boundary accuracy/distance to ground truth) and often superior upper 95th percentile confidence bounds compared to MIM Atlas. Equivalence was demonstrated for most structures, again often with direct improvement. External Evaluation Score: All reported structures achieved an average user evaluation score of 2 or higher (ranging from 2.0 to 3.0), indicating moderate to significant time savings. Overall: The summary states: "Contour ProtégéAl results were equivalent or had better performance than the MIM Maestro atlas segmentation reference device." And "only structures that pass two or more of the following three tests could be included in the final models". This indicates successful performance against the criteria for all included structures.
Model-as-a-Whole Performance	Statistically non-inferior cumulative Added Path Length (APL) compared to the reference predicate.	Cumulative APL (mm): - Head and Neck CT: MIM Atlas: 38.69 ± 33.36; Contour ProtégéAI: 28.61 ± 29.59. Equivalence demonstrated. - Thorax CT: MIM Atlas: 89.24 ± 82.73; Contour ProtégéAI: 65.44 ± 68.85. Equivalence demonstrated. - Whole Body - Physiological Uptake Organs CT: MIM Atlas: 138.06 ± 142.42; Contour ProtégéAI: 98.20 ± 127.11. Equivalence demonstrated. This indicates that Contour ProtégéAI performs with lower or equivalent APL, suggesting less editing time for the entire model.
Localization Accuracy (Informational)	No passing criterion, but results included for user understanding.	Percentage of images successfully localized by Contour ProtégéAI is provided for each structure and model. Most structures show 100% localization accuracy within their relevant FOV for Head and Neck and Thorax models. Some structures (e.g., Cochlea_L/R, OpticChiasm, Pancreas) show slightly lower percentages, indicating instances where the structure was not localized. For Whole Body CT, many structures also show 100%, with a few exceptions (e.g., Bladder: 95%, LN_Iliac: 64%).

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size: 754 independent images.
Data Provenance: Gathered from 27 institutions. The document does not explicitly state the countries of origin for the test set, but for the training set, it mentions "across multiple continents" and lists "USA" and "Hong Kong" and "Australia." It is reasonable to infer the test set would also be from diverse institutions/countries. The data is retrospective as it was gathered from existing clinical treatment plans.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

The ground truth for the test set was established by a multi-stage process involving:

Initial Segmentation: Consultants (physicians and dosimetrists).
Review and Correction: A radiation oncologist.
Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).

While the exact number of experts is not specified, it involved multiple individuals with specialized qualifications (physicians, dosimetrists, radiation oncologists, M.D.s, licensed dosimetrists).

4. Adjudication Method for the Test Set

The ground truth generation involved a multi-stage review and correction process:

Initial segmentations by consultants (physicians and dosimetrists).
Review and correction by a radiation oncologist against established standards and guidelines.
Final review and correction by qualified staff at MIM Software (M.D. or licensed dosimetrists).

This indicates a sequential refinement process, potentially similar to a "cascading consensus" or "expert review and correction" rather than a specific numeric adjudication method like 2+1 or 3+1 for resolving disagreements among multiple initial segmenters. The explicit mentioning of "correction" at multiple stages suggests an iterative process where initial segmentations were refined based on expert review.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done

No, a traditional MRMC comparative effectiveness study was not explicitly stated in the provided text in the context of comparing human readers with and without AI assistance to measure an effect size on human performance.

Instead, the study primarily focused on the standalone performance of the AI model (Contour ProtégéAI) compared to an existing atlas-based segmentation system (MIM Maestro) using quantitative metrics (Dice, MDA, APL) and a user evaluation for "time savings functionality." The user evaluation (average score of 2 or higher on a three-point scale for time savings) provides an indirect measure of the AI's utility, but not a direct MRMC study on human reader improvement with AI.

6. If a Standalone Study Was Done

Yes, a standalone study was done.

Contour ProtégéAI (the algorithm under review) was evaluated in comparison to a reference predicate device, MIM Maestro (K071964), which uses an atlas-based segmentation approach.
The comparison involved quantitative metrics like Dice score, MDA, and cumulative APL, as well as a qualitative user evaluation. The goal was to show that Contour ProtégéAI was equivalent or superior in performance to the reference predicate in a standalone capacity.

7. The Type of Ground Truth Used

The ground truth used for the test set was expert consensus / expert-derived segmentation.

It was derived from clinical treatment plans, but the original segmentations were not used.
The images were re-segmented by consultants (physicians and dosimetrists) specifically for this purpose, following detailed clinical contouring guidelines.
These initial segmentations were then reviewed and corrected by a radiation oncologist.
A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
All segmenters were instructed to ensure the "highest quality training data" and contour according to referenced standards.

8. The Sample Size for the Training Set

CT Models: A total of 550 CT images from 41 clinical sites.
The document implies that these 550 images are specifically for the training of the final 4.1.0 neural network models for CT. It does not explicitly state the training set size for MR models if they were separate.

9. How the Ground Truth for the Training Set Was Established

The ground truth for the training set was established through a rigorous, multi-stage expert-driven process, identical to the description for the test set ground truth:

Initial Segmentation: Performed by consultants (physicians and dosimetrists) following detailed instructions derived from published clinical contouring guidelines.
Review and Correction: By a radiation oncologist against the same standards and guidelines.
Final Review and Correction: By qualified staff at MIM Software (M.D. or licensed dosimetrists).
- The goal was "to ensure the highest quality training data."
- Segmenters were asked to contour all specified OAR structures on all images according to referenced standards, regardless of proximity to the treatment field.

Ask a Question

Ask a specific question about this device

K Number

K223774

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2023-04-06

(111 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K213976

Predicate For

N/A

Intended Use

· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transfering contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.

· Segmenting anatomical structures across a variety of CT anatomic locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

AI/ML Overview

Here's a breakdown of the acceptance criteria and study details for Contour ProtégéAI, based on the provided document:

Acceptance Criteria and Device Performance Study for Contour ProtégéAI

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria for Contour ProtégéAI were based on a non-inferiority study comparing its segmentation performance (measured by Dice coefficient) to a predicate device, MIM Maestro (K071964), specifically using atlases built from the same training data. The key acceptance criterion was:

Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is greater than 0.1 Dice lower than the mean MIM atlas segmentation reference device performance.

This translates to being either equivalent to or having better performance than the MIM Maestro atlas segmentation reference device. The acceptance was demonstrated at a p=0.05 significance level.

The table below summarizes the reported mean ± standard deviation Dice coefficients for both the MIM Atlas (predicate) and Contour ProtégéAI, along with the lower 95th percentile confidence bound for Contour ProtégéAI, for various anatomical structures across different CT models (4.0.0 CT Model). The asterisk (*) next to Contour ProtégéAI performance indicates that equivalence was demonstrated at p=0.05.

Note: The document presents a single large table for all structures and models. For clarity, a few representative examples from each CT Model are extracted below to illustrate the reported performance against the acceptance criteria. The full table from the document should be consulted for comprehensive results.

4.0.0 CT Model:	Structure:	MIM Atlas (Mean ± Std Dice)	Contour ProtégéAI (Mean ± Std Dice, Lower 95th Percentile Bound)	Acceptance Met?
Head and Neck	Bone_Mandible	0.81 ± 0.07	0.85 ± 0.07 (0.82) *	Yes
Head and Neck	Brain	0.97 ± 0.01	0.98 ± 0.01 (0.97) *	Yes
Head and Neck	SpinalCord	0.66 ± 0.14	0.63 ± 0.16 (0.57) *	Yes
Thorax	Esophagus	0.49 ± 0.16	0.70 ± 0.15 (0.65) *	Yes
Thorax	Heart	0.88 ± 0.08	0.90 ± 0.07 (0.88) *	Yes
Thorax	Lung_L	0.95 ± 0.02	0.96 ± 0.02 (0.96) *	Yes
Abdomen	Bladder	0.72 ± 0.23	0.91 ± 0.12 (0.81) *	Yes
Abdomen	Liver	0.84 ± 0.12	0.92 ± 0.08 (0.86) *	Yes
Pelvis	Prostate	0.74 ± 0.12	0.85 ± 0.06 (0.82) *	Yes
Pelvis	Rectum	0.63 ± 0.18	0.83 ± 0.11 (0.79) *	Yes
SurePlan MRT	Bone	0.76 ± 0.08	0.87 ± 0.05 (0.74) *	Yes
SurePlan MRT	Spleen	0.72 ± 0.10	0.95 ± 0.03 (0.87) *	Yes

2. Sample Size and Data Provenance for the Test Set

Sample Size for Test Set: 819 independent images.
Data Provenance: The images were gathered from 10 institutions. The document explicitly states that the test set institutions are "totally disjoint from the training datasets used to train each model." The countries of origin for the test set are not explicitly detailed, but since the training data included multiple countries (USA, Hong Kong, Australia), it's implied the test set could also be diverse. The data was retrospective clinical data, re-segmented for this specific purpose.

3. Number of Experts and Qualifications for Ground Truth

Number of Experts: The ground truth for the test set was established by "consultants (physicians and dosimetrists)." The exact number is not specified, but it implies a team. These initial segmentations were then "reviewed and corrected by a radiation oncologist." Finally, "Qualified staff at MIM Software (M.D. or licensed dosimetrists) then performed a final review and correction."
Qualifications of Experts:
- Consultants: Physicians and dosimetrists.
- Review and Correction: Radiation oncologist.
- Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).
- All segmenters and reviewers were given "detailed instructions derived from relevant published clinical contouring guidelines" and instructed to ensure the "highest quality training data."

4. Adjudication Method for the Test Set

The adjudication method involved a multi-stage process:

Initial Segmentation: Done by consultants (physicians and dosimetrists).
First Review & Correction: By a radiation oncologist.
Final Review & Correction: By qualified staff (M.D. or licensed dosimetrists) at MIM Software.

This indicates a sequential review process, rather than a specific (e.g., 2+1, 3+1) consensus model among peers at the same stage.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was explicitly described comparing human readers with AI assistance versus without AI assistance. The study focused on the algorithm's standalone performance compared to an atlas-based predicate device, and a preliminary user evaluation for time-saving was mentioned, but not in the context of an MRMC study.

6. Standalone (Algorithm Only) Performance

Yes, a standalone (algorithm only) performance study was conducted. The Dice coefficient results presented in the table demonstrate the performance of the Contour ProtégéAI algorithm compared to the MIM Maestro atlas-based segmentation, without human intervention in the segmentation process being evaluated. The document explicitly states the "performance of both segmentation devices was measured by calculating the Dice score of the novel segmentations with the original ground-truth contours."

7. Type of Ground Truth Used

The ground truth used was expert consensus. It was established by a multi-stage review and correction process involving physicians, dosimetrists, a radiation oncologist, and qualified MIM Software staff who re-segmented images "specifically for this purpose, outside of clinical practice" and were instructed to adhere to "relevant published clinical contouring guidelines."

8. Sample Size for the Training Set

The training set consisted of 326 CT images gathered from 37 clinical sites across multiple countries (USA, Hong Kong, Australia).

9. How the Ground Truth for the Training Set was Established

The ground truth for the training set was established through a rigorous, multi-step expert review process:

CT images (from clinical treatment plans) were re-segmented by consultants (physicians and dosimetrists).
These initial segmentations were then reviewed and corrected by a radiation oncologist against the same standards and guidelines.
A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
All involved in ground truth establishment were given "detailed instructions derived from relevant published clinical contouring guidelines" and were explicitly asked "to spend additional time to ensure the highest quality training data" and to contour all specified structures "according to referenced standards."

Ask a Question

Ask a specific question about this device

K Number

K213976

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2022-02-03

(45 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K210632

Predicate For

K223774

Intended Use

· Segmenting normal structures across a variety of CT anatomical locations.

· And segmenting normal structures of the prostate, seminal vesicles, and urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAl is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

The user is asking for information about the acceptance criteria and study proving the device's performance from the provided FDA 510(k) summary. I will extract the relevant information to answer each point.

Here's a breakdown of the requested information and where to find it in the document:

A table of acceptance criteria and the reported device performance: This will primarily come from the "Testing and Performance Data" section, specifically the table comparing MIM Atlas and Contour ProtégéAI Dice coefficients and the equivalence definition.
Sample sizes used for the test set and the data provenance: Found in the "Testing and Performance Data" section.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts: Found in the "Testing and Performance Data" section.
Adjudication method (e.g. 2+1, 3+1, none) for the test set: Found in the "Testing and Performance Data" section regarding ground truth generation.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance: The document describes a comparison between the AI (Contour ProtégéAI) and an atlas-based segmentation (MIM Maestro reference device), not a human-in-the-loop study with human readers comparing performance with and without AI assistance.
If a standalone (i.e. algorithm only without human-in-the-loop performance) was done: The provided data compares the algorithm's performance against a ground truth and an atlas-based reference algorithm. The use of "appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI" implies it's an AI-assisted tool, but the testing itself appears to be an algorithmic comparison.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.): Found in the "Testing and Performance Data" section.
The sample size for the training set: Found in the "Device Description" and "Testing and Performance Data" sections.
How the ground truth for the training set was established: Found in the "Testing and Performance Data" section.

Here's the detailed response based on the provided document:

Acceptance Criteria and Study Proving Device Performance

The study evaluated the performance of Contour ProtégéAI, specifically its new 3.0.0 CT neural network models, by comparing its segmentation accuracy (Dice coefficient) against a reference atlas-based segmentation device, MIM Maestro (K071964).

1. Table of Acceptance Criteria and Reported Device Performance:

Item	Acceptance Criteria	Reported Device Performance and Equivalence
Equivalence	Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is greater than 0.1 Dice lower than the mean MIM atlas segmentation reference device performance. This means: Contour ProtégéAI_LB95 > MIM_Atlas_Mean - 0.1	"Contour ProtégéAI results were equivalent or had better performance than the MM Maestro atlas segmentation reference device." This was demonstrated at a p=0.05 significance level for all structures. Below is a sample of reported Dice coefficients, where * indicates equivalence demonstrated.*

2. Sample size used for the test set and the data provenance:

Test Set Size: 739 independent images.
Data Provenance: Gathered from 12 institutions. The specific countries for the test set are not explicitly stated, but the training data (from which test subjects were explicitly excluded) was from Australia, France, Hong Kong, and the USA. The data collection was prospective in the sense that the training data explicitly excluded patients from the institutions contributing to the test set, ensuring independence.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

Number of Experts: Not explicitly stated as a fixed number.
Qualifications of Experts: Ground truth segmentations were generated by a "trained user (typically, a dosimetrist or radiologist)" and then reviewed and approved by a "supervising physician (typically, a radiation oncologist or a radiologist)."

4. Adjudication method for the test set:

The ground truth generation process involved: initial segmentation by a trained user, followed by review and approval by a supervising physician. If necessary, the data was sent back for re-segmentation and re-review. This constitutes an iterative consensus-building method rather than a strict 2+1 or 3+1 type of adjudication.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

No, an MRMC comparative effectiveness study involving human readers' improvement with AI vs. without AI assistance was not conducted or reported in this summary. The study focused on the standalone algorithmic performance of the AI tool (Contour ProtégéAI) compared to an existing atlas-based automatic segmentation method (MIM Maestro). The device is intended as a "tool to assist" and mandates review/editing by users, but the performance study itself was not a human-in-the-loop clinical trial.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

Yes, the primary study reported is a standalone algorithmic performance comparison. The Dice coefficients were calculated for the algorithm's output directly against the established ground truth, and then compared to the performance of the MIM Maestro atlas segmentation reference device.

7. The type of ground truth used:

The ground truth used was expert consensus segmentation, established by trained users (dosimetrists or radiologists) and approved by supervising physicians (radiation oncologists or radiologists).

8. The sample size for the training set:

Training Set Size: 4061 CT images.

9. How the ground truth for the training set was established:

The ground-truth segmentations used for both training and validation (test set) were established using the same method: generated by a "trained user (typically, a dosimetrist or radiologist)" that were then "reviewed and approved by a supervising physician (typically, a radiation oncologist or a radiologist) and sent back for re-segmentation and re-review as necessary."

Ask a Question

Ask a specific question about this device

Page 1 of 1