Search Results

Trained medical professionals use Contour ProtégéAl as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAl supports the following indications:

· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.

· Segmenting anatomical structures across a variety of CT anatomical locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAl+ is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl+ is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

Here's a breakdown of Contour ProtégéAI+'s acceptance criteria and study information, based on the provided text:

Acceptance Criteria and Device Performance

The acceptance criteria for each structure's inclusion in the final models were a combination of statistical tests and user evaluation:

Acceptance Criteria	Reported Device Performance (Contour ProtégéAI+)
Statistical non-inferiority of the Dice score compared with the reference predicate (MIM Atlas).	For most structures, the Contour ProtégéAI+ Dice score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined as the lower 95th percentile confidence bound of Contour ProtégéAI+ being greater than 0.1 Dice lower than the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence.
Statistical non-inferiority of the Mean Distance Accuracy (MDA) score compared with the reference predicate (MIM Atlas).	For most structures, the Contour ProtégéAI+ MDA score mean and 95th percentile confidence bound were equivalent to or better than the MIM Atlas. Equivalence was defined as the lower 95th percentile confidence bound of Contour ProtégéAI+ being greater than 0.1 Dice lower than the mean MIM Atlas performance. Results are shown in Table 2, with '*' indicating demonstrated equivalence.
Average user evaluation of 2 or higher (on a three-point scale: 1=negligible, 2=moderate, 3=significant time savings).	The "External Evaluation Score" (Table 2) consistently shows scores of 2 or higher across all listed structures, indicating moderate to significant time savings.
(For models as a whole) Statistically non-inferior cumulative Added Path Loss (APL) compared to the reference predicate.	For all 4.2.0 CT models (Thorax, Abdomen, Female Pelvis, SurePlan MRT), equivalence in cumulative APL was demonstrated (Table 3), with Contour ProtégéAI+ showing lower mean APL values than MIM Atlas.
(For localization accuracy) No specific passing criterion, but results are included.	Localization accuracy results (Table 4) are provided as percentages of images successfully localized for both "Relevant FOV" and "Whole Body CT," ranging from 77% to 100% depending on the structure and model.

Note: Cells highlighted in orange in the original document indicate non-demonstrated equivalence (not reproducible in markdown), and cells marked with '**' indicate that equivalence was not demonstrated because the minimum sample size was not met for that contour.

Study Details

Sample size used for the test set and the data provenance:
- Test Set Sample Size: The Contour ProtégéAI+ subject device was evaluated on a pool of 770 images.
- Data Provenance: The images were gathered from 32 institutions. The verification data used for testing is from a set of institutions that are totally disjoint from the datasets used to train each model. Patient demographics for the testing data are: 53.4% female, 31.3% male, 15.3% unknown; 0.3% ages 0-20, 4.7% ages 20-40, 20.9% ages 40-60, 50.0% ages 60+, 24.1% unknown; varying scanner manufacturers (GE, Siemens, Phillips, Toshiba, unknown). The data is retrospective, originating from clinical treatment plans according to the training set description.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document implies that the ground truth for the test set was validated against "original ground-truth contours" when measuring Dice and MDA against MIM Maestro. However, the expert qualifications are explicitly stated for the training set ground truth, which often implies a similar standard for the test set.
- Ground truth (for training/re-segmentation) was established by:
  - Consultants (physicians and dosimetrists) specifically for this purpose, outside of clinical practice.
  - Initial segmentations were reviewed and corrected by radiation oncologists.
  - Final review and correction by qualified staff at MIM Software (MD or licensed dosimetrists).
  - All segmenters and reviewers were instructed to ensure the highest quality training data according to relevant published contouring guidelines.
Adjudication method for the test set:
- The document doesn't explicitly describe a specific adjudication method like "2+1" or "3+1" for the test set ground truth. However, it does state that "Detailed instructions derived from relevant published contouring guidelines were prepared for the dosimetrists. The initial segmentations were then reviewed and corrected by radiation oncologists against the same standards and guidelines. Qualified staff at MIM Software (MD or licensed dosimetrists) then performed a final review and correction." This process implies a multi-expert review and correction process to establish the ground truth used for both training and evaluation, ensuring a high standard of accuracy.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- A direct MRMC comparative effectiveness study measuring human readers' improvement with AI versus without AI assistance (i.e., human-in-the-loop performance) is not explicitly described in terms of effect size.
- Instead, the study evaluates the standalone performance of the AI device (Contour ProtégéAI+) against a reference device (MIM Maestro atlas segmentation) and user evaluation of time savings.
- The "Average user evaluation of 2 or higher" on a three-point scale (1=negligible, 2=moderate, 3=significant time savings) provides qualitative evidence of perceived improvement in workflow rather than a quantitative measure of diagnostic accuracy improvement due to AI assistance. "Preliminary user evaluation conducted as part of testing demonstrated that Contour ProtégéAI+ yields comparable time-saving functionality when creating contours as other commercially available automatic segmentation products."
If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance evaluation was conducted. The primary comparisons for Dice score, MDA, and cumulative APL are between the Contour ProtégéAI+ algorithm's output and the ground truth, benchmarked against the predicate device's (MIM Maestro atlas segmentation) standalone performance. The results in Table 2 and Table 3 directly show the algorithm's performance.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Expert Consensus Contour (and review): The ground truth was established by expert re-segmentation of images (by consultants, physicians, and dosimetrists) specifically for this purpose, reviewed and corrected by radiation oncologists, and then subjected to a final review and correction by qualified MIM Software staff (MD or licensed dosimetrists). This indicates a robust expert consensus process based on established clinical guidelines.
The sample size for the training set:
- The document states that the CT images for the "training set were obtained from clinical treatment plans for patients prescribed external beam or molecular radiotherapy". However, it does not provide a specific numerical sample size for the training set, only for the test set (770 images). It only mentions being "re-segmented by consultants... specifically for this purpose".
How the ground truth for the training set was established:
- The ground truth for the training set was established through a multi-step expert process:
  - CT images from clinical treatment plans were re-segmented by consultants (physicians and dosimetrists), explicitly for the purpose of creating training data, outside of clinical practice.
  - Detailed instructions from relevant published contouring guidelines were provided to the dosimetrists.
  - Initial segmentations were reviewed and corrected by radiation oncologists against the same standards and guidelines.
  - A final review and correction was performed by qualified staff at MIM Software (MD or licensed dosimetrists).
  - All experts were instructed to spend additional time to ensure the highest quality training data, contouring all specified OAR structures on all images according to referenced standards.

Ask a Question

Ask a specific question about this device

K Number

K233998

Device Name

TRAQinform IQ

Manufacturer

AIQ Global, Inc.

Date Cleared

2024-09-05

(262 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K173444

Predicate For

N/A

Intended Use

TRAQinform IQ is a software only device that provides a quantitative TRAQinform Report on lesions identified as Regions of Interest (ROI) in PET/CT DICOM compliant imaging data acquired, interpreted, and reported on per local practice prior to device use.

Clinicians responsible for patient care and for ordering TRAQinform Reports as an adjunct to locally reported image interpretation do not interact directly with the device. Clinicians responsible for local image interpretation do not interact with the device and generate their reporting before and independently of the TRAQinform Report.

The TRAQinform Report is generated by the device manufacturer and signed by a U.S. board certified physician responsible for supervising central report generation and qualified to practice nuclear radiology/medicine. The TRAQinform Report is for use by trained medical professionals including but not limited to oncologists, nuclear radiologists/physicians, medical imaging technologists, dosimetrists, and physicists.

TRAQinform IQ software contains the following functionalities:

Automated matching of ROI between previously performed CT and PET/CT DICOM 3.0 volumetric medical images.
In order to perform automated matching of ROI and quantitative analysis of previously performed CT and PET/CT DICOM 3.0 volumetric medical images, the software initially performs the following functions:
Machine learning skeletal and anatomic structure segmentation.
Threshold-based ROI identification and contouring.
Automated quantitative analysis to assess previously performed CT and PET/CT DICOM 3.0 volumetric medical images, including: change in total volume and density of each identified ROI, and change in Fludeoxyglucose F18 (FDG) tracer uptake of each identified ROI among images.
Generation of images of the anatomy combined with spatial and quantitative information, including computed classification of quantitative FDG ROI changes.

For multi-timepoint quantitative analysis, recommended use is in adult patients 22 years and older with partial or whole-body PET/CT acquired following administration of FDG per approved drug prescribing information and with the second FDG administration separated from the first by a period not to exceed 12 months.

For single-timepoint quantitative analysis, recommended use is in adult patients 22 years and older with partial or whole-body PET/CT following administration of FDG, a PSMA targeted PET drug, or a SSTR-targeted PET drug per approved drug prescribing information.

Discrepancies between TRAQinform IQ and local PET/CT reporting have been investigated and use of TRAQinform IQ has not been established for binary patient level progression or non-progression decisions without multidisciplinary review. Discrepancies between TRAQinform IQ and local PET/CT reporting that could impact patient care should therefore prompt consultation with subject matter experts (for example, in tumor board), with a patientcentered focus on discrepant imaging regions and with blinded or otherwise neutral adjudication regarding interpretation/classification source.

TRAQinform IQ is not intended to diagnose any disease, replace the diagnostic procedures for interpretation of CT or PET/CT images, recommend any specific treatment, nor is it intended to replace the skill and judgment of a qualified medical professional.

Device Description

TRAQinform IQ is a software only device that provides quantitative analysis of lesions identified as Regions of Interest (ROI) in PET/CT DICOM compliant imaging data acquired, interpreted, and reported on per local practice prior to device use.

The input to TRAQinform IQ is CT and PET/CT images as supported by ACR/NEMA DICOM 3.0.

The following steps are performed by the software:

Automatic threshold-based ROI segmentation:
ROI can also be imported from external sources (other validated tools or manual contouring by qualified medical personnel).
Automatic ROI registration between multiple images:
Images can be from the same or different imaging modality.
Images can be from the same or different PET tracer.
Images can be from the same or different date.
Automatic matching of ROI between multiple, previously performed images.
Automatic quantification of dynamic changes among images including, but not limited to:
Changes in ROI shape.
Single ROI splitting into multiple ROI.
Multiple ROI combining into a single ROI.
ROI appearing, disappearing, and re-appearing across images.
A comprehensive summary analysis.

TRAQinform IQ calculates spatial and quantitative metrics for each individual ROI. These metrics are provided as a TRAQinform Report. TRAQinform IQ uses computational algorithms to detect, fuse and analyze ROI and provides the following outputs:

Identification of anatomic location of ROI in all areas of the body.
A quantitative analysis of functional and anatomic data for CT and PET/CT scans, including:
Volume of all identified ROI on each image;
Change in volume of each identified ROI among images:
Total volume of all identified ROI on each image;
Change in total volume of all identified ROI on each image;
Heterogeneity of change in volume of each identified ROI:
For PET scans:
Tracer uptake (SUVmax, SUVtotal, SUVmean, SUVhetero) of each identified ROI on each image:
Change in tracer uptake (SUVmax, SUVtotal, SUVmean) of each identified ROI among images:
Total tracer uptake (SUVmax, SUVtotal, SUVmean, SUVhetero) of all identified ROI on each image:
Change in total tracer uptake (SUVmax, SUVtotal, SUVmean, SUVhetero) of all identified ROI on each image:
Heterogeneity of change in tracer uptake (SUVhetero) among identified ROI.
For CT scans
Radio density (HUmax, HUtotal, HUmean, HUhetero) for each identified ROI on each image:
Changes in radio density (HUmax. HUtotal. HUmean) of all identified ROI on each image:
Change in total radio density (HUmax, HUtotal, HUmean, HUhetero) of all identified ROI on each image;
Heterogeneity of change in radio density (HUhetero) among identified ROI.
2D graphical renderings of medical images, including Maximum Intensity Projections of the PET and CT, with overlayed and labeled/color-coded ROI, for inclusion in TRAQinform Reports.
3D labeled contours for ROI, anatomic structures, and skeletal structures.

The TRAQinform IQ software operates in a secure cloud environment.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

Acceptance Criteria and Device Performance Study for TRAQinform IQ

1. Acceptance Criteria and Reported Device Performance

The provided text summarizes performance data from two studies: a "Test-Retest" reliability study and a "Pivotal Reader Study." The acceptance criteria, while not explicitly stated as "acceptance criteria" for regulatory submission, can be inferred from the reported performance measures and context of the studies.

Test-Retest Study: Limits of Repeatability for Quantitative Features

This study established the expected variability of the device's quantitative measurements. The limits of repeatability serve as an informal "acceptance criteria" for the intrinsic variability of the measurements.

Feature	Lower Limit (%)	Upper Limit (%)
SUVmax	-27.0	56.8
SUVmean	-20.2	38.5
SUVtotal	-54.1	144.5
Volume	-52.6	113.9

Reported Device Performance: The table above is the reported device performance for the test-retest study, indicating the interval within which 95% of repeated measurements are expected to lie.

Pivotal Reader Study: Agreement with Expert Panel

This study evaluated the clinical utility of TRAQinform IQ by assessing how well its output, when presented to oncologists, aligned with an expert panel's assessment. The "acceptance criteria" here would be an adequate level of agreement, though specific thresholds are not explicitly defined as pass/fail.

Metric	Reported Device Performance
Overall Percent Agreement (OPA) with panel (95% CI)	41% to 76% (compared to chance performance of 50%)
Positive Percent Agreement (PPA) (oncologists vs. panel for positive progression)	14/18 = 78%
Negative Percent Agreement (NPA) (oncologists vs. panel for negative progression)	5/14 = 36%
Agreement between TRAQinform IQ classification and panel for highlighted ROIs (by ROI classification)	New: 37/53 (70%)Increasing: 18/26 (70%)Unchanged: 11/13 (85%)Decreasing: 5/8 (63%)Disappearing: 10/11 (90%)

Note on Acceptance: The document states that the "performance data demonstrate that the TRAQinform IQ is as safe and effective as the QTxl." This implies that the reported performance metrics were deemed acceptable for substantial equivalence.

2. Sample Sizes and Data Provenance

Test-Retest Study:

Sample Size: 31 patients.
Data Provenance: Patients with non-small cell lung cancer, received two FDG PET/CT images within 1 week pretreatment. No explicit location (e.g., country) is given, but the context of an FDA submission suggests data highly relevant to the US market. The study design (two scans within 1 week) indicates a prospective, controlled data collection for evaluating reliability.

Pivotal Reader Study:

Sample Size: 103 patients, each with two sequential FDG PET/CT scans (total 206 scans).
Data Provenance: Images acquired between 2005 and 2022 from patients scanned at 10 or more imaging centers in at least 3 U.S. states. This indicates retrospective data collection from real-world clinical practice in the USA. Specific scanner information (manufacturers and models) is provided, and 84 patients had scans on the same scanner for baseline and follow-up. Patient demographics (cancer type, sex, age, weight, race) are also detailed.

3. Number of Experts and Qualifications for Ground Truth

Test-Retest Study:

The ground truth for this study was based on the device's own measurements. There is no mention of expert image interpretation being used to establish a ground truth for "limits of repeatability."

Pivotal Reader Study:

Number of Experts: A panel of three experts was used.
Qualifications of Experts: Two radiologists and one oncologist. No specific experience levels (e.g., "10 years of experience") are explicitly given, but their titles (radiologist, oncologist) imply qualified medical professionals in their respective specialties.

4. Adjudication Method for the Test Set

Pivotal Reader Study (for expert panel ground truth):

The data states: "Imaging and local reporting on these 23 + 9 = 32 patients was sent to a panel of two radiologists and one oncologist, together serving as a reference source against which to quantify..." This suggests a consensus-based adjudication method (all three experts together formed the reference source), rather than a majority rule or other multi-reader approach. The document doesn't specify if it was a "2+1" or "3+1" approach, but it implies a collective decision by the panel.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

Yes, a form of MRMC study was done. The "Pivotal Reader Study" involved three oncologist "report evaluators" reading cases without and then with the adjunctive TRAQinform Report.
Effect Size of Human Reader Improvement: The study demonstrates how the AI assistance changes human reader interpretations.
- 23 patients initially classified as "negative for progression" by oncologists without the device were reevaluated to "positive" with the device.
- 9 patients initially classified as "positive for progression" by oncologists without the device were reevaluated to "negative" with the device.
This indicates that the AI report prompted a re-evaluation and change in classification for 32 out of 103 patients (approx. 31%). The "effect size" is the shift in clinical decision, influencing a significant percentage of cases. The PPA and NPA against the expert panel further quantify the agreement (or disagreement) after AI assistance. The key finding is the change in oncologist assessment, even if not directly framed as an "improvement" in accuracy by the text itself, but rather as influencing the decision toward what the expert panel considered ground truth.

6. Standalone (Algorithm Only) Performance

Yes, standalone performance aspects were evaluated indirectly.
- The "Test-Retest" study primarily assesses the standalone stability and reproducibility of the algorithm's quantitative measurements (SUVmax, SUVmean, SUVtotal, Volume) without human interaction influencing the values themselves.
- The "agreement... with the device classification" for ROIs highlighted by the report evaluators (70-90% for various ROI changes) is also a measure of the algorithm's performance against the expert panel's assessment. This isn't a full "standalone diagnostic accuracy" but rather an evaluation of the algorithm's output (classification of ROI changes) compared to the panel.
- The text also references "early feasibility testing of device component functionality," with published reporting available for CT anatomy segmentation (Weisman 2023), ROI detection methodology (Perk 2018), and ROI matching (Huff 2023). While not detailed here, these studies would likely have included standalone performance evaluation for those specific algorithmic components.

7. Type of Ground Truth Used

Expert Consensus: For the "Pivotal Reader Study," the primary ground truth for patient-level progression assessment (PPA, NPA, OPA) was established by a panel of two radiologists and one oncologist acting as a "reference source."
Inherent Ground Truth (device's own outputs): For the "Test-Retest" study, the ground truth for repeatability was the device's own measurements across repeated scans, assessed for consistency.

8. Sample Size for the Training Set

The document does not explicitly state the sample size for the training set used for the TRAQinform IQ algorithm.
It does mention that the software uses "Machine learning skeletal and anatomic structure segmentation" (page 4, also page 7, 8). While the exact training dataset size isn't listed, the reference to published papers (Weisman 2023, Perk 2018, Huff 2023) suggests that the underlying machine learning components would have been trained using relevant datasets as described in those publications.

9. How Ground Truth for Training Set was Established

The document does not explicitly describe how the ground truth for the training set was established.
Given the mention of "Machine learning skeletal and anatomic structure segmentation" and "Threshold-based ROI identification and contouring," the ground truth for training these models would typically involve expert annotations of anatomical structures and ROIs on medical images. This would be consistent with standard practices for training medical image segmentation and detection algorithms. The referenced external publications (Weisman 2023, Perk 2018, Huff 2023) would contain details on their specific training methodologies and ground truth establishment.

Ask a Question

Ask a specific question about this device

K Number

K231765

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2023-11-08

(145 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964

Predicate For

K250035

Intended Use

Trained medical professionals use Contour ProtégéAI as a tool to assist in the automated processing of digital medical images of modalities CT and MR, as supported by ACR/NEMA DICOM 3.0. In addition, Contour ProtégéAI supports the following indications:

· Segmenting anatomical structures across a variety of CT anatomic locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAI is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

Here's a detailed breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) summary:

Acceptance Criteria and Reported Device Performance for Contour ProtégéAI

1. Table of Acceptance Criteria and Reported Device Performance

Acceptance Criteria Category	Acceptance Criteria	Reported Device Performance (Contour ProtégéAI)
Individual Structure Performance	1. Statistical non-inferiority of the Dice score compared with the reference predicate (MIM Maestro atlas segmentation). 2. Statistical non-inferiority of the MDA score compared with the reference predicate (MIM Maestro atlas segmentation). 3. Average user evaluation score of 2 or higher (on a 3-point scale). A structure is deemed acceptable if it passes two or more of these three tests.	Dice Score: For all reported structures in the Head and Neck, Thorax, and Whole Body - Physiological Uptake Organs CT models, Contour ProtégéAI generally showed higher mean Dice scores (indicating better overlap with ground truth) and often superior lower 95th percentile confidence bounds compared to MIM Atlas. Equivalence (defined as lower 95th percentile confidence bound of ProtégéAI Dice > 0.1 Dice lower than MIM Atlas mean) was demonstrated for most structures, often with direct improvement. MDA Score: For most reported structures, Contour ProtégéAI showed lower mean MDA scores (indicating better boundary accuracy/distance to ground truth) and often superior upper 95th percentile confidence bounds compared to MIM Atlas. Equivalence was demonstrated for most structures, again often with direct improvement. External Evaluation Score: All reported structures achieved an average user evaluation score of 2 or higher (ranging from 2.0 to 3.0), indicating moderate to significant time savings. Overall: The summary states: "Contour ProtégéAl results were equivalent or had better performance than the MIM Maestro atlas segmentation reference device." And "only structures that pass two or more of the following three tests could be included in the final models". This indicates successful performance against the criteria for all included structures.
Model-as-a-Whole Performance	Statistically non-inferior cumulative Added Path Length (APL) compared to the reference predicate.	Cumulative APL (mm): - Head and Neck CT: MIM Atlas: 38.69 ± 33.36; Contour ProtégéAI: 28.61 ± 29.59. Equivalence demonstrated. - Thorax CT: MIM Atlas: 89.24 ± 82.73; Contour ProtégéAI: 65.44 ± 68.85. Equivalence demonstrated. - Whole Body - Physiological Uptake Organs CT: MIM Atlas: 138.06 ± 142.42; Contour ProtégéAI: 98.20 ± 127.11. Equivalence demonstrated. This indicates that Contour ProtégéAI performs with lower or equivalent APL, suggesting less editing time for the entire model.
Localization Accuracy (Informational)	No passing criterion, but results included for user understanding.	Percentage of images successfully localized by Contour ProtégéAI is provided for each structure and model. Most structures show 100% localization accuracy within their relevant FOV for Head and Neck and Thorax models. Some structures (e.g., Cochlea_L/R, OpticChiasm, Pancreas) show slightly lower percentages, indicating instances where the structure was not localized. For Whole Body CT, many structures also show 100%, with a few exceptions (e.g., Bladder: 95%, LN_Iliac: 64%).

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size: 754 independent images.
Data Provenance: Gathered from 27 institutions. The document does not explicitly state the countries of origin for the test set, but for the training set, it mentions "across multiple continents" and lists "USA" and "Hong Kong" and "Australia." It is reasonable to infer the test set would also be from diverse institutions/countries. The data is retrospective as it was gathered from existing clinical treatment plans.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications

The ground truth for the test set was established by a multi-stage process involving:

Initial Segmentation: Consultants (physicians and dosimetrists).
Review and Correction: A radiation oncologist.
Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).

While the exact number of experts is not specified, it involved multiple individuals with specialized qualifications (physicians, dosimetrists, radiation oncologists, M.D.s, licensed dosimetrists).

4. Adjudication Method for the Test Set

The ground truth generation involved a multi-stage review and correction process:

Initial segmentations by consultants (physicians and dosimetrists).
Review and correction by a radiation oncologist against established standards and guidelines.
Final review and correction by qualified staff at MIM Software (M.D. or licensed dosimetrists).

This indicates a sequential refinement process, potentially similar to a "cascading consensus" or "expert review and correction" rather than a specific numeric adjudication method like 2+1 or 3+1 for resolving disagreements among multiple initial segmenters. The explicit mentioning of "correction" at multiple stages suggests an iterative process where initial segmentations were refined based on expert review.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done

No, a traditional MRMC comparative effectiveness study was not explicitly stated in the provided text in the context of comparing human readers with and without AI assistance to measure an effect size on human performance.

Instead, the study primarily focused on the standalone performance of the AI model (Contour ProtégéAI) compared to an existing atlas-based segmentation system (MIM Maestro) using quantitative metrics (Dice, MDA, APL) and a user evaluation for "time savings functionality." The user evaluation (average score of 2 or higher on a three-point scale for time savings) provides an indirect measure of the AI's utility, but not a direct MRMC study on human reader improvement with AI.

6. If a Standalone Study Was Done

Yes, a standalone study was done.

Contour ProtégéAI (the algorithm under review) was evaluated in comparison to a reference predicate device, MIM Maestro (K071964), which uses an atlas-based segmentation approach.
The comparison involved quantitative metrics like Dice score, MDA, and cumulative APL, as well as a qualitative user evaluation. The goal was to show that Contour ProtégéAI was equivalent or superior in performance to the reference predicate in a standalone capacity.

7. The Type of Ground Truth Used

The ground truth used for the test set was expert consensus / expert-derived segmentation.

It was derived from clinical treatment plans, but the original segmentations were not used.
The images were re-segmented by consultants (physicians and dosimetrists) specifically for this purpose, following detailed clinical contouring guidelines.
These initial segmentations were then reviewed and corrected by a radiation oncologist.
A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
All segmenters were instructed to ensure the "highest quality training data" and contour according to referenced standards.

8. The Sample Size for the Training Set

CT Models: A total of 550 CT images from 41 clinical sites.
The document implies that these 550 images are specifically for the training of the final 4.1.0 neural network models for CT. It does not explicitly state the training set size for MR models if they were separate.

9. How the Ground Truth for the Training Set Was Established

The ground truth for the training set was established through a rigorous, multi-stage expert-driven process, identical to the description for the test set ground truth:

Initial Segmentation: Performed by consultants (physicians and dosimetrists) following detailed instructions derived from published clinical contouring guidelines.
Review and Correction: By a radiation oncologist against the same standards and guidelines.
Final Review and Correction: By qualified staff at MIM Software (M.D. or licensed dosimetrists).
- The goal was "to ensure the highest quality training data."
- Segmenters were asked to contour all specified OAR structures on all images according to referenced standards, regardless of proximity to the treatment field.

Ask a Question

Ask a specific question about this device

K Number

K223774

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2023-04-06

(111 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K213976

Predicate For

N/A

Intended Use

· Creation of contours using machine-learning algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transfering contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.

· Segmenting anatomical structures across a variety of CT anatomic locations.

· And segmenting the prostate, the seminal vesicles, and the urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

AI/ML Overview

Here's a breakdown of the acceptance criteria and study details for Contour ProtégéAI, based on the provided document:

Acceptance Criteria and Device Performance Study for Contour ProtégéAI

1. Table of Acceptance Criteria and Reported Device Performance

The acceptance criteria for Contour ProtégéAI were based on a non-inferiority study comparing its segmentation performance (measured by Dice coefficient) to a predicate device, MIM Maestro (K071964), specifically using atlases built from the same training data. The key acceptance criterion was:

Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is greater than 0.1 Dice lower than the mean MIM atlas segmentation reference device performance.

This translates to being either equivalent to or having better performance than the MIM Maestro atlas segmentation reference device. The acceptance was demonstrated at a p=0.05 significance level.

The table below summarizes the reported mean ± standard deviation Dice coefficients for both the MIM Atlas (predicate) and Contour ProtégéAI, along with the lower 95th percentile confidence bound for Contour ProtégéAI, for various anatomical structures across different CT models (4.0.0 CT Model). The asterisk (*) next to Contour ProtégéAI performance indicates that equivalence was demonstrated at p=0.05.

Note: The document presents a single large table for all structures and models. For clarity, a few representative examples from each CT Model are extracted below to illustrate the reported performance against the acceptance criteria. The full table from the document should be consulted for comprehensive results.

4.0.0 CT Model:	Structure:	MIM Atlas (Mean ± Std Dice)	Contour ProtégéAI (Mean ± Std Dice, Lower 95th Percentile Bound)	Acceptance Met?
Head and Neck	Bone_Mandible	0.81 ± 0.07	0.85 ± 0.07 (0.82) *	Yes
Head and Neck	Brain	0.97 ± 0.01	0.98 ± 0.01 (0.97) *	Yes
Head and Neck	SpinalCord	0.66 ± 0.14	0.63 ± 0.16 (0.57) *	Yes
Thorax	Esophagus	0.49 ± 0.16	0.70 ± 0.15 (0.65) *	Yes
Thorax	Heart	0.88 ± 0.08	0.90 ± 0.07 (0.88) *	Yes
Thorax	Lung_L	0.95 ± 0.02	0.96 ± 0.02 (0.96) *	Yes
Abdomen	Bladder	0.72 ± 0.23	0.91 ± 0.12 (0.81) *	Yes
Abdomen	Liver	0.84 ± 0.12	0.92 ± 0.08 (0.86) *	Yes
Pelvis	Prostate	0.74 ± 0.12	0.85 ± 0.06 (0.82) *	Yes
Pelvis	Rectum	0.63 ± 0.18	0.83 ± 0.11 (0.79) *	Yes
SurePlan MRT	Bone	0.76 ± 0.08	0.87 ± 0.05 (0.74) *	Yes
SurePlan MRT	Spleen	0.72 ± 0.10	0.95 ± 0.03 (0.87) *	Yes

2. Sample Size and Data Provenance for the Test Set

Sample Size for Test Set: 819 independent images.
Data Provenance: The images were gathered from 10 institutions. The document explicitly states that the test set institutions are "totally disjoint from the training datasets used to train each model." The countries of origin for the test set are not explicitly detailed, but since the training data included multiple countries (USA, Hong Kong, Australia), it's implied the test set could also be diverse. The data was retrospective clinical data, re-segmented for this specific purpose.

3. Number of Experts and Qualifications for Ground Truth

Number of Experts: The ground truth for the test set was established by "consultants (physicians and dosimetrists)." The exact number is not specified, but it implies a team. These initial segmentations were then "reviewed and corrected by a radiation oncologist." Finally, "Qualified staff at MIM Software (M.D. or licensed dosimetrists) then performed a final review and correction."
Qualifications of Experts:
- Consultants: Physicians and dosimetrists.
- Review and Correction: Radiation oncologist.
- Final Review and Correction: Qualified staff at MIM Software (M.D. or licensed dosimetrists).
- All segmenters and reviewers were given "detailed instructions derived from relevant published clinical contouring guidelines" and instructed to ensure the "highest quality training data."

4. Adjudication Method for the Test Set

The adjudication method involved a multi-stage process:

Initial Segmentation: Done by consultants (physicians and dosimetrists).
First Review & Correction: By a radiation oncologist.
Final Review & Correction: By qualified staff (M.D. or licensed dosimetrists) at MIM Software.

This indicates a sequential review process, rather than a specific (e.g., 2+1, 3+1) consensus model among peers at the same stage.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No MRMC comparative effectiveness study was explicitly described comparing human readers with AI assistance versus without AI assistance. The study focused on the algorithm's standalone performance compared to an atlas-based predicate device, and a preliminary user evaluation for time-saving was mentioned, but not in the context of an MRMC study.

6. Standalone (Algorithm Only) Performance

Yes, a standalone (algorithm only) performance study was conducted. The Dice coefficient results presented in the table demonstrate the performance of the Contour ProtégéAI algorithm compared to the MIM Maestro atlas-based segmentation, without human intervention in the segmentation process being evaluated. The document explicitly states the "performance of both segmentation devices was measured by calculating the Dice score of the novel segmentations with the original ground-truth contours."

7. Type of Ground Truth Used

The ground truth used was expert consensus. It was established by a multi-stage review and correction process involving physicians, dosimetrists, a radiation oncologist, and qualified MIM Software staff who re-segmented images "specifically for this purpose, outside of clinical practice" and were instructed to adhere to "relevant published clinical contouring guidelines."

8. Sample Size for the Training Set

The training set consisted of 326 CT images gathered from 37 clinical sites across multiple countries (USA, Hong Kong, Australia).

9. How the Ground Truth for the Training Set was Established

The ground truth for the training set was established through a rigorous, multi-step expert review process:

CT images (from clinical treatment plans) were re-segmented by consultants (physicians and dosimetrists).
These initial segmentations were then reviewed and corrected by a radiation oncologist against the same standards and guidelines.
A final review and correction was performed by qualified staff at MIM Software (M.D. or licensed dosimetrists).
All involved in ground truth establishment were given "detailed instructions derived from relevant published clinical contouring guidelines" and were explicitly asked "to spend additional time to ensure the highest quality training data" and to contour all specified structures "according to referenced standards."

Ask a Question

Ask a specific question about this device

K Number

K220813

Device Name

ART-PLAN

Manufacturer

TheraPanacea

Date Cleared

2022-06-17

(88 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K210632,K071964,K182888,K193109,K173635,K202700

Predicate For

K230023

Intended Use

ART-Plan is indicated for cancer patients for whom radiation treatment has been planned. It is intended to be used by trained medical professionals including, but not limited to, radiation oncologists, dosimetrists, and medical physicists.

ART-Plan is a software application intended to display and visualize 3D multi-modal medical image data. The user may import, define, display, transform and store DICOM3.0 compliant datasets (including regions of interest structures). These images, contours and objects can subsequently be exported/distributed within the system, across computer networks and/or to radiation treatment planning systems. Supported modalities include CT, PET-CT, CBCT, 4D-CT and MR images.

ART-Plan supports Al-based contouring on CT and MR images and offers semi-automatic and manual tools for segmentation.

To help the user assess changes in image data and to obtain combined multi-modal image information, ART-Plan allows the registration of anatomical and functional images and display of fused and non-fused images to facilitate the comparison of patient image data by the user.

With ART-Plan, users are also able to generate, visualize, evaluate and modify pseudo-CT from MRI images.

Device Description

The ART-Plan application is comprised of two key modules: SmartFuse and Annotate, allowing the user to display and visualize 3D multi-modal medical image data. The user may process, render, review, store, display and distribute DICOM 3.0 compliant datasets within the system and/or across computer networks. Supported modalities cover static and gated CT (computerized tomography including CBCT and 4D-CT), PET (positron emission tomography) and MR (magnetic resonance).

Compared to ART-Plan v1.6.1 (primary predicate), the following additional features have been added to ART-Plan v1.10.0:

· an improved version of the existing automatic segmentation tool
· automatic segmentation on more anatomies and organ-at-risk
image registration on 4D-CT and CBCT images .
automatic segmentation on MR images .
· generate synthetic CT from MR images
a cloud-based deployment

The ART-Plan technical functionalities claimed by TheraPanacea are the following:

. Proposing automatic solutions to the user, such as an automatic delineation, automatic multimodal image fusion, etc. towards improving standardization of processes/ performance / reducing user tedious / time consuming involvement.
. Offering to the user a set of tools to assist semi-automatic delineation, semi-automatic registration towards modifying/editing manually automatically generated structures and adding/removing new/undesired structures or imposing user-provided correspondences constraints on the fusion of multimodal images.
. Presenting to the user a set of visualization methods of the delineated structures, and registration fusion maps.
. Saving the delineated structures / fusion results for use in the dosimetry process.
. Enabling rigid and deformable registration of patients images sets to combine information contained in different or same modalities.
Allowing the users to generate, visualize, evaluate and modify pseudo-CT from MRI images.

ART-Plan offers deep-learning based automatic segmentation for the following localizations:

head and neck (on CT images) .
. thorax/breast (for male/female and on CT images)
abdomen (on CT images and MR images) ●
. pelvis male(on CT images and MR images)
. pelvis female (on CT images)
brain (on CT images and MR images)

ART-Plan offers deep-learning based synthetic CT-generation from MR images for the following localizations:

pelvis male .
brain

AI/ML Overview

Here's a summary of the acceptance criteria and study details for the ART-Plan device, extracting information from the provided text:

Acceptance Criteria and Device Performance

Criterion Category	Acceptance Criteria	Reported Device Performance
Auto-segmentation - Dice Similarity Coefficient (DSC)	DSC (mean) ≥ 0.8 (AAPM standard) OR DSC (mean) ≥ 0.54 or DSC (mean) ≥ mean(DSC inter-expert) + 5% (inter-expert variability)	Multiple tests passed demonstrating acceptable contours, exceeding AAPM standards in some cases (e.g., Abdo MRI auto-segmentation), and meeting or exceeding inter-expert variability for others (e.g., Brain MR, Pelvis MRI). For Brain MRI, initially some organs did not meet 0.8 but eventually passed with further improvements and re-evaluation against inter-expert variability. All organs for all anatomies met at least one acceptance criterion.
Auto-segmentation - Qualitative Evaluation	Clinicians' qualitative evaluation of auto-segmentation is considered acceptable for clinical use without modifications (A) or with minor modifications/corrections (B), with A+B % ≥ 85%.	For all tested organs and anatomies, the qualitative evaluation resulted in A+B % ≥ 85%, indicating that clinicians found the contours acceptable for clinical use with minor or no modifications. For example, Pelvis Truefisp model achieved ≥ 85% A or B, and H&N Lymph nodes also met this.
Synthetic-CT Generation	A median 2%/2mm gamma passing criteria of ≥ 95% OR A median 3%/3mm gamma passing criteria of ≥ 99.0% OR A mean dose deviation (pseudo-CT compared to standard CT) of ≤ 2% in ≥ 88% of patients.	For both pelvis and brain synthetic-CT, the performance met these acceptance criteria and demonstrated non-inferiority to previously cleared devices.
Fusion Performance	Not explicitly stated with numerical thresholds, but evaluated qualitatively.	Both rigid and deformable fusion algorithms provided clinically acceptable results for major clinical use cases in radiotherapy workflows, receiving "Passed" in all relevant studies.

Study Details

Sample Size used for the test set and the data provenance:
- Test Set Sample Size: The exact number of patients in the test set is not explicitly given as a single number but is stated that for structures of a given anatomy and modality, two non-overlapping datasets were separated: test patients and train data. The number of test patients was "selected based on thorough literature review and statistical power."
- Data Provenance: Real-world retrospective data, initially used for treatment of cancer patients. Pseudo-anonymized by the centers providing data before transfer. Data was sourced from both non-US and US populations.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Varies. For some tests (e.g., Abdo MRI auto-segmentation, Brain MRI autosegmentation, Pelvis MRI auto-segmentation), at least 3 different experts were involved for inter-expert variability calculations. For the qualitative evaluations, it implies multiple clinicians or medical physicists.
- Qualifications of Experts: Clinical experts, medical physicists (for validation of usability and performance tests) with expertise level comparable to a junior US medical physicist and responsibilities in the radiotherapy clinical workflow.
Adjudication method for the test set:
- The document describes a "truthing process [that] includes a mix of data created by different delineators (clinical experts) and assessment of intervariability, ground truth contours provided by the centers and validated by a second expert of the center, and qualitative evaluation and validation of the contours." This suggests a multi-reader approach, potentially with consensus or an adjudicator for ground truth, but a specific "2+1" or "3+1" method is not detailed. The "inter-expert variability" calculation implies direct comparison between multiple experts' delineations of the same cases.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- A direct MRMC comparative effectiveness study with human readers improving with AI vs without AI assistance is not explicitly described in the provided text. The studies focus on the standalone performance of the AI algorithm against established criteria (AAPM, inter-expert variability, qualitative acceptance) and non-inferiority to other cleared devices.
If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone performance evaluation of the algorithm was done. The acceptance criteria and performance data are entirely based on the algorithm's output (e.g., DSC, gamma passing criteria, dose deviation) compared to ground truth or existing standards, and qualitative assessment by experts of the algorithm's generated contours.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The ground truth used primarily involved:
  - Expert Consensus/Delineation: Contours created by different clinical experts and assessed for inter-variability.
  - Validated Ground Truth Contours: Contours provided by the centers and validated by a second expert from the same center.
  - Qualitative Evaluation: Clinical review and validation of contours.
  - Dosimetric Measures: For synthetic-CT; comparison to standard CT dose calculations.
The sample size for the training set:
- Training Patients: 8,736 patients.
- Training Samples (Images/Anatomies/Structures): 299,142 samples. (One patient can have multiple images, and each image multiple delineated structures).
How the ground truth for the training set was established:
- "The contouring guidelines followed to produce the contours were confirmed with the centers which provided the data. Our truthing process includes a mix of data created by different delineators (clinical experts) and assessment of intervariability, ground truth contours provided by the centers and validated by a second expert of the center, and qualitative evaluation and validation of the contours." This indicates that the ground truth for the training set was established through a combination of expert delineation, internal validation by a second expert, adherence to established guidelines, and assessment of variability among experts.

Ask a Question

Ask a specific question about this device

K Number

K213976

Device Name

Contour ProtégéAI

Manufacturer

MIM Software Inc.

Date Cleared

2022-02-03

(45 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K210632

Predicate For

K223774

Intended Use

· Segmenting normal structures across a variety of CT anatomical locations.

· And segmenting normal structures of the prostate, seminal vesicles, and urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

Contour ProtégéAl is an accessory to MIM software that automatically creates contours on medical images through the use of machine-learning algorithms. It is designed for use in the processing of medical images and operates on Windows, Mac, and Linux computer systems. Contour ProtégéAl is deployed on a remote server using the MIMcloud service for data management and transfer; or locally on the workstation or server running MIM software.

AI/ML Overview

The user is asking for information about the acceptance criteria and study proving the device's performance from the provided FDA 510(k) summary. I will extract the relevant information to answer each point.

Here's a breakdown of the requested information and where to find it in the document:

A table of acceptance criteria and the reported device performance: This will primarily come from the "Testing and Performance Data" section, specifically the table comparing MIM Atlas and Contour ProtégéAI Dice coefficients and the equivalence definition.
Sample sizes used for the test set and the data provenance: Found in the "Testing and Performance Data" section.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts: Found in the "Testing and Performance Data" section.
Adjudication method (e.g. 2+1, 3+1, none) for the test set: Found in the "Testing and Performance Data" section regarding ground truth generation.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance: The document describes a comparison between the AI (Contour ProtégéAI) and an atlas-based segmentation (MIM Maestro reference device), not a human-in-the-loop study with human readers comparing performance with and without AI assistance.
If a standalone (i.e. algorithm only without human-in-the-loop performance) was done: The provided data compares the algorithm's performance against a ground truth and an atlas-based reference algorithm. The use of "appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI" implies it's an AI-assisted tool, but the testing itself appears to be an algorithmic comparison.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.): Found in the "Testing and Performance Data" section.
The sample size for the training set: Found in the "Device Description" and "Testing and Performance Data" sections.
How the ground truth for the training set was established: Found in the "Testing and Performance Data" section.

Here's the detailed response based on the provided document:

Acceptance Criteria and Study Proving Device Performance

The study evaluated the performance of Contour ProtégéAI, specifically its new 3.0.0 CT neural network models, by comparing its segmentation accuracy (Dice coefficient) against a reference atlas-based segmentation device, MIM Maestro (K071964).

1. Table of Acceptance Criteria and Reported Device Performance:

Item	Acceptance Criteria	Reported Device Performance and Equivalence
Equivalence	Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is greater than 0.1 Dice lower than the mean MIM atlas segmentation reference device performance. This means: Contour ProtégéAI_LB95 > MIM_Atlas_Mean - 0.1	"Contour ProtégéAI results were equivalent or had better performance than the MM Maestro atlas segmentation reference device." This was demonstrated at a p=0.05 significance level for all structures. Below is a sample of reported Dice coefficients, where * indicates equivalence demonstrated.*

2. Sample size used for the test set and the data provenance:

Test Set Size: 739 independent images.
Data Provenance: Gathered from 12 institutions. The specific countries for the test set are not explicitly stated, but the training data (from which test subjects were explicitly excluded) was from Australia, France, Hong Kong, and the USA. The data collection was prospective in the sense that the training data explicitly excluded patients from the institutions contributing to the test set, ensuring independence.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

Number of Experts: Not explicitly stated as a fixed number.
Qualifications of Experts: Ground truth segmentations were generated by a "trained user (typically, a dosimetrist or radiologist)" and then reviewed and approved by a "supervising physician (typically, a radiation oncologist or a radiologist)."

4. Adjudication method for the test set:

The ground truth generation process involved: initial segmentation by a trained user, followed by review and approval by a supervising physician. If necessary, the data was sent back for re-segmentation and re-review. This constitutes an iterative consensus-building method rather than a strict 2+1 or 3+1 type of adjudication.

5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

No, an MRMC comparative effectiveness study involving human readers' improvement with AI vs. without AI assistance was not conducted or reported in this summary. The study focused on the standalone algorithmic performance of the AI tool (Contour ProtégéAI) compared to an existing atlas-based automatic segmentation method (MIM Maestro). The device is intended as a "tool to assist" and mandates review/editing by users, but the performance study itself was not a human-in-the-loop clinical trial.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

Yes, the primary study reported is a standalone algorithmic performance comparison. The Dice coefficients were calculated for the algorithm's output directly against the established ground truth, and then compared to the performance of the MIM Maestro atlas segmentation reference device.

7. The type of ground truth used:

The ground truth used was expert consensus segmentation, established by trained users (dosimetrists or radiologists) and approved by supervising physicians (radiation oncologists or radiologists).

8. The sample size for the training set:

Training Set Size: 4061 CT images.

9. How the ground truth for the training set was established:

The ground-truth segmentations used for both training and validation (test set) were established using the same method: generated by a "trained user (typically, a dosimetrist or radiologist)" that were then "reviewed and approved by a supervising physician (typically, a radiation oncologist or a radiologist) and sent back for re-segmentation and re-review as necessary."

Ask a Question

Ask a specific question about this device

K Number

K210632

Device Name

Contour ProtegeAI

Manufacturer

MIM Software Inc.

Date Cleared

2021-10-20

(231 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K193252

Predicate For

K213976

Intended Use

· Creation of contours using maching algorithms for applications including, but not limited to, quantitative analysis, aiding adaptive therapy, transferring contours to radiation therapy treatment planning systems, and archiving contours for patient follow-up and management.

· Segmenting normal structures across a variety of CT anatomical locations.

· And segmenting normal structures of the prostate, seminal vesicles, and urethra within T2-weighted MR images.

Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI.

Device Description

AI/ML Overview

The provided text outlines the 510(k) summary for Contour ProtégéAI, but it primarily focuses on establishing substantial equivalence to predicate devices and does not detail specific acceptance criteria or a comprehensive study report with numerical performance metrics against those criteria. The information provided is more about the regulatory submission process and general claims of equivalence rather than a detailed breakdown of a validation study.

However, based on the limited information regarding "Testing and Performance Data" (page 9), I can infer some aspects and highlight what is missing.

Here's an attempt to describe the acceptance criteria and study proving the device meets them, based on the provided text, while also pointing out the lack of detailed numerical results for the acceptance criteria.

Acceptance Criteria and Device Performance Study for Contour ProtégéAI

The provided 510(k) summary for Contour ProtégéAI states that "Equivalence is defined such that the lower 95th percentile confidence bound of the Contour ProtégéAI segmentation is greater than 0.1 Dice lower than the mean MIM Maestro atlas segmentation reference device performance." This statement defines the non-inferiority acceptance criterion used to compare Contour ProtégéAI against a reference device (MIM Maestro) rather than setting absolute performance thresholds for the contours themselves.

1. Table of Acceptance Criteria and Reported Device Performance

Acceptance Criteria (Inferred from Text)	Reported Device Performance
For each structure of each neural network model, the lower 95th percentile confidence bound of the Contour ProtégéAI Dice coefficient must be greater than 0.1 Dice lower than the mean Dice coefficient of the MIM Maestro atlas segmentation reference device.	Stated Outcome: "Contour ProtégéAI results were equivalent or had better performance than the MIM atlas segmentation reference device."
Specific numerical performance for each structure (Dice Coefficient)	Not provided in the document. The document states a qualitative conclusion of "equivalent or better performance" without the actual mean Dice coefficients or 95th percentile bounds for either Contour ProtégéAI or MIM Maestro.

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size: The document implies that the "test subjects" were used for evaluation, but the specific number of cases or patients in the test set is not explicitly stated.
Data Provenance: The text mentions that neural network models were trained on data that "did not include any patients from the same institution as the test subjects." This implies that the test set data originated from institutions different from the training data, suggesting a form of independent validation. The countries of origin for the data are not specified. The text indicates the study was retrospective as it involved evaluating pre-existing patient data.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications of Those Experts

The document states that "multiple atlases were created over the test subjects" for the MIM Maestro reference device. It does not explicitly state how the ground truth for the test set was established for Contour ProtégéAI's evaluation results. Instead, it refers to the MIM Maestro's performance as a reference. There is no information provided on the number or qualifications of experts who established any ground truth used in this comparison.

4. Adjudication Method for the Test Set

The document does not describe any specific adjudication method (e.g., 2+1, 3+1) for establishing ground truth or evaluating the test set. It mentions the "leave-one-out analysis" for creating atlases for MIM Maestro, which is a method of data splitting/resampling, not an adjudication process.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

Was an MRMC study done? Based on the provided text, there is no indication that a multi-reader multi-case (MRMC) comparative effectiveness study was conducted to evaluate how much human readers improve with AI vs. without AI assistance. The study described focuses on the comparison of the algorithm's performance (Contour ProtégéAI) against an existing atlas-based segmentation method (MIM Maestro).

6. Standalone (Algorithm Only Without Human-in-the-Loop Performance) Study

Was a standalone study done? Yes, the described study appears to be a standalone (algorithm-only) performance evaluation. The comparison is between the Contour ProtégéAI algorithm's output and the MIM Maestro atlas segmentation reference device, with Dice coefficients calculated directly from these automated segmentations. The "Indications for Use" explicitly state: "Appropriate image visualization software must be used to review and, if necessary, edit results automatically generated by Contour ProtégéAI," implying that human modification is expected in clinical use, but the reported study does not include this human-in-the-loop performance.

7. Type of Ground Truth Used

The "ground truth" for the comparison appears to be the segmentation contours generated by the MIM Maestro atlas segmentation reference device. The study aims to demonstrate non-inferiority to this existing, cleared technology rather than a human expert-defined anatomical ground truth or pathology/outcomes data.

8. Sample Size for the Training Set

The document mentions a "pool of training data" but the specific sample size for the training set is not provided.

9. How the Ground Truth for the Training Set Was Established

The document states that "neural network models were trained for each modality (CT and MR) on a pool of training data." However, it does not describe how the ground truth (i.e., the "correct" contours) for this training data was established. It refers to the models being trained "on a pool of training data" without detailing the annotation or ground truth generation process for this training data.

Ask a Question

Ask a specific question about this device

K Number

K181498

Device Name

X-ray Imaging System for the McLaren Proton Treatment System

Manufacturer

EhmetDx, LLC

Date Cleared

2018-12-10

(186 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K001276,K071964,K132847

Predicate For

N/A

Intended Use

The X-ray Imaging System is intended to be used as a patient setup and localization tool that is capable of processing orthogonal or CBCT images acquired at the start of the treatment. The images will be compared against Digitally Reconstructed Radiographs or CTs that were used to define the treatment plan. The comparison will generate position correction vectors if required and provide them to the McLaren Proton Treatment System for patient position corrections prior to treatment.

Device Description

The X-ray Imaging System controls the x-ray generation, image processing and patient registration of the McLaren Proton Treatment System (K160063). The software product described by this document is considered a sub-system of the McLaren Proton Treatment System. The software product integrates with the following components of the McLaren Proton Treatment System: . X-ray generator & tube assembly Amorphous silicon flat-panel detector . Electro-mechanical controls of the imaging c-ring to enable system positioning and rotation around the patient The software will permit a radiation technician to set the desired energy levels and field of view. Once set the software can acquire orthogonal or CBCT x-ray images of the region of interest. When images are acquired the software will process them to create viewable images for patient location registration. The patient's position will be adjusted, if needed, based on a comparison of the current patient location, the defined treatment plan, and Digitally Reconstructed Radiographs or CTs.

AI/ML Overview

Here's a breakdown of the acceptance criteria and study information for the EhmetDx X-ray Imaging System (K181498), based on the provided text:

1. Table of Acceptance Criteria and Reported Device Performance

Feature/Specification	Acceptance Criteria (Predicate Device)	Reported Device Performance (EhmetDx XIS)
CBCT distance accuracy	1%	≤1%
CBCT spatial resolution	≥5 lp/cm	≥5 lp/cm
CBCT low-contrast resolution	15mm@1%	15mm@1%
CBCT CT number accuracy	±40 HU	±40 HU
CBCT CT number uniformity	±40 HU	±40 HU

2. Sample Size Used for the Test Set and Data Provenance

The document states: "Imaging performance and radiography and cone beam CT was assessed using standard measurement tools and phantoms and passed all tests." This indicates that the testing was performed on phantoms (simulated objects used for testing), not human patient data.

Sample Size for Test Set: Not explicitly stated in terms of number of cases, but involved "standard measurement tools and phantoms."
Data Provenance: Not human patient data; generated in a controlled environment using phantoms. No country of origin is specified for clinical data, as no clinical data was used.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications

The provided text does not mention the use of experts to establish ground truth for the test set. Given that the testing involved phantoms and standardized measurements, the ground truth would have been established by the inherent properties of the phantoms and the measurement protocols.

4. Adjudication Method for the Test Set

No adjudication method is described. As no human readers or expert evaluations were explicitly mentioned for ground truth establishment, no adjudication method would have been necessary in the context of expert review.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No. The document explicitly states: "EhmetDx did not perform clinical testing for the X-ray Imaging System and no clinical data was collected during device validation activity and substantial equivalence testing." Therefore, a human-in-the-loop MRMC study comparing human readers with and without AI assistance was not conducted.

6. Standalone (Algorithm Only) Performance

Yes. The provided performance metrics in the table (CBCT distance accuracy, spatial resolution, low-contrast resolution, CT number accuracy, CT number uniformity) are all standalone performance metrics of the imaging system and its image processing capabilities, without human intervention in the interpretation of the output for these specific technical aspects.

7. Type of Ground Truth Used

The ground truth used for performance assessment was based on the physical properties and known measurements of standard phantoms.

8. Sample Size for the Training Set

The document does not provide any information regarding a training set size. This is likely because the device, as described, is an X-ray imaging system with image processing capabilities, and the focus of the 510(k) submission is on its technical imaging and registration performance compared to a predicate, rather than an AI/ML algorithm that requires extensive training data.

9. How the Ground Truth for the Training Set Was Established

Since no information on a training set or AI/ML algorithm requiring such a set is provided, there is no mention of how ground truth for a training set was established.

Ask a Question

Ask a specific question about this device

K Number

K173444

Device Name

Quantitative Total Extensible Imaging (QTxI)

Manufacturer

AIQ Solutions, Inc.

Date Cleared

2018-07-23

(259 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071964,K122205

Predicate For

K233998

Intended Use

Quantitative Total Extensible Imaging (QTxI) is a software tool used to aid in evaluation and information management of digital medical images by trained medical professionals including, but not limited to, radiologists, nuclear medicine physicians, medical imaging technologists, dosimetrists and physicists. The medical modalities of these medical images include DICOM CT and PET as supported by ACR/NEMA DICOM 3.0.

QTxI assists in the following indications:

Receive, store, retrieve, display and process digital medical images.
Create, display and print reports from those images.
Provide medical professionals with the ability to display, register, and fuse medical images.
Identify Regions of Interest (ROIs) and perform ROI contouring allowing quantitative/statistical analysis of full or partial body scans.
Evaluate quantitative change in ROIs (total or partial body; individual ROI within individual) with 3D interactive rendering of images with highlighted ROIs.

Device Description

Quantitative Total Extensible Imaging (QTxl) is a software tool designed for use in medical imaging. It is stand-alone software which operates on Windows 7 and Windows 10. Its intended function and use is to provide medical professionals with the means to display, register and fuse medical images from multiple modalities including DICOM PET and CT. Additionally, it identifies Regions of Interest (ROIs) and performs ROI contouring allowing quantitative/statistical analysis of full or partial-body scans through registration to template space.

QTxl is designed to support multiple image analysis modules. Each module is designed for a specific image analysis purpose. Currently QTxl includes only the Quantitative Total Bone Imaging (QTBI) module, which is designed to identify and measure hot-spots on PET scans. QTBI aids the efficiency of medical professionals through automatic quantification of ROIs and changes in those ROIs, including 3D interactive rendering of the patient skeleton with highlighted Regions of Interest.

QTxI also functions as a Picture Archive and Communications System (PACS) intended to receive, store, retrieve, display and process digital medical images, as well as create, display and print reports from those images. It also provides platform features for security, workflow and integration.

AI/ML Overview

The provided text is a 510(k) Pre-market Notification from the FDA regarding the Quantitative Total Extensible Imaging (QTxI) device. However, it does not contain the specific details about a study proving the device meets acceptance criteria as described in your request.

The document primarily focuses on establishing "substantial equivalence" of QTxI to a predicate device (Exini Diagnostics AB; EXINI, K122205) and a supporting predicate device (MIMvista Corp. MIM4.1 (Seastar), K071964). It mentions "Performance Data (Nonclinical)" but only in very general terms:

"Software verification testing that demonstrates the device meets product performance and functional specifications."
"Software verification testing demonstrating that DICOM information collected with medical imaging systems and transmitted through manual or virtual input are captured, transmitted, and stored properly to maintain data integrity (e.g., no loss of data)."

It concludes that "QTxl met all predetermined acceptance criteria of design verification and validation as specified by applicable standards, and test protocols." However, it does not detail these acceptance criteria or the specific study results.

Therefore, I cannot provide the requested information based solely on the provided text. The document states that performance data was submitted and implies successful verification and validation, but it does not present the data itself or the specifics of the study design.

To answer your request, if the information were present in the document, it would look something like this (speculative, based on typical FDA submissions, but not found in the provided text):

Acceptance Criteria and Device Performance Study

While the provided 510(k) summary indicates that QTxI met all predetermined acceptance criteria through non-clinical performance bench tests and simulated clinical performance tests, the specific details of these criteria and the study results are not explicitly enumerated in the document. The general nature of the "Performance Data (Nonclinical)" section suggests that detailed quantitative data on performance was likely submitted separately as part of the full 510(k application.

Based on the general statements, a hypothetical structure of the requested information, if it were present, would be:

1. Table of Acceptance Criteria and Reported Device Performance

Performance Metric (Hypothetical)	Acceptance Criteria (Hypothetical)	Reported Device Performance (Hypothetical)	P-value/Confidence Interval (Hypothetical)
ROI Contouring Accuracy (Jaccard Index)	≥ 0.90 (vs. Expert Consensus)	0.92 ± 0.03	< 0.001
Quantitative Analysis Precision (CV% for ROI Volume)	≤ 5%	3.8%	N/A (precision measure)
Image Registration Accuracy (Target Registration Error)	≤ 2 mm (mean)	1.5 mm ± 0.5 mm	N/A (accuracy measure)
DICOM Data Integrity (Loss/Corruption Rate)	0%	0% (in 1000 transfers)	N/A
Software Functionality (Pass Rate of Test Cases)	100%	100%	N/A

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size: The document does not specify. Hypothetically, for a software general purpose imaging device, it might be in the range of 50-200 cases per test type (e.g., for ROI contouring accuracy on different anatomies, or for image registration tasks).
Data Provenance: The document does not specify. Hypothetically, for devices of this nature, data could be retrospectively collected clinical DICOM images from various institutions (e.g., U.S. and European hospitals) to ensure variability in scanners and patient demographics.

3. Number of Experts Used to Establish Ground Truth and Qualifications

Number of Experts: The document does not specify. Hypothetically, typically 2-3 experts for consensus, or more for an MRMC study.
Qualifications of Experts: The document does not specify. Hypothetically, e.g., Three Board-Certified Radiologists with 10+ years of experience in oncological imaging and PET/CT interpretation, specializing in bone metastases.

4. Adjudication Method for the Test Set

Adjudication Method: The document does not specify. Hypothetically, for ground truth establishment, a common method is 2+1 (two experts independently annotate, and a third adjudicates disagreements). If no disagreements, the consensus of the two is the ground truth. Alternatively, 3+0 (majority vote of three independent experts) or a defined consensus meeting might be used.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

MRMC Study: The document does not explicitly state that an MRMC study was performed to assess human reader improvement with AI assistance. The device is described as a "software tool to aid in evaluation and information management," implying it's a tool for professionals, but the performance data presented is general "nonclinical" verification.
Effect Size: Therefore, no effect size of human readers improving with AI vs. without AI assistance can be reported from this document.

6. Standalone (Algorithm Only) Performance

Standalone Performance: The document states, "QTxl is stand-alone software." The "non-clinical performance bench tests and simulated clinical performance tests" likely refer to the algorithm's standalone performance, particularly for tasks like ROI contouring and quantitative analysis. However, specific metrics are not provided.

7. Type of Ground Truth Used

Type of Ground Truth: The document does not specify the method used to establish ground truth for the "software verification testing." Hypothetically, for a device performing ROI contouring and quantitative analysis on images, the ground truth would most likely be expert consensus annotations/measurements on the medical images. Pathology or outcomes data might be relevant for clinical utility but are less direct for verifying a software's image processing capabilities.

8. Sample Size for the Training Set

Training Set Sample Size: The document does not provide details on a training set, as it is a 510(k) for a PACS/image processing tool, not a deep learning AI model that typically requires distinct training sets. The "software verification testing" mentioned would apply to the developed software based on its functional specifications. If the QTxI uses "template space" registration as mentioned, those templates would be developed, but not via a "training set" in the sense of deep learning.

9. How Ground Truth for the Training Set Was Established

Ground Truth for Training Set: As no explicit training set is mentioned in the context of typical AI model development, the method for establishing its "ground truth" is not applicable from this document. If elements of machine learning were present (not explicitly stated for QTxI), the ground truth for training data would similarly be established by expert annotation or curated datasets.

Ask a Question

Ask a specific question about this device

K Number

K091373

Device Name

SYNGO TRUED

Manufacturer

SIEMENS MEDICAL SOLUTIONS USA, INC.

Date Cleared

2009-05-20

(9 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K071950,K063324,K071964,K032915,K063762

Predicate For

K093621,K101228,K101749,K102687,K130393

Intended Use

syngo TrueD is a medical diagnostic application for viewing, manipulation, 3D- visualization and comparison of medical images from multiple imaging modalities and/or multiple time-points. The application supports functional data, such as PET or SPECT as well as anatomical datasets, such as CT or MR. The images can be viewed in a number of output formats including MIP and volume rendering.

syngo TrueD enables visualization of information that would otherwise have to be visually compared disjointedly. syngo TrueD provides analytical tools to help the user assess, and document changes in morphological or functional activity at diagnostic and therapy follow-up examinations.

syngo TrueD is designed to support the oncological.workflow by helping the user to confirm the absence or presence of lesions, including evaluation, quantification, follow-up and documentation of any such lesions. The application allows to store and export volume of interest (VOI) structures in DICOM RT format for use in radiation therapy planning systems.

syngo TrueD allows visualization and analysis of respiratory gated studies to support accurate delineation of the target or treatment volume over a defined phase of the respiratory cycle and thus provide information for radiation therapy planning.

Note: The clinician retains the ultimate responsibility for making the pertinent diagnosis based on their standard practices and visual comparison of the separate unregistered images. syngo Truen in a complement to these standard procedures.

Device Description

syngo TrueD is designed to support the oncological workflow by helping the user to confirm the absence or presence of lesions, including evaluation, quantification, follow-up and documentation of any such lesions. The application allows to store and export volume of interest (VOD structures in DICOM RT format for use in radiation therapy planning systems.

TrueD will be marketed as a software only solution for the end-user (with recommended hardware requirements) . It will be installed by Siemens service engineers. The TrueD described supports DICOM formatted images and information. It is based on the Windows XP operating system.

AI/ML Overview

This 510(k) summary primarily focuses on the substantial equivalence of the "syngo™ TrueD Software" to existing predicate devices, rather than providing detailed acceptance criteria and a specific study demonstrating performance against those criteria. It's a submission for a software device used for image viewing, manipulation, and 3D visualization, targeting applications in oncology and radiation therapy planning.

Based on the provided text, the following information can be extracted:

1. Table of Acceptance Criteria and Reported Device Performance:

The document does not specify quantitative acceptance criteria or report specific performance metrics for the syngo™ TrueD Software that would typically be found in a performance study (e.g., sensitivity, specificity, accuracy for a diagnostic task). The submission emphasizes substantial equivalence to predicate devices for its intended use and technical characteristics.

2. Sample Size Used for the Test Set and Data Provenance:

The document does not mention a specific test set, its sample size, or data provenance (e.g., country of origin, retrospective/prospective). This type of information is usually associated with performance studies assessing diagnostic accuracy or similar metrics.

3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:

The document does not describe any specific ground truth establishment process for a test set, nor does it mention the number or qualifications of experts involved.

4. Adjudication Method for the Test Set:

No adjudication method is mentioned as there is no described test set or ground truth establishment process.

5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done:

No MRMC comparative effectiveness study is mentioned. The document primarily argues for substantial equivalence based on intended use and technological characteristics compared to predicate devices. The final "Note" states: "The clinician retains the ultimate responsibility for making the pertinent diagnosis based on their standard practices and visual comparison of the separate unregistered images. syngo TrueD in a complement to these standard procedures." This suggests the device is an assistive tool and not designed to replace human interpretation in diagnostic decision-making, which is often the focus of MRMC studies.

6. If a Standalone (algorithm only without human-in-the-loop performance) was done:

The document does not describe a standalone performance study. The device is presented as an image viewing and manipulation tool to support clinicians.

7. The Type of Ground Truth Used:

No specific ground truth type is mentioned as no performance study with a defined test set is described.

8. The Sample Size for the Training Set:

The document does not mention any training set or its sample size. This suggests the device may not heavily rely on machine learning models that require extensive training data in the same way as some contemporary AI diagnostics. Its function is primarily visualization and analysis, with tools to help users assess changes.

9. How the Ground Truth for the Training Set was Established:

Not applicable, as no training set is mentioned.

Summary of what the document does provide regarding "proof":

The document argues for the device's substantial equivalence to existing legally marketed predicate devices, rather than providing direct "proof" of meeting novel acceptance criteria through a performance study. It emphasizes:

Identical Intended Use: The device's intended use is described as viewing, manipulation, 3D visualization, and comparison of medical images, similar to the functions of its predicate devices. It supports oncology workflows (lesion evaluation, quantification, follow-up, documentation) and radiation therapy planning (visualization/analysis of respiratory gated studies).
Similar Technological Characteristics: It is a software-only solution, supports DICOM images, and runs on Windows XP, implying comparable technology to its predicates.
Safety Information: A hazard analysis was conducted, and appropriate preventive measures were taken, resulting in a determination of "minor level of concern." It highlights that the device has no patient-contacting materials, is used by trained professionals, and device output is subject to review by these professionals. It also states the device "does not impact the quality or status of the original acquired data."

Conclusion:

This 510(k) submission for syngo™ TrueD software in 2009 is a premarket notification for substantial equivalence. It does not include a detailed study with quantitative acceptance criteria and performance data like those commonly seen for AI/ML diagnostic algorithms today. Instead, its "proof" is centered on demonstrating that it is as safe and effective as, and performs as well as, already legally marketed predicate devices with similar intended uses and technological features.

Ask a Question

Ask a specific question about this device

Page 1 of 2