Search Results
Found 244 results
510(k) Data Aggregation
(123 days)
MUJ
The device is intended for radiation treatment planning for use in stereotactic, conformal, computer-planned, linac-based radiation treatment, and is indicated for cranial, head and neck, and extracranial lesions.
RT Elements are computer-based software applications for radiation therapy treatment planning and dose optimization for linac-based conformal radiation treatments, i.e., stereotactic radiosurgery (SRS), fractionated stereotactic radiotherapy (SRT) or stereotactic ablative radiotherapy (SABR), also known as stereotactic body radiation therapy (SBRT), for use in stereotactic, conformal, computer-planned, linac-based radiation treatment of cranial, head and neck, and extracranial lesions.
The device consists of the following software modules: Multiple Brain Mets SRS 4.5, Cranial SRS 4.5, Spine SRS 4.5, Cranial SRS w/ Cones 4.5, RT Contouring 4.5, RT QA 4.5, Dose Review 4.5, Brain Mets Retreatment Review 4.5, and Physics Administration 7.5.
Here's the breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA 510(k) clearance letter for RT Elements 4.5, specifically focusing on the AI Tumor Segmentation feature:
Acceptance Criteria and Reported Device Performance
Diagnostic Characteristics | Minimum Acceptance Criterion (Lower Bound of 95% Confidence Interval) | Reported Device Performance (Lower Bound of 95% CI) |
---|---|---|
All Tumor Types | Dice ≥ 0.7 | Dice: 0.74 |
 | Recall ≥ 0.8 | Recall: 0.83 |
 | Precision ≥ 0.8 | Precision: 0.85 |
Metastases to the CNS | Dice ≥ 0.7 | Dice: 0.73 |
 | Recall ≥ 0.8 | Recall: 0.82 |
 | Precision ≥ 0.8 | Precision: 0.83 |
Meningiomas | Dice ≥ 0.7 | Dice: 0.73 |
 | Recall ≥ 0.8 | Recall: 0.85 |
 | Precision ≥ 0.8 | Precision: 0.84 |
Cranial and paraspinal nerve tumors | Dice ≥ 0.7 | Dice: 0.88 |
 | Recall ≥ 0.8 | Recall: 0.93 |
 | Precision ≥ 0.8 | Precision: 0.93 |
Gliomas and glio-/neuronal tumors | Dice ≥ 0.7 | Dice: 0.76 |
 | Recall ≥ 0.8 | Recall: 0.74 |
 | Precision ≥ 0.8 | Precision: 0.88 |
Note: For "Gliomas and glio-/neuronal tumors," the reported lower bound of the 95% CI for Recall (0.74) is slightly below the stated acceptance criterion of 0.8. Additional clarification from the submission would be needed to understand how this was reconciled for clearance. However, for all other categories and overall, the reported performance meets or exceeds the acceptance criteria.
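Dice, recall, and precision here are voxel-overlap measures between the algorithm's segmentation and the expert annotation, with acceptance tied to the lower bound of a 95% confidence interval over the test cases. The sketch below is a minimal illustration in plain NumPy, not Brainlab's actual validation code; the bootstrap confidence interval is an assumption about how such a lower bound could be computed.

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Voxel-wise Dice, recall, and precision for two binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    return {
        "Dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        "Recall": tp / (tp + fn) if (tp + fn) else 1.0,
        "Precision": tp / (tp + fp) if (tp + fp) else 1.0,
    }

def ci95_lower_bound(per_case_scores, n_boot=10_000, seed=0) -> float:
    """Lower bound of a bootstrap 95% CI for the mean of per-case scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_case_scores, dtype=float)
    boot_means = [rng.choice(scores, size=scores.size, replace=True).mean()
                  for _ in range(n_boot)]
    return float(np.percentile(boot_means, 2.5))

def meets_criteria(per_metric_case_scores: dict) -> bool:
    """Apply the thresholds from the table above to per-metric lists of case scores."""
    thresholds = {"Dice": 0.7, "Recall": 0.8, "Precision": 0.8}
    return all(ci95_lower_bound(per_metric_case_scores[m]) >= t
               for m, t in thresholds.items())
```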
Study Details for AI Tumor Segmentation
2. Sample size used for the test set and the data provenance:
- Sample Size: 412 patients (595 scans, 1878 annotations)
- Data Provenance: De-identified 3D CE-T1 MR images from multiple clinical sites in the US and Europe. Data was acquired from adult patients with one or multiple contrast-enhancing tumors. ¼ of the test pool corresponded to data from three independent sites in the USA.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: Not explicitly stated as a number, but referred to as an "external/independent annotator team."
- Qualifications of Experts: US radiologists and non-US radiologists. No further details on years of experience or specialization are provided in this document.
4. Adjudication method for the test set:
- The document mentions "a well-defined data curation process" followed by the annotator team, but it does not explicitly describe a specific adjudication method (e.g., 2+1, 3+1) for resolving disagreements among annotators.
5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance:
- No, a multi-reader multi-case (MRMC) comparative effectiveness study comparing human readers with and without AI assistance was not reported for the AI tumor segmentation. The study focused on standalone algorithm performance against ground truth.
6. Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance study was done:
- Yes, a standalone performance study was done. The validation was conducted quantitatively by comparing the algorithm's automatically-created segmentations with the manual ground-truth segmentations.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- Expert Consensus Segmentations: The ground truth was established through "manual ground-truth segmentations, the so-called annotations," performed by the external/independent annotator team of radiologists.
8. The sample size for the training set:
- The sample size for the training set is not explicitly stated in this document. The document mentions that "The algorithm was trained on MRI image data with contrast-enhancing tumors from multiple clinical sites, including a wide variety of scanner models and patient characteristics."
9. How the ground truth for the training set was established:
- How the ground truth for the training set was established is not explicitly stated in this document. It can be inferred that it followed a similar process to the test set, involving expert annotations, but the details are not provided.
(203 days)
MUJ
RayCare is an oncology information system intended to provide information which is used to take decisions for diagnosis, treatment management, treatment planning, scheduling, treatment and follow-up of radiation therapy, medical oncology and surgical oncology.
For these disciplines, as applicable, RayCare enables the user to define the clinical treatment intent, prescribe treatment, specify the detailed course of treatment delivery, manage the treatment course and monitor the treatment course.
In the context of radiation therapy, the RayCare image viewer can be used for viewing images, annotating images, performing and saving image registrations as well as image fusion to enable offline image review of patient positioning during treatment delivery.
RayCare is not intended for use in diagnostic activities.
As an oncology information system, RayCare supports healthcare professionals in managing cancer care treatments. The system provides functionalities as described briefly in the sections below. These functionalities are not provided separately in different applications and have a joint purpose for the treatment of the patient.
RayCare is a Software as a Medical Device (SaMD) with a client part that allows the user to interact with the system and a server part that performs the necessary processing and storage functions. Selected aspects of RayCare are configurable, such as adapting workflow templates to the specific needs of the clinic.
This document describes the premarket notification for RayCare (2024A SP1), an oncology information system. The relevant sections for acceptance criteria and study details are primarily found under "VII. Non-Clinical and/or Clinical Tests Summary" and the tables within it.
Based on the provided text, RayCare (2024A SP1) is not an AI/ML device in the sense of making autonomous diagnostic decisions or image-based classifications. It is an Oncology Information System that supports clinical workflows for radiation therapy and other oncology disciplines. The "acceptance criteria" and "study that proves the device meets the acceptance criteria" in this context refer to the software verification and validation (V&V) activities. Therefore, the information provided focuses on demonstrating the software's functional correctness, safety, and effectiveness compared to a predicate device, rather than performance metrics specifically for an AI model (e.g., sensitivity, specificity, AUC).
Here's a breakdown of the requested information based on the provided document:
Acceptance Criteria and Device Performance (Software V&V)
The acceptance criteria for RayCare (2024A SP1) are implicitly defined by the successful completion of various software verification and validation activities designed to demonstrate that the device performs as intended and is as safe and effective as its predicate. These are primarily functional and system-level criteria.
Table of Acceptance Criteria and Reported Device Performance:
Since this is a software verification and validation summary for an oncology information system, the "performance" is demonstrated through successful compliance with system specifications and validated functionality. The "acceptance criteria" are the "Pass criteria" of the specific tests.
Acceptance Criteria (from "Criteria" or "Pass criteria" of listed V&V) | Reported Device Performance |
---|---|
Treatment Course Management (TCM) Workspace: The TCM workspace shall show the treatment course and its related series, treatment fractions, and assigned beam sets for the care plan selected in the global care plan selector. | |
Specific criteria: | |
• The treatment series related to the selected care plan is displayed. | |
• The fractions in the fractions table are only related to the treatment series related to the selected care plan. | |
• The assigned beam set table only displays the beam set related to the selected care plan. | Passed. "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." |
Extended RayCare Scripting Support (Unit Testing): Queries shall only be available for scripting if explicitly declared as scriptable (whitelisted data). | Passed. "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." |
Extended RayCare Scripting Support (System Level Verification): It is possible to run a script by clicking a RayCare script task, and the script has performed the expected action within RayCare. | Passed. "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." |
Offline and Online Recording of Treatment Results: Offline import is requested, received, and possible to sign with device and radiotherapy record selected for import for a selected session. | |
Specific criteria: | |
• Verify treatment course table and beam delivery result table in TC overview gets updated with corresponding data for the first session. | |
• Verify the device selected for offline import is the delivered device on the session. | Passed. "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." |
Treatment Delivery Integration Framework (Varian TrueBeam): The treatment flow for treatment delivery is verified. | |
Specific criteria: | |
• The fraction is fully delivered, and the status of the fraction, session, and beams is set to "Delivered". | |
• Compare the delivered meterset, the couch positions and angles. They should be the same. | |
• The online couch corrections are calculated as the difference between the planned and the delivered couch positions. | Passed. "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." |
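The delivery-integration pass criteria above amount to equality checks between planned and delivered treatment parameters plus a status check on the fraction, session, and beams. The sketch below illustrates how such a system-level check could be expressed; the record structures and field names are hypothetical and do not reflect RaySearch's actual test code.

```python
from dataclasses import dataclass

@dataclass
class BeamResult:
    meterset_mu: float                    # monitor units (planned or delivered)
    couch_position_cm: tuple[float, ...]  # (lateral, longitudinal, vertical)
    couch_angle_deg: float

def verify_truebeam_delivery(planned: list[BeamResult], delivered: list[BeamResult],
                             statuses: dict[str, str], tol: float = 1e-3):
    """Return (failures, couch_corrections); an empty failures list means the test passes."""
    failures = []
    for item in ("fraction", "session", "beams"):
        if statuses.get(item) != "Delivered":
            failures.append(f"{item} status is not 'Delivered'")
    for i, (p, d) in enumerate(zip(planned, delivered)):
        if abs(p.meterset_mu - d.meterset_mu) > tol:
            failures.append(f"beam {i}: delivered meterset differs from plan")
        if any(abs(a - b) > tol for a, b in zip(p.couch_position_cm, d.couch_position_cm)):
            failures.append(f"beam {i}: couch position differs from plan")
        if abs(p.couch_angle_deg - d.couch_angle_deg) > tol:
            failures.append(f"beam {i}: couch angle differs from plan")
    # Online couch correction is defined as delivered minus planned position.
    corrections = [tuple(dc - pc for pc, dc in zip(p.couch_position_cm, d.couch_position_cm))
                   for p, d in zip(planned, delivered)]
    return failures, corrections
```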
Overall Conclusion:
"From the successful verification and validation activities, the conclusion can be drawn that RayCare 2024A SP1 has met specifications and is as safe, as effective and performs as well as or better than the legally marketed predicate device."
Sample sizes used for the test set and the data provenance:
- Test Set Sample Size: The document does not specify a numerical sample size for "test sets" in the traditional sense of patient cases or images for evaluating an AI model. Instead, it refers to software verification and validation ("V&V") activities including unit testing, integration testing, system-level testing, cybersecurity testing, usability testing, and regression testing. These involve testing against requirements and specifications, often using simulated data, test cases, or specific user scenarios, rather than a fixed "dataset" of patient images.
- Data Provenance: The document does not explicitly state the country of origin of testing data or if it was retrospective or prospective. Given it's software V&V for an oncology information system, the "data" would primarily be test inputs and expected outputs generated internally during the development process (e.g., test scripts, simulated patient data to exercise specific functionalities). It's not a study on real patient data for diagnostic performance.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- This concept is not applicable as this is a software verification and validation summary for an oncology information system, not a study evaluating an AI model's diagnostic or prognostic performance against expert-determined ground truth. The "ground truth" for V&V activities is the system's specified behavior and functional requirements. Software engineers and QA professionals establish whether the software meets these pre-defined requirements.
Adjudication method (e.g., 2+1, 3+1, none) for the test set:
- Not applicable. Adjudication methods are typically used in clinical studies involving human readers to resolve discrepancies in annotations or diagnoses, especially when establishing ground truth for AI model evaluation. This document describes software V&V.
Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance:
- No, an MRMC comparative effectiveness study was not done. The document explicitly states: "No Clinical trials were required to demonstrate substantial equivalence."
- This type of study is relevant for AI-assisted diagnostic devices. RayCare is described as an oncology information system, not an AI diagnostic tool.
Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance assessment was done:
- While the system has automated functions, the concept of "standalone performance" as it relates to an AI algorithm making a clinical decision (e.g., classifying a lesion) is not directly applicable here. The V&V described focuses on the system's ability to correctly manage and process information, integrate with other systems, and support user workflows, which are inherent to its "standalone" operation as an information system. The "performance" is demonstrated through successful execution of its intended software functions as per its specifications.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The "ground truth" for the software verification and validation described here is the functional specifications and requirements of the RayCare system. Successful "verification" means the design output meets the requirements, and "validation" means the software conforms to user needs and intended uses. This is established through internal testing against defined expected behaviors.
The sample size for the training set:
- Not applicable. RayCare (2024A SP1) is an oncology information system, and the document does not indicate that it incorporates a machine learning model that was "trained" on a dataset in the way an AI diagnostic or predictive algorithm would be. The device's "development" involved standard software engineering practices.
How the ground truth for the training set was established:
- Not applicable. As there is no mention of an AI/ML training set, the concept of establishing ground truth for it does not apply.
In summary, this FDA review document pertains to the clearance of an Oncology Information System (OIS) through the 510(k) pathway, demonstrating substantial equivalence to a predicate device. The "acceptance criteria" and "proof" come from a robust set of software verification and validation activities (unit, integration, system, cybersecurity, usability, regression testing) rather than clinical studies or the evaluation of an AI model's diagnostic performance against a clinical ground truth. The device is not presented as an AI-driven diagnostic tool.
(211 days)
MUJ
Oncospace is used to configure and review radiotherapy treatment plans for a patient with malignant or benign disease in the head and neck, thoracic, abdominal, and pelvic regions. It allows for set up of radiotherapy treatment protocols, association of a potential treatment plan with the protocol(s), submission of a dose prescription and achievable dosimetric goals to a treatment planning system, and review of the treatment plan. It is intended for use by qualified, trained radiation therapy professionals (such as medical physicists, oncologists, and dosimetrists). This device is for prescription use by order of a physician.
The Oncospace software supports radiation oncologists and medical dosimetrists during radiotherapy treatment planning. The software includes locked machine learning algorithms. During treatment planning, the Oncospace software works in conjunction with, and does not replace, a treatment planning system (TPS).
The Oncospace software is intended to augment the treatment planning process by:
- allowing the radiation oncologist to select and customize a treatment planning protocol that includes dose prescription (number of fractions, dose per fraction, dose normalization), a delivery method (beam type and geometry), and protocol-based dosimetric goals/objectives for treatment targets, and organs at risk (OAR);
- predicting dosimetric goals/objectives for OARs based on patient-specific anatomical geometry;
- automating the initiation of plan optimization on a TPS by supplying the dose prescription, delivery method, protocol-based target objectives, and predicted OAR objectives;
- providing a user interface for plan evaluation against protocol-based and predicted goals.
Diagnosis and treatment decisions occur prior to treatment planning and do not involve Oncospace. Decisions involving Oncospace are restricted to setting of dosimetric goals for use during plan optimization and plan evaluation. Human judgement continues to be applied in accepting these goals and updating them as necessary during the iterative beam optimization process. Human judgement is also still applied as in standard practice during plan quality assessment; the protocol-based OAR goals are used as the primary means of plan assessment, with the role of the predicted goals being to provide additional information as to whether dose to an OAR may be able to be further lowered.
When Oncospace is used in conjunction with a TPS, the user retains full control of the TPS, including finalization of the treatment plan created for the patient. Oncospace also does not interface with the treatment machines. The risk to patient safety is lower than a TPS since it only informs the treatment plan, does not allow region of interest editing, does not make treatment decisions, and does not interface directly with the treatment machine or any record and verify system.
Oncospace's OAR dose prediction approach, and the use of predictions in end-to-end treatment planning workflow, has been tested for use with a variety of cancer treatment plans. These included a wide range of target and OAR geometries, prescriptions and boost strategies (sequential and simultaneous delivery). Validity has thus been demonstrated for the range of prediction model input features encountered in the test cases. This range is representative of the diversity of the same feature types (describing target-OAR proximity, target and OAR shapes, sizes, etc.) encountered across all cancer sites. Given that the same feature types will be used in OAR dose prediction models trained for all sites, the modeling approach validated here is not cancer site specific, but rather is designed to predict OAR DVHs based on impactful features common to all sites. The software is designed to be used in the context of all forms of intensity-modulated photon beam radiotherapy. The planning objectives themselves are intended to be TPS-independent: these are instead dependent on the degree of organ sparing possible given the beam modality and range of delivery techniques for plans in the database. To facilitate streamlined transmission of DICOM files and plan parameters Oncospace includes scripts using the treatment planning system's scripting language (for example, Pinnacle).
The Oncospace software includes an algorithm for transforming non-standardized OAR names used by treatment planners to standardized names defined by AAPM Task Group 263. This matching process primarily uses a table of synonyms that is updated as matches are made during use of the product, as well as a Natural Language Processing (NLP) model that attempts to match plan names not already in the synonym table. The NLP model selects the most likely match, which may be a correct match to a standard OAR name, an incorrect match, or no match (when the model considers this to be most likely, such as for names resembling a target). The user can also manually match names using a drop-down menu of all TG-263 OAR names. The user is instructed to check each automated match and make corrections using the drop-down menu as needed.
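The matching flow described above (synonym table first, NLP fallback second, manual selection last, with confirmed matches fed back into the synonym table) can be sketched roughly as follows. This is an illustrative outline only; `nlp_model.predict` and the confidence threshold are placeholders, not Oncospace's actual interface.

```python
def match_to_tg263(raw_name: str, synonym_table: dict[str, str], nlp_model,
                   confidence_threshold: float = 0.5):
    """Return (standard_name or None, source) for a planner-supplied OAR name."""
    key = raw_name.strip().lower()
    if key in synonym_table:                       # 1. previously confirmed matches
        return synonym_table[key], "synonym table"
    candidate, score = nlp_model.predict(key)      # 2. NLP fallback (placeholder API)
    if candidate is not None and score >= confidence_threshold:
        return candidate, "NLP model"
    return None, "manual match required"           # 3. user picks from the TG-263 list

def confirm_match(raw_name: str, standard_name: str, synonym_table: dict[str, str]):
    """After user review, store the match so the same raw name resolves deterministically."""
    synonym_table[raw_name.strip().lower()] = standard_name
```

In any case, the user is instructed to review every automated match, so the function's output is a suggestion rather than a final assignment.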
Based on the provided 510(k) Clearance Letter, here's a detailed description of the acceptance criteria and the study proving the device meets them:
1. Table of Acceptance Criteria and Reported Device Performance
The document describes two main types of performance testing: clinical performance testing and model performance testing. The acceptance criteria are implicitly defined by the reported performance achieving non-inferiority or being within acceptable error margins.
Acceptance Criteria Category | Specific Metric/Target | Reported Device Performance |
---|---|---|
Clinical Performance (Primary Outcome) | OAR Dose Sparing Non-inferiority Margin: |
- Thoracic: 2.2 Gy
- Abdominal: 1 Gy
- Pelvis (Gynecological): 1.9 Gy | Achieved Non-Inferiority:
- Mean OAR dose was statistically significantly lower for 5 OARs for abdominal and 4 OARs for pelvis (gynecological).
- No statistically significant differences in mean dose for remaining 11 OARs for thoracic, 3 OARs for abdominal, and 2 OARs for pelvis (gynecological).
- Non-inferiority demonstrated to 2.2 Gy for thoracic, 1 Gy for abdominal, and 1.9 Gy for pelvis (gynecological). |
| Clinical Performance (Secondary Outcome) | Target Coverage Maintenance: No statistically significant difference in target coverage compared to clinical plans without Oncospace. | Achieved: No statistically significant difference in target coverage between clinical plans and plans created with use of the Oncospace system. |
| Clinical Performance (Effort Reduction) | No increased optimization cycles when using Oncospace vs. traditional workflow. (Implicit acceptance criteria) | Achieved: Out of all the plans tested, no plan required more optimization cycles using Oncospace versus using traditional radiation treatment planning clinical workflow. |
| Model Performance (H&N External Validation) | Mean Absolute Error (MAE) in OAR DVH dose values: - Institution 2: Within 5% of prescription dose for all OARs.
- Institution 3: Within 5% of prescription dose for all OARs. | Achieved (with some exceptions):
- Institution 2: MAE within 5% for 9/12 OARs; does not exceed 9% for any OARs.
- Institution 3: MAE within 5% for 10/12 OARs; does not exceed 8% for any OARs. |
| Model Performance (Prostate External Validation) | Mean Absolute Error (MAE) in OAR DVH dose values: Within 5% of prescription dose for all OARs. | Achieved (with some exceptions): - Institution 3: MAE within 5% for 4/6 OARs; 5.1% for one OAR; 15.9% for one OAR. |
| NLP Model Performance (Cross-Validation) | Validation macro-averaged F1 score above 0.92 and accuracy above 96% for classifying previously unseen terms. | Achieved: All models achieved a validation macro-averaged F1 score above 0.92 and accuracy above 96%. |
| NLP Model Performance (External Validation) | Correctly match a high percentage of unique and total structure names. (Implicit acceptance criteria) | Achieved: Correctly matched 207/221 (94.1%) of all structure names, or 131/145 (91.0%) unique structure names. |
| General Verification Tests | All system requirements and acceptance criteria met (clinical, standard UI, cybersecurity). | Achieved: Met all system requirements and acceptance criteria. |
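The model-performance criterion above expresses the mean absolute error of predicted OAR DVH dose values as a percentage of the prescription dose. A minimal sketch of that calculation follows; the dose values in the example are hypothetical and illustrative only.

```python
import numpy as np

def mae_percent_of_prescription(predicted_gy, ground_truth_gy, prescription_gy):
    """MAE of predicted OAR DVH dose values, as a percentage of the prescription dose."""
    predicted = np.asarray(predicted_gy, dtype=float)
    truth = np.asarray(ground_truth_gy, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - truth)) / prescription_gy

# Hypothetical example: predicted vs. clinical mean-dose values (Gy) for one OAR across
# three test patients with a 70 Gy prescription; a result <= 5 would meet the criterion.
print(mae_percent_of_prescription([18.2, 25.0, 30.1], [17.0, 26.5, 28.0], 70.0))  # ~2.3
```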
2. Sample Sizes and Data Provenance
The document provides detailed sample sizes for training/tuning, and external performance/clinical validation datasets.
Test Set Sample Sizes:
- Clinical Validation Dataset:
- Head and Neck: 18 patients (previously validated)
- Thoracic: 20 patients (14 lung, 6 esophagus)
- Abdominal: 17 patients (11 pancreas, 6 liver)
- Pelvis: 17 patients (12 prostate, 5 gynecological) (prostate previously validated)
- External Performance Test Dataset(s) (Model Performance):
- Head and Neck: Dataset A: 265 patients (Institution_2); Dataset B: 27 patients (Institution_3)
- Prostate: 40 patients (Institution_3)
- NLP Model External Testing Dataset: 221 structures with 145 unique original names.
Data Provenance (Country of Origin, Retrospective/Prospective):
- Training/Tuning/Internal Testing Datasets: Acquired from Johns Hopkins University (JHU) between 2008-2019. Johns Hopkins University is located in the United States. This data is retrospective.
- External Performance Test Datasets: Acquired from Institution_2 and Institution_3. Locations of these institutions are not specified but are implied to be distinct from JHU. This data is retrospective.
- Clinical Validation Datasets: Acquired from Johns Hopkins University (JHU) between 2021-2024 (for Thoracic, Abdominal, Pelvis) and 2021-2022 (for H&N, previously validated). This data is retrospective.
- NLP Model Training/External Validation: Trained and validated using "known name matches in the prostate, gynecological, head and neck, thoracic, and pancreas cancer datasets licensed to Oncospace by Johns Hopkins University." This indicates retrospective data from the United States.
3. Number of Experts and Qualifications for Ground Truth
The document does not explicitly state the number of experts or their specific qualifications (e.g., years of experience, types of radiologists) used to establish the ground truth for the test sets.
Instead, it refers to "heterogenous sets of traditionally-planned clinical treatment plans" and "curated, gold-standard treatment plans" (for the predicate device comparison table, implying similar for the subject). This suggests that the ground truth for the OAR dose values reflects actual clinical outcomes from existing treatment plans.
For the NLP model, the ground truth was "known name matches" in the acquired datasets, implying consensus or established naming conventions from the institutions, rather than real-time expert adjudication for the study.
4. Adjudication Method for the Test Set
The document does not describe an explicit adjudication method (e.g., 2+1, 3+1 reader adjudication) for establishing the ground truth dose values or treatment plan quality for the test sets. The "ground truth" seems to be defined by:
- Clinical Performance Testing: "heterogenous sets of traditionally-planned clinical treatment plans," implying the actual clinical plans serve as the comparative ground truth.
- Model Performance Testing: "comparison of predicted dose values to ground truth values." These ground truth values appear to be the actual recorded DVH dose values from the clinical plans in the external test datasets.
- NLP Model: "known name matches" from the licensed datasets, suggesting pre-defined or institutional standards.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No, an MRMC comparative effectiveness study involving human readers improving with AI vs. without AI assistance was not described in this document.
The study focused on:
- Comparing plans generated with Oncospace to traditionally-planned clinical treatment plans (effectively comparing AI-assisted plan generation to human-only plan generation, but without a specific MRMC design to measure human reader improvement).
- Assessing the ability of Oncospace to maintain or improve OAR sparing and target coverage.
- Evaluating the accuracy of the model's dose predictions and the NLP module.
The study design described is a non-inferiority trial for clinical performance, and model performance accuracy assessments, not an MRMC study quantifying human reader performance change.
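A non-inferiority comparison of this kind is commonly implemented as a one-sided test on the paired per-patient difference in mean OAR dose, checking the confidence bound of the difference against the pre-specified margin. The sketch below is a generic illustration of that logic, not Oncospace's actual statistical analysis; the patient dose values are hypothetical and the margin is taken from the thoracic criterion above.

```python
import numpy as np
from scipy import stats

def noninferior(mean_oar_dose_with_tool, mean_oar_dose_without_tool,
                margin_gy: float, alpha: float = 0.05):
    """Paired one-sided non-inferiority check on mean OAR dose (Gy).

    Non-inferiority is concluded if the one-sided upper (1 - alpha) confidence bound
    of the mean difference (with tool minus without tool) lies below the margin.
    """
    diff = (np.asarray(mean_oar_dose_with_tool, float)
            - np.asarray(mean_oar_dose_without_tool, float))
    n = diff.size
    sem = diff.std(ddof=1) / np.sqrt(n)
    upper = diff.mean() + stats.t.ppf(1 - alpha, df=n - 1) * sem
    return upper < margin_gy, upper

# Hypothetical per-patient mean lung doses (Gy) with and without predicted objectives,
# tested against the 2.2 Gy thoracic margin.
ok, upper_bound = noninferior([12.1, 9.8, 14.0, 11.2], [12.5, 10.4, 13.8, 11.9], margin_gy=2.2)
```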
6. Standalone (Algorithm Only) Performance
Yes, standalone performance was done for:
- Model Performance Testing: This involved comparing the model's predicted OAR DVH dose values directly against "ground truth values" (actual recorded dose values from clinical plans) in the external test datasets. This is an algorithm-only (standalone) assessment of the dose prediction accuracy, independent of the overall human-in-the-loop clinical workflow.
- NLP Model Performance: The NLP model's accuracy in mapping non-standardized OAR names to TG-263 names was evaluated in a standalone manner using cross-validation and an external test dataset.
The "clinical performance testing," while ultimately comparing plans with Oncospace assistance to traditional plans, is also evaluating the algorithm's influence on the final plan quality. However, the explicit "model performance testing" sections clearly describe standalone algorithm evaluation.
7. Type of Ground Truth Used
The ground truth used in this study primarily relied on:
- Existing Clinical Treatment Plans/Outcomes Data: For clinical performance testing, "heterogenous sets of traditionally-planned clinical treatment plans" served as the comparative baseline. The OAR doses and target coverage from these real-world clinical plans constituted the "ground truth" for comparison. This can be categorized as outcomes data in terms of actual treatment parameters delivered in a clinical setting.
- Recorded Dosimetric Data: For model performance testing, the "ground truth values" for predicted DVH doses were the actual, recorded DVH dose values from the clinical plans in the external datasets. This data is derived from expert consensus in practice (as these were actual clinical plans deemed acceptable by clinicians) and outcomes data (the resulting dose distributions).
- Established Reference/Consensus (NLP): For the NLP model, the ground truth was based on "known name matches" or "standardized names defined by AAPM Task Group 263," which represents expert consensus or authoritative standards.
8. Sample Size for the Training Set
The document refers to the "Development (Training/Tuning) and Internal Performance Testing Dataset (randomly split 80/20)" for each anatomical location. Assuming the 80% split is for training:
- Head and Neck: 1145 patients (80% for training) = approx. 916 patients
- Thoracic: 1623 patients (80% for training) = approx. 1298 patients
- Abdominal: 712 patients (80% for training) = approx. 569 patients
- Pelvis: 1785 patients (80% for training) = approx. 1428 patients
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set for the dose prediction models was established based on retrospective clinical data from Johns Hopkins University between 2008-2019. These were actual treatment plans for patients who received radiation therapy, meaning their dosimetric parameters (like OAR doses and target coverage from DVHs) and anatomical geometries (from imaging) were used as the input features and target outputs for the machine learning models.
The plans were selected based on certain criteria (e.g., "required to exhibit 90% target coverage" for H&N, "92% target coverage" for Thoracic, etc.), implying these were clinically acceptable plans. This essentially makes the ground truth for training derived from expert consensus in practice (as these were plans approved and delivered by medical professionals at a major institution) and historical outcomes data (the actual treatment parameters achieved).
For the NLP model, the training ground truth was "known name matches" in these same prostate, gynecological, head and neck, thoracic, and pancreas cancer datasets, meaning established mappings between unstandardized and standardized OAR names were used. This again points to expert consensus/standardization.
(420 days)
MUJ
RayStation is a software system for radiation therapy and medical oncology. Based on user input, RayStation proposes treatment plans. After a proposed treatment plan is reviewed and approved by authorized intended users, RayStation may also be used to administer treatments.
The system functionality can be configured based on user needs.
RayStation consists of multiple applications:
- The main RayStation application is used for treatment planning.
- The RayPhysics application is used for commissioning of treatment machines to make them available for treatment planning and used for commissioning of imaging systems.
The devices to be marketed, RayStation/RayPlan 2024A SP3, 2024A and 2023B, contain modified features compared to the last cleared version, RayStation 12A, including:
- Improved sliding window VMAT (Volumetric Modulated Arc Therapy) sequencing
- Higher dose grid resolution for proton PBS (Pencil Beam Scanning)
- Automated field in field planning
- LET optimization (Linear Energy Transfer)
These applications are built on a software platform, containing the radiotherapy domain model and providing GUI, optimization, dose calculation and storage services. The platform uses three Microsoft SQL databases for persistent storage of the patient, machine and clinic settings data.
As a treatment planning system, RayStation aims to be an extensive software toolbox for generating and evaluating various types of radiotherapy treatment plans. RayStation supports a wide variety of radiotherapy treatment techniques and features an extensive range of tools for manual or semi-automatic treatment planning.
The RayStation application is divided in modules, which are activated through licensing. A simplified license configuration of RayStation is marketed as RayPlan.
The provided document is a 510(k) clearance letter for the RayStation/RayPlan 2024A SP3, 2024A, and 2023B devices. It discusses the substantial equivalence of these devices to a predicate device (RayStation 12A).
However, the document does not contain specific acceptance criteria tables nor detailed study results for a single, comprehensive study proving the device meets acceptance criteria in the format typically requested (e.g., a specific clinical validation study with explicitly defined acceptance metrics like sensitivity, specificity, or AUC, and corresponding reported performance values).
Instead, the document describes a broad software verification and validation process, stating that the software underwent:
- Unit Testing
- Integration Testing
- System Level Testing
- Cybersecurity Testing
- Usability Testing (Validation in a clinical environment)
- Regression Testing
For several "Added/updated functions," the document provides a description of the verification and validation data used to demonstrate substantial equivalence and simply states "Yes" under the "Substantially Equivalent?" column if the validation was "successful." The acceptance criteria for these tests are described narratively within the text, not in a consolidated table format with numerical performance outcomes.
Therefore, I cannot generate the requested table of "acceptance criteria and the reported device performance" as a single, consolidated table with numerical results for the entire device's performance against specific, pre-defined acceptance criteria for a single study. The document describes a process of demonstrating substantial equivalence through various verification and validation activities rather than a single, large-scale study with quantitative acceptance criteria for the overall device performance.
However, I can extract the information related to the validation activities for specific features and the general approach to proving substantial equivalence.
Here's a breakdown of the requested information based on the provided document, addressing each point to the best of my ability given the available details:
Acceptance Criteria and Device Performance (Based on provided verification and validation descriptions)
As noted, a single, consolidated table of quantitative acceptance criteria and overall device performance is not provided. Instead, the document describes various verification and validation activities with implicit or explicit pass criteria for individual features or system aspects to demonstrate substantial equivalence to the predicate device.
Below are examples of how some "acceptance criteria" (pass criteria) and "reported performance" are described for specific features. These are not aggregated performance metrics for the entire device but rather success criteria for sub-components or changes.
Feature/Aspect Tested | Acceptance Criteria (Pass Criteria) Described | Reported Device Performance (as stated in the document) |
---|---|---|
Dose compensation point computation for Tomo Synchrony | 1. Calculated values for the center point coordinates are equal to values from the version used in Accuray validation. |
- Calculated values are numerically equal to values obtained from calling the method (regression test).
- Calculated values are exported correctly from RayStation to DICOM (equality between calculated and exported point, only for Helical Tomo Synchrony plans, only in correct DICOM item).
- Calculated values are converted correctly from DICOM to Accuray's system format (equality of point coordinates, only for relevant plan types). | "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." (Implies all pass criteria were met). |
| Point-dose optimization in brachy plans | 1. Position from the correct image set is used for point-dose objectives/constraints. - Possible to add optimization objective/constraint to a point, referring to the correct point.
- When adding objective/constraint, selection of function type and dose level is possible and reflected in description.
- Saving and loading an optimization function template containing point objectives/constraints works correctly (loaded functions are same as saved).
- Results from single/multiple point optimization are as expected (dose in point(s) should be equal to specified dose in objective(s)). | "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." (Implies all pass criteria were met). |
| Electron Monte Carlo dose engine improvements | Comparing calculated doses with:
- Measured doses obtained from clinics,
- Doses computed in independent, well-established TPS,
- Doses computed with earlier versions of RayStation,
- Doses computed in BEAMnrc/egs++
using Gamma evaluation criteria. | "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." (Implies adequate agreement based on Gamma criteria). |
| Evaluation on converted CBCT images for protons | For proton MC/PB dose computation: - Gamma 2%/2mm pass rate above 90%
- Gamma 3%/3mm pass rate above 95% | "The successful validation of this feature demonstrates that the device is as safe and effective as the predicate device." (Implies specified Gamma pass rates were achieved). |
| Overall Device (Software Verification/Validation) | Software specifications conform to user needs and intended uses, and particular requirements implemented through software can be consistently fulfilled. Conformance to applicable requirements and specifications. Successful outcome of unit, integration, system, cybersecurity, usability, and regression testing. Safety and effectiveness validated. | "RayStation/RayPlan 2024A SP3, 2024A and 2023B have met specifications and are as safe, as effective and perform as well as the legally marketed predicate devices." All general software tests (unit, integration, system, cybersecurity, usability, regression) were acceptable/successful. |
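The Gamma pass-rate criteria cited above (e.g., 2%/2 mm pass rate above 90%) compare a calculated dose distribution against a reference distribution. The sketch below is a deliberately simplified 1-D global gamma index over a dose profile, for illustration only; clinical gamma analysis is performed on 3-D dose grids, typically with dedicated tools, and this is not RaySearch's implementation.

```python
import numpy as np

def gamma_pass_rate_1d(x_ref, dose_ref, x_eval, dose_eval,
                       dose_percent=2.0, dta_mm=2.0, low_dose_cutoff=0.1):
    """Simplified global 1-D gamma analysis.

    For every reference point above the low-dose cutoff, gamma is the minimum over all
    evaluation points of sqrt((dx/DTA)^2 + (ddose/dose_criterion)^2), with the dose
    criterion taken as a percentage of the maximum reference dose (global normalization).
    Returns the fraction of evaluated points with gamma <= 1.
    """
    x_ref, dose_ref = np.asarray(x_ref, float), np.asarray(dose_ref, float)
    x_eval, dose_eval = np.asarray(x_eval, float), np.asarray(dose_eval, float)
    dose_criterion = dose_percent / 100.0 * dose_ref.max()
    mask = dose_ref >= low_dose_cutoff * dose_ref.max()

    gammas = []
    for xi, di in zip(x_ref[mask], dose_ref[mask]):
        dist_term = (x_eval - xi) / dta_mm
        dose_term = (dose_eval - di) / dose_criterion
        gammas.append(np.sqrt(dist_term**2 + dose_term**2).min())
    return float((np.array(gammas) <= 1.0).mean())
```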
Study Details (Based on the document)
Given the nature of the 510(k) submission for a treatment planning system, the "study" is primarily a comprehensive software verification and validation effort to demonstrate substantial equivalence, rather than a single, standalone clinical trial or diagnostic accuracy study.
Sample sizes used for the test set and the data provenance:
- Test Set Sample Sizes: Not explicitly stated as a single numerical value for a global "test set." Testing was conducted at multiple levels (unit, integration, system, usability, regression) across various features.
- For "Evaluation on converted CBCT images for protons," it mentions "Test cases consist of CBCTs from the MedPhoton imaging ring on a Mevion S250i system, as well as the on-board CBCT systems on a Varian ProBeam and an IBA P1," implying a set of patient or phantom imaging data, but the exact number of cases/patients is not specified.
- For other features, it refers to "tests," "validation data," or "computed doses" but doesn't quantify the number of distinct data points/cases used.
- Data Provenance:
- Country of Origin: Not specified in the document. Likely internal RaySearch data and potentially data from collaboration with clinical sites, but no specific countries are mentioned.
- Retrospective or Prospective: Not explicitly stated. The verification and validation activities appear to be primarily retrospective (using existing data, phantom measurements, or simulated scenarios) as part of the software development lifecycle, rather than prospective clinical data collection for a specific study.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- This information is not provided in the document. The document refers to "measured doses obtained from clinics" and "doses computed in independent, well-established TPS" as part of the validation for dose engine improvements, suggesting some form of external or expert-derived ground truth, but the number and qualifications of experts involved are not detailed. For "Evaluation on converted CBCT images for protons," it states "For each case, a ground truth CT image has been prepared to serve as ground truth," implying expert or established reference standard, but again, no details on experts.
Adjudication method (e.g., 2+1, 3+1, none) for the test set:
- This information is not provided. The document focuses on computational and functional verification rather than multi-reader clinical assessment.
Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improve with AI versus without AI assistance:
- No, an MRMC comparative effectiveness study was not explicitly done or reported in this document. The device, RayStation/RayPlan, is a treatment planning system that assists users in creating treatment plans, not primarily an AI-driven image interpretation or diagnostic aid where human reader performance improvement is typically measured. The AI-related feature mentioned is "deep learning segmentation," but the document states, "(The model training is performed offline on clinical CT and structure data.)" It does not detail an MRMC study related to its performance or impact on human readers.
Whether a standalone (i.e., algorithm-only, without human-in-the-loop) performance evaluation was done:
- Yes, standalone (algorithm-only) performance was central to the validation. The document describes extensive "Unit Testing," "Integration Testing," "System Level Testing," and "Dose engine validation" which are all a form of standalone algorithmic evaluation. For example, the Gamma evaluation criteria for dose calculations or the numerical equality checks for dose compensation points are purely algorithmic performance assessments.
The type of ground truth used (expert consensus, pathology, outcomes data, etc):
- The ground truth varied depending on the feature being validated:
- "Measured doses" from clinics / Independent TPS computations / BEAMnrc/egs++ calculations: For dose engine validation. This represents a highly accurate, often physical measurement or well-established computational standard.
- "Ground truth CT image": For evaluation of converted CBCT images for protons. This implies a high-quality reference image.
- Internal "expected results" and "specifications": For functional and system-level tests (e.g., for point-dose optimization, the expected result was that the dose in the point should equal the dose specified in the objective).
- "Clinical objectives": Used for plan comparisons (e.g., in segment weight optimization validation), likely representing desired dose distributions defined by clinical experts.
The sample size for the training set:
- The document mentions "deep learning segmentation" and states that "The model training is performed offline on clinical CT and structure data." However, the sample size for this training set is not provided.
How the ground truth for the training set was established:
- For "deep learning segmentation," the ground truth for training would implicitly be the "clinical CT and structure data" mentioned. This typically means expert-delineated structures (ROIs) on clinical CT images, but the exact method (e.g., single expert, consensus, specific software tools) is not detailed.
(160 days)
MUJ
ART-Plan+'s indicated target population is cancer patients for whom radiotherapy treatment has been prescribed; within this population, it may be used for any patient for whom relevant modality imaging data is available.
ART-Plan+ includes several modules:
- SmartPlan, which allows automatic generation of a radiotherapy treatment plan that users import into their own Treatment Planning System (TPS) for dose calculation, review and approval. This module is available for supported prescriptions for the prostate only.
- Annotate, which allows automatic generation of contours for organs at risk, lymph nodes and tumors, based on medical practices, on medical images such as CT and MR images.
ART-Plan+ is not intended to be used for patients less than 18 years of age.
The indicated users are trained medical professionals including, but not limited to, radiotherapists, radiation oncologists, medical physicists, dosimetrists and medical professionals involved in the radiation therapy process.
The indicated use environments include, but are not limited to, hospitals, clinics and any health facility offering radiation therapy.
ART-Plan+ is a software platform for contouring regions of interest on 3D images and for generating an automatic treatment plan. It includes several modules:
- Home: tasks
- Annotate and TumorBox: contouring of regions of interest
- SmartPlan: creation of an automatic treatment plan based on a planning CT and a RTSS
- Administration and settings: preferences management, user account management, etc.
- Institute Management: institute information, including licenses, list of users, etc.
- About: information about the software and its use, as well as contact details.
Annotate, TumorBox and SmartPlan are partially based on a batch mode, which allows the user to launch autocontouring and autoplanning operations without having to use the interface or the viewers. In this way, the software is fully integrated into the radiotherapy workflow and offers the user maximum flexibility.
ART-Plan+ offers deep-learning-based automatic segmentation of OARs and LNs for the following localizations:
- Head and neck (on CT images)
- Thorax/breast (on CT images)
- Abdomen (on CT images; on MR images for male patients)
- Pelvis male (on CT and MR images)
- Pelvis female (on CT images)
- Brain (on CT and MR images)
ART-Plan+ offers deep-learning-based automatic segmentation of targets for the following localizations:
- Brain (on MR images)
Based on the provided text, here's a detailed breakdown of the acceptance criteria and the study that proves the device meets them:
1. Table of Acceptance Criteria and the Reported Device Performance:
The document describes five distinct types of evaluations with their respective acceptance criteria. While the exact "reported device performance" (i.e., the specific numerical results obtained for each metric) is not explicitly stated, the document uniformly concludes, "All validation tests were carried out using datasets representative of the worldwide population receiving radiotherapy treatments. Finally, all tests passed their respective acceptance criteria, thus showing ART-Plan + v3.0.0 clinical acceptability." This implies all reported device performances met or exceeded the criteria.
Study Type | Acceptance Criteria | Reported Device Performance (Implied) |
---|---|---|
Non-regression Testing of Autosegmentation of ORs | Mean DSC should not regress negatively between the current and last validated version of Annotate beyond a maximum tolerance margin set to -5% relative error. | Met |
Qualitative Evaluation of Autosegmentation of ORs | Clinicians' qualitative evaluation of the auto-segmentation is considered acceptable for clinical use without modifications (A) or with minor modifications/corrections (B) with an A+B % above or equal to 85%. | Met |
Quantitative Evaluation of Autosegmentation of ORs | Mean DSC (annotate) ≥ 0.8 | Met |
Inter-expert Variability Evaluation of Autosegmentation of ORs | Mean DSC (annotate) ≥ Mean DSC (inter-expert) with a tolerance margin of -5% of relative error. | Met |
Quantitative Evaluation of Autosegmentation of Brain Metastasis | Lesion-wise sensitivity ≥ 0.86 | |
AND Lesion-wise precision ≥ 0.70 | ||
AND Lesion-wise DSC ≥ 0.78 | ||
AND Patient-wise DSC ≥ 0.83 | ||
AND Patient-wise false positive (FP) ≤ 2.1 | Met | |
Quantitative Evaluation of Autosegmentation of Glioblastoma | Sensitivity ≥ 0.80 | |
AND DSC ≥ 0.76 | Met | |
Quantitative and Qualitative Evaluation of Automatic Treatment Plans Generations | Quantitative: effectiveness difference (%) in DVH achieved goals between manual plans and automatic plans ≤ 5% | |
AND Qualitative: % of clinical acceptable automatic plans ≥ 93% after expert review. | Met |
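The lesion-wise sensitivity and precision criteria above (as opposed to voxel-wise, patient-wise DSC) are usually computed by matching connected components between the predicted and reference masks. The sketch below uses scipy.ndimage for illustration; the overlap-based matching rule is an assumption, not necessarily the vendor's exact definition.

```python
import numpy as np
from scipy import ndimage

def lesion_wise_metrics(pred_mask: np.ndarray, truth_mask: np.ndarray):
    """Match predicted and ground-truth lesions (connected components) by overlap.

    A ground-truth lesion counts as detected if any predicted component overlaps it;
    a predicted component with no overlap is counted as a false positive.
    """
    truth_lbl, n_truth = ndimage.label(truth_mask > 0)
    pred_lbl, n_pred = ndimage.label(pred_mask > 0)

    detected = sum(1 for i in range(1, n_truth + 1)
                   if np.any(pred_lbl[truth_lbl == i] > 0))
    true_pred = sum(1 for j in range(1, n_pred + 1)
                    if np.any(truth_lbl[pred_lbl == j] > 0))

    sensitivity = detected / n_truth if n_truth else 1.0
    precision = true_pred / n_pred if n_pred else 1.0
    false_positives = n_pred - true_pred
    return sensitivity, precision, false_positives
```

Patient-wise DSC, by contrast, is the ordinary voxel-overlap Dice computed over the whole mask for each patient.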
2. Sample Sizes Used for the Test Set and the Data Provenance:
- Non-regression Testing (Autosegmentation of ORs): Minimum sample size of 24 patients.
- Qualitative Evaluation (Autosegmentation of ORs): Minimum sample size of 18 patients.
- Quantitative Evaluation (Autosegmentation of ORs): Minimum sample size of 24 patients.
- Inter-expert Variability Evaluation (Autosegmentation of ORs): Minimum sample size of 13 patients.
- Quantitative Evaluation (Brain Metastasis, MR images): Minimum sample size of 51 patients.
- Quantitative Evaluation (Glioblastoma, MR images): Minimum sample size of 43 patients.
- Quantitative and Qualitative Evaluation (Automatic Treatment Plans): Minimum sample size of 20 patients.
Data Provenance: The document states, "All validation tests were carried out using datasets representative of the worldwide population receiving radiotherapy treatments." It does not specify the country of origin or whether the data was retrospective or prospective.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and the Qualifications of Those Experts:
The document refers to "medical experts" or "clinicians" for establishing ground truth and performing evaluations.
- For the non-regression testing of autosegmentation, "manual contours performed by medical experts" were used.
- For qualitative evaluation of autosegmentation, "medical experts" performed the qualitative evaluation.
- For inter-expert variability evaluation of autosegmentation, "two independent medical experts" were asked to contour the same images.
- For brain metastasis and glioblastoma segmentation, "contours provided by medical experts" were used for comparison.
- For the evaluation of automatic treatment plans, "medical experts" determined the clinical acceptability.
The specific number of experts beyond "two independent" for inter-expert variability is not consistently provided, nor are their exact qualifications (e.g., specific specialties like "radiation oncologist" or years of experience). However, the stated users of the device include "trained medical professionals including, but not limited to, radiotherapists, radiation oncologists, medical physicists, dosimetrists and medical professionals involved in the radiation therapy process," implying these are the types of professionals who would serve as experts.
4. Adjudication Method for the Test Set:
- For the inter-expert variability test, it involved comparing contours between two independent medical experts and with the software's contours. This implies a comparison rather than an explicit formal adjudication method (like 2+1 voting).
- For other segmentation evaluations, the ground truth was "manual contours performed by medical experts" or "contours provided by medical experts." It's not specified if these were consensus readings, or if an adjudication method was used if multiple experts contributed to a single ground truth contour for a case.
- For the automatic treatment plan qualitative evaluation, "expert review" is mentioned, but the number of reviewers or their adjudication process is not detailed.
5. Whether a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done and, If So, the Effect Size of Human Reader Improvement with vs. Without AI Assistance:
The document describes studies that evaluate the standalone performance of the AI for segmentation and treatment planning, and how its performance compares to expert-generated contours/plans or inter-expert variability. It does not explicitly describe an MRMC comparative effectiveness study designed to measure the improvement of human readers with AI assistance versus without AI assistance. The focus is on the AI's performance relative to expert-defined ground truths or benchmarks.
6. Whether a Standalone (i.e., Algorithm-Only, Without Human-in-the-Loop) Performance Evaluation Was Done:
Yes, the studies are largely focused on standalone algorithm performance.
- The "Non-regression testing," "Quantitative evaluation," and "Inter-expert variability evaluation" of autosegmentation explicitly compare the software's generated contours (algorithm only) against manual contours or inter-expert contours.
- The "Quantitative evaluation of autosegmentation of Brain metastasis" and "Glioblastoma" assess the algorithm's performance (sensitivity, precision, DSC, FP) against expert-provided contours.
- For "Automatic Treatment Plan Generations," the quantitative evaluation compares the algorithm's plans to manual plans, and the qualitative evaluation assesses the acceptance of the automatic plans by experts.
7. The Type of Ground Truth Used:
The primary ground truth relied upon in these studies is:
- Expert Consensus/Manual Contours: This is repeatedly stated as "manual contours performed by medical experts" or "contours provided by medical experts."
- Inter-expert Variability: For one specific study, the variability between two independent experts was used as a benchmark for comparison.
- Manual Treatment Plans: For the treatment plan evaluation, manual plans served as a benchmark for quantitative comparison.
No mention of pathology or outcomes data as ground truth is provided.
8. The Sample Size for the Training Set:
The document does not specify the sample size for the training set. It only mentions the training of the algorithm (e.g., "retraining or algorithm improvement").
9. How the Ground Truth for the Training Set Was Established:
The document does not explicitly describe how the ground truth for the training set was established. It only states that the device uses "deep-learning based automatic segmentation," implying that it would have been trained on curated data with established ground truth, likely also generated by medical experts, but the specifics are not detailed in this excerpt.
Ask a specific question about this device
(116 days)
MUJ
The ARIA Radiation Therapy Management product is a treatment plan and image management application. It enables the authorized user to enter, access, modify, store and archive treatment plan and image data from diagnostic studies, treatment planning, simulation, plan verification and treatment. ARIA Radiation Therapy Management also stores the treatment histories including dose delivered to defined sites and provides tools to verify performed treatments.
ARIA Radiation Therapy Management (ARIA RTM) manages treatment information such as images and treatment data to prepare plans created for treatment and to review post-treatment images and records. It also provides quality assurance options. ARIA RTM does not directly act on the patient. ARIA RTM is applied by trained medical professionals in the process of preparation and management of radiotherapy treatments for patients.
The provided FDA 510(k) summary for ARIA Radiation Therapy Management System (18.1) does not contain the detailed information necessary to fully answer your request regarding acceptance criteria and the study proving device performance.
Here's a breakdown of what can be extracted and what is missing:
1. Table of Acceptance Criteria and Reported Device Performance
The submission states: "Test results demonstrate conformance to applicable requirements and specifications." However, it does not provide a specific table of acceptance criteria or reported device performance metrics. It implies that underlying V&V documentation exists that confirms the software meets its design requirements, but these details are not present in this summary.
2. Sample size used for the test set and the data provenance
The document states: "No animal studies or clinical tests have been included in this pre-market submission." This indicates that the validation was likely based on non-clinical software testing, not patient data. Therefore, there is no patient-specific test set sample size or data provenance (e.g., country of origin, retrospective/prospective) mentioned. The testing would have involved simulated data, test cases, or internal datasets to verify software functionalities.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
Since the testing was non-clinical software verification and validation, there is no mention of experts establishing ground truth for a test set in the traditional sense of clinical evaluation. Software testing typically relies on predefined requirements, specifications, and expected outputs, rather than expert-adjudicated ground truth from medical images or patient cases.
4. Adjudication method for the test set
Similarly, because there are no clinical trials or expert-adjudicated test sets, there is no adjudication method (e.g., 2+1, 3+1) described.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of human reader improvement with AI vs. without AI assistance
The document explicitly states: "No animal studies or clinical tests have been included in this pre-market submission." Therefore, no MRMC comparative effectiveness study was conducted or reported for this submission. The device is a radiation therapy management system, not explicitly an AI-assisted diagnostic tool for human readers.
6. If a standalone (i.e., algorithm-only, without human-in-the-loop) performance evaluation was done
The submission indicates that the software underwent "Software Verification and Validation Testing" and was considered a "major" level of concern. This implies extensive standalone algorithm (software) testing to ensure it meets its functional and safety requirements. However, specific details of these tests (e.g., test cases, scenarios, and their results) are not provided in this summary. The device "does not directly act on the patient" and is "applied by trained medical professionals," suggesting it's an assistive tool within a human workflow, but its core functionalities are tested in a standalone manner.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)
Given the non-clinical nature of the testing, the "ground truth" would have been established by the software requirements and specifications, test case design, and expected outputs defined by the developers. This is typical for software verification and validation, where the goal is to confirm the software performs as designed.
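For clarity on what this kind of "ground truth" looks like in practice, the sketch below shows a hypothetical verification test in which the expected output comes from a requirement specification rather than from clinical data. The function, values, and test name are illustrative assumptions, not details from the submission.

```python
import unittest

def total_delivered_dose(fraction_doses_cgy):
    """Hypothetical dose-history function: sum of delivered fraction doses in cGy."""
    return sum(fraction_doses_cgy)

class DoseHistoryVnVTest(unittest.TestCase):
    def test_total_dose_matches_specified_expected_output(self):
        # The "ground truth" here is the expected output defined in the requirement spec.
        delivered = [200.0, 200.0, 200.0, 180.0]
        self.assertAlmostEqual(total_delivered_dose(delivered), 780.0, places=3)

if __name__ == "__main__":
    unittest.main()
```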
8. The sample size for the training set
The submission does not mention any training set as there is no indication of machine learning or AI models with external data training involved that would require such information. The changes appear to be feature enhancements to an existing software system.
9. How the ground truth for the training set was established
As no training set is mentioned, this information is not applicable.
In summary, the provided document focuses on the regulatory aspects of a software update (v18.1) to an existing device (v18.0) and highlights software verification and validation as the primary evidence of performance. It explicitly states that no clinical or animal studies were included. Therefore, the detailed performance metrics, test set characteristics, and expert involvement typically associated with clinical efficacy studies are not present.
Ask a specific question about this device
(124 days)
MUJ
The Eclipse Treatment Planning System (Eclipse TPS) is used to plan radiotherapy treatments for patients with malignant or benign diseases. Eclipse TPS is used to plan external beam irradiation with photon, electron and proton beams, as well as for internal radiation (brachytherapy) treatments.
Eclipse provides software tools for planning the treatment of malignant or benign diseases with radiation. Eclipse is a computer-based software device used by trained medical professionals to design and simulate radiation therapy treatments. It is capable of planning treatments for external beam irradiation with photon, electron, and proton beams, as well as for internal irradiation (brachytherapy) treatments.
Eclipse is used for planning external beam radiation therapy treatments employing photon energies between 1 and 50 MV, electron energies between 1 and 50 MeV, and proton energies between 50 and 300 MeV, and for planning internal radiation (brachytherapy) treatments with any clinically approved radioisotope. The treatment planning system utilizes a patient model, or virtual patient, derived from medical imaging techniques to simulate, calculate, and optimize the radiation dose distribution inside the body during a treatment procedure, in order to ensure effective treatment of the tumor while minimizing damage to surrounding tissue.
The provided document is a 510(k) premarket notification letter from the FDA regarding the Eclipse Treatment Planning System (18.1). It primarily addresses the substantial equivalence of the new version to a legally marketed predicate device (Eclipse Treatment Planning System 18.0).
Unfortunately, this document does not contain the detailed information required to describe acceptance criteria and a study proving the device meets those criteria, as requested in your prompt.
Here's why and what information is missing:
- No specific acceptance criteria table or performance metrics: The document states that "Test results demonstrate conformance to applicable requirements and specifications," but it does not list what those requirements or specifications are (the acceptance criteria), nor does it provide a table of reported device performance against those criteria.
- No information on clinical studies or human-in-the-loop performance: The document explicitly states, "No animal studies or clinical tests have been included in this pre-market submission." This immediately tells us that there was no MRMC study, no standalone performance study in a clinical context, and no ground truth established from patient outcomes or expert consensus for such a study.
- Focus on software V&V and equivalence to predicate: The "Summary of Performance Testing (Non-Clinical Testing)" section primarily references software verification and validation (V&V) activities (unit, integration, system testing) and measurement comparison tests using Gamma evaluation criteria and plan comparisons using clinical objectives and workflow testing to show comparability to the predicate. These are engineering and software quality assurance tests, not clinical performance studies with defined acceptance metrics for AI/algorithm performance.
- No mention of AI/algorithm specific performance: The document describes "RapidArc Dynamic" as an "improved optimization algorithm," but it does not treat it as a distinct AI algorithm requiring specific clinical performance validation against a ground truth as one might expect for a diagnostic or prognostic AI tool. The testing mentioned appears to be related to the accuracy and efficiency of the planning output compared to the predicate, rather than the performance of an AI model in a diagnostic or assistive capacity.
- No details on sample size, data provenance, expert ground truth, or adjudication: Because no clinical performance study was conducted or reported, all these details are consequently missing.
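For readers unfamiliar with Gamma evaluation: it combines a dose-difference criterion with a distance-to-agreement (DTA) criterion, and a point passes when gamma ≤ 1 (e.g., 3%/3 mm). The sketch below is a minimal, globally normalized 1D illustration only; it is not the vendor's implementation, and the profiles, names, and criteria are hypothetical.

```python
import numpy as np

def gamma_1d(ref_pos_mm, ref_dose, eval_pos_mm, eval_dose,
             dose_tol=0.03, dta_mm=3.0):
    """1D global gamma index: for each reference point, the minimum combined
    dose-difference / distance-to-agreement metric over all evaluated points.
    A point passes the criterion (e.g., 3%/3 mm) when gamma <= 1."""
    ref_pos = np.asarray(ref_pos_mm, dtype=float)
    ref_dose = np.asarray(ref_dose, dtype=float)
    eval_pos = np.asarray(eval_pos_mm, dtype=float)
    eval_dose = np.asarray(eval_dose, dtype=float)
    d_norm = dose_tol * ref_dose.max()          # global dose criterion
    gammas = np.empty_like(ref_dose)
    for i, (rp, rd) in enumerate(zip(ref_pos, ref_dose)):
        dist2 = ((eval_pos - rp) / dta_mm) ** 2
        dose2 = ((eval_dose - rd) / d_norm) ** 2
        gammas[i] = np.sqrt(np.min(dist2 + dose2))
    return gammas

# Illustrative use: fraction of points passing a 3%/3 mm criterion
x = np.linspace(-50, 50, 101)
ref = np.exp(-(x / 30.0) ** 2)                  # hypothetical reference profile
evl = ref * 1.01                                # hypothetical evaluated profile (1% hot)
pass_rate = np.mean(gamma_1d(x, ref, x, evl) <= 1.0)
```

Clinical gamma analysis is normally performed in 2D or 3D with interpolation of the evaluated distribution; this simplified version searches only over the sampled points.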
In summary, the provided document focuses on regulatory compliance, substantial equivalence to a predicate device, and general software V&V, rather than providing the detailed clinical performance study information you are asking for, which is typical for AI/ML-driven diagnostic or prognostic devices seeking regulatory clearance.
The "Eclipse Treatment Planning System" is a software tool used by trained medical professionals to design and simulate radiation therapy treatments. While it includes "optimization algorithms" (like RapidArc Dynamic), the FDA submission treats these changes as enhancements to an existing system, validated through engineering and software testing for comparability, rather than a novel AI/ML device requiring an independent clinical performance study as outlined in your prompt questions.
Ask a specific question about this device
(268 days)
MUJ
The XBeam Software can be used for validating the monitor units or radiation dose to a point that has been calculated by hand or another treatment planning system for external beam radiation therapy. In addition, the XBeam Software can also be used as a primary means of calculating the monitor units or radiation dose to a point for external beam radiation treatments.
XBeam is only intended to be used with Xstrahl's superficial and orthovoltage radiotherapy and surface electronic brachytherapy systems. XBeam is intended to be used by authorized personnel trained in medical physics.
XBeam is standalone dose calculation software for Xstrahl's medical devices, including:
- Xstrahl 100, Xstrahl 150, Xstrahl 200, Xstrahl 300 (K962613)
- X80 RADiant Photoelectric Therapy System (K172080)
- RADiant Aura (X80 RADiant Photoelectric Therapy System) (K230611)
XBeam's dose calculation algorithm can be used to determine the beam-on time or monitor units based on the applicator and filter selected for the specific device. The beam-on time / monitor units are calculated from the percent depth dose (PDD) curve and the absolute dose output for the specified applicator-filter combination. The software allows treatment parameters to be calculated for single or two (parallel opposed) beams.
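In general form, this kind of point-dose calculation divides the prescribed dose by the absolute output for the selected applicator-filter combination, corrected by the PDD at the prescription depth. The sketch below is a generic illustration of that arithmetic, not XBeam's actual algorithm; the output value, PDD, and prescription are hypothetical.

```python
def beam_on_time_min(prescribed_dose_cgy: float,
                     dose_rate_cgy_per_min: float,
                     pdd_percent: float) -> float:
    """Generic point-dose hand calculation for a single kV beam.

    prescribed_dose_cgy   : dose prescribed at depth (cGy)
    dose_rate_cgy_per_min : absolute output for the applicator/filter combination,
                            measured at the reference point (cGy/min)
    pdd_percent           : percent depth dose at the prescription depth (%)
    """
    dose_at_reference = prescribed_dose_cgy / (pdd_percent / 100.0)
    return dose_at_reference / dose_rate_cgy_per_min

# Illustrative numbers only: 400 cGy prescribed at a depth where PDD = 85%,
# with an output of 250 cGy/min for the chosen applicator and filter.
t = beam_on_time_min(400.0, 250.0, 85.0)   # ~1.88 minutes of beam-on time
```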
XBeam is intended to be used within a clinical environment where the patient is treated with Xstrahl's medical systems. XBeam is intended to be used by authorized personnel trained in medical physics. It is not intended to be used by patients or general public.
Here's an analysis of the provided text regarding the acceptance criteria and the study that proves the device meets those criteria:
The provided FDA 510(k) summary for the XBeam (v2) device focuses on demonstrating substantial equivalence to its predicate device, RADCalc (K193381), primarily through a comparison of intended use, technical characteristics, and a summary of non-clinical testing. While it mentions "acceptance criteria" through verification and validation activities, it does not explicitly define specific numerical acceptance criteria (e.g., "accuracy must be > 95%") for its performance when compared against ground truth.
Instead, the summary reports the results of the performance testing and concludes that they are acceptable, implying that these results meet implicit acceptance criteria for clinical equivalence and safety/effectiveness.
Given this, I will infer the implicit acceptance criterion based on the reported results.
1. Table of Acceptance Criteria and Reported Device Performance
Acceptance Criteria (Inferred) | Reported Device Performance |
---|---|
Dosimetric Accuracy (against hand calculation/RADCalc): Maximum difference in calculated dose/monitor units must be clinically acceptable. | Maximum difference found was 0.7%, attributed to interpolation/rounding errors. The output calculated by XBeam was "the same" as that calculated by hand calculation and by RADCalc. |
Dosimetric Accuracy (against delivered dose for energies 80kV): Measured and planned dose values must agree within clinically acceptable limits, considering measurement uncertainties. | Measured and planned dose values agree to within 1.8%. Measurement uncertainties estimated at 1.7%. |
Conformance to Standards: Device must meet requirements of specified medical device standards. | Conforms to IEC 62366-1, IEC 62304, and ISO 14971. |
Usability, Risk Mitigation, and Functionality: Device functionality works as per intended use, risks are mitigated, and is substantially equivalent. | Verification activities included system tests, module tests, anomaly verification, code reviews, and run-through integration tests (323 tests executed, all passed). Validation activities included clinical workflow, treatment planning, and software usability. |
2. Sample Size Used for the Test Set and Data Provenance
The document states: "Three hundred twenty-three (323) independent verification tests were executed." This refers to verification activities (system tests, module tests, etc.) rather than a specific test set of patient cases or dosimetric scenarios for performance evaluation against ground truth.
For the dosimetric accuracy validation:
- Sample size: Not explicitly stated as a number of distinct cases or patient datasets. It refers to comparing XBeam's output against two standard methods (hand calculations and RadCalc) and then comparing planned dose (presumably from XBeam) to delivered dose using physical measurements. The number of such comparisons or the range of parameters tested is not quantified.
- Data provenance: Not specified in terms of country of origin. The study appears to be a prospective validation of the software's dose calculation against established methods and physical measurements, rather than clinical retrospective or prospective patient data analysis.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- Number of Experts: Not explicitly stated.
- Qualifications of Experts: The ground truth for dose calculation was established by "hand calculations" and the output of the predicate device "RadCalc (version: 7.3)." This implies that the 'experts' or processes involved in performing these hand calculations or configuring/using RadCalc would be "authorized personnel trained in medical physics" as stipulated in the device's indications for use.
4. Adjudication Method
Not applicable/specified. The validation involves direct comparison of numerical outputs (dose, monitor units) against established calculational methods and physical measurements, rather than assessment by multiple human reviewers requiring adjudication for a "ground truth" establishment in a subjective medical imaging context.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was not conducted or reported in this summary. The device is a dose calculation software, not an AI-powered diagnostic image analysis tool that would typically involve human readers interpreting results with and without AI assistance.
6. Standalone Performance Study
Yes, a standalone performance study was done. The summary describes the validation of the XBeam algorithm's output (dose calculations) by comparing it against two independent methods:
- Hand calculations.
- The output of the predicate device, RADCalc.
It also compared XBeam's planned dose to the physically delivered dose using measurement. This demonstrates the algorithm-only performance.
7. Type of Ground Truth Used
The ground truth used for the dosimetric accuracy validation was a combination of:
- Expert Consensus/Established Methods: "Hand calculations" (representing established physics principles and manual computation).
- A Legally Marketed Predicate Device's Output: "RadCalc (version: 7.3)".
- Physical Measurements/Outcomes Data (Indirectly): Comparison of "planned dose" (from XBeam) to "delivered dose" (presumably measured with dosimetry equipment in a controlled setting).
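As a worked illustration of the planned-versus-delivered comparison in the last bullet, the percent agreement is simply the relative difference between the measured and the planned point dose, judged against a tolerance that should account for the stated measurement uncertainty. The numbers and tolerance below are hypothetical, not taken from the submission.

```python
def percent_difference(planned_cgy: float, measured_cgy: float) -> float:
    """Relative difference of measured dose with respect to the planned value, in %."""
    return 100.0 * (measured_cgy - planned_cgy) / planned_cgy

# Hypothetical example: a 1.5% difference checked against a 2% tolerance
planned, measured = 200.0, 203.0
diff = percent_difference(planned, measured)   # 1.5
within_tolerance = abs(diff) <= 2.0            # True
```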
8. Sample Size for the Training Set
The document does not explicitly mention a "training set" or "training data." The XBeam software appears to be a dose calculation algorithm based on physics models, rather than a machine learning model that requires a distinct training phase with labeled data. Therefore, the concept of a training set as typically understood in AI/ML is not directly applicable to this description.
9. How the Ground Truth for the Training Set Was Established
As noted above, the concept of a training set is not explicitly referred to for XBeam. The data that would inform the development and calibration of such a physics-based dose calculation system would typically come from extensive commissioning data (e.g., PDD curves, absolute dose output, beam profiles) measured for each specific Xstrahl radiotherapy system it supports, established via standard medical physics protocols. These measurements would be considered the "ground truth" for calibrating the physics model within the software. However, the document does not detail this specific process for XBeam's development.
Ask a specific question about this device
(117 days)
MUJ
Vitesse is indicated for use as a treatment planning software application used by medical professionals to plan, guide, optimize and document high-dose-rate brachytherapy procedures.
The Varian VITESSE 5.0 is a stand-alone application which provides treatment planning features for high-dose-rate brachytherapy. Vitesse supports real-time ultrasound-guided implant procedures as well as image-import workflows based on DICOM RT. Vitesse supports DICOM RT export of the plan to another external dose planning system (e.g., Brachy Vision) or directly to the brachytherapy afterloader treatment unit.
Here's a breakdown of the acceptance criteria and the study information based on the provided text. Unfortunately, much of the requested information (such as specific performance metrics, sample sizes for test sets, expert details, or ground truth establishment) is not detailed in this 510(k) summary. This document primarily focuses on demonstrating substantial equivalence through software verification and validation, rather than a detailed performance study with quantifiable metrics against a ground truth.
1. Table of Acceptance Criteria and Reported Device Performance
Feature/Modification | Method of Evaluation | Acceptance Criteria (Explicitly Stated is Lacking) | Reported Device Performance (as stated) |
---|---|---|---|
Export plan to DICOM destination (e.g., Aria, Velocity) over network (DICOM C-STORE) | Software verification and validation covering the performance and use of this feature | Implied: Successful and accurate transfer of treatment plans via DICOM C-STORE. | Software verification and validation demonstrate the feature performs as intended. |
Populate key elements of patient record from Aria via DICOM query (DICOM C-FIND) | Software verification and validation covering the performance and use of this feature | Implied: Successful and accurate retrieval and population of patient record elements via DICOM C-FIND. | Software verification and validation demonstrate the feature performs as intended. |
Require user to enter valid Windows credentials and permissions prior to accessing Vitesse | Software verification and validation covering the performance and use of this feature | Implied: Successful enforcement of Windows security credentials and permissions. | Software verification and validation demonstrate the feature performs as intended. |
Provide more direct visualization of region definitions on the main UI | Software verification and validation covering the performance and use of this feature | Implied: Region definitions are displayed clearly and accurately on the UI. | Software verification and validation demonstrate the feature performs as intended. |
Provide support for PET images as secondary image volumes | Software verification and validation covering the performance and use of this feature | Implied: Successful import and display of PET DICOM images as secondary volumes without degradation. | Software verification and validation demonstrate the feature performs as intended. |
Provides support for import of PET DICOM images as secondary image volumes | Software verification and validation covering the performance and use of this feature | Implied: Successful import and display of PET DICOM images as secondary volumes without degradation. (Redundant with previous entry) | Software verification and validation demonstrate the feature performs as intended. |
Provides support of capturing color images (e.g., from color Doppler) and displaying them as color images | Software verification and validation covering the performance and use of this feature | Implied: Successful capture and accurate display of color images from various imaging modes. | Software verification and validation demonstrate the feature performs as intended. |
Overall device requirements and risk control measures | Software verification and validation testing | Implied: Performance as intended at a level similar to the predicate device. | "Software verification and validation testing for Vitesse 5.0 demonstrates that the device requirements and risk control measures perform as intended at a level similar to the predicate." |
Usability | Usability testing (according to IEC 62366) | Implied: Device performs well as intended for intended users, uses, and use environments. | "Usability testing was conducted according to the standard IEC 62366 to verify that the subject device performs well as intended for the intended users, uses, and use environments." |
Cybersecurity | Assessment per FDA guidances ("Cybersecurity in Medical Devices" series) | Implied: Prevention of unauthorized access, modification, misuse, denial of use, or unauthorized use of information. | "Varian Medical Systems conforms to cybersecurity requirements by implementing a means to prevent unauthorized access, modification, misuse, denial of use or unauthorized use of information stored, accessed or transferred from a medical device to an external recipient." |
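The DICOM C-STORE export and C-FIND query features listed in the table above follow the standard DICOM networking services. Purely as an illustration of what a C-STORE push of an RT Plan to a destination such as ARIA can look like, here is a minimal sketch using the open-source pynetdicom library; the host, port, AE titles, and file path are hypothetical, and this is not Varian's implementation.

```python
from pydicom import dcmread
from pynetdicom import AE
from pynetdicom.sop_class import RTPlanStorage

# Hypothetical local and remote application entities
ae = AE(ae_title="VITESSE_SCU")
ae.add_requested_context(RTPlanStorage)

ds = dcmread("exported_rtplan.dcm")              # plan to be exported

assoc = ae.associate("aria.example.org", 104, ae_title="ARIA_SCP")
if assoc.is_established:
    status = assoc.send_c_store(ds)              # DICOM C-STORE request
    if status:
        print(f"C-STORE completed with status 0x{status.Status:04X}")
    assoc.release()
else:
    print("Association with the DICOM destination failed")
```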
2. Sample size used for the test set and the data provenance
The document does not specify a "test set" in terms of patient data or a dataset of medical images with ground truth for performance evaluation of specific features. The testing described is primarily software verification and validation testing and usability testing.
- Software Verification and Validation: This typically involves testing against software requirements and design specifications, which may use simulated data or previously known cases, but specific sample sizes of patient data are not mentioned.
- Usability Testing: The text states it was conducted, but provides no details on the number of participants or the nature of the test cases.
- Data Provenance: Not applicable in the context of the described software verification and validation, as it doesn't involve clinical data from patients.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
This information is not provided in the document. The study described is not a performance study against an established medical ground truth, but rather engineering verification and validation of software functionalities.
4. Adjudication method for the test set
This information is not provided as there is no described test set of medical cases requiring adjudication.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of human reader improvement with AI vs. without AI assistance
No, an MRMC comparative effectiveness study was not done. The device is a treatment planning software, not a diagnostic AI system for human readers. No AI assistance or human reader improvement effect size is mentioned.
6. If a standalone (i.e., algorithm-only, without human-in-the-loop) performance evaluation was done
This document describes the validation of a standalone software application (Vitesse 5.0) which provides treatment planning features. Its performance is evaluated through software verification and validation testing and usability testing, as opposed to a clinical performance study measuring accuracy against pathology. The software itself is standalone, but it is a "treatment planning software application used by medical professionals," meaning it is intended for human-in-the-loop use. The "standalone" here refers to it being a distinct software product rather than an integrated component of a larger system, not a fully automated AI system without human involvement.
7. The type of ground truth used
The concept of "ground truth" as typically understood in performance studies (e.g., pathology, clinical outcomes) is not applicable here. The primary "truth" against which the software was tested was its software specifications and requirements. The testing ensures the software functions as designed and intended, meets usability criteria, and adheres to cybersecurity standards, rather than assessing its diagnostic or prognostic accuracy against a biological or clinical gold standard.
8. The sample size for the training set
This information is not applicable and therefore not provided. This device is a treatment planning software, not a machine learning or AI model that requires a training set of data.
9. How the ground truth for the training set was established
This information is not applicable and therefore not provided as there is no training set mentioned or implied for this device.
Ask a specific question about this device