K Number
K242748
Device Name
Oncospace
Manufacturer
Date Cleared
2025-04-11 (211 days)

Product Code
Regulation Number
892.5050
Panel
RA
Reference & Predicate Devices
Intended Use

Oncospace is used to configure and review radiotherapy treatment plans for a patient with malignant or benign disease in the head and neck, thoracic, abdominal, and pelvic regions. It allows for setup of radiotherapy treatment protocols, association of a potential treatment plan with the protocol(s), submission of a dose prescription and achievable dosimetric goals to a treatment planning system, and review of the treatment plan. It is intended for use by qualified, trained radiation therapy professionals (such as medical physicists, oncologists, and dosimetrists). This device is for prescription use by order of a physician.

Device Description

The Oncospace software supports radiation oncologists and medical dosimetrists during radiotherapy treatment planning. The software includes locked machine learning algorithms. During treatment planning, the Oncospace software works in conjunction with, and does not replace, a treatment planning system (TPS).

The Oncospace software is intended to augment the treatment planning process by:

  • allowing the radiation oncologist to select and customize a treatment planning protocol (sketched below) that includes a dose prescription (number of fractions, dose per fraction, dose normalization), a delivery method (beam type and geometry), and protocol-based dosimetric goals/objectives for treatment targets and organs at risk (OAR);
  • predicting dosimetric goals/objectives for OARs based on patient-specific anatomical geometry;
  • automating the initiation of plan optimization on a TPS by supplying the dose prescription, delivery method, protocol-based target objectives, and predicted OAR objectives;
  • providing a user interface for plan evaluation against protocol-based and predicted goals.
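The clearance letter does not describe Oncospace's internal data model, so as a loose illustration of the protocol elements listed above, here is a minimal sketch in Python; all type and field names (`TreatmentProtocol`, `DosimetricGoal`, etc.) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class DosimetricGoal:
    """One protocol-based or predicted goal, e.g. 'SpinalCord D0.03cc < 45 Gy'."""
    structure: str    # TG-263 standardized structure name
    metric: str       # DVH metric, e.g. "D0.03cc" or "Dmean"
    limit_gy: float   # dose limit in Gy

@dataclass
class TreatmentProtocol:
    """Hypothetical container for the protocol elements listed above."""
    name: str
    fractions: int                   # number of fractions
    dose_per_fraction_gy: float      # dose per fraction in Gy
    delivery_method: str             # beam type and geometry, e.g. "VMAT, 2 arcs"
    target_goals: list = field(default_factory=list)  # protocol-based target objectives
    oar_goals: list = field(default_factory=list)     # protocol-based or predicted OAR goals

    @property
    def prescription_dose_gy(self) -> float:
        return self.fractions * self.dose_per_fraction_gy

# Example: a 35 x 2 Gy head-and-neck protocol with one OAR goal.
protocol = TreatmentProtocol(
    name="HN_70Gy_35fx",
    fractions=35,
    dose_per_fraction_gy=2.0,
    delivery_method="VMAT, 2 arcs",
    oar_goals=[DosimetricGoal("SpinalCord", "D0.03cc", 45.0)],
)
print(protocol.prescription_dose_gy)  # 70.0
```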

Diagnosis and treatment decisions occur prior to treatment planning and do not involve Oncospace. Decisions involving Oncospace are restricted to setting dosimetric goals for use during plan optimization and plan evaluation. Human judgement continues to be applied in accepting these goals and updating them as necessary during the iterative beam optimization process. Human judgement is also still applied, as in standard practice, during plan quality assessment: the protocol-based OAR goals are used as the primary means of plan assessment, with the predicted goals providing additional information as to whether the dose to an OAR could be lowered further.

When Oncospace is used in conjunction with a TPS, the user retains full control of the TPS, including finalization of the treatment plan created for the patient. Oncospace also does not interface with the treatment machines. The risk to patient safety is lower than that of a TPS, since Oncospace only informs the treatment plan, does not allow region-of-interest editing, does not make treatment decisions, and does not interface directly with the treatment machine or any record-and-verify system.

Oncospace's OAR dose prediction approach, and the use of predictions in the end-to-end treatment planning workflow, have been tested for use with a variety of cancer treatment plans. These included a wide range of target and OAR geometries, prescriptions, and boost strategies (sequential and simultaneous delivery). Validity has thus been demonstrated for the range of prediction model input features encountered in the test cases. This range is representative of the diversity of the same feature types (describing target-OAR proximity, target and OAR shapes, sizes, etc.) encountered across all cancer sites. Given that the same feature types will be used in OAR dose prediction models trained for all sites, the modeling approach validated here is not cancer-site specific; rather, it is designed to predict OAR DVHs based on impactful features common to all sites. The software is designed to be used in the context of all forms of intensity-modulated photon beam radiotherapy. The planning objectives themselves are intended to be TPS-independent: they depend instead on the degree of organ sparing possible given the beam modality and the range of delivery techniques represented in the plan database. To facilitate streamlined transmission of DICOM files and plan parameters, Oncospace includes scripts written in the treatment planning system's scripting language (for example, Pinnacle).
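The handoff format between Oncospace and the TPS scripts is not specified in the letter. Purely as an illustration, the dose prescription and objectives might be serialized into a neutral payload that a TPS-side script (e.g., one written in Pinnacle's scripting language) then applies; the payload structure below is entirely hypothetical:

```python
import json

# Hypothetical payload: dose prescription, delivery method, and optimization
# objectives assembled for consumption by a TPS-side script.
plan_parameters = {
    "prescription": {"fractions": 35, "dose_per_fraction_gy": 2.0},
    "delivery_method": {"beam_type": "photon", "technique": "VMAT", "arcs": 2},
    "target_objectives": [
        {"structure": "PTV_70", "metric": "V100%", "goal": ">= 95%"},
    ],
    "oar_objectives": [
        {"structure": "Parotid_L", "metric": "Dmean", "goal_gy": 24.1},    # predicted
        {"structure": "SpinalCord", "metric": "D0.03cc", "goal_gy": 45.0}, # protocol-based
    ],
}

# Write the payload where the TPS scripting environment can pick it up.
with open("plan_parameters.json", "w") as f:
    json.dump(plan_parameters, f, indent=2)
```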

The Oncospace software includes an algorithm for transforming the non-standardized OAR names used by treatment planners into the standardized names defined by AAPM Task Group 263 (TG-263). This matching process primarily uses a table of synonyms that is updated as matches are made during use of the product, supplemented by a Natural Language Processing (NLP) model that attempts to match names not already in the synonym table. The NLP model selects the most likely match, which may be a correct match to a standard OAR name, an incorrect match, or no match (when the model considers this the most likely outcome, such as for names resembling a target). The user can also manually match names using a drop-down menu of all TG-263 OAR names. The user is instructed to check each automated match and make corrections using the drop-down menu as needed.
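A minimal sketch of the described matching cascade, assuming a toy synonym table and using difflib purely as a stand-in for the proprietary NLP model:

```python
from difflib import get_close_matches
from typing import Optional

# Small subset of TG-263 standardized OAR names (illustrative only).
TG263_NAMES = ["SpinalCord", "Parotid_L", "Parotid_R", "Brainstem", "Larynx"]

# Synonym table, updated as matches are confirmed during use of the product.
SYNONYMS = {"cord": "SpinalCord", "lt parotid": "Parotid_L", "bstem": "Brainstem"}

def match_structure_name(raw_name: str) -> Optional[str]:
    """Return a TG-263 name for a planner-assigned name, or None if unmatched."""
    key = raw_name.strip().lower()
    # 1. Exact lookup in the synonym table.
    if key in SYNONYMS:
        return SYNONYMS[key]
    # 2. Fallback "model": difflib stands in for the real NLP classifier,
    #    which may also return no match (e.g., for target-like names).
    candidates = get_close_matches(raw_name, TG263_NAMES, n=1, cutoff=0.6)
    return candidates[0] if candidates else None

# The user reviews each automated match and can override it manually.
print(match_structure_name("Lt Parotid"))  # Parotid_L (synonym table hit)
print(match_structure_name("brain stem"))  # Brainstem (fallback match)
print(match_structure_name("PTV_70"))      # None -- left for manual matching
```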

AI/ML Overview

Based on the provided 510(k) Clearance Letter, here's a detailed description of the acceptance criteria and the study proving the device meets them:

1. Table of Acceptance Criteria and Reported Device Performance

The document describes two main types of performance testing: clinical performance testing and model performance testing. The acceptance criteria are implicitly defined by the reported performance achieving non-inferiority or being within acceptable error margins.

| Acceptance Criteria Category | Specific Metric/Target | Reported Device Performance |
|---|---|---|
| Clinical Performance (Primary Outcome) | OAR dose sparing non-inferiority margin: Thoracic: 2.2 Gy; Abdominal: 1 Gy; Pelvis (gynecological): 1.9 Gy | Achieved non-inferiority: mean OAR dose was statistically significantly lower for 5 OARs (abdominal) and 4 OARs (pelvis, gynecological); no statistically significant differences in mean dose for the remaining 11 OARs (thoracic), 3 OARs (abdominal), and 2 OARs (pelvis, gynecological); non-inferiority demonstrated to 2.2 Gy (thoracic), 1 Gy (abdominal), and 1.9 Gy (pelvis, gynecological) |
| Clinical Performance (Secondary Outcome) | Target coverage maintenance: no statistically significant difference in target coverage compared to clinical plans without Oncospace | Achieved: no statistically significant difference in target coverage between clinical plans and plans created with use of the Oncospace system |
| Clinical Performance (Effort Reduction) | No increase in optimization cycles when using Oncospace vs. the traditional workflow (implicit acceptance criterion) | Achieved: of all plans tested, none required more optimization cycles using Oncospace than using the traditional radiation treatment planning clinical workflow |
| Model Performance (H&N External Validation) | Mean absolute error (MAE) in OAR DVH dose values within 5% of prescription dose for all OARs, at both Institution 2 and Institution 3 | Achieved with some exceptions: Institution 2: MAE within 5% for 9/12 OARs, not exceeding 9% for any OAR; Institution 3: MAE within 5% for 10/12 OARs, not exceeding 8% for any OAR |
| Model Performance (Prostate External Validation) | MAE in OAR DVH dose values within 5% of prescription dose for all OARs | Achieved with some exceptions: Institution 3: MAE within 5% for 4/6 OARs; 5.1% for one OAR; 15.9% for one OAR |
| NLP Model Performance (Cross-Validation) | Validation macro-averaged F1 score above 0.92 and accuracy above 96% for classifying previously unseen terms | Achieved: all models achieved a validation macro-averaged F1 score above 0.92 and accuracy above 96% |
| NLP Model Performance (External Validation) | Correctly match a high percentage of unique and total structure names (implicit acceptance criterion) | Achieved: correctly matched 207/221 (94.1%) of all structure names and 131/145 (91.0%) of unique structure names |
| General Verification Tests | All system requirements and acceptance criteria met (clinical, standard UI, cybersecurity) | Achieved: met all system requirements and acceptance criteria |
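The letter does not name the statistical test used for the primary outcome. As an illustration of the non-inferiority logic only, a paired one-sided t-test against a site-specific margin (synthetic data; scipy is an assumed choice, not the study's stated method) might look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic paired mean OAR doses (Gy) for the same 17 patients:
# traditionally planned vs. Oncospace-assisted plans.
dose_traditional = rng.normal(20.0, 3.0, size=17)
dose_oncospace = dose_traditional - rng.normal(0.5, 0.8, size=17)

margin_gy = 1.0  # e.g., the stated abdominal non-inferiority margin

# Non-inferiority: the mean paired difference (Oncospace - traditional)
# must be statistically below the +margin boundary.
diff = dose_oncospace - dose_traditional
result = stats.ttest_1samp(diff, popmean=margin_gy, alternative="less")
print(f"mean difference = {diff.mean():.2f} Gy, one-sided p = {result.pvalue:.4f}")
# p < 0.05 supports non-inferiority at the 1 Gy margin.
```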

2. Sample Sizes and Data Provenance

The document provides detailed sample sizes for the training/tuning datasets and for the external performance and clinical validation datasets.

  • Test Set Sample Sizes:

    • Clinical Validation Dataset:
      • Head and Neck: 18 patients (previously validated)
      • Thoracic: 20 patients (14 lung, 6 esophagus)
      • Abdominal: 17 patients (11 pancreas, 6 liver)
      • Pelvis: 17 patients (12 prostate, 5 gynecological) (prostate previously validated)
    • External Performance Test Dataset(s) (Model Performance):
      • Head and Neck: Dataset A: 265 patients (Institution_2); Dataset B: 27 patients (Institution_3)
      • Prostate: 40 patients (Institution_3)
    • NLP Model External Testing Dataset: 221 structures with 145 unique original names.
  • Data Provenance (Country of Origin, Retrospective/Prospective):

    • Training/Tuning/Internal Testing Datasets: Acquired from Johns Hopkins University (JHU) between 2008 and 2019. JHU is located in the United States. This data is retrospective.
    • External Performance Test Datasets: Acquired from Institution_2 and Institution_3. Locations of these institutions are not specified but are implied to be distinct from JHU. This data is retrospective.
    • Clinical Validation Datasets: Acquired from Johns Hopkins University (JHU) between 2021 and 2024 (for Thoracic, Abdominal, Pelvis) and between 2021 and 2022 (for H&N, previously validated). This data is retrospective.
    • NLP Model Training/External Validation: Trained and validated using "known name matches in the prostate, gynecological, head and neck, thoracic, and pancreas cancer datasets licensed to Oncospace by Johns Hopkins University." This indicates retrospective data from the United States.

3. Number of Experts and Qualifications for Ground Truth

The document does not explicitly state the number of experts or their specific qualifications (e.g., years of experience, types of radiologists) used to establish the ground truth for the test sets.

Instead, it refers to "heterogeneous sets of traditionally-planned clinical treatment plans" and "curated, gold-standard treatment plans" (the latter in the predicate device comparison table, implying similar provenance for the subject device). This suggests that the ground truth for the OAR dose values reflects actual clinical outcomes from existing treatment plans.

For the NLP model, the ground truth was "known name matches" in the acquired datasets, implying consensus or established naming conventions from the institutions, rather than real-time expert adjudication for the study.

4. Adjudication Method for the Test Set

The document does not describe an explicit adjudication method (e.g., 2+1, 3+1 reader adjudication) for establishing the ground truth dose values or treatment plan quality for the test sets. The "ground truth" seems to be defined by:

  • Clinical Performance Testing: "heterogeneous sets of traditionally-planned clinical treatment plans," implying that the actual clinical plans serve as the comparative ground truth.
  • Model Performance Testing: "comparison of predicted dose values to ground truth values." These ground truth values appear to be the actual recorded DVH dose values from the clinical plans in the external test datasets.
  • NLP Model: "known name matches" from the licensed datasets, suggesting pre-defined or institutional standards.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

No. An MRMC comparative effectiveness study measuring human reader performance with versus without AI assistance was not described in this document.

The study focused on:

  • Comparing plans generated with Oncospace to traditionally-planned clinical treatment plans (effectively comparing AI-assisted plan generation to human-only plan generation, but without a specific MRMC design to measure human reader improvement).
  • Assessing the ability of Oncospace to maintain or improve OAR sparing and target coverage.
  • Evaluating the accuracy of the model's dose predictions and the NLP module.

The study design described is a non-inferiority comparison for clinical performance plus model accuracy assessments, not an MRMC study quantifying change in human reader performance.

6. Standalone (Algorithm Only) Performance

Yes, standalone performance testing was conducted for:

  • Model Performance Testing: This involved comparing the model's predicted OAR DVH dose values directly against "ground truth values" (actual recorded dose values from clinical plans) in the external test datasets. This is an algorithm-only (standalone) assessment of the dose prediction accuracy, independent of the overall human-in-the-loop clinical workflow.
  • NLP Model Performance: The NLP model's accuracy in mapping non-standardized OAR names to TG-263 names was evaluated in a standalone manner using cross-validation and an external test dataset.

The "clinical performance testing," while ultimately comparing plans with Oncospace assistance to traditional plans, is also evaluating the algorithm's influence on the final plan quality. However, the explicit "model performance testing" sections clearly describe standalone algorithm evaluation.

7. Type of Ground Truth Used

The ground truth used in this study primarily relied on:

  • Existing Clinical Treatment Plans/Outcomes Data: For clinical performance testing, "heterogeneous sets of traditionally-planned clinical treatment plans" served as the comparative baseline. The OAR doses and target coverage from these real-world clinical plans constituted the "ground truth" for comparison. This can be categorized as outcomes data, in the sense of actual treatment parameters delivered in a clinical setting.
  • Recorded Dosimetric Data: For model performance testing, the "ground truth values" for predicted DVH doses were the actual, recorded DVH dose values from the clinical plans in the external datasets. This data is derived from expert consensus in practice (as these were actual clinical plans deemed acceptable by clinicians) and outcomes data (the resulting dose distributions).
  • Established Reference/Consensus (NLP): For the NLP model, the ground truth was based on "known name matches" or "standardized names defined by AAPM Task Group 263," which represents expert consensus or authoritative standards.

8. Sample Size for the Training Set

The document refers to the "Development (Training/Tuning) and Internal Performance Testing Dataset (randomly split 80/20)" for each anatomical location. Assuming the 80% split is used for training, the approximate training counts are as follows (see the arithmetic sketch after this list):

  • Head and Neck: 80% of 1145 patients ≈ 916 patients
  • Thoracic: 80% of 1623 patients ≈ 1298 patients
  • Abdominal: 80% of 712 patients ≈ 569 patients
  • Pelvis: 80% of 1785 patients ≈ 1428 patients
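A quick arithmetic check of these counts; note that taking the floor of the 80% share reproduces the 569 figure for the abdominal set:

```python
totals = {"Head and Neck": 1145, "Thoracic": 1623, "Abdominal": 712, "Pelvis": 1785}
for site, n in totals.items():
    n_train = int(n * 0.8)  # floor of the 80% share
    print(f"{site}: {n_train} training / {n - n_train} internal test")
# Head and Neck: 916 / 229; Thoracic: 1298 / 325; Abdominal: 569 / 143; Pelvis: 1428 / 357
```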

9. How the Ground Truth for the Training Set Was Established

The ground truth for the training set for the dose prediction models was established from retrospective clinical data acquired at Johns Hopkins University between 2008 and 2019. These were actual treatment plans for patients who received radiation therapy, meaning their dosimetric parameters (such as OAR doses and target coverage from DVHs) and anatomical geometries (from imaging) were used as the input features and target outputs for the machine learning models.

The plans were selected based on certain criteria (e.g., "required to exhibit 90% target coverage" for H&N, "92% target coverage" for Thoracic, etc.), implying these were clinically acceptable plans. This essentially makes the ground truth for training derived from expert consensus in practice (as these were plans approved and delivered by medical professionals at a major institution) and historical outcomes data (the actual treatment parameters achieved).

For the NLP model, the training ground truth was "known name matches" in these same prostate, gynecological, head and neck, thoracic, and pancreas cancer datasets, meaning established mappings between unstandardized and standardized OAR names were used. This again points to expert consensus/standardization.

§ 892.5050 Medical charged-particle radiation therapy system.

(a) Identification. A medical charged-particle radiation therapy system is a device that produces by acceleration high energy charged particles (e.g., electrons and protons) intended for use in radiation therapy. This generic type of device may include signal analysis and display equipment, patient and equipment supports, treatment planning computer programs, component parts, and accessories.

(b) Classification. Class II. When intended for use as a quality control system, the film dosimetry system (film scanning system) included as an accessory to the device described in paragraph (a) of this section, is exempt from the premarket notification procedures in subpart E of part 807 of this chapter subject to the limitations in § 892.9.