Koios Decision Support (DS) is an artificial intelligence (AI)/machine learning (ML)-based computer-aided diagnosis (CADx) software device intended for use as an adjunct to diagnostic ultrasound examinations of lesions or nodules suspicious for breast or thyroid cancer.
Koios DS allows the user to select or confirm regions of interest (ROIs) within an image representing a single lesion or nodule to be analyzed. The software then automatically characterizes the selected image data to generate an AI/ML-derived cancer risk assessment and selects applicable lexicon-based descriptors designed to improve overall diagnostic accuracy as well as reduce interpreting physician variability.
Koios DS software may also be used as an image viewer of multi-modality digital images, including ultrasound and mammography. The software includes tools that allow users to adjust, measure and document images, and output into a structured report.
Koios DS software is designed to assist trained interpreting physicians in analyzing the breast ultrasound images of adult (>= 22 years) female patients with soft tissue breast lesions and/or thyroid ultrasounds of all adult (>= 22 years) patients with thyroid nodules suspicious for cancer. When utilized by an interpreting physician who has completed the prescribed training, this device provides information that may be useful in recommending appropriate clinical management.
Limitations:
· Patient management decisions should not be made solely on the results of the Koios DS analysis.
· Koios DS software is not to be used for the evaluation of normal tissue, on sites of post-surgical excision, or on images with Doppler, elastography, or other overlays present.
· Koios DS software is not intended for use on portable handheld devices (e.g., smartphones or tablets) or as a primary diagnostic viewer of mammography images.
· The software does not predict the presence of the thyroid nodule margin descriptor, extra-thyroidal extension. In the event that this condition is present, the user may select this category manually from the margin descriptor list.
Koios Decision Support (DS) is a software application designed to assist trained interpreting physicians in analyzing breast and thyroid ultrasound images. The software device is a web application that is deployed to a Microsoft IIS web server and accessed by a user through a compatible client. Once logged in and granted access to the Koios DS application, the user examines selected breast or thyroid ultrasound DICOM images. The user selects Regions of Interest (ROIs) of orthogonal views of a breast lesion or thyroid nodule for processing by Koios DS. The ROI(s) are transmitted electronically to the Koios DS server for image processing and the results are returned to the user for review.
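The summary does not document the Koios DS client/server interface. Purely as a minimal Python illustration of the workflow just described (crop a user-selected ROI from an ultrasound DICOM image and transmit it to the server for processing), the sketch below posts a region to a hypothetical analysis endpoint; the endpoint path, payload schema, and response contents are assumptions, not the vendor's actual API.

```python
# Minimal sketch only: the endpoint URL, payload schema, and response fields
# below are hypothetical; the real Koios DS interface is not described here.
import base64

import numpy as np
import pydicom
import requests

def extract_roi(dicom_path: str, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Crop a user-selected region of interest from a single-frame grayscale DICOM image."""
    ds = pydicom.dcmread(dicom_path)
    frame = ds.pixel_array                      # rows x cols pixel data
    return frame[y:y + height, x:x + width]

def submit_roi(server_url: str, roi: np.ndarray, view: str) -> dict:
    """POST one ROI (one orthogonal view of the lesion or nodule) for analysis."""
    payload = {
        "view": view,                           # e.g., "transverse" or "sagittal"
        "shape": list(roi.shape),
        "pixels": base64.b64encode(
            np.ascontiguousarray(roi, dtype=np.uint8).tobytes()
        ).decode(),
    }
    resp = requests.post(f"{server_url}/api/analyze", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()                          # e.g., risk category and descriptors

if __name__ == "__main__":
    roi = extract_roi("lesion_transverse.dcm", x=120, y=80, width=256, height=256)
    print(submit_roi("https://koios-server.example", roi, view="transverse"))
```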
Below is a summary of the acceptance criteria and the studies demonstrating that the device meets them, based on the provided text:
Device Name: Koios DS Version 3.6
1. Acceptance Criteria and Reported Device Performance (standalone testing, by component):
**Breast AI Engine:**

| Metric | Reported Device Performance |
|---|---|
| Malignancy Risk Classifier AUC | 0.945 [0.932, 0.959] (increased from 0.929) |
| Categorical Output Sensitivity | 0.976 [0.960, 0.992] (increased from 0.97) |
| Categorical Output Specificity | 0.632 [0.588, 0.676] (increased from 0.61) |
| Sensitivity to Region of Interest | 0.012 (decreased from 0.019) |
| Sensitivity to Transducer Frequency (high frequency, >= 15 MHz) | AUC = 0.948 [0.917, 0.978] (increased from 0.940) |
| Sensitivity to Transducer Frequency (low frequency) | |

**Thyroid AI Engine (ACR TI-RADS, with AI Adapter):**

| Metric | Reported Device Performance |
|---|---|
| AUC | 79.8% (significant increase over average physician AUC) |
| Sensitivity (biopsy recommendation) | 0.644 [0.545, 0.744] (non-significant improvement over average physician) |
| Specificity (biopsy recommendation) | 0.612 [0.566, 0.658] (significant improvement over average physician) |
| Sensitivity (follow-up recommendation) | 0.879 [0.812, 0.946] (non-significant improvement) |
| Specificity (follow-up recommendation) | 0.495 [0.446, 0.544] (significant improvement) |

**Smart Click (thyroid):**

| Test | Reported Device Performance |
|---|---|
| Non-inferiority test: Sensitivity | Difference = -0.009 [-0.036, 0.018] (non-inferior) |
| Non-inferiority test: Specificity | Difference = -0.018 [-0.041, 0.005] (non-inferior) |
| Non-inferiority test: AUC | Difference = -0.012 [-0.029, 0.006] (non-inferior) |
| Sub-optimal ROI test | Difference = 0.026 [-0.009, 0.062] (non-inferior) |
| Detection DICE coefficient | DICE = 0.913 +/- 0.075 (demonstrating precise approximation to physician ROIs) |
| Non-inferiority test: descriptor agreement | Non-inferiority demonstrated for all listed descriptors (Composition, Echogenicity, Shape, Margin, Echogenic Foci subcategories); e.g., Composition: 0.018 [0.001, 0.035], Echogenicity: -0.005 [-0.022, 0.011] |

**Image Registration and Matching:**

| Metric | Reported Device Performance |
|---|---|
| No-match rate | 0.32% |
| Average time for study preprocessing | 2.39 +/- 0.48 seconds |
| Average time for image matching | 0.22 +/- 0.12 seconds |
| End-to-end Breast Engine performance | AUC = 0.946; Sensitivity = 0.975; Specificity = 0.637 |
| End-to-end Thyroid Engine performance | AUC = 0.801; Sensitivity = 0.670; Specificity = 0.603 |

**OCR Engine (field identification accuracy):**

| Field Group | Field | Accuracy |
|---|---|---|
| Breast freetext | Breast Side | 0.983 |
| Breast freetext | Location Type | 0.948 |
| Breast freetext | Clock Hour | 0.926 |
| Breast freetext | Clock Minute | 0.934 |
| Breast freetext | CMFN | 0.944 |
| Breast freetext | Plane | 0.976 |
| Thyroid freetext | Thyroid Side | 0.965 |
| Thyroid freetext | Pole | 0.976 |
| Thyroid freetext | Region | 0.998 |
| Thyroid freetext | Plane | 0.970 |
| Measurement text | Measurement Description | 0.943 |
| Measurement text | Measurement Value | 0.948 |
| Measurement text | Unit of Measurement | 0.967 |
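The Smart Click rows above report each result as a difference with a 95% confidence interval and a "non-inferior" conclusion. The pre-specified non-inferiority margins are not given in this summary, so the sketch below is only a generic illustration (with an assumed margin) of how such a conclusion is typically drawn: the lower CI bound of the difference must lie above the negative margin.

```python
# Generic CI-vs-margin check used to declare non-inferiority; the margin value
# here is an assumed placeholder, not the study's pre-specified margin.
def non_inferior(diff_ci_lower: float, margin: float) -> bool:
    """True if the lower 95% CI bound of (device - comparator) exceeds -margin."""
    return diff_ci_lower > -margin

# Example with the reported Smart Click sensitivity difference CI of
# [-0.036, 0.018] and an assumed margin of 0.05:
print(non_inferior(-0.036, margin=0.05))  # True -> non-inferiority demonstrated
```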
* **Breast:**
    * Reader agreement => **Significant increase in agreement.**
    * Intra-operator variability (class switching rate):
        * USE Alone: 13.6%
        * USE + DS: 10.8% (p = 0.042) => **Statistically significant reduction.**
* **Thyroid (CRRS-3 Study):**
    * Change in average AUC (USE + DS vs. USE Alone, all readers, all data): **+0.083 [0.066, 0.099]** (parametric) / **+0.079 [0.062, 0.096]** (non-parametric) (a generic sketch of this kind of AUC comparison follows this list)
    * Specifically for US readers, US data: **+0.074 [0.051, 0.098]** (parametric) / **+0.073 [0.049, 0.096]** (non-parametric). This demonstrates a statistically significant improvement in overall reader performance.
    * **Change in average Sensitivity/Specificity of FNA (with AI Adapter + size criteria):**
        * All readers, all data: **+0.084 (sensitivity), +0.140 (specificity)**
        * US readers, US data: **+0.058 (sensitivity), +0.130 (specificity)**
    * **Change in average Sensitivity/Specificity of Follow-up (with AI Adapter + size criteria):**
        * All readers, all data: **+0.060 (sensitivity), +0.206 (specificity)**
        * US readers, US data: **+0.053 (sensitivity), +0.180 (specificity)**
    * Inter-reader variability (relative change in TI-RADS points association): **40.7% (all readers, all data)**, 37.4% (US readers, US data), 49.7% (EU readers, EU data)
    * Impact on interpretation time: **-23.6% (all readers, all data)**, -22.7% (US readers, US data), -32.4% (EU readers, EU data).
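The AUC improvements above come from a multi-reader study whose statistical methodology is not reproduced in this summary. As a rough, generic illustration of how an aided-versus-unaided AUC difference and a confidence interval can be estimated from per-case reader scores, the Python sketch below uses a simple case-level bootstrap; the function, data, and variable names are hypothetical, and a full multi-reader multi-case analysis would additionally account for reader variability.

```python
# Illustrative only: estimate the aided-minus-unaided AUC difference with a
# case-level bootstrap CI. The inputs are made-up stand-ins for per-case data.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_improvement(labels, scores_unaided, scores_aided, n_boot=2000, seed=0):
    """Return the aided-minus-unaided AUC difference and a 95% bootstrap CI."""
    labels = np.asarray(labels)
    unaided = np.asarray(scores_unaided, dtype=float)
    aided = np.asarray(scores_aided, dtype=float)

    delta = roc_auc_score(labels, aided) - roc_auc_score(labels, unaided)

    rng = np.random.default_rng(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = rng.integers(0, n, size=n)           # resample cases with replacement
        if labels[idx].min() == labels[idx].max():
            continue                               # AUC needs both classes present
        deltas.append(roc_auc_score(labels[idx], aided[idx])
                      - roc_auc_score(labels[idx], unaided[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return delta, (float(lo), float(hi))

# Toy example with made-up scores for 8 cases (label 1 = malignant):
y = [1, 0, 1, 0, 1, 0, 0, 1]
unaided = [0.6, 0.5, 0.4, 0.3, 0.7, 0.6, 0.2, 0.5]
aided = [0.8, 0.4, 0.6, 0.3, 0.9, 0.5, 0.2, 0.7]
print(auc_improvement(y, unaided, aided))
```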
6. Standalone (Algorithm Only without Human-in-the-loop) Performance:
- Yes; standalone testing was performed for the Breast and Thyroid AI Engines, Smart Click, Image Registration and Matching, and OCR.
- Breast Engine: AUC = 0.945; Sensitivity = 0.976; Specificity = 0.632.
- Thyroid Engine (ACR TI-RADS, biopsy recommendation): Sensitivity = 0.644; Specificity = 0.612.
- Thyroid Smart Click: Demonstrated non-inferiority for Sensitivity, Specificity, AUC, and descriptor agreement compared to physician-selected calipers. Detection DICE = 0.913 (the sensitivity, specificity, and DICE definitions used here are sketched after this list).
- Image Registration and Matching: Very high DICE coefficients (Breast 0.995, Thyroid 0.996) and successful match rates (>99.5%).
- OCR Engine: High accuracy rates for identification of various freetext and measurement fields (e.g., Breast Side 0.983, Measurement Value 0.948).
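For reference, the standalone figures above rely on three standard metrics: sensitivity, specificity, and (for ROI overlap) the DICE coefficient. The sketch below gives their generic definitions; it is an illustration with made-up masks, not the vendor's validation code.

```python
# Generic definitions of the standalone metrics reported above; not the
# vendor's validation code.
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (TPR) and specificity (TNR) from binary labels and predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)

def dice_coefficient(mask_a, mask_b):
    """DICE overlap between two binary ROI masks (1.0 = identical regions)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

# Example: an algorithm-detected ROI vs. a physician-drawn ROI on a 4x4 grid.
algo = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
phys = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
print(dice_coefficient(algo, phys))   # 2*3 / (4+3) = 0.857...
```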
7. Type of Ground Truth Used:
- Malignancy Risk Classification (Breast & Thyroid AI Engines):
- Breast: Pathology or 1-year follow-up.
- Thyroid: Pathology results only (for standalone). Clinical study also used cyto-/histological or excisional pathology.
- Descriptor Predictions (Thyroid Standalone): Tested objectively against ground truth pathology and subjectively for agreement with readers' descriptor categorizations.
- Smart Click, Image Registration, OCR: Ground truth was established by manual annotations, physician-drawn ROIs, or defined objective metrics (like DICE coefficient against a reference ROI).
8. Sample Size for Training Set:
- Not explicitly stated for either Breast or Thyroid engines. The text mentions drawing upon a "large database of known cases" for the underlying engines and that the test sets were "set aside from the system's training data." However, the exact number of cases/images in the training set is not provided.
9. How Ground Truth for Training Set was Established:
- Not explicitly detailed for either Breast or Thyroid engines. The text states the engines "draw upon knowledge learned from a large database of known cases, tying image features to their eventual diagnosis, to form a predictive model." This implies that the training data had associated definitive diagnoses (e.g., from pathology or follow-up), but the process of establishing this ground truth (e.g., expert review, adjudication) for the training data is not described.
§ 892.2060 Radiological computer-assisted diagnostic software for lesions suspicious of cancer.
(a) Identification. A radiological computer-assisted diagnostic software for lesions suspicious of cancer is an image processing prescription device intended to aid in the characterization of lesions as suspicious for cancer identified on acquired medical images such as magnetic resonance, mammography, radiography, or computed tomography. The device characterizes lesions based on features or information extracted from the images and provides information about the lesion(s) to the user. Diagnostic and patient management decisions are made by the clinical user.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) Design verification and validation must include:
(i) A detailed description of the image analysis algorithms including, but not limited to, a detailed description of the algorithm inputs and outputs, each major component or block, and algorithm limitations.
(ii) A detailed description of pre-specified performance testing protocols and dataset(s) used to assess whether the device will improve reader performance as intended.
(iii) Results from performance testing protocols that demonstrate that the device improves reader performance in the intended use population when used in accordance with the instructions for use. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, predictive value, and diagnostic likelihood ratio). The test dataset must contain sufficient numbers of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.
(iv) Standalone performance testing protocols and results of the device.
(v) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; and description of verification and validation activities including system level test protocol, pass/fail criteria, results, and cybersecurity).
(2) Labeling must include:
(i) A detailed description of the patient population for which the device is indicated for use.
(ii) A detailed description of the intended reading protocol.
(iii) A detailed description of the intended user and recommended user training.
(iv) A detailed description of the device inputs and outputs.
(v) A detailed description of compatible imaging hardware and imaging protocols.
(vi) Warnings, precautions, and limitations, including situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.
(vii) Detailed instructions for use.
(viii) A detailed summary of the performance testing, including: Test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders (e.g., lesion and organ characteristics, disease stages, and imaging equipment).