Koios Decision Support (DS) is an artificial intelligence (AI)/machine learning (ML)-based computer-aided diagnosis (CADx) software device intended for use as an adjunct to diagnostic ultrasound examinations of lesions or nodules suspicious for breast or thyroid cancer.
Koios DS allows the user to select or confirm regions of interest (ROIs) within an image representing a single lesion or nodule to be analyzed. The software then automatically characterizes the selected image data to generate an AI/ML-derived cancer risk assessment and selects applicable lexicon-based descriptors designed to improve overall diagnostic accuracy as well as reduce interpreting physician variability.
Koios DS software may also be used as an image viewer of multi-modality digital images, including ultrasound and mammography. The software includes tools that allow users to adjust, measure, and document images, and to output findings into a structured report.
Koios DS software is designed to assist trained interpreting physicians in analyzing the breast ultrasound images of adult (>= 22 years) female patients with soft tissue breast lesions and/or thyroid ultrasounds of all adult (>= 22 years) patients with thyroid nodules suspicious for cancer. When utilized by an interpreting physician who has completed the prescribed training, this device provides information that may be useful in recommending appropriate clinical management.
Limitations:
· Patient management decisions should not be made solely on the results of the Koios DS analysis.
· Koios DS software is not to be used for the evaluation of normal tissue, on sites of post-surgical excision, or on images containing Doppler, elastography, or other overlays.
· Koios DS software is not intended for use on portable handheld devices (e.g., smartphones or tablets) or as a primary diagnostic viewer of mammography images.
· The software does not predict the thyroid nodule margin descriptor extra-thyroidal extension. If this condition is present, the user may select that category manually from the margin descriptor list.
Koios Decision Support (DS) is a software application designed to assist trained interpreting physicians in analyzing breast and thyroid ultrasound images. The software device is a web application that is deployed to a Microsoft IIS web server and accessed by a user through a compatible client. Once logged in and granted access to the Koios DS application, the user examines selected breast or thyroid ultrasound DICOM images. The user selects Regions of Interest (ROIs) of orthogonal views of a breast lesion or thyroid nodule for processing by Koios DS. The ROI(s) are transmitted electronically to the Koios DS server for image processing and the results are returned to the user for review.
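The client-server data flow described above can be illustrated with a short, hypothetical sketch. The endpoint route, payload fields, and ROI serialization below are assumptions made for illustration only and are not the documented Koios DS interfaces; the sketch simply shows the pattern of reading a DICOM image, cropping a user-selected ROI, and submitting it to a server for analysis.

```python
# Hypothetical sketch of the client-to-server ROI hand-off described above.
# The endpoint route, payload fields, and serialization are illustrative
# assumptions, NOT the actual Koios DS API.
import io

import numpy as np
import pydicom    # reads the ultrasound DICOM image selected by the user
import requests

def extract_roi(dicom_path: str, x: int, y: int, w: int, h: int) -> bytes:
    """Crop a user-selected region of interest from an uncompressed B-mode frame."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array                  # (rows, cols) or (rows, cols, 3)
    roi = pixels[y:y + h, x:x + w]
    buf = io.BytesIO()
    np.save(buf, roi)                        # serialize the ROI for transport
    return buf.getvalue()

def submit_roi(server_url: str, roi_bytes: bytes, laterality: str) -> dict:
    """Send the ROI to a hypothetical analysis endpoint and return its response."""
    resp = requests.post(
        f"{server_url}/api/analyze",         # illustrative route only
        files={"roi": ("roi.npy", roi_bytes)},
        data={"body_part": "BREAST", "laterality": laterality},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()                       # e.g., risk assessment + descriptors
```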
Here's a summary of the acceptance criteria and the study demonstrating that the device meets them, based on the provided text:
Device Name: Koios DS Version 3.6
1. Table of Acceptance Criteria and Reported Device Performance (by subsystem):

Breast AI Engine (standalone):

| Metric | Reported Performance |
|---|---|
| Malignancy Risk Classifier AUC | 0.945 [0.932, 0.959] (increased from 0.929) |
| Categorical Output Sensitivity | 0.976 [0.960, 0.992] (increased from 0.97) |
| Categorical Output Specificity | 0.632 [0.588, 0.676] (increased from 0.61) |
| Sensitivity to Region of Interest | 0.012 (decreased from 0.019) |
| Sensitivity to Transducer Frequency (high frequency, >= 15 MHz) | AUC = 0.948 [0.917, 0.978] (increased from 0.940) |
| Sensitivity to Transducer Frequency (low frequency, < 15 MHz) | AUC = 0.940 [0.925, 0.956] (increased from 0.924) |
| Single Image vs. Orthogonal Image Pair | Single image: 0.932 [+/- 0.003] (not directly comparable, but improved standalone AUC overall) |
| Operating Point | PLR = 2.661 [2.338, 2.984]; NLR = 0.039 [0.013, 0.064]; PPV = 0.708 [0.672, 0.743]; NPV = 0.966 [0.944, 0.988] (improved from predicate) |
| Overall | Breast Engine met or exceeded performance requirements in all tests. |

Thyroid AI Engine (standalone, with AI Adapter):

| Metric | Reported Performance |
|---|---|
| AUC (ACR TI-RADS) | 79.8% (significant increase over average physician AUC) |
| Sensitivity (ACR TI-RADS, biopsy recommendation) | 0.644 [0.545, 0.744] (non-significant improvement over average physician) |
| Specificity (ACR TI-RADS, biopsy recommendation) | 0.612 [0.566, 0.658] (significant improvement over average physician) |
| Sensitivity (ACR TI-RADS, follow-up recommendation) | 0.879 [0.812, 0.946] (non-significant improvement) |
| Specificity (ACR TI-RADS, follow-up recommendation) | 0.495 [0.446, 0.544] (significant improvement) |
| AUC (ATA) | Significant increase of 9.135% [5.975, 12.294] over physician AUC |
| Sensitivity (ATA) | Non-significant increase of 0.511% [-5.182, 6.204] |
| Specificity (ATA) | Significant increase of 18.741% [9.885, 27.596] |
| Overall | Thyroid Engine met or exceeded performance requirements in all tests. |

Smart Click (thyroid):

| Metric | Reported Performance |
|---|---|
| Non-inferiority test, Sensitivity / Specificity | Sensitivity: difference = -0.009 [-0.036, 0.018] (non-inferior); Specificity: difference = -0.018 [-0.041, 0.005] (non-inferior) |
| Non-inferiority test, AUC | Difference = -0.012 [-0.029, 0.006] (non-inferior) |
| Sub-optimal ROI test | Difference = 0.026 [-0.009, 0.062] (non-inferior) |
| Detection DICE coefficient | 0.913 +/- 0.075 (demonstrating precise approximation to physician ROIs) |
| Non-inferiority test, descriptor agreement | Non-inferior for all listed descriptors (Composition, Echogenicity, Shape, Margin, Echogenic Foci subcategories). Examples: Composition: 0.018 [0.001, 0.035]; Echogenicity: -0.005 [-0.022, 0.011] |

Image Registration and Matching:

| Metric | Reported Performance |
|---|---|
| No Match rate | 0.32% |
| Average time for study preprocessing | 2.39 +/- 0.48 seconds |
| Average time for image matching | 0.22 +/- 0.12 seconds |
| End-to-end Breast Engine performance | AUC = 0.946; Sensitivity = 0.975; Specificity = 0.637 |
| End-to-end Thyroid Engine performance | AUC = 0.801; Sensitivity = 0.670; Specificity = 0.603 |
| Breast image matching outcomes | Successful match: 99.5% (2018/2028 ROIs); No match: 0.5% (10/2028 ROIs); Incorrect match: 0.0%; Incorrect image: 0.0% |
| Breast image matching DICE coefficient | 0.995 +/- 0.005 |
| Thyroid image matching outcomes | Successful match: 100% (1288/1288 ROIs); No match: 0.0%; Incorrect match: 0.0%; Incorrect image: 0.0% |
| Thyroid image matching DICE coefficient | 0.996 +/- 0.004 |

OCR Engine:

| Metric | Reported Performance |
|---|---|
| Breast freetext identification (by field) | Breast Side: 0.983; Location Type: 0.948; Clock Hour: 0.926; Clock Minute: 0.934; CMFN: 0.944; Plane: 0.976 |
| Thyroid freetext identification (by field) | Thyroid Side: 0.965; Pole: 0.976; Region: 0.998; Plane: 0.970 |
| Measurement text identification (by field) | Measurement Description: 0.943; Measurement Value: 0.948; Unit of Measurement: 0.967 |
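For context, the figures above use standard diagnostic-accuracy definitions. The snippet below is a minimal reference sketch of how sensitivity, specificity, predictive values, likelihood ratios, and the DICE overlap coefficient are computed from a confusion matrix and binary ROI masks; the counts and masks are placeholders, not data from the Koios DS validation.

```python
# Reference formulas for the standalone metrics reported above; the counts and
# masks here are placeholders, not values from the Koios DS validation set.
import numpy as np

def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, predictive values, and likelihood ratios."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "plr": sens / (1 - spec),        # positive likelihood ratio
        "nlr": (1 - sens) / spec,        # negative likelihood ratio
    }

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """DICE overlap between two binary ROI masks (as used for ROI agreement)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# Example with placeholder counts:
print(diagnostic_metrics(tp=150, fp=60, tn=400, fn=10))
```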
2. Sample Sizes and Data Provenance for Test Sets:
- Breast Engine Standalone/Clinical Test Set:
- Sample Size: 900 lesions from 900 different patients. An expanded validation set of 1014 cases (the original 900 plus 114 additional) was used to assess dataset drift.
- Data Provenance: Retrospective. Images sourced from a wide variety of ultrasound hardware. Patient demographic distribution was based upon data from the Breast Cancer Surveillance Consortium (2006-2009) to ensure representativeness of diverse populations.
- Thyroid Engine Standalone/Clinical Test Set:
- Sample Size: 650 retrospectively collected cases (nodules) from 650 different patients. Each nodule was represented by two orthogonal images; a total of 1000 images was used for standalone testing.
- Data Provenance: Retrospective. Cases included images from both the US (500 cases, 77%) and Europe (150 cases, 23%), acquired on a wide variety of ultrasound hardware.
- Thyroid Smart Click Test Set:
- Sample Size: 650 nodules.
- Data Provenance: Not explicitly stated, but likely the same validation dataset as the Thyroid Engine, derived retrospectively from US and European locations.
- Image Registration and Matching Test Set:
- Sample Size: 1,600 ultrasound studies (950 breast, 650 thyroid), involving 2028 breast ROIs and 1288 thyroid ROIs.
- Data Provenance: Not explicitly stated, but likely drawn from the same validation datasets for breast and thyroid as mentioned above.
- OCR Test Set:
- Sample Size: 1910 ultrasound B-Scans (mix of thyroid and breast images). A subset of 1226 images from supported machines was used for evaluation.
- Data Provenance: Not explicitly stated, but derived from a variety of machines.
3. Number of Experts and Qualifications for Ground Truth - Test Set:
- Breast Engine Standalone: Not applicable for the malignancy risk classification ground truth, which was pathology or 1-year follow-up. For categorical agreement metrics (Shape, Orientation), the submission mentions "agreement with the subjective categorizations assigned by physicians," implying experts, but the number and specific qualifications are not detailed beyond "trained interpreting physicians."
- Thyroid Engine Standalone: Not applicable for the malignancy risk classification ground truth, which was "pathology results only." Descriptor predictions were tested "objectively – against ground truth pathology" and "subjectively and met the requirements for agreement with readers' descriptor categorizations," implying experts, but their number and specific qualifications are not detailed.
- Breast Clinical Study: 15 readers. Qualifications varied (Diagnostic Radiology, Breast Surgeon, OB/GYN), with 0 to 30 years of experience. Some were Breast Fellowship Trained and/or Dedicated Breast Imagers, and some were MQSA Qualified Interpreting Physicians.
- Thyroid Clinical Study: 15 readers (11 US-based, 4 European-based). Qualifications included Endocrinologists (End) and Radiologists (Rad). Experience ranged from < 10 years to ≥ 20 years (post-residency).
4. Adjudication Method for Test Set:
- Breast Engine Standalone/Clinical:
- Malignancy Ground Truth: Determined by pathology or 1-year follow-up. No explicit adjudication method amongst multiple experts for this final ground truth is mentioned, implying a single, definitive source.
- Reader Study: Not explicitly stated for establishing a ground truth for individual cases based on reader input. The study collected reader interpretations and compared them to the established ground truth (pathology/follow-up).
- Thyroid Engine Standalone/Clinical:
- Malignancy Ground Truth: Determined by "pathology results only." No explicit adjudication method amongst multiple experts is mentioned.
- Reader Study: Not explicitly stated for establishing a ground truth for individual cases based on reader input. The study collected reader interpretations and compared them to pathological ground truth.
- Other Standalone Tests (Smart Click, Image Registration, OCR): Ground truth was based on defined metrics (e.g., DICE coefficient for ROI matching, manual annotation for OCR, physician-drawn ROIs, pathology for descriptor agreement). No multi-expert adjudication mentioned.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- Yes, for both Breast and Thyroid.
- Effect Size of Human Reader Improvement with AI vs. without AI Assistance:
- Breast (from K190442, still applicable to K242130 as performance is superior):
- Change in average AUC (USE + DS vs. USE Alone): 0.0370 [0.030, 0.044] at α = .05 (a significant increase).
- Average Kendall Tau-B (a measure of inter-operator agreement):
- USE Alone: 0.5404 (0.5301, 0.5507)
- USE + DS: 0.6797 (0.6653, 0.6941) => Significant increase in agreement.
- Intra-operator variability (class switching rate):
- USE Alone: 13.6%
- USE + DS: 10.8% (p = 0.042) => Statistically significant reduction.
- Thyroid (CRRS-3 Study):
- Change in average AUC (USE + DS vs. USE Alone, all readers, all data): +0.083 [0.066, 0.099] (parametric) / +0.079 [0.062, 0.096] (non-parametric)
- Specifically for US readers, US data: +0.074 [0.051, 0.098] (parametric) / +0.073 [0.049, 0.096] (non-parametric). This demonstrates a statistically significant improvement in overall reader performance.
- Change in average Sensitivity/Specificity of FNA (with AI Adapter + size criteria):
- All readers, all data: +0.084 (sensitivity), +0.140 (specificity)
- US readers, US data: +0.058 (sensitivity), +0.130 (specificity)
- Change in average Sensitivity/Specificity of Follow-up (with AI Adapter + size criteria):
- All readers, all data: +0.060 (sensitivity), +0.206 (specificity)
- US readers, US data: +0.053 (sensitivity), +0.180 (specificity)
- Inter-Reader Variability (relative change in TI-RADS points association): 40.7% (all readers, all data), 37.4% (US readers, US data), 49.7% (EU Readers, EU Data)
- Impact on Interpretation Time: -23.6% (all readers, all data), -22.7% (US readers, US data), -32.4% (EU Readers, EU Data).
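The effect sizes above come from formal MRMC reader studies. As a rough, hedged illustration of the quantities being reported (per-reader change in AUC with and without DS assistance, and Kendall Tau-B as an agreement statistic), the sketch below uses synthetic reader scores; it does not reproduce the formal MRMC statistical model used in the study.

```python
# Synthetic illustration of the reported effect sizes: per-reader change in AUC
# (USE + DS vs. USE alone) and Kendall Tau-B agreement. All data below are
# random placeholders; this is not the formal MRMC analysis from the study.
import numpy as np
from scipy.stats import kendalltau            # Tau-B variant by default
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_cases, n_readers = 200, 15
truth = rng.integers(0, 2, size=n_cases)                      # pathology ground truth
unaided = {r: rng.random(n_cases) for r in range(n_readers)}  # reader scores, US alone
aided = {r: np.clip(unaided[r] + 0.1 * truth, 0, 1)           # reader scores, US + DS
         for r in range(n_readers)}

# Average per-reader change in AUC (USE + DS vs. USE alone)
delta_auc = np.mean([
    roc_auc_score(truth, aided[r]) - roc_auc_score(truth, unaided[r])
    for r in range(n_readers)
])

# Average pairwise Kendall Tau-B across readers as an agreement summary
taus = [kendalltau(aided[i], aided[j])[0]
        for i in range(n_readers) for j in range(i + 1, n_readers)]

print(f"mean AUC delta: {delta_auc:.3f}, mean Tau-B: {np.mean(taus):.3f}")
```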
6. Standalone (Algorithm Only without Human-in-the-loop) Performance:
- Yes, for both Breast and Thyroid AI Engines, Smart Click, Image Registration and Matching, and OCR.
- Breast Engine: AUC = 0.945; Sensitivity = 0.976; Specificity = 0.632.
- Thyroid Engine (ACR TI-RADS, biopsy recommendation): Sensitivity = 0.644; Specificity = 0.612.
- Thyroid Smart Click: Demonstrated non-inferiority for Sensitivity, Specificity, AUC, and descriptor agreement compared to physician-selected calipers. Detection DICE = 0.913.
- Image Registration and Matching: Very high DICE coefficients (Breast 0.995, Thyroid 0.996) and successful match rates (>99.5%).
- OCR Engine: High accuracy rates for identification of various freetext and measurement fields (e.g., Breast Side 0.983, Measurement Value 0.948).
7. Type of Ground Truth Used:
- Malignancy Risk Classification (Breast & Thyroid AI Engines):
- Breast: Pathology or 1-year follow-up.
- Thyroid: Pathology results only (for standalone). Clinical study also used cyto-/histological or excisional pathology.
- Descriptor Predictions (Thyroid Standalone): Tested objectively against ground truth pathology and subjectively for agreement with readers' descriptor categorizations.
- Smart Click, Image Registration, OCR: Ground truth was established by manual annotations, physician-drawn ROIs, or defined objective metrics (like DICE coefficient against a reference ROI).
8. Sample Size for Training Set:
- Not explicitly stated for either Breast or Thyroid engines. The text mentions drawing upon a "large database of known cases" for the underlying engines and that the test sets were "set aside from the system's training data." However, the exact number of cases/images in the training set is not provided.
9. How Ground Truth for Training Set was Established:
- Not explicitly detailed for either Breast or Thyroid engines. The text states the engines "draw upon knowledge learned from a large database of known cases, tying image features to their eventual diagnosis, to form a predictive model." This implies that the training data had associated definitive diagnoses (e.g., from pathology or follow-up), but the process of establishing this ground truth (e.g., expert review, adjudication) for the training data is not described.