Search Results
Found 30 results
510(k) Data Aggregation
(54 days)
QDQ
Saige-Dx analyzes digital breast tomosynthesis (DBT) mammograms to identify the presence or absence of soft tissue lesions and calcifications that may be indicative of cancer. For a given DBT mammogram, Saige-Dx analyzes the DBT image stacks and the accompanying 2D images, including full field digital mammography and/or synthetic images. The system assigns a Suspicion Level, indicating the strength of suspicion that cancer may be present, for each detected finding and for the entire case. The outputs of Saige-Dx are intended to be used as a concurrent reading aid for interpreting physicians on screening mammograms with compatible DBT hardware.
Saige-Dx is a software device that processes screening mammograms using artificial intelligence to aid interpreting radiologists. By automatically detecting the presence or absence of soft tissue lesions and calcifications in mammography images, Saige-Dx can help improve reader performance, while also reducing reading time. The software takes as input a set of x-ray mammogram DICOM files from a single digital breast tomosynthesis (DBT) study and generates finding-level outputs for each image analyzed, as well as an aggregate case-level assessment. Saige-Dx processes both the DBT image stacks and the associated 2D images (full-field digital mammography (FFDM) and/or synthetic 2D images) in a DBT study. For each image, Saige-Dx outputs bounding boxes circumscribing any detected findings and assigns a Finding Suspicion Level to each finding, indicating the degree of suspicion that the finding is malignant. Saige-Dx uses the results of the finding-level analysis to generate a Case Suspicion Level, indicating the degree of suspicion for malignancy across the case. Saige-Dx encapsulates the finding and case-level results into a DICOM Structured Report (SR) object containing markings that can be overlaid on the original mammogram images using a viewing workstation and a DICOM Secondary Capture (SC) object containing a summary report of the Saige-Dx results.
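As context for the DICOM packaging step described above, here is a minimal, hypothetical pydicom sketch (assuming pydicom 2.x) of wrapping an 8-bit summary image in a Secondary Capture object. It illustrates the SC mechanism only; it is not Saige-Dx's actual implementation, and all patient and UID values are placeholders.

```python
# Hypothetical sketch: wrap an 8-bit grayscale summary image in a minimal
# DICOM Secondary Capture (SC) dataset. Not Saige-Dx's implementation.
import numpy as np
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import (ExplicitVRLittleEndian, SecondaryCaptureImageStorage,
                         generate_uid)

def results_to_secondary_capture(summary_image: np.ndarray, path: str) -> None:
    meta = FileMetaDataset()
    meta.MediaStorageSOPClassUID = SecondaryCaptureImageStorage
    meta.MediaStorageSOPInstanceUID = generate_uid()
    meta.TransferSyntaxUID = ExplicitVRLittleEndian

    ds = Dataset()
    ds.file_meta = meta
    ds.is_little_endian = True            # matches ExplicitVRLittleEndian
    ds.is_implicit_VR = False
    ds.SOPClassUID = SecondaryCaptureImageStorage
    ds.SOPInstanceUID = meta.MediaStorageSOPInstanceUID
    ds.StudyInstanceUID = generate_uid()  # in practice, copied from the source DBT study
    ds.SeriesInstanceUID = generate_uid()
    ds.Modality = "OT"                    # "other" is conventional for SC objects
    ds.PatientName = "Anon^Example"       # placeholder demographics
    ds.PatientID = "000000"
    ds.SamplesPerPixel = 1
    ds.PhotometricInterpretation = "MONOCHROME2"
    ds.Rows, ds.Columns = summary_image.shape
    ds.BitsAllocated = 8
    ds.BitsStored = 8
    ds.HighBit = 7
    ds.PixelRepresentation = 0
    ds.PixelData = summary_image.astype(np.uint8).tobytes()
    ds.save_as(path, write_like_original=False)

results_to_secondary_capture(np.zeros((512, 512), dtype=np.uint8), "saige_summary_sc.dcm")
```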
Here's a breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) clearance letter for Saige-Dx:
1. Table of Acceptance Criteria and Reported Device Performance
The provided document indicates that the primary endpoint of the standalone performance testing was to demonstrate non-inferiority of the subject device (new Saige-Dx version) to the predicate device (previous Saige-Dx version). Specific quantitative acceptance criteria (e.g., AUC, sensitivity, specificity thresholds) are not explicitly stated in the provided text. However, the document states:
"The test met the pre-specified performance criteria, and the results support the safety and effectiveness of Saige-Dx updated AI model on Hologic and GE exams."
Acceptance Criteria (Not explicitly quantified in source) | Reported Device Performance |
---|---|
Non-inferiority of subject device performance to predicate device performance. | "The test met the pre-specified performance criteria, and the results support the safety and effectiveness of Saige-Dx updated AI model on Hologic and GE exams." |
Performance across breast densities, ages, race/ethnicities, and lesion types and sizes. | Subgroup analyses "demonstrated similar standalone performance trends across breast densities, ages, race/ethnicities, and lesion types and sizes." |
Software design and implementation meeting requirements. | Verification testing including unit, integration, system, and regression testing confirmed "the software, as designed and implemented, satisfied the software requirements and has no unintentional differences from the predicate device." |
2. Sample Size for the Test Set and Data Provenance
- Sample Size for Test Set: 2,002 DBT screening mammograms from unique women.
- 259 cancer cases
- 1,743 non-cancer cases
- Data Provenance:
- Country of Origin: United States (cases collected from 12 diverse clinical sites).
- Retrospective or Prospective: Retrospective.
- Acquisition Equipment: Hologic (standard definition and high definition) and GE images.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
The document mentions: "The case collection and ground truth lesion localization processes of the newly collected cases were the same processes used for the previously collected test dataset (details provided in K220105)."
- While the specific number and qualifications of experts for the ground truth of the current test set are not explicitly detailed in this document, it refers back to K220105 for those details. It implies that a standardized process involving experts was used.
4. Adjudication Method for the Test Set
The document does not explicitly describe the adjudication method (e.g., 2+1, 3+1) used for establishing ground truth for the test set. It states: "The case collection and ground truth lesion localization processes of the newly collected cases were the same processes used for the previously collected test dataset (details provided in K220105)." This suggests a pre-defined and presumably robust method for ground truth establishment.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was it done? Yes.
- Effect Size: The document states: "a multi-reader multi-case (MRMC) study was previously conducted for the predicate device and remains applicable to the subject device." It does not provide details on the effect size (how much human readers improve with AI vs. without AI assistance) within this document. Readers would need to refer to the K220105 submission for that information if it was presented there.
6. Standalone (Algorithm Only) Performance Study
- Was it done? Yes.
- Description: "Validation of the software was conducted using a retrospective and blinded multicenter standalone performance testing under an IRB approved protocol..."
- Primary Endpoint: "to demonstrate that the performance of the subject device was non-inferior to the performance of the predicate device."
7. Type of Ground Truth Used
- The ground truth involved the presence or absence of cancer, with cases categorized as 259 cancer and 1,743 non-cancer. The mention of "ground truth lesion localization processes" implies a detailed assessment of findings, likely involving expert consensus and/or pathology/biopsy results to confirm malignancy. Given it's a diagnostic aid for cancer, pathology is the gold standard for confirmation.
8. Sample Size for the Training Set
- Training Dataset: 161,323 patients and 300,439 studies.
9. How the Ground Truth for the Training Set Was Established
- The document states: "The Saige-Dx algorithm was trained on a robust and diverse dataset of mammography exams acquired from multiple vendors including GE and Hologic equipment."
- The document does not explicitly detail how ground truth was established for the training set (e.g., expert consensus, pathology reports). As with the test set, it is highly probable that the training labels were derived from rigorous clinical assessments, including follow-up, biopsy results, and/or expert interpretations, to accurately label cancer and non-cancer cases for the algorithm to learn from. The description of the training data as "robust and diverse" suggests a comprehensive approach to ground truth.
(279 days)
QDQ
Genius AI Detection is a computer-aided detection and diagnosis (CADe/CADx) software device intended to be used with compatible digital breast tomosynthesis (DBT) systems to identify and mark regions of interest including soft tissue densities (masses, architectural distortions and asymmetries) and calcifications in DBT exams from compatible DBT systems and provide confidence scores that offer assessment for Certainty of Findings and a Case Score.
The device intends to aid in the interpretation of digital breast tomosynthesis exams in a concurrent fashion, where the interpreting physician confirms or dismisses the findings during the reading of the exam.
Genius AI Detection 2.0 is a software device intended to identify potential abnormalities in breast tomosynthesis images. Genius AI Detection 2.0 analyzes each standard mammographic view in a digital breast tomosynthesis examination using deep learning networks. For each detected lesion, Genius AI Detection 2.0 produces CAD results that include:
- the location of the lesion;
- an outline of the lesion;
- a confidence score for the lesion.

Genius AI Detection 2.0 also produces a case score for the entire breast tomosynthesis exam.
Genius AI Detection 2.0 packages all CAD findings derived from the corresponding analysis of a tomosynthesis exam into a DICOM Mammography CAD SR object and distributes it for display on DICOM compliant review workstations. The interpreting physician will have access to the CAD findings concurrently to the reading of the tomosynthesis exam. In addition, a combination of peripheral information such as number of marks and case scores may be used on the review workstation to enhance the interpreting physician's workflow by offering a better organization of the patient worklist.
Here's a breakdown of the acceptance criteria and study details for Genius AI Detection 2.0, based on the provided FDA 510(k) clearance letter:
Acceptance Criteria and Device Performance for Genius AI Detection 2.0
1. Table of Acceptance Criteria and Reported Device Performance
The provided document describes a non-inferiority study to demonstrate that the performance of Genius AI Detection 2.0 on Envision (ENV) images is equivalent to its performance on the predicate's Standard of Care (SOC) images (Hologic's Selenia Dimensions systems). The primary acceptance criterion was non-inferiority of the Area Under the Curve (AUC) of the ROC curve, with a 5% margin. Secondary metrics included sensitivity, specificity, and false marker rate per view.
Acceptance Criteria Category | Specific Metric | Predicate Device Performance (SOC Images) | Subject Device Performance (ENV Images) | Acceptance Criteria Met? |
---|---|---|---|---|
Primary Endpoint (Non-Inferiority) | AUC of ROC Curve (ENV − SOC) | N/A (comparison study) | -0.0017 (95% CI: -0.023 to 0.020) | Yes (p-value for difference = 0.87, indicating no significant difference, and within the 5% non-inferiority margin) |
Secondary Metrics | Sensitivity | N/A (comparison study) | No significant difference reported between modalities | Yes |
Secondary Metrics | Specificity | N/A (comparison study) | No significant difference reported between modalities | Yes |
Secondary Metrics | False Marker Rate per View | N/A (comparison study) | No significant difference reported between modalities | Yes |
CC-MLO Correlation | Accuracy on Malignant Lesions | N/A | 90% | Yes (considered accurate) |
CC-MLO Correlation | Accuracy on Negative Cases (correlated pairs) | N/A | 73% | Yes (considered accurate) |
Implant Cases | Location-specific cancer detection sensitivity | N/A | 76% (CI: 68%-84%) | Yes (considered acceptable based on confidence intervals) |
Implant Cases | Specificity | N/A | 67% (CI: 62%-72%) | Yes (considered acceptable based on confidence intervals) |
(Note: The document focuses on demonstrating equivalence to the predicate's performance on a new platform rather than absolute performance against a fixed threshold for all metrics, except for the implant case where specific CIs are given and deemed acceptable.)
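The primary endpoint above is a non-inferiority comparison of AUCs with a 5% margin. As a rough illustration of how such an endpoint can be evaluated, the sketch below bootstraps a 95% CI for the paired AUC difference; this is a hedged example, not the statistical plan actually used in the submission (which is not disclosed).

```python
# Sketch: bootstrap CI for the AUC difference between two readings of the
# same cases, then check the lower bound against a non-inferiority margin.
import numpy as np
from sklearn.metrics import roc_auc_score

def delta_auc_non_inferior(y, scores_new, scores_ref, margin=0.05,
                           n_boot=2000, seed=0):
    y = np.asarray(y)
    scores_new, scores_ref = np.asarray(scores_new), np.asarray(scores_ref)
    rng = np.random.default_rng(seed)
    deltas, n = [], len(y)
    while len(deltas) < n_boot:
        idx = rng.integers(0, n, n)          # resample cases with replacement
        if len(np.unique(y[idx])) < 2:       # AUC needs both classes present
            continue
        deltas.append(roc_auc_score(y[idx], scores_new[idx]) -
                      roc_auc_score(y[idx], scores_ref[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return lo > -margin, (lo, hi)            # non-inferior if lower bound > -margin
```

With the numbers above (Δ = -0.0017, 95% CI -0.023 to 0.020), the lower bound of -0.023 sits well inside a 5% margin, consistent with the "criteria met" conclusion.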
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size (Main Comparison Study): 1475 subjects
- 200 biopsy-proven cancer subjects
- 275 biopsy-proven benign subjects
- 78 BI-RADS 3 subjects (considered BI-RADS 1 or 2 upon diagnostic workup)
- 922 BI-RADS 1 and 2 subjects (at screening)
- Implant Case Test Set: 480 subjects
- 132 biopsy-proven cancer subjects
- 348 negative subjects (119 biopsy-proven benign, 229 screening negative)
- Data Provenance:
- Country of Origin: Not explicitly stated, but collected from a "national multi-center breast imaging network" within the U.S., implying U.S. origin.
- Retrospective or Prospective: The main comparison study data was collected for evaluating the safety and effectiveness of the Envision platform, with an IRB approved protocol. This suggests a retrospective study design, where existing images were gathered for evaluation. The implant cases were collected between 2015 and 2022, also indicating a retrospective approach.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- Number of Experts: Two
- Qualifications: Both were MQSA-certified radiologists with over 20 years of experience.
4. Adjudication Method for the Test Set
The document explicitly states that the "ground truthing to evaluate performance metrics including the locations of cancer lesions was done by two MQSA-certified radiologists with over 20 years of experience."
- Adjudication Method: It does not specify a particular adjudication method (e.g., 2+1, 3+1). It simply states that ground truthing was done by two experts. This implies either consensus was reached between the two, or potentially an unstated arbitration method if they disagreed, or that their individual findings were used for analysis. Given the phrasing, expert consensus is the most likely implied method, but not explicitly detailed.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
- No, an MRMC comparative effectiveness study was NOT done. The study described is a standalone performance comparison of the AI algorithm on images from different modalities (Envision vs. Standard of Care), not a study involving human readers with and without AI assistance to measure effect size.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Yes, a standalone study WAS done. The document explicitly states, "A standalone study was conducted to compare the detection performance of FDA cleared Genius AI Detection 2.0 (K221449) using Standard of Care (SOC) images acquired on the Dimensions systems against images acquired on the FDA approved Envision Mammography Platform (P080003/S009)." This study evaluated the algorithm's performance (fROC, ROC, sensitivity, specificity, false marker rate) directly against the ground truth without human intervention.
7. The Type of Ground Truth Used
- Ground Truth Type: A combination of biopsy-proven cancer and biopsy-proven benign cases, along with BI-RADS diagnostic outcomes (for negative cases). For the cancer cases, the "locations of cancer lesions" were part of the ground truth.
8. The Sample Size for the Training Set
- Not provided. The document states that the test dataset was "sequestered from any training datasets by isolating it on a secured server with controlled access permissions" and that the data for implant cases was "sequestered from the training datasets for Genius AI Detection." However, the actual sample size of the training set is not mentioned.
9. How the Ground Truth for the Training Set Was Established
- Not provided. Since the training set sample size and details are not disclosed, the method for establishing its ground truth is also not mentioned in this document. It is generally assumed that similar rigorous methods (e.g., biopsy-proven truth, expert review) would have been used for training data, but this specific filing does not detail it.
(216 days)
QDQ
MammoScreen® 4 is a concurrent reading and reporting aid for physicians interpreting screening mammograms. It is intended for use with compatible full-field digital mammography and digital breast tomosynthesis systems. The device can also use compatible prior examinations in the analysis.
Output of the device includes graphical marks of findings as soft-tissue lesions or calcifications on mammograms along with their level of suspicion scores. The lesion type is characterized as mass/asymmetry, distortion, or calcifications for each detected finding. The level of suspicion score is expressed at the finding level, for each breast, and overall for the mammogram.
The location of findings, including quadrant, depth, and distance from the nipple, is also provided. This adjunctive information is intended to assist interpreting physicians during reporting.
Patient management decisions should not be made solely based on the analysis by MammoScreen 4.
MammoScreen 4 is a concurrent reading medical software device using artificial intelligence to assist radiologists in the interpretation of mammograms.
MammoScreen 4 processes the mammogram(s) and detects findings suspicious for breast cancer. Each detected finding gets a score called the MammoScreen Score™. The score was designed such that findings with a low score have a very low level of suspicion. As the score increases, so does the level of suspicion. For each mammogram, MammoScreen 4 outputs the detected findings with their associated score, a score per breast, driven by the highest finding score for each breast, and a score per case, driven by the highest finding score overall. The MammoScreen Score goes from one to ten.
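The aggregation rule just described (each breast score driven by the highest finding score, and the case score by the highest score overall) is simple to state in code. A minimal sketch with a hypothetical finding schema; only the max rule and the 1-10 scale come from the text:

```python
def aggregate_scores(findings):
    """findings: list of {"breast": "L" or "R", "score": 1..10} dicts (hypothetical schema)."""
    per_breast = {}
    for f in findings:
        # each breast score is the max of its finding scores
        per_breast[f["breast"]] = max(per_breast.get(f["breast"], 1), f["score"])
    # the case score is the highest finding score overall; the floor of 1 for a
    # breast with no findings is an assumption, not stated in the text
    case_score = max(per_breast.values(), default=1)
    return per_breast, case_score

# Example: two left-breast findings (scores 4 and 7) and one right (score 2)
per_breast, case = aggregate_scores([
    {"breast": "L", "score": 4}, {"breast": "L", "score": 7}, {"breast": "R", "score": 2},
])
# per_breast == {"L": 7, "R": 2}; case == 7
```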
MammoScreen 4 is available for 2D (FFDM images) and 3D processing (FFDM & DBT or 2DSM & DBT). Optionally, MammoScreen 4 can use prior examinations in the analysis.
The results indicating potential breast cancer, identified by MammoScreen 4, are accessible via a dedicated user interface and can seamlessly integrate into DICOM viewers (using DICOM-SC and DICOM-SR). Reporting aid outputs can be incorporated into the practice's reporting system to generate a preliminary report.
Note that the MammoScreen 4 outputs should be used as complementary information by radiologists while interpreting mammograms. For all cases, the medical professional interpreting the mammogram remains the sole decision-maker.
The provided text describes the acceptance criteria and a study to prove that MammoScreen® 4 meets these criteria. Here is a breakdown of the requested information:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
Rationale for using "MammoScreen 2" data for comparison: The document states that the standalone testing for MammoScreen 4 compared its performance against "MammoScreen 2 on Dimension". While MammoScreen 3 is the predicate device, the provided performance data in the standalone test section specifically refers to MammoScreen 2. The PCCP section later references performance targets for MammoScreen versions 1, 2, and 3, but the actual "Primary endpoint" results for the current device validation are given in comparison to MammoScreen 2. Therefore, the table below uses the reported performance against MammoScreen 2 as per the "Primary endpoint" section.
Metric | Acceptance Criteria | Reported Device Performance (MammoScreen 4 vs. MammoScreen 2) |
---|---|---|
Primary Objective | Non-inferiority in standalone cancer detection performance compared to the previous version of MammoScreen (specifically MammoScreen 2 on Dimension). | Achieved. |
AUC at the mammogram level | Positive lower bound of the 95% CI of the difference in endpoints between MammoScreen 4 and MammoScreen 2. | MS4: 0.894 (0.870, 0.919); MS2: 0.867 (0.839, 0.896); Δ: 0.027 (0.002, 0.052), statistically significant (the exact p-value is cut off in the provided text, but the CI excluding zero implies p < 0.05). |
(193 days)
QDQ
QP-Prostate® CAD is a Computer-Aided Detection and Diagnosis (CADe/CADx) image processing software that automatically detects and identifies suspected lesions in the prostate gland based on bi-parametric prostate MRI. The software is intended to be used as a concurrent read by physicians with proper training in a clinical setting as an aid for interpreting prostate MRI studies. The results can be displayed in a variety of DICOM outputs, including identified suspected lesions marked as an overlay onto source MR images. The output can be displayed on third-party DICOM workstations and Picture Archive and Communication Systems (PACS). Patient management decisions should not be based solely on the results of QP-Prostate® CAD.
QP-Prostate® CAD is an artificial intelligence-based Computer-Aided Detection and Diagnosis (CADe/CADx) image processing software. QP-Prostate® CAD uses AI-based algorithms trained with pathology data to detect suspicious lesions for clinically significant prostate cancer. The device automatically detects and identifies suspected lesions in the prostate gland based on bi-parametric prostate MRI and provides marks over regions of the images that may contain suspected lesions. There are two possible markers that are provided in different colors suggesting different levels of suspicion of clinically significant prostate cancer (moderate or high suspicion level).
The software is intended to be used as a concurrent read by physicians with proper training in a clinical setting as an aid for interpreting prostate MRI studies. The results can be displayed in a variety of DICOM outputs, including identified suspected lesions marked as an overlay onto source MR images. The output can be displayed on third-party DICOM workstations and Picture Archive and Communication Systems (PACS). Based on biparametric input consisting of T2W and DWI series, the analysis is run automatically, and the output in standard DICOM formats is returned to PACS.
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
Acceptance Criteria and Reported Device Performance
Table 1: Acceptance Criteria and Reported Device Performance (Standalone)
Metric (lesion level) | Acceptance Criterion (Implicit) | Reported Device Performance |
---|---|---|
AUC-ROC | Evidence of good discriminatory ability (e.g., above a certain threshold) | 0.732 (95% CI: 0.668-0.791) |
Sensitivity (high suspicion marker) | Evidence of good detection rate for clinically significant findings | 0.677 (95% CI: 0.593-0.761) |
False Positive Rate per Case (high suspicion marker, any biopsy source) | Evidence of acceptable false positive rate | 0.417 (95% CI: 0.313-0.522) |
Sensitivity (high and moderate suspicion markers) | Evidence of good detection rate for clinically significant findings | 0.795 (95% CI: 0.722-0.861) |
False Positive Rate per Case (high and moderate suspicion markers, any biopsy source) | Evidence of acceptable false positive rate | 0.855 (95% CI: 0.709-0.996) |
Note: The document does not explicitly state numerical acceptance criteria thresholds for the standalone performance metrics (AUC-ROC, Sensitivity, FPR). Instead, it presents the results and implies that these values "demonstrate the safety and effectiveness" in comparison to the predicate device. The general implicit acceptance criterion for these metrics would be that they exhibit performance levels indicative of a useful diagnostic aid.
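For reference, the lesion-level metrics in Table 1 follow the standard CADe definitions (the letter does not restate them):

$$\text{Sensitivity} = \frac{\#\,\text{ground-truth lesions correctly marked by the device}}{\#\,\text{ground-truth lesions}}, \qquad \text{FP rate per case} = \frac{\#\,\text{false-positive marks}}{\#\,\text{cases}}$$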
Table 2: Acceptance Criteria and Reported Device Performance (Multi-Reader Multi-Case Study)
Metric | Acceptance Criterion (Explicit) | Reported Device Performance |
---|---|---|
ΔAUC (AUCaided − AUCunaided) (Primary Endpoint) | A statistically significant improvement in AUC with AI assistance (the p-value threshold is cut off in the provided text). | Not legible in the provided text. |
(258 days)
QDQ
Prostate MR AI is a plug-in Radiological Computer Assisted Detection and Diagnosis Software device intended to be used:

- with a separate hosting application
- as a concurrent reading aid to assist radiologists in the interpretation of a prostate MRI examination acquired according to the PI-RADS standard
- in adult men (40 years and older) with suspected cancer in treatment naïve prostate glands

The plug-in software analyzes non-contrast T2 weighted (T2W) and diffusion weighted image (DWI) series to segment the prostate gland and to provide an automatic detection and segmentation of regions suspicious for cancer. For each suspicious region detected, the algorithm moreover provides a lesion Score, by way of PI-RADS interpretation suggestion. Outputs of the device should be interpreted consistently with ACR recommendations using all available MR data (e.g., dynamic contrast enhanced images [if available]). Patient management decisions should not be made solely based on analysis by the Prostate MR AI algorithm.
This premarket notification addresses the Siemens Healthineers Prostate MR AI (VA10A) Radiological Computer Assisted Detection and Diagnosis Software (CADe/CADx). Prostate MR AI is a Computer Assisted Detection and Diagnosis algorithm designed to plug into a hosting workflow that assists radiologists in the detection of suspicious lesions and their classification. It is used as a concurrent reading aid to assist radiologists in the interpretation of a prostate MRI examination acquired according to the PI-RADS standard. The automatic lesion detection requires transversal T2W and DWI series as inputs. The device automatically exports a list of detected prostate regions that are suspicious for cancer (each list entry consists of contours and a classification by Score and Level of Suspicion (LoS)), a computed suspicion map, and a per-case LoS. The results of the Prostate MR AI plug-in (with the case-level LoS, lesion center points, lesion diameters, lesion ADC median, lesion 10th percentile, suspicion map, and non-PZ segmentation considered optional) are to be shown in a hosting application that allows the radiologist to view the original case, as well as confirm, reject, or edit lesion candidates with their contours and Scores as generated by the Prostate MR AI plug-in. Moreover, the radiologist can add lesions with contours and PI-RADS scores and finalize the case. In addition, the outputs include an automatically computed prostate segmentation, as well as sub-segmentations of the peripheral zone and the rest of the prostate (non-PZ). The algorithm will augment the prostate workflow of currently cleared syngo.MR General Engine if activated via a separate license on the General Engine.
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
Acceptance Criteria and Reported Device Performance
Acceptance Criteria | Reported Device Performance |
---|---|
Automatic Prostate Segmentation | |
Median Dice score between AI algorithm results and ground truth masks exceeds 0.9. | The median of the Dice score between the AI algorithm results and the corresponding ground truth masks exceeds the threshold of 0.9. |
Median normalized volume difference between algorithm results and ground truth masks is within ±5%. | The median of the normalized volume difference between the algorithm results and the corresponding ground truth masks is within a ±5% range. |
AI algorithm results are statistically non-inferior to individual reader variability (5% margin of error, 5% significance level). | The AI algorithm results as compared to any individual reader are statistically non-inferior based on variabilities that existed among the individual readers within the 5% margin of error and 5% significance level. |
Prostate Lesion Detection and Classification | |
Case-level sensitivity of lesion detection ≥ 0.80 for both radiology and pathology ground truth. | The case-level sensitivity of the lesion detection is equal or greater than 0.80 for both radiology and pathology ground truth. |
False positive rate per case of lesion detection below the pre-specified threshold (the threshold and the reported result are cut off in the provided text). | Not legible in the provided text. |
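Both segmentation criteria in the table use standard definitions. A minimal numpy sketch, assuming binary masks on the same voxel grid (the submission's exact computation is not shown):

```python
import numpy as np

def dice_score(ai_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Dice overlap between an AI mask and a ground-truth mask (acceptance: median > 0.9)."""
    ai, gt = ai_mask.astype(bool), gt_mask.astype(bool)
    intersection = np.logical_and(ai, gt).sum()
    return 2.0 * intersection / (ai.sum() + gt.sum())

def normalized_volume_difference(ai_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Signed volume difference relative to ground truth (acceptance: median within ±5%)."""
    v_ai = ai_mask.astype(bool).sum()
    v_gt = gt_mask.astype(bool).sum()
    return (v_ai - v_gt) / v_gt
```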
(20 days)
QDQ
Saige-Dx analyzes digital breast tomosynthesis (DBT) mammograms to identify the presence or absence of soft tissue lesions and calcifications that may be indicative of cancer. For a given DBT mammogram, Saige-Dx analyzes the DBT image stacks and the accompanying 2D images, including full field digital mammography and/or synthetic images. The system assigns a Suspicion Level, indicating the strength of suspicion that cancer may be present, for each detected finding and for the entire case. The outputs of Saige-Dx are intended to be used as a concurrent reading aid for interpreting physicians on screening mammograms with compatible DBT hardware.
Saige-Dx is a software device that processes screening mammograms using artificial intelligence to aid interpreting radiologists. By automatically detecting the presence or absence of soft tissue lesions and calcifications in mammography images, Saige-Dx can help improve reader performance, while also reducing reading time. The software takes as input a set of x-ray mammogram DICOM files from a single digital breast tomosynthesis (DBT) study and generates finding-level outputs for each image analyzed, as well as an aggregate case-level assessment. Saige-Dx processes both the DBT image stacks and the associated 2D images (full-field digital mammography (FFDM) and/or synthetic 2D images) in a DBT study. For each image, Saige-Dx outputs bounding boxes circumscribing any detected findings and assigns a Finding Suspicion Level to each finding, indicating the degree of suspicion that the finding is malignant. Saige-Dx uses the results of the finding-level analysis to generate a Case Suspicion Level, indicating the degree of suspicion for malignancy across the case. Saige-Dx encapsulates the finding and case-level results into a DICOM Structured Report (SR) object containing markings that can be overlaid on the original mammogram images using a viewing workstation and a DICOM Secondary Capture (SC) object containing a summary report of the Saige-Dx results.
The provided text describes the Saige-Dx (v.3.1.0) device and its performance testing as part of an FDA 510(k) submission (K243688). However, it does not contain specific acceptance criteria values or the quantitative results of the device's performance against those criteria. It states that "All tests met the pre-specified performance criteria," but does not list those criteria or the measured performance metrics.
Therefore, while I can extract information related to the different aspects of the study, I cannot create a table of acceptance criteria and reported device performance with specific values.
Here's a breakdown of the information available based on your request:
1. A table of acceptance criteria and the reported device performance
- Acceptance Criteria: Not explicitly stated in quantitative terms. The document only mentions that "All tests met the pre-specified performance criteria."
- Reported Device Performance: Not explicitly stated in quantitative terms (e.g., specific sensitivity, specificity, AUC values, or improvements in human reader performance).
2. Sample size used for the test set and the data provenance (e.g., country of origin of the data, retrospective or prospective)
- Test Set Sample Size: Not explicitly stated for the validation performance study. The text mentions "Validation of the software was previously conducted using a multi-reader multi-case (MRMC) study and standalone performance testing conducted under approved IRB protocols (K220105 and K241747)." It also mentions that the tests included "DBT screening mammograms with Hologic standard definition and HD images, GE images, exams with unilateral breasts, and from patients with breast implants (on implant displaced views)."
- Data Provenance: The data for the training set was collected from "multiple vendors including GE and Hologic equipment" and from "diverse practices with the majority from geographically diverse areas within the United States, including New York and California." For the test set, it is implied to be similar in nature as it's part of the overall "performance testing," but specific details for the test set alone are not provided regarding country of origin or retrospective/prospective nature. However, since it involves IRB protocols, it suggests a structured, likely prospective collection or at least a carefully curated retrospective collection.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts (e.g. radiologist with 10 years of experience)
- Not explicitly stated for the test set. The document indicates that a Multi-Reader Multi-Case (MRMC) study was performed, which implies the involvement of expert readers, but the number of experts and their qualifications are not detailed.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set
- Not explicitly stated for the test set. The involvement of an MRMC study suggests a structured interpretation process, potentially including adjudication, but the method (e.g., consensus, majority rule with an adjudicator) is not described.
5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance
- Yes, an MRMC study was done: "Validation of the software was previously conducted using a multi-reader multi-case (MRMC) study..."
- Effect Size: The document does not provide the quantitative effect size of how much human readers improved with AI vs. without AI assistance. It broadly states that Saige-Dx "can help improve reader performance, while also reducing time."
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
- Yes, standalone performance testing was done: "...and standalone performance testing conducted under approved IRB protocols..."
- Results: The document states that "All tests met the pre-specified performance criteria" for the standalone performance, but does not provide the specific quantitative results (e.g., sensitivity, specificity, AUC).
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc)
- Not explicitly stated. For a device identifying "soft tissue lesions and calcifications that may be indicative of cancer," ground truth would typically involve a combination of biopsy/pathology results, clinical follow-up, and potentially expert consensus on imaging in cases without definitive pathology. However, the document doesn't specify the exact method for establishing ground truth for either the training or test sets.
8. The sample size for the training set
- Training Set Sample Size: "A total of nine datasets comprising 141,768 patients and 316,166 studies were collected..."
9. How the ground truth for the training set was established
- Not explicitly stated. The document mentions the collection of diverse datasets for training but does not detail how the ground truth for these 141,768 patients and 316,166 studies was established (e.g., through radiologists' interpretations, pathology reports, clinical outcomes).
(153 days)
QDQ
Transpara software is intended for use as a concurrent reading aid for physicians interpreting screening full-field digital mammography exams and digital breast tomosynthesis exams from compatible FFDM and DBT systems, to identify regions suspicious for breast cancer and assess their likelihood of malignancy. Output of the device includes locations of calcifications groups and soft-tissue regions, with scores indicating the likelihood that cancer is present, and an exam score indicating the likelihood that cancer is present in the exam. Patient management decisions should not be made solely on the basis of analysis by Transpara.
Transpara is a software-only application designed to be used by physicians to improve interpretation of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT). Deep learning algorithms are applied to images for recognition of suspicious calcifications and soft tissue lesions (including densities, masses, architectural distortions, and asymmetries). Algorithms are trained with a large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue.
Transpara offers the following functions which may be used at any time in the reading process, to improve detection and characterization of abnormalities and enhance workflow:
- AI findings for display in the images to highlight locations where the device detects suspicious calcifications or soft tissue lesions, along with region scores per finding on a scale ranging from 1-100, with higher scores indicating a higher level of suspicion.
- Links between corresponding regions in different views of the breast, which may be utilized to enhance user interfaces and workflow.
- An exam-based score which categorizes exams with increasing likelihood of cancer on a scale of 1-10 or in three risk categories labeled as 'low', 'intermediate' or 'elevated'.
The concurrent use indication implies that it is up to the users to decide how to use Transpara in the reading process. Transpara functions can be used before, during or after visual interpretation of an exam by a user.
Results of Transpara are computed in a standalone processing appliance which accepts mammograms in DICOM format as input, processes them, and sends the processing output to a destination using the DICOM protocol in a standardized mammography CAD DICOM format. Common destinations are medical workstations, PACS and RIS. The system can be configured using a service interface. Implementation of a user interface for end users in a medical workstation is to be provided by third parties.
The provided text describes the acceptance criteria and a study that proves the device, Transpara (2.1.0), meets these criteria.
Here's an organized breakdown of the information requested:
Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the reported performance metrics. The study aims to demonstrate non-inferiority and superiority to the predicate device, Transpara 1.7.2. The key metrics reported are sensitivity at various specificity levels and Exam-based Area Under the Receiver Operating Characteristic Curve (AUC).
Table 1: Acceptance Criteria (Implied by Performance Goals) and Reported Device Performance (Standalone without Temporal Analysis)
Metric | Acceptance Criteria (Implied/Target) | Reported Performance (FFDM) | Reported Performance (DBT) |
---|---|---|---|
Sensitivity (Sensitive Mode @ 70% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 (quantitative value not specified, but implied by comparison) | 97.4% (96.3 - 98.5) | 96.9% (95.5 - 98.3) |
Sensitivity (Specific Mode @ 80% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 | 95.2% (93.7 - 96.7) | 95.1% (93.3 - 96.8) |
Sensitivity (Elevated Risk @ 97% Specificity) | Non-inferior & Superior to Predicate Device 1.7.2 | 80.8% (78.0 - 83.6) | 78.4% (75.1 - 81.7) |
Exam-based AUC | Non-inferior & Superior to Predicate Device 1.7.2 | 0.960 (0.953 - 0.966) | 0.955 (0.947 - 0.963) |
Table 2: Acceptance Criteria (Implied by Performance Goals) and Reported Device Performance (Standalone with Temporal Analysis - TA)
Metric | Acceptance Criteria (Implied/Target) | Reported Performance (FFDM with TA) | Reported Performance (DBT with TA) |
---|---|---|---|
Sensitivity (Sensitive Mode @ 70% Specificity) | Superior to performance without temporal comparison | 95.7% (93.7 - 97.6) | 94.6% (91.2 - 98.0) |
Sensitivity (Specific Mode @ 80% Specificity) | Superior to performance without temporal comparison | 95.4% (93.4 - 97.4) | 91.0% (86.7 - 95.4) |
Sensitivity (Elevated Risk @ 97% Specificity) | Superior to performance without temporal comparison | 82.7% (79.1 - 86.4) | 74.9% (68.3 - 81.4) |
Exam-based AUC | Superior to performance without temporal comparison | 0.958 (0.946 - 0.969) | 0.941 (0.921 - 0.958) |
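The operating points in these tables (sensitivity at 70%, 80%, and 97% specificity) correspond to thresholds on the exam score. A minimal sketch of how such a point can be read off held-out scores, assuming binary labels and continuous scores; this is illustrative and is not Transpara's actual calibration procedure:

```python
import numpy as np

def sensitivity_at_specificity(y, scores, target_specificity=0.70):
    """Pick the score threshold that achieves the target specificity on
    non-cancer exams, then measure sensitivity on cancer exams at that threshold."""
    y, scores = np.asarray(y), np.asarray(scores)
    # the target_specificity quantile of negative scores leaves that fraction
    # of negatives below the threshold, i.e. correctly called negative
    threshold = np.quantile(scores[y == 0], target_specificity)
    sensitivity = float((scores[y == 1] > threshold).mean())
    return sensitivity, threshold
```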
Study Details
1. Sample Size Used for the Test Set and Data Provenance:
- Main Test Set (without temporal analysis): 10,207 exams (5,730 FFDM, 4,477 DBT).
- Normal: 8,587 exams
- Benign: 270 exams
- Cancer: 1,350 exams (750 FFDM, 600 DBT)
- Temporal Analysis Test Set: 5,724 exams (4,266 FFDM, 1,458 DBT).
- Normal: 4,998 exams
- Benign: 83 exams
- Cancer: 643 exams (471 FFDM, 172 DBT)
- Data Provenance: Independent dataset acquired from multiple centers in seven EU countries and the US. Retrospective in nature: the data were acquired separately from algorithm development, and normal exams required at least one year of follow-up. The data included images from various manufacturers (Hologic, GE, Philips, Siemens, Fujifilm).
- Main Test Set (without temporal analysis): 10,207 exams (5,730 FFDM, 4,477 DBT).
2. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of those Experts:
- The document states that the cancer cases in the test set were "biopsy-proven cancer." It does not specify the number or qualifications of experts used to establish the ground truth for the entire test set (including normal and benign cases, and detailed lesion characteristics). The mechanism for establishing the "normal" and "benign" status is not explicitly detailed beyond "normal follow-up of at least one year."
3. Adjudication Method for the Test Set:
- The document does not explicitly describe an adjudication method involving multiple readers for establishing ground truth for the test set. The ground truth for cancer cases is stated as "biopsy-proven."
4. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done:
- No, the document does not describe a Multi-Reader Multi-Case (MRMC) comparative effectiveness study. The performance assessment is a standalone evaluation of the algorithm's performance, not a human-in-the-loop study comparing human readers with and without AI assistance.
5. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, standalone performance tests were conducted. The results presented in Tables 2 and 5 are for the algorithm's performance only.
6. The Type of Ground Truth Used:
- The primary ground truth for cancer cases is biopsy-proven cancer. For normal exams within the test set, the ground truth was established by "a normal follow-up of at least one year," implying outcomes data (absence of diagnosed cancer over a follow-up period).
7. The Sample Size for the Training Set:
- The document does not explicitly state the sample size of the training set. It mentions "Deep learning algorithms are applied to images for recognition of suspicious calcifications and soft tissue lesions... Algorithms are trained with a large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue."
8. How the Ground Truth for the Training Set Was Established:
- The ground truth for the training set was established using a "large database of biopsy-proven examples of breast cancer, benign abnormalities, and examples of normal tissue." This implies a similar methodology to the test set for cancer cases (biopsy verification) and likely clinical follow-up or expert consensus for benign/normal cases, though not explicitly detailed for the training set.
(153 days)
QDQ
Saige-Dx analyzes digital breast tomosynthesis (DBT) mammograms to identify the presence or absence of soft tissue lesions and calcifications that may be indicative of cancer. For a given DBT mammogram, Saige-Dx analyzes the DBT image stacks and the accompanying 2D images, including full field digital mammography and/or synthetic images. The system assigns a Suspicion Level, indicating the strength of suspicion that cancer may be present, for each detected finding and for the entire case. The outputs of Saige-Dx are intended to be used as a concurrent reading aid for interpreting physicians on screening mammograms with compatible DBT hardware.
Saige-Dx is a software device that processes screening mammograms using artificial intelligence to aid interpreting radiologists. By automatically detecting the presence or absence of soft tissue lesions and calcifications in mammography images, Saige-Dx can help improve reader performance, while also reducing reading time. The software takes as input a set of x-ray mammogram DICOM files from a single digital breast tomosynthesis (DBT) study and generates finding-level outputs for each image analyzed, as well as an aggregate case-level assessment. Saige-Dx processes both the DBT image stacks and the associated 2D images (full-field digital mammography (FFDM) and/or synthetic 2D images) in a DBT study. For each image, Saige-Dx outputs bounding boxes circumscribing any detected findings and assigns a Finding Suspicion Level to each finding, indicating the degree of suspicion that the finding is malignant. Saige-Dx uses the results of the finding-level analysis to generate a Case Suspicion Level, indicating the degree of suspicion for malignancy across the case. Saige-Dx encapsulates the finding and case-level results into a DICOM Structured Report (SR) object containing markings that can be overlaid on the original mammogram images using a viewing workstation and a DICOM Secondary Capture (SC) object containing a summary report of the Saige-Dx results.
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. A table of acceptance criteria and the reported device performance
Acceptance Criteria (Endpoint) | Reported Device Performance |
---|---|
Substantial equivalence demonstrating non-inferiority of the subject device (Saige-Dx) on compatible exams compared to the predicate device's performance on previously compatible exams. | The study endpoint was met: the lower bound of the 95% CI around the delta AUC between Hologic and GE cases, compared to Hologic-only exams, was greater than the non-inferiority margin. Case-level AUC on compatible exams: 0.910 (95% CI: 0.886, 0.933). |
Generalizable standalone performance across confounders for GE and Hologic exams. | Demonstrated generalizable standalone performance on GE and Hologic exams across patient age, breast density, breast size, race, ethnicity, exam type, pathology classification, lesion size, and modality. |
Performance on Hologic HD images. | Met pre-specified performance criteria. |
Performance on unilateral breasts. | Met pre-specified performance criteria. |
Performance on breast implants (implant displaced views). | Met pre-specified performance criteria. |
2. Sample size used for the test set and the data provenance
- Sample Size: 1,804 women (236 cancer exams and 1,568 non-cancer exams).
- Data Provenance: Collected from 12 clinical sites across the United States. It's a retrospective dataset, as indicated by the description of cancer exams being confirmed by biopsy pathology and non-cancer exams by negatively interpreted subsequent screens.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- Number of Experts: At least two independent truthers, plus an additional adjudicator if needed (implying a minimum of two, potentially three).
- Qualifications of Experts: MQSA qualified, breast imaging specialists.
4. Adjudication method for the test set
- Adjudication Method: "Briefly, each cancer exam and supporting medical reports were reviewed by two independent truthers, plus an additional adjudicator if needed." This describes a 2+1 adjudication method.
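A minimal sketch of this 2+1 flow; function and label names are hypothetical, and the actual truthing protocol is only summarized in the text:

```python
def adjudicate_2_plus_1(truther_a, truther_b, adjudicator_read=None):
    """Two independent truthers label each exam; an adjudicator resolves
    disagreements. Labels are hypothetical, e.g. 'cancer' / 'non-cancer'."""
    if truther_a == truther_b:
        return truther_a                   # agreement: no adjudication needed
    if adjudicator_read is None:
        raise ValueError("Truthers disagree; an adjudicator read is required.")
    return adjudicator_read                # disagreement resolved by the third reader
```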
5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance
- The provided text describes a standalone performance study ("The pivotal study compared the standalone performance between the subject device"). It does not mention an MRMC comparative effectiveness study and therefore no effect size for human reader improvement with AI assistance is reported. The device is intended as a concurrent reading aid, but the reported study focused on the algorithm's standalone performance.
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
- Yes, a standalone performance study was done. The text states: "Validation of the software was performed using standalone performance testing..." and "The pivotal study compared the standalone performance between the subject device."
7. The type of ground truth used
- For Cancer Exams: Confirmed by biopsy pathology.
- For Non-Cancer Exams: Confirmed by a negatively interpreted exam on the subsequent screen and without malignant biopsy pathology.
- For Lesions: Lesions for cancer exams were established by MQSA qualified breast imaging specialists, likely based on radiological findings and pathology reports.
8. The sample size for the training set
- Sample Size: 121,348 patients and 122,252 studies.
9. How the ground truth for the training set was established
- The document does not explicitly detail the method for establishing ground truth for the training set. It mentions the training dataset was "robust and diverse." However, given the rigorous approach described for the test set's ground truth (biopsy pathology, negative subsequent screens, expert review), it is reasonable to infer a similar, if not identical, standard was applied to the training data. The text emphasizes "no exam overlap between the training and testing datasets," indicating a careful approach to data separation.
(269 days)
QDQ
ProFound Detection V4.0 is a computer-assisted detection and diagnosis (CAD) software device intended to be used concurrently by interpreting physicians while reading digital breast tomosynthesis (DBT) exams from compatible DBT systems. The device detects soft tissue densities (masses, architectural distortions and asymmetries) and calcifications in the 3D DBT slices. The detections and Certainty of Finding and Case Scores assist interpreting physicians in identifying soft tissue densities and calcifications that may be confirmed or dismissed by the interpreting physician.
ProFound Detection V4.0 is a computer-assisted detection and diagnosis (CAD) software device that detects malignant soft-tissue densities and calcifications in digital breast tomosynthesis (DBT) images. The ProFound Detection V4.0 software allows an interpreting physician to quickly identify suspicious soft tissue densities and calcifications by marking the detected areas in the tomosynthesis images. When the ProFound Detection V4.0 marks are displayed by a user, the marks will appear as overlays on the tomosynthesis images. Each detected finding will also be assigned a "score" that corresponds to the ProFound Detection V4.0 algorithm's confidence that the detected finding is a cancer (Certainty of Finding). Certainty of Finding scores are a percentage in range of 0% to 100% to indicate CAD's confidence that the finding is malignant. ProFound Detection V4.0 also assigns a score to each case (Case Score) as a percentage in range of 0% to 100% to indicate CAD's confidence that the case has malignant findings. The higher the Certainty of Finding or Case Score, the higher the confidence that the detected finding is a cancer or that the case has malignant findings.
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
Acceptance Criteria and Device Performance
The core acceptance criterion is non-inferiority to the predicate device (ProFound AI V3.0) on key performance metrics.
Table of Acceptance Criteria and Reported Device Performance
Metric | Acceptance Criteria (Non-inferior to Predicate) | Reported ProFound Detection V4.0 Performance (with priors) | Reported ProFound Detection V4.0 Performance (without priors) | Reported Predicate Performance (ProFound AI V3.0) |
---|---|---|---|---|
Sensitivity | Not inferior to 0.8725 | 0.9004 (0.8633-0.9374) | 0.9004 (0.8633-0.9374) | 0.8725 (0.8312-0.9138) |
Specificity | Not inferior to 0.5278 | 0.6205 (0.5846-0.6565) | 0.5863 (0.5498-0.6228) | 0.5278 (0.4909-0.5648) |
AUC | Not inferior to 0.8230 | 0.8753 (0.8475-0.9032) | 0.8714 (0.8423-0.9007) | 0.8230 (0.7878-0.8570) |
Summary of Performance vs. Criteria:
The study demonstrated that ProFound Detection V4.0, particularly when using prior images, achieved superior performance across all three metrics (Sensitivity, Specificity, and AUC) compared to the predicate device, thus meeting the non-inferiority acceptance criteria and additionally showing superiority in specificity.
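As a side note on the intervals in the table: the reported sensitivity of 0.9004 with CI (0.8633-0.9374) is numerically consistent with 226 of the 251 cancer cases being detected and a normal-approximation interval. The sketch below reproduces that arithmetic with statsmodels; the detected count and the interval method are inferred for illustration, not stated in the submission.

```python
from statsmodels.stats.proportion import proportion_confint

n_cancer = 251     # cancer cases in the test set (from the text)
detected = 226     # hypothetical count consistent with 226/251 = 0.9004

sens = detected / n_cancer
lo, hi = proportion_confint(detected, n_cancer, alpha=0.05, method="normal")
print(f"sensitivity = {sens:.4f} (95% CI {lo:.4f}-{hi:.4f})")
# -> sensitivity = 0.9004 (95% CI 0.8633-0.9374), matching the table to rounding
```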
Study Details
2. Sample size used for the test set and the data provenance:
- Sample Size: 952 cases
- 251 biopsy-proven cancer cases (with 256 malignant lesions)
- 701 non-cancer cases
- Data Provenance:
- Country of Origin: U.S. image acquisition sites
- Retrospective or Prospective: Retrospectively collected
- Independence: The data was collected from sites independent of those included in the training and development sets. iCAD ensured this independence by sequestering the data.
- Manufacturer: 100% Hologic DBT system exam data.
- Exam Dates: 2018 - 2022.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Number of Experts: The text states, "Each cancer case was a biopsy proven positive, truthed by an expert breast imaging radiologist". While it explicitly mentions "an expert breast imaging radiologist" in the singular for truthing, it does not specify the exact number of unique "expert breast imaging radiologists" involved in truthing the entire dataset or their specific years of experience.
4. Adjudication method (e.g., 2+1, 3+1, none) for the test set:
- The text does not specify a formal adjudication method (like 2+1 or 3+1) for establishing ground truth from multiple readers. Ground truth was established based on clinical data including radiology report, follow-up biopsy, and pathology data, and then truthed by an expert radiologist.
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, if so, what was the effect size of how much human readers improve with AI vs. without AI assistance:
- No, an MRMC comparative effectiveness study was NOT done. The study described is a standalone performance assessment of the AI algorithm itself, comparing it to a predicate AI algorithm. It does not evaluate the performance of human readers, either with or without AI assistance.
6. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, a standalone study was done. The text explicitly states: "A standalone study was conducted, which evaluated the performance of ProFound Detection version 4.0 without an interpreting physician." This study directly compared the algorithm's performance (V4.0) against the predicate (V3.0) on an independent test set.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The ground truth was a combination of biopsy-proven pathology data and clinical data, including radiology reports and follow-up data. Specifically, "These reference standards were derived from clinical data including radiology report, follow-up biopsy and pathology data. Each cancer case was a biopsy proven positive, truthed by an expert breast imaging radiologist who outlined the location and extent of cancer lesions in the case."
8. The sample size for the training set:
- The sample size for the training set is not provided. The text only refers to the test set being "independent of those included in the training and development" and that iCAD "ensures the independence of this dataset by sequestering the data and keeping it separate from the test and development datasets."
9. How the ground truth for the training set was established:
- How the ground truth for the training set was established is not explicitly detailed. The text mentions that the test set's ground truth was established by "biopsy proven cancer cases" and "truthed by an expert breast imaging radiologist." While it implies a similar process would likely be used for training data, the specific method for the training set's ground truth establishment is not provided in the submitted document.
(30 days)
QDQ
Lunit INSIGHT DBT is a computer-assisted detection and diagnosis (CADe/x) software intended to be used concurrently by interpreting physicians to aid in the detection and characterization of suspected lesions for breast cancer in digital breast tomosynthesis (DBT) exams from compatible DBT systems. Through the analysis, the regions of soft tissue lesions and calcifications are marked with an abnormality score indicating the likelihood of the presence of malignancy for each lesion. Lunit INSIGHT DBT uses screening mammograms of the female population.
Lunit INSIGHT DBT is not intended as a replacement for a complete interpreting physician's review or their clinical judgment that takes into account other relevant information from the image or patient history.
Lunit INSIGHT DBT is a computer-assisted detection/diagnosis (CADe/x) software as a medical device that provides information about the presence, location and characteristics of lesions suspicious for breast cancer to assist interpreting physicians in making diagnostic decisions when reading digital breast tomosynthesis (DBT) images. The software automatically analyzes digital breast tomosynthesis slices via artificial intelligence technology that has been trained via deep learning.
For each DBT case, Lunit INSIGHT DBT generates an artificial intelligence analysis results that include the lesion type, location, lesion-level/case-level score, and outline of the regions suspected of breast cancer. This peripheral information intends to augment the physician's workflow to better aid in detection and diagnosis of breast cancer.
The provided text describes the 510(k) submission for Lunit INSIGHT DBT v1.1, a computer-assisted detection and diagnosis (CADe/x) software for breast cancer in digital breast tomosynthesis (DBT) exams. The document primarily focuses on demonstrating substantial equivalence to its predicate device, Lunit INSIGHT DBT v1.0.
Here's an analysis of the acceptance criteria and the study that proves the device meets them, based on the provided text:
Acceptance Criteria and Reported Device Performance
The core acceptance criterion explicitly mentioned for the standalone performance testing is an AUROC (Area Under the Receiver Operating Characteristic curve) greater than 0.903. This is directly compared to the predicate device's performance.
Acceptance Criterion (Primary Endpoint) | Reported Device Performance (Lunit INSIGHT DBT v1.1) |
---|---|
AUROC in standalone performance > 0.903 | AUROC = 0.931 (95% CI: 0.920 - 0.941) |
Statistical Significance | Reported as statistically significant (the exact p-value is cut off in the provided text) |

In summary, the submission demonstrates that the device met its predefined acceptance criterion (standalone AUROC > 0.903), which was the same criterion used for its predicate device. However, it lacks detailed information regarding the specifics of the data used (sample sizes, provenance) and the ground truth establishment process (experts, adjudication), and the absence of an MRMC study is notable for a CADe/x device, though not explicitly required for this specific 510(k) submission, which highlights substantial equivalence based on standalone performance relative to a predicate.
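Checking an acceptance criterion like "standalone AUROC > 0.903" is mechanically simple once case-level scores and labels exist. A minimal sketch on synthetic data; only the 0.903 threshold comes from the text:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
labels = np.concatenate([np.ones(300), np.zeros(700)])   # synthetic case labels
scores = np.concatenate([rng.normal(2.0, 1.0, 300),      # synthetic cancer case scores
                         rng.normal(0.0, 1.0, 700)])     # synthetic non-cancer case scores

auroc = roc_auc_score(labels, scores)
print(f"standalone AUROC = {auroc:.3f}; criterion (> 0.903) met: {auroc > 0.903}")
```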
Page 1 of 3