K Number
DEN200080
Device Name
Paige Prostate
Manufacturer
Date Cleared
2021-09-21 (264 days)

Product Code
Regulation Number
864.3750
Type
Direct
Reference & Predicate Devices
N/A
Predicate For
N/A
AI/ML, SaMD, IVD (In Vitro Diagnostic), Therapeutic, Diagnostic, is PCCP Authorized, Third party, Expedited review
Intended Use

Paige Prostate is a software-only device intended to assist pathologists in the detection of foci that are suspicious for cancer during the review of scanned whole slide images (WSI) from prostate needle biopsies prepared from hematoxylin & eosin (H&E) stained, formalin-fixed, paraffin-embedded (FFPE) tissue. After initial diagnostic review of the WSI by the pathologist, if Paige Prostate detects tissue morphology suspicious for cancer, it provides the coordinates (X,Y) of a single location on the image with the highest likelihood of having cancer for further review by the pathologist.

Paige Prostate is intended to be used with slide images digitized with Philips Ultra Fast Scanner and visualized with Paige FullFocus WSI viewing software.

Paige Prostate is an adjunctive computer-assisted methodology and its output should not be used as the primary diagnosis. Pathologists should only use Paige Prostate in conjunction with their complete standard of care evaluation of the slide image.

Device Description

Paige Prostate is in vitro diagnostic medical device software derived from a deterministic deep learning system developed with digitized WSIs of H&E-stained prostate needle biopsy slides.

Paige Prostate utilizes several accessory devices (shown in Figure 1 of the submission) for automated ingestion of the input. The device identifies areas suspicious for cancer on the input WSIs. For each input WSI, Paige Prostate automatically analyzes the image and outputs the following:

  • Binary classification of suspicious or not suspicious for cancer, based on a pre-defined threshold on the neural network output.
  • If the slide is classified as suspicious for cancer, a single coordinate (X,Y) of the location with the highest probability of cancer on that image.
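For readers unfamiliar with this type of output, the slide-level call and the single reported coordinate amount to a thresholding and argmax step over the model's per-region scores. The sketch below is purely illustrative and is not Paige's implementation; the score map, the threshold value, and all function names are assumptions.

```python
import numpy as np

# Hypothetical post-processing sketch (not Paige's implementation).
# `score_map` is assumed to be a 2-D array of per-region cancer
# probabilities produced by the neural network for one WSI.

SUSPICIOUS_THRESHOLD = 0.5  # assumed pre-defined operating point


def classify_and_localize(score_map: np.ndarray, threshold: float = SUSPICIOUS_THRESHOLD):
    """Return a binary slide-level call and, if suspicious, the (x, y)
    location of the highest-probability region."""
    max_score = float(score_map.max())
    if max_score < threshold:               # binary classification
        return "not suspicious", None
    # argmax gives (row, col); report as (x, y) = (col, row)
    row, col = np.unravel_index(int(score_map.argmax()), score_map.shape)
    return "suspicious", (int(col), int(row))


# Example usage with a toy score map
scores = np.array([[0.05, 0.10, 0.02],
                   [0.08, 0.97, 0.30],
                   [0.01, 0.20, 0.04]])
label, xy = classify_and_localize(scores)
print(label, xy)  # suspicious (1, 1)
```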
AI/ML Overview

Here's a breakdown of the acceptance criteria and the study details for Paige Prostate, based on the provided text:


Acceptance Criteria and Reported Device Performance

| Acceptance Criteria (Study) | Reported Device Performance | Comments |
|---|---|---|
| Algorithm Localization (X,Y Coordinate) and Accuracy Study | Sensitivity: 94.5% (95% CI: 91.4%; 96.6%); Specificity: 94.0% (95% CI: 91.3%; 95.9%) | Evaluated the standalone performance of the algorithm in identifying suspicious foci and localizing them. |
| Precision Study (Within-scanner) | Cancer slides: probability of a "Cancer" result with the same scanner/operator 99.0% (95% CI: 94.8%; 99.8%); Benign slides: probability of a "Benign" result with the same scanner/operator 94.4% (95% CI: 88.4%; 97.4%) | Assessed the consistency of the device's output under repeated scans by the same operator on the same scanner. |
| Precision Study (Reproducibility: between-scanner and between-operator) | Cancer slides: probability of a "Cancer" result with different scanners/operators 100% (95% CI: 96.5%; 100%); Benign slides: probability of a "Benign" result with different scanners/operators 93.5% (95% CI: 87.2%; 96.8%) | Assessed the consistency of the device's output across different scanners and operators. |
| Localization Precision Study | Location correct (within-scanner, Op1/Sc1): 98.2% (56/57) (95% CI: 90.7%; 99.7%); Location correct (3 scanners, 3 operators): 96.4% (53/55) (95% CI: 87.7%; 99.0%) | Focused specifically on the precision of the (X,Y) coordinate localization. |
| Clinical Study (Pathologist Performance with AI Assistance) | Average improvement in sensitivity: 7.3% (95% CI: 3.9%; 11.4%), statistically significant; Average difference in specificity: 1.1% (95% CI: -0.7%; 3.4%), not statistically significant | Measured the efficacy of Paige Prostate as an adjunctive tool for pathologists. "Positive" was defined as "deferred" or "cancer", and "negative" as "benign". |
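For orientation, slide-level sensitivity and specificity estimates with two-sided 95% confidence intervals such as those in the table are computed from a confusion matrix. The sketch below uses the Wilson score interval and counts chosen to roughly match the reported point estimates; the submission does not state the exact CI method or the underlying counts, so treat both as assumptions.

```python
from math import sqrt


def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Point estimate and two-sided Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half


# Illustrative counts only (not taken from the submission):
# sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
tp, fn, tn, fp = 294, 17, 392, 25
for name, k, n in [("sensitivity", tp, tp + fn), ("specificity", tn, tn + fp)]:
    est, lo, hi = wilson_ci(k, n)
    print(f"{name}: {est:.1%} (95% CI: {lo:.1%}; {hi:.1%})")
```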

Study Information

2. Sample size used for the test set and the data provenance:

  • Algorithm Localization and Accuracy Study:

    • Test Set Size: 728 WSIs (311 cancer, 417 benign) from unique patients.
    • Provenance: De-identified WSIs from:
      • Consecutive prostate cancer slides from an internal site (located in US).
      • Challenging cancer slides (≤0.5mm tumor) from an internal site.
      • Consecutive cancer slides submitted from external sites.
      • Challenging cancer slides submitted from external sites.
      • Benign slides from consecutive prostate biopsy cases from an internal site.
      • Consecutive benign slides submitted from external sites (submitted to internal site for expert consultation).
    • External Sites: Included 217 different sites located throughout the world (including US).
    • Retrospective/Prospective: Retrospective.
  • Precision Study:

    • Test Set Size: 35 cancer WSIs and 36 benign WSIs from unique patients.
    • Provenance: Slides from an internal site and external sites (217 different sites).
    • Retrospective/Prospective: Retrospective.
  • Clinical Study:

    • Test Set Size: 527 WSIs (171 prostate cancer, 356 benign) from unique patients.
    • Provenance: 44.15% from cases prepared, reviewed, diagnosed, and digitized at the internal site (US). 55.85% from cases prepared at 156 different external sites but reviewed, diagnosed, and digitized at the internal site.
    • Retrospective/Prospective: Retrospective.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

  • Algorithm Localization and Accuracy Study (Localization Ground Truth):

    • Number of Experts: 3 study pathologists.
    • Qualifications: US board-certified pathologists (two had completed an anatomic pathology fellowship and one was a sub-specialized genitourinary pathologist). All were blinded to Paige Prostate results.
  • Clinical Study (Ground Truth for slide-level cancer/benign):

    • Number of Experts: Not explicitly stated; the ground truth implicitly relies on the original pathologists who generated the synoptic diagnostic reports.
    • Qualifications: Pathologists at the internal site who generated the synoptic diagnostic reports.

4. Adjudication method for the test set:

  • Algorithm Localization and Accuracy Study (Localization Ground Truth):

    • Adjudication Method: The union of annotations between at least 2 of the 3 annotating pathologists was used as the localization ground truth.
  • Clinical Study (Slide-Level Cancer/Benign Ground Truth):

    • Adjudication Method: "Synoptic diagnostic reports from the internal site were used to generate the ground truth for each slide as either cancer or no cancer." This implies a single, established diagnostic report rather than a consensus process for the study's ground truth.

5. Whether a multi-reader multi-case (MRMC) comparative effectiveness study was done and, if so, the effect size of how much human readers improved with AI versus without AI assistance:

  • Yes, an MRMC comparative effectiveness study was done (the "Clinical Study").
  • Effect Size of Improvement:
    • Average Improvement in Sensitivity: 7.3% (95% CI: 3.9%; 11.4%)
    • Average Difference in Specificity: 1.1% (95% CI: -0.7%; 3.4%)
    • The document clarifies that this is an average across 16 pathologists.
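For context on how such an effect size is typically derived, the figure is the mean of the per-reader difference in sensitivity (assisted minus unassisted) across the 16 pathologists. The toy sketch below uses invented per-reader values and a simple percentile bootstrap for the interval; the submission's actual statistical model is not described in this summary, so this is an assumption-laden illustration only.

```python
import numpy as np

# Toy illustration of an average paired improvement across readers.
# The per-reader sensitivities below are invented, NOT study data.
rng = np.random.default_rng(0)
unassisted = rng.uniform(0.70, 0.85, size=16)              # sensitivity without AI
assisted = unassisted + rng.uniform(0.02, 0.12, size=16)   # sensitivity with AI assistance

delta = assisted - unassisted
mean_improvement = delta.mean()

# Simple percentile bootstrap over readers for a 95% CI (one of several
# possible approaches; the submission's exact analysis is not stated).
boot = [rng.choice(delta, size=delta.size, replace=True).mean() for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean improvement {mean_improvement:.1%} (95% CI: {lo:.1%}; {hi:.1%})")
```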

6. Whether a standalone study (i.e., algorithm-only performance without a human in the loop) was done:

  • Yes, a standalone performance study was done. This is detailed in the "Analytical Performance" section, specifically the "Algorithm Localization (X,Y Coordinate) and Accuracy Study."
    • Sensitivity (Standalone): 94.5%
    • Specificity (Standalone): 94.0%

7. The type of ground truth used:

  • Algorithm Localization and Accuracy Study (Slide-Level Cancer Ground Truth): Synoptic pathology diagnostic reports from the internal site.
  • Algorithm Localization and Accuracy Study (Localization Ground Truth): Consensus of 3 US board-certified pathologists who manually annotated image patches.
  • Precision Study (Slide-Level Cancer Ground Truth): Synoptic diagnostic reports from the internal site.
  • Clinical Study (Slide-Level Cancer/Benign Ground Truth): Original diagnostic synoptic reports.

8. The sample size for the training set:

  • Training Dataset: 33,543 slide images.

9. How the ground truth for the training set was established:

  • "De-identified slides were labeled as benign or cancer based on the synoptic diagnostic pathology report."

§ 864.3750 Software algorithm device to assist users in digital pathology.

(a) Identification. A software algorithm device to assist users in digital pathology is an in vitro diagnostic device intended to evaluate acquired scanned pathology whole slide images. The device uses software algorithms to provide information to the user about presence, location, and characteristics of areas of the image with clinical implications. Information from this device is intended to assist the user in determining a pathology diagnosis.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) The intended use on the device's label and labeling required under § 809.10 of this chapter must include:
(i) Specimen type;
(ii) Information on the device input(s) (e.g., scanned whole slide images (WSI), etc.);
(iii) Information on the device output(s) (e.g., format of the information provided by the device to the user that can be used to evaluate the WSI, etc.);
(iv) Intended users;
(v) Necessary input/output devices (e.g., WSI scanners, viewing software, etc.);
(vi) A limiting statement that addresses use of the device as an adjunct; and
(vii) A limiting statement that users should use the device in conjunction with complete standard of care evaluation of the WSI.
(2) The labeling required under § 809.10(b) of this chapter must include:
(i) A detailed description of the device, including the following:
(A) Detailed descriptions of the software device, including the detection/analysis algorithm, software design architecture, interaction with input/output devices, and necessary third-party software;
(B) Detailed descriptions of the intended user(s) and recommended training for safe use of the device; and
(C) Clear instructions about how to resolve device-related issues (e.g., cybersecurity or device malfunction issues).
(ii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics, medical history, user experience, and scanning equipment, as applicable.
(iii) Limiting statements that indicate:
(A) A description of situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), including any limitations in the dataset used to train, test, and tune the algorithm during device development;
(B) The data acquired using the device should only be interpreted by the types of users indicated in the intended use statement; and
(C) Qualified users should employ appropriate procedures and safeguards (e.g., quality control measures, etc.) to assure the validity of the interpretation of images obtained using this device.
(3) Design verification and validation must include:
(i) A detailed description of the device software, including its algorithm and its development, that includes a description of any datasets used to train, tune, or test the software algorithm. This detailed description of the device software must include:
(A) A detailed description of the technical performance assessment study protocols (e.g., regions of interest (ROI) localization study) and results used to assess the device output(s) (e.g., image overlays, image heatmaps, etc.);
(B) The training dataset must include cases representing different pre-analytical variables representative of the conditions likely to be encountered when used as intended (e.g., fixation type and time, histology slide processing techniques, challenging diagnostic cases, multiple sites, patient demographics, etc.);
(C) The number of WSI in an independent validation dataset must be appropriate to demonstrate device accuracy in detecting and localizing ROIs on scanned WSI, and must include subsets clinically relevant to the intended use of the device;
(D) Emergency recovery/backup functions, which must be included in the device design;
(E) System level architecture diagram with a matrix to depict the communication endpoints, communication protocols, and security protections for the device and its supportive systems, including any products or services that are included in the communication pathway; and
(F) A risk management plan, including a justification of how the cybersecurity vulnerabilities of third-party software and services are reduced by the device's risk management mitigations in order to address cybersecurity risks associated with key device functionality (such as loss of image, altered metadata, corrupted image data, degraded image quality, etc.). The risk management plan must also include how the device will be maintained on its intended platform (e.g., a general purpose computing platform, virtual machine, middleware, cloud-based computing services, medical device hardware, etc.), which includes how the software integrity will be maintained, how the software will be authenticated on the platform, how any reliance on the platform will be managed in order to facilitate implementation of cybersecurity controls (such as user authentication, communication encryption and authentication, etc.), and how the device will be protected when the underlying platform is not updated, such that the specific risks of the device are addressed (such as loss of image, altered metadata, corrupted image data, degraded image quality, etc.).
(ii) Data demonstrating acceptable, as determined by FDA, analytical device performance, by conducting analytical studies. For each analytical study, relevant details must be documented (e.g., the origin of the study slides and images, reader/annotator qualifications, method of annotation, location of the study site(s), challenging diagnoses, etc.). The analytical studies must include:
(A) Bench testing or technical testing to assess device output, such as localization of ROIs within a pre-specified threshold. Samples must be representative of the entire spectrum of challenging cases likely to be encountered when the device is used as intended; and
(B) Data from a precision study that demonstrates device performance when used with multiple input devices (e.g., WSI scanners) to assess total variability across operators, within-scanner, between-scanner and between-site, using clinical specimens with defined, clinically relevant, and challenging characteristics likely to be encountered when the device is used as intended. Samples must be representative of the entire spectrum of challenging cases likely to be encountered when the device is used as intended. Precision, including performance of the device and reproducibility, must be assessed by agreement between replicates.
(iii) Data demonstrating acceptable, as determined by FDA, clinical validation must be demonstrated by conducting studies with clinical specimens. For each clinical study, relevant details must be documented (e.g., the origin of the study slides and images, reader/annotator qualifications, method of annotation, location of the study site(s) (on-site/remote), challenging diagnoses, etc.). The studies must include:
(A) A study demonstrating the performance by the intended users with and without the software device (e.g., unassisted and device-assisted reading of scanned WSI of pathology slides). The study dataset must contain sufficient numbers of cases from relevant cohorts that are representative of the scope of patients likely to be encountered given the intended use of the device (e.g., subsets defined by clinically relevant confounders, challenging diagnoses, subsets with potential biopsy appearance modifiers, concomitant diseases, and subsets defined by image scanning characteristics, etc.) such that the performance estimates and confidence intervals for these individual subsets can be characterized. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., sensitivity, specificity, predictive value, diagnostic likelihood ratio, etc.).
(B) [Reserved]