Galen™ Second Read™ is a software-only device intended to analyze scanned histopathology whole slide images (WSIs) from prostate core needle biopsies (PCNB) prepared from hematoxylin and eosin (H&E)-stained, formalin-fixed, paraffin-embedded (FFPE) tissue. The device is intended to identify cases initially diagnosed as benign for further review by a pathologist. If Galen™ Second Read™ detects tissue morphology suspicious for prostate adenocarcinoma (AdC), it provides case- and slide-level alerts (flags), which include a heatmap of tissue areas in the WSI that are likely to contain cancer.
Galen™ Second Read™ is intended to be used with slide images digitized with the Philips Ultra Fast Scanner and visualized using the Galen™ Second Read™ user interface.
Galen™ Second Read™ outputs are not intended to be used on a standalone basis for diagnosis, to rule out prostatic AdC, or to preclude pathological assessment of WSIs according to the standard of care.
The Galen Second Read is an in vitro diagnostic medical device software, derived from a deterministic deep convolutional network, that was developed with digitized WSIs of H&E-stained prostate core needle biopsy (PCNB) slides originating from formalin-fixed, paraffin-embedded (FFPE) tissue sections that were initially diagnosed as benign by the pathologist.
The Galen Second Read is cloud-hosted and utilizes external accessories [e.g., scanner and image management systems (IMS)] for automatic ingestion of the input. The device identifies WSIs that are more likely to contain prostatic adenocarcinoma (AdC). For each input WSI, the Galen Second Read automatically analyzes the WSI and outputs the following:
- Binary classification of the likelihood (high/low) to contain AdC based on a predetermined threshold of the neural network output.
- For slides classified with high likelihood to contain AdC, slide-level findings are flagged and visualized (AdC score and heatmap) for additional review by a pathologist alongside the WSI.
- For slides classified as low likelihood to contain AdC, no additional output is available.
Galen Second Read key functionalities include image upload and analysis, flagging of slides with a high likelihood to contain AdC, and display of all WSIs uploaded to the system alongside their analysis results. Flagged findings constitute a recommendation for additional review by a pathologist.
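To make the output behavior above concrete, here is a minimal sketch of threshold-based flagging; the threshold value, field names, and data structures are illustrative assumptions, not the device's actual implementation.

```python
# Minimal sketch of the slide-level flagging described above: the network's
# AdC score for a WSI is compared with a predetermined threshold; only slides
# classified as high likelihood surface an AdC score and heatmap for review.
# Threshold value, names, and structures are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

import numpy as np

ADC_THRESHOLD = 0.5  # hypothetical predetermined operating point


@dataclass
class SlideResult:
    slide_id: str
    adc_score: float                      # neural-network output in [0, 1]
    flagged: bool                         # high vs. low likelihood of AdC
    heatmap: Optional[np.ndarray] = None  # per-region scores, kept only if flagged


def classify_slide(slide_id: str, adc_score: float,
                   region_scores: np.ndarray) -> SlideResult:
    """Binary classification of one WSI against the predetermined threshold."""
    flagged = adc_score >= ADC_THRESHOLD
    return SlideResult(
        slide_id=slide_id,
        adc_score=adc_score,
        flagged=flagged,
        # No additional output is produced for low-likelihood slides.
        heatmap=region_scores if flagged else None,
    )
```

In this sketch, flagged slides carry the score and heatmap for the pathologist to review alongside the WSI, while unflagged slides produce no additional output, matching the behavior listed above.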
Here's a breakdown of the acceptance criteria and study information for the Galen™ Second Read™ device, based on the provided text:
Acceptance Criteria and Device Performance
The document does not explicitly state pre-defined acceptance criteria with specific numerical targets. Instead, it presents the device's performance metrics from clinical studies. The implied acceptance criteria are that the device should improve the detection of prostatic adenocarcinoma (AdC) in initially benign cases when assisting pathologists.
Here are the reported device performance metrics from the provided studies:
Table 1: Device Performance (Clinical Study 1 - Standalone Performance)
| Parameter | Estimate | 95% CI | Context |
|---|---|---|---|
| Slide-Level | | | |
| Sensitivity | 81.0% | (69.2%; 92.9%) | Ability to correctly identify GT positive slides |
| Specificity | 91.6% | (90.9%; 92.3%) | Ability to correctly identify GT negative slides |
| Case-Level | | | |
| Sensitivity | 80.8% | (74.1%; 87.6%) | Ability to correctly identify GT positive cases |
| Specificity | 46.9% | (39.5%; 54.3%) | Ability to correctly identify GT negative cases |
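For reference, the estimates above follow the standard definitions of sensitivity and specificity relative to the ground truth (GT); the confidence-interval method is not specified in the provided text:

$$
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}
$$

where TP and FN count GT-positive slides (or cases) that the device did and did not flag, and TN and FP count GT-negative slides (or cases) that the device did not and did flag, respectively.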
Table 2: Device Performance (Clinical Study 2 - Human-in-the-Loop Performance)
| Parameter | Performance with Galen Second Read AI Assistance | Performance with Standard of Care (SoC) | Difference | 95% CI (Difference) |
|---|---|---|---|---|
| Combined Pathologists (Overall) | | | | |
| Sensitivity | 93.9% | 90.5% | 3.5% | (2.3%; 4.5%) |
| Specificity | 87.9% | 91.1% | -3.2% | (-4.3%; -1.9%) |
| For Slides Initially Assessed as Benign by Pathologists | | | | |
| Sensitivity | 36.3% | 0% (SoC) | 36.3% | (28.0%; 45.5%) |
| Specificity | 96.5% | 100% (SoC) | -3.5% (approx.) | (95.2%; 97.5%) |
Study Information:
1. Sample Size and Data Provenance
Analytical Performance Studies (Precision and Localization):
- Sample Size: Not explicitly stated as a single number for these studies; the tables show "n/N" values for positive and negative slides. For repeatability, there were 39 positive slides and 38 negative slides in each run (total for repeatability: 3 runs × 39 positive + 3 runs × 38 negative = 231 slide-reads, as tallied below this list). For reproducibility, each scanner/operator combination likewise used 39 positive and 38 negative slides.
- Data Provenance: Retrospectively collected, de-identified slides.
- Country of Origin: Not specified for these analytical studies.
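As a check on the repeatability count cited in the list above (the per-run slide counts come from the text; the tally itself is just arithmetic):

$$
3 \times 39\ \text{positive} + 3 \times 38\ \text{negative} = 117 + 114 = 231\ \text{slide-reads}
$$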
Clinical Performance Study 1 (Standalone Performance):
- Sample Size: 347 cases (initially diagnosed as benign) with associated whole slide images (WSIs).
- Data Provenance: Retrospectively collected samples.
- Country of Origin: Three sites: 2 US sites and 1 outside-the-US (OUS) site.
Clinical Performance Study 2 (Human-in-the-Loop Performance):
- Sample Size: 772 cases/slides (376 negative cases and 396 positive cases).
- Data Provenance: Retrospectively collected slides.
- Country of Origin: Four sites: 3 US sites and 1 OUS site.
2. Number of Experts and Qualifications for Test Set Ground Truth
Analytical Performance Studies:
- Number of Experts: Not explicitly stated, but "GT determined as 'positive', or 'benign' by the GT pathologists" implies multiple pathologists.
- Qualifications: "GT pathologists" - no specific experience level mentioned.
Clinical Performance Study 1 (Standalone Performance):
- Number of Experts: Two independent expert pathologists for initial review, with a third independent expert pathologist for tie-breaking.
- Qualifications: "Independent expert pathologists" - no specific experience level mentioned.
Clinical Performance Study 2 (Human-in-the-Loop Performance):
- Number of Experts: Not explicitly detailed for the GT determination for this specific study, but it is likely consistent with Study 1's method, as it shares similar retrospective data characteristics.
- Qualifications: Not explicitly detailed for the GT determination for this specific study.
3. Adjudication Method for the Test Set
Clinical Performance Study 1 (Standalone Performance):
- Adjudication Method: 2+1 (Two independent expert pathologists, with a third independent expert pathologist to review disagreements and determine the majority rule for the final ground truth).
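A minimal sketch of this 2+1 scheme follows; the function name and label values are illustrative, not taken from the study protocol.

```python
# Minimal sketch of a 2+1 adjudication scheme: two independent expert reads,
# with a third expert consulted only when the first two disagree, so the
# final ground truth always reflects a 2-of-3 majority.
# Function and label names are illustrative, not from the study protocol.
def adjudicate_2_plus_1(reader1: str, reader2: str, tiebreaker: str) -> str:
    """Return the ground-truth label ('positive' or 'benign') for one slide."""
    if reader1 == reader2:
        return reader1   # concordant primary reads stand as-is
    return tiebreaker    # discordant reads are resolved by the third expert


# Example: the first two readers disagree, so the third pathologist's call decides.
assert adjudicate_2_plus_1("positive", "benign", "positive") == "positive"
```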
Analytical Performance Studies & Clinical Performance Study 2:
- Adjudication Method: Not explicitly detailed, but implied to be expert consensus.
4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Yes, an MRMC comparative effectiveness study was done (Clinical Performance Study 2).
- Effect Size of Human Readers Improvement with AI vs. without AI Assistance:
- Sensitivity: The combined sensitivity for pathologists improved by 3.5 percentage points (95% CI: 2.3%; 4.5%) with Galen Second Read assistance compared to SoC.
- Specificity: The combined specificity for pathologists decreased by 3.2 percentage points (95% CI: -4.3%; -1.9%) with Galen Second Read assistance compared to SoC.
- For slides initially assessed as benign by pathologists (the intended use population), sensitivity increased by 36.3 percentage points (from 0% with SoC to 36.3% with Galen Second Read), while specificity for these slides decreased by 3.5 percentage points (from 100% with SoC to 96.5% with Galen Second Read).
5. Standalone Performance Study
- Yes, a standalone (algorithm-only, without human-in-the-loop) performance study was done (Clinical Performance Study 1).
- The results are shown in "Table 1: Device Performance (Clinical Study 1 - Standalone Performance)" above.
6. Type of Ground Truth Used
- Expert Consensus: For both clinical performance studies, the ground truth for slides was established by expert pathologists via a consensus process (two independent experts, with a third for adjudication in cases of disagreement). The ground truth for cases was derived from the slide-level ground truth.
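The text states that the case-level ground truth was derived from the slide-level ground truth but does not spell out the rule; the sketch below assumes the common convention that a case is positive if any of its slides is positive.

```python
# Hedged sketch: derive a case-level label from slide-level ground truth,
# assuming the common "any positive slide makes the case positive" rule.
# The document does not state the exact derivation rule used in the studies.
from typing import Iterable


def case_label_from_slides(slide_labels: Iterable[str]) -> str:
    """Return 'positive' if any slide is positive, otherwise 'benign'."""
    return "positive" if any(lbl == "positive" for lbl in slide_labels) else "benign"


print(case_label_from_slides(["benign", "benign", "positive"]))  # -> positive
```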
7. Sample Size for the Training Set
- Not provided in the document. The document describes the device as a "deterministic deep convolutional network that has been developed with digitized WSIs...". However, it does not state the specific sample size, origin, or characteristics of the training dataset.
8. How Ground Truth for the Training Set Was Established
- Not provided in the document. While it mentions the network was "developed with digitized WSIs," details on how the ground truth for these training images was established are not included in the provided text.
§ 864.3750 Software algorithm device to assist users in digital pathology.
(a) Identification. A software algorithm device to assist users in digital pathology is an in vitro diagnostic device intended to evaluate acquired scanned pathology whole slide images. The device uses software algorithms to provide information to the user about presence, location, and characteristics of areas of the image with clinical implications. Information from this device is intended to assist the user in determining a pathology diagnosis.
(b) Classification. Class II (special controls). The special controls for this device are:
(1) The intended use on the device's label and labeling required under § 809.10 of this chapter must include:
(i) Specimen type;
(ii) Information on the device input(s) (e.g., scanned whole slide images (WSI), etc.);
(iii) Information on the device output(s) (e.g., format of the information provided by the device to the user that can be used to evaluate the WSI, etc.);
(iv) Intended users;
(v) Necessary input/output devices (e.g., WSI scanners, viewing software, etc.);
(vi) A limiting statement that addresses use of the device as an adjunct; and
(vii) A limiting statement that users should use the device in conjunction with complete standard of care evaluation of the WSI.
(2) The labeling required under § 809.10(b) of this chapter must include:
(i) A detailed description of the device, including the following:
(A) Detailed descriptions of the software device, including the detection/analysis algorithm, software design architecture, interaction with input/output devices, and necessary third-party software;
(B) Detailed descriptions of the intended user(s) and recommended training for safe use of the device; and
(C) Clear instructions about how to resolve device-related issues (e.g., cybersecurity or device malfunction issues).
(ii) A detailed summary of the performance testing, including test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as anatomical characteristics, patient demographics, medical history, user experience, and scanning equipment, as applicable.
(iii) Limiting statements that indicate:
(A) A description of situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), including any limitations in the dataset used to train, test, and tune the algorithm during device development;
(B) The data acquired using the device should only be interpreted by the types of users indicated in the intended use statement; and
(C) Qualified users should employ appropriate procedures and safeguards (e.g., quality control measures, etc.) to assure the validity of the interpretation of images obtained using this device.
(3) Design verification and validation must include:
(i) A detailed description of the device software, including its algorithm and its development, that includes a description of any datasets used to train, tune, or test the software algorithm. This detailed description of the device software must include:
(A) A detailed description of the technical performance assessment study protocols (e.g., regions of interest (ROI) localization study) and results used to assess the device output(s) (e.g., image overlays, image heatmaps, etc.);
(B) The training dataset must include cases representing different pre-analytical variables representative of the conditions likely to be encountered when used as intended (e.g., fixation type and time, histology slide processing techniques, challenging diagnostic cases, multiple sites, patient demographics, etc.);
(C) The number of WSI in an independent validation dataset must be appropriate to demonstrate device accuracy in detecting and localizing ROIs on scanned WSI, and must include subsets clinically relevant to the intended use of the device;
(D) Emergency recovery/backup functions, which must be included in the device design;
(E) System level architecture diagram with a matrix to depict the communication endpoints, communication protocols, and security protections for the device and its supportive systems, including any products or services that are included in the communication pathway; and
(F) A risk management plan, including a justification of how the cybersecurity vulnerabilities of third-party software and services are reduced by the device's risk management mitigations in order to address cybersecurity risks associated with key device functionality (such as loss of image, altered metadata, corrupted image data, degraded image quality, etc.). The risk management plan must also include how the device will be maintained on its intended platform (e.g., a general purpose computing platform, virtual machine, middleware, cloud-based computing services, medical device hardware, etc.), which includes how the software integrity will be maintained, how the software will be authenticated on the platform, how any reliance on the platform will be managed in order to facilitate implementation of cybersecurity controls (such as user authentication, communication encryption and authentication, etc.), and how the device will be protected when the underlying platform is not updated, such that the specific risks of the device are addressed (such as loss of image, altered metadata, corrupted image data, degraded image quality, etc.).
(ii) Data demonstrating acceptable, as determined by FDA, analytical device performance, by conducting analytical studies. For each analytical study, relevant details must be documented (e.g., the origin of the study slides and images, reader/annotator qualifications, method of annotation, location of the study site(s), challenging diagnoses, etc.). The analytical studies must include:
(A) Bench testing or technical testing to assess device output, such as localization of ROIs within a pre-specified threshold. Samples must be representative of the entire spectrum of challenging cases likely to be encountered when the device is used as intended; and
(B) Data from a precision study that demonstrates device performance when used with multiple input devices (e.g., WSI scanners) to assess total variability across operators, within-scanner, between-scanner and between-site, using clinical specimens with defined, clinically relevant, and challenging characteristics likely to be encountered when the device is used as intended. Samples must be representative of the entire spectrum of challenging cases likely to be encountered when the device is used as intended. Precision, including performance of the device and reproducibility, must be assessed by agreement between replicates.
(iii) Data demonstrating acceptable, as determined by FDA, clinical validation must be demonstrated by conducting studies with clinical specimens. For each clinical study, relevant details must be documented (e.g., the origin of the study slides and images, reader/annotator qualifications, method of annotation, location of the study site(s) (on-site/remote), challenging diagnoses, etc.). The studies must include:
(A) A study demonstrating the performance by the intended users with and without the software device (e.g., unassisted and device-assisted reading of scanned WSI of pathology slides). The study dataset must contain sufficient numbers of cases from relevant cohorts that are representative of the scope of patients likely to be encountered given the intended use of the device (e.g., subsets defined by clinically relevant confounders, challenging diagnoses, subsets with potential biopsy appearance modifiers, concomitant diseases, and subsets defined by image scanning characteristics, etc.) such that the performance estimates and confidence intervals for these individual subsets can be characterized. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., sensitivity, specificity, predictive value, diagnostic likelihood ratio, etc.).
(B) [Reserved]