510(k) Data Aggregation (239 days)
VinBigData Joint Stock Company
VinDr-Mammo is a passive-notification, prioritization-only, parallel-workflow software tool used by MQSA-qualified interpreting physicians to prioritize patients with suspicious findings in the medical care environment. VinDr-Mammo uses an artificial intelligence algorithm to analyze 2D FFDM screening mammograms and flags exams that are suggestive of the presence of at least one suspicious finding at the exam level. VinDr-Mammo sends an exam-level output to a PACS/workstation, flagging the suspicious case and allowing worklist prioritization.
MQSA-qualified interpreting physicians are responsible for reviewing each exam on a display approved for use in mammography, according to the current standard of care. The VinDr-Mammo device is limited to the categorization of exams, does not provide any diagnostic information beyond triage and prioritization, does not remove images from the interpreting physician's worklist, and should not be used in lieu of a full patient evaluation or relied upon to make or confirm a diagnosis.
The VinDr-Mammo device is intended for use with complete 2D FFDM mammography exams acquired using validated FFDM systems only.
VinDr-Mammo is a medical device designed to assist in the analysis and triage of 2D full-field digital mammography (FFDM) screening exams. Operating as non-invasive, computer-assisted Software as a Medical Device (SaMD), it employs a machine learning algorithm to identify potentially suspicious findings within the images. Once such a finding is identified, the system notifies a PACS/workstation for further examination. This passive-notification feature enables radiologists to prioritize their workload and view studies in order of importance using standard PACS or workstation viewing software. The VinDr-Mammo software is intended solely to aid in the prioritization and triage of radiological medical images: it serves as a tool for MQSA interpreting physicians who specialize in mammogram readings, complementing the standard of care, and it does not replace the need for a comprehensive evaluation per established medical practice. During the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience.
The VinDr-Mammo output code can be viewed by radiologists on a Picture Archiving and Communication System (PACS), Electronic Patient Record (EPR), and/or Radiology Information System (RIS) worklist and can be used to reorder the worklist: mammographic studies with code 1 should be prioritized over those with code 0 and therefore moved to the top of the worklist. As a software-only device, VinDr-Mammo can be hosted on a compatible host server connected to the necessary clinical IT systems so that DICOM studies can be received and the resulting outputs returned and incorporated into the radiology worklist.
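To make the worklist behavior concrete, the following is a minimal sketch of reordering a worklist by the exam-level code, with flagged (code 1) studies moved ahead of unflagged (code 0) studies while preserving arrival order within each group. The data structure and field names (WorklistEntry, accession_number, vindr_code) are illustrative assumptions, not part of the device's documented interface.

```python
from dataclasses import dataclass

@dataclass
class WorklistEntry:
    accession_number: str  # hypothetical study identifier
    received_at: float     # arrival time (e.g., epoch seconds)
    vindr_code: int        # 1 = suspicious finding flagged, 0 = not flagged

def prioritize(worklist: list[WorklistEntry]) -> list[WorklistEntry]:
    """Move code-1 studies to the top; keep arrival order within each group."""
    return sorted(worklist, key=lambda e: (-e.vindr_code, e.received_at))

entries = [
    WorklistEntry("A001", 1.0, 0),
    WorklistEntry("A002", 2.0, 1),
    WorklistEntry("A003", 3.0, 0),
    WorklistEntry("A004", 4.0, 1),
]
print([e.accession_number for e in prioritize(entries)])
# ['A002', 'A004', 'A001', 'A003']
```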
The following modules compose the VinDr-Mammo software:
- Data input and validation: Following retrieval of a study, the validation feature assesses the input data (i.e., age, modality, view) to ensure compatibility for processing by the algorithm.
- VinDr-Mammo algorithm: Once a study has been validated, the algorithm analyzes the 2D FFDM screening mammogram for detection of suspected findings.
- API Cognitive service: The study analysis and the results of a successful study analysis are provided through an API service, whose outputs will then be sent to the appropriate clinical IT system for viewing on a radiology worklist.
- Error codes feature: If a study fails during data validation or during analysis by the algorithm, an error code is returned to the system (a minimal sketch of this validation and error-handling flow follows this list).
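The module descriptions above imply a simple pipeline: validate the incoming study, run the algorithm, and return either an exam-level result or an error code. Below is a minimal sketch of that flow under assumed field names, error codes, and validation rules; none of these identifiers are taken from the device documentation, and the actual checks performed by VinDr-Mammo may differ.

```python
SUPPORTED_MODALITY = "MG"        # assumed: digital mammography DICOM modality
REQUIRED_VIEWS = {"CC", "MLO"}   # assumed: views of a complete screening exam

def validate_study(study: dict) -> str | None:
    """Return an error code if the study cannot be processed, else None."""
    if study.get("modality") != SUPPORTED_MODALITY:
        return "ERR_UNSUPPORTED_MODALITY"
    if not REQUIRED_VIEWS.issubset(study.get("views", set())):
        return "ERR_INCOMPLETE_EXAM"
    if study.get("patient_age") is None:
        return "ERR_MISSING_AGE"
    return None

def run_algorithm(study: dict) -> bool:
    """Placeholder for the AI model; always returns False in this sketch."""
    return False

def analyze_study(study: dict) -> dict:
    """Validate the study, run the (placeholder) algorithm, package the result."""
    error = validate_study(study)
    if error is not None:
        return {"status": "error", "code": error}
    suspicious = run_algorithm(study)
    return {"status": "ok", "exam_level_code": 1 if suspicious else 0}

print(analyze_study({"modality": "MG", "views": {"CC", "MLO"}, "patient_age": 52}))
# {'status': 'ok', 'exam_level_code': 0}
```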
Here's a summary of the acceptance criteria and the studies demonstrating the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria for performance metrics like Sensitivity, Specificity, or AUC. Instead, it presents the device's performance metrics and then concludes that it is "substantially equivalent" to a predicate device, implicitly using the predicate's performance as a benchmark.
However, based on the performance data presented and the comparison to the predicate, we can infer some implied performance expectations:
| Metric | Acceptance Criteria (Implied by Predicate Performance) | Reported VinDr-Mammo Aggregate Performance | Reported Predicate (K220080) Performance |
|---|---|---|---|
| Sensitivity | At least 0.870 | 0.900 (CI: 0.877–0.921) | 0.870 |
| Specificity | At least 0.890 | 0.910 (CI: 0.897–0.922) | 0.890 |
| AUC | At least 0.957 | 0.962 (CI: 0.957–0.971) | 0.957 (CI: 0.936–0.973) |
| Processing Time | Within clinical operational expectations (minutes) | Average of 2.8 minutes | Not explicitly stated for predicate |
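For context, sensitivity, specificity, and AUC in the table are standard standalone metrics computed from exam-level labels and model outputs. The sketch below shows a generic way to compute them with NumPy and scikit-learn; it is not the sponsor's analysis code, and the threshold value is an arbitrary placeholder.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def standalone_metrics(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5):
    """Sensitivity, specificity, and AUC for binary exam-level labels and scores."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    auc = roc_auc_score(y_true, y_score)
    return sensitivity, specificity, auc

# Toy example (not the study data):
y_true = np.array([1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.4, 0.2, 0.1, 0.7])
print(standalone_metrics(y_true, y_score))
```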
2. Sample Size Used for the Test Set and Data Provenance
The device's performance was evaluated in two separate pivotal studies:
- Study 1 (RSNA Dataset):
  - Sample Size: 1,000 2D FFDM mammogram exams.
  - Data Provenance: Retrospective data provided by the Radiological Society of North America (RSNA) via the RSNA Screening Mammography Breast Cancer Detection AI Challenge. This dataset was used to demonstrate generalizability to the demographics of the US population.
- Study 2 (Vietnamese Dataset):
  - Sample Size: 1,864 anonymized 2D FFDM mammograms.
  - Data Provenance: Retrospective cohort from a frontline Vietnamese hospital (Hanoi Medical University Hospital). This dataset was used to demonstrate generalizability to different screening modalities, given the lack of scanner information in the RSNA dataset.
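The aggregate performance cited in the table above presumably combines exam-level results from both test sets; the summary does not describe the aggregation method. The sketch below shows one plausible approach, simple pooling of per-exam labels and predictions across cohorts before computing sensitivity, using made-up toy arrays rather than the study data.

```python
import numpy as np

def pooled_sensitivity(labels_by_cohort, preds_by_cohort):
    """Concatenate exam-level labels/predictions across cohorts, then compute sensitivity."""
    y_true = np.concatenate(labels_by_cohort)
    y_pred = np.concatenate(preds_by_cohort)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn)

# Toy arrays standing in for the RSNA and Vietnamese cohorts (not the study data):
rsna_labels, rsna_preds = np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0])
vn_labels, vn_preds = np.array([1, 0, 0]), np.array([1, 0, 1])
print(pooled_sensitivity([rsna_labels, vn_labels], [rsna_preds, vn_preds]))  # 0.666...
```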
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
The document does not specify the number of experts used or their qualifications for establishing the ground truth of the test sets. It mentions:
- For the RSNA dataset: 252 cases positive for cancer (histologically proven) and 748 cases negative for breast cancer (BI-RADS 1, BI-RADS 2, and biopsy-proven benign) with a two-year follow-up confirming a negative diagnosis.
- For the Vietnamese dataset: 466 cases positive (biopsy-confirmed cancers) and 1,398 cases negative for breast cancer (BI-RADS 1, BI-RADS 2, and biopsy-proven benign) with a two-year follow-up confirming a negative diagnosis.
This implies that the ground truth was established through a combination of histological proof, BI-RADS assessment, and clinical follow-up, which would typically involve qualified radiologists and pathologists, but specific numbers or qualifications are not provided.
4. Adjudication Method for the Test Set
The document does not explicitly describe an adjudication method (e.g., 2+1, 3+1, none) used for establishing the ground truth of the test sets. The ground truth seems to be derived from a combination of histological proof, BI-RADS classification, and 2-year follow-up data.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and If So, the Effect Size of Human Reader Improvement with AI vs. Without AI Assistance
The document does not mention a Multi-Reader Multi-Case (MRMC) comparative effectiveness study. The studies presented focus on the standalone performance of the VinDr-Mammo device. Therefore, no effect size for human reader improvement with AI assistance is provided.
6. If a Standalone (i.e., Algorithm-Only, Without Human-in-the-Loop) Performance Study Was Done
Yes, a standalone performance study was done. Both pivotal studies (RSNA and Vietnamese datasets) evaluated the standalone detection and triage performance of the VinDr-Mammo device (the algorithm only) against the established ground truth. The reported sensitivity, specificity, and AUC metrics reflect standalone algorithm performance.
7. The Type of Ground Truth Used
The ground truth used was a combination of:
- Histological Proof: For positive cancer cases, confirmation was based on biopsy results.
- BI-RADS Classification: For negative cases, BI-RADS 1 and 2 classifications were used.
- Outcomes Data (2-year follow-up): For negative cases (BI-RADS 1, BI-RADS 2, and biopsy-proven benign), a 2-year follow-up confirming a negative diagnosis was used to solidify the ground truth.
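A hedged sketch of how such a composite ground-truth rule might be expressed is given below. The field names and the exact decision logic are illustrative assumptions, not the documented truthing procedure.

```python
def ground_truth_label(case: dict) -> int | None:
    """Return 1 (cancer), 0 (negative), or None if the case cannot be labeled.

    Assumed rule: positives require biopsy-proven cancer; negatives require
    BI-RADS 1/2 or a biopsy-proven benign finding plus a two-year follow-up
    with no cancer diagnosis.
    """
    if case.get("biopsy_result") == "malignant":
        return 1
    negative_imaging = case.get("birads") in (1, 2) or case.get("biopsy_result") == "benign"
    if negative_imaging and case.get("two_year_followup_negative"):
        return 0
    return None  # indeterminate; such cases would be excluded in this sketch

print(ground_truth_label({"birads": 1, "two_year_followup_negative": True}))  # 0
```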
8. The Sample Size for the Training Set
The document states that during the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience. However, the specific sample size for the training set is not provided in the given text.
9. How the Ground Truth for the Training Set Was Established
The document states that "During the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience." However, it does not explicitly describe how the ground truth for these training sets was established. It can be inferred that similar methods to the test set (histology, BI-RADS, follow-up) would have been used, but this is not confirmed in the text.