Search Results
Found 38 results
510(k) Data Aggregation
(35 days)
QFM
Annalise Enterprise is a device designed to be used in the medical care environment to aid in triage and prioritization of studies with features suggestive of the following findings:
- pleural effusion* [1]
- pneumoperitoneum* [2]
- pneumothorax
- tension pneumothorax
- vertebral compression fracture* [3]
*See additional information below.
The device analyzes studies using an artificial intelligence algorithm to identify findings. It makes study-level output available to an order and imaging management system for worklist prioritization or triage.
The device is not intended to direct attention to specific portions of an image and only provides notification for suspected findings.
Its results are not intended:
- to be used on a standalone basis for clinical decision making
- to rule out specific findings, or otherwise preclude clinical assessment of chest X-ray studies
Intended modality:
Annalise Enterprise identifies suspected findings in digitized (CR) or digital (DX) chest X-ray studies.
Intended user:
The device is intended to be used by trained clinicians who are qualified to interpret chest X-ray studies as part of their scope of practice.
Intended patient population:
The intended population is patients who are 22 years or older.
Additional information:
The following additional information relates to the findings listed above:
[1] Pleural effusion
- specificity may be reduced in the presence of scarring and/or pleural thickening
- standalone performance evaluation was performed on a dataset that included supine and erect positioning
- use of this device with prone positioning may result in differences in performance
[2] Pneumoperitoneum
- standalone performance evaluation was performed on a dataset that included supine and erect positioning where most cases were of unilateral right-sided and bilateral pneumoperitoneum
- use of this device with prone positioning and for unilateral left-sided pneumoperitoneum may result in differences in performance
[3] Vertebral compression fracture
- intended for prioritization or triage of worklists of Bone Health and Fracture Liaison Service program clinicians
- standalone performance evaluation was performed on a dataset that included only erect positioning
- use of this device with supine positioning may result in differences in performance
Annalise Enterprise is a software workflow tool which uses an artificial intelligence (AI) algorithm to identify suspected findings on chest X-ray studies in the medical care environment. The findings identified by the device include pneumothorax, tension pneumothorax, pleural effusion, pneumoperitoneum and vertebral compression fracture.
Radiological findings are identified by the device using an AI algorithm – a convolutional neural network trained using deep-learning techniques. Images used to train the algorithm were sourced from datasets spanning a range of equipment manufacturers. This dataset, which contained over 750,000 chest X-ray imaging studies, was labelled by trained radiologists regarding the presence of the findings of interest.
The performance of the device's AI algorithm was validated in a standalone performance evaluation, in which the case-level output from the device was compared with a reference standard ('ground truth'). This was determined by two ground truthers, with a third truther used in the event of disagreement. All truthers were US board-certified radiologists.
The device interfaces with image and order management systems (such as PACS/RIS) to obtain chest X-ray studies for processing by the AI algorithm. Following processing, if any of the clinical findings of interest are identified in the study, the device provides a notification to the image and order management system for prioritization of that study in the worklist. This enables users to review studies containing features suggestive of these clinical findings earlier than in the standard clinical workflow. Importantly, the device never decreases a study's existing priority in the worklist; worklist items are never downgraded based on AI results.
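A minimal sketch of this escalate-only rule, using a hypothetical worklist entry with a numeric priority (the names and the priority scale are illustrative assumptions, not Annalise's actual integration API):

```python
from dataclasses import dataclass

@dataclass
class WorklistItem:
    study_uid: str
    priority: int  # higher value = reviewed sooner

def apply_ai_notification(item: WorklistItem, ai_flagged: bool,
                          triage_priority: int = 10) -> WorklistItem:
    """Escalate-only rule: a flagged study may move up the worklist,
    but an AI result never lowers an existing priority."""
    if ai_flagged:
        item.priority = max(item.priority, triage_priority)
    return item
```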
The device workflow is performed parallel to and in conjunction with the standard clinical workflow for interpretation of chest X-ray studies. The device is intended to aid in prioritization and triage of radiological medical images only.
The Annalise Enterprise device is designed to aid in the triage and prioritization of chest X-ray studies by identifying features suggestive of several findings. The following outlines the acceptance criteria and the study conducted to prove the device meets these criteria.
1. Table of Acceptance Criteria and Reported Device Performance
Explicit acceptance criteria are not stated for each finding; instead, performance is demonstrated by the reported Area Under the Curve (AUC), Sensitivity, and Specificity values, aiming for high performance in triaging positive cases while minimizing false positives. The reported device performance for each finding at various operating points is:
Finding | Product Code | AUC (95% CI) | Operating Point (Threshold) | Sensitivity % (Se) (95% CI) | Specificity % (Sp) (95% CI) |
---|---|---|---|---|---|
Pneumothorax | QFM | 0.984 (0.976, 0.990) | 0.200 | 97.1 (95.5, 98.6) | 88.2 (85.4, 90.8) |
 | | | 0.250 | 96.2 (94.3, 98.1) | 91.9 (89.5, 94.1) |
 | | | 0.300 | 95.0 (92.8, 97.1) | 94.1 (91.9, 95.9) |
 | | | 0.350 | 93.1 (90.7, 95.5) | 95.6 (93.7, 97.2) |
 | | | 0.400 | 90.7 (88.0, 93.3) | 96.7 (95.0, 98.2) |
Tension Pneumothorax | QFM | 0.989 (0.984, 0.994) | 0.225 | 96.0 (92.0, 99.2) | 94.0 (92.3, 95.6) |
 | | | 0.250 | 95.2 (91.2, 98.4) | 94.6 (93.1, 96.2) |
 | | | 0.300 | 93.6 (88.8, 97.6) | 95.6 (94.1, 96.9) |
 | | | 0.350 | 89.6 (84.0, 94.4) | 96.6 (95.3, 97.8) |
 | | | 0.400 | 87.2 (80.8, 92.8) | 97.5 (96.4, 98.6) |
Pneumoperitoneum | QAS | 0.987 (0.976, 0.994) | 0.250 | 96.2 (92.4, 99.0) | 87.9 (83.2, 92.1) |
 | | | 0.300 | 94.3 (89.5, 98.1) | 90.5 (86.3, 94.2) |
 | | | 0.350 | 92.4 (86.7, 97.1) | 93.7 (90.0, 96.8) |
 | | | 0.400 | 91.4 (85.7, 96.2) | 95.8 (92.6, 98.4) |
 | | | 0.450 | 87.6 (81.0, 93.3) | 98.4 (96.3, 100.0) |
Pleural Effusion | QFM | 0.977 (0.969, 0.984) | 0.380 | 96.7 (95.0, 98.1) | 86.8 (83.6, 89.5) |
 | | | 0.425 | 94.4 (92.3, 96.5) | 89.5 (86.8, 92.1) |
 | | | 0.450 | 92.9 (90.7, 95.0) | 91.3 (88.6, 93.7) |
 | | | 0.475 | 89.8 (87.1, 92.3) | 93.7 (91.5, 95.9) |
 | | | 0.500 | 87.6 (84.6, 90.5) | 95.5 (93.5, 97.0) |
Vertebral Compression Fracture | QFM | 0.972 (0.960, 0.982) | 0.460 | 93.4 (90.1, 96.0) | 85.8 (82.1, 89.6) |
 | | | 0.500 | 92.6 (89.3, 95.6) | 90.9 (87.7, 93.7) |
 | | | 0.550 | 87.1 (83.1, 90.8) | 94.7 (91.8, 96.9) |
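For context, sensitivity and specificity at a given operating point are computed by thresholding the model's case-level scores against the ground-truth labels. A minimal sketch (the score and label arrays are illustrative placeholders, not study data):

```python
import numpy as np

def sens_spec_at_threshold(scores: np.ndarray, labels: np.ndarray, threshold: float):
    """labels: 1 = finding present per ground truth, 0 = absent.
    A case is called positive when its score meets the threshold."""
    preds = scores >= threshold
    tp = np.sum(preds & (labels == 1))   # true positives
    fn = np.sum(~preds & (labels == 1))  # false negatives
    tn = np.sum(~preds & (labels == 0))  # true negatives
    fp = np.sum(preds & (labels == 0))   # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Evaluating the same scores at several thresholds mirrors the
# multiple operating points reported in the table above.
scores = np.array([0.95, 0.10, 0.42, 0.88, 0.33])
labels = np.array([1, 0, 1, 1, 0])
for t in (0.200, 0.300, 0.400):
    sens, spec = sens_spec_at_threshold(scores, labels, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```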
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: The standalone performance evaluation was conducted on a total dataset of 3,252 cases.
- Data Provenance: The data was collected retrospectively and anonymized. Cases were collected consecutively from four US hospital network sites. The datasets included a variety of patient demographics (gender, age, ethnicity, race) and technical parameters (imaging equipment make, model), indicating a diverse geographic (US) and technical (various scanner manufacturers: Agfa, Carestream, Fujifilm, GE Healthcare, Kodak, Konica Minolta, McKesson, Philips, Siemens, Varian) origin.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: At least two ABR-certified radiologists were used for each de-identified case. A third radiologist was used in the event of disagreement.
- Qualifications: All truthers were US board-certified radiologists who interpret chest X-rays as part of their regular clinical practice and were protocol-trained.
4. Adjudication Method for the Test Set
The adjudication method used was 2+1 consensus. Each de-identified case was annotated by at least two ground truthers (radiologists); if they agreed, their shared read established the ground truth. In the event of disagreement between the first two, a third ground truther resolved the discrepancy and established the final ground truth.
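A minimal sketch of 2+1 adjudication as described, assuming binary per-finding reads (illustrative, not the study's actual tooling):

```python
from typing import Optional

def adjudicate_2plus1(reader1: bool, reader2: bool,
                      reader3: Optional[bool] = None) -> bool:
    """Two primary truthers; a third read resolves any disagreement."""
    if reader1 == reader2:
        return reader1  # consensus between the first two readers
    if reader3 is None:
        raise ValueError("Readers disagree; a third adjudicating read is required.")
    return reader3  # the tie-breaking read establishes the ground truth
```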
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
The provided information does not indicate that a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done to assess how human readers improve with AI vs. without AI assistance. The study focuses on the standalone performance of the AI algorithm and its impact on triage effectiveness (turn-around time).
6. Standalone Performance (Algorithm Only without Human-in-the-Loop Performance)
Yes, a standalone performance evaluation was done. The key results table and associated metrics (AUC, Sensitivity, Specificity) are specifically for the device's AI algorithm independent of human intervention. The study describes "case-level output from the device was compared with a reference standard ('ground truth')", confirming a standalone evaluation.
7. Type of Ground Truth Used
The ground truth used was expert consensus, established by multiple US board-certified radiologists using a 2+1 adjudication method.
8. Sample Size for the Training Set
The training dataset used to train the Convolutional Neural Network (CNN) algorithm contained over 750,000 chest X-ray imaging studies.
9. How the Ground Truth for the Training Set was Established
The studies in the training dataset were labelled by trained radiologists regarding the presence of the findings of interest. The document does not specify the exact number of radiologists or the specific consensus or adjudication method used for the training set, only that they were "trained radiologists".
(100 days)
QFM
Rayvolve PTX-PE is a radiological computer-assisted triage and notification software that analyzes chest x-ray images (Postero-Anterior (PA) or Antero-Posterior (AP)) of patients 18 years of age or older for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax).
Rayvolve PTX-PE uses an artificial intelligence algorithm to analyze the images for features suggestive of critical findings and provides study-level output available in DICOM node servers for worklist prioritization or triage.
As a passive notification for prioritization-only software tool within the standard of care workflow, Rayvolve PTX-PE does not send a proactive alert directly to a trained medical specialist.
Rayvolve PTX-PE is not intended to direct attention to specific portions of an image. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
Rayvolve PTX-PE is a software-only device designed to help healthcare professionals. It is a radiological computer-assisted triage and notification software that analyzes chest x-ray images (Postero-Anterior (PA) or Antero-Posterior (AP)) of patients 18 years of age or older for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax). It is intended to work in combination with DICOM node servers.
Rayvolve PTX-PE has been developed to use the current edition of the DICOM image standard. DICOM is the international standard for transmitting, storing, retrieving, printing, processing, and displaying medical imaging.
Using the DICOM standard allows Rayvolve PTX-PE to interact with existing DICOM node servers (e.g., PACS) and clinical-grade image viewers. The device is designed to run on a cloud platform and be connected to the radiology center's local network, where it interacts with the DICOM node server.
When remotely connected to a medical center DICOM Node server, the software utilizes Al-based analysis algorithms to analyze chest X-rays for features suggestive of critical findings and provide study-level outputs to the DICOM node server for worklist prioritization. Following receipt of chest X-rays, the software device automatically analyzes each image to detect features suggestive of pneumothorax and/or pleural effusion.
Rayvolve PTX-PE filters and downloads from the DICOM node server only the X-rays of the relevant anatomy, as determined from the DICOM metadata.
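A minimal sketch of this kind of DICOM pre-filtering using pydicom and standard attributes (Modality, BodyPartExamined, ViewPosition); these tag choices are assumptions for illustration, not AZmed's documented logic:

```python
import pydicom

ACCEPTED_MODALITIES = {"CR", "DX"}  # digitized and digital radiography

def is_candidate_chest_xray(path: str) -> bool:
    """Keep only frontal chest radiographs, judged from DICOM header
    attributes (tag population varies by site, so this is best-effort)."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # header only
    modality_ok = getattr(ds, "Modality", "") in ACCEPTED_MODALITIES
    body_part_ok = str(getattr(ds, "BodyPartExamined", "")).upper() == "CHEST"
    view_ok = getattr(ds, "ViewPosition", "") in {"PA", "AP"}
    return modality_ok and body_part_ok and view_ok
```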
As a passive notification for prioritization-only software tool within the standard of care workflow, Rayvolve PTX-PE does not send a proactive alert directly to a trained health professional. Rayvolve PTX-PE is not intended to direct attention to a specific portion of an image. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
Rayvolve PTX-PE is not intended to replace medical doctors. The instructions for use are systematically provided to each user and used to train them on Rayvolve's use.
AZmed's Rayvolve PTX-PE is a radiological computer-assisted triage and notification software designed to analyze chest x-ray images for the presence of suspected pleural effusion and/or pneumothorax. The device's performance was evaluated through a standalone study to demonstrate its effectiveness and substantial equivalence to a predicate device (Lunit INSIGHT CXR Triage, K211733).
Here's a breakdown of the acceptance criteria and the study proving the device meets them:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria for Rayvolve PTX-PE are implicitly derived from demonstrating performance comparable to or better than the predicate device, especially regarding AUC, sensitivity, and specificity for detecting pleural effusion and pneumothorax, as well as notification time. The predicate's performance metrics are used as a benchmark.
Metric (Disease) | Acceptance Criteria (Implicit, based on Predicate K211733) | Reported Device Performance (Rayvolve PTX-PE) |
---|---|---|
Pleural Effusion | | |
ROC AUC | > 0.95 (Predicate: 0.9686) | 0.9830 (95% CI: [0.9778, 0.9880]) |
Sensitivity | 89.86% (Predicate) | 91.34% (95% CI: [88.74%, 93.39%]) |
Specificity | 93.48% (Predicate) | 94.48% (95% CI: [92.39%, 93.39%]) |
Performance Time | 20.76 seconds (Predicate) | 19.56 seconds (95% CI: [19.49, 19.58]) |
Pneumothorax | | |
ROC AUC | > 0.95 (Predicate: 0.9630) | 0.9857 (95% CI: [0.9809, 0.9901]) |
Sensitivity | 88.92% (Predicate) | 93.79% (95% CI: [91.27%, 95.61%]) |
Specificity | 90.51% (Predicate) | 91.78% (95% CI: [89.11%, 95.61%]) |
Performance Time | 20.45 seconds (Predicate) | 19.43 seconds (95% CI: [19.42, 19.45]) |
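The confidence intervals reported above are typical of a bootstrap analysis; a minimal sketch of AUC with a percentile-bootstrap 95% CI follows (an assumed method for illustration; the submission does not state how AZmed computed its intervals):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_bootstrap_ci(labels, scores, n_boot=2000, seed=0):
    """Point-estimate AUC plus a percentile bootstrap 95% CI."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), len(labels))  # resample cases
        if len(np.unique(labels[idx])) < 2:
            continue  # AUC is undefined without both classes present
        aucs.append(roc_auc_score(labels[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return roc_auc_score(labels, scores), (lo, hi)
```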
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: The test set for the standalone study consisted of 1000 radiographs for the pneumothorax group and 1000 radiographs for the pleural effusion group; within each group, positive and negative images each represented approximately 50%.
- Data Provenance: The document does not explicitly state the country of origin of the data or whether it was retrospective or prospective.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
The document does not provide details on the number of experts or their specific qualifications (e.g., years of experience as a radiologist) used to establish the ground truth for the test set.
4. Adjudication Method for the Test Set
The document does not describe the adjudication method used for the test set (e.g., 2+1, 3+1, none).
5. If a Multi Reader Multi Case (MRMC) Comparative Effectiveness Study Was Done
No, a Multi Reader Multi Case (MRMC) comparative effectiveness study was not conducted. The performance assessment was a standalone study evaluating the algorithm's performance only. The document explicitly states: "AZmed conducted a standalone performance assessment for Pneumothorax and Pleural Effusion in worklist prioritization and triage." Therefore, there is no effect size of how much human readers improve with AI vs. without AI assistance reported in this document.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
Yes, a standalone performance assessment (algorithm only without human-in-the-loop) was performed. The results presented in the table above and in the "Bench Testing" section are from this standalone evaluation.
7. The Type of Ground Truth Used
The document does not explicitly state the type of ground truth used (e.g., expert consensus, pathology, outcomes data). However, for a diagnostic AI device, it is standard practice to establish ground truth through a panel of qualified medical experts (e.g., radiologists) providing consensus reads, often with access to additional clinical information or follow-up. Given the nature of the findings (pleural effusion and pneumothorax on X-ray), it is highly likely that expert interpretations served as the ground truth.
8. The Sample Size for the Training Set
The document does not specify the sample size used for the training set of the AI model. The provided information focuses on the performance evaluation using an independent test set.
9. How the Ground Truth for the Training Set Was Established
The document does not detail how the ground truth for the training set was established. This information is typically proprietary to the developer's internal development process and is not always fully disclosed in 510(k) summaries.
(26 days)
QFM
BriefCase-Triage is a radiological computer aided triage and notification software indicated for use in the analysis of CT images with or without contrast that include the ribs, in adults or transitional adolescents aged 18 and older. The device is intended to assist hospital networks and appropriately trained medical specialists in workflow triage by flagging and communicating suspected cases with three or more acute rib fractures (RibFx).
BriefCase-Triage uses an artificial intelligence algorithm to analyze images and highlight cases with detected findings in parallel to the ongoing standard of care image interpretation. The user is presented with notifications for cases with suspected RibFx findings. Notifications include compressed preview images that are meant for informational purposes only, and not intended for diagnostic use beyond notification. The device does not alter the original medical image and is not intended to be used as a diagnostic device.
The results of BriefCase-Triage are intended to be used in conjunction with other patient information and based on their professional judgment, to assist with triage/prioritization of medical images. Notified clinicians are responsible for viewing full images per the standard of care.
BriefCase-Triage is a radiological computer-assisted triage and notification software device. The software is based on an algorithmic component and is intended to run on a Linux-based server in a cloud environment.
The BriefCase-Triage receives filtered DICOM images and processes them chronologically by running the algorithms on each series to detect suspected cases. Following the AI processing, the output of the algorithm analysis is transferred to an image review software (desktop application). When a suspected case is detected, the user receives a pop-up notification and is presented with a compressed, low-quality, grayscale image captioned "not for diagnostic use, for prioritization only", which is displayed as a preview function. This preview is meant for informational purposes only, does not contain any marking of the findings, and is not intended for primary diagnosis beyond notification.
Presenting the users with worklist prioritization facilitates efficient triage by prompting the user to assess the relevant original images in the PACS. Thus, the suspect case receives attention earlier than would have been the case in the standard of care practice alone.
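A minimal sketch of this receive-analyze-notify loop, with the model call and notification interfaces as assumed placeholders:

```python
from collections import deque

def triage_incoming_series(queue: deque, detect_ribfx, notify_user) -> None:
    """Process series in arrival (chronological) order; on a suspected
    finding, push a preview notification so the original images are
    reviewed in the PACS sooner than routine order would allow."""
    while queue:
        series = queue.popleft()  # oldest series first (FIFO)
        if detect_ribfx(series):  # AI model call (assumed interface)
            notify_user(series, caption="not for diagnostic use, for prioritization only")
```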
Acceptance Criteria and Device Performance for BriefCase-Triage
1. Table of Acceptance Criteria and Reported Device Performance
Metric | Acceptance Criteria (Performance Goal) | Reported Device Performance |
---|---|---|
AUC | > 0.95 | 97.2% (95% CI: 95.5%-99.0%) |
Sensitivity | > 80% | 95.2% (95% CI: 89.1%-98.4%) |
Specificity | > 80% | 95.1% (95% CI: 91.2%-97.6%) |
Time-to-notification (Mean) | Comparability with predicate (70.1 seconds) | 41.4 seconds (95% CI: 40.4-42.5) |
Note: The acceptance criteria for sensitivity and specificity are extrapolated from the statement "As the AUC exceeded 0.95 and sensitivity and specificity both exceeded 80%, the study's primary endpoints were met."
2. Sample Size and Data Provenance for Test Set
- Sample Size for Test Set: 308 cases
- Data Provenance: Retrospective, multicenter study from 5 US-based clinical sites. The cases collected for the pivotal dataset were distinct in time or center from the cases used to train the algorithm.
3. Number and Qualifications of Experts for Ground Truth
- Number of Experts: Three senior board-certified radiologists.
- Qualifications: Senior board-certified radiologists. (Specific years of experience are not provided in the document).
4. Adjudication Method for Test Set
The adjudication method is not explicitly stated. The document mentions "ground truth, as determined by three senior board-certified radiologists," but does not detail how disagreements among these radiologists were resolved (e.g., 2+1, 3+1, or simple consensus).
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No MRMC comparative effectiveness study was done to assess the effect of AI assistance on human readers' performance. The study focused on the standalone performance of the AI algorithm and its time-to-notification compared to a predicate device.
6. Standalone Performance Study
Yes, a standalone performance study was done. The "Pivotal Study Summary" section explicitly details the evaluation of the software's performance (AUC, sensitivity, specificity, PPV, NPV, PLR, NLR) in identifying RibFx without human intervention, comparing it to the established ground truth.
7. Type of Ground Truth Used
The ground truth used was expert consensus, determined by three senior board-certified radiologists.
8. Sample Size for Training Set
The sample size for the training set is not explicitly provided in the document. The document states, "The algorithm was trained during software development on images of the pathology. As is customary in the field of machine learning, deep learning algorithm development consisted of training on labeled ("tagged") images." It also mentions, "The cases collected for the pivotal dataset were all distinct in time or center from the cases used to train the algorithm, as was used for the most recent clearance (K230020)."
9. How Ground Truth for Training Set Was Established
The ground truth for the training set was established by labeling ("tagging") images based on the presence of the critical finding (three or more acute Rib fractures). This process is described as "each image in the training dataset was tagged based on the presence of the critical finding." The document does not specify who performed this tagging or the exact methodology for establishing the ground truth for the training set (e.g., expert consensus, pathology).
(178 days)
QFM
VUNO Med-Chest X-ray Triage/VUNO Med-CXR Link Triage is a radiological computer-assisted triage and notification software that analyzes adult chest X-ray images for the presence of prespecified suspected critical findings (pleural effusion and/or pneumothorax). VUNO Med-Chest X-ray Triage/VUNO Med-CXR Link Triage uses an artificial intelligence algorithm to analyze images for features suggestive of critical findings and provides case-level output available in the PACS/ workstation for worklist prioritization or triage.
As a passive notification for prioritization-only software tool within standard of care workflow, VUNO Med-Chest X-ray Triage/VUNO Med-CXR Link Triage does not send a proactive alert directly to the appropriately trained medical specialists. VUNO Med-Chest X-ray Triage/VUNO Med-CXR Link Triage is not intended to direct attention to specific portions of an image or to anomalies other than pleural effusion and/or pneumothorax. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
VUNO Med-Chest X-ray Triage/VUNO Med-CXR Link Triage is an automated computer-assisted triage and notification software that analyzes adult chest X-ray images for the presence of pleural effusion and pneumothorax. It is based on an artificial intelligence analysis model, specifically a convolutional neural network (CNN), which employs deep learning technology to learn features from data.
The training data were sourced from four distinct sites of data providers in South Korea and India, including medical imaging centers, data partners, and hospitals, and span over 13 modality manufacturers such as GE, Philips, FUJI, Canon, Samsung, and Siemens.
A "locked" algorithm is used, and the same input gives the same results every time. The software receives an image of a frontal chest radiograph and automatically analyzes it for the presence of pre-specified critical findings. If any findings are suspected, the image is flagged, and a passive notification is provided to the user. Subsequently, trained radiologists or healthcare professionals should make the final decision which is the standard of care at present. A user interface is provided for visualization, displaying the loaded image and any detected findings.
The data can be transmitted from Picture Archive and Communications Systems (PACS) using the DICOM protocol.
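A minimal sketch of what a "locked" algorithm implies in practice: fixed, serialized weights and deterministic inference, so identical inputs always yield identical outputs (PyTorch is an assumed framework here; VUNO's actual stack is not disclosed):

```python
import torch

class LockedClassifier:
    """Frozen model: weights are fixed at release and inference is
    deterministic, so the same image always produces the same scores."""
    def __init__(self, weights_path: str):
        self.model = torch.jit.load(weights_path)  # serialized, frozen network
        self.model.eval()  # disable training-time behaviors such as dropout

    @torch.no_grad()
    def predict(self, image: torch.Tensor) -> dict:
        probs = torch.sigmoid(self.model(image.unsqueeze(0)))[0]
        return {"pleural_effusion": float(probs[0]),
                "pneumothorax": float(probs[1])}
```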
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
Performance Metric | Acceptance Criteria | Reported Device Performance (VUNO Med-Chest X-ray Triage) | Reported Predicate Performance (qXR-PTX-PE) |
---|---|---|---|
Pneumothorax | |||
ROC AUC | > 0.95 | 0.9883 (95% CI: [0.9815, 0.9939]) | 0.9894 (95% CI: [0.9829, 0.9980]) |
Sensitivity | Not explicitly stated | 95.45% (95% CI: [92.01, 97.71]) | 94.53% (95% CI: [90.42, 97.24]) |
Specificity | Not explicitly stated | 96.41% (95% CI: [94.32, 97.90]) | 96.36% (95% CI: [94.07, 97.95]) |
Pleural Effusion | |||
ROC AUC | > 0.95 | 0.9900 (95% CI: [0.9863, 0.9932]) | 0.989 (95% CI: [0.9847, 0.9944]) |
Sensitivity | Not explicitly stated | 96.53% (95% CI: [94.24, 98.09]) | 96.22% (95% CI: [93.62, 97.97]) |
Specificity | Not explicitly stated | 95.11% (95% CI: [93.37, 96.50]) | 94.90% (95% CI: [93.04, 96.39]) |
Timing of Notification | Below 10 seconds | 7.86 seconds (average) | 10 seconds (average) |
2. Sample Sizes and Data Provenance
- Test Set Sample Sizes:
- Pleural Effusion: 1,200 scans (with pleural effusion) and 797 scans (without pleural effusion) for a total of 1,997 scans.
- Pneumothorax: 716 scans (with pneumothorax) and 474 scans (without pneumothorax) for a total of 1,190 scans.
- Data Provenance: The test datasets were retrospectively collected chest X-rays. They were sourced from various regions of the US: Midwest, West, Northeast, and South. The text explicitly states that the test dataset is "independent of the training dataset, with each sourced from a different country." While the training set is from South Korea and India, the text indicates the test set is from the US.
3. Number of Experts and Qualifications for Ground Truth - Test Set
- For the predicate device (qXR-PTX-PE), the ground truth for pneumothorax performance testing was established by 3 ABR radiologists with a minimum of 5 years of experience.
- The document does not explicitly state the number or qualifications of experts used to establish the ground truth for the subject device's (VUNO Med-Chest X-ray Triage) test set. It only mentions the ground truth for the predicate device's test set was established by radiologists. It's common practice for the subject device to follow a similar ground truthing methodology, but this is not explicitly stated.
4. Adjudication Method for the Test Set
- The document does not specify an adjudication method (e.g., 2+1, 3+1) for establishing the ground truth for the test set. It only states that the ground truth for the predicate device was "established by 3 ABR radiologists." This could imply consensus or a majority vote, but it's not detailed.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- No, an MRMC comparative effectiveness study was not reported. The study focused on the standalone performance of the AI algorithm (VUNO Med-Chest X-ray Triage) and compared its performance metrics (AUC, sensitivity, specificity) against those of the predicate device (qXR-PTX-PE). There is no mention of human readers assisting or being compared to the AI.
6. Standalone (Algorithm Only) Performance
- Yes, a standalone performance study was done. The reported performance metrics (AUC, sensitivity, specificity) are for the VUNO Med-Chest X-ray Triage algorithm operating independently (without human-in-the-loop assistance for the reported metrics).
7. Type of Ground Truth Used
- The ground truth for the test set was established by expert consensus (specifically, by ABR radiologists for the predicate device, implying a similar method for the subject device). The presence or absence of the critical findings (pneumothorax and pleural effusion) was determined by these experts.
8. Sample Size for the Training Set
- The document mentions that the training data is sourced from "4 distinct sites of South Korea and India data provider," but it does not specify the sample size (number of images or patients) used for the training set.
9. How the Ground Truth for the Training Set Was Established
- The document states that the AI algorithm "employs deep learning technology to learn features from data" and that the "training data is sourced from 4 distinct sites of South Korea and India data provider." However, it does not explicitly detail how the ground truth for the training set was established. It can be inferred that these images were expert-labeled, as is typical for supervised deep learning, but the specific process (e.g., number of readers, their qualifications, adjudication) is not described for the training set.
(87 days)
QFM
CINA-VCF is a radiological computer aided triage and notification software indicated for use in patients aged 50 years and over undergoing non-enhanced or contrast-enhanced CT scans which include the chest and/or abdomen.
The device is intended to assist hospital networks and appropriately trained medical specialists within the standard-of-care bone health setting in workflow triage by flagging and communication of suspected positive cases of Vertebral Compression Fractures (VCF) findings.
CINA-VCF uses an artificial intelligence algorithm to analyze images and highlight cases with detected findings on a standalone application in parallel to the ongoing standard of care image interpretation. The device does not alter the original medical image, and it is not intended to be used as a diagnostic device.
The results of CINA-VCF are intended to be used in conjunction with other patient information and based on professional judgment to assist with triage/prioritization of medical images. Notified clinicians are ultimately responsible for reviewing full images per the standard of care.
CINA-VCF is a radiological computer-assisted triage and notification software device.
CINA-VCF runs on a standard "off the shelf" server/workstation and consists of the VCF Image Processing Application, which can be integrated, deployed and used with the CINA Platform (cleared under K200855) or other compatible medical image communications devices. CINA-VCF receives non-enhanced or contrast-enhanced CT scans (which include the chest and/or abdomen) identified by the CINA Platform or other compatible medical image communications device, processes them using algorithmic methods involving execution of multiple computational steps to identify the suspected presence of Vertebral Compression Fracture (VCF) findings, and generates results files to be transferred by the CINA Platform or a similar medical image communications device for output to a PACS system or workstation for worklist prioritization.
DICOM images are received, recorded and filtered before processing. The series are processed chronologically by running algorithms on each series to detect suspected positive findings of a vertebral compression fracture (VCF).
The device uses deep learning models to detect VCF at the T1-L5 level. The models were trained end-to-end on a dataset of 886 series collected from multiple centers in the USA and France satisfying the device protocol and representing a large distribution of scanner models from Siemens, Philips, GE and Canon (formerly Toshiba), acquisition protocols, spine presentations, and fracture locations and severities. Additional models, trained on subsets of this dataset, are used to locate the spine, identify the vertebral bodies, and exclude vertebrae that have been subjected to vertebroplasty or contain orthopedic material.
The Worklist Application displays all incoming suspect cases; each notified case is marked with an icon. In addition, a compressed, grayscale, unannotated image captioned "not for diagnostic use" is displayed as a preview function. This compressed preview is meant for informational purposes only, does not contain any marking of the findings, and is not intended for diagnostic use beyond notification.
Presenting the specialist with worklist prioritization facilitates earlier triage by allowing prioritization of images in the PACS. Thus, the suspect case receives attention earlier than would have been the case in the standard of care practice alone.
The CINA Platform is an example of medical image communications platform for integrating and deploying the CINA-VCF image processing applications. It provides the necessary requirements for interoperability based on the standardized DICOM protocol and services to communicate with existing systems in the hospital radiology department such as CT modalities or other DICOM nodes (DICOM router or PACS for example). It is responsible for transferring, converting formats, notifying of suspected findings and displaying medical device data such as radiological data. The CINA Platform server includes the Worklist client application which receives notifications from the CINA-VCF Image Processing application.
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
Acceptance Criterion | Reported Device Performance (CINA-VCF) |
---|---|
Primary Endpoint: ROC AUC | 0.974 [95% CI: 0.962 - 0.986] (Exceeded the 0.95 performance goal) |
Sensitivity | 95.2% [95% CI: 90.7% - 97.9%] |
Specificity | 92.9% [95% CI: 89.4% - 96.5%] |
Accuracy (Overall Agreement) | 93.7% [95% CI: 91.1% - 95.7%] |
Time-to-Notification (All cases, Mean ± SD) | 23.4 ± 8.4 seconds (Median: 21.0 seconds, 95% CI: [22.7 - 24.2], Min: 9.0, Max: 60.0) |
Time-to-Notification (True Positive cases, Mean ± SD) | 21.7 ± 7.5 seconds (Median: 20.0 seconds, 95% CI: [20.5 - 22.8], Min: 9.0, Max: 45.0) |
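Time-to-notification figures like those above are measured from receipt of a series to delivery of the worklist notification; a minimal instrumentation sketch with assumed interfaces:

```python
import time

def notify_with_timing(series, analyze, notify) -> float:
    """Return seconds from series receipt to notification delivery."""
    t0 = time.monotonic()
    if analyze(series):   # True when a suspected VCF is detected
        notify(series)    # push the case to the worklist application
    return time.monotonic() - t0
```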
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 474 clinical anonymized cases.
- Data Provenance: Retrospective, multinational study. Data provided from multiple US (66.9%) and OUS (33.1%) clinical sites. The data included 180 (37.9%) positive cases (CT with VCF) and 294 (62.1%) negative cases.
- Patient Demographics: Mean age 72.1 ± 10.1 years (range: 50-100 years); 50.8% female. Data accounted for race/ethnicity in the intended US patient population.
- Image Acquisition: Acquired by 4 different scanner makers and 38 different scanner models. Various scanner parameters were considered, including slice thickness, number of detector rows, kVp ranges, contrast vs. non-contrast, imaging protocol (chest and/or abdomen), and reconstruction kernel (soft/standard).
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Number of Experts: Three.
- Qualifications: US-board-certified expert radiologists.
4. Adjudication Method for the Test Set
- Method: Consensus of three US-board-certified expert radiologists. A case was considered positive if at least one moderate or severe vertebral compression fracture located within the thoracic or lumbar spine was identified by the experts.
5. If a Multi-reader Multi-case (MRMC) Comparative Effectiveness Study Was Done
- No, a multi-reader multi-case (MRMC) comparative effectiveness study was not conducted to evaluate human readers with and without AI assistance for effect size. The study focused on the standalone performance of the AI device and compared its time-to-notification to a predicate device, not directly to human reader performance with or without the AI.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done
- Yes, standalone performance testing was performed. The study describes "Avicenna.AI conducted a retrospective, multinational and blinded study with the CINA-VCF application... to evaluate the software's performance."
7. The Type of Ground Truth Used
- Type of Ground Truth: Expert consensus. Specifically, "ground truth established by consensus of three US-board-certified expert radiologists."
8. The Sample Size for the Training Set
- Sample Size: 886 series.
9. How the Ground Truth for the Training Set Was Established
- The device uses deep learning models that "were trained end-to-end on a dataset of 886 series collected from multiple centers in the USA and France satisfying the device protocol and representing a large distribution of scanner models... and fracture location and severity." While the text describes the dataset, it does not explicitly state how the ground truth for this training set was established. It implies that the "device protocol" guided the selection of data, likely with some form of expert labeling or pre-existing clinical reports classifying the fractures, similar to the test set ground truth, but this is not directly stated.
(239 days)
QFM
The VinDr-Mammo is a passive-notification, prioritization-only, parallel-workflow software tool used by MQSA-qualified interpreting physicians to prioritize patients with suspicious findings in the medical care environment. VinDr-Mammo utilizes an artificial intelligence algorithm to analyze 2D FFDM screening mammograms and flags those that are suggestive of the presence of at least one suspicious finding at the exam level. VinDr-Mammo produces an exam-level output to a PACS/workstation for flagging the suspicious case and allows worklist prioritization.
MQSA-qualified interpreting physicians are responsible for reviewing each exam on a display approved for use in mammography, according to the current standard of care. The VinDr-Mammo device is limited to the categorization of exams, does not provide any diagnostic information beyond triage and prioritization, does not remove images from the interpreting physician's worklist, and should not be used in lieu of full patient evaluation or relied upon to make or confirm a diagnosis.
The VinDr-Mammo device is intended for use with complete 2D FFDM mammography exams acquired using validated FFDM systems only.
The VinDr-Mammo is a medical device designed to assist in the analysis and triage of 2D full-field digital mammography (FFDM) screening mammograms. Operating as non-invasive, computer-assisted software as a medical device (SaMD), it employs a machine learning algorithm to identify potential suspicious findings within the images. Once findings are identified, the system notifies a PACS/workstation for further examination. This passive-notification feature enables radiologists to prioritize their workload efficiently and view studies in order of importance using standard PACS or workstation viewing software. The VinDr-Mammo software is intended solely to aid in the prioritization and triage of radiological medical images; it serves as a tool for MQSA interpreting physicians who specialize in mammogram readings, complementing the standard of care, and does not replace the need for a comprehensive evaluation as per established medical practices. During the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience.
The VinDr-Mammo code can be viewed by radiologists on a Picture Archiving and Communication System (PACS), Electronic Patient Record (EPR), and/or Radiology Information System (RIS) worklist and can be used to reorder the worklist: the mammographic studies with code 1 should be prioritized over those with code 0 and, thus, should be moved to the top of the worklist. As a software-only device, VinDr-Mammo can be hosted on a compatible host server connected to the necessary clinical IT systems such that DICOM studies can be received and the resulting outputs returned where they can be incorporated into the radiology worklist.
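A minimal sketch of that reordering, assuming each study carries the exam-level code in a hypothetical vindr_code field; Python's stable sort keeps the original order within each code group:

```python
def reorder_worklist(studies: list) -> list:
    """Move code-1 (suspicious) studies to the top; relative order
    within each code group is preserved by the stable sort."""
    return sorted(studies, key=lambda s: s["vindr_code"], reverse=True)

worklist = [
    {"accession": "A1", "vindr_code": 0},
    {"accession": "A2", "vindr_code": 1},
    {"accession": "A3", "vindr_code": 0},
]
print(reorder_worklist(worklist))  # A2 first; A1 and A3 keep their order
```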
The following modules compose the VinDr-Mammo software:
- Data input and validation: Following retrieval of a study, the validation feature assesses the input data (i.e., age, modality, view) to ensure compatibility for processing by the algorithm.
- VinDr-Mammo algorithm: Once a study has been validated, the algorithm analyzes the 2D FFDM screening mammogram for detection of suspected findings.
- API Cognitive service: The study analysis and the results of a successful study analysis are provided through an API service, whose outputs will then be sent to the appropriate clinical IT system for viewing on a radiology worklist.
- Error codes feature: In the case of a study failure during data validation or the analysis by the algorithm, an error is provided to the system.
Here's a summary of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria for performance metrics like Sensitivity, Specificity, or AUC. Instead, it presents the device's performance metrics and then concludes that it is "substantially equivalent" to a predicate device, implicitly using the predicate's performance as a benchmark.
However, based on the performance data presented and the comparison to the predicate, we can infer some implied performance expectations:
Metric | Acceptance Criteria (Implied by Predicate Performance) | Reported VinDr-Mammo Aggregate Performance | Reported Predicate (K220080) Performance |
---|---|---|---|
Sensitivity | At least 0.870 | 0.900 (95% CI: 0.877-0.921) | 0.870 |
Specificity | At least 0.890 | 0.910 (95% CI: 0.897-0.922) | 0.890 |
AUC | At least 0.957 | 0.962 (95% CI: 0.957-0.971) | 0.957 (95% CI: 0.936-0.973) |
Processing Time | Within clinical operational expectations (minutes) | Average of 2.8 minutes | Not explicitly stated for predicate |
2. Sample Size Used for the Test Set and Data Provenance
The device performance was evaluated in two separate pivotal studies:

Study 1 (RSNA Dataset):
- Sample Size: 1000 2D FFDM mammogram exams.
- Data Provenance: Retrospective data provided by the Radiological Society of North America (RSNA) via their RSNA Screening Mammography Breast Cancer Detection AI Challenge. This dataset was used to demonstrate generalizability to the demographics of the US population.

Study 2 (Vietnamese Dataset):
- Sample Size: 1864 anonymized 2D FFDM mammograms.
- Data Provenance: Retrospective cohort from a frontline Vietnamese hospital (Hanoi Medical University Hospital). This dataset was used to demonstrate generalizability to different screening modalities, given the lack of scanner information in the RSNA dataset.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
The document does not specify the number of experts used or their qualifications for establishing the ground truth of the test sets. It mentions:
- For the RSNA dataset: 252 cases positive for histologically proven cancer and 748 cases negative for breast cancer (BI-RADS 1, BI-RADS 2, and biopsy-proven benign) with a two-year follow-up confirming a negative diagnosis.
- For the Vietnamese dataset: 466 cases positive with biopsy-confirmed cancers and 1398 cases negative for breast cancer (BI-RADS 1, BI-RADS 2, and biopsy-proven benign) with a two-year follow-up confirming a negative diagnosis.
This implies that the ground truth was established through a combination of histological proof, BI-RADS assessment, and clinical follow-up, which would typically involve qualified radiologists and pathologists, but specific numbers or qualifications are not provided.
4. Adjudication Method for the Test Set
The document does not explicitly describe an adjudication method (e.g., 2+1, 3+1, none) used for establishing the ground truth of the test sets. The ground truth seems to be derived from a combination of histological proof, BI-RADS classification, and 2-year follow-up data.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done, If so, what was the effect size of how much human readers improve with AI vs. without AI assistance
The document does not mention a Multi-Reader Multi-Case (MRMC) comparative effectiveness study. The studies presented focus on the standalone performance of the VinDr-Mammo device. Therefore, no effect size for human reader improvement with AI assistance is provided.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
Yes, a standalone performance study was done. Both pivotal studies (RSNA and Vietnamese datasets) evaluate the "standalone detection and triage" or simply the "performance" of the VinDr-Mammo device (the algorithm only) against the established ground truth. The reported Sensitivity, Specificity, and AUC metrics are indicative of standalone algorithm performance.
7. The Type of Ground Truth Used
The ground truth used was a combination of:
- Histological Proof: For positive cancer cases, confirmation was based on biopsy results.
- BI-RADS Classification: For negative cases, BI-RADS 1 and 2 classifications were used.
- Outcomes Data (2-year follow-up): For negative cases (BI-RADS 1, BI-RADS 2, and biopsy-proven benign), a 2-year follow-up confirming a negative diagnosis was used to solidify the ground truth.
8. The Sample Size for the Training Set
The document states that during the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience. However, the specific sample size for the training set is not provided in the given text.
9. How the Ground Truth for the Training Set Was Established
The document states that "During the algorithm's training, independent datasets from various global sites were utilized, ensuring a robust and diverse training experience." However, it does not explicitly describe how the ground truth for these training sets was established. It can be inferred that similar methods to the test set (histology, BI-RADS, follow-up) would have been used, but this is not confirmed in the text.
(274 days)
QFM
SmartChest is a radiological computer assisted triage and notification software that analyzes frontal chest X-ray images (Postero-Anterior (PA) or Antero-Posterior (AP)) of transitional adolescents (18 -21 yo but treated like adults) and adults (≥22 yo) for the presence of suspected pleural effusion and/or pneumothorax. SmartChest uses an artificial intelligence algorithm to analyze the images for features suggestive of critical findings and provides case-level output available to a PACS (or other DICOM storage platforms) for worklist prioritization.
As a passive notification for prioritization-only software tool within the standard of care workflow, SmartChest does not send a proactive alert directly to a trained medical specialist.
SmartChest is not intended to direct attention to a specific portion of an image. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
SmartChest is a radiological computer-assisted triage and notification software that analyzes frontal chest X-ray images (Postero-Anterior (PA) and/or Antero-Posterior (AP)) of transitional adolescents (18 ≤ age ≤ 21, treated like adults) and adults (age ≥ 22) for the presence of suspected pleural effusion and/or pneumothorax. The software utilizes AI-based image analysis algorithms to detect the findings.
SmartChest provides case-level output for worklist prioritization by appropriately trained medical specialists qualified to interpret chest radiographs. Chest X-ray images are automatically received from the user's image acquisition or storage systems (e.g., PACS or other DICOM storage platforms) and processed by SmartChest for analysis. After receiving the images, the device automatically analyzes them and identifies pre-specified findings (pleural effusion and/or pneumothorax). The analysis results are then passively sent by SmartChest via a notification to the worklist software in use (PACS or other platforms).
The results are made available via a newly generated DICOM series (containing a secondary capture image), whose DICOM tags contain the following information:

- "SUSPECTED FINDING" or "CASE PROCESSED" if the algorithm ran successfully, or "NOT PROCESSED" if the algorithm receives a study containing chest images that are not part of the intended use (lateral views or excluded ages, for example);
- "SUSPECTED PLEURAL EFFUSION" or "SUSPECTED PNEUMOTHORAX" if one pre-specified finding category is identified; or
- "SUSPECTED PLEURAL EFFUSION, PNEUMOTHORAX" if both pre-specified finding categories are identified.

The secondary capture image returned to the storage system indicates at the study level (see the sketch after this list):
- the number of images received by SmartChest,
- the number of images processed by SmartChest,
- the status of the study: "NOT PROCESSED", "SUSPECTED FINDING", or "CASE PROCESSED".
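A minimal sketch of assembling such a summary secondary-capture object with pydicom; placing the status in SeriesDescription and the counts in ImageComments is an assumption for illustration, not Milvue's documented tag layout (pixel data for the rendered summary image is omitted):

```python
import datetime
from pydicom.dataset import Dataset, FileMetaDataset
from pydicom.uid import (ExplicitVRLittleEndian, SecondaryCaptureImageStorage,
                         generate_uid)

def build_summary_object(status: str, n_received: int, n_processed: int) -> Dataset:
    """Create a minimal secondary-capture dataset carrying the study-level
    triage status and image counts in its DICOM tags."""
    meta = FileMetaDataset()
    meta.MediaStorageSOPClassUID = SecondaryCaptureImageStorage
    meta.MediaStorageSOPInstanceUID = generate_uid()
    meta.TransferSyntaxUID = ExplicitVRLittleEndian

    ds = Dataset()
    ds.file_meta = meta
    ds.SOPClassUID = SecondaryCaptureImageStorage
    ds.SOPInstanceUID = meta.MediaStorageSOPInstanceUID
    ds.SeriesInstanceUID = generate_uid()
    ds.Modality = "OT"                     # "other" modality for derived objects
    ds.SeriesDescription = status          # e.g. "SUSPECTED PLEURAL EFFUSION"
    ds.ImageComments = f"Images received: {n_received}; processed: {n_processed}"
    ds.ContentDate = datetime.date.today().strftime("%Y%m%d")
    return ds
```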
The DICOM storage component may be a Picture Archiving and Communication System (PACS) or another local storage platform. This allows the appropriately trained medical specialists to group suspicious exams together, which may benefit their prioritization. Chest radiographs without an identified anomaly are placed in the worklist for routine review, which is the current standard of care.
The device is not intended to be a rule-out device; for cases that have been processed without a notification of pre-specified suspected findings, the output should not be viewed as indicating that those findings are excluded. The SmartChest device does not alter the order of, nor remove, imaging exams from the interpretation queue. Unflagged cases should still be interpreted by medical specialists.
The notification is contextual and does not provide any diagnostic information. The results are not intended to be used on a stand-alone basis for clinical decision-making. The summary image will display the following statement: "The product is not for Diagnostic Use-For Prioritization Only".
The information provided is a 510(k) summary for the Milvue SmartChest device. Here's a breakdown of its acceptance criteria and the study proving it meets those criteria:
1. Table of Acceptance Criteria and Reported Device Performance:
The document doesn't explicitly state "acceptance criteria" as a separate section with specific numerical thresholds for sensitivity, specificity, and AUC that were set a priori. However, it reports the device's performance metrics for two distinct conditions: Pneumothorax and Pleural Effusion. The implication is that these reported performances met an acceptable level for substantial equivalence to the predicate device.
Performance Metric | Pneumothorax Reported Performance | Pleural Effusion Reported Performance |
---|---|---|
ROC AUC | 0.989 [0.978; 0.997] | 0.975 [0.960; 0.987] |
Sensitivity | 92.7% [95% CI: 87.4-96.2] | 93.3% [95% CI: 88.1-96.4] |
Specificity | 97.3% [95% CI: 93.4-99.1] | 90.0% [95% CI: 84.1-94.1] |
Mean Execution Time (local) | 2.322 ± 0.267 seconds | 2.288 ± 0.165 seconds |
Mean Execution Time (cloud) | 28.542 ± 8.254 seconds | 28.257 ± 7.226 seconds |
2. Sample Size and Data Provenance for the Test Set:
- Sample Size for Each Study: 300 Chest X-Ray cases
- Total Test Set Size (extrapolated based on two studies): 600 Chest X-Ray cases (300 for pneumothorax, 300 for pleural effusion).
- Data Provenance: The test data was obtained from multiple institutions across the US. It was from sites different from the training data sites, ensuring independence. It included images from rural (49 for pneumothorax, 53 for pleural effusion) and urban (251 for pneumothorax, 247 for pleural effusion) sites from states like New York, North Carolina, Texas, and Washington. The data was retrospective.
3. Number of Experts and Qualifications for Ground Truth - Test Set:
- Number of Experts: Three.
- Qualifications of Experts: All three were ABR-certified (American Board of Radiology-certified) radiologists with a minimum of 5 years of experience.
4. Adjudication Method for the Test Set:
- The ground truth was established by three ABR-certified radiologists. The first two radiologists independently interpreted each case. The third radiologist independently reviewed cases where there was a disagreement between the first two. The final ground truth was determined by majority consensus (2+1 adjudication).
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- The document does not indicate that an MRMC comparative effectiveness study was done to measure human reader improvement with AI assistance. The study focuses solely on the standalone performance of the AI algorithm.
6. Standalone Performance Study:
- Yes, a standalone performance study was done. Two individual standalone performance assessment studies were conducted to evaluate the effectiveness of SmartChest for pneumothorax and pleural effusion separately.
7. Type of Ground Truth Used:
- The ground truth for the test set was established by expert consensus among three ABR-certified radiologists.
8. Sample Size for the Training Set:
- The training set was composed of 9,560 images.
9. How the Ground Truth for the Training Set Was Established:
- The document states that the training data was collected from an unfiltered stream of exams in four French institutions between October 2018 and December 2021. It lists the distribution of exams per pathology (No findings, Pleural Effusion, Pneumothorax).
- While it explains where and when the data was collected and the distribution of findings, the document does not explicitly describe the detailed process by which the ground truth labels for the training set were established. It only mentions that the data was processed to fit the model's requirements and that the images were used to train the model. This typically implies that the original clinical reports or expert annotations associated with these studies were used to create the ground truth labels for training, but the specific expert qualifications or adjudication methods for the training set ground truth are not provided.
(205 days)
QFM
RADIFY® Triage is a radiological computer-assisted triage and notification software that analyzes adult chest X-ray images for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax).
RADIFY® Triage uses an artificial intelligence algorithm to analyze images for features suggestive of critical findings and provides case-level output available in the PACS for worklist prioritization or triage.
As a passive notification for prioritization-only software tool within the standard of care workflow, RADIFY® Triage does not send a proactive alert directly to the appropriately trained medical specialists. The product is not intended to direct attention to specific portions of an image. Its results are not intended to be used on a stand-alone basis for clinical decision-making. The device does not remove the cases from the queue and does not flag the condition as being absent.
RADIFY® Triage is a radiological computer-assisted prioritization software that utilizes Albased image analysis algorithms to identify pre-specified critical findings (pleural effusion and/or pneumothorax) on frontal (AP and PA) views chest X-ray images and flag the images in the PACS to enable worklist prioritization by the appropriately trained medical specialists who are qualified to interpret chest radiographs. The software does not alter the order or remove cases from the reading queue.
The algorithm was trained on datasets from US and non-US sources: 93.7% of the training data came from South Africa and 6.3% from the USA. The input for RADIFY® Triage is a frontal chest X-ray (AP or PA view) in Digital Imaging and Communications in Medicine (DICOM) format.
Chest X-rays are sent to RADIFY® Triage via a Picture Archiving and Communication System (PACS) and processed by the device for analysis. Following receipt of the chest X-rays, the software automatically analyzes each image to detect features suggestive of pneumothorax and/or pleural effusion. Chest X-rays without suspicious findings are placed in the worklist for routine review, which is the standard of care. RADIFY® Triage does not provide any proactive alerts and is not intended to direct attention to specific portions of the image. The results are not intended to be used on a standalone basis for clinical decision-making, nor are they intended to rule out the target conditions or otherwise preclude clinical assessment of X-ray cases.
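As a sketch of the passive, flag-only triage behavior described above (positives are surfaced for prioritization, negatives stay in the routine queue, and nothing is ever removed or downgraded), the case-level decision could look like the following; the threshold, field names, and `Study` type are illustrative assumptions, not the vendor's actual interface:

```python
from dataclasses import dataclass

# The threshold below is an illustrative assumption; the vendor's actual
# operating points are not disclosed in the summary.
FLAG_THRESHOLD = 0.5

@dataclass
class Study:
    accession: str
    effusion_score: float      # model output in [0, 1]
    pneumothorax_score: float  # model output in [0, 1]
    flagged: bool = False

def triage(study: Study) -> Study:
    """Flag a study for worklist prioritization; never remove or downgrade it."""
    if max(study.effusion_score, study.pneumothorax_score) >= FLAG_THRESHOLD:
        study.flagged = True  # surfaced in the PACS worklist for earlier review
    # No else branch: the device never asserts that a finding is absent,
    # and unflagged studies stay in the routine queue.
    return study
```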
Here's a breakdown of the acceptance criteria and the study details for the Radify® Triage device, based on the provided document:
1. Acceptance Criteria and Reported Device Performance
Condition | Acceptance Criteria (ROC AUC) | Reported Device Performance (ROC AUC) | Reported Device Sensitivity | Reported Device Specificity |
---|---|---|---|---|
Pleural Effusion | > 0.95 | 0.9761 (95% CI: [0.9736, 0.9786]) | 94.39% (95% CI: [93.26, 95.51]) | 96.42% (95% CI: [95.29, 98.00]) |
Pneumothorax | > 0.95 | 0.9743 (95% CI: [0.9712, 0.9774]) | 94.81% (95% CI: [93.90, 95.73]) | 97.91% (95% CI: [97.00, 98.83]) |
Overall | N/A (implied by individual) | 0.9762 (95% CI: [0.9743, 0.9781]) | 94.26% (95% CI: [93.53, 94.99]) | 97.27% (95% CI: [96.54, 98.00]) |
Notification Time | (Implicitly comparable to predicate) | Average of 3 seconds | N/A | N/A |
Note: The document explicitly states the acceptance criteria for performance as "Device shows > 95% AUC".
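Given the adjudicated labels and the model's continuous scores, verifying the "> 0.95 AUC" acceptance criterion is a one-line computation. A minimal sketch using scikit-learn on synthetic data (the actual study data is not public):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative arrays: y_true is the adjudicated ground truth (1 = finding
# present), y_score is the model's continuous output for that finding.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.8 + rng.normal(0.1, 0.2, size=1000), 0, 1)

auc = roc_auc_score(y_true, y_score)
print(f"ROC AUC = {auc:.4f}; acceptance criterion (> 0.95) met: {auc > 0.95}")
```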
2. Sample Size and Data Provenance for the Test Set
- Test Set Sample Size:
- Pneumothorax: 2188 scans (1229 with pneumothorax + 959 without pneumothorax).
- Pleural Effusion: 1229 scans (392 with pleural effusion + 837 without pleural effusion).
- Shared Cases: 88 scans had both pleural effusion and pneumothorax co-existing.
- Data Provenance: Retrospective, obtained from four US sites: one large urban hospital in New York City and three private clinics in urban and suburban areas of Texas.
3. Number, Qualifications, and Adjudication Method of Experts for Test Set Ground Truth
- Number of Experts: 3
- Qualifications of Experts: Board-certified ABR (USA) radiologists with a minimum of 11 years of experience.
- Adjudication Method: Not explicitly stated, but the phrase "The ground truth was established by 3 board-certified ABR (USA) radiologists" implies a consensus-based approach, likely a majority vote or discussion to reach agreement. It does not specify 2+1 or 3+1, but suggests a similar process.
4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No, the document describes a standalone (algorithm only) performance evaluation against a radiologist-established ground truth. It does not mention a study to compare human reader performance with and without AI assistance.
- Effect size of human readers improving with AI vs. without AI assistance: Not applicable as an MRMC comparative effectiveness study was not performed or reported.
5. Standalone Performance Study
- Was a standalone study done? Yes, the document details the performance of the RADIFY® Triage algorithm alone, analyzing chest X-ray images for pneumothorax and pleural effusion. The reported metrics (AUC, sensitivity, specificity) are for the algorithm's performance in detecting these conditions compared to the established ground truth.
6. Type of Ground Truth Used
- Type of Ground Truth: Expert consensus, established by 3 board-certified ABR (USA) radiologists.
7. Sample Size for the Training Set
- Training Set Sample Size: Not explicitly stated as a total number of images, but the composition is given: "The algorithm was trained on datasets from US and non-USA sources. This training dataset consisted of 93.7% of the data from South Africa, and 6.3% of the data from the USA."
8. How Ground Truth for the Training Set Was Established
- How Ground Truth Was Established: Not explicitly detailed for the training set. The document only states that the algorithm was trained on datasets and then evaluated on a separate, independent test set where the ground truth was established by the 3 expert radiologists. It's common practice for training data ground truth to be established through similar expert review processes, but this specific detail is not provided for the training data in the given text.
(329 days)
QFM
BraveCX is a radiological computer-assisted triage and notification software that analyzes adult (≥18 years old) chest X-ray images for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax). BraveCX uses an artificial intelligence algorithm to analyze images for features suggestive of critical findings and provides case-level output available in the PACS/workstation for worklist prioritization or triage. As a passive notification for prioritization-only software tool within the standard of care workflow, BraveCX does not send a proactive alert directly to the appropriately trained medical specialists. BraveCX is not intended to direct attention to specific portions of an image or to anomalies other than pleural effusion and/or pneumothorax. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
BraveCX is a Deep Learning Artificial Intelligence (AI) software that analyzes adult (≥18 years old) chest X-ray images for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax). It uses deep learning to analyze each image and identify features suggestive of pleural effusion and/or pneumothorax. Upon image acquisition from other radiological imaging equipment (e.g., X-ray systems), anteroposterior (AP) and posteroanterior (PA) chest X-rays are received and processed by BraveCX. Following receipt of an image, BraveCX de-identifies a copy of each DICOM file and analyzes it for features suggestive of pleural effusion and/or pneumothorax. Based on the analysis result, the software notifies the PACS/workstation of the presence of the critical findings, indicated by "flag" or "(blank)". This allows appropriately trained medical specialists to group suspicious exams together with potential for prioritization. Chest radiographs without an identified anomaly are placed in the worklist for routine review, which is the current standard of care. The intended user of the BraveCX software is a health care professional such as a radiologist or another appropriately trained clinician. The software does not alter the order or remove cases from the reading queue. The software output to the user is a label of "flag" or "(blank)" that relates to the likelihood of the presence of pneumothorax and/or pleural effusion. The BraveCX platform ingests prediction requests with either attached DICOM images or DICOM UIDs referencing images already uploaded to DICOM storage. The results are made available via a newly generated DICOM object stored in DICOM storage or as a JSON file. The DICOM storage component may be a Picture Archiving and Communication System (PACS) or some other local storage platform. BraveCX works in parallel to and in conjunction with the standard-of-care workflow to enable prioritized review by appropriately trained medical specialists who are qualified to interpret chest radiographs. As a passive, prioritization-only notification tool within the standard-of-care workflow, BraveCX does not send a proactive alert directly to those specialists. BraveCX is not intended to direct attention to specific portions or anomalies of an image, and it should not be used on a standalone basis for clinical decision-making. BraveCX runs automatically after image acquisition and prioritizes and displays the analysis results through the worklist interface of the PACS/workstation. An on-device technologist notification is generated within 15 minutes after interpretation by the user, indicating which cases were prioritized by BraveCX in the PACS. The technologist notification is contextual and does not provide any diagnostic information; it is not intended to inform any clinical decision, prioritization, or action.
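A rough sketch of the I/O behavior described above, de-identifying a copy of the incoming DICOM and emitting a case-level "flag"/"(blank)" result as JSON, using pydicom. `run_model` and the 0.5 threshold are stand-ins for the proprietary inference step, which is not part of any published API:

```python
import copy
import json
import pydicom

def analyze_dicom(path: str, run_model) -> str:
    """Read a chest X-ray, analyze a de-identified copy, and return a
    case-level result as JSON, mirroring the workflow described above."""
    ds = pydicom.dcmread(path)
    deidentified = copy.deepcopy(ds)
    # Blank a few common identifying elements on the working copy.
    for keyword in ("PatientName", "PatientID", "PatientBirthDate"):
        if keyword in deidentified:
            setattr(deidentified, keyword, "")
    # run_model returns per-finding probabilities,
    # e.g. {"effusion": 0.91, "pneumothorax": 0.02} (hypothetical).
    scores = run_model(deidentified.pixel_array)
    return json.dumps({
        "sop_instance_uid": str(ds.SOPInstanceUID),
        "output": "flag" if max(scores.values()) >= 0.5 else "(blank)",
    })
```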
The provided text describes the BraveCX device, a radiological computer-assisted triage and notification software that analyzes adult chest X-ray images for the presence of suspected critical findings (pleural effusion and/or pneumothorax).
Here's a breakdown of the acceptance criteria and the study that proves the device meets them:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria for the BraveCX device are not explicitly listed in a separate table as "acceptance criteria." However, based on the "Summary of results" in the "9. Non-Clinical Performance Data" section and the comparison to the predicate device, the implied acceptance criteria are:
- For Pleural Effusion:
- ROC AUC > 0.95
- Sensitivity > 0.85 (implied by "lower bounds of both sensitivity and specificity are above 0.85")
- Specificity > 0.85 (implied by "lower bounds of both sensitivity and specificity are above 0.85")
- For Pneumothorax:
- ROC AUC > 0.95
- Sensitivity > 0.85 (implied by "lower bounds of both sensitivity and specificity are above 0.85")
- Specificity > 0.85 (implied by "lower bounds of both sensitivity and specificity are above 0.85")
- Device Performance Time (Time-to-notification): Comparable to the predicate device.
Reported Device Performance (External Independent Testing):
Metric | Pleural Effusion (BraveCX) | Pneumothorax (BraveCX) |
---|---|---|
ROC AUC | 0.988 (95% CI: 0.9885-0.9887) | 0.972 (95% CI: 0.9727-0.9729) |
Sensitivity | 92.62% (95% CI: 90.67%-94.27%) | 93.38% (95% CI: 92.23%-94.40%) |
Specificity | 98.11% (95% CI: 97.33%-98.71%) | 97.27% (95% CI: 96.49%-97.92%) |
Time-to-notification | 4.8-10.4 seconds (95% CI: 4.2-10.41s) for simultaneous prediction | 4.8-10.4 seconds (95% CI: 4.2-10.41s) for simultaneous prediction |
Predicate Device Performance (Lunit INSIGHT CXR Triage, K211733):
Metric | Pleural Effusion (Predicate) | Pneumothorax (Predicate) |
---|---|---|
ROC AUC | 0.9686 (95% CI: 0.9547 - 0.9824) | 0.9630 (95% CI: 0.9521 - 0.9739) |
Sensitivity | 89.86% (95% CI: 86.72 - 93.00) | 88.92% (95% CI: 85.60 - 92.24) |
Specificity | 93.48% (95% CI: 91.06 - 95.91) | 90.51% (95% CI: 88.18 - 92.83) |
Time-to-notification | 20.76 seconds (95% CI: 20.23 - 21.28) | 20.45 seconds (95% CI: 19.99 - 20.92) |
The BraveCX device’s performance metrics for ROC AUC, sensitivity, and specificity exceed the implied acceptance criteria (all > 0.95 for AUC and > 0.85 for sensitivity/specificity). The time-to-notification is also comparable to, and even faster than, the predicate device.
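For the sensitivity and specificity rows above, the confidence intervals are binomial-proportion intervals over the positive and negative cases respectively. The summaries do not state which interval method was used; the Wilson score interval is one common choice, sketched below with illustrative counts back-calculated from the reported pneumothorax sensitivity (n = 2,114 positives):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion such as
    sensitivity (TP / positives) or specificity (TN / negatives)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Illustrative: ~93.38% of 2,114 pneumothorax-positive images flagged.
tp, n_pos = 1974, 2114
lo, hi = wilson_ci(tp, n_pos)
print(f"sensitivity = {tp / n_pos:.2%}, 95% CI [{lo:.2%}, {hi:.2%}]")
```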
2. Sample Sizes and Data Provenance for the Test Set
- Sample Size:
- Pleural Effusion: n=2,509 images (with n=867 positive cases)
- Pneumothorax: n=3,245 images (with n=2,114 positive cases)
- Data Provenance: The study used the MIMIC Chest X-ray (MIMIC-CXR) Database v2.0.0, the NIH Chest X-Ray dataset (NIH-CXR), and the CheXpert dataset (Stanford Hospital). These datasets represent the US population; the specific institutions are Beth Israel Deaconess Medical Center in Boston, MA, the NIH Clinical Center, and Stanford Hospital. The data is retrospective.
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: Three.
- Qualifications: Board-certified Radiologists with at least 10 years of experience in specialty radiology training.
4. Adjudication Method for the Test Set
The document states, "All images were manually labelled by three board-certified Radiologists with at least 10 years of experience in specialty radiology training." It does not explicitly specify an adjudication method like 2+1 or 3+1, but implies that the agreement among these three experts established the ground truth. There is no mention of a specific tie-breaking rule or consensus process beyond all three being involved in labeling.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No, a multi-reader multi-case (MRMC) comparative effectiveness study comparing human readers with AI vs. without AI assistance was not explicitly mentioned or performed. The study described is a standalone performance evaluation of the AI algorithm.
6. Standalone Performance (Algorithm Only) Study
Yes, a standalone study was performed. The "Non-Clinical Performance Data" section describes an "external independent testing to assess the performance of BraveCX." This is a standalone evaluation of the algorithm's performance without a human in the loop. The results (ROC AUC, sensitivity, specificity, and time-to-notification) are reported for the algorithm itself.
7. Type of Ground Truth Used
The ground truth used for the test set was expert consensus (manual labeling by three board-certified radiologists).
8. Sample Size for the Training Set
The document mentions "Model training, validation, and testing sets were generated by stratified random partitions of 80%, 10%, and 10% respectively." While the exact total number of images used for training across all datasets is not explicitly stated, it implies that the training set constituted 80% of the total dataset used for development.
The internal independent testing set contained n=1,209 cases for pleural effusion and n=1,387 cases for pneumothorax. If these counts correspond to the 10% test split of the NHS Greater Glasgow and Clyde dataset, the training split (80%) for that dataset would be roughly eight times larger.
The external validation sets (MIMIC, NIH, CheXpert) had test sample sizes of 2,509 images for pleural effusion and 3,245 for pneumothorax, but the absolute training set size for the model that produced these results is not stated; only the proportion (80%) is given.
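The 80/10/10 stratified partition described above is straightforward to reproduce in outline. A minimal sketch using scikit-learn, with `images` and `labels` as placeholders for the development data (the actual partitioning code is not disclosed):

```python
from sklearn.model_selection import train_test_split

def split_80_10_10(images, labels, seed=42):
    """Stratified 80/10/10 train/validation/test partition."""
    # First hold out 20% of the data, stratified on the finding labels ...
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.20, stratify=labels, random_state=seed)
    # ... then split the held-out 20% in half: 10% validation, 10% test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```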
9. How the Ground Truth for the Training Set Was Established
The ground truth for the training set was established through manual curation by three board-certified Radiologists with at least 10 years in specialist radiology training. This is explicitly stated: "Images used in the training, validation, and testing of the subject device were all manually-curated ground truths provided by three board-certified Radiologists with at least 10 years in specialist radiology training."
(144 days)
QFM
qXR-PTX-PE is a radiological computer-assisted triage and notification software that analyzes adult chest X-ray images for the presence of pre-specified suspected critical findings (pleural effusion and/or pneumothorax). qXR-PTX-PE uses an artificial intelligence algorithm to analyze images for features suggestive of critical findings and provides case-level output available in the PACS/workstation for worklist prioritization or triage.
As a passive notification for prioritization-only software tool within standard of care workflow, qXR-PTX-PE does not send a proactive alert directly to the appropriately trained medical specialists. qXR-PTX-PE is not intended to direct attention to specific portions of an image or to anomalies other than pleural effusion and/or pneumothorax. Its results are not intended to be used on a stand-alone basis for clinical decision-making.
qXR-PTX-PE is a radiological computer-aided triage and notification software that analyzes adult frontal (AP or PA view) CXR images for the presence of pre-specified suspected target conditions (pleural effusion and/or pneumothorax). The algorithm was trained on data from across the world: 74% from India, 20.04% from the EU, 3.9% from the US, 1.4% from Brazil, and 0.63% from Vietnam. The input for qXR-PTX-PE is a frontal chest X-ray (AP or PA view) in Digital Imaging and Communications in Medicine (DICOM) format.
Chest X-rays are sent to qXR-PTX-PE by the means of transmission within the user's image storage system (e.g., Picture Archiving and Communication System (PACS)) or other radiological imaging equipment (e.g., X-ray systems) and processed by the qXR-PTX-PE for analysis. Following receipt of chest radiographs, the software device automatically analyses each image to detect features suggestive of pneumothorax and/or pleural effusion.
This allows appropriately trained medical specialists to group together suspicious exams that may benefit from prioritization. Chest radiographs without suspicious findings are placed in the worklist for routine review, which is the current standard of care. A secondary capture carrying the information on the presence of suspicious findings is also available (see the sketch below).
qXR-PTX-PE does not provide any proactive alerts and is not intended to direct attention to specific portions of the image. The results are not intended to be used on a standalone basis for clinical decision-making, nor are they intended to rule out the target conditions or otherwise preclude clinical assessment of X-ray cases.
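The secondary capture mentioned above is a derived DICOM image that carries the result back into the PACS. A minimal sketch of its core data elements using pydicom; the element choices are illustrative, and file meta and network transfer are deliberately omitted:

```python
import numpy as np
from pydicom.dataset import Dataset
from pydicom.uid import generate_uid

SC_SOP_CLASS = "1.2.840.10008.5.1.4.1.1.7"  # Secondary Capture Image Storage

def secondary_capture(pixels: np.ndarray, finding_text: str) -> Dataset:
    """Assemble the core elements of a Secondary Capture image carrying
    the case-level result."""
    ds = Dataset()
    ds.SOPClassUID = SC_SOP_CLASS
    ds.SOPInstanceUID = generate_uid()
    ds.Modality = "OT"                    # "other": a derived, non-acquisition image
    ds.SeriesDescription = finding_text   # e.g. "Suspected pleural effusion"
    ds.Rows, ds.Columns = pixels.shape
    ds.SamplesPerPixel = 1
    ds.PhotometricInterpretation = "MONOCHROME2"
    ds.BitsAllocated = 8
    ds.BitsStored = 8
    ds.HighBit = 7
    ds.PixelRepresentation = 0
    ds.PixelData = pixels.astype(np.uint8).tobytes()
    return ds
```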
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria are implicitly defined by the statement: "The device shows > 95% AUC". The reported performance significantly exceeds this threshold.
Metric | Acceptance Criteria | qXR-PTX-PE Performance (Pneumothorax) | qXR-PTX-PE Performance (Pleural Effusion) |
---|---|---|---|
ROC AUC | > 0.95 | 0.9894 (95% CI: 0.9829 - 0.9980) | 0.9890 (95% CI: 0.9847 - 0.9944) |
Sensitivity | Not explicitly defined beyond AUC | 94.53% (95% CI: 90.42-97.24) | 96.22% (95% CI: 93.62-97.97) |
Specificity | Not explicitly defined beyond AUC | 96.36% (95% CI: 94.07-97.95) | 94.90% (95% CI: 93.04-96.39) |
Performance Time (Notification) | Not explicitly defined, but compared to predicate and other cleared products | 10 seconds (average) | 10 seconds (average) |
2. Sample Size Used for the Test Set and Data Provenance
- Pneumothorax Test Set: 613 scans
- 201 scans with pneumothorax
- 412 scans without pneumothorax
- Pleural Effusion Test Set: 1070 scans
- 344 scans with pleural effusion
- 726 scans without pleural effusion
- Data Provenance: Retrospective. All test data was obtained from various hospitals across the US. Specific regions mentioned include Midwest, Northeast, and West. The test set was intentionally obtained from sites different from the training data sites to ensure independence.
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Number of Experts: 3
- Qualifications: ABR (American Board of Radiology) thoracic radiologists with a minimum of 10 years of experience.
4. Adjudication Method for the Test Set
The provided text states that "The ground truth was established by 3 ABR thoracic radiologists with a minimum of 10 years of experience." It does not specify a particular adjudication method (e.g., 2+1, 3+1, or simple consensus). Without further detail, it's assumed to be a consensus among these three experts, but the exact process of resolving discrepancies is not described.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
No, an MRMC comparative effectiveness study involving human readers with and without AI assistance was not reported. The study focuses purely on the standalone performance of the AI algorithm (qXR-PTX-PE) and compares it to the standalone performance of a predicate device.
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
Yes, a standalone performance study was done. The reported AUC, sensitivity, and specificity metrics are for the qXR-PTX-PE algorithm acting independently.
7. The Type of Ground Truth Used
The ground truth used was expert consensus (3 ABR thoracic radiologists with a minimum of 10 years of experience).
8. The Sample Size for the Training Set
The exact total sample size for the training set is not specified. However, the text provides the geographical distribution of the training data:
- 74% from India
- 20.04% from the EU
- 3.9% from the US
- 1.4% from Brazil
- 0.63% from Vietnam
9. How the Ground Truth for the Training Set was Established
The document does not explicitly state how the ground truth for the training set was established. It only mentions that the algorithm was "trained on training data from across the world." It can be inferred that expert labeling or clinical diagnoses were likely used, given the nature of a medical imaging AI, but specific details about the process (e.g., number of readers, their qualifications, adjudication) are not provided for the training data.