(136 days)
CT VScore+ is a software application intended for non-invasive evaluation of calcified lesions of the coronary arteries based on ECG-gated, non-contrast cardiac CT images for patients aged 30 years or older. The device automatically generates calcium scores for the coronary arteries (combined LM+LAD, RCA, LCX) and highlights the segmented calcium on the original CT image. The device also offers the option for the user to display the calcium scores in the context of reference data from the MESA and Hoff-Kondos databases.
The segmented arteries include combined LM+LAD, RCA, and LCX. To obtain separate LM and LAD results, the user must perform manual segmentation. The segmentation map of calcifications is intended for informational use only and is not intended for detection or diagnostic purposes. The 3D Calcium View output is provided strictly as an informational and supplementary output and should never be used alone as the method of reviewing the calcium segmentation.
CT VScore+ is a software application intended for non-invasive evaluation of calcified lesions of the coronary arteries based on ECG-gated, non-contrast cardiac CT images for patients aged 30 years or older. The application runs on the Vitrea platform.
The device automatically generates Agatston and volume calcium scores for each of the coronary arteries (combined LM+LAD, RCA, LCX) based on the volume and density of the calcium deposits and highlights the segmented calcium on the original CT image. The device also offers the option for the user to display the calcium scores in the context of reference data from the MESA and Hoff-Kondos databases.
The software uses deep learning-based segmentation methods. Users can edit the automated segmentation, including manually assigning calcifications to anatomical structures.
The device automatically outputs a combined LM+LAD score as the final automated output. To obtain separate LM and LAD results, the user must perform manual segmentation using the provided editing tools.
The device is Software as a Medical Device (SaMD) that operates on ECG-gated, non-contrast cardiac CT DICOM images.
The device does not interact directly with the patient. The device is a software application that runs on the Vitrea platform and processes ECG-gated non-contrast cardiac CT DICOM images. The device automatically generates Agatston and volume calcium scores for each of the coronary arteries (LAD+LM, RCA, LCX) based on the volume and density of the calcium deposits and highlights the segmented calcium on the original CT image. Results can be exported to image management, archival, or reporting systems that support DICOM standards for further review and interpretation.
Results can also be saved in DICOM Structured Reports (DICOM SR) format.
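For context, Agatston scoring conventionally thresholds calcium at 130 HU and weights each lesion's area by its peak density, while the volume score multiplies the calcified area by slice thickness. The sketch below is a simplified per-slice illustration of that convention using assumed inputs (a HU array, a calcium mask, and pixel geometry); it is not the manufacturer's implementation and omits per-lesion grouping and minimum-area rules.

```python
import numpy as np

def agatston_and_volume_scores(hu_slice, calcium_mask, pixel_area_mm2, slice_thickness_mm):
    """Simplified per-slice Agatston and volume scoring.

    hu_slice:       2-D array of Hounsfield units for one axial slice
    calcium_mask:   2-D boolean array marking voxels assigned to one artery
    pixel_area_mm2: in-plane area of a single pixel in mm^2
    """
    hu = hu_slice[calcium_mask]
    if hu.size == 0:
        return 0.0, 0.0

    # Conventional Agatston density weight, taken from the peak HU of the region.
    peak = hu.max()
    if peak >= 400:
        weight = 4
    elif peak >= 300:
        weight = 3
    elif peak >= 200:
        weight = 2
    elif peak >= 130:
        weight = 1
    else:
        weight = 0  # below the 130 HU calcium threshold

    area_mm2 = calcium_mask.sum() * pixel_area_mm2
    agatston = area_mm2 * weight
    volume_mm3 = area_mm2 * slice_thickness_mm
    return agatston, volume_mm3

# Toy 4x4 slice with a small calcified region.
hu = np.array([[ 50,  60, 140, 300],
               [ 55, 135, 420, 310],
               [ 40,  70, 150,  90],
               [ 30,  45,  60,  80]], dtype=float)
mask = hu >= 130
print(agatston_and_volume_scores(hu, mask, pixel_area_mm2=0.25, slice_thickness_mm=3.0))
```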
The CT VScore+ device is a software application for non-invasive evaluation of calcified lesions of the coronary arteries from ECG-gated, non-contrast cardiac CT images. The study presented demonstrates the analytical validity and performance of the device against predefined acceptance criteria.
1. Table of Acceptance Criteria and Reported Device Performance
| Metric | Acceptance Criteria | Reported Device Performance |
|---|---|---|
| Total Agatston Score ICC(2,1) | > 0.95 | 0.997 [95% CI: 0.996–0.998] |
| Total Volume Score ICC(2,1) | > 0.95 | 0.996 [95% CI: 0.995–0.997] |
| Per-Vessel ICC - LCx | > 0.90 | 0.937 |
| Per-Vessel ICC - RCA | > 0.90 | 0.990 |
| Per-Vessel ICC - LM+LAD | > 0.90 | 0.983 |
| CAC-DRS 4-Class Kappa | > 0.90 | 0.959 [95% CI: 0.936–0.982] |
| CAC Standard 5-Class Kappa | > 0.90 | 0.958 [95% CI: 0.938–0.978] |
| Voxelwise Dice Score | Informational Metric | 0.920 overall; LCx 0.874, RCA 0.883, LM+LAD 0.958 |
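ICC(2,1) in the table denotes the two-way random-effects, absolute-agreement, single-measurement intraclass correlation. A minimal from-scratch computation is sketched below for paired reference-versus-device scores; the paired values are synthetic and purely illustrative.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: (n_subjects, n_raters) array, e.g. column 0 = reference standard,
            column 1 = device output.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)

    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    residual = scores - row_means[:, None] - col_means[None, :] + grand
    mse = (residual ** 2).sum() / ((n - 1) * (k - 1))      # error mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative paired Agatston scores: reference vs. automated output.
paired = np.array([[0, 0], [12, 14], [110, 105], [402, 395], [998, 1010], [35, 33]])
print(round(icc_2_1(paired), 3))
```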
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size (Test Set): 236 independent cases.
- Data Provenance: The pivotal validation dataset was sourced from diverse U.S. sites and scanner vendors. The development dataset, from which the test set was independent, included data from four institutions (two US sites and two Japanese sites). The 236 cases for validation were "independent" at both the patient level and the site level from the development dataset. It is retrospective data.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: Three.
- Qualifications of Experts: U.S. board-certified radiologists/cardiologists. (Specific years of experience are not mentioned).
4. Adjudication Method for the Test Set
- Adjudication Method: A "2+1 consensus process" was used. This typically means that if two experts agree, their consensus defines the ground truth. If there's a disagreement between two, the third expert acts as a tie-breaker or adjudicator.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- The provided document does not mention a multi-reader multi-case (MRMC) comparative effectiveness study to assess the effect size of human readers improving with AI vs. without AI assistance. The study focuses on the standalone performance of the AI algorithm against a consensus ground truth.
6. Standalone Performance Study (Algorithm Only)
- Yes, a standalone performance study was conducted. The metrics listed in the table (ICC, Kappa, Dice Score) directly assess the performance of the CT VScore+ algorithm in isolation against the established ground truth.
7. Type of Ground Truth Used
- Type of Ground Truth: Expert consensus. Specifically, the reference standard ground truth was established by consensus manual scoring on an FDA-cleared device (Vitrea CT VScore, K243240) and a 2+1 consensus process by three U.S. board-certified radiologists/cardiologists.
8. Sample Size for the Training Set
- Sample Size (Training Set): 94 cases (part of the 210 cases used for development).
9. How the Ground Truth for the Training Set Was Established
- The document implies that the ground truth for the training set (part of the development dataset) was established in the same manner as for the validation set, i.e., by consensus manual scoring on an FDA-cleared device (Vitrea CT VScore, K243240) by experts, given that the development process was described as ensuring "robust and unbiased performance." However, the exact ground-truth process for the training set is not explicitly broken out as it is for the pivotal validation dataset; a similarly rigorous process can reasonably be inferred for the data used in deep learning model development.
(117 days)
a2z-Unified-Triage is a radiological computer-aided triage and notification software indicated for use in the analysis of abdominal/pelvic CT images in adults aged 22 and older. The device is intended to assist hospital networks and appropriately trained medical specialists in workflow triage by flagging and communicating suspected positive cases of the 7 specified abdominopelvic findings: Acute Cholecystitis, Acute Pancreatitis, Unruptured Abdominal Aortic Aneurysm, Acute Diverticulitis, Free Air, Hydronephrosis, and Small Bowel Obstruction. These findings are intended to be used together as one device. The device supports both cloud-based and on-premises deployment, with integration either directly with healthcare facility systems or through third-party healthcare technology platforms.
a2z-Unified-Triage uses an artificial intelligence algorithm to analyze images and flag cases with detected findings in parallel to the ongoing standard of care image interpretation. The device provides analysis results that enable client systems to generate notifications for cases with suspected findings. These results can include DICOM instance UIDs for key images, which are meant for informational purposes only and not intended for primary diagnosis beyond notification. The device does not alter the original medical image and is not intended to be used as a diagnostic device.
The results of a2z-Unified-Triage are intended to be used in conjunction with other patient information and based on clinicians' professional judgment, to assist with triage/prioritization of medical images. Notified clinicians are responsible for viewing full images per the standard of care.
a2z-Unified-Triage is a radiological computer-assisted triage and notification software device. The software consists of an algorithmic component that supports both cloud-based and on-premises deployment on standard server hardware. The device processes abdomen/pelvis CT images from clinical imaging systems, analyzing them using artificial intelligence algorithms to detect suspected cases of 7 abdominopelvic conditions: Acute Cholecystitis, Acute Pancreatitis, Unruptured Abdominal Aortic Aneurysm, Acute Diverticulitis, Free Air, Hydronephrosis, and Small Bowel Obstruction.
Following the AI processing, the analysis results are returned to the client system for worklist prioritization. When a suspected case is detected, the software provides analysis results that enable the client system to generate appropriate notifications. These results can include DICOM instance UIDs for key images, which are for informational purposes only, do not contain any marking of the findings, and are not intended for primary diagnosis beyond notification.
Integration with clinical imaging systems facilitates efficient triage by enabling prioritization of suspect cases for review of the relevant original images in the PACS. Thus, the suspect case receives attention earlier than would have been the case in the standard of care practice alone.
Here's a detailed summary of the acceptance criteria and the study proving the device meets them, based on the provided FDA clearance letter:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
a2z-Unified-Triage differentiates between two types of findings for regulatory purposes: QAS (Qualitative, Automated, and Subjective) and QFM (Quantitative, Functional, and Measurable).
| Condition Type | Acceptance Criteria | Device Performance (with 95% Confidence Intervals) |
|---|---|---|
| QFM Findings | AUC > 0.95 | |
| Acute Cholecystitis | AUC > 0.95 | AUC: 0.985 [0.972-0.998] (Also provided: High Sensitivity: Se 96.1% [89.2-98.7%], Sp 89.3% [86.6-91.5%]; Sensitivity Biased: Se 92.2% [84.0-96.4%], Sp 95.8% [93.9-97.2%]; Balanced: Se 92.2% [84.0-96.4%], Sp 95.8% [93.9-97.2%]) |
| Acute Pancreatitis | AUC > 0.95 | AUC: 0.994 [0.985-1.000] (Also provided: High Sensitivity: Se 98.0% [92.9-99.4%], Sp 87.8% [84.9-90.3%]; Sensitivity Biased: Se 98.0% [92.9-99.4%], Sp 97.0% [95.3-98.1%]; Balanced: Se 98.0% [92.9-99.4%], Sp 97.0% [95.3-98.1%]; High Specificity: Se 92.9% [86.1-96.5%], Sp 99.8% [99.0-100.0%]) |
| Unruptured AAA | AUC > 0.95 | AUC: 0.995 [0.991-0.999] (Also provided: High Sensitivity: Se 100.0% [95.2-100.0%], Sp 86.3% [83.3-88.8%]; Sensitivity Biased: Se 97.4% [90.9-99.3%], Sp 95.8% [93.9-97.2%]; Balanced: Se 97.4% [90.9-99.3%], Sp 97.5% [95.9-98.5%]) |
| Acute Diverticulitis | AUC > 0.95 | AUC: 0.995 [0.990-1.000] (Also provided: High Sensitivity: Se 98.7% [92.9-99.8%], Sp 89.3% [86.6-91.5%]; Sensitivity Biased: Se 97.4% [90.9-99.3%], Sp 96.8% [95.1-98.0%]; Balanced: Se 97.4% [90.9-99.3%], Sp 96.8% [95.1-98.0%]; High Specificity: Se 94.7% [87.2-97.9%], Sp 98.7% [97.4-99.3%]) |
| Hydronephrosis | AUC > 0.95 | AUC: 0.976 [0.960-0.991] (Also provided: High Sensitivity: Se 89.7% [82.1-94.3%], Sp 92.9% [90.5-94.7%]) |
| QAS Findings | Sensitivity > 80% and Specificity > 80% | |
| Small Bowel Obstruction | Sensitivity > 80%, Specificity > 80% | High Sensitivity: Se 94.9% [88.7-97.8%], Sp 91.7% [89.1-93.7%]; Sensitivity Biased: Se 91.9% [84.9-95.8%], Sp 96.0% [94.1-97.3%]; Balanced: Se 88.9% [81.2-93.7%], Sp 98.1% [96.6-98.9%] |
| Free Air | Sensitivity > 80%, Specificity > 80% | Balanced: Se 89.3% [82.2-93.8%], Sp 88.6% [85.7-91.0%]; High Specificity: Se 88.4% [81.1-93.1%], Sp 90.8% [88.1-92.9%] |
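The multiple operating points reported above (High Sensitivity, Sensitivity Biased, Balanced, High Specificity) correspond to different decision thresholds applied to the algorithm's output. The sketch below illustrates how sensitivity and specificity trade off across thresholds on synthetic scores; the thresholds and data are illustrative, not the device's actual operating points.

```python
import numpy as np

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity at one operating point.

    scores: algorithm suspicion scores; labels: 1 = finding present, 0 = absent.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic scores: raising the threshold trades sensitivity for specificity.
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(80), np.zeros(320)])
scores = np.concatenate([rng.normal(0.8, 0.15, 80), rng.normal(0.3, 0.15, 320)])
for name, thr in [("High Sensitivity", 0.45), ("Balanced", 0.55), ("High Specificity", 0.65)]:
    se, sp = sensitivity_specificity(scores, labels, thr)
    print(f"{name}: Se={se:.3f}, Sp={sp:.3f}")
```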
Turnaround Time Acceptance Criteria and Performance:
| Metric | Acceptance Criteria (Implied by Predicate) | Device Performance |
|---|---|---|
| Triage Turn-around Time | Mean < 81.6 seconds (Predicate's Mean) | Mean: 58.39 seconds (95% CI: 56.11-60.68); Median: 55.02 seconds; 95th percentile: 90.36 seconds |
2. Sample size used for the test set and the data provenance
- Test Set Sample Size: 675 cases from 643 unique patients (after excluding 3 cases due to quality control failures from an initial 678 cases).
- Data Provenance: The data was sourced from multiple clinical sites within the United States. Specific states mentioned are New York (45.2%), Kansas (21.2%), Missouri (18.4%), Texas (15.0%), and Nebraska (0.3%). The study evaluated against clinical standards consistent with U.S. practice patterns. The data appears to be retrospective, as it was used for development and testing after collection.
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- Number of Experts: A minimum of two U.S. board-certified radiologists, with a third U.S. board-certified expert adjudicator for discordant cases.
- Qualifications: All experts were U.S. board-certified radiologists. The third adjudicator was specifically fellowship-trained in body imaging.
4. Adjudication method (e.g. 2+1, 3+1, none) for the test set
- Adjudication Method: 2+1 methodology. Each case was independently reviewed by two U.S. board-certified radiologists. If the two initial readers disagreed, a third U.S. board-certified expert adjudicator (fellowship-trained in body imaging) provided the tie-breaking determination.
5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance
- The provided document does not indicate that an MRMC comparative effectiveness study was performed or submitted for this clearance. The study described is a standalone performance assessment of the algorithm itself against ground truth.
6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done
- Yes, a standalone performance assessment was done. The document explicitly states: "A standalone performance assessment was performed for a2z-Unified-Triage to validate the accuracy of detecting the 7 findings against a reference standard established by U.S. board-certified radiologists."
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
- Type of Ground Truth: Expert consensus, specifically a 2+1 consensus of U.S. board-certified radiologists, with the third adjudicator being fellowship-trained in body imaging.
8. The sample size for the training set
- The document states, "The algorithms were developed on an extensive dataset of abdomen/pelvis CT studies from multiple clinical sites." However, a specific numerical sample size for the training set is not provided. It only mentions that strict protocols ensured complete independence between development and testing datasets (mutually exclusive patients).
9. How the ground truth for the training set was established
- The document does not explicitly detail how the ground truth for the training set was established. It only describes the ground truth establishment for the test set (2+1 radiologist consensus). It states that the algorithms were developed on an "extensive dataset" and implies internal processes for data collection and annotation during development.
(210 days)
The iCardio.ai CardioVision™ AI is an automated machine learning–based decision support system, indicated as a diagnostic aid for patients undergoing an echocardiographic exam consisting of a single PLAX view in an outpatient environment, such as a primary care setting.
When utilized by an interpreting clinician, this device provides information that may be useful in detecting moderate or severe aortic stenosis. iCardio.ai CardioVision™ AI is indicated in adult populations over 21 years of age. Patient management decisions should not be made solely on the results of the iCardio.ai CardioVision™ AI analysis. iCardio.ai CardioVision™ AI analyzes a single cine-loop DICOM of the parasternal long axis (PLAX).
The iCardio.ai CardioVision™ AI is a standalone image analysis software developed by iCardio.ai Corporation, designed to assist in the review of echocardiography images. It is intended for adjunctive use with other physical vital sign parameters and patient information, but it is not intended to independently direct therapy. The device facilitates determining whether an echocardiographic exam is consistent with aortic stenosis (AS), by providing classification results that support clinical decision-making.
The iCardio.ai CardioVision™ AI takes as input a DICOM-compliant, partial or full echocardiogram study, which must include at least one parasternal long-axis (PLAX) view of the heart and at least one full cardiac cycle. The device uses a set of convolutional neural networks (CNNs) to analyze the image data and estimate the likelihood of moderate or severe aortic stenosis. The output consists of a binary classification of "none/mild" or "moderate/severe," indicating whether the echocardiogram is consistent with moderate or severe aortic stenosis. In cases where the image quality is insufficient, the device may output an "indeterminate" result.
The CNNs and their thresholds are fixed prior to validation and do not continuously learn during standalone testing. These models are coupled with pre- and post-processing functionalities, allowing the device to integrate seamlessly with pre-existing medical imaging workflows, including PACS, DICOM viewers, and imaging worklists. The iCardio.ai CardioVision™ AI is intended to be used as an aid in diagnosing AS, with the final diagnosis always made by an interpreting clinician, who should consider the patient's presentation, medical history, and additional diagnostic tests.
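A minimal sketch of how a fixed-threshold classifier with an image-quality gate could yield the three output categories described above ("none/mild", "moderate/severe", "indeterminate"). The threshold values and quality measure are hypothetical placeholders, not the device's actual logic.

```python
def classify_as(prob_moderate_severe, image_quality, decision_threshold=0.5, quality_floor=0.6):
    """Map a fixed CNN output to the three result categories described above.

    prob_moderate_severe: model likelihood of moderate/severe aortic stenosis
    image_quality: a 0-1 quality estimate for the PLAX cine-loop
    Both threshold values are illustrative placeholders, not the device's values.
    """
    if image_quality < quality_floor:
        return "indeterminate"
    return "moderate/severe" if prob_moderate_severe >= decision_threshold else "none/mild"

print(classify_as(0.82, image_quality=0.9))  # -> "moderate/severe"
print(classify_as(0.82, image_quality=0.4))  # -> "indeterminate"
```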
Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided FDA 510(k) clearance letter for CardioVision™:
Acceptance Criteria and Reported Device Performance
| Metric | Acceptance Criteria | Reported Device Performance (without indeterminate outputs) | Reported Device Performance (including indeterminate outputs) |
|---|---|---|---|
| AUROC | Exceeds predefined success criteria | 0.945 | Not explicitly stated but inferred to be similar to Sensitivity/Specificity |
| Sensitivity | Exceeds predefined success criteria and predicate device | 0.896 (95% Wilson score CI: [0.8427, 0.9321]) | 0.876 (95% Wilson score CI: [0.8213, 0.9162]) |
| Specificity | Exceeds predefined success criteria and predicate device | 0.872 (95% Wilson score CI: [0.8384, 0.8995]) | 0.866 (95% Wilson score CI: [0.8324, 0.8943]) |
| PPV | Not explicitly stated as acceptance criteria | 0.734 (95% Wilson score CI: [0.673, 0.787]) | Not explicitly stated |
| NPV | Not explicitly stated as acceptance criteria | 0.955 (95% Wilson score CI: [0.931, 0.971]) | Not explicitly stated |
| Rejection Rate | Not explicitly stated as acceptance criteria | 1.077% (7 out of 650 studies) | 1.077% |
Note: The document explicitly states that the levels of sensitivity and specificity exceed the predefined success criteria and those of the predicate device, supporting the claim of substantial equivalence. While exact numerical thresholds for the acceptance criteria aren't provided in terms of specific values, the narrative confirms they were met.
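The confidence intervals above are reported as 95% Wilson score intervals. A minimal implementation is sketched below; the example counts are illustrative, since the exact numerators and denominators behind the reported intervals are not given.

```python
from math import sqrt

def wilson_ci(successes, total, z=1.96):
    """95% Wilson score interval for a proportion (e.g. sensitivity)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Illustrative example: 155 of 173 positive studies correctly flagged (~0.896).
print(tuple(round(x, 4) for x in wilson_ci(155, 173)))
```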
Study Details
| Feature | Description |
|---|---|
| 1. Sample size used for the test set and the data provenance | Sample Size: 650 echocardiography studies from 608 subjects. Data Provenance: Retrospective, multi-center performance study from 12 independent clinical sites across the United States. |
| 2. Number of experts used to establish the ground truth for the test set and the qualifications of those experts | Number of Experts: Not explicitly stated as a specific number; referred to as "experienced Level III echocardiographers." Qualifications: "Experienced Level III echocardiographers." |
| 3. Adjudication method for the test set | Method: A "majority vote approach" was used in cases of disagreement among the experts. |
| 4. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, if so, what was the effect size of how much human readers improve with AI vs without AI assistance | MRMC Study: No, an MRMC comparative effectiveness study is not detailed in this document. The study described is a standalone performance evaluation of the AI. (A "human factors validation study" was conducted to evaluate usability, where participants successfully completed the critical task of results interpretation without errors, but this is not an MRMC study comparing human performance with and without AI assistance on diagnostic accuracy). |
| 5. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done | Standalone Performance Study: Yes, the document describes a "standalone study" with the primary objective to "evaluate the software's ability to detect aortic stenosis." The reported performance metrics (AUROC, Sensitivity, Specificity, etc.) are for the algorithm's performance alone. |
| 6. The type of ground truth used | Ground Truth Type: Expert consensus based on "echocardiographic assessments performed by experienced Level III echocardiographers," with a majority vote for disagreements. |
| 7. The sample size for the training set | Training Set Size: Not specified in the provided document. The document states, "No data from these [test set] sites were used in the training or tuning of the algorithm." |
| 8. How the ground truth for the training set was established | Training Set Ground Truth: Not explicitly detailed in the provided document. It can be inferred that similar methods (expert echocardiographic assessments) would have been used for training data, but the specifics are not provided. |
(143 days)
The Ceribell Infant Seizure Detection Software is intended to mark previously acquired sections of EEG recordings in newborns (defined as preterm or term neonates of 25-44 weeks postmenstrual age) and infants less than 1 year of age that may correspond to electrographic seizures in order to assist qualified clinical practitioners in the assessment of EEG traces. The Seizure Detection Software also provides notifications to the user when detected seizure prevalence is "Frequent", "Abundant", or "Continuous", per the definitions of the American Clinical Neurophysiology Society Guideline 14. Delays of up to several minutes can occur between the beginning of a seizure and when the Seizure Detection notifications will be shown to a user.
The Ceribell Infant Seizure Detection Software does not provide any diagnostic conclusion about the subject's condition and Seizure Detection notifications cannot be used as a substitute for real time monitoring of the underlying EEG by a trained expert.
The Ceribell Infant Seizure Detection Software is a software-only device that is intended to mark previously acquired sections of EEG recordings that may correspond to electrographic seizures in order to assist qualified clinical practitioners in the assessment of EEG traces.
Ceribell Infant Seizure Detection Software: Acceptance Criteria and Supporting Study
1. Table of Acceptance Criteria and Reported Device Performance
| Activity Category | Metric | Acceptance Criteria | Device Performance (Overall) | 95% Confidence Interval | Meets Criteria? |
|---|---|---|---|---|---|
| Seizure Episodes with Seizure Burden ≥10% (Frequent) | PPA | Lower bound of 95% CI ≥ 70% | 91.36% | [85.71, 94.91] | Yes |
| | FP/hr | Upper bound of 95% CI ≤ 0.446 FP/hr | 0.204 | [0.180, 0.230] | Yes |
| Seizure Episodes with Seizure Burden ≥50% (Abundant) | PPA | Lower bound of 95% CI ≥ 70% | 91.23% | [82.67, 96.57] | Yes |
| | FP/hr | Upper bound of 95% CI ≤ 0.446 FP/hr | 0.083 | [0.069, 0.100] | Yes |
| Seizure Episodes with Seizure Burden ≥90% (Continuous) | PPA | Lower bound of 95% CI ≥ 70% | 91.18% | [75.00, 100.00] | Yes |
| | FP/hr | Upper bound of 95% CI ≤ 0.446 FP/hr | 0.057 | [0.045, 0.072] | Yes |
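The burden thresholds in the table (≥10%, ≥50%, ≥90%) map directly to the Frequent/Abundant/Continuous notification categories, and the acceptance logic is stated in terms of confidence-interval bounds. The sketch below restates both rules in code; the review-window length is an assumption, as the document does not specify how burden is windowed.

```python
def prevalence_category(seizure_seconds, window_seconds):
    """Map seizure burden within a review window to the notification categories
    used in the table above (burden thresholds of 10%, 50%, and 90%).
    The window length is an assumed parameter."""
    burden = seizure_seconds / window_seconds
    if burden >= 0.90:
        return "Continuous"
    if burden >= 0.50:
        return "Abundant"
    if burden >= 0.10:
        return "Frequent"
    return None  # below the notification thresholds

def meets_criteria(ppa_ci, fp_per_hr_ci):
    """Acceptance logic from the table: PPA lower bound >= 70% and
    FP/hr upper bound <= 0.446."""
    return ppa_ci[0] >= 70.0 and fp_per_hr_ci[1] <= 0.446

print(prevalence_category(seizure_seconds=720, window_seconds=3600))  # -> "Frequent"
print(meets_criteria((85.71, 94.91), (0.180, 0.230)))                 # -> True
```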
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size: 713 patients.
  - 25-36 weeks PMA: 155 patients
  - 37-44 weeks PMA: 321 patients
  - >44 weeks PMA: 237 patients
- Data Provenance: The EEG recordings were obtained from patients less than 1 year of age who received continuous EEG monitoring within the hospital environment. The study was retrospective. The country of origin is not explicitly stated in the provided text.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications of Those Experts
- Number of Experts: 3
- Qualifications of Experts: Expert pediatric neurologists who were fellowship-trained in epilepsy or clinical neurophysiology.
4. Adjudication Method for the Test Set
- Adjudication Method: A two-thirds majority agreement among the 3 expert pediatric neurologists was required to form a determination of seizures, establishing the reference standard for the test set.
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done
- No, a multi-reader multi-case (MRMC) comparative effectiveness study was not explicitly described. The study focused on the standalone performance of the algorithm against an expert-adjudicated ground truth.
6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done
- Yes, a standalone performance study was done. The performance metrics (PPA and FP/hr) were evaluated for the Ceribell Infant Seizure Detection Software algorithm without human intervention in the detection process. The reviewing neurologists for ground truth establishment were explicitly blinded to the software's output.
7. The Type of Ground Truth Used
- Type of Ground Truth: Expert consensus (adjudication by a panel of 3 expert pediatric neurologists).
8. The Sample Size for the Training Set
- The sample size for the training set is not provided in the document. The document states, "Importantly, none of the data in the validation dataset were used for training of the Seizure Detection algorithm; the validation dataset is completely independent."
9. How the Ground Truth for the Training Set Was Established
- The document does not explicitly state how the ground truth for the training set was established. It only mentions that the validation dataset was independent and not used for training.
(23 days)
syngo.MR Applications is a syngo based post-acquisition image processing software for viewing, manipulating, evaluating, and analyzing MR, MR-PET, CT, PET, CT-PET images and MR spectra.
syngo.MR Applications is a software-only medical device consisting of post-processing applications/workflows used for viewing and evaluating the designated images provided by an MR diagnostic device. The post-processing applications/workflows are integrated with the hosting application syngo.via, which enables structured evaluation of the corresponding images.
The provided FDA 510(k) clearance letter and summary for syngo.MR Applications (VB80) indicate that no clinical studies or bench testing were performed to establish new performance criteria or demonstrate meeting previously established acceptance criteria. The submission focuses on software changes and enhancements from a predicate device (syngo.MR Applications VB40).
Therefore, based solely on the provided document, I cannot create the requested tables and information because the document explicitly states:
- "No clinical studies were carried out for the product, all performance testing was conducted in a non-clinical fashion as part of verification and validation activities of the medical device."
- "No bench testing was required to be carried out for the product."
The document details the following regarding performance and acceptance:
- Non-clinical Performance Testing: "Non-clinical tests were conducted for the subject device during product development. The modifications described in this Premarket Notification were supported with verification and validation testing."
- Software Verification and Validation: "The performance data demonstrates continued conformance with special controls for medical devices containing software. Non-clinical tests were conducted on the device Syngo.MR Applications during product development... The testing results support that all the software specifications have met the acceptance criteria. Testing for verification and validation for the device was found acceptable to support the claims of substantial equivalence."
- Conclusion: "The predicate device was cleared based on non-clinical supportive information. The comparison of technological characteristics, device hazards, non-clinical performance data, and software validation data demonstrates that the subject device performs comparably to and is as safe and effective as the predicate device that is currently marketed for the same intended use."
This implies that the acceptance criteria are related to the functional specifications and performance of the software, as demonstrated by internal verification and validation activities, rather than a clinical performance study with specific quantitative metrics. The new component, "MR Prostate AI," is noted to be integrated without modification and had its own prior 510(k) clearance (K241770), suggesting its performance was established in that separate submission.
Without access to the actual verification and validation reports mentioned in the document, it's impossible to list precise acceptance criteria or detailed study results. The provided text only states that "all the software specifications have met the acceptance criteria."
Therefore, I can only provide an explanation of why the requested details cannot be extracted from this document:
Explanation Regarding Acceptance Criteria and Study Data:
The provided FDA 510(k) clearance letter and summary for syngo.MR Applications (VB80) explicitly state that no clinical studies or bench testing were performed for this submission. The device (syngo.MR Applications VB80) is presented as a new version of a predicate device (syngo.MR Applications VB40) with added features and enhancements, notably the integration of an existing AI algorithm, "Prostate MR AI VA10A (K241770)," which was cleared under a separate 510(k).
The basis for clearance is "non-clinical performance data" and "software validation data" demonstrating that the subject device performs comparably to and is as safe and effective as the predicate device. The document mentions that "all the software specifications have met the acceptance criteria" as part of the verification and validation (V&V) activities. However, the specific quantitative acceptance criteria, detailed performance metrics, sample sizes, ground truth establishment, or expert involvement for these V&V activities are not included in this public summary.
Therefore, the requested information cannot be precisely extracted from the provided text.
Summary of Information Available (and Not Available) from the Document:
| Information Requested | Status (Based on provided document) |
|---|---|
| 1. Table of acceptance criteria and reported performance | Not provided in the document. The document states: "The testing results support that all the software specifications have met the acceptance criteria." However, it does not specify what those acceptance criteria are or report detailed performance metrics against them. These would typically be found in the detailed V&V reports, which are not part of this summary. |
| 2. Sample size and data provenance for test set | Not provided. The document indicates "non-clinical tests were conducted as part of verification and validation activities." The sample sizes for these internal tests, the nature of the data, and its provenance (e.g., country, retrospective/prospective) are not detailed. It is implied that the data is not patient-specific clinical test data. |
| 3. Number of experts and qualifications for ground truth | Not applicable/Not provided. Since no clinical studies or specific performance evaluations against an external ground truth are described in this document, there's no mention of experts establishing ground truth for a test set. The validation appears to be against software specifications. If the "MR Prostate AI" component had such a study, those details would be in its individual 510(k) (K241770), not this submission. |
| 4. Adjudication method for test set | Not applicable/Not provided. As with the ground truth establishment, no adjudication method is mentioned because no external test set requiring such expert consensus is described within this 510(k) summary. |
| 5. MRMC comparative effectiveness study and effect size | Not performed for this submission. The document explicitly states "No clinical studies were carried out for the product." Therefore, no MRMC study or AI-assisted improvement effect size is reported here. |
| 6. Standalone (algorithm only) performance study | Partially addressed for a component. While this submission doesn't detail such a study, it notes that the "MR Prostate AI" algorithm is integrated without modification and "is classified under a different regulation in its 510(K) and this is out-of-scope from the current submission." This implies that a standalone performance study was done for the Prostate MR AI algorithm under its own 510(k) (K241770), but those details are not within this document. For the overall syngo.MR Applications (VB80) product, no standalone study is described. |
| 7. Type of ground truth used | Not provided for the overall device's V&V. The V&V activities are stated to have met "software specifications," which suggests an internal, design-based ground truth rather than clinical ground truth like pathology or outcomes data. For the integrated "MR Prostate AI" algorithm, clinical ground truth would have been established for its separate 510(k) submission. |
| 8. Sample size for the training set | Not applicable/Not provided for this submission. The document describes internal non-clinical V&V for the syngo.MR Applications software. It does not refer to a machine learning model's training set within this context. The "Prostate MR AI" algorithm, being independently cleared, would have its training set details in its specific 510(k) dossier (K241770), not here. |
| 9. How the ground truth for the training set was established | Not applicable/Not provided for this submission. As above, this document does not discuss a training set or its ground truth establishment for syngo.MR Applications. This information would pertain to the Prostate MR AI algorithm and be found in its own 510(k). |
(138 days)
DTX Studio Assist is a Software Development Kit (SDK) designed to integrate with medical device software that displays two-dimensional dental radiographs. It contains a selection of algorithms that processes input data (two-dimensional radiographs) from the hosting application and returns a corresponding output to it.
DTX Studio Assist is intended to support the measurement of alveolar bone levels associated with each tooth. It is also intended to aid in the detection and segmentation of non-pathological structures (i.e., restorations and dental anatomy).
DTX Studio Assist contains a computer-assisted detection (CADe) function that analyzes bitewing and periapical radiographs of permanent teeth in patients aged 15 and older to identify and localize dental findings, including caries, calculus, periapical radiolucency, root canal filling deficiency, discrepancy at the margin of an existing restoration, and bone loss.
DTX Studio Assist is not intended as a replacement for a complete dentist's review nor their clinical judgment which takes into account other relevant information from the image, patient history, and actual in vivo clinical assessment.
DTX Studio Assist is a software development kit (SDK) that makes a selection of algorithms (including AI-based algorithms) available through a clean, well-documented API. DTX Studio Assist features are only available to licensed customers. The SDK has no user interface and is intended to be bundled with and used through other software products (hosting applications).
Key functionalities of DTX Studio Assist include:
Focus Area Detection on IOR images: The software features the Focus Area Detection algorithm which analyzes intraoral radiographs for potential dental findings (caries, periapical radiolucency, root canal filling deficiency, discrepancy at the margin of an existing restoration, bone loss and calculus) or image artifacts.
Alveolar Bone Level Measurement: The software enables the measurements of mesial and distal alveolar bone levels associated with each tooth.
Detection of historical treatments: The software enables automated detection and segmentation of dental restorations in IOR images to support dental charting which can be used during patient communication. The following restoration types are supported: amalgam fillings, composite fillings, prosthetic crowns, bridges, implants, implant abutments, root canal fillings and posts.
Anatomy Segmentation: The software segments dental structures by assigning a unique label to each pixel in IOR images, including enamel, dentine, pulp, bone, and artificial structures.
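Since the Anatomy Segmentation output is described as a per-pixel label map, a hosting application would typically split it into per-structure masks and, for validation, score them against reference annotations with metrics such as Dice. The sketch below assumes a hypothetical integer label coding; the SDK's actual output format is not described here.

```python
import numpy as np

# Hypothetical integer coding for the per-pixel label map; the SDK's actual
# encoding is not described in this summary.
LABELS = {1: "enamel", 2: "dentine", 3: "pulp", 4: "bone", 5: "artificial"}

def per_structure_masks(label_map):
    """Split a per-pixel label map into one boolean mask per anatomical structure."""
    return {name: label_map == value for value, name in LABELS.items()}

def dice(pred_mask, ref_mask):
    """Dice overlap between a predicted and a reference binary mask
    (returns 1.0 when both masks are empty)."""
    intersection = np.logical_and(pred_mask, ref_mask).sum()
    total = pred_mask.sum() + ref_mask.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy 3x3 label maps: algorithm output vs. an expert reference annotation.
pred = np.array([[1, 1, 2], [2, 2, 3], [0, 4, 4]])
ref  = np.array([[1, 1, 2], [2, 3, 3], [0, 4, 4]])
pred_masks, ref_masks = per_structure_masks(pred), per_structure_masks(ref)
for name in pred_masks:
    print(name, round(dice(pred_masks[name], ref_masks[name]), 3))
```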
Here's a breakdown of the acceptance criteria and the studies that prove the device meets them, based on the provided FDA 510(k) Clearance Letter.
1. Table of Acceptance Criteria and Reported Device Performance
Note: The document does not explicitly state pre-defined acceptance criteria for the new features (Restoration Detection, ABL Measurement, Anatomy Segmentation). Instead, it presents the achieved performance metrics, implying that these values were considered acceptable. For the CADe function, the acceptance criteria are implied by the statistically significant improvement observed in the MRMC study.
| Feature / Metric | Acceptance Criteria (Implied) | Reported Device Performance |
|---|---|---|
| Focus Area Detection (CADe) | Statistically significant increase in AUC (AFROC analysis) when aided by the algorithm compared to unaided reading. | Achieved a highly significant AUC increase of 8.7% overall (p < 0.001) in the aided arm compared to the unaided arm. |
| Restoration Detection Algorithm | Acceptable standalone sensitivity, specificity, and Dice score for identifying and segmenting 8 types of dental restorations. | Overall Sensitivity: 88.8%; Overall Specificity: 96.6%; Mean Dice Score: 86.5% (closely matching inter-expert agreement) |
| Alveolar Bone Level (ABL) Measurement Algorithm | Acceptable standalone sensitivity and specificity for ABL line segment matching, and Mean Average Error (MAE) for ABL length measurements below a specific threshold (e.g., 1.5 mm). | Sensitivity (ABL line segment matching): 93.2%; Specificity (ABL line segment matching): 88.6%; Average Mean Average Error (MAE) for ABL length: 0.26 mm (well below the 1.5 mm threshold) |
| Anatomy Segmentation Algorithm | Acceptable standalone average Dice score, sensitivity, and specificity for identifying and segmenting key anatomical structures (Enamel, Dentine, Pulp, Jaw bone, artificial). | Overall Average Dice Score: 86.5%; Overall Average Sensitivity: 89.0%; Overall Average Specificity: 95.2% |
2. Sample Size Used for the Test Set and Data Provenance
| Feature / Study | Test Set Sample Size | Data Provenance | Retrospective/Prospective |
|---|---|---|---|
| Focus Area Detection (CADe) | 216 images (periapical and bitewing) | U.S.-based dental offices (using either sensors or photostimulable phosphor plates) | Retrospective |
| Restoration Detection Algorithm | 1,530 IOR images | Collected from dental practices across the United States and Europe. Images sourced from nine U.S. states and multiple European sites. | Retrospective |
| Alveolar Bone Level (ABL) Measurement Algorithm | 274 IOR images | Collected from 30 dental practices across the United States and Europe. Images sourced from multiple U.S and European sites. | Retrospective |
| Anatomy Segmentation Algorithm | 220 IOR images | Collected from dental practices across the United States and Europe. | Retrospective |
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
| Feature / Study | Number of Experts | Qualifications |
|---|---|---|
| Focus Area Detection (CADe) | Not explicitly stated in this document, but the MRMC study involved 30 readers (dentists) who participated in the diagnostic detection and localization tasks. The ground truth for the AFROC analysis would have been established by a panel of expert radiologists/dentists. | Dentists (for the MRMC study readers). For the ground truth establishment, typically board-certified radiologists/dentists with significant experience would be used, though specific qualifications are not detailed here. |
| Restoration Detection Algorithm | Three experts | Not explicitly stated, but for establishing ground truth in dental imaging, these would typically be board-certified dentists or oral radiologists with significant experience in diagnosing and identifying dental restorations. |
| Alveolar Bone Level (ABL) Measurement Algorithm | Three experts | Not explicitly stated, but for establishing ground truth in dental imaging, these would typically be board-certified dentists or oral radiologists with significant experience in measuring alveolar bone levels. |
| Anatomy Segmentation Algorithm | Not explicitly stated, but the "two-out-of-three consensus method" implies at least three experts were involved across the new features for ground truth. | Not explicitly stated, but for establishing ground truth in dental imaging, these would typically be board-certified dentists or oral radiologists with significant experience in identifying and segmenting dental anatomy. |
4. Adjudication Method for the Test Set Ground Truth
| Feature / Study | Adjudication Method |
|---|---|
| Focus Area Detection (CADe) | Not explicitly stated in this document. The MRMC study used AFROC analysis, implying a comprehensive ground truth established prior to the reader study. |
| Restoration Detection Algorithm | Two-out-of-three consensus method |
| Alveolar Bone Level (ABL) Measurement Algorithm | Two-out-of-three consensus method |
| Anatomy Segmentation Algorithm | Implied two-out-of-three consensus method (similar to other new features, although not explicitly stated for this specific algorithm). |
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, and Effect Size
Yes, a Multi-Reader Multi-Case (MRMC) comparative effectiveness study was done for the Focus Area Detection (CADe) functionality.
Effect Size: The study demonstrated a highly significant AUC increase (p < 0.001) of 8.7% overall in the aided arm (human readers with AI assistance) compared to the unaided control arm (human readers without AI assistance). This indicates that the AI significantly improved dentists' diagnostic detection and localization performance.
6. If a Standalone (Algorithm Only Without Human-in-the-Loop Performance) Was Done
Yes, standalone performance studies were done for all functionalities mentioned:
- Focus Area Detection (CADe): While the primary demonstration of effectiveness was through an MRMC study, the summary states, "The standalone performance testing results supporting this feature are included in that submission [K221921]," indicating standalone testing was performed.
- Restoration Detection Algorithm: "A standalone performance assessment was conducted to evaluate the DTX Studio Assist IOR Restoration Detection algorithm independently, without interaction from dental professionals..."
- Alveolar Bone Level (ABL) Measurement Algorithm: "A standalone performance assessment was conducted to evaluate the DTX Studio Assist IOR Alveolar Bone Level (ABL) Measurement algorithm independently, without interaction from dental professionals..."
- Anatomy Segmentation Algorithm: "A standalone performance assessment was conducted to evaluate the DTX Studio Assist IOR Anatomy Segmentation algorithm independently, without interaction from dental professionals..."
7. The Type of Ground Truth Used
| Feature / Study | Type of Ground Truth |
|---|---|
| Focus Area Detection (CADe) | Expert consensus (implied by the MRMC study setup and AFROC analysis, where an established truth is required for evaluating reader performance). |
| Restoration Detection Algorithm | Expert consensus (established by a two-out-of-three consensus method). |
| Alveolar Bone Level (ABL) Measurement Algorithm | Expert consensus (established by a two-out-of-three consensus method). |
| Anatomy Segmentation Algorithm | Expert consensus (established by a two-out-of-three consensus method, implied). |
8. The Sample Size for the Training Set
The document does not provide the specific sample size for the training set for any of the algorithms. It focuses on the validation (test) sets.
9. How the Ground Truth for the Training Set Was Established
The document does not provide details on how the ground truth for the training set was established. It only describes the ground truth establishment for the test sets. It does mention that the algorithms are based on "supervised machine learning algorithms," which inherently means they were trained on data with established ground truth.
(115 days)
DeepRESP is an aid in the diagnosis of various sleep disorders where subjects are often evaluated during the initiation or follow-up of treatment of various sleep disorders. The recordings to be analyzed by DeepRESP can be performed in a hospital, a patient's home, or an ambulatory setting. It is indicated for use with adults (18 years and above) in a clinical environment by or on the order of a medical professional.
DeepRESP is intended to mark sleep study signals to aid in the identification of events and annotations of traces; automatically calculate measures obtained from recorded signals (e.g., magnitude, time, frequency, and statistical measures of marked events); and infer sleep staging with arousals with EEG and in the absence of EEG. All output is subject to verification by a medical professional.
DeepRESP is a cloud-based software as a medical device (SaMD), designed to perform analysis of sleep study recordings, with and without EEG signals, providing data for the assessment and diagnosis of sleep-related disorders. Its algorithmic framework provides the derivation of sleep staging, including arousals, scoring of respiratory events, and key parameters such as the Apnea-Hypopnea Index (AHI) and Central Apnea-Hypopnea Index (CAHI).
DeepRESP (K252330) is hosted on a serverless stack. It consists of:
- A web Application Programming Interface (API) intended to interface with a third-party client application, allowing medical professionals to access DeepRESP's analytical capabilities.
- Predefined sequences called Protocols that run data analyses, including artificial intelligence and rule-based models for the scoring of sleep studies, and a parameter calculation service.
- A Result storage using an object storage service to temporarily store outputs from the DeepRESP Protocols.
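The components listed above imply an upload-analyze-retrieve workflow between a client application and the cloud service. The sketch below is a purely hypothetical client-side flow with placeholder endpoints and field names, since the actual DeepRESP API is not documented in this summary.

```python
import time
import requests  # generic HTTP client; all routes and fields below are hypothetical

BASE = "https://sleep-analysis.example/api/v1"  # placeholder, not the real endpoint

def run_protocol(recording_path: str, protocol: str, token: str) -> dict:
    """Hypothetical client flow: upload a sleep recording, start a scoring
    Protocol, poll until the temporarily stored result is ready, then fetch it.

    None of these routes are documented in the 510(k); this only mirrors the
    upload -> analyze -> retrieve pattern implied by the architecture description.
    """
    headers = {"Authorization": f"Bearer {token}"}

    with open(recording_path, "rb") as f:
        upload = requests.post(f"{BASE}/recordings", files={"file": f}, headers=headers)
    recording_id = upload.json()["id"]

    job = requests.post(f"{BASE}/protocols/{protocol}/runs",
                        json={"recording_id": recording_id}, headers=headers)
    job_id = job.json()["id"]

    while True:
        status = requests.get(f"{BASE}/runs/{job_id}", headers=headers).json()
        if status["state"] in ("done", "failed"):
            break
        time.sleep(5)

    # Results (sleep stages, respiratory events, AHI/CAHI) are held in temporary
    # object storage and remain subject to verification by a medical professional.
    return requests.get(f"{BASE}/runs/{job_id}/result", headers=headers).json()
```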
Here's a breakdown of the acceptance criteria and study details for DeepRESP, based on the provided FDA 510(k) clearance letter:
1. Table of Acceptance Criteria and Reported Device Performance
The document does not explicitly state pre-defined acceptance criteria (e.g., a minimum PPA for the AHI ≥ 5 severity threshold). Instead, it reports the performance of the device and compares it to predicate devices to demonstrate substantial equivalence and non-inferiority. The "Observed paired differences" columns, particularly those where the confidence interval does not cross zero, imply a comparison showing that the new DeepRESP v2.0 performs at least as well as the previous version or the additional predicate.
For the purpose of this analysis, I will present the reported performance of the Subject Device (DeepRESP v2.0) as the "reported device performance." Since no explicit acceptance criteria thresholds are given, the comparison to predicates and the demonstration of non-inferiority served as the implicit acceptance path for the FDA clearance.
Reported Device Performance (DeepRESP v2.0)
| Metric | Type I/II Studies (EEG) Reported Performance PPA% (NPA%, OPA%) | Type III HSAT-Flow Studies Reported Performance PPA% (NPA%, OPA%) | Type III HSAT-RIP Studies Reported Performance PPA% (NPA%, OPA%) |
|---|---|---|---|
| Severity Classification | |||
| AHI ≥ 5 | 87.7 (76.5, 87.3) | 91.0 (78.0, 90.6) | 93.7 (63.5, 92.8) |
| AHI ≥ 15 | 71.9 (94.8, 78.2) | 78.1 (93.9, 81.7) | 81.0 (91.1, 83.4) |
| CAHI ≥ 5 | 80.0 (98.0, 97.2) | 80.7 (98.0, 97.2) | 79.5 (97.6, 96.9) |
| Sleep Stages | |||
| Wake | 92.8 (95.8, 95.1) | 79.7 (96.6, 92.9) | 79.7 (96.6, 92.9) |
| REM | 82.5 (98.8, 96.5) | 77.0 (98.1, 95.2) | 77.0 (98.1, 95.2) |
| NREM1 | 43.1 (94.5, 91.7) | N/A (Only NREM total reported for Type III studies) | N/A (Only NREM total reported for Type III studies) |
| NREM2 | 78.1 (91.5, 85.3) | N/A (Only NREM total reported for Type III studies) | N/A (Only NREM total reported for Type III studies) |
| NREM3 | 87.5 (94.6, 93.7) | N/A (Only NREM total reported for Type III studies) | N/A (Only NREM total reported for Type III studies) |
| NREM (Total for Type III) | N/A | 94.2 (80.1, 89.1) | 94.2 (80.1, 89.1) |
| Respiratory Events | |||
| Respiratory events (overall) | 71.2 (93.2, 85.0) | 74.4 (92.0, 85.5) | 75.0 (90.7, 84.8) |
| All apnea | 83.7 (98.2, 97.1) | 84.5 (98.2, 97.0) | 81.1 (95.7, 94.5) |
| Central apnea | 79.3 (99.2, 99.0) | 77.5 (99.2, 99.0) | 78.8 (99.2, 99.0) |
| Obstructive apnea | 76.2 (98.4, 97.0) | 78.4 (98.4, 97.0) | 74.3 (96.0, 94.4) |
| Hypopnea | 60.1 (92.9, 83.5) | 63.9 (91.7, 83.3) | 58.9 (90.7, 81.0) |
| Desaturation | 98.5 (95.5, 96.1) | 98.8 (96.3, 96.9) | 98.8 (96.3, 96.9) |
| Arousal events | 62.1 (89.1, 81.5) | 64.0 (90.5, 83.1) | 64.0 (90.5, 83.0) |
PPA%: Positive Percent Agreement, NPA%: Negative Percent Agreement, OPA%: Overall Percent Agreement.
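PPA, NPA, and OPA are agreement rates computed against the manual scoring as the reference standard. A minimal computation over paired binary labels (e.g., per epoch or per matched event) is sketched below with synthetic data.

```python
import numpy as np

def agreement(auto, manual):
    """Positive, negative, and overall percent agreement, treating the manual
    scoring as the reference standard."""
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    ppa = 100 * np.mean(auto[manual])      # agreement where the reference is positive
    npa = 100 * np.mean(~auto[~manual])    # agreement where the reference is negative
    opa = 100 * np.mean(auto == manual)    # agreement overall
    return ppa, npa, opa

# Toy example: 30-second epochs labelled 1 where an apnea was scored.
manual = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
auto   = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(tuple(round(v, 1) for v in agreement(auto, manual)))  # -> (75.0, 83.3, 80.0)
```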
2. Sample Sizes and Data Provenance
The clinical validation was conducted using retrospective data.
- Test Set Sample Size:
- Type I/II Scoring Validation: 4,030 PSG recordings
- Type III Scoring Validation: 5,771 sleep recordings
- This comprised 4,037 Type I recordings and 1,734 Type II recordings, processed as Type III by using only the relevant subset of signals.
- Data Provenance:
- Country of Origin: United States.
- Data Type: Manually scored sleep recordings from sleep clinics, collected as part of routine clinical work for patients suspected of suffering from sleep disorders.
- Settings: Urban, suburban, and rural areas.
- Demographics: Included individuals in all age groups (18-21, 22-35, 36-45, 46-55, 56-65, >65) and all BMI groups (<25, 25-30, >30). The recording collection for Type I/II scoring consisted of 44% females, and for Type III scoring, 35% females. A high level of race/ethnicity diversity was represented (Caucasian or White, Black or African American, Other).
3. Number of Experts and Qualifications for Ground Truth
- Number of Experts: Not explicitly stated. The document refers to "manually scored sleep recordings" and "medical professional" for verification. It also mentions "board-certified sleep physicians" in the context of the training set. However, the specific number of experts used to establish the ground truth for the test set is not detailed.
- Qualifications of Experts: For the test set, it's implied that "medical professionals" performed the manual scoring, as the data originated from "routine clinical work." For the training set, "board-certified sleep physicians" were involved in establishing the ground truth.
4. Adjudication Method for the Test Set
The document does not explicitly describe an adjudication method (e.g., 2+1, 3+1) for establishing the ground truth on the test set. It mentions "manually scored sleep recordings" but does not detail how potential disagreements between multiple scorers (if any were used) were resolved.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
No, a multi-reader multi-case (MRMC) comparative effectiveness study was not done.
The study design was a retrospective data study comparing the automatic scoring of DeepRESP to manual scoring (ground truth) and also comparing DeepRESP's performance to two predicate devices (DeepRESP K241960 and Nox Sleep System K192469). This is a standalone performance evaluation against expert-derived ground truth, with a direct comparison to existing automated systems, not an MRMC study assessing human reader improvement with AI assistance.
6. Standalone Performance Study
Yes, a standalone performance study was done. The reported PPA, NPA, and OPA values for DeepRESP v2.0 (Subject Device) represent its performance as a standalone algorithm without human-in-the-loop assistance. The subsequent comparison to the predicate devices also evaluated their standalone performance. The document explicitly states: "All output is subject to verification by a medical professional," indicating that while the device is intended to aid in diagnosis, its performance evaluation was conducted on its automated output before any human review.
7. Type of Ground Truth Used
The ground truth used was expert consensus (manual scoring). The study used "manually scored sleep recordings" from "routine clinical work" as the reference standard against which DeepRESP's automatic scoring was compared.
8. Sample Size for the Training Set
The document does not report the sample size used for the training set. It only states the sample sizes for the validation (test) sets: 4,030 PSG recordings for Type I/II validation and 5,771 sleep recordings for Type III validation.
9. How the Ground Truth for the Training Set was Established
The document states that the ground truth for the training set "was established through a rigorous process involving multiple board-certified sleep physicians." This implies an expert-driven process, likely involving consensus or reconciliation among several highly qualified professionals. However, the exact methodology (e.g., number of physicians, adjudication rules) is not detailed beyond "rigorous process."
(266 days)
Second Opinion® Panoramic is a radiological automated image processing software device intended to identify and mark regions, in panoramic radiographs, in relation to suspected dental findings which include: Caries, Periapical radiolucency, and Impacted third molars.
It is designed to aid dental health professionals to review panoramic radiographs of permanent teeth in patients 16 years of age or older as both a concurrent and second reader.
Second Opinion® PR is a radiological automated image processing software device intended to identify and mark regions, in panoramic radiographs, in relation to suspected dental findings which include: caries, periapical radiolucency, and impacted third molars. It should not be used in lieu of full patient evaluation or solely relied upon to make or confirm a diagnosis.
It is designed to aid dental health professionals to review panoramic radiographs of permanent teeth in patients 16 years of age or older as a concurrent and second reader.
Second Opinion® PR consists of three parts:
- Application Programming Interface ("API")
- Machine Learning Modules ("ML Modules")
- Client User Interface (UI) ("Client")
The processing sequence for an image is as follows:
- Images are sent for processing via the API
- The API routes images to the ML modules
- The ML modules produce detection output
- The UI renders the detection output
The API serves as a conduit for passing imagery and metadata between the user interface and the machine learning modules. The API sends imagery to the machine learning modules for processing and subsequently receives metadata generated by the machine learning modules which is passed to the interface for rendering.
Second Opinion® PR uses machine learning to detect regions of interest. Images received by the ML modules are processed yielding detections which are represented as metadata. The final output is made accessible to the API for the purpose of sending to the UI for visualization. Detected regions of interest are displayed as mask overlays atop the original radiograph which indicate to the practitioner which regions contain which detected potential conditions that may require clinical review. The clinician can toggle over the image to highlight a potential condition for viewing.
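The rendering step described above amounts to blending per-finding masks onto the original radiograph. The sketch below shows one way a viewer could do this, assuming the detection metadata has already been converted to boolean masks; the mask format and colors are illustrative assumptions, not the device's actual implementation.

```python
import numpy as np

def overlay_masks(radiograph, masks, alpha=0.4):
    """Blend coloured detection masks onto a grayscale radiograph.

    radiograph: 2-D float array in [0, 1]
    masks: dict of finding name -> boolean mask (assumed format; the real
           metadata schema returned by the ML modules is not specified here)
    """
    colors = {"caries": (1.0, 0.0, 0.0),
              "periapical radiolucency": (0.0, 0.6, 1.0),
              "impacted third molar": (1.0, 0.8, 0.0)}
    rgb = np.repeat(radiograph[..., None], 3, axis=-1)
    for name, mask in masks.items():
        color = np.array(colors.get(name, (0.0, 1.0, 0.0)))
        rgb[mask] = (1 - alpha) * rgb[mask] + alpha * color
    return rgb

# Toy example: one highlighted rectangular region on a random "radiograph".
img = np.random.default_rng(1).random((64, 64))
demo_mask = np.zeros((64, 64), dtype=bool)
demo_mask[20:30, 15:25] = True
print(overlay_masks(img, {"caries": demo_mask}).shape)  # (64, 64, 3)
```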
Here's a detailed breakdown of the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) Clearance Letter:
1. A table of acceptance criteria and the reported device performance
| Performance Metric | Acceptance Criteria (Pre-specified Performance Threshold) | Reported Device Performance (Standalone Study) |
|---|---|---|
| Impacted Third Molars | ||
| wAFROC FOM | > 0.78 | 0.9788 |
| Lesion-level Sensitivity | Not explicitly stated (implied high) | 99% |
| Dice | Not explicitly stated (implied high) | ≥ 0.68 (overall for segmentation) |
| Jaccard Index | Not explicitly stated (implied high) | ≥ 0.62 (overall for segmentation) |
| Periapical Radiolucency | ||
| wAFROC FOM | > 0.71 | 0.8113 |
| Lesion-level Sensitivity | Not explicitly stated (implied high) | 82% |
| Dice | Not explicitly stated (implied high) | ≥ 0.68 (overall for segmentation) |
| Jaccard Index | Not explicitly stated (implied high) | ≥ 0.62 (overall for segmentation) |
| Caries | ||
| wAFROC FOM | > 0.70 | 0.7211 |
| Lesion-level Sensitivity | Not explicitly stated (implied high) | 77% |
| Dice | Not explicitly stated (implied high) | ≥ 0.68 (overall for segmentation) |
| Jaccard Index | Not explicitly stated (implied high) | ≥ 0.62 (overall for segmentation) |
| General (Across all features) | ||
| Statistical Significance (p-value) | < 0.0001 (implied for exceeding thresholds) | < 0.0001 (for all wAFROC values) |
| Segmentation (Dice & Jaccard) | Not explicitly stated (implied high) | Dice ≥ 0.68, Jaccard ≥ 0.62 |
| MRMC (Improvement with AI) | ||
| Periapical Radiolucency (Lesion wAFROC difference) | Statistically significant increase | 0.0705 (p < 0.00001) |
| Periapical Radiolucency (Lesion Sens. gain) | Statistically significant increase | 0.2045 (p < 0.00001) |
| Caries (Lesion wAFROC difference) | Statistically significant increase | 0.0306 (p = 0.0195) |
| Caries (Lesion Sens. gain) | Statistically significant increase | 0.1169 |
| Impacted Teeth (Lesion wAFROC difference) | Statistically significant increase | 0.0093 (p = 0.0326) |
| Impacted Teeth (Lesion Sens. gain) | Statistically significant increase | 0.0192 |
| FPPI or Specificity | No significant increase in FPPI / no significant decrease in specificity | Stable FPPI; high specificity (≥ 0.97) maintained |
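As background for the Dice and Jaccard thresholds in the table above, both are standard overlap measures between a predicted segmentation mask and the ground-truth mask (Jaccard = Dice / (2 - Dice)). The following NumPy sketch is illustrative only and is not the sponsor's code.

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Overlap of two binary masks: Dice = 2|A∩B| / (|A|+|B|), Jaccard = |A∩B| / |A∪B|."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2 * inter / (pred.sum() + truth.sum()) if (pred.sum() + truth.sum()) else 1.0
    jaccard = inter / union if union else 1.0
    return float(dice), float(jaccard)
```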
2. Sample size used for the test set and the data provenance
- Sample Size (Test Set): An "enriched regionally balanced image set of 795 images" was used for the clinical evaluation.
- Data Provenance:
- Country of Origin: Not explicitly stated for each image, but the set is described as geographically diverse with respect to the United States, drawing from specific regions (Northwest, Northeast, South, West, Midwest).
- Retrospective/Prospective: The study is described as "retrospective" due to its "non-patient-contact nature."
3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts
- Number of Experts: Four board-certified dentists.
- Qualifications of Experts: Each ground truth reader possessed "a minimum of five years practice experience."
4. Adjudication method (e.g., 2+1, 3+1, none) for the test set
- Adjudication Method: Consensus approach based on agreement among at least three out of four expert readers. (This is a 3-out-of-4 or 3/4 consensus method).
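A minimal sketch of this 3-out-of-4 consensus rule at the finding level (illustrative only; the submission does not describe the adjudication workflow in code):

```python
def consensus_positive(reader_calls: list[bool], required: int = 3) -> bool:
    """A finding enters the ground truth only if at least `required` readers marked it."""
    return sum(reader_calls) >= required

print(consensus_positive([True, True, True, False]))   # True  -> included in ground truth
print(consensus_positive([True, True, False, False]))  # False -> excluded
```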
5. If a multi-reader multi-case (MRMC) comparative effectiveness study was done, and if so, the effect size of how much human readers improve with AI vs. without AI assistance
- Yes, a fully-crossed MRMC study was done.
- Effect Size of Improvement (AI-aided vs. unaided reading):
- Periapical Radiolucency:
- Lesion level wAFROC difference: 0.0705 (95% CI: 0.04–0.10)
- Image level wAFROC difference: 0.0715 (95% CI: 0.07–0.07)
- Lesion-level sensitivity gain: 0.2045 (95% CI: 0.17–0.24)
- Caries:
- Lesion level wAFROC difference: 0.0306 (95% CI: 0.00–0.06)
- Image level wAFROC difference: 0.0176 (95% CI: 0.02–0.02)
- Lesion-level sensitivity gain: 0.1169 (95% CI: 0.08–0.15)
- Impacted Teeth:
- Lesion level wAFROC difference: 0.0093 (95% CI: 0.00–0.02)
- Image level wAFROC difference: 0.0715 (95% CI: 0.07–0.07)
- Lesion-level sensitivity gain: 0.0192 (95% CI: 0.01–0.03)
6. If a standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Yes, a standalone clinical study was done. The results are discussed in the "Standalone Testing" section, demonstrating the algorithm's performance independent of human readers.
7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.)
- Type of Ground Truth: Expert consensus. Specifically, "consensus ground truth established by expert dental radiologists" using agreement among the four board-certified dentists.
8. The sample size for the training set
- The document does not provide the sample size for the training set. It only describes the test set size.
9. How the ground truth for the training set was established
- The document does not specify how the ground truth for the training set was established. It only details the ground truth establishment process for the test set.
(220 days)
AutoDensity is a post-processing software intended to estimate spine Bone Mineral Density (BMD) from EOSedge dual energy images for orthopedic pre-surgical assessment applications. It is an opportunistic tool that enables immediate assessment of bone density from EOSedge images acquired for other purposes.
AutoDensity is not intended to replace DXA screening. Suspected low BMD should be confirmed by a DXA exam.
Clinical judgment and experience are required to properly use the software.
Based on EOSedge™ system's images acquired with the dual energy protocols cleared in K233920, AutoDensity software provides an estimate of the Bone Mineral Density (BMD) for L1-L4 in EOSedge AP radiographs of the spine. These values are used to aid in BMD estimation in orthopedic surgical planning workflows to help inform patient assessment and surgical decisions. AutoDensity is opportunistic in nature and provides BMD information with equivalent radiation dose compared to the EOSedge images concurrently acquired and used for general radiographic exams. AutoDensity is not intended to replace DXA screening.
Here's a breakdown of the acceptance criteria and the study details for the AutoDensity device, based on the provided FDA 510(k) clearance letter:
1. Acceptance Criteria and Reported Device Performance
Device Name: AutoDensity
Intended Use: Post-processing software to estimate spine Bone Mineral Density (BMD) from EOSedge dual energy images for orthopedic pre-surgical assessment applications.
| Acceptance Criteria | Reported Device Performance |
|---|---|
| Vertebral Level Identification Accuracy | |
| Percent of levels correctly identified ≥ 90% | Testing confirms that the AutoDensity ROI detection algorithm meets performance thresholds. (Specific percentage not provided, but stated to meet criterion). |
| Spine ROI Accuracy (Dice Coefficient) | |
| Lower boundary of 95% CI of mean Dice Coefficient ≥ 0.80 | Testing confirms that the AutoDensity ROI detection algorithm meets performance thresholds. (Specific value not provided, but stated to meet criterion). |
| BMD Precision (Phantom - CV%) | |
| CV% < 1.5% (compared to reference device) | Results met the acceptance criterion (CV% < 1.5%). |
| BMD Agreement (Phantom - max difference) | |
| (Specific numeric criterion not explicitly stated, but implies clinical equivalence to reference device) | Maximum BMD difference of 0.057 g/cm² for the high BMD phantom vertebra, and a difference of < 0.018 g/cm² for clinically relevant BMD range. |
| BMD Precision (Clinical - CV%) | |
| (Specific numeric criterion not explicitly stated, but implies acceptable clinical limits) | AutoDensity precision CV% was 2.23% [95% CI: 1.78%, 2.98%], which is within the range of acceptable clinical limits for the specified pre-surgical orthopedic patient assessment. |
| BMD Agreement (Clinical - Bland-Altman) | |
| (Specific numeric criterion not explicitly stated, but implies equivalence to other commercial bone densitometers) | Bland-Altman bias was 0.045 g/cm², and limits of agreement (LoA) were [-0.088 g/cm², 0.178 g/cm²]. Stated as equivalent to published agreement between other commercial bone densitometers. |
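For context, the precision (CV%) and Bland-Altman statistics reported in the table above follow standard definitions. The sketch below is illustrative and is not the sponsor's analysis code; note that clinical precision CV% is usually an RMS average across subjects with repeat scans, which is simplified here to a single series of repeated measurements.

```python
import numpy as np

def cv_percent(repeat_measurements: np.ndarray) -> float:
    """Coefficient of variation (%) of repeated BMD measurements on one subject or phantom."""
    return 100.0 * repeat_measurements.std(ddof=1) / repeat_measurements.mean()

def bland_altman(device: np.ndarray, reference: np.ndarray) -> tuple[float, tuple[float, float]]:
    """Bias (mean difference) and 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diff = device - reference
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```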
2. Sample Sizes and Data Provenance
Test Set (for ROI Performance Evaluation):
- Sample Size: 129 patients.
- Data Provenance: All cases were obtained from EOSedge systems (K233920). The document does not specify the country of origin for the ROI test set, but the clinical performance study enrolled 65% US subjects and 35% French subjects, which might suggest a similar distribution. The data appears to be retrospective, since the cases were "obtained from EOSedge systems" rather than acquired prospectively for the study.
3. Number of Experts and Qualifications for Ground Truth
For ROI Performance Evaluation Test Set:
- Number of Experts: At least three, per the "3 truther majority voting principle": two trained technologists plus one senior US board-certified expert radiologist who acted as the gold-standard adjudicator.
- Qualifications:
- Two trained technologists (for initial ROI and level identification).
- One senior US board-certified expert radiologist (for supervision, review, selection of most accurate set, and final adjustments).
4. Adjudication Method for the Test Set
For ROI Performance Evaluation Test Set:
- Adjudication Method: A "3 truther majority voting principle" was used, with input from a senior US board-certified expert radiologist (acting as the "gold standard"). The radiologist reviewed results, selected the more accurate set, and made necessary adjustments. This combines elements of majority voting with expert adjudication.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was an MRMC study done? No, the provided document does not mention an MRMC comparative effectiveness study where human readers' performance with and without AI assistance was evaluated. The performance data presented focuses on the standalone performance of the AI algorithm and its agreement/precision with a reference device or clinical measurements.
6. Standalone Performance Study (Algorithm Only)
- Was a standalone study done? Yes. The "Region of Interest (ROI) Performance Evaluation" section explicitly states: "To assess the standalone performance of the AI algorithm of AutoDensity, the test was performed with..." This section details the evaluation of the algorithm's predictions against ground truth for vertebral level identification and spine ROI accuracy.
7. Type of Ground Truth Used
For ROI Performance Evaluation Test Set:
- Type of Ground Truth: Expert consensus with adjudication. Ground truths for ROIs and level identification were established by two trained technologists under the supervision of a senior US board-certified radiologist. The radiologist made the final informed decision, often described as a "gold standard."
8. Sample Size for the Training Set
- Training Set Sample Size: The AI algorithm was trained using 4,679 3D reconstructions and 9,358 corresponding EOS (K152788) or EOSedge (K233920) biplanar 2D X-ray images.
9. How Ground Truth for the Training Set was Established
- The document implies that the training data was "selected to only keep relevant images with the fields of view of interest." However, it does not explicitly detail how the ground truth for the training set was established (e.g., whether it used expert annotations, a similar adjudication process, or other methods). It primarily focuses on the test set ground truth establishment.
(282 days)
SugarBug is a radiological, automated, concurrent read, computer-assisted detection software intended to aid in the detection and segmentation of caries on bitewing radiographs. The device provides additional information for the dentist to use in their diagnosis of a tooth surface suspected of being carious. Sugarbug is intended to be used on patients 18 years and older. The device is not intended as a replacement for a complete dentist's review or their clinical judgment that takes into account other relevant information from the image, patient history, and actual in vivo clinical assessment.
SugarBug is a software as a medical device (SaMD) that uses machine learning to label features that the reader should examine for evidence of decay. SugarBug uses a convolutional neural network to perform a semantic segmentation task. The algorithm evaluates every pixel in an image and assigns it a probability that it contains decay; a threshold then determines which pixels are labeled in the device's output. The software reads the selected image using local processing; images are not imported or sent to a cloud server at any time during routine use.
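The per-pixel probability-and-threshold behavior described above can be illustrated with a short sketch; the array names and the 0.5 threshold are assumptions for illustration, not values taken from the submission.

```python
import numpy as np

def caries_mask(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary output mask: label the pixels whose decay probability exceeds the threshold."""
    return (prob_map >= threshold).astype(np.uint8)

demo_probabilities = np.random.rand(4, 4)   # stand-in for the CNN's per-pixel output
print(caries_mask(demo_probabilities))
```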
Here's a breakdown of the acceptance criteria and the study details for the SugarBug (1.x) device, based on the provided FDA 510(k) clearance letter:
1. Acceptance Criteria and Reported Device Performance
The direct "acceptance criteria" are not explicitly stated in a quantitative table for this device. However, based on the clinical study results and the stated objectives, the implicit acceptance criteria would have been:
- Statistically significant improvement in overall diagnostic performance (wAFROC-AUC) for aided readers compared to unaided readers.
- Demonstrated improvement in lesion-level sensitivity for aided readers.
- Maintain or improve lesion annotation quality (DICE scores) with aid.
- Standalone performance metrics (sensitivity, FPPI, DICE coefficient) within an acceptable range.
Here's a table summarizing the reported device performance against these implicit criteria:
| Metric | Acceptance Criteria (Implicit) | Reported Unaided Reader Performance | Reported Aided Reader Performance | Reported Difference (Aided vs. Unaided) | Statistical Significance | Standalone Device Performance |
|---|---|---|---|---|---|---|
| MRMC Study (Aided vs. Unaided) | ||||||
| wAFROC-AUC (Primary Endpoint) | Statistically significant improvement with aid | 0.659 (0.611, 0.707) | 0.725 (0.683, 0.767) | 0.066 (0.030, 0.102) | p = 0.001 (Significant) | N/A |
| Lesion-Level Sensitivity | Statistically significant improvement with aid | 0.540 (0.445, 0.621) | 0.674 (0.615, 0.728) | 0.134 (0.066, 0.206) | Significant | N/A |
| Mean FPPI | Maintain or improve (small or negative difference) | 0.328 (0.102, 0.331) | 0.325 (0.128, 0.310) | -0.003 (-0.103, 0.086) | Not statistically significant (small improvement) | N/A |
| Mean DICE Scores (Readers) | Improvement in lesion delineation | 0.695 (0.688, 0.702) | 0.740 (0.733, 0.747) | 0.045 (0.035, 0.055) | N/A (modest improvement) | N/A |
| Standalone Study | ||||||
| Lesion-level sensitivity | Acceptable range | N/A | N/A | N/A | N/A | 0.686 (0.655, 0.717) |
| Mean FPPI | Acceptable range | N/A | N/A | N/A | N/A | 0.231 (0.111, 0.303) |
| DICE coefficient (vs. ground truth) | Acceptable range | N/A | N/A | N/A | N/A | 0.746 (0.724, 0.768) |
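For context on the lesion-level sensitivity and FPPI rows above: lesion-level sensitivity is the fraction of true lesions that received a matching mark, and FPPI is the number of false-positive marks per image. A minimal sketch with made-up counts, illustrative only:

```python
def lesion_sensitivity_and_fppi(tp_lesions: int, fn_lesions: int,
                                fp_marks: int, n_images: int) -> tuple[float, float]:
    """Sensitivity = detected true lesions / all true lesions; FPPI = false-positive marks / images."""
    sensitivity = tp_lesions / (tp_lesions + fn_lesions)
    fppi = fp_marks / n_images
    return sensitivity, fppi

# Made-up counts for illustration only (not the study's data).
print(lesion_sensitivity_and_fppi(tp_lesions=80, fn_lesions=20, fp_marks=30, n_images=100))
```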
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size for Test Set (MRMC Study): 300 bitewing radiographic images.
- Sample Size for Test Set (Standalone Study): 400 de-identified images.
- Data Provenance: Retrospectively collected from routine dental examinations of patients aged 18 and older from the US. The images were sampled to be representative of a range of x-ray sensor types (Vatech HD: 29%, iSensor H2: 11%, Schick 33: 45%, Dexis Platinum: 15%).
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications
- Number of Experts: 3 US licensed general dentists.
- Qualifications: Mean of 27 years of clinical experience.
4. Adjudication Method for the Test Set
- Adjudication Method (Ground Truth): Consensus labels of the 3 US licensed general dentists.
5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study
- Was a MRMC study done? Yes.
- Effect size of human readers improvement with AI vs. without AI assistance:
- wAFROC-AUC Improvement: 0.066 (0.030, 0.102) with a p-value of 0.001.
- Lesion-Level Sensitivity Improvement: 0.134 (0.066, 0.206).
6. Standalone (Algorithm Only) Performance Study
- Was a standalone study done? Yes.
- Performance metrics:
- Lesion-level sensitivity: 0.686 (0.655, 0.717)
- Mean FPPI: 0.231 (0.111, 0.303)
- DICE coefficient versus ground truth: 0.746 (0.724, 0.768)
7. Type of Ground Truth Used
- Type of Ground Truth: Expert consensus (established by 3 US licensed general dentists).
8. Sample Size for the Training Set
- The document does not explicitly state the sample size used for the training set. It only describes the test sets.
9. How the Ground Truth for the Training Set Was Established
- The document does not explicitly state how the ground truth for the training set was established. It only mentions that the standalone testing data (which could be considered a "test set" for the standalone algorithm) was "collected and labeled in the same procedure as the MRMC study," implying expert consensus was used for that, but it doesn't specify for the training data.