Second Opinion® is computer-aided detection ("CADe") software that identifies and marks regions associated with suspected dental findings, including caries, discrepancy at the margin of an existing restoration, calculus, periapical radiolucency, crowns (metal, including zirconia, and non-metal), fillings (metal and non-metal), root canals, bridges, and implants.
It is designed to aid dental health professionals, as a second reader, in reviewing bitewing and periapical radiographs of permanent teeth in patients 12 years of age or older.
Second Opinion® is a computer-aided detection ("CADe") software device indicated for use by dental health professionals as an aid in their assessment of bitewing and periapical radiographs of permanent teeth in patients 12 years of age or older. Second Opinion® employs computer vision technology, developed using machine learning techniques, to detect and draw attention, as a second reader, to regions on bitewing and periapical radiographs where distinct pathologic and/or nonpathologic dental features may appear.
Second Opinion® consists of three parts:
- In-office application or Client User Interface ("Client")
- Application Programming Interface ("API")
- Computer Vision Models ("CV Model", "CV Models")
The Client resides in the clinician's office. The API and CV Models reside in a cloud computing platform, where image processing takes place.
For each radiograph, the CV Models create a metadata file and append information denoting pixel regions and other associated properties. Those associated properties include:
- Normal anatomy (e.g., Teeth)
- Nine radiological dental findings: five restorations (crowns, bridges, implants, root canals, fillings) and four pathologies (caries, margin discrepancy (MD), calculus, periapical radiolucency (PR))
The API delivers the metadata back to Second Opinion® via the cloud. The metadata information is displayed in graphical form to clinical users by way of the Second Opinion® Client's user interface.
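The document does not publish the metadata schema itself. As a rough illustration of the kind of per-image detection metadata such a pipeline could return, here is a minimal sketch in Python; every field name and value below is a hypothetical assumption, not the actual Second Opinion® format:

```python
# Hypothetical per-image metadata as a CV pipeline of this kind might
# return it. All field names and values are illustrative assumptions;
# the actual Second Opinion(R) schema is not described in the document.
radiograph_metadata = {
    "image_id": "bitewing_0001",           # identifier assigned by the Client
    "findings": [
        {
            "label": "caries",             # one of the nine dental findings
            "category": "pathology",       # pathology vs. restoration
            "bounding_box": {              # pixel region to highlight
                "x_min": 412, "y_min": 188, "x_max": 467, "y_max": 240,
            },
        },
        {
            "label": "filling",
            "category": "restoration",
            "bounding_box": {"x_min": 101, "y_min": 95, "x_max": 180, "y_max": 160},
        },
    ],
}
```

The Client would then render each bounding box as a graphical overlay on the displayed radiograph.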
Below is a summary of the acceptance criteria and the study demonstrating that the device meets them, based on the provided text:
Acceptance Criteria and Device Performance
1. Table of Acceptance Criteria and Reported Device Performance
The acceptance criteria for the Second Opinion® device appear to be based primarily on the demonstrated improvement in reader performance when aided by the device, together with the device's standalone detection performance. The document details the performance metrics used rather than explicitly stating pre-defined numerical acceptance thresholds for all aspects. Based on the clinical study results and conclusion, the following can be inferred:
| Feature/Metric | Acceptance Criteria (Implied) | Reported Device Performance |
|---|---|---|
| **Standalone Performance** | | |
| wAFROC-FOM (Caries, MD, Calculus, PR) | Comparable performance to unaided readers in detecting features, based on Jaccard Index (JI) ≥ 0.4 for lesion localization. | JI ≥ 0.4 (95% CI): Caries (0.73, 0.79); MD (0.71, 0.78); Calculus (0.78, 0.85); PR (0.75, 0.84). JI ≥ 0.5 (95% CI): Caries (0.61, 0.68); MD (0.62, 0.68); Calculus (0.75, 0.81); PR (0.69, 0.78) |
| Standalone Sensitivity | Not explicitly stated as a target; performance was assessed. | Range: 76.39%–89.77% |
| Standalone False Positive Rate (FPPI) | Not explicitly stated as a target; performance was assessed. | Range: 0.46–4.85 |
| **MRMC (Aided vs. Unaided) Performance** | | |
| Aided Reader Accuracy Improvement | Statistically significant improvement over unaided readers for caries, margin discrepancy (MD), calculus, and periapical radiolucency (PR); all pathologies must meet the pre-specified MRMC endpoints; no statistically significant reductions in performance. | Statistically significant improvement for Caries, MD, Calculus, and PR. Caries wAFROC-FOM: unaided 0.740 vs. aided 0.758 (P = 0.0062). Sensitivity improvement range: 0.9%–11.7%; readers with improved sensitivity: Caries 68%, MD 76%, Calculus 88%, PR 100%. FPPI improvement range: 0.08–0.136; readers with improved FPPI: Caries 92%, MD 96%, Calculus 100%, PR 36% |
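The localization criterion in the table above rests on the Jaccard Index (intersection over union) between a predicted region and a ground-truth region. A minimal Python sketch of that computation for axis-aligned rectangular bounding boxes (the box representation is consistent with the study's rectangular bounding boxes; the function itself is illustrative):

```python
def jaccard_index(box_a, box_b):
    """Jaccard Index (intersection over union) of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples in pixel coordinates.
    A prediction "localizes" a lesion when JI >= 0.4 (or 0.5 under the
    stricter threshold reported in the study).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero-sized if the boxes do not intersect).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Example: a prediction offset slightly from the ground-truth box.
print(jaccard_index((10, 10, 60, 60), (20, 20, 70, 70)))  # ~0.47 -> passes JI >= 0.4, fails JI >= 0.5
```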
2. Sample Size Used for the Test Set and Data Provenance
- Test Set Sample Size: 2,010 images. These images were reviewed by all four Ground Truth (GT) readers for both standalone and MRMC studies.
- Image Composition:
- Caries: 1,640 normal, 370 lesion-containing (655 lesions)
- MD: 1,741 normal, 269 lesion-containing (355 lesions)
- Calculus: 1,766 normal, 244 lesion-containing (467 lesions)
- PR: 1,887 normal, 123 lesion-containing (144 lesions)
- Data Provenance: Retrospective, unblinded, open-label, multi-site trials. The document does not specify the country of origin.
3. Number of Experts Used to Establish the Ground Truth for the Test Set and Qualifications
- Number of Experts: Four expert readers.
- Qualifications: The document identifies them as "expert readers" of dental radiographs but does not provide specific qualifications (e.g., years of experience or a specialization such as radiology).
4. Adjudication Method for the Test Set
- Adjudication Method: Consensus approach based on agreement among at least three out of four expert readers. Each expert independently marked areas on the radiographs using the smallest possible rectangular bounding box.
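The document states the 3-of-4 agreement rule but not how marks from different readers were matched to one another. The sketch below shows one plausible implementation, assuming two readers' boxes refer to the same lesion when their Jaccard Index meets a threshold; it reuses the `jaccard_index` helper sketched earlier.

```python
def consensus_lesions(reader_boxes, min_agree=3, match_ji=0.4):
    """Keep lesions marked by at least `min_agree` of the readers.

    `reader_boxes`: one list of (x_min, y_min, x_max, y_max) boxes per
    reader, all for the same radiograph. The JI-based matching rule and
    its threshold are assumptions; the document states only that 3-of-4
    agreement established the ground truth.
    """
    consensus = []
    for boxes in reader_boxes:
        for box in boxes:
            # Count readers (including this one) who marked an overlapping region.
            agree = sum(
                any(jaccard_index(box, other) >= match_ji for other in others)
                for others in reader_boxes
            )
            # Keep the box if enough readers agree and it is not a
            # near-duplicate of a box already kept.
            if agree >= min_agree and not any(
                jaccard_index(box, kept) >= match_ji for kept in consensus
            ):
                consensus.append(box)
    return consensus
```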
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done, and the Effect Size of how much Human Readers Improve with AI vs. without AI Assistance
- MRMC Study Done: Yes, a fully-crossed multi-reader, multi-case (MRMC) retrospective reader study was performed.
- Effect Size (Improvement with AI vs. without AI Assistance):
- Overall Detection Accuracy (wAFROC-FOM for Caries): Unaided: 0.740, Aided: 0.758. This difference was significant (P=0.0062).
- Sensitivity Improvement: The improvement in sensitivity for a single dental finding was in the range of 0.9% to 11.7%.
- For Caries, 68% of readers improved sensitivity.
- For MD, 76% of readers improved sensitivity.
- For Calculus, 88% of readers improved sensitivity.
- For PR, 100% of readers improved sensitivity.
- False Positive Rate (FPPI) Improvement: The improvement in false positive rate for a single dental pathology was in the range of 0.08 to 0.136.
- For Caries, 92% of readers improved FPPI.
- For MD, 96% of readers improved FPPI.
- For Calculus, 100% of readers improved FPPI.
- For PR, 36% of readers improved FPPI.
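For reference, the sensitivity and FPPI endpoints above are straightforward to compute from matched detections. A minimal sketch, again assuming the JI ≥ 0.4 localization rule as the matching criterion (an assumption, not a documented detail of the study analysis):

```python
def sensitivity_and_fppi(cases, match_ji=0.4):
    """Lesion-level sensitivity and false positives per image (FPPI).

    `cases`: a list of (ground_truth_boxes, detected_boxes) pairs, one
    per image. A detection counts as a true positive when it localizes
    some ground-truth lesion at JI >= `match_ji`; unmatched detections
    are false positives.
    """
    total_lesions = found_lesions = false_positives = 0
    for gt_boxes, det_boxes in cases:
        total_lesions += len(gt_boxes)
        # A lesion is "found" if any detection overlaps it sufficiently.
        found_lesions += sum(
            any(jaccard_index(gt, det) >= match_ji for det in det_boxes)
            for gt in gt_boxes
        )
        # A detection overlapping no lesion is a false positive.
        false_positives += sum(
            not any(jaccard_index(det, gt) >= match_ji for gt in gt_boxes)
            for det in det_boxes
        )
    sensitivity = found_lesions / total_lesions if total_lesions else 0.0
    fppi = false_positives / len(cases) if cases else 0.0
    return sensitivity, fppi
```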
6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) was done
- Standalone Study Done: Yes. The document states: "Second Opinion® was clinically tested as a standalone device and in a fully-crossed multi-reader, multi-case (MRMC) reader study."
7. The Type of Ground Truth Used
- Ground Truth Type: Expert consensus. Specifically, agreement among at least three out of four expert readers who independently marked areas on radiographs using bounding boxes.
8. The Sample Size for the Training Set
- The document does not explicitly state the sample size used for the training set. It mentions "computer vision models developed using machine learning techniques" and "open-source models using supervised machine learning techniques," implying that a training set was used, but its size is not provided in the given text.
9. How the Ground Truth for the Training Set Was Established
- The document states that the computer vision models were developed using "supervised machine learning techniques." This implies that the training set had labels (ground truth) provided. However, the specific method of establishing this ground truth for the training set (e.g., expert consensus like the test set, or a different methodology) is not explicitly detailed in the provided text. It can be inferred that a similar process of expert labeling was used, but it's not confirmed.
§ 892.2070 Medical image analyzer.

(a) Identification. Medical image analyzers, including computer-assisted/aided detection (CADe) devices for mammography breast cancer, ultrasound breast lesions, radiograph lung nodules, and radiograph dental caries detection, is a prescription device that is intended to identify, mark, highlight, or in any other manner direct the clinicians' attention to portions of a radiology image that may reveal abnormalities during interpretation of patient radiology images by the clinicians. This device incorporates pattern recognition and data analysis capabilities and operates on previously acquired medical images. This device is not intended to replace the review by a qualified radiologist, and is not intended to be used for triage, or to recommend diagnosis.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the image analysis algorithms including a description of the algorithm inputs and outputs, each major component or block, and algorithm limitations.

(ii) A detailed description of pre-specified performance testing methods and dataset(s) used to assess whether the device will improve reader performance as intended and to characterize the standalone device performance. Performance testing includes one or more standalone tests, side-by-side comparisons, or a reader study, as applicable.

(iii) Results from performance testing that demonstrate that the device improves reader performance in the intended use population when used in accordance with the instructions for use. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, predictive value, and diagnostic likelihood ratio). The test dataset must contain a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results; and cybersecurity).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use.

(ii) A detailed description of the intended reading protocol.

(iii) A detailed description of the intended user and user training that addresses appropriate reading protocols for the device.

(iv) A detailed description of the device inputs and outputs.

(v) A detailed description of compatible imaging hardware and imaging protocols.

(vi) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) Device operating instructions.

(viii) A detailed summary of the performance testing, including: test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.