Overjet Calculus Assist (OCalA) is a radiological automated concurrent-read computer-assisted detection software intended to aid in the detection of interproximal calculus deposits on both bitewing and periapical radiographs. The Overjet Calculus Assist surrounds suspected calculus deposits with a bounding box. The device provides additional information for the dentist to use in their diagnosis of a tooth surface suspected of containing calculus deposits. The device is not intended as a replacement for a complete dentist's review or their clinical judgment that takes into account other relevant information from the image or patient history. The system is to be used by professionally trained and licensed dentists.
Overjet Calculus Assist is a module within the Overjet Platform. The Overjet Calculus Assist (OCalA) software automatically detects interproximal calculus on bitewing and periapical radiographs. It is intended to aid dentists in the detection of calculus. It should not be used in lieu of full patient evaluation or solely relied upon to make or confirm a diagnosis. The system is to be used by professionally trained and licensed dentists.
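Functionally, the cleared output is a concurrent-read overlay: bounding boxes drawn around suspected interproximal calculus that the dentist then confirms or dismisses during their own read. As a rough, purely illustrative sketch of what a per-image result of this kind might look like, the structure below uses assumed field names and an assumed pixel-coordinate convention; it is not Overjet's actual output format or API.

```python
from dataclasses import dataclass, field

@dataclass
class CalculusFinding:
    """One suspected interproximal calculus deposit flagged on a radiograph."""
    x_min: int   # bounding box in image pixel coordinates (assumed convention)
    y_min: int
    x_max: int
    y_max: int

@dataclass
class ImageResult:
    """Concurrent-read output for a single bitewing or periapical radiograph."""
    image_id: str
    image_type: str                                   # "bitewing" or "periapical"
    findings: list[CalculusFinding] = field(default_factory=list)

# The dentist reviews each box alongside the original image; the overlay is an
# aid to detection, not a replacement for the dentist's own review or judgment.
result = ImageResult(
    image_id="example-001",
    image_type="bitewing",
    findings=[CalculusFinding(x_min=412, y_min=230, x_max=455, y_max=268)],
)
```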
Here's an analysis of the acceptance criteria and study findings for the Overjet Calculus Assist device, based on the provided text:
1. Table of Acceptance Criteria and Reported Device Performance
While specific acceptance criteria thresholds are not explicitly stated as numerical values in the document (e.g., "Sensitivity must be >= X%"), the document describes the performance testing conducted and implies that these results met the pre-specified requirements. The performance presented is what the FDA reviewed and deemed acceptable for clearance.
| Test (Type) | Acceptance Criteria (Implied) | Metric | Bitewing | Periapical |
|---|---|---|---|---|
| Standalone Performance | Meets pre-specified requirements for sensitivity and specificity in calculus detection. | Sensitivity | 74.1% (95% CI: 66.2%, 82.0%) | 72.9% (95% CI: 65.3%, 80.5%) |
| | | Specificity | 99.4% (95% CI: 99.1%, 99.6%) | 99.6% (95% CI: 99.3%, 99.8%) |
| | | AFROC AUC | 0.859 (95% CI: 0.823, 0.894) | 0.867 (95% CI: 0.828, 0.903) |
| Clinical Performance (Reader Improvement) | Demonstrates superiority of aided versus unaided reader performance. | Reader Sensitivity (unassisted → assisted) | 74.9% (68.3%, 80.2%) → 84.0% (78.8%, 88.2%) | 74.7% (69.9%, 79.0%) → 84.4% (78.8%, 89.2%) |
| | | Reader Specificity (unassisted → assisted) | 98.8% (98.7%, 99.0%) → 98.6% (98.4%, 98.9%) | 98.1% (97.8%, 98.4%) → 98.0% (97.7%, 98.4%) |
| | | Reader AFROC AUC, averaged over readers (unassisted → assisted) | 0.840 (0.800, 0.880) → 0.878 (0.844, 0.913); p = 0.0055 | 0.846 (0.808, 0.884) → 0.900 (0.870, 0.929); p = 1.47e-05 |
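The standalone metrics above are surface-level proportions with 95% confidence intervals, but the summary does not report the underlying true/false positive and negative counts or the interval method used. As a minimal sketch of how such estimates can be derived from counts, the Python below uses a normal-approximation (Wald) interval; the function name and example counts are hypothetical, not values from the 510(k) summary.

```python
from math import sqrt

def rate_with_wald_ci(hits: int, total: int, z: float = 1.96):
    """Point estimate and normal-approximation (Wald) 95% CI for a proportion."""
    p = hits / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical surface-level counts for illustration only (not from the summary):
# tp/fn = calculus-positive surfaces the algorithm did / did not mark,
# tn/fp = calculus-free surfaces it left unmarked / marked.
tp, fn, tn, fp = 120, 42, 5900, 35

sens, sens_lo, sens_hi = rate_with_wald_ci(tp, tp + fn)
spec, spec_lo, spec_hi = rate_with_wald_ci(tn, tn + fp)
print(f"Sensitivity: {sens:.1%} (95% CI {sens_lo:.1%}, {sens_hi:.1%})")
print(f"Specificity: {spec:.1%} (95% CI {spec_lo:.1%}, {spec_hi:.1%})")
```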
2. Sample Sizes Used for the Test Set and Data Provenance
- Standalone Test Set:
  - Bitewing Radiographs: 296
  - Periapical Radiographs: 322
  - Total Surfaces (Bitewing): 6,121
  - Total Surfaces (Periapical): 3,595
  - Data Provenance: Not explicitly stated, but subgroup analyses for "sensor" and "clinical site" suggest real-world, diverse data. The document does not specify if the data was retrospective or prospective, or the country of origin.
- Clinical Evaluation (Reader Improvement) Test Set:
  - Bitewing Radiographs: 292 (85 with calculus, 211 without calculus)
  - Periapical Radiographs: 322 (89 with calculus, 233 without calculus)
  - Data Provenance: Not explicitly stated regarding retrospective/prospective or geographical origin.
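One quantity that follows directly from the reader-study counts above is the case-level enrichment of the test set, i.e., the fraction of radiographs containing calculus. The short snippet below only performs that arithmetic on the reported counts and is illustrative.

```python
# Case-level calculus prevalence in the reader-study test sets,
# computed from the counts reported in the summary.
test_sets = {"bitewing": (85, 292), "periapical": (89, 322)}

for name, (with_calculus, total) in test_sets.items():
    prevalence = with_calculus / total
    print(f"{name}: {with_calculus}/{total} radiographs with calculus ({prevalence:.1%})")
```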
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Ground Truth Establishment for Clinical Evaluation Test Set:
- Number of Experts: 3 US-licensed dentists formed a consensus for initial labeling. An oral radiologist provided adjudication for non-consensus labels.
- Qualifications of Experts: "US-licensed dentists" and an "oral radiologist." Specific years of experience or specialization within dentistry beyond "oral radiologist" are not provided.
4. Adjudication Method for the Test Set
- Clinical Evaluation Test Set Adjudication:
- Ground truth was established by consensus labels of three US-licensed dentists.
- Non-consensus labels were adjudicated by an oral radiologist. This effectively represents a 3-reader consensus with a 1-reader expert adjudication for disagreements.
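The truthing scheme described in items 3 and 4 (three US-licensed dentists label independently; an oral radiologist settles the labels where they disagree) can be written as a small decision rule. The sketch below is a generic illustration of that consensus-plus-adjudication logic, not Overjet's actual truthing tooling; the summary also does not say whether "consensus" required unanimity or a 2-of-3 majority, so the sketch assumes unanimity and sends everything else to the adjudicator.

```python
from collections import Counter

def surface_ground_truth(dentist_labels: list[bool], adjudicate) -> bool:
    """Consensus label for one tooth surface from three independent dentist reads.

    dentist_labels: three booleans, True = calculus present on the surface.
    adjudicate: callable invoked only when the reads disagree, standing in for
                the oral radiologist's adjudication.
    """
    assert len(dentist_labels) == 3
    if len(Counter(dentist_labels)) == 1:   # unanimous agreement -> consensus label
        return dentist_labels[0]
    return adjudicate()                     # any disagreement goes to the adjudicator

# Example: two dentists mark calculus, one does not -> the adjudicator decides.
label = surface_ground_truth([True, True, False], adjudicate=lambda: True)
```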
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study Was Done, What Was the Effect Size of Human Reader Improvement with AI vs. Without AI Assistance?
- Yes, an MRMC comparative effectiveness study was done. It was described as a "multi-reader, fully crossed reader improvement study."
- Effect Size (Improvement with AI vs. without AI Assistance):
  - Sensitivity Improvement:
    - Bitewing: +9.1 percentage points (74.9% → 84.0%)
    - Periapical: +9.7 percentage points (74.7% → 84.4%)
  - AFROC AUC Improvement (Reader Average):
    - Bitewing: +0.038 (0.840 → 0.878), p = 0.0055 (statistically significant)
    - Periapical: +0.054 (0.846 → 0.900), p = 1.47e-05 (statistically significant)
  - Specificity: A slight decrease of 0.1 to 0.2 percentage points when assisted, which is common in CADe systems, where increased sensitivity can come with a minor trade-off in specificity.
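The effect sizes in item 5 are simple paired differences between the assisted and unassisted reader-averaged values; the short snippet below reproduces that arithmetic from the reported figures. It is illustrative only and is not the study's statistical analysis (the summary does not describe the MRMC variance model behind the quoted p-values).

```python
# Reader-averaged values reported in the summary: (unassisted, assisted).
reported = {
    "sensitivity, bitewing (%)":   (74.9, 84.0),
    "sensitivity, periapical (%)": (74.7, 84.4),
    "AFROC AUC, bitewing":         (0.840, 0.878),
    "AFROC AUC, periapical":       (0.846, 0.900),
}

for metric, (unaided, aided) in reported.items():
    print(f"{metric}: {unaided} -> {aided} (delta = {aided - unaided:+.3g})")
```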
6. If a Standalone Performance Study (i.e., Algorithm Only, Without Human-in-the-Loop) Was Done
- Yes, a standalone performance test was conducted.
- The results are detailed in the "Standalone Testing" section, including sensitivity, specificity, and AFROC AUC for the AI algorithm alone.
7. The Type of Ground Truth Used (Expert Consensus, Pathology, Outcomes Data, etc.)
- For both Standalone and Clinical Evaluation Studies:
- The ground truth was established by expert consensus of US-licensed dentists, with adjudication by an oral radiologist for disagreements. This is a type of "expert consensus" ground truth. The document does not mention pathology or outcomes data.
8. The Sample Size for the Training Set
- The document does not provide the sample size of the training set for the AI model. It only details the test set used for performance evaluation.
9. How the Ground Truth for the Training Set Was Established
- The document does not specify how the ground truth for the training set was established. It only describes the ground truth methodology for the test set used in performance validation.
§ 892.2070 Medical image analyzer.
(a) Identification. Medical image analyzers, including computer-assisted/aided detection (CADe) devices for mammography breast cancer, ultrasound breast lesions, radiograph lung nodules, and radiograph dental caries detection, is a prescription device that is intended to identify, mark, highlight, or in any other manner direct the clinicians' attention to portions of a radiology image that may reveal abnormalities during interpretation of patient radiology images by the clinicians. This device incorporates pattern recognition and data analysis capabilities and operates on previously acquired medical images. This device is not intended to replace the review by a qualified radiologist, and is not intended to be used for triage, or to recommend diagnosis.

(b) Classification. Class II (special controls). The special controls for this device are:

(1) Design verification and validation must include:

(i) A detailed description of the image analysis algorithms including a description of the algorithm inputs and outputs, each major component or block, and algorithm limitations.

(ii) A detailed description of pre-specified performance testing methods and dataset(s) used to assess whether the device will improve reader performance as intended and to characterize the standalone device performance. Performance testing includes one or more standalone tests, side-by-side comparisons, or a reader study, as applicable.

(iii) Results from performance testing that demonstrate that the device improves reader performance in the intended use population when used in accordance with the instructions for use. The performance assessment must be based on appropriate diagnostic accuracy measures (e.g., receiver operator characteristic plot, sensitivity, specificity, predictive value, and diagnostic likelihood ratio). The test dataset must contain a sufficient number of cases from important cohorts (e.g., subsets defined by clinically relevant confounders, effect modifiers, concomitant diseases, and subsets defined by image acquisition characteristics) such that the performance estimates and confidence intervals of the device for these individual subsets can be characterized for the intended use population and imaging equipment.

(iv) Appropriate software documentation (e.g., device hazard analysis; software requirements specification document; software design specification document; traceability analysis; description of verification and validation activities including system level test protocol, pass/fail criteria, and results; and cybersecurity).

(2) Labeling must include the following:

(i) A detailed description of the patient population for which the device is indicated for use.

(ii) A detailed description of the intended reading protocol.

(iii) A detailed description of the intended user and user training that addresses appropriate reading protocols for the device.

(iv) A detailed description of the device inputs and outputs.

(v) A detailed description of compatible imaging hardware and imaging protocols.

(vi) Discussion of warnings, precautions, and limitations must include situations in which the device may fail or may not operate at its expected performance level (e.g., poor image quality or for certain subpopulations), as applicable.

(vii) Device operating instructions.

(viii) A detailed summary of the performance testing, including: test methods, dataset characteristics, results, and a summary of sub-analyses on case distributions stratified by relevant confounders, such as lesion and organ characteristics, disease stages, and imaging equipment.