The Lucy Point-of-Care Magnetic Resonance Imaging Device is a bedside magnetic resonance imaging device for producing images that display the internal structure of the head where full diagnostic examination is not clinically practical. When interpreted by a trained physician, these images provide information that can be useful in determining a diagnosis.
Lucy is a magnetic resonance imaging (MRI) device. Its portability allows imaging at the patient's bedside, and it enables visualization of the internal structures of the head using standard MRI contrasts. The main interface is a commercial off-the-shelf device used to operate the system; it provides access to patient data, exam set-up, exam execution, and MRI image viewing for quality-control purposes, as well as cloud storage interactions. Lucy can generate MRI data sets with a broad range of contrasts. The user interface includes touch-screen menus, controls, indicators, and navigation icons that allow the operator to control the system and view imagery.
The provided text is a 510(k) summary for the Lucy Point-of-Care Magnetic Resonance Imaging Device. It describes the device, its intended use, and a comparison to a predicate MRI device, but it contains no mention of an AI/ML component and no study demonstrating that such a component meets AI-related acceptance criteria. Details specific to AI algorithm performance (e.g., sample sizes for test and training sets, expert adjudication, MRMC studies, standalone performance, ground truth for AI) therefore cannot be extracted from this document.
The acceptance criteria and performance information outlined below reflect what would typically be expected for an AI/ML-driven medical device in a regulatory submission; they are not drawn from, and cannot be inferred from, the submission itself.
Hypothetical Acceptance Criteria and Study (if the Lucy device had an AI component):
Given that the provided text describes a hardware medical device (an MRI scanner) rather than an AI/ML algorithm, "acceptance criteria" and a "study that proves the device meets the acceptance criteria" in the AI context apply to the performance of an algorithm, not the hardware. Since no AI algorithm is mentioned in this document, the following is a hypothetical structure for what such a response would look like if an AI component were present.
1. Table of Acceptance Criteria and Reported Device Performance
If the Lucy device included an AI component (e.g., for automated lesion detection or image quality assessment), the acceptance criteria would typically revolve around diagnostic accuracy metrics such as those tabulated below; a minimal sketch of how such metrics are computed follows the table.
| Metric (Hypothetical for AI Component) | Acceptance Criteria (Hypothetical) | Reported Device Performance (Hypothetical) |
|---|---|---|
| Primary Endpoint (e.g., Sensitivity for detecting XYZ condition) | ≥ [Target %] for primary indication | [Achieved %] |
| Secondary Endpoint (e.g., Specificity for detecting XYZ condition) | ≥ [Target %] | [Achieved %] |
| ROC AUC (for classification tasks) | ≥ [Target value] | [Achieved value] |
| Negative Predictive Value (NPV) | ≥ [Target %] | [Achieved %] |
| Positive Predictive Value (PPV) | ≥ [Target %] | [Achieved %] |
| Detection Rate (for certain pathologies) | Within [X]% of expert consensus | [Achieved %] |
| False Positives per scan | ≤ [Target number] | [Achieved number] |
| False Negatives per scan | ≤ [Target number] | [Achieved number] |
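As a purely illustrative aid, the sketch below shows how the tabulated classification metrics could be computed from an adjudicated binary test set. The labels, scores, and 0.5 operating threshold are invented placeholders, not data from any Lucy submission.

```python
# Minimal sketch: computing hypothetical acceptance metrics from a labelled
# test set. `y_true` are adjudicated ground-truth labels, `y_score` are
# algorithm outputs; both are illustrative placeholders.

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels (1 = condition present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def summary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def roc_auc(y_true, y_score):
    """Rank-based AUC: probability a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (not real study data):
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
print(summary_metrics(y_true, y_pred), roc_auc(y_true, y_score))
```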
2. Sample Size Used for the Test Set and Data Provenance
- Sample Size (Hypothetical): Typically, several hundred to several thousand relevant cases are used for a robust AI/ML medical device test set; for example, 500-1,000 unique patient studies.
- Data Provenance (Hypothetical): Data should come from diverse geographic locations (e.g., multi-center studies spanning the US, Europe, and Asia) to ensure generalizability, and would ideally mix retrospective collection (for efficiency) with prospective collection (for real-world validation). For initial clearance, retrospectively collected data is often used, but prospective data is valuable for broader clinical claims.
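The site-held-out split below is one illustrative way such multi-center provenance could be exercised during validation; the site names and study counts are hypothetical.

```python
# Minimal sketch of a site-held-out split: an entire (hypothetical) site is
# reserved for testing so the test set probes cross-site generalizability.
import random

studies = [
    {"id": f"study_{i}", "site": site}
    for i, site in enumerate(["site_US_1", "site_EU_1", "site_ASIA_1"] * 200)
]

random.seed(0)
held_out_site = "site_ASIA_1"                      # entire site reserved for testing
test_set = [s for s in studies if s["site"] == held_out_site]
dev_set = [s for s in studies if s["site"] != held_out_site]
random.shuffle(dev_set)
train_set, tune_set = dev_set[:320], dev_set[320:]

print(len(train_set), len(tune_set), len(test_set))   # 320 80 200
```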
3. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications
- Number of Experts (Hypothetical): Typically, 3 to 5 independent experts are used, sometimes more for complex or ambiguous cases.
- Qualifications (Hypothetical): Board-certified radiologists with specific subspecialty expertise related to the device's intended use (e.g., neuroradiologists for head MRI), with significant years of experience (e.g., 5-10+ years) in interpreting the relevant imaging studies.
4. Adjudication Method for the Test Set
- Adjudication Method (Hypothetical): Common methods include:
- Majority Rule (e.g., 2+1 or 3+1): If 2 out of 3, or 3 out of 4, experts agree, that serves as the consensus ground truth. If there is no majority, a senior expert or a consensus discussion among experts may be employed for final arbitration (see the sketch after this list).
- Consensus Panel: Experts meet and discuss all discordant cases to reach a unanimous decision.
- Primary Reader + Adjudicator: One expert makes the initial read, and another adjudicates discordant cases or a percentage of cases for quality control.
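As referenced above, the following sketch illustrates a simple majority-rule adjudication with a senior-arbiter fallback; the reader labels are invented for illustration.

```python
# Minimal sketch of majority-rule adjudication across independent expert reads,
# falling back to a designated senior arbiter (or panel review) when no strict
# majority exists. All labels below are hypothetical.
from collections import Counter

def adjudicate(reads, arbiter=None):
    """Return the consensus label for one case from a list of expert reads."""
    label, votes = Counter(reads).most_common(1)[0]
    if votes > len(reads) / 2:          # strict majority reached
        return label
    return arbiter if arbiter is not None else "panel_review"

cases = [
    {"reads": ["lesion", "lesion", "normal"]},
    {"reads": ["lesion", "normal", "uncertain"], "arbiter": "normal"},
]
for case in cases:
    print(adjudicate(case["reads"], case.get("arbiter")))
```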
5. If a Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study was done
- MRMC Study (Hypothetical for AI): Yes, for devices intended to assist human readers, an MRMC study is standard practice.
- Effect Size (Hypothetical): The effect size would quantify the improvement in diagnostic performance (e.g., AUC, sensitivity, specificity, accuracy) of human readers with AI assistance compared to without AI assistance. For example, an MRMC study might show that radiologists' diagnostic accuracy for a specific condition increased by X% (e.g., 5-10%) and/or their reading time decreased by Y% when using the AI tool.
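The toy calculation below illustrates one simplified way such an effect size could be summarized: the mean per-reader change in AUC with versus without AI assistance. A real MRMC analysis would use a dedicated method (e.g., Obuchowski-Rockette or DBM) that also models case variability; the AUC values here are hypothetical.

```python
# Simplified sketch of an MRMC effect size: mean per-reader AUC improvement
# when reading with AI assistance versus without. Values are invented.
import statistics

auc_unaided = {"reader_1": 0.82, "reader_2": 0.79, "reader_3": 0.85, "reader_4": 0.80}
auc_aided = {"reader_1": 0.88, "reader_2": 0.84, "reader_3": 0.87, "reader_4": 0.86}

deltas = [auc_aided[r] - auc_unaided[r] for r in auc_unaided]
mean_delta = statistics.mean(deltas)
sd_delta = statistics.stdev(deltas)
print(f"Mean AUC improvement with AI assistance: {mean_delta:+.3f} (SD {sd_delta:.3f})")
```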
6. If a Standalone (algorithm-only, without human-in-the-loop) Performance Assessment was done
- Standalone Performance (Hypothetical for AI): Yes, standalone performance is almost always assessed for AI algorithms to understand the intrinsic capability of the algorithm before combining it with human input. This would be reported against the adjudicated ground truth.
7. The Type of Ground Truth Used
- Type of Ground Truth (Hypothetical for AI):
- Expert Consensus: The most common for imaging-based AI, established by independent highly-qualified experts.
- Pathology: Biopsy-proven results, considered the gold standard for many disease states (e.g., cancer).
- Clinical Outcomes Data: Longitudinal patient follow-up, lab tests, or other clinical findings that confirm the presence or absence of a condition.
- Hybrid: A combination of the above, often employing pathology or clinical outcomes where available, and expert consensus for cases where definitive pathological or outcome data is not feasible.
8. The Sample Size for the Training Set
- Training Set Sample Size (Hypothetical for AI): This varies significantly depending on the complexity of the task, the variety of conditions, and the imaging modality. It could range from thousands to hundreds of thousands or even millions of images/studies, often augmented with data synthesis techniques. For medical imaging, tens of thousands of studies are often used for robust training.
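The sketch below illustrates the kind of simple augmentation (random flips plus additive noise) sometimes used to stretch a training set of image slices; the array sizes and noise level are arbitrary placeholders, not a description of any Lucy training pipeline.

```python
# Minimal sketch of training-set augmentation for 2-D image slices:
# random left-right/up-down flips plus light Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)

def augment(slice_2d, noise_sigma=0.02):
    """Return a randomly flipped, lightly noised copy of a 2-D image slice."""
    out = slice_2d.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)            # left-right flip
    if rng.random() < 0.5:
        out = np.flipud(out)            # up-down flip
    return out + rng.normal(0.0, noise_sigma, size=out.shape)

# Toy usage: expand a single placeholder slice into several variants.
slice_2d = rng.random((8, 8))
augmented = [augment(slice_2d) for _ in range(4)]
print(len(augmented), augmented[0].shape)
```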
9. How the Ground Truth for the Training Set Was Established
- Training Set Ground Truth (Hypothetical for AI): This is typically less rigorously established than the test set ground truth due to the sheer volume of data, but must still be reliable. Methods include:
- Single Expert Annotation: A single trained expert (e.g., radiologist, technologist) labels the data.
- Automated Labeling from Reports: NLP tools might extract labels from existing clinical reports, followed by human review of a subset (a minimal sketch follows this list).
- Crowdsourcing (with Quality Control): For certain tasks, a large group of annotators might be used, with mechanisms for quality control and consensus.
- Referral to Clinical Records/EHR: Labels derived from the electronic health record (e.g., diagnosis codes, lab results) can serve as weak labels.
- Existing Clinical Labels: Utilizing labels already present in de-identified clinical datasets.
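As noted in the report-based labeling item above, the following sketch shows a crude keyword-plus-negation weak labeler over free-text impressions. The keyword list, negation rule, and report strings are invented for illustration; a real pipeline would need more robust negation handling and human review of a sampled subset.

```python
# Minimal sketch of weak-label extraction from free-text report impressions
# using a small keyword lexicon and a crude negation check. All patterns and
# example reports are hypothetical.
import re

POSITIVE_PATTERNS = [r"\bacute infarct\b", r"\bhemorrhage\b", r"\bmass effect\b"]
NEGATION = re.compile(r"\bno (evidence of )?", re.IGNORECASE)

def weak_label(report_text):
    """Return 1 if a finding keyword appears without a preceding negation, else 0."""
    for pattern in POSITIVE_PATTERNS:
        match = re.search(pattern, report_text, re.IGNORECASE)
        if match and not NEGATION.search(report_text[:match.start()]):
            return 1
    return 0

reports = [
    "Impression: Acute infarct in the left MCA territory.",
    "Impression: No evidence of hemorrhage or mass effect.",
]
print([weak_label(r) for r in reports])   # [1, 0]
```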
In summary, the provided document, FDA 510(k) clearance K192002 for the "Lucy Point-of-Care Magnetic Resonance Imaging Device," describes a hardware MRI system and its substantial equivalence to a predicate MRI system. It does not refer to any AI/ML component or associated performance studies.
§ 892.1000 Magnetic resonance diagnostic device.
(a) Identification. A magnetic resonance diagnostic device is intended for general diagnostic use to present images which reflect the spatial distribution and/or magnetic resonance spectra which reflect frequency and distribution of nuclei exhibiting nuclear magnetic resonance. Other physical parameters derived from the images and/or spectra may also be produced. The device includes hydrogen-1 (proton) imaging, sodium-23 imaging, hydrogen-1 spectroscopy, phosphorus-31 spectroscopy, and chemical shift imaging (preserving simultaneous frequency and spatial information).
(b) Classification. Class II (special controls). A magnetic resonance imaging disposable kit intended for use with a magnetic resonance diagnostic device only is exempt from the premarket notification procedures in subpart E of part 807 of this chapter subject to the limitations in § 892.9.