Search Results

Trade/Device Name: Sway System Sports Plus Regulation Number: 21 CFR 882.1471
|
| PrimaryClassificationRegulation: | 21 CFR §882.1471
Conclusion: Sway Sports+ falls within the type of device regulated under 21 CFR §882.1471, Computerized

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

The Sway System Sports Plus is intended for use as a computerized cognitive assessment and management of concussion in individuals ages 18-24. The Sway System Sports Plus is indicated for use when there is a suspected concussion or head injury.

Device Description

Sway Sports Plus (referred to as Sway Sports+) is a licensed variant of the Device Sway System and is a software solution (SaMD) that provides a computerized cognitive assessment aid for concussion. This prescription device uses an individual's score(s) on a battery of cognitive tasks to provide an indication of the current level of cognitive function in response to a suspected concussion. The computerized cognitive assessment aid for concussion is used only as an assessment aid in the management of concussion to determine cognitive function for patients after a potential concussive event where other diagnostic tools are available and does not identify the presence or absence of concussion. The examiner may have the patient complete the Sway Balance test to provide an additional indicator in their assessment of concussion. The Sway System is not intended as a stand-alone diagnostic device. After the patient completes the Sway Sports+ test(s), the examiner can view the summary statistics for each test such as mean reaction time (speed). Sway Sports+ does not create new information through analysis or interpretation, nor does it offer any diagnostic or treatment recommendations. The Sway Sports+ is not intended to be used as a standalone diagnostic device. Sway Sports+ operates in the Apple iOS or Android operating systems, enabling it to be used on the hardware of Apple iPhone. iPad, and iPod Touch. or Android mobile devices. The internal accelerometer is accessed to analyze motion of the device in response to stimuli during testing. The mobile device hardware is not provided by Sway Medical. Environments where Sway Sports+ may be used may include both clinical and non-clinical settings. Clinical settings may include physician offices, hospitals, physical or occupational therapy clinics, skilled nursing facilities, assisted living centers and other healthcare facilities where cognitive tests may be evaluated. In addition, non-clinical environments may include sports team's locations where the application may be used on the sideline to perform a cognitive assessment, e.g., fitness centers, gymnasiums, or stadiums. It is recommended that the tests be taken in a supervised environment for baseline and post-injury testing. Sway Sports+ consists of the Sway Balance (K121590) and the Sway Computerized Cognitive Assessment Aid. The Sway Computerized Cognitive Assessment Aid (PTY) is exempt from 510(k).

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study information for the Sway System Sports Plus, based on the provided text:

Acceptance Criteria and Device Performance

The provided text details the non-clinical and clinical performance testing results, and how they relate to the substantial equivalence claim rather than explicit "acceptance criteria" in a pass/fail table format. However, we can infer the acceptance criteria from the reported performance results and the comparison to predicate devices.

Table 1: Inferred Acceptance Criteria and Reported Device Performance

Performance Metric	Inferred Acceptance Criteria (Based on Predicate/Reference Equivalence & Clinical Utility)	Reported Device Performance (Sway Sports Plus)
Software Validation	Compliance with IEC 62304 and FDA guidance for software V&V. Meeting predetermined specifications.	Designed and developed according to Sway Medical, Inc.'s internal software development process in accordance with IEC 62304. Tested using verification and validation methods. Results indicate compliance with specifications. All tests passed with no exceptions.
Construct Validity	Demonstrates construct validity with traditional neuropsychological tests and similar correlations to predicate devices.	VanRavenhorst-Bell et al., 2021: Sway Reaction Time, Impulse Control, and Inspection Time módulos correlated significantly (r = -0.46 to 0.22, p ≤ 0.05) with ImPACT QT reaction time measures. Clark & VanRavenhorst-Bell, 2022: Sway Sports+ modules exhibited correlation values ranging from 0.22 to 0.42 with traditional neurocognitive tests (WAIS-IV, WMS-IV, D-KEFS, CVLT-III). Demonstrated stronger correlation strengths with traditional neurocognitive tests compared to ImPACT. Significant correlations with ImPACT and traditional tests (p < 0.001 and p < 0.05).
Test-Retest Reliability	Adequate or better test-retest reliability (e.g., ICC values generally ≥ 0.60 or similar to predicate devices).	Van Patten et al., 2021: ICC values for weeks 1-2 ranged from 0.60 to 0.8; weeks 1-3 ranged from 0.68 to 0.88. ANOVA showed non-significant differences over three weeks. Caccese et al., 2022: For Sway Balance, ICC for Weeks 1 & 2 was 0.82, Weeks 1 & 3 was 0.88. ANOVA showed no significant differences. Additional Study (healthy adults 18-35): ICC values ranged between 0.82 and 0.92 for all Sway Sports+ modules with a one-week interval. No statistically significant differences between test and retest. Additional Study (concussed individuals): Robust level of reliability during post-concussion testing, comparable to baseline evaluations (within-session design).
Sensitivity	Ability to effectively identify concussed individuals, comparable to predicate devices.	Chikar & Curtiss, 2023: Sensitivity of 69.6% (32/46) for predicting concussion group membership, outperforming ImPACT (58.7% at 80% CI). McNemar's test showed no significant difference (p=0.383). Additional Study: High sensitivity.
Specificity	Ability to correctly identify non-concussed individuals, avoiding excessive false positives, or acceptable levels given the clinical context.	Additional Study: Moderate specificity, suggesting a tendency to over-identify concussed individuals (potential false positives). However, in a clinical context with healthcare professionals, such false positives can be eliminated. Very low rate of false negatives.
Normative Data	Establishment of a robust normative database
to aid in clinical interpretation, stratified by relevant demographics (e.g., age, gender).	De-identified baseline test data from over 126,000 users aged 18 to 24 years old (US, English speakers, no recent concussion/neurological conditions/learning disability, valid scores). Data stratified across age and gender. Age/gender groups required a minimal size of 50 profiles. Results indicate increased performance with age (peaking early to mid-20s) and gender-based differences, leading to age/gender-specific normative values.

Study Details

Sample size used for the test set and the data provenance:
- Validity Studies:
  - VanRavenhorst-Bell et al., 2021: 88 healthy adults aged 18 to 48 years (mean 22.09 ± 4.47 years). Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Retrospective as it compares existing tools.
  - Clark & VanRavenhorst-Bell, 2022: 85 adults (mean age 23.1 years; 68% female). Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Prospective (all tests administered in one session by trained professionals).
- Reliability Studies:
  - Van Patten et al., 2021: 55 adults (69% women) with a mean age of 26.69 years (± 9.89; range = 18-58). Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Prospective (assessed remotely over 3 subsequent weeks).
  - Caccese et al., 2022: 55 healthy adults (69% women) with a mean age of 26.69 years (SD = 9.89; range = 18-58). Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Prospective (assessed remotely over 3 subsequent weeks).
  - Additional study (healthy adults): 97 participants initially recruited, with 56 completing all test and retest sessions (one-week interval). Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Prospective.
  - Additional study (concussed individuals): Utilized a dataset derived from NCAA member schools. Data provenance implies U.S. Retrospective (analyzing existing NCAA data for a within-session design).
- Sensitivity & Specificity Studies:
  - Chikar & Curtiss, 2023: 46 athletes (mean age 19.6, SD=1.4; 59% male) with confirmed concussions. Data from athletic organizations that conducted both ImPACT and Sway assessments. Data provenance implies U.S. and likely retrospective.
  - Additional study (sensitivity/specificity): No explicit sample size given, describes "high sensitivity but moderate specificity." Data provenance not explicitly stated, but implies U.S. based on the context of FDA submission. Likely retrospective analysis of existing data.
  - Additional pilot study (ImPACT vs. Sway): 24 high school and college-aged participants. Data provenance implies U.S. Prospective (underwent baseline and post-injury testing with both tools).
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The document primarily relies on comparisons to established and FDA-cleared devices (ImPACT QT, ImPACT, traditional neurocognitive assessments) and clinical standards for concussion diagnosis (NCAA Concussion Safety Protocol guidelines).
- For the Clark & VanRavenhorst-Bell, 2022 validity study, "all tests administered by trained professionals." Specific number or qualifications of these professionals (e.g., neuropsychologists, physicians) are not detailed.
- For the concussed individuals reliability study and the sensitivity/specificity pilot study, the "clinical standards for concussion diagnosis were strictly followed according to NCAA Concussion Safety Protocol guidelines" or involved "athletic organizations that conducted both ImPACT and Sway assessments." This implies diagnosis by medical professionals (e.g., team physicians, athletic trainers) adhering to established protocols, but no specific number or detailed qualifications of these "experts" is provided.
Adjudication method for the test set:
- No explicit adjudication method (e.g., 2+1, 3+1) is described for establishing ground truth within the provided studies. The ground truth often relies on the established scores/diagnoses from predicate devices or clinical protocols.
If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- No MRMC comparative effectiveness study involving human readers' improvement with AI vs. without AI assistance is described. The device is a "computerized cognitive assessment aid," meaning it provides data, but does not inherently involve human "readers" interpreting images (as would be common in diagnostic imaging MRMC studies). The studies focus on the device's measurement properties and its correlation with other assessment tools.
If a standalone (i.e. algorithm only without human-in-the loop performance) was done:
- Yes, the studies evaluate the Sway Sports+ system (algorithm/software) primarily in a standalone capacity, assessing its internal consistency (reliability), its correlation with established cognitive tests (validity), and its ability to distinguish between concussed and non-concussed states (sensitivity/specificity) based on its output.
- The document states, "While Sway Sports+ is not intended to be used to classify a patient as having a concussion, it may be useful to create criteria based on the device's output to classify a patient as concussion positive or negative, to allow comparison to predicate devices..." and "Sway Sports+ does not create new information through analysis or interpretation, nor does it offer any diagnostic or treatment recommendations. The Sway Sports+ is not intended to be used as a standalone diagnostic device." However, the performance studies themselves are evaluating the device's output directly. The implication is that a human still needs to interpret these outputs in the broader clinical context. The sensitivity/specificity studies show how well the device's output alone, when used to classify, performs.
The type of ground truth used:
- Comparison to predicate devices: ImPACT QT, ImPACT, traditional neurocognitive assessments (Wechsler Adult Memory Scale-IV (WMS-IV), Wechsler Adult Intelligence Scale-IV (WAIS-IV), Delis-Kaplan Executive Function System (D-KEFS), and California Verbal Learning Test – 3rd Edition (CVLT-III)).
- Clinical Diagnosis: Concussion diagnosis confirmed by medical professionals following NCAA Concussion Safety Protocol guidelines.
- Healthy Controls: Participants recruited as "healthy adults" or "healthy controls."
The sample size for the training set:
- Normative Database: Over 126,000 de-identified baseline test data records from users aged 18 to 24 years old were used to establish the normative database. This effectively serves as a large training/reference set for individual score comparison.
- No specific "training set" size for the AI algorithm itself (if applicable, as it's a "computerized cognitive assessment aid" not explicitly framed as deep learning AI) is provided beyond the normative data.
How the ground truth for the training set was established:
- For the normative database (126,000+ users), the ground truth was established by:
  - Self-reported health status: Users "did not report a concussion within the last 6 months, did not report neurological conditions, did not report learning disability."
  - Behavioral criteria: "Completed a baseline test in English, completed a baseline test in the United States of America, primary language was English, had valid scores on all tests."
  - This establishes a "healthy, non-concussed baseline" for reference rather than a diagnostic ground truth.

Ask a Question

Ask a specific question about this device

K Number

K231688

Device Name

ImPACT Version 4

Manufacturer

ImPACT Applications, Inc.

Date Cleared

2023-09-16

(99 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K170551,K202485

Predicate For

N/A

Why did this record match?

510k Summary Text (Full-text Search) :

Coralville, Iowa 52241

Re: K231688

Trade/Device Name: ImPACT Version 4 Regulation Number: 21 CFR 882.1471
|
| Product Code: | POM, 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT is intended for use as a computer-based neurocogntive test battery to aid in the assessment and management of concussion.

ImPACT is a neurocognitive test battery that provides healthcare professionals with objective measure of neurocognitive functioning as an assessment aid and in the management of concussion in individuals ages 12-80.

Device Description

ImPACT® (Immediate Post-Concussion Assessment and Cognitive Testing) is a computer-based neurocognitive test battery that allows healthcare professionals to conduct a series of tests on individuals to gather data related to the neurocognitive functioning of the test subject. This test battery measures various aspects of neurocognitive functioning including reaction time, memory, attention, spatial processing speed, and records symptoms of a test subject. ImPACT Version 4 is similar to the paper-and-pencil neuropsychological tests that have long been used by psychologists to evaluate cognition, and memory related to a wide variety of disabilities.

The device is not intended to provide a direct diagnosis or a return-to-activity recommendation, it does not directly manage or provide any treatment recommendations, and any interpretation of the results should be made only by qualified healthcare professional. The neurocognitive assessment represents only one aspect of assisting healthcare professionals in evaluating and managing individuals with cognitive function impairment related to TBI (concussion).

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study proving the device meets them, based on the provided text:

Acceptance Criteria and Reported Device Performance

The document primarily focuses on demonstrating substantial equivalence to a predicate device (ImPACT Version 4, K202485) rather than explicitly listing quantitative acceptance criteria for each neurocognitive measure. However, the core of the performance demonstration rests on the establishment of a normative database for iPad use and test-retest reliability consistent with previous versions.

Implied Acceptance Criteria and Performance:

Acceptance Criteria (Implied)	Reported Device Performance
1. Normative Database for iPad: The device must establish a reliable and representative normative database for performance on an iPad, allowing for accurate interpretation of test results compared to a healthy population.	A prospective clinical investigation was conducted to collect data for standardization and to construct the normative database for tests performed on an iPad. The normative sample included 1495 subjects ages 12-59 (670 males, 825 females). Data was collected prospectively from 4 different sites across the US in 2022 and 2023. These sites were approved by two ethics boards (Advarra IRB Services and St. Joseph's University in Philadelphia). Data collection occurred in a mixed environment (supervised and unsupervised testing) to approximate real-world conditions. All subjects met inclusion criteria consistent with previous normative data creation (age 12-59, primary English speaking, no concussion in past 6 months, no known physical/neurological/behavioral/psychological impairment affecting test, corrected hearing/vision impairments, signed IRB consent).
2. Test-retest Reliability (iPad): The neurocognitive measures on the iPad must demonstrate high consistency over time, meaning a subject's scores should be similar if they take the test multiple times within a reasonable period, assuming no change in their cognitive state.	Test-retest reliability was calculated in a sample of 116 individuals ages 12-59 who were part of the standardization sample. They completed an initial baseline assessment on an iPad and a second baseline within 7-21 days (mean=12.7 days, SD=4.3 days). Pearson's Product-Moment Correlation coefficients and Intra-class correlation coefficients (ICCs) were calculated for ImPACT Composite Scores and Two Factor Scores. The reported Pearson's correlations and ICCs were "consistent with those from the test-retest coefficients obtained using Mouse and Trackpad inputs of the predicate device." This indicates that the iPad version maintains the reliability characteristics of the established predicate device.
3. Safety and Effectiveness (Overall Equivalence): The device modifications (specifically the iPad platform and related software for touchscreen input and normative data) must not adversely affect the safety or effectiveness for its intended use, nor raise new questions of safety and effectiveness.	The document states: "The differences between the two devices described above do not affect the safety or effectiveness of ImPACT Version 4.1 for its intended use and do not raise new questions of safety and effectiveness, which was demonstrated through risk management and performance testing including software verification and validation, clinical investigations and non-clinical assessments." Risk management activities were conducted per ISO 14971, assuring risks are controlled and the new device has the same safety and risk profile as the predicate. Software verification and validation (IEC 62304, FDA Guidance "General Principles of Software Validation") included code reviews, design reviews, automation/manual testing, and regression testing, with all tests meeting acceptance criteria.

Detailed Study Information:

Sample Size and Data Provenance:
- Test Set (Normative Data):
  - Sample Size: 1495 subjects.
  - Data Provenance: Prospectively collected from 4 different sites across the US in 2022 and 2023.
  - Type: Prospective.
- Test Set (Test-retest Reliability):
  - Sample Size: 116 individuals, a subset of the normative sample.
  - Data Provenance: Prospectively collected from the same US sites as the normative data, in 2022 and 2023.
  - Type: Prospective.
Number of Experts and Qualifications for Ground Truth:
- The document does not explicitly state the number or qualifications of experts used to establish a "ground truth" in the traditional sense of a diagnostic consensus.
- For the normative and test-retest studies, the "ground truth" is primarily based on the self-reported health status of the participants (e.g., "not suffering from a concussion or being treated for a concussion in the past 6 months," "no known physical, neurological, behavioral or psychological impairment"). The inclusion criteria, which participants had to meet, effectively defined the "healthy" or "normal" population against which the device's performance is normed.
- Clinical experts were part of the "cross-functional team" involved in "Walkthroughs and design reviews of mock-ups and prototypes" during software verification, but not explicitly for ground truth establishment for the clinical study data itself.
Adjudication Method for the Test Set:
- The document does not describe an adjudication method for the clinical study data. The studies focused on collecting representative data from healthy individuals to establish norms and assess reliability, rather than evaluating the device's diagnostic accuracy against a separate, adjudicated clinical diagnosis.
Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:
- No, an MRMC comparative effectiveness study was not performed. This device is a neurocognitive test battery, not an imaging AI diagnostic aid meant to be used by multiple readers on the same cases. The "human-in-the-loop" aspect is not about human readers interpreting AI output, but rather healthcare professionals using the device's objective measures to aid in assessment and management.
Standalone Performance:
- The device ImPACT is a "computer-based neurocognitive test battery." The performance described (normative data collection, test-retest reliability) is inherently the "algorithm only" performance, where the "algorithm" refers to the test battery itself and its scoring methodology. The device provides "objective measure of neurocognitive functioning," which is its standalone output. The interpretation of these results is then done by a qualified healthcare professional.
- So, yes, a standalone performance assessment was done in the form of normative data collection and test-retest reliability, demonstrating the device's intrinsic ability to measure cognitive function consistently in an uninjured population.
Type of Ground Truth Used:
- The primary ground truth for the normative database was the self-reported clinical status and inclusion/exclusion criteria of the participants, aiming for a "healthy" or "uninjured" cognitive state. This serves as the reference for "normal" cognitive functioning.
- For the test-retest reliability, the implicit ground truth is that the cognitive state of the healthy individuals should not have changed significantly between the two tests, allowing for an evaluation of the device's consistency.
Sample Size for the Training Set:
- The document does not explicitly describe a separate "training set" for an AI or machine learning model in the context of this 510(k) submission.
- The term "normative database" sometimes functions similarly to a reference or training set in that it establishes what "normal" looks like. In this case, the 1495 subjects for the normative database building represent the data used to define the expected range of performance.
- The "training" of the neurocognitive test itself, being a standardized battery, is more akin to its initial development and validation stages prior to the specific modifications addressed in this 510(k). The current submission focuses on adapting an existing, validated test to a new platform (iPad) and updating its normative reference.
How the Ground Truth for the Training Set Was Established:
- As mentioned above, if the normative database is considered the "training set" for defining normal performance, its "ground truth" was established by prospectively enrolling individuals who met strict inclusion criteria indicating they were healthy, English-speaking, not suffering from or recently treated for a concussion, and without other known neurological/physical impairments that would affect test performance. This was overseen by ethics boards and involved multiple US sites.

Ask a Question

Ask a specific question about this device

K Number

K201376

Device Name

ANAM Test System

Manufacturer

Vista LifeSciences, Inc.

Date Cleared

2021-03-25

(303 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

DEN150037

Predicate For

N/A

Why did this record match?

510k Summary Text (Full-text Search) :

Englewood, Colorado 80112

Re: K201376

Trade/Device Name: ANAM Test System Regulation Number: 21 CFR 882.1471
|
| Classification Regulation: | 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

The ANAM® Test System: Military, Military Expanded, Core, and Sports Batteries are intended for use as computerbased neurocognitive test batteries to aid in the assessment of mild traumatic brain injury. The ANAM Test System: Military, Military Expanded, Core, and Sports Batteries are neurocognitive test batteries that provide healthcare professionals with objective measure of neurocognitive functioning as assessment aids and in the management of mild traumatic brain injury in individuals ages 13-65.

The ANAM Test System should only be used as an adjunctive tool for evaluating cognitive function.

Device Description

The ANAM Test System is a software only device that provides clinicians with objective measurements of cognitive performance in populations, to aid in the assessment and management of concussion. ANAM measures various aspects of neurocognitive functioning including reaction time, memory, attention, and spatial processing speed. It also records symptoms of concussion in the test taker.

The software is downloaded from the Vista LifeSciences website and is for use on Dell Inspiron 15 3000 Series or similar Windows PC model or Android Samsung Galaxy tablet or similar Android device. The hardware is not provided as part of the device but is purchased separately by the user. Each ANAM battery consists of a collection of pre-selected modules that are administered in a sequential manner.

Specific modules included in the ANAM Test System:

1. Questionnaires
- 1. Demographics
- 1. Mood Scale
- 1. Neurobehavioral Symptom Inventory (NSI)
- 1. PTSD Checklist (PCL)
- 1. Sleepiness Scale
- 1. Symptoms Checklist
- 1. TBI Questionnaire

Performance Tests

1. Code Substitution Learning
1. Code Substitution Delayed
1. Go/No-Go"
1. Matching to Sample*
1. Mathematical Processing
1. Memory Search
1. Procedural Reaction Time
1. Simple Reaction Time*
1. Spatial Processing*
*Available for tablet platform.

The tests and questionnaires can be combined into custom batteries or users can choose from pre-configured standardized batteries. The standardized batteries include ANAM Core, ANAM Sports. ANAM Military, and ANAM Military-Expanded. These standardized batteries have fixed test settings and parameters to ensure standardized presentation and enable comparison to normative data.

AI/ML Overview

Here's a breakdown of the requested information regarding the acceptance criteria and the study that proves the device meets them, based on the provided FDA 510(k) summary for the ANAM Test System.

Please note: The provided document is a 510(k) summary, which focuses on demonstrating substantial equivalence to a predicate device rather than detailing specific acceptance criteria and the full study methodology in the way a clinical study report would. Therefore, some information, particularly regarding specific acceptance criteria values, sample sizes for test and training sets, number and qualification of experts, adjudication methods, and MRMC study specifics, is not explicitly stated in this document. The document primarily highlights the "numerous studies" that have examined concurrent validity.

Acceptance Criteria and Reported Device Performance

Given the nature of the 510(k) submission, the "acceptance criteria" are implicitly tied to demonstrating substantially equivalent performance to the predicate device, ImPACT, in providing an objective measure of neurocognitive functioning to aid in the assessment and management of mild traumatic brain injury (mTBI).

Acceptance Criteria (Inferred from Substantial Equivalence)	Reported Device Performance (Summary from Document)
Intended Use Equivalence: Aid in assessment and management of mTBI by providing objective measures of neurocognitive functioning.	The ANAM Test System's intended use is identical to the predicate device: "intended for use as computer-based neurocognitive test batteries to aid in the assessment and management of mild traumatic brain injury." (p. 5) The ANAM Test System measures various aspects of neurocognitive functioning, including reaction time, memory, attention, and spatial processing speed, similar to the predicate's measurement of verbal and visual memory, visual motor speed, impulse control, and reaction time.
Safety and Effectiveness Equivalence: Reliable measure of cognitive function comparable to the predicate device.	The 510(k) submission states, "The 510(k) submission includes the results of numerous studies that have examined the concurrent validity of ANAM as a clinical tool by documenting correlations with traditional neuropsychological tests with both normal and concussed populations. The results of these studies demonstrate that ANAM provides a reliable measure of cognitive function for use as an assessment aid and in the management of concussion and is therefore substantially equivalent to the predicate device." (p. 5)
Patient Population Equivalence: Individuals aged 13-65.	The ANAM Test System is indicated for individuals aged 13-65, which is largely similar to the predicate device's age range of 12-59 years.
Fundamental Neurocognitive Functions Measured: Assess core cognitive domains relevant to mTBI.	ANAM measures "response speed, attention/concentration, immediate and delayed memory, spatial processing, inhibition, and decision processing speed and efficiency." This aligns with the types of functions measured by the predicate device (verbal and visual memory, visual motor speed, impulse control, and reaction time). (p. 5)
Reporting and Interpretation Features: Provide meaningful data for clinical interpretation, including comparison to normative data and reliable change indices.	ANAM provides raw scores, standard scores (from a normative database), and Reliable Change Indices (RCI) for individual tests and an ANAM Composite Score (ACS). This is comparable to the predicate device's provision of composite scores, percentile scores, and RCIs. (p. 5)

Study Proving Device Meets Acceptance Criteria

The document states that the substantial equivalence determination is based on "numerous studies" that have examined the concurrent validity of ANAM.

Sample sizes used for the test set and the data provenance:
- Sample Size: Not explicitly stated for specific test sets within this summary. It mentions "numerous studies" with "normal and concussed populations."
- Data Provenance: Not specified (e.g., country of origin). The document implies the studies are part of the broader clinical evidence for the device. The studies examined "concurrent validity" by correlating ANAM with "traditional neuropsychological tests." This suggests retrospective or prospective clinical data involving real patient populations.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- Not specified within the 510(k) summary. The ground truth for "concurrent validity" studies would typically involve the results from the "traditional neuropsychological tests" administered by qualified neuropsychologists or clinicians.
Adjudication method (e.g. 2+1, 3+1, none) for the test set:
- Not specified. The studies focused on "concurrent validity" which suggests a comparison to established, validated neuropsychological tests rather than an adjudication process between human readers for diagnostic consensus.
If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- Not applicable/Not mentioned. The ANAM Test System is described as a "computer-based neurocognitive test battery" and a "software only device." It functions as an adjunctive tool to aid healthcare professionals in assessment and management. This type of device typically provides quantitative measures of cognitive function directly, rather than assisting human 'readers' in interpreting medical images or complex data in an MRMC study setup. The studies focus on the validity of the test itself in measuring cognitive function.
If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:
- Yes, implicitly. The "performance data" section (p. 5) discusses the results of studies examining the "concurrent validity of ANAM as a clinical tool by documenting correlations with traditional neuropsychological tests." This refers to the standalone performance of the ANAM test in measuring cognitive function and its correlation with established measures, without direct human-in-the-loop assistance for interpreting the ANAM results, though a human clinician ultimately uses the results in their assessment.
The type of ground truth used (expert consensus, pathology, outcomes data, etc):
- The ground truth relies on the scores and results derived from "traditional neuropsychological tests" administered to "normal and concussed populations." This implies that expert diagnosis of concussion and the established measurements from these traditional gold-standard tests served as the reference for determining concurrent validity.
The sample size for the training set:
- Not specified in the 510(k) summary. Given that it's a "computer-based neurocognitive test battery" and not a typical AI/ML algorithm that requires a discrete training dataset in the same way, any "training" would likely refer to the iterative development and validation of the test components and norming data based on large population studies, rather than a separate "training set" for an explicit AI model. The document mentions "normative data" and a "normative database."
How the ground truth for the training set was established:
- For the establishment of "normative data" (which would serve a similar function to a training set for a traditional AI model), the ground truth would be established through the collection of ANAM test performance data from large, healthy populations across different demographics, age groups, and potentially educational levels. This process involves statistical analysis to define "normal" ranges and establish a reliable baseline against which individual performance can be compared. The summary mentions "standard scores (calculated with the normative database)" and "the summed T-score of a normative control group."

Ask a Question

Ask a specific question about this device

K Number

K202485

Device Name

ImPACT Version 4

Manufacturer

ImPACT Applications, Inc.

Date Cleared

2020-12-25

(116 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K181223

Predicate For

K231688

Why did this record match?

510k Summary Text (Full-text Search) :

Coralville, Iowa 52241

Re: K202485

Trade/Device Name: ImPACT Version 4 Regulation Number: 21 CFR 882.1471
|
| Product Code: | POM, 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT is intended for use as a computer-based neurocognitive test battery to aid in the assessment and management of concussion.

ImPACT is a neurocognitive test battery that provides healthcare professionals with objective measures of neurocognitive functioning as an assessment aid and in the management of concussion in individuals ages 12-80.

Device Description

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study information for the ImPACT Version 4 device, based on the provided text:

1. Table of Acceptance Criteria and Reported Device Performance

The FDA submission primarily focuses on demonstrating substantial equivalence to a predicate device (ImPACT Version 3.3.0) rather than listing specific, quantitative acceptance criteria for each functional aspect. The document indicates that all software verification and validation tests met the required acceptance criteria, but these criteria are described generally rather than with specific metrics.

However, based on the summary of performance testing and clinical data, the implicit acceptance criteria relate to:

Software Functionality: The software performs as intended, modifications did not affect existing functionality.
Safety and Effectiveness: The device modifications do not raise new questions of safety and effectiveness.
Test-Retest Reliability (for extended age range): Cognitive performance should remain stable over time.
Construct Validity (for extended age range): ImPACT Version 4 scores should correlate with established neuropsychological tests.
Normative Database Quality: The normative data should be accurately established for the specified age ranges, differentiating between input device types.

Acceptance Criteria (Inferred from testing)	Reported Device Performance
Software functionality met design specifications.	All software verification activities (code reviews, design reviews, walkthroughs, software V&V testing, regression testing) met "required acceptance criteria." Modifications did not affect existing functionality.
Device modifications do not affect safety or effectiveness.	"The differences between the two devices... do not affect the safety or effectiveness of ImPACT Version 4 for its intended use and do not raise new questions of safety and effectiveness, which was demonstrated through risk management and performance testing." Risk management concluded all individual risk is acceptable and the new device has "virtually the same safety characteristics... and same risk profile" as the predicate.
Test-retest reliability is established (for ages 60-80).	For a subset of 93 individuals (ages 60-80), only a "small percentage" (0-1% for composite scores, 0-2% for factor scores) of scores showed "reliable or 'significant' change" over an average of 16.04 days. This "suggest[s] the cognitive performance of test takers at baseline remained stable over a one-month period."
Construct validity correlated with established tests (for ages 60-80).	ImPACT Verbal Memory Composite scores correlated significantly (P<.001) with HVLT and BVMT-R, and SDMT Memory Sub-scales. ImPACT Motor Speed and Reaction Time Composite Scores both correlated with the SDMT Total Correct Subscales.
Normative database is established and robust for extended age range (60-80).	A normative database was constructed for ages 60-80 from 554 subjects across 8 sites in the US, meeting specific inclusion/exclusion criteria.
Normative database is updated and robust for existing age range (12-59).	A normative database for ages 12-59 was constructed from 71,815 de-identified subjects selected from a larger company database (766,093 records), with separate calculations for mouse and trackpad. Subjects met specific criteria (age, gender, English speaker, input device, no recent concussion, no neurological issues, no ADHD/LD).
Device provides reliable measure of cognitive function to aid in assessment.	"The results of these studies demonstrate ImPACT Version 4 provides a reliable measure of cognitive function to aid in assessment and management of concussion and is therefore substantially equivalent to the Predicate Device." (This is a summary statement based on the overall findings, not a specific performance metric.)

2. Sample Sizes Used for the Test Set and Data Provenance

The document describes several "clinical studies" or investigations which serve as test sets for different aspects (normative data, reliability, validity).

For the 12-59 age range (normative database update):
- Sample Size: 71,815 subjects.
- Data Provenance: Retrospective, de-identified data from the Company's test database of 766,093 subjects. Subjects were from the United States of America.
For the 60-80 age range (normative database, test-retest reliability, construct validity):
- Normative Sample Size: 554 subjects (174 males, 380 females).
- Test-retest Reliability Sample Size: 93 individuals (a subset of the normative extension sample, 64.5% females, 35.5% males).
- Construct Validity Sample Size: 71 individuals (ages 60-80, 63.4% females, 36.6% males).
- Data Provenance: Prospective clinical investigation. Data collected from 8 different sites across the United States (universities, hospitals, clinics, private medical practices). Data collection began in 2017 and ended in 2020.

3. Number of Experts Used to Establish the Ground Truth for the Test Set and Their Qualifications

The document does not explicitly state the number of experts used to establish a "ground truth" for the test set in the sense of adjudicating concussion status for individual subjects within the studies.
For the construct validity study, the document states: "All measures were administered by Neuropsychologists trained in test administration as part of the study to collect data for normative dataset described above." These Neuropsychologists administered the comparative tests (HVLT, BVMT-R, SDMT) which served as the "ground truth" or reference standard for comparison, but they were not adjudicating the ImPACT scores or concussion status. They were experts in administering and interpreting the comparative neuropsychological tests.

4. Adjudication Method for the Test Set

The document does not describe an "adjudication method" in the typical sense (e.g., 2+1 rater adjudication) for determining ground truth of concussion or for evaluating the ImPACT scores.
Instead, the studies used established clinical methods:
- For the normative database, subjects were selected based on self-reported criteria (no recent concussion, no neurological issues, no ADHD/LD diagnosis) and, for the 60-80 age range, MMSE scores (>24).
- For construct validity, the reference standards were scores from other validated neuropsychological tests administered by trained neuropsychologists.

5. If a Multi Reader Multi Case (MRMC) Comparative Effectiveness Study Was Done, If So, What Was the Effect Size of How Much Human Readers Improve with AI vs Without AI Assistance

No, the document does not describe a multi-reader multi-case (MRMC) comparative effectiveness study evaluating how human readers (healthcare professionals) improve with ImPACT Version 4 assistance versus without it.
ImPACT is presented as an "aid" in assessment and management, providing objective measures, but the study design did not assess the human reader's diagnostic performance with and without the device.

6. If a Standalone (i.e., algorithm only without human-in-the-loop performance) Was Done

Yes, the performance testing described for ImPACT Version 4 is in a standalone capacity. The device itself (the neurocognitive test battery) collects and processes data, and its outputs (scores, validity indicators) are what were evaluated for reliability and validity against other tests or over time. The "performance" being assessed is that of the computerized test generating consistent and correlated results, not a human using the device to make a decision.

7. The Type of Ground Truth Used

For the normative database: The "ground truth" for subject inclusion was based on self-reported health status (no recent concussion, no neurological issues, no ADHD/LD) and, for the older age group, a basic cognitive screen (MMSE score > 24). The purpose was to establish norms for a healthy population.
For test-retest reliability: The "ground truth" was the expectation that cognitive performance in a stable individual should not significantly change over a short period.
For construct validity: The "ground truth" was established by scores from other widely utilized and previously validated traditional neuropsychological tests (HVLT, BVMT-R, SDMT).

8. The Sample Size for the Training Set

The document does not explicitly mention a "training set" in the context of machine learning model development. ImPACT is described as a "computer-based neurocognitive test battery" and its modifications primarily involve updating the normative database and output scores, not training a new predictive algorithm.
The "normative database" could be considered analogous to a reference dataset that the device's scoring relies upon.
- For ages 12-59: The normative database was constructed from 71,815 subjects.
- For ages 60-80: The normative database was constructed from 554 subjects.

9. How the Ground Truth for the Training Set Was Established

As mentioned above, there isn't a "training set" for a machine learning model described.
For the normative databases, which serve as the reference for interpreting individual test results, the "ground truth" for inclusion was:
- For 12-59 age range: De-identified data selected from a large company database. Subjects had to meet criteria such as no concussion in the past 6 months, no other neurological issues, no ADHD/LD diagnosis. These were self-reported or existing medical history.
- For 60-80 age range: Prospective data collection where subjects met strict inclusion criteria: ages 60-80, primary English speaking, not in a skilled nursing facility, not suffering from/being treated for a concussion, no known impairing physical/neurological/behavioral/psychological conditions, corrected hearing/vision within normal limits, and a Mini-Mental State Examination (MMSE) score of 24 or greater. This was established through screening and clinical assessment at 8 different sites.

Ask a Question

Ask a specific question about this device

K Number

K181223

Device Name

ImPACT

Manufacturer

ImPACT Applications, Inc.

Date Cleared

2018-10-20

(165 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

K170209

Predicate For

K202485

Why did this record match?

510k Summary Text (Full-text Search) :

Suite 550 San Diego, California 92123

Re: K181223

Trade/Device Name: ImPACT Regulation Number: 21 CFR 882.1471
|
| Product Code: | POM, 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT is intended for use as a computer-based neurocognitive test battery to aid in the assessment and management of concussion.

Device Description

ImPACT® (Immediate Post-Concussion Assessment and Cognitive Testing) is a computer-based neurocognitive test battery.

lmPACT is a software-based tool that allows healthcare professionals to conduct a series of neurocognitive tests on individuals to gather basic data related to the neurocognitive functioning of the test subject. This computerized cognitive test battery evaluates and provides a healthcare professional with measures of various neurocognitive functions, including the reaction time, memory, attention, spatial processing speed and symptoms of an individual.

lmPACT provides healthcare professionals with a set of well-developed and researched neurocognitive tasks that have been medically accepted as state-of-the-art best practices and is intended to be used as part of a multidisciplinary approach to making return to activity decisions.

AI/ML Overview

Here's a breakdown of the acceptance criteria and the study details for the ImPACT device, based on the provided FDA 510(k) summary:

Acceptance Criteria and Reported Device Performance

The document does not explicitly present a table of acceptance criteria with numerical targets. Instead, it describes general claims of conformance and successful verification/validation. The core acceptance criterion for the device modification (allowing unsupervised baseline testing) appears to be that this change does not affect the safety or effectiveness of ImPACT for its intended use, particularly that the number of invalid self-administered tests are not different when compared to the supervised environment.

Acceptance Criteria (Inferred)	Reported Device Performance
Software verification and validation activities demonstrate device performance and functionality according to IEC 62304 and other software standards.	"Software verification and validation activities including code reviews, evaluations, analyses, traceability assessment, and manual testing were performed in accordance with IEC 62304 and other software standards to demonstrate device performance and functionality. All tests met the required acceptance criteria."
Risk Management activities assure all risks (including use-related and security) are appropriately mitigated per ISO 14971.	"Risk Management activities conducted in accordance with ISO 14971 assure all risk related to use of a computerized neurocognitive test, including use related risks and security risks, are appropriately mitigated."
Unsupervised baseline testing does not affect the validity of test results compared to supervised environments (i.e., invalid test rates are comparable).	"5.8% of subjects reported invalid results. The results of the studies indicate that the number of invalid self-administered tests are not different when compared to the supervised environment reported in the literature."
Unsupervised baseline testing does not affect the test-retest reliability of ImPACT.	Pearson correlations between baseline assessments ranged from .43 to .78. ICCs reflected higher reliability than Pearson's r, across all measures. Visual Motor Speed: mean ICC=.91. Reaction Time: ICC=.78. Visual Memory: ICC=.62. Verbal Memory: ICC=.55. Mean ImPACT composite and symptom scores showed no significant improvement between the two assessments. No significant practice effects across the two assessments (mean of 80 days). Scores reflected considerable stability as reflected in ICCs and UERs. All participants were able to complete the test independently, with no "Invalid Baselines" obtained in the reliability study.

Study Details

This submission describes changes to an already cleared device, primarily regarding the use environment for baseline testing (allowing unsupervised baseline testing). Therefore, the "study" is focused on demonstrating that this modification does not negatively impact the device's performance or safety.

1. A table of acceptance criteria and the reported device performance: (See table above).

2. Sample size used for the test set and the data provenance:

Usability/Validity Study: 162 subjects.
- Composition: 74 college students, 44 Middle and High School students, and 44 adults.
- Provenance: Not explicitly stated, but implied to be from the US given the FDA submission. It's a prospective study conducted specifically for this device modification.
Clinical Data (Test-Retest Reliability Study): 50 participants.
- Provenance: Not explicitly stated, but implied to be from the US. This was a prospective study.

3. Number of experts used to establish the ground truth for the test set and the qualifications of those experts:

The document does not mention the use of experts to establish ground truth for this specific validation study related to the device modification.
The clinical study focused on test-retest reliability within individuals performing the ImPACT test. The "truth" here is the individual's own performance consistency, not an external expert's diagnosis.
The overall ImPACT device, as a "computerized cognitive assessment aid," is designed to provide "objective measures of neurocognitive functioning" that healthcare professionals interpret. The "ground truth" for concussion diagnosis itself relies on a multidisciplinary approach by healthcare professionals, which the device aids in, rather than determines independently.

4. Adjudication method (e.g. 2+1, 3+1, none) for the test set:

No adjudication method is described for the test sets. The studies are evaluating the device's output (invalidity rates, reliability scores) directly from participant interactions with the modified device.

5. If a multi reader multi case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:

No MRMC comparative effectiveness study was done or described. This submission is for enabling unsupervised baseline testing, not for evaluating human reader performance with or without AI assistance. The device is the assessment aid for healthcare professionals.

6. If a standalone (i.e. algorithm only without human-in-the-loop performance) was done:

Yes, both "studies" implicitly evaluated the standalone performance of the ImPACT device in the context of the proposed unsupervised use environment.
- The usability assessment determined the rate of invalid tests when self-administered.
- The clinical data study evaluated the test-retest reliability of the ImPACT scores themselves, independent of human interpretation during the testing phase.
- The device's output (scores, invalidity indicators) is generated by the algorithm, which is then provided to healthcare professionals.

7. The type of ground truth used (expert consensus, pathology, outcomes data, etc.):

For the usability/validity study (unsupervised testing): The "ground truth" was internal to the ImPACT system – the device's own "Invalidity Indicator" function was used to determine if results were "invalid." This is based on pre-established criteria within the device, not external expert consensus for each individual test. The comparison was against reported invalidity rates in supervised environments from literature.
For the test-retest reliability study: The "ground truth" was the consistency of an individual's performance on the test over time. This is assessed statistically using measures like Pearson correlations and Intraclass Correlation Coefficients (ICCs). There is no "ground truth diagnosis" being established here, but rather the reliability of the measurement.

8. The sample size for the training set:

The document does not provide information on the training set or how the core ImPACT algorithm was initially developed/trained. This submission is for a device modification (change in use environment, minor UI adjustment) of an already cleared predicate device (K170209). The fundamental neurocognitive test battery design and algorithms are stated to be "identical to the original version."

9. How the ground truth for the training set was established:

As mentioned above, this information is not provided in this 510(k) summary for the modified device. The document states that the neurocognitive tasks "have been medically accepted as state-of-the-art best practices," implying that the underlying methodology for generating the neurocognitive measures is well-established, but details on the initial training data and ground truth establishment for the original algorithm are absent.

Ask a Question

Ask a specific question about this device

K Number

K170551

Device Name

ImPACT Quick Test

Manufacturer

ImPACT Applications, Inc.

Date Cleared

2017-06-21

(117 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

DEN150037

Predicate For

K092819,K241737

Why did this record match?

510k Summary Text (Full-text Search) :

Diego, California 92123

Re: K170551

Trade/Device Name: ImPACT Ouick Test Regulation Number: 21 CFR 882.1471
| II |
| Device Classification: | 21 CFR 882.1471
|

Conclusion:

ImPACT QT falls within the generic type of device regulated under 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT Quick Test is intended for use as a computerized cognitive test aid in the assessment and management of concussion in individuals ages 12-70.

Device Description

ImPACT Quick Test (ImPACT) QT is a brief computerized neurocognitive test designed to assist trained healthcare professionals in determining a patient's status after a suspected concussion. ImPACT QT provides basic data related to neurocognitive functioning, including working memory, processing speed, reaction time, and symptom recording.

ImPACT QT is designed to be a brief 5-7 minute iPad-based test to aid sideline personnel and first responders in determining if an athlete/individual is in need of further evaluation or is able to immediately return to activity. ImPACT QT is not a substitute for a full neuropsychological evaluation or a more comprehensive computerized neurocognitive test (such as ImPACT).

AI/ML Overview

Here's an analysis of the acceptance criteria and the study that proves the device meets the acceptance criteria, based on the provided text:

Acceptance Criteria and Device Performance

The document does not explicitly state a table of pre-defined acceptance criteria for the ImPACT Quick Test itself, in terms of specific performance thresholds (e.g., a minimum correlation coefficient). Instead, the studies aim to demonstrate construct validity, concurrent validity, and test-retest reliability as evidence of the device's performance, aiming to show substantial equivalence to the predicate device.

The reported device performance aligns with these goals:

Acceptance Criteria (Implied)	Reported Device Performance
Concurrent Validity	Correlations between ImPACT Quick Test and the predicate device (ImPACT) were in the moderate to high range (0.32-0.63, all p<.001). This suggests that although the two instruments measure similar constructs, the fact that ImPACT Quick Test contains a subset of ImPACT tests as well as unique content explains the moderate relationship between the two instruments.
Construct Validity	ImPACT Quick Test measures (Attention Tracker and Motor Speed) correlated more highly with established neuropsychological tests (BVMT-R, CTT) that assess similar constructs (attention and motor speed). Correlations varied from 0.28 to 0.61 (many significant at p<.001), demonstrating expected relationships with external measures. Lower correlation for Memory scale with BVMT-R was attributed to significant differences in format and task demands.
Test-Retest Reliability	Test-retest correlations for composite scores were: Memory (r=0.18), Attention Tracker (r=0.73), and Motor Speed (r=0.82). All significant at p<.001 or beyond. The majority of correlations were in the 0.6 to 0.8 range, reflecting "considerable stability" across the re-test period. Reliable Change Index (RCI) also indicated the percentage of cases falling outside confidence intervals.
Clinical Acceptability	The device provides a "reliable measure of cognitive function to aid in assessment of concussion" and is "substantially equivalent to the Predicate Device."
Software Validation & Risk Management	ImPACT QT software was developed, validated, and documented according to IEC 62304 and FDA Guidance. Risk Management (ISO 14971) was conducted, with all risks appropriately mitigated.
Normative Data	A normative database was developed from 772 subjects, representative of ages 12-70 based on 2010 U.S. Census for age, gender, and race.

Study Details

The provided document describes several studies to support the predicate device equivalence and performance of the ImPACT Quick Test.

Sample sizes used for the test set and the data provenance:
- Concurrent Validity Study: 92 subjects (41 males, 51 females; average age 36.5 years, range 12-76 years).
- Construct Validity Study: 118 subjects (73 females, 45 males; average age 32.5 years, range 18-79 years).
- Test-Retest Reliability Study: 76 individuals.
- Normative Database Development: 772 subjects.
- Data Provenance: All subjects were recruited from 11 sites across the United States. All subjects completed an IRB approved consent form and met eligibility criteria. The studies appear to be prospective in nature, collecting new data for these specific analyses.
Number of experts used to establish the ground truth for the test set and the qualifications of those experts:
- The concept of "ground truth" for the test set in this context applies differently. For the concurrent and construct validity studies, the "ground truth" or reference standards were the predicate device (ImPACT) and established traditional neuropsychological tests (BVMT-R, CTT, SDMT). The performance of these reference tests themselves is considered the standard.
- For the administration of the tests and data collection, the document states: "All testing was completed by professionals who were specifically trained to administer the test. These professionals consisted of neuropsychologists, physicians, psychology/psychology/psychology graduate students, certified athletic trainers and athletic training graduate students. All testing was completed in a supervised setting." While this describes the administrators, it doesn't specify how many "experts" established a singular ground truth for any given case, as the outputs are quantitative cognitive scores.
Adjudication method (e.g., 2+1, 3+1, none) for the test set:
- The document does not describe an adjudication method for the test set as would typically be seen in diagnostic studies where expert consensus determines a disease state. The outputs of the device and the comparison tests are quantitative scores, not subjective interpretations requiring adjudication.
If a multi-reader multi-case (MRMC) comparative effectiveness study was done, If so, what was the effect size of how much human readers improve with AI vs without AI assistance:
- No, a multi-reader multi-case (MRMC) comparative effectiveness study focusing on human reader improvement with AI assistance was not done or reported in this document. The device is a "computerized cognitive assessment aid" providing quantitative scores, not an AI that directly assists human interpretation in the MRMC sense. It's an aid for trained healthcare professionals, but the study doesn't quantify their improvement with the aid versus without it.
If a standalone (i.e., algorithm only without human-in-the-loop performance) was done:
- Yes, the studies evaluating concurrent validity, construct validity, and test-retest reliability are essentially standalone performance evaluations. The ImPACT Quick Test (algorithm/device) generated cognitive scores which were then compared to other tests or re-administered. While trained professionals administered the test, their role was in test administration, not in altering the device's output or performing an "in-the-loop" interpretation that influenced the device's score generation. The "performance" being measured is the scores produced by the device itself.
The type of ground truth used (expert consensus, pathology, outcomes data, etc.):
- The "ground truth" was established by comparison to established, validated neuropsychological tests and a predicate device (ImPACT).
  - For concurrent validity, the predicate device (ImPACT) was the reference standard.
  - For construct validity, traditional neuropsychological tests (Brief Visuospatial Motor Test (BVMT-R), Color Trails Test (CTT), Symbol Digit Modalities Test (SDMT)) served as the reference standards.
  - For test-retest reliability, the device's own repeated measurements served as the basis for evaluation, with consistency across measurements being the goal.
The sample size for the training set:
- The normative database used to establish percentiles for the ImPACT Quick Test was developed using 772 subjects. This serves as a "training set" in the sense that it provides the reference data against which individual patient's scores are compared. It is not a software machine learning training set in the typical sense, but rather a reference population.
- The document states that the new device "reports symptoms and displays test results in a form of composites score percentiles based on normative data" and that "The standardization sample was developed to be representative of the population of individuals ages 12-70 years".
How the ground truth for the training set was established:
- For the normative database (which serves a similar function to a training set here), the "ground truth" was simply the measured performance of a large, representative sample of healthy individuals on the ImPACT Quick Test itself. These subjects were free from concussion within one year, neurological disorders, and psychoactive medication. Their scores define the "normal" range for different age, gender, and race stratifications, against which subsequent patient scores are compared. All subjects completed IRB approved consent and met eligibility criteria. Testing was completed by trained professionals in a supervised setting to ensure standardized administration.

Ask a Question

Ask a specific question about this device

K Number

K170209

Device Name

ImPACT

Manufacturer

IMPACT APPLICATIONS, INC

Date Cleared

2017-02-23

(30 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

DEN150037

Predicate For

K181223

Why did this record match?

510k Summary Text (Full-text Search) :

Suite 550 San Diego, California 92123

Re: K170209

Trade/Device Name: ImPACT Regulation Number: 21 CFR 882.1471
Classification: | Class II |
| Product Code: | POM, 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT is intended for use as a computer-based neurocognitive test battery to aid in the assessment and management of concussion.

Device Description

ImPACT® (Immediate Post-Concussion Assessment and Cognitive Testing) is a computer-based neurocognitive test battery.

ImPACT is a software-based tool that allows healthcare professionals to conduct a series of neurocognitive tests on individuals to gather basic data related to the neurocognitive functioning of the test subject. This computerized cognitive test battery evaluates and provides a healthcare professional with measures of various neurocognitive functions, including the reaction time, memory, attention, spatial processing speed and symptoms of an individual.

ImPACT provides healthcare professionals with a set of well-developed and researched neurocognitive tasks that have been medically accepted as state-of-the-art best practices and is intended to be used as part of a multidisciplinary approach to making return to activity decisions.

AI/ML Overview

The ImPACT device is a computer-based neurocognitive test battery intended to aid in the assessment and management of concussion. The submission focuses on a device modification – a rewrite of the software from Adobe Flash to HTML5/CSS/JavaScript, while preserving all functions of the original version. The performance testing aims to demonstrate that this software change does not affect the device's functionality or performance.

Here's a breakdown of the acceptance criteria and the study proving the device meets them:

1. Table of Acceptance Criteria and Reported Device Performance

The document does not explicitly present a formal table of acceptance criteria with specific quantifiable metrics like sensitivity, specificity, or AUC, as might be seen for a diagnostic AI device. Instead, the acceptance criteria are implicitly tied to demonstrating non-inferiority or statistical and clinical insignificance of differences between the new HTML5 version and the predicate Flash version, specifically for reaction time. The primary performance metric assessed was "reaction time."

Acceptance Criteria (Implicit)	Reported Device Performance
Differences in reaction time between the new HTML5 version and the predicate Flash version are statistically and clinically insignificant.	Two laboratory studies (one bench, one in volunteers age 19-22) showed that the differences in performance for reaction time were statistically and clinically insignificant.
All software verification and validation tests are met.	All tests met the required acceptance criteria.
Risk management activities assure all risks are appropriately mitigated.	Risk Management activities conducted in accordance with ISO 14971 assure all risk related to use of a computerized neurocognitive test, including use related risks are appropriately mitigated.

2. Sample Size Used for the Test Set and Data Provenance

Test Set Sample Size: "one in volunteers age 19-22". While a specific number is not provided, this indicates a human user study. The bench study would not involve a "sample size" in the same way, but rather controlled test conditions.
Data Provenance: The document does not specify the country of origin of the data. It appears to be a prospective study conducted for the purpose of this submission, comparing the new and old software versions.

3. Number of Experts Used to Establish Ground Truth and Qualifications

This information is not provided in the document. Given the nature of the device (a neurocognitive test battery for concussion assessment) and the type of performance testing described (comparison of reaction time between software versions), it's unlikely that external medical experts were used to establish "ground truth" in the typical sense (e.g., image annotation for disease presence). The "ground truth" here is the actual reaction time measurement as recorded by the device.

4. Adjudication Method for the Test Set

This information is not provided and is likely not applicable. Since the study is comparing measurements (reaction time) between two software versions rather than subjective interpretation of data, an adjudication method for reconciling expert opinions would generally not be needed.

5. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study

A MRMC comparative effectiveness study was not done. The study described focused on direct comparison of output (reaction time) between two software versions rather than evaluating the impact of AI assistance on human readers' diagnostic performance. The device itself is an assessment aid, not an AI for image reading or interpretation that augments a human reader.

6. Standalone (Algorithm Only) Performance Study

The study described is essentially a standalone performance study in the context of the device modification. It evaluates the performance of the new HTML5 algorithm directly against the predicate Flash algorithm, without involving human interpretation or human-in-the-loop assistance in the performance assessment itself. The "volunteers" in the study are the subjects on whom the reaction time is measured by the device itself.

7. Type of Ground Truth Used

The "ground truth" for this performance study was the objective measurement of reaction time. The study aimed to show that the new software version produced reaction time measurements statistically and clinically equivalent to those from the predicate software version. This is not a ground truth derived from expert consensus, pathology, or outcomes data in the usual sense of a diagnostic device.

8. Sample Size for the Training Set

This information is not provided. As this submission relates to a software re-write of an existing device and not the development of a new AI/ML model from scratch, there might not be a separate "training set" in the conventional machine learning sense. If the underlying neurocognitive test battery involved trained algorithms, that training would have occurred for the original predicate device, and the re-write aimed to replicate its behavior.

9. How the Ground Truth for the Training Set Was Established

This information is not provided. Similar to point 8, if there was initial algorithm training for the predicate device, the method for establishing its ground truth is not detailed in this document. Given it's a neurocognitive test battery, the initial "ground truth" would likely have been established through extensive psycho-metric validation, normative data collection, and clinical trials for the original ImPACT device to ensure its measurements accurately reflect neurocognitive function in the context of concussion.

Ask a Question

Ask a specific question about this device

K Number

DEN150037

Device Name

ImPACT Computerized Neurocognitive Concussion Management Aid

Manufacturer

IMPACT APPLICATIONS, INC.

Date Cleared

2016-08-22

(377 days)

Product Code

Regulation Number

Type

Panel

Reference & Predicate Devices

N/A

Predicate For

N/A

Why did this record match?

510k Summary Text (Full-text Search) :

NEW REGULATION NUMBER: 882.1471

CLASSIFICATION: CLASS II

PRODUCT CODE: POM

BACKGROUND

DEVICE
POM Device Type: Computerized Cognitive Assessment Aid for Concussion Class: II Regulation: 21 CFR 882.1471

AI/MLSaMDIVD (In Vitro Diagnostic)TherapeuticDiagnosticis PCCP AuthorizedThirdpartyExpeditedreview

Intended Use

ImPACT is intended for use as a computer-based neurocognitive test battery to aid in the assessment and management of concussion.

ImPACT Pediatric is intended for use as a computer-based neurocognitive test battery to aid in the assessment and management of concussion.

ImPACT Pediatric is a neurocognitive test battery that provides healthcare professionals with objective measure of neurocognitive functioning as an assessment aid and in the management of concussion in individuals ages 5-11.

Device Description

ImPACT® (Immediate Post-Concussion Assessment and Cognitive Testing) and ImPACT Pediatric are computer-based neurocognitive test batteries for use as an assessment aid in the management of concussion.

ImPACT and ImPACT Pediatric are software-based tools that allows healthcare professionals to conduct a series of neurocognitive tests that provide data related to the neurocognitive functioning of the test taker. This computerized neurocognitive test battery measures various aspects of neurocognitive functioning including reaction time, memory, attention, and spatial processing speed. It also records symptoms of concussion in the test taker.

ImPACT and ImPACT Pediatric provide healthcare professionals with a set of well-developed and researched neurocognitive tasks that have been medically accepted as state-of-the-art best practices. The devices are intended to be used as part of a multidisciplinary approach to concussion assessment and patient management.

AI/ML Overview

This response summarizes the acceptance criteria and supporting studies for the ImPACT and ImPACT Pediatric devices, as detailed in the provided document.

Acceptance Criteria and Device Performance

The acceptance criteria for ImPACT and ImPACT Pediatric are primarily demonstrated through the establishment of their psychometric properties: construct validity, test-retest reliability, and the development of a robust normative database. The device performance is reported through the results of various peer-reviewed studies and clinical data supporting these properties.

Table 1: Acceptance Criteria and Reported Device Performance

Acceptance Criteria Category	Specific Criteria/Outcome Demonstrated	Reported Device Performance (Summary)
Construct Validity	Demonstrate the device measures what it purports to measure (e.g., specific cognitive functions).	ImPACT: Significant correlations with traditional neuropsychological measures (SDMT, WAIS-R, NFL Battery components) for processing speed, reaction time, memory. Demonstrated good convergent and discriminant validity. ImPACT Pediatric: Significant correlations (20/24 potential comparisons) with WRAML-2 subtests (e.g., Story Memory, Design Recall), indicating it measures important aspects of memory.
Test-Retest Reliability	Demonstrate consistent results over time for multiple administrations.	ImPACT: Robust test-retest reliability with ICCs generally ranging from 0.46 to 0.88 across various intervals (30 days to 2 years) for Verbal Memory, Visual Memory, Visual Motor Speed, and Reaction Time. ImPACT Pediatric: ICCs ranging from 0.46 to 0.89 across test modules (e.g., Word List, Memory Touch, Stop & Go) over one-week intervals, indicating adequate to excellent stability for most.
Normative Database	Establish a representative database to compare patient performance against "normal" population.	ImPACT: Standardization sample of 17,013 individuals (ages 10-59 years, diverse gender/age breakdown). Data collected by trained professionals in supervised settings. ImPACT Pediatric: Normative database of 915 children (ages 5-12 years) from multiple clinical sites (Atlanta, Annapolis, Marquette, Pittsburgh, Guelph, ON). Age-stratified, considering gender differences.
Reliable Change Index (RCI)	Provide a statistical calculation to determine if a change in score is clinically meaningful and not due to measurement error or practice effects.	ImPACT: RCI calculation provided to indicate clinically significant improvement, reducing adverse impact of measurement error. ImPACT Pediatric: RCI calculated to highlight score changes not due to practice effects or measurement error, displayed in red on reports.
Validity Index	Provide an index to identify invalid baseline examinations.	ImPACT: Algorithm-based index identifies invalid tests based on sub-optimal performance on specific subtests (e.g., X's and O's Total Incorrect > 30, Word Memory Learning Pct Correct < 69%). Automated flagging in reports. Not explicitly detailed for ImPACT Pediatric, but importance of valid tests is implied.

Study Details Proving Device Meets Acceptance Criteria

The device relies on a collection of previously published peer-reviewed studies and additional clinical data submitted with the de novo request. The FDA's determination is based on the adequacy of this compiled evidence rather than a single, large prospective clinical trial specifically for this de novo submission.

1. Sample Sizes Used for the Test Set and Data Provenance:

ImPACT:
- Construct Validity Studies: Sample sizes for individual studies ranged from N=30 to N=100.
  - Iverson et al. (2005): N=72 athletes from PA high schools.
  - Maerlender et al. (2010): N=54 varsity athletes from Dartmouth College.
  - Schatz & Putz (2006): N=30 college students from St. Joseph's University, Philadelphia.
  - Allen & Gfeller (2011): N=100 psychology students from a private university (Midwestern).
- Reliability Studies: Sample sizes for individual studies ranged from N=25 to N=369.
  - Schatz (2013): N=25
  - Cole (2013): N=44 (active duty military population)
  - Nakayama (2014): N=85
  - Elbin (2011): N=369 high school athletes
  - Schatz (2010): N=95
- Normative Database: N=17,013 individuals. Data collected from high schools and colleges across the US, and adult athlete populations/coaches/administrators.
- Data Provenance: Predominantly retrospective analysis of published literature and existing clinical data. Studies are from various locations within the United States (PA, Dartmouth, Philadelphia, Midwestern university) and some multinational data for normative database (South African sample mentioned for cross-cultural norming). The military population study (Cole, 2013) also indicates a specific cohort. Data is gathered from both concussed and concussion-free individuals for different study purposes (validity vs. norming).
ImPACT Pediatric:
- Normative Database: N=915 children. Data collected from clinical sites in Atlanta, GA; Annapolis, MD; Marquette, MI; Pittsburgh, PA; and Guelph, Ontario, Canada.
- Reliability Study: N=100 children (ages 5-12 years) participating in youth soccer and hockey leagues. (Unpublished study reported).
- Construct Validity Study: N=83 participants (ages 5-12 years).
- Data Provenance: A mix of clinical sites across the US and Canada. Primarily retrospective analysis of previously collected data.

2. Number of Experts Used to Establish Ground Truth for the Test Set and Qualifications:

The document does not specify a fixed number of experts for adjudicating individual cases within a test set, as this is not a diagnostic device relying on expert consensus on image interpretation.
Instead, the "ground truth" for these performance studies is established by:
- Clinical Standards: For construct validity, ImPACT was compared to traditional neuropsychological measures (e.g., Symbol Digit Modalities Test, WAIS-R, NFL Battery components). These traditional tests are themselves well-established and interpreted by qualified neuropsychologists or clinicians.
- Normative Data Collection: "Professionals who were specifically trained to administer the tests," including "Neuropsychologists, Psychologists and Neuropsychology/Psychology graduate students, Certified Athletic Trainers and Athletic Training Graduate Students and Nurses." Their expertise ensures correct test administration and data collection for establishing norms.
- Reference Standards: For the ImPACT construct validity studies, diagnoses of concussion were made using "then-applicable clinical standards including grading of concussion using AAN guidelines." For ImPACT Pediatric, construct validity was assessed against the "Wide Range Assessment of Memory and Learning-2 (WRAML-2)," a recognized neuropsychological battery.

3. Adjudication Method for the Test Set:

Not applicable in the typical sense of a diagnostic medical device relying on image interpretation and expert consensus for ground truth. The "adjudication" is inherent in the established methodologies of traditional neuropsychological testing and statistical correlations.

4. Multi-Reader Multi-Case (MRMC) Comparative Effectiveness Study:

No MRMC study was performed in the traditional sense of comparing human readers with and without AI assistance for interpretation.
This device is a Computerized Cognitive Assessment Aid, not an AI for image interpretation. Its function is to administer a neurocognitive test battery and provide objective measures. The "AI" component is more akin to sophisticated algorithms for scoring, norming, and identifying reliable change or invalid tests, rather than a system that assists human interpretation of complex medical imagery.
The benefit is the "Ability to have access to a non-invasive cognitive assessment battery that can be used to compare pre-injury (baseline cognitive performance) to post-injury cognitive performance" and "Ability to compare cognitive test performance to a large normative database in the absence of baseline testing," replacing or augmenting less standardized manual methods.

5. Standalone Performance (Algorithm Only without Human-in-the-Loop Performance):

The device inherently operates as an "algorithm only" in generating scores and reports. However, it is explicitly not a standalone diagnostic device.
The "performance" of the algorithm is demonstrated through its ability to accurately measure cognitive functions (construct validity), consistently produce results (reliability), and correctly apply the normative and change index calculations.
The output (scores, RCIs, validity flags) is then interpreted by a healthcare professional. The device itself does not make a diagnosis. Therefore, its "standalone performance" is specifically in generating the objective data, which is then used by a human clinician as an aid.

6. Type of Ground Truth Used:

Neuropsychological Test Scores: For construct validity, the device's scores were correlated with scores from established, traditional paper-and-pencil neuropsychological assessment batteries. These batteries serve as the "ground truth" for what cognitive functions are being measured.
Clinical Standards/Diagnosis: For some ImPACT studies, comparison was made to individuals diagnosed with concussion based on "then-applicable clinical standards."
Normative Data: The "normal" population for the normative database was established through "clinical work-up... including the establishment of inclusion and exclusion criteria" by trained professionals, ensuring subjects did not have medical conditions affecting test performance.

7. Sample Size for the Training Set:

ImPACT: The standardization sample (normative database) served as the primary basis for the algorithm's "training" or establishment of "normal" values and internal psychometric properties. This sample consisted of 17,013 individuals.
ImPACT Pediatric: The normative database for ImPACT Pediatric comprised 915 children.
These are not "training sets" in the modern machine learning sense (where a model learns iteratively), but rather the large datasets used to establish the statistical properties and normative values from which the device's algorithms derive meaning (e.g., percentile rankings, Reliable Change Index).

8. How the Ground Truth for the Training Set (Normative Database) Was Established:

ImPACT:
- Data collected from high schools and colleges nationwide, ensuring representation of the intended use population.
- Older adults included coaches, school administrators, nurses, and adult athletes.
- Testing administered by specifically trained professionals: Neuropsychologists, Psychologists, Neuropsychology/Psychology graduate students, Certified Athletic Trainers, Athletic Training Graduate Students, and Nurses.
- Tests completed in a supervised setting.
- Data uploaded to a secure HIPAA-compliant server and de-identified.
- Participants were English speakers, not reported to have underlying intellectual/developmental data, and not currently concussed or suffering from medical conditions that might affect performance.
ImPACT Pediatric:
- Large, age-stratified sample of children (5-12 years) from multiple clinical sites.
- Tests administered by a researcher, clinician, or educational professional trained in the use of ImPACT Pediatric.
- Tests taken on an iPad in a one-on-one basis (no group testing).
- Children instructed to respond by touching the screen.
- "Ground truth" for "normal" performance was derived from statistical analysis (means, standard deviations, t-tests for gender differences) of this large, carefully collected, and screened normative dataset. Factor analysis was conducted on a subset to derive relevant score clusters.

Ask a Question

Ask a specific question about this device

Page 1 of 1