The Hidden Flaw in FDA-Approved AI Medical Devices 

Artificial Intelligence (AI) offers unprecedented tools for diagnosis and treatment. However, a new cross-sectional study of nearly 1,000 FDA-approved AI-enabled medical devices reveals a troubling gap between regulatory approval and robust clinical evidence. The findings raise questions about whether these tools can be relied on to perform safely and effectively for every patient, regardless of background or demographics.

Most approved AI devices are in the fields of radiology, cardiovascular medicine, and neurology. An analysis of 903 FDA-approved AI-enabled medical devices, available through August 2024, shows that the majority are software-only applications cleared to assist clinicians. The new study found that the clinical evidence supporting most of these devices was incomplete or absent. Clinical performance studies were reported for only about half (56%) of the analyzed devices, and nearly one-quarter (24.1%) explicitly stated that no performance studies had been conducted prior to approval. Of the studies that were reported, most were retrospective (38.2%), meaning they analyzed existing data rather than testing the device in a real-world, controlled setting. Only a small fraction were prospective (8.1%), and a mere 2.4% employed a randomized clinical trial design, the gold standard for clinical evidence.
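
To make these proportions concrete, here is a quick back-of-the-envelope calculation in Python. It assumes each reported percentage is taken over all 903 devices; the study's actual denominators may differ for some figures.

```python
# Back-of-the-envelope counts, assuming each reported percentage
# applies to the full set of 903 devices (the study's actual
# denominators may differ for some figures).
TOTAL_DEVICES = 903

reported_shares = {
    "clinical performance study reported": 0.560,
    "no performance study conducted":      0.241,
    "retrospective design":                0.382,
    "prospective design":                  0.081,
    "randomized clinical trial":           0.024,
}

for label, share in reported_shares.items():
    print(f"{label}: ~{round(TOTAL_DEVICES * share)} devices")
```

Under that assumption, roughly 506 devices reported any performance study, about 218 reported none at all, and only around 22 were supported by a randomized clinical trial.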

This reliance on less stringent evidence is compounded by a critical gap in demographic representation. The core challenge for AI is generalizability: the ability of a model to perform reliably across different populations and clinical settings. AI models depend heavily on the data they are trained on, and if that data is not representative, the model is prone to bias. The study's findings here are alarming: fewer than one-third (28.7%) of clinical evaluations provided sex-specific data, and only about one-quarter (23.2%) addressed age-related patient subgroups. This lack of diversity in testing means that a device performing perfectly in a narrow validation set might fail or introduce disparities when used on a broader patient population.
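
What would the missing subgroup reporting look like in practice? Below is a minimal, hypothetical sketch of a sex- and age-stratified sensitivity analysis; every field name and data point is illustrative, not drawn from the study.

```python
# Illustrative subgroup evaluation: the kind of sex- and age-stratified
# reporting the study found missing in most submissions.
# All records and labels here are hypothetical.
from collections import defaultdict

# Each record: (predicted_positive, truly_positive, sex, age_group)
records = [
    (True,  True,  "F", "18-40"),
    (False, True,  "F", "65+"),
    (True,  True,  "M", "41-65"),
    (False, True,  "M", "65+"),
    (True,  False, "F", "41-65"),
    # ... thousands of validation cases in practice
]

def sensitivity_by_group(records, group_of):
    """True-positive rate per subgroup, where group_of maps a
    record to its subgroup label (e.g., sex or age band)."""
    true_pos, positives = defaultdict(int), defaultdict(int)
    for rec in records:
        predicted, truth = rec[0], rec[1]
        if truth:
            group = group_of(rec)
            positives[group] += 1
            true_pos[group] += predicted
    return {g: true_pos[g] / positives[g] for g in positives}

print("Sensitivity by sex:", sensitivity_by_group(records, lambda r: r[2]))
print("Sensitivity by age:", sensitivity_by_group(records, lambda r: r[3]))
```

A device whose aggregate sensitivity looks strong can still show a sharp drop in one of these strata, which is precisely the disparity that aggregate-only reporting hides.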

The vast majority of these devices (97.1%) were cleared via the FDA’s 510(k) pathway, which allows manufacturers to bypass compulsory clinical testing by demonstrating that the new device is “substantially equivalent” to an already-approved predicate device. While efficient, this pathway may not be adequate for novel, adaptive AI technologies, which are susceptible to data shifts and changes in clinical application and therefore require continuous monitoring and evaluation. Ultimately, the information publicly available on the FDA website is often insufficient for a thorough assessment of these devices’ clinical generalizability, which suggests the current framework is struggling to keep pace with the unique demands of AI validation.
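
One concrete building block of such continuous monitoring is input drift detection: comparing the population a device sees in deployment against the cohort it was validated on. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the feature, data, and alert threshold are illustrative assumptions, not any regulatory standard.

```python
# Minimal post-market drift check: compare a deployed input feature's
# distribution against the premarket validation distribution with a
# two-sample Kolmogorov-Smirnov test. The feature, data, and threshold
# are illustrative, not taken from any FDA guidance.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Stand-ins for a real feature, e.g., patient age or an image statistic.
validation_ages = rng.normal(loc=55, scale=12, size=5_000)  # premarket cohort
deployment_ages = rng.normal(loc=48, scale=15, size=1_000)  # live usage, shifted

stat, p_value = ks_2samp(validation_ages, deployment_ages)
ALERT_P = 0.01  # illustrative alerting threshold

if p_value < ALERT_P:
    print(f"Drift alert: KS={stat:.3f}, p={p_value:.2e}; deployment "
          "population differs from the validated cohort.")
else:
    print("No significant input drift detected.")
```

A flagged shift would not by itself prove the device is failing, but it would signal that the premarket evidence may no longer describe the population actually being treated.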

The study raises serious questions about the transparency and clinical validation of AI medical devices. Safe and effective AI devices require representative premarket datasets and robust post-market monitoring. Above all, these devices must deliver equitable care; without it, they will see limited adoption by healthcare providers and patients alike.
