A new NIH-CMS partnership promises to use electronic medical records (EMRs) and real-world data (RWD) to answer complex questions about autism and other chronic diseases. Experience over the last decade has exposed several limitations of real-world data, and relying on it could lead to dangerous policy missteps, misleading health research, and billions of wasted dollars, especially in high-stakes areas like chronic disease and autism spectrum disorder (ASD) research.
The newly announced partnership between the National Institutes of Health (NIH) and the Centers for Medicare & Medicaid Services (CMS) to create a real-world data (RWD) platform for research into autism spectrum disorder (ASD) and chronic diseases is ambitious, but also deeply flawed in its assumptions. While the goal of leveraging claims data, electronic medical records (EMRs), and consumer wearables to uncover root causes of complex conditions is laudable, its proponents vastly oversimplify the challenges involved in using such data for meaningful scientific insight. The unfortunate reality is that medical records and other real-world data sources were not designed to answer the types of high-stakes research questions this project aims to tackle.
Medical records in clinical practice are notoriously limited, inconsistent, and incomplete. Physicians often record only the minimum necessary details because of time constraints, liability concerns, and variability in documentation practices. This leaves substantial gaps in the information needed to understand patient histories, treatment efficacy, behavioral factors, and diagnostic subtleties, especially in complex, multifactorial conditions like autism. The assumption that one can extract deep etiological insights from such fragmented and superficial data is highly questionable.
Moreover, many critical variables that influence patient outcomes—such as environmental exposures, familial context, and social determinants of health—are either entirely absent from medical records or captured in unstructured, inconsistent ways that defy easy quantification. Even data from wearables and insurance claims, while voluminous, often lack the granularity and clinical context needed for nuanced interpretation. These datasets may highlight correlations, but they rarely yield causal inferences strong enough to guide policy or medical intervention.
It is telling that some of the world’s most sophisticated data-driven companies—Amazon, Microsoft, Apple—as well as major health insurers, have attempted to make use of medical record data at scale and failed to derive robust, actionable insights. If these entities, with their immense technical and financial resources, have struggled, there is little reason to believe that a public-sector initiative, however well-intentioned, will fare better without fundamentally new approaches to data quality and structure.
There is also a dangerous potential for misuse. Poor-quality or incomplete data may lead to anecdotal findings being presented as evidence, which in turn could inform public health policy or clinical guidelines. This could result in misallocation of resources, misdiagnosis, or ineffective interventions, ultimately harming the very populations this initiative intends to help. For example, if medical records do not capture whether interventions were delivered consistently across patient populations, or if there is selection bias in who receives certain treatments, the resulting analysis could attribute causality where none exists.
While privacy protections and secure data infrastructures are important, and seem to be prioritized in this project, they are no substitute for data integrity and scientific rigor. The NIH-CMS initiative must confront the limitations of RWD honestly and build safeguards against drawing unwarranted conclusions from weak or noisy data. Without doing so, this partnership risks becoming another well-meaning but ill-fated attempt to repurpose administrative records into tools for solving some of the most complex problems in medicine and public health.