Real-world data is being hailed as the next big thing in clinical research, but the reality is a bit messier. Until we clean up the data and rethink how we collect it, real-world evidence risks being a shiny promise with little power.
The explosion of electronic health records (EHRs) and digital claims databases was supposed to revolutionize clinical research. With 96% of U.S. hospitals now using EHRs, we have access to a firehose of medical information that could, at least in theory, fuel faster, cheaper, and more inclusive studies. But here’s the problem: real-world data (RWD) is, well, real messy. Most EHRs weren’t built with science in mind. They were designed for billing, not research. As a result, the data often lacks detail, accuracy, and completeness, especially for information not tied to reimbursement. Critical exposures, outcomes, or even something as basic as mortality data may be missing or inconsistently recorded. That makes using this information for anything more rigorous than observational studies an uphill battle.
And it’s not just about what’s missing. RWD also struggles with confounding factors, fragmented care histories, inconsistent formats, and next to no standardization across institutions. Trying to extract causal insights from such chaos is like trying to run a clinical trial using half-filled charts scribbled in four different languages.
So, what’s the fix?
The Hybrid Solution: Bring the Discipline of Trials to Real-World Studies

A promising path forward is a hybrid model, one that marries the breadth of real-world data with the rigor of clinical trials. Instead of passively mining EHRs after the fact, researchers should design prospective, real-world study protocols that deliberately collect high-quality data directly from medical records, using tools similar to traditional case report forms. Treat that data with the same respect clinical trial data receives: manage missing data, track quality metrics, and enforce consistency across sites. When you approach real-world studies with the same discipline as trials, the result is usable, trustworthy real-world evidence (RWE).
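What might "treat RWD like trial data" look like in practice? Here is a minimal sketch, in Python, of one small piece of it: a pre-specified, per-site completeness metric that flags gaps while records are still correctable. The StudyRecord fields and the 90% threshold are illustrative assumptions, not any real standard or protocol.

```python
from dataclasses import dataclass

# Hypothetical CRF-style record abstracted from an EHR; the field
# names here are illustrative, not a real standard.
@dataclass
class StudyRecord:
    site_id: str
    exposure: str | None          # e.g., drug name
    outcome: str | None           # e.g., event code
    mortality_status: str | None

REQUIRED_FIELDS = ("exposure", "outcome", "mortality_status")

def completeness_by_site(records: list[StudyRecord]) -> dict[str, float]:
    """Fraction of required fields populated, per site.

    Tracking a pre-specified quality metric like this while the study
    runs means gaps surface early, not after database lock.
    """
    filled: dict[str, int] = {}
    total: dict[str, int] = {}
    for rec in records:
        total[rec.site_id] = total.get(rec.site_id, 0) + len(REQUIRED_FIELDS)
        filled[rec.site_id] = filled.get(rec.site_id, 0) + sum(
            getattr(rec, f) is not None for f in REQUIRED_FIELDS
        )
    return {site: filled[site] / total[site] for site in total}

# Flag sites below a pre-specified completeness threshold, so a data
# query goes back to the site while records are still correctable.
records = [
    StudyRecord("site_a", "drug_x", "mi", "alive"),
    StudyRecord("site_b", "drug_x", None, None),
]
for site, score in completeness_by_site(records).items():
    if score < 0.9:  # threshold is an illustrative choice
        print(f"{site}: completeness {score:.0%}, issue a data query")
```

The point isn't the code itself; it's that the quality metrics are defined before collection starts, the way a trial's data management plan would define them.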
AI, especially large language models, can help lighten the load: automating documentation, extracting data from unstructured fields, and linking disparate datasets. But models can’t record information that was never captured, and they’re still prone to hallucinating details. We must deploy AI thoughtfully, with guardrails and human oversight, not as a magic fix; the sketch below shows one way that oversight loop can work. And let’s not forget the people actually entering the data. Clinicians are already drowning in administrative tasks; without incentives to collect complete and structured data, they simply won’t do it. Companies need to reward providers who contribute to high-quality research datasets.
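To make the guardrail idea concrete, here is a minimal Python sketch of one grounding check. Note the assumptions: call_llm is a hypothetical placeholder (any provider's API would slot in there), and the verbatim-evidence test is one simple heuristic, not a complete safeguard.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical). For this
    demo it returns a canned JSON answer."""
    return ('{"smoking_status": "former smoker", '
            '"evidence": "quit smoking in 2015"}')

def extract_smoking_status(note: str) -> dict:
    """Pull a structured field out of free text, then apply a simple
    grounding check before trusting the answer."""
    prompt = (
        "From the clinical note below, return JSON with keys "
        '"smoking_status" and "evidence" (an exact quote from the note). '
        "If the note does not say, use null.\n\n" + note
    )
    result = json.loads(call_llm(prompt))

    # Guardrail 1: the model must cite a span that appears verbatim in
    # the note; this catches many hallucinated "findings".
    evidence = result.get("evidence")
    grounded = bool(evidence) and evidence in note

    # Guardrail 2: anything ungrounded is routed to a human abstractor
    # instead of being written straight into the research dataset.
    return {
        "smoking_status": result["smoking_status"] if grounded else None,
        "needs_human_review": not grounded,
    }

print(extract_smoking_status("Pt quit smoking in 2015. No current tobacco."))
# -> {'smoking_status': 'former smoker', 'needs_human_review': False}
print(extract_smoking_status("Pt presents with chest pain."))
# -> {'smoking_status': None, 'needs_human_review': True}
```

Everything the check can't ground lands in front of a human abstractor, which is exactly the oversight loop argued for above: AI does the tedious extraction, people keep the veto.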
Real-world data has the potential to transform clinical research, making it faster, cheaper, and more inclusive. But potential alone isn’t enough. Without data quality, structure, and strategic oversight, we’re just building castles on sand. Want real-world evidence that’s actually reliable? Then we need to treat real-world data like clinical trial data, with rigor, planning, and respect. Only then can we turn the everyday mess of medical records into the engine of discovery.