AI beats doctors in synthetic scenarios. Real clinical practice is messier.
- doctorbhargavmd
- Mar 28
- 2 min read
Benchmark studies use complete data and clear criteria. Medicine doesn't work that way.
TL;DR:
- "AI outperforms doctors" headlines come from benchmarks with complete information and structured formats
- Real clinical practice involves incomplete data, ambiguous presentations, and contextual factors not in electronic records
- Both positive and negative synthetic research creates misleading narratives about AI deployment
- The gap between benchmark performance and clinical utility runs in both directions
The pattern in AI research:
Studies showing AI outperforming doctors use synthetic scenarios: clean vignettes, all information provided upfront, structured formats, clear diagnostic criteria.
These are cases where pattern recognition excels. AI processes complete data sets and matches them to known patterns extremely well.
Studies showing AI failing catastrophically also use synthetic scenarios: no follow-up questions, multiple-choice constraints, fixed information sets that don't match how the tools are designed to be used.
Both types create misleading narratives.
Here’s why real clinical practice is different:
Medicine involves incomplete data, ambiguous presentations, patients who can't articulate symptoms clearly, and contextual factors that don't appear in structured records.
Physicians navigate uncertainty. They make judgment calls with missing information. They integrate social context, patient preferences, and factors that aren't captured in data.
AI performs well when problems map to training distribution. Physicians perform well when problems require adaptation beyond pattern matching.
The deployment problem:
"AI is better than doctors" based on synthetic scenarios creates unrealistic deployment expectations.
"AI is unbelievably dangerous" based on synthetic scenarios creates fear that blocks useful applications.
Both rely on controlled settings that don't match real clinical workflows.
Here’s my take:
We can't use synthetic research (positive or negative) to guide real-world deployment decisions.
The gap between benchmark performance and clinical utility runs in both directions. AI that beats physicians on clean benchmarks may fail in messy reality. AI that fails on constrained tests may perform well when used as designed.
The future isn't AI replacing physicians, or physicians blocking AI out of safety concerns based on flawed studies. It's Human + AI, each doing the tasks it does best.
That requires different research. Not synthetic benchmarks showing AI vs human performance, but real-world studies showing how Human + AI configurations perform compared to either alone.
Dr. Bhargav Patel, MD, MBA
Physician-Innovator | AI in Healthcare | Child & Adolescent Psychiatrist
P.S. Have you seen "AI beats doctors" studies that don't reflect the clinical scenarios where you'd actually want AI support?
Reply and let me know what deployment reality looks like in your organization.