The Business Brief Executives Actually Trust
In a world of sensational headlines and shallow analysis, The Daily Upside stands apart. Founded by former bankers and seasoned journalists, it delivers crisp, actionable insights executives actually use to make smarter decisions.
From market-moving developments to deep dives on business trends, The Daily Upside gives leaders clarity on what matters — without the noise.
That’s why over 1 million readers, including C-suite executives and senior decision-makers, start their day with it.
No fluff. No spin. Just business clarity.
Hello!
Welcome to LLM Friday.
Today, I will share with you 1 research paper, 2 AI tools, and 3 news items on Large Language Models.
Are you ready?
✨LLMs
This study evaluated how well ChatGPT 4.5 answered common patient questions about lower back pain (LBP), focusing on accuracy, readability, and clinical utility.
🔬 Methods
Questions: 25 standardized patient questions across 5 categories: diagnosis, medical guidance, treatment, self-care, and physical therapy.
Prompting styles tested (a rough code sketch of this setup follows the Methods list):
No special instructions
Patient-friendly (plain language)
4th-, 6th-, and 8th-grade reading levels
Reference-based (with citations)
Evaluation: Three blinded physicians rated each response as:
1 = incorrect
2 = partially correct
3 = correct, according to the International Pain and Spine Intervention Society guidelines
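To make the setup concrete, here is a minimal Python sketch of how the five prompting styles could be crossed with patient questions and scored on the 1–3 scale. The prompt wording, model ID, example questions, and helper names are placeholders assumed for illustration, not the study's actual materials.

```python
# Minimal sketch of the study's prompting-and-rating setup (illustrative only).
# The prompt wording, model ID, and questions below are placeholders, not the
# actual materials used by Basharat et al.
from statistics import mean

from openai import OpenAI  # assumes the official openai Python package

# The five prompting styles tested in the study.
PROMPT_STYLES = {
    "none": "",
    "patient_friendly": "Answer in plain, patient-friendly language. ",
    "grade_4": "Answer at a 4th-grade reading level. ",
    "grade_6": "Answer at a 6th-grade reading level. ",
    "grade_8": "Answer at an 8th-grade reading level. ",
    "reference": "Answer and cite peer-reviewed references. ",
}

# Placeholder patient questions; the study used 25 across 5 categories.
QUESTIONS = [
    "What causes lower back pain?",
    "When should I see a doctor for back pain?",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(style: str, question: str) -> str:
    """Send one question with one prompting style and return the answer."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model ID; the study used ChatGPT 4.5
        messages=[{"role": "user", "content": PROMPT_STYLES[style] + question}],
    )
    return response.choices[0].message.content

def mean_rating(ratings: list[int]) -> float:
    """Average the physicians' ratings (1 = incorrect, 3 = correct)."""
    return mean(ratings)

if __name__ == "__main__":
    answer = ask("grade_8", QUESTIONS[0])
    print(answer)
    # Three blinded physicians would then each rate the answer 1-3, e.g.:
    print("Mean rating:", mean_rating([3, 3, 2]))
```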
📊 Results
Accuracy was high (mean score 2.81/3), with less than 6% of responses rated fully incorrect.
Prompt style matters:
8th-grade and reference-based prompts were the most accurate.
4th-grade prompts reduced accuracy, often omitting safety details.
Readability mismatch (a readability sketch follows these Results):
Most answers were written above a 6th-grade reading level.
Only 4th-grade prompts produced responses below an 8th-grade level, but at the cost of accuracy.
Verbosity doesn’t correlate with accuracy: Reference prompts were longer, but not more accurate than shorter prompts.
Errors: 18–30% of responses were rated partially correct, usually because they omitted safety details rather than stating false information.
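The readability findings depend on how reading grade level is estimated. The summary above doesn't say which index the authors used, so purely as an assumption, here is the widely used Flesch-Kincaid Grade Level formula with a crude syllable heuristic:

```python
# Rough sketch: estimate the Flesch-Kincaid Grade Level of an LLM answer.
# Assumption: the readability index used in the study is not stated above;
# FKGL is one common choice, and the syllable counter is a crude heuristic.
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)

if __name__ == "__main__":
    answer = ("Lower back pain often improves with gentle movement, "
              "heat, and over-the-counter pain relief. See a doctor if "
              "you have numbness, weakness, or loss of bladder control.")
    print(f"Estimated grade level: {flesch_kincaid_grade(answer):.1f}")
```

On this scale, an answer scoring above 8.0 would exceed the 8th-grade target discussed above.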
🔑 Key Takeaways
How questions are phrased affects ChatGPT's medical accuracy.
Moderate-literacy prompts (8th-grade level) strike the most reliable balance between clarity and accuracy.
ChatGPT tends to write above the requested reading level, producing overly technical responses.
💡LLMs may support patient education but should not replace professional guidance.
🔗Basharat A, Shah R, Wilcox N, et al. ChatGPT and low back pain: Evaluating AI-driven patient education in the context of interventional pain medicine. Interventional Pain Medicine. 2025;4:100636. doi:10.1016/j.inpm.2025.100636
🦾TechTools
Gives you access to 100+ AI models for language, images, video, and code.
Can code, debug, and handle technical tasks.
Allows you to ask questions and get trustworthy answers grounded in 420 peer-reviewed journals and medical guidelines.
Shows you the reasoning and evidence behind the advice.
Designed to integrate with Electronic Health Records and hospital systems.
Reduced risk of hallucination or misinformation.
🧬AIMedily Snaps
LLMs in neurology: a paper reviewing the evidence for LLMs in neurological treatment (Link).
Do ChatGPT and Gemini's recommendations align with established guidelines for hand and upper extremity surgery? (Link).
Nonclinical information in patient messages, such as typos and extra white space, reduces the accuracy of LLMs (Link).
That’s all for today.
You’re already ahead of the curve in medical LLMs — don’t keep it to yourself. Forward AIMedily to a friend who’d appreciate it.
Before I go, please help me answer these 2 questions (anonymous):
Which AI tools do you want me to feature more?
Which do you prefer?
Thank you for your answers!
See you next week.
Itzel Fer, MD PM&R
Forwarded this email? Sign up here
How did you like today's newsletter?
Seeking impartial news? Meet 1440.
Every day, 3.5 million readers turn to 1440 for their factual news. We sift through 100+ sources to bring you a complete summary of politics, global events, business, and culture, all in a brief 5-minute email. Enjoy an impartial news experience.