Hi!

Today is LLM Friday, the day I share only research, tools, and news on large language models.

A few days ago, OpenAI released a plan to strengthen ChatGPT’s role in mental health and safety. The plan follows a lawsuit filed against OpenAI by a family whose teenage son died by suicide. The teen had discussed his suicidal ideation with ChatGPT, and the family alleges that its safeguards were not enough.

OpenAI will let parents link their account to their child’s and receive a notification if the platform thinks the child is in "acute distress". It will also improve protections and strengthen how its models handle self-harm and mental health crises. The company is working with 90 physicians in 30 countries, along with an expert council.

Are you ready? Here we go.

✨LLMs

Generative AI in Stroke Care: Not Ready Yet

This study evaluated the performance of GPT-4o, Claude 3, and Gemini Ultra 1.0 across multiple stages of stroke care in realistic scenarios.

🔬 Methods

Prompting approaches used to improve responses:

Zero-Shot Learning: Answer directly with no examples or guidance.

Chain-of-Thought: Show step-by-step reasoning.

Talk-Out-Your-Thoughts: Verbalize and explore options before concluding.

Clinical stages evaluated: Four (prevention, diagnosis, treatment, and rehabilitation).

Cases: Five patient scenarios.

Who evaluated the responses? Four senior stroke clinicians, on:

Accuracy
Hallucinations
Specificity
Empathy
Actionability
Responses were randomized, blinded, and compared against established guidelines and a minimum competency threshold of 60/100.
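
To make the three prompting approaches more concrete, here is a minimal Python sketch. The prompt wording, the names `PROMPT_STYLES`, `build_prompt`, and `meets_threshold`, and the idea of averaging the clinicians' scores are all my own assumptions for illustration. They are not the study's actual prompts or scoring procedure. Only the 60/100 threshold comes from the paper.

```python
# Illustrative prompt wrappers. The wording is an assumption,
# not the prompts used in the study.
PROMPT_STYLES = {
    "zero_shot": "{question}",
    "chain_of_thought": (
        "{question}\n\nThink through this step by step, then give your answer."
    ),
    "talk_out_your_thoughts": (
        "{question}\n\nTalk through the options you are considering "
        "and weigh them before you conclude."
    ),
}

# Minimum competency threshold (out of 100) reported in the study.
COMPETENCY_THRESHOLD = 60


def build_prompt(style: str, question: str) -> str:
    """Wrap a clinical question in the chosen prompting style."""
    return PROMPT_STYLES[style].format(question=question)


def meets_threshold(clinician_scores: list[float]) -> bool:
    """Average the 0-100 scores and compare with the threshold.

    Averaging across raters is an assumption made for this sketch.
    """
    mean_score = sum(clinician_scores) / len(clinician_scores)
    return mean_score >= COMPETENCY_THRESHOLD
```

For example, `build_prompt("chain_of_thought", question)` returns the question followed by the step-by-step instruction, while the zero-shot style returns the question unchanged.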

📊 Results

Overall performance: All models scored below the clinical competency threshold of 60/100.
GPT-4o outperformed the others in accuracy, specificity, and actionability, but had higher hallucination rates.
Gemini Ultra 1.0 performed moderately, slightly below GPT-4o.
Claude 3 had slightly lower scores overall, with fewer hallucinations but less specificity.
Talk-Out-Your-Thoughts was best for empathy and actionability, especially in prevention and recovery.
Chain-of-Thought enhanced structured reasoning during diagnosis.
Zero-Shot Learning provided concise responses with fewer hallucinations, effective in treatment.
Hallucinations: All models generated false or misleading information.

🔑 Key Takeaways

Rigorous clinician oversight remains essential to mitigate risks like hallucinations and inaccuracies.
Current LLMs did not consistently deliver accurate and safe stroke information.
Prompt design matters: Zero-Shot reduced hallucinations, while Talk-Out-Your-Thoughts improved empathy and usefulness.
Statistical significance: Differences among models and prompt techniques were often not statistically significant.
💡 The findings indicate that LLMs need to be integrated into healthcare carefully, with an emphasis on human oversight.
💡 The study highlights the critical need for specialized medical LLMs and rigorous validation before clinical deployment.

🔗 Lee JT, Li VCS, Wu JJ, Chen HH, Su SSY, Chang BPH, Lai RL, Liu CH, Chen CT, Tanapima V, Shen TKB, Atun R. Evaluation of performance of generative large language models for stroke care. NPJ Digit Med. 2025;8:481. doi:10.1038/s41746-025-01830-9. PMID: 40730644.

🦾TechTools

ChatDoctor (Link)

Open-source (free) medical LLM based on LLaMA, Meta’s model.
Trained on real patient–doctor dialogues.
Provides doctor-patient style responses to practitioners and patients, rather than just facts.
Can be customized for institutions, used in multiple languages, and understands text and audio.

Med-PaLM 2 (Link)

Developed by Google DeepMind/Google Research.
LLM fine-tuned for medicine, designed to give high-quality answers.
Can be applied to basic tasks and complex workflows.
Has been evaluated on MedQA (USMLE questions) and other benchmarks.
Video here.

MediSearch (Link)

Search tool that answers medical questions using medical guidelines and research papers.
Reduces the time you spend searching databases.
Can create lists of papers on specific topics you need.
Gives you medical references with links to the original papers.
Works in 23 languages.

That’s all for today.

You’re already ahead of the curve in medical LLMs — don’t keep it to yourself. Forward AIMedily to a colleague who’d appreciate the insights.

Thank you!

Until next Wednesday.

Itzel Fer, MD PM&R

Follow me on LinkedIn | Substack | X | Instagram

Join my Newsletter 👉 AIMedily.com

Forwarded this email? Sign up here

P.S. Do you mind sharing how AIMedily has helped you? Even one sentence makes a huge difference. Leave a review here.

LLMs Friday 5