In partnership with

Trusted by millions. Actually enjoyed by them too.

Most business news feels like homework. Morning Brew feels like a cheat sheet. Quick hits on business, tech, and finance—sharp enough to make sense, snappy enough to make you smile.

Try the newsletter for free and see why it’s the go-to for over 4 million professionals every morning.

Check it out

Hi {name}!

Today is LLM Friday, a day I share research only on Large Language Models.

A recent paper published in the New England Journal of Medicine brought to light how AI is transforming healthcare delivery — and how this comes with a hidden cost.

Training and running large AI models demand high energy consumption. Nature estimated that training GPT-3 is equal to the yearly emissions of 100 cars.

Post-training LLMs can be widely used, but new models come out frequently, which raises questions about sustainability.

Should we ask ourselves how to use AI to improve patient outcomes while minimizing our effect on the environment?

Healthcare already consumes significant resources — so the balance between innovation and sustainability matters. Companies need to consider the best way to reduce emissions and maximize clinical utility.

Now, let’s check what’s new.

LLMs

LLMs in healthcare need to integrate new and updated medical knowledge to produce relevant and accurate responses.

OpenAI, Google, and Meta enable users to fine-tune their models through commercial APIs (Application Programming Interfaces).

But it’s unclear whether this fine-tuning is effective. In this study, researchers fine-tuned six LLMs using datasets of new and updated medical knowledge.

🔬 Methods

LLMs evaluated:

  • OpenAI GPT-3.5 Turbo

  • OpenAI GPT-4o

  • OpenAI GPT-4o mini

  • Google Gemini 1.5

  • Meta Llama 3.1 8B

Tests:

  • New FDA drug approvals (38 drugs, safety, dosage, contraindications).

  • Updated medical guidelines (e.g., new hypertension targets).

  • Synthetic patient electronic health records.

  • Clinical vignettes.

📊 Results

Overall performance:

  • Average accuracy for learning new medical knowledge: 37%.

  • Average for updating existing knowledge like guidelines: 19%.

Model differences:

  • Best performances: GPT‑4o‑mini, followed by GPT‑3.5‑Turbo and GPT‑4o.

  • OpenAI models memorized better.

  • Gemini/Llama showed very low learning.

🔑 Key Takeaways

  • Fine-tuning did not improve accuracy; some models worsened.

  • Models memorized data but failed to generalize to new vignettes.

  • Commercial fine-tuning of LLMs on recent medical literature and guidelines leads to modest improvements.

  • All of these LLMs had frequent hallucinations. They struggled with distinguishing outdated versus current recommendations.

  • Fine-tuned LLMs have difficulty reasoning on complex cases, recent changes in diagnostic criteria, treatment recommendations, and

    emerging therapies.

    💡Relying solely on fine-tuning for keeping clinical LLMs up to date is still not enough, human supervision is key.

🔗 Wu E, Wu K, Zou J. Limitations of Learning New and Updated Medical Knowledge with Commercial Fine‑Tuning Large Language Models. NEJM AI. 2025 Jul 15;2(8). doi:10.1056/AIcs2401155

🦾AIMedily Tools

  • Built for clinical medicine with search from medical guidelines.

  • Tested with 314 real clinical questions across 9 specialties.

  • A research paper from NEJM AI showed that it outperformed ChatGPT-4 in clinician preference and safety.

  • References curated to reduce hallucinations.

  • App available to download on smartphone.

  • OpenAI GPT-4 model trained in medical contexts (clinical decision support, education, documentation).

  • Performs above passing level on medical exams like USMLE.

  • Strong at summarizing guidelines and explaining medical concepts.

  • More accurate than general models.

  • Still has hallucinations and errors, especially in rare cases.

  • A platform where you can access to multiple LLMs (GPT-4, Claude, Gemini, etc) in one app.

  • Good for brainstorming and comparing responses from different LLMs.

  • Community prompt libraries available.

  • Does not have the latest LLMs models (like ChatGPT-5)

  • Good option to save on subscriptions.

That’s all for this week.

You’re already ahead of the curve in medical LLMs — don’t keep it to yourself; Forward AIMedily to a colleague who’d appreciate the insights (and help grow this community).

Thank you!

Until next Wednesday.

Itzel Fer, MD PM&R

Follow me on LinkedIn | Substack | X | Instagram

Forwarded this email? Sign up here

P.S. If you enjoy AIMedily, would you help us with a quick review? If your answer is yes, awesome! Here is the link. Fill review here  

How did you like today's newsletter?

Login or Subscribe to participate

Fact-based news without bias awaits. Make 1440 your choice today.

Overwhelmed by biased news? Cut through the clutter and get straight facts with your daily 1440 digest. From politics to sports, join millions who start their day informed.