Hi!

Today is LLM day.

A day when I share only research, tools, and news on large language models (LLMs).

But what are LLMs? They are deep learning models that can recognize, summarize, translate, predict, and generate content.

LLMs are trained on immense amounts of data, making them capable of understanding and generating natural language, images, code, and videos.

The best known is ChatGPT, but I’ll share some other LLMs with you, along with what I think each one is best for.

Are you ready? Here we go.

LLMs

🔬 Methods

Data: 2,133 vignettes from the Human Diagnosis Project.

Participants:

  • Licensed physicians

  • 5 large language models (LLMs): Claude 3 Opus, GPT‑4, Gemini Pro, Mistral, and Llama 2.

Comparison groups:

  • Physicians alone

  • Physician teams

  • Individual LLMs

  • LLM ensembles

  • Physicians + LLMs

📊 Results

Accuracy of diagnosis:

  • Physicians alone: 68.3%

  • Physician teams: 75%

  • LLMs alone:

    • Claude 3 Opus: 72.1%

    • GPT-4: 71.6%

    • Gemini Pro: 69.2%

    • Mistral/Llama 2: <65%

  • LLM ensembles: 74.8%

  • Physician teams + LLM ensemble: 79.8%

  • Physicians alone + LLMs: 80.4% (p<0.001)
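If you like to check the numbers yourself, here is a small Python sketch that turns the accuracy list above into percentage-point gains over physicians working alone. The group labels and the `gains_over_baseline` helper are my own naming for illustration, and Mistral/Llama 2 are left out because the list only gives "<65%" rather than an exact value.

```python
# Accuracy figures (percent) copied from the list above.
# Mistral/Llama 2 are omitted: only "<65%" is reported, not an exact value.
accuracy = {
    "Physicians alone": 68.3,
    "Physician teams": 75.0,
    "Claude 3 Opus": 72.1,
    "GPT-4": 71.6,
    "Gemini Pro": 69.2,
    "LLM ensembles": 74.8,
    "Physician teams + LLM ensemble": 79.8,
    "Physicians alone + LLMs": 80.4,
}

BASELINE = "Physicians alone"


def gains_over_baseline(scores, baseline=BASELINE):
    """Return percentage-point differences versus the baseline, largest first."""
    base = scores[baseline]
    diffs = {
        name: round(value - base, 1)
        for name, value in scores.items()
        if name != baseline
    }
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


gains = gains_over_baseline(accuracy)
for group, gain in gains.items():
    print(f"{group}: {gain:+.1f} points")
```

These are simple differences between the reported percentages, so they say nothing about statistical significance on their own.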

Error correction:

  • Physicians corrected 58.7% of LLM mistakes

  • LLMs corrected 61.3% of physician errors

🔑 Key Takeaways

  • Physician + LLM collectives outperformed all other groups.

  • Best-performing LLMs: Claude 3 Opus and GPT-4.

  • LLMs helped cover gaps where physicians can make mistakes, and vice versa.

  • This research supports collaborative workflows in clinical settings.
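To make the idea of a "collective" more concrete, here is a minimal Python sketch of one way to pool ranked differential diagnoses from a physician and several LLMs. This is not the aggregation procedure from the paper. It assumes each contributor submits a ranked list and that earlier ranks count more (1/rank weighting). The `pool_differentials` function and the example diagnoses are hypothetical.

```python
from collections import defaultdict


def pool_differentials(ranked_lists):
    """Combine ranked diagnosis lists into a single ranking.

    Each diagnosis earns 1/rank points from every list it appears in
    (rank 1 = 1 point, rank 2 = 0.5, and so on). This weighting is an
    assumption for illustration, not the method reported in the paper.
    """
    scores = defaultdict(float)
    for diagnoses in ranked_lists:
        for rank, diagnosis in enumerate(diagnoses, start=1):
            # Normalize spelling so "Pneumonia" and "pneumonia " count together.
            scores[diagnosis.strip().lower()] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical inputs: one physician and two LLMs on the same vignette.
physician = ["pulmonary embolism", "pneumonia", "pericarditis"]
llm_a = ["pneumonia", "pulmonary embolism", "pleurisy"]
llm_b = ["pulmonary embolism", "pericarditis", "pneumonia"]

collective = pool_differentials([physician, llm_a, llm_b])
print(collective[:3])
```

The point of the sketch is that a diagnosis ranked high by several independent contributors rises to the top, which is one way a mixed group can cover the gaps of any single member.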

🔗Zöller N, Berger J, Lin I, et al. Human–AI collectives most accurately diagnose clinical vignettes. Proc Natl Acad Sci U S A. 2025;122(24):e2426153122. doi:10.1073/pnas.2426153122

🦾TechTools

There are many LLMs. Some are general-purpose, and others are designed for clinical use.

Today, I’ll start with three general-purpose ones: ChatGPT, Claude, and Manus.

ChatGPT

  • The best-known LLM, and great for everyday tasks.

  • Versatile and conversational.

  • Good for creative writing and generating images.

Claude

  • Great at summarizing and analyzing long documents (upload a paper and ask questions about it).

  • Great for long, more professional writing, deep thinking, and clarity.

  • It spots ethical risks, and it’s fast too.

  • Best for researching medical information that requires references.

Manus

  • Can manage complex tasks and workflows (without you having to explain every step).

  • Good at deep reasoning.

  • Great for automation and integration with other apps.

It’s important to remember that these general-purpose LLMs are not trained for clinical use and are not HIPAA-compliant.

That’s all for now.

If you know people in healthcare who would like updates on LLM news, feel free to share this newsletter. You can:

↪️ Forward this email or 📲 share this link.

Thank you!

Until next Wednesday.

Itzel Fer, MD PM&R

Follow me on LinkedIn | Substack | X | Instagram

Join my Newsletter 👉 AIMedily.com
