Hi!
Today is LLM Friday, the day I share only research and news on Large Language Models.
We’ve wrapped up the LLMs that are accessible to everyone without coding, so today I’ll be sharing the basics of prompting styles. If you’re interested in learning more about this topic, hit reply and let me know.
But first, a research summary on LLMs. Are you ready?
✨LLMs
Researchers developed DrugGPT, an LLM designed to improve drug recommendations, dosage, and safety analysis based on evidence.
Their goal was to create an AI tool that is accurate, evidence-based, and traceable for clinical decision support.
🔬 Methods
DrugGPT has three components:
Inquiry Analysis LLM: interprets medical questions.
Knowledge Acquisition LLM: extracts evidence from trusted drug sources like PubMed, the NHS, and Drugs.com.
Evidence Generation LLM: provides answers whose sources of information can be traced.
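To make the flow concrete, here is a minimal sketch of how the three stages hand off to one another. The function names and stub logic are my own illustration, not the authors’ code; in the real system each stage would be an LLM call.

```python
# Hypothetical sketch of a three-stage pipeline like DrugGPT's (not the paper's code).
# Each function stands in for a call to the corresponding LLM component.

def inquiry_analysis(question: str) -> dict:
    """Stage 1: interpret the medical question into a structured query (stubbed)."""
    return {"intent": "drug_question", "entities": question.split()}

def knowledge_acquisition(query: dict) -> list[str]:
    """Stage 2: retrieve evidence from trusted drug databases (stubbed)."""
    return [f"Evidence snippet about {e}" for e in query["entities"][:2]]

def evidence_generation(question: str, evidence: list[str]) -> str:
    """Stage 3: compose an answer that cites its sources (stubbed)."""
    cited = "; ".join(evidence)
    return f"Answer to '{question}' [sources: {cited}]"

def pipeline(question: str) -> str:
    query = inquiry_analysis(question)
    evidence = knowledge_acquisition(query)
    return evidence_generation(question, evidence)
```

The key design point is the last stage: because the answer is built from retrieved snippets, every claim can be traced back to a source.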
It was evaluated for:
Drug recommendation
Dosage recommendation
Adverse reaction identification
Interaction detection
General pharmacology questions
DrugGPT was evaluated on 11 benchmarks and compared against ChatGPT, GPT-4, Claude, Med-PaLM 2, Flan-PaLM, LLaMA, and Galactica.

📊 Results
DrugGPT achieved state-of-the-art accuracy across all tasks with fewer parameters than the GPT models.
USMLE accuracy: 88.2% vs. 83.5% (GPT-4) and 63.7% (ChatGPT).
Drug–drug interactions (DDI): 83.7% accuracy vs. 62.8% (GPT-4).
Maintained high accuracy on new drugs, where GPT-4 and ChatGPT dropped below 65%.
Physicians rated DrugGPT outputs higher for factuality, completeness, safety, and, especially, evidence support.
🔑 Key Takeaways
DrugGPT consistently links responses to verifiable sources.
Outperforms existing LLMs on drug recommendation, dosage, safety, and pharmacology tasks.
Achieves higher accuracy with fewer parameters.
Maintains strong performance on unseen drugs, critical for real-world clinical adoption.
💡Domain-specific LLMs in medicine are a promising way to improve accuracy and reliability in clinical contexts.
🔗 Zhou H, Liu F, Wu J, et al. A collaborative large language model for drug analysis. Nat Biomed Eng. 2025. doi:10.1038/s41551-025-01471-z
🦾 AITools
Prompting
Zero-Shot Prompting
You ask the model to perform a task without showing any examples.
➝ Best for general tasks the model already has enough knowledge to handle.
One-Shot / Few-Shot Prompting
You show the model one (one-shot) or a few (few-shot) examples before asking your real question.
➝ Helps the model learn your format or tone.
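A few-shot prompt can be as simple as pasting two or three worked examples above the real question, so the model imitates the pattern. The drug/class pairs here are my own toy examples:

```python
# Few-shot: two demonstrations, then the real query in the same format.
few_shot_prompt = (
    "Drug: metformin -> Class: biguanide\n"
    "Drug: lisinopril -> Class: ACE inhibitor\n"
    "Drug: atorvastatin -> Class:"  # the model completes this line
)
```

Drop the first two lines and you are back to a zero-shot prompt; keep only one demonstration and it becomes one-shot.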
Chain-of-Thought Prompting (CoT)
You explicitly tell the model to reason step-by-step before giving the answer.
➝ Improves performance on tasks that need logic or multi-step reasoning.
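A tiny illustration (the dosage question and numbers are made up): the prompt explicitly asks for step-by-step reasoning, and the plain-Python arithmetic below is the chain the model should reproduce.

```python
# Chain-of-thought: instruct the model to show its reasoning before answering.
cot_prompt = (
    "A patient takes 250 mg of a drug twice daily for 7 days. "
    "How many grams in total? Reason step by step, then give the final answer."
)

# The reasoning chain we expect, checked in plain Python:
per_day_mg = 250 * 2        # 500 mg per day
total_mg = per_day_mg * 7   # 3500 mg over the week
total_g = total_mg / 1000   # 3.5 g
```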
Zero-Shot Chain-of-Thought
A hybrid: you give no examples, but add a phrase like “Let’s think step by step”.
➝ Increases accuracy on reasoning tasks without needing examples.
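In code, zero-shot CoT is just a one-line transformation of any prompt, appending the trigger phrase from Kojima et al.’s “Let’s think step by step” paper:

```python
def add_zero_shot_cot(prompt: str) -> str:
    """Append the classic zero-shot chain-of-thought trigger phrase."""
    return prompt + "\nLet's think step by step."
```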
Retrieval-Augmented Generation (RAG)
The model retrieves information from external sources (documents, databases) before answering.
➝ Reduces hallucinations and helps keep answers evidence-based.
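Here is a toy RAG sketch, with a two-entry in-memory “database” and keyword matching standing in for a real retriever; everything below (corpus contents included) is my own illustration. Retrieved snippets are stuffed into the prompt as context, and the model is told to answer only from that context.

```python
# Toy RAG: keyword retrieval over a tiny in-memory corpus, then prompt assembly.
CORPUS = {
    "warfarin": "Warfarin interacts with NSAIDs; bleeding risk increases.",
    "metformin": "Metformin is first-line therapy for type 2 diabetes.",
}

def retrieve(question: str) -> list[str]:
    """Return corpus entries whose key appears in the question (stubbed retriever)."""
    q = question.lower()
    return [text for drug, text in CORPUS.items() if drug in q]

def build_rag_prompt(question: str) -> str:
    """Assemble a prompt that grounds the answer in the retrieved evidence."""
    evidence = retrieve(question)
    context = "\n".join(evidence) if evidence else "(no evidence found)"
    return (f"Context:\n{context}\n\n"
            f"Question: {question}\n"
            f"Answer using only the context above.")
```

This is essentially what DrugGPT’s Knowledge Acquisition stage does at scale, with curated drug databases instead of a two-line dictionary.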
🧬AIMedily Snaps
That’s all for today! After a long week, I’m ready to close the computer and watch a movie (A Complete Unknown).
You’re already ahead of the curve in medical LLMs — don’t keep it to yourself. Forward AIMedily to a friend who’d appreciate the insights.
Thank you!
Until next Wednesday.
Itzel Fer, MD PM&R
Forwarded this email? Sign up here