Lesson 01 — Intro to LLMs
What is a language model?
A function that, given a prefix of tokens, returns a probability distribution over the next token.
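To make that definition concrete, here is a minimal sketch of the interface. The toy corpus and the bigram counting are illustrative assumptions, not part of the lesson; a real LLM replaces the count table with a neural network that conditions on the whole prefix.

```python
# Toy "language model": given a prefix of tokens, return P(next token | prefix).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()   # made-up corpus
vocab = sorted(set(corpus))

# Count how often each token follows each token (a bigram model).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(prefix):
    """Return a probability distribution over the next token.

    This toy model only looks at the last token of the prefix;
    a transformer conditions on the entire prefix.
    """
    last = prefix[-1]
    total = sum(counts[last].values()) or 1
    return {tok: counts[last][tok] / total for tok in vocab}

print(next_token_distribution(["the", "cat"]))
# {'cat': 0.0, 'mat': 0.0, 'on': 0.0, 'ran': 0.5, 'sat': 0.5, 'the': 0.0}
```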
Why now?
- Compute is cheap and parallel
- The web gave us trillions of tokens
- Attention scales better than recurrence
What we’ll build toward
- A working mental model of how a transformer predicts the next token
- Comfort reading inference-time code (sampling, KV cache, batching); see the sampling sketch after this list
- Practical fluency with the tradeoffs in fine-tuning vs. RAG vs. prompting
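As a preview of the inference-time code we will read, here is a hedged sketch of temperature plus top-k sampling in plain NumPy. The logits array is made up for illustration; in a real model it comes from the transformer's final layer for the current prefix.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Sample a token id from raw logits with temperature and optional top-k."""
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    if top_k is not None:
        # Mask everything outside the k largest logits.
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits >= cutoff, logits, -np.inf)
    # Softmax (subtract the max for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

fake_logits = [2.0, 0.5, -1.0, 0.1]   # hypothetical scores for a 4-token vocab
print(sample_next_token(fake_logits, temperature=0.8, top_k=2))
```

Lower temperatures sharpen the distribution toward the highest-scoring token; top-k simply zeroes out everything outside the k most likely tokens before sampling.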
Logistics
- Slides land at llm-engg.github.io/llms-may-26/slides/
- Assignments are tracked on the main course site