Lesson 01 — Intro to LLMs

What is a language model?

A function that, given a prefix of tokens, returns a probability distribution over the next token in the vocabulary.
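To make the definition concrete, here is a minimal sketch of the simplest possible language model, a bigram count model. (The corpus and function names are illustrative, not part of the lesson; real LLMs learn this mapping with a neural network instead of counting.)

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Toy LM: count, for each token, which tokens follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prefix):
    """Given a prefix, return P(next token | last token of prefix)."""
    c = counts[prefix[-1]]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
dist = next_token_distribution(model, ["the"])
# "the" is followed by "cat" twice and "mat" once in the corpus,
# so the distribution is {"cat": 2/3, "mat": 1/3}
```

A transformer plays exactly this role, but conditions on the whole prefix rather than only the last token.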

Why now?

  • Parallel compute (GPUs/TPUs) is cheap and abundant
  • The web gave us trillions of tokens
  • Attention parallelizes over the sequence during training, unlike recurrence

What we’ll build toward

  • A working mental model of how a transformer predicts the next token
  • Comfort reading inference-time code (sampling, KV cache, batching)
  • Practical fluency with the tradeoffs in fine-tuning vs. RAG vs. prompting
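As a first taste of the inference-time code mentioned above, here is a hedged sketch of temperature sampling, one of the simplest decoding strategies. (The logits and function name are illustrative; this is not the exact code used in the course.)

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=None):
    """Sample a token index from logits via a temperature-scaled softmax."""
    rng = rng or random.Random(0)
    # Lower temperature sharpens the distribution toward the argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

idx = sample_next([2.0, 1.0, 0.1], temperature=0.7)
```

At very low temperature the softmax collapses onto the largest logit, so sampling becomes greedy decoding; at high temperature the distribution flattens and outputs get more diverse.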

Logistics

  • Slides land at llm-engg.github.io/llms-may-26/slides/
  • Assignments are tracked on the main course site