Lesson 01 — Intro to LLMs

What is a language model?

A function that, given a prefix of tokens, returns a probability distribution over the next token in the vocabulary.
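To make the definition concrete, here is a minimal sketch of the simplest possible language model, a bigram count model. (The corpus and function names are illustrative, not part of the lesson; real LLMs learn this mapping with a neural network instead of counting.)

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Toy LM: count, for each token, which tokens follow it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prefix):
    """Given a prefix, return P(next token | last token of prefix)."""
    c = counts[prefix[-1]]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
dist = next_token_distribution(model, ["the"])
# "the" is followed by "cat" twice and "mat" once in the corpus,
# so the distribution is {"cat": 2/3, "mat": 1/3}
```

A transformer plays exactly this role, but conditions on the whole prefix rather than only the last token.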

Why now?

  • Parallel compute (GPUs/TPUs) is cheap and abundant
  • The web gave us trillions of tokens
  • Attention parallelizes over the sequence during training, unlike recurrence

What we’ll build toward

  • A working mental model of how a transformer predicts the next token
  • Comfort reading inference-time code (sampling, KV cache, batching)
  • Practical fluency with the tradeoffs in fine-tuning vs. RAG vs. prompting
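As a first taste of the inference-time code mentioned above, here is a hedged sketch of temperature sampling, one of the simplest decoding strategies. (The logits and function name are illustrative; this is not the exact code used in the course.)

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=None):
    """Sample a token index from logits via a temperature-scaled softmax."""
    rng = rng or random.Random(0)
    # Lower temperature sharpens the distribution toward the argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

idx = sample_next([2.0, 1.0, 0.1], temperature=0.7)
```

At very low temperature the softmax collapses onto the largest logit, so sampling becomes greedy decoding; at high temperature the distribution flattens and outputs get more diverse.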

Logistics

  • Slides land at llm-engg.github.io/llms-may-26/slides/
  • Assignments are tracked on the main course site