Welcome

Your journey: From concept to code - A roadmap showing the path from Course Intro & Setup through Data Prep & Tokenization, Model Architecture, Training Loop, to Build & Deploy Your LLM — Your journey: From concept to code

ChatGPT, Claude, Gemini — these tools feel like magic. You type a question, and they respond with human-like text. But how do they actually work?

Most explanations fall into two camps: either "it's a neural network" (too vague) or research papers full of equations (too dense). This tutorial takes a different approach.

The best way to understand something is to build it.

Who This Is For

This tutorial is for developers who want to truly understand LLMs — not just use APIs, but know what's happening inside.

You've used ChatGPT and wondered "how does this actually work?"
You've heard terms like "transformer" and "attention" but they're still fuzzy
You want to understand AI deeply, not just superficially
You learn best by building, not just reading

What Makes This Different

Instead of explaining transformers abstractly, we'll build a working LLM together. A small one — but using the same core architecture that powers ChatGPT.

Typical Tutorial vs This Tutorial: Left side shows a frustrated person surrounded by abstract math equations like Q × K^T × V with caption 'Read and forget'. Right side shows a person building a robot with labeled parts (Tokenizer, Attention, Output) with caption 'Build and understand'. — Read and forget vs. Build and understand

By the end, you won't just know what a transformer is — you'll know why each piece exists, because you'll have built each piece yourself.

Vibe Coding Friendly

Got access to Claude Code, Cursor, or similar AI coding tools? This tutorial is designed for you.

Copy-paste friendly — all code blocks are ready to run
Ask your AI — "explain this attention code" or "what if I change the embedding size?"
Experiment freely — break things, ask why, fix them

What You'll Build

We'll build a calculator that understands English — using the same transformer architecture that powers ChatGPT:

Input:  "two plus three"    →  Output: "five"
Input:  "seven times eight"  →  Output: "fifty six"
Input:  "nineteen minus four" →  Output: "fifteen"

Sounds simple? It requires the full transformer architecture: tokenization, embeddings, attention, training, generation — everything that makes ChatGPT work.

This is not fine-tuning. You'll write the transformer architecture from line 1. No pre-trained weights. No shortcuts. Just you, Python, and PyTorch.

How It Compares to GPT-4

A friendly robot presenting a completed LLM project. The robot gives a thumbs up next to a box labeled 'YOUR LLM' that transforms 'two plus three' into 'five'. A chalkboard behind shows the comparison: Your Model (100K params, 20 min) vs GPT-4 (2T params, months) with 'Same Core Principles' in the middle. — Your finished model: same architecture as GPT-4, just smaller

Spec	Our Model	GPT-4
Parameters	~50,000	~1.7 trillion
Embedding dim	64	12,288
Attention heads	4	~96
Layers	2	~96
Vocabulary	~30 words	~100,000 tokens
Training time	5-10 min	Months

Same recipe, smaller cake. Your calculator uses the same transformer architecture — attention, embeddings, layer normalization, everything. We just dial down the size because our problem is simpler.

Why a Calculator?

Why use a neural network for math that a $2 calculator does better? Because math has objectively right and wrong answers — making it perfect for learning:

Instant feedback — if the model says 2+2=5, you know it's wrong
Small vocabulary — only ~30 words vs GPT's 100,000 tokens
Trains in minutes — not days or weeks like real LLMs
Runs anywhere — your laptop is enough, no GPU required

Scope & Limitations

Our calculator is intentionally limited — these constraints make learning easier:

✓ Can Do	✗ Cannot Do
Numbers 0-99	Numbers above 99
Single operations (+, -, ×, ÷)	Chained operations
English words in/out	Digits or symbols

✓ Can Do

Numbers 0-99

✗ Cannot Do

Numbers above 99

✓ Can Do

Single operations (+, -, ×, ÷)

✗ Cannot Do

Chained operations

✓ Can Do

English words in/out

✗ Cannot Do

Digits or symbols

Try It Right Now

Here's the finished model running on Hugging Face. This is exactly what you'll build:

Try the live demo →

Helpful?

Overview The Journey