Build Your First LLM from Scratch · Part 2 · Section 6 of 7
What You'll Create
By the end of this series, you'll have built these files from scratch. The project has two main phases:
Phase 1: Training — Teaching the Robot
First, we build the factory and train it to understand math:
| File | What It Does |
|---|---|
| tokenizer.py | Converts words → numbers (the vocabulary) |
| embeddings.py | Converts numbers → rich vectors (meaning) |
| attention.py | Lets tokens "talk" to each other |
| transformer.py | Combines attention with processing layers |
| model.py | The complete factory — all pieces together |
| dataset.py | Generates thousands of math examples to learn from |
| train.py | The learning loop — practice until perfect |
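To make the tokenizer and dataset rows more concrete, here is a minimal sketch of how math examples might be generated and converted to numbers. The function names (`make_example`, `build_vocab`, `encode`, `decode`), the `2 + 3 = 5` text format, and the operand range are assumptions for illustration only; the tutorial's actual `dataset.py` and `tokenizer.py` may look quite different.

```python
import random


def make_example(rng, max_operand=9):
    """Return one addition problem as text, e.g. '2 + 3 = 5' (hypothetical format)."""
    a = rng.randint(0, max_operand)
    b = rng.randint(0, max_operand)
    return f"{a} + {b} = {a + b}"


def build_vocab(texts):
    """Assign an integer ID to every distinct whitespace-separated token."""
    tokens = sorted({tok for text in texts for tok in text.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(text, vocab):
    """Words -> numbers: look up each token's ID."""
    return [vocab[tok] for tok in text.split()]


def decode(ids, vocab):
    """Numbers -> words: invert the vocabulary and join the tokens."""
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    return " ".join(id_to_token[i] for i in ids)


# Seeded generator so the same examples are produced on every run.
rng = random.Random(0)
examples = [make_example(rng) for _ in range(1000)]
vocab = build_vocab(examples)
ids = encode(examples[0], vocab)
```

Encoding and then decoding an example should give back the original text, which is a quick sanity check worth running on any tokenizer before training on its output.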
Phase 2: Generation — Using the Trained Robot
Once trained, the factory can answer new questions:
| File | What It Does |
|---|---|
| generate.py | Takes input, runs through factory, outputs answer |
| app.py | Web interface so anyone can try your model |
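As a rough picture of what "takes input, runs through factory, outputs answer" can mean, the sketch below assumes the model produces its answer one token at a time and stops at a special end token, as most LLMs do. The `generate` signature, the `next_token` callback, and `toy_model` are hypothetical stand-ins, not the tutorial's `generate.py`.

```python
def generate(next_token, prompt_ids, end_id, max_new_tokens=10):
    """Append predicted tokens to the prompt until end_id or the length limit."""
    ids = list(prompt_ids)  # copy so the caller's prompt is not modified
    for _ in range(max_new_tokens):
        token = next_token(ids)  # ask the "factory" for the next token ID
        if token == end_id:
            break
        ids.append(token)
    return ids


def toy_model(ids):
    """Stand-in for a trained model: always predicts the last ID plus one."""
    return ids[-1] + 1


# With end_id=5 the loop adds 3 and 4, then stops when the model predicts 5.
answer = generate(toy_model, [1, 2], end_id=5)
```

The `max_new_tokens` limit is a safety net: if the model never predicts the end token, the loop still terminates.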
Training happens once (about 10 minutes on a laptop). After that, generation is near-instant: the trained factory can answer new math questions without being retrained.
Source Code: The complete code for this tutorial is available at github.com/slahiri/small_calculator_model