Build Your First LLM from Scratch
Part 2 · Section 6 of 7

What You'll Create

By the end of this series, you'll have built these files from scratch. The project has two main phases:

Phase 1: Training — Teaching the Robot

First, we build the factory and train it to understand math:

File              What It Does
tokenizer.py      Converts words → numbers (the vocabulary)
embeddings.py     Converts numbers → rich vectors (meaning)
attention.py      Lets tokens "talk" to each other
transformer.py    Combines attention with processing layers
model.py          The complete factory: all pieces together
dataset.py        Generates thousands of math examples to learn from
train.py          The learning loop: practice until perfect
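
To make the roles of tokenizer.py and dataset.py more concrete, here is a minimal sketch of the two ideas. It assumes problems are written as space-separated strings such as "3 + 4 = 7" and are tokenized one word at a time. The names make_examples and WordTokenizer are hypothetical, chosen for illustration, and are not taken from the tutorial's repository.

```python
import random


def make_examples(n, max_operand=9, seed=0):
    """Generate n addition problems as strings like '3 + 4 = 7' (assumed format)."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    examples = []
    for _ in range(n):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        examples.append(f"{a} + {b} = {a + b}")
    return examples


class WordTokenizer:
    """Hypothetical word-level tokenizer: maps each word to an integer id and back."""

    def __init__(self, texts):
        # The vocabulary is every distinct whitespace-separated word in the data.
        words = sorted({w for t in texts for w in t.split()})
        self.word_to_id = {w: i for i, w in enumerate(words)}
        self.id_to_word = {i: w for w, i in self.word_to_id.items()}

    def encode(self, text):
        return [self.word_to_id[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)


examples = make_examples(1000)
tokenizer = WordTokenizer(examples)

ids = tokenizer.encode(examples[0])   # words -> numbers
restored = tokenizer.decode(ids)      # numbers -> words
```

Encoding and then decoding returns the original string, which is the property the later stages rely on: the model only ever sees the integer ids, and the tokenizer turns its output ids back into readable text.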

Phase 2: Generation — Using the Trained Robot

Once trained, the factory can answer new questions:

File           What It Does
generate.py    Takes input, runs through factory, outputs answer
app.py         Web interface so anyone can try your model
Training happens once and takes about 10 minutes on a laptop. After that, generation is near-instant: the trained factory is ready to answer new math questions!
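
The generation step can be pictured as a short loop: feed in the tokens so far, pick the most likely next token, append it, and repeat until an end token appears. The sketch below shows that loop using greedy decoding, which is an assumption here since the tutorial may sample differently. The function toy_scores is a hypothetical stand-in for a trained model, not the tutorial's model.py.

```python
def generate(next_token_scores, prompt_ids, end_id, max_new_tokens=8):
    """Greedy decoding: repeatedly append the highest-scoring next token id."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)  # one score per vocabulary id
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == end_id:  # stop once the end token is produced
            break
    return ids


def toy_scores(ids, vocab_size=5):
    """Hypothetical stand-in for a trained model: favors (last id + 1) mod vocab_size."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


result = generate(toy_scores, prompt_ids=[0, 1], end_id=4)
```

With a real trained model in place of toy_scores, the returned ids would be passed through the tokenizer's decode step to produce the answer text.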
Source Code: The complete code for this tutorial is available at github.com/slahiri/small_calculator_model