Part 2 of 8

Part 2: The Project

Understand why we're building a calculator and what the final result looks like

Why a Calculator?

We could teach LLM concepts with any task—chatbot, code generation, translation. But a calculator is perfect for learning. Here's why:

| Criteria | Calculator | Text-to-SQL | Chatbot |
| --- | --- | --- | --- |
| Vocabulary | ~30 words | ~10,000 | ~50,000 |
| Training time | 5-10 min | 2-3 hours | Days |
| Data generation | Trivial | Need dataset | Complex |
| Verify correctness | Easy | Medium | Hard |
Key insight: Same concepts, 100x faster iteration. You'll learn tokenization, attention, and transformers—just on a smaller scale.

You might want to jump straight to code generation or chat. But:

  1. Complexity hides understanding — With a complex task, you can't tell if issues are from your model or your data
  2. Training time kills iteration — Real LLMs take days/weeks to train. Our calculator trains in minutes.
  3. The concepts are identical — Tokenization, attention, transformers—it's all the same, just smaller

Once you understand the calculator, scaling up is straightforward.

The Task

Our model will convert English math phrases into answers written out as English number words:

"two plus three"         → "five"
"seven minus four"       → "three"
"six times eight"        → "forty eight"
"twenty divided by five" → "four"

Notice: both input and output are words, not digits. The model never sees "2 + 3 = 5"—it only sees "two plus three" and learns to predict "five".

The Dataset

Unlike real LLMs that train on internet text (Wikipedia, books, websites), we'll generate our own dataset programmatically. This is one of the beauties of our calculator project—we don't need to find or download any corpus.

# We write code to generate training examples:
import random

def generate_example():
    a = random.randint(0, 99)  # e.g., 23
    b = random.randint(0, 99)  # e.g., 15
    result = a + b             # e.g., 38

    # Convert to words (we'll write this helper function):
    # to_words(23) → "twenty three"
    # to_words(38) → "thirty eight"
    return to_words(a) + " plus " + to_words(b), to_words(result)

# Generate 1000 examples in seconds:
# examples = [generate_example() for _ in range(1000)]
Why this matters: Real LLM training requires terabytes of carefully curated text data. Our approach lets us create unlimited, perfectly labeled training examples instantly.
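
To make that sketch concrete, here is one possible version of the to_words helper for numbers 0-99. It follows the comments in the code above, but it's only an illustration; the real implementation arrives later in the series.

# One possible to_words helper for 0-99 (illustrative sketch).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def to_words(n):
    """Convert an integer 0-99 into space-separated English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]

print(to_words(23))  # "twenty three"
print(to_words(40))  # "forty"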

Bonus insight: Our random.randint(0, 99) creates a uniform distribution—every number appears equally. Real language follows a Zipfian (power law) distribution: "1+1" appears millions of times while "97+6" appears once. Our uniform data actually makes training easier and prevents the model from memorizing common cases.
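
For intuition, here is a small illustrative comparison of the two sampling schemes. The 1/(n+1) weights are just a crude stand-in for a Zipfian distribution, not part of our dataset code.

# Uniform sampling (what we use) vs. a rough Zipfian-style sampler.
import random

numbers = list(range(100))

# Uniform: every operand is equally likely.
uniform_sample = [random.choice(numbers) for _ in range(5)]

# Zipfian-ish: weight each number n by 1/(n+1), so small numbers dominate,
# a crude stand-in for how often they appear in real text.
weights = [1 / (n + 1) for n in numbers]
zipf_sample = random.choices(numbers, weights=weights, k=5)

print("uniform:", uniform_sample)   # spread across 0-99
print("zipf-ish:", zipf_sample)     # mostly small numbers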

Our Vocabulary

| Category | Tokens |
| --- | --- |
| Numbers 0-19 | zero, one, two, ... nineteen |
| Tens | twenty, thirty, forty, ... ninety |
| Operations | plus, minus, times, divided, by |
| Special | [PAD], [START], [END] |

That's roughly 30 tokens total. Compare this to GPT-4's ~100,000 tokens!

Tokenization choice: We'll use whole-word tokenization (splitting by spaces). Real LLMs use sub-word tokenization (BPE/WordPiece), where "ninety" might split into "nine" + "##ty". Our approach is simpler and works perfectly for our fixed vocabulary.
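
As a rough sketch of what whole-word tokenization looks like in code (the vocabulary here is trimmed for brevity, and the real tokenizer.py comes later in the series):

# Illustrative whole-word tokenizer: split on spaces, look up IDs in a fixed vocab.
VOCAB = ["[PAD]", "[START]", "[END]", "zero", "one", "two", "three", "four",
         "five", "plus", "minus"]  # trimmed for brevity
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}

def encode(text):
    return [TOKEN_TO_ID[word] for word in text.split()]

def decode(ids):
    return " ".join(ID_TO_TOKEN[i] for i in ids)

print(encode("two plus three"))           # [5, 9, 6]
print(decode(encode("two plus three")))   # "two plus three"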

We'll generate 500-1000 examples covering addition, subtraction, multiplication, and division with numbers 0-99.

Model Specifications

Here's how our tiny model compares to GPT-4:

| Spec | Our Model | GPT-4 |
| --- | --- | --- |
| Parameters | ~1-2 million | ~1.7 trillion |
| Embedding dim | 64-128 | 12,288 |
| Layers | 2-4 | ~96 |
| Vocabulary | ~30 | ~100,000 |
| Max sequence length | ~10-20 tokens | 32k-128k tokens |
| Training time | 5-10 min | Months |

Our model is about a million times smaller—but it uses the exact same architecture!
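
As a sanity check on that parameter figure, here is a rough back-of-the-envelope count. It assumes d_model=128, 4 layers, a 4x feed-forward expansion, and ignores biases and layer norms; the real count depends on the final hyperparameters.

# Rough parameter count for the tiny model (illustrative assumptions).
d_model, n_layers, vocab, seq_len = 128, 4, 30, 20

embeddings = vocab * d_model + seq_len * d_model   # token + positional embeddings
attention  = 4 * d_model * d_model                 # Q, K, V, and output projections
ffn        = 2 * d_model * (4 * d_model)           # two linear layers, 4x expansion
per_layer  = attention + ffn
output     = d_model * vocab                       # final projection to the vocabulary

total = embeddings + n_layers * per_layer + output
print(f"~{total:,} parameters")   # ~796,672: the same order of magnitude as ~1-2M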

What It Can & Cannot Do

Can Do

  • Basic arithmetic with numbers 0-99
  • Single operations (one plus, minus, times, or divided by)
  • Output results as English words

Cannot Do

  • Numbers above 99
  • Chained operations ("two plus three minus one")
  • Decimals or fractions
  • Parentheses or order of operations

These limitations are by design. We're building a learning tool, not a production calculator. The concepts transfer directly to larger models.

What You'll Create

By the end of this series, you'll have built these files from scratch:

| File | Description |
| --- | --- |
| tokenizer.py | Converts text to token IDs |
| embeddings.py | Converts token IDs to vectors |
| attention.py | The attention mechanism |
| transformer.py | Transformer blocks |
| model.py | Complete model architecture |
| dataset.py | Calculator dataset generator |
| train.py | Training loop |
| generate.py | Text generation |
| app.py | Gradio demo for Hugging Face |

Source Code: The complete code for this tutorial is available at github.com/slahiri/small_calculator_model

End-to-End Preview

Here's what the final result looks like:

from model import CalculatorLLM

model = CalculatorLLM.load("calculator-llm.pt")

model.calculate("two plus three")      # → "five"
model.calculate("nine times nine")     # → "eighty one"
model.calculate("fifty minus twelve")  # → "thirty eight"

The .calculate() method is a convenience wrapper. Under the hood, it's still doing autoregressive generation: starting from "two plus three", appending one predicted token at a time, stopping at [END], and returning just the answer portion.
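
A hedged sketch of what such a wrapper could look like is below, reusing the toy encode/decode helpers from the tokenizer sketch above. The predict_next callable stands in for the trained model, and the real generate.py may differ.

# Illustrative autoregressive loop behind a calculate()-style wrapper.
def autoregressive_calculate(prompt, predict_next, max_new_tokens=10):
    ids = encode("[START] " + prompt)
    answer_ids = []
    for _ in range(max_new_tokens):
        next_id = predict_next(ids)       # model's best guess for the next token
        if decode([next_id]) == "[END]":  # stop at the end marker
            break
        ids.append(next_id)               # feed the prediction back in
        answer_ids.append(next_id)
    return decode(answer_ids)             # return only the answer portion

# Toy stand-in for the trained model: always answers "five", then [END].
canned = iter([8, 2])                     # IDs of "five" and "[END]" in the toy vocab
print(autoregressive_calculate("two plus three", lambda ids: next(canned)))  # "five"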

Try It Live

At the end of this series, we'll deploy our model to Hugging Face Spaces—a free platform to host ML demos. You'll be able to share your working calculator LLM with anyone via a simple URL.

Your friends can test your model in their browser—no Python or setup required. Just type "seven times eight" and watch your model respond "fifty six".
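
As a preview, a minimal app.py for such a demo could look roughly like this. The model hookup is stubbed out so the sketch runs on its own; the real app.py comes at the end of the series.

# Hypothetical minimal Gradio app for the calculator LLM (model call stubbed out).
import gradio as gr

def answer(prompt):
    # In the real app.py we'd load the trained model once and call its
    # calculate() method; a canned reply keeps this sketch self-contained.
    return "fifty six" if prompt.strip() == "seven times eight" else "(model not loaded in this sketch)"

demo = gr.Interface(fn=answer, inputs="text", outputs="text",
                    title="Calculator LLM",
                    description='Try a phrase like "seven times eight".')

if __name__ == "__main__":
    demo.launch()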