Build Your First LLM from Scratch
Part 2 · Section 4 of 7

Model Specifications

Here's how our tiny model compares to GPT-4:

| Spec | Our Model | GPT-4 |
|---|---|---|
| Parameters | ~1-2 million | ~1.7 trillion |
| Embedding dim | 64-128 | 12,288 |
| Layers | 2-4 | ~96 |
| Vocabulary | ~30 | ~100,000 |
| Max sequence length | ~10-20 tokens | 32k-128k tokens |
| Training time | 5-10 min | Months |

Our model is about a million times smaller—but it uses the exact same architecture!
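To see where that parameter count comes from, here is a rough back-of-the-envelope sketch. It assumes a standard GPT-style block (multi-head attention plus a 4x-wide MLP), ignores biases and layer norms, and ties the token embedding with the output head; the function name and the exact spec values plugged in are illustrative, not taken from the text above.

```python
def approx_params(vocab_size: int, d_model: int, n_layers: int, max_seq_len: int) -> int:
    """Rough parameter count for a small GPT-style transformer (sketch)."""
    embed = vocab_size * d_model           # token embedding (tied with output head)
    pos = max_seq_len * d_model            # learned position embeddings
    attn = 4 * d_model * d_model           # Q, K, V, and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection, 4x hidden width
    return embed + pos + n_layers * (attn + mlp)

# Plugging in the upper end of our spec table:
print(approx_params(vocab_size=30, d_model=128, n_layers=4, max_seq_len=16))
# → 792320, i.e. roughly 0.8M — the same ballpark as the ~1-2 million figure
```

Almost all of the parameters live in the attention and MLP weight matrices, which scale with d_model squared; the tiny ~30-token vocabulary contributes almost nothing, whereas in GPT-4-class models the embedding table alone is over a billion parameters.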
