Build Your First LLM from Scratch
Part 2 · Section 4 of 7

Model Specifications

Here's how our tiny model compares to GPT-4:

| Spec | Our Model | GPT-4 |
|---|---|---|
| Parameters | ~1-2 million | ~1.7 trillion |
| Embedding dim | 64-128 | 12,288 |
| Layers | 2-4 | ~96 |
| Vocabulary | ~30 | ~100,000 |
| Max sequence length | ~10-20 tokens | 32k-128k tokens |
| Training time | 5-10 min | Months |

Our model is about a million times smaller—but it uses the exact same architecture!
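To see where that parameter count comes from, here is a rough back-of-the-envelope sketch. It assumes a standard GPT-style block (multi-head attention plus a 4x-wide MLP), ignores biases and layer norms, and ties the token embedding with the output head; the function name and the exact spec values plugged in are illustrative, not taken from the text above.

```python
def approx_params(vocab_size: int, d_model: int, n_layers: int, max_seq_len: int) -> int:
    """Rough parameter count for a small GPT-style transformer (sketch)."""
    embed = vocab_size * d_model           # token embedding (tied with output head)
    pos = max_seq_len * d_model            # learned position embeddings
    attn = 4 * d_model * d_model           # Q, K, V, and output projections
    mlp = 2 * d_model * (4 * d_model)      # up- and down-projection, 4x hidden width
    return embed + pos + n_layers * (attn + mlp)

# Plugging in the upper end of our spec table:
print(approx_params(vocab_size=30, d_model=128, n_layers=4, max_seq_len=16))
# → 792320, i.e. roughly 0.8M — the same ballpark as the ~1-2 million figure
```

Almost all of the parameters live in the attention and MLP weight matrices, which scale with d_model squared; the tiny ~30-token vocabulary contributes almost nothing, whereas in GPT-4-class models the embedding table alone is over a billion parameters.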
