Build Your First LLM from Scratch · Part 2 · Section 4 of 7
Model Specifications
Here's how our tiny model compares to GPT-4 (whose figures are unofficial estimates, since OpenAI has not published them):
| Spec | Our Model | GPT-4 |
|---|---|---|
| Parameters | ~1-2 million | ~1.7 trillion |
| Embedding dim | 64-128 | 12,288 |
| Layers | 2-4 | ~96 |
| Vocabulary | ~30 | ~100,000 |
| Max sequence length | ~10-20 tokens | 32k-128k tokens |
| Training time | 5-10 min | Months |
Our model is roughly a million times smaller, but it uses the same transformer architecture.
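You can sanity-check the "~1-2 million parameters" figure with a back-of-the-envelope count. The sketch below assumes a standard decoder-only transformer (attention + 4x-wide MLP per layer, learned positional embeddings) at the upper end of the table's ranges; the exact layer layout of our model may differ slightly.

```python
def count_params(d_model=128, n_layers=4, vocab=30, max_seq=20, ff_mult=4):
    """Rough parameter count for a decoder-only transformer.
    Assumed layout: QKV + output projections, a ff_mult-wide MLP,
    two layernorms per layer, plus embeddings, a final layernorm,
    and an untied output head."""
    d_ff = ff_mult * d_model

    # Per-layer weights and biases
    attn = 3 * (d_model * d_model + d_model)        # Q, K, V projections
    attn += d_model * d_model + d_model             # output projection
    mlp = d_model * d_ff + d_ff                     # up projection
    mlp += d_ff * d_model + d_model                 # down projection
    norms = 2 * (2 * d_model)                       # two layernorms (scale + bias)
    per_layer = attn + mlp + norms

    # Embeddings, final norm, and output head
    embeddings = vocab * d_model + max_seq * d_model
    final_norm = 2 * d_model
    head = d_model * vocab + vocab                  # untied logit projection

    return n_layers * per_layer + embeddings + final_norm + head

total = count_params()
print(f"{total:,} parameters")  # on the order of a million
```

Even at the largest settings in the table, the count lands just under a million; the "~1-2 million" row leaves headroom for variations like wider MLPs or a larger character vocabulary.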