At Scale
| Model | Vocab Size | Embed Dim | Embedding Parameters |
|---|---|---|---|
| Our Calculator | 36 | 64 | 2,304 |
| GPT-2 | 50,257 | 768 | 38.6 million |
| GPT-3 | 50,257 | 12,288 | 617 million |
| GPT-4 | ~100,000 | ~12,288 | ~1.2 billion |
GPT-4's embedding layer alone (~1.2 billion parameters) is roughly 500,000× larger than our entire embedding layer (1.2 billion ÷ 2,304 ≈ 520,000). Same concept, vastly different scale.
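To make the table's arithmetic concrete, here is a minimal Python sketch that reproduces each count as vocab_size × embed_dim, then confirms our own model's count with PyTorch (assuming the `nn.Embedding` layer used elsewhere in this series; the GPT-4 figures are public estimates, not confirmed values):

```python
import torch.nn as nn

# Each embedding layer's parameter count is simply vocab_size * embed_dim.
models = {
    "Our Calculator": (36, 64),
    "GPT-2": (50_257, 768),
    "GPT-3": (50_257, 12_288),
    "GPT-4 (estimated)": (100_000, 12_288),  # estimated, not official
}

for name, (vocab_size, embed_dim) in models.items():
    print(f"{name}: {vocab_size * embed_dim:,} parameters")

# For our own model, the same number falls out of PyTorch directly:
emb = nn.Embedding(num_embeddings=36, embedding_dim=64)
print(emb.weight.numel())  # 2304
```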