Build Your First LLM from Scratch · Part 3 · Section 7 of 13
Why PyTorch (Not Just NumPy)?
NumPy is great for array math, but neural networks need two things NumPy can't do:
- Automatic gradients — Training requires computing derivatives of millions of parameters. PyTorch tracks operations and computes gradients automatically ("autograd"). In NumPy, you'd write the calculus by hand.
- GPU acceleration — A GPU has thousands of cores that can multiply matrices in parallel. What takes minutes on a CPU can take seconds on a GPU. PyTorch moves tensors to the GPU with one line:
tensor.to("cuda")
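To make the autograd point concrete, here is a minimal sketch of what "PyTorch tracks operations and computes gradients automatically" looks like in practice. The variable names are illustrative; the APIs (`requires_grad`, `backward()`, `.grad`) are standard PyTorch.

```python
import torch

# Autograd in miniature: mark a tensor as requiring gradients,
# build a computation from it, and call backward() to get dy/dx.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # y = x^2 + 2x; PyTorch records these ops

y.backward()         # computes dy/dx = 2x + 2 automatically

print(x.grad)        # at x = 3, dy/dx = 8; no hand-written calculus
```

In NumPy you would have to derive `2x + 2` yourself and code it up; with millions of parameters, that is exactly the bookkeeping autograd eliminates.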
For our tiny calculator, CPU is fine—training takes minutes either way. But understanding PyTorch conventions prepares you for larger models where GPU is essential.
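Since our code should run unchanged whether or not a GPU is present, a common device-agnostic pattern is worth sketching here (the variable names are illustrative; `torch.cuda.is_available()` and `.to(device)` are standard PyTorch):

```python
import torch

# Pick the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

t = torch.randn(2, 3)  # created on CPU by default
t = t.to(device)       # moves to GPU if available; harmless on CPU

print(t.device)
```

Writing code this way means the same script trains our tiny calculator on a laptop CPU and a larger model on a GPU machine without modification.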