Build Your First LLM from Scratch
Part 3 · Section 7 of 13

Why PyTorch (Not Just NumPy)?

NumPy is great for array math, but neural networks need two things NumPy can't do:

  1. Automatic gradients — Training requires computing derivatives with respect to millions of parameters. PyTorch tracks operations as you run them and computes gradients automatically ("autograd"). In NumPy, you'd have to write the calculus by hand.
  2. GPU acceleration — A GPU has thousands of cores that can multiply matrices in parallel, so training that takes minutes on a CPU can take seconds on a GPU. PyTorch moves tensors to the GPU with one line: tensor.to("cuda").
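
Both points above fit in a few lines. Here is a minimal sketch (assuming PyTorch is installed): autograd computes a derivative for us, and to() moves a tensor to the GPU when one is available.

```python
import torch

# 1. Automatic gradients: define y = x^2 and let autograd find dy/dx.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()          # populates x.grad with dy/dx = 2x
print(x.grad)         # tensor(6.)

# 2. GPU acceleration: fall back to CPU if no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.ones(2, 2).to(device)
print(t.device)
```

The requires_grad=True flag is what tells PyTorch to record operations on x; without it, backward() would have nothing to differentiate.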

For our tiny calculator model, CPU is fine—training takes only minutes either way. But learning PyTorch's conventions now prepares you for larger models, where a GPU is essential.
