Build Your First LLM from Scratch
Part 3 · Section 7 of 13

Why PyTorch (Not Just NumPy)?

NumPy is great for array math, but neural networks need two things NumPy can't do:

  1. Automatic gradients — Training requires computing derivatives with respect to millions of parameters. PyTorch tracks operations as you run them and computes gradients automatically ("autograd"). In NumPy, you'd have to write the calculus by hand.
  2. GPU acceleration — A GPU has thousands of cores that can multiply matrices in parallel, so training that takes minutes on a CPU can take seconds on a GPU. PyTorch moves tensors to the GPU with one line: tensor.to("cuda").
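
Both points above fit in a few lines. Here is a minimal sketch (assuming PyTorch is installed): autograd computes a derivative for us, and to() moves a tensor to the GPU when one is available.

```python
import torch

# 1. Automatic gradients: define y = x^2 and let autograd find dy/dx.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()          # populates x.grad with dy/dx = 2x
print(x.grad)         # tensor(6.)

# 2. GPU acceleration: fall back to CPU if no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.ones(2, 2).to(device)
print(t.device)
```

The requires_grad=True flag is what tells PyTorch to record operations on x; without it, backward() would have nothing to differentiate.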

For our tiny calculator model, CPU is fine—training takes only minutes either way. But learning PyTorch's conventions now prepares you for larger models, where a GPU is essential.
