Build Your First LLM from Scratch, Part 1 · Section 8 of 9

Step 6: Generation

[Figure: probability scores for each candidate word; the highest-probability word is selected as output]

We now have probabilities for every word. How do we pick the final answer? There are a few strategies:

Strategy 1: Greedy (pick the highest)

The simplest approach: always pick the word with the highest probability.

"five"  → 94.2%  ← Pick this one!
"four"  → 2.1%
"six"   → 1.8%
...

Output: "five"

This is called greedy decoding. It's deterministic: the same input always gives the same output. That's perfect for math, where there's only one right answer.
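As a minimal sketch, greedy decoding is just an argmax over the probabilities (the `probs` dictionary and `greedy_decode` name here are illustrative, not from a real model):

```python
# Toy probability distribution over next-token candidates,
# using the values from the example above.
probs = {"five": 0.942, "four": 0.021, "six": 0.018}

def greedy_decode(probs):
    # Always return the highest-probability token -- deterministic.
    return max(probs, key=probs.get)

print(greedy_decode(probs))  # five
```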

Strategy 2: Sampling (add randomness)

Instead of always picking the top word, we randomly choose based on the probabilities. Higher probability = more likely to be chosen, but not guaranteed.

Run 1: "five"  (94.2% chance → picked!)
Run 2: "five"  (94.2% chance → picked!)
Run 3: "four"  (2.1% chance  → lucky pick!)
Run 4: "five"  (94.2% chance → picked!)

This adds variety. When writing a story, you don't want the same words every time. Sampling makes the model more creative.
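A minimal sketch of sampling, using Python's standard library (`sample_decode` is a hypothetical helper name): each token is drawn with probability proportional to its score, so "five" wins most runs but "four" occasionally slips through.

```python
import random

probs = {"five": 0.942, "four": 0.021, "six": 0.018}

def sample_decode(probs, rng=random):
    tokens = list(probs)
    weights = list(probs.values())
    # random.choices picks one token, weighted by the probabilities.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: usually "five", occasionally something else.
for _ in range(4):
    print(sample_decode(probs))
```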

Strategy 3: Temperature (control randomness)

We can adjust how "confident" the model is using a parameter called temperature:

  • Low temperature (0.1): Makes high probabilities even higher. Model becomes very confident, less creative.
  • Temperature = 1: Use probabilities as-is.
  • High temperature (2.0): Flattens probabilities. Model becomes more random, more creative.
Original:     "five" 94.2%, "four" 2.1%, "six" 1.8%
Low temp:     "five" 99.9%, "four" 0.05%, "six" 0.03%  (almost certain)
High temp:    "five" 60%, "four" 15%, "six" 12%        (more random)
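The numbers above can be reproduced with a small sketch: raise each probability to the power 1/T and renormalize, which is equivalent to dividing the logits by T before the softmax (the `apply_temperature` name is illustrative):

```python
import math

def apply_temperature(probs, temperature):
    # Exponentiate log-probabilities by 1/T, then renormalize.
    # T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: s / total for t, s in scaled.items()}

probs = {"five": 0.942, "four": 0.021, "six": 0.018}
print(apply_temperature(probs, 0.1))  # "five" climbs toward 100%
print(apply_temperature(probs, 2.0))  # distribution flattens out
```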

What Do Real Models Use?

Model/Use Case              Strategy                       Why
ChatGPT (default)           Sampling + temperature ~0.7    Balanced creativity and coherence
Code generation (Copilot)   Low temperature ~0.2           Code needs to be precise and correct
Creative writing            Higher temperature ~1.0+       More surprising and varied outputs
Math/Reasoning              Greedy or very low temp        Only one right answer
Our calculator              Greedy                         Math has no room for creativity!

For our calculator, we'll use greedy decoding (always pick the highest). Math has right and wrong answers—we don't want creativity here! With high temperature, even GPT-4 might confidently answer "2+2=5" just for variety. Deterministic tasks need deterministic settings.

And that's it! The model outputs "five", and we've successfully computed "two plus three" = "five".
