Build Your First LLM from Scratch, Part 1 · Section 8 of 9

Step 6: Generation

[Figure: probability scores for each candidate word; the highest-probability word is selected as output]

We now have probabilities for every word. How do we pick the final answer? There are a few strategies:

Strategy 1: Greedy (pick the highest)

The simplest approach: always pick the word with the highest probability.

"five"  → 94.2%  ← Pick this one!
"four"  → 2.1%
"six"   → 1.8%
...

Output: "five"

This is called greedy decoding. It's deterministic: the same input always gives the same output. That's perfect for math, where there's only one right answer.
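As a minimal sketch, greedy decoding is just an argmax over the probabilities (the `probs` dictionary and `greedy_decode` name here are illustrative, not from a real model):

```python
# Toy probability distribution over next-token candidates,
# using the values from the example above.
probs = {"five": 0.942, "four": 0.021, "six": 0.018}

def greedy_decode(probs):
    # Always return the highest-probability token -- deterministic.
    return max(probs, key=probs.get)

print(greedy_decode(probs))  # five
```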

Strategy 2: Sampling (add randomness)

Instead of always picking the top word, we randomly choose based on the probabilities. Higher probability = more likely to be chosen, but not guaranteed.

Run 1: "five"  (94.2% chance → picked!)
Run 2: "five"  (94.2% chance → picked!)
Run 3: "four"  (2.1% chance  → lucky pick!)
Run 4: "five"  (94.2% chance → picked!)

This adds variety. When writing a story, you don't want the same words every time. Sampling makes the model more creative.
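A minimal sketch of sampling, using Python's standard library (`sample_decode` is a hypothetical helper name): each token is drawn with probability proportional to its score, so "five" wins most runs but "four" occasionally slips through.

```python
import random

probs = {"five": 0.942, "four": 0.021, "six": 0.018}

def sample_decode(probs, rng=random):
    tokens = list(probs)
    weights = list(probs.values())
    # random.choices picks one token, weighted by the probabilities.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: usually "five", occasionally something else.
for _ in range(4):
    print(sample_decode(probs))
```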

Strategy 3: Temperature (control randomness)

We can adjust how "confident" the model is using a parameter called temperature:

  • Low temperature (0.1): Makes high probabilities even higher. Model becomes very confident, less creative.
  • Temperature = 1: Use probabilities as-is.
  • High temperature (2.0): Flattens probabilities. Model becomes more random, more creative.
Original:     "five" 94.2%, "four" 2.1%, "six" 1.8%
Low temp:     "five" 99.9%, "four" 0.05%, "six" 0.03%  (almost certain)
High temp:    "five" 60%, "four" 15%, "six" 12%        (more random)
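The numbers above can be reproduced with a small sketch: raise each probability to the power 1/T and renormalize, which is equivalent to dividing the logits by T before the softmax (the `apply_temperature` name is illustrative):

```python
import math

def apply_temperature(probs, temperature):
    # Exponentiate log-probabilities by 1/T, then renormalize.
    # T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: s / total for t, s in scaled.items()}

probs = {"five": 0.942, "four": 0.021, "six": 0.018}
print(apply_temperature(probs, 0.1))  # "five" climbs toward 100%
print(apply_temperature(probs, 2.0))  # distribution flattens out
```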

What Do Real Models Use?

Model/Use Case              Strategy                       Why
ChatGPT (default)           Sampling + temperature ~0.7    Balanced creativity and coherence
Code generation (Copilot)   Low temperature ~0.2           Code needs to be precise and correct
Creative writing            Higher temperature ~1.0+       More surprising and varied outputs
Math/Reasoning              Greedy or very low temp        Only one right answer
Our calculator              Greedy                         Math has no room for creativity!

For our calculator, we'll use greedy decoding (always pick the highest). Math has right and wrong answers—we don't want creativity here! With high temperature, even GPT-4 might confidently answer "2+2=5" just for variety. Deterministic tasks need deterministic settings.

And that's it! The model outputs "five", and we've successfully computed "two plus three" = "five".
