Model Configurator

Now that you understand how each component works, you can experiment with different configurations. Use this interactive tool to customize your model:

With the default configuration, the tool reports:

  • Parameters: 104K
  • Training: 111K
  • Accuracy: ~96%
  • CPU Time: ~2 min (estimated on an Apple M1 CPU)

Adjust the sliders in the Model Architecture and Training Parameters sections to see how these numbers change.
Generated Config

config.json

```json
{
  "model": {
    "vocab_size": 36,
    "embed_dim": 64,
    "num_heads": 4,
    "num_layers": 2,
    "ff_dim": 256,
    "max_seq_len": 16,
    "dropout": 0.1
  },
  "training": {
    "batch_size": 64,
    "learning_rate": 0.001,
    "num_epochs": 100,
    "train_multiplier": 10,
    "test_size": 500
  }
}
```
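Before training with a modified config, it can be worth sanity-checking a couple of structural constraints. This short sketch assumes the JSON layout shown above; the checks themselves are illustrative good practice, not rules enforced by the configurator:

```python
import json

# The config from above, inlined for a self-contained example;
# in practice you would load it from calculator_llm_config.json.
config = json.loads("""
{
  "model": {"vocab_size": 36, "embed_dim": 64, "num_heads": 4,
            "num_layers": 2, "ff_dim": 256, "max_seq_len": 16,
            "dropout": 0.1},
  "training": {"batch_size": 64, "learning_rate": 0.001,
               "num_epochs": 100, "train_multiplier": 10,
               "test_size": 500}
}
""")

m = config["model"]

# Each attention head works on embed_dim / num_heads dimensions,
# so embed_dim must divide evenly by num_heads.
assert m["embed_dim"] % m["num_heads"] == 0, "embed_dim must be divisible by num_heads"

# The feed-forward layer is conventionally 4x the embedding size.
assert m["ff_dim"] == 4 * m["embed_dim"], "ff_dim is usually 4 * embed_dim"

print("config OK:", m["embed_dim"] // m["num_heads"], "dims per head")
```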

Understanding the Parameters

Model Architecture:

  • Embedding Dimension: Size of token vectors. Larger values give the model more capacity to learn patterns, but increase memory and training time.
  • Attention Heads: Number of parallel attention operations. More heads let the model focus on different aspects of the input simultaneously.
  • Transformer Layers: Number of stacked attention+feed-forward blocks. Deeper models can learn more complex patterns.
  • Feed-Forward Dimension: Size of the hidden layer in the feed-forward network. Typically 4x the embedding dimension.
  • Max Sequence Length: Maximum number of tokens. Our longest equation ("ninety nine times one equals ninety nine") is about 10 tokens.
  • Dropout: Regularization to prevent overfitting. Set to 0 for small models/datasets.
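To see how these architecture choices translate into model size, here is a rough parameter-count estimate. The exact total depends on implementation details (bias terms, weight tying, positional-embedding scheme), so treat this as a ballpark sketch rather than the configurator's exact formula:

```python
def estimate_params(vocab_size=36, embed_dim=64, num_layers=2,
                    ff_dim=256, max_seq_len=16):
    """Rough transformer parameter count. Assumes biased linear layers,
    learned positional embeddings, and an untied output head; a real
    implementation may differ slightly. Note that num_heads splits
    embed_dim across heads but does not change the parameter count."""
    emb = vocab_size * embed_dim + max_seq_len * embed_dim   # token + positional
    attn = 4 * (embed_dim * embed_dim + embed_dim)           # Q, K, V, output projections
    ffn = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)
    norms = 2 * (2 * embed_dim)                              # two LayerNorms per block
    block = attn + ffn + norms
    head = embed_dim * vocab_size + vocab_size               # final projection to vocab
    final_norm = 2 * embed_dim
    return emb + num_layers * block + final_norm + head

print(estimate_params())  # ~106K, in the same ballpark as the configurator's 104K
```

Doubling `embed_dim` roughly quadruples the per-block cost (the attention and feed-forward terms are quadratic in it), which is why the embedding dimension dominates training time.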

Training Parameters:

  • Batch Size: Number of examples processed together. Larger batches are faster but need more memory.
  • Learning Rate: How big each weight update is. Too high = unstable training, too low = slow learning.
  • Epochs: Number of passes through the entire dataset. More epochs = better learning (up to a point).
  • Data Multiplier: How many times to repeat training data. More exposure helps the model learn.
  • Test Examples: Number of examples to evaluate accuracy on after training.
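These knobs combine to determine how many optimizer steps training actually takes. A quick sketch of the arithmetic (the base dataset size here is a hypothetical placeholder, not the tutorial's actual number):

```python
# How the training knobs combine into total optimizer steps.
num_unique_examples = 1000     # hypothetical -- use your real dataset size
train_multiplier = 10          # each example repeated 10x per epoch
batch_size = 64
num_epochs = 100

examples_per_epoch = num_unique_examples * train_multiplier
steps_per_epoch = -(-examples_per_epoch // batch_size)   # ceiling division
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, "steps/epoch,", total_steps, "total steps")
```

Doubling the batch size halves the step count per epoch (each step just processes more examples), while raising the data multiplier or epoch count grows it linearly.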

Using Your Config

After downloading your config, place the calculator_llm_config.json file in your project and update the training script to load both model and training parameters:

```python
import json

# Load your custom config
with open("calculator_llm_config.json") as f:
    config = json.load(f)

model_config = config["model"]
train_config = config["training"]

# Use in model creation
model = CalculatorLLM(
    vocab_size=model_config["vocab_size"],
    embed_dim=model_config["embed_dim"],
    num_heads=model_config["num_heads"],
    num_layers=model_config["num_layers"],
    ff_dim=model_config["ff_dim"],
    max_seq_len=model_config["max_seq_len"],
    dropout=model_config["dropout"],
)

# Use in training
train(
    num_epochs=train_config["num_epochs"],
    batch_size=train_config["batch_size"],
    learning_rate=train_config["learning_rate"],
)
```
Experiment! Try different configurations and see how they affect training time and accuracy. Smaller models train faster but may be less accurate.