Model Configurator

Now that you understand how each component works, you can experiment with different configurations. Use this interactive tool to customize your model:

With the default configuration, the tool reports:

  • Parameters: 104K
  • Training: 111K
  • Accuracy: ~96%
  • CPU Time: ~2 min (estimated on an Apple M1 CPU)

Adjust the sliders in the Model Architecture and Training Parameters sections to see how these numbers change.
Generated Config

config.json

```json
{
  "model": {
    "vocab_size": 36,
    "embed_dim": 64,
    "num_heads": 4,
    "num_layers": 2,
    "ff_dim": 256,
    "max_seq_len": 16,
    "dropout": 0.1
  },
  "training": {
    "batch_size": 64,
    "learning_rate": 0.001,
    "num_epochs": 100,
    "train_multiplier": 10,
    "test_size": 500
  }
}
```
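Before training with a modified config, it can be worth sanity-checking a couple of structural constraints. This short sketch assumes the JSON layout shown above; the checks themselves are illustrative good practice, not rules enforced by the configurator:

```python
import json

# The config from above, inlined for a self-contained example;
# in practice you would load it from calculator_llm_config.json.
config = json.loads("""
{
  "model": {"vocab_size": 36, "embed_dim": 64, "num_heads": 4,
            "num_layers": 2, "ff_dim": 256, "max_seq_len": 16,
            "dropout": 0.1},
  "training": {"batch_size": 64, "learning_rate": 0.001,
               "num_epochs": 100, "train_multiplier": 10,
               "test_size": 500}
}
""")

m = config["model"]

# Each attention head works on embed_dim / num_heads dimensions,
# so embed_dim must divide evenly by num_heads.
assert m["embed_dim"] % m["num_heads"] == 0, "embed_dim must be divisible by num_heads"

# The feed-forward layer is conventionally 4x the embedding size.
assert m["ff_dim"] == 4 * m["embed_dim"], "ff_dim is usually 4 * embed_dim"

print("config OK:", m["embed_dim"] // m["num_heads"], "dims per head")
```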

Understanding the Parameters

Model Architecture:

  • Embedding Dimension: Size of token vectors. Larger values give the model more capacity to learn patterns, but increase memory and training time.
  • Attention Heads: Number of parallel attention operations. More heads let the model focus on different aspects of the input simultaneously.
  • Transformer Layers: Number of stacked attention+feed-forward blocks. Deeper models can learn more complex patterns.
  • Feed-Forward Dimension: Size of the hidden layer in the feed-forward network. Typically 4x the embedding dimension.
  • Max Sequence Length: Maximum number of tokens. Our longest equation ("ninety nine times one equals ninety nine") is about 10 tokens.
  • Dropout: Regularization to prevent overfitting. Set to 0 for small models/datasets.
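To see how these architecture choices translate into model size, here is a rough parameter-count estimate. The exact total depends on implementation details (bias terms, weight tying, positional-embedding scheme), so treat this as a ballpark sketch rather than the configurator's exact formula:

```python
def estimate_params(vocab_size=36, embed_dim=64, num_layers=2,
                    ff_dim=256, max_seq_len=16):
    """Rough transformer parameter count. Assumes biased linear layers,
    learned positional embeddings, and an untied output head; a real
    implementation may differ slightly. Note that num_heads splits
    embed_dim across heads but does not change the parameter count."""
    emb = vocab_size * embed_dim + max_seq_len * embed_dim   # token + positional
    attn = 4 * (embed_dim * embed_dim + embed_dim)           # Q, K, V, output projections
    ffn = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)
    norms = 2 * (2 * embed_dim)                              # two LayerNorms per block
    block = attn + ffn + norms
    head = embed_dim * vocab_size + vocab_size               # final projection to vocab
    final_norm = 2 * embed_dim
    return emb + num_layers * block + final_norm + head

print(estimate_params())  # ~106K, in the same ballpark as the configurator's 104K
```

Doubling `embed_dim` roughly quadruples the per-block cost (the attention and feed-forward terms are quadratic in it), which is why the embedding dimension dominates training time.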

Training Parameters:

  • Batch Size: Number of examples processed together. Larger batches are faster but need more memory.
  • Learning Rate: How big each weight update is. Too high = unstable training, too low = slow learning.
  • Epochs: Number of passes through the entire dataset. More epochs = better learning (up to a point).
  • Data Multiplier: How many times to repeat training data. More exposure helps the model learn.
  • Test Examples: Number of examples to evaluate accuracy on after training.
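These knobs combine to determine how many optimizer steps training actually takes. A quick sketch of the arithmetic (the base dataset size here is a hypothetical placeholder, not the tutorial's actual number):

```python
# How the training knobs combine into total optimizer steps.
num_unique_examples = 1000     # hypothetical -- use your real dataset size
train_multiplier = 10          # each example repeated 10x per epoch
batch_size = 64
num_epochs = 100

examples_per_epoch = num_unique_examples * train_multiplier
steps_per_epoch = -(-examples_per_epoch // batch_size)   # ceiling division
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, "steps/epoch,", total_steps, "total steps")
```

Doubling the batch size halves the step count per epoch (each step just processes more examples), while raising the data multiplier or epoch count grows it linearly.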

Using Your Config

After downloading your config, place the calculator_llm_config.json file in your project and update the training script to load both model and training parameters:

```python
import json

# Load your custom config
with open("calculator_llm_config.json") as f:
    config = json.load(f)

model_config = config["model"]
train_config = config["training"]

# Use in model creation
model = CalculatorLLM(
    vocab_size=model_config["vocab_size"],
    embed_dim=model_config["embed_dim"],
    num_heads=model_config["num_heads"],
    num_layers=model_config["num_layers"],
    ff_dim=model_config["ff_dim"],
    max_seq_len=model_config["max_seq_len"],
    dropout=model_config["dropout"],
)

# Use in training
train(
    num_epochs=train_config["num_epochs"],
    batch_size=train_config["batch_size"],
    learning_rate=train_config["learning_rate"],
)
```
Experiment! Try different configurations and see how they affect training time and accuracy. Smaller models train faster but may be less accurate.