Model Configurator
Now that you understand how each component works, you can experiment with different configurations. Use this interactive tool to customize your model:
Default configuration at a glance: ~104K parameters, ~111K training examples, ~96% accuracy, and ~2 minutes of training time (estimated on an Apple M1 CPU).
Generated Config

config.json:

```json
{
  "model": {
    "vocab_size": 36,
    "embed_dim": 64,
    "num_heads": 4,
    "num_layers": 2,
    "ff_dim": 256,
    "max_seq_len": 16,
    "dropout": 0.1
  },
  "training": {
    "batch_size": 64,
    "learning_rate": 0.001,
    "num_epochs": 100,
    "train_multiplier": 10,
    "test_size": 500
  }
}
```

Understanding the Parameters
Model Architecture:
- Embedding Dimension: Size of token vectors. Larger values give the model more capacity to learn patterns, but increase memory and training time.
- Attention Heads: Number of parallel attention operations. More heads let the model focus on different aspects of the input simultaneously.
- Transformer Layers: Number of stacked attention+feed-forward blocks. Deeper models can learn more complex patterns.
- Feed-Forward Dimension: Size of the hidden layer in the feed-forward network. Typically 4x the embedding dimension.
- Max Sequence Length: Maximum number of tokens. Our longest equation ("ninety nine times one equals ninety nine") is about 10 tokens.
- Dropout: Regularization to prevent overfitting. Set to 0 for small models/datasets.
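As a rough sanity check, you can estimate the parameter count a given architecture implies. The sketch below assumes a standard decoder block with learned positional embeddings, biased linear layers, and an untied output head; exact totals vary by implementation, so treat the result as a ballpark figure.

```python
def estimate_params(vocab_size, embed_dim, num_heads, num_layers, ff_dim, max_seq_len):
    """Rough parameter-count estimate for a standard transformer decoder.

    Assumes learned positional embeddings, biases on all linear layers,
    and an untied output head. Note that num_heads does not change the
    count: the heads split embed_dim rather than adding parameters.
    """
    token_emb = vocab_size * embed_dim
    pos_emb = max_seq_len * embed_dim
    # Per layer: Q/K/V/output projections, two feed-forward matrices, two layer norms.
    attn = 4 * (embed_dim * embed_dim + embed_dim)
    ff = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)
    norms = 2 * 2 * embed_dim  # two layer norms, each with scale and bias
    per_layer = attn + ff + norms
    final_norm = 2 * embed_dim
    head = embed_dim * vocab_size + vocab_size  # output projection to vocabulary
    return token_emb + pos_emb + num_layers * per_layer + final_norm + head

print(estimate_params(36, 64, 4, 2, 256, 16))  # 105764, i.e. roughly 104K
```

With the default config this lands close to the ~104K figure quoted above, which is a useful way to see where the parameters actually live: almost all of them are in the two transformer layers, not the embeddings.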
Training Parameters:
- Batch Size: Number of examples processed together. Larger batches are faster but need more memory.
- Learning Rate: How big each weight update is. Too high = unstable training, too low = slow learning.
- Epochs: Number of passes through the entire dataset. More epochs = better learning (up to a point).
- Data Multiplier: How many times the training data is repeated. More exposure helps the model learn.
- Test Examples: Number of examples to evaluate accuracy on after training.
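To get a feel for how batch size and the data multiplier interact, you can compute how many optimizer steps one epoch takes. The base dataset size below is a hypothetical placeholder; substitute however many equations your generator actually produces.

```python
import math

def steps_per_epoch(base_examples, train_multiplier, batch_size):
    # The multiplier repeats the training data, so the effective dataset grows,
    # and with it the number of weight updates per epoch.
    effective = base_examples * train_multiplier
    return math.ceil(effective / batch_size)

# Hypothetical base set of 1,000 generated equations with the default settings:
print(steps_per_epoch(1000, 10, 64))  # 157 steps per epoch
```

Doubling the batch size halves the steps per epoch (faster, but each update averages over more examples), while raising the multiplier scales the steps up linearly.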
Using Your Config
After downloading your config, place the calculator_llm_config.json file in your project directory and update the training script to load both the model and training parameters:
```python
import json

# Load your custom config
with open("calculator_llm_config.json") as f:
    config = json.load(f)

model_config = config["model"]
train_config = config["training"]

# Use in model creation
model = CalculatorLLM(
    vocab_size=model_config["vocab_size"],
    embed_dim=model_config["embed_dim"],
    num_heads=model_config["num_heads"],
    num_layers=model_config["num_layers"],
    ff_dim=model_config["ff_dim"],
    max_seq_len=model_config["max_seq_len"],
    dropout=model_config["dropout"],
)

# Use in training
train(
    num_epochs=train_config["num_epochs"],
    batch_size=train_config["batch_size"],
    learning_rate=train_config["learning_rate"],
)
```

Experiment! Try different configurations and see how they affect training time and accuracy. Smaller models train faster but may be less accurate.
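Before training, it's also worth validating a downloaded config: multi-head attention requires that embed_dim be evenly divisible by num_heads, and max_seq_len must cover your longest equation (about 10 tokens). The helper below is a minimal sketch, not part of the project's code; the function name and error messages are just suggestions.

```python
import json

def validate_config(path):
    """Hypothetical sanity checks for a downloaded calculator_llm_config.json."""
    with open(path) as f:
        config = json.load(f)
    m = config["model"]
    # Each attention head works on embed_dim // num_heads dimensions,
    # so the division must come out even.
    if m["embed_dim"] % m["num_heads"] != 0:
        raise ValueError(
            f"embed_dim {m['embed_dim']} is not divisible by num_heads {m['num_heads']}"
        )
    # The longest equation is about 10 tokens, so anything shorter will truncate.
    if m["max_seq_len"] < 10:
        raise ValueError(f"max_seq_len {m['max_seq_len']} is too small for the longest equation")
    return config
```

Running this right after loading catches the most common mistake (picking an embedding dimension the head count doesn't divide) before you spend minutes training a model that can't be built.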