Hyperparameters
How to Evaluate Across Hyperparameter Configurations
When calling evaluate, you may pass a hyperparameters argument to sweep model performance across one or more
hyperparameter configurations. These hyperparameters are forwarded directly to your inference callable as **kwargs.
Scorebook will automatically expand and run evaluations across all provided configurations.
There are three ways to specify hyperparameters:
- A single hyperparameter configuration (one run)
- A list of hyperparameter configurations (multiple runs)
- A hyperparameter configuration grid (auto-expanded into all possible runs)
Singular Hyperparameter Configuration
If hyperparameters is a single dict, it’s used for one evaluation run.
results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters={"temperature": 0.7, "max_tokens": 128},
)
In this case, my_inference will receive:
my_inference(items, temperature=0.7, max_tokens=128)
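Because the configuration arrives as keyword arguments, the inference callable needs a signature that accepts them. The following is a minimal sketch, assuming the callable takes the batch of items as its first argument; the placeholder body and the default values are hypothetical and stand in for a real model call:

```python
from typing import Any


def my_inference(items: list[Any], **hyperparameters: Any) -> list[str]:
    """Toy inference callable; a real one would call a model here."""
    # Read each hyperparameter, falling back to an illustrative default
    # when the configuration does not set it.
    temperature = hyperparameters.get("temperature", 1.0)
    max_tokens = hyperparameters.get("max_tokens", 64)

    # Placeholder output: one string per item, recording the settings used.
    return [
        f"{item} (temperature={temperature}, max_tokens={max_tokens})"
        for item in items
    ]


outputs = my_inference(["What is 2 + 2?"], temperature=0.7, max_tokens=128)
print(outputs[0])
```

Accepting **hyperparameters rather than fixed named parameters lets the same callable be reused across configurations that set different keys.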
List of Hyperparameter Configurations
If hyperparameters is a list of dicts, each dict represents a separate configuration to evaluate.
configs = [
    {"temperature": 0.3},
    {"temperature": 0.7},
    {"temperature": 1.0},
]
results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters=configs,
)
This produces one evaluation run per config. Here only temperature changes between runs; everything else about the evaluation stays the same.
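Conceptually, a list of configurations behaves like a loop that calls the inference callable once per dict. The sketch below illustrates that idea without Scorebook; run_per_config and toy_inference are hypothetical names, and Scorebook's own evaluate is not shown here and may work differently:

```python
from typing import Any, Callable


def toy_inference(items: list[str], **hyperparameters: Any) -> list[str]:
    # Hypothetical stand-in for a model call: tags each item with its temperature.
    return [f"{item}@{hyperparameters.get('temperature')}" for item in items]


def run_per_config(
    inference: Callable[..., list[str]],
    items: list[str],
    configs: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], list[str]]]:
    # One inference call per configuration, unpacked as keyword arguments.
    return [(config, inference(items, **config)) for config in configs]


configs = [{"temperature": 0.3}, {"temperature": 0.7}, {"temperature": 1.0}]
runs = run_per_config(toy_inference, ["q1"], configs)
for config, outputs in runs:
    print(config, outputs)
```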
Hyperparameter Configuration Grid
If hyperparameters is a dict where values are lists, Scorebook expands it into the full parameter grid. To run and view the result of an evaluation with a hyperparameter grid sweep, run Scorebook's Example 7.
grid = {
    "temperature": [0.3, 0.7, 1.0],
    "max_tokens": [64, 128],
}
results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters=grid,
)
This expands to:
[
    {"temperature": 0.3, "max_tokens": 64},
    {"temperature": 0.3, "max_tokens": 128},
    {"temperature": 0.7, "max_tokens": 64},
    {"temperature": 0.7, "max_tokens": 128},
    {"temperature": 1.0, "max_tokens": 64},
    {"temperature": 1.0, "max_tokens": 128},
]
Total configurations: 3 temperature values × 2 max_tokens values = 6 hyperparameter configurations
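The expansion is a Cartesian product over the value lists. A minimal sketch of that idea using itertools.product from the Python standard library; expand_grid is a hypothetical helper written for illustration, not Scorebook's internal implementation:

```python
from itertools import product
from typing import Any


def expand_grid(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand a dict of value lists into every combination of values."""
    keys = list(grid)
    # product yields one tuple per combination, in the order of `keys`,
    # with the last key varying fastest.
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


grid = {"temperature": [0.3, 0.7, 1.0], "max_tokens": [64, 128]}
configs = expand_grid(grid)
print(len(configs))
for config in configs:
    print(config)
```

The number of configurations is the product of the list lengths, so each additional hyperparameter multiplies the number of evaluation runs.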