
Hyperparameters

How to Evaluate Across Hyperparameter Configurations

When calling evaluate, you may pass a hyperparameters argument to sweep model performance across one or more hyperparameter configurations. These hyperparameters are forwarded directly to your inference callable as **kwargs. Scorebook will automatically expand and run evaluations across all provided configurations.

There are three ways to specify hyperparameters:

  • A single hyperparameter configuration (one run)
  • A list of hyperparameter configurations (multiple runs)
  • A hyperparameter configuration grid (auto-expanded into all possible runs)

Singular Hyperparameter Configuration

If hyperparameters is a single dict, it’s used for one evaluation run.

results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters={"temperature": 0.7, "max_tokens": 128},
)

In this case, my_inference will receive:

my_inference(items, temperature=0.7, max_tokens=128)
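For illustration, here is a minimal sketch of an inference callable that accepts these keyword arguments. The body is a placeholder: this hypothetical my_inference only echoes its inputs instead of calling a model. The default values, and the assumption that items is a list, are illustrative and not taken from Scorebook.

```python
from typing import Any, List


def my_inference(
    items: List[Any],
    temperature: float = 1.0,   # assumed default, for illustration only
    max_tokens: int = 256,      # assumed default, for illustration only
    **kwargs: Any,              # absorbs any further hyperparameters
) -> List[str]:
    """Hypothetical inference callable; a real one would call a model here."""
    # Placeholder output: one string per item, recording the settings received.
    return [
        f"{item} (temperature={temperature}, max_tokens={max_tokens})"
        for item in items
    ]


outputs = my_inference(["What is 2 + 2?"], temperature=0.7, max_tokens=128)
```

Accepting **kwargs in the signature lets the same callable be reused when a configuration includes hyperparameters it does not name explicitly.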

List of Hyperparameter Configurations

If hyperparameters is a list of dicts, each dict represents a separate configuration to evaluate.

configs = [
    {"temperature": 0.3},
    {"temperature": 0.7},
    {"temperature": 1.0},
]

results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters=configs,
)

This produces one evaluation run per configuration (three here), with all other parameters held fixed.
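If several settings should stay constant across the runs, one option is to merge them into each dict when building the list. The sketch below uses plain Python only; the names base and temperatures are hypothetical and not part of Scorebook.

```python
# Settings shared by every run (hypothetical values).
base = {"max_tokens": 128}

# The one hyperparameter that varies between runs.
temperatures = [0.3, 0.7, 1.0]

# Build one configuration per temperature, each carrying the shared settings.
configs = [{**base, "temperature": t} for t in temperatures]
```

The resulting configs list can be passed as the hyperparameters argument in the same way as the hand-written list above.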


Hyperparameter Configuration Grid

If hyperparameters is a dict where values are lists, Scorebook expands it into the full parameter grid. To run and view the result of an evaluation with a hyperparameter grid sweep, run Scorebook's Example 7.

grid = {
    "temperature": [0.3, 0.7, 1.0],
    "max_tokens": [64, 128],
}

results = evaluate(
    inference=my_inference,
    datasets=eval_dataset,
    hyperparameters=grid,
)

This expands to:

[
    {"temperature": 0.3, "max_tokens": 64},
    {"temperature": 0.3, "max_tokens": 128},
    {"temperature": 0.7, "max_tokens": 64},
    {"temperature": 0.7, "max_tokens": 128},
    {"temperature": 1.0, "max_tokens": 64},
    {"temperature": 1.0, "max_tokens": 128},
]

Total configurations: 3 temperature values × 2 max_tokens values = 6 hyperparameter configurations
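The expansion is a Cartesian product over the value lists. The sketch below reproduces it with itertools.product from the standard library. It is not Scorebook's internal implementation: expand_grid is a hypothetical helper, and the order in which Scorebook actually runs the configurations may differ.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: expand a dict of value lists into every combination."""
    keys = list(grid)
    # product() yields one tuple per combination; the last key varies fastest.
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


grid = {
    "temperature": [0.3, 0.7, 1.0],
    "max_tokens": [64, 128],
}

expanded = expand_grid(grid)
print(len(expanded))  # 6, i.e. 3 temperature values x 2 max_tokens values
```

Because the number of runs is the product of the list lengths, each additional hyperparameter multiplies the run count, so it can be worth checking the expanded length before starting a large sweep.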