
Comprehensive Model Evaluation with Scorebook

Scorebook's core evaluate function provides a flexible framework for assessing model performance across datasets, hyperparameters, and metrics. Whether you're running simple accuracy checks or complex adaptive evaluations, Scorebook handles the orchestration while giving you full control over the evaluation process.


Basic Evaluation Structure

All Scorebook evaluations follow the same fundamental pattern:

from scorebook import evaluate, EvalDataset
from scorebook.metrics import Accuracy

# Basic evaluation: metrics such as Accuracy are attached to the dataset,
# so evaluate only needs the inference callable, the dataset(s), and any
# hyperparameters.
results = evaluate(
    inference=my_inference_function,
    datasets=my_dataset,
    hyperparameters={"temperature": 0.7},
)

The evaluate function accepts:

  • Inference callable: a function that generates model predictions
  • Datasets: one or more evaluation datasets with associated metrics
  • Optional parameters: hyperparameters, experiment tracking, and result formatting (a complete example is sketched below)
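
To make the pattern concrete, the sketch below wires a stubbed inference callable into evaluate. It is a minimal illustration, not a reference implementation: the EvalDataset.from_list constructor shown here and the inference signature (a list of dataset items plus keyword hyperparameters) are assumptions for this example, so check the dataset and inference guides for the exact interfaces.

from scorebook import evaluate, EvalDataset
from scorebook.metrics import Accuracy

# Hypothetical dataset: a couple of items with a "question" field and a
# "label" field, scored with Accuracy. The from_list constructor is an
# assumption for this sketch; the real EvalDataset API may differ.
dataset = EvalDataset.from_list(
    name="capital-cities",
    label="label",
    metrics=[Accuracy],
    data=[
        {"question": "What is the capital of France?", "label": "Paris"},
        {"question": "What is the capital of Japan?", "label": "Tokyo"},
    ],
)

# Stub inference callable: receives dataset items and any hyperparameters,
# and returns one prediction per item. Replace with a real model call.
def my_inference_function(items, **hyperparameters):
    return ["Paris" for _ in items]

results = evaluate(
    inference=my_inference_function,
    datasets=dataset,
    hyperparameters={"temperature": 0.7},
)
print(results)

In practice, the stub would be replaced by a function that calls your model with each item's prompt and the supplied hyperparameters, while the dataset and its metrics stay unchanged.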

Next Steps