Adaptive Evaluations

Running Trismik's Adaptive Evaluations in Scorebook

Adaptive evaluations use intelligent algorithms to optimize the evaluation process by dynamically selecting which questions to present to your model. This approach can significantly reduce evaluation time while maintaining statistical accuracy, making it ideal for expensive models or large-scale evaluations.

Instead of running through every item in a dataset, adaptive evaluations select questions based on your model's previous responses and stop once there is sufficient confidence in the estimate of the model's performance.
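To make the idea concrete, here is a toy sketch of confidence-based stopping. This is not Trismik's actual algorithm (which selects questions adaptively, not in order); it only illustrates the core principle that an evaluation can stop early once the performance estimate is confident enough:

```python
import math
import random

def adaptive_estimate(model, items, ci_half_width=0.05, min_items=20):
    """Toy loop: present items one at a time and stop once the accuracy
    estimate is confident enough, instead of exhausting the dataset."""
    correct = 0
    asked = 0
    p = 0.0
    for item in items:
        correct += model(item)
        asked += 1
        p = correct / asked
        # Wald standard error of the running accuracy estimate
        se = math.sqrt(p * (1 - p) / asked)
        if asked >= min_items and 1.96 * se < ci_half_width:
            break  # 95% CI is narrow enough; skip the remaining items
    return p, asked

random.seed(0)
# Simulated model that answers roughly 80% of items correctly
model = lambda item: random.random() < 0.8
accuracy, used = adaptive_estimate(model, range(10_000))
print(f"estimated accuracy {accuracy:.2f} after {used} of 10000 items")
```

With a stable model, the loop terminates after a few hundred items rather than all 10,000, which is the kind of saving adaptive evaluation targets.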


Prerequisites

Before running adaptive evaluations, you need:

  1. Valid Trismik authentication credentials - Get your API key from the Trismik dashboard
  2. A Trismik project - Create a project on the Trismik dashboard
  3. Adaptive dataset access - Datasets with the :adaptive suffix
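Credentials are typically supplied to client libraries via an environment variable. The variable name below is a placeholder assumption, not something this page confirms; check the Trismik dashboard or Scorebook documentation for the exact mechanism:

```shell
# Hypothetical setup: make your Trismik API key available before running
# evaluations. The variable name here is an assumption for illustration.
export TRISMIK_API_KEY="your-api-key-here"
```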

Using Adaptive Datasets

Adaptive datasets are available through Trismik and are accessed by appending :adaptive to the dataset name:

from scorebook import evaluate

# Regular dataset
results = evaluate(inference, datasets="MMLUPro2025")

# Adaptive version of the same dataset
results = evaluate(inference, datasets="MMLUPro2025:adaptive")
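The snippets above pass an `inference` argument without defining it. Conceptually it is a callable that maps evaluation items to your model's answers; the exact signature Scorebook expects is an assumption in the stub below, so treat it as a sketch and see Scorebook Example 9 for the canonical contract:

```python
def toy_model(prompt: str) -> str:
    # Stand-in for a real model call (e.g. an LLM API request).
    return "A"

def inference(items):
    """Map each evaluation item to the model's predicted answer."""
    # Assumed item schema: each item is a dict with a "question" field.
    return [toy_model(item["question"]) for item in items]

# This callable would then be passed to scorebook's evaluate, e.g.:
#   results = evaluate(inference, datasets="MMLUPro2025:adaptive")
print(inference([{"question": "What is 2 + 2?"}]))
```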

For a complete runnable example, see Scorebook Example 9, which demonstrates the full adaptive evaluation workflow.