Inference
Flexible Inference Implementation for Scorebook Evaluations
Scorebook's signature evaluate function requires a callable argument for its inference parameter. This callable encapsulates a model's inference process and generates a list of predictions for a list of input evaluation items.
Inference Callable Requirements
All inference callables in Scorebook must share the same basic contract, regardless of their implementation approach:
An inference callable must:
- Accept a list of evaluation items
- Accept hyperparameters as kwargs
- Return a list of model outputs for scoring
Inference Callable Implementations
This contract allows inference to be implemented with any callable type, such as functions, methods, or callable objects.
Inference Functions
The most straightforward approach is defining a single function that handles the entire inference process:
# A basic inference function implementation
from typing import Any, Dict, List

def inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[Any]:
    predictions = []
    model = get_model()
    model.temperature = hyperparameters.get("temperature")
    for item in evaluation_items:
        prediction = model.predict(item["question"])
        predictions.append(prediction)
    return predictions
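Once defined, the function is passed to Scorebook's evaluate through its inference parameter. The sketch below shows one way that call might look; the import path, the dataset argument, and the hyperparameters argument are illustrative assumptions rather than Scorebook's documented signature:

from scorebook import evaluate  # import path assumed for illustration

# Hypothetical call: only the inference parameter is described above;
# the remaining arguments are placeholders for this sketch
results = evaluate(
    eval_dataset,                          # list of evaluation items (assumed)
    inference=inference,                   # the callable defined above
    hyperparameters={"temperature": 0.7},  # reaches the callable as **hyperparameters (assumed)
)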
Advanced Inference Implementations
As projects grow, inference functions can be expanded into more modular and reusable components. Instead of handling all logic in a single function, you can compose inference using Scorebook's InferencePipeline.
from scorebook import InferencePipeline

# Create an inference pipeline
inference_pipeline = InferencePipeline(
    model="model-name",              # Optionally specify the model name
    preprocessor=preprocessor,       # Prepares evaluation items for model input
    inference_function=inference,    # Generates raw model output for structured inputs
    postprocessor=postprocessor,     # Parses model outputs to extract the response for scoring
)
Pipelines let you break the process into distinct stages of preprocessing, inference, and postprocessing, making it easier to manage complexity, reuse components, or plug in different models.
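As a rough illustration, the stage functions referenced in the pipeline above might look like the sketch below. The function bodies, the get_model helper, and the exact shapes of each stage's inputs and outputs are assumptions for this example, not Scorebook requirements:

from typing import Any, Dict, List

def preprocessor(evaluation_items: List[Dict[str, Any]]) -> List[str]:
    # Turn each evaluation item into a prompt string for the model
    return [item["question"] for item in evaluation_items]

def inference(prompts: List[str], **hyperparameters: Any) -> List[str]:
    # Generate raw model output for each structured input
    model = get_model()  # hypothetical model loader, as in the earlier example
    model.temperature = hyperparameters.get("temperature")
    return [model.predict(prompt) for prompt in prompts]

def postprocessor(raw_outputs: List[str]) -> List[str]:
    # Parse each raw output to extract the response that will be scored
    return [output.strip() for output in raw_outputs]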
Beyond pipelines, Scorebook's flexibility supports several more advanced inference scenarios:
- Asynchronous Inference: Run inference calls concurrently (see the sketch below)
- Batch Inference: Improve performance with local models
- Cloud Inference: Integrate with providers such as OpenAI or Anthropic
Each of these builds on the same callable contract described above.
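For example, an asynchronous callable might fan requests out to a model API concurrently. The sketch below assumes Scorebook accepts an async callable for its inference parameter and uses a hypothetical client object; see the Asynchronous Inference guide for the supported pattern:

import asyncio
from typing import Any, Dict, List

async def async_inference(evaluation_items: List[Dict[str, Any]], **hyperparameters: Any) -> List[str]:
    async def predict(item: Dict[str, Any]) -> str:
        # Hypothetical async client call; replace with your provider's API
        return await client.async_predict(item["question"], **hyperparameters)

    # Issue all requests concurrently and preserve the input order
    return list(await asyncio.gather(*(predict(item) for item in evaluation_items)))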
Next Steps
- For simple use cases: Start with basic inference functions as shown above
- For modular, reusable code: Explore Inference Pipelines
- For performance optimization: Check out Batch Inference
- For cloud integration: See Cloud Inference