Uploading Results
Uploading Evaluation Results to Trismik's Dashboard
After running evaluations with Scorebook, you can upload your results to the Trismik platform for centralized tracking, analysis, and collaboration. This enables you to visualize performance trends, compare different models, and share results with your team.
Prerequisites
Before uploading results to Trismik, you need:
- Valid Trismik API credentials - Get your API key from the Trismik dashboard
- A Trismik project - Create a project on the Trismik dashboard to organize your evaluations
- Authentication setup - Configure your API key either via environment variable or login
Authentication
Environment Variable (Recommended)
Set your API key as an environment variable:
export TRISMIK_API_KEY="your-api-key-here"
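If you want to confirm the key is actually visible to the Python process before kicking off an evaluation, a minimal check like the one below works; it uses only the standard library plus the TRISMIK_API_KEY name from this guide:

import os

# Fail fast if the key is missing, rather than letting auto mode silently skip the upload
if not os.environ.get("TRISMIK_API_KEY"):
    raise RuntimeError("TRISMIK_API_KEY is not set; export it before running evaluations")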
Login Function
Alternatively, log in programmatically using the login() function:
import os
from scorebook import login
api_key = os.environ.get("TRISMIK_API_KEY")
login(api_key)
The login() function saves your API key locally for future use. You only need to call it once per environment.
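In interactive settings such as a notebook, where exporting an environment variable is awkward, you can prompt for the key instead. getpass is standard library; login() is the same Scorebook function shown above:

from getpass import getpass
from scorebook import login

# Prompt for the key without echoing it to the terminal or notebook output
login(getpass("Trismik API key: "))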
Uploading Results
Automatic Upload
When authenticated, you can enable automatic result uploads by providing experiment_id and project_id to the evaluate() function:
from scorebook import evaluate, EvalDataset
from scorebook.metrics import Accuracy

# Set up your inference function
def my_inference(eval_items, **hyperparameters):
    # Your inference logic here
    pass

# Load your dataset
dataset = EvalDataset.from_json(
    file_path="path/to/dataset.json",
    label="answer",
    metrics=Accuracy
)

# Run evaluation with automatic upload
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    hyperparameters={"temperature": 0.7},
    experiment_id="my-experiment",  # Creates the experiment if it doesn't exist
    project_id="your-project-id",   # Must already exist on the Trismik dashboard
    metadata={"model": "gpt-4", "version": "1.0"},
    return_items=True
)
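The exact structure of the returned results object isn't documented here, so the snippet below is just a convenient way to eyeball the local copy and compare it with what later appears on the dashboard:

from pprint import pprint

# Print whatever evaluate() returned; the shape depends on return_items and your metrics
pprint(results)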
Manual Upload Control
You can explicitly control result uploading with the upload_results parameter:
# Force upload even when no experiment_id is provided
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results=True,
    project_id="your-project-id"
)

# Disable upload even when authenticated
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results=False
)

# Auto mode (default) - uploads if authenticated and IDs are provided
results = evaluate(
    inference=my_inference,
    datasets=dataset,
    upload_results="auto"  # This is the default
)
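One workable pattern is to decide the upload mode outside the call, for example from an environment variable, so that CI runs upload while local debugging runs do not. The SCOREBOOK_UPLOAD variable name below is invented for this sketch, and it reuses my_inference and dataset from the earlier example; only upload_results itself is a Scorebook parameter:

import os

# Hypothetical convention: set SCOREBOOK_UPLOAD=1 in CI, leave it unset locally
should_upload = os.environ.get("SCOREBOOK_UPLOAD") == "1"

results = evaluate(
    inference=my_inference,
    datasets=dataset,
    project_id="your-project-id",
    upload_results=should_upload,  # True in CI, False everywhere else
)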
Understanding Upload Behavior
| Condition | Upload Behavior |
| --- | --- |
| upload_results=True + authenticated | Always uploads |
| upload_results=True + not authenticated | Upload fails |
| upload_results=False | Never uploads |
| upload_results="auto" + authenticated + IDs provided | Uploads automatically |
| upload_results="auto" + not authenticated | No upload |
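If it helps to reason about a configuration before running it, the table translates directly into a small predicate. This is illustrative logic mirroring the table, not part of the Scorebook API:

def will_upload(upload_results, authenticated, ids_provided):
    """Predict upload behavior according to the table above (illustrative only)."""
    if upload_results is True:
        if not authenticated:
            raise RuntimeError("upload_results=True requires authentication")
        return True
    if upload_results is False:
        return False
    # "auto": upload only when authenticated and experiment/project IDs are provided
    return authenticated and ids_provided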
Metadata and Organization
Experiment Organization
- Projects: Top-level containers for related experiments
- Experiments: Specific evaluation campaigns (created automatically if they don't exist)
- Runs: Individual evaluation executions with specific hyperparameters (see the sketch below)
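As a concrete sketch of that hierarchy: calling evaluate() several times with the same project_id and experiment_id but different hyperparameters should appear as separate runs under a single experiment, assuming the behavior described above. This reuses my_inference and dataset from the earlier example:

# Each call is one evaluation execution, i.e. one run under the same experiment
for temperature in (0.2, 0.7):
    evaluate(
        inference=my_inference,
        datasets=dataset,
        hyperparameters={"temperature": temperature},
        experiment_id="prompt-optimization",  # one experiment
        project_id="your-project-id",
        metadata={"model": "gpt-4"},
    )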
Adding Metadata
Include relevant metadata to enhance result tracking:
metadata = {
    "model": "microsoft/Phi-4-mini-instruct",
    "version": "1.2.0",
    "dataset_version": "v2",
    "notes": "Testing new prompt template",
    "environment": "production"
}

results = evaluate(
    inference=my_inference,
    datasets=dataset,
    experiment_id="prompt-optimization",
    project_id="your-project-id",
    metadata=metadata
)
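Because metadata is a plain dictionary, parts of it can be filled in programmatically. The git lookup below is an optional convenience and assumes git is available on the machine running the evaluation:

import subprocess
from datetime import datetime, timezone

# Record the current commit and a UTC timestamp alongside the hand-written fields
commit = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True, text=True,
).stdout.strip() or "unknown"

metadata = {
    "model": "microsoft/Phi-4-mini-instruct",
    "git_commit": commit,
    "run_started_at": datetime.now(timezone.utc).isoformat(),
    "notes": "Testing new prompt template",
}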
Verification
After uploading, verify your results appear on the Trismik dashboard:
- Navigate to your project
- Check the experiment list
- View individual run details and metrics
For a complete, runnable example, see Scorebook Example 8, which demonstrates the full upload workflow.