scorebook.types

Type definitions for the scorebook evaluation framework.

AdaptiveEvalDataset Objects

@dataclass
class AdaptiveEvalDataset()

Represents a dataset configured for adaptive evaluation.

EvalRunSpec Objects

@dataclass
class EvalRunSpec()

Specification for a single evaluation run, pairing a dataset with a set of hyperparameters.

__str__

def __str__() -> str

Return string representation of EvalRunSpec.
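The sketch below shows one way a run specification dataclass could implement `__str__`. It is illustrative only: this page does not list the fields of `EvalRunSpec`, so the class name `RunSpecSketch` and the fields `dataset_name` and `hyperparameters` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RunSpecSketch:
    """Hypothetical stand-in for EvalRunSpec: one dataset plus one hyperparameter set."""

    dataset_name: str  # hypothetical field
    hyperparameters: Dict[str, Any] = field(default_factory=dict)  # hypothetical field

    def __str__(self) -> str:
        # Start with the dataset, then append hyperparameters as key=value pairs,
        # sorted by key so the string is stable across runs.
        parts = [f"dataset={self.dataset_name}"]
        parts += [f"{k}={v}" for k, v in sorted(self.hyperparameters.items())]
        return f"RunSpec({', '.join(parts)})"


spec = RunSpecSketch("my_dataset", {"temperature": 0.7, "max_tokens": 256})
print(spec)  # RunSpec(dataset=my_dataset, max_tokens=256, temperature=0.7)
```

Sorting the hyperparameter keys is a design choice for the sketch: it keeps log lines comparable when the same grid is run repeatedly.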

AdaptiveEvalRunSpec Objects

@dataclass
class AdaptiveEvalRunSpec()

Specification for an adaptive evaluation run.

ClassicEvalRunResult Objects

@dataclass
class ClassicEvalRunResult()

Results from executing a classic evaluation run.

item_scores

@property
def item_scores() -> List[Dict[str, Any]]

Return a list of dictionaries containing scores for each evaluated item.

aggregate_scores

@property
def aggregate_scores() -> Dict[str, Any]

Return the aggregated scores for this run.
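To illustrate the shape of these two properties, here is a minimal sketch of a run result that exposes per-item rows and a single aggregate dictionary. The class name `ClassicRunResultSketch`, its fields, and the use of an arithmetic mean for aggregation are all assumptions; scorebook's metrics may aggregate differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List


@dataclass
class ClassicRunResultSketch:
    """Hypothetical stand-in for ClassicEvalRunResult (field names are assumptions)."""

    run_id: str
    # One dict per evaluated item, mapping metric name -> score for that item.
    per_item_metrics: List[Dict[str, float]]

    @property
    def item_scores(self) -> List[Dict[str, Any]]:
        # Attach the run id and item index so rows stay identifiable after merging.
        return [
            {"run_id": self.run_id, "item_id": i, **metrics}
            for i, metrics in enumerate(self.per_item_metrics)
        ]

    @property
    def aggregate_scores(self) -> Dict[str, Any]:
        # Assumption: each metric is aggregated as the arithmetic mean over items.
        metric_names = {name for m in self.per_item_metrics for name in m}
        scores: Dict[str, Any] = {"run_id": self.run_id}
        for name in sorted(metric_names):
            scores[name] = mean(m[name] for m in self.per_item_metrics if name in m)
        return scores


result = ClassicRunResultSketch("run-0", [{"accuracy": 1.0}, {"accuracy": 0.0}])
print(result.item_scores[1])     # {'run_id': 'run-0', 'item_id': 1, 'accuracy': 0.0}
print(result.aggregate_scores)   # {'run_id': 'run-0', 'accuracy': 0.5}
```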

AdaptiveEvalRunResult Objects

@dataclass
class AdaptiveEvalRunResult()

Results from executing an adaptive evaluation run.

aggregate_scores

@property
def aggregate_scores() -> Dict[str, Any]

Return the aggregated scores for this adaptive run.

EvalResult Objects

@dataclass
class EvalResult()

Container for evaluation results across multiple runs.

item_scores

@property
def item_scores() -> List[Dict[str, Any]]

Return a list of dictionaries containing scores for each evaluated item.

aggregate_scores

@property
def aggregate_scores() -> List[Dict[str, Any]]

Return the aggregated scores across all evaluated runs.
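One plausible reading of the `List[Dict[str, Any]]` return type is that `EvalResult.aggregate_scores` holds one aggregate dictionary per run, while `item_scores` concatenates the item rows of every run. The sketch below models that reading with hypothetical classes (`RunResultStub`, `EvalResultSketch`); it is not scorebook's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RunResultStub:
    """Hypothetical per-run result exposing the same two properties as ClassicEvalRunResult."""

    items: List[Dict[str, Any]]
    aggregate: Dict[str, Any]

    @property
    def item_scores(self) -> List[Dict[str, Any]]:
        return self.items

    @property
    def aggregate_scores(self) -> Dict[str, Any]:
        return self.aggregate


@dataclass
class EvalResultSketch:
    """Hypothetical stand-in for EvalResult: a container over several run results."""

    run_results: List[RunResultStub] = field(default_factory=list)

    @property
    def item_scores(self) -> List[Dict[str, Any]]:
        # Flatten: concatenate every run's per-item rows into one list.
        return [row for run in self.run_results for row in run.item_scores]

    @property
    def aggregate_scores(self) -> List[Dict[str, Any]]:
        # In this sketch, one aggregate dict per run, in run order.
        return [run.aggregate_scores for run in self.run_results]


runs = [
    RunResultStub([{"item_id": 0, "accuracy": 1.0}], {"run": "a", "accuracy": 1.0}),
    RunResultStub(
        [{"item_id": 0, "accuracy": 0.0}, {"item_id": 1, "accuracy": 1.0}],
        {"run": "b", "accuracy": 0.5},
    ),
]
combined = EvalResultSketch(runs)
print(len(combined.item_scores))   # 3
print(combined.aggregate_scores)   # [{'run': 'a', 'accuracy': 1.0}, {'run': 'b', 'accuracy': 0.5}]
```

Because both properties return lists of flat dictionaries, either can be passed directly to `pandas.DataFrame` for tabular analysis.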