scorebook.types
Type definitions for the scorebook evaluation framework.
AdaptiveEvalDataset Objects
@dataclass
class AdaptiveEvalDataset()
Represents a dataset configured for adaptive evaluation.
EvalRunSpec Objects
@dataclass
class EvalRunSpec()
Specification for a single evaluation run with dataset and hyperparameters.
__str__
def __str__() -> str
Return string representation of EvalRunSpec.
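As a rough illustration of a run specification that pairs a dataset with hyperparameters and defines `__str__`, the sketch below uses a stand-in dataclass. The class name `SketchRunSpec` and the fields `dataset_name` and `hyperparameters` are assumptions made for this example; the actual fields of `EvalRunSpec` are not listed here and may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchRunSpec:
    """Hypothetical stand-in for EvalRunSpec; field names are assumed."""

    dataset_name: str
    hyperparameters: Dict[str, Any] = field(default_factory=dict)

    def __str__(self) -> str:
        # Sort keys so the string is stable regardless of insertion order.
        params = ", ".join(
            f"{key}={value!r}" for key, value in sorted(self.hyperparameters.items())
        )
        return f"SketchRunSpec(dataset={self.dataset_name!r}, hyperparameters={{{params}}})"


spec = SketchRunSpec("demo", {"temperature": 0.7, "max_tokens": 64})
spec_text = str(spec)
print(spec_text)
```

A readable `__str__` of this kind is mostly useful in logs and progress output, where one line per run has to identify both the dataset and the hyperparameter configuration.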
AdaptiveEvalRunSpec Objects
@dataclass
class AdaptiveEvalRunSpec()
Specification for an adaptive evaluation run.
ClassicEvalRunResult Objects
@dataclass
class ClassicEvalRunResult()
Results from executing a classic evaluation run.
item_scores
@property
def item_scores() -> List[Dict[str, Any]]
Return a list of dictionaries containing scores for each evaluated item.
aggregate_scores
@property
def aggregate_scores() -> Dict[str, Any]
Return the aggregated scores for this run.
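The two properties differ in shape: `item_scores` yields one dictionary per evaluated item, while `aggregate_scores` yields a single dictionary for the run. The sketch below mimics that contract with a stand-in class. The name `SketchClassicRunResult`, the `per_item` storage layout, the `dataset` and `item_id` keys, and the use of a mean as the aggregate are all assumptions for illustration, not the library's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchClassicRunResult:
    """Hypothetical stand-in for ClassicEvalRunResult; fields are assumed."""

    dataset_name: str
    # Assumed layout: metric name -> list of per-item values.
    per_item: Dict[str, List[float]] = field(default_factory=dict)

    @property
    def item_scores(self) -> List[Dict[str, Any]]:
        # Build one dictionary per evaluated item, keyed by metric name.
        n_items = min((len(values) for values in self.per_item.values()), default=0)
        rows: List[Dict[str, Any]] = []
        for index in range(n_items):
            row: Dict[str, Any] = {"dataset": self.dataset_name, "item_id": index}
            for metric, values in self.per_item.items():
                row[metric] = values[index]
            rows.append(row)
        return rows

    @property
    def aggregate_scores(self) -> Dict[str, Any]:
        # Build a single dictionary for the run (mean per metric in this sketch).
        summary: Dict[str, Any] = {"dataset": self.dataset_name}
        for metric, values in self.per_item.items():
            summary[metric] = sum(values) / len(values) if values else None
        return summary


result = SketchClassicRunResult("demo", {"accuracy": [1.0, 0.0, 1.0, 1.0]})
print(len(result.item_scores))
print(result.aggregate_scores)
```

Returning plain dictionaries from both properties keeps the output easy to load into a table or serialize to JSON.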
AdaptiveEvalRunResult Objects
@dataclass
class AdaptiveEvalRunResult()
Results from executing an adaptive evaluation run.
aggregate_scores
@property
def aggregate_scores() -> Dict[str, Any]
Return the aggregated scores for this adaptive run.
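Only `aggregate_scores` is documented for the adaptive result, so code that consumes it should expect a run-level dictionary and no per-item rows. A minimal sketch of that shape follows; `SketchAdaptiveRunResult`, its `scores` field, and the `dataset` key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchAdaptiveRunResult:
    """Hypothetical stand-in for AdaptiveEvalRunResult; fields are assumed."""

    dataset_name: str
    scores: Dict[str, Any] = field(default_factory=dict)

    @property
    def aggregate_scores(self) -> Dict[str, Any]:
        # Return a fresh dict so callers cannot mutate the stored scores.
        return {"dataset": self.dataset_name, **self.scores}


adaptive = SketchAdaptiveRunResult("demo-adaptive", {"score": 0.82})
summary = adaptive.aggregate_scores
summary["score"] = 0.0  # Changing the returned copy leaves the result untouched.
print(adaptive.aggregate_scores)
```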
EvalResult Objects
@dataclass
class EvalResult()
Container for evaluation results across multiple runs.
item_scores
@property
def item_scores() -> List[Dict[str, Any]]
Return a list of dictionaries containing scores for each evaluated item.
aggregate_scores
@property
def aggregate_scores() -> List[Dict[str, Any]]
Return the aggregated scores across all evaluated runs.
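Note the return type: on `EvalResult`, `aggregate_scores` is a list of dictionaries, where the per-run result classes return a single dictionary. One plausible reading is one entry per run, with `item_scores` flattening the per-item rows of every run. The sketch below illustrates that reading only; `SketchRun`, `SketchEvalResult`, the `runs` field, and the `run` key are assumptions, and the real container may combine runs differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchRun:
    """Hypothetical per-run result holding precomputed scores."""

    aggregate_scores: Dict[str, Any]
    item_scores: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class SketchEvalResult:
    """Hypothetical stand-in for EvalResult; the `runs` field is assumed."""

    runs: List[SketchRun] = field(default_factory=list)

    @property
    def item_scores(self) -> List[Dict[str, Any]]:
        # Flatten the per-item rows of every run into one list, in run order.
        return [row for run in self.runs for row in run.item_scores]

    @property
    def aggregate_scores(self) -> List[Dict[str, Any]]:
        # Collect one summary dictionary per run, in run order.
        return [run.aggregate_scores for run in self.runs]


runs = [
    SketchRun(
        {"run": 0, "accuracy": 0.5},
        [
            {"run": 0, "item_id": 0, "accuracy": 1.0},
            {"run": 0, "item_id": 1, "accuracy": 0.0},
        ],
    ),
    SketchRun(
        {"run": 1, "accuracy": 1.0},
        [{"run": 1, "item_id": 0, "accuracy": 1.0}],
    ),
]
combined = SketchEvalResult(runs)
print(len(combined.item_scores))
print(combined.aggregate_scores)
```

In this sketch a run with no per-item rows simply contributes nothing to `item_scores` while still appearing in `aggregate_scores`.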