scorebook.eval_dataset

Eval Dataset implementation for scorebook.

EvalDataset Objects

class EvalDataset()

Eval Dataset implementation for scorebook.

__init__

def __init__(name: str,
             label: str,
             metrics: Union[str, Type[MetricBase],
                            List[Union[str, Type[MetricBase]]]],
             hf_dataset: HuggingFaceDataset,
             prompt_template: Optional[str] = None)

Create a new scorebook evaluation dataset instance.

Arguments:

  • name: The name of the evaluation dataset.
  • label: The field of the dataset used as the evaluation label (ground truth).
  • metrics: The specified metrics associated with the dataset.
  • hf_dataset: The underlying dataset as a Hugging Face Dataset object.
  • prompt_template: Optional prompt template for building prompts from dataset items.
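Example:

A minimal construction sketch. It assumes that HuggingFaceDataset refers to the datasets.Dataset type, that "accuracy" is a metric name registered with scorebook, and that the {input} placeholder syntax is valid for prompt templates; none of these are confirmed by this page.

from datasets import Dataset
from scorebook.eval_dataset import EvalDataset

# Build a small in-memory Hugging Face dataset with an input and a label column.
hf = Dataset.from_dict({
    "input": ["What is 2+2?", "Capital of France?"],
    "label": ["4", "Paris"],
})

dataset = EvalDataset(
    name="tiny-qa",                                # dataset name
    label="label",                                 # ground-truth field
    metrics="accuracy",                            # assumed registered metric name
    hf_dataset=hf,                                 # the wrapped Hugging Face dataset
    prompt_template="Question: {input}\nAnswer:",  # assumed placeholder syntax
)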

__len__

def __len__() -> int

Return the number of items in the dataset.

__getitem__

def __getitem__(key: Union[int, str]) -> Union[Dict[str, Any], List[Any]]

Allow item access by index (int) or by column name (str).

  • eval_dataset[i] returns the i-th example (dict).
  • eval_dataset["feature"] returns a list of values for that feature.

__str__

def __str__() -> str

Return a formatted string summary of the evaluation dataset.

__iter__

def __iter__() -> Iterator[Dict[str, Any]]

Return an iterator over all examples in the dataset.

shuffle

def shuffle() -> None

Randomly shuffle the dataset items.

items

@property
def items() -> List[Any]

Return a list of all examples in the dataset.

column_names

@property
def column_names() -> List[str]

Return a list of column/feature names available in the dataset.

from_list

@classmethod
def from_list(cls, name: str, label: str,
              metrics: Union[str, Type[MetricBase],
                             List[Union[str, Type[MetricBase]]]],
              data: List[Dict[str, Any]]) -> "EvalDataset"

Instantiate an EvalDataset from a list of dictionaries.

Arguments:

  • name - The name of the evaluation dataset.
  • label - The field used as the evaluation label (ground truth).
  • metrics - The specified metrics associated with the dataset.
  • data - List of dictionaries containing the dataset examples.

Returns:

A scorebook EvalDataset wrapping a Hugging Face dataset.
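Example:

A minimal sketch, assuming "accuracy" is a metric name known to scorebook:

from scorebook.eval_dataset import EvalDataset

dataset = EvalDataset.from_list(
    name="tiny-qa",
    label="label",        # field holding the ground truth
    metrics="accuracy",   # assumed registered metric name
    data=[
        {"input": "What is 2+2?", "label": "4"},
        {"input": "Capital of France?", "label": "Paris"},
    ],
)

print(len(dataset))          # 2
print(dataset.column_names)  # e.g. ["input", "label"]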

from_csv

@classmethod
def from_csv(cls,
             file_path: str,
             label: str,
             metrics: Union[str, Type[MetricBase],
                            List[Union[str, Type[MetricBase]]]],
             name: Optional[str] = None,
             encoding: str = "utf-8",
             newline: str = "",
             **reader_kwargs: Any) -> "EvalDataset"

Instantiate a scorebook dataset from a CSV file.

Arguments:

  • file_path - Path to the CSV file.
  • label - The field used as the evaluation label (ground truth).
  • metrics - The specified metrics associated with the dataset.
  • name - Optional name for the eval dataset; if not provided, the file path is used.
  • encoding - Encoding of the CSV file.
  • newline - Newline character of the CSV file.
  • reader_kwargs - Additional keyword arguments passed to csv.DictReader.

Returns:

A scorebook EvalDataset.

Raises:

  • FileNotFoundError - If the file does not exist at the given path.
  • ValueError - If the CSV file cannot be parsed or is empty.
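Example:

A sketch of loading a local CSV file; the file name, its "label" column, and the "accuracy" metric name are assumptions:

from scorebook.eval_dataset import EvalDataset

dataset = EvalDataset.from_csv(
    "questions.csv",   # hypothetical file with "input" and "label" columns
    label="label",
    metrics="accuracy",
    name="questions",
    delimiter=",",     # forwarded to csv.DictReader via reader_kwargs
)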

from_json

@classmethod
def from_json(cls,
              file_path: str,
              label: str,
              metrics: Union[str, Type[MetricBase],
                             List[Union[str, Type[MetricBase]]]],
              name: Optional[str] = None,
              split: Optional[str] = None) -> "EvalDataset"

Instantiate an EvalDataset from a JSON file.

The JSON file must follow one of two supported formats:

  1. Flat format – a list of dictionaries:

     [
       {"input": "What is 2+2?", "label": "4"},
       {"input": "Capital of France?", "label": "Paris"}
     ]

  2. Split format – a dictionary of named splits:

     {
       "train": [{"input": ..., "label": ...}],
       "test": [{"input": ..., "label": ...}]
     }

Arguments:

  • file_path - Path to the JSON file on disk.
  • label - The field used as the evaluation label (ground truth).
  • metrics - The specified metrics associated with the dataset.
  • name - Optional name for the eval dataset; if not provided, the file path is used.
  • split - If the JSON uses a split structure, this is the split name to load.

Returns:

A scorebook EvalDataset wrapping a Hugging Face dataset.

Raises:

  • FileNotFoundError - If the file does not exist.
  • ValueError - If the JSON is invalid or the structure is unsupported.
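Example:

A sketch covering both supported layouts; the file names and the "accuracy" metric name are assumptions:

from scorebook.eval_dataset import EvalDataset

# Flat format: the file is a list of {"input": ..., "label": ...} records.
flat = EvalDataset.from_json("qa.json", label="label", metrics="accuracy")

# Split format: load only the "test" split of a {"train": [...], "test": [...]} file.
test_split = EvalDataset.from_json(
    "qa_splits.json",
    label="label",
    metrics="accuracy",
    split="test",
)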

from_huggingface

@classmethod
def from_huggingface(cls,
                     path: str,
                     label: str,
                     metrics: Union[str, Type[MetricBase],
                                    List[Union[str, Type[MetricBase]]]],
                     split: Optional[str] = None,
                     name: Optional[str] = None) -> "EvalDataset"

Instantiate an EvalDataset from a dataset available on Hugging Face Hub.

If a specific split is provided (e.g., "train" or "test"), it will be loaded directly. If no split is specified, the method attempts to load the full dataset. If the dataset is split into multiple subsets (i.e., a DatasetDict), it defaults to loading the "test" split.

Arguments:

  • path - The path of the dataset on the Hugging Face Hub.
  • label - The field used as the evaluation label (ground truth).
  • metrics - The specified metrics associated with the dataset.
  • split - Optional name of the split to load.
  • name - Optional dataset configuration name.

Returns:

An EvalDataset wrapping the selected Hugging Face dataset.

Raises:

  • ValueError - If the dataset cannot be loaded, or the expected split is missing.
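Example:

A sketch using the public GSM8K dataset as an illustration; the "accuracy" metric name is an assumption:

from scorebook.eval_dataset import EvalDataset

dataset = EvalDataset.from_huggingface(
    "openai/gsm8k",      # dataset path on the Hugging Face Hub
    label="answer",      # GSM8K stores the ground truth in its "answer" column
    metrics="accuracy",
    split="test",
    name="main",         # GSM8K configuration name
)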

from_yaml

@classmethod
def from_yaml(cls, file_path: str) -> "EvalDataset"

Instantiate an EvalDataset from a YAML file.

The YAML file should contain configuration for loading a dataset, including:

  • name: Name of the dataset or Hugging Face dataset path
  • label: The field used as the evaluation label
  • metrics: List of metrics to evaluate
  • split: Optional split name to load
  • template: Optional prompt template

Returns:

An EvalDataset instance configured according to the YAML file.

Raises:

  • ValueError - If the YAML file is invalid or missing required fields.
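Example:

A sketch that writes a configuration file and loads it. The Hub path, the metric name, and the {input} placeholder syntax are illustrative assumptions; only the listed keys come from this page:

from pathlib import Path
from scorebook.eval_dataset import EvalDataset

Path("eval_config.yaml").write_text(
    "name: my-org/my-eval-dataset\n"   # hypothetical Hugging Face dataset path
    "label: label\n"
    "metrics:\n"
    "  - accuracy\n"                   # assumed registered metric name
    "split: test\n"
    "template: 'Question: {input} Answer:'\n",
    encoding="utf-8",
)

dataset = EvalDataset.from_yaml("eval_config.yaml")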

sample

def sample(sample_size: int) -> "EvalDataset"

Create a new dataset with randomly sampled items from this dataset.

Arguments:

  • sample_size - The number of items to sample from the dataset

Returns:

A new EvalDataset with randomly sampled items

Raises:

  • ValueError - If sample_size is larger than the dataset size
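Example:

For instance, assuming dataset holds at least 100 items:

subset = dataset.sample(100)   # new EvalDataset with 100 randomly chosen items
print(len(subset))             # 100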