scandeval.generation

module scandeval.generation

Functions related to text generation of models.

Functions

  • generate Evaluate a model on a dataset through generation.

  • generate_single_iteration Evaluate a model on a dataset in a single iteration through generation.

  • debug_log Log inputs and outputs for debugging purposes.

generate(model: BenchmarkModule, datasets: list[DatasetDict], model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) -> list[dict[str, float]]

Evaluate a model on a dataset through generation.

Parameters

  • model : BenchmarkModule

    The model to evaluate.

  • datasets : list[DatasetDict]

    The datasets to evaluate on.

  • model_config : ModelConfig

    The configuration of the model.

  • dataset_config : DatasetConfig

    The configuration of the dataset.

  • benchmark_config : BenchmarkConfig

    The configuration of the benchmark.

Returns

  • list[dict[str, float]] A list of dictionaries containing the test scores.
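
The return value holds one dictionary of scores per entry in the list. The following is a minimal sketch of how such a list could be averaged per metric. The helper `aggregate_scores` and the metric names `test_f1` and `test_accuracy` are hypothetical and are not part of scandeval; the values are made up for illustration.

```python
from statistics import mean


def aggregate_scores(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over all score dictionaries (hypothetical helper)."""
    if not scores:
        return {}
    # Assumes every dictionary contains the same metric names.
    metric_names = scores[0].keys()
    return {name: mean(s[name] for s in scores) for name in metric_names}


# Made-up values in the shape list[dict[str, float]].
example_scores = [
    {"test_f1": 0.5, "test_accuracy": 0.75},
    {"test_f1": 0.75, "test_accuracy": 0.25},
]
averaged = aggregate_scores(example_scores)
print(averaged)
```

The sketch only illustrates the shape of the data; how scandeval itself aggregates scores is not described on this page.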

generate_single_iteration(dataset: Dataset, model: BenchmarkModule, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, cache: ModelCache) -> dict[str, float]

Evaluate a model on a dataset in a single iteration through generation.

Parameters

  • dataset : Dataset

    The dataset to evaluate on.

  • model : BenchmarkModule

    The model to evaluate.

  • dataset_config : DatasetConfig

    The configuration of the dataset.

  • benchmark_config : BenchmarkConfig

    The configuration of the benchmark.

  • cache : ModelCache

    The model output cache.

Returns

  • dict[str, float] A dictionary containing the scores for each metric.

Raises

  • ValueError
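
To illustrate the general idea of a model output cache, the sketch below generates outputs only for prompts that are not yet cached. It is an assumption-laden illustration, not scandeval's implementation: a plain `dict` stands in for `ModelCache`, whose real interface is not shown on this page, and `generate_with_cache` and `fake_model` are hypothetical names.

```python
from typing import Callable


def generate_with_cache(
    prompts: list[str],
    generate_fn: Callable[[list[str]], list[str]],
    cache: dict[str, str],
) -> list[str]:
    """Return one output per prompt, calling the model only for uncached prompts."""
    # dict.fromkeys removes duplicate prompts while preserving their order.
    missing = [p for p in dict.fromkeys(prompts) if p not in cache]
    if missing:
        for prompt, output in zip(missing, generate_fn(missing)):
            cache[prompt] = output
    return [cache[p] for p in prompts]


calls: list[list[str]] = []


def fake_model(batch: list[str]) -> list[str]:
    """Stand-in for a real model: records the batch and upper-cases each prompt."""
    calls.append(list(batch))
    return [p.upper() for p in batch]


cache = {"hello": "HELLO"}  # Pretend this output was cached earlier.
outputs = generate_with_cache(["hello", "world", "world"], fake_model, cache)
print(outputs)
```

In this sketch the stand-in model is called once, with only the uncached prompt, and the cache is updated in place so later calls can reuse the result.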

debug_log(batch: dict[str, t.Any], extracted_labels: list[dict | str | list[str]], dataset_config: DatasetConfig) -> None

Log inputs and outputs for debugging purposes.

Parameters

  • batch : dict[str, t.Any]

    The batch of examples to evaluate on.

  • extracted_labels : list[dict | str | list[str]]

    The extracted labels from the model output.

  • dataset_config : DatasetConfig

    The configuration of the dataset.
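
As a rough sketch of what logging inputs and outputs for debugging can look like, the example below pairs each input with its extracted label and gold label and emits one debug message per example. It is not the body of `debug_log`: the helper `format_debug_messages`, the logger name, and the batch keys `text` and `label` are assumptions made for illustration.

```python
import logging
from typing import Any

logger = logging.getLogger("generation_debug")  # Hypothetical logger name.


def format_debug_messages(
    batch: dict[str, Any],
    extracted_labels: list[Any],
    input_key: str = "text",   # Assumed column name for the inputs.
    label_key: str = "label",  # Assumed column name for the gold labels.
) -> list[str]:
    """Build one human-readable line per example in the batch."""
    inputs = batch.get(input_key, [])
    gold_labels = batch.get(label_key, [None] * len(inputs))
    return [
        f"Input: {text!r} | Predicted: {predicted!r} | Gold: {gold!r}"
        for text, predicted, gold in zip(inputs, extracted_labels, gold_labels)
    ]


example_batch = {"text": ["Good film"], "label": ["positive"]}
messages = format_debug_messages(example_batch, extracted_labels=["negative"])
for message in messages:
    # Only visible when the logger is configured at DEBUG level.
    logger.debug(message)
```

Building the messages separately from emitting them keeps the formatting easy to test without capturing log output.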

Raises