scandeval.generation
module scandeval.generation
Functions related to text generation of models.
Functions
-
generate — Evaluate a model on a dataset through generation.
-
generate_single_iteration — Evaluate a model on a dataset in a single iteration through generation.
-
debug_log — Log inputs and outputs for debugging purposes.
generate(model: BenchmarkModule, datasets: list[DatasetDict], model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → list[dict[str, float]]
Evaluate a model on a dataset through generation.
Parameters
-
model : BenchmarkModule —
The model to evaluate.
-
datasets : list[DatasetDict] —
The datasets to evaluate on.
-
model_config : ModelConfig —
The configuration of the model.
-
benchmark_config : BenchmarkConfig —
The configuration of the benchmark.
-
dataset_config : DatasetConfig —
The configuration of the dataset.
Returns
-
list[dict[str, float]] — A list of dictionaries containing the test scores.
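The return value is a list of score dictionaries keyed by metric name. As a minimal sketch of how such a list might be summarised downstream, the snippet below averages each metric across the list. The helper `mean_scores` and the metric names `test_f1` and `test_accuracy` are hypothetical and not part of scandeval; only the `list[dict[str, float]]` shape is taken from the signature above.

```python
from statistics import mean


def mean_scores(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over a list of score dictionaries.

    Hypothetical helper for illustration; not part of scandeval.
    """
    if not scores:
        return {}
    # Only average metrics that appear in every dictionary.
    shared_metrics = set(scores[0]).intersection(*(s.keys() for s in scores[1:]))
    return {metric: mean(s[metric] for s in scores) for metric in sorted(shared_metrics)}


# Illustrative values only; real scores would come from `generate`.
example_scores = [
    {"test_f1": 0.5, "test_accuracy": 0.75},
    {"test_f1": 1.0, "test_accuracy": 0.25},
]
summary = mean_scores(example_scores)
```

Restricting the average to metrics shared by all dictionaries is a design choice of this sketch, made so that a missing key does not raise a `KeyError`.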
generate_single_iteration(dataset: Dataset, model: BenchmarkModule, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, cache: ModelCache) → dict[str, float]
Evaluate a model on a dataset in a single iteration through generation.
Parameters
-
dataset : Dataset —
The dataset to evaluate on.
-
model : BenchmarkModule —
The model to evaluate.
-
dataset_config : DatasetConfig —
The configuration of the dataset.
-
benchmark_config : BenchmarkConfig —
The configuration of the benchmark.
-
cache : ModelCache —
The model output cache.
Returns
-
dict[str, float] — A dictionary containing the scores for each metric.
Raises
-
ValueError
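The `cache` parameter is described as the model output cache, which suggests that previously generated outputs can be reused rather than regenerated. The sketch below illustrates that general idea with a plain dictionary. The function `generate_with_cache`, the `generate_fn` callback, and `fake_model` are hypothetical stand-ins and do not reflect the actual `ModelCache` interface.

```python
from typing import Callable


def generate_with_cache(
    prompts: list[str],
    cache: dict[str, str],
    generate_fn: Callable[[list[str]], list[str]],
) -> list[str]:
    """Return one output per prompt, generating only the uncached ones.

    Hypothetical helper for illustration; not part of scandeval.
    """
    # De-duplicate while preserving order, then keep only uncached prompts.
    missing = [p for p in dict.fromkeys(prompts) if p not in cache]
    if missing:
        for prompt, output in zip(missing, generate_fn(missing)):
            cache[prompt] = output
    return [cache[p] for p in prompts]


calls: list[list[str]] = []


def fake_model(batch: list[str]) -> list[str]:
    """Stand-in for a real model: records the batch and upper-cases it."""
    calls.append(list(batch))
    return [text.upper() for text in batch]


cache = {"hello": "HELLO"}
outputs = generate_with_cache(["hello", "world", "world"], cache, fake_model)
```

In this toy run only `"world"` reaches the stand-in model, and it is generated once even though it appears twice.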
debug_log(batch: dict[str, t.Any], extracted_labels: list[dict | str | list[str]], dataset_config: DatasetConfig) → None
Log inputs and outputs for debugging purposes.
Parameters
-
batch : dict[str, t.Any] —
The batch of examples to evaluate on.
-
extracted_labels : list[dict | str | list[str]] —
The extracted labels from the model output.
-
dataset_config : DatasetConfig —
The configuration of the dataset.
Raises