scandeval.finetuning

Functions related to the finetuning of models.

Functions

finetune(model: BenchmarkModule, datasets: list[DatasetDict], model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → list[dict[str, float]]

Evaluate a model on a dataset through finetuning.

Parameters

model : BenchmarkModule —

The model to evaluate.
datasets : list[DatasetDict] —

The datasets to use for training and evaluation.
model_config : ModelConfig —

The configuration of the model.
dataset_config : DatasetConfig —

The dataset configuration.
benchmark_config : BenchmarkConfig —

The benchmark configuration.

Returns

list[dict[str, float]] — A list of dicts containing the scores for each metric for each iteration.
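Because `finetune` returns one score dict per iteration, a caller will usually want to collapse those into a single figure per metric. The sketch below shows one way to do that (mean plus standard error of the mean). `summarise_scores` is a hypothetical helper, not part of this module, and the metric names and values in the example are made up for illustration.

```python
import math
from statistics import mean, stdev


def summarise_scores(scores: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Aggregate per-iteration scores into (mean, standard error) per metric.

    Hypothetical helper: `scores` is assumed to have the shape returned by
    `finetune`, i.e. one dict per iteration mapping metric name to score,
    with every dict sharing the same metric names.
    """
    if not scores:
        raise ValueError("Need at least one iteration to summarise.")
    summary: dict[str, tuple[float, float]] = {}
    for metric in scores[0]:
        values = [itr[metric] for itr in scores]
        # Standard error of the mean; reported as 0.0 when there is only one iteration.
        se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        summary[metric] = (mean(values), se)
    return summary


# Made-up scores for three iterations, purely for illustration.
example = [
    {"mcc": 0.50, "macro_f1": 0.70},
    {"mcc": 0.54, "macro_f1": 0.72},
    {"mcc": 0.52, "macro_f1": 0.74},
]
print(summarise_scores(example))
```

Reporting a standard error alongside the mean makes it visible how much the score varies between finetuning runs, which is the main reason to run several iterations in the first place.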

Raises

finetune_single_iteration(model: BenchmarkModule | None, dataset: DatasetDict, iteration_idx: int, training_args: TrainingArguments, model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → dict[str, float]

Run a single iteration of a benchmark.

Parameters

model : BenchmarkModule | None —

The model to use in the benchmark. If None, a new model is loaded.
dataset : DatasetDict —

The dataset to use for training and evaluation.
iteration_idx : int —

The index of the iteration.
training_args : TrainingArguments —

The training arguments.
model_config : ModelConfig —

The model configuration.
dataset_config : DatasetConfig —

The dataset configuration.
benchmark_config : BenchmarkConfig —

The benchmark configuration.
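The `model` parameter allows a caller to hand over an already loaded model or to pass None and have a fresh one loaded. The sketch below illustrates one plausible way a driver loop could use that contract: reuse the pre-loaded model on the first iteration only, so that later iterations do not start from weights finetuned in an earlier one. This reuse policy is an assumption, not taken from the source, and `DummyModel`, `load_fresh_model`, `run_single_iteration` and `run_all_iterations` are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DummyModel:
    """Stand-in for a BenchmarkModule; `iteration` records where it was created."""

    iteration: int


def load_fresh_model(iteration_idx: int) -> DummyModel:
    # Hypothetical loader; a real implementation would reload pretrained weights.
    return DummyModel(iteration=iteration_idx)


def run_single_iteration(model: DummyModel | None, iteration_idx: int) -> DummyModel:
    # Mirrors the documented contract: if no model is given, load a new one.
    if model is None:
        model = load_fresh_model(iteration_idx)
    # ... finetuning and evaluation would happen here ...
    return model


def run_all_iterations(preloaded: DummyModel, num_iterations: int) -> list[DummyModel]:
    used: list[DummyModel] = []
    for idx in range(num_iterations):
        # Assumption: only the first iteration reuses the pre-loaded model.
        model = preloaded if idx == 0 else None
        used.append(run_single_iteration(model, idx))
    return used


models = run_all_iterations(DummyModel(iteration=-1), 3)
print([m.iteration for m in models])
```

Here the pre-loaded model is marked with `iteration=-1`, so the printed list shows that it is used once and that every later iteration receives a freshly loaded model.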

Returns

dict[str, float] — The scores for each metric for this iteration.

Raises

get_training_args(benchmark_config: BenchmarkConfig, model_config: ModelConfig, iteration_idx: int, dtype: DataType, batch_size: int | None = None) → TrainingArguments

Get the training arguments for the current iteration.

Parameters

benchmark_config : BenchmarkConfig —

The benchmark configuration.
model_config : ModelConfig —

The model configuration.
iteration_idx : int —

The index of the current iteration. This is only used to generate a unique random seed for the current iteration.
dtype : DataType —

The data type to use for the model weights.
batch_size : int | None —

The batch size to use for the current iteration, or None if the batch size in the benchmark config should be used.
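The parameters above combine in a simple way: the iteration index only affects the seed, `batch_size` overrides the benchmark-level batch size when given, and `dtype` decides the mixed-precision mode. The sketch below illustrates that logic without depending on `transformers`. `DataType`, `TrainingArgsSketch`, `BASE_SEED`, `default_batch_size` (standing in for the batch size in the benchmark config) and `get_training_args_sketch` are all hypothetical; only the field names `seed`, `per_device_train_batch_size`, `fp16` and `bf16` are borrowed from `transformers.TrainingArguments`. How the real function derives its seed is not documented here, so the offset scheme is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    # Hypothetical stand-in for the DataType argument.
    FP32 = "fp32"
    FP16 = "fp16"
    BF16 = "bf16"


@dataclass
class TrainingArgsSketch:
    # A small subset of field names that also exist on transformers.TrainingArguments.
    seed: int
    per_device_train_batch_size: int
    fp16: bool
    bf16: bool


BASE_SEED = 4242  # Hypothetical base seed; the real value is not documented here.


def get_training_args_sketch(
    default_batch_size: int,
    iteration_idx: int,
    dtype: DataType,
    batch_size: int | None = None,
) -> TrainingArgsSketch:
    return TrainingArgsSketch(
        # A distinct but reproducible seed for each iteration.
        seed=BASE_SEED + iteration_idx,
        # Fall back to the benchmark-level batch size when none is given.
        per_device_train_batch_size=(
            default_batch_size if batch_size is None else batch_size
        ),
        # At most one mixed-precision flag is set, matching the requested dtype.
        fp16=dtype is DataType.FP16,
        bf16=dtype is DataType.BF16,
    )


print(get_training_args_sketch(32, iteration_idx=2, dtype=DataType.BF16))
```

Deriving the seed from the iteration index keeps each run reproducible while still giving different iterations different data shuffles and initialisations.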

Returns