scandeval.benchmark_modules

source package scandeval.benchmark_modules

The different types of modules that can be benchmarked.

Classes

source class BenchmarkModule(dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)

Bases : ABC

Abstract class for a benchmark module.

Initialise the benchmark module.
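
Concrete benchmark modules subclass this class and implement its abstract members. A minimal sketch of what such a subclass might look like (the class name and method bodies are hypothetical; a real subclass also provides generate, prepare_dataset, the data_collator property, and the two classmethods documented below):

```python
from scandeval.benchmark_modules import BenchmarkModule


class MyModule(BenchmarkModule):
    """Hypothetical subclass, shown for illustration only."""

    def num_params(self) -> int:
        # Count the parameters of the underlying PyTorch module.
        return sum(p.numel() for p in self.get_pytorch_module().parameters())

    def vocab_size(self) -> int:
        # Delegate to the tokenizer's vocabulary.
        return len(self.get_tokenizer())

    def model_max_length(self) -> int:
        return self.get_tokenizer().model_max_length
```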

Attributes

  • model_config

    The model configuration.

  • dataset_config

    The dataset configuration.

  • benchmark_config

    The benchmark configuration.

  • buffer : dict[str, t.Any]

    A buffer to store temporary data.

  • generative_type : GenerativeType | None

    The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]]

    The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction

    The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction

    The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer]

    The Trainer class to use for finetuning.

Parameters

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Methods

source method BenchmarkModule.get_pytorch_module() → nn.Module

Get the underlying PyTorch module.

Returns

  • nn.Module The PyTorch module.

Raises

  • NotImplementedError

source method BenchmarkModule.get_tokenizer() → PreTrainedTokenizer

Get the underlying tokenizer.

Returns

  • PreTrainedTokenizer The tokenizer.

Raises

  • NotImplementedError

source method BenchmarkModule.num_params() → int

The number of parameters in the model.

Returns

  • int The number of parameters in the model.

source property BenchmarkModule.generative_type: GenerativeType | None

Get the generative type of the model.

Returns

  • GenerativeType | None The generative type of the model, or None if the model is not generative.

source method BenchmarkModule.vocab_size() → int

The vocabulary size of the model.

Returns

  • int The vocabulary size of the model.

source method BenchmarkModule.model_max_length() → int

The maximum context length of the model.

Returns

  • int The maximum context length of the model.

source property BenchmarkModule.data_collator: c.Callable[[list[t.Any]], dict[str, t.Any]]

The data collator used to prepare samples during finetuning.

Returns

  • c.Callable[[list[t.Any]], dict[str, t.Any]] The data collator.
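
In practice the collator is a callable that turns a list of samples into a padded batch dictionary. A sketch of a callable with the documented signature, using Hugging Face's DataCollatorWithPadding as a stand-in for whatever a given subclass actually returns:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# A callable matching the documented signature: a list of samples goes in,
# a batch dictionary comes out.
collator = DataCollatorWithPadding(tokenizer=tokenizer)

samples = [tokenizer("a short example"), tokenizer("a somewhat longer example")]
batch = collator(samples)  # padded "input_ids", "attention_mask", ...
print(batch["input_ids"].shape)
```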

source property BenchmarkModule.compute_metrics: ComputeMetricsFunction

The function used to compute the metrics.

Returns

  • ComputeMetricsFunction The function used to compute the metrics.

source property BenchmarkModule.extract_labels_from_generation: ExtractLabelsFunction

The function used to extract the labels from the generated output.

Returns

  • ExtractLabelsFunction The function used to extract the labels from the generated output.

source property BenchmarkModule.trainer_class: t.Type[Trainer]

The Trainer class to use for finetuning.

Returns

  • t.Type[Trainer] The Trainer class.

source method BenchmarkModule.prepare_datasets(datasets: list[DatasetDict], task: Task) → list[DatasetDict]

Prepare the datasets for the model.

This includes things like tokenisation.

Parameters

  • datasets : list[DatasetDict]

    The datasets to prepare.

  • task : Task

    The task to prepare the datasets for.

Returns

  • list[DatasetDict] The prepared datasets.
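
This is the batched counterpart of prepare_dataset, documented next: one DatasetDict per benchmarking iteration, all prepared for the same task. A plausible sketch of the relationship (hypothetical body, not necessarily the actual implementation):

```python
def prepare_datasets(self, datasets, task):
    # Hypothetical body: prepare each dataset in turn, passing its position
    # along so per-iteration choices (such as few-shot examples) can differ.
    return [
        self.prepare_dataset(dataset=dataset, task=task, itr_idx=idx)
        for idx, dataset in enumerate(datasets)
    ]
```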

source method BenchmarkModule.prepare_dataset(dataset: DatasetDict, task: Task, itr_idx: int) → DatasetDict

Prepare the dataset for the model.

This includes things like tokenisation.

Parameters

  • dataset : DatasetDict

    The dataset to prepare.

  • task : Task

    The task to prepare the dataset for.

  • itr_idx : int

    The index of the dataset in the iterator.

Returns

  • DatasetDict The prepared dataset.

source method BenchmarkModule.generate(inputs: dict) → GenerativeModelOutput

Generate outputs from the model.

Parameters

  • inputs : dict

    A batch of inputs to pass through the model.

Returns

  • GenerativeModelOutput The generated output.

Raises

  • NotImplementedError
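
The base class raises NotImplementedError; generative subclasses such as LiteLLMModel and VLLMModel override this method. A hedged usage sketch, where model and batch are assumed to already exist, and the sequences attribute name on GenerativeModelOutput is an assumption made for illustration:

```python
# `model` is assumed to be a generative subclass, e.g. a VLLMModel, and
# `batch` a dict of inputs in whatever form that subclass expects.
output = model.generate(inputs=batch)

# GenerativeModelOutput bundles the generated texts; the attribute name
# `sequences` is an assumption, used here for illustration.
for text in output.sequences:
    print(text)
```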

source classmethod BenchmarkModule.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an indicator of why the check could not be carried out.
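
Since the return type is a union, callers should compare against True and False explicitly rather than rely on truthiness. A sketch using the concrete HuggingFaceEncoderModel subclass, assuming an existing BenchmarkConfig named benchmark_config:

```python
from scandeval.benchmark_modules import HuggingFaceEncoderModel

# `benchmark_config` is assumed to be an existing BenchmarkConfig.
result = HuggingFaceEncoderModel.model_exists(
    model_id="bert-base-cased", benchmark_config=benchmark_config
)
if result is True:
    model_config = HuggingFaceEncoderModel.get_model_config(
        model_id="bert-base-cased", benchmark_config=benchmark_config
    )
elif result is False:
    raise ValueError("Model not found on the Hugging Face Hub.")
else:
    # NeedsExtraInstalled or NeedsEnvironmentVariable: the check could not
    # run, e.g. due to a missing optional dependency or API key.
    print(result)
```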

source classmethod BenchmarkModule.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

source class FreshEncoderModel(dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)

Bases : HuggingFaceEncoderModel

A freshly initialised encoder model.

Initialise the model.
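
A fresh model serves as a randomly initialised baseline: standard architecture, no pretrained weights. Fresh models are normally selected through the benchmarker by a reserved model ID rather than constructed directly. A sketch, assuming the Benchmarker entry point from the package README; "fresh-electra-small" and "angry-tweets" are example IDs, and the supported fresh IDs are listed in the ScandEval documentation:

```python
from scandeval import Benchmarker

benchmarker = Benchmarker()

# Benchmark a freshly initialised encoder as a no-pretraining baseline.
benchmarker.benchmark(model="fresh-electra-small", dataset="angry-tweets")
```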

Parameters

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Attributes

  • generative_type : GenerativeType | None

    The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]]

    The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction

    The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction

    The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer]

    The Trainer class to use for finetuning.

Methods

source method FreshEncoderModel.num_params() → int

The number of parameters in the model.

Returns

  • int The number of parameters in the model.

Raises

  • NotImplementedError

source method FreshEncoderModel.vocab_size() → int

The vocabulary size of the model.

Returns

  • int The vocabulary size of the model.

Raises

  • NotImplementedError

source method FreshEncoderModel.model_max_length() → int

The maximum context length of the model.

Returns

  • int The maximum context length of the model.

Raises

  • NotImplementedError

source classmethod FreshEncoderModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an indicator of why the check could not be carried out.

source classmethod FreshEncoderModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

source class HuggingFaceEncoderModel(dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)

Bases : BenchmarkModule

An encoder model from the Hugging Face Hub.

Initialise the model.

Parameters

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Attributes

  • generative_type : GenerativeType | None

    The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]]

    The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction

    The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction

    The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer]

    The Trainer class to use for finetuning.

Methods

source method HuggingFaceEncoderModel.num_params() → int

The number of parameters in the model.

Returns

  • int The number of parameters in the model.

source method HuggingFaceEncoderModel.vocab_size() → int

The vocabulary size of the model.

Returns

  • int The vocabulary size of the model.

source method HuggingFaceEncoderModel.model_max_length() → int

The maximum context length of the model.

Returns

  • int The maximum context length of the model.

source property HuggingFaceEncoderModel.data_collator: c.Callable[[list[t.Any]], dict[str, t.Any]]

The data collator used to prepare samples during finetuning.

Returns

  • c.Callable[[list[t.Any]], dict[str, t.Any]] The data collator.

source property HuggingFaceEncoderModel.generative_type: GenerativeType | None

Get the generative type of the model.

Returns

  • GenerativeType | None The generative type of the model, or None if it has not been set yet.

source property HuggingFaceEncoderModel.extract_labels_from_generation: ExtractLabelsFunction

The function used to extract the labels from the generated output.

Returns

  • ExtractLabelsFunction The function used to extract the labels from the generated output.

source property HuggingFaceEncoderModel.trainer_class: t.Type[Trainer]

The Trainer class to use for finetuning.

Returns

  • t.Type[Trainer] The Trainer class.

source method HuggingFaceEncoderModel.prepare_dataset(dataset: DatasetDict, task: Task, itr_idx: int) → DatasetDict

Prepare the dataset for the model.

This includes things like tokenisation.

Parameters

  • dataset : DatasetDict

    The dataset to prepare.

  • task : Task

    The task to prepare the dataset for.

  • itr_idx : int

    The index of the dataset in the iterator.

Returns

  • DatasetDict The prepared dataset.

Raises

source classmethod HuggingFaceEncoderModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an indicator of why the check could not be carried out.

source classmethod HuggingFaceEncoderModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

Raises

source class LiteLLMModel()

Bases : BenchmarkModule

A generative model from LiteLLM.
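
LiteLLM puts hosted APIs (OpenAI, Anthropic, Gemini, and others) behind a single interface, so API-based models can be benchmarked without local weights. Most providers require an API key in the environment, which is what a NeedsEnvironmentVariable result from model_exists signals. A sketch, assuming the Benchmarker entry point from the package README; "gpt-4o-mini" is an assumed example model ID:

```python
import os

from scandeval import Benchmarker

# Hosted APIs need credentials; LiteLLM reads them from the environment.
os.environ["OPENAI_API_KEY"] = "..."

benchmarker = Benchmarker()

# The model ID is routed through LiteLLM to the hosting provider.
benchmarker.benchmark(model="gpt-4o-mini", dataset="angry-tweets")
```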

Attributes

  • generative_type : GenerativeType | None

    The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]]

    The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction

    The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction

    The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer]

    The Trainer class to use for finetuning.

Methods

source property LiteLLMModel.generative_type: GenerativeType | None

Get the generative type of the model.

Returns

  • GenerativeType | None The generative type of the model, or None if it has not been set yet.

source method LiteLLMModel.generate(inputs: dict) → GenerativeModelOutput

Generate outputs from the model.

Parameters

  • inputs : dict

    A batch of inputs to pass through the model.

Returns

  • GenerativeModelOutput The generated output.

Raises

source method LiteLLMModel.num_params() → int

The number of parameters in the model.

Returns

  • int The number of parameters in the model.

source method LiteLLMModel.vocab_size() → int

The vocabulary size of the model.

Returns

  • int The vocabulary size of the model.

source method LiteLLMModel.model_max_length() → int

The maximum context length of the model.

Returns

  • int The maximum context length of the model.

source property LiteLLMModel.data_collator: c.Callable[[list[t.Any]], dict[str, t.Any]]

The data collator used to prepare samples during finetuning.

Returns

  • c.Callable[[list[t.Any]], dict[str, t.Any]] The data collator.

source property LiteLLMModel.extract_labels_from_generation: ExtractLabelsFunction

The function used to extract the labels from the generated output.

Returns

  • ExtractLabelsFunction The function used to extract the labels from the generated output.

source property LiteLLMModel.trainer_class: t.Type[Trainer]

The Trainer class to use for finetuning.

Returns

  • t.Type[Trainer] The Trainer class.

source classmethod LiteLLMModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an indicator of why the check could not be carried out.

Raises

  • e The underlying exception, re-raised if the existence check fails for an unexpected reason.

source classmethod LiteLLMModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

source method LiteLLMModel.prepare_dataset(dataset: DatasetDict, task: Task, itr_idx: int) → DatasetDict

Prepare the dataset for the model.

This includes things like tokenisation.

Parameters

  • dataset : DatasetDict

    The dataset to prepare.

  • task : Task

    The task to prepare the dataset for.

  • itr_idx : int

    The index of the dataset in the iterator.

Returns

  • DatasetDict The prepared dataset.

source class VLLMModel(dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)

Bases : HuggingFaceEncoderModel

A generative model using the vLLM inference framework.

Initialise the vLLM model.
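
vLLM serves the model locally and batches generation requests efficiently, which matters at benchmark scale. Unlike LiteLLMModel, this path needs the model weights on the local machine. Selection again goes through the benchmarker; a sketch, assuming the Benchmarker entry point from the package README, with "mistralai/Mistral-7B-v0.1" as an assumed example of a generative Hugging Face model:

```python
from scandeval import Benchmarker

benchmarker = Benchmarker()

# Generative Hugging Face models are served through vLLM when it is
# installed; the weights are downloaded and run locally.
benchmarker.benchmark(model="mistralai/Mistral-7B-v0.1", dataset="angry-tweets")
```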

Parameters

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Attributes

  • generative_type : GenerativeType | None

    The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]]

    The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction

    The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction

    The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer]

    The Trainer class to use for finetuning.

Methods

source property VLLMModel.generative_type: GenerativeType | None

Get the generative type of the model.

Returns

  • GenerativeType | None The generative type of the model, or None if it has not been set yet.

source property VLLMModel.extract_labels_from_generation: ExtractLabelsFunction

The function used to extract the labels from the generated output.

Returns

  • ExtractLabelsFunction The function used to extract the labels from the generated output.

source method VLLMModel.prepare_dataset(dataset: DatasetDict, task: Task, itr_idx: int) → DatasetDict

Prepare the dataset for the model.

This includes things like tokenisation.

Parameters

  • dataset : DatasetDict

    The dataset to prepare.

  • task : Task

    The task to prepare the dataset for.

  • itr_idx : int

    The index of the dataset in the iterator.

Returns

  • DatasetDict The prepared dataset.

source method VLLMModel.generate(inputs: dict) → GenerativeModelOutput

Generate outputs from the model.

Parameters

  • inputs : dict

    A batch of inputs to pass through the model.

Returns

  • GenerativeModelOutput The generated output.

Raises

source classmethod VLLMModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an indicator of why the check could not be carried out.

source classmethod VLLMModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

Raises

source property VLLMModel.data_collator: c.Callable[[list[t.Any]], dict[str, t.Any]]

The data collator used to prepare samples during finetuning.

Returns

  • c.Callable[[list[t.Any]], dict[str, t.Any]] The data collator.

source property VLLMModel.trainer_class: t.Type[Trainer]

The Trainer class to use for finetuning.

Returns

  • t.Type[Trainer] The Trainer class.