
scandeval.benchmark_modules.fresh

source module scandeval.benchmark_modules.fresh

Freshly initialised encoder models.

source class FreshEncoderModel(dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig)

Bases : HuggingFaceEncoderModel

A freshly initialised encoder model.

Initialise the model.

Parameters

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Attributes

  • generative_type : GenerativeType | None The generative type of the model.

  • data_collator : c.Callable[[list[t.Any]], dict[str, t.Any]] The data collator used to prepare samples during finetuning.

  • compute_metrics : ComputeMetricsFunction The function used to compute the metrics.

  • extract_labels_from_generation : ExtractLabelsFunction The function used to extract the labels from the generated output.

  • trainer_class : t.Type[Trainer] The Trainer class to use for finetuning.
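A "fresh" model reuses a pretrained architecture but discards the pretrained weights in favour of random initialisation, so benchmark scores reflect what finetuning alone achieves. A minimal stdlib sketch of that idea (toy weight matrix, not ScandEval's actual implementation):

```python
import random


def fresh_weights(rows: int, cols: int, seed: int = 0) -> list[list[float]]:
    """Randomly initialised weights: the architecture's shape is kept,
    but no pretrained values are carried over."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Same shape as a hypothetical pretrained layer, but fresh values.
pretrained_shape = (3, 4)
fresh = fresh_weights(*pretrained_shape)
assert (len(fresh), len(fresh[0])) == pretrained_shape
```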

Methods

source method FreshEncoderModel.num_params() → int

The number of parameters in the model.

Returns

  • int The number of parameters in the model.

Raises

  • NotImplementedError

source method FreshEncoderModel.vocab_size() → int

The vocabulary size of the model.

Returns

  • int The vocabulary size of the model.

Raises

  • NotImplementedError

source method FreshEncoderModel.model_max_length() → int

The maximum context length of the model.

Returns

  • int The maximum context length of the model.

Raises

  • NotImplementedError

source classmethod FreshEncoderModel.model_exists(model_id: str, benchmark_config: BenchmarkConfig) → bool | NeedsExtraInstalled | NeedsEnvironmentVariable

Check if a model exists.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • bool | NeedsExtraInstalled | NeedsEnvironmentVariable Whether the model exists, or an error describing why this could not be checked.
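Per the signature, the method returns either a plain bool or a sentinel object explaining why the check could not run. A hypothetical stdlib sketch of that union-return pattern (the sentinel body, the condition, and the example model ID are all assumptions, not ScandEval's code):

```python
class NeedsExtraInstalled:
    """Sentinel: the check needs an optional extra to be installed.
    (Only the class name comes from the signature; this body is assumed.)"""

    def __init__(self, extra: str) -> None:
        self.extra = extra


def model_exists(model_id: str, known_ids: set[str]) -> "bool | NeedsExtraInstalled":
    # Return a bool when the answer is known, or a sentinel when the
    # environment prevents checking at all.
    if model_id.endswith("[quantised]"):  # hypothetical condition
        return NeedsExtraInstalled(extra="quantization")
    return model_id in known_ids


result = model_exists("fresh-electra-small", {"fresh-electra-small"})
assert result is True
```

Callers then branch on the result's type: a bool answers the question, while a sentinel is surfaced to the user as an actionable error.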

source classmethod FreshEncoderModel.get_model_config(model_id: str, benchmark_config: BenchmarkConfig) → ModelConfig

Fetch the model configuration.

Parameters

  • model_id : str

    The model ID.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

Returns

  • ModelConfig The model configuration.

source function load_model_and_tokenizer(model_config: ModelConfig, dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig, model_max_length: int) → tuple[PreTrainedModel, PreTrainedTokenizer]

Load the model and tokenizer.

Parameters

  • model_config : ModelConfig

    The model configuration.

  • dataset_config : DatasetConfig

    The dataset configuration.

  • benchmark_config : BenchmarkConfig

    The benchmark configuration.

  • model_max_length : int

    The maximum context length of the model.

Returns

  • tuple[PreTrainedModel, PreTrainedTokenizer] The loaded model and tokenizer.

Raises
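A toy sketch of this function's contract — build the model and tokenizer together, with the tokenizer's context length capped at model_max_length — using dataclasses and dicts in place of the real config and transformers classes (every name and field here is a simplified assumption, not the actual implementation):

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:  # simplified stand-in for the real ModelConfig
    model_id: str


@dataclass
class DatasetConfig:  # simplified stand-in for the real DatasetConfig
    num_labels: int


def load_model_and_tokenizer(
    model_config: ModelConfig,
    dataset_config: DatasetConfig,
    model_max_length: int,
) -> tuple[dict, dict]:
    # The real function returns (PreTrainedModel, PreTrainedTokenizer);
    # dicts stand in here so the sketch stays dependency-free.
    model = {
        "id": model_config.model_id,
        "num_labels": dataset_config.num_labels,
    }
    tokenizer = {"model_max_length": model_max_length}
    return model, tokenizer


model, tokenizer = load_model_and_tokenizer(
    ModelConfig("fresh-electra-small"),
    DatasetConfig(num_labels=3),
    model_max_length=512,
)
assert tokenizer["model_max_length"] == 512
```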