scandeval.task_utils.sequence_classification
source module scandeval.task_utils.sequence_classification
Utility functions related to the sequence-classification task group.
Functions
- compute_metrics — Compute the metrics needed for evaluation.
- extract_labels_from_generation — Extract the predicted labels from the generated output.
- get_closest_logprobs_labels — Get the labels with the highest predicted logprob value.
- get_closest_word_edit_labels — Get the labels with the smallest edit distance to the predicted labels.
source compute_metrics(model_outputs_and_labels: tuple[Predictions, Labels], dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → dict[str, float]
Compute the metrics needed for evaluation.
Parameters
- model_outputs_and_labels : tuple[Predictions, Labels] — The first sequence contains the model outputs and the second sequence contains the true labels.
- dataset_config : DatasetConfig — The configuration of the dataset.
- benchmark_config : BenchmarkConfig — The configuration of the benchmark.
Returns
- dict[str, float] — A dictionary with the names of the metrics as keys and the metric values as values.
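For illustration, a minimal sketch of calling compute_metrics is shown below. The dataset_config and benchmark_config objects are assumed to have been built elsewhere in the benchmarking pipeline, and the metric names in the final comment are purely illustrative.

```python
from scandeval.task_utils.sequence_classification import compute_metrics

# Model outputs and ground-truth labels for a small batch.
predictions = [0, 1, 2, 1]
labels = [0, 1, 1, 1]

scores = compute_metrics(
    model_outputs_and_labels=(predictions, labels),
    dataset_config=dataset_config,      # assumed: an existing DatasetConfig
    benchmark_config=benchmark_config,  # assumed: an existing BenchmarkConfig
)
# `scores` maps metric names to values, e.g. {"mcc": 0.58, "macro_f1": 0.64}
# (the exact metric names depend on the dataset configuration).
```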
source extract_labels_from_generation(input_batch: dict[str, list], model_output: GenerativeModelOutput, dataset_config: DatasetConfig) → list[str]
Extract the predicted labels from the generated output.
Parameters
- input_batch : dict[str, list] — The input batch, where the keys are the feature names and the values are lists with the feature values.
- model_output : GenerativeModelOutput — The raw generated output of the model.
- dataset_config : DatasetConfig — The configuration of the dataset.
Returns
- list[str] — The predicted labels.
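A hedged sketch of how extract_labels_from_generation might be used follows. The import path for GenerativeModelOutput, its field names (assumed here to be sequences and an optional scores attribute), and the dataset_config object are assumptions and may differ between versions.

```python
from scandeval.data_models import GenerativeModelOutput  # assumed import path
from scandeval.task_utils.sequence_classification import extract_labels_from_generation

# A batch with a single text feature.
input_batch = {"text": ["Filmen var fantastisk!", "Det var spild af tid."]}

# Raw generations; with no logprobs available, labels are matched by edit distance.
model_output = GenerativeModelOutput(  # field names are assumptions
    sequences=["positiv", "negativ"],
    scores=None,
)

predicted_labels = extract_labels_from_generation(
    input_batch=input_batch,
    model_output=model_output,
    dataset_config=dataset_config,  # assumed: an existing DatasetConfig
)
```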
source get_closest_logprobs_labels(generation_logprobs: list[list[list[tuple[str, float]]]], dataset_config: DatasetConfig) → list[str]
Get the labels with the highest predicted logprob value.
In case a candidate label is split into multiple tokens, we only use the first token to compute the logprob value. E.g., if the candidate label "positive" is tokenised as ["pos", "itive"], we only use the logprob value of "pos" to represent the logprob value of the entire label.
Parameters
- generation_logprobs : list[list[list[tuple[str, float]]]] — The logprobs of the generated tokens, for all samples in the batch. Of shape (batch_size, num_tokens, num_logprobs).
- dataset_config : DatasetConfig — The configuration of the dataset.
Returns
- list[str] — The predicted labels.
Raises
- InvalidBenchmark — If no candidate label can be found for any of the generated labels.
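The sketch below illustrates the expected logprob structure and the first-token behaviour described above. The candidate labels and the returned values depend on the dataset configuration and are only illustrative, and dataset_config is assumed to exist already.

```python
from scandeval.task_utils.sequence_classification import get_closest_logprobs_labels

# Shape (batch_size, num_tokens, num_logprobs): two samples, one generated token
# position each, two (token, logprob) pairs per position. If "positive" is tokenised
# as ["pos", "itive"], only the first token "pos" represents the label.
generation_logprobs = [
    [[("pos", -0.2), ("neg", -1.7)]],
    [[("neg", -0.1), ("pos", -2.3)]],
]

labels = get_closest_logprobs_labels(
    generation_logprobs=generation_logprobs,
    dataset_config=dataset_config,  # assumed: defines the candidate labels
)
# Raises InvalidBenchmark if no candidate label matches any generated token.
```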
source get_closest_word_edit_labels(generated_sequences: list[str], dataset_config: DatasetConfig) → list[str]
Get the labels with the smallest edit distance to the predicted labels.
Parameters
- generated_sequences : list[str] — The generated sequences from the model.
- dataset_config : DatasetConfig — The configuration of the dataset.
Returns
- list[str] — The candidate labels with the smallest edit distance to the predicted labels.
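Finally, a small illustrative sketch of the edit-distance fallback. The dataset_config object is assumed to exist and to define candidate labels such as "positive" and "negative"; the inputs and mapping shown are examples only.

```python
from scandeval.task_utils.sequence_classification import get_closest_word_edit_labels

# Free-form generations that do not exactly match any candidate label.
generated_sequences = ["postive", "the sentiment is negative"]

labels = get_closest_word_edit_labels(
    generated_sequences=generated_sequences,
    dataset_config=dataset_config,  # assumed: lists candidate labels like "positive"
)
# Each generation maps to the closest candidate label, e.g. "postive" -> "positive".
```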