scandeval.task_utils.question_answering

source module scandeval.task_utils.question_answering

Utility functions related to the question-answering task group.

Classes

  • QuestionAnsweringTrainer Trainer subclass for question answering tasks.

Functions

  • compute_metrics Compute the metrics needed for evaluation.

  • extract_labels_from_generation Extract the predicted labels from the generated output.

  • prepare_train_examples Prepare the features for training.

  • prepare_test_examples Prepare test examples.

  • postprocess_predictions_and_labels Postprocess the predictions and labels to allow easier metric computation.

  • find_best_answer Find the best answer for a given example.

  • find_valid_answers Find the valid answers from the start and end indexes.

source class QuestionAnsweringTrainer(**kwargs)

Bases : Trainer

Trainer subclass for question answering tasks.

Initialize the trainer.

Methods

  • evaluate Evaluate the model on the given dataset.

source method QuestionAnsweringTrainer.evaluate(eval_dataset: Dataset | None = None, orig_eval_dataset: Dataset | None = None, ignore_keys: list[str] | None = None, metric_key_prefix: str = 'eval') → dict[str, float] | None

Evaluate the model on the given dataset.

Parameters

  • eval_dataset : Dataset | None

    The dataset to evaluate on. If None, then use the stored evaluation dataset.

  • orig_eval_dataset : Dataset | None

    The original evaluation dataset, before any postprocessing. If None, then use the stored original evaluation dataset.

  • ignore_keys : list[str] | None

    The keys to ignore when computing the metrics.

  • metric_key_prefix : str

    The prefix to use for the metric keys.

Returns

  • dict[str, float] | None The metrics computed on the evaluation dataset.
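
A minimal call sketch, assuming the model, training arguments and the two datasets are already set up by the surrounding benchmark code; the keyword arguments forwarded to the underlying Trainer and the variable names are assumptions, not part of this API.

```python
# Sketch only: `model`, `training_args`, `prepared_eval` and `raw_eval` are
# placeholders assumed to come from the surrounding benchmark setup.
from scandeval.task_utils.question_answering import QuestionAnsweringTrainer

trainer = QuestionAnsweringTrainer(
    model=model,                 # a question-answering model (assumption)
    args=training_args,          # transformers.TrainingArguments (assumption)
    eval_dataset=prepared_eval,  # the tokenized evaluation dataset
)

# The original (pre-tokenization) dataset is passed separately; it is presumably
# used when postprocessing predictions before the metrics are computed.
metrics = trainer.evaluate(
    eval_dataset=prepared_eval,
    orig_eval_dataset=raw_eval,
    metric_key_prefix="eval",
)
```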

source compute_metrics(model_outputs_and_labels: tuple[Predictions, Labels], dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → dict[str, float]

Compute the metrics needed for evaluation.

Parameters

  • model_outputs_and_labels : tuple[Predictions, Labels]

    The first sequence contains the model outputs and the second sequence contains the true labels.

  • dataset_config : DatasetConfig

    The configuration of the dataset.

  • benchmark_config : BenchmarkConfig

    The configuration of the benchmark.

Returns

  • dict[str, float] A dictionary with the names of the metrics as keys and the metric values as values.
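
A hedged call sketch, assuming `predictions` and `labels` are the postprocessed outputs described further down and that `dataset_config` and `benchmark_config` come from the surrounding ScandEval benchmark run.

```python
# Sketch only: all inputs are placeholders created by the surrounding benchmark run.
from scandeval.task_utils.question_answering import compute_metrics

scores = compute_metrics(
    model_outputs_and_labels=(predictions, labels),
    dataset_config=dataset_config,      # DatasetConfig of the benchmarked dataset
    benchmark_config=benchmark_config,  # BenchmarkConfig of the overall run
)
# `scores` maps metric names to values; which metrics appear depends on the
# dataset configuration.
```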

source extract_labels_from_generation(input_batch: dict[str, list], model_output: GenerativeModelOutput) → list[t.Any]

Extract the predicted labels from the generated output.

Parameters

  • input_batch : dict[str, list]

    The input batch, where the keys are the feature names and the values are lists with the feature values.

  • model_output : GenerativeModelOutput

    The raw generated output of the model.

Returns

  • list[t.Any] The predicted labels.
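
A hedged call sketch for generative models; the contents of `batch` and `raw_output` are assumptions, since the exact feature names and the structure of GenerativeModelOutput are not documented on this page.

```python
# Sketch only: `batch` and `raw_output` are placeholders.
from scandeval.task_utils.question_answering import extract_labels_from_generation

predicted_labels = extract_labels_from_generation(
    input_batch=batch,        # dict mapping feature names to lists of values
    model_output=raw_output,  # GenerativeModelOutput from the generative model
)
```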

source prepare_train_examples(examples: BatchEncoding, tokenizer: PreTrainedTokenizer) → BatchEncoding

Prepare the features for training.

Parameters

  • examples : BatchEncoding

    The examples to prepare.

  • tokenizer : PreTrainedTokenizer

    The tokenizer to use to prepare the examples.

Returns

  • BatchEncoding The prepared examples.
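
Since the function operates on batches, it would typically be applied with Dataset.map; a sketch follows, in which the tokenizer checkpoint and the assumption that the dataset has SQuAD-style columns are illustrative only.

```python
# Sketch only: the checkpoint name and the dataset columns are assumptions.
from functools import partial

from transformers import AutoTokenizer

from scandeval.task_utils.question_answering import prepare_train_examples

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# `train_dataset` is a placeholder datasets.Dataset with question/context/answer
# columns (the exact column names are an assumption).
prepared_train = train_dataset.map(
    partial(prepare_train_examples, tokenizer=tokenizer),
    batched=True,
    remove_columns=train_dataset.column_names,
)
```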

source prepare_test_examples(examples: BatchEncoding, tokenizer: PreTrainedTokenizer) → BatchEncoding

Prepare test examples.

Parameters

  • examples : BatchEncoding

    Dictionary of test examples.

  • tokenizer : PreTrainedTokenizer

    The tokenizer used to preprocess the examples.

Returns

  • BatchEncoding The prepared test examples.
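
The same pattern plausibly applies to the test split; keep the original, unprepared dataset around as well, since postprocess_predictions_and_labels below needs both.

```python
# Sketch only: `test_dataset` and `tokenizer` are the placeholders from the
# previous sketch.
from functools import partial

from scandeval.task_utils.question_answering import prepare_test_examples

prepared_test = test_dataset.map(
    partial(prepare_test_examples, tokenizer=tokenizer),
    batched=True,
    remove_columns=test_dataset.column_names,
)
```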

source postprocess_predictions_and_labels(predictions: list, dataset: Dataset, prepared_dataset: Dataset, cls_token_index: int) → tuple[list[dict], list[dict]]

Postprocess the predictions and labels to allow easier metric computation.

Parameters

  • predictions : list

    A pair of (start_logits, end_logits) predictions.

  • dataset : Dataset

    The dataset containing the examples.

  • prepared_dataset : Dataset

    The dataset containing the prepared examples.

  • cls_token_index : int

    The index of the CLS token.

Returns

  • tuple[list[dict], list[dict]] The postprocessed predictions and labels.
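
A hedged call sketch, assuming `start_logits` and `end_logits` are the model's per-token logits over the prepared test features and that the CLS token sits at index 0 (an assumption; pass the tokenizer's actual CLS position).

```python
# Sketch only: the logits and datasets are placeholders from the previous sketches.
from scandeval.task_utils.question_answering import (
    postprocess_predictions_and_labels,
)

predictions, labels = postprocess_predictions_and_labels(
    predictions=[start_logits, end_logits],  # the (start_logits, end_logits) pair
    dataset=test_dataset,                    # original examples
    prepared_dataset=prepared_test,          # tokenized features
    cls_token_index=0,                       # assumed CLS position
)
# The resulting pair can then be passed to compute_metrics as
# model_outputs_and_labels=(predictions, labels).
```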

source find_best_answer(all_start_logits: np.ndarray, all_end_logits: np.ndarray, prepared_dataset: Dataset, feature_indices: list[int], context: str, max_answer_length: int, num_best_logits: int, min_null_score: float, cls_token_index: int) → str

Find the best answer for a given example.

Parameters

  • all_start_logits : np.ndarray

    The start logits for all the features.

  • all_end_logits : np.ndarray

    The end logits for all the features.

  • prepared_dataset : Dataset

    The dataset containing the prepared examples.

  • feature_indices : list[int]

    The indices of the features associated with the current example.

  • context : str

    The context of the example.

  • max_answer_length : int

    The maximum length of the answer.

  • num_best_logits : int

    The number of best logits to consider.

  • min_null_score : float

    The minimum score an answer can have.

  • cls_token_index : int

    The index of the CLS token.

Returns

  • str The best answer for the example.
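
A hedged call sketch; `feature_indices` selects the rows of the prepared dataset that were produced from a single original example, and the numeric settings are illustrative values, not documented defaults.

```python
# Sketch only: the logits, dataset, context and feature indices are placeholders.
from scandeval.task_utils.question_answering import find_best_answer

best_answer = find_best_answer(
    all_start_logits=all_start_logits,  # start logits for every prepared feature
    all_end_logits=all_end_logits,      # end logits for every prepared feature
    prepared_dataset=prepared_test,
    feature_indices=[0, 1],             # features belonging to this example
    context=example_context,            # the example's context string
    max_answer_length=30,               # illustrative value
    num_best_logits=20,                 # illustrative value
    min_null_score=0.0,                 # illustrative value
    cls_token_index=0,                  # assumed CLS position
)
```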

source find_valid_answers(start_logits: np.ndarray, end_logits: np.ndarray, offset_mapping: list[tuple[int, int]], context: str, max_answer_length: int, num_best_logits: int, min_null_score: float) → list[dict]

Find the valid answers from the start and end indexes.

Parameters

  • start_logits : np.ndarray

    The logits for the start of the answer.

  • end_logits : np.ndarray

    The logits for the end of the answer.

  • offset_mapping : list[tuple[int, int]]

    The offset mapping: for each token index, a pair of integers giving the start and end character indices in the original context.

  • context : str

    The context of the example.

  • max_answer_length : int

    The maximum length of the answer.

  • num_best_logits : int

    The number of best logits to consider. Note that this function will run in O(num_best_logits ^ 2) time.

  • min_null_score : float

    The minimum score an answer can have.

Returns

  • list[dict] A list of the valid answers, each a dictionary with the keys "text" and "score", where the score is the sum of the start and end logits.
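
The documented behaviour (consider the top num_best_logits start and end positions, score a span by the sum of its start and end logits, discard spans that are reversed, longer than max_answer_length or scoring below min_null_score, and map token spans back to the context via the offset mapping) suggests a search along the following lines. This is an independent, illustrative sketch, not ScandEval's own implementation.

```python
# Independent sketch of the documented span search; not the library's code.
import numpy as np


def sketch_find_valid_answers(
    start_logits: np.ndarray,
    end_logits: np.ndarray,
    offset_mapping: list[tuple[int, int]],
    context: str,
    max_answer_length: int,
    num_best_logits: int,
    min_null_score: float,
) -> list[dict]:
    # Only the `num_best_logits` highest-scoring start and end positions are
    # considered, giving the O(num_best_logits ^ 2) behaviour noted above.
    start_indices = np.argsort(start_logits)[-num_best_logits:][::-1]
    end_indices = np.argsort(end_logits)[-num_best_logits:][::-1]

    answers: list[dict] = []
    for start in start_indices:
        for end in end_indices:
            # Skip reversed spans and spans longer than `max_answer_length`.
            if end < start or end - start + 1 > max_answer_length:
                continue
            score = float(start_logits[start] + end_logits[end])
            if score < min_null_score:
                continue
            # Map token positions back to character positions in the context.
            char_start = offset_mapping[start][0]
            char_end = offset_mapping[end][1]
            answers.append({"text": context[char_start:char_end], "score": score})
    return answers
```

Restricting the search to the top logits keeps the quadratic inner loop small while still covering the most plausible spans.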