scandeval.task_utils.question_answering
source module scandeval.task_utils.question_answering
Utility functions related to the question-answering task group.
Classes
- QuestionAnsweringTrainer — Trainer subclass for question answering tasks.
Functions
- compute_metrics — Compute the metrics needed for evaluation.
- extract_labels_from_generation — Extract the predicted labels from the generated output.
- prepare_train_examples — Prepare the features for training.
- prepare_test_examples — Prepare test examples.
- postprocess_predictions_and_labels — Postprocess the predictions and labels, to allow easier metric computation.
- find_best_answer — Find the best answer for a given example.
- find_valid_answers — Find the valid answers from the start and end indexes.
source class QuestionAnsweringTrainer(**kwargs)
Bases : Trainer
Trainer subclass for question answering tasks.
Initialize the trainer.
Methods
- evaluate — Evaluate the model on the given dataset.
source method QuestionAnsweringTrainer.evaluate(eval_dataset: Dataset | None = None, orig_eval_dataset: Dataset | None = None, ignore_keys: list[str] | None = None, metric_key_prefix: str = 'eval') → dict[str, float] | None
Evaluate the model on the given dataset.
Parameters
- eval_dataset : Dataset | None — The dataset to evaluate on. If None, then use the stored evaluation dataset.
- orig_eval_dataset : Dataset | None — The original evaluation dataset, before any postprocessing. If None, then use the stored original evaluation dataset.
- ignore_keys : list[str] | None — The keys to ignore when computing the metrics.
- metric_key_prefix : str — The prefix to use for the metric keys.
Returns
- dict[str, float] | None — The metrics computed on the evaluation dataset.
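Overrides of this kind typically follow the standard Hugging Face pattern for extractive question answering: run the bare evaluation loop, postprocess the raw start/end logits against the original examples, and only then compute metrics. The following is a minimal sketch of that pattern, not ScandEval's actual implementation; `postprocess_fn` is a hypothetical callback standing in for a step such as postprocess_predictions_and_labels below.

```python
# Minimal sketch of the evaluate-with-postprocessing pattern
# (not the ScandEval implementation).
from transformers import Trainer


class SketchQATrainer(Trainer):
    def __init__(self, *args, orig_eval_dataset=None, postprocess_fn=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.orig_eval_dataset = orig_eval_dataset
        self.postprocess_fn = postprocess_fn  # hypothetical callback

    def evaluate(self, eval_dataset=None, orig_eval_dataset=None,
                 ignore_keys=None, metric_key_prefix="eval"):
        eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
        orig_eval_dataset = (
            orig_eval_dataset if orig_eval_dataset is not None else self.orig_eval_dataset
        )
        dataloader = self.get_eval_dataloader(eval_dataset)

        # Run the bare evaluation loop first; metrics need postprocessed
        # predictions, so disable the metric callback temporarily.
        compute_metrics, self.compute_metrics = self.compute_metrics, None
        try:
            output = self.evaluation_loop(
                dataloader,
                description="Evaluation",
                ignore_keys=ignore_keys,
                metric_key_prefix=metric_key_prefix,
            )
        finally:
            self.compute_metrics = compute_metrics

        if compute_metrics is None or self.postprocess_fn is None or orig_eval_dataset is None:
            return None

        # Map the raw (start_logits, end_logits) back onto the original
        # examples, then compute metrics on the postprocessed pairs.
        predictions, labels = self.postprocess_fn(
            output.predictions, orig_eval_dataset, eval_dataset
        )
        metrics = compute_metrics((predictions, labels))
        metrics = {
            k if k.startswith(metric_key_prefix) else f"{metric_key_prefix}_{k}": v
            for k, v in metrics.items()
        }
        self.log(metrics)
        return metrics
```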
source compute_metrics(model_outputs_and_labels: tuple[Predictions, Labels], dataset_config: DatasetConfig, benchmark_config: BenchmarkConfig) → dict[str, float]
Compute the metrics needed for evaluation.
Parameters
- model_outputs_and_labels : tuple[Predictions, Labels] — The first sequence contains the model outputs and the second sequence contains the true labels.
- dataset_config : DatasetConfig — The configuration of the dataset.
- benchmark_config : BenchmarkConfig — The configuration of the benchmark.
Returns
- dict[str, float] — A dictionary with the names of the metrics as keys and the metric values as values.
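Because a Trainer-style metric callback only receives the predictions-and-labels pair, the two config arguments are usually bound in advance. A minimal sketch of one way to do this, assuming the DatasetConfig and BenchmarkConfig objects are already available from ScandEval's own setup code (their construction is not shown here):

```python
# Sketch: bind the config arguments so the callable matches the
# single-argument shape a Trainer-style metric callback expects.
from functools import partial

from scandeval.task_utils.question_answering import compute_metrics


def make_metric_fn(dataset_config, benchmark_config):
    """Return a callable taking only the (model outputs, labels) pair."""
    return partial(
        compute_metrics,
        dataset_config=dataset_config,
        benchmark_config=benchmark_config,
    )
```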
source extract_labels_from_generation(input_batch: dict[str, list], model_output: GenerativeModelOutput) → list[t.Any]
Extract the predicted labels from the generated output.
Parameters
- input_batch : dict[str, list] — The input batch, where the keys are the feature names and the values are lists with the feature values.
- model_output : GenerativeModelOutput — The raw generated output of the model.
Returns
- list[t.Any] — The predicted labels.
source prepare_train_examples(examples: BatchEncoding, tokenizer: PreTrainedTokenizer) → BatchEncoding
Prepare the features for training.
Parameters
- examples : BatchEncoding — The examples to prepare.
- tokenizer : PreTrainedTokenizer — The tokenizer to use to prepare the examples.
Returns
- BatchEncoding — The prepared examples.
source prepare_test_examples(examples: BatchEncoding, tokenizer: PreTrainedTokenizer) → BatchEncoding
Prepare test examples.
Parameters
- examples : BatchEncoding — Dictionary of test examples.
- tokenizer : PreTrainedTokenizer — The tokenizer used to preprocess the examples.
Returns
- BatchEncoding — The prepared test examples.
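Both preparation functions lend themselves to the usual batched Dataset.map pattern. The sketch below shows that wiring for prepare_train_examples and prepare_test_examples together; ScandEval's benchmark loop handles this internally, and the tiny dataset with SQuAD-style columns (question, context, answers) is only an assumption for illustration.

```python
# Sketch of the usual batched-map wiring for the two preparation functions.
# The dataset below and its SQuAD-style column names are assumptions.
from functools import partial

from datasets import Dataset
from transformers import AutoTokenizer

from scandeval.task_utils.question_answering import (
    prepare_test_examples,
    prepare_train_examples,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

raw = Dataset.from_dict(
    {
        "id": ["0"],
        "question": ["Hvad hedder Danmarks hovedstad?"],
        "context": ["Danmarks hovedstad er København."],
        "answers": [{"text": ["København"], "answer_start": [22]}],
    }
)

prepared_train = raw.map(
    partial(prepare_train_examples, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw.column_names,
)
prepared_test = raw.map(
    partial(prepare_test_examples, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw.column_names,
)
```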
source postprocess_predictions_and_labels(predictions: list, dataset: Dataset, prepared_dataset: Dataset, cls_token_index: int) → tuple[list[dict], list[dict]]
Postprocess the predictions and labels, to allow easier metric computation.
Parameters
- predictions : list — A pair of (start_logits, end_logits) predictions.
- dataset : Dataset — The dataset containing the examples.
- prepared_dataset : Dataset — The dataset containing the prepared examples.
- cls_token_index : int — The index of the CLS token.
Returns
- tuple[list[dict], list[dict]] — The postprocessed predictions and labels.
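A sketch of how raw trainer predictions might be fed through this step; the fixed cls_token_index of 0 is an assumption that holds for BERT-style tokenizers, and the helper name is hypothetical.

```python
# Sketch of the glue between a finished prediction run and metric computation.
from datasets import Dataset
from transformers import Trainer

from scandeval.task_utils.question_answering import postprocess_predictions_and_labels


def postprocessed_outputs(
    trainer: Trainer, dataset: Dataset, prepared_dataset: Dataset
) -> tuple[list[dict], list[dict]]:
    """Run prediction and map the raw logits back onto the original examples."""
    output = trainer.predict(prepared_dataset)
    # For extractive QA models, `output.predictions` holds the pair of
    # (start_logits, end_logits) arrays that the postprocessing expects.
    return postprocess_predictions_and_labels(
        predictions=output.predictions,
        dataset=dataset,
        prepared_dataset=prepared_dataset,
        cls_token_index=0,  # assumption: CLS sits at index 0, as for BERT-style tokenizers
    )
```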
source find_best_answer(all_start_logits: np.ndarray, all_end_logits: np.ndarray, prepared_dataset: Dataset, feature_indices: list[int], context: str, max_answer_length: int, num_best_logits: int, min_null_score: float, cls_token_index: int) → str
Find the best answer for a given example.
Parameters
- all_start_logits : np.ndarray — The start logits for all the features.
- all_end_logits : np.ndarray — The end logits for all the features.
- prepared_dataset : Dataset — The dataset containing the prepared examples.
- feature_indices : list[int] — The indices of the features associated with the current example.
- context : str — The context of the example.
- max_answer_length : int — The maximum length of the answer.
- num_best_logits : int — The number of best logits to consider.
- min_null_score : float — The minimum score an answer can have.
- cls_token_index : int — The index of the CLS token.
Returns
- str — The best answer for the example.
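The selection step reduces to a threshold-and-argmax over the candidates returned by find_valid_answers. A minimal sketch of that step, not ScandEval's exact code, with the empty-string fallback for unanswerable questions as an assumption:

```python
# Sketch of the selection step only: keep the highest-scoring candidate,
# or fall back to an empty string when nothing clears the null threshold.
def pick_best_answer(candidates: list[dict], min_null_score: float) -> str:
    """Each candidate is a dict with "text" and "score" keys."""
    valid = [c for c in candidates if c["score"] >= min_null_score]
    if not valid:
        return ""  # assumption: empty string marks an unanswerable question
    return max(valid, key=lambda c: c["score"])["text"]
```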
source find_valid_answers(start_logits: np.ndarray, end_logits: np.ndarray, offset_mapping: list[tuple[int, int]], context: str, max_answer_length: int, num_best_logits: int, min_null_score: float) → list[dict]
Find the valid answers from the start and end indexes.
Parameters
- start_logits : np.ndarray — The logits for the start of the answer.
- end_logits : np.ndarray — The logits for the end of the answer.
- offset_mapping : list[tuple[int, int]] — The offset mapping, being a list of pairs of integers for each token index, containing the start and end character index in the original context.
- context : str — The context of the example.
- max_answer_length : int — The maximum length of the answer.
- num_best_logits : int — The number of best logits to consider. Note that this function will run in O(num_best_logits^2) time.
- min_null_score : float — The minimum score an answer can have.
Returns
- list[dict] — A list of the valid answers, each being a dictionary with keys "text" and "score", the score being the sum of the start and end logits.
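The quadratic cost comes from pairing the top start logits with the top end logits. The sketch below shows that standard search under the assumption that tokens outside the context carry the offset (0, 0); it illustrates the technique rather than reproducing ScandEval's exact code.

```python
# Self-contained sketch of the standard start/end pairing search used for
# extractive QA postprocessing (not ScandEval's exact code).
import numpy as np


def sketch_find_valid_answers(
    start_logits: np.ndarray,
    end_logits: np.ndarray,
    offset_mapping: list[tuple[int, int]],
    context: str,
    max_answer_length: int,
    num_best_logits: int,
    min_null_score: float,
) -> list[dict]:
    # Indices of the `num_best_logits` highest start/end logits, best first.
    start_indices = np.argsort(start_logits)[-num_best_logits:][::-1]
    end_indices = np.argsort(end_logits)[-num_best_logits:][::-1]

    answers: list[dict] = []
    for start_index in start_indices:  # O(num_best_logits ** 2) pairs
        for end_index in end_indices:
            # Skip tokens outside the context (assumed to have offset (0, 0))
            # and spans that are reversed or longer than max_answer_length.
            if (
                start_index >= len(offset_mapping)
                or end_index >= len(offset_mapping)
                or offset_mapping[start_index] == (0, 0)
                or offset_mapping[end_index] == (0, 0)
                or end_index < start_index
                or end_index - start_index + 1 > max_answer_length
            ):
                continue
            # Score the span by the sum of its start and end logits.
            score = float(start_logits[start_index] + end_logits[end_index])
            if score < min_null_score:
                continue
            char_start = offset_mapping[start_index][0]
            char_end = offset_mapping[end_index][1]
            answers.append({"text": context[char_start:char_end], "score": score})
    return answers
```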