scandeval.scores

module scandeval.scores

Aggregation of raw scores into the mean and a confidence interval.

Functions

log_scores(dataset_name: str, metric_configs: list[MetricConfig], scores: list[dict[str, float]], model_id: str) -> ScoreDict

Log the scores.

Parameters

  • dataset_name : str

    Name of the dataset.

  • metric_configs : list[MetricConfig]

    List of metrics to log.

  • scores : list[dict[str, float]]

    The scores to log, given as a list of dictionaries that each map score names to values.

  • model_id : str

    The full Hugging Face Hub path to the pretrained transformer model.

Returns

  • ScoreDict

    A dictionary with keys 'raw_scores' and 'total'. 'raw_scores' is identical to scores, and 'total' is a dictionary with the aggregated scores (means and standard errors).
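As an illustration of the return shape described above, here is a minimal sketch that builds a dictionary with 'raw_scores' and 'total' keys from a list of raw scores. It is not the library's implementation: the helper name build_score_dict, the "_se" key suffix, and the "test_f1" score name are assumptions made for this example only.

```python
import math
import statistics


def build_score_dict(scores: list[dict[str, float]]) -> dict:
    """Hypothetical helper: aggregate raw scores into means and standard errors."""
    total: dict[str, float] = {}
    for key in scores[0]:
        values = [score[key] for score in scores]
        total[key] = statistics.fmean(values)
        # The standard error needs at least two values; otherwise it is undefined.
        if len(values) > 1:
            total[f"{key}_se"] = statistics.stdev(values) / math.sqrt(len(values))
        else:
            total[f"{key}_se"] = math.nan
    # 'raw_scores' is passed through unchanged, as the description above states.
    return {"raw_scores": scores, "total": total}


raw = [{"test_f1": 0.70}, {"test_f1": 0.80}, {"test_f1": 0.90}]
result = build_score_dict(raw)
```

Here result["raw_scores"] is the same list that was passed in, and result["total"] holds one mean and one standard error per score name.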

aggregate_scores(scores: list[dict[str, float]], metric_config: MetricConfig) -> tuple[float, float]

Helper function to compute the mean with confidence intervals.

Parameters

  • scores : list[dict[str, float]]

    List of dictionaries, each with the names of the metrics as keys, of the form "<split>_<metric_name>", such as "val_f1", and the metric values as values.

  • metric_config : MetricConfig

    The configuration of the metric, which is used to collect the correct metric from scores.

Returns

  • tuple[float, float]

    A pair of floats, containing the mean score and the radius of its 95% confidence interval.
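To make the returned pair concrete, the following sketch computes a mean and a 95% confidence-interval radius for one metric. It assumes a normal approximation (radius = 1.96 × standard error) and a "test" split prefix; the function name mean_with_ci_radius and its parameters are hypothetical and not part of scandeval, whose actual computation may differ.

```python
import math
import statistics


def mean_with_ci_radius(
    scores: list[dict[str, float]], metric_name: str, split: str = "test"
) -> tuple[float, float]:
    """Hypothetical helper: mean and 95% CI radius under a normal approximation."""
    # Collect the metric from each dictionary using a "<split>_<metric_name>" key.
    key = f"{split}_{metric_name}"
    values = [score[key] for score in scores]
    mean = statistics.fmean(values)
    if len(values) < 2:
        # A single value gives no estimate of spread.
        return mean, math.nan
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * standard_error


example_scores = [{"test_f1": 0.70}, {"test_f1": 0.80}, {"test_f1": 0.90}]
mean, radius = mean_with_ci_radius(example_scores, metric_name="f1")
```

The interval is then mean ± radius. With only a few scores, a Student's t quantile would give a wider and arguably more appropriate interval than the 1.96 factor assumed here.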