scandeval.scores
module scandeval.scores
Aggregation of raw scores into the mean and a confidence interval.
Functions
-
log_scores — Log the scores.
-
aggregate_scores — Helper function to compute the mean with confidence intervals.
log_scores(dataset_name: str, metric_configs: list[MetricConfig], scores: list[dict[str, float]], model_id: str) → ScoreDict
Log the scores.
Parameters
-
dataset_name : str —
Name of the dataset.
-
metric_configs : list[MetricConfig] —
List of metrics to log.
-
scores : list[dict[str, float]] —
The scores to log, given as a list of dictionaries that map score names to values.
-
model_id : str —
The full Hugging Face Hub path to the pretrained transformer model.
Returns
-
ScoreDict — A dictionary with keys 'raw_scores' and 'total', with 'raw_scores' being identical to scores and 'total' being a dictionary with the aggregated scores (means and standard errors).
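The return shape described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `build_score_dict` is a hypothetical helper, plain metric keys stand in for `MetricConfig` objects, and the `<key>` / `<key>_se` naming inside 'total' is assumed for the example.

```python
import math
import statistics


def build_score_dict(scores: list[dict[str, float]], metric_keys: list[str]) -> dict:
    """Hypothetical helper: bundle the raw scores with per-metric aggregates."""
    total: dict[str, float] = {}
    for key in metric_keys:
        values = [score_dict[key] for score_dict in scores]
        total[key] = statistics.mean(values)
        # Standard error of the mean; undefined for a single observation.
        if len(values) > 1:
            total[f"{key}_se"] = statistics.stdev(values) / math.sqrt(len(values))
        else:
            total[f"{key}_se"] = math.nan
    # 'raw_scores' is the input list itself, 'total' holds the aggregates.
    return {"raw_scores": scores, "total": total}


scores = [{"val_f1": 0.80}, {"val_f1": 0.84}, {"val_f1": 0.82}]
result = build_score_dict(scores, metric_keys=["val_f1"])
```

Here `result["raw_scores"]` is the same list that was passed in, and `result["total"]` carries one mean and one standard error per metric key.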
aggregate_scores(scores: list[dict[str, float]], metric_config: MetricConfig) → tuple[float, float]
Helper function to compute the mean with confidence intervals.
Parameters
-
scores : list[dict[str, float]] —
List of dictionaries, each with the names of the metrics as keys, of the form "<split>_<metric_name>", such as "val_f1", and the metric values as values.
-
metric_config : MetricConfig —
The configuration of the metric, which is used to collect the correct metric from scores.
Returns
-
tuple[float, float] — A pair of floats, containing the score and the radius of its 95% confidence interval.
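A mean with a 95% confidence interval radius could be computed along these lines. This is a sketch only: `SketchMetricConfig`, `sketch_aggregate_scores`, and the `split` argument are hypothetical names, and the use of the sample standard deviation with a normal approximation (1.96 standard errors) is an assumption about how the radius is obtained.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class SketchMetricConfig:
    """Hypothetical stand-in for MetricConfig; only a metric name is assumed."""

    name: str


def sketch_aggregate_scores(
    scores: list[dict[str, float]],
    metric_config: SketchMetricConfig,
    split: str = "val",
) -> tuple[float, float]:
    """Return the mean score and the radius of an approximate 95% interval."""
    # Keys follow the "<split>_<metric_name>" form, e.g. "val_f1".
    key = f"{split}_{metric_config.name}"
    values = [score_dict[key] for score_dict in scores]
    mean = statistics.mean(values)
    if len(values) > 1:
        # Sample standard deviation divided by sqrt(n) gives the standard error.
        standard_error = statistics.stdev(values) / math.sqrt(len(values))
    else:
        # A single observation gives no estimate of spread.
        standard_error = math.nan
    # Assumption: normal approximation, so the radius is 1.96 standard errors.
    return mean, 1.96 * standard_error


scores = [{"val_f1": 0.80}, {"val_f1": 0.84}, {"val_f1": 0.82}]
mean, radius = sketch_aggregate_scores(scores, SketchMetricConfig(name="f1"))
```

The interval is then `mean ± radius`. With only one score dictionary the radius is NaN in this sketch, since no spread can be estimated.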