scandeval.human_evaluation

source module scandeval.human_evaluation

Gradio app for conducting human evaluation of the benchmark tasks.

Classes

  • HumanEvaluator An app for evaluating human performance on the ScandEval benchmark.

Functions

  • main Start the Gradio app for human evaluation.

source class HumanEvaluator(title: str, description: str, dummy_model_id: str = 'mistralai/Mistral-7B-v0.1')

An app for evaluating human performance on the ScandEval benchmark.

Initialize the HumanEvaluator.

Parameters

  • title : str

    The title of the app.

  • description : str

    The description of the app.

  • dummy_model_id : str

    The model ID to use for generating prompts.
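
A minimal instantiation sketch, assuming the class is imported straight from `scandeval.human_evaluation` as listed above; the title and description strings are placeholders.

```python
from scandeval.human_evaluation import HumanEvaluator

# Build the evaluator; `dummy_model_id` falls back to the default
# 'mistralai/Mistral-7B-v0.1' when omitted.
evaluator = HumanEvaluator(
    title="Human evaluation",  # placeholder app title
    description="Answer each question as well as you can.",  # placeholder description
)
```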

Methods

source method HumanEvaluator.create_app() → gr.Blocks

Create the Gradio app for human evaluation.

Returns

  • gr.Blocks The Gradio app for human evaluation.
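
A sketch of building and serving the app, reusing the `evaluator` from the instantiation example above; `launch()` is the standard Gradio method for starting a local server.

```python
# Build the Gradio Blocks app and serve it locally.
app = evaluator.create_app()
app.launch()
```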

source method HumanEvaluator.update_dataset_choices(language: str | None, task: str | None) → gr.Dropdown

Update the dataset choices based on the selected language and task.

Parameters

  • language : str | None

    The language selected by the user.

  • task : str | None

    The task selected by the user.

Returns

  • gr.Dropdown A dropdown whose choices are the dataset names matching the selected language and task.
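
A direct-call sketch; in the running app this method is wired up as a Gradio event callback, and the language and task values below are illustrative assumptions rather than a list of supported options.

```python
# Returns a gr.Dropdown whose choices are the datasets matching both filters.
dataset_dropdown = evaluator.update_dataset_choices(
    language="da",  # assumed language code, for illustration only
    task="named-entity-recognition",  # assumed task name, for illustration only
)
```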

source method HumanEvaluator.update_dataset(dataset_name: str, iteration: int) → tuple[gr.Markdown, gr.Markdown, gr.Dropdown, gr.Textbox, gr.Button, gr.Button, gr.Textbox, gr.Button]

Update the dataset based on a selected dataset name.

Parameters

  • dataset_name : str

    The dataset name selected by the user.

  • iteration : int

    The iteration index of the datasets to evaluate.

Returns

  • tuple[gr.Markdown, gr.Markdown, gr.Dropdown, gr.Textbox, gr.Button, gr.Button, gr.Textbox, gr.Button] A tuple (task_examples, question, entity_type, entity, entity_add_button, entity_reset_button, answer, submit_button) for the selected dataset.

Raises

  • NotImplementedError
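
A sketch of unpacking the eight returned Gradio components; the dataset name below is a placeholder.

```python
# Select a dataset and receive the components needed to render it.
(
    task_examples,
    question,
    entity_type,
    entity,
    entity_add_button,
    entity_reset_button,
    answer,
    submit_button,
) = evaluator.update_dataset(
    dataset_name="some-dataset",  # placeholder dataset name
    iteration=0,  # first iteration of the dataset
)
```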

source method HumanEvaluator.add_entity_to_answer(question: str, entity_type: str, entity: str, answer: str) → tuple[gr.Textbox, gr.Textbox]

Add an entity to the answer.

Parameters

  • question : str

    The current question.

  • entity_type : str

    The entity type selected by the user.

  • entity : str

    The entity provided by the user.

  • answer : str

    The current answer.

Returns

  • tuple[gr.Textbox, gr.Textbox] A tuple (entity, answer), where entity is a blanked entity textbox and answer is the updated answer.
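
A sketch of a single call with placeholder values; in the running app these arguments come from the corresponding Gradio components.

```python
# Add the typed entity to the running answer; the returned entity textbox is
# blanked so the annotator can type the next entity.
entity_box, answer_box = evaluator.add_entity_to_answer(
    question="...",     # current question text (placeholder)
    entity_type="PER",  # assumed entity-type label, for illustration only
    entity="Anna",      # entity span typed by the annotator (placeholder)
    answer="",          # current answer, empty at the start
)
```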

source method HumanEvaluator.reset_entities() → gr.Textbox

Reset the entities in the answer.

Returns

  • gr.Textbox A blank answer.

source method HumanEvaluator.submit_answer(dataset_name: str, question: str, answer: str, annotator_id: int) → tuple[str, str]

Submit an answer to the dataset.

Parameters

  • dataset_name : str

    The name of the dataset.

  • question : str

    The question for the dataset.

  • answer : str

    The answer to the question.

  • annotator_id : int

    The annotator ID for the evaluation.

Returns

  • tuple[str, str] A tuple (question, answer), with question being the next question, and answer being an empty string.
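
A sketch of submitting an answer; the dataset name and annotator ID are placeholders.

```python
# Record the answer and get the next question plus an empty answer string.
next_question, blank_answer = evaluator.submit_answer(
    dataset_name="some-dataset",  # placeholder dataset name
    question="...",               # the question just answered (placeholder)
    answer="...",                 # the annotator's answer (placeholder)
    annotator_id=0,               # placeholder annotator ID
)
```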

source method HumanEvaluator.example_to_markdown(example: dict) → tuple[str, str]

Convert an example to a Markdown string.

Parameters

  • example : dict

    The example to convert.

Returns

  • tuple[str, str] A tuple (task_examples, question) for the example.
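
A direct-call sketch; the example dict below is purely illustrative, since the actual record structure depends on the selected dataset and task.

```python
# Hypothetical record; real examples come from the benchmark datasets and
# their keys vary by task.
example = {"text": "Dette er en sætning.", "label": "positive"}

task_examples, question = evaluator.example_to_markdown(example)
```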

source method HumanEvaluator.compute_and_log_scores() → None

Compute and log the scores for the dataset.

source function main(annotator_id: int) → None

Start the Gradio app for human evaluation.

Raises