Linguistic Acceptability
📚 Overview
Linguistic acceptability is the task of determining whether a given text is grammatically correct or not. It thus tests whether the model understands the detailed syntax of a given document, and not just its overall gist. It roughly corresponds to the judgement a native speaker makes when saying "this sentence sounds weird". For instance, "the dog chased the ball" is acceptable, while "the dog the ball chased" is not.
When evaluating generative models, we allow the model to generate 5 tokens on this task.
📊 Metrics
The primary metric we use when evaluating the performance of a model on the linguistic acceptability task is the Matthews correlation coefficient (MCC), which has a value between -100% and +100%, where 0% reflects a random guess. The primary benefit of MCC is that it remains balanced even when the classes are imbalanced.
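In the binary case, MCC can be computed directly from the confusion-matrix counts (true/false positives and negatives):

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$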
We also report the macro-average F1-score, which is the average of the F1-scores for each class, thus again weighing each class equally.
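As a concrete illustration of both metrics, the sketch below computes them with scikit-learn on a toy set of labels. This is purely for intuition about the scales involved; it is not necessarily how the package computes the scores internally.

```python
from sklearn.metrics import f1_score, matthews_corrcoef

# Hypothetical gold labels and model predictions, where
# 1 = grammatically acceptable and 0 = unacceptable.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# MCC lies in [-1, 1]; multiplying by 100 gives the percentage
# scale used here, where 0% corresponds to a random guess.
mcc = matthews_corrcoef(y_true, y_pred) * 100

# Macro-average F1 computes the F1-score per class and then
# averages them, so both classes count equally.
macro_f1 = f1_score(y_true, y_pred, average="macro") * 100

print(f"MCC: {mcc:.1f}%, Macro-average F1: {macro_f1:.1f}%")
```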
🛠️ How to run
In the command line interface of the ScandEval Python package, you can benchmark your favorite model on the linguistic acceptability task like so:
```bash
$ scandeval --model <model-id> --task linguistic-acceptability
```
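If you would rather run the benchmark from Python, the package also exposes a `Benchmarker` class. The snippet below is a minimal sketch of that route; the exact keyword arguments may differ between package versions, so treat the `model` and `task` parameters here as assumptions and consult the package documentation.

```python
from scandeval import Benchmarker

# Set up the benchmarker; further configuration arguments are optional.
benchmarker = Benchmarker()

# Benchmark a model on the linguistic acceptability task. The arguments
# mirror the CLI flags above, but their exact names may vary by version.
benchmarker.benchmark(
    model="<model-id>",
    task="linguistic-acceptability",
)
```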