Sentiment Classification
📚 Overview
Sentiment classification is a classical task of determining the sentiment of a given text, which can be positive, negative, or neutral. It thus tests whether the model is able to understand the overall semantics of a given document.
When evaluating generative models, we allow the model to generate 5 tokens on this task.
📊 Metrics
The primary metric we use when evaluating the performance of a model on the sentiment classification task is the Matthews correlation coefficient (MCC), which takes values between -100% and +100%, where 0% reflects a random guess. The main benefit of MCC is that it stays balanced even when the classes are imbalanced.
We also report the macro-average F1-score, which is the mean of the per-class F1-scores and thus again weighs each class equally.
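To make the two metrics concrete, here is a minimal pure-Python sketch of how they could be computed from gold and predicted labels. It is an illustration only and not necessarily how ScandEval computes them internally. The function names `mcc` and `macro_f1` and the toy labels are assumptions made for this example, and the multiclass MCC follows the standard generalisation based on the confusion matrix. Both functions return values as fractions, so multiply by 100 to get the percentages used above.

```python
from collections import Counter
from math import sqrt


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in the range [-1, 1]."""
    n_samples = len(y_true)
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)

    numerator = n_correct * n_samples - sum(
        true_counts[k] * pred_counts[k] for k in labels
    )
    true_term = n_samples**2 - sum(true_counts[k] ** 2 for k in labels)
    pred_term = n_samples**2 - sum(pred_counts[k] ** 2 for k in labels)
    denominator = sqrt(true_term * pred_term)

    # Convention: return 0 when the score is undefined, e.g. when the model
    # always predicts the same class.
    return numerator / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores, in the range [0, 1]."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example: 9 positive documents and 1 negative document,
# with a model that always predicts "positive".
y_true = ["positive"] * 9 + ["negative"]
y_pred = ["positive"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"Accuracy: {accuracy:.0%}")
print(f"MCC:      {mcc(y_true, y_pred):.0%}")
print(f"Macro F1: {macro_f1(y_true, y_pred):.0%}")
```

In this toy example the always-positive model reaches a high accuracy, while its MCC is 0% and its macro-average F1-score is pulled down by the negative class it never predicts. This is the sense in which both metrics treat the classes equally.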
🛠️ How to run
In the command line interface of the ScandEval Python package, you can benchmark your favorite model on the sentiment classification task like so:
$ scandeval --model <model-id> --task sentiment-classification