Project of CL Team Lab, 22SS, University of Stuttgart
Check out the report for more details.
- Different combinations of n-grams, ranging from 1-2-grams to 3-4-grams, are proposed to tackle the elusive nature of emotion expression in textual data. The approach is tested by comparing a support vector machine (SVM) and a convolutional neural network (CNN). The experiments demonstrate that combining more n-grams is effective for the CNN, whereas the SVM's performance degrades when higher-order n-grams are combined.
- Initial work: A multi-class perceptron using bag-of-words built from scratch is also provided.
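A multi-class perceptron over bag-of-words features can be built from scratch along these lines. This is a minimal illustrative sketch, not the repository's implementation: one weight vector per class, predict by argmax, and update only on mistakes.

```python
import numpy as np

def train_multiclass_perceptron(X, y, n_classes, epochs=10):
    """One weight vector per class; predict argmax, update on mistakes."""
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:
                W[yi] += xi    # promote the correct class
                W[pred] -= xi  # demote the wrongly predicted class
    return W

# toy bag-of-words counts: two "documents" over a 3-word vocabulary
X = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
y = [0, 1]
W = train_multiclass_perceptron(X, y, n_classes=2)
print(int(np.argmax(W @ X[0])))  # → 0
```

The same loop generalizes to the seven ISEAR emotion classes once the bag-of-words vocabulary is built from the training texts.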
- classifier: perceptron baseline module and CNN module
- evaluation: calculates precision, recall, and F1 score for the perceptron baseline
- preprocessing: preprocessing classes for the perceptron baseline and CNN
- run_training: 'ModelName_main.py' scripts for training the models
- trained_classifiers: folder for saving trained models
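The evaluation module computes precision, recall, and F1 per class. A minimal sketch of macro-averaged F1 (illustrative only, with made-up labels; the repository's own implementation may differ):

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1 from per-class precision and recall."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

gold = ["joy", "fear", "joy", "anger"]
pred = ["joy", "joy", "joy", "anger"]
print(round(macro_f1(gold, pred, ["joy", "fear", "anger"]), 3))  # → 0.6
```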
- Clone this repository
- Install the required pip libraries:
```
pip install -r requirements.txt
```
- Install Anaconda: https://docs.anaconda.com/anaconda/install/. Create an environment with conda and install all relevant libraries:
```
pip install scikit-learn
pip install tensorflow
pip install keras
pip install numpy
pip install nltk
```
- Download pre-trained GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip
- Create 'glove.6B' folder in the repository and save 'glove.6B.100d.txt' file there
- Download the ISEAR dataset from the Swiss Center for Affective Sciences
- Create 'data' folder in the repository and save the dataset there
- Run the 'ModelName_main.py' script in 'run_training' folder
All experiments used the International Survey on Emotion Antecedents and Reactions (ISEAR) dataset, in which texts are categorized into seven emotions: joy, fear, anger, sadness, disgust, shame, and guilt.
All texts are converted to lowercase and tokenized, without stemming. Two preprocessing settings are applied to examine their effect on model performance:
- Setting 1: tokenization only
- Setting 2: tokenization plus punctuation and stopword removal
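The two settings can be sketched as follows. This is an illustration only: the tokenizer is a simple regex and the stopword list is a tiny hand-picked stand-in (the project itself uses NLTK):

```python
import re
import string

# illustrative stand-in; the project presumably uses NLTK's English stopword list
STOPWORDS = {"the", "a", "an", "i", "was", "of", "to", "and"}

def preprocess(text, setting=1):
    """Setting 1: lowercase + tokenize. Setting 2: also drop punctuation and stopwords."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    if setting == 2:
        tokens = [t for t in tokens
                  if t not in STOPWORDS and t not in string.punctuation]
    return tokens

print(preprocess("I was afraid of the dark.", setting=2))  # → ['afraid', 'dark']
```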
- Text representation: TF-IDF
- Word embeddings: Pre-trained GloVe embeddings with 100 dimensions
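The TF-IDF features and n-gram combinations for the SVM can be expressed with scikit-learn's `ngram_range` parameter, e.g. `(1, 3)` combines 1-, 2-, and 3-grams, while `(3, 4)` uses only 3- and 4-grams. A minimal sketch with made-up toy data, not the repository's training script:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# toy stand-ins for ISEAR texts and labels
texts = ["I felt so happy today", "I was terrified of the dark",
         "She cried all night", "What a joyful day"]
labels = ["joy", "fear", "sadness", "joy"]

# TF-IDF over combined 1- to 2-grams feeding a linear SVM
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["a happy joyful day"])[0])
```

The CNN path instead maps each token to its 100-dimensional GloVe vector and feeds the resulting sequence into convolutional layers whose filter widths correspond to the n-gram sizes.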
| Preprocessing | Features | SVM | CNN |
|---|---|---|---|
| Setting 1 | 1-gram | 0.54 | 0.56 |
| Setting 1 | 3-grams | 0.42 | 0.58 |
| Setting 1 | 1-2-grams | 0.55 | 0.58 |
| Setting 1 | 3-4-grams | 0.41 | 0.58 |
| Setting 1 | 1-3-grams | 0.56 | 0.61 |
| Setting 2 | 1-gram | 0.55 | 0.55 |
| Setting 2 | 3-grams | 0.15 | 0.53 |
| Setting 2 | 1-2-grams | 0.54 | 0.56 |
| Setting 2 | 3-4-grams | 0.15 | 0.54 |
| Setting 2 | 1-3-grams | 0.54 | 0.58 |