Project of CL Team Lab, 22SS, University of Stuttgart
Check out the report for more details.
- Different combinations of n-grams, ranging from 1-2-grams to 3-4-grams, are proposed to tackle the elusive nature of emotion expression in textual data. The approach is tested by comparing a support vector machine (SVM) and a convolutional neural network (CNN). The experiments demonstrate that combining more n-grams is effective for the CNN, whereas the SVM's performance degrades when higher-order n-grams are combined.
- Initial work: A multi-class perceptron using bag-of-words built from scratch is also provided.
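A multi-class perceptron over bag-of-words features can be built from scratch along these lines. This is a minimal illustrative sketch, not the repository's implementation: one weight vector per class, predict by argmax, and update only on mistakes.

```python
import numpy as np

def train_multiclass_perceptron(X, y, n_classes, epochs=10):
    """One weight vector per class; predict argmax, update on mistakes."""
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:
                W[yi] += xi    # promote the correct class
                W[pred] -= xi  # demote the wrongly predicted class
    return W

# toy bag-of-words counts: two "documents" over a 3-word vocabulary
X = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
y = [0, 1]
W = train_multiclass_perceptron(X, y, n_classes=2)
print(int(np.argmax(W @ X[0])))  # → 0
```

The same loop generalizes to the seven ISEAR emotion classes once the bag-of-words vocabulary is built from the training texts.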
- classifier: perceptron baseline module and CNN module
- evaluation: calculates precision, recall, and F1 score for the perceptron baseline
- preprocessing: preprocessing classes for the perceptron baseline and CNN
- run_training: 'ModelName_main.py' scripts for training the models
- trained_classifiers: folder for saving trained models
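The evaluation module computes precision, recall, and F1 per class. A minimal sketch of macro-averaged F1 (illustrative only, with made-up labels; the repository's own implementation may differ):

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1 from per-class precision and recall."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

gold = ["joy", "fear", "joy", "anger"]
pred = ["joy", "joy", "joy", "anger"]
print(round(macro_f1(gold, pred, ["joy", "fear", "anger"]), 3))  # → 0.6
```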
- Clone this repository
- Install the required pip libraries:
```
pip install -r requirements.txt
```
- Install Anaconda: https://docs.anaconda.com/anaconda/install/. Create an environment with conda and install all relevant libraries:
```
pip install scikit-learn
pip install tensorflow
pip install keras
pip install numpy
pip install nltk
```
- Download pre-trained GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip
- Create 'glove.6B' folder in the repository and save 'glove.6B.100d.txt' file there
- Download the ISEAR dataset from the Swiss Center for Affective Sciences
- Create 'data' folder in the repository and save the dataset there
- Run the 'ModelName_main.py' script in 'run_training' folder
All experiments used the International Survey on Emotion Antecedents and Reactions (ISEAR) dataset, in which texts are categorized into seven emotions: joy, fear, anger, sadness, disgust, shame, and guilt.
All texts are converted to lowercase and tokenized, without stemming. Two preprocessing settings are applied to examine their effect on model performance:
- Setting 1: tokenization only
- Setting 2: tokenization plus punctuation and stopword removal
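The two settings can be sketched as follows. This is an illustration only: the tokenizer is a simple regex and the stopword list is a tiny hand-picked stand-in (the project itself uses NLTK):

```python
import re
import string

# illustrative stand-in; the project presumably uses NLTK's English stopword list
STOPWORDS = {"the", "a", "an", "i", "was", "of", "to", "and"}

def preprocess(text, setting=1):
    """Setting 1: lowercase + tokenize. Setting 2: also drop punctuation and stopwords."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    if setting == 2:
        tokens = [t for t in tokens
                  if t not in STOPWORDS and t not in string.punctuation]
    return tokens

print(preprocess("I was afraid of the dark.", setting=2))  # → ['afraid', 'dark']
```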
- Text representation: TF-IDF
- Word embeddings: Pre-trained GloVe embeddings with 100 dimensions
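The TF-IDF features and n-gram combinations for the SVM can be expressed with scikit-learn's `ngram_range` parameter, e.g. `(1, 3)` combines 1-, 2-, and 3-grams, while `(3, 4)` uses only 3- and 4-grams. A minimal sketch with made-up toy data, not the repository's training script:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# toy stand-ins for ISEAR texts and labels
texts = ["I felt so happy today", "I was terrified of the dark",
         "She cried all night", "What a joyful day"]
labels = ["joy", "fear", "sadness", "joy"]

# TF-IDF over combined 1- to 2-grams feeding a linear SVM
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["a happy joyful day"])[0])
```

The CNN path instead maps each token to its 100-dimensional GloVe vector and feeds the resulting sequence into convolutional layers whose filter widths correspond to the n-gram sizes.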
| Preprocessing | Features | SVM | CNN |
|---|---|---|---|
| Setting 1 | 1-gram | 0.54 | 0.56 |
| Setting 1 | 3-grams | 0.42 | 0.58 |
| Setting 1 | 1-2-grams | 0.55 | 0.58 |
| Setting 1 | 3-4-grams | 0.41 | 0.58 |
| Setting 1 | 1-3-grams | 0.56 | 0.61 |
| Setting 2 | 1-gram | 0.55 | 0.55 |
| Setting 2 | 3-grams | 0.15 | 0.53 |
| Setting 2 | 1-2-grams | 0.54 | 0.56 |
| Setting 2 | 3-4-grams | 0.15 | 0.54 |
| Setting 2 | 1-3-grams | 0.54 | 0.58 |