
Classification of various users' emails based on writing patterns, and extraction of topics from the text corpus, using the Enron dataset


krupalshah6996/email-classification-using-BERT

 
 


Email User Classification and Clustering Similar Emails

Used the Enron email dataset to represent a realistic scenario. Classified emails by user from the content of each email. Extracted feature vectors with Google's BERT language model and used them as input to a standard artificial neural network, which did the classification. For the classification task, also compared Linear Support Vector Machine, Random Forest and SGD Classifier models, as well as an LSTM. For the machine learning approaches, tried text representations such as TF-IDF and CountVectorizer (raw term counts). Ran topic modelling with Latent Dirichlet Allocation to find the major topics of discussion in the dataset.


Languages

  • Jupyter Notebook 87.3%
  • Python 12.7%