KJanzon/project-nlp-challenge

Predict Fake News

A project to predict fake vs. real news from a dataset of news headlines, included in the dataset folder. We trained the models on data.csv, then predicted real or fake on validation_data.csv. Several machine learning models were tested and evaluated: Logistic Regression, Naive Bayes, Support Vector Machine, XGBoost, and Random Forest.
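As a rough illustration of how such a model comparison can be set up, the sketch below fits several scikit-learn classifiers on TF-IDF features and reports accuracy for each. The toy headlines, the label convention (0 = fake, 1 = real), the TF-IDF vectorizer, and all hyperparameters are assumptions made for this example, not taken from main.ipynb. XGBoost is left out so that the sketch needs only scikit-learn.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy headlines; assumed label convention: 0 = fake, 1 = real.
train_texts = [
    "SHOCKING!!! aliens secretly run the government",
    "you won't BELIEVE what this celebrity did",
    "miracle cure doctors don't want you to know",
    "BREAKING!!! secret plot exposed by insider",
    "senate passes budget bill after long debate",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "court rules on trade dispute between states",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_texts = [
    "SHOCKING secret cure exposed!!!",
    "senate approves new trade bill",
]
test_labels = [0, 1]

# Candidate models, each with illustrative (not tuned) settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Each model gets its own vectorizer so nothing leaks between runs.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(train_texts, train_labels)
    predictions = pipeline.predict(test_texts)
    scores[name] = accuracy_score(test_labels, predictions)

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

With real data, `train_texts` and `train_labels` would be loaded from the training file, and a held-out split would replace the two hand-written test headlines.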

Final result:

The best result came from a Logistic Regression model. Accuracy: 94.74% (the model correctly classified news articles in 94.74% of cases).

F1-score (balance of precision and recall): fake news 94.86%, real news 94.62%.
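To make the per-class F1 figures concrete, here is a minimal sketch of how an F1-score is computed for one class from true and predicted labels. The function name `f1_for_class` and the toy labels are hypothetical and used only for illustration; the project's notebook may well rely on a library routine instead.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)  # share of predicted positives that are right
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: one fake headline is misclassified as real.
y_true = ["fake", "fake", "fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]

f1_fake = f1_for_class(y_true, y_pred, "fake")
f1_real = f1_for_class(y_true, y_pred, "real")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy={accuracy:.4f} f1_fake={f1_fake:.4f} f1_real={f1_real:.4f}")
```

A single error affects the two classes differently: it lowers recall for the fake class and precision for the real class, which is why the two F1 values can differ even when accuracy is a single number.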

Analysis

Several data-cleaning techniques were tried, but fake news was predicted more accurately when the data was left uncleaned. Presumably, fake news contains more poor formatting, such as runs of exclamation marks, and cleaning removes that signal. We also tried stemming and lemmatization, but these techniques discarded important context, so accuracy went down.
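The effect described above can be sketched with a typical cleaning step applied to a single headline. The `clean_text` function, the `formatting_signals` helper, and the example headline are hypothetical; counting uppercase words is an extra assumption added for illustration, since the text only mentions exclamation marks.

```python
import re


def clean_text(text):
    """A typical cleaning step: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def formatting_signals(text):
    """Count simple formatting cues that cleaning would erase."""
    return {
        "exclamation_marks": text.count("!"),
        "uppercase_words": sum(
            1 for word in text.split() if len(word) > 1 and word.isupper()
        ),
    }


raw = "SHOCKING!!! You won't BELIEVE what happened next!"
cleaned = clean_text(raw)

print(formatting_signals(raw))      # cues present in the raw headline
print(cleaned)
print(formatting_signals(cleaned))  # the same cues after cleaning
```

After cleaning, both counts drop to zero, so a model trained on the cleaned text can no longer use these cues.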

Certain words proved to be important predictors of fake vs. real news. The strongest predictors for each class are shown in a word cloud: [figure: wordcloudpredictivepower]

Deliverables

  1. Python Code: main.ipynb
  2. Predictions: testing_data_predicted.csv
  3. Accuracy estimation: 90%
  4. Presentation: View the Google Slides Presentation
