Predict Fake News

This project predicts whether news is fake or real from a dataset of news headlines, included in the dataset folder. We trained the model on data.csv and then predicted real or fake for validation_data.csv. Several machine learning models were tested and evaluated: Logistic Regression, Naive Bayes, Support Vector Machine, XGBoost and Random Forest.
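Naive Bayes is one of the models listed above. The sketch below is a minimal from-scratch multinomial Naive Bayes, not the code from main.ipynb. It assumes headlines as plain strings and hypothetical labels "fake" and "real" (the dataset's actual label encoding is not stated here), and the toy headlines are invented for illustration. The tokenizer keeps punctuation such as "!" as separate tokens, in line with the observation in the Analysis section that uncleaned text worked better.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens plus standalone punctuation such as '!'."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


class NaiveBayesClassifier:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods of known tokens.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][tok] + 1) / (self.totals[c] + v)
                    )
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Hypothetical toy headlines and labels, for illustration only.
train_texts = [
    "SHOCKING!!! You won't believe what they did",
    "Miracle cure doctors hate!!!",
    "Senate passes budget bill after long debate",
    "Central bank holds interest rates steady",
]
train_labels = ["fake", "fake", "real", "real"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict(["Unbelievable miracle!!!", "Senate debate on interest rates"]))
# ['fake', 'real']
```

In practice a library implementation (for example scikit-learn's MultinomialNB on a CountVectorizer matrix) would be the usual choice; the from-scratch version only shows what the model computes: a log prior per class plus add-one-smoothed log word likelihoods.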

Final result:

The best result was achieved with a Logistic Regression model.

Accuracy: 94.74% (the model correctly classified 94.74% of the news articles).

F1-score (balance of precision and recall):

Fake news: 94.86%
Real news: 94.62%
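Accuracy and per-class F1 follow the standard definitions: F1 for a class is the harmonic mean of that class's precision and recall. A small pure-Python sketch of the computation follows; the labels "fake"/"real" and the example predictions are hypothetical and are not the project's validation results. scikit-learn's accuracy_score and f1_score(..., average=None) compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score_for(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not the project's validation results.
y_true = ["fake", "fake", "fake", "real", "real"]
y_pred = ["fake", "fake", "real", "real", "fake"]
print(accuracy(y_true, y_pred))                        # 0.6
print(round(f1_score_for(y_true, y_pred, "fake"), 4))  # 0.6667
print(f1_score_for(y_true, y_pred, "real"))            # 0.5
```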

Analysis

Several data cleaning techniques were tried, but fake news was predicted better when the data was not cleaned. Presumably, fake news contains more poor formatting, such as repeated exclamation marks, which cleaning removes. We also tried stemming and lemmatization, but these techniques lost important context, so accuracy went down.
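A hedged illustration of the kind of formatting signal that cleaning can discard. The cleaning rule here (lowercase, keep only the letters a-z) is an assumed example and not necessarily the exact cleaning the project tried; the headline is hypothetical.

```python
import re


def raw_tokens(text):
    """Keep words and punctuation (e.g. each '!') as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def cleaned_tokens(text):
    """A typical cleaning step: lowercase and drop everything but letters."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


headline = "BREAKING!!! Celebrity SHOCKS fans"  # hypothetical example
print(raw_tokens(headline))
# ['BREAKING', '!', '!', '!', 'Celebrity', 'SHOCKS', 'fans']
print(cleaned_tokens(headline))
# ['breaking', 'celebrity', 'shocks', 'fans']
```

After cleaning, both the three exclamation marks and the all-caps emphasis are gone, so a model trained on cleaned text cannot use them as features.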

Certain words proved to be important predictors of fake vs real news. These words were the strongest predictors for each class:
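For a linear model such as Logistic Regression, per-class predictor words are typically read off the learned weights: the most positive weights point to one class and the most negative to the other. The sketch below assumes that convention; the vocabulary, weights and the mapping "class 1 = fake" are hypothetical. With scikit-learn, the names would come from the vectorizer's get_feature_names_out() and the weights from model.coef_[0]; for a binary LogisticRegression, positive weights push toward model.classes_[1].

```python
def strongest_predictors(feature_names, coefficients, k=3):
    """Split features into the k most negative and k most positive weights."""
    ranked = sorted(zip(coefficients, feature_names))
    most_negative = [name for _, name in ranked[:k]]
    most_positive = [name for _, name in reversed(ranked[-k:])]
    return most_negative, most_positive


# Hypothetical vocabulary and weights, assuming class 1 = fake.
names = ["!", "shocking", "miracle", "senate", "budget", "said"]
weights = [2.1, 1.7, 1.2, -1.5, -0.9, -2.3]
real_words, fake_words = strongest_predictors(names, weights)
print(fake_words)  # ['!', 'shocking', 'miracle']
print(real_words)  # ['said', 'senate', 'budget']
```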

Deliverables

Python Code: main.ipynb
Predictions: testing_data_predicted.csv
Accuracy estimation: 90%
Presentation: View the Google Slides Presentation

Repository: KJanzon/project-nlp-challenge

Folders and files

Latest commit

History

Repository files navigation

Predict Fake News

Final result:

Analysis

Deliverables

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages