USPECS: US Presidential Election Candidate Speeches Corpus

Overview

Project Aim: Identify differences in thematic word fields and sentiment in US presidential campaign speeches between 2007 and 2024, comparing Democrats and Republicans.
Methods: Natural language processing methods such as topic modelling and sentiment analysis.
Key Findings: Differences in thematic word fields across election periods, and insights into rhetorical patterns by speaker.

Dataset

Speeches were extracted from the American Presidency Project using the Python library beautifulsoup4.
The corpus contains 938 documents with 4.16 million tokens.
Preprocessing used a pipeline of tokenization, POS tagging, lemmatization, and named entity recognition, built with additional Python libraries.

Methodology

Techniques
- Topic modelling: LDA with Gensim to extract thematic word fields.
- Sentiment analysis: A lexicon-based approach with NLTK's VADER sentiment analyzer to extract sentiment scores per sentence.

Libraries
- Main Python libraries: spaCy, NLTK, SciPy, and pandas.
- For visualization: Plotly, Wordcloud, and pyLDAvis.
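To illustrate what "lexicon-based, per-sentence" scoring means, here is a minimal standard-library sketch. It is not VADER and not the project's code: the tiny `TOY_LEXICON`, the naive sentence splitter, and the function names are all assumptions made for illustration. VADER itself uses a much larger weighted lexicon plus rules for negation and intensifiers.

```python
from __future__ import annotations

import re

# Toy lexicon for illustration only (hypothetical values, not taken from VADER).
TOY_LEXICON = {"great": 1.0, "hope": 0.5, "strong": 0.5, "crisis": -1.0, "fail": -1.0}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def score_sentence(sentence: str, lexicon: dict[str, float] = TOY_LEXICON) -> float:
    """Average the lexicon values of matching words; 0.0 if no word matches."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [lexicon[word] for word in words if word in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def score_speech(text: str) -> list[tuple[str, float]]:
    """Return one (sentence, score) pair per sentence of the speech."""
    return [(sentence, score_sentence(sentence)) for sentence in split_sentences(text)]
```

Calling `score_speech(...)` on a speech text yields a list of sentence/score pairs that could then be aggregated per speaker or per election period.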

Getting started

The data folder is not included in the repository; instead, it is attached to a release.

To update the data:

1. Clone the repository and download the release.
2. Unzip the release and place the data folder alongside the repository.
3. Remove the rows of unwanted speeches from either metadata.csv or metadata.xlsx inside the data folder.
4. Run preprocessing/dataUpdater.py once. It deletes the corresponding txt files and updates the graphic.
5. Zip the updated data folder and create a new release (increment the version and document the changes).
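As a rough picture of the file-deletion part of this workflow, the sketch below removes text files that metadata.csv no longer lists. It is not the actual preprocessing/dataUpdater.py: the `filename` column, the flat folder layout, and the function name `remove_orphaned_speeches` are hypothetical, and the graphic update is left out.

```python
from __future__ import annotations

import csv
from pathlib import Path


def remove_orphaned_speeches(data_dir: Path, id_column: str = "filename") -> list[Path]:
    """Delete .txt files in data_dir that metadata.csv no longer lists.

    Assumption: metadata.csv has a column (here `filename`) holding the
    full file name of each speech, and all .txt files sit directly in data_dir.
    """
    with (data_dir / "metadata.csv").open(newline="", encoding="utf-8") as handle:
        keep = {row[id_column] for row in csv.DictReader(handle)}

    removed = []
    for txt_file in sorted(data_dir.glob("*.txt")):
        if txt_file.name not in keep:
            txt_file.unlink()  # permanently deletes the file
            removed.append(txt_file)
    return removed
```

Because the deletion is permanent, it is worth running such a step on a copy of the data folder first, or printing the candidate list before unlinking anything.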

Repository: SojaSurfer/USPECSCorpus
