Protest event detection

This code runs the protest event detection models that were trained as part of my master's thesis. There are five different models:

Haystack classifier: Predict whether a given text is about a protest (binary classification).
Haystack and form classifier: Predict whether a given text is about a protest, and the form of the protest (multiclass classification).
Haystack and issue classifier: Predict whether a given text is about a protest, and the issue of the protest (multiclass classification).
Haystack and target classifier: Predict whether a given text is about a protest, and the target of the protest (multiclass classification).
Full multitask classifier: Jointly predicts all the above tasks.

List of classes

Haystack classes

Non-protest
Protest

Form classes

Blockade/slowdown/disruption
Boycott
Hunger strike
March
Non-protest
Rally/demonstration
Riot
Strike/walkout/lockout

Issue classes

Anti-colonial/political independence
Anti-war/peace
Criminal justice system
Democratisation
Economy/inequality
Environmental
Foreign policy
Human and civil rights
Labour & work
Non-protest
Political corruption/malfeasance
Racial/ethnic rights
Religion
Social services & welfare
None of the above

Target classes

Domestic government
Foreign government
Individual
Intergovernmental organisation
Non-protest
Private/business

How-to

Running the script requires Python 3.7+ and the following packages:

PyTorch 1.4 (works with and without CUDA)
huggingface transformers
tqdm
numpy
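
Before running the script, it can help to confirm that the packages above are importable. The following is a minimal sketch that uses only the standard library; the helper name missing_packages is illustrative and not part of this repository, and the check does not verify installed versions (e.g. PyTorch 1.4).

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["torch", "transformers", "tqdm", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```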

The models can be downloaded from here (approx. 4.2GB). The archive must be extracted into the src/ directory, for example with 7-Zip on Windows or with tar on Linux/Mac.

Using tar, the command is

tar -xzvf models.tar.gz -C path/to/protest-event-detection/src/
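
If tar is not available, the same extraction can probably be done from Python with the standard tarfile module. This is a sketch under the assumption that the download is a gzip-compressed tar archive named models.tar.gz, as in the command above; the function name extract_models is illustrative, and the destination path is the same placeholder as in the tar command.

```python
import tarfile
from pathlib import Path


def extract_models(archive, dest):
    """Extract a .tar.gz archive into dest, creating dest if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


# Only attempt the extraction if the archive is in the current directory.
archive = Path("models.tar.gz")
if archive.exists():
    extract_models(archive, "path/to/protest-event-detection/src/")
```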

The main script to run is classify_article.py. It has several parameters:

--article: Path to a text file with the raw article text to make predictions on. The models work best on articles of at most about 350 words.
--output_path (optional): Path to the directory where the prediction file will be stored. By default, the script prints predictions to the terminal and does not write them to a file.
--out_file (optional): Name of the output file where the prediction will be stored, if --output_path is set. Default name: pred.txt.
--task: Which prediction model to use. Possible values are haystack, form, issue, target and multi.
--mc_samples (optional): Number of times to make predictions on the same article. This puts the model in training mode, so that its predictions are stochastic. The final prediction is then the average of those predictions. This gives additional output (uncertainty estimates). If not set, the model makes a single prediction in evaluation mode. Recommended value is 50 if not zero. Default value: 0.
--gpu_devices (optional): If CUDA is available, the script uses the GPU(s). This option selects which GPU to use, e.g. --gpu_devices 1 to use the second GPU in a system with multiple GPUs. With CUDA available, a single GPU is more than enough to classify one article, so it is normally not necessary to change this option. Default value: 0 (the first GPU).
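
To illustrate what --mc_samples does conceptually, the sketch below averages several stochastic predictions for one article and reports a per-class spread. It is not the code from classify_article.py: the function name summarize_mc_samples, the example probabilities, and the use of the standard deviation as the uncertainty estimate are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize_mc_samples(samples, class_names):
    """Average per-class probabilities over several stochastic predictions.

    samples: list of probability vectors, one per stochastic prediction.
    Returns (predicted class name, mean probabilities, standard deviations).
    """
    if not samples:
        raise ValueError("need at least one sample")
    per_class = list(zip(*samples))  # one tuple of sampled values per class
    mean_probs = [mean(values) for values in per_class]
    spread = [pstdev(values) for values in per_class]  # assumed uncertainty measure
    best = max(range(len(mean_probs)), key=mean_probs.__getitem__)
    return class_names[best], mean_probs, spread


# Made-up probabilities from three stochastic predictions (haystack task).
samples = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
label, mean_probs, spread = summarize_mc_samples(samples, ["Non-protest", "Protest"])
print(label, mean_probs, spread)
```

A larger spread for the predicted class suggests that the individual predictions disagree, which is the kind of signal an uncertainty estimate is meant to convey.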

Note: Depending on your setup, running predictions without CUDA/GPU can be considerably slower.

Run examples

Make a single haystack prediction without uncertainty estimates:

python classify_article.py --article ../path/to/article.txt --task haystack

Make 50 Monte Carlo haystack predictions and output to file:

python classify_article.py --article ../path/to/article.txt --mc_samples 50 --task haystack --output_path ../some/path/

Make 50 Monte Carlo haystack and form predictions and output to file with specific name:

python classify_article.py --article ../path/to/article.txt --mc_samples 50 --task form --output_path ../some/path/ --out_file prediction.txt

Make 50 Monte Carlo predictions for all tasks and output to file:

python classify_article.py --article ../path/to/article.txt --mc_samples 50 --task multi --output_path ../some/path/
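
Since the script classifies one article per call, a folder of articles could be handled with a small wrapper that invokes it once per file. The sketch below assumes it is run from the directory that contains classify_article.py and uses only the parameters documented above; the helper names build_command and classify_folder and the "_pred.txt" naming scheme are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Task values documented for --task.
TASKS = {"haystack", "form", "issue", "target", "multi"}


def build_command(article, task, mc_samples=0, output_path=None, out_file=None):
    """Build the argument list for one call to classify_article.py."""
    if task not in TASKS:
        raise ValueError("unknown task: " + task)
    cmd = [sys.executable, "classify_article.py",
           "--article", str(article), "--task", task]
    if mc_samples:
        cmd += ["--mc_samples", str(mc_samples)]
    if output_path:
        cmd += ["--output_path", str(output_path)]
    if out_file:
        cmd += ["--out_file", str(out_file)]
    return cmd


def classify_folder(folder, task, output_path, mc_samples=0):
    """Run the script once for every .txt file in folder."""
    for article in sorted(Path(folder).glob("*.txt")):
        # Hypothetical naming scheme: one prediction file per article.
        out_file = article.stem + "_pred.txt"
        cmd = build_command(article, task, mc_samples, output_path, out_file)
        subprocess.run(cmd, check=True)
```

For example, classify_folder("../path/to/articles/", "multi", "../some/path/", mc_samples=50) would run all tasks with 50 Monte Carlo samples on each article. Each call starts a new process, so the model is presumably reloaded every time.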


