Repository for developing Fake-Real Image Detection models using the Active Learning framework.
Install the required packages by running the following:
```sh
pip install -r requirements.txt
```

We use a subset of the Sentry dataset for training and testing models. All images are resized to 256x256 and compressed as JPEGs to reduce size and mimic the format most commonly used for uploading images to websites. The subset we partition contains 240k fake and 240k real images.
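As a rough illustration of the preprocessing described above (a sketch with Pillow, not the repo's actual pipeline; the JPEG quality setting is an assumption, since the README does not specify one):

```python
from pathlib import Path
from PIL import Image

def preprocess_image(src: str, dst: str, size=(256, 256), quality=90) -> None:
    """Resize an image to 256x256 and re-encode it as a JPEG.

    The quality value is an assumption; the dataset's actual
    compression level is not specified here.
    """
    with Image.open(src) as im:
        im = im.convert("RGB")                # JPEG has no alpha channel
        im = im.resize(size, Image.BILINEAR)  # fixed 256x256, aspect not preserved
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        im.save(dst, "JPEG", quality=quality)
```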
Download the dataset from Kaggle here. Alternatively, use the scripts below to manually download the fake and real data. The scripts allow for custom partition sizes of the fake data.
```sh
cd CS545-Real-Fake-Image-Detection
```

Run the following script to download, compress, and generate the subset of fake images and metadata used from Sentry.
Note: for each dataset in Sentry, the respective tar files are downloaded together, extracted, and then deleted to save space. This means you will need at most ~100 GB for the intermediate download process; the final compressed subset is only ~18 GB.
```sh
python3 datagen/make_sentry_subset.py < /dev/null > log.txt 2>&1 &
```

For real image data, we use CC3M (Google Conceptual Captions), FFHQ, and AFHQv2 for training, and CC3M and CelebA-HQ for testing.
For CC3M, we use 155k of the train images and all of the val images.
Run the following script to download all real data components:
```sh
python3 datagen/cc3m/add_real_data.py < /dev/null > log.txt 2>&1 &
```

After downloading the data, there should be 482k train images and 187k validation images.
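As a quick sanity check on those counts, the downloaded directories can be scanned with a short script (a sketch; the exact directory layout of the subset is an assumption):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(root: str) -> int:
    """Recursively count image files under a directory."""
    return sum(1 for p in Path(root).rglob("*")
               if p.suffix.lower() in IMAGE_EXTS)
```

For example, running `count_images` on the train folder should report roughly 482k files if the download completed.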
We leverage Active Learning to maintain or improve model performance while reducing the overall size of the training data.
An example of how to use the Active Learning code is located in UniversalFakeDetect/train.py under the train_active_learning() function.
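For orientation, pool-based Active Learning follows a simple loop: score the unlabeled pool with an acquisition function, move the top-scoring samples into the labeled set, and retrain. A minimal, framework-free sketch (the names and signature here are illustrative, not the actual `train_active_learning()` API):

```python
def active_learning_loop(pool, labeled, acquire, train_fn, query_size, rounds):
    """Pool-based active learning: repeatedly label the most
    informative samples (highest acquisition score) and retrain."""
    for _ in range(rounds):
        scores = acquire(pool)                     # one score per pool item
        ranked = sorted(range(len(pool)), key=scores.__getitem__, reverse=True)
        picked = set(ranked[:query_size])          # indices to "label" this round
        labeled = labeled + [pool[i] for i in sorted(picked)]
        pool = [x for i, x in enumerate(pool) if i not in picked]
        train_fn(labeled)                          # retrain on the grown labeled set
    return labeled, pool
```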
The models currently available are listed below. The names listed can be directly input as an argument for the --arch option.
- Imagenet:resnet18
- Imagenet:resnet34
- Imagenet:resnet50
- Imagenet:resnet101
- Imagenet:resnet152
- Imagenet:vgg11
- Imagenet:vgg19
- Imagenet:vit_b_16
- CLIP:RN50
- CLIP:RN101
- CLIP:ViT-L/14
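The names above follow a `source:model` convention. How the repo parses them internally is not shown here, but a plausible split looks like this (hypothetical helper, not the repo's actual code):

```python
def parse_arch(arch: str) -> tuple:
    """Split an --arch string like "CLIP:ViT-L/14" into
    (weight source, model name). Hypothetical helper for illustration."""
    source, sep, model = arch.partition(":")
    if not sep or not model:
        raise ValueError(f"expected '<source>:<model>', got {arch!r}")
    return source, model
```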
Each acquisition function below has an associated index that can be directly input as an argument for the --acq_func option.
- Random Uniform Sampling
- Max Entropy
- BALD
- Variational Ratios
- Mean Standard Deviation
- Loss Weighted Max Entropy
- Loss Weighted BALD
- Loss Weighted Variational Ratios
- Loss Weighted Mean Standard Deviation
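To make the first two acquisition functions concrete, here is a NumPy sketch of Max Entropy and BALD computed from T stochastic (e.g. MC-dropout) forward passes. This illustrates the standard formulas, not the repo's implementation:

```python
import numpy as np

def max_entropy(mc_probs: np.ndarray) -> np.ndarray:
    """Predictive entropy of the mean class distribution.
    mc_probs: (T, N, C) softmax outputs from T stochastic passes."""
    mean = mc_probs.mean(axis=0)                               # (N, C)
    return -(mean * np.log(mean + 1e-12)).sum(axis=1)          # (N,)

def bald(mc_probs: np.ndarray) -> np.ndarray:
    """BALD: mutual information between the prediction and the model
    posterior (predictive entropy minus expected per-pass entropy)."""
    mean = mc_probs.mean(axis=0)
    pred_ent = -(mean * np.log(mean + 1e-12)).sum(axis=1)
    exp_ent = -(mc_probs * np.log(mc_probs + 1e-12)).sum(axis=2).mean(axis=0)
    return pred_ent - exp_ent
```

Samples with the highest scores are the ones queried for labeling; BALD favors samples the stochastic passes disagree on, while Max Entropy favors samples whose averaged prediction is uncertain.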
Run the train_normal.sh script for normal training without Active Learning. Edit the arguments in the script to change the model and adjust training hyperparameters.
```sh
bash UniversalFakeDetect/train_normal.sh
```

To run training with Active Learning:
```sh
bash UniversalFakeDetect/train_active_learning.sh
```

Note: testing is automatically performed after Active Learning training finishes. The results dict is printed to the output log file.
First, in the UniversalFakeDetect folder, open dataset_paths.py and set the first line of code, ROOT=, to the path of your Sentry subset folder.
Run the following to perform inference for either normal or active learning checkpoints. Make sure to edit the path arguments for the checkpoint file and the save directory.
```sh
bash UniversalFakeDetect/test.sh
```

Explainer training data is generated with LLaVA 1.6-7b and source category-guided prompts, as shown in LLaVA_Experiments/llava_train_data.py. The real_fake_llava_train.json file can be found here.
Human-like reasoning for why our LoRA (PEFT) finetuned LLaVA model classifies a test image as real or fake can be found here in the test_explanations_and_classifications.txt file.