Analyze AMR-related Instagram content using Instaloader, NLP, and image recognition — bridging social media and public health research.

WeiJanChang/amr

AMR

This project aims to categorize and evaluate images, videos, and posts related to antimicrobial resistance (AMR) on Instagram, using the Instagram API via Instaloader.

Installation

  • Create an environment for the required dependencies:
conda create -n [ENV_NAME] python=3.10
conda activate [ENV_NAME]
cd [CLONED_DIRECTORY]
pip install -r requirements.txt

Flowchart

Example 2

Usage

Pre-processing

  • After extracting the data with Apify and downloading the JSON files, load the directory that contains them, remove unused fields and duplicate post IDs, and convert the result to a DataFrame.
    from Ig_info import LatestPostInfo, load_from_directory
    # Set the directory path to your local JSON files
    d = 'your/json/folder/path'
    ify = load_from_directory(d)
    # Collect the latest posts from each loaded JSON file
    info = [it.collect_latest_posts() for it in ify]
    ret = LatestPostInfo.concat(info)  # merge the results from all JSON files
    df = ret.remove_unused_fields().remove_duplicate().to_dataframe()
  • Next, use the images_download.py script to download the images, videos, and text files. Each post gets its own folder named after its ID, and the files inside are named after the same ID. The status of every download ("successful", "connection_error", or "post_unavailable") is logged to a CSV file.
    # create_latestpost_info and download_image must be imported from the project's scripts first
    d = 'your/json/folder/path'
    info = create_latestpost_info(d)
    downloaded_dir = 'your/output/image/folder'
    log_download = 'your/download/log.csv'
    download_image(info, output_path=downloaded_dir, log_download=log_download)
  • After downloading, use the same images_download.py script to consolidate the files from the individual ID folders into a single directory and to summarize the number of images and videos per ID.
    overview = 'your/overview/file.csv'
    ret = download_postprocess(downloaded_dir, new_dir=downloaded_dir, out=overview)
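The consolidation step can be pictured with the following minimal sketch. It is not the code in images_download.py: the function name `consolidate`, the extension sets, and the CSV column names are assumptions made for illustration, and the sketch only handles images and videos (text files are left in place).

```python
import csv
import shutil
from pathlib import Path

# Assumed extension sets; adjust to the media types you actually downloaded.
IMAGE_EXT = {".jpg", ".jpeg", ".png"}
VIDEO_EXT = {".mp4", ".mov"}


def consolidate(src_root, dest_dir, overview_csv):
    """Copy media out of per-ID folders into dest_dir and count files per ID."""
    src_root, dest_dir = Path(src_root), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Every sub-folder of src_root is treated as one post ID,
    # except the destination folder itself if it lives under src_root.
    id_dirs = sorted(
        p for p in src_root.iterdir()
        if p.is_dir() and p.resolve() != dest_dir.resolve()
    )

    summary = {}
    for id_dir in id_dirs:
        counts = {"images": 0, "videos": 0}
        for f in sorted(id_dir.iterdir()):
            ext = f.suffix.lower()
            if ext in IMAGE_EXT:
                counts["images"] += 1
            elif ext in VIDEO_EXT:
                counts["videos"] += 1
            else:
                continue  # skip text files and anything else
            # File names already carry the post ID, so they are kept unchanged.
            shutil.copy2(f, dest_dir / f.name)
        summary[id_dir.name] = counts

    # Write one row per post ID to the overview file.
    with open(overview_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "images", "videos"])
        for post_id, c in summary.items():
            writer.writerow([post_id, c["images"], c["videos"]])
    return summary
```

Calling `consolidate('your/output/image/folder', 'your/merged/folder', 'your/overview/file.csv')` would return a dictionary mapping each ID to its image and video counts, which is the same information written to the overview CSV.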

Post-processing

Image Text Extraction

If you need to extract text from .jpg images automatically, use the extract_text_from_image.py script.

  • Modify the directory variable in the script to point to your folder containing .jpg images.
  • The script processes each image, extracts the text, and saves the results to a CSV file.

Statistics

Descriptive statistics

The descriptive_stats function displays descriptive statistics (count and percentage) for a given column in a DataFrame. You can modify the grouping column(s) as needed depending on your analysis goals.

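    import pandas as pd
    from descriptive_stats import descriptive_stats

    df = pd.read_excel('~/code/amr/test_file/post_processed_data.xlsx')
    # Statistics for likesCount, grouped by category and year
    descriptive_stats(df, 'likesCount', groupby_col=['cat', 'year'])
    # Statistics for the category column without grouping
    descriptive_stats(df, col_name='cat')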
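To make "count and percentage" concrete, here is a small standard-library sketch. It is not the implementation of descriptive_stats; the helper name `count_and_percentage` and the example category labels are hypothetical.

```python
from collections import Counter


def count_and_percentage(values):
    """Return {value: (count, percentage)} ordered from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        value: (n, round(100 * n / total, 1))
        for value, n in counts.most_common()
    }


# Hypothetical category labels, one per post
cats = ["education", "humour", "education", "news"]
result = count_and_percentage(cats)
print(result)
```

To group by more than one column, the same helper can be fed tuples such as `(cat, year)` instead of single values.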

Cohen’s kappa

Use this script to calculate Cohen’s kappa, a statistical measure of inter-annotator agreement.

  • coder_1 and coder_2 are flexible: just pass the column names of any two coders you want to compare.
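  • Cohen’s kappa ranges from -1 to 1. Values close to 1 indicate strong agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.
# pandas must be imported as pd, and cal_kappa imported from the project's script
df = pd.read_excel("~/code/amr/test_file/coders_messages.xlsx")
cal_kappa(df, coder_1='coder1', coder_2='coder2')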
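For readers who want to see what the statistic measures, the sketch below computes Cohen's kappa from two lists of labels using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is an illustration only, not the code behind cal_kappa; the function name `cohen_kappa` and the example labels are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_1, labels_2):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_1) != len(labels_2) or len(labels_1) == 0:
        raise ValueError("Both coders must label the same, non-empty set of items")
    n = len(labels_1)

    # Observed agreement: share of items on which both coders chose the same label.
    p_o = sum(a == b for a, b in zip(labels_1, labels_2)) / n

    # Chance agreement: product of each coder's label proportions, summed over labels.
    c1, c2 = Counter(labels_1), Counter(labels_2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)

    if p_e == 1.0:
        # Both coders used a single identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations for four posts
coder1 = ["a", "a", "b", "b"]
coder2 = ["a", "a", "b", "a"]
kappa = cohen_kappa(coder1, coder2)
```

In this example the coders agree on three of four posts (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5.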

Data Visualisation

Example 1

Contact

Wei Jan Chang, [email protected]
