
PLP-Pipeline is a patient-level prediction pipeline using Julia and DrWatson. It predicts diabetes onset in hypertension patients by analyzing healthcare data with OMOP CDM. The pipeline includes cohort definition, feature extraction, preprocessing, and model training using L1 Logistic Regression, Random Forest, and XGBoost.


kosuri-indu/PLP-Pipeline


PLP-Pipeline

This code base uses the Julia Language and DrWatson to make a reproducible scientific project named PLP-Pipeline. Authored by kosuri-indu, it demonstrates a pipeline for patient-level prediction. Special thanks to @TheCedarPrince for guiding this project.

Research Question

  • Can we predict the onset of diabetes in patients diagnosed with hypertension using patient-level observational healthcare data?
  • The goal is to build a predictive model that identifies patterns from historical healthcare data to determine which patients with hypertension are more likely to develop diabetes.
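To make the prediction target concrete, here is a minimal sketch of how an outcome label could be derived for each patient. It is not taken from the repository: the function name `develops_outcome`, the 365-day time-at-risk window, and the toy dates are all assumptions chosen for illustration.

```julia
using Dates

# Hypothetical helper: label a patient as positive when the first diabetes
# diagnosis falls strictly after the hypertension index date and within an
# assumed time-at-risk window (365 days here, purely illustrative).
function develops_outcome(index_date::Date, outcome_date::Union{Date,Nothing};
                          risk_window::Period = Day(365))
    outcome_date === nothing && return false   # no diabetes diagnosis on record
    return index_date < outcome_date <= index_date + risk_window
end

# Toy patients: (hypertension index date, first diabetes diagnosis or nothing)
patients = [
    (Date(2020, 1, 10), Date(2020, 6, 1)),    # diagnosed inside the window
    (Date(2020, 1, 10), Date(2022, 3, 5)),    # diagnosed after the window closes
    (Date(2020, 1, 10), nothing),             # never diagnosed
    (Date(2020, 1, 10), Date(2019, 12, 1)),   # diabetes recorded before hypertension
]

labels = [develops_outcome(idx, out) for (idx, out) in patients]
println(labels)
```

Only the first toy patient is labeled positive. How the actual pipeline defines its window and handles prior diabetes is determined by its cohort and outcome scripts, not by this sketch.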

Getting Started

To (locally) reproduce this project, do the following:

  1. Download the Code Base. Note: raw data are not included in the repository, so you will need to download them separately.

  2. Set Up the Julia Environment. Open a Julia console and execute:

    using Pkg
    Pkg.add("DrWatson")        # Install DrWatson globally
    Pkg.activate("path/to/this/project")
    Pkg.instantiate()          # Install all necessary packages
    

This installs every package the scripts need. Everything should then work out of the box, including the resolution of local paths.

You may notice that most scripts start with the commands:

using DrWatson
@quickactivate "PLP-Pipeline"

which auto-activate the project and enable local path handling from DrWatson.
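As a rough illustration of what the local path handling provides, the sketch below uses DrWatson's path helpers (`projectdir`, `datadir`, `scriptsdir`) to build paths relative to the active project. It assumes the project has already been activated; the `raw` subfolder is a hypothetical name and may not match the repository's actual layout.

```julia
using DrWatson

# Assumes the project is already active, for example via
# @quickactivate "PLP-Pipeline" at the top of a script.

# "raw" is a hypothetical subfolder name used only for illustration.
raw_dir    = datadir("raw")             # <project>/data/raw
run_script = scriptsdir("run_plp.jl")   # <project>/scripts/run_plp.jl

println("Project root:    ", projectdir())
println("Raw data dir:    ", raw_dir)
println("Pipeline script: ", run_script)
```

Because every path is derived from the project root, the same script should run unchanged regardless of the directory Julia was started in.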

Usage

Setting Up the Database

Set up the DuckDB database with your data by running:

julia> include("scripts/setup_db.jl")

Running the Pipeline

To run the machine learning pipeline, use the run_plp.jl script:

julia> include("scripts/run_plp.jl")
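The two steps above can also be chained from a single driver. The following is a minimal sketch, not part of the repository: `run_scripts` is a hypothetical helper that includes each script in order, stops at the first missing file, and logs how long each step took.

```julia
# Hypothetical helper: run a list of Julia scripts in order.
function run_scripts(paths::Vector{String})
    for path in paths
        full = abspath(path)                      # resolve against the current directory
        isfile(full) || error("Script not found: $full")
        elapsed = @elapsed include(full)          # evaluate the script at top level
        @info "Finished $(basename(full))" seconds = round(elapsed; digits = 2)
    end
    return nothing
end

# Assumes Julia was started in the project root:
# run_scripts(["scripts/setup_db.jl", "scripts/run_plp.jl"])
```

Running the database setup first mirrors the order given in this section; the call is left commented out because it depends on your local copy of the data.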

TODO List

  • Documentation & Research

    • Add initial documentation in the _research folder
    • Expand documentation with detailed research questions and hypotheses
  • Core Pipeline Implementation

    • Set up project structure with DrWatson
    • Database setup (setup_db.jl)
    • Data loading (data_loader.jl)
    • Cohort definition (cohort_definition.jl)
    • Feature extraction (feature_extraction.jl)
    • Distribution check (distribution_check.jl)
    • Outcome attachment (outcome_attach.jl)
    • Data preprocessing (preprocessing.jl)
    • Model training & evaluation (train_model.jl)
  • Future Enhancements

    • Add robust error handling and logging
    • Refine research questions and incorporate additional clinical variables
    • Develop tests and expand documentation further

References

  • OHDSI Patient-Level Prediction in R
  • Reps, J. M., Schuemie, M. J., Suchard, M. A., Ryan, P. B., & Rijnbeek, P. R. (2018). Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. Journal of the American Medical Informatics Association, 25(8), 969–975. https://doi.org/10.1093/jamia/ocy032
