PLP-Pipeline

This code base uses the Julia language and DrWatson to make a reproducible scientific project named PLP-Pipeline. Authored by kosuri-indu, it demonstrates a pipeline for patient-level prediction. Special thanks to @TheCedarPrince for guiding this project.

Research Question

Can we predict the onset of diabetes in patients diagnosed with hypertension using patient-level observational healthcare data?
The goal is to build a predictive model that identifies patterns from historical healthcare data to determine which patients with hypertension are more likely to develop diabetes.
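To make the prediction target concrete, the research question can be sketched as a labeling rule: take patients with a hypertension diagnosis, exclude those who already had diabetes, and mark whether diabetes is first recorded within some follow-up window. The `Patient` struct, its field names, the sample records, and the five-year window below are all illustrative assumptions, not the definitions used by this project.

```julia
using Dates

# Hypothetical record layout; field names are illustrative only.
struct Patient
    id::Int
    hypertension_date::Date             # index date: first hypertension diagnosis
    diabetes_date::Union{Date,Nothing}  # nothing if diabetes was never recorded
end

# A patient is eligible only if diabetes was not recorded on or before the index date.
eligible(p::Patient) =
    p.diabetes_date === nothing || p.diabetes_date > p.hypertension_date

# Label is 1 if diabetes is first recorded after the index date and within
# `window` of it, otherwise 0. The 5-year default is an assumption.
function outcome_label(p::Patient; window::Period = Year(5))
    d = p.diabetes_date
    d === nothing && return 0
    return (p.hypertension_date < d <= p.hypertension_date + window) ? 1 : 0
end

# Made-up example records.
patients = [
    Patient(1, Date(2015, 3, 1), Date(2017, 6, 15)),  # onset inside the window
    Patient(2, Date(2016, 1, 10), nothing),           # diabetes never recorded
    Patient(3, Date(2014, 5, 20), Date(2012, 2, 1)),  # diabetes before index: excluded
    Patient(4, Date(2010, 7, 4), Date(2019, 9, 9)),   # onset after the window
]

cohort = filter(eligible, patients)
labels = Dict(p.id => outcome_label(p) for p in cohort)
println(labels)
```

A real pipeline would derive these dates from the database rather than from hand-written records; the point here is only the shape of the cohort and outcome logic.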

Getting Started

To (locally) reproduce this project, do the following:

Download the code base. Note: raw data are not included in the repository. You will need to download them separately.

Set up the Julia environment. Open a Julia console and execute:

using Pkg
Pkg.add("DrWatson")        # Install DrWatson globally
Pkg.activate("path/to/this/project")
Pkg.instantiate()          # Install all necessary packages

This installs every package the scripts need. After that, the scripts should run out of the box, including resolving local paths correctly.

You may notice that most scripts start with the commands:

using DrWatson
@quickactivate "PLP-Pipeline"

which auto-activate the project and enable local path handling from DrWatson.
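As a rough illustration of the local path handling, DrWatson provides helpers such as `projectdir`, `datadir`, `scriptsdir`, and `srcdir` that build paths relative to the active project. The sketch below assumes DrWatson is installed and the project is already activated (in a script, the `@quickactivate` line above would do that); the file and folder names passed in are taken from this repository's layout.

```julia
using DrWatson

# Each helper joins its arguments onto the active project's directory,
# so scripts do not depend on the current working directory.
root       = projectdir()               # project root
raw_dir    = datadir("exp_raw")         # <root>/data/exp_raw
script     = scriptsdir("run_plp.jl")   # <root>/scripts/run_plp.jl
source_dir = srcdir()                   # <root>/src

println("Project root: ", root)
println("Raw data:     ", raw_dir)
println("Pipeline:     ", script)
println("Sources:      ", source_dir)
```

Because the paths are resolved from the project rather than from `pwd()`, the same script can be included from any working directory.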

Usage

Setting Up the Database

Set up the DuckDB database with your data by running:

julia> include("scripts/setup_db.jl")
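The contents of setup_db.jl are not shown here, so the following is only a minimal sketch of what loading a CSV file into DuckDB from Julia can look like. It assumes the DuckDB.jl and DataFrames.jl packages are installed; the `person` table, its columns, and the temporary CSV file are hypothetical stand-ins for the real raw data.

```julia
using DuckDB       # also makes DBInterface available
using DataFrames

# Write a tiny hypothetical CSV so the example runs without the real raw data.
dir = mktempdir()
csv_path = joinpath(dir, "person.csv")
write(csv_path, "person_id,year_of_birth\n1,1960\n2,1975\n3,1982\n")

# An in-memory database keeps the sketch side-effect free; a file path
# in place of ":memory:" would persist the database on disk.
con = DBInterface.connect(DuckDB.DB, ":memory:")

# Let DuckDB infer column types while creating the table from the CSV.
DBInterface.execute(con,
    "CREATE TABLE person AS SELECT * FROM read_csv_auto('$csv_path')")

# Read the row count back as a DataFrame to confirm the load.
result = DataFrame(DBInterface.execute(con, "SELECT count(*) AS n FROM person"))
n_rows = result.n[1]
println("Loaded ", n_rows, " rows into person")

DBInterface.close!(con)
```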

Running the Pipeline

To run the machine learning pipeline, use the run_plp.jl script:

julia> include("scripts/run_plp.jl")

TODO List

Documentation & Research
- Add initial documentation in the _research folder
- Expand documentation with detailed research questions and hypotheses

Core Pipeline Implementation
- Set up project structure with DrWatson
- Database setup (setup_db.jl)
- Data loading (data_loader.jl)
- Cohort definition (cohort_definition.jl)
- Feature extraction (feature_extraction.jl)
- Distribution check (distribution_check.jl)
- Outcome attachment (outcome_attach.jl)
- Data preprocessing (preprocessing.jl)
- Model training & evaluation (train_model.jl)

Future Enhancements
- Add robust error handling and logging
- Refine research questions and incorporate additional clinical variables
- Develop tests and expand documentation further

References

OHDSI Patient-Level Prediction in R
Reps, J. M., Schuemie, M. J., Suchard, M. A., Ryan, P. B., & Rijnbeek, P. R. (2018). Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. Journal of the American Medical Informatics Association, 25(8), 969–975. https://doi.org/10.1093/jamia/ocy032
