This code base uses the Julia language and DrWatson to make a reproducible scientific project named PLP-Pipeline. Authored by kosuri-indu, it demonstrates a pipeline for patient-level prediction. Special thanks to @TheCedarPrince for guiding this project.
- Can we predict the onset of diabetes in patients diagnosed with hypertension using patient-level observational healthcare data?
- The goal is to build a predictive model that identifies patterns from historical healthcare data to determine which patients with hypertension are more likely to develop diabetes.
To (locally) reproduce this project, do the following:
- Download the Code Base. Note: Raw data are not included in the repository. You will need to download them separately.
- Set Up the Julia Environment. Open a Julia console and execute:
using Pkg
Pkg.add("DrWatson") # Install DrWatson globally
Pkg.activate("path/to/this/project")
Pkg.instantiate() # Install all necessary packages
This installs every package the scripts need, after which everything should work out of the box, including correct resolution of local paths.
You may notice that most scripts start with the commands:
using DrWatson
@quickactivate "PLP-Pipeline"
which auto-activate the project and enable local path handling from DrWatson.
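As a minimal sketch of what that local path handling looks like in practice, the snippet below uses DrWatson's standard path helpers (`projectdir`, `datadir`, `scriptsdir`). The `"raw"` subfolder name is an assumption made for illustration and may not match this repository's layout.

```julia
using DrWatson
@quickactivate "PLP-Pipeline"

# All helpers resolve relative to the active project root, so the same
# script works regardless of the directory it is launched from.
root = projectdir()

# "raw" is a hypothetical subfolder name used only for this example.
raw_data_path = datadir("raw")

# Path to one of the scripts mentioned in this README.
pipeline_script = scriptsdir("run_plp.jl")

println(root)
println(raw_data_path)
println(pipeline_script)
```

Because the paths are built from the project root rather than the current working directory, scripts stay portable across machines.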
Set up the DuckDB database with your data by running:
julia> include("scripts/setup_db.jl")
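The contents of `setup_db.jl` are not shown here, so the following is only a rough sketch of how a DuckDB database might be created and populated from Julia with DuckDB.jl. The table name, columns, and values are hypothetical, and an in-memory database is used so the example leaves no files behind.

```julia
using DuckDB

# In-memory database for illustration; a real setup would pass a file path.
con = DBInterface.connect(DuckDB.DB, ":memory:")

# Hypothetical table and columns, not taken from the repository.
DBInterface.execute(con, "CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
DBInterface.execute(con, "INSERT INTO person VALUES (1, 1961), (2, 1954)")

# With real data, DuckDB can ingest a CSV file directly in SQL, e.g.
#   CREATE TABLE person AS SELECT * FROM read_csv_auto('path/to/person.csv')

# Simple sanity check that the rows were loaded.
result = DBInterface.execute(con, "SELECT count(*) AS n FROM person")
print(result)

DBInterface.close!(con)
```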
To run the machine learning pipeline, use the run_plp.jl script:
julia> include("scripts/run_plp.jl")
- Documentation & Research
  - Add initial documentation in the _research folder
  - Expand documentation with detailed research questions and hypotheses
- Core Pipeline Implementation
  - Set up project structure with DrWatson
  - Database setup (setup_db.jl)
  - Data loading (data_loader.jl)
  - Cohort definition (cohort_definition.jl)
  - Feature extraction (feature_extraction.jl)
  - Distribution check (distribution_check.jl)
  - Outcome attachment (outcome_attach.jl)
  - Data preprocessing (preprocessing.jl)
  - Model training & evaluation (train_model.jl)
- Future Enhancements
  - Add robust error handling and logging
  - Refine research questions and incorporate additional clinical variables
  - Develop tests and expand documentation further
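To make the cohort definition, outcome attachment, and preprocessing stages listed above more concrete, here is a toy sketch using only the Julia standard library. It is not the repository's implementation: the records are synthetic, the field names are hypothetical, and the 1825-day (roughly five-year) time-at-risk window is an assumption chosen for illustration.

```julia
using Dates
using Statistics

# Synthetic patients; field names are illustrative, not the project's schema.
patients = [
    (id = 1, age = 54, hypertension_date = Date(2015, 3, 1),  diabetes_date = Date(2017, 6, 1)),
    (id = 2, age = 61, hypertension_date = Date(2016, 1, 15), diabetes_date = nothing),
    (id = 3, age = 47, hypertension_date = nothing,           diabetes_date = Date(2014, 2, 1)),
    (id = 4, age = 70, hypertension_date = Date(2014, 9, 9),  diabetes_date = Date(2013, 5, 5)),
    (id = 5, age = 58, hypertension_date = Date(2017, 7, 20), diabetes_date = nothing),
]

# Cohort definition: patients diagnosed with hypertension (the index event),
# excluding anyone who already had diabetes on or before the index date.
has_index(p) = p.hypertension_date !== nothing
no_prior_outcome(p) = p.diabetes_date === nothing || p.diabetes_date > p.hypertension_date
cohort = filter(p -> has_index(p) && no_prior_outcome(p), patients)

# Outcome attachment: label is true when diabetes onset falls inside the
# assumed time-at-risk window after the index date.
time_at_risk = Day(1825)
develops_diabetes(p) = p.diabetes_date !== nothing &&
                       p.diabetes_date <= p.hypertension_date + time_at_risk
labels = [develops_diabetes(p) for p in cohort]

# Preprocessing: standardize a single numeric feature (age) to zero mean
# and unit variance.
ages = Float64[p.age for p in cohort]
age_z = (ages .- mean(ages)) ./ std(ages)

println([p.id for p in cohort])
println(labels)
println(age_z)
```

The standardized feature and the labels would then feed a model training step, which is left out here because the model used in `train_model.jl` is not described in this README.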
- OHDSI Patient-Level Prediction in R
- Reps, J. M., Schuemie, M. J., Suchard, M. A., Ryan, P. B., & Rijnbeek, P. R. (2018). Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. Journal of the American Medical Informatics Association, 25(8), 969–975. https://doi.org/10.1093/jamia/ocy032