- Install MySQL Server and the MySQL JDBC connector. Start the server and fill in sql_cfg.json so that Spark can connect to the database
- Install the dependencies:
  `conda env create -f environment.yml`
- Run loadToSQL.py. It loads the Instacart data to a local path, reads it into Spark dataframes, and writes them from Spark to the SQL server
- See sql.md for the SQL analysis. It contains the SQL queries and an analysis of the tables that tests several hypotheses. All visualizations were made with Grafana
- See spark.ipynb for the analysis done with Spark
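
As a rough illustration of how the config file and the Spark-to-SQL step fit together, here is a minimal sketch. It is not the code from loadToSQL.py: the key names in sql_cfg.json (`host`, `port`, `database`, `user`, `password`) and the helper names are assumptions made for this example, and the driver class assumes MySQL Connector/J 8 or later.

```python
import json
from pathlib import Path


def load_cfg(path="sql_cfg.json"):
    """Read connection settings from the JSON config file.

    Assumed (hypothetical) keys: host, port, database, user, password.
    """
    return json.loads(Path(path).read_text())


def build_jdbc_options(cfg, table):
    """Turn the config dict into options for Spark's JDBC data source."""
    url = f"jdbc:mysql://{cfg['host']}:{cfg['port']}/{cfg['database']}"
    return {
        "url": url,
        "dbtable": table,
        "user": cfg["user"],
        "password": cfg["password"],
        # Driver class name used by MySQL Connector/J 8+ (assumption).
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def write_table(df, cfg, table, mode="overwrite"):
    """Write a Spark DataFrame to the given MySQL table over JDBC."""
    df.write.format("jdbc").options(**build_jdbc_options(cfg, table)).mode(mode).save()
```

With a Spark DataFrame `orders_df` in hand, a call such as `write_table(orders_df, load_cfg(), "orders")` would send it to the server. The connector jar has to be visible to Spark for this to work, for example through the `spark.jars` setting.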