This project sets up a data pipeline to ingest Fitbit data into a TimescaleDB database using Docker.
Fitbit Challenge Answers:
Task 0.a
Task 0.b
Task 1,3
Task 2,3,4
Task 4
Task 4
Task 5
Task 5
Fork the repo -> clone the forked repo -> cd into it
git clone https://github.com/<your-username>/fitbit_challenge.git
cd fitbit_challenge
-
Create a virtual env:
python -m venv venv source venv/bin/activate
-
Prerequisites: Docker and Docker Compose must be installed
-
npm install
(once to generate the package-lock.json) andpip install -r requirements.txt
to install all required python packages -
chmod +x cleanup.sh
to make the script executable -
Run the service:
docker-compose up --build
This will start the TimescaleDB database and run the ingestion script to load the data
After the frontend and backend are loaded, wait for 2 minutes for the first set of data to get ingested and loaded to the database
Then open localhost:3000
to see the dashboard
If needed to restart from the beginning: recommeded to ./cleanup.sh
to restart the docker containers and databases from beginning to avoid conflicts
Open Grafana at localhost:3001
and log in with username admin and password admin
The visualization options will be available after login and are all automatically loaded without creating any new dashboards
6. run the impute.py script after the ingestion completes and imputation works with timescaleDB's inbuild interpolation, an advantage of using timescaledb for timeseries data
Run the impute service:
docker-compose up -d
docker-compose exec ingestion python3 /app/impute.py
Some notes:
- Currently the application depends completely on wearipedia library's synthetic data and its extensible to incoporate real data
- During the intial setup a user database will be created and three user records will be added to simulate
- The parsing and ingestion works with intraday_heart_rate, intraday_spo2, intraday_activity, azm, sleep, breathing_rate, intraday_hrv and it happens currently for three users with user_ids 1, 2, 3
- Ingestion is set to run every 2 minutes and ingests one day's data every 2 minutes. A state file is created automatically on the first day to store the current days ingestion. This is equvalent to simulating a real data's ingestion of one day's data ingesting every day once at a particular time. The reason I set ingestion to 2min is to test ingestion rather than waiting for a day to ingest next day's data
- Aggregates of 1d, 1min, 1hr tables have been created and gets updated regularly as per the scheduler and used to render frontend and data analysis
- Pagination/chunking has been implemented when data is requested to frontend for better performance
- impute.py is still under development
- impute.py script should only be run at the end of the complete ingestion for data analysis only, otherwise conflicts can arise as data is getting ingested real time and impute engine may work on uningested data
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. To get started, please read our contributing guidelines for details on our code of conduct, and the process for submitting pull requests to us. We look forward to your contributions!
Distributed under the MIT License. See LICENSE for more information.
Moses - @Github
Project Link: https://github.com/fitbit-project/fitbit_challenge