Cleaning the Nashville Housing Data dataset.
A PostgreSQL instance must be available on your machine. You can download PostgreSQL here: https://www.postgresql.org/download/. You also need to install the Levenshtein Postgres extension (in stock PostgreSQL, the levenshtein function is provided by the fuzzystrmatch extension).
You need Poetry to set up this project. Installation instructions are here: https://python-poetry.org/docs/#installation.
Then run the following to install the Poe plugin as well as the project dependencies.
poetry self add 'poethepoet[poetry_plugin]'
poetry install
Create a .env file in the root of your project and set your Postgres password as DB_PASSWORD. Alternatively, set it directly as an environment variable.
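As an illustration of the two options above, here is a minimal sketch of how a script could resolve DB_PASSWORD from either the environment or a .env file. This is not the project's actual loading code, which is not shown here. The function names read_env_file and get_db_password are hypothetical, and letting the environment variable take precedence over the file is an assumption.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. DB_PASSWORD='s3cret'
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_db_password(path=".env"):
    """Return DB_PASSWORD from the environment, falling back to the .env file."""
    # Assumption: a real environment variable wins over the .env entry.
    password = os.environ.get("DB_PASSWORD") or read_env_file(path).get("DB_PASSWORD")
    if not password:
        raise RuntimeError("DB_PASSWORD is not set in the environment or in .env")
    return password
```

With a .env file containing a line such as DB_PASSWORD=your-password, calling get_db_password() returns that value unless DB_PASSWORD is already exported in the shell.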
Run the cleaning pipeline:
poetry poe data-cleaning-pipeline