This repository contains the practical code and examples for the second class of the Fundamentals of Data Engineering with Python and SQL course. The focus is on introducing Apache Airflow, a powerful workflow orchestration tool widely used in modern data pipelines.
If you haven't already cloned the repository, you can do so by running the following command:
git clone [email protected]:GADES-DATAENG/mod2-airflow.git
cd mod2-airflow
Before starting the services, you need to create a .env file with the required environment variables. Use the provided .env.template file as a starting point:
cp .env.template .env
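As an illustration, a .env file for a local Airflow setup often looks like the sketch below. The actual variable names and values come from your .env.template, so treat the ones shown here (AIRFLOW_UID, GCP_PROJECT_ID) as placeholders rather than the template's real contents.

```bash
# Hypothetical example only -- copy the real variable names from .env.template.
AIRFLOW_UID=50000                # host user ID so files created by Airflow are owned by you
GCP_PROJECT_ID=my-gcp-project    # placeholder GCP project used by the BigQuery examples
```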
After downloading your GCP service account JSON credentials file, paste it into the keys folder with the name gcp-key.json.
If you don't have (or need) a GCP account yet, you can simply create an empty file named gcp-key.json instead.
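For example, assuming you are at the repository root and the keys folder lives there (as described above), the following commands create the folder and an empty placeholder key:

```bash
# Create the keys folder (if it doesn't already exist) and an empty placeholder credentials file
mkdir -p keys
touch keys/gcp-key.json
```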
Once the image is built, you can start the services (Airflow and its dependencies) using Docker Compose. Run the following command:
docker-compose up -d
This command will start all the containers defined in the docker-compose.yml file. It will set up Airflow and any necessary services, including the BigQuery integration.
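To confirm everything came up, you can check the container status and tail the logs. The exact service names depend on the docker-compose.yml in this repository, so the webserver name used below is illustrative:

```bash
# List the containers defined in docker-compose.yml and their current status
docker-compose ps

# Follow the logs of the webserver container (the service name may differ in your compose file)
docker-compose logs -f airflow-webserver
```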
- Airflow Web UI: you can access the Airflow web interface at http://localhost:8080
- Default login credentials:
  - Username: `airflow`
  - Password: `airflow`
- The service account key file (`gcp-key.json`) should be inside the `keys` folder
Ensure that the key file is placed correctly in the repository as:
/mod2-airflow/keys/gcp-key.json
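As a quick sanity check once the containers are running, you can verify that the webserver is responding and that the key file is in place. The /health endpoint is exposed by the Airflow webserver, and the file path below assumes the layout described above:

```bash
# Check that the Airflow webserver is up and reporting scheduler health
curl -s http://localhost:8080/health

# Confirm the service account key file is where the setup expects it
ls -l keys/gcp-key.json
```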