PostgreSQL Data Pipeline with Docker

This project is part of the Data Engineering Zoomcamp by DataTalksClub.

It demonstrates how to build a data ingestion pipeline using Python, SQL and Docker, with PostgreSQL as the main database.

Project Overview

The goal of this project is to create a reproducible data pipeline that:

Downloads a dataset (NYC Taxi data)
Loads it into a PostgreSQL database
Uses Docker to manage the environment
Allows querying and analysis via SQL tools

Tech Stack

Python
PostgreSQL
Docker & Docker Compose
Pandas
Jupyter Notebook
pgAdmin / pgcli

Architecture

Pipeline flow:

Data Source → Python Script → PostgreSQL (Docker) → SQL Analysis
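
The load step in this flow can be sketched as a batch-wise insert, so the whole file never has to sit in memory at once. This is a minimal, standard-library illustration and not the code in ingest_data.py: the helper names `iter_batches` and `load_csv` are hypothetical, the three sample rows are made up, and `sqlite3` stands in for PostgreSQL so the block runs without a database server. With a PostgreSQL driver such as psycopg2 the placeholder would be `%s` instead of `?`, and the columns would need explicit types.

```python
import csv
import io
import sqlite3
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def load_csv(conn, csv_file, table, batch_size=100_000):
    """Create `table` from the CSV header, then insert the rows batch by batch.

    Hypothetical helper: assumes the header holds plain column names and
    that `conn` is a DB-API connection using qmark ("?") placeholders.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    total = 0
    for batch in iter_batches(reader, batch_size):
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', batch)
        conn.commit()  # commit per batch so progress survives a later failure
        total += len(batch)
    return total


# Made-up sample rows standing in for the taxi dataset.
sample = io.StringIO("trip_id,passenger_count\n1,2\n2,1\n3,4\n")
conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL container
loaded = load_csv(conn, sample, "yellow_taxi_data", batch_size=2)
```

The project itself lists Pandas in its stack, so the real script may well read the file in chunks with Pandas instead. The batching idea is the same either way.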

Project Structure

.
├── ingest_data.py        # script to download and load data into Postgres
├── pipeline.py           # pipeline logic
├── upload_data.ipynb     # notebook for testing and exploration
├── Dockerfile
├── docker-compose.yml    # defines Postgres & pgAdmin services
├── scripts/
│   ├── postgres.sh
│   ├── pgadmin.sh
│   └── pgcli.sh

How to Run

1. Start Docker services

docker-compose up

This will start:

PostgreSQL database
pgAdmin (optional UI)

2. Run data ingestion

python ingest_data.py \
  --user=your_user \
  --password=your_password \
  --host=localhost \
  --port=5432 \
  --db=ny_taxi \
  --table_name=yellow_taxi_data \
  --url=<dataset_url>
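
For readers wondering how these flags might be consumed, here is a hedged sketch of argument parsing that mirrors the command above. The flag names come from the command itself. Everything else is an assumption: the defaults for `--host` and `--port`, the helper names `build_parser` and `connection_url`, and the example password and URL are all hypothetical, and the actual ingest_data.py may be organised differently.

```python
import argparse
from urllib.parse import quote_plus


def build_parser():
    """Declare the same flags as the command shown above."""
    parser = argparse.ArgumentParser(description="Load a dataset into PostgreSQL")
    parser.add_argument("--user", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--host", default="localhost")      # assumed default
    parser.add_argument("--port", type=int, default=5432)   # assumed default
    parser.add_argument("--db", required=True)
    parser.add_argument("--table_name", required=True)
    parser.add_argument("--url", required=True)
    return parser


def connection_url(args):
    """Build a postgresql:// URL, escaping characters such as '@' in credentials."""
    user = quote_plus(args.user)
    password = quote_plus(args.password)
    return f"postgresql://{user}:{password}@{args.host}:{args.port}/{args.db}"


# Parse an explicit list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--user=your_user",
    "--password=p@ss",                      # made-up password
    "--db=ny_taxi",
    "--table_name=yellow_taxi_data",
    "--url=https://example.com/data.csv",   # placeholder, not a real dataset URL
])
print(connection_url(args))  # prints postgresql://your_user:p%40ss@localhost:5432/ny_taxi
```

Escaping the credentials matters because a password containing `@` or `/` would otherwise corrupt the URL.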

3. Query the data

Use:

pgAdmin (browser UI)
pgcli
or any SQL client

Example:

SELECT COUNT(*) FROM yellow_taxi_data;
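
The same check can be run from Python after an ingestion, for example to confirm that the row count matches expectations. The sketch below is an illustration under assumptions: `count_rows` is a hypothetical helper, and `sqlite3` is used as a stand-in so the block runs without the PostgreSQL container. The cursor calls follow the Python DB-API, so a psycopg2 connection should be usable in the same way.

```python
import re
import sqlite3

# Table names cannot be passed as query parameters, so validate them instead.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_rows(conn, table):
    """Return the number of rows in `table` using any DB-API connection."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f'SELECT COUNT(*) FROM "{table}"')
    return cur.fetchone()[0]


# Stand-in database with two made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yellow_taxi_data (trip_id INTEGER)")
conn.executemany("INSERT INTO yellow_taxi_data VALUES (?)", [(1,), (2,)])
row_count = count_rows(conn, "yellow_taxi_data")
```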

Example Use Cases

Loading large datasets into a database
Practicing SQL queries
Building reproducible data pipelines
Local data warehouse setup

Learning Outcomes

Through this project I learned:

How to use Docker for data engineering workflows
How to set up and manage PostgreSQL locally
How to ingest large datasets efficiently
How to write and execute SQL queries
How to structure data pipelines in Python

Acknowledgments

This project is based on the Data Engineering Zoomcamp by DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp
