jamessandy/Scalable-Real-Time-Data-Pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Real-Time Data Pipeline for E-commerce Business

Project Overview

An e-commerce business needs to capture real-time user behavior and transactional data to support personalization, marketing optimization, and fraud detection. This solution is a scalable real-time data pipeline that collects, processes, and stores behavioral data from different sources and delivers insights for instant decision making.

Architecture

[Architecture diagram: Real-Time Data Pipeline for E-commerce Business (draw.io)]

Technologies Used

  • Kafka (Real-time Data Ingestion)
  • Apache Flink (Stream Processing)
  • Apache Airflow (Workflow Orchestration)
  • Google BigQuery (Data Storage & Analytics)
  • dbt (Data Transformation)
  • Looker (Data Visualization)

How to Run This Project

Installation & Setup

Step 1: Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/jamessandy/Scalable-Real-Time-Data-Pipeline.git
cd Scalable-Real-Time-Data-Pipeline

Step 2: Install Dependencies

Install the required dependencies using pip:

pip install -r requirements.txt

Step 3: Set Up Kafka

Set up Apache Kafka locally or use a cloud-based Kafka service.
If running locally, start Zookeeper and Kafka:

bin/zookeeper-server-start.sh config/zookeeper.properties &  
bin/kafka-server-start.sh config/server.properties

Create Kafka topics:

bin/kafka-topics.sh --create --topic ecommerce_transactions --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3
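
To confirm the topic was created, a quick check from Python is shown below; this assumes the kafka-python package, which may differ from the client the repository's scripts actually use:

from kafka import KafkaAdminClient

# Connect to the local broker and list existing topics.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
print("ecommerce_transactions" in admin.list_topics())  # True if the topic exists
admin.close()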

Step 4: Docker Setup

To deploy the entire system using Docker, build and start the containers:

docker-compose up --build

This starts Kafka, Flink, Airflow, and all required services.


Running the Project

1. Start the Kafka Pipeline

  • Run the Kafka producer (simulates real-time transactions; see the sketch after this list):
    python kafka_producer.py
  • Run the Kafka consumer (processes and streams data to Flink):
    python kafka_consumer.py
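
The producer's payload is not documented in the repository, so the following is a minimal sketch of what kafka_producer.py might contain, again assuming the kafka-python package; the transaction fields are illustrative, not the repository's actual schema:

import json
import random
import time
import uuid

from kafka import KafkaProducer

# JSON-serialize each event so downstream consumers can use a JSON format.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(100):
    # Illustrative transaction schema (assumed field names).
    event = {
        "transaction_id": str(uuid.uuid4()),
        "user_id": random.randint(1, 10_000),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "ts": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    producer.send("ecommerce_transactions", value=event)
    time.sleep(0.5)  # roughly two events per second

producer.flush()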

2. Run Flink Job (Real-time Processing)

Start the Flink job to process transaction data:

python flink_processing.py
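
flink_processing.py is not reproduced here, so below is a hedged sketch of a PyFlink Table API job reading the ecommerce_transactions topic; the schema and the per-user windowed aggregation are assumptions, and the Flink Kafka SQL connector JAR must be available on the Flink classpath:

from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming-mode table environment.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source table over the Kafka topic; field names mirror the producer sketch above.
t_env.execute_sql("""
    CREATE TABLE transactions (
        transaction_id STRING,
        user_id BIGINT,
        amount DOUBLE,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'ecommerce_transactions',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'flink_processor',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Example aggregation: per-user spend over one-minute tumbling windows.
result = t_env.sql_query("""
    SELECT user_id,
           TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           SUM(amount) AS total_spend
    FROM transactions
    GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.execute().print()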

3. Execute Airflow DAG (Batch Processing)

Trigger the Airflow DAG for batch processing:

airflow dags trigger ecommerce_batch_processing
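
For orientation, a minimal Airflow 2 DAG with the dag_id that the trigger command expects might look like this; the task body is a placeholder, and the repository's actual DAG may define different tasks:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_batch_processing():
    # Placeholder: the real task would load processed data into BigQuery.
    pass

with DAG(
    dag_id="ecommerce_batch_processing",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="batch_processing",
        python_callable=run_batch_processing,
    )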

4. Run DBT for Data Transformation

dbt run --profiles-dir bigquery/dbt

5. Visualize in BigQuery / Power BI

Once the data is processed and stored in Google BigQuery, connect a visualization tool such as Looker, Power BI, or Tableau.
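
You can also sanity-check the warehouse directly from Python using the google-cloud-bigquery client; the project, dataset, and table names below are placeholders, and authentication is assumed to come from GOOGLE_APPLICATION_CREDENTIALS:

from google.cloud import bigquery

client = bigquery.Client()  # picks up credentials from the environment

# Placeholder table path; substitute your own project and dataset.
query = """
    SELECT user_id, SUM(amount) AS total_spend
    FROM `your_project.ecommerce.transactions`
    GROUP BY user_id
    ORDER BY total_spend DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.user_id, row.total_spend)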
