An e-commerce business needs to capture real-time user behavior and transactional data to support personalization, marketing optimization, and fraud detection. This solution is a scalable real-time data pipeline that collects, processes, and stores behavior data from different sources and delivers insights for instant decision making.
- Kafka (Real-time Data Ingestion)
- Apache Flink (Stream Processing)
- Apache Airflow (Workflow Orchestration)
- Google BigQuery (Data Storage & Analytics)
- DBT (Data Transformation)
- Looker (Data Visualization)
Clone the repository to your local machine:
git clone
cd
Install the required dependencies using pip:
pip install -r requirements.txt
Set up Apache Kafka locally or use a cloud-based Kafka service.
If running locally, start Zookeeper and Kafka:
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties
Create Kafka topics:
bin/kafka-topics.sh --create --topic ecommerce_transactions --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3
To deploy the entire system using Docker, build and start the containers:
docker-compose up --build
This starts Kafka, Flink, Airflow, and all required services.
- Run Kafka Producer (Simulates real-time transactions):
python kafka_producer.py
- Run Kafka Consumer (Processes and streams data to Flink):
python kafka_consumer.py
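The contents of kafka_producer.py are not shown here; as an illustration, the kind of simulated transaction event it might generate can be sketched in plain Python. The field names (`transaction_id`, `user_id`, `amount`, `currency`, `timestamp`) are assumptions, not the project's actual schema:

```python
import json
import random
import time
import uuid

# Hypothetical event schema: the real kafka_producer.py defines its own fields.
def make_transaction() -> dict:
    """Build one simulated e-commerce transaction event."""
    return {
        "transaction_id": str(uuid.uuid4()),
        "user_id": random.randint(1, 1000),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "currency": "USD",
        "timestamp": time.time(),
    }

def serialize(event: dict) -> bytes:
    """Encode the event as JSON bytes, the form a Kafka producer would send."""
    return json.dumps(event).encode("utf-8")

if __name__ == "__main__":
    # With a broker running, a kafka-python KafkaProducer would send these
    # bytes to the ecommerce_transactions topic; here we just print a sample.
    print(serialize(make_transaction()))
```

In the real producer, each serialized event would be published to the `ecommerce_transactions` topic created above.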
Start the Flink job to process transaction data:
python flink_processing.py
Trigger the Airflow DAG for batch processing:
airflow dags trigger ecommerce_batch_processing
Run the DBT transformations:
dbt run --profiles-dir bigquery/dbt
Once data is processed and stored in Google BigQuery, connect Power BI or Tableau for visualization.
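flink_processing.py is likewise not shown; to illustrate the idea, the kind of per-user aggregation and fraud check a stream-processing job might perform over a window of transactions can be sketched in plain Python. The field names and the 1,000-unit fraud threshold are assumptions for the sketch, not values from the project:

```python
from collections import defaultdict

FRAUD_THRESHOLD = 1000.0  # assumed cutoff for flagging a single large payment

def process_window(events):
    """Aggregate a window of transaction events per user and flag outliers.

    Returns (totals, flagged): total spend keyed by user_id, and the ids
    of events whose amount exceeds FRAUD_THRESHOLD.
    """
    totals = defaultdict(float)
    flagged = []
    for evt in events:
        totals[evt["user_id"]] += evt["amount"]
        if evt["amount"] > FRAUD_THRESHOLD:
            flagged.append(evt["transaction_id"])
    return dict(totals), flagged
```

In the actual pipeline, Flink would run this kind of logic continuously over keyed, time-bounded windows and write the aggregates downstream to BigQuery.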
