Real-Time E-Commerce Observability Stack


Production-grade real-time data pipeline and observability platform for e-commerce systems with Kafka streaming, Spark processing, InfluxDB time-series storage, and Grafana dashboards.

Overview

This project implements a complete real-time observability stack for e-commerce platforms. It processes live customer activity events (purchases, campaigns, orders), enriches them with demographic data from MySQL, and feeds the results into Grafana dashboards for real-time monitoring and business analytics.

The architecture demonstrates enterprise patterns for streaming data enrichment, schema management with Avro, and building observability platforms.

Architecture

User Activity Events (Generated)
         |
         v
Kafka Topic: consumer_activity
(Avro Serialized)
         |
         v
+---------------------------------+
| Spark Structured Streaming      |
|   - Read from Kafka             |
|   - Deserialize Avro            |
|   - Join with MySQL             |
|     demographics                |
|   - Enrich with age/country/    |
|     gender/state                |
|   - Re-serialize to Avro        |
+---------------------------------+
         |
         v
Kafka Topic: user_activity_demographics
         |
         v
Kafka Connect (InfluxDB Sink Connector)
         |
         v
InfluxDB (Time-Series Storage)
         |
         v
Grafana Dashboards
(Real-Time Monitoring & Analytics)
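
For reference, the events on the `consumer_activity` topic might follow an Avro schema along these lines (field names are inferred from the data model and illustrative, not copied from the repo):

```json
{
  "type": "record",
  "name": "UserActivity",
  "namespace": "jobs",
  "fields": [
    {"name": "consumer_id", "type": "string"},
    {"name": "order_count", "type": "int"},
    {"name": "total_purchase", "type": "double"},
    {"name": "campaign_name", "type": "string"},
    {"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```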

Tech Stack

| Category | Technologies |
|---|---|
| Message Streaming | Apache Kafka (Confluent Platform 6.2.0) |
| Schema Registry | Confluent Schema Registry |
| Stream Processing | Apache Spark 3.0.1 (Structured Streaming) |
| Data Serialization | Apache Avro, Confluent Avro serializers |
| Time-Series Database | InfluxDB |
| Visualization | Grafana |
| Relational Database | MySQL (demographic data) |
| Connectors | Kafka Connect, InfluxDB Sink Connector |
| Languages | Java 8+, Python 3.x, Scala 2.12 |
| Build Tool | Maven |
| Containerization | Docker & Docker Compose |
| Coordination | ZooKeeper |

Key Features

  • Real-Time Event Processing: Kafka + Spark Structured Streaming pipeline
  • Schema Evolution: Avro schemas managed via Confluent Schema Registry
  • Stream-Batch Join: Enriching real-time streams with static MySQL data
  • Multi-Language Support: Both Java (production) and Python implementations
  • Full Observability Stack: Grafana + InfluxDB for metrics and dashboards
  • Containerized Infrastructure: Complete Docker Compose setup
  • KSQL Support: Stream processing via SQL queries
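
The KSQL support mentioned above could look like the following hypothetical queries against the enriched topic (stream and column names are assumptions based on the data model):

```sql
-- Register the enriched topic as a ksqlDB stream (Avro schema fetched from Schema Registry)
CREATE STREAM user_activity_demographics
  WITH (KAFKA_TOPIC='user_activity_demographics', VALUE_FORMAT='AVRO');

-- Revenue per campaign over one-minute tumbling windows
SELECT campaign_name, SUM(total_purchase) AS revenue
FROM user_activity_demographics
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY campaign_name
EMIT CHANGES;
```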

Project Structure

realtime-ecommerce-observability-stack/
├── Codes/
│   └── spark-streaming-development/
│       ├── src/main/java/jobs/
│       │   ├── stream/
│       │   │   ├── UserActivityEventProducer.java    # Event generator
│       │   │   ├── SparkKafkaConsumerActivityNDemographic.java
│       │   │   └── SparkKafkaConsumerForConsumerIdCountry.java
│       │   ├── batch/
│       │   │   └── UserDemographicDataJob.java       # MySQL batch loader
│       │   ├── UserActivity.java                     # Data model
│       │   └── TopicDetails.java                     # Kafka config
│       ├── src/main/java/util/
│       │   ├── UserActivityUtil.java                 # Event generation utils
│       │   └── DemoGraphicDataUtil.java              # Demographics utils
│       ├── config/
│       │   └── kafka-influx/                         # InfluxDB connector configs
│       ├── docker-compose.yml                        # Full stack definition
│       └── pom.xml                                   # Maven dependencies
├── python/
│   ├── UserActivityEventProducer.py                  # Python event producer
│   ├── SparkKafkaConsumerActivityNDemographic.py     # Python Spark consumer
│   ├── avro_deserialization.py                       # Avro UDFs
│   ├── avro_serialize.py                             # Avro serialization
│   ├── user_activity_util.py                         # Python utils
│   └── demo_graphic_data_util.py                     # Demographics utils
├── documents/
│   └── (Project documentation)
└── Documentation/
    ├── Projectdetails.pdf
    └── Solution Methodology.pdf

Docker Services

The docker-compose.yml includes:

| Service | Purpose | Port |
|---|---|---|
| MySQL | User demographic data | 3306 |
| Zookeeper | Kafka coordination | 2181 |
| Kafka Broker | Message streaming | 9092 |
| Schema Registry | Avro schema management | 8081 |
| Kafka Connect | Data connectors | 8083 |
| ksqlDB | Stream processing SQL | 8088 |
| Control Center | Kafka management UI | 9021 |
| REST Proxy | HTTP interface to Kafka | 8082 |
| Grafana | Dashboards & visualization | 3000 |
| InfluxDB | Time-series metrics | 8086 |
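
An abridged sketch of what two of these services typically look like in a Confluent-style docker-compose.yml (image tags and environment values are typical defaults, not copied from the repo):

```yaml
broker:
  image: confluentinc/cp-kafka:6.2.0
  ports:
    - "9092:9092"
  environment:
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT

schema-registry:
  image: confluentinc/cp-schema-registry:6.2.0
  ports:
    - "8081:8081"
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: broker:29092
```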

Data Flow

Event Generation

// Generate mock e-commerce events
Random random = new Random();
UserActivity activity = new UserActivity();
activity.setConsumerId(generateConsumerId());
activity.setOrderCount(random.nextInt(10));
activity.setTotalPurchase(random.nextDouble() * 1000);
activity.setCampaignName(campaigns[random.nextInt(campaigns.length)]);
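
The Python producer (`python/UserActivityEventProducer.py`) builds an equivalent payload; a minimal stdlib-only sketch of the event construction, with field names assumed to mirror the Java model:

```python
import json
import random
import time

CAMPAIGNS = ["summer_sale", "flash_deal", "loyalty_bonus"]  # illustrative names

def generate_event() -> dict:
    """Build one mock e-commerce activity event."""
    return {
        "consumer_id": f"consumer-{random.randint(1, 1000)}",
        "order_count": random.randint(0, 9),
        "total_purchase": round(random.random() * 1000, 2),
        "campaign_name": random.choice(CAMPAIGNS),
        "event_time": int(time.time() * 1000),  # epoch millis, matching Avro timestamp-millis
    }

print(json.dumps(generate_event()))
```

In the real producer this dict would be Avro-serialized and sent to the `consumer_activity` topic.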

Stream Processing

// Spark Structured Streaming
Dataset<Row> kafkaStream = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "consumer_activity")
    .load();

// Deserialize Avro and join with demographics
Dataset<Row> enrichedStream = kafkaStream
    .select(from_avro(col("value"), schema).as("activity"))
    .join(demographicsDF, "consumer_id")
    .select(
        col("activity.*"),
        col("age"),
        col("country"),
        col("gender"),
        col("state")
    );
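
Conceptually, the stream-batch join is a per-event lookup of static demographics by `consumer_id`; an in-memory Python sketch of that enrichment (data values are made up):

```python
from typing import Optional

# Static reference data, standing in for the MySQL demographics table
demographics = {
    "consumer-1": {"age": 34, "country": "US", "gender": "F", "state": "CA"},
    "consumer-2": {"age": 52, "country": "DE", "gender": "M", "state": "BE"},
}

def enrich(event: dict) -> Optional[dict]:
    """Inner-join one activity event with its demographics row."""
    demo = demographics.get(event["consumer_id"])
    if demo is None:
        return None  # inner-join semantics: no demographics row, no output
    return {**event, **demo}

print(enrich({"consumer_id": "consumer-1", "total_purchase": 99.5}))
```

Spark performs the same lookup at scale by broadcasting or shuffling the batch side of the join.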

Grafana Dashboard Metrics

  • Active users per region
  • Revenue by campaign
  • Order volume trends
  • Customer demographics breakdown
  • Real-time purchase alerts

What I Learned

  • Streaming Architecture: Building production-grade Kafka streaming pipelines
  • Avro & Schema Registry: Managing schema evolution in streaming systems
  • Stream-Batch Joins: Enriching real-time data with reference data
  • Observability Patterns: Implementing metrics collection and visualization
  • Grafana Dashboards: Creating real-time business intelligence dashboards
  • InfluxDB: Working with time-series databases for metrics
  • Confluent Platform: Enterprise Kafka ecosystem components

Setup & Installation

Prerequisites

  • Docker and Docker Compose
  • Java 8+ (for building Java components)
  • Maven
  • Python 3.x (for Python implementation)

1. Start Infrastructure

cd Codes/spark-streaming-development
docker-compose up -d

2. Build Java Components

mvn clean package

3. Start Event Producer

# Java
java -cp target/spark-streaming-1.0.jar jobs.stream.UserActivityEventProducer

# Python
python python/UserActivityEventProducer.py

4. Start Spark Consumer

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.spark:spark-avro_2.12:3.0.1 \
  --class jobs.stream.SparkKafkaConsumerActivityNDemographic \
  target/spark-streaming-1.0.jar

5. Configure InfluxDB Sink Connector

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @config/kafka-influx/influx-sink-connector.json
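
A hypothetical `influx-sink-connector.json` might look like the following (property names follow the Confluent InfluxDB sink connector, but the values here are assumptions):

```json
{
  "name": "influxdb-sink",
  "config": {
    "connector.class": "io.confluent.influxdb.InfluxDBSinkConnector",
    "tasks.max": "1",
    "topics": "user_activity_demographics",
    "influxdb.url": "http://influxdb:8086",
    "influxdb.db": "ecommerce",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
```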

6. Access Grafana

Navigate to http://localhost:3000 (default: admin/admin)

Usage

Access Services

| Service | URL |
|---|---|
| Grafana | http://localhost:3000 |
| Kafka Control Center | http://localhost:9021 |
| Schema Registry | http://localhost:8081 |
| ksqlDB Server | http://localhost:8088 |

View Real-Time Metrics

  1. Open Grafana at http://localhost:3000
  2. Add InfluxDB as data source
  3. Import dashboard JSON or create custom panels
  4. Watch real-time e-commerce metrics flow in
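
A custom panel query can be as simple as the following InfluxQL (the measurement and field names depend on the sink connector config and are assumed here):

```sql
SELECT SUM("total_purchase") AS revenue
FROM "user_activity_demographics"
WHERE $timeFilter
GROUP BY time(1m), "campaign_name"
```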

Future Enhancements

  • Add custom Grafana alerting rules
  • Implement A/B testing metrics tracking
  • Create automated report generation
  • Add business KPI dashboards
  • Implement user behavior funnel analysis
  • Add anomaly detection for revenue spikes

Acknowledgments

  • Implemented as part of data engineering portfolio
  • Based on ProjectPro real-time observability project template
  • Demonstrates proficiency in streaming, observability, and data visualization
