Production-grade real-time data pipeline and observability platform for e-commerce systems with Kafka streaming, Spark processing, InfluxDB time-series storage, and Grafana dashboards.
This project implements a complete real-time observability stack for e-commerce platforms. It processes live customer activity events (purchases, campaign interactions, orders), enriches them with demographic data from MySQL, and feeds the results into Grafana dashboards for real-time monitoring and business analytics.
The architecture demonstrates enterprise patterns for streaming data enrichment, schema management with Avro, and building observability platforms.
```text
User Activity Events (Generated)
              |
              v
Kafka Topic: consumer_activity
       (Avro Serialized)
              |
              v
+--------------------------------+
| Spark Structured Streaming     |
|  - Read from Kafka             |
|  - Deserialize Avro            |
|  - Join with MySQL Demographics|
|  - Enrich with Age/Country/    |
|    Gender/State                |
|  - Re-serialize to Avro        |
+--------------------------------+
              |
              v
Kafka Topic: user_activity_demographics
              |
              v
Kafka Connect (InfluxDB Sink Connector)
              |
              v
InfluxDB (Time-Series Storage)
              |
              v
Grafana Dashboards
(Real-Time Monitoring & Analytics)
```
| Category | Technologies |
|---|---|
| Message Streaming | Apache Kafka (Confluent Platform 6.2.0) |
| Schema Registry | Confluent Schema Registry |
| Stream Processing | Apache Spark 3.0.1 (Structured Streaming) |
| Data Serialization | Apache Avro, Confluent Avro Serializers |
| Time-Series Database | InfluxDB |
| Visualization | Grafana |
| Relational Database | MySQL (demographic data) |
| Connectors | Kafka Connect, InfluxDB Sink Connector |
| Languages | Java 8+, Python 3.x, Scala 2.12 |
| Build Tool | Maven |
| Containerization | Docker & Docker Compose |
| Coordination | Zookeeper |
- Real-Time Event Processing: Kafka + Spark Structured Streaming pipeline
- Schema Evolution: Avro schemas managed via Confluent Schema Registry
- Stream-Batch Join: Enriching real-time streams with static MySQL data
- Multi-Language Support: Both Java (production) and Python implementations
- Full Observability Stack: Grafana + InfluxDB for metrics and dashboards
- Containerized Infrastructure: Complete Docker Compose setup
- KSQL Support: Stream processing via SQL queries
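The Avro schema registered for the activity events might look like the following sketch. This is an assumption for illustration: field names are inferred from the `UserActivity` data model, and the actual schema in the repo may differ.

```python
import json

# Hypothetical Avro schema for the consumer_activity topic.
# Field names are inferred from the UserActivity data model; the real
# schema in the repo may differ.
USER_ACTIVITY_SCHEMA = {
    "type": "record",
    "name": "UserActivity",
    "namespace": "jobs.stream",
    "fields": [
        {"name": "consumer_id", "type": "string"},
        {"name": "order_count", "type": "int"},
        {"name": "total_purchase", "type": "double"},
        {"name": "campaign_name", "type": "string"},
    ],
}

# Confluent Schema Registry accepts new schema versions as an escaped
# JSON string under the "schema" key of the request body.
registry_payload = json.dumps({"schema": json.dumps(USER_ACTIVITY_SCHEMA)})
```

Evolving this schema (e.g. adding a field with a default) is what the Schema Registry's compatibility checks govern.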
```text
realtime-ecommerce-observability-stack/
├── Codes/
│   └── spark-streaming-development/
│       ├── src/main/java/jobs/
│       │   ├── stream/
│       │   │   ├── UserActivityEventProducer.java        # Event generator
│       │   │   ├── SparkKafkaConsumerActivityNDemographic.java
│       │   │   └── SparkKafkaConsumerForConsumerIdCountry.java
│       │   ├── batch/
│       │   │   └── UserDemographicDataJob.java           # MySQL batch loader
│       │   ├── UserActivity.java                         # Data model
│       │   └── TopicDetails.java                         # Kafka config
│       ├── src/main/java/util/
│       │   ├── UserActivityUtil.java                     # Event generation utils
│       │   └── DemoGraphicDataUtil.java                  # Demographics utils
│       ├── config/
│       │   └── kafka-influx/                             # InfluxDB connector configs
│       ├── docker-compose.yml                            # Full stack definition
│       └── pom.xml                                       # Maven dependencies
├── python/
│   ├── UserActivityEventProducer.py                      # Python event producer
│   ├── SparkKafkaConsumerActivityNDemographic.py         # Python Spark consumer
│   ├── avro_deserialization.py                           # Avro UDFs
│   ├── avro_serialize.py                                 # Avro serialization
│   ├── user_activity_util.py                             # Python utils
│   └── demo_graphic_data_util.py                         # Demographics utils
├── documents/
│   └── (Project documentation)
└── Documentation/
    ├── Projectdetails.pdf
    └── Solution Methodology.pdf
```
The docker-compose.yml includes:
| Service | Purpose | Port |
|---|---|---|
| MySQL | User demographic data | 3306 |
| Zookeeper | Kafka coordination | 2181 |
| Kafka Broker | Message streaming | 9092 |
| Schema Registry | Avro schema management | 8081 |
| Kafka Connect | Data connectors | 8083 |
| KSQL DB | Stream processing SQL | 8088 |
| Control Center | Kafka management UI | 9021 |
| REST Proxy | HTTP interface to Kafka | 8082 |
| Grafana | Dashboards & visualization | 3000 |
| InfluxDB | Time-series metrics | 8086 |
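With KSQL DB running on port 8088, the enriched topic can also be explored with SQL. A hypothetical example — the stream and column names below are assumptions based on the topic names in this README:

```sql
-- Register the enriched topic as a KSQL stream (schema pulled from Schema Registry)
CREATE STREAM user_activity_demographics_stream
  WITH (KAFKA_TOPIC='user_activity_demographics', VALUE_FORMAT='AVRO');

-- Running revenue per country (column names are assumptions)
SELECT country, SUM(total_purchase) AS revenue
FROM user_activity_demographics_stream
GROUP BY country
EMIT CHANGES;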
```java
// Generate mock e-commerce events
UserActivity activity = new UserActivity();
activity.setConsumerId(generateConsumerId());
activity.setOrderCount(random.nextInt(10));
activity.setTotalPurchase(random.nextDouble() * 1000);
activity.setCampaignName(campaigns[random.nextInt(campaigns.length)]);
```

Spark Structured Streaming subscribes to the `consumer_activity` topic:

```java
Dataset<Row> kafkaStream = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "consumer_activity")
    .load();
```
```java
// Deserialize Avro and join with demographics
Dataset<Row> enrichedStream = kafkaStream
    .select(from_avro(col("value"), schema).as("activity"))
    .join(demographicsDF, "consumer_id")
    .select(
        col("activity.*"),
        col("age"),
        col("country"),
        col("gender"),
        col("state")
    );
```

Grafana dashboards built on the enriched data can track:
- Active users per region
- Revenue by campaign
- Order volume trends
- Customer demographics breakdown
- Real-time purchase alerts
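The stream-batch join at the heart of the pipeline can be illustrated in plain Python. This is a toy sketch of the enrichment logic, not the Spark code; the sample IDs and values are invented for illustration:

```python
# Toy illustration of the stream-batch enrichment join, independent of Spark.
# Static reference data, as loaded from MySQL by the demographics batch job.
demographics = {
    "c-001": {"age": 34, "country": "US", "gender": "F", "state": "CA"},
    "c-002": {"age": 27, "country": "DE", "gender": "M", "state": "BE"},
}

def enrich(event: dict) -> dict:
    """Join one streaming event with the static demographic lookup table."""
    demo = demographics.get(event["consumer_id"], {})
    return {**event, **demo}

event = {"consumer_id": "c-001", "order_count": 2, "total_purchase": 199.99}
enriched = enrich(event)
print(enriched["country"])  # the event now carries demographic fields
```

In Spark the same idea runs as a streaming-to-static DataFrame join, with the demographics side read once from MySQL via JDBC.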
- Streaming Architecture: Building production-grade Kafka streaming pipelines
- Avro & Schema Registry: Managing schema evolution in streaming systems
- Stream-Batch Joins: Enriching real-time data with reference data
- Observability Patterns: Implementing metrics collection and visualization
- Grafana Dashboards: Creating real-time business intelligence dashboards
- InfluxDB: Working with time-series databases for metrics
- Confluent Platform: Enterprise Kafka ecosystem components
- Docker and Docker Compose
- Java 8+ (for building Java components)
- Maven
- Python 3.x (for Python implementation)
Start the infrastructure and build the Java components:

```shell
cd Codes/spark-streaming-development
docker-compose up -d
mvn clean package
```

Run the event producer:

```shell
# Java
java -cp target/spark-streaming-1.0.jar jobs.stream.UserActivityEventProducer

# Python
python python/UserActivityEventProducer.py
```

Submit the Spark streaming job:

```shell
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 \
  --class jobs.stream.SparkKafkaConsumerActivityNDemographic \
  target/spark-streaming-1.0.jar
```

Register the InfluxDB sink connector with Kafka Connect:

```shell
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @config/kafka-influx/influx-sink-connector.json
```

Then open Grafana at http://localhost:3000 (default login: admin/admin).
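The connector config referenced above (`config/kafka-influx/influx-sink-connector.json`) follows the Confluent InfluxDB sink connector format. A hedged sketch — the connector name, database name, and converter settings below are placeholders, so check the actual file in the repo:

```json
{
  "name": "influxdb-sink",
  "config": {
    "connector.class": "io.confluent.influxdb.InfluxDBSinkConnector",
    "tasks.max": "1",
    "topics": "user_activity_demographics",
    "influxdb.url": "http://influxdb:8086",
    "influxdb.db": "ecommerce",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}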
| Service | URL |
|---|---|
| Grafana | http://localhost:3000 |
| Kafka Control Center | http://localhost:9021 |
| Schema Registry | http://localhost:8081 |
| KSQL Server | http://localhost:8088 |
- Open Grafana at http://localhost:3000
- Add InfluxDB as data source
- Import dashboard JSON or create custom panels
- Watch real-time e-commerce metrics flow in
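A Grafana panel against the InfluxDB data source might use an InfluxQL query along these lines. The measurement and field names are assumptions derived from the enriched topic, not the repo's actual dashboard JSON:

```sql
SELECT SUM("total_purchase") AS revenue
FROM "user_activity_demographics"
WHERE time > now() - 1h
GROUP BY time(1m), "country"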
- Add custom Grafana alerting rules
- Implement A/B testing metrics tracking
- Create automated report generation
- Add business KPI dashboards
- Implement user behavior funnel analysis
- Add anomaly detection for revenue spikes
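As a starting point for the revenue-spike idea above, a rolling z-score check over the per-minute revenue series is one simple option. This is a minimal stdlib sketch under assumed thresholds, not part of the project:

```python
import statistics
from collections import deque

def spike_detector(window_size: int = 30, threshold: float = 3.0):
    """Return a checker that flags revenue points far outside the rolling window."""
    window: deque = deque(maxlen=window_size)

    def check(revenue: float) -> bool:
        is_spike = False
        if len(window) >= 2:
            mean = statistics.mean(window)
            stdev = statistics.stdev(window)
            # Flag points more than `threshold` standard deviations from the mean.
            if stdev > 0 and abs(revenue - mean) / stdev > threshold:
                is_spike = True
        window.append(revenue)
        return is_spike

    return check

check = spike_detector(window_size=10, threshold=3.0)
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
flags = [check(r) for r in baseline]   # stable baseline: nothing flagged
print(check(500.0))                    # a 5x jump over the baseline is flagged
```

In the real pipeline this check could run as another Spark streaming stage, or as a Grafana alert rule over the InfluxDB series.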
- Implemented as part of data engineering portfolio
- Based on ProjectPro real-time observability project template
- Demonstrates proficiency in streaming, observability, and data visualization