soumilshah1995/README.md

👋 Hey, I'm Soumil Nitin Shah

Lead Software Engineer • Big Data Architect • Tech Educator


LinkedIn YouTube Medium Website GitHub


🚀 About Me

I'm a Lead Software Engineer at Zeta Global with 6+ years of experience architecting and building scalable data lakes and big data platforms on AWS. I specialize in transforming complex data into actionable insights through high-performance, cost-efficient data workflows.

"Making sophisticated data engineering accessible to everyone"

🏗️ Creator of LakeBoost: a framework integrating Apache Hudi with AWS Glue ETL for large-scale data operations with significant cost reduction

📝 Featured on the AWS Blog: How Zeta Global scales multi-tenant data ingestion with Amazon S3 Tables


💼 What I Do at Zeta Global

Role: Lead Software Engineer – Big Data
Focus: Lakehouse Platform Architecture

Achievements:
  - 🏢 Leading Lakehouse adoption across Zeta's data ecosystem
  - 📊 Processing 60-120 GB/hour | 1.3+ TB daily | 53+ TB monthly
  - 🗃️ Managing 10,000+ Iceberg tables in production
  - ⚡ Reduced data processing time from hours to minutes
  - 💰 Achieved 4-5x cost reduction through incremental ETL

🛠️ Tech Stack & Expertise

Data Lakehouse & Processing

Apache Hudi, Apache Iceberg, Apache Spark, Delta Lake

AWS Services

AWS, AWS Glue, Amazon EMR, S3, Step Functions

Languages & Tools

Python, PySpark, SQL, Kafka, Elasticsearch


📺 Content Creator & Educator

Platform Stats
🎬 YouTube 46,000+ Subscribers
📹 Videos 1,600+ Tutorials
📂 GitHub Repos 300+ Projects
✍️ Blog Posts 200+ Articles
👁️ Monthly Views 124K+

📚 Content I Create

  • 🏗️ Data Lakehouse Architecture: Hudi, Iceberg, Delta Lake
  • ☁️ AWS Big Data Services: Glue, EMR, S3, Step Functions
  • ⚡ Real-time Data Pipelines: Kafka, Kinesis, Streaming
  • 🔧 Performance Optimization: Query tuning, cost reduction
  • 🎓 Hands-on Labs: End-to-end project tutorials

🏆 Featured Work & Publications

📰 AWS Storage Blog

How Zeta Global scales multi-tenant data ingestion with Amazon S3 Tables

An architecture for operating at massive scale, with 10,000+ Iceberg tables and 2 TB of data processed daily

📝 Recent Medium Articles

🔬 Research Publications

  • Arduino based Seismic Sensor for Earthquake Detection
  • Silicon Membrane Thickness Monitoring System based on Optical Sensing
  • Simulation of PM2.5 Particulate Matter Pollution (NASA CT Space Grant)
  • A-stick: Arduino-based Smart Blind Stick Navigator


🎓 Education

Degree Field Institution
πŸŽ“ M.S. Electrical Engineering University of Bridgeport
πŸŽ“ M.S. Computer Engineering University of Bridgeport
🎓 B.S. Electronic Engineering K.J. Somaiya Institute

🏅 Awards: Best Academic Achievement Award (4.0 GPA) • 3rd Place UB Hackathon • The Builder Award


🤝 Let's Connect & Collaborate

💬 I'm Happy to Discuss

Lakehouse Architecture • Apache Hudi & Iceberg • AWS Data Platforms • Spark Optimization • Multi-tenant Systems • Cost Optimization • Content Creation • Tech Speaking

🎤 Speaking & Collaboration

I'm open to speaking opportunities, technical collaborations, and content partnerships around data engineering, lakehouse architecture, and AWS big data technologies.


Email LinkedIn YouTube


⚡ Fun Fact

I've created 1,600+ videos teaching big data technologies. That's like a new tutorial every other day for years!



Made with ❤️ by Soumil Shah

Last updated: 2025

Pinned

  1. Smart-way-to-Capture-Jobs-and-Process-Meta-Data-Using-DynamoDB-Project-Demo-Python-Templates

    Smart way to Capture Jobs and Process Meta Data Using DynamoDB | Project Demo | Python Templates

    CSS 4 2

  2. Project-Using-Apache-Hudi-Deltastreamer-and-AWS-DMS-Hands-on-Lab

    Project: Using Apache Hudi Deltastreamer and AWS DMS (Hands-on Lab)

    4 1

  3. Python-Flask-Redis-Celery-Docker

    Learn how to use Python with Flask, Redis, and Celery, and package everything into a Docker container

    Python 78 65

  4. An-easy-to-use-Python-utility-class-for-accessing-incremental-data-from-Hudi-Data-Lakes

    An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes

    Python 3

  5. LakeBoost

    LakeBoost

    Python 9 3

  6. emr-apache-iceberg-workshop

    emr-apache-iceberg-workshop

    Python 4