I build data pipelines, dashboards, and forecasting models, with a focus on outputs that drive real decisions rather than sit in a folder.
Based in Jersey City, NJ · Open to relocation · Authorized to work in the US on F-1 OPT (no sponsorship required) · Graduating May 2026
Nov 2025 – Apr 2026 · Hoboken, NJ
- Cut data inconsistencies 20% and reduced processing time 30% by unifying operational datasets across 12+ U.S. cities, standardizing disparate city feeds in Python (Pandas, NumPy, GeoPandas) through schema validation and geospatial joins
- Shaped multi-city transportation investment recommendations by quantifying service equity gaps across 12+ U.S. cities, applying statistical distribution metrics (Gini, coefficient of variation, quantile analysis) across 5 operational dimensions
- Automated recurring reporting to eliminate 45% of manual workload, building reusable Python scripts with parameterized refresh logic that streamlined stakeholder deliverables across the research team
Python · Pandas · GeoPandas · Statistical Analysis · ETL
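
The equity metrics named above (Gini, coefficient of variation) can be sketched in a few lines of plain Python. This is a minimal illustration, not the project code: the function names and the sample values are hypothetical, and the real analysis may have used Pandas or NumPy equivalents.

```python
from statistics import mean, pstdev


def gini(values):
    """Gini coefficient of non-negative values.

    0 means every unit receives the same amount; values near 1 mean
    the total is concentrated in very few units.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical service levels per neighborhood (illustrative numbers only)
equal_service = [5, 5, 5, 5]
concentrated_service = [0, 0, 0, 10]
```

Here `gini(equal_service)` is 0 and `gini(concentrated_service)` is 0.75, so a higher value flags a city where service is spread less evenly.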
May 2025 – Jul 2025 · Hoboken, NJ
- Enabled the research team's first cross-regional benchmarking across 9 cities by architecting a Python and AWS pipeline (S3 staging, EC2 processing), consolidating 5M+ operational records from 8 transit systems into a centralized dataset
- Uncovered 3 critical service coverage gaps across 9 regions through demographic and operational analysis in SQL and Python; findings cited in the team's policy recommendation report to city-level transit agencies
- Guided resource allocation at 4 city-level agencies by building Tableau dashboards visualizing the top 3 drivers of operational delays by geography, presented directly to stakeholders to inform service planning
Python · SQL · AWS (S3, EC2) · Tableau · Data Pipelines
Feb 2023 – May 2024 · Navi Mumbai, India
- Consolidated 3 years of procurement data into the department's first unified dataset, designing an inventory analytics pipeline that queried 50,000+ rows in SQL and restructured records with Pandas across 8+ categories, cutting data prep time 35%
- Replaced legacy Excel reporting and cut report prep time 40% by deploying a 4-dashboard Power BI suite covering inventory turnover, supplier lead times, and enrollment-driven demand with drill-through navigation
- Forecasted peak-period demand to reduce stockouts 15%, building a Python model using moving averages, exponential smoothing, and enrollment-driven calendar variables, validated against 3 prior years of data
Python · SQL · Power BI · Forecasting · Inventory Analytics
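
The two smoothing techniques mentioned in the forecasting bullet can be sketched as follows. This is a simplified illustration under assumed inputs: the function names, the `alpha` value, and the sample series are hypothetical, and the enrollment-driven calendar variables from the actual model are not reproduced here.

```python
def moving_average(series, window):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, seeded with the first observation.

    s_t = alpha * x_t + (1 - alpha) * s_(t-1)
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly demand (illustrative numbers only)
demand = [10, 20, 30]
smoothed = exponential_smoothing(demand, alpha=0.5)
```

The last smoothed value serves as the one-step-ahead forecast; a larger `alpha` reacts faster to recent demand, while a smaller one damps noise.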
| Category | Stack |
|---|---|
| Languages | Python (Pandas, NumPy, Scikit-learn, XGBoost, LightGBM), SQL, R, DAX |
| Databases | SQL Server, PostgreSQL |
| BI & Visualization | Power BI (PL-300 Certified), Tableau, Excel |
| Cloud & Tools | AWS, Git, VS Code, SSMS |
| Methods | RFM Segmentation, Cohort Analysis, A/B Testing, Regression, Hypothesis Testing, KPI Design, Forecasting, ETL Pipelines |
End-to-end customer analytics platform built on 1M+ retail transactions. It combines a Python cleaning pipeline, 8 SQL scripts (NTILE, LEAD/LAG, cohort self-joins, chi-square validation), and Power BI dashboards.
Key findings: 18% of customers drive 79% of revenue · 61% never return after first purchase
Python · SQL Server · Power BI · RFM · Cohort Analysis
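
As a rough idea of how NTILE-based RFM scoring works, the sketch below mimics SQL's `NTILE(n)` in plain Python and scores each customer from 1 to 5 on recency, frequency, and monetary value. The function names, field names, and customer rows are hypothetical; the project itself does this step in SQL.

```python
def ntile(values, n):
    """Assign each value a bucket 1..n, like SQL NTILE(n) OVER (ORDER BY value).

    When rows do not divide evenly, the earliest buckets get one extra row.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    buckets = [0] * len(values)
    pos = 0
    for b in range(1, n + 1):
        size = base + (1 if b <= extra else 0)
        for i in order[pos : pos + size]:
            buckets[i] = b
        pos += size
    return buckets


def rfm_scores(customers, n=5):
    """Return (R, F, M) scores per customer; higher is better on every axis."""
    recency = ntile([c["recency_days"] for c in customers], n)
    frequency = ntile([c["frequency"] for c in customers], n)
    monetary = ntile([c["monetary"] for c in customers], n)
    # Fewer days since the last purchase is better, so recency is inverted
    return [(n + 1 - r, f, m) for r, f, m in zip(recency, frequency, monetary)]


# Hypothetical customers (illustrative numbers only)
customers = [
    {"recency_days": 5, "frequency": 12, "monetary": 900},
    {"recency_days": 40, "frequency": 3, "monetary": 150},
    {"recency_days": 90, "frequency": 1, "monetary": 40},
    {"recency_days": 10, "frequency": 8, "monetary": 600},
    {"recency_days": 200, "frequency": 2, "monetary": 80},
]
scores = rfm_scores(customers)
```

Customers scoring high on all three axes form the small group that drives most revenue, while low recency scores point to the one-time buyers who never return.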
Analyzed 44,000+ inventory records across $2.7M+ in stock using SQL and Excel. Flagged $180K+ in idle inventory and built a Tableau dashboard tracking DIO, turnover rates, and overstock risk by SKU across 5 product categories.
SQL · Excel · Tableau · Inventory Analytics
Pricing intelligence framework using Python (XGBoost) with competitor price analysis, freight impact modeling, and Tableau dashboards. Designed to help category managers identify margin leakage and simulate pricing strategies.
Python · XGBoost · Tableau · Machine Learning
Demand forecasting model for perishable grocery items using store-level sales, weather, and calendar data. Trained a LightGBM regressor (MAE: 3.76 units) and built a prediction interface for store-level replenishment decisions.
Python · LightGBM · Feature Engineering · Forecasting
Segmented 2,000 customers into 5 behavioral clusters by income and spending patterns using K-Means. Output informed targeted marketing strategies and loyalty program prioritization.
Python · K-Means · Scikit-learn · Clustering
Supply Chain Intelligence — Python + SQL + Power BI platform analyzing 65K+ shipments across carriers, inventory, and routes. Early dashboard preview live. View repo →
- Microsoft — Power BI Data Analyst Associate (PL-300)
- Google — Data Analytics Specialization (Coursera)
Open to full-time Data Analyst roles starting May 2026. Best reached via LinkedIn or email.