MahdiNavaei/README.md

👋 Hi, I'm Mahdi Navaei

AI Engineer / Data Scientist | Building Production-Grade AI Systems | Tehran, Iran 🇮🇷

7+ years specializing in Agentic AI, LLMs, RAG Systems, and Enterprise ML Pipelines

LinkedIn | GitHub | Kaggle | Email


🚀 About Me

I'm an AI Engineer and Data Scientist with 7+ years of experience designing and deploying production-grade intelligent systems at scale. I specialize in cutting-edge AI technologies that drive real business impact:

🎯 Core Specializations

🤖 Agentic AI & Autonomous Workflows

  • Building autonomous AI agents capable of multi-step reasoning and complex task orchestration

  • Designing structured reasoning frameworks for root-cause analysis and problem-solving

  • Developing retrieval-augmented agentic workflows with external tool integration

💬 Large Language Models (LLMs) & NLP

  • Fine-tuning GPT, LLaMA, and custom transformer models for domain-specific applications

  • Implementing RAG (Retrieval-Augmented Generation) architectures with hybrid retrieval strategies

  • Building Natural-Language-to-SQL agents and conversational AI systems

  • Advanced prompt engineering and model optimization techniques

🔍 Enterprise RAG & Knowledge Systems

  • Creating enterprise-grade knowledge engines with hybrid retrieval (semantic + keyword)

  • Implementing re-ranking pipelines for improved relevance and accuracy

  • Building production RAG systems for organizational knowledge management

📊 Production ML & MLOps

  • End-to-end ML pipeline development from data ingestion to model deployment

  • Scalable recommendation systems for large-scale e-commerce platforms

  • Real-time inference systems with optimized performance and latency


💼 Current Role & Impact

Data Scientist at Daria Hamrah Paytakht (Jul 2024 – Present)

Leading AI initiatives and building production systems that serve enterprise customers:

🔥 Key Projects & Achievements

🧠 Agentic AI Workflows

  • Developed autonomous AI agents capable of performing multi-step root-cause analysis on customer complaints through structured reasoning and retrieval orchestration

  • Built agents that autonomously navigate complex decision trees and generate actionable insights

💬 LLM-Powered Intelligence Systems

  • Built an autonomous Natural-Language-to-SQL agent that understands Persian queries, generates validated SQL, executes it on PostgreSQL, and produces automated analysis with visualizations in an end-to-end LLM-driven workflow

  • Developed an LLM-powered call-center intelligence pipeline integrating speech-to-text transcription, entity extraction, and automated agent-performance scoring, substantially improving insight coverage and quality-control effectiveness

  • Integrated LLM agents into analytics dashboards, enabling conversational insights, automated reporting, and interactive data exploration

🔍 Enterprise RAG Knowledge Engine

  • Created an enterprise RAG knowledge engine using hybrid retrieval (semantic + keyword) and advanced re-ranking to deliver accurate, context-grounded responses and better access to organizational knowledge

  • Implemented retrieval pipelines optimized for accuracy and relevance in production environments
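
One common way to combine semantic and keyword results is reciprocal rank fusion (RRF). The sketch below is illustrative only: the README does not say which fusion method the knowledge engine uses, and the function name, the constant `k=60`, and the document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The value k=60 is a conventional default,
    not something taken from the README.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a keyword search.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and a re-ranking model would then typically be applied to the first few fused results.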

📈 Production ML Systems

  • Built a large-scale hybrid recommender system (content-based + collaborative) enhanced with RFM-based (recency, frequency, monetary) personalization to deliver precise, real-time user targeting

  • Designed an aspect-based sentiment analysis framework to surface issue-level signals across device models, directly supporting after-sales strategy and product optimization

🎤 Multimodal AI Capabilities

  • Added production-grade speech-to-text (STT) and text-to-speech (TTS) capabilities for automated report narration, customer-support voice responses, and enhanced call-center automation pipelines

🏆 Key Achievements

  • 🥈 2nd Place in Tehran Provincial AI Competition (2022)

  • 🎓 Member of Iran's National Elites Foundation

  • 📜 Kaggle Notebooks Master

  • 📄 Published Research in Health Science Reports (Wiley), ICVPR, AMLAI


🛠️ Tech Stack

🤖 AI & LLM Technologies

OpenAI | GPT | LLaMA | Transformers | LangChain | LlamaIndex | RAG | Fine-tuning | Prompt Engineering | Semantic Search | Vector Databases

🧠 Agentic AI & Workflows

Agentic AI | Multi-step Reasoning | Retrieval Orchestration | Structured Reasoning | Task Orchestration | Tool Integration

💬 NLP & Language Technologies

Transformers | BERT | GPT | LLaMA | Natural Language Processing | Semantic Search | Entity Extraction | Sentiment Analysis | Aspect-based Analysis | STT/TTS

🔍 RAG & Knowledge Systems

Retrieval-Augmented Generation | Hybrid Retrieval | Re-ranking | Vector Embeddings | Knowledge Graphs | Document Processing

🚀 Production ML & Engineering

FastAPI | Docker | Kubernetes | MLOps | Model Deployment | Real-time Inference | A/B Testing | Monitoring & Logging

📊 ML & Data Science

Python | TensorFlow | Keras | PyTorch | scikit-learn | Pandas | NumPy | SciPy | Collaborative Filtering | Deep Learning | Time-Series Forecasting | Causal Inference

🗄️ Data & Infrastructure

SQL | PostgreSQL | NoSQL | MongoDB | Parquet | Apache Spark | Distributed Computing

🌐 Full-Stack Development

React | TypeScript | JavaScript | REST APIs | GraphQL | Microservices

📈 Analytics & BI

Power BI | Tableau | Data Visualization | Business Intelligence


🌟 Featured Projects

DriveShield: End-to-end collision prediction platform using Nexar's state-of-the-art BADAS-Open model, a FastAPI backend, and a fully bilingual React dashboard.

Key Features:

  • 🎯 State-of-the-Art Collision Prediction: Integrates Nexar's BADAS-Open vision model for real-time risk analysis.
  • 🚀 Production-Ready Architecture: Scalable FastAPI backend and a modern, responsive React + TypeScript frontend.
  • 🌐 Bilingual UI (English/Persian): Features real-time language switching and a dark, modern theme.
  • 🔒 100% Offline Inference: Runs entirely locally without external API calls, ideal for production and edge deployments.
  • 📊 Comprehensive Evaluation Pipeline: Includes industry-standard metrics like AUC-ROC and Average Precision.
  • 🎬 Live Demo GIF: Showcases the full user workflow from video upload to risk visualization.

Tech Stack: Python | FastAPI | React | TypeScript | PyTorch | Computer Vision | MLOps

Production-Ready Features:

  • Complete MLOps workflow from SOTA model integration to interactive UI.
  • Designed for scalability, clean code practices, and type safety.
  • Fully reproducible setup and evaluation instructions.

🔗 View Repository →


Production-ready hybrid recommender system combining collaborative filtering & content-based ML for large-scale e-commerce applications.

Key Features:

  • 🔀 Hybrid recommendation engine (Collaborative Filtering + Content-Based)

  • 🚀 FastAPI backend with async support and structured logging

  • 🌐 Bilingual React UI (English/Persian) with RTL/LTR support

  • 📊 Comprehensive offline evaluation metrics (Precision@K, Recall@K, NDCG@K, MAP@K)

  • 🐳 Docker containerization for easy deployment

  • 📈 Real-time recommendations with optimized sparse matrix operations

  • 🔧 Modular architecture following production best practices
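
The offline metrics named above can be sketched for a single user as follows. This is a minimal illustration with binary relevance, not the repository's evaluation code; the function names and the sample items are assumptions.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical single-user example.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

p = precision_at_k(recommended, relevant, k=4)
r = recall_at_k(recommended, relevant, k=4)
n = ndcg_at_k(recommended, relevant, k=4)
print(p, r, n)
```

In practice these per-user values are averaged over all test users, and MAP@K is computed in the same loop by averaging precision at each rank where a relevant item appears.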

Tech Stack: Python | FastAPI | React | TypeScript | NumPy | SciPy | scikit-learn | Docker

Results: Achieved a 140% improvement in precision and a 175% improvement in recall over a baseline on a dataset of 38K+ users, demonstrating effectiveness on challenging real-world data.

Production-Ready Features:

  • Complete end-to-end pipeline from raw data to web interface

  • Scalable architecture designed for enterprise deployment

  • Comprehensive evaluation framework for model comparison

🔗 View Repository →


Enterprise-grade intelligent pricing and ETA prediction platform for ride-hailing platforms, combining predictive demand forecasting, real-time ETA estimation, and dynamic surge pricing optimization.

Key Features:

  • 🎯 Predictive Surge Pricing Engine: Anticipates future demand-supply imbalances using ML models, enabling proactive pricing adjustments before demand spikes occur and reducing price volatility by 30-40%

  • ⏱️ Advanced ETA Prediction: Achieves 20%+ improvement in accuracy over baseline using distance, speed, and zone-specific load factors with robust fallback mechanisms

  • 📊 Demand Forecasting: Predicts supply-demand imbalance within a 15% margin of error over 5-30 minute horizons, enabling data-driven pricing decisions

  • 💰 Revenue Optimization: Demonstrates a 10-25% improvement in platform revenue per trip while maintaining customer satisfaction through balanced pricing strategies

  • 🚀 Real-Time Marketplace Dashboard: Interactive React dashboard with heatmaps, KPI delta cards, and scatter plot visualizations for policy comparison and real-time monitoring

  • 🔀 Multiple Pricing Policies: Implements three sophisticated pricing strategies:

    • Base Policy: Fixed pricing baseline for comparison
    • Smart Surge v1: Reactive surge pricing with demand-supply ratio analysis
    • Predictive Surge v2: Anticipatory surge pricing using short-term demand forecasting
  • 📈 Policy Simulation Engine: Comprehensive replay simulator comparing pricing policies with detailed KPI analysis (ETA, completion rates, revenue, volatility)

  • 🗺️ Geospatial Intelligence: Built with GeoPandas and OSMnx for Tehran, enabling accurate routing, distance calculations, and zone-based demand analysis
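
The difference between the three pricing policies can be illustrated with a toy multiplier function. This is a sketch under stated assumptions: the README gives no formulas, so the linear mapping, the `sensitivity` and `cap` parameters, and all function names are hypothetical.

```python
def surge_multiplier(demand, supply, sensitivity=0.5, cap=2.5):
    """Map a demand/supply ratio to a price multiplier in [1.0, cap].

    The linear form and the default parameters are assumptions for illustration.
    """
    ratio = demand / max(supply, 1)  # treat zero supply as one driver to avoid division by zero
    raw = 1.0 + sensitivity * (ratio - 1.0)
    return min(max(raw, 1.0), cap)


def base_policy(demand, supply):
    """Fixed pricing: never applies a surge."""
    return 1.0


def reactive_policy(current_demand, supply):
    """Reacts to the demand observed right now."""
    return surge_multiplier(current_demand, supply)


def predictive_policy(forecast_demand, supply):
    """Prices against a short-term demand forecast instead of current demand."""
    return surge_multiplier(forecast_demand, supply)


# Hypothetical zone: 30 open requests now, 50 forecast for the next window, 20 drivers.
print(base_policy(30, 20))        # 1.0
print(reactive_policy(30, 20))    # 1.25
print(predictive_policy(50, 20))  # 1.75
```

The predictive policy raises prices earlier because it sees the forecast spike before it shows up in current demand, which is the mechanism the README credits for lower price volatility.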

Tech Stack: Python | FastAPI | React | GeoPandas | OSMnx | NetworkX | scikit-learn | Time-Series Forecasting | Geospatial Analysis | MLOps

Performance Metrics:

  • ETA accuracy: +20% over baseline
  • Demand forecast error: within ±15%
  • Trip completion rate in high-demand zones: +5-15%
  • Revenue per trip: +10-25%
  • Price volatility: 30-40% reduction

Production-Ready Features:

  • Complete end-to-end ML pipeline from data ingestion to real-time API deployment
  • Scalable FastAPI backend with async support and structured logging
  • Bilingual React dashboard (English/Persian) with real-time visualizations
  • Comprehensive evaluation framework with policy comparison and KPI tracking
  • Production-grade architecture designed for enterprise ride-hailing platforms

🔗 View Repository →


📁 Other Notable Projects

Advanced CNN-based model for highly accurate classification of blood cells. Achieved over 99% accuracy, ensuring precise identification across diverse cell types for streamlined medical diagnostics.

Tech Stack: Python | TensorFlow | Keras | CNN | Medical Imaging | Computer Vision


Production recommendation system using collaborative filtering and content-based filtering techniques. Demonstrated 8% sales increase after deployment, showcasing real business impact.

Tech Stack: Python | Collaborative Filtering | Content-Based Filtering | scikit-learn | Production ML


Automated data collection pipeline for historical stock price data from Yahoo Finance with database storage. Built for training ML models for stock price prediction and time-series analysis.

Tech Stack: Python | Web Scraping | Database Design | Data Pipeline | Time-Series Data


Deep learning model using Convolutional Neural Networks (CNNs) to classify images from the CIFAR-10 dataset. Achieved over 90% accuracy on the test set.

Tech Stack: Python | TensorFlow | Keras | CNN | Computer Vision | Image Classification


Production-ready deep learning classifier for cats and dogs using Convolutional Neural Networks. Achieved over 90% accuracy on the test set.

Tech Stack: Python | TensorFlow | Keras | CNN | Image Classification | Transfer Learning


Comprehensive customer segmentation and personality analysis for targeted marketing campaigns. Demonstrates advanced analytics and data-driven decision making.

Tech Stack: Python | Customer Analytics | Marketing Analytics | Clustering | Data Visualization | Business Intelligence


📚 Publications & Research

Peer-Reviewed Publications


💼 Professional Experience

Data Scientist | Daria Hamrah Paytakht (Jul 2024 – Present)

Leading AI initiatives for enterprise customers, building production-grade systems serving thousands of users.

Data Science Team Lead | Diar-e Kohan CO. (Sep 2020 – May 2022)

Led a team of 7 data scientists and analysts, overseeing end-to-end ML projects and ensuring timely delivery of scalable solutions. Implemented data-driven strategies that increased bookstore sales by 5%.

Data Scientist | Diar-e Kohan CO. (Sep 2018 – Sep 2020)

Developed ML models for book sales prediction across Iran's bookstores, enabling targeted book distribution based on reading interests in different provinces.


🎓 Education

  • Master's Degree in Artificial Intelligence | Islamic Azad University (Jun 2024 – Present)

  • Bachelor's Degree in Information Technology | University of Applied Science and Technology (Feb 2024)


💬 Languages

  • 🇬🇧 English – Duolingo English Test: 120 (Proficient)

  • 🇩🇪 German – A2 (Basic)

  • 🇮🇷 Persian – Native


🎯 Looking For

I'm actively seeking opportunities to:

  • 🚀 Build and scale Agentic AI systems and LLM applications at innovative companies

  • 💼 Work on production-grade AI systems that solve real business problems

  • 🌍 Collaborate with international teams on cutting-edge AI/ML projects

  • 📈 Contribute to enterprise RAG systems and knowledge management platforms

  • 🤝 Join forward-thinking organizations that value innovation and technical excellence

Open to: Remote positions, Contract work, Full-time opportunities worldwide


📬 Let's Connect!

I'm always open to discussing AI/ML projects, collaborating on interesting initiatives, or exploring new opportunities. Let's connect!

LinkedIn | GitHub | Kaggle | Email


⭐ If you find my work interesting, please consider giving my repositories a star!

Building the future of AI, one system at a time. 🚀

Made with ❤️ by Mahdi Navaei

Pinned

  1. DriveShield (Public)

    Real-time dashcam collision risk prediction with BADAS-Open, FastAPI backend, and a bilingual React dashboard.

    Python

  2. ParaBoostForest-Hybrid-Parallel-Boosting-for-Imbalanced-Learning (Public)

    Reproducible credit-card fraud benchmark on Kaggle with Optuna tuning, PR-based thresholding, and publication-ready plots; compares ParaBoostForest, RF, and XGBoost.

    Python