This project was developed as part of the Phase 3 Tech Challenge, focusing on On-Time Delivery (OTD) prediction using Amazon Delivery data.
The objective is to build a robust Machine Learning Operations (MLOps) pipeline that couples MLflow model serving with a Streamlit visualization app and a SimPy-based last-mile delivery simulator.
Access the live application and the full history of model experiments using the links below.
| Tool | Status | Link |
|---|---|---|
| Active Forecasting App | 🟢 Deployed | Open App |
| MLflow Experiment Tracking | 📊 Monitoring | View Dashboard |
The architecture illustrates the data flow from raw ingestion, through feature engineering and model training, to the final deployment and tracking services.
- OTD Prediction: the total time elapsed, in minutes, from the moment an order enters the system until its final delivery at the customer's location.
This prediction is crucial for ensuring deliveries meet the 120-minute Service Level Agreement (SLA), directly impacting customer satisfaction and logistics efficiency.
The Amazon Delivery Dataset provides a comprehensive view of last-mile logistics operations, including:
- 43,632 deliveries across multiple cities
- Order details and delivery agents information
- Weather and traffic conditions
- Delivery performance metrics
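To make the SLA target concrete, the sketch below derives an on-time flag from delivery durations. It uses a tiny in-memory sample; the column names (`Delivery_Time`, `Weather`, `Traffic`) are assumptions and should be adjusted to match the actual schema of the processed dataset.

```python
import pandas as pd

# Hypothetical mini-sample mirroring the processed dataset's assumed schema.
df = pd.DataFrame({
    "Delivery_Time": [95, 140, 110, 180],   # minutes from order placed to delivered
    "Weather": ["Sunny", "Stormy", "Fog", "Sandstorms"],
    "Traffic": ["Low", "Jam", "Medium", "High"],
})

SLA_MINUTES = 120
df["on_time"] = df["Delivery_Time"] <= SLA_MINUTES  # SLA compliance flag
print(f"OTD rate: {df['on_time'].mean():.0%}")      # share of orders within SLA
```

The same boolean flag can serve both as a monitoring metric and as a classification target.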
We also used SimPy for discrete-event simulation to model and analyze delivery-performance scenarios.
├── data/                  # Raw and processed data
│   ├── raw/               # Original data from Kaggle
│   ├── processed/         # Cleaned and transformed data
│   └── simulation/        # Simulation results
├── notebooks/             # Notebooks
│   ├── 02_EDA             # Exploratory Data Analysis (EDA)
│   ├── 02_MODEL_VALIDATION   # ML process analysis
│   └── 02_SIMULATION.ipynb   # Simulation data analysis
├── reports/               # Reports and figures
│   └── figures/models/    # Plots from model validation
├── src/                   # Project source code
│   ├── config/            # Configuration files
│   ├── data/              # Processing modules
│   ├── features/          # Feature engineering modules
│   ├── modeling/          # ML training modules
│   ├── models/            # Model files
│   ├── utils/             # Utility modules
│   └── visualization/     # Visualizations
├── tests/                 # Project tests
├── app.py                 # Streamlit app
├── otd_simulator.py       # SimPy script
├── pyproject.toml         # Poetry config file
└── requirements.txt       # Requirements
- ✅ Data collection from Kaggle
- ✅ Data processing and cleaning
- ✅ Storage in an organized structure
- ✅ Categorical variable mappings
- ✅ Exploratory Data Analysis (EDA)
- ✅ Feature engineering
- ✅ Training with LightGBM
- ✅ Experiment tracking with MLflow
- ✅ Model versioning with the MLflow Model Registry
- ✅ Interactive Streamlit dashboard
- ✅ Data and results visualizations
- ✅ Real-time prediction interface
- ✅ SimPy last-mile simulation
- Python 3.11.x
- Pandas & NumPy for data manipulation
- Statsmodels & SciPy for statistics
- Scikit-learn for ML pipeline
- LightGBM for ML modeling
- SHAP for feature importance
- MLflow for experiment tracking
- Streamlit for interactive dashboard
- Matplotlib, Seaborn & Plotly for visualizations
- SimPy for simulation
- Python 3.11.x
- Conda installed globally
- Poetry installed globally
Environment Variables Setup (Critical)
- Create an account at https://dagshub.com/ and generate a token from your profile settings.
- You must set up your environment variables (including DAGsHub credentials and MLflow URI) before installing dependencies.
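A typical setup exports the standard MLflow tracking variables. The exact URI depends on your DagsHub repository; the placeholders below are assumptions to replace with your own values.

```shell
# Standard MLflow env vars; DagsHub accepts your username plus the token
# generated in your profile settings as the password.
export MLFLOW_TRACKING_URI="https://dagshub.com/<user>/<repo>.mlflow"
export MLFLOW_TRACKING_USERNAME="<your-dagshub-username>"
export MLFLOW_TRACKING_PASSWORD="<your-dagshub-token>"
```

Putting these in a local `.env` file (excluded from version control) avoids re-exporting them in every shell session.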
Option 1: Installation via Conda and pip (Recommended)
- Clone the repository:
git clone https://github.com/IgorComune/tech_challenge_ml_engineer_phase3.git
cd tech_challenge_ml_engineer_phase3
- Create a conda environment:
conda create -n tech_challenge python=3.11.9
conda activate tech_challenge
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit dashboard:
streamlit run app.py

Option 2: Installation via Poetry
- Clone the repository:
git clone https://github.com/IgorComune/tech_challenge_ml_engineer_phase3.git
cd tech_challenge_ml_engineer_phase3
- Install dependencies and create the Poetry virtual environment:
# This command reads pyproject.toml, creates the .venv (virtual environment), and installs all dependencies.
poetry install
- Activate the environment:
# This command spawns a shell in the project's virtual environment.
poetry shell
- Run the Streamlit dashboard:
streamlit run app.py

- Improvement in the On-Time Delivery (OTD) SLA (120 minutes) rate from 41% to 70%, combining the predictive model with corrective real-time actions.
- Identification of key delivery patterns based on categorical features.
- Development of new predictive features based on categorical patterns and statistical tests.
- Correlation between weather conditions and delivery time.
- Traffic impact on logistics performance.
- LightGBM model for OTD prediction
- Evaluation metrics available in MLflow
- Feature importance visualization with Shap
- User-friendly interface for data analysis
- Real-time predictions
- Interactive result visualizations
- Implementation of real-time logistics policies (e.g., re-routing, agent reassignment), resulting in a measurable uplift in On-Time Delivery (OTD) performance.
- amazon_delivery_processed.csv: complete processed dataset
- models:/LightGBM_Ajustado/Production: versioned LightGBM model
- Test notebooks available in tests/
This project was developed as part of the Phase 3 Tech Challenge. Feedback and suggestions are welcome!
This project is under the license specified in the LICENSE file.
Project: Tech Challenge ML Engineer - Phase 3
Institution: Pós-Tech
"Transforming data into insights, insights into value."
