An ML-based advisory system that recommends suitable crops for Indian farmers by state and district, land size, and region-specific soil/climate. It shows the top 5 crops by advisory score (suitability, risk, and regional potential), with estimated production in kg, market price in ₹/kg, risk and disease information, and prevention measures. No profit figures are shown—advisory only. Built for final-year / academic use.
- Region-first flow: Select state, district, and land size (bigha). No manual soil/climate input—the system uses state-specific agro-climatic defaults so recommendations vary by region (e.g. Rajasthan vs Kerala vs Himachal Pradesh).
- Advisory-only output: Top 5 crops ranked by a balanced advisory score (suitability + regional potential − risk). No net profit, ROI, or profit charts.
- Indian units: Production and sale quantity in kg; prices in ₹/kg; land in bigha (with acres shown). No quintals or tons.
- Per-crop details: Suitability %, estimated production (kg), market price (₹/kg), estimated sale quantity (kg), risk score, disease/pest risks, prevention measures, and soil-based growing tips.
- Soil nutrient view: After analysis, a soil nutrient distribution (N, P, K) chart is shown as a crop-average reference for the selected region.
- Data used for analysis: Sidebar shows total records, number of states, and total crops used by the engine. Optional refresh via data.gov.in API.
- Dark theme UI; optional lighter theme in code.
Choosing the wrong crop for a given region and land leads to lower yield and wasted effort. This system acts as a decision-support tool: given state, district, and land size, it uses region-specific soil/climate profiles and an ML model to recommend the most suitable crops, with explainability, risk, and preventive advice—without showing direct profit to avoid misleading estimates.
## Data
- Crop recommendation dataset: N, P, K, temperature, humidity, ph, rainfall → crop label (e.g. Kaggle – Crop Recommendation Dataset).
- Adding more training data: Put any extra CSVs with the same columns (N, P, K, temperature, humidity, ph, rainfall, label) in `data/raw/`. The pipeline merges all compatible CSVs when you run `python run_pipeline.py`, so you can keep `Crop_Recommendation.csv` and add e.g. `crop_extra.csv`.
- Regional data (optional): `state_wise_yield.csv`, `market_prices.csv`, `cost_of_cultivation.csv`, `climate_vulnerability.csv` in `data/raw/` for state/district-aware yield, price, and risk. If absent, embedded national averages are used.
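The CSV-merging behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual loader; `load_raw_data` and `EXPECTED` are hypothetical names, and the real pipeline may apply additional validation.

```python
import glob
import os

import pandas as pd

# Columns a CSV must have to be merged into the training set.
EXPECTED = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall", "label"]


def load_raw_data(raw_dir: str = "data/raw") -> pd.DataFrame:
    """Merge every CSV in raw_dir that has the expected columns; skip the rest."""
    frames = []
    for path in glob.glob(os.path.join(raw_dir, "*.csv")):
        df = pd.read_csv(path)
        if set(EXPECTED).issubset(df.columns):  # compatible schema only
            frames.append(df[EXPECTED])
    if not frames:
        raise FileNotFoundError(f"No compatible CSVs found in {raw_dir}")
    return pd.concat(frames, ignore_index=True).drop_duplicates()
```

Incompatible files (such as the optional regional CSVs) are simply skipped, so they can live in the same directory without breaking training.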
## Region-specific inputs
- States are mapped to agro-climatic zones (arid NW, eastern humid, southern, west coast, central, Himalayan, western dry). Each zone has distinct default N, P, K, temperature, humidity, ph, rainfall (aligned with training data). State + district offsets ensure different regions get meaningfully different inputs so recommendations vary by location and land size.
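The zone-default idea can be sketched like this. The numbers and the district-offset rule below are purely illustrative assumptions (the project's calibrated values live in `src/zone_soil`); only three example states are shown.

```python
# Illustrative zone defaults; NOT the project's calibrated values.
ZONE_DEFAULTS = {
    "arid_nw":    {"N": 40, "P": 35, "K": 30, "temperature": 32.0, "humidity": 35.0, "ph": 7.8, "rainfall": 35.0},
    "west_coast": {"N": 85, "P": 45, "K": 45, "temperature": 27.0, "humidity": 85.0, "ph": 6.0, "rainfall": 220.0},
    "himalayan":  {"N": 60, "P": 50, "K": 40, "temperature": 18.0, "humidity": 60.0, "ph": 6.3, "rainfall": 110.0},
}
STATE_ZONE = {"Rajasthan": "arid_nw", "Kerala": "west_coast", "Himachal Pradesh": "himalayan"}


def region_inputs(state: str, district: str) -> dict:
    """Zone defaults for the state, nudged by a deterministic district offset."""
    base = dict(ZONE_DEFAULTS[STATE_ZONE[state]])
    # Small, repeatable offset so districts within a state differ slightly.
    offset = (sum(ord(c) for c in district) % 7) - 3
    base["N"] += offset
    base["rainfall"] += offset * 2
    return base
```

Because the offset is derived from the district name, the same state/district pair always yields the same model inputs, which keeps recommendations reproducible.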
## ML pipeline
- Preprocessing: label encoding, `StandardScaler` fitted on the training set only, stratified train–test split.
- Models: Decision Tree, Random Forest, KNN, SVM, Logistic Regression, tuned via GridSearchCV; the best model is selected by test F1-macro (e.g. SVM).
- Prediction: `predict_crop(N, P, K, temperature, humidity, ph, rainfall, land_size_bigha, state, district, scoring_mode="balanced")` returns the top 5 crops with suitability, production (kg), price (₹/kg), risk, disease risks, prevention measures, and explanations.
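The train-compare-select step can be sketched with scikit-learn as below. `train_best` is a hypothetical helper, and the candidate grids are deliberately tiny; the project tunes all five model families with larger grids.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC


def train_best(X, y):
    """Tune candidate models and keep the one with the best test F1-macro."""
    le = LabelEncoder()
    y_enc = le.fit_transform(y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y_enc, test_size=0.2, stratify=y_enc, random_state=42
    )
    scaler = StandardScaler().fit(X_tr)  # fit on training data only
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    candidates = {
        "svm": (SVC(), {"C": [1, 10], "kernel": ["rbf"]}),
        "rf": (RandomForestClassifier(random_state=42), {"n_estimators": [100]}),
    }
    best_name, best_model, best_f1 = None, None, -1.0
    for name, (est, grid) in candidates.items():
        gs = GridSearchCV(est, grid, scoring="f1_macro", cv=3).fit(X_tr, y_tr)
        f1 = f1_score(y_te, gs.predict(X_te), average="macro")
        if f1 > best_f1:
            best_name, best_model, best_f1 = name, gs.best_estimator_, f1
    return best_name, best_model, scaler, le, best_f1
```

Fitting the scaler on the training split alone (and only transforming the test split) avoids data leakage into the held-out evaluation.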
## Explainability & soil health
- Feature importance (stored in `models/metadata.json`), explanation text, and rule-based soil health messages and crop-specific suggestions.
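Reading the stored importances back for display might look like this. The `feature_importance` key is an assumption about the metadata layout; check `models/metadata.json` after a pipeline run for the actual schema.

```python
import json


def top_features(metadata_path: str = "models/metadata.json", k: int = 3) -> list[str]:
    """Return the k most important feature names recorded by the pipeline.

    Assumes metadata.json contains a 'feature_importance' name->score mapping.
    """
    with open(metadata_path) as f:
        meta = json.load(f)
    imp = meta["feature_importance"]
    return sorted(imp, key=imp.get, reverse=True)[:k]
```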
| Component | Role |
|---|---|
| StandardScaler | Feature scaling |
| LabelEncoder | Crop labels |
| SVM / KNN / RF / etc. | Classification (best model saved) |
| GridSearchCV | Hyperparameter tuning |
| Stratified K-Fold | Cross-validation |
| Region data loader | State/district yield, price, cost, vulnerability |
| Balanced scoring | Suitability + regional potential − risk (no profit in UI) |
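The balanced-scoring row above (suitability + regional potential − risk) reduces to a weighted combination. The weights here are illustrative defaults, not the values shipped in the project.

```python
def advisory_score(
    suitability: float,
    regional_potential: float,
    risk: float,
    w_suit: float = 0.5,
    w_pot: float = 0.3,
    w_risk: float = 0.2,
) -> float:
    """Balanced advisory score: suitability + regional potential - risk.

    All inputs are assumed normalised to 0..1; weights are illustrative.
    """
    return w_suit * suitability + w_pot * regional_potential - w_risk * risk
```

Because risk enters with a negative weight, two crops with equal suitability are separated by their regional potential and risk rather than by any profit figure, matching the advisory-only design.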
```
SMART CROP REC/
├── data/raw/          # Crop_Recommendation.csv (or sample); optional: state_wise_yield, market_prices, cost_of_cultivation, climate_vulnerability
├── models/            # model.joblib, scaler.joblib, label_encoder.joblib, metadata.json (after run_pipeline.py)
├── reports/figures/   # EDA and evaluation plots
├── src/               # config, data_loader, zone_soil, preprocess, train, evaluate, predictor, region_data_loader, profit_engine, risk_engine, soil_health, explainer, market_price_fetcher
├── app.py             # Streamlit UI — Smart Agriculture Advisory System
├── run_pipeline.py    # One-command ML pipeline
├── tests/             # Crop variety tests (state, district, land size)
├── requirements.txt
├── README.md
├── REPORT.md
└── docs/ARCHITECTURE.md
```
```
cd "SMART CROP REC"
pip install -r requirements.txt
```
- Place `Crop_Recommendation.csv` in `data/raw/` (e.g. from Kaggle), or use `Crop_Recommendation_sample.csv` for quick testing.
- Optional: add `state_wise_yield.csv`, `market_prices.csv`, `cost_of_cultivation.csv`, `climate_vulnerability.csv` for better region-aware results.
```
python run_pipeline.py
```
This loads the data, runs EDA, preprocesses, trains and compares models, selects the best one, and saves artifacts to `models/`.
```
streamlit run app.py
```
Or:
```
python -m streamlit run app.py
```
Then open the URL (e.g. http://localhost:8501). Select state, district, and land size (bigha), click Proceed to Analysis, and view the top 5 crops with production (kg), price (₹/kg), risk, diseases, and prevention. Use Start new analysis to run again.
```
python -m pytest tests/test_crop_variety.py -v
```
The tests verify that crop recommendations vary by state, district, and land size (i.e. not the same 5 crops for all regions).
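The core assertion those tests make can be sketched like this. `assert_regional_variety` is a hypothetical helper written against a `predict_crop`-style callable; the actual tests in `tests/test_crop_variety.py` may be structured differently.

```python
def assert_regional_variety(predict, cases):
    """Fail if every (state, district, land_size) case yields the same top-5 crops.

    `predict` is any callable returning a ranked list of {"crop": ...} dicts.
    """
    top5_sets = [tuple(rec["crop"] for rec in predict(*case)[:5]) for case in cases]
    assert len(set(top5_sets)) > 1, "recommendations identical across all regions"
```

Running it against, say, a Rajasthan case and a Kerala case catches the failure mode the project guards against: a model that ignores region and always returns the same five crops.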
- Technical design: `docs/ARCHITECTURE.md`
- Academic report: `REPORT.md`
- Code: comments in `src/` and `app.py`
Use the dataset in accordance with its source (e.g. Kaggle) and cite it in your report. This project is for academic and educational use.