Crop growth modeling is essential for understanding and predicting agricultural outcomes. Traditional process-based crop models, like ORYZA2000, are effective but often suffer from oversimplification and parameter estimation challenges. Machine learning methods, though promising, are often criticized for being "black-box" models and requiring large datasets that are frequently unavailable in real-world agricultural settings.
DeepCGM addresses these limitations by integrating knowledge-guided constraints into a deep learning model to ensure physically plausible crop growth simulations, even with sparse data.
This repository contains the code and resources for the paper "Knowledge-guided machine learning with multivariate sparse data for crop growth modelling".
- Features
- Installation
- Repository Structure
- Model Architecture
- Data
- Train the Model
- Training Flowchart
- Evaluate the Model
- License
- Mass-Conserving Deep Learning Architecture: Adheres to crop growth principles such as mass conservation to ensure physically realistic predictions.
- Knowledge-Guided Constraints: Includes crop physiology and model convergence constraints, enabling accurate predictions with sparse data.
- Improved Accuracy: Outperforms traditional process-based models and classical deep learning models on real-world crop datasets.
- Multivariable Prediction: Simulates multiple crop growth variables (e.g., biomass, leaf area) in a single framework.
To install the dependencies, clone the repository and install the required packages using the commands below:
git clone https://github.com/WUR-AI/DeepCGM.git
cd DeepCGM
conda create -n DeepCGM python==3.10.16
conda activate DeepCGM
pip install -r ./requirements.txt
- requirements.txt: Required packages.
- train.py: Script to train the different models.
- utils.py: Utility functions for data preprocessing and model support.
- fig_5.py, fig_6.py, fig_7.py, etc.: Scripts to generate figures for model results.
- models_aux: Folder containing the models.
  - DeepCGM.py: The DeepCGM model, following the detailed process of DeepCGM exactly.
  - DeepCGM_fast.py: A faster version that combines the gate calculations (according to this suggestion) and the redistribution calculations.
  - MCLSTM.py and MCLSTM_fast.py: The original MCLSTM and its speed-improved version.
- format_dataset: Formatted dataset.
- figure: Folder for storing figures generated during model evaluation and analysis.
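The speed-up in DeepCGM_fast.py comes from combining the gate calculations. A common way to do this is to stack the weight matrices of all gates and compute them with one matrix product instead of one product per gate; the result is identical, and one large operation typically runs faster in GPU frameworks than several small ones. The pure-Python sketch below only demonstrates that equivalence. The function names and toy weights are hypothetical, and this is not the code in DeepCGM_fast.py.

```python
import math


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def separate_gates(gate_weights, x):
    """Baseline: one matrix-vector product per gate."""
    return [[sigmoid(z) for z in matvec(w, x)] for w in gate_weights]


def fused_gates(gate_weights, x):
    """Stack every gate's rows into one matrix, do a single product, then split."""
    stacked = [row for w in gate_weights for row in w]
    activations = [sigmoid(z) for z in matvec(stacked, x)]
    gates, start = [], 0
    for w in gate_weights:
        gates.append(activations[start:start + len(w)])
        start += len(w)
    return gates


if __name__ == "__main__":
    w_in = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]   # toy input-gate weights
    w_out = [[0.2, 0.2, 0.2], [-0.3, 0.1, 0.4]]   # toy output-gate weights
    x = [1.0, 2.0, -1.0]
    print(separate_gates([w_in, w_out], x))
    print(fused_gates([w_in, w_out], x))
```

Both calls print the same gate values; only the number of matrix products differs.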
DeepCGM is a deep learning-based crop growth model with a mass-conserving architecture (for details, see the detailed process of DeepCGM). The architecture ensures that simulated crop growth adheres to physical principles such as mass conservation.
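As a rough illustration of mass conservation, the sketch below implements one update step in the style of MC-LSTM (the family of the MCLSTM baseline in models_aux): incoming mass is split across cells with a softmax, stored mass is redistributed by a matrix whose rows each sum to 1, and an output gate moves a fraction of each cell's mass out. Gate values are passed in as fixed logits instead of being computed from the inputs, and reading the cells as biomass pools is only an assumption for intuition. This is not DeepCGM's actual update.

```python
import math


def softmax(values):
    """Numerically stable softmax: non-negative shares that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mass_conserving_step(cells, mass_in, in_logits, redist_logits, out_gate):
    """One MC-LSTM-style step (illustrative; gates are given, not learned).

    cells:         mass stored in each cell at t-1
    mass_in:       scalar mass entering the system at t
    in_logits:     one logit per cell -> share of mass_in each cell receives
    redist_logits: n x n logits; row j -> how cell j's mass is spread over cells
    out_gate:      per-cell fraction (0..1) of mass that leaves as output
    """
    n = len(cells)
    input_share = softmax(in_logits)                   # sums to 1
    redist = [softmax(row) for row in redist_logits]   # each row sums to 1
    total = [
        sum(redist[j][k] * cells[j] for j in range(n)) + input_share[k] * mass_in
        for k in range(n)
    ]
    new_cells = [(1.0 - o) * m for o, m in zip(out_gate, total)]
    outflow = [o * m for o, m in zip(out_gate, total)]
    return new_cells, outflow


if __name__ == "__main__":
    cells = [1.0, 2.0, 0.5]
    new_cells, outflow = mass_conserving_step(
        cells,
        mass_in=0.8,
        in_logits=[0.2, -0.1, 0.4],
        redist_logits=[[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.5, 0.5, 0.0]],
        out_gate=[0.1, 0.0, 0.3],
    )
    # Stored plus released mass equals previous mass plus input (up to rounding).
    print(sum(new_cells) + sum(outflow), sum(cells) + 0.8)
```

Because every softmax row sums to 1 and the output gate only splits mass between "stored" and "released", no mass can be created or lost in the step, whatever the logits are.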
DeepCGM operates on time series data representing crop growth cycles.
- Input Data: Meteorological variables (e.g., daily solar radiation, maximum temperature, minimum temperature), management information (e.g., cumulative nitrogen applied), and, optionally, simulated variables such as crop development stage (DVS) from a process-based model such as ORYZA2000.
- Output (Target) Data: Measured crop variables used for training and evaluation, such as Plant Area Index (PAI), biomass of individual organs (leaf, stem, grain), total above-ground biomass (WAGT), and final yield.
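A minimal sketch of preparing one season for these inputs and sparse targets, assuming a daily time step, nitrogen applications given as a mapping from day index to amount, and measurements given as a mapping from day index to value. Both formats and both function names are hypothetical and do not describe the layout of format_dataset. Unmeasured days get a placeholder value and a 0 in the mask, so a loss can ignore them.

```python
from itertools import accumulate


def cumulative_nitrogen(n_days, applications):
    """Daily cumulative nitrogen from {day_index: amount applied that day}."""
    daily = [applications.get(day, 0.0) for day in range(n_days)]
    return list(accumulate(daily))


def sparse_target(n_days, observations):
    """Daily target series plus a 0/1 mask marking days with a measurement."""
    values = [observations.get(day, 0.0) for day in range(n_days)]  # 0.0 = placeholder
    mask = [1 if day in observations else 0 for day in range(n_days)]
    return values, mask


if __name__ == "__main__":
    print(cumulative_nitrogen(5, {0: 40.0, 3: 30.0}))
    print(sparse_target(5, {2: 1.5, 4: 3.0}))
```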
Run the train.py script to train the model using your formatted data:
python train.py --model DeepCGM --target spa --input_mask 1 --convergence_loss 1 --tra_year 2018
You can modify the training parameters, such as the model type, knowledge triggers, and training years, with the following arguments:
- --model: Specifies the model type (NaiveLSTM, MCLSTM, DeepCGM).
- --target: Specifies the training label (spa for the sparse dataset, int for the interpolated dataset).
- --input_mask: Enables the input mask (1 to enable, 0 to disable).
- --convergence_trigger: Enables the convergence loss (1 to enable, 0 to disable).
- --tra_year: Specifies the training year (e.g., 2018 and 2019).
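To compare several configurations, the commands can be generated instead of typed by hand. The hypothetical helper below builds argument lists for every combination of model, target and training year. It uses the spelling --convergence_loss from the example command above; the argument list names this option --convergence_trigger, so check which one train.py actually accepts before running.

```python
import itertools
import shlex


def build_commands(models, targets, years, input_mask=1, convergence=1):
    """Return one train.py argument list per (model, target, year) combination."""
    commands = []
    for model, target, year in itertools.product(models, targets, years):
        commands.append([
            "python", "train.py",
            "--model", model,
            "--target", target,
            "--input_mask", str(input_mask),
            "--convergence_loss", str(convergence),  # flag spelling is an assumption
            "--tra_year", str(year),
        ])
    return commands


if __name__ == "__main__":
    for args in build_commands(["NaiveLSTM", "MCLSTM", "DeepCGM"], ["spa", "int"], [2018, 2019]):
        print(shlex.join(args))
```

Each list can be passed to subprocess.run(args, check=True) to launch the runs one after another.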
The fitting loss, convergence loss, and input mask can be used when training DeepCGM.
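With sparse targets, a fitting loss is usually computed only on days that have a measurement. The sketch below shows a masked mean squared error and an average of it over several output variables. It is an assumption about how such a loss can be written, not the exact loss in train.py, and it leaves out the convergence loss and input mask, whose definitions are not given here.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over observed days only (mask value 1 = measured)."""
    squared = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not squared:
        return 0.0
    return sum(squared) / len(squared)


def multivariate_fitting_loss(batch):
    """Average masked MSE over variables; batch maps name -> (pred, target, mask).

    Variables with no measurements at all are skipped.
    """
    losses = [masked_mse(p, t, m) for p, t, m in batch.values() if any(m)]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    batch = {
        "PAI": ([1.0, 2.0], [1.0, 4.0], [0, 1]),
        "WAGT": ([3.0], [0.0], [0]),
    }
    print(multivariate_fitting_loss(batch))
```

Averaging raw per-variable errors assumes the variables are on comparable scales; PAI and biomass are not, so in practice they would normally be normalized first.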
Use the figure scripts (e.g., fig_5.py and fig_12.py) to generate visualizations of the model's performance. For example:
python fig_5.py
Crop growth simulation results of models trained with different data and strategies.
python fig_12.py
DeepCGM outperforms traditional process-based models (Normalized Index).
More results are saved in the figure folder, and detailed evaluation figures are generated using the provided scripts.
This project is licensed under the CC BY-NC 4.0 License — for non-commercial research and academic use only.
See LICENSE for full details.
For commercial use, please contact: [email protected]