feat(backend): enforce model accuracy gate in CI, fix negative R² by adding Crop feature #151
…dling

- Add MIN_R2_THRESHOLD (0.05) accuracy gate to training script
- Remove Feature Sensitivity Analysis section from training
- Filter out zero-yield rows from Cook harvest data (failed crops)
- Add structured accuracy report logging for CI visibility
- Update Dockerfile error message for accuracy failures
- Document accuracy requirements and thresholds in README

Closes #148

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/a4ad74d0-5f39-48bf-a47b-5a9f59d1de6e
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
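The zero-yield filtering step can be sketched as follows. This is a minimal illustration assuming the data is loaded into a pandas DataFrame; the example rows are made up, only the column names `Crop` and `GrainYieldAirDry` come from the PR.

```python
import pandas as pd

# Hypothetical harvest rows; the real data comes from the Cook harvest dataset.
df = pd.DataFrame({
    "Crop": ["WW", "WL", "WW", "SB"],
    "GrainYieldAirDry": [488.0, 0.0, 512.5, 0.0],
})

# A zero yield marks a failed planting, not a valid observation,
# so drop those rows before training.
df = df[df["GrainYieldAirDry"] > 0].reset_index(drop=True)

print(len(df))  # rows remaining after the filter
```

Keeping the filter in the training script (rather than in the raw data) makes the rule visible and reproducible on every retrain.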
@copilot The R² is too low (less than 0). Investigate why this might be happening; there may be problems with how I am training the model and using the data.
@copilot Also, set the R² threshold to 0 and not 0.05.
Root cause of negative R²: the model used only 4 terrain features (elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness) from a single small farm where terrain has minimal variance. Crop type explains ~58% of yield variance but was not included as a feature.

Changes:

- Add Crop to MODEL_FEATURE_COLUMNS in YieldCalculator
- Add CROP_ENCODING dict for deterministic crop-to-integer mapping
- Update _calculate() to encode string Crop values at inference time
- Update encode() in helpers.py to use the fixed mapping instead of sklearn LabelEncoder (which produces inconsistent encodings)
- Add BatchNormalization layer to model architecture for feature scaling (elev ~750-800 vs aspect ~-1..1 caused gradient dominance)
- Update test_calculator.py to include Crop column in test data
- Update README with model features and encoding documentation

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
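The scale mismatch that motivated the BatchNormalization layer can be illustrated in plain Python. The values below are illustrative, not taken from the dataset; the `standardize` helper mimics the per-feature normalization a BatchNormalization layer learns, without depending on Keras.

```python
from statistics import mean, pstdev

# Illustrative raw feature values at very different scales:
# elevation in metres vs. aspect components in [-1, 1].
elev = [752.0, 768.0, 781.0, 795.0]
aspect = [-0.9, -0.2, 0.4, 0.8]

def standardize(xs):
    """Shift to zero mean and scale to unit variance, the same kind of
    per-feature normalization a BatchNormalization layer learns."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

# Before scaling, elevation values dwarf aspect values (~1000x),
# so gradient updates are dominated by the elevation weights.
print(max(elev) / max(aspect))

# After standardization both features live on comparable scales.
print(standardize(elev))
print(standardize(aspect))
```

In the actual fix the normalization statistics are learned during training and stored inside the `.keras` file, so inference applies the same scaling automatically.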
…tly defaulting

Address code review feedback: unknown crop codes now raise a ValueError with a clear message listing the bad codes and valid options, instead of silently producing -1 or NaN values.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
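A minimal sketch of the fixed-mapping approach with fail-fast validation. The crop codes and integer values below are placeholders, not the project's actual `CROP_ENCODING` contents.

```python
# Hypothetical fixed crop-to-integer mapping. A frozen dict like this gives
# identical encodings at training and inference time, unlike sklearn's
# LabelEncoder, whose codes depend on whatever data it was fit on.
CROP_ENCODING = {
    "WW": 0,  # Winter Wheat
    "WL": 1,  # Winter Lentil
    "SB": 2,  # Spring Barley
}

def encode(crops):
    """Map crop code strings to integers, raising on unknown codes."""
    unknown = sorted(set(crops) - set(CROP_ENCODING))
    if unknown:
        raise ValueError(
            f"Unknown crop codes {unknown}; "
            f"valid options are {sorted(CROP_ENCODING)}"
        )
    return [CROP_ENCODING[c] for c in crops]

print(encode(["WW", "WL", "WW"]))  # [0, 1, 0]
```

Raising immediately on unknown codes surfaces bad input at the call site instead of letting -1 or NaN propagate into predictions.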
Investigated the root cause: the model was trained on only 4 terrain features (elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness). Three fixes in the commits above.
Update the accuracy gate threshold from 0.05 to 0.0 per reviewer request, and update the README table to match.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/deeb7473-bc54-44ca-ae58-8f21b51c9788
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
Updated.
@copilot Is there a way to train the model every time? I don't like that the model itself is being cached.
Add ARG CACHEBUST to the Dockerfile before the model training step. In CI, CACHEBUST is set to the git SHA so every push retrains the model instead of reusing the cached layer. For local builds, pass CACHEBUST=$(date +%s) to force retraining.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/df81b0f8-f952-446f-9dbc-48430362a3fd
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
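A minimal Dockerfile sketch of the cache-busting pattern. The surrounding layers and invocation are assumptions; only the CACHEBUST arg and the training script name come from the PR. Note the arg is referenced inside the RUN step, since Docker only invalidates the cache at an ARG's first use, not its declaration.

```dockerfile
# Changing CACHEBUST invalidates the cache for this layer and every later one,
# so the training step always reruns when a new value is passed.
ARG CACHEBUST=
RUN echo "CACHEBUST=${CACHEBUST}" && \
    python CreateAndTrainYieldCalculatorModel.py
```

Build with `docker build --build-arg CACHEBUST=$(date +%s) .` to force retraining locally, or omit the arg to reuse the cached model layer.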
Added a
The training script had no accuracy enforcement — a broken model could be baked into the Docker image silently. Additionally, the model produced negative R² because it excluded crop type (which explains ~58% of yield variance) and lacked feature scaling. The Feature Sensitivity Analysis ran on every build (including CI), and zero-yield rows (failed crops) polluted training data.
Fix negative R² — model training improvements
- Added `Crop` to `MODEL_FEATURE_COLUMNS`: crop type was the dominant predictor of yield (means range from ~116 g/m² for Winter Lentil to ~488 g/m² for Winter Wheat) but was excluded. The model now uses 5 features: `Crop`, `elev_mean_m`, `slope_mean_deg`, `aspect_eastness`, `aspect_northness`.
- Added a `CROP_ENCODING` dict to `YieldCalculator` for consistent crop-to-integer mapping across training and inference, replacing `sklearn.LabelEncoder`, which produces different encodings depending on input data.
- Added a `BatchNormalization` layer as the first layer in the model to handle scale differences (elevation ~750-800 vs aspect ~-1..1). Learned statistics are stored in the `.keras` file.
- `_calculate()` in `YieldCalculator` now automatically encodes string crop codes, and raises `ValueError` on unknown crop codes (fail-fast).

**Training script (`CreateAndTrainYieldCalculatorModel.py`)**

- Added `MIN_R2_THRESHOLD = 0.0`: the script exits non-zero if R² on the held-out test set is below this, failing the Docker build and CI pipeline.
- Filtered out zero-yield rows (`GrainYieldAirDry <= 0`): these are planting failures, not valid observations.

**Helpers (`helpers.py`)**

- Updated `encode()` to use the fixed `CROP_ENCODING` mapping instead of `sklearn.LabelEncoder`, with validation that raises `ValueError` on unknown crop codes.
- Updated `Create_Model()` to include `BatchNormalization` after the input layer.

**Dockerfile**

- Added a `CACHEBUST` build arg, placed before the training `RUN` step so passing a unique value (e.g. git SHA, timestamp) invalidates Docker's layer cache and forces model retraining.

**CI (`ci.yaml`)**

- `CACHEBUST` is set to `${{ github.sha }}` so the training layer cache is busted on every push, ensuring the model is never stale.

**Docker Compose (`docker-compose.yml`)**

- Added the `CACHEBUST` build arg to the backend service, defaulting to empty (cached) for local dev; set `CACHEBUST=$(date +%s)` to force retraining locally.

**Test updates**

- `test_calculator.py`: added a `Crop` column to the test data to match the updated `MODEL_FEATURE_COLUMNS`.

**Documentation (`YieldPredictionModel/README.md`)**
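The accuracy gate described above can be sketched in plain Python. The variable names, example yields, and report format are assumptions; only `MIN_R2_THRESHOLD` and the exit-on-failure behavior come from the PR.

```python
import sys

MIN_R2_THRESHOLD = 0.0  # reviewer-requested gate: any negative R² fails the build

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out test-set yields and model predictions (g/m²).
y_test = [488.0, 116.0, 320.0, 410.0]
y_pred = [450.0, 150.0, 300.0, 430.0]

r2 = r_squared(y_test, y_pred)
print(f"accuracy report: R2={r2:.4f} threshold={MIN_R2_THRESHOLD}")

if r2 < MIN_R2_THRESHOLD:
    # A non-zero exit fails the Docker build step, and therefore CI.
    sys.exit(f"Model R2 {r2:.4f} below threshold {MIN_R2_THRESHOLD}")
```

Because a model that only predicts the mean scores R² = 0, a negative score means the model is worse than a constant baseline, which is exactly what the gate rejects.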