feat(backend): enforce model accuracy gate in CI, fix negative R² by adding Crop feature by Copilot · Pull Request #151 · JosephPBaruch/CharAI

Copilot · 2026-04-14T21:47:06Z

The training script had no accuracy enforcement — a broken model could be baked into the Docker image silently. Additionally, the model produced negative R² because it excluded crop type (which explains ~58% of yield variance) and lacked feature scaling. The Feature Sensitivity Analysis ran on every build (including CI), and zero-yield rows (failed crops) polluted training data.

Fix negative R² — model training improvements

- Added Crop to MODEL_FEATURE_COLUMNS: crop type was the dominant predictor of yield (mean yields range from ~116 g/m² for Winter Lentil to ~488 g/m² for Winter Wheat) but was excluded. The model now uses 5 features: Crop, elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness.
- Deterministic crop encoding: added a CROP_ENCODING dict to YieldCalculator so the crop-to-integer mapping is identical in training and inference. It replaces sklearn.LabelEncoder, whose integer codes depend on which crop codes appear in the data it is fitted on.
- Feature scaling via BatchNormalization: added a BatchNormalization layer as the first layer in the model to handle scale differences (elevation ~750-800 m vs aspect components ~-1..1). The learned statistics are stored in the .keras file.
- Inference-time crop encoding: _calculate() in YieldCalculator now encodes string crop codes automatically and raises ValueError on unknown crop codes (fail-fast).
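
The difference between a fixed mapping and a data-dependent one can be sketched as below. This is a minimal illustration, not the code from the PR: the crop codes and integer values in CROP_ENCODING are hypothetical (the PR does not list the real mapping), and encode_crops and sorted_unique_encoding are assumed helper names. sorted_unique_encoding only mimics the sorted-unique behavior of a label encoder to show why its output depends on the data it sees.

```python
# Hypothetical crop codes and integer values, for illustration only.
CROP_ENCODING = {"WW": 0, "SW": 1, "WL": 2}


def encode_crops(codes):
    """Map crop code strings to integers using the fixed mapping.

    Raises ValueError listing every unknown code instead of silently
    producing a placeholder value.
    """
    unknown = sorted({c for c in codes if c not in CROP_ENCODING})
    if unknown:
        raise ValueError(
            f"Unknown crop codes: {unknown}. Valid codes: {sorted(CROP_ENCODING)}"
        )
    return [CROP_ENCODING[c] for c in codes]


def sorted_unique_encoding(codes):
    """Build a mapping from the sorted unique values of the input.

    The result depends on which codes happen to be present, so the same
    crop can receive different integers in training and inference.
    """
    return {code: i for i, code in enumerate(sorted(set(codes)))}


# The fixed mapping gives "WW" the same integer regardless of the batch.
fixed_a = encode_crops(["WW", "WL"])
fixed_b = encode_crops(["WW"])

# The data-dependent mapping does not.
fitted_a = sorted_unique_encoding(["WW", "WL"])  # "WL" sorts before "WW"
fitted_b = sorted_unique_encoding(["WW"])
```

Here fixed_a[0] and fixed_b[0] are equal, while fitted_a["WW"] and fitted_b["WW"] differ, which is the inconsistency the fixed dict avoids.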

Training script (`CreateAndTrainYieldCalculatorModel.py`)

- Accuracy gate: added MIN_R2_THRESHOLD = 0.0; the script exits non-zero if R² on the held-out test set is below this, failing the Docker build and the CI pipeline.
- Structured accuracy report: logs MSE, MAE, RMSE, R², the threshold, and row/feature counts in a delimited block for easy CI log inspection.
- Data quality: filters out zero-yield rows (GrainYieldAirDry <= 0); these are planting failures, not valid observations.
- Removed Feature Sensitivity Analysis: it no longer runs during the build.
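
The gate and the zero-yield filter described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: r2_score, gate_exit_code and drop_failed_rows are hypothetical helper names, and R² is computed in plain Python rather than with whichever metrics library the training script actually uses. Only MIN_R2_THRESHOLD and the GrainYieldAirDry column name come from the PR. The example also shows how R² goes negative when predictions are worse than always predicting the mean.

```python
MIN_R2_THRESHOLD = 0.0  # value stated in the PR


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def gate_exit_code(r2, threshold=MIN_R2_THRESHOLD):
    """Return 0 when the model passes the gate, 1 otherwise.

    A training script would pass this value to sys.exit() so that a
    failing model also fails the Docker build step that runs it.
    """
    return 0 if r2 >= threshold else 1


def drop_failed_rows(rows, column="GrainYieldAirDry"):
    """Keep only rows with a strictly positive yield."""
    return [row for row in rows if row[column] > 0]


# Hypothetical held-out yields and two sets of predictions.
y_true = [100.0, 200.0, 300.0, 400.0]
good_pred = [110.0, 190.0, 310.0, 390.0]
bad_pred = [400.0, 300.0, 200.0, 100.0]  # worse than predicting the mean

good_r2 = r2_score(y_true, good_pred)
bad_r2 = r2_score(y_true, bad_pred)  # negative

rows = [{"GrainYieldAirDry": 0.0}, {"GrainYieldAirDry": 312.5}]
clean_rows = drop_failed_rows(rows)
```

With the threshold at 0.0, gate_exit_code(good_r2) is 0 and gate_exit_code(bad_r2) is 1, so the gate rejects only models that do worse than a constant mean prediction.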

Helpers (`helpers.py`)

- Updated encode() to use the fixed CROP_ENCODING mapping instead of sklearn.LabelEncoder, with validation that raises ValueError on unknown crop codes.
- Updated Create_Model() to include BatchNormalization after the input layer.
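
To illustrate why the normalization layer helps, the sketch below standardizes each feature column over a batch, which is the core computation a batch normalization layer applies during training before its learned scale and offset. It is plain Python, not the Keras layer itself, and batch_standardize, the eps value, and the sample rows are assumptions made for illustration.

```python
import math


def batch_standardize(batch, eps=1e-3):
    """Standardize each feature column to zero mean and roughly unit variance.

    batch is a list of rows, each row a list of floats. eps is a small
    constant that keeps the division stable when a column has no variance.
    """
    n_rows = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n_rows for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n_rows
        for j in range(n_features)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n_features)]
        for row in batch
    ]


# Hypothetical rows: [elevation in metres, aspect component].
batch = [[750.0, -1.0], [775.0, 0.0], [800.0, 1.0]]
scaled = batch_standardize(batch)
```

After standardization both columns span a similar range around zero, so the elevation values no longer dwarf the aspect values in the first weight updates.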

Dockerfile

- Updated the error message to point at the accuracy report in the build logs.
- Added a comment documenting the accuracy enforcement behavior.
- Added a CACHEBUST build arg: placed before the training RUN step, so passing a unique value (e.g. git SHA, timestamp) invalidates Docker's layer cache and forces model retraining.

CI (`ci.yaml`)

Model retrains on every CI build: CACHEBUST is set to ${{ github.sha }} so the training layer cache is busted on every push, ensuring the model is never stale

Docker Compose (`docker-compose.yml`)

Passes CACHEBUST build arg to the backend service, defaulting to empty (cached) for local dev; set CACHEBUST=$(date +%s) to force retraining locally

Test updates

test_calculator.py: Added Crop column to test data to match updated MODEL_FEATURE_COLUMNS

Documentation (`YieldPredictionModel/README.md`)

- Added Model Accuracy Requirements section: threshold table, rationale for R², failure behavior, how to update the threshold, example log output.
- Added Model Features section: feature table, crop encoding details, and feature scaling documentation.
- Added Data Quality section: Cook harvest cleaning steps.

…dling

- Add MIN_R2_THRESHOLD (0.05) accuracy gate to training script
- Remove Feature Sensitivity Analysis section from training
- Filter out zero-yield rows from Cook harvest data (failed crops)
- Add structured accuracy report logging for CI visibility
- Update Dockerfile error message for accuracy failures
- Document accuracy requirements and thresholds in README

Closes #148

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/a4ad74d0-5f39-48bf-a47b-5a9f59d1de6e
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>

JosephPBaruch · 2026-04-14T22:05:29Z

@copilot The R² is too low (less than 0). Investigate why this might be happening: are there problems with how I am training the model or using the data?

JosephPBaruch · 2026-04-14T22:05:57Z

@copilot Also, set the R² threshold to 0, not 0.05.

Root cause of negative R²: the model used only 4 terrain features (elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness) from a single small farm where terrain has minimal variance. Crop type explains ~58% of yield variance but was not included as a feature.

Changes:
- Add Crop to MODEL_FEATURE_COLUMNS in YieldCalculator
- Add CROP_ENCODING dict for deterministic crop-to-integer mapping
- Update _calculate() to encode string Crop values at inference time
- Update encode() in helpers.py to use the fixed mapping instead of sklearn LabelEncoder (which produces inconsistent encodings)
- Add BatchNormalization layer to model architecture for feature scaling (elev ~750-800 vs aspect ~-1..1 caused gradient dominance)
- Update test_calculator.py to include Crop column in test data
- Update README with model features and encoding documentation

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>

…tly defaulting

Address code review feedback: unknown crop codes now raise a ValueError with a clear message listing the bad codes and valid options, instead of silently producing -1 or NaN values.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>

Copilot · 2026-04-14T22:15:21Z

> @copilot The R² is too low (less than 0). Investigate why this might be happening: are there problems with how I am training the model or using the data?

Investigated the root cause: the model was trained on only 4 terrain features (elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness) from a single small farm (~1 km²) where terrain values have very low variance. Meanwhile, crop type alone explains ~58% of yield variance (mean yields range from ~116 g/m² for Winter Lentil to ~488 g/m² for Winter Wheat) but was excluded as a feature.
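
One common way to arrive at a "share of variance explained by a category" figure is the ratio of between-group to total sum of squares (eta squared). The comment does not say how the ~58% was computed, so the sketch below is only an assumed method, and variance_share_explained and the sample yields are hypothetical.

```python
from collections import defaultdict


def variance_share_explained(groups, values):
    """Fraction of total variance in values accounted for by group means."""
    grand_mean = sum(values) / len(values)
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical yields (g/m²) where crop type separates the values strongly.
crops = ["Winter Lentil", "Winter Lentil", "Winter Wheat", "Winter Wheat"]
yields = [100.0, 120.0, 480.0, 500.0]
share_by_crop = variance_share_explained(crops, yields)

# A grouping unrelated to yield has identical group means and explains nothing.
unrelated = ["a", "b", "a", "b"]
share_unrelated = variance_share_explained(unrelated, [100.0, 100.0, 200.0, 200.0])
```

A share close to 1 means the grouping accounts for nearly all of the spread in yield, while a share of 0 means the group means are identical; a model without access to a high-share feature has little to learn from.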

Three fixes in commits 0709ebf and e8d7d62:

- Added Crop to MODEL_FEATURE_COLUMNS with a fixed CROP_ENCODING dict for deterministic encoding across training and inference (replacing sklearn.LabelEncoder, which produces inconsistent mappings).
- Added BatchNormalization as the first model layer to handle the scale mismatch (elevation ~750-800 vs aspect ~-1..1) that caused gradient dominance.
- Updated _calculate() to automatically encode string crop codes at inference time, and to fail fast with a clear ValueError on unknown crop codes.

Update the accuracy gate threshold from 0.05 to 0.0 per reviewer request, and update the README table to match.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/deeb7473-bc54-44ca-ae58-8f21b51c9788
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>

Copilot · 2026-04-14T22:17:24Z

> @copilot Also, set the R² threshold to 0, not 0.05.

Updated MIN_R2_THRESHOLD from 0.05 to 0.0 in the training script and the README table in 9a3c33a.

JosephPBaruch · 2026-04-14T22:49:14Z

@copilot Is there a way to train the model every time? I don't like that the model itself is being cached.

Add ARG CACHEBUST to the Dockerfile before the model training step. In CI, CACHEBUST is set to the git SHA so every push retrains the model instead of reusing the cached layer. For local builds, pass CACHEBUST=$(date +%s) to force retraining.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/df81b0f8-f952-446f-9dbc-48430362a3fd
Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>

Copilot · 2026-04-14T22:51:06Z

> @copilot Is there a way to train the model every time? I don't like that the model itself is being cached.

Added a CACHEBUST build arg to the Dockerfile right before the training RUN step in 6c79401. In CI, it's set to ${{ github.sha }} so every push gets a unique value and Docker cannot reuse the cached training layer. For local builds, run CACHEBUST=$(date +%s) docker compose build backend to force retraining.

Initial plan

580d7b0

Copilot AI assigned Copilot and JosephPBaruch Apr 14, 2026

Copilot AI linked an issue Apr 14, 2026 that may be closed by this pull request

Document and enforce model accuracy check in CI for backend image creation #148

Closed

Copilot started work on behalf of JosephPBaruch April 14, 2026 21:47 View session

Copilot AI changed the title ~~[WIP] Document and enforce model accuracy check in CI for backend image creation~~ feat(backend): enforce model accuracy gate in CI and clean Cook harvest data Apr 14, 2026

Copilot AI requested a review from JosephPBaruch April 14, 2026 21:54

Copilot finished work on behalf of JosephPBaruch April 14, 2026 21:55

Copilot started work on behalf of JosephPBaruch April 14, 2026 22:05 View session

Copilot AI and others added 2 commits April 14, 2026 22:13

Copilot AI changed the title ~~feat(backend): enforce model accuracy gate in CI and clean Cook harvest data~~ feat(backend): enforce model accuracy gate in CI, fix negative R² by adding Crop feature Apr 14, 2026

Copilot finished work on behalf of JosephPBaruch April 14, 2026 22:16

Copilot started work on behalf of JosephPBaruch April 14, 2026 22:16 View session

Copilot finished work on behalf of JosephPBaruch April 14, 2026 22:18

JosephPBaruch marked this pull request as ready for review April 14, 2026 22:33

Copilot started work on behalf of JosephPBaruch April 14, 2026 22:49 View session

Copilot finished work on behalf of JosephPBaruch April 14, 2026 22:52

JosephPBaruch merged commit a746184 into main Apr 14, 2026
1 check passed

JosephPBaruch deleted the copilot/document-model-accuracy-check branch April 14, 2026 23:06

JosephPBaruch merged 6 commits into main from copilot/document-model-accuracy-check
