
feat(backend): enforce model accuracy gate in CI, fix negative R² by adding Crop feature#151

Merged
JosephPBaruch merged 6 commits into main from copilot/document-model-accuracy-check on Apr 14, 2026

Contributor

Copilot AI commented Apr 14, 2026

The training script had no accuracy enforcement, so a broken model could be silently baked into the Docker image. Additionally, the model produced a negative R² because it excluded crop type (which explains ~58% of yield variance) and lacked feature scaling. The Feature Sensitivity Analysis also ran on every build (including CI), and zero-yield rows (failed crops) polluted the training data.
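A "share of yield variance explained by crop" figure is what a one-way variance decomposition produces: the between-crop sum of squares divided by the total sum of squares (eta squared), which equals the in-sample R² of a model that just predicts each crop's mean yield. The sketch below assumes that is how the ~58% figure was obtained; the function name and the toy data are hypothetical, not taken from the repository.

```python
from collections import defaultdict


def variance_explained_by_group(groups, values):
    """Return SS_between / SS_total (eta squared) for a categorical grouping.

    This is the share of variance in `values` explained by group membership
    alone, i.e. the in-sample R² of predicting each group's mean.
    """
    if not values or len(groups) != len(values):
        raise ValueError("groups and values must be non-empty and equal length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variance to explain
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Toy data only (hypothetical crops and yields, not the Cook harvest data).
crops = ["crop_a", "crop_a", "crop_b", "crop_b"]
yields = [1.0, 3.0, 5.0, 7.0]
print(variance_explained_by_group(crops, yields))  # 0.8
```

A terrain-only model cannot capture this between-crop component at all, which is consistent with the negative R² described above.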

Fix negative R² — model training improvements

  • Added Crop to MODEL_FEATURE_COLUMNS: Crop type was the dominant predictor of yield (means range from ~116 g/m² for Winter Lentil to ~488 g/m² for Winter Wheat) but was excluded. The model now uses 5 features: Crop, elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness.
  • Deterministic crop encoding: Added a CROP_ENCODING dict to YieldCalculator for consistent crop-to-integer mapping across training and inference, replacing sklearn.LabelEncoder, whose integer assignments depend on which crop codes appear in the data it is fitted on.
  • Feature scaling via BatchNormalization: Added a BatchNormalization layer as the first layer in the model to handle scale differences between features (elevation ~750-800 m vs. aspect components in -1..1). The learned normalization statistics are saved in the .keras file.
  • Inference-time crop encoding: _calculate() in YieldCalculator now automatically encodes string crop codes, and raises ValueError on unknown crop codes (fail-fast).
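A minimal sketch of the fixed-mapping, fail-fast encoding described above, assuming CROP_ENCODING is a plain dict from crop code to integer. The crop codes here are placeholders (the real keys and values live in YieldCalculator and are not listed in this PR), and encode_crops is a hypothetical name.

```python
# Placeholder codes; the real CROP_ENCODING is defined in YieldCalculator.
CROP_ENCODING = {"CROP_A": 0, "CROP_B": 1, "CROP_C": 2}


def encode_crops(crops):
    """Map crop codes to fixed integers, raising on any unknown code."""
    unknown = sorted({c for c in crops if c not in CROP_ENCODING})
    if unknown:
        # Fail fast instead of silently producing -1 or NaN.
        raise ValueError(
            f"Unknown crop code(s): {unknown}; valid codes: {sorted(CROP_ENCODING)}"
        )
    return [CROP_ENCODING[c] for c in crops]


print(encode_crops(["CROP_B", "CROP_A"]))  # [1, 0]
```

Because the mapping is a constant rather than something fitted to each batch, the same crop code yields the same integer during training and at inference.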

Training script (CreateAndTrainYieldCalculatorModel.py)

  • Accuracy gate: Added MIN_R2_THRESHOLD = 0.0. The script exits non-zero if R² on the held-out test set is below this value (an R² of 0 is the score of a model that always predicts the test-set mean), which fails the Docker build and the CI pipeline.
  • Structured accuracy report: Logs MSE, MAE, RMSE, R², the threshold, and row/feature counts in a delimited block for easy inspection in CI logs.
  • Data quality: Filters out zero-yield rows (GrainYieldAirDry <= 0), since these are planting failures, not valid observations.
  • Removed Feature Sensitivity Analysis: It no longer runs during the build.
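The gate logic above can be sketched in plain Python. This is an illustration under assumptions, not the training script: drop_failed_plantings, accuracy_report, and gate are hypothetical names, the report layout is invented, and the real script may compute these metrics with sklearn. It also shows why R² can go negative: R² = 1 - SS_res / SS_tot drops below 0 whenever the model's squared error exceeds that of always predicting the mean.

```python
import math

MIN_R2_THRESHOLD = 0.0  # threshold value set by this PR


def drop_failed_plantings(rows, yield_key="GrainYieldAirDry"):
    """Keep only rows with a positive yield; <= 0 marks a planting failure."""
    return [row for row in rows if row[yield_key] > 0]


def accuracy_report(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R² on a held-out test set."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all test targets are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        # Negative when the model does worse than predicting the mean.
        "R2": 1.0 - ss_res / ss_tot,
    }


def gate(metrics, threshold=MIN_R2_THRESHOLD):
    """Print a delimited report and return the exit code for the build."""
    print("===== MODEL ACCURACY REPORT =====")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
    print(f"R2 threshold: {threshold}")
    print("=================================")
    return 0 if metrics["R2"] >= threshold else 1
```

In a script, `sys.exit(gate(accuracy_report(y_test, y_pred)))` turns a failing gate into a non-zero exit status, which is what makes the Docker RUN step, and therefore CI, fail.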

Helpers (helpers.py)

  • Updated encode() to use the fixed CROP_ENCODING mapping instead of sklearn.LabelEncoder, with validation that raises ValueError on unknown crop codes
  • Updated Create_Model() to include BatchNormalization after the input layer
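At inference time a Keras BatchNormalization layer applies a fixed per-feature affine transform using the moving mean and variance it accumulated during training: gamma * (x - moving_mean) / sqrt(moving_var + epsilon) + beta, with epsilon defaulting to 1e-3. The plain-Python sketch below mirrors that formula to show how elevation and aspect end up on comparable scales; batchnorm_inference is a hypothetical name and the statistics are made-up illustrative values, not the trained model's.

```python
import math


def batchnorm_inference(x, moving_mean, moving_var, gamma=None, beta=None, epsilon=1e-3):
    """Apply BatchNormalization's inference-mode transform to one feature row.

    gamma (scale) and beta (offset) default to 1 and 0, their Keras initial values.
    """
    n = len(x)
    gamma = [1.0] * n if gamma is None else gamma
    beta = [0.0] * n if beta is None else beta
    return [
        g * (xi - m) / math.sqrt(v + epsilon) + b
        for xi, m, v, g, b in zip(x, moving_mean, moving_var, gamma, beta)
    ]


# Illustrative statistics for [elev_mean_m, aspect_eastness].
row = [780.0, 0.5]
print(batchnorm_inference(row, moving_mean=[775.0, 0.0], moving_var=[100.0, 0.25]))
```

Both outputs land near 1 even though the raw inputs differ by three orders of magnitude, so neither feature dominates the gradients of the following layers.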

Dockerfile

  • Updated error message to point at the accuracy report in build logs
  • Added comment documenting the accuracy enforcement behavior
  • Added CACHEBUST build arg: Placed before the training RUN step so passing a unique value (e.g. git SHA, timestamp) invalidates Docker's layer cache and forces model retraining

CI (ci.yaml)

  • Model retrains on every CI build: CACHEBUST is set to ${{ github.sha }} so the training layer cache is busted on every push, ensuring the model is never stale

Docker Compose (docker-compose.yml)

  • Passes CACHEBUST build arg to the backend service, defaulting to empty (cached) for local dev; set CACHEBUST=$(date +%s) to force retraining locally
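The CACHEBUST pattern works because a build arg whose value differs from the previous build causes a cache miss on the RUN instructions that follow the ARG, so the training step re-executes. A hypothetical Python helper for scripted builds could pick the value the same way this PR does: the commit SHA in CI (GitHub Actions exposes it as the GITHUB_SHA environment variable, the same value as `${{ github.sha }}`), otherwise a Unix timestamp like `$(date +%s)`. The function names, image tag, and build context below are placeholders.

```python
import os
import time


def cachebust_value(env=None):
    """Prefer the commit SHA (set in GitHub Actions); else a Unix timestamp."""
    env = os.environ if env is None else env
    return env.get("GITHUB_SHA") or str(int(time.time()))


def docker_build_command(context=".", tag="backend", cachebust=None):
    """Build the argv for `docker build`, adding CACHEBUST only when given."""
    cmd = ["docker", "build"]
    if cachebust:
        # A new value invalidates the cached training layer.
        cmd += ["--build-arg", f"CACHEBUST={cachebust}"]
    cmd += ["-t", tag, context]
    return cmd
```

Running `subprocess.run(docker_build_command(cachebust=cachebust_value()), check=True)` would force retraining; omitting `cachebust` mirrors the empty default used for local dev, where the cached layer is reused.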

Test updates

  • test_calculator.py: Added Crop column to test data to match updated MODEL_FEATURE_COLUMNS
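A minimal pytest-style sketch of the kind of check this update implies, assuming MODEL_FEATURE_COLUMNS holds the five names listed above in that order. The helper and the feature values are hypothetical, not copied from test_calculator.py.

```python
MODEL_FEATURE_COLUMNS = [
    "Crop",
    "elev_mean_m",
    "slope_mean_deg",
    "aspect_eastness",
    "aspect_northness",
]


def missing_feature_columns(row):
    """Return the model feature columns absent from a test-data row."""
    return [col for col in MODEL_FEATURE_COLUMNS if col not in row]


def test_rows_need_crop_column():
    terrain_only = {
        "elev_mean_m": 780.0,
        "slope_mean_deg": 3.2,
        "aspect_eastness": 0.1,
        "aspect_northness": -0.4,
    }
    # Old-style rows (terrain only) no longer cover the model's inputs.
    assert missing_feature_columns(terrain_only) == ["Crop"]
    assert missing_feature_columns({**terrain_only, "Crop": "CROP_A"}) == []


if __name__ == "__main__":
    test_rows_need_crop_column()
```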

Documentation (YieldPredictionModel/README.md)

  • Added Model Accuracy Requirements section: threshold table, rationale for R², failure behavior, how to update the threshold, example log output
  • Added Model Features section: feature table, crop encoding details, and feature scaling documentation
  • Added Data Quality section: documents the Cook harvest data-cleaning steps

…dling

- Add MIN_R2_THRESHOLD (0.05) accuracy gate to training script
- Remove Feature Sensitivity Analysis section from training
- Filter out zero-yield rows from Cook harvest data (failed crops)
- Add structured accuracy report logging for CI visibility
- Update Dockerfile error message for accuracy failures
- Document accuracy requirements and thresholds in README

Closes #148

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/a4ad74d0-5f39-48bf-a47b-5a9f59d1de6e

Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Document and enforce model accuracy check in CI for backend image creation" to "feat(backend): enforce model accuracy gate in CI and clean Cook harvest data" Apr 14, 2026
Copilot AI requested a review from JosephPBaruch April 14, 2026 21:54
@JosephPBaruch
Owner

@copilot The R^2 error is too low (less than 0). Investigate why this might be happening, e.g. problems with how I am training the model and using the data.

@JosephPBaruch
Owner

@copilot Also, set the r^2 to 0 and not 0.05

Copilot AI and others added 2 commits April 14, 2026 22:13
Root cause of negative R²: the model used only 4 terrain features
(elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness)
from a single small farm where terrain has minimal variance. Crop type
explains ~58% of yield variance but was not included as a feature.

Changes:
- Add Crop to MODEL_FEATURE_COLUMNS in YieldCalculator
- Add CROP_ENCODING dict for deterministic crop-to-integer mapping
- Update _calculate() to encode string Crop values at inference time
- Update encode() in helpers.py to use the fixed mapping instead of
  sklearn LabelEncoder (which produces inconsistent encodings)
- Add BatchNormalization layer to model architecture for feature scaling
  (elev ~750-800 vs aspect ~-1..1 caused gradient dominance)
- Update test_calculator.py to include Crop column in test data
- Update README with model features and encoding documentation

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85

Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
…tly defaulting

Address code review feedback: unknown crop codes now raise a
ValueError with a clear message listing the bad codes and valid
options, instead of silently producing -1 or NaN values.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/7ad0e165-bb39-4a38-8e16-a0df8b3f6e85

Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 14, 2026

@copilot The R^2 error is too low (less than 0). Investigate why this might be happening, e.g. problems with how I am training the model and using the data.

Investigated the root cause: the model was trained on only 4 terrain features (elev_mean_m, slope_mean_deg, aspect_eastness, aspect_northness) from a single small farm (~1 km²) where terrain values have very low variance. Meanwhile, crop type alone explains ~58% of yield variance (mean yields range from ~116 g/m² for Winter Lentil to ~488 g/m² for Winter Wheat) but was excluded as a feature.

Three fixes in commits 0709ebf and e8d7d62:

  1. Added Crop to MODEL_FEATURE_COLUMNS with a fixed CROP_ENCODING dict for deterministic encoding across training and inference (replaced sklearn.LabelEncoder which produces inconsistent mappings).
  2. Added BatchNormalization as the first model layer to handle the scale mismatch (elevation ~750-800 vs aspect ~-1..1) that caused gradient dominance.
  3. Updated _calculate() to automatically encode string crop codes at inference time, and fail fast with a clear ValueError on unknown crop codes.
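Why a refitted sklearn.LabelEncoder is inconsistent: fitting sets `classes_` to the sorted unique labels it sees, so the integer assigned to a crop depends on which other crops are present in the data. The sketch below mimics that behavior in plain Python (label_encoder_style is a hypothetical stand-in, and the crop names and FIXED mapping are illustrative) and contrasts it with a fixed dict.

```python
def label_encoder_style(values):
    """Mimic LabelEncoder.fit_transform: index into the sorted unique labels."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values]


# Hypothetical fixed mapping in the spirit of CROP_ENCODING.
FIXED = {"lentil": 0, "pea": 1, "wheat": 2}

training_crops = ["lentil", "pea", "wheat"]
inference_crops = ["pea", "wheat"]

print(label_encoder_style(training_crops))   # [0, 1, 2]
print(label_encoder_style(inference_crops))  # [0, 1]: "pea" flipped from 1 to 0
print([FIXED[c] for c in inference_crops])   # [1, 2]: same as at training time
```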

Copilot AI changed the title from "feat(backend): enforce model accuracy gate in CI and clean Cook harvest data" to "feat(backend): enforce model accuracy gate in CI, fix negative R² by adding Crop feature" Apr 14, 2026
Update the accuracy gate threshold from 0.05 to 0.0 per reviewer
request, and update the README table to match.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/deeb7473-bc54-44ca-ae58-8f21b51c9788

Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 14, 2026

@copilot Also, set the r^2 to 0 and not 0.05

Updated MIN_R2_THRESHOLD from 0.05 to 0.0 in the training script and the README table in 9a3c33a.

@JosephPBaruch JosephPBaruch marked this pull request as ready for review April 14, 2026 22:33
@JosephPBaruch
Owner

@copilot Is there a way to train the model every time? I don't like that the model itself is being cached.

Add ARG CACHEBUST to the Dockerfile before the model training step.
In CI, CACHEBUST is set to the git SHA so every push retrains the
model instead of reusing the cached layer. For local builds, pass
CACHEBUST=$(date +%s) to force retraining.

Agent-Logs-Url: https://github.com/JosephPBaruch/CharAI/sessions/df81b0f8-f952-446f-9dbc-48430362a3fd

Co-authored-by: JosephPBaruch <132173774+JosephPBaruch@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 14, 2026

@copilot Is there a way to train the model every time? I don't like that the model itself is being cached.

Added a CACHEBUST build arg to the Dockerfile right before the training RUN step in 6c79401. In CI, it's set to ${{ github.sha }} so every push gets a unique value and Docker cannot reuse the cached training layer. For local builds, run CACHEBUST=$(date +%s) docker compose build backend to force retraining.

@JosephPBaruch JosephPBaruch merged commit a746184 into main Apr 14, 2026
1 check passed
@JosephPBaruch JosephPBaruch deleted the copilot/document-model-accuracy-check branch April 14, 2026 23:06
Development

Successfully merging this pull request may close these issues.

Document and enforce model accuracy check in CI for backend image creation

2 participants