Updating the format of our rollout (eval.py) to make it easier to calculate metrics (true to the original data).#754
Updating the format of our rollout (eval.py) to make it easier to calculate metrics (true to the original data).#754alxmrs wants to merge 9 commits into
eval.py) to make it easier to calculate metrics (true to the original data).#754Conversation
eval.py) to make it easier to calculate metrics (true to the original data).
|
@codex may I have your review? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cc2447628
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Thanks, that was a genuine bug. I pushed a fix. @codex, may I have another round of review? |
… issues, provided the data is properly structured.
1c8b5fa to
c495447
Compare
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
jder
left a comment
There was a problem hiding this comment.
Would be great to ask @adam-subel to look at some outputs here since this is a lot of new code and I'm finding it a bit hard to follow exactly how data is flowing & being transformed now.
I would rather re-run the full pipeline rather than using/keeping the backfill script + tests to avoid any drift. What datasets would you want to backfill? Can we reprocess them instead?
More fuzzily, I am somewhat worried about the proliferation of names, magic numbers, and name patterns like foo_i in our code and tests especially given we are starting to use other datasets which presumably will not match. WDYT?
| DZ = np.array( | ||
| [ | ||
| 5, | ||
| 10, | ||
| 15, | ||
| 20, | ||
| 30, | ||
| 50, | ||
| 70, | ||
| 100, | ||
| 150, | ||
| 200, | ||
| 250, | ||
| 300, | ||
| 400, | ||
| 500, | ||
| 600, | ||
| 800, | ||
| 1000, | ||
| 1000, | ||
| 1000, | ||
| ], | ||
| dtype="float64", | ||
| ) | ||
| LEV = np.array( | ||
| [ | ||
| 2.5, | ||
| 10, | ||
| 22.5, | ||
| 40, | ||
| 65, | ||
| 105, | ||
| 165, | ||
| 250, | ||
| 375, | ||
| 550, | ||
| 775, | ||
| 1050, | ||
| 1400, | ||
| 1850, | ||
| 2400, | ||
| 3100, | ||
| 4000, | ||
| 5000, | ||
| 6000, | ||
| ], | ||
| dtype="float64", |
There was a problem hiding this comment.
Is there somewhere we could read these from?
| base, _, idx = var_str.rpartition("_") | ||
| if base and idx.isdigit() and int(idx) < n_depths: |
There was a problem hiding this comment.
Instead of testing this heuristically, can we read the known set of variable names off the DatasetSpec?
| self.time_buffer = None | ||
|
|
||
| def _output_coords(self) -> dict: | ||
| """Build CF-aligned output coordinates matching the ground-truth layout. |
There was a problem hiding this comment.
What is "CF-aligned"? What is "ground-truth layout"?
There was a problem hiding this comment.
CF is the "climate and forest" metadata standards. They are the conventions that are defaults in Xarray, but only relevant to some Xarray-openable datasets, like ours.
Let me get back to you on the other question with a closer look.
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.13/cf-conventions.html
| "lat": (("y", "x"), np.broadcast_to(y_vals[:, None], (ny, nx)).copy()), | ||
| "lon": (("y", "x"), np.broadcast_to(x_vals[None, :], (ny, nx)).copy()), |
There was a problem hiding this comment.
How about keeping the old lat/lon around as lat_2d lon_2d or something so that we don't need to recompute them here and potentially mess something up?
|
|
||
|
|
||
| def _source_coords(ny, nx): | ||
| """Coords as `get_coords_dict` returns them after backfill: 1D lat/lon dims, |
There was a problem hiding this comment.
What does this have to do with backfill?
Fixes #508. This makes the three changes in this issue by correcting the underlying data (adding essential coordinate information) and propagating that data during eval time. Since this is a data fix along with an eval output fix (IMO, this was the simplest approach), I provide two data engineering corrections:
My plan is that after this PR is reviewed (and merged), I'll run the coordinate update script on the public FOMO data. For good measure, I've run the backup script on this data so we can inspect it:
s3://emulators/am16581/data/2025-11/om4_onedeg_v3/OM4.zarr.