Skip to content

Updating the format of our rollout (eval.py) to make it easier to calculate metrics (true to the original data).#754

Open
alxmrs wants to merge 9 commits into
mainfrom
u/alxmrs/eval-usability
Open

Updating the format of our rollout (eval.py) to make it easier to calculate metrics (true to the original data).#754
alxmrs wants to merge 9 commits into
mainfrom
u/alxmrs/eval-usability

Conversation

@alxmrs

@alxmrs alxmrs commented Jun 2, 2026

Copy link
Copy Markdown
Member

Fixes #508. This makes the three changes in this issue by correcting the underlying data (adding essential coordinate information) and propagating that data during eval time. Since this is a data fix along with an eval output fix (IMO, this was the simplest approach), I provide two data engineering corrections:

  1. I updated the original data engineering script to no longer omit the metadata that we need for evaluation. This way, we won't make the same mistake for future datasets.
  2. I've added a script to update the existing datasets in place (they should be safe since we only add new coordinate information; we don't have to re-write any of the existing chunks).

My plan is that after this PR is reviewed (and merged), I'll run the coordinate update script on the public FOMO data. For good measure, I've run the backup script on this data so we can inspect it: s3://emulators/am16581/data/2025-11/om4_onedeg_v3/OM4.zarr.

@alxmrs alxmrs changed the title Experiment: how do we conserve global ocean area? Updating the format of our rollout (eval.py) to make it easier to calculate metrics (true to the original data). Jun 3, 2026
Comment thread data/ocean_preprocessing/areacello_xesmf_check.py Outdated
@alxmrs

alxmrs commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

@codex may I have your review?

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cc2447628

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/ocean_emulators/utils/writer.py
@alxmrs

alxmrs commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Thanks, that was a genuine bug. I pushed a fix. @codex, may I have another round of review?

@alxmrs alxmrs force-pushed the u/alxmrs/eval-usability branch from 1c8b5fa to c495447 Compare June 4, 2026 00:54
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@alxmrs alxmrs requested review from adam-subel and jder June 4, 2026 01:23
@alxmrs alxmrs marked this pull request as ready for review June 4, 2026 01:23

@jder jder left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to ask @adam-subel to look at some outputs here since this is a lot of new code and I'm finding it a bit hard to follow exactly how data is flowing & being transformed now.

I would rather re-run the full pipeline rather than using/keeping the backfill script + tests to avoid any drift. What datasets would you want to backfill? Can we reprocess them instead?

More fuzzily, I am somewhat worried about the proliferation of names, magic numbers, and name patterns like foo_i in our code and tests especially given we are starting to use other datasets which presumably will not match. WDYT?

Comment on lines +33 to +79
DZ = np.array(
[
5,
10,
15,
20,
30,
50,
70,
100,
150,
200,
250,
300,
400,
500,
600,
800,
1000,
1000,
1000,
],
dtype="float64",
)
LEV = np.array(
[
2.5,
10,
22.5,
40,
65,
105,
165,
250,
375,
550,
775,
1050,
1400,
1850,
2400,
3100,
4000,
5000,
6000,
],
dtype="float64",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere we could read these from?

Comment on lines +809 to +810
base, _, idx = var_str.rpartition("_")
if base and idx.isdigit() and int(idx) < n_depths:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of testing this heuristically, can we read the known set of variable names off the DatasetSpec?

self.time_buffer = None

def _output_coords(self) -> dict:
"""Build CF-aligned output coordinates matching the ground-truth layout.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "CF-aligned"? What is "ground-truth layout"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CF is the "climate and forest" metadata standards. They are the conventions that are defaults in Xarray, but only relevant to some Xarray-openable datasets, like ours.

Let me get back to you on the other question with a closer look.

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.13/cf-conventions.html

Comment on lines +123 to +124
"lat": (("y", "x"), np.broadcast_to(y_vals[:, None], (ny, nx)).copy()),
"lon": (("y", "x"), np.broadcast_to(x_vals[None, :], (ny, nx)).copy()),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about keeping the old lat/lon around as lat_2d lon_2d or something so that we don't need to recompute them here and potentially mess something up?

Comment thread tests/test_writer.py


def _source_coords(ny, nx):
"""Coords as `get_coords_dict` returns them after backfill: 1D lat/lon dims,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this have to do with backfill?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Usability of Eval Outputs

2 participants