3 changes: 1 addition & 2 deletions .readthedocs.yaml
@@ -5,7 +5,6 @@
# Required
version: 2


# Set the version of Python and other tools you might need
build:
os: ubuntu-24.04
@@ -20,4 +19,4 @@ build:
- make fetch-test-data
- uv run ref datasets ingest --source-type cmip6 $READTHEDOCS_REPOSITORY_PATH/tests/test-data/sample-data/CMIP6
# Run a strict build
- NO_COLOR=1 uv run mkdocs build --strict --site-dir $READTHEDOCS_OUTPUT/html
- unset NO_COLOR; FORCE_COLOR=1 uv run mkdocs build --strict --site-dir $READTHEDOCS_OUTPUT/html
1 change: 1 addition & 0 deletions changelog/466.docs.md
@@ -0,0 +1 @@
Add a Jupyter notebook showing how to use the OpenAPI interface of the CMIP7 Assessment Fast Track website.
32 changes: 16 additions & 16 deletions docs/how-to-guides/dataset-selection.py
@@ -5,7 +5,7 @@
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.16.4
# jupytext_version: 1.17.1
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
@@ -121,7 +121,7 @@ def display_groups(frames):


# %% [markdown]

#
# ### Facet filters
# The simplest data request is a `FacetFilter`.
# This filters the data catalog to include only the data required for a given diagnostic run.
@@ -141,7 +141,7 @@ def display_groups(frames):
display_groups(groups)
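
# %% [markdown]
# As a minimal sketch of the filter feeding the cell above (the import path is an
# assumption; the original construction cell lies outside this hunk):

# %%
from climate_ref_core.datasets import FacetFilter  # assumed import path

# Keep only catalog rows whose facets match the given values, here monthly data.
monthly_only = FacetFilter(facets={"frequency": "mon"})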

# %% [markdown]

#
# ### Group by
# The `group_by` field can be used to split the filtered data into multiple groups,
# each of which has a unique set of values in the specified facets.
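
# %% [markdown]
# A sketch of a grouped request, mirroring the `DataRequirement` usage visible later
# in this file (the import paths are assumptions):

# %%
from climate_ref_core.datasets import FacetFilter, SourceDatasetType  # assumed paths
from climate_ref_core.diagnostics import DataRequirement  # assumed path

# This produces one group per unique (variable_id, source_id, member_id)
# combination among the monthly datasets.
data_requirement = DataRequirement(
    source_type=SourceDatasetType.CMIP6,
    filters=(FacetFilter(facets={"frequency": "mon"}),),
    group_by=("variable_id", "source_id", "member_id"),
)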
@@ -166,15 +166,13 @@ def display_groups(frames):


# %% [markdown]

#
# ### Constraints
# A data requirement can optionally specify `Constraint`s.
# These constraints are applied to each group independently to modify a group or ignore it.
# All constraints must hold for a group to be executed.
# A group must not be empty after modification for it to be executed.
#
# One type of constraint is a `GroupOperation`.
# This constraint allows for the manipulation of a given group.
# This can be used to remove datasets or include additional datasets from the catalog,
# Constraints can be used to remove datasets or include additional datasets from the catalog,
# which is useful to select common datasets for all groups (e.g. cell areas).
#
# Below, an `IncludeTas` GroupOperation is included which adds the corresponding `tas` dataset to each group.
@@ -187,6 +185,8 @@ def apply(self, group: pd.DataFrame, data_catalog: pd.DataFrame) -> pd.DataFrame
tas = data_catalog[
(data_catalog["variable_id"] == "tas")
& data_catalog["source_id"].isin(group["source_id"].unique())
& data_catalog["experiment_id"].isin(group["experiment_id"].unique())
& data_catalog["member_id"].isin(group["member_id"].unique())
]

return pd.concat([group, tas])
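
# %% [markdown]
# For context, a self-contained version of the constraint above (a sketch: the class
# declaration sits outside this hunk, and any required base class is omitted):

# %%
import pandas as pd


class IncludeTas:
    """Add the matching `tas` dataset to each group."""

    def apply(self, group: pd.DataFrame, data_catalog: pd.DataFrame) -> pd.DataFrame:
        # Select tas rows sharing the group's model, experiment and ensemble member,
        # then append them to the group.
        tas = data_catalog[
            (data_catalog["variable_id"] == "tas")
            & data_catalog["source_id"].isin(group["source_id"].unique())
            & data_catalog["experiment_id"].isin(group["experiment_id"].unique())
            & data_catalog["member_id"].isin(group["member_id"].unique())
        ]
        return pd.concat([group, tas])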
@@ -195,7 +195,7 @@ def apply(self, group: pd.DataFrame, data_catalog: pd.DataFrame) -> pd.DataFrame
data_requirement = DataRequirement(
source_type=SourceDatasetType.CMIP6,
filters=(FacetFilter(facets={"frequency": "mon"}),),
group_by=("variable_id", "source_id", "member_id"),
group_by=("variable_id", "source_id", "member_id", "experiment_id"),
constraints=(IncludeTas(),),
)

@@ -205,26 +205,26 @@ def apply(self, group: pd.DataFrame, data_catalog: pd.DataFrame) -> pd.DataFrame


# %% [markdown]
# In addition to operations, a `GroupValidator` constraint can be specified.
# This validator is used to determine if a group is valid or not.
# If the validator does not return True, then the group is excluded from the list of groups for execution.
# In addition to constraints that add datasets, it is also possible to remove datasets from a group.


# %%
class AtLeast2:
def validate(self, group: pd.DataFrame) -> bool:
return len(group["instance_id"].drop_duplicates()) >= 2
def apply(self, group: pd.DataFrame, data_catalog: pd.DataFrame) -> pd.DataFrame:
if len(group["variable_id"].drop_duplicates()) >= 2:
return group
return group.loc[[]]


# %% [markdown]
# Here we add a simple constraint which ensures that at least 2 unique variables are present.
# This removes the tas-only group from above.
# This removes the groups from above where tas was not available.

# %%
data_requirement = DataRequirement(
source_type=SourceDatasetType.CMIP6,
filters=(FacetFilter(facets={"frequency": "mon"}),),
group_by=("variable_id", "source_id", "member_id"),
group_by=("variable_id", "source_id", "member_id", "experiment_id"),
constraints=(IncludeTas(), AtLeast2()),
)

285 changes: 285 additions & 0 deletions docs/how-to-guides/using-pre-computed-results.py
@@ -0,0 +1,285 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---

# %% [markdown]
# # Using pre-computed results
#
# Results computed by the CMIP7 Assessment Fast Track Rapid Evaluation Framework are available from
# the website: https://dashboard.climate-ref.org and the associated API: https://api.climate-ref.org.
# This API provides an [OpenAPI](https://www.openapis.org) schema that documents what queries are available.
# The API documentation can be viewed at: https://api.climate-ref.org/docs.
#
# This Jupyter notebook shows how to use the API to download pre-computed results and use
# them in your own analyses.
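
# %% [markdown]
# As a quick check that the API is reachable, the raw schema can be fetched directly;
# a standard OpenAPI document lists its endpoints under a top-level `paths` key
# (this cell is a sketch and is not needed for the rest of the notebook):

# %%
import requests

schema = requests.get("https://api.climate-ref.org/api/v1/openapi.json", timeout=60).json()
sorted(schema["paths"])[:5]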

# %% [markdown]
# ## Generate and install
#
# We start by generating and installing a Python package for interacting with the API
# from the OpenAPI-compatible [schema](https://api.climate-ref.org/api/v1/openapi.json).

# %%
# !uvx --quiet --from openapi-python-client openapi-python-client generate --url https://api.climate-ref.org/api/v1/openapi.json --meta setup --output-path climate_ref_client --overwrite

# %%
# !pip install --quiet ./climate_ref_client

# %% [markdown]
# ## Set up the notebook
#
# Import some libraries and load the [rich](https://rich.readthedocs.io/en/latest/introduction.html)
# Jupyter notebook extension for conveniently viewing the large data structures produced by the client
# package.

# %%
from pathlib import Path

import cartopy.crs
import matplotlib.pyplot as plt
import pandas as pd
import requests
import seaborn as sns
import xarray as xr
from climate_rapid_evaluation_framework_client import Client
from climate_rapid_evaluation_framework_client.api.diagnostics import (
diagnostics_list,
diagnostics_list_metric_values,
)
from climate_rapid_evaluation_framework_client.api.executions import executions_get
from climate_rapid_evaluation_framework_client.models.metric_value_type import (
MetricValueType,
)
from IPython.display import Markdown
from pandas_indexing import formatlevel

# %%
# %load_ext rich

# %% [markdown]
# ## View the available diagnostics
#
# We start by setting up a client for interacting with the server:

# %%
client = Client("https://api.climate-ref.org")

# %% [markdown]
# Retrieve the available diagnostics from the server, and inspect the first one:

# %%
diagnostics = diagnostics_list.sync(client=client).data
diagnostics[0]
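
# %% [markdown]
# Each diagnostic exposes, among other fields, a human-readable `name`, a `slug` and a
# `provider`; the two slugs identify the diagnostic in the API calls used below:

# %%
[(d.provider.slug, d.slug, d.name) for d in diagnostics[:3]]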

# %% [markdown]
# To get an idea of what is available, we create a list of all diagnostics
# with short descriptions (a full overview is available in Appendix C of
# [Hoffman et al., 2025](https://doi.org/10.5194/egusphere-2025-2685)):

# %%
txt = ""
for diagnostic in sorted(diagnostics, key=lambda diagnostic: diagnostic.name):
title = f"### {diagnostic.name}"
description = diagnostic.description.strip()
if not description.endswith("."):
description += "."
if diagnostic.aft_link:
description += f" {diagnostic.aft_link.short_description.strip()}"
if not description.endswith("."):
description += "."
if (aft_description := diagnostic.aft_link.description.strip()) != "nan":
description += f" {aft_description}"
if not description.endswith("."):
description += "."
txt += f"{title}\n{description}\n\n"
Markdown(txt)

# %% [markdown]
# ## Metrics
#
# Many of the diagnostics provide "metric" values, single values that describe some property
# of a model. Here we show how to access these values and create a plot.

# %%
# Select the "Atlantic Meridional Overturning Circulation (RAPID)"
# diagnostic as an example
diagnostic_name = "Atlantic Meridional Overturning Circulation (RAPID)"
diagnostic = next(d for d in diagnostics if d.name == diagnostic_name)
# Inspect an example value.
diagnostics_list_metric_values.sync(
diagnostic.provider.slug,
diagnostic.slug,
value_type=MetricValueType.SCALAR,
client=client,
).data[0]

# %% [markdown]
# Read the metric values into a Pandas DataFrame:

# %%
df = (
pd.DataFrame(
metric.dimensions.additional_properties | {"value": metric.value}
for metric in diagnostics_list_metric_values.sync(
diagnostic.provider.slug,
diagnostic.slug,
value_type=MetricValueType.SCALAR,
client=client,
).data
)
.replace("None", pd.NA)
.drop_duplicates()
)
# Drop a few columns that appear to be the same for all entries of
# this particular diagnostic.
df.drop(columns=["experiment_id", "metric", "region"], inplace=True)
# Use the columns that do not contain the metric value for indexing
df.set_index([c for c in df.columns if c != "value"], inplace=True)
df

# %% [markdown]
# and create a portrait diagram:

# %%
# Use the median metric value for models with multiple ensemble
# members to keep the figure readable.
df = df.groupby(level=["source_id", "grid_label", "statistic"]).median()
# Convert df to a "2D" dataframe for use with the seaborn heatmap plot
df_2D = (
formatlevel(df, model="{source_id}.{grid_label}", drop=True)
.reset_index()
.pivot(columns="statistic", index="model", values="value")
)
figure, ax = plt.subplots(figsize=(5, 8))
_ = sns.heatmap(
df_2D / df_2D.median(),
annot=df_2D,
cmap="viridis",
linewidths=0.5,
ax=ax,
cbar_kws={"label": "Color indicates value relative to the median"},
)
# %% [markdown]
# ## Series
#
# Many of the diagnostics provide "series" values: an array of values along with an index,
# together describing some property of a model. Here we show how to access these values and create a plot.

# %%
# Select the "Sea Ice Area Basic Metrics" diagnostic as an example
diagnostic_name = "Sea Ice Area Basic Metrics"
diagnostic = next(d for d in diagnostics if d.name == diagnostic_name)
# Inspect an example series value:
diagnostics_list_metric_values.sync(
diagnostic.provider.slug,
diagnostic.slug,
value_type=MetricValueType.SERIES,
client=client,
).data[0]

# %% [markdown]
# Read the metric values into a Pandas DataFrame:

# %%
statistic_name = "20-year average seasonal cycle"
value_name = "sea ice area (1e6 km2)"
df = pd.DataFrame(
metric.dimensions.additional_properties | {value_name: value, "month": int(month)}
for metric in diagnostics_list_metric_values.sync(
diagnostic.provider.slug,
diagnostic.slug,
value_type=MetricValueType.SERIES,
client=client,
).data
if metric.dimensions.additional_properties["statistic"].startswith(statistic_name)
for value, month in zip(metric.values, metric.index)
if value < 1e10 # Ignore some invalid values.
)
df

# %% [markdown]
# and create a plot:

# %%
_ = sns.relplot(
data=df.sort_values("source_id"),
x="month",
y=value_name,
col="region",
hue="source_id",
kind="line",
)
# %% [markdown]
# ## Files
#
# Many of the diagnostics produce NetCDF files that can be used for further analysis or custom plotting.
# We will look at the global warming levels diagnostic and create our own figure using the available data.
#
# Each diagnostic can be run (executed) multiple times with different input data. The global warming
# levels diagnostic has been executed several times, leading to multiple "execution groups":

# %%
diagnostic_name = "Climate at Global Warming Levels"
diagnostic = next(d for d in diagnostics if d.name == diagnostic_name)
[executions_get.sync(g, client=client).key for g in diagnostic.execution_groups]

# %% [markdown]
# Let's select the "ssp585" scenario and look at the output files that were produced:

# %%
for group in diagnostic.execution_groups:
execution = executions_get.sync(group, client=client)
if execution.key.endswith("ssp585"):
ssp585_outputs = execution.latest_execution.outputs
break
else:
msg = "Failed to find the ssp585 execution group"
raise ValueError(msg)
[o.filename for o in ssp585_outputs]

# %% [markdown]
# Select one of the output files and inspect it:

# %%
filename = "tas/plot_gwl_stats/CMIP6_mm_mean_2.0.nc"
file = next(f for f in ssp585_outputs if f.filename.endswith(filename))
file

# %% [markdown]
# Download the file and open it with `xarray`:

# %%
local_file = Path(Path(file.filename).name)
local_file.write_bytes(requests.get(file.url, timeout=120).content)
ds = xr.open_dataset(local_file).drop_vars("cube_label")
ds

# %% [markdown]
# Create our own plot:

# %%
plot = ds.tas.plot.contourf(
cmap="viridis",
vmin=-30,
vmax=30,
levels=11,
figsize=(12, 5),
transform=cartopy.crs.PlateCarree(),
subplot_kws={
"projection": cartopy.crs.Orthographic(
central_longitude=-100,
central_latitude=40,
),
},
)
_ = plot.axes.coastlines()