This repository contains scripts to download, preprocess, standardize, and consolidate the catalogues available in the CDS.
The software environment is the same one used by the c3s-atlas user tools: https://github.com/ecmwf-projects/c3s-atlas/blob/main/environment.yml
| Directory | Contents |
|---|---|
| requests | Contains one CSV file per CDS catalogue, listing the requested variables, temporal resolution, interpolation method, the target save directory, and whether the variable is raw or requires post-processing to be standardized. |
| provenance | Contains one JSON file per catalogue describing the provenance and definitions of each variable. |
| scripts/download | Python scripts to download data from the CDS. |
| scripts/standardization | Python recipes to standardize the variables. |
| scripts/derived | Python recipes to calculate derived products from the variables. |
| scripts/interpolation | Python recipes to interpolate data using reference grids. |
| scripts/catalogue | Python recipes to produce the catalogues of downloaded data. |
| catalogues | CSV catalogues of datasets consolidated in Lustre or GPFS. The catalogues are updated through a nightly CI job. |
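To illustrate how a request file might be consumed, here is a minimal sketch of parsing one CSV from `requests/`. The column names (`variable`, `temporal_resolution`, `interpolation`, `save_dir`, `raw`) are assumptions for illustration only; the files under `requests/` are the authoritative schema.

```python
import csv
import io

# Hypothetical requests CSV -- column names are assumed, not taken from the repo.
sample = """variable,temporal_resolution,interpolation,save_dir,raw
u10,hourly,native,raw/reanalysis-era5-single-levels,true
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # A "raw" variable is saved as downloaded; otherwise it needs standardization.
    status = "raw" if row["raw"] == "true" else "needs post-processing"
    print(f"{row['variable']} ({row['temporal_resolution']}): {status}")
```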
The repository uses a structured directory path format to organize downloaded, derived, and interpolated data:

```
{base_path}/{product_type}/{dataset}/{temporal_resolution}/{interpolation}/{variable}/
```
Examples:
- Raw ERA5 hourly wind components: `/lustre/.../raw/reanalysis-era5-single-levels/hourly/native/u10/`
Note: interpolated data is stored under the `derived` product type, with the `interpolation` field indicating the target grid (e.g., `gr006`). This distinguishes it from calculated variables, which use `interpolation=native`.
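The path convention above can be sketched as a small helper. The base path `/lustre/data` is a placeholder (the real base path is elided in the example above):

```python
from pathlib import PurePosixPath

def build_data_dir(base_path, product_type, dataset,
                   temporal_resolution, interpolation, variable):
    """Build a data directory following the
    {base_path}/{product_type}/{dataset}/{temporal_resolution}/{interpolation}/{variable}/
    convention."""
    return PurePosixPath(base_path, product_type, dataset,
                         temporal_resolution, interpolation, variable)

# Raw ERA5 hourly u10 on the native grid:
raw_dir = build_data_dir("/lustre/data", "raw",
                         "reanalysis-era5-single-levels",
                         "hourly", "native", "u10")
print(raw_dir)  # /lustre/data/raw/reanalysis-era5-single-levels/hourly/native/u10
```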
Files are named `{var}_{dataset}_{date}.nc`, where `{date}` is:
- `{year}{month}` for large datasets such as CERRA, which are saved month by month (downloading is faster this way).
- `{year}` for all other datasets, which are saved year by year.
Before downloading data, you can create the complete folder structure without downloading or calculating any data:

```shell
# Preview the directories that would be created (dry-run mode)
python scripts/create_folder_structure.py --dry-run

# Create all directories
python scripts/create_folder_structure.py
```

The script reads all CSV files in the `requests/` directory and creates the directory structure according to the format:

```
{base_path}/{product_type}/{dataset}/{temporal_resolution}/{interpolation}/{variable}/
```
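A minimal sketch of that dry-run logic, assuming each request entry carries the five path components (the entry keys and base path are placeholders, not the script's actual implementation):

```python
from pathlib import Path, PurePosixPath

def plan_directories(entries, base_path, dry_run=True):
    """List (dry_run=True) or create (dry_run=False) one directory per entry,
    following the {base_path}/{product_type}/{dataset}/{temporal_resolution}/
    {interpolation}/{variable}/ convention."""
    planned = []
    for e in entries:
        path = PurePosixPath(base_path, e["product_type"], e["dataset"],
                             e["temporal_resolution"], e["interpolation"],
                             e["variable"])
        planned.append(path)
        if dry_run:
            print(f"[dry-run] would create {path}")
        else:
            Path(str(path)).mkdir(parents=True, exist_ok=True)
    return planned

entries = [{"product_type": "raw", "dataset": "reanalysis-era5-single-levels",
            "temporal_resolution": "hourly", "interpolation": "native",
            "variable": "u10"}]
dirs = plan_directories(entries, "/lustre/data")
```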