@jkikstra jkikstra commented Jan 6, 2025

This PR aims to implement a first version [v0] of the ScenarioMIP/CMIP7 emissions harmonization and gridding workflow.

It takes data from:

How to use

  1. Repo: fork this repository and check out branch jkikstra:cmip7_v0 (if you want to contribute, please branch off and make a PR into this branch)
  2. Environment: create an environment using environment.yml
  3. Docs: work-in-progress, currently building them in docs/source/index.rst
  4. Workflow: open the jupytext file notebooks/workflow-cmip7.py as a notebook
  5. Data: currently on IIASA Sharepoint ("C:\Users\kikstra\IIASA\ECE.prog - Documents\Projects\CMIP7\IAM Data Processing\concordia_cmip7_v0_testing\input") - place this where you want and adjust the config (step 6) accordingly
  6. Config: notebooks/config_cmip7_v0_testing.yaml

When all this is done, you should be able to run workflow-cmip7.py

TODO

Common (simplified) TODO list:

  • historical emissions:

  • CMIP7 workflow:

    • Harmonization:
      • ensure 1 model beyond MESSAGE can be run (do REMIND) - add to regionmapping
      • ensure all IAM models can be run - add to regionmapping
      • switch to 2023 harmonization
      • create diagnostics for 11 April IAM submission: harmonization
    • Gridding
      • investigate MESSAGE gridding results:
        • issue: why was only 'AIR' produced for CO2, and no 'anthro' or 'openburning'?
        • issue: why was CH4 agriculture empty?
      • create diagnostics for 11 April IAM submission: gridded files
      • update proxy files (with Steve)
  • comparing harmonization

    • what is the difference between regional harmonization and the current global harmonization?
    • + vetting
    • also note: CEDS from last year, not latest CEDS

Long raw todo list (notes to self)

see TODO list in docs/source/index.rst

  • check emissions other land use change variables
  • check that sector naming is aligned
    • variabledefs-cmip7_*.csv
    • CEDS (emissions_harmonization_historical)
    • GFED (emissions_harmonization_historical)
  • write IAM data aggregation and variable processing script
    • without dask?
    • [~] with dask. (note: started work on this, but does not look necessary for this workflow, so to keep things simple I removed it again)
  • update variabledefs-cmip7_noCDR.csv
  • align units between IAM and CEDS data
  • update input data: map from ssp_submission downloaded data to concordia input data
    • start from newly downloaded data
    • run pipeline on a newly submitted MESSAGE ScenarioMIP scenario (without aviation)
    • use multiple IAMs (with appropriate region mappings)
      • add region mapping code, or just the files?
  • update harmonization reporting emission files notebook
    • make it for multiple models
    • update historical data input
  • test out global-first harmonization: adding test file for top-down harmonization iiasa/aneris#79
  • make more progress on common-definitions IPCC2006 labels for machine-readable mapping; maybe ask Will, Robert, Johannes - currently working together with Luca on this...
  • update CEDS data (still harmonization in 2020)
  • create interpolation methods between 2020 and 2025
    • linear interpolation
    • interpolation based only on historical (relative) trends
    • interpolation based only on historical (absolute) trends
    • 2022 as base_year
  • update proxy .nc files (especially for N2O)
    • new CEDS based data
    • double check N2O; also in variabledefs-cmip7_noCDR.csv concordia input file
  • use interpolated input files, and move harmonization to 2022
  • update gridding files with new CEDS data (from ESGF, or direct download?)
  • update to BB4CMIP7 national GFED data (from emissions_harmonization_historical?)
  • new SSP data (NB. new data is available, but not final yet, so not updated yet)
    • GDP
    • Population
  • is country_combinations still needed when SSP data have been updated?
  • create mapping file with regionmapping following ssp_submission scenario explorer mapping style, using common-definitions / nomenclature
  • register multiple models
  • try new harmonization algorithms
  • update variabledefs-cmip7_*.csv to have CDR
  • deal properly with units and minor gases
  • remove alkalinity option?
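A minimal sketch of the three interpolation options listed above (linear, historical absolute trend, historical relative trend). This is only my reading of those bullets, not code from concordia or aneris: the function names are made up, "linear" is assumed to connect a 2020 historical value with a 2025 IAM value, and the two trend-based variants are assumed to extend the historical series forward using only its own average change.

```python
def linear_interpolation(v_start, v_end, year_start=2020, year_end=2025):
    """Return {year: value} on a straight line between the two anchor years."""
    span = year_end - year_start
    return {
        year: v_start + (v_end - v_start) * (year - year_start) / span
        for year in range(year_start, year_end + 1)
    }


def extend_with_absolute_trend(history, year_end=2025):
    """Extend a {year: value} history using its mean absolute annual change."""
    years = sorted(history)
    first, last = years[0], years[-1]
    annual_change = (history[last] - history[first]) / (last - first)
    return {
        year: history[last] + annual_change * (year - last)
        for year in range(last, year_end + 1)
    }


def extend_with_relative_trend(history, year_end=2025):
    """Extend a {year: value} history using its mean relative annual growth.

    Assumes strictly positive values; zero or negative emissions would need
    a different treatment.
    """
    years = sorted(history)
    first, last = years[0], years[-1]
    growth = (history[last] / history[first]) ** (1 / (last - first))
    return {
        year: history[last] * growth ** (year - last)
        for year in range(last, year_end + 1)
    }
```

The relative-trend variant breaks down for series that cross zero (e.g. net land-use CO2), which may be a reason to prefer the absolute variant for those.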

Updated to-do list (28.03.2025):

Updated short-term to-do list (03.04.2025):

  • work on docs and figure out how to do the gridding
    • workflow.grid (different steps)
    • input files (of workflow.grid; for scenario-based gridding)
      • ensure same format as in workflow-rescue.py
        • model
          • the 'gas' and 'sector' columns were swapped in order. Will be fixed where we do scens_iam_wide = (scens_iam.rename.pivot_table(index=[order_array]). Does not affect downscaling outcomes.
        • hist
          • otherwise mostly the same, noting some differences:
            • a bit longer because N2O now with sector and country data instead of global total
            • no CDR sector variables in cmip7 history file
          • the 'gas' and 'sector' columns were swapped in order. Will be fixed where we do hist_wide = (hist_long.rename.pivot_table(index=[order_array]). Does not affect downscaling outcomes.
        • gdp
          • same format, but cmip7 gdp starts from 2020
        • regionmapping
          • same format, with cmip7 having about a dozen more countries covered
        • indexraster_country
          • same format
        • indexraster_region
          • same format
        • variabledefs
          • same format
        • harm_overrides
          • same format
        • settings
          • same format, but check some differences:
            • variable_template: is this correct in cmip7?
            • proxy_path: same files in this folder?
            • gridding_path: same files in this folder?
            • postprocess_path: same files in this folder?
        • FOUND ISSUE: inconsistency between region name of regionmapping and region name in model data (MESSAGEix-GLOBIOM-GAINS-2.1-R12 vs. MESSAGEix-GLOBIOM-2.1-R12); with this fixed, the
          • write an assert statement before (or in) the workflowdriver that checks this for the future
    • generating 'static' (historical emissions-based) inputs before gridding
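The assert mentioned above (checking that region names in the model data match the regionmapping) could look roughly like this. This is a sketch, not the WorkflowDriver code: `check_region_names` and the `|China` / `|Western Europe` suffixes are made up for illustration; only the two model-name spellings come from the issue found above.

```python
def check_region_names(scenario_regions, mapping_regions):
    """Fail with a readable message if a scenario region is not in the regionmapping."""
    missing = sorted(set(scenario_regions) - set(mapping_regions))
    assert not missing, f"Regions in model data but not in regionmapping: {missing}"


# Illustrative inputs only: one side uses the '-GAINS-' spelling, the other does not.
mapping_regions = [
    "MESSAGEix-GLOBIOM-GAINS-2.1-R12|China",
    "MESSAGEix-GLOBIOM-GAINS-2.1-R12|Western Europe",
]
scenario_regions = [
    "MESSAGEix-GLOBIOM-2.1-R12|China",
    "MESSAGEix-GLOBIOM-2.1-R12|Western Europe",
]

try:
    check_region_names(scenario_regions, mapping_regions)
except AssertionError as err:
    print(err)  # lists the scenario regions that have no match in the mapping
```

Running this before the workflow starts should make a naming mismatch fail loudly instead of silently producing empty or partial output.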

Copied from index.rst:
The list below is ordered.
v0

  • check that sector naming is aligned
    • variabledefs-cmip7_*.csv
    • CEDS (emissions_harmonization_historical)
    • GFED (emissions_harmonization_historical)
  • write IAM data aggregation and variable processing script
    • without dask?
    • [~] with dask. (note: started work on this, but does not look necessary for this workflow, so to keep things simple I removed it again)
  • update variabledefs-cmip7_noCDR.csv
  • align units between IAM and CEDS data
  • update input data: map from ssp_submission downloaded data to concordia input data
    • start from newly downloaded data
    • run pipeline on a newly submitted MESSAGE ScenarioMIP scenario (without aviation)
    • use multiple IAMs (with appropriate region mappings)
      • add region mapping code, or just the files?
  • update harmonization reporting emission files notebook
    • make it for multiple models
    • update historical data input
  • update CEDS data to July-2024 version (still harmonization in 2020)
  • create overview by model
    • of missing sector-species information
    • HTML files to browse
  • update proxy .nc files (especially for N2O)
    • new CEDS based data
    • double check N2O; also in variabledefs-cmip7_noCDR.csv concordia input file

v1

  • update CEDS data to 2025 version

    • national data;
      • update to latest aggregate data
    • grids
      • proxy data (waiting for Steve to explain)
      • ESGF: aggregate to check against the latest aggregate data
  • test out global-first harmonization: adding test file for top-down harmonization iiasa/aneris#79

  • test out 10 vs 30 yr grids for biomass burning

  • create interpolation methods between 2020 and 2025

    • linear interpolation
    • interpolation based only on historical (relative) trends
    • interpolation based only on historical (absolute) trends
    • 2023 as base_year (possible after updating GFED CMIP7) + necessary IAM interpolation - annika
  • use interpolated input files, and move harmonization to 2022

  • update gridding files with new CEDS data (from ESGF, or direct download?)

  • update to BB4CMIP7 national GFED data (from emissions_harmonization_historical?)

  • new SSP data

    • GDP
    • Population
  • is country_combinations still needed?

  • create mapping file with regionmapping following ssp_submission scenario explorer mapping style, using common-definitions / nomenclature

  • register multiple models

  • try new harmonization algorithms

  • update variabledefs-cmip7_*.csv to have CDR

  • deal properly with units and minor gases (NO)

  • remove alkalinity option?

  • remake rasters .nc using data (https://iiasahub.sharepoint.com/:f:/r/teams/RESCUE/Shared%20Documents/WP%201/data_2024_09_16/gridding_process_files/ceds_input/input/gridding?csf=1&web=1&e=1OHegg) and script (notebooks\gridding_data\generate_non_ceds_proxy_netcdfs.py)

  • think about moving stuff directly into emissions_harmonization_historical

  • check/update ssp_comb_indexraster.nc

  • produce netCDF files for REMIND

    • switch to REMIND scenario from the April 11 resubmission (jarmo)
      • ...includes: use region-mapping file from emissions_harmonization_historical
  • deal with small countries having no GDP data

  • historical data: check e.g. AWB and Forest Burning (World) data. 2015 and 2020 emissions, but suspiciously zero in the years between and after?

  • add a CMIP7 version of rescue_utils.DS_ATTRS in a cmip7_utils file

  • Harmonization code understanding:

    • global: clarify hist CO2 agriculture (but not in model)
    • country: why is all_countries (workflow.regionmapping.data.index) only 194 long? GDP proxy?
    • country: describe how the WorkflowDriver.country_groups Iterator works
  • add test that scenario region names are all covered in the regionmapping

  • From updated yaml for regionmapping files jkikstra/concordia#3: ensure that there is a good strategy for missing countries (minor effect, we have most countries)


    I think we need a strategy for how to deal with small territories/missing countries.
    data:
      • gfed (iso mask)
      • ceds (aggregate data)
      • ceds (proxy masks): https://iiasahub.sharepoint.com/:f:/r/sites/eceprog/Shared%20Documents/Projects/CMIP7/IAM%20Data%20Processing/concordia_cmip7_v0_testing/input/gridding/20250523/Jarmo_files/mask?csf=1&web=1&e=ppMlcZ
      • IAM region mappings
      • GDP
    For this, we want to understand:
      • how the current workflow deals with emissions for 'missing' regions
      • how we make sure we're not 'losing' any emissions
      • how we make sure that the downscaling of combined regions makes sense
      • how we make sure that the gridding of combined regions makes sense
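One way to start on the "not losing any emissions" question is to quantify how much of the historical national total sits in countries that have no entry in a region mapping. The sketch below is an assumption about how that check could be written, not what the workflow does today: the function names, the region label and all numbers are invented for illustration.

```python
def split_mapped_and_unmapped(country_emissions, country_to_region):
    """Sum emissions per region and collect countries missing from the mapping."""
    regional = {}
    unmapped = {}
    for iso, value in country_emissions.items():
        region = country_to_region.get(iso)
        if region is None:
            unmapped[iso] = value
        else:
            regional[region] = regional.get(region, 0.0) + value
    return regional, unmapped


def unmapped_share(country_emissions, country_to_region):
    """Fraction of total emissions that would be dropped by the mapping."""
    _, unmapped = split_mapped_and_unmapped(country_emissions, country_to_region)
    total = sum(country_emissions.values())
    return sum(unmapped.values()) / total if total else 0.0


# Invented example values (arbitrary units); TUV is deliberately left unmapped.
emissions = {"AUT": 10.0, "DEU": 20.0, "TUV": 0.5}
mapping = {"AUT": "Region A", "DEU": "Region A"}

regional, unmapped = split_mapped_and_unmapped(emissions, mapping)
print(regional, unmapped, unmapped_share(emissions, mapping))
```

The same comparison could be repeated per data source (GFED mask, CEDS aggregates, CEDS proxy masks, GDP) to see where each small territory drops out.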

other

  • ask Will, Robert, Johannes - about ... ipcc category mapping for common-definitions ...; this may be more of an emissions_harmonization_historical thing.

jkikstra and others added 30 commits December 10, 2024 00:31
no harmonization report yet
no gridded data yet
* delete data after 2100
* `scens_iam_wide` instead of less-descriptive `model`
TODO: use local config file for data location
tested harmonization
did not test gridding
This reverts commit 2cd8b12.
jkikstra and others added 30 commits September 19, 2025 17:04
Pattern harmonisation (fix spatial pattern deviations to 2023 CEDS)
CMIP7 v0-3-0 for first alpha upload ESGF (under CMIP6Plus)