feat: add CMIP6-to-CMIP7 Data Request variable mappings #530
base: main
Pull request overview
Adds bundled CMIP6→CMIP7 Data Request (DReq) variable mappings and updates Climate REF’s CMIP7 conversion + ESMValTool provider/diagnostics so CMIP7 datasets can be handled via OR-logic data requirements and correct CMIP7 DRS/filename conventions.
Changes:
- Introduces a frozen `DReqVariableMapping` model and loads a bundled `cmip6_cmip7_variable_map.json` at import time for branding/realm/compound-name lookups.
- Adds/updates conversion and filename/path generation to align with CMIP7 (MIP-DRS7) and updates CMIP7 conversion caching behavior.
- Updates ESMValTool diagnostics/recipe/config handling to accept either CMIP6 or CMIP7 inputs, plus provider setup to install pinned ESMValTool/ESMValCore via pip URLs.
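As an illustration of the `pip_packages` approach, the sketch below builds the command a provider could run to install a list of pip requirements (for example, pinned ESMValTool/ESMValCore git URLs) into an already-created conda environment. `build_pip_install_command` is a hypothetical helper, and the use of `conda run --prefix` is an assumption; the actual `climate_ref_core.providers` code may invoke pip differently.

```python
from pathlib import Path


def build_pip_install_command(conda_exe: str, env_path: Path, pip_packages: list[str]) -> list[str]:
    """Return argv for installing extra pip packages into an existing conda env.

    Hypothetical helper: returns an empty list when there is nothing to install.
    """
    if not pip_packages:
        return []
    return [
        conda_exe,
        "run",
        "--prefix",
        str(env_path),
        "python",
        "-m",
        "pip",
        "install",
        *pip_packages,
    ]
```

Returning an argument list rather than a shell string keeps the command safe to hand to `subprocess.run` without shell quoting issues.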
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| scripts/extract-data-request-mappings.py | New script to download/filter DReq export and generate the bundled mapping JSON. |
| scripts/create-cmip7-datasets.py | Writes CMIP7-style filenames when converting sample datasets. |
| packages/climate-ref-esmvaltool/tests/unit/diagnostics/test_base.py | Updates expectations for the generated ESMValTool config (projects/search settings). |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/recipe.py | Adds CMIP7 facet mapping and pins ESMValTool/ESMValCore git URLs. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/base.py | Adds CMIP6/CMIP7 selector helper + rewrites ESMValTool config to include CMIP7 local templates. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/zec.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically for recipe updates. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcre.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcr.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_sensitivity.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_area_basic.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/regional_historical_changes.py | Adds CMIP7 requirements + CMIP7 ESGF test cases using CMIP7Request. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/example.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/enso.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/ecs.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_scatterplots.py | Generalizes requirements to CMIP6+CMIP7; suptitle now uses project. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_radiative_effects.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_drivers_for_fire.py | Adds CMIP7 alternative requirements + selects CMIP source dynamically. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_at_global_warming_levels.py | Adds CMIP7 alternative requirements + CMIP7-specific grouping/matching facets in recipe update. |
| `packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/__init__.py` | Switches provider setup to install ESMValTool/ESMValCore via pip_packages. |
| packages/climate-ref-core/tests/unit/test_providers.py | Adds unit test coverage for pip_packages installation behavior. |
| packages/climate-ref-core/tests/unit/test_cmip6_to_cmip7.py | Reworks tests around DReq-backed branding/realm/compound-name lookups + serialization. |
| packages/climate-ref-core/tests/unit/esgf/test_cmip7.py | Updates CMIP7 conversion tests for new filename behavior. |
| packages/climate-ref-core/src/climate_ref_core/providers.py | Replaces single dev install URL with a list of pip_packages installed post-conda-create. |
| packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py | Generates CMIP7-style output filename for cached conversions. |
| packages/climate-ref-core/src/climate_ref_core/data/cmip6_cmip7_variable_map.json | Adds bundled subset of DReq mappings shipped with the package. |
| packages/climate-ref-core/src/climate_ref_core/cmip6_to_cmip7.py | Loads bundled DReq mappings; updates branding/realm/compound-name logic; adds CMIP7 filename/path helpers. |
| changelog/519.feature.md | Changelog entry for CMIP7 support via OR-logic requirements. |
| .vscode/settings.json | Editor setting update for Python REPL smart send. |
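For the extract script in the first row, the filtering step might be sketched as follows. The record keys (`frequency`, `cmip6_compound_name`) are assumptions about the DReq export layout, filtering on frequency stands in for the script's mon/fx table filter, and `filter_dreq_records` is a hypothetical name.

```python
from typing import Iterable, Mapping


def filter_dreq_records(
    records: Iterable[Mapping[str, str]],
    wanted_variables: set[str],
    frequencies: frozenset[str] = frozenset({"mon", "fx"}),
) -> list[dict[str, str]]:
    """Keep records with a mon/fx frequency whose CMIP6 compound name REF uses."""
    kept = []
    for record in records:
        if (
            record.get("frequency") in frequencies
            and record.get("cmip6_compound_name") in wanted_variables
        ):
            kept.append(dict(record))
    # Sort so the bundled JSON is stable and diff-friendly between regenerations
    return sorted(kept, key=lambda r: r["cmip6_compound_name"])
```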
Codecov Report ❌ Patch coverage is
…ppings Replace raw dict usage with a frozen attrs class for type-safe serialisation/deserialisation of Data Request variable mappings. The class is used both in the extract script (to_dict for JSON output) and at load time (from_dict when reading the bundled JSON).
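A minimal sketch of such a frozen mapping class, assuming the field names below (they are illustrative, drawn from facets mentioned in this PR, and are not the real `DReqVariableMapping` schema). The PR uses attrs; this sketch swaps in stdlib `dataclasses` so it runs without extra dependencies.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any


@dataclass(frozen=True)
class DReqVariableMappingSketch:
    """Illustrative stand-in for the PR's frozen attrs class (fields are assumptions)."""

    table_id: str
    variable_id: str
    cmip6_compound_name: str
    cmip7_compound_name: str
    branding_suffix: str
    out_name: str
    region: str

    def to_dict(self) -> dict[str, Any]:
        # Plain dict suitable for json.dump in the extract script
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DReqVariableMappingSketch":
        # Ignore unknown keys so JSON with extra fields still loads
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

Freezing the class means a mapping loaded at import time cannot be mutated by one caller and silently affect another.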
- Replace `__file__` with stable relative path in extract script metadata
- Fix bundled JSON description containing absolute path
- Validate branding suffix format before splitting in extract script
- Fix docstring to match raise-on-duplicate behavior
- Move cache check before `xr.open_dataset` in CMIP7 converter
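The branding suffix validation could look roughly like this, assuming a CMIP7 branding suffix is four hyphen-separated labels (temporal, vertical, horizontal, area), as in `tavg-h2m-hxy-u`. `split_branding_suffix` is a hypothetical name, not the extract script's function.

```python
def split_branding_suffix(branding_suffix: str) -> tuple[str, str, str, str]:
    """Validate and split a branding suffix such as "tavg-h2m-hxy-u".

    Assumes exactly four non-empty hyphen-separated labels.
    """
    parts = branding_suffix.split("-")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"Unexpected branding suffix format: {branding_suffix!r}")
    temporal, vertical, horizontal, area = parts
    return temporal, vertical, horizontal, area
```

Checking the shape before unpacking turns a malformed DReq entry into a clear error instead of an opaque unpacking failure.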
Force-pushed from 52c7275 to e327bfa.
`_convert_file_to_cmip7` was using `cmip6_path.name` for the output filename instead of generating a proper CMIP7 filename. Import and use `create_cmip7_filename` to produce correct CMIP7 DRS filenames.
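A sketch of building a CMIP7-style filename from facets instead of reusing the CMIP6 name. The component order below is an assumption about MIP-DRS7 and should be checked against the spec; `build_cmip7_filename` is hypothetical and is not the signature of `create_cmip7_filename`. Raising on missing components avoids double underscores in the output.

```python
from typing import Mapping, Optional

# Assumed MIP-DRS7 filename component order (verify against the CMIP7 DRS spec)
CMIP7_FILENAME_FACETS = (
    "variable_id",
    "branding_suffix",
    "frequency",
    "region",
    "grid_label",
    "source_id",
    "experiment_id",
    "variant_label",
)


def build_cmip7_filename(facets: Mapping[str, str], time_range: Optional[str] = None) -> str:
    """Join facet values into a CMIP7-style filename (hypothetical helper)."""
    missing = [name for name in CMIP7_FILENAME_FACETS if not facets.get(name)]
    if missing:
        # Fail loudly rather than emitting "__" from empty components
        raise ValueError(f"Missing facets for CMIP7 filename: {missing}")
    parts = [str(facets[name]) for name in CMIP7_FILENAME_FACETS]
    if time_range:
        parts.append(time_range)
    return "_".join(parts) + ".nc"
```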
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
- Fix docstring grammar in extract script
- Add 60s timeout to `urllib.request.urlopen`
- Add `assert_not_called` checks for `mock_open`/`mock_convert` in cache test
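The timeout change amounts to passing `timeout=` to `urllib.request.urlopen`, so a stalled download fails instead of hanging. `fetch_json` is a hypothetical wrapper, and the DReq export URL is deliberately not reproduced here.

```python
import json
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 60.0) -> Any:
    """Download and parse a JSON document, giving up after `timeout` seconds.

    Without a timeout, urlopen can block indefinitely on a stalled connection.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # json.load accepts the binary response stream directly
        return json.load(response)
```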
…nstruction _convert_file_to_cmip7 now looks up the DReq entry using table_id and variable_id to inject branding_suffix and region into the facets before calling create_cmip7_filename. This prevents empty branding components in the generated filenames. Tests no longer mock create_cmip7_filename, instead providing table_id in the facets so real filename construction is exercised.
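A sketch of the enrichment step, assuming the bundled mappings are keyed by the CMIP6 compound name `{table_id}.{variable_id}`. `enrich_facets` and `EXAMPLE_DREQ` are hypothetical, and the entry values are illustrative rather than copied from the bundled JSON.

```python
from typing import Mapping

# Illustrative excerpt, keyed by CMIP6 compound name (not the real bundled data)
EXAMPLE_DREQ = {
    "Amon.tas": {"branding_suffix": "tavg-h2m-hxy-u", "region": "glb", "out_name": "tas"},
}


def enrich_facets(
    facets: Mapping[str, str], dreq: Mapping[str, Mapping[str, str]]
) -> dict[str, str]:
    """Return a copy of `facets` with branding_suffix, region and out_name from the DReq entry."""
    key = f"{facets['table_id']}.{facets['variable_id']}"
    entry = dreq.get(key)
    if entry is None:
        raise KeyError(f"No Data Request mapping for {key}")
    enriched = dict(facets)  # leave the caller's facets untouched
    for name in ("branding_suffix", "region", "out_name"):
        enriched[name] = entry[name]
    return enriched
```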
For variables where `out_name` differs from `variable_id` (e.g. `tasmax` -> `tas`), the filename and DRS path now correctly use the CMIP7 `out_name`. The `variable_id` attribute in the dataset stays as the CMIP6 identity.

- `create_cmip7_filename` and `create_cmip7_path` prefer `out_name` over `variable_id` (with fallback)
- `convert_cmip6_to_cmip7_attrs` sets `out_name` and `branded_variable` from the DReq entry
- `_convert_file_to_cmip7` injects `out_name` during DReq enrichment
- Added end-to-end tests for `tasmax` filename generation
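The `out_name` preference with fallback can be expressed as a small helper; `cmip7_variable_component` is a hypothetical name used only for illustration.

```python
from typing import Mapping


def cmip7_variable_component(facets: Mapping[str, str]) -> str:
    """Pick the variable name used in CMIP7 filenames and DRS paths.

    Prefers the DReq out_name (e.g. "tas" for CMIP6 "tasmax") and falls back
    to variable_id when out_name is missing or empty.
    """
    out_name = facets.get("out_name")
    return str(out_name) if out_name else str(facets["variable_id"])
```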
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
```python
str(cmip7_facets.get("frequency", "mon")),
str(cmip7_facets.get("variable_id", "tas")),
str(cmip7_facets.get("grid_label", "gn")),
```
Copilot (AI), Feb 12, 2026:
output_file is now generated from facets via create_cmip7_filename(cmip7_facets) but no time_range is passed. For multi-file datasets split by time (typical on ESGF), every slice for the same variable/experiment will map to the same CMIP7 filename and collide/overwrite in the cache directory. Consider extracting the time range from cmip6_path.name (or from the dataset’s time coordinate) and passing it through to create_cmip7_filename, or otherwise incorporating the original CMIP6 timerange suffix to keep filenames unique per file.
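One way to follow this suggestion is to recover the time range from the CMIP6 filename, whose last component before `.nc` is the time range (absent for time-invariant `fx` files). The regex-based `extract_time_range` below is a hypothetical sketch and does not handle suffixes such as `-clim`; how the value is then passed to `create_cmip7_filename` depends on that function's actual signature.

```python
import re
from typing import Optional

# Trailing "_<start>-<end>.nc" component of a CMIP6 filename, e.g. "_185001-201412.nc"
_TIME_RANGE_RE = re.compile(r"_(\d{4,14}-\d{4,14})\.nc$")


def extract_time_range(cmip6_filename: str) -> Optional[str]:
    """Return the time-range component of a CMIP6 filename, or None if absent."""
    match = _TIME_RANGE_RE.search(cmip6_filename)
    return match.group(1) if match else None
```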
- Make `_get_dreq_entry` a public API (`get_dreq_entry`) for cross-module use
- Use `out_name` in DRS path construction in `_convert_file_to_cmip7`
- Use DReq region instead of hardcoded `'glb'` in `convert_cmip6_to_cmip7_attrs`
- Fix pytest parametrize ids to use `pytest.param` with explicit id strings
- Add tests for `out_name != variable_id` (`tasmax`/`tasmin`) and non-`glb` region (`ImonAnt`)
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py:71
- The inline comment describing the CMIP7 DRS path still says the variable component is `{variable_id}`, but the code now uses `out_name` (and the core `create_cmip7_path` uses `out_name` too). Update the comment to avoid documenting the wrong facet and confusing future changes.
```python
# Build CMIP7 DRS path
# CMIP7 DRS: {activity_id}/{institution_id}/{source_id}/{experiment_id}/
# {variant_label}/{frequency}/{variable_id}/{grid_label}/{version}
# Ensure all facet values are strings (some may be integers from metadata)
```
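With the comment updated, the path construction it describes could be sketched as below: `out_name` (falling back to `variable_id`) fills the variable slot, and every component is coerced to `str`. `build_cmip7_drs_path` is hypothetical and follows the component order in the quoted comment, not necessarily that of `create_cmip7_path`.

```python
from pathlib import PurePosixPath
from typing import Any, Mapping


def build_cmip7_drs_path(facets: Mapping[str, Any]) -> PurePosixPath:
    """Build a CMIP7 DRS directory with out_name in the variable slot.

    Values are coerced to str because some facets may arrive as integers.
    """
    variable = facets.get("out_name") or facets["variable_id"]
    components = [
        facets["activity_id"],
        facets["institution_id"],
        facets["source_id"],
        facets["experiment_id"],
        facets["variant_label"],
        facets["frequency"],
        variable,
        facets["grid_label"],
        facets["version"],
    ]
    return PurePosixPath(*(str(c) for c in components))
```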
```python
# Create output filename and check cache before opening the source file
output_file = drs_path / create_cmip7_filename(cmip7_facets)
```
Copilot (AI), Feb 12, 2026:
create_cmip7_filename(cmip7_facets) will emit empty components when keys like source_id/experiment_id/variant_label are missing (it defaults to empty strings), producing filenames with double underscores. Since _convert_file_to_cmip7 already applies defaults when building drs_path, consider normalizing cmip7_facets with the same defaults before calling create_cmip7_filename so cache paths/filenames are consistent and spec-like.
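A minimal sketch of that normalization, assuming the defaults are kept in one mapping shared by the path and filename code. `normalize_facets` is hypothetical, and only the three defaults visible in the diff context above (`mon`, `tas`, `gn`) are used; defaults for `source_id`, `experiment_id`, and `variant_label` are not shown in this thread and are left out.

```python
from typing import Any, Mapping

# Defaults visible in the diff context; others are intentionally not guessed
DRS_DEFAULTS = {"frequency": "mon", "variable_id": "tas", "grid_label": "gn"}


def normalize_facets(facets: Mapping[str, Any], defaults: Mapping[str, str]) -> dict[str, str]:
    """Apply the same defaults used for the DRS path so filename and path agree.

    None and empty strings are treated as missing; all values become str.
    """
    normalized = {key: str(value) for key, value in facets.items() if value not in (None, "")}
    for key, default in defaults.items():
        normalized.setdefault(key, default)
    return normalized
```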
```python
}

output.parent.mkdir(parents=True, exist_ok=True)
with open(output, "w") as f:
```
Copilot (AI), Feb 12, 2026:
When writing the JSON output, consider opening the file with an explicit encoding="utf-8" to make the script’s output deterministic across platforms/locales.
Suggested change:

```diff
-with open(output, "w") as f:
+with open(output, "w", encoding="utf-8") as f:
```
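Extending that suggestion, a deterministic writer might also sort keys and end with a newline so regenerated files produce clean diffs. `write_mapping_json` is a hypothetical helper; `sort_keys=True` and `ensure_ascii=False` are choices made for this sketch, not necessarily the script's current settings.

```python
import json
from pathlib import Path
from typing import Any


def write_mapping_json(data: Any, output: Path) -> None:
    """Write JSON deterministically: UTF-8, sorted keys, fixed indent, trailing newline."""
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
        f.write("\n")
```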
…ments

- origin/dr-mappings:
  - fix: remove redundant information from breaking changelog entry for Data Request API
  - docs: add breaking changelog entry for DReq API changes
  - fix: address third round of PR review comments
  - fix: use out_name from DReq for CMIP7 filenames and paths
  - fix: enrich cmip7_facets with DReq branding_suffix before filename construction
  - chore: exclude setting change
  - fix: address second round of PR review comments
  - fix(core): use create_cmip7_filename in _convert_file_to_cmip7
  - fix: address PR review comments
  - docs: add changelog entry for PR #530
  - refactor(core): update documentation and remove unused CMIP7 name mapping functions
  - feat(core): add DReqVariableMapping attrs class for CMIP6-to-CMIP7 mappings

Conflicts:
- packages/climate-ref-core/src/climate_ref_core/cmip6_to_cmip7.py
- packages/climate-ref-core/src/climate_ref_core/esgf/cmip7.py
Description
Add structured CMIP6-to-CMIP7 variable mappings sourced from the CMIP7 Data Request (DReq).
The Data Request defines a unique mapping from each `cmip6_compound_name` (`{table_id}.{variable_id}`) to a `cmip7_compound_name`. This mapping is now the source of truth for mapping to branded variables.
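A sketch of reading such a mapping, assuming a CMIP7 compound name has the dot-separated layout `{realm}.{variable}.{branding_suffix}.{frequency}.{region}` and that a branded variable is the variable name joined to its branding suffix with `_`. The `Amon.tas` entry is illustrative, and all names here are hypothetical.

```python
from typing import NamedTuple

# Illustrative cmip6_compound_name -> cmip7_compound_name entry (not the bundled data)
CMIP6_TO_CMIP7 = {"Amon.tas": "atmos.tas.tavg-h2m-hxy-u.mon.glb"}


class CMIP7CompoundName(NamedTuple):
    realm: str
    variable: str
    branding_suffix: str
    frequency: str
    region: str


def parse_cmip7_compound_name(name: str) -> CMIP7CompoundName:
    """Split a CMIP7 compound name, assuming five dot-separated parts."""
    parts = name.split(".")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 dot-separated parts, got {len(parts)}: {name!r}")
    return CMIP7CompoundName(*parts)


def branded_variable(compound: CMIP7CompoundName) -> str:
    # Variable name plus branding suffix, e.g. "tas_tavg-h2m-hxy-u"
    return f"{compound.variable}_{compound.branding_suffix}"
```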
Key changes:
- `extract-data-request-mappings.py` script: Downloads the DReq release export, extracts variable mappings filtered to mon/fx tables and REF provider variables, and writes the bundled JSON.
- Bundled mappings (`cmip6_cmip7_variable_map.json`): Pre-extracted subset of DReq mappings shipped with climate-ref-core.

Checklist
Please confirm that this pull request has done the following:
changelog/