
feat: add esgf_data_catalog fixture and solver regression baselines #529

Merged
lewisjared merged 9 commits into main from solve-helpers
Feb 12, 2026

Conversation

@lewisjared
Contributor

Description

Add infrastructure for testing solver behavior without requiring sample data downloads:

  • esgf_data_catalog fixture: Session-scoped fixture in conftest_plugin.py that provides dict[SourceDatasetType, pd.DataFrame] from pre-generated parquet catalogs. Tests that only need DataFrames for solver logic can use this instead of the data_catalog fixture (which triggers sample data downloads).

  • PMP climatology parquet catalog: Generated from the pooch-cached PMP_obs4MIPsClims data (25 entries), enabling PMP solver regression testing.

  • Per-provider regression baselines: Deterministic, alphabetically sorted YAML baselines for the example (3175 lines), ESMValTool (5120 lines), ILAMB (440 lines), and PMP (6026 lines) providers. Any change to solver logic, constraints, or data requirements will show up as a diff. Regenerate with --force-regen.

  • Fixture swap: test_solve_helpers.py and test_solve_regression.py now use esgf_data_catalog instead of data_catalog, removing their dependency on sample data.
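
The baseline idea in the third bullet can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the PR stores YAML baselines and regenerates them with --force-regen, whereas the sketch writes JSON to stay self-contained, and `check_baseline`, its `force_regen` parameter, and the sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def check_baseline(path: Path, results: dict, force_regen: bool = False) -> bool:
    """Compare results with a stored baseline and return True if they match.

    Hypothetical helper: keys are sorted on serialisation so the file content
    is deterministic, and the baseline is (re)written when it is missing or
    when force_regen is set.
    """
    serialised = json.dumps(results, indent=2, sort_keys=True) + "\n"
    if force_regen or not path.exists():
        path.write_text(serialised)
        return True
    return path.read_text() == serialised


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "example_provider.json"
    # First run: no baseline yet, so one is written.
    created = check_baseline(baseline, {"tas": ["ACCESS-ESM1-5"], "pr": ["ACCESS-ESM1-5"]})
    # Same content with a different key order still matches.
    same = check_baseline(baseline, {"pr": ["ACCESS-ESM1-5"], "tas": ["ACCESS-ESM1-5"]})
    # A change in the (pretend) solver output is reported as a mismatch.
    changed = check_baseline(baseline, {"tas": []})
    # Forcing regeneration accepts the new output as the baseline.
    regenerated = check_baseline(baseline, {"tas": []}, force_regen=True)
```

Sorting on write is what keeps diffs small: an unchanged result always serialises to the same bytes, so only real changes in solver output appear in review.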

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Add a library module (solve_helpers.py) with functions to generate
parquet catalogs from local datasets, run the solver on catalogs,
and format results for human inspection and regression testing.

- solve_helpers.py: generate_catalog, load_solve_catalog, solve_to_results,
  format_solve_results_table/json, solve_results_for_regression
- scripts/generate_esgf_catalog.py: CLI to produce parquet catalogs
- test_solve_helpers.py: 18 unit tests for helper functions
- test_solve_regression.py: regression tests using parquet catalogs
- conftest.py: simplify esgf_solve_catalog fixture to use load_solve_catalog

Add esgf_data_catalog fixture that provides data catalogs from parquet
files instead of requiring sample data downloads. Switch solve_helpers
and solve_regression tests to use it.

Generate PMP climatology parquet catalog and per-provider regression
baselines (example, esmvaltool, ilamb, pmp) to detect changes in
solver logic and constraints.

Copilot AI review requested due to automatic review settings February 12, 2026 02:38
Contributor

Copilot AI left a comment


Pull request overview

Adds ESGF parquet-catalog infrastructure and regression baselines so solver behavior can be tested deterministically without downloading sample datasets.

Changes:

  • Introduces solve_helpers utilities to generate/load parquet catalogs and produce regression-friendly solver outputs.
  • Adds a session-scoped esgf_data_catalog fixture that fails fast when the parquet catalogs are missing.
  • Adds solver regression tests and per-provider YAML baselines (generated output) plus a catalog-generation script.
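
A fail-fast check of the kind described in the second bullet might look like the sketch below. `require_catalog_dir` and the directory name are hypothetical; the only behaviour taken from the PR is that the fixture errors out when the catalogs are missing and reports a resolved path.

```python
from pathlib import Path


def require_catalog_dir(catalog_dir: Path) -> Path:
    """Return the resolved catalog directory, or raise immediately if absent."""
    resolved = catalog_dir.resolve()
    if not resolved.is_dir():
        # Reporting the resolved (absolute) path makes the failure actionable
        # regardless of the working directory the tests were started from.
        raise FileNotFoundError(f"Parquet catalog directory not found: {resolved}")
    return resolved


try:
    require_catalog_dir(Path("no-such-esgf-catalog-dir"))
    message = ""
except FileNotFoundError as exc:
    message = str(exc)
```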

Reviewed changes

Copilot reviewed 7 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| scripts/generate_esgf_catalog.py | CLI to scan local archives and write parquet catalogs used by tests. |
| packages/climate-ref/src/climate_ref/solve_helpers.py | Core helpers to generate/load catalogs and format solver outputs for regression testing. |
| packages/climate-ref/src/climate_ref/conftest_plugin.py | Adds esgf_data_catalog fixture backed by parquet catalogs (no sample downloads). |
| packages/climate-ref/tests/unit/test_solve_helpers.py | Unit tests for catalog IO and solver result formatting using parquet-backed fixture. |
| packages/climate-ref/tests/unit/test_solve_regression.py | Regression tests that check solver output against deterministic baselines. |
| packages/climate-ref/tests/unit/test_solve_regression/test_solve_regression_per_provider_ilamb_.yml | Adds ILAMB regression baseline data for per-provider tracking. |
| changelog/529.feature.md | Changelog entry describing the new fixture + regression baseline approach. |


- Narrow exception catch from bare Exception to specific types
  (FileNotFoundError, OSError, ValueError) in generate_catalog
- Update load_solve_catalog docstring to include cmip7_catalog.parquet
- Use resolved path in esgf_data_catalog fixture error message
- Split regression baselines to one file per diagnostic for easier
  review of changes
- Move per-provider regression tests to provider test suites
- Add test for key collision detection within a diagnostic
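
The first item, narrowing the exception catch, can be illustrated as follows. This is a sketch only: `parse_entry`, `collect_entries`, and the file-name convention are hypothetical and do not reproduce generate_catalog; the exception tuple is the one named in the commit message.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def parse_entry(path: Path) -> dict:
    """Hypothetical parser that expects file names such as 'tas_Amon.nc'."""
    if not path.exists():
        raise FileNotFoundError(path)
    variable, _, table = path.stem.partition("_")
    if not table:
        raise ValueError(f"Unexpected file name: {path.name}")
    return {"variable": variable, "table": table}


def collect_entries(paths: list[Path]) -> tuple[list[dict], list[Path]]:
    """Parse what can be parsed and record the files that were skipped."""
    entries, skipped = [], []
    for path in paths:
        try:
            entries.append(parse_entry(path))
        # Only expected I/O and parsing failures are swallowed; programming
        # errors such as TypeError or KeyError still propagate.
        except (FileNotFoundError, OSError, ValueError):
            skipped.append(path)
    return entries, skipped


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "tas_Amon.nc"
    bad = Path(tmp) / "badname.nc"
    good.touch()
    bad.touch()
    missing = Path(tmp) / "pr_Amon.nc"  # never created
    entries, skipped = collect_entries([good, bad, missing])
    skipped_names = [p.name for p in skipped]
```

FileNotFoundError is a subclass of OSError, so listing both is redundant for matching purposes, but it documents the expected failure explicitly.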
Copilot AI review requested due to automatic review settings February 12, 2026 04:24
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 48 out of 57 changed files in this pull request and generated 1 comment.



Replace str.replace() with startswith() check + slicing so that
the prefix is only removed from the beginning of paths, not from
arbitrary positions within the string.
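
The difference between the two approaches is easy to see on a path that contains the prefix twice. The function signatures here are assumptions for illustration; only the name strip_path_prefix and the startswith-plus-slicing technique come from the commit.

```python
def strip_prefix_with_replace(path: str, prefix: str) -> str:
    """Old behaviour: removes the prefix text wherever it occurs."""
    return path.replace(prefix, "")


def strip_path_prefix(path: str, prefix: str) -> str:
    """New behaviour: removes the prefix only at the start of the path."""
    if path.startswith(prefix):
        return path[len(prefix):]
    return path


path = "/data/CMIP6/archive/data/CMIP6/tas.nc"
buggy = strip_prefix_with_replace(path, "/data/CMIP6")  # "/archive/tas.nc"
fixed = strip_path_prefix(path, "/data/CMIP6")          # "/archive/data/CMIP6/tas.nc"
untouched = strip_path_prefix("/other/tas.nc", "/data/CMIP6")
```

On Python 3.9 and later, `str.removeprefix` gives the same result as the startswith-plus-slicing version.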
The regression tests were not calling provider.configure() which applies
the default_ignore_datasets.yaml entries. This caused test ordering
side-effects: when integration tests ran first and called configure()
on the shared provider singleton, CAS-ESM2-0 was excluded, but when
regression tests ran in isolation, CAS-ESM2-0 was included.

Now all regression test fixtures explicitly call provider.configure()
to match real-world solver behavior. Regenerated the sea-ice-sensitivity
baseline to exclude CAS-ESM2-0 (ignored in default_ignore_datasets.yaml).
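
The ordering side-effect can be reproduced with a toy model. Everything in this sketch is hypothetical (the `Provider` class, its `solve` method, and the in-memory ignore list standing in for default_ignore_datasets.yaml); it only shows why a result depends on whether configure() has been called.

```python
from __future__ import annotations


class Provider:
    """Hypothetical stand-in for a shared provider object."""

    def __init__(self, default_ignored: set[str]) -> None:
        self._default_ignored = default_ignored
        self.ignored: set[str] = set()  # empty until configure() runs

    def configure(self) -> None:
        # Stands in for applying the default_ignore_datasets.yaml entries.
        self.ignored = set(self._default_ignored)

    def solve(self, source_ids: list[str]) -> list[str]:
        return sorted(s for s in source_ids if s not in self.ignored)


catalog = ["CAS-ESM2-0", "ACCESS-ESM1-5"]

# Regression tests run in isolation: configure() was never called.
unconfigured = Provider({"CAS-ESM2-0"})
in_isolation = unconfigured.solve(catalog)

# Another test configured the provider first (or the fixture now does so).
configured = Provider({"CAS-ESM2-0"})
configured.configure()
after_configure = configured.solve(catalog)
```

Calling configure() inside every fixture removes the dependence on which test happened to touch the shared object first.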
Copilot AI review requested due to automatic review settings February 12, 2026 12:45
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 48 out of 57 changed files in this pull request and generated no new comments.



@codecov

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 98.90110% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...kages/climate-ref/src/climate_ref/solve_helpers.py | 98.90% | 0 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...kages/climate-ref/src/climate_ref/solve_helpers.py | 98.90% <98.90%> (ø) |

@lewisjared
Contributor Author

@bouweandela We can now add regression tests for diagnostic executions and constraints.

@lewisjared merged commit 196cbbf into main on Feb 12, 2026
31 checks passed
@lewisjared deleted the solve-helpers branch on February 12, 2026 13:27
lewisjared added a commit to bouweandela/climate-ref that referenced this pull request Feb 12, 2026
…nt-experiment-support-with-solve-helpers

* origin/solve-helpers: (196 commits)
  fix: apply ignore_datasets config in solver regression tests
  fix: use prefix-only replacement in strip_path_prefix
  fix: address PR review comments for solve helpers
  docs: add ESGF catalog and solver regression testing to developer guide
  docs: add changelog entry for PR Climate-REF#529
  feat: add esgf_data_catalog fixture and solver regression baselines
  feat: Add an esgf catalog with ~100k entries
  feat: add CMIP7 catalog support in load_solve_catalog function
  feat: add solve helpers for solver regression testing
  chore(deps-dev): bump the python-dependencies group with 4 updates
  chore(deps): bump the github-actions group with 2 updates
  Bump version: 0.9.1 → 0.10.0
  chore: add changelog entry for PR Climate-REF#523
  fix: use subset check for CMEC bundle dimension validation
  fix(helm): use distinct chart version tags per build context
  chore: remove unnecessary conftest.py from climate-ref-core tests
  fix: address PR review comments
  fix: improve diff coverage and restrict CI permissions
  fix(ci): use coverage run for proper entry-point plugin tracking
  fix(ci): skip conftest_plugin in imports-without-extras check
  ...

# Conflicts:
#	packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/recipe.py
#	packages/climate-ref/src/climate_ref/testing.py
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/index.html
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/plots/ecs/calculate/ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/plots/ecs/calculate/ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/plots/ecs/calculate/ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/run/cmor_log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/run/ecs/calculate/log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/run/recipe.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/run/recipe_filled.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/lambda.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/lambda_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/lambda_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20250922_114815/work/ecs/calculate/lambda_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/index.html
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/plots/ecs/calculate/ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/plots/ecs/calculate/ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/plots/ecs/calculate/ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/run/cmor_log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/run/ecs/calculate/log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/run/main_log_debug.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/run/recipe.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/run/recipe_filled.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/lambda.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/lambda_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/lambda_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20251002_203243/work/ecs/calculate/lambda_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/index.html
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/plots/ecs/calculate/ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/plots/ecs/calculate/ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/plots/ecs/calculate/ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/run/cmor_log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/run/ecs/calculate/diagnostic_provenance.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/run/ecs/calculate/log.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/run/recipe.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/run/recipe_filled.yml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/ecs_regression_ACCESS-ESM1-5_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/lambda.nc
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/lambda_citation.bibtex
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/lambda_data_citation_info.txt
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/executions/recipe_20260130_162732/work/ecs/calculate/lambda_provenance.xml
#	tests/test-data/regression/esmvaltool/equilibrium-climate-sensitivity/cmip6_gn_r1i1p1f1_ACCESS-ESM1-5/output.json
@bouweandela
Contributor

Awesome! 🥳

2 participants