Script to compute spatial autocorrelation of structured/unstructured datasets by shmh40 · Pull Request #1955 · ecmwf/WeatherGenerator

shmh40 · 2026-02-27T14:04:39Z

Description

A generated standalone script to compute spatial autocorrelation in structured and unstructured datasets.

How it works:

Load data from an anemoi dataset, an obs dataset, or xarray.
Optionally compute anomalies: remove the climatology for gridded data, or approximate this using the spatial mean/std for unstructured data.
Estimate spatial autocorrelation by sampling random pairs of points, computing the haversine (great-circle) distance for each pair, binning the pairs by distance, and computing the correlation per bin. The autocorrelation length is then fitted using a 1/e threshold. The number of samples and time slices used for this estimate can be set by the user; the computation is cheap and only needs to be done once, so it can be run with many samples. Claude added a somewhat annoying chain of fallbacks for fitting the correlation length (1/e -> integrated scale -> log-linear); I am not too worried about this.
The script additionally maps the spatial autocorrelation length to a suggested healpix masking level (according to a user-chosen coefficient), groups variables (for use in the separated-streams configs), and produces YAML snippets for per-stream masking overrides.
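
For reviewers who want a quick mental model, the estimation step described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function names (`haversine_km`, `binned_correlation`, `efolding_length`, `suggested_healpix_level`), the pooling of pairs across time slices, and the linear interpolation at the 1/e crossing are all assumptions. In particular, the healpix mapping (picking the coarsest level whose approximate pixel side is at most `coeff` times the correlation length) is a guess at how a coefficient could be used; the script may define it differently.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def binned_correlation(values, lats, lons, n_pairs, bin_edges_km, rng):
    """Correlation per distance bin from randomly sampled point pairs.

    values: (n_times, n_points) anomalies; lats/lons: (n_points,) in degrees.
    Pairs are pooled over all time slices (an assumption of this sketch).
    """
    bin_edges_km = np.asarray(bin_edges_km, dtype=float)
    n_points = values.shape[1]
    i = rng.integers(0, n_points, n_pairs)
    j = rng.integers(0, n_points, n_pairs)
    keep = i != j  # drop pairs of a point with itself
    i, j = i[keep], j[keep]
    dist = haversine_km(lats[i], lons[i], lats[j], lons[j])
    which = np.digitize(dist, bin_edges_km) - 1
    n_bins = len(bin_edges_km) - 1
    corr = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = which == b
        if sel.sum() < 2:
            continue  # too few pairs in this bin
        x = values[:, i[sel]].ravel()
        y = values[:, j[sel]].ravel()
        if x.std() == 0 or y.std() == 0:
            continue  # correlation undefined for constant data
        corr[b] = np.corrcoef(x, y)[0, 1]
    centres = 0.5 * (bin_edges_km[:-1] + bin_edges_km[1:])
    return centres, corr


def efolding_length(centres, corr):
    """First distance where the correlation drops below 1/e (NaN if it never does)."""
    threshold = 1.0 / np.e
    valid = ~np.isnan(corr)
    c, r = centres[valid], corr[valid]
    below = np.nonzero(r < threshold)[0]
    if below.size == 0:
        return float("nan")
    k = below[0]
    if k == 0:
        return float(c[0])
    # Linear interpolation between the two bins that straddle the threshold.
    frac = (r[k - 1] - threshold) / (r[k - 1] - r[k])
    return float(c[k - 1] + frac * (c[k] - c[k - 1]))


def suggested_healpix_level(length_km, coeff=1.0, max_level=10):
    """Hypothetical mapping: coarsest level whose pixel side <= coeff * length_km."""
    sphere_area = 4.0 * np.pi * EARTH_RADIUS_KM ** 2
    for level in range(max_level + 1):
        # HEALPix has 12 * 4**level equal-area pixels at a given level.
        pixel_side_km = np.sqrt(sphere_area / (12 * 4 ** level))
        if pixel_side_km <= coeff * length_km:
            return level
    return max_level
```

With an `(n_times, n_points)` anomaly array, a call such as `binned_correlation(values, lats, lons, 100_000, np.arange(0, 5001, 250), np.random.default_rng(0))` followed by `efolding_length(...)` would give one length estimate per variable. The fallback fits mentioned above (integrated scale, log-linear) are not reproduced here.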

Issue Number

Closes #1952

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

shmh40 added 4 commits February 27, 2026 13:34

enable healpix masking at the level of the data

81961aa

enable per stream masking strategy config override

b7c1342

per stream masking override test

ceb590a

standalone script to compute spatial autocorrelation of variables in …

22f0923

…a structured or unstructured dataset

shmh40 self-assigned this Feb 27, 2026

shmh40 added the data label (Anything related to the datasets used in the project) Feb 27, 2026

shmh40 added this to WeatherGen-dev Feb 27, 2026

shmh40 linked an issue Feb 27, 2026 that may be closed by this pull request

Compute spatial autocorrelation of variables in a dataset #1952

Open

6 tasks

shmh40 and others added 3 commits February 27, 2026 18:03

Merge branch 'develop' into shmh40/dev/1952-compute-spatial-autocorr

db8fd3a

remove commits that should be in pr 1951

39ccc64

lint

e17219f

shmh40 wants to merge 7 commits into develop from shmh40/dev/1952-compute-spatial-autocorr
