Script to compute spatial autocorrelation of structured/unstructured datasets #1955

Open
shmh40 wants to merge 7 commits into develop from shmh40/dev/1952-compute-spatial-autocorr

Conversation

@shmh40 (Contributor) commented Feb 27, 2026

Description

Adds a generated script that computes spatial autocorrelation for structured and unstructured datasets.

How it works:

  1. Load data from an anemoi dataset, an obs dataset, or xarray.

  2. Optionally compute anomalies: remove the climatology for gridded data, or approximate this using the spatial mean/std for unstructured data.

  3. Estimate spatial autocorrelation by sampling random pairs of points, computing the haversine (great-circle) distance for each pair, binning the pairs by distance, and computing the correlation per bin. The autocorrelation length is then taken as the distance at which the correlation falls to 1/e. The number of samples and time slices used for this estimate can be set by the user. The estimate is cheap and only needs to be done once, so you can run with many samples. Claude added a chain of fallbacks for fitting the correlation length (1/e -> integrated scale -> log-linear), which is a bit annoying, but I am not too worried about this.

  4. The script also maps the spatial autocorrelation length to a suggested healpix masking level (according to a user-chosen coefficient), groups variables (for the separated-streams configs), and produces YAML snippets for per-stream masking overrides.
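
For reviewers who want a feel for step 3, here is a minimal sketch of the pair-sampling estimate. It is not the code in this PR: the function names (`haversine_km`, `binned_correlation`, `e_folding_length`), the `(n_times, n_points)` array layout, and the linear interpolation at the 1/e crossing are all assumptions made for illustration. The fallback chain and the healpix mapping from step 4 are not covered.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def binned_correlation(values, lats, lons, bin_edges_km, n_pairs=200_000, seed=0):
    """Pearson correlation of random point pairs, grouped by separation.

    values: (n_times, n_points) anomalies; lats/lons: (n_points,) in degrees.
    Returns (bin centres in km, correlation per bin); sparse bins are NaN.
    """
    rng = np.random.default_rng(seed)
    bin_edges_km = np.asarray(bin_edges_km, dtype=float)
    n_times, n_points = values.shape

    # Draw random pairs of distinct points and a random time slice for each pair.
    i = rng.integers(0, n_points, n_pairs)
    j = rng.integers(0, n_points, n_pairs)
    keep = i != j
    i, j = i[keep], j[keep]
    t = rng.integers(0, n_times, i.size)

    dist = haversine_km(lats[i], lons[i], lats[j], lons[j])
    a, b = values[t, i], values[t, j]

    # Assign each pair to a distance bin and correlate within the bin.
    which = np.digitize(dist, bin_edges_km) - 1
    n_bins = len(bin_edges_km) - 1
    corr = np.full(n_bins, np.nan)
    for k in range(n_bins):
        m = which == k
        if m.sum() > 2:
            corr[k] = np.corrcoef(a[m], b[m])[0, 1]
    centres = 0.5 * (bin_edges_km[:-1] + bin_edges_km[1:])
    return centres, corr


def e_folding_length(centres, corr):
    """First distance at which the correlation drops below 1/e (NaN if never)."""
    threshold = 1.0 / np.e
    ok = ~np.isnan(corr)
    c, r = centres[ok], corr[ok]
    below = np.nonzero(r < threshold)[0]
    if below.size == 0:
        return float("nan")
    k = below[0]
    if k == 0:
        return float(c[0])
    # Linear interpolation between the two bins that straddle the threshold.
    frac = (r[k - 1] - threshold) / (r[k - 1] - r[k])
    return float(c[k - 1] + frac * (c[k] - c[k - 1]))
```

Typical use would be `centres, corr = binned_correlation(anoms, lats, lons, np.linspace(0, 5000, 51))` followed by `e_folding_length(centres, corr)`. Pooling pairs across time slices as done here assumes the anomalies are roughly stationary in time and space, which is a simplification.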

Issue Number

Closes #1952

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@shmh40 shmh40 self-assigned this Feb 27, 2026
@shmh40 shmh40 added the data Anything related to the datasets used in the project label Feb 27, 2026
@shmh40 shmh40 linked an issue Feb 27, 2026 that may be closed by this pull request
6 tasks

Labels

data Anything related to the datasets used in the project

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Compute spatial autocorrelation of variables in a dataset

1 participant