[1843][DRAFT] Diffusion Single Sample#1909

Draft
moritzhauschulz wants to merge 261 commits into ecmwf:develop from moritzhauschulz:mh/mk/diffusion-single-sample

@moritzhauschulz
Contributor

DRAFT for tracking progress – replacing #1845.

Not to be merged yet.

Description

Issue Number

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Jubeku and others added 30 commits November 11, 2025 16:09
…andom and healpix masking. Open issues with _coords_local, centroids and probably other things.
TODO:
- Forecast still needs to be adapted
- Some more cleanup of variable naming, return values etc
sbAsma and others added 22 commits January 23, 2026 13:24
* initial commit [draft]

* adapt noise conditioner to make it closer to DiT

* adapt dimensionalities – code runs with default config

* lint

* fix: add conditional prediction mode handling

This commit resolves architectural incompatibilities when integrating
diffusion-based forecast engines:

1. FSDP Sharding: DiffusionForecastEngine wraps ForecastingEngine
   as `self.net`, but trainer code assumed direct `fe_blocks` access. Fixed by:
   - Adding fe_diffusion_model conditional check in init_model_and_shard()
   - Routing to model.forecast_engine.net.fe_blocks for diffusion mode

2. Model Initialization: Reordered ForecastingEngine creation to handle both
   standard and diffusion-wrapped variants with proper fallback.

3. Target Format Handling: Autoencoder mode uses different target
   structure than diffusion mode. Added conditional formatting:
   - Diffusion: targets = {"targets": [targets], "aux_outputs": aux}
   - Autoencoder: targets = {"physical": batch[0]}

4. Config Updates: added file config/diffusion_config.yml for diffusion
   model config
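
The conditional handling described in points 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `select_fe_blocks` and `format_targets` are hypothetical, the non-diffusion path `model.forecast_engine.fe_blocks` is an assumption based on the "direct `fe_blocks` access" wording, and the stand-in objects replace the real model classes.

```python
from types import SimpleNamespace


def select_fe_blocks(model, fe_diffusion_model: bool):
    """Return the forecast-engine blocks that should be sharded.

    Hypothetical helper: in diffusion mode the engine wraps the
    ForecastingEngine as `net`, so the blocks sit one level deeper.
    """
    if fe_diffusion_model:
        return model.forecast_engine.net.fe_blocks
    # Assumed layout for the standard (non-wrapped) engine.
    return model.forecast_engine.fe_blocks


def format_targets(batch, targets, aux, fe_diffusion_model: bool) -> dict:
    """Build the target dict in the structure named in the commit message."""
    if fe_diffusion_model:
        return {"targets": [targets], "aux_outputs": aux}
    return {"physical": batch[0]}


# Stand-in objects for illustration only; the real classes live in the repository.
standard_model = SimpleNamespace(
    forecast_engine=SimpleNamespace(fe_blocks=["block0", "block1"])
)
diffusion_model = SimpleNamespace(
    forecast_engine=SimpleNamespace(
        net=SimpleNamespace(fe_blocks=["block0", "block1"])
    )
)

blocks_standard = select_fe_blocks(standard_model, fe_diffusion_model=False)
blocks_diffusion = select_fe_blocks(diffusion_model, fe_diffusion_model=True)
```

Keeping the branch in one place means the sharding loop itself does not need to know whether the engine is wrapped; whether the PR centralises it this way is not stated in the commit message.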

* added forecast engine argument

* removed unnecessary logging

* reverting back to the previous config

* replaced getattr by get

* modification of forecasting engine initialization

---------

Co-authored-by: moritzhauschulz <moritz.hauschulz@gmail.com>
Co-authored-by: Matthias Karlbauer <matthias.karlbauer@ecmwf.int>
@moritzhauschulz
Contributor Author

@MatKbauer @clessig new branch (name) as discussed

@moritzhauschulz moritzhauschulz changed the title Mh/mk/diffusion single sample [1843][DRAFT] Single Sample Experiments Feb 23, 2026
@moritzhauschulz moritzhauschulz changed the title [1843][DRAFT] Single Sample Experiments [1843][DRAFT] Diffusion Single Sample Feb 23, 2026

8 participants