Skip to content

Implement parametrisation for standardising model runs#392

Open
mo-nikosbaltas wants to merge 11 commits intomainfrom
338-implement-parallel-parameterised-ref-eval-multi-run-standardisation
Open

Implement parametrisation for standardising model runs#392
mo-nikosbaltas wants to merge 11 commits intomainfrom
338-implement-parallel-parameterised-ref-eval-multi-run-standardisation

Conversation

@mo-nikosbaltas
Copy link
Collaborator

@mo-nikosbaltas mo-nikosbaltas commented Mar 2, 2026

Closes #338

PR creation checklist for the developer

  • The <issue_number> above ☝️ has been replaced with the issue number.
  • main has been selected as the base branch.
  • The feature branch name follows the format <issue_number>_<short_description_of_feature>.
  • The text of the PR title exactly matches with the text (not including the issue number) of the issue title.
  • Appropriate reviewers have been added to the PR (once it is ready for review).
  • The PR has been assigned to the developer(s).
  • The same labels as on the issue (except for the good first issue label) have been added to the PR.
  • The Climate Model Evaluation Workflow (CMEW) project has been added to the PR.
  • The appropriate milestone has been added to the PR.

Definition of Done for the developer

  • The change in this PR addresses the above issue / all acceptance criteria have been met.
  • The change in this PR follows the requirements in the wiki: Developer Guide (including copyrights).
  • The GitHub Actions workflow checks pass.
  • The tests run locally and pass (Note: the tests are not run by the GitHub Actions workflow, see wiki: Run the tests locally).
  • Updating the Rose metadata (select one of the following):
    • Rose metadata related to the change has been added or updated.
    • The change does not require Rose metadata to be added or updated.
  • Rendering the Rose metadata (select one of the following):
    • The Rose GUI shows the change as expected.
    • The change in this PR does not affect the Rose GUI.
  • Updating the tests (select one of the following):
    • Tests related to the change have been added or updated.
    • The change does not require tests to be added or updated.
  • Updating the user documentation (i.e. everything in the doc directory, including the Quick Start section; select one of the following):
    • The user documentation related to the change has been updated appropriately.
    • The change in this PR does not require the user documentation to be updated.
  • Rendering the user documentation (wiki: Build the documentation locally provides instructions; select one of the following):
    • The HTML pages show the change as expected.
    • The change in this PR does not affect the HTML pages.
  • Updating the API documentation (e.g. docstrings in Python modules; select one of the following):
    • The API documentation related to the change has been updated appropriately.
    • The change in this PR does not affect the API documentation.

PR creation checklist for the reviewer

  • The <issue_number> above ☝️ has been replaced with the issue number.
  • main has been selected as the base branch.
  • The feature branch name follows the format <issue_number>_<short_description_of_feature>.
  • The text of the PR title exactly matches with the text (not including the issue number) of the issue title.
  • Appropriate reviewers have been added to the PR (once it is ready for review).
  • The PR has been assigned to the developer(s).
  • The same labels as on the issue (except for the good first issue label) have been added to the PR.
  • The Climate Model Evaluation Workflow (CMEW) project has been added to the PR.
  • The appropriate milestone has been added to the PR.

Definition of Done for the reviewer

  • The change in this PR addresses the above issue / all acceptance criteria have been met.
  • The change in this PR follows the requirements in the wiki: Developer Guide (including copyrights).
  • The GitHub Actions workflow checks pass.
  • The tests run locally and pass (Note: the tests are not run by the GitHub Actions workflow, see wiki: Run the tests locally).
  • Updating the Rose metadata (select one of the following):
    • Rose metadata related to the change has been added or updated.
    • The change does not require Rose metadata to be added or updated.
  • Rendering the Rose metadata (select one of the following):
    • The Rose GUI shows the change as expected.
    • The change in this PR does not affect the Rose GUI.
  • Updating the tests (select one of the following):
    • Tests related to the change have been added or updated.
    • The change does not require tests to be added or updated.
  • Updating the user documentation (i.e. everything in the doc directory, including the Quick Start section; select one of the following):
    • The user documentation related to the change has been updated appropriately.
    • The change in this PR does not require the user documentation to be updated.
  • Rendering the user documentation (wiki: Build the documentation locally provides instructions; select one of the following):
    • The HTML pages show the change as expected.
    • The change in this PR does not affect the HTML pages.
  • Updating the API documentation (e.g. docstrings in Python modules; select one of the following):
    • The API documentation related to the change has been updated appropriately.
    • The change in this PR does not affect the API documentation.

Important

  • Remember to re-check the Definition of Done after making changes in response to a review.
  • The developer merges the PR.
  • Remember to use the format #<pull_request_number>: <pull_request_title> when writing the merge commit message for the pull request, so the pull request number is immediately visible on GitHub, regardless of the length of the pull request title.

@mo-nikosbaltas mo-nikosbaltas requested a review from zmaalick March 2, 2026 10:47
@mo-nikosbaltas mo-nikosbaltas added enhancement New feature or request standardise Anything related to CDDS labels Mar 2, 2026
@mo-nikosbaltas mo-nikosbaltas changed the title implement parallel parameterised ref eval multi run standardisation Implement parametrisation for standardising model runs Mar 2, 2026
@zmaalick zmaalick assigned zmaalick and mo-nikosbaltas and unassigned zmaalick Mar 3, 2026
Copy link
Collaborator

@zmaalick zmaalick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, nothing to add

@NParsonsMO NParsonsMO self-requested a review March 5, 2026 14:32
Copy link
Collaborator

@NParsonsMO NParsonsMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember @ehogan saying that this PR should make a variables file per dataset, which is not currently happening.
Image

I've also queried a couple of other bits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can believe it was necessary to pull this step out into its own app (if it just needs running once overall rather than separately for each dataset), so if @ehogan is happy with the addition of a new app(?) then this seems fine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra API (restrucre_dirs) is required because it acts as the fan-in api for synchronisation. We have two instances of standardise_model_data running in parallel and unless they both finish you cannot restructure the data. I had discussed this with Alistair as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a way to run restructure_for_cmip6 in a way that can be done in parallel, but I propose that any investigation related to this is deferred. The name of the app should be changed, but I proposed that this is also deferred (to #318).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the agreement we made at the technical meeting this morning to revert configure_standardise.sh, restructure_dirs.sh should be reverted to this previous version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Although not different it had only some useful logging which have been removed. 7d8aef1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version of restructure_dirs.sh in this PR has not been reverted to this previous version. Please revert the changes to restructure_dirs.sh so that it matches this previous version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all comments and checks and reverted back to the serial version. f889818

@@ -1,4 +1,4 @@
# (C) Crown Copyright 2024-2025, Met Office.
# (C) Crown Copyright 2024-2026, Met Office.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this file has changed, so the copyright shouldn't be updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has now changed.

fi

# ---------------------------------------------------------------------------
# 1. Create variables.txt once (shared by both runs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are meant to make two now, not one shared one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had changed our minds!!! Perhaps you missed that. Not allowed to have a separate 'create_variables' app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not just be a case of making a variables_<dataset>.txt filepath akin to REQUEST_PATH = ${CYLC_WORKFLOW_SHARE_DIR}/etc/request_%(dataset)s.cfg while keeping it in the same (combined) app?

Copy link
Member

@ehogan ehogan Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed at the technical meeting this morning, the reason we didn't want to introduce a create_variables app that created a single variables file is because it's possible for the same variables to come from different streams depending on the model evaluation run (/ dataset) used, which means a variables file is required for each dataset. Keeping the variables file creation within configure_standardise is preferred, since it minimises the number of changes compared to the previous version of the code. Edited to add that since configure_standardise is running in parallel, each job should write its own variables file, rather than try to write a single file with the same name, which is not safe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it as it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All defensive programming and logging have been removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these have been resolved 7d8aef1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done it. 79e6324

CMEW/flow.cylc Outdated
[[[environment]]]
ROSE_TASK_APP = standardise_model_data
SITE = {{ SITE }}
RUN_LABEL = %(dataset)s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem good to define these things (RUN_LABEL and REQUEST_PATH) twice (once in configure_standardise<recipe,dataset> and once in standardise_model_data) - could you just define them once then inherit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please use the appropriate Cylc variable for the parameterisation variable, i.e. ${CYLC_TASK_PARAM_dataset} rather than %(dataset)s. An example of this for the recipe parameterised variable is available at https://github.com/MetOffice/CMEW/blob/main/CMEW/flow.cylc#L51.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use only ${CYLC_TASK_PARAM_dataset} in favour of ${RUN_LABEL}, please? (Using ${RUN_LABEL} obfuscates the recognisable and understandable variable ${CYLC_TASK_PARAM_dataset})

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done f889818


def create_request():
"""Retrieve CDDS request information from Rose suite configuration.
def create_request() -> configparser.ConfigParser:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note from an earlier comment of Emma's on another review that CMEW should not contain type hints:
#349 (comment)

I'm not sure why or if that's an absolute, again probably one for @ehogan.

Should there still be a docstring?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python functions must have docstrings. Documentation requirements: Python documentation provides more information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type hinting has been removed and docstrings reinstated. docstrings were removed after previous changes were made.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All type hinting and the inclusion of docstrings had been done 7d8aef1

BASH_XTRACEFD=1
set -xeuo pipefail

echo "[INFO] Running configure_standardise for REF and EVAL runs"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the technical meeting this morning, we agreed that configure_standardise.sh should be reverted to this previous version. Specifically:

  • -o pipefail is only needed in rose-app.conf files, as described in the Developer Guide: Rose requirements
  • The echoing and defensive programming is not required; the options provided to set at the top of the script do the following:
    • -x Print commands and their arguments as they are executed.
    • -e Exit immediately if a command exits with a non-zero status.
    • -u Treat unset variables as an error when substituting.
  • If we want to validate variables, this should be done in the flow.cylc file, e.g. https://github.com/MetOffice/CMEW/blob/main/CMEW/flow.cylc#L5, and not in an individual app. However, this is out of scope of this issue (a new issue should be opened should we want to validate variables)

Copy link
Collaborator Author

@mo-nikosbaltas mo-nikosbaltas Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 7d8aef1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version of configure_standardise.sh in this PR has not been reverted to this previous version. Please revert the changes to configure_standardise.sh so that it matches this previous version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically removed some useful comments and checks. The only diffrence is that the create_variables_file.py script has to be executed before create_request_file.py otherwise it breaks. f889818

@ehogan
Copy link
Member

ehogan commented Mar 6, 2026

I remember @ehogan saying that this PR should make a variables file per dataset, which is not currently happening. Image

As discussed in the technical meeting this morning, configure_standardise must produce a variables file for each dataset, i.e. variables_u-bv526.txt and variables_u-cw673.txt.

@@ -1,4 +1,4 @@
# (C) Crown Copyright 2024-2025, Met Office.
# (C) Crown Copyright 2024-2026, Met Office.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the agreement we made at the technical meeting this morning to revert configure_standardise.sh, the rose-app.conf file for standardise_model_data should be reverted to this previous version and the standardise_model_data.sh script removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has 7d8aef1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version of standardise_model_data/rose-app.conf in this PR has not been reverted to this previous version. Please revert the changes to standardise_model_data/rose-app.conf so that it matches this previous version, and then remove the standardise_model_data.sh script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standardise_model_data tasks run in parallel and they are standalone tasks so rose-app.conf had to be changed. You cannot revert to the serial version.

Copy link
Collaborator Author

@mo-nikosbaltas mo-nikosbaltas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making comment visible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Although not different it had only some useful logging which have been removed. 7d8aef1

fi

# ---------------------------------------------------------------------------
# 1. Create variables.txt once (shared by both runs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these have been resolved 7d8aef1


def create_request():
"""Retrieve CDDS request information from Rose suite configuration.
def create_request() -> configparser.ConfigParser:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All type hinting and the inclusion of docstrings had been done 7d8aef1

@@ -1,4 +1,4 @@
# (C) Crown Copyright 2024-2025, Met Office.
# (C) Crown Copyright 2024-2026, Met Office.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has 7d8aef1

fi

# ---------------------------------------------------------------------------
# 1. Create variables.txt once (shared by both runs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done it. 79e6324

@@ -1,4 +1,4 @@
# (C) Crown Copyright 2024-2025, Met Office.
# (C) Crown Copyright 2024-2026, Met Office.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has now changed.

Copy link
Collaborator

@NParsonsMO NParsonsMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy that my concerns have now been addressed.

I've left one comment on a copyright that I think now needs updating.

@@ -1,18 +1,11 @@
#!/bin/bash
# (C) Crown Copyright 2024-2026, Met Office.
# (C) Crown Copyright 2024-2025, Met Office.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now it has changed... So maybe it should now be 2026?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. e21f4fe

Copy link
Collaborator

@NParsonsMO NParsonsMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!

Approved.

@zmaalick zmaalick requested review from ehogan and removed request for ehogan March 16, 2026 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request standardise Anything related to CDDS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement parametrisation for standardising model runs

5 participants