Extension of ESA CCI OZONE CMORizer by axel-lauer · Pull Request #4125 · ESMValGroup/ESMValTool

axel-lauer · 2025-07-28T13:03:14Z

Description

This PR extends the existing CMORizer scripts (downloading and formatting) for ESA CCI OZONE data to include the following additional dataset versions:

MEGRIDOP (o3)
IASI (o3, toz)

In addition, this PR fixes problems with the time bounds of the two dataset versions included in the first version of the CMORizer, SAGE-OMPS (o3) and GTO-ECV (toz).
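The description does not say what the time-bounds problem was. As a hedged illustration only, assuming monthly-mean output: a common convention for CMOR-style monthly data is that each time step's bounds run from the first day of the month to the first day of the next month, with the time point halfway between them. The helper `monthly_time_and_bounds` is hypothetical and is not the CMORizer's actual code.

```python
from datetime import datetime


def monthly_time_and_bounds(year: int, month: int):
    """Return (time point, (lower bound, upper bound)) for one calendar month.

    Hypothetical helper for illustration; not taken from the ESMValTool CMORizer.
    """
    lower = datetime(year, month, 1)
    # The upper bound is the first day of the following month.
    if month == 12:
        upper = datetime(year + 1, 1, 1)
    else:
        upper = datetime(year, month + 1, 1)
    # Place the time point halfway between the two bounds.
    midpoint = lower + (upper - lower) / 2
    return midpoint, (lower, upper)


# Example: February 2008 (a leap year with 29 days).
time_point, (lo, hi) = monthly_time_and_bounds(2008, 2)
print(time_point, lo, hi)  # → 2008-02-15 12:00:00 2008-02-01 00:00:00 2008-03-01 00:00:00
```

Computing the midpoint from the bounds, rather than using a fixed day such as the 15th, keeps the time point consistent for months of different lengths.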

For automatic downloading of IASI data, support for webdav is needed (https://pypi.org/project/webdavclient/). The webdavclient package has been added to the environment files.

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🛠 This pull request has a descriptive title
🛠 Code is written according to the code quality guidelines
🛠 Documentation is available
🛠 Tests run successfully
🛠 The list of authors is up to date
🛠 Any changed dependencies have been added or removed correctly
🛠 All checks below this pull request were successful

New or updated data reformatting script

🛠 Documentation is available
🛠 The dataset has been added to the CMOR check recipe
🛠 The dataset has been added to the shared data pools of DKRZ and Jasmin by the @ESMValGroup/OBS-maintainers team
🧪 Numbers and units of the data look physically meaningful

schlunma · 2026-01-20T09:57:37Z

Hello, this pull request has been marked with the v2.14.0 milestone. The release of version 2.14.0 is currently scheduled for February 2026. To get this into the new release, it would be great to get this merged by the end of January.

If you won't be able to finish this in time, don't worry - just unassign the milestone v2.14.0. If you need any support, ping myself (@schlunma; the release manager for v2.14.0) or the @ESMValGroup/technical-lead-development-team. Please note that I won't be available until the beginning of February, though.

bouweandela · 2026-02-06T09:47:40Z

@valeriupredoi volunteered to do the technical review of this one.

axel-lauer · 2026-02-12T19:13:39Z

The downloader now uses webdav3 instead of webdav. I also updated the CDS requests to account for the latest changes on the CDS side, as the attributes used to request the data have changed slightly. The downloader and formatter work fine with these changes.

As expected, the tests now fail because webdav3 is not in the testing environment. From my point of view, this is now ready to be merged. Thanks again @valeriupredoi for taking a look!

valeriupredoi · 2026-02-13T12:57:50Z

@axel-lauer good stuff, bud! Please also add webdav3 to pyproject.toml 🍺

axel-lauer · 2026-02-13T13:30:38Z

Thanks @valeriupredoi ! Just added webdav3 to pyproject.toml: da5fe5a

valeriupredoi

sorry, my bad - I forgot the actual package name 😁

pyproject.toml

valeriupredoi

code looks spiffy! Very many thanks @axel-lauer 🍺

schlunma · 2026-02-20T14:13:04Z

@ESMValGroup/science-reviewers Anyone available to do a quick scientific review on this one?

bettina-gier

Code looks good. I'll try to run it to see the output, unless you want to point me to a folder with the download and formatting logs.

esmvaltool/cmorizers/data/formatters/datasets/esacci_ozone.py

Co-authored-by: Bettina Gier <gier@uni-bremen.de>

bettina-gier · 2026-03-04T17:53:01Z

Sorry for the delay. I'm getting the following error related to the webdav3 client:

2026-03-04 17:41:47,480 UTC [720960] ERROR   Program terminated abnormally, see stack trace below for more information:
Traceback (most recent call last):
  File "/work/bd0854/b309137/esmval/ESMValCore/esmvalcore/_main.py", line 786, in run
    fire.Fire(ESMValTool())
    ~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ~~~~~~~~~~~~~~~~~~~^
        component,
        ^^^^^^^^^^
    ...<2 lines>...
        treatment='class' if is_class else 'routine',
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        target=component.__name__)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/work/bd0854/b309137/esmval/ESMValTool/esmvaltool/cmorizers/data/cmorizer.py", line 708, in download
    self.formatter.download(start_date, end_date, overwrite=overwrite)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/bd0854/b309137/esmval/ESMValTool/esmvaltool/cmorizers/data/cmorizer.py", line 200, in download
    self.download_dataset(
    ~~~~~~~~~~~~~~~~~~~~~^
        dataset,
        ^^^^^^^^
    ...<2 lines>...
        overwrite=overwrite,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/work/bd0854/b309137/esmval/ESMValTool/esmvaltool/cmorizers/data/cmorizer.py", line 250, in download_dataset
    downloader.download_dataset(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        original_data_dir=self.original_data_dir,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
        overwrite=overwrite,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/work/bd0854/b309137/esmval/ESMValTool/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py", line 157, in download_dataset
    files = wd_client.list(remotepath)
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/webdav3/client.py", line 78, in _wrapper
    res = fn(self, *args, **kw)
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/webdav3/client.py", line 295, in list
    if directory_urn.path() != Client.root and not self.check(directory_urn.path()):
                                                   ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/webdav3/client.py", line 78, in _wrapper
    res = fn(self, *args, **kw)
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/webdav3/client.py", line 344, in check
    response = self.execute_request(action="check", path=urn.quote())
  File "/home/b/b309137/.conda/envs/aival/lib/python3.13/site-packages/webdav3/client.py", line 257, in execute_request
    raise ResponseErrorCode(
    ...<3 lines>...
    )
webdav3.exceptions.ResponseErrorCode: Request to https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/2008/ failed with code 401 and message: b''

Which is a very... telling error message. Could you double-check whether this is a problem on my side or a general one?

bettina-gier · 2026-03-11T06:08:47Z

Tried to track it down, as the problem is still persisting for me.
Error 401 apparently means "Unauthorized". I tried going down from the base folder, folder by folder. The base path works but doesn't include the "guest" directory in the listing, which matches what I see in a browser.
Using the path 'https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/2008/' as an example and going through the folders one by one, I get different errors:
/guest and /guest/o3_cci give error 403, which is apparently an authorization error;
the others give a 401 error. I can access all these folders through a web browser. According to Google, the difference between the errors is: "401 refers to the lack of valid authentication credentials, whereas the 403 error occurs after authentication, signaling the absence of necessary permissions to access a resource", which seems weird since the browser is fine with all folders using the exact same authentication.

I'd download the files manually to test the actual CMORizer, but they are stored on a per-day basis, so even testing a single year would mean manually downloading 365 files.
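A 401 response is how an HTTP server asks the client for credentials; browsers and wget's --user/--password answer that challenge. Assuming the server uses HTTP Basic authentication (the thread does not confirm the scheme), the sketch below shows what the Authorization header for the public account looks like. The point it illustrates is that an empty password is still encoded, as a trailing colon after the user name. Whether a particular client actually sends credentials when the password is empty is worth checking, but this sketch does not establish the cause of the errors above; `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic 'Authorization' header value (RFC 7617).

    The credentials are "user:password"; with an empty password this is "user:".
    """
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("o3_cci_public", "")
print(header)
```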

bouweandela · 2026-03-11T07:57:27Z

I am able to download with just wget and the command (based on this post):

wget -e robots=off -r -nH --cut-dirs=3 --no-parent --reject="index.html*"  --user=o3_cci_public --password='' https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/2008/

Since we already have the WgetDownloader, would it be possible to use that instead of adding yet another dependency?
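Since the wget-based downloader ends up calling wget through subprocess (as the download_folder code later in this thread shows), the shell command above has to be turned into an argument list. A minimal sketch, using a hypothetical helper `build_wget_command`: the shell quotes around index.html* disappear because no shell is involved, and "-e robots=off" is passed as two separate elements, which is the unambiguous form.

```python
import shlex


def build_wget_command(url: str, user: str, password: str = "") -> list[str]:
    """Translate the wget call above into an argument list for subprocess.

    Hypothetical helper; it mirrors the options of the command-line example.
    """
    return [
        "wget",
        "-e", "robots=off",      # ignore robots.txt (option and value as separate args)
        "-r",                    # recursive download
        "-nH",                   # don't create a host-name directory locally
        "--cut-dirs=3",          # drop the first 3 remote directory levels locally
        "--no-parent",           # never ascend above the starting directory
        "--reject=index.html*",  # skip the generated directory listings (no shell quoting)
        f"--user={user}",
        f"--password={password}",
        url,
    ]


cmd = build_wget_command(
    "https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/2008/",
    user="o3_cci_public",
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
# subprocess.run(cmd, check=True) would perform the actual download.
```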

axel-lauer · 2026-03-11T13:38:27Z

Good idea, but for reasons I do not understand, this does not work for me from inside the downloader script. No matter what I try, I end up with an "HTTP request sent, awaiting response... 401 Unauthorized" error. Downloading the files with the WebDAV client works fine.

bouweandela · 2026-03-11T16:48:18Z

The wget downloader works for me if I make these changes:

diff --git a/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py b/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
index e509faf7d..84aeb6446 100644
--- a/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
+++ b/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
@@ -9,7 +9,7 @@ from datetime import datetime
 
 import cdsapi
 from dateutil import relativedelta
-from webdav3.client import Client
+from esmvaltool.cmorizers.data.downloaders.wget import WGetDownloader
 
 logger = logging.getLogger(__name__)
 
@@ -134,42 +134,27 @@ def download_dataset(
         if end_date is None:
             end_date = datetime(2023, 12, 31)
 
-        options = {
-            "webdav_hostname": "https://webdav.aeronomie.be",
-            "webdav_login": "o3_cci_public",
-            "webdav_password": "",
-        }
-
-        wd_client = Client(options)
+        downloader = WGetDownloader(
+            original_data_dir=original_data_dir,
+            dataset=dataset,
+            dataset_info=dataset_info,
+            overwrite=overwrite,
+        )
+        wget_options = [
+            "-e robots=off",  # Ignore robots.txt
+            "--no-parent",  # Don't ascend to the parent directory
+            "--user=o3_cci_public",  # User name
+            "--password=",  # Empty password (no password needed for public access)
+        ]
 
-        basepath = "/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/"
+        basepath = "https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI"
 
         loop_date = start_date
         while loop_date <= end_date:
             year = loop_date.year
-
-            # if needed, create local output directory
-            outdir = output_folder / f"IASI_{year}"
-            os.makedirs(outdir, exist_ok=True)
-
-            # directory on WebDAV server to download
+            # directory on server to download
             remotepath = f"{basepath}/{year}"
-            files = wd_client.list(remotepath)
-            info = wd_client.info(remotepath + "/" + files[0])
-            numfiles = len(files)
-            # calculate approx. download volume in Gbytes
-            size = int(info["size"]) * numfiles // 1073741824
-            del files
-
-            loginfo = (
-                f"downloading {numfiles} files for year {year}"
-                f" (approx. {size} Gbytes)"
-            )
-            logger.info(loginfo)
-
-            # synchronize local (output) directory and WebDAV server directory
-            wd_client.pull(remote_directory=remotepath, local_directory=outdir)
-
+            downloader.download_folder(remotepath, wget_options)
             loop_date += relativedelta.relativedelta(years=1)
 
     else:

axel-lauer · 2026-03-12T14:04:14Z

Thanks @bouweandela! This seems to work fine at first glance, but strange things happen when continuing unfinished downloads. For example, the wget command
['wget', '-e robots=off', '--no-parent', '--accept=nc', '--user=o3_cci_public', '--password=', '--no-clobber', '--directory-prefix=/work/bd0854/b380103/download/Tier2/ESACCI-OZONE/IASI_2008', '--recursive', '--no-directories', 'https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/2008']
starts downloading files for the year 2009(!) into the folder for 2008, e.g.
Saving to: ‘/work/bd0854/b380103/download/Tier2/ESACCI-OZONE/IASI_2008/IASI_FORLI_O3_MERGED_20090112_V1.0.nc’

Here is the code I used:

        if start_date is None:
            start_date = datetime(2008, 1, 1)
        if end_date is None:
            end_date = datetime(2023, 12, 31)

        downloader = WGetDownloader(
            original_data_dir=original_data_dir,
            dataset=dataset,
            dataset_info=dataset_info,
            overwrite=overwrite,
        )

        basepath = "https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI"

        wget_options = [
            "-e robots=off",  # ignore robots.txt
            "--no-parent",    # don't ascend to the parent directory
            "--accept=nc",    # download only *.nc files
            "--user=o3_cci_public",  # user name
            "--password=",    # empty password (no password needed for public access)
        ]

        loop_date = start_date
        while loop_date <= end_date:
            year = loop_date.year

            # directory on server to download
            remotepath = f"{basepath}/{year}"
            downloader.download_folder(remotepath, wget_options, f"IASI_{year}")

            loop_date += relativedelta.relativedelta(years=1)

In order to save the output into custom subfolders, I extended download_folder in wget.py with the new option sub_folder:

    def download_folder(self, server_path, wget_options, sub_folder=""):
        """Download folder.

        Parameters
        ----------
        server_path: str
            Path to remote folder
        wget_options: list(str)
            Extra options for wget
        sub_folder : str, optional
            Name of the local subfolder to store the results in, by default ''
        """
        if self.overwrite:
            raise ValueError(
                "Overwrite does not work with downloading directories through "
                "wget. Please, remove the unwanted data manually",
            )
        output_dir = Path(self.local_folder) / sub_folder

        command = (
            ["wget"]
            + wget_options
            + [
                "--no-clobber",
                f"--directory-prefix={output_dir}",
                "--recursive",
                "--no-directories",
                f"{server_path}",
            ]
        )

        logger.debug(command)
        subprocess.check_output(command)

I do not know enough about WebDAV to understand what's going on, but simply switching from WebDAV to wget seems quite error-prone. I would therefore prefer to leave things as they are, i.e. continue using WebDAV.
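One plausible explanation for the 2009 files landing in IASI_2008, consistent with the GNU Wget manual's note on --no-parent: the command above requests .../IASI_MG_FORLI/2008 without a trailing slash, whereas the command that worked for bouweandela ends in 2008/. Over HTTP, wget treats the last component of a URL without a trailing slash as a file name, so --no-parent only confines the download to IASI_MG_FORLI/, and sibling year directories (presumably reached via links in the server's listings) are allowed. The sketch models that rule in simplified form (wget also compares scheme and host); `no_parent_root` and `is_allowed` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def no_parent_root(start_url: str) -> str:
    """Directory that --no-parent confines a recursive HTTP download to.

    Everything after the last '/' of the start URL is treated as a file name,
    so '.../2008' lives in '.../' while '.../2008/' is itself a directory.
    """
    path = urlsplit(start_url).path
    return path[: path.rfind("/") + 1]


def is_allowed(start_url: str, candidate_url: str) -> bool:
    """Rough path-only check of whether --no-parent lets wget follow candidate_url."""
    return urlsplit(candidate_url).path.startswith(no_parent_root(start_url))


base = "https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI"
file_2009 = f"{base}/2009/IASI_FORLI_O3_MERGED_20090112_V1.0.nc"

print(is_allowed(f"{base}/2008", file_2009))   # no trailing slash → True
print(is_allowed(f"{base}/2008/", file_2009))  # trailing slash → False
```

If this is the cause, building the remote path as f"{basepath}/{year}/" would keep each run inside its year.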

bouweandela · 2026-03-12T14:29:38Z

I'm not too keen on adding yet another dependency, since making sure that the tool can be installed and that the conda environment solves is already quite a lot of work, especially for @valeriupredoi.

Do you really need the subfolders? The files all seem to have the date in the name anyway. If you need them, adding that functionality to the WgetDownloader or creating a customized subclass and using that seems a fine solution too.

axel-lauer · 2026-03-12T14:36:53Z

I use subfolders because the number of files in a single directory makes handling the data very slow (on Levante); even a plain "ls" takes very long. Anyway, I will try to put everything in the same folder using wget. If that works, fine. I don't care anymore.
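Since the file names already carry the date, the per-year layout could also be produced after the download instead of during it. A minimal sketch, assuming the names follow the IASI_FORLI_O3_MERGED_YYYYMMDD_V1.0.nc pattern seen above; `sort_into_year_folders` and the IASI_<year> folder naming are hypothetical and not part of ESMValTool.

```python
import re
import tempfile
from pathlib import Path

# Matches the 8-digit date in names such as IASI_FORLI_O3_MERGED_20090112_V1.0.nc
DATE_IN_NAME = re.compile(r"_(\d{4})(\d{2})(\d{2})_")


def sort_into_year_folders(download_dir: Path, prefix: str = "IASI") -> int:
    """Move *.nc files from download_dir into <prefix>_<year> subfolders.

    Files without a recognisable date are left in place.
    Returns the number of files moved.
    """
    moved = 0
    # sorted() builds the full list first, so moving files does not disturb the loop.
    for path in sorted(download_dir.glob("*.nc")):
        match = DATE_IN_NAME.search(path.name)
        if match is None:
            continue
        year_dir = download_dir / f"{prefix}_{match.group(1)}"
        year_dir.mkdir(exist_ok=True)
        path.rename(year_dir / path.name)
        moved += 1
    return moved


# Demo on a throw-away directory with two dated files and one undated file.
demo_dir = Path(tempfile.mkdtemp())
for name in (
    "IASI_FORLI_O3_MERGED_20081231_V1.0.nc",
    "IASI_FORLI_O3_MERGED_20090112_V1.0.nc",
    "readme.nc",
):
    (demo_dir / name).touch()
print(sort_into_year_folders(demo_dir))  # → 2
```

A step like this should only run once a download has finished: with --no-clobber, wget skips only files it still finds in the target folder, so moving files out in between would make a resumed run fetch them again.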

axel-lauer · 2026-03-12T14:58:01Z

There you go. All files in the same folder, no more webdav: f720d7b

schlunma · 2026-03-12T16:05:37Z

Thanks for making these changes @axel-lauer! @bouweandela @bettina-gier @valeriupredoi anything you'd like to add? From my side this can be merged. This is the final feature PR for v2.14.0 🚀

valeriupredoi · 2026-03-12T16:07:37Z

all fine by me, cheers, folks 🍺

bouweandela · 2026-03-12T16:20:09Z

LGTM

axel-lauer added 14 commits May 13, 2025 14:02

added MERIDOP to ESACCI-OZONE CMORizer

022eef1

Added MEGRIDOP (ESACCI-OZONE) to datasets.yml

bc04e52

snapshot adding IASI downloader and formatter

0fb5051

snapshot 2025-05-28

52c5db6

snapshot 2025-05-30

50831cc

unknown

f81e2f2

snapshot 2025-07-03

5001397

snapshot 2025-07-03 (2)

e4993db

added fix for time bounds

752380a

fixed longitude fix

fae58c1

update datasets.yml

67edfed

removed local rest recipe

a105a72

Merge branch 'main' into extend_esa_cci_ozone_cmorizer

07f2516

reverted downloader esacci_cloud.py

0b4062d

axel-lauer added observations looking for technical reviewer looking for scientific reviewer labels Jul 28, 2025

axel-lauer added 2 commits July 28, 2025 16:08

reactivated CDS download

896dca5

updated esacci_ozone.py (formatter)

284168c

axel-lauer marked this pull request as ready for review July 30, 2025 06:40

axel-lauer requested a review from a team as a code owner July 30, 2025 06:40

hb326 added in scientific review and removed looking for scientific reviewer labels Aug 27, 2025

Merge branch 'main' into extend_esa_cci_ozone_cmorizer

9e2f974

axel-lauer added this to the v2.14.0 milestone Jan 12, 2026

bouweandela requested a review from valeriupredoi February 6, 2026 09:46

bouweandela added in technical review and removed looking for technical reviewer labels Feb 6, 2026

axel-lauer added 3 commits February 12, 2026 10:05

updated CDS requests and webdav

d57894d

Merge branch 'main' into extend_esa_cci_ozone_cmorizer

4b8f2a2

updated recipe_check_obs.yml

36e07fb

Add webdav3 to pyproject.toml

da5fe5a

valeriupredoi reviewed Feb 13, 2026

pyproject.toml (outdated, resolved)

Update pyproject.toml

2262fda

valeriupredoi approved these changes Feb 13, 2026

valeriupredoi added approved by technical reviewer and removed in technical review labels Feb 13, 2026

bettina-gier reviewed Feb 20, 2026

esmvaltool/cmorizers/data/formatters/datasets/esacci_ozone.py (outdated, resolved)

Update esmvaltool/cmorizers/data/formatters/datasets/esacci_ozone.py

08d59dd

Co-authored-by: Bettina Gier <gier@uni-bremen.de>

failed attempt to switch from webdav to wget

31afafc

Revert "failed attempt to switch from webdav to wget"

73df026

This reverts commit 31afafc.

changed webdav to wget

f720d7b

Conversation

axel-lauer commented Jul 28, 2025 • edited by hb326

schlunma commented Jan 20, 2026

bouweandela commented Feb 6, 2026

axel-lauer commented Feb 12, 2026

valeriupredoi commented Feb 13, 2026

axel-lauer commented Feb 13, 2026

valeriupredoi left a comment

valeriupredoi left a comment

schlunma commented Feb 20, 2026

bettina-gier left a comment

bettina-gier commented Mar 4, 2026

bettina-gier commented Mar 11, 2026

bouweandela commented Mar 11, 2026

axel-lauer commented Mar 11, 2026

bouweandela commented Mar 11, 2026

axel-lauer commented Mar 12, 2026 • edited

bouweandela commented Mar 12, 2026

axel-lauer commented Mar 12, 2026

axel-lauer commented Mar 12, 2026

schlunma commented Mar 12, 2026

valeriupredoi commented Mar 12, 2026

bouweandela commented Mar 12, 2026

6 participants
