Conversation
Hello, this pull request has been marked with the … milestone. If you won't be able to finish this in time, don't worry, just unassign the milestone.
@valeriupredoi volunteered to do the technical review of this one.
The downloader now uses webdav3 instead of webdav. I also updated the CDS requests to account for the latest changes from CDS, as the attributes used to request the data have changed slightly. The downloader and formatter work fine with these changes. As expected, the tests now fail because webdav3 is not in the testing environment. From my point of view, this is now ready for merging. Thanks again @valeriupredoi for taking a look!
@axel-lauer good stuff, bud! Please also add webdav3 to
Thanks @valeriupredoi! Just added webdav3 to pyproject.toml: da5fe5a
valeriupredoi
left a comment
sorry, my bad - I forgot the actual package name 😁
valeriupredoi
left a comment
code looks spiffy! Very many thanks @axel-lauer 🍺
@ESMValGroup/science-reviewers Anyone available to do a quick scientific review on this one?
bettina-gier
left a comment
Code looks good. I'll try to run it and check the output, unless you want to point me to a folder with the download and formatting logs.
Co-authored-by: Bettina Gier <gier@uni-bremen.de>
Sorry for the delay. I'm getting the following error regarding the webdav3 client: Which is a very... telling error message. Could you double-check whether this is a problem on my side or in general?
I tried to track it down, as the problem still persists for me. I'd download the files manually to test the actual CMORizer, but they are provided on a per-day basis, so testing even a single year would mean manually downloading 365 files.
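Because the IASI files are provided per day, a full year means 365 files (366 in leap years). When testing or resuming a partial download, it can help to list the days that are still missing. Below is a minimal sketch. It assumes the dates of the locally available files have already been extracted. The function names are hypothetical and are not part of ESMValTool.

```python
from datetime import date, timedelta


def daily_dates(year: int) -> list[date]:
    """Return every calendar day of the given year (365 or 366 dates)."""
    start = date(year, 1, 1)
    num_days = (date(year + 1, 1, 1) - start).days
    return [start + timedelta(days=offset) for offset in range(num_days)]


def missing_days(year: int, available: set[date]) -> list[date]:
    """Return the days of ``year`` for which no local file is available."""
    return [day for day in daily_dates(year) if day not in available]


# Example: only 1 January 2023 has been downloaded so far.
todo = missing_days(2023, {date(2023, 1, 1)})
print(len(todo), todo[0])  # 364 2023-01-02
```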
I am able to download with just Since we already have the WgetDownloader, would it be possible to use that instead of adding yet another dependency?
Good idea, but for reasons I do not understand, this does not work for me from inside the downloader script. No matter what I try, I end up with an "HTTP request sent, awaiting response... 401 Unauthorized" error. Downloading the files with the WebDAV client works fine.
The wget downloader works for me if I make these changes:
diff --git a/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py b/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
index e509faf7d..84aeb6446 100644
--- a/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
+++ b/esmvaltool/cmorizers/data/downloaders/datasets/esacci_ozone.py
@@ -9,7 +9,7 @@ from datetime import datetime
import cdsapi
from dateutil import relativedelta
-from webdav3.client import Client
+from esmvaltool.cmorizers.data.downloaders.wget import WGetDownloader
logger = logging.getLogger(__name__)
@@ -134,42 +134,27 @@ def download_dataset(
if end_date is None:
end_date = datetime(2023, 12, 31)
- options = {
- "webdav_hostname": "https://webdav.aeronomie.be",
- "webdav_login": "o3_cci_public",
- "webdav_password": "",
- }
-
- wd_client = Client(options)
+ downloader = WGetDownloader(
+ original_data_dir=original_data_dir,
+ dataset=dataset,
+ dataset_info=dataset_info,
+ overwrite=overwrite,
+ )
+ wget_options = [
+ "-e robots=off", # Ignore robots.txt
+ "--no-parent", # Don't ascend to the parent directory
+ "--user=o3_cci_public", # User name
+ "--password=", # Empty password (no password needed for public access)
+ ]
- basepath = "/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI/"
+ basepath = "https://webdav.aeronomie.be/guest/o3_cci/webdata/Nadir_Profiles/L3/IASI_MG_FORLI"
loop_date = start_date
while loop_date <= end_date:
year = loop_date.year
-
- # if needed, create local output directory
- outdir = output_folder / f"IASI_{year}"
- os.makedirs(outdir, exist_ok=True)
-
- # directory on WebDAV server to download
+ # directory on server to download
remotepath = f"{basepath}/{year}"
- files = wd_client.list(remotepath)
- info = wd_client.info(remotepath + "/" + files[0])
- numfiles = len(files)
- # calculate approx. download volume in Gbytes
- size = int(info["size"]) * numfiles // 1073741824
- del files
-
- loginfo = (
- f"downloading {numfiles} files for year {year}"
- f" (approx. {size} Gbytes)"
- )
- logger.info(loginfo)
-
- # synchronize local (output) directory and WebDAV server directory
- wd_client.pull(remote_directory=remotepath, local_directory=outdir)
-
+ downloader.download_folder(remotepath, wget_options)
loop_date += relativedelta.relativedelta(years=1)
else:
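The WebDAV code removed by this diff logged an approximate download volume for each year. It took the size of the first file in the remote year folder, multiplied it by the number of files, and integer-divided the result by 1073741824 bytes. Below, that estimate is isolated as a small function. The name `approx_download_gbytes` is hypothetical, and the example numbers are illustrative only.

```python
GBYTE = 1073741824  # 1024**3 bytes, the divisor used in the removed code


def approx_download_gbytes(sample_file_size: int, num_files: int) -> int:
    """Estimate a download volume in whole Gbytes (rounded down).

    Like the removed WebDAV code, this assumes every file is about as
    large as the single sampled file.
    """
    return sample_file_size * num_files // GBYTE


# Illustrative numbers only (not the real IASI file size):
# 365 daily files of roughly 500 MB each.
print(approx_download_gbytes(500_000_000, 365))  # 169
```

Because of the integer division, a year smaller than 1 Gbyte is reported as 0. The estimate is also only as good as the assumption that all daily files have similar sizes.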
This reverts commit 31afafc.
Thanks @bouweandela! This seems to work fine at first glance, but strange things happen when continuing unfinished downloads. For example, the wget command Here is the code I used: In order to save the output into custom subfolders, I extended I do not know enough about WebDAV to understand what's going on, but simply switching from WebDAV to wget seems quite error-prone. I would therefore prefer to leave things as they are, i.e. continue using WebDAV.
I'm not too keen on adding yet another dependency: making sure the tool can be installed and the conda environment solves is already quite a lot of work, especially for @valeriupredoi. Do you really need the subfolders? The files all seem to have the date in their names anyway. If you do need them, adding that functionality to the WgetDownloader, or creating a customized subclass and using that, would be a fine solution too.
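One way to keep per-year subfolders without a WebDAV dependency might be to call wget directly with `--directory-prefix`. The sketch below only builds the command line. Some of its parts come from the diff above: the base URL, the user name, the empty password, `-e robots=off` and `--no-parent`. The `IASI_{year}` folder name comes from the removed WebDAV code. The options `--recursive`, `--no-directories` and `--continue` are assumptions. The sketch has not been checked against how `WGetDownloader.download_folder` builds its own call.

```python
from pathlib import Path

# Base URL and credentials as in the diff above.
BASE_URL = (
    "https://webdav.aeronomie.be/guest/o3_cci/webdata/"
    "Nadir_Profiles/L3/IASI_MG_FORLI"
)


def build_wget_command(year: int, output_folder: Path) -> list[str]:
    """Build (but do not run) a wget call that mirrors one year into
    its own subfolder, e.g. output_folder/IASI_2020."""
    target = output_folder / f"IASI_{year}"
    return [
        "wget",
        "--recursive",  # assumption: follow links in the year listing
        "--no-parent",  # don't ascend to the parent directory
        "--no-directories",  # assumption: don't recreate the remote tree
        "--continue",  # assumption: resume partially downloaded files
        "-e", "robots=off",  # ignore robots.txt
        "--user=o3_cci_public",
        "--password=",  # empty password for public access
        f"--directory-prefix={target}",
        f"{BASE_URL}/{year}/",
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. The discussion above casts doubt on whether recursive retrieval and resumed downloads behave well on this server, so this would need testing.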
I use subfolders because the sheer number of files in one directory makes handling the data very slow (on Levante); even a plain "ls" takes a very long time. Anyway, I will try to put everything in the same folder using wget. If that works fine, I don't care anymore.
There you go. All files in the same folder, no more webdav: f720d7b
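With all files in a single folder, the date in each file name is enough to pick out one year when needed. Below is a minimal sketch. It assumes, hypothetically, that each name contains its date as an 8-digit `YYYYMMDD` string. The real IASI file names may need a different pattern, and this regular expression would also match any other 8-digit number in a name.

```python
import re
from collections import defaultdict
from collections.abc import Iterable

# Hypothetical file-name convention: the first run of 8 digits is YYYYMMDD.
DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")


def group_by_year(filenames: Iterable[str]) -> dict[int, list[str]]:
    """Group file names by the year encoded in them, sorted within each year."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name in sorted(filenames):
        match = DATE_PATTERN.search(name)
        if match:  # silently skip names without a date
            groups[int(match.group(1))].append(name)
    return dict(groups)
```

For example, `group_by_year(os.listdir(folder))` needs only a single directory listing, which is relevant given how slow listing such folders is on Levante.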
Thanks for making these changes, @axel-lauer! @bouweandela @bettina-gier @valeriupredoi, anything you'd like to add? From my side, this can be merged. This is the final feature PR for v2.14.0 🚀
all fine by me, cheers, folks 🍺
LGTM
Description
This PR extends the existing CMORizer scripts (downloading and formatting) for ESA CCI OZONE data to include the following additional dataset versions:
In addition, this PR fixes problems with the time bounds of SAGE-OMPS (o3) and GTO-ECV (toz), the dataset versions included in the first version of the CMORizer.
For automatic downloading of IASI data, support for webdav is needed (https://pypi.org/project/webdavclient/). The webdavclient package has been added to the environment files.
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
New or updated data reformatting script