Handle multiple files from a single source when using curl to transfer #59

EddieLF · 2024-08-21T01:48:07Z

Currently, we can have multiple files in a Source, so long as the transfer_cmd is gcs_rsync or gcs_cp_r.

This PR adds a way to handle multiple files in a source when the transfer_cmd is curl, by iterating through the source.files dictionary values and constructing URLs for each file source and destination to pipe into.

In response to this PR and this workflow run which did not handle multiple source files with curl well at all.

illusional · 2024-08-21T02:02:35Z

.github/workflows/transfer.py

-        )
-        print(cmd)
-        subprocess.run(cmd, shell=True)
+        if source.files and source.transfer_cmd in (curl, curl_with_user_agent):


IMO, I'd probably tweak this so the transfer_cmd returns a list of commands (or a single string), and then run multiple if it's a list. So we keep how it's run closer to the implementation.

Are there type annotations that are wrong here, or is files not used here, I'm a bit confused what the behaviour is supposed to be here.

is files not used here

I think files is not used here. There are two types of Source objects in references.py - sources containing a single reference file, for which only src and dst are defined. Then there are sources containing multiple reference files. For these sources, the attribute files is defined as a dict.

When the transfer_cmd is gsutil rsync, we copy the entire src directory to the dst destination, meaning we get what's in the files dict with the correct directory structure (alongside whatever else the rsync picks up from the source), so there's no need to actually look at the files dict.

Whereas for a curl, we need to actually look at all the files in the dict and execute curl command for each of them.

IMO, I'd probably tweak this so the transfer_cmd returns a list of commands (or a single string), and then run multiple if it's a list. So we keep how it's run closer to the implementation.

Can you elaborate? Are you suggesting something like updating this line to

transfer_cmd = [curl_with_user_agent, curl_with_user_agent, curl_with_user_agent]

And then updating the transfer script code to do something like

if source.files and isinstance(source.transfer_cmd, list): for file, command in zip(source.files.values(), transfer_cmd): cmd = command(...) print(command) subprocess.run(command, shell=True)

Oh I see, I think this behaviour should be decided by the transfer_cmd implementation. It probably makes more sense to pass the Source, and change the transfer_cmd to take the Source object.

RE the list of commands, I mean something like:

# this is a garbage implementation, it's just trying to illustrate the point def my_transfer_command(source: Source) -> list[str] | str: if many_files or a_directory: return ["cp {f} ." for f in many_files] return "cp one.file ."

Handle multiple files from a single source when using curl to transfer

2b5f885

EddieLF requested a review from illusional August 21, 2024 01:51

illusional reviewed Aug 21, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle multiple files from a single source when using curl to transfer #59

Handle multiple files from a single source when using curl to transfer #59

Uh oh!

EddieLF commented Aug 21, 2024 •

edited

Loading

Uh oh!

illusional Aug 21, 2024

Uh oh!

EddieLF Aug 21, 2024

Uh oh!

illusional Aug 21, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Handle multiple files from a single source when using curl to transfer #59

Are you sure you want to change the base?

Handle multiple files from a single source when using curl to transfer #59

Uh oh!

Conversation

EddieLF commented Aug 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

illusional Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

EddieLF Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

illusional Aug 21, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EddieLF commented Aug 21, 2024 •

edited

Loading