Conversation
My view, and perhaps this is less satisfying, is that we actually need to treat this case by case.
The following retrievals are intended to be free -- a sort of "cost of doing business" for the SP:
- replication to other hot nodes
- replication to Filecoin (when the Filecoin SP fetches to get all the data for their aggregate)
This is part of the SLA they will sign and we've done the work to position it.
For these cases, I think we use service retrievals. In the case of replication between nodes, `blob/replicate/transfer` and its associated delegation chain should serve as authorization for the retrieval. We'll need to figure things out for Filecoin SPs, but it should follow some form of free retrieval.
For other cases, I much prefer the on-demand approach, and ultimately I think the customer should pay. In the warm storage context, every query to the indexer should have a space, and we can attach a UCAN via header. I think we essentially need to set up a usage count or some other form of limiting factor, though I could see how that itself could get complicated.
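To make the on-demand idea concrete, here is a minimal sketch of the header-plus-usage-count flow. Everything here is an assumption for illustration: the `X-UCAN-Delegation` header name, the plain-JSON `Delegation` shape (real UCANs carry signatures, expiry, and proofs), and the flat per-space budget are all hypothetical, not the actual Storacha API.

```typescript
// Sketch only: hypothetical shapes, not the real ucanto/Storacha types.
interface Delegation {
  issuer: string;   // DID of the space owner
  audience: string; // DID of the service being authorized
  can: string;      // capability, e.g. "space/content/retrieve"
  with: string;     // resource: the space DID
}

// Per-space usage budget acting as the "limiting factor" discussed above.
const usage = new Map<string, number>();
const LIMIT = 100;

function authorizeQuery(headers: Map<string, string>, serviceDid: string): boolean {
  const raw = headers.get("x-ucan-delegation"); // assumed header name
  if (!raw) return false;
  const d: Delegation = JSON.parse(Buffer.from(raw, "base64").toString("utf8"));
  if (d.audience !== serviceDid) return false;          // not delegated to us
  if (d.can !== "space/content/retrieve") return false; // wrong capability
  const used = usage.get(d.with) ?? 0;
  if (used >= LIMIT) return false; // budget exhausted: customer must pay/top up
  usage.set(d.with, used + 1);     // count this retrieval against the space
  return true;
}
```

The point of the sketch is only that the space DID in the delegation gives the indexer a natural key to meter against, so billing the customer falls out of the authorization check.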
Two of the mentioned cases I would like to avoid entirely in the warm storage context:
- fetching the index for IPNI publishing on the Upload Service (this is so it's available on the gateway, which won't be the case for warm storage)
- verifying the Piece CID -- there is work slated for October to instead verify agreement between the data owner and the storage node, which should be sufficient
> There are a number of occasions where internal services within the Storacha Network need to retrieve data from Storage Nodes:
>
> 1. The _Upload Service_ retrieves indexes as part of a `space/index/add` invocation, so that it can publish all the hashes to IPNI.
This refers to the bitswap/trustless-http-gateway IPNI chain pointing to elastic.dag.house and dag.w3s.link, right? For warm storage, we should not be publishing this chain.
We can figure out hot storage later, but I would say the eventual goal here is that the gateway (or other CDN) itself is the one publishing this chain, that `space/index/add` should go to it, and that the client should pay for the one-time retrieval.
> This refers to the bitswap/trustless-http-gateway IPNI chain pointing to elastic.dag.house and dag.w3s.link, right? For warm storage, we should not be publishing this chain.
👍 this is already the case.
> On-demand delegated retrieval is the process of generating a delegation and attaching it to an invocation at the point at which it is needed.
>
> * 🔴 Engineering work required to extract delegation and use it in a UCAN retrieval request for each invocation where it is needed.
> * 🟠 Need to delegate an open-ended capability to the indexer since the CID of an index it may need to retrieve is not known beforehand.
Seems like this could be mitigated with a usage limit -- namely, one-time use.
The `assert/index` claim itself could serve as "authorization" for a one-time retrieval in the case of storing `assert/index`.
Hmm, one-time usage isn't really supported in UCAN.
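Since UCAN has no native one-time-use semantics, one workaround (an assumption on my part, not anything the spec provides) is for the service to track the invocation CIDs it has already honored: the delegation stays open-ended, but each unique invocation is only accepted once, which approximates single-use authorization.

```typescript
// Hypothetical replay guard. UCAN itself has no one-time-use semantics, so
// the service tracks which invocations it has already honored. Invocation
// CIDs are unique per invocation even when the same delegation is reused.
const seen = new Set<string>();

/** Returns true the first time a given invocation CID is presented, false after. */
function consumeOnce(invocationCid: string): boolean {
  if (seen.has(invocationCid)) return false; // replayed invocation: reject
  seen.add(invocationCid);
  return true;
}
```

In production this set would need expiry (e.g. bounded by the invocation's own expiration) so it doesn't grow without bound.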
> 1. The _Upload Service_ retrieves indexes as part of a `space/index/add` invocation, so that it can publish all the hashes to IPNI.
> 2. The _Indexing Service_ fetches blob indexes when an `assert/index` claim is published to it, as well as part of the normal flow when querying for hashes.
> 3. The _Filecoin Service_ fetches all data submitted to it in order to validate the provided Piece CID matches the data.
It's on the roadmap, but for warm storage the storage node itself has to make a Piece CID, and we have the one from the client. I believe agreement of these Piece CIDs is sufficient to throw the data in a Filecoin deal.
When I say "on the roadmap" -- it's somewhere in the October development, I believe.
This doesn't cover one other use case, which is retrieval by the actual Filecoin SP.
> ## Service Retrieval via `blob/retrieve` (name TBC)
>
> Storage Nodes delegate a retrieval capability to the upload service, indexing service and filecoin service, giving them access to retrieve blobs as required. This can be done while onboarding. Note, this is a variation of Pre-Delegated Retrieval (as above).
I think we should have service retrieval for:
- Replication (`blob/replicate/transfer` is itself an authorization to retrieve)
- Filecoin SP fetch -- which is a tad more complicated because of Spade -- perhaps we can just use Roundabout as a proxy for now.
`space/replica/transfer` is an effect that is self-signed by the storage node and resolves when the transfer is complete. It doesn't give a storage node authorization to retrieve from a different storage node...
An idea:
We could change the `space/blob/replicate` invocation to ALSO require a delegation to the upload service to `space/content/retrieve` the blob from the space. When the upload service calls `blob/replica/allocate` on a storage node, it re-delegates `space/content/retrieve` to the storage node, allowing it to fetch the blob.
If we're not going to pay for replications then this could be a new/different capability (e.g. `space/content/replication`) instead of `space/content/retrieve`.
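A sketch of what validating that re-delegation chain could look like. The `Delegation` shape and DIDs here are illustrative assumptions (real UCANs carry signatures, expiry, and CID-addressed proofs); the check it demonstrates is the standard one: each link's issuer must be its parent's audience, the capability and resource must carry through, and the root must be self-issued by the space.

```typescript
// Hypothetical delegation-chain check for the re-delegation idea above.
interface Delegation {
  issuer: string;     // DID of the delegator
  audience: string;   // DID of the delegate
  can: string;        // capability, e.g. "space/content/retrieve"
  with: string;       // resource: the space DID
  proof?: Delegation; // parent delegation this one re-delegates
}

// Walk the proof chain from the leaf up to the root.
function chainAuthorizes(d: Delegation, space: string, holder: string): boolean {
  if (d.audience !== holder || d.can !== "space/content/retrieve" || d.with !== space)
    return false;
  let cur = d;
  while (cur.proof) {
    const p = cur.proof;
    // Each link's issuer must be the party the parent delegated to,
    // and the capability/resource must not widen along the way.
    if (p.audience !== cur.issuer || p.can !== d.can || p.with !== space) return false;
    cur = p;
  }
  return cur.issuer === space; // root must be self-issued by the space
}
```

In the proposal above, the root link is client → upload service (attached to `space/blob/replicate`) and the leaf is upload service → storage node (attached to `blob/replica/allocate`); the destination storage node presents the two-link chain when fetching the blob.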
Could the `cause` field of the `space/replica/transfer` invocation, which IIRC is a `blob/replica/allocate` invocation issued by the upload service, be used as "proof" to perform the transfer?
If that's a possibility, my only concern would be the size of the header in the request.
Closing in favor of #68
RFC to lay out a few options for authorizing retrievals that need to be made internally by the network.
I think it's worth moving forward with On-Demand Delegated Retrieval, but I'm interested in opinions and ideas for alternatives.