rfc/guppy-retrieval-strategy.md (61 additions, 0 deletions)

# Guppy Retrieval Strategy

## Multi-block requests

When we retrieve data, how should we make requests to the storage nodes? We want to minimize the overhead of many requests, but also minimize the overhead of over-fetching data. We have a few options:

* **Naive:** On every request for a block, look up its location, and fetch exactly that block.
* Each block is retrieved in a separate request.

* **Shard-Optimistic:** When the first block from a shard is requested, fetch the entire shard blob and cache it.
* Only one request is made per involved shard, but we may fetch (and egress) more data than required.
* We also need to hold onto the cached shards until we're done using them, and those shards potentially take up much more space than the actual target data.
> **@alanshaw** (Jan 7, 2026): Maybe worth mentioning that if you don't know you're going to use all the data in the shard, this is not friendly to the user who stored the data, i.e. you have made them pay for egress of data you're not using.

> **@alanshaw:** Due to the way DAGs are constructed, the root block is the last block in the file, so by requesting the entire shard, the time it takes to start streaming the data is a lot longer than if you were using the Naive or Range-Coalescing strategy. From a CLI tool that maybe doesn't matter all that much, but in a gateway situation, e.g. retrieving a video, it affects your TTFB.

> **@alanshaw:** From a gateway perspective, as soon as you start getting requests for files in a larger DAG this approach doesn't really hold up:
>
> * The TTFB will be too slow.
> * The disk cache would have to be massive and is very wasteful.
> * There's a big cost to users for wasted bandwidth.


* **Range-Coalescing:** When a block is requested, place it in a queue. Periodically, for each shard with blocks in the queue, coalesce the ranges of those blocks, and then request each range. The shards are (currently) CAR files, so adjacent blocks are not literally adjacent (there's a CID and a length between them); we therefore count blocks that are "close enough" as adjacent.
* If the requested data involves many contiguous blocks, this makes many fewer requests than the Naive approach, but not as few as the Shard-Optimistic approach.
* Like the Naive approach, it retrieves the minimal set of blocks, though unlike the Naive approach it must egress the CID and length data between blocks, which is then discarded.
> **@alanshaw:** This is exactly why Filepack exists :)

* Startup is slow, because only the root block can be fetched on the first request. Efficiency on further rounds is best on wide DAGs and worse on deep DAGs.
> **@alanshaw:** I'm not certain this is correct, and I don't know if "startup" is the right word here. With Shard-Optimistic you'd certainly be making fewer requests. However, depending on the shard size, you can potentially get to streaming the data faster using Range-Coalescing, since you don't have to download an entire shard before exporting the data from the DAG (per the aforementioned root-as-last-block problem).
>
> #66 would resolve the deep DAGs problem.

* This approach can also be tuned to coalesce across larger gaps, gaining some of the benefits (and incurring some of the costs) of the Shard-Optimistic approach when blocks are near each other but not directly adjacent.
* There is also a multipart version of this approach which can make a single request for multiple ranges at once, but server support for this is spotty, and notably lacking in Go. It also incurs overhead in the response which may negate any benefits.
* Currently, Freeway implements a Range-Coalescing approach, so we have some evidence it works decently.
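As a sketch of the coalescing step described above (the `byteRange` type and `maxGap` threshold are hypothetical names, not Guppy's actual API): given the byte ranges of the queued blocks within one shard, sort them and merge any that sit within `maxGap` bytes of each other, so each merged span becomes one HTTP range request. `maxGap` is what absorbs the CID and length bytes between adjacent blocks in a CAR.

```go
package main

import (
	"fmt"
	"sort"
)

// byteRange is a half-open [Start, End) span within a shard blob.
type byteRange struct {
	Start, End int64
}

// coalesce merges block ranges whose gap is at most maxGap bytes,
// so each merged range can be fetched with a single range request.
func coalesce(ranges []byteRange, maxGap int64) []byteRange {
	if len(ranges) == 0 {
		return nil
	}
	sort.Slice(ranges, func(i, j int) bool { return ranges[i].Start < ranges[j].Start })
	out := []byteRange{ranges[0]}
	for _, r := range ranges[1:] {
		last := &out[len(out)-1]
		if r.Start-last.End <= maxGap {
			// Close enough: extend the current request instead of starting a new one.
			if r.End > last.End {
				last.End = r.End
			}
		} else {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	blocks := []byteRange{{0, 100}, {140, 240}, {1000, 1100}}
	// With a 64-byte gap allowance, the first two blocks merge into one request.
	fmt.Println(coalesce(blocks, 64)) // [{0 240} {1000 1100}]
}
```

Tuning `maxGap` upward trades extra egressed gap bytes for fewer requests, which is exactly the Shard-Optimistic trade-off in miniature.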

* **Chunk-Optimistic:** This is the same as Shard-Optimistic, but divides shards into smaller chunks. When the first block is requested, a range of nearby blocks are fetched along with it.
> **@alanshaw:** Feels like an optimization rather than a strategy.

* Like Range-Coalescing, this strikes a balance between Naive and Shard-Optimistic.
* Unlike Range-Coalescing, startup can include multiple blocks.
* Unlike other approaches, the ranges may not fall on block borders, because the borders of blocks are unknown until we look up the block in the index. That may make cached data difficult to manage.
> **@alanshaw:**
> > because the borders of blocks are unknown until we look up the block in the index
>
> The only way you know a block is in a shard is because you have an index, so you should know the block boundaries, no?


### Thoughts

* For large data, the startup cost of Range-Coalescing is much less significant. Large data is also (warning: speculation) more likely to be wide than deep. The only way to make an especially deep UnixFS DAG would be to start with a very deep directory tree.
> **@alanshaw:** Ya, I think there's no way to know, and no one size fits all.
>
> If you find the CID requested is the root CID of the DAG the index describes (i.e. request CID == index.content) and the shard is within some acceptable size bound, you could use Shard-Optimistic, but otherwise I think Range-Coalescing is the best way.


* Range-Coalescing takes some effort, but it's a strategy we've implemented before, in JS, and it seems to provide a good balance.

* Chunk-Optimistic is probably not very useful.

* Metrics are probably the key to tuning.

* Any time we egress data that's ultimately discarded, we should have a pretty strong argument for why, since egress is charged to the customer.

## Managing HTTP requests

Things we can do:

* Open a connection to each available storage node and pool them.
* Round-robin requests.
* Make sure nodes support HTTP/2 and use its multiplexing. *(I think we already do?)*
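One way the pooling and round-robin ideas could look using Go's standard library (a sketch; `nodePool`, the node URLs, and the `get` helper are hypothetical names, not existing Guppy code). Go's `http.Transport` already pools and reuses connections per host, and `ForceAttemptHTTP2` enables HTTP/2 multiplexing where the server supports it:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// nodePool round-robins requests across storage nodes, sharing one
// Transport so connections to each node are pooled and reused.
type nodePool struct {
	nodes  []string // base URLs of available storage nodes (hypothetical)
	next   atomic.Uint64
	client *http.Client
}

func newNodePool(nodes []string) *nodePool {
	return &nodePool{
		nodes: nodes,
		client: &http.Client{
			Timeout: 30 * time.Second,
			Transport: &http.Transport{
				ForceAttemptHTTP2:   true, // multiplex requests over one connection
				MaxIdleConnsPerHost: 8,    // keep warm connections to each node
				IdleConnTimeout:     90 * time.Second,
			},
		},
	}
}

// pick returns the next node URL in round-robin order.
func (p *nodePool) pick() string {
	return p.nodes[(p.next.Add(1)-1)%uint64(len(p.nodes))]
}

// get fetches a half-open byte range [start, end) of a shard from the
// next node in round-robin order.
func (p *nodePool) get(path string, start, end int64) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, p.pick()+path, nil)
	if err != nil {
		return nil, err
	}
	// The Range header's end offset is inclusive, hence end-1.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end-1))
	return p.client.Do(req)
}

func main() {
	p := newNodePool([]string{"https://node-a.example", "https://node-b.example"})
	fmt.Println(p.pick(), p.pick(), p.pick()) // alternates between the two nodes
}
```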

## Possible Metrics

Metrics that might indicate whether our strategy is serving us, and if not, what to address:

* Egressed bytes: desired vs. overhead
* Number of fetches, and bytes / fetch
* Total retrieval byte rate
* Retrieval byte rate per connection
* Inactive time per connection
* Request overhead (if we can measure this)
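To make the desired-vs-overhead split concrete, a hypothetical counter (the names are illustrative, not an existing Guppy API) could tally both per fetch and report the fraction of egress that was discarded:

```go
package main

import "fmt"

// egressMetrics tallies how many egressed bytes were actually wanted
// versus fetched and discarded (gap bytes, over-fetched shard data).
type egressMetrics struct {
	fetches       int
	desiredBytes  int64
	overheadBytes int64
}

func (m *egressMetrics) record(desired, overhead int64) {
	m.fetches++
	m.desiredBytes += desired
	m.overheadBytes += overhead
}

// wasteRatio is the fraction of egressed bytes that were discarded;
// a rising value suggests the coalescing gap is tuned too wide.
func (m *egressMetrics) wasteRatio() float64 {
	total := m.desiredBytes + m.overheadBytes
	if total == 0 {
		return 0
	}
	return float64(m.overheadBytes) / float64(total)
}

func main() {
	var m egressMetrics
	m.record(200, 40) // one coalesced range: 200 block bytes + 40 gap bytes
	m.record(100, 0)  // an exact single-block fetch
	fmt.Printf("%.2f\n", m.wasteRatio()) // 40/340 ≈ 0.12
}
```

Since egress is charged to the customer, this ratio is arguably the metric that most directly justifies (or indicts) a given gap tuning.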

## Current proposal

* Range-Coalescing as implemented in Freeway.
* All of the connection management ideas above.