rfc: Guppy Retrieval Strategy #77
# Guppy Retrieval Strategy

## Multi-block requests

When we retrieve data, how should we make requests to the storage nodes? We want to minimize the overhead of many requests, but also minimize the overhead of over-fetching data. We have a few options:

* **Naive:** On every request for a block, look up its location, and fetch exactly that block.
  * Each block is retrieved in a separate request.

* **Shard-Optimistic:** When the first block from a shard is requested, fetch the entire shard blob and cache it.
  * Only one request is made per involved shard, but we may fetch (and egress) more data than required.
  * We also need to hold onto the cached shards until we're done using them, and those shards potentially take up much more space than the actual target data.

> **Member:** Due to the way DAGs are constructed, the root block is the last block in the file, so by requesting the entire shard, the time it takes to start streaming the data is much longer than with the Naive or Range-Coalescing strategy. From a CLI tool that maybe doesn't matter all that much, but in a gateway situation, e.g. retrieving a video, it affects your TTFB.

> **Member:** From a gateway perspective, as soon as you start getting requests for files in a larger DAG this approach doesn't really hold up:

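To make the trade-off concrete, here is a minimal sketch of the Shard-Optimistic mechanic. All names here (`ShardOptimisticStore`, `fetch_shard`, the index shape) are hypothetical illustrations, not the actual Guppy or Freeway interfaces.

```python
# Sketch of a Shard-Optimistic block store (hypothetical names throughout).
# fetch_shard stands in for an HTTP GET of an entire shard blob.

class ShardOptimisticStore:
    def __init__(self, fetch_shard, index):
        self.fetch_shard = fetch_shard  # shard_cid -> bytes (whole blob)
        self.index = index              # block_cid -> (shard_cid, offset, length)
        self.cache = {}                 # shard_cid -> cached shard bytes

    def get_block(self, block_cid):
        shard_cid, offset, length = self.index[block_cid]
        if shard_cid not in self.cache:
            # First block from this shard: fetch and cache the whole blob,
            # even though most of it may never be needed (over-fetch risk).
            self.cache[shard_cid] = self.fetch_shard(shard_cid)
        return self.cache[shard_cid][offset:offset + length]
```

Note that the cache holds whole shard blobs until explicitly released, which is exactly the memory concern raised above.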
* **Range-Coalescing:** When a block is requested, place it in a queue. Periodically, for each shard with blocks in the queue, coalesce the ranges of those blocks, and then request each range. The shards are (currently) CAR files, so adjacent blocks are not literally adjacent (there's a CID and length between them); we therefore count blocks that are "close enough" as adjacent.
  * If the requested data involves many contiguous blocks, this makes far fewer requests than the Naive approach, but not as few as the Shard-Optimistic approach.
  * Like the Naive approach, it retrieves the minimal set of blocks, though unlike the Naive approach it must egress the CID and length data between blocks, which is then ignored.

> **Member:** This is exactly why Filepack exists :)

  * Startup is slow, because only the root block can be fetched on the first request. Efficiency on further rounds is best on wide DAGs and worst on deep DAGs.

> **Member:** I'm not certain this is correct, and I don't know if startup is the right word here. With Shard-Optimistic you'd certainly be making fewer requests. However, depending on the shard size, you can potentially get to streaming the data faster using Range-Coalescing, since you don't have to download an entire shard before exporting the data from the DAG (per the aforementioned root-as-last-block problem). #66 would resolve the deep DAGs problem.

  * This approach can also be tuned to coalesce across larger gaps, incurring some of the benefits and costs of the Shard-Optimistic approach when blocks are near each other but not directly adjacent.
  * There is also a multipart version of this approach, which can make a single request for multiple ranges at once, but server support for this is spotty, and notably lacking in Go. It also incurs overhead in the response, which may negate any benefits.
  * Currently, Freeway implements a Range-Coalescing approach, so we have some evidence it works decently.

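The coalescing step itself can be sketched as a small pure function. This is illustrative only, assuming queued blocks arrive as `(offset, length)` pairs within a single shard; it is not the Freeway implementation.

```python
# Sketch of range coalescing over one shard's queued blocks.
# Ranges whose gap is at most max_gap are merged, absorbing the
# CID + length bytes between "close enough" blocks.

def coalesce_ranges(blocks, max_gap=128):
    """blocks: iterable of (offset, length). Returns merged (start, end) ranges."""
    ranges = sorted((off, off + ln) for off, ln in blocks)
    merged = []
    for start, end in ranges:
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it rather than
            # issuing a separate request.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Tuning `max_gap` upward is the knob described above for trading extra egress against fewer requests.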
* **Chunk-Optimistic:** The same as Shard-Optimistic, but divides shards into smaller chunks. When the first block is requested, a range of nearby blocks is fetched along with it.

> **Member:** Feels like an optimization rather than a strategy.

  * Like Range-Coalescing, this strikes a balance between Naive and Shard-Optimistic.
  * Unlike Range-Coalescing, startup can include multiple blocks.
  * Unlike the other approaches, the ranges may not fall on block borders, because block borders are unknown until we look up the block in the index. That may make cached data difficult to manage.

> **Member:** The only way you know a block is in a shard is because you have an index, so you should know the block boundaries, no?

### Thoughts

* For large data, the startup cost of Range-Coalescing is much less significant. Large data is also (warning: speculation) more likely to be wide than deep. The only way to make an especially deep UnixFS DAG would be to start with a very deep directory tree.

> **Member:** Ya, I think there's no way to know, and no one size fits all. If you find the CID requested is the root CID of the DAG the index describes i.e.

* Range-Coalescing takes some effort, but it's a strategy we've implemented before, in JS, and it seems to provide a good balance.

* Chunk-Optimistic is probably not very useful.

* Metrics are probably the key to tuning.

* Any time we egress data that's ultimately discarded, we should have a pretty strong argument for why, since egress is charged to the customer.

> **Member:** ➕

## Managing HTTP requests

Things we can do:

* Open a connection to each available storage node and pool them.
* Round-robin requests.
* Make sure nodes support HTTP/2 and use its multiplexing. *(I think we already do?)*

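The first two ideas can be sketched together as a round-robin pool over per-node connections. The `NodePool` name and the string node identifiers are hypothetical; a real version would hold persistent HTTP/2 sessions rather than strings.

```python
# Sketch of a round-robin pool over storage nodes (hypothetical interface).
import itertools

class NodePool:
    def __init__(self, nodes):
        # One entry per available storage node; in practice each entry
        # would be an open, reusable connection to that node.
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        # Round-robin: each call hands back the next node in turn,
        # spreading requests evenly across the pool.
        return next(self._cycle)

pool = NodePool(["node-a", "node-b", "node-c"])
```

With HTTP/2 multiplexing, many in-flight requests can share each pooled connection, so round-robin here is about balancing load, not about one-request-per-connection.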
## Possible Metrics

Metrics that might indicate whether our strategy is serving us, and if not, what to address:

* Egressed bytes desired vs. overhead
* Number of fetches, and bytes per fetch
* Total retrieval byte rate
* Retrieval byte rate per connection
* Inactive time per connection
* Request overhead (if we can measure this)

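The first metric above (desired bytes vs. overhead) reduces to a simple ratio. This is a sketch with assumed names, not an existing metrics API.

```python
# Sketch: of the bytes we paid to egress, what fraction was discarded?
def egress_overhead(desired_bytes, egressed_bytes):
    """Fraction of egressed bytes that were fetched but not needed."""
    if egressed_bytes == 0:
        return 0.0
    return (egressed_bytes - desired_bytes) / egressed_bytes
```

Tracking this per strategy (and per `max_gap` setting for Range-Coalescing) would give a direct handle on the "egress charged to the customer" concern above.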
## Current proposal

* Range-Coalescing as implemented in Freeway.
* All of the connection management ideas above.

> **Member:** Maybe worth mentioning that if you don't know you're going to use all the data in the shard, then this is not friendly to the user who has stored the data, i.e. you have made them pay for egress of data you're not using.