rfc: space-diff table refactoring by BravoNatalie · Pull Request #78 · storacha/RFC

BravoNatalie · 2026-01-13T14:40:45Z

This RFC proposes a refactor of the space-diff table to eliminate space usage calculation timeouts, prevent duplicate diffs, and improve the efficiency of billing usage calculations.


travis

looks great! I have a couple suggestions but generally very excited about this change!!

travis · 2026-01-13T20:08:40Z

rfc/refactor-space-diff-table.md

+
+**Proposed solution**
+
+* Add a **GSI with a timestamp-based sort key**


travis · 2026-01-13T20:09:53Z

rfc/refactor-space-diff-table.md

+   * Correct PK design
+   * `cause` as SK
+   * GSI for timestamp-based queries
+2. Export data from the existing table


I think we might even be able to skip importing the old data. We can spin this system up alongside the old one, run both in parallel until February, then cut over to the new table for usage reporting and billing and leave the old diffs in the old table.

travis · 2026-01-13T20:11:59Z

rfc/refactor-space-diff-table.md

+
+**Considerations**
+
+- The accumulator MUST process diffs for a space in **ascending** `receiptAt` order. If the write path can deliver out-of-order events and strict ordering cannot be guaranteed, this solution SHOULD be revisited. Pragmatic mitigations include:


is this strictly necessary? I think we could just choose the later of "current accumulator date" or "new diff receiptAt" - this will ensure we always have the "latest" date for a particular accumulator, even in the case where we process diffs out of order

I reviewed this again later and don’t think it’s a problem, especially since receiptAt is generated server-side when we add the diff entry to the table.
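The "later of the two dates" rule suggested above can be sketched as follows. This is a minimal illustration, not the RFC's implementation: the `Accumulator` shape and the `apply_diff` helper are hypothetical names, and it assumes the accumulator tracks only a running size and a last-updated date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Accumulator:
    """Hypothetical per-space running total (not a real table schema)."""
    size: int
    updated_at: datetime


def apply_diff(acc: Accumulator, delta: int, receipt_at: datetime) -> Accumulator:
    # Addition is commutative and max() keeps the later date,
    # so the result does not depend on the order diffs arrive in.
    return Accumulator(acc.size + delta, max(acc.updated_at, receipt_at))


start = Accumulator(0, datetime(2026, 1, 1, tzinfo=timezone.utc))
early = (100, datetime(2026, 1, 2, tzinfo=timezone.utc))
late = (-30, datetime(2026, 1, 5, tzinfo=timezone.utc))

in_order = apply_diff(apply_diff(start, *early), *late)
out_of_order = apply_diff(apply_diff(start, *late), *early)
```

Both orders end at the same size and date. This holds only for state that is a sum or a maximum; anything that depends on the size at a specific instant would still need ordering.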

travis · 2026-01-13T20:13:30Z

rfc/refactor-space-diff-table.md

+* Keep `space-diff` entries for N months using TTL
+* Archive older diffs to S3 (TBD)
+
+**Considerations**


My only major concern here is the potential race condition if two diffs for the same space read the current total at the same time. We could mitigate this by having a single queue reader process the diffs. It should go pretty fast, and uploads aren't THAT high frequency, so a single queue reader that processes diffs and updates the accumulator should be plenty. I think this would solve race conditions pretty conclusively.

> my only major concern here is the potential race condition if two diffs for the same space read the current total at the same time

For incrementing? You'd use an update command with ADD which would consistently increment.

I am unsure how this works for the transition from one month to the next though...
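For reference, the `ADD` update mentioned above might look like the request below. It is a sketch only: the table name `space-usage`, the key attribute `pk`, and the `size` attribute are placeholders, not names taken from the RFC or w3infra. The dictionary follows the shape of a DynamoDB `UpdateItem` request (as accepted by, for example, boto3's low-level client `update_item`); nothing is sent to AWS here.

```python
def build_increment_request(table_name: str, pk: str, delta: int) -> dict:
    """Build UpdateItem parameters that atomically add `delta` to a number."""
    return {
        "TableName": table_name,
        "Key": {"pk": {"S": pk}},
        # ADD is applied server-side, so concurrent writers do not need to
        # read the current total first. A missing attribute is treated as 0.
        "UpdateExpression": "ADD #size :delta",
        # `size` is a DynamoDB reserved word, hence the #size alias.
        "ExpressionAttributeNames": {"#size": "size"},
        # DynamoDB numbers are sent as strings; negative deltas are allowed.
        "ExpressionAttributeValues": {":delta": {"N": str(delta)}},
    }


request = build_increment_request("space-usage", "did:key:example-space", -2048)
```

Because the increment is expressed as a delta instead of a read-modify-write, two concurrent updates to the same item are both applied. It does not by itself answer the month-transition question raised above.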

hannahhoward

Left one comment with some thoughts and things to consider

hannahhoward · 2026-01-15T20:47:58Z

rfc/refactor-space-diff-table.md

- Alternative when strict ordering is infeasible:
-  - Use time-bucketed diffs (hour/day): persist per-bucket, order-independent aggregates (e.g., Σdelta and Σ(delta × (bucketEnd − receiptAt))). At billing time, iterate buckets in chronological order to compute exact monthly usage, where no event sorting required.
-  - Maintain a size-only monthly state (track `lastSize` and `lastChangeAt`) to accelerate space usage report. Note: this does NOT remove the need to iterate diffs for the billing run.
+To avoid potential race conditions when two diffs for the same space read the current total at the same time, one option is to process diffs through a queue. This would also help preserve the correct ordering of diffs.
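The bucketed alternative in the removed lines above depends on the two per-bucket sums being order-independent. A small sketch of that arithmetic follows, with hypothetical names (`Bucket`, `usage_over`) and times expressed as plain seconds for brevity; it is an illustration of the idea, not code from the RFC.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Order-independent aggregates for one time bucket [start, end)."""
    start: int
    end: int
    sum_delta: int = 0      # sum of delta
    sum_weighted: int = 0   # sum of delta * (end - receipt_at)

    def add(self, delta: int, receipt_at: int) -> None:
        if not self.start <= receipt_at < self.end:
            raise ValueError("receipt_at falls outside this bucket")
        self.sum_delta += delta
        self.sum_weighted += delta * (self.end - receipt_at)


def usage_over(buckets: list[Bucket], initial_size: int) -> int:
    """Byte-seconds used across buckets, visited in chronological order."""
    size, total = initial_size, 0
    for b in sorted(buckets, key=lambda b: b.start):
        # Size carried into the bucket counts for the whole bucket;
        # each diff counts only from its receipt time to the bucket end.
        total += size * (b.end - b.start) + b.sum_weighted
        size += b.sum_delta
    return total


day1, day2 = Bucket(0, 86_400), Bucket(86_400, 172_800)
# Diffs are added out of order on purpose.
day2.add(-40, 100_000)
day1.add(100, 43_200)
day1.add(50, 3_600)
usage = usage_over([day2, day1], initial_size=10)
```

Only the buckets are sorted, never the individual events, and the result matches a direct integration of size over time for the same diffs.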


@alanshaw can confirm and you should chat briefly with him about this:

my understanding is:

The existing diffs are written in batches of 25 from a Kinesis stream with enhanced fan-out. Enhanced fan-out means up to 20 invocations, each with a batch of 25, can run in parallel against the Lambda that writes space diffs.

While you could apply a FIFO SQS queue as a post-processing step to maintain this running total, note that part of the reason for the heavy batching and parallelism on the Kinesis stream, at least to my understanding, is to make sure the Lambda doesn't fall behind.

I think it may make sense to pursue a kind of daily auto-compaction instead, so that you process in aggregate once a day. Lambda invocations are non-zero cost per invocation.

Second, please make sure usage/report and account-usage/report keep the ability to pass a time range. This is necessary for Forge.

My understanding is that the scenario you’re describing only applies to store/add and store/remove receipts. Blob protocol receipts are written through a different path, via the Blob Registry register and deregister flows.
Since we’re planning to deprecate the Store Protocol, this shouldn’t be an issue. Can you confirm, @alanshaw ?

Yes in the blob protocol the blob registry writes to the space diff table - the ucan stream is not used for this any more. My original idea was to remove it post store protocol removal.

The space snapshot table consolidates space diffs, could you use the existing code/infra (with small modification) to calculate a snapshot on a daily/weekly basis for spaces so that when you want to know current usage you don't have to look so far back?

alanshaw · 2026-01-16T17:29:33Z

rfc/refactor-space-diff-table.md

+
+### Fix for problem 2: Usage calculation timeouts
+
+Introduce a new table (e.g. `space-usage-month`) keyed by `provider#space#YYYY-MM` that is updated atomically on each diff write, making billing reads **O(1)**.


This is basically the space metrics table - could you use that instead?

So yes, we could combine the space-metrics and snapshots with more frequent runs. However, we would still need to query the full list of customers and spaces every time to iterate through the diffs. This would reduce the time spent calculating usage and billing, but the proposed new table would consolidate currently dispersed data into a single place (snapshots, usage, and size) while also providing near real-time visibility into each space. This feels like a better approach for scalability to me, but I'd like to know if you have a different thought since you have a better view of the system. @travis might also have good input here.

@alanshaw I have a question about space-metrics: is it still actively used? I only see writes to it and no reads. Does it still make sense to keep it around?

Can you speak to "updated atomically on each diff write" - how is this applied? Is it lambda triggered on an insert event to the space diff table?

How do we transition from one month to the next, retaining the current total, so that concurrent updates succeed and are all applied as expected?
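To make the month-transition question concrete, here is a small in-memory model. It assumes the `provider#space#YYYY-MM` key from the RFC text and the documented DynamoDB behaviour that `ADD` treats a missing numeric attribute as 0. The helpers `month_key` and `add_delta`, and the sample provider and space identifiers, are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_key(provider: str, space: str, receipt_at: datetime) -> str:
    """Partition key in the `provider#space#YYYY-MM` form quoted above."""
    return f"{provider}#{space}#{receipt_at.strftime('%Y-%m')}"


# Stand-in for the table: a missing item behaves like 0, as with ADD.
table = defaultdict(int)


def add_delta(provider: str, space: str, delta: int, receipt_at: datetime) -> str:
    key = month_key(provider, space, receipt_at)
    table[key] += delta
    return key


jan = add_delta("did:web:provider.example", "did:key:space", 500,
                datetime(2026, 1, 31, 23, 59, tzinfo=timezone.utc))
feb = add_delta("did:web:provider.example", "did:key:space", 20,
                datetime(2026, 2, 1, 0, 1, tzinfo=timezone.utc))
```

In this model the February item holds 20, the net change for the month, not the running total of 520. Carrying the January total forward would need an extra step, which is the open question in the comment above.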

> but the proposed new table would consolidate currently dispersed data into a single place (snapshots, usage, and size)

Yes, but at the expense of another dynamo table (storage, writes) and lambda invocation per space diff insert. It's just more infra to run, maintain and (potentially) go wrong, whereas upping the cron frequency for calculating snapshots basically gets you everything you need...unless I'm missing something?

> while also providing near real-time visibility into each space

We already have this though...right? Latest snapshot + diffs since is the real time.

> Latest snapshot + diffs since is the real time.

yea I think the issue here is that not all spaces can calculate this in a reasonable amount of time - if there are 200k diffs to add to a space snapshot it generally just doesn't work
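The "latest snapshot + diffs since" calculation discussed in this thread amounts to the following. The record shapes and the `current_size` helper are assumptions for illustration, not the actual snapshot code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    size: int
    recorded_at: datetime


@dataclass(frozen=True)
class Diff:
    delta: int
    receipt_at: datetime


def current_size(snapshot: Snapshot, diffs: list[Diff]) -> tuple[int, int]:
    """Return (size now, number of diffs that had to be read)."""
    # Assumes a snapshot excludes diffs received at exactly its own timestamp.
    since = [d for d in diffs if d.receipt_at >= snapshot.recorded_at]
    return snapshot.size + sum(d.delta for d in since), len(since)


def utc(day: int) -> datetime:
    return datetime(2026, 1, day, tzinfo=timezone.utc)


diffs = [Diff(10, utc(day)) for day in range(1, 31)]  # one diff per day
monthly = current_size(Snapshot(0, utc(1)), diffs)
daily = current_size(Snapshot(280, utc(29)), diffs)
```

Both calls return a size of 300, but the recent snapshot reads 2 diffs instead of 30. More frequent snapshots shrink the "diffs since" set in the same way, which is what matters for spaces with very large diff counts.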

alanshaw · 2026-01-16T17:40:47Z

rfc/refactor-space-diff-table.md

- Alternative when strict ordering is infeasible:
-  - Use time-bucketed diffs (hour/day): persist per-bucket, order-independent aggregates (e.g., Σdelta and Σ(delta × (bucketEnd − receiptAt))). At billing time, iterate buckets in chronological order to compute exact monthly usage, where no event sorting required.
-  - Maintain a size-only monthly state (track `lastSize` and `lastChangeAt`) to accelerate space usage report. Note: this does NOT remove the need to iterate diffs for the billing run.
+To avoid potential race conditions when two diffs for the same space read the current total at the same time, one option is to process diffs through a queue. This would also help preserve the correct ordering of diffs.


Yes in the blob protocol the blob registry writes to the space diff table - the ucan stream is not used for this any more. My original idea was to remove it post store protocol removal.

The space snapshot table consolidates space diffs, could you use the existing code/infra (with small modification) to calculate a snapshot on a daily/weekly basis for spaces so that when you want to know current usage you don't have to look so far back?


Following storacha/RFC#78 (related issue: storacha/project-tracking#307), this PR implements the short-term solution: adding a **GSI on `cause`** to the existing `space-diff` table. It then uses that GSI to check whether a diff with the same `cause` already exists before writing a new one. If it does, the write is skipped. This approach:

* Helps prevent new duplicates without changing the table schema
* Is quick to roll out with minimal risk
* Doesn't require a migration or any dual-write setup

**Limitation:** this is a best-effort guard. It adds a read-before-write step and doesn't fully guarantee uniqueness (there's still a small chance of a race condition).
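The limitation noted above can be seen in a small in-memory model of the read-before-write guard. Everything here is hypothetical (`DiffTable`, `exists_by_cause`, `put`, `write_if_new`); it stands in for a GSI query on `cause` followed by a write and does not reflect the actual w3infra code.

```python
class DiffTable:
    """Stand-in for the space-diff table with a lookup by `cause`."""

    def __init__(self) -> None:
        self.items: list[dict] = []

    def exists_by_cause(self, cause: str) -> bool:
        # Models a query against the GSI on `cause`.
        return any(item["cause"] == cause for item in self.items)

    def put(self, item: dict) -> None:
        self.items.append(item)


def write_if_new(table: DiffTable, item: dict) -> bool:
    """Best-effort guard: skip the write when the cause is already present."""
    if table.exists_by_cause(item["cause"]):
        return False
    table.put(item)
    return True


diff = {"cause": "bafy-example-receipt", "delta": 1024}

# Sequential retries are deduplicated.
sequential = DiffTable()
results = [write_if_new(sequential, diff), write_if_new(sequential, diff)]

# Two writers that both check before either writes still produce a duplicate.
racy = DiffTable()
a_sees = racy.exists_by_cause(diff["cause"])
b_sees = racy.exists_by_cause(diff["cause"])
if not a_sees:
    racy.put(diff)
if not b_sees:
    racy.put(diff)
```

The sequential case stores one item; the interleaved case stores two. With `cause` as part of the primary key, as in the longer-term table design, a second write with the same key would land on the same item instead of creating a new one.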

Following [RFC#78](storacha/RFC#78), this PR updates the billing process to run daily. This enables more frequent snapshot generation, which should help reduce usage report timeouts and allow more regular reporting to Stripe.

rfc: initial proposal for refactoring the space-diff table

29db332

BravoNatalie force-pushed the rfc/refactor-space-diff branch from 786450f to 29db332 Compare January 13, 2026 14:51

BravoNatalie requested a review from travis January 13, 2026 15:47

travis approved these changes Jan 13, 2026


fix: add adjustments based on the comments

d988b3c

BravoNatalie mentioned this pull request Jan 14, 2026

Redesign space diff table storacha/project-tracking#307

Closed

BravoNatalie requested a review from a team January 14, 2026 18:42

BravoNatalie mentioned this pull request Jan 15, 2026

feat: add new space-diff-v2 table storacha/w3infra#592

Closed

hannahhoward reviewed Jan 15, 2026


alanshaw requested changes Jan 16, 2026


fix: adjust usage calculation timeout solution

dae4e99

BravoNatalie mentioned this pull request Jan 22, 2026

feat: update billing to run daily storacha/w3infra#601

Merged

fix: remove usage timeout discussion and clarify different timeframe …

36c2c21

…solutions for space-diff duplication

BravoNatalie mentioned this pull request Feb 10, 2026

feat: add space diff table new gsi for cause storacha/w3infra#609

Merged

BravoNatalie requested review from alanshaw and hannahhoward February 10, 2026 16:44


