rfc: billing dashboard implementation strategy #70

Open
volmedo wants to merge 4 commits into main from vic/rfc/billing-dashboard
@volmedo volmedo commented Oct 13, 2025

📚 Preview

A quick RFC to learn what others think about the implementation of the billing dashboard for node operators in the warm storage network.

@volmedo volmedo self-assigned this Oct 13, 2025
@alanshaw alanshaw left a comment

I think my main concern is that we'd be setting up another centralized service for this purpose when actually all this information is local to the node, provided they haven't cleared their DB. Would another alternative be to add a CLI or web app that just reads and displays the local data - no complicated auth needed, no central billing oracle...

This initial implementation focuses on:
- **Single metric**: Egressed bytes only (no other billing dimensions)
- **Fixed periods**: Previous month, current month, current week, current day. No ability to configure periods or thresholds (future enhancement)
- **Node-level stats**: Total egress for the node (not broken down by space for now)
Member

Space shouldn't really be a concern for node operators.


### Alternative 3: Hybrid Approach

Build a UCAN-based API that can be consumed by both CLI and web interfaces.
Member

Of the 3 alternatives this is my favourite.

You could set up piri to serve the web app, dynamically injecting a delegation into the UI for auth.

Member

Interesting idea for the operator case, but since the dashboard serves both clients and operators (per the RFC), and clients don't run piri, we'd end up maintaining two different auth paths for the same dashboard. I'd rather have a single flow that works universally.

volmedo commented Oct 13, 2025

> I think my main concern is that we'd be setting up another centralized service for this purpose when actually all this information is local to the node, provided they haven't cleared their DB. Would another alternative be to add a CLI or web app that just reads and displays the local data - no complicated auth needed, no central billing oracle...

I agree with this being another centralized service. But I'm not sure the information to present here is local to the node. What we want to show in the dashboard is not what the node already knows, but the view from the billing service's perspective, which is ultimately what will be considered to decide how much the node will be paid for egress. As long as there is a centralized entity (the billing service) calculating payments, that is the source of truth for that data. Until we have a blockchain to track this kind of stuff 🙂.

Nodes could keep an estimate, though, and that might be enough? Looks like that is more a product question.

@alanshaw

...but, what does the billing service know that the storage node doesn't? ...other than the lag due to fetching receipts?

volmedo commented Oct 13, 2025

Some receipts may fail validation, anti-fraud measures could be implemented that reduce claimed egress, and USD/byte could change (which suggests that maybe we should be returning currency in the stats rather than bytes...).

I have always thought of the billing service as the place where we confirm everything is in order before compensating node operators, rather than just collecting data.
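
To illustrate why a changing USD/byte rate argues for returning currency, here is a minimal sketch that prices each day's validated egress at the rate in effect on that day. The rate table, its values, the per-GiB unit, and the function names are all assumptions made up for the example; nothing here reflects actual Storacha pricing or billing service code.

```python
from datetime import date
from decimal import Decimal

GIB = 1024 ** 3

# Hypothetical rate table: (effective_from, USD per GiB). Values are invented.
RATES = [
    (date(2025, 1, 1), Decimal("0.010")),
    (date(2025, 10, 15), Decimal("0.008")),
]


def rate_on(day, rates=RATES):
    """Return the most recent rate whose effective date is on or before `day`."""
    applicable = [rate for start, rate in sorted(rates) if start <= day]
    if not applicable:
        raise ValueError(f"no rate defined for {day}")
    return applicable[-1]


def payout_usd(daily_valid_bytes, rates=RATES):
    """Sum per-day payouts, pricing each day at the rate in effect that day."""
    total = Decimal(0)
    for day, nbytes in daily_valid_bytes.items():
        total += Decimal(nbytes) / GIB * rate_on(day, rates)
    return total
```

With a mid-month rate change, the same number of bytes yields different amounts on either side of the change, so a node-side byte count alone cannot reproduce the figure the billing service would pay out.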

frrist commented Oct 13, 2025

I want to step back for a second. The RFC title says "billing dashboard for node operators" but I don't think the original issue was that specific about who this is for (or maybe I missed something here?). From my view, there are actually at least two user groups here:

  1. Node operators - want to understand their egress and earnings
  2. Network operators (Storacha) - need to calculate payments and monitor the network

Given our deadlines, I'd like to avoid building two different dashboards. The underlying data is (mostly) the same(?), we just need different views over it.

What (I think) node operators care about:

  1. How much data did I egress in the context of retrieval
  2. How much of that was valid (and why did some fail validation)
  3. What am I getting paid

What Storacha cares about:

  1. How much data was served, requiring payment, by each node in a given time frame (e.g. a month)
  2. Patterns in validation failures (potential fraud or misconfiguration)
  3. Actually being able to run payment calculations

Note: We will need to calculate payment owed to nodes for egress anyway, and I'd like to re-use those calculations for what we show node operators.

I'd like to see Storacha's/our needs drive the initial design. We're the ones who need to actually calculate and issue payments. If we build something that only lets node operators see charts locally, but doesn't help us answer "how much do we owe each node this month?", I am worried we are building the wrong thing.

I believe the real value is aggregation and trends:

  • For Storacha: Roll up validated egress across all nodes to calculate monthly payments, identify validation failure patterns, monitor network health, etc. In future versions, I suspect we'll need to pull in data from clients (guppy) to measure actual success and build a reputation score for nodes, since the presence of an invocation and receipt doesn't mean the data was actually served.
  • For node operators: See their validated egress by day/week/month, understand validation failure trends over time, track payment status, yadda yadda yadda.

I think this ends up as a data modeling problem, rather than an auth problem. Before we can build any interface (CLI, web, local display), we need to nail down:

  1. How do we aggregate consolidation results over time? We have individual ConsolidatedRecord entries, but we need monthly totals for payment runs. I assume this is a simple query over the DynamoDB table(s)?

  2. How do we store validation failures in a queryable way? The errors are in the consolidate receipts, but can we query "show me all validation failures for node X last month"? I suspect this may require a separate view/table of the invocations and receipts in the records, rather than just the links we keep currently.

  3. What's the payment state tracking? We need to mark which consolidated records have been included in payment runs so we can show "paid" vs "pending" vs "in current period". Again, this will probably need a separate table/view.

  4. Do we need additional fields? Maybe we want to track both claimed egress (bytes in submitted receipts) and valid egress (after validation) for fraud detection. The difference is probably, per node, the sum of the size field across every EgressRecord's receipts/invocations minus the consolidation TotalBytes field.
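
As a rough sketch of what questions 1, 3 and 4 might boil down to, the snippet below rolls consolidated records up into per-node monthly totals, tracking claimed, valid and already-paid bytes. The record shape is an assumption: `ConsolidatedRecord` and `TotalBytes` are named in this thread, but `node_id`, `consolidated_at`, `claimed_bytes` and `payment_run` are hypothetical fields invented for the example, and the real data lives in DynamoDB rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ConsolidatedRecord:
    # Hypothetical shape; not the actual etracker schema.
    node_id: str
    consolidated_at: datetime          # timezone-aware timestamp
    claimed_bytes: int                 # sum of sizes in the submitted receipts
    total_bytes: int                   # bytes that passed validation (TotalBytes)
    payment_run: Optional[str] = None  # set once included in a payment run


@dataclass
class MonthlyTotal:
    claimed_bytes: int = 0
    valid_bytes: int = 0
    paid_bytes: int = 0

    @property
    def rejected_bytes(self) -> int:
        """Claimed egress that did not survive validation."""
        return self.claimed_bytes - self.valid_bytes

    @property
    def pending_bytes(self) -> int:
        """Valid egress not yet included in any payment run."""
        return self.valid_bytes - self.paid_bytes


def monthly_totals(records):
    """Group records by (node_id, 'YYYY-MM' in UTC) and sum the byte counters."""
    totals = defaultdict(MonthlyTotal)
    for rec in records:
        month = rec.consolidated_at.astimezone(timezone.utc).strftime("%Y-%m")
        total = totals[(rec.node_id, month)]
        total.claimed_bytes += rec.claimed_bytes
        total.valid_bytes += rec.total_bytes
        if rec.payment_run is not None:
            total.paid_bytes += rec.total_bytes
    return dict(totals)
```

The Storacha view would read every key of the result, while a node operator view would filter the same result down to a single `node_id`.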

The most resilient source of truth for this data is the database maintained by the etracker. IMO the fastest way to get something running that satisfies these points would be to get a Grafana (or similar) dashboard running against our DynamoDB tables. Then we can:

  • Build the aggregation queries we need for payment calculations
  • Create views that answer "how much do we owe each node this month?" (Storacha's view)
  • Create filtered views showing per-node stats (node operator's view)
  • Surface validation failures so we can track patterns
  • Run it in Kiosk mode initially (auth can come later, if at all)
  • Learn what data we're missing before we architect a bigger solution

Once we know what queries we need and what data gaps exist, building a custom solution becomes straightforward - we're just re-implementing the queries we already wrote. And importantly, we can serve both user groups with the same underlying queries, just different filters and aggregations. (If that doesn't feel like enough, then a simple CLI on Piri over the egress tracker state could be added)

TL;DR, I'd like us to figure out what data we need to make payment/business decisions first, then the question of "how do node operators access their slice of this data" becomes much simpler to answer.

frrist commented Oct 13, 2025

ooo and as @heyjay44 just mentioned, clients of the network will also likely be a separate user group to consider.


## Appendix A: Implementation Alternatives

### Alternative 1: Web Application
Member

Of the options presented I strongly prefer this one, though with a few modifications.

We can reduce the complexity of the authentication flow here significantly by adopting patterns that already exist. All entities in the Storacha ecosystem have DIDs, let's use them. I see at least two options:

  1. Extend/use the existing Storacha login flow — email verification (or OAuth, but probably later) that authenticates to a did:mailto. This is already familiar to users and the infrastructure exists. I'd expect this to be used by Forge Clients.
  2. Direct DID key challenge-response. Client signs a nonce with their private key, site verifies against the claimed DID. No passwords, sessions, or separate node ownership verification. Standard wallet-auth pattern. Expect this to be used by Piri Node operators.

Prior art for (2): https://eips.ethereum.org/EIPS/eip-4361
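
A minimal sketch of the server-side bookkeeping option (2) would need: issue a random nonce per DID, accept it at most once, and reject it after a timeout. Signature verification against the DID is deliberately left as an injected callable, since it depends on the key type and DID method; `ChallengeStore`, `issue`, `redeem` and the five-minute TTL are hypothetical names and values chosen for illustration, and a real flow would likely sign a structured message (domain, timestamp) rather than the bare nonce.

```python
import secrets
import time
from typing import Callable, Dict, Tuple

# verify(did, message, signature) -> bool. Placeholder for real DID signature
# verification, which is out of scope for this sketch.
Verifier = Callable[[str, bytes, bytes], bool]


class ChallengeStore:
    def __init__(self, verify: Verifier, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._verify = verify
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: Dict[str, Tuple[bytes, float]] = {}

    def issue(self, did: str) -> bytes:
        """Create a fresh nonce for `did`, replacing any outstanding one."""
        nonce = secrets.token_bytes(32)
        self._pending[did] = (nonce, self._clock() + self._ttl)
        return nonce

    def redeem(self, did: str, signature: bytes) -> bool:
        """Check the signature over the outstanding nonce; each nonce is single-use."""
        entry = self._pending.pop(did, None)  # pop so a replay finds nothing
        if entry is None:
            return False
        nonce, expires_at = entry
        if self._clock() > expires_at:
            return False
        return self._verify(did, nonce, signature)
```

Because the nonce is removed on the first redemption attempt, a captured signature cannot be replayed, and a failed attempt forces the client to request a new challenge.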

Member Author

Yes, the idea is to use the same email login flow to build what is, essentially, the console equivalent for Forge. The main difference between the web application approach and the hybrid approach is the way the user authenticates to the app. I added some wording to clarify.

Member

So to clarify, is the immediate deliverable a web dashboard only, with the API designed such that CLI access could be added later? Or is the plan to ship both web and CLI interfaces now?

Asking since the RFC frames the distinction between Alt 1 and Alt 3 as "web only" vs "web + CLI" — but your comment describes auth mechanism, which seems orthogonal... We could build a UCAN-authenticated web dashboard without any CLI component.

If CLI is in scope, I'm open to a minimal presence — a single piri billing command that prints current period stats. But I'd want to explicitly scope this in the RFC: read-only, no interactive TUI, no date range queries, no subcommand sprawl. Anything beyond "show me my numbers" belongs in the web dashboard. Is that the intention here?

Member Author

TL;DR: yes, the immediate goal is providing a web interface only, so no tasks for you related to this feature.

I'm sorry it's confusing. I originally created the RFC to implement the operator dashboard, then ended up implementing the admin dashboard, and now re-used it to define the customer dashboard. The assumption I was running on when I wrote about the different alternatives was that operators would likely prefer a CLI-based interface.

As you mention, the UCAN-based API will be there if we want to build a CLI in the future. For this specific endeavour, since we are talking about the customer dashboard, I think it would make more sense for such a CLI to live in guppy rather than piri.

Member

I believe having the UCAN-based API will prove useful for both Piri and Guppy 😁
