
rfc: mutability & encryption for forge#84

Open
fforbeck wants to merge 1 commit into main from rfc/privacy-mutability-forge


@fforbeck commented Mar 4, 2026

@fforbeck requested a review from a team March 4, 2026 16:47
@fforbeck self-assigned this Mar 4, 2026

@hannahhoward left a comment

Overall, this is good, but some critical pieces are missing. I think they will emerge from working closely with @Peeja, @alanshaw, and the existing Go devs.

Specifically,

  1. "Service layer": is this a server? A secondary process on the dev machine? I don't think either of these is a good idea, and the server approach would break a number of design principles of our system (namely that all CIDs should be generated on the client). Personally, I think a language port is going to be WAY faster (especially with AI-aided dev) and produce a far less complex system, so I'd argue strongly for a full port. These aren't complex libraries, and I think we could have them ported in a week or two with AI help. Then we have a single process on a single machine, which is way simpler to maintain and reason about.

  2. There's some unspecified confusion about how Pail works in the Forge context, which I think you might need to embed with @Peeja on Guppy to really grok. Guppy has a notion of "sources", i.e. data sources (usually large, deep directories) that get uploaded within a space. Each space has 1..n sources, and when you upload within a space, after the first upload of a source, only the "delta" gets uploaded: Guppy knows how to upload just the blocks needed to make a new, updated UnixFS root. So with mutability:

  • You have the list of sources which get updated, and you DEFINITELY want that to be represented by Pail + UCN.
  • You have the directory tree structure within each source itself. This is currently UnixFS and is updated properly on each incremental upload.
  • So the real question is whether to use Pail for the whole directory tree, and I think that's a complicated question that merits further examination.

Reasons not to use Pail:

  1. These are extremely big, complicated directories, and Pail hasn't been tested at a scale even remotely close to working with them.
  2. The retrieval patterns and general usage for Pail are totally different from those for UnixFS, so the downstream implications of using Pail for the whole directory tree structure are unknown.

Reasons to use Pail:

  1. Much more fine-grained "multi-writer" capabilities are unlocked if you use Pail for everything. If you used Pail for just the sources list, you'd essentially have last-writer-wins at the per-source level: if source X is in state A, and two different Guppies make several changes to the directory tree structure, written as UnixFS, then the directory structure would by default get ONLY the changes of the last client to write. Note: we could apply a smarter merge outside of Pail, similar to the way I merge Markdown files in Clawracha. I actually believe this wouldn't be TOO hard.
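
To make the difference concrete, here is a minimal Go sketch of per-source last-writer-wins versus a three-way merge applied outside of Pail. Everything in it is an assumption for illustration: the `Tree` type, the function names, and the flat path-to-CID map are hypothetical and are not Guppy's, Pail's, or Clawracha's actual data model or merge logic. The conflict rule (keep the lexicographically smaller value, prefer a modification over a delete) is just one deterministic choice.

```go
package main

import "fmt"

// Tree is a hypothetical flattened directory: path -> content identifier.
// Real UnixFS trees are nested DAGs; a flat map keeps the sketch short.
type Tree map[string]string

// lastWriterWins models per-source resolution: whichever tree is written
// second replaces the first wholesale, discarding the first writer's changes.
func lastWriterWins(_, second Tree) Tree {
	return second
}

// mergeThreeWay keeps every change either writer made relative to base.
// If both writers changed the same path differently, it prefers a
// modification over a delete, and otherwise keeps the lexicographically
// smaller value so that every client computes the same result.
func mergeThreeWay(base, a, b Tree) Tree {
	paths := map[string]struct{}{}
	for _, t := range []Tree{base, a, b} {
		for p := range t {
			paths[p] = struct{}{}
		}
	}
	out := Tree{}
	for p := range paths {
		bv, inBase := base[p]
		av, inA := a[p]
		cv, inB := b[p]
		aChanged := inA != inBase || av != bv
		bChanged := inB != inBase || cv != bv
		switch {
		case aChanged && !bChanged:
			if inA {
				out[p] = av
			}
		case bChanged && !aChanged:
			if inB {
				out[p] = cv
			}
		case aChanged && bChanged:
			switch {
			case !inA && !inB: // both deleted the path
			case !inA:
				out[p] = cv
			case !inB:
				out[p] = av
			case av <= cv:
				out[p] = av
			default:
				out[p] = cv
			}
		default: // untouched by both writers
			if inBase {
				out[p] = bv
			}
		}
	}
	return out
}

func main() {
	base := Tree{"a.txt": "cid1", "b.txt": "cid2"}
	writerA := Tree{"a.txt": "cid1a", "b.txt": "cid2"}                 // edits a.txt
	writerB := Tree{"a.txt": "cid1", "b.txt": "cid2", "c.txt": "cid3"} // adds c.txt

	fmt.Println("last writer wins:", lastWriterWins(writerA, writerB))
	fmt.Println("three-way merge: ", mergeThreeWay(base, writerA, writerB))
}
```

In this toy run, last-writer-wins drops writer A's edit to `a.txt`, while the three-way merge keeps both the edit and writer B's new file.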

Final sidebar: current Guppy is also smart enough to upload only the diff blocks for files when they change. I believe encryption will kill that ability, unless there's some useful way to encode only changes that works for encrypted data. Worth a google.
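
A minimal Go sketch of why this happens, assuming blocks are encrypted with AES-256-GCM under a fresh random nonce (the RFC's actual cipher and key handling may differ). The `sealBlock` and `digest` helpers are hypothetical, and a hex SHA-256 digest stands in for a CID. The same unchanged plaintext block produces a different ciphertext on each upload, so content addressing can no longer tell that the block was already stored.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sealBlock encrypts one block with AES-GCM using a fresh random nonce
// and returns nonce || ciphertext. The key must be 16, 24, or 32 bytes.
func sealBlock(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to its first argument, so the nonce
	// ends up as a prefix of the returned slice.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// digest is a stand-in for a CID: the hex SHA-256 of the stored bytes.
func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	unchanged := []byte("a block that did not change between uploads")

	first, err := sealBlock(key, unchanged)
	if err != nil {
		panic(err)
	}
	second, err := sealBlock(key, unchanged)
	if err != nil {
		panic(err)
	}

	fmt.Println("plaintext digests match: ", digest(unchanged) == digest(unchanged))
	fmt.Println("ciphertext digests match:", digest(first) == digest(second))
}
```

Any scheme that keeps diff uploads working would need the ciphertext of an unchanged block to stay stable across uploads, which is a trade-off against the randomness used here.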

│ │ │
│ ▼ │
│ 9. Publish to UCN: Name.publish(pailRootCID) │
│ → mutable name now points to updated index │

This is "Pail without CRDT": in the case of multiple concurrent updates to the same name, the UCN resolution is to just use the first of the alphabetically sorted CIDs (IIRC). That means if two users start with the same Pail and both make an update, only one wins.

The Pail CRDT library allows both updates to be applied, resorting to alphabetically sorted CIDs only when the two updates have the same causal order and touch the same key.
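
A minimal Go sketch of the two behaviors, under stated assumptions: the `Update` type, its `EventCID` field, and both resolver functions are hypothetical and do not reflect the real UCN or Pail CRDT APIs. The tie-break by lexicographic CID order follows the description above, which is itself from memory.

```go
package main

import "fmt"

// Update is a hypothetical concurrent write against the same base Pail:
// an identifier for the update event plus the keys it puts.
type Update struct {
	EventCID string
	Puts     map[string]string
}

// apply returns a copy of base with the update's puts layered on top.
func apply(base map[string]string, u Update) map[string]string {
	out := make(map[string]string, len(base)+len(u.Puts))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range u.Puts {
		out[k] = v
	}
	return out
}

// order returns the update whose CID sorts first as the winner.
func order(a, b Update) (winner, loser Update) {
	if a.EventCID <= b.EventCID {
		return a, b
	}
	return b, a
}

// resolveSingleWinner models name resolution without a CRDT: only the
// update whose CID sorts first survives, and the other is dropped entirely.
func resolveSingleWinner(base map[string]string, a, b Update) map[string]string {
	winner, _ := order(a, b)
	return apply(base, winner)
}

// resolveMerged models the CRDT-style outcome for two concurrent updates:
// both are applied, and CID order only decides keys that both touched.
func resolveMerged(base map[string]string, a, b Update) map[string]string {
	winner, loser := order(a, b)
	// Apply the loser first so the winner overrides it on shared keys.
	return apply(apply(base, loser), winner)
}

func main() {
	base := map[string]string{"readme": "cid0"}
	a := Update{EventCID: "bafyA", Puts: map[string]string{"photos": "cidA1", "notes": "cidA2"}}
	b := Update{EventCID: "bafyB", Puts: map[string]string{"videos": "cidB1", "notes": "cidB2"}}

	fmt.Println("single winner:", resolveSingleWinner(base, a, b))
	fmt.Println("merged:       ", resolveMerged(base, a, b))
}
```

With a single winner, update `b`'s `videos` entry is lost even though nothing conflicted with it. With the merge, only the shared `notes` key needs the tie-break.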

│ - KMS info │
│ │ │
│ ▼ │
│ 5. Extract encrypted content from CAR using encryptedDataCID│

Why don't we encrypt each block?


3 participants