feat(internal): checkpoint population WASM #522
aristidesstaffieri wants to merge 32 commits into feature/data-migrations from
Conversation
Pull request overview
This PR adds WASM tracking during checkpoint population to support future data migrations that will match contract interfaces against known protocol specifications. The changes introduce a new protocol_wasms table, refactor checkpoint processing into a dedicated service, and extract a TokenProcessor interface to support the new architecture.
Changes:
- Add `protocol_wasms` table to track WASM hashes encountered during checkpoint population
- Create `WasmIngestionService` to process and persist WASM bytecode with optional protocol validation
- Refactor checkpoint population logic into `CheckpointService`, which orchestrates both token and WASM ingestion in a single pass
- Extract `TokenProcessor` interface from `TokenIngestionService` to separate checkpoint processing from live ingestion concerns
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/db/migrations/2026-02-20.0-protocol_wasms.sql | Adds protocol_wasms table with wasm_hash (PK), protocol_id, and created_at columns |
| internal/data/protocol_wasms.go | Implements ProtocolWasmModel with BatchInsert method using UNNEST for efficient inserts |
| internal/data/protocol_wasms_test.go | Comprehensive tests for BatchInsert including empty input, duplicates, and error cases |
| internal/data/mocks.go | Adds ProtocolWasmModelMock for testing |
| internal/data/models.go | Adds ProtocolWasm field to Models struct |
| internal/services/wasm_ingestion.go | Implements WasmIngestionService to track WASM hashes and run protocol validators |
| internal/services/wasm_ingestion_test.go | 10 test cases covering ProcessContractCode and PersistProtocolWasms |
| internal/services/checkpoint.go | Implements CheckpointService to orchestrate single-pass checkpoint population |
| internal/services/checkpoint_test.go | 16 test cases covering entry routing, error propagation, and context cancellation |
| internal/services/token_ingestion.go | Extracts TokenProcessor interface, moves checkpoint iteration logic to CheckpointService, removes db/archive dependencies |
| internal/services/token_ingestion_test.go | Adds tests for TokenProcessor methods (ProcessEntry, ProcessContractCode) with helper functions |
| internal/services/mocks.go | Adds mocks for CheckpointService, WasmIngestionService, TokenProcessor, ContractValidator, ProtocolValidator, and ChangeReader |
| internal/services/ingest.go | Wires CheckpointService into IngestServiceConfig and ingestService |
| internal/services/ingest_live.go | Updates to call checkpointService.PopulateFromCheckpoint instead of tokenIngestionService.PopulateAccountTokens |
| internal/ingest/ingest.go | Creates WasmIngestionService and CheckpointService in setupDeps, updates TokenIngestionService parameter order |
| internal/loadtest/runner.go | Removes dbPool parameter from NewTokenIngestionServiceForLoadtest call |
@aristidesstaffieri I've opened a new pull request, #524, to work on those changes. Once the pull request is ready, I'll request review from you.
```sql
SELECT u.contract_id, u.wasm_hash, u.protocol_id, u.name
FROM UNNEST($1::text[], $2::text[], $3::text[], $4::text[])
  AS u(contract_id, wasm_hash, protocol_id, name)
WHERE EXISTS (SELECT 1 FROM protocol_wasms pw WHERE pw.wasm_hash = u.wasm_hash)
```
This WHERE EXISTS clause is redundant since we already have an FK check via the REFERENCES constraint in the schema.
The problem is that ContractData entries and ContractCode entries have different TTLs so an instance can outlive its WASM code. You can encounter a contract that references a wasm_hash that has already been evicted and therefore isn't in protocol_wasms.
The FK in this case causes the entire batch to fail if any row references a missing WASM hash. The WHERE EXISTS clause silently skips those rows, allowing the rest of the batch to succeed.
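The filtering behavior described above can be sketched in Go. This is an illustrative stand-in for what the SQL WHERE EXISTS clause achieves, not the actual service code; `contractRow` and `filterKnownWasms` are hypothetical names:

```go
package main

import "fmt"

// contractRow is a hypothetical stand-in for one row of the protocol_contracts batch.
type contractRow struct {
	ContractID string
	WasmHash   string
}

// filterKnownWasms mirrors what the WHERE EXISTS clause does in SQL: rows whose
// wasm_hash is absent from protocol_wasms are silently dropped instead of
// failing the whole batch the way a plain FK violation would.
func filterKnownWasms(rows []contractRow, knownWasms map[string]bool) []contractRow {
	var kept []contractRow
	for _, r := range rows {
		if knownWasms[r.WasmHash] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	known := map[string]bool{"aa11": true}
	rows := []contractRow{
		{ContractID: "C1", WasmHash: "aa11"},
		{ContractID: "C2", WasmHash: "ee99"}, // references an evicted WASM: skipped, not fatal
	}
	fmt.Println(len(filterKnownWasms(rows, known))) // prints 1
}
```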
Got it, makes sense.
You can encounter a contract that references a wasm_hash that has already been evicted and therefore isn't in protocol_wasms
@aristidesstaffieri so does that mean if a wasm is evicted before reading a checkpoint then we will not have it ingested in wallet backend?
yeah so if a WASM's TTL expires before the checkpoint is captured, it won't be in the history archive snapshot, so we won't ingest it during checkpoint population, and then this check will skip the contract entirely. The WASM will need to be restored before it can be invoked, and at that point classification will happen during normal ingestion.
I see your concern here: since our checkpoint is at the tip rather than the beginning of the retention window, there may be contracts created before the checkpoint but still within the history retention window that cannot be classified if they are never restored.
This doesn't seem like it will be common given the auto-extend TTL pattern, but we can consider some alternatives:
- Use a checkpoint at the beginning of history retention and change the startup process to catch up from there. This is a pretty big departure and changes the startup orchestration significantly.
- Progressively look back to previous checkpoints for missing WASMs. This has no guarantees on how long it will take, and we would need to scan the entire archive; not great.
- Accept the edge case and leave contracts that are evicted before the checkpoint as un-classifiable until restoration happens.
Personally, I think the third option of accepting this is the most practical but open to talking through any of this.
@aditya1702 I had a chat with @JakeUrban and this is what we came up with.
This gap can be resolved by introducing another checkpoint population which should happen at the last checkpoint before the protocol migration's first ledger. There will likely be a gap between that snapshot and the start of your migration's window, so migration backfill should account for this by looking for WASM uploads in that gap as well before regular processing begins.
I'll wait to get your thoughts on this, but if you agree then this should become an addition to this ticket: #516
You can use the hot archive iterator to iterate through archived entries: http://pkg.go.dev/github.com/stellar/go/ingest#NewHotArchiveIterator
I've POC'd this approach (not committed yet) and it does seem to work, with some caveats.
Today, evicted persistent entries will be kept in the hot archive indefinitely, so we will be able to get any evicted WASM regardless of how long it's been.
Once CAP-57 is implemented this will no longer be true: only the Merkle root of the archival snapshot will be retained, making evicted entries unrecoverable once the hot archive becomes full.
This should be OK for our deployment, since it will likely happen before CAP-57 is introduced, but it could become a problem for future deployments.
wdyt @JakeUrban @aditya1702
Our backup approach (noted above) wouldn't have this constraint but is much more complex. I still think this is an acceptable edge case, and from the product side I believe it has been accepted as one we can document rather than solve for.
Yea sounds like using the hot archive won't be a solution long-term. I'm ok with the approach described here.
sounds good, I'll go with that approach then.
…_status to differentiate between not started and in progress migrations
…steps in the ContractData branch, removes Balance branch
…istinction between uploads and upgrades/deployments
…col-setup in the "When Checkpoint Classification Runs" section
…dow, in order to discard state changes outside of retention.
1. Schema changes: enabled field removed, display_name removed, status default is not_started
2. Status values: all updated to the new naming scheme (not_started, classification_in_progress, classification_success, backfilling_in_progress, backfilling_success, failed)
3. protocol-setup: now uses --protocol-id flag (opt-in), updated command examples and workflow
4. Classification section (line 125): updated to describe ContractCode validation and ContractData lookup
5. Checkpoint population diagram: removed Balance branch, updated to show WASM hash storage in known_wasms
6. Live ingestion classification diagram: separated into ContractCode and ContractData paths with RPC fallback
7. Live State Production diagram: updated classification box to mention ContractCode uploads and ContractData Instance changes
8. Backfill migration: added retention-aware processing throughout (flow diagram, workflow diagram, parallel processing)
9. Parallel backfill worker pool: added steps for retention window filtering
… relationship between classification and state production
…igration status in the API for protocols
…s tracking
- Add known_wasms table (migration, model, mock, and data layer tests) for tracking WASM hashes during checkpoint population
- Add KnownWasm field to Models struct
- Create WasmIngestionService (wasm_ingestion.go) that runs protocol validators against WASM bytecode and batch-persists hashes to known_wasms
- Create CheckpointService (checkpoint.go) that orchestrates single-pass checkpoint population, delegating ContractCode entries to both WasmIngestionService and TokenProcessor, and all other entries to TokenProcessor
- Extract readerFactory on checkpointService for injectable checkpoint reader creation
- Extract TokenProcessor interface and NewTokenProcessor from TokenIngestionService, moving checkpoint iteration logic out of token_ingestion.go into checkpoint.go
- Remove db, archive, and PopulateAccountTokens from TokenIngestionService interface and struct
- Remove dbPool parameter from NewTokenIngestionServiceForLoadtest
- Wire CheckpointService into IngestServiceConfig and ingestService
- Update ingest_live.go to call checkpointService.PopulateFromCheckpoint instead of tokenIngestionService.PopulateAccountTokens
- Update ingest.go setupDeps to construct WasmIngestionService and CheckpointService
- Add ContractValidatorMock, ProtocolValidatorMock, ChangeReaderMock, CheckpointServiceMock, WasmIngestionServiceMock, TokenProcessorMock, and TokenIngestionServiceMock updates to mocks.go
- Add unit tests for WasmIngestionService (10 cases covering ProcessContractCode and PersistKnownWasms)
- Add unit tests for CheckpointService (16 cases covering entry routing, error propagation, and context cancellation)
Replace mock.Anything with the actual contractValidatorMock (or cv in setupMocks closures) for the third argument in all NewTokenProcessor mock expectations. This ensures tests verify that checkpointService passes its own contractValidator through to the token processor. Also capture contractValidatorMock in the context cancellation test destructuring instead of using a type assertion on svc.contractValidator.
Cover the six entry-type branches in ProcessEntry with direct unit tests using a lightweight tokenProcessor (no DB, no mocks—just inspect accumulated batch slices and checkpoint data maps). Test cases:
- account_entry: native balance with minimum balance computation
- trustline_entry: credit trustline balance + uniqueAssets tracking
- trustline_pool_share_skipped: pool share silently skipped
- contract_instance_non_sac: WASM contract stored as Unknown + hash tracked
- contract_balance_non_sac: holder→contract UUID mapping
- unhandled_entry_type_ignored: offer entry produces no effect
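The entry routing that these tests cover can be sketched as a simple dispatch. This is a simplified illustration, not the real ProcessEntry; `entryType`, its constants, and the `counts` accumulator are hypothetical stand-ins for the actual xdr.LedgerEntryType switch and batch slices:

```go
package main

import "fmt"

// entryType mimics the ledger entry kinds routed by ProcessEntry; these names
// are illustrative, not the real xdr.LedgerEntryType constants.
type entryType int

const (
	accountEntry entryType = iota
	trustlineEntry
	contractCodeEntry
	contractDataEntry
	offerEntry
)

// processEntry sketches the routing described above: each handled entry type
// feeds its own accumulator, and unhandled types are silently ignored.
func processEntry(t entryType, counts map[string]int) {
	switch t {
	case accountEntry:
		counts["balances"]++
	case trustlineEntry:
		counts["trustlines"]++
	case contractCodeEntry:
		counts["wasms"]++
	case contractDataEntry:
		counts["contracts"]++
	default:
		// offers and other entry types produce no effect
	}
}

func main() {
	counts := map[string]int{}
	for _, t := range []entryType{accountEntry, trustlineEntry, offerEntry, contractCodeEntry} {
		processEntry(t, counts)
	}
	fmt.Println(counts["balances"], counts["trustlines"], counts["wasms"], counts["contracts"])
	// prints: 1 1 1 0
}
```

Table-driven tests over such a dispatcher only need to feed one entry of each type and inspect the accumulators, which is why no DB or mocks are required.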
…IngestionService (#524)
* Initial plan
* Remove validator execution from WasmIngestionService
* services/wasm_ingestion: remove ProtocolValidator execution from WasmIngestionService

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: aristidesstaffieri <6886006+aristidesstaffieri@users.noreply.github.com>
…IngestionService to use config struct WasmIngestionService.ProcessContractCode no longer receives the full bytecode—it only needs the hash to track protocol WASMs. This reduces memory pressure during checkpoint population. TokenIngestionService construction is consolidated into a single NewTokenIngestionService(config) constructor, eliminating the separate NewTokenIngestionServiceForLoadtest variant. The loadtest runner now uses the same constructor with only the fields it needs. Also refactors processContractInstanceChange to return a contractInstanceResult struct instead of multiple return values, extracts newCheckpointData() helper, uses idiomatic nil slices instead of make([]T, 0), and introduces a checkpointTestFixture struct to reduce boilerplate in checkpoint tests. Constructors return concrete types instead of interfaces to allow direct field access in tests.
Persist contract-to-WASM-hash mappings by extending WasmIngestionService with ProcessContractData and PersistProtocolContracts methods. During checkpoint population, ContractData Instance entries are parsed to extract the wasm_hash and contract_id relationship, which is stored in a new protocol_contracts table (FK to protocol_wasms). This mapping will be used by protocol-setup and live ingestion to classify contracts by protocol.
…and backfill
Add two new LedgerChangeProcessors (ProtocolWasmProcessor, ProtocolContractProcessor) that extract WASM hashes and contract-to-WASM mappings from ledger changes during live ingestion, catchup, and historical backfill. Previously this data was only populated during checkpoint.
- ProtocolWasmProcessor extracts hashes from ContractCode entries
- ProtocolContractProcessor extracts contract-to-WASM mappings from ContractData Instance entries
- Extended IndexerBuffer with protocolWasmsByHash/protocolContractsByID maps (Push/Get/Merge/Clear)
- PersistLedgerData inserts wasms before contracts (FK ordering) with ON CONFLICT DO NOTHING
- BatchChanges and processBatchChanges extended for backfill paths
ContractData Instance entries can outlive their referenced ContractCode entries due to independent TTLs, causing FK violations when inserting protocol_contracts during checkpoint population.
- Skip contracts referencing unknown WASM hashes in PersistProtocolContracts
- Add WHERE EXISTS guard in BatchInsert SQL for live/backfill path
- Add test for contracts_with_missing_wasm_skipped scenario
Store wasm_hash and contract_id as raw bytes instead of hex/strkey-encoded strings. Both values originate as [32]byte arrays in XDR, so BYTEA reduces storage by ~50%, improves index performance on fixed-size keys, and removes unnecessary encoding/decoding at the persistence boundary.
The protocol_id on protocol_contracts was always NULL and never queried. It's derivable via the existing FK join: protocol_contracts.wasm_hash → protocol_wasms.wasm_hash → protocol_wasms.protocol_id.
Force-pushed from 7a0015c to c62226b
Replace raw []byte with types.HashBytea for WasmHash and ContractID fields in ProtocolWasm and ProtocolContract models. HashBytea implements sql.Scanner and driver.Valuer to auto-convert between raw bytes (DB) and hex strings (Go), consistent with how Transaction.Hash is handled.
Updated files:
- internal/data/protocol_wasms.go, protocol_contracts.go (models + BatchInsert)
- internal/indexer/processors/protocol_wasms.go, protocol_contracts.go
- internal/services/wasm_ingestion.go
- All corresponding test files
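A minimal sketch of the pattern this commit describes, assuming the type is a hex string on the Go side backed by raw bytes in a BYTEA column; the real internal/data implementation may differ in validation and error handling:

```go
package main

import (
	"database/sql/driver"
	"encoding/hex"
	"fmt"
)

// HashBytea holds a hex-encoded hash in Go while the database stores raw bytes.
type HashBytea string

// Value implements driver.Valuer: hex string -> raw bytes for the BYTEA column.
func (h HashBytea) Value() (driver.Value, error) {
	return hex.DecodeString(string(h))
}

// Scan implements sql.Scanner: raw bytes from the DB -> hex string in Go.
func (h *HashBytea) Scan(src any) error {
	b, ok := src.([]byte)
	if !ok {
		return fmt.Errorf("HashBytea: unsupported source type %T", src)
	}
	*h = HashBytea(hex.EncodeToString(b))
	return nil
}

func main() {
	var h HashBytea
	_ = h.Scan([]byte{0xab, 0xcd})
	fmt.Println(h) // prints abcd

	v, _ := HashBytea("abcd").Value()
	fmt.Println(len(v.([]byte))) // prints 2: raw bytes are half the hex length
}
```

Because the conversion happens at the database/sql boundary, model structs and query code keep working with readable hex strings while the table stores the compact 32-byte representation.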
Replace unchecked hex.DecodeString calls with HashBytea.Value() for DB verification queries, and remove unused encoding/hex import.
The persistence pipeline was silently dropping contract upgrades via first-write-wins semantics, meaning a contract that upgrades its WASM never got its wasm_hash updated. This changes all layers (buffer, buffer merge, backfill batch/cross-batch merge, and DB upsert) to last-write-wins so contract upgrades are correctly reflected.
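The semantic change can be illustrated with a map merge. This is a hedged sketch of the idea, not the actual IndexerBuffer code; `mergeLastWriteWins` and the hash values are hypothetical:

```go
package main

import "fmt"

// mergeLastWriteWins sketches the fix described above: when the same contract
// ID appears in both batches (e.g. a contract upgrading its WASM), the later
// batch's wasm_hash overwrites the earlier one instead of being dropped, which
// is what a first-write-wins merge would do.
func mergeLastWriteWins(dst, src map[string]string) {
	for contractID, wasmHash := range src {
		dst[contractID] = wasmHash // unconditional overwrite = last write wins
	}
}

func main() {
	buffer := map[string]string{"C1": "hash-v1"}
	// A later ledger upgrades C1's WASM; last-write-wins keeps hash-v2.
	mergeLastWriteWins(buffer, map[string]string{"C1": "hash-v2", "C2": "hash-a"})
	fmt.Println(buffer["C1"], buffer["C2"]) // prints: hash-v2 hash-a
}
```

The same overwrite rule has to hold at every layer (buffer push, cross-batch merge, DB upsert), since a single first-write-wins step anywhere in the pipeline would reintroduce the stale-hash bug.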
matching the pattern used by AccountsProcessor, TrustlinesProcessor, and other existing processors.
…kpointService
WasmIngestionService was only used by CheckpointService, and
TokenIngestionService's NewTokenProcessor/TokenProcessor interface was
only used by CheckpointService. This inlines all checkpoint-specific
logic directly into CheckpointService, eliminating unnecessary
intermediate service abstractions.
- Rewrite checkpoint.go to absorb all checkpoint logic: checkpointData,
batch, trustline/contract/WASM processing, and protocol persistence
- Replace positional NewCheckpointService args with CheckpointServiceConfig
- Strip token_ingestion.go to live-only (ProcessTokenChanges); remove
TokenProcessor interface, NewTokenProcessor, and checkpoint-only fields
from TokenIngestionServiceConfig
- Delete wasm_ingestion.go (absorbed into checkpoint.go)
- Remove WasmIngestionServiceMock, TokenProcessorMock from mocks.go
- Update ingest.go wiring and simplify TokenIngestionServiceConfig
- Rewrite checkpoint_test.go with data model mocks; port WASM and
checkpoint processor tests from deleted test files
- Add TrustlineAssetModelMock to data/mocks.go
- Add valid AccountId to makeAccountChange() helper to prevent nil pointer dereference
- Add missing protocolWasmModel.BatchInsert mock expectation in ContractCodeEntry test
- Fix ContextCancellation test to cancel context during reader.Read() instead of before PopulateFromCheckpoint, matching the expected error path
Closes #505
Closes #507
What
Adds `protocol_wasms` and `protocol_contracts` tables to track WASM hashes and contract-to-WASM mappings. These are populated during checkpoint population, live ingestion, catchup, and historical backfill.
Checkpoint population (existing + extended):
- `protocol_wasms` stores WASM hashes extracted from ContractCode ledger entries
- `protocol_contracts` stores contract-to-WASM-hash mappings extracted from ContractData Instance entries, with an FK to `protocol_wasms.wasm_hash`

Live ingestion & backfill (new):
Why
To support the classification side of the data migrations feature. Protocol classification needs to know which WASM hashes exist and which contracts map to which WASMs.
Known limitations
N/A
Issue that this PR addresses
#505
#507
Checklist
PR Structure
`all` if the changes are broad or impact many packages.
Release