
feat(internal): checkpoint population WASM#522

Open
aristidesstaffieri wants to merge 32 commits into feature/data-migrations from feature/checkpoint-population-wasm

Conversation


@aristidesstaffieri aristidesstaffieri commented Feb 27, 2026

Closes #505
Closes #507

What

Adds protocol_wasms and protocol_contracts tables to track WASM hashes and contract-to-WASM mappings. These are populated during checkpoint population, live ingestion,
catchup, and historical backfill.

Checkpoint population (existing + extended):

  • protocol_wasms stores WASM hashes extracted from ContractCode ledger entries
  • protocol_contracts stores contract-to-WASM-hash mappings extracted from ContractData Instance entries
  • CheckpointService delegates ContractCode entries to WasmIngestionService for hash extraction, and ContractData Instance entries to it for contract-to-WASM mapping
  • New ProcessContractData and PersistProtocolContracts methods on WasmIngestionService
  • New protocol_contracts migration with FK to protocol_wasms.wasm_hash

Live ingestion & backfill (new):

  • ProtocolWasmProcessor — new LedgerChangeProcessor that extracts WASM hashes from ContractCode ledger changes
  • ProtocolContractProcessor — new LedgerChangeProcessor that extracts contract-to-WASM mappings from ContractData Instance entries with WASM executables
  • IndexerBuffer extended with protocolWasmsByHash / protocolContractsByID maps (Push/Get/Merge/Clear, first-write-wins dedup)
  • PersistLedgerData inserts wasms before contracts (FK ordering) with ON CONFLICT DO NOTHING for idempotency
  • BatchChanges and processBatchChanges extended to collect and persist protocol data during backfill

Why

To support the classification side of the data migrations feature. Protocol classification needs to know which WASM hashes exist and which contracts map to which WASMs.

Known limitations

N/A

Issue that this PR addresses

#505
#507

Checklist

PR Structure

  • It is not possible to break this PR down into smaller PRs.
  • This PR does not mix refactoring changes with feature changes.
  • This PR's title starts with name of package that is most changed in the PR, or all if the changes are broad or impact many packages.

Thoroughness

  • This PR adds tests for the new functionality or fixes.
  • All updated queries have been tested (refer to this check if the data set returned by the updated query is expected to be same as the original one).

Release

  • This is not a breaking change.
  • This is ready to be tested in development.
  • The new functionality is gated with a feature flag if this is not ready for production.

Copilot AI left a comment

Pull request overview

This PR adds WASM tracking during checkpoint population to support future data migrations that will match contract interfaces against known protocol specifications. The changes introduce a new protocol_wasms table, refactor checkpoint processing into a dedicated service, and extract a TokenProcessor interface to support the new architecture.

Changes:

  • Add protocol_wasms table to track WASM hashes encountered during checkpoint population
  • Create WasmIngestionService to process and persist WASM bytecode with optional protocol validation
  • Refactor checkpoint population logic into CheckpointService that orchestrates both token and WASM ingestion in a single pass
  • Extract TokenProcessor interface from TokenIngestionService to separate checkpoint processing from live ingestion concerns

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

  • internal/db/migrations/2026-02-20.0-protocol_wasms.sql — Adds the protocol_wasms table with wasm_hash (PK), protocol_id, and created_at columns
  • internal/data/protocol_wasms.go — Implements ProtocolWasmModel with a BatchInsert method using UNNEST for efficient inserts
  • internal/data/protocol_wasms_test.go — Comprehensive tests for BatchInsert, including empty input, duplicates, and error cases
  • internal/data/mocks.go — Adds ProtocolWasmModelMock for testing
  • internal/data/models.go — Adds a ProtocolWasm field to the Models struct
  • internal/services/wasm_ingestion.go — Implements WasmIngestionService to track WASM hashes and run protocol validators
  • internal/services/wasm_ingestion_test.go — 10 test cases covering ProcessContractCode and PersistProtocolWasms
  • internal/services/checkpoint.go — Implements CheckpointService to orchestrate single-pass checkpoint population
  • internal/services/checkpoint_test.go — 16 test cases covering entry routing, error propagation, and context cancellation
  • internal/services/token_ingestion.go — Extracts the TokenProcessor interface, moves checkpoint iteration logic to CheckpointService, removes db/archive dependencies
  • internal/services/token_ingestion_test.go — Adds tests for TokenProcessor methods (ProcessEntry, ProcessContractCode) with helper functions
  • internal/services/mocks.go — Adds mocks for CheckpointService, WasmIngestionService, TokenProcessor, ContractValidator, ProtocolValidator, and ChangeReader
  • internal/services/ingest.go — Wires CheckpointService into IngestServiceConfig and ingestService
  • internal/services/ingest_live.go — Updates to call checkpointService.PopulateFromCheckpoint instead of tokenIngestionService.PopulateAccountTokens
  • internal/ingest/ingest.go — Creates WasmIngestionService and CheckpointService in setupDeps, updates TokenIngestionService parameter order
  • internal/loadtest/runner.go — Removes the dbPool parameter from the NewTokenIngestionServiceForLoadtest call



Copilot AI commented Feb 27, 2026

@aristidesstaffieri I've opened a new pull request, #524, to work on those changes. Once the pull request is ready, I'll request review from you.

@stellar stellar deleted a comment from Copilot AI Feb 27, 2026
@aristidesstaffieri aristidesstaffieri requested a review from a team February 27, 2026 21:22
Base automatically changed from feature/data-migrations-design to feature/data-migrations March 5, 2026 20:55
@aristidesstaffieri aristidesstaffieri changed the title from "feature: checkpoint population WASM" to "feat(internal): checkpoint population WASM" Mar 9, 2026
SELECT u.contract_id, u.wasm_hash, u.protocol_id, u.name
FROM UNNEST($1::text[], $2::text[], $3::text[], $4::text[])
AS u(contract_id, wasm_hash, protocol_id, name)
WHERE EXISTS (SELECT 1 FROM protocol_wasms pw WHERE pw.wasm_hash = u.wasm_hash)
Contributor

This WHERE EXISTS clause is redundant since we already have an FK check using the references in the schema

Contributor Author

The problem is that ContractData entries and ContractCode entries have different TTLs so an instance can outlive its WASM code. You can encounter a contract that references a wasm_hash that has already been evicted and therefore isn't in protocol_wasms.

The FK in this case causes the entire batch to fail if any row references a missing WASM hash. The WHERE EXISTS clause silently skips those rows, allowing the rest of the batch to succeed.

Contributor

Got it, makes sense.

You can encounter a contract that references a wasm_hash that has already been evicted and therefore isn't in protocol_wasms

@aristidesstaffieri so does that mean if a wasm is evicted before reading a checkpoint then we will not have it ingested in wallet backend?

Contributor Author

yeah so if a WASM's TTL expires before the checkpoint is captured, it won't be in the history archive snapshot, so we won't ingest it during checkpoint population, and then this check will skip the contract entirely. The WASM will need to be restored before it can be invoked, and at that point classification will happen during normal ingestion.

I see your concern here, since our checkpoint is at the tip and not at the beginning of the retention window, that leaves the possibility for there to be contracts before the checkpoint but in the history retention window which cannot be classified if they are never restored.

This doesn't seem like it will be common given the auto-extend TTL pattern, but we can consider some alternatives:

  • Use a checkpoint at the beginning of history retention and change the startup process to catch up from there. This is a pretty big departure and changes the orchestration for startup a lot.
  • Progressively look back to previous checkpoints for missing WASMs. This has no guarantees on how long it will take and we need to scan the entire archive, not great.
  • Accept the edge case and leave contracts that are evicted before the checkpoint as un-classifiable until restoration happens.

Personally, I think the third option of accepting this is the most practical but open to talking through any of this.

Contributor Author

@aristidesstaffieri aristidesstaffieri Mar 12, 2026

@aditya1702 I had a chat with @JakeUrban and this is what we came up with.

This gap can be resolved by introducing another checkpoint population which should happen at the last checkpoint before the protocol migration's first ledger. There will likely be a gap between that snapshot and the start of your migration's window, so migration backfill should account for this by looking for WASM uploads in that gap as well before regular processing begins.

I'll wait to get your thoughts on this, but if you agree then this should become an addition to this ticket: #516

Contributor

You can use the hot archive iterator to iterate through archived entries: http://pkg.go.dev/github.com/stellar/go/ingest#NewHotArchiveIterator

Contributor Author

I've POC'd this approach (not committed yet) and it seems like it does work, with some caveats.

Today, evicted persistent entries will be kept in the hot archive indefinitely, so we will be able to get any evicted WASM regardless of how long it's been.

Once CAP-57 is implemented, this will no longer be true, and only the Merkle root of the archival snapshot will be retained. This will make evicted entries unrecoverable once the hot archive becomes full.

This should be OK for our deployment since it will likely happen before CAP-57 is introduced, but it could become a problem for future deployments.

wdyt @JakeUrban @aditya1702

Our backup approach (noted above) wouldn't have this same constraint but is much more complex. I still think this is an acceptable edge case, and from the product side I believe it has been accepted as one we can document but not solve for.

Contributor

Yea sounds like using the hot archive won't be a solution long-term. I'm ok with the approach described here.

Contributor Author

sounds good, I'll go with that approach then.

aristidesstaffieri and others added 15 commits March 11, 2026 09:00
…_status to differentiate between not started and in progress migrations
…steps in the ContractData branch, removes Balance branch
…istinction between uploads and upgrades/deployments
…col-setup in the "When Checkpoint Classification Runs" section
…dow, in order to discard state changes outside of retention.

1. Schema changes: enabled field removed, display_name removed, status default is not_started
2. Status values: All updated to new naming scheme (not_started, classification_in_progress, classification_success, backfilling_in_progress, backfilling_success, failed)
3. protocol-setup: Now uses --protocol-id flag (opt-in), updated command examples and workflow
4. Classification section (line 125): Updated to describe ContractCode validation and ContractData lookup
5. Checkpoint population diagram: Removed Balance branch, updated to show WASM hash storage in known_wasms
6. Live ingestion classification diagram: Separated into ContractCode and ContractData paths with RPC fallback
7. Live State Production diagram: Updated classification box to mention ContractCode uploads and ContractData Instance changes
8. Backfill migration: Added retention-aware processing throughout (flow diagram, workflow diagram, parallel processing)
9. Parallel backfill worker pool: Added steps for retention window filtering
… relationship between classification and state production
…s tracking

  - Add known_wasms table (migration, model, mock, and data layer tests) for tracking WASM hashes during checkpoint population
  - Add KnownWasm field to Models struct
  - Create WasmIngestionService (wasm_ingestion.go) that runs protocol validators against WASM bytecode and batch-persists hashes to known_wasms
  - Create CheckpointService (checkpoint.go) that orchestrates single-pass checkpoint population, delegating ContractCode entries to both WasmIngestionService and
  TokenProcessor, and all other entries to TokenProcessor
  - Extract readerFactory on checkpointService for injectable checkpoint reader creation
  - Extract TokenProcessor interface and NewTokenProcessor from TokenIngestionService, moving checkpoint iteration logic out of token_ingestion.go into checkpoint.go
  - Remove db, archive, and PopulateAccountTokens from TokenIngestionService interface and struct
  - Remove dbPool parameter from NewTokenIngestionServiceForLoadtest
  - Wire CheckpointService into IngestServiceConfig and ingestService
  - Update ingest_live.go to call checkpointService.PopulateFromCheckpoint instead of tokenIngestionService.PopulateAccountTokens
  - Update ingest.go setupDeps to construct WasmIngestionService and CheckpointService
  - Add ContractValidatorMock, ProtocolValidatorMock, ChangeReaderMock, CheckpointServiceMock, WasmIngestionServiceMock, TokenProcessorMock, and TokenIngestionServiceMock
  updates to mocks.go
  - Add unit tests for WasmIngestionService (10 cases covering ProcessContractCode and PersistKnownWasms)
  - Add unit tests for CheckpointService (16 cases covering entry routing, error propagation, and context cancellation)
  Replace mock.Anything with the actual contractValidatorMock (or cv in
  setupMocks closures) for the third argument in all NewTokenProcessor
  mock expectations. This ensures tests verify that checkpointService
  passes its own contractValidator through to the token processor.

  Also capture contractValidatorMock in the context cancellation test
  destructuring instead of using a type assertion on svc.contractValidator.
  Cover the six entry-type branches in ProcessEntry with direct unit tests
  using a lightweight tokenProcessor (no DB, no mocks—just inspect
  accumulated batch slices and checkpoint data maps).

  Test cases:
  - account_entry: native balance with minimum balance computation
  - trustline_entry: credit trustline balance + uniqueAssets tracking
  - trustline_pool_share_skipped: pool share silently skipped
  - contract_instance_non_sac: WASM contract stored as Unknown + hash tracked
  - contract_balance_non_sac: holder→contract UUID mapping
  - unhandled_entry_type_ignored: offer entry produces no effect
…IngestionService (#524)

* Initial plan

* Remove validator execution from WasmIngestionService

Co-authored-by: aristidesstaffieri <6886006+aristidesstaffieri@users.noreply.github.com>

* services/wasm_ingestion: remove ProtocolValidator execution from WasmIngestionService

Co-authored-by: aristidesstaffieri <6886006+aristidesstaffieri@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: aristidesstaffieri <6886006+aristidesstaffieri@users.noreply.github.com>
…IngestionService to use config struct

  WasmIngestionService.ProcessContractCode no longer receives the full
  bytecode—it only needs the hash to track protocol WASMs. This reduces
  memory pressure during checkpoint population.

  TokenIngestionService construction is consolidated into a single
  NewTokenIngestionService(config) constructor, eliminating the separate
  NewTokenIngestionServiceForLoadtest variant. The loadtest runner now
  uses the same constructor with only the fields it needs.

  Also refactors processContractInstanceChange to return a
  contractInstanceResult struct instead of multiple return values,
  extracts newCheckpointData() helper, uses idiomatic nil slices
  instead of make([]T, 0), and introduces a checkpointTestFixture
  struct to reduce boilerplate in checkpoint tests. Constructors
  return concrete types instead of interfaces to allow direct field
  access in tests.
  Persist contract-to-WASM-hash mappings by extending WasmIngestionService
  with ProcessContractData and PersistProtocolContracts methods. During
  checkpoint population, ContractData Instance entries are parsed to extract
  the wasm_hash and contract_id relationship, which is stored in a new
  protocol_contracts table (FK to protocol_wasms). This mapping will be used
  by protocol-setup and live ingestion to classify contracts by protocol.
…and backfill

  Add two new LedgerChangeProcessors (ProtocolWasmProcessor, ProtocolContractProcessor)
  that extract WASM hashes and contract-to-WASM mappings from ledger changes during
  live ingestion, catchup, and historical backfill. Previously this data was only
  populated during checkpoint.

  - ProtocolWasmProcessor extracts hashes from ContractCode entries
  - ProtocolContractProcessor extracts contract-to-WASM mappings from ContractData Instance entries
  - Extended IndexerBuffer with protocolWasmsByHash/protocolContractsByID maps (Push/Get/Merge/Clear)
  - PersistLedgerData inserts wasms before contracts (FK ordering) with ON CONFLICT DO NOTHING
  - BatchChanges and processBatchChanges extended for backfill paths
  ContractData Instance entries can outlive their referenced ContractCode
  entries due to independent TTLs, causing FK violations when inserting
  protocol_contracts during checkpoint population.

  - Skip contracts referencing unknown WASM hashes in PersistProtocolContracts
  - Add WHERE EXISTS guard in BatchInsert SQL for live/backfill path
  - Add test for contracts_with_missing_wasm_skipped scenario
  Store wasm_hash and contract_id as raw bytes instead of hex/strkey-encoded
  strings. Both values originate as [32]byte arrays in XDR, so BYTEA reduces
  storage by ~50%, improves index performance on fixed-size keys, and removes
  unnecessary encoding/decoding at the persistence boundary.
  The protocol_id on protocol_contracts was always NULL and never queried.
  It's derivable via the existing FK join: protocol_contracts.wasm_hash →
  protocol_wasms.wasm_hash → protocol_wasms.protocol_id.
@aristidesstaffieri aristidesstaffieri force-pushed the feature/checkpoint-population-wasm branch from 7a0015c to c62226b Compare March 11, 2026 15:35
   Replace raw []byte with types.HashBytea for WasmHash and ContractID
   fields in ProtocolWasm and ProtocolContract models. HashBytea implements
   sql.Scanner and driver.Valuer to auto-convert between raw bytes (DB)
   and hex strings (Go), consistent with how Transaction.Hash is handled.

   Updated files:
   - internal/data/protocol_wasms.go, protocol_contracts.go (models + BatchInsert)
   - internal/indexer/processors/protocol_wasms.go, protocol_contracts.go
   - internal/services/wasm_ingestion.go
   - All corresponding test files
   Replace unchecked hex.DecodeString calls with HashBytea.Value() for
   DB verification queries, and remove unused encoding/hex import.
  The persistence pipeline was silently dropping contract upgrades via
  first-write-wins semantics, meaning a contract that upgrades its WASM
  never got its wasm_hash updated. This changes all layers (buffer, buffer
  merge, backfill batch/cross-batch merge, and DB upsert) to
  last-write-wins so contract upgrades are correctly reflected.

   matching the pattern used by AccountsProcessor, TrustlinesProcessor,
   and other existing processors.
…kpointService

  WasmIngestionService was only used by CheckpointService, and
  TokenIngestionService's NewTokenProcessor/TokenProcessor interface was
  only used by CheckpointService. This inlines all checkpoint-specific
  logic directly into CheckpointService, eliminating unnecessary
  intermediate service abstractions.

  - Rewrite checkpoint.go to absorb all checkpoint logic: checkpointData,
    batch, trustline/contract/WASM processing, and protocol persistence
  - Replace positional NewCheckpointService args with CheckpointServiceConfig
  - Strip token_ingestion.go to live-only (ProcessTokenChanges); remove
    TokenProcessor interface, NewTokenProcessor, and checkpoint-only fields
    from TokenIngestionServiceConfig
  - Delete wasm_ingestion.go (absorbed into checkpoint.go)
  - Remove WasmIngestionServiceMock, TokenProcessorMock from mocks.go
  - Update ingest.go wiring and simplify TokenIngestionServiceConfig
  - Rewrite checkpoint_test.go with data model mocks; port WASM and
    checkpoint processor tests from deleted test files
  - Add TrustlineAssetModelMock to data/mocks.go
  - Add valid AccountId to makeAccountChange() helper to prevent nil pointer dereference
  - Add missing protocolWasmModel.BatchInsert mock expectation in ContractCodeEntry test
  - Fix ContextCancellation test to cancel context during reader.Read() instead of before PopulateFromCheckpoint, matching the expected error path