feat: add checkpoint-sync #93

Draft
MegaRedHand wants to merge 23 commits into main from checkpoint-sync

MegaRedHand (Collaborator) commented Feb 4, 2026

This PR adds checkpoint-sync functionality when --checkpoint-sync-url is specified.

NOTE: has some changes from #96, they'll be gone once that's merged.

MegaRedHand changed the title from "feat: add checkpoint-sync support" to "feat: add checkpoint-sync" Feb 4, 2026

github-actions bot commented Feb 4, 2026

🤖 Kimi Code Review

Review Summary

This PR adds checkpoint sync functionality to ethlambda, allowing nodes to sync from a recent finalized state instead of genesis. The implementation is well-structured and follows Rust best practices, but there are several important issues to address:

Critical Issues

  1. Security: Missing TLS certificate validation (checkpoint_sync.rs:51-65)

    • The Client is built without certificate validation, making it vulnerable to MITM attacks
    • Fix: Add .use_rustls_tls() and proper certificate configuration
  2. Security: Missing state root verification (checkpoint_sync.rs:20-32)

    • The downloaded state isn't verified against a trusted state root
    • Fix: Add state root verification using a trusted checkpoint from config
  3. Consensus: Incomplete fork choice initialization (main.rs:336-338)

    • Store::get_forkchoice_store doesn't exist - this will fail to compile
    • Fix: Use proper initialization method or implement the missing function

Important Issues

  1. Error handling: Missing timeout handling (checkpoint_sync.rs:51-65)

    • Timeouts aren't properly handled - could hang indefinitely
    • Fix: Add explicit timeout handling and retry logic
  2. Resource management: No connection pooling (checkpoint_sync.rs:51-65)

    • New client created for each request - inefficient
    • Fix: Use a shared client instance
  3. Validation: Missing slot range check (checkpoint_sync.rs:125-129)

    • No check if checkpoint slot is reasonable (not too old/future)
    • Fix: Add slot range validation against current time
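
The retry suggestion in the first item above could look roughly like the following. This is a minimal synchronous sketch, not the PR's code: `retry_with_backoff` is a hypothetical helper, and it blocks with `std::thread::sleep`, whereas an async download would presumably use the runtime's own sleep instead.

```rust
use std::{thread, time::Duration};

/// Hypothetical helper (not in the PR): run `op` until it succeeds or
/// `max_attempts` is reached, doubling the delay after each failure.
/// `op` receives the 1-based attempt number and always runs at least once.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Passing the attempt number to the closure lets the caller log which try failed without keeping its own counter.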

Minor Issues

  1. Code duplication: create_test_validator() and create_different_validator() could be parameterized
  2. Test coverage: Missing tests for HTTP error cases and timeout scenarios
  3. Documentation: Missing documentation for public functions in checkpoint_sync.rs

Line-specific Feedback

  • Line 51-65: Add TLS configuration
  • Line 125-129: Add slot range validation
  • Line 336: Fix Store::get_forkchoice_store call
  • Line 427: Add more comprehensive test cases for edge conditions

The PR is well-structured overall, but the security and consensus issues must be addressed before merging.


Automated review by Kimi (Moonshot AI) · custom prompt

github-actions bot commented Feb 4, 2026

🤖 Codex Code Review

Review Findings (ordered by severity)

  • Checkpoint sync can accept a forged state with no cryptographic anchoring. verify_checkpoint_state only checks genesis time and validator pubkeys, so a malicious peer can return an arbitrary state (balances, finality, etc.) that still passes. For checkpoint sync, you typically need a trusted checkpoint root (weak subjectivity) or at least verify the state root against a known finalized block root. Consider requiring an expected checkpoint root input (CLI flag) and verifying state.tree_hash_root() equals it, or verifying the downloaded state corresponds to a trusted root from config. bin/ethlambda/src/checkpoint_sync.rs:94-185

  • Missing consistency check between state root and latest_block_header.state_root. The state includes a latest_block_header with a state_root, but there is no verification that state.tree_hash_root() equals latest_block_header.state_root. Without this, the header can be unrelated to the state. Add a check to bind the state to its header. bin/ethlambda/src/checkpoint_sync.rs:165-183

  • Potential fork choice mismatch: anchor block uses empty body. construct_anchor_block uses BlockBody::default() even though the header’s body_root may differ, making Block::tree_hash_root() inconsistent with latest_block_header. If fork choice or storage uses the block root, this can cause incorrect anchoring or invalid justification links. Consider storing/using the header root directly, or constructing an “anchor block” type that preserves body_root semantics. bin/ethlambda/src/checkpoint_sync.rs:188-201, bin/ethlambda/src/main.rs:322-336

  • DoS risk: unbounded checkpoint response size. response.bytes().await? buffers the full state in memory without a maximum size. A peer can return a huge body and OOM the node. Add a maximum size guard (e.g., check Content-Length and enforce a cap, or stream into a bounded buffer). bin/ethlambda/src/checkpoint_sync.rs:76-84

Consensus-layer considerations

  • Fork choice anchoring correctness depends on the block root. With the current empty-body anchor, verify that Store::get_forkchoice_store does not depend on Block::tree_hash_root() for the anchor root. If it does, you need to store or pass the header root directly to avoid mismatch. bin/ethlambda/src/checkpoint_sync.rs:188-201, bin/ethlambda/src/main.rs:336

Rust/maintainability notes

  • The test coverage for structural checks is solid. Consider adding a test that explicitly checks the new “state root vs header state_root” invariant if you add it. bin/ethlambda/src/checkpoint_sync.rs:204-427

Open questions

  • Is /lean/v0/states/finalized guaranteed to return a state at the finalized slot? If yes, you should add state.slot == state.latest_finalized.slot (and probably state.latest_block_header.slot == state.slot) to prevent inconsistent checkpoints. bin/ethlambda/src/checkpoint_sync.rs:148-176
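
If the answer to that question is yes, the two slot checks could be sketched as below. The structs are minimal stand-ins holding only the fields named in the question, and `SlotCheckError` and `check_slot_consistency` are hypothetical names. The check is only appropriate if the endpoint really does guarantee a state at the finalized slot.

```rust
/// Minimal stand-ins for the fields the check needs; the real types have more.
pub struct Checkpoint {
    pub slot: u64,
}
pub struct BlockHeader {
    pub slot: u64,
}
pub struct State {
    pub slot: u64,
    pub latest_finalized: Checkpoint,
    pub latest_block_header: BlockHeader,
}

/// Hypothetical error type for this sketch; not part of the PR.
#[derive(Debug, PartialEq)]
pub enum SlotCheckError {
    StateNotAtFinalizedSlot { state_slot: u64, finalized_slot: u64 },
    HeaderSlotMismatch { state_slot: u64, header_slot: u64 },
}

/// Assumes the endpoint serves the state at the finalized slot itself.
pub fn check_slot_consistency(state: &State) -> Result<(), SlotCheckError> {
    if state.slot != state.latest_finalized.slot {
        return Err(SlotCheckError::StateNotAtFinalizedSlot {
            state_slot: state.slot,
            finalized_slot: state.latest_finalized.slot,
        });
    }
    if state.latest_block_header.slot != state.slot {
        return Err(SlotCheckError::HeaderSlotMismatch {
            state_slot: state.slot,
            header_slot: state.latest_block_header.slot,
        });
    }
    Ok(())
}
```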

If you address the anchoring and validation gaps above, the rest of the changes look clean and well-structured.


Automated review by OpenAI Codex · custom prompt

github-actions bot commented Feb 4, 2026

🤖 Claude Code Review

Code Review: PR #93 - Add Checkpoint Sync Support

Summary

This PR adds checkpoint sync functionality to allow nodes to sync from a trusted checkpoint state instead of starting from genesis. The implementation is well-structured with proper validation and error handling. Below are my findings.


✅ Strengths

  1. Comprehensive validation checks - The verify_checkpoint_state() function includes extensive safety checks for consensus-critical data
  2. Well-tested - 19 unit tests covering various validation scenarios
  3. Clean error handling - Uses thiserror for structured error types
  4. Proper timeout strategy - Uses read timeout instead of total timeout to handle large states gracefully
  5. Good separation of concerns - Checkpoint logic isolated in dedicated module
  6. Follows existing patterns - Reuses Store::get_forkchoice_store() correctly

🔴 Critical Issues

1. State Root Mismatch Not Validated (Security)

Location: bin/ethlambda/src/checkpoint_sync.rs:327

The construct_anchor_block() creates a block with a default empty body, but the function doesn't verify that the computed block root matches the checkpoint's finalized/justified root. This could allow accepting an invalid checkpoint state where the state root doesn't correspond to the actual state data.

Issue:

pub fn construct_anchor_block(state: &State) -> Block {
    Block {
        slot: state.latest_block_header.slot,
        parent_root: state.latest_block_header.parent_root,
        proposer_index: state.latest_block_header.proposer_index,
        state_root: state.latest_block_header.state_root,
        body: BlockBody::default(),  // ⚠️ Default body - root may not match!
    }
}

The block's tree_hash_root() will differ from the original if the original had a non-empty body. While the comment mentions this, there's no validation that the state is actually from the checkpoint we think it is.

Recommendation:

  • Compute the state root from the downloaded state: let computed_state_root = state.tree_hash_root()
  • Compare with state.latest_block_header.state_root to ensure integrity
  • Add this check to verify_checkpoint_state() as:
    // Verify state root integrity
    let computed_state_root = state.tree_hash_root();
    if computed_state_root != state.latest_block_header.state_root {
        return Err(CheckpointSyncError::StateRootMismatch { 
            expected: state.latest_block_header.state_root, 
            got: computed_state_root 
        });
    }

2. No HTTPS Enforcement (Security)

Location: bin/ethlambda/src/checkpoint_sync.rs:69-73

Checkpoint sync downloads consensus state over HTTP without enforcing HTTPS. An attacker performing a MITM attack could provide a malicious state that passes validation checks.

Current code:

let url = format!("{base_url}/lean/v0/states/finalized");
let client = Client::builder()
    .connect_timeout(CHECKPOINT_CONNECT_TIMEOUT)
    .read_timeout(CHECKPOINT_READ_TIMEOUT)
    .build()?;

Recommendation:

  • Enforce HTTPS scheme in the URL validation
  • Add a check like:
    if !base_url.starts_with("https://") {
        return Err(CheckpointSyncError::InsecureUrl);
    }
  • Consider adding a --allow-insecure-checkpoint flag for development/testing
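
Putting the scheme check and the opt-out together, a minimal sketch might look like this. `UrlError`, `check_checkpoint_url`, and the `allow_insecure` parameter (standing in for the proposed flag) are hypothetical. URL schemes are case-insensitive, so the comparison lowercases the input first.

```rust
/// Hypothetical error type for this sketch; not part of the PR.
#[derive(Debug, PartialEq)]
pub enum UrlError {
    /// Plain http:// was given without the explicit opt-in.
    InsecureUrl,
    /// Neither http:// nor https://.
    UnsupportedScheme,
}

/// Accept https:// always, http:// only when `allow_insecure` is set.
pub fn check_checkpoint_url(base_url: &str, allow_insecure: bool) -> Result<(), UrlError> {
    // Schemes are case-insensitive, so "HTTPS://" must pass as well.
    let lower = base_url.trim().to_ascii_lowercase();
    if lower.starts_with("https://") {
        return Ok(());
    }
    if lower.starts_with("http://") {
        return if allow_insecure {
            Ok(())
        } else {
            Err(UrlError::InsecureUrl)
        };
    }
    Err(UrlError::UnsupportedScheme)
}
```

Distinguishing `InsecureUrl` from `UnsupportedScheme` lets the error message point at the opt-in flag only when that flag would actually help.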

3. Potential Type Confusion in Checkpoint Validation

Location: bin/ethlambda/src/checkpoint_sync.rs:172-182

The validation checks if latest_block_header matches finalized/justified roots, but uses .tree_hash_root() on the header, not the full block. This could lead to false positives/negatives.

let block_root = state.latest_block_header.tree_hash_root();  // Header root

if state.latest_block_header.slot == state.latest_finalized.slot
    && block_root != state.latest_finalized.root.0  // Comparing to block root?
{
    return Err(CheckpointSyncError::BlockHeaderFinalizedRootMismatch);
}

Question: Does state.latest_finalized.root store a block root or a header root? If it's a block root, this comparison is incorrect.

Investigation needed: Check the consensus spec to confirm what Checkpoint.root represents. If it's a block root, this validation is broken.


⚠️ High Priority Issues

4. Missing Slot Bounds Check

Location: bin/ethlambda/src/checkpoint_sync.rs:147-149

While we check finalized_slot <= state.slot, we don't validate that the checkpoint isn't from the future relative to system time.

Recommendation:

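// Verify checkpoint slot is not in the future
let current_time = std::time::SystemTime::now()
    .duration_since(std::time::UNIX_EPOCH)
    .map(|d| d.as_secs())
    .unwrap_or(0); // Clock set before the Unix epoch: treat as time zero instead of panicking
// saturating_sub avoids u64 underflow when genesis_time is still in the future
let max_slot = current_time.saturating_sub(state.config.genesis_time) / SECONDS_PER_SLOT;
if state.slot > max_slot + 10 {  // Allow 10 slot tolerance
    return Err(CheckpointSyncError::CheckpointTooFarInFuture);
}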
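
The same check can be written as a pure function that takes the current time as a parameter, which makes it unit-testable without touching the system clock. This is a sketch under assumptions: `SlotBoundError` and `check_not_in_future` are hypothetical names, and the slot duration and tolerance are passed in rather than taken from the codebase's constants.

```rust
/// Hypothetical error type for this sketch; not part of the PR.
#[derive(Debug, PartialEq)]
pub enum SlotBoundError {
    CheckpointTooFarInFuture { slot: u64, max_slot: u64 },
    ZeroSlotDuration,
}

/// Reject a checkpoint whose slot is more than `tolerance_slots` ahead of the
/// slot implied by `now_secs`. All times are Unix seconds.
pub fn check_not_in_future(
    state_slot: u64,
    genesis_time: u64,
    now_secs: u64,
    seconds_per_slot: u64,
    tolerance_slots: u64,
) -> Result<(), SlotBoundError> {
    if seconds_per_slot == 0 {
        return Err(SlotBoundError::ZeroSlotDuration);
    }
    // Before genesis the current slot is taken to be 0 rather than underflowing.
    let current_slot = now_secs.saturating_sub(genesis_time) / seconds_per_slot;
    let max_slot = current_slot.saturating_add(tolerance_slots);
    if state_slot > max_slot {
        return Err(SlotBoundError::CheckpointTooFarInFuture { slot: state_slot, max_slot });
    }
    Ok(())
}
```

A caller would obtain `now_secs` from `SystemTime::now()` once at the call site and pass it in.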

5. No Size Limit on Downloaded State

Location: bin/ethlambda/src/checkpoint_sync.rs:82

The code downloads the entire state into memory without any size checks:

let bytes = response.bytes().await?;

A malicious or misconfigured peer could send gigabytes of data, causing OOM.

Recommendation:

  • Add a reasonable size limit (e.g., 100MB)
  • Use response.content_length() to check before download:
    if let Some(len) = response.content_length() {
        if len > MAX_STATE_SIZE {
            return Err(CheckpointSyncError::StateTooLarge);
        }
    }
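
A `Content-Length` check alone is advisory, since the header can be absent (for example with chunked transfer encoding) or simply wrong, so the cap also needs to be enforced while the body is read. The sketch below shows the idea with a synchronous `std::io::Read` standing in for the HTTP body. `read_bounded` and `BoundedReadError` are hypothetical names, and an async client would apply the same byte counting to whatever chunked-read API it exposes.

```rust
use std::io::{self, Read};

/// Hypothetical error type for this sketch; not part of the PR.
#[derive(Debug)]
pub enum BoundedReadError {
    Io(io::Error),
    TooLarge { limit: u64 },
}

/// Read the whole body, failing once it exceeds `limit` bytes.
pub fn read_bounded<R: Read>(reader: R, limit: u64) -> Result<Vec<u8>, BoundedReadError> {
    let mut buf = Vec::new();
    // Allow one byte past the limit so "exactly at the limit" and
    // "over the limit" can be told apart without reading any further.
    reader
        .take(limit.saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(BoundedReadError::Io)?;
    if buf.len() as u64 > limit {
        return Err(BoundedReadError::TooLarge { limit });
    }
    Ok(buf)
}
```

Because `take` stops the read at `limit + 1` bytes, memory use stays bounded even when the sender never ends the stream.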

6. Error Context Lost in Main

Location: bin/ethlambda/src/main.rs:110-111

The error is logged but all context is lost before exit:

.inspect_err(|err| error!(%err, "Failed to initialize state"))
.unwrap_or_else(|_| std::process::exit(1));

The unwrap_or_else closure receives |_|, discarding the error. While inspect_err logs it, this pattern is fragile.

Recommendation:

.unwrap_or_else(|err| {
    error!(%err, "Failed to initialize state");
    std::process::exit(1)
})

💡 Medium Priority Issues

7. Inconsistent Checkpoint Root Field Access

Location: bin/ethlambda/src/checkpoint_sync.rs:172,178

Sometimes accessing state.latest_finalized.root.0 (inner field) and sometimes state.latest_finalized.root (the H256 wrapper):

&& block_root != state.latest_finalized.root.0  // Line 172 - accessing .0
...
&& state.latest_justified.root != state.latest_finalized.root  // Line 158 - direct comparison

Fix: Be consistent - use direct H256 comparison everywhere since tree_hash_root() returns H256:

if ... && H256(block_root) != state.latest_finalized.root {

Or keep the block_root as [u8; 32] if that's the intent.


8. Missing Validator Epoch Checks

Location: bin/ethlambda/src/checkpoint_sync.rs:134-144

The validator pubkey validation is good, but doesn't check for epoch boundaries or activation/exit epochs. While this may be simplified for lean consensus, ensure validators in the checkpoint are actually active.


9. Construct Anchor Block Creates Misleading Data

Location: bin/ethlambda/src/checkpoint_sync.rs:188-197

The comment acknowledges the issue:

IMPORTANT: This creates a block with default body. The block's tree_hash_root() will only match the original block if the original also had an empty body.

This is dangerous because:

  • The anchor block's hash won't match the actual block hash
  • Fork choice relies on block roots being correct
  • Could cause issues during block requests by root

Concern: Will fork choice break if we ask peers for this block by its (incorrect) computed root?

Recommendation: Document why this is safe, or fetch the actual block body from the peer.


🔧 Minor Issues / Style

10. Redundant URL Trimming

Location: bin/ethlambda/src/checkpoint_sync.rs:68

let base_url = base_url.trim_end_matches('/');
let url = format!("{base_url}/lean/v0/states/finalized");

Minor nit: works correctly but adds complexity. Consider documenting expected URL format.


11. Magic Numbers

Location: bin/ethlambda/src/checkpoint_sync.rs:9-13

const CHECKPOINT_CONNECT_TIMEOUT: Duration = Duration::from_secs(15);
const CHECKPOINT_READ_TIMEOUT: Duration = Duration::from_secs(15);

Good to have constants, but 15 seconds might be aggressive for connect timeout in poor network conditions. Consider making configurable or bumping to 30s.


12. Test Coverage Gaps

The tests are comprehensive for validation logic but missing:

  • Integration test for actual HTTP download
  • Test for SSZ deserialization failures
  • Test for malformed HTTP responses
  • Test for timeout scenarios

13. Logging Could Be More Informative

Location: bin/ethlambda/src/main.rs:320-334

Good logging, but consider adding:

  • Checkpoint root hash (finalized/justified)
  • Time taken to download
  • State size in MB
  • URL being used (already logged, good!)

🎯 Rust Best Practices

14. Good Use of From/Into Traits

The codebase properly uses From implementations for ResponseCode conversions elsewhere, and this PR follows suit.

15. Proper Ownership

The validators vec is moved into fetch_initial_state, avoiding unnecessary clones. Good!

16. Error Handling with inspect_err

Follows project patterns correctly.


📋 Architecture Questions

  1. What happens to existing RocksDB data? If a node has existing state and then starts with --checkpoint-sync-url, will it overwrite? Should we check if DB is empty first?

  2. Recovery strategy: If checkpoint sync fails mid-download, do we fall back to genesis or fail completely? Current behavior: fail and exit (line 111).

  3. Checkpoint age: Should we warn if the checkpoint is very old (e.g., >1 week behind current slot)?


Summary Assessment

| Category | Status |
| --- | --- |
| Correctness | ⚠️ Issues #1, #3 need investigation |
| Security | 🔴 Issues #1, #2 are critical |
| Performance | ✅ Good timeout strategy |
| Memory Safety | ⚠️ Issue #5 (unbounded download) |
| Error Handling | ✅ Well-structured |
| Testing | ✅ Comprehensive unit tests |
| Code Quality | ✅ Clean, readable |

Recommendations Priority

  1. MUST FIX: State root validation (#1)
  2. MUST FIX: HTTPS enforcement (#2)
  3. MUST INVESTIGATE: Checkpoint root type confusion (#3)
  4. SHOULD FIX: Download size limit (#5)
  5. SHOULD FIX: Future slot check (#4)
  6. CONSIDER: Anchor block root mismatch implications (#9)

The core implementation is solid, but the consensus-critical validation needs strengthening before merge.


Automated review by Claude (Anthropic) · custom prompt
