refactor: store pending block separately #3073

Open
alpe wants to merge 7 commits into main from alex/pending_block_fix

Conversation

@alpe
Contributor

alpe commented Feb 13, 2026

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors how pending blocks are handled within the system. It transitions from storing unconfirmed blocks as regular block data with empty signatures to using a dedicated metadata store. This change enhances the robustness of block production, particularly during node restarts, and includes a crucial migration path to ensure compatibility with existing data formats, thereby preventing potential issues like double-signing after a system upgrade. Additionally, module replacement directives were enabled in application-level go.mod files.

Highlights

  • Pending Block Storage Refactor: Introduced a dedicated mechanism for storing pending blocks in the metadata store, separating them from finalized block data.
  • Legacy Pending Block Migration: Implemented a migration strategy to convert old-style pending blocks (stored at height N+1 with empty signatures) to the new metadata format upon node initialization, preventing double-signing issues during upgrades.
  • Block Production Logic Update: Refactored the block production process to utilize the new getPendingBlock, savePendingBlock, and deletePendingBlock functions for managing unconfirmed blocks.
  • Module Replacement Directives: Uncommented replace directives in go.mod files for apps/evm and apps/testapp, likely for local development or testing setups.
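The legacy-detection rule from the migration highlight (a block stored at height N+1 with an empty signature) can be sketched as a single predicate. storedBlock and isLegacyPending are hypothetical names; the real code inspects the stored header and signature through the store API.

```go
package main

// storedBlock is a simplified, hypothetical view of what the legacy code
// saved via SaveBlockData.
type storedBlock struct {
	Height    uint64
	Signature []byte
}

// isLegacyPending reports whether a block found in regular block storage is
// an old-style pending block: at height lastHeight+1 with an empty signature.
func isLegacyPending(lastHeight uint64, b *storedBlock) bool {
	return b != nil && b.Height == lastHeight+1 && len(b.Signature) == 0
}
```

A signed block at N+1, or any block at another height, is not treated as a legacy pending block.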

Changelog
  • apps/evm/go.mod
    • Uncommented replace directives to enable local module replacements.
  • apps/evm/go.sum
    • Removed outdated module hashes for github.com/evstack/ev-node and github.com/evstack/ev-node/execution/evm.
  • apps/testapp/go.mod
    • Uncommented a replace directive to enable local module replacement.
  • apps/testapp/go.sum
    • Removed outdated module hashes for github.com/evstack/ev-node.
  • block/internal/executing/executor.go
    • Added a call to migrateLegacyPendingBlock during state initialization.
    • Updated pending block retrieval to use e.getPendingBlock instead of e.store.GetBlockData.
    • Refactored early block saving into a new e.savePendingBlock function.
    • Added a call to e.deletePendingBlock before committing the batch.
  • block/internal/executing/executor_restart_test.go
    • Updated context.Background() calls to t.Context() for better test isolation.
    • Modified the test to use exec1.savePendingBlock for simulating pending block storage.
  • block/internal/executing/pending.go
    • Added a new file to encapsulate pending block management logic.
    • Implemented getPendingBlock to retrieve pending blocks from metadata.
    • Implemented savePendingBlock to store blocks as pending in metadata.
    • Implemented deletePendingBlock to remove pending block metadata.
    • Implemented migrateLegacyPendingBlock to handle migration of old-style pending blocks.
  • pkg/store/header_store_adapter_test.go
    • Updated context.Background() calls to t.Context() for better test isolation.
  • pkg/store/keys.go
    • Made getHeaderKey, getDataKey, getSignatureKey, getMetaKey, and getIndexKey functions public (Get*Key) and added internal aliases.
  • pkg/store/store.go
    • Updated internal calls to getMetaKey to use the newly public GetMetaKey function.
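A minimal sketch of the get/save/delete round trip described for pending.go, assuming a plain in-memory map in place of the real metadata store. The names memMeta, savePending, getPending, deletePending and errNotFound (standing in for ds.ErrNotFound) are hypothetical; the key strings only mirror the pending_header/pending_data names mentioned in the review, and header/data are raw bytes rather than the real types.

```go
package main

import "errors"

// Hypothetical key names mirroring the ones discussed in the review.
const (
	pendingHeaderKey = "pending_header"
	pendingDataKey   = "pending_data"
)

// errNotFound stands in for ds.ErrNotFound in this sketch.
var errNotFound = errors.New("not found")

// memMeta is a toy metadata store keyed by string.
type memMeta map[string][]byte

func (m memMeta) get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errNotFound
	}
	return v, nil
}

// savePending stores header and data bytes under the pending keys.
func savePending(m memMeta, header, data []byte) {
	m[pendingHeaderKey] = header
	m[pendingDataKey] = data
}

// getPending returns (nil, nil, nil) when nothing is pending and an error
// when a header exists without its data.
func getPending(m memMeta) ([]byte, []byte, error) {
	header, err := m.get(pendingHeaderKey)
	if errors.Is(err, errNotFound) {
		return nil, nil, nil
	}
	if err != nil {
		return nil, nil, err
	}
	data, err := m.get(pendingDataKey)
	if errors.Is(err, errNotFound) {
		return nil, nil, errors.New("pending header exists but data is missing: corrupt state")
	}
	if err != nil {
		return nil, nil, err
	}
	return header, data, nil
}

// deletePending removes both keys; deleting a missing key is a no-op.
func deletePending(m memMeta) {
	delete(m, pendingHeaderKey)
	delete(m, pendingDataKey)
}
```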

@claude
Contributor

claude bot commented Feb 13, 2026

Claude finished @alpe's task in 2m 21s.


Code Review Complete

I've completed a comprehensive review of PR #3073, which refactors pending block storage to use dedicated metadata keys instead of storing unconfirmed blocks as regular block data with empty signatures. This is a significant architectural improvement that enhances restart robustness and prevents double-signing issues.


Summary of Changes

Positive aspects:

  • Excellent separation of concerns: pending blocks now use dedicated metadata storage
  • Well-designed migration path for legacy pending blocks
  • Comprehensive test coverage for restart scenarios
  • Clean API design with getPendingBlock, savePendingBlock, and deletePendingBlock helpers

Areas requiring attention:

  • Critical error handling issues that could lead to data inconsistency
  • Missing test coverage (36% patch coverage per Codecov)
  • Some concurrency safety concerns
  • Missing context propagation in one location

Critical Issues

1. Silent Failure on Pending Block Deletion (High Priority)

Location: block/internal/executing/executor.go:536-538

if err := e.deletePendingBlock(batch); err != nil {
    e.logger.Warn().Err(err).Uint64("height", newHeight).Msg("failed to delete pending block metadata")
}

Issue: If deletePendingBlock fails, the code logs a warning but continues to commit the batch. This leaves stale pending block metadata in the store, which could cause issues on subsequent restarts.

Impact:

  • Stale metadata pollutes storage
  • Could indicate underlying batch/store corruption
  • May confuse restart logic on next node startup

Recommendation: Treat this as a critical error and abort the commit:

if err := e.deletePendingBlock(batch); err != nil {
    return fmt.Errorf("failed to delete pending block metadata for height %d: %w", newHeight, err)
}

This aligns with Gemini's review comment and ensures state consistency.
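A minimal sketch of the fail-closed behaviour recommended above, assuming a hypothetical two-method batch interface (ev-node's store.Batch has its own API). If either pending-key delete fails, commitBlock returns before Commit is called, so nothing from that batch is persisted.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical minimal interface, not ev-node's store.Batch.
type batch interface {
	Delete(key string) error
	Commit() error
}

// fakeBatch records whether Commit ran and can be told to fail deletes.
type fakeBatch struct {
	failDelete bool
	committed  bool
}

func (b *fakeBatch) Delete(key string) error {
	if b.failDelete {
		return errors.New("delete failed")
	}
	return nil
}

func (b *fakeBatch) Commit() error {
	b.committed = true
	return nil
}

// commitBlock removes the pending-block metadata in the same batch as the
// block write and aborts without committing if that removal fails.
func commitBlock(b batch, height uint64) error {
	for _, key := range []string{"pending_header", "pending_data"} {
		if err := b.Delete(key); err != nil {
			return fmt.Errorf("failed to delete pending block metadata for height %d: %w", height, err)
		}
	}
	return b.Commit()
}
```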


2. Incomplete Error Detection in Pending Block Retrieval (Medium Priority)

Location: block/internal/executing/pending.go:29-35

dataBytes, err := e.store.GetMetadata(ctx, dataKey)
if err != nil {
    if errors.Is(err, ds.ErrNotFound) {
        return nil, nil, fmt.Errorf("pending header exists but data is missing: corrupt state")
    }
    return nil, nil, err
}

Issue: When a pending header exists without corresponding data, this indicates an inconsistent state. While the code correctly returns an error, it doesn't log this condition, making debugging difficult.

Recommendation: Add warning log before returning the error:

dataBytes, err := e.store.GetMetadata(ctx, dataKey)
if err != nil {
    if errors.Is(err, ds.ErrNotFound) {
        e.logger.Warn().Msg("found pending header without pending data: corrupt state detected")
        return nil, nil, fmt.Errorf("pending header exists but data is missing: corrupt state")
    }
    return nil, nil, err
}

This helps with operational observability.


Medium Priority Issues

3. Context Not Passed to deletePendingBlock

Location: block/internal/executing/executor.go:536 and block/internal/executing/pending.go:81

Issue: The deletePendingBlock function signature doesn't accept a context.Context, but it performs datastore operations through a batch. While the batch was created with a context, best practice is to explicitly pass context to all I/O operations for proper cancellation handling.

Current signature:

func (e *Executor) deletePendingBlock(batch store.Batch) error

Recommended signature:

func (e *Executor) deletePendingBlock(ctx context.Context, batch store.Batch) error

Rationale: Consistent with Go best practices and other methods like getPendingBlock and savePendingBlock.


4. Migration Logic Doesn't Handle Partial Failures

Location: block/internal/executing/pending.go:99-147

Issue: The migrateLegacyPendingBlock function performs two separate write operations:

  1. savePendingBlock (creates new batch and commits)
  2. Cleanup of old keys (creates second batch and commits)

If the second operation fails after the first succeeds, you'll have duplicate data in both old and new formats.

Recommendation: Consider using a single batch for atomicity:

batch, err := e.store.NewBatch(ctx)
if err != nil {
    return fmt.Errorf("create migration batch: %w", err)
}

// Write to new metadata keys
headerBytes, err := pendingHeader.MarshalBinary()
if err != nil {
    return fmt.Errorf("marshal header: %w", err)
}
dataBytes, err := pendingData.MarshalBinary()
if err != nil {
    return fmt.Errorf("marshal data: %w", err)
}

if err := batch.Put(ds.NewKey(store.GetMetaKey(headerKey)), headerBytes); err != nil {
    return fmt.Errorf("save pending header: %w", err)
}
if err := batch.Put(ds.NewKey(store.GetMetaKey(dataKey)), dataBytes); err != nil {
    return fmt.Errorf("save pending data: %w", err)
}

// Delete old keys in same batch
for _, key := range []string{...} {
    if err := batch.Delete(ds.NewKey(key)); err != nil && !errors.Is(err, ds.ErrNotFound) {
        return fmt.Errorf("delete legacy key %s: %w", key, err)
    }
}

if err := batch.Commit(); err != nil {
    return fmt.Errorf("commit migration: %w", err)
}

This ensures atomic migration.
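To illustrate why a single batch helps, the sketch below stages every put and delete and applies them only in Commit, using a toy map-backed datastore. All names (kv, stagedBatch, migrateLegacy, and the key strings) are hypothetical; in the real code atomicity comes from the datastore's batch commit, not from this in-memory loop.

```go
package main

import "errors"

// kv is a toy datastore.
type kv map[string][]byte

// op is one staged write or delete.
type op struct {
	key   string
	value []byte
	del   bool
}

// stagedBatch buffers operations and applies them only on Commit.
type stagedBatch struct {
	store kv
	ops   []op
}

func (b *stagedBatch) Put(key string, value []byte) {
	b.ops = append(b.ops, op{key: key, value: value})
}

func (b *stagedBatch) Delete(key string) {
	b.ops = append(b.ops, op{key: key, del: true})
}

func (b *stagedBatch) Commit() {
	for _, o := range b.ops {
		if o.del {
			delete(b.store, o.key)
			continue
		}
		b.store[o.key] = o.value
	}
}

// migrateLegacy moves a block stored under legacy keys to the pending
// metadata keys in one batch; an incomplete legacy block leaves the store untouched.
func migrateLegacy(store kv, legacyHeaderKey, legacyDataKey string) error {
	header, okH := store[legacyHeaderKey]
	data, okD := store[legacyDataKey]
	if !okH && !okD {
		return nil // nothing to migrate
	}
	if !okH || !okD {
		return errors.New("legacy pending block is incomplete")
	}
	b := &stagedBatch{store: store}
	b.Put("m/pending_header", header)
	b.Put("m/pending_data", data)
	b.Delete(legacyHeaderKey)
	b.Delete(legacyDataKey)
	b.Commit()
	return nil
}
```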


5. Missing TODO Link Per Maintainer Request

Location: block/internal/executing/executor.go:273-275

Issue: Reviewer julienrbrt requested linking issue #2795 as a TODO comment for the migration code.

Current:

// Migrate any old-style pending block (stored at height N+1 via SaveBlockData
// with empty signature) to the new metadata-key format.
// Todo remove in the future: https://github.com/evstack/ev-node/issues/2795

Recommended:

// Migrate any old-style pending block (stored at height N+1 via SaveBlockData
// with empty signature) to the new metadata-key format.
// TODO(#2795): Remove migration logic in future release after all nodes have upgraded

This provides better IDE integration and clearer action items.


Architecture & Design

Strengths

  1. Excellent Separation of Concerns: Moving pending blocks to metadata storage is architecturally sound. It clearly separates uncommitted state from committed blocks.

  2. Well-Designed Migration Path: The migrateLegacyPendingBlock function properly detects old-style pending blocks (empty signature at height N+1) and migrates them, preventing double-signing issues during upgrades.

  3. Clean Encapsulation: The new pending.go file nicely encapsulates all pending block logic with clear helper methods.

  4. Good Use of Metadata Store: Using metadata keys (m/pending_header, m/pending_data) is appropriate for transient state that shouldn't be indexed as regular blocks.

Potential Concerns

  1. Key Naming: The constants headerKey = "pending_header" and dataKey = "pending_data" are package-level and could collide with other metadata keys. Consider a prefix namespace:

    const (
        pendingHeaderKey = "block/pending/header"
        pendingDataKey   = "block/pending/data"
    )
  2. No Expiration Mechanism: If a node crashes during block production and never restarts, the pending block metadata persists indefinitely. Consider adding:

    • A timestamp field to pending blocks
    • Validation that checks if pending block is too old (e.g., > 24 hours)
    • Automatic cleanup of stale pending blocks during initialization

Error Handling & Resilience

Strengths

  1. Proper Error Wrapping: All errors use fmt.Errorf with %w for proper error chain preservation.
  2. Datastore Error Handling: Correctly distinguishes between ds.ErrNotFound and other errors.
  3. Defensive Programming: The migration checks for len(pendingHeader.Signature) != 0 to avoid migrating already-signed blocks.

Issues

  1. Silent Deletion Failure: As noted above (Issue #1), failing to delete pending metadata should be fatal, not a warning.

  2. Batch Commit Without Rollback: If batch.Commit() fails in savePendingBlock, there's no explicit rollback. While most datastores handle this automatically, explicit error messages would help debugging.

  3. Unmarshal Error Context: When unmarshaling fails, the error message doesn't include the height or context:

    if err := header.UnmarshalBinary(headerBytes); err != nil {
        return nil, nil, fmt.Errorf("unmarshal pending header: %w", err)
    }

    Consider adding more context for debugging:

    if err := header.UnmarshalBinary(headerBytes); err != nil {
        return nil, nil, fmt.Errorf("unmarshal pending header at height %d: %w", e.getLastState().LastBlockHeight+1, err)
    }

Security Considerations

Positive

  1. No Credential Exposure: No sensitive data logged or exposed in error messages.
  2. Proper Signature Validation: Migration logic correctly checks for empty signatures to identify legacy pending blocks.
  3. No Injection Vulnerabilities: All keys are constructed from constants or properly formatted integers.

Concerns

  1. Double-Signing Prevention: This PR directly addresses the double-signing vulnerability mentioned in the migration comments. The implementation appears sound, but critical integration testing is needed to verify:

    • Node restart with pending block correctly reuses the block
    • No scenario where the same height gets signed twice
    • Migration handles all edge cases (empty blocks, genesis blocks, etc.)
  2. Corrupt State Detection: The code properly detects "pending header without data" as corrupt state, but doesn't have a recovery mechanism. Consider:

    • Auto-cleanup option (with confirmation)
    • Detailed instructions in error message
    • Alert/metric for monitoring
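The restart property to verify (reuse the pending block for height N+1 rather than build and sign a new one) reduces to a small decision function. block, nextBlock and the build callback are hypothetical; returning an error on a height mismatch is an assumption of this sketch, not necessarily what the PR does.

```go
package main

import "fmt"

// block is a simplified stand-in for the header/data pair (hypothetical).
type block struct {
	Height  uint64
	Payload string
}

// nextBlock returns the pending block when one exists for the next height,
// so a restarted node re-proposes the same block instead of signing a
// different one. build is only called when there is nothing to reuse.
func nextBlock(lastHeight uint64, pending *block, build func(h uint64) block) (block, error) {
	next := lastHeight + 1
	if pending != nil {
		if pending.Height != next {
			return block{}, fmt.Errorf("pending block at height %d, expected %d", pending.Height, next)
		}
		return *pending, nil
	}
	return build(next), nil
}
```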

Performance & Resource Efficiency

Observations

  1. Batch Operations: Good use of batching for atomic writes in savePendingBlock; the migration, by contrast, currently commits in two separate batches (see Issue #4).

  2. Minimal Overhead: The pending block check in ProduceBlock is lightweight (single metadata lookup).

  3. No Memory Leaks: Properly handles marshaling/unmarshaling without retaining excess memory.

Recommendations

  1. Consider Batch Reuse: savePendingBlock creates its own batch, while deletePendingBlock receives one as a parameter. Consider making this consistent: either always create batches internally or always pass them in.

  2. Hash Calculation Optimization: In migration, sha256.Sum256(headerBytes) is calculated on already-marshaled data, which is efficient. Good optimization.


Testing & Coverage

Current State

Per Codecov report:

  • Patch coverage: 36.36% (63 lines missing coverage)
  • New file pending.go: 28.76% coverage (40 missing, 12 partial)
  • Modified executor.go: 25.00% coverage (2 missing, 4 partial)

Existing Tests

The PR includes executor_restart_test.go with two test cases:

  1. TestExecutor_RestartUsesPendingHeader - Tests restart with pending block (✅ Excellent)
  2. TestExecutor_RestartNoPendingHeader - Tests normal restart without pending block (✅ Good)

Missing Test Coverage

Critical scenarios not covered:

  1. Migration Edge Cases:

    • Legacy pending block at wrong height
    • Legacy pending block with non-empty signature (error case)
    • Multiple legacy pending blocks (shouldn't happen, but should be tested)
    • Migration failure scenarios
  2. Corrupt State Scenarios:

    • Pending header exists but data missing (error path at lines 31-35)
    • Pending data exists but header missing (currently not checked)
    • Malformed pending header/data (unmarshal errors)
  3. Concurrent Access:

    • Multiple goroutines calling savePendingBlock
    • Deletion during retrieval
  4. Batch Operation Failures:

    • batch.Commit() failure in savePendingBlock
    • batch.Delete() failure in deletePendingBlock

Recommended Additional Tests

func TestPendingBlock_CorruptState(t *testing.T) {
    // Test header without data
    // Test data without header
    // Test malformed data
}

func TestMigrateLegacyPendingBlock_EdgeCases(t *testing.T) {
    // Test with signed block (should error)
    // Test with no legacy block (should no-op)
    // Test cleanup of all legacy keys
}

func TestPendingBlock_BatchFailures(t *testing.T) {
    // Mock batch.Commit() failure
    // Mock batch.Delete() failure
}



Code Quality & Go Best Practices

Strengths

  1. Idiomatic Error Handling: Consistent use of error wrapping with context.
  2. Good Naming: Function and variable names are clear and descriptive.
  3. Proper Use of Context: Context propagated through most operations (except deletePendingBlock).
  4. No Magic Numbers: Constants defined for metadata keys.

Minor Issues

  1. Inconsistent Batch Handling: As noted earlier, savePendingBlock creates its own batch, but deletePendingBlock receives one. Consider consistency.

  2. Missing Package Documentation: The new pending.go file lacks package-level documentation explaining the pending block mechanism.

  3. Test Context Usage: Good improvement in test file changing context.Background() to t.Context() for better test isolation (lines 82, 104, etc.).


Recommendations Summary

Must Fix (Before Merge)

  1. ✅ Change pending block deletion failure from warning to error (Issue #1)
  2. ✅ Add warning log for corrupt state detection (Issue #2)
  3. ✅ Update TODO comment format per maintainer request (Issue #5)

Should Fix (High Priority)

  1. Add context parameter to deletePendingBlock (Issue #3)
  2. Make migration atomic with single batch (Issue #4)
  3. Add test coverage for error paths (target >70% coverage)

Nice to Have (Future Improvements)

  1. Add pending block expiration/cleanup mechanism
  2. Improve key naming with namespace prefixes
  3. Add package documentation to pending.go
  4. Make batch handling consistent across pending block operations

Conclusion

This is a solid architectural improvement that addresses a real issue (double-signing on restart). The core logic is sound, but there are critical error handling issues that must be addressed before merging.

Overall Assessment: Approve with required changes

The refactoring correctly separates pending blocks from committed blocks and includes a well-designed migration path. However, the silent failure on pending block deletion (Issue #1) is a blocker that could lead to data inconsistency. Once the must-fix issues are addressed and test coverage improves, this will be a valuable enhancement to the codebase.


@github-actions
Contributor

github-actions bot commented Feb 13, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Feb 16, 2026, 9:51 AM

@codecov
Copy link

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 36.36364% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.00%. Comparing base (c048201) to head (eb7d790).

Files with missing lines | Patch % | Lines
block/internal/executing/pending.go | 28.76% | 40 Missing and 12 partials ⚠️
block/internal/executing/executor.go | 25.00% | 2 Missing and 4 partials ⚠️
pkg/store/store.go | 44.44% | 1 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3073      +/-   ##
==========================================
- Coverage   61.25%   61.00%   -0.25%     
==========================================
  Files         112      113       +1     
  Lines       11355    11431      +76     
==========================================
+ Hits         6956     6974      +18     
- Misses       3630     3676      +46     
- Partials      769      781      +12     
Flag | Coverage Δ
combined | 61.00% <36.36%> (-0.25%) ⬇️


e.logger.Info().Uint64("height", state.LastBlockHeight).
Str("chain_id", state.ChainID).Msg("initialized state")

// Migrate any old-style pending block (stored at height N+1 via SaveBlockData
Member

Can we link #2795 as a todo.

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request refactors the storage of pending blocks by moving them from the main block storage to a separate metadata area. This is a good architectural improvement that isolates uncommitted data. The changes include a new pending.go file with helpers for managing pending blocks, a migration path for old-style pending blocks, and updates to the block production logic to use the new mechanism.

My review includes a couple of suggestions to improve robustness and observability. One suggestion is to log a warning when an inconsistent pending block state is detected (e.g., a header without data). Another is to handle failures in deleting pending blocks more strictly by aborting the block commit, which would prevent leaving stale data in the store.

Comment on lines 535 to 537
if err := e.deletePendingBlock(e.ctx, batch); err != nil {
    e.logger.Warn().Err(err).Uint64("height", newHeight).Msg("failed to delete pending block metadata")
}
Contributor

medium

If deletePendingBlock fails, the current implementation logs a warning and proceeds to commit the batch. This leaves a stale pending block in the metadata store. While this might not cause issues in the next block production cycle (due to height checks), it indicates an underlying problem with the batch or store and pollutes the storage. It would be safer to treat this as a critical error and abort the commit by returning an error. This ensures that the state remains consistent.

Suggested change
- if err := e.deletePendingBlock(e.ctx, batch); err != nil {
-     e.logger.Warn().Err(err).Uint64("height", newHeight).Msg("failed to delete pending block metadata")
- }
+ if err := e.deletePendingBlock(e.ctx, batch); err != nil {
+     return fmt.Errorf("failed to delete pending block metadata for height %d: %w", newHeight, err)
+ }

Comment on lines 30 to 36
dataBytes, err := e.store.GetMetadata(ctx, dataKey)
if err != nil {
    if errors.Is(err, ds.ErrNotFound) {
        return nil, nil, nil
    }
    return nil, nil, err
}
Contributor

medium

The current logic correctly handles the case where pending data is not found after a pending header has been found, but it does so silently. If a pending header exists without corresponding data, it indicates an inconsistent state that should be logged as a warning. This would help in debugging potential issues where pending blocks are not saved correctly.

Suggested change
  dataBytes, err := e.store.GetMetadata(ctx, dataKey)
  if err != nil {
      if errors.Is(err, ds.ErrNotFound) {
+         e.logger.Warn().Msg("found pending header without pending data, ignoring")
          return nil, nil, nil
      }
      return nil, nil, err
  }

alpe added 5 commits February 16, 2026 10:31
* main:
  build(deps): Bump github.com/pion/dtls/v3 from 3.0.6 to 3.0.11 (#3068)
  feat: block Pruning (#2984)
(cherry picked from commit 354bc76e32eb7a150d1437aa9674b697cd95af09)
* main:
  chore: Fix mismatched comment in TestCache_WithNilStore function (#3074)
alpe marked this pull request as ready for review February 16, 2026 09:58
julienrbrt changed the title from "chore: store pending block separately" to "refactor: store pending block separately" Feb 16, 2026
Member

julienrbrt left a comment

lgtm!
