Skip to content

refactor: split headers from blocks in storage#96

Open
MegaRedHand wants to merge 11 commits intomainfrom
refactor/split-headers-from-blocks
Open

refactor: split headers from blocks in storage#96
MegaRedHand wants to merge 11 commits intomainfrom
refactor/split-headers-from-blocks

Conversation

@MegaRedHand
Copy link
Collaborator

@MegaRedHand MegaRedHand commented Feb 5, 2026

Required for #93

This PR streamlines block storage by splitting block headers from bodies. This simplifies the handling of the anchor state, since the block header is readily available from it, but the body is not, even if it's unnecessary for child block processing.

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🤖 Kimi Code Review

Security & Consensus-Critical Issues

  1. Critical: on_block uses has_state instead of contains_block (blockchain/src/store.rs:328)

    • This change breaks the duplicate block check. A block root is derived from the block header, but has_state checks for state existence. This allows duplicate blocks to be processed if their state hasn't been computed yet, violating consensus rules.
  2. Critical: Missing block body validation in get_signed_block (storage/src/store.rs:571-576)

    • When reconstructing a SignedBlockWithAttestation, there's no verification that the header's body_root matches the actual body. This allows malicious storage corruption to go undetected.
  3. High: Unsafe Block::from_header_and_body panics in production (common/types/src/block.rs:224-227)

    • The debug_assert! will be compiled out in release builds, allowing silent data corruption. Should use Result or proper error handling.

Performance & Resource Issues

  1. Inefficient: Double state root computation (storage/src/store.rs:193-202)

    • init_store computes the state root twice (once zeroed, once final). This is expensive for large states.
  2. Unnecessary: get_signed_block always loads body (storage/src/store.rs:571)

    • Many operations only need headers. Consider a get_signed_block_header method to avoid loading bodies when not needed.

Code Quality & Maintainability

  1. Inconsistent naming: from_anchor_state vs get_forkchoice_store (storage/src/store.rs:123, 149)

    • These methods serve similar purposes but have inconsistent naming patterns. Consider from_genesis_with_block for the latter.
  2. Missing documentation: has_state method (storage/src/store.rs:594-599)

    • New public method lacks documentation about its purpose and guarantees.
  3. Redundant validation: Block::from_header_and_body (common/types/src/block.rs:224-227)

    • The validation is already performed in init_store and other places. Consider making this a simple constructor with validation at call sites.

Minor Issues

  1. Typo: "put non-finalized chain index" comment (storage/src/store.rs:245)

    • Should be "live chain index" to match the table name.
  2. Unused methods: get_gossip_signature and contains_gossip_signature removed (storage/src/store.rs:718-732)

    • Good cleanup, but ensure these weren't needed elsewhere.

Recommendations

  1. Fix the duplicate block check by reverting to contains_block or implementing proper block existence checking.
  2. Add body root verification in get_signed_block to prevent storage corruption attacks.
  3. Replace debug_assert! with proper error handling in Block::from_header_and_body.
  4. Cache the computed state root in init_store to avoid recomputation.
  5. Add get_signed_block_header method for operations that don't need full blocks.

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🤖 Claude Code Review

Perfect. Now let me analyze the overall change for any potential issues. Let me compile the review:

Code Review: PR #96 - Split Headers from Blocks in Storage

Summary

This PR refactors block storage to separate block headers from bodies, preparing for checkpoint sync (issue #93). The changes are well-structured and largely correct, but there are a few potential issues to address.


Critical Issues

1. Release vs Debug Behavior Mismatch - Block::from_header_and_body() (crates/common/types/src/block.rs:225-229)

The body root validation uses debug_assert_eq! instead of a runtime check:

pub fn from_header_and_body(header: BlockHeader, body: BlockBody) -> Self {
    debug_assert_eq!(
        header.body_root,
        body.tree_hash_root(),
        "body root mismatch"
    );

Issue: This is a consensus-critical validation that will be silently skipped in release builds. An attacker could exploit this to create invalid blocks.

Recommendation: Use a runtime check with proper error handling:

pub fn from_header_and_body(header: BlockHeader, body: BlockBody) -> Result<Self, String> {
    let computed_body_root = body.tree_hash_root();
    if header.body_root != computed_body_root {
        return Err(format!(
            "body root mismatch: header={:?}, computed={:?}",
            header.body_root, computed_body_root
        ));
    }
    Ok(Self { ... })
}

Or if the panic is intentional (programmer error only), add a comment explaining why this should never happen in production.


2. Missing Error Handling - get_signed_block() (crates/storage/src/store.rs:569-584)

The method reconstructs blocks but doesn't validate the body root:

let header = BlockHeader::from_ssz_bytes(&header_bytes).expect("valid header");
let body = BlockBody::from_ssz_bytes(&body_bytes).expect("valid body");
let block = Block::from_header_and_body(header, body);

Issue: If from_header_and_body is changed to return a Result, this code will break. Even with debug_assert, there's a conceptual mismatch - we're trusting database integrity without verification.

Recommendation:

  • If you change from_header_and_body to return Result, handle it here
  • Consider adding a comment about database integrity assumptions

Moderate Issues

3. Inconsistent State Checking - blockchain/src/lib.rs:282

Changed from contains_block to has_state:

if !self.store.has_state(&parent_root) {
    info!(%slot, %parent_root, %block_root, "Block parent missing, storing as pending");

Question: Is this semantically correct? Previously we checked for the parent block's existence, now we check for the parent's post-state.

  • If a block header exists but its state hasn't been computed yet, the old code would proceed, the new code would defer
  • This might be intentional (can't process without parent state), but verify this aligns with fork choice requirements

Recommendation: Verify this matches the intended block processing semantics. If correct, consider updating the log message to say "parent state missing" for clarity.


4. Duplicate Checking Semantics - blockchain/src/store.rs:327

Changed from contains_block to has_state for duplicate detection:

if store.has_state(&block_root) {
    return Ok(());
}

Analysis: This is likely correct - if we have the state, we've processed the block. But it changes semantics slightly:

  • Old: "have we seen this block?"
  • New: "have we processed this block?"

If a block is stored but state computation failed, the old code would skip it, the new code would retry. This is probably an improvement, but should be verified.


Minor Issues

5. Missing Serialize Trait - BlockHeader (crates/common/types/src/block.rs:179)

BlockHeader derives Serialize but not Deserialize:

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Encode, Decode, TreeHash)]
pub struct BlockHeader {

Issue: This is asymmetric. If serialization is needed (e.g., for JSON APIs), deserialization likely is too.

Recommendation: Either add Deserialize or confirm Serialize is needed without deserialization.


6. Removed Helper Methods - No Usage Found

Removed contains_block, insert_block, get_gossip_signature, contains_gossip_signature, and iter_states methods.

Status: Verified these aren't used elsewhere in the codebase. ✅


Positive Observations

  1. Good separation of concerns: Headers stored separately from bodies enables efficient chain tracking
  2. Proper ownership semantics: insert_signed_block takes ownership to avoid cloning large signatures
  3. Clear documentation: New methods have good doc comments explaining behavior
  4. Backward compatibility: Genesis block special case (no body) is properly handled
  5. State root validation: init_store properly validates state root consistency
  6. Consistent API: get_block_header follows naming conventions

Testing Recommendations

  1. Verify the fork choice spectests pass with skip-signature-verification
  2. Test checkpoint sync scenario with from_anchor_state (no body)
  3. Test genesis initialization still works correctly
  4. Verify from_header_and_body properly validates body roots in debug builds
  5. Add a test that explicitly checks database corruption detection

Summary

The refactoring is well-designed and mostly correct. The main concern is the consensus-critical body root validation using debug_assert - this should use runtime validation or have clear justification. The semantic changes from contains_block to has_state should be verified against fork choice requirements.


Automated review by Claude (Anthropic) · custom prompt

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🤖 Codex Code Review

Found a few issues worth addressing; rest looks reasonable.

Findings (ordered by severity)

  • Critical: Block::from_header_and_body uses debug_assert_eq!, which is compiled out in release, so mismatched header/body pairs will be accepted silently. This can let corrupted or maliciously mixed data pass integrity checks and propagate invalid blocks. Use assert_eq! or return Result and handle it at call sites. crates/common/types/src/block.rs:224
  • High (availability/compat): from_anchor_state no longer stores a block body for the anchor. get_signed_block now requires a body, so requests for the anchor block (including genesis) will return None and peers may fail to fetch it during sync. Consider storing BlockBody::default() for genesis or allowing get_signed_block to reconstruct from header + default body when root is the anchor/genesis. crates/storage/src/store.rs:120, crates/storage/src/store.rs:565
  • Medium (storage migration): DB schema splits Blocks into BlockHeaders/BlockBodies. Existing DBs with a blocks CF will not be read, and the old data becomes orphaned unless you add migration or backward-compat logic. If this is intended, consider a migration path or an explicit version bump to avoid silent data loss. crates/storage/src/api/tables.rs:3

Notes

  • The move to header-only lookups in fork-choice and attestation validation is a nice performance win; the usage is consistent and avoids unnecessary deserialization.

If you want, I can propose concrete fixes for the debug_assert and anchor body handling.


Automated review by OpenAI Codex · custom prompt

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR refactors block storage by splitting block headers from block bodies into separate database tables. This architectural change simplifies handling of the anchor state, where block headers are readily available but bodies may not be.

Key Changes

  • Split Blocks table into BlockHeaders and BlockBodies tables
  • Renamed Store::from_genesis() to Store::from_anchor_state() to better reflect its purpose
  • Added Store::init_store() as internal initialization helper supporting optional body storage
  • Replaced get_block() calls throughout codebase with get_block_header() where only header data needed
  • Replaced contains_block() with has_state() for parent existence checks (more semantically accurate)
  • Added Block::header() and Block::from_header_and_body() helper methods
  • Removed unused functions: insert_block(), contains_block(), get_gossip_signature(), contains_gossip_signature(), iter_states()

Architecture Benefits

The split storage provides several advantages:

  • Headers can be retrieved without deserializing full block bodies
  • Anchor state initialization can work without block body availability
  • More efficient fork choice operations that only need header metadata (slot, parent_root)
  • Cleaner separation of concerns in storage layer

Implementation Quality

The refactoring is systematic and thorough across all layers (storage, blockchain, network, tests). The API surface has been appropriately updated with consistent naming conventions.

Confidence Score: 3/5

  • Safe to merge with one critical fix needed for body root validation in release builds
  • The refactoring is well-structured and systematically updates all affected code paths. However, there is one critical issue: Block::from_header_and_body() only validates body root consistency in debug builds via debug_assert_eq!. In release builds, mismatched headers and bodies would be silently accepted, potentially causing state corruption when reconstructing blocks from storage. This needs to be changed to a runtime assert_eq! to prevent data integrity issues.
  • Pay close attention to crates/common/types/src/block.rs - the body root validation must be fixed before merging

Important Files Changed

Filename Overview
crates/storage/src/api/tables.rs split Blocks table into BlockHeaders and BlockBodies tables, updated ALL_TABLES count
crates/common/types/src/block.rs added PartialEq, Eq to BlockHeader, added header() and from_header_and_body() methods; body root validation only in debug mode
crates/storage/src/store.rs major refactoring: split block storage into headers/bodies, renamed from_genesis to from_anchor_state, refactored initialization logic, removed unused functions, updated all block access to use headers
crates/blockchain/src/store.rs replaced all get_block calls with get_block_header, replaced contains_block with has_state

Sequence Diagram

sequenceDiagram
    participant Client
    participant Store
    participant Backend
    participant BlockHeadersTable
    participant BlockBodiesTable
    participant StatesTable

    Note over Client,StatesTable: Block Storage Flow (New Architecture)

    Client->>Store: insert_signed_block(root, signed_block)
    Store->>Store: extract header via block.header()
    Store->>Store: extract body from block.body
    Store->>Backend: begin_write()
    Backend-->>Store: WriteBatch
    Store->>Backend: put_batch(BlockHeaders, header)
    Store->>Backend: put_batch(BlockBodies, body)
    Store->>Backend: put_batch(BlockSignatures, signatures)
    Store->>Backend: commit()

    Note over Client,StatesTable: Block Retrieval Flow

    Client->>Store: get_signed_block(root)
    Store->>Backend: begin_read()
    Backend-->>Store: ReadView
    Store->>BlockHeadersTable: get(root)
    BlockHeadersTable-->>Store: header_bytes
    Store->>BlockBodiesTable: get(root)
    BlockBodiesTable-->>Store: body_bytes
    Store->>Store: Block::from_header_and_body(header, body)
    Store-->>Client: SignedBlockWithAttestation

    Note over Client,StatesTable: Header-Only Access (Optimized)

    Client->>Store: get_block_header(root)
    Store->>Backend: begin_read()
    Backend-->>Store: ReadView
    Store->>BlockHeadersTable: get(root)
    BlockHeadersTable-->>Store: header_bytes
    Store-->>Client: BlockHeader

    Note over Client,StatesTable: Initialization from Anchor State

    Client->>Store: from_anchor_state(backend, state)
    Store->>Store: extract header from state.latest_block_header
    Store->>Backend: put_batch(BlockHeaders, header)
    Note over Store,Backend: Body NOT stored (not available)
    Store->>Backend: put_batch(States, state)
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant