feat(accountsdb): implement external snapshot insertion#1031
Conversation
Note: Reviews paused. This branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits.
📝 Walkthrough

Replaces directory-only snapshots with tar.gz archives and adds tar/flate2 dependencies. Snapshot creation is two-phase: create the snapshot directory while holding a write lock, then archive the directory to tar.gz in the background and register the archive (removing the directory). SnapshotManager gains archive creation/validation/extraction/atomic-swap/pruning and external-archive insertion/fast-forward. The AccountsDb API adds take_snapshot, insert_external_snapshot, lock_database, and an unsafe checksum variant. Tests are updated to use deterministic snapshot slots and to validate the archive lifecycle and fast-forward behavior.
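A minimal std-only sketch of the two-phase flow the walkthrough describes (all names here are hypothetical stand-ins, not the crate's actual API): phase 1 captures state while holding the global write lock, phase 2 archives it in a background thread after the lock is released.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Phase 1: stop-the-world capture under the write lock.
// Phase 2: slow archival work off the lock, in the background.
fn take_snapshot(db: Arc<RwLock<Vec<u8>>>) -> thread::JoinHandle<String> {
    let staged: Vec<u8> = {
        let guard = db.write().unwrap();
        (*guard).clone() // stands in for copying the snapshot directory
    }; // write lock released here; writers may resume

    thread::spawn(move || {
        // Real code would tar.gz the staged directory and register the archive.
        format!("archived {} bytes", staged.len())
    })
}

fn main() {
    let db = Arc::new(RwLock::new(vec![1, 2, 3]));
    let handle = take_snapshot(db);
    assert_eq!(handle.join().unwrap(), "archived 3 bytes");
    println!("ok");
}
```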
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 226-246: In find_and_remove_snapshot, avoid
registry.remove(index).unwrap(): replace the unwrap with explicit handling by
matching registry.remove(index) (or using
registry.get(index).cloned().ok_or(...)? then remove) and return an
AccountsDbError::SnapshotMissing(target_slot) (or a more specific error) if
removal fails; ensure chosen_archive is a PathBuf and preserve existing parsing
via Self::parse_slot, keeping the info! log and returning Ok((chosen_archive,
chosen_slot, index)).
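As a self-contained illustration of the fix described above (the error enum and helper below are simplified stand-ins for the real `AccountsDbError` machinery): `VecDeque::remove` returns an `Option`, which can be mapped to an error instead of unwrapped.

```rust
use std::collections::VecDeque;

// Hypothetical, reduced error type standing in for AccountsDbError.
#[derive(Debug)]
enum AccountsDbError {
    SnapshotMissing(u64),
}

// Handle the Option from VecDeque::remove explicitly instead of unwrapping.
fn take_snapshot_at(
    registry: &mut VecDeque<String>,
    index: usize,
    target_slot: u64,
) -> Result<String, AccountsDbError> {
    registry
        .remove(index)
        .ok_or(AccountsDbError::SnapshotMissing(target_slot))
}

fn main() {
    let mut registry: VecDeque<String> =
        VecDeque::from(vec!["snapshot-000000010.tar.gz".to_string()]);
    assert_eq!(
        take_snapshot_at(&mut registry, 0, 10).unwrap(),
        "snapshot-000000010.tar.gz"
    );
    // Out-of-bounds index yields an error instead of a panic.
    assert!(take_snapshot_at(&mut registry, 5, 42).is_err());
    println!("ok");
}
```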
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7207e584-ba2a-40d2-819f-84caa4f9f124
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- magicblock-accounts-db/Cargo.toml
- magicblock-accounts-db/src/lib.rs
- magicblock-accounts-db/src/snapshot.rs
- magicblock-accounts-db/src/tests.rs
Branch updated: 99c2c1f to 43a9a23, then 190bd7a to b71e317.
Actionable comments posted: 2
♻️ Duplicate comments (1)
magicblock-accounts-db/src/snapshot.rs (1)
226-246: ⚠️ Potential issue | 🟠 Major

Replace `.unwrap()` with proper error handling.

Line 240 uses `.unwrap()` on `registry.remove(index)`. While the index is derived from `binary_search` on the same locked registry (making this practically safe), the coding guidelines require explicit error handling for any `.unwrap()` in production code.

🛠️ Suggested fix

```diff
-let chosen_archive = registry.remove(index).unwrap();
+let chosen_archive = registry.remove(index).ok_or_else(|| {
+    AccountsDbError::Internal(format!(
+        "Registry index {} out of bounds during snapshot lookup",
+        index
+    ))
+})?;
```

As per coding guidelines: "Treat any usage of `.unwrap()` or `.expect()` in production Rust code as a MAJOR issue."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 226 - 246, In find_and_remove_snapshot, avoid the .unwrap() on registry.remove(index): replace it with explicit handling (e.g., match or if let) so that if removal yields None or an unexpected value you return Err(AccountsDbError::SnapshotMissing(target_slot)) instead of panicking; keep the rest of the flow (parse_slot -> ok_or(...), logging of chosen_slot/target_slot, and returning (chosen_archive, chosen_slot, index)) unchanged and reference registry, chosen_archive, chosen_slot, and Self::parse_slot to locate the exact change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 291-302: fast_forward currently pushes the new archive into the
registry via registry.lock().push_back(...) without invoking pruning, which can
let the registry exceed max_snapshots; update fast_forward in snapshot.rs to
call the existing prune_registry() (or the method that enforces max_snapshots)
after pushing the new entry (or call it before/after atomic_swap as appropriate)
so the registry is trimmed to max_snapshots, referencing the fast_forward,
extract_archive, atomic_swap, prune_registry, and registry.lock().push_back
symbols to locate and modify the code.
- Around line 216-224: The current validate_archive(bytes: &[u8]) implementation
only calls tar.entries() which doesn't detect truncated archives; update
validate_archive to iterate the Archive::entries() iterator and for each entry
attempt to fully read or validate the entry (e.g., read its header and drain its
contents) so any I/O or truncated-data errors surface during validation, and
propagate/log those errors via AccountsDbResult (replace the single
tar.entries() check with a loop that calls a read/drain operation on each Entry
and propagates failures).
---
Duplicate comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 226-246: In find_and_remove_snapshot, avoid the .unwrap() on
registry.remove(index): replace it with explicit handling (e.g., match or if
let) so that if removal yields None or an unexpected value you return
Err(AccountsDbError::SnapshotMissing(target_slot)) instead of panicking; keep
the rest of the flow (parse_slot -> ok_or(...), logging of
chosen_slot/target_slot, and returning (chosen_archive, chosen_slot, index))
unchanged and reference registry, chosen_archive, chosen_slot, and
Self::parse_slot to locate the exact change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d51a36a2-27d5-4a9f-8242-6aebee487b2a
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- magicblock-accounts-db/Cargo.toml
- magicblock-accounts-db/src/lib.rs
- magicblock-accounts-db/src/snapshot.rs
- magicblock-accounts-db/src/tests.rs
Manual Deploy Available: you can trigger a manual deploy of this PR branch to testnet.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
magicblock-accounts-db/src/snapshot.rs (1)
46-58: ⚠️ Potential issue | 🔴 Critical

Do not release the global write lock before copying the live index files.

Line 56 resumes writers while `deep_copy_dir` is still copying everything except `accounts.db` from the live directory. On non-CoW filesystems, that can snapshot `accounts.db` at one point in time and the LMDB index at another, yielding a mismatched or unrecoverable archive.

🛠️ Minimal correctness fix

```diff
 Self::LegacyCopy => {
-    drop(lock); // Release lock before slow I/O
-    fs_backend::deep_copy_dir(src, dst, &memory_state)
+    fs_backend::deep_copy_dir(src, dst, &memory_state)?;
+    drop(lock);
+    Ok(())
 }
```

Based on learnings: "In magicblock-validator, the AccountsDb 'stop-the-world' synchronizer is managed at the processor/executor level... Snapshot operations acquire a write lock, blocking until all executors release their read locks."
Also applies to: 435-457
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 46 - 58, The LegacyCopy branch in execute currently drops the RwLockWriteGuard (lock) before calling fs_backend::deep_copy_dir, which allows writers to resume while the live index files are still being copied; instead keep the write lock held for the entire duration of the copy to ensure a consistent snapshot: remove the drop(lock) and call fs_backend::deep_copy_dir while the RwLockWriteGuard<()>, passed as lock, remains in scope (i.e., held) so the write lock isn't released until after deep_copy_dir returns; apply the same change to the other execute-like copy path referenced (the similar LegacyCopy handling around the other occurrence).
♻️ Duplicate comments (1)
magicblock-accounts-db/src/snapshot.rs (1)
228-241: ⚠️ Potential issue | 🟠 Major

Replace the production `unwrap()` with an explicit error path.

Even if the index is expected to exist, Line 240 should not panic in production code.

🛠️ Proposed fix

```diff
-let chosen_archive = registry.remove(index).unwrap();
+let Some(chosen_archive) = registry.remove(index) else {
+    return Err(AccountsDbError::Internal(format!(
+        "snapshot registry lost index {index} while restoring slot {target_slot}"
+    )));
+};
```

As per coding guidelines: "Treat any usage of `.unwrap()` or `.expect()` in production Rust code as a MAJOR issue."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 228 - 241, The code in find_and_remove_snapshot uses registry.remove(index).unwrap(), which can panic in production; replace the unwrap with an explicit error branch that returns an AccountsDbResult::Err (e.g., AccountsDbError::SnapshotMissing(target_slot) or a more specific error) when registry.remove(index) yields None, then proceed to parse the chosen_archive via Self::parse_slot(&chosen_archive) as before; ensure you also propagate any parse errors from Self::parse_slot by returning a proper AccountsDbError rather than panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/lib.rs`:
- Around line 397-408: The code currently swaps in the new snapshot as soon as
snapshot_manager.insert_external_snapshot returns true, then calls
self.storage.reload and self.index.reload which can fail and leave the DB
replaced; instead, change the flow so you validate the extracted snapshot before
deleting the backup or flipping db_path: modify insert_external_snapshot
behavior or add a new operation on SnapshotManager (e.g.
extract_external_snapshot or snapshot_manager.extracted_path()) that only
extracts and returns the extracted path without committing; then open/validate
that path (e.g. attempt to open a temporary Storage/Index instance or run a
light integrity check) and only if both validations succeed perform the actual
swap/commit (or call snapshot_manager.commit_fast_forward); alternatively, keep
the backup until both self.storage.reload(path) and self.index.reload(path)
succeed and on any error call snapshot_manager.restore_backup() to revert to the
previous db_path; reference snapshot_manager.insert_external_snapshot,
fast_forwarded, snapshot_manager.database_path, self.storage.reload,
self.index.reload when applying this change.
- Around line 286-289: Change the method signature of set_slot from taking
&Arc<Self> to taking &self and adjust its implementation to call
self.storage.update_slot(slot) as before; specifically, update the receiver on
the pub fn set_slot declaration to &self (not &Arc<Self>) so callers can call
set_slot on any &State without requiring an Arc, and leave the body using
storage.update_slot(slot) since storage already uses interior mutability.
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 111-123: Call prune_registry only after the new snapshot/archive
has been durably created and recorded: move the prune_registry() invocation from
before SnapshotStrategy::execute to after strategy.execute(...) has returned Ok
and any registration of the new snapshot (the code path that makes the snapshot
visible/recorded) has completed, so you never delete the last good restore point
on a failed copy/archive; apply the same change to the other occurrence
referenced around lines 151-152 (the other prune_registry call), and ensure any
in-flight snapshot accounting (used to enforce max_snapshots) is updated before
pruning so overlapping background snapshots are counted correctly.
- Around line 342-368: The current startup recovery collects snapshot archive
paths into `paths`, trims to the newest `max` by computing `offset` and
returning `paths.into_iter().skip(offset).collect()`, but it does not remove the
trimmed files from disk causing orphaned archives; update this logic to remove
on-disk files for entries that will be dropped (e.g., iterate the older set:
`paths.into_iter().take(offset)`), call `fs::remove_file` (or
`fs::remove_dir_all` for any orphan dirs if applicable), log success/failure
(use the existing logger and include the path), and then return the remaining
paths (the ones after `offset`) so `parse_slot`, `SNAPSHOT_PREFIX`, `max`, and
`offset` behavior remain intact.
- Around line 304-307: register_archive currently always appends
(registry.lock().push_back(archive_path)) which breaks the binary-search
assumptions in find_and_remove_snapshot and snapshot_exists; change
register_archive to insert the new archive_path into registry in sorted order
instead of always pushing back: compute the slot for archive_path (reuse or call
the same helper used by snapshot_exists/find_and_remove_snapshot to extract slot
from a PathBuf), lock the registry, find the insertion index with
binary_search_by or partition_point comparing slots of existing PathBuf entries,
and insert at that index (registry.lock().insert(idx, archive_path)) so the
deque remains sorted by slot and binary-search lookups continue to work.
---
Outside diff comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 46-58: The LegacyCopy branch in execute currently drops the
RwLockWriteGuard (lock) before calling fs_backend::deep_copy_dir, which allows
writers to resume while the live index files are still being copied; instead
keep the write lock held for the entire duration of the copy to ensure a
consistent snapshot: remove the drop(lock) and call fs_backend::deep_copy_dir
while the RwLockWriteGuard<()>, passed as lock, remains in scope (i.e., held) so
the write lock isn't released until after deep_copy_dir returns; apply the same
change to the other execute-like copy path referenced (the similar LegacyCopy
handling around the other occurrence).
---
Duplicate comments:
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 228-241: The code in find_and_remove_snapshot uses
registry.remove(index).unwrap(), which can panic in production; replace the
unwrap with an explicit error branch that returns an AccountsDbResult::Err
(e.g., AccountsDbError::SnapshotMissing(target_slot) or a more specific error)
when registry.remove(index) yields None, then proceed to parse the
chosen_archive via Self::parse_slot(&chosen_archive) as before; ensure you also
propagate any parse errors from Self::parse_slot by returning a proper
AccountsDbError rather than panicking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fdc6023b-6117-49c4-8dcb-ada0a3d68452
📒 Files selected for processing (7)
- config.example.toml
- magicblock-accounts-db/src/lib.rs
- magicblock-accounts-db/src/snapshot.rs
- magicblock-accounts-db/src/tests.rs
- magicblock-config/src/config/accounts.rs
- magicblock-config/src/consts.rs
- magicblock-config/src/tests.rs
💤 Files with no reviewable changes (4)
- magicblock-config/src/config/accounts.rs
- magicblock-config/src/tests.rs
- magicblock-config/src/consts.rs
- config.example.toml
```rust
let current_slot = self.slot();
let fast_forwarded = self.snapshot_manager.insert_external_snapshot(
    slot,
    archive_bytes,
    current_slot,
)?;

if fast_forwarded {
    // Reload components to reflect new state
    let path = self.snapshot_manager.database_path();
    self.storage.reload(path)?;
    self.index.reload(path)?;
```
Validate the staged snapshot before replacing the live DB.
Once Line 398 returns true, SnapshotManager::fast_forward has already swapped db_path. If Line 407 or Line 408 fails, this method returns Err after the old database has been replaced and its backup removed. A malformed-but-extractable external archive can therefore destroy the current DB. Validate/open the extracted snapshot first, or keep the backup until both reloads succeed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/lib.rs` around lines 397 - 408, The code currently
swaps in the new snapshot as soon as snapshot_manager.insert_external_snapshot
returns true, then calls self.storage.reload and self.index.reload which can
fail and leave the DB replaced; instead, change the flow so you validate the
extracted snapshot before deleting the backup or flipping db_path: modify
insert_external_snapshot behavior or add a new operation on SnapshotManager
(e.g. extract_external_snapshot or snapshot_manager.extracted_path()) that only
extracts and returns the extracted path without committing; then open/validate
that path (e.g. attempt to open a temporary Storage/Index instance or run a
light integrity check) and only if both validations succeed perform the actual
swap/commit (or call snapshot_manager.commit_fast_forward); alternatively, keep
the backup until both self.storage.reload(path) and self.index.reload(path)
succeed and on any error call snapshot_manager.restore_backup() to revert to the
previous db_path; reference snapshot_manager.insert_external_snapshot,
fast_forwarded, snapshot_manager.database_path, self.storage.reload,
self.index.reload when applying this change.
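The validate-before-commit flow this comment asks for can be sketched abstractly (the `Db` type and helpers below are hypothetical stand-ins for the real `Storage`/`Index` reload path): stage the new state, validate it, and only then swap it in, so a malformed archive never destroys the live database.

```rust
// Hypothetical stand-in for the live database state.
#[derive(Debug, PartialEq)]
struct Db(String);

// Stand-in for opening/integrity-checking the extracted snapshot.
fn validate(staged: &Db) -> Result<(), String> {
    if staged.0.is_empty() {
        Err("malformed snapshot".into())
    } else {
        Ok(())
    }
}

// Commit only after validation succeeds; on error the live DB is untouched.
fn insert_external_snapshot(live: &mut Db, staged: Db) -> Result<(), String> {
    validate(&staged)?;
    *live = staged;
    Ok(())
}

fn main() {
    let mut live = Db("v1".into());
    // A malformed-but-extractable archive is rejected without side effects.
    assert!(insert_external_snapshot(&mut live, Db(String::new())).is_err());
    assert_eq!(live, Db("v1".into()));
    // A valid archive is committed.
    assert!(insert_external_snapshot(&mut live, Db("v2".into())).is_ok());
    assert_eq!(live, Db("v2".into()));
    println!("ok");
}
```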
```diff
 self.prune_registry();

 // 2. Prepare Data Capture
 // If legacy copy, we must capture state while lock is held.
 let snap_path = self.slot_to_dir_path(slot);
 let memory_capture =
     matches!(self.strategy, SnapshotStrategy::LegacyCopy)
         .then(|| active_mem.to_vec())
         .unwrap_or_default();

 // 3. Execute Snapshot
 self.strategy
     .execute(&self.db_path, &snap_path, memory_capture, lock)
     .log_err(|| "Snapshot failed")?;

 // 4. Register success
 self.registry.lock().push_back(snap_path);
-Ok(())
+Ok(snap_path)
```
Prune after the new archive is durably registered, not before.
Line 111 deletes the oldest snapshot before the replacement exists. If the copy/archive step fails, you've already dropped the last good restore point. Overlapping background snapshots also undercount in-flight archives and can exceed max_snapshots.
🛠️ Proposed fix
```diff
-self.prune_registry();
-
 let snap_path = self.slot_to_dir_path(slot);
 let memory_capture =
     matches!(self.strategy, SnapshotStrategy::LegacyCopy)
         .then(|| active_mem.to_vec())
         .unwrap_or_default();
@@
-self.register_archive(archive_path.clone());
+self.register_archive(archive_path.clone());
+self.prune_registry();
 Ok(archive_path)
```

Also applies to: 151-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 111 - 123, Call
prune_registry only after the new snapshot/archive has been durably created and
recorded: move the prune_registry() invocation from before
SnapshotStrategy::execute to after strategy.execute(...) has returned Ok and any
registration of the new snapshot (the code path that makes the snapshot
visible/recorded) has completed, so you never delete the last good restore point
on a failed copy/archive; apply the same change to the other occurrence
referenced around lines 151-152 (the other prune_registry call), and ensure any
in-flight snapshot accounting (used to enforce max_snapshots) is updated before
pruning so overlapping background snapshots are counted correctly.
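The ordering described above, register first and prune second, can be illustrated with a toy registry (the helper name and string paths are hypothetical): the oldest entry is only dropped once the new one is safely in place.

```rust
use std::collections::VecDeque;

// Register the new archive first, then prune, so a failed snapshot never
// costs the last good restore point.
fn register_then_prune(
    registry: &mut VecDeque<String>,
    new_archive: String,
    max_snapshots: usize,
) {
    registry.push_back(new_archive);
    while registry.len() > max_snapshots {
        // The real code would also remove the archive file from disk here.
        registry.pop_front();
    }
}

fn main() {
    let mut registry: VecDeque<String> =
        ["s1", "s2", "s3"].iter().map(|s| s.to_string()).collect();
    register_then_prune(&mut registry, "s4".to_string(), 3);
    let expected: VecDeque<String> =
        ["s2", "s3", "s4"].iter().map(|s| s.to_string()).collect();
    assert_eq!(registry, expected);
    println!("ok");
}
```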
```rust
/// Registers an archive in the registry.
fn register_archive(&self, archive_path: PathBuf) {
    info!(archive_path = %archive_path.display(), "Snapshot registered");
    self.registry.lock().push_back(archive_path);
```
Maintain sorted order in the registry.
find_and_remove_snapshot and snapshot_exists binary-search this deque, but Line 307 always appends. The slot <= current_slot external-insert path can therefore append an older archive after newer ones and break restore/existence lookups.
🛠️ Proposed fix

```diff
 fn register_archive(&self, archive_path: PathBuf) {
     info!(archive_path = %archive_path.display(), "Snapshot registered");
-    self.registry.lock().push_back(archive_path);
+    let mut registry = self.registry.lock();
+    let insert_at = registry
+        .make_contiguous()
+        .binary_search(&archive_path)
+        .unwrap_or_else(|i| i);
+    registry.insert(insert_at, archive_path);
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 304 - 307,
register_archive currently always appends
(registry.lock().push_back(archive_path)) which breaks the binary-search
assumptions in find_and_remove_snapshot and snapshot_exists; change
register_archive to insert the new archive_path into registry in sorted order
instead of always pushing back: compute the slot for archive_path (reuse or call
the same helper used by snapshot_exists/find_and_remove_snapshot to extract slot
from a PathBuf), lock the registry, find the insertion index with
binary_search_by or partition_point comparing slots of existing PathBuf entries,
and insert at that index (registry.lock().insert(idx, archive_path)) so the
deque remains sorted by slot and binary-search lookups continue to work.
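The sorted-insert approach can be demonstrated standalone. Because archive filenames are zero-padded, lexicographic order matches slot order, so a plain `binary_search` on the path strings suffices here (a simplification of the slot-based comparison the prompt suggests):

```rust
use std::collections::VecDeque;

// Keep the deque of archive paths sorted so binary-search lookups
// (as in find_and_remove_snapshot/snapshot_exists) stay valid.
fn register_archive(registry: &mut VecDeque<String>, archive_path: String) {
    let insert_at = registry
        .make_contiguous()
        .binary_search(&archive_path)
        .unwrap_or_else(|i| i); // Err(i) is the sorted insertion point
    registry.insert(insert_at, archive_path);
}

fn main() {
    let mut registry = VecDeque::new();
    // Out-of-order registration, e.g. an external insert of an older slot.
    register_archive(&mut registry, "snapshot-000000300.tar.gz".to_string());
    register_archive(&mut registry, "snapshot-000000100.tar.gz".to_string());
    register_archive(&mut registry, "snapshot-000000200.tar.gz".to_string());
    let v: Vec<_> = registry.iter().cloned().collect();
    assert_eq!(
        v,
        vec![
            "snapshot-000000100.tar.gz",
            "snapshot-000000200.tar.gz",
            "snapshot-000000300.tar.gz"
        ]
    );
    println!("sorted");
}
```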
```diff
 let mut paths: Vec<_> = fs::read_dir(dir)?
     .filter_map(|e| e.ok())
     .map(|e| e.path())
-    .filter(|p| p.is_dir() && Self::parse_slot(p).is_some())
+    .filter(|p| {
+        p.is_file()
+            && p.extension().is_some_and(|e| e == "gz")
+            && Self::parse_slot(p).is_some()
+    })
     .collect();
-paths.sort();

 // Clean orphan directories (interrupted archiving)
 for entry in fs::read_dir(dir)?.filter_map(|e| e.ok()) {
     let path = entry.path();
     let name = path.file_name().and_then(OsStr::to_str);
     if path.is_dir()
         && name.is_some_and(|n| {
             n.starts_with(SNAPSHOT_PREFIX) && !n.contains('.')
         })
     {
         warn!(path = %path.display(), "Cleaning up orphan snapshot directory");
         let _ = fs::remove_dir_all(&path);
     }
 }

+paths.sort(); // Sorting works due to zero-padded filenames

 // Enforce limit strictly on startup
 let offset = paths.len().saturating_sub(max);
 Ok(paths.into_iter().skip(offset).collect())
```
Delete over-limit archives during startup recovery.
Line 368 drops old paths from the in-memory registry but leaves the files on disk. After a restart, anything beyond max_snapshots becomes an orphan that is never pruned again, which defeats the space-saving goal of the archival change.
🛠️ Proposed fix

```diff
 let offset = paths.len().saturating_sub(max);
+for stale in &paths[..offset] {
+    if let Err(e) = fs::remove_file(stale) {
+        warn!(path = %stale.display(), error = ?e, "Failed to prune stale snapshot archive during recovery");
+    }
+}
 Ok(paths.into_iter().skip(offset).collect())
```
+ Ok(paths.into_iter().skip(offset).collect())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let mut paths: Vec<_> = fs::read_dir(dir)? | |
| .filter_map(|e| e.ok()) | |
| .map(|e| e.path()) | |
| .filter(|p| p.is_dir() && Self::parse_slot(p).is_some()) | |
| .filter(|p| { | |
| p.is_file() | |
| && p.extension().is_some_and(|e| e == "gz") | |
| && Self::parse_slot(p).is_some() | |
| }) | |
| .collect(); | |
| paths.sort(); | |
| // Clean orphan directories (interrupted archiving) | |
| for entry in fs::read_dir(dir)?.filter_map(|e| e.ok()) { | |
| let path = entry.path(); | |
| let name = path.file_name().and_then(OsStr::to_str); | |
| if path.is_dir() | |
| && name.is_some_and(|n| { | |
| n.starts_with(SNAPSHOT_PREFIX) && !n.contains('.') | |
| }) | |
| { | |
| warn!(path = %path.display(), "Cleaning up orphan snapshot directory"); | |
| let _ = fs::remove_dir_all(&path); | |
| } | |
| } | |
| paths.sort(); // Sorting works due to zero-padded filenames | |
| // Enforce limit strictly on startup | |
| let offset = paths.len().saturating_sub(max); | |
| Ok(paths.into_iter().skip(offset).collect()) | |
| let mut paths: Vec<_> = fs::read_dir(dir)? | |
| .filter_map(|e| e.ok()) | |
| .map(|e| e.path()) | |
| .filter(|p| { | |
| p.is_file() | |
| && p.extension().is_some_and(|e| e == "gz") | |
| && Self::parse_slot(p).is_some() | |
| }) | |
| .collect(); | |
| paths.sort(); | |
| // Clean orphan directories (interrupted archiving) | |
| for entry in fs::read_dir(dir)?.filter_map(|e| e.ok()) { | |
| let path = entry.path(); | |
| let name = path.file_name().and_then(OsStr::to_str); | |
| if path.is_dir() | |
| && name.is_some_and(|n| { | |
| n.starts_with(SNAPSHOT_PREFIX) && !n.contains('.') | |
| }) | |
| { | |
| warn!(path = %path.display(), "Cleaning up orphan snapshot directory"); | |
| let _ = fs::remove_dir_all(&path); | |
| } | |
| } | |
| let offset = paths.len().saturating_sub(max); | |
| for stale in &paths[..offset] { | |
| if let Err(e) = fs::remove_file(stale) { | |
| warn!(path = %stale.display(), error = ?e, "Failed to prune stale snapshot archive during recovery"); | |
| } | |
| } | |
| Ok(paths.into_iter().skip(offset).collect()) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/snapshot.rs` around lines 342 - 368, The current
startup recovery collects snapshot archive paths into `paths`, trims to the
newest `max` by computing `offset` and returning
`paths.into_iter().skip(offset).collect()`, but it does not remove the trimmed
files from disk causing orphaned archives; update this logic to remove on-disk
files for entries that will be dropped (e.g., iterate the older set:
`paths.into_iter().take(offset)`), call `fs::remove_file` (or
`fs::remove_dir_all` for any orphan dirs if applicable), log success/failure
(use the existing logger and include the path), and then return the remaining
paths (the ones after `offset`) so `parse_slot`, `SNAPSHOT_PREFIX`, `max`, and
`offset` behavior remain intact.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-integration/test-ledger-restore/src/lib.rs`:
- Line 61: The helper currently calls AccountsDbConfig::default(), which wipes
any reset flag set by the caller and can leave on-disk AccountsDb/snapshots that
make restores boot from stale state; change the creation of accountsdb_config so
it preserves the caller's reset_ledger flag (e.g., construct AccountsDbConfig
with reset: reset_ledger and the rest from Default::default()) instead of
calling AccountsDbConfig::default(); update the variable used by
setup_offline_validator(...) so the same config (AccountsDbConfig with reset set
from reset_ledger) is passed through.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1c42ab8-339d-402f-a08b-e2d698332151
⛔ Files ignored due to path filters (1)
test-integration/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
- test-integration/configs/api-conf.ephem.toml
- test-integration/configs/chainlink-conf.devnet.toml
- test-integration/configs/claim-fees-test.toml
- test-integration/configs/cloning-conf.devnet.toml
- test-integration/configs/cloning-conf.ephem.toml
- test-integration/configs/committor-conf.devnet.toml
- test-integration/configs/config-conf.devnet.toml
- test-integration/configs/restore-ledger-conf.devnet.toml
- test-integration/configs/schedulecommit-conf-fees.ephem.toml
- test-integration/configs/schedulecommit-conf.devnet.toml
- test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml
- test-integration/configs/schedulecommit-conf.ephem.toml
- test-integration/configs/validator-offline.devnet.toml
- test-integration/test-ledger-restore/src/lib.rs
💤 Files with no reviewable changes (13)
- test-integration/configs/validator-offline.devnet.toml
- test-integration/configs/schedulecommit-conf.ephem.toml
- test-integration/configs/config-conf.devnet.toml
- test-integration/configs/schedulecommit-conf.ephem.frequent-commits.toml
- test-integration/configs/cloning-conf.ephem.toml
- test-integration/configs/schedulecommit-conf.devnet.toml
- test-integration/configs/api-conf.ephem.toml
- test-integration/configs/restore-ledger-conf.devnet.toml
- test-integration/configs/schedulecommit-conf-fees.ephem.toml
- test-integration/configs/chainlink-conf.devnet.toml
- test-integration/configs/claim-fees-test.toml
- test-integration/configs/cloning-conf.devnet.toml
- test-integration/configs/committor-conf.devnet.toml
```rust
    snapshot_frequency: SNAPSHOT_FREQUENCY,
    ..Default::default()
};
let accountsdb_config = AccountsDbConfig::default();
```
Preserve the AccountsDb reset flag here.
AccountsDbConfig::default() forces reset back to false, so setup_offline_validator(..., reset_ledger = true, ...) now clears the ledger but can leave the on-disk AccountsDb/snapshot archive behind in the same storage directory. That makes these restore tests prone to booting from stale state. This helper should mirror the pattern already used at Lines 133-136.
Proposed fix

```diff
-let accountsdb_config = AccountsDbConfig::default();
+let accountsdb_config = AccountsDbConfig {
+    reset: reset_ledger,
+    ..Default::default()
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test-integration/test-ledger-restore/src/lib.rs` at line 61, The helper
currently calls AccountsDbConfig::default(), which wipes any reset flag set by
the caller and can leave on-disk AccountsDb/snapshots that make restores boot
from stale state; change the creation of accountsdb_config so it preserves the
caller's reset_ledger flag (e.g., construct AccountsDbConfig with reset:
reset_ledger and the rest from Default::default()) instead of calling
AccountsDbConfig::default(); update the variable used by
setup_offline_validator(...) so the same config (AccountsDbConfig with reset set
from reset_ledger) is passed through.
Actionable comments posted: 1
♻️ Duplicate comments (5)
magicblock-accounts-db/src/snapshot.rs (4)
Lines 114-119: ⚠️ Potential issue | 🟠 Major

Prune after archive is durably registered, not after directory creation.

`prune_registry()` is called in `create_snapshot_dir` (line 117) after the snapshot directory is created but before `archive_and_register` completes. If archiving fails, you've already deleted the oldest archive, potentially leaving fewer restore points than expected. Consider moving the prune to `archive_and_register`, after the archive is successfully written and registered.

🛠️ Suggested fix

```diff
 // In create_snapshot_dir:
     self.strategy
         .execute(&self.db_path, &snap_path, memory_capture)
         .log_err(|| "Snapshot failed")?;
-    self.prune_registry();
     Ok(snap_path)
 }

 // In archive_and_register, after register_archive:
     self.register_archive(archive_path.clone());
+    self.prune_registry();
     Ok(archive_path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 114 - 119, The prune_registry() call is happening too early in create_snapshot_dir (after the snapshot directory is created) which can delete oldest archives even if archive_and_register/strategy.execute fails; move the prune_registry() invocation out of create_snapshot_dir and call it only after archive_and_register (or after self.strategy.execute completes successfully) so pruning runs only when the archive is durably written and registered; update create_snapshot_dir to no longer prune and add the prune_registry() call immediately after archive_and_register (or after the log_err? check) in create_snapshot so error paths do not trigger pruning.
Line 259: ⚠️ Potential issue | 🟠 Major

Replace `.unwrap()` with explicit error handling.

Line 259 uses `.unwrap()` on `registry.remove(index)`. While the index was just validated via `binary_search`, the coding guidelines require proper error handling for all `.unwrap()` in production code. The `remove` method returns `Option<T>`, and while logically safe here, an explicit `.ok_or_else()` would satisfy the guidelines and document the invariant.

🛠️ Suggested fix

```diff
-        let chosen_archive = registry.remove(index).unwrap();
+        // INVARIANT: index is valid from binary_search on the same locked registry
+        let chosen_archive = registry.remove(index).ok_or_else(|| {
+            AccountsDbError::Internal(format!(
+                "Registry index {index} became invalid during snapshot lookup"
+            ))
+        })?;
```

As per coding guidelines: "Treat any usage of `.unwrap()` or `.expect()` in production Rust code as a MAJOR issue."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` at line 259, Replace the .unwrap() on registry.remove(index) with explicit error handling: call registry.remove(index).ok_or_else(|| /* create a descriptive error e.g., SnapshotError::InvariantViolation(...) */) and propagate or map that error where appropriate from the surrounding function (e.g., by returning a Result). Update any callers if necessary to handle the new error return; reference the variables chosen_archive, registry.remove, and index and ensure the error message documents the invariant that binary_search validated the index.
Lines 323-327: ⚠️ Potential issue | 🟠 Major

Maintain sorted order when registering archives.

`register_archive` always appends via `push_back`, but `find_and_remove_snapshot` and `snapshot_exists` use `binary_search`, which requires sorted order. When an external snapshot with `slot <= current_slot` is inserted (the non-fast-forward path), it will be appended after potentially newer archives, breaking the binary search.

🛠️ Suggested fix to maintain sorted order

```diff
 fn register_archive(&self, archive_path: PathBuf) {
     info!(archive_path = %archive_path.display(), "Snapshot registered");
-    self.registry.lock().push_back(archive_path);
+    let mut registry = self.registry.lock();
+    // Maintain sorted order for binary_search compatibility
+    let insert_pos = registry
+        .binary_search(&archive_path)
+        .unwrap_or_else(|i| i);
+    registry.insert(insert_pos, archive_path);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 323 - 327, register_archive currently appends with registry.lock().push_back(archive_path) which breaks the sorted invariant required by find_and_remove_snapshot and snapshot_exists (they use binary_search). Instead of push_back, compute the sort key used by those functions (the snapshot slot extracted from the PathBuf) and insert archive_path into registry at the correct sorted position: obtain the mutex guard on registry, use binary_search_by (or binary_search_by_key) with the same comparator/key extraction as snapshot_exists/find_and_remove_snapshot to get an index, then call insert(index_or_insertion_point, archive_path) on the underlying collection (VecDeque supports insert) so the registry remains sorted; ensure you handle the Ok (already present) and Err (insertion point) cases consistently.
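The invariant this comment relies on can be exercised in a standalone sketch — here `u64` slot numbers stand in for the registry's archive `PathBuf`s, and `sorted_insert` mirrors the suggested `binary_search`-based insertion:

```rust
use std::collections::VecDeque;

// Insert `slot` so the deque stays sorted, keeping binary_search usable.
fn sorted_insert(registry: &mut VecDeque<u64>, slot: u64) {
    let pos = registry.binary_search(&slot).unwrap_or_else(|i| i);
    registry.insert(pos, slot);
}

fn main() {
    let mut registry: VecDeque<u64> = VecDeque::from(vec![100, 200, 300]);

    // An external snapshot older than the latest archive (the
    // non-fast-forward path) must land in the middle, not at the back.
    sorted_insert(&mut registry, 150);

    assert_eq!(registry, VecDeque::from(vec![100, 150, 200, 300]));
    // binary_search now finds it; a plain push_back would have left
    // 150 after 300 and made this lookup unreliable.
    assert_eq!(registry.binary_search(&150), Ok(1));
}
```

`VecDeque` supports `insert` at an arbitrary index, so no switch to `Vec` is needed to keep the registry sorted.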
Lines 386-388: ⚠️ Potential issue | 🟠 Major

Delete over-limit archives during startup recovery.

The code trims the registry to `max` snapshots but doesn't delete the excluded files from disk. After restart, archives beyond `max_snapshots` become orphans that are never pruned, defeating the space-saving goal.

🛠️ Suggested fix to delete excess archives

```diff
 let offset = paths.len().saturating_sub(max);
+// Delete excess archives from disk
+for stale in &paths[..offset] {
+    if let Err(e) = fs::remove_file(stale) {
+        warn!(path = %stale.display(), error = ?e, "Failed to prune excess snapshot archive during recovery");
+    }
+}
 Ok(paths.into_iter().skip(offset).collect())
```

🤖 Prompt for AI Agents
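The recovery-time trimming can be tried standalone. This sketch is illustrative only: `trim_archives` and the `snapshot-NNN.tar.gz` naming are assumptions, and `eprintln!` stands in for the crate's `warn!` macro:

```rust
use std::fs;
use std::path::PathBuf;

// Drop archives outside the retention window from disk as well as from
// the in-memory list, so restarts cannot accumulate orphans.
fn trim_archives(mut paths: Vec<PathBuf>, max: usize) -> Vec<PathBuf> {
    paths.sort(); // zero-padded slot names sort oldest-first
    let offset = paths.len().saturating_sub(max);
    for stale in &paths[..offset] {
        if let Err(e) = fs::remove_file(stale) {
            eprintln!("failed to prune {}: {e}", stale.display());
        }
    }
    paths.split_off(offset) // keep only the newest `max` entries
}

fn main() {
    let dir = std::env::temp_dir().join("trim-sketch");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir).unwrap();
    let paths: Vec<PathBuf> = (1..=4)
        .map(|slot| {
            let p = dir.join(format!("snapshot-{slot:010}.tar.gz"));
            fs::write(&p, b"stub").unwrap();
            p
        })
        .collect();

    let kept = trim_archives(paths, 2);
    assert_eq!(kept.len(), 2);
    // Only the two newest archives remain on disk.
    assert_eq!(fs::read_dir(&dir).unwrap().count(), 2);
}
```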
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/snapshot.rs` around lines 386 - 388, the current startup recovery logic trims the in-memory list (using paths, max, offset) but does not remove the excluded archive files from disk; update the function that builds/returns paths (the block using `let offset = paths.len().saturating_sub(max); Ok(paths.into_iter().skip(offset).collect())`) to delete the files that are being dropped before returning the trimmed list: iterate over `paths[..offset]`, call `std::fs::remove_file` (or async equivalent) for each path, handle and log errors (don't panic on failure) and only collect/return the remaining paths via `paths.into_iter().skip(offset).collect()`, ensuring deletion occurs during startup recovery to prevent orphaned archives.

magicblock-accounts-db/src/lib.rs (1)
Lines 386-406: ⚠️ Potential issue | 🟠 Major

Verify error recovery when reload fails after fast-forward.

If `insert_external_snapshot` returns `true` (fast-forward completed with atomic swap), but `storage.reload()` or `index.reload()` subsequently fails, the method returns `Err` while the filesystem has already been modified. The caller has no way to recover the in-memory state. Consider either:

- Validating the extracted snapshot can be loaded before committing the atomic swap, or
- Keeping the backup until both reloads succeed

```bash
#!/bin/bash
# Check if atomic_swap keeps backup available for rollback after swap completes
rg -n "atomic_swap|backup|\.bak" magicblock-accounts-db/src/snapshot.rs -B 2 -A 10
```
Verify each finding against the current code and only fix it if needed. In `@magicblock-accounts-db/src/lib.rs` around lines 386 - 406, The fast-forward path in insert_external_snapshot calls snapshot_manager.insert_external_snapshot which performs an atomic swap before calling storage.reload and index.reload, leaving no way to recover if either reload fails; change this to a two-step flow: modify snapshot_manager.insert_external_snapshot (or add new prepare_external_snapshot / commit_external_snapshot APIs) so you can first extract and prepare the snapshot without performing the atomic swap, then call storage.reload(path) and index.reload(path) to validate the new on-disk state, and only after both reloads succeed call a commit/atomic_swap method (or remove the backup) to finalize; alternatively, if you cannot change insert_external_snapshot, add a rollback path that uses snapshot_manager.database_path/backup support to restore the previous DB on reload failure and ensure insert_external_snapshot returns a handle or flag you can use to trigger rollback/commit.
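The prepare/commit flow this prompt proposes can be sketched with plain `std::fs` renames. Everything here is hypothetical naming, not the crate's API: `PendingSwap`, `prepare_swap`, `commit`, `rollback`, and the `.bak` suffix are all assumptions made for illustration:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Keep the old database directory as a `.bak` sibling until the caller
// has validated the new state, so a failed reload can still roll back.
struct PendingSwap {
    live: PathBuf,
    backup: PathBuf,
}

fn prepare_swap(live: &Path, extracted: &Path) -> io::Result<PendingSwap> {
    let backup = live.with_extension("bak");
    fs::rename(live, &backup)?; // stash the current database
    fs::rename(extracted, live)?; // promote the extracted snapshot
    Ok(PendingSwap { live: live.to_path_buf(), backup })
}

impl PendingSwap {
    // Call after storage.reload() and index.reload() both succeed.
    fn commit(self) -> io::Result<()> {
        fs::remove_dir_all(&self.backup)
    }

    // Call if either reload fails: restore the previous database.
    fn rollback(self) -> io::Result<()> {
        fs::remove_dir_all(&self.live)?;
        fs::rename(&self.backup, &self.live)
    }
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("swap-sketch");
    let _ = fs::remove_dir_all(&root);
    let live = root.join("adb");
    let extracted = root.join("extracted");
    fs::create_dir_all(&live)?;
    fs::create_dir_all(&extracted)?;
    fs::write(live.join("marker"), "old")?;
    fs::write(extracted.join("marker"), "new")?;

    let pending = prepare_swap(&live, &extracted)?;
    // Pretend the reload failed: rollback restores the old database.
    pending.rollback()?;
    assert_eq!(fs::read_to_string(live.join("marker"))?, "old");
    Ok(())
}
```

Since both directories live under the same storage root, each `fs::rename` is a same-filesystem move, which keeps the swap cheap; the window between the two renames is the main hazard a real implementation would need to reason about.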
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@magicblock-accounts-db/src/tests.rs`:
- Around line 337-342: The test unnecessarily wraps AccountsDb in an Arc then
immediately unwraps it; replace the Arc::new + Arc::try_unwrap sequence by
instantiating AccountsDb directly so you can mutate it: call
AccountsDb::new(&config, temp_dir.path(), 0).unwrap() into a mutable variable,
then call set_slot(SNAPSHOT_SLOT + 1000) on that variable (remove Arc::new and
Arc::try_unwrap around AccountsDb and references to Arc).
---
Duplicate comments:
In `@magicblock-accounts-db/src/lib.rs`:
- Around line 386-406: The fast-forward path in insert_external_snapshot calls
snapshot_manager.insert_external_snapshot which performs an atomic swap before
calling storage.reload and index.reload, leaving no way to recover if either
reload fails; change this to a two-step flow: modify
snapshot_manager.insert_external_snapshot (or add new prepare_external_snapshot
/ commit_external_snapshot APIs) so you can first extract and prepare the
snapshot without performing the atomic swap, then call storage.reload(path) and
index.reload(path) to validate the new on-disk state, and only after both
reloads succeed call a commit/atomic_swap method (or remove the backup) to
finalize; alternatively, if you cannot change insert_external_snapshot, add a
rollback path that uses snapshot_manager.database_path/backup support to restore
the previous DB on reload failure and ensure insert_external_snapshot returns a
handle or flag you can use to trigger rollback/commit.
In `@magicblock-accounts-db/src/snapshot.rs`:
- Around line 114-119: The prune_registry() call is happening too early in
create_snapshot_dir (after the snapshot directory is created) which can delete
oldest archives even if archive_and_register/strategy.execute fails; move the
prune_registry() invocation out of create_snapshot_dir and call it only after
archive_and_register (or after self.strategy.execute completes successfully) so
pruning runs only when the archive is durably written and registered; update
create_snapshot_dir to no longer prune and add the prune_registry() call
immediately after archive_and_register (or after the log_err? check) in
create_snapshot so error paths do not trigger pruning.
- Line 259: Replace the .unwrap() on registry.remove(index) with explicit error
handling: call registry.remove(index).ok_or_else(|| /* create a descriptive
error e.g., SnapshotError::InvariantViolation(...) */) and propagate or map that
error where appropriate from the surrounding function (e.g., by returning a
Result). Update any callers if necessary to handle the new error return;
reference the variables chosen_archive, registry.remove, and index and ensure
the error message documents the invariant that binary_search validated the
index.
- Around line 323-327: register_archive currently appends with
registry.lock().push_back(archive_path) which breaks the sorted invariant
required by find_and_remove_snapshot and snapshot_exists (they use
binary_search). Instead of push_back, compute the sort key used by those
functions (the snapshot slot extracted from the PathBuf) and insert archive_path
into registry at the correct sorted position: obtain the mutex guard on
registry, use binary_search_by (or binary_search_by_key) with the same
comparator/key extraction as snapshot_exists/find_and_remove_snapshot to get an
index, then call insert(index_or_insertion_point, archive_path) on the
underlying collection (VecDeque supports insert) so the registry remains sorted;
ensure you handle the Ok (already present) and Err (insertion point) cases
consistently.
- Around line 386-388: The current startup recovery logic trims the in-memory
list (using paths, max, offset) but does not remove the excluded archive files
from disk; update the function that builds/returns paths (the block using let
offset = paths.len().saturating_sub(max);
Ok(paths.into_iter().skip(offset).collect())) to delete the files that are being
dropped before returning the trimmed list: iterate over paths[..offset], call
std::fs::remove_file (or async equivalent) for each path, handle and log errors
(don’t panic on failure) and only collect/return the remaining paths via
paths.into_iter().skip(offset).collect(), ensuring deletion occurs during
startup recovery to prevent orphaned archives.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 60531684-0351-4a7b-abcb-d6d5d84fbbe7
📒 Files selected for processing (3)
- magicblock-accounts-db/src/lib.rs
- magicblock-accounts-db/src/snapshot.rs
- magicblock-accounts-db/src/tests.rs
```rust
let new_db =
    Arc::new(AccountsDb::new(&config, temp_dir.path(), 0).unwrap());
new_db.set_slot(SNAPSHOT_SLOT + 1000); // Advance past snapshot slot

// Unwrap Arc to get mutable access
let mut new_db = Arc::try_unwrap(new_db).unwrap();
```
🧹 Nitpick | 🔵 Trivial
Simplify: Remove unnecessary Arc wrap/unwrap.
The code wraps `AccountsDb` in an `Arc` only to immediately unwrap it. Since this is test code and you need mutable access, create the `AccountsDb` directly without the `Arc`.
♻️ Suggested simplification

```diff
-        let new_db =
-            Arc::new(AccountsDb::new(&config, temp_dir.path(), 0).unwrap());
-        new_db.set_slot(SNAPSHOT_SLOT + 1000); // Advance past snapshot slot
-
-        // Unwrap Arc to get mutable access
-        let mut new_db = Arc::try_unwrap(new_db).unwrap();
+        let mut new_db = AccountsDb::new(&config, temp_dir.path(), 0).unwrap();
+        new_db.set_slot(SNAPSHOT_SLOT + 1000); // Advance past snapshot slot
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
let mut new_db = AccountsDb::new(&config, temp_dir.path(), 0).unwrap();
new_db.set_slot(SNAPSHOT_SLOT + 1000); // Advance past snapshot slot
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@magicblock-accounts-db/src/tests.rs` around lines 337 - 342, The test
unnecessarily wraps AccountsDb in an Arc then immediately unwraps it; replace
the Arc::new + Arc::try_unwrap sequence by instantiating AccountsDb directly so
you can mutate it: call AccountsDb::new(&config, temp_dir.path(), 0).unwrap()
into a mutable variable, then call set_slot(SNAPSHOT_SLOT + 1000) on that
variable (remove Arc::new and Arc::try_unwrap around AccountsDb and references
to Arc).

Summary
Compatibility
Testing
Checklist
Summary by CodeRabbit
New Features
Tests
Chores