feat: add checkpoint/resume system #196

Open
giveen wants to merge 7 commits into SnaffCon:master from giveen:checkpoint
giveen commented Mar 6, 2026

Checkpoint / Resume System

Adds the ability to save scan progress to a checkpoint file and resume from where Snaffler left off — useful for long-running engagements against large environments where the scan may be interrupted.


New Flags

| Flag | Description |
|------|-------------|
| -g <path> / --checkpoint <path> | Path to the checkpoint file or a directory (auto-names snaffler_checkpoint.json inside it). Enables checkpointing. If the file already exists, automatically resumes without a separate flag. |
| -w <minutes> / --checkpointinterval <minutes> | How often to save the checkpoint (default: 10 minutes). |

Usage

First run — save progress every 5 minutes into the current directory:

Snaffler.exe -s -o out.log -g . -w 5

This creates ./snaffler_checkpoint.json. If the scan is interrupted (Ctrl-C, network drop, VM reboot), the file retains everything recorded as of the most recent periodic save.

Resume — run the exact same command again:

Snaffler.exe -s -o out.log -g . -w 5

Snaffler detects the existing checkpoint file and prints:

[Checkpoint] Loaded checkpoint from .\snaffler_checkpoint.json (written 2026-03-06 14:22:01Z)
[Checkpoint] Resuming – will skip 843 directories and 12 computers (0 redundant dir entries pruned)
...
[Checkpoint] Skipping already-scanned computer: DC01.corp.local
[Checkpoint] Skipping already-scanned directory: \\FS01\Finance\Reports\2024

Only the remaining targets are processed — nothing already scanned is repeated.


How It Works

  • A CheckpointManager singleton holds two ConcurrentDictionary sets: scanned directories and scanned computers.
  • A System.Threading.Timer fires every N minutes and atomically writes the state to disk via a temp-file + rename, preventing corruption if killed mid-write.
  • A final save is performed when the scan completes normally.
  • Skip logic is applied at three levels:
    1. ShareDiscovery — entire computers are skipped if already scanned.
    2. FileDiscovery — top-level path targets are skipped.
    3. TreeWalker.WalkTree — individual directories are skipped recursively.
  • Directories are marked complete at the end of WalkTree (after all file tasks are dispatched), not on entry — preventing silent data loss if the process is killed between marking and execution.
  • On load, redundant child-directory entries are pruned: if \\SRV\SHARE is in the set, \\SRV\SHARE\subdir is removed since the parent skip makes it unreachable anyway.
  • No extra NuGet dependencies — serialization uses DataContractJsonSerializer from System.Runtime.Serialization.
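The temp-file + rename save described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the CheckpointSketch type, its members, and the AtomicCheckpointWriter class are hypothetical names chosen for this example, and the real CheckpointData may carry different fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical state type; the PR's real CheckpointData may differ.
[DataContract]
public class CheckpointSketch
{
    [DataMember] public List<string> ScannedDirectories { get; set; }
    [DataMember] public List<string> ScannedComputers { get; set; }
}

// Hypothetical helper illustrating the write-to-temp-then-swap pattern.
public static class AtomicCheckpointWriter
{
    public static void Save(CheckpointSketch state, string path)
    {
        string tmp = path + ".tmp";
        var serializer = new DataContractJsonSerializer(typeof(CheckpointSketch));

        // Write the full state to a sibling temp file first.
        using (var fs = new FileStream(tmp, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            serializer.WriteObject(fs, state);
            fs.Flush(true); // ask the OS to push buffered data to disk
        }

        // Swap the temp file into place. A kill before this point leaves
        // the previous checkpoint untouched.
        if (File.Exists(path))
            File.Replace(tmp, path, null); // null = keep no backup copy
        else
            File.Move(tmp, path);
    }

    public static CheckpointSketch Load(string path)
    {
        var serializer = new DataContractJsonSerializer(typeof(CheckpointSketch));
        using (var fs = File.OpenRead(path))
        {
            return (CheckpointSketch)serializer.ReadObject(fs);
        }
    }
}
```

The temp file lives next to the target so that the final swap stays on one volume, which File.Replace requires.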

Benefits

  • Long engagements: scan a 10,000-host environment over multiple sessions without starting over.
  • Resilience: dropped VPN, rebooted jump box, or a crash no longer means hours of lost work.
  • Operator-friendly: resume is automatic — same command, no extra bookkeeping.
  • No performance cost on fresh runs: CheckpointManager is only instantiated when -g is supplied.
  • Compact state: load-time deduplication keeps the checkpoint file lean even after thousands of directories.

Files Changed

| File | Change |
|------|--------|
| SnaffCore/Checkpoint/CheckpointData.cs | New: serialisable POCO ([DataContract]) |
| SnaffCore/Checkpoint/CheckpointManager.cs | New: singleton manager, timer, save/load/dedup |
| SnaffCore/Config/Options.cs | Added CheckpointFile, CheckpointIntervalMinutes |
| Snaffler/Config.cs | Added -g/--checkpoint and -w/--checkpointinterval CLI flags |
| SnaffCore/SnaffCon.cs | Initialise manager, start/stop timer, skip logic in discovery |
| SnaffCore/ShareFind/ShareFinder.cs | Mark computer scanned after GetComputerShares completes |
| SnaffCore/TreeWalk/TreeWalker.cs | Skip + mark directory at end of WalkTree |
| SnaffCore/SnaffCore.csproj | Added System.Runtime.Serialization reference, new Compile items |

giveen added 4 commits March 6, 2026 09:31
- New -g/--checkpoint <file> flag saves progress to a JSON file every
  N minutes (default 10, override with -w/--checkpointinterval).
- On re-run with the same -g path, completed directories and computers
  are skipped automatically so the scan picks up where it left off.
- CheckpointManager (thread-safe singleton) tracks scanned dirs and
  computers in ConcurrentDictionary sets; writes atomically via
  temp-file + rename to survive mid-write kills.
- TreeWalker marks each directory entered; ShareFinder marks each
  computer after its shares are queued.
- Final checkpoint is always written when the scan completes cleanly.
- Add Microsoft.NETFramework.ReferenceAssemblies to both projects to
  allow building net451 targets on Linux with dotnet SDK.
- Add explicit <Reference HintPath> items backed by $(NuGetPackageRoot)
  so old-style .csproj PackageReferences resolve on Linux/dotnet SDK.
- Manually import dnMerge.targets so the merged single-file exe is
  produced correctly (dnMerge embeds NLog, Nett, CommandLineParser and
  SnaffCore as compressed resources inside Snaffler.exe).
- Include compiled Snaffler.exe in this branch for quick testing.
- Mark directory as scanned at end of WalkTree (after all file/subdir
  tasks are queued) instead of on entry. Prevents files being silently
  dropped when the process is killed between the entry mark and the
  actual file-task execution. Inspired by analysis of upstream PR SnaffCon#171.
- CheckpointManager.TryLoad(): prune child-directory entries whose
  parent is already in the completed set. The parent being marked means
  WalkTree will skip it entirely, making any child entries unreachable
  and dead weight. Pruning keeps the in-memory set lean and now reports
  how many redundant entries were dropped at resume time.
- CheckpointManager: if a directory path is given (e.g. '-g .'), auto-
  generate 'snaffler_checkpoint.json' inside it rather than trying to
  use the directory itself as the file. Previously File.Exists('.') was
  false so TryLoad was never called, and File.Copy to '.' silently
  failed, leaving data forever stuck in the .tmp file.
- Promote checkpoint skip messages from Mq.Trace to Mq.Info so they
  are visible in normal (non-verbose) output.
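
The directory-path handling from the commit notes above ('-g .' resolving to a file inside the directory) might look something like this. It is a sketch under assumptions: CheckpointPathResolver and Resolve are hypothetical names, and only the default file name snaffler_checkpoint.json comes from the PR description.

```csharp
using System;
using System.IO;

// Hypothetical helper; the PR performs this resolution inside CheckpointManager.
public static class CheckpointPathResolver
{
    // File name taken from the PR description.
    public const string DefaultFileName = "snaffler_checkpoint.json";

    public static string Resolve(string checkpointArg)
    {
        // If the operator passed a directory (e.g. "-g ."), place the
        // default-named file inside it instead of treating the directory
        // itself as the checkpoint file.
        if (Directory.Exists(checkpointArg))
            return Path.Combine(checkpointArg, DefaultFileName);

        // Otherwise treat the argument as the checkpoint file path.
        return checkpointArg;
    }
}
```

Resolving the path once, before any File.Exists check or write, avoids the failure mode described above where the load was never attempted and the data stayed in the .tmp file.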
Copilot AI review requested due to automatic review settings March 6, 2026 17:35

Copilot AI left a comment

Pull request overview

This PR adds a checkpoint/resume system to Snaffler, enabling scan progress to be persisted to a JSON file and automatically resumed on subsequent runs. This is useful for long-running engagements against large environments where scans may be interrupted.

Changes:

  • New CheckpointManager singleton and CheckpointData POCO that track scanned directories and computers using ConcurrentDictionary, with periodic atomic saves via a System.Timers.Timer.
  • CLI flags -g/--checkpoint and -w/--checkpointinterval wired into the existing argument parser and Options class.
  • Skip logic integrated at three levels: ShareDiscovery (computers), FileDiscovery (top-level paths), and TreeWalker.WalkTree (individual directories), with directories marked complete only after all sub-tasks are dispatched.
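
As a rough illustration of the "mark complete only after dispatch" ordering summarised above, the following sketch uses a ConcurrentDictionary as a thread-safe set. MarkDirectoryScanned is named in the PR's commit notes; ScannedSetSketch, IsDirectoryScanned, WalkSketch, and the dispatch callback are assumptions made for this example and do not reflect the real TreeWalker signature.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the directory half of CheckpointManager.
public class ScannedSetSketch
{
    // ConcurrentDictionary used as a set; the byte value is a placeholder.
    private readonly ConcurrentDictionary<string, byte> _scannedDirectories =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

    public bool IsDirectoryScanned(string path)
    {
        return _scannedDirectories.ContainsKey(path);
    }

    public void MarkDirectoryScanned(string path)
    {
        _scannedDirectories.TryAdd(path, 0);
    }
}

public static class WalkSketch
{
    // Returns false if the directory was skipped because of the checkpoint.
    public static bool Walk(ScannedSetSketch checkpoint, string dir,
                            IEnumerable<string> files, Action<string> dispatchFileTask)
    {
        if (checkpoint.IsDirectoryScanned(dir))
            return false; // already completed in an earlier run

        foreach (string file in files)
            dispatchFileTask(file);

        // Mark only after every file task has been handed off, so a kill
        // part-way through the loop leaves the directory eligible for rescan.
        checkpoint.MarkDirectoryScanned(dir);
        return true;
    }
}
```

Marking at the end trades a possible repeat scan of one directory after a crash for the guarantee that no directory is recorded as done before its file tasks were dispatched.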

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
|------|-------------|
| SnaffCore/Checkpoint/CheckpointData.cs | New [DataContract] POCO for serializable checkpoint state |
| SnaffCore/Checkpoint/CheckpointManager.cs | New singleton managing in-memory state, save/load, and deduplication |
| SnaffCore/Config/Options.cs | Added CheckpointFile and CheckpointIntervalMinutes properties |
| Snaffler/Config.cs | Added -g and -w CLI argument parsing |
| SnaffCore/SnaffCon.cs | Checkpoint initialization, timer lifecycle, and skip logic in discovery methods |
| SnaffCore/ShareFind/ShareFinder.cs | Marks computer as scanned after share discovery completes |
| SnaffCore/TreeWalk/TreeWalker.cs | Skip already-scanned directories and mark at end of WalkTree |
| SnaffCore/SnaffCore.csproj | Added System.Runtime.Serialization reference, checkpoint compile items, and cross-platform build workarounds |
| Snaffler/Snaffler.csproj | Added cross-platform build workarounds (HintPaths, dnMerge targets import) |

- CheckpointData: clarify ScannedDirectories doc comment ('initiated' -> 'fully dispatched')
- SnaffCon: move IsComputerScanned check after DNS resolution to avoid IP vs hostname mismatch
- CheckpointManager: use File.Replace/File.Move for atomic checkpoint writes instead of File.Copy+Delete
- CheckpointManager: fix Initialize() doc comment (called inside SnaffCon ctor, not before)
- CheckpointManager: replace O(n²) deduplication loop with O(n log n) sort + single linear pass
- UltraSnaffCore.csproj: add System.Runtime.Serialization reference and Checkpoint compile items
- Config: validate checkpoint interval is >= 1 minute before accepting
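
The sort + single linear pass deduplication mentioned in the notes above could be sketched as below. This is one possible shape, not the PR's implementation: CheckpointPruner and PruneChildren are hypothetical names, and the trailing-separator trick is an assumption added here so that \\SRV\SHARE2 cannot sort between \\SRV\SHARE and its children.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating O(n log n) child-directory pruning.
public static class CheckpointPruner
{
    // Returns the paths that remain after dropping every entry whose
    // ancestor (or exact duplicate) is also present; 'pruned' counts the drops.
    public static List<string> PruneChildren(IEnumerable<string> dirs, out int pruned)
    {
        // Give every path exactly one trailing backslash so that all
        // descendants of a directory share its key as a prefix and sort
        // contiguously right after it.
        var keys = new List<string>();
        foreach (string d in dirs)
            keys.Add(d.TrimEnd('\\') + "\\");
        keys.Sort(StringComparer.OrdinalIgnoreCase);

        var kept = new List<string>();
        string lastKeptKey = null;
        pruned = 0;

        foreach (string key in keys)
        {
            // Anything that starts with the last kept key is that directory
            // again or one of its descendants, so it is redundant.
            if (lastKeptKey != null &&
                key.StartsWith(lastKeptKey, StringComparison.OrdinalIgnoreCase))
            {
                pruned++;
                continue;
            }
            lastKeptKey = key;
            kept.Add(key.TrimEnd('\\'));
        }
        return kept;
    }
}
```

Only the most recently kept key needs to be remembered, because in ordinal order every string sharing a given prefix sits in one contiguous run that begins with the prefix itself.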

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.


…d paths

- TreeWalker.WalkTree: call MarkDirectoryScanned before both early returns
  in the SCCM ContentLib block so those directories are checkpointed and
  not re-scanned on resume.

- CheckpointManager.TryLoad: clear _scannedDirectories and _scannedComputers
  in the catch block so a partial load failure truly resets to fresh state
  instead of silently skipping entries.

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.


- TreeWalker.WalkTree: downgrade checkpoint skip message from Mq.Info to
  Mq.Trace to avoid flooding log output during resume, consistent with
  other per-item skip messages in the codebase.

- Config.cs: remove 'n' from the unused-letters comment; n is already
  taken by compTargetArg.