Description
After each AutoForge coding session ends, all dev server processes spawned by the agent remain running as orphans. Over multiple feature implementations, this accumulates dozens of orphaned node.exe, cmd.exe, vite, and tsx processes that consume memory and hold ports open.
In a typical session building a React + Express app, a single npm run dev (via concurrently) spawns ~10 processes. After 3 coding sessions, I had 22 orphan node.exe and 20 orphan cmd.exe processes running with no parent, all holding memory and blocking ports 3000/5173.
Environment
- OS: Windows 11
- AutoForge version: Latest (installed Feb 2026)
- Project type: React (Vite) + Express (tsx) with concurrently
- Agent model: Claude
Steps to Reproduce
- Create a project that uses npm run dev (e.g., React + Express with concurrently)
- Let the coding agent run a few features; the agent will call npm run dev to test
- Wait for the agent session to complete (feature marked as passing)
- Open Task Manager → observe orphan node.exe and cmd.exe processes still running
- Repeat for 2-3 more features → processes accumulate
Root Cause Analysis
I traced through the codebase and identified a three-layer process cleanup gap:
AutoForge Orchestrator (parallel_orchestrator.py)
  └─ Agent Subprocess (autonomous_agent_demo.py)
      └─ Claude SDK Client (client.py)
          └─ Claude CLI Subprocess (subprocess_cli.py)
              └─ Bash tool executes: npm run dev
                  └─ cmd.exe → node (concurrently)
                      ├─ cmd.exe → node (vite)       ← ORPHANED
                      └─ cmd.exe → node (tsx watch)  ← ORPHANED
Layer 1 — Orchestrator (parallel_orchestrator.py:1166-1172): ✅ Correctly calls kill_process_tree(proc) using psutil to recursively terminate agent subprocesses.
Layer 2 (Agent, agent.py:264-269): Calls the SDK client's disconnect(), which hands process cleanup to the transport's close() in Layer 3.
Layer 3 — Claude SDK Transport (subprocess_cli.py:454-460): ❌ Only calls process.terminate() — does NOT recursively kill child processes:
# subprocess_cli.py - close() method
if self._process.returncode is None:
    with suppress(ProcessLookupError):
        self._process.terminate()  # Only kills the immediate CLI process
The irony is that AutoForge already has the correct solution in server/utils/process_utils.py (kill_process_tree()), which handles Windows process trees properly — but it's only used at the orchestrator level, not at the SDK transport level.
Windows-Specific Impact
As noted in process_utils.py:44:
"On Windows, subprocess.terminate() only kills the immediate process, leaving orphaned child processes."
On Linux, process groups and SIGTERM propagation partially mitigate this. On Windows, it's a guaranteed orphan factory.
Observed Orphan Process Tree (real example)
| Time | Process | Command | Type |
| -- | -- | -- | -- |
| 01:47 | node.exe | npm run dev | npm wrapper |
| 01:47 | cmd.exe → node | vite | Frontend dev server |
| 01:50 | cmd.exe → node | tsx watch src/index.ts | Backend dev server |
| 01:59 | node | concurrently "npm run dev:backend" "npm run dev:frontend" | Process manager |
| 01:59 | cmd.exe → node | vite | Frontend (2nd session) |
| 01:59 | cmd.exe → node | tsx watch src/index.ts | Backend (2nd session) |
| 10:58 | (same pattern repeats) | (3rd session) | 3rd batch of orphans |

Total: 22 node + 18 cmd = 40 orphan processes from just 3 coding sessions.
Suggested Fix
Option A — Post-session cleanup hook (simplest): Add a cleanup step in parallel_orchestrator.py after an agent session ends that scans for and kills any remaining child processes spawned during that session, using the existing kill_process_tree() utility.
Option B — Track spawned processes: In client.py or the bash tool handler, track PIDs of all processes spawned via bash tool calls. On session end, terminate them all recursively.
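A minimal sketch of what Option B could look like, using only the standard library. SpawnRegistry, its stop hook, and _plain_stop are hypothetical names for illustration, not existing AutoForge or Claude SDK APIs. The default stop is a plain terminate/kill and is not recursive; in AutoForge the existing kill_process_tree() would presumably be passed in instead.

```python
import subprocess


def _plain_stop(proc, timeout=5.0):
    """Fallback stop: terminate, then kill. NOT recursive (illustration only)."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


class SpawnRegistry:
    """Remembers every process started through it so a session can clean up."""

    def __init__(self, stop=_plain_stop):
        self._stop = stop    # assumption: a tree-aware killer is injected here
        self._procs = []

    def spawn(self, cmd, **popen_kwargs):
        proc = subprocess.Popen(cmd, **popen_kwargs)
        self._procs.append(proc)
        return proc

    def cleanup(self):
        """Stop everything still running; return the PIDs that were stopped."""
        stopped = []
        for proc in self._procs:
            if proc.poll() is None:   # skip processes that already exited
                self._stop(proc)
                stopped.append(proc.pid)
        self._procs.clear()
        return stopped
```

The bash tool handler would call spawn() instead of subprocess.Popen directly, and the session teardown would call cleanup(). One limitation of this sketch: a wrapper that has already exited is skipped, even if its children are still alive.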
Option C — Process group isolation (robust): Spawn the Claude CLI subprocess in a new process group (CREATE_NEW_PROCESS_GROUP on Windows / os.setpgrp on Linux), then kill the entire group on cleanup. This would catch all descendants regardless of depth.
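A rough sketch of Option C under some assumptions. spawn_isolated and kill_group are hypothetical helper names. On POSIX it uses start_new_session=True (setsid) rather than os.setpgrp, which also makes the child a group leader. On Windows, CREATE_NEW_PROCESS_GROUP by itself provides no "kill the whole group" call, so the sketch falls back to taskkill /T /F, which walks the child tree; a Job Object would likely be the more robust Windows mechanism.

```python
import os
import signal
import subprocess
import sys
from contextlib import suppress


def spawn_isolated(cmd, **popen_kwargs):
    """Start cmd in its own process group (hypothetical helper)."""
    if sys.platform == "win32":
        popen_kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP
    else:
        # setsid() in the child makes it a group leader, so pgid == pid.
        popen_kwargs["start_new_session"] = True
    return subprocess.Popen(cmd, **popen_kwargs)


def kill_group(proc, timeout=5.0):
    """Stop proc and the descendants that stayed in its group or tree."""
    if sys.platform == "win32":
        # /T kills the process and its child processes, /F forces termination.
        subprocess.run(
            ["taskkill", "/PID", str(proc.pid), "/T", "/F"],
            capture_output=True,
            check=False,
        )
    else:
        with suppress(ProcessLookupError):
            os.killpg(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if SIGTERM (or taskkill) did not stop the leader.
        if sys.platform == "win32":
            proc.kill()
        else:
            with suppress(ProcessLookupError):
                os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
```

Descendants that move themselves into a different session or group would escape the POSIX path, so this is a sketch of the idea rather than a complete guarantee.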
Option D — Timeout-based auto-kill in bash tool: When the agent runs a long-lived command like npm run dev, automatically kill it after collecting the needed output (e.g., "Server running on port 3000"), rather than leaving it running indefinitely.
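Option D could be sketched roughly as follows. run_until_ready and its stop parameter are hypothetical names, and the readiness pattern is whatever the dev server happens to print. The default stop is Popen.kill, which only kills the immediate process; for npm run dev a tree-aware killer such as the existing kill_process_tree() would need to be passed, otherwise surviving children can keep the output pipe open.

```python
import re
import subprocess
import threading


def run_until_ready(cmd, ready_pattern, timeout=30.0, stop=subprocess.Popen.kill):
    """Run cmd until ready_pattern appears in its output or timeout expires.

    Returns (ready, lines). The process is stopped in both cases.
    """
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        # Watchdog: stop the process if the pattern never shows up.
        watchdog = threading.Timer(timeout, stop, args=(proc,))
        watchdog.start()
        lines, ready = [], False
        try:
            for line in proc.stdout:
                lines.append(line.rstrip("\n"))
                if re.search(ready_pattern, line):
                    ready = True
                    break
        finally:
            watchdog.cancel()
            stop(proc)  # harmless if the process has already exited
    return ready, lines
```

For example, run_until_ready(["npm", "run", "dev"], r"Server running on port \d+") would return as soon as the server reports that it is up, instead of leaving it running after the check.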
Workaround
Manual cleanup via PowerShell after each session:
# Kill all node processes related to the project
# Note: -match takes a regex, so escape backslashes in a Windows path
Get-CimInstance Win32_Process -Filter "Name='node.exe'" |
    Where-Object { $_.CommandLine -match "your-project-path" } |
    ForEach-Object { Stop-Process -Id $_.ProcessId -Force }
Labels suggestion
bug, windows, process-management