Orphan dev server processes (node, cmd, vite, tsx) survive after agent session ends on Windows #197

@Gallisatyricon

Description

After each AutoForge coding session ends, all dev server processes spawned by the agent remain running as orphans. Over multiple feature implementations, dozens of orphaned node.exe and cmd.exe processes (vite, tsx, and their wrappers) accumulate, consuming memory and holding ports open.

In a typical session building a React + Express app, a single npm run dev (via concurrently) spawns ~10 processes. After 3 coding sessions, I had 22 orphan node.exe and 18 orphan cmd.exe processes running with no live parent, all holding memory and blocking ports 3000/5173.

Environment

  • OS: Windows 11
  • AutoForge version: Latest (installed Feb 2026)
  • Project type: React (Vite) + Express (tsx) with concurrently
  • Agent model: Claude

Steps to Reproduce

  1. Create a project that uses npm run dev (e.g., React + Express with concurrently)
  2. Let the coding agent run a few features — the agent will call npm run dev to test
  3. Wait for the agent session to complete (feature marked as passing)
  4. Open Task Manager → observe orphan node.exe and cmd.exe processes still running
  5. Repeat for 2-3 more features → processes accumulate

Root Cause Analysis

I traced through the codebase and identified a three-layer process cleanup gap:

AutoForge Orchestrator (parallel_orchestrator.py)
  └─ Agent Subprocess (autonomous_agent_demo.py)
     └─ Claude SDK Client (client.py)
        └─ Claude CLI Subprocess (subprocess_cli.py)
           └─ Bash tool executes: npm run dev
              └─ cmd.exe → node (concurrently)
                 ├─ cmd.exe → node (vite)        ← ORPHANED
                 └─ cmd.exe → node (tsx watch)   ← ORPHANED

Layer 1, Orchestrator (parallel_orchestrator.py:1166-1172): ✅ Correctly calls kill_process_tree(proc), which uses psutil to recursively terminate agent subprocesses.

Layer 2, Agent (agent.py:264-269): ⚠️ Uses a context manager and delegates cleanup to the SDK's disconnect().

Layer 3, Claude SDK Transport (subprocess_cli.py:454-460): ❌ Only calls process.terminate() and does NOT recursively kill child processes:

# subprocess_cli.py - close() method
if self._process.returncode is None:
    with suppress(ProcessLookupError):
        self._process.terminate()  # Only kills the immediate CLI process

The irony is that AutoForge already has the correct solution in server/utils/process_utils.py (kill_process_tree()), which handles Windows process trees properly — but it's only used at the orchestrator level, not at the SDK transport level.

Windows-Specific Impact

As noted in process_utils.py:44:

"On Windows, subprocess.terminate() only kills the immediate process, leaving orphaned child processes."

On Linux, process groups and SIGTERM propagation partially mitigate this. On Windows, it's a guaranteed orphan factory.
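To make the Windows behaviour concrete, here is a minimal stdlib-only sketch of a tree-aware alternative to a bare terminate() call. It shells out to the built-in taskkill tool with /T (include child processes) and /F (force). The function names tree_kill_command and terminate_with_children are hypothetical and are not part of AutoForge or the Claude SDK; the existing psutil-based kill_process_tree() would presumably be the better choice inside AutoForge itself.

```python
import subprocess
import sys


def tree_kill_command(pid: int) -> list[str]:
    """Build a taskkill invocation that targets a PID and its child processes."""
    # /T = also terminate child processes, /F = force termination.
    return ["taskkill", "/PID", str(pid), "/T", "/F"]


def terminate_with_children(proc: subprocess.Popen) -> None:
    """Hypothetical stand-in for a bare proc.terminate() call."""
    if proc.poll() is not None:
        # Already exited. Any children it left behind are not handled here.
        return
    if sys.platform == "win32":
        # check=False: taskkill exits non-zero if the process is already gone.
        subprocess.run(tree_kill_command(proc.pid), check=False,
                       capture_output=True)
    else:
        # POSIX: this signals only the immediate process.
        proc.terminate()
```

The tree kill has to happen while the parent is still alive: once it has exited, the /T walk from its PID has nothing left to start from.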

Observed Orphan Process Tree (real example)

| Time | Process | Command | Type |
| -- | -- | -- | -- |
| 01:47 | node.exe | npm run dev | npm wrapper |
| 01:47 | cmd.exe → node | vite | Frontend dev server |
| 01:50 | cmd.exe → node | tsx watch src/index.ts | Backend dev server |
| 01:59 | node | concurrently "npm run dev:backend" "npm run dev:frontend" | Process manager |
| 01:59 | cmd.exe → node | vite | Frontend (2nd session) |
| 01:59 | cmd.exe → node | tsx watch src/index.ts | Backend (2nd session) |
| 10:58 | (same pattern repeats) | (3rd session) | 3rd batch of orphans |

Total: 22 node + 18 cmd = 40 orphan processes from just 3 coding sessions.

Suggested Fix

Option A — Post-session cleanup hook (simplest): Add a cleanup step in parallel_orchestrator.py after an agent session ends that scans for and kills any remaining child processes spawned during that session, using the existing kill_process_tree() utility.
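A rough sketch of the scan step for Option A, assuming a pid-to-parent-pid snapshot of the process table is available (for example from psutil, which AutoForge already uses). The function name descendants_leaf_first and the PIDs in the example are made up for illustration. It returns descendants deepest-first, so that wrappers like cmd.exe are killed after the node processes they started.

```python
from collections import defaultdict


def descendants_leaf_first(root_pid: int, parent_of: dict[int, int]) -> list[int]:
    """Return every descendant of root_pid, children before their parents.

    parent_of maps pid -> parent pid, taken from a process-table snapshot.
    """
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    ordered: list[int] = []
    seen = {root_pid}
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            if child not in seen:  # guard against cycles from PID reuse
                seen.add(child)
                ordered.append(child)  # a parent is always appended first
                stack.append(child)
    ordered.reverse()  # so reversing puts every child ahead of its parent
    return ordered


# Hypothetical snapshot mirroring the tree above (PIDs are invented).
snapshot = {
    200: 100,  # cmd.exe started by the CLI (pid 100)
    201: 200,  # node (concurrently)
    300: 201,  # cmd.exe
    301: 300,  # node (vite)
    310: 201,  # cmd.exe
    311: 310,  # node (tsx watch)
    999: 1,    # unrelated process, must not be touched
}
kill_order = descendants_leaf_first(100, snapshot)
```

The snapshot would likely need to be taken while the agent's subprocess is still alive, since the parent links are what tie the dev servers back to the session.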

Option B — Track spawned processes: In client.py or the bash tool handler, track PIDs of all processes spawned via bash tool calls. On session end, terminate them all recursively.
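For Option B, a minimal sketch of what per-session tracking could look like, assuming there is a single Python-side point through which tool commands are spawned (I have not verified that the bash tool works this way; it may run entirely inside the CLI subprocess). SpawnRegistry is a hypothetical name. Note that it only terminates the processes it started directly, so each entry would still need a tree-aware kill to catch grandchildren.

```python
import subprocess
import sys


class SpawnRegistry:
    """Hypothetical per-session record of processes started by tool calls."""

    def __init__(self) -> None:
        self._procs: list[subprocess.Popen] = []

    def spawn(self, args: list[str], **kwargs) -> subprocess.Popen:
        """Start a process and remember it for cleanup at session end."""
        proc = subprocess.Popen(args, **kwargs)
        self._procs.append(proc)
        return proc

    def live_pids(self) -> list[int]:
        """PIDs of tracked processes that have not exited yet."""
        return [p.pid for p in self._procs if p.poll() is None]

    def terminate_all(self, timeout: float = 5.0) -> None:
        """Ask every tracked process to stop, then force-kill stragglers."""
        for proc in self._procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self._procs.clear()


if __name__ == "__main__":
    registry = SpawnRegistry()
    registry.spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    registry.terminate_all()  # called when the session ends
```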

Option C — Process group isolation (robust): Spawn the Claude CLI subprocess in a new process group (CREATE_NEW_PROCESS_GROUP on Windows / os.setpgrp on Linux), then kill the entire group on cleanup. This would catch all descendants regardless of depth.
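A sketch of Option C using only the standard library. spawn_isolated and kill_group are hypothetical names. On POSIX, start_new_session=True makes the child a session and group leader, so os.killpg reaches everything that stays in that group. One caveat on the Windows side: as far as I know, CREATE_NEW_PROCESS_GROUP mainly controls console control event delivery, so the sketch sends CTRL_BREAK_EVENT to the group. Processes can handle or ignore that event, so a Job Object or a tree kill may still be needed for a hard guarantee.

```python
import os
import signal
import subprocess
import sys


def spawn_isolated(args: list[str], **kwargs) -> subprocess.Popen:
    """Start args as the leader of a new process group."""
    if sys.platform == "win32":
        return subprocess.Popen(
            args, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP, **kwargs
        )
    # setsid() runs in the child, so its PID is also its process-group ID.
    return subprocess.Popen(args, start_new_session=True, **kwargs)


def kill_group(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Signal the whole group, then make sure the leader is gone."""
    if sys.platform == "win32":
        if proc.poll() is None:
            # Delivered to the console processes in the child's group.
            proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # every member of the group has already exited
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # force-kills only the leader
        proc.wait()
```

Descendants that move themselves into a different group or session escape this on POSIX too, so "regardless of depth" holds only for children that stay in the group.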

Option D — Timeout-based auto-kill in bash tool: When the agent runs a long-lived command like npm run dev, automatically kill it after collecting the needed output (e.g., "Server running on port 3000"), rather than leaving it running indefinitely.
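Option D could look roughly like the following: read the command's output until a readiness line shows up, then stop it. run_until_ready is a hypothetical helper, not an existing bash tool feature, and the pattern and timeout are placeholders. The sketch terminates only the immediate process; in practice it would need to be combined with a tree-aware kill for the reasons described above.

```python
import queue
import re
import subprocess
import sys
import threading
import time


def run_until_ready(args, ready_pattern, timeout=30.0):
    """Run args and return the first output line matching ready_pattern.

    The process is stopped afterwards. Returns None if the pattern is not
    seen before the timeout or before the process closes its output.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward output lines to the queue so the main loop can time out.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: output stream closed

    threading.Thread(target=pump, daemon=True).start()
    pattern = re.compile(ready_pattern)
    deadline = time.monotonic() + timeout
    matched = None
    try:
        while matched is None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                line = lines.get(timeout=remaining)
            except queue.Empty:
                break
            if line is None:
                break
            if pattern.search(line):
                matched = line.rstrip("\n")
    finally:
        if proc.poll() is None:
            proc.terminate()  # immediate process only
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return matched


if __name__ == "__main__":
    fake_server = ("import time; "
                   "print('Server running on port 3000', flush=True); "
                   "time.sleep(60)")
    print(run_until_ready([sys.executable, "-c", fake_server],
                          r"running on port \d+"))
```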

Workaround

Manual cleanup via PowerShell after each session:

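# Kill all node processes whose command line mentions the project path.
# -like does a literal wildcard match, so backslashes in the path need no escaping
# (-match would treat the path as a regular expression).
Get-CimInstance Win32_Process -Filter "Name='node.exe'" |
  Where-Object { $_.CommandLine -like "*your-project-path*" } |
  ForEach-Object { Stop-Process -Id $_.ProcessId -Force }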

Labels suggestion

bug, windows, process-management
