Description
Summary
Evaluate replacing Docker container isolation for sandbox workloads with Firecracker microVMs or Kata Containers to achieve hardware-level isolation for untrusted LLM agent code.
Motivation
The sandbox runs untrusted, LLM-generated code. The current isolation model uses Docker containers with extensive defense-in-depth (.git shadow mounts, Squid proxy, gateway credential isolation, phase-based file restrictions). While effective, containers share the host kernel — container escapes are well-documented and the attack surface includes namespaces, cgroups, and seccomp filter gaps.
Firecracker microVMs provide hardware-level isolation via KVM. Each VM gets its own kernel, and the attack surface is reduced to the hypervisor boundary (VM escapes are 6-figure bug bounty territory vs. container escapes being routine). This is the isolation model used by AWS Lambda, Fargate, and Fly.io for running untrusted code.
Proposed approach: Hybrid (sandbox-only microVMs)
Keep gateway and orchestrator as Docker containers (trusted components).
Replace sandbox Docker containers with microVMs (untrusted component).
This gives:
- Hardware isolation where it matters — the sandbox is the only untrusted component
- Existing infrastructure preserved for trusted components
- Simpler sandbox security: could drop .git shadow mounts and simplify the network proxy chain
- Safe Docker-in-VM: a microVM sandbox could safely run Docker internally (relevant to the DinD trust model; see DinD deployment validation in check phase #645)
Options to evaluate
1. Firecracker microVMs
- <5 MiB overhead per VM, ~125ms boot time
- Requires KVM on every host
- No OCI compatibility: needs rootfs images (can convert from Docker images via firecracker-containerd)
- Orchestrator would need a parallel Firecracker backend alongside Docker in sandbox_template.py
- Tap device networking instead of Docker bridge networks
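To give a sense of what a raw Firecracker backend would have to produce, the sketch below builds a VM configuration of the kind Firecracker accepts through its `--config-file` option. The helper name, file paths, boot arguments, and tap device name are hypothetical placeholders. The 2 vCPU / 4096 MiB defaults mirror the sandbox allocation mentioned in this issue. Field names should be checked against the Firecracker version actually deployed.

```python
import json


def build_firecracker_config(kernel_path, rootfs_path, tap_name,
                             vcpus=2, mem_mib=4096):
    """Return a dict shaped like a Firecracker --config-file document.

    Hypothetical helper: all paths and the tap device name are
    placeholders supplied by the caller, not values from this project.
    """
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Example boot arguments only; tune for the guest kernel in use.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        # The tap device must already exist on the host; it replaces the
        # Docker bridge network that the sandbox uses today.
        "network-interfaces": [
            {"iface_id": "eth0", "host_dev_name": tap_name}
        ],
    }


if __name__ == "__main__":
    config = build_firecracker_config(
        "/path/to/vmlinux", "/path/to/sandbox-rootfs.ext4", "tap0"
    )
    print(json.dumps(config, indent=2))
```

The rootfs path is where the image build pipeline listed under the investigation tasks would plug in.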
2. Kata Containers (recommended to evaluate first)
- MicroVM isolation with OCI-compatible interface
- Existing Docker/containerd tooling mostly works unchanged
- Orchestrator could continue using the Docker API while the runtime transparently launches microVMs
- Lower migration cost than raw Firecracker
- Supports multiple hypervisor backends, including QEMU, Cloud Hypervisor, and Firecracker itself
3. gVisor (lightweight alternative)
- User-space kernel (Sentry) intercepts syscalls — stronger than containers, weaker than hardware VMs
- OCI-compatible (drop-in runsc runtime)
- No KVM required
- Lowest migration cost but weakest isolation improvement
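For the two OCI-compatible options, the orchestrator change could be as small as passing a different runtime name to Docker. The sketch below is a hypothetical helper that builds keyword arguments for the Docker SDK for Python's `containers.run()`. The runtime names are assumptions: they must match whatever is registered under `runtimes` in the host's Docker daemon configuration, and the Kata name in particular varies by installation. The resource limits mirror the 4 GB / 2 CPU allocation mentioned in this issue.

```python
# Assumed mapping from isolation mode to Docker runtime name.
# None means "use the daemon's default runtime" (normally runc).
RUNTIMES = {
    "container": None,
    "gvisor": "runsc",
    "kata": "kata-runtime",  # assumption: depends on how Kata is registered
}


def sandbox_run_kwargs(image, isolation="container"):
    """Build keyword arguments for docker-py's containers.run().

    Hypothetical helper: only the runtime key differs between modes.
    """
    if isolation not in RUNTIMES:
        raise ValueError(f"unknown isolation mode: {isolation!r}")

    kwargs = {
        "image": image,
        "detach": True,
        "mem_limit": "4g",
        "nano_cpus": 2_000_000_000,  # 2 CPUs, in units of 1e-9 CPU
    }
    runtime = RUNTIMES[isolation]
    if runtime is not None:
        kwargs["runtime"] = runtime
    return kwargs
```

With the Docker SDK installed, the result could be passed as `docker.from_env().containers.run(**sandbox_run_kwargs("sandbox:latest", "kata"))`, where the image name is a placeholder.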
What this does NOT change
- Gateway sidecar architecture (still needed for credential isolation and policy enforcement)
- Network proxy model (still needed for public/private mode enforcement)
- Phase-based file restrictions (still needed for SDLC pipeline control)
- Integration test complexity (DinD deployment validation in check phase #645) — test complexity comes from multi-component orchestration, not the isolation model
Resource impact
Minimal. Firecracker overhead is <5 MiB per VM. Sandbox workloads are allocated 4GB RAM / 2 CPUs — the container/VM overhead is noise compared to the agent workload. Boot times are comparable (~125ms).
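As a quick check on the "noise" claim, the arithmetic can be written out. This assumes the 4GB allocation means 4096 MiB and takes the quoted 5 MiB as an upper bound on per-VM overhead.

```python
def overhead_percent(overhead_mib, workload_mib):
    """Per-VM overhead as a percentage of the workload's memory allocation."""
    return 100.0 * overhead_mib / workload_mib


if __name__ == "__main__":
    # 5 MiB overhead against a 4096 MiB sandbox allocation
    print(f"{overhead_percent(5, 4096):.2f}%")  # prints 0.12%
```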
Investigation tasks
- Verify KVM availability in target deployment environments (bare metal, cloud VMs with nested virt, GitHub Actions runners)
- Prototype Kata Containers runtime with existing sandbox Docker image
- Benchmark boot time and memory overhead vs current Docker containers
- Evaluate network isolation model (tap devices vs Docker bridge) and impact on gateway proxy routing
- Assess rootfs image build pipeline if using raw Firecracker
- Determine impact on orchestrator/sandbox_template.py container spawning path
- Test compatibility with worktree bind mounts and volume architecture
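The KVM availability task could start from a probe like the one below, run on each candidate host. It is a minimal sketch: the function name is hypothetical, and it only checks that the device node exists and that the current user has read/write permission on it. A passing result does not prove that nested virtualization performs acceptably on a cloud VM or CI runner.

```python
import os


def kvm_available(device="/dev/kvm"):
    """Return True if the KVM device node exists and is readable and
    writable by the current user (a permission check, not an open())."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if kvm_available():
        print("KVM device is present and accessible")
    else:
        print("KVM device is missing or not accessible to this user")
```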
References
- Firecracker
- Kata Containers
- gVisor
- Firecracker vs Docker: Security Tradeoffs for Agentic Workloads
- Related: DinD deployment validation in check phase #645