Evaluate microVM isolation for sandbox workloads (Firecracker / Kata Containers) #916

## Summary

Evaluate replacing Docker container isolation for sandbox workloads with Firecracker microVMs or Kata Containers to achieve hardware-level isolation for untrusted LLM agent code.

## Motivation

The sandbox runs untrusted, LLM-generated code. The current isolation model uses Docker containers with extensive defense-in-depth (`.git` shadow mounts, Squid proxy, gateway credential isolation, phase-based file restrictions). This model is effective, but containers share the host kernel: container escapes are well documented, and the attack surface includes namespaces, cgroups, and gaps in seccomp filters.

Firecracker microVMs provide hardware-level isolation via KVM. Each VM runs its own guest kernel, so the attack surface shrinks to the hypervisor boundary. VM escapes are rare enough to command six-figure bug bounties, whereas container escapes are reported far more often. Firecracker is the isolation layer behind AWS Lambda and Fargate, and Fly.io also uses it to run untrusted code.

## Proposed approach: Hybrid (sandbox-only microVMs)

**Keep** gateway and orchestrator as Docker containers (trusted components).
**Replace** sandbox Docker containers with microVMs (untrusted component).

This gives:
- Hardware isolation where it matters — the sandbox is the only untrusted component
- Existing infrastructure preserved for trusted components
- Simpler sandbox security — could drop `.git` shadow mounts, simplify network proxy chain
- Safe Docker-in-VM — a microVM sandbox could safely run Docker internally (relevant to #645 DinD trust model)

## Options to evaluate

### 1. Firecracker microVMs

- <5 MiB memory overhead per microVM (the VMM process itself, not the guest's memory), ~125 ms boot time to guest user space
- Requires KVM on every host
- No native OCI compatibility: each VM needs a guest kernel and a rootfs image (`firecracker-containerd` can bridge this by letting containerd run OCI images inside Firecracker microVMs)
- Orchestrator would need a parallel Firecracker backend alongside Docker in `sandbox_template.py`
- Tap device networking instead of Docker bridge networks
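
To make the Firecracker option more concrete, here is a minimal sketch of the per-VM configuration a Firecracker backend would have to generate. The key names follow the JSON layout Firecracker accepts via `--config-file`. The function name, the file paths, and the boot arguments are illustrative assumptions, not part of the existing orchestrator.

```python
import json


def build_firecracker_config(kernel_path, rootfs_path, tap_device,
                             vcpus=2, mem_mib=4096):
    """Return a dict shaped like a Firecracker --config-file document.

    Defaults mirror the 4 GB RAM / 2 CPU sandbox allocation. The helper
    itself is hypothetical; only the key names come from Firecracker.
    """
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Commonly used boot args for a serial console; adjust as needed.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        # The tap device must already exist on the host.
        "network-interfaces": [
            {"iface_id": "eth0", "host_dev_name": tap_device}
        ],
    }


if __name__ == "__main__":
    config = build_firecracker_config(
        "/var/lib/sandbox/vmlinux",      # hypothetical kernel path
        "/var/lib/sandbox/rootfs.ext4",  # hypothetical rootfs path
        "tap0",
    )
    print(json.dumps(config, indent=2))
```

Unlike a Docker image reference, the kernel, the rootfs file, and the tap device all have to be provisioned on the host before the VM starts, which is where most of the migration cost for this option would come from.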

### 2. Kata Containers (recommended to evaluate first)

- MicroVM isolation with OCI-compatible interface
- Existing Docker/containerd tooling mostly works unchanged
- Orchestrator could continue using the Docker API while the runtime transparently launches microVMs
- Lower migration cost than raw Firecracker
- Supports multiple hypervisor backends, including QEMU, Cloud Hypervisor, and Firecracker
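
As a rough illustration of how small the orchestrator change could be with Kata, the sketch below selects an alternative runtime through the Docker SDK for Python, whose `containers.run()` accepts a `runtime` argument. The helper names and the runtime name `kata-runtime` are assumptions: the name must match whatever is registered with the Docker daemon on the host, and the real spawning path in `sandbox_template.py` may look quite different.

```python
def sandbox_run_kwargs(image, use_microvm, runtime_name="kata-runtime"):
    """Build keyword arguments for docker-py's containers.run().

    `runtime_name` is an assumed registration name, not a fixed constant.
    """
    kwargs = {
        "image": image,
        "detach": True,
        "mem_limit": "4g",
        "nano_cpus": 2_000_000_000,  # 2 CPUs, in units of 1e-9 CPUs
    }
    if use_microvm:
        # Only this key differs between the container and microVM paths.
        kwargs["runtime"] = runtime_name
    return kwargs


def spawn_sandbox(image, use_microvm=True):
    """Start a sandbox; needs the `docker` package and a running daemon."""
    import docker  # imported lazily so the helper above has no dependencies

    client = docker.from_env()
    return client.containers.run(**sandbox_run_kwargs(image, use_microvm))
```

Keeping the runtime choice behind a single flag would let the prototype run the same sandbox image under both runtimes for a side-by-side comparison.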

### 3. gVisor (lightweight alternative)

- User-space kernel (Sentry) intercepts syscalls — stronger than containers, weaker than hardware VMs
- OCI-compatible (drop-in `runsc` runtime)
- No KVM required
- Lowest migration cost but weakest isolation improvement
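
For reference, the "drop-in" part amounts to registering `runsc` as an extra runtime in the Docker daemon configuration (typically `/etc/docker/daemon.json`). The sketch below shows the shape of that entry by merging it into an existing config dict. The helper name and the binary path are assumptions for illustration.

```python
import json


def register_runtime(daemon_config, name, binary_path):
    """Return a copy of a Docker daemon config with one more runtime entry.

    The input dict is not modified.
    """
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes[name] = {"path": binary_path}
    updated["runtimes"] = runtimes
    return updated


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}
    # /usr/local/bin/runsc is an assumed install location.
    merged = register_runtime(existing, "runsc", "/usr/local/bin/runsc")
    print(json.dumps(merged, indent=2))
```

After the daemon is restarted with the new config, individual containers opt in with `docker run --runtime=runsc`, so trusted components can stay on the default runtime.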

## What this does NOT change

- Gateway sidecar architecture (still needed for credential isolation and policy enforcement)
- Network proxy model (still needed for public/private mode enforcement)
- Phase-based file restrictions (still needed for SDLC pipeline control)
- Integration test complexity (#645) — test complexity comes from multi-component orchestration, not the isolation model

## Resource impact

Expected to be minimal. Firecracker's VMM overhead is <5 MiB per VM, which is noise next to the 4 GB RAM / 2 CPUs allocated to each sandbox workload. MicroVM boot time (~125 ms) is comparable to container start time.
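
A quick back-of-the-envelope check of the memory claim, assuming the 4 GB allocation is treated as 4096 MiB and counting only the VMM overhead (guest kernel memory is not included):

```python
def overhead_fraction(vmm_overhead_mib, sandbox_ram_gib):
    """Fraction of the sandbox RAM allocation consumed by VMM overhead."""
    return vmm_overhead_mib / (sandbox_ram_gib * 1024)


pct = overhead_fraction(5, 4) * 100
print(f"{pct:.2f}%")  # prints 0.12%
```

This uses Firecracker's published upper bound. Kata Containers and gVisor have different overhead profiles, which the benchmark task below should measure rather than assume.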

## Investigation tasks

- [ ] Verify KVM availability in target deployment environments (bare metal, cloud VMs with nested virt, GitHub Actions runners)
- [ ] Prototype Kata Containers runtime with existing sandbox Docker image
- [ ] Benchmark boot time and memory overhead vs current Docker containers
- [ ] Evaluate network isolation model (tap devices vs Docker bridge) and impact on gateway proxy routing
- [ ] Assess rootfs image build pipeline if using raw Firecracker
- [ ] Determine impact on `orchestrator/sandbox_template.py` container spawning path
- [ ] Test compatibility with worktree bind mounts and volume architecture
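
The first task can start with a simple probe on each target host. This is a minimal sketch, assuming that a readable and writable `/dev/kvm` is an adequate first signal; the function name is illustrative.

```python
import os


def kvm_available(device_path="/dev/kvm"):
    """Return True if the KVM device exists and is readable and writable
    by the current user."""
    return os.path.exists(device_path) and os.access(
        device_path, os.R_OK | os.W_OK
    )


if __name__ == "__main__":
    print("KVM usable:", kvm_available())
```

A positive result only shows that the device node is accessible. On cloud VMs it does not say how well nested virtualization performs, so the benchmark task is still needed there.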

## References

- [Firecracker](https://firecracker-microvm.github.io/)
- [Kata Containers](https://katacontainers.io/)
- [gVisor](https://gvisor.dev/)
- [Firecracker vs Docker: Security Tradeoffs for Agentic Workloads](https://nextkicklabs.substack.com/p/firecracker-vs-docker-security-tradeoffs)
- Related: #645 (DinD deployment validation)
