
Evaluate microVM isolation for sandbox workloads (Firecracker / Kata Containers) #916

@james-in-a-box


Summary

Evaluate replacing Docker container isolation for sandbox workloads with Firecracker microVMs or Kata Containers to achieve hardware-level isolation for untrusted LLM agent code.

Motivation

The sandbox runs untrusted, LLM-generated code. The current isolation model uses Docker containers with extensive defense-in-depth (.git shadow mounts, Squid proxy, gateway credential isolation, phase-based file restrictions). This is effective, but containers share the host kernel. Any kernel bug reachable from inside the container (in namespace or cgroup handling, or through syscalls the seccomp filter fails to block) can become a container escape, and such escapes are well documented.

Firecracker microVMs provide hardware-level isolation via KVM. Each VM runs its own guest kernel, so the attack surface shrinks to the hypervisor boundary (KVM plus Firecracker's minimal device model). VM escapes are rare and command six-figure bug bounties, whereas container escapes are disclosed routinely. This is the isolation model AWS Lambda, AWS Fargate, and Fly.io use for running untrusted code.

Proposed approach: Hybrid (sandbox-only microVMs)

Keep gateway and orchestrator as Docker containers (trusted components).
Replace sandbox Docker containers with microVMs (untrusted component).

This gives:

  • Hardware isolation where it matters — the sandbox is the only untrusted component
  • Existing infrastructure preserved for trusted components
  • Simpler sandbox security — could drop .git shadow mounts, simplify network proxy chain
  • Safe Docker-in-VM: a microVM sandbox could safely run Docker internally (relevant to the DinD trust model in #645, DinD deployment validation in check phase)
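The hybrid split above can be expressed as a per-role runtime choice at container-creation time. The sketch below is illustrative, not the current `sandbox_template.py` code: it assumes the orchestrator uses the Docker SDK for Python, and the role names, the `ContainerSpec` type, and the runtime name `kata-runtime` are hypothetical (the runtime name must match whatever is registered with the host's Docker daemon). It builds keyword arguments for `containers.run()` so that only the sandbox is routed to the microVM-backed runtime.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical runtime name: must match a runtime registered with the
# Docker daemon on the host (e.g. under "runtimes" in daemon.json).
SANDBOX_RUNTIME = "kata-runtime"

# Illustrative role names; only the sandbox runs untrusted code.
TRUSTED_ROLES = {"gateway", "orchestrator"}


@dataclass(frozen=True)
class ContainerSpec:
    role: str
    image: str
    mem_limit: str = "4g"  # sandbox allocation from this issue: 4 GB RAM
    cpus: float = 2.0      # ... and 2 CPUs


def run_kwargs(spec: ContainerSpec, sandbox_runtime: str = SANDBOX_RUNTIME) -> Dict[str, Any]:
    """Build keyword arguments for docker-py's client.containers.run().

    Trusted components keep the daemon's default runtime (normally runc);
    only the untrusted sandbox is sent to the microVM-backed runtime.
    """
    kwargs: Dict[str, Any] = {
        "image": spec.image,
        "detach": True,
        "mem_limit": spec.mem_limit,
        "nano_cpus": int(spec.cpus * 1_000_000_000),
    }
    if spec.role == "sandbox":
        kwargs["runtime"] = sandbox_runtime
    elif spec.role not in TRUSTED_ROLES:
        # Fail closed: an unknown role is never silently given the weaker runtime.
        raise ValueError(f"unknown role {spec.role!r}")
    return kwargs
```

Usage would look like `docker.from_env().containers.run(**run_kwargs(ContainerSpec("sandbox", "sandbox:latest")))`, with `sandbox:latest` as a placeholder image. Keeping the decision in one pure function makes the trust boundary easy to unit-test without a Docker daemon.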

Options to evaluate

1. Firecracker microVMs

  • <5 MiB overhead per VM, ~125ms boot time
  • Requires KVM on every host
  • No native OCI support: each VM needs a guest kernel and a root filesystem image (Docker images can be exported to a rootfs image, or run inside Firecracker VMs via firecracker-containerd)
  • Orchestrator would need a parallel Firecracker backend alongside Docker in sandbox_template.py
  • Tap device networking instead of Docker bridge networks
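To make the raw-Firecracker option concrete, the sketch below generates the JSON document that Firecracker accepts through its `--config-file` flag, sized to the current 2 CPU / 4 GB sandbox allocation. It is a minimal illustration and not a proposed final design. The kernel path, rootfs path, tap device name, and guest MAC are hypothetical placeholders. The tap device must already exist on the host (for example, created with `ip tuntap add dev tap0 mode tap`) and be wired into whatever routing replaces the Docker bridge.

```python
import json
from typing import Any, Dict


def firecracker_config(
    kernel_path: str,
    rootfs_path: str,
    tap_device: str,
    guest_mac: str,
    vcpus: int = 2,
    mem_mib: int = 4096,
) -> Dict[str, Any]:
    """Build a Firecracker --config-file document for one sandbox microVM."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            # Serial console, reboot via keyboard controller, no PCI probing.
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        # A host tap device replaces the Docker bridge network.
        "network-interfaces": [
            {"iface_id": "eth0", "guest_mac": guest_mac, "host_dev_name": tap_device}
        ],
    }


if __name__ == "__main__":
    # Placeholder paths and names; substitute real artifacts from the rootfs pipeline.
    cfg = firecracker_config("/var/lib/sandbox/vmlinux", "/var/lib/sandbox/rootfs.ext4",
                             "tap0", "06:00:AC:10:00:02")
    print(json.dumps(cfg, indent=2))
```

The resulting file would be passed as `firecracker --api-sock /tmp/fc.sock --config-file vm.json`. Each of these fields is something the orchestrator would have to manage itself, which is the migration cost that Kata hides behind the OCI interface.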

2. Kata Containers (recommended to evaluate first)

  • MicroVM isolation with OCI-compatible interface
  • Existing Docker/containerd tooling mostly works unchanged
  • Orchestrator could continue using the Docker API while the runtime transparently launches microVMs
  • Lower migration cost than raw Firecracker
  • Supports multiple hypervisor backends, including QEMU, Cloud Hypervisor, and Firecracker
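Because Kata is meant to be transparent to the Docker API, a prototype needs a cheap way to confirm that a container really landed in a microVM. One heuristic, sketched below under stated assumptions, compares `uname -r` inside the container with the host's kernel release. A runc container shares the host kernel and reports the same release, while a Kata microVM boots its own guest kernel and usually reports a different one. The runtime name and image are hypothetical, and a guest kernel that happens to match the host version would produce a false negative.

```python
import platform
import subprocess


def sandbox_kernel_release(runtime: str, image: str) -> str:
    """Return `uname -r` as reported inside a throwaway container."""
    result = subprocess.run(
        ["docker", "run", "--rm", f"--runtime={runtime}", image, "uname", "-r"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def has_separate_kernel(host_release: str, sandbox_release: str) -> bool:
    """Heuristic: a differing kernel release implies the sandbox is not on the host kernel."""
    return host_release.strip() != sandbox_release.strip()


def host_kernel_release() -> str:
    """Kernel release of the machine running the orchestrator."""
    return platform.release()
```

In a prototype this would be called as `has_separate_kernel(host_kernel_release(), sandbox_kernel_release("kata-runtime", "sandbox:latest"))`, with both names as placeholders.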

3. gVisor (lightweight alternative)

  • User-space kernel (Sentry) intercepts syscalls — stronger than containers, weaker than hardware VMs
  • OCI-compatible (drop-in runsc runtime)
  • No KVM required
  • Lowest migration cost but weakest isolation improvement
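Since Kata and gVisor are both plugged in as Docker runtimes, the orchestrator could discover at startup which isolated runtime a host offers. The sketch below takes the dict returned by the Docker Engine API `/info` endpoint (in docker-py, `docker.from_env().info()`), whose `Runtimes` key maps registered runtime names to their configuration, and picks the strongest preferred option. The preference order and the names `kata-runtime` and `runsc` are assumptions that depend on how each host registers its runtimes. The key design choice is to fail closed rather than silently fall back to runc.

```python
from typing import Any, Dict, Sequence

# Hypothetical registration names, strongest isolation first.
DEFAULT_PREFERENCE = ("kata-runtime", "runsc")


class NoIsolatedRuntimeError(RuntimeError):
    """Raised when the daemon offers no acceptable sandbox runtime."""


def select_sandbox_runtime(
    info: Dict[str, Any], preference: Sequence[str] = DEFAULT_PREFERENCE
) -> str:
    """Return the first preferred runtime that the Docker daemon advertises."""
    available = set(info.get("Runtimes", {}))
    for name in preference:
        if name in available:
            return name
    # Fail closed: untrusted code must never fall back to the shared-kernel runtime.
    raise NoIsolatedRuntimeError(
        f"none of {list(preference)} registered; daemon has {sorted(available)}"
    )
```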

What this does NOT change

  • Gateway sidecar architecture (still needed for credential isolation and policy enforcement)
  • Network proxy model (still needed for public/private mode enforcement)
  • Phase-based file restrictions (still needed for SDLC pipeline control)
  • Integration test complexity (DinD deployment validation in check phase #645) — test complexity comes from multi-component orchestration, not the isolation model

Resource impact

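Minimal. Firecracker's VMM overhead is <5 MiB per VM, although the guest kernel also consumes part of each VM's memory allocation. Sandbox workloads are allocated 4 GB RAM / 2 CPUs, so a few MiB of container or VM overhead (about 0.1% of the allocation) is noise compared to the agent workload. Boot times are comparable for Firecracker (~125 ms); Kata boot time depends on the chosen hypervisor backend and still needs to be benchmarked.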
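The "comparable boot time" claim is easiest to check empirically. The harness below is a minimal sketch, assuming the candidate runtimes are registered with Docker under hypothetical names. It times end-to-end `docker run --rm ... true` launches, which include client, daemon, and teardown latency rather than pure VM boot time. That is arguably the number that matters for the orchestrator's spawn path. Memory overhead would need separate host-side measurement.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List


def docker_run_cmd(runtime: str, image: str) -> List[str]:
    # `true` exits immediately, so timing is dominated by start-up and teardown.
    return ["docker", "run", "--rm", f"--runtime={runtime}", image, "true"]


def docker_launcher(runtime: str, image: str) -> Callable[[], object]:
    """Return a zero-argument callable that launches one throwaway container."""
    cmd = docker_run_cmd(runtime, image)
    return lambda: subprocess.run(cmd, check=True, capture_output=True)


def time_launches(launch: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Call `launch` `runs` times and summarise wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        launch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
        "runs": float(runs),
    }
```

A comparison run would call `time_launches(docker_launcher("runc", "sandbox:latest"), runs=20)` and repeat with each candidate runtime name. Reporting the median alongside the maximum helps catch cold-start outliers that a single mean would hide.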

Investigation tasks

  • Verify KVM availability in target deployment environments (bare metal, cloud VMs with nested virt, GitHub Actions runners)
  • Prototype Kata Containers runtime with existing sandbox Docker image
  • Benchmark boot time and memory overhead vs current Docker containers
  • Evaluate network isolation model (tap devices vs Docker bridge) and impact on gateway proxy routing
  • Assess rootfs image build pipeline if using raw Firecracker
  • Determine impact on orchestrator/sandbox_template.py container spawning path
  • Test compatibility with worktree bind mounts and volume architecture
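For the first investigation task (KVM availability), a preflight check that can run on bare metal, cloud VMs, and CI runners might look like the sketch below. It assumes a Linux host and uses the standard `/dev/kvm` device and the `nested` parameter of the `kvm_intel` / `kvm_amd` modules (which read `Y`/`N` and `1`/`0` respectively). On a cloud VM or GitHub Actions runner, the decisive signal is whether `/dev/kvm` exists and is accessible. The `nested` parameter matters on machines that must themselves expose KVM to guests.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def kvm_usable(dev: str = "/dev/kvm") -> bool:
    """True if the KVM device exists and the current user can open it read/write."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


def nested_virt_enabled(sys_module: str = "/sys/module") -> Optional[bool]:
    """Report whether the loaded KVM module has nested virtualization enabled.

    Returns None when neither kvm_intel nor kvm_amd is loaded on this machine.
    """
    for module in ("kvm_intel", "kvm_amd"):
        param = Path(sys_module) / module / "parameters" / "nested"
        if param.exists():
            return param.read_text().strip() in ("Y", "1")
    return None


def preflight() -> Dict[str, object]:
    """Summarise KVM readiness for a deployment target."""
    return {"kvm_usable": kvm_usable(), "nested_virt": nested_virt_enabled()}
```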

