Parallel Ghidra analysis on AWS. Distribute binary analysis across 100 workers and merge results into a single Ghidra project.
- AWS Console → IAM → Users → Create User
- Name: `hydra`
- Attach policies:
  - AmazonEC2FullAccess
  - AmazonEC2ContainerRegistryFullAccess
  - AmazonECS_FullAccess
  - AWSBatchFullAccess
  - AmazonS3FullAccess
  - AmazonSQSFullAccess
  - IAMFullAccess
  - CloudWatchLogsFullAccess
- Security credentials → Create access key → CLI
- Save the Access Key ID and Secret Access Key
- If you plan to run more than 10 workers, request a Fargate Spot vCPU quota increase first:
  - AWS Console → Service Quotas → AWS Fargate → "Fargate Spot vCPU" → Request increase
  - Enter 400 (max workers × 4 vCPUs)
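The quota to request is just workers × vCPUs per worker. A tiny helper (hypothetical, for illustration only) makes the arithmetic explicit:

```python
def required_spot_vcpus(workers: int, vcpus_per_worker: int = 4) -> int:
    """Fargate Spot vCPU quota needed for a given worker count."""
    return workers * vcpus_per_worker

print(required_spot_vcpus(100))  # 400 — the default 100-worker setup
```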
```bash
# Install Nix (if needed)
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install

# Enter environment
nix develop

# Configure AWS (paste keys from step 1)
aws configure

# Create infrastructure
./hydra.py setup

# Run analysis
./hydra.py run -b my-bucket/samples -s find_crypto.py

# Check progress
./hydra.py status abc123

# Download results
./hydra.py download abc123

# Cleanup (when done with Hydra entirely)
./hydra.py teardown
```

- 100 parallel workers on AWS Fargate Spot (~70% cheaper)
- SQS work queue - workers pull files dynamically, automatic load balancing
- Visibility heartbeat - long-running files don't get duplicated
- Continuous merging - merger imports GZFs as workers produce them
- Rerun mode - re-run scripts on previous results without re-analyzing
- Project mode - run scripts on existing local Ghidra projects
- Custom scripts - run your Ghidra/Python scripts on every binary
- Artifact collection - extract analysis results (JSON, CSV, etc.)
- Ghidra extensions - install custom extensions across all workers
- Python dependencies - install packages for PyGhidra scripts
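The visibility heartbeat deserves a sketch: while a worker analyzes a file, a background thread keeps extending the SQS message's visibility timeout so no other worker receives the same file. This is an illustrative boto3-style snippet, not hydra's actual worker code; `heartbeat` and its parameters are hypothetical names.

```python
import threading

def heartbeat(sqs, queue_url, receipt_handle, interval=60, timeout=120):
    """Extend the message's visibility every `interval` seconds until stopped."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (keep extending) and True once set
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=timeout,
            )

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() after the file is processed and deleted
```

If the worker dies (e.g. Spot interruption), the heartbeat stops, the visibility timeout expires, and SQS hands the file to another worker automatically.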
```
┌──────────────────────────────────────┐
│               hydra.py               │
│ - Scans S3 / previous run / project  │
│ - Pushes files to SQS work queue     │
│ - Submits AWS Batch jobs             │
└──────────────────┬───────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│            SQS Work Queue            │
│ (files to process, pulled by workers)│
└──────────────────┬───────────────────┘
                   │
     ┌─────────────┼─────────────┐
     ▼             ▼             ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│Worker 1 │   │Worker 2 │ … │Worker N │
│ (Spot)  │   │ (Spot)  │   │ (Spot)  │
└────┬────┘   └────┬────┘   └────┬────┘
     │             │             │
     └─────────────┼─────────────┘
                   ▼
┌──────────────────────────────────────┐
│            SQS Coord Queue           │
│     (GZF notifications → Merger)     │
└──────────────────┬───────────────────┘
                   ▼
            ┌──────────────┐
            │    Merger    │
            │  (On-Demand) │
            │  30GB/200GB  │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │Merged Ghidra │
            │   Project    │
            └──────────────┘
```
- Work Queue: Files pushed by hydra.py, pulled by workers (automatic load balancing)
- Workers: Fargate Spot, 4 vCPU/8GB RAM, auto-retry on interruption
- Coord Queue: Workers notify merger after each GZF upload
- Merger: Fargate On-Demand, 4 vCPU/30GB RAM/200GB disk
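The pull-based load balancing falls out of the queue semantics: each worker takes one message at a time, so faster workers naturally process more files. A rough sketch of a worker's loop (illustrative only, not the real `worker.py`; `process_file` and the queue URLs are hypothetical):

```python
def worker_loop(sqs, work_queue_url, coord_queue_url, process_file):
    """Pull files one at a time until the work queue is drained."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=work_queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling
        )
        msgs = resp.get("Messages", [])
        if not msgs:
            break  # queue drained, worker exits
        msg = msgs[0]
        # analyze the binary and upload its GZF to S3 (stubbed here)
        gzf_key = process_file(msg["Body"])
        # notify the merger that a new GZF is ready, then ack the work item
        sqs.send_message(QueueUrl=coord_queue_url, MessageBody=gzf_key)
        sqs.delete_message(QueueUrl=work_queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after the GZF upload means a crashed worker's file reappears on the queue and is retried elsewhere.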
`hydra.yaml`:

```yaml
input:
  bucket: my-bucket/samples
  extensions: [exe, dll]

analysis:
  scripts:
    - scripts/find_crypto.py
  workers: 50
  timeout: 600

output:
  project_name: malware-analysis
  artifacts:
    - "/tmp/hydra/*"

# Multi-pass modes (alternative to input.bucket)
# rerun: abc123              # Re-run scripts on previous run
# project: ./my_analysis.gpr # Run on local Ghidra project
```

```bash
./hydra.py run -c hydra.yaml
```
```bash
# Basic
./hydra.py run -b my-bucket -s script.py

# Filter by extension
./hydra.py run -b my-bucket -s script.py -e exe -e dll

# Multiple scripts
./hydra.py run -b my-bucket -s script1.py -s script2.py

# Adjust worker count
./hydra.py run -b my-bucket -s script.py -w 50

# Skip merge (get individual GZFs)
./hydra.py run -b my-bucket --no-merge
```

Re-run scripts on GZFs from a previous run without re-analyzing:
```bash
# First run - analyze binaries
./hydra.py run -b my-bucket -s script_v1.py
# Output: RUN ID: abc123

# Later - run updated script on same binaries (no re-analysis)
./hydra.py run --rerun abc123 -s script_v2.py
```

Run scripts on an existing local Ghidra project:
```bash
# Run on existing project
./hydra.py run --project ./my_analysis.gpr -s new_script.py

# Add new files to existing project
./hydra.py run --project ./my_analysis.gpr -b my-bucket/new-samples
```

Scripts run in Ghidra's headless environment after auto-analysis:
```python
# find_crypto.py - runs via PyGhidra
from pathlib import Path

def main():
    # Search for AES S-box
    mem = currentProgram.getMemory()
    aes_sbox = bytes([0x63, 0x7c, 0x77, 0x7b])
    addr = mem.findBytes(mem.getMinAddress(), aes_sbox, None, True, monitor)
    if addr:
        print(f"FOUND: AES S-box at {addr}")
        # Write artifact for collection
        output = Path("/tmp/hydra")
        output.mkdir(parents=True, exist_ok=True)
        (output / "crypto.txt").write_text(f"AES S-box: {addr}\n")

main()
```

Artifact collection: Add patterns to the config to collect script outputs:
```yaml
output:
  artifacts:
    - "/tmp/hydra/*"
```

| Command | Description |
|---|---|
| `./hydra.py setup` | Create AWS infrastructure |
| `./hydra.py run` | Submit analysis jobs |
| `./hydra.py status <id>` | Check job progress |
| `./hydra.py download <id>` | Download results |
| `./hydra.py cancel <id>` | Cancel running jobs |
| `./hydra.py cleanup <id>` | Delete SQS queues for a run |
| `./hydra.py teardown` | Delete AWS infrastructure |
```
./hydra.py run [options]

Input:
  -b, --bucket BUCKET       S3 bucket (required, or use -c/--rerun/--project)
  -p, --prefix PREFIX       S3 prefix filter
  -e, --ext EXT             File extension filter (repeatable)
  -c, --config FILE         YAML config file

Analysis:
  -s, --script FILE         Script to run (repeatable)
  -w, --workers N           Parallel workers (default: 100)
  -t, --timeout N           Seconds per file (default: 600)
  --loader NAME             Ghidra loader (e.g., RawBinaryLoader)
  --ghidra-extension ZIP    Ghidra extension to install (repeatable)

Output:
  --output-bucket BUCKET    Output bucket (default: same as input)
  --output-prefix PREFIX    Output prefix (default: hydra-output)
  --project-name NAME       Ghidra project name (default: analysis)
  --no-merge                Skip merge, output individual GZFs

Multi-pass modes:
  --rerun RUN_ID            Re-run scripts on GZFs from previous run
  --project PATH            Run against existing Ghidra project (.gpr)
```
```
hydra/
├── hydra.py                  # CLI
├── flake.nix                 # Nix environment
└── container/
    ├── main.tf               # Terraform infrastructure
    ├── constants.py          # Shared constants
    ├── ghidra-batch.Dockerfile
    ├── worker.py             # Analysis worker
    ├── merger.py             # Project merger
    ├── splitter.py           # Project splitter (for rerun/project modes)
    └── scripts/
        └── export_gzf.py     # GZF export script
```
Check logs:

- Via AWS Console: CloudWatch → Log Groups → `/aws/batch/hydra`
- Or via CLI:

```bash
aws logs tail /aws/batch/hydra --follow
```

Common issues:

- "No files found": Check the bucket/prefix and ensure the files exist
- Worker timeout: Increase the `-t` timeout for large binaries
- Out of disk: The merger has 200GB; use `--no-merge` for very large runs
- Spot interruption: Workers auto-retry 5x; this usually resolves itself
- Only ~10 workers running: Request a Fargate Spot quota increase (AWS Console → Service Quotas → AWS Fargate → "Fargate Spot vCPU")

```bash
# Check current quota
aws service-quotas get-service-quota \
  --service-code fargate \
  --quota-code L-3032A538 \
  --query 'Quota.Value'
```