Parallel Ghidra analysis on AWS. Distribute binary analysis across 100 workers and merge results into a single Ghidra project.
- AWS Console → IAM → Users → Create User
- Name: `hydra`
- Attach policies:
  - AmazonEC2FullAccess
  - AmazonEC2ContainerRegistryFullAccess
  - AmazonECS_FullAccess
  - AWSBatchFullAccess
  - AmazonS3FullAccess
  - AmazonSQSFullAccess
  - IAMFullAccess
  - CloudWatchLogsFullAccess
- Security credentials → Create access key → CLI
- Save the Access Key ID and Secret Access Key
- If you plan to run more than 10 workers, request a Fargate Spot vCPU quota increase first:
  - AWS Console → Service Quotas → AWS Fargate → "Fargate Spot vCPU" → Request increase
  - Enter 400 (max workers × 4 vCPUs)
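The quota to request is just workers × vCPUs per worker. A tiny helper (hypothetical, for illustration only) makes the arithmetic explicit:

```python
def required_spot_vcpus(workers: int, vcpus_per_worker: int = 4) -> int:
    """Fargate Spot vCPU quota needed for a given worker count."""
    return workers * vcpus_per_worker

print(required_spot_vcpus(100))  # 400 — the default 100-worker setup
```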
```bash
# Install Nix (if needed)
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install

# Enter environment
nix develop

# Configure AWS (paste keys from step 1)
aws configure

# Create infrastructure
./hydra.py setup

# Run analysis
./hydra.py run -b my-bucket/samples -s find_crypto.py

# Check progress
./hydra.py status abc123

# Download results
./hydra.py download abc123

# Cleanup (when done with Hydra entirely)
./hydra.py teardown
```

- 100 parallel workers on AWS Fargate Spot (~70% cheaper)
- SQS work queue - workers pull files dynamically, automatic load balancing
- Visibility heartbeat - long-running files don't get duplicated
- Continuous merging - merger imports GZFs as workers produce them
- Rerun mode - re-run scripts on previous results without re-analyzing
- Project mode - run scripts on existing local Ghidra projects
- Custom scripts - run your Ghidra/Python scripts on every binary
- Artifact collection - extract analysis results (JSON, CSV, etc.)
- Ghidra extensions - install custom extensions across all workers
- Python dependencies - install packages for PyGhidra scripts
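The visibility heartbeat deserves a sketch: while a worker analyzes a file, a background thread keeps extending the SQS message's visibility timeout so no other worker receives the same file. This is an illustrative boto3-style snippet, not hydra's actual worker code; `heartbeat` and its parameters are hypothetical names.

```python
import threading

def heartbeat(sqs, queue_url, receipt_handle, interval=60, timeout=120):
    """Extend the message's visibility every `interval` seconds until stopped."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (keep extending) and True once set
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=timeout,
            )

    threading.Thread(target=loop, daemon=True).start()
    return stop  # call stop.set() after the file is processed and deleted
```

If the worker dies (e.g. Spot interruption), the heartbeat stops, the visibility timeout expires, and SQS hands the file to another worker automatically.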
```
┌──────────────────────────────────────┐
│               hydra.py               │
│ - Scans S3 / previous run / project  │
│ - Pushes files to SQS work queue     │
│ - Submits AWS Batch jobs             │
└──────────────────┬───────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│            SQS Work Queue            │
│ (files to process, pulled by workers)│
└──────────────────┬───────────────────┘
                   │
     ┌─────────────┼─────────────┐
     ▼             ▼             ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│Worker 1 │   │Worker 2 │ … │Worker N │
│ (Spot)  │   │ (Spot)  │   │ (Spot)  │
└────┬────┘   └────┬────┘   └────┬────┘
     │             │             │
     └─────────────┼─────────────┘
                   ▼
┌──────────────────────────────────────┐
│            SQS Coord Queue           │
│     (GZF notifications → Merger)     │
└──────────────────┬───────────────────┘
                   ▼
            ┌──────────────┐
            │    Merger    │
            │  (On-Demand) │
            │  30GB/200GB  │
            └──────┬───────┘
                   │
                   ▼
            ┌──────────────┐
            │Merged Ghidra │
            │   Project    │
            └──────────────┘
```
- Work Queue: Files pushed by hydra.py, pulled by workers (automatic load balancing)
- Workers: Fargate Spot, 4 vCPU/8GB RAM, auto-retry on interruption
- Coord Queue: Workers notify merger after each GZF upload
- Merger: Fargate On-Demand, 4 vCPU/30GB RAM/200GB disk
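The pull-based load balancing falls out of the queue semantics: each worker takes one message at a time, so faster workers naturally process more files. A rough sketch of a worker's loop (illustrative only, not the real `worker.py`; `process_file` and the queue URLs are hypothetical):

```python
def worker_loop(sqs, work_queue_url, coord_queue_url, process_file):
    """Pull files one at a time until the work queue is drained."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=work_queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling
        )
        msgs = resp.get("Messages", [])
        if not msgs:
            break  # queue drained, worker exits
        msg = msgs[0]
        # analyze the binary and upload its GZF to S3 (stubbed here)
        gzf_key = process_file(msg["Body"])
        # notify the merger that a new GZF is ready, then ack the work item
        sqs.send_message(QueueUrl=coord_queue_url, MessageBody=gzf_key)
        sqs.delete_message(QueueUrl=work_queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after the GZF upload means a crashed worker's file reappears on the queue and is retried elsewhere.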
`hydra.yaml`:

```yaml
input:
  bucket: my-bucket/samples
  extensions: [exe, dll]

analysis:
  scripts:
    - scripts/find_crypto.py
  workers: 50
  timeout: 600

output:
  project_name: malware-analysis
  artifacts:
    - "/tmp/hydra/*"

# Multi-pass modes (alternative to input.bucket)
# rerun: abc123              # Re-run scripts on previous run
# project: ./my_analysis.gpr # Run on local Ghidra project
```

```bash
./hydra.py run -c hydra.yaml
```
```bash
# Basic
./hydra.py run -b my-bucket -s script.py

# Filter by extension
./hydra.py run -b my-bucket -s script.py -e exe -e dll

# Multiple scripts
./hydra.py run -b my-bucket -s script1.py -s script2.py

# Adjust worker count
./hydra.py run -b my-bucket -s script.py -w 50

# Skip merge (get individual GZFs)
./hydra.py run -b my-bucket --no-merge
```

Re-run scripts on GZFs from a previous run without re-analyzing:
```bash
# First run - analyze binaries
./hydra.py run -b my-bucket -s script_v1.py
# Output: RUN ID: abc123

# Later - run updated script on same binaries (no re-analysis)
./hydra.py run --rerun abc123 -s script_v2.py
```

Run scripts on an existing local Ghidra project:
```bash
# Run on existing project
./hydra.py run --project ./my_analysis.gpr -s new_script.py

# Add new files to existing project
./hydra.py run --project ./my_analysis.gpr -b my-bucket/new-samples
```

Scripts run in Ghidra's headless environment after auto-analysis:
```python
# find_crypto.py - runs via PyGhidra
from pathlib import Path

def main():
    # Search for AES S-box
    mem = currentProgram.getMemory()
    aes_sbox = bytes([0x63, 0x7c, 0x77, 0x7b])
    addr = mem.findBytes(mem.getMinAddress(), aes_sbox, None, True, monitor)
    if addr:
        print(f"FOUND: AES S-box at {addr}")
        # Write artifact for collection
        output = Path("/tmp/hydra")
        output.mkdir(parents=True, exist_ok=True)
        (output / "crypto.txt").write_text(f"AES S-box: {addr}\n")

main()
```

Artifact collection: Add patterns to the config to collect script outputs:
```yaml
output:
  artifacts:
    - "/tmp/hydra/*"
```

| Command | Description |
|---|---|
| `./hydra.py setup` | Create AWS infrastructure |
| `./hydra.py run` | Submit analysis jobs |
| `./hydra.py status <id>` | Check job progress |
| `./hydra.py download <id>` | Download results |
| `./hydra.py cancel <id>` | Cancel running jobs |
| `./hydra.py cleanup <id>` | Delete SQS queues for a run |
| `./hydra.py teardown` | Delete AWS infrastructure |
```
./hydra.py run [options]

Input:
  -b, --bucket BUCKET       S3 bucket (required, or use -c/--rerun/--project)
  -p, --prefix PREFIX       S3 prefix filter
  -e, --ext EXT             File extension filter (repeatable)
  -c, --config FILE         YAML config file

Analysis:
  -s, --script FILE         Script to run (repeatable)
  -w, --workers N           Parallel workers (default: 100)
  -t, --timeout N           Seconds per file (default: 600)
  --loader NAME             Ghidra loader (e.g., RawBinaryLoader)
  --ghidra-extension ZIP    Ghidra extension to install (repeatable)

Output:
  --output-bucket BUCKET    Output bucket (default: same as input)
  --output-prefix PREFIX    Output prefix (default: hydra-output)
  --project-name NAME       Ghidra project name (default: analysis)
  --no-merge                Skip merge, output individual GZFs

Multi-pass modes:
  --rerun RUN_ID            Re-run scripts on GZFs from previous run
  --project PATH            Run against existing Ghidra project (.gpr)
```
```
hydra/
├── hydra.py                  # CLI
├── flake.nix                 # Nix environment
└── container/
    ├── main.tf               # Terraform infrastructure
    ├── constants.py          # Shared constants
    ├── ghidra-batch.Dockerfile
    ├── worker.py             # Analysis worker
    ├── merger.py             # Project merger
    ├── splitter.py           # Project splitter (for rerun/project modes)
    └── scripts/
        └── export_gzf.py     # GZF export script
```
Check logs:

- Via AWS Console: CloudWatch → Log Groups → `/aws/batch/hydra`
- Or via CLI:

```bash
aws logs tail /aws/batch/hydra --follow
```

Common issues:

- "No files found": Check the bucket/prefix and ensure the files exist
- Worker timeout: Increase the `-t` timeout for large binaries
- Out of disk: The merger has 200GB; use `--no-merge` for very large runs
- Spot interruption: Workers auto-retry 5x; this usually resolves itself
- Only ~10 workers running: Request a Fargate Spot quota increase (AWS Console → Service Quotas → AWS Fargate → "Fargate Spot vCPU")

```bash
# Check current quota
aws service-quotas get-service-quota \
  --service-code fargate \
  --quota-code L-3032A538 \
  --query 'Quota.Value'
```