Hydra

Parallel Ghidra analysis on AWS. Distribute binary analysis across 100 workers and merge results into a single Ghidra project.

Quick Start

1. AWS Setup

  1. AWS Console → IAM → Users → Create User
  2. Name: hydra
  3. Attach policies:
    • AmazonEC2FullAccess
    • AmazonEC2ContainerRegistryFullAccess
    • AmazonECS_FullAccess
    • AWSBatchFullAccess
    • AmazonS3FullAccess
    • AmazonSQSFullAccess
    • IAMFullAccess
    • CloudWatchLogsFullAccess
  4. Security credentials → Create access key → CLI
  5. Save the Access Key ID and Secret Access Key
  6. If you plan to run more than ~10 workers, request a Fargate Spot vCPU quota increase first (scriptable; see the sketch after this list):
    1. AWS Console → Service Quotas → AWS Fargate
    2. "Fargate Spot vCPU" → Request increase
    3. Enter 400 (max workers * 4 vCPUs)
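
The same quota check and request can be scripted. A minimal boto3 sketch, assuming the hydra credentials from this step are configured locally; the quota code is the one the Troubleshooting section below queries:

# Sketch: check and raise the Fargate Spot vCPU quota with boto3.
import boto3

sq = boto3.client("service-quotas")

# Current quota (same service/quota code as in Troubleshooting).
current = sq.get_service_quota(ServiceCode="fargate", QuotaCode="L-3032A538")
print("Current vCPU quota:", current["Quota"]["Value"])

# Request 400 vCPUs (max workers * 4 vCPUs each).
sq.request_service_quota_increase(
    ServiceCode="fargate", QuotaCode="L-3032A538", DesiredValue=400.0
)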

2. Install & Run

# Install Nix (if needed)
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install

# Enter environment
nix develop

# Configure AWS (paste keys from step 1)
aws configure

# Create infrastructure
./hydra.py setup

# Run analysis
./hydra.py run -b my-bucket/samples -s find_crypto.py

# Check progress
./hydra.py status abc123

# Download results
./hydra.py download abc123

# Cleanup (when done with Hydra entirely)
./hydra.py teardown

Features

  • 100 parallel workers on AWS Fargate Spot (~70% cheaper than On-Demand)
  • SQS work queue - workers pull files dynamically, automatic load balancing
  • Visibility heartbeat - long-running files aren't redelivered and analyzed twice (see the sketch after this list)
  • Continuous merging - merger imports GZFs as workers produce them
  • Rerun mode - re-run scripts on previous results without re-analyzing
  • Project mode - run scripts on existing local Ghidra projects
  • Custom scripts - run your Ghidra/Python scripts on every binary
  • Artifact collection - extract analysis results (JSON, CSV, etc.)
  • Ghidra extensions - install custom extensions across all workers
  • Python dependencies - install packages for PyGhidra scripts
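
The visibility heartbeat is plain SQS: while a worker is still analyzing a file, a background thread keeps extending the message's visibility timeout so the queue doesn't redeliver it to another worker. A minimal sketch of the idea, assuming boto3 and illustrative intervals (not Hydra's actual settings):

# Sketch of an SQS visibility heartbeat (intervals are illustrative).
import threading
import boto3

sqs = boto3.client("sqs")

def heartbeat(queue_url, receipt_handle, stop):
    # Every 60s, push the message's visibility deadline out another
    # 4 minutes, so the file stays owned by this worker until done.
    while not stop.wait(60):
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=240,
        )

# Usage: start before analysis, stop and delete the message when done.
# stop = threading.Event()
# threading.Thread(target=heartbeat, args=(url, handle, stop), daemon=True).start()
# ... run Ghidra analysis ...
# stop.set(); sqs.delete_message(QueueUrl=url, ReceiptHandle=handle)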

Architecture

                        ┌──────────────────────────────────────┐
                        │             hydra.py                 │
                        │  - Scans S3 / previous run / project │
                        │  - Pushes files to SQS work queue    │
                        │  - Submits AWS Batch jobs            │
                        └──────────────────┬───────────────────┘
                                           │
              ┌────────────────────────────┼────────────────────────────┐
              │                            ▼                            │
              │     ┌──────────────────────────────────────────┐        │
              │     │           SQS Work Queue                 │        │
              │     │    (files to process, pulled by workers) │        │
              │     └──────────────────────────────────────────┘        │
              │                            │                            │
              │         ┌──────────────────┼──────────────────┐         │
              │         ▼                  ▼                  ▼         │
              │    ┌─────────┐        ┌─────────┐        ┌─────────┐    │
              │    │Worker 1 │        │Worker 2 │   ...  │Worker N │    │
              │    │ (Spot)  │        │ (Spot)  │        │ (Spot)  │    │
              │    └────┬────┘        └────┬────┘        └────┬────┘    │
              │         │                  │                  │         │
              │         └──────────────────┼──────────────────┘         │
              │                            ▼                            │
              │     ┌──────────────────────────────────────────┐        │
              │     │           SQS Coord Queue                │        │
              │     │  (GZF notifications → Merger)            │        │
              │     └──────────────────────┬───────────────────┘        │
              │                            ▼                            │
              │                    ┌──────────────┐                     │
              │                    │    Merger    │                     │
              │                    │ (On-Demand)  │                     │
              │                    │ 30GB/200GB   │                     │
              │                    └──────┬───────┘                     │
              │                           │                             │
              └───────────────────────────┼─────────────────────────────┘
                                          ▼
                                  ┌──────────────┐
                                  │Merged Ghidra │
                                  │   Project    │
                                  └──────────────┘
  • Work Queue: Files pushed by hydra.py, pulled by workers for automatic load balancing (pull loop sketched below)
  • Workers: Fargate Spot, 4 vCPU/8GB RAM, auto-retry on interruption
  • Coord Queue: Workers notify merger after each GZF upload
  • Merger: Fargate On-Demand, 4 vCPU/30GB RAM/200GB disk
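
The pull model is what provides the load balancing: each worker simply loops on the work queue, so faster workers naturally take more files and a drained queue ends the run. A simplified sketch of that loop, assuming boto3 (analyze_and_upload is a hypothetical stand-in for the real work in container/worker.py):

# Simplified worker pull loop (container/worker.py is the real implementation).
import boto3

sqs = boto3.client("sqs")

def analyze_and_upload(s3_key):
    """Hypothetical stand-in for Ghidra analysis + GZF upload."""

def worker_loop(queue_url):
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long poll instead of hot spinning
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained: this worker is done
        msg = messages[0]
        analyze_and_upload(msg["Body"])
        # Delete only after success, so an interrupted worker's file
        # becomes visible again and is picked up by another worker.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])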

Usage

Config File (Recommended)

hydra.yaml:

input:
  bucket: my-bucket/samples
  extensions: [exe, dll]

analysis:
  scripts:
    - scripts/find_crypto.py
  workers: 50
  timeout: 600

output:
  project_name: malware-analysis
  artifacts:
    - "/tmp/hydra/*"

# Multi-pass modes (alternative to input.bucket)
# rerun: abc123                  # Re-run scripts on previous run
# project: ./my_analysis.gpr    # Run on local Ghidra project

Run with:

./hydra.py run -c hydra.yaml

CLI Arguments

# Basic
./hydra.py run -b my-bucket -s script.py

# Filter by extension
./hydra.py run -b my-bucket -s script.py -e exe -e dll

# Multiple scripts
./hydra.py run -b my-bucket -s script1.py -s script2.py

# Adjust worker count
./hydra.py run -b my-bucket -s script.py -w 50

# Skip merge (get individual GZFs)
./hydra.py run -b my-bucket --no-merge

Rerun Mode

Re-run scripts on GZFs from a previous run without re-analyzing:

# First run - analyze binaries
./hydra.py run -b my-bucket -s script_v1.py
# Output: RUN ID: abc123

# Later - run updated script on same binaries (no re-analysis)
./hydra.py run --rerun abc123 -s script_v2.py

Project Mode

Run scripts on an existing local Ghidra project:

# Run on existing project
./hydra.py run --project ./my_analysis.gpr -s new_script.py

# Add new files to existing project
./hydra.py run --project ./my_analysis.gpr -b my-bucket/new-samples

Writing Scripts

Scripts run in Ghidra's headless environment after auto-analysis:

# find_crypto.py - runs via PyGhidra
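# currentProgram and monitor are globals injected by Ghidra's scripting environment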
from pathlib import Path

def main():
    # Search for AES S-box
    mem = currentProgram.getMemory()
    aes_sbox = bytes([0x63, 0x7c, 0x77, 0x7b])
    addr = mem.findBytes(mem.getMinAddress(), aes_sbox, None, True, monitor)

    if addr:
        print(f"FOUND: AES S-box at {addr}")

        # Write artifact for collection
        output = Path("/tmp/hydra")
        output.mkdir(parents=True, exist_ok=True)
        (output / "crypto.txt").write_text(f"AES S-box: {addr}\n")

main()

Artifact collection: Add patterns to config to collect script outputs:

output:
  artifacts:
    - "/tmp/hydra/*"

Commands

Command                   Description
./hydra.py setup          Create AWS infrastructure
./hydra.py run            Submit analysis jobs
./hydra.py status <id>    Check job progress
./hydra.py download <id>  Download results
./hydra.py cancel <id>    Cancel running jobs
./hydra.py cleanup <id>   Delete SQS queues for a run
./hydra.py teardown       Delete AWS infrastructure

Options

./hydra.py run [options]

Input:
  -b, --bucket BUCKET     S3 bucket (required, or use -c/--rerun/--project)
  -p, --prefix PREFIX     S3 prefix filter
  -e, --ext EXT           File extension filter (repeatable)
  -c, --config FILE       YAML config file

Analysis:
  -s, --script FILE       Script to run (repeatable)
  -w, --workers N         Parallel workers (default: 100)
  -t, --timeout N         Seconds per file (default: 600)
  --loader NAME           Ghidra loader (e.g., RawBinaryLoader)
  --ghidra-extension ZIP  Ghidra extension to install (repeatable)

Output:
  --output-bucket BUCKET  Output bucket (default: same as input)
  --output-prefix PREFIX  Output prefix (default: hydra-output)
  --project-name NAME     Ghidra project name (default: analysis)
  --no-merge              Skip merge, output individual GZFs

Multi-pass modes:
  --rerun RUN_ID          Re-run scripts on GZFs from previous run
  --project PATH          Run against existing Ghidra project (.gpr)

Project Structure

hydra/
├── hydra.py              # CLI
├── flake.nix             # Nix environment
└── container/
    ├── main.tf           # Terraform infrastructure
    ├── constants.py      # Shared constants
    ├── ghidra-batch.Dockerfile
    ├── worker.py         # Analysis worker
    ├── merger.py         # Project merger
    ├── splitter.py       # Project splitter (for rerun/project modes)
    └── scripts/
        └── export_gzf.py # GZF export script

Troubleshooting

Check logs:

# Via AWS Console
AWS Console → CloudWatch → Log Groups → /aws/batch/hydra

# Or via CLI
aws logs tail /aws/batch/hydra --follow

Common issues:

  • "No files found": Check bucket/prefix, ensure files exist
  • Worker timeout: Increase -t timeout for large binaries
  • Out of disk: Merger has 200GB, use --no-merge for very large runs
  • Spot interruption: Workers auto-retry 5x, usually resolves
  • Only ~10 workers running: Request a Fargate Spot quota increase
    (AWS Console → Service Quotas → AWS Fargate → "Fargate Spot vCPU").
    Check the current quota from the CLI:

    aws service-quotas get-service-quota \
      --service-code fargate \
      --quota-code L-3032A538 \
      --query 'Quota.Value'
