CodeBatch

Content-addressed batch execution engine with deterministic sharding and queryable outputs.

What it is: A filesystem-based execution substrate that snapshots code, shards work deterministically, and indexes every output for structured queries — no database required.

Who it's for: Developers building repeatable code analysis pipelines, CI integrations, or batch transformation workflows that need reproducibility and auditability.

Why it's different: Every input is content-addressed and every execution is deterministic. Re-run the same batch six months later and get identical results. Query outputs by semantic type without parsing logs.

Overview

CodeBatch provides a filesystem-based execution substrate for running deterministic transformations over codebases. It captures inputs as immutable snapshots, executes work in isolated shards, and indexes all semantic outputs for efficient querying—without requiring a database.
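
As a rough illustration of what content addressing and deterministic sharding can look like, here is a minimal Python sketch. It is not CodeBatch's implementation: the use of SHA-256, the two-character hash-prefix shard id, and the helper names `content_id`, `shard_for`, and `plan_shards` are all assumptions made for this example. SPEC.md is the authority on the real layout.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify content by its digest (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def shard_for(path: str, prefix_len: int = 2) -> str:
    """Map a path to a shard id via a hash prefix (hypothetical scheme)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()[:prefix_len]


def plan_shards(files: dict[str, bytes]) -> dict[str, list[tuple[str, str]]]:
    """Group (path, content id) pairs by shard, visiting paths in sorted order."""
    shards: dict[str, list[tuple[str, str]]] = {}
    for path in sorted(files):
        entry = (path, content_id(files[path]))
        shards.setdefault(shard_for(path), []).append(entry)
    return shards
```

Because both the shard id and the content id depend only on the inputs, the same set of files yields the same plan regardless of the order in which they are discovered.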

Documentation

SPEC.md — Full storage and execution specification
docs/TASKS.md — Task reference (parse, analyze, symbols, lint)
CHANGELOG.md — Version history

Spec Versioning

The specification uses semantic versioning with draft/stable markers. Each version is tagged in git (e.g., spec-v1.0-draft). Breaking changes increment the major version. Implementations should declare which spec version they target and tolerate unknown fields for forward compatibility.
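
One way an implementation might tolerate unknown fields is to keep the fields it understands and carry the rest along untouched. The sketch below assumes JSON records and a hypothetical `OutputRecord` shape with `spec_version` and `kind` fields; the real record types are defined in schemas/, and the version string format parsed by `major_version` is likewise an assumption.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class OutputRecord:
    # Hypothetical record shape for illustration only.
    spec_version: str
    kind: str
    extra: dict[str, Any] = field(default_factory=dict)


def parse_record(line: str) -> OutputRecord:
    """Parse one JSON record, preserving unrecognized fields in `extra`."""
    raw = json.loads(line)
    known = {f.name for f in fields(OutputRecord)} - {"extra"}
    return OutputRecord(
        **{name: raw[name] for name in known},
        extra={k: v for k, v in raw.items() if k not in known},
    )


def major_version(spec_version: str) -> int:
    """Extract the major version, assuming strings such as "1.0-draft"."""
    return int(spec_version.split(".", 1)[0])
```

A reader built this way keeps working when a later minor version adds fields, and can refuse a record whose major version it does not target.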

Project Structure

schemas/      JSON Schema definitions for all record types
src/          Core implementation
tests/        Test suites and fixtures
examples/     Usage examples
.github/      CI/CD workflows

Quick Start

# Initialize a store
codebatch init ./store

# Create a snapshot of a directory
codebatch snapshot ./my-project --store ./store

# List available pipelines
codebatch pipelines

# Initialize a batch with a pipeline
codebatch batch init --snapshot <id> --pipeline full --store ./store

# Run all tasks and shards (Phase 5 workflow)
codebatch run --batch <id> --store ./store

# View progress
codebatch status --batch <id> --store ./store

# View summary
codebatch summary --batch <id> --store ./store

Human Workflow (Phase 5)

Phase 5 adds human-friendly commands that compose existing primitives:

# Run entire batch (no manual shard iteration needed)
codebatch run --batch <id> --store ./store

# Resume interrupted execution
codebatch resume --batch <id> --store ./store

# Progress summary
codebatch status --batch <id> --store ./store

# Output summary
codebatch summary --batch <id> --store ./store

Discoverability

# List pipelines
codebatch pipelines

# Show pipeline details
codebatch pipeline full

# List tasks in a batch
codebatch tasks --batch <id> --store ./store

# List shards for a task
codebatch shards --batch <id> --task 01_parse --store ./store

Query Aliases

# Show errors
codebatch errors --batch <id> --store ./store

# List files in the snapshot behind a batch
codebatch files --batch <id> --store ./store

# Top output kinds
codebatch top --batch <id> --store ./store

Exploration & Comparison (Phase 6)

Phase 6 adds read-only views for exploring outputs and comparing batches—without modifying the store.

# Inspect all outputs for a file
codebatch inspect src/main.py --batch <id> --store ./store

# Compare two batches
codebatch diff <batchA> <batchB> --store ./store

# Show regressions (new/worsened diagnostics)
codebatch regressions <batchA> <batchB> --store ./store

# Show improvements (fixed/improved diagnostics)
codebatch improvements <batchA> <batchB> --store ./store

# Explain data sources for any command
codebatch inspect src/main.py --batch <id> --store ./store --explain
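
To make the "new/worsened" and "fixed/improved" categories concrete, here is a small Python sketch of how two batches' diagnostics could be compared. It is an illustration under stated assumptions, not the logic behind `codebatch regressions` or `codebatch improvements`: the severity ordering, the `(file path, diagnostic code)` identity, and the `compare` function are all hypothetical.

```python
# Assumed severity ordering; higher means worse.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}

# Hypothetical identity for a diagnostic: (file path, diagnostic code).
Key = tuple[str, str]


def compare(a: dict[Key, str], b: dict[Key, str]) -> tuple[list[Key], list[Key]]:
    """Return (regressions, improvements) going from batch A to batch B."""
    regressions: list[Key] = []
    improvements: list[Key] = []
    for key in sorted(a.keys() | b.keys()):
        before = SEVERITY_RANK.get(a.get(key), -1)  # -1 means "absent"
        after = SEVERITY_RANK.get(b.get(key), -1)
        if after > before:
            regressions.append(key)  # new, or more severe than before
        elif after < before:
            improvements.append(key)  # fixed, or less severe than before
    return regressions, improvements
```

Treating an absent diagnostic as the lowest rank lets one comparison cover all four cases: new, worsened, fixed, and improved.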

Low-Level Commands

For fine-grained control, the original commands remain available:

# Run a specific shard
codebatch run-shard --batch <id> --task 01_parse --shard ab --store ./store

# Query outputs
codebatch query outputs --batch <id> --task 01_parse --store ./store

# Query diagnostics
codebatch query diagnostics --batch <id> --task 01_parse --store ./store

# Build LMDB acceleration cache
codebatch index-build --batch <id> --store ./store

Support

Questions / help: Discussions
Bug reports: Issues

License

MIT
