AI-powered GitHub issue intelligence - semantic duplicate detection, cross-repo search, and intelligent issue routing

similigh/simili-bot

Simili Bot

AI-Powered GitHub Issue Intelligence

Automatically detect duplicate issues, find similar issues with semantic search, and intelligently route issues across repositories.


Features

  • Semantic Duplicate Detection — Find related issues using AI-powered embeddings, not just keyword matching.
  • Cross-Repository Search — Search for similar issues across your organization.
  • Intelligent Routing — Automatically transfer issues to the correct repository based on content.
  • Smart Triage — AI-powered labeling and quality assessment.
  • Modular Pipeline — Customize workflows with plug-and-play steps.
  • Multi-Repo Support — Central configuration with per-repo overrides.
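
To make the contrast with keyword matching concrete, the sketch below ranks issues by cosine similarity between embedding vectors. The three-dimensional vectors and the `rank_similar` helper are invented for illustration; the embedding model and vector search that Simili actually uses are not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def rank_similar(query_vec, indexed):
    """Return (issue_number, score) pairs sorted by descending similarity."""
    scored = [(num, cosine_similarity(query_vec, vec)) for num, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (hypothetical values; real embeddings have far more dimensions).
indexed = {
    101: [0.9, 0.1, 0.0],
    102: [0.0, 1.0, 0.0],
    103: [0.8, 0.2, 0.1],
}
ranking = rank_similar([1.0, 0.0, 0.0], indexed)
print([num for num, _ in ranking])
```

Two issues that share no words can still score highly here, as long as their vectors point in a similar direction.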

Architecture

Simili uses a "Lego with Blueprints" architecture:

  • Lego Blocks: Independent, reusable pipeline steps (Gatekeeper, Similarity, Triage, etc.).
  • Blueprints: Pre-defined workflows for common use cases.
  • State Branch: Git-based state management using an orphan branch (no comment scanning).
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Gatekeeper  │───▶│  Similarity │───▶│   Triage    │───▶│   Action    │
│   Check     │    │   Search    │    │  Analysis   │    │  Executor   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
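
The block-and-blueprint idea can be sketched in a few lines. This is a Python illustration of the pattern only, not Simili's Go implementation: the `Context` fields, the step bodies, and the blueprint names `full` and `search-only` are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """State handed from step to step (field names are illustrative)."""
    issue: dict
    similar: list = field(default_factory=list)
    labels: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    halted: bool = False

def gatekeeper(ctx):
    # Stop early for issues that should not be processed, e.g. blank titles.
    if not ctx.issue.get("title", "").strip():
        ctx.halted = True

def similarity(ctx):
    # Placeholder result; a real step would query the vector database.
    ctx.similar = [{"number": 42, "score": 0.91}]

def triage(ctx):
    # Placeholder rule; a real step would ask a model for labels.
    if ctx.similar:
        ctx.labels.append("possible-duplicate")

def action(ctx):
    # Record the intended action instead of calling the GitHub API.
    ctx.actions.append(f"comment on #{ctx.issue['number']}")

# A blueprint is an ordered list of blocks (names here are hypothetical).
BLUEPRINTS = {
    "full": [gatekeeper, similarity, triage, action],
    "search-only": [gatekeeper, similarity],
}

def run(blueprint, issue):
    """Run each block in order, stopping if a block halts the pipeline."""
    ctx = Context(issue=issue)
    for step in BLUEPRINTS[blueprint]:
        step(ctx)
        if ctx.halted:
            break
    return ctx

result = run("full", {"number": 7, "title": "Crash on startup"})
print(result.labels, result.actions)
```

Because each block only reads and writes the shared context, a new workflow is a matter of listing blocks in a different order or leaving some out.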

Quick Start

Simili-Bot supports both Single-Repository and Organization-wide setups.

Setup Guides

Guide                Description
Single Repo Setup    Instructions for setting up Simili-Bot on a standalone repository.
Organization Setup   Best practices for deploying across an organization using Reusable Workflows.

Examples

We provide copy-pasteable examples to get you started quickly:

Available Workflows

You can specify a workflow in your simili.yaml or define custom steps.

Preset            Description
issue-triage      Full pipeline: similarity search, duplicate check, triage analysis, and action execution.
similarity-only   Runs similarity search only. Useful for "Find Similar Issues" features without auto-triage.
index-only        Indexes issues to the vector database without providing feedback.

CLI Commands

Simili provides a CLI for local development, testing, and batch operations.

simili index

Bulk index issues from a GitHub repository into the vector database.

simili index --repo owner/repo --workers 5 --limit 100

Flags:

  • --repo (required): Target repository (owner/name)
  • --workers: Number of concurrent workers (default: 5)
  • --since: Start from issue number or timestamp
  • --limit: Maximum issues to index
  • --dry-run: Simulate without writing to database

simili process

Process a single issue through the pipeline.

simili process --issue issue.json --workflow issue-triage --dry-run

Flags:

  • --issue: Path to issue JSON file
  • --workflow: Workflow preset to run (default: "issue-triage")
  • --dry-run: Run without side effects
  • --repo, --org, --number: Override issue fields

simili batch

Process multiple issues from a JSON file in batch mode. All operations run in dry-run mode to prevent GitHub writes.

simili batch --file issues.json --format csv --out-file results.csv --workers 5

Use Cases:

  • Test bot logic on historical data without spamming repositories
  • Generate reports showing similarity analysis and duplicate detection
  • Analyze issues from repositories where you lack write access
  • Identify transfer recommendations and quality scores in bulk

Flags:

  • --file (required): Path to JSON file with array of issues
  • --out-file: Output file path (stdout if not specified)
  • --format: Output format: json or csv (default: json)
  • --workers: Number of concurrent workers (default: 1)
  • --workflow: Workflow preset (default: "issue-triage")
  • --collection: Override Qdrant collection name
  • --threshold: Override similarity threshold
  • --duplicate-threshold: Override duplicate confidence threshold
  • --top-k: Override max similar issues to show
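
As a rough picture of how the three override flags could interact, the sketch below filters scored candidates, keeps the best few, and flags likely duplicates. It assumes that both thresholds are inclusive cutoffs applied to the same score in [0, 1]; the actual semantics in Simili may differ, for example if the duplicate confidence is a separate value. `select_matches` is a hypothetical helper, not part of the CLI.

```python
def select_matches(candidates, threshold, duplicate_threshold, top_k):
    """Filter scored candidates and flag likely duplicates.

    candidates: list of (issue_number, score) pairs with score in [0, 1].
    Assumption: both thresholds are inclusive cutoffs on the same score.
    """
    # Drop everything below the similarity threshold.
    kept = [c for c in candidates if c[1] >= threshold]
    # Best matches first, then keep at most top_k of them.
    kept.sort(key=lambda c: c[1], reverse=True)
    kept = kept[:top_k]
    return [
        {"number": num, "score": score, "duplicate": score >= duplicate_threshold}
        for num, score in kept
    ]

matches = select_matches(
    [(1, 0.95), (2, 0.50), (3, 0.80), (4, 0.70)],
    threshold=0.65,
    duplicate_threshold=0.90,
    top_k=2,
)
print(matches)
```

Raising the similarity threshold shortens the list of suggestions, while raising the duplicate threshold only changes how many of those suggestions are flagged as duplicates.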

Input Format:

Create a JSON file with an array of issues:

[
  {
    "org": "owner",
    "repo": "repo-name",
    "number": 123,
    "title": "Issue title",
    "body": "Issue description...",
    "state": "open",
    "labels": ["bug", "high-priority"],
    "author": "username",
    "created_at": "2026-02-10T10:00:00Z"
  }
]
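
A small script can generate and sanity-check this file before it is passed to `simili batch`. The sketch below is an assumption-laden helper, not part of Simili: which fields are strictly required is not stated above, so `REQUIRED_FIELDS` is a guess that can be adjusted.

```python
import json

# Assumption: these fields are treated as mandatory for this check only.
REQUIRED_FIELDS = {"org", "repo", "number", "title", "body"}

def validate_issues(issues):
    """Return a list of problems; an empty list means the input looks fine."""
    if not isinstance(issues, list):
        return ["top-level JSON value must be an array"]
    problems = []
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"item {i}: expected an object")
            continue
        missing = REQUIRED_FIELDS - issue.keys()
        if missing:
            problems.append(f"item {i}: missing {sorted(missing)}")
        number = issue.get("number")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if "number" in issue and (isinstance(number, bool) or not isinstance(number, int)):
            problems.append(f"item {i}: 'number' must be an integer")
    return problems

def write_batch_file(issues, path):
    """Validate, then write the issues as a JSON array to path."""
    problems = validate_issues(issues)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(issues, f, indent=2)

example = [{
    "org": "owner",
    "repo": "repo-name",
    "number": 123,
    "title": "Issue title",
    "body": "Issue description...",
}]
print(validate_issues(example))
```

Calling `write_batch_file(example, "issues.json")` would then produce a file in the shape shown above.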

Output Formats:

  • JSON: Full pipeline results with detailed analysis
  • CSV: Flattened summary for spreadsheet analysis

Example Workflow:

# 1. Index repository issues
simili index --repo ballerina-platform/ballerina-library --workers 10

# 2. Prepare test issues in batch.json
# 3. Run batch analysis
simili batch --file batch.json --format csv --out-file analysis.csv --workers 5

# 4. Review results
cat analysis.csv
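
For anything beyond a quick look with `cat`, the CSV can be summarized with a short script. The CSV column names are not documented above, so the sketch below takes the column to group by as a parameter; the `number` and `is_duplicate` columns in the sample are hypothetical.

```python
import csv
import io

def read_results(source):
    """Parse CSV with a header row from a file-like object.

    Returns (column_names, rows) where each row is a dict keyed by column.
    """
    reader = csv.DictReader(source)
    columns = list(reader.fieldnames or [])
    return columns, list(reader)

def count_by(rows, column):
    """Count rows per distinct value of one column."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts

# Hypothetical sample; the real analysis.csv columns may differ.
sample = io.StringIO("number,is_duplicate\n1,true\n2,false\n3,true\n")
columns, rows = read_results(sample)
print(columns, count_by(rows, "is_duplicate"))
```

For a real run, open the file with `open("analysis.csv", newline="", encoding="utf-8")`, pass it to `read_results`, and inspect `columns` first to see which fields are available.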

Development

# Clone the repository
git clone https://github.com/similigh/simili-bot.git
cd simili-bot

# Build
go build ./...

# Run tests
go test ./...

# Lint
go vet ./...

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.


Made by the Simili Team
