Security: halindrome/jcodemunch-mcp

Security Controls

jcodemunch-mcp indexes source code from local folders and GitHub repositories. This document describes the security controls that protect against common risks when handling arbitrary codebases.


Path Traversal Prevention

All user-supplied paths are validated before any file is read or written.

  • validate_path(root, target) resolves both paths to absolute form and verifies the target is a descendant of root using os.path.commonpath().
  • Applied during file discovery and again before each file read (defense in depth).
  • Paths such as ../../etc/passwd or absolute paths outside the repository root are rejected.
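
The containment check described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the boolean return type and the choice to interpret relative targets against the root are assumptions.

```python
import os


def validate_path(root: str, target: str) -> bool:
    """Return True if target resolves to a location inside root.

    Sketch only: the return type and the handling of relative
    targets are assumptions, not the project's confirmed behavior.
    """
    root_abs = os.path.abspath(root)
    # Relative targets are interpreted relative to root; an absolute
    # target replaces root entirely in os.path.join.
    target_abs = os.path.abspath(os.path.join(root_abs, target))
    try:
        # commonpath compares whole path components, so "/repo-other"
        # is not mistaken for a child of "/repo".
        return os.path.commonpath([root_abs, target_abs]) == root_abs
    except ValueError:
        # Raised, for example, on Windows when the paths are on different drives.
        return False
```

Comparing path components rather than string prefixes matters here: a plain startswith() check would accept a sibling directory whose name merely begins with the root's name.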

Symlink Escape Protection

Symlinks can be used to escape the repository root and read arbitrary files.

  • Default: follow_symlinks=False — symlinks are skipped during file discovery.
  • When symlinks are followed (follow_symlinks=True), each symlink target is resolved and validated against the repository root. Escaping symlinks are skipped with a warning.
  • is_symlink_escape(root, path) checks whether a symlink resolves outside the root.
  • On Windows environments without symlink support, symlink traversal is skipped automatically.
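
A symlink escape check along these lines could look like the sketch below. It assumes that the check resolves both paths with os.path.realpath() and reuses the commonpath comparison; the real is_symlink_escape may differ in detail.

```python
import os


def is_symlink_escape(root: str, path: str) -> bool:
    """Return True if path resolves (through any symlinks) outside root.

    Sketch only: resolving with realpath and comparing with commonpath
    is an assumed approach, not the project's confirmed implementation.
    """
    root_real = os.path.realpath(root)
    target_real = os.path.realpath(path)
    try:
        return os.path.commonpath([root_real, target_real]) != root_real
    except ValueError:
        # Paths on different drives cannot share a root.
        return True
```

Resolving the root as well as the target avoids false positives when the repository itself lives under a symlinked directory.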

Default Ignore Policy

Files are filtered through multiple layers:

  1. SKIP_PATTERNS — directories and files always excluded (e.g., node_modules/, vendor/, .git/, build/, dist/, generated files, lock files).
  2. .gitignore — respected by default for both local folders and GitHub repositories (via the pathspec library).
  3. extra_ignore_patterns — user-configurable additional gitignore-style patterns passed to indexing tools.
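
The layering can be illustrated with a simplified sketch. The function name should_index, the SKIP_DIRS subset, and the use of fnmatch are all assumptions made for illustration; the project uses the pathspec library for gitignore matching, and fnmatch does not reproduce full gitignore semantics.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical subset of the always-skipped directory names.
SKIP_DIRS = {"node_modules", "vendor", ".git", "build", "dist"}


def should_index(rel_path, gitignore_patterns=(), extra_ignore_patterns=()):
    """Return True if a repository-relative path passes all three layers."""
    parts = PurePosixPath(rel_path).parts
    # Layer 1: always-skipped directories anywhere above the file.
    if any(part in SKIP_DIRS for part in parts[:-1]):
        return False
    # Layers 2 and 3: simplified glob matching against the full
    # relative path and the bare file name (a stand-in for gitignore rules).
    name = parts[-1]
    for pattern in (*gitignore_patterns, *extra_ignore_patterns):
        if fnmatch(rel_path, pattern) or fnmatch(name, pattern):
            return False
    return True
```

A file is indexed only if no layer rejects it, so user-supplied patterns can narrow the set of indexed files but cannot re-include something an earlier layer excluded.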

Secret Exclusion

Files matching known secret patterns are excluded during indexing.

Excluded patterns include:

  • Environment files: .env, .env.*, *.env
  • Certificates / keys: *.pem, *.key, *.p12, *.pfx, *.keystore, *.jks
  • SSH keys: id_rsa*, id_ed25519*, id_dsa*, id_ecdsa*
  • Credentials: credentials.json, service-account*.json, *.credentials
  • Auth files: .htpasswd, .netrc, .npmrc, .pypirc
  • Generic secret indicators: *secret*, *.secrets, *.token

When a secret file is detected, a warning is included in the indexing response. Secret files are never stored in the index or cached content directory.
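
As a rough sketch, the pattern list above could be applied with glob matching on the file name. Matching only the base name and lowercasing it first are assumptions; the actual is_secret_file may match differently.

```python
import os
from fnmatch import fnmatchcase

# Patterns copied from the list above.
SECRET_PATTERNS = [
    ".env", ".env.*", "*.env",
    "*.pem", "*.key", "*.p12", "*.pfx", "*.keystore", "*.jks",
    "id_rsa*", "id_ed25519*", "id_dsa*", "id_ecdsa*",
    "credentials.json", "service-account*.json", "*.credentials",
    ".htpasswd", ".netrc", ".npmrc", ".pypirc",
    "*secret*", "*.secrets", "*.token",
]


def is_secret_file(path: str) -> bool:
    """Return True if the file name matches any known secret pattern."""
    # Assumption: only the base name is matched, case-insensitively.
    name = os.path.basename(path).lower()
    return any(fnmatchcase(name, pattern) for pattern in SECRET_PATTERNS)
```

Under this sketch a broad pattern such as *secret* also matches ordinary source files like secret_manager.py, which trades some coverage for a lower risk of indexing sensitive content.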


File Size Limits

  • Default maximum: 500 KB per file (configurable via max_file_size).
  • Files exceeding the limit are skipped during discovery.
  • A configurable file count limit (default: 500 files) prevents runaway indexing of extremely large repositories.
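
The two limits might be enforced during discovery roughly as shown below. The function name select_files and the max_files parameter are hypothetical, and treating 1 KB as 1024 bytes is an assumption; only max_file_size is named in this document.

```python
import os

MAX_FILE_SIZE = 500 * 1024  # 500 KB, assuming 1 KB = 1024 bytes
MAX_FILE_COUNT = 500        # default file count limit


def select_files(paths, max_file_size=MAX_FILE_SIZE, max_files=MAX_FILE_COUNT):
    """Return the paths to index, applying the size and count limits."""
    selected = []
    for path in paths:
        if len(selected) >= max_files:
            break  # count limit reached: stop discovery early
        if os.path.getsize(path) > max_file_size:
            continue  # oversized file: skipped, not an error
        selected.append(path)
    return selected
```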

Binary File Detection

Binary files are excluded using a two-stage check:

  1. Extension-based detection — common binary extensions (.exe, .dll, .so, .png, .jpg, .zip, .wasm, .pyc, .class, .pdf, .db, .sqlite, etc.).
  2. Content-based detection — files containing null bytes within the first 8 KB are treated as binary and skipped, even if the extension suggests source code.
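
The two-stage check can be sketched as follows. The extension set is limited to the examples listed above, and the sample_size parameter name is an assumption.

```python
import os

# Only the extensions named in this document; the real list is longer.
BINARY_EXTENSIONS = {
    ".exe", ".dll", ".so", ".png", ".jpg", ".zip",
    ".wasm", ".pyc", ".class", ".pdf", ".db", ".sqlite",
}


def is_binary_file(path: str, sample_size: int = 8192) -> bool:
    """Return True if the file looks binary by extension or content."""
    # Stage 1: cheap extension check, no file I/O needed.
    ext = os.path.splitext(path)[1].lower()
    if ext in BINARY_EXTENSIONS:
        return True
    # Stage 2: look for a null byte in the first 8 KB.
    with open(path, "rb") as f:
        return b"\x00" in f.read(sample_size)
```

Running the extension check first avoids opening files that are obviously binary, while the content check catches binary data hidden behind a source-code extension.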

Encoding Safety

  • All file reads use errors="replace" to substitute invalid UTF-8 bytes with the Unicode replacement character (U+FFFD) instead of raising decode errors.
  • Symbol content retrieval also uses errors="replace" to ensure safe decoding.
  • Cached raw files are stored using UTF-8 encoding.
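
The effect of errors="replace" can be seen in a short example. The helper name read_text_safely is hypothetical; the open() arguments are standard Python.

```python
def read_text_safely(path: str) -> str:
    """Read a file as UTF-8, replacing undecodable bytes with U+FFFD."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()


# The same behavior on raw bytes: 0xFF is never valid in UTF-8,
# so it becomes the replacement character instead of raising.
decoded = b"caf\xc3\xa9 \xff ok".decode("utf-8", errors="replace")
```

The decoded text is lossy where bytes were invalid, which is acceptable for indexing but means the result should not be written back over the original file.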

Storage Safety

  • Index storage defaults to ~/.code-index/.
  • The storage path can be overridden using the CODE_INDEX_PATH environment variable.
  • Repository identifiers are derived from {owner}-{name}, preventing path injection in storage locations.
  • Index files are stored as JSON and validated during load to ensure schema integrity.
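
A sketch of how the storage location and a per-repository index path might be derived is shown below. The function names, the character whitelist, and the .json file naming are assumptions for illustration; only the default directory, the CODE_INDEX_PATH variable, and the {owner}-{name} identifier come from this document.

```python
import os
import re
from pathlib import Path


def storage_root() -> Path:
    """Return the index directory, honoring CODE_INDEX_PATH if set."""
    override = os.environ.get("CODE_INDEX_PATH")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".code-index"


def index_path(owner: str, name: str) -> Path:
    """Build the index file path from a sanitized {owner}-{name} identifier."""
    # Assumed sanitization: anything outside a conservative whitelist,
    # including path separators, is replaced with an underscore.
    repo_id = re.sub(r"[^A-Za-z0-9._-]", "_", f"{owner}-{name}")
    return storage_root() / f"{repo_id}.json"
```

Because the identifier always contains a hyphen and never a path separator, it stays a single file name directly under the storage root in this sketch.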

Summary of Controls

| Control | Location | Default |
| --- | --- | --- |
| Path traversal validation | security.validate_path() | Always enabled |
| Symlink escape protection | security.is_symlink_escape() | Symlinks skipped by default |
| Secret file exclusion | security.is_secret_file() | Always enabled |
| Binary file detection | security.is_binary_file() | Always enabled |
| File size limit | File discovery pipeline | 500 KB |
| File count limit | File discovery pipeline | 500 files |
| .gitignore respect | Indexing pipeline | Enabled |
| UTF-8 safe decode | All file reads | errors="replace" |
