feat: add Perl language support and JCODEMUNCH_EXTRA_EXTENSIONS#1

Open
halindrome wants to merge 41 commits into main from feat/add-perl-language-support

Conversation

@halindrome
Owner

Summary

  • Perl language support — adds PERL_SPEC with subroutine and package extraction, .pl/.pm/.t extension mappings, POD docstring support, and use constant extraction
  • JCODEMUNCH_EXTRA_EXTENSIONS env var — allows users to map non-standard file extensions to languages at runtime (e.g. .cgi:perl,.psgi:perl) without modifying source
  • Fixture + test coverage: tests/fixtures/perl/sample.pl, with tests in test_languages.py and test_extra_extensions.py

Changes

Perl language support (feat/parser)

  • languages.py: PERL_SPEC defined and registered in LANGUAGE_REGISTRY; .pl, .pm, .t added to LANGUAGE_EXTENSIONS
  • extractor.py: POD node type added to _extract_preceding_comments(); use_statement branch added to _extract_constant() for use constant NAME => value
  • server.py: "perl" added to search_symbols language enum
  • tests/fixtures/perl/sample.pl: representative fixture with packages, subs, POD docs, and constants
  • tests/test_languages.py: test_parse_perl() covering packages, subs, comments, and constants

Configurable extension mappings (feat/parser)

  • languages.py: _apply_extra_extensions() reads JCODEMUNCH_EXTRA_EXTENSIONS at module load, merges .ext:lang pairs into LANGUAGE_EXTENSIONS; unknown languages and malformed entries are skipped with warnings
  • server.py: startup log confirms extra extensions loaded
  • LANGUAGE_SUPPORT.md: documents the env var format, examples, and skip rules
  • tests/test_extra_extensions.py: 10 tests covering valid mappings, unknown languages, malformed entries, overrides, and whitespace handling
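For reviewers, the parsing rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in languages.py: the function names `parse_extra_extensions` and `apply_extra_extensions`, the stand-in tables, and the exact warning text are assumptions, and whether the real implementation requires a leading dot on the extension is not stated in this PR.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the real tables in languages.py.
LANGUAGE_REGISTRY = {"perl": object(), "python": object()}
LANGUAGE_EXTENSIONS = {".pl": "perl", ".pm": "perl", ".t": "perl", ".py": "python"}


def parse_extra_extensions(raw, known_languages):
    """Parse '.ext:lang,.ext:lang' into a dict, skipping bad entries with a warning."""
    mappings = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty segments such as trailing commas
        ext, sep, lang = entry.partition(":")
        ext, lang = ext.strip(), lang.strip()
        if not sep or not ext or not lang:
            logger.warning("Skipping malformed entry %r", entry)
            continue
        if lang not in known_languages:
            logger.warning("Skipping %r: unknown language %r", entry, lang)
            continue
        mappings[ext] = lang
    return mappings


def apply_extra_extensions(environ=os.environ):
    """Merge valid pairs from the env var into LANGUAGE_EXTENSIONS in place."""
    raw = environ.get("JCODEMUNCH_EXTRA_EXTENSIONS", "")
    extra = parse_extra_extensions(raw, LANGUAGE_REGISTRY)
    LANGUAGE_EXTENSIONS.update(extra)  # later entries override built-in mappings
    return extra
```

With `JCODEMUNCH_EXTRA_EXTENSIONS=".cgi:perl,.psgi:perl"` the sketch adds both extensions as Perl; an entry such as `.x:cobol` is dropped because `cobol` is not in the stand-in registry.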

Test plan

  • pytest tests/test_languages.py — Perl extraction passes
  • pytest tests/test_extra_extensions.py — env var parsing passes (10 tests)
  • pytest tests/test_parser.py — no regressions
  • Set JCODEMUNCH_EXTRA_EXTENSIONS=".cgi:perl" and verify .cgi files are indexed as Perl

🤖 Generated with Claude Code

shanemccarron-maker and others added 30 commits March 5, 2026 12:21
- Add PERL_SPEC to languages.py with subroutine_declaration_statement
  (function) and package_statement (class) node types
- Register .pl, .pm, .t extensions in LANGUAGE_EXTENSIONS
- Add "pod" to _extract_preceding_comments() for POD docstring support
- Add use_statement branch to _extract_constant() for Perl constants
- Create tests/fixtures/perl/sample.pl fixture file
- Add test_parse_perl() covering packages, subs, comments, constants
- Add "perl" to search_symbols language enum in server.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds Swift to the language registry using tree-sitter-swift (already
bundled in tree-sitter-language-pack — no new dependency).

Changes:
- languages.py: SWIFT_SPEC covering function_declaration, class_declaration
  (class/struct/enum/extension), protocol_declaration, init_declaration;
  preceding_comment docstring strategy for /// and /* */ comments
- extractor.py: property_declaration handler in _extract_constant for
  Swift let MAX_CONST = ... bindings
- server.py: adds "swift" to the search_symbols language enum
- tests/test_languages.py: test_parse_swift covering function, class,
  struct, protocol, enum, init, method, and constant extraction

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…using-jcodemunch-mcp

Add `--version` / `-V` flag to jcodemunch-mcp CLI
Add JCODEMUNCH_MAX_INDEX_FILES so large repos can be indexed with a higher file cap when needed.

Apply the limit to both local folder and GitHub repo indexing, and only show truncation notices when files were actually dropped.

Cover the env override and truncation behavior in tests.
Harden C++ support after jgravelle#24: header auto-detection, symbol accounting, and coverage expansion
Make the indexing file cap configurable
Index `const foo = () => {}`, `const bar = function() {}`, and
`const gen = function*() {}` patterns as function symbols. The name
is extracted from the parent variable_declarator node.

Changes:
- New `_extract_variable_function()` in extractor.py handles
  variable_declarator nodes with function-like initializers
- Remove `arrow_function` from JS/TS symbol_node_types (was silently
  dropped — caused wasted work on inline callbacks)
- Remove dead guard in `_extract_name()` that returned None for
  arrow_function nodes
- 11 new tests covering positive cases (arrow, function expression,
  generator, exported, typed), negative cases (non-function values,
  inline callbacks, destructuring), and docstring/signature extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…upport

feat: add arrow function variable support for JS/TS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `file_summaries` field to CodeIndex (INDEX_VERSION 2→3)
- Generate heuristic summaries from symbols (e.g., "Defines X class (3 methods)")
- Wire summaries into both full and incremental indexing paths
- Surface summaries in get_file_tree (include_summaries) and get_file_outline
- Use single-pass defaultdict grouping for O(n) symbol-to-file mapping
- Backward compatible: v2 indexes load with empty file_summaries
- 258 tests passing
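A rough sketch of the single-pass grouping and heuristic summary described in this commit. The symbol dict shape (`file`, `kind`, `name`, `parent`), the function name `summarize_files`, and the fallback wording for files without classes are assumptions made for illustration; only the "Defines X class (3 methods)" form comes from the commit message.

```python
from collections import defaultdict


def summarize_files(symbols):
    """Build one heuristic summary string per file from a flat symbol list."""
    by_file = defaultdict(list)
    for sym in symbols:  # single pass over all symbols: O(n) grouping
        by_file[sym["file"]].append(sym)

    summaries = {}
    for path, syms in by_file.items():
        classes = [s for s in syms if s["kind"] == "class"]
        if classes:
            first = classes[0]
            n_methods = sum(
                1 for s in syms
                if s["kind"] == "method" and s.get("parent") == first["name"]
            )
            summaries[path] = f"Defines {first['name']} class ({n_methods} methods)"
        else:
            # Assumed fallback wording; not taken from the PR.
            n_funcs = sum(1 for s in syms if s["kind"] == "function")
            summaries[path] = f"Defines {n_funcs} function(s)"
    return summaries
```

Grouping with `defaultdict(list)` avoids rescanning the full symbol list once per file, which is where a quadratic cost would otherwise come from.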

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements full Elixir symbol extraction via a custom AST walker,
bypassing the standard LanguageSpec approach because Elixir's
tree-sitter grammar is homoiconic (all constructs are generic
`call`/`unary_operator` nodes with no dedicated function/class types).

Symbols extracted:
- defmodule / defimpl  → class
- defprotocol          → type
- def / defp / defmacro / defmacrop / defguard / defguardp → method (inside module) or function (top-level)
- @type / @typep / @opaque / @callback → type
- @doc / @moduledoc string content captured as docstrings
- Nested modules handled via recursive descent with parent tracking
- Multi-clause functions disambiguated by existing _disambiguate_overloads()

Key implementation note: child_by_field_name("arguments") returns None
in the Elixir grammar even though the node exists as a named child.
Added _get_elixir_args() helper that finds it by type iteration.
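The workaround noted above might look something like the sketch below. It relies only on the `type` and `named_children` attributes that py-tree-sitter nodes expose; the helper name `get_args_node` is hypothetical, and the demo uses `SimpleNamespace` stand-ins rather than a real parse tree.

```python
from types import SimpleNamespace


def get_args_node(node):
    """Find the 'arguments' child by scanning named children for its node type."""
    for child in node.named_children:
        if child.type == "arguments":
            return child
    return None


# Stand-in nodes mimicking the shape of a `call` node (not a real tree-sitter tree).
args = SimpleNamespace(type="arguments", named_children=[])
target = SimpleNamespace(type="identifier", named_children=[])
call = SimpleNamespace(type="call", named_children=[target, args])
found = get_args_node(call)
```

Scanning by type is a linear walk over a handful of children, so the cost compared with a field lookup is negligible.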

Files changed:
- src/jcodemunch_mcp/parser/languages.py: ELIXIR_SPEC + extensions
- src/jcodemunch_mcp/parser/extractor.py: _parse_elixir_symbols() + walker
- src/jcodemunch_mcp/server.py: add "elixir" to language enum
- tests/fixtures/elixir/sample.ex: comprehensive fixture
- tests/test_languages.py: test_parse_elixir() with 13 assertions
- tests/test_hardening.py: determinism + per-language extraction tests
- LANGUAGE_SUPPORT.md / README.md: updated language tables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _node_text(), _first_named_child(), _get_elixir_attr_name(), and
  _make_elixir_symbol() helpers to eliminate repeated patterns
- Add _ELIXIR_*_KW frozenset constants to replace inline tuples
- Simplify _find_elixir_do_block() by removing dead arguments branch
- Remove redundant @ operator check in _walk_elixir (attr_name check suffices)
- Inline trivial _extract_elixir_callback() delegation
- Fix single-element tuple comparison and duplicate byte slice
- Remove redundant LANGUAGE_REGISTRY import inside function body
- Net: -46 lines, no behavioral changes (all 257 tests pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…using-jcodemunch-mcp-54c17x

Add CLI -V/--version flag and dynamic package version lookup
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add --version / -V flag to jcodemunch-mcp CLI
…support

feat: add Elixir language support (.ex, .exs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts class, module (type), instance methods, singleton methods
(def self.foo), and top-level functions via standard LanguageSpec.
Preceding # comments captured as docstrings.

Closes jgravelle#31

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `file_path` arg name to tool description so models don't confuse
  it with `file` (observed: models called the tool with `file=...`,
  triggering a silent KeyError that returned only `"'file_path'"`)
- Catch KeyError before generic Exception to return a meaningful
  "Missing required argument" message for all tools
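To illustrate why the ordering of the handlers matters, here is a minimal sketch. The dispatcher `call_tool`, the handler body, and the result dict shape are hypothetical; only the "Missing required argument" wording and the `file` vs `file_path` confusion come from the commit message.

```python
def get_file_outline(arguments):
    # Hypothetical handler: indexing raises KeyError if "file_path" is absent.
    return {"outline_for": arguments["file_path"]}


def call_tool(handler, arguments):
    try:
        return handler(arguments)
    except KeyError as exc:
        # Must come before the generic handler; exc.args[0] is the missing key.
        return {"error": f"Missing required argument: {exc.args[0]}"}
    except Exception as exc:
        return {"error": str(exc)}


# str() of a KeyError is just the quoted key, which explains the old opaque output.
old_message = str(KeyError("file_path"))
new_result = call_tool(get_file_outline, {"file": "a.py"})
```

Without the dedicated `except KeyError` branch, the generic handler would return `old_message`, which is only the quoted key name.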

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
shanemccarron-maker and others added 11 commits March 6, 2026 07:07
- Add Perl row to LANGUAGE_SUPPORT.md feature matrix
- Documents extensions (.pl, .pm, .t), symbol types (function, class, constant), and docstring extraction (preceding # comments and POD blocks)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _apply_extra_extensions() to languages.py, called at module load
time after LANGUAGE_REGISTRY. Reads comma-separated .ext:lang pairs
from JCODEMUNCH_EXTRA_EXTENSIONS env var and merges valid entries into
LANGUAGE_EXTENSIONS in-place. Unknown languages and malformed entries
are skipped with WARNING logs.

Import chain (index_folder.py, index_repo.py) is unaffected — both
import the same LANGUAGE_EXTENSIONS dict object, so in-place mutation
propagates automatically.
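The point about in-place mutation can be checked with plain dicts. This is a standalone illustration using stand-in names, not the project's modules: `imported_view` plays the role of the name bound by `from ... import LANGUAGE_EXTENSIONS` in an importing module.

```python
# Stand-in for the module-level dict in languages.py.
LANGUAGE_EXTENSIONS = {".pl": "perl"}

# An import of the name binds a second reference to the same dict object.
imported_view = LANGUAGE_EXTENSIONS

# In-place mutation is visible through every name bound to that object.
LANGUAGE_EXTENSIONS.update({".cgi": "perl"})
assert imported_view[".cgi"] == "perl"

# Rebinding creates a new dict; an earlier import would keep the old one.
LANGUAGE_EXTENSIONS = {**LANGUAGE_EXTENSIONS, ".psgi": "perl"}
assert ".psgi" not in imported_view
```

This is why the merge has to use something like `update()` rather than assigning a new dict to the module-level name.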

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion

Log extra extension mappings at INFO level in server.py startup.
Document the env var format, rules, and registered languages in
LANGUAGE_SUPPORT.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests/test_extra_extensions.py with 10 test cases covering:
- Valid .ext:lang pair merging
- Unknown language skipping with WARNING
- Malformed entries (no colon, empty ext, empty lang) with WARNING
- Empty/absent env var leaving LANGUAGE_EXTENSIONS unchanged
- Whitespace-only env var handling
- Override of built-in extension mappings
- Mixed valid and invalid entries
- Whitespace stripping in entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ives

- Raise discover_local_files() default max_files from 500 → 10,000 so
  large monorepos are fully indexed instead of silently truncated
- Fix should_skip_file() in index_folder.py and index_repo.py to use
  segment-aware matching for directory patterns (those ending in "/")
  preventing false positives on names like "rebuild/", "proto-utils/",
  or "build-tools/" that contain a skip pattern as a substring
- Update misleading result note to reflect new 10,000 threshold
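A sketch of what segment-aware matching means here, under stated assumptions: the function name `matches_skip_pattern` is hypothetical, directory patterns are taken to be a single segment such as `build/`, and non-directory patterns are assumed to keep plain substring matching.

```python
def matches_skip_pattern(path, pattern):
    """Return True if `path` should be skipped for `pattern`."""
    normalized = path.replace("\\", "/")
    if pattern.endswith("/"):
        # Directory pattern: compare against whole directory segments only,
        # so "build/" does not match "rebuild/" or "build-tools/".
        dir_name = pattern.rstrip("/")
        segments = normalized.split("/")[:-1]  # drop the file name
        return dir_name in segments
    # Assumed: other patterns are still matched as substrings.
    return pattern in normalized
```

A naive `"build/" in path` check returns True for `src/rebuild/out.py`, which is the false positive this commit describes.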

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip # comment markers in _clean_comment_markers() for Perl/shell/Python
- Clean POD directives (=pod, =head1, =cut etc.) leaving only content text
- Fix //! prefix ordering (must check before //) in comment marker stripping
- Raise index_repo max_files from 500 → 2,000 (lower than local 10k due to
  per-file GitHub API calls); update truncation warning message to match
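The prefix-ordering fix and POD cleanup can be illustrated with a small sketch. The name `clean_comment` and the regular expression are assumptions, and the real `_clean_comment_markers()` may treat heading text differently; here the text after a directive such as `=head1` is kept.

```python
import re

# Longer prefixes first: "//!" and "///" must be tested before "//".
_LINE_PREFIXES = ("//!", "///", "//", "#")
# Matches a POD directive keyword such as "=pod", "=head1", or "=cut".
_POD_DIRECTIVE = re.compile(r"^=[a-zA-Z]\w*\s*")


def clean_comment(text):
    """Strip line-comment markers and POD directives, keeping the content text."""
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("="):
            # POD directive: drop the "=name" keyword, keep any trailing text.
            stripped = _POD_DIRECTIVE.sub("", stripped, count=1)
        else:
            for prefix in _LINE_PREFIXES:
                if stripped.startswith(prefix):
                    stripped = stripped[len(prefix):].strip()
                    break
        if stripped:
            cleaned.append(stripped)
    return "\n".join(cleaned)
```

If `//` were checked before `//!`, the input `//! Crate docs` would come out as `! Crate docs`, which is the ordering bug the commit fixes.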

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…file_outline

fix: improve get_file_outline description and error handling
Merged upstream/main (Swift, C++, Ruby, Elixir, get_max_index_files refactor)
with our branch (Perl, JCODEMUNCH_EXTRA_EXTENSIONS, indexer fixes).

- Keep both Perl use_statement and upstream Swift property_declaration
  constant extraction in _extract_constant()
- Add PERL_SPEC, .pl/.pm/.t extensions, and "perl" registry entry
  alongside upstream's new Swift/C++/Ruby/Elixir specs
- Add "perl" to search_symbols language enum
- Restore _apply_extra_extensions() and JCODEMUNCH_EXTRA_EXTENSIONS
  startup log on top of upstream's server.py changes
- Restore LANGUAGE_SUPPORT.md Configuration section
- Restore test_parse_perl() in test_languages.py alongside new upstream tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The should_skip_file() segment-aware matching (from commit 72028b8) was
accidentally reverted when resolving merge conflicts with --theirs.
Reapplied to both index_folder.py and index_repo.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The upstream refactor introduced get_max_index_files() with a 500-file
default. Monorepos easily exceed this cap, causing silent truncation.
Raise the default to 10,000 to match prior behavior; users can still
override via JCODEMUNCH_MAX_INDEX_FILES env var.
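A minimal sketch of an env-overridable cap with the 10,000 default described above. The signature, the fallback for non-numeric or non-positive values, and the `cap_files` helper are assumptions; the actual `get_max_index_files()` from upstream may differ.

```python
import os

DEFAULT_MAX_INDEX_FILES = 10_000


def get_max_index_files(environ=os.environ):
    """Read the file cap from JCODEMUNCH_MAX_INDEX_FILES, else use the default."""
    raw = environ.get("JCODEMUNCH_MAX_INDEX_FILES", "").strip()
    if not raw:
        return DEFAULT_MAX_INDEX_FILES
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_INDEX_FILES  # assumed fallback for non-numeric input
    return value if value > 0 else DEFAULT_MAX_INDEX_FILES


def cap_files(files, limit):
    """Return (kept, dropped_count); a notice is only needed when dropped_count > 0."""
    kept = files[:limit]
    return kept, len(files) - len(kept)
```

Returning the dropped count alongside the kept files makes it straightforward to emit a truncation notice only when something was actually cut.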

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>