feat: add Perl language support and JCODEMUNCH_EXTRA_EXTENSIONS #1
Open
halindrome wants to merge 41 commits into main
- Add PERL_SPEC to languages.py with subroutine_declaration_statement (function) and package_statement (class) node types
- Register .pl, .pm, .t extensions in LANGUAGE_EXTENSIONS
- Add "pod" to _extract_preceding_comments() for POD docstring support
- Add use_statement branch to _extract_constant() for Perl constants
- Create tests/fixtures/perl/sample.pl fixture file
- Add test_parse_perl() covering packages, subs, comments, constants
- Add "perl" to search_symbols language enum in server.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds Swift to the language registry using tree-sitter-swift (already bundled in tree-sitter-language-pack, so no new dependency).

Changes:
- languages.py: SWIFT_SPEC covering function_declaration, class_declaration (class/struct/enum/extension), protocol_declaration, init_declaration; preceding_comment docstring strategy for /// and /* */ comments
- extractor.py: property_declaration handler in _extract_constant for Swift let MAX_CONST = ... bindings
- server.py: adds "swift" to the search_symbols language enum
- tests/test_languages.py: test_parse_swift covering function, class, struct, protocol, enum, init, method, and constant extraction

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…using-jcodemunch-mcp Add `--version` / `-V` flag to jcodemunch-mcp CLI
Add JCODEMUNCH_MAX_INDEX_FILES so large repos can be indexed with a higher file cap when needed. Apply the limit to both local folder and GitHub repo indexing, and only show truncation notices when files were actually dropped. Cover the env override and truncation behavior in tests.
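A minimal sketch of the behavior described above, not the code from this PR. The helper names `max_index_files` and `apply_file_cap`, the 500-file default, the notice wording, and the fallback for invalid values are all assumptions made for illustration.

```python
import os

DEFAULT_MAX_INDEX_FILES = 500  # assumed default for this sketch


def max_index_files(env=None):
    """Return the file cap, honoring JCODEMUNCH_MAX_INDEX_FILES when it is a positive integer."""
    env = os.environ if env is None else env
    raw = env.get("JCODEMUNCH_MAX_INDEX_FILES", "").strip()
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or non-numeric values fall back to the default.
        return DEFAULT_MAX_INDEX_FILES
    return value if value > 0 else DEFAULT_MAX_INDEX_FILES


def apply_file_cap(files, cap):
    """Truncate to the cap; return the kept files and a notice only if files were dropped."""
    kept = files[:cap]
    dropped = len(files) - len(kept)
    notice = None
    if dropped:
        notice = f"Truncated: {dropped} of {len(files)} files skipped (cap {cap})"
    return kept, notice
```

Passing the environment as a parameter keeps the sketch easy to test without mutating `os.environ`.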
Harden C++ support after jgravelle#24: header auto-detection, symbol accounting, and coverage expansion
Make the indexing file cap configurable
Index `const foo = () => {}`, `const bar = function() {}`, and
`const gen = function*() {}` patterns as function symbols. The name
is extracted from the parent variable_declarator node.
Changes:
- New `_extract_variable_function()` in extractor.py handles
variable_declarator nodes with function-like initializers
- Remove `arrow_function` from JS/TS symbol_node_types (was silently
dropped — caused wasted work on inline callbacks)
- Remove dead guard in `_extract_name()` that returned None for
arrow_function nodes
- 11 new tests covering positive cases (arrow, function expression,
generator, exported, typed), negative cases (non-function values,
inline callbacks, destructuring), and docstring/signature extraction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
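The naming rule in this commit can be illustrated with a small sketch. This is not `_extract_variable_function()` itself: plain dicts stand in for tree-sitter nodes, and the function-like node type names below are assumptions about the JS/TS grammar.

```python
# Assumed node type names for function-like initializers.
FUNCTION_LIKE = {"arrow_function", "function_expression", "generator_function"}


def variable_function_name(declarator):
    """Return the symbol name for `const name = <function>`, or None otherwise."""
    if declarator.get("type") != "variable_declarator":
        return None
    name, value = declarator.get("name"), declarator.get("value")
    # Destructuring targets are patterns, not plain identifiers.
    if not name or name.get("type") != "identifier":
        return None
    # Non-function initializers (numbers, strings, objects) are not symbols here.
    if not value or value.get("type") not in FUNCTION_LIKE:
        return None
    return name["text"]
```

Because the walk starts from the declarator rather than from the arrow function, inline callbacks such as `items.map(x => x + 1)` never reach this function: they have no `variable_declarator` parent to supply a name.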
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…upport feat: add arrow function variable support for JS/TS
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `file_summaries` field to CodeIndex (INDEX_VERSION 2→3)
- Generate heuristic summaries from symbols (e.g., "Defines X class (3 methods)")
- Wire summaries into both full and incremental indexing paths
- Surface summaries in get_file_tree (include_summaries) and get_file_outline
- Use single-pass defaultdict grouping for O(n) symbol-to-file mapping
- Backward compatible: v2 indexes load with empty file_summaries
- 258 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements full Elixir symbol extraction via a custom AST walker, bypassing the standard LanguageSpec approach because Elixir's tree-sitter grammar is homoiconic (all constructs are generic `call`/`unary_operator` nodes with no dedicated function/class types).

Symbols extracted:
- defmodule / defimpl → class
- defprotocol → type
- def / defp / defmacro / defmacrop / defguard / defguardp → method (inside module) or function (top-level)
- @type / @typep / @opaque / @callback → type
- @doc / @moduledoc string content captured as docstrings
- Nested modules handled via recursive descent with parent tracking
- Multi-clause functions disambiguated by existing _disambiguate_overloads()

Key implementation note: child_by_field_name("arguments") returns None in the Elixir grammar even though the node exists as a named child. Added _get_elixir_args() helper that finds it by type iteration.

Files changed:
- src/jcodemunch_mcp/parser/languages.py: ELIXIR_SPEC + extensions
- src/jcodemunch_mcp/parser/extractor.py: _parse_elixir_symbols() + walker
- src/jcodemunch_mcp/server.py: add "elixir" to language enum
- tests/fixtures/elixir/sample.ex: comprehensive fixture
- tests/test_languages.py: test_parse_elixir() with 13 assertions
- tests/test_hardening.py: determinism + per-language extraction tests
- LANGUAGE_SUPPORT.md / README.md: updated language tables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add _node_text(), _first_named_child(), _get_elixir_attr_name(), and _make_elixir_symbol() helpers to eliminate repeated patterns
- Add _ELIXIR_*_KW frozenset constants to replace inline tuples
- Simplify _find_elixir_do_block() by removing dead arguments branch
- Remove redundant @ operator check in _walk_elixir (attr_name check suffices)
- Inline trivial _extract_elixir_callback() delegation
- Fix single-element tuple comparison and duplicate byte slice
- Remove redundant LANGUAGE_REGISTRY import inside function body
- Net: -46 lines, no behavioral changes (all 257 tests pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
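As a rough illustration of the heuristic summaries and the single-pass grouping mentioned above, here is a sketch under stated assumptions. The symbol dict shape (`file`, `kind`, `name`), the function name `summarize_files`, and the fallback wording for files without classes are hypothetical; only the "Defines X class (3 methods)" format comes from the commit message.

```python
from collections import defaultdict


def summarize_files(symbols):
    """Group symbols by file in one pass, then build a one-line summary per file."""
    by_file = defaultdict(list)
    for sym in symbols:  # single pass: O(n) grouping
        by_file[sym["file"]].append(sym)

    summaries = {}
    for path, syms in by_file.items():
        classes = [s for s in syms if s["kind"] == "class"]
        methods = [s for s in syms if s["kind"] == "method"]
        functions = [s for s in syms if s["kind"] == "function"]
        if classes:
            n = len(methods)
            plural = "s" if n != 1 else ""
            summaries[path] = f"Defines {classes[0]['name']} class ({n} method{plural})"
        elif functions:
            n = len(functions)
            plural = "s" if n != 1 else ""
            summaries[path] = f"Defines {n} function{plural}"
        else:
            summaries[path] = ""  # nothing to summarize
    return summaries
```

Grouping once with `defaultdict` avoids rescanning the full symbol list for every file, which would be quadratic on large indexes.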
…using-jcodemunch-mcp-54c17x Add CLI -V/--version flag and dynamic package version lookup
feat: add file-level summaries to index
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add --version / -V flag to jcodemunch-mcp CLI
…support feat: add Elixir language support (.ex, .exs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts class, module (type), instance methods, singleton methods (def self.foo), and top-level functions via standard LanguageSpec. Preceding # comments captured as docstrings.

Closes jgravelle#31

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add `file_path` arg name to tool description so models don't confuse it with `file` (observed: models called the tool with `file=...`, triggering a silent KeyError that returned only `"'file_path'"`)
- Catch KeyError before generic Exception to return a meaningful "Missing required argument" message for all tools

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add Perl row to LANGUAGE_SUPPORT.md feature matrix
- Documents extensions (.pl, .pm, .t), symbol types (function, class, constant), and docstring extraction (preceding # comments and POD blocks)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add _apply_extra_extensions() to languages.py, called at module load time after LANGUAGE_REGISTRY. Reads comma-separated .ext:lang pairs from JCODEMUNCH_EXTRA_EXTENSIONS env var and merges valid entries into LANGUAGE_EXTENSIONS in-place. Unknown languages and malformed entries are skipped with WARNING logs.

Import chain (index_folder.py, index_repo.py) is unaffected: both import the same LANGUAGE_EXTENSIONS dict object, so in-place mutation propagates automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
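The ordering of the two `except` clauses is the point of this fix. The sketch below shows the idea with a hypothetical dispatcher `call_tool` and a stand-in handler; the real server's dispatch code and response shape are not shown in this PR text, so treat both as assumptions.

```python
def call_tool(handler, arguments):
    """Run a tool handler, reporting missing arguments explicitly."""
    try:
        return {"result": handler(arguments)}
    except KeyError as exc:
        # Must come before the generic handler: str(KeyError("file_path"))
        # is just "'file_path'", which is not a useful error message.
        return {"error": f"Missing required argument: {exc.args[0]}"}
    except Exception as exc:
        return {"error": f"{type(exc).__name__}: {exc}"}


def get_file_outline(arguments):
    # Hypothetical handler that requires `file_path`.
    return f"outline of {arguments['file_path']}"
```

One caveat of this approach: a `KeyError` raised deeper inside a handler for an unrelated reason would also be reported as a missing argument.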
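The parsing rules above can be sketched as follows. This is an illustrative reimplementation, not the PR's `_apply_extra_extensions()`: the function signature, the log messages, and passing the known languages as a set are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def apply_extra_extensions(raw, extensions, known_languages):
    """Merge comma-separated `.ext:lang` pairs from `raw` into `extensions` in place."""
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # empty or whitespace-only segments are ignored
        ext, sep, lang = entry.partition(":")
        ext, lang = ext.strip(), lang.strip()
        if not sep or not ext or not lang:
            logger.warning("Skipping malformed entry: %r", entry)
            continue
        if lang not in known_languages:
            logger.warning("Skipping unknown language %r for %r", lang, ext)
            continue
        extensions[ext] = lang  # may override a built-in mapping
    return extensions
```

Mutating the dict rather than rebinding the name is what lets modules that already imported the same dict object see the new mappings.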
…tion Log extra extension mappings at INFO level in server.py startup. Document the env var format, rules, and registered languages in LANGUAGE_SUPPORT.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests/test_extra_extensions.py with 10 test cases covering:
- Valid .ext:lang pair merging
- Unknown language skipping with WARNING
- Malformed entries (no colon, empty ext, empty lang) with WARNING
- Empty/absent env var leaving LANGUAGE_EXTENSIONS unchanged
- Whitespace-only env var handling
- Override of built-in extension mappings
- Mixed valid and invalid entries
- Whitespace stripping in entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ives
- Raise discover_local_files() default max_files from 500 → 10,000 so large monorepos are fully indexed instead of silently truncated
- Fix should_skip_file() in index_folder.py and index_repo.py to use segment-aware matching for directory patterns (those ending in "/"), preventing false positives on names like "rebuild/", "proto-utils/", or "build-tools/" that contain a skip pattern as a substring
- Update misleading result note to reflect new 10,000 threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Strip # comment markers in _clean_comment_markers() for Perl/shell/Python
- Clean POD directives (=pod, =head1, =cut etc.), leaving only content text
- Fix //! prefix ordering (must check before //) in comment marker stripping
- Raise index_repo max_files from 500 → 2,000 (lower than local 10k due to per-file GitHub API calls); update truncation warning message to match

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
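To show what segment-aware matching means here, a minimal sketch follows. The function name `should_skip_path` and the example patterns are hypothetical, and the real `should_skip_file()` may handle non-directory patterns differently.

```python
def should_skip_path(path, skip_patterns):
    """Return True if `path` matches a skip pattern.

    Patterns ending in "/" must match whole directory segments;
    other patterns fall back to plain substring matching.
    """
    normalized = path.replace("\\", "/")
    for pattern in skip_patterns:
        if pattern.endswith("/"):
            # Anchor the pattern on a segment boundary so "build/" matches
            # "src/build/x.py" but not "rebuild/x.py" or "build-tools/x.py".
            if f"/{pattern}" in f"/{normalized}":
                return True
        elif pattern in normalized:
            return True
    return False
```

Prefixing both sides with "/" makes a leading directory such as `build/out.js` behave the same as a nested one.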
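The prefix-ordering issue and the POD cleanup can be sketched as below. This is not `_clean_comment_markers()` itself: the prefix list, the regular expression, and the choice to keep a directive's trailing text (for example `NAME` from `=head1 NAME`) are assumptions.

```python
import re

# Longer prefixes first: "//!" must be tested before "//", or a stray "!" is left behind.
PREFIXES = ("//!", "///", "//", "#")
POD_DIRECTIVE = re.compile(r"^=(\w+)\s*(.*)$")


def clean_comment(text):
    """Strip line-comment markers and POD directives, keeping the content text."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        pod = POD_DIRECTIVE.match(line)
        if pod:
            # "=head1 NAME" keeps "NAME"; bare "=pod" and "=cut" keep nothing.
            line = pod.group(2)
        else:
            for prefix in PREFIXES:
                if line.startswith(prefix):
                    line = line[len(prefix):]
                    break
        line = line.strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)
```

With `//` listed before `//!`, the input `//! Crate docs` would come out as `! Crate docs`, which is the ordering bug the commit describes.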
…file_outline fix: improve get_file_outline description and error handling
Merged upstream/main (Swift, C++, Ruby, Elixir, get_max_index_files refactor) with our branch (Perl, JCODEMUNCH_EXTRA_EXTENSIONS, indexer fixes).

- Keep both Perl use_statement and upstream Swift property_declaration constant extraction in _extract_constant()
- Add PERL_SPEC, .pl/.pm/.t extensions, and "perl" registry entry alongside upstream's new Swift/C++/Ruby/Elixir specs
- Add "perl" to search_symbols language enum
- Restore _apply_extra_extensions() and JCODEMUNCH_EXTRA_EXTENSIONS startup log on top of upstream's server.py changes
- Restore LANGUAGE_SUPPORT.md Configuration section
- Restore test_parse_perl() in test_languages.py alongside new upstream tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The should_skip_file() segment-aware matching (from commit 72028b8) was accidentally reverted when resolving merge conflicts with --theirs. Reapplied to both index_folder.py and index_repo.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The upstream refactor introduced get_max_index_files() with a 500-file default. Monorepos easily exceed this cap, causing silent truncation. Raise the default to 10,000 to match prior behavior; users can still override via JCODEMUNCH_MAX_INDEX_FILES env var. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `PERL_SPEC` with subroutine and package extraction, `.pl`/`.pm`/`.t` extension mappings, POD docstring support, and `use constant` extraction
- `JCODEMUNCH_EXTRA_EXTENSIONS` env var: allows users to map non-standard file extensions to languages at runtime (e.g. `.cgi:perl,.psgi:perl`) without modifying source
- `tests/fixtures/perl/sample.pl` and full test coverage in `test_languages.py` and `test_extra_extensions.py`

Changes

Perl language support (feat/parser)
- `languages.py`: `PERL_SPEC` defined and registered in `LANGUAGE_REGISTRY`; `.pl`, `.pm`, `.t` added to `LANGUAGE_EXTENSIONS`
- `extractor.py`: POD node type added to `_extract_preceding_comments()`; `use_statement` branch added to `_extract_constant()` for `use constant NAME => value`
- `server.py`: `"perl"` added to `search_symbols` language enum
- `tests/fixtures/perl/sample.pl`: representative fixture with packages, subs, POD docs, and constants
- `tests/test_languages.py`: `test_parse_perl()` covering packages, subs, comments, and constants

Configurable extension mappings (feat/parser)
- `languages.py`: `_apply_extra_extensions()` reads `JCODEMUNCH_EXTRA_EXTENSIONS` at module load, merges `.ext:lang` pairs into `LANGUAGE_EXTENSIONS`; unknown languages and malformed entries are skipped with warnings
- `server.py`: startup log confirms extra extensions loaded
- `LANGUAGE_SUPPORT.md`: documents the env var format, examples, and skip rules
- `tests/test_extra_extensions.py`: 10 tests covering valid mappings, unknown languages, malformed entries, overrides, and whitespace handling

Test plan
- `pytest tests/test_languages.py`: Perl extraction passes
- `pytest tests/test_extra_extensions.py`: env var parsing passes (10 tests)
- `pytest tests/test_parser.py`: no regressions
- `JCODEMUNCH_EXTRA_EXTENSIONS=".cgi:perl"` and verify `.cgi` files are indexed as Perl

🤖 Generated with Claude Code