Skip to content

Replace glob with compare-changes#113

Open
anttiharju wants to merge 5 commits intompalmer:mainfrom
anttiharju:27-fix-wildcards-105-fix-gitignore-perf
Open

Replace glob with compare-changes#113
anttiharju wants to merge 5 commits intompalmer:mainfrom
anttiharju:27-fix-wildcards-105-fix-gitignore-perf

Conversation

@anttiharju
Copy link
Contributor

@anttiharju anttiharju commented Dec 1, 2025

What / Why

How

I had some free time on my hands so I spent Nov 21 to Nov 30 on building a Rust library called compare-changes that is a reimplementation of GitHub's workflow syntax.

While setting up a trivial parser/pattern matcher is easy enough, one will soon start to learn about things like memoization, backtracking, pathological patterns, automatas and such, which are interesting, but would take me ages to use / account for, so instead the library relies on chumsky and regex for the difficult runtime bits.

Overview of compare-changes
  1. It is a library intended to replace glob for action-validator
  2. It is a CLI / a GitHub Action that replaces my previous (poorly implemented) Python-based compare-changes. It is not that Python is bad (arguably better when targeting GitHub Actions), it is that it was a quick hack that worked well enough that until now wasn't worth improving.

and how the library works:

  • lib.rs defines the public API, and does some input normalization on the provided path string [L41-45] to match GitHub's behavior.
  • path/mod.rs uses chumsky to parse the input strings into Rust structs/enums so the data is easier to work with down the line. It also performs some validations on e.g. brackets and such. An alternative would have been nom but I understood chumsky's a bit better at emitting errors. Most of the integration tests focus on validation errors.

Chumsky is a parser library for Rust that makes writing expressive, high-performance parsers easy.

  • convert/mod.rs ingests the Rust structs/enums to build regex patterns to match the Github Actions workflow syntax filter patterns spec. It has (imho) lots of unit tests at convert/test.rs and it even performs some validation on the patterns by emitting errors on invalid see L300-324.

While GitHub's Documentation is helpful, it doesn't cover various edge cases of how pattern validation should behave. Therefore I've used various temporary repositories to run dummy workflows to see the actual behavior and eventually settled on the use of anttiharju/tmp for the purpose.

Now while I am quite happy with the state of the library, it of course could be that I've missed some edge cases and such, even if I am aware of a decent chunk of them. In case people encounter further issues with action-validator in regards to path patterns, feel free to point them to open an issue in anttiharju/compare-changes. These are fairly easy to provide repros for and it should be easy enough for me to update the library.

Testing

Library has plenty of automated tests. Also tested manually:

  • Usage as a Git hook (via Lefthook) still catches glob issues in pre-commit
  • Works well in a monorepo (nixpkgs). Since the difficult runtime bits are handled by other crates I didn't expect issues here, but a hand-written implementation could have struggled.

Benchmarks

Main motivator for me here was fixing the slow Git hook so here are some numbers from a repo that uses a lot of glob patterns (anttiharju/relcheck@v1.8.13).

worst case: from 7.883 s to 118.5 ms, a 66x improvement (7883/118=66,8050847458)
best case: from 92.4 ms to 119.0 ms, a 26.6 ms overhead

the multiplier may change a lot depending on the repo picked for running the benchmarks. In the performance issue discussion the multiplier was 19x [comment] because the worst case was not as bad. I did check that the numbers here were from 0.8.0 and not 0.6.0, with 0.6.0 worst case went up to ~10s.

Benchmarks ran on a 2021 14" MBP using hyperfine:

hyperfine --warmup 3 "git ls-files -z '.github/workflows/*.yml' '*/action.yml' | xargs -0 action-validator --verbose"
details

worst-case

with a gitignored .direnv from nix-direnv

action-validator@27-fix-wildcards-105-fix-gitignore-perf

  Time (mean ± σ):     118.5 ms ±   0.7 ms    [User: 80.5 ms, System: 36.4 ms]
  Range (min … max):   117.3 ms … 120.2 ms    23 runs

action-validator@v0.8.0

  Time (mean ± σ):      7.883 s ±  0.137 s    [User: 0.814 s, System: 7.062 s]
  Range (min … max):    7.717 s …  8.195 s    10 runs

best case

no gitignored files ensured with

git reset --hard && git clean -dfx

action-validator@27-fix-wildcards-105-fix-gitignore-perf

  Time (mean ± σ):     119.0 ms ±   1.0 ms    [User: 80.6 ms, System: 36.4 ms]
  Range (min … max):   117.8 ms … 121.3 ms    24 runs

action-validator@v0.8.0

  Time (mean ± σ):      92.4 ms ±   0.8 ms    [User: 66.6 ms, System: 26.2 ms]
  Range (min … max):    91.3 ms …  94.3 ms    30 runs

Anything else

  • Noticed that some tests (tests/fixtures/012_github_glob_syntax/glob.yml) did not match GitHub's behavior, so went ahead and fixed them.
  • action-validator now shells out to git for ls-files to solve the performance issue when a repository has a lot of gitignored files.

Comments on the code in the library (lib.rs, path/mod.rs, convert/mod.rs) are also welcome, but not expected.

@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 379fdd3 to 3010328 Compare December 1, 2025 09:51
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 9f1c3e0 to a830265 Compare December 5, 2025 21:27
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch 4 times, most recently from 5fbdf60 to a830265 Compare December 24, 2025 17:53
@anttiharju
Copy link
Contributor Author

anttiharju commented Dec 24, 2025

Broken cache setup is causing the CI failures, it is referring to output of a step that hasn't been executed yet, defaulting to empty. See:

key: ${{ runner.os }}-${{ steps.rust-install.outputs.cachekey }}-${{ matrix.rust_target }}-binary-release

and Linux--build, meaning, ${{ steps.rust-install.outputs.cachekey }} is empty

Run actions/cache@v4
Cache hit for: Linux--build
Received 25165824 of 303862892 (8.3%), 24.0 MBs/sec
Received 150994944 of 303862892 (49.7%), 71.9 MBs/sec
Received 299668588 of 303862892 (98.6%), 95.2 MBs/sec
Received 303862892 of 303862892 (100.0%), 94.2 MBs/sec
Cache Size: ~290 MB (303862892 B)
/usr/bin/tar -xf /home/runner/work/_temp/6dc3820f-ec58-409f-b2ad-c40a4dc4eb50/cache.tzst -P -C /home/runner/work/action-validator/action-validator --use-compress-program unzstd
Cache restored successfully
Cache restored from key: Linux--build

I would just manually delete the caches at https://github.com/mpalmer/action-validator/actions/caches and rerun the ci, but I can't.

For a proper fix something more involved is required and I don't have easy answers for that.

CI passes fine in my fork:

image

@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from a830265 to 5fbdf60 Compare December 24, 2025 18:07
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 5fbdf60 to c0a9294 Compare January 1, 2026 09:58
This unlocks helpful syntax in compare-changes

edition 2024 mandates minimum 1.86 as well, but
had some CI issues so bumping directly to 1.92
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 6793c47 to 9055611 Compare January 1, 2026 10:06
@anttiharju
Copy link
Contributor Author

Fixed CI with 0de554d, it may need further work later but for now should be ok

@anttiharju
Copy link
Contributor Author

@mpalmer happy new year, this PR is ready for review

The library hasn't really changed, just packaging & other updates.
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from be69f1b to 5f2453d Compare February 6, 2026 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance issue in repositories with lots of gitignored files

1 participant