Switch to new range syntax #2173

Merged
traviscross merged 12 commits into rust-lang:master from ehuss:new-range-syntax
Feb 18, 2026

@ehuss ehuss commented Feb 15, 2026

This changes the syntax for range repeat so that it handles inclusive and exclusive upper bounds with `..=` and `..`.

The old syntax was confusing. It used `..` for an inclusive bound, but that's not how Rust syntax works. This changes it to use `..=` for inclusive bounds, consistent with Rust syntax.

There are some other options for range syntax that I considered:

  • `{a,b}`, which is the syntax used by most regex engines, and some parsers like Pest and Parsimonious.
  • IETF ABNF and W3C EBNF use `a*bexpr` where `a` and `b` are optional.
  • Peg-rs uses `*<n,m>` where `n` and `m` are optional.
  • Various languages use `:` (Python, Julia, Excel, etc.) or `..` (Rust, Kotlin, Swift, C#, F#, Zig, Perl, etc.) to represent ranges.

This will become more relevant when we switch the raw string literals to use a bounded range. We can't easily avoid the use of bounded repetition because of raw-string's bound of 255. Listing out 255 variants would be just too much, and it is convenient to avoid English-descriptive rules.

@rustbot rustbot added the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Feb 15, 2026

@traviscross traviscross left a comment

Looks great. A few things:

  1. In grammar.md, the Unicode production on line 69 still uses 4..4, which under the new exclusive semantics would be the empty range rather than "exactly 4". Should that be 4..=4?

  2. The phrasing "a to b repetitions (exclusive) of x" and "a to b repetitions (inclusive) of x" reads a bit awkwardly — the parenthetical interrupts the noun phrase. Would something like "a to b (exclusive) repetitions of x" or moving the qualifier to the end work better?

  3. The grammar rule for RepeatRangeInclusive requires both bounds (Range `..=` Range), but the parser appears to accept {..=4} (no lower bound). Is this intentional? If {..=b} should be valid, the grammar might want Range? for the lower bound.

  4. Minor: in render_railroad.rs, the comment for the compound range decomposition arm (around diff line 381) seems to have a missing backtick and {e..b-(a-1)} where the min value should be 1.

One edge case I noticed in the railroad renderer: e{a..a} with HalfOpen (e.g., e{2..2}) — an exclusive range where min equals max — decomposes into copies of e rather than rendering as empty. The range 2..2 is empty, so this seems like it should produce nothing. This is probably academic (nobody would write such a range), but might be worth a parser validation or a special case in the renderer.

We need to change this.

src/notation.md Outdated
Comment on lines 19 to 20
| x<sup>a..b</sup> | HEX_DIGIT<sup>1..6</sup> | a to b repetitions (exclusive) of x |
| x<sup>a..=b</sup> | HEX_DIGIT<sup>1..=6</sup> | a to b repetitions (inclusive) of x |

Perhaps it'd be better to put the parenthetical earlier, "a to b (inclusive) repetitions".

// Treat:
// - `e{a..}` as `e{0..a-1} e{1..}`
// - `e{a..=b}` as `e{0..a-1} e{1..=b-(a-1)}`
// - `e{a..b} as `e{0..a-1} {e..b-(a-1)}`

Suggested change
- // - `e{a..b} as `e{0..a-1} {e..b-(a-1)}`
+ // - `e{a..b}` as `e{0..a-1} {e..b-(a-1)}`

Missing a closing backtick.

}

- /// Parse `{a..}` | `{..b}` | `{a..b}` after expression.
+ /// Parse `{a..b}` | `{a..=b}` after expression.

In testing, the renderer still seems to support {..b} and {..=b}.

Comment on lines 67 to 69
RepeatRange -> `{` Range? `..` Range? `}`

RepeatRangeInclusive -> `{` Range `..=` Range `}`

We seem to support {..=b} -- which is what I'd expect. Just not {a..=} or {..=}.

Comment on lines 249 to +258
  let mut es = Vec::<Expression>::new();
  for _ in 0..(a - 1) {
      es.push(*e.clone());
  }
- es.push(Expression::new_kind(ExpressionKind::RepeatRange(
-     e.clone(),
-     Some(1),
-     b.map(|x| x - (a - 1)),
- )));
+ es.push(Expression::new_kind(ExpressionKind::RepeatRange {
+     expr: e.clone(),
+     min: Some(1),
+     max: b.map(|x| x - (a - 1)),
+     limit: *limit,
+ }));

We're rendering x{a..a} incorrectly; we're emitting x one time instead of zero times.

Under the new exclusive-range semantics, `4..4` is an empty range --
it matches zero characters.  The intent is to match exactly four
characters, so we need the inclusive form `4..=4`.
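
To make the difference concrete, here is a minimal sketch that uses Rust's own range types as a stand-in for the grammar notation. The helper names `half_open_counts` and `closed_counts` are hypothetical and exist only for this illustration.

```rust
/// Repetition counts permitted by the half-open form `x{a..b}`.
/// (Hypothetical helper; it models the notation with `std::ops::Range`.)
fn half_open_counts(a: u32, b: u32) -> Vec<u32> {
    (a..b).collect()
}

/// Repetition counts permitted by the closed form `x{a..=b}`.
/// (Hypothetical helper; it models the notation with `RangeInclusive`.)
fn closed_counts(a: u32, b: u32) -> Vec<u32> {
    (a..=b).collect()
}
```

With these helpers, `half_open_counts(4, 4)` yields no counts at all, while `closed_counts(4, 4)` yields only `4`.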

An exclusive range such as `{2..2}` is empty (it matches zero
repetitions), and `{3..2}` doesn't make sense.  The existing
validation catches the case where `max < min`, but it doesn't catch
`max == min` for half-open ranges, which is equally degenerate.

We now reject `b <= a` when the range limit is `HalfOpen`.  The
closed-range check is unchanged: `{2..=2}` (exactly two repetitions)
remains valid.

The parser already accepts `{..=b}` (no lower bound) for inclusive
ranges, but the grammar production requires both bounds.  We update
the production to use `Range?` for the lower bound, matching what the
parser actually accepts.

The forms `{..=}` and `{a..=}` remain correctly rejected by the
parser (a closed range must have an upper bound).

We also update the description table to note that the lower bound can
be omitted.
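
As a rough illustration of which suffix forms are accepted, the following sketch parses the range suffix from a string. It is not the parser from this PR: the `RangeLimit` enum, the `Repeat` struct, and the functions `parse_bound` and `parse_repeat` are assumptions made for the example (only the names `min`, `max`, `limit`, and `HalfOpen` appear in the PR itself), and it ignores whitespace and does no numeric validation.

```rust
/// Hypothetical limit kind; only `HalfOpen` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeLimit {
    HalfOpen,
    Closed,
}

/// Hypothetical parse result mirroring the `min`, `max`, and `limit` fields.
#[derive(Debug, PartialEq)]
struct Repeat {
    min: Option<u32>,
    max: Option<u32>,
    limit: RangeLimit,
}

/// Parse an optional decimal bound; the empty string means "bound omitted".
fn parse_bound(s: &str) -> Result<Option<u32>, String> {
    if s.is_empty() {
        Ok(None)
    } else {
        s.parse::<u32>()
            .map(Some)
            .map_err(|e| format!("invalid bound `{s}`: {e}"))
    }
}

/// Parse `{a..b}`, `{a..}`, `{..b}`, `{a..=b}`, or `{..=b}`.
fn parse_repeat(src: &str) -> Result<Repeat, String> {
    let inner = src
        .strip_prefix('{')
        .and_then(|s| s.strip_suffix('}'))
        .ok_or_else(|| format!("expected `{{...}}`, found `{src}`"))?;
    let (lo, rest) = inner
        .split_once("..")
        .ok_or_else(|| format!("expected `..` in `{src}`"))?;
    // A leading `=` after `..` selects the closed (inclusive) form.
    let (limit, hi) = match rest.strip_prefix('=') {
        Some(hi) => (RangeLimit::Closed, hi),
        None => (RangeLimit::HalfOpen, rest),
    };
    let min = parse_bound(lo)?;
    let max = parse_bound(hi)?;
    // `{..=}` and `{a..=}` are rejected: a closed range needs an upper bound.
    if limit == RangeLimit::Closed && max.is_none() {
        return Err(format!("closed range `{src}` must have an upper bound"));
    }
    Ok(Repeat { min, max, limit })
}
```

Under these assumptions, `parse_repeat("{..=4}")` succeeds with no lower bound, while `parse_repeat("{..=}")` and `parse_repeat("{2..=}")` return errors.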

Several comments in the `RepeatRange` match block are inaccurate or
incomplete after the switch to the new range syntax.

The decomposition comment for the `min >= 2` arm has a missing
backtick and an incorrect formula.  We fix the backtick and correct
the decomposition for `e{a..b}` from `{e..b-(a-1)}` to
`e{1..b-(a-1)}`.  We also drop the separate `e{x..=x}` bullet, which
is just a special case of the `e{a..=b}` decomposition.

The empty-node comment lists `e{a..=0}` and `e{a..0}`, which the
parser now rejects when `a > 0` (an earlier commit rejects empty
exclusive ranges, and the existing validation rejects `max < min`).
We update the comment to list the actually-reachable cases: `e{..=0}`,
`e{0..=0}`, `e{..0}`, `e{..1}`, and `e{0..1}`.

We also update comments on the other arms to mention both the
half-open and closed forms they match.
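
The arithmetic behind the decomposition can be checked with a small sketch. It assumes a `RangeLimit` enum with `HalfOpen` and `Closed` variants (only `HalfOpen` is named in this PR), and the functions `max_count`, `decompose`, and `decomposed_bounds` are hypothetical; the real renderer builds `Expression` nodes rather than returning tuples.

```rust
/// Hypothetical limit kind; only `HalfOpen` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeLimit {
    HalfOpen,
    Closed,
}

/// Largest repetition count a bound allows, or `None` when unbounded.
fn max_count(max: Option<u32>, limit: RangeLimit) -> Option<u32> {
    match (max, limit) {
        (None, _) => None,
        (Some(b), RangeLimit::Closed) => Some(b),
        // A half-open bound excludes `b` itself.
        (Some(b), RangeLimit::HalfOpen) => Some(b.saturating_sub(1)),
    }
}

/// Split `e{a..b}` or `e{a..=b}` into `a - 1` mandatory copies of `e`
/// followed by a repeat whose minimum is 1. The limit kind is unchanged.
/// Precondition (assumed, not checked): `a >= 2` and the range is non-empty.
/// Returns `(copies, inner_min, inner_max)`.
fn decompose(a: u32, b: Option<u32>) -> (u32, u32, Option<u32>) {
    let copies = a - 1;
    (copies, 1, b.map(|x| x - copies))
}

/// Lowest and highest total counts matched by the decomposed form.
fn decomposed_bounds(a: u32, b: Option<u32>, limit: RangeLimit) -> (u32, Option<u32>) {
    let (copies, inner_min, inner_max) = decompose(a, b);
    let highest = max_count(inner_max, limit).map(|m| copies + m);
    (copies + inner_min, highest)
}
```

For example, `e{3..6}` becomes two copies of `e` followed by `e{1..4}`, which still matches 3 to 5 repetitions, and `e{2..=4}` becomes one copy of `e` followed by `e{1..=3}`.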

The parenthetical "(exclusive)" and "(inclusive)" interrupt the noun
phrase "a to b repetitions of x" awkwardly, leaving the reader to work
out which bound is being qualified.

We rephrase to "a to b repetitions of x, exclusive of b" and "a to b
repetitions of x, inclusive of b", which reads more naturally and
makes the referent of the qualifier unambiguous.

All four fields of the `RepeatRange` variant (`expr`, `min`, `max`,
`limit`) are already named in this pattern, so the trailing `..` is
redundant.

Even though the parser now rejects `{a..a}` half-open ranges, the
renderer should handle them correctly on principle.  A half-open range
where `min >= max` is empty -- it specifies zero repetitions -- and
should render as an empty node rather than falling through to the
decomposition arm, which would incorrectly produce copies of the
expression.

The parser validates several invariants on repeat ranges: half-open
ranges must satisfy `max > min`, closed ranges need an explicit upper
bound, and malformed ranges (`max < min`) are rejected outright.  We
had no test coverage for any of these checks.

Let's add tests for the full matrix of valid range forms
(half-open, closed, with and without bounds) as well as the error
paths (`max < min`, empty exclusive ranges like `x{2..2}` and
`x{0..0}`, and closed ranges missing an upper bound).  We also cover
the edge cases `x{2..=2}` (exactly two, via closed range) and
`x{0..1}` (exactly zero, via half-open range).

The range rendering logic in `render_railroad.rs` handles several
tricky edge cases -- empty half-open ranges, exact closed ranges,
decomposition of multi-element ranges -- but had no test coverage.

We add a `for_test()` constructor on `RenderCtx` so that the
renderer's internal `render_expression` function can be called from
tests without needing a full `Chapter` and link-map setup.  We then
add five tests that construct `RepeatRange` expressions directly and
verify the SVG output:

- `e{2..2}` and `e{3..1}` both render as empty nodes (defense in depth
  against the parser's own rejection).
- `e{1..=1}` renders as a single nonterminal with no repeat comment.
- `e{2..=4}` renders with nonterminal content and a "more times"
  repeat comment.
- `e{..=1}` renders as an optional containing the nonterminal.
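
The outcomes these tests expect can be summarized by a small classification sketch. This models only the expected shapes, not the renderer's actual code or its SVG output: the `Shape` and `RangeLimit` enums and the `classify` function are hypothetical, and an omitted lower bound is assumed to behave like 0.

```rust
/// Hypothetical limit kind; only `HalfOpen` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeLimit {
    HalfOpen,
    Closed,
}

/// Hypothetical summary of what a repeat range renders as.
#[derive(Debug, PartialEq)]
enum Shape {
    Empty,
    Once,
    Optional,
    Repeat,
}

fn classify(min: Option<u32>, max: Option<u32>, limit: RangeLimit) -> Shape {
    // Assumption: an omitted lower bound is treated as 0.
    let lo = min.unwrap_or(0);
    // Convert the upper bound to an inclusive count, if there is one.
    let hi = match (max, limit) {
        (None, _) => None,
        (Some(b), RangeLimit::Closed) => Some(b),
        // `{..0}` and `{a..0}` admit no counts at all.
        (Some(0), RangeLimit::HalfOpen) => return Shape::Empty,
        (Some(b), RangeLimit::HalfOpen) => Some(b - 1),
    };
    match hi {
        // Degenerate ranges such as `e{2..2}` and `e{3..1}`.
        Some(h) if h < lo => Shape::Empty,
        // At most zero repetitions, e.g. `e{..=0}` or `e{0..1}`.
        Some(0) => Shape::Empty,
        // Exactly one repetition, e.g. `e{1..=1}`.
        Some(1) if lo == 1 => Shape::Once,
        // Zero or one repetition, e.g. `e{..=1}`.
        Some(1) => Shape::Optional,
        // Everything else needs a repeat, e.g. `e{2..=4}` or `e{1..}`.
        _ => Shape::Repeat,
    }
}
```

Here `classify(Some(2), Some(2), RangeLimit::HalfOpen)` gives `Shape::Empty`, matching the defense-in-depth case, while `classify(Some(2), Some(4), RangeLimit::Closed)` gives `Shape::Repeat`.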

The half-open range `{..0}` means zero to less-than-zero repetitions,
which is empty.  An earlier commit rejects `{a..a}` when both bounds
are present, but `{..0}` (with no lower bound) slipped through because
the validation requires `Some(min)`.  Let's close this gap.
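
One way to close this kind of gap is to treat an omitted lower bound as 0 before comparing. The sketch below combines that with the other checks described in this PR; it is an illustration under assumptions, not the parser's code. The `RangeLimit` enum (only `HalfOpen` is named in the PR), the `validate` function, and the error strings are all hypothetical.

```rust
/// Hypothetical limit kind; only `HalfOpen` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeLimit {
    HalfOpen,
    Closed,
}

/// Check the invariants on a repeat range.
fn validate(min: Option<u32>, max: Option<u32>, limit: RangeLimit) -> Result<(), String> {
    // Assumption: an omitted lower bound behaves like 0, so `{..0}` is
    // checked exactly like `{0..0}`.
    let lo = min.unwrap_or(0);
    match (max, limit) {
        // `{a..=}` and `{..=}`: a closed range needs an upper bound.
        (None, RangeLimit::Closed) => {
            Err("closed range must have an upper bound".to_string())
        }
        // `{a..}` and `{..}` are unbounded above.
        (None, RangeLimit::HalfOpen) => Ok(()),
        // Malformed in either form, e.g. `{3..2}` or `{3..=2}`.
        (Some(hi), _) if hi < lo => Err(format!("max {hi} is less than min {lo}")),
        // Empty half-open range, e.g. `{2..2}`, `{0..0}`, or `{..0}`.
        (Some(hi), RangeLimit::HalfOpen) if hi == lo => {
            Err(format!("half-open range {lo}..{hi} is empty"))
        }
        (Some(_), _) => Ok(()),
    }
}
```

With this shape, `validate(None, Some(0), RangeLimit::HalfOpen)` is rejected just like `validate(Some(0), Some(0), RangeLimit::HalfOpen)`, while `validate(Some(2), Some(2), RangeLimit::Closed)` remains valid.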

The inclusive example row in the notation table uses
`HEX_DIGIT<sup>1..=6</sup>`, but the exclusive row already uses
`HEX_DIGIT<sup>1..6</sup>`.  Let's change the inclusive example to
`1..=5` so that both rows describe the same range
(1 through 5), making the equivalence between `1..6` and
`1..=5` immediately visible.

@traviscross traviscross added this pull request to the merge queue Feb 18, 2026
Merged via the queue into rust-lang:master with commit e518786 Feb 18, 2026
6 checks passed
@rustbot rustbot removed the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Feb 18, 2026