
Implementing grammar enumerator #3750

Draft
SWASTIC-7 wants to merge 1 commit into AFLplusplus:main from SWASTIC-7:enumerator

Conversation

@SWASTIC-7

Description

Added an enumeration method for Gramatron, as proposed in the paper https://arxiv.org/pdf/2305.00522, based on issue #2309.

Future work: mutations based on the enumeration methods still need to be implemented.

Checklist

  • I have run ./scripts/precommit.sh and addressed all comments

Copilot AI review requested due to automatic review settings March 9, 2026 09:48
@SWASTIC-7 SWASTIC-7 marked this pull request as draft March 9, 2026 09:48
@SWASTIC-7
Author

@addisoncrump have a look at this please

Contributor

Copilot AI left a comment

Pull request overview

This PR adds deterministic grammar enumeration support for LibAFL’s Gramatron generator, based on the IntegerizedStack approach described in the referenced paper, enabling reproducible generation of the n-th derived input.

Changes:

  • Exposes a new generators::enumerator module implementing enumerate_automaton.
  • Adds GramatronGenerator::enumerate_nth to deterministically construct a GramatronInput from an index n.
  • Adds unit tests for enumerate_automaton using a small sample automaton.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| crates/libafl/src/generators/mod.rs | Exports the new enumerator module. |
| crates/libafl/src/generators/gramatron.rs | Adds enumerate_nth API on GramatronGenerator. |
| crates/libafl/src/generators/enumerator.rs | Implements integerized-stack enumeration logic and adds tests. |

Comment on lines +34 to +42
/// Assumes value codes exactly n integers. Zero afterwards.
pub fn split(&mut self, n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for _ in 0..(n - 1) {
        out.push(self.pop());
    }
    out.push(self.value);
    self.value = 0;
    out

Copilot AI Mar 9, 2026

IntegerizedStack::split will underflow/panic for n == 0 (it does n - 1) and also behaves oddly for n == 0/n == 1. Since this is a public method, please handle these edge cases explicitly (e.g., return an empty vec for n == 0, and for n == 1 just return the current value) instead of relying on n - 1.
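
One way to make the edge cases explicit is sketched below. This is a minimal illustration, not the PR's code: the `IntegerizedStack` here is a simplified stand-in, and its `pop` is a placeholder that peels off a decimal digit, whereas the PR presumably uses a pairing-function decode. Only the shape of `split` is the point.

```rust
/// Hypothetical, simplified stand-in for the PR's `IntegerizedStack`.
pub struct IntegerizedStack {
    pub value: u64,
}

impl IntegerizedStack {
    pub fn new(value: u64) -> Self {
        Self { value }
    }

    /// Placeholder pop: removes and returns the lowest decimal digit.
    /// ASSUMPTION: the real implementation decodes a pairing function instead.
    pub fn pop(&mut self) -> u64 {
        let head = self.value % 10;
        self.value /= 10;
        head
    }

    /// Splits `value` into exactly `n` integers and zeroes it afterwards.
    /// For `n == 0` nothing is decoded and `value` is left untouched.
    pub fn split(&mut self, n: usize) -> Vec<u64> {
        if n == 0 {
            return Vec::new();
        }
        let mut out = Vec::with_capacity(n);
        // `n >= 1` here, so the range `1..n` never underflows.
        for _ in 1..n {
            out.push(self.pop());
        }
        // The last element is whatever remains; for `n == 1` that is `value` itself.
        out.push(core::mem::take(&mut self.value));
        out
    }
}
```

Whether `split(0)` should leave `value` untouched or also zero it is a design choice; the sketch leaves it untouched so that a no-op request does not discard state.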

Comment on lines +46 to +55
/// Rosenberg-Strong pairing decode
fn decode(z: u64) -> (u64, u64) {
    let m = (z as f64).sqrt().floor() as u64;
    let msq = m * m;
    if z - msq < m {
        (z - msq, m)
    } else {
        (m, msq + 2 * m - z)
    }
}

Copilot AI Mar 9, 2026

decode uses (z as f64).sqrt() which can produce incorrect results for large u64 values due to f64 precision loss, and it pulls in floating-point math for a no-std crate. Consider using the existing libafl_bolts::math::integer_sqrt (or another integer sqrt) to compute m without floats, and avoid potential underflow in z - msq if m is off by 1.
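
A float-free variant could look like the sketch below. It uses a hand-written digit-by-digit integer square root (the helper `isqrt` is written here for illustration and is not the `libafl_bolts` function) so the example stays self-contained; the `encode` function is included only to check the round trip. The motivation is that large inputs such as `u64::MAX` round up to 2^64 when converted to `f64`, so the float path cannot be exact over the whole range.

```rust
/// Integer square root (floor) using the digit-by-digit method.
/// Illustrative helper, written for this sketch.
fn isqrt(n: u64) -> u64 {
    let mut rem = n;
    let mut res = 0u64;
    // Start at the highest power of four that fits in a u64.
    let mut bit = 1u64 << 62;
    while bit > rem {
        bit >>= 2;
    }
    while bit != 0 {
        if rem >= res + bit {
            rem -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    res
}

/// Rosenberg-Strong pairing decode without floating point.
fn decode(z: u64) -> (u64, u64) {
    let m = isqrt(z);
    // `m * m <= z` holds because `m` is the exact floor square root,
    // so the subtraction below cannot underflow.
    let d = z - m * m;
    if d < m {
        (d, m)
    } else {
        // Here `m <= d <= 2 * m`, so `2 * m - d` cannot underflow either.
        (m, 2 * m - d)
    }
}

/// Matching encode, used here only to verify the decode.
fn encode(x: u64, y: u64) -> u64 {
    let m = x.max(y);
    m * m + m + x - y
}
```

Computing `2 * m - d` instead of `msq + 2 * m - z` gives the same value while keeping every intermediate result at or below `2 * m`.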

Comment on lines +57 to +63
/// Modular pairing decode
/// Returns (z mod k, (z - (z mod k)) / k)
fn mod_decode(z: u64, k: u64) -> (u64, u64) {
    let a = z % k;
    let b = (z - a) / k;
    (b, a)
}

Copilot AI Mar 9, 2026

The doc comment for mod_decode says it returns (z mod k, (z - (z mod k)) / k), but the implementation returns (quotient, remainder) ((b, a)). Please fix the comment (or swap the tuple) so callers don’t misinterpret the return order.
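
A minimal sketch of the first option (fixing the comment and keeping the existing return order) might read as follows. It assumes callers already depend on the quotient coming first, which the quoted diff does not confirm.

```rust
/// Modular pairing decode.
///
/// Returns `(quotient, remainder)`, i.e. `(z / k, z % k)`.
/// Panics if `k == 0`, like any integer division by zero.
fn mod_decode(z: u64, k: u64) -> (u64, u64) {
    // Integer division already discards the remainder,
    // so `(z - z % k) / k` simplifies to `z / k`.
    (z / k, z % k)
}
```

For example, `mod_decode(17, 5)` yields `(3, 2)`: quotient 3, remainder 2.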

Comment on lines +95 to +106
// if nonterminals then we need to choose one and recurse
let mut stack = IntegerizedStack::new(n - num_terminal);
let num_nonterminal = nonterminal_indices.len() as u64;
let rule_choice = stack.modpop(num_nonterminal) as usize;
let trigger_idx = nonterminal_indices[rule_choice];
let trigger = &triggers[trigger_idx];
let dest = trigger.dest;

let mut result = vec![Terminal::new(state, trigger_idx, trigger.term.clone())];

let child_terminals = enumerate_automaton(dest, stack.value, automaton);
result.extend(child_terminals);

Copilot AI Mar 9, 2026

enumerate_automaton can panic or fail to make progress:

  • If nonterminal_indices is empty and n >= num_terminal, num_nonterminal == 0 and modpop(0) will panic (mod/div by zero).
  • If num_nonterminal == 1, modpop(1) leaves stack.value unchanged, so for grammars like “1 terminal + 1 recursive rule” this recurses forever for all n >= num_terminal.

Please handle num_nonterminal == 0 and == 1 explicitly (likely by using a different decoding step that consumes n, per the IntegerizedStack/paper), or change the API to return a Result when the n-th derivation is undefined.
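
The sketch below shows one possible shape for a non-panicking enumeration. Everything in it is an assumption for illustration: `Trigger` and `Automaton` are simplified stand-ins for the Gramatron types, a "terminal" trigger is taken to be one whose `dest` is the final state, the function name `enumerate_checked` and the `max_steps` budget are hypothetical, and the recursion is rewritten as a loop. It returns `None` when a state has no nonterminal trigger left to take or when the step budget runs out.

```rust
/// Hypothetical, simplified stand-ins for the Gramatron automaton types.
pub struct Trigger {
    pub dest: usize,
    pub term: String,
}

pub struct Automaton {
    pub final_state: usize,
    pub init_state: usize,
    pub pda: Vec<Vec<Trigger>>,
}

/// Returns the terms of the n-th derivation, or `None` if it is undefined
/// or not reached within `max_steps` transitions.
pub fn enumerate_checked(
    automaton: &Automaton,
    mut n: u64,
    max_steps: usize,
) -> Option<Vec<String>> {
    let mut state = automaton.init_state;
    let mut out = Vec::new();
    for _ in 0..max_steps {
        let triggers = automaton.pda.get(state)?;
        // ASSUMPTION: triggers leading to the final state count as terminal.
        let (terminal, nonterminal): (Vec<&Trigger>, Vec<&Trigger>) = triggers
            .iter()
            .partition(|t| t.dest == automaton.final_state);
        let num_terminal = terminal.len() as u64;
        if n < num_terminal {
            out.push(terminal[n as usize].term.clone());
            return Some(out);
        }
        // No nonterminal to choose from: the n-th derivation does not exist.
        if nonterminal.is_empty() {
            return None;
        }
        let rest = n - num_terminal;
        let k = nonterminal.len() as u64; // k >= 1, so `%` and `/` are safe
        let choice = nonterminal[(rest % k) as usize];
        out.push(choice.term.clone());
        n = rest / k;
        state = choice.dest;
    }
    // Budget exhausted, e.g. a cycle of states without terminal triggers.
    None
}
```

In this sketch the index shrinks on every step through a state that has at least one terminal trigger, so the step budget mainly guards against cycles of states that offer a single nonterminal trigger and no terminal one.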

Comment on lines +66 to +75
/// Enumerate the n-th input deterministically using the IntegerizedStack algorithm.
/// This produces a unique [`GramatronInput`] for each value of `n`.
pub fn enumerate_nth(&self, n: u64) -> GramatronInput {
    let terminals = crate::generators::enumerator::enumerate_automaton(
        self.automaton.init_state,
        n,
        self.automaton,
    );
    GramatronInput::new(terminals)
}

Copilot AI Mar 9, 2026

enumerate_nth is documented as producing a unique input for each n, but it currently delegates to enumerate_automaton, which can panic or loop indefinitely for some automata / n values (e.g., when a state has no nonterminals, or has exactly one nonterminal). Consider returning Result<GramatronInput, Error> (new API, so non-breaking) and documenting/handling the “no n-th derivation” case explicitly instead of implicitly panicking/hanging.


2 participants