Conversation
|
@addisoncrump have a look at this please |
There was a problem hiding this comment.
Pull request overview
This PR adds deterministic grammar enumeration support for LibAFL’s Gramatron generator, based on the IntegerizedStack approach described in the referenced paper, enabling reproducible generation of the n-th derived input.
Changes:
- Exposes a new
generators::enumeratormodule implementingenumerate_automaton. - Adds
GramatronGenerator::enumerate_nthto deterministically construct aGramatronInputfrom an indexn. - Adds unit tests for
enumerate_automatonusing a small sample automaton.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
crates/libafl/src/generators/mod.rs |
Exports the new enumerator module. |
crates/libafl/src/generators/gramatron.rs |
Adds enumerate_nth API on GramatronGenerator. |
crates/libafl/src/generators/enumerator.rs |
Implements integerized-stack enumeration logic and adds tests. |
| /// Assumes value codes exactly n integers. Zero afterwards. | ||
| pub fn split(&mut self, n: usize) -> Vec<u64> { | ||
| let mut out = Vec::with_capacity(n); | ||
| for _ in 0..(n - 1) { | ||
| out.push(self.pop()); | ||
| } | ||
| out.push(self.value); | ||
| self.value = 0; | ||
| out |
There was a problem hiding this comment.
IntegerizedStack::split will underflow/panic for n == 0 (it does n - 1) and also behaves oddly for n == 0/n == 1. Since this is a public method, please handle these edge cases explicitly (e.g., return an empty vec for n == 0, and for n == 1 just return the current value) instead of relying on n - 1.
| /// Rosenberg-Strong pairing decode | ||
| fn decode(z: u64) -> (u64, u64) { | ||
| let m = (z as f64).sqrt().floor() as u64; | ||
| let msq = m * m; | ||
| if z - msq < m { | ||
| (z - msq, m) | ||
| } else { | ||
| (m, msq + 2 * m - z) | ||
| } | ||
| } |
There was a problem hiding this comment.
decode uses (z as f64).sqrt() which can produce incorrect results for large u64 values due to f64 precision loss, and it pulls in floating-point math for a no-std crate. Consider using the existing libafl_bolts::math::integer_sqrt (or another integer sqrt) to compute m without floats, and avoid potential underflow in z - msq if m is off by 1.
| /// Modular pairing decode | ||
| /// Returns (z mod k, (z - (z mod k)) / k) | ||
| fn mod_decode(z: u64, k: u64) -> (u64, u64) { | ||
| let a = z % k; | ||
| let b = (z - a) / k; | ||
| (b, a) | ||
| } |
There was a problem hiding this comment.
The doc comment for mod_decode says it returns (z mod k, (z - (z mod k)) / k), but the implementation returns (quotient, remainder) ((b, a)). Please fix the comment (or swap the tuple) so callers don’t misinterpret the return order.
| // if nonterminals then we need to choose one and recurse | ||
| let mut stack = IntegerizedStack::new(n - num_terminal); | ||
| let num_nonterminal = nonterminal_indices.len() as u64; | ||
| let rule_choice = stack.modpop(num_nonterminal) as usize; | ||
| let trigger_idx = nonterminal_indices[rule_choice]; | ||
| let trigger = &triggers[trigger_idx]; | ||
| let dest = trigger.dest; | ||
|
|
||
| let mut result = vec![Terminal::new(state, trigger_idx, trigger.term.clone())]; | ||
|
|
||
| let child_terminals = enumerate_automaton(dest, stack.value, automaton); | ||
| result.extend(child_terminals); |
There was a problem hiding this comment.
enumerate_automaton can panic or fail to make progress:
- If
nonterminal_indicesis empty andn >= num_terminal,num_nonterminal == 0andmodpop(0)will panic (mod/div by zero). - If
num_nonterminal == 1,modpop(1)leavesstack.valueunchanged, so for grammars like “1 terminal + 1 recursive rule” this recurses forever for alln >= num_terminal.
Please handlenum_nonterminal == 0and== 1explicitly (likely by using a different decoding step that consumesn, per the IntegerizedStack/paper), or change the API to return aResultwhen the n-th derivation is undefined.
| /// Enumerate the n-th input deterministically using the IntegerizedStack algorithm. | ||
| /// This produces a unique [`GramatronInput`] for each value of `n`. | ||
| pub fn enumerate_nth(&self, n: u64) -> GramatronInput { | ||
| let terminals = crate::generators::enumerator::enumerate_automaton( | ||
| self.automaton.init_state, | ||
| n, | ||
| self.automaton, | ||
| ); | ||
| GramatronInput::new(terminals) | ||
| } |
There was a problem hiding this comment.
enumerate_nth is documented as producing a unique input for each n, but it currently delegates to enumerate_automaton, which can panic or loop indefinitely for some automata / n values (e.g., when a state has no nonterminals, or has exactly one nonterminal). Consider returning Result<GramatronInput, Error> (new API, so non-breaking) and documenting/handling the “no n-th derivation” case explicitly instead of implicitly panicking/hanging.
Description
Added Enumeration method for
gramatronas proposed in paper https://arxiv.org/pdf/2305.00522 based on issue #2309Future work: Need to implement mutations based on enumeration methods
Checklist
./scripts/precommit.shand addressed all comments