ropstar is an interactive ROP gadget semantic search tool. It fills the gap when fully automated tooling fails and you need exploratory search without falling back to plain text or regex.
- Lift to VEX --> Translate to Prolog --> Query
- Incrementally filter gadgets with stacked queries; each query filters the previous set
- Multi-arch support: x86/amd64, arm32/arm64, ppc32/ppc64, mips32/mips64, riscv64
- Requires rust: rustup
- Requires SWI Prolog
# install dependencies (debian/ubuntu)
sudo apt install \
swi-prolog \
build-essential \
autoconf automake libtool \
patch \
git \
libclang-dev clang
cargo install --git https://github.com/Zetier/ropstar.git
- Generate gadget list using your tool of choice
- Can use anything that will produce a listing of `<address>: disasm\n`
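Before running an analysis, it can help to sanity-check that a listing really has the `<address>: disasm` shape. The following is a minimal Python sketch, not part of ropstar: `parse_gadget_listing` and the sample text are hypothetical, and it assumes hexadecimal addresses with a `0x` prefix. How ropstar itself treats banner or summary lines in a listing is not covered here.

```python
import re

# Matches lines such as "0x0000000000401016: pop rdi; ret;" and tolerates
# optional whitespace around the colon.
GADGET_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_gadget_listing(text):
    """Return (address, disassembly) pairs, skipping lines that are not gadgets."""
    gadgets = []
    for line in text.splitlines():
        match = GADGET_LINE.match(line)
        if match:
            gadgets.append((int(match.group(1), 16), match.group(2)))
    return gadgets


# Hypothetical sample mixing banner lines with two gadget lines.
sample = """Gadgets information
============================================================
0x0000000000401016 : pop rdi ; ret
0x000000000040101a: ldr x0, [sp, #8]; ret;

Unique gadgets found: 2
"""

for address, disasm in parse_gadget_listing(sample):
    print(f"{address:#x}: {disasm}")
```

Lines that do not begin with a hexadecimal address are simply ignored, so the same check works on listings with or without a banner.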
ROPgadget --binary arm64_test > arm64_test.gadgets
ropper --nocolor -f arm64_test > arm64_test.gadgets
ropstar analyze --binary arm64_test --gadgets arm64_test.gadgets --output arm64_test.cache
ropstar query arm64_test.cache
- Type to search
- Prefix with `!` to invert
- Prefix with `%` for Prolog mode
- Enter to commit
- Queries stack so each filters the previous result
- Backspace to undo
- Ctrl+C to exit
- When typing in Prolog mode, there is support for standard readline shortcuts
With custom rules:
ropstar query arm64_test.cache --rules ./custom_rules.pl
ropstar dump --binary arm64_test --gadgets arm64_test.gadgets 0x12345678
- Manually searching gadgets is painful
- Searching text for `pop rdi` won't find `mov rdi, [rsp+8]` or `pop rax; mov rdi, rax`
- ropstar will find all of these with `stack_load(rdi)`
- Most semantic ROP tools only support x86 (and many are academic/unreleased)
- The ones that are multi-arch are mostly focused on building chains non-interactively
- RopView is interactive, but it requires a Binary Ninja license and only handles pre/post condition semantics
- Multi-arch support (tracks modern valgrind/VEX)
- No commercial software dependencies
- Interactive with an iterative refinement workflow
- Over-approximate then filter down
- Everything from the internal representation to user queries uses the same language (Prolog)
- Stay at a high level using convenience wrappers like `stack_pivot(FromReg)` or `write_what_where(DstReg, ValReg)`
- Or drop down all the way to referencing IR level constructs like `reg(Id, Ssa)` or `binop(Op, Lhs, Rhs)` and writing your own custom constraints
- If a query fails unexpectedly, decompose it and adjust the constraints
Text mode is the default input mode. Input is matched directly against gadget disassembly text. This is useful for quickly filtering on trivial conditions before moving to semantic search, or just for highlighting terms so they are easier to spot.
# will show only gadgets containing the string "rax" in the results
# and highlight instances of "rax"
rax
Adding a prefix of ! will invert the query to reject any gadgets with matching text.
# filter out jumps to constants (sometimes inserted by tools like ROPgadget)
!jmp 0x
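As a rough mental model of text mode, the sketch below treats each query as a substring test, flips the test when the query starts with `!`, and applies committed queries in order. It is an illustration in Python, not ropstar's code: the helper names and gadget strings are hypothetical, and exact case-sensitive substring matching is an assumption.

```python
def apply_text_query(gadgets, query):
    """Filter disassembly strings with one text-mode query (substring match assumed)."""
    if query.startswith("!"):
        needle = query[1:]
        # Inverted query: keep only gadgets that do NOT contain the text.
        return [g for g in gadgets if needle not in g]
    return [g for g in gadgets if query in g]


def apply_stack(gadgets, queries):
    """Apply committed queries in order; each one filters the previous result."""
    for query in queries:
        gadgets = apply_text_query(gadgets, query)
    return gadgets


gadgets = [
    "pop rax ; ret",
    "mov rdi, rax ; jmp 0x401000",
    "pop rdi ; ret",
    "xor rax, rax ; ret",
]

stack = ["rax", "!jmp 0x"]
print(apply_stack(gadgets, stack))

# Undoing the last committed query restores the wider result set.
stack.pop()
print(apply_stack(gadgets, stack))
```

Because every query only narrows the previous result, removing the most recent one is enough to step back.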
Starting a query with % drops you into a prolog> prompt, where the input is interpreted as Prolog.
mov_reg(x1, X), member(X, [x2, x3])
Here, `mov_reg` is the predicate name and `x1` and `X` are the arguments. Lowercase names are atoms, while uppercase names are variables. In Prolog, `,` means AND. This query matches gadgets where there exists some `X` that satisfies `mov_reg(x1, X)`, then restricts `X` to members of `[x2, x3]`.
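For readers new to Prolog, the query above behaves like an existence check over the facts known about each gadget. The Python sketch below is only an analogy: the per-gadget `(dst, src)` pairs stand in for `mov_reg` facts, and the gadget strings and the `matches` helper are hypothetical.

```python
def matches(mov_reg_facts, dst, allowed_sources):
    """True if some X satisfies mov_reg(dst, X) and member(X, allowed_sources)."""
    return any(d == dst and src in allowed_sources for d, src in mov_reg_facts)


# Hypothetical gadgets, each with the (dst, src) pairs a mov_reg fact would hold.
gadgets = {
    "mov x1, x2 ; ret": {("x1", "x2")},
    "mov x1, x5 ; ret": {("x1", "x5")},
    "mov x0, x3 ; ret": {("x0", "x3")},
}

hits = [text for text, facts in gadgets.items() if matches(facts, "x1", ["x2", "x3"])]
print(hits)
```

Only the first gadget survives: the second moves into `x1` from a register outside the list, and the third has an allowed source but the wrong destination.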
Register names will be automatically normalized to their reg(Id, Ssa) form. So if you type:
rdi
it becomes, internally:
reg(72, _)
The core of the semantic search is the source tracking. This is transitive, so `mov rcx, rax; mov rbx, [rcx]` will have `rax` in the sources of `rbx`. To find gadgets where `rdi` depends on a load:
% rdi is the final ssa version
final(rdi),
% get the sources of rdi
reg_sources(rdi, Srcs),
% constrain the sources list to have a non-zero amount of memory reads
sources_mems(Srcs, [_|_])
Or maybe we want to find gadgets where the stack pointer is modified by a register. Similar structure, but with registers instead of memory:
% Sp must be the final ssa version of sp
final_sp(Sp),
% get the sources of Sp
reg_sources(Sp, Srcs),
% extract out the register reads from the sources list
sources_regs(Srcs, Regs),
% Regs must contain at least one value R
member(R, Regs),
% R must be the initial ssa version of some register
init(R)
The results include unexpected gadgets such as `ret` or `call`, because nothing constrains `R \= Sp`. Add that constraint by appending:
% extract R's register id
R = reg(RegId, _),
% extract Sp's register id
Sp = reg(SpId, _),
RegId \= SpId
This is an over-approximation though, as the sources rule is not exclusive. So gadgets like `sub rsp, rsi` or `add sp, sp, 0x20` would still match.
The built-in helper `stack_pivot(Reg)` is implemented along these lines. If you need to filter out cases like the `sub`, constrain the query further.
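To make the idea of transitive sources concrete, here is a small Python model. It is a sketch under stated assumptions, not ropstar's implementation: expressions are hand-written nested tuples rather than lifted VEX, and `sources`, `source_regs` and `has_mem_source` are hypothetical names that only loosely mirror `reg_sources`, `sources_regs` and `sources_mems`.

```python
def sources(expr):
    """Return every leaf (reg_read, const, mem_read) reachable from expr."""
    kind = expr[0]
    if kind in ("reg_read", "const"):
        return {expr}
    if kind == "mem_read":
        # A load is a source itself, and the address it reads from
        # contributes its own sources as well.
        return {expr} | sources(expr[1])
    if kind == "binop":
        _, _op, lhs, rhs = expr
        return sources(lhs) | sources(rhs)
    raise ValueError(f"unknown node kind: {kind}")


def source_regs(expr):
    """Names of the registers read anywhere in expr's sources."""
    return {leaf[1] for leaf in sources(expr) if leaf[0] == "reg_read"}


def has_mem_source(expr):
    """True if at least one memory read appears in expr's sources."""
    return any(leaf[0] == "mem_read" for leaf in sources(expr))


# Final value of rbx after `mov rcx, rax ; mov rbx, [rcx]`.
rbx_final = ("mem_read", ("reg_read", "rax"))
print(source_regs(rbx_final), has_mem_source(rbx_final))

# Final value of rsp after `mov rsp, rsi` and after `sub rsp, rsi`.
mov_pivot = ("reg_read", "rsi")
sub_pivot = ("binop", "sub", ("reg_read", "rsp"), ("reg_read", "rsi"))
for final_sp in (mov_pivot, sub_pivot):
    # Non-SP registers feeding the final stack pointer.
    print(source_regs(final_sp) - {"rsp"})
```

Both stack pointer expressions report `rsi` as a non-SP source, which is the over-approximation described above: the source set records that `rsi` contributes, not how it is combined.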
- Completely unbound queries are effectively meaningless (will match everything) and are very slow (have to check everything)
- Some of the helpers have guards to prevent this, but there can still be missed cases
- Example: `read_what_where(X, Y)`
- Constrain register SSA versions with `init(R)` and `final(R)` if you want the gadget's overall effects
- Without this, queries could match on intermediate states
- Filter out noise early
- The fewer gadgets there are to search through, the faster everything works
- Text mode `!jmp 0x` to eliminate invalid gadgets
- Small, fast Prolog queries such as `clobbers(rdi)` or `syscall_gadget` to limit to gadgets containing necessary values
- For debugging, use `ropstar dump` to inspect internal gadget representation
const(C)
reg(Id, Ssa)
reg_read(Reg)
reg_write(Reg, ValTmp)
mem_read(AddrTmp)
mem_write(AddrTmp, ValTmp)
tmp_write(DstTmp, X)
unop(Op, Arg)
binop(Op, Lhs, Rhs)
exit_kind(Kind)
cond_exit(Guard, Dst)
Registers take the form `reg(Id, Ssa)` where `Id` is an architecture-specific index and `Ssa` is the SSA version of the register. Queries are normalized such that on x86 `eax` gets replaced with `reg(8, _)` and on arm `lr` gets replaced with `reg(64, _)` and so on. The initial state of a register will have `reg(Id, 0)` and then each modification will introduce a new SSA version like `reg(Id, 1)`.
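The SSA numbering described above can be pictured with a short Python sketch. It is a simplified model, not ropstar's code: registers are identified by name rather than by architecture-specific `Id`, the write sequence is written out by hand, and `ssa_versions` is a hypothetical helper.

```python
from collections import defaultdict


def ssa_versions(writes):
    """Number each register write; version 0 is the value on gadget entry."""
    latest = defaultdict(int)
    history = []
    for reg in writes:
        latest[reg] += 1
        history.append((reg, latest[reg]))
    return history, dict(latest)


# Registers written, in order, by `pop rax ; mov rdi, rax ; pop rax ; ret`
# (stack pointer updates are left out to keep the example small).
history, final = ssa_versions(["rax", "rdi", "rax"])
print(history)
print(final)

# A register that is never written keeps version 0, so its initial and
# final versions coincide.
print(final.get("rbx", 0))
```

In this model the second `pop rax` produces version 2 of `rax`, so a query pinned to the final version sees that value rather than the intermediate one that was copied into `rdi`.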
There are several convenience queries included which cover common cases. These are generally over-approximations.
% the register Dst depends on the initial value of Src (and no other register or memory)
mov_reg(Dst, Src)
% DstReg depends on a load which depends on SrcReg
read_what_where(DstReg, SrcReg)
% there is a store where the address depends on AddrReg and the value depends on ValReg
write_what_where(AddrReg, ValReg)
% Reg depends on a load from the stack
% simple wrapper around read_what_where
stack_load(Reg)
% the final value of the stack pointer depends on a non-SP register
% note: misses the memory dependency case (`pop sp`, etc.)
stack_pivot(Reg)
% Reg is modified by the gadget
clobbers(Reg)
% Reg is not modified by the gadget
preserves(Reg)
% constrains the Reg's SSA value to 0 (the initial value of the register)
init(Reg)
% constrains the Reg's SSA value to the max (the final value of the register)
final(Reg)
% gadgets involving a syscall
syscall_gadget
% helpers for generically accessing SP across architectures
sp_reg(SP)
init_sp(SP)
final_sp(SP)
% collect all (transitive) leaf sources (reg_read, const, mem_read)
% Srcs = [reg_read(R), const(C), etc.]
sources(Tmp, Srcs)
% collect immediate (non-transitive) sources
sources_shallow(Tmp, Srcs)
% filter sources to just registers or memory
sources_regs(Srcs, Regs)
sources_mems(Srcs, Mems)
% sources of a reg's value
reg_sources(Reg, Srcs)
% just register sources
reg_regs(Reg, RegSrcs)
% just memory sources
reg_mems(Reg, MemSrcs)
% [deprecated] transitive dependency tracking between all fact types
% this is excessively slow in most cases and can hang the UI
depends(A, B)
% [deprecated] non-transitive dependency
direct_depends(A, B)
Gadgets ending in conditional jumps have unexpected behavior. Mid-gadget conditionals are handled by continuing down the fallthrough path. However, because ropstar does not currently disassemble gadgets itself to validate them, the analysis can continue past the displayed end of the gadget in this case. This can manifest as gadgets matching queries that don't line up with the displayed disassembly.
TODO: disassemble the gadgets and validate them (end in a controllable jump) and record the exact bytes involved.
Jumps that land within the gadget (creating cycles or multiple control flow paths) are not handled. These are rare and cycles would break the model.
TODO: if this use case is actually necessary, forward jumps within a gadget wouldn't be too bad to implement. It would need some scaffolding for phi nodes and would then emit facts for all branches simultaneously (or re-emit each path as a separate gadget).
Querying single gadgets rather than building chains was a design decision. However, gadgets with mid-sequence conditional jumps are handled by chaining together the basic blocks internally. The dependency rules also already handle this case through register SSA tracking.
TODO: build an alternative interface for building chains?
No rules currently exist for handling memory aliasing. This would probably be more relevant for chains of multiple gadgets than the current single-gadget case.
TODO: could be interesting to add memory aliasing rules?