ropstar

ropstar is an interactive ROP gadget semantic search tool. It fills the gap when fully automated tooling fails and you need exploratory search without falling back to plain text or regex.

Lift to VEX --> Translate to Prolog --> Query
Incrementally filter gadgets with stacked queries; each query filters the previous result set
Multi-arch support: x86/amd64, arm32/arm64, ppc32/ppc64, mips32/mips64, riscv64

Quickstart

Prerequisites

Requires Rust (install via rustup)
Requires SWI-Prolog

# install dependencies (debian/ubuntu)
sudo apt install \
  swi-prolog \
  build-essential \
  autoconf automake libtool \
  patch \
  git \
  libclang-dev clang

Build and Install

cargo install --git https://github.com/Zetier/ropstar.git

Obtain Gadget Listing

Generate a gadget list using your tool of choice
Any tool works as long as it produces one gadget per line in the form <address>: <disassembly>

ROPgadget --binary arm64_test > arm64_test.gadgets

ropper --nocolor -f arm64_test > arm64_test.gadgets
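To check that a listing from another tool fits the <address>: <disassembly> shape before indexing it, a small script can help. The following is a minimal sketch, not ropstar's own parser: the function name, the regular expression, and the sample lines are all assumptions made for illustration, and it simply skips any line that does not start with a hex address.

```python
import re

# Assumed line shape: "<hex address> : <disassembly>", with optional spaces
# around the colon. Lines that do not fit (banners, counts) are skipped.
GADGET_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_gadget_listing(text):
    """Return a list of (address, disassembly) tuples from a gadget listing."""
    gadgets = []
    for line in text.splitlines():
        match = GADGET_LINE.match(line)
        if match:
            gadgets.append((int(match.group(1), 16), match.group(2)))
    return gadgets


# Invented sample, loosely shaped like typical gadget finder output.
sample = """Gadgets information
============================================================
0x0000000000400a10 : ldp x29, x30, [sp], #0x10 ; ret
0x0000000000400b24: mov x0, x19; ret

Unique gadgets found: 2
"""

for address, disasm in parse_gadget_listing(sample):
    print(hex(address), disasm)
```

If the script reports no gadgets for a listing, the tool's output probably needs reshaping before it is passed to ropstar analyze.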

Create an index

ropstar analyze --binary arm64_test --gadgets arm64_test.gadgets --output arm64_test.cache

Query over the index

ropstar query arm64_test.cache

Type to search
- Prefix with ! to invert
- Prefix with % for Prolog mode
Enter to commit
Queries stack so each filters the previous result
Backspace to undo
Ctrl+C to exit
When typing in Prolog mode, there is support for standard readline shortcuts
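The stacking behaviour described above can be pictured as a stack of result sets. The sketch below is a toy model under stated assumptions, not ropstar's implementation: the QueryStack class and its method names are hypothetical, it covers only plain-text queries and the ! prefix, and it leaves Prolog mode out entirely.

```python
class QueryStack:
    """Toy model of stacked text queries over gadget disassembly."""

    def __init__(self, gadgets):
        # Each entry is the result set after one committed query;
        # the first entry is the unfiltered listing.
        self._results = [list(gadgets)]

    def commit(self, query):
        """Filter the current result set; a leading '!' inverts the match."""
        invert = query.startswith("!")
        needle = query[1:] if invert else query
        kept = [g for g in self._results[-1] if (needle in g) != invert]
        self._results.append(kept)
        return kept

    def undo(self):
        """Drop the most recent query, restoring the previous result set."""
        if len(self._results) > 1:
            self._results.pop()
        return self._results[-1]

    @property
    def current(self):
        return self._results[-1]


stack = QueryStack([
    "pop rax ; ret",
    "pop rax ; jmp 0x401000",
    "pop rdi ; ret",
])
stack.commit("rax")      # keep gadgets mentioning rax
stack.commit("!jmp 0x")  # then reject jumps to constants
print(stack.current)
stack.undo()             # Backspace: back to the "rax" results
print(stack.current)
```

Keeping every intermediate result set is what makes undo cheap in this model: removing a query is just popping the stack rather than re-running the earlier filters.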

With custom rules:

ropstar query arm64_test.cache --rules ./custom_rules.pl

Dump a gadget's VEX IR and Prolog facts

ropstar dump --binary arm64_test --gadgets arm64_test.gadgets 0x12345678

Motivation

Semantic Search

Manually searching gadgets is painful
- Searching text for pop rdi won't find mov rdi, [rsp+8] or pop rax; mov rdi, rax
- ropstar will find all of these with stack_load(rdi)

Existing Tools

Most semantic ROP tools only support x86 (and many are academic/unreleased)
The ones that are multi-arch are mostly focused on building chains non-interactively
RopView is interactive, but it requires a Binary Ninja license and only handles pre/post condition semantics

Design Goals

Multi-arch support (tracks modern valgrind/VEX)
No commercial software dependencies
Interactive with an iterative refinement workflow
- Over-approximate then filter down
Everything from the internal representation to user queries uses the same language (Prolog)
- Stay at a high level using convenience wrappers like stack_pivot(FromReg) or write_what_where(DstReg, ValReg)
- Or drop down all the way to referencing IR level constructs like reg(Id, Ssa) or binop(Op, Lhs, Rhs) and writing your own custom constraints
- If a query fails unexpectedly, decompose it and adjust the constraints

Query Language

Plain Text

The default input mode. Input is directly matched against gadget disassembly text. This can be useful for quickly filtering trivial conditions before moving to semantic search or even just to highlight terms for easier visibility.

# will show only gadgets containing the string "rax" in the results
# and highlight instances of "rax"
rax

Adding a prefix of ! will invert the query to reject any gadgets with matching text.

# filter out jumps to constants (sometimes inserted by tools like ROPgadget)
!jmp 0x

Prolog

Starting a query with % drops you into a prolog> prompt, where the input is interpreted as Prolog.

mov_reg(x1, X), member(X, [x2, x3])

Here, mov_reg is the predicate name and x1 and X are the arguments. Lowercase names are atoms, while uppercase names are variables. In Prolog, a comma (,) means AND. This query matches gadgets where some X satisfies mov_reg(x1, X) and X is a member of [x2, x3].

Register names will be automatically normalized to their reg(Id, Ssa) form. So if you type:

rdi

it becomes, internally:

reg(72, _)
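One way to picture this rewriting step is as a textual substitution over the query. The sketch below is an illustration only, not how ropstar performs normalization: the function name and table layout are hypothetical, and the table holds just the three register ids quoted in this README.

```python
import re

# Only the ids quoted in this README; a real table would cover every register.
REG_IDS = {
    "amd64": {"rdi": 72},
    "x86": {"eax": 8},
    "arm": {"lr": 64},
}


def normalize_registers(query, arch):
    """Replace bare register names in a query with their reg(Id, _) form."""
    table = REG_IDS[arch]

    def substitute(match):
        name = match.group(0)
        if name in table:
            return f"reg({table[name]}, _)"
        return name

    # Only lowercase-leading identifiers (Prolog atoms) are candidates;
    # capitalised identifiers are Prolog variables and are left alone.
    return re.sub(r"\b[a-z][a-z0-9_]*\b", substitute, query)


print(normalize_registers("final(rdi), reg_sources(rdi, Srcs)", "amd64"))
```

The SSA slot is left as the anonymous variable _, so a normalized name matches any version of the register until a helper such as init or final pins it down.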

The core of the semantic search is the source tracking. This is transitive, so mov rcx, rax; mov rbx, [rcx] will have rax in the sources of rbx. To find gadgets where rdi depends on a load:

% rdi is the final ssa version
final(rdi),

% get the sources of rdi
reg_sources(rdi, Srcs),

% constrain the sources list to have a non-zero amount of memory reads
sources_mems(Srcs, [_|_])
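The transitive behaviour can be illustrated with a toy dataflow model of the mov rcx, rax; mov rbx, [rcx] example. This is a sketch under stated assumptions, not ropstar's Prolog rules: the temporaries t0 and t1, the defs table, and the "op" kind are invented for illustration, and the Python functions only borrow the names of the helpers they imitate. It also assumes a load counts as a source in its own right while its address keeps contributing sources.

```python
def sources(tmp, defs):
    """Collect the transitive leaf sources of a temporary."""
    kind, arg = defs[tmp]
    if kind in ("reg_read", "const"):
        return [(kind, arg)]
    if kind == "mem_read":
        # Assumption: the load is a source itself, and whatever fed its
        # address is a source too.
        return [(kind, arg)] + sources(arg, defs)
    # Any other kind ("op" here) lists its operand temporaries.
    leaves = []
    for operand in arg:
        leaves.extend(sources(operand, defs))
    return leaves


def sources_regs(srcs):
    """Keep only the register reads from a sources list."""
    return [arg for kind, arg in srcs if kind == "reg_read"]


def sources_mems(srcs):
    """Keep only the memory reads from a sources list."""
    return [arg for kind, arg in srcs if kind == "mem_read"]


# mov rcx, rax ; mov rbx, [rcx]
defs = {
    "t0": ("reg_read", "rax"),  # value moved into rcx
    "t1": ("mem_read", "t0"),   # load through rcx, moved into rbx
}
final_writes = {"rcx": "t0", "rbx": "t1"}

rbx_sources = sources(final_writes["rbx"], defs)
print(sources_regs(rbx_sources))  # register sources of rbx
print(sources_mems(rbx_sources))  # memory sources of rbx
```

In this model rbx ends up with both a register source (rax) and a memory source, which is the situation the query above tests for with sources_mems(Srcs, [_|_]).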

Or maybe we want to find gadgets where the stack pointer is modified by a register. Similar structure, but with registers instead of memory:

% Sp must be the final ssa version of sp
final_sp(Sp),

% get the sources of Sp
reg_sources(Sp, Srcs),

% extract out the register reads from the sources list
sources_regs(Srcs, Regs),

% Regs must contain at least one value R
member(R, Regs),

% R must be the initial ssa version of some register
init(R)

The results include unexpected gadgets such as ret or call because we didn't constrain R \= Sp. Add that constraint by appending the following to the query:

% extract R's register id
R = reg(RegId, _),

% extract Sp's register id
Sp = reg(SpId, _),

RegId \= SpId

This is an over-approximation though, as the sources rule is not exclusive. So gadgets like sub rsp, rsi or add sp, sp, 0x20 would still match.

The built-in helper stack_pivot(Reg) is implemented along these lines. If you need to filter out cases like the sub, constrain the query further yourself.

Guidelines

Completely unbound queries are effectively meaningless (will match everything) and are very slow (have to check everything)
- Some of the helpers have guards to prevent this, but there can still be missed cases
- Example: read_what_where(X, Y)
Constrain register SSA versions with init(R) and final(R) if you care about the gadget's overall effect
- Without this, queries could match on intermediate states
Filter out noise early
- The fewer gadgets there are to search through, the faster everything works
- Text mode !jmp 0x to eliminate invalid gadgets
- Small, fast Prolog queries such as clobbers(rdi) or syscall_gadget to limit to gadgets containing necessary values
For debugging, use ropstar dump to inspect internal gadget representation

Data Model Reference

Base Facts

const(C)
reg(Id, Ssa)
reg_read(Reg)
reg_write(Reg, ValTmp)
mem_read(AddrTmp)
mem_write(AddrTmp, ValTmp)
tmp_write(DstTmp, X)
unop(Op, Arg)
binop(Op, Lhs, Rhs)
exit_kind(Kind)
cond_exit(Guard, Dst)

Registers

Registers take the form reg(Id, Ssa), where Id is an architecture-specific index and Ssa is the SSA version of the register. Queries are normalized so that, for example, eax on x86 is replaced with reg(8, _) and lr on arm is replaced with reg(64, _). The initial state of a register is reg(Id, 0), and each modification introduces a new SSA version such as reg(Id, 1).
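The versioning scheme can be sketched as a counter per register id. The model below is illustrative only: the SsaRegisters class and its methods are hypothetical Python stand-ins, and the readings of init, final, and clobbers are assumptions based on the helper descriptions in this README (version 0, highest version, and "written at least once").

```python
class SsaRegisters:
    """Toy model of reg(Id, Ssa) versioning within a single gadget."""

    def __init__(self):
        # register id -> highest SSA version written so far
        self._latest = {}

    def read(self, reg_id):
        """A read refers to the current version (0 if never written)."""
        return (reg_id, self._latest.get(reg_id, 0))

    def write(self, reg_id):
        """Each write introduces a new version of the register."""
        version = self._latest.get(reg_id, 0) + 1
        self._latest[reg_id] = version
        return (reg_id, version)

    def is_init(self, reg):
        """Version 0 is the value the register held on entry."""
        return reg[1] == 0

    def is_final(self, reg):
        """The highest version is the value the register holds on exit."""
        return reg[1] == self._latest.get(reg[0], 0)

    def clobbers(self, reg_id):
        """Assumed meaning: the register was written at least once."""
        return self._latest.get(reg_id, 0) > 0


regs = SsaRegisters()
entry = regs.read(72)    # (72, 0): rdi on entry
first = regs.write(72)   # (72, 1): intermediate value
last = regs.write(72)    # (72, 2): value on exit
print(regs.is_init(entry), regs.is_final(first), regs.is_final(last))
```

The intermediate version (72, 1) is neither initial nor final, which is why an unconstrained query can match on states that never survive to the end of the gadget.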

Convenience Queries

There are several convenience queries included which cover common cases. These are generally over-approximations.

% the register Dst depends on the initial value of Src (and no other register or memory)
mov_reg(Dst, Src)

% DstReg depends on a load which depends on SrcReg
read_what_where(DstReg, SrcReg)

% there is a store where the address depends on AddrReg and the value depends on ValReg
write_what_where(AddrReg, ValReg)

% Reg depends on a load from the stack
% simple wrapper around read_what_where
stack_load(Reg)

% the final value of the stack pointer depends on a non-SP register
% note: misses the memory dependency case (`pop sp`, etc.)
stack_pivot(Reg)

Helpers

% Reg is modified by the gadget
clobbers(Reg)

% Reg is not modified by the gadget
preserves(Reg)

% constrains the Reg's SSA value to 0 (the initial value of the register)
init(Reg)

% constrains the Reg's SSA value to the max (the final value of the register)
final(Reg)

% gadgets involving a syscall
syscall_gadget

% helpers for generically accessing SP across architectures
sp_reg(SP)
init_sp(SP)
final_sp(SP)

% collect all (transitive) leaf sources (reg_read, const, mem_read)
% Srcs = [reg_read(R), const(C), etc.]
sources(Tmp, Srcs)

% collect immediate (non-transitive) sources
sources_shallow(Tmp, Srcs)

% filter sources to just registers or memory
sources_regs(Srcs, Regs)
sources_mems(Srcs, Mems)

% sources of a reg's value
reg_sources(Reg, Srcs)

% just register sources
reg_regs(Reg, RegSrcs)

% just memory sources
reg_mems(Reg, MemSrcs)

% [deprecated] transitive dependency tracking between all fact types
% this is excessively slow in most cases and can hang the UI
depends(A, B)

% [deprecated] non-transitive dependency
direct_depends(A, B)

Limitations / TODO

Conditional jumps

Gadgets ending in conditional jumps have unexpected behavior. Mid-gadget conditionals are handled by continuing down the fallthrough path. However, as ropstar currently does not disassemble the gadgets itself to validate, the analysis can continue past the displayed end of the gadget in this case. This can manifest as gadgets matching queries that don't line up with the displayed disassembly.

TODO: disassemble the gadgets and validate them (end in a controllable jump) and record the exact bytes involved.

Cycles and other jumps within a gadget

Jumps that land within the gadget (creating cycles or multiple control flow paths) are not handled. These are rare and cycles would break the model.

TODO: if this use case is actually necessary, forward jumps within a gadget wouldn't be too bad to implement. It would need some scaffolding for phi nodes and would then either emit facts for all branches simultaneously or re-emit each path as a separate gadget.

Lacks gadget chaining

This was a design decision. However, gadgets with mid-sequence conditional jumps are handled by chaining together the basic blocks internally. The dependency rules also already handle this case through register SSA tracking.

TODO: build an alternative interface for building chains?

Memory aliasing

No rules currently exist for handling memory aliasing. This would probably be more relevant for chains of multiple gadgets than the current single-gadget case.

TODO: could be interesting to add memory aliasing rules?
