Collaborator

@ludamad ludamad commented Feb 9, 2026

Creates an experimental build of bb with a bump pointer allocator that preallocates 1GB of RAM per thread and never deallocates memory. This lets us measure how much time the current implementation spends in memory allocation. This is a temporary branch for profiling purposes only.
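For readers unfamiliar with the technique, here is a minimal sketch of a per-thread bump pointer arena. It is not the code in this branch: `BumpArena`, `thread_arena`, and `kArenaSize` are hypothetical names, the use of `MAP_NORESERVE` is an assumption, and whether the real allocator pre-faults its pages is not stated in this PR.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical sketch, not the PR's implementation.
struct BumpArena {
    static constexpr std::size_t kArenaSize = std::size_t{1} << 30; // 1 GiB

    std::uint8_t* base = nullptr; // start of the mapping, set on first use
    std::size_t offset = 0;       // next free byte within the mapping

    // align must be a power of two. The mapping is page-aligned, so an
    // aligned offset also yields an aligned address.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        if (base == nullptr) {
            // Assumption: reserve address space lazily; pages are only
            // backed by physical memory once they are touched.
            void* p = mmap(nullptr, kArenaSize, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
            if (p == MAP_FAILED) throw std::bad_alloc();
            base = static_cast<std::uint8_t*>(p);
        }
        std::size_t aligned = (offset + align - 1) & ~(align - 1);
        if (aligned > kArenaSize || size > kArenaSize - aligned) throw std::bad_alloc();
        offset = aligned + size; // bump the pointer past this allocation
        return base + aligned;
    }

    // Never frees: memory is only returned to the OS at process exit.
    void deallocate(void*) noexcept {}
};

// One arena per thread, so allocation needs no locking.
inline BumpArena& thread_arena() {
    thread_local BumpArena arena;
    return arena;
}
```

In a build like the one described here, the global `operator new` and `operator delete` would presumably forward to `thread_arena().allocate` and `deallocate`; that wiring is omitted from the sketch.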

Adds a thread-local bump pointer allocator behind -DBUMP_ALLOCATOR
that preallocates 1GB per thread via mmap and never frees. This allows
measuring the performance impact of malloc/free in bb.

Instruments operator new/delete with backward-cpp stack traces, sizes,
lifetimes, and thread affinity. Dumps a report at exit sorted by peak
concurrent bytes to identify candidates for region-based allocation.
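The "peak concurrent bytes" metric can be illustrated with a small bookkeeping sketch. This is an assumption about how such a report might be computed, not the profiler's actual code: `AllocTracker`, `SiteStats`, and the integer `site` id are hypothetical, and the sketch leaves out stack capture, lifetimes, thread affinity, and the re-entrancy guard that a real `operator new` hook would need.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-call-site statistics.
struct SiteStats {
    std::size_t live_bytes = 0;  // bytes currently allocated from this site
    std::size_t peak_bytes = 0;  // highest value live_bytes has reached
    std::size_t alloc_count = 0; // number of allocations seen
};

class AllocTracker {
public:
    // Called for each allocation; `site` stands in for a stack-trace key.
    void on_alloc(const void* ptr, std::size_t size, std::uint64_t site) {
        live_[ptr] = {size, site};
        SiteStats& s = sites_[site];
        s.live_bytes += size;
        s.peak_bytes = std::max(s.peak_bytes, s.live_bytes);
        ++s.alloc_count;
    }

    // Called for each deallocation; unknown pointers are ignored.
    void on_free(const void* ptr) {
        auto it = live_.find(ptr);
        if (it == live_.end()) return;
        sites_[it->second.site].live_bytes -= it->second.size;
        live_.erase(it);
    }

    // Rows sorted by peak concurrent bytes, largest first.
    std::vector<std::pair<std::uint64_t, SiteStats>> report() const {
        std::vector<std::pair<std::uint64_t, SiteStats>> rows(sites_.begin(), sites_.end());
        std::sort(rows.begin(), rows.end(), [](const auto& a, const auto& b) {
            return a.second.peak_bytes > b.second.peak_bytes;
        });
        return rows;
    }

private:
    struct Live {
        std::size_t size;
        std::uint64_t site;
    };
    std::unordered_map<const void*, Live> live_;
    std::unordered_map<std::uint64_t, SiteStats> sites_;
};
```

Sites with a high peak but many short-lived allocations are the kind of candidate a region-based allocator would target, since their memory can be released in bulk.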

Build with: cmake --preset alloc-profiler
- Add -fno-omit-frame-pointer for reliable backtrace() stack walking
- Use RelWithDebInfo build type for addr2line symbolization
- Replace backward-cpp symbolization with raw PC dump (avoids SIGSEGV
  from re-entrant allocation in TraceResolver during atexit)
- Fix static destruction order by registering dump_report via atexit
  inside state() initialization
- Use net CRS factory in chonk_bench when ALLOC_PROFILER defined

Use dladdr to get binary base address and subtract it from all PC
addresses in the report. This makes addresses directly usable with
addr2line without needing to know the ASLR base.
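A minimal sketch of the base-address subtraction, assuming Linux with glibc and a position-independent executable. The helper names `binary_base`, `to_relative`, and `dump_relative_stack` are hypothetical and do not come from this branch.

```cpp
#include <dlfcn.h>    // dladdr, Dl_info
#include <execinfo.h> // backtrace
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Load address of the object containing this function, or 0 if dladdr
// cannot resolve it. Computed once and cached.
std::uintptr_t binary_base() {
    static const std::uintptr_t base = [] {
        Dl_info info{};
        if (dladdr(reinterpret_cast<const void*>(&binary_base), &info) == 0) {
            return std::uintptr_t{0};
        }
        return reinterpret_cast<std::uintptr_t>(info.dli_fbase);
    }();
    return base;
}

// Converts a runtime PC into an offset from the binary's load address.
std::uintptr_t to_relative(const void* pc) {
    return reinterpret_cast<std::uintptr_t>(pc) - binary_base();
}

// Writes the current stack as raw base-relative PCs, one per line, and
// returns the number of frames written. No symbolization happens here.
int dump_relative_stack(std::FILE* out) {
    void* frames[32];
    int n = backtrace(frames, 32);
    for (int i = 0; i < n; ++i) {
        std::fprintf(out, "  #%d 0x%" PRIxPTR "\n", i, to_relative(frames[i]));
    }
    return n;
}
```

Each printed offset could then be passed to `addr2line -e <binary>`. Frames that belong to shared libraries are loaded at a different base, so their offsets are not meaningful against the main binary in this sketch. On glibc older than 2.34, linking may also require `-ldl`.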