
Conversation


Copilot AI commented Feb 7, 2026

Benchmarking code across examples/ and benchmark/ reimplements the same warmup loops, timing, statistics, and printing, roughly 100 lines of boilerplate per file. This PR adds iris.bench, shared benchmarking infrastructure that eliminates the duplication and standardizes measurements.

Changes

Core module (iris/bench.py)

  • BenchmarkResult: dataclass storing mean/p50 (median)/p99/min/max timings with JSON export (see the sketch after this list)
  • BenchmarkRunner: context manager for parameter sweeps with barrier support
  • @benchmark: decorator for simple function benchmarking
  • Utilities: torch_dtype_from_str(), compute_bandwidth_gbps()
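
A minimal sketch of what these pieces might look like; the field names and utility signatures below are assumptions for illustration, not confirmed by this PR:

from dataclasses import dataclass, asdict
import json

import torch

@dataclass
class BenchmarkResult:
    # Statistics named in the PR description; the exact field names are illustrative.
    name: str
    mean_ms: float
    p50_ms: float
    p99_ms: float
    min_ms: float
    max_ms: float

    def to_json(self) -> str:
        # JSON export mentioned above.
        return json.dumps(asdict(self))

def torch_dtype_from_str(name: str) -> torch.dtype:
    # Hypothetical mapping from common string names to torch dtypes.
    return {"fp16": torch.float16, "bf16": torch.bfloat16, "fp32": torch.float32}[name]

def compute_bandwidth_gbps(num_bytes: int, time_ms: float) -> float:
    # Hypothetical signature: bytes moved divided by elapsed seconds, in GB/s.
    return num_bytes / (time_ms * 1e-3) / 1e9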

Integration

  • Exposed via iris.bench in __init__.py
  • Internally uses the existing iris.do_bench for timing (see the sketch after this list)
  • Backward compatible—existing benchmarks unchanged
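
Illustrative only: a rough sketch of how BenchmarkRunner.run might delegate timing to the existing iris.do_bench. The internals and the exact do_bench signature are assumptions, inferred from the "Before" snippet in the Usage section below; context-manager plumbing is omitted:

import iris

class BenchmarkRunner:
    def __init__(self, name, barrier_fn=None):
        self.name = name
        self.barrier_fn = barrier_fn

    def run(self, fn, warmup=5, iters=50, params=None):
        bound = lambda: fn(**(params or {}))
        for _ in range(warmup):          # untimed warmup iterations
            bound()
        if self.barrier_fn is not None:  # synchronize ranks before timing
            self.barrier_fn()
        # iris.do_bench performs the timed repetitions and reports milliseconds.
        return iris.do_bench(bound, self.barrier_fn, n_repeat=iters)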

Testing & Documentation

  • test_bench.py: full suite (GPU required)
  • test_bench_basic.py: unit tests (no GPU)
  • API reference, migration guide, examples

Usage

Before (~100 lines):

parser.add_argument("-w", "--num_warmup", type=int, default=1)
# ... dtype conversion, manual warmup, timing, stats
triton_ms = iris.do_bench(run_exp, barrier, n_repeat=args["num_experiments"])
print(f"Time: {triton_ms:.4f} ms")

After (~50 lines):

from iris.bench import BenchmarkRunner

runner = BenchmarkRunner(name="gemm", barrier_fn=shmem.barrier)
result = runner.run(fn=operation, warmup=5, iters=50, params={"size": 1024})
result.print_summary()  # automatic mean/p50/p99
runner.save_json("results.json")  # CI-ready

Enables consistent CI performance tracking and reduces maintenance burden by centralizing benchmark infrastructure.

Original prompt

This section describes the original issue you should resolve

<issue_title>Unified benchmarking harness</issue_title>
<issue_description>Benchmarking code in Iris is currently scattered across benchmark/ and examples/, with each script re-implementing the same logic (warmup loops, synchronization, timing, averaging, printing). Over time this has led to copy-pasted code, inconsistent measurement patterns, and benchmarks that are hard to reuse or automate.

It would be useful to introduce a small, shared benchmarking harness (e.g. iris.bench) that standardizes:

  • warmup and iteration handling
  • timing and synchronization
  • basic statistics (mean / p50 / p99)
  • parameter sweeps
  • structured result output (e.g. JSON or dict)

This would allow both examples/ and benchmark/ to share the same timing infrastructure, while keeping example code focused on semantics rather than measurement boilerplate.

Example (sketch):

from iris.bench import benchmark

@benchmark(name="gemm_all_scatter", warmup=5, iters=50)
def run(size, world_size):
    # setup tensors
    # launch Iris kernel
    kernel(...)

Internally you can use iris.do_bench and any code we have. Such a harness would significantly reduce duplicated code, improve maintainability, and make it easier to add consistent benchmarks and eventually integrate CI performance tracking.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 7, 2026 20:52
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot AI changed the title [WIP] Add unified benchmarking harness for Iris Add unified benchmarking harness (iris.bench) Feb 7, 2026
Copilot AI requested a review from mawad-amd February 7, 2026 20:57

# Simple decorator-based benchmarking
@benchmark(name="my_kernel", warmup=5, iters=50)
def run_kernel():
Collaborator

@copilot the decorator way is the only way we need. Remove everything else. Also, it is safe to assume that the bench harness will construct the iris instance and pass it to the user benchmark function. When using the decorator, the user will also need to annotate the parts of the code that are pre-setup (e.g. tensor allocation), per-run preamble (e.g. resetting flags), and the code to actually benchmark (the kernel launch).

- Remove BenchmarkRunner class - keep only @benchmark decorator
- Decorator now constructs iris instance and passes shmem to user function
- Add @setup, @preamble, @measure annotations for code organization
- Update all documentation to reflect decorator-only approach
- Simplify examples to demonstrate new pattern
- Update tests to match new API

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
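
A sketch of the decorator-only pattern requested above. The names @setup, @preamble, and @measure come from the commit summary; how they attach to the benchmarked function (shown here as nested functions) and the exact signatures are assumptions, not confirmed by this thread:

from iris.bench import benchmark, setup, preamble, measure

@benchmark(name="gemm_all_scatter", warmup=5, iters=50)
def run(shmem):  # the harness constructs the Iris instance and passes it in
    @setup
    def allocate():
        # one-time pre-setup, e.g. tensor allocation via shmem
        ...

    @preamble
    def reset():
        # per-iteration preamble, e.g. resetting flags
        ...

    @measure
    def launch():
        # the code that is actually timed, e.g. the kernel launch
        ...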

Development

Successfully merging this pull request may close these issues.

Unified benchmarking harness
