Add API reference for iris.ccl, iris.ops, and iris.x modules #345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Conversation
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Pull request overview
This pull request adds comprehensive API reference documentation for three core Iris modules (iris.ccl, iris.ops, and iris.x) that were previously undocumented. The documentation follows the established pattern from existing Triton and Gluon reference sections.
Changes:
- Added structured documentation for collective communication operations (CCL), fused GEMM+CCL operations (ops), and device-side tile-level primitives (x)
- Created overview pages with usage patterns and examples for each module
- Added autodoc-based API reference pages for operations, configuration classes, and core abstractions
- Updated the main API reference navigation to include links to the new sections
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Summary per file:

| File | Description |
|---|---|
| docs/reference/api-reference.md | Added navigation entries for the three new modules in the main API reference index |
| docs/reference/ccl/overview.md | Overview of collective communication library with usage patterns and examples |
| docs/reference/ccl/operations.md | Autodoc references for CCL collective operations (`all_reduce`, `all_gather`, `all_to_all`, `reduce_scatter`) |
| docs/reference/ccl/config.md | Autodoc references for CCL configuration classes (`Config`, `ReduceOp`) |
| docs/reference/ops/overview.md | Overview of fused GEMM+CCL operations with usage patterns and workspace management examples |
| docs/reference/ops/operations.md | Autodoc references for fused operations (`matmul_all_reduce`, `all_gather_matmul`, `matmul_all_gather`, `matmul_reduce_scatter`) and `OpsNamespace` |
| docs/reference/ops/config.md | Autodoc references for ops configuration classes (`FusedConfig`, `FusedWorkspace`) |
| docs/reference/x/overview.md | Overview of device-side tile-level primitives with algorithm selection and usage patterns |
| docs/reference/x/core.md | Autodoc references for core abstractions (`TileView`, `TensorView`, `DeviceContext`, `AllReduceConfig`) and helper functions |
| docs/reference/x/operations.md | Autodoc references for device-side collective operations (all-reduce variants, `all_gather`, `all_to_all`, `reduce_scatter`, `gather`) |
Added to docs/reference/api-reference.md:

- [iris.ccl - Collective Communication](ccl/overview.md)
  - [Operations](ccl/operations.md)
  - [Configuration](ccl/config.md)
- [iris.ops - Fused GEMM+CCL](ops/overview.md)
  - [Operations](ops/operations.md)
  - [Configuration](ops/config.md)
- [iris.x - Device-Side Primitives](x/overview.md)
  - [Core Abstractions](x/core.md)
  - [Operations](x/operations.md)
**Copilot AI** commented on Feb 3, 2026:
The new documentation sections for iris.ccl, iris.ops, and iris.x are not included in the Sphinx table of contents configuration file (docs/sphinx/_toc.yml). This means these pages will not be included in the generated documentation navigation and may not be properly built.
You need to add entries to docs/sphinx/_toc.yml under the "Reference" section, similar to how triton and gluon are structured. The new entries should be added after the gluon section, following the same pattern.
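Below is a rough sketch of what such entries could look like, assuming docs/sphinx/_toc.yml follows the sphinx-external-toc `subtrees`/`entries` schema; the exact keys, paths, and titles are assumptions and should be copied from the existing triton and gluon entries rather than from this sketch.

```yaml
# Hypothetical _toc.yml fragment, placed after the gluon entries.
# Keys and file paths are assumptions; mirror the existing structure.
- file: reference/ccl/overview
  title: iris.ccl - Collective Communication
  subtrees:
    - entries:
        - file: reference/ccl/operations
        - file: reference/ccl/config
- file: reference/ops/overview
  title: iris.ops - Fused GEMM+CCL
  subtrees:
    - entries:
        - file: reference/ops/operations
        - file: reference/ops/config
- file: reference/x/overview
  title: iris.x - Device-Side Primitives
  subtrees:
    - entries:
        - file: reference/x/core
        - file: reference/x/operations
```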
Excerpt from docs/reference/x/overview.md:

```python
    ctx.all_reduce(tile, src_view, dst_view)
```

## Core Abstractions

- **TileView**: Represents a tile's position and size in a 2D grid
- **TensorView**: Represents a tensor's memory layout (pointer, shape, strides)
- **DeviceContext**: Holds rank, world size, and heap bases for communication
- **AllReduceConfig**: Configuration for selecting all-reduce algorithms

## Usage Patterns

### Using DeviceContext (Recommended)

The `DeviceContext` provides a clean API for calling collectives:

```python
@triton.jit
def kernel(input_ptr, output_ptr, ...):
    tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
    src_view = iris.x.TensorView(input_ptr, M, N, stride_m, stride_n)
    dst_view = iris.x.TensorView(output_ptr, M, N, stride_m, stride_n)
    ctx = iris.x.DeviceContext(rank, world_size, heap_bases)

    # Call collectives with default algorithms
    ctx.all_reduce(tile, src_view, dst_view)
    ctx.all_gather(tile, src_view, dst_view, dim=0)
    ctx.all_to_all(tile, src_view, dst_view, N_per_rank)
    ctx.reduce_scatter(tile, src_view, dst_view)
```

### Algorithm Selection

Use `AllReduceConfig` to select specific all-reduce algorithms:

```python
@triton.jit
def kernel(input_ptr, output_ptr, locks_ptr, ...):
    ctx = iris.x.DeviceContext(rank, world_size, heap_bases)

    # Use ring algorithm
    config = iris.x.AllReduceConfig("ring")
    ctx.all_reduce(tile, src_view, dst_view, config=config)

    # Use spinlock algorithm with locks
    config = iris.x.AllReduceConfig("spinlock", locks_ptr)
    tile_id = pid_m * num_tiles_n + pid_n
    ctx.all_reduce(tile, src_view, dst_view, config=config, tile_id=tile_id)
```

### Standalone Functions

You can also call operations directly without `DeviceContext`:

```python
@triton.jit
def kernel(input_ptr, output_ptr, ...):
    ctx = iris.x.DeviceContext(rank, world_size, heap_bases)

    # Call operations directly
    iris.x.all_reduce_atomic(tile, src_view, dst_view, ctx)
    iris.x.all_reduce_ring(tile, src_view, dst_view, ctx)
    iris.x.all_gather(tile, src_view, dst_view, dim, ctx)
```
**Copilot AI** commented on Feb 3, 2026:
The documentation shows `DeviceContext` methods being called (e.g., `ctx.all_reduce(tile, src_view, dst_view)`), but the actual API uses standalone functions where `ctx` is passed as the last parameter (e.g., `iris.x.all_reduce_atomic(tile, src_view, dst_view, ctx)`).
Based on the actual implementation in iris/x/core.py lines 466-489, `DeviceContext` does not have collective methods. The correct usage is to call the standalone functions from the `iris.x` module, passing `ctx` as a parameter.
The examples throughout this file (lines 30, 55-58, 90-92) need to be corrected to match the actual API.
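For illustration, a minimal sketch of how the overview example might read once corrected. The `TileView`/`TensorView`/`DeviceContext` argument orders are taken from the documentation under review and the standalone-call form from this comment, so treat the exact signatures as assumptions to verify against iris/x/core.py.

```python
import triton
import triton.language as tl
import iris

# Hypothetical corrected kernel: collectives are standalone iris.x functions
# that receive the DeviceContext as their last argument (per this comment).
@triton.jit
def kernel(input_ptr, output_ptr,
           M: tl.constexpr, N: tl.constexpr,
           stride_m: tl.constexpr, stride_n: tl.constexpr,
           rank: tl.constexpr, world_size: tl.constexpr, heap_bases,
           BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
    src_view = iris.x.TensorView(input_ptr, M, N, stride_m, stride_n)
    dst_view = iris.x.TensorView(output_ptr, M, N, stride_m, stride_n)
    ctx = iris.x.DeviceContext(rank, world_size, heap_bases)

    # DeviceContext carries rank/world_size/heap_bases but has no collective
    # methods; pass it explicitly to the standalone operation.
    iris.x.all_reduce_atomic(tile, src_view, dst_view, ctx)
```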
Referenced lines from the Algorithm Selection example:

    # Use ring algorithm
    config = iris.x.AllReduceConfig("ring")
    ctx.all_reduce(tile, src_view, dst_view, config=config)

    # Use spinlock algorithm with locks
    config = iris.x.AllReduceConfig("spinlock", locks_ptr)
**Copilot AI** commented on Feb 3, 2026:
The `AllReduceConfig` initialization examples show string arguments like `AllReduceConfig("ring")`, but the actual implementation in iris/x/core.py (lines 452-462) expects an integer `variant_code` (0-4) and a `locks_ptr` parameter.
The correct usage should be:
- `AllReduceConfig(1, locks_ptr)` for ring (not `AllReduceConfig("ring")`)
- `AllReduceConfig(4, locks_ptr)` for spinlock (not `AllReduceConfig("spinlock", locks_ptr)`)
The mapping is documented in core.py lines 441-446:
- 0 = atomic
- 1 = ring
- 2 = one_shot
- 3 = two_shot
- 4 = spinlock
All examples using AllReduceConfig need to use integer codes, not strings.
Suggested change:

    -    # Use ring algorithm
    -    config = iris.x.AllReduceConfig("ring")
    -    ctx.all_reduce(tile, src_view, dst_view, config=config)
    -    # Use spinlock algorithm with locks
    -    config = iris.x.AllReduceConfig("spinlock", locks_ptr)
    +    # Use ring algorithm (variant_code = 1)
    +    config = iris.x.AllReduceConfig(1, locks_ptr)
    +    ctx.all_reduce(tile, src_view, dst_view, config=config)
    +    # Use spinlock algorithm with locks (variant_code = 4)
    +    config = iris.x.AllReduceConfig(4, locks_ptr)
### Via shmem.ops namespace (recommended)

```python
shmem = iris.iris(heap_size)
```
**Copilot AI** commented on Feb 3, 2026:
The code example uses an undefined variable `heap_size`. It should either define it (e.g., `heap_size = 2**30`) or use a literal value like in the first example (line 14: `iris.iris(heap_size=2**30)`).
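A minimal sketch of the suggested fix, defining `heap_size` before use (the 1 GiB value is just the example from this comment, not a required size):

```python
import iris

heap_size = 2 ** 30  # define the heap size before constructing the Iris context
shmem = iris.iris(heap_size)
```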
```python
import iris.ops as ops

shmem = iris.iris(heap_size)
```
**Copilot AI** commented on Feb 3, 2026:
The code example uses an undefined variable `heap_size`. It should either define it (e.g., `heap_size = 2**30`) or use a literal value like in the first example (line 14: `iris.iris(heap_size=2**30)`).
Referenced lines:

    def my_kernel(input_ptr, output_ptr, M, N,
                  rank, world_size, heap_bases,
**Copilot AI** commented on Feb 3, 2026:
The kernel signature is missing `tl.constexpr` annotations for the `M`, `N` parameters. `TensorView` requires dimensions and strides to be constexpr (as documented in iris/x/core.py lines 264-265 and 269-272).
The parameters `M` and `N` should be annotated as:

    def my_kernel(input_ptr, output_ptr, M: tl.constexpr, N: tl.constexpr,
                  rank: tl.constexpr, world_size: tl.constexpr, heap_bases,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):

Additionally, `stride_m` and `stride_n` should either be passed as constexpr parameters or computed from constexpr values. See tests/x/test_all_to_all.py lines 21-27 for the correct pattern.
Suggested change:

    -def my_kernel(input_ptr, output_ptr, M, N,
    -              rank, world_size, heap_bases,
    +def my_kernel(input_ptr, output_ptr, M: tl.constexpr, N: tl.constexpr,
    +              rank: tl.constexpr, world_size: tl.constexpr, heap_bases,
Three core API modules (`iris.ccl`, `iris.ops`, `iris.x`) were undocumented in the reference section, making discovery difficult for users.

### Changes

**Collective Communication Library (`iris.ccl`)**
- `all_reduce`, `all_gather`, `all_to_all`, `reduce_scatter`
- `Config` class with algorithm selection and tuning parameters
- `ReduceOp` enum for reduction operations

**Fused GEMM+CCL Operations (`iris.ops`)**
- `matmul_all_reduce`, `all_gather_matmul`, `matmul_all_gather`, `matmul_reduce_scatter`
- `FusedConfig` and `FusedWorkspace`
- `shmem.ops` namespace or standalone

**Device-Side Tile-Level Primitives (`iris.x`)**
- `TileView`, `TensorView`, `DeviceContext`
- `AllReduceConfig`
- `tile_layout`, `tile_ptr`, `offset_ptr`

### Documentation Structure

Each section includes usage examples and follows the existing Triton/Gluon API reference patterns.
Original prompt
`iris.ccl`, `iris.ops` and `iris.x` (#344)