Demo-131 Cuda graph with optimized paged attention #990

PanZezhong1725 · 2026-01-27T03:29:09Z

No description provided.

…graph recording
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and `__ldg`
- Add vectorized memory access using `float4`/`float2`, `half2`, and `bfloat162`
- Use `__ldg` instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add `__restrict__` keywords for better compiler optimization
- Implement dynamic block size selection based on `embedding_dim`
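The alignment checks and block size selection listed above happen on the host before the kernel launch. The following is a minimal C++ sketch of that dispatch logic, not the code from this PR: the helper names `pick_vector_bytes` and `pick_block_size`, the candidate widths, and the 32/1024 thread bounds are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the PR): widest access width in bytes that is
// safe for every embedding row. 16 bytes corresponds to a float4-sized access,
// 8 to float2, 4 to half2/bfloat162. Falls back to the scalar element size.
inline std::size_t pick_vector_bytes(const void *weight, const void *output,
                                     std::size_t embedding_dim,
                                     std::size_t elem_size) {
    assert(elem_size > 0);
    const std::size_t row_bytes = embedding_dim * elem_size;
    const auto w = reinterpret_cast<std::uintptr_t>(weight);
    const auto o = reinterpret_cast<std::uintptr_t>(output);
    constexpr std::size_t kWidths[] = {16, 8, 4};
    for (std::size_t width : kWidths) {
        // Base pointers and row pitch must all be multiples of the width,
        // otherwise some row would start at a misaligned address.
        if (width > elem_size && w % width == 0 && o % width == 0 &&
            row_bytes % width == 0) {
            return width;
        }
    }
    return elem_size;
}

// Hypothetical helper (not from the PR): smallest power-of-two block size that
// covers one row, clamped to [32, 1024] threads (assumed bounds).
inline unsigned pick_block_size(std::size_t embedding_dim,
                                std::size_t elems_per_thread) {
    assert(elems_per_thread > 0);
    const std::size_t work =
        (embedding_dim + elems_per_thread - 1) / elems_per_thread;
    unsigned block = 32;
    while (block < work && block < 1024) {
        block *= 2;
    }
    return block;
}
```

A launcher would presumably call `pick_vector_bytes` once per launch, select the matching kernel specialization, and pass `pick_block_size(embedding_dim, width / elem_size)` as the block dimension. The device-side parts of the list (`__ldg` loads and `__restrict__` qualifiers) are not shown here.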

gongchensu and others added 30 commits December 29, 2025 17:04

Issue/862 - Fix compilation errors (missing headers, cub namespace) to enable inference on Hygon platform

ab52dea

issue/837 - support int32 and int64 in cambricon add

cc1d155

issue/838 - Cambricon Batched RoPE

5848b40

issue/961: fix metax init with preload

1a576d4

Signed-off-by: Ceng23333 <441651826@qq.com>

fix ruff

8150c97

Signed-off-by: Ceng23333 <441651826@qq.com>

issue/523 - switched to 1.22 interface

aac54e1

issue/987 - add .cpp files to ninetoothed includes

1e63710

issue/978 - metax cuda graph impl and wrappings

822a534

issue/900 - support embedding on iluvatar, metax, and moore

835209e

issue/900 - adapt to graph and adjust test script

eb34d4d

issue/900 - maintains classic embedding for devices yet to be worked on

f9761a2

issue/791 fix add_rmsnorm api and rmsnorm module

0c204df

issue/884 - add_rms_norm on iluvatar, metax and moore

dfafc21

issue/632 - adapt to iluvatar core 20

4ddc664

issue/791 - fix add_rmsnorm api on mtx and mth

0611cb1

issue/810 support more ops as graph op

81e5fe9

issue/985 - adjust cxflags and cxxflags for lua scripts

7c5aa16

issue/402 - convenient ninetoothed util

55cd22e

- Wrap `NineToothedTensor` at the C++ layer
- Add a way to create a `ninetoothed::Tensor` using arrays as `shape` and `strides`
- Use `ninetoothed::Tensor` to integrate the NineToothed ReLU operator
- Add an include guard to `ninetoothed/utils.h`

issue/925 - Speed up scripts/build_ntops.py and `src/infiniop/ninetoothed/build.py` with `concurrent.futures`

32340fc

issue/940 - check build result and implicitly require build.py for build ntops

ca58118

issue/935 - add metax include dir for ninetoothed

47843aa

issue/919 - ninetoothed flash attention

6ac8f90

issue/931 - ninetoothed swiglu for nv, il, mtx

5614e1b

issue/923 - ninetoothed kv caching for nv, il, mtx

97eced0

issue/979 optimize paged attention

1c18c04

issue/979 - removed commented paged attn codes

4cd1f68

issue/983 - adapted the optimized paged attention to metax

7a18d24

demo131 - patch lua flags and includes

1fa5629

issue/811 use relax graph capture mode, add compile flag for graph instantiate

807e5e4

spike-zhu and others added 28 commits February 11, 2026 14:41

issue/899 - fix: fix causal_softmax and rearrange bug

e4bce36

Merge pull request #1009 from InfiniTensor/issue/949

c312f17

issue/949 - feat: add silu_and_mul for moore gpu with test pass

issue/1001 - feat: add paged attention decode for moore gpu referencing nvidia

3d3a277

issue/1001 - feat: add paged attention prefill for moore gpu referencing nvidia

6074f7b

issue/1012 - feat: add paged caching for moore gpu referencing nvidia

8f710be

Merge pull request #1010 from InfiniTensor/issue/899

513a850

issue/899 - fix: fix causal_softmax and rearrange bug

demo131 - remove fp32 from paged tests

d3e27d8

Merge pull request #839 from InfiniTensor/issue/838

c112132

issue/838 - Cambricon Batched RoPE

Merge pull request #1013 from InfiniTensor/issue/1012

718eaf4

issue/1012 - feat: add paged caching for moore gpu referencing nvidia

Merge pull request #1011 from InfiniTensor/issue/1001

84201ad

issue/1001 - feat: add paged attention prefill and decode for moore gpu referencing nvidia

Merge pull request #879 from InfiniTensor/issue/837

f1b8ab6

issue/837 - support int32 and int64 in cambricon add

Merge pull request #963 from InfiniTensor/issue/523-020

012df56

issue/523 - switched to cambricon mlu 1.22 interface

Merge branch 'demo131' into Issue/862

8d09630

Merge pull request #865 from gongchensu/Issue/862

6ec2ea4

Issue/862 - Fix compilation errors (missing headers, cub namespace) t…

issue/972 - feat: add scaled_mm with muDNN BatchMatMul for moore gpu

d4f726d

issue/972 - feat: add per_channel_quant_int8 for moore gpu referencing nvidia

e1974c6

issue/972 - feat: adjust scaled_mm_int8 python test

6841663

Merge pull request #1018 from InfiniTensor/issue/972

5675a4a

Issue/972: muDNN-based w8a8 quantization implementation for the Moore platform, plus improvements to the scaled_mm_int8 Python test script

issue/1008: mv "import infinicore" ahead of "import torch"

bd0c922

issue/1008: adapt lpnorm layernorm softmax rearrange paged_attention for iluvatar

f46e9f6

issue/1008: adapt paged_attention_prefill

7377e71

issue/1008 skip scale_mm compile in iluvatar

034b189

issue/1008: wrap iluvatar change in #ifdef ENABLE_ILUVATAR_API

1c32d14

issue/1008: revert python_test.py

3d54ce8

issue/1022 - patch metax hpcc hrc include

d0f405c

issue/1008: use warpBroadcast api

68026bd

Merge pull request #1019 from InfiniTensor/issue/1008

52f0dcf

Issue/1008

Merge pull request #962 from InfiniTensor/issue/961

1d6527c

issue/961: fix metax init with preload

wooway777 approved these changes Feb 12, 2026

wooway777 merged commit 784139b into main Feb 13, 2026
10 checks passed

8 participants
