Question about leading dimension handling for quantization scales in cuBLASLt matmul #305

@Haoyan-Ma

I have a question about how to handle leading dimensions for quantization scales in cublasLtMatmul when the input matrices are not tightly packed.

Suppose the leading dimensions of matrices A and B are larger than their logical sizes (M, N, K), i.e., the matrices use strided/padded layouts rather than contiguous M×K or K×N storage.
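For reference, with cuBLASLt's default column-major order (CUBLASLT_ORDER_COL), element (row, col) of a matrix with leading dimension ld sits at offset row + col * ld, so a padded matrix spans more memory than its logical size. A minimal illustration in Python; the helper names are hypothetical and not part of cuBLASLt:

```python
def colmajor_offset(row: int, col: int, ld: int) -> int:
    """Element offset of (row, col) in a column-major matrix with leading dimension ld."""
    return row + col * ld


def padded_extent(rows: int, cols: int, ld: int) -> int:
    """Elements spanned in memory by a rows x cols column-major matrix stored with leading dimension ld."""
    if ld < rows:
        raise ValueError("leading dimension must be >= number of rows")
    # Every column but the last occupies ld elements; the last needs only `rows`.
    return (cols - 1) * ld + rows


# Example: a 4 x 3 logical matrix stored with ld = 6 (two padding rows per column).
print(colmajor_offset(1, 2, ld=6))  # 13
print(padded_extent(4, 3, ld=6))    # 16, versus 12 if tightly packed
```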

For a quantized matmul using the scale modes:

A_scale = CUBLASLT_MATMUL_MATRIX_SCALE_VEC128_32F

B_scale = CUBLASLT_MATMUL_MATRIX_SCALE_BLK128x128_32F

How should the leading dimensions of the corresponding scale tensors be handled?

Specifically:

1. Are scale tensors always expected to be contiguous in memory?

2. Or do scale tensors also support their own leading dimensions / strides, similar to matrices A and B?

3. If A or B has padding due to larger leading dimensions, should the scales reflect the padded layout or only the logical tile layout?
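To make question 3 concrete, the two interpretations lead to different scale-tensor sizes. The sketch below counts scale entries under each, assuming (from the mode names only, not from the cuBLASLt documentation) that VEC128_32F means one fp32 scale per 128 consecutive K elements of each row of A, and BLK128x128_32F means one fp32 scale per 128×128 tile of B. The helper names and the sizes are hypothetical, and the sketch says nothing about the memory order, alignment, or leading dimension cuBLASLt actually expects for the scale tensors, which is the open question here.

```python
import math


def vec128_scale_count(outer: int, k: int) -> int:
    """Hypothetical: one fp32 scale per 128-element chunk of K, for each of `outer` rows."""
    return outer * math.ceil(k / 128)


def blk128x128_scale_count(k: int, n: int) -> int:
    """Hypothetical: one fp32 scale per 128 x 128 tile of a K x N operand."""
    return math.ceil(k / 128) * math.ceil(n / 128)


# Illustrative sizes: M = 256, N = 200, K = 300, with A and B padded so that their
# K extent in memory is 512 (assumes K is the leading dimension of both operands).
M, N, K, LD = 256, 200, 300, 512

# Interpretation 1: scales cover only the logical tiles.
print(vec128_scale_count(M, K), blk128x128_scale_count(K, N))    # 768 6
# Interpretation 2: scales also cover the padded region.
print(vec128_scale_count(M, LD), blk128x128_scale_count(LD, N))  # 1024 8
```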

I couldn’t find clear documentation on how scale tensor strides relate to matrix strides, so any clarification would be appreciated.
