I have a question regarding how to handle leading dimensions for quantization scales in cuBLASLtMatmul when the input matrices are not tightly packed.
Suppose the leading dimensions of matrices A and B are larger than the corresponding logical dimensions (M, N, K), i.e., the matrices are stored in strided/padded layouts rather than tightly packed M×K or K×N buffers.
For a quantized matmul using the scale modes:
A_scale = CUBLASLT_MATMUL_MATRIX_SCALE_VEC128_32F
B_scale = CUBLASLT_MATMUL_MATRIX_SCALE_BLK128x128_32F
How should the leading dimensions of the corresponding scale tensors be handled?
Specifically:
1. Are scale tensors always expected to be contiguous in memory?
2. Or do scale tensors also support their own leading dimensions / strides, similar to matrices A and B?
3. If A or B has padding due to larger leading dimensions, should the scale tensors reflect the padded layout or only the logical tile layout?
I couldn’t find clear documentation on how scale tensor strides relate to matrix strides, so any clarification would be appreciated.