I have a question regarding how to handle leading dimensions for quantization scales in cuBLASLtMatmul when the input matrices are not tightly packed.
Suppose the leading dimensions of matrices A and B are larger than the corresponding logical dimensions (M, N, K), i.e., the matrices are stored in strided/padded layouts rather than tightly packed M×K or K×N buffers.
For a quantized matmul using the scale modes:
A_scale = CUBLASLT_MATMUL_MATRIX_SCALE_VEC128_32F
B_scale = CUBLASLT_MATMUL_MATRIX_SCALE_BLK128x128_32F
How should the leading dimensions of the corresponding scale tensors be handled?
Specifically:
1. Are scale tensors always expected to be contiguous in memory?
2. Or do scale tensors also support their own leading dimensions / strides, similar to matrices A and B?
3. If A or B has padding due to larger leading dimensions, should the scale tensors reflect the padded layout or only the logical tile layout?
I couldn’t find clear documentation on how scale tensor strides relate to matrix strides, so any clarification would be appreciated.