crc: Add vector-accelerated assembly code for RISC-V 64-bit architecture with Zvbc instruction set#350
Conversation
cd1a52d to
3259ef2
Compare
|
Hi @yinlenree. I suggest you include commit title like |
|
I'll review it carefully next week. For now, I’ve noticed two issues: If the compiler doesn't support those -march options, the newly added CRC files will still be compiled, leading to build failures. Some files have trailing blank lines at the end,please remove all of them. Some files have mixed indentation using both spaces and tabs. Please make them consistent. (You can see the differences using git diff.) |
|
Understood, thank you both for the suggestions. I'll first address the compilation-related issues. This might take some time since I haven't worked on this aspect before. |
adef254 to
2379b73
Compare
configure.ac
Outdated
| __asm__ volatile( | ||
| ".option arch, +zbc\n" | ||
| "clmul zero, zero, zero\n" | ||
| "clmulh zero, zero, zero\n" |
There was a problem hiding this comment.
Please use either tabs or spaces consistently for indentation, and check other files as well.
include/riscv64_multibinary.h
Outdated
| #include <sys/auxv.h> | ||
| #include <asm/hwprobe.h> // 包含 RISC-V 硬件探测相关的宏定义(如 RISCV_HWPROBE_EXT_ZBB) | ||
| #include <unistd.h> // 提供系统调用相关声明 | ||
| #include <sys/syscall.h> // 定义 __NR_riscv_hwprobe 系统调用号 |
There was a problem hiding this comment.
Please use English comments, and check other files as well.
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
| DEFINE_INTERFACE_DISPATCHER(crc32_gzip_refl) | ||
| { | ||
| #if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC && HAVE_ZVBB | ||
| struct riscv_hwprobe _probe = INIT_PROBE_STRUCT(); |
There was a problem hiding this comment.
All these lines (from 131 to 135) are always the same throughout this file. Would be better to move it to a separate function.
There was a problem hiding this comment.
I wrote a function to determine experimental extensions, while the logic for standard extensions remains unchanged. Do you think this is acceptable?
DEFINE_INTERFACE_DISPATCHER(crc16_t10dif)
{
#if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC && HAVE_ZVBB
unsigned long auxval = getauxval(AT_HWCAP);
if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZVBB", "ZBC")) {
return crc16_t10dif_vclmul;
}
#endif
return crc16_t10dif_base;
}
|
Understood. I will revise my code and comments according to the requirements. Thank you both for the review. |
2379b73 to
740d8a4
Compare
include/riscv64_multibinary.h
Outdated
| #define EXT_CODE(ext) ( \ | ||
| strcmp(ext, "ZBC") == 0 ? 7 : \ | ||
| strcmp(ext, "ZVBB") == 0 ? 17 : \ | ||
| strcmp(ext, "ZVBC") == 0 ? 18 : -1) |
There was a problem hiding this comment.
Please use macros instead of these numbers, for example: RISCV_HWPROBE_EXT_ZBC, RISCV_HWPROBE_EXT_ZVBB, ...
include/riscv64_multibinary.h
Outdated
| static inline int check_riscv_extensions(const char **extensions, size_t count) | ||
| { | ||
| struct riscv_hwprobe _probe = INIT_PROBE_STRUCT(); | ||
| syscall(__NR_riscv_hwprobe, &_probe, 1, 0, NULL, 0); |
There was a problem hiding this comment.
It seems that using hwprobe requires checking the kernel version?
There was a problem hiding this comment.
Okay, I will replace the numbers with macros. Additionally, regarding the extension check, I plan to use version 6.8 as the cutoff—versions below this will not utilize the ZVBC vector acceleration.
There was a problem hiding this comment.
Considering that the linux/version.h file may change after a kernel upgrade, making it impossible to determine whether the current kernel supports RISC-V extension macros using the LINUX_VERSION_CODE macro, I plan to directly check whether these extension macros are defined to decide the code distribution. The code is as follows.
#if defined(RISCV_HWPROBE_EXT_ZBC) && defined(RISCV_HWPROBE_EXT_ZVBB) && defined(RISCV_HWPROBE_EXT_ZVBC)
#define EXT_CODE(ext) ( \
strcmp(ext, "ZBC") == 0 ? RISCV_HWPROBE_EXT_ZBC : \
strcmp(ext, "ZVBB") == 0 ? RISCV_HWPROBE_EXT_ZVBB : \
strcmp(ext, "ZVBC") == 0 ? RISCV_HWPROBE_EXT_ZVBC : \
-1)
#endif
...
static inline int check_riscv_extensions(const char **extensions, size_t count)
{
#ifdef EXT_CODE
struct riscv_hwprobe _probe = INIT_PROBE_STRUCT();
syscall(__NR_riscv_hwprobe, &_probe, 1, 0, NULL, 0);
for (size_t i = 0; i < count; i++) {
if (!(_probe.value & EXT_CODE(extensions[i]))) {
return 0;
}
}
return 1;
#else
return 0;
#endif
}
Do you think this is okay?
There was a problem hiding this comment.
My understanding is that the detection works by checking whether asm/hwprobe.h exists in the kernel, since these files didn’t exist before.
There was a problem hiding this comment.
I found that in the kernel code from versions 6.4 to 6.7, although there is an hwprobe.h file in the include/asm directory, it lacks macros for offsets of extensions like ZVBC, ZBC, etc. These macros were only defined after version 6.8.
Additionally, I noticed that the offset for the V standard extension was defined in version 6.5. Does this mean I also need to check the definition of the V extension offset?
I'm unsure whether there was an interface for detecting extensions like ZVBC in kernels from versions 6.4 to 6.7. I have little experience in this area. Do you have any suggestions?
Here's the URL for the hwprobe.h file in Linux kernel 6.7 from the official repository:
https://github.com/torvalds/linux/blob/v6.7/arch/riscv/include/asm/hwprobe.h
The macro definitions are located here.
https://github.com/torvalds/linux/blob/v6.7/arch/riscv/include/uapi/asm/hwprobe.h
There was a problem hiding this comment.
Maybe we can directly detect this macro, similar to how DPDK introduced hwprobe, with a minimum requirement of kernel 6.8?
https://inbox.dpdk.org/dev/20240827153230.52880-2-daniel.gregory@bytedance.com/
There was a problem hiding this comment.
Yes, the bitmasks for ZVBC and ZBC were defined starting from version 6.8. I plan to add detection macros for the hwprobe.h file during the compilation phase, as well as detection macros for the bitmasks of these three extensions: ZVBC, ZVBB, and ZBC.
fe7ba9c to
0d79c34
Compare
|
Please test what happens if the kernel does not have hwprobe support, for example, by changing |
0d79c34 to
ddd6706
Compare
|
@sunyuechi I forgot to add the conditional check before including the asm/hwprobe.h header file. I've now fixed it and optimized the code, replacing time-consuming instructions and removing the Zvbb extension. |
|
Could you review this PR again, @sunyuechi? I am thinking of first merging the "prefetch" PR and then this PR (after they are reviewed, of course). |
|
@pablodelara Okay, I'll check it tomorrow. |
crc/riscv64/crc_common_vclmul.h
Outdated
| #define vec_15 v15 | ||
| #define vec_16 v16 | ||
| #define vec_17 v17 | ||
| #define vec_18 v18 |
There was a problem hiding this comment.
#define tmp_0 t0
#define tmp_1 t1
#define tmp_2 t2
#define tmp_3 t3
#define tmp_4 t4
#define tmp_5 t5
#define vec_0 v0
#define vec_1 v1
...
#define vec_10 v10
#define vec_11 v11
#define vec_12 v12
Please remove these #define statements that do not improve readability, these t and v registers are already sufficiently clear on their own.
crc/riscv64/crc_common_vclmul.h
Outdated
| #define len a2 | ||
|
|
||
| // return | ||
| #define crc_ret a0 |
There was a problem hiding this comment.
Okay, understood. I will remove these redundant and unused #define statements.
ddd6706 to
a2ee08c
Compare
| vxor.vv v0, v8, v3 | ||
|
|
||
| addi sp, sp, -16 | ||
| vse64.v v0, (sp) |
There was a problem hiding this comment.
t5 and t6 are unused and can be utilized to eliminate stack operations.
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
| vslideup.vi v8, v9, 1 | ||
| vxor.vv v0, v8, v3 | ||
|
|
||
| addi sp, sp, -16 |
There was a problem hiding this comment.
t5 and t6 are unused and can be utilized to eliminate stack operations.
| addi t4, t4, 16 | ||
| crc_fold_512b_to_128b | ||
|
|
||
| addi sp, sp, -16 |
There was a problem hiding this comment.
Please use t6 to reduce stack operations (and also check other files as well).
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
| ret | ||
|
|
||
| .crc_fold: | ||
| # Initialize vector registers |
There was a problem hiding this comment.
This comment seems to be describing the purpose of vset, but it feels a bit unclear. Could we adjust it slightly?
| ret | ||
|
|
||
| .crc_fold: | ||
| # Initialize vector registers |
There was a problem hiding this comment.
This comment seems to be describing the purpose of vset, but it feels a bit unclear. Could we adjust it slightly?
There was a problem hiding this comment.
Okay, I will replace stack pointer (sp) operations with idle registers in all files and improve the readability of comments in the code.
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
| { | ||
| #if HAVE_RVV && HAVE_ZBC && HAVE_ZVBC | ||
| unsigned long auxval = getauxval(AT_HWCAP); | ||
| if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZBC")) { |
There was a problem hiding this comment.
According to the RISC-V manual,
The Zvknhb and Zvbc Vector Crypto Extensions — and accordingly the composite extensions Zvkn and Zvks — require a Zve64x base, or application ("V") base Vector Extension.
Since the vector instructions you are using exist in both Zve64x and V, it seems that if Zvbc is detected, there’s no need to additionally check for the V extension. Please also review whether any changes are needed in the compilation part regarding this.
There was a problem hiding this comment.
Got it. I will try to place the detection of Zvbc before that of the V extension, so as to omit the check for the V extension. Thank you.
a2ee08c to
ade80d5
Compare
|
I have retained the stack pointer operations when calling crc32_iscsi_refl_vclmul, which are used to save the call information. |
|
It seems that the check in the file |
Sorry, I forgot about this. Now I have removed the HAVE_RVV macro from both the dispatcher and the assembly file. |
ade80d5 to
56e655a
Compare
| .word 0x4ba80000, 0xc01f0000, 0xd7710000, 0x5cc60000, 0xf9ad0000, 0x721a0000, 0x65740000, 0xeec30000 | ||
| .word 0xa4150000, 0x2fa20000, 0x38cc0000, 0xb37b0000, 0x16100000, 0x9da70000, 0x8ac90000, 0x017e0000 | ||
| .word 0x1f650000, 0x94d20000, 0x83bc0000, 0x080b0000, 0xad600000, 0x26d70000, 0x31b90000, 0xba0e0000 | ||
| .word 0xf0d80000, 0x7b6f0000, 0x6c010000, 0xe7b60000, 0x42dd0000, 0xc96a0000, 0xde040000, 0x55b30000 |
There was a problem hiding this comment.
Can you extract a .h file like crc32/64?
(crc/riscv64/crc16_t10dif_vclmul.S and crc/riscv64/crc16_t10dif_copy_vclmul.S)
Since the data is completely identical, many parts of the crc_fold_loop calculation .. are also the same.
There was a problem hiding this comment.
I did not create a .h header file for crc16_t10dif_copy because its procedural code involves memory copying compared to other algorithm implementations. I only extracted the data for the calculation. Do you think this is acceptable?
crc/riscv64/crc_riscv64_dispatcher.c
Outdated
| { | ||
| #if HAVE_ZBC && HAVE_ZVBC | ||
| unsigned long auxval = getauxval(AT_HWCAP); | ||
| if (auxval & HWCAP_RV('V') && CHECK_RISCV_EXTENSIONS("ZVBC", "ZBC")) { |
There was a problem hiding this comment.
The runtime v check can also be removed.
crc/riscv64/crc16_t10dif_vclmul.S
Outdated
|
|
||
| .crc_table_loop: | ||
| lbu a4, 0(a1) | ||
| add a1, a1, 1 |
There was a problem hiding this comment.
This may cause errors in certain environments. For immediates, please use the standard addi whenever possible (and please check other files as well).
56e655a to
cf87b79
Compare
| .set .lanchor_crc_tab,. + 0 | ||
| .type crc64_table_rocksoft_refl, %object | ||
| .size crc64_table_rocksoft_refl, 2048 | ||
| crc64_table_rocksoft_refl: |
There was a problem hiding this comment.
It looks like there is no naming conflict with C here. How about changing it to use the same name?
crc64_table_rocksoft_refl -> crc64_rocksoft_refl_table (including other files).
crc/riscv64/Makefile.am
Outdated
| crc/riscv64/crc64_jones_refl_vclmul.S \ | ||
| crc/riscv64/crc64_jones_norm_vclmul.S \ | ||
| crc/riscv64/crc64_rocksoft_refl_vclmul.S \ | ||
| crc/riscv64/crc64_rocksoft_norm_vclmul.S No newline at end of file |
There was a problem hiding this comment.
All files end with the warning "No newline at end of file", please add it.
There was a problem hiding this comment.
Understood, it has been modified.
cf87b79 to
67f153b
Compare
Maybe you could remove the description about In addition, has the performance gap between using only Zbc and using Zbc+Zvbc been tested? (Since it’s unclear whether Zvbc might cause a performance regression.) It would be helpful if you could provide some performance analysis or other perspectives comparing this PR’s implementation: #299 |
|
I'm testing on a QEMU virtual machine, and since the execution bandwidth of vector and scalar instructions is uncertain, I haven't compared the performance of Zvbc + Zbc versus Zbc alone on QEMU. |
|
After the latest updates on main, this needs rebase. |
67f153b to
d42e43b
Compare
|
@sunyuechi could you review the PR again? |
|
As discussed earlier, I hope to provide clearer performance test results or performance-related explanations, similar to |
|
@sunyuechi are those perf test results expected from @yinlenree? |
d42e43b to
5313ac9
Compare
|
Hi, could you summarize what the major recent changes did? |
|
I also recommend rebasing the PR against latest master. |
|
OK, during this period, I optimized the single-vector folding approach in the main loop to a method using vector-plus-scalar folding, maximizing utilization of the execution units within the core. The performance test of "make perf" shows this method outperforms all previous implementations, and the performance data has already been sent to you via email. @sunyuechi |
d75406a to
e674fbb
Compare
configure.ac
Outdated
| AC_MSG_CHECKING([whether compiler supports RISC-V Extension (ZBB)]) | ||
| AC_COMPILE_IFELSE( | ||
| [AC_LANG_PROGRAM([#include <asm/hwprobe.h>], [ | ||
| int a = RISCV_HWPROBE_EXT_ZBC; |
configure.ac
Outdated
| ".option arch, +v, +zvbc\n" | ||
| "vsetivli zero, 2, e64, m1, ta, ma\n" | ||
| "vclmul.vv v0, v0, v0\n" | ||
| "vclmulh.vv v0, v0, v0\n" |
There was a problem hiding this comment.
It’s sufficient to check either cmul or cmulh
configure.ac
Outdated
| ".option arch, +v, +zvbc\n" | ||
| "vsetivli zero, 2, e64, m1, ta, ma\n" | ||
| "vclmul.vv v0, v0, v0\n" | ||
| "vclmulh.vv v0, v0, v0\n" |
There was a problem hiding this comment.
It’s sufficient to check either vcmul or vcmulh
| #else /* __ASSEMBLY__ */ | ||
| #include <sys/auxv.h> | ||
| #if HAVE_HWPROBE_H | ||
| #include <asm/hwprobe.h> |
There was a problem hiding this comment.
Newer versions of glibc use sys/hwprobe.h, so a similar approach should be used in the relevant places.
#if HAVE_SYS_HWPROBE_H
#include <sys/hwprobe.h>
#elif HAVE_ASM_HWPROBE_H
#include <asm/hwprobe.h>
| check_riscv_extensions((const char*[]){ __VA_ARGS__ }, \ | ||
| sizeof((const char*[]){ __VA_ARGS__ })/sizeof(const char*)) | ||
| #endif | ||
|
|
There was a problem hiding this comment.
The compile-time logic around hwprobe here is a bit odd. It might be better to wrap all hwprobe-related code together, with an outer check for hwprobe itself and inner checks for hwprobe macros such as ZBC, ZBB, and ZVBC, for example:
#if HAVE_SYS_HWPROBE_H || HAVE_ASM_HWPROBE_H
#if zbc zbb ...
|
|
||
| slli a0, a0, 32 | ||
| la t4, .shuffle_data_mask | ||
| addi sp, sp, -120 |
There was a problem hiding this comment.
According to the calling convention, it should be 16-byte aligned.
In the standard RISC-V calling convention, the stack grows downward and the stack pointer is always kept 16-byte aligned.
| clmulh a5, a4, s1 | ||
| clmulh ra, a4, s3 | ||
| clmulh gp, a4, s5 | ||
| clmulh tp, a4, s7 |
There was a problem hiding this comment.
Using ra, gp, tp as temporary registers violates ABI conventions. Is this necessary? If so, comments should be added to explain this.
There was a problem hiding this comment.
Got it. I've applied the changes as requested above.
e674fbb to
81dcddc
Compare
| fi | ||
| if test "x$sys_hwprobe_h" = "xyes"; then | ||
| AC_DEFINE([HAVE_SYS_HWPROBE_H], [1], [Define if sys/hwprobe.h exists]) | ||
| AM_CONDITIONAL([HAVE_SYS_HWPROBE_H], [true]) |
There was a problem hiding this comment.
CI is reporting an issue on x86: configure: error: conditional "HAVE_SYS_HWPROBE_H" was never defined.
There was a problem hiding this comment.
Got it. I forgot the definition in configure.ac. I have updated the file accordingly.
The CRC module of ISA-L has been accelerated using RISC-V's V, Zbc, Zbb and Zvbc, instruction sets, implementing data folding and Barrett reduction optimizations. Signed-off-by: Ji Dong <dong.ji1@zte.com.cn>
81dcddc to
aed015e
Compare
|
@sunyuechi could you review this last revision? Thanks! |
|
@sunyuechi sorry for pinging you again, but would be good to have this reviewed ASAP. Release date is end of this month and I would like new optimizations to be merged by the end of this week. |
|
Last call for this PR to be integrated in v2.32, thanks. @sunyuechi can you review it please? Thanks! |
I have implemented vector-accelerated CRC modules (including CRC16, CRC32, and CRC64) using the RISC-V V, Zbc and Zvbc instruction sets, with full functional verification and performance testing completed.
The implementation primarily leverages the vclmul.v and vclmulh.v (carry-less multiply) instructions for data folding. For big-endian processing, it additionally utilizes vrgather.vv instructions for byte-order reversal. The final checksum is computed via Barrett reduction.