perf: use swar to skip whitespace #1314

Open
cyruspyre wants to merge 2 commits into serde-rs:master from cyruspyre:perf/skip_swar_whitespace

@cyruspyre

I don't know if you will accept this, but using SWAR to skip whitespace improves performance when skipping over large runs of whitespace. It also degrades performance when there is no whitespace at all. I tried to avoid the degradation but was unable to. I tried checking 1-2 chars outside the loop, but the generated asm was already doing that, and doing it manually somehow made it even worse.

                      old             new        change
-- with -C target-cpu=native --------------------------
twitter/struct    753.11 MiB/s    804.69 MiB/s   +6.85%
twitter/dom       245.85 MiB/s    248.78 MiB/s   +1.19%
citm/struct       787.24 MiB/s    958.22 MiB/s  +21.72%
citm/dom          384.28 MiB/s    427.41 MiB/s  +11.22%
canada/struct     588.49 MiB/s    558.73 MiB/s   -5.06%
canada/dom        148.92 MiB/s    136.03 MiB/s   -8.65%

-- without --------------------------------------------
twitter/struct    749.26 MiB/s    796.89 MiB/s   +6.36%
twitter/dom       248.85 MiB/s    249.94 MiB/s   +0.44%
citm/struct       844.54 MiB/s    992.05 MiB/s  +17.46%
citm/dom          394.14 MiB/s    427.33 MiB/s   +8.42%
canada/struct     581.38 MiB/s    546.50 MiB/s   -6.00%
canada/dom        149.54 MiB/s    139.65 MiB/s   -6.61%

Running with -C target-cpu=native just made it use YMM registers to perform the same operations...
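For anyone unfamiliar with the technique, here is a minimal sketch of SWAR whitespace skipping. It is not the code from this PR; the names `eq_mask` and `skip_whitespace_swar` are made up for illustration, and it assumes the four JSON whitespace bytes (space, `\n`, `\t`, `\r`). It loads 8 bytes into a `u64`, builds an exact per-byte "is whitespace" mask, and uses `trailing_zeros` to find the first non-whitespace byte.

```rust
const LO7: u64 = 0x7f7f_7f7f_7f7f_7f7f;
const HI: u64 = 0x8080_8080_8080_8080;

/// Hypothetical helper: sets the high bit of every byte lane in `chunk`
/// that equals `b`. The formula is exact per lane (no cross-lane carries).
fn eq_mask(chunk: u64, b: u8) -> u64 {
    // XOR with the broadcast byte turns matching lanes into zero.
    let x = chunk ^ (u64::from(b) * 0x0101_0101_0101_0101);
    // High bit ends up set only in lanes where `x` is zero.
    !(((x & LO7) + LO7) | x | LO7)
}

/// Hypothetical sketch: returns the index of the first non-whitespace
/// byte at or after `pos`, or `input.len()` if only whitespace remains.
fn skip_whitespace_swar(input: &[u8], mut pos: usize) -> usize {
    // Process 8 bytes per iteration while a full word is available.
    while pos + 8 <= input.len() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&input[pos..pos + 8]);
        // Little-endian load, so byte 0 of the slice is the lowest lane.
        let chunk = u64::from_le_bytes(buf);
        let ws = eq_mask(chunk, b' ')
            | eq_mask(chunk, b'\n')
            | eq_mask(chunk, b'\t')
            | eq_mask(chunk, b'\r');
        let non_ws = !ws & HI;
        if non_ws != 0 {
            // Lowest set high bit marks the first non-whitespace lane.
            return pos + (non_ws.trailing_zeros() / 8) as usize;
        }
        pos += 8;
    }
    // Scalar tail for the last few bytes.
    while pos < input.len() && matches!(input[pos], b' ' | b'\n' | b'\t' | b'\r') {
        pos += 1;
    }
    pos
}
```

With this shape, an input whose next byte is already non-whitespace still pays for the word load and four mask computations before returning, which is consistent with the slowdown on canada above.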

@cyruspyre
Author

cyruspyre commented Mar 3, 2026

I was able to somewhat reduce the slowdown when there is no whitespace by replacing the code in the end_seq function with a loop equivalent and using a simple loop to parse whitespace instead.

There was SWAR code duplication and a call to panic_in_cleanup in the asm of end_seq. I don't know why that function was being called at all. My best guess is that it has something to do with the existence of ErrorCode::Message(Box<str>). Nonetheless, the change shifted perf again, improving some benchmarks and degrading others.

                      old             new        change
-- with -C target-cpu=native --------------------------
twitter/struct    753.11 MiB/s    825.14 MiB/s   +9.56%
twitter/dom       245.85 MiB/s    247.15 MiB/s   +0.53%
citm/struct       787.24 MiB/s    960.71 MiB/s  +22.04%
citm/dom          384.28 MiB/s    416.97 MiB/s   +8.51%
canada/struct     588.49 MiB/s    566.53 MiB/s   -3.73%
canada/dom        148.92 MiB/s    139.45 MiB/s   -6.36%
-- without --------------------------------------------
twitter/struct    749.26 MiB/s    823.83 MiB/s   +9.95%
twitter/dom       248.85 MiB/s    251.76 MiB/s   +1.17%
citm/struct       844.54 MiB/s    998.68 MiB/s  +18.25%
citm/dom          394.14 MiB/s    434.81 MiB/s  +10.32%
canada/struct     581.38 MiB/s    576.26 MiB/s   -0.88%
canada/dom        149.54 MiB/s    140.62 MiB/s   -5.97%
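As a rough illustration of what "a simple loop to parse whitespace" can look like, here is a byte-at-a-time sketch. It is not the code from this PR; `skip_whitespace_scalar` is a hypothetical name, and the whitespace set is assumed to be the four JSON whitespace bytes.

```rust
/// Hypothetical sketch: byte-at-a-time whitespace skipping.
/// Returns the index of the first non-whitespace byte at or after `pos`,
/// or `input.len()` if only whitespace remains.
fn skip_whitespace_scalar(input: &[u8], mut pos: usize) -> usize {
    while let Some(&b) = input.get(pos) {
        // Stop at the first byte that is not JSON whitespace.
        if !matches!(b, b' ' | b'\n' | b'\t' | b'\r') {
            break;
        }
        pos += 1;
    }
    pos
}
```

When the next byte is already non-whitespace, this exits after a single comparison, which is the case where a word-at-a-time loop has the most overhead to amortize.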
