Conversation
| .Lloopsve_vl: | ||
| whilelo p0.b, x_pos, x_len | ||
| b.none .return_pass | ||
| b.eq .return_pass |
According to https://llvm.org/doxygen/AArch64AsmParser_8cpp_source.html , b.none is the same as b.eq when +sve is specified.
|
@liuqinfei could you look into this issue? Thanks again! ;) |
In fact, I don't have an Apple computer that supports SVE on hand, so I can't verify this patch. Maybe you can provide your verification results on machines with and without SVE. @cielavenir |
|
I don't have one either; I just checked compilation. Thus we need to call for tester(s) with an M4 Mac; otherwise we need to wait for the next GitHub RUNNER (not image) update. |
|
We are looking into releasing 2.31.1 as soon as next week, with just bug fixes. If we have someone that can test this, then we can include it in the next release. |
|
Let's hold this PR for next release, once more testing is done |
|
I've got an M4 MacBook Air -- doesn't seem to work for me: Oddly enough, running via On master, |
|
@tipabu thank you for testing. Maybe the current code being accepted with +sme is an assembler bug... |
|
@tipabu, what about "master" branch? |
|
@pablodelara, on master (91da2ad |
|
Thanks @tipabu. So it looks like this PR is not needed... |
|
@tipabu actually according to https://qiita.com/zacky1972/items/b7b5dd456fe021b30eb2, I need to wrap the function with (compilation is tested in https://github.com/cielavenir/isa-l/actions/runs/14731321078) |
|
@cielavenir Tests now pass! And looking at someone's investigation, we shouldn't need to worry about losing sve checks for Macs; no Apple silicon supports it.
@pablodelara Only insofar as Macs were always getting the neon implementation. |
|
great thank you~ |
|
@cielavenir can you clean up the commits (so there is no "Merge branch 'master'"...)? Good opportunity to rebase against latest 'master' branch |
|
@pablodelara rebased. |
|
So one concern: This seems to be slightly slower than master. On this branch: Whereas on master: Multiple runs had similar results (±100MB/s on encode, ±200MB/s on decode, give or take). |
|
@tipabu on the https://github.com/cielavenir/isa-l/tree/featSME_CI branch, I changed the code to call smstart/smstop only in the dispatched function. Calling smstart in each subroutine called by ec_encode_data_sve could cause overhead. If this is faster, I will rebase the featSME branch again. |
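To make the overhead concern concrete, here is a minimal C model of the two placements, not code from isa-l. The names `streaming_enter`, `streaming_exit`, `mode_switches`, `encode_per_subroutine` and `encode_per_dispatch` are hypothetical, and a plain counter stands in for the real smstart/smstop instructions so the sketch compiles on any machine. The XOR loop is only a stand-in for the real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Counts simulated entries into streaming mode (stand-in for smstart). */
unsigned long mode_switches = 0;

static void streaming_enter(void) { mode_switches++; } /* models smstart */
static void streaming_exit(void) { }                   /* models smstop  */

/* Placeholder kernel: XOR one source block into the destination. */
static void xor_block(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Variant A: every subroutine call toggles the mode itself. */
void encode_per_subroutine(unsigned char *dst, const unsigned char *const *srcs,
                           size_t nsrc, size_t len)
{
    assert(dst != NULL && srcs != NULL);
    for (size_t s = 0; s < nsrc; s++) {
        streaming_enter();
        xor_block(dst, srcs[s], len);
        streaming_exit();
    }
}

/* Variant B: the dispatched function toggles the mode once around all calls. */
void encode_per_dispatch(unsigned char *dst, const unsigned char *const *srcs,
                         size_t nsrc, size_t len)
{
    assert(dst != NULL && srcs != NULL);
    streaming_enter();
    for (size_t s = 0; s < nsrc; s++)
        xor_block(dst, srcs[s], len);
    streaming_exit();
}
```

In this model, variant A performs one mode switch per source block while variant B performs one per call, with identical output. Whether that difference is measurable on real hardware is exactly what the benchmarks in this thread are trying to establish.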
|
@cielavenir I see roughly the same on cielavenir@aad8c5c: |
|
Now I'm not sure whether this is smstart overhead or the SME implementation is simply not faster than the NEON implementation. |
|
@liuqinfei what do you think of this PR? |
I recommend re-evaluating the ratio configurations 10+1, 4+2, and 8+3. If benchmarking confirms no performance gains in the SME branch, I propose deferring the merge of this patch pending further validation. |
@tipabu @cielavenir, could you check this? If no gain, we should close this PR then. |
|
@tipabu I pushed a branch named featSME2_CI. https://github.com/cielavenir/isa-l/actions/runs/21317365685 Could you test it with ec_encode_data dispatching sme2/sme/none? [edit] If sme2/sme performance is the same, #367 might have a build target issue, so we need to wait for them |
|
Locally, compilation fails for cielavenir@5189a30, ending with: Curious when it seems to work for GHA... maybe it comes down to CI using |
|
|
Thanks @cielavenir, compilation works now.
I'm not sure how to control that, could you give some more guidance? At any rate, I can compare whatever the default behavior is:
(Again, done as an average of ten runs.) This looks more promising. 👍 |
|
thank you for checking @tipabu @pablodelara now #367 is required for this pull request |
|
@tipabu just in case could you test featSME2_CI26_noSME2 ? https://github.com/cielavenir/isa-l/actions/runs/21421947349 |
|
Threw in
Looks pretty comparable to |
|
I now know that SME has the eor3 instruction, so the SME2 and SME versions should not differ much, but I am keeping both for now. I also cleaned up the featSME2_CI branch, which now uses the macos-15 runner. https://github.com/cielavenir/isa-l/actions/runs/21468869590 Cleanup from my side is now done, and #367 is a real blocker |
|
@liuqinfei I merged featSME2 into featSME. Only drawback is that the commits are tangled now. If you have concern there, as I |
|
Definitely, this needs rebase so only original commits are part of the PR. |
|
But rebasing requires merging #367 separately |
OK, let's wait for @liuqinfei to confirm #367 is OK to merge, and once it is merged, you can rebase this.
|
You are OK to rebase against master now |
|
@pablodelara done rebasing. ( please tell me if I should make another pull request for 258fbed ) |
pablodelara left a comment
Can you sign off the second commit? Thanks!
| #elif defined(__APPLE__) | ||
| if (sysctlEnabled(SYSCTL_SVE_KEY)) | ||
| return gf_vect_dot_prod_sve; | ||
| // Due to smstart, should not dispatch SME |
These lines can be eliminated completely (same below)
@pablodelara but if someone asks about SME here, please explain to them that the __arm_streaming ABI is incompatible with the dispatcher, because the git history about this part is lost. I cannot support this.
If it has nothing to do with SME, then open a new PR. |
|
@cielavenir could you look at the final comments? I want all optimizations/new features to be merged by the end of this week as we are releasing the library by the end of the month. |
62f6b29 to 75367b8
|
I rebased again, but sorry, this will be the last time from my side. I'm exhausted with repeated rebasing, sorry. |
|
I hope a better solution, e.g. squash-merge, can be configured so that manual rebasing is not required. |
|
@liuqinfei could you review this PR? thanks |
Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
|
Last call for this PR to be integrated in v2.32, thanks. @liuqinfei can you review it please? Thanks! |
|
@liuqinfei yes please. the rebasing state is the cleanest ever. |
Today I went to biccamera ( 😂 ) and checked hw.optional. Then I found FEAT_SME but not FEAT_SVE. 1
This means that for Apple, the +sve code has to be compiled with +sme instead.
This is potentially quite a breaking change, so I'd like this to be tested by those who have an M4 Mac.
Call for tester(s): if you have an M4 Mac, please try running the tests on your machine~~
Footnotes
Why does Apple call something without FEAT_SVE armv9?
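For testers who want to repeat the hw.optional check programmatically, the following is a minimal sketch, assuming macOS exposes the features as integer sysctl keys named `hw.optional.arm.FEAT_SME` and `hw.optional.arm.FEAT_SVE`. The helper names `cpu_feature_enabled` and `print_arm_features` are hypothetical and are not the `sysctlEnabled` function from this PR. On non-Apple platforms the helper simply reports 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#if defined(__APPLE__)
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if the named sysctl key exists and is
 * nonzero, 0 otherwise (including when the key is unknown to the OS). */
int cpu_feature_enabled(const char *key)
{
    assert(key != NULL);
#if defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, NULL, 0) != 0)
        return 0; /* key not present on this machine or OS version */
    return value != 0;
#else
    (void)key; /* no sysctlbyname-based check outside macOS in this sketch */
    return 0;
#endif
}

/* Prints the two features discussed in this thread. Key names are assumptions. */
void print_arm_features(void)
{
    printf("FEAT_SME: %d\n", cpu_feature_enabled("hw.optional.arm.FEAT_SME"));
    printf("FEAT_SVE: %d\n", cpu_feature_enabled("hw.optional.arm.FEAT_SVE"));
}
```

Calling `print_arm_features()` from a small `main` on the machine under test gives the same information as inspecting `sysctl hw.optional` by hand. A missing key is treated the same as a disabled feature, which matches how a dispatcher would want to fall back.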