This is a port of the existing scalar Flash Attention GLSL shader to Slang. I wanted to try porting it to see what the state of Slang is, how easy it is to use, and what it makes easier compared to GLSL. The purpose of this PR is just to serve as a point of discussion, for now. @jeffbolznv FYI
I started by copying the GLSL shader and changing just the bare minimum, then went deeper to transform some of the structures using Slang features. Overall I think it has a lot of potential. Getting rid of the crazy preprocessor structures we need for GLSL would be nice.
What I like
The dequantization code is much cleaner with Slang generics and interfaces. You can just define an interface and plug in various dequantization/transformation algorithms, the code looks much cleaner in the end, IMO.
Templating/Generics also allows putting common patterns like reductions into functions that can be reused.
Typealiasing and vector<> also simplify data type choice, so I don't need to have a define per vector type I want to use, just one for the data type.
The module system seems nice to abstract out common code into functions that can be reused across many shaders.
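As a rough illustration of the interface-based dequantization pattern described above, a sketch might look like the following. The names `IDequantizer`, `DequantF32`, `DequantQ8` and `loadValue` are hypothetical and not taken from this PR, and the 8-bit layout is a simplified stand-in, not an actual ggml block format.

```slang
// Hypothetical interface: each format converts one packed 32-bit word
// (plus a per-block scale) into a float. Not the interface used in the PR.
interface IDequantizer
{
    static float dequantize(uint packed, uint idx, float scale);
}

// Pass-through format: the word already holds the bits of a float.
struct DequantF32 : IDequantizer
{
    static float dequantize(uint packed, uint idx, float scale)
    {
        return asfloat(packed);
    }
}

// Simplified 8-bit format (assumption): four signed bytes per word.
struct DequantQ8 : IDequantizer
{
    static float dequantize(uint packed, uint idx, float scale)
    {
        // Move the selected byte to the top, then shift back down
        // arithmetically to sign-extend it.
        int q = int(packed << (24 - 8 * (idx & 3))) >> 24;
        return float(q) * scale;
    }
}

// Generic loader: the format is chosen at compile time through D,
// so no preprocessor switch per data type is needed.
float loadValue<D : IDequantizer>(uint packed, uint idx, float scale)
{
    return D.dequantize(packed, idx, scale);
}
```

A caller would then select the format with something like `loadValue<DequantQ8>(word, i, d)`, or hide the choice behind a single `typealias`.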
What I don't like
The HLSL-based subgroup intrinsic naming seems clunky compared to GLSL. Like WaveReadLaneAt() instead of subgroupShuffle(). subgroupShuffleXor() is even missing completely, requiring WaveReadLaneAt(value, WaveGetLaneIndex() ^ s) as a workaround.
The Generics/Templating system still has flaws that prevent using e.g. the same reduce function for scalars and for vectors. I have to duplicate the code and provide a different function for vectors. A builtin vector type interface seems to be missing.
Shared Memory can't be passed as reference into a module function. That seems like a huge oversight to me. For a reduction I may need shared memory. To keep shared memory amount optimal I have to define it in the main file, but I can't pass it into a module function without strange interface workarounds like this:
public interface ISharedMemory<T> {
    static T get(uint idx);
    static void set(uint idx, T value);
}
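For the missing `subgroupShuffleXor()` mentioned above, the workaround can be wrapped in a small helper so the call sites read more like the GLSL original. This is only a sketch: `shuffleXor` and `reduceMax` are hypothetical names, and `width` is assumed to be a power of two no larger than the subgroup size.

```slang
// Hypothetical helper emulating GLSL's subgroupShuffleXor() for floats,
// built from the HLSL-style wave intrinsics that Slang exposes.
float shuffleXor(float value, uint mask)
{
    return WaveReadLaneAt(value, WaveGetLaneIndex() ^ mask);
}

// Butterfly max-reduction across `width` lanes.
// After the loop, every participating lane holds the maximum.
float reduceMax(float value, uint width)
{
    for (uint s = width / 2; s > 0; s >>= 1)
    {
        value = max(value, shuffleXor(value, s));
    }
    return value;
}
```

Because this version is written for `float` only, a vector variant would need its own copy, which is the duplication problem described in the list above.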
I read through the code. It definitely has a more "structured" feel, but it would take some getting used to. Will be interesting to see if it has any performance deficit compared to GLSL.
The ISharedMemory workaround is very verbose and just makes the code harder to read.
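To make the verbosity concrete, here is a sketch of what using such an interface might involve. The interface is repeated so the snippet stands alone. `WG_SIZE`, `tmpsh`, `TmpShared` and `reduceAddShared` are hypothetical names, and the workgroup size of 128 is an assumption.

```slang
interface ISharedMemory<T>
{
    static T get(uint idx);
    static void set(uint idx, T value);
}

// Main shader file: the shared array is declared here so its size
// can be kept minimal for this particular shader.
static const uint WG_SIZE = 128; // assumed workgroup size, power of two
groupshared float tmpsh[WG_SIZE];

// Adapter that exposes the array through the interface.
struct TmpShared : ISharedMemory<float>
{
    static float get(uint idx) { return tmpsh[idx]; }
    static void set(uint idx, float value) { tmpsh[idx] = value; }
}

// Would live in a module: tree reduction over `count` threads
// (count assumed to be a power of two and at most WG_SIZE).
float reduceAddShared<S : ISharedMemory<float>>(float value, uint tid, uint count)
{
    S.set(tid, value);
    GroupMemoryBarrierWithGroupSync();
    for (uint s = count / 2; s > 0; s >>= 1)
    {
        if (tid < s)
        {
            S.set(tid, S.get(tid) + S.get(tid + s));
        }
        GroupMemoryBarrierWithGroupSync();
    }
    return S.get(0);
}
```

The call site becomes `reduceAddShared<TmpShared>(partial, tid, WG_SIZE)`, so every shared array needs its own adapter struct where GLSL would simply index the array directly.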
There's probably some things that can still be "slang-ified", but I don't have time right now.
I plan to do some performance checks, but I'm having some trouble with the slang compiler currently.
I'll leave this as is for now, discuss what I found with the Slang developers and hopefully pick it up again in the future.