

@Ziminli commented Feb 11, 2026

TL;DR: This PR extends support for the GEMM (`gemm`) operator to MetaX and adds a generic dispatcher.

Key Changes

  • Multi-platform Support:

    • Add the mcblas implementation of `gemm` for MetaX, along with an example program.
    • Add a naive CPU implementation of `gemm`, which has been tested on MetaX.
  • BLAS Abstraction: Extract the common framework for calling BLAS libraries across platforms into `blas.h` (currently covering CUDA-style libraries).

  • Generic Dispatcher: Add a generic dispatcher in `src/dispatcher.h` that provides both the core dispatch mechanism and several specialized interfaces built on it.

  • Codebase Refactoring:

    • Refactor `DataType` from a class into an `enum class` and update the relevant call sites.
    • Create a `common/` directory for shared constructs that are used internally and not exposed publicly; device-specific subdirectories are planned for later.
  • Add several compile-time constructs, currently used mainly by the dispatcher and its callers.
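The naive CPU `gemm` mentioned above is the kind of reference implementation usually used to check a vendor BLAS path. The following is a minimal sketch, not the PR's code: it assumes dense row-major matrices, and the name and signature of `naive_gemm` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference GEMM on the CPU: C = alpha * A * B + beta * C.
// Assumes dense row-major storage with no padding:
//   A is M x K, B is K x N, C is M x N.
template <typename T>
void naive_gemm(std::size_t M, std::size_t N, std::size_t K, T alpha,
                const std::vector<T>& A, const std::vector<T>& B, T beta,
                std::vector<T>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            T acc{};
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Following the usual BLAS convention, C is not read when
            // beta == 0, so uninitialized or NaN entries are overwritten.
            T& c = C[i * N + j];
            c = (beta == T{}) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

The triple loop is deliberately unoptimized: its value is as a simple, obviously correct baseline to compare the mcblas results against.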

Known Issues & Future Work:

  • C++ Standard Compatibility: The dispatcher still relies on one C++20 feature: explicit template parameter lists for lambdas. For compatibility, a pure C++17 substitute is planned.

  • Type Mapping: float16 and bfloat16 have NOT been mapped to primitive types yet.
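One common C++17 replacement for a lambda with an explicit template parameter list is tag dispatch: the dispatcher passes an empty `TypeTag<T>` object, and a generic lambda recovers `T` from the argument's type. The sketch below illustrates that technique only; `DataType`, `TypeTag`, `dispatch`, and `element_size` are hypothetical names and do not reflect the actual contents of `src/dispatcher.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical subset of the DataType enum (float16/bfloat16 omitted).
enum class DataType { kInt32, kFloat32, kFloat64 };

// Empty tag that carries a C++ type as an ordinary function argument.
template <typename T>
struct TypeTag {
    using type = T;
};

// C++17 dispatcher: maps a runtime DataType to a compile-time type and
// invokes `f` with the matching TypeTag. With C++20, the callee could be
// written as []<typename T>() { ... } and invoked as
// f.template operator()<T>(); here a generic lambda taking `auto tag`
// recovers T as typename decltype(tag)::type instead.
template <typename F>
constexpr decltype(auto) dispatch(DataType dtype, F&& f) {
    switch (dtype) {
        case DataType::kInt32:
            return f(TypeTag<std::int32_t>{});
        case DataType::kFloat32:
            return f(TypeTag<float>{});
        case DataType::kFloat64:
            return f(TypeTag<double>{});
    }
    throw std::invalid_argument("dispatch: unsupported DataType");
}

// Example caller: element size in bytes, resolved through the dispatcher.
constexpr std::size_t element_size(DataType dtype) {
    return dispatch(dtype, [](auto tag) {
        using T = typename decltype(tag)::type;
        return sizeof(T);
    });
}
```

Because the lambda's `auto` parameter is deduced per call, each `case` instantiates the lambda body with a different `T`, which is the same effect the C++20 `[]<typename T>()` form achieves.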

- Add additional entries to the Device enum class to support new hardware targets.
- Adapt the GEMM mcblas implementation to the MetaX backend and add the test example.
- Extract common BLAS interfaces into a new blas.h abstraction for GEMM implementations to share.
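CUDA-style BLAS libraries such as cuBLAS expect column-major matrices. If mcblas follows the same convention (an assumption, not stated in this PR), a shared `blas.h` layer can offer a row-major GEMM by computing C^T = B^T A^T, which only swaps the operands and the M/N dimensions. In this minimal sketch `colmajor_sgemm` is a plain C++ stand-in for the vendor call; both function names are hypothetical.

```cpp
#include <cassert>

// Stand-in for a column-major BLAS sgemm (no transposes):
// C = alpha * A * B + beta * C, where A is m x k (leading dim lda),
// B is k x n (ldb), and C is m x n (ldc), all column-major.
void colmajor_sgemm(int m, int n, int k, float alpha, const float* A, int lda,
                    const float* B, int ldb, float beta, float* C, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// Row-major entry point a blas.h-style layer could expose. A row-major
// M x N matrix occupies the same memory as its column-major N x M
// transpose, so row-major C = A * B is computed as column-major
// C^T = B^T * A^T: swap A and B, and swap M and N.
void rowmajor_sgemm(int M, int N, int K, float alpha, const float* A,
                    const float* B, float beta, float* C) {
    colmajor_sgemm(N, M, K, alpha, B, N, A, K, beta, C, N);
}
```

The appeal of this design is that no data is copied or transposed; only the arguments passed to the library change, so the same wrapper could in principle sit in front of any column-major backend.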
…GEMM implementation

- Add `ConstexprMap` and compile-time traits in `common/` for efficient
  type-to-metadata mapping and related operations.
- Implement a generic dispatcher to reduce boilerplate for dispatching, especially for data types and devices.
- Add the CPU implementation of GEMM
- Update `DataType` definitions and type lists to support wide
  dispatching.
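A `ConstexprMap` for type-to-metadata lookup can be as simple as a fixed-size array of key/value pairs searched in a `constexpr` function. The following C++17 sketch is an illustration under that assumption; its layout, the `at` method, and the `kElementSize` table are hypothetical and may differ from what the PR adds to `common/`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

// Hypothetical subset of DataType; float16/bfloat16 are omitted because
// they are not yet mapped to primitive types.
enum class DataType { kInt8, kInt32, kFloat32, kFloat64 };

// Minimal compile-time map: a fixed-size array of key/value pairs searched
// linearly in a constexpr function. Looking up a missing key throws, which
// becomes a compile error when `at` is evaluated in a constant expression.
template <typename Key, typename Value, std::size_t N>
struct ConstexprMap {
    std::array<std::pair<Key, Value>, N> entries;

    constexpr Value at(const Key& key) const {
        for (const auto& entry : entries) {
            if (entry.first == key) {
                return entry.second;
            }
        }
        throw std::out_of_range("ConstexprMap::at: key not found");
    }
};

// Example metadata table: element size in bytes for each DataType.
inline constexpr ConstexprMap<DataType, std::size_t, 4> kElementSize{{{
    {DataType::kInt8, sizeof(std::int8_t)},
    {DataType::kInt32, sizeof(std::int32_t)},
    {DataType::kFloat32, sizeof(float)},
    {DataType::kFloat64, sizeof(double)},
}}};
```

For a handful of keys, a linear search is adequate: when the lookup is evaluated at compile time it has no runtime cost, and a missing key surfaces as a compile error rather than a runtime failure.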

Follow-up: support for fp16 and bf16 kernels is pending.
@Ziminli self-assigned this Feb 11, 2026