LessUp/cuda-kernel-academy

CUDA Kernel Academy

A structured CUDA programming course, from SGEMM optimization to production inference engines. Four progressive sub-projects take GPU kernel development from the basics to advanced topics.

Sub-Projects

| # | Project | Focus | Tech |
|---|---------|-------|------|
| 01 | SGEMM Tutorial | Matrix multiplication optimization | CUDA C++, Makefile |
| 02 | TensorCraft Core | Header-only kernel library | C++17/20, CMake |
| 03 | HPC Advanced | Advanced HPC techniques | CUDA, CMake, Benchmark |
| 04 | Inference Engine | Deep learning inference engine | CUDA, CMake, pybind11 |

Learning Path

01-SGEMM Tutorial (Basics)
    ↓
02-TensorCraft Core (Library Design)
    ↓
03-HPC Advanced (Optimization)
    ↓
04-Inference Engine (Application)

Quick Start

```bash
git clone https://github.com/LessUp/cuda-kernel-academy.git
cd cuda-kernel-academy

# Build all sub-projects
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Run tests
cd build && ctest --output-on-failure
```

Build Options

| Option | Default | Description |
|--------|---------|-------------|
| `BUILD_TENSORCRAFT` | `ON` | Build TensorCraft Core |
| `BUILD_HPC_ADVANCED` | `ON` | Build HPC Advanced |
| `BUILD_INFERENCE_ENGINE` | `ON` | Build Inference Engine |

Key Topics

  • GEMM Optimization: Naive → Tiled → Register Blocked → Tensor Core
  • Memory Hierarchy: Global → Shared → Register, bank conflict avoidance
  • Parallel Patterns: Reduction, scan, histogram, sort
  • Kernel Fusion: Bias+Activation, LayerNorm+Residual
  • Mixed Precision: FP16/BF16 Tensor Core, INT8 quantization
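The GEMM-optimization and kernel-fusion topics above can be illustrated without a GPU. The following is a minimal host-side C++ sketch, not code taken from this repository: it emulates a tiled SGEMM (small local arrays stand in for shared-memory tiles, a per-tile accumulator array stands in for per-thread registers) and fuses a Bias+Activation epilogue into the final store. ReLU is assumed as the activation, and the tile size `TILE = 16` and the function names `gemm_bias_relu_naive`, `gemm_bias_relu_tiled` and `all_close` are hypothetical. In a real CUDA kernel the `ti`/`tj` loops would be spread across the threads of a thread block.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tile edge. In a CUDA kernel one TILE x TILE block of C would
// typically map to one thread block, with the A and B tiles held in shared memory.
constexpr std::size_t TILE = 16;

// Unfused reference: C = max(A * B + bias, 0), computed as a naive GEMM
// followed by a separate bias + ReLU pass. Row-major A is M x K, B is K x N.
std::vector<float> gemm_bias_relu_naive(const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        const std::vector<float>& bias,
                                        std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    // Second pass over C: the extra memory round trip that fusion removes.
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            C[i * N + j] = std::max(C[i * N + j] + bias[j], 0.0f);
    return C;
}

// Tiled GEMM with the bias + ReLU epilogue fused into the final store.
std::vector<float> gemm_bias_relu_tiled(const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        const std::vector<float>& bias,
                                        std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && bias.size() == N);
    std::vector<float> C(M * N, 0.0f);
    float As[TILE][TILE];  // stands in for a shared-memory tile of A
    float Bs[TILE][TILE];  // stands in for a shared-memory tile of B
    for (std::size_t i0 = 0; i0 < M; i0 += TILE)
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
            float acc[TILE][TILE] = {};  // per-thread accumulators (registers on a GPU)
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // Stage one tile of A and one of B, zero-padding past the matrix edges.
                for (std::size_t t = 0; t < TILE; ++t)
                    for (std::size_t u = 0; u < TILE; ++u) {
                        As[t][u] = (i0 + t < M && k0 + u < K) ? A[(i0 + t) * K + k0 + u] : 0.0f;
                        Bs[t][u] = (k0 + t < K && j0 + u < N) ? B[(k0 + t) * N + j0 + u] : 0.0f;
                    }
                // Each staged element is reused TILE times from the fast tile copy.
                for (std::size_t ti = 0; ti < TILE; ++ti)
                    for (std::size_t tj = 0; tj < TILE; ++tj)
                        for (std::size_t tk = 0; tk < TILE; ++tk)
                            acc[ti][tj] += As[ti][tk] * Bs[tk][tj];
            }
            // Fused epilogue: bias + ReLU applied while writing C, no second pass.
            for (std::size_t ti = 0; ti < TILE; ++ti)
                for (std::size_t tj = 0; tj < TILE; ++tj)
                    if (i0 + ti < M && j0 + tj < N)
                        C[(i0 + ti) * N + j0 + tj] = std::max(acc[ti][tj] + bias[j0 + tj], 0.0f);
        }
    return C;
}

// Element-wise comparison with an absolute tolerance.
bool all_close(const std::vector<float>& x, const std::vector<float>& y, float tol = 1e-4f) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::fabs(x[i] - y[i]) > tol) return false;
    return true;
}
```

Comparing the tiled, fused version against the unfused naive version on sizes that are not multiples of the tile edge (so the zero-padding path is exercised) is a common way to check an optimized kernel before benchmarking it.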

Requirements

  • CUDA Toolkit 12.x or newer
  • CMake 3.20+
  • C++17/20 compiler
  • GPU: Volta (SM 7.0) or newer; BF16 Tensor Core instructions require Ampere (SM 8.0) or newer

License

MIT License

About

CUDA Kernel Optimization Academy: SGEMM Tutorial, TensorCraft Ops (operator library), HPC Advanced & Inference Engine, from the basics to Tensor Cores
