English | 简体中文
A structured CUDA programming course: from SGEMM optimization to a production inference engine. Four progressive sub-projects take GPU kernel development from fundamentals to advanced techniques.
| # | Project | Focus | Tech |
|---|---|---|---|
| 01 | SGEMM Tutorial | Matrix multiplication optimization | CUDA C++, Makefile |
| 02 | TensorCraft Core | Header-only kernel library | C++17/20, CMake |
| 03 | HPC Advanced | Advanced HPC techniques | CUDA, CMake, Benchmark |
| 04 | Inference Engine | DL inference engine | CUDA, CMake, pybind11 |
```
01-SGEMM Tutorial (Basics)
        ↓
02-TensorCraft Core (Library Design)
        ↓
03-HPC Advanced (Optimization)
        ↓
04-Inference Engine (Application)
```
```bash
git clone https://github.com/LessUp/cuda-kernel-academy.git
cd cuda-kernel-academy

# Build all sub-projects
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Run tests
cd build && ctest --output-on-failure
```

| Option | Default | Description |
|---|---|---|
| BUILD_TENSORCRAFT | ON | Build TensorCraft Core |
| BUILD_HPC_ADVANCED | ON | Build HPC Advanced |
| BUILD_INFERENCE_ENGINE | ON | Build Inference Engine |
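These options can be combined on the configure line to build only a subset of the sub-projects. For example, to build just the SGEMM tutorial and TensorCraft Core (a sketch assuming the option names above; the exact defaults are defined in the top-level CMakeLists.txt):

```bash
# Configure with HPC Advanced and the Inference Engine disabled
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_HPC_ADVANCED=OFF \
      -DBUILD_INFERENCE_ENGINE=OFF
cmake --build build -j$(nproc)
```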
- GEMM Optimization: Naive → Tiled → Register Blocked → Tensor Core
- Memory Hierarchy: Global → Shared → Register, bank conflict avoidance
- Parallel Patterns: Reduction, scan, histogram, sort
- Kernel Fusion: Bias+Activation, LayerNorm+Residual
- Mixed Precision: FP16/BF16 Tensor Core, INT8 quantization
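The GEMM progression above starts from a naive kernel and moves to shared-memory tiling. As a minimal sketch of the first two stages (kernel names and the tile size are illustrative, not taken from this repo), the key difference is that the tiled version stages sub-matrices of A and B in shared memory so each global-memory element is reused TILE times:

```cuda
#include <cuda_runtime.h>

// Naive SGEMM: one thread per output element, every operand read
// comes straight from global memory.
__global__ void sgemm_naive(int M, int N, int K,
                            const float* A, const float* B, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Tiled SGEMM: each block cooperatively loads TILE x TILE sub-matrices
// of A and B into shared memory, cutting global-memory traffic.
#define TILE 32
__global__ void sgemm_tiled(int M, int N, int K,
                            const float* A, const float* B, float* C) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        // Cooperative load, zero-padding out-of-range elements.
        As[threadIdx.y][threadIdx.x] = (row < M && t * TILE + threadIdx.x < K)
            ? A[row * K + t * TILE + threadIdx.x] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (t * TILE + threadIdx.y < K && col < N)
            ? B[(t * TILE + threadIdx.y) * N + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

The later stages (register blocking, Tensor Cores) build on the same idea, adding a register-resident accumulator tile per thread and `mma` instructions respectively.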
- CUDA Toolkit 12.x+
- CMake 3.20+
- C++17/20 compiler
- GPU: Volta (SM 7.0) or newer
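A quick way to confirm the GPU requirement is to query the device's compute capability with the CUDA runtime API (a standalone sketch; Volta corresponds to SM major version 7):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No CUDA device found\n");
        return 1;
    }
    // Volta is SM 7.0, so any major version >= 7 is supported.
    std::printf("%s: SM %d.%d %s\n", prop.name, prop.major, prop.minor,
                prop.major >= 7 ? "(supported)" : "(older than Volta)");
    return 0;
}
```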
MIT License