Mini-Inference Engine is a CUDA GEMM optimization learning project that packages progressive matrix multiplication kernels, a lightweight inference runtime, and profiling-oriented experimentation into one repository.
- CUDA kernels and runtime headers in `src/` and `include/`
- Technical docs under `docs/`
- Benchmarks and demos in `benchmarks/`
- GitHub Pages site for documentation entry, reading paths, and project updates
```shell
cmake --preset release
cmake --build --preset release
./build/release/benchmark
./build/release/tests
```

- Project docs: https://lessup.github.io/mini-inference-engine/
- The site home explains what to read first for architecture, optimization, and API details
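The `--preset` flags above rely on a `CMakePresets.json` at the repository root. A hypothetical minimal fragment that would make those commands work is shown below; the repository's actual preset file (generator, CUDA architectures, extra cache variables) may differ:

```json
{
  "version": 6,
  "configurePresets": [
    {
      "name": "release",
      "binaryDir": "${sourceDir}/build/release",
      "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" }
    }
  ],
  "buildPresets": [
    { "name": "release", "configurePreset": "release" }
  ]
}
```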
- See `CONTRIBUTING.md` for the contribution workflow
MIT License