Conversation
📝 Walkthrough

This PR strengthens GPU development infrastructure by adding robust retry logic to CI workflows, introducing NVCC build configuration, adding GPU warp size utilities, enabling Python JIT bindings for GPU operations, and updating CUDA compute capabilities.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: 2 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @.sublime-project:
- Around line 16-18: The JSON for the "python-black" object contains a trailing
comma after the "format_on_save" value which makes the file invalid; remove the
trailing comma after "format_on_save": "on" (i.e., ensure the "python-black"
object ends with a closing brace and no trailing comma) so the file becomes
strict JSON-parsable while keeping the "python-black" and "format_on_save" keys
intact.
In `@inc/mkn/gpu/cuda.hpp`:
- Around line 88-90: The call to cudaGetDeviceProperties in the function (using
local cudaDeviceProp devProp and returning devProp.warpSize) ignores its return
value, risking uninitialized warpSize on failure; check the returned cudaError_t
and handle errors by invoking MKN_GPU_ASSERT (or similar error macro) or
returning a safe default (e.g., 32) when cudaGetDeviceProperties fails. Update
the code around cudaGetDeviceProperties to capture the return value, assert on
non-success via MKN_GPU_ASSERT(cudaGetDeviceProperties(...) == cudaSuccess,
"msg") or branch to return a sensible default, ensuring devProp.warpSize is only
used when the call succeeded.
In `@test/any/async_streaming.cpp`:
- Around lines 8-9: The test uses std::this_thread::sleep_for (at lines 38, 70, 103, 138, and 173) but doesn't include <thread>, relying on transitive includes; add an explicit `#include <thread>` at the top of async_streaming.cpp so std::this_thread::sleep_for is available portably and no longer depends on indirect headers.
In `@test/poc/jit/eg.py`:
- Around line 17-19: The vadd function is currently decorated as a kernel
(`@cuda.jit`) but is called from inside vector_add_gpu, so change its decorator to
a device function by using `@cuda.jit`(device=True) (i.e., make vadd a device
function) so it can be invoked from the kernel; keep the function name vadd and
its parameter list unchanged and ensure vector_add_gpu continues to call vadd
from within the kernel.
In `@test/poc/jit/pybind.cpp`:
- Around line 57-66: The PYBIND11_MODULE binding for class FunctionSupport
contains a duplicate def_readwrite("a", &FunctionSupport::a) where the third
member should be exposed; update the duplicate binding to def_readwrite("c",
&FunctionSupport::c) so the class exposes members a, b, and c correctly and keep
the other method bindings (print, A, B, C) unchanged.
- Around line 44-46: The wrapper creates a py::array_t<float> in make(auto p)
with a no-op capsule, risking use-after-free if the returned array (backed by
CUDA-managed memory) outlives its parent; change the capsule to hold a reference to the
parent Python object (e.g., the FunctionSupport py::object) or otherwise set the
array base to that parent so NumPy keeps the parent alive. Locate make(auto p)
and modify the capsule construction to capture and incref (hold) the parent
py::object (or pass the parent as the array base) instead of a no-op destructor,
ensuring the parent’s lifetime is extended until the array is freed. Ensure the
symbol FunctionSupport (or the accessor that calls make) provides the py::object
parent to make so the returned py::array_t<float> keeps the parent alive.
- Line 58: The binding uses py::smart_holder in the py::class_<FunctionSupport,
py::smart_holder> (and the other class_ instantiation) which requires pybind11
v2.13+, so either pin/require that version in your build config (add an explicit
version constraint for lang.pybind11 in mkn.yaml such as >=2.13) or add a
build-time verification (check PYBIND11_VERSION_MAJOR/MINOR or equivalent) and
fail with a clear message if the installed pybind11 is older; update mkn.yaml to
explicitly request pybind11 >=2.13 and/or add the version check near the module
initialization that defines FunctionSupport to guarantee compatibility.
🧹 Nitpick comments (3)
test/poc/jit/requirements.txt (1)
1-2: Consider pinning dependency versions. Unpinned dependencies can lead to non-reproducible environments if APIs change. Even for a PoC, minimum version constraints (e.g., numpy>=1.24) would help avoid surprises.

inc/mkn/gpu/cuda.hpp (1)
94-94: `static inline` gives each TU its own copy; consider `inline` only.

`static` gives internal linkage, so every translation unit that includes this header calls `getWarpSize()` independently and gets its own `warp_size` instance. If the intent is a single program-wide constant (typical for header-only libs in C++17+), use just `inline` without `static`:

```diff
-static std::uint32_t inline const warp_size = getWarpSize();
+inline std::uint32_t const warp_size = getWarpSize();
```

This ensures a single instance across TUs via the C++17 inline variable rule.
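For illustration, a minimal multi-file sketch of that rule (the file names and the stubbed getWarpSize are hypothetical stand-ins for the real header):

```cpp
// warp.hpp -- hypothetical stand-in for inc/mkn/gpu/cuda.hpp
#pragma once
#include <cstdint>
inline std::uint32_t getWarpSize() { return 32; }      // stub for the CUDA query
inline std::uint32_t const warp_size = getWarpSize();  // C++17 inline variable

// a.cpp
#include "warp.hpp"
std::uint32_t const* addr_a() { return &warp_size; }

// b.cpp
#include "warp.hpp"
std::uint32_t const* addr_b() { return &warp_size; }

// main.cpp -- link all three TUs together
#include <cassert>
#include <cstdint>
std::uint32_t const* addr_a();
std::uint32_t const* addr_b();
int main() {
  // With `inline`, both TUs refer to the same object; with `static` each TU
  // would hold (and initialize) its own private copy, so this could fail.
  assert(addr_a() == addr_b());
}
```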
test/poc/jit/eg.py (1)
27-28: Redundant `s.print()` call; constructor already prints twice.

The `FunctionSupport` constructor (in pybind.cpp lines 28-30) already calls `print()` twice (before and after `init`). Line 28 here is a third call that prints the same post-init state. This is harmless in a PoC but adds noise to the output.
| "python-black": { | ||
| "format_on_save": "on", | ||
| }, |
Trailing comma produces invalid JSON.
Line 18 has a trailing comma after "on" which is not valid in strict JSON. Some editors tolerate this, but it may cause parse errors in others.
Proposed fix

```diff
 "python-black": {
-  "format_on_save": "on",
-},
+  "format_on_save": "on"
+}
```
🤖 Prompt for AI Agents
In @.sublime-project around lines 16 - 18, The JSON for the "python-black"
object contains a trailing comma after the "format_on_save" value which makes
the file invalid; remove the trailing comma after "format_on_save": "on" (i.e.,
ensure the "python-black" object ends with a closing brace and no trailing
comma) so the file becomes strict JSON-parsable while keeping the "python-black"
and "format_on_save" keys intact.
```cpp
cudaDeviceProp devProp;
[[maybe_unused]] auto ret = cudaGetDeviceProperties(&devProp, dev);
return devProp.warpSize;
```
Unchecked error from cudaGetDeviceProperties.
If cudaGetDeviceProperties fails (e.g., invalid device index, no GPU present), ret is silently discarded via [[maybe_unused]] and devProp.warpSize will contain uninitialized data. Consider using MKN_GPU_ASSERT here or returning a sensible default on failure.
Proposed fix

```diff
-  [[maybe_unused]] auto ret = cudaGetDeviceProperties(&devProp, dev);
-  return devProp.warpSize;
+  MKN_GPU_ASSERT(cudaGetDeviceProperties(&devProp, dev));
+  return devProp.warpSize;
```
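If asserting is too aggressive for library code, the fallback the AI prompt mentions could look like this sketch (the signature mirrors the reviewed function as far as shown; the default of 32 is an assumption matching current NVIDIA hardware, not a queried value):

```cpp
#include <cstdint>
#include <cuda_runtime.h>

// Sketch: return a conventional default instead of asserting on failure.
std::uint32_t getWarpSize(int const dev = 0) {
  cudaDeviceProp devProp;
  if (cudaGetDeviceProperties(&devProp, dev) != cudaSuccess) return 32;
  return static_cast<std::uint32_t>(devProp.warpSize);
}
```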
🤖 Prompt for AI Agents
In `@inc/mkn/gpu/cuda.hpp` around lines 88 - 90, The call to
cudaGetDeviceProperties in the function (using local cudaDeviceProp devProp and
returning devProp.warpSize) ignores its return value, risking uninitialized
warpSize on failure; check the returned cudaError_t and handle errors by
invoking MKN_GPU_ASSERT (or similar error macro) or returning a safe default
(e.g., 32) when cudaGetDeviceProperties fails. Update the code around
cudaGetDeviceProperties to capture the return value, assert on non-success via
MKN_GPU_ASSERT(cudaGetDeviceProperties(...) == cudaSuccess, "msg") or branch to
return a sensible default, ensuring devProp.warpSize is only used when the call
succeeded.
```cpp
#include <algorithm>
```
🧩 Analysis chain
🏁 Script executed:
```bash
cat -n test/any/async_streaming.cpp | head -30
```

Repository: mkn/mkn.gpu
Length of output: 989
🏁 Script executed:
```bash
rg -n 'std::this_thread' test/any/async_streaming.cpp
```

Repository: mkn/mkn.gpu
Length of output: 292
🏁 Script executed:
```bash
rg -n '#include' test/any/async_streaming.cpp | head -20
```

Repository: mkn/mkn.gpu
Length of output: 199
Add explicit #include <thread> header.
The file uses std::this_thread::sleep_for at lines 38, 70, 103, 138, and 173 but does not directly include <thread>. This relies on transitive inclusion from one of the project headers, which is fragile and non-portable across different compilers and standard library implementations.
Proposed fix

```diff
 #include <algorithm>
+#include <thread>
```
🤖 Prompt for AI Agents
In `@test/any/async_streaming.cpp` around lines 8 - 9, The test uses
std::this_thread::sleep_for (seen at usages around lines for the async streaming
tests) but doesn't include <thread>, relying on transitive includes; add an
explicit `#include` <thread> at the top of async_streaming.cpp so
std::this_thread::sleep_for is available portably and no longer depends on
indirect headers.
```python
@cuda.jit
def vadd(i, a, b, c):
    c[i] = a[i] + b[i]
```
🧩 Analysis chain
🌐 Web query:
Numba CUDA @cuda.jit device=True device function kernel calling conventions
💡 Result:
In Numba CUDA, a @cuda.jit(device=True) function is a CUDA device function (roughly analogous to __device__ in CUDA C/C++): it can only be called from GPU code (a kernel or another device function), and it is invoked with a normal Python call expression inside the kernel (not with the kernel[grid, block](...) launch syntax). Device functions may return a value; kernels may not. [1][2]
Defining + calling a device function (the “calling convention”)
```python
from numba import cuda

@cuda.jit(device=True)
def bar(a, b):
    return a + b

@cuda.jit
def kern(x, y, out):
    i = cuda.grid(1)
    out[i] = bar(x[i], y[i])  # normal call (device-side)
```

This is the core convention: call like a regular function from within GPU code. [1]
Signatures / specialization
- You can supply an explicit signature for a device function (and kernel), e.g. 'int32(int32, int32)', but you don't have to; Numba can specialize based on argument types (and caches compiled variants). An example of a device signature is shown in Numba's CUDA JIT docs. [3]
- Kernel arguments support device arrays (CUDA Array Interface), NumPy arrays (with host↔device copying caveats), scalars, and records; device functions use the same underlying type system when called from kernels. [2]
Inlining / call overhead controls
- `@cuda.jit` exposes inlining controls (`inline="always"` at Numba IR level, `forceinline=True` at NVVM IR level). [2]
- Practically, many small device helpers get inlined by compilation/optimization anyway; Numba community guidance often treats device-function calls as effectively "inlined into the caller" from a performance perspective. [4]
What you cannot do (common confusion)
- No dynamic parallelism: you cannot launch a kernel from inside a kernel in Numba (i.e., no device-side kernel launches). [5][6]
Also: calling non-cuda.jit(device=True) helpers
Numba can also compile a regular @numba.jit/@njit helper for the GPU when it’s called from a CUDA kernel (subject to CUDA-Python feature restrictions). [7]
Sources: Numba CUDA “Writing Device Functions” and kernel API docs, plus related Numba docs/discussions. [1][2][4][5][6][7]
[1] (nvidia.github.io)
[2] (nvidia.github.io)
[3] (numba.pydata.org)
[4] (numba.discourse.group)
[5] (nvidia.github.io)
[6] (numba.discourse.group)
[7] (developer.nvidia.com)
Citations:
- 1: https://nvidia.github.io/numba-cuda/user/device-functions.html
- 2: https://nvidia.github.io/numba-cuda/reference/kernel.html
- 3: https://numba.pydata.org/numba-doc/0.13/CUDAJit.html?utm_source=openai
- 4: https://numba.discourse.group/t/usage-of-cuda-python-linear-algebra-on-gpu-and-computational-code/1064?utm_source=openai
- 5: https://nvidia.github.io/numba-cuda/user/kernels.html?utm_source=openai
- 6: https://numba.discourse.group/t/kernel-within-a-kernel/1582?utm_source=openai
- 7: https://developer.nvidia.com/blog/seven-things-numba/?utm_source=openai
🏁 Script executed:
```bash
cat test/poc/jit/eg.py
```

Repository: mkn/mkn.gpu
Length of output: 606
vadd must be a device function to be callable from vector_add_gpu.
@cuda.jit without arguments declares a kernel, which cannot be called from another kernel in Numba CUDA. Since vadd is invoked on line 26 from within vector_add_gpu, it must be marked as a device function.
Fix

```diff
-@cuda.jit
+@cuda.jit(device=True)
 def vadd(i, a, b, c):
     c[i] = a[i] + b[i]
```

🤖 Prompt for AI Agents
In `@test/poc/jit/eg.py` around lines 17 - 19, The vadd function is currently
decorated as a kernel (`@cuda.jit`) but is called from inside vector_add_gpu, so
change its decorator to a device function by using `@cuda.jit`(device=True) (i.e.,
make vadd a device function) so it can be invoked from the kernel; keep the
function name vadd and its parameter list unchanged and ensure vector_add_gpu
continues to call vadd from within the kernel.
```cpp
py::array_t<float> make(auto p) {
  return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
}
```
Dangling pointer risk: noop capsule means Python arrays outlive the backing memory.
The make method wraps a raw CUDA-managed pointer into a py::array_t with a no-op destructor capsule. If Python code holds a reference to the returned array after the FunctionSupport object is destroyed, accessing the array will hit freed memory (use-after-free). In a PoC this may be fine, but for anything beyond that, consider tying the array's lifetime to the parent object so the backing memory cannot be freed while the array is still reachable.
One approach: pass a reference to the parent py::object as the array's base so NumPy keeps FunctionSupport alive.
💡 Sketch: tie array lifetime to parent

```diff
-  py::array_t<float> make(auto p) {
-    return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
+  py::array_t<float> make(auto p, py::object self) {
+    // Use `self` as the base object so NumPy keeps FunctionSupport alive
+    return py::array_t<float>({32}, {sizeof(float)}, p, self);
   }
```

Then pass py::object from each accessor:

```diff
-  py::array_t<float> A() { return make(a); }
+  py::array_t<float> A(py::object self) { return make(a, self); }
```

🤖 Prompt for AI Agents
In `@test/poc/jit/pybind.cpp` around lines 44 - 46, The wrapper creates a
py::array_t<float> in make(auto p) with a no-op capsule, risking use-after-free
if the returned array (backed by CUDA-managed memory) outlives its parent; change the capsule
to hold a reference to the parent Python object (e.g., the FunctionSupport
py::object) or otherwise set the array base to that parent so NumPy keeps the
parent alive. Locate make(auto p) and modify the capsule construction to capture
and incref (hold) the parent py::object (or pass the parent as the array base)
instead of a no-op destructor, ensuring the parent’s lifetime is extended until
the array is freed. Ensure the symbol FunctionSupport (or the accessor that
calls make) provides the py::object parent to make so the returned
py::array_t<float> keeps the parent alive.
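For the capsule-based variant the prompt mentions, a rough sketch (parameter types and names are assumptions, not the project's actual code) could hold a heap-allocated reference to the parent:

```cpp
#include <pybind11/numpy.h>
namespace py = pybind11;

// Sketch: the capsule owns a py::object referencing the parent, keeping its
// refcount above zero until NumPy frees the array and the capsule destructor
// deletes (and thereby decrefs) the held reference.
py::array_t<float> make(float* p, py::object parent) {
  auto* keep_alive = new py::object(std::move(parent));
  py::capsule base(keep_alive,
                   [](void* o) { delete static_cast<py::object*>(o); });
  return py::array_t<float>({32}, {sizeof(float)}, p, base);
}
```

Either way, the key point is the same: the array's base must own something that keeps the backing allocation alive.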
```cpp
PYBIND11_MODULE(poc_pyb, m) {
  py::class_<FunctionSupport, py::smart_holder>(m, "FunctionSupport")
      .def(py::init<>())
      .def("print", &FunctionSupport::print)
      .def_readwrite("a", &FunctionSupport::a)
      .def("A", &FunctionSupport::A)
      .def_readwrite("b", &FunctionSupport::b)
      .def("B", &FunctionSupport::B)
      .def_readwrite("a", &FunctionSupport::a)
      .def("C", &FunctionSupport::C);
```
Duplicate binding: Line 65 binds "a" again instead of "c".
Line 61 already binds def_readwrite("a", &FunctionSupport::a). Line 65 repeats the exact same binding. This almost certainly should be "c" to expose the c pointer, matching the pattern for a and b.
🐛 Proposed fix

```diff
       .def_readwrite("b", &FunctionSupport::b)
       .def("B", &FunctionSupport::B)
-      .def_readwrite("a", &FunctionSupport::a)
+      .def_readwrite("c", &FunctionSupport::c)
       .def("C", &FunctionSupport::C);
```
🤖 Prompt for AI Agents
In `@test/poc/jit/pybind.cpp` around lines 57 - 66, The PYBIND11_MODULE binding
for class FunctionSupport contains a duplicate def_readwrite("a",
&FunctionSupport::a) where the third member should be exposed; update the
duplicate binding to def_readwrite("c", &FunctionSupport::c) so the class
exposes members a, b, and c correctly and keep the other method bindings (print,
A, B, C) unchanged.
```cpp
};

PYBIND11_MODULE(poc_pyb, m) {
  py::class_<FunctionSupport, py::smart_holder>(m, "FunctionSupport")
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check pybind11 version references in configuration files
echo "=== Pybind11 version references ==="
rg -rn 'pybind11' --glob='*.yaml' --glob='*.yml' --glob='*.txt' --glob='*.cmake' --glob='CMakeLists.txt' -C2
```

Repository: mkn/mkn.gpu
Length of output: 296
🏁 Script executed:
```bash
#!/bin/bash
# Check CMakeLists.txt for pybind11 version specifications
echo "=== CMakeLists.txt content (if present) ==="
find . -name 'CMakeLists.txt' -type f -exec head -50 {} + 2>/dev/null

# Check setup.py or pyproject.toml for dependencies
echo -e "\n=== Setup/config files ==="
find . -maxdepth 3 -name 'setup.py' -o -name 'pyproject.toml' -o -name 'requirements.txt' | head -10
```

Repository: mkn/mkn.gpu
Length of output: 126
🏁 Script executed:
```bash
#!/bin/bash
# Check smart_holder usage in the codebase
echo "=== Smart_holder usage ==="
rg -rn 'smart_holder' -C3
```

Repository: mkn/mkn.gpu
Length of output: 813
🏁 Script executed:
```bash
#!/bin/bash
# Check the specific file context around lines 58 and 69
echo "=== Context around pybind.cpp lines 58 and 69 ==="
cat -n test/poc/jit/pybind.cpp | sed -n '50,80p'
```

Repository: mkn/mkn.gpu
Length of output: 977
🏁 Script executed:
```bash
#!/bin/bash
# Read the full mkn.yaml file to see pybind11/dependency specifications
echo "=== Full mkn.yaml content ==="
find . -name '*.yaml' -o -name '*.yml' | xargs cat 2>/dev/null | head -100
```

Repository: mkn/mkn.gpu
Length of output: 2280
🏁 Script executed:
```bash
#!/bin/bash
# Search for any pybind11 version specifications across all files
echo "=== Pybind11 version specifications ==="
rg -rn 'pybind11.*2\.'
rg -rn 'pybind11.*version'
```

Repository: mkn/mkn.gpu
Length of output: 94
🏁 Script executed:
```bash
#!/bin/bash
# Look for dependency resolution in mkn related files
echo "=== MKN related config files ==="
find . -maxdepth 3 -type f \( -name 'mkn*' -o -name '*.mkn' \) | head -20
```

Repository: mkn/mkn.gpu
Length of output: 98
🏁 Script executed:
```bash
#!/bin/bash
# Read the mkn.yaml in test/poc/jit directory
echo "=== test/poc/jit/mkn.yaml ==="
cat test/poc/jit/mkn.yaml
```

Repository: mkn/mkn.gpu
Length of output: 348
🏁 Script executed:
```bash
#!/bin/bash
# Search for any pybind11 include or dependency statements
echo "=== Pybind11 includes in codebase ==="
rg -rn '#include.*pybind11'

# Check if there are any header files or submodule references
echo -e "\n=== Git submodules ==="
cat .gitmodules 2>/dev/null || echo "No submodules file found"
```

Repository: mkn/mkn.gpu
Length of output: 278
🏁 Script executed:
```bash
#!/bin/bash
# Look for any CI/build documentation that might specify pybind11 version
echo "=== README or documentation files ==="
find . -maxdepth 2 -type f \( -name 'README*' -o -name 'CONTRIBUTING*' \) -exec head -50 {} + 2>/dev/null

# Check for any version pins in GitHub workflows
echo -e "\n=== Full workflow file ==="
find . -path './.github/workflows/*.yml' -o -path './.github/workflows/*.yaml' 2>/dev/null | xargs cat 2>/dev/null
```

Repository: mkn/mkn.gpu
Length of output: 2389
Ensure pybind11 version >= 2.13 is explicitly specified or verified.
The code uses py::smart_holder at lines 58 and 69, which requires pybind11 v2.13 or later. The mkn.yaml dependency specifies lang.pybind11 without explicit version pinning. Verify that mkn's lang.pybind11 module provides v2.13+, or add an explicit version constraint to the mkn.yaml configuration to guarantee compatibility.
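A minimal build-time guard using pybind11's standard version macros might look like this sketch (the error wording and placement are illustrative):

```cpp
#include <pybind11/pybind11.h>

// Fail compilation with a clear message if pybind11 predates py::smart_holder
// support (merged into mainline pybind11 in v2.13).
#if PYBIND11_VERSION_MAJOR < 2 || \
    (PYBIND11_VERSION_MAJOR == 2 && PYBIND11_VERSION_MINOR < 13)
#error "This module uses py::smart_holder and requires pybind11 >= 2.13"
#endif
```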
🤖 Prompt for AI Agents
In `@test/poc/jit/pybind.cpp` at line 58, The binding uses py::smart_holder in the
py::class_<FunctionSupport, py::smart_holder> (and the other class_
instantiation) which requires pybind11 v2.13+, so either pin/require that
version in your build config (add an explicit version constraint for
lang.pybind11 in mkn.yaml such as >=2.13) or add a build-time verification
(check PYBIND11_VERSION_MAJOR/MINOR or equivalent) and fail with a clear message
if the installed pybind11 is older; update mkn.yaml to explicitly request
pybind11 >=2.13 and/or add the version check near the module initialization that
defines FunctionSupport to guarantee compatibility.
Summary by CodeRabbit: Release Notes covering New Features, Bug Fixes, and Chores.