We are observing that the GPU Operator incorrectly sets the node label `nvidia.com/gpu.present=false` on nodes that have compatible NVIDIA GPUs (A10G on g5.12xlarge) and where `nvidia-smi` runs successfully.
✅ Expected Behavior:
- The GPU Operator installs the driver as a DaemonSet (`nvidia-driver-daemonset`)
- The GPU health check passes
- `nvidia.com/gpu.present=true` is set once the GPU is detected and healthy
❌ Actual Behavior:
- No `nvidia-driver-daemonset` is created
- GPU Operator logs show:
  ```
  "Setting node label","Label":"nvidia.com/gpu.present","Value":"false"
  "No GPU node in the cluster, do not create DaemonSets"
  "Failed to detect GPU"
  ```
- The node is a g5.12xlarge with an A10G, and `nvidia-smi` returns correct output when run manually
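The "Failed to detect GPU" message suggests the operator never classified this node as a GPU node in the first place. As far as I understand (an assumption worth verifying), the GPU Operator decides GPU presence from Node Feature Discovery (NFD) PCI labels for NVIDIA's vendor ID `0x10de`, not from `nvidia-smi`. A minimal sketch of checking for that label; the label dump is simulated here, and `<node-name>` is a placeholder:

```shell
# On a live cluster, dump the node's labels one per line first:
#   kubectl get node <node-name> --show-labels | tr ',' '\n' > /tmp/labels.txt
# Simulated here with a node that NFD has labeled correctly.
cat > /tmp/labels.txt <<'EOF'
kubernetes.io/arch=amd64
feature.node.kubernetes.io/pci-10de.present=true
nvidia.com/gpu.present=true
EOF

# The operator keys off the NFD PCI label (vendor 0x10de = NVIDIA),
# so this is the label to look for on the affected node.
if grep -q 'feature.node.kubernetes.io/pci-10de.present=true' /tmp/labels.txt; then
  echo "NFD sees an NVIDIA PCI device"
else
  echo "NFD PCI label missing: operator will report no GPU"
fi
```

If the `pci-10de` label is absent on the real node, the operator's behavior above is consistent, and the root cause is upstream of the operator itself.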
🔧 Configuration:
- GPU Operator version: v25.3.1
- ClusterPolicy driver settings:
  ```yaml
  driver:
    enabled: true
    useNvidiaDriverCRD: false
    usePrecompiled: false
    kernelModuleType: dkms
    version: "570.148.08"
  ```
- OS: Amazon Linux 2 (EKS GPU AMI: amazon-eks-gpu-node-1.32-v20250519)
- K8s: v1.32.3-eks
- `nvidia-smi` works on the node
- Manually labeling the node with `nvidia.com/gpu.present=true` works, but only temporarily (the label is apparently reverted on the next reconcile)
- GPU Operator pod logs confirm it skips DaemonSet deployment because GPU presence detection fails
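If NFD is indeed the detection path, one common cause is NFD being disabled in the chart (or its worker not running on the GPU node), in which case the PCI labels the operator relies on never appear. A hypothetical Helm values fragment for the `gpu-operator` chart; the key names assume the chart's defaults, so verify against your installed values:

```yaml
# values.yaml fragment (assumption: default gpu-operator chart layout).
# With nfd.enabled: false and no externally managed NFD, the operator
# never receives the PCI labels it uses to set nvidia.com/gpu.present.
nfd:
  enabled: true
```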
❓ Question:
What conditions could cause the operator to skip driver deployment and label the node as having no GPU, even when a working GPU is present? Can this behavior be suppressed or debugged further?
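For anyone else debugging this, one way to narrow it down is to determine whether the breakage is NFD-side (no PCI label on the node) or operator-side (label present but ignored). A hedged triage sketch; the `gpu-operator` namespace and DaemonSet names are assumptions based on a default Helm install, and the label file is simulated here:

```shell
# On a live cluster, capture the node's labels first:
#   kubectl get node <node-name> --show-labels | tr ',' '\n' > /tmp/node-labels.txt
# Simulated here with a node where NFD produced no PCI labels at all.
printf 'kubernetes.io/arch=amd64\nkubernetes.io/os=linux\n' > /tmp/node-labels.txt

if grep -q 'pci-10de' /tmp/node-labels.txt; then
  # NFD sees the GPU, so look at the operator side next, e.g.:
  #   kubectl -n gpu-operator logs deploy/gpu-operator
  echo "suspect operator/ClusterPolicy: check operator logs"
else
  # NFD never labeled the device; check its worker on this node, e.g.:
  #   kubectl -n gpu-operator logs ds/gpu-operator-node-feature-discovery-worker
  echo "suspect NFD: check nfd-worker logs on the node"
fi
```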