
GPU Operator sets nvidia.com/gpu.present=false even when a compatible GPU is present and nvidia-smi works [Node: g5.12xlarge (A10G), OS: Amazon Linux 2 (EKS GPU AMI: amazon-eks-gpu-node-1.32-v20250519), K8s: v1.32.3-eks] #1551

@udaykomati


We are observing that the GPU Operator incorrectly sets the node label nvidia.com/gpu.present=false on nodes that have compatible NVIDIA GPUs (A10G on g5.12xlarge) and where nvidia-smi works correctly.

✅ Expected Behavior:
- The GPU Operator should install the driver as a DaemonSet (nvidia-driver-daemonset).
- The GPU health check should pass.
- nvidia.com/gpu.present=true should be set if the GPU is detected and healthy.

❌ Actual Behavior:
- No nvidia-driver-daemonset is created.
- The GPU Operator logs show:
  `"Setting node label","Label":"nvidia.com/gpu.present","Value":"false"`
- The log output also includes:
  - `"No GPU node in the cluster, do not create DaemonSets"`
  - `"Failed to detect GPU"`
- The node is a g5.12xlarge with A10G GPUs, and nvidia-smi returns correct output when tested manually.
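To pull the detection decisions out of a long operator log, a small filter helps. This is a minimal sketch that assumes the operator writes one JSON object per line with the message under a `msg` key; the fragment quoted above looks like part of such a line, but the exact key names are an assumption. `gpu_detection_events` is a hypothetical helper, and non-JSON or truncated lines are simply skipped.

```python
import json
from typing import Iterable, Iterator

# Messages worth surfacing, copied verbatim from the log excerpts above.
INTERESTING_MSGS = {
    "Setting node label",
    "No GPU node in the cluster, do not create DaemonSets",
    "Failed to detect GPU",
}


def gpu_detection_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records whose message relates to GPU detection.

    Assumes JSON-per-line logs with the message stored under "msg"
    (a hypothetical layout; adjust the key if the real format differs).
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or plain-text lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed JSON
        if record.get("msg") in INTERESTING_MSGS:
            yield record


if __name__ == "__main__":
    import sys

    # Usage (namespace and deployment name assume a default Helm install):
    #   kubectl logs -n gpu-operator deploy/gpu-operator | python filter_logs.py
    for rec in gpu_detection_events(sys.stdin):
        print(rec.get("msg"), rec.get("Label", ""), rec.get("Value", ""))
```

Filtering on exact message strings keeps the output small; if the operator rewords a message in a later release, the set needs updating.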

🔧 Configuration:
- GPU Operator version: v25.3.1

- ClusterPolicy (driver section):

  ```yaml
  driver:
    enabled: true
    useNvidiaDriverCRD: false
    usePrecompiled: false
    kernelModuleType: dkms
    version: "570.148.08"
  ```

- OS: Amazon Linux 2 (EKS GPU AMI: amazon-eks-gpu-node-1.32-v20250519)
- K8s: v1.32.3-eks
- nvidia-smi works on the node.
- Manually labeling the node with nvidia.com/gpu.present=true works temporarily.
- The GPU Operator pod logs confirm that it skips DaemonSet deployment because GPU presence is misdetected.
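For context on why a working nvidia-smi may not matter here: our understanding, which we have not verified against the v25.3.1 source, is that the operator does not run nvidia-smi to decide GPU presence. Instead it appears to look for NVIDIA PCI labels (vendor ID 10de) published by Node Feature Discovery (NFD). The sketch below models that assumed check; the three label keys are assumptions to confirm, and `is_gpu_node` / `expected_gpu_present_label` are hypothetical helpers, not operator APIs.

```python
from typing import Mapping

# ASSUMED NFD label keys that mark a node as having an NVIDIA GPU.
# 10de is NVIDIA's PCI vendor ID; 0300 and 0302 are the PCI class codes
# for VGA and 3D controllers. Verify against the operator version in use.
NVIDIA_PCI_LABELS = (
    "feature.node.kubernetes.io/pci-10de.present",
    "feature.node.kubernetes.io/pci-0302_10de.present",
    "feature.node.kubernetes.io/pci-0300_10de.present",
)


def is_gpu_node(labels: Mapping[str, str]) -> bool:
    """Hypothetical model of the presence check: any NVIDIA PCI label set to "true"."""
    return any(labels.get(key) == "true" for key in NVIDIA_PCI_LABELS)


def expected_gpu_present_label(labels: Mapping[str, str]) -> str:
    """Value we would expect for nvidia.com/gpu.present under this model."""
    return "true" if is_gpu_node(labels) else "false"
```

Under this model, a node that carries none of the NFD PCI labels (for example, if the NFD worker never ran on it) would be labeled gpu.present=false regardless of what nvidia-smi reports, which would be consistent with the logs above.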

❓Question:
What conditions could cause the operator to skip driver deployment and mark the GPU as unhealthy even when it is working? Can this be suppressed or debugged further?
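One way to debug further is to dump every GPU-related label on the affected node and compare it with what the operator logged. This sketch reads the output of `kubectl get node <node-name> -o json` from stdin; the prefixes it filters on (`nvidia.com/` and NFD's `feature.node.kubernetes.io/pci-`) are our own choice, not an operator convention, and `gpu_related_labels` is a hypothetical helper.

```python
import json
import sys
from typing import Dict

# Label prefixes we treat as relevant to GPU detection (our own choice).
GPU_LABEL_PREFIXES = ("nvidia.com/", "feature.node.kubernetes.io/pci-")


def gpu_related_labels(node_json: str) -> Dict[str, str]:
    """Return the node's labels that start with one of GPU_LABEL_PREFIXES, sorted by key."""
    node = json.loads(node_json)
    labels = node.get("metadata", {}).get("labels", {}) or {}
    return {
        key: value
        for key, value in sorted(labels.items())
        if key.startswith(GPU_LABEL_PREFIXES)
    }


if __name__ == "__main__":
    # Usage: kubectl get node <node-name> -o json | python gpu_labels.py
    found = gpu_related_labels(sys.stdin.read())
    if not found:
        print("no NVIDIA or NFD PCI labels on this node")
    for key, value in found.items():
        print(f"{key}={value}")
```

An empty result would suggest the node was never labeled by NFD at all, which narrows the problem to feature discovery rather than the driver.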

I can attach the full operator logs and the complete ClusterPolicy YAML if that would help.

Labels: lifecycle/stale, needs-triage