From c05c928054f42f3f2655cc0687ce51d720f5b6fd Mon Sep 17 00:00:00 2001
From: luohua13
Date: Thu, 15 Jan 2026 18:30:26 +0800
Subject: [PATCH 1/2] dra

---
 .../pgpu_dra/how_to/cdi_enable_containerd.mdx | 34 ++++++++++
 docs/en/pgpu_dra/how_to/index.mdx             | 11 ++++
 docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx    |  7 +++
 docs/en/pgpu_dra/index.mdx                    |  6 ++
 docs/en/pgpu_dra/install.mdx                  | 63 +++++++++++++++++++
 docs/en/pgpu_dra/intro.mdx                    |  6 ++
 6 files changed, 127 insertions(+)
 create mode 100644 docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx
 create mode 100644 docs/en/pgpu_dra/how_to/index.mdx
 create mode 100644 docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
 create mode 100644 docs/en/pgpu_dra/index.mdx
 create mode 100644 docs/en/pgpu_dra/install.mdx
 create mode 100644 docs/en/pgpu_dra/intro.mdx

diff --git a/docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx b/docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx
new file mode 100644
index 0000000..d2dfc57
--- /dev/null
+++ b/docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx
@@ -0,0 +1,34 @@
+---
+weight: 20
+---
+
+# Enable CDI in Containerd
+
+CDI (Container Device Interface) provides a standard mechanism for device vendors to describe what is required to provide access to a specific resource, such as a GPU, beyond a simple device name.
+
+CDI support is enabled by default in containerd version 2.0 and later. In earlier versions, starting from 1.7.0, the feature must be enabled manually.
+
+## Steps to Enable CDI in Containerd (1.7.0 <= version < 2.0.0)
+
+1. Update the containerd configuration.
+   Edit the configuration file:
+   ```bash
+   vi /etc/containerd/config.toml
+   ```
+   Add or modify the following section:
+   ```toml
+   [plugins."io.containerd.grpc.v1.cri"]
+     enable_cdi = true
+   ```
+2. Restart containerd.
+   ```bash
+   systemctl restart containerd
+   systemctl status containerd
+   ```
+   Ensure the service is running correctly.
+
+3. Verify that CDI is enabled.
+   ```bash
+   journalctl -u containerd | grep "EnableCDI:true"
+   ```
+   If a matching log entry appears, CDI was enabled successfully.
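+
+With CDI enabled in containerd, the runtime also needs a CDI specification on each GPU node so that device names can be resolved. A minimal sketch, assuming the NVIDIA Container Toolkit (which ships the `nvidia-ctk` CLI) is installed:
+
+```bash
+# Generate a CDI specification for the NVIDIA devices visible on this node
+nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
+
+# List the device names that can now be resolved, e.g. nvidia.com/gpu=0
+nvidia-ctk cdi list
+```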
diff --git a/docs/en/pgpu_dra/how_to/index.mdx b/docs/en/pgpu_dra/how_to/index.mdx
new file mode 100644
index 0000000..1fcbef6
--- /dev/null
+++ b/docs/en/pgpu_dra/how_to/index.mdx
@@ -0,0 +1,11 @@
+---
+weight: 30
+i18n:
+  title:
+    en: How To
+    zh: How To
+---
+
+# How To
+
+
diff --git a/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx b/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
new file mode 100644
index 0000000..caa5c1a
--- /dev/null
+++ b/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
@@ -0,0 +1,7 @@
+---
+weight: 30
+---
+
+# Enable DRA (Dynamic Resource Allocation) in Kubernetes
+
+
diff --git a/docs/en/pgpu_dra/index.mdx b/docs/en/pgpu_dra/index.mdx
new file mode 100644
index 0000000..d1577c3
--- /dev/null
+++ b/docs/en/pgpu_dra/index.mdx
@@ -0,0 +1,6 @@
+---
+weight: 83
+---
+# Alauda Build of NVIDIA DRA Driver for GPUs
+
+
diff --git a/docs/en/pgpu_dra/install.mdx b/docs/en/pgpu_dra/install.mdx
new file mode 100644
index 0000000..d745382
--- /dev/null
+++ b/docs/en/pgpu_dra/install.mdx
@@ -0,0 +1,63 @@
+---
+weight: 20
+---
+
+# Installation
+
+## Prerequisites
+
+- **NVIDIA driver v565+**
+- **Kubernetes v1.32+**
+- **ACP v4.1+**
+- **Cluster administrator access to your ACP cluster**
+- **CDI must be enabled in the underlying container runtime (such as containerd)**
+- **DRA and the corresponding API groups must be enabled**
+
+## Procedure
+
+### Installing the NVIDIA driver on your GPU node
+Refer to the [installation guide on the official NVIDIA website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
+
+### Installing the NVIDIA Container Runtime
+Refer to the [installation guide of the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
+
+### Downloading the Cluster plugin
+
+:::info
+
+The `Alauda Build of NVIDIA DRA Driver for GPUs` cluster plugin can be retrieved from the Customer Portal.
+
+Please contact Customer Support for more information.
+
+:::
+
+### Uploading the Cluster plugin
+
+For more information on uploading the cluster plugin, please refer to
+
+### Installing Alauda Build of NVIDIA DRA Driver for GPUs
+
+1. Add the label `nvidia-device-enable=pgpu-dra` to your GPU node so that the `nvidia-dra-driver-gpu-kubelet-plugin` can be scheduled onto it.
+   ```bash
+   kubectl label nodes {nodeid} nvidia-device-enable=pgpu-dra
+   ```
+   :::info
+   **Note: On the same node, you can only set one of the following labels: `gpu=on`, `nvidia-device-enable=pgpu`, or `nvidia-device-enable=pgpu-dra`.**
+   :::
+
+2. Go to the `Administrator` -> `Marketplace` -> `Cluster Plugin` page, switch to the target cluster, and then deploy the `Alauda Build of NVIDIA DRA Driver for GPUs` Cluster plugin.
+
+3. Verify the result. The plugin should show an `Installed` status in the UI, or you can check the pod status:
+   ```bash
+   kubectl get pods -n kube-system | grep "nvidia-dra-driver-gpu"
+   ```
+   You should get results similar to:
+   ```
+   nvidia-dra-driver-gpu-controller-675644bfb5-c2hq4   1/1   Running   0   18h
+   nvidia-dra-driver-gpu-kubelet-plugin-65fjt          2/2   Running   0   18h
+   ```
+
+### Upgrading Alauda Build of NVIDIA DRA Driver for GPUs
+
+1. Upload the new version of the **Alauda Build of NVIDIA DRA Driver for GPUs** plugin package to ACP.
+2. Go to the `Administrator` -> `Clusters` -> `Target Cluster` -> `Functional Components` page, then click the `Upgrade` button; you will see that `Alauda Build of NVIDIA DRA Driver for GPUs` can be upgraded.
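+3. After the upgrade completes, you can verify that the plugin pods were rolled out again, in the same way as after installation:
+   ```bash
+   kubectl get pods -n kube-system | grep "nvidia-dra-driver-gpu"
+   ```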
diff --git a/docs/en/pgpu_dra/intro.mdx b/docs/en/pgpu_dra/intro.mdx
new file mode 100644
index 0000000..e0f4e65
--- /dev/null
+++ b/docs/en/pgpu_dra/intro.mdx
@@ -0,0 +1,6 @@
+---
+weight: 10
+---
+# Introduction
+
+Dynamic Resource Allocation (DRA) is a Kubernetes feature that provides a more flexible and extensible way to request and allocate hardware resources like GPUs. Unlike traditional device plugins that only support simple counting of identical resources, DRA enables fine-grained resource selection based on device attributes and capabilities.
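+
+For example, instead of requesting an opaque count such as `nvidia.com/gpu: 1`, a workload references a claim that selects devices by their published attributes or capacity. A minimal sketch (the `gpu.nvidia.com` device class and capacity names follow the NVIDIA DRA driver's conventions; adapt them to your environment):
+
+```yaml
+apiVersion: resource.k8s.io/v1beta1
+kind: ResourceClaimTemplate
+metadata:
+  name: single-large-gpu
+spec:
+  spec:
+    devices:
+      requests:
+      - name: gpu
+        deviceClassName: gpu.nvidia.com
+        selectors:
+        - cel:
+            # Only match GPUs that advertise at least 16Gi of memory (illustrative)
+            expression: "device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('16Gi')) >= 0"
+```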
From cb088061a3f067ec7b8b811b03e73a2d3f0b735c Mon Sep 17 00:00:00 2001
From: luohua13
Date: Mon, 19 Jan 2026 15:16:51 +0800
Subject: [PATCH 2/2] add dra support

---
 .../pgpu_dra/how_to/cdi_enable_containerd.mdx |   2 +-
 .../pgpu_dra/how_to/index.mdx                 |   0
 .../pgpu_dra/how_to/k8s_dra_enable.mdx        |  58 ++++++
 .../device_management}/pgpu_dra/index.mdx     |   0
 .../device_management/pgpu_dra/install.mdx    | 177 ++++++++++++++++++
 .../device_management}/pgpu_dra/intro.mdx     |   0
 docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx    |   7 -
 docs/en/pgpu_dra/install.mdx                  |  63 -------
 8 files changed, 236 insertions(+), 71 deletions(-)
 rename docs/en/{ => infrastructure_management/device_management}/pgpu_dra/how_to/cdi_enable_containerd.mdx (93%)
 rename docs/en/{ => infrastructure_management/device_management}/pgpu_dra/how_to/index.mdx (100%)
 create mode 100644 docs/en/infrastructure_management/device_management/pgpu_dra/how_to/k8s_dra_enable.mdx
 rename docs/en/{ => infrastructure_management/device_management}/pgpu_dra/index.mdx (100%)
 create mode 100644 docs/en/infrastructure_management/device_management/pgpu_dra/install.mdx
 rename docs/en/{ => infrastructure_management/device_management}/pgpu_dra/intro.mdx (100%)
 delete mode 100644 docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
 delete mode 100644 docs/en/pgpu_dra/install.mdx

diff --git a/docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/cdi_enable_containerd.mdx
similarity index 93%
rename from docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx
rename to docs/en/infrastructure_management/device_management/pgpu_dra/how_to/cdi_enable_containerd.mdx
index d2dfc57..9fa18e0 100644
--- a/docs/en/pgpu_dra/how_to/cdi_enable_containerd.mdx
+++ b/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/cdi_enable_containerd.mdx
@@ -8,7 +8,7 @@ CDI (Container Device Interface) provides a standard mechanism for device vendor
 
 CDI support is enabled by default in containerd version 2.0 and later. In earlier versions, starting from 1.7.0, the feature must be enabled manually.
 
-## Steps to Enable CDI in Containerd (1.7.0 <= version < 2.0.0)
+## Steps to Enable CDI in containerd v1.7.x
 
 1. Update the containerd configuration.
    Edit the configuration file:
diff --git a/docs/en/pgpu_dra/how_to/index.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/index.mdx
similarity index 100%
rename from docs/en/pgpu_dra/how_to/index.mdx
rename to docs/en/infrastructure_management/device_management/pgpu_dra/how_to/index.mdx
diff --git a/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/k8s_dra_enable.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/k8s_dra_enable.mdx
new file mode 100644
index 0000000..2a67108
--- /dev/null
+++ b/docs/en/infrastructure_management/device_management/pgpu_dra/how_to/k8s_dra_enable.mdx
@@ -0,0 +1,58 @@
+---
+weight: 30
+---
+
+# Enable DRA (Dynamic Resource Allocation) and corresponding API groups in Kubernetes
+
+DRA support is enabled by default in Kubernetes 1.34 and later. In earlier versions, starting from 1.32, the feature must be enabled manually.
+
+## Steps to Enable DRA in Kubernetes 1.32–1.33
+
+On all master nodes:
+1. Edit the `kube-apiserver` component manifest in `/etc/kubernetes/manifests/kube-apiserver.yaml`:
+   ```yaml
+   spec:
+     containers:
+     - command:
+       - kube-apiserver
+       - --feature-gates=DynamicResourceAllocation=true # required
+       - --runtime-config=resource.k8s.io/v1beta1       # required
+       - --runtime-config=resource.k8s.io/v1beta2       # required
+       # ... other flags
+   ```
+
+2. Edit the `kube-controller-manager` component manifest in `/etc/kubernetes/manifests/kube-controller-manager.yaml`:
+   ```yaml
+   spec:
+     containers:
+     - command:
+       - kube-controller-manager
+       - --feature-gates=DynamicResourceAllocation=true # required
+       # ... other flags
+   ```
+
+3. Edit the `kube-scheduler` component manifest in `/etc/kubernetes/manifests/kube-scheduler.yaml`:
+   ```yaml
+   spec:
+     containers:
+     - command:
+       - kube-scheduler
+       - --feature-gates=DynamicResourceAllocation=true
+       # ... other flags
+   ```
+
+4. For the kubelet, edit `/var/lib/kubelet/config.yaml` on all nodes:
+
+   ```yaml
+   apiVersion: kubelet.config.k8s.io/v1beta1
+   kind: KubeletConfiguration
+   featureGates:
+     DynamicResourceAllocation: true
+   ```
+
+   Restart the kubelet:
+
+   ```bash
+   sudo systemctl restart kubelet
+   ```
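+
+5. Verify that DRA is active. As a quick check with standard `kubectl` commands (the exact resource list depends on your Kubernetes version):
+
+   ```bash
+   # Once DRA is enabled, the resource.k8s.io group should serve deviceclasses,
+   # resourceclaims, resourceclaimtemplates, and resourceslices
+   kubectl api-resources --api-group=resource.k8s.io
+   ```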
diff --git a/docs/en/pgpu_dra/index.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/index.mdx
similarity index 100%
rename from docs/en/pgpu_dra/index.mdx
rename to docs/en/infrastructure_management/device_management/pgpu_dra/index.mdx
diff --git a/docs/en/infrastructure_management/device_management/pgpu_dra/install.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/install.mdx
new file mode 100644
index 0000000..5414629
--- /dev/null
+++ b/docs/en/infrastructure_management/device_management/pgpu_dra/install.mdx
@@ -0,0 +1,177 @@
+---
+weight: 20
+---
+
+# Installation
+
+## Prerequisites
+
+- **NVIDIA driver v565+**
+- **Kubernetes v1.32+**
+- **ACP v4.1+**
+- **Cluster administrator access to your ACP cluster**
+- **CDI must be enabled in the underlying container runtime such as containerd (see [Enable CDI](how_to/cdi_enable_containerd.mdx))**
+- **DRA and the corresponding API groups must be enabled (see [Enable DRA](how_to/k8s_dra_enable.mdx))**
+
+## Procedure
+
+### Installing the NVIDIA driver on your GPU node
+Refer to the [installation guide on the official NVIDIA website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
+
+### Installing the NVIDIA Container Runtime
+Refer to the [installation guide of the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
+
+### Downloading the Cluster plugin
+
+:::info
+
+The `Alauda Build of NVIDIA DRA Driver for GPUs` cluster plugin can be retrieved from the Customer Portal.
+
+Please contact Customer Support for more information.
+
+:::
+
+### Uploading the Cluster plugin
+
+For more information on uploading the cluster plugin, please refer to
+
+### Installing Alauda Build of NVIDIA DRA Driver for GPUs
+
+1. Add the label `nvidia-device-enable=pgpu-dra` to your GPU node so that the `nvidia-dra-driver-gpu-kubelet-plugin` can be scheduled onto it.
+   ```bash
+   kubectl label nodes {nodeid} nvidia-device-enable=pgpu-dra
+   ```
+   :::info
+   **Note: On the same node, you can only set one of the following labels: `gpu=on`, `nvidia-device-enable=pgpu`, or `nvidia-device-enable=pgpu-dra`.**
+   :::
+
+2. Go to the `Administrator` -> `Marketplace` -> `Cluster Plugin` page, switch to the target cluster, and then deploy the `Alauda Build of NVIDIA DRA Driver for GPUs` Cluster plugin.
+
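+Before verifying, you can confirm that the node label from step 1 is in place (a standard `kubectl` query):
+
+```bash
+kubectl get nodes -l nvidia-device-enable=pgpu-dra
+```
+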
+### Verify DRA setup
+
+1. Check the DRA driver controller and kubelet-plugin pods:
+
+   ```bash
+   kubectl get pods -n kube-system | grep "nvidia-dra-driver-gpu"
+   ```
+   You should get results similar to:
+   ```
+   nvidia-dra-driver-gpu-controller-675644bfb5-c2hq4   1/1   Running   0   18h
+   nvidia-dra-driver-gpu-kubelet-plugin-65fjt          2/2   Running   0   18h
+   ```
+
+2. Verify the ResourceSlice objects:
+   ```bash
+   kubectl get resourceslices -o yaml
+   ```
+
+   For GPU nodes, you should see output similar to:
+
+   ```yaml
+   apiVersion: resource.k8s.io/v1beta1
+   kind: ResourceSlice
+   metadata:
+     generateName: 192.168.140.59-gpu.nvidia.com-
+     name: 192.168.140.59-gpu.nvidia.com-gbl46
+     ownerReferences:
+     - apiVersion: v1
+       controller: true
+       kind: Node
+       name: 192.168.140.59
+       uid: 4ab2c24c-fc35-4c75-bcaf-db038356575c
+   spec:
+     devices:
+     - basic:
+         attributes:
+           architecture:
+             string: Pascal
+           brand:
+             string: Tesla
+           cudaComputeCapability:
+             version: 6.0.0
+           cudaDriverVersion:
+             version: 12.8.0
+           driverVersion:
+             version: 570.124.6
+           pcieBusID:
+             string: 0000:00:0b.0
+           productName:
+             string: Tesla P100-PCIE-16GB
+           resource.kubernetes.io/pcieRoot:
+             string: pci0000:00
+           type:
+             string: gpu
+           uuid:
+             string: GPU-b87512d7-c8a6-5f4b-8d3f-68183df62d66
+         capacity:
+           memory:
+             value: 16Gi
+       name: gpu-0
+     driver: gpu.nvidia.com
+     nodeName: 192.168.140.59
+     pool:
+       generation: 1
+       name: 192.168.140.59
+       resourceSliceCount: 1
+   ```
+3. Deploy a workload with DRA.
+   :::info
+   **Note: Fill in the `selector` field of the following `ResourceClaimTemplate` resource according to your specific GPU model. You can use the [Common Expression Language (CEL)](https://cel.dev) to select devices based on specific attributes.**
+   :::
+   Create the spec file:
+   ```bash
+   cat <<EOF > dra-gpu-test.yaml
+   ---
+   apiVersion: resource.k8s.io/v1beta1
+   kind: ResourceClaimTemplate
+   metadata:
+     name: gpu-template
+   spec:
+     spec:
+       devices:
+         requests:
+         - name: gpu
+           deviceClassName: gpu.nvidia.com
+           selectors:
+           - cel:
+               expression: "device.attributes['gpu.nvidia.com'].productName == 'Tesla P100-PCIE-16GB'" # [!code callout]
+   ---
+   apiVersion: v1
+   kind: Pod
+   metadata:
+     name: dra-gpu-workload
+   spec:
+     tolerations:
+     - key: "nvidia.com/gpu"
+       operator: "Exists"
+       effect: "NoSchedule"
+     runtimeClassName: nvidia
+     restartPolicy: OnFailure
+     resourceClaims:
+     - name: gpu-claim
+       resourceClaimTemplateName: gpu-template
+     containers:
+     - name: cuda-container
+       image: "ubuntu:22.04"
+       command: ["bash", "-c"]
+       args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
+       resources:
+         claims:
+         - name: gpu-claim
+   EOF
+   ```
+   Apply the spec:
+
+   ```bash
+   kubectl apply -f dra-gpu-test.yaml
+   ```
+
+   Check the container output in the pod:
+   ```bash
+   kubectl logs pod/dra-gpu-workload -f
+   ```
+   The output is expected to show the GPU UUID from inside the container. Example:
+
+   ```text
+   GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-b87512d7-c8a6-5f4b-8d3f-68183df62d66)
+   ```
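+
+   When you are done, you can clean up the test workload and its generated claim:
+   ```bash
+   kubectl delete -f dra-gpu-test.yaml
+   ```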
diff --git a/docs/en/pgpu_dra/intro.mdx b/docs/en/infrastructure_management/device_management/pgpu_dra/intro.mdx
similarity index 100%
rename from docs/en/pgpu_dra/intro.mdx
rename to docs/en/infrastructure_management/device_management/pgpu_dra/intro.mdx
diff --git a/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx b/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
deleted file mode 100644
index caa5c1a..0000000
--- a/docs/en/pgpu_dra/how_to/k8s_dra_enable.mdx
+++ /dev/null
@@ -1,7 +0,0 @@
----
--weight: 30
----
--
--# Enable DRA (Dynamic Resource Allocation) in Kubernetes
--
--
diff --git a/docs/en/pgpu_dra/install.mdx b/docs/en/pgpu_dra/install.mdx
deleted file mode 100644
index d745382..0000000
--- a/docs/en/pgpu_dra/install.mdx
+++ /dev/null
@@ -1,63 +0,0 @@
----
--weight: 20
----
--
--# Installation
--
--## Prerequisites
--
--- **NVIDIA driver v565+**
--- **Kubernetes v1.32+**
--- **ACP v4.1+**
--- **Cluster administrator access to your ACP cluster**
--- **CDI must be enabled in the underlying container runtime (such as containerd)**
--- **DRA and the corresponding API groups must be enabled**
--
--## Procedure
--
--### Installing the NVIDIA driver on your GPU node
--Refer to the [installation guide on the official NVIDIA website](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
--
--### Installing the NVIDIA Container Runtime
--Refer to the [installation guide of the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
--
--### Downloading the Cluster plugin
--
--:::info
--
--The `Alauda Build of NVIDIA DRA Driver for GPUs` cluster plugin can be retrieved from the Customer Portal.
--
--Please contact Customer Support for more information.
--
--:::
--
--### Uploading the Cluster plugin
--
--For more information on uploading the cluster plugin, please refer to
--
--### Installing Alauda Build of NVIDIA DRA Driver for GPUs
--
--1. Add the label `nvidia-device-enable=pgpu-dra` to your GPU node so that the `nvidia-dra-driver-gpu-kubelet-plugin` can be scheduled onto it.
--   ```bash
--   kubectl label nodes {nodeid} nvidia-device-enable=pgpu-dra
--   ```
--   :::info
--   **Note: On the same node, you can only set one of the following labels: `gpu=on`, `nvidia-device-enable=pgpu`, or `nvidia-device-enable=pgpu-dra`.**
--   :::
--
--2. Go to the `Administrator` -> `Marketplace` -> `Cluster Plugin` page, switch to the target cluster, and then deploy the `Alauda Build of NVIDIA DRA Driver for GPUs` Cluster plugin.
--
--3. Verify the result. The plugin should show an `Installed` status in the UI, or you can check the pod status:
--   ```bash
--   kubectl get pods -n kube-system | grep "nvidia-dra-driver-gpu"
--   ```
--   You should get results similar to:
--   ```
--   nvidia-dra-driver-gpu-controller-675644bfb5-c2hq4   1/1   Running   0   18h
--   nvidia-dra-driver-gpu-kubelet-plugin-65fjt          2/2   Running   0   18h
--   ```
--
--### Upgrading Alauda Build of NVIDIA DRA Driver for GPUs
--
--1. Upload the new version of the **Alauda Build of NVIDIA DRA Driver for GPUs** plugin package to ACP.
--2. Go to the `Administrator` -> `Clusters` -> `Target Cluster` -> `Functional Components` page, then click the `Upgrade` button; you will see that `Alauda Build of NVIDIA DRA Driver for GPUs` can be upgraded.