With the LLM era upon us, I’ve been wanting to play around with some of the open source, self-hosted toys available. I’m using an old workstation as a homelab, which conveniently has an old NVIDIA GPU installed. Seeing as I’m running a Kubernetes cluster I want to expose the GPU to the workloads to utilise the existing infrastructure for easy hosting, scheduling, and deployment of GPU assisted applications.
This article is mainly intended to serve as reference material for myself when I get started with the actual applications, but I hope it will be of use to others as well.
I’m currently running a Kubernetes 1.28 “cluster” on a bare-metal, one-node Debian 11 machine using containerd, so this article will assume a similar setup, though I’ve tried to link to relevant resources for other setups.
In the future I’m planning to switch over to using Proxmox or something similar for virtualisation. When that time comes I’ll probably also update this article with a new configuration.
Configuration#
The prerequisites for the NVIDIA k8s-device-plugin are a functioning NVIDIA CUDA driver and the NVIDIA Container Toolkit installed on every node that will run GPU workloads.
CUDA Driver#
Before starting, remove any existing NVIDIA and Nouveau drivers by running
sudo apt-get autoremove cuda* nvidia* nouveau* --purge
and rebooting your computer.
Before installing the GPU driver, we need the appropriate kernel headers, which can be fetched by running
sudo apt-get install linux-headers-$(uname -r)
Next we add the keyring and repository for the CUDA driver
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
such that we can easily install the driver using apt-get
sudo apt-get update
sudo apt-get install cuda-drivers
Reboot and make sure that the driver is working by running
nvidia-smi
You should then be greeted with information about your connected GPU and driver version
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080        On  | 00000000:02:00.0 Off |                  N/A |
| 30%   37C    P8              12W / 180W |      1MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Container Toolkit#
Following the NVIDIA Container Toolkit installation guide for apt, we start by configuring the Container Toolkit package repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
before installing nvidia-container-toolkit
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
containerd runtime#
Take a backup of your existing containerd configuration in case something goes wrong in the following steps
sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
We can then either configure containerd manually according to the k8s-device-plugin readme, or run
sudo nvidia-ctk runtime configure --runtime=containerd
to set up nvidia-container-runtime as the default low-level runtime for containerd.
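Remember to restart containerd afterwards so the new runtime configuration takes effect. If the nvidia runtime doesn’t end up as the default on your version of the toolkit, re-running the command with its --set-as-default flag should take care of that.
sudo systemctl restart containerd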
NVIDIA Device Plugin#
Having cleared all the prerequisites of installing a working CUDA driver, setting up the NVIDIA Container Toolkit, and configuring containerd to use the NVIDIA runtime, we can now apply the NVIDIA device plugin using its Helm chart.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--namespace nvidia-device-plugin \
--include-crds \
--create-namespace \
--version 0.14.3
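To verify that the device plugin came up correctly, check that its daemonset pod is running on the GPU node and skim its logs. The label selector below is an assumption based on the chart’s standard Helm labels, so fall back to the pod name from the first command if it doesn’t match.
kubectl get pods -n nvidia-device-plugin
kubectl logs -n nvidia-device-plugin -l app.kubernetes.io/name=nvidia-device-plugin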
Time-slicing (Optional)#
The default behaviour of the NVIDIA device plugin is to allocate the entire GPU to a single pod, meaning that if you have multiple pods requesting GPU-time, only one will be scheduled at a time.
To overcome this we can configure time-slicing of the GPU, meaning that the GPU will be shared between pods.
Configure time-slicing by first creating a ConfigMap with the following configuration, which allows a maximum of 10 replicas per GPU.
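# cm-time-slicing.yaml (the same ConfigMap is listed in full in the Summary below)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-time-slicing
  namespace: nvidia-device-plugin
data:
  time-slicing: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 10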
We then apply the ConfigMap and configure nvidia-device-plugin to use it by referencing the ConfigMap name and the supplied default configuration key
kubectl apply -f cm-time-slicing.yaml
helm upgrade nvdp nvdp/nvidia-device-plugin \
--reuse-values \
--set config.name=cm-time-slicing \
--set config.default=time-slicing
You should now see a capacity of 10 nvidia.com/gpu on each node per GPU by running
kubectl get node -o 'jsonpath={.items[*].status.capacity}' | jq
{
...
"nvidia.com/gpu": "10",
...
}
Note that the workloads are granted replicas from the same GPU, and that each workload has access to the same GPU memory and runs in the same fault-domain, meaning that if one workload crashes, they all will.
More details about configuring the device plugin can be found in the readme on GitHub.
Running a workload#
Assuming the configuration went well, we can now try to run a test workload that uses the GPU by starting a pod which requests a GPU resource, as in the manifest sketched below.
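This is a minimal sketch of such a manifest; the sample image tag is an assumption, so substitute any CUDA vectorAdd sample image you have access to.
# cuda-vectoradd.yaml (sketch; the image tag is an assumption)
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
  namespace: cuda-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: "1"   # request one (time-sliced) GPU replica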
kubectl create ns cuda-test
kubectl apply -f cuda-vectoradd.yaml
If everything went well, the logs of the workload should read
kubectl logs -n cuda-test cuda-vectoradd
[Vector addition of 50000 elements]
...
Test PASSED
If everything works, it’s as easy as adding a resource limit for nvidia.com/gpu on each workload that you want to give access to GPU resources.
resources:
  limits:
    nvidia.com/gpu: "1"
Peeking inside a pod requesting GPU-resources we’ll also find two NVIDIA-related environment variables,
kubectl exec -it <pod> -- env | grep NVIDIA
NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
NVIDIA_VISIBLE_DEVICES=GPU-<UUID>
which reveal that GPU-accelerated compute and video encoding/decoding are available in the pod.
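Since the runtime reads these variables, a workload that only needs compute can narrow what gets exposed by setting NVIDIA_DRIVER_CAPABILITIES explicitly in its container spec. A minimal sketch, with a hypothetical container name and image:
# Sketch: restrict a container to compute-only GPU access by overriding
# NVIDIA_DRIVER_CAPABILITIES (read by the NVIDIA container runtime).
containers:
  - name: my-gpu-app          # hypothetical name
    image: my-gpu-app:latest  # hypothetical image
    env:
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: "compute,utility"
    resources:
      limits:
        nvidia.com/gpu: "1"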
Troubleshooting#
If you get a pod startup error similar to
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
it could mean you don’t have enough GPU resources available. Try increasing the number of time-slicing replicas from the Time-slicing section, or buy another GPU, whichever is more cost-efficient for you.
I’ve also experienced the error when multiple long-running workloads try to start after rebooting the node.
Restarting the nvidia-device-plugin pod and the workloads requesting GPU resources seems to fix the issue.
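A quick way to do that without deleting pods by hand is to roll the device plugin daemonset and then the affected workloads; the resource names below are placeholders for your own GPU workloads.
kubectl -n nvidia-device-plugin rollout restart daemonset
kubectl -n <workload-namespace> rollout restart deployment <gpu-workload>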
Using Argo CD, I’ve added a negative sync-wave annotation to make sure the nvidia-device-plugin is started before the workloads to avoid this issue
annotations:
  argocd.argoproj.io/sync-wave: "-1"
Addendum#
I first tried to use the NVIDIA GPU Operator, which I understood to be a catch-all solution that installs the device plugin as well as the drivers and container toolkit. However, I couldn’t get it to work, so I opted for the unfortunately more manual approach of installing the device plugin, drivers, and container toolkit as separate components.
It could be something with my setup, or I’ve misunderstood something in the documentation. If you have a solution I’d love to hear from you!
Summary#
I’m running Argo CD with Kustomize + Helm in an attempt to follow GitOps best-practices. My full homelab configuration as of the writing of this article can be found on GitHub as a reference.
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonAnnotations:
  argocd.argoproj.io/sync-wave: "-1"
resources:
  - namespace.yaml
  - cm-time-slicing.yaml
helmCharts:
  - name: nvidia-device-plugin
    repo: https://nvidia.github.io/k8s-device-plugin
    version: 0.14.2
    releaseName: "nvidia-device-plugin"
    namespace: nvidia-device-plugin
    includeCRDs: true
    valuesFile: values.yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-device-plugin
# values.yaml
config:
  name: cm-time-slicing
  default: time-slicing
# cm-time-slicing.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-time-slicing
  namespace: nvidia-device-plugin
data:
  time-slicing: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 10