Update NVIDIA driver symlink script by casparvl · Pull Request #158 · EESSI/software-layer-scripts

casparvl · 2026-02-04T13:41:06Z

We'll need the following variant symlinks to be in place before this script can work as intended:

ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia
ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/aarch64/lib/nvidia
ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/riscv64/lib/nvidia

And then:

ln -s '$(EESSI_NVIDIA_OVERRIDE_DEFAULT:-/dev/null)' /cvmfs/software.eessi.io/defaults/nvidia

This can then be quite easily tested from within the container:

./eessi_container.sh -a rw -r software.eessi.io -b $<host-software-layer-scripts>:/software-layer-scripts --nvidia all
cd /software-layer-scripts/scripts/gpu_support/nvidia
./link_nvidia_host_libraries.sh

This should error out stating that the variant symlink resolves to /dev/null. Then you can set EESSI_NVIDIA_OVERRIDE_DEFAULT in /etc/cvmfs/default.local (e.g. to /opt/eessi/nvidia) and run the linking script again; this should then install the symlinks.
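For repeated test runs it can help to make the config change idempotent. The following is a minimal sketch, not part of this PR: `set_cvmfs_var` is a hypothetical helper name, and the sketch assumes the config file holds plain KEY=VALUE lines. How the client picks up the change (for example a reload or a remount) is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): set KEY=VALUE in a CVMFS client
# config file. Any existing assignment of KEY is dropped first, so running
# it twice leaves exactly one line for that key.
set_cvmfs_var() {
    local file="$1" key="$2" value="$3" tmp
    touch "$file"
    tmp="$(mktemp)"
    # Keep every line except earlier assignments of this key.
    # grep exits non-zero when nothing is left to print, hence '|| true'.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo against a scratch file instead of the real /etc/cvmfs/default.local.
demo_conf="$(mktemp)"
set_cvmfs_var "$demo_conf" EESSI_NVIDIA_OVERRIDE_DEFAULT /opt/eessi/nvidia
set_cvmfs_var "$demo_conf" EESSI_NVIDIA_OVERRIDE_DEFAULT /opt/eessi/nvidia
cat "$demo_conf"
rm -f "$demo_conf"
```

On a real system the first argument would be /etc/cvmfs/default.local, which normally requires root to modify.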

casparvl · 2026-02-04T13:57:16Z

Although we don't have the symlinks yet, I can actually already test this in the container - it will just create the symlinks in /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/ in the writeable overlay. That's fine.

What I did:

$ cd /software-layer-scripts/scripts/gpu_support/nvidia/
$ umask 0022
$ source /cvmfs/software.eessi.io/versions/2025.06/init/lmod/bash
# For some reason this failed to load the module - some module cache issue?
$ module load EESSI/2025.06
$ cat > dummy.c <<'EOF'
int main(void) { return 0; }
EOF
$ gcc -Wall -Wl,--no-as-needed -lcuda dummy.c -o dummy -L /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/
# singularity has /.singularity.d/libs with the CUDA drivers in the LD_LIBRARY_PATH, but those are not the ones we want to find...
$ unset LD_LIBRARY_PATH
$ ldd dummy
        linux-vdso.so.1 (0x00007ffc59bb4000)
        libcuda.so.1 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so.1 (0x000014f19b377000)
...

Works as intended. Once the variant symlinks are in place, we should retest: first with the EESSI_NVIDIA_OVERRIDE_DEFAULT variant symlink and, once that works, with the EESSI_202506_NVIDIA_OVERRIDE variant symlink.
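For the retests, the manual inspection of the ldd output could be scripted. A minimal sketch, assuming the usual `name => path (address)` layout of ldd lines; `lib_resolves_under` is a hypothetical helper name, not something from this PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and succeed only if the
# given library is listed and resolves to a path under the expected directory.
lib_resolves_under() {
    local lib="$1" dir="${2%/}"
    awk -v lib="$lib" -v dir="$dir/" '
        # Lines of interest look like: "libcuda.so.1 => /some/path (0x...)"
        $1 == lib && $2 == "=>" {
            found = 1
            # index() == 1 means the resolved path starts with the directory.
            if (index($3, dir) == 1) ok = 1
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Demo using the two ldd lines quoted above instead of a real binary.
nvidia_dir=/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia
if lib_resolves_under libcuda.so.1 "$nvidia_dir" <<EOF
        linux-vdso.so.1 (0x00007ffc59bb4000)
        libcuda.so.1 => $nvidia_dir/libcuda.so.1 (0x000014f19b377000)
EOF
then
    echo "libcuda.so.1 resolves under $nvidia_dir"
fi
```

In the container this would be used as `ldd dummy | lib_resolves_under libcuda.so.1 <dir>`, after unsetting LD_LIBRARY_PATH as in the session above.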

bedroge · 2026-02-17T12:33:06Z

Tested in the container using EESSI 2025.06 and without having configured the variant symlinks:

ERROR: /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia is a symlink pointing to /cvmfs/software.eessi.io/defaults/nvidia, which is a symlink pointing to /dev/null
If you want to symlink the drivers in a single location for all EESSI versions, please define the EESSI_NVIDIA_OVERRIDE_DEFAULT variant symlink in your local CVMFS configuration to point to writeable location. This will change the target of symlink /cvmfs/software.eessi.io/defaults/nvidia.
If you want to symlink the drivers only for this version of EESSI (2025.06), please define the EESSI__NVIDIA_OVERRIDE variant symlink in your local CVMFS configuration to point to writeable location. This will change the target of symlink /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia.

With the variant symlink reconfigured as EESSI_NVIDIA_OVERRIDE_DEFAULT=/opt/eessi/nvidia:

Ensure the final target of /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia (/opt/eessi/nvidia) exists
Host NVIDIA GPU drivers linked successfully for EESSI

Wiping that dir and doing it again using EESSI_202506_NVIDIA_OVERRIDE=/opt/eessi/nvidia yields the same result.

Also checked the symlinks, and they pointed to the expected locations.
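A check like this can be repeated with a small helper that follows the symlink chain one hop at a time. This is a sketch under assumptions, not the logic of link_nvidia_host_libraries.sh: `final_link_target` and `ends_at_dev_null` are hypothetical names, and the sketch relies on `readlink` reporting the target the client has already expanded for a variant symlink.

```shell
#!/usr/bin/env bash
# Hypothetical helper: follow a chain of symlinks hop by hop and print the
# first path that is not itself a symlink. The hop limit guards against loops.
final_link_target() {
    local path="$1" target hops=0
    while [ -L "$path" ] && [ "$hops" -lt 40 ]; do
        target="$(readlink "$path")"
        # Relative targets are interpreted relative to the link's directory.
        case "$target" in
            /*) ;;
            *) target="$(dirname "$path")/$target" ;;
        esac
        path="$target"
        hops=$((hops + 1))
    done
    printf '%s\n' "$path"
}

# Succeeds when the chain ends at /dev/null, i.e. no override is configured.
ends_at_dev_null() {
    [ "$(final_link_target "$1")" = "/dev/null" ]
}
```

For example, `ends_at_dev_null /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia` would be expected to succeed before any override is configured, and `final_link_target` on the same path would be expected to print /opt/eessi/nvidia with the settings used above.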

bedroge · 2026-02-17T12:44:06Z

scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh

+    # Do some checks on existence of links and that we don't end up at /dev/null (the default), so we can print some informative information
+    # One downside is that we can't explicitely check if something is a variant symlink, so we'll just assume that if it's a link AND it
+    # lives in our CVMFS repository, it must be a variant symlink
+    nvidia_trusted_dir="${EESSI_EPREFIX}/lib/nvidia"


Does this mean that the script will no longer work for 2023.06?

Hm, yeah, that's annoying: this script is in an unversioned prefix. If we deploy this only for 2025.06, we keep the old version for 2023.06, but then if we want to update that one, we have to revert all these changes, etc. Maybe we should just duplicate the script, i.e. create something like scripts/gpu_support/nvidia/2023.06/link_nvidia_host_libraries.sh? Or should it be at a higher level, 2023.06/scripts/gpu_support...?
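To make the first option concrete, here is a minimal sketch of how a caller could pick a per-version copy when one exists and fall back to the unversioned script otherwise. The directory layout and the `select_link_script` name are hypothetical and only illustrate the idea; nothing like this is implemented in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical selection logic: prefer <base>/<version>/<script> and fall
# back to <base>/<script>. Prints the chosen path, or returns 1 if neither exists.
select_link_script() {
    local base="$1" version="$2" name="link_nvidia_host_libraries.sh"
    if [ -n "$version" ] && [ -f "$base/$version/$name" ]; then
        printf '%s\n' "$base/$version/$name"
    elif [ -f "$base/$name" ]; then
        printf '%s\n' "$base/$name"
    else
        return 1
    fi
}
```

With this layout, a call such as `select_link_script scripts/gpu_support/nvidia 2023.06` would return the duplicated 2023.06 script, while a version without its own copy would get the unversioned one.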

Co-authored-by: Bob Dröge <b.e.droge@rug.nl>

Update NVIDIA driver symlink script

6a5456d

casparvl pushed a commit to casparvl/docs that referenced this pull request Feb 4, 2026

Update documentation to account for changes from EESSI/software-layer-scripts#158 and EESSI/compatibility-layer#229

87ab821

casparvl mentioned this pull request Feb 4, 2026

Update documentation to account for changes in GPU support in 2025.06 EESSI/docs#672

Draft

bedroge mentioned this pull request Feb 4, 2026

add new host injection variant symlinks for 2025.06 EESSI/filesystem-layer#263

Merged