[cfarm-users] cfarm109 GPU access problem
Zach van Rijn
me at zv.io
Thu Mar 12 01:20:38 CET 2026
On Thu, 2026-03-12 at 00:44 +0100, Thomas Schwinge wrote:
> Hi Zach!
>
> On 2026-03-11T16:14:56-0500, Zach van Rijn <me at zv.io> wrote:
> > On Wed, 2026-03-11 at 16:36 +0100, Thomas Schwinge wrote:
> > > Is there some (permissions?) problem on cfarm109? With
> > > 'nvidia-smi', there are some errors, but the GPU shows up:
> > >
> > > $ nvidia-smi
> > > NvRmMemInitNvmap failed: error Permission denied
> > > NvRmMemMgrInit failed: Memory Manager Not supported,
> > > line
> > > 333
> > > NvRmMemMgrInit failed: error type 196626
> > > libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
> > > NvRmMemInitNvmap failed: error Permission denied
> > > NvRmMemMgrInit failed: Memory Manager Not supported,
> > > line
> > > 333
> > > NvRmMemMgrInit failed: error type 196626
> > > libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
> > > Wed Mar 11 07:19:17 2026
>
> Specifically, per 'strace', I see:
>
> [...]
> openat(AT_FDCWD, "/dev/nvmap", O_RDONLY|O_CLOEXEC) = -1
> EACCES (Permission denied)
> write(2, "NvRmMemInitNvmap failed: error P"..., 49) = 49
> [...]
>
> So it fails to open '/dev/nvmap':
>
> $ ls -alF /dev/nvmap
> cr--r----- 1 root video 10, 123 Jan 1 1970 /dev/nvmap
>
> I'm not in group "video" -- but also not sure if that's really
> the expected permissions setup for this file.
>
> > Indeed it is a permissions error (I only tested as root)
>
> Uh... ;-P
>
> Before we analyze/experiment any further, please try the
> following.
> Reboot the system. Does '/dev/nvmap' already exist (probably
> not?); if yes, what are its permissions?
As requested, a full reboot, then immediately the following:
root at cfarm109:~# uptime && ls -l /dev/nvmap
18:54:18 up 1 min, 2 users, load average: 0.49, 0.20, 0.07
cr--r----- 1 root video 10, 123 Mar 11 18:53 /dev/nvmap
> Now run, for example, 'nvidia-smi' as non-root (!) user. Does
> '/dev/nvmap' now exist (probably yes); if yes, what are its
> permissions?
Same as before? It already exists with the same permissions.
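As a quick sanity check from the unprivileged side (a generic
shell sketch, nothing cfarm-specific; the group name 'video' is
taken from the 'ls -alF' output above), one can compare the
device's owning group against the current user's group list:

```shell
# Print the group that owns the device node (silently skipped if absent)
stat -c '%G' /dev/nvmap 2>/dev/null

# Check whether the current user is a member of group "video"
if id -nG | grep -qw video; then
    echo "member of video"
else
    echo "not a member of video"
fi
```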
>
> If I remember correctly, long ago (with "classic" GPU
> card/drivers), I once had a similar issue, where 'nvidia-
> modprobe' (?) wouldn't set up the '/dev/nvidia*' permissions
> correctly, if the first access (triggering
> the 'nvidia' etc. modules load) was as root vs. non-root user.
This makes sense. But since the file already exists at boot, I
don't see how this workaround would help.
I modified /etc/udev/rules.d/99-tegra-devices.rules to set mode
0666 on /dev/nvmap, which seems to have helped somewhat, but some
/dev/dri* permissions were still not suitable; it is maybe a bad
approach altogether, even if we can surgically modify the
permissions.
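For reference, the kind of override line involved looks roughly
like this (a sketch only; the actual match in
99-tegra-devices.rules may differ, and the KERNEL name is
inferred from the /dev/nvmap node):

```
KERNEL=="nvmap", MODE="0666"
```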
Instead of a full reboot, this should be sufficient?
# udevadm control --reload-rules && udevadm trigger
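If we go back down that path, the re-trigger can also be scoped
to just the one device instead of everything (untested here;
requires root):

```shell
udevadm control --reload-rules           # pick up the edited rules file
udevadm trigger --sysname-match=nvmap    # re-run rules for just nvmap
udevadm settle                           # wait for the events to finish
```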
It's faster to test changes that way but I want to try one more
approach before getting into the weeds.
I reverted those changes to test one more thing: I added an
unprivileged account to both the 'video' and 'render' groups (the
/dev/dri* nodes belong to group 'render').
That seems to have done the trick:
zv at cfarm109:~/311$ ./llama.cpp/build-cuda/bin/llama-cli -s 0 -c 0 -m models/GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 125771 MiB):
Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes,
VRAM: 125771 MiB (121255 MiB free)
Loading model...
[ Prompt: 54.6 t/s | Generation: 26.1 t/s ]
:)
Does adding all users (and new users by default) to both of
these groups make sense, or do you think there is a better way?
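In case it helps the discussion: on Debian-style systems, one way
to do both halves is the following (a sketch; the account name
'someuser' is a placeholder, and the adduser.conf knobs only
apply to accounts created via 'adduser'):

```shell
# Existing account: append to supplementary groups (root required;
# takes effect on the user's next login)
usermod -aG video,render someuser

# New accounts by default: in /etc/adduser.conf set
#   ADD_EXTRA_GROUPS=1
#   EXTRA_GROUPS="video render"
```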
Zach