[cfarm-users] cfarm109 GPU access problem
Zach van Rijn
me at zv.io
Thu Mar 12 01:20:38 CET 2026
On Thu, 2026-03-12 at 00:44 +0100, Thomas Schwinge wrote:
> Hi Zach!
>
> On 2026-03-11T16:14:56-0500, Zach van Rijn <me at zv.io> wrote:
> > On Wed, 2026-03-11 at 16:36 +0100, Thomas Schwinge wrote:
> > > Is there some (permissions?) problem on cfarm109? With
> > > 'nvidia-smi', there are some errors, but the GPU shows up:
> > >
> > > $ nvidia-smi
> > > NvRmMemInitNvmap failed: error Permission denied
> > > NvRmMemMgrInit failed: Memory Manager Not supported,
> > > line
> > > 333
> > > NvRmMemMgrInit failed: error type 196626
> > > libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
> > > NvRmMemInitNvmap failed: error Permission denied
> > > NvRmMemMgrInit failed: Memory Manager Not supported,
> > > line
> > > 333
> > > NvRmMemMgrInit failed: error type 196626
> > > libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196625
> > > Wed Mar 11 07:19:17 2026
>
> Specifically, per 'strace', I see:
>
> [...]
> openat(AT_FDCWD, "/dev/nvmap", O_RDONLY|O_CLOEXEC) = -1
> EACCES (Permission denied)
> write(2, "NvRmMemInitNvmap failed: error P"..., 49) = 49
> [...]
>
> So it fails to open '/dev/nvmap':
>
> $ ls -alF /dev/nvmap
> cr--r----- 1 root video 10, 123 Jan 1 1970 /dev/nvmap
>
> I'm not in group "video" -- but also not sure if that's really
> the expected permissions setup for this file.
>
> > Indeed it is a permissions error (I only tested as root)
>
> Uh... ;-P
>
> Before we analyze/experiment any further, please try the
> following.
> Reboot the system. Does '/dev/nvmap' already exist (probably
> not?); if yes, what are its permissions?
As requested, a full reboot, then immediately the following:
root at cfarm109:~# uptime && ls -l /dev/nvmap
18:54:18 up 1 min, 2 users, load average: 0.49, 0.20, 0.07
cr--r----- 1 root video 10, 123 Mar 11 18:53 /dev/nvmap
> Now run, for example, 'nvidia-smi' as non-root (!) user. Does
> '/dev/nvmap' now exist (probably yes); if yes, what are its
> permissions?
Same as before? It already exists with the same permissions.
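As a quick sanity check from the unprivileged side (a generic
shell sketch, nothing cfarm-specific; the group name 'video' is
taken from the 'ls -alF' output above), one can compare the
device's owning group against the current user's group list:

```shell
# Print the group that owns the device node (silently skipped if absent)
stat -c '%G' /dev/nvmap 2>/dev/null

# Check whether the current user is a member of group "video"
if id -nG | grep -qw video; then
    echo "member of video"
else
    echo "not a member of video"
fi
```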
>
> If I remember correctly, long ago (with "classic" GPU
> card/drivers), I once had a similar issue, where 'nvidia-
> modprobe' (?) wouldn't set up the '/dev/nvidia*' permissions
> correctly, if the first access (triggering
> the 'nvidia' etc. modules load) was as root vs. non-root user.
This makes sense. But since the file already exists at boot, I
don't see how this workaround would help.
I modified /etc/udev/rules.d/99-tegra-devices.rules to set mode
0666 on /dev/nvmap, which seems to have helped somewhat, but some
/dev/dri* permissions were still not suitable; it is maybe a bad
approach altogether, even if we can surgically modify the
permissions.
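For reference, the kind of override line involved looks roughly
like this (a sketch only; the actual match in
99-tegra-devices.rules may differ, and the KERNEL name is
inferred from the /dev/nvmap node):

```
KERNEL=="nvmap", MODE="0666"
```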
Instead of a full reboot, this should be sufficient?
# udevadm control --reload-rules && udevadm trigger
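If we go back down that path, the re-trigger can also be scoped
to just the one device instead of everything (untested here;
requires root):

```shell
udevadm control --reload-rules           # pick up the edited rules file
udevadm trigger --sysname-match=nvmap    # re-run rules for just nvmap
udevadm settle                           # wait for the events to finish
```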
It's faster to test changes that way but I want to try one more
approach before getting into the weeds.
I reverted those changes to test one more thing: I added an
unprivileged account to both the 'video' and 'render' groups (the
/dev/dri* nodes belong to group 'render').
That seems to have done the trick:
zv at cfarm109:~/311$ ./llama.cpp/build-cuda/bin/llama-cli -s 0 -c 0 -m models/GLM-4.7-Flash-UD-Q8_K_XL.gguf
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 125771 MiB):
Device 0: NVIDIA Thor, compute capability 11.0, VMM: yes,
VRAM: 125771 MiB (121255 MiB free)
Loading model...
[ Prompt: 54.6 t/s | Generation: 26.1 t/s ]
:)
Does adding all users (and new users by default) to both of
these groups make sense, or do you think there is a better way?
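In case it helps the discussion: on Debian-style systems, one way
to do both halves is the following (a sketch; the account name
'someuser' is a placeholder, and the adduser.conf knobs only
apply to accounts created via 'adduser'):

```shell
# Existing account: append to supplementary groups (root required;
# takes effect on the user's next login)
usermod -aG video,render someuser

# New accounts by default: in /etc/adduser.conf set
#   ADD_EXTRA_GROUPS=1
#   EXTRA_GROUPS="video render"
```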
Zach