[cfarm-users] cfarm109 memory monitoring

Zach van Rijn me at zv.io
Sat Mar 7 20:26:58 CET 2026


On cfarm109, the system may report less available memory than
is actually free, even when no applications are visibly using
it. In that state your application may fail to allocate memory,
and it could possibly affect system stability (not yet
verified).

Memory usage can be seen in the Munin dashboard [3] or directly
on the machine; please check before running high-memory apps.
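For a quick check directly on the machine, something like the
following (standard Linux tooling; exact output format varies)
shows what the kernel currently reports:

  $ free -h
  $ grep MemAvailable /proc/meminfo

MemAvailable is the kernel's estimate of how much memory can be
claimed by new workloads without swapping, so it is the number
to watch here rather than plain "free" memory.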

While testing a CUDA application I noticed that memory may not
be freed after the program exits.

A search online suggests [1] that this may be Thor-specific
(note I did not test 'ollama' but rather 'llama.cpp'). The same
test on the Spark does not reproduce the issue. Other reports
[2] involve different applications, so it may not be an
application bug.

I don't have time right now to investigate further, so any help
would be appreciated and I assume upstream is looking into it.

One workaround is:

  # sync && sysctl -w vm.drop_caches=3

I've implemented an equivalent in C and made a setuid
executable available at '/usr/local/bin/cfarm-drop-caches',
which allows any user to clean up the system. It is in the
default PATH, too.

  $ cfarm-drop-caches

It takes no arguments, produces no output, and should take only
a moment to execute. Running it should carry no risk, but
please first check that no one else is running a benchmark or
anything else that measures performance, since dropping caches
could skew their results.


Zach


[1]: https://github.com/ollama/ollama/issues/12283

[2]:
https://forums.developer.nvidia.com/t/vllm-container-on-jetson-thor-second-start-fails-until-vm-drop-caches-3-system-issue-or-thor-vllm-container-25-08-py3-base-bug/347575

[3]: https://portal.cfarm.net/munin/gccfarm/cfarm109/memory.html

