[cfarm-users] Upgrade status and status on cfarm offline
mulx at aplu.fr
Sat Aug 19 10:59:51 CEST 2017
Here is a short summary of the previously announced upgrades and of the
servers that are currently offline [¹].
As announced here [²], gcc12, gcc13 and gcc14 were due to be upgraded to
a more recent Debian release.
Unfortunately, gcc13 and gcc14 are now offline indefinitely. We have
asked the people at Smile's data center to help us reinstall them from
scratch.
gcc13 failed to reboot after the in-place upgrade from Debian 5 to 6
(probably a GRUB misconfiguration or missing firmware).
gcc14's hard disk died during the in-place upgrade from Debian 5 to 6
(smartctl did not report anything wrong beforehand, but dmesg was full
of I/O errors during the upgrade).
gcc12 is still running Debian 5, with a pretty impressive uptime (more
than 7 years). I did not start its upgrade because I am not sure there
is enough free space on / for it (even after cleaning /tmp), and the
long uptime made me wonder whether this server would survive a reboot
at all.
gcc113, gcc115 and gcc116 were offline a few days ago because of a disk
configuration issue. They are back online. Thanks to the people at
OSUOSL for helping us get those boards back online.
gcc117 is offline. We are in contact with OSUOSL and are trying to reach
AMD for support. For now the board does not boot at all, and we have no
idea when we will be able to bring it back online.
gcc21 and gcc76 are still offline. People from IUT and INSA Rouen may
take a look in the coming weeks (they were on summer holiday).
gcc67 is unstable, and we are aware of it. There is a bug either in the
processor or in the kernel (maybe both); the last captured crash was a
kernel Oops. From what we found, we are not the only ones having
stability issues with AMD Ryzen and Linux.
As far as I know, gcc20 and gcc119 are up, even though the website
reports them as down.
That's all :)