[cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

Zach van Rijn me at zv.io
Fri Dec 2 19:38:17 CET 2022


On Thu, 2022-12-01 at 23:22 -0600, Jacob Bachmeyer wrote:
> Zach van Rijn wrote:
> > On Wed, 2022-11-30 at 21:21 -0600, Jacob Bachmeyer wrote:
> >   
> > > ...
> ...those panics during early boot, strongly suggest bad RAM as
> Bruno Haible suggested. 


I agree it is likely a hardware issue. The system requires either
half or full DIMM population, so unfortunately the only stable
configuration (without replacing hardware) is half the memory and
half the cores.

But 16 cores and 64 GB of memory is probably OK for the farm for
now, and I will replace the defective module(s) early next year.

It's online again, and hopefully remains so:

root@gcc102:~# lscpu
Architecture:          sparc64
  CPU op-mode(s):      32-bit, 64-bit
  Byte Order:          Big Endian
CPU(s):                128
  On-line CPU(s) list: 0-63,128-191
Model name:            UltraSparc T3 (Niagara3)
  Thread(s) per core:  4  <-- actually 8?
  Core(s) per socket:  16 <-- actually 8?
  Socket(s):           2
  Flags:               sun4v
Caches (sum of all):   
  L1d:                 1 MiB (128 instances)
  L1i:                 2 MiB (128 instances)
  L2:                  768 MiB (128 instances)
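
As a quick sanity check on the figures above, here is a minimal Python sketch that counts the CPUs in the on-line list and compares the result with the two possible topologies: the one lscpu reports (16 cores per socket, 4 threads per core) and the one suggested by the "actually 8?" annotations (8 cores per socket, 8 threads per core). The helper name count_cpus is hypothetical and only for illustration; the input values are copied from the output above.

```python
def count_cpus(cpu_list: str) -> int:
    """Count CPUs in an lscpu-style list such as '0-63,128-191'."""
    total = 0
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. '0-63' covers 64 CPUs.
            first, last = part.split("-")
            total += int(last) - int(first) + 1
        else:
            # Single CPU number, e.g. '5'.
            total += 1
    return total


sockets = 2
online = count_cpus("0-63,128-191")   # on-line CPU list from lscpu
reported = sockets * 16 * 4           # lscpu: 16 cores/socket, 4 threads/core
annotated = sockets * 8 * 8           # annotations: 8 cores/socket, 8 threads/core

print(online, reported, annotated)    # prints "128 128 128"
print("cores (annotated):", sockets * 8)  # prints "cores (annotated): 16"
```

Both topologies multiply out to the same 128 threads, so the CPU count alone cannot tell them apart; only the annotated reading gives 16 cores in total, which matches the half-populated configuration described above.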


Pierre, you can feel free to re-enable your cron job.

Jacob, thank you for your careful analysis and brainstorming.


ZV


