[cfarm-users] gcc102 (sparc64) down again for maintenance/repairs
Zach van Rijn
me at zv.io
Fri Dec 2 19:38:17 CET 2022
On Thu, 2022-12-01 at 23:22 -0600, Jacob Bachmeyer wrote:
> Zach van Rijn wrote:
> > On Wed, 2022-11-30 at 21:21 -0600, Jacob Bachmeyer wrote:
> >
> > > ...
> ...those panics during early boot, strongly suggest bad RAM as
> Bruno Haible suggested.
I agree it is likely a hardware issue. The system requires either
half or full DIMM population, so unfortunately the only stable
configuration (without replacing hardware) is half the memory and
half the cores. But 16 cores and 64 GB of memory is probably OK for
the farm for now, and I will replace the defective module(s) early
next year.
It's online again, and hopefully remains so:
root@gcc102:~# lscpu
Architecture: sparc64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Big Endian
CPU(s): 128
On-line CPU(s) list: 0-63,128-191
Model name: UltraSparc T3 (Niagara3)
Thread(s) per core: 4 <-- actually 8?
Core(s) per socket: 16 <-- actually 8?
Socket(s): 2
Flags: sun4v
Caches (sum of all):
L1d: 1 MiB (128 instances)
L1i: 2 MiB (128 instances)
L2: 768 MiB (128 instances)
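A quick cross-check of the topology lines above (a sketch, not something run on gcc102): the on-line list "0-63,128-191" expands to 128 logical CPUs. lscpu's reading (2 sockets x 16 cores x 4 threads) and the annotated reading (2 sockets x 8 cores x 8 threads; UltraSPARC T3 cores are 8-way threaded) both multiply out to 128, so the CPU count alone cannot distinguish them. Only the annotated reading gives the 16 cores mentioned above. The helper name parse_cpu_list is hypothetical.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-63,128-191" into CPU numbers.

    Hypothetical helper for illustration; handles comma-separated
    single numbers and inclusive "a-b" ranges.
    """
    cpus: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


online = parse_cpu_list("0-63,128-191")
print(len(online))

# sockets x cores/socket x threads/core, as lscpu reports it
lscpu_total = 2 * 16 * 4
lscpu_cores = 2 * 16
# sockets x cores/socket x threads/core, per the "actually 8?" annotations
annotated_total = 2 * 8 * 8
annotated_cores = 2 * 8
print(lscpu_total, lscpu_cores, annotated_total, annotated_cores)
```

Both totals match the 128 on-line CPUs; lscpu's version implies 32 cores, while the annotated version implies 16.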
Pierre, feel free to re-enable your cron job.
Jacob, thank you for your careful analysis and brainstorming.
ZV