[cfarm-users] gcc102 (sparc64) down again for maintenance/repairs

Segher Boessenkool segher at kernel.crashing.org
Sun Dec 11 16:09:18 CET 2022


Hi Zach,

On Fri, Dec 09, 2022 at 09:12:06AM -0600, Zach van Rijn via cfarm-users wrote:
> On Fri, 2022-12-09 at 15:42 +0100, Pierre Muller via cfarm-users
> wrote:
> > ...
> > 
> >   It still seems that there are CPU lockup :-(

*Soft* lockups.  Tasks that were unresponsive for more than 20s.  This
has been "normal" on bigger Linux systems (a hundred cores or so) for
very many years, and although it is scary, it does not harm much (there
typically are other cores still available to do any work).  Some of the
scalability problems are solved over time, but new ones crop up as well.

> *sigh* I think the machine should be replaced.

Question.  Does it have ECC RAM?  It should of course; so how can bad
RAM go undetected then?

> Questions for the farm:
> 
>   * Do you want a replacement T3-2 (which has the last in-order
>     CPU they made), or something newer like a T5-2, which would
>     be faster/more useful like gcc202 but a dedicated machine?

I have no opinion.  Any larger machine works for us, I think.

>   * Is gcc202 good enough for the farm's sparc64 needs? If so,
>     is there different hardware that would be interesting? This
>     might be a good opportunity to set up "comparison" machines,
>     e.g. two identical machines with different operating systems.

I think people did find it useful, yes.  And it certainly is good to
have more than one machine.

The cfarm is not there for benchmarketing or any other kind of
comparative system evaluation, it is there to help open source
developers do their thing; what kind of comparison thing do you have
in mind?  Not criticising you, just confused what your goal is here :-)


Segher

