[cfarm-users] gcc112: disk errors

Aymeric mulx at aplu.fr
Tue Apr 24 09:45:39 CEST 2018





On 2018-04-23 18:03, Segher Boessenkool via cfarm-users wrote:
> On Mon, Apr 23, 2018 at 08:05:09AM -0700, David Edelsohn via 
> cfarm-users wrote:
>> I have reported the problem to OSU.
>> 
>> Don't know if it's limited to a filesystem corruption bug or symptom
>> of a hardware disk failure.
> 
> I kicked off all users and tried an xfs_repair.  xfs_repair -n finds
> a lot of errors, but xfs_repair does not want to run because the device
> is busy (although lsof claims it is not).
> 
> We probably need a reboot, and yes I fear hardware failure :-(


I think an xfs crash is more likely than a hardware failure (like we had on 
gcc118).
According to lvm, the system uses multipath for storage, and multipath looks 
fine, as does the root filesystem.

I've forced a reboot and the system came back online with /home mounted. 
dmesg shows an xfs recovery:
[   12.608593] XFS (dm-6): Mounting V4 Filesystem
[   12.841363] XFS (dm-6): Starting recovery (logdev: internal)
[   29.126997] XFS (dm-6): Ending recovery (logdev: internal)
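
For reference, the time spent in log recovery can be read off those timestamps. A minimal Python sketch, assuming the dmesg lines are pasted in as a string (the helper name recovery_seconds is hypothetical, not part of any tool):

```python
import re

# The three dmesg lines quoted above, pasted verbatim.
DMESG = """\
[   12.608593] XFS (dm-6): Mounting V4 Filesystem
[   12.841363] XFS (dm-6): Starting recovery (logdev: internal)
[   29.126997] XFS (dm-6): Ending recovery (logdev: internal)
"""

# Captures: kernel timestamp, device name, recovery phase.
LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+XFS \(([^)]+)\): (Starting|Ending) recovery"
)

def recovery_seconds(text):
    """Return {device: seconds between 'Starting' and 'Ending' recovery}."""
    started = {}
    durations = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # not a recovery start/end line
        stamp, dev, phase = float(m.group(1)), m.group(2), m.group(3)
        if phase == "Starting":
            started[dev] = stamp
        elif dev in started:
            durations[dev] = stamp - started.pop(dev)
    return durations

if __name__ == "__main__":
    for dev, secs in recovery_seconds(DMESG).items():
        print(f"{dev}: log recovery took {secs:.1f} s")
```

On the lines above this gives roughly 16 seconds for dm-6.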

I ran an xfs_repair to be sure; xfs looks ok… for now.

If the error comes back, we may try switching to ext4 (this 
will mean losing all data).

Aymeric.

