[cfarm-users] gcc112: disk errors

Segher Boessenkool segher at kernel.crashing.org
Mon Apr 23 18:03:38 CEST 2018


On Mon, Apr 23, 2018 at 08:05:09AM -0700, David Edelsohn via cfarm-users wrote:
> I have reported the problem to OSU.
> 
> I don't know whether it's limited to a filesystem corruption bug or is a
> symptom of a hardware disk failure.

I kicked off all users and tried xfs_repair.  xfs_repair -n (the
no-modify check) finds a lot of errors, but the actual repair refuses to
run because the device is busy (although lsof claims it is not).

We probably need a reboot, and yes, I fear hardware failure :-(


Segher


> On Mon, Apr 23, 2018 at 6:43 AM, Alexander Monakov via cfarm-users
> <cfarm-users at lists.tetaneutral.net> wrote:
> > (Bcc'ed to cfarm-admins@)
> >
> > Hello,
> >
> > on gcc112 there are disk errors and new logins are denied. In dmesg there's
> >
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): Metadata corruption detected at xfs_inode_buf_read_verify+0x84/0x110 [xfs], xfs_inode block 0xbd0a8540
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): Unmount and run xfs_repair
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): First 64 bytes of corrupted metadata buffer:
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e000: 46 69 6c 65 50 61 74 68 2e 6d 61 6b 65 64 69 72  FilePath.makedir
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e010: 73 7d 20 73 75 63 63 65 65 64 73 20 77 68 65 6e  s} succeeds when
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e020: 20 63 61 6c 6c 65 64 20 6f 6e 20 61 20 64 69 72   called on a dir
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e030: 65 63 74 6f 72 79 20 74 68 61 74 20 61 6c 72 65  ectory that alre
> >
> > this "Metadata corruption" message is repeated a few times until
> >
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): Metadata corruption detected at xfs_inode_buf_read_verify+0x84/0x110 [xfs], xfs_inode block 0xbd0a8540
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): Unmount and run xfs_repair
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): First 64 bytes of corrupted metadata buffer:
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e000: 46 69 6c 65 50 61 74 68 2e 6d 61 6b 65 64 69 72  FilePath.makedir
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e010: 73 7d 20 73 75 63 63 65 65 64 73 20 77 68 65 6e  s} succeeds when
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e020: 20 63 61 6c 6c 65 64 20 6f 6e 20 61 20 64 69 72   called on a dir
> > kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e030: 65 63 74 6f 72 79 20 74 68 61 74 20 61 6c 72 65  ectory that alre
> > kern  :alert : [Mon Apr 23 11:32:37 2018] XFS (dm-6): metadata I/O error: block 0xbd0a8540 ("xfs_trans_read_buf_map") error 117 numblks 16
> > kern  :notice: [Mon Apr 23 11:32:37 2018] XFS (dm-6): xfs_do_force_shutdown(0x1) called from line 370 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xd00000001ce98e9c
> > kern  :alert : [Mon Apr 23 11:32:38 2018] XFS (dm-6): I/O Error Detected. Shutting down filesystem
> > kern  :alert : [Mon Apr 23 11:32:38 2018] XFS (dm-6): metadata I/O error: block 0x70271a2a ("xlog_iodone") error 5 numblks 64
> > kern  :notice: [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_do_force_shutdown(0x2) called from line 1203 of file fs/xfs/xfs_log.c.  Return address = 0xd00000001ce83b8c
> > kern  :alert : [Mon Apr 23 11:32:38 2018] XFS (dm-6): Please umount the filesystem and rectify the problem(s)
> > kern  :warn  : [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> > kern  :warn  : [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_log_force: error -5 returned.
> > kern  :warn  : [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_log_force: error -5 returned.
> > kern  :alert : [Mon Apr 23 11:32:38 2018] XFS (dm-6): metadata I/O error: block 0x70252003 ("xfs_trans_read_buf_map") error 5 numblks 1
> > kern  :notice: [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_do_force_shutdown(0x1) called from line 370 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xd00000001ce98e9c
> > kern  :warn  : [Mon Apr 23 11:32:38 2018] XFS (dm-6): xfs_log_force: error -5 returned.
> > kern  :warn  : [Mon Apr 23 11:32:49 2018] XFS (dm-6): xfs_log_force: error -5 returned.
> > kern  :warn  : [Mon Apr 23 11:33:19 2018] XFS (dm-6): xfs_log_force: error -5 returned.
> >
> >
> > and after that the "xfs_log_force" error is repeated approx. every 30 seconds.
> >
> > Alexander
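
For anyone reading the dmesg excerpt above: the "First 64 bytes" dump and the
numeric error codes can be decoded mechanically. The sketch below is only an
illustration, not something that was run on gcc112. The helper names
(decode_hexdump, errno_name) are made up for this example, and the errno
numbers are looked up in the table of the machine running the script, so the
117 lookup only matches the log on Linux systems that use the generic errno
numbering (where it is EUCLEAN, "Structure needs cleaning"). An intact XFS
on-disk inode starts with the magic bytes "IN"; here the block holds plain
text instead, which is why the read verifier rejects it.

```python
import errno
import os
import re

# The four hex-dump rows quoted from the dmesg output in this thread.
DMESG_LINES = [
    "kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e000: 46 69 6c 65 50 61 74 68 2e 6d 61 6b 65 64 69 72  FilePath.makedir",
    "kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e010: 73 7d 20 73 75 63 63 65 65 64 73 20 77 68 65 6e  s} succeeds when",
    "kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e020: 20 63 61 6c 6c 65 64 20 6f 6e 20 61 20 64 69 72   called on a dir",
    "kern  :alert : [Mon Apr 23 11:32:37 2018] c0000011fe95e030: 65 63 74 6f 72 79 20 74 68 61 74 20 61 6c 72 65  ectory that alre",
]

# A 16-digit kernel address, a colon, then sixteen space-separated hex bytes.
HEX_ROW = re.compile(r"\b[0-9a-f]{16}: ((?:[0-9a-f]{2} ){16})")


def decode_hexdump(lines):
    """Concatenate the hex bytes of every dump row and return them as text."""
    data = bytearray()
    for line in lines:
        match = HEX_ROW.search(line)
        if match:
            data += bytes.fromhex(match.group(1))
    return data.decode("ascii", errors="replace")


def errno_name(number):
    """Map an errno number to its symbolic name on the local platform."""
    return errno.errorcode.get(abs(number), "unknown")


if __name__ == "__main__":
    text = decode_hexdump(DMESG_LINES)
    print(repr(text))
    # An XFS inode would start with the magic "IN"; this buffer does not.
    print("starts with inode magic:", text.startswith("IN"))
    # Error codes seen in the log (kernel messages show them as 117/-117, 5/-5).
    for code in (117, 5):
        print(code, errno_name(code), os.strerror(code))
```

The decoded buffer is ordinary text that looks like it came from a file's
contents, sitting where inode metadata should be. That fits either a
misdirected write or a filesystem bug, so it does not settle the
hardware-versus-software question by itself.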

