
Rig rebooting after a minute and I don't know why; any help is appreciated

Hi,
I have a rig with 5 cards (ASUS RX 580, Gigabyte RX 580, Gigabyte 5600 XT, XFX 5600 XT, Gigabyte Vega 56) on an MSI Z170 Gaming-X M7 board. In the BIOS, 4G is enabled, and Intel virtualization and VT are disabled; I tried both Gen1 and Gen2, and both 32 cycles and 96 cycles. The rig used to crash after 2-3 hours. I was OK with that, but since the last crash it crashes about a minute after it starts mining. Usually I see the Gigabyte 5600 XT or the Vega 56 dead.

I wanted to check the logs, but as I am a noob the only command I could find was "grep -a error /var/log/syslog | tail -n 100". I used that and got lots of lines, mostly about mounts, fatal GPU errors, and GPU post errors with some number codes, but I don't know what they mean or what I should do. I am not sure if these errors are the cause of the reboots, and I don't know how to check deeper. Any help is appreciated.

Here are my logs:

Nov 29 06:57:30 Rig_3645069 ntfs-3g[464]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 06:57:30 Rig_3645069 ntfs-3g[464]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 06:57:30 Rig_3645069 rngd[500]: read error
Nov 29 06:57:30 Rig_3645069 rngd[500]: read error
Nov 29 06:57:30 Rig_3645069 kernel: [ 10.271839][ T286] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 06:57:34 Rig_3645069 ntfs-3g[614]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 06:57:34 Rig_3645069 ntfs-3g[614]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:25:50 Rig_3645069 ntfs-3g[472]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:25:50 Rig_3645069 ntfs-3g[472]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:25:50 Rig_3645069 rngd[514]: read error
Nov 29 07:25:50 Rig_3645069 rngd[514]: read error
Nov 29 07:25:50 Rig_3645069 kernel: [ 10.190519][ T280] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:25:55 Rig_3645069 ntfs-3g[616]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:25:55 Rig_3645069 ntfs-3g[616]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:30:22 Rig_3645069 kernel: [ 10.224944][ T278] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:30:22 Rig_3645069 ntfs-3g[472]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:30:22 Rig_3645069 ntfs-3g[472]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:30:22 Rig_3645069 rngd[490]: read error
Nov 29 07:30:22 Rig_3645069 rngd[490]: read error
Nov 29 07:31:59 Rig_3645069 ntfs-3g[470]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:31:59 Rig_3645069 ntfs-3g[470]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:31:59 Rig_3645069 rngd[492]: read error
Nov 29 07:31:59 Rig_3645069 rngd[492]: read error
Nov 29 07:31:59 Rig_3645069 kernel: [ 10.129078][ T270] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:32:03 Rig_3645069 ntfs-3g[610]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:32:03 Rig_3645069 ntfs-3g[610]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:32:33 Rig_3645069 kernel: [ 48.792135][ T1002] amdgpu 0000:08:00.0: amdgpu: gpu post error!
Nov 29 07:32:33 Rig_3645069 kernel: [ 48.792402][ T1002] amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
Nov 29 07:32:33 Rig_3645069 kernel: [ 48.792776][ T1002] amdgpu: probe of 0000:08:00.0 failed with error -22
Nov 29 07:34:55 Rig_3645069 ntfs-3g[463]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:34:55 Rig_3645069 ntfs-3g[463]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:34:55 Rig_3645069 rngd[503]: read error
Nov 29 07:34:55 Rig_3645069 rngd[503]: read error
Nov 29 07:34:55 Rig_3645069 kernel: [ 10.261626][ T282] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:35:00 Rig_3645069 ntfs-3g[609]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:35:00 Rig_3645069 ntfs-3g[609]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:48:15 Rig_3645069 ntfs-3g[472]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:48:15 Rig_3645069 ntfs-3g[472]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:48:15 Rig_3645069 rngd[522]: read error
Nov 29 07:48:15 Rig_3645069 rngd[522]: read error
Nov 29 07:48:15 Rig_3645069 kernel: [ 10.144416][ T265] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:48:19 Rig_3645069 ntfs-3g[613]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:48:19 Rig_3645069 ntfs-3g[613]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.679741][ T1012] amdgpu 0000:04:00.0: loading /lib/firmware/amdgpu/polaris10_mec2_2.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.679742][ T1012] amdgpu 0000:04:00.0: Direct firmware load for amdgpu/polaris10_mec2_2.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.681435][ T1012] amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.681525][ T1012] amdgpu: probe of 0000:04:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.685621][ T1012] amdgpu 0000:08:00.0: loading /lib/firmware/amdgpu/vega10_gpu_info.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.685622][ T1012] amdgpu 0000:08:00.0: Direct firmware load for amdgpu/vega10_gpu_info.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.685965][ T1012] amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.688247][ T1012] amdgpu: probe of 0000:08:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.045914][ T1012] amdgpu 0000:09:00.0: loading /lib/firmware/amdgpu/polaris10_mec2_2.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.045915][ T1012] amdgpu 0000:09:00.0: Direct firmware load for amdgpu/polaris10_mec2_2.bin failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.047429][ T1012] amdgpu 0000:09:00.0: amdgpu: Fatal error during GPU init
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.047668][ T1012] amdgpu: probe of 0000:09:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.281597][ T1012] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.291995][ T1012] amdgpu: probe of 0000:0c:00.0 failed with error -4
Nov 29 07:51:34 Rig_3645069 ntfs-3g[471]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:51:34 Rig_3645069 ntfs-3g[471]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:51:34 Rig_3645069 rngd[493]: read error
Nov 29 07:51:34 Rig_3645069 rngd[493]: read error
Nov 29 07:51:34 Rig_3645069 kernel: [ 10.125907][ T272] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:51:38 Rig_3645069 ntfs-3g[610]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:51:38 Rig_3645069 ntfs-3g[610]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:53:56 Rig_3645069 ntfs-3g[469]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:53:56 Rig_3645069 ntfs-3g[469]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:53:56 Rig_3645069 rngd[498]: read error
Nov 29 07:53:56 Rig_3645069 rngd[498]: read error
Nov 29 07:53:56 Rig_3645069 kernel: [ 12.159315][ T273] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:54:00 Rig_3645069 ntfs-3g[609]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:54:00 Rig_3645069 ntfs-3g[609]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:57:02 Rig_3645069 ntfs-3g[455]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:57:02 Rig_3645069 ntfs-3g[455]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 07:57:02 Rig_3645069 kernel: [ 10.189003][ T275] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:57:02 Rig_3645069 rngd[505]: read error
Nov 29 07:57:02 Rig_3645069 rngd[505]: read error
Nov 29 07:57:09 Rig_3645069 ntfs-3g[614]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 07:57:09 Rig_3645069 ntfs-3g[614]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 08:00:08 Rig_3645069 ntfs-3g[469]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 08:00:08 Rig_3645069 ntfs-3g[469]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 08:00:08 Rig_3645069 kernel: [ 10.215576][ T280] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 08:00:09 Rig_3645069 rngd[519]: read error
Nov 29 08:00:09 Rig_3645069 rngd[519]: read error
Nov 29 08:00:14 Rig_3645069 ntfs-3g[615]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 08:00:14 Rig_3645069 ntfs-3g[615]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 08:02:06 Rig_3645069 ntfs-3g[474]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 08:02:06 Rig_3645069 ntfs-3g[474]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096
Nov 29 08:02:06 Rig_3645069 rngd[498]: read error
Nov 29 08:02:06 Rig_3645069 rngd[498]: read error
Nov 29 08:02:06 Rig_3645069 kernel: [ 10.013452][ T275] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 08:02:10 Rig_3645069 ntfs-3g[613]: Cmdline options: rw,noatime,errors=remount-ro,fmask=0133,dmask=0022,remove_hiberfile
Nov 29 08:02:10 Rig_3645069 ntfs-3g[613]: Mount options: rw,errors=remount-ro,allow_other,nonempty,noatime,default_permissions,fsname=/dev/sda1,blkdev,blksize=4096

Thanks in advance
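
One way to see how often the rig actually restarts: every boot in the log above writes one "EXT4-fs (sda4): re-mounted" line at roughly 10 seconds of kernel uptime, so the gaps between those lines approximate how long the rig stayed up each time. A minimal Python sketch, assuming that line is a reliable boot marker on this rig (the function names and the placeholder year are made up for illustration):

```python
from datetime import datetime

# Lines copied from the log above; each one is taken to mark a boot.
SAMPLE = """\
Nov 29 06:57:30 Rig_3645069 kernel: [ 10.271839][ T286] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:25:50 Rig_3645069 kernel: [ 10.190519][ T280] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:30:22 Rig_3645069 kernel: [ 10.224944][ T278] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
Nov 29 07:31:59 Rig_3645069 kernel: [ 10.129078][ T270] EXT4-fs (sda4): re-mounted. Opts: errors=remount-ro,commit=120
"""

BOOT_MARKER = "EXT4-fs (sda4): re-mounted"  # assumption: logged once per boot
ASSUMED_YEAR = "2000"  # syslog stamps carry no year; only differences matter here


def boot_times(lines):
    """Return the timestamp of every line that contains the boot marker."""
    times = []
    for line in lines:
        if BOOT_MARKER in line:
            stamp = line[:15]  # e.g. "Nov 29 06:57:30"
            times.append(
                datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
            )
    return times


def seconds_between_boots(lines):
    """Return the gap in seconds between each pair of consecutive boots."""
    times = boot_times(lines)
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


intervals = seconds_between_boots(SAMPLE.splitlines())
print(intervals)  # [1700, 272, 97]
```

For these four boots the gaps shrink from about 28 minutes to under 2 minutes, which matches the description of the crashes getting more frequent.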

Are the cards all confirmed to be working properly?
Have you tried reflashing Hive?
Have you tried running 1, 2, 3, then 4 cards at a time, with the same result?
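
To narrow down which cards to pull first, it may help to tally the "amdgpu: probe of ... failed" lines by PCI address. A rough Python sketch using lines from the posted log; the function name is made up for illustration, and the error names come from Python's errno table on the assumption that it matches the kernel's numbering (it does on Linux for these two codes):

```python
import errno
import re
from collections import Counter

# Lines copied from the log in the original post.
SAMPLE = """\
Nov 29 07:32:33 Rig_3645069 kernel: [ 48.792776][ T1002] amdgpu: probe of 0000:08:00.0 failed with error -22
Nov 29 07:48:15 Rig_3645069 rngd[522]: read error
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.681525][ T1012] amdgpu: probe of 0000:04:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 28.688247][ T1012] amdgpu: probe of 0000:08:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.047668][ T1012] amdgpu: probe of 0000:09:00.0 failed with error -4
Nov 29 07:48:30 Rig_3645069 kernel: [ 29.291995][ T1012] amdgpu: probe of 0000:0c:00.0 failed with error -4
"""

# Captures the PCI address (domain:bus:device.function) and the error number.
PROBE_FAIL = re.compile(
    r"amdgpu: probe of ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error -(\d+)"
)


def count_probe_failures(lines):
    """Count failed amdgpu probes per (PCI address, errno name)."""
    counts = Counter()
    for line in lines:
        match = PROBE_FAIL.search(line)
        if match:
            address = match.group(1)
            name = errno.errorcode.get(int(match.group(2)), "unknown")
            counts[(address, name)] += 1
    return counts


failures = count_probe_failures(SAMPLE.splitlines())

# Collapse to one total per card, ignoring the error code.
per_card = Counter()
for (address, _name), n in failures.items():
    per_card[address] += n

for address, n in per_card.most_common():
    print(address, n)
```

On the rig itself, the same function could be fed the lines of /var/log/syslog (opened with errors="replace", since the original command needed grep -a), and lspci shows which card sits at each address. In the posted log, 0000:08:00.0 asks for vega10 firmware, so it is presumably the Vega 56, while 0000:04:00.0 and 0000:09:00.0 ask for polaris10 firmware, which fits the two RX 580s. Note that at 07:48:30 four cards fail in the same second with error -4 (EINTR, an interrupted call), which looks more like the driver load being cut off than four bad cards; the -22 (EINVAL) "gpu post error" on 0000:08:00.0 is the one failure that stands alone.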

I removed the 2 suspicious cards and put them on a different board. My rig has been working non-stop with 3 cards for almost 15 hours now, and the 2 cards have been working on the other board for 1 hour now.

