Wdog Thread(s) not responding - I'm Losing It

I created a Reddit post, but no luck so far.

I have an 8 x 5700 XT rig with a 1600W PSU, drawing 1200W at the wall and running the latest HiveOS beta. With both TeamRedMiner and PhoenixMiner, it randomly reboots and sometimes freezes. I have tried everything I can think of. At random, it reports “wdog GPU not responding” for several GPUs, then “wdog Thread(s) not responding”, and tries to reboot. Sometimes the reboot succeeds; sometimes the whole rig freezes. I get a similar error with TeamRedMiner (it reports a dead GPU).
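To get a rough sense of how hard the PSU is being pushed, the figures above can be turned into a quick load estimate. This is a sketch under assumptions: the 90% efficiency is a guessed value (the post does not give the PSU's efficiency rating), and the wall reading is treated as a steady average, although short transient spikes can exceed it.

```python
def psu_load(wall_watts: float, psu_rating_watts: float, efficiency: float) -> tuple[float, float]:
    """Return (estimated DC output in watts, fraction of the PSU rating used).

    A wall meter measures AC input; the PSU delivers roughly
    input * efficiency on its DC rails.
    """
    dc_output = wall_watts * efficiency
    return dc_output, dc_output / psu_rating_watts


# Wall draw and PSU rating come from the post; 0.90 efficiency is an ASSUMPTION.
dc, fraction = psu_load(wall_watts=1200, psu_rating_watts=1600, efficiency=0.90)
print(f"Estimated DC load: {dc:.0f} W ({fraction:.1%} of rating)")

# Average wall draw per GPU, with the board/CPU share spread across all 8 cards.
print(f"Average wall draw per GPU: {1200 / 8:.0f} W")
```

Swapping in the PSU's actual rated efficiency at this load gives a better estimate; a sustained load well under the rating does not rule out transient-related trips.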

I thought it was the risers at first, but all the “not responding” errors occur at the very same time, and the failed GPUs are random too. You can check the log here.
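The reasoning here (several GPUs dropping at the same moment points to a shared cause rather than one bad riser) can be checked against a log by grouping failure events that fall close together in time. This is a minimal sketch: the log line format, the sample lines, and the `find_bursts` helper are all hypothetical, and real HiveOS or miner log lines will need a different pattern.

```python
import re
from datetime import datetime, timedelta

# HYPOTHETICAL log format, e.g. "2020-06-01 12:00:03 wdog GPU3 not responding".
# Real HiveOS / miner logs differ; adjust this pattern to match yours.
LINE_RE = re.compile(r"^(\S+ \S+) .*GPU(\d+) not responding")


def find_bursts(lines, window_seconds=5, min_gpus=2):
    """Group 'not responding' events into bursts separated by gaps > window.

    Returns (burst_start_time, sorted GPU ids) for every burst in which
    at least `min_gpus` distinct GPUs failed together.
    """
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, int(m.group(2))))
    events.sort()

    bursts, current = [], []
    window = timedelta(seconds=window_seconds)
    for ts, gpu in events:
        # A gap longer than the window closes the current burst.
        if current and ts - current[-1][0] > window:
            bursts.append(current)
            current = []
        current.append((ts, gpu))
    if current:
        bursts.append(current)

    result = []
    for burst in bursts:
        gpus = sorted({gpu for _, gpu in burst})
        if len(gpus) >= min_gpus:
            result.append((burst[0][0], gpus))
    return result


# Made-up sample lines for illustration only.
sample = [
    "2020-06-01 12:00:03 wdog GPU3 not responding",
    "2020-06-01 12:00:04 wdog GPU6 not responding",
    "2020-06-01 12:00:04 wdog GPU1 not responding",
    "2020-06-02 08:15:00 wdog GPU5 not responding",
]
for start, gpus in find_bursts(sample):
    print(start, gpus)
```

If most bursts list several unrelated GPUs within a few seconds, that supports looking at power, the board, or the PCIe bus rather than individual risers. The 5-second window is an arbitrary choice.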

After that I thought maybe it was the OC profile. I gradually increased VDD from 740 to 850, with no luck. Mem is at a reasonable 895 and core at 1375. I have always used the latest versions of the miners (for a short while I used PhoenixMiner v5.0e, no luck again). I have never added any extra options to the miner configuration.

I had been running HiveOS from a USB stick. I tried several sticks, and now it runs from a 120GB SSD. No luck, again.

The driver version is 19.30, which should be stable. The board is an ASRock H110 Pro BTC+. I adjusted the motherboard BIOS settings too. It just resets too often and sometimes gets stuck. Since there is no pattern to this behaviour (it ran 3 days straight last week without any hassle), I’m simply helpless.

Hardware info: G3930, 2x4GB RAM, 6 x MSI 5700 XT MECH OC + 2 x MSI 5700 XT MECH. All 8 GPUs and risers are powered separately. The motherboard is powered from separate outputs too (this board needs a SATA and 2 Molex connectors to power the PCIe slots). Max temps are 57 °C for the core and 88 °C for memory.

- I tried unplugging 2 of the cards to see whether the PSU is sufficient; the problem remained. I cannot check whether the cards are stable without OC because they draw too much power.

- I saw a thread where someone got a stable rig with kernel 5.0.21. The latest beta with this kernel is hiveos-beta-0.6-140@200520, so I tried it. Update: this didn’t work (too many errors), so I switched back to the latest beta kernel.

Hey, greetings.

I have 3 rigs. Each rig runs 8 x 5700 XT with a 1600W PSU, like yours, and draws about 1150W from the wall. I am also running the latest HiveOS beta on an SSD with PhoenixMiner. On Rig #3, the one in the picture, I am running the ASRock H110 Pro BTC+ motherboard as well.

  1. The first thing I did was update the BIOS to version 1.60. (https://www.asrock.com/MB/Intel/H110%20Pro%20BTC+/#BIOS)
  2. I am using a 7th gen Core i3 processor with this motherboard (Intel Core i3-7100, 3M Cache, 3.90 GHz).
  3. I am also using 16GB of Corsair RAM with this motherboard, just in case I ever want to switch to Windows (Corsair Vengeance LPX 16GB (2x8GB) DDR4 DRAM 2400MHz C16 Desktop Memory Kit - Black, CMK16GX4M2A2400C16).
  4. For the SSD, I am using a Kingston 120GB A400 (SATA 3, 2.5" internal, SA400S37/120G).
  5. For WiFi, I am using a TP-Link TL-WN725N N150 USB WiFi adapter.

All my powered risers use a 6-pin PCIe connection. From my PSU, which has 8 GPU cables, I use 2 splitters: the first splitter powers both GPU connections, and the second splitter connects to the first and runs between the riser and the power from the PSU.

I also wrapped each of the board connectors in electrical tape to keep the metal of one from touching the metal of the next, since they are sooooo close together.

I am also using a watchdog that I just installed. I don’t know yet whether it works automatically and integrates with HiveOS, but when I run a restart from the shell, I hear the relay “click” to restart the system.

Also, I am using a modified BIOS on each card. I used both MPT and RBE to create new values. I use the same modified BIOS across all 5700 XT cards, flashing them with HiveOS … but only AFTER saving the original BIOS from each and every card first. Oh yes, I am using the onboard video from the motherboard for my monitor … I do not use any video output from the video cards.
