I have 2 rigs running 8x 5700 XT each (with one 5700). I upgraded to HiveOS with kernel 5.4.80 this morning, and both rigs immediately started crashing because TeamRedMiner was detecting a GPU as dead. There were more than 100 reboots in less than 4 hours between the two rigs. After extensively troubleshooting both rigs, including taking multiple cards apart and cleaning and adjusting the thermal pads (the cards with errors had the stock thermal pads replaced with Thermal Grizzly Minus Pad 8 pads), I ended up downgrading to 5.4.0. Both rigs have now been running stable again for the past 45 minutes.
Yes, I had a similar problem and downgraded to the stable version.
So this is why my 5700 rig keeps crashing. One card specifically. It started not long after I updated to 0.6-204@210608. I went mad switching out risers, and when I put the “dead” GPU into my desktop it worked fine. Such an insanely stressful couple of days.
Will downgrade to 0.6-203@210604 and see if that solves the issue with my 5700 rig.
Also, after updating to 0.6-204@210608 on my other rig of R7s, my network could not see the rig: I couldn’t ping it and SSH didn’t work. I just downgraded to 0.6-203@210604 and can SSH into the rig with no problem now.
My Vega 64 rig had 98 invalid shares in less than 24 hours on 5.4.80. I downgraded to 5.4.0 and am once again averaging only 1-2 invalid shares per day. There are definitely some problems with 5.4.80.
How did you downgrade to the 5.4.0 kernel? Did you reflash to an older version rather than using the downgrade feature in the dashboard? My rig is currently relatively stable with version 0.6-203608, kernel 5.4.80. I don’t know how to downgrade the kernel other than re-flashing my SSD with an older image.
In any case, my AMD rigs are relatively stable with 5.4.80/0.6-203608, while my NVIDIA rigs are rock solid on the newest version and kernel: almost a month of uptime, compared to 15 hours of stability at most with my AMD rigs.
Annoying to troubleshoot, to say the least.
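Since the thread mixes up HiveOS versions and kernel versions a bit, it can help to confirm which kernel a rig is actually running before and after a downgrade. Here is a minimal Python sketch for that, assuming Python 3.9+ is available on the rig. The helper names `parse_kernel_release` and `describe_kernel` are made up for illustration, and the `-hiveos` suffix in the comments is only an assumed example of what a release string might look like; the script does not touch or downgrade anything.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a kernel release string.

    Example input (assumed format): '5.4.80-hiveos'. Missing parts count as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor or 0), int(patch or 0)


def describe_kernel(release: str) -> str:
    """Label a kernel release using the two versions discussed in this thread."""
    version = parse_kernel_release(release)
    if version == (5, 4, 80):
        return "5.4.80 (the kernel reported as problematic in this thread)"
    if version == (5, 4, 0):
        return "5.4.0 (the kernel people here downgraded to)"
    return ".".join(map(str, version)) + " (not discussed in this thread)"


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    print(describe_kernel(platform.release()))
```

Running it on the rig over SSH prints one line describing the running kernel, which makes it easy to check that a downgrade really took effect.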