GPU driver errors and GPUs lost, forcing reboots

I’m having some issues getting my rig to run for a while without crashing due to GPU errors. The two common errors I’m getting are “GPU are lost, rebooting” which causes the system to reboot, and “GPU driver error, no temps” which drops the hash rate to 0.

I’m noticing that I sometimes get a very high load average (LA) around the times the errors occur, but that may be a coincidence.

I’ve also noticed that nvtool cannot read the clock speeds of GPU2, which may be related.
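
One way to check whether the driver itself can still read GPU2, independently of nvtool, would be to query every card with `nvidia-smi` and flag any numeric field that comes back as a placeholder. This is only a rough sketch: the `find_unreadable` helper and the sample text are made up for illustration, and it assumes unreadable fields show up as non-numeric placeholders such as `[N/A]`, which may differ between driver versions.

```python
import csv
import io
import subprocess

# Real nvidia-smi query fields; the first two are identifiers, the rest numeric.
QUERY = "index,name,temperature.gpu,clocks.mem,clocks.sm"


def read_nvidia_smi() -> str:
    """Return nvidia-smi's CSV report for all GPUs (needs the Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_unreadable(report: str) -> dict[int, list[str]]:
    """Map GPU index -> names of numeric fields that could not be read."""
    fields = QUERY.split(",")
    problems: dict[int, list[str]] = {}
    for row in csv.reader(io.StringIO(report), skipinitialspace=True):
        if len(row) != len(fields):
            continue  # skip blank or malformed lines
        values = dict(zip(fields, (cell.strip() for cell in row)))
        # Assumption: a readable value is a plain integer; anything else
        # (for example "[N/A]") counts as unreadable.
        bad = [name for name in fields[2:] if not values[name].isdigit()]
        if bad:
            problems[int(values["index"])] = bad
    return problems


# Made-up sample output, not taken from the rig in this thread.
sample = """0, NVIDIA GeForce RTX 3080, 55, 9501, 1500
1, NVIDIA GeForce RTX 3080, 57, 9501, 1485
2, NVIDIA GeForce RTX 3080, [N/A], [N/A], [N/A]
"""
print(find_unreadable(sample))
```

On a live rig, `find_unreadable(read_nvidia_smi())` would run the same check against the real cards. If a GPU is flagged here as well, the problem is probably below nvtool, at the driver or hardware level.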

I’m on the newest Nvidia driver and HiveOS versions.

I have a similar problem. I just started mining.
I have one MSI Ventus RTX 3080.
When the card runs without an OC, there is no problem and it runs steadily at 86 MH/s.
After setting a memory OC of 2000 it runs at 96 MH/s for several minutes and then crashes. After a reboot it keeps crashing after several minutes of work.

Same error “GPU driver error, no temps”

I tried Nvidia drivers 455 and 460.

I’m having the exact same issue with 3 Nvidia cards. It just started yesterday. Did you find a fix for this?

I have the exact same config. Ditto.

Did anyone figure this out? I have the same problem.

I have also been having this issue. The weird thing is it happened around the same time 2 nights in a row.

I get the restart for low hashrate, then an overclock failed notice with a gpu being offline, then a full reboot which seems to fix the issue.

This is on a rig with 3x 3070s using GMiner.

I had a similar issue with a new 3080 I bought today. When I added it to my rig, it would run fine for a few minutes and then crash with the “GPU driver error, no temps”.

I had changed the thermal pads on it, so thermal throttling was definitely not the issue, and I found out that an unstable memory OC was causing the crash. I dropped it by 100 MHz and it’s been running well for a few hours now.
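
For anyone hunting for a stable value the same way, the process can be sketched as a simple step-down: start from the offset that crashes, lower it by a fixed step, and let the rig run for a while at each value before trusting it. The `step_down_offsets` helper, the 100 MHz step, and the example offsets are assumptions for illustration only. Each value still has to be applied by hand in the HiveOS overclocking settings.

```python
def step_down_offsets(start_mhz: int, step_mhz: int = 100, floor_mhz: int = 0) -> list[int]:
    """List memory OC offsets to try, from the unstable one down to the floor."""
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    if start_mhz < floor_mhz:
        raise ValueError("start_mhz must not be below floor_mhz")
    offsets = list(range(start_mhz, floor_mhz - 1, -step_mhz))
    if offsets[-1] != floor_mhz:
        offsets.append(floor_mhz)  # always finish on the floor value
    return offsets


# Hypothetical example: the card crashes at 2000, so try each lower value in turn.
for offset in step_down_offsets(2000, step_mhz=100, floor_mhz=1500):
    print(f"try memory offset {offset} and watch the rig before moving on")
```

Stopping at the first offset that survives a long run, rather than jumping straight to stock clocks, keeps as much of the hash rate gain as the card can actually hold.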

Hey King,

Did you fix the problem? I am having the same issue with my Gigabyte 3080 OC cards… I have decreased the MEM to 2100 and I am still having the same issues.

Same issue here with 1070 and 1070 Ti :cold_sweat:

Any idea?

The same issue is happening to me, but on GTX 1060s!

So it’s not an RTX 30 series issue.

My OC is minimal and usually stable.

I don’t know what else to try.
Any suggestions?

No point in opening duplicate threads. There is no fix and the HiveOS devs don’t even care.

I have the same problem on a 1070. Did you manage to solve it?

Has anyone figured it out?

Mine has been really weird. I had the same cards and the same board in another frame and it was fine, never an issue. I bought a new frame for more space and airflow, moved the cards over, and now I am getting these errors. Sometimes the rig will run for a day with no issues and then out of the blue I get the Nvidia OC failed error. Sometimes it happens right away, or it tells me there is an issue with GPU5. I changed the risers and the PSU, but the one thing we have in common is the 3070 and the Z170 series board, so I am almost certain it is either HiveOS or the board. I am running 7 cards. I will test it with 6 on Thursday and post the results here. One thing that helps me is switching miners from Phoenix to GMiner; about 80% of the time that works and it mines for a day or so.

Same here with RTX 3070s and T-Rex.
I have two identical rigs and only one of them has the problem. The only difference between the rigs is the Nvidia driver: the one with the problem is on 460.67.

I’m on my 8th rig and started having the same problem as soon as I began installing 3080s. My first 7 rigs were all 3070s and never had any issues. I’m currently testing this setup but still having the same issue. The problem was partly solved when I reduced the number of GPUs. I think the problem is either mixing 3080s and 3090s, or my PSU not being able to supply enough power for the 3080s and 3090s. I got the same issue when I added a 3060 to a previous rig that had only 3070s. Resetting the BIOS helped me with another rig.

I just bought a Corsair HX1000 Platinum 1000 W to give it a try. I will also reset my BIOS. I will keep you guys posted!
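
A rough way to sanity-check the PSU theory is to add up the power limits of the cards, add an allowance for the rest of the system, and compare that with a derated PSU capacity. All numbers below are placeholders rather than measurements from my rigs, and the 80% sustained-load margin and the 150 W system allowance are rules of thumb I am assuming, not specifications.

```python
def psu_headroom_watts(psu_watts, gpu_power_limits, system_watts=150.0, max_load=0.8):
    """Return spare watts under the derated budget (negative means over budget)."""
    if not 0 < max_load <= 1:
        raise ValueError("max_load must be in the range (0, 1]")
    budget = psu_watts * max_load                   # capacity we allow ourselves to use
    demand = sum(gpu_power_limits) + system_watts   # GPUs plus CPU, board, risers, fans
    return budget - demand


# Hypothetical rig: a 1000 W PSU feeding three cards limited to 230 W each.
spare = psu_headroom_watts(1000, [230, 230, 230])
print(f"headroom: {spare:.0f} W")
```

A negative result would suggest removing a card, lowering the power limits, or splitting the load across a second PSU before blaming the driver.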

@russchau
I have the same problem, but I don’t know where it comes from: my rig stops mining at random times.
I would also like to know whether I should change anything in my overclocks.

For now I’m looking for the cause of the problem by swapping in different PSUs. I’ll keep you posted if I find a solution.

Try resetting your BIOS and let me know whether the problem is solved. If it still doesn’t work, try resetting it again and start with your 3080.

Could it be caused by the internet connection? For me it started after I had fiber installed, so could it be coming from that?

My rig keeps stopping and I don’t know what else to do. It worked very well before, and now all of a sudden it stops at random times.

I don’t think so, because mining doesn’t use much bandwidth. Is the light on your Ethernet port on?

Solution:

  1. Plug a monitor into your HDMI port and take a photo after HiveOS has loaded.
  2. Unplug your Ethernet cable and restart your rig. Wait until HiveOS has loaded completely, then plug the Ethernet cable back in. If the light on your Ethernet port comes on, enter the command: “miner”

You can run the following commands: https://hiveos.farm/troubleshooting-conn/

Remove all the OC settings on the 3080s (set them to 0), restart your rig, and then, once it is stable, change only your PL to 230 W.