3080 – GPU Driver Error, no temps

fishcake · April 2, 2021, 11:21am

Same problem here. Are the HiveOS devs even looking into this? There has to be some reason why it’s happening.

jmw60546 · April 5, 2021, 12:46pm

I am having a similar issue with one of my 3080 cards. I did an inspection and noticed the +2 pin on one of the 6+2 PSU connectors was loose. I reseated it and am testing now to see if it makes a difference. Typically the rig would run fine for 2 hours, then reboot or go unresponsive. The only error showing is "GPU Driver Error, no temps".

UncleSamThe1st · April 6, 2021, 8:37am

Any updates? I have the same problem. I tried rolling back drivers, lowering some OC settings and changing the miner, but nothing changed: every 4 hours or so I get the no-temps error and the rig reboots.

JDD · April 6, 2021, 12:48pm

Hey guys,

So I am still having the same issues; nothing has worked for me either. The only option is to lower my OC even more, but then the hashrate drops a lot, which isn't an option for me… I have no idea what else to try.

jmw60546 · April 6, 2021, 1:51pm

Mine has been stable for the last 18 hrs. After reseating the +2 GPU connector, I dropped all the OC settings down and have been increasing them slowly every 4 hrs, without issues so far. I have been running Core -200, Mem 800, PL 235 for the last 9 hrs without issues. I just bumped Mem up to 900 and will see how it goes. Getting 92 MH/s right now. Hoping to get it to 95-96 and stable, but I will take what I can get.
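The step-up approach above (start low, raise the memory offset only after 4 error-free hours) can be sketched in Python as a pure decision function. This is a hypothetical helper, not a HiveOS feature: `OcProfile`, `next_profile`, the 100 MHz step, the 2400 MHz cap, and stepping back down after an error are all illustrative assumptions, and actually applying the profile is left to however you normally set your OC.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OcProfile:
    """Hypothetical container for one card's OC settings."""
    core: int  # core clock offset, MHz
    mem: int   # memory clock offset, MHz
    pl: int    # power limit, W


def next_profile(current: OcProfile, stable_hours: float, had_error: bool,
                 step_mhz: int = 100, hold_hours: float = 4.0,
                 mem_cap: int = 2400) -> OcProfile:
    """Decide which memory offset to try next (assumed policy).

    Back off one step on any error; otherwise raise the memory offset by
    one step only after `hold_hours` of error-free running.
    """
    if had_error:
        return replace(current, mem=max(0, current.mem - step_mhz))
    if stable_hours >= hold_hours:
        return replace(current, mem=min(mem_cap, current.mem + step_mhz))
    return current  # not stable long enough yet: keep the current settings
```

For example, after 4 error-free hours at Core -200, Mem 800, PL 235, `next_profile(OcProfile(-200, 800, 235), 4.0, False)` returns `OcProfile(core=-200, mem=900, pl=235)`, the same bump to Mem 900 as in the post. Only the memory offset moves, since that is the setting being raised here.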

fishcake · April 9, 2021, 11:29am

pavanery · April 10, 2021, 5:10pm

I have the same problem with the 3080 Gainward Phoenix card. When I first put it in my rig, I lost 24 hours trying to tune everything! So I moved it to a separate machine, tested it both on a riser and directly in the motherboard slot, and tried all kinds of OC configurations; my temperature is normal, below 50C. So I realized that it may be a manufacturing problem with the card or a BIOS limitation on OC, because when I run it without any OC it works normally. Something about this model isn't handled well by HiveOS. I haven’t tested it on another system, but it seems to me that the HiveOS script that collects temperature and fan information goes into a loop trying to get the data and can’t.

I have a Zotac 3080 that runs a fairly high OC and is stable at only 213W; this Gainward needs 240W to work with any OC at all.

In my tests, the 3080 Gainward Phoenix supports up to 2050 MC and 1050 CC with a 240W PL: 93 MH/s.

This OC limitation doesn't make sense; I run 25% more on the Zotac and it delivers a stable 100 MH/s.

Any stable gain above that on the Gainward would make me happy!

Gardenal · April 15, 2021, 2:21am

I have 3 x RTX 3080 Gainward Phoenix, and each one behaves differently! GPU1 and GPU3 are the same model, MICRON GDDR6X - 94.02.42.00.8E, and GPU2 is …8F.

pavanery · April 15, 2021, 10:15am

@Gardenal, but are all the 3080s running without errors?

Gardenal · April 15, 2021, 12:50pm

@pavanery, yes. On GPU1, if I set the memory above 2150 it starts to break and I see errors in HiveOS!

pavanery · April 15, 2021, 1:12pm

@Gardenal Exactly, I see that your GPU1 is a twin sister of mine, haha. You have to keep the OC low or all sorts of problems start. I’ve already updated the BIOS but it’s still the same. Maybe I will switch to Gainward's own BIOS version to see if it improves.

My BIOS is 94.02.42.00.8F. I will test your 8E later to see if that resolves it; in your case, you could flash GPU1 to 8F to see if it fixes the OC problem.

BIOS 94.02.42.00.8F is from Palit, not the original Gainward one.

How long has it been running without any errors?

Gardenal · April 15, 2021, 1:51pm

@pavanery
I noticed that GPU2 looks different externally from the others: it has a GS tag! The box has a sticker: "GS: GOLDEN SAMPLE - BORN TO KILL"!

GPU2 works fine: low PL and temperature and a nice hashrate. It's on BIOS 8E!!!
GPU3 works fine but needs more PL. It's on BIOS 8F!!!
GPU1 works fine with this config (core -250, mem 2150, PL 245): 6 days without errors!

When you change the BIOS, please tell me the result!

Gardenal · April 15, 2021, 1:52pm

pavanery · April 15, 2021, 1:56pm

@Gardenal Great feedback, man! Mine is the non-GS model. Yes, once I make the change, if it works I'll post it here for you to try too.

Which miner are you using? Are you using autofan?

Gardenal · April 15, 2021, 3:21pm

@pavanery
T-Rex, OS version 0.6-203@210414, fully updated! NVIDIA driver version 460.67 (CUDA 11.2) installed.
Yes, I'm using autofan.

SKYBOB66 · April 19, 2021, 2:46am

I updated to the 460.67 driver yesterday and that fixed it, but now I'm getting some invalid shares: about 15 bad shares in 24 hrs. Still, that's better than being offline, since my rig is in my office. In the morning I'm going to force the HiveOS update to see if it fixes the shares.

My AutoFan is turned off.

highflyer · April 21, 2021, 5:39pm

Your problem is that you are using autofan on 3000 series cards, which don’t show memory temp in HiveOS, so autofan is less than useless. Turn it off and set the fans to 85-90% minimum full time; this is mining, they need that much anyway. At those fan levels the memory is running extremely hot, like 110 degrees, and throttling. This is a very easy fix: up the fans and cool off the cards. With Ethereum the core runs cool because the work is all in memory, so if the core is even slightly hot, know that your memory junction temps are on FIRE!!!

So, just turn OFF AUTOFAN. I keep seeing people trying to use it and burning up their brand new cards. Once the fans cool the cards off, you will get full power levels and speed back. Problem solved. If you don’t believe me, just google it: this is a very well known problem with the 3000 series, and autofan simply doesn’t work with them because HiveOS doesn’t show their mem temps. Only the Windows HWiNFO app shows memory junction temps, and believe me, you wouldn’t like what you see with fans that low.

I would turn her off and let her cool down completely. Also add a 100 watt box fan to it; it takes a lot to cool off 90’s. The fans don’t reach the memory junction part of the cards; they simply aren’t made for mining (in gaming they basically never go past 100 degrees). So to even have a hope of cooling off the memory junction, and I cannot overstress this, you need to turn off autofan and up the speed to 85-90%. Heck, some do 95-100% and just replace the fans; they usually give them to you for free, warranty or not. Better to replace a bad fan every year or two than burn out a card. Finally, your current GPU target temp is set to 90, which would be way too low for the 3000 series even if autofan actually did work: they throttle at 110 and, if you get it right, usually run around 100-102.

Devilz · May 18, 2021, 6:09am

So, any update or fix about this issue?

mad · May 31, 2021, 8:35am

Hello all, hope you're doing fine.
On my rig I have 2x 3080 Zotac AMP Holoblack
and 4x 3070 MSI Ventus 2X. Power supply 1800W, Asus Z490-P, 10th-gen i3, 8GB DDR4.

I was getting the message “GPU driver error, no temp” quite often until I downgraded the Hive image to 0.6-203@210403 and the NVIDIA drivers to 460.67.

I changed the thermal pads on one of my 3080s after accidentally breaking a blade of the right fan. I glued it back on with hard glue (it’s stable, like nothing happened).

The rig runs with no errors until it reaches 8 hours, and then it displays the error message again and reboots.

My overclock settings for the Zotac 3080s are:

200 core, +1800 mem, PL 230W, fan 76%
Temp 50°C; I get 96.5 MH/s on both GPUs.

For the MSI 3070 Ventus 2X cards the OC is:
1100 core, 2600 mem, PL 125W, except one 3070 with PL set at 118W because it was running hotter than the other 3070s even though they are the same GPUs.
I get 62.37 MH/s on each of three 3070s (temps 48-53°C)
and 61.5 MH/s on the remaining 3070 (temp 55°C).

Autofan is disabled, and so is the hashrate watchdog.
I’m using the T-Rex miner.

I would like to know whether my problem is a software issue (maybe I should try a different NVIDIA driver or different OC settings) or a hardware issue (power supply isn’t enough, risers, UPS…).

I'd appreciate your help; please let me know what you think.

Thank you

shawn23232 · June 6, 2021, 2:43am

OK, here are a few things I tried that worked for me.

I have 3 Gigabyte 3080 Gaming OC cards, two with BIOS 94.02.42.40.33 and one with BIOS 94.02.42.00.4D.

The two 3080s with the same BIOS worked perfectly out of the box for two weeks, no problems, hashing around 98 MH/s. Then they started to thermal throttle down to 80-85 MH/s and weren’t using the 230W PL I set; they were only drawing around 200W. I replaced the 2mm thermal pads on both the front and back, and now they're back to normal, hashing 98 MH/s for a week.
The other 3080, with the BIOS ending in 4D, is the one causing my rigs to restart all the time. I changed its thermal pads just like on the other two 3080s, and it still won’t go over 93 MH/s or 1300 on mem. If I set Mem above 1300 MHz, it can't report fan speed, shows no GPU temp, can't apply OC settings, and throws all sorts of errors. I also noticed that this GPU could not be flashed for some reason. I first saved the original BIOS and tried to flash it back, and it failed. Then I tried to flash the BIOS from the other two good cards, and that failed as well.

My conclusion: find the card that’s causing the issue and set its Mem OC low, probably 900-1300. Then, if it runs without restarting or giving you errors in the miner terminal, try flashing its original BIOS and see if that works. If it doesn’t, it probably just means it's one of the defective cards, like mine. Set it to a low Mem clock and call it a day.
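The "find the bad card and cap its memory offset" advice can be written as a small Python helper. This is a sketch under assumptions: the per-GPU error counts are assumed to come from wherever you spot them (miner terminal, HiveOS messages), and the function name and the 1300 MHz default (the top of the 900-1300 range suggested above) are illustrative, not part of any tool.

```python
from typing import Dict


def cap_faulty_cards(mem_offsets: Dict[int, int],
                     error_counts: Dict[int, int],
                     safe_mem: int = 1300,
                     max_errors: int = 0) -> Dict[int, int]:
    """Return new memory offsets with error-prone GPUs capped at `safe_mem`.

    mem_offsets:  GPU index -> current memory clock offset (MHz)
    error_counts: GPU index -> errors observed for that GPU (missing = 0)
    GPUs with at most `max_errors` errors keep their current offset, and a
    card already below `safe_mem` is never raised.
    """
    return {
        gpu: min(offset, safe_mem) if error_counts.get(gpu, 0) > max_errors else offset
        for gpu, offset in mem_offsets.items()
    }
```

With three cards at 2000, 2000 and 1800 MHz and errors only on GPU 2, `cap_faulty_cards({0: 2000, 1: 2000, 2: 1800}, {2: 5})` leaves the good cards alone and drops GPU 2 to 1300.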

Add:

One other tip I got from my friend is to do a full power shutoff and reboot with no OC settings. Let it run for 10-20 minutes with no issues, then apply the OC settings in one go and see if that works. Also set a delay for the OC profile, so the next time your rig restarts, it will wait 30-60s before applying the OC settings.
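The first part of that tip (stock clocks first, then the full OC in one go) can be sketched as a Python bring-up routine. `has_errors` and `apply_oc` are hypothetical callbacks standing in for however you check for errors and apply your OC profile, and the 15-minute baseline is simply the middle of the 10-20 minute window. The 30-60s OC delay itself is a HiveOS OC-profile setting and is not modelled here.

```python
import time
from typing import Callable


def bring_up_rig(has_errors: Callable[[], bool],
                 apply_oc: Callable[[], None],
                 baseline_s: float = 15 * 60,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Mine at stock clocks for `baseline_s`, then apply the OC profile at once.

    Returns False and leaves the card at stock if errors already appear
    during the stock baseline; in that case OC is unlikely to be the cause.
    """
    sleep(baseline_s)   # let the rig run with no OC applied
    if has_errors():
        return False    # fails at stock: suspect hardware, risers or driver
    apply_oc()          # apply all OC settings in a single step
    return True
```

The `sleep` parameter is there so the routine can be exercised without actually waiting; on a real rig you would leave it at `time.sleep`.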

Hope this helps someone; please report back here if it works.