Vega 64 Issues with Dead GPUs

Quick basic info:

Beta Image - 0.6-203@210515
Miner - Teamredminer

1 x vega 64 sapphire nitro plus
1 x vega 64 asus strix

Both detect as dead. No overclock.
I have amdmemtweak set up to run before the miner starts. No idea what the issue is. The cards work and display just fine in a normal PC.

Both show up on HiveOS but don’t hash (as they’re being detected as dead).

I’ve tried different risers; it doesn’t make a difference.

Is there a better way to debug why they’re being detected as dead?
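
One low-effort way to narrow this down is to count how often each GPU index gets flagged in the miner log. This is only a minimal sketch, assuming the log contains lines in the `GPU 1: detected DEAD` form quoted later in this thread. The helper name `count_dead_gpus` and the log path are hypothetical, so point it at wherever your miner actually writes its log.

```shell
# Count "detected DEAD" messages per GPU index in a miner log.
# Assumption: the log contains text like "GPU 1: detected DEAD".
# count_dead_gpus is a hypothetical helper, not part of HiveOS or the miner.
count_dead_gpus() {
    grep -o 'GPU [0-9][0-9]*: detected DEAD' "$1" \
        | sed 's/: detected DEAD//' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (the path is a placeholder):
#   count_dead_gpus /path/to/miner.log
```

If one index dominates the list, that card (or its riser or cable) is the first thing to swap. If the index moves around, the cause is more likely shared, such as power or settings.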

Same here.
I use:
amdmemtweak --CL 16 --PD 8 --RC 44 --RP 10 --WR 16 --CWL 7 --FAW 20 --MOD 15 --MRD 8 --RAS 30 --REF 17550 --RFC 300 --RTP 5 --RDWR 19 --RRDL 5 --RRDS 3 --WRRD 1 --WTRL 9 --WTRS 4 --CKSRE 10 --CKSRX 10 --RCDRD 12 --RCDWR 5

core 1010
core voltage 850
mem 950
mem voltage 1250

When I start the miner, the hashrate is around 8+ and the memory clock is stuck at 167 MHz. If I try to change the OC of any card, some cards may jump up to 50 MH/s, but other cards show "GPU 1: detected DEAD".

The following link talks about the problem, which needs the HiveOS team to look at it.

Normally Vegas are started without any straps, then OCs are applied, and once they are tuned to a good stable OC, straps are applied. No idea if this is any help, just general info. Hope u get the issues solved.

Vega cards are a PITA with DEAD GPU messages. I have 2 in one rig and no matter what I do, one always kills the rig. Need to look more into the straps too when I’ve time. There is a good guide on this forum which goes through Straps in more detail.

Yes, Vegas are really tricky sometimes… getting them stable can get on your nerves.

Have u considered flashing those 64s with the corresponding Vega 56 vBIOS?
That will make the 64s much more stable, with better temp control, and easier to tune.

I would start with the Hive straps that are available in the tuning section, behind the advanced config tab that shows up when u hover your mouse over the little i icon, and go from there.

if samsung mem - amdmemtweak --CL 20 --RAS 22 --RCDRD 14 --RCDWR 12 --RC 36 --RP 14 --RRDS 3 --RRDL 6 --RTP 5 --FAW 12 --CWL 8 --WTRS 4 --WTRL 9 --WR 14 --REF 65000 --RFC 249

if Hynix mem- amdmemtweak --RC 35 --RP 13 --RAS 22 --REF 65500 --RFC 148 --RRDL 4 --RRDS 4 --RCDRD 12 --RCDWR 4
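
If u keep both sets around, a tiny wrapper saves retyping them. This is just a sketch: `strap_cmd` is a made-up helper name, the timings are copied from this post and are not tuned recommendations, and it only prints the command so u can review it before running anything.

```shell
# Print (do not run) the amdmemtweak command for a given memory vendor.
# Timings are the ones posted in this thread; treat them as a starting point.
# strap_cmd is a hypothetical helper name.
strap_cmd() {
    case "$1" in
        samsung)
            printf '%s\n' "amdmemtweak --CL 20 --RAS 22 --RCDRD 14 --RCDWR 12 --RC 36 --RP 14 --RRDS 3 --RRDL 6 --RTP 5 --FAW 12 --CWL 8 --WTRS 4 --WTRL 9 --WR 14 --REF 65000 --RFC 249"
            ;;
        hynix)
            printf '%s\n' "amdmemtweak --RC 35 --RP 13 --RAS 22 --REF 65500 --RFC 148 --RRDL 4 --RRDS 4 --RCDRD 12 --RCDWR 4"
            ;;
        *)
            echo "unknown memory vendor: $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   strap_cmd samsung
#   strap_cmd hynix
```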

Hope u get the issues solved :)

A dead GPU is caused either by a bad riser or, most likely, by clocks that are too high or mV that is too low… or a combination of those and wrong straps.

Well thankfully I began to get them to work, some had bricked bios.

Though now I’ve run into overclocking issues. I have 4 Vega 64s, 1 Vega 56, and an RX 580, all running on the latest version of the beta HiveOS.

If I OC any 1 of the cards, every other card hard-caps its hashrate at 33 MH/s, which we all know is far too low for 64s.

If I OC two cards, all the cards fail to mine and come up dead.

This question is for anyone who knows: does the rig have to be homogeneous? All Vega 64s? I’ve seen issues mixing with Nvidia cards, but not with AMD.

I'm not sure what u mean by “If I OC two cards the cards all fail to mine and come up dead.” U do know that u can't OC Vegas on the fly without the rig rebooting? Normally u get a message that the card u were adjusting, or GPU 1, was “detected dead”, and everything boots back up normally. I have a mixed rig, all AMD: 6 Vegas and 4 RX 5xx GPUs. I wouldn't put Nvidia with Vega, because TRM has way better support for Vega than any other miner that I know. Please keep us posted on how it goes. I would set minimum OCs on all cards with Hive's “default” straps and see how it goes, or start with one card connected, etc. Good luck :)

Yup, I took the slow way through the process.

I went one by one through all the vega 64’s to confirm I can overclock them together, but as soon as I add the vega 56 or rx 580 and apply overclocks to those as well, even after reboot, all cards will come up dead.

If I apply OC only to one card while all cards are present, the rest of the cards hit a hash cap, even after reboot.

So… how many GPUs do u have on the rig, 5x Vegas and 1x 580, right? Have u tried starting with either of those “issue cards” as the first ones to add? If only one GPU were causing this behaviour, it would be plausible that something is wrong with the card, like a wrong BIOS. But u have 2 cards that cause the same behaviour on your whole rig: no matter which one u throw into the mix, it goes sour, right?
Is your power supply powerful enough? Are all the output wires okay?
What MB do you have, and can that have anything to do with this? How about your CPU?
Do you have enough PCIe lanes available for all these cards? Sorry, maybe stupid questions, but I'm just shooting in the dark here.
Let us know what it was when u find out. U will definitely solve this soon :)

Yea, 4 vega 64s, 1 vega 56. The 64’s all have 56 bios and work fine on their own. The 56 has 64 bios and works on its own. The 580 also works on its own.

Mobo - asus z390-p
CPU is a celeron.
PSU is 2400 watt plat server psu.

The weird part is that if all the cards are plugged in together, it doesn’t matter which card you OC: the OC will result in caps on all other cards.

I have 6 PCIe lanes running on gen 1 config. Above 4G Decoding is enabled in the mobo BIOS. Tested with new wires and new risers. Risers and wires have no faults.
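
To confirm what link each card actually negotiated, rather than what the BIOS is set to, Linux exposes it in sysfs. This is a rough sketch, assuming a kernel that provides the standard `current_link_speed` and `current_link_width` attributes. `gpu_link_info` is a made-up helper name, and the base directory is a parameter only so the function can be pointed at a test tree. For reference, gen 1 is 2.5 GT/s per lane and gen 2 is 5 GT/s.

```shell
# List the negotiated PCIe link speed and width for display-class devices.
# gpu_link_info is a hypothetical helper name.
gpu_link_info() {
    base=${1:-/sys/bus/pci/devices}
    for dev in "$base"/*; do
        [ -r "$dev/class" ] || continue
        case "$(cat "$dev/class")" in
            0x03*) ;;            # PCI base class 0x03 = display controller
            *) continue ;;
        esac
        [ -r "$dev/current_link_speed" ] || continue
        # Prints: <pci address> <speed as reported by the kernel> x<width>
        printf '%s %s x%s\n' "${dev##*/}" \
            "$(cat "$dev/current_link_speed")" \
            "$(cat "$dev/current_link_width")"
    done
}

# Usage:
#   gpu_link_info
```

On x1 risers every card should report a width of 1. A card that is missing from the list is worth reseating first.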

Going to run another test with just the 64s to see whether overclocking two or more cards causes issues. I’ve already done this a few times, but I’m going to do it again just to 100% rule out the cards themselves.

Hope u find a solution. I don't understand too much about PCIe lanes, but when my MB's PCIe is set to auto, Vegas always run on gen 2… I suppose gen 1 is OK too? Just a thought. Good luck.

Try this guide.

I got mine stable for a few days now.

I would reset the miner, as I hate seeing invalids,
but I can live with 2 invalids a day on one of the cards.

So, a quick update for anyone who wants to know.

The biggest issue I’ve faced so far is having a few cards with faulty drivers.

I didn’t flash them prior, but having flashed a new clean V56 BIOS to two different problem cards, I have found that a faulty BIOS can cause every other card on the rig to act up. With a fresh BIOS, cards aren’t capping at weird or really low hashes, and the rig is no longer crashing if I OC more than two cards.

I have more to follow, but at least getting over the faulty bios hurdle for the cards is a big step in the right direction.

Second Update:

Running into issues with amdmemtweak settings not applying.
I have them set in the usual xinit.user.sh in /home/user.

The above location doesn’t work for me as of the latest hiveos build. I found a recommendation somewhere else that says to:

For those that are on the latest HiveOS
enable Gui
then add your amdmemtweaks in sudo nano /etc/xdg/openbox/autostart

which looks like:

/hive/sbin/amdmemtweak --[settings] &

He made a point to say there should be an & at the end, and a newline break after to make sure it runs. The & has it run in the background otherwise nothing else will run after it.
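
For anyone scripting this instead of editing by hand, the rule above (trailing `&`, newline after) can be sketched as a small helper. `add_autostart` is a hypothetical name, not a HiveOS tool. It appends the line only if it is not already there, so running it twice does not apply the tweak twice.

```shell
# Append a backgrounded command to an autostart file, once.
# add_autostart is a hypothetical helper; file path and command come from the caller.
add_autostart() {
    file=$1
    cmd=$2
    line="$cmd &"
    # Skip if the exact line is already present.
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # printf writes the trailing newline the post calls for.
    printf '%s\n' "$line" >> "$file"
}

# Usage on the rig (needs root; replace [settings] with your own timings):
#   add_autostart /etc/xdg/openbox/autostart "/hive/sbin/amdmemtweak --[settings]"
```

This assumes the existing file already ends with a newline. If it does not, add one first, or the new entry will be glued onto the last line.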

Just out of curiosity… why would u want to use the command line to apply OCs or straps?

I had just discovered this, and the ability to change straps this way is so nice.

As a side note, I managed to get all cards operating properly again. The overclocking game was a pain but I finally made my way through it.

All my cards’ core temps are running at 65 C or less, but the mem temps are somewhat high, with most of them at 70-78 C. Is this normal for people’s mem temps?