AMD and NVIDIA (Nvidia crashes)

winddude · May 17, 2018, 11:14pm

I have a rig running fine with 3 AMD RX 580 GPUs. As soon as I shut it down and plug in an Nvidia GTX 1060, either HiveOS fails to start or the miner (Claymore Dual Miner) won’t start. On the few occasions that Claymore Dual Miner did start, it crashed within a minute.

Any help troubleshooting, or advice on how to diagnose this, would be appreciated.

steambot · May 18, 2018, 4:30am

Check 1060 riser and connectors. Screenshots, pls.

winddude · May 18, 2018, 2:13pm

I’m connecting over SSH, so screenshots when HiveOS doesn’t boot will be hard. I’ll have to get a long monitor cable.

Would a bad GPU prevent HiveOS from starting? I think one of the 1060 GPUs might be bad. HiveOS will run with 3 RX 580s and 1 GTX 1060. When I plug in both GTX 1060s, or only the second 1060, Hive won’t start.

winddude · May 18, 2018, 2:27pm

A few times it would start and detect 4 of the 4 GPUs plugged in, with the 3 AMDs showing without memory info, but it would only mine on the 3 RX 580s, and at very low hashrates. See the screenshot in the comment below.

I’m running on USB. Is there anything in the logs that would help diagnose this? If so, how can I enable logging on USB?

winddude · May 18, 2018, 2:29pm

That’s weird, that first card is a 580, not a 470/480.

steambot · May 18, 2018, 5:21pm

Try plugging the Nvidia card into the first PCIe slot, with the AMD cards in the other slots. Try different options. Switch on 4G Decoding in the motherboard BIOS. Is the power supply enough? How many watts? What motherboard?
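
A rough way to answer the wattage question is to add up the rated board power of the cards plus an allowance for the rest of the system. Below is a minimal Python sketch under stated assumptions: the per-card figures (about 185 W for an RX 580, 120 W for a GTX 1060), the 100 W system allowance, and the 80% load target are illustrative values, not measurements from this rig, and the function names are hypothetical.

```python
import math

# All wattages below are assumed nominal figures, not measurements.
CARD_WATTS = {"RX 580": 185, "GTX 1060": 120}
SYSTEM_WATTS = 100   # assumed allowance for CPU, motherboard, risers, fans
TARGET_LOAD = 0.8    # assumed: keep sustained draw at or below 80% of the PSU rating


def estimate_draw(cards):
    """Estimate total draw in watts for a {card name: count} mapping."""
    return SYSTEM_WATTS + sum(CARD_WATTS[name] * count for name, count in cards.items())


def min_psu_watts(cards):
    """Smallest PSU rating that keeps the estimated draw within TARGET_LOAD."""
    return math.ceil(estimate_draw(cards) / TARGET_LOAD)


if __name__ == "__main__":
    rig = {"RX 580": 3, "GTX 1060": 2}
    print("estimated draw (W):", estimate_draw(rig))
    print("suggested minimum PSU rating (W):", min_psu_watts(rig))
```

Real draw while mining depends on clocks, undervolting, and the algorithm, so a wall-socket power meter gives a better answer than any estimate like this.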

winddude · May 22, 2018, 5:52pm

With the Nvidia cards in the two PCIe x16 slots, it detects all cards, but with the same memory error as above. The miner won’t start; there are no strange errors, it just seems to hang after connecting to the pools.

HiveOS might be stuck in a loop, as sudo poweroff isn’t shutting down the system anymore. (I just upgraded to 0.5-52.)

winddude · May 24, 2018, 5:34pm

It crashed last night and I haven’t been able to get HiveOS to boot back up detecting all the GPUs correctly. I have now switched the two NVIDIA cards to slots 1 and 2, followed by the 3 AMD cards. With a monitor plugged in, HiveOS seems to hang on “modprobe NVIDIA drivers”.

winddude · May 24, 2018, 5:40pm

Looking through the bootlog, the only thing that jumps out is:

May 24 10:31:46 miner-001 hive[806]: Error connecting to Hive server http://api.hiveos.farm
May 24 10:31:46 miner-001 hive[806]: CURLE_OPERATION_TIMEDOUT (28) Operation timeout. The specified time-out period was reached according to the conditions.

However, net-test returns OK on everything, and there are no issues if I run hello.

winddude · May 24, 2018, 5:43pm

Looking through the syslog, there’s a little bit more:

May 24 10:32:01 miner-001 cron[567]: (root) RELOAD (crontabs/root)
May 24 10:32:03 miner-001 hivex[1564]: #015
May 24 10:32:05 miner-001 hivex[1564]: waiting for X server to begin accepting connections .
May 24 10:32:07 miner-001 hivex[1564]: …
May 24 10:35:00 miner-001 hivex[1564]: message repeated 86 times: [ …]
May 24 10:35:01 miner-001 CRON[4632]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 24 10:35:02 miner-001 hivex[1564]: …
May 24 10:36:04 miner-001 hivex[1564]: message repeated 31 times: [ …]
May 24 10:36:04 miner-001 hivex[1564]: xinit: giving up
May 24 10:36:04 miner-001 hivex[1564]: xinit: unable to connect to X server: Bad file descriptor
May 24 10:36:04 miner-001 hivex[1564]: #015
May 24 10:36:15 miner-001 hivex[1564]: waiting for X server to shut down …
May 24 10:36:15 miner-001 hivex[1564]: xinit: X server slow to shut down, sending KILL signal
May 24 10:36:15 miner-001 hivex[1564]: #015
May 24 10:36:19 miner-001 hivex[1564]: waiting for server to die …
May 24 10:36:19 miner-001 hivex[1564]: xinit: X server refuses to die
May 24 10:36:19 miner-001 hivex[1564]: xinit exited (exitcode=1), starting hive-console
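
To pull the X server failure out of a longer syslog, a small filter can help. The following is a minimal Python sketch, not a HiveOS tool: the function name `summarize_hivex` is hypothetical, and the line format it expects is taken from the excerpt above. It collects the `xinit` messages logged by `hivex` and totals the "message repeated N times" counts.

```python
import re
import sys

# Matches lines shaped like the excerpt above, e.g.
# "May 24 10:36:04 miner-001 hivex[1564]: xinit: giving up"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<proc>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<msg>.*)$"
)
REPEAT_RE = re.compile(r"message repeated (\d+) times")


def summarize_hivex(lines):
    """Return (xinit_messages, repeated_total) for entries logged by hivex."""
    xinit_messages = []
    repeated_total = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("proc") != "hivex":
            continue  # skip other processes and lines in an unexpected format
        msg = m.group("msg")
        rep = REPEAT_RE.search(msg)
        if rep:
            repeated_total += int(rep.group(1))
        elif msg.startswith("xinit"):
            xinit_messages.append((m.group("ts"), msg))
    return xinit_messages, repeated_total


if __name__ == "__main__":
    # The log path is passed on the command line; nothing is assumed here.
    with open(sys.argv[1], errors="replace") as fh:
        messages, repeats = summarize_hivex(fh)
    for ts, msg in messages:
        print(ts, msg)
    print("repeated wait messages:", repeats)
```

Saved under a name of your choice (for example `scan_syslog.py`, a hypothetical filename), it would be run as `python3 scan_syslog.py <path to syslog>`.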

winddude · May 26, 2018, 6:40pm

The resolution I came to is that it’s easier to separate the cards into separate rigs by AMD and Nvidia.