NVIDIA drivers present but no GPUs found (Ubuntu 14.04, GTX970)

Sparrow
Sparrow
Joined: 4 Jul 11
Posts: 29
Credit: 10701417
RAC: 0
Topic 198261

Until a few days ago, Einstein ran just fine on my system. But now the GPU tasks are not running anymore (no GPU found). Here are the BOINC messages:

Sa 03 Okt 2015 08:06:11 CEST | | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
Sa 03 Okt 2015 08:06:11 CEST | | log flags: file_xfer, sched_ops, task, coproc_debug
Sa 03 Okt 2015 08:06:11 CEST | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Sa 03 Okt 2015 08:06:11 CEST | | Data directory: /var/lib/boinc-client
Sa 03 Okt 2015 08:06:11 CEST | | [coproc] launching child process at /usr/bin/boinc
Sa 03 Okt 2015 08:06:11 CEST | | [coproc] relative to directory /var/lib/boinc-client
Sa 03 Okt 2015 08:06:11 CEST | | [coproc] with data directory /var/lib/boinc-client
Sa 03 Okt 2015 08:06:11 CEST | | OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 346.35, device version OpenCL 1.1 CUDA, 4096MB, 4096MB available, 274 GFLOPS peak)
Sa 03 Okt 2015 08:06:11 CEST | | NVIDIA drivers present but no GPUs found
Sa 03 Okt 2015 08:06:11 CEST | | No ATI library found
Sa 03 Okt 2015 08:06:11 CEST | | App version needs CUDA but GPU doesn't support it
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | Application uses missing NVIDIA GPU
Sa 03 Okt 2015 08:06:11 CEST | | App version needs CUDA but GPU doesn't support it
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | Application uses missing NVIDIA GPU
Sa 03 Okt 2015 08:06:11 CEST | | App version needs CUDA but GPU doesn't support it
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | Application uses missing NVIDIA GPU
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | Missing coprocessor for task p2030.20150806.G66.11-04.07.S.b0s0g0.00000_3152_1
Sa 03 Okt 2015 08:06:11 CEST | | Host name: sparrow
Sa 03 Okt 2015 08:06:11 CEST | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz [Family 6 Model 60 Stepping 3]
Sa 03 Okt 2015 08:06:11 CEST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
Sa 03 Okt 2015 08:06:11 CEST | | OS: Linux: 3.19.0-30-generic
Sa 03 Okt 2015 08:06:11 CEST | | Memory: 15.62 GB physical, 10.99 GB virtual
Sa 03 Okt 2015 08:06:11 CEST | | Disk: 45.71 GB total, 29.93 GB free
Sa 03 Okt 2015 08:06:11 CEST | | Local time is UTC +2 hours
Sa 03 Okt 2015 08:06:11 CEST | | Config: GUI RPCs allowed from:
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11460511; resource share 100
Sa 03 Okt 2015 08:06:11 CEST | | General prefs: from http://boincsimap.org/boincsimap/ (last modified 01-Nov-2009 07:23:32)
Sa 03 Okt 2015 08:06:11 CEST | | Host location: none
Sa 03 Okt 2015 08:06:11 CEST | | General prefs: using your defaults
Sa 03 Okt 2015 08:06:11 CEST | | Reading preferences override file
Sa 03 Okt 2015 08:06:11 CEST | | Preferences:
Sa 03 Okt 2015 08:06:11 CEST | | max memory usage when active: 14397.58MB
Sa 03 Okt 2015 08:06:11 CEST | | max memory usage when idle: 14397.58MB
Sa 03 Okt 2015 08:06:11 CEST | | max disk usage: 15.00GB
Sa 03 Okt 2015 08:06:11 CEST | | max CPUs used: 4
Sa 03 Okt 2015 08:06:11 CEST | | (to change preferences, visit a project web site or select Preferences in the Manager)
Sa 03 Okt 2015 08:06:11 CEST | | gui_rpc_auth.cfg is empty - no GUI RPC password protection
Sa 03 Okt 2015 08:06:11 CEST | | Not using a proxy
Sa 03 Okt 2015 08:06:11 CEST | Einstein@Home | Sending scheduler request: To fetch work.
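
One detail in the log above is that the OpenCL line reports the GTX 970 while the very next line says no GPUs were found. The following is only a rough sketch for pulling those lines out of a saved copy of the messages. It assumes each line has the form "timestamp | project | text" as shown above; the function names and the list of symptom strings are illustrative and not part of BOINC.

```python
import re

# Message fragments copied from the log above; treating them as
# "GPU symptoms" is an editorial assumption, not a BOINC concept.
GPU_SYMPTOMS = (
    "NVIDIA drivers present but no GPUs found",
    "App version needs CUDA but GPU doesn't support it",
    "Application uses missing NVIDIA GPU",
    "Missing coprocessor for task",
)


def split_message(line):
    """Split 'timestamp | project | text' into a 3-tuple, or None if it does not fit."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)


def gpu_symptoms(lines):
    """Return the distinct symptom messages found, in first-seen order."""
    seen = []
    for line in lines:
        fields = split_message(line)
        if fields is None:
            continue
        text = fields[2]
        for symptom in GPU_SYMPTOMS:
            if text.startswith(symptom) and symptom not in seen:
                seen.append(symptom)
    return seen


def opencl_driver_version(lines):
    """Return the driver version from an 'OpenCL: NVIDIA GPU' line, or None."""
    pattern = re.compile(r"OpenCL: NVIDIA GPU \d+: .*driver version ([\d.]+)")
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None
```

If opencl_driver_version() returns a version while gpu_symptoms() is non-empty, the driver is at least partly visible to the client, which narrows the search to the CUDA side of detection.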

I am running Ubuntu 14.04 with kernel 3.19.0-30-generic. I have an NVIDIA GTX 970 with the 346.35 driver.
I have not changed anything in the last few weeks; only the Ubuntu auto-update has touched the system. The NVIDIA driver and the BOINC client have the same version numbers as before. The kernel may have been updated by the auto-update, but booting Ubuntu with an older kernel (3.19.0-26 or -28) doesn't fix the problem.

Any ideas what could have caused this problem?

Logforme
Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

I don't run on Linux anymore, but a quick Google search for "NVIDIA drivers present but no GPUs found" turned up something about "nvidia-modprobe" and something about "xhost access control".
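
For anyone following the nvidia-modprobe lead: that tool exists to load the kernel module and create the /dev/nvidia* device files for non-root users, so a first check is whether those files exist and whether the account running BOINC can open them. A minimal sketch, assuming the usual /dev location; the function names are made up for illustration.

```python
import glob
import os


def nvidia_device_nodes(dev_dir="/dev"):
    """List device files such as /dev/nvidia0 and /dev/nvidiactl, sorted by name."""
    return sorted(glob.glob(os.path.join(dev_dir, "nvidia*")))


def unreadable_nodes(paths):
    """Return the paths the current user cannot open for both reading and writing."""
    return [path for path in paths if not os.access(path, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    nodes = nvidia_device_nodes()
    if not nodes:
        print("no /dev/nvidia* device files found")
    for path in unreadable_nodes(nodes):
        print("no read/write access for this user:", path)
```

The result depends on who runs it, so it only says something about BOINC if it is run as the same user as the client.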

Sparrow
Sparrow
Joined: 4 Jul 11
Posts: 29
Credit: 10701417
RAC: 0

I tried both. It didn't help :-(

Zalster
Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

What about reinstalling the NVIDIA driver with a clean install?

Sparrow
Sparrow
Joined: 4 Jul 11
Posts: 29
Credit: 10701417
RAC: 0

Yes, a driver re-install might help. I had hoped that someone would recognize the error right away and point out an easy way to fix it.

Running GPU tasks under Ubuntu was troublesome from the beginning. It seems that too many pieces of software from too many different developers have to work together to get it to run. After almost every update of Ubuntu or the NVIDIA driver it needs fixing. Now it seems that even Ubuntu auto-updates break it.
I guess I'm at the point of just giving up on it. CPU tasks always work without problems, so I will just run those. Since I have no 24/7 machines and I'm spending all my free time on Final Fantasy 14 lately (running under Win7), it's not much of a loss anyway :-)

mikey
mikey
Joined: 22 Jan 05
Posts: 11979
Credit: 1834165908
RAC: 199750

Quote:

Yes, a driver re-install might help. I had hoped that someone would recognize the error right away and point out an easy way to fix it.

Running GPU tasks under Ubuntu was troublesome from the beginning. It seems that too many pieces of software from too many different developers have to work together to get it to run. After almost every update of Ubuntu or the NVIDIA driver it needs fixing. Now it seems that even Ubuntu auto-updates break it.
I guess I'm at the point of just giving up on it. CPU tasks always work without problems, so I will just run those. Since I have no 24/7 machines and I'm spending all my free time on Final Fantasy 14 lately (running under Win7), it's not much of a loss anyway :-)

I used to run Ubuntu too, but I turned off the frequent OS updates, which made it stable, and it crunched just fine. I can't help with your problem, though; I also have lots of trouble getting GPUs to work in Linux in general.

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

Running GPU tasks under Ubuntu was troublesome from the beginning. It seems that too many pieces of software from too many different developers have to work together to get it to run. After almost every update of Ubuntu or the NVIDIA driver it needs fixing. Now it seems that even Ubuntu auto-updates break it.

You haven't described exactly what the initial troubles were, and I could not find any earlier posts. I guess they are the same issue?

I have run Ubuntu with nVidia for several years without too much difficulty; however, I'm reasonably confident working at the command line.

Others may have a different approach, but I
+ never use GPU drivers from registry
+ never upgrade OS in place
+ read the updates before applying them.
+ do not install optional updates unless there is a benefit
+ always apply security updates
+ keep to LTS versions of Ubuntu.
+ keep a rescue boot usb handy
+ don't assume a driver works well for all nvidia cards.
+ don't assume the latest driver is more reliable than the one you are currently using. The E@H forums usually have good advice, and the release notes and the README usually give a few clues.

I haven't updated nVidia drivers for a while; I have been running 349.12 for over 6 months.

Your version, 346.35, had a release date of 2015.1.16 and was the first to support the GTX 970 from what I can tell. I have not seen any discussion, good or bad, of this release; however, I would reiterate the advice already given to get a newer driver.

You will need to uninstall and purge the old driver (apt-get ...), and that may have an impact on system stability, so reinstalling from scratch may be easier. If you install an nVidia-supplied driver manually, you must not use the repository version.

Latest 64 bit release 352.41 has the location of the latest driver.

There is a very good GPU recognition post which helped me, "GPUs : Debian/Ubuntu/Mint/Derivatives - GPU recognition fixes". It is worth reading, especially for checking the delay settings.

Good luck.

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
+ never use GPU drivers from registry


Apologies for the double post; I missed the edit window. That should read

+ never use GPU drivers from (Ubuntu) repository

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110101161497
RAC: 24375503

Quote:
+ never use GPU drivers from (Ubuntu) repository


So you download the driver package from nvidia and build your own?

I did that for fglrx drivers (AMD) when I first started playing with a HD 7770 card. With the distro I use (not ubuntu or derivatives), I've never had to do it with nvidia as the repo supplied drivers and libs have always worked. When I first wanted to try later versions of fglrx, I was able to use the repo supplied drivers as the packaging issues had been fixed.

I currently have over 20 hosts with various nvidia cards. The version I'm running is 331.49. They run fine and I tend not to update unless there seems to be a reported performance gain by doing so. I do test this (relatively infrequently) and about a month or two ago I decided to buy a GTX 750Ti and try it out with the latest versions of everything for my distro of choice. I don't remember exactly what driver version it was but it was quite recent.

My distro packages the driver separately from what it calls the 'CUDA and OpenCL high performance computing libs' and there is a note saying that these libs are not required for 'normal' use. I actually feel rather special being NOT 'normal' :-). The libs are there in the repo - you just have to make sure you install them. So I made sure they were installed for the test using the GTX 750Ti.

When I tried to start crunching with the new GPU, I got pretty much the same result as reported by the OP - the driver was recognised but no usable GPUs were found. I double checked everything to no avail. So I reinstalled the OS using the kernel and drivers from May 2014 (exactly what is on all other nvidia hosts) and the new machine started crunching straight away.

On further investigation, I found this thread on my distro's forums which I believe is why I couldn't get my card running with the latest driver - a missing module called nvidia-uvm that is now needed for GPU computing. If you look for posts by "TerryN", the person who packages the nvidia drivers you will find this statement

Quote:
Quote:

Quote from: Orion on September 10, 2015, 07:34:57 AM

> FATAL: Module nvidia_uvm not found.
> CUDA cuInit: Unknown error

Ah yes, that is true that we don't actually build the uvm module at the moment (because nobody seemed to want it and it would add an extra dkms build ;D).
That can easily be added (I think). Looks like we also need an extra package for the cuda-toolkit. I'll look when I get time.

Terry.

Of course, that explains my problem - I just have to wait until TerryN works out how he will deal with the nvidia-uvm module - but I know nothing about the situation with ubuntu. All I can suggest is that the OP does some digging in the ubuntu world to see if the same problem exists there. For what nvidia-uvm is and when it was added, check out the 5th bullet point on this page,
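
Whether the uvm module is actually loaded can be read from /proc/modules, where each line starts with a module name. The sketch below is only a starting point: the names "nvidia" and "nvidia_uvm" are assumptions, since some distro packages give their modules different (for example versioned) names, and the function names are invented for illustration.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from the text of /proc/modules."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def missing_nvidia_modules(proc_modules_text, wanted=("nvidia", "nvidia_uvm")):
    """Return the wanted module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]


def check_this_host(path="/proc/modules"):
    """Read the module list of the running kernel and report missing modules."""
    with open(path) as handle:
        return missing_nvidia_modules(handle.read())
```

Calling check_this_host() on the affected machine returns an empty list when both names are loaded. A missing nvidia_uvm would match the "Module nvidia_uvm not found" error quoted above, but it is only a hint, not proof of the cause.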

Cheers,
Gary.

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
Quote:
+ never use GPU drivers from (Ubuntu) repository

So you download the driver package from nvidia and build your own?

Yes, the nVidia installs are relatively simple (on Ubuntu), but they require X to be unloaded and a shell script to be run.

I'd also add to my list
+ ensure you can run an ssh or telnet session from another machine to the host you are upgrading.

Quote:

I did that for fglrx drivers (AMD) when I first started playing with a HD 7770 card. With the distro I use (not ubuntu or derivatives), I've never had to do it with nvidia as the repo supplied drivers and libs have always worked. When I first wanted to try later versions of fglrx, I was able to use the repo supplied drivers as the packaging issues had been fixed.

AMD could make life a *lot* easier for driver installs; it was twice as painful getting my 7990 running *reliably* compared with setting up the nVidia, but worth doing nevertheless.

Quote:

I currently have over 20 hosts with various nvidia cards.

I can see why you are attracted to a good repo version approach!

Quote:


The version I'm running is 331.49. They run fine and I tend not to update unless there seems to be a reported performance gain by doing so. I do test this (relatively infrequently) and about a month or two ago I decided to buy a GTX 750Ti and try it out with the latest versions of everything for my distro of choice. I don't remember exactly what driver version it was but it was quite recent.

My distro packages the driver separately from what it calls the 'CUDA and OpenCL high performance computing libs' and there is a note saying that these libs are not required for 'normal' use. I actually feel rather special being NOT 'normal' :-).

Have no doubt, readers: running even a single GPU reliably is special, no matter what project, OS or GPU is used.

Quote:


The libs are there in the repo - you just have to make sure you install them. So I made sure they were installed for the test using the GTX 750Ti.

When I tried to start crunching with the new GPU, I got pretty much the same result as reported by the OP - the driver was recognised but no usable GPUs were found. I double checked everything to no avail. So I reinstalled the OS using the kernel and drivers from May 2014 (exactly what is on all other nvidia hosts) and the new machine started crunching straight away.

The Ubuntu and other Debian-based versions of boinc auto-start in the background, running as user "boinc". This can and does cause race conditions at start-up, as the GPU cards and their drivers are not always ready when boinc starts.

AMD is even more complex as there are X dependencies as well (xhost and aticonfig magic needed).

I like this "auto-start as another user in background" approach - but - it is a very common cause of "no GPU found" and other problems, and again, getting it to be reliable can take some effort.
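
The start-up race comes down to "do not start the client until the GPU device files exist". A minimal sketch of that idea, assuming /dev/nvidia0 is the file to wait for and that some wrapper (not shown, and not something the Ubuntu package provides) runs this before launching boinc:

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5):
    """Poll until path exists or timeout seconds pass; return True if it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # /dev/nvidia0 is an assumed device file name for the first NVIDIA GPU.
    ready = wait_for_path("/dev/nvidia0", timeout=5.0)
    print("GPU device file present" if ready else "gave up waiting for the GPU")
```

A fixed delay before starting the client does a similar job more crudely; polling just avoids waiting longer than necessary.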

Of course this means my version of boinc lags behind the latest version as Ubuntu supply it from their repo. I am slightly embarrassed about running boinc 6.10.17 on LTS 10.04 - on my nvidia cruncher - but it keeps going strong, and it is the science apps which are where the crunching gets done.

Quote:

Of course, that explains my problem - I just have to wait until TerryN works out how he will deal with the nvidia-uvm module - but I know nothing about the situation with Ubuntu. All I can suggest is that the OP does some digging in the Ubuntu world to see if the same problem exists there. For what nvidia-uvm is and when it was added, check out the 5th bullet point on this page,

This highlights exactly my ever-growing liking for open source and Linux distros. You *can* get in touch and get a direct response from the authors, and if you really want to, fix it yourself.

Ubuntu LTS is obviously a more mass-market distro, so getting things changed and pushed through QC takes longer, but the bug reports are there to review and see progress (or not) on fixes. I don't expect Ubuntu to do any testing with CUDA or OpenCL; I do expect AMD and nvidia to do that testing, so that is where I go to get such stuff.

I use my crunchers for a lot of things, so I need a good desktop while crunching away. I could certainly see the attraction of pclinuxos and other lighter distros if I had a farm (zoo / ark / meganangerie[sic] - what is the collective cruncher noun?).

All said, it really is important to read the driver notes before installing; for example, according to the above page, that specific driver would not support the OP's GTX 970.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110101161497
RAC: 24375503

Quote:
Quote:

I currently have over 20 hosts with various nvidia cards.

I can see why you are attracted to a good repo version approach!


Exactly! With 20+ nvidia and 30+ AMD, I need a quick and painless method to get a host fully configured and working. New installs are one thing but there is a significant saving in having simple procedures for coping with hardware reconfiguration and hardware failure. I keep multiple versions (approx 6 months apart) of my distro's repos on a 1TB USB drive, together with multiple OS versions that can be booted as live media (much faster than a CD). It goes back for the last couple of years so I can very quickly go back to a known good version if the very latest has an issue. Once I found the problem with my new 750Ti and the very latest, it took about 20 mins to reinstall the OS and update to a known good point in time and get back to crunching - quick and painless :-).

Yesterday, my control script notified me of an unresponsive host. On trying to reboot that host, there was no hard disk being detected by the BIOS. Pulled the plug for about 5 mins and tried again. Disk was identified but SMART status was BAD. It was a 40GB IDE from about 2004. Hooked up another old IDE drive to the second plug on the IDE cable and booted the live USB I use for installs and recovery.

Ran fsck on the partitions on the BAD drive. They checked out OK so I created suitable partitions on the 2nd drive, mounted /home partitions of each drive on /media/sdax and /media/sdbx and simply copied the entire contents of /media/sdax to /media/sdbx. There were no copying errors reported so I reinstalled the OS on the root partition of the replacement drive, installed drivers (HD 7850) and applied all extras and updates using my 'date-of-choice' repo copy. After shutting down and removing the BAD drive, I was able to reboot the machine on the replacement drive and immediately restart crunching (4xGPU, 2xCPU) from all the saved checkpoints. Not a single task was lost and the entire exercise took less than two hours.

Quote:
The Ubuntu and other Debian-based versions of boinc auto-start in the background, running as user "boinc". This can and does cause race conditions at start-up, as the GPU cards and their drivers are not always ready when boinc starts.


This is really needed when crunching is an adjunct to the main use of the machine. When the machine only does crunching, I prefer to get versions straight from Berkeley and install them directly in a subdirectory of the home directory - /home/gary/BOINC, with everything owned by 'gary'. I don't allow the boinc daemon to autostart. There's an icon on the desktop to click to start the daemon and another one to stop it. The only time machines are restarted is when there is a problem and more often than not you may want to check things after a restart without having the boinc daemon doing its own thing. A big plus for me is the virtually instant version change capability. Stop BOINC, overwrite boinc, boinccmd and boincmgr with new versions, restart BOINC. The new versions are on the 1TB USB drive which automounts when plugged in.

Quote:
... what is the collective cruncher noun?).


I like 'farm' :-).

Quote:
All said, it really is important to read the driver notes before installing; for example, according to the above page, that specific driver would not support the OP's GTX 970.


Yes indeed!! You need to read carefully as important info can be quite brief. For example, in the bullet point I mentioned, there is this little bit

... This kernel module provides support for the new Unified Memory feature in an upcoming CUDA release.

In other words, the uvm module is already here but won't be needed until some later driver version incorporates an update to CUDA which will then be able to use this module. This is presumably why I can use this driver version with a 750Ti but without actually having a uvm module. I want to try a recent driver with the CUDA changes that require the uvm module because this may well improve performance for Maxwell1 GPUs as well as Maxwell2. If nvidia-uvm doesn't appear soon in my distro's repo, I'll probably build a driver from the stuff on the nvidia website (out of curiosity) :-).

Cheers,
Gary.
