Boinc can't find GPU again.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0
Topic 197551

Greetings.
I have been using my GPU on Einstein for a while now.
But then, I got the notice that my Linux distribution was out of date and I needed to upgrade.

So, I went to the latest and greatest of Ubuntu Studio (which I like) and got release 14.04.

But now, BOINC can't find the GPU.

I had this problem before; the solution was to get
the OpenCL video/GPU driver.

I have an ATI (Radeon) 7750 GPU and had been using the
"fglrx-updates" driver.

I installed that, verified I had the xserver-xorg driver for ATI as well, and rebooted.
Still no GPU detected.

Here is the message from the event log.

Tue 22 Apr 2014 04:10:16 PM EDT |  | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
Tue 22 Apr 2014 04:10:16 PM EDT |  | log flags: file_xfer, sched_ops, task
Tue 22 Apr 2014 04:10:16 PM EDT |  | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Tue 22 Apr 2014 04:10:16 PM EDT |  | Data directory: /var/lib/boinc-client
Tue 22 Apr 2014 04:10:16 PM EDT |  | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1411.4 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (1411.4))
Tue 22 Apr 2014 04:10:16 PM EDT |  | No usable GPUs found
Tue 22 Apr 2014 04:10:16 PM EDT |  | Host name: pc14-large
Tue 22 Apr 2014 04:10:16 PM EDT |  | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Tue 22 Apr 2014 04:10:16 PM EDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Tue 22 Apr 2014 04:10:16 PM EDT |  | OS: Linux: 3.13.0-24-lowlatency
Tue 22 Apr 2014 04:10:16 PM EDT |  | Memory: 7.70 GB physical, 9.31 GB virtual
Tue 22 Apr 2014 04:10:16 PM EDT |  | Disk: 9.49 GB total, 7.61 GB free

The last time, Boinc couldn't find "libOpenCL.so.1"

The workaround was to

  • Stop boincmanager
  • Manually (re)start the boinc client
  • Start boincmanager

and, yes I did find the shared object file.

find /usr -name "libOpenCL.so.1" -print
/usr/lib32/fglrx/libOpenCL.so.1
/usr/lib32/pxpress/lib/libOpenCL.so.1
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1
/usr/lib/fglrx/libOpenCL.so.1
/usr/lib/pxpress/lib/libOpenCL.so.1
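
Finding the file with find only shows that it exists on disk. Whether the dynamic linker will actually hand it to a program is a separate question. A minimal sketch for checking that, assuming a glibc system where `ldconfig -p` prints the linker cache (the helper name `opencl_in_cache` is made up for illustration):

```shell
# Hypothetical helper: filter `ldconfig -p`-style text on stdin down to
# the libOpenCL entries, so it can also be run on saved output.
opencl_in_cache() {
    grep -i 'libOpenCL' || true   # no match is not treated as an error
}

# Usage on a live system (glibc's ldconfig assumed):
#   ldconfig -p | opencl_in_cache
```

If nothing is printed, the library is on disk but not in the linker cache, which would be consistent with a client that cannot load it.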

OK. I am stumped - can't think of what's wrong.

Any suggestions are appreciated.

Meanwhile, I'll run an OpenCL test:
http://www.phoronix-test-suite.com/

THANKS in advance,
Jay

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

You might find this thread over in the Cruncher's corner useful.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0

Wow,
I spent over 2 hours reading that.
Thank you for pointing that out.
I think I'll get some sleep and start over and read from the bottom up. :)

2 years ago, I spent months doing compiles of an AMD SDK
and trying to install it with a compatible video driver.
Then, I had just used the Ubuntu distribution fglrx drivers that had OpenCL,
and all worked.

I have a deepening appreciation for those people that do the distro builds -
and those users who have the courage and patience to roll their own.

Thanks !!
Jay

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110381098973
RAC: 30452018

Quote:
... I spent over 2 hours reading that.
Thank you for pointing that out.
I think I'll get some sleep and start over and read from the bottom up. :)


I'm sorry if you developed a severe allergic reaction after those two hours of suffering :-).

I tend to be able to work things out for myself much easier by theorising in a written form where I can go back at a later stage and remind myself about what I was really thinking at various points in the journey. Hence a series of probably over-detailed thoughts that evolve considerably over time but undoubtedly contain some false or dubious assumptions along the way. They make perfect sense to me however :-).

Previously, you wrote :-

Quote:

... and, yes I did find the shared object file.
find /usr -name "libOpenCL.so.1" -print
/usr/lib32/fglrx/libOpenCL.so.1
/usr/lib32/pxpress/lib/libOpenCL.so.1
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1
/usr/lib/fglrx/libOpenCL.so.1
/usr/lib/pxpress/lib/libOpenCL.so.1

OK. I am stumped - can't think of what's wrong.

This is what I found works very reliably for me. I'm using PCLinuxOS 32 bit and a 32 bit BOINC. In the repos, there are a series of 'fglrx-current' packages, including a 'fglrx-current-opencl' package - 4 packages in total. Only 3 get installed by default so I make sure the OpenCL package gets installed as well.

Before trying to run BOINC after a new install or an upgrade where there is an AMD GPU, I check the BOINC executables with the ldd utility to make sure there are no missing libs. I also do the same with the science apps that are going to be running. For my distro, the driver and OpenCL libs are installed in /usr/lib/fglrx-current/. The BOINC executables show no dependence on any of these but the OpenCL GPU science apps do and show them as being found in /usr/lib/fglrx-current/. However, if I attempt to run BOINC like this, BOINC always fails to find the GPU.
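
The ldd check described above can be sketched as follows. This is only an illustration: the helper name `missing_libs` is invented, and the binary paths in the usage lines are assumptions that will differ between distros and projects.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the name of
# every library that ldd reports as "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage (paths are assumptions, adjust for your install):
#   ldd /usr/bin/boinc | missing_libs
#   ldd /path/to/science_app | missing_libs
```

An empty result means ldd resolved every dependency of that binary. As noted above, that alone did not guarantee that BOINC would detect the GPU.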

So, by trial and error, I've found that BOINC needed to find certain files in /usr/lib/. BOINC doesn't seem to see them in fglrx-current, although the science apps do. In a root terminal session I did :-

cd /usr/lib
ln -s fglrx-current/libamdocl32.so .
ln -s fglrx-current/libOpenCL.so.1 libOpenCL.so
ldconfig -X


The last command (rebuilding the cache) is a bit of insurance in case the cache isn't fully up-to-date. I don't know if it's really necessary. Having done the above, I always find I can immediately start BOINC and have the GPU properly detected. Crunching starts with no subsequent problems.

From the libs you have listed above, I would suggest creating a symbolic link in /usr/lib/ called libOpenCL.so and pointing to /usr/lib/fglrx/libOpenCL.so.1. If there is also a libamdoclxx.so somewhere, I would create a link to it as well, just as I've done above. My two links cause BOINC to properly detect the GPU but I do have to leave off the version extension on libOpenCL.so or else it doesn't work.
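
For the Ubuntu paths listed earlier, that suggestion might look like the sketch below. It is an untested assumption about that system rather than a verified fix. The helper name `link_opencl` is invented, it uses an absolute link target instead of the relative ones shown above, and it needs to be run as root for /usr/lib.

```shell
# Hypothetical helper: create <libdir>/libOpenCL.so pointing at <target>,
# unless something with that name is already there.
link_opencl() {
    target=$1
    libdir=$2
    if [ ! -e "$target" ]; then
        echo "target not found: $target" >&2
        return 1
    fi
    if [ -e "$libdir/libOpenCL.so" ] || [ -L "$libdir/libOpenCL.so" ]; then
        echo "already present: $libdir/libOpenCL.so" >&2
        return 0
    fi
    ln -s "$target" "$libdir/libOpenCL.so"
}

# Usage as root, followed by a cache rebuild as in the commands above:
#   link_opencl /usr/lib/fglrx/libOpenCL.so.1 /usr/lib && ldconfig -X
```

Refusing to overwrite an existing libOpenCL.so keeps the helper from clobbering a link that the distro's own packages may have installed.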

Try something like this and see if it fixes things.

Cheers,
Gary.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0

Hello Gary,

Thank you for sharing the results of your efforts.

I like your approach to use the packages from the distro.
(About a year and a half ago, I spent about 2 weeks compiling different AMD code -
to find out that it didn't agree with the standard video drivers. I then tried Debian and several flavors of Ubuntu to find the versions of fglrx and BOINC that I needed. I am now gun-shy...)

Anyway, last night, I did two things. I *really* should have stopped after just one and then rebooted to test.

So I now have good news and bad news. :)

Good news: The Ubuntu 14.04 did not have fglrx-current, but it did have
fglrx-updates-dev; and I installed it.

Bad news: I was playing with aticonfig, getting the temperature of the GPU and did a sudo aticonfig --initial. On reboot, the video was stuck with a splash screen.

Yeah. Right.

But. I have 2 video controllers: (from lspci)
1) 01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
2) 06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

(I have no idea why lspci calls my 7750 an audio device, but it does.)

I had put an old GeForce 8400 to use my display driver, hoping to free up crunching resources on the ATI/Radeon 7750.

I can get around the frozen screen by switching my display from the old NVIDIA to the 7750 that I use for crunching.

Whichever it was, BOINC now sees the GPU and I can crunch on Einstein.

Perhaps the aticonfig wrote an X11.conf that screwed things up?
I checked in /etc, and X11 reported that there was no preceding X11.conf to save off...

I think I'll wait for the current WU to complete, back up /home
and prepare to re-install. (and have a beer.)

Then, I'll just install fglrx-updates-dev, and see if that works.
If not, I'll add the links and do the ldconfig -x .

OBTW, the X11 log said
AIGLX error: failed to open /usr/X11R6/lib64/modules/dri/fglrx_dri.so
Well. I have no X11R6 at all in /etc.
I may add empty directories and a link to /usr/lib/fglrx/dri/fglrx_dri.so
(not the lib32?)

Does that look OK to you?

I owe you a beer ..

Thanks, Jay

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Quote:

I have 2 video controllers: (from lspci)
1) 01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
2) 06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

(I have no idea why lspci calls my 7750 an audio device, but it does.)


If your graphics card has an HDMI connector, it might support sending sound to the HDMI device. That is the audio device being reported.
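
To list only the display devices and skip entries such as the HDMI audio function, something like the following sketch could be used. The helper name `display_controllers` is made up, and the class names it matches are the usual lspci labels rather than anything taken from this thread beyond the "VGA compatible controller" line quoted above.

```shell
# Hypothetical helper: keep only display-class devices from `lspci`
# output read on stdin.
display_controllers() {
    grep -Ei 'VGA compatible controller|3D controller|Display controller' || true
}

# Usage on a live system:
#   lspci | display_controllers
```

On a card like this one, the GPU itself would typically appear as a separate function of the same slot (for example 06:00.0) next to the 06:00.1 audio entry.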

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110381098973
RAC: 30452018

Quote:
Good news: The Ubuntu 14.04 did not have fglrx-current, but it did have
fglrx-updates-dev; and I installed it.


Each distro will name their packages differently. In PCLinuxOS, -current is added to mark it as the latest version that has been through testing and is regarded as 'ready for prime time'. There is at least one older version and/or a -legacy version. Other distros will probably use different nomenclature. I wasn't suggesting you should try to find that precise name. The description in the package itself should tell you exactly what driver version it is. I was also trying to alert you to the possibility that there might be more than one package involved if you want everything as provided from AMD. I don't know anything about how this all works with Ubuntu.

Quote:
Bad news: I was playing with aticonfig, getting the temperature of the GPU and did a sudo aticonfig --initial. On reboot, the video was stuck with a splash screen.


Did you run aticonfig from a terminal session after starting X or was it from a console screen with X not running? If the former, you should have a working xorg.conf which should be still there as /etc/X11/xorg.conf.old even after initializing with aticonfig and you could rename it from xorg.conf.old to xorg.conf and then reboot. If the latter, you should have a non-working xorg.conf. You can browse this file to see what driver is being used - should be fglrx. If X is crashing while trying to start, you can browse for error messages in /var/log/Xorg.0.log. You should be able to see exactly what it's unhappy about. Look for entries that are marked with (EE).
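
Both checks can be scripted. The sketch below assumes the file locations named above (/etc/X11/xorg.conf and /var/log/Xorg.0.log); the helper names `xorg_drivers` and `xorg_errors` are invented for illustration.

```shell
# Hypothetical helper: print the Driver lines from an xorg.conf-style file.
xorg_drivers() {
    grep -Ei '^[[:space:]]*Driver[[:space:]]' "$1" || true
}

# Hypothetical helper: print log lines that begin with the (EE) marker,
# with or without a leading [timestamp]. This skips the legend line in
# the log header, which mentions (EE) in the middle of the line.
xorg_errors() {
    grep -E '^(\[ *[0-9.]+\] )?\(EE\)' "$1" || true
}

# Usage:
#   xorg_drivers /etc/X11/xorg.conf
#   xorg_errors /var/log/Xorg.0.log
```

If `xorg_drivers` prints a "fglrx" line while the monitor is plugged into the NVIDIA card, that mismatch is a likely cause of a frozen screen.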

One of the things I really like about PCLinuxOS is that there is a very simple control centre function for creating/adjusting xorg.conf in the first place and a very simple recovery from a borked xorg.conf. That's little comfort to you, I know :-).

Quote:
I had put an old GeForce 8400 to use my display driver, hoping to free up crunching resources on the ATI/Radeon 7750.


Did you install video drivers for this card and redo your X configuration? I'm not surprised you got a frozen screen if you still had the fglrx driver in xorg.conf.

Quote:
I can get around the frozen screen by switching my display from the old NVIDIA to the 7750 that I use for crunching.


If by "get around the frozen screen" you mean "start X and get fully to the desktop without any problems," then yes, you obviously have the fglrx driver in xorg.conf.

Quote:
Whichever it was, BOINC now sees the GPU and I can crunch on Einstein.


That's great, you're home and hosed! :-).

Quote:
Perhaps the aticonfig wrote an X11.conf that screwed things up?
I checked in /etc, and X11 reported that there was no preceding X11.conf to save off...


aticonfig --initial is designed to create a new xorg.conf and it will edit/replace any previous version you may have had. It should save a backup copy that you can recover later. I don't use that flag as PCLinuxOS creates a good version on initial installation. I don't know about your distro but I'd be quite surprised if the file is X11.conf. The full path should be /etc/X11/xorg.conf. Once you have a working version you can easily make a backup copy like /etc/X11/xorg.conf_fglrx_7750 or whatever to remind you of exactly what it was for. That way, you could try to get a working setup with your nvidia card and use whatever procedure is available to create a new xorg.conf for that card without ever losing the working version for your AMD card. Then, if you ever want to change the card your display hangs off back to the AMD card, as root just

cd /etc/X11
mv xorg.conf xorg.conf_nvidia_8400
cp xorg.conf_fglrx_7750 xorg.conf


and then reboot.

Quote:
I think I'll wait for the current WU to complete, back up /home
and prepare to re-install. (and have a beer.)


Why do you want to do that if your GPU is actually crunching?

Quote:
Then, I'll just install fglrx-updates-dev, and see if that works.
If not, I'll add the links and do the ldconfig -x .


That's a capital X and if your GPU is already crunching, you certainly don't need to do that.

Quote:

OBTW, the X11 log said
AIGLX error: failed to open /usr/X11R6/lib64/modules/dri/fglrx_dri.so
Well. I have no X11R6 at all in /etc.
I may add empty directories and a link to /usr/lib/fglrx/dri/fglrx_dri.so
(not the lib32?)

Does that look OK to you?


No it doesn't! :-). What exactly do you mean by "X11 log" - full path please. You shouldn't have X11R6 in /etc. /etc is largely for configuration files for all sorts of things. You should have /etc/X11/ for all the X configuration stuff. You may very well have /usr/X11R6/lib.... for shared libs related to X. The error message you quote looks like X having an issue with using an AMD driver with an NVIDIA card perhaps. If you can't sort it out, cut and paste a few lines of context either side.

Quote:
I owe you a beer ..


No you don't :-).

Cheers,
Gary.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0

Hi Gary,

Because I had probably messed things up really well, I redid the
system install with UbuntuStudio 13.10, one release before the 14.04 version, and started over. This time, I vowed to test only one thing at a time and reboot between tests.

I'll answer your questions first - but they might be moot, now.

Quote:
Did you run aticonfig from a terminal session after starting X or was it from a console screen with X not running? If the former, you should have a working xorg.conf which should be still there as /etc/X11/xorg.conf.old even after initializing with aticonfig and you could rename it from xorg.conf.old to xorg.conf and then reboot. If the latter, you should have a non-working xorg.conf. You can browse this file to see what driver is being used - should be fglrx. If X is crashing while trying to start, you can browse for error messages in /var/log/Xorg.0.log. You should be able to see exactly what it's unhappy about. Look for entries that are marked with (EE)

I was using a gnome-terminal from lightdm.
A ps -ef shows

root      1025     1  0 16:45 ?        00:00:00 lightdm
root      1051  1025  1 16:45 tty7     00:00:45 /usr/bin/X -core :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
root      1128  1025  0 16:45 ?        00:00:00 lightdm --session-child 12 19

Quote:

Did you install video drivers for this card and redo your X configuration? I'm not surprised you got a frozen screen if you still had the fglrx driver in xorg.conf.

I can get around the frozen screen by switching my display from the old NVIDIA to the 7750 that I use for crunching.

If by "get around the frozen screen" you mean "start X and get fully to the desktop without any problems," then yes, you obviously have the fglrx driver in xorg.conf.

Yup.
Here is where I feel like a fool.

Both drivers were installed.
If I unplugged the video cable from the old card, or removed the driver for the old NVIDIA card, the Radeon/ATI card would be used and BOINC could crunch.

Otherwise, the old nvidia card would be used and BOINC couldn't see the ATI/Radeon card. (It has 2GB of memory.)

I now *believe* it all has to do with how the two video adapters are connected to a video display.

I have a single flat-panel display with two inputs. The VGA input is attached to the old NVIDIA card and the DVI-D input is attached to the ATI/Radeon card.

The entire thought behind this was to free up mem & processing on the ATI/Radeon card so I might do two GPU WU at once.
The older Ubuntu OS let me do this out-of-the-box.
Now, it looks like it can't.

I got suspicious when I saw lines out of /var/log/syslog and /var/log/udev
that said that the old nvidia card was the primary device for display=1

Maybe that is it.
I'll disconnect the NVIDIA card and see what happens.

Now, I have to do shopping for dinner. :)
Thank you for your time.
I can try to boot with the display turned off and see what happens. :)

Thanks again,
Jay

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0

--- M O S T L Y -- S O L V E D ---

The problem was getting the OS and X (and lightdm?) to recognize the correct card.
They did not cope well with two cards from different manufacturers.

Ubuntu pulled in an OpenCL driver with a non-free package. That wasn't a problem.

I got things working by taking out a video card.

BOINC now reports:

Fri 25 Apr 2014 01:33:02 PM EDT |  | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 2048MB, 1908MB available, 2048 GFLOPS peak)
Fri 25 Apr 2014 01:33:02 PM EDT |  | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1411.4 (VM), device version OpenCL 1.2 AMD-APP (1411.4), 2048MB, 1908MB available, 2048 GFLOPS peak)
Fri 25 Apr 2014 01:33:02 PM EDT |  | OpenCL CPU: AMD FX(tm)-8150 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1411.4 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (1411.4))
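
To pull just the GPU detection lines out of a saved copy of the event log, a filter along these lines may help. It is a sketch that matches only the message formats quoted in this thread; the helper name `boinc_gpu_lines` is invented, and the log path in the usage line is an assumption based on the data directory reported earlier.

```shell
# Hypothetical helper: read BOINC event-log text on stdin and keep the
# lines that report GPU detection, either a detected CAL/OpenCL GPU or
# the "No usable GPUs found" message. OpenCL CPU lines are skipped.
boinc_gpu_lines() {
    grep -E 'No usable GPUs found|(CAL|OpenCL): .*GPU [0-9]+:' || true
}

# Usage (log file name and location are assumptions):
#   boinc_gpu_lines < /var/lib/boinc-client/stdoutdae.txt
```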

The Linux is: 3.13.0-24-lowlatency
The Distribution is: UbuntuStudio 14.04 - with only the extra installation of the non-free (multiverse) package of fglrx-updates
(this package required (and pulled in) fglrx-amdcccle-updates.)

In summary, using two different video cards used to work for me in the 12.10 release. I found the problem when I tried to upgrade to either 13.10 or 14.04.

I solved the problem by removing the older NVIDIA GeForce 8400 GS and using only the ATI/Radeon 7750.

I'll pursue a problem report later.

Thanks for all of the support!!
I learned a lot.

Thanks again,
Jay
