ABP1 CUDA applications


Message boards : Cruncher's Corner : ABP1 CUDA applications

Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 15 Oct 04
Posts: 3612
Credit: 128,542,060
RAC: 55,106
Message 100691 - Posted: 26 Nov 2009, 12:03:06 UTC
Last modified: 26 Nov 2009, 12:20:31 UTC

We have finally begun to automatically deliver CUDA work & applications (plan class "ABP1cuda23") to machines that satisfy the following requirements:

- enabled NVIDIA GPU work in Einstein@home preferences
- NVidia GPU with at least 450MB of free memory
- Display Driver version 190.38 (&up), i.e. CUDA 2.3 capability
- BOINC Core Client version 6.10 (&up)

CUDA Beta App testers should drain their work cache and switch back to the normal project work.
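For illustration, the cut-offs above amount to a simple predicate. A minimal Python sketch, assuming version numbers are compared as tuples; the function name and comparison style are mine, not actual Einstein@Home scheduler code:

```python
# Hedged sketch of the scheduler cut-offs listed above. The helper name
# and tuple-based version comparison are illustrative assumptions only.
def eligible_for_abp1cuda23(gpu_pref_enabled, free_gpu_mem_mb,
                            driver_version, boinc_version):
    return (gpu_pref_enabled
            and free_gpu_mem_mb >= 450          # at least 450 MB free GPU memory
            and driver_version >= (190, 38)     # CUDA 2.3 capable display driver
            and boinc_version >= (6, 10))       # BOINC core client 6.10 and up

print(eligible_for_abp1cuda23(True, 896, (190, 38), (6, 10, 18)))  # True
print(eligible_for_abp1cuda23(True, 256, (195, 62), (6, 10, 18)))  # False (memory)
```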

BM

Richard Haselgrove
Send message
Joined: 10 Dec 05
Posts: 1722
Credit: 64,965,704
RAC: 56,953
Message 100694 - Posted: 26 Nov 2009, 12:07:06 UTC

Are these the same 'hybrid' applications, requiring a full CPU core in support, that we were testing in Beta, or have you been able to transfer more of the other calculations (apart from the FFT) onto the GPU?

Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 15 Oct 04
Posts: 3612
Credit: 128,542,060
RAC: 55,106
Message 100695 - Posted: 26 Nov 2009, 12:13:31 UTC - in response to Message 100694.

Are these the same 'hybrid' applications, requiring a full CPU core in support, that we were testing in Beta, or have you been able to transfer more of the other calculations (apart from the FFT) onto the GPU?

These are basically the same apps that were in the Beta test. They still require a full CPU core.

BM
cristipurdel
Send message
Joined: 19 Jul 07
Posts: 17
Credit: 93,077
RAC: 0
Message 100697 - Posted: 26 Nov 2009, 13:01:55 UTC - in response to Message 100695.

Are these the same 'hybrid' applications, requiring a full CPU core in support, that we were testing in Beta, or have you been able to transfer more of the other calculations (apart from the FFT) onto the GPU?

These are basically the same apps that were in the Beta test. They still require a full CPU core.

BM

Do they require a core at 100%, or can we set a lower value so that other projects can use the rest?
I'm running it with 0.3 CPU + 1.0 GPU.
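For readers wondering how a fractional reservation like 0.3 CPU + 1.0 GPU is set up: under BOINC's anonymous platform mechanism this goes in an app_info.xml. A hedged sketch, where the executable file name is illustrative (the app name and version 313 are taken from this thread) and the exact schema depends on your client version:

```xml
<app_info>
  <app>
    <name>einsteinbinary_ABP1</name>
  </app>
  <file_info>
    <!-- illustrative file name; use the binary your client actually downloaded -->
    <name>einsteinbinary_ABP1_3.13_windows_intelx86__ABP1cuda23.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_ABP1</app_name>
    <version_num>313</version_num>
    <plan_class>ABP1cuda23</plan_class>
    <avg_ncpus>0.3</avg_ncpus>   <!-- the 0.3 CPU reservation -->
    <max_ncpus>1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>           <!-- the 1.0 GPU reservation -->
    </coproc>
    <file_ref>
      <file_name>einsteinbinary_ABP1_3.13_windows_intelx86__ABP1cuda23.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

Note this only changes the client's bookkeeping of how much CPU is reserved, not how much CPU the app actually uses.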
Profile Svenie25
Send message
Joined: 21 Mar 05
Posts: 139
Credit: 2,436,862
RAC: 0
Message 100703 - Posted: 26 Nov 2009, 14:33:36 UTC

The new apps are not shown on the apps page: http://einstein.phys.uwm.edu/apps.php

Now it would be nice if you could select that your CPU only gets tasks for S5R6. Because the ABP1 work is done on the GPU.
____________

[B^S] Elphidieus
Send message
Joined: 20 Feb 05
Posts: 220
Credit: 12,347,408
RAC: 0
Message 100704 - Posted: 26 Nov 2009, 14:52:37 UTC - in response to Message 100691.

We have finally begun to automatically deliver CUDA work & applications (plan class "ABP1cuda23") to machines that satisfy the following requirements:

- enabled NVIDIA GPU work in Einstein@home preferences
- NVidia GPU with at least 450MB of free memory
- Display Driver version 190.38 (&up), i.e. CUDA 2.3 capability
- BOINC Core Client version 6.10 (&up)

CUDA Beta App testers should drain their work cache and switch back to the normal project work.

BM


And did you forget to include a fifth requirement - not meant for Macs...!!!
____________
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 15 Oct 04
Posts: 3612
Credit: 128,542,060
RAC: 55,106
Message 100706 - Posted: 26 Nov 2009, 15:55:20 UTC - in response to Message 100704.

And did you forget to include a fifth requirement - not meant for Macs...!!!

Yep. Stupid as it is, NVidia hasn't yet delivered the promised 64-bit CUDA libraries for Mac OS X. As soon as they do, we could build and send out CUDA apps for Macs, too.

BM
Profile Gundolf Jahn
Send message
Joined: 1 Mar 05
Posts: 1079
Credit: 341,280
RAC: 0
Message 100709 - Posted: 26 Nov 2009, 16:20:51 UTC - in response to Message 100703.

Because the ABP1 work is done on the GPU.

It's not, at least not exclusively.
Grutte Pier [Wa Oars]~GP500
Avatar
Send message
Joined: 18 May 09
Posts: 39
Credit: 4,329,959
RAC: 1,245
Message 100710 - Posted: 26 Nov 2009, 17:18:48 UTC - in response to Message 100697.
Last modified: 26 Nov 2009, 17:22:10 UTC

Are these the same 'hybrid' applications, requiring a full CPU core in support, that we were testing in Beta, or have you been able to transfer more of the other calculations (apart from the FFT) onto the GPU?

These are basically the same apps that were in the Beta test. They still require a full CPU core.

BM

Do they require a core at 100%, or can we set a lower value so that other projects can use the rest?
I'm running it with 0.3 CPU + 1.0 GPU.


I find it very bad that it needs a full CPU core and that you set CUDA to enabled by default.

That's a problem on some PCs: I run Einstein on the CPU and Folding@home on the GPU.

It was only by accident that I discovered today that CUDA was enabled.
Bedrich Hajek
Send message
Joined: 9 Dec 05
Posts: 1
Credit: 22,024,344
RAC: 23,369
Message 100713 - Posted: 26 Nov 2009, 19:50:52 UTC - in response to Message 100691.

We have finally begun to automatically deliver CUDA work & applications (plan class "ABP1cuda23") to machines that satisfy the following requirements:

- enabled NVIDIA GPU work in Einstein@home preferences
- NVidia GPU with at least 450MB of free memory
- Display Driver version 190.38 (&up), i.e. CUDA 2.3 capability
- BOINC Core Client version 6.10 (&up)

CUDA Beta App testers should drain their work cache and switch back to the normal project work.

BM


The only problem I have with this is that it uses 100% of a CPU while using only about 4% of the GPU on a GTX 285 card.
____________
Profile XJR-Maniac
Avatar
Send message
Joined: 8 Dec 05
Posts: 3
Credit: 670,148
RAC: 0
Message 100714 - Posted: 26 Nov 2009, 19:57:17 UTC

This was my first AND my last ABP1 so-called "CUDA" WU!

This application is complete nonsense! It has been occupying one CPU core AND a GPU for more than six hours now, and most of the time the GPU is nearly IDLE; the temperature is at 49°C. In that time the GPU could have crunched a hundred Milky Way WUs or dozens of SETI or Collatz WUs.

Why doesn't it suspend after a given time? Are there no checkpoints in CUDA apps, or what? The other WUs are way too short to suspend, so there's not much for me to compare.

If that's what you call CUDA, then cancel it and revert to CPU-only WUs. This is nothing more than a bad joke. Sorry, but I didn't take part in the beta test phase, otherwise I would have mentioned this much earlier.
____________

pelpolaris
Send message
Joined: 5 Nov 06
Posts: 2
Credit: 4,761,578
RAC: 0
Message 100715 - Posted: 26 Nov 2009, 22:07:49 UTC - in response to Message 100691.

We have finally begun to automatically deliver CUDA work & applications (plan class "ABP1cuda23") to machines that satisfy the following requirements:

- enabled NVIDIA GPU work in Einstein@home preferences
- NVidia GPU with at least 450MB of free memory
- Display Driver version 190.38 (&up), i.e. CUDA 2.3 capability
- BOINC Core Client version 6.10 (&up)

Do you have any idea what kind of "Linows" or "Windux" platforms can run the Einstein CUDA 2.3 app?


____________

ML1
Send message
Joined: 20 Feb 05
Posts: 330
Credit: 27,995,887
RAC: 17,628
Message 100716 - Posted: 26 Nov 2009, 22:11:07 UTC - in response to Message 100715.
Last modified: 26 Nov 2009, 22:11:25 UTC

Do you have any idea what kind of "Linows" or "Windux" platforms can run the Einstein CUDA 2.3 app?

Not sure what you mean by that, but their standard app is working fine on this Mandriva 2010.0 Linux system.

The GPU temps suggest that there is not much GPU utilisation, but then again this is their first attempt.

Happy crunchin',
Martin
____________
Powered by: Mageia5
See & try out your OS Freedom! Linux Voice
The Future is what We all make IT (GPLv3)
Profile XJR-Maniac
Avatar
Send message
Joined: 8 Dec 05
Posts: 3
Credit: 670,148
RAC: 0
Message 100718 - Posted: 26 Nov 2009, 23:08:00 UTC

I've now got my first (and probably last) ABP1 3.13 "CUDA" WU finished, in over 10 hours, on a Q9650/GeForce GTX 260, where CPU time was 8.3 hours and GPU time around 2 hours. That means more than 8 hours of wasted GPU time! Does that make any sense?

My last two ABP1 3.12 CPU-only WUs took less than 5 hours on a Q9650.

Are the new WUs more complex or longer than the old ones, or is this just another bad joke?
____________

Profile Gary Roberts
Volunteer moderator
Send message
Joined: 9 Feb 05
Posts: 3768
Credit: 3,418,524,246
RAC: 3,938,320
Message 100720 - Posted: 27 Nov 2009, 3:25:25 UTC - in response to Message 100718.

I've now got my first (and probably last) ABP1 3.13 "CUDA" WU finished, in over 10 hours, on a Q9650/GeForce GTX 260, where CPU time was 8.3 hours and GPU time around 2 hours. That means more than 8 hours of wasted GPU time! Does that make any sense?

You don't have any CUDA tasks on your Q9650. The list of tasks for that machine shows 3 completed ABP1 tasks, all of which took around 17k secs and none of which used a GPU for crunching. I decided to look at your other machines and I found the GPU-crunched ABP1 task on your Pentium D. There are no other ABP1 tasks still showing on that machine, so there's no ability to do a comparison. Here is the list of tasks for your Pentium D with the GPU-crunched ABP1 task at the top. It's interesting to note that it took much the same time to crunch the ABP1 task (250 credits) as the two previous GW tasks (136 credits) - nearly double the credits for a tiny bit more crunch time.

What is even more interesting is the apparent dramatic slowdown after Nov 20. The three earlier GW tasks took around 11k secs each while the two after this date took 27k and 29k secs respectively. Now there is variability in the GW crunch times but there is usually a variation in credits to compensate - at least partially. Since all GW tasks were awarded the same credit, it's unusual to see such a huge variation in crunch time. Can you think of anything that might have happened to your machine after Nov 20? Something drastic like halving the CPU frequency might do it :-).

My last two ABP1 3.12 CPU-only WUs took less than 5 hours on a Q9650.

It's not really fair to compare a Pentium D to a Q9650 :-).

Are the new WUs more complex or longer than the old ones, or is this just another bad joke?

There aren't any new tasks - just the same old tasks being crunched with a new program which is (performance-wise) much the same as the beta test app it replaces.

It might help if you realise that just because one project can make hugely efficient use of a GPU's parallelism, other projects may struggle to achieve anything like the same even after considerable effort has been expended. You might take that into account before firing off your criticism.

____________
Cheers,
Gary.
cyborg888
Send message
Joined: 12 Jul 07
Posts: 1
Credit: 892,361
RAC: 0
Message 100723 - Posted: 27 Nov 2009, 5:27:33 UTC

Hello
I was happy to see the CUDA app optimisation in my BOINC ...
(config: Q9950 + 8800 GT)

But there may be a problem.

My GPU idle temperature is 48-49 °C ...
and during crunching the GPU temp is the same: 48-49 °C.

So my GPU doesn't seem to be used.

Best Regards.

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 28 Aug 06
Posts: 3502
Credit: 149,300,081
RAC: 101,128
Message 100724 - Posted: 27 Nov 2009, 5:57:05 UTC - in response to Message 100723.
Last modified: 27 Nov 2009, 5:59:30 UTC


So my GPU doesn't seem to be used.

Best Regards.

It's normal that the GPU temperature does not rise drastically when running this version of the CUDA app; the same was seen during the beta tests. This is because the app still makes heavy use of the CPU for certain parts of the computations.

This doesn't exclude a speedup of the app by using the GPU:

Let's assume (it's just a simple example) that an app does computations consisting of two parts, A and B, where A has to be executed before B can start. Let's assume that on a CPU, A and B each take 50% of the runtime.

Now assume that only part B can easily be ported to GPU code, resulting in a (say) 25-fold speedup for this part B. A still has to be done on the CPU.

The result: if the total runtime was 1000 s before, it is now 520 s, almost doubling the performance. Only 20 of those 520 seconds will be spent on the GPU, i.e. below 4%. So even small load factors on the GPU can result in reasonable speedups.
Yes, it would be nicer if both parts A and B could be done on the GPU for a total (say) 25-fold speedup, but that might not be so easy. Some types of computation lend themselves more readily to parallelization than others.
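The arithmetic above is just Amdahl's law; a quick Python sketch (the helper is mine, not project code) reproduces the numbers:

```python
# Amdahl's-law sketch of the example above: total work split into a CPU
# part A and a GPU-accelerated part B that must run in sequence.
def hybrid_runtime(total_s, gpu_fraction, gpu_speedup):
    cpu_part = total_s * (1 - gpu_fraction)          # part A, still on the CPU
    gpu_part = total_s * gpu_fraction / gpu_speedup  # part B, sped up on the GPU
    return cpu_part + gpu_part

runtime = hybrid_runtime(1000, 0.5, 25)
print(runtime)                # 520.0 s instead of 1000 s
print(1000 / runtime)         # ~1.92x overall speedup
print(20 / runtime * 100)     # GPU busy only ~3.8% of the wall time
```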

CU
Bikeman
____________
Profile BigDaddyDave
Send message
Joined: 6 Sep 09
Posts: 15
Credit: 20,616,221
RAC: 13,164
Message 100728 - Posted: 27 Nov 2009, 7:22:25 UTC

Hi all,

Question for you, if these are CUDA WU, why is my CPU crunching them and my GPU is just sitting idle?


11/26/2009 11:18:21 PM Starting BOINC client version 6.10.18 for windows_intelx86
11/26/2009 11:18:21 PM log flags: file_xfer, sched_ops, task
11/26/2009 11:18:21 PM Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
11/26/2009 11:18:21 PM Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
11/26/2009 11:18:21 PM Running under account User
11/26/2009 11:18:21 PM Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz [x86 Family 15 Model 4 Stepping 1]
11/26/2009 11:18:21 PM Processor: 1.00 MB cache
11/26/2009 11:18:21 PM Processor features: fpu tsc sse sse2 mmx
11/26/2009 11:18:21 PM OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
11/26/2009 11:18:21 PM Memory: 1.50 GB physical, 2.85 GB virtual
11/26/2009 11:18:21 PM Disk: 179.31 GB total, 95.15 GB free
11/26/2009 11:18:21 PM Local time is UTC -8 hours
11/26/2009 11:18:21 PM NVIDIA GPU 0: GeForce GTX 260 (driver version 19107, CUDA version 2030, compute capability 1.3, 896MB, 675 GFLOPS peak)
11/26/2009 11:18:21 PM Not using a proxy
11/26/2009 11:18:21 PM Einstein@Home URL http://einstein.phys.uwm.edu/; Computer ID 2063582; resource share 100
11/26/2009 11:18:21 PM SETI@home URL http://setiathome.berkeley.edu/; Computer ID 2378854; resource share 100
11/26/2009 11:18:21 PM SETI@home General prefs: from SETI@home (last modified 04-Mar-2009 23:03:00)
11/26/2009 11:18:21 PM SETI@home Computer location: home
11/26/2009 11:18:21 PM SETI@home General prefs: no separate prefs for home; using your defaults
11/26/2009 11:18:21 PM Preferences limit memory usage when active to 767.36MB
11/26/2009 11:18:21 PM Preferences limit memory usage when idle to 1381.25MB
11/26/2009 11:18:21 PM Preferences limit disk usage to 89.65GB
11/26/2009 11:18:22 PM Einstein@Home Restarting task p2030_54471_60586_0034_G46.39-00.47.S_1.dm_499_1 using einsteinbinary_ABP1 version 313
11/26/2009 11:18:22 PM Einstein@Home Restarting task h1_1085.60_S5R4__1050_S5R6a_1 using einstein_S5R6 version 301


Thanks!


BDDave
____________

cristipurdel
Send message
Joined: 19 Jul 07
Posts: 17
Credit: 93,077
RAC: 0
Message 100729 - Posted: 27 Nov 2009, 7:38:33 UTC - in response to Message 100724.


So my GPU doesn't seem to be used.

Best Regards.

It's normal that the GPU temperature does not rise drastically when running this version of the CUDA app; the same was seen during the beta tests. This is because the app still makes heavy use of the CPU for certain parts of the computations.

This doesn't exclude a speedup of the app by using the GPU:

Let's assume (it's just a simple example) that an app does computations consisting of two parts, A and B, where A has to be executed before B can start. Let's assume that on a CPU, A and B each take 50% of the runtime.

Now assume that only part B can easily be ported to GPU code, resulting in a (say) 25-fold speedup for this part B. A still has to be done on the CPU.

The result: if the total runtime was 1000 s before, it is now 520 s, almost doubling the performance. Only 20 of those 520 seconds will be spent on the GPU, i.e. below 4%. So even small load factors on the GPU can result in reasonable speedups.
Yes, it would be nicer if both parts A and B could be done on the GPU for a total (say) 25-fold speedup, but that might not be so easy. Some types of computation lend themselves more readily to parallelization than others.

CU
Bikeman

Are these the actual numbers for ABP1cuda23?
If not, could you post the actual speedup on your machine between the GPU and CPU versions?
Oliver Bock
Volunteer moderator
Project administrator
Project developer
Send message
Joined: 4 Sep 07
Posts: 516
Credit: 24,180,435
RAC: 0
Message 100733 - Posted: 27 Nov 2009, 10:09:00 UTC - in response to Message 100710.
Last modified: 27 Nov 2009, 10:14:40 UTC


I find that very bad that it needs a 100% core and that you set cuda on Enable as default.


I understand, and we tried not to enable the CUDA app by default. Unfortunately that would have required a change in the BOINC core client code, which is not under our direct control. Please note that this is one of the reasons why we set quite a few minimum requirements: this way the number of volunteers who receive CUDA work is kept as limited as possible.

WRT the efficiency of the current implementation: we are working on a number of improvements. The CPU part of the radio pulsar search received quite a few changes that will not only benefit the CPU-only application but also the CUDA version, thereby moving the computational ratio towards the GPU. These changes will be released as a new application called "ABP2" - probably in the next 1-2 weeks. In parallel to that we are currently working hard to move the remaining CPU part of the CUDA version more or less completely to the GPU.

Please note that even today the CUDA app wouldn't actually require a full CPU. However, as soon as you tell the client to use less than 100%, it doesn't renice the process (reduce its priority) anymore. From our point of view it's better to have the process claiming one CPU at the lowest priority than using, say, 60% at normal priority.
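The renice trade-off can be illustrated with a tiny POSIX sketch (this is only an analogy for what the client does natively, not BOINC code; os.nice is POSIX-only):

```python
import os

# A process can claim a full core yet still yield to everything else by
# raising its nice value to 19, the lowest scheduling priority.
current = os.nice(0)             # an increment of 0 just reports the niceness
lowest = os.nice(19 - current)   # raise niceness to the maximum (lowest priority)
print(lowest)                    # 19
```

At niceness 19 the scheduler lets any normal-priority process preempt this one, so "100% of a core" costs other work almost nothing when the machine is busy.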


Hope this gives a small insight...

Oliver

This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2016 Bruce Allen