BRP4cuda32 vs. BRP4cuda32nv301 performance

Message boards : Cruncher's Corner : BRP4cuda32 vs. BRP4cuda32nv301 performance

Author Message
Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 689,409
RAC: 0
Message 118914 - Posted: 31 Aug 2012, 8:32:14 UTC

In the past weeks I have been using the 276.52 driver (released 2012.02.09), which let me run the BRP4cuda32 application. With this driver/application it took my Nvidia Quadro FX1800 around 6200 seconds to complete a WU and earn the 500 credits.

After updating to the most recent 305.93 driver (released 2012.08.28) I am able to run the BRP4cuda32nv301 application. This seems to put more load on the GPU, since my system becomes less responsive, so I would expect the runtimes to be shorter than with the old driver. However, a WU now takes around 19000 seconds to complete, and I still get 500 credits.

My questions are:
- Is my GPU performing better with the old driver and the BRP4cuda32 application?
- Or am I doing any better with the new driver (e.g. crunching more numbers), and the credits are just not adjusted properly?

Credit/time might not be a very accurate measure of performance.
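Since the project fixes credit at 500 per task for identical tasks, credit per hour here is proportional to the inverse of runtime, so it does track throughput for this particular comparison. A minimal Python sketch of that arithmetic, using the approximate runtimes above (the function name is illustrative, not part of any BOINC tool):

```python
# Fixed credit per BRP4 task, as stated in this thread.
CREDIT_PER_TASK = 500


def credits_per_hour(runtime_s: float, credit: float = CREDIT_PER_TASK) -> float:
    """Credit earned per hour of wall-clock time for one task of the given runtime."""
    return credit * 3600.0 / runtime_s


# Approximate per-task runtimes reported for the Quadro FX1800.
runs = {"276.52 driver": 6200.0, "305.93 driver": 19000.0}

for driver, seconds in runs.items():
    print(f"{driver}: {credits_per_hour(seconds):.1f} credits/hour")

# With credit fixed per task, the runtime ratio is also the throughput ratio.
slowdown = runs["305.93 driver"] / runs["276.52 driver"]
print(f"305.93 takes {slowdown:.2f}x as long per task")
```

On these figures the newer driver drops the card from roughly 290 to roughly 95 credits per hour, about a threefold slowdown.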

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1657
Credit: 54,752,104
RAC: 45,833
Message 118916 - Posted: 31 Aug 2012, 9:22:05 UTC - in response to Message 118914.

The actual applications used for BRP4cuda32 and BRP4cuda32nv301 are identical:

Comparing files einsteinbinary_BRP4_1.25_windows_intelx86__BRP4cuda32nv301.exe and EINSTEINBINARY_BRP4_1.25_WINDOWS_INTELX86__BRP4CUDA32.EXE
FC: no differences encountered
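The same byte-for-byte check can be scripted portably. A minimal sketch using Python's standard `filecmp` and `hashlib` modules; it takes the two executable paths on the command line (on Windows, the built-in `fc /b file1 file2` also performs a binary comparison):

```python
import filecmp
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical(path_a: str, path_b: str) -> bool:
    """True if the two files have byte-for-byte identical contents."""
    # shallow=False compares file contents instead of trusting os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = sys.argv[1], sys.argv[2]
    print("identical" if identical(first, second) else "different")
    print(sha256_of(first))
    print(sha256_of(second))
```

Matching SHA-256 digests are also handy for comparing executables on two different hosts, where a direct file comparison is not possible.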

Reading the (huge and complicated) "Comparison of Nvidia graphics processing units" table at Wikipedia, it looks as if your FX1800 uses a variant of the G94 GPU chip, which makes it comparable to a 9600 GS or GT with Compute Capability 1.1.

That's quite an old technology. It's highly unlikely that NVidia is targeting driver improvements on those old chips: new features will be designed for the Fermi (GTX 4xx and 5xx) and Kepler (GTX 6xx) ranges. It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.

I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.
Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 689,409
RAC: 0
Message 118918 - Posted: 31 Aug 2012, 10:48:28 UTC - in response to Message 118916.

> It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.

Interesting view. I hadn't thought of that, but it truly does make sense.

> I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.


However, since my compute time with the new 305.93 driver is now 3 times as long as with the old 276.52 driver, and I still get the same credit, I might consider downgrading to the old 276.52 driver.

It would be nice if someone could clarify whether credit is truly an accurate measure of the amount of work done. Put another way:
Am I doing the same amount of work in 6200 seconds using the 276.52 driver as I am in 19000 seconds using the 305.93 driver?
Richard Haselgrove
Joined: 10 Dec 05
Posts: 1657
Credit: 54,752,104
RAC: 45,833
Message 118919 - Posted: 31 Aug 2012, 11:17:17 UTC - in response to Message 118918.
Last modified: 31 Aug 2012, 11:18:47 UTC

The tasks and the application are the same. The credits are fixed at 500 per task by the project.

The 3xx series drivers are the first to preview CUDA 5, while we're running CUDA 3.2 here. That may make a difference: it does sound as if, for that card and this project, the older drivers are more suitable.

(304 and higher drivers for most other NVidia cards are still in Beta)

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 689,409
RAC: 0
Message 118920 - Posted: 31 Aug 2012, 11:44:29 UTC - in response to Message 118919.

Thanks. I will definitely downgrade to the 276.52 driver then.

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3420
Credit: 126,107,187
RAC: 175,783
Message 118926 - Posted: 31 Aug 2012, 15:23:18 UTC
Last modified: 31 Aug 2012, 15:24:59 UTC

Hi!

Thanks for reporting this. As Richard has already mentioned, the two apps in question are, in fact, identical, byte-by-byte.

We want to support a wide range of users, even those with older drivers and hardware. However, if the penalty for using the older runtime and libraries with newer drivers gets too severe, we will soon be forced to add versions targeting newer CUDA versions. Note that with all the variations in operating system (Win, Linux, OSX), word length (64 bit, 32 bit) and CPU/GPU type (SSE, NVIDIA, ATI/AMD), we now have a dozen or so different executables for the BRP4 search alone.
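The growth in executables is multiplicative: each extra dimension multiplies the number of builds. A rough Python sketch, assuming exactly the values listed above; the full cross product is only an upper bound (the project ships a dozen or so executables, not every combination), and splitting NVIDIA into two CUDA targets is purely hypothetical:

```python
from itertools import product

# Dimensions named in the post; treating them as a full grid is an assumption.
OSES = ["Windows", "Linux", "OSX"]
WORD_LENGTHS = ["32-bit", "64-bit"]
TARGETS = ["CPU/SSE", "NVIDIA/CUDA", "ATI/AMD"]


def build_matrix(oses, word_lengths, targets):
    """Every (os, word length, target) combination: an upper bound on variants."""
    return list(product(oses, word_lengths, targets))


full = build_matrix(OSES, WORD_LENGTHS, TARGETS)
print(len(full))  # 18

# Hypothetical: ship two CUDA runtime targets instead of one for NVIDIA.
split = [t for t in TARGETS if t != "NVIDIA/CUDA"] + ["NVIDIA/CUDA 3.2", "NVIDIA/CUDA newer"]
print(len(build_matrix(OSES, WORD_LENGTHS, split)))  # 24
```

This is why an automated build pipeline matters: one more CUDA target adds a build for every OS and word length that supports NVIDIA cards.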

Oliver has done some marvelous work on automating our continuous integration environment, so we are in principle prepared to support even more variants if really needed.

Stay tuned.

HB

Claggy
Joined: 29 Dec 06
Posts: 555
Credit: 2,413,670
RAC: 1
Message 118933 - Posted: 31 Aug 2012, 19:17:10 UTC - in response to Message 118926.
Last modified: 31 Aug 2012, 19:18:15 UTC

My testing of the Lunatics x41z Cuda apps for Setiathome on legacy hardware also showed a slowdown on Cuda 5 preview drivers, (OpenCL Astropulse times also increased):

x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 v6 Bench on the 9800GTX+ (Win Vista x64, 301.42 (Cuda42 drivers)):

WU : PG1327.wu
Lunatics_x41z_win32_cuda22.exe -verb -nog :
Elapsed 33.418 secs, speedup: 87.71% ratio: 8.14x
CPU 8.315 secs, speedup: 96.95% ratio: 32.81x
Lunatics_x41z_win32_cuda23.exe -verb -nog :
Elapsed 33.467 secs, speedup: 87.69% ratio: 8.12x
CPU 8.486 secs, speedup: 96.89% ratio: 32.15x
Lunatics_x41z_win32_cuda32.exe -verb -nog :
Elapsed 34.870 secs, speedup: 87.17% ratio: 7.80x
CPU 8.908 secs, speedup: 96.74% ratio: 30.63x
Lunatics_x41z_win32_cuda41.exe -verb -nog :
Elapsed 34.253 secs, speedup: 87.40% ratio: 7.94x
CPU 8.268 secs, speedup: 96.97% ratio: 33.00x
Lunatics_x41z_win32_cuda42.exe -verb -nog :
Elapsed 34.351 secs, speedup: 87.37% ratio: 7.91x
CPU 8.658 secs, speedup: 96.83% ratio: 31.51x

v6 MB Bench of x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 on the 9800GTX+ (Win Vista x64, 304.48 Cuda 5 preview drivers):

WU : PG1327.wu
Lunatics_x41z_win32_cuda22.exe -verb -nog :
Elapsed 39.856 secs, speedup: 85.34% ratio: 6.82x
CPU 8.798 secs, speedup: 96.78% ratio: 31.01x
Lunatics_x41z_win32_cuda23.exe -verb -nog :
Elapsed 39.870 secs, speedup: 85.34% ratio: 6.82x
CPU 8.564 secs, speedup: 96.86% ratio: 31.86x
Lunatics_x41z_win32_cuda32.exe -verb -nog :
Elapsed 50.052 secs, speedup: 81.59% ratio: 5.43x
CPU 9.485 secs, speedup: 96.52% ratio: 28.77x
Lunatics_x41z_win32_cuda41.exe -verb -nog :
Elapsed 47.394 secs, speedup: 82.57% ratio: 5.74x
CPU 8.034 secs, speedup: 97.06% ratio: 33.96x
Lunatics_x41z_win32_cuda42.exe -verb -nog :
Elapsed 47.363 secs, speedup: 82.58% ratio: 5.74x
CPU 8.190 secs, speedup: 97.00% ratio: 33.31x

Claggy
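To put the two bench runs side by side, the relative change in elapsed time per build can be computed directly from the "Elapsed" figures above. A minimal Python sketch; the numbers are copied from the two runs, and the helper name is illustrative:

```python
# Elapsed seconds from the two bench runs above (9800GTX+, PG1327.wu).
ELAPSED_301_42 = {"cuda22": 33.418, "cuda23": 33.467, "cuda32": 34.870,
                  "cuda41": 34.253, "cuda42": 34.351}
ELAPSED_304_48 = {"cuda22": 39.856, "cuda23": 39.870, "cuda32": 50.052,
                  "cuda41": 47.394, "cuda42": 47.363}


def slowdown_pct(before: float, after: float) -> float:
    """Percentage increase in elapsed time going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


for build, before in ELAPSED_301_42.items():
    pct = slowdown_pct(before, ELAPSED_304_48[build])
    print(f"{build}: {pct:+.1f}% elapsed time on 304.48")
```

On these figures the cuda22 and cuda23 builds are about 19% slower on the 304.48 preview driver, while the cuda32, cuda41 and cuda42 builds are roughly 38% to 44% slower, with cuda32 hit hardest in this test.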

Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 689,409
RAC: 0
Message 119012 - Posted: 4 Sep 2012, 9:22:44 UTC
Last modified: 4 Sep 2012, 9:25:26 UTC

I have now downgraded from the 305.93 driver (released 2012.08.28) to 267.66 (released 2011.03.21) which has only CUDA 3.2 as required by this project. I have summarised the results below.



Clearly using the lowest required CUDA version seems to speed up computations significantly.

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3420
Credit: 126,107,187
RAC: 175,783
Message 119019 - Posted: 4 Sep 2012, 13:10:33 UTC - in response to Message 119012.
Last modified: 4 Sep 2012, 13:12:11 UTC

Note, however, that the tasks were computed using different versions of BRP4. That should not explain the huge slowdown with the 305.93 driver, but I'm not sure that the 267.66 driver is really faster than the 276.52 driver. Because newer drivers also bring bug fixes (OK, sometimes new bugs, too ... :-) ), I would be reluctant to go back too far in time.

Cheers
HB
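The caveat above amounts to comparing drivers only between tasks run with the same BRP4 version. A minimal Python sketch of that bookkeeping, assuming each task is recorded as a hypothetical `(app_version, driver, runtime_s)` tuple (for example copied by hand from the host's task list); the function names are illustrative:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

Task = Tuple[str, str, float]  # hypothetical record: (app_version, driver, runtime_s)


def mean_runtime_by_version_and_driver(tasks: Iterable[Task]) -> Dict[Tuple[str, str], float]:
    """Average runtime for each (app_version, driver) pair."""
    groups = defaultdict(list)
    for app_version, driver, runtime_s in tasks:
        # Keying on app_version keeps different BRP4 versions apart.
        groups[(app_version, driver)].append(runtime_s)
    return {key: mean(vals) for key, vals in groups.items()}


def driver_ratio(summary, app_version: str, driver_a: str, driver_b: str) -> Optional[float]:
    """Mean runtime of driver_a divided by that of driver_b for one app version.

    Returns None when either driver has no tasks for that version, rather than
    silently comparing across versions.
    """
    a = summary.get((app_version, driver_a))
    b = summary.get((app_version, driver_b))
    if a is None or b is None:
        return None
    return a / b
```

Returning None for a missing pair makes a cross-version comparison, like the 267.66 vs. 276.52 one above, show up as "no data" instead of a misleading ratio.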
Bjarke
Joined: 10 Feb 06
Posts: 5
Credit: 689,409
RAC: 0
Message 119021 - Posted: 4 Sep 2012, 14:37:44 UTC - in response to Message 119019.

You're right. But that does make it hard to decide which driver to use. Not too old and flawed, not too new and slow...

This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2016 Bruce Allen