3WU BRP3cuda on a single GPU

Message boards : Cruncher's Corner : 3WU BRP3cuda on a single GPU

leks
Send message
Joined: 21 Nov 07
Posts: 28
Credit: 458,522,215
RAC: 137,512
Message 109183 - Posted: 4 Jan 2011, 20:53:48 UTC
Last modified: 4 Jan 2011, 21:21:38 UTC

For advanced users only: any problems are on your own conscience. If you do not know what you are doing, do NOTHING!!!!

Only for Windows x86; x64 has not been tested.

Each WU uses more than 256 MB of GPU memory.

Tested on XP SP3 with NVIDIA 260.99 drivers and BOINC 6.10.58:
9600GT 512 MB - 1 WU.
9600GSO 768 MB - 2 WU.
GTS250/GTS450 1024 MB - 3 WU.
All WUs passed validation.

The BRP3 application runs at low priority and uses no more than 30 CUDA cores. If you change the priority to "realtime", it uses up to 50 CUDA cores (run time is reduced roughly 2-fold).

To change the priority you can use Process Tamer (the program is free; donations are welcome).
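
If you prefer not to install an extra tool, a possible alternative (just a sketch, untested here: it assumes wmic is available, as on XP Professional and later, and uses the 1.04 executable name from the listing below; 128 = high priority, 256 = realtime) is a single command:

wmic process where name="einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe" CALL setpriority 128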

Depending on your GPU, you must choose the number of simultaneously running WUs and their priority.

The missing files can be downloaded here.

In the E@H project folder (einstein.phys.uwm.edu) inside the BOINC data directory, you must create app_info.xml.

If you want to change the number of WUs run per GPU, change the <count> value: 2 WU = 0.5, 3 WU = 0.33, 4 WU = 0.25.
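
For example, to run two tasks per GPU, the <coproc> block inside <app_version> (shown in full in the listing below) would read:

<coproc>
<type>CUDA</type>
<count>0.500000</count>
</coproc>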

GC S5HF for CPU + 3 BRP3 cuda. (BRP3 for the CPU is not supported).

Listing: app_info.xml

<app_info>
<app>
<name>einstein_S5GC1HF</name>
<user_friendly_name>Global Correlations S5 HF search #1</user_friendly_name>
</app>
<app>
<name>einsteinbinary_BRP3</name>
<user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
</app>
<file_info>
<name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</name>
<main_program/>
</file_info>
<file_info>
<name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>db.dev.win.4330b3e5</name>
</file_info>
<file_info>
<name>dbhs.dev.win.4330b3e5</name>
</file_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>104</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>0.500000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.4330b3e5</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.4330b3e5</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.330000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
<app_version>
<app_name>einstein_S5GC1HF</app_name>
<version_num>306</version_num>
<platform>windows_intelx86</platform>
<plan_class>S5GCESSE2</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
</app_version>
</app_info>

P.S. HAPPY NEW YEAR!
____________

Jeroen
Send message
Joined: 25 Nov 05
Posts: 367
Credit: 624,839,868
RAC: 467,427
Message 109186 - Posted: 4 Jan 2011, 21:08:51 UTC - in response to Message 109183.

This is excellent. Thanks for posting the config.

Profile Vikk
Send message
Joined: 22 May 10
Posts: 7
Credit: 4,781,696
RAC: 0
Message 109194 - Posted: 5 Jan 2011, 8:08:22 UTC
Last modified: 5 Jan 2011, 8:10:34 UTC

My kind regards to leks for this app file. The output of the box is something unbelievable; I've never seen anything like this in Einstein before. Thank you again!

The box with a GTX 580 can run 4 tasks; the GTX 570 only 3, because it runs short of memory.
____________

Armin Burkhardt speaking for MPI/FKF
Send message
Joined: 21 Feb 05
Posts: 8
Credit: 1,463,295,147
RAC: 1,626,847
Message 109196 - Posted: 5 Jan 2011, 10:49:38 UTC

Thank you very much for the posting, leks!

Works like a charm here

Windows 7 Ultimate 64bit
BOINC 6.10.58
Intel Q6700@3.66GHz
Gigabyte GTX 460 OC 1GB
GPU load with 2 concurrent BRP processes at high priority: between 75 and 80%
GPU memory used: around 630-670 MB
2 more CPU processes at low priority.

That's what I call throughput!

Armin
____________

Profile MAGIC Quantum Mechanic
Avatar
Send message
Joined: 18 Jan 05
Posts: 1057
Credit: 273,333,574
RAC: 259,029
Message 109197 - Posted: 5 Jan 2011, 11:33:10 UTC

Holy Higgs Boson.....I can't imagine ever having 167 million wu's finished on my machines.

John Clark
Avatar
Send message
Joined: 4 May 07
Posts: 1092
Credit: 3,143,193
RAC: 0
Message 109198 - Posted: 5 Jan 2011, 14:53:01 UTC

There are many examples, including some with more than a billion crunched.
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Richard Haselgrove
Send message
Joined: 10 Dec 05
Posts: 1721
Credit: 64,846,904
RAC: 56,381
Message 109199 - Posted: 5 Jan 2011, 15:05:35 UTC - in response to Message 109198.

There are many examples, including some with more than a billion crunched.

If you're talking about BRP3 (the subject of this thread) there's a 500-fold difference between the number of WUs crunched and the number of credits awarded.
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 28 Aug 06
Posts: 3502
Credit: 149,239,682
RAC: 115,140
Message 109203 - Posted: 5 Jan 2011, 18:08:12 UTC

I'd like to mention two things to keep in mind:

1) Just being able to run several apps in parallel doesn't necessarily increase productivity. You should closely monitor the crunch time; at least one volunteer has reported that for his card there is no increase in productivity.

2) If you apply the app_info.xml file and you do get a productivity boost, please keep in mind that this will disable the automatic app update feature of BOINC. You should come back here regularly to check whether there's a new app version (not just for BRP3; all Einstein@Home apps will be affected by the app_info.xml).


Yes, I know this thread was explicitly directed at experienced users only, and 99.99% of those will know all this already, but you know how HOW-TOs like this one spread over the internet and end up being adopted by novice volunteers.

Happy crunching
HBE

____________

Jeroen
Send message
Joined: 25 Nov 05
Posts: 367
Credit: 624,839,868
RAC: 467,427
Message 109205 - Posted: 5 Jan 2011, 18:21:42 UTC - in response to Message 109183.

From some testing I did last night, here is what I have seen so far with the 580 going from running one task at once to three.

1 WU - 33-34 minutes - 33-40% GPU usage - 300-350MB memory usage
3 WU - 60-65 minutes - 73-75% GPU usage - 1GB memory usage

I had all CUDA tasks set to high priority.
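
(From those numbers, three at once works out to roughly 62/3 ≈ 21 minutes per task, versus 33-34 minutes running one: a throughput gain of around 60%.)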

Profile Stranger7777
Avatar
Send message
Joined: 17 Mar 05
Posts: 427
Credit: 189,677,426
RAC: 175,498
Message 109207 - Posted: 5 Jan 2011, 18:28:12 UTC - in response to Message 109203.
Last modified: 5 Jan 2011, 18:29:47 UTC

1) Just being able to run several apps in parallel doesn't necessarily increase productivity. You should closely monitor the crunch time; at least one volunteer has reported that for his card there is no increase in productivity.

It does increase, because most of the available cores in the graphics card get used. But it also leads to higher GPU temperatures and lowers the stability of memory and core; that, however, gets sorted out by the WU validation process.

P.S. Nice job, leks. Now I will surely be right behind you.
Profile [AF>EDLS] Polynesia
Avatar
Send message
Joined: 1 Apr 09
Posts: 24
Credit: 2,273,003
RAC: 0
Message 109208 - Posted: 5 Jan 2011, 19:00:10 UTC
Last modified: 5 Jan 2011, 19:13:47 UTC

Hello, and thank you for this app_info file.

For 64-bit, do I need to change anything?

And what about these lines:

<name>db.dev.win.4330b3e5</name>
</file_info>
<file_info>
<name>dbhs.dev.win.4330b3e5</name>


<file_name>db.dev.win.4330b3e5</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.4330b3e5</file_name>
<open_name>dbhs.dev</open_name>

and : <api_version>6.13.0</api_version>

I did a test and it takes 900 MB of memory at 70% GPU load ...

By the way, version 1.05 came out today.

Thank you
____________

Profile MAGIC Quantum Mechanic
Avatar
Send message
Joined: 18 Jan 05
Posts: 1057
Credit: 273,333,574
RAC: 259,029
Message 109235 - Posted: 6 Jan 2011, 10:52:54 UTC

Well, I tend to read these topics just in case, but as you can see I don't run CUDA on any of the current 5 machines I am running.

Or on any of the others I have had or have waiting for me to bring back to life.

CAL ATI Radeon HD 2300/2400/3200 (RV610) (256MB) driver: 1.4.635 on my top 3 machines right now....so just processors for me.

Maybe when I bring one of my former PCs back to life I will add a CUDA card.


____________

Armin Burkhardt speaking for MPI/FKF
Send message
Joined: 21 Feb 05
Posts: 8
Credit: 1,463,295,147
RAC: 1,626,847
Message 109236 - Posted: 6 Jan 2011, 11:23:04 UTC - in response to Message 109197.

No holiness required. Most of the credits (the WU count is around 2 orders of magnitude lower than the credits) originate from cluster burn-in sessions at work (MPI/FKF). Every newly purchased cluster is tested for some weeks to bring errors in memory, disk and chipset to light. This could be done with artificial benchmarks, but we prefer to donate these CPU cycles to our Max Planck colleagues at the Albert Einstein Institute in Potsdam and Hannover. Over the years this accumulates to "billions and billions of cobblestones", to quote Carl Sagan ;-).

Holy Higgs Boson.....I can't imagine ever having 167 million wu's finished on my machines.



____________
leks
Send message
Joined: 21 Nov 07
Posts: 28
Credit: 458,522,215
RAC: 137,512
Message 109261 - Posted: 6 Jan 2011, 19:38:46 UTC - in response to Message 109236.

GC S5HF for CPU (SSE2) + 3 BRP3 cuda (1.04 + 1.05). (BRP3 for the CPU is not supported).

The missing files can be downloaded here.

<app_info>
<app>
<name>einstein_S5GC1HF</name>
<user_friendly_name>Global Correlations S5 HF search #1</user_friendly_name>
</app>
<app>
<name>einsteinbinary_BRP3</name>
<user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
</app>
<file_info>
<name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</name>
<main_program/>
</file_info>
<file_info>
<name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cufft32_23.dll</name>
<executable/>
</file_info>
<file_info>
<name>cudart32_23.dll</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>db.dev.win.4330b3e5</name>
</file_info>
<file_info>
<name>dbhs.dev.win.4330b3e5</name>
</file_info>
<file_info>
<name>db.dev.win.96b133b1</name>
</file_info>
<file_info>
<name>dbhs.dev.win.96b133b1</name>
</file_info>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>104</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.4330b3e5</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.4330b3e5</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.330000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.96b133b1</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.96b133b1</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.330000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
<app_version>
<app_name>einstein_S5GC1HF</app_name>
<version_num>306</version_num>
<platform>windows_intelx86</platform>
<plan_class>S5GCESSE2</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
</app_version>
</app_info>

____________

Profile tolafoph
Send message
Joined: 14 Sep 07
Posts: 110
Credit: 20,906,904
RAC: 1,131
Message 109264 - Posted: 6 Jan 2011, 20:18:42 UTC - in response to Message 109261.
Last modified: 6 Jan 2011, 20:18:57 UTC

GC S5HF for CPU (SSE2) + 3 BRP3 cuda (1.04 + 1.05). (BRP3 for the CPU is not supported).


Thank you very much!!!
I'm running 3 tasks on my GTX 260. Each task takes around 9700 s; if I run only one task it takes around 3900 s. The GPU load goes up to 85%.
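
(So three at once works out to roughly 9700/3 ≈ 3230 s per task, versus 3900 s running one: about a 20% throughput gain.)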
Profile Gary Roberts
Volunteer moderator
Send message
Joined: 9 Feb 05
Posts: 3768
Credit: 3,410,768,421
RAC: 3,943,109
Message 109269 - Posted: 7 Jan 2011, 3:13:53 UTC - in response to Message 109261.

GC S5HF for CPU (SSE2) + 3 BRP3 cuda (1.04 + 1.05). (BRP3 for the CPU is not supported).

I can't test the following comments as I don't possess any CUDA capable cards, so I would simply warn that whilst I believe the following points are correct, I cannot give any guarantees. YMMV so please think carefully before adopting any of the suggestions.

Firstly, if your intent is to maximise production and you never intend to display the graphics, you could leave out the specification of the graphics app entirely, i.e. drop all occurrences of lines like

<file_info>
<name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>

and

<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>


Secondly, there is a better way to handle the transition from V1.04 to V1.05. What you have written ensures that any tasks in your cache branded as 1.04 get done with the V1.04 app. This would be important if (for example) there had been any change in the checkpoint and/or output formats between the two versions. I believe this isn't the case and that the difference is to do with thread priorities only. In that situation, because of the time saving advantages, it would be preferable to have 1.04 branded tasks crunched with the 1.05 app - even partly crunched ones. You can make a modification to app_info.xml to achieve this.

Instead of specifying the two apps separately, as you have done here

<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>104</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.04_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.4330b3e5</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.4330b3e5</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.330000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>

and here

<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.96b133b1</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.96b133b1</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>0.330000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>

do it like this (just the essential bits to do with version number shown)

<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
....
....
</app_version>
<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>104</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
....
....
</app_version>
</app_info>


Note in particular that the second <app_version> clause specifies that any task branded as 1.04 can be done with the V1.05 app. This means you will get the benefit of the new app for all 1.04 tasks in your cache. The messages tab will say that V1.04 is being used but if you check, you will actually find that it's indeed using the new app.

If you want to ensure that newly downloaded tasks are branded 1.05 rather than 1.04, make sure the <app_version> clauses are in the order shown above, i.e. <version_num> 105 listed before 104.

As I stated at the outset, I believe the above is correct but I don't have the ability to test it out. Anyone implementing these suggestions should test for themselves. If you find anything that seems odd, please let me know. I did make a couple of cut and paste mistakes while composing but I think I've sorted that out (hopefully).

I've tried to check it out by visual inspection and by inference with the behaviour of a similar file used for CPU tasks under similar conditions. The version number change strategy works perfectly for CPU tasks and I believe it should also be much the same for GPUs.


____________
Cheers,
Gary.
Profile Mike Hewson
Volunteer moderator
Avatar
Send message
Joined: 1 Dec 05
Posts: 5081
Credit: 41,740,040
RAC: 9,847
Message 109271 - Posted: 7 Jan 2011, 3:59:16 UTC - in response to Message 109236.
Last modified: 7 Jan 2011, 4:00:34 UTC

This could be done with artificial benchmarks but we prefer to donate these CPU cycles to our Max-Planck colleagues at the Albert Einstein Institute in Potsdam and Hannover.

Superb! To paraphrase Red Riding Hood 'what a big RAC you have'. So let's hope you at least get Xmas cards from MPG/AEI ..... :-)

Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Profile MAGIC Quantum Mechanic
Avatar
Send message
Joined: 18 Jan 05
Posts: 1057
Credit: 273,333,574
RAC: 259,029
Message 109280 - Posted: 7 Jan 2011, 10:53:10 UTC - in response to Message 109236.
Last modified: 7 Jan 2011, 10:53:44 UTC

No holiness required. Most of the credits (the WU count is around 2 orders of magnitude lower than the credits) originate from cluster burn-in sessions at work (MPI/FKF). Every newly purchased cluster is tested for some weeks to bring errors in memory, disk and chipset to light. This could be done with artificial benchmarks, but we prefer to donate these CPU cycles to our Max Planck colleagues at the Albert Einstein Institute in Potsdam and Hannover. Over the years this accumulates to "billions and billions of cobblestones", to quote Carl Sagan ;-).

Yes, I meant to say "credits" and not actually "WUs".

But I do know that most of the members with more credits than I have use clusters at work or at schools.

I have been doing this since SETI Classic, and I'm just using machines that I bought myself and run at home; I've spent more thousands of dollars than I even want to add up.

Since I live in the same state as Bill Gates, maybe he will set me up with a cluster over in Redmond (at Microsoft).

But the info here is always good to see, since as you know there are quite a few doing this.

____________
Profile [AF>EDLS] Polynesia
Avatar
Send message
Joined: 1 Apr 09
Posts: 24
Credit: 2,273,003
RAC: 0
Message 109286 - Posted: 7 Jan 2011, 20:00:24 UTC - in response to Message 109269.
Last modified: 7 Jan 2011, 20:02:34 UTC

Secondly, there is a better way to handle the transition from V1.04 to V1.05. [...] it would be preferable to have 1.04 branded tasks crunched with the 1.05 app - even partly crunched ones. You can make a modification to app_info.xml to achieve this.

We could also just make a copy of the 1.05 application and rename it to the 1.04 filename ...
____________
leks
Send message
Joined: 21 Nov 07
Posts: 28
Credit: 458,522,215
RAC: 137,512
Message 109287 - Posted: 7 Jan 2011, 20:26:01 UTC - in response to Message 109280.

My previous app_info (GC S5HF for CPU (SSE2) + 3 BRP3 cuda (1.04 + 1.05)) is working, but the scheduler keeps handing out WUs for the 1.04 application.

Below is an app_info that uses 1.05 only. To switch:

1. Disable fetching of new WUs ("No new tasks").
2. Finish all BRP3 tasks.
3. Update the project.
4. Replace app_info.xml.
5. Allow fetching of new WUs again.

If you replace app_info.xml without completing all 1.04 BRP3 WUs first, all of those WUs will be aborted.

<app_info>
<app>
<name>einstein_S5GC1HF</name>
<user_friendly_name>Global Correlations S5 HF search #1</user_friendly_name>
</app>
<app>
<name>einsteinbinary_BRP3</name>
<user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
</app>
<file_info>
<name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</name>
<main_program/>
</file_info>
<file_info>
<name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft_xp32_32_16.dll</name>
<executable/>
</file_info>
<file_info>
<name>db.dev.win.96b133b1</name>
</file_info>
<file_info>
<name>dbhs.dev.win.96b133b1</name>
</file_info>

<app_version>
<app_name>einsteinbinary_BRP3</app_name>
<version_num>105</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.200000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<plan_class>BRP3cuda32</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3cuda32.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart_xp32_32_16.dll</file_name>
<open_name>cudart32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>cufft_xp32_32_16.dll</file_name>
<open_name>cufft32_32_16.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>db.dev.win.96b133b1</file_name>
<open_name>db.dev</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>dbhs.dev.win.96b133b1</file_name>
<open_name>dbhs.dev</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>CUDA</type>
<count>1.0000</count>
</coproc>
<gpu_ram>220200960.000000</gpu_ram>
</app_version>
<app_version>
<app_name>einstein_S5GC1HF</app_name>
<version_num>306</version_num>
<platform>windows_intelx86</platform>
<plan_class>S5GCESSE2</plan_class>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
<open_name>graphics_app</open_name>
</file_ref>
</app_version>
</app_info>
____________
