4 WUs on a GTX 480 |
Message boards : Cruncher's Corner : 4 WUs on a GTX 480
| Author | Message |
|---|---|
|
Just for information, I am now running 4 WUs at a time on a GTX 480: ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110361 | | |
|
And what is the GPU load for 3, 2 and 1 WU at a time? | |
| ID: 110366 | | |
|
Between 62% (1) and 84% (4). ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110369 | | |
|
You should check 2 and 3 too, because I have heard that 2 and 3 WUs put the same load on the GPU as 4. If so, there is no reason to run more than 2 WUs. | |
| ID: 110370 | | |
Just as an info, I run now 4 WUs at a time on a GTX 480 : How many CPU cores are used in support? In other words, how many concurrent CPU-only tasks (of anything) are you running? Is the elapsed time for crunching a CPU task (the wall clock time - not reported CPU time) being significantly affected? Have you actually measured wall clock time to see how well it agrees with what is reported for Run time? What settings do you use for <avg_ncpus> and <max_ncpus> - Run time: +/- 95 minutes So, every ~95 minutes, you complete 4 CUDA tasks? Pretty impressive! My keen interest stems from the fact that I don't (yet) own any CUDA capable cards but I do want to crunch the Parkes work (efficiently - ie not on a CPU) for as long as it lasts. In terms of Australian distances, that facility is just down the road from where I live :-). I have 12 ATI 4850s crunching MW and I'm considering an investment in some suitable CUDA capable units in the interim while I wait to see if there is a successful port to OpenCL. The commentary about an OpenCL version has moved from "unlikely any time soon - if ever" (admittedly made quite a while ago) to "perhaps later in 2011" made about 9hrs ago. The reality will probably be "some time in 2012" :-). With your type of numbers being bandied about, it's very tempting to put more pressure on an already overblown budget :-). ____________ Cheers, Gary. | |
| ID: 110372 | | |
|
I have an I7 CPU. It runs (for the moment): ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110377 | | |
|
In fact the GPU load is: ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110378 | | |
|
what about gpu load for 2 and 3 WUs at a time? | |
| ID: 110379 | | |
|
The GPU load is just above your post. ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110380 | | |
|
OK, thanks. Sadly, my GTX 560 can't handle more than 2 WUs because 1024 MB is not enough even for 3. | |
| ID: 110381 | | |
|
The memory used is: ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110382 | | |
The memory used is: That's why I can't run 3 =( Now I'm thinking of selling my 1024 MB version and buying a 2048 MB one. | |
| ID: 110383 | | |
The memory used is: ORRRRR just get another box and put it in there! | |
| ID: 110386 | | |
Just as an info, I run now 4 WUs at a time on a GTX 480 : 95-minutes for 4 WUs is excellent. Nice work with that. :) | |
| ID: 110388 | | |
|
Yes, an extra 150 MB of memory would allow running 5 WUs ;-) ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110389 | | |
|
same on 560: | |
| ID: 110390 | | |
|
OK, OK, don't have a nervous breakdown ... buy another GPU ;-) ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110396 | | |
|
OK, here are some numbers for a GTX470 and Linux 64Bit: | |
| ID: 110397 | | |
|
With this config, if I leave Seti running, the GPU runs two Einstein WUs and one Seti WU together. ____________ Intel I7 930 - GTX 480 - Windows 7 64 Join BOINC Synergy, the best team in the galaxy! | |
| ID: 110400 | | |
OK, here are some numbers for a GTX470 and Linux 64Bit: in this case there is no performance gain: 2WU: 37.5 minutes for each 3WU: 38.3 minutes for each. | |
| ID: 110403 | | |
With this config, if I let Seti running, the GPU runs two Einstein and one Seti WU together. Thank you, I will think about running Seti too :) and maybe I will buy a 560 with 2048 MB of RAM for experiments :) | |
| ID: 110404 | | |
OK, here are some numbers for a GTX470 and Linux 64Bit: @mickydl* Where is your "CAL ATI Radeon HD5x00 series (Redwood) (1024MB) driver: 1.4.815" crunching? It seems that pc, AuthenticAMD AMD Phenom(tm) II X4 940 Processor [Family 16 Model 4 Stepping 2] (4 processors), is ONLY doing cpu units! | |
| ID: 110420 | | |
OK, here are some numbers for a GTX470 and Linux 64Bit: E@H doesn't yet have an app for ATI GPUs. Only CUDA GPUs are supported, and CUDA is an nVidia-only technology. I'm sure there are a lot of OpenCL-capable ATI/AMD cards out there that would love to get in on the action. -- Tony D. | |
| ID: 110433 | | |
OK, here are some numbers for a GTX470 and Linux 64Bit: There are not many projects that have ATI applications for LINUX. The only one that works without too much trouble is DNETC. That's what it is doing, though I would love to do something else with that card - like Einstein or Milkyway :) Michael | |
| ID: 110437 | | |
OK, here are some numbers for a GTX470 and Linux 64Bit: But your RAC over there is only 20k, you should be waaaaay above that with a 58xx card! I have a 5870 on Dnetc and am up above 260k and still climbing, I use Windows but that can't be the difference! Here are the times and credits for one of my units: 904.91 11.95 3,091.20 DNETC@Home v1.31 (ati14) and for one of your units: 10,777.46 99.83 3,292.45 DNETC@Home v1.02 (ati14) For some reason yours is using a much different version of the Dnetc software, probably the Linux version. I can't believe it is that much slower!! My pc is doing units in about 15 minutes each, why is yours taking soooo much longer?! | |
| ID: 110459 | | |
|
I don't want to hijack this thread with an off-topic discussion so I'm moving it to ATI performance | |
| ID: 110468 | | |
|
Sorry for being ignorant, but I have a few questions. | |
| ID: 112385 | | |
I tried to google but it's late and I'm sleepy and thus not finding anything useful. You could google for "<app" or "app_info" (without the quotes) or try an advanced forum search (top left corner of this page). But you should try to get some sleep beforehand. ;-) Gruß, Gundolf | |
| ID: 112386 | | |
Sorry for being ignorant, but I have a few questions. I will throw something in the mix here: you need to see how much memory each current unit is using, then you can figure out if playing with an app_info.xml file will do any good. If for instance your current GPU has 512 MB of memory and an Einstein unit takes 350 MB, then there is not enough space left over for another unit to run at the same time. If however you have a 2 GB GPU and the same 350 MB unit, then yes, you can run multiple units on the same card. To use multiple cards, which I think is what you are asking, you also need to check the current status of Boinc Manager and see what it says. If it says you are already running more than one GPU unit at a time, 4 in your case, then you may be maxed out depending on the memory requirements of each unit. If however you only see one unit running but have 4 GPUs in one machine, then it may NOT be an app_info.xml file you need but a cc_config.xml file, which is entirely different. A line in the cc_config.xml file will ensure all GPUs on the machine are working. If all GPUs on the machine ARE working and crunching one unit, then you may not be able to separate them. | |
| ID: 112396 | | |
|
You probably looked at my old pc, it has an old 8800GT with 512 MB, my current pc has a GTX480 with 1500 MB. I'm not asking to use multiple cards btw, I only got 1 GPU per system, but I'd like that one GPU, at least the 480 to crunch more than one WU. If I can't it's much more productive for it to just run GPUGRID all the time but I like einstein@home. | |
| ID: 112397 | | |
Any more advice is more than welcome though, a bit busy right now with other things so I'm waiting a bit before I start fiddling around with making an app_info. spare at least 2 CPU-cores if you want to run 4 cuda-jobs at maximum speed. don't start 4 of them at a time - give them an interleave of a minute. sorry to say, but currently einstein has the worst performing cuda app around and it's a real pita to squeeze out a little bit on our side.. of course they could do what you want server side and give us an option to select the number of jobs we want to run in parallel. at least as long as the app is as it is now. | |
| ID: 112398 | | |
I have an I7 CPU. It runs (for the moment): I think this is just a case of the estimate being for a nonHT cpu. | |
| ID: 112404 | | |
|
If anyone has already made an app_info.xml with S6 WUs included, then it would be highly appreciated if they posted it here. | |
| ID: 112407 | | |
If anyone already made an app_info.xml with S6 WU's included, than it will be highly appreciated if one post it here. Both you and Dirk need to send a PM to Claggy and/or rroonnaalldd, they are both experts with app_info.xml files. Dirk, how much memory does a GPU workunit use right now? You will need twice that, plus a little bit for the overhead, to fit more than one in memory. | |
| ID: 112410 | | |
If anyone already made an app_info.xml with S6 WU's included, than it will be highly appreciated if one post it here. 100MB is used by windows it seems, 1 cuda WU from einstein adds around 300MB. So if I'd run 4 on my GTX480 it'd use roughly 1300 out of 1500MB. Edit, it'll probably be around the 1350MB mark just like for the op of this thread. I'll pm one of them soon, thanks for the heads up! | |
| ID: 112412 | | |
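The memory arithmetic in the post above can be sketched in a few lines of Python. The figures (roughly 100 MB used by Windows, roughly 300 MB per Einstein CUDA WU, 1500 MB on the GTX 480) are the ones quoted in this thread; the function names are made up for illustration, and since real usage fluctuates, the result is best read as a rough upper bound rather than a guarantee.

```python
def max_concurrent_tasks(total_mb: int, overhead_mb: int, per_task_mb: int) -> int:
    """Estimate how many GPU tasks fit in video memory at the same time."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable_mb = total_mb - overhead_mb
    # Integer division: a task that only partly fits does not count.
    return max(usable_mb // per_task_mb, 0)


def memory_used_mb(tasks: int, overhead_mb: int, per_task_mb: int) -> int:
    """Estimated video memory in use with `tasks` WUs running."""
    return overhead_mb + tasks * per_task_mb


# Figures quoted in the thread for a GTX 480 under Windows 7.
print(max_concurrent_tasks(1500, 100, 300))  # 4
print(memory_used_mb(4, 100, 300))           # 1300
```

Measuring the real per-task figure with a tool such as GPU-Z before relying on the estimate is still the safer route.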
|
Below, there is my app_info.xml for linux, running quite well. | |
| ID: 112415 | | |
|
Here's a complete app_info for windows with all the latest apps in it, just change the count value to 0.5, 0.33, or 0.25 depending if you want to run 2, 3 or 4 Cuda Wu's at once:

    <app_info>
      <app>
        <name>einstein_S5GC1HF</name>
        <user_friendly_name>Global Correlations S5 HF search #1</user_friendly_name>
      </app>
      <file_info>
        <name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>einstein_S5GC1HF</app_name>
        <version_num>306</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <plan_class>S5GCESSE2</plan_class>
        <api_version>6.13.0</api_version>
        <file_ref>
          <file_name>einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
          <open_name>graphics_app</open_name>
        </file_ref>
      </app_version>
      <app>
        <name>einsteinbinary_BRP3</name>
        <user_friendly_name>Binary Radio Pulsar Search</user_friendly_name>
      </app>
      <file_info>
        <name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3SSE.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>cudart_xp32_32_16.dll</name>
        <executable/>
      </file_info>
      <file_info>
        <name>cufft_xp32_32_16.dll</name>
        <executable/>
      </file_info>
      <file_info>
        <name>db.dev.win.3d35195e</name>
      </file_info>
      <file_info>
        <name>dbhs.dev.win.3d35195e</name>
      </file_info>
      <app_version>
        <app_name>einsteinbinary_BRP3</app_name>
        <version_num>105</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <plan_class>BRP3SSE</plan_class>
        <api_version>6.13.0</api_version>
        <file_ref>
          <file_name>einsteinbinary_BRP3_1.05_windows_intelx86__BRP3SSE.exe</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
          <open_name>graphics_app</open_name>
        </file_ref>
      </app_version>
      <app_version>
        <app_name>einsteinbinary_BRP3</app_name>
        <version_num>107</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.200000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <plan_class>BRP3cuda32</plan_class>
        <api_version>6.13.0</api_version>
        <file_ref>
          <file_name>einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>cudart_xp32_32_16.dll</file_name>
          <open_name>cudart32_32_16.dll</open_name>
          <copy_file/>
        </file_ref>
        <file_ref>
          <file_name>cufft_xp32_32_16.dll</file_name>
          <open_name>cufft32_32_16.dll</open_name>
          <copy_file/>
        </file_ref>
        <file_ref>
          <file_name>einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe</file_name>
          <open_name>graphics_app</open_name>
        </file_ref>
        <file_ref>
          <file_name>db.dev.win.3d35195e</file_name>
          <open_name>db.dev</open_name>
          <copy_file/>
        </file_ref>
        <file_ref>
          <file_name>dbhs.dev.win.3d35195e</file_name>
          <open_name>dbhs.dev</open_name>
          <copy_file/>
        </file_ref>
        <coproc>
          <type>CUDA</type>
          <count>1.000000</count>
        </coproc>
        <gpu_ram>314572800.000000</gpu_ram>
      </app_version>
      <app>
        <name>einstein_S6Bucket</name>
        <user_friendly_name>Gravitational Wave S6 GC search</user_friendly_name>
      </app>
      <file_info>
        <name>einstein_S6Bucket_1.01_windows_intelx86__SSE2.exe</name>
        <executable/>
      </file_info>
      <file_info>
        <name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>einstein_S6Bucket</app_name>
        <version_num>101</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <max_ncpus>1.000000</max_ncpus>
        <plan_class>SSE2</plan_class>
        <api_version>6.13.0</api_version>
        <file_ref>
          <file_name>einstein_S6Bucket_1.01_windows_intelx86__SSE2.exe</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
          <open_name>graphics_app</open_name>
        </file_ref>
      </app_version>
    </app_info>

If you're missing apps and dll's, here are some links:
http://einstein.aei.mpg.de/download/einstein_S5GC1HF_3.06_windows_intelx86__S5GCESSE2.exe
http://einstein.aei.mpg.de/download/einstein_S5R6_3.01_graphics_windows_intelx86.exe
http://einstein.aei.mpg.de/download/einsteinbinary_BRP3_1.05_windows_intelx86__BRP3SSE.exe
http://einstein.aei.mpg.de/download/einsteinbinary_BRP3_1.07_windows_intelx86__BRP3cuda32.exe
http://einstein.aei.mpg.de/download/einsteinbinary_BRP3_1.00_graphics_windows_intelx86.exe
http://einstein.aei.mpg.de/download/cudart_xp32_32_16.dll
http://einstein.aei.mpg.de/download/cufft_xp32_32_16.dll
http://einstein.aei.mpg.de/download/db.dev.win.3d35195e
http://einstein.aei.mpg.de/download/dbhs.dev.win.3d35195e
http://einstein.aei.mpg.de/download/einstein_S6Bucket_1.01_windows_intelx86__SSE2.exe
Claggy | |
| ID: 112417 | | |
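For anyone who would rather not edit the count by hand, here is a minimal Python sketch of the "change the count value" step from the post above. It assumes the CUDA `<coproc>` block looks exactly like the one in that app_info (type first, then count); `set_cuda_count` and `SAMPLE` are illustrative names, not part of BOINC. It deliberately uses a text substitution instead of an XML library so that the rest of the file is left byte-for-byte untouched.

```python
import re

# Matches the <count> inside a CUDA <coproc> block, laid out as in the
# app_info.xml posted in this thread (<type> first, then <count>).
COUNT_RE = re.compile(r"(<coproc>\s*<type>CUDA</type>\s*<count>)[^<]*(</count>)")


def set_cuda_count(app_info_text: str, tasks_per_gpu: int) -> str:
    """Return the app_info text with <count> set to 1/tasks_per_gpu."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    count = 1.0 / tasks_per_gpu
    return COUNT_RE.sub(
        lambda m: f"{m.group(1)}{count:.6f}{m.group(2)}", app_info_text
    )


# Cut-down sample, not a complete app_info.xml.
SAMPLE = """<app_version>
  <coproc>
    <type>CUDA</type>
    <count>1.000000</count>
  </coproc>
</app_version>"""

print(set_cuda_count(SAMPLE, 4))
```

With 4 tasks per GPU the count becomes 0.250000; with 3 it becomes 0.333333, which is equivalent in practice to the 0.33 suggested above.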
|
Thanks so much Claggy! It works :) | |
| ID: 112418 | | |
|
Just a little update, running 4 WU's on the GPU now. RAM usage is fluctuating between 1200-1300 MB. But the core downclock bug kicked in with all the restarting of tasks so I'll restart now to see how stable it runs overnight. | |
| ID: 112425 | | |
|
Thanks Claggy! 10 minutes in, everything is all right. | |
| ID: 112429 | | |
|
Hmm, I've run into a little problem. I can't get enough WUs in my queue. This is because apparently boinc thinks one cuda WU will now take me 34 hours to complete, despite evidence to the contrary. I remember reading somewhere how to change something that will help reduce that expected time but I just can't remember where. <flops>16276995705.519375</flops> Now the estimated runtime is 2 hours 40 mins which is good enough. I did manage to ruin my current batch of tasks though by fiddling around with the app_info, sorry for that! Also crashed my display driver when I exited boinc but I'm running the latest nvidia beta ones so it's not unexpected and it recovered just fine. Last edit, all my latest tasks are called anonymous platform on my results page now. Is that normal? I don't think they were last night. | |
| ID: 112432 | | |
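A rough way to pick a `<flops>` value like the one in the post above, sketched in Python. This assumes that BOINC's runtime estimate for an app version scales inversely with its `<flops>` value; the thread only shows that adding `<flops>` brought the estimate down, so treat the proportionality as an assumption. The function name and the starting value of 1e9 are hypothetical and are not taken from the thread.

```python
def scaled_flops(current_flops: float,
                 current_estimate_s: float,
                 target_estimate_s: float) -> float:
    """Scale <flops> so the runtime estimate moves toward the target.

    Assumes estimate is proportional to 1 / flops (an assumption, see above).
    """
    if target_estimate_s <= 0:
        raise ValueError("target_estimate_s must be positive")
    return current_flops * current_estimate_s / target_estimate_s


# Hypothetical starting point: the client estimates 34 hours, and the goal
# is an estimate of about 2 hours 40 minutes.
estimate_34h = 34 * 3600
target_2h40 = 2 * 3600 + 40 * 60
print(scaled_flops(1.0e9, estimate_34h, target_2h40))
```

The estimate only needs to be in the right ballpark for the work fetch to behave, so a rounded value is fine.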
Last edit, all my latest tasks are called anonymous platform on my results page now. Is that normal? I don't think they were last night. Yes, that's normal. By using an app_info.xml file, you tell the server that you are no longer using the project-supplied applications for the known platforms but an independent (anonymous) one. That's also the drawback with going AP: you won't automatically get the newest applications but have to find and download them before incorporating them in your app_info.xml by hand. Gruß, Gundolf | |
| ID: 112433 | | |
Here's a complete app_info for windows with all the latest apps in it, just change the count value to 0.5, 0.33, or 0.25 depending if you want to run 2, 3 or 4 Cuda Wu's at once: Hi, why not create a project preference for that. Would make it a lot easier and less error prone. Michael ____________ Team Linux Users Everywhere | |
| ID: 112435 | | |
Just a little update, running 4 WU's on the GPU now. RAM usage is fluctuating between 1200-1300 MB. But the core downclock bug kicked in with all the restarting of tasks so I'll restart now to see how stable it runs overnight. if you can only reach about 75% gpu-load, the tasks are starving. don't care about overall CPU-usage - try to free another cpu-core and see if it improves.. | |
| ID: 112442 | | |
Just a little update, running 4 WU's on the GPU now. RAM usage is fluctuating between 1200-1300 MB. But the core downclock bug kicked in with all the restarting of tasks so I'll restart now to see how stable it runs overnight. True, but even without running any CPU tasks the load only goes to 80%. I think it's because of Win7. I got the same thing on GPUGRID too, it just maxes out at a set percentage (differs a bit per task there). I think the best performance I get is when I leave 4 threads free for the GPU. Any more and it doesn't improve GPU usage. Even then each app only uses 5% of the CPU. Could that be changed by changing <avg_ncpus>0.200000</avg_ncpus> to something like <avg_ncpus>0.400000</avg_ncpus>? | |
| ID: 112444 | | |
True, but even without running any cpu tasks the load only goes to 80%. I think it's because of win7. Got the same thing too on GPUGRID, it just maxes out at a set percentage (differs a bit per task there). you got those nvidia-physics drivers installed? maybe those are slowing them down.. I think the best performance I get is when I leave 4 threads free for the GPU. Any more and it doesn't improve gpu usage. Even then each app only uses 5% of the CPU. Could that be changed by changing nope - this is only informative for boinc and does not change anything in the apps behaviour. | |
| ID: 112445 | | |
|
Maybe, but I also use this system for gaming and I'd rather keep physx for that. I haven't installed the 3D vision drivers though. | |
| ID: 112446 | | |
Maybe, but I also use this system for gaming and I'd rather keep physx for that. I haven't installed the 3D vision drivers though. well at the very least you should try uninstalling the PhysX driver temporarily to see if in fact that is the reason your DC projects can't max out your GPU usage. at least then we'll know if PhysX had anything to do with it, and you can go right back to using PhysX. and on the off chance it turns out PhysX isn't the culprit, it would still be worth your while to explore why your GPU isn't reaching full usage under that kind of load... ...besides, inquiring minds would like to know :-) | |
| ID: 112447 | | |
|
Well, it does reach full load in projects like milkyway and primegrid so I don't know. | |
| ID: 112450 | | |
Well, it does reach full load in projects like milkyway and primegrid so I don't know. maybe it's just a bandwidth limit on the bus. you may try to scale down to 3 tasks and see if it improves.. | |
| ID: 112455 | | |
|
With 3 the load is lower. And doesn't PCIE x16 give plenty of bandwidth? It's not like I'm running 2 GPU's and the PCIE drops to 8x. I've been told on GPUGRID that win7 is about 11% slower than winXP and linux with their CUDA tasks because of the way win7 handles it. I think it's likely that's the cause here. | |
| ID: 112457 | | |
With 3 the load is lower. And doesn't PCIE x16 give plenty of bandwidth? It's not like I'm running 2 GPU's and the PCIE drops to 8x. I've been told on GPUGRID that win7 is about 11% slower than winXP and linux with their CUDA tasks because of the way win7 handles it. I think it's likely that's the cause here. in my experience this is mostly due to all that nifty crap W7 has by default. as for einstein - yes, the linux-app running full CPU is a lot faster. on other projects like PG, collatz, GPUgrid it's just one or the other. It depends a bit on what else I'm doing with the PC or the kind of CPU tasks that I'm running. But if I leave it alone 4 WU's running together will complete in about 90-100 minutes. that's probably as fast as you can get right now. of course it would be much better if the app were improved so that simply running a single WU would fully utilize the GPU. Read somewhere it could be caused by a lack of power but my PSU is 850 watt which should be plenty for my system with just 1 GPU, and besides the GPU regularly gets stressed more while gaming (especially the witcher 2 but damn it looks awesome!). that's silly! if the PSU were not able to feed your host, you'd have real crashes. I also want to thank everyone for their feedback and advice, much appreciated! HTH! | |
| ID: 112458 | | |
|
Hi - How do you find out your GPU load? | |
| ID: 112967 | | |
Where is the manual/docs for the app_info file? Anonymous platform | |
| ID: 112969 | | |
|
Where is the manual/docs for the app_info file? The app_info.xml file is part of the anonymous platform mechanism. ____________ BOINC FAQ Service Official BOINC wiki Installing BOINC on Linux | |
| ID: 112970 | | |
Hi - How do you find out your GPU load? run GPU Z, download it from HERE ____________ | |
| ID: 112971 | | |
|
You guys are AWESOME! Thanks so much!! | |
| ID: 112972 | | |
Why does BOINC delete a lot of the files/executables when it either doesn't think there's work etc.? Not sure why I keep having to copy stuff back. Not sure if it's because I have things in my app_info file that aren't currently being worked on?? This is 'locality scheduling' in action ( our server is chatting with your rig upon contact and deciding what to do next, preferably with what data files you already have ). See how it goes : I'd suggest it is just settling down with your new setup and BOINC specifically has no work ( project wide ) in the part of the parameter space that you held files for, alternatively it could be that others in the same part of the search space as yourself are chewing through quickly too .... Richard? Should I be trying to get another task on the GPU to get it to 100% utilization? I think the answer is NO, because not enough memory, right, already close to 100% GPU use as well? Your GPU is as fully utilised as possible, the CPUs are fine ( no harm ) and would be doing things other than feeding the GPU. Do you guys have some way you're benchmarking because just counting how many CUDA tasks finished in a certain amount of time manually? That's the way I do it, by checking the runtimes via BOINC 'Projects' tab -> select E@H -> hit 'Your computers' -> on the web page that appears in your browser select 'tasks' for the relevant rig -> the subsequent web page shows the 'Run time' ( wall clock ) and 'CPU time' ( thread time ). I keep getting msgs about the one of the CUDA tasks not being able to run (hasn't really started yet) say every 30 secs or so that says not enough CUDU (3 others are already trying to run which I think is why). Not sure why it does this. In the queue waiting their turn. After a while BOINC will adjust WU requests per your demonstrated load & this will settle down. You don't need to manage that. Good to see you're having fun! :-) Cheers, Mike. 
____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 112973 | | |
|
Oh nvm, it was F and not C. Ignore my post. | |
| ID: 112975 | | |
|
I can't get any of the app's to work on my GTX 580 & Win XP to try and run 4 Wu's at once, all I keep getting is the following message: | |
| ID: 113471 | | |
|
I finally got the app file figured out, running 6 @ a time on a Dual GTX 580 Setup, will have to see what the times are later ... | |
| ID: 113477 | | |
|
Hi! | |
| ID: 113481 | | |
|
I've searched but am still clueless so please excuse the basic question. | |
| ID: 113482 | | |
|
Hi! | |
| ID: 113485 | | |
As for the GT240: with the new BRP4 workunits I would not expect a dramatic throughput increase by running two units in parallel. Maybe 10% ? It will be interesting to see the actual results. And, to be very specific, the GT240 and all earlier-generation NVidia cards lack the context-switching hardware which makes multiple-WU operation worthwhile on 4xx and 5xx series cards. In short, two apps will run simultaneously, but on older hardware parallel running is likely to be less efficient than running in series. | |
| ID: 113488 | | |
|
I've run 1-2-3 & 4 at a time on each one of my dual GTX 580's and I don't see where I'm gaining anything. Running 1 @ a time is just as productive as running 2-3 or 4 as the times just go up 2-3 or 4 times the amount of time it takes to run 1 @ a time ... :/ | |
| ID: 113489 | | |
I've searched but am still clueless so please excuse the basic question. Here are links to both Linux and Windows app_info.xml files that I made for my systems. I've been able to see 8-13% performance increase running in Linux compared to Windows so Linux is probably the way to go. Linux App Win App The option for changing how many run at once is: <coproc> <type>CUDA</type> <count>0.500000</count> </coproc> 1.0 for one unit per GPU, .50 for two units per GPU, .33 for three units per GPU, etc. The files still have BRP3 GPU in them as I still get a few of those from time-to-time. You also have the option of removing the CPU related sections if you plan to run GPU only to simplify the configuration file. You have to make sure all the necessary project files as specified in the XML are available before using the XML file. | |
| ID: 113490 | | |
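The last point in the post above (all files named in the XML have to be present before the file is used) is easy to check with a short script. This is a minimal Python sketch, assuming the app_info.xml sits in the project directory next to the files it names; `missing_files` is an illustrative name, and the path in the usage comment is a placeholder because the BOINC data directory differs between installations.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_files(app_info_path: Path) -> list[str]:
    """List files named in <file_info> entries that are not on disk.

    Files are looked up in the directory that contains app_info.xml.
    """
    root = ET.parse(app_info_path).getroot()
    project_dir = app_info_path.parent
    names = [fi.findtext("name", "").strip() for fi in root.iter("file_info")]
    return sorted({n for n in names if n and not (project_dir / n).exists()})


# Usage (placeholder path, adjust to your own BOINC data directory):
# print(missing_files(Path("<BOINC data dir>/projects/einstein.phys.uwm.edu/app_info.xml")))
```

An empty list means every referenced file was found; anything listed still has to be downloaded before BOINC is restarted.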
As for the GT240: with the new BRP4 workunits I would not expect a dramatic throughput increase by running two units in parallel. Maybe 10% ? It will be interesting to see the actual results. I think running 3 (Memory Permitting) is the most Optimal, anything over that & the run times start increasing Per Wu ... Just observation running different amounts on several GTX 580 Box's ... | |
| ID: 113491 | | |
As for the GT240: with the new BRP4 workunits I would not expect a dramatic throughput increase by running two units in parallel. Maybe 10% ? It will be interesting to see the actual results. Wow, you have an impressive array of hosts there!!! Most of your hosts even come with 2 cards !?! But you were not running 2 x 3 WU in parallel, (3 on each GPU), right?? HB | |
| ID: 113492 | | |
As for the GT240: with the new BRP4 workunits I would not expect a dramatic throughput increase by running two units in parallel. Maybe 10% ? It will be interesting to see the actual results. Right, I was only running 1 on each GPU until a few days ago. I couldn't get an app file to work until then, now I'm just running 2 Dual 580 Box's & settled in on 3 Per GPU as by my figures that's the most Optimal on the GTX 580's I have ... | |
| ID: 113494 | | |
App_info.xml files require that you monitor your system and this forum (for new app releases) so in general I personally would only recommend it to expert users. Or put very bluntly: if you don't know how to write one, please think twice about whether you want to use one ;-) I appreciate the honesty (bluntness); I sometimes miss the subtleties. I consider myself trainable and since I'm in the explore and learn stage I don't mind getting into a little trouble. If I understand properly how this works, copying the appropriate file Jeroen linked to into my BOINC directory and changing the coprocessor count to 0.5 should allow 2 CUDA tasks to run. The problem will be that I have to watch for new versions and modify the file myself, or delete it and go back to one CUDA task at a time. If I miss an update the problem would be no tasks for the new program. Right? Right now it looks like I am using 492 out of 1024 MB of GPU memory with 60-70% of the processors, resulting in one BRP4 task every 35 min or so. Temps are pretty steady at 59C with the fan at 40%. Looks like there is excess capacity. This is my new home machine so I keep a pretty close eye on it. I may upgrade that 240 and give it to my son. Joe | |
| ID: 113499 | | |
If I understand properly how this works copying the appropriate file Jeroen linked to into my BOINC directory and changing the coprocessor count 0.5 should allow 2 CUDA tasks to run. Not quite. The app_info.xml file (and all other files mentioned therein) go(es) to the appropriate project directory (einstein.phys.uwm.edu in this case), which is located in the projects subdirectory of the BOINC data directory. Gruß, Gundolf | |
| ID: 113500 | | |
If I understand properly how this works copying the appropriate file Jeroen linked to into my BOINC directory and changing the coprocessor count 0.5 should allow 2 CUDA tasks to run. Thanks, I'll give it a try to see what happens. The more I think about the more I think I'll probably run without it in the long term. This system has only been up for a few days and hasn't got a stable RAC yet but it looks like almost 20K credits/day, no overclocking. I'm guessing another GPU task will add 2 or 3K. Joe | |
| ID: 113510 | | |
|
Would be there a real gain on BRP4 CUDA tasks with GTX260? | |
| ID: 114101 | | |
Would be there a real gain on BRP4 CUDA tasks with GTX260? no, only on Fermi like GPUs. (there was a better performance with BRP3 tasks, but not with BRP4) ____________ | |
| ID: 114112 | | |
|
I still think running more than 2 instances on any GPU is not worth doing. | |
| ID: 114169 | | |
|
Here is my app_info: | |
| ID: 114432 | | |
failure: couldn't start No main program specified Refer to the app_info documentation The format for directives like that is <main_program/> - a self-closing tag on its own, not an empty open/close pair. Likewise <executable/>, <copy_file/> | |
| ID: 114433 | | |
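Since the reply above says an empty open/close pair such as `<main_program></main_program>` is not accepted where `<main_program/>` is expected, a quick scan of the file can catch that mistake. This is a minimal Python sketch; the list of tags is limited to the three named in that reply, and the function names are made up for illustration.

```python
import re

# Flag-style directives named in the reply above.
FLAG_TAGS = ("main_program", "executable", "copy_file")

# An opening tag, optional whitespace, then the matching closing tag.
EMPTY_PAIR_RE = re.compile(r"<(%s)>\s*</\1>" % "|".join(FLAG_TAGS))


def find_empty_pairs(app_info_text: str) -> list[str]:
    """Return the tag names written as empty open/close pairs."""
    return [m.group(1) for m in EMPTY_PAIR_RE.finditer(app_info_text)]


def fix_empty_pairs(app_info_text: str) -> str:
    """Rewrite <tag></tag> as the self-closing <tag/> form."""
    return EMPTY_PAIR_RE.sub(r"<\1/>", app_info_text)


broken = "<file_ref><file_name>app.exe</file_name><main_program></main_program></file_ref>"
print(find_empty_pairs(broken))  # ['main_program']
print(fix_empty_pairs(broken))
```

Keeping a backup of the original app_info.xml before rewriting it is a sensible precaution.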
I still think running more then 2 instances on any gpu is not worth doing. That is probably right for slower cards, but here are some statistics from a GTX 560 Ti card with 2 GB of memory: 1. only one BRP4 WU: about 38 minutes. 2. 6 BRP4 WUs simultaneously: about 1 hour and 30 minutes, so 150/6 = 25 minutes for one WU. | |
| ID: 114434 | | |
I still think running more then 2 instances on any gpu is not worth doing. Sorry, TWO hours and 30 minutes. | |
| ID: 114438 | | |
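The comparison in the two posts above (38 minutes for a single BRP4 WU versus 6 WUs finishing together in 2 hours 30 minutes on a GTX 560 Ti) boils down to dividing the batch wall-clock time by the number of parallel tasks. A small Python sketch with illustrative function names:

```python
def minutes_per_wu(batch_minutes: float, tasks_in_parallel: int) -> float:
    """Effective wall-clock minutes per WU when several run at once."""
    if tasks_in_parallel < 1:
        raise ValueError("tasks_in_parallel must be at least 1")
    return batch_minutes / tasks_in_parallel


def throughput_gain(single_minutes: float, batch_minutes: float,
                    tasks_in_parallel: int) -> float:
    """Fractional throughput gain over running one WU at a time."""
    return single_minutes / minutes_per_wu(batch_minutes, tasks_in_parallel) - 1.0


# GTX 560 Ti figures quoted above: 38 min alone, 150 min for 6 in parallel.
print(minutes_per_wu(150, 6))                   # 25.0
print(f"{throughput_gain(38, 150, 6):.0%}")     # 52%
```

The same division applied to the 95 minutes for 4 WUs reported at the start of the thread gives 23.75 minutes per WU on the GTX 480.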
|
[ADDED] | |
| ID: 114449 | | |
|
Bikeman wrote: the "new" BRP4 units are quite a bit less CPU intensive than the formerly distributed BRP3 workunits. (while the app itself is the same, the signal data is different). I've been working with my new host which has a GTX 460 graphics card of the Gigabyte SOC flavor. Single WU at a time gives a very tight distribution of elapsed times averaging 1900 seconds, with a stdev of something like 20 seconds. GPU load generally was just under 70%. But running two WU at a time exhibited bimodal behavior. Much of the time the system was running at little if any higher throughput than single WU (as shown by all of rate of progress, GPU load, and power consumption) but at times it would run materially faster (again as manifested by all three). About 50 WUs processed over about a day showed an average throughput advantage over single WU of a bit over 8%. When I first started running three simultaneous WU, all three indicators suggested considerable further improvement. In particular GPU load was mostly about 88%. One of the three active WUs progressed much faster than the other two. But as soon as it finished, this desirable behavior vanished, and since then I've seen GPU load at 77%, power consumption to match, and matched progress on all three WUs with throughput no higher than the 2 WU case average--possibly a bit lower. An additional problem is that the server is generally only awarding this host one BRP4 WU for each request. Sometimes a second or third request is generated and award of one WU each granted at one minute intervals, but then the four hour delay penalty for use of anonymous platform is posted. As the host consumes about eight WUs in four hours if available, this is a problem for unattended operation. When I was trying two WU operation yesterday, the host was given enough work to stay busy--I don't know what the difference is today. 
In all three conditions, the forecast run time has not converged toward the real one on any reasonable time scale (well past the oft-mentioned 10 returned WU point). For single WU work (with no app_info.xml file) after many dozens of results returned, the estimate was about double truth. At double WU running, the error was much larger, though I failed to log the values. Now at triple WU running, for which the real elapsed time for each WU is about an hour and a half, the initial estimate was over 21 hours, and has so far declined only to 19.5 hours. As I've requested a 3.5 day queue, this alone, however, seems not enough reason for the parsimonious distribution of work. While both double and triple WU operation has given a modest performance boost, unless I can get the higher activity condition observed on some work to be typical by some adjustment, the improvement seems not worth the overhead and risk associated with anonymous platform operation. ____________ | |
| ID: 114491 | | |