Posts by Richard Haselgrove

1) Message boards : Cruncher's Corner : boinc do not want to use second gpu (Message 141830)
Posted 7 hours ago by Richard Haselgrove
There's one other matter to consider when running multiple GPUs.

Do you have a separate monitor attached to each graphics card? Some cards won't run compute tasks unless their output sees a display load.

A cheap and easy solution is to make a dummy VGA plug for the card that doesn't have a monitor attached. Then restart the machine.

Here's a link that I found helpful.
http://www.geeks3d.com/20091230/vga-hack-how-to-make-a-vga-dummy-plug/

Regards,
Steve

That's sometimes true, but not universally necessary. If it does turn out to be the case, many (most?) modern monitors come with dual VGA / DVI inputs, and cables to match, and most GPUs come with either dual outputs or DVI / VGA adapters. With a combination of those components, you can often hook up the GPU needing the load to a spare monitor input - ugly, but quicker than a shopping trip.
2) Message boards : Problems and Bug Reports : BRP6 work not working properly (Message 141826)
Posted 10 hours ago by Richard Haselgrove
The GPU utilization factor was a feature added by Bernd before BOINC had a suitable mechanism of its own.

I think you mean the user-defined utilization factor was added by Bernd, right? Surely the utilization factor must have existed before that.

I don't believe it did. As I recall (my memory is hazy because I wasn't using GPUs at the time), the app_config.xml mechanism came in around late 2012 with BOINC 7.0.42 or something like that. I think that prior to that, people had to use the very user-unfriendly app_info.xml mechanism if they wanted to schedule more than 1 concurrent task per GPU. Bernd avoided this pain and suffering by developing the GPU utilization factor as a project configuration option, easily settable by the user. As I understand it, the success of that feature led to the BOINC devs, some time later, adopting the idea and providing the functionality through app_config.xml. I may be wrong, and I'm sure someone will correct me if I am. I don't remember exactly when the GPU utilization factor came in, but my understanding is that it was unique to Einstein when first introduced.

You're absolutely right, Gary. app_config.xml was first usable with v7.0.42, available for testing from 12 December 2012. It was first introduced a few days earlier with v7.0.40, but that version was buggy and unusable. AFAIK, Bernd's implementation of GPU utilization factor via the project website remains unique to Einstein to this day.

Both techniques are clunky, and have their individual drawbacks. The GPU utilization factor only comes into effect when new work is fetched for the specific application being controlled. app_config (mostly) comes into effect immediately, although the BOINC Manager display often takes some time to catch up; the exception is the later enhancement for controlling multi-threaded applications via <cmdline>, which only works for newly-started tasks.

While I agree that app_config is the most potent tool for fine tuning, I don't like the way it is implemented. I avoid it if possible.

It's really quite easy to use compared to app_info.xml. Sure, it's not quite as convenient as editing a single number in the GPU utilization preference box, but once you have a basic file installed, editing it, followed by a click on 'reread config files', is quick, simple and convenient - and part of the user interface :-).

app_info.xml is an absolute nightmare for anyone except experienced developers to use. Not only do you have to specify every detail of the application you want to test or deploy, you have to supply every file that the application needs to run, and you have to do the same for every other application running at the same project. And you have to do it all over again every time the project updates any of its application types. One typing error, and you have to go back to the beginning, probably losing your cached work and/or your program files in the process.

App_config.xml is much simpler. You only have to mention the single application you wish to modify, and even then, most elements are optional. If you get it wrong, it doesn't work (which is fair enough), but no damage is done - you can go back and try again.
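For anyone wanting a starting point, a minimal file looks something like this (save it in the project's folder under the BOINC data directory; the <name> element must match the project's internal application name - einsteinbinary_BRP6 below is taken from the BRP6 executable's name - and the numbers are just examples):

   <app_config>
      <app>
         <name>einsteinbinary_BRP6</name>
         <gpu_versions>
            <gpu_usage>0.5</gpu_usage>   <!-- 0.5 = two concurrent tasks per GPU -->
            <cpu_usage>0.2</cpu_usage>   <!-- CPU fraction budgeted per GPU task -->
         </gpu_versions>
      </app>
   </app_config>

Edit the numbers, click 'reread config files', and you're done.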

Bookmark Application configuration.
3) Message boards : Technical News : Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6" (Message 141796)
Posted 2 days ago by Richard Haselgrove
Shouldn't my GTX660 on my W8 box be receiving CUDA 55 Parkes PMPS tasks rather than CUDA32?

Not while the app has been withdrawn for bugfixing.
4) Message boards : Technical News : Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6" (Message 141794)
Posted 2 days ago by Richard Haselgrove
Possible problem with an unresolved dependency on LIBWINPTHREAD-1.DLL
5) Message boards : Cruncher's Corner : Times (Elapsed/CPU) for BRP6-Beta-cuda55 compared to BRP6-cuda32 - Results and Discussion Thread (Message 141793)
Posted 2 days ago by Richard Haselgrove
Well, that's allowed me to download the executable, and Dependency Walker is happy with the CUDA DLL naming - but it is showing an unresolved dependency on LIBWINPTHREAD-1.DLL. einsteinbinary_BRP6_1.52_windows_intelx86__BRP6-Beta-cuda32-nv301.exe didn't need that.
6) Message boards : Cruncher's Corner : Times (Elapsed/CPU) for BRP6-Beta-cuda55 compared to BRP6-cuda32 - Results and Discussion Thread (Message 141791)
Posted 2 days ago by Richard Haselgrove
I noticed this morning that one of my Windows 7 hosts received 1.54 CUDA55 work, so those of us hoping for a Windows version are in luck now.

I'm working to download a supply for all three candidate hosts, then intend to suspend CUDA32 work to get a faster look. An extra complication for performance comparison in my case is that I have been using some thermal throttling, so I've turned that off, but will need to process some CUDA32 work without throttling to get a proper comparison population, assuming initial CUDA55 Windows success here.

Before hitting the "post" button I moved on to suspend my pending CUDA32 work and one running CUDA32 task on one host. Sadly the first CUDA55 job promptly errored out, and as I had failed to take the precaution of suspending all save one of the CUDA55s, another 13 errored out before I stopped things.

The exit status shows as -1073741515 (0xffffffffc0000135)

I used a more cautious initial trial technique on my other two Windows 7 GPU hosts, allowing a single CUDA55 task, and in both of those cases that task also errored out promptly, also with exit status -1073741515 (0xffffffffc0000135).

While I certainly agree with the moderation action moving my performance comparison results to this thread, I thought this initial error result worthy of a short post in the Technical News thread.

Could you post, or PM me, the <app_version> segment of client_state.xml referencing the v1.54 Beta work, please?

Error code 0xc0000135 (as it's usually written - the -1073741515 in the task output is the same 32-bit value printed as a signed decimal) is STATUS_DLL_NOT_FOUND: "The application failed to initialize properly", usually because a DLL the program needs can't be found. I'd guess a problem with the CUDA runtime files in this case. It should be possible to test a bit further by specifying the correct files in an app_info.xml file.
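For the record, the test file would take roughly this shape - a sketch only, since the real file names, version number and plan class must be copied from that <app_version> segment once we've seen it (cudart32_55.dll below is a guess based on the usual CUDA 5.5 runtime naming):

   <app_info>
      <app>
         <name>einsteinbinary_BRP6</name>
      </app>
      <file_info>
         <name>einsteinbinary_BRP6_1.54_windows_intelx86__BRP6-Beta-cuda55.exe</name>
         <executable/>
      </file_info>
      <file_info>
         <name>cudart32_55.dll</name>
      </file_info>
      <app_version>
         <app_name>einsteinbinary_BRP6</app_name>
         <version_num>154</version_num>
         <plan_class>BRP6-Beta-cuda55</plan_class>
         <coproc>
            <type>CUDA</type>
            <count>1</count>
         </coproc>
         <file_ref>
            <file_name>einsteinbinary_BRP6_1.54_windows_intelx86__BRP6-Beta-cuda55.exe</file_name>
            <main_program/>
         </file_ref>
         <file_ref>
            <file_name>cudart32_55.dll</file_name>
         </file_ref>
      </app_version>
   </app_info>

And bear in mind the usual app_info.xml warning: every other application running at the project needs its own entries in the same file, or its tasks will be lost.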
7) Message boards : Technical News : Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6" (Message 141790)
Posted 2 days ago by Richard Haselgrove
The application failed to initialize properly (0xc0000135)

That usually means a needed DLL file can't be found - I'm guessing no, or the wrong, CUDA runtime.

Every Google search for the error code says that it's the .NET runtime which is missing, but that won't be the case here.
8) Message boards : Cruncher's Corner : Times (Elapsed/CPU) for BRP6-Beta-cuda55 compared to BRP6-cuda32 - Results and Discussion Thread (Message 141775)
Posted 3 days ago by Richard Haselgrove
... nor on parts of an older generation than Fermi.

I have a working 9800GT which I still mount up occasionally for testing, and which can handle up to cuda65. Though I won't be able to test this app until there's a Windows version.

Its most recent outing has been at SETI Beta. There, the performance sweetspot is a rough tie between cuda23 and cuda32: cuda42 is worse, and cuda50 is very poor indeed. OpenCL has a different performance curve, and is better than CUDA for some types of task, but in general worse.
9) Message boards : Cruncher's Corner : boinc do not want to use second gpu (Message 141759)
Posted 4 days ago by Richard Haselgrove
There is an option to force the use of all GPUs.

I think BOINC uses all GPUs by default now. I haven't needed that option for a while.

No, it just uses the 'best' GPU(s).

If you have two identical (or near identical) GPUs, it could well use both of them automatically - there's a 'loose' element to the test which doesn't require a perfect match.

But if the GPUs differ in what is considered a significant feature, then only the 'best' (as defined in code) GPU is used.

And if only one GPU is active locally, then only one GPU will be reported to the server for display.
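For reference, that option is <use_all_gpus> in cc_config.xml, in the BOINC data directory:

   <cc_config>
      <options>
         <use_all_gpus>1</use_all_gpus>
      </options>
   </cc_config>

GPUs are only enumerated when the client starts up, so this one needs a full client restart - 'reread config files' isn't enough.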
10) Message boards : Problems and Bug Reports : Immediate timeout? Missing deadline? (Message 141453)
Posted 24 days ago by Richard Haselgrove
Today I got one of those too. The task is still running, but assuming the server won't accept the result, I think I'll abort it.

http://einstein.phys.uwm.edu/workunit.php?wuid=220348353

The problem task was h1_0378.00_S6GC1__S6BucketFU2UBb_32310395_1

Unfortunately, the host has contacted the server again since then, and picked up another task:

2015-06-11 06:56:18.1403 [PID=981] Request: [USER#xxxxx] [HOST#11711999] [IP xxx.xxx.xxx.150] client 7.4.23
2015-06-11 06:56:18.1415 [PID=981 ] [send] effective_ncpus 7 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-06-11 06:56:18.1415 [PID=981 ] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2015-06-11 06:56:18.1415 [PID=981 ] [send] Not using matchmaker scheduling; Not using EDF sim
2015-06-11 06:56:18.1415 [PID=981 ] [send] CPU: req 8723.15 sec, 0.00 instances; est delay 0.00
2015-06-11 06:56:18.1415 [PID=981 ] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2015-06-11 06:56:18.1415 [PID=981 ] [send] work_req_seconds: 8723.15 secs
2015-06-11 06:56:18.1415 [PID=981 ] [send] available disk 4.60 GB, work_buf_min 345600
2015-06-11 06:56:18.1415 [PID=981 ] [send] active_frac 0.999977 on_frac 0.639708 DCF 0.678278
2015-06-11 06:56:18.1443 [PID=981 ] [send] [HOST#11711999] not reliable; max_result_day 31
2015-06-11 06:56:18.1444 [PID=981 ] [send] set_trust: random choice for error rate 0.000010: yes
2015-06-11 06:56:18.1444 [PID=981 ] [mixed] sending non-locality work first (0.9847)
2015-06-11 06:56:18.1648 [PID=981 ] [version] Checking plan class 'FGRP4-SSE2'
2015-06-11 06:56:18.1678 [PID=981 ] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2015-06-11 06:56:18.1678 [PID=981 ] [version] plan class ok
2015-06-11 06:56:18.1678 [PID=981 ] [version] Best version of app hsgamma_FGRP4 is 1.06 ID 736 FGRP4-SSE2 (2.65 GFLOPS)
2015-06-11 06:56:18.1678 [PID=981 ] [send] [HOST#11711999] [WU#220632476 LATeah1056E_1136.0_118694_0.0] using delay bound 1209600 (opt: 1209600 pess: 1209600)
2015-06-11 06:56:18.1692 [PID=981 ] [debug] Sorted list of URLs follows [host timezone: UTC+7200]
2015-06-11 06:56:18.1692 [PID=981 ] [debug] zone=+03600 url=http://einstein2.aei.uni-hannover.de
2015-06-11 06:56:18.1692 [PID=981 ] [debug] zone=-18900 url=http://einstein-dl.syr.edu
2015-06-11 06:56:18.1692 [PID=981 ] [debug] zone=-21600 url=http://einstein-dl2.phys.uwm.edu
2015-06-11 06:56:18.1692 [PID=981 ] [debug] zone=-28800 url=http://einstein.ligo.caltech.edu
2015-06-11 06:56:18.1694 [PID=981 ] [send] [HOST#11711999] Sending app_version 736 hsgamma_FGRP4 7 106 FGRP4-SSE2; 2.65 GFLOPS
2015-06-11 06:56:18.1714 [PID=981 ] [send] est. duration for WU 220632476: unscaled 39655.32 scaled 42047.23
2015-06-11 06:56:18.1715 [PID=981 ] [HOST#11711999] Sending [RESULT#504393489 LATeah1056E_1136.0_118694_0.0_2] (est. dur. 42047.23 seconds, delay 1209600, deadline 1435215378)
2015-06-11 06:56:18.1731 [PID=981 ] [send] don't need more work
2015-06-11 06:56:18.1731 [PID=981 ] [mixed] sending locality work second
2015-06-11 06:56:18.1745 [PID=981 ] [send] don't need more work
2015-06-11 06:56:18.1745 [PID=981 ] [send] don't need more work
2015-06-11 06:56:18.1760 [PID=981 ] Sending reply to [HOST#11711999]: 1 results, delay req 60.00
2015-06-11 06:56:18.1770 [PID=981 ] Scheduler ran 0.040 seconds

It would be really interesting to catch and examine a server log for one of these immediate timeouts sometime, and try to work out what's going wrong. But you'd need to be quick about it.
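If anyone wants to try, the client-side half of the conversation can at least be captured by turning on a couple of log flags in cc_config.xml:

   <cc_config>
      <log_flags>
         <sched_op_debug>1</sched_op_debug>
         <work_fetch_debug>1</work_fetch_debug>
      </log_flags>
   </cc_config>

That writes each scheduler request and reply, and the work-fetch reasoning behind it, to the event log - enough to pin down the client's side of the story even after the server log has scrolled away.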

