Posts by Richard Haselgrove

1) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130743)
Posted 13 minutes ago by Richard Haselgrove
The BRP4G application should be intelligent enough to figure out what it is running on, and whether that piece of hardware is even capable of doing what it wants to do.

Yes, that seems to be exactly what it's doing.

Or not, not when you see [Einstein@Home] [coproc] ATI instance 1: confirming 0.500000 instance for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

That still implies it's running on GPU 1, which isn't capable of running the task.

No, that merely implies that BOINC thinks the task is running on GPU 1 (which it shouldn't, being excluded).

I wouldn't call BOINC a reliable witness as to what is actually happening behind the scenes, in this instance.

It might be interesting to see what exactly BOINC had directed the application to do, by examining init_data.xml from the slot directory - but I'm not sure even that would be definitive, because I suspect the application is capable of over-riding an impossible directive (using its own internal OpenCL capability check).
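
A minimal sketch of that check, in Python. The element names (gpu_type, gpu_device_num, gpu_opencl_dev_index, gpu_usage) and the app_init_data root are assumptions from memory, not confirmed by anything in this thread, and the sample below is a hypothetical cut-down file rather than a real one. Compare against an actual init_data.xml from the slot directory before trusting it.

```python
import xml.etree.ElementTree as ET

# Assumed element names; verify against a real init_data.xml.
GPU_FIELDS = ("gpu_type", "gpu_device_num", "gpu_opencl_dev_index", "gpu_usage")


def read_gpu_directive(xml_text):
    """Return whichever of the assumed GPU fields are present in the XML text."""
    root = ET.fromstring(xml_text)
    found = {}
    for name in GPU_FIELDS:
        node = root.find(".//" + name)  # search at any depth below the root
        if node is not None and node.text is not None:
            found[name] = node.text.strip()
    return found


# Hypothetical sample; a real file contains many more elements.
SAMPLE = """<app_init_data>
  <gpu_type>ATI</gpu_type>
  <gpu_device_num>1</gpu_device_num>
  <gpu_usage>0.500000</gpu_usage>
</app_init_data>"""

if __name__ == "__main__":
    for key, value in read_gpu_directive(SAMPLE).items():
        print(key, "=", value)
```

Even if the file names a device, that only records what BOINC asked for; the application's own stderr remains the better witness for what was actually used.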
2) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130741)
Posted 39 minutes ago by Richard Haselgrove
Stranger and stranger.

Would it be possible for you to download and run

http://boinc.berkeley.edu/dl/clinfo.zip

- from memory, I think the best way is to run it at an administrative command prompt and redirect the output to a text file.
3) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130737)
Posted 53 minutes ago by Richard Haselgrove
I think it's partially a problem with the back-end here. The server should know that a 256MB non-OpenCL card should not be sent any work, no matter what, since the GPU/memory/drivers/capabilities don't meet the minimum requirements.

The trouble is that the server doesn't allocate work at the device level, only at the host level. The host has a perfectly good GPU/memory/driver/capability resource available for use, so it's OK for the client to request - and for the server to allocate - work for the host in general.

It's only once the task is present on the host that the client has the subsidiary responsibility for allocating it to a particular device - and that's where it seems to be tripping over itself.

The BRP4G application should be intelligent enough to figure out what it is running on, and whether that piece of hardware is even capable of doing what it wants to do.

Yes, that seems to be exactly what it's doing.
4) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130735)
Posted 1 hour ago by Richard Haselgrove
I could be wrong, but the instance number (0 or 1) is IMHO just an index of the GPU tasks in execution state, not related to any device id.

The way I read the log output, there is no evidence that anything is running on the HD 3000. Instead two tasks, one from Milkyway and one from E@H, both claim and get half an AMD OpenCL card, here a Cypress, i.e. the dedicated GPU.


The added support for CPU OpenCL tasks deserves some attention on the project side, though, so this raises an interesting point indeed: Is the way that new BOINC clients enumerate OpenCL devices (now including CPUs if capable) still compatible with the way science apps enumerate OpenCL devices if they use older BOINC API versions (excluding CPUs), or do all projects now have to recompile their OpenCL apps with new BOINC APIs (when the current beta version is production ready)? And would a new BOINC API version then be compatible with old clients, because not all users will change to the new client instantaneously? (If not, we have a mess.)

HB

I think we have a mess.....

I run a host with multiple GPUs (and with the current recommended v7.2.42 client):

23/04/2014 11:16:51 | Einstein@Home | [coproc] intel_gpu instance 0: confirming 1.000000 instance for p2030.20131123.G181.28-03.56.S.b6s0g0.00000_2406_0
23/04/2014 11:16:51 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for A2ART4Ex04x92-GERARD_A2ART4E-4-14-RND7349_3
23/04/2014 11:16:51 | SETI@home | [coproc] NVIDIA instance 0: confirming 0.500000 instance for 13ja09af.8406.41335.438086664206.12.130_0
23/04/2014 11:16:51 | SETI@home | [coproc] NVIDIA instance 0: confirming 0.500000 instance for 14au09aa.28420.22976.438086664206.12.174_0

I'm quite certain that:

NVIDIA instance 0 (running two SETI tasks) equates to NV Device 0
NVIDIA instance 1 (running one GPUGrid task) equates to NV Device 1

and the intel_gpu has its own numbering sequence, with another instance/device 0...
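
That reading of the log can be sketched in Python: group the "confirming" lines by resource type and instance number, keeping each resource type's numbering separate. The regular expression is an assumption fitted to the four lines quoted above, not a documented BOINC log format, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern fitted to the quoted [coproc] lines; an assumption, not a spec.
CONFIRM = re.compile(
    r"\[coproc\] (\S+) instance (\d+): confirming ([\d.]+) instance for (\S+)"
)

LOG = """\
23/04/2014 11:16:51 | Einstein@Home | [coproc] intel_gpu instance 0: confirming 1.000000 instance for p2030.20131123.G181.28-03.56.S.b6s0g0.00000_2406_0
23/04/2014 11:16:51 | GPUGRID | [coproc] NVIDIA instance 1: confirming 1.000000 instance for A2ART4Ex04x92-GERARD_A2ART4E-4-14-RND7349_3
23/04/2014 11:16:51 | SETI@home | [coproc] NVIDIA instance 0: confirming 0.500000 instance for 13ja09af.8406.41335.438086664206.12.130_0
23/04/2014 11:16:51 | SETI@home | [coproc] NVIDIA instance 0: confirming 0.500000 instance for 14au09aa.28420.22976.438086664206.12.174_0
"""


def tasks_by_instance(log_text):
    """Map (resource type, instance number) to a list of (task, fraction)."""
    groups = defaultdict(list)
    for match in CONFIRM.finditer(log_text):
        resource, instance, fraction, task = match.groups()
        groups[(resource, int(instance))].append((task, float(fraction)))
    return dict(groups)


if __name__ == "__main__":
    for (resource, instance), tasks in sorted(tasks_by_instance(LOG).items()):
        total = sum(fraction for _, fraction in tasks)
        print(resource, instance, "tasks:", len(tasks), "total:", total)
```

Keying on the pair (resource, instance) rather than on the instance number alone is what keeps intel_gpu instance 0 apart from NVIDIA instance 0.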

In general, it's working OK at the moment, and I don't think there's an immediate need to re-compile against a new API. But Darell's case, with three tasks running on the Cypress card, shows something's not quite right. I don't know if the problem is the non-OpenCL capable HD 3000, or the additional OpenCL CPU capability - Oliver might be interested in having a look.
5) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130733)
Posted 2 hours ago by Richard Haselgrove
OK, I see

16-Apr-2014 17:14:34 [Milkyway@Home] [coproc] Assigning 0.500000 of ATI free instance 0 to de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:14:34 [SETI@home] [coproc] Assigning 0.500000 of ATI instance 0 to ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:14:34 [Einstein@Home] [coproc] Assigning 0.500000 of ATI free instance 1 to p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

16-Apr-2014 17:15:34 [Milkyway@Home] [coproc] ATI instance 0; 0.500000 pending for de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:15:34 [SETI@home] [coproc] ATI instance 0; 0.500000 pending for ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:15:34 [Einstein@Home] [coproc] ATI instance 0; 0.500000 pending for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

16-Apr-2014 17:15:34 [Milkyway@Home] [coproc] ATI instance 0: confirming 0.500000 instance for de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:15:34 [SETI@home] [coproc] ATI instance 0: confirming 0.500000 instance for ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:15:34 [Einstein@Home] [coproc] ATI instance 1: confirming 0.500000 instance for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0

There's some useful stuff in SETI task stderr:

OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
Number of OpenCL platforms: 1
Name: Cypress

etc. Milkyway says

Using device 0 on platform 0
Found 1 CL device
Device 'Cypress' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)

Einstein doesn't give as much OpenCL diagnostic information, but in result 432253007 I see:

boinc_get_opencl_ids returned [0000000002DDA950 , 000007FEE2D94A08]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "Cypress" by: Advanced Micro Devices, Inc.

- I presume the second opencl_id will be the CPU driver.

All in all, I think it demonstrates that BOINC's management of the OpenCL platform still has some deficiencies - especially the jumping around between device numbers in the Assigning/pending/confirming stages.
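
The jumping around can be picked out mechanically. The Python sketch below records, for each task, the instance number reported at each stage and lists the tasks whose number changes. The three patterns are assumptions fitted to the log lines quoted in this post; other client versions may word these messages differently.

```python
import re

# One pattern per stage, fitted to the quoted lines (assumed, not documented).
STAGES = [
    ("assigning", re.compile(
        r"\[coproc\] Assigning [\d.]+ of \S+ (?:free )?instance (\d+) to (\S+)")),
    ("pending", re.compile(
        r"\[coproc\] \S+ instance (\d+); [\d.]+ pending for (\S+)")),
    ("confirming", re.compile(
        r"\[coproc\] \S+ instance (\d+): confirming [\d.]+ instance for (\S+)")),
]

LOG = """\
16-Apr-2014 17:14:34 [Milkyway@Home] [coproc] Assigning 0.500000 of ATI free instance 0 to de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:14:34 [SETI@home] [coproc] Assigning 0.500000 of ATI instance 0 to ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:14:34 [Einstein@Home] [coproc] Assigning 0.500000 of ATI free instance 1 to p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0
16-Apr-2014 17:15:34 [Milkyway@Home] [coproc] ATI instance 0; 0.500000 pending for de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:15:34 [SETI@home] [coproc] ATI instance 0; 0.500000 pending for ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:15:34 [Einstein@Home] [coproc] ATI instance 0; 0.500000 pending for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0
16-Apr-2014 17:15:34 [Milkyway@Home] [coproc] ATI instance 0: confirming 0.500000 instance for de_modfit_15_3s_130_wrap_1_1396965303_1897902_1
16-Apr-2014 17:15:34 [SETI@home] [coproc] ATI instance 0: confirming 0.500000 instance for ap_09dc08ad_B0_P1_00257_20140416_15148.wu_1
16-Apr-2014 17:15:34 [Einstein@Home] [coproc] ATI instance 1: confirming 0.500000 instance for p2030.20131122.G177.14+00.50.S.b3s0g0.00000_3792_0
"""


def instance_history(log_text):
    """Map each task name to its (stage, instance) pairs, in log order."""
    history = {}
    for line in log_text.splitlines():
        for stage, pattern in STAGES:
            match = pattern.search(line)
            if match:
                task = match.group(2)
                history.setdefault(task, []).append((stage, int(match.group(1))))
                break
    return history


def unstable_tasks(history):
    """Keep only the tasks whose instance number differs between stages."""
    return {
        task: steps
        for task, steps in history.items()
        if len({instance for _, instance in steps}) > 1
    }


if __name__ == "__main__":
    for task, steps in unstable_tasks(instance_history(LOG)).items():
        print(task, steps)
```

On the lines quoted above, only the Einstein task changes instance between stages (1, then 0, then 1); the Milkyway and SETI tasks stay on instance 0 throughout.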

My suspicion is that the project science applications are all doing their own OpenCL detection, and all are ending up on the same 'Cypress' card despite what BOINC may or may not be telling them to do.

You are running the latest alpha test version of BOINC (v7.3.15), and BOINC's OpenCL detection for CPUs was only added in this test cycle - I don't think any project has actually deployed an OpenCL CPU application we could use to test it yet. In the meantime, I think it would be helpful to report at least the most obvious bugs (BOINC over-committing the Cypress card with 1.5 instances) to the boinc_alpha bug reporting mailing list.
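
The 1.5 figure is simple arithmetic, sketched here in Python. The rows pair each project's confirmed fraction (0.5 each, from the log) with the device its stderr names (Cypress in all three cases). The table layout, the function names and the capacity of 1.0 per device are illustrative assumptions.

```python
from collections import defaultdict

# (project, fraction confirmed by BOINC, device named in the task's stderr)
CONFIRMED = [
    ("Milkyway@Home", 0.5, "Cypress"),
    ("SETI@home", 0.5, "Cypress"),
    ("Einstein@Home", 0.5, "Cypress"),
]


def load_per_device(rows):
    """Sum the confirmed fractions per device actually used."""
    load = defaultdict(float)
    for _project, fraction, device in rows:
        load[device] += fraction
    return dict(load)


def overcommitted(load, capacity=1.0):
    """Return the devices whose summed fractions exceed the assumed capacity."""
    return {device: total for device, total in load.items() if total > capacity}


if __name__ == "__main__":
    print(overcommitted(load_per_device(CONFIRMED)))
```

BOINC's own bookkeeping splits these as 1.0 on instance 0 and 0.5 on instance 1, so the over-commitment only shows up once the totals are taken per device actually used rather than per instance number.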
6) Message boards : Problems and Bug Reports : Can anyone explain this result? (Message 130716)
Posted 1 day ago by Richard Haselgrove
What makes you say that the task was computed on an "integrated AMD HD3000"?

The task output (stderr) says 'Using OpenCL device "Cypress"', and the computer description says it's fitted with 'ATI Radeon HD 5800/5900 series (Cypress/Hemlock)' - and the 'AMD Athlon(tm) II X2 250 Processor' didn't have an integrated APU when it was launched in 2009!

Computation on the discrete HD 5xxx card sounds more plausible.
7) Message boards : News : Gravitational Wave search GPU App version (Message 130473)
Posted 11 days ago by Richard Haselgrove
OK, those API- exports are probably not relevant, then.

Maybe these are more significant, if you can recognise any of them?

[D? ] DCOMP.DLL

Import  Ordinal        Hint  Function                  Entry Point
------  -------------  ----  ------------------------  -----------
[OE ]   1017 (0x03F9)  N/A   N/A                       Not Bound
[CE ]   N/A            N/A   DCompositionCreateDevice  Not Bound

[D? ] GPSVC.DLL

Import  Ordinal  Hint  Function                               Entry Point
------  -------  ----  -------------------------------------  -----------
[CE ]   N/A      N/A   ProcessGroupPolicyCompletedExInternal  Not Bound
[CE ]   N/A      N/A   RsopAccessCheckByTypeInternal          Not Bound
[CE ]   N/A      N/A   RsopFileAccessCheckInternal            Not Bound
[CE ]   N/A      N/A   RsopSetPolicySettingStatusInternal     Not Bound
[CE ]   N/A      N/A   ProcessGroupPolicyCompletedInternal    Not Bound
[CE ]   N/A      N/A   RsopResetPolicySettingStatusInternal   Not Bound

[D? ] IESHIMS.DLL

Import  Ordinal  Hint  Function                              Entry Point
------  -------  ----  ------------------------------------  -----------
[CE ]   N/A      N/A   IEShims_Initialize                    Not Bound
[CE ]   N/A      N/A   IEShims_InDllMainContext              Not Bound
[CE ]   N/A      N/A   IEShims_GetOriginatingThreadId        Not Bound
[CE ]   N/A      N/A   IEShims_CreateWindowEx                Not Bound
[CE ]   N/A      N/A   IEShims_SetRedirectRegistryForThread  Not Bound
8) Message boards : News : Gravitational Wave search GPU App version (Message 130471)
Posted 11 days ago by Richard Haselgrove
Googling suggests the problem might be related to missing Microsoft Visual Studio runtime redistributable packages. Are you using either VS 2008 or VS 2010 - if so, which?

(tasks are erroring, as Holmis described, but I'll save some for testing later)
9) Message boards : News : Gravitational Wave search GPU App version (Message 130469)
Posted 11 days ago by Richard Haselgrove
Got a similar but rather shorter list of missing files with the 32-bit version of dependency walker (bitness matters, with that tool).



Host is host 5744895 - 64-bit Windows 7 with NV GTX 670, driver 335.23 (about 4 weeks ago).
10) Message boards : News : Gravitational Wave search GPU App version (Message 130465)
Posted 11 days ago by Richard Haselgrove
Is the Windows version running successfully anywhere?

BM

I'll test too.



This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2014 Bruce Allen