Scheduler not assigning GPU tasks

Robert Meckley
Joined: 8 Jan 11
Posts: 14
Credit: 1261163475
RAC: 678
Topic 197767

I have not received any GPU tasks for about a week now, neither BRP4 nor BRP5. I had previously been running multiple BRP5 tasks simultaneously on an AMD Radeon 7970. Then, suddenly, the Scheduler assigned only CPU tasks. This has been the case for a week now, despite multiple revisions to local preferences as a means to coax the Scheduler into assigning GPU tasks. The Event Log shows the same result each time GPU tasks are requested: '0 tasks sent, check the log at HTTP://einstein5 etc. etc.' (My reading of this log entry does not provide any useful information as to why I'm not receiving GPU tasks.) I note that the Server Status page shows the Work Generators for both BRP4 and BRP5 tasks are disabled, but I also note that other hosts are currently receiving BRP5 tasks without any problem. I can only conclude that the information on the Server Status page is not to be trusted. I am completely baffled by this and want to fill my HD7970 with tasks once again. Can anyone explain this? What am I missing?

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0


Your Linux host with the HD7970 currently lacks OpenCL support for its GPU. This means the scheduler won't send new work or resend any lost work, and any AMD GPU work the host still has can't be crunched either:

http://einstein5.aei.uni-hannover.de/EinsteinAtHome/host_sched_logs/11680/11680744

2014-10-27 17:52:36.6847 [PID=29017] [version] Checking plan class 'BRP5-opencl-ati'
2014-10-27 17:52:36.6847 [PID=29017] [version] parsed project prefs setting 'gpu_util_brp': 0.125000
2014-10-27 17:52:36.6847 [PID=29017] [version] ATI device (or driver) doesn't support OpenCL
2014-10-27 17:52:36.6847 [PID=29017] [version] no app version available: APP#23 (einsteinbinary_BRP5) PLATFORM#7 (x86_64-pc-linux-gnu) min_version 0
2014-10-27 17:52:36.6847 [PID=29017] [version] no app version available: APP#23 (einsteinbinary_BRP5) PLATFORM#1 (i686-pc-linux-gnu) min_version 0
2014-10-27 17:52:36.6848 [PID=29017] [CRITICAL] [HOST#11680744] can't resend [RESULT#461310241]: no app version for einsteinbinary_BRP5
2014-10-27 17:52:36.6852 [PID=29017] [CRITICAL] [HOST#11680744] can't resend [RESULT#461310559]: no app version for einsteinbinary_BRP5

Claggy
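The Event Log only says '0 tasks sent'; as Claggy's excerpt shows, the actual reason is in the per-host scheduler log. As a rough reading aid, here is a minimal Python sketch that scans a saved copy of that log text for the three kinds of messages quoted above. The function name `summarize_sched_log` and the patterns are assumptions of mine, matched only against the lines in this thread; the real scheduler can log many other reasons that this sketch will not recognise.

```python
import re
import sys

# Patterns copied from the log lines quoted in this thread (assumption:
# other scheduler messages exist and are simply ignored here).
NO_OPENCL = re.compile(r"device \(or driver\) doesn't support OpenCL")
NO_APP_VERSION = re.compile(r"no app version available: APP#\d+ \(([^)]+)\)")
CANT_RESEND = re.compile(r"can't resend \[RESULT#(\d+)\]")


def summarize_sched_log(text):
    """Summarise why a scheduler request returned no GPU work."""
    summary = {
        "no_opencl": False,              # driver/device reported without OpenCL
        "apps_without_version": set(),   # apps with no usable app version
        "unresent_results": [],          # lost results the server could not resend
    }
    for line in text.splitlines():
        if NO_OPENCL.search(line):
            summary["no_opencl"] = True
        m = NO_APP_VERSION.search(line)
        if m:
            summary["apps_without_version"].add(m.group(1))
        m = CANT_RESEND.search(line)
        if m:
            summary["unresent_results"].append(int(m.group(1)))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sched_summary.py saved_sched_log.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(summarize_sched_log(fh.read()))
```

Run against the excerpt above, it would flag the missing OpenCL support, report `einsteinbinary_BRP5` as having no app version, and list the two results that could not be resent, which is the same conclusion Claggy drew by eye.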

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0


Quote:
I note that the Server Status page shows the Work Generators for both BRP4 and BRP5 tasks are disabled, but I also note that other Hosts are currently receiving BRP5 tasks without any problem. I can only conclude that the information on the Server Status page is not to be trusted.


It's normal for the work generators to show as offline on the server status page. The page only shows a snapshot of the status at the time given at the top, and it updates every 5 minutes. I believe that generating new work is quite fast, so the buffer of tasks is quickly filled; the generator then goes to sleep until the buffer drops to a lower limit, at which point it wakes up again. Together, this makes it quite rare to see the work generators as "Running". I don't think I've ever seen that.

It's better to look at the table on the right at the line "Task to send" as that shows how many tasks there were ready to be sent out at the time the page was generated.
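Holmis's explanation describes a generator that refills its buffer quickly and then sleeps until a lower limit is reached. The toy Python simulation below illustrates why a periodic snapshot would rarely catch such a generator "Running". It is only a sketch of that idea: the function name, the low/high limits and the per-interval rates are all hypothetical and are not Einstein@Home's real values or code.

```python
def simulate_generator(steps, low_water, high_water, demand, refill):
    """Toy model of a work generator with low/high water marks.

    Each step stands for one status-page interval: the scheduler hands out
    `demand` tasks, and a running generator adds up to `refill` tasks.
    Returns the final buffer size and the generator state a snapshot
    taken at the end of each step would show (True = running).
    """
    buffer = high_water
    running = False
    snapshots = []
    for _ in range(steps):
        buffer = max(0, buffer - demand)          # tasks sent out to hosts
        if not running and buffer < low_water:
            running = True                        # lower limit reached: wake up
        if running:
            buffer = min(high_water, buffer + refill)
            if buffer >= high_water:
                running = False                   # buffer full again: sleep
        snapshots.append(running)
    return buffer, snapshots


if __name__ == "__main__":
    # Hypothetical fast generator: refills the whole buffer within one interval.
    _, fast = simulate_generator(20, 100, 1000, 150, 2000)
    # Hypothetical slow generator: needs several intervals to catch up.
    _, slow = simulate_generator(20, 100, 1000, 150, 200)
    print("fast generator seen running:", sum(fast), "of", len(fast))
    print("slow generator seen running:", sum(slow), "of", len(slow))
```

With the fast settings the generator wakes, refills and sleeps again within a single interval, so every end-of-interval snapshot shows it as not running even though tasks keep flowing. This is why the "Task to send" count is the more useful figure to watch.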

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110040007302
RAC: 22407498


Whilst Claggy and Holmis have addressed most of what you are "missing", there's a bit more for you to consider regarding the very first of Claggy's highlighted bits. I also do GPU crunching on AMD GPUs so I do see the occasional situation where the GPU seems to go 'missing in action'.

I don't know if you've tried a simple reboot or not. Unix-based OSes can run for long periods without rebooting, but I've noticed occasional situations where the GPU stops crunching whilst CPU tasks continue unhindered. This happens quite infrequently, but when it does, it's usually an AMD GPU.

Most of the time, a reboot fixes this but sometimes, after the reboot, BOINC can't find the GPU and all GPU tasks in the cache of work are labelled "GPU missing ...". This seems to be the situation that is affecting you. A reinstall of the fglrx driver and the OpenCL libs seems to fix this. I think I've had to do this reinstall about twice in the last 6-12 months in order to get BOINC to 'see' the GPU again. I have about 25 machines with AMD GPUs.

Cheers,
Gary.

Robert Meckley
Joined: 8 Jan 11
Posts: 14
Credit: 1261163475
RAC: 678


Thank you Claggy, Holmis and Gary Roberts for your information and insights. It turns out it was indeed a driver problem. I've reverted to a previous driver, and now it's business as usual. I had noted the scheduler log entry indicating a lack of OpenCL support for the updated driver I was trying to use, but I really didn't trust it: my ATI configuration utility clearly indicated that the updated driver supported OpenCL version 1.2. Apparently the updated driver was either not correctly reported to the Scheduler, or the Scheduler did not recognize it as one supporting OpenCL. I should add that I also installed an earlier version of the BOINC client and daemon, the one I had been using with the previous driver. In any event, the combination of the previous driver and the previous BOINC client did it for me. Maybe all this shows is: if it ain't broke, don't fix it. And Holmis, I appreciate your explanation of the Server Status page; I would never have been able to interpret that information correctly without it. Now I'm at peace with the information on the Server Status page as well as in the Scheduler log. Once again, I am truly awed by the knowledge of my fellow participants and am grateful for your help.
