WU's stop in mid-computation and others start

JoeB
Joined: 24 Feb 05
Posts: 124
Credit: 85265075
RAC: 10646
Topic 197523

Hi, here's a problem. On one of my crunchers (ID: 5872621) BOINC will start a WU, run it for a while, then leave it partially finished to work on something due later. This is on a Linux machine, but I think I've seen this on Windows machines too.
Here are two examples of WUs from this machine from yesterday:

1. E@H Gravitational ... h1_0919.45_S6Directed_S6CasAf40a_919.75Hz_975_1 ... 50% (done) Due Sat 12 Apr ... Waiting to run

2. E@H Gravitational ... h1_0919.50_S6Directed_S6CasAf40a_919.8Hz_1021_2 ... 16.666% (done) Due Wed 16 Apr 2014 ... Running

Why is #1 above not running while #2 is? #1 is due first! How do I make BOINC run #1 first? There are other WUs due on the 13th and 15th that are not running either.

I am only running gravitational wave WU's on this machine so allocation between projects should not be the source of the problem.

Thanks
JoeB

Joe B

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110007492639
RAC: 24556224

Quote:
Hi, here's a problem. On one of my crunchers (ID: 5872621) BOINC will start a WU, run it for a while, then leave it partially finished to work on something due later. This is on a Linux machine, but I think I've seen this on Windows machines too.


It's not really a problem but rather the way some versions of BOINC react to the sudden onset of high priority (panic) mode. My guess is that a recent task has taken a lot longer than the estimate and so the estimates for all unstarted tasks have blown out to the point that BOINC thinks there is a potential problem and goes into panic mode. This is a sure sign that your work cache setting is really too large for the capabilities of the host.

What BOINC does seems to be crazy but apparently there is a cunning plan behind it all that improves the chances of getting the maximum number of tasks in before the deadline. I don't like this behaviour either, and you can get rid of it immediately by temporarily suspending a couple of the newest tasks in the cache until BOINC drops out of panic mode and resumes crunching normally. Once that happens, you can suspend the one you don't want running and the one you do should resume immediately.

You have to remember to resume all suspended tasks at some point, hopefully when the estimates have reduced a bit and the panic is over. The long term fix is to reduce your work cache setting to a more reasonable size.

Cheers,
Gary.
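
For anyone wondering why a single slow task can tip a host into panic mode, here is a rough sketch in Python. It is not BOINC's actual scheduler, which is more involved; it only assumes that the cached tasks are run in earliest-deadline-first order and that every runtime estimate is multiplied by one correction factor. The names Task, tasks_missing_deadline and estimate_scale, and all the numbers, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated CPU time still needed
    deadline_hours: float   # hours from now until the deadline


def tasks_missing_deadline(tasks, cores=1, estimate_scale=1.0):
    """Simulate earliest-deadline-first execution (a simplification,
    not BOINC's real logic) and return the names of tasks that would
    finish after their deadline."""
    free_at = [0.0] * cores  # time at which each core becomes free
    late = []
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        # Put the task on whichever core frees up first.
        core = min(range(cores), key=lambda i: free_at[i])
        finish = free_at[core] + task.remaining_hours * estimate_scale
        free_at[core] = finish
        if finish > task.deadline_hours:
            late.append(task.name)
    return late


# Hypothetical cache: three tasks, each estimated at 10 hours.
cache = [
    Task("task_a", 10.0, 24.0),
    Task("task_b", 10.0, 48.0),
    Task("task_c", 10.0, 55.0),
]

print(tasks_missing_deadline(cache))                      # []
print(tasks_missing_deadline(cache, estimate_scale=2.0))  # ['task_c']
```

With the original estimates everything fits. Once the estimates double (as they might after one task runs far over its estimate), the newest task is predicted to miss its deadline. A smaller cache, or temporarily suspending the newest tasks, removes the predicted miss.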

JoeB
Joined: 24 Feb 05
Posts: 124
Credit: 85265075
RAC: 10646

Thanks, I will try suspending some of the tasks. I have already reduced the cache size.
Thanks again
JoeB

Joe B

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059384931
RAC: 1380056

Some older versions of BOINC did really wildly inappropriate suspension of task after task of the same recently started type because of a bug (as I construed it) in panic mode scheduling. I personally saw cases where many dozens of tasks were simultaneously in a suspended state on a single host because of this.

Sadly I don't recall how long ago they patched this up, but I don't see it on my hosts (all of which are currently 7.0.25 or newer). I do occasionally see a very small number of suspends, but never a severe breakout now.

As the host you mention here has the oldest BOINC version (6.10.58) among your fleet, it could be you will like the behavior better if you download and install a modern BOINC version on it. This generally does not require any special measures such as running your work queue down to zero, nor even stopping BOINC. In most cases just downloading the new installer and launching it will start a successful over-install.
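
If you want to check a whole fleet against a reference version, a small Python sketch like the one below would do. It assumes plain dotted numeric version strings such as the two mentioned above; parse_version is a helper name invented for this example, not part of any BOINC tool.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.10.58' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


installed = "6.10.58"  # version reported for the host in question
reference = "7.0.25"   # oldest version among the hosts that behave well

# Tuples compare field by field, so 10 correctly sorts after 9.
print(parse_version(installed) < parse_version(reference))  # True
```

Comparing the parsed tuples rather than the raw strings matters for cases like "6.10.58" against "6.9.0", where a plain string comparison gets the order wrong.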

JoeB
Joined: 24 Feb 05
Posts: 124
Credit: 85265075
RAC: 10646

Gary, I did as you said and it is now well behaved.

Archae86, when I run apt-get install boinc-manager, it reports that I already have the latest version.

Thank you all

JoeB

Joe B

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

Quote:

Gary, I did as you said and it is now well behaved.

Archae86, when I run apt-get install boinc-manager, it reports that I already have the latest version.

Thank you all

JoeB


Which version is in the repository depends on the distro's release. You might need to go into its software centre, or use Synaptic to select a later backport.

Which distro, and which version?

Claggy

JoeB
Joined: 24 Feb 05
Posts: 124
Credit: 85265075
RAC: 10646

Hi Claggy,
I have Debian 6.0.9 (squeeze). I know they are up to wheezy, but now that squeeze is working well, why should I update/upgrade? A good reason would be that it runs WUs faster, but I don't get the impression that wheezy would do that.

Tx,
JoeB

Joe B

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

These are the different packages for Debian:

https://packages.debian.org/search?searchon=sourcenames&keywords=boinc

Claggy
