All Einstein cpu tasks stuck at 98%

Bat
Joined: 1 Nov 14
Posts: 2
Credit: 15357867
RAC: 0
Topic 197778

All my Einstein CPU tasks are stuck at 98%. GPU tasks are fine. I have tried changing the CPU preferences, suspending all other tasks, and rebooting the PC. I only installed BOINC yesterday, so it should be up to date.

Can anyone help?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Hi and welcome to Einstein@home!

It's quite common for tasks to stay at 98 or 99% done for quite some time before actually finishing.
The few Gamma-ray pulsar search #4 tasks that your computer has finished took about 12-12.5 hours, so let the tasks run for at least that amount of time before worrying.

So there's probably no problem, just a bit of patience needed.

Bat
Joined: 1 Nov 14
Posts: 2
Credit: 15357867
RAC: 0

You were right - they have all completed now. Thanks a lot!

driva
Joined: 16 Dec 11
Posts: 1
Credit: 270214
RAC: 0

I have the same problem, but if I restart the BOINC client, the elapsed time resets and the remaining time shows 10 minutes, and only for the WUs stuck at 98%.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109975326959
RAC: 29586044

It's not a 'problem', it's just the way things are. At around 98% a variable length post-processing stage is entered during which no checkpoints are saved. The fact that progress remains at the same value doesn't mean that nothing is happening or that something is 'stuck'.

If you stop BOINC during this time, you will lose any post-processing done since the last checkpoint was written. This stage takes a variable amount of time which cannot be predicted so you just need to be patient and let it complete. It could take up to an hour or more on slower machines. So the message is not to stop BOINC during this time if at all possible.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109975326959
RAC: 29586044

I recently received the following PM from a new volunteer (BJM) who was prevented by the 'minimum credit requirement' from posting to this thread. I thought it would be best to copy the message and reply to it here.

Quote:

Subject: re: BRPS Areceibo past 99% 12 hrs. ago is this ok?

it passed 99.999% 8 hrs. ago; how much longer before it uploads? Will keep it going but it's over 24 hrs. total now & just wondering what's taking so long based on the following posts:

In http://einsteinathome.org/node/197778&nowrap=true#134737
you said:

Quote:

At around 98% a variable length post-processing stage is entered during which no checkpoints are saved. The fact that progress remains at the same value doesn't mean that nothing is happening or that something is 'stuck'.

... you just need to be patient and let it complete. It could take up to an hour or more on slower machines. So the message is not to stop BOINC during this time if at all possible.

Holmis said:

Quote:
It's quite common for tasks to stay at 98 or 99% done for quite some time before actually finishing. The few Gamma-ray pulsar search #4 tasks that your computer has finished took about 12-12.5 hours so let the tasks run for at least that amount of time before worrying. So there's probably no problem just a bit of patience needed.

Would have posted to the thread, but not allowed due to error message:
"Unable to handle request: In order to create a new thread in Problems and Bug Reports you must have a certain amount of credit. This is to prevent and protect against abuse of the system."

Not sure what "credit" means in this case ... aren't 3000 BOINC credits in just a few days enough to indicate some common sense? I remember Archie, Veronica, WAIS & Mosaic & wouldn't flame anyone, just need advice & help understanding the process here.

TIA!

On checking BJM's details, this account joined about 30 hours ago and the computer has just 2 tasks. So far neither one has successfully completed, which explains the inability to post. Not sure what is meant by the "3000 BOINC credits in just a few days" comment though. Perhaps these credits were acquired at a different project, which won't help with posting here.

I don't use Intel GPUs at all which is why I'm keen to open this up to others who do. There is one failed Intel GPU BRP4 task, which I'm guessing is probably due to a 'bad' Intel driver version, but the subject of the PM seems to imply that this task is stuck at 99+% rather than having completely failed.

The other task is an FGRP4 CPU task of the type we were discussing earlier in this thread, so I'm assuming it's really this one that BJM is seeing as "stuck". I'm not sure because you never see properly performing FGRP4 tasks showing 99.xxx% complete. They get to somewhere in the 96-98% region and then sit there for a while before jumping to 100% and immediately uploading when the final 'variable time' stage has been completed.

If BJM is really seeing an FGRP4 task stuck at 99.999%, I can think of a possible explanation. Modern BOINC clients simulate 'progress' until the first checkpoint is written by artificially incrementing (every second) the %done counter. Once the first checkpoint is written, the per second simulated incrementing stops and the true progress is shown by a jump in the % value at every subsequent checkpoint. If something is wrong and the first checkpoint is never achieved, the simulated value will keep incrementing (per second) until it gets to 99.999%. I have actually seen this happen in the long gone FGRP3 days :-).
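The behaviour described above can be sketched as a toy model. This is not BOINC's actual code: the function name, the per-second increment (`fake_rate`) and the example checkpoint times are all made up for illustration, and only the 99.999% ceiling comes from the description in this post.

```python
def displayed_progress(elapsed_s, checkpoints, fake_rate=0.001, cap=99.999):
    """Return the % done a client might display after elapsed_s seconds.

    checkpoints: list of (time_s, true_percent) pairs, sorted by time.
    fake_rate:   hypothetical simulated increment in percent per second.
    """
    # True progress values of all checkpoints written so far.
    reached = [pct for t, pct in checkpoints if t <= elapsed_s]
    if reached:
        # After the first checkpoint, show the latest real value.
        return reached[-1]
    # No checkpoint yet: keep counting up artificially, never past the cap.
    return min(cap, elapsed_s * fake_rate)


healthy = [(600, 5.0), (1200, 10.0)]       # hypothetical checkpoint times
print(displayed_progress(700, healthy))    # 5.0, the real value
print(displayed_progress(200_000, []))     # 99.999, never checkpointed
```

In this model a task that never writes a checkpoint ends up sitting at 99.999% indefinitely, which is the symptom BJM describes.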

So, my advice to BJM would be to read through the Support for (integrated) Intel GPUs thread and make sure a 'good' Intel driver is being used. Then, on restarting BOINC, perhaps both BRP4 and FGRP4 tasks might behave properly.

If anyone has other suggestions, please feel free to contribute.

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I agree with Gary's explanation and conclusion; restarting BOINC to see if things get better seems like a good plan.
As for a working Intel GPU driver, I'm using 10.18.10.3945 at the moment and it's working OK on my Intel HD 4000 GPU. As BJM has the same GPU (although a different CPU), I see no apparent reason why it shouldn't work. The driver supports both Windows 7 and 8.1, so it should be good. It can be downloaded from https://downloadcenter.intel.com/SearchResult.aspx?lang=eng&ProdId=3712 and is intended for a laptop; if that's not correct, go back to the download center and make the right choices for the system.

As for not being able to post, I think you need at least 1 RAC to be allowed to post. At the same time, I remember that you used to be able to post in "Problems and Bug Reports" without having earned credit, just to cover cases like this where a problem arises before one is able to complete a single task. I guess that's changed...

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7058004931
RAC: 1604499

Quote:
I don't use Intel GPUs at all which is why I'm keen to open this up to others who do.


I only ran my sole Einstein-capable Intel GPU briefly, but do remember that work on it had unusual dependence on the CPU support task compared to my nvidia work.

While very, very little CPU time was charged to the CPU support task, progress of the Intel GPU Einstein job was heavily dependent on spare CPU capacity.

So the usual advice around here to use the General preference settings to reduce BOINC CPU task usage to allow better GPU performance seemed in my case to ascend from just a preference to get moderately better performance to a necessity not to have multi-day stalls.

However, I am not remotely current, as when I simplified the BOINC configuration on that machine for lower summer power consumption and for largely unattended operation (it is my wife's daily driver, and I did not want to need to be tampering with it), I gave up on running the Intel GPU for Einstein work. I hope someone with broader or more recent experience with the Intel GPUs at Einstein may comment on this point.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I forgot to comment on the CPU support needed in my previous post.

Archae86's observation is still true: the Intel GPU app needs a full CPU core to support it, or the run time of the tasks really blows up, on the scale of hours compared to about 15 min per task. Checking in Windows Task Manager, the currently running Intel GPU task has an elapsed time of 10 min but has only used 21 seconds of CPU time; mostly the process reports 0% CPU usage with the occasional 1-2% blip.
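
For scale, the Task Manager figures above work out to an average CPU utilisation of only a few percent of one core. A quick sketch of that arithmetic (the function name is just for illustration):

```python
def cpu_utilisation(cpu_seconds, elapsed_seconds):
    """Average fraction of one core used over the elapsed wall-clock time."""
    return cpu_seconds / elapsed_seconds


# 21 s of CPU time charged over 10 min of elapsed time
print(f"{cpu_utilisation(21, 10 * 60):.1%}")  # prints 3.5%
```

So the support task is charged very little CPU time even though the GPU job stalls badly when no core is left free for it.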

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770663401
RAC: 908947

Quote:

I forgot to comment on the CPU support needed in my previous post.

Archae86's observation is still true: the Intel GPU app needs a full CPU core to support it, or the run time of the tasks really blows up, on the scale of hours compared to about 15 min per task. Checking in Windows Task Manager, the currently running Intel GPU task has an elapsed time of 10 min but has only used 21 seconds of CPU time; mostly the process reports 0% CPU usage with the occasional 1-2% blip.


Also confirmed, in response to Archae86's post. Tried my Haswell i5 HD4600 (driver version 10.18.10.3621) with all four cores loaded: runtime increased ~7x (on the basis of taking seven minutes to reach 10%, against the normal 11 minutes for the full task).
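
The slowdown estimate above can be reproduced with a short calculation. This sketch assumes progress is roughly linear over the task, which the post does not state; the function name is hypothetical.

```python
def slowdown_factor(minutes_elapsed, fraction_done, normal_minutes):
    """Projected full run time divided by the normal run time.

    Assumes progress is linear, so full time = elapsed / fraction done.
    """
    projected_minutes = minutes_elapsed / fraction_done
    return projected_minutes / normal_minutes


# 7 minutes to reach 10%, against a normal 11 minutes for the full task
print(f"{slowdown_factor(7, 0.10, 11):.1f}x")  # prints 6.4x
```

A projected 70 minutes against the normal 11 gives a factor of about 6.4, in line with the ~7x quoted.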
