Stuck in endless loop

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0
Topic 197819

My BOINC client is running a Gamma-ray pulsar task (version 1.04), and it now seems to be stuck in an endless loop. Every time the

remaining time reaches 0:30 and the
elapsed time reaches 11:39,

it jumps back to

remaining time 0:31,
elapsed time 11:37.

This behaviour repeats over and over again. Another pulsar search task running in parallel finished more than two hours ago.

Thomas

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

Do you have 'Leave tasks in memory while suspended?' and 'Suspend work while computer is in use?' set to 'Yes' or 'No' in your computing preferences?

If you've got 'Leave tasks in memory while suspended?' set to 'No', then every time the task is interrupted the app will exit fully and restart from its last checkpoint the next time it runs.

Einstein apps often checkpoint infrequently, so having 'Leave tasks in memory while suspended?' set to 'Yes' is essential.
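If you'd rather set this locally, my understanding is that BOINC Manager stores its local preferences in `global_prefs_override.xml` in the BOINC data directory, and that the two options above correspond to the `leave_apps_in_memory` and `run_if_user_active` tags. Treat the fragment below as a sketch: double-check the tag names against the BOINC preferences documentation before relying on it, and note that setting the same options in the Manager's computing preferences dialog is the safer route.

```xml
<!-- global_prefs_override.xml (assumed location: the BOINC data directory).
     Local preferences here override the web-based ones. -->
<global_preferences>
    <!-- 'Leave tasks in memory while suspended?' = Yes -->
    <leave_apps_in_memory>1</leave_apps_in_memory>
    <!-- Optional: 'Suspend work while computer is in use?' = No
         (1 = keep running while the user is active) -->
    <run_if_user_active>1</run_if_user_active>
</global_preferences>
```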

Claggy

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0

Hello Claggy,

I have already set the options exactly the way you suggested. In my opinion it is not a problem with a suspended task, because you can watch the task running in the BOINC client while the times reset as described above.

Thomas

Te Awamutu Space Centre
Joined: 2 Jun 14
Posts: 2
Credit: 2605715
RAC: 0

I'm having a very similar problem that could be the same thing. All my tasks get to 98.654% done, the time remaining counter drops to zero and is replaced by "---", but the task just keeps running forever and nothing else happens.

I checked my settings as per Claggy's message; no change.

gamma ray pulsar (Version 1.04)
BOINC Manager v7.4.27

Te Awamutu Space Centre
Joined: 2 Jun 14
Posts: 2
Credit: 2605715
RAC: 0

... and no sooner do I post than the problem appears to be fixed. All back to normal.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
I'm having a very similar problem that could be the same thing. All my tasks get to 98.654% done, the time remaining counter drops to zero and is replaced by "---", but the task just keeps running forever and nothing else happens.


How long is "forever"?
Please read this thread and then try to let the tasks run for at least 1 hour after reaching this stage.

Eric Findley
Joined: 18 Nov 14
Posts: 4
Credit: 44597432
RAC: 0

I have two work units stuck at 95.666% complete, with 9+ hours elapsed and 25:41 remaining. What should I do?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Are they in the "running" state or are they "waiting to run"?
95.xxx% is the last progress percentage update before the FGRP (Gamma-ray) tasks reach the final, variable-length stage of processing at 98-99% complete.
On my i7-3770K these tasks normally run for about 12 hours, so you probably just need some patience.

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0

In a sense my thread has been taken over by other problems with the gamma-ray pulsar search. I don't mind, but my original problem is different from "getting stuck at xxx%", and it is still unresolved.

My problem is that every time the remaining time reaches 30 minutes, it jumps back to about 31 minutes, and at the same moment the elapsed time jumps back from 11:39 to 11:37.

I understand that it is difficult to estimate how long a process will take to finish; that's why a lot of progress bars behave oddly. But the elapsed time should only ever increase, never decrease.

mikey
Joined: 22 Jan 05
Posts: 11974
Credit: 1834140267
RAC: 223156

Quote:

In a sense my thread has been taken over by other problems with the gamma-ray pulsar search. I don't mind, but my original problem is different from "getting stuck at xxx%", and it is still unresolved.

My problem is that every time the remaining time reaches 30 minutes, it jumps back to about 31 minutes, and at the same moment the elapsed time jumps back from 11:39 to 11:37.

I understand that it is difficult to estimate how long a process will take to finish; that's why a lot of progress bars behave oddly. But the elapsed time should only ever increase, never decrease.

Could the units be getting 'suspended' because the PC is doing other things? Are you still using all of the BOINC defaults from when you first installed BOINC? If yes to the second question, then what you are seeing could be normal. The units 'checkpoint', or save themselves, at certain points as they crunch. If your unit gets suspended because the PC is doing something else, then when BOINC comes back to the unit it picks up at the last checkpoint and continues from there.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
My problem is that every time the remaining time reaches 30 minutes, it jumps back to about 31 minutes, and at the same moment the elapsed time jumps back from 11:39 to 11:37.


I interpret that as the task exiting and then restarting from a checkpoint. Now we need to find out why.

Just to recap, what is "Leave tasks in memory while suspended" set to? Remember that if you've set the preferences through BOINC Manager, they always override the web-based settings.

It might be helpful to check BOINC's event log and post some of the messages here. Switch BOINC Manager to the Advanced view and open the event log from the Advanced menu.
If there are any messages about starting and restarting tasks, please post some of them here. Alternatively, fully restart BOINC and then, after a few minutes once the task has reset a few times, post all of the messages here.

Quote:
I understand that it is difficult to estimate how long a process will take to finish; that's why a lot of progress bars behave oddly. But the elapsed time should only ever increase, never decrease.


The only time the elapsed time would decrease is when a task is restarted from an earlier checkpoint where the elapsed time was smaller. All time from the checkpoint up until the reset would be lost and should not be counted.
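To make that concrete, here's a toy Python model of the behaviour. It's a simplified sketch, not actual BOINC or Einstein@Home code: the `Task` class and `hhmm` helper are made up for illustration, and the assumption that the app exits two minutes after a checkpoint written at 11:37 is just chosen to match the numbers Thomas reported.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical model of a task that saves its state at checkpoints."""
    elapsed: int = 0       # elapsed time shown in the client, in seconds
    checkpointed: int = 0  # elapsed time recorded at the last checkpoint

    def run(self, seconds: int) -> None:
        # Crunching adds to the elapsed time, but it is not saved yet.
        self.elapsed += seconds

    def checkpoint(self) -> None:
        # The app writes its state (including elapsed time) to disk.
        self.checkpointed = self.elapsed

    def restart(self) -> None:
        # The app exited and was started again: work done since the last
        # checkpoint is lost, so the elapsed time rolls back to it.
        self.elapsed = self.checkpointed


def hhmm(seconds: int) -> str:
    """Format seconds as H:MM, like the elapsed-time column."""
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}"


task = Task()
task.run(11 * 3600 + 37 * 60)  # crunch until 11:37
task.checkpoint()              # last checkpoint written at 11:37
task.run(2 * 60)               # two more minutes of work
print(hhmm(task.elapsed))      # prints 11:39
task.restart()                 # app exits and starts again
print(hhmm(task.elapsed))      # prints 11:37
```

If the app keeps exiting at the same point before it manages to write a newer checkpoint, you get exactly the repeating 11:39 to 11:37 loop, which is why the event log messages about task starts and restarts would be useful.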
