Task exited with zero status but no 'finished' file

Juergen Kozok
Joined: 21 Feb 05
Posts: 5
Credit: 1,247,348
RAC: 270
Message 87455 - Posted: 19 Aug 2008, 13:00:41 UTC

I hope this is the correct forum.

This has now happened several times in a row, and I am not able to get results. Very frustrating!

What is wrong? I have installed the latest version of the BOINC software, 6.2.18, but this already happened with the previous version as well.

Is there any resolution to the problem?

19.08.2008 14:39:51|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604
19.08.2008 14:40:32|Einstein@Home|
19.08.2008 14:40:32|Einstein@Home|If this happens repeatedly you may need to reset the project.
19.08.2008 14:40:32|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604
19.08.2008 14:41:13|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file
19.08.2008 14:41:13|Einstein@Home|If this happens repeatedly you may need to reset the project.
19.08.2008 14:41:14|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604
19.08.2008 14:41:54|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file
19.08.2008 14:41:54|Einstein@Home|If this happens repeatedly you may need to reset the project.
19.08.2008 14:41:55|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604
19.08.2008 14:42:36|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file
19.08.2008 14:42:36|Einstein@Home|If this happens repeatedly you may need to reset the project.
19.08.2008 14:42:36|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604
19.08.2008 14:43:17|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file
19.08.2008 14:43:17|Einstein@Home|If this happens repeatedly you may need to reset the project.
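
For anyone who wants to check how often a task hits this message, here is a minimal Python sketch that counts the failed exits per task. It assumes the messages were saved as plain text in the `timestamp|project|message` layout shown above, with the day.month.year timestamp format of this particular log; the function names are hypothetical helpers, not part of BOINC.

```python
from collections import Counter
from datetime import datetime

# Message text copied from the log above.
MARKER = "exited with zero status but no 'finished' file"


def parse_line(line):
    """Split one 'timestamp|project|message' line; return None if malformed."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        # Assumption: timestamps look like 19.08.2008 14:41:13 (locale dependent).
        when = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
    except ValueError:
        return None
    return when, project, message


def count_failed_exits(lines):
    """Count how many times each task logged the 'no finished file' message."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, _, message = parsed
        if message.startswith("Task ") and message.endswith(MARKER):
            task = message[len("Task "):-len(MARKER)].strip()
            counts[task] += 1
    return counts


# Three lines taken from the log in this post.
SAMPLE = [
    "19.08.2008 14:41:13|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file",
    "19.08.2008 14:41:14|Einstein@Home|Restarting task h1_0759.15_S5R4__505_S5R4a_0 using einstein_S5R4 version 604",
    "19.08.2008 14:41:54|Einstein@Home|Task h1_0759.15_S5R4__505_S5R4a_0 exited with zero status but no 'finished' file",
]

print(count_failed_exits(SAMPLE))
```

A count that keeps climbing for the same task name within a few minutes, as in the log above, points to a restart loop rather than an occasional one-off restart.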

____________

Michael Karlinsky
Joined: 22 Jan 05
Posts: 896
Credit: 21,348,746
RAC: 18,753
Message 87459 - Posted: 19 Aug 2008, 13:43:29 UTC
Last modified: 19 Aug 2008, 13:43:37 UTC

Hi,

What happens if you do as requested?

Note: All results in progress will be lost.

Michael
____________
Team Linux Users Everywhere

Ageless
Joined: 26 Jan 05
Posts: 2974
Credit: 5,374,792
RAC: 0
Message 87460 - Posted: 19 Aug 2008, 13:51:09 UTC

Are you using CPU throttling?
____________
Jord

Juergen Kozok
Joined: 21 Feb 05
Posts: 5
Credit: 1,247,348
RAC: 270
Message 87461 - Posted: 19 Aug 2008, 13:55:06 UTC - in response to Message 87460.

Yes!

____________
Juergen Kozok
Joined: 21 Feb 05
Posts: 5
Credit: 1,247,348
RAC: 270
Message 87462 - Posted: 19 Aug 2008, 13:56:43 UTC - in response to Message 87459.

The project was reset and the results were reported as lost (after 10 CPU hours used...),
and a new task was downloaded and started from scratch.
____________
Ageless
Joined: 26 Jan 05
Posts: 2974
Credit: 5,374,792
RAC: 0
Message 87464 - Posted: 19 Aug 2008, 14:23:05 UTC - in response to Message 87461.
Last modified: 19 Aug 2008, 14:23:53 UTC

Well, that's your problem then.

Reset "Use at most xx percent of CPU time" to 100%.
The "CPU throttling" is known to cause tasks to restart on some machines. If you get heat or response time problems when running at full load, set "On multiprocessors, use at most xx percent of processors" to 50% in your preferences. That may run only one task at a time, but it will keep your computer from overheating and all your tasks from continuously restarting.

A fix for this problem (better CPU throttling) will (hopefully) be added to the 6.4 client.
____________
Jord
Juergen Kozok
Joined: 21 Feb 05
Posts: 5
Credit: 1,247,348
RAC: 270
Message 87465 - Posted: 19 Aug 2008, 14:27:54 UTC - in response to Message 87464.

Thanks, I will do so,
and wait for the new client or find a better way to cool the CPU.

____________
Ageless
Joined: 26 Jan 05
Posts: 2974
Credit: 5,374,792
RAC: 0
Message 87467 - Posted: 19 Aug 2008, 14:43:53 UTC - in response to Message 87465.

You can use Threadmaster to throttle the CPU(s).
____________
Jord

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3503
Credit: 149,314,498
RAC: 90,426
Message 87470 - Posted: 19 Aug 2008, 15:41:26 UTC

Hi!

Just to make sure....

The stderr.txt contains this error message:


Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No such file or directory


with each restart. While throttling is certainly one cause of the frequent restarts, is this maybe a separate problem? Is it typical for throttling-induced restarts to produce this error message?

CU
Bikeman
____________
web03
Joined: 1 Jul 05
Posts: 8
Credit: 186,909
RAC: 0
Message 87485 - Posted: 19 Aug 2008, 18:56:08 UTC

I see this as well. And I don't throttle. Isn't this basically the "no heartbeat" message in a different format? I've never found a resolution to this.
____________
Wendy



Check the BOINC Wiki for help

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3503
Credit: 149,314,498
RAC: 90,426
Message 87487 - Posted: 19 Aug 2008, 19:10:47 UTC - in response to Message 87485.

Actually, I think your problem is a different one still:

http://einstein.phys.uwm.edu/result.php?resultid=104169630

This is all about not being able to get the S5R3 (!!) app files.

Strange...Did you use the "dual run" app_info.xml file posted in this forum, but maybe forgot to download the apps in question? Do you have an app_info.xml file installed? Does the problem persist if you remove it?

CU
Bikeman
____________
web03
Joined: 1 Jul 05
Posts: 8
Credit: 186,909
RAC: 0
Message 87490 - Posted: 19 Aug 2008, 20:06:16 UTC - in response to Message 87487.

Yeah, I tried running dual, but really messed it up. So I detached and reattached to the project. I get this message on both Einstein and CPDN units.
____________
Wendy



Check the BOINC Wiki for help
Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3503
Credit: 149,314,498
RAC: 90,426
Message 87491 - Posted: 19 Aug 2008, 20:10:00 UTC

I see.

If you press the "Update" button in BOINC manager while E@H is selected in the projects window (in advanced view), the result files would be uploaded for us to see. Currently only the "Download error units" are visible in the database.

CU
Bikeman
____________

web03
Joined: 1 Jul 05
Posts: 8
Credit: 186,909
RAC: 0
Message 87492 - Posted: 19 Aug 2008, 20:11:25 UTC

result still has about 7 hrs to go before it's finished....

<grins>
____________
Wendy



Check the BOINC Wiki for help

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3503
Credit: 149,314,498
RAC: 90,426
Message 87493 - Posted: 19 Aug 2008, 20:12:50 UTC - in response to Message 87492.

Oh, so it's running fine now ... ok, I'll keep my fingers crossed.

CU
Bikeman
____________
web03
Joined: 1 Jul 05
Posts: 8
Credit: 186,909
RAC: 0
Message 87494 - Posted: 19 Aug 2008, 20:30:24 UTC

Well, it's been running for about 28 hr 28 min and is 78.871% complete in the BOINC Manager view, with about 7 hr 38 min to completion. BOINC LogX thinks it has about 8 hr 55 min left to go. I'm running an HT machine, so I have an Einstein unit paired up with a CPDN unit. There have been some pauses on the Einstein unit to push through a few SETI units. I've seen this message pop up more than once on each of the projects over the past few days. When I was doing some testing for SETI Astropulse and suspended everything except that one wu, I didn't notice getting that message. But I hate having to run that way. I don't mind if it takes a bit longer to do the work, especially since it seems that in most cases the wu's do validate. It's just a bit aggravating seeing those messages when I know the proposed solution doesn't change a thing. In the older versions of the BOINC Manager, it was labeled as the no heartbeat message.

Make sense?
____________
Wendy



Check the BOINC Wiki for help

Ageless
Joined: 26 Jan 05
Posts: 2974
Credit: 5,374,792
RAC: 0
Message 87498 - Posted: 19 Aug 2008, 21:28:49 UTC

I sent off the problem to the developers and the BOINC alpha email list.

David just answered:

I'm using 50% throttling on my dual-core Intel
and can't reproduce the problem.

If any alpha tester is seeing this problem consistently, please let me know;
to debug it I'll probably have to send you some test clients to run.

-- David

So, if you're an alpha, git. ;-)
____________
Jord
web03
Joined: 1 Jul 05
Posts: 8
Credit: 186,909
RAC: 0
Message 87513 - Posted: 20 Aug 2008, 1:31:56 UTC

Thanks Jord. Didn't think I was alpha though... But maybe I am???? ;-)

I saw this on 5.10.45 and am now seeing it on 6.2.14 (which I had loaded in that brief moment in time that it was an official version). I can't trace the messages to anything else on the machine (scans, etc.). It may happen once a day per project. But then again, it may go a few days without me seeing it.

Also, looking at the OP's messages, I don't see it nearly that frequently on mine. Basically, I'll see it then I'll see the restart, then a few hours or so later, I'll see the wu complete.
____________
Wendy



Check the BOINC Wiki for help

Ageless
Joined: 26 Jan 05
Posts: 2974
Credit: 5,374,792
RAC: 0
Message 87526 - Posted: 20 Aug 2008, 5:17:02 UTC - in response to Message 87513.

"Thanks Jord. Didn't think I was alpha though... But maybe I am???? ;-)"

If you mean about the PM, it's just so he doesn't get any emails... ;-)

"Also, looking at the OP's messages, I don't see it nearly that frequently on mine. Basically, I'll see it then I'll see the restart, then a few hours or so later, I'll see the wu complete."

That is frequent enough if you see it happen more than once, or at least once per task. More than I see it, at least.
____________
Jord
Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341,280
RAC: 0
Message 87535 - Posted: 20 Aug 2008, 7:49:16 UTC - in response to Message 87513.

"...It may happen once a day per project. But then again, it may go a few days without me seeing it..."

Do you happen to use Internet time servers to automatically adjust your PC time? If so, just ignore the messages. It's not a project-specific problem; it's caused by BOINC's heartbeat mechanism, which checks whether the application is still working. If it doesn't respond for longer than 30 seconds, BOINC assumes that it's "dead" and drops the process. Unfortunately, that also happens when the PC clock is adjusted in the "wrong" direction. Just yesterday I saw this happen to Einstein and SETI simultaneously. It occurred exactly when the PC clock was adjusted, so I ignored it.
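
To illustrate the effect, here is a minimal Python sketch of a generic heartbeat check. It is not BOINC's actual code; the class names are made up for the example, and only the 30-second figure comes from the description above. It assumes the check compares wall-clock timestamps, so a clock step in the "wrong" direction (forward, in this sketch) makes the last heartbeat look older than it really is.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds, the figure mentioned above


class HeartbeatWatcher:
    """Treats the peer as dead if no heartbeat arrived within the timeout."""

    def __init__(self, clock):
        self.clock = clock            # any callable returning seconds
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat was just received."""
        self.last_beat = self.clock()

    def looks_dead(self):
        """True if the last heartbeat appears older than the timeout."""
        return self.clock() - self.last_beat > HEARTBEAT_TIMEOUT


class FakeWallClock:
    """Settable clock used to simulate a time-server adjustment."""

    def __init__(self, start=1000.0):
        self.now = start

    def __call__(self):
        return self.now


wall = FakeWallClock()
watcher = HeartbeatWatcher(wall)

wall.now += 1.0    # one real second passes
wall.now += 60.0   # a time sync steps the clock forward by a minute

# Only one second really elapsed, but the watcher sees 61 seconds.
print(watcher.looks_dead())  # True

# The same watcher on a monotonic clock is not affected by clock steps.
steady = HeartbeatWatcher(time.monotonic)
print(steady.looks_dead())
```

Python documents `time.monotonic()` as unaffected by system clock updates, which is why timeouts are usually measured with that kind of clock. Whether a particular BOINC client version does so is not something this sketch claims.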

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2016 Bruce Allen