One Einstein task had a client error and another was a success, both on July 15

Lotus Lee Bomar and Robert S Hoover
Joined: 4 Jul 07
Posts: 3
Credit: 3862
RAC: 0
Topic 193767

My Verizon DSL internet connection was down from July 11 through July 15, until I received a new modem from Verizon DSL and installed it on July 15 at 11 pm.
Would this be the reason my Einstein task errored out on July 15?

Robert Hoover

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109397983340
RAC: 35723199


Quote:
My Verizon DSL internet connection was down from July 11 through July 15, until I received a new modem from Verizon DSL and installed it on July 15 at 11 pm.
Would this be the reason my Einstein task errored out on July 15?

Probably not. In any case, the task errored out on July 10, not July 15.

A task already on your computer will continue crunching normally, irrespective of any problems you may have with your internet connection. Even if it completes and can't immediately be returned, it will simply sit there until your internet connection is back up again.
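To make that concrete, here is a toy Python model of the behaviour just described. It is a hypothetical sketch, not BOINC's code: the `UploadQueue` class and its methods are invented for illustration. The point it shows is that crunching never depends on the network; only reporting the finished result does, and that part simply waits.

```python
from collections import deque


class UploadQueue:
    """Hypothetical model of finished results waiting for a working connection.

    Not BOINC's implementation; the names are made up for illustration.
    """

    def __init__(self) -> None:
        self.pending = deque()  # finished locally, not yet reported
        self.reported = []      # successfully sent to the project server

    def finish(self, task_id: str) -> None:
        # Crunching completes locally whether or not the internet is up.
        self.pending.append(task_id)

    def try_report(self, network_up: bool) -> int:
        # With no connection nothing is lost: results just stay queued.
        while network_up and self.pending:
            self.reported.append(self.pending.popleft())
        return len(self.pending)  # how many results are still waiting


queue = UploadQueue()
queue.finish("task_1")
print(queue.try_report(network_up=False))  # 1: still waiting during the outage
print(queue.try_report(network_up=True))   # 0: sent once the modem is back
```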

You can see for yourself what happened to that task. Go to your account page on the website and click the link to view your computers. On the page that displays (near the right hand side) click the "1" in the results column. On the next page click the "TaskID" for this single task and you will get a detailed report that tells you basically what happened. As you browse this report you will come to a section headed "stderr.out" where you will find the error message.

"too many normally harmless exit(s)"

I can't recall seeing that message before, and I must confess to having a good old laugh at the thought of a "normally harmless exit", possibly as opposed to an "exit with extreme prejudice", for example :-).

If you browse towards the bottom of the file, you can see blocks of text at regular intervals where the task was restarted normally from time to time, probably as a result of different projects stopping and restarting as they shared the CPU. This is quite normal. The problem area is near the bottom, where you will see continuous restarts (which began just after midnight on July 10). The message "No heartbeat from core client for 30 sec - exiting" tells us that the normal communication between the BOINC core client and the science application was repeatedly being lost, so the app kept exiting so that BOINC could restart it and re-establish communication. This was obviously repeated too many times for BOINC's liking, so the plug was finally pulled.
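For anyone curious about the mechanics, here is a rough Python model of that restart loop. It is a simplified, hypothetical sketch and not BOINC's source: the 30-second timeout comes from the stderr message quoted above, but `MAX_HARMLESS_EXITS` and all function and class names are placeholders made up for illustration, since the real limit isn't stated in this thread.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 30.0  # seconds, from "No heartbeat from core client for 30 sec"
MAX_HARMLESS_EXITS = 5    # hypothetical limit; BOINC's real value is not given here


def app_should_exit(seconds_since_heartbeat: float) -> bool:
    """Science-app side: quit if the core client has been silent too long."""
    return seconds_since_heartbeat > HEARTBEAT_TIMEOUT


@dataclass
class TaskSupervisor:
    """Core-client side (simplified): restart after each harmless exit, up to a limit."""

    harmless_exits: int = 0
    failed: bool = False

    def on_harmless_exit(self) -> str:
        self.harmless_exits += 1
        if self.harmless_exits > MAX_HARMLESS_EXITS:
            self.failed = True  # the plug is pulled: task reported as a client error
            return "too many normally harmless exit(s)"
        return "restarting"


def run_task(heartbeat_gaps: list) -> str:
    """Feed in the silence (in seconds) seen at each heartbeat check."""
    supervisor = TaskSupervisor()
    for gap in heartbeat_gaps:
        if app_should_exit(gap):
            status = supervisor.on_harmless_exit()
            if supervisor.failed:
                return status
    return "still running"


print(run_task([1.0, 31.0, 1.0, 31.0]))  # occasional restarts are tolerated
print(run_task([31.0] * 6))              # continuous restarts end in the error
```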

I have no idea what would cause this almost continuous loss of communication and I also doubt that your modem would have caused it. Maybe someone will come along who has more ideas about this than I do.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686127309
RAC: 577205


The problem was that the science app did not receive the regular "heartbeat", a periodic message that the BOINC core client sends out to assure the science app that BOINC is still running. As the science app was restarted after it shut itself down, we can assume that BOINC was indeed running.

The only connection with the internet problems you mentioned that I can imagine is this: did you change anything in your computer's network settings in an attempt to diagnose or solve those connection problems?

Another possibility is that blocking DNS queries performed by the core client during the outage prevented it from sending heartbeats. I guess one way to prevent this could be to suspend network activity in BOINC Manager during known connection problems.

CU

Bikeman

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0


Message 82914 in response to message 82913

Quote:
Another possibility is that blocking DNS queries performed by the core client during the outage prevented it from sending heartbeats.


I asked the devs about this and got this answer from Rom:

Rom Walton wrote:

In theory it could, depending on the whole DNS thing.

The CC will lock up when looking for the address of a remote computer. That would lead to the no heartbeat message from the science application and then the exit.

----- Rom
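Here is a toy Python simulation of the scenario Rom describes, assuming (for illustration only) that the core client sends heartbeats from the same loop that does its network work, so a stalled lookup freezes both. It uses a simulated clock rather than real sleeps; `HEARTBEAT_INTERVAL` and `longest_heartbeat_gap` are hypothetical names, and whether current clients still block like this is exactly what Alinator questions in the next post. Setting `network_enabled=False` mimics Bikeman's suggestion of suspending network activity in BOINC Manager during a known outage.

```python
HEARTBEAT_INTERVAL = 1.0  # hypothetical: seconds between heartbeats in this model
HEARTBEAT_TIMEOUT = 30.0  # from "No heartbeat from core client for 30 sec"


def longest_heartbeat_gap(dns_stall_seconds: float,
                          network_enabled: bool = True,
                          loop_iterations: int = 10) -> float:
    """Simulate a single-threaded client loop; return the largest heartbeat gap."""
    clock = 0.0
    last_heartbeat = 0.0
    longest = 0.0
    for i in range(loop_iterations):
        # Send a heartbeat to the science app on each pass through the loop.
        longest = max(longest, clock - last_heartbeat)
        last_heartbeat = clock
        # Halfway through, the client tries to contact the project server.
        if network_enabled and i == loop_iterations // 2:
            clock += dns_stall_seconds  # a blocking lookup freezes the whole loop
        clock += HEARTBEAT_INTERVAL
    return longest


for enabled in (True, False):
    gap = longest_heartbeat_gap(45.0, network_enabled=enabled)
    print(f"network {'on' if enabled else 'off'}: longest gap {gap:.0f} s, "
          f"app exits: {gap > HEARTBEAT_TIMEOUT}")
```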

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0


Hmmm...

I thought they addressed the problem of curl blocking CC execution during stalled DNS queries a while back.

Guess I read the conversation wrong, unless the theory Rom is talking about applies to the older CCs still in the wild. ;-)

Alinator

Dr. Les
Joined: 30 Jan 06
Posts: 7
Credit: 1068328
RAC: 0


Message 82916 in response to message 82915

I seem to have had a similar difficulty on July 15. One message that came up on one machine was "may have to reset", and I did that. Yet the task is just described as a client error, as far as I can figure out, and the machine now does not seem to be getting new work.

Now I want to abort that task, but I am having trouble seeing how to do it, since the task no longer shows up in the Project/Task listing. Any ideas?

Best,

Dr. Bob

Dr. Les
Joined: 30 Jan 06
Posts: 7
Credit: 1068328
RAC: 0


Message 82917 in response to message 82916

Oops, please ignore the above message from Dr. Bob; my error was on SETI, not Einstein.

Sorry
