Huge number of "aborted by user" tasks

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4813094148
RAC: 116766
Topic 197527

I went away for five days. We had awful weather at home while I was gone.

Apparently my internet connection dropped.

Three computers continued to crunch, but had no way to upload or report.

Now I find that all of the uploaded completed work units are being marked as "too late to validate" which I find odd.

But the really weird thing is that a different machine on the same connection shows I aborted hundreds of tasks, which I most definitely did not.

They were all crunched and uploaded. I watched them upload.

mikey
Joined: 22 Jan 05
Posts: 11889
Credit: 1828193831
RAC: 203141

Quote:

I went away for five days. We had awful weather at home while I was gone.

Apparently my internet connection dropped.

Three computers continued to crunch, but had no way to upload or report.

Now I find that all of the uploaded completed work units are being marked as "too late to validate" which I find odd.

But the really weird thing is that a different machine on the same connection shows I aborted hundreds of tasks, which I most definitely did not.

They were all crunched and uploaded. I watched them upload.

Sometimes the server uses a generic term that is not exactly accurate, and I think you are seeing that now. I think the server is using 'aborted' to say the units didn't get returned properly/on time/whatever, instead of the real words explaining what is going on. The programmers seem to have learned the MS way of doing error messages, like the 'blue screen' Windows people see...'your pc has a problem and needs to be restarted'...NO DUH!!

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752967030
RAC: 1385085

Yes, Mikey is right. Those tasks have Exit Status 200, which should be

#define EXIT_UNSTARTED_LATE 200

but they redefined a lot of the error/exit codes a couple of years ago, and this website still (sigh...) reports the old interpretation.

It's the BOINC client - all on its lonesome - which aborts the task if you haven't even started it before the deadline - 7 days, for these tasks.

Time for another airing of Status 'Cancelled by server' changed.
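The rule described in this post can be sketched in a few lines of Python. This is only an illustration, not BOINC's actual client code: the function names and the label table are hypothetical, and the "aborted by user" wording for status 200 is inferred from the thread title rather than taken from the website's source.

```python
from datetime import datetime, timedelta

# Value and name quoted in the post above.
EXIT_UNSTARTED_LATE = 200

# Hypothetical table of the website's older wording; only this one entry
# is inferred from the thread, nothing else is known here.
OLD_WEBSITE_LABEL = {EXIT_UNSTARTED_LATE: "aborted by user"}


def should_abort_unstarted(started: bool, deadline: datetime, now: datetime) -> bool:
    """True when a task was never started and its deadline has passed."""
    return (not started) and now > deadline


def describe(exit_status: int) -> str:
    """Return a more literal description than the website's old label."""
    if exit_status == EXIT_UNSTARTED_LATE:
        return "not started before deadline (aborted by the client itself)"
    return f"exit status {exit_status}"


if __name__ == "__main__":
    sent = datetime(2014, 1, 1)
    deadline = sent + timedelta(days=7)  # 7 days, as stated for these tasks
    now = sent + timedelta(days=8)
    if should_abort_unstarted(started=False, deadline=deadline, now=now):
        print(OLD_WEBSITE_LABEL[EXIT_UNSTARTED_LATE], "->", describe(EXIT_UNSTARTED_LATE))
```

The point of the sketch is that no user action is involved: the check depends only on whether the task was started and on the clock.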

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4813094148
RAC: 116766

Thanks guys.

Since some projects passively punish a machine or user that aborts a lot of tasks or fails to complete tasks on time I was slightly worried about this.

What I think I have learned from it is that I do not want to get a seven day cache of work and then not allow the machine to connect to the servers for four or five days and upload/download five or six days of work at one time.

The work got done by others and reported and it is a good thing that the results were obtained more quickly than they would have been had everyone else waited for me to be able to reconnect.

I am not complaining, I was just surprised by the error / abort messages and you have explained those. Thank you.

mikey
Joined: 22 Jan 05
Posts: 11889
Credit: 1828193831
RAC: 203141

Quote:

Thanks guys.

Since some projects passively punish a machine or user that aborts a lot of tasks or fails to complete tasks on time I was slightly worried about this.

What I think I have learned from it is that I do not want to get a seven day cache of work and then not allow the machine to connect to the servers for four or five days and upload/download five or six days of work at one time.

The work got done by others and reported and it is a good thing that the results were obtained more quickly than they would have been had everyone else waited for me to be able to reconnect.

I am not complaining, I was just surprised by the error / abort messages and you have explained those. Thank you.

What I have done on my PCs is set a 0.50 day minimum work buffer and a 0.25 day additional work buffer. I also have a 2nd project set at 0% resource share in case the first project is down or runs out of work. The 0% means it never gets any work unless the first project doesn't send me any, and then only enough for one unit per CPU core or GPU, depending on what I have it set up for. I have the luxury of always-on internet and this works for me.

If you had that on your PCs that lost internet, it would still have meant returning some units late, but not the trashing of dozens of others, as they would not have been in your cache to begin with. That being said, I trashed 3 Einstein units yesterday...I was moving a PC from one place to another and it just refuses to boot back up again! The PC powers on, it just won't boot!!! All 3 units on that PC are toast and I can't abort them, so they too will end up like yours did.
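A rough way to see why the smaller buffer helps is a toy model, sketched below in Python. It is not how the BOINC scheduler works: it assumes the cache is worked through first-in, first-out at a constant rate, and the `slowdown` factor (actual runtime divided by estimated runtime) is a made-up parameter for illustration. Only the 0.50 + 0.25 day buffer and the 7 day deadline come from the thread.

```python
def unstarted_fraction(cache_days: float, deadline_days: float, slowdown: float = 1.0) -> float:
    """Fraction of cached work that would not have started by the deadline.

    cache_days    -- estimated work held in the cache, in days
    deadline_days -- days allowed before an unstarted task is aborted
    slowdown      -- hypothetical factor: actual runtime / estimated runtime
    """
    actual_days = cache_days * slowdown
    if actual_days <= deadline_days:
        return 0.0  # every task starts before the deadline
    # Work queued beyond the deadline never gets started in time.
    return (actual_days - deadline_days) / actual_days


if __name__ == "__main__":
    deadline = 7.0               # deadline quoted in the thread
    small_cache = 0.50 + 0.25    # minimum + additional buffer from the post above
    big_cache = 7.0              # the seven day cache from the earlier post
    for cache in (small_cache, big_cache):
        share = unstarted_fraction(cache, deadline, slowdown=1.5)
        print(f"{cache:.2f} day cache: {share:.0%} of work unstarted at deadline")
```

Under these assumptions the small buffer loses nothing even when work runs 50% slower than estimated, while a seven day cache leaves roughly a third of its tasks unstarted when the deadline arrives.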
