new units not downloading

NonValueAdded
Joined: 20 Feb 05
Posts: 1
Credit: 12835
RAC: 0

RE: RE: it wasn't till I

Message 13556 in response to message 13554

Quote:
Quote:

it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us

Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce

FYI, it seems that my UPPER case "H1_" work units got caught up in the delete sequence. I had a reboot in there, so that didn't help, plus I'm going from memory on how many WUs showed up before and after the system cycle. I had the impression from the original note that only the lower case h1_'s were at issue. I'm not worried about the credit and agree with most other responders on that point. But it might be important for the BOINC people to model out the case sensitivity aspect of the delete, maybe across the combinations of OS and file system versions, or simply avoid using file name case as a differentiator in the future. JMHO

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: RE: RE: it wasn't

Message 13557 in response to message 13555

Quote:
Quote:
Quote:

it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us

Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce

Gary has pointed out to me that credit is not granted for wu that are killed by stealing their files. On consideration this makes sense if the xml that held the cpu time has gone. If the client re-starts the download when the files vanish, presumably it also deletes/overwrites the file that remembers the cpu time so far?

My thought is that it may be better, if running 4.19, to kill those wu from the operating system while BOINC is actually crunching them. This assumes the OS has some kind of task manager (eg not Win-98).

On win-XP for example, hit ctrl-alt-del and the task manager comes up. Highlight the Einstein task, right click, and kill process. The wu will report to BOINC that it ended with some error code that means killed. I think that this means that BOINC will report it back with a 'client error' message and they will get credit.

On linux: you probably already know how to use top or ps to get the pid, and how to use kill to abort. If not, I recommend the man pages on top, ps, kill.

Note: I have tried the win-xp method in the past, but not on these wu. If my suggestion won't work, please say so!

Using Task Manager (or similar utilities like Process Explorer) to kill the science application works great on Windows. That includes Win95/98/ME, which has a task list instead of a task manager. It's still used to "kill" programs.

However, Linux seems to recover "better". Most of the time it just restarts the WU with the messages

Restarting result xxxxx
Result xxxx exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project.

Going thru the signals, SIGABRT works. Like this (note: you have to use the same userid that you run BOINC under):

List the user's tasks, enter:
ps -a

or if it doesn't show the boinc tasks, enter:
ps -x

Use the Process ID (PID) in the kill command:
kill -SIGABRT PID
