Killing Processes with Manual Refresh

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0

RE: This is one of my

Message 62771 in response to message 62767

Quote:


This is one of my failed WUs

http://einsteinathome.org/task/83371387

I see the error you mentioned, though I can't see this one in my WUs.

Quote:


Also only running the manager in advanced view.

I was able to complete one S5R2 run, though. This one was on a server that doesn't even have OpenGL installed... so, just speculating here. I tried to disable graphics by moving the *.so away, but BOINC is just too smart and re-installs it from the server.

This machine completed 7 S5R2 WUs before trashing WUs.

Quote:

Anyway I don't need the graphics, how can I disable it for good in a safe way?
CU
BRM


Sorry, don't know either.

Regards,
Chris

Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

RE: Anyway I don't need the

Message 62772 in response to message 62767

Quote:

Anyway I don't need the graphics, how can I disable it for good in a safe way?

CU

BRM

If possible, install BOINC as a service instead of a single user or shared installation.

bonnyscott
Joined: 4 Feb 06
Posts: 6
Credit: 5373362
RAC: 0

RE: RE: I will update to

Message 62773 in response to message 62763

Quote:
Quote:

I will update to 5.8.17 now and see whether this helps.

Did so too. Now I'm waiting for some WUs to complete; let's see if manual reporting crashes the running processes again.

The glitch happens _not_ only with the newer BOINC versions.

Got the same mess here on my laptop, which still runs BOINC 5.4.9, as well as on an X2 running 5.8.15, both with Linux.
No debug output other than "caught sigabrt".
And that after several dozen ksec of number-cracking.
Really "enchanting"!

And I _definitely_ didn't fiddle with the "update" button at the time the X2 WUs went up in smoke!

Regards, Bonnyscott

Kimegi Tepeex
Joined: 1 May 05
Posts: 8
Credit: 250148
RAC: 0

I have got several WUs

I have had several WUs terminate in error (process got signal 11) since S5R2:

On one computer (AMD Duron 1800, Linux 2.6.12-12mdk, BOINC 5.4.11) this result lost 8,000 seconds.
This computer has successfully crunched 2 WUs of 60,000+ seconds since this first crash.

On a second one (AMD Duron 1200, Linux 2.6.18-1.2868.fc6, BOINC 5.8.17), two results here and here each spent just a little more than one full day (87,000+ seconds) of hard work before being sadly killed :(

Maybe one occurred after a manual update, but I am sure this was not the case for the two others: BOINC did it by itself.

In either case, no graphics, no screensaver, no other application, just crunching (almost bare) blades on a shelf with others...

Any suggestion on how to keep these crashes from occurring again would be greatly appreciated.

Charles Dennett
Joined: 22 Jan 05
Posts: 22
Credit: 45273
RAC: 9

Reattached a couple of my

Reattached a couple of my systems several days ago. I've also seen this problem. Just a short while ago I changed the share EAH was getting and did a manual update. A WU that was paused (but still in memory) aborted when I did this. It's here. The system is running Fedora 5 with a 2.6.20 kernel and the 5.8.17 core client as supplied by the BOINC project (I used to compile my own). Processor is an AMD XP2600+.

There are a couple of other WUs that did the same thing in the past several days when I was adjusting the resource share and doing a manual update.

Charlie


Mikie Tim T
Joined: 22 Jan 05
Posts: 105
Credit: 263777741
RAC: 0

The update glitch also

The update glitch also happened on my Linux machine. It runs 5.8.8, so it doesn't seem to matter what version of the CC we're running.

bahur
Joined: 31 Mar 05
Posts: 1
Credit: 4629271
RAC: 0

Alas, our new compute

Alas, our new compute cluster failed about 50 WUs in just one day. I run E@H for thermal stability assessment, and at first I thought the machine was overheating, but then my C2D at home started failing WUs. There's a real, twofold problem with the Linux client: we are wasting CPU seconds for nothing, and the E@H project gets slowed down because the WUs have to be recomputed.

Now I've switched back to the 64-bit 5.4.11 core client by Debian that used to work flawlessly before S5R2.

ohiomike
Joined: 4 Nov 06
Posts: 80
Credit: 6453639
RAC: 0

I also have had the "Process

I have also had the "Process got signal 11" fault on one of my machines. It is strange, since I have 2 almost identical machines: BOINC 5.8.17, Linux 2.6.21, on AMD X2 CPUs. Both have been running fine for months; now one throws 4 errors in one day, then carries on fine. In my case it looks like it crashed both running tasks after a reboot (I rebooted that machine twice to do updates on other software).


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692146435
RAC: 9004

Hi! Is anybody getting

Message 62780 in response to message 62779

Hi!

Is anybody getting these "signal 11" errors on systems that don't have OpenGL installed (e.g. servers without X11)?

My observation (though from only 13 WUs) is that E@H on Linux runs more reliably when graphics are disabled (either because OpenGL isn't installed at all or because you keep the client from loading libGL).

Host #1 (no libGL installed) 5 out of 6 WU were completed (1 with an error other than "signal 11")

Host #2 (graphics disabled) 2 out of 2 WU were completed

Host #3: while graphics were still enabled: 3 out of 3 WUs failed (with "signal 11")
After disabling graphics: 3 out of 3 WUs were completed
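
For anyone who wants to add their own hosts to the comparison, here is a rough Python tally of the counts listed above. It is only a sketch: the `runs` list and the `tally` helper are names made up for illustration, nothing in it comes from BOINC itself, and with this few WUs the result is a hint, not proof.

```python
# Counts copied from the host list above.
# Each entry: (label, graphics_enabled, completed, total)
runs = [
    ("Host #1 (no libGL installed)", False, 5, 6),
    ("Host #2 (graphics disabled)", False, 2, 2),
    ("Host #3 (graphics enabled)", True, 0, 3),
    ("Host #3 (graphics disabled)", False, 3, 3),
]


def tally(runs, graphics_enabled):
    """Sum completed and total WUs for one graphics setting."""
    completed = sum(c for _, g, c, _ in runs if g == graphics_enabled)
    total = sum(t for _, g, _, t in runs if g == graphics_enabled)
    return completed, total


for state in (False, True):
    done, total = tally(runs, state)
    label = "graphics enabled" if state else "graphics disabled/absent"
    print(f"{label}: {done}/{total} WUs completed")
```

To include another host, append one more tuple to `runs` with its own completed and total counts.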

CU

BRM
