ABP2 CPU-only applications

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245184851
RAC: 13891

RE: RE: RE: Got

Message 96501 in response to message 96500

Quote:
Quote:
Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything worked like a charm. Both WUs crashed at the same time, with no hints in the system log files.

Interesting. Was anything running on this machine that could have eaten up memory at that time?

Quote:
The interesting thing is that I still get ABP2 WUs stamped to be done by app 1.08, while the actual app should be 1.11.

The CUDA App version is at 1.11; the CPU App is 1.08. That's ok.

BM

The server has 8GB RAM and low load, with just some Apache instances, a mail server and PostgreSQL running.

I see these 'segfault' errors happen occasionally on some machines. Usually all app instances running there get this signal at the very same time, regardless of which source code line they are executing or which data they are processing, so this isn't a real programming error.

I suspected the Linux 'optimistic memory allocation' to be responsible: it randomly kills processes when physical memory isn't enough to back the memory it 'optimistically' assigned to them. But it's hard to believe that this is the case here.
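For anyone who wants to check how their own kernel handles this: on Linux the overcommit policy is exposed in /proc/sys/vm/overcommit_memory. Below is a minimal Python sketch, assuming a Linux host. The function names are made up for illustration, and reading the setting only shows the policy; it does not prove that the out-of-memory killer caused any particular crash.

```python
from pathlib import Path
from typing import Optional

# Meaning of the values found in /proc/sys/vm/overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(mode: int) -> str:
    """Return a human-readable description of an overcommit mode."""
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit_mode(
    path: str = "/proc/sys/vm/overcommit_memory",
) -> Optional[int]:
    """Read the current mode, or return None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mode = read_overcommit_mode()
    if mode is None:
        print("Could not read the overcommit setting (not a Linux host?)")
    else:
        print(f"vm.overcommit_memory = {mode}: {describe_overcommit(mode)}")
```

One thing worth keeping in mind when interpreting the result: the out-of-memory killer ends processes with SIGKILL (signal 9) and normally leaves a message in the kernel log, whereas the results discussed here report signal 11.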

We currently lose up to ~2000h of computing time per day due to this problem.

BM


M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: RE: RE: RE: Got

Message 96502 in response to message 96501

Quote:
Quote:
Quote:
Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything worked like a charm. Both WUs crashed at the same time, with no hints in the system log files.

The server has 8GB RAM and low load, with just some Apache instances, a mail server and PostgreSQL running.

I see these 'segfault' errors happen occasionally on some machines. Usually all app instances running there get this signal at the very same time, regardless of which source code line they are executing or which data they are processing, so this isn't a real programming error.


Same thing here. Server is running kernel...think you know.. ;)

Quote:
I suspected the Linux 'optimistic memory allocation' to be responsible: it randomly kills processes when physical memory isn't enough to back the memory it 'optimistically' assigned to them. But it's hard to believe that this is the case here.


Hm, maybe something related to a 64-bit OS and the 32-bit compatibility libs?

Quote:
We currently lose up to ~2000h of computing time per day due to this problem.

Oh, this is ugly. Any information about the distributions/kernels involved? I haven't seen this problem on my other hosts so far. The server runs OpenSuse 11.1 (64bit), my laptop runs OpenSuse 11.2 (32bit), and the old Athlon XP 3000 runs OpenSuse 10.3, like my development host (64bit). The former root server ran OpenSuse 10.3 (64bit/8GB) without segfaults. And there is still a small chance of CPU errors or memory failures. I hear about memory problems more and more, maybe a consequence of low profit for the manufacturers and higher integration.
But why don't other apps (except for FF ;)) crash from time to time if this is a Linux problem? I really can't remember a fatal crash on any of my systems in recent years.
And last but not least, could the problem be circumvented by a program restart after killed by OOM? That would require the BOINC client to be changed, or a wrapper program calling/controlling the science apps (overhead?). But I'm no C/C++ coder, so I might be far off track. ;)
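To make the wrapper idea a bit more concrete, here is a minimal Python sketch of a supervisor that restarts a child process when it was ended by a signal. This is only an illustration of the idea, not how the BOINC client or any existing wrapper works: the name run_with_restart and the restart limit are made up, the detection relies on POSIX behaviour (a negative return code -N means "ended by signal N"), and a real wrapper would also depend on the science app resuming from its last checkpoint.

```python
import subprocess
import sys


def run_with_restart(cmd, max_restarts=3):
    """Run cmd and restart it if it was ended by a signal.

    Returns a tuple (last_returncode, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        returncode = subprocess.run(cmd).returncode
        # On POSIX, subprocess reports "ended by signal N" as -N.
        killed_by_signal = returncode < 0
        if not killed_by_signal or attempts > max_restarts:
            return returncode, attempts
        print(f"child ended by signal {-returncode}, restarting "
              f"(attempt {attempts} of {max_restarts + 1})")


if __name__ == "__main__":
    # Hypothetical stand-in for a science app: a child that exits cleanly.
    code, tries = run_with_restart([sys.executable, "-c", "print('work done')"])
    print(f"exit code {code} after {tries} attempt(s)")
```

The overhead of such a supervisor is small, since it only sleeps while waiting for the child. The restart limit is there so that a task which crashes every time is eventually given up instead of looping forever.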

cu,
Michael

[Edit]'killed by OOM' should read as 'ended by out-of-memory killer'.

[Edit2]Last signal 11 on my X2 5000 with E@H:
2008-01-18 18:06:30 [Einstein@Home] Reason: Unrecoverable error for result h1_0762.95_S5R2__255_S5R3a_0 (process got signal 11)

Logfile started 2006 :)

Athlon XP 3000+ running 24/365:
Never ever any signal 11 since 14-May-2008(logging started)

dan
Joined: 7 Sep 10
Posts: 3
Credit: 5343
RAC: 0

Hi Gary I have been working

Hi Gary, I have been working with the Windows 7 Task Manager, and it seems that when I am not using my computer I can boost the running programs by changing the CPUs and raising the usage... I also bring BOINC to the front.

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22451438
RAC: 0

Since I switched from

Message 96504 in response to message 96503

Since I switched from a 9800GTX+ and 8500GT to a GTX470 and now a 480, I have had no more problems with CUDA.
I don't know if it's accepted, but according to the card's GPU and memory load, it's possible to run 2 at a time?

(I run 3 SETI MB tasks at a time, which gives a good load: 99% on the GPU and 60% on the memory controller, on its 384-bit bus.)

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: I don't know if it's

Message 96505 in response to message 96504

Quote:
I don't know if it's accepted, but according to the card's GPU and memory load, it's possible to run 2 at a time?


You are in the wrong thread (CPU vs. GPU), but I think it's accepted anyway. ;-)

You'll just have to create an app_info.xml with the correct entries. There is at least one other thread with information about that.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

On my SuSE Linux 11.1 32-bit

On my SuSE Linux 11.1 32-bit PAE system I can see the ABP2 graphics whenever I want, but not the S5GC1 graphics. I rarely use the graphics because they take a lot of CPU, but I am wondering why.
Tullio

Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0

I don't know, if this is the

Message 96507 in response to message 96506

I don't know if this is the right thread, but I can't find a better one. Can anybody tell me about the update cycle of the web pages? I see a number of tasks that were finished and uploaded almost 24 hours ago still shown as "in progress". What's the reason for this delay?

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6537
Credit: 286308689
RAC: 103605

RE: I don't know, if this

Message 96508 in response to message 96507

Quote:
I don't know if this is the right thread, but I can't find a better one. Can anybody tell me about the update cycle of the web pages? I see a number of tasks that were finished and uploaded almost 24 hours ago still shown as "in progress". What's the reason for this delay?

Well, they are still in progress, but you'd be waiting on your 'wingman'. All work is sent to (at least) two different hosts, of which you are one in this case. When the other host returns its work, validation occurs, credit is awarded, etc., and all being well the matter is settled. How long to wait? That depends on the activity of the other host and/or other circumstances such as missed deadlines, a possible re-issue to complete the quorum (2 validated results) and the like...
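The waiting logic described above can be sketched as a toy model in Python. This is not BOINC server code: the state names and the function workunit_status are invented for illustration, and the only facts taken from this thread are that work goes to at least two hosts and that the quorum is 2.

```python
def workunit_status(task_states, quorum=2):
    """Classify a workunit from the states of its tasks (toy model).

    task_states is a list of strings, one per host the work was sent to:
    "returned", "in progress" or "missed deadline".
    """
    returned = task_states.count("returned")
    in_progress = task_states.count("in progress")
    if returned >= quorum:
        # Enough results are back for the validator to compare them.
        return "ready for validation"
    if returned + in_progress >= quorum:
        # The quorum can still be reached once the other host reports.
        return "waiting for wingman"
    # Too few hosts are still working on it, so another copy is needed.
    return "needs re-issue"


if __name__ == "__main__":
    print(workunit_status(["returned", "in progress"]))
    print(workunit_status(["returned", "missed deadline"]))
    print(workunit_status(["returned", "returned"]))
```

In this model your own returned task stays pending for as long as the workunit is "waiting for wingman", which is the situation described in the question.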

Cheers, Mike.

( edit ) One is always welcome to fire up a new thread if you judge there is no current suitable one ... :-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: I see a number of

Message 96509 in response to message 96507

Quote:
I see a number of tasks that were finished and uploaded almost 24 hours ago still shown as "in progress". What's the reason for this delay?


There's no delay. Tasks are considered "in progress" until they are reported, which is a process separate from uploading.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Cannibal Corpse
Joined: 21 Feb 05
Posts: 18
Credit: 1555535
RAC: 0

Hello all!! Before I go out

Hello all!! Before I go out and get a loan for the new GTX 580, will it be usable to crunch? Is there a compatibility issue? Oh, I will get one regardless, unless there is a better card?
If and/or when it can crunch, I will post the results and/or report any bugs.

DO WHAT THOU WILT SHALL BE THE WHOLE OF THE LAW.
PROUD MEMBER OF THE CARL SAGAN TEAM.

