ABP2 CPU-only applications

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7075904931
RAC: 1320516

RE: The only obvious Q8400

Message 96481 in response to message 96476

Quote:

The only obvious Q8400 disadvantage is four cores sharing one RAM interface, as opposed to two. Perhaps the ABP2 ap is far more RAM-access bound than most? Were that true, your Q8400 rig would presumably respond to RAM interface timing tweaking more than you are accustomed to seeing on other aps.

Just musing aloud. As it happens, I have a matched 2-core, 4-core pair of Conroe-class hosts, so once the ABP2 revisions settle down enough to get work on both, I can look for a similar effect there, though they differ from your Wolfdales profoundly enough that a difference would not surprise me. On existing workloads over the last couple of years on multiple SETI and Einstein aps, they have seldom differed appreciably in typical CPU seconds per result--certainly not by the amount you see. I believe them not generally to be much RAM-bound, and have spent no energy on twisting the tail on the RAM settings.

Early returns are in, and the hypothesis that ABP2 has a higher tendency to get memory-bound than other recent Einstein and SETI aps looks yet more likely.

I have a very closely matched pair of hosts with quad-core Conroe (Q6600) and dual-core Conroe (E6600) running on the same model motherboard, same RAM, same clock frequency (stock 2.4 GHz)... Historically, the quad has taken slightly longer on average to do comparable work, but well under 10%.

So far, however, ABP2 running on all four cores is taking about 3600 seconds on the Q6600, vs 2800 on the E6600--a much bigger difference than I am used to seeing.

I also have a Q9450 Penryn-class quad running stock at 2.83 GHz. Historically it enjoys somewhat better than clock-rate advantage over the Q6600, possibly because of architectural advantage, possibly because of considerably higher RAM bandwidth.

Here it is taking about 2600 seconds for ABP2 running 4-up, but a couple of results which ran against GW or SETI work took only 2200. Taken together, these results suggest that even the higher Penryn bandwidth is burdened by the ABP2 RAM demands in a 4-up configuration, but less so than the Conroe.

As I run my RAM dead stock, if my initial guesses are right (which depend, in part, on the notion that unlike GW, ABP2 results are very similar to one another in computation requirement) this all suggests I have an opportunity by twisting the tail on RAM. Sadly, the major opportunity would be on my Q6600 system. But that is my daily driver, host to my serious audio hobby, and all my financial affairs. I'm not ready to go through a "try until fail and back down" sequence on it. So the notion will remain unverified, at least by me.
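As a rough check on the timings quoted above, here is a minimal Python sketch of the relative slowdown. The helper name `slowdown_pct` is made up for illustration, and the inputs are the approximate per-result times from this post, not measured averages.

```python
def slowdown_pct(loaded_s, reference_s):
    """Percent extra run time of loaded_s relative to reference_s."""
    return (loaded_s / reference_s - 1.0) * 100.0


# Approximate ABP2 run times quoted in the post (seconds per result).
q6600_vs_e6600 = slowdown_pct(3600, 2800)  # quad-core vs dual-core Conroe
q9450_4up_vs_mixed = slowdown_pct(2600, 2200)  # ABP2 4-up vs alongside GW/SETI

print(f"Q6600 vs E6600: {q6600_vs_e6600:.1f}% longer")
print(f"Q9450 4-up vs mixed: {q9450_4up_vs_mixed:.1f}% longer")
```

On these figures the Q6600 takes roughly 29% longer than the E6600, against a historical gap of well under 10%, and the Q9450 takes roughly 18% longer running ABP2 4-up than when sharing the machine with GW or SETI work.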

Elphidieus
Joined: 20 Feb 05
Posts: 245
Credit: 20603702
RAC: 0

Anybody come across this

Anybody come across this before...?

Fri Jan 22 09:07:17 2010 Einstein@Home [error] p2030_54170_48472_0076_G51.55+00.04.C_4.dm_291_1: negative FLOPs left -12433123904410.585938

I came across several of these, but I can't trace which work unit they belong to.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 33

credit for these WU's is

Credit for these WUs is seriously too high. I've got them on 4 of my 6 PCs so far.

Core 1 Duo T2250 (1.73 GHz) Vista
S5R6 12-18 credits/core/hour
ABP2 19-24 credits/core/hour
22% faster on average

p3M 800 Win7
S5R6 5.4-6.4 credits/core/hour
ABP2 9.6 credits/core/hour (single WU)
62% faster

Athlon 64 X2 3800 (2 GHz) Linux
S5R6 17-25 credits/core/hour
ABP2 24 credits/core/hour
14% faster

Core i7-920 (3.85 GHz) Win7
S5R6 29-42 credits/core/hour
ABP2 78 credits/core/hour
120% faster

The spread on my i7 is enormous. If I'm right in thinking ABP2 has a chaotic memory access pattern, the much larger amount of cache available and the massive bandwidth that DDR3-1600 provides are probably responsible.

The wide spread in performance differentials makes picking the proper credit value somewhat difficult, but it should be a value where some computers are faster with S5R6 and others faster with ABP2, instead of one where some are slightly faster with ABP2 and others enormously faster.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692021382
RAC: 135474

But you are comparing CPU and

But you are comparing CPU and CUDA GPU performances on the i7 here, right? That's not quite fair. The ABP1 WUs are identical for CPU and GPU, and will earn 40 credits no matter whether they are crunched on a CPU or GPU.

All the other examples for different CPUs show that you can't get it exactly equal across all platforms.

I wouldn't call the memory access pattern of ABP2 "chaotic"; a significant part of it is FFT, which has a rather regular access pattern, but one still too complex for most CPUs' prefetching logic.

CU
HB

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 33

RE: But you are comparing

Message 96485 in response to message 96484

Quote:

But you are comparing CPU and CUDA GPU performances on the i7 here, right? That's not quite fair. The ABP1 WUs are identical for CPU and GPU, and will earn 40 credits no matter whether they are crunched on a CPU or GPU.

All the other examples for different CPUs show that you can't get it exactly equal across all platforms.

I wouldn't call the memory access pattern of ABP2 "chaotic"; a significant part of it is FFT, which has a rather regular access pattern, but one still too complex for most CPUs' prefetching logic.

CU
HB

Argh! Did the no-GPU setting get reset when ABP2 came out? I could've sworn I had it turned off before, as opposed to just blocking it via app info. I guess this explains why I took a bit of a hit on Collatz the last day or so.

edit: does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692021382
RAC: 135474

RE: edit: does the server

Message 96486 in response to message 96485

Quote:

edit: does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.

I don't think so. In fact, I'm not even sure the GPU is considered for calculating the quota at all; I think it's CPU cores * 32 at the moment.

CU
H-B

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0

RE: RE: edit: does the

Message 96487 in response to message 96486

Quote:
Quote:

edit: does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.

I don't think so. In fact, I'm not even sure the GPU is considered for calculating the quota at all; I think it's CPU cores * 32 at the moment.

CU
H-B

Indeed, my C2D has a quota of 64 tasks. So the GPU doesn't count.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110417100799
RAC: 30731016

RE: ... does the server

Message 96488 in response to message 96485

Quote:
... does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.


If you abort and return 16 GPU tasks and allow one CPU task to be completed afterwards, your quota of 32 will drop temporarily to 16 and then be restored to 32 by the one CPU task. Rinse and repeat until you have cleared all GPU tasks :-).

Cheers,
Gary.
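
The quota behaviour described above can be sketched in a few lines of Python. The rule used here (each aborted task lowers the quota by one, each completed task doubles it up to the cap) is an assumption inferred from the 32 -> 16 -> 32 example in this post, and the per-core figure of 32 comes from earlier in the thread. The function names are hypothetical and this is not the BOINC server code.

```python
def host_quota(cpu_cores, per_core=32):
    """Daily quota for a host, assuming quota = CPU cores * per_core."""
    return cpu_cores * per_core


def simulate_quota(outcomes, max_quota=32):
    """Replay task outcomes (True = completed, False = aborted/failed).

    Assumed rule, inferred from the thread rather than from the server
    source: a failure lowers the quota by 1 (never below 1); a success
    doubles it, capped at max_quota.
    """
    quota = max_quota
    history = []
    for ok in outcomes:
        if ok:
            quota = min(max_quota, quota * 2)
        else:
            quota = max(1, quota - 1)
        history.append(quota)
    return history


# 16 aborted GPU tasks followed by one completed CPU task.
history = simulate_quota([False] * 16 + [True])
print("after 16 aborts:", history[15])
print("after 1 success:", history[-1])
print("dual-core host quota:", host_quota(2))
```

Under these assumptions the quota bottoms out at 16 after the aborts and the single completed task brings it back to 32, which matches the numbers given in the post.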

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 33

RE: RE: ... does the

Message 96489 in response to message 96488

Quote:
Quote:
... does the server keep separate quota/failure counts for CPU and GPU WUs? I want to abort all of the latter if I can do so safely.

If you abort and return 16 GPU tasks and allow one CPU task to be completed afterwards, your quota of 32 will drop temporarily to 16 and then be restored to 32 by the one CPU task. Rinse and repeat until you have cleared all GPU tasks :-).

So my nuking about 200 WUs at once netted a temporary penalty of roughly a dozen. Not bad, and I can see why no one's noticed it before.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I am finally running ABP2

I am finally running ABP2 units on my Linux box, an Opteron 1210 at 1.8 GHz with SuSE Linux 11.1 and BOINC 6.6.41. It took 6,716.13 s. I don't know about credits since it is still pending.
Tullio
