Comparing CPU core productivities

Orgil
Joined: 29 Oct 05
Posts: 55
Credit: 91242
RAC: 0

I have not touched my PC configuration since I started using it, so my guess is that this Core i5 runs at 2.27 GHz normally, with turbo up to 2.5 GHz. I have no idea how to play with HT.

The reason I chose multiple samples is that if those Xeons or i7 920s had different setups or overclocks, they should show different CPU times, yet mostly they show the CPU times I am quoting. And I am mentioning it for the third time: you can always verify this with the top host list.

I found Core i5 desktop CPUs, and all of them are doing 15k-17k sec according to the top hosts list.

My RAC reflects 10-12 hours of running per day on 1 core (2 threads, only 2 WUs at a time), but in the last 2 days I spent maybe 30% of the time on a few SETI tasks.

transient
Joined: 3 Jun 05
Posts: 62
Credit: 115835369
RAC: 0

In previous posts you mention the performance difference between 32 nm and 45 nm. You can't use the top Xeons to compare those differences, because they are 32 nm, just like your i5 430. Comparing the runtimes to the Opteron is more valid, but only if you want to compare AMD and Intel. That difference is at least as important as the difference between 32 nm and 45 nm.

Edit: I'm talking about the top 5 here. :)

Orgil
Joined: 29 Oct 05
Posts: 55
Credit: 91242
RAC: 0

The point is that I have a clear list proving that 32 nm mobile tech is beating 45 nm top-end server chips and quads, but you guys are just trying to disregard the fact and bringing up all kinds of ridiculous things: HT, BOINC versions, everything. Just accept the fact that, per core, the Core i5 mobile CPU is beating all your old (tech) religion big time. ;D

Open your eyes and sift through the top host list at the provided URL. A fact is a fact.

In the 3DMark benchmark the Core i5 430 scores around 2300 while the Q6600 scores 3500, but here in E@H crunching the i5-430 takes 22k sec and the Q6600 takes 25k sec. My point is that CPUs behave differently from one application to another. So in the E@H crunching environment, 32 nm is beating pretty much everything around (per core), all without overclocking.

I also checked Core i7 720QM laptops (which are 45 nm): mostly they do 1 WU in 43k sec, or in 31k sec under Linux. That means a regular i5 laptop has productivity equal to a Core i7 laptop over every 12 hours with the same OS setup. Meaning the i7 laptop will do 8 jobs in 43k sec, and the i5-430 laptop will finish the same 8 jobs in around 43k sec.

Now I am testing an i5-460 laptop. It looks a bit faster than the i5-430, judging by the first job completion times.

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7070584931
RAC: 1297863

Quote:
I suddenly noticed that single core output of mobile core i5 430 is likely more faster than single core of i7 920 cpu.

The basic error here is ignoring the question of whether hyperthreading is on or off.

As the Lynnfield die possesses 4 physical cores (as does the Nehalem), Lynnfield i5 hosts for which BOINC reports 4 processors are NOT running HT, while all i7 hosts reporting 8 processors are (there is not yet an i7 die shipping with eight physical cores).

The Arrandale and Clarkdale i5 dice, on the other hand, have only two physical cores, so if you found a Core i5 Arrandale or Clarkdale reporting four processors it would be running HT.

If you actually wanted to compare the per-core productivity, you would just divide the HT hosts' times by two.
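As a rough sketch of that normalization in Python (the function name and the sample run times are illustrative assumptions, not measurements from any particular host):

```python
def core_seconds_per_task(run_time_s: float, tasks_per_core: int) -> float:
    """Core time effectively consumed by one task.

    tasks_per_core is 2 for a host running a full load with HT on
    (two tasks share each physical core) and 1 with HT off.
    """
    return run_time_s / tasks_per_core


# Illustrative run times only (assumed, not taken from a specific host).
ht_host = core_seconds_per_task(24_000, tasks_per_core=2)
non_ht_host = core_seconds_per_task(22_500, tasks_per_core=1)

print(f"HT host:     {ht_host:.0f} core-seconds per task")
print(f"non-HT host: {non_ht_host:.0f} core-seconds per task")
```

The comparison only holds if each host is running a full load of Einstein tasks, since a partly loaded host tends to report shorter run times.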

As to process generation, the i5 is a split decision--the Lynnfield products run on 45 nm, while the Clarkdale and Arrandale are on 32 nm. Some of the actual dice are used for both i7 and i5 products. This is true of Arrandale and Lynnfield.

As you have chosen to hide your own hosts, I can't just look at yours, but you can: Just check the "Number of processors" entry on the host detail page.

One other thing: a host running less than a full load of Einstein will rather commonly report shorter run times. Nehalem has a lot of memory bandwidth, but the processes do conflict at both the cache and memory access level.

If you actually believe i5's have a large fundamental architectural advantage over i7's you are mistaken.

Orgil
Joined: 29 Oct 05
Posts: 55
Credit: 91242
RAC: 0

But obviously those Xeons are running E@H at less than their full capacity so they can handle basic server duties, and not just one or a few but most of the Xeons have 25% longer run times than the mobile i5. And the i7 920s all average 24k run time, which means they are mostly not overclocked and that this is their actual processing capability on E@H.

Here is a sample of the latest finished jobs from the i5-430:

Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
11 Dec 2010 16:43:40 UTC | 13 Dec 2010 18:30:58 UTC | Completed, waiting for validation | 23,561.80 | 23,103.89 | 74.95 | pending | Global Correlations S5 HF search #1 v3.06
10 Dec 2010 17:52:48 UTC | 13 Dec 2010 5:51:46 UTC | Completed and validated | 23,555.08 | 23,237.50 | 74.95 | 251.07 | Global Correlations S5 HF search #1 v3.06
10 Dec 2010 17:21:40 UTC | 13 Dec 2010 5:51:46 UTC | Completed and validated | 23,489.95 | 23,177.13 | 74.95 | 251.07 | Global Correlations S5 HF search #1 v3.06
10 Dec 2010 5:17:35 UTC | 12 Dec 2010 17:46:12 UTC | Completed and validated | 22,940.91 | 22,510.73 | 74.95 | 251.07 | Global Correlations S5 HF search #1 v3.06
10 Dec 2010 5:13:18 UTC | 12 Dec 2010 17:46:12 UTC | Completed, waiting for validation | 23,026.75 | 22,587.01 | 74.95 | pending | Global Correlations S5 HF search #1 v3.06
9 Dec 2010 4:49:57 UTC | 11 Dec 2010 19:02:50 UTC | Completed and validated | 23,237.78 | 22,846.77 | 74.95 | 251.07 | Global Correlations S5 HF search #1 v3.06
9 Dec 2010 4:48:54 UTC | 11 Dec 2010 18:52:12 UTC | Completed and validated | 23,341.11 | 22,914.92 | 74.95 | 251.07 | Global Correlations S5 HF search #1 v3.06

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2779279312
RAC: 761780

Quote:
But obviously those Xeons are running E@H at less than their full capacity so they can handle basic server duties, and not just one or a few but most of the Xeons have 25% longer run times than the mobile i5. And the i7 920s all average 24k run time, which means they are mostly not overclocked and that this is their actual processing capability on E@H.


If anyone round here understands the inner workings of Intel processors, it's archae86. If you ask nicely, he might tell you something of his CV.

Orgil
Joined: 29 Oct 05
Posts: 55
Credit: 91242
RAC: 0

My main concern is productivity per core. Of course, looking at the whole CPU, it is obvious that any CPU with more cores will have higher productivity, but per core the mobile 32 nm CPU is showing a clear advantage over 45 nm server and desktop quad CPUs.

The thread title is CPU (per) Core Productivity.

Tim Norton
Joined: 9 Jul 10
Posts: 3
Credit: 591831
RAC: 0

Ok, to add some comparable (ish) results to this discussion

i have an i7 930 and an i5 655k(multiplier unlocked) both doing Einstein

i7 http://einsteinathome.org/host/3625774/tasks

i5 http://einsteinathome.org/host/3025270/tasks

i7 is oc to 3.6 and running HT - (mem @ 180) - other two cores feeding gpu's(8no)

i5 is oc to 4.0 and running HT - (mem @ 133) - other two cores feeding gpu's (6no)

i7 is doing 6 cpu wu @ once

i5 is doing 2 cpu wu @ once

ignore the last few results from the i5 as had overclock turned down to 3.2gig

turbo mode is off on both cpu's

i5, being at a slightly higher clock speed, is quicker than the i7 by about 1000 sec - consistent over several days crunching - and i get the same times if i use fewer cores: with say 2 on the i7, times are still the same

at stock for both i7 at 2.8gig and i5 at 3.2gig - i5 is about an hour or more quicker per wu - this is from memory as results are long gone - i7 does a cpu wu in about ~22k seconds at stock and 16.5k seconds when over clocked

from limited (non scientific) testing, cpu wu are more dependent on cpu clock speed than memory speed - i.e. FLOPS bound

if i put the i5 at approx 3.8 then i would get almost the same times as i7

basically at the same clock speed the i7 is slightly faster but not by much per single thread

(have not tried turning off HT on either yet)

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7070584931
RAC: 1297863

Quote:
(have not tried turning off HT on either yet)


Great to have the useful data you already posted--better yet if you try an HT comparison.

I tried HT productivity comparison on my system a month or two ago. It is an E5620--so a Westmere die on 32nm. With the CPU overclocked at 3.4 GHz, but the memory running at nominal parameters, I saw these results:

CPU seconds per task:
app     HT      no HT
ABP2    14890   8695
GC      20822   14899

So with my system configuration, running hyperthreaded gave a 17% productivity advantage for a full load of ABP2 work, but a much larger 43% advantage for GC work. It would be interesting to see the comparison on your systems with the current app.
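For anyone who wants to check the arithmetic, here is a small Python sketch of how those percentages fall out of the CPU seconds above. It assumes a full load in both cases, so that with HT on each physical core works on two tasks at once; the function and variable names are just illustrative.

```python
# CPU seconds per task from the figures above (E5620 at 3.4 GHz).
cpu_seconds = {
    "ABP2": {"ht": 14890, "no_ht": 8695},
    "GC": {"ht": 20822, "no_ht": 14899},
}


def ht_gain(ht_s: float, no_ht_s: float, threads_per_core: int = 2) -> float:
    """Fractional throughput gain per physical core from running with HT.

    With HT a core finishes threads_per_core tasks every ht_s seconds;
    without HT it finishes one task every no_ht_s seconds.
    """
    return threads_per_core * no_ht_s / ht_s - 1.0


gains = {app: ht_gain(t["ht"], t["no_ht"]) for app, t in cpu_seconds.items()}
for app, gain in gains.items():
    print(f"{app}: {gain:.0%} more work per core with HT")
```

This prints 17% for ABP2 and 43% for GC, matching the figures quoted.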

By the way, my Westmere host is sick at the moment--it is not currently recognizing the RAM in one of the three channels. I don't know how long that has been true, but am pretty sure all three channels were alive for the results I'm reporting here.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 541744096
RAC: 169294

Orgil,

archae86, me and probably quite a few others understand perfectly well what you want to say. There's no need to repeat any of this. However, we've been pointing out a fundamental flaw in your interpretation of your data, which may totally invalidate the point you're trying to make. We need to get past that point.

In your initial post you said "both cpu's processing 2 wu's in 2 threads of a core". I tried to make you clarify what this means, as it's not unambiguous. So far you are assuming that you are doing the same work as the CPUs you are comparing to, just with 2 cores instead of 4 cores.

May I suggest you're wrong? What I think you're doing is running "1 WU on each CPU core". HT may be active for your CPU, but you're not using it. The OS is smart enough to schedule tasks on the fastest resources. That means you're finishing 2 WUs in 22 - 23 ks. If you used HT you could run 4 WUs in parallel and achieve about 43% higher throughput for GC tasks (as archae86 measured in his setup).

The last results you've shown here show nicely how your WUs always finish in pairs, not quads.

archae86's Westmere needed 14899 s for a GC WU at 3.4 GHz without HT. At 2.27 GHz I'd expect about 14899 * 3.4 / 2.27 = 22300 s on average for these WUs and this CPU architecture. Notice how this is just about what you're getting? That's because the architectures perform about the same at Einstein.
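The same estimate as a tiny Python sketch. It rests on the simplifying assumption that run time scales inversely with core clock (no memory or turbo effects), and the helper name is made up for illustration:

```python
def scale_run_time(run_time_s: float, measured_ghz: float, target_ghz: float) -> float:
    """Estimate run time at another clock, assuming time is proportional to 1/clock."""
    return run_time_s * measured_ghz / target_ghz


# archae86's non-HT GC time at 3.4 GHz, rescaled to the i5-430's 2.27 GHz.
estimate = scale_run_time(14899, measured_ghz=3.4, target_ghz=2.27)
print(f"estimated GC run time at 2.27 GHz: {estimate:.0f} s")
```

That lands at roughly 22.3 ks, in the same range as the 22.5 - 23.5 ks run times posted for the i5-430.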

If you compare to runtimes of 24 ks for i7 CPUs then these are using HT. That's a huge difference: these Quads are producing 8 WUs in almost the same time as you're doing 2 WUs with your dual core. Per core productivity is almost a factor of 2 better thanks to higher clock speeds (no offense - lower clock speeds are healthy for laptops) and the use of HT.

Running 4 GC WUs you should be able to deliver 4 of them every ~31 ks.
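A rough Python sketch of both numbers, under stated assumptions: the i7 hosts run 8 WUs on 4 cores in 24 ks, the i5 runs 2 WUs on 2 cores in 22.5 ks (the midpoint of 22 - 23 ks), and the 43% HT gain archae86 measured carries over to the i5. The function names are illustrative only.

```python
def tasks_per_core_per_ks(tasks: int, cores: int, run_time_ks: float) -> float:
    """Tasks completed per physical core per kilosecond."""
    return tasks / cores / run_time_ks


i7_rate = tasks_per_core_per_ks(tasks=8, cores=4, run_time_ks=24.0)
i5_rate = tasks_per_core_per_ks(tasks=2, cores=2, run_time_ks=22.5)
print(f"i7 per-core advantage: {i7_rate / i5_rate:.2f}x")


def ht_batch_time_ks(no_ht_ks: float, ht_gain: float, threads_per_core: int = 2) -> float:
    """Run time of a full HT load, given the non-HT time and the HT throughput gain."""
    return threads_per_core * no_ht_ks / (1.0 + ht_gain)


# Assumed: 22.3 ks per GC WU without HT, 43% throughput gain with HT.
batch_ks = ht_batch_time_ks(no_ht_ks=22.3, ht_gain=0.43)
print(f"4 GC WUs on the i5 with HT: about {batch_ks:.0f} ks per batch")
```

The first figure comes out a little under 2x and the second at about 31 ks, in line with the estimates above.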

MrS

Scanning for our furry friends since Jan 2002
