Windows S5R3 App 4.25 available for Beta Test

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: Brian, I PMed Richard 3

20 Jan 2008 12:50:07 UTC

Message 77359 in response to message 77358

Quote:

Brian, I PMed Richard 3 days ago asking about what the numbers mean. Perhaps you can tell me?

The only part I know about is the first 3 and the final number. The first 3 are indeed the frequency being checked. The last number(s), not including the 0-based index number of your result (at the end), are the sequence number of that particular frequency range. Apparently the .nn plays some role. Not sure what though... I too would like to know... :-) Perhaps someone over in Germany that seems to be awake will come along and help...

Quote:

Is the x axis the second number?

Nope, that's the last one, the sequence number.

Quote:

Yes, dual boot for all my systems (except the wife's laptop).
The Linux WUs had 101 and 102 as the second set of numbers. Windows WUs were 375, 376, 378, 379, and 380. Hmm, is there a range for these numbers?

Good, then I'll keep watching your 6000+ system. I think the range is 0-400, but I could be mistaken...

Astro

Joined: 18 Jan 05

Posts: 257

Credit: 1000560

RAC: 0

I attached my linux hosts 5

20 Jan 2008 13:03:31 UTC

Message 77360

I attached my Linux hosts 5 days ago, but only attached the Windows persona the morning before they released 4.25. Since I don't know beans, I decided to keep running 4.15 on Windows to get a "baseline". Now, if I need to collect one of every x-axis number possible to show a decent baseline, then I might as well dump 4.15 and use 4.25 instead. Would that be the right thing to do? Let's see: 2 WUs/day, 400 samples... If properly distributed, that'd take 200 days to get them all, and that's if I never rebooted to Linux, AND if I wasn't attached to 2 other projects. I wonder if 4.15 would even be around long enough to get a decent sample size, or one that could be "comparable"?
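The back-of-envelope estimate above can be written out as a small Python sketch. The function name and the `windows_fraction` parameter are hypothetical, and the calculation assumes every task received has a task number not seen before, which is the "properly distributed" best case and unlikely in practice.

```python
import math


def days_to_collect(task_numbers: int, tasks_per_day: float,
                    windows_fraction: float = 1.0) -> int:
    """Best-case number of days to see every task number once.

    task_numbers     -- how many distinct task numbers exist (assumed 400 here)
    tasks_per_day    -- tasks completed per day while booted into Windows
    windows_fraction -- hypothetical share of time spent crunching this
                        project under Windows (1.0 = never reboot to Linux)
    """
    if tasks_per_day <= 0 or not 0 < windows_fraction <= 1:
        raise ValueError("tasks_per_day must be > 0 and 0 < windows_fraction <= 1")
    return math.ceil(task_numbers / (tasks_per_day * windows_fraction))


print(days_to_collect(400, 2))        # 200, the figure quoted above
print(days_to_collect(400, 2, 0.5))   # 400 if only half the time is available
```

Any time spent in Linux or on other projects lowers `windows_fraction` and stretches the estimate accordingly.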

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2142

Credit: 2780099419

RAC: 752959

RE: Brian, I PMed Richard 3

20 Jan 2008 13:37:13 UTC

Message 77361 in response to message 77358

Quote:

Brian, I PMed Richard 3 days ago asking about what the numbers mean. Perhaps you can tell me? He pointed to two sets of numbers within the WU name: one which has a decimal (the first numbers, which appear to be some sort of frequency), and a second set which I have NO idea about. I've seen the first number referred to as a "template" by an Archae86 chart. Basically, I don't know the terminology. I see the nifty inverted "full wave" DC chart-like graphs, but don't know what the bottom line (x axis) represents. Is the x axis the second number? If I were to create a column for each in my script, what titles should I give them?

Yes, dual boot for all my systems (except the wife's laptop).
The Linux WUs had 101 and 102 from 0715.25 as the second set of numbers. Windows WUs were 375, 376, 378, 379, and 380 from 0794.15. Hmm, is there a range for these numbers?

Sorry I didn't reply to the PM, but I'm not a gravity wave scientist and, frankly, I don't know. Like everyone else, I can guess, speculate, read the news archive, and search the message boards. I've also still (just) got a couple of memory (brain) cells for emergency use when I can't think of a search phrase.

h1_0679.30_S5R2__97_S5R3a_2

h1 ... Once upon a time, we had separate data from Hanford and Livingston, so we had 'h' tasks and 'l' tasks. The S5 hierarchical search looks at data from both sites together, so this is redundant.

0679.30 ... A frequency. Expressed to two decimal places.

S5R2 ... When this data format came into use, I think.

97 ... Task number within the frequency band. Issued in decreasing order, down to base 0

S5R3a ... Major version number of the Science App intended to do the search.

2 ... Replication number (base 0). Somebody's wingman must have thrown a wobbly.
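For anyone wanting a column per field in a script, the breakdown above can be sketched in Python. The field names and the regular expression are guesses inferred from the single example name in this post, not anything taken from the project's own code, so names in other formats may not match.

```python
import re
from typing import NamedTuple


class ResultName(NamedTuple):
    """Fields of a result name; all field names here are hypothetical."""
    detector: str      # "h1" in the example
    frequency: float   # 679.30 in the example
    data_format: str   # "S5R2"
    task_number: int   # 97, the X-axis value on the graphs
    app_version: str   # "S5R3a"
    replication: int   # 2 (base 0)


# Pattern inferred from the one example name (an assumption).
_NAME_RE = re.compile(
    r"^(?P<detector>[a-z]\d)"
    r"_(?P<frequency>\d+\.\d{2})"
    r"_(?P<data_format>[A-Za-z0-9]+)"
    r"__(?P<task_number>\d+)"
    r"_(?P<app_version>[A-Za-z0-9]+)"
    r"_(?P<replication>\d+)$"
)


def parse_result_name(name: str) -> ResultName:
    """Split a result name into its six fields, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised result name: {name!r}")
    return ResultName(
        detector=match["detector"],
        frequency=float(match["frequency"]),
        data_format=match["data_format"],
        task_number=int(match["task_number"]),
        app_version=match["app_version"],
        replication=int(match["replication"]),
    )


print(parse_result_name("h1_0679.30_S5R2__97_S5R3a_2"))
```

Keeping `frequency` and `task_number` as separate columns makes it easy to group results by frequency before plotting runtime against task number.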

The graphs Peter and I have been posting have the field I have called "Task number" along the X-axis. I don't think it's worth combining tasks from multiple (data) frequencies in any sort of crunch-time analysis: as we saw very early in the S5R3 analysis, runtimes oscillate, but the frequency of oscillation is different at different search (data) frequencies. You might like to add that whole thread to your reading list.

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: Sorry I didn't reply

20 Jan 2008 13:48:11 UTC

Message 77362 in response to message 77361

Quote:

Sorry I didn't reply to the PM, but I'm not a gravity wave scientist and, frankly, I don't know.

...nor are you German... (I was referring to Bikeman)...

Astro

Joined: 18 Jan 05

Posts: 257

Credit: 1000560

RAC: 0

still...two remaining

20 Jan 2008 13:52:35 UTC

Message 77363

still...two remaining questions:

Is the range of task numbers finite? I.e., never higher than X, and never negative?

How is the oscillation frequency determined (60 Hz, 45 Hz, etc.)? Is it a function of the base frequency? Is it calculated from the resultant chart data peaks/troughs? Measured P-P (peak to peak), or every other P as if the second half of the wave wasn't rectified? I.e., a peak occurs every 45 tasks?

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2142

Credit: 2780099419

RAC: 752959

RE: still...two remaining

20 Jan 2008 13:59:56 UTC

Message 77364 in response to message 77363

Quote:

still...two remaining questions:

Is the range of task numbers finite? I.e., never higher than X, and never negative?

How is the oscillation frequency determined (60 Hz, 45 Hz, etc.)? Is it a function of the base frequency? Is it calculated from the resultant chart data peaks/troughs? Measured P-P (peak to peak), or every other P as if the second half of the wave wasn't rectified? I.e., a peak occurs every 45 tasks?

Read the thread I linked. At the time I posted, it was a completely unknown artefact of the search process (unplanned and unexpected), even to Bernd. But perhaps he knows by now.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 691961778

RAC: 170785

Hi! You'll find some infos

20 Jan 2008 15:13:42 UTC

Message 77365

Hi!

You'll find some info in the "Visualization" thread that was also recommended to you.

The task number is directly related to the region of the sky that the respective WU deals with. The zero-numbered WU always seems to start at a pole of the starsphere coordinate system, so to speak. The following tasks traverse the sky towards the equator, and then towards the opposite pole. Then it starts all over with other parameters.

Tasks that look at sky regions near the poles are slower than those near the equator. Because the "space" to look at gets smaller at the pole (the circumference of the ring of points that are investigated gets smaller), the "angular speed" of the search increases near the poles, and I guess that's what produces the steep slope of the graphs near the maxima.

For different search frequencies (the first number in the WU name), the algorithm traverses the sky at different speeds, so it takes more tasks to cover the whole sky at higher frequencies ==> longer period of the runtime oscillation.
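One way to put a number on that period from collected results is to measure the spacing between runtime peaks. The following is only a minimal Python sketch: the function names are hypothetical, the sample data is synthetic (a made-up curve with an assumed period of 45 tasks, not real results), and real runtimes are noisy enough that some smoothing would probably be needed before peak picking.

```python
import math
from statistics import mean


def peak_task_numbers(runtimes):
    """Return task numbers whose runtime is strictly above both neighbours.

    runtimes maps task number -> runtime in seconds, for ONE search frequency.
    """
    ordered = sorted(runtimes.items())
    peaks = []
    for (_, before), (task, here), (_, after) in zip(ordered, ordered[1:], ordered[2:]):
        if here > before and here > after:
            peaks.append(task)
    return peaks


def mean_peak_spacing(runtimes):
    """Average distance, in task numbers, between consecutive runtime peaks."""
    peaks = peak_task_numbers(runtimes)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to measure a spacing")
    return mean(later - earlier for earlier, later in zip(peaks, peaks[1:]))


# Synthetic illustration only: slow tasks at multiples of an assumed period.
ASSUMED_PERIOD = 45
synthetic = {
    n: 30000.0 + 10000.0 * abs(math.cos(math.pi * n / ASSUMED_PERIOD))
    for n in range(135)
}

print(peak_task_numbers(synthetic))   # [45, 90]
print(mean_peak_spacing(synthetic))   # 45
```

Because the period depends on the search frequency, the input should contain results from a single frequency only.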

Anyway, I guess we should continue this discussion in the S5R3 sticky thread, because we are getting a bit off topic here.

Bikeman

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 691961778

RAC: 170785

As to the Win beta

20 Jan 2008 15:38:28 UTC

Message 77366

As to the Win beta app:

Graphics work like a charm, even under Windows Vista.

As to the speed, I did some profiling and disassembling and as Bernd has already mentioned, the Microsoft compiler just ruined the hot-loop, even worse than gcc did for the latest beta app :-(

In addition, the Compiler emits code that is really not so hot when it comes to copying or initializing double precision floating point data:

For those who are familiar with assembly language programming:

Quote:

....
mov esi, DWORD PTR [eax]
mov eax, DWORD PTR [eax+04h]
mov DWORD PTR [esp+0b4h], eax
mov DWORD PTR [esp+0b0h], esi
fld QWORD PTR [esp+0b0h]
mov DWORD PTR [esp+090h], ecx
mov DWORD PTR [esp+094h], ecx
fld QWORD PTR [esp+090h]
mov DWORD PTR [esp+070h], ecx
mov DWORD PTR [esp+074h], ecx
fld QWORD PTR [esp+070h]
....

Here we see a stunning series of three consecutive instances of what is (IIRC) called a "store forwarding stall". Very expensive. I really don't know what the compiler had on his mind when he was writing that :-), but these little instructions alone might be responsible for an overall 4...5% performance loss. And there are more spots like this. I wonder if there's a compiler switch to prevent this.

CU
Bikeman

Akos Fekete

Joined: 13 Nov 05

Posts: 561

Credit: 4527270

RAC: 0

RE: mov esi, DWORD

20 Jan 2008 17:07:15 UTC

Message 77367 in response to message 77366

Quote:

mov esi, DWORD PTR [eax]
mov eax, DWORD PTR [eax+04h]
mov DWORD PTR [esp+0b4h], eax
mov DWORD PTR [esp+0b0h], esi
fld QWORD PTR [esp+0b0h]
mov DWORD PTR [esp+090h], ecx
mov DWORD PTR [esp+094h], ecx
fld QWORD PTR [esp+090h]
mov DWORD PTR [esp+070h], ecx
mov DWORD PTR [esp+074h], ecx
fld QWORD PTR [esp+070h]

I really don't know what the compiler had on his mind when he was writing that :-)

Each compiler has to follow a scheme, because compilers don't have any intuition.

1. Initialize local variables

Quote:

mov esi, DWORD PTR [eax]
mov eax, DWORD PTR [eax+04h]
mov DWORD PTR [esp+0b4h], eax
mov DWORD PTR [esp+0b0h], esi
mov DWORD PTR [esp+090h], ecx
mov DWORD PTR [esp+094h], ecx
mov DWORD PTR [esp+070h], ecx
mov DWORD PTR [esp+074h], ecx

2. Compile calculations

Quote:

fld QWORD PTR [esp+0b0h]
fld QWORD PTR [esp+090h]
fld QWORD PTR [esp+070h]

3. Optimization (interleaving of integer/FPU operations)

Quote:

mov esi, DWORD PTR [eax]
mov eax, DWORD PTR [eax+04h]
mov DWORD PTR [esp+0b4h], eax
mov DWORD PTR [esp+0b0h], esi
fld QWORD PTR [esp+0b0h]
mov DWORD PTR [esp+090h], ecx
mov DWORD PTR [esp+094h], ecx
fld QWORD PTR [esp+090h]
mov DWORD PTR [esp+070h], ecx
mov DWORD PTR [esp+074h], ecx
fld QWORD PTR [esp+070h]

So, this is an optimized result.
But there is always a faster solution, of course.

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: But, always is a

20 Jan 2008 23:01:44 UTC

Message 77368 in response to message 77367

Quote:

But there is always a faster solution, of course.

Let's hope so...

My "0" result finished in 50,579.88. I'm guessing that translates to a 10-12% drop for me, although it is a guess, as I don't have all these data plots to go on. It could be less than that, perhaps even around the 7% that you stated happened on a Sempron...

Anyway, tough call on whether or not to make the app official. Seems completely stable, but the performance drop could increase the incidence of systems missing the deadline. Could you please ask Bernd to consider increasing the deadline up to 18 days temporarily?

Thanks....
