S5R3

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2783119244
RAC: 713232

This will be the final graph in this series - unfortunately, the scheduler didn't cooperate fully.


I got a continuous run down to result __16, and as you can see the shape of each repetition is remarkably consistent. And although I didn't get down to result __0 to prove it, I think my prediction that the phase of the curve is 'maximum runtime is at result zero' is still plausible.

The scheduler now wants me to be wingman for some results in progress on a neighbouring frequency, so I'll switch to the Beta app while I have the chance (I'm pretty sure the Beta is faster on these boxes, so it would have messed up the timing chart if I'd switched earlier). It'll be nice to have the company, and some credits - my nearest wingman is some 40 results back, and I've got almost 21,000 credits pending on this one host!

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4274
Credit: 245423444
RAC: 11596

Update:

We have found a model to predict the run-time variation of tasks with an equal number of templates (i.e. tasks that currently get the same credit). Note that the credit assigned is already correct on average over all the workunits of a base frequency.

We are working on integrating this model into our Workunit generator to apply a correction to the credit assigned to each WU, so that the credit roughly matches the expected run-time even for a single task. However, since the amplitude of the run-time variation differs between platforms, CPU types and even the speed of the machines' memory (interface), it's practically impossible to make this completely accurate.
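The model itself isn't described here, so this is only a minimal sketch of the normalisation idea, assuming some model supplies a predicted runtime per task (`predicted_runtimes` and `corrected_credits` are hypothetical names, not the actual WU generator code). Scaling the flat credit by each task's runtime relative to the mean keeps the average over a base frequency unchanged, which is the property mentioned above, while letting single tasks follow their expected runtime.

```python
def corrected_credits(base_credit, predicted_runtimes):
    """Scale a flat per-task credit by each task's predicted runtime.

    predicted_runtimes is a hypothetical input: one runtime estimate per task
    of the same base frequency (any unit, only the ratios matter).
    The mean of the returned credits equals base_credit, so the average over
    all tasks is unchanged; only the split between tasks moves.
    """
    if not predicted_runtimes:
        return []
    mean_runtime = sum(predicted_runtimes) / len(predicted_runtimes)
    return [base_credit * rt / mean_runtime for rt in predicted_runtimes]


# Example: two "pole" tasks predicted to take twice as long as two "equator" tasks.
credits = corrected_credits(100.0, [2.0, 1.0, 1.0, 2.0])
print([round(c, 2) for c in credits])  # [133.33, 66.67, 66.67, 133.33]
```

As noted above, the real difficulty is the prediction itself, since the amplitude depends on the host; the normalisation step is the easy part.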

BM

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 27

That's interesting, and would seem to imply that there's a fair amount of room to tweak this app. Akos's S4/S5R1 apps were tweaked to the point that they managed to keep their working sets in the 32 KB L1 data cache of an Athlon, and only barely overflowed the much smaller 8 or 12 KB data cache of a P4.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692127185
RAC: 119955

Message 73315 in response to message 73314

Quote:
That's interesting, and would seem to imply that there's a fair amount of room to tweak this app. Akos's S4/S5R1 apps were tweaked to the point that they managed to keep their working sets in the 32 KB L1 data cache of an Athlon, and only barely overflowed the much smaller 8 or 12 KB data cache of a P4.

OK, but you can't compare the S5R1 and S5R2/3 apps 1:1 because the "hierarchical search" apps in R2 and R3 added a complete processing stage, see this message by one of the project scientists.

Quote:


The key step starting with S5R2 was to move part of the "post processing" from our server to the E@H hosts: previous searches performed one (or two) "F-statistic" searches on the host before sending back the results. These searches were performed over a number (between 17 and 60 in different runs) of different time stretches ("stacks"), which we combined using a "coincidence scheme" in the post-processing stage on the server. The amount of data (i.e. the number of candidates) that can be allowed to be sent back from each host to the server is limited, and it turned out that this was the main factor holding back our achievable sensitivity.

The new "Hierarchical" search scheme, used since S5R2, performs F-statistic searches over 84 different stacks, then combines the results by a sophisticated coincidence scheme ("Hough transform") on the host, and only *then* sends back the results to the server. This avoids the data-returning bottleneck of previous runs and substantially increases the expected sensitivity (by about a factor of 6!)

CU

Bikeman

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4274
Credit: 245423444
RAC: 11596

Message 73316 in response to message 73315

Quote:
Quote:
The new "Hierarchical" search scheme, used since S5R2, performs F-statistic searches over 84 different stacks, then combines the results by a sophisticated coincidence scheme ("Hough transform") on the host, and only *then* sends back the results to the server.


And yes, it's this coincidence step, which wasn't present in S5R1 and earlier, that causes the run-time variation, the memory bandwidth dependency and some other unexpected behavior.
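To make "coincidence" a bit more concrete, here is a toy "number count" sketch: each stack contributes a list of candidates above some F-statistic threshold, and a cell of parameter space scores one vote per stack that has a candidate in it. This is a deliberately simplified stand-in, not the actual Hough transform in the app, which maps candidates into sky position and spindown and accounts for Doppler shifts; here only frequency is binned, and `coincidence_counts`, `cell_width` and the input format are hypothetical. It also doesn't model the memory behaviour Bernd mentions.

```python
import math
from collections import Counter


def coincidence_counts(stacks, cell_width):
    """Toy "number count" coincidence over several stacks.

    stacks: one list of candidate frequencies (Hz) per stack, assumed to be
            already above some F-statistic threshold (hypothetical format).
    cell_width: width of a frequency cell in Hz (hypothetical parameter).
    Returns a Counter: cell index -> number of stacks with at least one
    candidate in that cell.
    """
    counts = Counter()
    for candidates in stacks:
        # Use a set so each stack votes at most once per cell.
        cells = {math.floor(f / cell_width) for f in candidates}
        counts.update(cells)
    return counts


stacks = [[100.01, 100.5], [100.02], [100.03, 101.7]]
counts = coincidence_counts(stacks, cell_width=0.5)
best_cell, votes = counts.most_common(1)[0]
print(best_cell * 0.5, votes)  # 100.0 3
```

A signal that shows up weakly in many stacks collects many votes in one cell, while noise spreads its votes around, which is why combining 84 stacks on the host can beat sending back a limited list of per-stack candidates.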

BM

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

This is the thread which Richard gave me for learning material. I'd never have found it without a link. Are there others?

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

Moved to this thread as requested.

I have read ALL linked threads PRIOR to posting subsequent queries. I've even unchecked "reorder stickies" in case I've overlooked something. As proof that I've done a bit of reading:

Someone should update the "return by" date for résumés for the sysadmin position (as shown on the front page), as it's really old, or perhaps remove the advertised sysadmin position. I've looked in many places for my answers.
Bikeman wrote:

Quote:

Hi!

You'll find some info in the "Visualization" thread that was also recommended to you.

Read it.

Quote:

The task number is directly related to the region of the sky that the respective WU deals with. The zero-numbered WU always seems to start at a pole of the starsphere coordinate system, so to speak. The following tasks traverse the sky towards the equator, and then towards the opposite pole. Then it all starts over with other parameters.

Tasks that look at sky regions near the poles are slower than those near the equator. Because the "space" to look at gets smaller at the poles (the circumference of the ring of points being investigated shrinks), the "angular speed" of the search increases near the poles, and I guess that's what produces the steep slope of the graphs near the maxima.

For different search frequencies (the first number in the WU name), the algorithm traverses the sky at different speeds, so it takes more tasks to cover the whole sky at higher frequencies ==> longer period of the runtime oscillation.

CU

Bikeman


This explains much, like the symmetrical similarity and the reason for the waveform.

So, the starting "task number" is at the pole, the halfway point is the equator, and it finishes at the other pole.

OK, I still don't know the "agreed upon" naming convention for those two sets of numbers. It looks like "frequency" is the first and "task number" is the second. I'd hate to produce data and mislabel it, which is really why I asked in the first place. Is there a max "task number"? Currently my highest is 555. This would define the "width" of a chart, and is the reason I'm asking. It would also let me know whether my data starts in the middle of a run or at the beginning (or would it?). Is the number of "tasks" variable?

The quantity of "task numbers" also lets me know whether I should bother continuing to try to get 4.15 samples. It appears from Bernd's comments that he's in favor of switching the "recommended" application to 4.25. So, with that in mind, perhaps I should drop 4.15 and concentrate on 4.25? Perhaps I haven't the time to get enough 4.15 samples to be meaningful/useful.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692127185
RAC: 119955

Message 73319 in response to message 73318

Quote:

So, the starting "task number" is at the pole, the halfway point is the equator, and it finishes at the other pole.

Yup

Quote:

OK, I still don't know the "agreed upon" naming convention for those two sets of numbers. It looks like "frequency" is the first and "task number" is the second. I'd hate to produce data and mislabel it.


Correct. Here, frequency is the frequency of the gravitational wave we hope to help find in this WU. That would be twice the spin rate of a pulsar producing this wave.

Quote:

Is there a max "task number"? Currently my highest is 555. This would define the "width" of a chart, and is the reason I'm asking.


Hmm, don't know. I guess < 1000 is a safe assumption for currently distributed WUs.

Quote:

It would also let me know whether my data starts in the middle of a run or at the beginning (or would it?). Is the number of "tasks" variable?


I think so: higher frequencies, more tasks. (?)

Quote:

The quantity of "task numbers" also lets me know whether I should bother continuing to try to get 4.15 samples. It appears from Bernd's comments that he's in favor of switching the "recommended" application to 4.25. So, with that in mind, perhaps I should drop 4.15 and concentrate on 4.25? Perhaps I haven't the time to get enough 4.15 samples to be meaningful/useful.

Here's an idea: you can use the graphics window to tell which region of the sky the current WU is about. See the figure labeled DE in the lower right-hand corner? That's declination: 0 means the equator, +/-90 means the poles. This might help to give you an idea of where in the task cycle your current WU is and what runtime to expect.
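One way to turn a DE reading into "how far through the cycle" is a small sketch under two assumptions that are mine, not the project's: the sweep runs from DE = +90 to DE = -90, and every task covers the same area of sky. The fraction of the sweep done is then the area of the spherical cap above the current declination, (1 - sin DE) / 2. The function names are hypothetical.

```python
import math


def cycle_fraction_from_declination(dec_deg):
    """Fraction of one pole-to-pole sweep completed at a given DE reading.

    Assumes (hypothetically) that the sweep runs from DE = +90 to DE = -90
    and that every task covers the same area of sky, so the fraction equals
    the area of the spherical cap above dec_deg divided by the whole sphere.
    """
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must be between -90 and +90 degrees")
    return (1.0 - math.sin(math.radians(dec_deg))) / 2.0


def declination_at_fraction(frac):
    """Inverse of cycle_fraction_from_declination, for 0 <= frac <= 1."""
    return math.degrees(math.asin(1.0 - 2.0 * frac))


print(round(cycle_fraction_from_declination(30.0), 3))  # 0.25
print(round(declination_at_fraction(0.05), 1))         # 64.2
print(round(declination_at_fraction(0.45), 1))         # 5.7
```

Under the equal-area assumption, the first 5% of a sweep moves DE from 90 down to about 64, while the 5% just before the equator covers under 6 degrees, which fits the point above about the search moving faster near the poles. If the real sweep starts at the other pole, flip the sign of DE before using it.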

CU

Bikeman

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1000560
RAC: 0

Looking at Richard's chart:


Are you saying that the trip from pole to pole is represented by the data from the left to the right side of the chart, or from peak to peak, or something else entirely?

You stated that as the first number (frequency) rises... No, wait a minute. I think you're saying that the number of tasks is NOT fixed, but increases with the frequency. Is there a max? I read what you wrote as the "CPU run time" increases as freq increases. I wonder if the peak-to-trough amplitude also responds in a predictable way. Anyway, perhaps I should just find an app they'll continue to use and then get and study the relationships myself.

I don't seem to express my interest very well.

tony

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692127185
RAC: 119955

Message 73321 in response to message 73320

Quote:

I don't seem to express my interest very well.

I guess it's me being in weekend mode :-), sorry.

Quote:


Looking at Richard's chart:


Are you saying that the trip from pole to pole is represented by the data from the left to the right side of the chart, or from peak to peak, or something else entirely?

At the peaks, the WU looks at the poles.
Left to right represents Pole->Equator->other Pole, Pole->Equator->other Pole ....
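Since peak to peak is one pole-to-pole sweep, the sweep length (in tasks) for a given frequency can be read off a runtime series without knowing the max task number. A minimal sketch, assuming runtimes are listed for consecutive task numbers with no gaps: compute the correlation of the series with itself shifted by each lag, wait for the anti-correlated region (pole tasks lined up against equator tasks), then take the first strong local maximum. The function names and the `min_corr` threshold are hypothetical choices, not anything the project uses.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_period(runtimes, min_corr=0.5):
    """Estimate the peak-to-peak spacing, in tasks, of a runtime series.

    runtimes must be ordered by task number with no gaps (a requirement of
    this sketch). Returns None if no repeat is found within half the length
    of the series.
    """
    max_lag = len(runtimes) // 2
    corr = {lag: pearson(runtimes[:-lag], runtimes[lag:])
            for lag in range(1, max_lag + 1)}
    dipped = False
    for lag in range(2, max_lag):
        c = corr[lag]
        if c < 0:
            # Around half a sweep: pole tasks line up with equator tasks.
            dipped = True
        elif dipped and c >= min_corr and c >= corr[lag - 1] and c >= corr[lag + 1]:
            return lag  # first strong local maximum after the dip
    return None


# Synthetic data only: a |cos| shape repeating every 20 tasks, 5 repeats.
shape = [abs(math.cos(math.pi * k / 20)) for k in range(20)]
print(estimate_period(shape * 5))  # 20
```

The series needs to cover at least two full sweeps for the lag search to see a repeat, and real runtimes are noisier than this synthetic curve, so a result is worth checking against the chart by eye.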

Quote:

You stated that as the first number (frequency) rises... No, wait a minute. I think you're saying that the number of tasks is NOT fixed, but increases with the frequency. Is there a max?

There will be a max for the task number, since the set of WUs is finite :-), but I don't know what it is. Assume 1000 for now.

Quote:

I read what you wrote as the "CPU run time" increases as freq increases.

No, at least not in terms of credits/CPU second. The main fluctuation comes from the sky-position thing described above.

Quote:


I wonder if the peak-to-trough amplitude also responds in a predictable way. Anyway, perhaps I should just find an app they'll continue to use and then get and study the relationships myself.

tony

This is where it gets really nasty (or interesting): the relative difference between min and max CPU time will depend on the compiler used, the CPU brand, memory bandwidth, etc.!!!

What seems to be known by now is that (very simplified) floating-point performance is the "base" of the overall performance: it determines how low the minimum runtime can get. Memory latency and throughput determine the difference between minimum and maximum. Systems with fast floating point (and code with vector optimizations) but slow memory should see the biggest relative spread.

On multicore/multi-CPU systems, the relative spread should be bigger when running multiple instances than when running on a single core only (if the cores share the same bus).

That's the theory, at least.
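The theory above can be written as a toy linear model: runtime = FP part (same for every task) + memory part × contention × work, where "work" is bigger for pole tasks than for equator tasks. Everything here is a hypothetical illustration: `HostModel`, the parameter values and the 2:1 pole/equator work ratio are made up, not measured. It just shows why faster FP or slower/shared memory should widen the relative spread.

```python
from dataclasses import dataclass


@dataclass
class HostModel:
    """Very simplified runtime model; all parameters are hypothetical.

    fp_seconds:     floating-point-bound time, the same for every task
    mem_seconds:    memory-bound time per unit of coincidence "work"
    bus_contention: > 1.0 when several tasks share one memory bus
    """
    fp_seconds: float
    mem_seconds: float
    bus_contention: float = 1.0

    def runtime(self, work):
        return self.fp_seconds + self.mem_seconds * self.bus_contention * work


def relative_spread(host, work_min=1.0, work_max=2.0):
    """(max - min) / min runtime between an 'equator' and a 'pole' task."""
    t_min = host.runtime(work_min)
    t_max = host.runtime(work_max)
    return (t_max - t_min) / t_min


fast_fp = HostModel(fp_seconds=20000, mem_seconds=5000)
slow_fp = HostModel(fp_seconds=40000, mem_seconds=5000)
shared_bus = HostModel(fp_seconds=20000, mem_seconds=5000, bus_contention=1.5)
for name, host in [("fast FP", fast_fp), ("slow FP", slow_fp), ("shared bus", shared_bus)]:
    print(name, round(relative_spread(host), 3))
# fast FP 0.2
# slow FP 0.111
# shared bus 0.273
```

Halving the FP time with the same memory nearly doubles the relative spread, and contention on a shared bus widens it further, matching the two predictions above. Real hosts won't be this linear.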

CU
Bikeman

