Petition - Deadline Relief for Longest Results


Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 71995 - Posted 16 Jul 2007 0:59:33 UTC

Many of you no doubt will be aware of the spate of "monster" results that have been seen recently. You will also recall that Bernd posted a link to a graph in PDF format showing in general terms how the expected crunch time would change based on frequency.

Over a period now, many participants have aired concerns about the deadline stress that the longer running work is causing. People with slower machines and people supporting multiple projects are most affected.

Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

My intention in starting this petition is NOT to invite people to start a bitch session. Be warned that I will delete posts that engage in flaming/bitching. I'm sure that the Devs are fully aware of all the arguments that you might dream up and feel like venting here. Please don't do that. So you only need to post (vote) once and keep it short and courteous.

Please realise that all this petition will (and should) do is provide a convenient mechanism for alerting the Devs to the strength of numbers of people who need deadline relief. Based on those numbers and the needs of the project, the Devs will take whatever action they feel is appropriate. That's the way it should be.

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?
____________
Cheers,
Gary.

RandyC
Joined: Jan 18 05
Posts: 319
ID: 3454
Credit: 1,949,162
RAC: 1,872
Message 72002 - Posted 16 Jul 2007 1:47:08 UTC - in response to Message 71995.
Last modified: 16 Jul 2007 1:48:49 UTC

1. The stated ranges seem appropriate
2. For me personally, the current deadlines are adequate

However, let me add the following:


  • All of my systems run 24/7
  • I have a broadband internet connection--always on
  • My slowest system is an AMD XP1700+
  • I do not micro-manage the BOINC Client
  • All my systems are connected to two projects (not all have the same two)
  • All my systems run WinXP Pro



My personal recommendation to the Devs is that (if possible), WUs having the below frequencies should only be sent to 'fast' systems, similar to how the S5R1 WUs were split out between fast and slow systems based on frequency range. I leave it to the Project to decide what a fast vs slow system is.


Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


[edit] strip excessive quote for readability [/edit]

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72030 - Posted 16 Jul 2007 11:25:04 UTC - in response to Message 71995.


Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


I think the most urgent candidates are the "monster WUs" above roughly 530 Hz (?), worth up to about 650 credits.

What would be a reasonable deadline for them?

My idea is the following:
A reasonably fast PC (something that is not considered an oldtimer) should be able to crunch one of these within the deadline, with a 25 % safety margin, if it is active 5 days a week and 8 hours a day and gives 50 % of its CPU time to E@H during this period (allowing time for other BOINC projects and "real" work on this PC!).

Let's say a reasonably fast machine is a machine in the league of (say) 10 credits / hour.

==> 650 / 10 = 65 CPU hours per monster WU ~ 130 hours wall clock time while computer is on.

==> ca. 16 "working days" (without safety margin) or 20 working days with a 25 % safety margin.

==> Under the above assumptions which are even a bit optimistic IMHO, the "reasonable" deadline for the monster WU would be 4 calendar weeks, not two.

If anybody thinks this is too long, we would either have to find a bug in my math or identify the assumptions that are wrong :-).
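For anyone who wants to check the math or swap in different assumptions, here is a minimal sketch of the estimate above. The function and parameter names are made up for illustration; the default figures are just the assumptions stated in this post (650 credits, 10 credits/hour, 50 % CPU share, 8 hours a day, 5 days a week, 25 % margin).

```python
def required_deadline(credits, credits_per_cpu_hour, cpu_share,
                      hours_per_day, days_per_week, safety_margin):
    """Return the intermediate and final figures of the deadline estimate."""
    cpu_hours = credits / credits_per_cpu_hour        # pure crunch time
    wall_hours = cpu_hours / cpu_share                # PC must be on this long
    working_days = wall_hours / hours_per_day         # days the PC is in use
    days_with_margin = working_days * (1 + safety_margin)
    calendar_weeks = days_with_margin / days_per_week
    return cpu_hours, wall_hours, working_days, days_with_margin, calendar_weeks


cpu_h, wall_h, days, days_margin, weeks = required_deadline(
    credits=650, credits_per_cpu_hour=10, cpu_share=0.5,
    hours_per_day=8, days_per_week=5, safety_margin=0.25)

print(f"{cpu_h:.0f} CPU hours, {wall_h:.0f} wall clock hours")
print(f"{days:.2f} working days, {days_margin:.2f} with margin")
print(f"{weeks:.2f} calendar weeks")
```

Changing `cpu_share` or `hours_per_day` shows quickly how sensitive the result is to the assumptions.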

CU

BRM


____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 72038 - Posted 16 Jul 2007 12:45:09 UTC

Admittedly, I cut back my Einstein resource share about a year ago in favor of other projects. At that same time, I also cut back on my message board participation, so I am not "up to speed" on the issues anymore. That being said, I believe that a 2-week deadline for the "average S5R2 unit" is way too short and should be at least doubled. Doing so would bring Einstein deadlines more in line with those of other projects in which I participate. As to the question of "Longest Results", I would suggest: 6 weeks.
____________

Profile Svenie25
Joined: Mar 21 05
Posts: 117
ID: 62489
Credit: 538,288
RAC: 13
Message 72039 - Posted 16 Jul 2007 12:51:36 UTC

I don't know how difficult it is to implement a variable deadline like at SETI. But this would be my favourite way. I personally don't have any problems with the deadlines, running only 2 projects with 50/50 share on a 24/7 system.
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72041 - Posted 16 Jul 2007 13:23:45 UTC
Last modified: 16 Jul 2007 13:25:32 UTC

Just two things to mention: Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72045 - Posted 16 Jul 2007 13:34:08 UTC - in response to Message 72041.
Last modified: 16 Jul 2007 13:35:03 UTC

Just two things to mention: Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM


To emphasize how serious the situation with the "monster WUs" currently is, let's estimate how many of the active hosts will actually be capable of crunching them in time:

For 650 credits within 14 days, you need an average of about 50 credits per day (allowing a day of buffer between download and the actual beginning of crunching).

Looking at boincstats.com, you will see that currently only about 32,500 hosts participating in E@H have an average of 50 credits/day or more. Depending on the definition of "active host", there are about 60k to 75k hosts active on E@H now. So we are talking about a third to half of the hosts that won't make it.

So I really do think that the "monster WUs" carry a dramatic potential for user frustration that could badly hurt the E@H user base if not acted upon.
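The estimate can be reproduced with a few lines, taking the figures quoted in this post at face value (about 32,500 hosts at or above the threshold, and 60k to 75k active hosts). The helper name is hypothetical and the host counts are rough readings, so treat the result as an order-of-magnitude check only.

```python
def share_unable(hosts_fast_enough, active_hosts):
    """Fraction of active hosts below the required daily credit rate."""
    return 1 - hosts_fast_enough / active_hosts


# 14-day deadline minus one buffer day leaves 13 days of crunching.
required_per_day = 650 / 13

low = share_unable(32_500, 75_000 - 15_000)   # 60k active hosts
high = share_unable(32_500, 75_000)           # 75k active hosts

print(f"required: {required_per_day:.0f} credits/day")
print(f"hosts that would miss the deadline: {low:.0%} to {high:.0%}")
```

With exactly these inputs the share comes out at somewhat under half for the 60k count and somewhat over half for the 75k count.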

CU

BRM

____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72058 - Posted 16 Jul 2007 17:27:24 UTC - in response to Message 71995.
Last modified: 16 Jul 2007 17:37:12 UTC


.
.
.
Over a period now, many participants have aired concerns about the deadline stress that the longer running work is causing. People with slower machines and people supporting multiple projects are most affected.

Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

.
.
.

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


1.) Yes, those are the ranges which give the most trouble for older hosts/multiple projects.

2.) Based on my observations, at least 3 weeks for the current app 'efficiency'.

@ Bernd: Agreed, loosening the tightness factor will increase the DB load at your end; however, I think doubling it is the worst-case scenario, and in practice the increase would be far less than that.

OTOH, you have to weigh that against the 'bad taste' left in a participant's mouth when the project sends a slowhost a result that is marginal in meeting the deadline, it turns out that it will overrun by a few days, and it then gets unconditionally aborted a few hours before completing because the reissue went to a small-cache 'rocket' host.

IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.

AFA instant gratification goes, I think everybody knows my opinion on that. ;-)

Regards,

Alinator

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72060 - Posted 16 Jul 2007 17:50:23 UTC - in response to Message 72058.


IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.


I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

CU

BRM

____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72061 - Posted 16 Jul 2007 17:56:29 UTC - in response to Message 72060.


IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.


I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

CU

BRM


Using RAC might work for EAH due to the pretty close 'uniformity' of the work in general.

However there are a couple of problems with RAC generally in this application.

1.) It tends to break down when the host is not run 24/7. IOW, variable time crunchers would/may have a problem.

2.) WU failures would tend to have more impact on scheduling decisions than they really should for all hosts.

Alinator

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72071 - Posted 16 Jul 2007 19:42:33 UTC - in response to Message 72060.
Last modified: 16 Jul 2007 19:49:24 UTC

I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?
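To make the idea concrete, here is a minimal sketch of how a generator could attach a size-dependent deadline when it creates a workunit. This is not the actual Einstein@Home workunit generator: the `Workunit` class, the `make_workunit` function, the scaling of 4 days per 100 credits and the 14-day floor are all assumptions chosen for illustration. The only point it shows is that the deadline is computed from the workunit's "size" alone, with no host information involved.

```python
import math
from dataclasses import dataclass


@dataclass
class Workunit:
    name: str
    credits: float        # "size" of the workunit as known to the generator
    deadline_days: int    # deadline length attached at creation time


def make_workunit(name, credits, days_per_100_credits=4.0, min_days=14):
    """Create a workunit whose deadline grows with its size (hypothetical rule)."""
    scaled = math.ceil(credits / 100 * days_per_100_credits)
    return Workunit(name, credits, max(min_days, scaled))


small = make_workunit("wu_small", credits=200)
monster = make_workunit("wu_monster", credits=650)
print(small.deadline_days, monster.deadline_days)
```

Under these made-up parameters a small workunit keeps the standard 14 days and only the largest ones get more.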

BM

Profile Svenie25
Joined: Mar 21 05
Posts: 117
ID: 62489
Credit: 538,288
RAC: 13
Message 72072 - Posted 16 Jul 2007 19:57:01 UTC - in response to Message 72071.

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM


I think so. This should be the same system as at SETI, I think. There is also a variable deadline based on the size of the WU.
____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72073 - Posted 16 Jul 2007 20:09:26 UTC - in response to Message 72071.
Last modified: 16 Jul 2007 20:13:03 UTC

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM


Agreed, and when the scheduler decides whether or not to send a given WU to a host, it has the reported performance and other metrics for the host to use for that purpose.

However, keep in mind you would only 'need' to use variable deadlines if you intended to send the whole spectrum of template frequencies to all hosts so that the tightness factor was constant over that range.

If you were to stick with the 'slowhost/fasthost' method used in S5R1/I, then the main factor would be the trigger points for the ranges. Of course the downside to that with a fixed two week deadline is you will progressively raise the bar for who can participate as the work gets 'tougher'. IOW, slowhosts would only be able to run a smaller fraction of the work, all other things being equal.

So it seems to me that the choice of strategies boils down to how much the extra load from either method would impact the DB backend. Although I can't say for sure, I would think that just bumping the deadline a week would have less effect than variable deadlines at fixed tightness factor would, since my data indicates it could easily take a host at the low end of the speed spectrum a month to run the 'toughies'.

<edit> Thinking about it, since we're Beta right now anyway, wouldn't it be easier to test the 3 week deadline theory at this point than variable deadlines with regard to DB load?

Alinator

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 72074 - Posted 16 Jul 2007 20:12:05 UTC - in response to Message 72071.

I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM


That seems to be the way that SETI does it.

BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2 week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue, by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.
____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72076 - Posted 16 Jul 2007 20:15:01 UTC - in response to Message 72074.
Last modified: 16 Jul 2007 20:15:56 UTC


BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2 week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue, by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.


You've just observed what I'm talking about when I speak of 'tightness factor'.

FWIW, EAH has normally been a tighter project than SAH historically speaking.
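As a rough illustration, and assuming 'tightness factor' is taken to mean the estimated run time divided by the length of the deadline (that definition is an assumption made for this sketch, not something spelled out in the thread), the two results quoted above compare like this:

```python
def tightness(run_hours, deadline_days):
    """Fraction of the deadline window that must be spent crunching."""
    return run_hours / (deadline_days * 24)


seti = tightness(run_hours=5, deadline_days=21)       # SETI unit from the quote
einstein = tightness(run_hours=50, deadline_days=14)  # Einstein unit, same host

print(f"SETI: {seti:.3f}  Einstein: {einstein:.3f}")
print(f"Einstein is about {einstein / seti:.0f} times tighter on this host")
```

The quoted run times are approximate ("a little more than 5 hours", "about 50 hours"), so the ratio is only indicative.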

Alinator

ohiomike
Joined: Nov 4 06
Posts: 80
ID: 228690
Credit: 3,719,639
RAC: 14,010
Message 72080 - Posted 16 Jul 2007 21:09:04 UTC
Last modified: 16 Jul 2007 21:09:55 UTC

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.
____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72082 - Posted 16 Jul 2007 22:05:31 UTC - in response to Message 72080.
Last modified: 16 Jul 2007 22:19:12 UTC

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.


No offense meant, but issuing trailers by default is the last thing which should be considered, especially while we're in this beta phase and other possible scheduling issues have been observed.

My reasons:

1.) It means that any host not running a 5.10 client will end up wasting at least some of, up to most of, its time running scientifically useless results and therefore wasting the participant's money spent on power.

2.) With a tight deadline project your host might find itself running a little late, but end up getting unconditionally aborted after having crunched most of the result, due to the third result coming in and validating. There are other twists to this scenario, and it applies to 5.5 CC's and up (IIRC).

3.) The amount of time a result stays pending has zero long term impact on any of your performance metrics, regardless of the reason for it.

4.) The large host cache scenario, where 221 functionality works to mitigate the wasted time issue for always connected fast hosts, is really intended for people who are not always connected (i.e. notebooks and DU participants). Issuing trailers by default just to placate instant gratification unduly penalizes them due to Items 1 and 2. One only needs to look at Dr. Anderson's comments regarding this to see how the 'head man' feels about it. The cache decoupling feature was only recently released, and the jury is still out regarding whether it's a good or bad thing in the context of being available to the whole spectrum of participants. My guess is 221 functionality was added as well in order to prevent wholesale deadline blowing in extreme cache, short CI scenarios when running multiple projects.

Alinator

Profile KSMarksPsych
Forum moderator
Joined: Oct 15 05
Posts: 2349
ID: 114819
Credit: 422,629
RAC: 18
Message 72097 - Posted 17 Jul 2007 8:27:46 UTC - in response to Message 72080.

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.



But doesn't that take a backend update as well (which is needed here)? I don't run Seti and I haven't been paying very close attention to all of the server side stuff that's come up in the last few weeks. I've had enough trouble keeping up with client side stuff.

On the topic of the thread, I personally am not running into deadline issues. But variable deadlines do seem to fit the bill here.
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 72106 - Posted 17 Jul 2007 11:50:02 UTC - in response to Message 72041.

Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.


If I understand things correctly (and that's a big if) I don't think this statement is necessarily true.

I would think that many of the result pairs - perhaps even the majority - get fully completed in less than 10 days with a 14 day deadline. I base this on an observation of many of my own results over time. I ask this question: if the deadline had been 28 days instead of 14, would all those people who are taking 10 days or less suddenly start taking 20 days? I wouldn't have thought so. In fact isn't it true to say that a simple increase in deadline would have no effect on those who are currently meeting the deadline unless they suddenly started running their machines fewer hours per day or suddenly reduced the resource share that they were prepared to allocate to EAH or suddenly did something silly like drastically increasing their cache size? My gut feeling is that whilst some may take some of these three actions, most wouldn't.

As far as waiting for validation and credits, I don't think there would be much change. The anecdotal evidence suggests that there is a significant drift of machines away from the project because the perception of the owners is that they can't abide the long crunch times and the strict deadlines. Many simply leave without completing what they have, which means that work has to be reissued. In other words, the results so affected are going to take a long time to validate anyway. A longer deadline would encourage many of those people to "stick it out", which may actually reduce the total time for validation on quite a few results. Someone sticking to the job for 20 days is going to be faster than two people successively failing a 14 day deadline.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.


I think absolutely that this is the way to go. As the workunit generator knows the "size" and could calculate and insert an appropriate deadline, this course of action obeys the KISS principle. On your PDF graph you showed "size" in terms of crunch hours - 10, 20, 30, etc. It almost seems appropriate to change those into deadline days - 10 days, 20 days, 30 days. It wouldn't need to be a continuous function - you could put certain frequencies into "speed bins" and have a single deadline for each bin - whatever is easiest for the WU generator to do.
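A minimal sketch of the "speed bin" idea follows, assuming the generator can estimate crunch hours for each workunit. The bin edges and the hours-to-days mapping below simply mirror the 10/20/30 example; they are placeholders rather than values read off the actual graph, and the names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical bins: upper edge of each bin in expected crunch hours,
# and the single deadline (in days) assigned to everything in that bin.
BIN_EDGES_HOURS = [10, 20, 30, 40]
BIN_DEADLINE_DAYS = [10, 20, 30, 40]


def binned_deadline_days(expected_crunch_hours):
    """Look up the deadline for the bin this workunit falls into."""
    i = bisect_left(BIN_EDGES_HOURS, expected_crunch_hours)
    i = min(i, len(BIN_DEADLINE_DAYS) - 1)   # oversize work uses the last bin
    return BIN_DEADLINE_DAYS[i]


for hours in (8, 12, 20, 35, 55):
    print(hours, "hours ->", binned_deadline_days(hours), "days")
```

A lookup table like this keeps the generator logic trivial: changing the policy only means editing the two lists.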


____________
Cheers,
Gary.

archae86
Joined: Dec 6 05
Posts: 569
ID: 139940
Credit: 5,757,826
RAC: 9,250
Message 72110 - Posted 17 Jul 2007 13:27:21 UTC - in response to Message 72041.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM

One comment: I've observed an undesirable side-effect on the short end of the current variable SETI deadlines.

For users running more than one project, one with low resource share can very easily trip into EDF processing when a new result is downloaded with a low predicted runtime and thus a very early deadline. On my machines, this effect is annoying. I run some at 2% SETI share, but when SETI has a server hiccup, during recovery my machine overfetches (a long-known bug), and if among the overfetch are some short ones, I get into Earliest Deadline First. If I run a queue of more than trivial length, soon some Einstein units are in EDF. I won't go down the path of arguing whether anyone should care about this--the simple fact is that a fair number of people do.

I'd think most of us would like the behavior of the variable deadline with size approach so long as the low end did not dip below something like ten days.

On the long end the biggest project risk I can see is that an unlucky WU which gets downloaded to a sequence of machines which quit or invalidate could take even longer to finally get resolved than now. So the tail at the end of the current campaign could take even longer and lead to even more massive multiple issuing in the end game.

On balance I think it a good idea. As a first guess I'd suggest the smallest units currently issued get ten days, and the largest currently issued get double the current deadline, with a linear scale between. No magic bullet this, but possibly a decent compromise among the considerations.
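A sketch of that linear scale, using credits as the measure of size. The 10-day floor and the 28-day ceiling (double the current 14 days) come from the suggestion above; the credit range is an assumption for illustration, with 650 taken from the "monster" figure mentioned earlier in the thread and 100 as a pure placeholder for the smallest unit.

```python
def linear_deadline_days(credits, min_credits=100.0, max_credits=650.0,
                         min_days=10.0, max_days=28.0):
    """Interpolate the deadline linearly between the smallest and largest units."""
    clamped = min(max(credits, min_credits), max_credits)
    fraction = (clamped - min_credits) / (max_credits - min_credits)
    return min_days + fraction * (max_days - min_days)


for c in (100, 375, 650):
    print(c, "credits ->", linear_deadline_days(c), "days")
```

Clamping keeps the deadline from dipping below the ten-day floor if smaller units are issued later.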

____________

Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 72145 - Posted 18 Jul 2007 5:37:22 UTC

When you look at the basic assumptions for BOINC then the deadlines MUST be extended.
You cannot expect hosts to be on 24/7, most are probably only on during office hours, or at home, but not sleeping hours, so don't expect more than 8 hrs/day as an absolute max.
You must assume the host is attached to more than one project, so divide time by 2 or 3.
The host probably uses Windows and the standard app, and on average is two years old. Therefore crunch time per mid-range Einstein unit is 20+ hours.
BOINC is only expected to use spare CPU cycles.
And in reality a 14 day deadline is probably closer to 12 days of crunching, as each unit is probably downloaded a day before it starts and the scheduler tries to return it 24 hrs before the actual deadline.


From this I would guess the average attached computer could do one unit/cpu at most and probably at some point is in EDF for the Einstein units.

Andy

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72154 - Posted 18 Jul 2007 12:14:32 UTC - in response to Message 72097.

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.


But doesn't that take a backend update as well (which is needed here)?


I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM

ohiomike
Joined: Nov 4 06
Posts: 80
ID: 228690
Credit: 3,719,639
RAC: 14,010
Message 72156 - Posted 18 Jul 2007 13:40:01 UTC - in response to Message 72154.

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.


But doesn't that take a backend update as well (which is needed here)?


I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM


There is no work/credit lost. WU's are not aborted if they have begun to run. Only WU's in the client's queue that are no longer needed are aborted. All in all it is a good thing because almost 100% of the WU's crunched are used. The sending of the "trailer" and the book-keeping overhead might be a pain for the project however.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72157 - Posted 18 Jul 2007 13:58:39 UTC - in response to Message 72156.

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.


But doesn't that take a backend update as well (which is needed here)?


I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that while this gives faster results for the project and faster credit for the fast participants the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM


There is no work/credit lost. WU's are not aborted if they have begun to run. Only WU's in the clients queue that are no longer needed are aborted. All in all it is a good thing because almost 100% of the WU's crunched are used. The sending of the "trailer" and the book-keeping overhead might be a pain for the project however.


But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.

CU

BRM
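
A minimal sketch of the "send 3, need 2" scheme discussed above, assuming only what the thread describes: once two results are back, any copy still waiting in a client's queue is cancelled, and a copy that has already started is left to finish. The state names and the function are made up for illustration and are not BOINC server code.

```python
def results_to_cancel(results, quorum=2):
    """Return the names of results that could be cancelled.

    results maps a result name to one of three hypothetical states:
    "queued" (sent but not started), "running", or "returned".
    Nothing is cancelled until the quorum has been reached, and a
    result that is already running is never cancelled.
    """
    returned = [name for name, state in results.items() if state == "returned"]
    if len(returned) < quorum:
        return []
    return [name for name, state in results.items() if state == "queued"]


# Third copy not started yet: it can be cancelled without losing any work.
print(results_to_cancel({"r1": "returned", "r2": "returned", "r3": "queued"}))

# Third copy already running: it is left alone and becomes the "trailer".
print(results_to_cancel({"r1": "returned", "r2": "returned", "r3": "running"}))
```

The second case is the one objected to above: the running third copy still gets crunched even though two results were enough for validation.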

____________

Nothing But Idle Time
Joined: Aug 24 05
Posts: 158
ID: 103162
Credit: 289,204
RAC: 0
Message 72164 - Posted 18 Jul 2007 17:34:01 UTC - in response to Message 72157.

<clipped> But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.
BRM
Strongly agree.

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72165 - Posted 18 Jul 2007 18:14:41 UTC - in response to Message 72157.
Last modified: 18 Jul 2007 18:16:28 UTC


But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.

CU

BRM


I started tracking this about 10 months ago when enough Core 2 D's and then Q's started appearing over at SAH to start making an impact on the work stream my hosts were seeing.

As I said before, the criterion SAH used to set the tightness factor for the deadline was that a PI-100 running with 33.3333% machine ontime should be able to make the deadline (IIRC).

What's working out in practice currently is that with a 3/2 IR/MQ there is no scientifically useful reason to run basically anything less than a PIII or Athlon 'Classic' on SAH, since the odds are the result returned will be the trailer for the WU, even if you run it 24/7. Currently, my Katmai 550 running 24/7 with a 0.01 cache coupled CI is still effective, but if I ran it for 12 hours per day or with a 1 or 2 day CI, I estimated it would be returning 50% trailers or more.

So looking at it from the viewpoint of not wasting my money on electricity, I'd have to seriously consider dropping the project on this host.

The beauty of EAH has been ever since you went to 2/2 way back when, as long as the host can meet the deadline, you know for a fact your host has contributed to the science, and therefore it was worth running it here no matter how fast or slow it is. This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, as well as for some reason the scheduler has seen fit to send the oldest ones template frequencies which are beyond their capabilities with a 2 week deadline.

Alinator


Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72167 - Posted 18 Jul 2007 18:28:23 UTC - in response to Message 72165.

This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, ....


Akos made an interesting remark, stating that the new apps are, in fact, several orders of magnitude *faster* than the old ones, probably meaning that they can do the same "scientific work" many times faster. So if the pre-S5R2 apps were biplanes, the new ones seem to be jet fighters. Problem is they get assigned much longer missions (just to stretch your paradigm a bit more :-) ) in the hierarchical all-sky search of S5R2.

I just wanted to clarify "slow" a bit so people don't get the impression that the apps "deteriorated" over time in some way.

CU

BRM



____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72169 - Posted 18 Jul 2007 18:38:27 UTC - in response to Message 72167.
Last modified: 18 Jul 2007 18:47:47 UTC

This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, ....


Akos made an interesting remark, stating that the new apps are, in fact, several orders of magnitude *faster* than the old ones, probably meaning that they can do the same "scientific work" many times faster. So if the pre-S5R2 apps were biplanes, the new ones seem to be jet fighters. Problem is they get assigned much longer missions (just to stretch your paradigm a bit more :-) ) in the hierarchical all-sky search of S5R2.

I just wanted to clarify "slow" a bit so people don't get the impression that the apps "deteriorated" over time in some way.

CU

BRM


LOL...

Agreed. It's all relative (as it should be on EAH). ;-)

The new work is more difficult compared to the old work. So even though the new apps have a lot of the improvements which were in the old apps performance wise, in the current configuration it seems like they are slower, relativityly (pun intended) speaking! :-)

Alinator

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72171 - Posted 18 Jul 2007 18:43:11 UTC - in response to Message 72154.

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that while this gives faster results for the project and faster credit for the fast participants the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM


Bernd,

Take a look at this host:

Bad 4.19 Host

You may have to consider cutting off clients older than the later releases of 4x in some circumstances. IIRC, there were serious client side scheduler and other issues with some of them. While this wasn't such a big deal back then, it seems to cause some problems with the state of the project today. ;-)

Alinator

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72175 - Posted 18 Jul 2007 19:42:31 UTC - in response to Message 72171.

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that while this gives faster results for the project and faster credit for the fast participants the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM


Bernd,

Take a look at this host:

Bad 4.19 Host

You may have to consider cutting off clients older than the later releases of 4x in some circumstances. IIRC, there were serious client side scheduler and other issues with some of them. While this wasn't such a big deal back then, it seems to cause some problems with the state of the project today. ;-)

Alinator


Is the link correct?
CU

BRM


____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72176 - Posted 18 Jul 2007 19:51:15 UTC
Last modified: 18 Jul 2007 20:12:21 UTC

OOPS.....

Bad 4.19 host

LOL...

The one time I didn't check to make sure the link was right before moving on to other problems! ;-)

<edit> The thought just occurred to me that hosts like this might have been a contributing factor to the trouble we saw at the end of S5R1. Since it appears it will grab a big load of work every time it connects and then blow the deadline for all but a few, it could leave the project side thinking that a given set of datapaks has an adequate number of hosts running it and possibly delay the time it takes to get around to issuing them to a host more likely to actually return them on time.

Alinator

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72178 - Posted 18 Jul 2007 20:23:39 UTC - in response to Message 72167.
Last modified: 18 Jul 2007 20:24:35 UTC

This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, ....


Akos made an interesting remark, stating that the new apps are, in fact, several orders of magnitude *faster* than the old ones, probably meaning that they can do the same "scientific work" many times faster. So if the pre-S5R2 apps were biplanes, the new ones seem to be jet fighters. Problem is they get assigned much longer missions (just to stretch your paradigm a bit more :-) ) in the hierarchical all-sky search of S5R2.

I just wanted to clarify "slow" a bit so people don't get the impression that the apps "deteriorated" over time in some way.

Let me just emphasize what I wrote in the original "S5R2" posting:
The "science run #5" of the LIGO instruments, or S5 for short, gives us not only the most sensitive data, but also the largest amount of data we ever had. [...]

However, with our present [i.e. S5R1] analysis tool, the computation time needed grows to the power of six over the amount of data.

If an analysis of the S4 data took a year, analyzing twice as much data would have taken about 64 years with the old program (and the same computing power). The new program should basically be able to do this in about a year again (actually less, but we have more than twice the data). So it's about fair to say that the new program does the same work 64 times faster than the old.

BM
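
The arithmetic behind the "64 times" figure can be sketched as follows. This is only an illustration, assuming the cost of the old analysis scales exactly as the sixth power of the amount of data, as quoted above; the function name is made up for the example.

```python
def relative_cost(data_factor, exponent=6):
    """Relative computing time when the amount of data grows by data_factor,
    assuming cost is proportional to (amount of data) ** exponent."""
    return data_factor ** exponent


# Twice the data with the old sixth-power scaling:
print(relative_cost(2))  # 2**6 = 64
```

So if the S4 analysis took one year, twice the data would have taken about 64 years with the old program on the same computing power.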

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72181 - Posted 18 Jul 2007 20:46:45 UTC - in response to Message 72178.
Last modified: 18 Jul 2007 20:47:57 UTC

If an analysis of the S4 data took a year, analyzing twice as much data would have taken about 64 years with the old program (and the same computing power). The new program should basically be able to do this in about a year again (actually less, but we have more than twice the data). So it's about fair to say that the new program does the same work 64 times faster than the old.

BM


Agreed, and I didn't (and am pretty sure Bikeman didn't either) mean to give the impression that overall the project was moving 'backwards' from a performance POV on running the analysis.

It's really a perception issue on the participants part when you look at the project with a 'blackbox' view and then compare one project to another, more than anything else (IMHO).


Profile Gerry Rough
Avatar
Joined: Mar 1 05
Posts: 95
ID: 43437
Credit: 590,135
RAC: 773
Message 72188 - Posted 18 Jul 2007 23:37:27 UTC - in response to Message 72157.
Last modified: 18 Jul 2007 23:39:14 UTC


There is no work/credit lost. WU's are not aborted if they have begun to run. Only WU's in the clients queue that are no longer needed are aborted. All in all it is a good thing because almost 100% of the WU's crunched are used. The sending of the "trailer" and the book-keeping overhead might be a pain for the project however.


But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.

CU

BRM


I agree, but only mildly. Either way is fine with me.

Also, it was mentioned in the other thread that there are several reasons why this one didn't turn into another bitch session on the project. One more reason IMHO: crunchers who frequent several boards have learned as a group not to get into complaints so easily: discussions turn sour quickly, so perhaps avoiding the chance to get another discussion off track may be at work to some degree. Let's face it, sometimes it gets ugly. Witness the recent smear on the P@H boards. I think many don't want it repeated here as well.
____________


Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72214 - Posted 19 Jul 2007 9:33:52 UTC

We just started a new workunit generator with "dynamic deadlines". The deadlines of workunits generated from now on will vary between two and three weeks depending on the size of the workunit (i.e. the number of templates within it, which should be proportional to the credit granted).

We'll watch it for a while, maybe we need to adjust the actual numbers.

BM
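
One way such a "dynamic deadline" could be computed is sketched below. The post only says that deadlines vary between two and three weeks with the number of templates; the linear interpolation, the clamping, and the template bounds passed in are assumptions made for illustration and are not the project's actual generator.

```python
MIN_DEADLINE_DAYS = 14.0  # two weeks, as stated in the post
MAX_DEADLINE_DAYS = 21.0  # three weeks, as stated in the post


def deadline_days(n_templates, min_templates, max_templates):
    """Interpolate a deadline between two and three weeks.

    min_templates and max_templates are hypothetical bounds for the
    smallest and largest workunits; the linear rule is an assumption.
    """
    if max_templates <= min_templates:
        raise ValueError("max_templates must be larger than min_templates")
    frac = (n_templates - min_templates) / (max_templates - min_templates)
    frac = min(1.0, max(0.0, frac))  # clamp to the two-to-three-week range
    return MIN_DEADLINE_DAYS + frac * (MAX_DEADLINE_DAYS - MIN_DEADLINE_DAYS)


# A workunit halfway between the smallest and largest sizes:
print(deadline_days(200, 100, 300))
```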

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72215 - Posted 19 Jul 2007 9:42:15 UTC - in response to Message 72190.

Bernd Machenschalk
I am trying to contact an Einstein@home project Admin.

So the right address would probably be Bruce Allen and David Hammer.
I'll forward the request to them.

In the longer run after upgrading our backend to newer BOINC you should be able to solve this problem yourself, but I don't know for when such an upgrade is scheduled.

BM

jowr
Joined: Feb 19 05
Posts: 55
ID: 19126
Credit: 1,947,636
RAC: 0
Message 72262 - Posted 19 Jul 2007 23:44:37 UTC

1) I have no opinion on frequency ranges. Whatever gives the best science.
2) I think 3 weeks would be fairly good, as it would allow slower hosts to get some of the larger work units done. As of now, it takes [full time] some of my machines a full week to process a work unit. I have seen at least one confirmed case where it took a full 2 weeks [1 million seconds! holy crap].
____________

Dave Burbank
Joined: Jan 30 06
Posts: 275
ID: 168016
Credit: 1,548,376
RAC: 0
Message 72268 - Posted 20 Jul 2007 1:42:42 UTC - in response to Message 72214.

We just started a new workunit generator with "dynamic deadlines". The deadlines of workunits generated from now on will vary between two and three weeks depending on the size of the workunit (i.e. the number of templates within it, which should be proportional to the credit granted).

We'll watch it for a while, maybe we need to adjust the actual numbers.

BM


That's great to hear! While this hasn't been an issue for me, this will hopefully bring back users who were unable to make the deadlines, or afraid of not making them. I'm proud to be a part of a project where the development team actually listens to their user base.

Maybe mentioning this on E@H's main page will help make more users aware of this change.
____________
There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 72271 - Posted 20 Jul 2007 4:38:50 UTC - in response to Message 72214.

We just started a new workunit generator with "dynamic deadlines" ....


Bernd,

I think this is a really great outcome. Thank you very much for taking the views of the participants back to the team and for being willing to get involved in this issue. This should be great for participant morale.


____________
Cheers,
Gary.

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72273 - Posted 20 Jul 2007 6:13:24 UTC - in response to Message 72214.

We just started a new workunit generator with "dynamic deadlines". The deadlines of workunits generated from now on will vary between two and three weeks depending on the size of the workunit (i.e. the number of templates within it, which should be proportional to the credit granted).

We'll watch it for a while, maybe we need to adjust the actual numbers.

BM


Great news!

CU

BRM

____________

John McLeod VII
Forum moderator
Project developer
Avatar
Joined: Nov 10 04
Posts: 546
ID: 354
Credit: 121,983
RAC: 36
Message 72295 - Posted 20 Jul 2007 14:52:12 UTC

3 weeks is still a bit short for some computers. I have one that is going to take more than 12 days to complete. I have another that I pulled off the project that was going to take more than 20 days. These are attached to other projects, so they will have other work on the host when the Einstein task is downloaded. They are going to miss deadlines even at three weeks. BTW, they are on and crunching 24/7.
____________

BOINC WIKI

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72306 - Posted 20 Jul 2007 16:50:12 UTC

Agreed, 3 weeks brings the majority of the work within the window for my K6-2/500's, but the K6/300's will still not make it for the high end template frequencies.

I haven't sat down and crunched the numbers to estimate what the new cutoff points are for them yet. However, the new setup should provide significant relief for many hosts.

Alinator

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 72309 - Posted 20 Jul 2007 17:09:37 UTC - in response to Message 72306.
Last modified: 20 Jul 2007 17:12:11 UTC

Agreed, 3 weeks brings the majority of the work within the window for my K6-2/500's, but the K6/300's will still not make it for the high end template frequencies.

I haven't sat down and crunched the numbers to estimate what the new cutoff points are for them yet. However, the new setup should provide significant relief for many hosts.

Alinator


With a 3-week deadline, you need about 33 RAC (simplified calculation...I know, just as a very rough estimate).

I checked with boincstats.com and it seems that about 40000 hosts fall into this category.

If you want to cover 50000, the RAC goes down to 15 (!) or about a 6-week deadline. So a transition from 2 to 3 weeks does a lot to allow more hosts to crunch "monsters", but an additional week (to 4 weeks) would not buy that much, IMHO.


I know, this is simplified: some of the low RAC machines are fast enough but have low RACs because they are switched off for longer periods in a row etc., or are new and not yet at their full RAC level. But you get an idea, lacking better data.

CU

BRM
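
The "simplified calculation" above can be read as: a host needs a RAC (credit per day) of at least the workunit's credit divided by the deadline in days. A rough sketch of that reading follows. The 700-credit "monster" is a hypothetical value chosen only because it roughly reproduces the figure of about 33 RAC for three weeks; the post does not state which credit value was used.

```python
def required_rac(workunit_credit, deadline_days):
    """Minimum average credit per day needed to finish one workunit
    of the given credit within the deadline (very rough estimate)."""
    if deadline_days <= 0:
        raise ValueError("deadline_days must be positive")
    return workunit_credit / deadline_days


MONSTER_CREDIT = 700.0  # hypothetical credit for a "monster" workunit

for weeks in (2, 3, 4, 6):
    rac = required_rac(MONSTER_CREDIT, weeks * 7)
    print(f"{weeks}-week deadline: about {rac:.1f} RAC needed")
```

As noted above, this ignores hosts that are fast but switched off much of the time, so it is only a first guess.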

____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 72312 - Posted 20 Jul 2007 17:43:03 UTC

Yep, RAC is OK for a rough guesstimate as long as you take into account your usage pattern when deciding.

Usually it's better to use the CPCS (Credit Per CPU Second), the time stats, Resource Share, and the RDCF for the most accuracy in close call situations. That's what the project uses to differentiate between fast and slow hosts.

The best indicator I've found is to use my own empirical data for the Credit Rate on my hosts, but you have to be tracking that independently over time on your own. The catch is that can be time consuming in its own right. ;-)

In any event, I don't mind the fact there is work some hosts can't deal with under the project parameters. I just wish the scheduler would take my oldest timers seriously when they tell it, "You've gotta be kidding, right!". :-)

I guess to demonstrate the concepts we've been talking about here more clearly, I'll pull some actual numbers from mine when I do the next data logging session for them and post them.

Alinator

Profile Slywy
Joined: Jan 26 06
Posts: 9
ID: 166966
Credit: 22,176
RAC: 14
Message 72535 - Posted 25 Jul 2007 2:45:27 UTC - in response to Message 71995.



Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


I just finished a workunit that I had to do some contortions on (i.e., leave computer running during the day whilst at work, up Einstein's allocation, etc.) to finish, and it completed with only two hours to spare even then. I don't know the answer to the first question, but I suspect I really need at least 21-24 days, I think, of normal running time for that particular type of WU (about 375 credits).
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 72538 - Posted 25 Jul 2007 4:45:59 UTC - in response to Message 72214.
Last modified: 25 Jul 2007 4:58:23 UTC

We just started a new workunit generator with "dynamic deadlines". The deadlines of workunits generated from now on will vary between two and three weeks depending on the size of the workunit (i.e. the number of templates within it, which should be proportional to the credit granted).

We'll watch it for a while, maybe we need to adjust the actual numbers.

BM


Just downloaded this WU/Result and noticed it only has a 2 week deadline. However, the unit appears to be similar (in estimated "To completion" time) to this 635 point WU processed last week. That is, if I am right about it being another "Monster", it doesn't appear that the new variable deadline generator is working as advertised.

____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 72542 - Posted 25 Jul 2007 6:12:11 UTC - in response to Message 72538.

... it doesn't appear that the new variable deadline generator is working as advertised.


If you have a look at the full quorum for that workunit, the result you have just downloaded is a replacement for one that has just exceeded the deadline on the computer of the previous wingman.

As the deadlines are built into the workunits, unfortunately you have been lumbered with the same deadline that the previous wingman "enjoyed" :).

You'll just have to work a bit faster :).


____________
Cheers,
Gary.

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 72564 - Posted 25 Jul 2007 22:08:26 UTC - in response to Message 72542.

As the deadlines are built into the workunits, unfortunately you have been lumbered with the same deadline that the previous wingman "enjoyed" :).


Thanks Gary! I didn't realize it worked that way.


You'll just have to work a bit faster :).


Now, if I understand this part correctly, if I don't finish by my deadline, I just need to beat the "next wingman's" result in order to receive credit.

____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 72565 - Posted 25 Jul 2007 23:30:05 UTC - in response to Message 72564.
Last modified: 25 Jul 2007 23:38:56 UTC

... I just need to beat the "next wingman's" result in order to receive credit.


Yep. You can usually count on a few extra days - probably 1 day as a minimum if you're unlucky :).

However, in this particular case, it is possible that you could get no margin at all. If you look at the workunit, and ignoring the compute error result #3, your result (#4) has essentially replaced #1. #2 will exceed the deadline shortly so there could be an extra wingman added as well. If that happened (let's call him #5), if any two of #1, #2, or #5 were to suddenly submit valid results, a quorum would be formed and you would need to submit by the deadline in order to get credit.

Fun, isn't it :).


EDIT: I've just had a look at the computers of #1 and #2 and it seems there's little likelihood of either submitting any time soon. I think you and #5, whoever he turns out to be, will have the best chance of completing the quorum :).


____________
Cheers,
Gary.
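
The rule described above can be sketched as a small check: a valid result gets credit if it arrives by its deadline, and a late one still gets credit as long as no quorum has been formed before it arrives. This is a simplified reading of the explanation in this thread, not the actual BOINC validator logic, and the function and parameter names are made up for illustration.

```python
from typing import Optional


def result_gets_credit(report_day: float,
                       deadline_day: float,
                       quorum_formed_day: Optional[float]) -> bool:
    """Decide whether a valid result would receive credit.

    quorum_formed_day is the day other hosts completed the quorum,
    or None if no quorum exists yet when this result is reported.
    """
    if report_day <= deadline_day:
        return True  # on time: always counted
    if quorum_formed_day is None:
        return True  # late, but the result is still needed
    return report_day < quorum_formed_day  # late: must beat the quorum


# Two days late, but the replacement wingman has not reported yet.
print(result_gets_credit(16, 14, None))

# Two days late, and the quorum was already completed on day 15.
print(result_gets_credit(16, 14, 15))
```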

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 72579 - Posted 26 Jul 2007 15:43:43 UTC

Unfortunately the change only affects the newly generated workunits; older ones will still keep the old deadline. Feel free to abort the Task if you feel you can't meet the deadline.

(Gary, I'd like to contact you individually. I wrote two messages to the eMail address you registered here. Did you get them?)

BM

Lloyd M.
Joined: Apr 24 07
Posts: 24
ID: 255965
Credit: 259,361
RAC: 0
Message 72816 - Posted 31 Jul 2007 2:57:14 UTC - in response to Message 71995.

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


I've only run into one instance where I've had to manually abort WU's, because I somehow managed to get one machine hugely overcommitted. That being said, my default resource share for E@H has to be much higher (up to close to two orders of magnitude, in some cases) than my other four projects, in order to get any WUs at all.

I don't recall seeing any "monster" WUs. The WUs I have taken notice of are in the 350-450 credit range. These take approximately 28 CPU hours each (Opteron 170 2 GHz, dual core, graphics enabled, Windows XP), 25 CPU hours (Athlon 64 3200+, Windows 2000, run as service) and 23 CPU hours (Athlon 64 3700+, linux, no graphics).

Aside from that one instance of somehow managing to get one machine hopelessly overcommitted, meeting deadlines is never a problem. To the greatest extent possible, I run everything 24/7.

I have a Celeron 400, and even with an optimized SETI client, it only ever managed about 25 RAC. I decided it wasn't worth the heat generated and the electricity to run it. Though I never even tried it, I wonder about the wisdom of running E@H on a machine that slow. I suppose if someone offers, one shouldn't turn them down. On the other hand, do we really want to set people up for failure and frustration? That one time I did get overcommitted, it was a drag to "lose" WUs that took so much time to try and process, only to see them get downloaded to other hosts because I barely missed the deadline.

If there is some simple way to make sure that slower machines only get the "easiest" WUs, that might lower the stress level of people that want to help, but only have modest computing resources. I don't know if there are system requirements published, and it might be worth telling people how much CPU they need (running for how many hours a day) to have any reasonable expectation of meeting deadlines.
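
A back-of-the-envelope version of such a "system requirements" check is sketched below. It assumes that wall-clock time is simply CPU time divided by the hours per day the machine runs and by the fraction of that time given to this project. Real scheduling is more complicated, and the function names are made up for illustration.

```python
def wall_clock_days(cpu_hours, hours_on_per_day, project_share):
    """Estimate days needed to finish a task.

    cpu_hours: CPU time the task needs on this host.
    hours_on_per_day: hours per day the host is crunching (0-24).
    project_share: fraction of crunching time given to this project (0-1].
    """
    if not 0 < hours_on_per_day <= 24:
        raise ValueError("hours_on_per_day must be in (0, 24]")
    if not 0 < project_share <= 1:
        raise ValueError("project_share must be in (0, 1]")
    return cpu_hours / (hours_on_per_day * project_share)


def can_meet_deadline(cpu_hours, hours_on_per_day, project_share, deadline_days):
    """True if the estimated wall-clock time fits within the deadline."""
    return wall_clock_days(cpu_hours, hours_on_per_day, project_share) <= deadline_days


# A 28 CPU-hour task on a 24/7 host sharing time equally among four projects.
print(can_meet_deadline(28, 24, 0.25, 14))

# A task needing about 1 million CPU seconds on a host running 8 hours a day.
print(can_meet_deadline(1_000_000 / 3600, 8, 1.0, 21))
```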
____________

Profile tullio
Joined: Jan 22 05
Posts: 1175
ID: 6186
Credit: 167,788
RAC: 180
Message 72818 - Posted 31 Jul 2007 3:48:39 UTC
Last modified: 31 Jul 2007 3:55:12 UTC

My 400 MHz PII Deschutes has just finished a SETI WU (58 credits), a QMC WU (299 credits) and is already 68% ahead of a big Einstein WU, well within its 9 August deadline. So even a slow CPU can be useful, if you just manage it. I had only to give the "no more work" instruction to SETI in order to meet the QMC deadline.
Tullio
____________


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration