Petition - Deadline Relief for Longest Results


Message boards : Cruncher's Corner : Petition - Deadline Relief for Longest Results

Author Message
Profile Gary Roberts
Volunteer moderator
Send message
Joined: 9 Feb 05
Posts: 3768
Credit: 3,408,059,539
RAC: 3,959,116
Message 71995 - Posted: 16 Jul 2007, 0:59:33 UTC

Many of you no doubt will be aware of the spate of "monster" results that have been seen recently. You will also recall that Bernd posted a link to a graph in PDF format showing in general terms how the expected crunch time would change based on frequency.

For some time now, many participants have aired concerns about the deadline stress that the longer-running work is causing. People with slower machines and people supporting multiple projects are most affected.

Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

My intention in starting this petition is NOT to invite people to start a bitch session. Be warned that I will delete posts that engage in flaming/bitching. I'm sure that the Devs are fully aware of all the arguments that you might dream up and feel like venting here. Please don't do that. So you only need to post (vote) once and keep it short and courteous.

Please realise that all this petition will (and should) do is provide a convenient mechanism for alerting the Devs to the strength of numbers of people who need deadline relief. Based on those numbers and the needs of the project, the Devs will take whatever action they feel is appropriate. That's the way it should be.

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?
____________
Cheers,
Gary.

RandyC
Avatar
Send message
Joined: 18 Jan 05
Posts: 337
Credit: 69,017,808
RAC: 32,690
Message 72002 - Posted: 16 Jul 2007, 1:47:08 UTC - in response to Message 71995.
Last modified: 16 Jul 2007, 1:48:49 UTC

1. The stated ranges seem appropriate
2. For me personally, the current deadlines are adequate

However, let me add the following:


  • All of my systems run 24/7
  • I have a broadband internet connection--always on
  • My slowest system is an AMD XP1700+
  • I do not micro-manage the BOINC Client
  • All my systems are connected to two projects (not all have the same two)
  • All my systems run WinXP Pro



My personal recommendation to the Devs is that, if possible, WUs in the frequency ranges below should only be sent to 'fast' systems, similar to how the S5R1 WUs were split between fast and slow systems based on frequency range. I leave it to the Project to decide what counts as a fast vs. slow system.


Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


[edit] strip excessive quote for readability [/edit]
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 28 Aug 06
Posts: 3502
Credit: 149,204,255
RAC: 120,044
Message 72030 - Posted: 16 Jul 2007, 11:25:04 UTC - in response to Message 71995.


Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


I think the most urgent candidates are the "monster WUs" above (ca. 530?) Hz, worth up to about 650 credits.

What would be a reasonable deadline for them?

My idea is the following:
A reasonably fast PC (something that is not considered an oldtimer) should be able to crunch this thing within the deadline, with a 25 % safety margin, if it is active for 5 days a week and 8 hours a day and E@H gets 50 % of the CPU time during this period (allowing time for other BOINC projects and "real" work on this PC!).

Let's say a reasonably fast machine is a machine in the league of (say) 10 credits / hour.

==> 650 / 10 = 65 CPU hours per monster WU, or ~130 hours of wall-clock time while the computer is on (at 50 % CPU share).

==> ca. 16 "working days" (without safety margin) or 20 working days with a 25 % safety margin.

==> Under the above assumptions which are even a bit optimistic IMHO, the "reasonable" deadline for the monster WU would be 4 calendar weeks, not two.

If anybody thinks this is too long, we would either have to find a bug in my math or identify the assumptions that are wrong :-).
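The arithmetic above can be checked with a few lines; every number below is one of the assumptions stated in the post, not a project value:

```python
# Back-of-envelope check of the deadline estimate in the post above.
# All parameter defaults are the post's assumptions (650 credits,
# 10 credits/CPU-hour, 50 % CPU share, 8 h/day, 5 days/week, 25 % margin).

def estimate_deadline_weeks(credits=650, credits_per_cpu_hour=10,
                            cpu_share=0.5, hours_per_day=8,
                            days_per_week=5, safety_margin=0.25):
    cpu_hours = credits / credits_per_cpu_hour        # 65 CPU hours
    wall_hours = cpu_hours / cpu_share                # ~130 h while the PC is on
    working_days = wall_hours / hours_per_day         # ~16 working days
    padded_days = working_days * (1 + safety_margin)  # ~20 working days
    weeks = padded_days / days_per_week               # ~4 calendar weeks
    return working_days, padded_days, weeks

working, padded, weeks = estimate_deadline_weeks()
print(round(working, 2), round(padded, 2), round(weeks, 2))  # 16.25 20.31 4.06
```

which reproduces the post's figures: roughly 16 working days without margin, 20 with, i.e. about 4 calendar weeks.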

CU

BRM


____________
Stick
Send message
Joined: 24 Feb 05
Posts: 788
Credit: 1,032,655
RAC: 1,783
Message 72038 - Posted: 16 Jul 2007, 12:45:09 UTC

Admittedly, I cut back my Einstein resource share about a year ago in favor of other projects. At that same time, I also cut back on my message board participation, so I am not "up to speed" on the issues anymore. That being said, I believe that a 2-week deadline for the "average S5R2 unit" is way too short and should be at least doubled. Doing so would bring Einstein deadlines more in line with those of other projects in which I participate. As to the question of "Longest Results", I would suggest: 6 weeks.
____________

Profile Svenie25
Send message
Joined: 21 Mar 05
Posts: 139
Credit: 2,436,862
RAC: 0
Message 72039 - Posted: 16 Jul 2007, 12:51:36 UTC

I don't know how difficult it would be to implement a variable deadline like at SETI, but that would be my favourite way. I personally don't have any problems with the deadlines, running only 2 projects with a 50/50 share on a 24/7 system.
____________

Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 15 Oct 04
Posts: 3611
Credit: 128,407,245
RAC: 56,163
Message 72041 - Posted: 16 Jul 2007, 13:23:45 UTC
Last modified: 16 Jul 2007, 13:25:32 UTC

Just two things to mention: Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 28 Aug 06
Posts: 3502
Credit: 149,204,255
RAC: 120,044
Message 72045 - Posted: 16 Jul 2007, 13:34:08 UTC - in response to Message 72041.
Last modified: 16 Jul 2007, 13:35:03 UTC

Just two things to mention: Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM


To emphasize how serious the situation with the "monster WUs" currently is, let's estimate how many of the active hosts will actually be capable of crunching them in time:

For 650 credits within 14 days, you need an average of about 50 credits per day (allowing a day of buffer between download and the actual start of crunching).

Looking at boincstats.com, you will see that currently only about 32,500 hosts participating in E@H average 50 credits/day or more. Depending on the definition of "active host", there are about 60k to 75k hosts active on E@H now. So we are talking about a third to half of the hosts that won't make it.
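The threshold arithmetic in that paragraph, as a quick sketch (the one-day download buffer is the post's assumption):

```python
# Minimum sustained credit rate a host needs to return a 650-credit
# "monster WU" within the 14-day deadline, per the post's reasoning.

credits = 650
deadline_days = 14
buffer_days = 1  # assumed day between download and start of crunching

required_credits_per_day = credits / (deadline_days - buffer_days)
print(required_credits_per_day)  # 50.0
```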

So I really do think that the "monster WUs" carry a dramatic potential for user frustration that could badly hurt the E@H user base if not acted upon.

CU

BRM

____________
Alinator
Send message
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 72058 - Posted: 16 Jul 2007, 17:27:24 UTC - in response to Message 71995.
Last modified: 16 Jul 2007, 17:37:12 UTC


.
.
.
Over a period now, many participants have aired concerns about the deadline stress that the longer running work is causing. People with slower machines and people supporting multiple projects are most affected.

Based on the information from the graph, it would appear that most of the concerns could be addressed if the following two frequency ranges were given an extended deadline of say 18 - 20 days instead of the standard 14 days.

1. 250 - 300 Hz
2. 450 - 550 Hz

.
.
.

Please try to limit yourselves to two questions:-

1. Are the identified frequencies appropriate?
2. What should the deadline days figure be?


1.) Yes, those are the ranges which give the most trouble for older hosts/multiple projects.

2.) Based on my observations, at least 3 weeks for the current app 'efficiency'.

@ Bernd: Agreed, loosening the tightness factor will increase the DB load at your end; however, I think doubling it is the worst-case scenario, and in practice the increase would be far less than that.

OTOH, you have to measure that against the 'bad taste' a participant with a slow host gets when the project sends a result that is marginal in meeting the deadline, turns out to overrun by a few days, and then gets unconditionally aborted a few hours before completing because the reissue went to a small-cache 'rocket' host.

IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.

AFA instant gratification goes, I think everybody knows my opinion on that. ;-)

Regards,

Alinator
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 28 Aug 06
Posts: 3502
Credit: 149,204,255
RAC: 120,044
Message 72060 - Posted: 16 Jul 2007, 17:50:23 UTC - in response to Message 72058.


IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.


I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

CU

BRM

____________
Alinator
Send message
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 72061 - Posted: 16 Jul 2007, 17:56:29 UTC - in response to Message 72060.


IMO, even if you go to variable deadlines, you still need to consider the effect of the tightness factor on hosts which participate on multiple projects, since many complaints are about projects appearing to 'hog' the machine by people who are not completely up to speed about how debt and resource share works (or just don't care about the facts and want BOINC to do exactly what they want regardless of any other ramifications). Another area where tightness factor plays a role is for part time/variable time crunchers, regardless of how fast your host is.


I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

CU

BRM


Using RAC might work for EAH due to the pretty close 'uniformity' of the work in general.

However there are a couple of problems with RAC generally in this application.

1.) It tends to break down when the host is not run 24/7. IOW, variable-time crunchers may have a problem.

2.) WU failures would tend to have more impact on scheduling decisions than they really should for all hosts.

Alinator
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Avatar
Send message
Joined: 15 Oct 04
Posts: 3611
Credit: 128,407,245
RAC: 56,163
Message 72071 - Posted: 16 Jul 2007, 19:42:33 UTC - in response to Message 72060.
Last modified: 16 Jul 2007, 19:49:24 UTC

I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

The deadline (actually its "length") is a property of the workunit, and is thus inserted by the workunit generator at the time the workunit is created; nothing is known (or needs to be known) about the host it will later be assigned to. The workunit generator does, however, know the "size" of a workunit, which is reflected in the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any information about any host.
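A minimal sketch of such a generator-side rule; the scale factor and the 14-day floor here are illustrative assumptions, not a concrete proposal:

```python
# Hypothetical workunit-generator rule: derive the deadline purely from
# the workunit's "size" (estimated credits), with no host information.
# The days-per-100-credits factor and the 14-day floor are assumptions.

def deadline_days_for(estimated_credits, days_per_100_credits=3.0,
                      floor_days=14):
    # Bigger workunits get proportionally longer deadlines,
    # but never less than the current standard 14 days.
    return max(floor_days, estimated_credits / 100 * days_per_100_credits)

print(deadline_days_for(200))  # 14 (small WU keeps the standard deadline)
print(deadline_days_for(650))  # 19.5 (a "monster" gets extra time)
```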

Would this concept of a variable deadline be desirable?

BM
Profile Svenie25
Send message
Joined: 21 Mar 05
Posts: 139
Credit: 2,436,862
RAC: 0
Message 72072 - Posted: 16 Jul 2007, 19:57:01 UTC - in response to Message 72071.

The deadline (actually its "length") is a property of the workunit, and is thus inserted by the workunit generator at the time the workunit is created; nothing is known (or needs to be known) about the host it will later be assigned to. The workunit generator does, however, know the "size" of a workunit, which is reflected in the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any information about any host.

Would this concept of a variable deadline be desirable?

BM


I think so. This should be the same system SETI uses, I think. There, the deadline also varies with the size of the WU.
____________
Alinator
Send message
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 72073 - Posted: 16 Jul 2007, 20:09:26 UTC - in response to Message 72071.
Last modified: 16 Jul 2007, 20:13:03 UTC

The deadline (actually its "length") is a property of the workunit, and is thus inserted by the workunit generator at the time the workunit is created; nothing is known (or needs to be known) about the host it will later be assigned to. The workunit generator does, however, know the "size" of a workunit, which is reflected in the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any information about any host.

Would this concept of a variable deadline be desirable?

BM


Agreed, and when the scheduler decides whether or not to send a given WU to a host, it has the reported performance and other metrics for that host to use for that purpose.

However, keep in mind you would only 'need' to use variable deadlines if you intended to send the whole spectrum of template frequencies to all hosts so that the tightness factor was constant over that range.

If you were to stick with the 'slowhost/fasthost' method used in S5R1/I, then the main factor would be the trigger points for the ranges. Of course, the downside to that with a fixed two-week deadline is that you progressively raise the bar for who can participate as the work gets 'tougher'. IOW, slowhosts would only be able to run a smaller fraction of the work, all other things being equal.

So it seems to me that the choice of strategies boils down to how much the extra load from either method would impact the DB backend. Although I can't say for sure, I would think that just bumping the deadline by a week would have less effect than variable deadlines at a fixed tightness factor, since my data indicates it could easily take a host at the low end of the speed spectrum a month to run the 'toughies'.

<edit> Thinking about it, since we're Beta right now anyway, wouldn't it be easier to test the 3 week deadline theory at this point than variable deadlines with regard to DB load?

Alinator
Stick
Send message
Joined: 24 Feb 05
Posts: 788
Credit: 1,032,655
RAC: 1,783
Message 72074 - Posted: 16 Jul 2007, 20:12:05 UTC - in response to Message 72071.

I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

The deadline (actually its "length") is a property of the workunit, and is thus inserted by the workunit generator at the time the workunit is created; nothing is known (or needs to be known) about the host it will later be assigned to. The workunit generator does, however, know the "size" of a workunit, which is reflected in the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any information about any host.

Would this concept of a variable deadline be desirable?

BM


That seems to be the way that SETI does it.

BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most recent Einstein unit (with a 2-week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.
____________
Alinator
Send message
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 72076 - Posted: 16 Jul 2007, 20:15:01 UTC - in response to Message 72074.
Last modified: 16 Jul 2007, 20:15:56 UTC


BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most recent Einstein unit (with a 2-week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.


You've just observed what I'm talking about when I speak of 'tightness factor'.

FWIW, EAH has normally been a tighter project than SAH historically speaking.

Alinator
ohiomike
Avatar
Send message
Joined: 4 Nov 06
Posts: 80
Credit: 6,453,639
RAC: 0
Message 72080 - Posted: 16 Jul 2007, 21:09:04 UTC
Last modified: 16 Jul 2007, 21:09:55 UTC

The other thing that SETI does that would help here is an initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results, and then cancel the 3rd WU if its host has not started it yet. We could eliminate the 45-60 day waits some people have had for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.
____________

Alinator
Send message
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 72082 - Posted: 16 Jul 2007, 22:05:31 UTC - in response to Message 72080.
Last modified: 16 Jul 2007, 22:19:12 UTC

The other thing that SETI does that would help here is an initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results, and then cancel the 3rd WU if its host has not started it yet. We could eliminate the 45-60 day waits some people have had for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.


No offense meant, but issuing trailers by default is the last thing which should be considered, especially while we're in this beta phase and other possible scheduling issues have been observed.

My reasons:

1.) It means that any host not running a 5.10 client will end up wasting at least some (up to most) of its time running scientifically useless results, and therefore wasting the participant's money spent on power.

2.) With a tight-deadline project, your host might find itself running a little late but end up getting unconditionally aborted after having crunched most of the result, due to the third result coming in and validating. There are other twists to this scenario, and it applies to 5.5 CCs and up (IIRC).

3.) The amount of time a result stays pending has zero long term impact on any of your performance metrics, regardless of the reason for it.

4.) The large-host-cache scenario, where 221 functionality works to mitigate the wasted-time issue for always-connected fast hosts, is really intended for people who are not always connected (i.e. notebooks and DU participants). Issuing trailers by default just to placate instant gratification unduly penalizes them, due to items 1 and 2. One only needs to look at Dr. Anderson's comments on this to see how the 'head man' feels about it: the cache-decoupling feature was only recently released, and the jury is still out on whether making it available to the whole spectrum of participants is a good or bad thing. My guess is that 221 functionality was also added to prevent wholesale deadline-blowing in extreme-cache, short-CI scenarios when running multiple projects.

Alinator
Profile KSMarksPsych
Volunteer moderator
Avatar
Send message
Joined: 15 Oct 05
Posts: 2473
Credit: 1,862,907
RAC: 5,749
Message 72097 - Posted: 17 Jul 2007, 8:27:46 UTC - in response to Message 72080.

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software(5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.



But doesn't that take a backend update as well (which is needed here)? I don't run Seti and I haven't been paying very close attention to all of the server side stuff that's come up in the last few weeks. I've had enough trouble keeping up with client side stuff.

On the topic of the thread, I personally am not running into deadline issues. But variable deadlines do seem to fit the bill here.
____________
Kathryn :o)
Profile Gary Roberts
Volunteer moderator
Send message
Joined: 9 Feb 05
Posts: 3768
Credit: 3,408,059,539
RAC: 3,959,116
Message 72106 - Posted: 17 Jul 2007, 11:50:02 UTC - in response to Message 72041.

Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.


If I understand things correctly (and that's a big if) I don't think this statement is necessarily true.

I would think that many of the result pairs - perhaps even the majority - get fully completed in less than 10 days with a 14-day deadline. I base this on observation of many of my own results over time. So I ask this question: if the deadline had been 28 days instead of 14, would all those people who are taking 10 days or less suddenly start taking 20 days? I wouldn't have thought so. In fact, isn't it true that a simple increase in deadline would have no effect on those who are currently meeting the deadline, unless they suddenly started running their machines fewer hours per day, suddenly reduced the resource share they were prepared to allocate to EAH, or suddenly did something silly like drastically increasing their cache size? My gut feeling is that whilst some may take one of those three actions, most wouldn't.

As far as waiting for validation and credits goes, I don't think there would be much change. The anecdotal evidence suggests that there is a significant drift of machines away from the project because the owners' perception is that they can't abide the long crunch times and the strict deadlines. Many simply leave without completing what they have, which means that work has to be reissued. In other words, the results so affected are going to take a long time to validate anyway. A longer deadline would encourage many of those people to "stick it out", which may actually reduce the total time to validation for quite a few results. Someone sticking to the job for 20 days is going to be faster than two people successively failing a 14-day deadline.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.


I think absolutely that this is the way to go. As the workunit generator knows the "size" and could calculate and insert an appropriate deadline, this course of action obeys the KISS principle. On your PDF graph you showed "size" in terms of crunch hours - 10, 20, 30, etc. It almost seems appropriate to change those into deadline days - 10 days, 20 days, 30 days. It wouldn't need to be a continuous function - you could put certain frequencies into "speed bins" and have a single deadline for each bin - whatever is easiest for the WU generator to do.
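The "speed bins" idea can be sketched in a few lines. The band edges below are the two ranges from the opening post; the day values and the treatment of everything outside those ranges are illustrative assumptions, not project numbers:

```python
import bisect

# Upper edge of each frequency band (Hz). Bands: <250, 250-300, 300-450,
# 450-550, >550. Extended deadlines for the two ranges named in the
# petition; 14 days elsewhere. All values are illustrative assumptions.
BAND_EDGES = [250, 300, 450, 550]
BAND_DEADLINES = [14, 18, 14, 18, 14]  # days per band

def deadline_for_frequency(freq_hz):
    # bisect_right places an exact edge value into the higher band.
    return BAND_DEADLINES[bisect.bisect_right(BAND_EDGES, freq_hz)]

print(deadline_for_frequency(275))  # 18 (inside 250-300 Hz)
print(deadline_for_frequency(500))  # 18 (inside 450-550 Hz)
print(deadline_for_frequency(600))  # 14 (outside the extended ranges)
```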


____________
Cheers,
Gary.
archae86
Send message
Joined: 6 Dec 05
Posts: 1758
Credit: 354,920,653
RAC: 577,424
Message 72110 - Posted: 17 Jul 2007, 13:27:21 UTC - in response to Message 72041.

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM

One comment: I've observed an undesirable side-effect on the short end of the current variable SETI deadlines.

For users running more than one project, a project with a low resource share can very easily trip into EDF processing when a new result arrives with a low predicted runtime and thus a very early deadline. On my machines, this effect is annoying. I run some at a 2% SETI share, but when SETI has a server hiccup, during recovery my machine overfetches (a long-known bug), and if the overfetch includes some short ones, I get into Earliest Deadline First. If I run a queue of more than trivial length, soon some Einstein units are in EDF. I won't go down the path of arguing whether anyone should care about this; the simple fact is that a fair number of people do.

I'd think most of us would like the behavior of the variable deadline with size approach so long as the low end did not dip below something like ten days.

On the long end, the biggest project risk I can see is that an unlucky WU which gets downloaded to a sequence of machines that quit or invalidate could take even longer to finally get resolved than now. So the tail at the end of the current campaign could take even longer and lead to even more massive multiple issuing in the end game.

On balance I think it a good idea. As a first guess I'd suggest the smallest units currently issued get ten days, and the largest currently issued get double the current deadline, with a linear scale between. No magic bullet this, but possibly a decent compromise among the considerations.
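That linear scale can be sketched as follows; the credit range taken as "smallest" and "largest" is an assumption, and the endpoints (10 days, double the current 14) are the suggestion above:

```python
# Linear deadline scale: smallest workunits get 10 days, the largest get
# double the current 14-day deadline (28), interpolated linearly between.
# The 100-650 credit range for smallest/largest is an assumption.

def scaled_deadline(credits, min_credits=100, max_credits=650,
                    min_days=10, max_days=28):
    # Clamp to the known size range, then interpolate linearly.
    c = min(max(credits, min_credits), max_credits)
    frac = (c - min_credits) / (max_credits - min_credits)
    return min_days + frac * (max_days - min_days)

print(scaled_deadline(100))            # 10.0 (smallest WU)
print(scaled_deadline(650))            # 28.0 (largest "monster")
print(round(scaled_deadline(375), 1))  # 19.0 (midpoint of the range)
```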

____________




This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2016 Bruce Allen