S5R5 plans


Filipe
Joined: Mar 10 05
Posts: 13
ID: 52787
Credit: 275,697
RAC: 77
Message 90858 - Posted 30 Oct 2008 19:06:31 UTC

Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results...
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 90864 - Posted 30 Oct 2008 21:38:09 UTC - in response to Message 90858.
Last modified: 30 Oct 2008 21:49:20 UTC

Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results...

Will definitely come. We're still testing and tweaking the setup. Some facts so far:
- slightly increased memory requirement
- larger "dwell time" per sky location. This is limiting the maximum checkpoint rate to about once per 3 min (on current average CPUs, longer on slower ones)
- workunits will run roughly half as long as S5R4 ones

BM

Profile MarkJ
Joined: Feb 28 08
Posts: 89
ID: 313109
Credit: 2,721,765
RAC: 3,849
Message 90887 - Posted 31 Oct 2008 11:55:10 UTC - in response to Message 90864.

Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results...

Will definitely come. We're still testing and tweaking the setup. Some facts so far:
- slightly increased memory requirement
- larger "dwell time" per sky location. This is limiting the maximum checkpoint rate to about once per 3 min (on current average CPUs, longer on slower ones)
- workunits will run roughly half as long as S5R4 ones

BM


Hi Bernd,

Can we use the current (605) power app, or will we need a new app/app_info?
____________
BOINC blog

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 90888 - Posted 31 Oct 2008 12:07:25 UTC - in response to Message 90887.

Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results...

Will definitely come. We're still testing and tweaking the setup. Some facts so far:
- slightly increased memory requirement
- larger "dwell time" per sky location. This is limiting the maximum checkpoint rate to about once per 3 min (on current average CPUs, longer on slower ones)
- workunits will run roughly half as long as S5R4 ones

BM


Hi Bernd,

Can we use the current (605) power app, or will we need a new app/app_info?


S5R5 will require new binaries that are currently under test.
CU
Bikeman

____________

John Clark
Joined: May 4 07
Posts: 1063
ID: 258634
Credit: 1,411,736
RAC: 5,000
Message 90892 - Posted 31 Oct 2008 13:28:23 UTC - in response to Message 90888.
Last modified: 31 Oct 2008 13:31:13 UTC

S5R5 will require new binaries that are currently under test.
CU
Bikeman


I presume there will be a similar rundown of S5R4 and changeover to S5R5 as there was at the start of S5R4? Meaning those who want to stick with S5R4 WUs initially can do so until these get scarce?

One driver will be the credit, and RAC, awarded for R4 versus R5. If they are similar, it comes down to personal choice; if R5 is slightly better, the rundown of R4 may take longer than planned.
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 90894 - Posted 31 Oct 2008 15:12:31 UTC - in response to Message 90887.
Last modified: 31 Oct 2008 15:14:06 UTC

Can we use the current (605) power app, or will we need a new app/app_info

The S5R5 Windows App will feature the code that makes the S5R4 6.05 App as fast as it is.

There is no change in crediting targeted for S5R5.

BM

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 90895 - Posted 31 Oct 2008 15:28:16 UTC

Also, the S5R5 run will work on the existing S5R4 datafiles, so the transition from S5R4 to S5R5 should be much smoother than the previous one from S5R3 to S5R4, where it was not possible to re-use downloaded datafiles from the previous run.

CU
Bikeman
____________

John Clark
Joined: May 4 07
Posts: 1063
ID: 258634
Credit: 1,411,736
RAC: 5,000
Message 90900 - Posted 31 Oct 2008 18:13:42 UTC - in response to Message 90895.

Also, the S5R5 run will work on the existing S5R4 datafiles, so the transition from S5R4 to S5R5 should be much smoother than the previous one from S5R3 to S5R4, where it was not possible to re-use downloaded datafiles from the previous run.

CU
Bikeman


That sounds good then!
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Benjamin Rietveld
Joined: May 14 06
Posts: 43
ID: 194611
Credit: 57,996
RAC: 0
Message 90901 - Posted 31 Oct 2008 18:18:39 UTC

Partly on topic:

If this run hits, what's the best way to get rid of the optimized application? Resetting the project is what I usually do, but there must be better ways
____________

RandyC
Joined: Jan 18 05
Posts: 319
ID: 3454
Credit: 1,949,162
RAC: 1,872
Message 90911 - Posted 31 Oct 2008 23:50:34 UTC - in response to Message 90901.

Partly on topic:

If this run hits, what's the best way to get rid of the optimized application? Resetting the project is what I usually do, but there must be better ways


Drain your queue (set no-new-work)
Report results when it's empty
Shutdown BOINC (make sure it's down all the way)
Remove app_info.xml file
Restart BOINC and enable new work
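
For anyone who prefers to script this, the steps above can be sketched roughly as below. This is only a sketch under assumptions: it presumes the `boinccmd` command-line tool that ships with BOINC is on the path and uses its `--project URL nomorework`, `update`, `allowmorework` and `--quit` operations. The function name, the project URL and the project directory are made-up placeholders, the `rm` line assumes a POSIX shell, and restarting the client is left as a manual step because it differs per platform. The helper only builds the commands; it does not run anything.

```python
from pathlib import PurePosixPath


def stock_app_switch_plan(project_url, project_dir):
    """Return the manual steps as (description, command) pairs.

    Nothing is executed here. A command of None means "do this by hand".
    Both arguments are placeholders supplied by the caller.
    """
    app_info = PurePosixPath(project_dir) / "app_info.xml"
    return [
        ("Stop fetching new work",
         ["boinccmd", "--project", project_url, "nomorework"]),
        ("Once the queue is empty, report the finished results",
         ["boinccmd", "--project", project_url, "update"]),
        ("Shut the BOINC client down completely",
         ["boinccmd", "--quit"]),
        ("Remove the anonymous-platform file (POSIX shell assumed)",
         ["rm", str(app_info)]),
        ("Restart the BOINC client (platform specific)", None),
        ("Allow new work again",
         ["boinccmd", "--project", project_url, "allowmorework"]),
    ]


# Hypothetical URL and directory, for illustration only.
for description, command in stock_app_switch_plan(
        "http://example.org/project/", "/path/to/boinc/projects/example"):
    print(description, "->", " ".join(command) if command else "(manual)")
```

Run each step only after the previous one has really finished; in particular, wait until the queue is empty and reported before shutting the client down.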

Profile Yin Gang
Joined: Feb 23 05
Posts: 40
ID: 33322
Credit: 1,304,842
RAC: 825
Message 90969 - Posted 2 Nov 2008 2:10:45 UTC
Last modified: 2 Nov 2008 2:11:43 UTC

So...the result of S5R4 would still make any sense? If not, I'd like to stop running this project until S5R5 comes out.

YG
____________


Welcome To Team China!

Profile nevermore
Joined: Feb 14 06
Posts: 2719
ID: 171869
Credit: 1,388,406
RAC: 0
Message 90977 - Posted 2 Nov 2008 3:55:24 UTC - in response to Message 90969.

So...the result of S5R4 would still make any sense? If not, I'd like to stop running this project until S5R5 comes out.

YG


I believe all of the science runs are valid and useful to some degree. I suggest completing the run.
____________

Tombei the Mist
Joined: Mar 6 07
Posts: 11
ID: 248689
Credit: 141,764
RAC: 0
Message 90981 - Posted 2 Nov 2008 8:19:56 UTC

Well done, that's excellent that the new application will have shorter running tasks. I hope you reduce the deadline accordingly. I would suggest 7 days instead of the current 18 days would help greatly to reduce the length of time that some tasks remain pending.

7 days would be long enough for even the slowest computers to complete the new shorter tasks. I know there are some who still use museum pieces for fun but there comes a time when they are no longer suited for distributed computing for electricity/carbon reasons.

A shorter deadline also helps to minimise total pending time when tasks remain unsent for 7-10 days.

It also minimises the delay caused by those who download a large number of tasks, only complete a few and then detach and reattach and repeat this procedure.

[B^S] Elphidieus
Joined: Feb 20 05
Posts: 162
ID: 21268
Credit: 6,917,803
RAC: 22,459
Message 90990 - Posted 2 Nov 2008 12:43:38 UTC - in response to Message 90981.

Well done, that's excellent that the new application will have shorter running tasks. I hope you reduce the deadline accordingly. I would suggest 7 days instead of the current 18 days would help greatly to reduce the length of time that some tasks remain pending.

7 days would be long enough for even the slowest computers to complete the new shorter tasks. I know there are some who still use museum pieces for fun but there comes a time when they are no longer suited for distributed computing for electricity/carbon reasons.

A shorter deadline also helps to minimise total pending time when tasks remain unsent for 7-10 days.

It also minimises the delay caused by those who download a large number of tasks, only complete a few and then detach and reattach and repeat this procedure.


A seven-day deadline is way too short as it will be detrimental (pardon my strong word) to those multi-core crunchers like me who would download loads of workunits to last a week or two on less-accessible-yet-automated machines.

Can't wait to get credited on your work...?

Tombei the Mist
Joined: Mar 6 07
Posts: 11
ID: 248689
Credit: 141,764
RAC: 0
Message 90992 - Posted 2 Nov 2008 13:28:58 UTC
Last modified: 2 Nov 2008 14:12:22 UTC

I don't mind waiting a few weeks but a month or more is excessive on any project.

I understand that many who download a lot of work at once complete it all but there are also others with hidden computers who make a habit of downloading many tasks, wait until their wingmen have completed them and then only complete the faster tasks. Not only is this against the spirit of crunching but it is wasteful of server resources and bandwidth.

Brian Silvers
Joined: Aug 26 05
Posts: 782
ID: 103927
Credit: 282,700
RAC: 0
Message 91002 - Posted 2 Nov 2008 17:43:48 UTC - in response to Message 90992.

I don't mind waiting a few weeks but a month or more is excessive on any project.

I understand that many who download a lot of work at once complete it all but there are also others with hidden computers who make a habit of downloading many tasks, wait until their wingmen have completed them and then only complete the faster tasks. Not only is this against the spirit of crunching but it is wasteful of server resources and bandwidth.


The way the datasets are distributed, if you reduce to a 7-day deadline you will more than likely significantly increase the amount of "backfill" downloading that goes on. This means that there will be more downloading of 70MB+ groups of files. This will irritate those still on dialup.

If runtimes are indeed halved, then the minimum deadline should go to 9 days, since that is half of the current 18. My suggestion though would be to return to the original 14-day deadline.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 91018 - Posted 2 Nov 2008 21:47:30 UTC

Several weeks ago the "pending credit" situation was rather bad, as reported in this thread, but now, at least for me, it's OK again. If it stays as it is now, I can live with an 18- or 14-day deadline. I guess the fact that ATLAS stopped crunching for some time when the servers got overloaded must have contributed to the massive increase in pending credits?

CU
Bikeman

____________

archae86
Joined: Dec 6 05
Posts: 569
ID: 139940
Credit: 5,757,826
RAC: 9,250
Message 91030 - Posted 2 Nov 2008 23:19:23 UTC - in response to Message 91018.

I guess the fact that ATLAS stopped crunching for some time when the servers got overloaded must have contributed to the massive increase of pending credits?

Not directly by waiting for it, I think. It seems to me the primary symptom was that it became common for many days to go by between first issue of a result from a WU and issue of the first quorum partner result in that same WU. No matter how promptly everyone processes the results they receive, that situation gives trouble.

It seems to have become less common recently, though my tiny fleet is not a big enough sample to say that with any assurance. Even in that fleet I spotted some 5 day delays within the last week.

Perhaps ATLAS contributed indirectly: instead of waiting for it our waits were a consequence of the scheduler's poor response to its presence.

____________

Brian Silvers
Joined: Aug 26 05
Posts: 782
ID: 103927
Credit: 282,700
RAC: 0
Message 91033 - Posted 2 Nov 2008 23:28:36 UTC - in response to Message 91030.

No matter how promptly everyone processes the results they receive, that situation gives trouble.


Could you please state the "trouble" it gives? Help me understand why there is a problem...
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 91048 - Posted 3 Nov 2008 9:59:46 UTC
Last modified: 3 Nov 2008 10:50:36 UTC

The "deadline" is something that can rather easily be adjusted on-the-fly during a run (whereas the average workunit duration cannot). With current workunits it is set to 18d; from what I recall of previous discussions, I think most people feel comfortable with about 14d for 6-8h WUs.

BM

Brian Silvers
Joined: Aug 26 05
Posts: 782
ID: 103927
Credit: 282,700
RAC: 0
Message 91063 - Posted 3 Nov 2008 16:18:21 UTC - in response to Message 91048.
Last modified: 3 Nov 2008 16:21:02 UTC

The "deadline" is something that can rather easily be adjusted on-the-fly during a run (whereas the average workunit duration cannot). With current workunits it is set to 18d; from what I recall of previous discussions, I think most people feel comfortable with about 14d for 6-8h WUs.

BM


Perhaps people don't remember the history and the reasoning for going up to 18 days. There used to be a lot of complaining about "Einstein" tasks going into Earliest Deadline First (now called "High Priority") when tasks were set to 14 days. The common misconception was that it is the project doing this and that the participants' resource allocation selection is not being honored. People made complaints about Einstein "hogging" their CPU. The reality is that BOINC was/is doing it to try to make sure that work is returned on time and that resource allocations are honored over the long-term, but perhaps not on an hour-by-hour basis.
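
As a rough illustration of why a shorter deadline pushes tasks into "High Priority" more often, here is a toy check. It is not BOINC's actual scheduling code (the real client runs a more elaborate simulation); the function name and all numbers are invented, and it assumes the host runs around the clock and the project only ever gets its resource share of one CPU.

```python
def needs_high_priority(remaining_cpu_h, resource_share, days_to_deadline):
    """Toy deadline check (hypothetical, not BOINC's real algorithm).

    remaining_cpu_h:  CPU hours the task still needs
    resource_share:   fraction of CPU time the project normally gets (0..1]
    days_to_deadline: days left until the report deadline
    """
    # Wall-clock hours needed if the task only ever gets its normal share.
    wall_hours_needed = remaining_cpu_h / resource_share
    return wall_hours_needed > days_to_deadline * 24


# A made-up 40 CPU-hour task on a host giving the project 10% of its time.
for deadline_days in (14, 18):
    print(deadline_days, "days:", needs_high_priority(40, 0.10, deadline_days))
```

With these made-up numbers the task needs about 400 wall-clock hours, which is more than the 336 hours in 14 days but less than the 432 hours in 18 days, so only the 14-day deadline trips the check.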

I view a lot of the "I have way too much pending credit" discussion in a similar light. Is it a "problem"? I guess it could be, if it is significant enough to cause a sizeable amount of participants to stop processing tasks because they feel they are not being rewarded in a timely fashion. Beyond that, it is up to you and the rest of the project team to determine whether or not you need to be getting results in faster.

The deadline needs to be set to something "reasonable". When I requested the increase to 18-21 days, the condition I put on it was that it should be increased until such time as the SSE (and other) enhancements made it into the stock Windows application. Since that time has arrived, 14 days is probably a good choice again. Due to workunit distribution methods, I'm not sure that going lower than that will have the substantial "relief" hoped for by the people who are upset about pending credit and/or unsent tasks.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 91065 - Posted 3 Nov 2008 16:30:42 UTC - in response to Message 91063.

I guess also ATLAS has to be taken into account, I could imagine that there will be periods when E@H jobs get very little CPU time and others when ATLAS is highly productive for E@H, for lack of other jobs. Probably hard to predict and highly irregular. If the deadline is too short there might be mass failures to meet the deadline by the several thousand ATLAS cores. But 14 days seem reasonable.

CU
Bikeman


____________

Brian Silvers
Joined: Aug 26 05
Posts: 782
ID: 103927
Credit: 282,700
RAC: 0
Message 91066 - Posted 3 Nov 2008 16:52:15 UTC - in response to Message 91065.

If the deadline is too short there might be mass failures to meet the deadline by the several thousand ATLAS cores. But 14 days seem reasonable.


It more than likely is reasonable. Personally, I doubt going below 12 days would help, and in fact, I think it may harm things. 14 is a good place to start at. Part of the problem now is the faster Windows app is still not the stock app. A combination of getting the faster app to the general user base along with the reduction to 14 days should be the first step taken to see if it makes a significant dent in the pending / unsent issue. If it does not, then it is up to the project to decide if that issue is a high enough risk to warrant any other action to be taken to address the situation.


____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 91068 - Posted 3 Nov 2008 17:03:01 UTC - in response to Message 91066.

A combination of getting the faster app to the general user base along with the reduction to 14 days should be the first step taken to see if it makes a significant dent in the pending / unsent issue.


Is there still an issue? The "oldest unsent result" is now back to the 7 days (it used to be like that as far as I can remember), and pending credits have been reduced likewise (at least that's my experience). I think it's more or less "normal" again.

CU
Bikeman

____________

Brian Silvers
Joined: Aug 26 05
Posts: 782
ID: 103927
Credit: 282,700
RAC: 0
Message 91071 - Posted 3 Nov 2008 18:42:39 UTC - in response to Message 91068.


Is there still an issue? The "oldest unsent result" is now back to the 7 days (it used to be like that as far as I can remember), and pending credits have been reduced likewise (at least that's my experience). I think it's more or less "normal" again.


There are still people complaining about it, so there is an "issue" of some sort, either real or perceived...
____________

Profile MarkJ
Joined: Feb 28 08
Posts: 89
ID: 313109
Credit: 2,721,765
RAC: 3,849
Message 91096 - Posted 4 Nov 2008 10:35:42 UTC - in response to Message 90911.

Partly on topic:

If this run hits, what's the best way to get rid of the optimized application? Resetting the project is what I usually do, but there must be better ways


Drain your queue (set no-new-work)
Report results when it's empty
Shutdown BOINC (make sure it's down all the way)
Remove app_info.xml file
Restart BOINC and enable new work


Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least.
____________
BOINC blog

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 91099 - Posted 4 Nov 2008 12:15:53 UTC - in response to Message 91096.

Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least.

Currently there's a telecon scheduled for tomorrow which covers the subject. Once the final decision is made actually starting the run is a matter of days.

I'll keep you posted.

BM

John Clark
Joined: May 4 07
Posts: 1063
ID: 258634
Credit: 1,411,736
RAC: 5,000
Message 91101 - Posted 4 Nov 2008 12:33:28 UTC

Thanks, Bernd
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

AnRM
Joined: Feb 9 05
Posts: 213
ID: 9811
Credit: 3,053,004
RAC: 0
Message 91109 - Posted 4 Nov 2008 17:12:54 UTC
Last modified: 4 Nov 2008 17:28:22 UTC

Merci, Bernd........or is it 'mercy'??....;)....Cheers, Rog.
____________

Tombei the Mist
Joined: Mar 6 07
Posts: 11
ID: 248689
Credit: 141,764
RAC: 0
Message 91142 - Posted 5 Nov 2008 13:28:54 UTC

Perhaps my ambit claim of 7 days deadline was a bit ambitious, but it was not my intention to cause offence. I still believe a shorter deadline if possible is better for any quorum project than a longer one, particularly if the project wishes to retain a higher percentage of new contributors. When the unsent time rises it's like a double whammy. However it seems there are valid user and server reasons to justify a 2 week deadline so that's fair enough.

The excellent part of this is that shorter running tasks and an included optimised Windows application will be much better and should help those with slower computers who wish to contribute and also please Windows users. I like them shorter too, thank you.

I have noticed that the running time of the current Einstein tasks can vary on my computer by up to about 27%. Will the new S5R5 tasks also vary the same as the current tasks? I was just wondering if the larger "dwell time" per sky location would change this running time variability.

Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 91145 - Posted 5 Nov 2008 14:06:46 UTC
Last modified: 5 Nov 2008 14:07:51 UTC

I have noticed that the running time of the current Einstein tasks can vary on my computer by up to about 27%. Will the new S5R5 tasks also vary the same as the current tasks?

The variation is quite normal, there is a lot of info and graphs in the "How to check Performance when Testing a new App" thread.

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 91146 - Posted 5 Nov 2008 14:09:00 UTC - in response to Message 91142.


I have noticed that the running time of the current Einstein tasks can vary on my computer by up to about 27%. Will the new S5R5 tasks also vary the same as the current tasks? I was just wondering if the larger "dwell time" per sky location would change this running time variability.


Runtime variation will not disappear in S5R5, if anything it will get a bit more pronounced and certainly less predictable (greater "wiggles" in the runtime graphs discussed in other threads here). This is because there will be fewer sky-points per WU to average out some runtime-irregularities over the course of a single WU.
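
The averaging argument can be made a bit more concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that a WU's runtime is the sum of independent per-sky-point costs that all have the same mean and spread. Real sky points are not independent, so treat this as a rough guide only; the function name and numbers are invented.

```python
import math


def relative_runtime_spread(per_point_spread, n_points):
    """Relative spread (std / mean) of a WU's total runtime.

    Assumes the runtime is a sum of n_points independent costs, each with
    relative spread per_point_spread. Under that assumption the relative
    spread of the sum shrinks with the square root of n_points.
    """
    return per_point_spread / math.sqrt(n_points)


# Made-up example: same per-point spread, but a quarter of the sky points.
many_points = relative_runtime_spread(0.5, 400)
few_points = relative_runtime_spread(0.5, 100)
print(f"400 points: {many_points:.3f}, 100 points: {few_points:.3f}")
```

Under these assumptions, cutting the number of sky points per WU to a quarter doubles the relative spread of the total runtime.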

It *might* be possible to model the credits per WU somewhat more realistically, so that WUs that take longer will be awarded some more credits, not sure this gets implemented in time for S5R5 tho, we'll see.

CU
Bikeman

____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 91150 - Posted 5 Nov 2008 15:18:23 UTC - in response to Message 91099.
Last modified: 5 Nov 2008 15:22:26 UTC

Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least.

Currently there's a telecon scheduled for tomorrow which covers the subject. Once the final decision is made actually starting the run is a matter of days.

I'll keep you posted.

BM

Directly from telecon: Further (internal) testing / simulations needed, expected timeframe for this is another week.

BM

Tombei the Mist
Joined: Mar 6 07
Posts: 11
ID: 248689
Credit: 141,764
RAC: 0
Message 91153 - Posted 5 Nov 2008 16:22:46 UTC
Last modified: 5 Nov 2008 16:24:07 UTC

Thank you both for your replies, Winterknight and Bikeman.

Apologies, Winterknight, I did not mean to imply that I did not know that variation was normal; I was just trying to phrase it simply and did it poorly. I have read some of the runtime analysis threads, though I never managed to check the 49 consecutive tasks I did 3 weeks ago.

If "runtime-irregularities over the course of a single WU" means that they process faster at some times and slower at other times, then I have noticed this variation in the KBoincSpy "Instantaneous processing speed" value. I don't know if this value is accurate at all though because it seems to vary quite often and by a large amount.

The runtime variation being a bit more pronounced will not concern me because the total runtime will be half as long on average. So even if the percentage variation is higher the time taken by the variation should be less.

I'm keen to get stuck into some of these S5R5 tasks. I hope the tests and simulations are successful then I can go, go, go.

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 91199 - Posted 6 Nov 2008 18:44:19 UTC

Hmmmm...

I've been busy lately with other priorities, and so have not been following along with EAH developments closely.

I'm confused on a couple of points in this thread though.

1.) First off, isn't it a little early to be worrying about R5 at this point? AFAICT, there is still over a year to go for R4 according to the Status page.

2.) Second, what's the story with the bug mentioned in the OP? I can't seem to find any reference in the fora to a problem that would lead to the R4 run being canceled in favor of going straight to R5.

Alinator

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 91202 - Posted 6 Nov 2008 19:10:17 UTC - in response to Message 91199.

Hmmmm...

I've been busy lately with other priorities, and so have not been following along with EAH developments closely.

I'm confused on a couple of points in this thread though.

1.) First off, isn't it a little early to be worrying about R5 at this point? AFAICT, there is still over a year to go for R4 according to the Status page.

2.) Second, what's the story with the bug mentioned in the OP? I can't seem to find any reference in the fora to a problem that would lead to the R4 run being canceled in favor of going straight to R5.

Alinator

It's in a post by Bernd buried in the middle of the v6.05 power app thread.

Benjamin Rietveld
Joined: May 14 06
Posts: 43
ID: 194611
Credit: 57,996
RAC: 0
Message 91204 - Posted 6 Nov 2008 19:58:52 UTC

Maybe redundant by now, but the answer to question one is that S5R5 will analyze the S5R4 data, but faster and without the bugs, so they're not going to wait until this run finishes but will replace it with the new run/application.
At least that's what I got from following the discussions a bit ^_^
____________

ML1
Joined: Feb 20 05
Posts: 154
ID: 24273
Credit: 2,789,037
RAC: 3,928
Message 91245 - Posted 8 Nov 2008 1:54:58 UTC

Do we get native 64-bit apps for this run?

(Linux, Mac, and Windows?)

Regards,
Martin

____________
Powered by Mandriva Linux A user friendly OS!
See the Boinc HELP Wiki

Benjamin Rietveld
Joined: May 14 06
Posts: 43
ID: 194611
Credit: 57,996
RAC: 0
Message 91388 - Posted 10 Nov 2008 19:08:58 UTC - in response to Message 91150.

Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least.

Currently there's a telecon scheduled for tomorrow which covers the subject. Once the final decision is made actually starting the run is a matter of days.

I'll keep you posted.

BM

Directly from telecon: Further (internal) testing / simulations needed, expected timeframe for this is another week.

BM

So how is it coming along? Can we expect it in two days (5 November + 7 days = 12 November ^^), or is it going to take longer?
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 92367 - Posted 3 Dec 2008 16:31:47 UTC
Last modified: 3 Dec 2008 16:32:08 UTC

Update:

Simulations show that with the currently planned (and preliminarily implemented) S5R5 setup we would miss some signals. More tuning needed, will take at least another week.

BM

Filipe
Joined: Mar 10 05
Posts: 13
ID: 52787
Credit: 275,697
RAC: 77
Message 92375 - Posted 3 Dec 2008 20:01:07 UTC

Thanks for the update, Bernd.
____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 93691 - Posted 7 Jan 2009 9:18:06 UTC - in response to Message 92367.

Hi Bernd,

Simulations show that with the currently planned (and preliminarily implemented) S5R5 setup we would miss some signals. More tuning needed, will take at least another week.


Now that Christmas/New Year distractions are over and the people that matter are getting back to work, I presume that some action on S5R5 is probably close at hand. Also, as announced in the Windows 6.10 thread

I just made this App "official". This gives you the opportunity now to switch back to the "official" path (you should empty your work cache before removing the app_info.xml), which I would recommend. This will allow you to get ABP1 and S5R5 work right away when we issue it.


things seem to be hotting up for a "sooner rather than later" bit of action.

It would be very good if you could provide some details on how you see the transition from S5R4 to S5R5 actually happening. Here are some specific questions:-

1. Will there be a sudden termination of S5R4 tasks - ie server will suddenly have zero S5R4 tasks to issue or will there be a transition during which both types will be available?

2. Will there be any point in completing cached work on clients? Presumably the answer to this would be "yes" if there is to be some sort of transition?

3. If somebody has a large cache would it be advantageous to reduce it now in anticipation of S5R5?

4. Will you be attempting to complete all "open" quorums by reissuing tasks when already issued ones error out or fail to return by the deadline?

Any detailed information you can share would be appreciated, thanks.

____________
Cheers,
Gary.

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93692 - Posted 7 Jan 2009 10:17:52 UTC - in response to Message 93691.
Last modified: 7 Jan 2009 10:32:49 UTC

Hi Gary!

Yes, with a bit of luck (i.e. if we don't find a problem in the very last minute) we'll start S5R5 around the weekend. We were just waiting for a green light that we got last night.

Due to some remaining uncertainties we will restrict the frequency range of S5R5 to 1000Hz and then take a look at the results to decide whether we should push it up to the 1250Hz of the current S5R4. This reduces the projected total runtime of (the possibly first part of) S5R5 to about 6 months. We are working on some hopefully even more sensitive and efficient analysis methods (improving the "Hough transform"). If a new program is ready to be used by then, we won't extend S5R5 but will start an S5R6 instead.

The only consequence of this decision right now is that clients currently running S5R4 workunits above 1000Hz will need to get a new set of data files, while the others will just get S5R5 work for the data files they already have.

1. Will there be a sudden termination of S5R4 tasks - ie server will suddenly have zero S5R4 tasks to issue or will there be a transition during which both types will be available?

2. Will there be any point in completing cached work on clients? Presumably the answer to this would be "yes" if there is to be some sort of transition?

3. If somebody has a large cache would it be advantageous to reduce it now in anticipation of S5R5?

4. Will you be attempting to complete all "open" quorums by reissuing tasks when already issued ones error out or fail to return by the deadline?


We'll stop the S5R4 workunit generator, but the workunits generated so far will be finished and credited. I'm not sure that pushing the last ones through is necessary, so I probably won't put time into this.

There is no (intentional) change in the crediting, so wrt the credit it shouldn't matter whether you run S5R4 or S5R5 workunits. The S5R5 ones will run a bit shorter (design goal was 50%, but I'm afraid with the adjustments we had to make afterwards we missed it by about 10%).

BM

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 93726 - Posted 8 Jan 2009 6:39:03 UTC

Perhaps it's also worth mentioning that even tho WUs will finish much faster on average, the relative runtime variation (ratio between longest and shortest runtime on the same host for different WUs) will *increase*. So it will require averaging over even more WUs than in S5R4 to really estimate the true average runtime on your systems: don't be too excited/disappointed when the first few WUs run much faster/slower than expected ;-).
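
To get a feel for "averaging over even more WUs", here is a small sketch based on the usual standard-error-of-the-mean rule. It assumes the runtimes on one host behave like independent samples with a given relative spread, which is a simplification (the variation discussed in the runtime threads is partly systematic). The function name and the percentages are made up for illustration.

```python
import math


def workunits_needed(spread_pct, target_error_pct):
    """Rough number of WUs to average so the mean runtime is known to
    within target_error_pct (one standard error), given a per-WU relative
    spread of spread_pct. Assumes independent samples.
    """
    return math.ceil((spread_pct / target_error_pct) ** 2)


# Made-up example: pin the average down to about 5%.
for spread in (15, 30):
    print(f"{spread}% spread:", workunits_needed(spread, 5), "WUs")
```

The point of the sketch is the scaling: under these assumptions, doubling the relative spread quadruples the number of WUs you need before the average means much.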

It will be interesting to see how different CPU models cope with the new run: even tho the apps are almost identical, the search parameters will be different, which means that the memory-intensive pattern recognition part of the code (the Hough Transform already mentioned above) will claim a larger share of overall runtime compared to the floating-point-intensive "digital signal processing" part of the program. I see lots of new diagrams and benchmarking ahead :-)

CU
Bikeman

____________

th3_1rzt
Joined: Aug 24 06
Posts: 208
ID: 210060
Credit: 1,948,393
RAC: 6,210
Message 93825 - Posted 11 Jan 2009 14:11:07 UTC

the relative runtime variation ... will *increase*

More unpredictable, just what i need. I will follow the findings of the "runtime variance diagram"-guys but for now i switch.
____________
Team Philippines

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93854 - Posted 12 Jan 2009 7:52:09 UTC - in response to Message 93825.
Last modified: 12 Jan 2009 7:52:53 UTC

the relative runtime variation ... will *increase*

More unpredictable, just what i need. I will follow the findings of the "runtime variance diagram"-guys but for now i switch.

Actually it's more predictable than ever. Thanks to the work of Bikeman, the granted credit will match the runtime variation more accurately than ever before. Also, as the total task runtime will be halved, even a 30% variation will be smaller in absolute time than before. Finally, I think we got the floating point estimation better than ever, and the S5R5 App updates the progress counter more frequently, so the client should be able to estimate the runtime more accurately.
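
To put rough numbers on the "smaller in absolute time" point, here is a small Python sketch. The 20% relative variation used for S5R4 is purely an assumption for illustration (it matches a 10 h versus 12 h spread); only the 30% figure and the halved runtime come from the post above, and the function name is hypothetical.

```python
def absolute_spread_hours(mean_runtime_hours: float, relative_variation: float) -> float:
    """Absolute runtime spread implied by a relative variation around a mean runtime."""
    return mean_runtime_hours * relative_variation


# Assumed example host: 40 h per S5R4 task, hence about 20 h per S5R5 task.
s5r4_spread = absolute_spread_hours(40.0, 0.20)  # 20% is an assumed S5R4 figure
s5r5_spread = absolute_spread_hours(20.0, 0.30)  # 30% is the figure from the post

print(f"S5R4 spread: about {s5r4_spread:.1f} h")
print(f"S5R5 spread: about {s5r5_spread:.1f} h")
```

Under these assumptions the S5R5 spread comes out smaller in hours even though it is larger in percent. The conclusion only holds while the S5R4 relative variation was above half of the S5R5 one.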

BM

th3_1rzt
Joined: Aug 24 06
Posts: 208
ID: 210060
Credit: 1,948,393
RAC: 6,210
Message 93857 - Posted 12 Jan 2009 8:49:05 UTC

Good to hear that at least the credit/hr will be more stable then. If performance can be evaluated from the credits granted, then the runtime variation isn't so bad... Looking forward even more to seeing some S5R5 results.
____________
Team Philippines

Profile MarkJ
Joined: Feb 28 08
Posts: 89
ID: 313109
Credit: 2,721,765
RAC: 3,849
Message 93859 - Posted 12 Jan 2009 9:00:25 UTC - in response to Message 93854.

the relative runtime variation ... will *increase*

More unpredictable, just what i need. I will follow the findings of the "runtime variance diagram"-guys but for now i switch.

Actually it's more predictable than ever. Thanks to the work of Bikeman, the granted credit will match the runtime variation more accurately than ever before. Also, as the total task runtime will be halved, even a 30% variation will be smaller in absolute time than before. Finally, I think we got the floating point estimation better than ever, and the S5R5 App updates the progress counter more frequently, so the client should be able to estimate the runtime more accurately.

BM


Okay when are you planning on switching over? I've deleted app_info in anticipation, but it seems happy to use the 6.10 app with S5R4's still coming down at the moment.
____________
BOINC blog

Profile paul milton
Joined: Sep 16 05
Posts: 191
ID: 109635
Credit: 435,032
RAC: 1,374
Message 93862 - Posted 12 Jan 2009 10:35:25 UTC - in response to Message 93859.



Okay when are you planning on switching over? I've deleted app_info in anticipation, but it seems happy to use the 6.10 app with S5R4's still coming down at the moment.


good question, 40 hour work units are getting on my nerves :| no offence, but I'd prefer 23:59:59 or less.
____________
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93863 - Posted 12 Jan 2009 11:30:06 UTC - in response to Message 93862.

good question, 40 hour work units are getting on my nerves :| no offence, but I'd prefer 23:59:59 or less.

If the current S5R4 tasks are 40h for you, S5R5 should be 20h on average.

We're running some final tests, S5R5 should start today or tomorrow if we don't find any oddities.

BM

Profile MarkJ
Joined: Feb 28 08
Posts: 89
ID: 313109
Credit: 2,721,765
RAC: 3,849
Message 93864 - Posted 12 Jan 2009 12:07:00 UTC - in response to Message 93863.

good question, 40 hour work units are getting on my nerves :| no offence, but I'd prefer 23:59:59 or less.

If the current S5R4 tasks are 40h for you, S5R5 should be 20h on average.

We're running some final tests, S5R5 should start today or tomorrow if we don't find any oddities.

BM


Great, thanks for the reply.

Mine take around 8 hours each so that means they should be around 4 hours each.
____________
BOINC blog

[B^S] Elphidieus
Joined: Feb 20 05
Posts: 162
ID: 21268
Credit: 6,917,803
RAC: 22,459
Message 93868 - Posted 12 Jan 2009 13:12:38 UTC

So if my credit per WU goes 222 cobblestones, it means it should be around 222 as well....? (Fingers crossed)
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 93884 - Posted 12 Jan 2009 17:48:58 UTC - in response to Message 93868.
Last modified: 12 Jan 2009 17:52:09 UTC

So if my credit per WU goes 222 cobblestones, it means it should be around 222 as well....? (Fingers crossed)


Not quite sure what you mean. There will still be a significant gap between claimed credit and granted credit (claimed credit is just ignored by E@H anyway, credit is fixed at the server level).

In S5R1..S5R4 all WUs of a certain frequency range got you the exact same credit, like, say, 222 credits. However, some of them would take (say) 10 hours to complete and others (say) 12 hours.

With S5R5, workunits are shorter, but the relative variation is expected to be greater, so, say, one WU would finish in 6 hours (for say 111 credits) and the other in maybe 3.5 hours.

To award the same credits to WUs with such a big variation is probably not very well received by users, even if the average credits/h would (over a long time) be the same.

So an attempt was made to actually try to award credits according to the complexity of the WUs. Longer WUs will get more credits and shorter ones less.

This can only be an approximation, and may need some adjustments after seeing how well this works out for a) different WU frequency ranges and b) different hardware. For the "theory" of the runtime variations, see the different threads on performance measurement and the "ready reckoner" here in this forum that provided the input for this approximation. (Gary, Mike, Richard Haselgrove and archae86 provided a lot of insight and data there).

CU
Bikeman
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 93886 - Posted 12 Jan 2009 18:05:21 UTC

Luckily both my systems returned their latest tasks just today and BOINC was giving the CPU time back to the other projects. So in eager anticipation I've put Einstein on NNT on both systems and in a bit I'll remove the app_info.xml files for the various Power Apps they run, so that when I re-allow work tomorrow, I'll only have to do a reset prior to that and hope to sit back and await the new downloads. ;-)
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Profile Svenie25
Joined: Mar 21 05
Posts: 117
ID: 62489
Credit: 538,288
RAC: 13
Message 93922 - Posted 13 Jan 2009 16:17:11 UTC

The new applications have been in the pipe since yesterday afternoon. Now we only need the new work for them. ;)
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93932 - Posted 13 Jan 2009 17:52:36 UTC
Last modified: 13 Jan 2009 18:11:21 UTC

The S5R4 workunit generator (WUG) has been stopped and the S5R5 one has been started instead. S5R5 has officially been launched.

The server status page has been updated. BTW ABP1 (currently under "previous searches") lists the workunits in the active database from the Arecibo Binary Pulsar search; currently the ones remaining from the test in December. The time estimations for S5R5 look a bit absurd, but that usually settles for some reasonable value after the WUG ran for a few days.

The runtimes of S5R5 tasks should be half of what you know from S5R4, but don't judge this after a few tasks - the runtime variation between different tasks of the same frequency is larger in S5R5.

We intend to keep the credit level (credits per CPU hour) the same as in S5R4, but due to the rather short internal testing we relied on some educated guessing. We think we got it pretty close, but might need to make some adjustment to that in a week or so.

BM

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 93933 - Posted 13 Jan 2009 18:07:31 UTC - in response to Message 93932.

The S5R4 workunit generator (WUG) has been stopped and the S5R5 one has been started instead. S5R5 has officially been launched.

And so I allowed for some extra minutes between its launch and me re-allowing work. Got an S5R4 of course. ;-)
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

rroonnaalldd
Joined: Dec 12 05
Posts: 101
ID: 146004
Credit: 450,418
RAC: 444
Message 93935 - Posted 13 Jan 2009 18:20:42 UTC

Yes, three hosts three times re-allowing new work and got 3 times S5R4-units ;-)
____________

samuel7
Joined: Feb 16 05
Posts: 26
ID: 17704
Credit: 740,440
RAC: 296
Message 93937 - Posted 13 Jan 2009 18:49:27 UTC

And same here: got two S5R4 reissues. I think it'll depend on how many unsent results for your frequency range there were and how quickly you and your data file partners can pick up the old results. The oldest unsent result in database is now just less than 7 days old.

The deadline is 14 days.
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93938 - Posted 13 Jan 2009 18:51:24 UTC

Yes, the S5R4 workunits will remain the majority for the next few days. We won't cancel them so people will get credit for them; this will also help our servers to cope with the transition.

BM

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 93939 - Posted 13 Jan 2009 19:01:45 UTC - in response to Message 93938.

Go figure. My AMD got an S5R5 out-of-the-box.

13-Jan-09 19:58:36|Einstein@Home|Started download of einstein_S5R5_3.01_windows_intelx86_0.exe
13-Jan-09 19:58:37|Einstein@Home|Finished download of einstein_S5R5_3.01_windows_intelx86.exe
13-Jan-09 19:59:15|Einstein@Home|Started download of skygrid_0640Hz_S5R5.dat
13-Jan-09 19:59:17|Einstein@Home|Finished download of skygrid_0640Hz_S5R5.dat

Let's see how quickly it can rip through it. :-)
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

samuel7
Joined: Feb 16 05
Posts: 26
ID: 17704
Credit: 740,440
RAC: 296
Message 93940 - Posted 13 Jan 2009 19:11:07 UTC - in response to Message 93938.

Yes, the S5R4 workunits will remain the majority for the next few days. We won't cancel them so people will get credit for them; this will also help our servers to cope with the transition.

BM


Indeed it helps spread out the new application downloads and is of course the best way to handle the transition. I'm just eager to start the new run on my machines but will do my part to complete the S5R4 run. Looks like about 40 results to be done for my laptop with no partners in sight. Now have to check the quad...

Thanks Bernd and here's hoping for a smooth changeover!


____________

Lee Venters
Joined: Oct 21 08
Posts: 2
ID: 377809
Credit: 35,179
RAC: 408
Message 93946 - Posted 13 Jan 2009 21:59:00 UTC

I have 2 E@H S5R4 tasks left to do... one of them is almost done (another 29h+) and the second one will take about 52h+.
Maybe one of these or both will match up with Samuel7's.
I do crunching for Einstein, Rosetta and SETI. It's interesting (and intriguing), but I'm not doing it for the credits to be rewarded.
Maybe the next ones I get from E@H will be the S5R5's, since they are ready.
The 14 day or 18 day deadline is fine with me. I do prefer longer deadlines, because SETI does send out some long WU's with a shorter deadline, so BOINC picks that one to run at high priority. (Or Rosetta, depending on the expected deadline... but Rosetta's WU's are usually between 2h and 4h.)
I'm just happy that I can be of service to the scientific community.
Thank you for all of the updates.

Lee

archae86
Joined: Dec 6 05
Posts: 569
ID: 139940
Credit: 5,757,826
RAC: 9,250
Message 93949 - Posted 13 Jan 2009 22:24:26 UTC
Last modified: 13 Jan 2009 22:41:00 UTC

I just transitioned three hosts from app_info running 6.05 to stock (thus accepting S5R4 on 6.10 plus S5R5 on 3.01).

Here are a couple of observations from my experience, in case they may reduce surprise for others.

work availability
Of the three hosts, the Duo got two S5R4's and the first Quad 4 S5R4's on first fetch. But the second Quad got two S5R4's and two S5R5's. I then opened up my requested queue size a little, and both Quads got several more sequential S5R5's. The S5R5s came to the same frequency range as the S5R4.

Task Duration
Even though my work queue is small and has sequential work, the predicted completion times for unstarted work vary appreciably.

On the Q6600
730.05 1102 predicted 7:50:43
730.05 1098 predicted 7:40:36

On the Q9550
748.6 1133 predicted 4:23:12
748.6 1123 predicted 4:09:38

I can't yet give any observation on prediction vs. reality, save that the one S5R5 currently executing is indeed clearly running faster than S5R4, though nowhere near twice as fast (perhaps I happen to be near a peak, however)

what to delete and what is downloaded
On my first host to change over, I thought to be clever and delete the 6.05 app as well as the app_info. The result was that an entry in one of the config files triggered an attempt to re-download it (even though not needed), and of course it had no place to get it from, hence a one minute retry loop. A project reset fixed that.

For my other two I followed directions and only deleted the app_info. Both started right up. Of course the first thing to do was considerable downloading (_0, _1, and _2 app, and graphics files for S5R4, a skygrid or two and several of the 4 Mbyte frequency-specific files, plus more files for S5R5). So the total download in the first few minutes for my Q6600 was just over 80 Mbytes. The servers supplied at splendid rates, however, and no retries were required.

Deadlines

As noted, the S5R5 work is coming with 14 day deadlines, while newly downloaded S5R4 remains at an 18 day deadline. So if you are looking at the web page representation of "Tasks for Computer", at the moment newly downloaded work with a January 27 deadline is R5, while January 31 is R4 (in less than two hours this becomes January 28 and February 1).
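
For anyone wanting to reproduce those dates, a minimal Python sketch of the arithmetic follows. The 14 and 18 day figures are the ones mentioned in this thread; the dictionary and function names are hypothetical, and the sketch works on UTC calendar days only, ignoring the time of day.

```python
from datetime import date, timedelta

# Deadline lengths as reported in this thread (days after download).
DEADLINE_DAYS = {"S5R5": 14, "S5R4": 18}


def deadline_for(download_day: date, run: str) -> date:
    """Return the deadline date for a task of the given run downloaded on download_day."""
    return download_day + timedelta(days=DEADLINE_DAYS[run])


for day in (date(2009, 1, 13), date(2009, 1, 14)):
    for run in ("S5R5", "S5R4"):
        print(day.isoformat(), run, "->", deadline_for(day, run).isoformat())
```

Reading it the other way round, a freshly downloaded task whose deadline is 14 days out is most likely S5R5, and one that is 18 days out is S5R4.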
____________

John Clark
Joined: May 4 07
Posts: 1063
ID: 258634
Credit: 1,411,736
RAC: 5,000
Message 93950 - Posted 13 Jan 2009 22:46:58 UTC
Last modified: 13 Jan 2009 22:47:18 UTC

I have swapped part of one of my quads to the new WU.

Deleted the app_info file and D/Led the new 6.01 client and a mixture of S5R4s and S5R5s (6.10s and 3.01s). I have 6 of each, but they will not start crunching until a couple of MW WUs are completed in about 15 minutes.

ATM I will ignore the predicted completion times of 3hrs 25m for the 3.01s and 8hrs 20mins for the 6.10s. These predicted times for the 6.10s are longer than when I ran the 6.05 client.
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

John Clark
Joined: May 4 07
Posts: 1063
ID: 258634
Credit: 1,411,736
RAC: 5,000
Message 93958 - Posted 14 Jan 2009 0:12:28 UTC

Now crunching 3 3.01 WUs and projections suggest these will complete (for my older quad) in about 6 hours. The 6.10 WUs using the 6.05 client, with app_info file, took 7 hours 22 minutes.

A good reduction, but not by 50%.

I know I am not comparing like with like. This is just to a very rough first approximation.
____________
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Profile Svenie25
Joined: Mar 21 05
Posts: 117
ID: 62489
Credit: 538,288
RAC: 13
Message 93973 - Posted 14 Jan 2009 8:35:40 UTC

My desktop got its first R5 WUs. Looks fine so far. Waiting for the first validations.
Now I hope to stay a looong time on one frequency file, to have a look at the new runtime curve.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,079,839
RAC: 9,661
Message 93990 - Posted 14 Jan 2009 13:24:19 UTC

BTW, as the command line arguments to the app are now printed into the debugging output of the results, it's much easier to check after a WU has finished whether its runtime is near the expected minimum or maximum.

Just look at the output in the result, and find the argument:

--numSkyPartitions=xxx

e.g. "--numSkyPartitions=339"

Now look up the sequence number in the name of the result, following the double underscore, e.g. for WU h1_0709.40_S5R4__677_S5R5a that number would be 677.

Now divide the second number by the first, so here:

677 / 339 = 1.997

If the fractional part of that quotient is close to 0 or 1, you are near a runtime maximum. If it's close to 0.5, you are near a runtime minimum.

This will help to put the first runtime results into perspective a bit.
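
For anyone who wants to script this check, here is a minimal Python sketch of the rule described above. The function names, the regular expression used to pull the sequence number out of the task name, and the 0.1 tolerance for "close to" are all assumptions made for illustration; none of them come from the app or from BOINC.

```python
import re


def runtime_phase(result_name: str, num_sky_partitions: int) -> float:
    """Fractional part of (sequence number / numSkyPartitions) for a task name."""
    # The sequence number is taken to be the digits right after the double underscore.
    match = re.search(r"__(\d+)_", result_name)
    if match is None:
        raise ValueError(f"no sequence number found in {result_name!r}")
    sequence_number = int(match.group(1))
    return (sequence_number / num_sky_partitions) % 1.0


def describe_phase(phase: float, tolerance: float = 0.1) -> str:
    """Classify a phase; the tolerance is an arbitrary illustrative choice."""
    if min(phase, 1.0 - phase) <= tolerance:
        return "near runtime maximum"
    if abs(phase - 0.5) <= tolerance:
        return "near runtime minimum"
    return "in between"


phase = runtime_phase("h1_0709.40_S5R4__677_S5R5a", 339)
print(f"phase = {phase:.3f}: {describe_phase(phase)}")
```

The value for `num_sky_partitions` has to be read by hand from the `--numSkyPartitions=xxx` argument in the result's debugging output; the sketch does not try to fetch or parse that output.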

CU

Bikeman
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 94052 - Posted 15 Jan 2009 19:24:41 UTC - in response to Message 93933.

The S5R4 workunit generator (WUG) has been stopped and the S5R5 one has been started instead. S5R5 has officially been launched.

And so I allowed for some extra minutes between its launch and me re-allowing work. Got an S5R4 of course. ;-)

Hmm, something strange happened here. My internet just dropped off.
Because of that I had some network problems (router doing strange things). The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found...

Here's the log at the exact time my internet went off:

15-Jan-2009 18:57:33 [---] file projects/einstein.phys.uwm.edu/einstein_S5R4_6.09_graphics_windows_intelx86.exe not found
15-Jan-2009 18:57:33 [---] Suspending network activity - user request
15-Jan-2009 18:57:33 [Einstein@Home] [error] Application file einstein_S5R4_6.09_windows_intelx86.exe missing signature
15-Jan-2009 18:57:33 [Einstein@Home] [error] BOINC cannot accept this file
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Reason: Unrecoverable error for result h1_1103.40_S5R4__791_S5R4a_1 (Input file einstein_S5R4_6.09_windows_intelx86.exe missing or invalid: -123)
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::report_result_error
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from start
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from resume_or_start1
15-Jan-2009 18:57:35 [Einstein@Home] Computation for task h1_1103.40_S5R4__791_S5R4a_1 finished
15-Jan-2009 18:57:35 [Einstein@Home] Output file h1_1103.40_S5R4__791_S5R4a_1_0 for task h1_1103.40_S5R4__791_S5R4a_1 absent
15-Jan-2009 18:57:35 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::app_finished

It has now downloaded a new S5R5 task, but is still trying to download the 6.09 graphical application every minute.

15-Jan-09 20:15:21|Einstein@Home|Backing off 1 min 0 sec on download of einstein_S5R4_6.09_graphics_windows_intelx86.exe

{scratch, scratch} was 6.09 a power app then?
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 94057 - Posted 15 Jan 2009 23:21:19 UTC - in response to Message 94052.
Last modified: 15 Jan 2009 23:27:58 UTC

... The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found...


Actually, the task exited because it was looking for the 6.09 version of the science app and there was no file sig for that version. The fact that the 6.09 graphics app also couldn't be found was just collateral damage :-).

I usually keep all the beta/power app versions on my server and I've just checked. I have versions (for Windows) 6.04, 6.05, 6.06, 6.07 and 6.10. AFAIK there was a Windows 6.09 version but it was to fix checkpointing issues under Win98 and ME if I recall correctly.

Since the current beta and official version is 6.10 (and I'm guessing this would have been the version you were using) the reason for your problem is that for some unknown reason the version number associated with your task suddenly got changed from 610 to 609 in your state file and then BOINC suddenly realised that you didn't have the 6.09 app package with which to continue crunching it. The fact that BOINC tries to get the 609 app shows that you weren't using the AP mechanism and somehow BOINC thinks that 609 is official. I don't remember if 609 was ever official at any point.

There are probably other variations on this but it seems that something in your state file that was 6.10 somehow got changed to 6.09 in some way. It's hard to see how this might be due to a loss of network connectivity.

Another funny point is that BOINC complains about a missing signature for a 6.09 file. This seems to imply that you had such a file in your project folder and had run it under AP at some point so that there was a <file_info> block for it (with no file sig) in your state file. Surely BOINC wouldn't say that it can't accept the file if the file didn't actually exist??

So what version of the science app were you actually running??
____________
Cheers,
Gary.

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 94061 - Posted 16 Jan 2009 0:02:17 UTC - in response to Message 94057.

... The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found...


Actually, the task exited because it was looking for the 6.09 version of the science app and there was no file sig for that version. The fact that the 6.09 graphics app also couldn't be found was just collateral damage :-).

I usually keep all the beta/power app versions on my server and I've just checked. I have versions (for Windows) 6.04, 6.05, 6.06, 6.07 and 6.10. AFAIK there was a Windows 6.09 version but it was to fix checkpointing issues under Win98 and ME if I recall correctly.

Since the current beta and official version is 6.10 (and I'm guessing this would have been the version you were using) the reason for your problem is that for some unknown reason the version number associated with your task suddenly got changed from 610 to 609 in your state file and then BOINC suddenly realised that you didn't have the 6.09 app package with which to continue crunching it. The fact that BOINC tries to get the 609 app shows that you weren't using the AP mechanism and somehow BOINC thinks that 609 is official. I don't remember if 609 was ever official at any point.

There are probably other variations on this but it seems that something in your state file that was 6.10 somehow got changed to 6.09 in some way. It's hard to see how this might be due to a loss of network connectivity.

Another funny point is that BOINC complains about a missing signature for a 6.09 file. This seems to imply that you had such a file in your project folder and had run it under AP at some point so that there was a <file_info> block for it (with no file sig) in your state file. Surely BOINC wouldn't say that it can't accept the file if the file didn't actually exist??

So what version of the science app were you actually running??

Gary,

For the first (and probably only) time I'm going to disagree with you - you're probably the most technically astute (and courteous) moderator I've come across in my limited range of BOINC projects - and yet.....

There was a Windows v6.09 package, and Bernd made his usual announcement thread for it. As a Beta, it would have come with an app_info.xml specifying all the filenames.

And that's exactly the point. The anonymous platform mechanism requires that every file is named, explicitly. BOINC doesn't make up filenames by combining version numbers with filename root components. [It does make up 'friendly names' that way for display in BOINC Manager]. That does suggest that at some point Jord downloaded and tested Beta v6.09 - it must have been in a relatively short interval between 4 Dec 2008 (v6.08) and 1 January 2009 (v6.10). I was an active participant in the Windows 98 phase of that test, and those are the download datestamps of my preserved archives.

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 94062 - Posted 16 Jan 2009 0:32:35 UTC - in response to Message 94057.

Since the current beta and official version is 6.10 (and I'm guessing this would have been the version you were using) the reason for your problem is that for some unknown reason the version number associated with your task suddenly got changed from 610 to 609 in your state file and then BOINC suddenly realised that you didn't have the 6.09 app package with which to continue crunching it. The fact that BOINC tries to get the 609 app shows that you weren't using the AP mechanism and somehow BOINC thinks that 609 is official. I don't remember if 609 was ever official at any point.

I was running version 6.09 up until that time, with the app_info.xml file.
But prior to trying for S5R5 work, I had set EAH to NNT, exited BOINC, taken out the app_info.xml file and the executables, restarted BOINC, reset the project (to clear straggling remnants in client_state.xml file) and re-allowed work fetch.

As I mentioned in this thread, I had gotten an S5R4 task. It has been running it with the 6.09 application and hasn't had a problem with it until my internet connection dropped off.

It had been running for several hours already before all of a sudden it found this file gone missing.

13-Jan-2009 23:38:35 [Einstein@Home] Starting h1_1103.40_S5R4__791_S5R4a_1
13-Jan-2009 23:38:38 [Einstein@Home] [task_debug] task_state=EXECUTING for h1_1103.40_S5R4__791_S5R4a_1 from start
13-Jan-2009 23:38:38 [Einstein@Home] Starting task h1_1103.40_S5R4__791_S5R4a_1 using einstein_S5R4 version 609

and

13-Jan-2009 23:41:40 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
13-Jan-2009 23:43:29 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
13-Jan-2009 23:45:18 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed

and

15-Jan-2009 10:49:03 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
15-Jan-2009 10:49:03 [Einstein@Home] [task_debug] task_state=QUIT_PENDING for h1_1103.40_S5R4__791_S5R4a_1 from preempt
15-Jan-2009 10:49:04 [Einstein@Home] [task_debug] Process for h1_1103.40_S5R4__791_S5R4a_1 exited
15-Jan-2009 10:49:04 [Einstein@Home] [task_debug] task_state=UNINITIALIZED for h1_1103.40_S5R4__791_S5R4a_1 from handle_premature_exit

That was all she wrote, until my internet went out and I had to restart BOINC (for different reasons), to be greeted upon return by

15-Jan-2009 18:57:33 [---] file projects/einstein.phys.uwm.edu/einstein_S5R4_6.09_graphics_windows_intelx86.exe not found
15-Jan-2009 18:57:33 [---] Suspending network activity - user request
15-Jan-2009 18:57:33 [Einstein@Home] [error] Application file einstein_S5R4_6.09_windows_intelx86.exe missing signature
15-Jan-2009 18:57:33 [Einstein@Home] [error] BOINC cannot accept this file
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Reason: Unrecoverable error for result h1_1103.40_S5R4__791_S5R4a_1 (Input file einstein_S5R4_6.09_windows_intelx86.exe missing or invalid: -123)
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::report_result_error
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from start
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from resume_or_start1

Look, if you don't want to run for some reason from day one, you're not checkpointing either. ;-)

After it kept on yammering that it couldn't find that one file, I even unpacked it from the zip file I have for 6.09, but then it would still not take it as the signature wouldn't match. Three further BOINC restarts fixed that.
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 94068 - Posted 16 Jan 2009 7:57:42 UTC - in response to Message 94061.

For the first (and probably only) time I'm going to disagree with you ....

I'm wrong plenty of times so you should disagree with me quite a bit :-).

At the moment I'm in the middle of converting 150 machines all running under AP and all still dual R3/R4 capable - although none of them has seen any R3 for a looooong time :-). All machines have caches in the range of 3 - 6 days and whilst EAH is the main project, some support LHC and some support SAH. Instead of just waiting for the caches to empty, I decided to dream up a conversion so that each machine could be dual capable for R4/R5 and that this transition should occur mid cache, so to speak, since none of my caches have actually drained as yet. I have a working solution that takes about 10 - 15 minutes per machine and I'm about half way through.

The longest part of the procedure is actually making the state file R3 clean so that I can get rid of all the old R3 stuff still in the project directory of each host. Another significant component is adding the file signatures for the R4 beta test apps that subsequently became official, 6.02 for Linux and 6.10 for Windows. This is what allows the successful removal of AP while there are still R4 tasks onboard. Also, as part of the conversion procedure, the new R5 apps are added to the project folder and are then discovered by BOINC when it restarts. This saves a lot of bandwidth by not having to download the full R5 app package 150 times.

After doing this surgery (requiring extreme concentration) for many, many hours, I decided I was in need of a rest so I decided to read the boards. So there was Jord's cry of pain which I read and responded to in rather too much haste in a mentally unfit state. I made the following dubious assumptions.

  • Because it was Jord and because he always stays up-to-date, he would be running 6.10.
  • He wouldn't be running 6.09 because his final statement seemed to be indicating that he didn't know what 6.09 actually was.
  • I didn't ever download 6.09 but I certainly knew it existed. I thought it was the version to correct checkpointing problems under Win9x.
  • Jord runs 2K and XP, so another reason why he wouldn't have been running 6.09.
  • The real problem was a missing file signature so I made the assumption that somehow a 6.10 somewhere had got changed to a 6.09 to create the problem.



With the benefit of Jord's next message, I now see that he was indeed running 6.09 and so his original message now conjures up a quite different image.

As part of the conversion process on my machines, I get to see what happens to each one (post conversion) when BOINC fires up again. I deliberately (by increasing the cache as required) force each host to download new work, just to be sure that everything is working correctly. When I first started doing this, the new task was mostly R5 but of late, the most recently converted hosts seem to be scoring R4 resends (ie _2 or above). If I keep increasing the cache, I will often get further resends but eventually (on every host so far) I get to score the initial R5 and I get to see all the "skipping downloads" messages for the full R5 app package.

My theory is that Jord (according to his statements) did his very best to remove all traces of R4 from his machine so that when he actually received an R4 task instead of the expected R5 he probably got quite a surprise. BOINC would have had to download the stock app for R4 which is 6.10 and not 6.09. I don't understand why that R4 task even started crunching with 6.09??? That's a question for Jord. When he received the R4 task, did he also receive the 6.10 stock app to go with it? If not, why not???

There was a Windows v6.09 package ...

Yes, I know. I tried to say that I hadn't bothered to download it as it mustn't have been important for me.

And that's exactly the point. The anonymous platform mechanism requires that every file is named, explicitly. BOINC doesn't make up filenames by combining version numbers with filename root components. [It does make up 'friendly names' that way for display in BOINC Manager].

I do actually more than fully understand all this :-). BOINC may not invent names but the editing mistakes of users certainly can.
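The kind of editing mistake described here can be caught mechanically. The following is a minimal sketch, not part of BOINC: it assumes an app_info.xml in the usual anonymous-platform layout (`file_info` entries declaring files, `file_ref` entries inside `app_version` using them). The helper name `check_app_info` and the trimmed-down XML are hypothetical, and the 6.10 file name is an assumed name that simply follows the pattern of the 6.09 files listed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down app_info.xml with a deliberate editing mistake:
# the file is declared as 6.10 but the app_version still references 6.09.
APP_INFO = """
<app_info>
  <app><name>einstein_S5R4</name></app>
  <file_info>
    <name>einstein_S5R4_6.10_windows_intelx86.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einstein_S5R4</app_name>
    <version_num>610</version_num>
    <file_ref>
      <file_name>einstein_S5R4_6.09_windows_intelx86.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def check_app_info(xml_text, files_on_disk):
    """Report names that are referenced but never declared, and names
    that appear in the XML but are not present in the project directory."""
    root = ET.fromstring(xml_text)
    # Every <file_info><name> is a file the user has declared explicitly.
    declared = {fi.findtext("name", "").strip() for fi in root.iter("file_info")}
    # Every <file_ref><file_name> is a file an app_version expects to use.
    referenced = {fr.findtext("file_name", "").strip() for fr in root.iter("file_ref")}
    return {
        "undeclared": sorted(referenced - declared),
        "missing": sorted((declared | referenced) - set(files_on_disk)),
    }


# Pretend only the 6.09 executable is actually in the project directory.
report = check_app_info(APP_INFO, {"einstein_S5R4_6.09_windows_intelx86.exe"})
print(report)
```

In practice the second argument would come from listing the project directory (for example with `os.listdir`); it is passed in as a plain set here so the sketch runs without touching any real BOINC installation.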

The crucial point is that since Jord tried so hard to "revert to stock", why was 6.09 being used at all?

____________
Cheers,
Gary.

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 94070 - Posted 16 Jan 2009 8:16:29 UTC - in response to Message 94068.

That's a question for Jord. When he received the R4 task, did he also receive the 6.10 stock app to go with it? If not, why not???

I never got 6.10 .. in fact the only applications I have in my Data\projects\einstein.phys.uwm.edu\ are:

einstein_S5R5_3.01_windows_intelx86.exe
einstein_S5R5_3.01_windows_intelx86_0.exe
einstein_S5R5_3.01_windows_intelx86_1.exe
einstein_S5R5_3.01_windows_intelx86_2.exe
einstein_S5R5_3.01_graphics_windows_intelx86.exe

and
einstein_S5R4_6.09_windows_intelx86.exe
einstein_S5R4_6.09_windows_intelx86_0.exe
einstein_S5R4_6.09_windows_intelx86_1.exe
einstein_S5R4_6.09_windows_intelx86_2.exe
einstein_S5R4_6.09_graphics_windows_intelx86.exe


There was a Windows v6.09 package ...

Yes, I know. I tried to say that I hadn't bothered to download it as it mustn't have been important for me.

I guess I did it because I was still at 6.04 or 6.05 before that. I never updated to 6.10 as I didn't see in time it was out. Was a bit busy elsewhere.

The crucial point is that since Jord tried so hard to "revert to stock", why was 6.09 being used at all?

And why did the app survive 2 earlier restarts of BOINC, before crashing out as being missing upon my internet connection going AWOL? (Although I am sure that was a coincidence, a one in a trillion shot. ;-))

I will do another reset after this S5R5 task has run its course. Although, as the task only ran for an hour and a half, I may get away with it and get it resent if I do the reset now.

Reset project. It's resending me the same task. Good.
It's also only resending me the 3.01 applications. The 6.09s are now gone from my Data\projects\einstein.phys.uwm.edu\ directory. I'll put a voodoo lock on it so they do stay gone. ;-)

Also good news:
I followed the whole same procedure on the AMD (win2k) and it just finished its first S5R5. I had a 632_60 done with S5R4, that ran in 92,132.46 seconds. The new one on S5R5 ran in 59,818.89 seconds. So definite speed up. I'll leave the credit comparing shenanigans to someone else. ;-)
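For anyone wanting to put a number on that speed-up, here is a minimal sketch using only the two run times quoted above. The function name `compare_runtimes` is made up for illustration, and the result is only a rough indication: it compares one S5R4 task with one S5R5 task, which are different workunits on the same host.

```python
def compare_runtimes(old_seconds, new_seconds):
    """Return (speed-up factor, fractional reduction in run time)."""
    speedup = old_seconds / new_seconds
    reduction = 1.0 - new_seconds / old_seconds
    return speedup, reduction


# Run times quoted in the post: one S5R4 task and one S5R5 task on the AMD host.
speedup, reduction = compare_runtimes(92132.46, 59818.89)
print(f"speed-up factor: {speedup:.2f}x, run time reduced by {reduction:.1%}")
```

That works out to roughly a third less run time for this single pair of tasks, which says nothing about credit per hour.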
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 94224 - Posted 22 Jan 2009 14:40:05 UTC

I'm having problems getting one machine to run the S5R5 application. It's host 475735, my Windows 2000 standard server (SP4). It's just my domestic file/print server, not a domain controller or anything. It's been running earlier versions of Einstein just fine (see the host join date/credit), and it's continuing to run SETI without problems. BOINC is v5.10.13 installed as a service - no recent change.

The problem with S5R5 is that tasks (well, the only S5R5 task it's been assigned so far) start to run but make no progress at all. I was away at the weekend, and the app ran for well over a day with still 0.000% progress showing.

Also, once the app starts, I can't find any way of stopping it. If I suspend the task via BOINCManager or BoincView, it continues to run at 99% CPU utilisation. Likewise if I shut down the BOINC service. I can't even kill the Einstein process with Task Manager - it tells me 'access denied'. The only way I can get back to productive work (e.g. on SETI) is to reboot the whole computer.

The CPU is a single-core P4 Northwood, with 512MB RAM. It's a very close match to my host 1036916, which runs S5R5 with no problems under XP SP3. Any ideas?

archae86
Joined: Dec 6 05
Posts: 569
ID: 139940
Credit: 5,757,826
RAC: 9,250
Message 94226 - Posted 22 Jan 2009 15:13:09 UTC - in response to Message 94224.

The CPU is a single-core P4 Northwood, with 512MB RAM. It's a very close match to my host 1036916, which runs S5R5 with no problems under XP SP3. Any ideas?

Northwood had the hyperthreading hardware, though it was not enabled for use until pretty late in the development cycle (my Gallatin, a direct Northwood descendant, had HT enabled).

If you do have HT, and have it enabled, you might get a change in behavior by disabling it. With my Gallatin host, it seemed to me that HT exposed bugs in more than one installer, so it could expose a bug in something else.

Long shot.
____________

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 94228 - Posted 22 Jan 2009 15:44:42 UTC - in response to Message 94226.

The CPU is a single-core P4 Northwood, with 512MB RAM. It's a very close match to my host 1036916, which runs S5R5 with no problems under XP SP3. Any ideas?

Northwood had the hyperthreading hardware, though it was not enabled for use until pretty late in the development cycle (my Gallatin, a direct Northwood descendant, had HT enabled).

If you do have HT, and have it enabled, you might get a change in behavior by disabling it. With my Gallatin host, it seemed to me that HT exposed bugs in more than one installer, so it could expose a bug in something else.

Long shot.

No, no HT enabled on either box. Both are unmodified Dell motherboards (XP on Dimension, W2KS on PowerEdge 600SC), so not much scope for getting the BIOS and the CPU out of sync!

Profile Stranger7777
Joined: Mar 17 05
Posts: 117
ID: 58307
Credit: 14,266,286
RAC: 16,853
Message 96772 - Posted 4 May 2009 5:32:26 UTC

I already asked this question when S5R3 was at the finish line, but here it is again. Why not finish S5R4 ASAP by crunching it in-house? There are only 27 units without a final result - about a week of work for a single computer. This would allow removing excess daemons like the S5R4 assimilator, the S5R4 validator and maybe even the S5R4 file deleter (not sure, maybe it is common to all S5). If it was a useful search, then it will be time to analyze the data; if not, throw it away ASAP. Are there any thoughts about this?

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 96773 - Posted 4 May 2009 8:06:12 UTC - in response to Message 96772.
Last modified: 4 May 2009 8:06:28 UTC

I already asked this question when S5R3 was at the finish line, but here it is again. Why not finish S5R4 ASAP by crunching it in-house? There are only 27 units without a final result - about a week of work for a single computer. This would allow removing excess daemons like the S5R4 assimilator, the S5R4 validator and maybe even the S5R4 file deleter (not sure, maybe it is common to all S5). If it was a useful search, then it will be time to analyze the data; if not, throw it away ASAP. Are there any thoughts about this?


If scientists here were eagerly awaiting the S5R4 results, we could help finish this run faster by raising the "initial replication" of the remaining workunits (i.e. sending out more tasks for them, so that two of these will hit fast computers). But actually they are still working on previous runs (finishing S5R1 publication, analyzing S5R3 results). If by the time they are done with that the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all.

Like OS daemons, the S5R4 ones just sleep until there is something to do. They don't harm the system at all.

For the time being we're just keeping the S5R4 workunits in the system for participants to get credit, and to save us unnecessary additional work.

BM

Profile Misfit
Joined: Feb 11 05
Posts: 500
ID: 15639
Credit: 100,000
RAC: 0
Message 96786 - Posted 5 May 2009 0:06:28 UTC - in response to Message 96773.

If by the time they are done with that the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all.

So it's possible all that work and crunch time could have been for nothing?
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 96791 - Posted 5 May 2009 6:51:38 UTC - in response to Message 96786.

If by the time they are done with that the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all.

So it's possible all that work and crunch time could have been for nothing?

At the time we started S5R4 it was the best search we could do. But then, learning from analyzing the results we had so far, we found a way to improve the sensitivity without requiring more computing power, so S5R5 was started, and S5R4 was cut short in favor of it. I would call S5R4 wasted if we had continued it till the end instead of superseding it with S5R5.

Dakota tribal wisdom says that when you discover you are riding a dead horse, the best strategy is to dismount.

BM

Profile Stranger7777
Joined: Mar 17 05
Posts: 117
ID: 58307
Credit: 14,266,286
RAC: 16,853
Message 97306 - Posted 2 Jun 2009 6:34:37 UTC

Yeah! We have finally finished S5R4. Now it won't waste our computer time anymore! Congratulations.

Profile Stranger7777
Joined: Mar 17 05
Posts: 117
ID: 58307
Credit: 14,266,286
RAC: 16,853
Message 97315 - Posted 3 Jun 2009 5:39:30 UTC

I now have a new question for the developers: is it possible to add a progress bar or progress value for the ABP1 search? I see the overall time for S5R5 getting higher each day, and the cause is an additional search whose progress we don't know. So it would be nice to see how fast we are going through it and when it will finish, opening the road for the mainstream project.

Holmis
Joined: Jan 4 05
Posts: 81
ID: 2070
Credit: 1,438,318
RAC: 3,249
Message 97318 - Posted 3 Jun 2009 8:10:39 UTC - in response to Message 97315.

I now have a new question for the developers: is it possible to add a progress bar or progress value for the ABP1 search? I see the overall time for S5R5 getting higher each day, and the cause is an additional search whose progress we don't know. So it would be nice to see how fast we are going through it and when it will finish, opening the road for the mainstream project.


Have a look in this thread!
____________


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration