S5R5 plans |
Message boards : Cruncher's Corner : S5R5 plans
| Author | Message |
|---|---|
|
Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results... | |
| ID: 90858 | | |
Any news about the S5R5 run mentioned earlier in this thread? It was supposed to begin soon because of some kind of bug in the S5R4 results... Will definitely come. We're still testing and tweaking the setup. Some facts so far: (1) a slightly increased memory requirement; (2) a larger "dwell time" per sky location, which limits the maximum checkpoint rate to about once per 3 min (on current average CPUs; longer on slower ones); (3) workunits will run roughly half as long as S5R4 ones. BM | |
| ID: 90864 | | |
Hi Bernd, Can we use the current (605) power app, or will we need a new app/app_info? ____________ BOINC blog | |
| ID: 90887 | | |
S5R5 will require new binaries that are currently under test. CU Bikeman | |
| ID: 90888 | | |
S5R5 will require new binaries that are currently under test. I presume there will be a similar rundown of S5R4 and changeover to S5R5 as there was at the start of S5R4, meaning those who want to stick with S5R4 WUs initially can do so until these get scarce? One driver will be the credit (and RAC) given for R4 versus R5. If they are similar, it's everyone's own choice; if R5 is slightly better, the rundown of R4 may take longer than planned. ____________ Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty! | |
| ID: 90892 | | |
Can we use the current (605) power app, or will we need a new app/app_info? The S5R5 Windows App will feature the code that makes the S5R4 6.05 App as fast as it is. There is no change in crediting targeted for S5R5. BM | |
| ID: 90894 | | |
|
Also, the S5R5 run will work on the existing S5R4 datafiles, so the transition from S5R4 to S5R5 should be much smoother than the previous one from S5R3 to S5R4, where it was not possible to reuse downloaded datafiles from the previous run. | |
| ID: 90895 | | |
Also, the S5R5 run will work on the existing S5R4 datafiles, so the transition from S5R4 to S5R5 should be much smoother than the previous one from S5R3 to S5R4, where it was not possible to reuse downloaded datafiles from the previous run. That sounds good then! | |
| ID: 90900 | | |
|
Partly on topic: | |
| ID: 90901 | | |
Partly on topic: (1) Drain your queue (set No New Work). (2) Report results when it's empty. (3) Shut down BOINC (make sure it's down all the way). (4) Remove the app_info.xml file. (5) Restart BOINC and enable new work. | |
| ID: 90911 | | |
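The five-step procedure in post 90911 hinges on step 2: only remove app_info.xml once nothing is left in the queue. As a rough way to check that, here is a minimal Python sketch. It assumes a simplified, hypothetical excerpt of the client's state file in which each queued or unreported task appears as a `<result>` element with a `<name>` child; the real client_state.xml carries many more fields, and the function names are made up for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def pending_results(state_xml: str) -> list[str]:
    """Names of <result> entries still present in a simplified state file.

    Assumption (hypothetical layout): every task not yet reported shows up
    as a <result> element with a <name> child somewhere in the document.
    """
    root = ET.fromstring(state_xml)
    return [r.findtext("name", default="") for r in root.iter("result")]


def safe_to_remove_app_info(state_xml: str) -> bool:
    # Step 2 of the procedure: proceed only once the queue is empty.
    return not pending_results(state_xml)


SAMPLE_STATE = """
<client_state>
  <result>
    <name>h1_1103.40_S5R4__791_S5R4a_1</name>
  </result>
</client_state>
"""

print(pending_results(SAMPLE_STATE))
print(safe_to_remove_app_info(SAMPLE_STATE))
print(safe_to_remove_app_info("<client_state/>"))
```

If the list is non-empty, the sensible move is to keep No New Work set and let the remaining tasks finish and report before touching app_info.xml.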
|
So... would the results of S5R4 still make any sense? If not, I'd like to stop running this project until S5R5 comes out. | |
| ID: 90969 | | |
So... would the results of S5R4 still make any sense? If not, I'd like to stop running this project until S5R5 comes out. I believe all of the science runs are valid and useful to some degree. I suggest completing the run. | |
| ID: 90977 | | |
|
Well done; it's excellent that the new application will have shorter-running tasks. I hope you reduce the deadline accordingly. I would suggest 7 days instead of the current 18; that would help greatly to reduce the length of time that some tasks remain pending. | |
| ID: 90981 | | |
Well done; it's excellent that the new application will have shorter-running tasks. I hope you reduce the deadline accordingly. I would suggest 7 days instead of the current 18; that would help greatly to reduce the length of time that some tasks remain pending. A seven-day deadline is way too short, as it will be detrimental (pardon my strong word) to multi-core crunchers like me who download loads of workunits to last a week or two on less-accessible-yet-automated machines. Can't wait to get credited for your work...? | |
| ID: 90990 | | |
|
I don't mind waiting a few weeks but a month or more is excessive on any project. | |
| ID: 90992 | | |
I don't mind waiting a few weeks but a month or more is excessive on any project. The way the datasets are distributed, if you reduce to a 7-day deadline you will more than likely significantly increase the amount of "backfill" downloading that goes on. This means that there will be more downloading of 70 MB+ groups of files, which will irritate those still on dial-up. If runtimes are indeed halved, then the minimum deadline should go to 9 days, since that is half of the current 18. My suggestion, though, would be to return to the original 14-day deadline. | |
| ID: 91002 | | |
|
Several weeks ago the "pending credit" situation was rather bad, as reported in this thread, but now, at least for me, it's OK again. If it stays like this I can live with an 18- or 14-day deadline. I guess the fact that ATLAS stopped crunching for some time when the servers got overloaded must have contributed to the massive increase in pending credits? | |
| ID: 91018 | | |
I guess the fact that ATLAS stopped crunching for some time when the servers got overloaded must have contributed to the massive increase of pending credits? Not directly by waiting for it, I think. It seems to me the primary symptom was that it became common for many days to go by between the first issue of a result from a WU and the issue of the first quorum partner result in that same WU. No matter how promptly everyone processes the results they receive, that situation gives trouble. It seems to have become less common recently, though my tiny fleet is not a big enough sample to say that with any assurance. Even in that fleet I spotted some 5-day delays within the last week. Perhaps ATLAS contributed indirectly: instead of waiting for it, our waits were a consequence of the scheduler's poor response to its presence. | |
| ID: 91030 | | |
No matter how promptly everyone processes the results they receive, that situation gives trouble. Could you please state the "trouble" it gives? Help me understand why there is a problem... | |
| ID: 91033 | | |
|
The "deadline" is something that can rather easily be adjusted on the fly during a run (whereas the average workunit duration is not). With current workunits the deadline is set to 18 days; from what I recall of previous discussions, I think most people feel comfortable with about 14 days for 6-8 h WUs. | |
| ID: 91048 | | |
The "deadline" is something that can rather easily be adjusted on the fly during a run (whereas the average workunit duration is not). With current workunits the deadline is set to 18 days; from what I recall of previous discussions, I think most people feel comfortable with about 14 days for 6-8 h WUs. Perhaps people don't remember the history and the reasoning for going up to 18 days. There used to be a lot of complaining about "Einstein" tasks going into Earliest Deadline First (now called "High Priority") when tasks were set to 14 days. The common misconception was that it is the project doing this and that the participants' resource allocation selection is not being honored. People made complaints about Einstein "hogging" their CPU. The reality is that BOINC was/is doing it to try to make sure that work is returned on time and that resource allocations are honored over the long term, but perhaps not on an hour-by-hour basis. I view a lot of the "I have way too much pending credit" discussion in a similar light. Is it a "problem"? I guess it could be, if it is significant enough to cause a sizeable number of participants to stop processing tasks because they feel they are not being rewarded in a timely fashion. Beyond that, it is up to you and the rest of the project team to determine whether or not you need to be getting results in faster. The deadline needs to be set to something "reasonable". When I requested the increase to 18-21 days, the condition I put on it was that it should stay increased only until the SSE (and other) enhancements made it into the stock Windows application. Since that time has arrived, 14 days is probably a good choice again. Due to workunit distribution methods, I'm not sure that going lower than that will bring the substantial "relief" hoped for by the people who are upset about pending credit and/or unsent tasks. | |
| ID: 91063 | | |
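The point in post 91063 about BOINC switching tasks into Earliest Deadline First / High Priority comes down to whether the queued work can finish before its deadline at the CPU share the project normally gets. The real client answers this with a more detailed scheduling simulation; the Python sketch below is only a back-of-the-envelope version of that question, and every parameter (tasks queued, runtime, cores, resource share, fraction of the day the host crunches) is a hypothetical input, not a BOINC setting name.

```python
def days_needed(tasks_queued: int, runtime_h: float, cores: int,
                resource_share: float, on_fraction: float) -> float:
    """Wall-clock days to clear a queue at the project's normal CPU share.

    Simplification (assumption): tasks run back to back on `cores` cores,
    the project gets `resource_share` of CPU time, and the host crunches
    for `on_fraction` of each day.
    """
    cpu_hours_needed = tasks_queued * runtime_h
    cpu_hours_per_day = 24 * cores * resource_share * on_fraction
    return cpu_hours_needed / cpu_hours_per_day


def meets_deadline(deadline_days: float, **queue) -> bool:
    """True if the queue clears before the deadline without any priority boost."""
    return days_needed(**queue) <= deadline_days


# Hypothetical part-time host: 3 tasks of 20 h, 1 core, 50 % share, on 60 % of the day.
queue = dict(tasks_queued=3, runtime_h=20.0, cores=1,
             resource_share=0.5, on_fraction=0.6)
print(round(days_needed(**queue), 2))  # 8.33
print(meets_deadline(14, **queue))     # True
print(meets_deadline(7, **queue))      # False
```

With these made-up numbers the queue needs a bit over 8 days, so a 14-day deadline leaves slack while a 7-day one would push the client into high-priority mode or a missed deadline.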
|
I guess ATLAS also has to be taken into account. I could imagine that there will be periods when E@H jobs get very little CPU time and others when ATLAS is highly productive for E@H, for lack of other jobs. That is probably hard to predict and highly irregular. If the deadline is too short, there might be mass failures to meet the deadline across the several thousand ATLAS cores. But 14 days seems reasonable. | |
| ID: 91065 | | |
If the deadline is too short, there might be mass failures to meet the deadline across the several thousand ATLAS cores. But 14 days seems reasonable. It more than likely is reasonable. Personally, I doubt going below 12 days would help, and in fact, I think it may harm things. 14 is a good place to start. Part of the problem now is that the faster Windows app is still not the stock app. A combination of getting the faster app to the general user base along with the reduction to 14 days should be the first step taken to see if it makes a significant dent in the pending / unsent issue. If it does not, then it is up to the project to decide if that issue is a high enough risk to warrant any other action to address the situation. | |
| ID: 91066 | | |
A combination of getting the faster app to the general user base along with the reduction to 14 days should be the first step taken to see if it makes a significant dent in the pending / unsent issue. Is there still an issue? The "oldest unsent result" is now back to 7 days (it used to be like that as far as I can remember), and pending credits have been reduced likewise (at least that's my experience). I think it's more or less "normal" again. CU Bikeman | |
| ID: 91068 | | |
There are still people complaining about it, so there is an "issue" of some sort, either real or perceived... | |
| ID: 91071 | | |
Partly on topic: Any idea when the cutover to S5R5 is likely to happen? So we can plan the above steps, well, the first one at least. | |
| ID: 91096 | | |
Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least. Currently there's a telecon scheduled for tomorrow which covers the subject. Once the final decision is made actually starting the run is a matter of days. I'll keep you posted. BM | |
| ID: 91099 | | |
|
Thanks Bernd | |
| ID: 91101 | | |
|
Merci, Bernd........or is it 'mercy'??....;)....Cheers, Rog. | |
| ID: 91109 | | |
|
Perhaps my ambit claim of a 7-day deadline was a bit ambitious, but it was not my intention to cause offence. I still believe a shorter deadline, if possible, is better for any quorum project than a longer one, particularly if the project wishes to retain a higher percentage of new contributors. When the unsent time rises as well, it's like a double whammy. However, it seems there are valid user and server reasons to justify a 2-week deadline, so that's fair enough. | |
| ID: 91142 | | |
I have noticed that the running time of the current Einstein tasks can vary on my computer by up to about 27%. Will the new S5R5 tasks also vary in the same way as the current tasks? The variation is quite normal; there is a lot of info and graphs in the "How to check Performance when Testing a new App" thread. | |
| ID: 91145 | | |
Runtime variation will not disappear in S5R5; if anything it will get a bit more pronounced and certainly less predictable (greater "wiggles" in the runtime graphs discussed in other threads here). This is because there will be fewer sky points per WU to average out runtime irregularities over the course of a single WU. It *might* be possible to model the credits per WU somewhat more realistically, so that WUs that take longer will be awarded more credits; not sure this gets implemented in time for S5R5, though, we'll see. CU Bikeman | |
| ID: 91146 | | |
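Bikeman's explanation in post 91146, that fewer sky points per WU leave less room to average out irregularities, is the usual statistics of sums: if per-point costs fluctuate independently, the relative spread of a WU's total shrinks roughly as 1/sqrt(points). The toy Python simulation below illustrates only that averaging effect. The uniform per-point cost model and the point counts are assumptions for illustration; the real S5R4/S5R5 runtime "wiggles" are largely systematic rather than random.

```python
import random
import statistics


def simulated_cv(points_per_wu: int, n_wus: int = 500, seed: int = 1) -> float:
    """Coefficient of variation (stdev / mean) of simulated WU runtimes.

    Toy model (assumption): each sky point costs 1.0 time unit on average,
    with independent uniform jitter between 0.5 and 1.5 units.
    """
    rng = random.Random(seed)
    runtimes = [
        sum(rng.uniform(0.5, 1.5) for _ in range(points_per_wu))
        for _ in range(n_wus)
    ]
    return statistics.stdev(runtimes) / statistics.mean(runtimes)


many = simulated_cv(400)  # longer WU: many sky points
few = simulated_cv(50)    # shorter WU: fewer sky points
# Theory predicts the ratio to be near sqrt(400 / 50), i.e. about 2.8.
print(f"CV with 400 points: {many:.4f}")
print(f"CV with 50 points:  {few:.4f}")
print(f"ratio: {few / many:.2f}")
```

The absolute numbers depend entirely on the made-up jitter; the point is only that the relative spread grows as the number of averaged sky points drops.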
Any idea when the cut over to S5R5 is likely to happen? So we can plan the above steps, well the first one at least. Directly from telecon: Further (internal) testing / simulations needed, expected timeframe for this is another week. BM | |
| ID: 91150 | | |
|
Thank you both for your replies, Winternight and Bikeman. | |
| ID: 91153 | | |
|
Hmmmm... | |
| ID: 91199 | | |
Hmmmm... It's in a post by Bernd buried in the middle of the v6.05 power app thread. | |
| ID: 91202 | | |
|
Maybe redundant by now, but the answer to question one is that S5R5 will analyze the S5R4 data, but faster and without bugs, so they're not going to wait until this run finishes but will replace it with the new run/application. | |
| ID: 91204 | | |
|
Do we get native 64-bit apps for this run? | |
| ID: 91245 | | |
Any idea when the cutover to S5R5 is likely to happen? So we can plan the above steps, well, the first one at least. So how is it coming along? Can we expect it in two days (5 November + 7 days = 12 November ^^)? Or is it going to take longer? | |
| ID: 91388 | | |
|
Update: | |
| ID: 92367 | | |
|
Thanks for the update, Bernd. | |
| ID: 92375 | | |
|
Hi Bernd, Simulations show that with the currently planned (and preliminarily implemented) S5R5 setup we would miss some signals. More tuning needed, will take at least another week. Now that Christmas/New Year distractions are over and the people that matter are getting back to work, I presume that some action on S5R5 is probably close at hand. Also, as announced in the Windows 6.10 thread I just made this App "official". This gives you the opportunity now to switch back to the "official" path (you should empty your work cache before removing the app_info.xml), which I would recommend. This will allow you to get ABP1 and S5R5 work right away when we issue it. Things seem to be hotting up for a "sooner rather than later" bit of action. It would be very good if you could provide some details on how you see the transition from S5R4 to S5R5 actually happening. Here are some specific questions: 1. Will there be a sudden termination of S5R4 tasks - ie server will suddenly have zero S5R4 tasks to issue or will there be a transition during which both types will be available? 2. Will there be any point in completing cached work on clients? Presumably the answer to this would be "yes" if there is to be some sort of transition? 3. If somebody has a large cache would it be advantageous to reduce it now in anticipation of S5R5? 4. Will you be attempting to complete all "open" quorums by reissuing tasks when already issued ones error out or fail to return by the deadline? Any detailed information you can share would be appreciated, thanks. ____________ Cheers, Gary. | |
| ID: 93691 | | |
|
Hi Gary! 1. Will there be a sudden termination of S5R4 tasks - ie server will suddenly have zero S5R4 tasks to issue or will there be a transition during which both types will be available? We'll stop the S5R4 workunit generator, but the workunits generated so far will be finished and credited. I'm not sure that pushing the last ones through is necessary, so I probably won't put time into this. There is no (intentional) change in the crediting, so with respect to credit it shouldn't matter whether you run S5R4 or S5R5 workunits. The S5R5 ones will run a bit shorter (design goal was 50%, but I'm afraid with the adjustments we had to make afterwards we missed it by about 10%). BM | |
| ID: 93692 | | |
|
Perhaps it's also worth mentioning that even though WUs will finish much faster on average, the relative runtime variation (ratio between longest and shortest runtime on the same host for different WUs) will *increase*. So it will require averaging over even more WUs than in S5R4 to really estimate the true average runtime on your systems: don't be too excited/disappointed when the first few WUs run much faster/slower than expected ;-). | |
| ID: 93726 | | |
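How many WUs are "enough" to estimate a host's average runtime, as post 93726 puts it, depends on how large the relative variation is. A common rule of thumb for independent samples is n ≈ (z·CV/e)², where CV is the runtime's coefficient of variation, e the relative error accepted on the mean, and z ≈ 1.96 for about 95% confidence. The Python sketch below applies it. The CV values are made up, and successive E@H WUs are probably not truly independent (neighbouring WUs may share runtime trends), so treat the result as a lower bound.

```python
import math


def wus_needed(cv: float, rel_error: float, z: float = 1.96) -> int:
    """WUs needed for the sample mean runtime to land within rel_error.

    Assumes independent runtimes with coefficient of variation `cv`
    (normal approximation, ~95 % confidence for z = 1.96).
    """
    return math.ceil((z * cv / rel_error) ** 2)


print(wus_needed(cv=0.10, rel_error=0.05))  # 16
print(wus_needed(cv=0.20, rel_error=0.05))  # 62
```

Doubling the relative variation roughly quadruples the number of WUs needed, which is why the first handful of S5R5 results says little about the long-run average.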
the relative runtime variation ... will *increase* More unpredictable, just what I need. I will follow the findings of the "runtime variance diagram" guys, but for now I'll switch. ____________ Team Philippines | |
| ID: 93825 | | |
the relative runtime variation ... will *increase* Actually it's more predictable than ever. Thanks to the work of Bikeman, the granted credit will match the runtime variation more accurately than ever before. Also, as the total task runtime will be halved, even a 30% variation will be smaller in absolute time than before. Finally, I think we got the floating-point operation estimate better than ever, and we update the progress counter more frequently in the S5R5 App, so the client should be able to estimate the runtime more accurately. BM | |
| ID: 93854 | | |
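Bernd's point in post 93854, that a larger relative variation can still mean a smaller absolute one, is simple arithmetic. Using figures from this thread as illustrative inputs (an 8-hour S5R4 task with the roughly 27% spread reported earlier, versus a roughly 4-hour S5R5 task with a 30% spread), a short Python check:

```python
def spread_hours(runtime_h: float, relative_variation: float) -> float:
    """Absolute runtime spread implied by a relative variation."""
    return runtime_h * relative_variation


s5r4 = spread_hours(8.0, 0.27)  # illustrative S5R4 task: 8 h, ~27 % spread
s5r5 = spread_hours(4.0, 0.30)  # illustrative S5R5 task: ~4 h, 30 % spread
print(f"S5R4: {s5r4:.2f} h, S5R5: {s5r5:.2f} h")  # S5R4: 2.16 h, S5R5: 1.20 h
```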
|
Good to hear that at least the credit/hr will be more stable then. If performance can be evaluated from the credits granted, then the runtime variation isn't so bad... Looking forward even more to seeing some S5R5 results now. | |
| ID: 93857 | | |
the relative runtime variation ... will *increase* Okay, when are you planning on switching over? I've deleted app_info in anticipation, but it seems happy to use the 6.10 app with S5R4s still coming down at the moment. | |
| ID: 93859 | | |
Good question. 40-hour work units are getting on my nerves :| No offence, but I'd prefer 23:59:59 or less. ____________ seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift. | |
| ID: 93862 | | |
Good question. 40-hour work units are getting on my nerves :| No offence, but I'd prefer 23:59:59 or less. If the current S5R4 WUs take 40 h for you, S5R5 should take 20 h on average. We're running some final tests; S5R5 should start today or tomorrow if we don't find any oddities. BM | |
| ID: 93863 | | |
Good question. 40-hour work units are getting on my nerves :| No offence, but I'd prefer 23:59:59 or less. Great, thanks for the reply. Mine take around 8 hours each, so that means they should be around 4 hours each. | |
| ID: 93864 | | |
|
So if my credit per WU is 222 cobblestones, it means it should be around 222 as well...? (Fingers crossed) | |
| ID: 93868 | | |
So if my credit per WU is 222 cobblestones, it means it should be around 222 as well...? (Fingers crossed) Not quite sure what you mean. There will still be a significant gap between claimed credit and granted credit (claimed credit is just ignored by E@H anyway; credit is fixed at the server level). In S5R1..S5R4, all WUs of a certain frequency range got you exactly the same credit, say 222 credits. However, some of them would take (say) 10 hours to complete and others (say) 12 hours. With S5R5, workunits are shorter, but the relative variation is expected to be greater, so, say, one WU would finish in 6 hours (for, say, 111 credits) and the other in maybe 3.5 hours. Awarding the same credit to WUs with such a big variation would probably not be well received by users, even if the average credit/h would be the same over a long time. So an attempt was made to award credit according to the complexity of the WUs: longer WUs will get more credit and shorter ones less. This can only be an approximation, and may need some adjustments after seeing how well it works out for a) different WU frequency ranges and b) different hardware. For the "theory" of the runtime variations, see the different threads on performance measurement and the "ready reckoner" here in this forum, which provided the input for this approximation (Gary, Mike, Richard Haselgrove and archae86 provided a lot of insight and data there). CU Bikeman | |
| ID: 93884 | | |
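Bikeman's example in post 93884 can be made concrete with a linear stand-in for "credit according to complexity": derive a credit rate from a reference WU (6 hours for 111 credits, his illustrative numbers) and scale it by each WU's estimated runtime. This is a hypothetical sketch, not the project's server-side formula, which was derived from the "ready reckoner" runtime model and need not be linear.

```python
def credit_for(estimated_runtime_h: float,
               reference_runtime_h: float = 6.0,
               reference_credit: float = 111.0) -> float:
    """Scale credit linearly with a WU's estimated runtime (illustrative only)."""
    credits_per_hour = reference_credit / reference_runtime_h  # 18.5 credits/h
    return credits_per_hour * estimated_runtime_h


print(credit_for(6.0))  # 111.0
print(credit_for(3.5))  # 64.75
```

Under this model every WU pays the same credit per estimated hour, which is the property the post is after; how fair it feels then depends entirely on how good the runtime estimate is for a given frequency range and CPU.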
|
Luckily both my systems returned their latest tasks just today and BOINC was giving the CPU time back to the other projects. So in eager anticipation I've put Einstein on NNT on both systems and in a bit I'll remove the app_info.xml files for the various Power Apps they run, so that when I re-allow work tomorrow, I'll only have to do a reset prior to that and hope to sit back and await the new downloads. ;-) | |
| ID: 93886 | | |
|
The new applications have been in the pipe since yesterday afternoon. Now we only need the new work for them. ;) | |
| ID: 93922 | | |
|
The S5R4 workunit generator (WUG) has been stopped and the S5R5 one has been started instead. S5R5 has officially been launched. | |
| ID: 93932 | | |
The S5R4 workunit generator (WUG) has been stopped and the S5R5 one has been started instead. S5R5 has officially been launched. And so I allowed a few extra minutes between its launch and re-allowing work. Got an S5R4 of course. ;-) ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 93933 | | |
|
Yes, three hosts three times re-allowing new work and got 3 times S5R4-units ;-) | |
| ID: 93935 | | |
|
And same here: got two S5R4 reissues. I think it'll depend on how many unsent results there were for your frequency range and how quickly you and your data file partners can pick up the old results. The oldest unsent result in the database is now just less than 7 days old. | |
| ID: 93937 | | |
|
Yes, the S5R4 workunits will remain the majority for the next few days. We won't cancel them so people will get credit for them; this will also help our servers to cope with the transition. | |
| ID: 93938 | | |
|
Go figure. My AMD got an S5R5 out-of-the-box. | |
| ID: 93939 | | |
Yes, the S5R4 workunits will remain the majority for the next few days. We won't cancel them so people will get credit for them; this will also help our servers to cope with the transition. Indeed it helps spread out the new application downloads and is of course the best way to handle the transition. I'm just eager to start the new run on my machines but will do my part to complete the S5R4 run. Looks like about 40 results to be done for my laptop with no partners in sight. Now have to check the quad... Thanks Bernd and here's hoping for a smooth changeover! | |
| ID: 93940 | | |
|
I have 2 E@H S5R4 tasks left to do... one of them is almost done (another 29h+) and the second one will take about 52h+ | |
| ID: 93946 | | |
|
I just transitioned three hosts from app_info running 6.05 to stock (thus accepting S5R4 on 6.10 plus S5R5 on 3.01) | |
| ID: 93949 | | |
|
I have swapped part of one of my quads to the new WU. | |
| ID: 93950 | | |
|
Now crunching three 3.01 WUs, and projections suggest these will complete (on my older quad) in about 6 hours. The 6.10 WUs using the 6.05 client with an app_info file took 7 hours 22 minutes. | |
| ID: 93958 | | |
|
My desktop got its first R5 WUs. Looks fine so far. Waiting for the first validations. | |
| ID: 93973 | | |
|
BTW, as the command line arguments to the app are now printed into the debugging output of the results, it's much easier to check after a WU has finished whether its runtime is near the expected minimum or maximum. | |
| ID: 93990 | | |
The S5R4 workunit generator (WUG) has been stopped and the S5R5 one been started instead. S5R5 has officially been launched. Hmm, something strange happened here. My internet just dropped off. Because of that I had some network problems (router doing strange things). The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found... Here's the log at the exact time my internet went off:
15-Jan-2009 18:57:33 [---] file projects/einstein.phys.uwm.edu/einstein_S5R4_6.09_graphics_windows_intelx86.exe not found
15-Jan-2009 18:57:33 [---] Suspending network activity - user request
15-Jan-2009 18:57:33 [Einstein@Home] [error] Application file einstein_S5R4_6.09_windows_intelx86.exe missing signature
15-Jan-2009 18:57:33 [Einstein@Home] [error] BOINC cannot accept this file
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Reason: Unrecoverable error for result h1_1103.40_S5R4__791_S5R4a_1 (Input file einstein_S5R4_6.09_windows_intelx86.exe missing or invalid: -123)
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::report_result_error
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from start
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from resume_or_start1
15-Jan-2009 18:57:35 [Einstein@Home] Computation for task h1_1103.40_S5R4__791_S5R4a_1 finished
15-Jan-2009 18:57:35 [Einstein@Home] Output file h1_1103.40_S5R4__791_S5R4a_1_0 for task h1_1103.40_S5R4__791_S5R4a_1 absent
15-Jan-2009 18:57:35 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::app_finished
It has now downloaded a new S5R5 task, but is still trying to download the 6.09 graphical application every minute:
15-Jan-09 20:15:21|Einstein@Home|Backing off 1 min 0 sec on download of einstein_S5R4_6.09_graphics_windows_intelx86.exe
{scratch, scratch} was 6.09 a power app then? ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 94052 | | |
... The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found... Actually, the task exited because it was looking for the 6.09 version of the science app and there was no file signature for that version. The fact that the 6.09 graphics app also couldn't be found was just collateral damage :-). I usually keep all the beta/power app versions on my server and I've just checked. I have versions (for Windows) 6.04, 6.05, 6.06, 6.07 and 6.10. AFAIK there was a Windows 6.09 version but it was to fix checkpointing issues under Win98 and ME if I recall correctly. Since the current beta and official version is 6.10 (and I'm guessing this would have been the version you were using) the reason for your problem is that for some unknown reason the version number associated with your task suddenly got changed from 610 to 609 in your state file and then BOINC suddenly realised that you didn't have the 6.09 app package with which to continue crunching it. The fact that BOINC tries to get the 609 app shows that you weren't using the AP mechanism and somehow BOINC thinks that 609 is official. I don't remember if 609 was ever official at any point. There are probably other variations on this but it seems that something in your state file that was 6.10 somehow got changed to 6.09. It's hard to see how this might be due to a loss of network connectivity. Another funny point is that BOINC complains about a missing signature for a 6.09 file. This seems to imply that you had such a file in your project folder and had run it under AP at some point so that there was a <file_info> block for it (with no file sig) in your state file. Surely BOINC wouldn't say that it can't accept the file if the file didn't actually exist?? So what version of the science app were you actually running?? ____________ Cheers, Gary. | |
| ID: 94057 | | |
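Gary's diagnosis in post 94057 is that the task ended up tied to app version 609 while the client only had a usable package for 610. A mismatch of that kind can be spotted mechanically. The Python sketch below assumes a simplified, hypothetical state-file excerpt with a single application, where each `<app_version>` and each `<result>` carries a `<version_num>`; the real client_state.xml links results to applications through more fields, so this illustrates the check rather than being a tool for the real file.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def orphaned_results(state_xml: str) -> list[tuple[str, int]]:
    """(result name, version) pairs whose version has no <app_version> entry.

    Assumption (hypothetical layout): one application only, so matching on
    <version_num> alone is enough to pair results with app versions.
    """
    root = ET.fromstring(state_xml)
    installed = {
        int(av.findtext("version_num", default="-1"))
        for av in root.iter("app_version")
    }
    orphans = []
    for r in root.iter("result"):
        version = int(r.findtext("version_num", default="-1"))
        if version not in installed:
            orphans.append((r.findtext("name", default="?"), version))
    return orphans


SAMPLE_STATE = """
<client_state>
  <app_version>
    <app_name>einstein_S5R4</app_name>
    <version_num>610</version_num>
  </app_version>
  <result>
    <name>h1_1103.40_S5R4__791_S5R4a_1</name>
    <version_num>609</version_num>
  </result>
</client_state>
"""

print(orphaned_results(SAMPLE_STATE))
```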
... The next thing I know is that the old S5R4 task was gone. It exited with an error as the graphics application was not found... Gary, For the first (and probably only) time I'm going to disagree with you - you're probably the most technically astute (and courteous) moderator I've come across in my limited range of BOINC projects - and yet..... There was a Windows v6.09 package, and Bernd made his usual announcement thread for it. As a Beta, it would have come with an app_info.xml specifying all the filenames. And that's exactly the point. The anonymous platform mechanism requires that every file is named, explicitly. BOINC doesn't make up filenames by combining version numbers with filename root components. [It does make up 'friendly names' that way for display in BOINC Manager]. That does suggest that at some point Jord downloaded and tested Beta v6.09 - it must have been in a relatively short interval between 4 Dec 2008 (v6.08) and 1 January 2009 (v6.10). I was an active participant in the Windows 98 phase of that test, and those are the download datestamps of my preserved archives. | |
| ID: 94061 | | |
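Richard's point in post 94061 is that under the anonymous platform mechanism every file must be named explicitly in app_info.xml. One quick consistency check is that each `<file_ref>`/`<file_name>` inside an `<app_version>` has a matching `<file_info>`/`<name>` declaration. The Python sketch below does that with the standard library. The sample file is illustrative only (built from the 6.09 file names in Jord's log), deliberately leaves the graphics file undeclared, and is not the actual 6.09 beta package's app_info.xml.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative app_info.xml: the graphics file is referenced but never declared.
APP_INFO = """
<app_info>
  <app><name>einstein_S5R4</name></app>
  <file_info>
    <name>einstein_S5R4_6.09_windows_intelx86.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einstein_S5R4</app_name>
    <version_num>609</version_num>
    <file_ref>
      <file_name>einstein_S5R4_6.09_windows_intelx86.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>einstein_S5R4_6.09_graphics_windows_intelx86.exe</file_name>
    </file_ref>
  </app_version>
</app_info>
"""


def undeclared_files(app_info_xml: str) -> list[str]:
    """File names referenced by an <app_version> but missing a <file_info>."""
    root = ET.fromstring(app_info_xml)
    declared = {fi.findtext("name") for fi in root.iter("file_info")}
    return [
        ref.findtext("file_name")
        for av in root.iter("app_version")
        for ref in av.iter("file_ref")
        if ref.findtext("file_name") not in declared
    ]


print(undeclared_files(APP_INFO))
# ['einstein_S5R4_6.09_graphics_windows_intelx86.exe']
```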
Since the current beta and official version is 6.10 (and I'm guessing this would have been the version you were using) the reason for your problem is that for some unknown reason the version number associated with your task suddenly got changed from 610 to 609 in your state file and then BOINC suddenly realised that you didn't have the 6.09 app package with which to continue crunching it. The fact that BOINC tries to get the 609 app shows that you weren't using the AP mechanism and somehow BOINC thinks that 609 is official. I don't remember if 609 was ever official at any point. I was running version 6.09 up until that time, with the app_info.xml file. But prior to trying for S5R5 work, I had set EAH to NNT, exited BOINC, taken out the app_info.xml file and the executables, restarted BOINC, reset the project (to clear straggling remnants in the client_state.xml file) and re-allowed work fetch. As I mentioned in this thread, I had gotten an S5R4 task. It had been running with the 6.09 application and hadn't had a problem with it until my internet connection dropped off. It had been running for several hours already before it suddenly found this file missing.
13-Jan-2009 23:38:35 [Einstein@Home] Starting h1_1103.40_S5R4__791_S5R4a_1
13-Jan-2009 23:38:38 [Einstein@Home] [task_debug] task_state=EXECUTING for h1_1103.40_S5R4__791_S5R4a_1 from start
13-Jan-2009 23:38:38 [Einstein@Home] Starting task h1_1103.40_S5R4__791_S5R4a_1 using einstein_S5R4 version 609
and
13-Jan-2009 23:41:40 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
13-Jan-2009 23:43:29 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
13-Jan-2009 23:45:18 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
and
15-Jan-2009 10:49:03 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed
15-Jan-2009 10:49:03 [Einstein@Home] [task_debug] task_state=QUIT_PENDING for h1_1103.40_S5R4__791_S5R4a_1 from preempt
15-Jan-2009 10:49:04 [Einstein@Home] [task_debug] Process for h1_1103.40_S5R4__791_S5R4a_1 exited
15-Jan-2009 10:49:04 [Einstein@Home] [task_debug] task_state=UNINITIALIZED for h1_1103.40_S5R4__791_S5R4a_1 from handle_premature_exit
That was all she wrote, until my internet went out and I had to restart BOINC (for different reasons), to be greeted upon return by
15-Jan-2009 18:57:33 [---] file projects/einstein.phys.uwm.edu/einstein_S5R4_6.09_graphics_windows_intelx86.exe not found
15-Jan-2009 18:57:33 [---] Suspending network activity - user request
15-Jan-2009 18:57:33 [Einstein@Home] [error] Application file einstein_S5R4_6.09_windows_intelx86.exe missing signature
15-Jan-2009 18:57:33 [Einstein@Home] [error] BOINC cannot accept this file
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
15-Jan-2009 18:57:33 [Einstein@Home] [sched_op_debug] Reason: Unrecoverable error for result h1_1103.40_S5R4__791_S5R4a_1 (Input file einstein_S5R4_6.09_windows_intelx86.exe missing or invalid: -123)
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] result state=COMPUTE_ERROR for h1_1103.40_S5R4__791_S5R4a_1 from CS::report_result_error
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from start
15-Jan-2009 18:57:33 [Einstein@Home] [task_debug] task_state=COULDNT_START for h1_1103.40_S5R4__791_S5R4a_1 from resume_or_start1
Look, if you don't want to run for some reason from day one, you're not checkpointing either. ;-) After it kept on yammering that it couldn't find that one file, I even unpacked it from the zip file I have for 6.09, but then it would still not take it as the signature wouldn't match. Three further BOINC restarts fixed that. ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 94062 | | |
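For anyone wanting to measure checkpoint spacing from their own `[task_debug]` output, here is a minimal Python sketch (not part of BOINC; the function name `checkpoint_intervals` is hypothetical). It assumes each log line starts with the `DD-Mon-YYYY HH:MM:SS` prefix shown in Jord's log, with an English month abbreviation, and that checkpoint lines end in `checkpointed`.

```python
from datetime import datetime

# The three consecutive "checkpointed" lines from the log quoted above.
LOG_LINES = [
    "13-Jan-2009 23:41:40 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed",
    "13-Jan-2009 23:43:29 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed",
    "13-Jan-2009 23:45:18 [Einstein@Home] [task_debug] result h1_1103.40_S5R4__791_S5R4a_1 checkpointed",
]

# Assumed timestamp layout, e.g. "13-Jan-2009 23:41:40".
TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def checkpoint_intervals(lines):
    """Hypothetical helper: seconds between successive 'checkpointed' entries."""
    stamps = []
    for line in lines:
        if line.rstrip().endswith("checkpointed"):
            # The first two whitespace-separated fields are the date and the time.
            date_part, time_part = line.split()[:2]
            stamps.append(datetime.strptime(f"{date_part} {time_part}", TIME_FORMAT))
    # Pair each timestamp with the next one and take the difference.
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(checkpoint_intervals(LOG_LINES))
```

For the three S5R4 entries above this gives 109 seconds between checkpoints; lines that are not checkpoint entries are simply skipped.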
For the first (and probably only) time I'm going to disagree with you .... I'm wrong plenty of times so you should disagree with me quite a bit :-).
At the moment I'm in the middle of converting 150 machines, all running under AP and all still dual R3/R4 capable, although none of them has seen any R3 for a looooong time :-). All machines have caches in the range of 3 - 6 days and whilst EAH is the main project, some support LHC and some support SAH. Instead of just waiting for the caches to empty, I decided to dream up a conversion so that each machine could be dual capable for R4/R5, with the transition occurring mid cache, so to speak, since none of my caches have actually drained as yet.
I have a working solution that takes about 10 - 15 minutes per machine and I'm about half way through. The longest part of the procedure is actually making the state file R3 clean so that I can get rid of all the old R3 stuff still in the project directory of each host. Another significant component is adding the file signatures for the R4 beta test apps that subsequently became official: 6.02 for Linux and 6.10 for Windows. This is what allows the successful removal of AP while there are still R4 tasks onboard. Also, as part of the conversion procedure, the new R5 apps are added to the project folder and are then discovered by BOINC when it restarts. This saves a lot of bandwidth by not having to download the full R5 app package 150 times.
After doing this surgery (requiring extreme concentration) for many, many hours, I was in need of a rest, so I decided to read the boards. There was Jord's cry of pain, which I read and responded to rather too hastily and in a mentally unfit state. I made the following dubious assumptions.
There was a Windows v6.09 package ... Yes, I know. I tried to say that I hadn't bothered to download it as it mustn't have been important for me. And that's exactly the point. The anonymous platform mechanism requires that every file is named, explicitly. BOINC doesn't make up filenames by combining version numbers with filename root components. [It does make up 'friendly names' that way for display in BOINC Manager]. I do actually more than fully understand all this :-). BOINC may not invent names but the editing mistakes of users certainly can. The crucial point is that since Jord tried so hard to "revert to stock", why was 6.09 being used at all? ____________ Cheers, Gary. | |
| ID: 94068 | | |
That's a question for Jord. When he received the R4 task, did he also receive the 6.10 stock app to go with it? If not, why not??? I never got 6.10. In fact, the only applications I have in my Data\projects\einstein.phys.uwm.edu\ directory are:
einstein_S5R5_3.01_windows_intelx86.exe
einstein_S5R5_3.01_windows_intelx86_0.exe
einstein_S5R5_3.01_windows_intelx86_1.exe
einstein_S5R5_3.01_windows_intelx86_2.exe
einstein_S5R5_3.01_graphics_windows_intelx86.exe
and
einstein_S5R4_6.09_windows_intelx86.exe
einstein_S5R4_6.09_windows_intelx86_0.exe
einstein_S5R4_6.09_windows_intelx86_1.exe
einstein_S5R4_6.09_windows_intelx86_2.exe
einstein_S5R4_6.09_graphics_windows_intelx86.exe
There was a Windows v6.09 package ... I guess I did it because I was still at 6.04 or 6.05 before that. I never updated to 6.10 as I didn't notice in time that it was out. I was a bit busy elsewhere.
The crucial point is that since Jord tried so hard to "revert to stock", why was 6.09 being used at all? And why did the app survive 2 earlier restarts of BOINC, before crashing out as missing when my internet connection went AWOL? (Although I am sure that was a coincidence, a one in a trillion shot. ;-)) I will do another reset after this S5R5 task has run its course. Then again, the task has only run for an hour and a half, so I may get away with it and get it resent if I do the reset now. Reset project. It's resending me the same task. Good. It's also only resending me the 3.01 applications. The 6.09s are now gone from my Data\projects\einstein.phys.uwm.edu\ directory. I'll put a voodoo lock on it so they do stay gone. ;-) Also good news: I followed the same procedure on the AMD (Win2k) and it just finished its first S5R5. I had a 632_60 done with S5R4, which ran in 92,132.46 seconds. The new one on S5R5 ran in 59,818.89 seconds. So, a definite speed-up. I'll leave the credit-comparing shenanigans to someone else. ;-) ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 94070 | | |
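To put Jord's two run times side by side, here is a small Python sketch of the arithmetic (the variable names are hypothetical). Note the assumption: the two figures come from different workunits, and S5R5 tasks were announced earlier in the thread to run roughly half as long as S5R4 ones, so this compares task run times, not raw application speed.

```python
# Run times in seconds reported above for the AMD Win2k host.
S5R4_SECONDS = 92_132.46  # the 632_60 task under S5R4
S5R5_SECONDS = 59_818.89  # the first S5R5 task

ratio = S5R4_SECONDS / S5R5_SECONDS  # how many times longer the S5R4 task took
saved = S5R4_SECONDS - S5R5_SECONDS  # wall-clock seconds saved per task
reduction = saved / S5R4_SECONDS     # fractional reduction in run time

print(f"ratio {ratio:.2f}, saved {saved / 3600:.1f} h, reduction {reduction:.0%}")
```

This prints a ratio of about 1.54, roughly 9 hours saved per task, and a run-time reduction of about 35%.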
|
I'm having problems getting one machine to run the S5R5 application. It's host 475735, my Windows 2000 standard server (SP4). It's just my domestic file/print server, not a domain controller or anything. It's been running earlier versions of Einstein just fine (see the host join date/credit), and it's continuing to run SETI without problems. BOINC is v5.10.13 installed as a service - no recent change. | |
| ID: 94224 | | |
The CPU is a single-core P4 Northwood, with 512MB RAM. It's a very close match to my host 1036916, which runs S5R5 with no problems under XP SP3. Any ideas? Northwood had the hyperthreading hardware, though it was not enabled for use until pretty late in the development cycle (my Gallatin, a direct Northwood descendant, had HT enabled). If you do have HT, and have it enabled, you might get a change in behavior by disabling it. With my Gallatin host, it seemed to me that HT exposed bugs in more than one installer, so it could expose a bug in something else. Long shot. ____________ | |
| ID: 94226 | | |
The CPU is a single-core P4 Northwood, with 512MB RAM. It's a very close match to my host 1036916, which runs S5R5 with no problems under XP SP3. Any ideas? No, no HT enabled on either box. Both are unmodified Dell motherboards (XP on Dimension, W2KS on PowerEdge 600SC), so not much scope for getting the BIOS and the CPU out of sync! | |
| ID: 94228 | | |
|
I already asked this question when S5R3 was at the finish line, but here it is again. Why not finish S5R4 ASAP by crunching the remaining work in-house? There are only 27 units without a final result, about a week of work for a single computer. This would allow removing superfluous daemons like the S5R4 assimilator, the S5R4 validator and maybe even the S5R4 filedeleter (not sure, maybe it is common to all S5 runs). If it was a useful search, then it will be time to analyze the data; if not, throw it away ASAP. Are there any thoughts about this? | |
| ID: 96772 | | |
I already asked this question when S5R3 was at the finish line, but here it is again. Why not finish S5R4 ASAP by crunching the remaining work in-house? There are only 27 units without a final result, about a week of work for a single computer. This would allow removing superfluous daemons like the S5R4 assimilator, the S5R4 validator and maybe even the S5R4 filedeleter (not sure, maybe it is common to all S5 runs). If it was a useful search, then it will be time to analyze the data; if not, throw it away ASAP. Are there any thoughts about this? If the scientists here were eagerly awaiting the S5R4 results, we could help finish this run faster by raising the "initial replication" of the remaining workunits (i.e. sending out more tasks for them, so that two of these are likely to land on fast computers). But actually they are still working on previous runs (finishing the S5R1 publication, analyzing S5R3 results). If, by the time they are done with that, the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all. Like OS daemons, the S5R4 ones just sleep until there is something to do. They don't harm the system at all. For the time being we're just keeping the S5R4 workunits in the system for participants to get credit, and to save ourselves unnecessary additional work. BM | |
| ID: 96773 | | |
If, by the time they are done with that, the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all. So it's possible all that work and crunch time could have been for nothing? ____________ | |
| ID: 96786 | | |
If, by the time they are done with that, the (higher sensitivity) S5R5 results for the same parameter space have been finished, they probably won't look at the corresponding S5R4 ones at all. At the time we started S5R4 it was the best search we could do. But then, learning from our analysis of the results we had so far, we found a way to improve the sensitivity without requiring more computing power, so S5R5 was started and S5R4 was cut short in favor of it. I would only call S5R4 wasted if we had continued it to the end instead of superseding it with S5R5. Dakota tribal wisdom says that when you discover you are riding a dead horse, the best strategy is to dismount. BM | |
| ID: 96791 | | |
|
Yeah! We have finally finished S5R4. Now it won't waste our computer time anymore! Congratulations. | |
| ID: 97306 | | |
|
I now have a new question for the developers: is it possible to add a progress bar or progress value for the ABP1 search? I see the overall time for S5R5 getting longer each day, and the cause is an additional search whose progress we don't know. So it would be nice to see how fast we are going through it and when it will finish, clearing the way for the mainstream project. | |
| ID: 97315 | | |
I now have a new question for the developers: is it possible to add a progress bar or progress value for the ABP1 search? I see the overall time for S5R5 getting longer each day, and the cause is an additional search whose progress we don't know. So it would be nice to see how fast we are going through it and when it will finish, clearing the way for the mainstream project. Have a look in this thread! ____________ | |
| ID: 97318 | | |