S5R1 and beyond |
| Author | Message |
|---|---|
|
This is a short status update. All of us have been quite busy, as you probably can imagine, trying to fix all kinds of problems, and we still are. | |
| ID: 62509 | | |
Many of us can't schedule the crunched files for more than 20 hours now. When do you expect to solve the problems? All the problems have been known since the end of December. The information from the project officials is still disappointing for all crunchers. | |
| ID: 62525 | | |
|
bye e@h... | |
| ID: 62531 | | |
My personal thought: one of the officials should have the time to post some news on the project, otherwise many crunchers will leave the project. | |
| ID: 62533 | | |
|
Oh come on... chill out, guys. You know all members of the project staff are giving their best; what more can they do? I didn't think information was so bad here. Don't forget all those months and months the project has been running smoothly; it must have been one of the most stable projects around. Of course all those recent problems are frustrating, but as I said, they're all giving their best, so give them a break... | |
| ID: 62534 | | |
I will not leave E@Home, sure. The recent "problems" are the first in a long time, so some patience is needed, that's all. | |
| ID: 62536 | | |
|
What happens now? | |
| ID: 62539 | | |
|
Bernd, | |
| ID: 62550 | | |
|
Calm down, people. The team is doing the best they can. Einstein has run flawlessly FOREVER - it has problems for the first time and you jump down the team's throat? | |
| ID: 62558 | | |
Calm down, people. The team is doing the best they can. Einstein has run flawlessly FOREVER - it has problems for the first time and you jump down the team's throat? In the past 5 years I have spent time at SETI, FAD (a completed project controlled from Oxford) and Einstein@home. I would have to say without hesitation that Einstein has had the lowest down time of the 3. But the one thing that I can't understand is why someone at the project can't spare a couple of minutes, type a sentence or two advising when the project is expected to be back up and put it up on one of the pages that can still be accessed. Keeping new users in the dark for hours on end when the project goes down is the best way to send them running to some other project as fast as they can d/l the software. I guess I just don't understand the scientific mind. F. Prefect ____________ In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams | |
| ID: 62567 | | |
But the one thing that I can't understand is why someone at the project can't spare a couple of minutes, type a sentence or two advising when the project is expected to be back up and put it up on one of the pages that can still be accessed. Keeping new users in the dark for hours on end when the project goes down is the best way to send them running to some other project as fast as they can d/l the software. I guess I just don't understand the scientific mind. Ford, you know how it goes: to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit. With expected time of a couple of days, we really don't want to write down the result for those who don't know the law ;) In the meantime, don't worry, happily crunch another project. ____________ Metod ... | |
| ID: 62582 | | |
to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit. LOL! Oh that is GOOD! Real good! I do hope there's no copyright on that. :-) Cheers, Mike. ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62585 | | |
|
I am not leaving E@H, but I do think I will take a few days off and do some reconfiguring on my own network to get a few more units online. I hope when I come back in a few days all is well. | |
| ID: 62594 | | |
We are currently testing the setup for a new run that will look again into a smaller frequency range of the current S5R1 dataset with modified parameters (spindown and mismatch). We hope to start distributing these new workunits in the next days, so there should not be much of a gap to the S5R1 run. This run will last 2-3 months. It will consist of only one type of workunits that are a bit more than half as long as the S5R1 long ones have been. I have around 60+ datafiles waiting to report now and I'm not getting any more work. Should I/we kill everything that's waiting to report or should we hold on to them until you get things up and running again? If you could put some information on the main webpage to let us know how you want us to handle things during this transition and the expected time frame, it would go a long way to helping us schedule our computers. For now I've suspended Einstein (not on dial up but it's still affecting my network). Arion | |
| ID: 62601 | | |
to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit. It's called Westheimer's rule. And should you use 30 mins or half an hour? | |
| ID: 62604 | | |
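For anyone who wants to see the rule quoted above in action, here is a minimal, tongue-in-cheek sketch. The ladder of time units and the function name `westheimer` are assumptions made for illustration only; the thread gives nothing beyond "multiply by 2 and switch to the next larger unit".

```python
# Ordered ladder of time units. This particular ladder is an assumption
# made for illustration; the rule as quoted does not define one.
UNITS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]


def westheimer(amount, unit):
    """Return the 'needed' time: double the amount, then move up one unit."""
    idx = UNITS.index(unit)  # raises ValueError for an unknown unit
    if idx + 1 >= len(UNITS):
        raise ValueError("no larger unit available for " + unit)
    return amount * 2, UNITS[idx + 1]


# "A couple of days" of expected time:
print(westheimer(2, "days"))       # (4, 'weeks')

# The "30 mins or half an hour?" question really does matter:
print(westheimer(30, "minutes"))   # (60, 'hours')
print(westheimer(0.5, "hours"))    # (1.0, 'days')
```

The last two calls show why the choice of unit would have to go to committee: the same expected duration gives wildly different answers.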
It's called Westheimer's rule. Ahh, you learn something new every day! Sounds like he was a colleague of Murphy. :-) And should you use 30 mins or half an hour? That clearly needs to be referred to committee..... Cheers, Mike. ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62612 | | |
|
*lol* Yeah that really is a nice one. But Prefect, I guess it's not the scientific mind we're talking here ;-) this is admin business and that has its own rules. Sometimes there's just no time to keep everyone informed, or you don't have anything new to say 'cause you're not sure when you will be finished yourself- or you have worked so many nights you just forget. It happens. | |
| ID: 62616 | | |
|
Hi folks, | |
| ID: 62626 | | |
Hi folks, see this good answer from Gary Roberts. [Edit] corrected typo errors...[/Edit] ____________ Udo | |
| ID: 62627 | | |
Hi folks, WOW! Thanks for this fast response! This helps me a lot in understanding how things work. Have a nice day | |
| ID: 62628 | | |
|
Folks, | |
| ID: 62637 | | |
|
I run only 2 projects on my 14 machines, with a .01 cache. I've NEVER run out of work and I don't babysit BOINC. | |
| ID: 62644 | | |
It's called Westheimer's rule. There's another rule that says the first 90% of a job takes up 90% of the time allocated, and then the last 10% takes up another 90%. But my favourite such principle is Hofstadter's Law: "It always takes longer than you expect, even when you take into account Hofstadter's Law." (If you've read anything by Douglas Hofstadter you'll know he enjoys recursion of all kinds.) | |
| ID: 62656 | | |
|
Just one more example of a rule of physics that we have not discovered yet. It's called Westheimer's rule. | |
| ID: 62657 | | |
|
This thread wouldn't be here if some information about the server status and the reasons why the "server is down"-messages keep coming up, was available on the Einstein homepage. | |
| ID: 62665 | | |
This is a short status update. All of us have been quite busy, as you probably can imagine, trying to fix all kinds of problems, and we still are. Hi there at EAH, I know now that you all do your best to fix and solve the current situation, but to avoid disappointing the crunchers, it would be good if the news were updated regularly. The last info is from Jan. 7. :-( ! Regards Urban | |
| ID: 62703 | | |
|
| |
| ID: 62712 | | |
I'm getting that too. It just started about half an hour ago for me, when BOINC sent a request, and it got 12WU's (2-day cache) and reported 68. Way to go E@H! | |
| ID: 62714 | | |
to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit The problem with the original Westheimer's rule is that it's recursive... No, seriously: - We have started distributing work of a run called S5RI this morning - Lasting longer than the short Workunits of S5R1, this will lower the load on our database server, so things should go back to a more or less normal state from now (and already are...) Actually the situation went pretty bad because of a number of issues that happened at the very same time: - hardware problems with the fileserver, causing delayed and thus accumulated reports - S5R1 was coming to an end, with almost only short workunits left - faster machines have been added after X-Mas :-) - Bruce was (and in some sense still is) moving with his family from Milwaukee to Hannover, which means that everything at UWM was on David's shoulders - currently due to the storm warnings in Germany facilities are shutting down and people are sent home, so we'll see how things go today BM ____________ BM | |
| ID: 62717 | | |
- currently due to the storm warnings in Germany facilities are shutting down and people are sent home, so we'll see how things go today Storm warnings? That little bit of wind? It is a little bit worse than last week's storm, isn't it? ;-) ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 62723 | | |
You might need to delete your app_info.xml file. If this file is in your projects/einstein directory it will prevent the new version from being downloaded. After deletion, restart BOINC and the new WUs start coming :-) ____________ I am Homer of Borg. Prepare to be ...ooooh donuts! | |
| ID: 62728 | | |
|
Just seen msg saying back up over on Seti but I got this: | |
| ID: 62730 | | |
|
no wonder i didn't notice it with my vision, S5R1 and S5RI look exactly alike. talk about confusing -.- | |
| ID: 62731 | | |
You might need to delete your app_info.xml file. If this file is in your projects/einstein directory it will prevent the new version from being downloaded. There are platforms that need to run "anonymous" apps. I'm sticking together some new app_info.xmls for them to get the new work. BM ____________ BM | |
| ID: 62733 | | |
J18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database Yep, still a bit rough road. The latest performance issues were due to all validators running at full load to check the results that managed to come in now... BM ____________ BM | |
| ID: 62738 | | |
I know now that you all do your best to fix and solve the current situation, but to avoid disappointing the crunchers, it would be good if the news were updated regularly. The last info is from Jan. 7. :-( ! I'd appreciate that, too. It seems that the people with access permissions to do so are offline, probably getting some well-deserved sleep. BM ____________ BM | |
| ID: 62739 | | |
|
Great work guys :-) hope this did the trick. Sure looks like it. | |
| ID: 62743 | | |
|
Does anyone have any idea what this means? I just got all 44 of my results sent up from my main system and these are the messages I'm getting. | |
| ID: 62744 | | |
|
Arion, please read this thread. | |
| ID: 62747 | | |
Arion, please read this thread. Thanks, much appreciated the link.... | |
| ID: 62749 | | |
no wonder i didn't notice it with my vision, S5R1 and S5RI look exactly alike. talk about confusing -.- LOL, glad I looked here before I started getting any of the new work. Mine are all still helping to clean up on R1. But wow, why 'I' for the designator on the new run? Maybe it stands for "interim". Alinator | |
| ID: 62758 | | |
|
Yep, that was my best guess, too. | |
| ID: 62765 | | |
J18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database First I'd like to say: good job EAH ! And it seems the validators are still busy, on connecting my 2 EAH hosts they reported several 10s of results and all of them are still in the initial state. Yes, even the ones that have reached quorum. Not a problem, I expect this will resolve itself in the coming days as the last "1"'s are reported in and the uploads of the "I" files are decreasing to normal levels. Hope the very busy server holds, though... | |
| ID: 62776 | | |
|
Returned my first S5RI | |
| ID: 62781 | | |
|
returned my first S5R1 | |
| ID: 62784 | | |
|
Thank you Bernd, for the update! :-) [aside] Cheers, Mike. ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62786 | | |
returned my first S5R1 I believe you mean "first S5RI" ____________ Don't get distracted by shiny objects. | |
| ID: 62787 | | |
returned my first S5R1 Validate errors are server-side; AFAICT they occur when the validator can't find the results it's supposed to be comparing. There have been quite a few of them at S@h recently, associated with the server problems over there; likewise I would guess yours to have something to do with the server problems here. | |
| ID: 62789 | | |
|
It looks like a wrong version of the validator had been installed. | |
| ID: 62794 | | |
|
Still up and working... must be really extreme over there. I hope our fellow crunchers will at least show patience ;-) | |
| ID: 62796 | | |
It looks like a wrong version of the validator had been installed I was wondering why my Einstein units were giving me Rosetta credit :\\ :) | |
| ID: 62800 | | |
|
First successful S5RI results, are validating at 50% higher credit/hour than the s5R1 units they're replacing. | |
| ID: 62802 | | |
|
Sounds very logical, Dan. But to both points- who cares? ;-) | |
| ID: 62804 | | |
First successful S5RI results, are validating at 50% higher credit/hour than the s5R1 units they're replacing. The first few hundred Workunits have been accidentally generated with a higher credit (factor was 1.6 IIRC). We thought it wasn't worth the hassle to manually dig them out of the DB and fix it. Seems you were just lucky. Credit should be back to what you expect from S5R1 with later batches of WUs. BM ____________ BM | |
| ID: 62805 | | |
It looks like a wrong version of the validator had been installed. Well done. I've always been partial to summary justice. Fair trials should always be followed by executions ...... :-) Sorry for the inconvenience, we're all a bit short on sleep. Keep the revolver loaded at the bedside then .... :-) Keep up the good work!! Cheers, Mike ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62806 | | |
2. There will be a mass of howling credit whores furious at being robbed. Ahh ...... Dan my man, you meant, of course: 'There will be a mass of howling credit hunters furious at being robbed' Big smile :-) Cheers, Mike. ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62807 | | |
2. There will be a mass of howling credit whores furious at being robbed. You give them too much credit. | |
| ID: 62809 | | |
You give them too much credit. Absolutely!! :-) Cheers, Mike. ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62810 | | |
|
Now that we are crunching S5RI, does that mean that S5R1 is officially done? Does that mean that there are no more S5R1 WUs left to crunch aside from some unreported/unreturned WUs? | |
| ID: 62812 | | |
Now that we are crunching S5RI, does that mean that S5R1 is officially done? No, it's still washing up time. Does that mean that there are no more S5R1 WUs left to crunch aside from some unreported/unreturned WUs? Basically yes. It's tying up the loose quorums, reconciling missed stuff due to server wobbles etc. I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU? I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity? I'd say the former, though I guess it could be the latter. This is not a complaint or criticism, it's just an idle speculation. Please, speculate away.... :-) Cheers, Mike. ( edit ) Upon closer inspection, ie. I put my reading glasses on, my work units are labelled eg. 'h1_0374.0_S5R1__1503_S5RIa_0' - or spoken 'aych one underline zero three seven four point zero underline ess five arr ONE underline underline one five zero three underline ess five arr EYE ay underline zero' :-) So that'd make the 'S5RI' units a subset of 'S5R1' ..... ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62813 | | |
I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU? Yes you are. I don't know why the command line looks more complicated to you than the ones of S5R1. We are using a newer framework for our workunit generator, which may result in more options given on the command line than being hidden in the config file or in program defaults, but in principle the program shouldn't do something different. I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity? Not really with a higher sensitivity, which would be something like a closer look. We're rather looking at a certain part from a different angle, or with a different focus, but from the more or less same distance. We found that the spindown values we were looking for in S5R1 might not have been optimal for this frequency range (150-720Hz, I think), so we've changed that for this short run. Originally the workunits resulting from this setup were a bit longer than the long S5R1 WUs, so we decided to cut them in half to not exclude the slower computers. BM ____________ BM | |
| ID: 62815 | | |
Upon closer inspection, ie. I put my reading glasses on, my work units are labelled eg. 'h1_0374.0_S5R1__1503_S5RIa_0' - or spoken 'aych one underline zero three seven four point zero underline ess five arr ONE underline underline one five zero three underline ess five arr EYE ay underline zero' :-) The first part of a Workunit is just the name of the datafile it refers to. As we are using the same data files, they are still labeled S5R1, even if the workunit belongs to S5RI. And yes, in terms of the frequencies we're looking at S5RI is a subset of S5R1. BM ____________ BM | |
| ID: 62816 | | |
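The naming explanation above can be sketched in a few lines of Python. This is only an illustration: the assumption that the double underscore is what separates the datafile name from the rest is inferred from the single example name in this thread, and the helper name `split_workunit_name` is made up here. The meaning of the individual fields after the datafile name is not stated in the thread, so the sketch does not try to interpret them.

```python
def split_workunit_name(name):
    """Split a workunit name into (datafile_name, remainder).

    Assumption for illustration: the datafile name is everything before
    the first double underscore, as in the example from this thread.
    """
    datafile, separator, remainder = name.partition("__")
    if not separator:
        raise ValueError("no '__' separator found in " + repr(name))
    return datafile, remainder


datafile, remainder = split_workunit_name("h1_0374.0_S5R1__1503_S5RIa_0")
print(datafile)             # h1_0374.0_S5R1  (still labelled with a ONE)
print(remainder)            # 1503_S5RIa_0    (labelled with an EYE)
print("S5RI" in remainder)  # True
```

This matches the point made above: the S5R1 label comes from the reused data file, while the S5RI label belongs to the workunit itself.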
but in principle the program shouldn't do something different. It just seemed to me that these long WUs were doing something like 6..7 times the amount of work per step than the short WUs. Originally the workunits resulting from this setup were a bit longer than the long S5R1 WUs, so we decided to cut them in half to not exclude the slower computers. Well, I'm doing these long WUs in about 1 hour 40 min each, it must be taking slow computers half way to forever to get them done. Thanks for the update on the status of S5R1, it was kind of an anticlimactic finish to the S5R1 project with all the server problems. I hope you can find something useful in the data from this S5RI subset. | |
| ID: 62825 | | |
I hope you can find something useful in the data from this S5RI subset. The work is important whether or not something is found, though it'd be sweet if some waves were found. If nothing is found, tighter and tighter constraints are put on the background gravitational radiation and how often things like infall events happen. ____________ | |
| ID: 62856 | | |
It looks like a wrong version of the validator had been installed. Seems I was wrong - David has replaced the validator with the proper version. BM ____________ BM | |
| ID: 62859 | | |
Thank you Bernd, for the update! :-) hope i have that formated right.. mike, my visual acuaty is 20/400 in both eyes :) all so color blind, i forget the exact term but the kind where the names of like colors i.e. black, brown, and red. have no meaning to me. there the same color. same for green, yellow. and so on. i cant spell it but i have "astigmic myopia with nastagnus" ... lay terms, im farsighted nearsighted and my eyes "jitter". [sarcastic] needless to say, crossing the road is a real hassle, those darn drivers just wont get out of my way. [/sarcastic] ____________ seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift. | |
| ID: 62865 | | |
hope i have that formated right.. mike, my visual acuaty is 20/400 in both eyes :) all so color blind, i forget the exact term but the kind where the names of like colors i.e. black, brown, and red. have no meaning to me. there the same color. same for green, yellow. and so on. i cant spell it but i have "astigmic myopia with nastagnus" ... lay terms, im farsighted nearsighted and my eyes "jitter". Yeah, alas you lack ( the genes for ) the correct retinal pigment(s) that captures the appropriate photon frequencies, pumps up the molecular energy level to trigger a neural membrane signal. Blame your parents ..... sap in the family tree. :-) Astigmatic means an eye's preferential focal length ( ~ ideal distance to focus well ), which depends primarily on corneal contour ( shape of the front window of the eyeball ), is different depending on which axis you examine. So it is different for, say, the up/down axis vs. left/right. Most of us have a smidgen of this if looked at in sufficient detail. You have rather more than most. Nystagmus means, basically, the feedback loop(s) attempting to align both eyes to achieve an agreeable view of the world ( no double vision, same magnification of objects with each eye, acceptable parallax or movement perspective ) are suffering from overshoot and cannot find 'consensus' settings - at a guess due to the other issues. Kinda makes the Windows Accessibility Options somewhat more relevant, eh? Actually there's some sweet text to speech stuff around, and not too expensive. I use one 'Text to Speech Pro' which has a great English lady's voice - Audrey - and does a decent job of handling pagination, punctuation and various common idioms. [sarcastic] We get that regardless of anyone's vision! ____________ "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal | |
| ID: 62867 | | |
BM: Question? How do the new work units compare to the S5R1 short units? My poor old PII 300 MHz did a short in 13hrs, 45 mins. My first S5RI will take approx. 90+ hrs according to BOINC Manager. Does this look correct? | |
| ID: 62929 | | |
|
Haven't crunched any of the new ones, but mathematically... well... a long WU would be about 6 hours on my workstation, a short one less than one hour... maybe 7 or 8 times as much time for a long WU... so if the new ones are half as big as a long one, that would mean perhaps 4 times as much as a short WU. You should be closer to 60 than 90; I think your box is overestimating the time. Mine probably does, because more than 5 hours seem way too much when a LONG WU only takes 6 or 6.5... | |
| ID: 62931 | | |
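The back-of-the-envelope estimate above can be written out as a short calculation. It is a rough sketch using only numbers quoted in this thread (a long WU taking about 7 or 8 times as long as a short one, the new WUs being about half a long one, and the 13 h 45 min short WU on the PII); the function name and parameter names are made up for illustration, and real run times will differ.

```python
def estimate_new_wu_hours(short_wu_hours, long_to_short_ratio, new_to_long_fraction=0.5):
    """Estimate run time of a new S5RI workunit from a short S5R1 run time.

    long_to_short_ratio:  how many times longer a long S5R1 WU runs than a short one
    new_to_long_fraction: size of the new WUs relative to a long S5R1 WU
                          (0.5 is the rough "half as big" figure used above)
    """
    return short_wu_hours * long_to_short_ratio * new_to_long_fraction


short_wu_hours = 13 + 45 / 60  # 13 h 45 min on the PII 300 MHz

print(estimate_new_wu_hours(short_wu_hours, 7))  # 48.125
print(estimate_new_wu_hours(short_wu_hours, 8))  # 55.0
```

Both figures land well below the 90+ hours shown by BOINC Manager, which is consistent with the guess that the client is overestimating.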
BM: On my C2D the new units take about 4.5 times longer to crunch than the old short units. Can't help more than that, all my old slower computers now have homes with penniless students. Andy | |
| ID: 62932 | | |
|
21 hours on a 450 p3! Woooo! | |
| ID: 62936 | | |
|
Hey guys, | |
| ID: 62947 | | |
I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it? Maybe this is your problem: You might need to delete your app_info.xml file. If this file is in your projects/einstein directory it will prevent the new version from being downloaded. ____________ I am Homer of Borg. Prepare to be ...ooooh donuts! | |
| ID: 62950 | | |
Hey guys, Did you run S5R1 as a trial and install an app_info.xml file in BOINC\\projects\\einstein.phys.uwm.edu? If you did, this file needs to be deleted. Or try an update; depending on your BOINC version, you might be in a very long back-off due to the recent server problems. Andy | |
| ID: 62951 | | |
|
I see the date for the updates on the main page still says 2006 ... | |
| ID: 62953 | | |
Hey guys, I've just looked at the last result you returned. It was crunched with the standard Windows application so you won't have an app_info.xml file to cause a problem. Normally you don't need to do anything as the standard new app is automatically downloaded when needed. Because of the server problems just before the old run finished, it is possible that your client is in a one week backoff. Look on the projects tab of BOINC Manager. In the status column on the far right is there a counter on the EAH line counting down with maybe 100 hours or so still to go? If there is, just select the EAH line and then click the update button in the commands box at the left. That's all you need to do to get it going again. Take a look on the messages tab and you will see the action. ____________ Cheers, Gary. | |
| ID: 62954 | | |
|
Hi Andy, | |
| ID: 62955 | | |
|
Hi Gary, Hey guys, | |
| ID: 62956 | | |
|
Forgot to append this from the logs. | |
| ID: 62957 | | |
Hi Gary, Hi Sean, If it counts down 60 seconds then contact with the server occurred and it has done something :). We need to know what it has done and that will be contained in the last few lines of messages under the messages tab. You should be able to see when you clicked "Update" and the messages that came back after that. We need you to cut and paste those relevant messages in a new post here for us to look at. Thanks. EDIT: OK, I can now see the messages, thanks. BOINC is not allowing EAH to get new work, probably because it thinks that EAH has had too much time recently and that your other projects are more deserving. Please use an editor like notepad to examine your "client_state.xml" file. You don't have to stop anything as long as you make no changes and exit without saving. Do a search for a variable called "long_term_debt" (there will be one for each project) and tell us what the values are. ____________ Cheers, Gary. | |
| ID: 62958 | | |
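As an alternative to searching by hand in an editor, the read-only lookup described above could be sketched in Python. This is a minimal sketch under stated assumptions: it relies only on the `<long_term_debt>` tag name mentioned in this thread, it never writes to the file, and the sample text (including the surrounding `<project>` tags) is a made-up fragment for illustration, not a real client_state.xml.

```python
import re

# Matches <long_term_debt>NUMBER</long_term_debt> and captures the number.
DEBT_RE = re.compile(r"<long_term_debt>\s*([-+0-9.eE]+)\s*</long_term_debt>")


def find_long_term_debts(xml_text):
    """Return every long_term_debt value found in the text, in file order."""
    return [float(value) for value in DEBT_RE.findall(xml_text)]


# Made-up fragment for illustration only; a real state file contains much more.
sample = """
<project>
    <long_term_debt>1234.500000</long_term_debt>
</project>
<project>
    <long_term_debt>-1234.500000</long_term_debt>
</project>
"""

print(find_long_term_debts(sample))  # [1234.5, -1234.5]
```

To use it on a real file, read the file contents into a string (opening it for reading only) and pass that string to `find_long_term_debts`; the location of client_state.xml depends on the installation and is not assumed here.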
|
Or use BOINCDV (debt viewer) which will show you the short term and long term debts without needing to edit the client_state.xml file. This program is Windows only. | |
| ID: 62959 | | |
|
I've noticed this with the two projects. | |
| ID: 62960 | | |
|
No, that's the correct notation. The mean of all debt is always zero. So if you add up all the short term debts they should end in zero, the same story for the long term debts. | |
| ID: 62961 | | |
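The zero-mean property described above can be illustrated with a tiny sketch. This is not BOINC's actual code; it only shows, with hypothetical numbers and a made-up helper name, how subtracting the mean from every project's debt yields values that add up to zero, which is why one project's debt shows up as positive and the other's as negative.

```python
def normalize_debts(debts):
    """Shift all debts by their mean so that the values sum to zero."""
    mean = sum(debts.values()) / len(debts)
    return {project: value - mean for project, value in debts.items()}


# Hypothetical raw values for a two-project setup.
raw = {"Einstein@Home": 1000.0, "SETI@home": 200.0}
normalized = normalize_debts(raw)

print(normalized)                # {'Einstein@Home': 400.0, 'SETI@home': -400.0}
print(sum(normalized.values()))  # 0.0
```

With only two projects the two values are always equal and opposite, so a large positive number for one project necessarily comes with an equally large negative number for the other.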
|
Hey Ageless, No, that's the correct notation. The mean of all debt is always zero. So if you add up all the short term debts they should end in zero, the same story for the long term debts. | |
| ID: 62962 | | |
Jord has given you the answer but to make sure you understand what to do, here are the steps in full:-
| |
| ID: 62963 | | |
|
Ok this is odd. Seriously, it's still rather larger than you would normally expect - particularly since you are only running two projects. It's possible for this to happen if you had set a very large number of days for your cache and had a very low value for your EAH resource share. Perhaps you would like to let us know those values so we can investigate a little further? | |
| ID: 62966 | | |
|
Hey guys, Seriously, it's still rather larger than you would normally expect - particularly since you are only running two projects. It's possible for this to happen if you had set a very large number of days for your cache and had a very low value for your EAH resource share. Perhaps you would like to let us know those values so we can investigate a little further? Could it have just been a quirk of the heat we've been having here this past week? It's been in the high 30s to 40 degrees here and it did make my machine behave a little weird; I had to shut it down for a day as the heat was rather intense in this room, roughly 40 degrees C. (Wild stab in the dark) | |
| ID: 62968 | | |
Ok this is odd. Sean, I've looked up your computers both on Seti and on EAH. Seti shows you as having 1 computer and your results list there shows no client errors (yet). EAH now shows you as having two (identical) computers so when you made the change to the LTD value you have obviously corrupted something else in the state file so that EAH has given your machine a new ID. We can fix the double ID problem later on. Now that you are back with the old state file, you have eliminated the corruption (whatever it was) but you are going to soon see EAH do no more crunching for about 12 days unless you fix the LTD values. You basically have two options:- (i) Leave everything alone and wait 12 days for EAH to repay the debt (ii) Change the debt values to zero without changing anything else (which you must have done last time). Have you used a text editor previously? Are you confident that you can avoid changing values of other variables (or more likely corrupting the syntax of the file in some way)? You can probably work out what BOINC was upset with last time if you examine the contents of the log files stdoutdae.txt, stderrdae.txt, stdoutgui.txt and stderrgui.txt in your boinc folder. Some or all of these may contain information relevant to what happened when you started BOINC with the edited state file. Please be assured that the edit of LTDs itself cannot have caused the observed action. There must have been other corruption. Did you use notepad to do the edit? Let us know what you would like to do. ____________ Cheers, Gary. | |
| ID: 62969 | | |
It's highly unlikely to be related to temperature. In my experience if a machine gets too hot it simply locks up. A reboot gets it going again. Rarely do I see any file corruption. Any corruption is usually in Windows files, not BOINC files. I'd like you to have another go at editing the state file. If you used notepad last time, please start it up again (without loading any file) and look under the format tab and tell me whether the wordwrap option is ticked or not. ____________ Cheers, Gary. | |
| ID: 62970 | | |
|
Gary, Sean, | |
| ID: 62971 | | |
|
Don't get me worried, I used the trick of resetting all debts to zero, too. Hope nothing will crash (couldn't see yet because my box is far from idle atm and doesn't get much crunching done)... but at least I used a good text editor and didn't have word wrap or sth on. | |
| ID: 62972 | | |
|
Ok I did exactly as you posted, # Stop BOINC completely and confirm with task manager that nothing is running Except for the last step as I rebooted the machine. Now the values are: Einstein <long_term_debt>0.156250</long_term_debt> Seti <long_term_debt>-0.156250</long_term_debt> I'm guessing this is more what we are looking for? | |
| ID: 62973 | | |
Don't get me worried, I used the trick of resetting all debts to zero, too. Hope nothing will crash (couldn't see yet because my box is far from idle atm and doesn't get much crunching done)... but at least I used a good text editor and didn't have word wrap or sth on. Thank you Annika, I'm sure it will work out. Right now I am crunching Einstein units so things are better than they were. :-) Oh and I'd prefer 10 degrees and rain any day over 40, and we sure do need rain here! | |
| ID: 62974 | | |
|
I could send you some, we definitely have more than enough here ;-) | |
| ID: 62975 | | |
Ok I did exactly as you posted, ... Good. I deliberately put 0.000000 in the instructions because I knew that BOINC will write out 6 decimal digits every time it updates that value. You would think it would be happy with a plain 0 but I've actually never been game to test that even though I've done this exact same edit many times before. I've always used 0.000000 to be safe. I don't know that this was the problem but I'm not suggesting we try it again to find out :).
Same difference!! No problem.
Great!! This is exactly what you should be seeing. The only problem left is that you now have two machine IDs on the EAH project. I'm guessing that the one in your state file at the moment will be recorded as <hostid>783254</hostid>. You will find it several lines above the LTD value you have been editing. The newer one that you also have acquired is 840813 but I don't believe that this second one will show up in your state file. Please let me know which one is currently there and I'll tell you how to merge them. ____________ Cheers, Gary. | |
| ID: 62976 | | |
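The manual edit discussed above (setting every `<long_term_debt>` value to `0.000000`, written with six decimals as BOINC itself does) can be sketched in a few lines of Python. This is only an illustration, not a BOINC tool: the function names are made up here, the sample text is a hand-written fragment using the two tags and values quoted in this thread, and the pattern assumes plain decimal values rather than exponent notation. As with the hand edit, BOINC would need to be stopped completely first, and it remains emergency surgery on a copy of the state file rather than routine maintenance.

```python
import re
from typing import List

# Matches one <long_term_debt> element and captures its plain decimal value.
LTD_PATTERN = re.compile(
    r"(<long_term_debt>)\s*(-?\d+(?:\.\d+)?)\s*(</long_term_debt>)"
)


def read_long_term_debts(state_text: str) -> List[float]:
    """Return every long_term_debt value found in the text, in file order."""
    return [float(m.group(2)) for m in LTD_PATTERN.finditer(state_text)]


def reset_long_term_debts(state_text: str) -> str:
    """Return a copy of the text with every long_term_debt set to 0.000000."""
    zero = f"{0.0:.6f}"  # six decimals, matching what BOINC writes out
    return LTD_PATTERN.sub(lambda m: m.group(1) + zero + m.group(3), state_text)


# Hand-written fragment for illustration only; a real state file holds far more.
sample = (
    "<hostid>783254</hostid>\n"
    "<long_term_debt>-1024672.000000</long_term_debt>\n"
)

cleaned = reset_long_term_debts(sample)
print(read_long_term_debts(sample))
print(cleaned)
```

Working on text in memory keeps the sketch self-contained; anyone adapting it would still want to keep a backup of the original file and use an editor or script that does not re-wrap lines.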
|
Hi Gary, I've looked up your computers both on Seti and on EAH. Seti shows you as having 1 computer and your results list there shows no client errors (yet). EAH now shows you as having two (identical) computers so when you made the change to the LTD value you have obviously corrupted something else in the state file so that EAH has given your machine a new ID. We can fix the double ID problem later on. | |
| ID: 62977 | | |
|
Ok here we go. | |
| ID: 62979 | | |
Ok here we go. OK, I was hoping it wouldn't be that one. Doesn't matter all that much. After the merge all your results and credit history will be back under one hostid anyway. If you could step me through what to do I'll do it in the morning, it's getting a bit late. ;-) Yeah, I know :). I'm in Brisbane but I'm guessing you are south of the border and on daylight saving time so it is getting a little late for you. Bummer, just saw Lleyton lose his match in the Open. Anyway, here are your instructions for the morning:-
Thank-you Gary. You're welcome. ____________ Cheers, Gary. | |
| ID: 62984 | | |
I deliberately put 0.000000 in the instructions because I knew that BOINC will write out 6 decimal digits every time it updates that value. You would think it would be happy with a plain 0 but I've actually never been game to test that even though I've done this exact same edit many times before. I've always used 0.000000 to be safe. I don't know that this was the problem but I'm not suggesting we try it again to find out :). When I did that kind of edit a while ago I just put a plain 0 in there. So I can tell you BOINC doesn't have any problem with that. ____________ | |
| ID: 62990 | | |
|
Hmmm, interesting. I haven't done a debt reset in a while, but my experience has been the same as Gary's. If you just replace the value with '0' instead of '0.000000', BOINC would revert to using the old value (which I assume meant it reverted to the value in the backup client_state file). | |
| ID: 62997 | | |
- currently due to the storm warnings in Germany facilities are shutting down and people are sent home, so we'll see how things go today Well Ageless, you see, two days later with 12 people killed in Germany alone, nobody will talk about "a little bit of wind" again... :( *GRRRR!* Next time just leave it, please! | |
| ID: 63007 | | |
Hmmm, interesting. I haven't done a debt reset in a while, but my experience has been the same as Gary's. If you just replace the value with '0' instead of '0.000000', BOINC would revert to using the old value (which I assume meant it reverted to the value in the backup client_state file). Don't bother, my 5.4.11 app (which should be the "current stable" as I downloaded it only a week ago after my Windows reinstall) reacted just the same. ____________ | |
| ID: 63009 | | |
As Jord pointed out, the EAH debt of -1024672 seconds represents the time that other projects (ie Seti) will run before EAH will be allowed to run. He was obviously using one of those famous Dutch calculators because his estimate was a little bit out :). It's actually just under 12 days and not 9 months - pretty close for a Dutch calculator don't you think :). I blame lack of sleep. I hadn't slept (normally) for over 3 days. But I am all up to speed again. Missed the step of dividing it by 24. ;) ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 63013 | | |
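For anyone wanting to check the arithmetic in the exchange above, here is a small worked conversion of the -1024672 second debt. The variable names are just for illustration. The last figure shows one plausible reading of the slip Jord describes: if the hour count is read as days (the missed division by 24), it comes out at roughly nine months.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24

debt_seconds = -1024672  # EAH long term debt quoted in the thread

# Only the size of the debt matters for "how long until EAH runs again".
debt_hours = abs(debt_seconds) / SECONDS_PER_HOUR
debt_days = debt_hours / HOURS_PER_DAY

# Skipping the division by 24 treats the hour count as a day count.
mistaken_months = debt_hours / 30  # hours misread as days, 30-day months

print(f"{debt_hours:.1f} hours")
print(f"{debt_days:.2f} days")
print(f"{mistaken_months:.1f} 'months' if hours are mistaken for days")
```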
Yeah, I know :). I'm in Brisbane but I'm guessing you are south of the border and on daylight saving time so it is getting a little late for you. Yep I live in Melbourne, oh and it RAINED last night!!! :-D Ok I followed the merging steps and my computer id is now 840813 and has incorporated 783254's results, well all but 10 points. So it looks like it's all working. I've noticed the LTDs are now <long_term_debt>1802.000000</long_term_debt>, so I'm guessing that every now and then I should check them and reset to 0.000000. Thanks for your help Gary, all seems to be working now. :-) Sean | |
| ID: 63020 | | |
That sounds pretty good. It should be OK now.
NO, not at all!!! Please realise that we only fiddle with the state file in exceptional circumstances. A value of 1800 seconds (ie half an hour) is not at all unusual when the normal switch interval is an hour - 3600 seconds. If you would like to see a simple example of how the LTD values will change as the projects share the CPU, have a look at this post in another thread where the OP was having a problem with a crazy set of values for LTDs. You may need to review some earlier messages to get the full context but the example in the second half of my post shows exactly how LTD values are supposed to vary. In normal circumstances LTD values will cycle between +3600 and -3600 and will often be "crossing the equator" so to speak. They will move outside this general range if a project has to run more than it should (to meet a deadline) or cannot run at all (lack of work, server problems, etc). LTD is BOINC's mechanism for ensuring that your resource shares are respected. EDIT: Resetting the debts by editing like this should be viewed as a bit of emergency surgery to correct a problem - not a routine exercise to be undertaken whenever you feel like it. Thanks for your help Gary, all seems to be working now. :-) You're welcome! ____________ Cheers, Gary. | |
| ID: 63022 | | |
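To see why a value around 1800 is nothing to worry about, here is a deliberately simplified toy model of two projects sharing one CPU. It is not BOINC's actual scheduler code, and the function name, the 50/50 resource shares and the "run whichever project is owed the most" rule are assumptions made for illustration. Each hour, every project is credited its share of the elapsed time and the project that actually ran is charged the full hour.

```python
def step_debts(debts, shares, running, dt):
    """Advance the toy debt model by dt seconds while `running` has the CPU."""
    total_share = sum(shares.values())
    updated = {}
    for name, debt in debts.items():
        entitled = dt * shares[name] / total_share  # time the project was owed
        used = dt if name == running else 0.0       # time it actually got
        updated[name] = debt + entitled - used
    return updated


SWITCH_INTERVAL = 3600  # seconds, the one-hour switch interval mentioned above

debts = {"Einstein": 0.0, "Seti": 0.0}
shares = {"Einstein": 50, "Seti": 50}  # assumed equal resource shares

history = []
for hour in range(4):
    # Toy rule: give the CPU to whichever project is currently owed the most.
    running = max(debts, key=debts.get)
    debts = step_debts(debts, shares, running, SWITCH_INTERVAL)
    history.append((running, dict(debts)))

for running, snapshot in history:
    print(running, snapshot)
```

In this toy run the two debts swing between 0 and plus or minus 1800 seconds and always sum to zero, which is the "crossing the equator" behaviour described above. The real client's bookkeeping is more involved, so the exact range it shows can differ.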
That's OK ;). I was just "rattling your cage" :). ____________ Cheers, Gary. | |
| ID: 63023 | | |
|
LOL... | |
| ID: 63024 | | |
|
Ok then, no problems I'll leave things as they are and have a read of the link you sent me to get more info on this.
| |
| ID: 63030 | | |
LOL... Absolutely - never doubted it for a minute. I was just too lazy on two counts:- (i) I didn't have the patience to open up a copy of my state file in one window and go through it line by line, matching variable names and his jumbled-up values. (ii) When you had done that work for me, I failed to appreciate, initially, just how big the LTD really was. Sometimes you can look at a number and it just doesn't register .... There was a real benefit however. The OP learned the technique for scanning and presenting excerpts from his state file which he was obviously pleased about. When he did that, the abnormal LTD suddenly hit me in the face :). It's the old adage really - you give a person a fish and he can feed himself for a day or you give him a fishing line .... :). ____________ Cheers, Gary. | |
| ID: 63070 | | |
|
[snip] [aside] to which I add - B, 8 and 0 (when written with a slash thru it) and O Cheers, Mike. | |
| ID: 63124 | | |
|
Now that everyone is getting WU's again, how about fixing the "server status" page? | |
| ID: 63125 | | |
|
Many Thanks, | |
| ID: 63223 | | |
|
Does anyone know when the current "maintenance" will be done? | |
| ID: 63268 | | |
|
Gary, | |
| ID: 63289 | | |
|
Thanks for that. | |
| ID: 63292 | | |
|
Just an FYI. I jumped on the BOINC forum and posted there as well. Personally, I'd consider this a rather large error of omission so I recommended that either a debt reset option be added, or that a configurable debt 'cap' be added to limit the damage when this occurs. I consider it undesirable to force users to have to edit configuration files manually, especially when you are dealing with a user base as large as this one. This is definitely one case where waiting for BOINC to sort it out, while it is an option, is probably not the optimal solution. ;-) | |
| ID: 63302 | | |
|
So it seems the problem persists. The last post is quite old, so I would like to hear if there is any news on this problem? I guess we all have the same problem with E@H - I have problems sending and receiving WUs. The 'Server status' page is still down, though I recall it was up and running around a week ago. Didn't it run properly or did it crash? | |
| ID: 64248 | | |
It is switched off to reduce server load! ____________ Udo | |
| ID: 64286 | | |
|
So will it never get back online? Too bad, I really liked that page a lot. I got a feeling that we were getting somewhere. So is it only temporary or permanent? | |
| ID: 64433 | | |
So will it never get back online? No, it's not permanently down, but not every request for fresh data is answered immediately by the server, as there have been problems for more than a week. Keep trying and you will get data. BOINC will do this for you automatically. - Look at the message board of your BOINC. - Up to now, I have always received new data in time to keep crunching continuously. (One file takes about 3.25 h to crunch on my host.) There is also a problem with the low speed of the validators. As an average over the last 8 days, only 50% of my reported files have been validated, so your list of pending files will keep growing. Up to now, I have no indication that any reported file has been lost. This is my personal experience. ____________ | |
| ID: 64439 | | |
So will it never get back online? Too bad, I really liked that page a lot. I got a feeling that we were getting somewhere. So is it only temporary or permanent? We'll definitely put it back online once the database problems have been solved. BM ____________ BM | |
| ID: 64493 | | |