S5R1 and beyond


Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62509 - Posted 16 Jan 2007 20:07:50 UTC
Last modified: 16 Jan 2007 21:50:51 UTC

This is a short status update. All of us have been quite busy, as you probably can imagine, trying to fix all kinds of problems, and we still are.

- Today we generated the last Workunit of S5R1. All that remains to be done for that run is to crunch the remaining Workunits that are already in the database and for which no canonical result has been found yet.

[edit:] - There are probably only a small number of tasks remaining in each frequency band, which causes hosts to download a new datafile for almost every task. Dial-up users may want to suspend the project for the next few days.

- A lot of problems we had recently, in particular the database problems, seem to have come mostly from the fact that near the end of S5R1 many more short Workunits were left, so they came in at a much higher rate than we expected. With the end of S5R1, things should be back to normal again.

- We are currently testing the setup for a new run that will again look into a smaller frequency range of the current S5R1 dataset with modified parameters (spindown and mismatch). We hope to start distributing these new workunits in the next few days, so there should not be much of a gap after the S5R1 run. This run will last 2-3 months. It will consist of only one type of workunit, a bit more than half as long as the long S5R1 ones.

I hope to have time to post some more info here as soon as it becomes available.

BM
____________
BM

GreyCruncher
Joined: Sep 2 06
Posts: 20
ID: 211731
Credit: 10,497,276
RAC: 16,822
Message 62525 - Posted 16 Jan 2007 21:51:35 UTC - in response to Message 62509.
Last modified: 16 Jan 2007 22:12:48 UTC


- A lot of problems we had recently, in particular the database problems, seem to have come mostly from the fact that near the end of S5R1 many more short Workunits were left, so they came in at a much higher rate than we expected. With the end of S5R1, things should be back to normal again.


Many of us have been unable to report crunched results to the scheduler for more than 20 hours now. When do you expect to solve the problems?

All of these problems have been known since the end of December. The information from the project officials is still disappointing for all crunchers.

EggZZ
Joined: Feb 7 06
Posts: 2
ID: 169871
Credit: 2,497,398
RAC: 2,305
Message 62531 - Posted 16 Jan 2007 22:38:40 UTC

bye e@h...

:(

EggZZ

GreyCruncher
Joined: Sep 2 06
Posts: 20
ID: 211731
Credit: 10,497,276
RAC: 16,822
Message 62533 - Posted 16 Jan 2007 23:10:54 UTC - in response to Message 62509.


I hope to have time to post some more info here as soon as it becomes available.
BM


My personal thought: one of the officials should take the time to post some news about the project, otherwise many crunchers will leave it.

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62534 - Posted 16 Jan 2007 23:24:29 UTC

Oh come on... chill out, guys. You know all members of the project staff are giving their best; what more can they do? I didn't think the information was so bad here. Don't forget all those months and months the project has been running smoothly; it must have been one of the most stable projects around. Of course all these recent problems are frustrating, but as I said, they're all giving their best, so give them a break...

Nightbird
Joined: Feb 17 05
Posts: 79
ID: 17951
Credit: 561,723
RAC: 0
Message 62536 - Posted 16 Jan 2007 23:40:41 UTC - in response to Message 62533.
Last modified: 16 Jan 2007 23:44:34 UTC


I hope to have time to post some more info here as soon as it becomes available.
BM


My personal thought: one of the officials should take the time to post some news about the project, otherwise many crunchers will leave it.

I certainly won't leave E@Home.
The recent "problems" are the first in a long time, so some patience is needed, that's all.


GreyCruncher
Joined: Sep 2 06
Posts: 20
ID: 211731
Credit: 10,497,276
RAC: 16,822
Message 62539 - Posted 17 Jan 2007 0:06:38 UTC
Last modified: 17 Jan 2007 0:27:19 UTC

What happens now?
All WUs are uploaded, but reporting gets this response (example from one host):

17.01.2007 00:39:23|Einstein@Home|Project is down
17.01.2007 01:01:05|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
17.01.2007 01:01:05|Einstein@Home|Reason: Requested by user
17.01.2007 01:01:05|Einstein@Home|Reporting 15 tasks
17.01.2007 01:01:10|Einstein@Home|Scheduler request succeeded
17.01.2007 01:01:10|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
17.01.2007 01:01:10|Einstein@Home|Project is down

This has been going on for more than 24 hours.

Most of our hosts report:
Communication deferred 167.00.00 hours :(

IMHO this is not professional work.

Chris(one of the Borg Cube)

Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 62550 - Posted 17 Jan 2007 2:40:20 UTC

Bernd,
Thanks for keeping us updated.

And we will remain patient, won't we?

Andy

[B^S] MattDavis
Joined: Jan 18 05
Posts: 71
ID: 2162
Credit: 1,731,280
RAC: 1,450
Message 62558 - Posted 17 Jan 2007 3:25:16 UTC
Last modified: 17 Jan 2007 3:43:16 UTC

Calm down, people. The team is doing the best they can. Einstein has run flawlessly FOREVER - it has problems for the first time and you jump down the team's throat?

F. Prefect
Joined: Nov 7 05
Posts: 137
ID: 119854
Credit: 882,195
RAC: 526
Message 62567 - Posted 17 Jan 2007 4:00:02 UTC - in response to Message 62558.

Calm down, people. The team is doing the best they can. Einstein has run flawlessly FOREVER - it has problems for the first time and you jump down the team's throat?


In the past 5 years I have spent time at SETI, FAD (a completed project controlled from Oxford) and Einstein@home. I would have to say without hesitation that Einstein has had the lowest downtime of the 3.

But the one thing that I can't understand is why someone at the project can't spare a couple of minutes, type a sentence or two advising when the project is expected to be back up, and put it on one of the pages that can still be accessed. Keeping new users in the dark for hours on end when the project goes down is the best way to send them running to some other project as fast as they can d/l the software. I guess I just don't understand the scientific mind.

F. Prefect
____________
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams

Metod, S56RKO
Joined: Feb 11 05
Posts: 119
ID: 15557
Credit: 14,176,632
RAC: 13,576
Message 62582 - Posted 17 Jan 2007 6:20:11 UTC - in response to Message 62567.
Last modified: 17 Jan 2007 6:21:51 UTC

But the one thing that I can't understand is why someone at the project can't spare a couple of minutes, type a sentence or two advising when the project is expected to be back up, and put it on one of the pages that can still be accessed. Keeping new users in the dark for hours on end when the project goes down is the best way to send them running to some other project as fast as they can d/l the software. I guess I just don't understand the scientific mind.


Ford, you know how it goes: to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit. With expected time of a couple of days, we really don't want to write down the result for those who don't know the law ;)

In the meantime, don't worry, happily crunch another project.
____________
Metod ...

Mike Hewson
Forum moderator
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62585 - Posted 17 Jan 2007 6:34:22 UTC - in response to Message 62582.
Last modified: 17 Jan 2007 6:34:46 UTC

to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit.

LOL! Oh that is GOOD! Real good! I do hope there's no copyright on that. :-)

Cheers, Mike.

____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

JDBurch
Joined: Sep 2 05
Posts: 191
ID: 106096
Credit: 1,145,117
RAC: 9
Message 62594 - Posted 17 Jan 2007 7:43:15 UTC

I am not leaving E@H, but I do think I will take a few days off and do some reconfiguring on my own network to get a few more units online. I hope when I come back in a few days all is well.

Arion
Joined: Mar 20 05
Posts: 145
ID: 61093
Credit: 1,194,698
RAC: 2,409
Message 62601 - Posted 17 Jan 2007 8:19:46 UTC - in response to Message 62509.

We are currently testing the setup for a new run that will again look into a smaller frequency range of the current S5R1 dataset with modified parameters (spindown and mismatch). We hope to start distributing these new workunits in the next few days, so there should not be much of a gap after the S5R1 run. This run will last 2-3 months. It will consist of only one type of workunit, a bit more than half as long as the long S5R1 ones.

I hope to have time to post some more info here as soon as it becomes available.

BM


I have 60+ datafiles waiting to be reported now and I'm not getting any more work. Should I/we kill everything that's waiting to be reported, or should we hold on to them until you get things up and running again?
If you could put some information on the main webpage to let us know how you want us to handle things during this transition, and the expected time frame, it would go a long way toward helping us schedule our computers. For now I've suspended Einstein (I'm not on dial-up, but it's still affecting my network).

Arion

Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 62604 - Posted 17 Jan 2007 9:07:30 UTC - in response to Message 62585.

to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit.

LOL! Oh that is GOOD! Real good! I do hope there's no copyright on that. :-)

Cheers, Mike.

It's called Westheimer's rule.

And should you use 30 mins or half an hour?
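Taken literally, the rule as Metod phrased it ("take expected time, multiply it by 2 and switch over to next larger time unit") fits in a few lines of Python. This is only a playful sketch: the function name `westheimer` and the ladder of units are assumptions made for illustration, not anything defined in the thread.

```python
# Hypothetical sketch of Westheimer's rule as quoted in this thread:
# double the expected time, then move to the next larger time unit.
# The unit ladder below is an assumption chosen for illustration.
TIME_UNITS = ["seconds", "minutes", "hours", "days", "weeks", "months", "years"]


def westheimer(expected: float, unit: str) -> tuple[float, str]:
    """Return the 'needed' time for an expected duration given in `unit`."""
    index = TIME_UNITS.index(unit)  # raises ValueError for an unknown unit
    if index + 1 >= len(TIME_UNITS):
        raise ValueError(f"no time unit larger than {unit!r} in this sketch")
    return expected * 2, TIME_UNITS[index + 1]


if __name__ == "__main__":
    # The "30 mins or half an hour?" question, answered by the sketch.
    print(westheimer(30, "minutes"))  # (60, 'hours')
    print(westheimer(0.5, "hours"))   # (1.0, 'days')
    print(westheimer(2, "days"))      # (4, 'weeks')
```

Under this sketch, 30 minutes becomes 60 hours while half an hour becomes 1 day, so the choice of unit really does change the answer.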

Mike Hewson
Forum moderator
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62612 - Posted 17 Jan 2007 10:25:18 UTC - in response to Message 62604.

It's called Westheimer's rule.

Ahh, you learn something new every day!
Sounds like he was a colleague of Murphy. :-)
And should you use 30 mins or half an hour?

That clearly needs to be referred to committee.....

Cheers, Mike.

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62616 - Posted 17 Jan 2007 10:36:33 UTC

*lol* Yeah that really is a nice one. But Prefect, I guess it's not the scientific mind we're talking about here ;-) this is admin business and that has its own rules. Sometimes there's just no time to keep everyone informed, or you don't have anything new to say 'cause you're not sure when you will be finished yourself, or you have worked so many nights you just forget. It happens.
I wish the team every possible success with their task- I'll do what I can and crunch at least those WUs waiting on my boxes, and I'm sure most people here will do the same.

[SG]marodeur6
Joined: Jan 1 06
Posts: 8
ID: 156638
Credit: 1,434,783
RAC: 0
Message 62626 - Posted 17 Jan 2007 12:14:14 UTC

Hi folks,

what about a short answer to Arion's question above? :-)
I have nearly 300 of these short WUs waiting to be reported. What should I do?
Was this all for nothing, or will they be processed later? There are still 200+ to crunch on my systems...

This will help a lot!

Many thanks in advance


Udo
Joined: May 19 05
Posts: 204
ID: 82463
Credit: 3,415,929
RAC: 1,235
Message 62627 - Posted 17 Jan 2007 12:25:52 UTC - in response to Message 62626.
Last modified: 17 Jan 2007 12:27:18 UTC

Hi folks,

what about a short answer to Arion's question above? :-)
I have nearly 300 of these short WUs waiting to be reported. What should I do?
Was this all for nothing, or will they be processed later? There are still 200+ to crunch on my systems...

This will help a lot!

Many thanks in advance


see this good answer from Gary Roberts.

[Edit] corrected typo errors...[/Edit]
____________
Udo

[SG]marodeur6
Joined: Jan 1 06
Posts: 8
ID: 156638
Credit: 1,434,783
RAC: 0
Message 62628 - Posted 17 Jan 2007 12:31:17 UTC - in response to Message 62627.

Hi folks,

what about a short answer to Arion's question above? :-)
I have nearly 300 of these short WUs waiting to be reported. What should I do?
Was this all for nothing, or will they be processed later? There are still 200+ to crunch on my systems...

This will help a lot!

Many thanks in advance


see this good answer from Gary Roberts.

[Edit] corrected typo errors...[/Edit]


WOW!

Thanks for this fast response! This helps me a lot in understanding how things work.

Have a nice day

Jim Wilkins
Joined: Jun 1 05
Posts: 11
ID: 85054
Credit: 423,185
RAC: 803
Message 62637 - Posted 17 Jan 2007 14:59:40 UTC

Folks,

It will come up when it comes up and BOINC will handle it. I'm not sure what the fuss is about. This is why one should have multiple projects to crunch on.

Jim

[B^S] MattDavis
Joined: Jan 18 05
Posts: 71
ID: 2162
Credit: 1,731,280
RAC: 1,450
Message 62644 - Posted 17 Jan 2007 16:36:54 UTC

I run only 2 projects on my 14 machines, with a .01 cache. I've NEVER run out of work and I don't babysit BOINC.

Just leave BOINC alone! It will send the units on its own!

Odysseus
Joined: Dec 17 05
Posts: 349
ID: 149638
Credit: 1,496,873
RAC: 1,287
Message 62656 - Posted 17 Jan 2007 18:47:05 UTC - in response to Message 62612.

It's called Westheimer's rule.

Ahh, you learn something new every day!
Sounds like he was a colleague of Murphy. :-)

There’s another rule that says the first 90% of a job takes up 90% of the time allocated, and then the last 10% takes up another 90%.

But my favourite such principle is Hofstadter’s Law:
“It always takes longer than you expect, even when you take into account Hofstadter’s Law.”

(If you’ve read anything by Douglas Hofstadter you’ll know he enjoys recursion of all kinds.)

Jim Wilkins
Joined: Jun 1 05
Posts: 11
ID: 85054
Credit: 423,185
RAC: 803
Message 62657 - Posted 17 Jan 2007 19:14:30 UTC - in response to Message 62656.

Just one more example of a rule of physics that we have not discovered yet.

Jim

It's called Westheimer's rule.

Ahh, you learn something new every day!
Sounds like he was a colleague of Murphy. :-)

There’s another rule that says the first 90% of a job takes up 90% of the time allocated, and then the last 10% takes up another 90%.

But my favourite such principle is Hofstadter’s Law:
“It always takes longer than you expect, even when you take into account Hofstadter’s Law.”

(If you’ve read anything by Douglas Hofstadter you’ll know he enjoys recursion of all kinds.)

Jan Gnodde
Joined: Nov 28 05
Posts: 9
ID: 131738
Credit: 1,073,677
RAC: 0
Message 62665 - Posted 17 Jan 2007 20:19:57 UTC

This thread wouldn't be here if some information about the server status, and the reasons why the "server is down" messages keep coming up, were available on the Einstein homepage.

Jan.


Urban
Joined: Feb 20 05
Posts: 7
ID: 23151
Credit: 873,782
RAC: 261
Message 62703 - Posted 18 Jan 2007 6:31:46 UTC - in response to Message 62509.

This is a short status update. All of us have been quite busy, as you probably can imagine, trying to fix all kinds of problems, and we still are.

I hope to have time to post some more info here as soon as it becomes available.

BM


Hi there at EAH,

I know that you are all doing your best to fix the current situation, but to avoid disappointing the crunchers, it would be good if the news were updated regularly. The last news item is from Jan. 7. :-( !

Regards
Urban

Udo
Joined: May 19 05
Posts: 204
ID: 82463
Credit: 3,415,929
RAC: 1,235
Message 62712 - Posted 18 Jan 2007 11:50:10 UTC


For the last 2 hours I've been getting NEW WUs!
They are named S5RI.

jedirock
Joined: Jun 11 06
Posts: 23
ID: 199132
Credit: 246,315
RAC: 0
Message 62714 - Posted 18 Jan 2007 12:34:19 UTC - in response to Message 62712.


For the last 2 hours I've been getting NEW WUs!
They are named S5RI.


I'm getting that too. It just started about half an hour ago for me, when BOINC sent a request, and it got 12 WUs (2-day cache) and reported 68. Way to go E@H!


Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62717 - Posted 18 Jan 2007 12:46:03 UTC - in response to Message 62582.
Last modified: 18 Jan 2007 16:39:56 UTC

to get needed time, take expected time, multiply it by 2 and switch over to next larger time unit

The problem with the original Westheimer's rule is that it's recursive...

No, seriously:

- We have started distributing Work of a run called S5RI this morning
- Since these last longer than the short Workunits of S5R1, they will lower the load on our database server, so things should go back to a more or less normal state from now on (and already are...)

Actually the situation got pretty bad because of a number of issues that happened at the very same time:

- hardware problems with the fileserver, causing delayed and thus accumulated reports
- S5R1 was coming to an end, with almost only short workunits left
- faster machines have been added after X-Mas :-)
- Bruce was (and in some sense still is) moving with his family from Milwaukee to Hannover, which means that everything at UWM was on David's shoulders
- currently, due to the storm warnings in Germany, facilities are shutting down and people are being sent home, so we'll see how things go today

BM


Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 62723 - Posted 18 Jan 2007 13:39:49 UTC - in response to Message 62717.

- currently, due to the storm warnings in Germany, facilities are shutting down and people are being sent home, so we'll see how things go today

Storm warnings? That little bit of wind?
It is a little bit worse than last week's storm, isn't it? ;-)
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

ledi
Joined: Mar 7 06
Posts: 16
ID: 178649
Credit: 257,098
RAC: 0
Message 62728 - Posted 18 Jan 2007 13:55:17 UTC - in response to Message 62717.


- We have started distributing Work of a run called S5RI this morning
- Since these last longer than the short Workunits of S5R1, they will lower the load on our database server, so things should go back to a more or less normal state from now on


You might need to delete your app_info.xml file. If this file is in your projects/einstein directory, it will prevent the new version from being downloaded.
After deleting it, restart BOINC and the new WUs will start coming :-)

____________
I am Homer of Borg. Prepare to be ...ooooh donuts!


Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 62730 - Posted 18 Jan 2007 14:09:45 UTC

Just saw a message over on SETI saying it's back up, but I got this:
18/01/2007 14:02:22|Einstein@Home|Sending scheduler request: Requested by user
18/01/2007 14:02:22|Einstein@Home|(not requesting new work or reporting completed tasks)
18/01/2007 14:02:27|Einstein@Home|Scheduler RPC succeeded
18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database
18/01/2007 14:02:27|Einstein@Home|Deferring communication for 1 hr 0 min 0 sec
18/01/2007 14:02:27|Einstein@Home|Reason: project is down


Andy

paul milton
Joined: Sep 16 05
Posts: 191
ID: 109635
Credit: 435,032
RAC: 1,374
Message 62731 - Posted 18 Jan 2007 14:43:26 UTC - in response to Message 62730.
Last modified: 18 Jan 2007 14:46:31 UTC

no wonder i didn't notice it with my vision, S5R1 and S5RI look exactly alike. talk about confusing -.-

edit: i should clarify, i did notice it downloaded S5RI but due to my vision i thought it was redownloading the S5R1 files.

and btw, WAY TO GO! i know you guys have had one heck of a time over the last several days. you deserve a pat on the back! (and a break)
____________
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62733 - Posted 18 Jan 2007 15:15:30 UTC - in response to Message 62728.

You might need to delete your app_info.xml file. If this file is in your projects/einstein directory, it will prevent the new version from being downloaded.
After deleting it, restart BOINC and the new WUs will start coming :-)

There are platforms that require running "anonymous" apps. I'm putting together some new app_info.xml files for them so they can get the new work.

BM


Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62738 - Posted 18 Jan 2007 16:05:15 UTC - in response to Message 62730.

18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database

Yep, still a bit of a rough road.

The latest performance issues were due to all validators running at full load to check the results that managed to come in now...

BM

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62739 - Posted 18 Jan 2007 16:15:22 UTC - in response to Message 62703.
Last modified: 18 Jan 2007 16:17:13 UTC

I know that you are all doing your best to fix the current situation, but to avoid disappointing the crunchers, it would be good if the news were updated regularly. The last news item is from Jan. 7. :-( !

I'd appreciate that, too. It seems that the people with access permissions to do so are offline, probably getting some well-deserved sleep.

BM

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62743 - Posted 18 Jan 2007 16:40:34 UTC

Great work guys :-) hope this did the trick. Sure looks like it.
I'm going to power on my laptop, too, as soon as I'm finished posting this, so it gets its share of the new WUs as well ;-) There's a storm warning here, too, but no serious storm yet. I hope I'll be able to let the workstation run, but I don't have a generator or anything like that here, so when it gets really bad it might be laptop only ;-) we'll see.

Arion
Joined: Mar 20 05
Posts: 145
ID: 61093
Credit: 1,194,698
RAC: 2,409
Message 62744 - Posted 18 Jan 2007 16:41:03 UTC

Does anyone have any idea what this means? I just got all 44 of my results sent up from my main system and these are the messages I'm getting.

What the heck is it refusing my WU for?


1/18/2007 11:32:17 AM|Einstein@Home|Requesting 86400 seconds of new work, and reporting 44 completed tasks
1/18/2007 11:32:32 AM|Einstein@Home|Scheduler request succeeded
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7861_S5R1a_0 refused: successful result ALREADY reported for this work
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7860_S5R1a_0 refused: successful result ALREADY reported for this work
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7859_S5R1a_0 refused: successful result ALREADY reported for this work
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7822_S5R1a_0 refused: successful result ALREADY reported for this work
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7819_S5R1a_0 refused: successful result ALREADY reported for this work
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Completed result h1_0399.0_S5R1__7816_S5R1a_0 refused: successful result ALREADY reported for this work

1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__398_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__397_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__396_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__395_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__394_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__393_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__392_S5RIa_0
1/18/2007 11:32:32 AM|Einstein@Home|Message from server: Resent lost result h1_0215.0_S5R1__391_S5RIa_0


Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 62747 - Posted 18 Jan 2007 16:52:33 UTC - in response to Message 62744.

Arion, please read this thread.

Arion
Joined: Mar 20 05
Posts: 145
ID: 61093
Credit: 1,194,698
RAC: 2,409
Message 62749 - Posted 18 Jan 2007 17:46:56 UTC - in response to Message 62747.

Arion, please read this thread.



Thanks, the link is much appreciated....




Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 62758 - Posted 18 Jan 2007 19:22:51 UTC - in response to Message 62731.

no wonder i didn't notice it with my vision, S5R1 and S5RI look exactly alike. talk about confusing -.-

edit: i should clarify, i did notice it downloaded S5RI but due to my vision i thought it was redownloading the S5R1 files.

and btw, WAY TO GO! i know you guys have had one heck of a time over the last several days. you deserve a pat on the back! (and a break)


LOL, glad I looked here before I started getting any of the new work. Mine are all still helping to clean up on R1. But wow, why 'I' for the designator on the new run? Maybe it stands for "interim".

Alinator

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62765 - Posted 18 Jan 2007 20:09:08 UTC

Yep, that was my best guess, too.

WimTea
Joined: Feb 14 06
Posts: 48
ID: 172001
Credit: 117,392
RAC: 585
Message 62776 - Posted 18 Jan 2007 21:28:01 UTC - in response to Message 62738.

18/01/2007 14:02:27|Einstein@Home|Message from server: Server can't open database

Yep, still a bit rough road.

The latest performance issues were due to all validators running at full load to check the results that managed to come in now...

BM


First I'd like to say: good job EAH !

And it seems the validators are still busy: on connecting my 2 EAH hosts, they reported several tens of results and all of them are still in the initial state. Yes, even the ones that have reached quorum.
Not a problem, I expect this will resolve itself in the coming days as the last "1"'s are reported in and the uploads of the "I" files are decreasing to normal levels. Hope the very busy server holds, though...

Pooh Bear 27
Joined: Mar 20 05
Posts: 1330
ID: 61731
Credit: 3,487,843
RAC: 1,967
Message 62781 - Posted 18 Jan 2007 22:31:20 UTC

Returned my first S5RI
26358042

Looking good; the partner has not sent theirs back yet, so I'm waiting to see validation.


Major
Joined: May 22 06
Posts: 4
ID: 196033
Credit: 259,234
RAC: 0
Message 62784 - Posted 18 Jan 2007 22:38:02 UTC

returned my first S5R1

26334359

Looking BAD! INVALID? WHY?

Mike Hewson
Forum moderator
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62786 - Posted 18 Jan 2007 22:44:06 UTC

Thank you Bernd, for the update! :-)
Three cheers for all the hardworking E@H workers!
May you all sleep safe, sound and dry.

[aside]
Always a worry, depending upon the font and personal acuity:
1 and I, O and 0, S and 5, E and 3, g and 9, A and 4, B and 8, 2 and Z, b and 6.
[/aside]


Cheers, Mike.

larry1186
Joined: Sep 20 06
Posts: 3
ID: 216011
Credit: 38,635
RAC: 0
Message 62787 - Posted 18 Jan 2007 22:50:15 UTC - in response to Message 62784.

returned my first S5R1

26334359


I believe you mean "first S5RI"
____________
Don't get distracted by shiny objects.

Odysseus
Joined: Dec 17 05
Posts: 349
ID: 149638
Credit: 1,496,873
RAC: 1,287
Message 62789 - Posted 18 Jan 2007 22:55:21 UTC - in response to Message 62784.

returned my first S5R1

26334359

Looking BAD! INVALID? WHY?

Validate errors are server-side; AFAICT they occur when the validator can’t find the results it’s supposed to be comparing. There have been quite a few of them at S@h recently, associated with the server problems over there; likewise I would guess yours to have something to do with the server problems here.

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62794 - Posted 18 Jan 2007 23:14:10 UTC
Last modified: 19 Jan 2007 0:11:05 UTC

It looks like a wrong version of the validator had been installed.

The one responsible for this had been found and shot. Now there's no way to fix it anymore.

Seriously: Bruce will be the first awake with the permissions to fix it, so this should be cured tomorrow morning (CET). All results that have been marked "validate error" should be validated again, probably there's nothing wrong with them.

Sorry for the inconvenience, we're all a bit short on sleep.

BM

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62796 - Posted 18 Jan 2007 23:18:32 UTC

Still up and working... must be really extreme over there. I hope our fellow crunchers will at least show patience ;-)

[B^S] MattDavis
Joined: Jan 18 05
Posts: 71
ID: 2162
Credit: 1,731,280
RAC: 1,450
Message 62800 - Posted 18 Jan 2007 23:36:16 UTC - in response to Message 62794.

It looks like a wrong version of the validator had been installed



I was wondering why my Einstein units were giving me Rosetta credit :\


:)


DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,617
RAC: 8,996
Message 62802 - Posted 18 Jan 2007 23:46:37 UTC

First successful S5RI results are validating at 50% higher credit/hour than the S5R1 units they're replacing.

A64x2 @ 2.5gig

S5R1 23 credits/hour
S5RI 34 credits/hour
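As a quick arithmetic check of the two rates quoted above (taken as-is from this post), the ratio works out to roughly 1.48, i.e. about 48% more credit per hour, which is consistent with the "50% higher" estimate.

```python
# Credit rates quoted in this post for an A64x2 @ 2.5 GHz (credits per hour).
s5r1_rate = 23
s5ri_rate = 34

# Relative rate of the new run versus the old one.
ratio = s5ri_rate / s5r1_rate
increase_percent = (ratio - 1) * 100

print(f"ratio: {ratio:.2f}")                 # ratio: 1.48
print(f"increase: {increase_percent:.0f}%")  # increase: 48%
```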

I make two predictions:
1. In the near future credit will be significantly nerfed back in line with S5R1 and the other major projects.
2. There will be a mass of howling credit whores furious at being robbed.

Annika
Avatar
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62804 - Posted 19 Jan 2007 0:01:53 UTC

Sounds very logical, Dan. But to both points- who cares? ;-)

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62805 - Posted 19 Jan 2007 0:04:55 UTC - in response to Message 62802.

First successful S5RI results are validating at 50% higher credit/hour than the S5R1 units they're replacing.


The first few hundred Workunits were accidentally generated with a higher credit (the factor was 1.6 IIRC). We thought it wasn't worth the hassle to manually dig them out of the DB and fix them. Seems you were just lucky. Credit should be back to what you expect from S5R1 with later batches of WUs.

BM
____________
BM

Profile Mike Hewson
Forum moderator
Avatar
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62806 - Posted 19 Jan 2007 0:06:42 UTC - in response to Message 62794.

It looks like a wrong version of the validator had been installed.
The one responsible for this had been found and shot. Now there's no way to fix it anymore.

Well done. I've always been partial to summary justice. Fair trials should always be followed by executions ...... :-)
Sorry for the inconvenience, we're all a bit short on sleep.

Keep the revolver loaded at the bedside then .... :-)

Keep up the good work!!

Cheers, Mike


____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Profile Mike Hewson
Forum moderator
Avatar
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62807 - Posted 19 Jan 2007 0:16:37 UTC - in response to Message 62802.

2. There will be a mass of howling credit whores furious at being robbed.

Ahh ...... Dan my man, you meant, of course:

'There will be a mass of howling credit hunters furious at being robbed'

Big smile :-)

Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,617
RAC: 8,996
Message 62809 - Posted 19 Jan 2007 0:21:16 UTC - in response to Message 62807.

2. There will be a mass of howling credit whores furious at being robbed.

Ahh ...... Dan my man, you meant, of course:

'There will be a mass of howling credit hunters furious at being robbed'


You give them too much credit.
____________

Profile Mike Hewson
Forum moderator
Avatar
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62810 - Posted 19 Jan 2007 0:22:32 UTC - in response to Message 62809.

You give them too much credit.

Absolutely!! :-)

Cheers, Mike.

____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Bob Guy
Joined: Feb 9 05
Posts: 12
ID: 11733
Credit: 58,059
RAC: 0
Message 62812 - Posted 19 Jan 2007 0:34:52 UTC

Now that we are crunching S5RI, does that mean that S5R1 is officially done? Does that mean that there are no more S5R1 WUs left to crunch aside from some unreported/unreturned WUs?

I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU? I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity?

This is not a complaint or criticism, it's just an idle speculation.

Profile Mike Hewson
Forum moderator
Avatar
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62813 - Posted 19 Jan 2007 0:45:02 UTC - in response to Message 62812.
Last modified: 19 Jan 2007 1:13:36 UTC

Now that we are crunching S5RI, does that mean that S5R1 is officially done?

No, it's still washing up time.

Does that mean that there are no more S5R1 WUs left to crunch aside from some unreported/unreturned WUs?

Basically yes. It's tying up the loose quorums, reconciling missed stuff due to server wobbles etc.

I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU? I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity?

I'd say the former, though I guess it could be the latter.

This is not a complaint or criticism, it's just an idle speculation.


Please, speculate away.... :-)

Cheers, Mike.

( edit ) Upon closer inspection, i.e. I put my reading glasses on, my work units are labelled e.g. 'h1_0374.0_S5R1__1503_S5RIa_0' - or spoken 'aych one underline zero three seven four point zero underline ess five arr ONE underline underline one five zero three underline ess five arr EYE ay underline zero' :-)

So that'd make the 'S5RI' units a subset of 'S5R1' .....

____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62815 - Posted 19 Jan 2007 1:14:56 UTC - in response to Message 62812.
Last modified: 19 Jan 2007 1:32:42 UTC

I've noticed that the (long) S5RI WUs seem to have a complicated command line. Am I wrong in thinking that that is just a kludge to force the WUs to do more work than is necessary in order to extend the time it takes to do a WU?

Yes you are.

I don't know why the command line looks more complicated to you than the ones from S5R1. We are using a newer framework for our workunit generator, which may result in more options being given on the command line rather than hidden in the config file or in program defaults, but in principle the program shouldn't do anything different.

I'm imagining that that was done only to lessen the stress on the servers caused by the short return times of the short(er) WUs. Or, are we now re-crunching interesting WUs using a higher degree of sensitivity?

Not really with a higher sensitivity, which would be something like a closer look. We're rather looking at a certain part from a different angle, or with a different focus, but from more or less the same distance. We found that the spindown values we were looking for in S5R1 might not have been optimal for this frequency range (150-720Hz, I think), so we've changed that for this short run.

Originally the workunits resulting from this setup were a bit longer than the long S5R1 WUs, so we decided to cut them in half so as not to exclude the slower computers.

BM
____________
BM

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62816 - Posted 19 Jan 2007 1:21:36 UTC - in response to Message 62813.

Upon closer inspection, i.e. I put my reading glasses on, my work units are labelled e.g. 'h1_0374.0_S5R1__1503_S5RIa_0' - or spoken 'aych one underline zero three seven four point zero underline ess five arr ONE underline underline one five zero three underline ess five arr EYE ay underline zero' :-)

So that'd make the 'S5RI' units a subset of 'S5R1' .....


The first part of a Workunit's name is just the name of the datafile it refers to. As we are using the same data files, they are still labeled S5R1, even if the workunit belongs to S5RI. And yes, in terms of the frequencies we're looking at, S5RI is a subset of S5R1.
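To make the naming concrete, here is a small Python sketch that splits a task name at the double underscore. It assumes, following the explanation above, that everything before `__` is the data file name (compare `h1_0383.0_S5R1` in Sean's directory listing further down the thread). The helper `split_task_name` is hypothetical, and no meaning is assigned to the individual fields after the separator.

```python
def split_task_name(name: str) -> tuple[str, str]:
    """Split a task name into (data file name, remainder).

    Hypothetical helper. Assumes the data file name is everything before
    the first double underscore.
    """
    datafile, sep, remainder = name.partition("__")
    if not sep:
        raise ValueError(f"no '__' separator in {name!r}")
    return datafile, remainder


datafile, remainder = split_task_name("h1_0374.0_S5R1__1503_S5RIa_0")
print(datafile)   # h1_0374.0_S5R1  (data file shared with the S5R1 run)
print(remainder)  # 1503_S5RIa_0    (carries the S5RI run label)
```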

BM
____________
BM

Bob Guy
Joined: Feb 9 05
Posts: 12
ID: 11733
Credit: 58,059
RAC: 0
Message 62825 - Posted 19 Jan 2007 3:17:00 UTC - in response to Message 62815.

but in principle the program shouldn't do anything different.

It just seemed to me that these long WUs were doing something like 6 to 7 times as much work per step as the short WUs.

Originally the workunits resulting from this setup were a bit longer than the long S5R1 WUs, so we decided to cut them in half so as not to exclude the slower computers.

BM

Well, I'm doing these long WUs in about 1 hour 40 min each, it must be taking slow computers half way to forever to get them done.

Thanks for the update on the status of S5R1; it was kind of an anticlimactic finish to the S5R1 project with all the server problems. I hope you can find something useful in the data from this S5RI subset.

jowr
Joined: Feb 19 05
Posts: 55
ID: 19126
Credit: 1,947,636
RAC: 0
Message 62856 - Posted 19 Jan 2007 8:18:13 UTC - in response to Message 62825.

I hope you can find something useful in the data from this S5RI subset.


The work is important whether or not something is found, though it'd be sweet if some waves were found.

If nothing is found, tighter and tighter constraints are put on the background gravitational radiation and how often things like infall events happen.

____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 62859 - Posted 19 Jan 2007 8:56:15 UTC - in response to Message 62794.

It looks like a wrong version of the validator had been installed.

Seriously: Bruce will be the first awake with the permissions to fix it, so this should be cured tomorrow morning (CET). All results that have been marked "validate error" should be validated again, probably there's nothing wrong with them.

Seems I was wrong - David has replaced the validator with the proper version.

BM
____________
BM

Profile paul milton
Avatar
Joined: Sep 16 05
Posts: 191
ID: 109635
Credit: 435,032
RAC: 1,374
Message 62865 - Posted 19 Jan 2007 11:24:42 UTC - in response to Message 62859.
Last modified: 19 Jan 2007 11:25:27 UTC

Thank you Bernd, for the update! :-)
Three cheers for all the hardworking E@H workers!
May you all sleep safe, sound and dry.

[aside]
Always a worry, depending upon the font and personal acuity:
1 and I, O and 0, S and 5, E and 3, g and 9, A and 4, B and 8, 2 and Z, b and 6.
[/aside]




Cheers, Mike.


Hope I have that formatted right. Mike, my visual acuity is 20/400 in both eyes :) I'm also color blind. I forget the exact term, but it's the kind where the names of similar colors, i.e. black, brown and red, have no meaning to me; they're the same color. Same for green and yellow, and so on. I can't spell it, but I have "astigmic myopia with nastagnus" ... in lay terms, I'm farsighted and nearsighted and my eyes "jitter".

[sarcastic]
Needless to say, crossing the road is a real hassle, those darn drivers just won't get out of my way.
[/sarcastic]
____________
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Profile Mike Hewson
Forum moderator
Avatar
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,434,218
RAC: 5,116
Message 62867 - Posted 19 Jan 2007 11:53:39 UTC - in response to Message 62865.
Last modified: 19 Jan 2007 11:55:45 UTC

Hope I have that formatted right. Mike, my visual acuity is 20/400 in both eyes :) I'm also color blind. I forget the exact term, but it's the kind where the names of similar colors, i.e. black, brown and red, have no meaning to me; they're the same color. Same for green and yellow, and so on. I can't spell it, but I have "astigmic myopia with nastagnus" ... in lay terms, I'm farsighted and nearsighted and my eyes "jitter".

Yeah, alas you lack ( the genes for ) the correct retinal pigment(s) that capture the appropriate photon frequencies and pump up the molecular energy level to trigger a neural membrane signal. Blame your parents ..... sap in the family tree. :-)

Astigmatic means an eye's preferential focal length ( ~ ideal distance to focus well ), which depends primarily on corneal contour ( shape of the front window of the eyeball ), is different depending on which axis you examine. So it is different for, say, the up/down axis vs. left/right. Most of us have a smidgen of this if looked at in sufficient detail. You have rather more than most. Nystagmus means, basically, the feedback loop(s) attempting to align both eyes to achieve an agreeable view of the world ( no double vision, same magnification of objects with each eye, acceptable parallax or movement perspective ) are suffering from overshoot and cannot find 'consensus' settings - at a guess due to the other issues.

Kinda makes the Windows Accessibility Options somewhat more relevant, eh?

Actually there's some sweet text to speech stuff around, and not too expensive. I use one 'Text to Speech Pro' which has a great English lady's voice - Audrey - and does a decent job of handling pagination, punctuation and various common idioms.

[sarcastic]
Needless to say, crossing the road is a real hassle, those darn drivers just won't get out of my way.
[/sarcastic]

We get that regardless of anyone's vision!
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Hofman's Atlantic
Joined: Oct 27 06
Posts: 2
ID: 226860
Credit: 18,527
RAC: 0
Message 62929 - Posted 20 Jan 2007 3:12:00 UTC - in response to Message 62509.

BM:
- We are currently testing the setup for a new run. It will consist of only one type of workunits that are a bit more than half as long as the S5R1 long ones have been.

Question: how do the new work units compare to the S5R1 short units? My poor old PII 300 MHz did a short one in 13 hrs 45 mins. My first S5RI will take approx. 90+ hrs according to BOINC Manager. Does this look correct?

Annika
Avatar
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62931 - Posted 20 Jan 2007 3:38:18 UTC

Haven't crunched any of the new ones, but mathematically... well... a long WU would be about 6 hours on my workstation, a short one less than one hour... maybe 7 or 8 times as much time for a long WU... so if the new ones are half as big as a long one, that would mean perhaps 4 times as much as a short WU. You should be closer to 60 than 90; I think your box is overestimating the time. Mine probably does, because more than 5 hours seems way too much when a LONG WU only takes 6 or 6.5...
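Annika's back-of-the-envelope estimate can be written out as a short Python sketch. The inputs are the figures from this thread (13 h 45 min for a short WU on the PII, a long WU taking roughly 7 to 8 times a short one, an S5RI WU being about half a long one); the function name is just for illustration, and the result is a rough estimate, not a prediction of what BOINC Manager will show.

```python
def estimate_s5ri_hours(short_hours: float, long_to_short: float,
                        s5ri_fraction_of_long: float = 0.5) -> float:
    """Estimate S5RI run time from a host's S5R1 short-WU time.

    long_to_short: how many times longer a long S5R1 WU took than a short one.
    s5ri_fraction_of_long: S5RI WU size relative to a long S5R1 WU.
    """
    return short_hours * long_to_short * s5ri_fraction_of_long


short_hours = 13 + 45 / 60  # PII 300 MHz: 13 h 45 min per short WU
for factor in (7, 8):
    print(f"long = {factor}x short -> ~{estimate_s5ri_hours(short_hours, factor):.0f} h")
# long = 7x short -> ~48 h
# long = 8x short -> ~55 h
```

Using Winterknight's measured factor of about 4.5 short WUs per S5RI WU instead gives 13.75 x 4.5, roughly 62 h, which is also well under the 90+ hours BOINC Manager estimated.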
____________

Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 62932 - Posted 20 Jan 2007 3:43:44 UTC - in response to Message 62929.

BM:
- We are currently testing the setup for a new run. It will consist of only one type of workunits that are a bit more than half as long as the S5R1 long ones have been.

Question: how do the new work units compare to the S5R1 short units? My poor old PII 300 MHz did a short one in 13 hrs 45 mins. My first S5RI will take approx. 90+ hrs according to BOINC Manager. Does this look correct?

On my C2D the new units take about 4.5 times longer to crunch than the old short units. Can't help more than that, all my old slower computers now have homes with penniless students.

Andy

[B^S] MattDavis
Joined: Jan 18 05
Posts: 71
ID: 2162
Credit: 1,731,280
RAC: 1,450
Message 62936 - Posted 20 Jan 2007 4:37:32 UTC

21 hours on a 450 p3! Woooo!
____________

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62947 - Posted 20 Jan 2007 7:04:45 UTC

Hey guys,

I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it?
Am I supposed to do something to connect to the new project?

Sean

ledi
Joined: Mar 7 06
Posts: 16
ID: 178649
Credit: 257,098
RAC: 0
Message 62950 - Posted 20 Jan 2007 8:28:07 UTC - in response to Message 62947.

I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it?
Am I supposed to do something to connect to the new project?
Sean


Maybe this is your problem:

You might need to delete your app_info.xml file. If this file is in your projects/einstein directory, it will prevent the new version from being downloaded.
After deletion, restart BOINC and the new WUs will start coming :-)
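If you'd rather check for the file than hunt for it by hand, a minimal Python sketch like this will do. The BOINC data directory differs between installations and operating systems, so the path you pass in is an assumption you have to supply; `einstein.phys.uwm.edu` is the project folder name mentioned in this thread, and `find_app_info` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_app_info(boinc_dir: Path,
                  project_dir: str = "einstein.phys.uwm.edu") -> Optional[Path]:
    """Return the path of app_info.xml in the project folder, or None if absent."""
    candidate = boinc_dir / "projects" / project_dir / "app_info.xml"
    return candidate if candidate.is_file() else None


# Example only: adjust to wherever your BOINC data directory actually is.
found = find_app_info(Path(r"C:\Program Files\BOINC"))
print(found or "no app_info.xml found")
```

If it returns a path, stop BOINC, remove that file and restart BOINC.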


____________
I am Homer of Borg. Prepare to be ...ooooh donuts!


Winterknight
Joined: Jun 4 05
Posts: 312
ID: 85786
Credit: 651,234
RAC: 78
Message 62951 - Posted 20 Jan 2007 8:28:20 UTC - in response to Message 62947.

Hey guys,

I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it?
Am I supposed to do something to connect to the new project?

Sean

Did you run S5R1 as a trial and install an app_info.xml file in BOINC\projects\einstein.phys.uwm.edu? If you did, this file needs to be deleted.
Or try an update: depending on your BOINC version, you might be in a very long back-off due to the recent server problems.

Andy

Tobie
Joined: Sep 4 06
Posts: 6
ID: 212046
Credit: 79,955
RAC: 0
Message 62953 - Posted 20 Jan 2007 9:36:24 UTC

I see the date for the updates on the main page still says 2006 ...

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62954 - Posted 20 Jan 2007 9:41:37 UTC - in response to Message 62947.

Hey guys,

I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it?
Am I supposed to do something to connect to the new project?

Sean


I've just looked at the last result you returned. It was crunched with the standard Windows application, so you won't have an app_info.xml file to cause a problem. Normally you don't need to do anything, as the standard new app is automatically downloaded when needed.

Because of the server problems just before the old run finished, it is possible that your client is in a one week backoff. Look on the projects tab of BOINC Manager. In the status column on the far right is there a counter on the EAH line counting down with maybe 100 hours or so still to go? If there is, just select the EAH line and then click the update button in the commands box at the left.

That's all you need to do to get it going again. Take a look on the messages tab and you will see the action.

____________
Cheers,
Gary.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62955 - Posted 20 Jan 2007 9:42:26 UTC - in response to Message 62951.

Hi Andy,
I haven't made any changes or been part of a trial at all. I'm running BOINC 5.4.11 on WinXP. On the 18th, work units stopped arriving. In my einstein directory I do see references to the new work units dated the 16th,
e.g. 16,220,160 h1_0383.0_S5R1 amongst others.
Also I see no reference to "app_info.xml-file" in my einstein project directory.

Any ideas?


Did you run S5R1 as a trial and install an app_info.xml file in BOINC\projects\einstein.phys.uwm.edu? If you did, this file needs to be deleted.
Or try an update: depending on your BOINC version, you might be in a very long back-off due to the recent server problems.

Andy

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62956 - Posted 20 Jan 2007 9:47:55 UTC - in response to Message 62954.

Hi Gary,

Under the status column it is blank, and when I highlight the E@H line and click the update button it counts down 60 secs and nothing happens. I've got seti@home running with no dramas, so I'm guessing the BOINC client isn't falling over.

Sean

Hey guys,

I haven't seen head nor hide of the new work units that have been mentioned. My last work unit was on the 18th Jan and that's been it?
Am I supposed to do something to connect to the new project?

Sean


I've just looked at the last result you returned. It was crunched with the standard Windows application, so you won't have an app_info.xml file to cause a problem. Normally you don't need to do anything, as the standard new app is automatically downloaded when needed.

Because of the server problems just before the old run finished, it is possible that your client is in a one week backoff. Look on the projects tab of BOINC Manager. In the status column on the far right is there a counter on the EAH line counting down with maybe 100 hours or so still to go? If there is, just select the EAH line and then click the update button in the commands box at the left.

That's all you need to do to get it going again. Take a look on the messages tab and you will see the action.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62957 - Posted 20 Jan 2007 9:52:15 UTC

Forgot to append this from the logs.
This is what I get under the Message tab when I click the update button.


20/01/2007 8:45:02 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
20/01/2007 8:45:02 PM|Einstein@Home|Reason: Requested by user
20/01/2007 8:45:02 PM|Einstein@Home|(not requesting new work or reporting completed tasks)
20/01/2007 8:45:07 PM|Einstein@Home|Scheduler request succeeded

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62958 - Posted 20 Jan 2007 9:54:08 UTC - in response to Message 62956.
Last modified: 20 Jan 2007 10:01:26 UTC

Hi Gary,

Under the status column it is blank, and when I highlight the E@H line and click the update button it counts down 60 secs and nothing happens. I've got seti@home running with no dramas, so I'm guessing the BOINC client isn't falling over.

Sean



Hi Sean,

If it counts down 60 seconds then contact with the server occurred and it has done something :). We need to know what it has done and that will be contained in the last few lines of messages under the messages tab. You should be able to see when you clicked "Update" and the messages that came back after that. We need you to cut and paste those relevant messages in a new post here for us to look at.

Thanks.



EDIT: OK, I can now see the messages, thanks. BOINC is not allowing EAH to get new work, probably because it thinks that EAH has had too much time recently and that your other projects are more deserving. Please use an editor like notepad to examine your "client_state.xml" file. You don't have to stop anything as long as you make no changes and exit without saving. Do a search for a variable called "long_term_debt" (there will be one for each project) and tell us what the values are.
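For anyone who prefers not to scroll through the file by eye, here is a minimal, read-only Python sketch that lists the long-term debt per project. It assumes the BOINC 5.x layout, where each `<project>` element directly under the root carries `<project_name>` and `<long_term_debt>` children; other client versions may store things differently. The sample XML is trimmed to the relevant tags and uses the values Sean reports later in the thread.

```python
import xml.etree.ElementTree as ET


def read_long_term_debts(xml_text: str) -> dict[str, float]:
    """Map project name -> long-term debt from client_state.xml text.

    Read-only: nothing is written back. Assumes <project> elements sit
    directly under the root, each with <project_name> and <long_term_debt>.
    """
    root = ET.fromstring(xml_text)
    debts = {}
    for project in root.findall("project"):
        name = project.findtext("project_name", default="(unnamed)")
        value = project.findtext("long_term_debt")
        if value is not None:
            debts[name] = float(value)
    return debts


SAMPLE = """<client_state>
  <project>
    <project_name>SETI@home</project_name>
    <long_term_debt>1024672.689278</long_term_debt>
  </project>
  <project>
    <project_name>Einstein@Home</project_name>
    <long_term_debt>-1024672.689278</long_term_debt>
  </project>
</client_state>"""

for name, debt in read_long_term_debts(SAMPLE).items():
    print(f"{name}: {debt:+.0f} s")
```

On a real machine you would pass in the text of your own client_state.xml instead of `SAMPLE`.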


____________
Cheers,
Gary.

Profile Ageless
Avatar
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 62959 - Posted 20 Jan 2007 10:17:24 UTC
Last modified: 20 Jan 2007 10:18:10 UTC

Or use BOINCDV (Debt Viewer), which will show you the short-term and long-term debts without needing to edit the client_state.xml file. This program is Windows-only.
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62960 - Posted 20 Jan 2007 10:27:01 UTC

I've noticed this with the two projects.

The Seti project has;
<long_term_debt>1024672.689278</long_term_debt>
the Einstein project has;
<long_term_debt>-1024672.689278</long_term_debt>

The Einstein value has a "-" in front of it; could this be the issue?

Sean

Profile Ageless
Avatar
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 62961 - Posted 20 Jan 2007 10:35:16 UTC - in response to Message 62960.

No, that's the correct notation. The mean of all debt is always zero. So if you add up all the short-term debts they should sum to zero, and the same goes for the long-term debts.

For the next 1024672 seconds you'll be crunching Seti. Which is only 9 months. ;)
Did you have Seti on No New Work/Tasks or Suspended for a long time, or what?

Anyway, you could edit both those values so they show 0
Do make sure you have exited BOINC completely before you do so.
And save the client_state.xml file with the Save option (not Save As..) before restarting BOINC.
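The point that the debts balance out can be checked with a couple of lines of Python. This sketch uses the two values Sean posted and assumes, as BOINC's debt accounting is usually described, that the project with the highest (most positive) long-term debt is the one owed CPU time; `most_owed_project` is a hypothetical helper.

```python
def most_owed_project(debts: dict[str, float], tol: float = 1e-3) -> str:
    """Check that the debts balance out, then return the project owed the most time."""
    total = sum(debts.values())
    if abs(total) > tol:
        raise ValueError(f"debts sum to {total:.6f}, expected about 0")
    return max(debts, key=debts.get)


debts = {"SETI@home": 1024672.689278, "Einstein@Home": -1024672.689278}
print(most_owed_project(debts))  # SETI@home
```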
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62962 - Posted 20 Jan 2007 10:42:37 UTC - in response to Message 62961.

Hey Ageless,

I haven't touched the running of the projects at all; I've just let it do what it needed without any interference.

Ok, I'll give your suggestion a go. I'll back up the original file and go from there. I'm guessing this will return it to its normal operation, where it switches between the two projects of its own accord?

Sean

No, that's the correct notation. The mean of all debt is always zero. So if you add up all the short-term debts they should sum to zero, and the same goes for the long-term debts.

For the next 1024672 seconds you'll be crunching Seti. Which is only 9 months. ;)
Did you have Seti on No New Work/Tasks or Suspended for a long time, or what?

Anyway, you could edit both those values so they show 0
Do make sure you have exited BOINC completely before you do so.
And save the client_state.xml file with the Save option (not Save As..) before restarting BOINC.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62963 - Posted 20 Jan 2007 11:00:04 UTC - in response to Message 62960.


the Einstein project has;
<long_term_debt>-1024672.689278</long_term_debt>


Jord has given you the answer but to make sure you understand what to do, here are the steps in full:-

  • Stop BOINC completely and confirm with task manager that nothing is running
  • Open client_state.xml with notepad and edit so that both debts are <long_term_debt>0.000000</long_term_debt>
  • Save your changes and exit notepad (don't use save as...)
  • Start BOINC
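The same edit can be scripted. The Python sketch below is a hedged alternative to doing it in Notepad, not an official BOINC tool: it assumes BOINC has been fully stopped first, works on the raw bytes so that nothing apart from the `<long_term_debt>` values changes (no line-ending or encoding conversion), and writes a backup copy next to the original. The function names and the `.bak` suffix are my own choices.

```python
import re
import shutil
from pathlib import Path

# Matches one <long_term_debt>...</long_term_debt> element, whatever its value.
LTD_PATTERN = re.compile(rb"<long_term_debt>[^<]*</long_term_debt>")
ZERO_LTD = b"<long_term_debt>0.000000</long_term_debt>"


def zero_long_term_debts(data: bytes) -> tuple[bytes, int]:
    """Return (new file contents, number of debt values replaced)."""
    return LTD_PATTERN.subn(ZERO_LTD, data)


def zero_debts_in_file(state_file: Path) -> int:
    """Zero every long-term debt in state_file in place. Run only with BOINC stopped."""
    backup = state_file.with_name(state_file.name + ".bak")
    shutil.copy2(state_file, backup)  # untouched copy to fall back on
    new_data, count = zero_long_term_debts(state_file.read_bytes())
    state_file.write_bytes(new_data)
    return count
```

For a two-project setup like this one, `zero_debts_in_file(Path("client_state.xml"))` should report 2; any other number is a sign to restore the backup and take a closer look.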



Your client should now contact the server and download new executables and data files and everything will return to normal.

As Jord pointed out, the EAH debt of -1024672 seconds represents the time that other projects (i.e. Seti) will run before EAH will be allowed to run. He was obviously using one of those famous Dutch calculators because his estimate was a little bit out :). It's actually just under 12 days and not 9 months - pretty close for a Dutch calculator, don't you think? :)
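The conversion in numbers, as a tiny Python sketch. Like the estimate above, it simply treats one second of debt as one second of wall-clock time, which is a simplification.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

debt_seconds = 1024672.689278  # size of the Einstein@Home long-term debt above
days = debt_seconds / SECONDS_PER_DAY
print(f"{days:.2f} days")  # 11.86 days, i.e. just under 12 days
```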

Seriously, it's still rather larger than you would normally expect - particularly since you are only running two projects. It's possible for this to happen if you had set a very large number of days for your cache and had a very low value for your EAH resource share. Perhaps you would like to let us know those values so we can investigate a little further?


____________
Cheers,
Gary.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62966 - Posted 20 Jan 2007 11:24:44 UTC

Ok this is odd.

This is what I did. Stopped BOINC completely. I made a backup copy of the original file, just in case, and edited the original file as stated by Ageless. Started BOINC and got computation errors on my Seti units and download errors on Einstein. So I put the backed up file back into the directory, correctly named, with original long_term_debt values and started BOINC.
Seti started to work and started on new units, as the other two have "Computation error" next to them; then under the Transfer tab I could see that I was downloading a 15.47 MB file.
Right now the two Seti units that were working are in Preempted mode and I've got two Einstein units working, which by the way is normal. After the Einstein units finish, which will be in just under two hours (a bit longer than normal), the Seti units will begin, and it cycles like that as it has done since I joined E@H.
The values for long term debt are now;
Einstein
<long_term_debt>-1021112.009591</long_term_debt>
Seti
<long_term_debt>1021112.009591</long_term_debt>

Seriously, it's still rather larger than you would normally expect - particularly since you are only running two projects. It's possible for this to happen if you had set a very large number of days for your cache and had a very low value for your EAH resource share. Perhaps you would like to let us know those values so we can investigate a little further?

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62968 - Posted 20 Jan 2007 12:02:03 UTC

Hey guys,

Thanks for your help, it is appreciated.

Do you think I should just leave it and let it run for a while or should I check your suggestion Gary;

Seriously, it's still rather larger than you would normally expect - particularly since you are only running two projects. It's possible for this to happen if you had set a very large number of days for your cache and had a very low value for your EAH resource share. Perhaps you would like to let us know those values so we can investigate a little further?


Could it have just been a quirk of the heat we've been having here this past week? It's been in the high 30 - 40 degrees here and it did make my machine behave a little weird; I had to shut it down for a day as the heat was rather intense in this room, roughly 40 degrees C.
(Wild stab in the dark)

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62969 - Posted 20 Jan 2007 12:32:28 UTC - in response to Message 62966.

Ok this is odd.

This is what I did. Stopped BOINC completely. I made a backup copy of the original file, just in case, and edited the original file as stated by Ageless. Started BOINC and got computation errors on my Seti units and download errors on Einstein. So I put the backed up file back into the directory, correctly named, with original long_term_debt values and started BOINC.
Seti started to work and started on new units, as the other two have "Computation error" next to them; then under the Transfer tab I could see that I was downloading a 15.47 MB file.
Right now the two Seti units that were working are in Preempted mode and I've got two Einstein units working, which by the way is normal. After the Einstein units finish, which will be in just under two hours (a bit longer than normal), the Seti units will begin, and it cycles like that as it has done since I joined E@H.
The values for long term debt are now;
Einstein
<long_term_debt>-1021112.009591</long_term_debt>
Seti
<long_term_debt>1021112.009591</long_term_debt>


Sean,

I've looked up your computers both on Seti and on EAH. Seti shows you as having 1 computer and your results list there shows no client errors (yet). EAH now shows you as having two (identical) computers so when you made the change to the LTD value you have obviously corrupted something else in the state file so that EAH has given your machine a new ID. We can fix the double ID problem later on.

Now that you are back with the old state file, you have eliminated the corruption (whatever it was) but you are going to soon see EAH do no more crunching for about 12 days unless you fix the LTD values. You basically have two options:-

(i) Leave everything alone and wait 12 days for EAH to repay the debt
(ii) Change the debt values to zero without changing anything else (which you must have done last time).

Have you used a text editor previously? Are you confident that you can avoid changing values of other variables (or more likely corrupting the syntax of the file in some way)?

You can probably work out what BOINC was upset with last time if you examine the contents of the log files stdoutdae.txt, stderrdae.txt, stdoutgui.txt and stderrgui.txt in your boinc folder. Some or all of these may contain information relevant to what happened when you started BOINC with the edited state file.
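If those logs are long, a minimal Python sketch like the one below can pull out just the lines that mention a keyword. The four file names are the ones listed above; the helper name, the default keyword `error`, and the `latin-1` decoding (chosen so that any byte decodes without raising) are assumptions, and not every client version necessarily writes all four files.

```python
from pathlib import Path

LOG_NAMES = ("stdoutdae.txt", "stderrdae.txt", "stdoutgui.txt", "stderrgui.txt")


def grep_boinc_logs(boinc_dir: Path, keyword: str = "error") -> list[tuple[str, str]]:
    """Return (log file name, line) pairs whose text contains keyword, ignoring case."""
    hits = []
    for name in LOG_NAMES:
        log_path = boinc_dir / name
        if not log_path.is_file():
            continue  # skip logs this installation doesn't have
        with log_path.open(encoding="latin-1") as log:
            for line in log:
                if keyword.lower() in line.lower():
                    hits.append((name, line.rstrip()))
    return hits


# Run from inside the BOINC folder, or pass its path instead of ".".
for name, line in grep_boinc_logs(Path(".")):
    print(f"{name}: {line}")
```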

Please be assured that the edit of LTDs itself cannot have caused the observed action. There must have been other corruption. Did you use notepad to do the edit?

Let us know what you would like to do.



____________
Cheers,
Gary.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62970 - Posted 20 Jan 2007 12:47:58 UTC - in response to Message 62968.


Could it have just been a quirk of the heat we've been having here this past week? It's been in the high 30 - 40 degrees here and it did make my machine behave a little weird; I had to shut it down for a day as the heat was rather intense in this room, roughly 40 degrees C.
(Wild stab in the dark)


It's highly unlikely to be related to temperature. In my experience, if a machine gets too hot it simply locks up. A reboot gets it going again. Rarely do I see any file corruption. Any corruption is usually in Windows files, not BOINC files.

I'd like you to have another go at editing the state file. If you used Notepad last time, please start it up again (without loading any file), look under the Format menu and tell me whether the Word Wrap option is ticked or not.


____________
Cheers,
Gary.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62971 - Posted 20 Jan 2007 12:48:36 UTC - in response to Message 62969.

Gary,

Yeah, I did use Notepad, and this is what I did. I renamed the file client_stateA and moved it out of the directory, leaving the original file untouched in the directory. I edited the client_state file, removing the numbers that were in the LTD and putting in 0 (not 0.0000, just 0). I did this for both Seti and EAH. Nothing else was edited!
I started BOINC, got the errors as quoted, shut down BOINC, deleted the edited client_state file, moved the renamed file back into the directory and called it client_state. All of this was done using Notepad.

What I'll do now is go in and just change the LTDs to 0, not make a backup, and try again. I'll shut down BOINC and make the change.
Oh and yeah I know my way around a PC. :-)

Sean

Sean,

I've looked up your computers both on Seti and on EAH. Seti shows you as having 1 computer and your results list there shows no client errors (yet). EAH now shows you as having two (identical) computers so when you made the change to the LTD value you have obviously corrupted something else in the state file so that EAH has given your machine a new ID. We can fix the double ID problem later on.

Now that you are back with the old state file, you have eliminated the corruption (whatever it was) but you are going to soon see EAH do no more crunching for about 12 days unless you fix the LTD values. You basically have two options:-

(i) Leave everything alone and wait 12 days for EAH to repay the debt
(ii) Change the debt values to zero without changing anything else (which you must have done last time).

Have you used a text editor previously? Are you confident that you can avoid changing values of other variables (or more likely corrupting the syntax of the file in some way)?

You can probably work out what BOINC was upset with last time if you examine the contents of the log files stdoutdae.txt, stderrdae.txt, stdoutgui.txt and stderrgui.txt in your boinc folder. Some or all of these may contain information relevant to what happened when you started BOINC with the edited state file.

Please be assured that the edit of LTDs itself cannot have caused the observed action. There must have been other corruption. Did you use notepad to do the edit?

Let us know what you would like to do.



Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62972 - Posted 20 Jan 2007 13:02:06 UTC

Don't get me worried, I used the trick of resetting all debts to zero, too. Hope nothing will crash (couldn't see yet because my box is far from idle atm and doesn't get much crunching done)... but at least I used a good text editor and didn't have word wrap or sth on.
Sean, good luck with your modifications, I hope it works out okay this time. btw I'd like 40 degrees, too, it's around 10 and rainy here which is not really nice. Well, my computers probably like it cool ;-)
____________

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62973 - Posted 20 Jan 2007 13:05:04 UTC

Ok I did exactly as you posted,

# Stop BOINC completely and confirm with task manager that nothing is running
# Open client_state.xml with notepad and edit so that both debts are <long_term_debt>0.000000</long_term_debt>
# Save your changes and exit notepad (don't use save as...)
# Start BOINC


Except for the last step as I rebooted the machine.
Now the values are:
Einstein
<long_term_debt>0.156250</long_term_debt>
Seti
<long_term_debt>-0.156250</long_term_debt>

I'm guessing this is more what we are looking for?

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62974 - Posted 20 Jan 2007 13:18:40 UTC - in response to Message 62972.

Don't get me worried, I used the trick of resetting all debts to zero, too. Hope nothing will crash (couldn't see yet because my box is far from idle atm and doesn't get much crunching done)... but at least I used a good text editor and didn't have word wrap or sth on.
Sean, good luck with your modifications, I hope it works out okay this time. btw I'd like 40 degrees, too, it's around 10 and rainy here which is not really nice. Well, my computers probably like it cool ;-)


Thank you Annika, I'm sure it will work out. Right now I am crunching Einstein units so things are better than they were. :-)

Oh and I'd prefer 10 degrees and rain any day over 40, and we sure do need rain here!

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 62975 - Posted 20 Jan 2007 13:34:42 UTC

I could send you some, we definitely have more than enough here ;-)
Good you finally got BOINC to do what it should :-)
____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62976 - Posted 20 Jan 2007 13:53:57 UTC - in response to Message 62973.

Ok I did exactly as you posted, ...


Good. I deliberately put 0.000000 in the instructions because I knew that BOINC will write out 6 decimal digits every time it updates that value. You would think it would be happy with a plain 0 but I've actually never been game to test that even though I've done this exact same edit many times before. I've always used 0.000000 to be safe. I don't know that this was the problem but I'm not suggesting we try it again to find out :).
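For anyone who would rather script this edit than do it by hand, here is a minimal Python sketch. It assumes BOINC is completely stopped, that the debts appear as `<long_term_debt>...</long_term_debt>` elements exactly as quoted in this thread, and that the file is plain UTF-8/ASCII text; `reset_long_term_debts` and `reset_file` are made-up names for illustration. It rewrites only the LTD values, using `%f` so they come out with the same six decimal places BOINC writes, and copies every other character (including line breaks) through unchanged, which is the whole point of the exercise.

```python
import re
import shutil

# Matches one <long_term_debt>VALUE</long_term_debt> element (assumed layout).
LTD_PATTERN = re.compile(r"(<long_term_debt>)\s*(-?[0-9.]+)\s*(</long_term_debt>)")


def reset_long_term_debts(state_xml: str) -> str:
    """Return the state-file text with every long-term debt set to zero.

    Only the numbers between the LTD tags change; all other text,
    including whitespace and line endings, is copied through untouched.
    """
    zero = "%f" % 0.0  # "%f" always gives six decimals: "0.000000"
    return LTD_PATTERN.sub(lambda m: m.group(1) + zero + m.group(3), state_xml)


def reset_file(path: str) -> None:
    """Hypothetical helper: back up the file, then rewrite it in place."""
    shutil.copy2(path, path + ".bak")  # keep an untouched copy first
    # newline="" preserves the file's original line endings on read and write.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(reset_long_term_debts(text))


if __name__ == "__main__":
    sample = "<long_term_debt>-1024672.000000</long_term_debt>\n"
    print(reset_long_term_debts(sample), end="")
```

Call `reset_file("client_state.xml")` only while BOINC is fully shut down; the `.bak` copy plays the same role as Sean's renamed client_stateA.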


Except for the last step as I rebooted the machine.


Same difference!! No problem.


Now the values are:
Einstein
<long_term_debt>0.156250</long_term_debt>
Seti
<long_term_debt>-0.156250</long_term_debt>

I'm guessing this is more what we are looking for?


Great!! This is exactly what you should be seeing.

The only problem left is that you now have two machine IDs on the EAH project. I'm guessing that the one in your state file at the moment will be recorded as <hostid>783254</hostid>. You will find it several lines above the LTD value you have been editing. The newer one that you also have acquired is 840813 but I don't believe that this second one will show up in your state file.

Please let me know which one is currently there and I'll tell you how to merge them.
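To answer a question like this without scrolling through the whole state file, a short Python sketch can list each project's name, host ID and LTD. It assumes client_state.xml parses as ordinary XML with one `<project>` element per attached project containing `<project_name>`, `<hostid>` and `<long_term_debt>` children, as the excerpts in this thread suggest; `project_summary` is a made-up name, and the sample below is invented to match the values Sean reported. It only reads text, so it is safe to run on a copy of the file.

```python
import xml.etree.ElementTree as ET


def project_summary(state_xml: str):
    """Return (project_name, hostid, long_term_debt) for each <project>."""
    root = ET.fromstring(state_xml)
    rows = []
    for proj in root.iter("project"):
        name = proj.findtext("project_name", default="?").strip()
        hostid = int(proj.findtext("hostid", default="0"))
        ltd = float(proj.findtext("long_term_debt", default="0"))
        rows.append((name, hostid, ltd))
    return rows


if __name__ == "__main__":
    # Made-up excerpt shaped like the values quoted in this thread.
    sample = (
        "<client_state><project>"
        "<project_name>Einstein@Home</project_name>"
        "<hostid>840813</hostid>"
        "<long_term_debt>0.156250</long_term_debt>"
        "</project></client_state>"
    )
    for name, hostid, ltd in project_summary(sample):
        print(f"{name}: hostid {hostid}, LTD {ltd:f}")
```

On a real machine you would pass in the text of a copy of client_state.xml instead of the sample string.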



____________
Cheers,
Gary.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62977 - Posted 20 Jan 2007 13:57:47 UTC

Hi Gary,

Yeah I've noticed I now have two entries on the "your computers" page. If you are able to fix this so my original entry is left and the results on the second are transferred that would be great.
But I must stress, I did not change anything other than the LTD for Einstein and Seti using notepad, it's not like I was using vi on my FreeBSD box and deleted half the file. ;-)

Appreciate your help.

Sean

I've looked up your computers both on Seti and on EAH. Seti shows you as having 1 computer and your results list there shows no client errors (yet). EAH now shows you as having two (identical) computers so when you made the change to the LTD value you have obviously corrupted something else in the state file so that EAH has given your machine a new ID. We can fix the double ID problem later on.

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 62979 - Posted 20 Jan 2007 14:04:49 UTC
Last modified: 20 Jan 2007 14:14:13 UTC

Ok here we go.

The host id is the new one.
<hostid>840813</hostid>

If you could step me through what to do I'll do it in the morning, it's getting a bit late. ;-)

Thank-you Gary.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 62984 - Posted 20 Jan 2007 14:57:18 UTC - in response to Message 62979.
Last modified: 20 Jan 2007 14:58:28 UTC

Ok here we go.

The host id is the new one.
<hostid>840813</hostid>


OK, I was hoping it wouldn't be that one. Doesn't matter all that much. After the merge all your results and credit history will be back under one hostid anyway.

If you could step me through what to do I'll do it in the morning, it's getting a bit late. ;-)


Yeah, I know :). I'm in Brisbane but I'm guessing you are south of the border and on daylight saving time so it is getting a little late for you. Bummer, just saw Lleyton lose his match in the Open.

Anyway, here are your instructions for the morning:-

  • Go to "Your Computers" page and click on your old computer ID - 783254
  • At the bottom of the page that comes up you will see the "Merge ..." link
  • Click the link and follow the instructions
  • You should be offered the option of merging 783254 with 840813. Accept this option.



You should be done. At the moment, before merging, 840813 has just 5 results in its list whilst 783254 has 221 which includes some new work and all your old results that haven't yet been deleted from the online database. Some of your old ones may disappear overnight but that is just normal behaviour. When you do the merge, whatever is left will be merged into the list for 840813 - at the moment 226 results in total.

The server will notice that you have new work under both 783254 and 840813 so when the merge occurs it will resend any lost or missing work and you should see a message to this effect when that happens. You should end up with all the new work from both former hosts all merged together in a single queue on 840813.

Let us know if anything seems amiss.


Thank-you Gary.


You're welcome.

____________
Cheers,
Gary.

Profile S@NL - Marleen
Joined: Jan 18 05
Posts: 24
ID: 3250
Credit: 254,206
RAC: 283
Message 62990 - Posted 20 Jan 2007 16:01:33 UTC - in response to Message 62976.

I deliberately put 0.000000 in the instructions because I knew that BOINC will write out 6 decimal digits every time it updates that value. You would think it would be happy with a plain 0 but I've actually never been game to test that even though I've done this exact same edit many times before. I've always used 0.000000 to be safe. I don't know that this was the problem but I'm not suggesting we try it again to find out :).

When I did that kind of edit a while ago I just put a plain 0 in there. So I can tell you BOINC doesn't have any problem with that.

____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 62997 - Posted 20 Jan 2007 17:42:53 UTC

Hmmm, interesting. I haven't done a debt reset in a while, but my experience has been the same as Gary's. If you just replace the value with '0' instead of '0.000000', BOINC would revert to using the old value (which I assume meant it reverted to the value in the backup client_state file).

Guess if I get a "boredom moment" I'll give it a try with the later versions I'm running now to see if this changed.

Alinator

Profile [B^S] BOINC-SG
Joined: Oct 2 06
Posts: 3
ID: 218227
Credit: 53,544
RAC: 290
Message 63007 - Posted 20 Jan 2007 19:45:02 UTC - in response to Message 62723.

- currently due to the storm warnings in Germany facilities are shutting down and people are sent home, so we'll see how things go today

Storm warnings? That little bit of wind?
It is a little bit worse than last week's storm, isn't it? ;-)


Well Ageless, you see, two days later with 12 people killed in Germany alone, nobody will talk about "a little bit of wind" again... :( *GRRRR!*

Next time just leave it, please!

Annika
Joined: Aug 8 06
Posts: 718
ID: 207213
Credit: 210,088
RAC: 0
Message 63009 - Posted 20 Jan 2007 20:04:24 UTC - in response to Message 62997.

Hmmm, interesting. I haven't done a debt reset in a while, but my experience has been the same as Gary's. If you just replace the value with '0' instead of '0.000000', BOINC would revert to using the old value (which I assume meant it reverted to the value in the backup client_state file).

Guess if I get a "boredom moment" I'll give it a try with the later versions I'm running now to see if this changed.

Alinator


Don't bother, my 5.4.11 app (which should be the "current stable" as I downloaded it only a week ago after my Windows reinstall) reacted just the same.
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 63013 - Posted 20 Jan 2007 21:06:52 UTC - in response to Message 62963.

As Jord pointed out, the EAH debt of -1024672 seconds represents the time that other projects (ie Seti) will run before EAH will be allowed to run. He was obviously using one of those famous Dutch calculators because his estimate was a little bit out :). It's actually just under 12 days and not 9 months - pretty close for a Dutch calculator don't you think :).

I blame lack of sleep. I hadn't slept (normally) for over 3 days. But I am all up to speed again. Missed the step of dividing it by 24. ;)
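The arithmetic behind the "12 days versus 9 months" mix-up, as a tiny Python check: the debt is in seconds, so it has to be divided by 3600 to get hours and then by 24 to get days. Stopping after the first step leaves about 284.6, and reading that as days gives roughly nine months.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def debt_in_days(debt_seconds: float) -> float:
    """Convert a long-term debt (seconds) into days of running time."""
    return abs(debt_seconds) / SECONDS_PER_HOUR / HOURS_PER_DAY


eah_debt = -1024672  # EAH's debt from this thread, in seconds
hours = abs(eah_debt) / SECONDS_PER_HOUR  # the number you get if you stop here
print(f"{hours:.1f} hours, i.e. {debt_in_days(eah_debt):.2f} days")
```

This prints `284.6 hours, i.e. 11.86 days`, matching Gary's "just under 12 days".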
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 63020 - Posted 20 Jan 2007 23:56:24 UTC

Yeah, I know :). I'm in Brisbane but I'm guessing you are south of the border and on daylight saving time so it is getting a little late for you.


Yep I live in Melbourne, oh and it RAINED last night!!! :-D

Ok I followed the merging steps and my computer id is now 840813 and has incorporated 783254's results, well all but 10 points. So it looks like it's all working.
I've noticed the LTDs are now <long_term_debt>1802.000000</long_term_debt>, so I'm guessing that every now and then I should check them and reset to 0.000000.

Thanks for your help Gary, all seems to be working now. :-)

Sean

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 63022 - Posted 21 Jan 2007 2:38:45 UTC - in response to Message 63020.
Last modified: 21 Jan 2007 2:52:55 UTC


Ok I followed the merging steps and my computer id is now 840813 and has incorporated 783254's results, well all but 10 points. So it looks like it's all working.


That sounds pretty good. It should be OK now.


I've noticed the LTDs are now <long_term_debt>1802.000000</long_term_debt>, so I'm guessing that every now and then I should check them and reset to 0.000000.


NO, not at all!!!

Please realise that we only fiddle with the state file in exceptional circumstances. A value of 1800 seconds (ie half an hour) is not at all unusual when the normal switch interval is an hour - 3600 seconds. If you would like to see a simple example of how the LTD values will change as the projects share the CPU, have a look at this post in another thread where the OP was having a problem with a crazy set of values for LTDs. You may need to review some earlier messages to get the full context but the example in the second half of my post shows exactly how LTD values are supposed to vary. In normal circumstances LTD values will cycle between +3600 and -3600 and will often be "crossing the equator" so to speak. They will move outside this general range if a project has to run more than it should (to meet a deadline) or cannot run at all (lack of work, server problems, etc). LTD is BOINC's mechanism for ensuring that your resource shares are respected.

EDIT: Resetting the debts by editing like this should be viewed as a bit of emergency surgery to correct a problem - not a routine exercise to be undertaken whenever you feel like it.
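To see why a value like Sean's 1802 is nothing to worry about, here is a deliberately simplified Python model of how LTDs might evolve. It is an assumption for illustration, not BOINC's actual code: one CPU, two projects with equal resource shares, a one-hour switch, and a rule in which each project's debt grows by its share of the elapsed time, shrinks by the time it actually ran, and is then shifted so all debts average zero. `simulate_ltd` is a made-up name.

```python
def simulate_ltd(shares, schedule, period=3600.0):
    """Toy long-term-debt model (an assumption, not BOINC's source code).

    shares:   dict of project name -> resource share
    schedule: list of project names, the one that runs in each period
    Returns a list of debt snapshots (dicts), one after each period.
    """
    total = sum(shares.values())
    debt = {p: 0.0 for p in shares}
    history = []
    for running in schedule:
        for p, share in shares.items():
            ran = period if p == running else 0.0
            # Owed time grows with the share; actual run time pays it back.
            debt[p] += period * share / total - ran
        # Shift all debts so they average to zero.
        mean = sum(debt.values()) / len(debt)
        for p in debt:
            debt[p] -= mean
        history.append(dict(debt))
    return history


if __name__ == "__main__":
    for snapshot in simulate_ltd({"EAH": 100, "SETI": 100}, ["EAH", "SETI"] * 3):
        print(snapshot)
```

In this toy run the two debts alternate between -1800/+1800 and 0, swinging back and forth across zero instead of growing. In the same toy model, a project that never gets to run would see its debt rise by its share every period, which is one way to picture how values end up outside the usual range.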

Thanks for your help Gary, all seems to be working now. :-)

Sean


You're welcome!


____________
Cheers,
Gary.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 63023 - Posted 21 Jan 2007 3:02:40 UTC - in response to Message 63013.


I blame lack of sleep... ;)


That's OK ;). I was just "rattling your cage" :).


____________
Cheers,
Gary.

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 63024 - Posted 21 Jan 2007 3:53:54 UTC - in response to Message 63023.

LOL...

You noticed that my "puzzle" of Mouse was correct in the end? ;)
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Sean
Joined: Oct 27 06
Posts: 15
ID: 226983
Credit: 572,931
RAC: 516
Message 63030 - Posted 21 Jan 2007 4:56:50 UTC - in response to Message 63022.

Ok then, no problems I'll leave things as they are and have a read of the link you sent me to get more info on this.

Happy crunching. :-)


NO, not at all!!!

Please realise that we only fiddle with the state file in exceptional circumstances. A value of 1800 seconds (ie half an hour) is not at all unusual when the normal switch interval is an hour - 3600 seconds. If you would like to see a simple example of how the LTD values will change as the projects share the CPU, have a look at this post in another thread where the OP was having a problem with a crazy set of values for LTDs. You may need to review some earlier messages to get the full context but the example in the second half of my post shows exactly how LTD values are supposed to vary. In normal circumstances LTD values will cycle between +3600 and -3600 and will often be "crossing the equator" so to speak. They will move outside this general range if a project has to run more than it should (to meet a deadline) or cannot run at all (lack of work, server problems, etc). LTD is BOINC's mechanism for ensuring that your resource shares are respected.

EDIT: Resetting the debts by editing like this should be viewed as a bit of emergency surgery to correct a problem - not a routine exercise to be undertaken whenever you feel like it.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 63070 - Posted 21 Jan 2007 22:23:30 UTC - in response to Message 63024.

LOL...

You noticed that my "puzzle" of Mouse was correct in the end? ;)


Absolutely - never doubted it for a minute. I was just too lazy on two counts:-

(i) I didn't have the patience to open up a copy of my state file in one window and go through it line by line, matching variable names and his jumbled-up values.
(ii) When you had done that work for me, I failed to appreciate, initially, just how big the LTD really was. Sometimes you can look at a number and it just doesn't register ....

There was a real benefit however. The OP learned the technique for scanning and presenting excerpts from his state file which he was obviously pleased about. When he did that, the abnormal LTD suddenly hit me in the face :).

It's the old adage really - you give a person a fish and he can feed himself for a day or you give him a fishing line .... :).


____________
Cheers,
Gary.

KWSN THE Holy Hand Grenade!
Joined: Sep 10 06
Posts: 14
ID: 213643
Credit: 80,471
RAC: 110
Message 63124 - Posted 22 Jan 2007 16:06:29 UTC - in response to Message 62786.

[snip]

[aside]
Always a worry, depending upon the font and personal acuity:
1 and I, O and 0, S and 5, E and 3, g and 9, A and 4, B and 8, 2 and Z, b and 6.
[/aside]


to which I add - B, 8 and 0 (when written with a slash thru it) and O

Cheers, Mike.

KWSN THE Holy Hand Grenade!
Joined: Sep 10 06
Posts: 14
ID: 213643
Credit: 80,471
RAC: 110
Message 63125 - Posted 22 Jan 2007 16:33:21 UTC
Last modified: 22 Jan 2007 16:36:03 UTC

Now that everyone is getting WUs again, how about fixing the "server status" page?

Seriously, that's the first page (aside from the main page) that I look at when I visit a project...

Profile Phillip J
Joined: Mar 29 05
Posts: 6
ID: 66943
Credit: 4,000
RAC: 0
Message 63223 - Posted 24 Jan 2007 0:13:34 UTC

Many Thanks,
I appreciate the update and, even more, that you are keeping us Mac users in the run.
____________
Whtnukle

Daxa
Joined: Jan 8 07
Posts: 4
ID: 239401
Credit: 153,045
RAC: 0
Message 63268 - Posted 24 Jan 2007 21:36:20 UTC

Does anyone know when the current "maintenance" will be done?

I have a WU that needs to be sent before I go out of town for 2 weeks!

Arcturus
Joined: Sep 20 06
Posts: 2
ID: 215925
Credit: 296,730
RAC: 11
Message 63289 - Posted 25 Jan 2007 7:05:08 UTC
Last modified: 25 Jan 2007 7:17:29 UTC

Gary,

I have one possible answer to the huge long-term debts that were discussed in this thread over the past couple of days. I have the same problem on one of my machines, and I had a problem with the system date about a week or so ago. I fat-fingered it and didn't notice that I had entered 2017 for the year instead of 2007 when I 'corrected' the date. I noticed the problem when I checked the graphs in BOINC Manager and reset the year to the proper value. My graphs are still messed up and, when I checked the long-term debts of my projects on that machine, they total +/- 400 million-ish. (Do the math and that works out to about 12 years in seconds, so maybe I entered 2019; I can't remember now.) I don't have access to that machine right now (I checked it remotely), but I'll certainly be clearing that up the next time I get on it.

Moral of the story: before you mess around with your system date, shut down BOINC first and leave it down until you are sure that the date is correct.
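Arcturus's back-of-the-envelope check can be reproduced with Python's standard `datetime` module: how many seconds a mistyped year adds to the clock, compared with the roughly 400 million seconds of debt he saw. The 25 January date is arbitrary (it is just the date of his post), and the comparison assumes, as he does, that the debt roughly tracks the size of the clock jump; `clock_jump_seconds` is a made-up name.

```python
from datetime import datetime

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length, in seconds


def clock_jump_seconds(wrong_year: int, right_year: int = 2007) -> float:
    """Seconds between the same calendar date in two different years."""
    start = datetime(right_year, 1, 25)
    end = datetime(wrong_year, 1, 25)
    return (end - start).total_seconds()


print(f"400 million seconds is about {400e6 / SECONDS_PER_YEAR:.1f} years")
for year in (2017, 2019):
    print(f"typing {year} instead of 2007 jumps {clock_jump_seconds(year):,.0f} s")
```

A ten-year slip comes to about 316 million seconds and a twelve-year slip to about 379 million, so debts of around 400 million seconds are consistent with a jump of a dozen years or so, as Arcturus suspected.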

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,127
RAC: 174,711
Message 63292 - Posted 25 Jan 2007 8:02:31 UTC

Thanks for that.

Yes, a big date adjustment like that while BOINC is running is likely to trash the debt system. Shutting down BOINC first is a very good idea.


____________
Cheers,
Gary.

Arcturus
Joined: Sep 20 06
Posts: 2
ID: 215925
Credit: 296,730
RAC: 11
Message 63302 - Posted 25 Jan 2007 16:05:07 UTC

Just an FYI. I jumped on the BOINC forum and posted there as well. Personally, I'd consider this a rather large error of omission, so I recommended that either a debt reset option be added or that a configurable debt 'cap' be added to limit the damage when this occurs. I consider it undesirable to force users to edit configuration files manually, especially when you are dealing with a user base as large as this one. This is definitely one case where waiting for BOINC to sort it out, while it is an option, is probably not the optimal solution. ;-)
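The debt 'cap' suggested here is easy to picture in code. This is a hypothetical sketch of the idea only; `apply_debt_cap` and the one-day limit are invented for illustration and do not correspond to any BOINC setting mentioned in this thread. Each project's LTD is simply clamped to a user-chosen limit, so a runaway value like EAH's -1024672 seconds could never build up more than a day's worth of catch-up.

```python
def apply_debt_cap(debts: dict, cap_seconds: float) -> dict:
    """Clamp every long-term debt into [-cap_seconds, +cap_seconds].

    Hypothetical illustration of a user-configurable debt cap.
    """
    if cap_seconds <= 0:
        raise ValueError("cap_seconds must be positive")
    return {
        project: max(-cap_seconds, min(cap_seconds, debt))
        for project, debt in debts.items()
    }


if __name__ == "__main__":
    one_day = 24 * 3600
    print(apply_debt_cap({"EAH": -1024672.0, "SETI": 1024672.0}, one_day))
```

Clamping both sides keeps the two debts symmetric in this example; a real implementation would also have to decide how capping interacts with keeping the debts averaged at zero, which this sketch ignores.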

Sporally
Joined: Dec 29 05
Posts: 31
ID: 155785
Credit: 29,335
RAC: 0
Message 64248 - Posted 14 Feb 2007 21:41:13 UTC

So it seems the problem persists. The last post is quite old, so I would like to hear if there is any news on this problem. I guess we all have the same problem with E@H: I have problems sending and receiving WUs. The 'Server status' page is still down, though I recall it was up and running around a week ago. Didn't it run properly, or did it crash?
____________
"The world is a fine place and worth fighting for." (Ernest Hemingway)
"Non progredi est regredi" (Not to go forward is to go backward.)

Profile Udo
Joined: May 19 05
Posts: 204
ID: 82463
Credit: 3,415,929
RAC: 1,235
Message 64286 - Posted 15 Feb 2007 11:52:42 UTC - in response to Message 64248.


...The 'Server status' page is still down, though I recall it was up and running around a week ago. Didn't it run properly, or did it crash?


It is switched off to reduce server load!
____________
Udo

Sporally
Joined: Dec 29 05
Posts: 31
ID: 155785
Credit: 29,335
RAC: 0
Message 64433 - Posted 18 Feb 2007 17:38:09 UTC

So will it never get back online? Too bad, I really liked that page a lot. I got a feeling that we were getting somewhere. So is it only temporary or permanent?
____________
"The world is a fine place and worth fighting for." (Ernest Hemingway)
"Non progredi est regredi" (Not to go forward is to go backward.)

astro-marwil
Joined: May 28 05
Posts: 56
ID: 84167
Credit: 401,429
RAC: 309
Message 64439 - Posted 18 Feb 2007 18:12:46 UTC - in response to Message 64433.

So will it never get back online?
So is it only temporary or permanent?


No, it's not permanently down, but not every request for fresh data is answered by the server immediately, as there have been problems for more than a week. Try over and over and you will get data; BOINC will do this for you automatically. Look at the Messages tab in your BOINC Manager. Up to now, I have always got new data in time to keep crunching continuously. (One file takes about 3.25 h to crunch on my host.)
There is also a problem with the low speed of the validators. Averaged over the last 8 days, only 50% of the files I reported have been validated, so your list of pending files will keep growing. Up to now, I have no indication that any reported file has been lost.
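The "try over and over" behaviour is commonly implemented as a randomized exponential backoff. The Python sketch below illustrates that general technique only; the one-minute base, the four-hour ceiling and the name `backoff_delays` are assumptions for illustration, not BOINC's documented values. Each failed attempt doubles the waiting window up to the ceiling, and a random delay is picked from the upper half of that window so that thousands of hosts do not all hit a busy server at the same moment.

```python
import random


def backoff_delays(attempts: int, base: float = 60.0,
                   ceiling: float = 4 * 3600.0, seed=None) -> list:
    """Retry delays (seconds) for successive failed server requests."""
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        window = min(ceiling, base * 2 ** n)  # doubles each failure, capped
        delays.append(rng.uniform(window / 2, window))  # jitter spreads load
    return delays


if __name__ == "__main__":
    for n, d in enumerate(backoff_delays(8, seed=42), start=1):
        print(f"attempt {n}: wait {d / 60:.1f} min")
```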
This is my personal experience.
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 64493 - Posted 19 Feb 2007 14:37:13 UTC - in response to Message 64433.
Last modified: 19 Feb 2007 14:37:33 UTC

So will it never get back online? Too bad, I really liked that page a lot. I got a feeling that we were getting somewhere. So is it only temporary or permanent?

We'll definitely put it back online once the database problems have been solved.

BM
____________
BM


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration