New (Albert) application and workunits



Message boards : Cruncher's Corner : New (Albert) application and workunits

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 24094 - Posted 24 Dec 2005 5:16:38 UTC
Last modified: 28 Dec 2005 4:33:32 UTC

I wanted to tell the dedicated crunchers a bit about the new application (called 'Albert') and workunits that I started testing on the public project today.

We've been doing private testing within the small group of Einstein@Home developers for several months, and are no longer finding problems and errors. So I have started to distribute a few thousand of these workunits to 'the general public'. If they work well we will start issuing primarily these workunits in the coming days.

A few key differences between the 'albert' (new) and 'einstein' (old) workunits:

- The new WU have different execution times, typically ranging from about 25% to 100% of the previous execution times

- The new WU application incorporates all BOINC graphics and other bug fixes to date

- The new WU application has a slightly re-arranged screensaver, which includes our top wish-list item: a real time clock

I'll update (edit) this post if questions arise about how these new WU are structured. In many cases I'll then delete the post which asked the question, to keep the thread as compact as possible.

I have not forgotten that when we launched Einstein@Home in February 2005, we found a number of bugs because of the vigilance and sharp eyes of Einstein@Home users. So please call attention to strange behavior, either in this thread or in the Problems and Bug Reports message board.

Bruce Allen

[Edit Dec 24, questions from Paul Buck]

1) We wouldn't have been lucky enough to get FLOPS counting this time around would we?

No, but I'll take a quick look at the API, and implement this if it's easy.

2) Are the improved run times from optimized compiles for Windows?

The compilation process is no more and no less optimized than before. The differences in run times arise because we now use a sky search grid and frequency band that depend upon frequency. This makes it impossible for all workunits to be the same length.

3) Is the Mac version still using Altivec?

Yes, the Mac version still uses Altivec optimization if the CPU supports the Altivec instruction set.

I got one running right now, by the way (thanks), and it is hard to tell over RealVNC, but the graphics look a little "prettier". Estimated run time is ~3 hours, so that looks like about 25% of the prior time (though I am only 13% through).

If you have a real-time clock in the upper right-hand corner of the screensaver/graphics screen, and the wording in the corners has a slightly cleaner layout, then yes, you are running 'Albert'.

[EDIT 25 December, questions from various people]

Will we be switching back and forth between Einstein and Albert apps?

Yes, for some time now, until we are sure that the Albert app is working as required.

Does the Albert application have its own number (like 4.80) or is it still 4.79?

The Albert app has its own number and name. You will know you are running this by seeing what the name of the application is in the BOINC manager, or in the title bar of the graphics window. See the list of applications for more info.

Is there any way we can download the new Albert application?

No. What work (and hence, what application) your computer gets is determined by chance. The 'scheduler' decides this when work is sent out.


[EDIT December 27]
Is it intentional that the target number of results is three rather than the old value of four?

Yes, this is intentional. It may slow down result validation in some cases but will increase our computing power by ~ 25%.


____________

Profile Stef
Joined: Mar 8 05
Posts: 50
ID: 50436
Credit: 330,695
RAC: 0
Message 24188 - Posted 26 Dec 2005 16:37:09 UTC

Linux optimisation seems to have got even worse:
http://einstein.phys.uwm.edu/workunit.php?wuid=3061680
The same CPUs and the same WU: 28848s for Linux and 18498s for Windows.

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24190 - Posted 26 Dec 2005 16:50:38 UTC - in response to Message 24188.
Last modified: 26 Dec 2005 16:53:26 UTC

Linux optimisation seems to have got even worse:
Workunit in question
The same CPUs and the same WU: 28848s for Linux and 18498s for Windows.

Check again. The 18498s result also indicates a Linux OS.

____________

Michael Karlinsky
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,207,774
RAC: 1,922
Message 24192 - Posted 26 Dec 2005 17:37:24 UTC - in response to Message 24190.


Workunit in question


Just noticed that "initial replication" is set to 3, instead of 4
for the old application.

Was that intentional?

Michael
____________
Team Linux Users Everywhere

Profile Stef
Joined: Mar 8 05
Posts: 50
ID: 50436
Credit: 330,695
RAC: 0
Message 24193 - Posted 26 Dec 2005 17:37:32 UTC - in response to Message 24190.


Check again. The 18498s result also indicates a Linux OS.

Oops, you're right. Why the difference then?

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24197 - Posted 26 Dec 2005 18:31:02 UTC - in response to Message 24193.


Check again. The 18498s result also indicates a Linux OS.

Oops, you're right. Why the difference then?

Number of CPUs = 1, = 2 ...

Most likely one is HT and the other is not.

HT gives you 2 logical processors but does not give 2x speed. I see 20-40% better THROUGHPUT at the cost of individual processing time: each result takes longer ...
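A rough way to check the hyperthreading guess against the two run times Stef posted is to compare throughput rather than per-result time. This is a minimal sketch that assumes (unconfirmed in the thread) that the 28848 s host ran two results at once on one hyperthreaded core and the 18498 s host ran one result at a time at the same clock speed; `ht_throughput_gain` is a hypothetical helper.

```python
def ht_throughput_gain(time_ht_s, time_single_s, concurrent_tasks=2):
    """Ratio of results per second with HT (several tasks sharing one core)
    to results per second without HT (one task at a time)."""
    rate_ht = concurrent_tasks / time_ht_s   # N results finish every time_ht_s seconds
    rate_single = 1.0 / time_single_s        # 1 result finishes every time_single_s seconds
    return rate_ht / rate_single


slowdown = 28848 / 18498
gain = ht_throughput_gain(28848, 18498)
print(f"per-result slowdown: {slowdown:.2f}x, throughput gain: {gain - 1:.0%}")
```

Under those assumptions each result takes about 1.56 times longer, yet the host completes roughly 28% more results per unit time, which sits inside the 20-40% range quoted above.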
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24199 - Posted 26 Dec 2005 18:45:32 UTC - in response to Message 24192.
Last modified: 26 Dec 2005 18:51:11 UTC

Just noticed that "initial replication" is set to 3, instead of 4
for the old application.

Was that intentional?

Michael


I've processed one Albert unit so far - and its "initial replication" was also 3 - so, my guess is it was intentional.

But, getting back to this unit, I noticed the "failed" result's computer is still using BOINC 4.19. Is BOINC 4.19 "too old" for Albert or was this just a coincidence?

____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24200 - Posted 26 Dec 2005 18:56:43 UTC

I can't find the minimum version requirement any longer. But if the BOINC software was out of date, the work should not have been issued. Still, this may need project attention. Did they test Albert with 4.19?

Of course, with the better versions out there I have no idea why anyone would still use 4.19 ... :)
____________

AnRM
Joined: Feb 9 05
Posts: 213
ID: 9811
Credit: 3,053,004
RAC: 0
Message 24213 - Posted 27 Dec 2005 0:11:40 UTC

Daily quota problems with 'Albert'.....please see 'Problems and Bug Reports' for details.....Cheers, Rog.
____________

Desti
Joined: Aug 20 05
Posts: 109
ID: 102538
Credit: 2,442,512
RAC: 2,828
Message 24215 - Posted 27 Dec 2005 0:29:05 UTC - in response to Message 24192.


Workunit in question


Just noticed that "initial replication" is set to 3, instead of 4
for the old application.

Was that intentional?

Michael


3 is a good idea. 4 is a big waste of resources, because a lot of WUs are done with 3 valid results and the fourth is completed for nothing.
____________

Boris@newsVIP
Joined: Apr 30 05
Posts: 2
ID: 77747
Credit: 3,182
RAC: 0
Message 24218 - Posted 27 Dec 2005 1:23:58 UTC

To Paul:
Because 4.19 has a progress bar.
Knowing the progress is important with heavy WUs like Einstein@home's.

I want to run "Albert" soon.

Thanks.
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24223 - Posted 27 Dec 2005 8:26:07 UTC

Hmmm, with BOINC View I have lots of progress bars ...

Only runs on Windows though ...
____________

Profile Honza
Joined: Nov 10 04
Posts: 136
ID: 614
Credit: 3,332,354
RAC: 14
Message 24229 - Posted 27 Dec 2005 11:40:48 UTC
Last modified: 27 Dec 2005 11:41:54 UTC

A screenshot from a new 'Albert' may interest Einstein's participants, I guess. Anyone...?
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24233 - Posted 27 Dec 2005 14:11:53 UTC - in response to Message 24229.

A screenshot from a new 'Albert' may interest Einstein's participants, I guess. Anyone...?

Without a sample link, how do I know if what you had was interesting?
____________

Profile Honza
Joined: Nov 10 04
Posts: 136
ID: 614
Credit: 3,332,354
RAC: 14
Message 24234 - Posted 27 Dec 2005 14:16:26 UTC - in response to Message 24233.

Without a sample link, how do I know if what you had was interesting?
Sorry Paul, I wasn't clear enough - I was actually asking if someone can provide such a screenshot...

____________

xi3piscium
Joined: Dec 13 05
Posts: 39
ID: 147325
Credit: 12,787
RAC: 0
Message 24236 - Posted 27 Dec 2005 14:29:31 UTC - in response to Message 24234.

Without a sample link, how do I know if what you had was interesting?
Sorry Paul, I wasn't clear enough - I was actually asking if someone can provide such a screenshot...


I made a screen shot of the new Mac graphics, can't figure out how
to upload or post it as a msg. Feeling stupid in SW China :)
____________

Profile Honza
Joined: Nov 10 04
Posts: 136
ID: 614
Credit: 3,332,354
RAC: 14
Message 24238 - Posted 27 Dec 2005 14:42:44 UTC - in response to Message 24236.
Last modified: 27 Dec 2005 15:04:13 UTC

I made a screen shot of the new Mac graphics, can't figure out how to upload or post it as a msg. Feeling stupid in SW China :)

Once you have the image on a server (there are free web hosting providers for personal/non-commercial use), you need to use BOINC forum tags.

For example, use the [img] tag (click the Reply button on this post to see the code).


____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24250 - Posted 27 Dec 2005 17:47:54 UTC - in response to Message 24236.

Without a sample link, how do I know if what you had was interesting?
Sorry Paul, I wasn't clear enough - I was actually asking if someone can provide such a screenshot...


I made a screen shot of the new Mac graphics, can't figure out how
to upload or post it as a msg. Feeling stupid in SW China :)

E-mail it to me and I will post it and link it...

p.d.buck@comcast.net
____________

Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24267 - Posted 27 Dec 2005 19:19:46 UTC

Today I got 4 Albert WUs, and all 4 resulted in "Client error".

Description for 1st WU error is...

<core_client_version>5.2.13</core_client_version>
<message>app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>albert_4.37_windows_intelx86.exe</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>

And for other 3 it is...

<core_client_version>5.2.13</core_client_version>
<message>WU download error: couldn't get input files:
<file_xfer_error>
<file_name>skygrid_1290_r_T09.dat</file_name>
<error_code>-119</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>

Any ideas what it is all about and how I could fix it?

Thanks
Edo
____________

Profile Honza
Joined: Nov 10 04
Posts: 136
ID: 614
Credit: 3,332,354
RAC: 14
Message 24272 - Posted 27 Dec 2005 20:33:24 UTC - in response to Message 24267.

Any ideas what it is all about and how I could fix it?
I would try to reset the project. It will delete the existing files, create a new HostID and trigger a fresh download of Einstein's project files.
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24290 - Posted 28 Dec 2005 6:47:00 UTC
Last modified: 28 Dec 2005 6:50:46 UTC

There are a number of reasons you can get MD5/signature failures. These include bad connections to the project (which only makes sense if you have similar issues with the "normal" work units), or the project may have generated "bad" work unit files.

I am guessing that it is only a partial problem, especially since I have 13 results on hand. Have you gotten any yet?

==== edit

I am seeing a number of "Unsent" results in the work issued to me (in other words, I got it and someone else got it, but the third person did not), so there may be a server problem also ...
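The "MD5 check failed" error (-119) in Edo's log means the bytes that arrived did not hash to the checksum the server advertised for that file. A minimal sketch of such a check in Python using only `hashlib`; it is an illustration, not BOINC's client code, the function names are hypothetical, and the separate code-signature check applied to application executables (error -120 above) is not modelled.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large data files need not fit in memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def download_ok(path, expected_md5):
    """True if the file on disk matches the checksum the server advertised."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A file that is truncated or altered anywhere between server and disk produces a different digest and is rejected, which is why a flaky connection or a bad server-side file show up as the same error.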


____________

Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24295 - Posted 28 Dec 2005 8:15:52 UTC

Thank you Honza, thank you Paul. I'll see what happens today with this machine (if it receives Albert WUs again). I will report here if anything new happens.

Edo
____________

Profile Edo
Avatar
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24323 - Posted 28 Dec 2005 16:24:51 UTC

Here is what happened today with this machine:
http://einstein.phys.uwm.edu/forum_thread.php?id=3446

Now it sends client errors with Einstein WUs as well. I reset the project and reattached it, and the same thing happened again. It seems I can't download the Albert app successfully.

Any ideas how to fix this?
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24328 - Posted 28 Dec 2005 17:31:22 UTC

I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...
____________

Profile Edo
Avatar
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24330 - Posted 28 Dec 2005 17:39:29 UTC

Hmmm... we will see. The strange part is that all of a sudden I can't get this machine to crunch E@H at all (Einstein or Albert), and it all started with the first Albert WU. Could that first Albert WU somehow have made a mess of the E@H-related files?

Tomorrow I'll try deleting the complete E@H folder on that machine and starting the installation all over again. We'll see what happens. I wouldn't like to move this machine to other projects, but if nothing else works...
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24332 - Posted 28 Dec 2005 17:47:27 UTC - in response to Message 24330.

Hmmm... we will see. The strange part is that all of a sudden I can't get this machine to crunch E@H at all (Einstein or Albert), and it all started with the first Albert WU. Could that first Albert WU somehow have made a mess of the E@H-related files?

Tomorrow I'll try deleting the complete E@H folder on that machine and starting the installation all over again. We'll see what happens. I wouldn't like to move this machine to other projects, but if nothing else works...


Edo,

If you have "failed" a bunch of WU's, it could be that your Einstein quota has been reduced, and therefore, you just can't get any work right now. You just might have to wait until tomorrow for a new WU.

Stick

____________

Profile Edo
Avatar
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24337 - Posted 28 Dec 2005 18:42:31 UTC - in response to Message 24332.


Edo,

If you have "failed" a bunch of WU's, it could be that your Einstein quota has been reduced, and therefore, you just can't get any work right now. You just might have to wait until tomorrow for a new WU.

Stick


Stick,

Yes, I know I reached the daily quota limit, but before that I wasn't able to download the Albert app after I reattached the project. Now I have two WUs registered as if they were being crunched, but they were not...
http://einstein.phys.uwm.edu/workunit.php?wuid=3116299
http://einstein.phys.uwm.edu/workunit.php?wuid=3094170

Both of those WUs just weren't able to start processing.

I uninstalled BOINC on that machine and will try again from the beginning tomorrow. We will see if it works.


____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24338 - Posted 28 Dec 2005 19:19:31 UTC - in response to Message 24337.
Last modified: 28 Dec 2005 19:30:27 UTC

Stick,

Yes, I know I reached the daily quota limit, but before that I wasn't able to download the Albert app after I reattached the project. Now I have two WUs registered as if they were being crunched, but they were not...
http://einstein.phys.uwm.edu/workunit.php?wuid=3116299
http://einstein.phys.uwm.edu/workunit.php?wuid=3094170

Both of those WUs just weren't able to start processing.

I uninstalled BOINC on that machine and will try again from the beginning tomorrow. We will see if it works.


Edo,

Sorry! (I hadn't read all your earlier posts before now.) Are the WU's cited above yours? And, are you still using BOINC 4.19? If so, did you see my earlier post here? No one has answered my question yet - but, it could be that BOINC 4.19 is the problem.

If it's you who needs to upgrade, remember that BOINC 5.2.13 will not install over v4.19. You need to remove v4.19 using Windows "Add/Remove" Control Panel, then install v5.2.13. The Windows "Remove" should leave your BOINC folder with your account info intact.

Stick

Edit: Or, if some of your other projects don't support BOINC 5.2.X yet, you could upgrade to BOINC 4.45 (directly from v4.19).
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24344 - Posted 28 Dec 2005 20:02:55 UTC
Last modified: 28 Dec 2005 20:30:42 UTC

Edo,

I am confused now. I just read your posts on the Problems and Bug Reports : New Application "Albert" thread and I got the impression you are running BOINC 5.2.13. Do you have several machines with different BOINC versions? If not, your problem may be that BOINC 5.2.13 did not install properly. If you think this may be the problem, you should read this message and thread.

I remember reading some posts about a month ago where someone thought they had updated to BOINC 5.2.X but, even though the new BOINC Manager was installed, the new BOINC client was not. Turns out the user did not properly exit BOINC before doing the "install". The old BOINC client was left running and therefore could not be overwritten. Unfortunately, the BOINC Installer was not "smart enough" to catch the problem.

Stick
____________

Profile Edo
Avatar
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24347 - Posted 28 Dec 2005 21:26:59 UTC

Stick,

Thanks a lot for your help. The machine with problems has BOINC 5.2.13. I see that WU # 3094170 has one other host with a Client error and with BOINC 4.19. That host is not mine; mine is 498479. It appears as if my machine is working normally on that WU, but it isn't.

I run only E@H. Today I reinstalled BOINC twice, but every time I had a problem downloading the Albert app. Tomorrow I'll try deleting the complete BOINC folder and starting the installation from the beginning. I will report here how it went.

Thanks.

Edo
____________

rwalraven
Joined: Dec 24 05
Posts: 1
ID: 153979
Credit: 34,426
RAC: 0
Message 24358 - Posted 29 Dec 2005 0:28:21 UTC

12/29/2005 1:24:13 AM|Einstein@Home|Requesting 43200 seconds of new work
12/29/2005 1:24:18 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12/29/2005 1:24:20 AM|Einstein@Home|Started download of albert_4.37_windows_intelx86.exe
12/29/2005 1:24:20 AM|Einstein@Home|Started download of albert_4.37_windows_intelx86.pdb
12/29/2005 1:25:08 AM|Einstein@Home|Temporarily failed download of albert_4.37_windows_intelx86.exe: error 500
12/29/2005 1:25:08 AM|Einstein@Home|Temporarily failed download of albert_4.37_windows_intelx86.pdb: error 500
12/29/2005 1:25:08 AM|Einstein@Home|Started download of earth_05_09
12/29/2005 1:25:08 AM|Einstein@Home|Started download of sun_05_09
12/29/2005 1:25:56 AM|Einstein@Home|Temporarily failed download of earth_05_09: error 500
12/29/2005 1:25:56 AM|Einstein@Home|Temporarily failed download of sun_05_09: error 500
12/29/2005 1:25:56 AM|Einstein@Home|Started download of config_S4R2a.cfg
12/29/2005 1:25:56 AM|Einstein@Home|Started download of r1_1204.0
12/29/2005 1:26:42 AM|Einstein@Home|Temporarily failed download of config_S4R2a.cfg: error 500
12/29/2005 1:26:42 AM|Einstein@Home|Temporarily failed download of r1_1204.0: error 500
12/29/2005 1:26:42 AM|Einstein@Home|Started download of skygrid_1210_r_T09.dat
12/29/2005 1:26:43 AM|Einstein@Home|Started download of albert_4.37_windows_intelx86.exe
12/29/2005 1:26:44 AM|Einstein@Home|Finished download of albert_4.37_windows_intelx86.exe
12/29/2005 1:26:44 AM|Einstein@Home|Throughput 18635 bytes/sec
12/29/2005 1:26:44 AM|Einstein@Home|Started download of albert_4.37_windows_intelx86.pdb
12/29/2005 1:26:44 AM|Einstein@Home|Checksum or signature error for albert_4.37_windows_intelx86.exe
12/29/2005 1:26:45 AM|Einstein@Home|Unrecoverable error for result r1_1204.0__1750_S4R2a_2 (app_version download error: couldn't get input files:<file_xfer_error> <file_name>albert_4.37_windows_intelx86.exe</file_name> <error_code>-200</error_code> <error_message></error_message></file_xfer_error>)
12/29/2005 1:26:45 AM|Einstein@Home|Unrecoverable error for result r1_1204.0__1749_S4R2a_0 (app_version download error: couldn't get input files:<file_xfer_error> <file_name>albert_4.37_windows_intelx86.exe</file_name> <error_code>-200</error_code> <error_message></error_message></file_xfer_error>)
12/29/2005 1:26:47 AM|Einstein@Home|Finished download of albert_4.37_windows_intelx86.pdb
12/29/2005 1:26:47 AM|Einstein@Home|Throughput 2611 bytes/sec
12/29/2005 1:26:47 AM|Einstein@Home|Started download of earth_05_09
12/29/2005 1:26:47 AM|Einstein@Home|Checksum or signature error for albert_4.37_windows_intelx86.pdb
12/29/2005 1:26:49 AM|Einstein@Home|Finished download of earth_05_09
12/29/2005 1:26:49 AM|Einstein@Home|Throughput 33448 bytes/sec
12/29/2005 1:26:49 AM|Einstein@Home|Started download of sun_05_09
12/29/2005 1:26:49 AM|Einstein@Home|Checksum or signature error for earth_05_09
12/29/2005 1:26:52 AM|Einstein@Home|Finished download of sun_05_09
12/29/2005 1:26:52 AM|Einstein@Home|Throughput 2609 bytes/sec
12/29/2005 1:26:52 AM|Einstein@Home|Started download of config_S4R2a.cfg
12/29/2005 1:26:52 AM|Einstein@Home|Checksum or signature error for sun_05_09
12/29/2005 1:26:54 AM|Einstein@Home|Finished download of config_S4R2a.cfg
12/29/2005 1:26:54 AM|Einstein@Home|Throughput 55510 bytes/sec
12/29/2005 1:26:54 AM|Einstein@Home|Started download of r1_1204.0
12/29/2005 1:26:54 AM|Einstein@Home|Checksum or signature error for config_S4R2a.cfg
12/29/2005 1:26:57 AM|Einstein@Home|Finished download of r1_1204.0
12/29/2005 1:26:57 AM|Einstein@Home|Throughput 2611 bytes/sec
12/29/2005 1:26:57 AM|Einstein@Home|Checksum or signature error for r1_1204.0
12/29/2005 1:27:29 AM|Einstein@Home|Temporarily failed download of skygrid_1210_r_T09.dat: error 500
12/29/2005 1:27:30 AM|Einstein@Home|Started download of skygrid_1210_r_T09.dat
12/29/2005 1:27:32 AM|Einstein@Home|Finished download of skygrid_1210_r_T09.dat
12/29/2005 1:27:32 AM|Einstein@Home|Throughput 12799 bytes/sec
12/29/2005 1:27:32 AM|Einstein@Home|Checksum or signature error for skygrid_1210_r_T09.dat
12/29/2005 1:27:48 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
12/29/2005 1:27:48 AM|Einstein@Home|Reason: To fetch work
12/29/2005 1:27:48 AM|Einstein@Home|Requesting 43200 seconds of new work, and reporting 2 results
12/29/2005 1:27:53 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
12/29/2005 1:27:53 AM|Einstein@Home|Message from server: No work sent
12/29/2005 1:27:53 AM|Einstein@Home|Message from server: (reached daily quota of 2 results)
12/29/2005 1:27:53 AM|Einstein@Home|No work from project
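When reading a log like the one above it can help to tally which files failed and how. A minimal sketch, assuming the `timestamp|project|message` layout seen in this excerpt; `summarize_log` is a hypothetical helper, not a BOINC tool, and the message patterns are copied from the lines above rather than from any BOINC specification.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above: "<timestamp>|<project>|<message>"
LINE = re.compile(r"^(?P<stamp>[^|]+)\|(?P<project>[^|]+)\|(?P<msg>.*)$")
CHECKSUM = re.compile(r"^Checksum or signature error for (?P<file>\S+)$")
HTTP_FAIL = re.compile(r"^Temporarily failed download of (?P<file>\S+): error (?P<code>\d+)$")


def summarize_log(text):
    """Return (files that failed verification, counts of HTTP error codes)."""
    bad_files = []
    http_errors = Counter()
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip blank or non-log lines
        msg = m.group("msg").strip()
        if (c := CHECKSUM.match(msg)):
            bad_files.append(c.group("file"))
        elif (h := HTTP_FAIL.match(msg)):
            http_errors[h.group("code")] += 1
    return bad_files, http_errors
```

Run over the full excerpt, it should list all seven files that finished downloading (the two `albert_4.37` files, `earth_05_09`, `sun_05_09`, `config_S4R2a.cfg`, `r1_1204.0` and `skygrid_1210_r_T09.dat`) as failing verification, alongside seven HTTP 500 responses. That pattern suggests the problem affects every file this host fetches, not just the Albert executable.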

____________

Ingleside
Joined: Jan 23 05
Posts: 32
ID: 7428
Credit: 48,690
RAC: 51
Message 24404 - Posted 29 Dec 2005 14:49:35 UTC - in response to Message 24328.

I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...


Einstein@home preferentially sends results to users that already have the corresponding large input file. For this reason, a gap of a couple of days between the first and the last result of a WU being sent out is normal.
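The preference described here (BOINC calls it locality scheduling) can be pictured with a toy selection rule. This is a minimal sketch under simplifying assumptions: the `Result` type, the dataset names and `pick_result` are hypothetical, and the real Einstein@Home scheduler weighs more factors than this.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    data_file: str  # large input file this result needs


def pick_result(unsent, host_files):
    """Prefer an unsent result whose data file the host already holds.

    Falls back to the oldest unsent result (which forces a new download)
    and returns None when nothing is left to send.
    """
    for result in unsent:
        if result.data_file in host_files:
            return result
    return unsent[0] if unsent else None


queue = [Result("wu1_0", "dataset_A"), Result("wu2_0", "dataset_B")]
print(pick_result(queue, host_files={"dataset_B"}).name)
```

Because a host holding `dataset_B` keeps being steered back to it, the remaining copies of a `dataset_A` workunit wait until a host that already has `dataset_A` (or has nothing better to do) asks for work, which is how days can pass between the first and last result of one workunit.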

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 24412 - Posted 29 Dec 2005 16:54:10 UTC - in response to Message 24404.

I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...


Einstein@home preferentially sends results to users that already have the corresponding large input file. For this reason, a gap of a couple of days between the first and the last result of a WU being sent out is normal.

Quite bizarre ... especially as there are many mixed into the sequence where all 3 (or 4) are sent virtually simultaneously. Well, weird anyway ... then again, I never claimed to understand how all this stuff works ...
____________

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24425 - Posted 29 Dec 2005 21:59:11 UTC - in response to Message 24094.


[EDIT December 27]
Is it intentional that the target number of results is three rather than the old value of four?

Yes, this is intentional. It may slow down result validation in some cases but will increase our computing power by ~ 25%.



You are underselling here.

Say you got 12 users. They used to crunch 3 different WU with a replication of 4; now they crunch 4 different WU, replicated 3 times. Throughput up from 3 to 4 is a 33% rise.
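Both percentages can be right, depending on which side of the change you measure. Below is a worked version of the 12-host example in Python; the function names are illustrative, and reading Bruce's ~25% as the saving in results per workunit is only one plausible interpretation.

```python
from fractions import Fraction


def extra_workunits(old_replication, new_replication):
    """Relative increase in distinct WUs completed by the same crunching capacity."""
    # capacity of C results -> C / replication distinct workunits
    return Fraction(old_replication, new_replication) - 1


def fewer_results_per_wu(old_replication, new_replication):
    """Relative drop in the number of results computed for each WU."""
    return 1 - Fraction(new_replication, old_replication)


hosts = 12
print(hosts // 4, "->", hosts // 3, "workunits")                      # 3 -> 4 workunits
print(f"{float(extra_workunits(4, 3)):.0%} more workunits")            # 33% more workunits
print(f"{float(fewer_results_per_wu(4, 3)):.0%} fewer results per WU") # 25% fewer results per WU
```

Using `Fraction` keeps the 4/3 and 3/4 ratios exact, so the 33% and 25% figures differ only because they are measured against different baselines.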

Thank you very much for this change - it is one I asked for, as did many others - tho not nearly as many as asked for the real time clock in the screensaver ;-)
____________
~~gravywavy

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24426 - Posted 29 Dec 2005 22:15:07 UTC - in response to Message 24200.

Of course, with the better versions out there I have no idea why anyone would still use 4.19 ... :)

On a low resolution screen the 4.19 manager made a much better use of screen space than anything since. Those buttons are so enormous that on 640x480 you can't even attach to a project cos you can't reach the button. Talk about shunning users of old equipment.

Why the next manager can't drop the buttons and use right-click context menus beats me. It's how it should have been done in the first place: let the OS decide how to fit it on the desktop when it is clicked. It would also make it accessible to those who need very large font sizes. Come on, fixed-layout interfaces should be a no-no.

Also, some people liked the graphical slider showing progress (still available via BOINCview, by the way).

But those are the only areas where 4.19 still wins.

IMO there is no advantage at all to the 4.19 client.

And just to be clear, no I don't still use it - the advantages of later clients encouraged me to upgrade at whichever point EDF was working reasonably sensibly.

R~~
____________
~~gravywavy

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24427 - Posted 29 Dec 2005 22:41:41 UTC - in response to Message 24412.
Last modified: 29 Dec 2005 23:02:58 UTC

I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...


Einstein@home preferentially sends results to users that already have the corresponding large input file. For this reason, a gap of a couple of days between the first and the last result of a WU being sent out is normal.

Quite bizarre ... especially as there are many mixed into the sequence where all 3 (or 4) are sent virtually simultaneously. Well, weird anyway ... then again, I never claimed to understand how all this stuff works ...


I explained this just 213 days ago and some people have forgotten already ;-)

Seriously Paul - one day I will keep my promise to start contributing to the wiki, but in the meantime if you'd like to find a place for this it would be great.

My guess is that this is a non-deliberate side-effect of other scheduling rules. The patent on all such unintended side effects is held by Murphy.

Consider

rule 1 - wherever possible assign work from the data the client already holds

rule 2 - don't assign consecutive wu to the same pairings of computers

Rule 1 reduces download times, which are bad enough on E@h anyway. Rule 2 means that redundancy is spread out, to reduce the chance of two computers repeatedly making the same mistake on the same wu. Let me be clear: I don't know that rule 2 exists; I am 'reverse engineering' it from what seems to happen.

Rule 1 certainly exists and is also known as locality scheduling (thanks to John Keck for that).

Now suppose A (by luck) is the first computer to be assigned work from a new dataset.

Eventually along comes B, who has no more wu to be assigned from their old data, and they are assigned wu from the same dataset as A. Because of rule 2, B will only be assigned one wu that is shared with A. B's next wu after that will be a different wu from the same dataset. Meanwhile A may well want a second wu.

Then along come C, D and E; each will share at most one wu with any other computer. We might have this picture just after G gets their first wu from this dataset:

With an initial allocation of 4 we get:

wu 1 : A, B, C, D
wu 2 : B
wu 3 : A, E, F, G
wu 4 : A
wu 5 : B, E
wu 6 : A
wu 7 : C, E
wu 8 : B, F
wu 9 : A
wu 10: D, F
wu 11: C
wu 12: B
wu 13: A
wu 14: D
wu 15: C
wu 16: B
wu 17: A
wu 18: C
wu 19: D

eventually there are enough people on board that all wu get all their results issued close together. It is only around the startup of a new data file or a new app that I'd expect to see this kind of effect.

Question for a mathematician - what is the smallest N such that N results can be given to 4N people and no result given to the same two people as any other result? After N results you expect the issuing of results to start looking sensible instead of all over the place.

But notice that the very first WU, and in fact several others along the way, will get all their results sent out together even while others are kept in solo-crunch state for ages.
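The startup effect can be dry-run in code instead of on paper. This is a toy simulation of the two rules as guessed above, not the real scheduler: rule 2 is the reverse-engineered guess from this post, and the greedy "first eligible WU in issue order" policy and the host arrival order are assumptions, so the output will not necessarily match the table.

```python
def assign(requests, replication=4):
    """Toy scheduler for one dataset, following the two guessed rules.

    requests: host names, one entry per work request, in arrival order.
    Returns holders, where holders[i] lists the hosts sent a result of WU i+1.
    Rule 1 (locality) is modelled by drawing all work from this one dataset.
    Rule 2 (guessed): two hosts never share more than one WU.
    """
    holders = []    # holders[i] = hosts that received a result of WU i+1
    paired = set()  # unordered host pairs that already share a WU
    for host in requests:
        for wu in holders:
            if (len(wu) < replication and host not in wu
                    and all(frozenset((host, other)) not in paired for other in wu)):
                break  # first eligible WU in issue order
        else:
            wu = []  # nothing eligible: start issuing a new WU
            holders.append(wu)
        for other in wu:
            paired.add(frozenset((host, other)))
        wu.append(host)
    return holders


for i, wu in enumerate(assign(list("ABCDEFG") * 3), start=1):
    print(f"wu {i:2}: {', '.join(wu)}")
```

With seven hosts each asking three times, the first two WUs fill up at once while the later ones sit with one or two holders, the same qualitative picture as the table; adding more distinct hosts lets the later WUs fill too.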

Bruce: From the project point of view the thing to notice is that you will not get good turn-round on very small batches of WU if your servers are keeping them back for those who've already seen those datasets - that N seems to define the minimum size -- if, of course, my guess is right about the scheduler's decision rules. If the rules are different it just needs someone to dry run on paper how many wu / hosts need to go through the process before there are sufficient returning hosts to make the wu fly out the door nicely.


R~~

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24428 - Posted 30 Dec 2005 0:10:41 UTC - in response to Message 24427.


Question for a mathematician - what is the smallest N such that N results can be given to 4N people and no result given to the same two people as any other result?


No, my mistake: with N results and 4N hosts it's always possible, since everyone gets just one wu! The question I meant to ask is at what point you stop needing almost as many wu as hosts, or something like that. This effect does go away after a startup period, and there must be some way to work out the switch-over, but it's around midnight and I can't think it through...

R~~ zzzz

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24430 - Posted 30 Dec 2005 0:42:55 UTC - in response to Message 24094.

The new WU have different execution times, typically ranging from about 25% to 100% the previous execution times


hi again Bruce. If I understand correctly, the different run times originate from the different frequencies, which are known at the outset. Is this right?

If so it would be very useful for the new WU to come with predicted run times that scale with their actual run times. The reason for this is that it enables the schedulers to accurately fill the work cache. If the wu vary in length and the variation is known by the scheduler / client, that is no problem, you just get differing numbers of wu on each connect.

To keep working over a weekend, for example, 2.7 days work is fine if I know it will be 2.7 days.

If, however, the variation is not known / knowable by the scheduler and client at download time, and it can vary by, say, a factor of four, I'd have to ask for extra work in case the work issued ran short. If it then ran long I might get into deadline issues, or it might put other projects into EDF, and so on.

So accurate estimates of run lengths please, based on your testers' experience of crunching the test WUs. As accurate as possible; if the science means the numbers can't be predicted, then we'd all have to live with that.
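To put numbers on this: a 2.7-day cache is 64.8 hours. Take a hypothetical nominal WU of 8 hours and the 25% to 100% range from the first post. If the estimates are known (here all 8 h, for simplicity), the client stops once the target is covered. If it can't tell WUs apart, it has to pad the request so that even the 2-hour case fills the cache. The 8-hour figure is an assumption for illustration only.

```python
import math


def wus_to_fill(estimates_h, target_h):
    """Known per-WU estimates: take WUs in order until the cache target is met."""
    total = 0.0
    for count, est in enumerate(estimates_h, start=1):
        total += est
        if total >= target_h:
            return count, total
    return len(estimates_h), total


def blind_request(target_h, nominal_h, min_fraction):
    """Unknown run times: ask for enough WUs that even the shortest case fills
    the target. Returns (WUs requested, worst-case hours if all run long)."""
    n = math.ceil(target_h / (nominal_h * min_fraction))
    return n, n * nominal_h
```

`wus_to_fill([8.0] * 20, 64.8)` returns `(9, 72.0)`, while `blind_request(64.8, 8.0, 0.25)` returns `(33, 264.0)`. That is up to 11 days of work for a 2.7-day cache, which is where the deadline and EDF trouble comes from.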


River~~
R~~

Profile genes
Joined: Nov 10 04
Posts: 41
ID: 612
Credit: 550,704
RAC: 0
Message 24433 - Posted 30 Dec 2005 1:50:52 UTC - in response to Message 24427.


rule 1 - wherever possible assign work from the data the client already holds


*This* would explain why a new machine I just attached is getting nothing but Albert WU's while all my older machines get nothing but the original Einsteins.

(not that I'm complaining, just curious)

-Gene

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24460 - Posted 30 Dec 2005 18:35:58 UTC - in response to Message 24433.
Last modified: 30 Dec 2005 18:36:52 UTC


rule 1 - wherever possible assign work from the data the client already holds


*This* would explain why a new machine I just attached is getting nothing but Albert WU's while all my older machines get nothing but the original Einsteins.

(not that I'm complaining, just curious)

-Gene


Correct. Bruce said that the allocations are random, but you only go into the draw when there are no wu to crunch from the data you already have.

You may have noticed you get runs of WU with similar-starting names. It is only at the changeover from one such set of WU to another that you have any chance of getting an Albert. Presumably Alberts also come in batches attached to different datasets, in which case when your computer can't get any more of the same set of Alberts, it may well revert at that point to the Einsteins.

Dial-up users may have noticed that at the changeover in the name of the wu they get very long connect times - this is because a huge chunk of new data is downloaded. At all other times the instructions for the next wu simply tell the app to do something different with the data already on disk.
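The dial-up cost depends on how often the data file changes between consecutive WUs. A minimal sketch, assuming the data file is the part of the result name before the first `__` (as with results like `l1_1167.5__1167.7_0.1_T08_S4lD_3`, which depend on file `l1_1167.5` in a log later in this thread), and assuming a fresh download is needed only when that file changes:

```python
def data_file(result_name):
    """Assumed naming convention: data file = text before the first '__'."""
    return result_name.split("__", 1)[0]


def count_downloads(result_names):
    """Big data-file downloads for a sequence of results, assuming the client
    only re-downloads when the data file differs from the one it holds."""
    held = None
    downloads = 0
    for name in result_names:
        f = data_file(name)
        if f != held:
            downloads += 1
            held = f
    return downloads
```

A run of results on one dataset costs a single download, but a string of one-off WUs from different old datasets costs one big download per WU.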

Bruce: this makes me think of something else for you to think about...

When Einstein is finally withdrawn, there will be a spate of the server dishing out one-off wu, odds and ends from the old datasets. Oldtimers will remember this happening in previous changeovers. Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WU.

The advantage of BOINC, of course, is that even if your primary loyalty is with one project, you can easily go elsewhere and come back to avoid temporary issues. By posting a warning like that, dial-up users are more likely to come back than if they are not warned and leave in a tizzy over costs. In my opinion :-)

River~~

Marck
Joined: Feb 11 05
Posts: 9
ID: 15388
Credit: 514,940
RAC: 324
Message 24472 - Posted 30 Dec 2005 23:06:16 UTC

If so it would be very useful for the new WU to come with predicted run times that scale with their actual run times. The reason for this is that it enables the schedulers to accurately fill the work cache. If the wu vary in length and the variation is known by the scheduler / client, that is no problem, you just get differing numbers of wu on each connect.

As it seems, the core client already is aware of the different run times. Right now, I've got Albert results waiting in the cache that have different "To completion" times.

Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WU.

I remember that this was already done for the previous changeover (see the news at April 7, 2005 in the archive), so I see no reason why it won't be done this time, too. :)

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 24507 - Posted 31 Dec 2005 11:25:48 UTC - in response to Message 24472.
Last modified: 31 Dec 2005 11:26:58 UTC


As it seems, the core client already is aware of the different run times. Right now, I've got Albert results waiting in the cache that have different "To completion" times.

:-)

...it may be an idea to post warnings [to dial-up users] when that is about to happen...

I remember that this was already done for the previous changeover (see the news at April 7, 2005 in the archive), so I see no reason why it won't be done this time, too. :)

yep - take my comment as a gentle reminder and if it was going to be done anyway, then take it as thanks-in-advance!

R~~

Profile TTL
Joined: Mar 29 05
Posts: 2
ID: 67272
Credit: 16,052
RAC: 0
Message 24513 - Posted 31 Dec 2005 12:28:25 UTC

Hello everybody,
since today my computer uses Albert as well.

For me it looks like when using Albert there is much more activity on my hard drive.
For example the client_state.xml is rewritten at least every three seconds now. When I remember right, it took around a minute with Einstein before the file was updated. Currently when running Albert there is activity on my hard drive around every second. Because I run Albert on a laptop I hear every hard drive activity - this is annoying.

Next I tried cpulimit to reduce the amount of CPU time used by Albert; it did not do anything (but I must say that I did not test it with Einstein as well).

Finally, I would like to request more architecture-specific versions of Albert, such as builds for Athlon K7, K8, Pentium 4 and so on.

This information might be useful:
I am using BOINC version 4.25 under Linux kernel 2.6.12 with an AMD Sempron 2800+.
I use the sync option for my hard drive to write changed data back immediately, since the WLAN driver sometimes crashes my system and I don't want to lose any unwritten data when this happens.

TTL

Bill Michael
Joined: Jul 27 05
Posts: 306
ID: 98119
Credit: 34,927
RAC: 0
Message 24529 - Posted 31 Dec 2005 17:57:21 UTC - in response to Message 24513.

For me it looks like when using Albert there is much more activity on my hard drive.


What is your setting for "Write to disk at most every N seconds"? (Your Account, General Preferences) - are you sure it's the client_state.xml that's being written to, and not the VM page file? Do you have plenty of RAM for everything you're running?


Profile TTL
Joined: Mar 29 05
Posts: 2
ID: 67272
Credit: 16,052
RAC: 0
Message 24530 - Posted 31 Dec 2005 18:41:36 UTC - in response to Message 24529.

For me it looks like when using Albert there is much more activity on my hard drive.


What is your setting for "Write to disk at most every N seconds"? (Your Account, General Preferences) - are you sure it's the client_state.xml that's being written to, and not the VM page file? Do you have plenty of RAM for everything you're running?


This is set to write the data back every 120 seconds. I think I have enough RAM (448MB), since most of the time 50% of the RAM is used for cache only.
If I do a
sh -c 'while grep fraction_done client_state.xml ; do sleep 1 ; done'
I get a new fraction_done value mostly every three seconds (sometimes after four or five seconds), meaning that the file is written at least every three to five seconds.
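A variant of that grep loop measures the rewrite interval directly by watching the file's modification time. This is a sketch. It assumes the client updates `client_state.xml`'s mtime on every write (for example by replacing the file). Polling every half second cannot resolve shorter intervals, and some filesystems only store mtime to the second.

```python
import os
import time


def sample_mtimes(path, duration_s=60.0, poll_s=0.5):
    """Record the file's modification time every poll_s seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        try:
            samples.append(os.stat(path).st_mtime)
        except FileNotFoundError:
            pass  # the file may be briefly absent while it is being replaced
        time.sleep(poll_s)
    return samples


def rewrite_intervals(mtimes):
    """Gaps in seconds between successive distinct modification times."""
    distinct = []
    for t in mtimes:
        if not distinct or t != distinct[-1]:
            distinct.append(t)
    return [b - a for a, b in zip(distinct, distinct[1:])]
```

Run `rewrite_intervals(sample_mtimes("client_state.xml"))` from the BOINC directory. Values of around 3 to 5 would match the observation above, well short of the 120 s preference.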

TTL

Ross Morgan
Joined: Feb 20 05
Posts: 18
ID: 23520
Credit: 122,639
RAC: 0
Message 24537 - Posted 31 Dec 2005 20:47:33 UTC

Albert seems to be running a lot slower than Einstein 4.79. For example, I started a WU today at 4pm (UTC) and it has been left running by itself, with the computer not being used and no other major resource-gobbling processes running. The screensaver and the system virus scan have been disabled, for example.

At 20:35 (UTC) the Albert WU is only showing as having done 3hrs 17m of work, not the 4hrs 35m it should have done.
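A quick check on those figures: 16:00 to 20:35 is 4 h 35 min of wall-clock time, and 3 h 17 min of CPU time over that window works out to roughly 72% of one CPU.

```python
def minutes(hours, mins):
    return 60 * hours + mins


def cpu_share(cpu_min, wall_min):
    """Fraction of wall-clock time the app actually spent on the CPU."""
    return cpu_min / wall_min


wall = minutes(20, 35) - minutes(16, 0)  # 275 minutes elapsed
cpu = minutes(3, 17)                      # 197 minutes of CPU time
share = cpu_share(cpu, wall)              # about 0.716
```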

Is anybody else seeing this kind of behaviour or is it a problem with my system only?

Ross Morgan
Joined: Feb 20 05
Posts: 18
ID: 23520
Credit: 122,639
RAC: 0
Message 24543 - Posted 31 Dec 2005 23:23:33 UTC - in response to Message 24537.

Albert seems to be running a lot slower than Einstein 4.79. For example, I started a WU today at 4pm (UTC) and it has been left running by itself, with the computer not being used and no other major resource-gobbling processes running. The screensaver and the system virus scan have been disabled, for example.

At 20:35 (UTC) the Albert WU is only showing as having done 3hrs 17m of work, not the 4hrs 35m it should have done.

Is anybody else seeing this kind of behaviour or is it a problem with my system only?



The problem above seems to be down to the Albert app only using 50% of the available CPU power. The other 50% of CPU power is going to the idle process. This computer of course has a single-core processor (non-HT). I am just wondering if Albert was designed solely for HT or dual-core processors, and that's why it's using only 50% of the CPU power.

Does anybody's single core non HT processor use 50% or less of the CPU power available?

RandyC
Joined: Jan 18 05
Posts: 319
ID: 3454
Credit: 1,949,162
RAC: 1,872
Message 24552 - Posted 1 Jan 2006 1:35:26 UTC - in response to Message 24543.


Does anybody's single core non HT processor use 50% or less of the CPU power available?


Using an AMD XP1700 with XP Pro SP2...using 95% of CPU for Albert.

Ross Morgan
Joined: Feb 20 05
Posts: 18
ID: 23520
Credit: 122,639
RAC: 0
Message 24554 - Posted 1 Jan 2006 1:55:30 UTC - in response to Message 24552.


Does anybody's single core non HT processor use 50% or less of the CPU power available?


Using an AMD XP1700 with XP Pro SP2...using 95% of CPU for Albert.



Cheers. The rig I am having problems with is an AMD XP1900, so I'll try resetting the project first of all.

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24564 - Posted 1 Jan 2006 5:09:32 UTC - in response to Message 24460.

When Einstein is finally withdrawn, there will be a spate of the server dishing out one-off wu, odds and ends from the old datasets. Oldtimers will remember this happening in previous changeovers. Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WU.

The advantage of BOINC, of course, is that even if your primary loyalty is with one project, you can easily go elsewhere and come back to avoid temporary issues. By posting a warning like that, dial-up users are more likely to come back than if they are not warned and leave in a tizzy over costs. In my opinion :-)

River~~


Would it be possible to improve the scheduler so that it does not send one-off work units to machines with slow connections? Since the configuration page already allows you to specify a maximum connection rate, the information needed to tell clients on narrow pipes from those on fat ones should already be available.
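Here is one hypothetical way that suggestion could look. This is not BOINC scheduler code, and the `Host` fields, the 10-minute budget and the 10 MB file size are invented for illustration. The rule is to send a one-off WU only if the host already holds its data file, or if the host's declared rate limit would fetch the file within a time budget. The preference is a cap the user chooses rather than a measurement of the link, so it is only a rough proxy for a slow connection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    max_download_bytes_per_s: float  # user's rate cap; 0 means "no limit set"
    held_data_file: Optional[str]    # data file already on the host, if any


def ok_to_send(host, data_file, file_size_bytes, max_download_s=600.0):
    """Hypothetical check before sending a WU that needs data_file."""
    if host.held_data_file == data_file:
        return True  # nothing new to download
    rate = host.max_download_bytes_per_s
    if rate <= 0:
        return True  # no cap declared: assume a fat pipe
    return file_size_bytes / rate <= max_download_s
```

With a 5,000 bytes/s cap, a 10 MB file takes 2,000 s, so `ok_to_send(Host(5000, None), "r1_0793.0", 10_000_000)` returns `False`.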

Profile slavko.sk
Joined: Jan 22 05
Posts: 33
ID: 6210
Credit: 477,735
RAC: 2,268
Message 24583 - Posted 1 Jan 2006 12:37:49 UTC

Hi all,
I got quite a lot of Downloading Errors by albert on Windows platform. Works well on Linux. Which version of BOINC Manager albert app requires?
____________
ALL GLORY TO THE HYPNOTOAD!
Do You Dare?

Sharky T
Joined: Feb 19 05
Posts: 159
ID: 20395
Credit: 836,157
RAC: 4,482
Message 24585 - Posted 1 Jan 2006 14:06:41 UTC - in response to Message 24583.
Last modified: 1 Jan 2006 14:13:50 UTC

Hi all,
I got quite a lot of Downloading Errors by albert on Windows platform. Works well on Linux. Which version of BOINC Manager albert app requires?
Everybody who has this problem lives in the same timezone (Central European) and tries to connect to einstein.aei.mpg.de when downloading the app.
I'm not sure of this yet, but try changing your computer's timezone (to the UK or similar) so that it tries the UK server first. (Let me know if it works.)
My Albert works alright on all 4 of my boxes, but I got "couldn't connect to host[einstein.aei.mpg.de]" on every box, so I think my downloads might have come from another server (probably the UK one).
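The timezone workaround only makes sense if the download URLs are ordered by how close each server's timezone is to the host's. That is a guess from this thread and is not confirmed here. The sketch below just shows what such ordering would look like. The UTC offsets are standard-time values, and the UK server's hostname is hypothetical because the thread doesn't give it.

```python
SECONDS_PER_DAY = 24 * 3600


def tz_distance(a_s, b_s):
    """Distance between two UTC offsets in seconds, wrapping around the clock."""
    d = abs(a_s - b_s) % SECONDS_PER_DAY
    return min(d, SECONDS_PER_DAY - d)


def order_mirrors(host_offset_s, mirrors):
    """Sort (hostname, utc_offset_s) pairs, nearest timezone first."""
    return [name for name, _ in
            sorted(mirrors, key=lambda m: tz_distance(host_offset_s, m[1]))]


MIRRORS = [
    ("einstein.aei.mpg.de", 3600),         # Germany, UTC+1
    ("uk-mirror.example", 0),              # hypothetical name for the UK server
    ("einstein.phys.uwm.edu", -6 * 3600),  # Milwaukee, UTC-6
]
```

Under this model, a host set to UTC+1 gets `einstein.aei.mpg.de` first, and switching it to UTC+0 puts the UK server first, which is the effect the workaround relies on.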

Profile slavko.sk
Joined: Jan 22 05
Posts: 33
ID: 6210
Credit: 477,735
RAC: 2,268
Message 24587 - Posted 1 Jan 2006 17:14:56 UTC - in response to Message 24585.

Hi all,
I got quite a lot of Downloading Errors by albert on Windows platform. Works well on Linux. Which version of BOINC Manager albert app requires?
Everybody who has this problem lives in the same timezone (Central European) and tries to connect to einstein.aei.mpg.de when downloading the app.
I'm not sure of this yet, but try changing your computer's timezone (to the UK or similar) so that it tries the UK server first. (Let me know if it works.)
My Albert works alright on all 4 of my boxes, but I got "couldn't connect to host[einstein.aei.mpg.de]" on every box, so I think my downloads might have come from another server (probably the UK one).

I changed the time zone, but I've reached my daily quota, so I have to wait until tomorrow.
____________
ALL GLORY TO THE HYPNOTOAD!
Do You Dare?

Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24591 - Posted 1 Jan 2006 18:53:00 UTC

Hmmm... it seems that I finally managed to download the Albert files. Here is the log...

1/1/2006 7:24:08 PM|Einstein@Home|Started download of albert_4.37_windows_intelx86.exe
1/1/2006 7:24:08 PM|Einstein@Home|Started download of albert_4.37_windows_intelx86.pdb
1/1/2006 7:24:30 PM||Couldn't connect to hostname [einstein.aei.mpg.de]
1/1/2006 7:24:30 PM||Couldn't connect to hostname [einstein.aei.mpg.de]
1/1/2006 7:24:30 PM|Einstein@Home|Temporarily failed download of albert_4.37_windows_intelx86.exe: system I/O
1/1/2006 7:24:30 PM|Einstein@Home|Temporarily failed download of albert_4.37_windows_intelx86.pdb: system I/O
1/1/2006 7:24:30 PM|Einstein@Home|Started download of config_S4R2a.cfg
1/1/2006 7:24:30 PM|Einstein@Home|Started download of r1_0793.0
1/1/2006 7:24:53 PM||Couldn't connect to hostname [einstein.aei.mpg.de]
1/1/2006 7:24:53 PM||Couldn't connect to hostname [einstein.aei.mpg.de]
1/1/2006 7:24:53 PM|Einstein@Home|Temporarily failed download of config_S4R2a.cfg: system I/O
1/1/2006 7:24:53 PM|Einstein@Home|Temporarily failed download of r1_0793.0: system I/O
1/1/2006 7:24:53 PM|Einstein@Home|Started download of skygrid_0800_r_T09.dat
1/1/2006 7:24:54 PM|Einstein@Home|Started download of albert_4.37_windows_intelx86.exe
1/1/2006 7:25:14 PM||Couldn't connect to hostname [einstein.aei.mpg.de]
1/1/2006 7:25:15 PM|Einstein@Home|Temporarily failed download of skygrid_0800_r_T09.dat: system I/O
1/1/2006 7:25:16 PM|Einstein@Home|Started download of albert_4.37_windows_intelx86.pdb
1/1/2006 7:26:03 PM|Einstein@Home|Finished download of albert_4.37_windows_intelx86.exe
1/1/2006 7:26:03 PM|Einstein@Home|Throughput 15258 bytes/sec
1/1/2006 7:26:03 PM|Einstein@Home|Started download of config_S4R2a.cfg
1/1/2006 7:26:04 PM|Einstein@Home|Finished download of config_S4R2a.cfg
1/1/2006 7:26:04 PM|Einstein@Home|Throughput 1328 bytes/sec
1/1/2006 7:26:04 PM|Einstein@Home|Started download of r1_0793.0
1/1/2006 7:34:07 PM|Einstein@Home|Finished download of r1_0793.0
1/1/2006 7:34:07 PM|Einstein@Home|Throughput 15322 bytes/sec
1/1/2006 7:34:07 PM|Einstein@Home|Started download of skygrid_0800_r_T09.dat
1/1/2006 7:34:17 PM|Einstein@Home|Finished download of skygrid_0800_r_T09.dat
1/1/2006 7:34:17 PM|Einstein@Home|Throughput 16738 bytes/sec
1/1/2006 7:38:59 PM|Einstein@Home|Finished download of albert_4.37_windows_intelx86.pdb
1/1/2006 7:38:59 PM|Einstein@Home|Throughput 3930 bytes/sec
1/1/2006 7:39:00 PM||request_reschedule_cpus: files downloaded

Pay attention to "Couldn't connect to hostname [einstein.aei.mpg.de]".
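To see which files eventually got through, the log can be tallied per file. This is a small sketch. It assumes the `date|project|message` layout shown above and matches only the "Temporarily failed download of" and "Finished download of" messages that appear in it.

```python
import re
from collections import defaultdict

FAILED = re.compile(r"Temporarily failed download of (\S+?):")
FINISHED = re.compile(r"Finished download of (\S+)")


def summarize(log_lines):
    """Per-file count of temporary failures and whether it finally finished."""
    stats = defaultdict(lambda: {"failures": 0, "finished": False})
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) < 3:
            continue  # not a date|project|message line
        msg = parts[2]
        m = FAILED.search(msg)
        if m:
            stats[m.group(1)]["failures"] += 1
            continue
        m = FINISHED.search(msg)
        if m:
            stats[m.group(1)]["finished"] = True
    return dict(stats)
```

Applied to the log above, every file with a temporary failure also ends with `finished: True`, so the retries eventually succeeded.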

This machine has BOINC version 5.2.7. I hope everything will work OK now. We'll see in about an hour, when I finish my Einstein WU. Will keep you all posted.

I'll try tomorrow with my office machines. Hope I will succeed with them too, so I can resume crunching for E@H at full speed.


Sharky T
Joined: Feb 19 05
Posts: 159
ID: 20395
Credit: 836,157
RAC: 4,482
Message 24593 - Posted 1 Jan 2006 19:15:10 UTC

Congrats Edo. I think you're on dry land now. :)
The trick is to download when you can't reach the german server. :) (at least till they fix it.)

Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24595 - Posted 1 Jan 2006 19:31:54 UTC

Thanks Sharky! :) I hope the German server will be unreachable for me tomorrow when I turn on the office machines. :)

Profile Darren
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 24596 - Posted 1 Jan 2006 20:08:13 UTC - in response to Message 24513.


For me it looks like when using Albert there is much more activity on my hard drive.


I see the same thing. My system started running Albert units this morning, and is doing a disk write every 5 seconds. My preferences are set for 60 seconds and I'm still within my actual system memory (with swap space showing 0% in use).



Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24599 - Posted 1 Jan 2006 20:41:59 UTC

Just to report back... finally Albert works fine on at least one of my machines! Hope tomorrow I will manage to make it work on other machines too.

Edo

Sir Ulli
Joined: Jan 18 05
Posts: 121
ID: 3835
Credit: 104,603
RAC: 0
Message 24615 - Posted 1 Jan 2006 23:16:47 UTC

the first Albert WUs



for Info Athlon64 3.200+

Greetings from Germany NRW
Ulli

Profile Santas little helper
Joined: Feb 11 05
Posts: 37
ID: 16600
Credit: 179,652
RAC: 86
Message 24623 - Posted 2 Jan 2006 1:05:34 UTC

Just being interested in Albert: What exactly does Improved (all-sky pulsar search) mean? (I don't refer to the WUs) Or is this a secret?
Thx in advance
____________
Greetings, Santas little helper

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24627 - Posted 2 Jan 2006 1:56:35 UTC - in response to Message 24623.

Just being interested in Albert: What exactly does Improved (all-sky pulsar search) mean? (I don't refer to the WUs) Or is this a secret?
Thx in advance


Read the 1st post in this thread, including the Q&A section. I've got a batch of Alberts that have 17% less estimated runtime than Einsteins. The 1st one should be starting ~noon tomorrow if I've got the timing right (my cores are currently half a work unit out of sync, drifting closer/farther depending on the vagaries of which core Windows decides to use for housekeeping tasks.)

Profile slavko.sk
Joined: Jan 22 05
Posts: 33
ID: 6210
Credit: 477,735
RAC: 2,268
Message 24638 - Posted 2 Jan 2006 7:14:11 UTC

I'm still getting errors:
2.1.2006 1:22:30|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
2.1.2006 1:22:32|Einstein@Home|Started download of l1_1167.5
2.1.2006 1:22:58|Einstein@Home|Temporarily failed download of l1_1167.5: error 504
2.1.2006 1:22:59|Einstein@Home|Started download of l1_1167.5
2.1.2006 1:23:59|Einstein@Home|Finished download of l1_1167.5
2.1.2006 1:23:59|Einstein@Home|Throughput 108515 bytes/sec
2.1.2006 1:23:59|Einstein@Home|MD5 check failed for l1_1167.5
2.1.2006 1:23:59|Einstein@Home|expected 5aa89df74b1cdb44a5e44d69b974299f, got 83bb2e1ddb98aba4db62b7b48bb869b5
2.1.2006 1:23:59|Einstein@Home|Checksum or signature error for l1_1167.5
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.7_0.1_T08_S4lD_3 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.8_0.1_T08_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.9_0.1_T08_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.5_0.1_T09_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.6_0.1_T09_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.7_0.1_T09_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.8_0.1_T09_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.9_0.1_T09_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.5_0.1_T10_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.6_0.1_T10_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.7_0.1_T10_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.8_0.1_T10_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.9_0.1_T10_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.5_0.1_T11_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.6_0.1_T11_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
2.1.2006 1:24:00|Einstein@Home|Unrecoverable error for result l1_1167.5__1167.7_0.1_T11_S4lD_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>l1_1167.5</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message></file_xfer_error>)
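Every result in that log depends on the same data file, `l1_1167.5`, so a single copy whose MD5 doesn't match the expected value errors out the whole batch. If a copy is still on disk and you want to check it yourself, the digest is easy to recompute with Python's standard `hashlib`. The file path in the usage note is an assumption based on the project directory seen in other logs in this thread.

```python
import hashlib


def md5_hex_of_chunks(chunks):
    """MD5 hex digest of an iterable of byte strings."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path, chunk_size=1 << 16):
    """MD5 of a file, read in chunks so large data files don't fill memory."""
    with open(path, "rb") as f:
        return md5_hex_of_chunks(iter(lambda: f.read(chunk_size), b""))


def matches(path, expected_hex):
    return md5_of_file(path) == expected_hex.lower()
```

Usage, with an assumed path: `matches("projects/einstein.phys.uwm.edu/l1_1167.5", "5aa89df74b1cdb44a5e44d69b974299f")`.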

____________
ALL GLORY TO THE HYPNOTOAD!
Do You Dare?

Profile MAGIC
Joined: Jan 18 05
Posts: 160
ID: 3123
Credit: 1,527,289
RAC: 2,810
Message 24639 - Posted 2 Jan 2006 8:28:58 UTC
Last modified: 2 Jan 2006 8:30:32 UTC


The Albert 4.37s are working perfectly for me so far and I have done lots of them.

My only problem is that the server seems to have picked my slowest PC and gives it Alberts all the time, while my faster machines still get the other version.

Not sure why that is.

But I just sent another in (5.5hr average)

This time it loaded 4 more Alberts but this time it added a "download of skygrid_0500_r_T08.dat"


First time I have seen that.


Hopefully Alberts will start jumping on my other faster machines soon.



Happy New Year


Profile Santas little helper
Joined: Feb 11 05
Posts: 37
ID: 16600
Credit: 179,652
RAC: 86
Message 24646 - Posted 2 Jan 2006 11:39:46 UTC - in response to Message 24627.
Last modified: 2 Jan 2006 11:42:48 UTC

Just being interested in Albert: What exactly does Improved (all-sky pulsar search) mean? (I don't refer to the WUs) Or is this a secret?
Thx in advance


Read the 1st post in this thread, including the Q&A section. I've got a batch of Alberts that have 17% less estimated runtime than Einsteins. The 1st one should be starting ~noon tomorrow if I've got the timing right (my cores are currently half a work unit out of sync, drifting closer/farther depending on the vagaries of which core Windows decides to use for housekeeping tasks.)


I read the 1st post and most of the others below. But most of them are about less important things concerning what the program does while calculating. Reduced time says absolutely nothing important about the internal structure of the program... the only important thing is this one:

"The differences in run times come about because we are now using a sky search grid and frequency band which depends upon frequency"

... and this is just a brief description ... so, anyone with deeper insights? :)

(btw: I am not complaining! I just want to know the important differences)
____________
Greetings, Santas little helper

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24658 - Posted 2 Jan 2006 17:11:46 UTC

I'm wondering about exactly what the scheduler is doing re Einsteins/Alberts myself.

The way I *thought* it was going to work was to give E's exclusively until it had enough people assigned to each chunk of data to have reasonable computation times for getting everything done. Once that was accomplished as all the work on each dataset was assigned it would start switching people over to A's, and excepting odds/ends due to incomplete/invalid results everyone would stay with A's once they got them.

What's being put into my queue indicates this isn't the case. I've used up almost all my old E's and am starting on a 2-day supply of A's; the 2 days of work at the tail end of my queue, however, are a set of 'virgin' E's, most of which it appears I'm the only person to have been assigned so far.

Profile MAGIC
Joined: Jan 18 05
Posts: 160
ID: 3123
Credit: 1,527,289
RAC: 2,810
Message 24665 - Posted 2 Jan 2006 18:16:01 UTC - in response to Message 24646.
Last modified: 2 Jan 2006 18:32:11 UTC

I read the 1st post and most of the others below. But most of them are about less important things concerning what the program does while calculating. Reduced time says absolutely nothing important about the internal structure of the program... the only important thing is this one:

"The differences in run times come about because we are now using a sky search grid and frequency band which depends upon frequency"

... and this is just a brief description ... so, anyone with deeper insights? :)

(btw: I am not complaining! I just want to know the important differences)


http://tinyurl.com/8tmbt




Sharky T
Joined: Feb 19 05
Posts: 159
ID: 20395
Credit: 836,157
RAC: 4,482
Message 24668 - Posted 2 Jan 2006 18:33:48 UTC

Slavko.sk
I see you have changed the timezone on this computer and it looks in your scheduler logs that you have the UK site on top of that list.I guess those downloads went well.
But this computer have the mid european timezon and the german server on top.
Try change this timezone too.

I guess its ok to set the timezones back to normal when you have done the downloads and start crunching.(untill the next big datablock is needed??)

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24669 - Posted 2 Jan 2006 18:42:03 UTC - in response to Message 24665.


http://tinyurl.com/8tmbt


Did you actually read that link, or just conjure it out of Google? If the former, where should we be looking in it? The phrases "all-sky" and "all sky" are not in the document anywhere, and every instance of "improved" appears to refer either directly to sensor hardware or to other backend infrastructure, not the client app.

PS, does this board automatically tinyurl any link? The read view of the forum is showing a tinyurl that points to the tinyurl you posted, while the edit page is showing what I assume is your original tinyurl. I'm directly pasting the link from Google's cache to see what happens.

http://66.102.7.104/search?q=cache:by7iLtzNjZIJ:www.ligo.caltech.edu/NSF/pdf/annual_report.pdf+What+exactly+does+Improved+(all-sky+pulsar+search)+mean%3F+2006&hl=en





DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24670 - Posted 2 Jan 2006 18:44:02 UTC - in response to Message 24669.

PS, does this board automatically tinyurl any link? The read view of the forum is showing a tinyurl that points to the tinyurl you posted, while the edit page is showing what I assume is your orginal tinyurl. I'm directly pasting the link from google's cache to see what happens.


Hmmm.

Did you edit your post, JAHMAGIC? The link that showed *after* I made my reply isn't the same as the one I got when I went back in my browser.


caferace
Joined: Oct 21 05
Posts: 4
ID: 115924
Credit: 286,637
RAC: 0
Message 24679 - Posted 2 Jan 2006 19:33:04 UTC

I'm seeing some Albert ugliness on my Mac:

<core_client_version>5.2.13</core_client_version>
<stderr_txt>

2006-01-02 00:40:57.9653 [normal]: Start of BOINC application 'albert_4.39_powerpc-apple-darwin'.
MacOS Error -43 occured in Mac_Lib.c line 65
MacOS Error -43 occured in Mac_Lib.c line 65
2006-01-02 00:40:57.9986 [normal]: Started search at lalDebugLevel = 0
2006-01-02 00:40:59.1248 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-01-02 00:40:59.1253 [normal]: No usable checkpoint found, starting from beginning.
Detected CPU type 1
2006-01-02 00:52:45.2383 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2006-01-02 05:06:06.2487 [normal]: Search finished successfully.

</stderr_txt>

And on my older P3:

<core_client_version>5.2.8</core_client_version>
<stderr_txt>

2005-12-31 12:45:36.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2005-12-31 12:45:36.1250 [normal]: Started search at lalDebugLevel = 0
2005-12-31 12:45:36.8125 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2005-12-31 12:45:36.8125 [normal]: No usable checkpoint found, starting from beginning.
2005-12-31 12:50:57.9843 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2005-12-31 16:24:34.0937 [normal]: Search finished successfully.

</stderr_txt>

Profile Edo
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24691 - Posted 2 Jan 2006 21:52:28 UTC

Just to report back... Today I tried to run E@H on my office machine, but it produced lots of client errors. It didn't even try to work with Albert; instead it tried to download the Einstein app and files. I get an error 504 message, and a signature error when downloading files. This machine worked perfectly before it was hit by Albert, and now I just can't get it to work. My home machine works nicely on Albert. The only difference is that at the office I'm behind a proxy, but that wasn't a problem before. I hope the staff will fix this nightmare soon (if the problem is on their side).
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 24696 - Posted 2 Jan 2006 22:29:59 UTC - in response to Message 24679.
Last modified: 2 Jan 2006 22:31:22 UTC

I'm seeing some Albert ugliness on my Mac:

<core_client_version>5.2.13</core_client_version>
<stderr_txt>

2006-01-02 00:40:57.9653 [normal]: Start of BOINC application 'albert_4.39_powerpc-apple-darwin'.
MacOS Error -43 occured in Mac_Lib.c line 65
MacOS Error -43 occured in Mac_Lib.c line 65
2006-01-02 00:40:57.9986 [normal]: Started search at lalDebugLevel = 0
2006-01-02 00:40:59.1248 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-01-02 00:40:59.1253 [normal]: No usable checkpoint found, starting from beginning.
Detected CPU type 1
2006-01-02 00:52:45.2383 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2006-01-02 05:06:06.2487 [normal]: Search finished successfully.

</stderr_txt>

And on my older P3:

<core_client_version>5.2.8</core_client_version>
<stderr_txt>

2005-12-31 12:45:36.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2005-12-31 12:45:36.1250 [normal]: Started search at lalDebugLevel = 0
2005-12-31 12:45:36.8125 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2005-12-31 12:45:36.8125 [normal]: No usable checkpoint found, starting from beginning.
2005-12-31 12:50:57.9843 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2005-12-31 16:24:34.0937 [normal]: Search finished successfully.

</stderr_txt>


Looks like normal operations to me. That is, I think the "No usable checkpoint found . . ." messages indicate the first time Albert started those particular WUs and looked for a checkpoint to resume from. Every Albert WU I have looked at has one of these messages. In other words, it is only a problem if a WU gets more than one of these messages.
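The log lines above follow a common resume pattern: look for a checkpoint, validate it, and start from the beginning if none is usable. A minimal sketch of that pattern, assuming a JSON checkpoint guarded by a CRC32; this file format is hypothetical (the thread only says Albert verifies a checksum on Fstat.out.ckp when it resumes), and this is not the Einstein@Home code.

```python
import json
import os
import tempfile
import zlib


def _crc(state) -> int:
    # Checksum over a canonical serialization (sorted keys) of the state.
    return zlib.crc32(json.dumps(state, sort_keys=True).encode())


def save_checkpoint(path, state):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"crc32": _crc(state), "state": state}, f)
    # Atomic swap, so a crash never leaves a half-written checkpoint in place.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or None if the file is missing, unreadable, or fails its checksum."""
    try:
        with open(path) as f:
            record = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict) or _crc(record.get("state")) != record.get("crc32"):
        return None
    return record["state"]


def start_or_resume(path):
    state = load_checkpoint(path)
    if state is None:
        print("No usable checkpoint found, starting from beginning.")
        return {"next_step": 0}
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckp = os.path.join(d, "Fstat.out.ckp")
        start_or_resume(ckp)                    # first run: no checkpoint yet
        save_checkpoint(ckp, {"next_step": 42})
        print(start_or_resume(ckp))             # later run: resumes at step 42
```

Seen this way, one "starting from beginning" message per WU is the expected first-run path; a second one for the same WU would mean a checkpoint was lost or rejected.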
____________

Profile MAGIC
Joined: Jan 18 05
Posts: 160
ID: 3123
Credit: 1,527,289
RAC: 2,810
Message 24710 - Posted 3 Jan 2006 3:39:01 UTC - in response to Message 24670.
Last modified: 3 Jan 2006 3:42:14 UTC

P.S. Does this board automatically TinyURL any link? The read view of the forum is showing a TinyURL that points to the TinyURL you posted, while the edit page is showing what I assume is your original TinyURL. I'm directly pasting the link from Google's cache to see what happens.


Hmmm.

Did you edit your post, JAHMAGIC? The link that showed *after* I made my reply isn't the same as the one I got going back in my browser.



Sorry about that, Dan... yeah, I tend to edit a few times (I had just got up and was having coffee and running several PCs at once).

And since I'm stuck on the world's slowest dial-up here, things run slower than my thoughts.


No, this site doesn't auto-TinyURL your links. You convert them yourself and post them (especially long ones like that one).

I just happened to have that page up and noticed that towards the end it mentioned a sky search grid.


Basically looking around more than just one section of the sky.


(And yeah, my TinyURL was messed up at first, and I switched it as fast as my dial-up would allow.)


The best thing that happened is now all my machines are loaded with the new Albert 4.37's so I get to test the timing differences and all the rest of the Einstein fun.



____________

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 24714 - Posted 3 Jan 2006 4:21:27 UTC - in response to Message 24710.

The best thing that happened is now all my machines are loaded with the new Albert 4.37's so I get to test the timing differences and all the rest of the Einstein fun.


Enjoy. I'm ~40h from switching back from 83% alberts to einsteins again.
____________

Profile slavko.sk
Joined: Jan 22 05
Posts: 33
ID: 6210
Credit: 477,735
RAC: 2,268
Message 24722 - Posted 3 Jan 2006 7:55:01 UTC - in response to Message 24668.

Hi,
yes, I succeeded on all my Windows boxes after changing the time zone. Thanks for the help. Only one is missing, but that is a server, and I don't want to change the time zone on it for now. I changed it for myself as a user, but BOINC runs on it as a service, and it looks like it doesn't take the time zone from the user it is running under. I have to investigate more. I also hope they will fix the problem with the German server; probably there is still nobody there after the Christmas/New Year's Eve break.
S


Slavko.sk
I see you have changed the time zone on this computer, and from your scheduler logs it looks like you now have the UK site at the top of that list. I guess those downloads went well.
But this computer has the Central European time zone and the German server at the top.
Try changing this time zone too.

I guess it's OK to set the time zones back to normal once you have done the downloads and started crunching (until the next big data block is needed??).
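The replies above suggest that the host's time zone decides which download server is listed first. A toy sketch of that idea, assuming the server ranks its mirrors by how close each mirror's UTC offset is to the host's; the mirror names and offsets are hypothetical, and the project's real ordering rule may differ.

```python
DAY = 24 * 3600


def tz_distance(a, b):
    """Separation in seconds between two UTC offsets, wrapping around the day."""
    d = abs(a - b) % DAY
    return min(d, DAY - d)


def order_mirrors(host_offset, mirrors):
    """Sort (name, utc_offset_seconds) mirrors by closeness to the host's offset."""
    ranked = sorted(mirrors, key=lambda m: tz_distance(host_offset, m[1]))
    return [name for name, _ in ranked]


# Hypothetical mirrors: a UK server (UTC+0), a German one (UTC+1), a US one (UTC-6).
MIRRORS = [("uk-mirror", 0), ("de-mirror", 3600), ("us-mirror", -6 * 3600)]
print(order_mirrors(3600, MIRRORS))       # ['de-mirror', 'uk-mirror', 'us-mirror']
print(order_mirrors(-5 * 3600, MIRRORS))  # ['us-mirror', 'uk-mirror', 'de-mirror']
```

Under this model, a host set to Central European time is pointed at the German server first, which would explain why moving the clock to the UK or US time zone routed downloads around it.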


____________
ALL GLORY TO THE HYPNOTOAD!
Do You Dare?

Profile Edo
Avatar
Joined: Feb 11 05
Posts: 90
ID: 14005
Credit: 14,737
RAC: 0
Message 24742 - Posted 3 Jan 2006 14:22:44 UTC

Finally I decided to change the time zone on my office machines too. I changed it to the EST (US) zone, and now it works perfectly. It downloaded the Albert app and all the files without any problems.

Edo
____________

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 24762 - Posted 3 Jan 2006 21:18:59 UTC - in response to Message 24596.
Last modified: 3 Jan 2006 21:19:26 UTC


For me it looks like when using Albert there is much more activity on my hard drive.

I see the same thing. My system started running Albert units this morning, and is doing a disk write every 5 seconds. My preferences are set for 60 seconds and I'm still within my actual system memory (with swap space showing 0% in use).


Please, could you help us to identify which files are being modified? A simple way is to set your preferences to (say) 600 seconds, then monitor the timestamps of the files in projects/einstein.phys.uwm.edu/ and in slots/N/ to see which of these files is being written to more often than once every ten minutes.
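Bruce's check can be automated instead of watching timestamps by hand. A minimal sketch, assuming Python 3 run from the BOINC data directory; it looks only at files directly inside each listed directory and infers write gaps from successive st_mtime changes. Writes that fall between two polls are merged into one longer gap, so the script can miss frequent writes but cannot invent them.

```python
import os
import time


def snapshot_mtimes(dirs):
    """Return {path: mtime} for every regular file directly inside each directory."""
    mtimes = {}
    for d in dirs:
        with os.scandir(d) as entries:
            for entry in entries:
                if entry.is_file():
                    mtimes[entry.path] = entry.stat().st_mtime
    return mtimes


def write_gaps(snapshots):
    """Map each path to the gaps (seconds) between successive observed modifications."""
    last_seen = {}
    gaps = {}
    for snap in snapshots:
        for path, mtime in snap.items():
            prev = last_seen.get(path)
            if prev is not None and mtime != prev:
                gaps.setdefault(path, []).append(mtime - prev)
            last_seen[path] = mtime
    return gaps


def too_frequent(gaps, pref_seconds):
    """Paths rewritten sooner than the disk-write preference allows."""
    return sorted(p for p, g in gaps.items() if min(g) < pref_seconds)


def watch(dirs, pref_seconds, duration, poll=1.0):
    """Poll the directories for `duration` seconds and report offending files."""
    snapshots = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        snapshots.append(snapshot_mtimes(dirs))
        time.sleep(poll)
    return too_frequent(write_gaps(snapshots), pref_seconds)
```

With the preference set to 600 seconds, `watch(["projects/einstein.phys.uwm.edu", "slots/0"], pref_seconds=600, duration=1800)` would list any file in those two directories that changed twice within ten minutes during a half-hour run.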
____________

Bill Michael
Joined: Jul 27 05
Posts: 306
ID: 98119
Credit: 34,927
RAC: 0
Message 24763 - Posted 3 Jan 2006 21:26:10 UTC - in response to Message 24513.

For me it looks like when using Albert there is much more activity on my hard drive.
For example the client_state.xml is rewritten at least every three seconds now.
TTL


Above is from this message, so it looks like it's the client_state.xml file being written to.

____________

Profile Darren
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 24770 - Posted 4 Jan 2006 0:20:32 UTC - in response to Message 24762.
Last modified: 4 Jan 2006 0:46:20 UTC

Please, could you help us to identify which files are being modified? A simple way is to set your preferences to (say) 600 seconds, then monitor the timestamps of the files in projects/einstein.phys.uwm.edu/ and in slots/N/ to see which of these files is being written to more often than once every ten minutes.


I set my write to disk option for 600 seconds. Looking in the main boinc directory, the client_state.xml and client_state_prev.xml are updating at an interval varying between 3 and 5 seconds. In the slots directories (HT so there are 2 instances running), each slot's Fstat.out.ckp file is updating at varying intervals of 2 to 4 seconds.

I then suspended einstein to put it back to running seti, and confirmed that the state.sah file does write to disk at the correct interval as specified in the preferences.

After that, when going back to einstein, there was a 50 second delay in which there was no disk activity, then an entry was written to stderr.txt (says it was verifying the checksum for Fstat.out.ckp) after which it resumes the actual computation and again resumes writing the checkpoint file every 2 to 4 seconds.

(edit) - sorry, forgot to mention the project folder. Both result files (the text files bearing the same names as the work units being run) in the ~/projects/einstein.phys.uwm.edu folder are being updated at 2 to 4 second intervals.

Rojer
Joined: Apr 2 05
Posts: 23
ID: 68837
Credit: 1,765,892
RAC: 330
Message 24786 - Posted 4 Jan 2006 12:08:11 UTC

I wish you would cancel the limit of 16 WUs (max) per machine.
Some new WUs like this one (http://einstein.phys.uwm.edu/workunit.php?wuid=3147325)
only take a few minutes, so my PC will quickly have no work to do! My PC is on a local network, so sending more WUs at once would spare me the tedious transfers.


____________
Wish you can understand my English:)

KWSN-GMC-Peeper of the Castle Anthrax
Joined: Oct 6 05
Posts: 41
ID: 113412
Credit: 478,218
RAC: 784
Message 24787 - Posted 4 Jan 2006 12:09:54 UTC

My first Albert WU has crunched: 5:12:30, down from an average of 8:22. Quite an improvement. Congrats on a much improved app.
Oh yes, that's on an older Intel P4 Prescott, 2.8 GHz, 512 KB L2, 1 GB DDR400.
____________

Profile ragnar schroder
Joined: Mar 31 05
Posts: 29
ID: 68076
Credit: 205,493
RAC: 0
Message 24899 - Posted 5 Jan 2006 16:43:49 UTC - in response to Message 24762.
Last modified: 5 Jan 2006 16:44:44 UTC


For me it looks like when using Albert there is much more activity on my hard drive.



I have the same problem, initiated a thread about it in the Problems and Bug reports forum:

http://einstein.phys.uwm.edu/forum_thread.php?id=3513#24882

It bugs me quite a bit, hope someone has a solution.

Greetings, Mr Ragnar Schroder

Nothing But Idle Time
Joined: Aug 24 05
Posts: 158
ID: 103162
Credit: 289,204
RAC: 0
Message 24906 - Posted 5 Jan 2006 17:40:41 UTC - in response to Message 24770.

...
I set my write to disk option for 600 seconds. Looking in the main boinc directory, the client_state.xml and client_state_prev.xml are updating at an interval varying between 3 and 5 seconds. In the slots directories (HT so there are 2 instances running), each slot's Fstat.out.ckp file is updating at varying intervals of 2 to 4 seconds...


I'm running an Intel P4 w/HT, XP/SP2, 3 GHz, and specified in preferences to update every 3 minutes. I have been running Albert WUs (8 hours long) for some time. Client_State.xml and Prev..xml, as well as the chkp files in both Einstein slots, update every 3 minutes as specified. Lucky me.

Profile Darren
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 24910 - Posted 5 Jan 2006 18:01:13 UTC - in response to Message 24906.

...
I set my write to disk option for 600 seconds. Looking in the main boinc directory, the client_state.xml and client_state_prev.xml are updating at an interval varying between 3 and 5 seconds. In the slots directories (HT so there are 2 instances running), each slot's Fstat.out.ckp file is updating at varying intervals of 2 to 4 seconds...


I'm running an Intel P4 w/HT, XP/SP2, 3 GHz, and specified in preferences to update every 3 minutes. I have been running Albert WUs (8 hours long) for some time. Client_State.xml and Prev..xml, as well as the chkp files in both Einstein slots, update every 3 minutes as specified. Lucky me.


Yeah, it seems that it's specific to the Linux version of Albert, and it may only be happening with certain Linux kernels, though among the three people reporting it here, both the 2.4 and 2.6 kernels are represented.


____________

Robby
Joined: Jan 18 05
Posts: 29
ID: 3676
Credit: 145,639
RAC: 47
Message 25016 - Posted 6 Jan 2006 22:29:17 UTC

I have the same continuous 5-second disk access on all 3 Mandriva systems (2005LE and 2006.0) when Albert is running. I have General Preferences set to 60 seconds for disk updating. Looking in the boinc directory, client_state_prev.xml and client_state.xml are modified every minute. In slots/0, Fstat.out.ckp is shown as modified every minute, and in projects the WU r1_1112.5__1343_s4r2a_0_0 is shown as being modified every minute. These are the only files I have seen being updated so far. BOINC is running in my home directory, so I don't think it's necessary to look elsewhere.

Robby
Joined: Jan 18 05
Posts: 29
ID: 3676
Credit: 145,639
RAC: 47
Message 25031 - Posted 7 Jan 2006 0:16:03 UTC - in response to Message 25016.

I have the same continuous 5-second disk access on all 3 Mandriva systems (2005LE and 2006.0) when Albert is running. I have General Preferences set to 60 seconds for disk updating. Looking in the boinc directory, client_state_prev.xml and client_state.xml are modified every minute. In slots/0, Fstat.out.ckp is shown as modified every minute, and in projects the WU r1_1112.5__1343_s4r2a_0_0 is shown as being modified every minute. These are the only files I have seen being updated so far. BOINC is running in my home directory, so I don't think it's necessary to look elsewhere.


As a follow-up, I changed General Prefs to 300 sec for disk updating (taking a cue from Bruce), did an update, and verified the update in Messages. The same files are being modified at one-minute intervals while Albert is running, except that I now have a different WU, r1_1112.5__1337s4r2a_1_0, plus, as before, Fstat.out.ckp, client_state_prev.xml, and client_state.xml.

Bill Michael
Joined: Jul 27 05
Posts: 306
ID: 98119
Credit: 34,927
RAC: 0
Message 25032 - Posted 7 Jan 2006 0:20:57 UTC

There's a new version of the Linux app being sent out now - but still no word from the staff...

____________

Profile Darren
Avatar
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 25044 - Posted 7 Jan 2006 2:37:59 UTC - in response to Message 25032.

There's a new version of the Linux app being sent out now - but still no word from the staff...


I don't know what all the new app may address, but I just got some new work that uses it (albert 4.40) and the disk writes are now working at the proper time interval to match my preferences.


____________

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 25114 - Posted 7 Jan 2006 17:21:20 UTC
Last modified: 7 Jan 2006 17:21:40 UTC

The problem with the Linux version of the albert application writing to disk more frequently than set by user preferences has been fixed. A new Linux version of the app (4.40) is now being distributed which fixes this.
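Honoring the preference amounts to rate-limiting checkpoint writes. A minimal sketch, assuming a fixed interval taken from the user's disk-write preference; the method names echo BOINC's `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`, but this is not the BOINC API implementation. The clock is injectable so the logic can be tested without waiting.

```python
import time


class CheckpointThrottle:
    """Permit a checkpoint only after `interval` seconds since the last one."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = clock()  # treat start-up as the most recent checkpoint

    def time_to_checkpoint(self):
        return self.clock() - self.last >= self.interval

    def checkpoint_completed(self):
        self.last = self.clock()


# Typical use inside a compute loop (write_checkpoint is hypothetical):
#     if throttle.time_to_checkpoint():
#         write_checkpoint(state)
#         throttle.checkpoint_completed()
```

If the "is it time?" test is skipped or answered wrongly, the loop writes on every pass, which matches the 2 to 5 second rewrites reported above.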

For the technically inclined, this was a nasty bug in the interface between the BOINC API library and the Einstein application. An API function called boinc_is_standalone() was returning a boolean value, but the API header file declared it to return an integer. It took a number of hours to track this down!
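Why a bool/int mismatch misbehaves only on some builds is easier to see in a toy model. This sketch assumes, as on common x86 calling conventions, that a bool result is written only into the low byte of the return register while the upper bytes may still hold leftover data. It simulates the register with a 4-byte buffer and does not reproduce the actual BOINC or Einstein code; the thread does not say exactly how the misread value led to the extra disk writes.

```python
import struct


def callee_returns_bool(register: bytearray, value: bool) -> None:
    """Model a bool return: only the lowest byte of the register is written."""
    struct.pack_into("<?", register, 0, value)


def caller_reads_int(register: bytearray) -> int:
    """Model a caller whose header says `int`: it reads all four bytes."""
    return struct.unpack_from("<I", register, 0)[0]


def caller_reads_bool(register: bytearray) -> bool:
    """Model a caller with the matching `bool` declaration: it reads only the low byte."""
    return struct.unpack_from("<?", register, 0)[0]


# A 32-bit "return register" whose upper three bytes still hold stale data.
reg = bytearray(struct.pack("<I", 0xDEADBE00))
callee_returns_bool(reg, False)
print(hex(caller_reads_int(reg)))  # 0xdeadbe00: nonzero, so an int caller sees "true"
print(caller_reads_bool(reg))      # False
```

The function really answered "not standalone", but a caller trusting the `int` declaration reads garbage and may conclude the opposite, depending on whatever the register happened to contain.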
____________

Hans Sveen
Joined: Jan 18 05
Posts: 11
ID: 2717
Credit: 296,852
RAC: 395
Message 25130 - Posted 7 Jan 2006 21:33:07 UTC
Last modified: 7 Jan 2006 21:34:43 UTC

Hello!
I have been away from all BOINCing and the internet for almost a week (I had to change ISP) and missed some deadlines on several projects! I just wonder: aren't these Albert WUs compatible with my beta application, namely Einstein 0.18, which runs on host 4683? I have tried resetting the project, but no new workunits came this way.
Is it planned to go back to the "old" Einstein WUs some time? As far as I have read, the Alberts are here to stay!!


Ignore this post; I just found out via this thread ( http://einstein.phys.uwm.edu/forum_thread.php?id=2369 ) that I have to rename app_info.xml and restart BOINC to get Alberts!!

With regards, and keep up the good work!!


____________

Profile marion
Joined: Mar 18 05
Posts: 7
ID: 60009
Credit: 68,498
RAC: 0
Message 25193 - Posted 8 Jan 2006 22:47:25 UTC

No problems here downloading/crunching Alberts. They take less time than the other WUs, so I claim less credit.

BUT the computer I am paired with (well, the one that has been sent most of the Alberts I have had) hasn't returned its Alberts yet, so I must wait for credit... grr
____________
***
Nothing is impossible

DanNeely
Joined: Sep 4 05
Posts: 780
ID: 106636
Credit: 4,560,479
RAC: 9,003
Message 25195 - Posted 8 Jan 2006 23:02:45 UTC - in response to Message 25193.

No problems here downloading/crunching Alberts. They take less time than the other WUs, so I claim less credit.

BUT the computer I am paired with (well, the one that has been sent most of the Alberts I have had) hasn't returned its Alberts yet, so I must wait for credit... grr


Normally I get almost instant credit, since I've got a 4-day queue: 3 days to cover my ISP going down on a Friday evening and not being fixed until Monday (which has happened twice in the last 6 months), and one more day in case their sysadmin needs to overnight a spare part. It looks like the person you're waiting on has a similarly long queue.

It could be worse, after all. I've got 5 results waiting on a noob who appears to have quit after returning 6 errors in the last week of December, and a 6th waiting on another noob who only did a single work unit.


____________

Professor Ray
Joined: Feb 22 05
Posts: 11
ID: 30885
Credit: 30,077
RAC: 32
Message 25414 - Posted 11 Jan 2006 23:02:48 UTC - in response to Message 24696.


And on my older P3:

<core_client_version>5.2.8</core_client_version>
<stderr_txt>

2005-12-31 12:45:36.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2005-12-31 12:45:36.1250 [normal]: Started search at lalDebugLevel = 0
2005-12-31 12:45:36.8125 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2005-12-31 12:45:36.8125 [normal]: No usable checkpoint found, starting from beginning.
2005-12-31 12:50:57.9843 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ... done.
2005-12-31 16:24:34.0937 [normal]: Search finished successfully.

</stderr_txt>


Looks like normal operations to me. That is, I think the "No usable checkpoint found . . ." messages indicate the first time Albert started those particular WUs and looked for a checkpoint to resume from. Every Albert WU I have looked at has one of these messages. In other words, it is only a problem if a WU gets more than one of these messages.


Man, I don't know about that. The last three WUs I've processed have failed on me due to excessive CPU times. And these times are way out in space: 55 hours to completion? And the CPU time indicated at abort is a bunch of jive with respect to actual elapsed time. There's no way I could've processed a WU as long as is indicated at abort time.

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 25415 - Posted 11 Jan 2006 23:23:47 UTC
Last modified: 11 Jan 2006 23:24:12 UTC

An idea for the reduced "initial replication" part (I'm not sure if it is possible without a lot of work, though):

Fresh results of those workunits that have result entries marked "Over/No reply" could preferably be delivered to hosts with host.avg_turnaround < 3 days.
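The proposal can be expressed as a filter applied when picking a host for a resent result. A hypothetical sketch: the `Host` record, the `"no_reply"` outcome label, and the rule below are assumptions for illustration, modelled on the host.avg_turnaround field named in the suggestion (taken here to be in seconds); this is not the Einstein@Home scheduler's actual logic.

```python
from dataclasses import dataclass

DAY = 86400.0


@dataclass
class Host:
    host_id: int
    avg_turnaround: float  # seconds, analogous to host.avg_turnaround


def needs_fast_host(result_outcomes):
    """True if any earlier result of this workunit timed out ("Over / No reply")."""
    return any(o == "no_reply" for o in result_outcomes)


def eligible_hosts(hosts, result_outcomes, max_days=3.0):
    """Restrict resends of timed-out workunits to hosts that usually report back quickly."""
    if not needs_fast_host(result_outcomes):
        return list(hosts)
    limit = max_days * DAY
    return [h for h in hosts if h.avg_turnaround < limit]
```

Workunits without a timed-out result go to any host as before; only the ones that have already stalled once are steered toward quick returners.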

Michael Roycraft
Joined: Mar 10 05
Posts: 859
ID: 52156
Credit: 157,718
RAC: 0
Message 25416 - Posted 11 Jan 2006 23:38:05 UTC - in response to Message 25414.

Man, I don't know about that. The last three WU's I've processed have failed on me due to excessive CPU times. And these times are way out in space: 55 hours to completion? And the CPU time indicated at abort is a bunch of jive with respect to actual elapsed time. There's no way I could've processed a WU as long as is indicated at abort time.


Ray,

70-80 hours is way too long for your machine, especially considering the WUs weren't even completed in that time, unless there's some incompatibility with Win98/albert that I don't know about. I'd suspect either thermal throttling or something very CPU-intensive running alongside it. Anything you know of that might qualify?

Regards,

Michael

____________
microcraft
"The arc of history is long, but it bends toward justice" - MLK

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 25417 - Posted 11 Jan 2006 23:41:40 UTC

@Professor Ray :

Your results really do not look good; the messages indicate a problem.

- No heartbeat from core client for 31 sec - exiting
- Corrupted Fstat-file '...': has 2697271 bytes instead of 2700598


This is what I would do in this case :

- exclude the BOINC directory from being scanned by antivirus software
- while BOINC is not running, do a scandisk
- check the message board for known incompatibilities with Win9x


The plain "Maximum CPU time exceeded" error without additional messages might also be caused by an "over-optimized" BOINC client that reports too high a benchmark value. The maximum allowed CPU time isn't a constant; I think it is calculated from the benchmark values.
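The effect of an inflated benchmark comes down to one division. This sketch assumes the client derives a task's CPU-time limit by dividing the workunit's `rsc_fpops_bound` by the host's benchmarked floating-point speed (my understanding of the BOINC client of this period, not something stated in this thread); the bound and speeds below are made-up numbers.

```python
def max_cpu_seconds(rsc_fpops_bound: float, benchmark_flops: float) -> float:
    """CPU-time limit implied by a workunit's FLOP bound and the host's benchmark."""
    return rsc_fpops_bound / benchmark_flops


BOUND = 1e15                              # hypothetical rsc_fpops_bound for one workunit
honest = max_cpu_seconds(BOUND, 1e9)      # limit with a realistic benchmark
inflated = max_cpu_seconds(BOUND, 4e9)    # limit with a benchmark reported 4x too high
print(honest / 3600, inflated / 3600)     # limits in hours
```

Under this assumption a benchmark inflated four times leaves only a quarter of the CPU time, so a WU that would normally finish can be aborted partway through with no other error in its log.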

Professor Ray
Joined: Feb 22 05
Posts: 11
ID: 30885
Credit: 30,077
RAC: 32
Message 25418 - Posted 11 Jan 2006 23:55:27 UTC
Last modified: 12 Jan 2006 0:40:35 UTC

Nope, doesn't make any sense.

As is evident from my profile I've accum'd almost 4K credits w/EAH. I'm engaged in three other BOINC science applications, and except for a recent Rosetta hiccup there are no other problems. Rosetta completed the last two WU's w/out issue. Concurrently with BOINC applications, I'm processing UD Agent (Rosetta and/or LigandFit). I'm getting a mean time between UD Agent checkpoints of about 59 minutes with 1 STD being 1:21:00 over a period of 300 checkpoints. This is reasonable performance for UD Agent (and is why I bowed out of WCG processing, i.e., checkpointing for that BOINC application is non-deterministic).

Task switching between BOINC applications occurs about every 3:20:00, and write to disk is every 0:01:00. That should ensure at least one iteration of each application once per CPU wake period.

As far as CPU intensive processing: there's nothing going on. When I desire to launch one of my sims (Falcon4.0 or F1 2002), I wait for UD Stats to show a recent checkpoint, and then I suspend/snooze both BOINC and UD Agent. The rest is just normal IE browsing/Outlook Express.

I'm perceiving a problem with Albert (and this appears to have started just around New Year's).

I'm running the default 5.13 BOINC, albeit with an optimized SETI application (that shouldn't affect EAH, though). I am OC'd at 112 FSB, running PC133 ECC SDRAM async at 4/3. But that hasn't changed either. What HAS changed is Albert.

It could be that my box is dying, i.e., I'm running a Slot 1 P3 on a P3V4X, and HD00 (which NEVER spins down because of SpyBot's Tea Timer) is getting long in the tooth at 5 years. The CPU is cooled with a Vantec P35030 dual-fan CPU cooler (shimmed with Arctic Silver). The P3V4X clock generator has an Arctic Silver-shimmed passive (486) heat sink (as does the Northbridge). If my system is dying, it's dying selectively (only with respect to EAH).

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25424 - Posted 12 Jan 2006 2:48:58 UTC

My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;)

(I never use screen saver or graphics).

Reason: Access Violation (0xc0000005) at address 0x0045CB31 read attempt to address 0x00000000

I never knew the application could read the top part of my memory. I thought it was in use by Windows. :)
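The two numbers in the report are the same value: BOINC prints the exit code as a signed 32-bit integer, and reinterpreting it as unsigned gives the Windows status 0xC0000005 (STATUS_ACCESS_VIOLATION). The faulting read at address 0x00000000 is at the very bottom of the address space, the typical signature of a null-pointer dereference. A small conversion helper (the function name is illustrative):

```python
def ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex status Windows reports."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(ntstatus_hex(-1073741819))  # 0xC0000005
```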
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Michael Roycraft
Joined: Mar 10 05
Posts: 859
ID: 52156
Credit: 157,718
RAC: 0
Message 25425 - Posted 12 Jan 2006 3:06:31 UTC - in response to Message 25424.

My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;)

(I never use screen saver or graphics).

Reason: Access Violation (0xc0000005) at address 0x0045CB31 read attempt to address 0x00000000

I never knew the application could read the top part of my memory. I thought it was in use by Windows. :)


Jord,

Hey, mate! As the other half of the "Graphics Bug" tag-team, I guess that leaves me off the case, too, since it's equally unlikely to be a graphics adaptor driver issue. :-)

Michael

____________
microcraft
"The arc of history is long, but it bends toward justice" - MLK

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25426 - Posted 12 Jan 2006 3:09:49 UTC - in response to Message 25424.
Last modified: 12 Jan 2006 3:33:21 UTC

My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;)

(I never use screen saver or graphics).

Reason: Access Violation (0xc0000005) at address 0x0045CB31 read attempt to address 0x00000000

I never knew the application could read the top part of my memory. I thought it was in use by Windows. :)


Maybe you should try the Beta application! ;-)

Actually, I happened to find a similar result last week and posted this message on the NEW: WINDOWS TEST APPLICATION FOR EINSTEIN@HOME board.

I have to admit that Jord used the more appropriate venue.

Edited - to improve the humor (maybe).



____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25454 - Posted 12 Jan 2006 13:42:33 UTC
Last modified: 12 Jan 2006 13:44:09 UTC

Wow... 6 in a row?? All with the same error. Anyone?

I stopped BOINC already, restarted it, did a reboot. Or am I getting the bad batch on purpose? ;)

edit: 8 in a row now. Einstein is at No New Work until I figure out what's happening here. No need to blow through the other 8 units.
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Michael Roycraft
Joined: Mar 10 05
Posts: 859
ID: 52156
Credit: 157,718
RAC: 0
Message 25458 - Posted 12 Jan 2006 14:13:07 UTC - in response to Message 25454.

Wow... 6 in a row?? All with the same error. Anyone?

I stopped BOINC already, restarted it, did a reboot. Or am I getting the bad batch on purpose? ;)

edit: 8 in a row now. Einstein is at No New Work until I figure out what's happening here. No need to blow through the other 8 units.


Jord,

Maybe time to consider backing off that 5.3.6 to an approved client?

Michael

____________
microcraft
"The arc of history is long, but it bends toward justice" - MLK

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25462 - Posted 12 Jan 2006 14:20:32 UTC

Why? It's an alpha client, I am an alpha tester.

Besides, everything worked fine until the first Albert crashed early this morning. Check the rest of the list. Seti, Seti Beta, uFluids and Primegrid have no problem either. It's only these Albert results which want to address the top of my memory all of a sudden. They blink out as soon as they start.
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Michael Roycraft
Avatar
Joined: Mar 10 05
Posts: 859
ID: 52156
Credit: 157,718
RAC: 0
Message 25464 - Posted 12 Jan 2006 14:29:31 UTC - in response to Message 25462.

Why? It's an alpha client, I am an alpha tester.

Besides, everything worked fine until the first Albert crashed early this morning. Check the rest of the list. Seti, Seti Beta, uFluids and Primegrid have no problem either. It's only these Albert results which want to address the top of my memory all of a sudden. They blink out as soon as they start.


Jord,

I'd forgotten that you're doing alpha work, sorry. Thank you for the sacrifices in the name of progress.

Michael

____________
microcraft
"The arc of history is long, but it bends toward justice" - MLK

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25465 - Posted 12 Jan 2006 14:29:31 UTC - in response to Message 25458.

Wow... 6 in a row?? All with the same error. Anyone?

I stopped BOINC already, restarted it, did a reboot. Or am I getting the bad batch on purpose? ;)

edit: 8 in a row now. Einstein is at No New Work until I figure out what's happening here. No need to blow through the other 8 units.


Jord,

Maybe time to consider backing off that 5.3.6 to an approved client?
Michael


Or, maybe you should start using graphics. ;-)

I have some observations (but no idea what the problem is). I noticed that all but the first WU failed immediately (never even got to the "No usable checkpoint . . ." stage). Makes me wonder if there is a residual value from the first unit somewhere in Albert or BOINC that needs to be reset. Also, the other result (which I pointed to in my previous post here) was an isolated single failure (and it was under BOINC 5.2.13).
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25466 - Posted 12 Jan 2006 14:38:49 UTC

I did a full system shutdown.
Waited a minute.
Rebooted.
Logged on to Windows.
Opened Boinc Manager (boinc is a service).
Allowed work from Einstein.

But:

12/01/2006 15:35:15|Einstein@Home|Message from server: No work sent
12/01/2006 15:35:15|Einstein@Home|Message from server: (reached daily quota of 7 results)
12/01/2006 15:35:15|Einstein@Home|No work from project

Yep: Maximum daily WU quota per CPU: 7/day

Oh well. :-D

Wasn't there a quota of 16, though? I know I spoiled 8...
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25467 - Posted 12 Jan 2006 14:57:22 UTC - in response to Message 25466.

Wasn't there a quota of 16, though? I know I spoiled 8...


I just counted nine of them.

____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25468 - Posted 12 Jan 2006 15:01:08 UTC

You think Friday the 13th is bad... I think Thursday the 12th is. :)

____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25469 - Posted 12 Jan 2006 15:17:49 UTC - in response to Message 25468.
Last modified: 12 Jan 2006 15:18:40 UTC

You think Friday the 13th is bad... I think Thursday the 12th is. :)


Given the way you count, it may already be Friday the 13th where you are (or maybe it's only Wednesday the 11th). :)

BTW: Have you thought about e-mailing your "std*" files to Walt yet? He'll probably want to look at them. Here is an old message with some instructions on how to do that.

____________

Profile Ageless
Avatar
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 25471 - Posted 12 Jan 2006 15:49:40 UTC

I've contacted Bruce on it. He's very excited. I think he complimented me: "a reproducible bug from a BOINC expert and all-around computer geek!". :-)

So now I'm waiting for Walt to contact me.
____________
Jord

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25476 - Posted 12 Jan 2006 17:21:41 UTC
Last modified: 12 Jan 2006 17:26:29 UTC

the exit code -1073741819 (0xc0000005)


Another new report of the same problem is over here. Host = 481640.
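The two numbers in that error report are the same value. Windows records the exit code as a signed 32-bit integer, and reinterpreting it as unsigned gives 0xC0000005, the Windows access-violation status. A minimal Python sketch of the conversion (the helper name `exit_code_to_hex` is just for illustration):

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps a negative 32-bit value to its unsigned
    # two's-complement equivalent (here 4294967296 - 1073741819).
    return f"0x{code & 0xFFFFFFFF:08x}"


print(exit_code_to_hex(-1073741819))  # prints 0xc0000005
```

So any result listed with exit code -1073741819 is the same 0xc0000005 failure discussed in this thread.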
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 25478 - Posted 12 Jan 2006 17:35:20 UTC

Don't forget: returning errors lowers the quota...
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25498 - Posted 13 Jan 2006 0:08:25 UTC - in response to Message 25476.

the exit code -1073741819 (0xc0000005)


Another new report of the same problem is over here. Host = 481640.


I happened to find another example of this problem from the owner of this thread.

____________

Fletcher G. Hawkins
Joined: Nov 24 05
Posts: 2
ID: 126623
Credit: 78,551
RAC: 133
Message 25590 - Posted 14 Jan 2006 14:55:11 UTC

I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.)

All that said, I have been with SETI for years, but only just found out about the others when the changeover to BOINC came about. I elected to start with just Einstein in addition to SETI until I could get used to BOINC. I have completed and submitted one Einstein unit, but I'm not getting any new data to work on. My requests for further work go unfilled. Is it something I'm doing, or is the site locked out pending elimination of a bug?
____________

Profile adhc.com.au
Avatar
Joined: Feb 11 05
Posts: 189
ID: 15621
Credit: 108,902
RAC: 0
Message 25596 - Posted 14 Jan 2006 15:37:18 UTC - in response to Message 25590.

[snip]
...I have completed and submitted one Einstein unit, but I'm not getting any new data to work on. My requests for further work go unfilled. Is it something I'm doing, or is the site locked out pending elimination of a bug?


It may be your "Average turnaround time" of 7.1 days. If you were supplied with 3 work units, your PC would be overcommitted.
14 days is the maximum time allowed to complete a work unit. Even now, you have two, and 2 × 7.1 = 14.2 days.
It may work out as you near completion of the work unit you're crunching now.
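The arithmetic above can be sketched as a simple check. This is a simplification, assuming queued units are crunched one after another at the host's average turnaround time; BOINC's real work-fetch logic is more involved, and `fits_deadline` is a hypothetical helper, not a BOINC function.

```python
DEADLINE_DAYS = 14.0  # maximum time allowed per work unit, as stated above


def fits_deadline(queued: int, turnaround_days: float,
                  deadline_days: float = DEADLINE_DAYS) -> bool:
    # Assumption: units run sequentially, so the last queued unit
    # finishes after queued * turnaround_days.
    return queued * turnaround_days <= deadline_days


print(fits_deadline(2, 7.1))  # False: 2 * 7.1 = 14.2 days > 14
print(fits_deadline(1, 7.1))  # True
```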
____________


Join the #1 Aussie Alliance on Einstein

Profile Jim Baize
Avatar
Joined: Jan 22 05
Posts: 116
ID: 5775
Credit: 141,226
RAC: 97
Message 25661 - Posted 15 Jan 2006 14:39:07 UTC - in response to Message 25590.

I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.)

All that said, I have been with SETI for years, but only just found out about the others when the changeover to BOINC came about. I elected to start with just Einstein in addition to SETI until I could get used to BOINC. I have completed and submitted one Einstein unit, but I'm not getting any new data to work on. My requests for further work go unfilled. Is it something I'm doing, or is the site locked out pending elimination of a bug?


It would help if you could give us some information. For example, could you give us a copy of the exact error message? It would also help to know things like the type of computer, OS, BOINC version, the "connect every 'x' days" setting, and other similar items.

Welcome to BOINC and to Einstein. I hope we can get this problem fixed for you. There is a whole new world of projects for you to enjoy.

Jim

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 25666 - Posted 15 Jan 2006 16:46:21 UTC

Fletcher,

You would be best served by opening a new thread with a description of the problem and the system, as Jim suggested. Then your solution does not get mixed in with other things.
____________

Stick
Joined: Feb 24 05
Posts: 786
ID: 36935
Credit: 209,452
RAC: 183
Message 25921 - Posted 20 Jan 2006 13:36:12 UTC - in response to Message 25498.
Last modified: 20 Jan 2006 13:37:53 UTC



This message reports a completely different problem, but if you check the results from his computer, you will see that all of them have failed with exit code -1073741819 (0xc0000005) and his quota is now down to one.
____________

DMansell
Joined: Feb 25 05
Posts: 4
ID: 39128
Credit: 75,776
RAC: 0
Message 26017 - Posted 21 Jan 2006 15:06:37 UTC

I am getting errors on one of the two PCs running BOINC; see my original info in Message 25473.
I checked for and upgraded to the latest video driver, changed my non-BOINC screensaver to "Blank", and am still failing results. I am running an IBM R50, 1.4 GHz, 256 MB RAM, XP without SP2.
By the way, it only started when I added a third project, Rosetta, and both Rosetta and Einstein fail. Rosetta fails much less frequently, and SETI NEVER fails. The error also occurs just as the unit is almost 100% complete.
My home system, running SETI and Einstein at the same versions, is flawless.

I may try Rosetta on my home machine to see if units start failing. Maybe two's company and three's a crowd?
Doesn't it seem strange that the instruction would call its own memory location to be read?

Thanks
DaveM
____________

Profile Paul D. Buck
Joined: Jan 17 05
Posts: 727
ID: 2081
Credit: 852,192
RAC: 1,130
Message 26020 - Posted 21 Jan 2006 17:14:07 UTC

The first thing I would try is to suspend Rosetta@Home and see if the problem persists. There may be an interaction between the two projects, though I don't see how.
____________

Profile Darren
Avatar
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 26025 - Posted 21 Jan 2006 17:50:13 UTC - in response to Message 26017.

... By the way, it only started when I added a third project, Rosetta, and both Rosetta and Einstein fail. ... Doesn't it seem strange that the instruction would call its own memory location to be read?


If you're also running Rosetta, be sure that the "leave applications in memory while preempted" option in your general preferences is set to "yes". Rosetta requires this. I don't know whether it is related, but perhaps if Rosetta isn't left in memory, it isn't properly releasing something, and then Einstein and Rosetta conflict over a memory address.


____________

[BOINCstats] Garindan
Joined: Feb 20 05
Posts: 1
ID: 23721
Credit: 79,644
RAC: 0
Message 26122 - Posted 24 Jan 2006 5:51:14 UTC

Just to report from here:

I am running Windows XP with multiple projects (Einstein, SETI, LHC and Rosetta) on an AMD XP 3000 processor, and I have had no trouble at all!

And I've been running Albert units for some time now.

Good work!
____________

Profile Jim Baize
Avatar
Joined: Jan 22 05
Posts: 116
ID: 5775
Credit: 141,226
RAC: 97
Message 26136 - Posted 24 Jan 2006 18:30:40 UTC

Another thing to try would be to run a memory test application to see if your memory is still good. One such application is memtest86 (if memory serves me correctly).

Profile KB7RZF
Avatar
Joined: May 8 05
Posts: 124
ID: 79785
Credit: 38,751
RAC: 0
Message 26155 - Posted 25 Jan 2006 4:54:03 UTC

I had an Albert WU that sat for almost 4 1/2 hours at 0%, with no progress at all. I aborted the WU. The WU is here.

Jeremy
____________

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Avatar
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 26165 - Posted 25 Jan 2006 14:06:37 UTC - in response to Message 26155.

I had an Albert WU that sat for almost 4 1/2 hours at 0%, with no progress at all. I aborted the WU. The WU is here.

Two other users have successfully completed the same WU, so there is probably nothing intrinsically wrong with it.
____________

Profile KB7RZF
Avatar
Joined: May 8 05
Posts: 124
ID: 79785
Credit: 38,751
RAC: 0
Message 26172 - Posted 25 Jan 2006 15:59:24 UTC - in response to Message 26165.

I had an Albert WU that sat for almost 4 1/2 hours at 0%, with no progress at all. I aborted the WU. The WU is here.

Two other users have successfully completed the same WU, so there is probably nothing intrinsically wrong with it.


Hmm, I dunno why it sat there for 4 1/2 hours with no progress at all. I'll watch the next ones I've got and see.

Thanks Bruce.

Jeremy

DMansell
Joined: Feb 25 05
Posts: 4
ID: 39128
Credit: 75,776
RAC: 0
Message 26245 - Posted 27 Jan 2006 19:44:36 UTC - in response to Message 26025.

Thanks, I will try this. SETI never errors. On one system I had Einstein and SETI running, and it was fine; when I added Rosetta, both Einstein and Rosetta started failing. On another system I had SETI and Rosetta running fine; then I added Einstein, and both Rosetta and Einstein fail.
One system is 1.4 GHz, the other is 600 MHz. I have tested the RAM and it is okay.

... By the way, it only started when I added a third project, Rosetta, and both Rosetta and Einstein fail. ... Doesn't it seem strange that the instruction would call its own memory location to be read?


If you're also running Rosetta, be sure that the "leave applications in memory while preempted" option in your general preferences is set to "yes". Rosetta requires this. I don't know whether it is related, but perhaps if Rosetta isn't left in memory, it isn't properly releasing something, and then Einstein and Rosetta conflict over a memory address.



____________

Fletcher G. Hawkins
Joined: Nov 24 05
Posts: 2
ID: 126623
Credit: 78,551
RAC: 133
Message 26320 - Posted 30 Jan 2006 4:39:59 UTC - in response to Message 25661.

I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.)

All that said, I have been with SETI for years, but only just found out about the others when the changeover to BOINC came about. I elected to start with just Einstein in addition to SETI until I could get used to BOINC. I have completed and submitted one Einstein unit, but I'm not getting any new data to work on. My requests for further work go unfilled. Is it something I'm doing, or is the site locked out pending elimination of a bug?


It would help if you could give us some information. For example, could you give us a copy of the exact error message? It would also help to know things like the type of computer, OS, BOINC version, the "connect every 'x' days" setting, and other similar items.

Welcome to BOINC and to Einstein. I hope we can get this problem fixed for you. There is a whole new world of projects for you to enjoy.

Jim


You and Mike give me far too much credit. It took me two weeks just to find my way back to this board.

When I tried to copy the error message for you, my usual update request did not go unfilled this time(!), so I only have my imperfect memory to go on: something like "scheduler request: not receiving new workunits or ...posting?... results". I can't remember exactly. The only thing I remember tinkering with was the... response time?... I changed something from 0.1 day to 1.0 day. But that was several manual requests ago.

Anyway, I'm crunching again! Thanks!

Fletcher
____________


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration