New (Albert) application and workunits

Message boards: Cruncher's Corner

I wanted to tell the dedicated crunchers a bit about the new application (called 'Albert') and workunits that I started testing on the public project today.

1) We wouldn't have been lucky enough to get FLOPS counting this time around, would we?

No, but I'll take a quick look at the API and implement this if it's easy.

2) Are the improved run times from optimized compiles for Windows?

The compilation process is no more and no less optimized than before. The differences in run times come about because we are now using a sky search grid and frequency band which depend upon frequency. This makes it impossible for all workunits to be the same length.

3) Is the Mac version still using Altivec?

Yes, the Mac version still uses Altivec optimization if the CPU supports the Altivec instruction set.

I got one running right now, by the way (thanks), and it is hard to tell over RealVNC, but the graphics look a little "prettier". Estimated run time is ~3 hours, so that looks like about 25% of the prior (though I am only 13% through).

If you have a real-time clock in the upper right-hand corner of the screensaver/graphics screen and the wording in the corners has a slightly cleaner layout, then yes, you are running 'Albert'.

[EDIT 25 December, questions from various people]

Will we be switching back and forth between Einstein and Albert apps?

Yes, for some time, until we are sure that the Albert app is working as required.

Does the Albert application have its own number (like 4.80) or is it still 4.79?

The Albert app has its own number and name. You will know you are running it by seeing the name of the application in the BOINC Manager or in the title bar of the graphics window. See the list of applications for more info.

Is there any way we can download the new Albert application?

No. What work (and hence what application) your computer gets is determined by chance. The 'scheduler' decides this when work is sent out.

[EDIT December 27]

Is it intentional that the target number of results is three rather than the old value of four?

Yes, this is intentional. It may slow down result validation in some cases but will increase our computing power by ~25%.

ID: 24094

Linux optimisation seems to have got even worse:

ID: 24188

> Linux optimisation seems to have got even worse:

Check again. The 18498s result also indicates a Linux OS.

ID: 24190

Just noticed that "initial replication" is set to 3, instead of 4 for the old application. Was that intentional?
Michael

ID: 24192

Oops, you're right. Why the difference then?

ID: 24193

> Number of CPUs = 1, = 2 ...

One is most likely HT and the other is not. HT gives you 2 logical processors but does not give 2x speed. I see 20-40% better THROUGHPUT at a loss of individual processing time; the tasks take longer ...

ID: 24197
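The hyper-threading trade-off above can be put into numbers. A minimal sketch, assuming two tasks share one physical core through its two logical CPUs and together finish a fixed fraction more work per hour than one task alone (the 20-40% range from the post); the 10-hour task and the function name `ht_timings` are illustrative, not measured.

```python
from typing import Tuple


def ht_timings(single_task_hours: float, throughput_gain: float) -> Tuple[float, float]:
    """Return (wall-clock hours per task, average hours per finished task) with HT on.

    Assumption: two tasks share one physical core through its two logical
    CPUs, and together they complete (1 + throughput_gain) times as many
    tasks per hour as a single task running alone on that core.
    """
    combined_rate = (1.0 + throughput_gain) / single_task_hours  # tasks per hour, both logical CPUs
    per_task_hours = 2.0 / combined_rate  # each logical CPU needs this long for its own task
    effective_hours = 1.0 / combined_rate  # throughput view: core hours per finished task
    return per_task_hours, effective_hours


# A hypothetical task that takes 10 h when it has the whole core to itself.
for gain in (0.2, 0.3, 0.4):
    each, effective = ht_timings(10.0, gain)
    print(f"gain {gain:.0%}: each task {each:.1f} h, one result every {effective:.1f} h")
```

At a 30% gain each task takes about 15.4 h instead of 10 h, yet a result is finished roughly every 7.7 h: longer individual times, better throughput.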

> Just noticed that "initial replication" is set to 3, instead of 4

I've processed one Albert unit so far, and its "initial replication" was also 3, so my guess is it was intentional. But, getting back to this unit, I noticed the "failed" result's computer is still using BOINC 4.19. Is BOINC 4.19 "too old" for Albert, or was this just a coincidence?

ID: 24199

Can't find the minimum requirement any longer. But if the BOINC software was out of date, the work should not have been issued. Still, this may need project attention. Did they test Albert with 4.19?

ID: 24200

Daily quota problems with 'Albert'..... please see 'Problems and Bug Reports' for details..... Cheers, Rog.

ID: 24213

3 is a good idea. 4 is a big waste of resources, because a lot of WUs are done with 3 valid results and the fourth is completed for nothing.

ID: 24215

to Paul

ID: 24218

Hmmm, with BOINC View I have lots of progress bars ...

ID: 24223

A screenshot from a new 'Albert' may interest Einstein's participants, I guess. Anyone...?

ID: 24229

> A screenshot from a new 'Albert' may interest Einstein's participants, I guess. Anyone...?

Without a sample link, how do I know if what you had was interesting?

ID: 24233

> Without a sample link, how do I know if what you had was interesting?

Sorry Paul, I wasn't clear enough: I was actually asking if someone can provide such a screenshot...

ID: 24234

> Sorry Paul, I wasn't clear enough: I was actually asking if someone can provide such a screenshot...

I made a screenshot of the new Mac graphics but can't figure out how to upload it or post it as a message. Feeling stupid in SW China :)

ID: 24236

> I made a screenshot of the new Mac graphics but can't figure out how to upload it or post it as a message. Feeling stupid in SW China :)

Once you have got the image on a server (there are free web-hosting providers for personal/non-commercial use), you need to use BOINC forum tags. For example, use the tags shown in this post (click the 'Reply to this post' button to see the code).

ID: 24238

> Sorry Paul, I wasn't clear enough: I was actually asking if someone can provide such a screenshot...

E-mail it to me and I will post it and link it... p.d.buck@comcast.net

ID: 24250

Today I got 4 Albert WUs, and all 4 resulted in "Client error".

ID: 24267

> Any ideas what it is all about and how I could fix it?

I would try to reset the project. It will create a new HostID and trigger a fresh download of Einstein's project files.

ID: 24272

There are a number of reasons you can get MD5/signature failures. These include bad connections to the project (which only makes sense if you have similar issues with the project for the "normal" work units), or the project may have generated "bad" work unit files.

ID: 24290
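One way to tell a bad transfer from a bad file is to recompute a file's MD5 locally and compare it with the checksum the project advertises. A minimal sketch, assuming you already know the expected checksum; where BOINC stores that value, and the separate signature check on application files, are not modelled here, and the function and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_looks_intact(path, expected_md5):
    """Compare a file against an expected MD5 (case and surrounding whitespace ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demo with a throwaway file standing in for a downloaded work unit file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_input"  # hypothetical file name
    sample.write_bytes(b"hello")
    good = download_looks_intact(sample, "5d41402abc4b2a76b9719d911017c592")
    with open(sample, "ab") as fh:
        fh.write(b"!")  # simulate a corrupted transfer
    corrupted = download_looks_intact(sample, "5d41402abc4b2a76b9719d911017c592")
```

If the same file fails repeatedly on fresh downloads from different connections, the "bad file on the server" explanation becomes more likely than a bad connection.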

Thank you Honza, thank you Paul. I'll see what happens today with this machine (if it receives Albert WUs again). Will report here if anything new happens.

ID: 24295

Here is a report of what happened today with this machine...

ID: 24323

I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...

ID: 24328

Hmmm... we will see. The strange part is that all of a sudden I just can't make this machine crunch E@H at all (Einstein or Albert). And it all started with the first Albert WU. Could that first Albert WU somehow have made a mess of the E@H-related files?

ID: 24330

> Hmmm... we will see. The strange part is that all of a sudden I just can't make this machine crunch E@H at all (Einstein or Albert). And it all started with the first Albert WU. Could that first Albert WU somehow have made a mess of the E@H-related files?

Edo, if you have "failed" a bunch of WUs, it could be that your Einstein quota has been reduced, and therefore you just can't get any work right now. You might just have to wait until tomorrow for a new WU.
Stick

ID: 24332

Stick, yes, I know I reached the daily quota limit, but before that I wasn't able to download the Albert app after I reattached the project. Now I have two WUs registered as being in the crunching phase, but they were not:
http://einstein.phys.uwm.edu/workunit.php?wuid=3116299
http://einstein.phys.uwm.edu/workunit.php?wuid=3094170
Both of those WUs just weren't able to start processing. I uninstalled BOINC on that machine and will try again from the beginning tomorrow. Will see if it works.

ID: 24337

> Stick, ...

Edo, sorry! (I hadn't read all your earlier posts before now.) Are the WUs cited above yours? And are you still using BOINC 4.19? If so, did you see my earlier post here? No one has answered my question yet, but it could be that BOINC 4.19 is the problem.

If it's you who needs to upgrade, remember that BOINC 5.2.13 will not install over v4.19. You need to remove v4.19 using the Windows "Add/Remove" Control Panel, then install v5.2.13. The Windows "Remove" should leave your BOINC folder with your account info intact.
Stick

Edit: Or, if some of your other projects don't support BOINC 5.2.x yet, you could upgrade to BOINC 4.45 (directly from v4.19).

ID: 24338

Edo,

ID: 24344

Stick,

ID: 24347

`12/29/2005 1:24:13 AM|Einstein@Home|Requesting 43200 seconds of new work`

ID: 24358

> I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...

Einstein@Home preferably sends results to users that already have the corresponding large input file; for this reason, a gap of a couple of days between sending out the first and the last result of a WU is normal.

ID: 24404

> I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...

Quite bizarre ... especially as there are many mixed into the sequence where all 3 (or 4) are sent virtually simultaneously. Well, weird anyway ... then again, I never claimed to understand how all this stuff works ...

ID: 24412

You are underselling here. Say you have 12 users. They used to crunch 3 different WUs with a replication of 4; now they crunch 4 different WUs, replicated 3 times. Throughput going up from 3 to 4 is a 33% rise.

Thank you very much for this change; it is one I asked for, as did many others, though not nearly as many as asked for the real-time clock in the screensaver ;-)

ID: 24425
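The ~25% in the first post and the 33% here measure different things, so both can be right. Going from 4 to 3 results per WU cuts the computing spent on each WU by a quarter, while the same pool of results now covers 4/3 as many WUs. A small sketch of both figures, assuming every issued result is returned and counted; the function name is hypothetical.

```python
from fractions import Fraction


def replication_change(old_replication, new_replication):
    """Return (saving per WU, throughput gain) when replication is reduced.

    Assumes every issued result is returned and used, so the work spent on
    a WU is proportional to the number of results sent for it.
    """
    saving = 1 - Fraction(new_replication, old_replication)  # fewer results per WU
    gain = Fraction(old_replication, new_replication) - 1  # more WUs for the same results
    return saving, gain


hosts = 12  # the example above: 12 users, one result each
old_wus = hosts // 4  # replication 4 -> 3 distinct WUs
new_wus = hosts // 3  # replication 3 -> 4 distinct WUs
saving, gain = replication_change(4, 3)
print(f"{old_wus} -> {new_wus} WUs; cost per WU down {float(saving):.0%}, throughput up {float(gain):.0%}")
```

Using `Fraction` keeps the two results exact (1/4 and 1/3) instead of rounding them early.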

> Of course, with the better versions out there I have no idea why anyone would still use 4.19 ... :)

On a low-resolution screen the 4.19 manager made much better use of screen space than anything since. Those buttons are so enormous that on 640x480 you can't even attach to a project, because you can't reach the button. Talk about shunning users of old equipment. Why the next manager can't drop the buttons and use right-click context menus beats me; it's how it should have been done in the first place: let the OS decide how to fit the menu on the desktop when it is clicked. It would also make the manager accessible to those who need very large font sizes. Come on, fixed-layout interfaces should be a no-no.

Also, some people liked the graphics slider showing progress (still available via BOINCview, by the way).

But those are the only areas where 4.19 still wins. IMO there is no advantage at all to the 4.19 client.

And just to be clear: no, I don't still use it. The advantages of later clients encouraged me to upgrade at whichever point EDF was working reasonably sensibly.
R~~

ID: 24426

> I still suspect something with the servers, in that I have a number of work units listed in my account where one of the three results is listed as "Unsent"; in one, two of them are "Unsent". How a quorum is going to be formed with that one is beyond me ...

I explained this just 213 days ago and some people have forgotten already ;-)

Seriously Paul, one day I will keep my promise to start contributing to the wiki, but in the meantime if you'd like to find a place for this it would be great.

My guess is that this is a non-deliberate side effect of other scheduling rules. The patent on all such unintended side effects is held by Murphy. Consider:

Rule 1: wherever possible, assign work from the data the client already holds.

Rule 2: don't assign consecutive WUs to the same pairings of computers.

Rule 1 reduces download times, which are bad enough on E@H anyway. Rule 2 means that redundancy is spread out, to reduce the chances of two computers repeatedly making the same mistake on the same WU. Let me be clear: I don't know that rule 2 exists; I am 'reverse engineering' it from what seems to happen. Rule 1 certainly exists and is also known as locality scheduling (thanks to John Keck for that).

Now suppose A (by luck) is the first computer to be assigned work from a new dataset. Eventually along comes B, who has no more WUs to be assigned from their old data, and they are assigned WUs from the same dataset as A. Because of rule 2, B will only be assigned one WU that is shared with A. B's next WU after that will be a different WU from the same dataset. Meanwhile A may well want a second WU. Then along come C, D and E; each will share at most one WU with any other computer.

We might have this picture just after G gets their first WU from this dataset. With an initial allocation of 4 we get:

| WU | Results issued to |
|---|---|
| wu 1 | A, B, C, D |
| wu 2 | B |
| wu 3 | A, E, F, G |
| wu 4 | A |
| wu 5 | B, E |
| wu 6 | A |
| wu 7 | C, E |
| wu 8 | B, F |
| wu 9 | A |
| wu 10 | D, F |
| wu 11 | C |
| wu 12 | B |
| wu 13 | A |
| wu 14 | D |
| wu 15 | C |
| wu 16 | B |
| wu 17 | A |
| wu 18 | C |
| wu 19 | D |

Eventually there are enough people on board that all WUs get all their results issued close together. It is only around the startup of a new data file or a new app that I'd expect to see this kind of effect.

Question for a mathematician: what is the smallest N such that N results can be given to 4N people and no result is given to the same two people as any other result? After N results you would expect the issuing of results to start looking sensible instead of all over the place. But notice that the very first WU, and in fact several others along the way, will get all their results sent out together even while others are kept in solo-crunch state for ages.

Bruce: from the project point of view, the thing to notice is that you will not get good turn-round on very small batches of WUs if your servers are keeping them back for those who've already seen those datasets. That N seems to define the minimum batch size, if, of course, my guess is right about the scheduler's decision rules. If the rules are different, it just needs someone to dry-run on paper how many WUs/hosts need to go through the process before there are enough returning hosts to make the WUs fly out the door nicely.
R~~

ID: 24427
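River's two rules can be replayed as a toy model. This is not the project's scheduler: rule 2 is his reverse-engineered guess, every host is assumed to already hold the dataset (rule 1), and the function names and request order are hypothetical. It only shows how such rules can leave one WU with a single result issued while its neighbour is already complete.

```python
from itertools import combinations


def assign(requests, n_wus, replication):
    """Replay work requests from hosts against the WUs of a single dataset.

    Rule 1 (locality) is assumed: every requesting host works on this dataset.
    Rule 2 (hypothetical, as guessed in the post): two hosts never share more
    than one WU. Each request gets the first WU, in order, that is not yet
    fully issued, not already held by the host, and does not break rule 2.
    """
    issued = {wu: [] for wu in range(1, n_wus + 1)}
    for host in requests:
        for wu, holders in issued.items():
            if len(holders) >= replication or host in holders:
                continue
            # Rule 2: skip this WU if the host already shares another WU with any holder.
            if any(
                host in others and holder in others
                for holder in holders
                for other_wu, others in issued.items()
                if other_wu != wu
            ):
                continue
            holders.append(host)
            break
    return issued


def max_shared(issued):
    """Largest number of WUs that any pair of hosts has in common."""
    counts = {}
    for holders in issued.values():
        for pair in combinations(sorted(holders), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return max(counts.values(), default=0)


requests = ["A", "B", "C", "D", "E", "A", "B", "F"]  # hypothetical order of requests
issued = assign(requests, n_wus=8, replication=4)
for wu, holders in issued.items():
    if holders:
        print(f"wu {wu}: {', '.join(holders)}")
```

Here wu 3 sits with only B while wu 1 is fully issued, the same shape as the "Unsent" results Paul saw. Real scheduler rules would change the details.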

No, my mistake: with N results and 4N hosts it's always possible, since everyone gets just one WU! The question I meant to ask is at what point you stop needing almost as many WUs as hosts, or something like that. This effect does go away after a startup period, and there must be some way to work out the switch-over point, but it's around midnight and I can't think it through...
R~~ zzzz

ID: 24428

> The new WU have different execution times, typically ranging from about 25% to 100% of the previous execution times

Hi again Bruce. If I understand correctly, the different run times originate from the different frequencies, which are known at the outset; is this right?

If so, it would be very useful for the new WUs to come with predicted run times that scale with their actual run times. The reason for this is that it enables the schedulers to accurately fill the work cache. If the WUs vary in length and the variation is known by the scheduler/client, that is no problem; you just get differing numbers of WUs on each connect. To keep working over a weekend, for example, 2.7 days of work is fine if I know it will be 2.7 days.

If, however, the variation is not known (or knowable) by the scheduler and client at download, and say it can vary by a factor of four, I'd have to ask for extra work in case the work issued ran short. If it then ran long, I might get into deadline issues, or it might put other projects into EDF, and so on.

So accurate estimates of run lengths, please, based on your testers' experience of crunching the test WUs. As accurate as possible; if the science means the numbers can't be predicted, then we'd all have to live with that.
River~~

ID: 24430
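River's cache argument can be made concrete with a toy model in which the client keeps taking WUs until their estimated run times cover the requested buffer. The function and the run times are illustrative assumptions, not BOINC's actual work-fetch algorithm.

```python
HOUR = 3600


def fill_cache(requested_seconds, estimates):
    """Take WUs in order until their estimated run times cover the request.

    Returns (estimates of the WUs taken, their estimated total in seconds).
    """
    taken, total = [], 0
    for estimate in estimates:
        if total >= requested_seconds:
            break
        taken.append(estimate)
        total += estimate
    return taken, total


request = 233_280  # 2.7 days in seconds
actual_hours = [12, 3, 6, 12, 9, 3, 12, 12, 6, 12]  # hypothetical Albert run times

# Case 1: each WU carries an estimate that matches its real length.
matched, matched_total = fill_cache(request, [h * HOUR for h in actual_hours])

# Case 2: every WU carries the same flat 12 h estimate.
flat, flat_total = fill_cache(request, [12 * HOUR] * len(actual_hours))
flat_real = sum(actual_hours[: len(flat)]) * HOUR  # work actually received

print(len(matched), matched_total / HOUR)  # WUs fetched, hours expected (and real)
print(len(flat), flat_total / HOUR, flat_real / HOUR)  # WUs, hours expected, hours real
```

With matched estimates the client fetches 8 WUs worth 69 h, enough for the weekend; with a flat estimate it stops after 6 WUs that it believes cover 72 h but that really hold only 45 h, so the cache runs dry early.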

*This* would explain why a new machine I just attached is getting nothing but Albert WUs while all my older machines get nothing but the original Einsteins. (Not that I'm complaining, just curious.)
-Gene

ID: 24433

Correct. Bruce said that the allocations are random, but you only go into the draw when there are no WUs to crunch from the data you already have.

You may have noticed you get runs of WUs with similar-starting names. It is only at the changeover from one such set of WUs to another that you have any chance of getting an Albert. Presumably Alberts also come in batches attached to different datasets, in which case, when your computer can't get any more of the same set of Alberts, it may well revert to the Einsteins at that point.

Dial-up users may have noticed that at the changeover in the name of the WU they get very long connect times; this is because a huge chunk of new data is downloaded. At all other times the instructions for the next WU simply tell the app to do something different with the data already on disk.

Bruce: this makes me think of something else for you to think of... When Einstein is finally withdrawn, there will be a spate of the server dishing out one-off WUs: odds and ends from the old datasets. Old-timers will remember this happening in previous changeovers. Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WUs.

The advantage of BOINC, of course, is that even if your primary loyalty is with one project, you can easily go elsewhere and come back to avoid temporary issues. With a warning like that, dial-up users are more likely to come back than if they are not warned and leave in a tizzy over costs.

In my opinion :-)
River~~

ID: 24460
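The behaviour described above (a random draw between the applications, but only once a host has no WUs left for the data it already holds) can be sketched as follows. Everything here is a hypothetical model: the dataset names, the `albert_share` weighting and the tie-breaking by sorted order are assumptions, not the project's scheduler code.

```python
import random


def next_assignment(host_datasets, unsent, rng, albert_share=0.5):
    """Return the (dataset, app) of the next WU for one host, or None.

    host_datasets: set of dataset names the host already holds (updated in place).
    unsent: dict mapping (dataset, app) -> number of unsent WUs (updated in place).
    """
    # Locality first: stay on data the host already has on disk.
    for key in sorted(unsent):
        dataset, _app = key
        if dataset in host_datasets and unsent[key] > 0:
            unsent[key] -= 1
            return key
    # Only when that data is used up is the application drawn at random.
    app = "albert" if rng.random() < albert_share else "einstein"
    candidates = sorted(k for k, n in unsent.items() if k[1] == app and n > 0)
    if not candidates:
        return None
    key = candidates[0]
    unsent[key] -= 1
    host_datasets.add(key[0])  # the host now downloads this new dataset
    return key


# Hypothetical state: the host holds dataset "d1", which has 2 Einstein WUs left.
host = {"d1"}
unsent = {("d1", "einstein"): 2, ("d2", "albert"): 5}
rng = random.Random(0)
picks = [next_assignment(host, unsent, rng, albert_share=1.0) for _ in range(4)]
print(picks)
```

`albert_share=1.0` forces the draw so the demo is deterministic. The host finishes its two remaining "d1" WUs, picks up the Albert dataset at the changeover, then stays on it, which fits Gene's observation that a newly attached machine can see nothing but Alberts.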

> If so, it would be very useful for the new WUs to come with predicted run times that scale with their actual run times. The reason for this is that it enables the schedulers to accurately fill the work cache. If the WUs vary in length and the variation is known by the scheduler/client, that is no problem; you just get differing numbers of WUs on each connect.

As it seems, the core client is already aware of the different run times. Right now, I've got Albert results waiting in the cache that have different "To completion" times.

> Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WUs.

I remember that this was already done for the previous changeover (see the news of April 7, 2005 in the archive), so I see no reason why it won't be done this time, too. :)

ID: 24472

> ...it may be an idea to post warnings [to dial-up users] when that is about to happen...

:-) Yep, take my comment as a gentle reminder, and if it was going to be done anyway, then take it as thanks in advance!
R~~

ID: 24507

Hello everybody,

ID: 24513

> For me it looks like when using Albert there is much more activity on my hard drive.

What is your setting for "Write to disk at most every N seconds" (Your Account, General Preferences)? Are you sure it's client_state.xml that's being written to, and not the VM page file? Do you have plenty of RAM for everything you're running?

ID: 24529

> For me it looks like when using Albert there is much more activity on my hard drive.

This setting is set to write the data back every 120 seconds. I think I have enough RAM (448 MB), since most of the time 50% of the RAM is used for cache only.

If I run `sh -c 'while grep fraction_done client_state.xml ; do sleep 1 ; done'`, I get a new fraction_done value mostly every three seconds (sometimes after four or five seconds), meaning that the file is written at least every three to five seconds.
TTL

ID: 24530
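The shell loop in this post prints the matching lines once per second. A slightly more explicit Python sketch records when the fraction_done values change, as a proxy for the file being rewritten. Assumptions: the values appear as `<fraction_done>` elements (as the grep suggests), and the helper names are hypothetical.

```python
import re
import tempfile
import time
from pathlib import Path

# Assumption: the values appear as <fraction_done>...</fraction_done> elements.
FRACTION_RE = re.compile(r"<fraction_done>([0-9.eE+-]+)</fraction_done>")


def read_fraction_done(path):
    """Return all fraction_done values currently in the file, as strings."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return FRACTION_RE.findall(text)


def change_intervals(samples):
    """Given (time, value) samples, return the seconds between successive changes."""
    change_times = []
    previous = None
    for i, (t, value) in enumerate(samples):
        if i > 0 and value != previous:
            change_times.append(t)
        previous = value
    return [later - earlier for earlier, later in zip(change_times, change_times[1:])]


def watch(path, seconds, period=1.0):
    """Poll the file once per period, like the shell loop, and return change intervals."""
    samples = []
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        samples.append((time.monotonic(), tuple(read_fraction_done(path))))
        time.sleep(period)
    return change_intervals(samples)


# Offline demo: synthetic samples and a throwaway stand-in for client_state.xml.
intervals = change_intervals([(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "b"), (5, "c"), (6, "d")])
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "client_state.xml"
    state.write_text("<active_task>\n  <fraction_done>0.250000</fraction_done>\n</active_task>\n")
    values = read_fraction_done(state)
```

On a live host, `watch("client_state.xml", 60)` run from the BOINC directory would return intervals quantised to the 1 s polling period; values of 3 to 5 s with a 120 s preference would reproduce the observation above.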

Albert seems to be running a lot slower than Einstein 4.79. For example, I started a WU today at 4pm (UTC) and it has been left running by itself, with the computer not being used and no other major resource-gobbling processes running. The screensaver and the system virus scan have been disabled, for example.

ID: 24537

> Albert seems to be running a lot slower than Einstein 4.79. ...

The problem above seems to be down to the Albert app only using 50% of the available CPU power; the other 50% goes to the idle process. This computer, of course, has a single-core processor (non-HT). I am just wondering if Albert was designed solely for HT or dual-core processors, and that's why it is using only 50% of the CPU power. Does anybody's single-core non-HT processor use 50% or less of the available CPU power?

ID: 24543

Using an AMD XP1700 with XP Pro SP2... using 95% of CPU for Albert.

ID: 24552

Cheers. The rig I am having problems with is an AMD XP1900, so I'll try resetting the project first of all.

ID: 24554

> When Einstein is finally withdrawn, there will be a spate of the server dishing out one-off WUs: odds and ends from the old datasets. Old-timers will remember this happening in previous changeovers. Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WUs.

Would it be possible to improve the scheduler so that it does not send one-off work units to machines with slow connections? Since the configuration page already allows you to specify a maximum connection rate, the information needed to tell clients on narrow pipes from those on fat ones should already be available.

ID: 24564
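The proposal above amounts to a simple check before a one-off WU is sent. A minimal sketch, assuming the scheduler can see which datasets a host holds and the host's user-set download cap; the `Host` fields, the 600 s threshold and the dataset size are hypothetical. Note the weakness: a user-set cap limits transfer speed rather than measuring it, so a host with no cap may still sit on a slow line.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Host:
    datasets: Set[str] = field(default_factory=set)
    max_download_bytes_per_s: Optional[float] = None  # user-set cap; None means no limit given


def may_send_one_off(host, dataset, dataset_bytes, max_download_seconds=600):
    """Decide whether a one-off WU from `dataset` should go to `host`."""
    if dataset in host.datasets:
        return True  # nothing new to download
    rate = host.max_download_bytes_per_s
    if rate is None:
        return True  # no cap stated, so the link cannot be judged slow
    return dataset_bytes / rate <= max_download_seconds


dialup = Host(max_download_bytes_per_s=5_000)
adsl = Host(max_download_bytes_per_s=100_000)
holder = Host(datasets={"d9"}, max_download_bytes_per_s=5_000)
size = 12_000_000  # hypothetical dataset size in bytes
print(may_send_one_off(dialup, "d9", size), may_send_one_off(adsl, "d9", size))
```

The dial-up host would need 2400 s for the download and is skipped; the ADSL host needs 120 s and qualifies; a dial-up host that already holds the data is always eligible.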

Hi all,

ID: 24583

> Hi all,

Everybody who has this problem lives in the same timezone (mid-European) and tries to connect to einstein.aei.mpg.de when downloading the app. I'm not sure of this yet, but try changing the timezone on your computer (to UK or something) so that it tries to download from the UK server. (Let me know if it works.)

My Albert works alright on all my 4 boxes, but I got "couldn't connect to host [einstein.aei.mpg.de]" on every box, so I think my downloads might have come from another server (probably UK).

ID: 24585

> Everybody who has this problem lives in the same timezone (mid-European) and tries to connect to einstein.aei.mpg.de when downloading the app.

I changed the time zone, but I've reached my daily quota, so I have to wait till tomorrow.

ID: 24587

Hmmm... it seems that I finally managed to download the Albert files. Here is a log...

ID: 24591

Congrats Edo. I think you're on dry land now. :)

ID: 24593

Thanks Sharky! :) I hope the German server will be unreachable for me tomorrow when I turn on the office machines. :)

ID: 24595

I see the same thing. My system started running Albert units this morning, and is doing a disk write every 5 seconds. My preferences are set for 60 seconds and I'm still within my actual system memory (with swap space showing 0% in use).

ID: 24596

Just to report back... finally Albert works fine on at least one of my machines!

ID: 24599

the first Albert WUs

ID: 24615

Just being interested in Albert: what exactly does "Improved (all-sky pulsar search)" mean? (I'm not referring to the WUs.) Or is this a secret?

ID: 24623

> Just being interested in Albert: what exactly does "Improved (all-sky pulsar search)" mean? (I'm not referring to the WUs.) Or is this a secret?

Read the first post in this thread, including the Q&A section.

I've got a batch of Alberts that have 17% less estimated runtime than Einsteins. The first one should be starting ~noon tomorrow if I've got the timing right (my cores are currently half a work unit out of sync, drifting closer/farther depending on the vagaries of which core Windows decides to use for housekeeping tasks).

ID: 24627

I'm still getting errors:

ID: 24638

ID: 24639

> Just being interested in Albert: what exactly does "Improved (all-sky pulsar search)" mean? (I'm not referring to the WUs.) Or is this a secret?

I read the first post and most of the others below. But most are about less important things concerning what the program does while calculating. Reduced time says absolutely nothing important about the internal structure of the program ... the only important thing is this one: "The differences in run times come about because we are now using a sky search grid and frequency band which depend upon frequency" ... and this is just a brief description ... so, anyone with deeper insights? :)

(By the way: I am not complaining! I just want to know the important differences.)

Greetings, Santas little helper

ID: 24646

I'm wondering about exactly what the scheduler is doing re Einsteins/Alberts myself.

ID: 24658

> I read the first post and most of the others below. But most are about less important things concerning what the program does while calculating. Reduced time says absolutely nothing important about the internal structure of the program ... the only important thing is this one:

ID: 24665

Slavko.sk

ID: 24668

Did you actually read that link, or just conjure it out of Google? If the former, where should we be looking in it? The phrases "all-sky" and "all sky" are not in the document anywhere, and every instance of "improved" appears to refer either directly to sensor hardware or to other backend infrastructure, not the client app.

PS, does this board automatically tinyurl any link? The read view of the forum is showing a tinyurl that points to the tinyurl you posted, while the edit page is showing what I assume is your original tinyurl. I'm directly pasting the link from Google's cache to see what happens.
http://66.102.7.104/search?q=cache:by7iLtzNjZIJ:www.ligo.caltech.edu/NSF/pdf/annual_report.pdf+What+exactly+does+Improved+(all-sky+pulsar+search)+mean%3F+2006&hl=en

ID: 24669

> PS, does this board automatically tinyurl any link? The read view of the forum is showing a tinyurl that points to the tinyurl you posted, while the edit page is showing what I assume is your original tinyurl. I'm directly pasting the link from Google's cache to see what happens.

Hmmm. Did you edit your post, JAHMAGIC? The link that showed *after* I made my reply isn't the same as the one I got going back in my browser.

ID: 24670

I'm seeing some Albert ugliness on my Mac:

ID: 24679

Just to report back... Today I tried to run E@H on my office machine... but it produced lots of client errors. It didn't even try to work with Albert; instead it tried to download the Einstein app and files. I get an error 504 message, and a signature error when downloading files. And this machine worked perfectly before it was hit by Albert. Now I just can't make it work.

My home machine works nicely on Albert. The only difference is that at the office I'm behind a proxy. But that wasn't a problem before. I hope the staff will fix this nightmare soon (if this is a problem on their part).

ID: 24691

> I'm seeing some Albert ugliness on my Mac:

Looks like normal operation to me. That is, I think the "No usable checkpoint found . . ." messages are indicative of the first time Albert tried to write a checkpoint for those particular WUs. Every Albert WU I have looked at has one of these messages. In other words, it would only be a problem if a WU gets more than one of these messages.

ID: 24696
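The rule of thumb in this reply (one "No usable checkpoint found" message per WU is normal, more than one is worth a look) is easy to apply to a batch of WUs. A minimal sketch, assuming the stderr text of each result has been collected into a dict keyed by WU name; the log excerpts and function names are hypothetical.

```python
MARKER = "No usable checkpoint found"


def marker_counts(stderr_by_wu):
    """Count the marker in each WU's stderr text."""
    return {wu: text.count(MARKER) for wu, text in stderr_by_wu.items()}


def worth_a_look(stderr_by_wu):
    """WUs that reported the marker more than once."""
    return sorted(wu for wu, n in marker_counts(stderr_by_wu).items() if n > 1)


# Hypothetical stderr excerpts keyed by WU name.
logs = {
    "wu_a": "No usable checkpoint found . . .\n",
    "wu_b": "No usable checkpoint found . . .\n...\nNo usable checkpoint found . . .\n",
    "wu_c": "...\n",
}
print(worth_a_look(logs))
```

Only `wu_b`, which reported the message twice, is flagged.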

> PS, does this board automatically tinyurl any link? The read view of the forum is showing a tinyurl that points to the tinyurl you posted, while the edit page is showing what I assume is your original tinyurl. I'm directly pasting the link from Google's cache to see what happens.

Sorry about that, Dan..... yeah, I tend to edit a few times (I had just got up and was having coffee and running several PCs at once). And since I'm stuck on the world's slowest dial-up here, things run slower than my thoughts.

No, this site doesn't auto-tinyurl your links...... you just convert them yourself and post them (especially long ones like that one). I just happened to have that page up and noticed towards the end that it mentioned the sky search grid: basically, looking around more than just one section of the sky. (And yeah, my tinyurl was messed up at first and I switched it as fast as my dial-up would allow.)

The best thing that happened is that now all my machines are loaded with the new Albert 4.37, so I get to test the timing differences and all the rest of the Einstein fun.

ID: 24710

> The best thing that happened is that now all my machines are loaded with the new Albert 4.37, so I get to test the timing differences and all the rest of the Einstein fun.

Enjoy. I'm ~40 h from switching back from 83% Alberts to Einsteins again.

ID: 24714

Hi, Slavko.sk

ID: 24722

Finally I decided to change the time zone on my office machines too. I changed it to the EST (US) zone and now it works perfectly. It downloaded the Albert app and all the files without any problems.

ID: 24742

Please, could you help us identify which files are being modified? A simple way is to set your preferences to (say) 600 seconds, then monitor the timestamps of the files in projects/einstein.phys.uwm.edu/ and in slots/N/ to see which of these files are being written to more often than once every ten minutes.

ID: 24762
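The timestamp check suggested here can be automated. After raising the write interval to 600 s, list every file under the BOINC directory whose modification time falls inside that window; anything that keeps showing up is written too often. A minimal sketch; the directory layout in the demo only mimics the paths named in this thread, and the function name is hypothetical.

```python
import os
import tempfile
import time


def recently_modified(roots, window_seconds, now=None):
    """Return paths (relative to each root) modified within the last window_seconds."""
    now = time.time() if now is None else now
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if now - os.path.getmtime(path) < window_seconds:
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Demo: a fake BOINC directory whose files were last written at different times.
with tempfile.TemporaryDirectory() as boinc_dir:
    os.makedirs(os.path.join(boinc_dir, "slots", "0"))
    now = time.time()
    fake_files = [
        ("client_state.xml", 5),
        (os.path.join("slots", "0", "Fstat.out.ckp"), 3),
        (os.path.join("slots", "0", "stderr.txt"), 3000),
    ]
    for rel, age in fake_files:
        path = os.path.join(boinc_dir, rel)
        with open(path, "w") as fh:
            fh.write("x")
        os.utime(path, (now - age, now - age))  # pretend the file was last written `age` s ago
    busy = recently_modified([boinc_dir], window_seconds=600, now=now)
print(busy)
```

Run against a real BOINC directory (without the `now` argument) a few times in a row, a file that appears every time is being rewritten more often than the preference allows.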

> For me it looks like when using Albert there is much more activity on my hard drive.

Above is from this message, so it looks like it's the client_state.xml file being written to.

ID: 24763

> Please, could you help us identify which files are being modified? A simple way is to set your preferences to (say) 600 seconds, then monitor the timestamps of the files in projects/einstein.phys.uwm.edu/ and in slots/N/ to see which of these files are being written to more often than once every ten minutes.

I set my write-to-disk option to 600 seconds. Looking in the main BOINC directory, client_state.xml and client_state_prev.xml are updating at an interval varying between 3 and 5 seconds. In the slots directories (HT, so there are 2 instances running), each slot's Fstat.out.ckp file is updating at varying intervals of 2 to 4 seconds.

I then suspended Einstein to put the machine back to running SETI, and confirmed that the state.sah file does get written to disk at the interval specified in the preferences. After that, when going back to Einstein, there was a 50-second delay with no disk activity; then an entry was written to stderr.txt (saying it was verifying the checksum for Fstat.out.ckp), after which it resumed the actual computation and again wrote the checkpoint file every 2 to 4 seconds.

(Edit) Sorry, forgot to mention the project folder. Both result files (the text files bearing the same names as the work units being run) in the ~/projects/einstein.phys.uwm.edu folder are being updated at 2 to 4 second intervals.

ID: 24770
|
I wish the limit of 16 WUs (max) per machine could be cancelled. | |
| ID: 24786 | | |
|
My first Albert WU has crunched: 5:12:30, down from an average of 8:22. Quite an improvement. Congrats on a much improved app. | |
| ID: 24787 | | |
I have the same problem, initiated a thread about it in the Problems and Bug reports forum: http://einstein.phys.uwm.edu/forum_thread.php?id=3513#24882 It bugs me quite a bit, hope someone has a solution. Greetings, Mr Ragnar Schroder | |
| ID: 24899 | | |
... I'm running an Intel P4 w/HT, XP/SP2, 3GHz, and specified in preferences to update every 3 minutes. I have been running Albert WUs (8 hrs long) for some time. client_state.xml and client_state_prev.xml, as well as the checkpoint files in both Einstein slots, update every 3 minutes as specified. Lucky me. | |
| ID: 24906 | | |
... Yeah, it seems that it's specific to the Linux version of Albert. And it may only be happening with certain Linux kernels, though among the three people reporting it here, both the 2.4 and 2.6 kernels are represented. | |
| ID: 24910 | | |
|
I have the same continuous 5-second disk access on all 3 Mandriva (2005LE and 2006.0) systems when Albert is running. I have General Preferences set to 60 seconds for disk updating. Looking in the BOINC directory, client_state_prev.xml and client_state.xml are modified every minute. In slots/0, Fstat.out.ckp is shown as modified every minute, and in projects the WU r1_1112.5__1343_s4r2a_0_0 is shown as being modified every minute. These are the only files I have seen being updated so far. BOINC is running in my home directory, so I don't think it's necessary to look elsewhere. | |
| ID: 25016 | | |
I have the same continuous 5-second disk access on all 3 Mandriva (2005LE and 2006.0) systems when Albert is running. I have General Preferences set to 60 seconds for disk updating. Looking in the BOINC directory, client_state_prev.xml and client_state.xml are modified every minute. In slots/0, Fstat.out.ckp is shown as modified every minute, and in projects the WU r1_1112.5__1343_s4r2a_0_0 is shown as being modified every minute. These are the only files I have seen being updated so far. BOINC is running in my home directory, so I don't think it's necessary to look elsewhere. As a follow-up, I changed General Prefs to 300 sec for disk updating (taking a cue from Bruce). Did an update and verified the update in messages. The same files are being modified at one-minute intervals when Albert is running, except I now have a different WU, r1_1112.5__1337s4r2a_1_0, and as before Fstat.out.ckp, client_state_prev.xml, and client_state.xml. | |
| ID: 25031 | | |
|
There's a new version of the Linux app being sent out now - but still no word from the staff... | |
| ID: 25032 | | |
There's a new version of the Linux app being sent out now - but still no word from the staff... I don't know everything the new app may address, but I just got some new work that uses it (albert 4.40) and the disk writes are now happening at the proper time interval to match my preferences. | |
| ID: 25044 | | |
|
The problem with the Linux version of the albert application writing to disk more frequently than set by user preferences has been fixed. A new Linux version of the app (4.40) is now being distributed which fixes this. | |
| ID: 25114 | | |
|
Hello! | |
| ID: 25130 | | |
|
No problems here downloading/crunching Alberts. They take less time than the other WUs, so I claim less credit. | |
| ID: 25193 | | |
No problems here downloading/crunching Alberts. They take less time than the other WUs, so I claim less credit. Normally I get almost instant credit since I've got a 4-day queue: 3 days to cover my ISP going down Friday evening and not being fixed until Monday (happened twice in the last 6 months), and one more day in case their sysadmin needs to overnight a spare part. It looks like the person you're waiting on has a similarly long queue. It could be worse, after all. I've got 5 results waiting on a noob who appears to have quit after returning 6 errors the last week of December, and a 6th on another noob who only did a single work unit. | |
| ID: 25195 | | |
Man, I don't know about that. The last three WUs I've processed have failed on me due to excessive CPU times. And these times are way out in space: 55 hours to completion? And the CPU time indicated at abort is a bunch of jive with respect to actual elapsed time. There's no way I could've processed a WU as long as is indicated at abort time. | |
| ID: 25414 | | |
|
An idea for the reduced "initial replication" part, I'm not sure if that is possible without a lot of work though: | |
| ID: 25415 | | |
Man, I don't know about that. The last three WUs I've processed have failed on me due to excessive CPU times. And these times are way out in space: 55 hours to completion? And the CPU time indicated at abort is a bunch of jive with respect to actual elapsed time. There's no way I could've processed a WU as long as is indicated at abort time. Ray, 70-80 hours is way too long for your machine, especially considering the WUs weren't even completed in that time, unless there's some incompatibility between Win98 and Albert that I don't know about. I'd suspect either thermal throttling or something very CPU-intensive running alongside it. Anything you know of that might qualify? Regards, Michael ____________ microcraft "The arc of history is long, but it bends toward justice" - MLK | |
| ID: 25416 | | |
|
@Professor Ray : | |
| ID: 25417 | | |
|
Nope, doesn't make any sense. | |
| ID: 25418 | | |
|
My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;) | |
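The two numbers in Jord's post are the same 32-bit value, shown once as a signed decimal and once as hex: -1073741819 is just 0xC0000005 read as a signed integer. On Windows, 0xC0000005 is STATUS_ACCESS_VIOLATION, meaning the application tried to touch memory it was not allowed to access. A small Python sketch of the conversion (the helper names are hypothetical, and the lookup table deliberately holds only this one code):

```python
# Only the status code discussed in this thread; not an exhaustive table.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (two's complement) integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def describe(exit_code: int) -> str:
    """Format an exit code as 8-digit hex plus its name, if known."""
    u = to_unsigned32(exit_code)
    return f"0x{u:08X} {KNOWN_STATUS.get(u, 'unknown')}"
```

`describe(-1073741819)` gives `0xC0000005 STATUS_ACCESS_VIOLATION`, and `to_signed32(0xC0000005)` gives back `-1073741819`.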
| ID: 25424 | | |
My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;) Jord, hey, mate! As the other half of the "Graphics Bug" tag-team, I guess that leaves me off the case, too, since it's equally unlikely to be a graphics adaptor driver issue. :-) Michael ____________ microcraft "The arc of history is long, but it bends toward justice" - MLK | |
| ID: 25425 | | |
My latest Albert went out with an error, and no, before Stick probes me, the exit code -1073741819 (0xc0000005) wasn't caused by me using my screensaver. ;) Maybe you should try the Beta application! ;-) Actually, I happened to find a similar result last week and posted this message on the NEW: WINDOWS TEST APPLICATION FOR EINSTEIN@HOME board. I have to admit that Jord used the more appropriate venue. Edited - to improve the humor (maybe). | |
| ID: 25426 | | |
|
Wow... 6 in a row?? All with the same error. Anyone? | |
| ID: 25454 | | |
Wow... 6 in a row?? All with the same error. Anyone? Jord, Maybe time to consider backing off that 5.3.6 to an approved client? Michael ____________ microcraft "The arc of history is long, but it bends toward justice" - MLK | |
| ID: 25458 | | |
|
Why? It's an alpha client, I am an alpha tester. | |
| ID: 25462 | | |
Why? It's an alpha client, I am an alpha tester. Jord, I'd forgotten that you're doing alpha work, sorry. Thank you for the sacrifices in the name of progress. Michael ____________ microcraft "The arc of history is long, but it bends toward justice" - MLK | |
| ID: 25464 | | |
Wow... 6 in a row?? All with the same error. Anyone? Or, maybe you should start using graphics. ;-) I have some observations (but no idea what the problem is). I noticed that all but the first WU failed immediately (never even got to the "No usable checkpoint . . ." stage). Makes me wonder if there is a residual value from the first unit somewhere in Albert or BOINC that needs to be reset. Also, the other result (which I pointed to in my previous post here) was an isolated single failure (and it was under BOINC 5.2.13). | |
| ID: 25465 | | |
|
I did a full system shutdown. 12/01/2006 15:35:15|Einstein@Home|Message from server: No work sent Yep: Maximum daily WU quota per CPU: 7/day Oh well. :-D Wasn't there a quota of 16, though? I know I spoiled 8... ____________ Jord -The BOINC FAQ Service -CUDA/CAL Stream FAQ | |
| ID: 25466 | | |
Wasn't there a quota of 16, though? I know I spoiled 8... I just counted nine of them. | |
| ID: 25467 | | |
|
You think Friday the 13th is bad... I think Thursday the 12th is. :) | |
| ID: 25468 | | |
You think Friday the 13th is bad... I think Thursday the 12th is. :) Given the way you count, it may already be Friday the 13th where you are (or maybe it's only Wednesday the 11th). :) BTW: Have you thought about e-mailing your "std*" files to Walt yet? He'll probably want to look at them. Here is an old message with some instructions on how to do that. | |
| ID: 25469 | | |
|
I've contacted Bruce on it. He's very excited. I think he complimented me: "a reproducible bug from a BOINC expert and all-around computer geek!". :-) | |
| ID: 25471 | | |
the exit code -1073741819 (0xc0000005) Another new report of this same problem is over here. Host = 481640. | |
| ID: 25476 | | |
|
Don't forget, return errors, lower the quota ... | |
| ID: 25478 | | |
the exit code -1073741819 (0xc0000005) I happened to find another example of this problem from the owner of this thread. | |
| ID: 25498 | | |
|
I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.) | |
| ID: 25590 | | |
.. [snip].. It may be your "Average turnaround time 7.1 days". If you're supplied with 3 work units, your PC would be overcommitted. 14 days is the maximum time allowed to complete a work unit. Even now, you have two, and 2 * 7.1 is 14.2 days. It may work out as you near completion of the work unit you're crunching now. ____________ Join the #1 Aussie Alliance on Einstein | |
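The reasoning above is simple arithmetic. A minimal sketch, assuming (as the post does) that queued work units are crunched one after another and each takes about the host's average turnaround time; BOINC's real work-fetch and scheduling logic is more involved than this, and the helper names are hypothetical:

```python
import math

DEADLINE_DAYS = 14.0  # maximum time allowed per work unit, as quoted in the post


def days_to_finish(n_units: int, days_per_unit: float) -> float:
    """Days until the last of n_units queued work units is done, if run back to back."""
    return n_units * days_per_unit


def max_units(days_per_unit: float, deadline_days: float = DEADLINE_DAYS) -> int:
    """Largest queue whose last unit still finishes before the deadline."""
    return math.floor(deadline_days / days_per_unit)
```

With a 7.1-day turnaround, `days_to_finish(2, 7.1)` comes to 14.2 days, already past the 14-day deadline, and `max_units(7.1)` is 1, which matches the point that the host is overcommitted even with two units.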
| ID: 25596 | | |
I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.) It would help if you could give us some information. For example, can you give us copies of the exact error message. Also, if we could know things like type of computer, OS, BOINC version, the "connect every 'x' days" setting, and other such similar items. Welcome to BOINC and to Einstein. I hope we can get this problem fixed for you. There is a whole new world of projects for you to enjoy. Jim | |
| ID: 25661 | | |
|
Fletcher, | |
| ID: 25666 | | |
|
| |
| ID: 25921 | | |
|
I am getting an error on one of 2 pc running boinc, see my original info in Message 25473. | |
| ID: 26017 | | |
|
The first thing I would try is to suspend Rosetta@Home and see if the problem persists. There may be an interaction, though I don't see how, between the two projects. | |
| ID: 26020 | | |
... By the way, it only started when I added a 3rd project, Rosetta, and both Rosetta and Einstein fail. ... Doesn't it seem strange that the instruction would call its own memory location to be read? If you're running Rosetta also, be sure that you have the "leave applications in memory while preempted" option under your general preferences set to "yes". Rosetta requires this, and I don't know if it could be related or not, but perhaps if Rosetta isn't being left in memory, it's not properly releasing something, and then Einstein and Rosetta are conflicting over a memory address. | |
| ID: 26025 | | |
|
Just to report from here: | |
| ID: 26122 | | |
|
Another thing to try would be to run a memory test application to see if your memory is still good. One application is memtest86 (if memory serves me correctly). | |
| ID: 26136 | | |
|
I had an Albert WU that sat at 0% for almost 4 1/2 hours. I aborted the WU. WU is Here | |
| ID: 26155 | | |
I had an Albert WU that sat at 0% for almost 4 1/2 hours. I aborted the WU. WU is Here Two other users have successfully completed the same WU, so there is probably nothing intrinsically wrong with it. | |
| ID: 26165 | | |
I had an Albert WU that sat at 0% for almost 4 1/2 hours. I aborted the WU. WU is Here Hmm, I dunno why it sat there for 4 1/2 hours with no progress at all. I'll watch the next ones that I got and see. Thanks Bruce. Jeremy | |
| ID: 26172 | | |
|
Thanks, I will try this. SETI never errors. I had Einstein and SETI running on one system and it was fine; then Einstein and Rosetta failed when I added Rosetta. On another system I had SETI and Rosetta running fine, then added Einstein, and Rosetta and Einstein failed. ... By the way, it only started when I added a 3rd project, Rosetta, and both Rosetta and Einstein fail. ... Doesn't it seem strange that the instruction would call its own memory location to be read? | |
| ID: 26245 | | |
I'm brand new and confused. I know just enough about computers to really screw things up. (I was pretty good at DOS, but Windows has escaped me, and I don't have the time or youth to figure it out.) You and Mike give me far too much credit. It took me two weeks just to find my way back to this board again. In trying to copy you the error message, my usual request to update did not go unfilled this time(!), so I have only my imperfect memory to relate: "scheduler request: not receiving new workunits or ...posting?... results"; I can't remember exactly. The only thing I can remember tinkering with was the .... response time? .... I set something from 0.1 day to 1.0 day. But that was several manual requests ago. Anyway, I'm crunching again! Thanks! Fletcher | |
| ID: 26320 | | |