Windows Beta Test App 4.24 available |
| Author | Message |
|---|---|
|
A new Windows App is available from our Beta Test Page. | |
| ID: 71144 | | |
A new Windows App is available from our Beta Test Page. Oh, just what we need... provoking... ;-) I don't think we should provoke Einstein. He may roll over and declare E=MC3 :-O I'll try it later... Doing some SETI stuff right now... | |
| ID: 71146 | | |
|
Loaded and crunching ... 34 hours of work estimated on the Athlon 3200+. Wait and see :) | |
| ID: 71150 | | |
|
Installed on two hosts. | |
| ID: 71174 | | |
Installed on two hosts. Are you saying that if a Linux host is paired up with any Windows OS that the Linux result is guaranteed to be declared "invalid"? I could've sworn that I had one result validate against Linux some time ago, but I've been paired with nothing but other Windows versions lately. Edit: It must be something specific with either the Linux or the Windows installation that is causing the issue, because not everything fails. I guess it could also be the frequency range of the result. Don't know. I'm currently watching FalconFly's results for comparisons against my processor. Though he said that he wouldn't be really processing a lot of stuff until this week, this WU of his validated against a Windows XP Pro host... | |
| ID: 71176 | | |
My Linux box just got 304.31 credits against a Windows box. Never had a validation error on my PII Deschutes running SuSE Linux 10.1, BOINC 5.8.17. Same in SETI and QMC. Tullio ____________ | |
| ID: 71177 | | |
Hmmm... Yep, I knew there were hosts out there going along fine. Obviously something triggers the problem...but what? Is it perhaps a problem with the WU generator? Pure speculation...and I'm probably talking out of ignorance at this point... LOL P.S. - I HATE REGULAR EXPRESSIONS!!!!! Been up all night working on something for school. I realize they are "powerful and all", but they are way too nerdy, IMO... | |
| ID: 71178 | | |
|
I'm pretty sure that the cross-platform validation problem neither has to do with the memory access bug we fixed, nor with a particular machine. | |
| ID: 71179 | | |
|
I remember that, at Area Science Park in Trieste, my MIPS 6000 minicomputer, running UNIX System V, gave results that were different from those obtained on a SUN workstation running SUNOS. But I still don't know why. | |
| ID: 71180 | | |
I'm pretty sure that the cross-platform validation problem neither has to do with the memory access bug we fixed, nor with a particular machine. That reminds me...: Bernd, is it true that the math library for the Linux app is linked dynamically? Wouldn't it be better to have it statically linked to have a guarantee that every Linux app uses the same? Some frequencies really seem to be more affected than others, I had a series of WUs lately where I had a 50% failure rate (!), and a single result I investigated contained as many as 100 mismatches between Windows & Linux in the final "toplist" containing the 10000 most promising items. CU BRM | |
| ID: 71181 | | |
A) Where do you go to see that kind of stuff? B) Where do you come up with the time to do that analysis? ;-) | |
| ID: 71182 | | |
A): Just disable network access when you know you have a "special" workunit, so the resultfile can't be immediately sent to the server. Wait until the resultfile is generated, unzip it, and look into it :-). Resultfile is plain ASCII and can be analysed pretty well. It's not too difficult to re-run a WU on another BOINC installation as well (offline, so you don't send something to the server, of course!!!) B) I guess being single atm helps a lot :-). I spent maybe a weekend on this because I was curious what kind of error this was (just a small "epsilon" problem which could be fixed by relaxing the validator or something more complex). I'm an IT professional like you, so sometimes you can't help and just HAVE to find out, I guess. CU BRM | |
| ID: 71183 | | |
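
The comparison described above can be sketched roughly as follows. This is only an illustration, not the project's validator: the thread says the result file is plain ASCII but does not give its layout, so the assumption here is one candidate per line with whitespace-separated numeric columns and `%` comment lines. The function names and the tolerance are made up for the example.

```python
import math


def parse_candidates(text):
    """Parse rows of whitespace-separated numbers (assumed layout, not the real file format)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines assumed to be comments.
        if not line or line.startswith("%"):
            continue
        rows.append(tuple(float(field) for field in line.split()))
    return rows


def count_mismatches(rows_a, rows_b, rel_tol=1e-5):
    """Count rows that differ beyond rel_tol, plus unmatched rows if the lengths differ."""
    mismatches = abs(len(rows_a) - len(rows_b))
    for a, b in zip(rows_a, rows_b):
        same_shape = len(a) == len(b)
        if not same_shape or not all(
            math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
        ):
            mismatches += 1
    return mismatches


# Hypothetical toplist excerpts from the same workunit run on two platforms.
windows_text = "% toplist\n1.0 2.0\n3.0 4.0\n"
linux_text = "% toplist\n1.0000001 2.0\n3.0 4.5\n"
mismatch_count = count_mismatches(
    parse_candidates(windows_text), parse_candidates(linux_text)
)
```

Rows are compared position by position, which is a simplification: if the two platforms rank candidates differently, a real comparison would have to match rows by their parameters first.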
Regular Expressions are cool, but the syntax is fairly difficult to get used to. | |
| ID: 71188 | | |
Like I said, too nerdy... I'd rather let the ubernerds hack at this kind of thing and just give me some private function to call. Of course, some prospective place of employment would naturally decide they wanted something that was not in the black-box implementation... :sigh: Try doing a google search for "hate regular expressions" and see how many hits there are... LOL | |
| ID: 71192 | | |
I knew you'd come up with some use for zapping network... ;-) It's not too difficult to re-run a WU on another BOINC installation as well (offline, so you don't send something to the server, of course!!!) I'm just a poor boy... I can't go getting other hardware... I might check into VirtualPC 2007 or VMware player...but not until after this class is over. Too much to tackle getting set up otherwise (got to get apache or something like it set up this week). so sometimes you can't help and just HAVE to find out, I guess. Can relate... | |
| ID: 71193 | | |
The cross-platform tests I performed were done on the same hardware, just to make sure. I used my regular Win XP and a Linux Live-DVD (Knoppix in this case). (insert DVD, boot, enjoy Linux). Zero installation effort, very nice. CU BRM | |
| ID: 71194 | | |
What kind of hardware though? I have a sound card that has no official Linux support (X-Fi XtremeGamer), and a Logitech MX Revolution mouse that may or may not be supported, and since I can't mod x11 stuff when running off of an image file (at least I don't think I can).... ? More concerning would be if my network adapter (Realtek GigaLAN) is supported...? | |
| ID: 71197 | | |
I'm pretty sure that the cross-platform validation problem neither has to do with the memory access bug we fixed, nor with a particular machine. This may sound like a silly question, but. . . If there are differences between the various compilers and math libraries, how do we know which ones will give scientifically accurate results? And, have a lot of us been producing results that are worthless? | |
| ID: 71206 | | |
Interesting questions indeed. Follow-up question: One way to improve validation would be to inject simulated "Pulsar Signals" into the input data and verify that the clients find them. Are there any plans to do that in the future? CU BRM | |
| ID: 71212 | | |
This was done with some data at the end of the S4 run. | |
| ID: 71220 | | |
|
Finished first result with 4.24. Seems a bit slower, but dunno. No provocation (yet). I'll do that with one of the next group that I get... | |
| ID: 71230 | | |
I have an "interesting question". I tried to get VTune up and working on my machine to answer it, but since it is an AMD processor, it squawked about the processor architecture...and on top of that, I have no idea how to use the blessed thing... The C++ DLL that I worked on was not a performance drag (credit card auth on tcp/ip usually happened very quickly), so it was never "tuned"... Sooooo..... What is the effect that happens when you "ABC" again? Is that working against the modf() -> ftol() change, or is there still some activity going to modf() despite the change, meaning there's another "buggy detection" different from the one that was already worked around, or is that string changing some other function? Brian | |
| ID: 71240 | | |
My VTune trial license expired... Anyway...the effect of the "ABC" patch is that on AMD CPUs that support SSE2, a global flag in the runtime lib is set differently. This flag toggles the (usually) faster SSE2 codepath for several functions, not just modf. What Bernd did was to rewrite the code in the hot-loop so that it would no longer call modf but ftol, for which, in VS 2003, only one code path exists which is reasonably fast. The slow codepath will continue to be executed in the new Win apps, but no longer in the hot-loop, as I understand it, so the overall effect of "ABC"ing the app should be much smaller now. CU H-B | |
| ID: 71241 | | |
This flag toggles the (usually) faster SSE2 codepath for several functions, not just modf. What Bernd did was to rewrite the code in the hot-loop so that it would no longer call modf but ftol, for which, in VS 2003, only one code path exists which is reasonably fast. The slow codepath will continue to be executed in the new Win apps, but no longer in the hot-loop, as I understand it, so the overall effect of "ABC"ing the app should be much smaller now. Gotcha... Yeah, it doesn't have as much octane as on 4.17... I really hope Barcelona/Agena (Phenom...btw, IMO, silly name) will at least get AMD back onto a level playing field from an architecture standpoint... | |
| ID: 71242 | | |
This may sound like a silly question, but. . . There is no such thing as a silly question. If there are differences between the various compilers and math libraries, how do we know which ones will give scientifically accurate results? This is a very good question, and difficult to answer indeed. "Science" (as it applies here) is based on mathematics, which is based on an ideal world: values are continuous, spaces are infinite etc. Calculations performed on real-world machines (computers) are not like this: resources (memory, time) are limited, and so is precision, which means values are discrete. In this sense every (non-trivial real-number) calculation done on a computer is wrong wrt. the ideal model the implementation is based on. However, in many (hopefully most) cases the difference ("error") is negligible. Although every "computation" as mentioned is "wrong", i.e. differs from the mathematical idea, the difference itself varies between the systems the calculations are done with (CPUs, compilers, libraries etc.). A way to make all computations at least wrong in the same way is to set a standard for the way they are performed, independent of the properties named above. This was tried in IEEE 754. Almost all "systems" have some way of enforcing calculations conforming to this standard. However, most modern processors have evolved beyond this standard and e.g. implemented ways to accelerate their own understanding of floating-point arithmetic, so enforcing "IEEE arithmetic" is still possible, but would noticeably slow down the computation compared to the system's "native" way. So for us a way to ensure cross-platform compatibility would be to use IEEE arithmetic (and it would make the various CPUs truly comparable), but it would generally slow down the computation. For a project whose success (i.e. probability of detecting a gravitational wave) depends so much on the "computing power" (here: the number of computations done) this would have a severe impact, too.
And, have a lot of us been producing results that are worthless? Definitely not. In principle all results have been helpful, even though they didn't pass validation. We will need to adjust the App and/or the validator to make the good results pass validation, regardless of the platform they have been calculated on. BM | |
| ID: 71246 | | |
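
The point about every floating-point computation being "wrong" in a slightly different way can be seen in a few lines. The sketch below is a generic illustration in Python, not code from the Einstein@Home App or its validator: it adds the same three numbers in two different orders, which is the kind of reordering a different compiler or math library may legitimately perform, and then compares the results with a tolerance. The tolerance value is an arbitrary choice for the example.

```python
import math

# The same three values summed in two different orders.
sum_left = (0.1 + 0.2) + 0.3
sum_right = 0.1 + (0.2 + 0.3)

# Bitwise comparison: the two sums differ in the last digit.
bitwise_equal = sum_left == sum_right

# Tolerance-based comparison: the two sums agree to well within 1e-12.
close_enough = math.isclose(sum_left, sum_right, rel_tol=1e-12)

print(sum_left, sum_right, bitwise_equal, close_enough)
```

A validator that compares results exactly would reject one of these two sums, while one that allows a small relative difference would accept both. The hard part, as discussed in this thread, is choosing a tolerance that passes harmless rounding differences without hiding real errors.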
Interesting questions indeed. Follow up question: One way to improve validation would be to inject simulated "Pulsar Signals" into the input data and verify that the clients find them. Are there any plans to do that in the future? Fully true. There actually are two types of "signal injections" already done: "hardware injections" that actually affect the test masses of the detector (sometimes used for calibrations, too), testing the whole pipeline from detector to data analysis. There are "software injections", too, where fake pulsar signals are added by software to the data that has been recorded from the detector. There should be more detailed descriptions of this in the S3 results report (available through a link from the front page). This, however, is beyond the scope of a single workunit, and thus does not help for technically validating individual results. BM | |
| ID: 71247 | | |
For the curious: You can provoke a "client error" (Breakpoint) by putting a file named "EAH_MSC_BREAKPOINT" into the BOINC directory (remember to remove it after testing!). Well, this is just for testing getting the symbols from the symbol store, so only if you're really curious. It will happen right at the beginning, so shouldn't waste computing time. You should probably set the project to "no new work" before you try this, and "allow more work" after you removed the file in order not to trash too many results. The result should look like this result, in particular you should find "PDB Symbols Loaded". BM | |
| ID: 71250 | | |
For the curious: You can provoke a "client error" (Breakpoint) by putting a file named "EAH_MSC_BREAKPOINT" into the BOINC directory (remember to remove it after testing!). BTW, Bernd, did you notice this message from Gary Roberts? http://einstein.phys.uwm.edu/forum_thread.php?id=5848&nowrap=true#70886 It seems the l1_* files never get deleted, slowly filling up the disks of hosts until the quota is reached, effectively shutting down work for Einstein@Home after some time. Could be responsible for some hosts dropping out of E@H. CU BRM | |
| ID: 71268 | | |
|
1st WU finished w/o incident, and valid. My host was the 3rd to finish, zeroing out the credits of a Linux box. I do hope it wasn't you again, Gary :-(. I was lucky to get away alive with it last time... | |
| ID: 71270 | | |
|
Hey Bernd, got some odd errors for my WU and wanted to check back:
It doesn't look too relevant to me and I'm not sure where it comes from (maybe from having a second screen connected to my notebook at that time and set to "primary") but I thought I'd get back anyway just to be on the safe side. Both WUs seem to be crunching away normally now. [edited for spelling] | |
| ID: 71286 | | |
|
First WU with 4.24 completed and validated OK even with WinXP paired with Darwin wingman. Running AMD XP2600+ WinXP against wingman with Intel core 2/T5600 Darwin 8.9.1. | |
| ID: 71298 | | |
BTW, Bernd, did you notice this message from Gary Roberts? http://einstein.phys.uwm.edu/forum_thread.php?id=5848&nowrap=true#70886 Nope, I didn't notice before. Thanks for pointing it out. I wrote a note to the person in charge of the scheduler (which sends out the cleanup requests). BM | |
| ID: 71299 | | |
|
I plan to make this App official in the next few hours, mainly to get more useful info regarding the client errors. It looks like it will at least not make things worse than the current official App (apart from Intel-SSE2 machines, which will see a small penalty of around 7%, while the AMDs catch up). | |
| ID: 71301 | | |
... I do hope it wasn't you again, Gary :-(. I was lucky to get away alive with it last time... No it wasn't me this time :). You are obviously getting stuck into some other poor sod :). These Linux/Windows validation issues just keep going on. Here is one of my AMD/Linux boxes that is suffering with three validation problems in its current list. Two have been finalised and a score of 0.0 awarded and they will probably disappear from the list shortly. A third is still pending but as usual, the "decider" has gone to a windows box so no joy there either. C'est la vie!! :). ____________ Cheers, Gary. | |
| ID: 71327 | | |
|
It turns out there are a number of issues that lead to these cross-platform validation problems, some of which have been addressed recently, some we're still digging for. Solving these problems will probably require both a new validator and a complete set of Apps. I am confident that we will have all these pieces together next week. | |
| ID: 71328 | | |
.... I am confident that we will have all these pieces together next week. That's really great news, thank you!! With those issues sorted out soon, and with the prospect of significantly optimised apps to follow, hopefully people will be encouraged to hang on a bit longer or even return if they had already left. ____________ Cheers, Gary. | |
| ID: 71333 | | |
That's really great news, thank you!! With those issues sorted out soon, and with the prospect of significantly optimised apps to follow, hopefully people will be encouraged to hang on a bit longer or even return if they had already left. I already have some code that should speed up computation significantly, but with the present issues it's simply impossible to validate it. BM | |
| ID: 71337 | | |
|
I understand 4.24 is now the official release. | |
| ID: 71343 | | |
Is it as simple as stopping Boincmgr, removing app_info.xml in the Einstein directory, and restarting Boincmgr? (Assuming all results in queue are already tagged as 4.24 because they were downloaded after the previous change?) Yes it is. BM | |
| ID: 71344 | | |
|
The notice on the Home Page regarding 4.24 is dated June 13, 2007. You may want to fix it... | |
| ID: 71369 | | |
|
My first result with the Windows 4.24 app completed during the night. For a 340-cobblestone workunit, completion time went from 29.86 hours to 22.46 hours. So, a nice little speed-up. | |
| ID: 71418 | | |
The notice on the Home Page regarding 4.24 is dated June 13, 2007. You may want to fix it... Thanks. However, changing news items later is difficult, it causes trouble not on our site, but on the sites getting the news from xml or rss feeds. BM | |
| ID: 71450 | | |
|
We get some errors with exit code 0x40010004 from hosts running Windows Vista. Did anyone run this App successfully and reliably on Vista, or is it failing on all such machines? Any clues what precisely might be the reason of this error? | |
| ID: 71457 | | |
|
Success on a Vista Machine (Mine) | |
| ID: 71460 | | |
We get some errors with exit code 0x40010004 from hosts running Windows Vista. Did anyone run this App successfully and reliably on Vista, or is it failing on all such machines? Any clues what precisely might be the reason of this error? Hi! Seems to be graphics related, see some BOINC wiki. I've seen this happen at Rosetta as well; Vista with its new graphics subsystems seems to cause a lot of these errors, I'm afraid. CU BRM | |
| ID: 71461 | | |
|
Hi! | |
| ID: 71462 | | |
Hi! 5.2.5 is a very old BOINC Core Client. It has been stated that "newer" BOINC CCs are needed for the error reporting, although what "newer" means is a bit fuzzy... Is 5.4.11 ok? How about 5.8.16? Hopefully it doesn't need to be a 5.9.x or 5.10.x release... :-( | |
| ID: 71463 | | |
|
Sorry, I don't know the exact version myself. It needs to be some version that installs the dbghelp.dll in the BOINC directory. 5.8.x is definitely ok, 5.2 sounds too old, should be something in between where the change occurred. Does anyone know the version for sure? | |
| ID: 71464 | | |
|
Bernd, the BOINC Windows Debugger was introduced in the 5.2.X series but a newer version was introduced with the 5.4.X series. Rom posted some technical details on the Ralph forum here which may help. Dave | |
| ID: 71468 | | |
I just saw this one: http://einstein.phys.uwm.edu/result.php?resultid=85401300 You get a proper trace of an internal error from BOINC_LAL_ErrHand(), which "now calling boinc_finish()", but apparently it's boinc_finish() that fails (which does little more than just exit()) with an access violation. Something has gone really, really wrong on this machine (faulty memory or similar). BM | |
| ID: 71471 | | |
|
My two AMD boxes (an AMD Athlon 2200 and an AMD Sempron 2600) are both nearly 25% faster! | |
| ID: 71472 | | |
It turns out there are a number of issues that lead to these cross-platform validation problems, some of which have been addressed recently, some we're still digging for. Solving these problems will probably require both a new validator and a complete set of Apps. I am confident that we will have all these pieces together next week. Bernd, When you get to the point of deploying the new validator and the new set of apps, are you intending to run a (perhaps short) beta test phase first, as you did with the 4.24 Windows app? If you are, might I make a suggestion about the app_info.xml file that would accompany each test app? As you warn quite clearly on the beta test page, changing the app aborts any work in progress with a client error. However you can easily avoid this with a small modification to the app_info.xml file. If you are already fully aware of this and do not want to allow a change of app in the middle of a result, that is fine - no change is needed. My thinking is that the beta test period could be kept shorter and the number of potential beta testers could be increased if people were allowed to "re-brand" the results in their caches so that they didn't have to abort or wait for their caches to drain or in any way disrupt their normal crunching patterns in order to participate in the test. I'm sure that people have done this in the past by editing their state files. I think it's much safer to do it through the app_info.xml mechanism. ____________ Cheers, Gary. | |
| ID: 71482 | | |
|
Bernd, | |
| ID: 71485 | | |
When you get to the point of deploying the new validator and the new set of apps, are you intending to run a (perhaps short) beta test phase first, as you did with the 4.24 Windows app? If new Apps are needed, I'll definitely publish them for a public Beta test first. Currently it looks like upgrading some server-side components (validator and workunit generator) may solve the problem and be the best choice, but we're still looking into this. If you are, might I make a suggestion about the app_info.xml file that would accompany each test app? As you warn quite clearly on the beta test page, changing the app aborts any work in progress with a client error. However you can easily avoid this with a small modification to the app_info.xml file. If you are already fully aware of this and do not want to allow a change of app in the middle of a result, that is fine - no change is needed. Actually I'll not advise people to manually hack the client_state.xml files, they are too fragile. However, in the future the app_info.xml files in the Beta Test packages will include entries for previous (maybe both official and beta) App versions, so installing the Beta Test Package, even in the middle of a result, will not lead to a Client Error; the result will just be finished with the old App version, and new work will be assigned to the new App. Furthermore, if you really want to switch the App version halfway through a result, see the sticky post on this subject. I cannot guarantee that it will work at all, as e.g. the syntax of the checkpoint file might change between versions. BM | |
| ID: 71525 | | |
Hi Bernd, Thanks for the reply. I'm fully aware of that sticky you link to and I'm also NOT suggesting any hacking of the state file. My comments were about making some additions to the app_info.xml file so that the state file would remain pristine and that no changing of the name of the new executable so that it could pretend to be the old executable would be needed either (as was mentioned in the sticky). Taking the case of the transition from 4.17 to 4.24 as an example. Here there were desirable bugfixes and apparently no change in output syntax. It would be prudent therefore for any 4.17 "branded" results in a person's cache to be crunched by 4.24, rather than the old buggy app. This can be achieved very simply using a bit more intelligence built into app_info.xml. No dodgy editing of the state file is required at all. ____________ Cheers, Gary. | |
| ID: 71528 | | |
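
As a rough sketch of what Gary describes, the snippet below generates an app_info.xml in which two version numbers both point at the same executable, so results branded with the old version would be picked up by the new App. It assumes the usual anonymous-platform layout with `app`, `file_info` and `app_version` elements; that layout is an assumption of this sketch, and the app name and file name are placeholders, not the real Einstein@Home names. Check the BOINC documentation before using anything like this on a live host.

```python
import xml.etree.ElementTree as ET


def build_app_info(app_name, exe_name, version_nums):
    """Build an app_info.xml string with one app_version per number, all using exe_name."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # Declare the executable once; every app_version below refers to it.
    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")

    for num in version_nums:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "version_num").text = str(num)
        file_ref = ET.SubElement(version, "file_ref")
        ET.SubElement(file_ref, "file_name").text = exe_name
        ET.SubElement(file_ref, "main_program")

    return ET.tostring(root, encoding="unicode")


# Placeholder names: results branded 4.17 and 4.24 would both run the 4.24 binary.
app_info_xml = build_app_info("example_app", "example_app_4.24.exe", [417, 424])
```

As Bernd points out, this only makes sense when the checkpoint and output formats are unchanged between the two versions, since a result started by one version may be finished by the other.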
Taking the case of the transition from 4.17 to 4.24 as an example. Here there were desirable bugfixes and apparently no change in output syntax. It would be prudent therefore for any 4.17 "branded" results in a person's cache to be crunched by 4.24, rather than the old buggy app. This can be achieved very simply using a bit more intelligence built into app_info.xml. No dodgy editing of the state file is required at all. I understand. I guess I have to think about this a little more. BM | |
| ID: 71529 | | |
Wouldn't it be worthwhile to correct the uninitialized data problem in the Linux and Mac apps? As those were detected by compiler runtime checks, to me it sounds as if they were relevant. CU BRM | |
| ID: 71533 | | |
Currently it looks like upgrading some server-side components (validator and workunit generator) may solve the problem and be the best choice, but we're still looking into this. On Linux and Mac we haven't seen a single result that has been affected by this bug, i.e. it didn't have an effect on the final outcome of the calculation. With this 4.24 Windows App we have found another problem in the same module (which might have been introduced by the fix to the earlier problem). We're working on this. So we'll definitely release a new generation of Apps anyway with some bugfixes. However for the cross-platform validation problem (only) it might be that we'll need to deal with this only on the server side. BM | |
| ID: 71534 | | |
|
How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one... | |
| ID: 71537 | | |
How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one... Is it still happening with the new app?? I would have guessed that the majority of these bugs were secondary problems resulting in a failure to initialize the runtime debugger (which should now work). CU BRM | |
| ID: 71538 | | |
How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one... He emailed me the other day asking about it. It is with 4.24. 0xc0000142 means a DLL did not initialize. It is a Windows stop error. From what I read through googling it, it could be a science app problem or it could be a graphics subsystem problem. Graphics-related, I found a few mentions of the issue happening with ATI video cards. Sooooo, based off of what I recall from the initial Linux Signal 11 ("SIGABRT") issue with some OpenGL library, then it could be whatever OpenGL software that the ATI Catalyst drivers use... Ultimately, it's way out of my league. I mentioned he should contact Rom Walton...one of the main BOINC developers... Brian | |
| ID: 71539 | | |
How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one... Yep, got it. Sorry for not replying immediately, had two rather chaotic days. Wrote to Rom about it as you suggested. Edit: BTW, SIGABRT still seems to come up for Linux. See this result. Yep. But not too many (190 in past week), most from the same 4 machines. Not my highest priority right now. BM | |
| ID: 71546 | | |
|
Just had a 4.24 crap out about half way through its first result run with 4.24. | |
| ID: 71571 | | |
Just had a 4.24 crap out about half way through its first result run with 4.24. Very strange: It restarts, finds the checkpoint-file (!), tries to open it but somehow can't (!), and exits with an error message that the checkpoint file isn't there at all ... CU BRM | |
| ID: 71581 | | |
|
I have noticed one of these as well. At first glance it seems to be the same situation as Alinator's. It happened on the third result since the switch to 4.24. | |
| ID: 71588 | | |
Very strange: It restarts, finds the checkpoint-file (!), tries to open it but somehow can't (!), and exits with an error message that the checkpoint file isn't there at all ... Yep. Keeps me confused ever since I made the error messages a little more verbose. We actually get a lot of these errors, I'll write to Rom about that. Maybe boinc_fopen() does some funny things... BM | |
| ID: 71590 | | |
|
To answer the question below re 4.24 vs 4.17 with-w/o patch, I went ahead and did the 'ABC' patch and got the following results:
| |
| ID: 71591 | | |
|
Just to keep you updated of our plans, mainly regarding the cross-platform differences: | |
| ID: 71592 | | |
|
Excellent news, and just in time to deal with the 'new' monster workunits (>= 630 credits) that would otherwise cause quite a bit of frustration if crunched for zero credits because of the cross platform validation issue. | |
| ID: 71594 | | |
|
My AMD 3500+ liked the 4.24, it went from about 38hr with 4.17 to ~28hr with 4.24 on WUs from the same set of datafiles; still waiting for my crunch partner to see if it is valid. If there is more that can be done to speed it up, that's great, but I understand that the validation problem must be looked at first. | |
| ID: 71596 | | |
After the new validator is in place, we'll issue a new set of Apps for public Beta Test (for all platforms) that incorporate the fixes accomplished so far... Bernd, You might like to consider posting a short news item (linking to your latest message) on the front page right now. This would give more people who might like to participate in the next beta test some time to do a bit of research before things get going in earnest. There probably aren't a whole lot of participants following this particular thread anymore :). The other major benefit is for all those people who wouldn't participate in a beta test anyway. At least they should be highly encouraged to see that something is happening to address issues that may currently be turning them off this project. Just IMHO of course :). ____________ Cheers, Gary. | |
| ID: 71598 | | |
|
So is this a public beta test of 4.24? | |
| ID: 71608 | | |
So is this a public beta test of 4.24? The beta test of 4.24 finished a while ago and then 4.24 became the official version (for windows). Please read this entire thread carefully for the full story. Bernd is now talking about the future beta testing of apps that eventually will replace the current official apps. I would think that they will have version numbers higher than 4.24. ____________ Cheers, Gary. | |
| ID: 71610 | | |
|
| |
| ID: 71616 | | |
|
Some of AKOS's computers have been found and they're running apps >5x faster than stock. Once the bugs are worked out, I think faster apps is the planned solution to the slow box problem. | |
| ID: 71630 | | |
|
Just finished a wu with the beta version. | |
| ID: 71658 | | |
|
4.24 is no longer beta. It was pushed out as the mainstream win/x86 app about a week ago. | |
| ID: 71661 | | |
Here is essentially the same question as yours. It was asked in this very thread some days ago and Bernd answered it immediately - see the very next post for the answer. ____________ Cheers, Gary. | |
| ID: 71662 | | |
Thanks Gary!! (Sorry I missed it looking thru posts....) | |
| ID: 71733 | | |
Just to keep you updated of our plans, mainly regarding the cross-platform differences: Now that the validator issue has been resolved, are we almost to the point of beta testing a new batch of apps? ____________ | |
| ID: 71821 | | |
Now that the validator issue has been resolved, are we almost to the point of beta testing a new batch of apps? Yes we are. I'm currently waiting for some internal tests to finish and some feedback from other developers from the other side of the earth (see http://www.amaldi7.com/). Apps are in the pipeline. BM | |
| ID: 71829 | | |
|
The Win App 4.24 dropped nearly 40 WUs with 99 (0x66) Recursive error exit code on my computer in the early hours, but it works well now. | |
| ID: 71844 | | |
The Win App 4.24 dropped nearly 40 WUs .... The straight 4.24 app or one you had optimised? .... but it works well now. What caused it to start behaving again do you think? New (different frequency) data file perhaps? ____________ Cheers, Gary. | |
| ID: 71849 | | |
I'm currently waiting for some internal tests to finish ... Is there likely to be any performance improvement? Now that validation appears to be fixed, it would be nice to see some potential relief from the deadline pressure issue that is troubling many participants and not just those with older boxes :). ____________ Cheers, Gary. | |
| ID: 71850 | | |
The Win App 4.24 dropped nearly 40 WUs .... The official 4.24 app. .... but it works well now. I don't know yet. But I looked at this problem on other computers too; they also reported client errors after a second. So it doesn't waste much time, but these computers run out of their daily quota. And this problem usually happens on a series of WUs. | |
| ID: 71851 | | |
|
An exit code of 99 means that the App terminated due to a failing internal sanity check. There should be a dump of a "status structure", similar to a stack dump, at the end of stderr_out, indicating the check that failed. | |
| ID: 71853 | | |
Akos, what are the last few lines of stderr_out of the results in question? I see the same stderr output in every case. <core_client_version>5.4.11</core_client_version> What other tools accessing the filesystem (virus scanner, malware removal etc.) are you using? Only Total Commander. The OS is Win2000 with SP4. | |
| ID: 71855 | | |
The one that you get on Microsoft Patchday? I never had any problems with it, but I guess if it caused problems, they would kind of "stick out" statistically because most people will execute this tool automatically on MS Patchday, which is always a Wednesday, right? Might be worthwhile to group errors by weekday. As to file corruption, the MD5 checksums are in client_state.xml, right? So one could check, unless the file has already been deleted. CU BRM ____________ | |
| ID: 71857 | | |
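The weekday grouping suggested above could be sketched roughly as follows. This is only an illustration: the function name `errors_by_weekday` and the sample dates are hypothetical, and it assumes the error timestamps have already been collected by hand from the result pages.

```python
from collections import Counter
from datetime import datetime

# Fixed English names, so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def errors_by_weekday(timestamps):
    """Count client errors per weekday.

    `timestamps` is any iterable of datetime objects, one per failed result.
    Returns a dict mapping every weekday name to its error count.
    """
    counts = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    return {day: counts.get(day, 0) for day in WEEKDAYS}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real host.
    sample = [
        datetime(2007, 1, 1, 3, 15),
        datetime(2007, 1, 3, 2, 40),
        datetime(2007, 1, 3, 2, 41),
    ]
    for day, count in errors_by_weekday(sample).items():
        print(f"{day:<10} {count}")
```

If one weekday clearly stood out across many hosts, that would support the patch-day theory; a flat distribution would point elsewhere.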
BTW: Does anyone know if the standard Microsoft Malware removal tool has any influence on BOINC Apps? Good shots! Akos, can you dig out the checksums from client_state.xml and check your data files? There's probably a simple tool for Windows that does this (I usually use md5sum from Cygwin). BM | |
| ID: 71858 | | |
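For anyone without md5sum at hand, the check could also be done with a few lines of Python. This is a minimal sketch, not a project tool: the function names are made up for illustration, and it assumes the expected checksum has been copied by hand out of client_state.xml rather than parsed from it.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with a checksum copied by hand (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling `matches_expected` with the path of a data file and the checksum string recorded for it returns `False` if the file on disk no longer matches, which would point to corruption.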
Akos, can you dig out the checksums from client_state.xml and check your data files? There's probably a simple tool for Windows that does this (I usually use md5sum from Cygwin). I probably can't check it before Tuesday, but I'll keep it in mind. | |
| ID: 71861 | | |
|
Here's another client error that looks kind of interesting: | |
| ID: 71911 | | |
|
Thanks for drawing my attention back to this one. | |
| ID: 71923 | | |
Thanks for drawing my attention back to this one. The owner of the box posted here recently, btw. CU BRM ____________ | |
| ID: 71924 | | |
|
See here. | |
| ID: 72022 | | |
|
I noticed that around the time Visual Studio .NET 2003 was released, the programmers at the Folding@home project noticed that the science code in their clients, which was written in hand-crafted assembly code and takes advantage of SSE and 3DNow!, was causing earlier revisions of Athlons with SSE to crash, while Pentium IIIs and Pentium 4s were not crashing on the same code. Therefore, Microsoft was forced to add this code to avoid writing code that crashes because no one knew at the time what was causing this behavior. Later on, somebody found out that the Folding@home code was using non-aligned memory accesses in its SSE code, and that was causing the Athlons to crash and was slowing the other chips down. Since almost no commercially released compiler would be stupid enough to generate code with non-aligned memory accesses anyway, maybe it is time to upgrade the compiler to Visual Studio 2005, which might have the detection code removed because a compiler will never make the mistake that triggers the chip bugs and crashes those Athlons. | |
| ID: 72392 | | |
I noticed that around the time Visual Studio .NET 2003 was released, the programmers at the Folding@home project noticed that their science code in their clients, which was written in hand-crafted assembly code and takes advantage of SSE and 3DNow!, were causing earlier revisions of Athlons with SSE to crash, while Pentium III's and Pentium 4's were not crashing on the same code. Therefore, Microsoft was forced to add this code to avoid writing code that crashes because no one knew at the time what was causing this behavior. Later on, somebody found out that the Folding@home code was using non-aligned memory accesses in its SSE code, and that was causing the Athlons to crash and was slowing the other chips down. Since almost no commercially-released compiler would be stupid enough to generate any code with the mistake of non-aligned memory accesses anyways, maybe it is time to upgrade the compiler to Visual Studio 2005, which might have the detection code removed because a compiler will never make the mistake that will cause the chip bugs to crash those Athlons. Hi! Interesting, is there a source where one could read more about this? Visual Studio 2005 indeed no longer has this detection code, but one would have expected MS to correct this "issue" in one of their service packs for Visual Studio 2003 anyway. A great deal has changed in the floating point department between VS 2003 and 2005, and you can bet that some new bugs have been introduced as well, so changing the compiler always comes with a certain risk. Anyway, the long-term plan seems to be to compile the Windows version under gcc as well, once that works. CU BRM ____________ | |
| ID: 72393 | | |
I noticed that around the time Visual Studio .NET 2003 was released, the programmers at the Folding@home project noticed that their science code in their clients, which was written in hand-crafted assembly code and takes advantage of SSE and 3DNow!, were causing earlier revisions of Athlons with SSE to crash, while Pentium III's and Pentium 4's were not crashing on the same code. Therefore, Microsoft was forced to add this code to avoid writing code that crashes because no one knew at the time what was causing this behavior. Later on, somebody found out that the Folding@home code was using non-aligned memory accesses in its SSE code, and that was causing the Athlons to crash and was slowing the other chips down. Since almost no commercially-released compiler would be stupid enough to generate any code with the mistake of non-aligned memory accesses anyways, maybe it is time to upgrade the compiler to Visual Studio 2005, which might have the detection code removed because a compiler will never make the mistake that will cause the chip bugs to crash those Athlons. The hot ticket if you want performance is the Intel ICC/IPP combination. It produces code that is about 40% faster than gcc on Intel and 25% faster on other cpu's. Supports Linux, Windows (icc called by VS is an option), and Mac. ____________ | |
| ID: 72402 | | |
But is it fair to AMD CPUs now? It used to cripple the code on AMD CPUs, much like what we saw with Microsoft VS 2003 here. CU BRM ____________ | |
| ID: 72404 | | |
I noticed that around the time Visual Studio .NET 2003 was released, the programmers at the Folding@home project noticed that their science code in their clients, which was written in hand-crafted assembly code and takes advantage of SSE and 3DNow!, were causing earlier revisions of Athlons with SSE to crash, while Pentium III's and Pentium 4's were not crashing on the same code. Therefore, Microsoft was forced to add this code to avoid writing code that crashes because no one knew at the time what was causing this behavior. Later on, somebody found out that the Folding@home code was using non-aligned memory accesses in its SSE code, and that was causing the Athlons to crash and was slowing the other chips down. Since almost no commercially-released compiler would be stupid enough to generate any code with the mistake of non-aligned memory accesses anyways, maybe it is time to upgrade the compiler to Visual Studio 2005, which might have the detection code removed because a compiler will never make the mistake that will cause the chip bugs to crash those Athlons. I actually made a reasonable guess at what was happening with VS 2003 and SSE, but the rest of this can be found somewhere in the Folding@home forums at http://forum.folding-community.org/. The temporary workaround was to fall back to 3DNow! mode whenever an Athlon was detected. Incidentally, 3DNow! mode was slower on an Athlon than SSE mode when both modes are able to function properly. This happened many years ago, so I do not want to spend hours searching that forum. | |
| ID: 72407 | | |