Here is an example I don't quite understand.
Four people worked on the WU, but the one who claimed the least credit was actually granted zero.
Is that normal, and if not, what went wrong?
Why is Outcome=Success when the result's validate state is Invalid?
____________
ID: 5710 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
When you click on the Result that was granted 0.00 credit you'll see that the "Validate state" is "invalid". Something's wrong with the output file that was produced.
> When you click on the Result that was granted 0.00 credit you'll see that the
> "Validate state" is "invalid". Something's wrong with the output file that was
> produced.
But why Outcome=Success? I thought the outcome would not be success in such situations.
____________
Here is another WU (327852) with a similar issue. Could it be that the "canonical" result is actually invalid and therefore trashing everybody else's credits? My host (1597) doesn't seem to indicate any errors, and the byte count is identical to the other hosts which received 0 credits.
ID: 5764 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
> But why Outcome=Success? I thought the outcome would not be success in such situations.
Outcome=success means that the program (Application) ran without crashing. It says nothing about the result file itself; checking that is the job of the validator on the server side.
We get a lot of Results back that are more or less numerical garbage, possibly due to overclocking affecting the FPU, even though the program didn't actually crash.
I currently can't say what's wrong with your particular Result. If it was related to the problem in the validator, it should not happen again.
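To illustrate the difference between "Outcome" and "Validate state", here is a minimal Python sketch. It is not BOINC or Einstein@Home code: `run_app` and `validate` are hypothetical names, and the finiteness check is just an assumed stand-in for whatever the real validator does. It only shows that a program can exit cleanly while its output is unusable.

```python
import math

def run_app(inputs):
    """Hypothetical 'application': it finishes without crashing even when
    the arithmetic went wrong, so its exit status alone says nothing about
    the quality of the numbers it wrote."""
    values = []
    for x in inputs:
        # A misbehaving FPU is imitated here by letting a NaN slip through.
        values.append(math.sqrt(x) if x >= 0 else float("nan"))
    exit_code = 0  # the program ran to completion: "Outcome = Success"
    return exit_code, values

def validate(values):
    """Hypothetical server-side check: every value must be a finite number."""
    return all(math.isfinite(v) for v in values)

good_exit, good_values = run_app([1.0, 4.0, 9.0])
bad_exit, bad_values = run_app([1.0, -4.0, 9.0])

print(good_exit, validate(good_values))  # clean exit, output passes the check
print(bad_exit, validate(bad_values))    # clean exit, output fails the check
```

Both runs report the same exit code; only the separate check on the output tells them apart.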
BM
____________
ID: 5782 |
Magnus Back
Joined: Feb 19 05 Posts: 12 ID: 20342 Credit: 10,964 RAC: 0
Isn't it strange that the einstein@home WUs return bad results from machines
where seti@home WUs don't ...
____________
ID: 5840 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
> Isn't it strange that the einstein@home WUs return bad results from machines where seti@home WUs don't ...
I know neither the SETI App code nor its validator, but I can imagine at least two reasons for this:
1. SETI's validator isn't as picky as Einstein's. Thus they get back bad results but don't recognize them, at least not at the level of analysis the users get to see.
2. SETI's App might rely more on memory and integer operations, while Einstein is definitely FPU bound. The CPU chip gets hot at the spot where the most energy is needed. When it gets too hot, it first corrupts the results of the unit located there. If an integer unit gives false results, this will soon end in a crash of the program or the OS, e.g. because of wrong memory address calculations. If it's the FPU that gets too hot, you will notice nothing while the program runs, until you take a close look at the results.
Carl Christensen told me that CPDN gets all kinds of weird and obviously wrong results from overclocked machines. However, I don't know how the CPDN validator handles them.
> > Isn't it strange that the einstein@home WUs return bad results from
I can imagine one more reason: I work on the PC that E@H is running on, and sometimes my PC hangs or crashes. I am forced to turn it off and on again, and then the disk scan finds FAT allocation problems and fixes them. Maybe E@H runs into problems after such a restart?
I noticed that the machine failing to deliver a valid result is a Linux 2.6.10 - machine, while all other machines completing the same WU are on Windows. My own machine (AMD-Duron 1800MHz) is running on Linux 2.6.3-7mdk, and is not overclocked. The first result was marked valid, but the last two, or rather three, results show errors in the log and were marked invalid.
Could this problem be somehow related to stopping/resuming work on a Linux-machine?
The error shown in the log
APP DEBUG: Application caught signal 2
Resuming computation at 23563/2690776/2691132
is I guess due to a shutdown of the program via Ctrl-C in a console. At least this is how I do it, and I get the same errors in my logs.
This may be. I just started participating in E@H and have a few Linux machines. The machines sent some results and claimed some credits. Every machine on which BOINC was stopped and run again has been granted zero credits for its work.
Well, in my case I did it on purpose. I found the CPU unused because the client couldn't contact the server to get more work. And
I found in the log: Deferring communication with project for 20 hours ....
So I stopped BOINC, removed the min_rpc_time line from the config,
and ran BOINC again. Now the client didn't wait; it just contacted the server, got some work, and ran the core. The drawback is that I was granted zero credits.
On the one hand I don't like BOINC waiting over 20 hours while doing nothing; on the other, I don't want to be granted zero credits.
Any idea how to resolve this?
ID: 5875 |
Ned
Joined: Jan 22 05 Posts: 18 ID: 4548 Credit: 1,314,435 RAC: 143
My one and only Linux result suffered the same "invalid" fate with the "zero" credit rating... I had stopped Boinc a few times to see what was happening and to drop out of "X" to console mode.
So, is this a bug in the validator, or is the Client not handling the shutdown properly so that it can be restarted? Either way, it is a waste of several hours of processing if you expect every WU to have uninterrupted time.
____________
Ol' Retired IT Geezer
ID: 5904 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
@wijata.com: Please post the Result ids or names, or at least the id of the machine you did this on. I'll take a look at this.
> @wijata.com: Please post the Result ids or names, or at least the id of the
> machine you did this on. I'll take a look at this.
Result id 1311263, though that computer is known to have had some memory issues in the past.
Result ids 1286319 and 1283956 come from machines known to work well.
All those machines previously ran Folding@Home with no problems at that time, although the computation was surely different.
ID: 5925 |
Darren
Joined: Jan 18 05 Posts: 94 ID: 2400 Credit: 53,420 RAC: 0
OK, here's another one to add to the mix. I think it's a conspiracy against us Linux users ;~|
This wu was assigned to 2 windows systems and 2 linux systems. Looking at the results, the bytecount and checksums for the linux systems are identical. The windows systems are not identical, but obviously within the tolerance range.
Now, the 2 windows systems returned their results first so that set the determination for what is "valid" when the first linux result came in and didn't match. But, if the 2 linux results match exactly, how can it be determined that they are wrong (hence "invalid") and they get 0 credit while the "close enough" windows results are accepted as correct and given credit?
On this particular wu, what would have happened had the 2 linux systems returned their results first? It hardly seems that simply crossing the finish line first should either validate or invalidate science.
All that said, I'm really not overly concerned with points, but I really hate it when my contribution is useless. Thus far, I'm running about 40% invalid on einstein units across 5 linux systems - and that's very frustrating.
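The "crossing the finish line first" worry can be shown with a toy model. The Python sketch below is not the Einstein@Home validator, whose rules are not described in this thread. It assumes a made-up rule (the first pair of results that agree within a tolerance becomes the reference) and invented host names and values, purely to show how arrival order can flip the verdict when two groups of results each agree internally but not with each other.

```python
import math

def agree(a, b, rel_tol):
    """Two values 'agree' if they match within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)

def judge(results, rel_tol):
    """results: list of (host, value) in arrival order.
    Assumed toy rule: the first result that agrees with an earlier one
    becomes the reference, and every result is judged against it."""
    canonical = None
    for i, (_, value) in enumerate(results):
        earlier = [v for _, v in results[:i]]
        if any(agree(value, v, rel_tol) for v in earlier):
            canonical = value
            break
    if canonical is None:
        return {host: None for host, _ in results}  # no agreeing pair yet
    return {host: agree(value, canonical, rel_tol) for host, value in results}

# Invented values: the Windows pair is close, the Linux pair is identical,
# but the two groups differ by more than the tolerance.
windows = [("win1", 1.0000000), ("win2", 1.0000001)]
linux = [("lin1", 1.0000100), ("lin2", 1.0000100)]

windows_first = judge(windows + linux, rel_tol=1e-6)
linux_first = judge(linux + windows, rel_tol=1e-6)

print(windows_first)  # Windows results valid, Linux results invalid
print(linux_first)    # Linux results valid, Windows results invalid
```

Under this assumed rule the same four results get opposite verdicts depending only on which pair reported first.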
lin machine
Fstats.Ha: bytecount 1601439 checksum 76844773
Fstats.Hb: bytecount 1287439 checksum 61769950
(note that the results are identical, yet they were granted zero credits)
Windows machines happen to be first.
We may officially have a conspiracy theory now ;)
No, it happens that it was the third (3rd) result, not the 4th. The 4th result was also from Linux and also got zero credits. Identical results from different Linux machines were marked invalid, while non-identical results from Windows are OK.
And moreover, more and more of my Linux machines get zero credits for their work.
I don't believe all my Linux machines are faulty... Some of them are servers that have been working well for a long time.
Could some developer explain how WUs are validated (what is validated)?
Or maybe the problem is that I downloaded the 4.24 version (since I use a proxy and there is a statement which made me download 4.24 first).
Maybe it's just a bug in the Linux version?
Kio
Joined: Feb 23 05 Posts: 5 ID: 32069 Credit: 5,897 RAC: 0
I know little about the inner workings of BOINC... it's all Voodoo to me =) Having said that here is a simple observation from me...
On Errors:
Running:
AMD 3200+, WinXP Pro, 1 GB DDR RAM in dual mode.
I frequently shut down BOINC (two to three times in an evening) because I am a PC gamer and under certain circumstances I need as little overhead as I can manage on my PC. E@H seems to be operating atypically when compared to the other projects I am running. It returns bad units, asks for restarts, and never lists credits.
On the credits issue:
If CPDN can trickle, can E@H? ... it would generate a lot of good will if nothing else.
> On this particular wu, what would have happened had the 2
>linux systems returned their results first? It hardly seems
>that simply crossing the finish line first should either
>validate or invalidate science.
In this case the two Linux systems reported first and effectively locked out the Windows boxes - by the look of it anyway.
I wonder what it is about the results that causes this problem?
> All that said, I'm really not overly concerned with points,
>but I really hate it when my contribution is useless.
I'm with you on that. I don't think our contributions are completely useless in cases like this, but it is somewhat frustrating - seems like such a waste of time, IYKWIM?
Well i guess there are some differences in computation between linux and windows versions.
I guess it's time for developers to say something here. Our work (and CPU power and energy power) is wasted this way.
ID: 6354 |
Peter Wagner
Joined: Feb 24 05 Posts: 3 ID: 34664 Credit: 73,454 RAC: 0
> I guess it's time for developers to say something here. Our work (and CPU
> power and energy power) is wasted this way.
>
developers please do something. this is an obvious bug.
____________
I examined some results from the Merlin cluster running Linux.
I found this one very ???
machine ID 3701 WU 1217945
It reported first, but its checksums differ from those of the 2 other WinXP machines, which had identical ones.
On the other hand, I also found late reports from a Merlin machine that was credited together with the XPs.
????????????????
____________
John,
ID: 6447 |
Bruce Allen Forum moderator Project administrator Project developer Project scientist
Joined: Oct 15 04 Posts: 986 ID: 3 Credit: 170,849,008 RAC: 0
> > I guess it's time for developers to say something here. Our work (and
> CPU
> > power and energy power) is wasted this way.
> >
> developers please do something. this is an obvious bug.
We are looking into this: it appears that our validator may be setting the agreement threshold slightly too tight in some cases. This is hard to 'tune in advance' without having access to the actual results. So please be patient: one of our developers is now working on this.
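As a rough picture of what an "agreement threshold" means, here is a minimal Python sketch. It is an illustration under assumptions, not the project's validator: `results_agree`, the sample numbers and the tolerances are all invented, and a plain relative tolerance via `math.isclose` stands in for whatever comparison is really used.

```python
import math

def results_agree(a, b, rel_tol):
    """True if two equally long lists of numbers match element by element
    within the given relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

# Invented outputs from two hosts that differ only in the last digits.
host_a = [1.23456789, 42.0000001, 0.5]
host_b = [1.23456788, 42.0000002, 0.5]

for tol in (1e-12, 1e-9, 1e-6):
    print(tol, results_agree(host_a, host_b, tol))
```

With these made-up numbers the two tighter tolerances reject the pair and the loosest one accepts it. Picking the right value needs real results to look at, which is the "hard to tune in advance" part.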
> I examined some result from the Merlin cluster running linux;
>
> I found this one very ???
> machine ID 3701 WU 1217945
>
> It reported first but with different checksums with the 2 other WinXP's which
> had identical ones.
>
> Found on the other hand also late reports for a Merlin machine but was
> credited also together with XP's.
>
> ????????????????
>
> Sorry, work ID = 342166 (and granted credit)
>
____________
John,
ID: 6454 |
Peter Wagner
Joined: Feb 24 05 Posts: 3 ID: 34664 Credit: 73,454 RAC: 0
> We are looking into this: it appears that our validator may be setting the
> agreement threshold slightly too tight in some cases. This is hard to 'tune
> in advance' without having access to the actual results. So please be
> patient: one of our developers is now working on this.
thanks in advance.
I'll be patient.
But this is hard to understand for a newbie like me. Two computers running the same software receive the same input, do the same computational stuff, should get the same results, and apply the same checksum algorithm to these results. Why can there be a difference at all?
Bruce Allen, note that this could be bigger problem than simply tuning the validator.
The history shows, that there were cases where first two windows machines returned similar results, and they were ok as should be. Then two linux machines returned identical results (hence different from windows machines) and they were marked invalid.
On some other thread i read, that core has different version on linux than windows (4.80 vs 4.79).
Maybe they just do different computation? Maybe difference is too big?
>
> > We are looking into this: it appears that our validator may be setting
> the
> > agreement threshold slightly too tight in some cases. This is hard to
> 'tune
> > in advance' without having access to the actual results. So please be
> > patient: one of our developers is now working on this.
>
> thanks in advance.
> I'll be patient.
>
> But hard to understand for a newbie like me. two computers running the same
> software - receive the same input - do the same computational stuff - should
> get the same results - apply the same checksum algorithm on these results -
> why can there be a difference at all?
>
> Peter
>
In two words: rounding errors.
The best example of this is 1/2=4.999999999 with as many 9's as your default floating type has significant digits. Different OSes and CPUs handle this differently. After doing this repeatedly, using each output as the input to the next calculation, the differences can become significant.
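A small Python sketch of how rounding alone can change a checksum. It assumes nothing about the real application: the three numbers are arbitrary, and CRC32 over the raw bytes of a double is only a stand-in for whatever checksum the project computes. The same three values are added in two different orders, which is the kind of difference a different compiler or library may introduce.

```python
import struct
import zlib

def checksum(value):
    """CRC32 of the 8 bytes of an IEEE 754 double (little-endian)."""
    return zlib.crc32(struct.pack("<d", value))

# Same numbers, same operation, different order of evaluation.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
print(left_to_right == right_to_left)  # False
print(checksum(left_to_right) == checksum(right_to_left))  # False
```

The two sums differ only in the last bit or so, but a checksum over the bytes treats them as completely different results.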
____________ BOINC WIKI
BOINCing since 2002/12/8
Not sure what calculator you used to come to 1/2=4.9999999999999 John, but I would throw it in the bin and use some old fashioned paper and a pencil. ;)
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
> On some other thread i read, that core has different version on linux than
> windows (4.80 vs 4.79).
> Maybe they just do different computation? Maybe difference is too big?
For database reasons we are currently keeping a separate minor version number for each architecture. However, the Apps 4.78/79/80 are built from exactly the same (science) code.
A stunningly simple example for architecture and rounding issues is (int)(10*0.3) (described here).
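The `(int)(10*0.3)` effect can be imitated in Python. This is a sketch under an assumption: exact rational arithmetic with `fractions.Fraction` stands in for a wider intermediate register that keeps the product without rounding it to a 64-bit double first. It is not a measurement of any particular CPU or compiler.

```python
from fractions import Fraction

x = 0.3  # stored as the nearest double, which is slightly below 3/10

# Product rounded to a 64-bit double before truncation.
rounded_product = x * 10

# Product kept exactly (assumed stand-in for a wider intermediate
# register) before truncation.
exact_product = Fraction(x) * 10

print(Fraction(x) < Fraction(3, 10))  # True: the stored 0.3 is a bit too small
print(int(rounded_product))           # 3
print(int(exact_product))             # 2
```

Rounding to a double first pushes the product up to exactly 3.0, while truncating the unrounded product gives 2. The same source expression can therefore yield two different integers depending on where the rounding happens.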
> Not sure what calculator you used to come to 1/2=4.9999999999999 John,
> but I would throw it in the bin and use some old fashioned paper and a pencil.
> ;)
>
You had better throw your computer in the trash then. That is the answer most computers will come up with if you are not careful how you program the expression. The way I typed it normally expands to int(1)/int(2)=float(4.99999999) notice the mixed numeric types. If you force all float type numbers float(1.0000000)/float(2.000000)=float(5.000000) it will normally get the correct answer.
Where this really affects things like how well the validator works is this: computation1 yields result 1.23456789 on one platform and 1.23456788 on another. That is not much of a difference and would be considered correct either way. However, the final result might involve doing that computation many times, and each time that little difference gets multiplied. Multiply it enough times and you get answers that are different enough not to match.
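Here is a rough Python sketch of that growth, using the two numbers from the post. The choice of 100 repeated multiplications is arbitrary and only for illustration; the real science code does something far more involved.

```python
def multiply_repeatedly(factor, steps):
    """Multiply 1.0 by the same factor `steps` times."""
    value = 1.0
    for _ in range(steps):
        value *= factor
    return value

def relative_gap(x, y):
    """Difference between x and y as a fraction of x."""
    return abs(x - y) / abs(x)

a, b = 1.23456789, 1.23456788  # the two platform results from the post

before = relative_gap(a, b)
after = relative_gap(multiply_repeatedly(a, 100), multiply_repeatedly(b, 100))

print(before)          # roughly 8e-9
print(after)           # roughly 8e-7
print(after / before)  # close to 100
```

After 100 multiplications the relative gap is about 100 times larger than it was at the start, so a tolerance that comfortably covers a single step may no longer cover the end result.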
In the real calculations done for the project, double precision numbers are most likely used. They are generally stored internally at 80 bits and then rounded to 64 bits when the result is sent back. This is done to reduce the rounding errors, but they do creep in.
____________
ID: 6592 |
Peter Wagner
Joined: Feb 24 05 Posts: 3 ID: 34664 Credit: 73,454 RAC: 0
> > Not sure what calculator you used to come to 1/2=4.9999999999999
> John,
> > but I would throw it in the bin and use some old fashioned paper and a
> pencil.
> > ;)
> >
> You had better throw your computer in the trash then. That is the answer most
> computers will come up with if you are not careful how you program the
> expression. The way I typed it normally expands to
> int(1)/int(2)=float(4.99999999) notice the mixed numeric types. If you force
> all float type numbers float(1.0000000)/float(2.000000)=float(5.000000) it
> will normally get the correct answer.
What do you get when you divide 1 by 2?
your answer : 4.99999999
my answer : one half
my good old slide rule : 0.49999999
my mathematics teacher : there is a solution to your problem
my computer : 42
And one more piece of evidence that this project is too generous when it comes to the resources of contributors: the fourfold calculation of the same workunit when just two "identical" results are enough to mark the outcome as valid. What about sending 1 WU to only two sites first and using a third site only to pick out the liar in the case of different results? And if we get three divergent reports, well, is it still science? The code really should be robust enough not to allow that. Wijata is right, we should probably wait some time until E@H matures.
____________
> Well i guess there are some differences in computation
>between linux and windows versions.
>
> I guess it's time for developers to say something here....
Actually, it seems they are aware of the problem and looking at it. Take a look at this thread.
TTFN - Pete.
____________
ID: 6737 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
> Actually, it seems they are aware of the problem and looking at it.
We are. It turned out that it's not done by just adjusting some parameters, but we need to change the validator as a whole, which means basically to rewrite it. We are working on this.
>> Actually, it seems they are aware of the problem and
>>looking at it....
>
> We are. It turned out that it's not done by just adjusting
>some parameters, but we need to change the validator as a whole,
>which means basically to rewrite it. We are working on this.
Ouch, sounds like a big job. I wish you well with it!
What is the ETA on rewrite, near term or longer term?
____________
ID: 6867 |
Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2085 ID: 2 Credit: 25,013,655 RAC: 35,457
There are people already working on it. I'd expect this to be ready about next week, but usually all that can go wrong will go wrong - so don't bet on it. Let's hope that it goes wrong before we put it on the public server.
I already have half a dozen Results trashed as "invalid" on my Linux Boxes so far, I was beginning to become concerned (as 4 other Projects validate all their work as correct)
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
My Red Hat system is now getting credit for latest work, it still has 3 as invalid, Xandros is still all invalid. No changes made at my end.
____________
ID: 7617 |
Peter Koek
Joined: Mar 4 05 Posts: 21 ID: 46774 Credit: 322,305 RAC: 191
I joined this project this week because I am very interested in science. However, my first result is "invalid". I don't understand why: if 2 computers are given the same input, they should produce the same output. I don't think the OS should have any effect, because the FPU does the calculation and not the OS. But having said this, I see lots of people complaining about invalid results with Linux OSes, and I also have a Linux OS. I use Redhat 9 with a 2.6.10 kernel. Therefore I am thinking of quitting this project again, because it's a waste of electricity if all results from Linux OSes are marked as "invalid".
+/- 100 Watt (0.1 kW) of power is used by a PC.
1 kWh costs 0.06 euro
tax (netherlands) 0.08 euro
---------------------------
0.14 euro
24/7 for 1 year results: 365 * 24 * 0.1 * 0.14 = 122.64 euro
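The same arithmetic as a short Python sketch, using only the figures quoted above (about 100 W, 0.06 euro energy price and 0.08 euro tax per kWh). The variable names are just for illustration.

```python
POWER_KW = 0.1        # about 100 W drawn by the PC
PRICE_PER_KWH = 0.06  # euro, energy price quoted above
TAX_PER_KWH = 0.08    # euro, Dutch tax quoted above
HOURS_PER_YEAR = 365 * 24

energy_kwh = POWER_KW * HOURS_PER_YEAR
yearly_cost = energy_kwh * (PRICE_PER_KWH + TAX_PER_KWH)

print(f"{energy_kwh:.0f} kWh per year, {yearly_cost:.2f} euro per year")
```

This gives 876 kWh and 122.64 euro per year, matching the hand calculation.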
Lots of money, therefore I want more feedback why results are marked as "invalid".
____________
ID: 7971 |
Gary Roberts Forum moderator
Joined: Feb 9 05 Posts: 2167 ID: 12521 Credit: 70,318,296 RAC: 133,692
> Lots of money, therefore I want more feedback why results are marked as
> "invalid".
Have you actually read the information in this thread?? Here is what Prof Allen has already said about the very issue you are raising.
> > I guess it's time for developers to say something here. Our work (and
> CPU
> > power and energy power) is wasted this way.
> >
> developers please do something. this is an obvious bug.
We are looking into this: it appears that our validator may be setting the agreement threshold slightly too tight in some cases. This is hard to 'tune in advance' without having access to the actual results. So please be patient: one of our developers is now working on this.
Bruce
This "tight validation" problem is affecting a lot of people running linux. For more information do a keyword search for something like "linux validation" and see what you get.
> This "tight validation" problem is affecting a lot
>of people running linux....
Not just Linux, Windows too....I've had a number of results declared invalid where two or more of the other hosts were running Linux.
Haven't had the problem recently as it seems TPTB have arranged for only machines running the same OS to crunch particular WUs - or is this perhaps coincidence?
> A new linux computer attached to the project yesterday has gotten nothing but
> zeros, in three attempts.
>
> http://einstein.phys.uwm.edu/results.php?hostid=76355
>
I bet you get another 0 credits on the workunit with currently pending credits (http://einstein.phys.uwm.edu/workunit.php?wuid=504786). The other returned (from a windows machine) result has a different checksum and the other computers working on the same WU are also Windows boxes.
I'll move to the climate simulations (climateprediction.net) while this problem is not solved.
I just came back on board a few days ago when LHC went down again & out of 113 WU's of mine that have been granted credit so far none of them had 0 Credits Granted, most of them had more than I requested in fact ...
ID: 8915 |
Jorge Guerra
Joined: Mar 6 05 Posts: 1 ID: 48128 Credit: 94,812 RAC: 60
It's really depressing: ~61% of my units have been granted *zero* credits.
I don't know what to do, it seems like all the work my computer is doing is just going to the trash, I'm starting to believe that it's a conspiracy against linux users :(
Is this "just the way things will be"? Or is there a better desired end-state?
I would hope for the latter.
Bruce Richardson
Team: Richwood Timber LLC
____________
When the going gets weird, the weird turn PRO. -- Hunter S. Thompson
ID: 9733 |
Robert Somerville
Joined: Nov 11 04 Posts: 27 ID: 1873 Credit: 21,819 RAC: 0
> There are people already working on it. I'd expect this to be ready about next
> week, but usually all that can go wrong will go wrong - so don't bet on it.
> Let's hope that it goes wrong before we put it on the public server.
>
> BM
>
Are there any results from the programmers on speeding up the Linux versions & fixing the validation problems on Linux ??
There is no reason that Linux machines should not be within 10% of Windows machines
if the programming staff has any clue at all (sorry for being direct about it). This issue could be related to the validation scheme problems: if you are validating against Windoze machines, perhaps you are striving too hard to match their "C library" when you shouldn't be. GNU C++ is widely regarded as the most correct C++ compiler ...........
____________
Robert Somerville
> I don't know what to do, it seems like all the work my computer is doing is
> just going to the trash, I'm starting to believe that it's a conspiracy against
> linux users :(
>
____________
> > I don't know what to do, it seems like all the work my computer is doing
> is
> > just going to the trash, I'm starting to believe that it's a conspiracy
> against
> > linux users :(
> >
EDIT TO ABOVE: Who the heck is using my account to post messages and claim my results? There cannot be two CMHCS's with the same account keys.......
____________
> > > I don't know what to do, it seems like all the work my computer is
> doing
> > is
> > > just going to the trash, I'm starting to believe that it's a
> conspiracy
> > against
> > > linux users :(
> > >
> EDIT TO ABOVE: Who the heck is using my account to post messages and claim my
> results. There cannot be two CMHCS's with the same account keys.......
>I did not write the post above, I don't even own a Linux machine.......
____________
Linux 2.4.26-HN-1.6-AMD
AuthenticAMD AMD Athlon(tm) XP 1700+
13 results:
7 with positive granted credit
4 with zero granted credit
2 with validate error
____________
ID: 9906 |
Robert Somerville
Joined: Nov 11 04 Posts: 27 ID: 1873 Credit: 21,819 RAC: 0
> going to quit
>
> Linux 2.4.26-HN-1.6-AMD
> AuthenticAMD AMD Athlon(tm) XP 1700+
>
> 13 results:
> 7 with positive granted credit
> 4 with zero granted credit
> 2 with validate error
>
>
Install Wine; check out the posts from John. I haven't had an invalid result in 2 days & have racked up ~700pts (it's twice as fast as the Linux version) (but not as satisfying)
here's John's post to me ....
also the posts on xvnc if you are logging out of X (do a search on Xvnc)
**********************************
You can do a search on WINE: see also my following answer in another thread I gave.
IMHO running under Linux is a waste of time; after a couple of approved and credited WUs, all my recent WUs are credited 0 again.
Combine this with the slow running on all platforms but Windows, and I decided to install Wine as well. Please have a look at the following site about DLLs and InstallShield issues, and setup will be easy.
http://frankscorner.org/index.php?p=ishield
Please use the 4.19 version for Windows and run 'nice wine boinc_cli.exe -return_results_immediately -allow_remote_gui_rpc'
Do not forget to export the warnings as follows, (before starting boinc !! in the xterm):
export WINEDEBUG="err-all,warn-all,fixme-all,trace-all"
You can have the screensaver then by telnetting to port 127.0.0.1 31416
Results so far are good for most people,
Hopefully there will be a better native solution for specific nonWin platforms
PS.
Without the 'export WINEDEBUG' there will be many errors in your xterm window; they seem to be related to scheduling, but they do NOT influence the results. I already have many credited WUs, and crunching has been error-free for 96 hours already.
Success !!
John
____________
Robert Somerville
ID: 9949 |
Darren
Joined: Jan 18 05 Posts: 94 ID: 2400 Credit: 53,420 RAC: 0
> Install Wine; check out posts from john; i haven't had an invalid result in 2
> days & have racked up ~700pts (its twice as fast as linux version) (but
> not as satisfying )
I've had no luck at all getting it to run with wine. I can get through the install, attach and download work, but then they just immediately error with the following stderr output:
[core_client_version]4.19[/core_client_version]
[message] - exit code 99 (0x63)
[/message]
[active_task_state]1[/active_task_state]
[signal]0[/signal]
[stderr_txt]
WARNING: Can't boinc-resolve config file "conf"
Could not open data-file: `./conf`
Level 0: $Id: ComputeFStatistic.c,v 1.232 2005/02/11 17:03:16 ballen Exp $
Function call `LALUserVarReadAllInput(stat,argc,argv)' failed.
file ComputeFStatistic.c, line 436
Level 1: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadAllInput, file UserInput.c, line 680
Level 2: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadCfgfile, file UserInput.c, line 463
Level 3: $Id: ConfigFile.c,v 1.15 2004/11/03 23:52:05 reinhard Exp $
Status code 2: File error.
function LALParseDataFile, file ConfigFile.c, line 188
BOINC_ERR_EXIT: now calling boinc_finish()
[/stderr_txt]
And I am not a computer anything except a user, so that's all nothing but plain old gibberish to me.
I'm in the same spot as AK1001 at this point - I'm surrendering. I have 2 systems that are dual boot. One of them is my primary system, so I boot it into windows when I go to bed, but I can't stand to use windows so I have to boot it back into linux when I get up. My others are linux only and sit off in a closet with me only accessing them with ssh, so booting them into windows isn't an option.
Between 3 different issues it's driving me insane trying to keep einstein going. Most annoying is that einstein rarely restarts after a project change. Then there's the issue of the linux client being so slow, and then combine that with the fact that a lot of the work gets marked invalid after all that, and it's just a bit much.
I've set all my systems today to an alternate profile that will stop any further work from downloading. Once they've finished what they have now, I'll detach them from einstein and just let them keep going with projects that aren't giving them so much grief. Not what I want to do, but I'm used to checking them every couple weeks and finding everything going ok, not having to check them every couple hours and finding problems.
____________
ID: 9953 |
Robert Somerville
Joined: Nov 11 04 Posts: 27 ID: 1873 Credit: 21,819 RAC: 0
> > Install Wine; check out posts from john; i haven't had an invalid result
> in 2
> > days & have racked up ~700pts (its twice as fast as linux version)
> (but
> > not as satisfying )
>
i used the 4.19 version of boinc, which version did you install ??
____________
Robert Somerville
ID: 10074 |
Darren
Joined: Jan 18 05 Posts: 94 ID: 2400 Credit: 53,420 RAC: 0
> i used the 4.19 version of boinc, which version did you install ??
That was with 4.19. From what I've read anything newer isn't working with wine for anyone, so all of my attempts were with 4.19.
I am using a win2000 PC & 98% of the WUs are valid. I am also using Fedora Core 3 on my server, which is hyperthreaded, & there only 40% of the WUs are valid. At first I thought it was the hyperthreading, so I cut it down to use one processor & that has stepped the valid results up to about 50%.
____________
I'm about to take one machine offline, which all of a sudden started to turn out a lot of zero-score results (about 60..70%) two weeks ago. Wine is no option on that machine. It's an Athlon XP 2400+ running Linux (Debian, kernel 2.4.27). The other machines I have committed range from Athlon XP 1700+ to XP 2600+, all of them running Linux, but so far only this machine exhibits the problem: http://einstein.phys.uwm.edu/results.php?hostid=39647
> @wijata.com: Please post the Result ids or names, or at least the id of the
> machine you did this on. I'll take a look at this.
>
> BM
>
I have the same problem - no OC, no Ctrl C. Report time was 1st of April.
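Result ID: 2400876
Comp ID: 87327
Sent: 25 Mar 2005 2:26:29 UTC
Reported or deadline: 31 Mar 2005 1:14:36 UTC
Server state: Over
Outcome: Success
Client state: Done
CPU time (sec): 38,431.24
Claimed credit: 87.68
Granted credit: 0.00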
I think that Einstein seems to be a (very) little bit buggy, because it also happens to me that a lot of units are "state: invalid". It takes a day of CPU to get no credit, and this is very frustrating. I had better go back to SETI, because the CPU is better used there. I don't think the Einstein people need invalid results either. (By the way: the machine is Linux, with no additional BOINC project running and no crashes; it has been up and running for a couple of months. Nevertheless the calculation in the application sometimes runs into rubbish.... :-(
rainy
This material is based upon work supported by the National Science
Foundation (NSF) under Grant NSF-0200852 and by the Max Planck
Gesellschaft (MPG). Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the investigators
and do not necessarily reflect the views of the NSF or the MPG.