granted credit zero


debugas
Joined: Nov 11 04
Posts: 171
ID: 641
Credit: 69,044
RAC: 23
Message 5710 - Posted 26 Feb 2005 17:29:18 UTC
Last modified: 26 Feb 2005 17:34:08 UTC

Here is an example I don't quite understand:
four people worked on the WU, but the one who claimed the least credit was actually granted zero.

Is this normal, and if not, what went wrong?
Why is Outcome=Success when the result's validate state is Invalid?
____________

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 5711 - Posted 26 Feb 2005 17:33:00 UTC

When you click on the Result that was granted 0.00 credit you'll see that the "Validate state" is "invalid". Something's wrong with the output file that was produced.

BM

____________
BM

debugas
Joined: Nov 11 04
Posts: 171
ID: 641
Credit: 69,044
RAC: 23
Message 5713 - Posted 26 Feb 2005 17:34:46 UTC - in response to Message 5711.

> When you click on the Result that was granted 0.00 credit you'll see that the
> "Validate state" is "invalid". Something's wrong with the output file that was
> produced.

But why Outcome=Success? I thought the outcome would not be success in such situations.
____________

gravitysmith
Joined: Nov 8 04
Posts: 54
ID: 269
Credit: 2,517,688
RAC: 4,294
Message 5764 - Posted 26 Feb 2005 21:13:33 UTC

Here is another WU (327852) with a similar issue. Could it be that the "canonical" result is actually invalid and therefore trashing everybody else's credits? My host (1597) doesn't seem to indicate any errors, and the byte count is identical to the other hosts which received 0 credits.

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 5782 - Posted 26 Feb 2005 23:22:29 UTC - in response to Message 5713.

> but why Outcome=Success, i thought outcome would not be success in such situations

Outcome=success means that the program (Application) ran without crashing. It doesn't tell you anything about the result file itself; checking that is the job of the validator on the server side.
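
As a rough illustration of that distinction: the outcome is reported by the client, while the validate state is set later on the server. The sketch below is hypothetical; the class, field, and function names are made up for this example and are not BOINC's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """What the client reports: did the application run to completion?"""
    SUCCESS = "success"
    CLIENT_ERROR = "client error"


class ValidateState(Enum):
    """What the server-side validator decides about the output file."""
    INITIAL = "initial"
    VALID = "valid"
    INVALID = "invalid"


@dataclass
class Result:
    outcome: Outcome
    validate_state: ValidateState = ValidateState.INITIAL
    granted_credit: float = 0.0


def apply_validation(result: Result, output_ok: bool, claimed_credit: float) -> None:
    """Hypothetical server step: it only touches validate_state and credit.

    The client-reported outcome is left alone, which is why a Result can
    show Outcome=Success and still be invalid with zero credit.
    """
    if output_ok:
        result.validate_state = ValidateState.VALID
        result.granted_credit = claimed_credit
    else:
        result.validate_state = ValidateState.INVALID
        result.granted_credit = 0.0
```

With this model, a Result whose application exited cleanly but whose output file fails the check ends up with outcome SUCCESS, validate state INVALID, and 0.00 granted credit.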

We get a lot of Results back that are more or less numerical garbage, possibly due to overclocking affecting the FPU, though the program didn't really crash.

I currently can't say what's wrong with your particular Result; if it was related to the problem in the validator, it should not happen again.

BM
____________
BM

Magnus Back
Joined: Feb 19 05
Posts: 12
ID: 20342
Credit: 10,964
RAC: 0
Message 5840 - Posted 27 Feb 2005 5:46:10 UTC

Isn't it strange that the einstein@home WUs return bad results from machines
where seti@home WUs don't ...


____________

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 5849 - Posted 27 Feb 2005 7:58:24 UTC - in response to Message 5840.
Last modified: 27 Feb 2005 8:06:55 UTC

> Isn't it strange that the einstein@home WUs return bad results from machines where seti@home WUs don't ...

I don't know the SETI code, neither the App's nor the validator's, but I can imagine at least two reasons for this:

1. SETI's validator isn't as picky as Einstein's. Thus they get back bad results but don't recognize them, at least not at the level of analysis that users get to see.

2. SETI's App might rely more on memory and integer operations, while Einstein is definitely FPU bound. The CPU chip gets hot at the spot where the most energy is needed. When it gets too hot, it first breaks the results of the unit that is located there. If an integer unit gives false results, this will soon end in a crash of the program or the OS, e.g. because of wrong memory address calculations. If it's the FPU that gets too hot, you will notice nothing while the program runs, until you take a close look at the results.

Carl Christensen told me that CPDN gets all kinds of weird and obviously wrong results from overclocked machines. However, I don't know how the CPDN validator handles them.

BM
____________
BM

debugas
Joined: Nov 11 04
Posts: 171
ID: 641
Credit: 69,044
RAC: 23
Message 5858 - Posted 27 Feb 2005 9:57:23 UTC - in response to Message 5849.

> > Isn't it strange that the einstein@home WUs return bad results from
I can imagine one more reason: I work on the PC that E@H is running on, and sometimes it hangs or crashes. I am forced to turn it off and on again, and then scan disk finds FAT allocation problems and fixes them. Maybe E@H runs into problems after such a restart?

Molgo
Joined: Feb 24 05
Posts: 1
ID: 36343
Credit: 176,101
RAC: 0
Message 5861 - Posted 27 Feb 2005 10:05:03 UTC - in response to Message 5849.

I noticed that the machine failing to deliver a valid result is a Linux 2.6.10 - machine, while all other machines completing the same WU are on Windows. My own machine (AMD-Duron 1800MHz) is running on Linux 2.6.3-7mdk, and is not overclocked. The first result was marked valid, but the last two, or rather three, results show errors in the log and were marked invalid.

Could this problem be somehow related to stopping/resuming work on a Linux-machine?

The error shown in the log

APP DEBUG: Application caught signal 2
Resuming computation at 23563/2690776/2691132

is I guess due to a shutdown of the program via Ctrl-C in a console. At least this is how I do it, and I get the same errors in my logs.

Any comments?


____________

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 5875 - Posted 27 Feb 2005 11:36:51 UTC
Last modified: 27 Feb 2005 11:38:26 UTC

This may be. I just started participating in E@H and have a few Linux machines. The machines sent some results and claimed some credit. Every machine on which BOINC was stopped and started again has been granted zero credit for its work.

Well, in my case I did it on purpose. I found the CPU unused because the client couldn't contact the server to get more work. And
I found in the log: Deferring communication with project for 20 hours ....
So I stopped BOINC, removed the min_rpc_time line from the config,
and ran BOINC again. Now the client didn't wait; it just contacted the server, got some work, and ran the core. The drawback is that I was granted zero credit.

On the one hand I don't like BOINC waiting over 20 hours while doing nothing; on the other, I don't want to be granted zero credit.
Any idea how to resolve this?

Ned
Joined: Jan 22 05
Posts: 18
ID: 4548
Credit: 1,314,435
RAC: 143
Message 5904 - Posted 27 Feb 2005 15:14:11 UTC

My one and only Linux result suffered the same "invalid" fate with the "zero" credit rating... I had stopped Boinc a few times to see what was happening and to drop out of "X" to console mode.

So, is this a bug in the validator, or is the Client not handling the shutdown properly in order to be restarted? Either way, it is a waste of several hours of processing if you expect every WU to have uninterrupted time.

____________
Ol' Retired IT Geezer

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 5920 - Posted 27 Feb 2005 16:13:43 UTC

@wijata.com: Please post the Result ids or names, or at least the id of the machine you did this on. I'll take a look at this.

BM

____________
BM

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 5925 - Posted 27 Feb 2005 16:25:51 UTC - in response to Message 5920.

> @wijata.com: Please post the Result ids or names, or at least the id of the
> machine you did this on. I'll take a look at this.

Result id 1311263, though that computer is known to have had some memory issues in the past.
Result ids 1286319 and 1283956 come from machines known to be working well.
All those machines were previously running Folding@Home with no problems at that time; however, the computation was surely different.

Darren
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 6032 - Posted 28 Feb 2005 0:35:02 UTC
Last modified: 28 Feb 2005 0:46:40 UTC

OK, here's another one to add to the mix. I think it's a conspiracy against us Linux users ;~|

396825

This wu was assigned to 2 windows systems and 2 linux systems. Looking at the results, the bytecount and checksums for the linux systems are identical. The windows systems are not identical, but obviously within the tolerance range.

Now, the 2 windows systems returned their results first so that set the determination for what is "valid" when the first linux result came in and didn't match. But, if the 2 linux results match exactly, how can it be determined that they are wrong (hence "invalid") and they get 0 credit while the "close enough" windows results are accepted as correct and given credit?

On this particular wu, what would have happened had the 2 linux systems returned their results first? It hardly seems that simply crossing the finish line first should either validate or invalidate science.
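
The order dependence asked about here can be reproduced with a toy model. The sketch below is an assumption about how a "first quorum wins" validator might behave; it is not the actual Einstein@Home validator, and the function names, the tolerance, and the numbers are all invented for illustration.

```python
def agree(a, b, tol):
    """Relative comparison: True if a and b differ by at most tol (relative)."""
    return abs(a - b) <= tol * max(abs(a), abs(b))


def validate_in_arrival_order(results, tol, quorum=2):
    """results is a list of (host, value) pairs in arrival order.

    The first group of `quorum` mutually agreeing results fixes the
    canonical value; everything else is judged against it.
    """
    canonical = None
    seen = []
    states = {}
    for host, value in results:
        seen.append((host, value))
        if canonical is None:
            matches = [h for h, v in seen if agree(v, value, tol)]
            if len(matches) >= quorum:
                canonical = value
                for h, v in seen:
                    states[h] = "valid" if agree(v, canonical, tol) else "invalid"
            else:
                states[host] = "pending"
        else:
            states[host] = "valid" if agree(value, canonical, tol) else "invalid"
    return states


# Invented values: the Windows pair is "close enough", the Linux pair is
# identical, and the two platforms differ by more than the tolerance.
WINDOWS = [("win1", 1.0), ("win2", 1.0000001)]
LINUX = [("lin1", 1.0001), ("lin2", 1.0001)]

windows_first = validate_in_arrival_order(WINDOWS + LINUX, tol=1e-6)
linux_first = validate_in_arrival_order(LINUX + WINDOWS, tol=1e-6)
```

In this model `windows_first` marks both Linux results invalid, while `linux_first` marks both Windows results invalid, so whichever platform reaches quorum first decides what counts as valid.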

All that said, I'm really not overly concerned with points, but I really hate it when my contribution is useless. Thus far, I'm running about 40% invalid on einstein units across 5 linux systems - and that's very frustrating.

Darren


____________

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 6125 - Posted 28 Feb 2005 13:32:36 UTC

I'm afraid You are 100 percent right
Look also here http://einstein.phys.uwm.edu/workunit.php?wuid=369574

win machine
Fstats.Ha: bytecount 1601439 checksum 76686342
Fstats.Hb: bytecount 1287439 checksum 61644201

win machine
Fstats.Ha: bytecount 1601439 checksum 76686342
Fstats.Hb: bytecount 1287439 checksum 61644194
(note that the checksums differ, yet both machines were granted credit)

lin machine
Fstats.Ha: bytecount 1601439 checksum 76844773
Fstats.Hb: bytecount 1287439 checksum 61769950

lin machine
Fstats.Ha: bytecount 1601439 checksum 76844773
Fstats.Hb: bytecount 1287439 checksum 61769950
(note that the results are identical, yet they were granted zero credit)
The Windows machines happened to be first.
We may officially have a conspiracy theory now ;)

debugas
Joined: Nov 11 04
Posts: 171
ID: 641
Credit: 69,044
RAC: 23
Message 6158 - Posted 28 Feb 2005 16:18:06 UTC - in response to Message 6125.

If I understand right, the evaluation is done on the first 3 results reported.
It just happened that your Linux machine was the last one to report.

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 6164 - Posted 28 Feb 2005 16:34:28 UTC
Last modified: 28 Feb 2005 16:35:08 UTC

No, as it happens it was the third (3rd) result, not the 4th. The 4th result was also from Linux and also got zero credit. Identical results from different Linux machines were marked invalid, while non-identical results from Windows are OK.

Moreover, more and more of my Linux machines get zero credit for their work.
I don't believe all my Linux machines are faulty... Some of them are servers that have been working well for a long time.

Could some developer explain how WUs are validated (and what is validated)?

Or maybe the problem is that I downloaded version 4.24 (since I use a proxy and there is a statement which made me download 4.24 first).
Maybe it's just a bug in the Linux version?

mhe
Joined: Feb 24 05
Posts: 11
ID: 35424
Credit: 7,576
RAC: 0
Message 6177 - Posted 28 Feb 2005 17:05:46 UTC - in response to Message 6164.

hi,

i'm using

http://www.pperry.f2s.com/files/boinc_4.19_pentium4-pc-linux-gnu.tar.bz2
http://www.pperry.f2s.com/files/boinc_4.19_athlon-xp-pc-linux-gnu.tar.bz2

on a laptop and a ws.

all get invalid results

http://einstein.phys.uwm.edu/result.php?resultid=1358689
http://einstein.phys.uwm.edu/result.php?resultid=1325488
http://einstein.phys.uwm.edu/result.php?resultid=1325484

____________

Higgs Boson
Joined: Feb 25 05
Posts: 15
ID: 38687
Credit: 5,891,610
RAC: 5,710
Message 6253 - Posted 28 Feb 2005 22:25:36 UTC

I'm having the same results, output looks good but got no credit.

Computer id=45532 WU's are 379527, 375913, 372616 and 368687
____________

mhe
Joined: Feb 24 05
Posts: 11
ID: 35424
Credit: 7,576
RAC: 0
Message 6255 - Posted 28 Feb 2005 22:32:53 UTC - in response to Message 6253.

> I'm having the same results, output looks good but got no credit.
>
> Computer id=45532 WU's are 379527, 375913, 372616 and 368687
>

just finished and uploaded

invalid

http://einstein.phys.uwm.edu/workunit.php?wuid=372656
____________

Kio
Joined: Feb 23 05
Posts: 5
ID: 32069
Credit: 5,897
RAC: 0
Message 6282 - Posted 1 Mar 2005 0:29:41 UTC

I know little about the inner workings of BOINC... it's all Voodoo to me =) Having said that here is a simple observation from me...

On Errors:
Running:
AMD 3200+, WinXP Pro, 1 GB DDR RAM in dual mode.
I frequently shut down BOINC (two to three times in an evening) because I am a PC gamer and under certain circumstances I need as little overhead as I can manage on my PC. E@H seems to be operating atypically compared to the other projects I am running. It returns bad units, asks for restarts, and never lists credits.

On the credits issue:
If CPDN can trickle, can E@H? ... it would generate a lot of good will if nothing else.

Thanks
____________
-Kio

Ensor
Joined: Feb 9 05
Posts: 48
ID: 11399
Credit: 257,408
RAC: 462
Message 6321 - Posted 1 Mar 2005 3:20:07 UTC - in response to Message 6032.


Hi,

> On this particular wu, what would have happened had the 2
>linux systems returned their results first? It hardly seems
>that simply crossing the finish line first should either
>validate or invalidate science.

Actually, take a look at this WU: #366758.

In this case the two Linux systems reported first and effectively locked out the Windows boxes - by the look of it anyway.

I wonder what it is about the results that causes this problem?


> All that said, I'm really not overly concerned with points,
>but I really hate it when my contribution is useless.

I'm with you on that. I don't think our contributions are completely useless in cases like this, but it is somewhat frustrating - seems like such a waste of time, IYKWIM?


TTFN - Pete.

____________

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 6354 - Posted 1 Mar 2005 8:27:49 UTC

Well, I guess there are some differences in computation between the Linux and Windows versions.

I guess it's time for the developers to say something here. Our work (and CPU power and energy) is wasted this way.

Peter Wagner
Joined: Feb 24 05
Posts: 3
ID: 34664
Credit: 73,454
RAC: 0
Message 6434 - Posted 1 Mar 2005 18:32:55 UTC - in response to Message 6354.

> I guess it's time for developers to say something here. Our work (and CPU
> power and energy power) is wasted this way.
>
developers please do something. this is an obvious bug.
____________

john.mac
Joined: Feb 9 05
Posts: 85
ID: 8324
Credit: 167,393
RAC: 0
Message 6447 - Posted 1 Mar 2005 18:55:54 UTC

I examined some results from the Merlin cluster running Linux.

I found this one very ???
machine ID 3701 WU 1217945

It reported first, but with checksums different from those of the 2 other WinXP machines, which had identical ones.

On the other hand, I also found late reports from a Merlin machine that was nevertheless credited together with the XP machines.

????????????????


____________
John,

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 986
ID: 3
Credit: 170,849,008
RAC: 0
Message 6450 - Posted 1 Mar 2005 19:05:12 UTC - in response to Message 6434.

> > I guess it's time for developers to say something here. Our work (and
> CPU
> > power and energy power) is wasted this way.
> >
> developers please do something. this is an obvious bug.

We are looking into this: it appears that our validator may be setting the agreement threshold slightly too tight in some cases. This is hard to 'tune in advance' without having access to the actual results. So please be patient: one of our developers is now working on this.
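
To make the idea of an agreement threshold concrete, here is a minimal sketch. It assumes the validator compares corresponding numbers in two result files against a relative tolerance; the function name, the sample values, and both tolerances are hypothetical and do not come from the project's validator.

```python
import math


def results_agree(a, b, rel_tol):
    """True if the two lists have equal length and every pair of values
    agrees within the given relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Invented sample values: two hosts that differ only in the last
# printed digit of the first number.
host_a = [12.345678, 0.00098765, 4321.0]
host_b = [12.345679, 0.00098765, 4321.0]

too_tight = results_agree(host_a, host_b, rel_tol=1e-9)  # rejects the pair
looser = results_agree(host_a, host_b, rel_tol=1e-6)     # accepts the pair
```

The same pair of results is rejected or accepted depending only on the tolerance, which is why choosing it without real result data is hard.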

Bruce
____________

john.mac
Joined: Feb 9 05
Posts: 85
ID: 8324
Credit: 167,393
RAC: 0
Message 6454 - Posted 1 Mar 2005 19:13:31 UTC - in response to Message 6447.

> I examined some result from the Merlin cluster running linux;
>
> I found this one very ???
> machine ID 3701 WU 1217945
>
> It reported first but with different checksums with the 2 other WinXP's which
> had identical ones.
>
> Found on the other hand also late reports for a Merlin machine but was
> credited also together with XP's.
>
> ????????????????
>
Sorry, work ID = 342166 (and granted credit)
>
____________
John,

Peter Wagner
Joined: Feb 24 05
Posts: 3
ID: 34664
Credit: 73,454
RAC: 0
Message 6456 - Posted 1 Mar 2005 19:18:51 UTC - in response to Message 6450.


> We are looking into this: it appears that our validator may be setting the
> agreement threshold slightly too tight in some cases. This is hard to 'tune
> in advance' without having access to the actual results. So please be
> patient: one of our developers is now working on this.

thanks in advance.
I'll be patient.

But it is hard to understand for a newbie like me. Two computers running the same software receive the same input, do the same computational stuff, should get the same results, and apply the same checksum algorithm to these results. Why can there be a difference at all?

Peter
____________

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 6460 - Posted 1 Mar 2005 19:47:07 UTC
Last modified: 1 Mar 2005 19:48:53 UTC

Bruce Allen, note that this could be a bigger problem than simply tuning the validator.
The history shows that there were cases where the first two Windows machines returned similar results, and they were OK, as they should be. Then two Linux machines returned identical results (though different from those of the Windows machines) and they were marked invalid.

In some other thread I read that the core has a different version on Linux than on Windows (4.80 vs 4.79).
Maybe they just do different computations? Maybe the difference is too big?

Greetings from Poland!

Keck_Komputers
Joined: Jan 18 05
Posts: 376
ID: 2914
Credit: 829,579
RAC: 1,653
Message 6502 - Posted 1 Mar 2005 23:44:05 UTC - in response to Message 6456.

>
> > We are looking into this: it appears that our validator may be setting
> the
> > agreement threshold slightly too tight in some cases. This is hard to
> 'tune
> > in advance' without having access to the actual results. So please be
> > patient: one of our developers is now working on this.
>
> thanks in advance.
> I'll be patient.
>
> But hard to understand for an newbie like me. two computers running the same
> software - receive the same input - do the same computational stuff - should
> get the same results - apply the same checksum algorithm on these results -
> why can there be a difference at all?
>
> Peter
>
In two words: rounding errors.

The best example of this is 1/2=4.999999999 with as many 9's as your default floating type has significant digits. Different OS and CPUs handle this differently. After doing this repeatedly and using the output for input on the next calculation the differences can become significant.
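
A rounding effect that can be reproduced on any machine with IEEE 754 doubles is the order of additions. Values such as 0.1, 0.2 and 0.3 have no exact binary representation, so each addition rounds, and grouping the same three numbers differently gives two different sums. This is only a small sketch of the general effect; it says nothing about which operations the Einstein@Home application actually performs.

```python
# The same three numbers, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two sums differ in the last bit of the result.
sums_differ = left_to_right != right_to_left
gap = abs(left_to_right - right_to_left)
```

A compiler or math library that reorders or regroups operations differently on two platforms can therefore produce results that differ in the last digits even from identical source code.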
____________
BOINC WIKI

BOINCing since 2002/12/8

Ageless
Joined: Jan 26 05
Posts: 1953
ID: 7430
Credit: 155,035
RAC: 78
Message 6507 - Posted 1 Mar 2005 23:54:21 UTC - in response to Message 6502.

Not sure what calculator you used to come to 1/2=4.9999999999999 John, but I would throw it in the bin and use some old fashioned paper and a pencil. ;)


____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 6564 - Posted 2 Mar 2005 4:16:05 UTC - in response to Message 6460.
Last modified: 2 Mar 2005 4:22:32 UTC

> On some other thread i read, that core has different version on linux than
> windows (4.80 vs 4.79).
> Maybe they just do different computation? Maybe difference is too big?

For database reasons we are currently keeping a separate minor version number for each architecture. However, the Apps 4.78/79/80 are built from exactly the same (science) code.

A stunningly simple example for architecture and rounding issues is (int)(10*0.3) (described here).
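
The linked description is not reproduced here, so the following is only one common explanation, stated as an assumption: the double closest to 0.3 is slightly below 0.3, so the exact product with 10 is slightly below 3. If that product is rounded to a 64-bit double before truncation it becomes exactly 3.0; if it is kept at higher precision (for example in an 80-bit x87 register) it stays below 3 and truncates to 2. The sketch uses exact rational arithmetic as a stand-in for "more precision than a double"; it does not emulate the 80-bit format itself.

```python
import math
from fractions import Fraction

# Product rounded to a 64-bit double, then truncated.
rounded_product = 10 * 0.3
truncated_after_rounding = math.trunc(rounded_product)

# Exact product of 10 and the double nearest to 0.3, then truncated.
exact_product = 10 * Fraction(0.3)
truncated_exact = math.trunc(exact_product)

# The exact product lies just below 3, so the two truncations disagree.
is_below_three = exact_product < 3
```

Here `truncated_after_rounding` is 3 and `truncated_exact` is 2, which is the kind of disagreement two architectures can show for the same expression.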

BM
____________
BM

bcorsello
Joined: Feb 20 05
Posts: 3
ID: 22435
Credit: 118,233
RAC: 0
Message 6568 - Posted 2 Mar 2005 5:40:41 UTC - in response to Message 6564.

> A stunningly simple example for architecture and rounding issues is
> (int)(10*0.3) (described here).

How can the project trust ANY of the results it's getting? The choice of which result is "canonical" seems to be pretty arbitrary.
____________

Keck_Komputers
Joined: Jan 18 05
Posts: 376
ID: 2914
Credit: 829,579
RAC: 1,653
Message 6592 - Posted 2 Mar 2005 10:35:40 UTC - in response to Message 6507.

> Not sure what calculator you used to come to 1/2=4.9999999999999 John,
> but I would throw it in the bin and use some old fashioned paper and a pencil.
> ;)
>
You had better throw your computer in the trash then. That is the answer most computers will come up with if you are not careful how you program the expression. The way I typed it normally expands to int(1)/int(2)=float(4.99999999) notice the mixed numeric types. If you force all float type numbers float(1.0000000)/float(2.000000)=float(5.000000) it will normally get the correct answer.

Where this really affects things like how well the validator works is this: computation1 yields the result 1.23456789 on one platform and 1.23456788 on another. That is not much different and would be considered correct either way; however, the final result might involve that computation many times, and each time that little difference gets multiplied. Multiply it enough times and you get answers that are different enough not to match.
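
The growth described above can be checked with the two numbers from this post. The sketch multiplies each value by itself repeatedly and reports the relative gap between the two running products; the function name and the choice of 1000 steps are arbitrary assumptions for illustration, not anything taken from the project's code.

```python
def relative_gap_after(steps, a=1.23456789, b=1.23456788):
    """Relative difference between a**steps and b**steps, computed by
    repeated multiplication as a stand-in for a long calculation."""
    prod_a = prod_b = 1.0
    for _ in range(steps):
        prod_a *= a
        prod_b *= b
    return abs(prod_a - prod_b) / prod_b


gap_after_1 = relative_gap_after(1)        # roughly 8e-9
gap_after_1000 = relative_gap_after(1000)  # roughly 8e-6
```

The relative gap grows about a thousandfold over 1000 multiplications, so a tolerance that comfortably accepts the single-step difference can reject the accumulated one.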

In the real calculations done for the project, double precision numbers are most likely used. They are generally stored internally at 80 bits and then rounded to 64 bits when the result is sent back. This is done to reduce the rounding errors, but they do creep in.
____________
BOINC WIKI

BOINCing since 2002/12/8

Peter Wagner
Joined: Feb 24 05
Posts: 3
ID: 34664
Credit: 73,454
RAC: 0
Message 6625 - Posted 2 Mar 2005 12:33:03 UTC - in response to Message 6592.

> > Not sure what calculator you used to come to 1/2=4.9999999999999
> John,
> > but I would throw it in the bin and use some old fashioned paper and a
> pencil.
> > ;)
> >
> You had better throw your computer in the trash then. That is the answer most
> computers will come up with if you are not careful how you program the
> expression. The way I typed it normally expands to
> int(1)/int(2)=float(4.99999999) notice the mixed numeric types. If you force
> all float type numbers float(1.0000000)/float(2.000000)=float(5.000000) it
> will normally get the correct answer.

What do you get when you divide 1 by 2?

your answer : 4.99999999
my answer : one half
my good old slide rule : 0.49999999
my mathematics teacher : there is a solution to your problem
my computer : 42

SCNR
____________

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 6639 - Posted 2 Mar 2005 13:27:01 UTC

Back to real problem, it's depressing, look at my linux hosts
Should I quit contributing to E@H ???
Here
Here
Here
Here
Sobbing...

Scavenger7
Joined: Jan 22 05
Posts: 1
ID: 5332
Credit: 109,356
RAC: 0
Message 6660 - Posted 2 Mar 2005 15:47:21 UTC
Last modified: 2 Mar 2005 15:47:39 UTC

I am also having this problem.

Link.


____________

Jindrich
Joined: Feb 20 05
Posts: 1
ID: 23216
Credit: 22,081
RAC: 0
Message 6681 - Posted 2 Mar 2005 18:01:48 UTC - in response to Message 6660.

And one more piece of evidence that this project is too generous when it comes to the resources of contributors. Another thing is the fourfold calculation of the same workunit when just two "identical" results are enough to mark the outcome as valid. What about sending 1 WU to only two sites first and using a third site only to select the liar in the case of different results? And if we get three different reports, well, is it still science? The code really should be robust enough not to allow that. Wijata is right, we probably should wait some time until E@H matures.
____________

Ensor
Joined: Feb 9 05
Posts: 48
ID: 11399
Credit: 257,408
RAC: 462
Message 6737 - Posted 2 Mar 2005 23:47:23 UTC - in response to Message 6354.
Last modified: 2 Mar 2005 23:49:12 UTC

Hi,

> Well i guess there are some differences in computation
>between linux and windows versions.
>
> I guess it's time for developers to say something here....

Actually, it seems they are aware of the problem and looking at it. Take a look at this thread.


TTFN - Pete.

____________

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 6741 - Posted 3 Mar 2005 0:22:48 UTC - in response to Message 6737.

> Actually, it seems they are aware of the problem and looking at it.

We are. It turned out that it's not done by just adjusting some parameters, but we need to change the validator as a whole, which means basically to rewrite it. We are working on this.

BM

____________
BM

Ensor
Joined: Feb 9 05
Posts: 48
ID: 11399
Credit: 257,408
RAC: 462
Message 6859 - Posted 3 Mar 2005 18:36:00 UTC - in response to Message 6741.


Hi,

>> Actually, it seems they are aware of the problem and
>>looking at it....
>
> We are. It turned out that it's not done by just adjusting
>some parameters, but we need to change the validator as a whole,
>which means basically to rewrite it. We are working on this.

Ouch, sounds like a big job. I wish you well with it!

Thanks for the information.


TTFN - Pete.

____________

Higgs Boson
Joined: Feb 25 05
Posts: 15
ID: 38687
Credit: 5,891,610
RAC: 5,710
Message 6867 - Posted 3 Mar 2005 19:00:28 UTC

What is the ETA on rewrite, near term or longer term?
____________

Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2085
ID: 2
Credit: 25,013,655
RAC: 35,457
Message 6887 - Posted 3 Mar 2005 20:54:09 UTC
Last modified: 3 Mar 2005 20:56:45 UTC

There are people already working on it. I'd expect this to be ready about next week, but usually all that can go wrong will go wrong - so don't bet on it. Let's hope that it goes wrong before we put it on the public server.

BM
____________
BM

Higgs Boson
Joined: Feb 25 05
Posts: 15
ID: 38687
Credit: 5,891,610
RAC: 5,710
Message 6903 - Posted 3 Mar 2005 22:33:27 UTC

Do you need more data? Or shall I redirect their efforts.
____________

FalconFly
Joined: Feb 16 05
Posts: 109
ID: 17696
Credit: 4,119,984
RAC: 1
Message 7065 - Posted 4 Mar 2005 20:28:39 UTC - in response to Message 6903.

Glad to hear that the Problem was recognized.

I already have half a dozen Results trashed as "invalid" on my Linux Boxes so far, I was beginning to become concerned (as 4 other Projects validate all their work as correct)
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB

wijata.com
Joined: Feb 11 05
Posts: 113
ID: 14055
Credit: 19,956,888
RAC: 18,325
Message 7376 - Posted 6 Mar 2005 11:30:36 UTC

Yes, most people are glad.
And just one more wish: E@H, please notify us when the new validator is launched...
Thanx in advance.

Traveller
Joined: Feb 20 05
Posts: 2
ID: 22287
Credit: 50,661
RAC: 0
Message 7459 - Posted 6 Mar 2005 22:49:35 UTC

Hopefully, it will be possible to re-validate those results already delivered. All my processing is on Linux.
____________

Higgs Boson
Joined: Feb 25 05
Posts: 15
ID: 38687
Credit: 5,891,610
RAC: 5,710
Message 7617 - Posted 7 Mar 2005 21:52:37 UTC

My Red Hat system is now getting credit for latest work, it still has 3 as invalid, Xandros is still all invalid. No changes made at my end.
____________

Peter Koek
Joined: Mar 4 05
Posts: 21
ID: 46774
Credit: 322,305
RAC: 191
Message 7971 - Posted 9 Mar 2005 19:07:17 UTC
Last modified: 9 Mar 2005 19:11:55 UTC

I joined this project this week because I am very interested in science. However, my first result is "invalid". I don't understand why: if 2 computers are given the same input, they should produce the same output. I don't think the OS should have any effect, because the FPU does the calculation and not the OS. But having said this, I see lots of people complaining about invalid results with Linux OSes, and I also have a Linux OS. I use Red Hat 9 with a 2.6.10 kernel. Therefore I am thinking of quitting this project again, because it's a waste of electricity if all results from Linux OSes are marked as "invalid".

+/- 100 Watt (0.1 kW) of power is used by a PC.
1 kWh costs 0.06 euro
tax (netherlands) 0.08 euro
---------------------------
0.14 euro

24/7 for 1 year results: 365 * 24 * 0.1 * 0.14 = 122.64 euro

Lots of money, therefore I want more feedback why results are marked as "invalid".

____________

Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2167
ID: 12521
Credit: 70,318,296
RAC: 133,692
Message 8023 - Posted 9 Mar 2005 23:36:26 UTC - in response to Message 7971.

> Lots of money, therefore I want more feedback why results are marked as
> "invalid".

Have you actually read the information in this thread?? Here is what Prof Allen has already said about the very issue you are raising.


> > I guess it's time for developers to say something here. Our work (and
> CPU
> > power and energy power) is wasted this way.
> >
> developers please do something. this is an obvious bug.

We are looking into this: it appears that our validator may be setting the agreement threshold slightly too tight in some cases. This is hard to 'tune in advance' without having access to the actual results. So please be patient: one of our developers is now working on this.

Bruce


This "tight validation" problem is affecting a lot of people running linux. For more information do a keyword search for something like "linux validation" and see what you get.


____________
Cheers,
Gary.

Ensor
Joined: Feb 9 05
Posts: 48
ID: 11399
Credit: 257,408
RAC: 462
Message 8603 - Posted 13 Mar 2005 17:32:28 UTC - in response to Message 8023.


Hi,

> This "tight validation" problem is affecting a lot
>of people running linux....

Not just Linux, Windows too....I've had a number of results declared invalid where two or more of the other hosts were running Linux.

Haven't had the problem recently as it seems TPTB have arranged for only machines running the same OS to crunch particular WUs - or is this perhaps coincidence?


TTFN - Pete.

____________

bcorsello
Joined: Feb 20 05
Posts: 3
ID: 22435
Credit: 118,233
RAC: 0
Message 8869 - Posted 16 Mar 2005 2:46:27 UTC

A new linux computer attached to the project yesterday has gotten nothing but zeros, in three attempts.

http://einstein.phys.uwm.edu/results.php?hostid=76355


____________

rklein
Joined: Feb 24 05
Posts: 4
ID: 37005
Credit: 133,507
RAC: 68
Message 8911 - Posted 16 Mar 2005 16:41:41 UTC - in response to Message 8869.
Last modified: 16 Mar 2005 16:48:35 UTC

> A new linux computer attached to the project yesterday has gotten nothing but
> zeros, in three attempts.
>
> http://einstein.phys.uwm.edu/results.php?hostid=76355
>

I bet you get another 0 credits on the workunit with currently pending credits (http://einstein.phys.uwm.edu/workunit.php?wuid=504786). The other returned (from a windows machine) result has a different checksum and the other computers working on the same WU are also Windows boxes.

I'll move to the climate simulations (climateprediction.net) while this problem is not solved.

PoorBoy
Joined: Jan 18 05
Posts: 120
ID: 3373
Credit: 128,685
RAC: 0
Message 8915 - Posted 16 Mar 2005 17:30:27 UTC

I just came back on board a few days ago when LHC went down again & out of 113 WU's of mine that have been granted credit so far none of them had 0 Credits Granted, most of them had more than I requested in fact ...

Jorge Guerra
Joined: Mar 6 05
Posts: 1
ID: 48128
Credit: 94,812
RAC: 60
Message 9524 - Posted 22 Mar 2005 3:13:02 UTC

It's really depressing: ~61% of my units have been granted *zero* credits.

Look here: http://einstein.phys.uwm.edu/results.php?userid=48128

I don't know what to do. It seems like all the work my computer is doing is just going to the trash. I'm starting to believe that it's a conspiracy against Linux users :(

Profile W9FZ
Avatar
Joined: Mar 9 05
Posts: 20
ID: 51967
Credit: 1,009,334
RAC: 0
Message 9733 - Posted 23 Mar 2005 16:28:47 UTC
Last modified: 23 Mar 2005 16:29:36 UTC

http://einstein.phys.uwm.edu/workunit.php?wuid=566081

This is a WU where Linux "won". Two Linux users got it done first and got credit. The two Windows users got ZERO.

lin1
Fstats.Ha: bytecount 3585214 checksum 172429206
Fstats.Hb: bytecount 3607578 checksum 173564096

lin2
Fstats.Ha: bytecount 3585214 checksum 172429206
Fstats.Hb: bytecount 3607578 checksum 173564096

win1
Fstats.Ha: bytecount 3585214 checksum 172069077
Fstats.Hb: bytecount 3607578 checksum 173200112
"invalid"

win2
Fstats.Ha: bytecount 3585214 checksum 172069077
Fstats.Hb: bytecount 3607578 checksum 173200112
"invalid"

Is this "just the way things will be"? Or is there a better desired end-state?

I would hope for the latter.
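To make the pattern in the numbers above easier to see, here is a small sketch that groups the four hosts by exact checksum match. The byte counts and checksums are copied from this post; the function names are made up for illustration. Earlier in the thread the validator is described as using an agreement threshold, so this is not how the project validates results. It only shows what the posted figures say.

```python
from collections import defaultdict

# (bytecount, checksum) per output file, copied from the post above.
RESULTS = {
    "lin1": {"Fstats.Ha": (3585214, 172429206), "Fstats.Hb": (3607578, 173564096)},
    "lin2": {"Fstats.Ha": (3585214, 172429206), "Fstats.Hb": (3607578, 173564096)},
    "win1": {"Fstats.Ha": (3585214, 172069077), "Fstats.Hb": (3607578, 173200112)},
    "win2": {"Fstats.Ha": (3585214, 172069077), "Fstats.Hb": (3607578, 173200112)},
}


def group_by_exact_match(results):
    """Group hosts whose files have identical byte counts and checksums."""
    groups = defaultdict(list)
    for host, files in sorted(results.items()):
        fingerprint = tuple(sorted(files.items()))
        groups[fingerprint].append(host)
    return sorted(groups.values())


def byte_counts_agree(results):
    """Return True when every host produced files of the same sizes."""
    sizes = {
        tuple(sorted((name, size) for name, (size, _checksum) in files.items()))
        for files in results.values()
    }
    return len(sizes) == 1


print(group_by_exact_match(RESULTS))
print(byte_counts_agree(RESULTS))
```

The hosts split cleanly by operating system even though every file has the same size, which suggests small numerical differences in the file contents rather than truncated or missing output.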

Bruce Richardson
Team: Richwood Timber LLC


____________
When the going gets weird, the weird turn PRO. -- Hunter S. Thompson

Profile Robert Somerville
Avatar
Joined: Nov 11 04
Posts: 27
ID: 1873
Credit: 21,819
RAC: 0
Message 9739 - Posted 23 Mar 2005 17:25:33 UTC - in response to Message 6887.

> There are people already working on it. I'd expect this to be ready about next
> week, but usually all that can go wrong will go wrong - so don't bet on it.
> Let's hope that it goes wrong before we put it on the public server.
>
> BM
>

Are there any results from the programmers on speeding up the Linux versions and fixing the validation problems on Linux?

There is no reason that Linux machines should not be within 10% of Windows machines if the programming staff has any clue at all (sorry for being direct about it). This issue could be related to the validation scheme problems: if you are validating against Windoze machines, perhaps you are striving too hard to match their "C library" when you shouldn't be. GNU C++ is widely regarded as the most correct C++ compiler...
____________
Robert Somerville

Profile CMHCS
Joined: Feb 9 05
Posts: 5
ID: 8556
Credit: 177,226
RAC: 0
Message 9808 - Posted 24 Mar 2005 7:03:08 UTC - in response to Message 9524.

> It's really depresing I have ~61% of my units granted *zero* credits.

> [url=http://einstein.phys.uwm.edu/results.php?userid=48128]Look
> here[/url]

> I don't know what to do, it seems like all the work my computer is doing is
> just going to the trash, I'm starting to belive that it's a conspiracy against
> linux users :(
>
____________

Profile CMHCS
Joined: Feb 9 05
Posts: 5
ID: 8556
Credit: 177,226
RAC: 0
Message 9811 - Posted 24 Mar 2005 7:32:58 UTC - in response to Message 9808.

> > It's really depresing I have ~61% of my units granted *zero*
> credits.

> > [url=http://einstein.phys.uwm.edu/results.php?userid=48128]Look
> > here[/url]

> > I don't know what to do, it seems like all the work my computer is doing
> is
> > just going to the trash, I'm starting to belive that it's a conspiracy
> against
> > linux users :(
> >
EDIT TO ABOVE: Who the heck is using my account to post messages and claim my results? There cannot be two CMHCS's with the same account keys.......
____________

Profile CMHCS
Joined: Feb 9 05
Posts: 5
ID: 8556
Credit: 177,226
RAC: 0
Message 9812 - Posted 24 Mar 2005 7:37:15 UTC - in response to Message 9811.

> > > It's really depresing I have ~61% of my units granted *zero*
> > credits.

> > > [url=http://einstein.phys.uwm.edu/results.php?userid=48128]Look
> > > here[/url]

> > > I don't know what to do, it seems like all the work my computer is
> doing
> > is
> > > just going to the trash, I'm starting to belive that it's a
> conspiracy
> > against
> > > linux users :(
> > >
> EDIT TO ABOVE: Who the heck is using my account to post messages and claim my
> results. Their can not be two CMHCS's with the same account keys.......
I did not write the post above, I don't even own a Linux machine.......
____________

AK1001
Joined: Mar 9 05
Posts: 4
ID: 50965
Credit: 2,825
RAC: 0
Message 9906 - Posted 24 Mar 2005 20:09:37 UTC

going to quit

Linux 2.4.26-HN-1.6-AMD
AuthenticAMD AMD Athlon(tm) XP 1700+

13 results:
7 with positive granted credit
4 with zero granted credit
2 with validate error

____________

Profile Robert Somerville
Avatar
Joined: Nov 11 04
Posts: 27
ID: 1873
Credit: 21,819
RAC: 0
Message 9949 - Posted 25 Mar 2005 3:04:39 UTC - in response to Message 9906.
Last modified: 25 Mar 2005 3:05:06 UTC

> going to quit
>
> Linux 2.4.26-HN-1.6-AMD
> AuthenticAMD AMD Athlon(tm) XP 1700+
>
> 13 results:
> 7 with positive granted credit
> 4 with zero granted credit
> 2 with validate error
>
>

Install Wine and check out the posts from John. I haven't had an invalid result in 2 days and have racked up ~700 pts (it's twice as fast as the Linux version, but not as satisfying).

Here's John's post to me.
See also the posts on Xvnc if you are logging out of X (do a search on Xvnc).
**********************************
You can do a search on WINE: see also my following answer in another thread I gave.

http://einstein.phys.uwm.edu/forum_thread.php?id=1579

IMHO running under Linux is a waste of time; after a couple of approved and credited WUs, all my recent WUs are credited 0 again.
Combine this with the slow running on all platforms but Windows, and I decided to install Wine as well. Please have a look at the following site about DLLs and InstallShield issues, and setup will be easy.

http://frankscorner.org/index.php?p=ishield


Please use the 4.19 version for Windows and run 'nice wine boinc_cli.exe -return_results_immediately -allow_remote_gui_rpc'
Do not forget to export the warning settings as follows (in the xterm, before starting BOINC!):
export WINEDEBUG="err-all,warn-all,fixme-all,trace-all"
You can then have the screensaver by telnetting to 127.0.0.1, port 31416

Results so far are good for most people.

Hopefully there will be a better native solution for specific non-Windows platforms.

PS.
Without the 'export WINEDEBUG' there will be many errors in your xterm window. They seem to be related to scheduling, but they do NOT influence the results; I already have many credited WUs, and crunching has been error-free for 96 hours.

Success!!


John
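The two steps John describes (set WINEDEBUG first, then start the client under nice and Wine) can be wrapped in a small launcher, sketched here in Python. The environment value and the command line are taken from his post; the function names, the `boinc_dir` argument, and the example path are hypothetical, and the sketch defaults to a dry run so nothing is started by accident.

```python
import os
import subprocess

# Values copied from John's post.
WINEDEBUG = "err-all,warn-all,fixme-all,trace-all"
BOINC_ARGV = [
    "nice", "wine", "boinc_cli.exe",
    "-return_results_immediately", "-allow_remote_gui_rpc",
]


def build_launch(base_env=None):
    """Return (argv, env) with WINEDEBUG set before the client starts."""
    env = dict(os.environ if base_env is None else base_env)
    env["WINEDEBUG"] = WINEDEBUG
    return list(BOINC_ARGV), env


def launch(boinc_dir, dry_run=True):
    """Start the client from boinc_dir, or just print the command."""
    argv, env = build_launch()
    if dry_run:
        print("would run in", boinc_dir + ":", " ".join(argv))
        return None
    return subprocess.Popen(argv, cwd=boinc_dir, env=env)


# Hypothetical install location; adjust to wherever boinc_cli.exe lives.
launch("/path/to/boinc", dry_run=True)
```

Passing the variable through `env` has the same effect as the `export` in the xterm: the setting is in place before Wine starts, which is the ordering John stresses.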


____________
Robert Somerville

Profile Darren
Avatar
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 9953 - Posted 25 Mar 2005 3:44:01 UTC - in response to Message 9949.

> Install Wine; check out posts from john; i haven't had an invalid result in 2
> days & have racked up ~700pts (its twice as fast as linux version) (but
> not as satisfying )

I've had no luck at all getting it to run with wine. I can get through the install, attach and download work, but then they just immediately error with the following stderr output:

[core_client_version]4.19[/core_client_version]
[message] - exit code 99 (0x63)
[/message]
[active_task_state]1[/active_task_state]
[signal]0[/signal]
[stderr_txt]
WARNING: Can't boinc-resolve config file "conf"
Could not open data-file: `./conf`
Level 0: $Id: ComputeFStatistic.c,v 1.232 2005/02/11 17:03:16 ballen Exp $
Function call `LALUserVarReadAllInput(stat,argc,argv)' failed.
file ComputeFStatistic.c, line 436
Level 1: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadAllInput, file UserInput.c, line 680
Level 2: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadCfgfile, file UserInput.c, line 463
Level 3: $Id: ConfigFile.c,v 1.15 2004/11/03 23:52:05 reinhard Exp $
Status code 2: File error.
function LALParseDataFile, file ConfigFile.c, line 188
BOINC_ERR_EXIT: now calling boinc_finish()
[/stderr_txt]

And I am not a computer anything except user, so that's all nothing but plain old gibberish to me.
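For anyone else puzzled by output like this, here is a small sketch that pulls the innermost frame out of the "Level N" trace. It assumes, from the layout alone, that higher level numbers are deeper in the call chain; the regular expressions and the function name `deepest_error` are made up for this example and are not part of BOINC or LAL.

```python
import re

# Trace lines copied from the stderr output above.
STDERR = """\
Level 0: $Id: ComputeFStatistic.c,v 1.232 2005/02/11 17:03:16 ballen Exp $
Function call `LALUserVarReadAllInput(stat,argc,argv)' failed.
file ComputeFStatistic.c, line 436
Level 1: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadAllInput, file UserInput.c, line 680
Level 2: $Id: UserInput.c,v 1.21 2004/11/29 18:38:59 reinhard Exp $
Status code -1: Recursive error
function LALUserVarReadCfgfile, file UserInput.c, line 463
Level 3: $Id: ConfigFile.c,v 1.15 2004/11/03 23:52:05 reinhard Exp $
Status code 2: File error.
function LALParseDataFile, file ConfigFile.c, line 188
"""

LEVEL = re.compile(r"^Level (\d+):")
STATUS = re.compile(r"^Status code (-?\d+): (.+)$")
WHERE = re.compile(r"^function (\w+), file ([\w.]+), line (\d+)$")


def deepest_error(text):
    """Return details of the highest-numbered level, or None if no trace."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        level = LEVEL.match(line)
        status = STATUS.match(line)
        where = WHERE.match(line)
        if level:
            frames.append({"level": int(level.group(1))})
        elif frames and status:
            frames[-1]["code"] = int(status.group(1))
            frames[-1]["message"] = status.group(2).rstrip(".")
        elif frames and where:
            frames[-1]["function"] = where.group(1)
            frames[-1]["file"] = where.group(2)
            frames[-1]["line"] = int(where.group(3))
    return max(frames, key=lambda f: f["level"]) if frames else None


print(deepest_error(STDERR))
```

Read this way, the innermost frame is a file error in LALParseDataFile, which fits the earlier "Could not open data-file: `./conf`" line: the application apparently could not find its config file, and the "Recursive error" entries above it just pass that failure up the chain.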

I'm in the same spot as AK1001 at this point - I'm surrendering. I have 2 systems that are dual boot. One of them is my primary system, so I boot it into windows when I go to bed, but I can't stand to use windows so I have to boot it back into linux when I get up. My others are linux only and sit off in a closet with me only accessing them with ssh, so booting them into windows isn't an option.

Between 3 different issues it's driving me insane trying to keep einstein going. Most annoying is that einstein rarely restarts after a project change. Then there's the issue of the linux client being so slow, and then combine that with the fact that a lot of the work gets marked invalid after all that, and it's just a bit much.

I've set all my systems today to an alternate profile that will stop any further work from downloading. Once they've finished what they have now, I'll detach them from einstein and just let them keep going with projects that aren't giving them so much grief. Not what I want to do, but I'm used to checking them every couple weeks and finding everything going ok, not having to check them every couple hours and finding problems.


____________

Profile Robert Somerville
Avatar
Joined: Nov 11 04
Posts: 27
ID: 1873
Credit: 21,819
RAC: 0
Message 10074 - Posted 25 Mar 2005 22:12:20 UTC - in response to Message 9953.

> > Install Wine; check out posts from john; i haven't had an invalid result
> in 2
> > days & have racked up ~700pts (its twice as fast as linux version)
> (but
> > not as satisfying )
>

I used the 4.19 version of BOINC. Which version did you install?
____________
Robert Somerville

Profile Darren
Avatar
Joined: Jan 18 05
Posts: 94
ID: 2400
Credit: 53,420
RAC: 0
Message 10076 - Posted 25 Mar 2005 22:22:45 UTC - in response to Message 10074.

> i used the 4.19 version of boinc, which version did you install ??

That was with 4.19. From what I've read anything newer isn't working with wine for anyone, so all of my attempts were with 4.19.


____________

Cougarky
Joined: Feb 20 05
Posts: 6
ID: 20707
Credit: 279,334
RAC: 260
Message 10085 - Posted 26 Mar 2005 0:05:27 UTC

I am using a Win2000 PC, and 98% of its WUs are valid. I am also using Fedora Core 3 on my server, which is hyperthreaded, and there only 40% of the WUs are valid. At first I thought it was the hyperthreading, so I cut it down to use one processor, and that has stepped the valid results up to about 50%.
____________


Joined: Feb 23 05
Posts: 11
ID: 32853
Credit: 20,302
RAC: 0
Message 10196 - Posted 26 Mar 2005 23:10:24 UTC

I'm about to take one machine offline, which started to turn out a lot of zero-score results (about 60-70%) all of a sudden two weeks ago. Wine is no option on that machine. It's an Athlon XP 2400+ running Linux (Debian, kernel 2.4.27). The other machines I have committed range from Athlon XP 1700+ to XP 2600+, all of them running Linux, but so far only this machine exhibits the problem: http://einstein.phys.uwm.edu/results.php?hostid=39647


Joined: Feb 23 05
Posts: 11
ID: 32853
Credit: 20,302
RAC: 0
Message 10263 - Posted 27 Mar 2005 14:10:29 UTC

Good bye, it was fun. I'm not willing anymore to commit time to a project which favors windows machines.

____________

Ben Christy
Avatar
Joined: Mar 6 05
Posts: 40
ID: 47979
Credit: 20,891
RAC: 0
Message 10372 - Posted 28 Mar 2005 19:59:16 UTC

So CaffeineJunkie, let me get this straight, if you stub your toe you will cut off the leg? Why not use a little salve instead?

You admit it only happens on one machine so maybe that machine needs a little help?

Did you try reinstalling Boinc?

What other differences are there between machines that work and the one that doesn't?

It might be a bug....

... or maybe someone just set Cancel_Credit=YES?hostid=39647 ;)

____________
==========================================
a Chicago user who likes to be usefull

dondrusco
Joined: Feb 27 05
Posts: 3
ID: 41726
Credit: 2,350,317
RAC: 1,269
Message 10555 - Posted 31 Mar 2005 2:00:06 UTC - in response to Message 5920.

> @wijata.com: Please post the Result ids or names, or at least the id of the
> machine you did this on. I'll take a look at this.
>
> BM
>
I have the same problem - no OC, no Ctrl C. Report time was 1st of April.

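Result ID: 2400876
Computer ID: 87327
Sent: 25 Mar 2005 2:26:29 UTC
Time reported or deadline: 31 Mar 2005 1:14:36 UTC
Server state: Over
Outcome: Success
Client state: Done
CPU time (sec): 38,431.24
Claimed credit: 87.68
Granted credit: 0.00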

thx
Drusco

rainmaker
Joined: Mar 21 05
Posts: 1
ID: 62488
Credit: 3,518
RAC: 0
Message 10716 - Posted 1 Apr 2005 20:28:30 UTC

I think Einstein seems to be a (very) little bit buggy, because it also happens to me that a lot of units end up as "state: invalid". It takes a day of CPU time to get no credit, and this is very frustrating. I should probably go back to SETI, because the CPU is better used there. I don't think the Einstein people need invalid results either. (BTW: the machine is Linux, with no additional BOINC project running and no crashes; it has been up and running for a couple of months. So the calculation in the application sometimes just runs into rubbish.... :-( )
rainy


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2010 Bruce Allen for the LIGO Scientific Collaboration