Validate errors


Message boards : Problems and Bug Reports : Validate errors

arcturus
Joined: Feb 11 05
Posts: 46
ID: 14642
Credit: 493,256
RAC: 3
Message 52302 - Posted 8 Nov 2006 15:32:48 UTC

Getting a bunch of 'validate errors'

http://einstein.phys.uwm.edu/results.php?hostid=790150

Project was reset but errors continue.

Rosetta & Seti on same box show no problems. Suggestions?

Thx.

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 52305 - Posted 8 Nov 2006 16:40:22 UTC

You've got infinite numbers in the result files. This usually points to a hardware problem, CPU getting too hot, memory fault etc.

BM
____________
BM
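
To illustrate the kind of check described above, here is a minimal sketch of scanning result output for non-finite values. It assumes the result file is plain text with whitespace-separated numbers; that layout and the function name `find_non_finite` are assumptions for illustration, not the project's actual format or validator code.

```python
import math

def find_non_finite(lines):
    """Return (line_number, token) pairs for numeric tokens that are inf or NaN."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            try:
                value = float(token)
            except ValueError:
                continue  # not a number, so skip it
            if not math.isfinite(value):
                bad.append((line_no, token))
    return bad

# Hypothetical sample lines, not real Einstein@Home output
sample = ["297.01 0.5 12.3", "297.02 inf 11.9", "297.03 0.7 nan"]
print(find_non_finite(sample))
```

Any hit from a check like this would suggest the computation itself went wrong on the host, which is why heat and memory are the usual suspects.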

arcturus
Joined: Feb 11 05
Posts: 46
ID: 14642
Credit: 493,256
RAC: 3
Message 52309 - Posted 8 Nov 2006 17:32:08 UTC

Ok thanks, I'll chalk it up then to a project which stresses hardware more than the others mentioned and/or has more stringent verification.

Profile Pooh Bear 27
Joined: Mar 20 05
Posts: 1330
ID: 61731
Credit: 3,487,843
RAC: 1,967
Message 52313 - Posted 8 Nov 2006 19:25:06 UTC

Correct, this project is probably among the strictest in verification. Bernd is one of the developers, and knows his software very well, so he's given you the exact reason you are getting validate errors. The issues could result from any of a number of things.

I would do the simple things first. Clean, clean, clean. Make sure every fan is clean, the heat sink is clean, the heat sink is not loose, etc. Then do some of the simple tests, like Memtest86+ and Prime95.


____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 52324 - Posted 8 Nov 2006 22:34:15 UTC - in response to Message 52309.
Last modified: 8 Nov 2006 22:39:22 UTC

Ok thanks, I'll chalk it up then to a project which stresses hardware more than the others mentioned and/or has more stringent verification.

I don't know about Rosetta, but we are definitely more picky than SETI concerning the results. And we are strictly CPU-bound, which puts more load on the CPU (actually FPU) itself than e.g. the memory, which might also be different to other projects.

BM
____________
BM

Profile Krzychu P.
Joined: Mar 23 05
Posts: 3
ID: 64307
Credit: 28,143
RAC: 0
Message 53317 - Posted 16 Nov 2006 11:42:41 UTC

I also get 'validate error':

http://einstein.phys.uwm.edu/results.php?userid=64307

What's wrong?
____________

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: Feb 11 05
Posts: 304
ID: 12758
Credit: 2,979,164
RAC: 6,192
Message 53328 - Posted 16 Nov 2006 14:00:06 UTC - in response to Message 52305.

You've got infinite numbers in the result files. This usually points to a hardware problem, CPU getting too hot, memory fault etc.


A lot of bad results can be found in this thread: Different CPU, different OS.
http://einstein.phys.uwm.edu/forum_thread.php?id=5066

Really strange!

Jayargh
Joined: Feb 9 05
Posts: 64
ID: 7715
Credit: 1,162,720
RAC: 537
Message 53378 - Posted 16 Nov 2006 20:59:41 UTC
Last modified: 16 Nov 2006 21:12:40 UTC

I am also getting validate errors on my new Woodcrest (stock, no overclock), and I am receiving WUs that have already had validate errors. I have had only one validate error in almost two years here, so what gives? Should I abort the ones that have not validated? I hate crunching long units when I can see a problem ahead of time. I don't think it is the hosts. Bernd, I think you need to look at your validator, as too many of these are popping up at once. If this continues I will suspend the project and crunch elsewhere. Please respond. Thanks, JR
____________

Profile Vanessa
Joined: Sep 7 06
Posts: 717
ID: 212625
Credit: 250,873
RAC: 0
Message 53384 - Posted 16 Nov 2006 21:40:56 UTC
Last modified: 16 Nov 2006 22:12:42 UTC

I do get Validate errors as well.
Strange that 4 machines have validate errors on the very same WU and now two more machines are working on it (maybe they come up with validate errors as well): see here

There are more WUs like this, but mostly with only two machines showing validate errors so far:
http://einstein.phys.uwm.edu/workunit.php?wuid=18982777
http://einstein.phys.uwm.edu/workunit.php?wuid=18980247
http://einstein.phys.uwm.edu/workunit.php?wuid=18994947
http://einstein.phys.uwm.edu/workunit.php?wuid=18640632

and even more where my machines are the only ones with Validate errors so far.

I find this quite bizarre, especially the first one, where all machines seem to agree that the WU is a validate error in itself ;)

Btw, in the first two months I did not experience any validate errors either, and now I find them happening sporadically on different machines of mine.


____________

Profile butz
Joined: Jun 19 06
Posts: 7
ID: 200288
Credit: 203,146
RAC: 0
Message 53393 - Posted 16 Nov 2006 23:10:29 UTC
Last modified: 16 Nov 2006 23:21:49 UTC

Hi, I also got some validate errors on my machines.


machine 1
http://einstein.phys.uwm.edu/workunit.php?wuid=18856427
http://einstein.phys.uwm.edu/workunit.php?wuid=18848997
http://einstein.phys.uwm.edu/workunit.php?wuid=18842557
http://einstein.phys.uwm.edu/workunit.php?wuid=18835952
machine 2
http://einstein.phys.uwm.edu/workunit.php?wuid=18898382
http://einstein.phys.uwm.edu/workunit.php?wuid=18893997
machine 3
http://einstein.phys.uwm.edu/workunit.php?wuid=18936577


Are all my machines broken?
I don't think the problem is on my side... :(
____________

Ziran
Joined: Nov 26 04
Posts: 195
ID: 2042
Credit: 54,833
RAC: 0
Message 53408 - Posted 17 Nov 2006 0:14:57 UTC
Last modified: 17 Nov 2006 0:17:41 UTC

All hosts are getting validate errors on these WUs.

http://einstein.phys.uwm.edu/workunit.php?wuid=19049072
http://einstein.phys.uwm.edu/workunit.php?wuid=19065962
http://einstein.phys.uwm.edu/workunit.php?wuid=19070317
http://einstein.phys.uwm.edu/workunit.php?wuid=19057172

All these WUs are from the same data file.
My host is the sixth result on this WU, l1_0297.0_S5R1__147_S5R1a.

Another strange thing is that a lot of fast hosts are working on this data file and returning short results.

____________
Then you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 53416 - Posted 17 Nov 2006 1:19:52 UTC
Last modified: 18 Nov 2006 17:29:57 UTC

Dear Einstein@Home participants,

I'm sorry to say that after more than 150 days of trouble-free project operation, one of our five validator instances went on the rampage at about 10:30 UTC 16-11-2006. I learned about this from the message boards and from an email (thanks Timothy!) around 00:15 UTC 17-11-2006. I restarted the errant validator and all is well again.

My apologies for this glitch - I don't know what went wrong on the server side but experience shows that in spite of our best efforts to run a stable project, these things do sometimes happen.

Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above.

[Subsequent edit/addition]: another validator instance started to show the same problem the next day. I have restarted ALL the validator instances, and granted credit for the second round of problematic results as well.

I apologize again,

Bruce Allen
____________
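
As a rough illustration of the time-window selection described in the post above, the sketch below picks out results that were marked invalid or validate error inside a window. It is not the project's actual credit-granting script: the record fields (`outcome`, `validated_at`) are hypothetical, and the window end is assumed to be the time the problem was noticed, since the exact restart time is not stated.

```python
from datetime import datetime, timezone

# Window taken from the post; the end time is an assumption (see lead-in)
WINDOW_START = datetime(2006, 11, 16, 10, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2006, 11, 17, 0, 15, tzinfo=timezone.utc)

# Outcomes that the post says will be credited
AFFECTED = {"invalid", "validate error"}

def results_to_credit(results, start=WINDOW_START, end=WINDOW_END):
    """Return results with an affected outcome validated inside [start, end]."""
    return [
        r for r in results
        if r["outcome"] in AFFECTED and start <= r["validated_at"] <= end
    ]

# Hypothetical records for illustration only
example = [
    {"outcome": "validate error",
     "validated_at": datetime(2006, 11, 16, 12, 0, tzinfo=timezone.utc)},
    {"outcome": "success",
     "validated_at": datetime(2006, 11, 16, 12, 5, tzinfo=timezone.utc)},
]
print(len(results_to_credit(example)))
```

A second round of bad results, as in the subsequent edit, could be handled in a sketch like this by calling the function again with a different `start` and `end`.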

Profile roadrunner_gs
Joined: Mar 7 06
Posts: 94
ID: 178622
Credit: 2,737,776
RAC: 0
Message 53464 - Posted 17 Nov 2006 6:22:56 UTC

Thanks a lot for the information.
So we don't have to report here to get our credit, because it will be re-granted automatically?
____________

ExtraTerrestrial Apes
Joined: Nov 10 04
Posts: 61
ID: 515
Credit: 3,558,640
RAC: 3,079
Message 53471 - Posted 17 Nov 2006 9:23:32 UTC

Bruce, thanks for the information and the very good handling of the problem - we can't ask for anything more!

MrS
____________
Scanning for our furry friends since Jan 2002

Dimmerjas
Joined: Jul 6 05
Posts: 28
ID: 93691
Credit: 343,971
RAC: 0
Message 53482 - Posted 17 Nov 2006 10:50:51 UTC - in response to Message 53416.


My apologies for this glitch - I don't know what went wrong on the server side but experience shows that in spite of our best efforts to run a stable project, these things do sometimes happen.

Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above.

I apologize again,

Bruce Allen


No need to apologize. Things like this happen. Apps that have worked perfectly can suddenly get stuck, and RAM can get "tired". The only way out is to reboot, so the system is fresh again.
I have tried several other projects, but E@H is the most stable. And you are quick to solve any problems that come up. So I'm only running E@H on my computers.

Profile Pooh Bear 27
Joined: Mar 20 05
Posts: 1330
ID: 61731
Credit: 3,487,843
RAC: 1,967
Message 53490 - Posted 17 Nov 2006 11:43:31 UTC

You are welcome Dr. Allen, and thank you for looking into the situation and repairing the rogue validator. I believe everyone appreciates the stability of this project and can accept a glitch once every few months, especially since it was such a minor glitch and easily repaired.

____________

Profile The Ancient One
Joined: Feb 9 05
Posts: 4
ID: 10033
Credit: 78,378
RAC: 373
Message 53507 - Posted 17 Nov 2006 13:09:13 UTC - in response to Message 52305.

You've got infinite numbers in the result files. This usually points to a hardware problem, CPU getting too hot, memory fault etc.

BM



Hi,
When was the last time you cleaned out your computer? It sounds like your processor's thermal system is clogged with dust. Yes, all computers suffer from this. I have to clean mine out at least every six months to prevent this from happening.

Regards
James
____________
"All man born has a right to life and no man born has the right to take that life"

Profile The Ancient One
Joined: Feb 9 05
Posts: 4
ID: 10033
Credit: 78,378
RAC: 373
Message 53510 - Posted 17 Nov 2006 13:13:57 UTC - in response to Message 53416.

Dear Einstein@Home participants,

I'm sorry to say that after more than 150 days of trouble-free project operation, one of our five validator instances went on the rampage at about 10:30 UTC 16-11-2006. I learned about this from the message boards and from an email (thanks Timothy!) around 00:15 UTC 17-11-2006. I restarted the errant validator and all is well again.

My apologies for this glitch - I don't know what went wrong on the server side but experience shows that in spite of our best efforts to run a stable project, these things do sometimes happen.

Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above.

I apologize again,

Bruce Allen



Hi Bruce,

It appears that you're not the only ones having this problem. Spinhenge@home has been having similar problems for the last couple of weeks. Maybe you could help each other out to resolve this issue.

Regards
James

____________
"All man born has a right to life and no man born has the right to take that life"

Averill
Joined: Oct 19 06
Posts: 1
ID: 223987
Credit: 71,547
RAC: 0
Message 53532 - Posted 17 Nov 2006 14:57:25 UTC

I just left my computer with a cleaning programme on it -
when I came back everything had disappeared!
Then when I logged on here the whole thing had gone down ---
It wasn't me, was it?!! LOL!!
____________

Profile Vanessa
Joined: Sep 7 06
Posts: 717
ID: 212625
Credit: 250,873
RAC: 0
Message 53538 - Posted 17 Nov 2006 15:45:59 UTC - in response to Message 53416.

Dear Einstein@Home participants,

I'm sorry to say ....
<snip>
I apologize again,

Bruce Allen


No need to be sorry or apologize twice. Well done in solving the problem extremely fast.

thanks for that

____________

Jayargh
Joined: Feb 9 05
Posts: 64
ID: 7715
Credit: 1,162,720
RAC: 537
Message 53556 - Posted 17 Nov 2006 19:05:05 UTC
Last modified: 17 Nov 2006 19:06:23 UTC

Please look at the validator again because this one just showed up
http://einstein.phys.uwm.edu/workunit.php?wuid=19025946
Both reported AFTER you thought you solved the problem less than an hour ago Thanks JR

Fuzzy Duck
Joined: Dec 3 05
Posts: 40
ID: 137377
Credit: 278,885
RAC: 0
Message 53557 - Posted 17 Nov 2006 19:07:56 UTC

Thankfully a minor problem easily solved.

Having been a fairly keen DC player since '99, I must comment that E@H is remarkably stable. Well done, and your efforts are most appreciated!

FD
____________

Profile Pooh Bear 27
Joined: Mar 20 05
Posts: 1330
ID: 61731
Credit: 3,487,843
RAC: 1,967
Message 53558 - Posted 17 Nov 2006 19:20:48 UTC - in response to Message 53556.

Please look at the validator again because this one just showed up
http://einstein.phys.uwm.edu/workunit.php?wuid=19025946
Both reported AFTER you thought you solved the problem less than an hour ago Thanks JR

I am betting it was uploaded and validated, then you reported it, which made the validation show up already invalid.

____________

Jayargh
Joined: Feb 9 05
Posts: 64
ID: 7715
Credit: 1,162,720
RAC: 537
Message 53559 - Posted 17 Nov 2006 19:29:10 UTC - in response to Message 53558.
Last modified: 17 Nov 2006 19:35:52 UTC

Please look at the validator again because this one just showed up
http://einstein.phys.uwm.edu/workunit.php?wuid=19025946
Both reported AFTER you thought you solved the problem less than an hour ago Thanks JR

I am betting it was uploaded and validated, then you reported it, which made the validation show up already invalid.


No Pooh Bear I have a constant connection and use trux client report immediately option....look at my host list I have reported 3 wu's since Bruce thought he fixed the problem

Jayargh
Joined: Feb 9 05
Posts: 64
ID: 7715
Credit: 1,162,720
RAC: 537
Message 53566 - Posted 17 Nov 2006 20:33:00 UTC
Last modified: 17 Nov 2006 21:30:28 UTC

now 2 more

http://einstein.phys.uwm.edu/workunit.php?wuid=19075676
http://einstein.phys.uwm.edu/workunit.php?wuid=19075681
Seems that validator needs more than a reboot...

Profile roadrunner_gs
Joined: Mar 7 06
Posts: 94
ID: 178622
Credit: 2,737,776
RAC: 0
Message 53590 - Posted 17 Nov 2006 22:35:54 UTC

Here too:

http://einstein.phys.uwm.edu/workunit.php?wuid=18997936
http://einstein.phys.uwm.edu/workunit.php?wuid=18998236
http://einstein.phys.uwm.edu/workunit.php?wuid=18697171

that is not good...
____________

Dimmerjas
Joined: Jul 6 05
Posts: 28
ID: 93691
Credit: 343,971
RAC: 0
Message 53595 - Posted 17 Nov 2006 23:03:09 UTC

These 4 were sent out and returned today. My hosts will do a re-run on them, so I don't know yet whether I will get a validate error as well.

http://einstein.phys.uwm.edu/workunit.php?wuid=19152971
http://einstein.phys.uwm.edu/workunit.php?wuid=19126086
http://einstein.phys.uwm.edu/workunit.php?wuid=19134626
http://einstein.phys.uwm.edu/workunit.php?wuid=19136351

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 53597 - Posted 17 Nov 2006 23:11:05 UTC

OK, I have restarted all five instances of the validator: one of the OTHER instances was showing problems also. I hope this fixes it. I will modify the credit-granting script to give credit for these results also.

Bruce
____________

Jayargh
Joined: Feb 9 05
Posts: 64
ID: 7715
Credit: 1,162,720
RAC: 537
Message 53613 - Posted 17 Nov 2006 23:41:24 UTC - in response to Message 53597.
Last modified: 17 Nov 2006 23:41:58 UTC

OK, I have restarted all five instances of the validator: one of the OTHER instances was showing problems also. I hope this fixes it. I will modify the credit-granting script to give credit for these results also.

Bruce


Thank-you Bruce for responding quickly and fixing again. It seems to be something to keep an eye on even though it is the weekend.

It's not only that the credit isn't granted immediately, but also that your result won't count, and CPU power and time are wasted when the WU is sent back out.

Profile paul milton
Joined: Sep 16 05
Posts: 191
ID: 109635
Credit: 435,032
RAC: 1,374
Message 53733 - Posted 18 Nov 2006 1:19:51 UTC - in response to Message 53613.

Mr. Allen, I can't say it any better than the others here. This is one of the best projects; IMO some of the others could learn a thing or two from you guys! Thank you, sir!
____________
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Graham G3ZOD
Joined: Jul 11 06
Posts: 12
ID: 203551
Credit: 215,361
RAC: 197
Message 53809 - Posted 18 Nov 2006 9:32:06 UTC - in response to Message 53416.

Would the credit show up on the web pages? Doesn't seem to be there, e.g.:
http://einstein.phys.uwm.edu/workunit.php?wuid=19065442

Graham.

...Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above...

Profile roadrunner_gs
Joined: Mar 7 06
Posts: 94
ID: 178622
Credit: 2,737,776
RAC: 0
Message 53819 - Posted 18 Nov 2006 10:30:38 UTC - in response to Message 53809.
Last modified: 18 Nov 2006 10:31:07 UTC

Would the credit show up on the web pages? Doesn't seem to be there, e.g.:
http://einstein.phys.uwm.edu/workunit.php?wuid=19065442

Graham.

...Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above...


Maybe that needs some time?
I think a script must be written to re-grant the credit lost for falsely invalidated WUs; otherwise everyone with invalid WUs during the error window would get credit re-granted.
____________

Profile tahanko
Joined: Feb 28 06
Posts: 16
ID: 175859
Credit: 184,885
RAC: 2,483
Message 53842 - Posted 18 Nov 2006 11:20:12 UTC - in response to Message 53819.

Maybe that needs some time?
I think a script must be written to re-grant the credit lost for falsely invalidated WUs; otherwise everyone with invalid WUs during the error window would get credit re-granted.


I also have 2 units that were falsely invalidated. I also think we should wait for the new script to grant us credit.
____________

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 53886 - Posted 18 Nov 2006 12:09:36 UTC - in response to Message 53809.
Last modified: 18 Nov 2006 12:15:09 UTC

Would the credit show up on the web pages? Doesn't seem to be there, e.g.:
http://einstein.phys.uwm.edu/workunit.php?wuid=19065442

Graham.

...Credit hounds (and others): I am granting credit for ALL those results which were marked as invalid or validate error during the time window above...


Actually you had ALREADY been granted credit for the result, but the web page 'workunit.php' was not displaying this properly. If you had clicked on the result in question then you would have seen the granted credit on that next web page that would have been displayed.

Anyway I have modified the 'workunit.php' web page so that it now correctly shows the granted credit.

Cheers,
Bruce
____________

Profile tahanko
Joined: Feb 28 06
Posts: 16
ID: 175859
Credit: 184,885
RAC: 2,483
Message 54192 - Posted 18 Nov 2006 17:01:30 UTC - in response to Message 53886.

Actually you had ALREADY been granted credit for the result, but the web page 'workunit.php' was not displaying this properly. If you had clicked on the result in question then you would have seen the granted credit on that next web page that would have been displayed.

Anyway I have modified the 'workunit.php' web page so that it now correctly shows the granted credit.

Cheers,
Bruce


Everything is OK, thanks for fixing it.
____________

Alinator
Joined: May 8 05
Posts: 857
ID: 79809
Credit: 655,584
RAC: 1,291
Message 54253 - Posted 18 Nov 2006 17:41:49 UTC - in response to Message 53559.
Last modified: 18 Nov 2006 17:42:17 UTC

Please look at the validator again because this one just showed up
http://einstein.phys.uwm.edu/workunit.php?wuid=19025946
Both reported AFTER you thought you solved the problem less than an hour ago Thanks JR

I am betting it was uploaded and validated, then you reported it, which made the validation show up already invalid.


No Pooh Bear I have a constant connection and use trux client report immediately option....look at my host list I have reported 3 wu's since Bruce thought he fixed the problem


One other problem here is that merely uploading the result doesn't trigger validation; the client has to report the result as well before it's submitted to the validator.

Alinator
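
The upload-versus-report distinction above can be sketched as two separate flags on a result. This is only a toy model under that assumption; the `Result` class and `ready_for_validation` function are hypothetical names, not part of the BOINC server code.

```python
from dataclasses import dataclass

@dataclass
class Result:
    uploaded: bool = False  # output files have been sent to the server
    reported: bool = False  # client has told the scheduler the task is done

def ready_for_validation(result):
    """A result is handed to the validator only once it is uploaded AND reported."""
    return result.uploaded and result.reported

r = Result(uploaded=True)
print(ready_for_validation(r))  # uploaded but not yet reported
r.reported = True
print(ready_for_validation(r))  # now both steps are done
```

Under this model, a result that shows up as invalid right after being reported must have been judged after the report, not at upload time.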

Graham G3ZOD
Joined: Jul 11 06
Posts: 12
ID: 203551
Credit: 215,361
RAC: 197
Message 54569 - Posted 19 Nov 2006 9:31:26 UTC - in response to Message 53886.

Aha! Many thanks, Bruce.

Graham.

Actually you had ALREADY been granted credit for the result, but the web page 'workunit.php' was not displaying this properly. If you had clicked on the result in question then you would have seen the granted credit on that next web page that would have been displayed.

Anyway I have modified the 'workunit.php' web page so that it now correctly shows the granted credit.

Dimitris
Joined: Oct 19 06
Posts: 4
ID: 224113
Credit: 148,900
RAC: 0
Message 55573 - Posted 21 Nov 2006 10:37:47 UTC

Looks like these (15 to 17 Nov) were the lucky days for hosts like this one, getting huge credit for nothing, with tens of results like this:

Outcome - Client state - CPU time - claimed credit - granted credit
Client error - Compute error - 68.52 - 0.21 - 12.33
Client error - Compute error - 69.58 - 0.19 - 12.33
Client error - Compute error - 72.89 - 0.22 - 12.33

A sample WU (the host in error is the first):
Outcome - Client state - CPU time - claimed credit - granted credit
Client error - Compute error - 72.89 - 0.22 - 12.33
Validate error - Done - 3,273.28 - 12.33 - 12.33
Validate error - Done - 10,131.96 - 12.33 - 12.33
Validate error - Done - 5,065.23 - 12.33 - 12.33
Validate error - Done - 8,220.56 - 12.33 - 12.33
Success - Done - 3,109.99 - 12.33 - 12.33
Success - Done - 2,773.07 - 12.33 - 12.33

If this is due to the validator running in a special mode then it’s fine. But if it is "normal", I can see a problem there. A lot of people would be tempted to use an erratic ultra fast client.
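
The concern above can be made concrete with a small check that flags failed client runs whose granted credit exceeds what they claimed. This is only a sketch using some of the figures from the sample WU; the tuple layout and the function name `over_granted` are assumptions for illustration.

```python
# Rows copied from the sample WU above:
# (outcome, cpu_time_seconds, claimed_credit, granted_credit)
rows = [
    ("Client error", 72.89, 0.22, 12.33),
    ("Validate error", 3273.28, 12.33, 12.33),
    ("Validate error", 10131.96, 12.33, 12.33),
    ("Success", 3109.99, 12.33, 12.33),
]

def over_granted(rows):
    """Return rows for failed client runs whose granted credit exceeds the claim."""
    return [r for r in rows if r[0] == "Client error" and r[3] > r[2]]

print(over_granted(rows))
```

Only the first row is flagged here: the host that errored out after about 73 seconds claimed 0.22 but was granted 12.33.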


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration