validate errors

Voyager
Joined: 9 Feb 05
Posts: 6
Credit: 108,614
RAC: 0
Message 75973 - Posted: 15 Oct 2007, 19:52:59 UTC

Could someone explain what's happened? The one that's not finished yet is suspended. Should I abort? Why process WUs that already have validate errors?

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 75974 - Posted: 15 Oct 2007, 19:56:42 UTC
Last modified: 15 Oct 2007, 20:00:03 UTC

They had database trouble today, and are fixing all the erroneous validate errors even as we speak.

Best thing to do is resume the work you have onboard and just let it run. Last time I checked most of the backend processes were still disabled, so you may run out of work temporarily, but it should all take care of itself once they get everything straightened out again.

Alinator

Bruce Allen
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Oct 04
Posts: 1105
Credit: 171,768,817
RAC: 0
Message 75979 - Posted: 15 Oct 2007, 22:13:15 UTC
Last modified: 15 Oct 2007, 22:19:45 UTC

Here is a quick summary of what happened in the past 8 hours:

An admin mistake (SQL command update result set outcome=6;validate_state=2 where id=84114386;) accidentally set all the results in the database into an outcome=validate error state (the first semicolon in the command should be a comma!).

I have corrected these as best as I could. There may be a few hundred results which are not quite in the correct state. Please bear with me while I correct these over the next few days.

I have modified the reporting deadlines for any results that were due in the past 8 hours or the next 4 hours, advancing these deadlines by 12 hours. So results will not be marked as late because of this project downtime.

Hopefully my database repairs will be effective and most Einstein@Home contributors should not notice any problems or unusual behavior with the project.

Cheers,
Bruce
____________
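
For anyone wondering how a single semicolon can touch every row, here is a minimal sketch of the effect. It uses Python's built-in sqlite3 as a stand-in for the project's MySQL server, and the three-row `result` table is a hypothetical miniature, not the real schema. The semicolon ends the statement early, so the first statement is an UPDATE with no WHERE clause, and the leftover fragment is simply rejected as invalid SQL.

```python
import sqlite3

def fresh_db():
    """Build a tiny stand-in for the `result` table (hypothetical 3-row schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result (id INTEGER PRIMARY KEY, outcome INTEGER, validate_state INTEGER)"
    )
    conn.executemany(
        "INSERT INTO result VALUES (?, ?, ?)",
        [(84114385, 1, 1), (84114386, 1, 0), (84114387, 1, 1)],
    )
    conn.commit()
    return conn

def snapshot(conn):
    """Return {id: (outcome, validate_state)} for every row."""
    rows = conn.execute("SELECT id, outcome, validate_state FROM result")
    return {rid: (outcome, state) for rid, outcome, state in rows}

# Mistyped command from the post: the semicolon splits it into two statements.
# The first one has no WHERE clause, so it updates every row before the
# second fragment fails to parse.
bad = fresh_db()
leftover_error = None
try:
    bad.executescript("update result set outcome=6;validate_state=2 where id=84114386;")
except sqlite3.OperationalError as exc:
    leftover_error = str(exc)
after_mistake = snapshot(bad)

# Intended command: a comma keeps both assignments under the same WHERE clause.
good = fresh_db()
good.execute("update result set outcome=6,validate_state=2 where id=84114386;")
good.commit()
after_fix = snapshot(good)

print(after_mistake)  # every row now has outcome 6
print(after_fix)      # only id 84114386 changed
```

The exact error text and transaction behaviour on a real MySQL server may differ; the point is only that the unrestricted UPDATE has already run by the time the parser complains about the rest.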

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0
Message 75982 - Posted: 15 Oct 2007, 22:45:30 UTC
Last modified: 15 Oct 2007, 22:58:58 UTC

Well thanks for the update Dr. Allen.

I checked over my account and I don't seem to have any collateral damage to report. Completed, pendings and in progress all seem to be in the correct state.

I've even had one complete and report since the backend came back up (although it had probably been waiting to report for a few hours at least).

<edit> LOL... you have to hate those punctuation errors in command lines though!

<edit2> BTW, if you're going to be working on database records anyway, I have this task on one of my old timers. It's a reissue from S5R2, but it's one of the long ones and should never have been sent to this host at all. However, I have about 480 hours on it and it will complete fine, except I need about 2 more weeks to complete it (November 3rd would be fine). That way you don't have to reissue another S5R2 and this old timer can get credit for 5 weeks hard crunchin'! TIA. ;-)

Alinator

Brian Cook (KI4HLW)
Joined: 5 Sep 07
Posts: 1
Credit: 1,308,657
RAC: 0
Message 75983 - Posted: 15 Oct 2007, 22:57:15 UTC
Last modified: 15 Oct 2007, 22:57:52 UTC

Is this one of those errors? Notice I got no credit while 2 others have some, but my results seem ok.

http://einstein.phys.uwm.edu/workunit.php?wuid=34957517
____________

Bruce Allen
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Oct 04
Posts: 1105
Credit: 171,768,817
RAC: 0
Message 75984 - Posted: 15 Oct 2007, 23:21:18 UTC - in response to Message 75983.
Last modified: 15 Oct 2007, 23:21:58 UTC

Is this one of those errors? Notice I got no credit while 2 others have some, but my results seem ok.

http://einstein.phys.uwm.edu/workunit.php?wuid=34957517


Yes, that was my mistake. This was one of 131 results that I should have left as 'outcome=validation errors' but in my haste I changed this to 'outcome=success'.

I have fixed these 131 results (including yours).

Thanks for pointing it out!

Cheers,
Bruce
____________
Pooh Bear 27
Joined: 20 Mar 05
Posts: 1381
Credit: 20,312,671
RAC: 0
Message 75987 - Posted: 16 Oct 2007, 0:00:32 UTC

Is this one of the mistakes? http://einstein.phys.uwm.edu/workunit.php?wuid=34921280

Bruce Allen
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Oct 04
Posts: 1105
Credit: 171,768,817
RAC: 0
Message 75989 - Posted: 16 Oct 2007, 0:31:53 UTC - in response to Message 75987.

Is this one of the mistakes? http://einstein.phys.uwm.edu/workunit.php?wuid=34921280


This appears to be a genuine error in the result.

Bruce


____________
Jonathan
Joined: 6 Nov 06
Posts: 9
Credit: 215,358
RAC: 0
Message 75994 - Posted: 16 Oct 2007, 3:28:55 UTC

No "'finished' file"? This is a first for me--all part of the error? Bits of the log file follow:

10/15/07 12:56:41||Starting BOINC client version 5.10.7 for windows_intelx86
10/15/07 12:56:41||log flags: task, file_xfer, sched_ops
10/15/07 12:56:41||Libraries: libcurl/7.16.1 OpenSSL/0.9.8e zlib/1.2.3
10/15/07 12:56:41||Data directory: C:\Program Files\BOINC
10/15/07 12:56:58||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz [x86 Family 6 Model 15 Stepping 6]
10/15/07 12:56:58||Processor features: fpu tsc pae nx sse sse2 mmx
10/15/07 12:56:58||Memory: 2.00 GB physical, 3.85 GB virtual
10/15/07 12:56:58||Disk: 79.17 GB total, 54.34 GB free
10/15/07 12:56:58|Einstein@Home|URL: http://einstein.phys.uwm.edu/; Computer ID: 882874; location: work; project prefs: work


10/15/07 21:34:44|Einstein@Home|Restarting task h1_0314.35_S5R2__43_S5R3a_2 using einstein_S5R3 version 407

10/15/07 22:32:25|Einstein@Home|Task h1_0314.35_S5R2__43_S5R3a_2 exited with zero status but no 'finished' file
10/15/07 22:32:25|Einstein@Home|If this happens repeatedly you may need to reset the project.

10/15/07 22:33:13|Einstein@Home|Restarting task h1_0314.35_S5R2__43_S5R3a_2 using einstein_S5R3 version 407
10/15/07 23:13:59||Running CPU benchmarks
10/15/07 23:13:59||Suspending computation - running CPU benchmarks
10/15/07 23:14:31||Benchmark results:
10/15/07 23:14:31|| Number of CPUs: 1
10/15/07 23:14:31|| 1659 floating point MIPS (Whetstone) per CPU
10/15/07 23:14:31|| 3090 integer MIPS (Dhrystone) per CPU
10/15/07 23:14:32||Resuming computation


Jonathan

Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 3611
Credit: 128,446,844
RAC: 56,568
Message 75995 - Posted: 16 Oct 2007, 5:47:30 UTC - in response to Message 75989.

Is this one of the mistakes? http://einstein.phys.uwm.edu/workunit.php?wuid=34921280


This appears to be a genuine error in the result.

Actually this looks like a bug in the 4.07 App, probably related to the "new checkpointing code", so the 4.09 might have it, too.

BM
Pooh Bear 27
Joined: 20 Mar 05
Posts: 1381
Credit: 20,312,671
RAC: 0
Message 76002 - Posted: 16 Oct 2007, 10:07:21 UTC - in response to Message 75995.

Is this one of the mistakes? http://einstein.phys.uwm.edu/workunit.php?wuid=34921280


This appears to be a genuine error in the result.

Actually this looks like a bug in the 4.07 App, probably related to the "new checkpointing code", so the 4.09 might have it, too.

BM

Then I am glad I brought it up. Something more for you guys to work on.

Thanks for both your updates, Dr. Allen and Bernd (are you a Dr. also?).
Colin Porter
Joined: 15 Feb 05
Posts: 21
Credit: 4,583,733
RAC: 0
Message 76009 - Posted: 16 Oct 2007, 13:18:01 UTC - in response to Message 75979.

Here is a quick summary of what happened in the past 8 hours:

An admin mistake (SQL command update result set outcome=6;validate_state=2 where id=84114386;) accidentally set all the results in the database into an outcome=validate error state (the first semicolon in the command should be a comma!).


Show me someone who says they have not done that kind of thing and I'll show you a liar.

Looks like you have done a good job of recovery, and also a big thanks from me for running such a stable and trouble-free project, from the cruncher's point of view. I can imagine it gives you a few headaches though.
Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3502
Credit: 149,236,603
RAC: 116,043
Message 76010 - Posted: 16 Oct 2007, 14:25:03 UTC

So true. This is the first downtime I can remember for quite some time, and most participants probably didn't even notice it because of work caches.

CU
H-BE
____________

Bruce Allen
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Oct 04
Posts: 1105
Credit: 171,768,817
RAC: 0
Message 76014 - Posted: 16 Oct 2007, 15:52:02 UTC - in response to Message 76009.


An admin mistake (SQL command update result set outcome=6;validate_state=2 where id=84114386;) accidentally set all the results in the database into an outcome=validate error state (the first semicolon in the command should be a comma!).


Show me someone who says they have not done that kind of thing and I'll show you a liar.

Looks like you have done a good job of recovery, and also a big thanks from me for running such a stable and trouble-free project, from the cruncher's point of view.


Thank you very much for the kind comments. We try hard not to make mistakes, but we're human!

Cheers,
Bruce

____________
Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0
Message 76018 - Posted: 16 Oct 2007, 16:25:04 UTC

It happens. Reminds me of some server mistakes I made when I was really tired. Great job getting it fixed so quickly!

PovAddict
Joined: 31 Mar 05
Posts: 44
Credit: 1,066,609
RAC: 0
Message 76032 - Posted: 17 Oct 2007, 0:17:48 UTC - in response to Message 75979.

You'll get to love the --i-am-a-dummy mysql client setting. Also available with a less offensive name under --safe-updates. If you do an UPDATE without a WHERE clause, it will give an error. Saved my a** a couple of times.

I think you can set it in my.ini under the [client] section, to make it the default.
____________
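
The same safety-net idea can be sketched in application code. The helper below is a hypothetical illustration, not how the mysql client implements --safe-updates: the MySQL option rejects an UPDATE or DELETE that lacks a key-based WHERE or a LIMIT before running it, whereas this sketch runs the statement inside a transaction and rolls back if it touched more rows than the caller expected. The names `guarded_update` and `TooManyRows` and the small `result` table are assumptions made for the example, and sqlite3 stands in for MySQL.

```python
import sqlite3

class TooManyRows(Exception):
    """Raised when a statement changed more rows than the caller expected."""

def guarded_update(conn, sql, params=(), max_rows=1):
    """Run one UPDATE in a transaction; roll back if it touches more than max_rows rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        if cur.rowcount > max_rows:
            raise TooManyRows(
                f"{cur.rowcount} rows affected, expected at most {max_rows}"
            )
        conn.commit()
        return cur.rowcount
    except Exception:
        # Undo the statement, then let the caller see the original error.
        conn.rollback()
        raise

# Hypothetical three-row table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, outcome INTEGER)")
conn.executemany("INSERT INTO result VALUES (?, ?)", [(1, 1), (2, 1), (3, 1)])
conn.commit()

# An UPDATE without a WHERE clause is rolled back instead of being committed.
try:
    guarded_update(conn, "UPDATE result SET outcome = 6")
    blocked = False
except TooManyRows:
    blocked = True

# A properly targeted UPDATE goes through.
changed = guarded_update(conn, "UPDATE result SET outcome = 6 WHERE id = ?", (2,))
final = dict(conn.execute("SELECT id, outcome FROM result"))
print(blocked, changed, final)
```

Checking the row count after the fact is weaker than refusing the statement up front, so a guard like this would complement the client setting rather than replace it.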

Bruce Allen
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 15 Oct 04
Posts: 1105
Credit: 171,768,817
RAC: 0
Message 76043 - Posted: 17 Oct 2007, 8:06:53 UTC - in response to Message 76032.

You'll get to love the --i-am-a-dummy mysql client setting. Also available with a less offensive name under --safe-updates. If you do an UPDATE without a WHERE clause, it will give an error. Saved my a** a couple of times.

I think you can set it in my.ini under the [client] section, to make it the default.


Good idea -- I will pass this on to our admin!

Bruce
____________
moz6311_v2
Joined: 7 Nov 06
Posts: 2
Credit: 22,267
RAC: 0
Message 76046 - Posted: 17 Oct 2007, 8:57:09 UTC

One of my WU's got hit too:
http://einstein.phys.uwm.edu/workunit.php?wuid=34956676

So far it's been sent out six times. I know my machine is stable (1 yr on EAH), and at least one other wingman is stable too. Both got errors anyway. Help!

Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 3611
Credit: 128,446,844
RAC: 56,568
Message 76096 - Posted: 17 Oct 2007, 22:32:30 UTC - in response to Message 76046.

One of my WU's got hit too:
http://einstein.phys.uwm.edu/workunit.php?wuid=34956676

Ouch!

The trouble is that, with this bug, there are a very few workunits that can't be finished as valid with the 4.07 App (the 4.09 Linux Beta probably had the same problem, which should be fixed in 4.12).

BM
tapir
Joined: 19 Mar 05
Posts: 23
Credit: 344,063,277
RAC: 420,503
Message 76175 - Posted: 19 Oct 2007, 7:13:09 UTC
Last modified: 19 Oct 2007, 7:14:45 UTC

My first validate error:
wuid=35000079
____________


This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2016 Bruce Allen