Arecibo Binary Search 1.04 taking 200 hours?

Patrick
Joined: 22 Dec 07
Posts: 2
Credit: 132768
RAC: 0
Topic 194374

Here is the job info:
Job ID: p2030_53614_09366_0084_G73.19-00.29.N_3.dm_532_0
Binary: einsteinbinary_ABP1 version 104

The job has been running for 7 hours and claims to be at 2.730% progress. BOINC reports that it will complete at 14:48, with the completion time going up one second every second. Just dividing by the progress gives an expected completion time of 200 hours, 30 times more than normal jobs.
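
The divide-by-progress estimate above can be sketched as below. This is only a rough illustration that assumes progress is linear in time; the function name `estimate_total_hours` is hypothetical and not part of BOINC. With the figures quoted in this post (7 hours, 2.730%) it gives roughly 256 hours, the same order of magnitude as the rough 200-hour figure.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total runtime, assuming progress grows linearly with time."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    # Total time = elapsed time divided by the fraction already completed.
    return elapsed_hours / (percent_done / 100.0)


if __name__ == "__main__":
    # Figures taken from the post: 7 hours elapsed, 2.730% progress.
    print(f"{estimate_total_hours(7.0, 2.730):.1f} hours")
```

If the estimate keeps rising by one second every second, as described, the progress counter is effectively not advancing, so any such extrapolation is only a lower bound.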

I have tried suspending and resuming this job to no avail. Another job with the same binary is running smoothly (it is already at 10% after less than an hour).

Any ideas? Am I just supposed to abort jobs like this? Or are the 200 hours of computation useful?

Unlike that other post, it is actually taking up 100% CPU, so it's definitely doing something.

Quote:
Sat 06 Jun 2009 02:23:44 PM PDT||Starting BOINC client version 6.2.18 for x86_64-pc-linux-gnu
Sat 06 Jun 2009 02:23:44 PM PDT||log flags: task, file_xfer, sched_ops
Sat 06 Jun 2009 02:23:44 PM PDT||Libraries: libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10
Sat 06 Jun 2009 02:23:44 PM PDT||Data directory: /var/lib/boinc-client
Sat 06 Jun 2009 02:23:44 PM PDT||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
Sat 06 Jun 2009 02:23:44 PM PDT||Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdc
Sat 06 Jun 2009 02:23:44 PM PDT||OS: Linux: 2.6.28-11-generic
Sat 06 Jun 2009 02:23:44 PM PDT||Memory: 3.81 GB physical, 4.92 GB virtual
Sat 06 Jun 2009 02:23:44 PM PDT||Disk: 24.62 GB total, 12.99 GB free
Sat 06 Jun 2009 02:23:44 PM PDT||Local time is UTC -7 hours
Sat 06 Jun 2009 02:23:44 PM PDT||No coprocessors

Here is the job info:
129391691 53825008 6 Jun 2009 22:14:53 UTC 20 Jun 2009 22:14:53 UTC In progress --- New
One other person has also received this job and is still crunching since the same time. I guess I'll leave it on overnight and see what happens.

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

I have one just like it.

http://einsteinathome.org/workunit/53824866

5 hours 20 minutes in, 2.4% done.

Did I miss the stop at E@H and end up at CPDN? :-D

Kathryn :o)

Einstein@Home Moderator

Olaf
Joined: 16 Sep 06
Posts: 26
Credit: 190763630
RAC: 0

Message 92998 in response to message 92997

I currently have two of them.

Typically ABP jobs run about 5-8 hours on these computers, but
for these two 'G73' tasks it claims less than 7% after more than 12 hours
of computing time, with an ever-increasing time to completion. Hopefully these
are not black holes in computing space, tasks that keep growing and
demanding more computation time ;o)

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 950
Credit: 25167626
RAC: 3

Hi guys,

This is definitely not intended! Preliminary investigations show that this particular set of workunits (p2030_53614_09366_0084_G73.19-00.29.N_[2|3]*) is based on a corrupted data file that unfortunately went unnoticed.

We are looking into this more closely right now and we're going to cancel the workunits as soon as our first findings are confirmed. You should abort the tasks concerned once we have cancelled the workunits on the server side. Until then I suggest you simply pause them and wait for further notice. I expect this to happen in the next 1-2 days...

Sorry for the inconvenience this might have caused!

Oliver

 

Einstein@Home Project

Patrick
Joined: 22 Dec 07
Posts: 2
Credit: 132768
RAC: 0

Thanks, I will pause this task.

By the way, according to this page, someone else managed to finish my stalled work unit in only 10 hours of CPU time. How did that happen?

http://einsteinathome.org/workunit/53825008

Olaf
Joined: 16 Sep 06
Posts: 26
Credit: 190763630
RAC: 0

Message 93001 in response to message 93000

Maybe the problem is specific to Intel processors?
The two samples I have also run on Intel, while the finished one we can see ran on AMD.
However, looking at the computation time and the percentage value,
this seems to result in a stable estimate of about 183 hours for
the jobs I have - this is within the deadline ;o)

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 950
Credit: 25167626
RAC: 3

Message 93002 in response to message 93000

Quote:

By the way, according to this page, someone else managed to finish my stalled work unit in only 10 hours of CPU time. How did that happen?

http://einsteinathome.org/workunit/53825008

Well, the corrupted data file usually contains (or leads to) "NaN"s and different machines handle/interpret those differently.

In the meantime we determined the root cause of this problem and added additional checks to the workunit generator to detect this kind of data error before we start processing the affected files!
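
A pre-processing check of this kind might look like the sketch below. It is purely illustrative and is not the project's actual workunit generator code; the function names `find_non_finite` and `is_usable` and the sample layout (a flat sequence of floats) are assumptions made for the example.

```python
import math
from typing import List, Sequence


def find_non_finite(samples: Sequence[float]) -> List[int]:
    """Return the indices of samples that are NaN or infinite."""
    return [i for i, x in enumerate(samples) if not math.isfinite(x)]


def is_usable(samples: Sequence[float]) -> bool:
    """Accept a data set only if it is non-empty and entirely finite."""
    return len(samples) > 0 and not find_non_finite(samples)


if __name__ == "__main__":
    nan = float("nan")
    # Every ordered comparison with NaN is false, so a threshold or
    # convergence test fed NaN can silently take the "wrong" branch.
    print(nan < 1.0, nan >= 1.0)
    print(find_non_finite([1.0, nan, 2.0, float("inf")]))
```

Rejecting such input up front is cheaper than letting NaN values propagate through the search, where their effect depends on how each comparison is written.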

Cheers,
Oliver

 

Einstein@Home Project

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Hmmmm...

Well, you're going to have to make a ruling one way or the other on this set of tasks pretty quick now.

First off, not everyone can afford to sit on an EAH task for extended periods without causing schedule jams. I know I'm starting to approach the "fish or cut bait" point.

Secondly, I've started to notice people are starting to just summarily abort the set when they see them.

I really hate to abort tasks I've been assigned, but the one I have is not looking so grim as the other ones reported so far. OTOH, I don't want to make a 161 hour run and have it die on user aborts or get canceled.

It's 12 hours in on a K6 III/450 and showing just over 10% complete. Estimated runtime is about 60% more than what I've seen for ABPS on this host before, but is still less than half of the deadline.

Alinator

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 950
Credit: 25167626
RAC: 3

Update: the affected workunits have been cancelled! This means no more tasks/results will be created for them. You may therefore now abort any local work based on the affected workunits (see my previous post). Recent clients should do this automatically after their next server (scheduler) contact, but I recommend doing this manually to save precious CPU cycles.

Cheers,
Oliver

 

Einstein@Home Project

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Message 93005 in response to message 93004

Isn't it a good idea to post about it on the front page of the project as well, in the News section? That way it goes out as RSS to many more people who do not read the forums.

Dingo
Joined: 18 Jan 05
Posts: 31
Credit: 135458055
RAC: 7201

Just found this thread. I have had a work unit running for 77 hours. Am I supposed to abort and get no credit? That does not seem to be fair. I assume that the other person with this WU is unaware of the situation. If it is a problem from Einstein, they should cancel the work unit, return it, and give the appropriate credit.

http://einsteinathome.org/workunit/53824911


