Resumed Gamma-Ray Pulsar search


Bernd Machenschalk
Forum moderator
Project administrator
Project developer
Joined: Oct 15 04
Posts: 3267
Credit: 90,776,418
RAC: 10,264
Message 118574 - Posted 30 Jul 2012 8:19:16 UTC

    We have Fermi-LAT Gamma-Ray Pulsar Search work left for about 10 days. We won't add new work for that search, but instead take the time to prepare the results collected so far for publication, and to further develop and improve the application. The latter will happen over at Albert@Home.

    BM

    Sid
    Joined: Oct 17 10
    Posts: 89
    Credit: 48,711,144
    RAC: 15,418
    Message 118575 - Posted 30 Jul 2012 8:31:34 UTC

      Thank you for the information.
      Talking about the publication - do we need to anticipate some exciting story about Gamma-Ray pulsar discoveries?

      Bernd Machenschalk
      Forum moderator
      Project administrator
      Project developer
      Joined: Oct 15 04
      Posts: 3267
      Credit: 90,776,418
      RAC: 10,264
      Message 118576 - Posted 30 Jul 2012 8:49:19 UTC - in response to Message 118575.

        You don't need to.

        Sorry, we can't publish anything before the publication, or else it wouldn't be (the) one.

        BM

        Bikeman (Heinz-Bernd Eggenstein)
        Forum moderator
        Project administrator
        Project developer
        Joined: Aug 28 06
        Posts: 3225
        Credit: 73,642,482
        RAC: 31,958
        Message 118577 - Posted 30 Jul 2012 9:42:36 UTC - in response to Message 118575.

          Thank you for the information.
          Talking about the publication - do we need to anticipate some exciting story about Gamma-Ray pulsar discoveries?


          What we can say is this, taken from a press release that was published in connection with a pulsar discovery made with essentially the same code, but on the ATLAS computing cluster, not on E@H:

          (http://www.aei.mpg.de/hannover-de/77-files/pm/2012/PM2012_SprunghafterPulsar_eng.pdf)

          "
          The ATLAS computer cluster of the Albert Einstein Institute has thus already assisted in the discovery of the tenth previously unknown gamma-ray pulsar; however, Allen’s team has meanwhile mobilised further computing capacity. “Since August 2011, our search has also been running on the distributed computing project Einstein@Home, which has computing power a factor of ten greater than the ATLAS cluster. We are very optimistic about finding more unusual gamma-ray pulsars in the Fermi data,” says Bruce Allen. One goal of the expanded search is to discover the first gamma-ray-only pulsar with a rotation period in the millisecond range.
          "

          HB


          Bernd Machenschalk
          Forum moderator
          Project administrator
          Project developer
          Joined: Oct 15 04
          Posts: 3267
          Credit: 90,776,418
          RAC: 10,264
          Message 118812 - Posted 24 Aug 2012 11:09:33 UTC

            Last modified: 24 Aug 2012 12:50:06 UTC

            We are currently testing a new FGRP App version on Einstein. A fresh pair of eyes (HB's) on the code found a serious bug that appears to be responsible for most of the validation problems (validate errors and invalid results) we've seen in the FGRP search. So far the new App version 30 has not shown a single validate error (neither on Albert nor on Einstein), and only one invalid result (compared to ~1000 valid ones). Looks pretty good.

            In the next few days we will ship a couple of FGRP WUs again that are mainly designed to check how much this bug affected the results obtained with the older App.

            BM

            Sparrow
            Joined: Jul 4 11
            Posts: 23
            Credit: 2,708,180
            RAC: 5,917
            Message 118837 - Posted 26 Aug 2012 9:58:23 UTC - in response to Message 118812.

              Let's hope that we don't have to repeat all of the WUs because of this bug...

              Bernd Machenschalk
              Forum moderator
              Project administrator
              Project developer
              Joined: Oct 15 04
              Posts: 3267
              Credit: 90,776,418
              RAC: 10,264
              Message 118852 - Posted 27 Aug 2012 9:11:09 UTC - in response to Message 118837.

                Let's hope that we don't have to repeat all of the WUs because of this bug...


                Certainly not.

                My current impression is that all tasks that were affected by this bug produced unusable results and were filtered out by the validation process. IOW the technically valid results should all be scientifically valid, too. But as this is only my personal impression, we are trying to verify this now.

                And even if we found that certain results could have been affected by this bug, we wouldn't just run the old WUs again. Instead we would include the respective parameter space in the setup for the next "run".

                BM

                Khangollo
                Joined: Feb 17 11
                Posts: 38
                Credit: 67,253,931
                RAC: 165,722
                Message 118860 - Posted 27 Aug 2012 14:09:45 UTC

                  Last modified: 27 Aug 2012 14:16:44 UTC

                  Great job! I haven't gotten any of the validate errors that were plaguing my Linux hosts before.
                  I've noticed that the new 0.30 application for Linux x86 is around 10% slower than 0.23 on my i7-920 (and the runtime estimate, which was almost exact, is now off by 40 min.). Is this normal or just something weird with my computer (I haven't changed anything)?

                  Bernd Machenschalk
                  Forum moderator
                  Project administrator
                  Project developer
                  Joined: Oct 15 04
                  Posts: 3267
                  Credit: 90,776,418
                  RAC: 10,264
                  Message 118861 - Posted 27 Aug 2012 14:20:39 UTC - in response to Message 118860.

                    Last modified: 27 Aug 2012 14:26:51 UTC

                    Other people noticed a significant performance increase of the 0.30 App over the previous version when run on exactly the same data.

                    From the code changes I would expect a small increase in performance, on the order of a few percent.

                    Up to +-10% should be within the normal fluctuation even between different datasets. No reason to worry.

                    BM

                    Sparrow
                    Joined: Jul 4 11
                    Posts: 23
                    Credit: 2,708,180
                    RAC: 5,917
                    Message 118862 - Posted 27 Aug 2012 14:29:46 UTC - in response to Message 118860.

                      Great job! I haven't gotten any of the validate errors that were plaguing my Linux hosts before.
                      I've noticed that the new 0.30 application for Linux x86 is around 10% slower than 0.23 on my i7-920 (and the runtime estimate, which was almost exact, is now off by 40 min.). Is this normal or just something weird with my computer (I haven't changed anything)?


                      On Win7 64bit it seems to be slower too. A WU takes 7.5 hours now, and I'm quite sure that it took me between 6 and 7 hours before. But maybe playing Diablo 3 (which I do way too much :-) ) is slowing down BOINC a bit.

                      I also have a WU waiting in Linux 64bit, but it hasn't started yet.

                      Oh, and I'm also using an i7-920.

                      Gary Roberts
                      Forum moderator
                      Joined: Feb 9 05
                      Posts: 3022
                      Credit: 1,095,818,417
                      RAC: 2,357,496
                      Message 118871 - Posted 28 Aug 2012 1:58:14 UTC - in response to Message 118852.

                        ... My current impression is that all tasks that were affected by this bug produced unusable results and were filtered out by the validation process.

                        Any thoughts on why the rates of validate errors were (apparently) so highly OS-centric? Why did Windows hosts seem to be relatively immune when the rates for both OS X and Linux (but particularly OS X) were so high?

                        Also, if one host participating in a quorum produced a validate error, why didn't all hosts do the same? I didn't examine affected quorums all that closely but my recollection is that there were plenty of examples of validate errors where at least one of the two hosts that eventually completed the quorum was running either Linux or OS X. Once you have done your full analysis, it would be interesting to be updated on all this.

                        As someone with large numbers of Linux and Mac OS X machines that were haunted by this problem, I'm extremely grateful for HB's 'new set of eyes' :-). Congratulations HB - a job extremely well done!!

                        I look forward keenly to the next round of FGRP work, whenever it comes, with the anticipation that the 5-10% validate error rate is now a thing of the past.

                        ____________
                        Cheers,
                        Gary.

                        Bernd Machenschalk
                        Forum moderator
                        Project administrator
                        Project developer
                        Joined: Oct 15 04
                        Posts: 3267
                        Credit: 90,776,418
                        RAC: 10,264
                        Message 118873 - Posted 28 Aug 2012 6:11:17 UTC - in response to Message 118871.

                          Last modified: 28 Aug 2012 6:15:00 UTC

                          The main bug was a variable on the stack that conditionally was accessed uninitialized. In most cases the correct value was still there from a previous call to the same function, but depending on process and memory management (which is OS-dependent) and whatever else was going on on the machine at that time, this memory location may have been overwritten between two such calls.

                          The nature of this bug made it impossible to reproduce it in a clean environment (or on another computer), which is why it took us so long to track it down.

                          In many cases the floating-point variable was overwritten with something that wasn't a valid number, resulting in "NaN"s (Not a Number) in the result, ultimately ending in a "validate error". IMHO it is highly unlikely that we got a wrong "canonical" result because of this bug, as for this to happen there would have needed to be two machines with (almost) exactly the same "garbage" on the stack at the same point in the calculation, which would also need to be a valid floating-point number in double-precision representation.

                          BM
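
The failure mode described above can be mimicked in a few lines. This is only a toy model in Python, not the FGRP code: the `stack` buffer, the `SLOT` offset and the `step` and `clobber` functions are all made up for illustration. A fixed 8-byte slot stands in for the stack variable; it is written on the first call only, reused on later calls, and sometimes overwritten in between.

```python
import math
import struct

SLOT = 0  # byte offset of the (hypothetical) local variable in the toy stack


def step(stack: bytearray, first_call: bool, x: float) -> float:
    """Toy stand-in for the buggy function: the local is set on one path only."""
    if first_call:
        # This path initializes the local correctly.
        struct.pack_into("<d", stack, SLOT, 1.0)
    # Bug: on every other path the local is read without being written first.
    (scale,) = struct.unpack_from("<d", stack, SLOT)
    return scale * x


def clobber(stack: bytearray, pattern: int) -> None:
    """Simulate unrelated activity overwriting the same stack memory."""
    struct.pack_into("<Q", stack, SLOT, pattern)


stack = bytearray(8)
ok_first = step(stack, True, 2.0)    # initialized path
ok_stale = step(stack, False, 3.0)   # correct only because the old value survived
clobber(stack, 0x7FF8000000000000)   # a quiet-NaN bit pattern
bad = step(stack, False, 3.0)        # garbage in, NaN out

print(ok_first, ok_stale, math.isnan(bad))
```

The second call gives the right answer only because the old value happened to survive. Once the slot holds a NaN bit pattern, the NaN propagates into the result, which is the kind of output that ends in a validate error.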

                          Mike Hewson
                          Forum moderator
                          Joined: Dec 1 05
                          Posts: 3508
                          Credit: 28,009,501
                          RAC: 4,319
                          Message 118874 - Posted 28 Aug 2012 6:28:12 UTC - in response to Message 118873.

                            ... a variable on the stack that conditionally was accessed uninitialized ....

                            Arrghh

                            Cheers, Mike.

                            ____________
                            "I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

                            Sparrow
                            Joined: Jul 4 11
                            Posts: 23
                            Credit: 2,708,180
                            RAC: 5,917
                            Message 118887 - Posted 29 Aug 2012 15:24:21 UTC - in response to Message 118862.

                              Great job! I haven't gotten any of the validate errors that were plaguing my Linux hosts before.
                              I've noticed that the new 0.30 application for Linux x86 is around 10% slower than 0.23 on my i7-920 (and the runtime estimate, which was almost exact, is now off by 40 min.). Is this normal or just something weird with my computer (I haven't changed anything)?


                              On Win7 64bit it seems to be slower too. A WU takes 7.5 hours now, and I'm quite sure that it took me between 6 and 7 hours before. But maybe playing Diablo 3 (which I do way too much :-) ) is slowing down BOINC a bit.

                              I also have a WU waiting in Linux 64bit, but it hasn't started yet.

                              Oh, and I'm also using an i7-920.


                              On Linux 64bit the new application seems to be as fast as the old one, or even a bit faster.

                              Sid
                              Joined: Oct 17 10
                              Posts: 89
                              Credit: 48,711,144
                              RAC: 15,418
                              Message 118888 - Posted 29 Aug 2012 16:06:12 UTC - in response to Message 118871.


                                Any thoughts on why the rates of validate errors were (apparently) so highly OS-centric? Why did Windows hosts seem to be relatively immune when the rates for both OS X and Linux (but particularly OS X) were so high?

                                As far as I remember, Windows fills memory with 0xCCCCCCCC before it is given to a task. Unix-like systems do the same but fill the memory with 0x00000000.
                                I know nothing about OS X, however.
                                Probably this is the answer.

                                Bernd Machenschalk
                                Forum moderator
                                Project administrator
                                Project developer
                                Joined: Oct 15 04
                                Posts: 3267
                                Credit: 90,776,418
                                RAC: 10,264
                                Message 118895 - Posted 30 Aug 2012 7:23:02 UTC - in response to Message 118888.

                                  Last modified: 30 Aug 2012 7:25:27 UTC

                                  Probably this is the answer.


                                  No, I don't think so.

                                  With the first such function call, the variable in question is correctly initialized by the function. The error happens at subsequent calls when a possible initialization by the OS has already been overwritten.

                                  Furthermore, 0x0... is a valid double-precision number (0), while 0xC... (I think) is not. If this initialization were the reason, we should get more (or even only) such "validate errors" from Windows hosts, which is the opposite of what we observe.

                                  Finally, I recently verified that at least on (modern) Linux systems memory passed to the application is definitely not initialized. I vaguely remember having read about such memory initialization in an early edition of "The Design and Implementation of the BSD Operating System", but I can't find it in the BSD4.4 edition anymore, and I think this is considered obsolete by most modern OSes for performance reasons. Possibly paranoid Net/OpenBSD versions still do it.

                                  BM
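
The two fill patterns mentioned in this exchange can be checked directly by reinterpreting them as IEEE 754 double-precision values. A minimal sketch in Python; `as_double` is a helper introduced here for illustration, and the 64-bit patterns assume that the 32-bit fill value is simply repeated.

```python
import math
import struct


def as_double(bits: int) -> float:
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


zero_fill = as_double(0x0000000000000000)  # all-zero fill
cc_fill = as_double(0xCCCCCCCCCCCCCCCC)    # assumed: 0xCCCCCCCC repeated twice
quiet_nan = as_double(0x7FF8000000000000)  # a genuine NaN, for comparison

for name, value in [("zero", zero_fill), ("0xCC", cc_fill), ("NaN", quiet_nan)]:
    print(name, value, "finite" if math.isfinite(value) else "not finite")
```

Under these assumptions the all-zero pattern is 0.0, and the 0xCC... pattern comes out as a finite, very large negative number (roughly -9.26e61) rather than a NaN, so it would distort a result rather than turn it into a NaN. The point that any OS-level fill has long been overwritten by the time the bug strikes is unaffected by this.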

                                  Public0x05bf
                                  Joined: Oct 16 11
                                  Posts: 3
                                  Credit: 402,846
                                  RAC: 678
                                  Message 121235 - Posted 10 Dec 2012 23:17:19 UTC - in response to Message 118895.

                                    * All processes in Linux (even BOINC processes) run in virtual memory.
                                    * Virtual memory is realized by mapping physical memory or disk (file / swap space) to virtual memory.
                                    * Mapping is done in pages (e.g. 4096 bytes for a normal i386 system).
                                    * Virtual memory pages may be remapped.
                                    * There exists one physical memory page initialized to all zeros: the 'zero-page'.
                                    * Every time a process requests (virtual) memory, it gets memory all mapped to this 'zero-page', so all memory a process gets is virtually initialized to 0x00000000.
                                    * This virtual initialization of (process) memory is essential for security (e.g. to prevent a process B from seeing passwords of another process A that used the [physical] memory before process B).

                                    * As soon as a process writes to its memory, all the memory pages written to are remapped to other (free) physical memory, now containing the data written by the process (called "Copy On Write").

                                    (Read e.g. Daniel P. Bovet & Marco Cesati: "Understanding the Linux Kernel", O'Reilly, 2nd edition, Chapter 8: Process Address Space, sub-chapter: Page Fault Exception Handling, 'sub-sub-chapters': Demand Paging (p. 292), Copy On Write (p. 295); the 'zero-page' is mentioned on p. 294.)

                                    Sincerely

                                    Thomas
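
The zero-fill behaviour for freshly mapped pages is easy to observe from user space. A small sketch using Python's standard `mmap` module to request one anonymous page; the 4096-byte size is just the page size quoted above, taken here as an assumption about the host.

```python
import mmap

PAGE = 4096  # assumed page size, as in the i386 example above

# Anonymous mapping: fresh memory from the kernel, not backed by a file.
page = mmap.mmap(-1, PAGE)
fresh_is_zero = page[:] == bytes(PAGE)

# After the first write, the page holds the process's own data.
page[0:4] = b"\xde\xad\xbe\xef"
still_zero = page[:] == bytes(PAGE)
page.close()

print(fresh_is_zero, still_zero)
```

This only covers pages newly handed out by the kernel. Memory that a process has already written to, such as a stack region reused by successive function calls, keeps whatever was stored there last, which is the situation Bernd describes for the second and later calls.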

                                    Bernd Machenschalk
                                    Forum moderator
                                    Project administrator
                                    Project developer
                                    Joined: Oct 15 04
                                    Posts: 3267
                                    Credit: 90,776,418
                                    RAC: 10,264
                                    Message 121472 - Posted 18 Dec 2012 18:07:10 UTC

                                      New Gamma-Ray pulsar search work is being shipped under the new label FGRP2. Only ~4500 tasks for now. If these come back ok, we'll start continuous production tomorrow.

                                      BM

                                      Gary Roberts
                                      Forum moderator
                                      Joined: Feb 9 05
                                      Posts: 3022
                                      Credit: 1,095,818,417
                                      RAC: 2,357,496
                                      Message 121475 - Posted 19 Dec 2012 0:22:38 UTC - in response to Message 121472.

                                        Last modified: 19 Dec 2012 0:53:56 UTC

                                        If these come back ok ...

                                        Are they meant to go so fast?? I saw two of them on a particular host so I promoted them to the top of the queue. One was estimated at 3 hours and the other was estimated at 6 hours. The first is finished in 15 mins and the second is currently 50% completed in 17 mins!!

                                        This new app seems to be on steroids!!! :-).

                                        ... we'll start continuous production tomorrow.

                                        Ahhh... I see ... a cunning ploy to break the 1 Petaflop barrier before Christmas!! :-).

                                        EDIT: The second one finished in 35 mins. I've reported them both. They can be seen in the tasks list for hostid=83040, which is a new GPU cruncher that I've just built.

                                        The crunching on the (quite basic) CPU cores was just a sideline but these two super quick FGRP2 tasks might cause me to reassess that :-). I wonder how much credit we'll get :-).
                                        ____________
                                        Cheers,
                                        Gary.

                                        Bernd Machenschalk
                                        Forum moderator
                                        Project administrator
                                        Project developer
                                        Joined: Oct 15 04
                                        Posts: 3267
                                        Credit: 90,776,418
                                        RAC: 10,264
                                        Message 121478 - Posted 19 Dec 2012 7:37:39 UTC - in response to Message 121475.

                                          Hi Gary!

                                          The App is almost identical to the last FGRP1 one.

                                          We changed quite a bit in the setup of the new workunits: they now use ~4y of mission data instead of the previous 3y; a "coherent follow-up" (a closer look at the most promising candidate) is now done only after looking at a couple of skypoints, not after every skypoint; the number of skypoints per workunit has been reduced; etc.

                                          Honestly, we didn't have much of an idea how all these changes together would affect the run-time, and we found the testing on Albert not very representative. So we decided to just go ahead, run (relatively) few tasks here on Einstein and see what happens. For now we left the credit unchanged, which now looks like a Xmas present to our fellow crunchers.

                                          Finally, as in FGRP1, the workunits are cut into equal chunks from a larger set of skypoints whose size is not necessarily divisible by the number of skypoints per workunit. This results in workunits at the "end" of each data file that can be much shorter than the other ones. The first one you ran was probably such a "short end".

                                          BM
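
The "short end" effect follows from plain integer division. A sketch with made-up numbers (1030 skypoints in a data file, 100 per workunit; neither figure comes from the project), where `split_skypoints` is a hypothetical helper and not part of the actual workunit generator:

```python
from typing import List


def split_skypoints(total: int, per_wu: int) -> List[int]:
    """Return the number of skypoints in each workunit cut from one data file."""
    full, rest = divmod(total, per_wu)
    sizes = [per_wu] * full
    if rest:
        # The leftover skypoints form one shorter workunit at the "end".
        sizes.append(rest)
    return sizes


sizes = split_skypoints(1030, 100)  # illustrative numbers only
print(len(sizes), sizes[-1])
```

With these numbers there are ten full workunits and one final workunit with only 30 skypoints, which would finish in roughly a third of the usual time if run-time scales with the number of skypoints.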

                                          Gary Roberts
                                          Forum moderator
                                          Joined: Feb 9 05
                                          Posts: 3022
                                          Credit: 1,095,818,417
                                          RAC: 2,357,496
                                          Message 121479 - Posted 19 Dec 2012 8:24:34 UTC - in response to Message 121478.

                                            Last modified: 19 Dec 2012 10:01:44 UTC

                                            Thanks very much for the info. I've found, promoted, crunched and returned a few more on other hosts of mine during the day. The speedup is very impressive!! I was expecting you to come back with a "Houston, we have a problem ..." type reply. I'm very happy it's not that!! :-).

                                            I notice that validation is currently disabled and there are already 350 WUs waiting for validation. Will you be turning on validation shortly? I'm interested to see if they validate.

                                            With the sorts of speeds I've been seeing on various hosts, I hope your infrastructure can cope with the onslaught when you ramp up to full production! :-).

                                            Is that still expected for today?

                                            EDIT: Looks like there is a validator running now! Quite a few validated tasks (762) showing on the status page and the 'waiting' queue has dropped to zero. So far there are no 'invalids' listed so that is quite hopeful. Of course that doesn't mean there aren't any quorums pending the outcome of a third result :-).
                                            ____________
                                            Cheers,
                                            Gary.

                                            Ageless
                                            Joined: Jan 26 05
                                            Posts: 2971
                                            Credit: 5,356,009
                                            RAC: 144
                                            Message 121480 - Posted 19 Dec 2012 11:52:42 UTC - in response to Message 121479.

                                              I was expecting you to come back with a "Houston, we have a problem ..." type reply. I'm very happy it's not that!! :-).

                                              That would be a "Hannover, we have a problem" type of reply then anyway. ;-)
                                              ____________
                                              Jord

                                              MAGIC
                                              Joined: Jan 18 05
                                              Posts: 514
                                              Credit: 106,805,513
                                              RAC: 166,996
                                              Message 121499 - Posted 20 Dec 2012 9:56:04 UTC

                                                Last modified: 20 Dec 2012 9:57:16 UTC

                                                Two of my 8 hosts haven't been upgraded to GPU crunchers yet, so I run the Grav S6's on them, and since we ran out of those I ran the BRP4's on the CPU.

                                                But that is almost done, and after a few tries, both earlier and just now, I've had no luck getting any of the new FGRP2 tasks, even though I have the prefs set to receive the Grav.'s and GRP's. A couple more tries and I guess I will reset and get a couple of days' worth of BRP4's.

                                                As I am typing......I just got 16 tasks of the Grav. S6's (1.13)

                                                So it will have work to do along with the T4T X2 that I have it running for some reason.

                                                Maybe next time one of my CPU hosts will grab a few FGRP2

                                                2am.....goodnight

                                                Gary Roberts
                                                Forum moderator
                                                Joined: Feb 9 05
                                                Posts: 3022
                                                Credit: 1,095,818,417
                                                RAC: 2,357,496
                                                Message 121501 - Posted 20 Dec 2012 11:28:14 UTC

                                                  There shouldn't be a shortage of GW tasks just yet - the work generator is running, there are a few thousand ready to send and 600K units before the run is finished. It'll be the new year before the 'end game' is upon us and even then there will be some work available over the days and weeks after that.

                                                  Nobody can get FGRP2 tasks just yet - there are zero ready to send and the work generator is disabled. Hopefully it will be turned on RSN :-).

                                                  ____________
                                                  Cheers,
                                                  Gary.

                                                  Donald A. Tevault
                                                  Joined: Feb 17 06
                                                  Posts: 415
                                                  Credit: 62,667,177
                                                  RAC: 0
                                                  Message 121514 - Posted 20 Dec 2012 22:43:20 UTC

                                                    Last modified: 20 Dec 2012 22:46:54 UTC

                                                    I hate to be the bearer of bad news, but. . .

                                                    All of the new Gamma-Ray workunits on this machine of mine ended in an error condition.


                                                    It's Lubuntu 12.10 running with the stock kernel. Am I going to have to compile my own non-preemptive kernel, again?

                                                    Gary Roberts
                                                    Forum moderator
                                                    Joined: Feb 9 05
                                                    Posts: 3022
                                                    Credit: 1,095,818,417
                                                    RAC: 2,357,496
                                                    Message 121516 - Posted 20 Dec 2012 23:15:51 UTC - in response to Message 121514.

                                                      I took a look through a task ID link of a failed task and found

                                                      <core_client_version>7.0.28</core_client_version>
                                                      <![CDATA[
                                                      <message>
                                                      process exited with code 22 (0x16, -234)
                                                      </message>
                                                      <stderr_txt>
                                                      execv: No such file or directory

                                                      </stderr_txt>
                                                      ]]>


                                                      Is this a 64 bit OS and do you have the 32 bit libraries installed?

                                                      I think that might be the problem. Perhaps it's looking for 32 bit libs and can't find them. In a shell run 'ldd path/to/executable' without the quotes. That will list any 'not found' libs.
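
The check above can also be scripted. The following is a minimal Python sketch, not part of BOINC or the Einstein@Home application: it runs `ldd` on an executable and collects the libraries reported as `not found`. The function names are hypothetical, and it assumes the usual `name => not found` output format of glibc's `ldd`.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A missing dependency is printed as "libfoo.so.1 => not found".
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path: str) -> list[str]:
    """Run ldd on the given path and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

Calling `check_executable("path/to/executable")` returns an empty list when every dependency resolves. If the 32-bit loader itself is missing, `ldd` may instead report that the file is not a dynamic executable, in which case the list is also empty, so an empty result alone is not proof that the 32-bit libraries are installed.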

                                                      ____________
                                                      Cheers,
                                                      Gary.

                                                      Profile Gary Roberts
                                                      Forum moderator
                                                      Send message
                                                      Joined: Feb 9 05
                                                      Posts: 3022
                                                      Credit: 1,095,818,417
                                                      RAC: 2,357,496
                                                      Message 121517 - Posted 20 Dec 2012 23:49:03 UTC

                                                        As I indicated in a message during the test run, tasks are crunching really fast. A back-of-the-envelope estimate says that the actual crunch time will be 6x to 10x (or more) faster than the estimated time, which, I guess, is based on the prior run.

                                                        This is going to play havoc with people maintaining large caches and/or supporting multiple projects, because of fairly wild see-sawing in the DCF (duration correction factor) as BOINC tries to cope with the variations between the FGRP2, BRP4 and S6LV1 tasks in the overall task mix. It is quite important that the new FGRP2 tasks be delivered with a more accurate time estimate, and presumably this should happen as the new run gets established.

                                                        In the meantime, it would be very prudent to turn down your cache size substantially so as to avoid BOINC running in high priority mode later on if a series of short running FGRP2 tasks causes over-fetching of the other two types of tasks. People (quite rightly) tend to get upset if the 'equilibrium' between projects gets disturbed to the point that other projects are 'shut out' by any one project going into 'panic' mode like this. Sometimes it's the participant's unwise choice of (too large) cache size for the number of projects in the mix, but this time I think even moderate cache sizes could be adversely affected.
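
To illustrate the see-sawing, here is a toy Python model. It is an assumption made for illustration, not BOINC's actual update rule: it keeps one correction factor per project, raises it immediately when a task runs longer than predicted, and lowers it only gradually when tasks finish early. The function names and the 0.9/0.1 weights are hypothetical.

```python
def update_dcf(dcf: float, estimated: float, actual: float) -> float:
    """Toy update rule (an assumption, not BOINC's real code)."""
    ratio = actual / estimated
    if ratio > dcf:
        # Task overran the corrected estimate: jump up at once.
        return ratio
    # Task finished early: drift down slowly.
    return 0.9 * dcf + 0.1 * ratio


def simulate(tasks, dcf: float = 1.0) -> list[float]:
    """Return the DCF after each (estimated_hours, actual_hours) pair."""
    history = []
    for estimated, actual in tasks:
        dcf = update_dcf(dcf, estimated, actual)
        history.append(dcf)
    return history


# Three fast tasks (8 h estimated, 1 h actual), then one accurately estimated task.
history = simulate([(8.0, 1.0), (8.0, 1.0), (8.0, 1.0), (6.0, 6.0)])
print(history)
```

In this model the factor drifts down while the overestimated tasks complete and snaps back up as soon as an accurately estimated task finishes. While the factor is low, every task in the cache looks shorter than it really is, which is how over-fetching of the other task types can happen.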

                                                        ____________
                                                        Cheers,
                                                        Gary.

                                                        Profile Donald A. Tevault
                                                        Avatar
                                                        Send message
                                                        Joined: Feb 17 06
                                                        Posts: 415
                                                        Credit: 62,667,177
                                                        RAC: 0
                                                        Message 121519 - Posted 21 Dec 2012 0:07:11 UTC - in response to Message 121516.

                                                          Ugh! I'm a dolt.

                                                          I just installed this system a few days ago, and completely forgot to install the 32-bit libraries.

                                                          Oh well, I guess I'll do that now.

                                                          Profile Bernd Machenschalk
                                                          Forum moderator
                                                          Project administrator
                                                          Project developer
                                                          Avatar
                                                          Send message
                                                          Joined: Oct 15 04
                                                          Posts: 3267
                                                          Credit: 90,776,418
                                                          RAC: 10,264
                                                          Message 121523 - Posted 21 Dec 2012 11:13:47 UTC - in response to Message 121516.

                                                            <core_client_version>7.0.28</core_client_version>


                                                            Is this a 64 bit OS and do you have the 32 bit libraries installed?


                                                            If that is the problem, it means that the detection of the 32-bit compatibility libs still doesn't work with 7.0.28. Pity. If it did, the client should detect the absence of these libs and you shouldn't get such tasks at all.

                                                            BM

                                                            Profile Bernd Machenschalk
                                                            Forum moderator
                                                            Project administrator
                                                            Project developer
                                                            Avatar
                                                            Send message
                                                            Joined: Oct 15 04
                                                            Posts: 3267
                                                            Credit: 90,776,418
                                                            RAC: 10,264
                                                            Message 121524 - Posted 21 Dec 2012 11:17:44 UTC - in response to Message 121517.

                                                              New FGRP2 tasks will run a bit longer (~ twice as long) now, and will have the FLOPs estimation reduced to 1/4. Flops estimation and Credit will be fine-tuned when we have more data (i.e. tasks returned), but possibly not this year anymore.
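
As a rough check on these numbers, assume the estimated run time scales linearly with the FLOPs estimate, and take the 6x to 10x overestimate reported earlier in the thread as the starting point. Both are assumptions for illustration, and the function name is hypothetical. Doubling the run time and quartering the estimate then divides the overestimate by 8:

```python
def estimate_to_actual_ratio(old_ratio: float,
                             runtime_factor: float = 2.0,
                             estimate_factor: float = 0.25) -> float:
    """Ratio of estimated to actual run time after both changes are applied."""
    return old_ratio * estimate_factor / runtime_factor


for old in (6.0, 10.0):
    print(old, "->", estimate_to_actual_ratio(old))
```

Under those assumptions the estimate ends up between 0.75 and 1.25 times the actual run time, which is close enough that the DCF should no longer swing widely.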

                                                              BM

                                                              Profile Donald A. Tevault
                                                              Avatar
                                                              Send message
                                                              Joined: Feb 17 06
                                                              Posts: 415
                                                              Credit: 62,667,177
                                                              RAC: 0
                                                              Message 121526 - Posted 21 Dec 2012 14:36:37 UTC - in response to Message 121523.

                                                                <core_client_version>7.0.28</core_client_version>


                                                                Is this a 64 bit OS and do you have the 32 bit libraries installed?


                                                                If that is the problem, it means that the detection of the 32-bit compatibility libs still doesn't work with 7.0.28. Pity. If it did, the client should detect the absence of these libs and you shouldn't get such tasks at all.

                                                                BM



                                                                Yeah, that was the problem. I installed the ia32-libs package, and now the Gamma Ray app runs fine.

                                                                Profile Gary Roberts
                                                                Forum moderator
                                                                Send message
                                                                Joined: Feb 9 05
                                                                Posts: 3022
                                                                Credit: 1,095,818,417
                                                                RAC: 2,357,496
                                                                Message 121537 - Posted 22 Dec 2012 9:46:01 UTC - in response to Message 121524.

                                                                  New FGRP2 tasks will run a bit longer (~ twice as long) now, and will have the FLOPs estimation reduced to 1/4. Flops estimation and Credit will be fine-tuned when we have more data (i.e. tasks returned), but possibly not this year anymore.

                                                                  BM

                                                                  Thanks very much for attending to this. I've added several hosts very recently and these have downloaded and completed tasks with the changed configs already. The estimated and actual times are much closer now so that is great to see.

                                                                  Once again, thanks for fixing this promptly.

                                                                  ____________
                                                                  Cheers,
                                                                  Gary.

                                                                  Steve Applin
                                                                  Send message
                                                                  Joined: Jul 19 10
                                                                  Posts: 14
                                                                  Credit: 20,185,964
                                                                  RAC: 0
                                                                  Message 121559 - Posted 24 Dec 2012 10:59:03 UTC - in response to Message 121537.

                                                                    I've noticed on two of my machines that there has been a substantial increase in run time (30% more on 4127571 and 5x more on 4127568) for Gravitational Wave S6 LineVeto search v1.13 (SSE2) tasks.

                                                                    An example of the massive increase in time is task 140002012 (http://einstein.phys.uwm.edu/workunit.php?wuid=140002012) on machine 4127568.

                                                                    Is the increase in time related to this issue, or do I have another problem?

                                                                    astro-marwil
                                                                    Send message
                                                                    Joined: May 28 05
                                                                    Posts: 277
                                                                    Credit: 23,595,634
                                                                    RAC: 33,219
                                                                    Message 121633 - Posted 27 Dec 2012 8:16:58 UTC - in response to Message 121559.

                                                                      Last modified: 27 Dec 2012 8:18:50 UTC

                                                                      Hello!
                                                                      From 125 tasks crunched on one computer within a bit more than 4 days, I get
                                                                      Mean Crunching Time: 2.71 +/- 0.67 [h]
                                                                      Mean Run Time: 2.96 +/- 0.79 [h]
                                                                      Mean Relative Crunching Overhead: 9.4 +/- 8.1 [%]
                                                                      The shortest Run Time was 1.4 [h], the longest 5.1 [h].
                                                                      The smallest Relative Overhead was 1.7 [%], the biggest one 55.1 [%].
                                                                      There is no correlation between relative overhead and crunching time.
                                                                      I also didn't find a correlation between long running times and my activities on this computer, such as writing this post, backups or virus scans.
                                                                      So there is a very high variety in the behaviour of the tasks.
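
For anyone wanting to repeat this kind of summary on their own task list, here is a small Python sketch. It assumes that "crunching time" means CPU time, "run time" means elapsed time, and relative overhead is the extra elapsed time as a percentage of CPU time; those definitions are a guess at what was measured. The function names are hypothetical and the sample values are made up.

```python
from math import sqrt
from statistics import mean, stdev


def relative_overhead(cpu_hours, run_hours):
    """Per-task overhead in percent: extra elapsed time relative to CPU time."""
    return [100.0 * (run - cpu) / cpu for cpu, run in zip(cpu_hours, run_hours)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample values, not the data from the post above.
cpu = [2.0, 2.5, 3.0, 3.5]
run = [2.2, 2.6, 3.3, 3.6]
overhead = relative_overhead(cpu, run)
print(f"CPU time {mean(cpu):.2f} +/- {stdev(cpu):.2f} h")
print(f"overhead {mean(overhead):.1f} +/- {stdev(overhead):.1f} %")
print(f"correlation between CPU time and overhead: {pearson(cpu, overhead):.2f}")
```

A correlation coefficient near zero would support the observation that overhead does not depend on crunching time.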

                                                                      Kind regards and happy crunching
                                                                      Martin
                                                                      ____________

                                                                      Profile Gary Roberts
                                                                      Forum moderator
                                                                      Send message
                                                                      Joined: Feb 9 05
                                                                      Posts: 3022
                                                                      Credit: 1,095,818,417
                                                                      RAC: 2,357,496
                                                                      Message 121654 - Posted 28 Dec 2012 12:39:58 UTC - in response to Message 121559.

                                                                        Is the increase in time related to this issue, or do I have another problem?

                                                                        No, the dramatic increase in actual run time shown by the task you referenced has nothing to do with any see-sawing of estimated run time to be expected when there is a wide variation in the accuracy of estimates of various science runs within the one project. The potential problem I was pointing to has now been averted (as explained by Bernd) by the actions taken to correct the estimates for the new FGRP2 run. The estimates still need further refinement but are certainly good enough so as not to cause violent swings in the DCF value. I've been watching things closely in several of my hosts and whilst there is still fluctuation in DCF, the swings are modest and shouldn't cause any real problems.

                                                                        You certainly have another issue and it's one that I've seen from time to time in some of my hosts. However there are no guarantees that the causes in my cases are necessarily the same as for your case.

                                                                        These days, I largely run Linux and I don't see the problem. A couple of years ago I was running a much greater proportion of WinXP hosts and I saw the problem (run times blowing out to 5x to 10x normal) quite regularly.

                                                                        My habit was (and still is with Linux) to run crunching hosts with no keyboard, mouse, or monitor attached. WinXP (perhaps depending somewhat on the hardware it was running on) didn't like this, and after anywhere from a few days to a week or two it would start delivering dramatically extended run times just like your example. The tasks would still validate but progress was woeful. I quickly found a workaround, which was to hook up a keyboard and mouse.

                                                                        This wasn't a complete solution. What it really did was to simply extend the period before the dramatic slowdown started. The complete workaround was to actually toggle some keys on the keyboard or move the mouse once in a while. With a keyboard and mouse attached, it usually took several weeks for a slowdown to occur and I found I could prevent this from ever occurring by toggling the numlock key or moving the mouse every week or so. I never see this problem on any machines with Linux. They run for months and months (just the box, power cable and network cable) with no sign of a slowdown.

                                                                        I don't know the exact cause of the slowdown but I'm guessing it was something to do with Windows consuming increasing amounts of CPU cycles trying to poll the detached hardware, or something like that. The problem resolved itself the instant I connected the devices and/or toggled the numlock key and/or moved the mouse. I wanted to change to Linux anyway so this was a pretty good excuse.

                                                                        Apart from the above dramatic slowdowns, I also see what is usually a much less significant slowdown that is heat related. I assume it is some sort of thermal throttling of one (or more) core(s) in a multi-core CPU that happen to be running a bit hotter than some internal limit is happy with. On a quad, for example, there is usually not much variation from what is expected for the 4 simultaneous tasks that are running if all cores are sufficiently cool. If the ambient is too elevated, or if the heat sink is starting to lose efficiency, or if the fan is starting to run dry, this often can be spotted by occasional tasks running slower than previously. It's not usually a huge slowdown like in your example, more like 10-50% slower than normal. It's a wakeup call to do some PM, after which the slowdown is usually resolved.

                                                                        I don't know what might have caused the slowdown you reported but hopefully the above may give you some things to consider.

                                                                        ____________
                                                                        Cheers,
                                                                        Gary.

                                                                        Steve Applin
                                                                        Send message
                                                                        Joined: Jul 19 10
                                                                        Posts: 14
                                                                        Credit: 20,185,964
                                                                        RAC: 0
                                                                        Message 121781 - Posted 3 Jan 2013 10:21:40 UTC - in response to Message 121654.

                                                                          @ astro-marwil @ Gary

                                                                          Thanks for your help with this, I appreciate your time.

                                                                          The huge slowdown happened on my work laptop and I suspect it's seen better days. I suspect it will soon be time to uninstall BOINC and return it to IT for a new one. The lesser of the two slowdowns happened on my home computer.

                                                                          It struck me as a bit strange because the issue hit two computers at the same time, and the change was instant rather than a gradual buildup in completion times, yet a third laptop I'm running BOINC on was not affected.

                                                                          When I get a bit more time, "later", I'll try some PM on my home PC.

                                                                          Thanks again for your time.

                                                                          Steve

                                                                          Miklos M.
                                                                          Send message
                                                                          Joined: Apr 3 05
                                                                          Posts: 16
                                                                          Credit: 13,529,849
                                                                          RAC: 253
                                                                          Message 121902 - Posted 7 Jan 2013 18:55:16 UTC

                                                                            I wonder why I receive 70 credits for a Gamma wu on one computer and on the other one I get 377?

                                                                            Profile Bernd Machenschalk
                                                                            Forum moderator
                                                                            Project administrator
                                                                            Project developer
                                                                            Avatar
                                                                            Send message
                                                                            Joined: Oct 15 04
                                                                            Posts: 3267
                                                                            Credit: 90,776,418
                                                                            RAC: 10,264
                                                                            Message 121904 - Posted 7 Jan 2013 19:05:22 UTC - in response to Message 121902.

                                                                              See this post.

                                                                              BM

                                                                              Miklos M.
                                                                              Send message
                                                                              Joined: Apr 3 05
                                                                              Posts: 16
                                                                              Credit: 13,529,849
                                                                              RAC: 253
                                                                              Message 121934 - Posted 8 Jan 2013 15:07:14 UTC - in response to Message 121904.

                                                                                Sorry, can you be more specific? The credits seem to vary as of late.

                                                                                Profile Bernd Machenschalk
                                                                                Forum moderator
                                                                                Project administrator
                                                                                Project developer
                                                                                Avatar
                                                                                Send message
                                                                                Joined: Oct 15 04
                                                                                Posts: 3267
                                                                                Credit: 90,776,418
                                                                                RAC: 10,264
                                                                                Message 121936 - Posted 8 Jan 2013 15:26:34 UTC - in response to Message 121934.

                                                                                  The credit that will be granted is assigned to the workunit (WU) when it is generated. Tasks of FGRP2 WUs generated before Jan 4 will be granted the old FGRP1 value of 377 credits, tasks of FGRP2 WUs generated after Jan 4 will be granted 70 (as announced here).
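
Expressed as code, the rule looks roughly like this. The sketch is illustrative only, not the project's server code: the function name is hypothetical, the year 2013 is inferred from the dates in this thread, and the treatment of WUs generated on Jan 4 itself is an arbitrary choice because the post does not specify it.

```python
from datetime import date

# "Jan 4" in the post; the year is inferred from the thread dates.
CUTOVER = date(2013, 1, 4)


def granted_credit(wu_generated: date) -> int:
    """Credit is fixed when the WU is generated, not when the task is returned."""
    # WUs generated on the cutover day are treated as "after" here (an assumption).
    return 377 if wu_generated < CUTOVER else 70


print(granted_credit(date(2012, 12, 20)))  # WU generated before the cutover
print(granted_credit(date(2013, 1, 7)))    # WU generated after the cutover
```

This is why two hosts can be granted different credit for the same kind of task in the same week: what matters is when each workunit was generated, not when the task was crunched.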

                                                                                  BM

                                                                                  Miklos M.
                                                                                  Send message
                                                                                  Joined: Apr 3 05
                                                                                  Posts: 16
                                                                                  Credit: 13,529,849
                                                                                  RAC: 253
                                                                                  Message 121938 - Posted 8 Jan 2013 16:23:29 UTC - in response to Message 121936.

                                                                                    Thank you, now it is clear.


                                                                                    This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

                                                                                    Copyright © 2014 Bruce Allen