Getting "latest" einstein@home apps source code, and why no CUDA for Gamma Wave Pulsar or S6 searches?


John Jamulla
Joined: Feb 26 05
Posts: 20
Credit: 80,031,130
RAC: 91,936
Message 114367 - Posted 5 Oct 2011 12:30:45 UTC

    I recently downloaded the software for einstein@home (following the directions at http://einstein.phys.uwm.edu/license.php), and with a ton of "foolin around" have gotten it to compile under Linux. But it looks to me like this is not the latest software, nor all of the apps, nor any of the CUDA apps. Maybe I am mistaken.

    I would like to see if I could contribute to this endeavor by modifying code for the apps, hopefully to increase the use of GPUs using CUDA for the apps that aren't currently using it (such as gamma ray search, S6 search, etc.).

    How can I go about getting the latest software, and whom might I talk to about it if I have trouble?
    I can at least try to make modifications myself, if I could get the real and latest complete set of software.

    FYI - Currently I have approx. 7.2M credit and an approx. rank of 325, and I am excited! Looking to increase that. I could easily add 2 more graphics cards, but I don't want to bother since I don't seem to be getting enough CUDA work, as most is still CPU work.

    Sincerely,
    John J.
    ____________

    Bernd Machenschalk
    Forum moderator
    Project administrator
    Project developer
    Joined: Oct 15 04
    Posts: 3208
    Credit: 88,367,073
    RAC: 28,945
    Message 114368 - Posted 5 Oct 2011 15:26:53 UTC - in response to Message 114367.

      There is a CUDA version of "HierarchicalSearch", the App that we were using until "S5R6". You might be able to build it with the build script "eah_build.sh" with --cuda. If you want to build it manually, you can configure LAL & LALApps with --with-cuda before building. You will find the kernel, wrapper and everything in lalapps/src/pulsar/FDS_isolated/OptimizedCFS. There is also an OpenCL version of that code.

      The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

      To use this in the current HierarchSearchGCT App, you would need to restructure the main loops to basically use XLALComputeFStatFreqBandVector instead of ComputeFStatFreqBand.
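As a rough picture of the loop restructuring described above, here is a minimal sketch. It assumes (from its name only) that the vector variant handles a whole batch of inputs in one call; `compute_band` and `compute_band_vector` are hypothetical stand-ins and do not reproduce the real LAL signatures or arithmetic. Python is used purely for illustration.

```python
def compute_band(segment, freqs):
    """Hypothetical per-segment call: returns one value per frequency bin."""
    energy = sum(x * x for x in segment)  # placeholder arithmetic, not the F-statistic
    return [energy * f for f in freqs]


def compute_band_vector(segments, freqs):
    """Hypothetical batched call: all segments are handed over at once."""
    energies = [sum(x * x for x in seg) for seg in segments]
    return [[e * f for f in freqs] for e in energies]


def main_loop_per_segment(segments, freqs):
    # Original structure: the main loop makes one call per segment.
    return [compute_band(seg, freqs) for seg in segments]


def main_loop_batched(segments, freqs):
    # Restructured: the loop over segments moves inside a single call.
    return compute_band_vector(segments, freqs)


if __name__ == "__main__":
    segments = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
    freqs = [10.0, 20.0]
    same = main_loop_per_segment(segments, freqs) == main_loop_batched(segments, freqs)
    print("results identical:", same)
```

The point of the batched form is that the callee sees all the work at once, which is typically what an accelerator implementation needs in order to amortize launch and transfer overhead.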

      The other way to make a CUDA/OpenCL version of that App is to validate the "resampling Fstat" (which is currently being done internally, but will take a while), then implement the actual resampling (currently based on GSL splines) in CUDA, switch to using cuFFT, and possibly also implement the "global correlation transform" to run on the GPU. This is the way we (the LSC CW group at AEI) are currently heading.
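To make the resample-then-FFT idea concrete, here is a toy sketch. It assumes the general scheme is to interpolate the data onto a time grid in which the signal has constant frequency and then Fourier transform it. This is not LALApps code: linear interpolation stands in for the GSL splines, a naive DFT stands in for an FFT library such as cuFFT, and the sinusoidal time warp and all function names are made up for illustration.

```python
import cmath
import math
from bisect import bisect_right


def resample_linear(times, values, new_times):
    """Linearly interpolate (times, values) onto new_times (stand-in for a spline)."""
    out = []
    for t in new_times:
        i = bisect_right(times, t) - 1
        i = max(0, min(i, len(times) - 2))  # clamp; end points are extrapolated
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        out.append((1 - w) * values[i] + w * values[i + 1])
    return out


def dft_power(samples):
    """Naive O(n^2) DFT power spectrum over the non-negative frequency bins."""
    n = len(samples)
    power = []
    for k in range(n // 2):
        acc = sum(samples[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        power.append(abs(acc) ** 2)
    return power


def peak_bin(power):
    """Index of the strongest frequency bin."""
    return max(range(len(power)), key=power.__getitem__)


def demo(n=128, f0=10, a=0.05):
    """Return (peak bin of raw data, peak bin after resampling) for a toy signal."""
    t = [j / n for j in range(n)]                            # uniform detector times
    tau = [tj + a * math.sin(2 * math.pi * tj) for tj in t]  # toy warped source times
    x = [math.sin(2 * math.pi * f0 * tk) for tk in tau]      # recorded samples
    raw_peak = peak_bin(dft_power(x))
    resampled = resample_linear(tau, x, t)                   # uniform grid in source time
    return raw_peak, peak_bin(dft_power(resampled))


if __name__ == "__main__":
    raw, fixed = demo()
    print("raw peak bin:", raw, "resampled peak bin:", fixed)
```

In the raw data the toy modulation smears the power into sidebands, while after resampling the strongest bin sits at the signal frequency `f0`. Both the interpolation and the transform are data-parallel steps, which is why they are natural candidates for a GPU.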

      BM

      Bernd Machenschalk
      Forum moderator
      Project administrator
      Project developer
      Joined: Oct 15 04
      Posts: 3208
      Credit: 88,367,073
      RAC: 28,945
      Message 114369 - Posted 5 Oct 2011 15:46:08 UTC

        Addendum: The FGRP source code is not public. I am in communication with the main authors, but I don't think this will change in the foreseeable future.

        BM

        Akos Fekete
        Volunteer developer
        Joined: Nov 13 05
        Posts: 562
        Credit: 4,404,768
        RAC: 0
        Message 114371 - Posted 5 Oct 2011 20:17:15 UTC - in response to Message 114368.

          The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

          An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
          AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.
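The "double" figure presumably comes from register width: SSE2 operates on 128-bit registers and AVX on 256-bit registers, so one AVX instruction can process twice as many single-precision values. The sketch below just does that arithmetic; the function names are made up for illustration, and the result is an idealized upper bound that ignores memory bandwidth and instruction mix.

```python
def lanes(register_bits, element_bits=32):
    """Number of elements that fit in one SIMD register."""
    return register_bits // element_bits


def vector_ops(n_elements, register_bits, element_bits=32):
    """Vector instructions needed to touch n_elements once (ceiling division)."""
    per_op = lanes(register_bits, element_bits)
    return -(-n_elements // per_op)


if __name__ == "__main__":
    n = 1024  # arbitrary example size
    print("SSE2 (128-bit):", lanes(128), "floats per op,", vector_ops(n, 128), "ops")
    print("AVX  (256-bit):", lanes(256), "floats per op,", vector_ops(n, 256), "ops")
```

Halving the instruction count only translates into a 2x speedup if the loop is compute-bound and the data can be fed to the wider units fast enough.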
          ____________

          John Jamulla
          Posts: 20
          Credit: 80,031,130
          RAC: 91,936
          Message 114375 - Posted 5 Oct 2011 23:43:03 UTC - in response to Message 114371.

            AVX sounds interesting, since I have a Sandy Bridge CPU... 2600K.

            Also, I'm not sure, but I know there's SSE up to 4.2 now (which might include the AVX), but the current apps seem to use SSE2 only.
            ____________

            DanNeely
            Joined: Sep 4 05
            Posts: 1071
            Credit: 56,501,129
            RAC: 92,914
            Message 114382 - Posted 6 Oct 2011 10:40:55 UTC - in response to Message 114375.

              AVX sounds interesting, since I have a Sandy Bridge CPU... 2600K.

              Also, I'm not sure, but I know there's SSE up to 4.2 now (which might include the AVX), but the current apps seem to use SSE2 only.


              I know SSE3 didn't offer anything that led to faster computation rates.
              ____________

              Donald A. Tevault
              Joined: Feb 17 06
              Posts: 415
              Credit: 62,667,177
              RAC: 1,908
              Message 114384 - Posted 6 Oct 2011 13:46:06 UTC - in response to Message 114371.

                The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

                An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
                AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.



                This sounds interesting. Are you saying that a Bulldozer would outperform a Sandy Bridge on Einstein at Home applications?

                Akos Fekete
                Volunteer developer
                Joined: Nov 13 05
                Posts: 562
                Credit: 4,404,768
                RAC: 0
                Message 114391 - Posted 6 Oct 2011 19:24:26 UTC - in response to Message 114384.

                  An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
                  AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.

                  This sounds interesting. Are you saying that a Bulldozer would outperform a Sandy Bridge on Einstein at Home applications?

                  I don't think so. Sandy Bridge is very powerful...
                  ____________


                  This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

                  Copyright © 2014 Bruce Allen