BRP4cuda32 vs. BRP4cuda32nv301 performance



Message boards : Cruncher's Corner : BRP4cuda32 vs. BRP4cuda32nv301 performance

Bjarke
Joined: Feb 10 06
Posts: 5
Credit: 303,755
RAC: 199
Message 118914 - Posted 31 Aug 2012 8:32:14 UTC

    Over the past few weeks I have been using the 276.52 driver (released 2012.02.09), which lets me run the BRP4cuda32 application. With this driver and application, my Nvidia Quadro FX1800 took around 6200 seconds to complete a WU and earn the 500 credits.

    After updating to the most recent 305.93 driver (released 2012.08.28) I am able to run the BRP4cuda32nv301 application. This seems to put more load on the GPU, since my system becomes less responsive, so I would expect the runtimes to be shorter than with the old driver. However, a WU now takes around 19000 seconds to complete - and I still get 500 credits.

    My questions are:
    - Is my GPU performing better with the old driver and the BRP4cuda32 application?
    - Or am I doing better with the new driver (e.g. crunching more numbers), and the credits are just not adjusted properly?

    The "credit/time" might not be a very accurate measurement of performance.
    ____________

    Richard Haselgrove
    Joined: Dec 10 05
    Posts: 1341
    Credit: 29,867,497
    RAC: 11,853
    Message 118916 - Posted 31 Aug 2012 9:22:05 UTC - in response to Message 118914.

      The actual applications used for BRP4cuda32 and BRP4cuda32nv301 are identical:

      Comparing files einsteinbinary_BRP4_1.25_windows_intelx86__BRP4cuda32nv301.exe and EINSTEINBINARY_BRP4_1.25_WINDOWS_INTELX86__BRP4CUDA32.EXE
      FC: no differences encountered
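The FC check above can be reproduced on other platforms. The following is a minimal Python sketch of a byte-by-byte comparison plus a hash cross-check; the helper names `files_identical` and `sha256_of` are made up for illustration, and the demo runs on throwaway stand-in files rather than the real BRP4 executables.

```python
import filecmp
import hashlib
import tempfile
from pathlib import Path


def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting file metadata.
    return filecmp.cmp(path_a, path_b, shallow=False)


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on throwaway files (hypothetical stand-ins, NOT the real BRP4 executables).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "app_cuda32.exe"
    second = Path(tmp) / "app_cuda32nv301.exe"
    third = Path(tmp) / "app_other.exe"
    first.write_bytes(b"same payload")
    second.write_bytes(b"same payload")
    third.write_bytes(b"different payload")

    same_pair = files_identical(first, second)
    changed_pair = files_identical(first, third)
    hashes_match = sha256_of(first) == sha256_of(second)

print(same_pair, changed_pair, hashes_match)
```

To repeat the check for real, point `files_identical` at the two downloaded executables in the BOINC project directory.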

      Reading the - huge and complicated - "Comparison of Nvidia graphics processing units" table at Wikipedia, it looks as if your FX1800 has a variant of the G94 GPU chip, which makes it comparable to a 9600 GS or GT with Compute Capability 1.1.

      That's quite an old technology. It's highly unlikely that NVidia is targeting driver improvements on those old chips: new features will be designed for the Fermi (GTX 4xx and 5xx) and Kepler (GTX 6xx) ranges. It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.

      I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.

      Bjarke
      Joined: Feb 10 06
      Posts: 5
      Credit: 303,755
      RAC: 199
      Message 118918 - Posted 31 Aug 2012 10:48:28 UTC - in response to Message 118916.

        It's even possible that some new features, designed to make computing more robust and reliable, may reduce the raw speed of these older cards.


        Interesting view. I hadn't thought of that, but it does make sense.

        I suspect that any change in credit per hour is more likely to come from the driver change, and perhaps a slight variability in task runtime, than anything else.


        However, since my compute-time with the new 305.93 driver is now 3 times as long as with the old 276.52 driver - and I still get the same credit - I might consider downgrading to the old 276.52 driver.

        It would be nice if someone could clarify whether credit truly is an accurate measure of the amount of work done. Put another way:
        Am I doing the same amount of work in 6200 seconds using the 276.52 driver as I am doing in 19000 seconds using the 305.93 driver?
        ____________

        Richard Haselgrove
        Joined: Dec 10 05
        Posts: 1341
        Credit: 29,867,497
        RAC: 11,853
        Message 118919 - Posted 31 Aug 2012 11:17:17 UTC - in response to Message 118918.

          Last modified: 31 Aug 2012 11:18:47 UTC

          The tasks and the application are the same. The credits are fixed at 500 per task by the project.
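Because the tasks are identical and the credit is fixed, credit per hour is a fair proxy for throughput in this comparison. Here is a rough sketch of the arithmetic in Python, using the approximate runtimes reported earlier in the thread; the variable and function names are only illustrative.

```python
CREDIT_PER_TASK = 500  # fixed per task by the project, as stated above

# Approximate runtimes reported in this thread for the Quadro FX1800.
runtime_seconds = {
    "276.52": 6200,
    "305.93": 19000,
}


def credit_per_hour(seconds_per_task, credit=CREDIT_PER_TASK):
    """Convert a per-task runtime into credit earned per hour."""
    return credit * 3600 / seconds_per_task


rates = {driver: credit_per_hour(secs) for driver, secs in runtime_seconds.items()}
slowdown = runtime_seconds["305.93"] / runtime_seconds["276.52"]

for driver, rate in rates.items():
    print(f"driver {driver}: {rate:.0f} credit/hour")
print(f"slowdown factor: {slowdown:.2f}x")
```

Same work per task means the slowdown factor is also the drop in useful throughput, so the older driver really is doing about three times as much science per hour on this card.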

          The 3xx series drivers are the first to preview cuda5: we're running cuda3(.2) here. That may make a difference - it does sound as if for that card, and this project, the older drivers are more suitable.

          (304 and higher drivers for most other NVidia cards are still in Beta)

          Bjarke
          Joined: Feb 10 06
          Posts: 5
          Credit: 303,755
          RAC: 199
          Message 118920 - Posted 31 Aug 2012 11:44:29 UTC - in response to Message 118919.

            Thanks. I will definitely downgrade to the 276.52 driver then.
            ____________

            Bikeman (Heinz-Bernd Eggenstein)
            Forum moderator
            Project administrator
            Project developer
            Joined: Aug 28 06
            Posts: 3229
            Credit: 75,280,288
            RAC: 57,237
            Message 118926 - Posted 31 Aug 2012 15:23:18 UTC

              Last modified: 31 Aug 2012 15:24:59 UTC

              Hi!

              Thanks for reporting this. As Richard has already mentioned, the two apps in question are, in fact, identical, byte-by-byte.

              We want to support a wide range of users, even those with older drivers and hardware. However, if the penalty for using an older runtime and libraries gets too severe with newer drivers, we will be forced to provide additional versions targeting newer CUDA versions soon. Note that with all the variations in operating system (Win, Linux, OSX), word length (64 bit, 32 bit) and CPU/GPU type (SSE, NVIDIA, ATI/AMD), we now have a dozen or so different executables for the BRP4 search alone.
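To give a feel for how quickly the variants multiply, here is a small Python sketch that enumerates every combination of the three axes named above. It is only an upper bound under the assumption that every combination could exist; the actual list of builds is not given in this thread, and the variable names are illustrative.

```python
from itertools import product

# Axes exactly as listed in the post above.
operating_systems = ["Win", "Linux", "OSX"]
word_lengths = ["64 bit", "32 bit"]
processor_types = ["SSE", "NVIDIA", "ATI/AMD"]

# Every theoretical combination; not all of them are necessarily built.
variants = list(product(operating_systems, word_lengths, processor_types))

for os_name, bits, proc in variants:
    print(f"{os_name:6} {bits:7} {proc}")
print(f"theoretical maximum: {len(variants)} executables")
```

Adding one more axis value, such as a second CUDA version for the NVIDIA builds, adds a further executable for every OS and word-length combination it applies to.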

              Oliver has done some marvelous work on automating our continuous integration environment, so we are in principle prepared to support even more variants if really needed.

              Stay tuned.

              HB
              ____________

              Claggy
              Joined: Dec 29 06
              Posts: 429
              Credit: 965,681
              RAC: 659
              Message 118933 - Posted 31 Aug 2012 19:17:10 UTC - in response to Message 118926.

                Last modified: 31 Aug 2012 19:18:15 UTC

                My testing of the Lunatics x41z Cuda apps for Setiathome on legacy hardware also showed a slowdown on Cuda 5 preview drivers (OpenCL Astropulse times also increased):

                x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 v6 Bench on the 9800GTX+ (Win Vista x64, 301.42 (Cuda42 drivers)):

                WU : PG1327.wu
                Lunatics_x41z_win32_cuda22.exe -verb -nog :
                Elapsed 33.418 secs, speedup: 87.71% ratio: 8.14x
                CPU 8.315 secs, speedup: 96.95% ratio: 32.81x
                Lunatics_x41z_win32_cuda23.exe -verb -nog :
                Elapsed 33.467 secs, speedup: 87.69% ratio: 8.12x
                CPU 8.486 secs, speedup: 96.89% ratio: 32.15x
                Lunatics_x41z_win32_cuda32.exe -verb -nog :
                Elapsed 34.870 secs, speedup: 87.17% ratio: 7.80x
                CPU 8.908 secs, speedup: 96.74% ratio: 30.63x
                Lunatics_x41z_win32_cuda41.exe -verb -nog :
                Elapsed 34.253 secs, speedup: 87.40% ratio: 7.94x
                CPU 8.268 secs, speedup: 96.97% ratio: 33.00x
                Lunatics_x41z_win32_cuda42.exe -verb -nog :
                Elapsed 34.351 secs, speedup: 87.37% ratio: 7.91x
                CPU 8.658 secs, speedup: 96.83% ratio: 31.51x

                v6 MB Bench of x41z Cuda22, Cuda23, Cuda32, Cuda41 and Cuda42 on the 9800GTX+ (Win Vista x64, 304.48 Cuda 5 preview drivers):

                WU : PG1327.wu
                Lunatics_x41z_win32_cuda22.exe -verb -nog :
                Elapsed 39.856 secs, speedup: 85.34% ratio: 6.82x
                CPU 8.798 secs, speedup: 96.78% ratio: 31.01x
                Lunatics_x41z_win32_cuda23.exe -verb -nog :
                Elapsed 39.870 secs, speedup: 85.34% ratio: 6.82x
                CPU 8.564 secs, speedup: 96.86% ratio: 31.86x
                Lunatics_x41z_win32_cuda32.exe -verb -nog :
                Elapsed 50.052 secs, speedup: 81.59% ratio: 5.43x
                CPU 9.485 secs, speedup: 96.52% ratio: 28.77x
                Lunatics_x41z_win32_cuda41.exe -verb -nog :
                Elapsed 47.394 secs, speedup: 82.57% ratio: 5.74x
                CPU 8.034 secs, speedup: 97.06% ratio: 33.96x
                Lunatics_x41z_win32_cuda42.exe -verb -nog :
                Elapsed 47.363 secs, speedup: 82.58% ratio: 5.74x
                CPU 8.190 secs, speedup: 97.00% ratio: 33.31x
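For anyone who wants the two runs side by side, the following Python sketch computes the per-build increase in elapsed time between the 301.42 and 304.48 drivers. The elapsed times are copied from the bench output above; the dictionary and variable names are illustrative.

```python
# Elapsed seconds per build, copied from the two bench runs above.
elapsed_301_42 = {
    "cuda22": 33.418, "cuda23": 33.467, "cuda32": 34.870,
    "cuda41": 34.253, "cuda42": 34.351,
}
elapsed_304_48 = {
    "cuda22": 39.856, "cuda23": 39.870, "cuda32": 50.052,
    "cuda41": 47.394, "cuda42": 47.363,
}

# Percentage increase in elapsed time when moving to the Cuda 5 preview driver.
slowdown_pct = {
    build: (elapsed_304_48[build] / elapsed_301_42[build] - 1) * 100
    for build in elapsed_301_42
}
worst = max(slowdown_pct, key=slowdown_pct.get)

for build, pct in slowdown_pct.items():
    print(f"{build}: +{pct:.1f}% elapsed time")
print(f"largest slowdown: {worst}")
```

Every build gets slower on the newer driver, with the Cuda 3.2 and later builds hit harder than Cuda22 and Cuda23.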

                Claggy

                Bjarke
                Joined: Feb 10 06
                Posts: 5
                Credit: 303,755
                RAC: 199
                Message 119012 - Posted 4 Sep 2012 9:22:44 UTC

                  Last modified: 4 Sep 2012 9:25:26 UTC

                  I have now downgraded from the 305.93 driver (released 2012.08.28) to 267.66 (released 2011.03.21) which has only CUDA 3.2 as required by this project. I have summarised the results below.



                  Clearly using the lowest required CUDA version seems to speed up computations significantly.
                  ____________

                  Bikeman (Heinz-Bernd Eggenstein)
                  Forum moderator
                  Project administrator
                  Project developer
                  Joined: Aug 28 06
                  Posts: 3229
                  Credit: 75,280,288
                  RAC: 57,237
                  Message 119019 - Posted 4 Sep 2012 13:10:33 UTC - in response to Message 119012.

                    Last modified: 4 Sep 2012 13:12:11 UTC

                    I have now downgraded from the 305.93 driver (released 2012.08.28) to 267.66 (released 2011.03.21) which has only CUDA 3.2 as required by this project. I have summarised the results below.



                    Clearly using the lowest required CUDA version seems to speed up computations significantly.




                    Note, however, that the tasks were computed using different versions of BRP4. That should not explain the huge slowdown for the 305.93 driver, but I'm not sure that the 267.66 driver is really faster than the 276.52 driver. Because newer drivers also mean bug fixes (ok, sometimes new bugs, too ... :-) ), I would be reluctant to go back too far in time.

                    Cheers
                    HB
                    ____________

                    Bjarke
                    Joined: Feb 10 06
                    Posts: 5
                    Credit: 303,755
                    RAC: 199
                    Message 119021 - Posted 4 Sep 2012 14:37:44 UTC - in response to Message 119019.


                      Note, however, that the tasks were computed using different versions of BRP4. That should not explain the huge slowdown for the 305.93 driver, but I'm not sure that the 267.66 driver is really faster than the 276.52 driver. Because newer drivers also mean bug fixes (ok, sometimes new bugs, too ... :-) ), I would be reluctant to go back too far in time.


                      You're right. But that does make it hard to decide which driver to use. Not too old and flawed, not too new and slow...
                      ____________


                      This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

                      Copyright © 2014 Bruce Allen