CUDA App einsteinbinary 1.10 for Linux available for Beta Test


Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2034
ID: 2
Credit: 21,982,968
RAC: 41,653
Message 98831 - Posted 19 Aug 2009 15:38:22 UTC
Last modified: 19 Aug 2009 15:38:49 UTC

A new Einstein@home CUDA App for Linux is available for Beta Test at Beta Test Page.

We stumbled over some bugs in the CUDA part that might have caused some segfaults, so this is mainly a bugfix release. In addition, the CPU part of the App now uses SSE, as in the .09 Beta Apps.

Please test and report, and please include important information (like the NVIDIA driver and Core Client version) in your posts.

BM

Andris Pavenis
Joined: Feb 24 05
Posts: 3
ID: 36157
Credit: 798,698
RAC: 1,073
Message 98874 - Posted 20 Aug 2009 20:13:18 UTC

Tried
- Fedora 11 x86_64
- 'rpm -qa kmod-nvidia' returns kmod-nvidia-185.18.14-1.fc11.3.x86_64

Crashes (see below). Shows 100% but does not stop. Had to abort workunit.

Also:
- Seti@HOME Beta (CUDA) crashes similarly but workunit finishes with failure
- GPUGRID - works OK

[22:53:40][14614][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:53:40][14614][INFO ] Starting data processing...
[22:53:40][14614][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:53:40][14614][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:53:40][14614][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:53:41][14614][INFO ] Seed for random number generator is 977043268.

[22:53:43][14614][ERROR] Application caught signal 11.

------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10cf4d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x1093ad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdd595b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xde8164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xdcde03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xdc7df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5128c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5132c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x4f333b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
[22:54:18][14706][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:54:18][14706][INFO ] Starting data processing...
[22:54:18][14706][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:54:18][14706][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:54:18][14706][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:54:19][14706][INFO ] Seed for random number generator is 977043268.

[22:54:20][14706][ERROR] Application caught signal 11.

------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e84d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x10acad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdee95b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xe01164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xde6e03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xde0df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25e8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25f2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x23f33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized

____________

SciTechGrid
Joined: Jan 18 05
Posts: 2
ID: 2698
Credit: 167,541
RAC: 1
Message 98934 - Posted 22 Aug 2009 20:04:52 UTC

The WU starts running but jumps straight to 100%. According to stderrout.txt there is a "File format not recognized" error.

[14:53:33][5101][INFO ] Application startup - thank you for supporting Einstein@Home!
[14:53:33][5101][INFO ] Starting data processing...
[14:53:33][5101][INFO ] Using CUDA device #0 "GeForce GTX 275" (1010.88 GFLOPS)
[14:53:33][5101][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[14:53:33][5101][INFO ] Header contents:
------> Original WAPP file: p2030_54162_45910_0042_G41.76+01.37.C_0.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54162.531365740739
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 190240.781394
------> DEC (J2000): 82856.412406
------> Galactic l: 41.758
------> Galactic b: 1.3809
------> Name: G41.76+01.37.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 357.8171
------> ZA at start: 9.8367
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 70.2 cm^-3 pc
------> Scale factor: 6877.66
[14:53:35][5101][INFO ] Seed for random number generator is -1164413432.

[14:53:37][5101][ERROR] Application caught signal 11.

------> Obtained 18 stack frames for this thread.
------> Backtrace:
Frame 18:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 17:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 16:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 15:
Binary file: /usr/lib32/libcuda.so (0xf7816791)
Frame 14:
Binary file: /usr/lib32/libcuda.so (0xf77e07ae)
Frame 13:
Binary file: /usr/lib32/libcuda.so (0xf77841f3)
Frame 12:
Binary file: /usr/lib32/libcuda.so (0xf7798794)
Frame 11:
Binary file: /usr/lib32/libcuda.so (0xf777a675)
Frame 10:
Binary file: /usr/lib32/libcuda.so (0xf7773992)
Frame 9:
Binary file: /usr/lib32/libcuda.so (0xf77d65bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3c8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3d2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e1d33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib32/libc.so.6 (0xf7b71775)
Offset info: __libc_start_main+0xe5
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
____________

Andris Pavenis
Joined: Feb 24 05
Posts: 3
ID: 36157
Credit: 798,698
RAC: 1,073
Message 98943 - Posted 23 Aug 2009 6:00:28 UTC

The message about file format not recognized is only after the real problem:

[14:53:37][5101][ERROR] Application caught signal 11.

So the application crashes, and the message appears while it is printing the stack backtrace. For comparison: do you have 32-bit or 64-bit Linux? I have 64-bit and I'm getting similar crashes, as seen in the earlier message in this thread.

Andris
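
To illustrate the point above, here is a minimal sketch that picks the first ERROR entry out of a stderr log, so the later "File format not recognized" lines from the backtrace are not mistaken for the cause. The line format is taken from the logs posted in this thread; the function name `first_error` is only illustrative and not part of the application.

```python
import re

# Matches lines such as "[14:53:37][5101][ERROR] Application caught signal 11."
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")


def first_error(log_text):
    """Return (time, pid, message) of the first ERROR entry, or None."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(3) == "ERROR":
            return match.group(1), int(match.group(2)), match.group(4)
    return None


# Sample lines copied from the log posted earlier in this thread.
sample = """\
[14:53:35][5101][INFO ] Seed for random number generator is -1164413432.
[14:53:37][5101][ERROR] Application caught signal 11.
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
"""

print(first_error(sample))
```

Lines that do not carry the `[time][pid][level]` prefix, such as the backtrace frames, are simply skipped.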



____________

SciTechGrid
Joined: Jan 18 05
Posts: 2
ID: 2698
Credit: 167,541
RAC: 1
Message 98968 - Posted 24 Aug 2009 10:41:30 UTC

I'm running 64bit Ubuntu

____________

Profile Olaf
Joined: Sep 16 06
Posts: 9
ID: 215241
Credit: 3,586,495
RAC: 17,829
Message 98984 - Posted 24 Aug 2009 17:18:34 UTC - in response to Message 98831.

It works without problems on my notebook (Intel Core2 Duo P7350 +
GeForce 9650M GT, driver 180.22-2).
It is about 20-30% faster for one job than with the CPU only.
Sometimes it runs three jobs at once, two S5R5 and one ABP1 on CPU+GPU.
Note, that there are only two CPUs and one GPU. In such a case one of
the CPUs seems to run one S5R5 and the ABP1 together with the GPU at the
same time - is this intended?

jstarek
Joined: Jan 21 08
Posts: 3
ID: 306144
Credit: 57,777
RAC: 185
Message 99062 - Posted 27 Aug 2009 18:42:54 UTC

Unfortunately, I get no work for the new application. BOINC reports:

Message from Server: (Project has no jobs available)


What seems strange is, while I get the message "Found app_info.xml; using anonymous platform" as described in the installation instructions, several lines further down I get "Can't load libcudart".

I am sure that the CUDA libs are in the search path: After re-running ldconfig, everything is visible to the binary:

bash-4.0# ldd einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda
linux-gate.so.1 => (0xb7fc5000)
libcufft.so.2 => /usr/local/lib/cuda/libcufft.so.2 (0xb7e89000)
libcudart.so.2 => /usr/local/lib/cuda/libcudart.so.2 (0xb7e3e000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7e25000)
libm.so.6 => /lib/libm.so.6 (0xb7dff000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7d0d000)
libc.so.6 => /lib/libc.so.6 (0xb7bc7000)
/lib/ld-linux.so.2 (0xb7fc6000)
libdl.so.2 => /lib/libdl.so.2 (0xb7bc2000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb7ba4000)
librt.so.1 => /lib/librt.so.1 (0xb7b9b000)


I have installed the CUDA application as described on the website, but moved the libs manually to /usr/local/lib/cuda/. The system is a current Arch Linux with NVidia drivers version 185.18.31-1, BOINC version 6.4.5 and the CUDA libs that are delivered with the Einstein Beta download. Hardware: Pentium M 2,1 GHz (i686, not a 64 bit architecture) and a simple GeForce 8400 GS.

Any ideas what's going on here?
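
One possible cross-check, as a rough sketch only: `ldd` shows how the loader resolves the binary's dependencies in the current shell, which may differ from the environment a daemon runs in (that difference is an assumption here, not something confirmed in this thread). The snippet below asks the dynamic loader to open a library by name from whatever environment runs it. The helper name `can_load` is hypothetical. Note that the Python interpreter must have the same bitness as the library, so a 64-bit Python will report a 32-bit library as not loadable.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing, not on the search path,
        # or built for a different architecture than this interpreter.
        return False


# Library names taken from the ldd output above.
for name in ("libcudart.so.2", "libcufft.so.2"):
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

Running it as the same user and with the same environment as the BOINC client gives the most meaningful answer.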

Profile Gundolf Jahn
Joined: Mar 1 05
Posts: 364
ID: 43449
Credit: 156,767
RAC: 182
Message 99063 - Posted 27 Aug 2009 19:41:48 UTC - in response to Message 99062.

Unfortunately, I get no work for the new application. BOINC reports:

Message from Server: (Project has no jobs available)

I recently get that at every (successful) download:
27/08/2009 13:09:06|Einstein@Home|Scheduler request succeeded: got 1 new tasks
27/08/2009 13:09:06|Einstein@Home|[sched_ops_debug] Server version 607
27/08/2009 13:09:06|Einstein@Home|Message from server: (Project has no jobs available)

So, you have to read the log carefully ;-)

Gruß,
Gundolf
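
As a small sketch of what "reading the log carefully" can mean: count the tasks reported by the "Scheduler request succeeded" lines instead of relying on the "no jobs available" message. The line format is taken from the three log lines above; the function name `tasks_received` is only illustrative.

```python
import re

# Matches the client message that reports how many tasks were received.
GOT_TASKS = re.compile(r"Scheduler request succeeded: got (\d+) new tasks")


def tasks_received(log_text):
    """Sum the task counts from all 'Scheduler request succeeded' lines."""
    total = 0
    for line in log_text.splitlines():
        match = GOT_TASKS.search(line)
        if match:
            total += int(match.group(1))
    return total


# Sample lines copied from the log above.
sample = """\
27/08/2009 13:09:06|Einstein@Home|Scheduler request succeeded: got 1 new tasks
27/08/2009 13:09:06|Einstein@Home|[sched_ops_debug] Server version 607
27/08/2009 13:09:06|Einstein@Home|Message from server: (Project has no jobs available)
"""

print(tasks_received(sample))
```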

jstarek
Joined: Jan 21 08
Posts: 3
ID: 306144
Credit: 57,777
RAC: 185
Message 99064 - Posted 27 Aug 2009 20:11:10 UTC

Gundolf, I can confirm that there is a workunit downloaded immediately before the "no jobs available" message. However, I think that that is a WU for the "old", non-CUDA application because after the post-install BOINC restart, I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you run the CUDA-enabled one?

Profile Gundolf Jahn
Joined: Mar 1 05
Posts: 364
ID: 43449
Credit: 156,767
RAC: 182
Message 99077 - Posted 28 Aug 2009 4:40:03 UTC - in response to Message 99064.

Gundolf, I can confirm that there is a workunit downloaded immediately before the "no jobs available" message. However, I think that that is a WU for the "old", non-CUDA application because after the post-install BOINC restart, I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you run the CUDA-enabled one?

I only wanted to say that you can't trust the "no jobs available" message.

I'm sure that I don't run CUDA, since I don't have such a device :-)

Profile Mike Hewson
Forum moderator
Joined: Dec 1 05
Posts: 1868
ID: 135571
Credit: 4,435,243
RAC: 5,182
Message 99078 - Posted 28 Aug 2009 4:50:18 UTC - in response to Message 99077.

I only wanted to say that you can't trust the "no jobs available" message.

Yeah ..... I see that frequently and yet the machine(s) is/are certainly not idle. I've been ignoring it since all is otherwise running fine.

Cheers, Mike.

____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Profile Olaf
Joined: Sep 16 06
Posts: 9
ID: 215241
Credit: 3,586,495
RAC: 17,829
Message 99102 - Posted 29 Aug 2009 9:39:02 UTC

Well, on another computer with another card it does not work.
Intel Core i7, GeForce GTX 260, driver 185.18.36.

It only shows a CPU time of 1 or 2 seconds, a progress of 100%,
and that it is running with 1 CPU and 1 CUDA. It has stayed like this
for several hours, currently about 15.
The other 7 CPUs work until the current tasks are finished, but no new
tasks are started.


Looks like the same or similar problem as SciTechGrid reported with a
corresponding stderrout.txt.

Michael Karlinsky
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,208,578
RAC: 1,928
Message 99239 - Posted 3 Sep 2009 17:20:02 UTC - in response to Message 98943.

The message about file format not recognized is only after the real problem:

[14:53:37][5101][ERROR] Application caught signal 11.

So the application crashes, and the message appears while it is printing the stack backtrace. For comparison: do you have 32-bit or 64-bit Linux? I have 64-bit and I'm getting similar crashes, as seen in the earlier message in this thread.

Andris





Same here (SUSE 11.1 64Bit, NVIDIA 190.* beta (sorry, did not get hold of 181.20 as suggested by Bernd), 9800GT Green).

Michael
____________
Team Linux Users Everywhere

Profile Ed1934158
Joined: Nov 10 04
Posts: 58
ID: 568
Credit: 4,433,606
RAC: 11,831
Message 99389 - Posted 12 Sep 2009 9:44:32 UTC - in response to Message 99239.
Last modified: 12 Sep 2009 10:06:03 UTC

myself wrote:

Ubuntu Jaunty 64bit, NVIDIA 9800GTX+, drivers 180.44.
Quote from log:
Sat 12 Sep 2009 10:36:31 AM CEST Einstein@Home [error] File einstein_S5R5_1.06_graphics_i686-pc-linux-gnu has wrong size: expected 3832104, got 12272855

It may be that I made a mistake... I'll try again when these workunits are finished, and report if something goes wrong.
Sorry.
____________

Andris Pavenis
Joined: Feb 24 05
Posts: 3
ID: 36157
Credit: 798,698
RAC: 1,073
Message 99394 - Posted 12 Sep 2009 17:36:59 UTC

I have not tested this with the Einstein@Home CUDA app, only with the Seti@HOME beta test binary for Linux. It behaves similarly (it crashes and terminates, but does not stay hanging) when started from BOINC. When started directly in standalone mode it completes, and the results were close to, but not exactly the same as, the test results.
Perhaps the reason is that the 9800GX has only 32-bit floating point arithmetic, AFAIK.

What is similar is that it also crashes when started from BOINC.

On the other hand, GPUGRID does not crash in a similar way.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99401 - Posted 13 Sep 2009 8:17:55 UTC - in response to Message 99394.

I have not tested this with the Einstein@Home CUDA app, only with the Seti@HOME beta test binary for Linux. It behaves similarly (it crashes and terminates, but does not stay hanging) when started from BOINC. When started directly in standalone mode it completes, and the results were close to, but not exactly the same as, the test results.
Perhaps the reason is that the 9800GX has only 32-bit floating point arithmetic, AFAIK.

What is similar is that it also crashes when started from BOINC.

On the other hand, GPUGRID does not crash in a similar way.


It must be related to the combination of driver and BOINC version: I'm running this app with driver version 180.44 under Suse Linux 64 bit and BOINC version 6.4.5 on a 9800 GT eco card. No problem at all:

http://einstein.phys.uwm.edu/result.php?resultid=139174127

CU
Bikeman
____________

Profile Ed1934158
Joined: Nov 10 04
Posts: 58
ID: 568
Credit: 4,433,606
RAC: 11,831
Message 99449 - Posted 15 Sep 2009 10:41:41 UTC
Last modified: 15 Sep 2009 11:18:41 UTC

I have Ubuntu 9.04 x64 and a GeForce 9800 GTX+, driver version 180.44, BOINC version 6.45 (I tried the newest version of BOINC and the same thing happens). I get these errors:
http://einstein.phys.uwm.edu/result.php?resultid=139636948
http://einstein.phys.uwm.edu/result.php?resultid=139617514

Also, when I was running GPUGRID my graphics card got much hotter, while with Einstein@Home it's like it's doing nothing. Everything in the BOINC settings seems to work fine and it can see my graphics processor, but something must be wrong.
____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99455 - Posted 15 Sep 2009 16:30:36 UTC - in response to Message 99449.

I have Ubuntu 9.04 x64 and a GeForce 9800 GTX+, driver version 180.44, BOINC version 6.45 (I tried the newest version of BOINC and the same thing happens). I get these errors:
http://einstein.phys.uwm.edu/result.php?resultid=139636948
http://einstein.phys.uwm.edu/result.php?resultid=139617514

Also, when I was running GPUGRID my graphics card got much hotter, while with Einstein@Home it's like it's doing nothing. Everything in the BOINC settings seems to work fine and it can see my graphics processor, but something must be wrong.


Thanks for the report. This error seems to be an "interesting" one, so I forwarded it to the devs. I don't think it's related to the driver or libs; it could be something "deeper" in the implementation.

CU
Bikeman



____________

Profile Ed1934158
Joined: Nov 10 04
Posts: 58
ID: 568
Credit: 4,433,606
RAC: 11,831
Message 99462 - Posted 15 Sep 2009 20:04:46 UTC - in response to Message 99455.

Thanks for the report. This error seems to be an "interesting" one, so I forwarded it to the devs. I don't think it's related to the driver or libs; it could be something "deeper" in the implementation.

CU
Bikeman




I've finished a few more units and the same thing happens.
I'll try to upgrade the drivers to version 185.18.36 and then see what happens.
____________

Profile Ed1934158
Joined: Nov 10 04
Posts: 58
ID: 568
Credit: 4,433,606
RAC: 11,831
Message 99490 - Posted 17 Sep 2009 9:46:30 UTC - in response to Message 99462.

I've finished a few more units and the same thing happens.
I'll try to upgrade the drivers to version 185.18.36 and then see what happens.

Tried with the new drivers and now I have a different problem. The unit reached 100% in 2 seconds but the status stayed "Running (1.00 CPUs, 1 CUDA)", and indeed one processor was not available for use although it was idle.
These are really strange problems. I don't know if I can give you any data about this one.

____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99495 - Posted 17 Sep 2009 16:42:00 UTC - in response to Message 99490.

I've finished a few more units and the same thing happens.
I'll try to upgrade the drivers to version 185.18.36 and then see what happens.

Tried with the new drivers and now I have a different problem. The unit reached 100% in 2 seconds but the status stayed "Running (1.00 CPUs, 1 CUDA)", and indeed one processor was not available for use although it was idle.
These are really strange problems. I don't know if I can give you any data about this one.


I now got one of these "error code 3" errors myself :

http://einstein.phys.uwm.edu/result.php?resultid=139848057

with a similar driver and BOINC configuration. The developers are looking into this. For my host most of the results work just fine, tho.

Regards
Bikeman

____________

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99515 - Posted 18 Sep 2009 21:15:48 UTC

Well, actually I got two error results in a row after weeks of flawless operation. I rebooted, and the next result was OK again. Go figure. This could well be a hardware or driver issue, from what I heard from the developers, it doesn't look like an application bug so far.

CU
Bikeman
____________

Profile Ed1934158
Joined: Nov 10 04
Posts: 58
ID: 568
Credit: 4,433,606
RAC: 11,831
Message 99517 - Posted 19 Sep 2009 0:22:30 UTC - in response to Message 99515.

Well, actually I got two error results in a row after weeks of flawless operation. I rebooted, and the next result was OK again. Go figure. This could well be a hardware or driver issue, from what I heard from the developers, it doesn't look like an application bug so far.

CU
Bikeman

I'm not sure what to think. I returned to gpugrid with 185.18.36 drivers and there everything is working fine. I guess I'll try again when I finish units that I have.

Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@Home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
  PID USER   PR  NI  VIRT  RES   SHR  S %CPU %MEM      TIME+  COMMAND
17988 boinc  39  19 77908  73m  1484  R   97  0.9   36:59.72  einstein_S5R5_1
29305 boinc  39  19 76628  72m  1484  R   97  0.9  351:29.07  einstein_S5R5_1
16354 boinc  39  19 77900  73m  1484  R   87  0.9   70:39.78  einstein_S5R5_1
30222 boinc  39  19 76624  72m  1484  R   52  0.9  332:55.69  einstein_S5R5_1
14624 boinc  30  10 97516  55m   23m  S   47  0.7   51:00.66  acemd_6.66_x86_ (gpugrid unit)

And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@Home I can see four processes at 100% and the GPU stays at idle temperature (or the difference is within normal oscillations), as if the CPU was doing all the work (and of course in the end there is the error that I was talking about).
Do you see a notable increase in GPU temperature?

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99520 - Posted 19 Sep 2009 3:05:17 UTC - in response to Message 99517.


Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@Home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
  PID USER   PR  NI  VIRT  RES   SHR  S %CPU %MEM      TIME+  COMMAND
17988 boinc  39  19 77908  73m  1484  R   97  0.9   36:59.72  einstein_S5R5_1
29305 boinc  39  19 76628  72m  1484  R   97  0.9  351:29.07  einstein_S5R5_1
16354 boinc  39  19 77900  73m  1484  R   87  0.9   70:39.78  einstein_S5R5_1
30222 boinc  39  19 76624  72m  1484  R   52  0.9  332:55.69  einstein_S5R5_1
14624 boinc  30  10 97516  55m   23m  S   47  0.7   51:00.66  acemd_6.66_x86_ (gpugrid unit)

And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@Home I can see four processes at 100% and the GPU stays at idle temperature (or the difference is within normal oscillations), as if the CPU was doing all the work (and of course in the end there is the error that I was talking about).
Do you see a notable increase in GPU temperature?


There are two different kinds of Einstein@Home science apps: "S5R5" (search for gravitational waves in LIGO data) and "ABP1" (search for binary pulsars in Arecibo radio astronomy data). Only the ABP1 Beta test app (that's what this thread is about) will use the GPU; the S5R5 app is CPU only. Whether you get jobs for S5R5 or ABP1 is more or less random.

The ABP1 search will probably show up in your top output as "einsteinbinary_". If you see this one, you should notice a modest rise in GPU temperature, probably not as high as that of the GPUgrid app, tho.

CU
Bikeman
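
A rough sketch of the check described above: scan saved `top` output for commands that start with "einsteinbinary_". The sample rows are copied from the quoted output and the 12-column layout is assumed from its header line. The function name `commands_with_prefix` is only illustrative.

```python
def commands_with_prefix(top_text, prefix):
    """Return (pid, command) pairs whose COMMAND column starts with prefix."""
    found = []
    for line in top_text.splitlines():
        fields = line.split()
        # Data rows have at least 12 columns and a numeric PID in the first.
        if len(fields) >= 12 and fields[0].isdigit():
            command = fields[11]
            if command.startswith(prefix):
                found.append((int(fields[0]), command))
    return found


# Sample rows copied from the top output quoted above.
sample = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R 97 0.9 351:29.07 einstein_S5R5_1
14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_
"""

# An empty list means no ABP1 (GPU) task was running in this snapshot.
print(commands_with_prefix(sample, "einsteinbinary_"))
print(commands_with_prefix(sample, "einstein_S5R5"))
```

With the quoted output the first call finds nothing, which fits the explanation that only S5R5 (CPU-only) tasks were running at that moment.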
____________

ralph
Joined: Dec 11 08
Posts: 1
ID: 434814
Credit: 2,578
RAC: 0
Message 99522 - Posted 19 Sep 2009 6:37:39 UTC - in response to Message 99520.


Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@Home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
  PID USER   PR  NI  VIRT  RES   SHR  S %CPU %MEM      TIME+  COMMAND
17988 boinc  39  19 77908  73m  1484  R   97  0.9   36:59.72  einstein_S5R5_1
29305 boinc  39  19 76628  72m  1484  R   97  0.9  351:29.07  einstein_S5R5_1
16354 boinc  39  19 77900  73m  1484  R   87  0.9   70:39.78  einstein_S5R5_1
30222 boinc  39  19 76624  72m  1484  R   52  0.9  332:55.69  einstein_S5R5_1
14624 boinc  30  10 97516  55m   23m  S   47  0.7   51:00.66  acemd_6.66_x86_ (gpugrid unit)

And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@Home I can see four processes at 100% and the GPU stays at idle temperature (or the difference is within normal oscillations), as if the CPU was doing all the work (and of course in the end there is the error that I was talking about).
Do you see a notable increase in GPU temperature?


There are two different kinds of Einstein@Home science apps: "S5R5" (search for gravitational waves in LIGO data) and "ABP1" (search for binary pulsars in Arecibo radio astronomy data). Only the ABP1 Beta test app (that's what this thread is about) will use the GPU; the S5R5 app is CPU only. Whether you get jobs for S5R5 or ABP1 is more or less random.

The ABP1 search will probably show up in your top output as "einsteinbinary_". If you see this one, you should notice a modest rise in GPU temperature, probably not as high as that of the GPUgrid app, tho.

CU
Bikeman


Gpugrid experienced a series of errors when the Nvidia Linux 185+ drivers came out. They found a workaround that solved the problem. It looks like the errors that are occurring here are similar in nature. People with 180 drivers can process the WUs but people with 185 or 190 drivers cannot.
The programmers may want to contact the Gpugrid people to see how they fixed their issue with the 185+ Linux Nvidia drivers.
I was able to process WUs with the 1.09 version of the application, but the new 1.10 version goes to 100% immediately and stays stuck there. This is identical to the type of error that I used to experience with Gpugrid when the new Nvidia drivers were released.
Good luck in sorting out the problem.
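
For anyone sorting the reports in this thread by driver version, here is a minimal sketch that compares dotted version strings. The 185 threshold is only the observation made in this post, not a confirmed cause, and other posts in this thread report errors with 180.44 as well. The names `parse_version` and `in_reported_problem_range` are illustrative.

```python
def parse_version(text):
    """Turn a dotted version such as '185.18.36' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_reported_problem_range(driver_version, first_bad=(185,)):
    """True if the driver is at or above the version this post suspects.

    first_bad defaults to (185,), an assumption based on this post only.
    """
    return parse_version(driver_version) >= first_bad


# Driver versions mentioned by posters in this thread.
for version in ("180.22", "180.44", "185.18.14", "185.18.31", "185.18.36"):
    print(version, in_reported_problem_range(version))
```

Tuples compare element by element, so "185.18.36" sorts above "185" and "180.44" sorts below it.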

Profile Stephan Goll
Joined: Dec 13 05
Posts: 7
ID: 147518
Credit: 745,270
RAC: 2,626
Message 99529 - Posted 19 Sep 2009 14:19:22 UTC

Dear Bernd,

I tried CUDA ... but I got only limited success. Only CUDA 2.3 will get detected on my computer, older nvidia driver will load, but the CUDA toolkit will not compile (2.1) or simply not work (2.2).

It's this little box:
http://einstein.phys.uwm.edu/show_host_detail.php?hostid=2069906
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5102101

Intel Atom 330, nVidia Ion chipset, 2 GB ram, debian 64, kernel 2.6.30 from debian backports, CUDA software from http://www.nvidia.com/object/cuda_get.html.

s@h wus seems to work, e@h wus will not even start.

19-Sep-2009 09:57:41 [---] Starting BOINC client version 6.6.36 for x86_64-pc-linux-gnu
19-Sep-2009 09:57:41 [---] log flags: task, file_xfer, sched_ops
19-Sep-2009 09:57:41 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
19-Sep-2009 09:57:41 [---] Running as a daemon
19-Sep-2009 09:57:41 [---] Data directory: /home/boinc
19-Sep-2009 09:57:41 [---] Processor: 4 GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz [Family 6 Model 28 Stepping 2]
19-Sep-2009 09:57:41 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm lahf_lm
19-Sep-2009 09:57:41 [---] OS: Linux: 2.6.30
19-Sep-2009 09:57:41 [---] Memory: 1.47 GB physical, 250.98 MB virtual
19-Sep-2009 09:57:41 [---] Disk: 4.58 GB total, 2.12 GB free
19-Sep-2009 09:57:41 [---] Local time is UTC +1 hours
19-Sep-2009 09:57:42 [---] CUDA device: ION (driver version 0, compute capability 1.1, 509MB, est. 6GFLOPS)
19-Sep-2009 09:57:42 [Einstein@Home] Found app_info.xml; using anonymous platform
19-Sep-2009 09:57:42 [---] Not using a proxy
19-Sep-2009 09:57:42 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 5102101; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 2069906; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] General prefs: from Einstein@Home (last modified 02-Mar-2007 22:05:08)
19-Sep-2009 09:57:42 [Einstein@Home] Computer location: home
19-Sep-2009 09:57:42 [---] General prefs: using separate prefs for home
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when active to 752.68MB
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when idle to 1354.82MB
19-Sep-2009 09:57:42 [---] Preferences limit disk usage to 2.29GB

Best regards,
Stephan
____________

Profile Jos van Wolput
Joined: Feb 11 05
Posts: 39
ID: 14928
Credit: 233,705
RAC: 458
Message 99592 - Posted 23 Sep 2009 10:48:16 UTC

I installed Boinc 6.10.6 which detects ATI GPU.
Does this CUDA app 1.10 work with ATI GPU?

____________

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,981
RAC: 2,347
Message 99593 - Posted 23 Sep 2009 11:07:32 UTC - in response to Message 99592.

I installed Boinc 6.10.6 which detects ATI GPU.
Does this CUDA app 1.10 work with ATI GPU?

No.

'CUDA' is specifically a trade name for the NVidia architecture.

Profile [AF>Linux>Fight] koubi
Joined: Sep 22 09
Posts: 1
ID: 473385
Credit: 5,498
RAC: 142
Message 99645 - Posted 26 Sep 2009 9:10:04 UTC

hello i tried cuda app 1.10:

sam 26 sep 2009 03:09:47 CEST Starting BOINC client version 6.10.4 for x86_64-pc-linux-gnu
sam 26 sep 2009 03:09:47 CEST log flags: task, file_xfer, sched_ops
sam 26 sep 2009 03:09:47 CEST Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
sam 26 sep 2009 03:09:47 CEST Data directory: /home/koubi/Desktop/BOINC
sam 26 sep 2009 03:09:47 CEST Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ [Family 15 Model 107 Stepping 2]
sam 26 sep 2009 03:09:47 CEST Processor: 512.00 KB cache
sam 26 sep 2009 03:09:47 CEST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefe
sam 26 sep 2009 03:09:47 CEST OS: Linux: 2.6.30.7
sam 26 sep 2009 03:09:47 CEST Memory: 3.86 GB physical, 956.93 MB virtual
sam 26 sep 2009 03:09:47 CEST Disk: 145.79 GB total, 16.66 GB free
sam 26 sep 2009 03:09:47 CEST Local time is UTC +2 hours
sam 26 sep 2009 03:09:47 CEST NVIDIA GPU 0: GeForce GTX 260 (driver version 0, CUDA version 2020, compute capability 1.3, 895MB, est. 117GFLOPS)
sam 26 sep 2009 03:09:47 CEST Can't load library libaticalrt.so
sam 26 sep 2009 03:09:47 CEST Einstein@Home Found app_info.xml; using anonymous platform


Task ID 140776480
Name p2030_53837_39307_0070_G63.81+00.12.C_6.dm_619_1
Workunit 59072881
Created 25 Sep 2009 1:57:53 UTC
Sent 25 Sep 2009 21:29:58 UTC
Received 26 Sep 2009 7:50:18 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 2093030
Report deadline 9 Oct 2009 21:29:58 UTC
CPU time 15969.72
stderr out

<core_client_version>6.10.4</core_client_version>
<![CDATA[
<stderr_txt>
[23:30:19][21754][INFO ] Starting data processing...
[23:30:19][21754][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[23:30:19][21754][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:30:19][21754][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[23:30:21][21754][INFO ] Seed for random number generator is -1148624978.
[23:30:22][21754][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[23:31:19][21754][INFO ] Checkpoint committed!


[00:26:09][27652][INFO ] Starting data processing...
[00:26:09][27652][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[00:26:09][27652][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4375
[00:26:09][27652][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[00:26:11][27652][INFO ] Seed for random number generator is -1148624978.
[00:26:12][27652][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[00:27:10][27652][INFO ] Checkpoint committed!
[00:28:10][27652][INFO ] Checkpoint committed!
[00:29:11][27652][INFO ] Checkpoint committed!
[00:30:11][27652][INFO ] Checkpoint committed!
[01:15:26][32689][INFO ] Starting data processing...
[01:15:26][32689][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:15:26][32689][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4698
[01:15:26][32689][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:15:28][32689][INFO ] Seed for random number generator is -1148624978.
[01:15:29][32689][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:52:58][8338][INFO ] Starting data processing...
[01:52:58][8338][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:52:58][8338][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7081
[01:52:58][8338][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:53:00][8338][INFO ] Seed for random number generator is -1148624978.
[01:53:01][8338][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:53:58][8338][INFO ] Checkpoint committed!
[06:47:37][27973][INFO ] Starting data processing...
[06:47:37][27973][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[06:47:37][27973][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7157
[06:47:37][27973][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[06:47:39][27973][INFO ] Seed for random number generator is -1148624978.
[06:47:40][27973][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881

[09:50:13][27973][INFO ] Data processing finished successfully!
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 77.6536025819315
Granted credit 250
application version 1.10

gtx 260 216sp gpu is overclocked: core@756mhz memory@1096mhz shaders@1512mhz
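
For anyone who wants to see at a glance how often a task like the one above was restarted and where it resumed, here is a minimal Python sketch. It assumes the stderr lines look exactly like the ones in this post ("Starting data processing..." and "Continuing work on ... at template no. N"); the function name is made up for the example and is not part of any project tool.

```python
import re

# Line formats copied from the stderr output quoted in this thread.
START_RE = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\]\[(\d+)\]\[INFO\s*\] Starting data processing"
)
RESUME_RE = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\]\[(\d+)\]\[INFO\s*\] Continuing work on .* at template no\. (\d+)"
)


def summarize_restarts(stderr_text):
    """Return one (time, pid, template) tuple per app start.

    template is None when the app started from scratch, otherwise the
    template number it resumed from.
    """
    runs = []
    for line in stderr_text.splitlines():
        start = START_RE.match(line)
        if start:
            runs.append([start.group(1), int(start.group(2)), None])
            continue
        resume = RESUME_RE.match(line)
        # Attach the resume point to the most recent start of the same process.
        if resume and runs and runs[-1][1] == int(resume.group(2)):
            runs[-1][2] = int(resume.group(3))
    return [tuple(run) for run in runs]
```

Fed the stderr text above, this should list one entry per "Starting data processing" line, with the template number each restart picked up from.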

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99654 - Posted 26 Sep 2009 15:12:52 UTC - in response to Message 99645.

hello i tried cuda app 1.10:



Looks good!!!
CU
Bikeman

____________

Profile Skip Da Shu
Avatar
Joined: Jan 18 05
Posts: 49
ID: 3628
Credit: 2,134,462
RAC: 2
Message 99775 - Posted 3 Oct 2009 6:16:54 UTC

I'm getting comp errors on between 1/3 and 1/2 of the WUs so far. Lowered the clock on the card a bit tonight so will see if that makes any diff.

Most recent invalid is HERE.

Linux, kernel 2.6.28, 64b, GTX-260 running 190.36 driver.
____________
- da shu @ the BOINC farm, SkipsJunk, Guru Mountain, Crunchers

Profile Gundolf Jahn
Joined: Mar 1 05
Posts: 364
ID: 43449
Credit: 156,767
RAC: 182
Message 99779 - Posted 3 Oct 2009 8:25:25 UTC - in response to Message 99775.
Last modified: 3 Oct 2009 8:29:43 UTC

Most recent invalid is HERE.

It finished with "Maximum elapsed time exceeded"

Gruß,
Gundolf
____________
Computer sind nicht alles im Leben. (Kleiner Scherz)

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99796 - Posted 4 Oct 2009 15:34:51 UTC - in response to Message 99779.
Last modified: 4 Oct 2009 15:35:13 UTC

Most recent invalid is HERE.

It finished with "Maximum elapsed time exceeded"

Gruß,
Gundolf


...which is strange because the runtime is anything but excessive.

Hmm...here's a question for the BOINC experts: how does BOINC decide, for a CUDA app, when the max. time is reached? I know that usually the Workunits will contain an estimated maximum number of floating point operations that processing the result could reasonably consume at worst. Now, with a CUDA app that takes 1 CPU plus 1 GPU in parallel, what does this tell BOINC? Will BOINC try to limit CPU time using the CPU benchmark, or will it use the estimated GPU performance (usually two orders of magnitude greater than that of a single CPU core)?


Thanks for any insights

CU
Bikeman
____________

Profile Skip Da Shu
Avatar
Joined: Jan 18 05
Posts: 49
ID: 3628
Credit: 2,134,462
RAC: 2
Message 99800 - Posted 4 Oct 2009 20:07:07 UTC - in response to Message 99775.
Last modified: 4 Oct 2009 20:09:34 UTC

I'm getting comp errors on between 1/3 and 1/2 of the WUs so far. Lowered the clock on the card a bit tonight so will see if that makes any diff.

Most recent invalid is HERE.

Linux, kernel 2.6.28, 64b, GTX-260 running 190.36 driver.


I'm up over 50% failure rate now. Funny thing is, it doesn't seem to be the GPUgrid FFT error but a mix of things.

Think I need to put E@H on hold for a bit and see if the Collatz 64b Linux app works on this card.

UPDATE: Collatz seems to work once I figured out the right symlink to add. 2 of 2 since symlink fix and now under v6.10.11.
____________
- da shu @ the BOINC farm, SkipsJunk, Guru Mountain, Crunchers

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 99876 - Posted 8 Oct 2009 11:39:47 UTC
Last modified: 8 Oct 2009 11:40:58 UTC

Hi!

For those of you suffering from "maximum elapsed time exceeded" error messages, a solution is discussed in the thread on the Windows CUDA App http://einstein.phys.uwm.edu/forum_thread.php?id=7539&nowrap=true#99856; it's the same problem for Linux.

For those suffering from "signal 11" (segmentation fault) crashes of their ABP1 CUDA apps with Linux drivers 185.x and 190.x, I stumbled over a possible workaround for the moment, but I'm a bit reluctant to share it here in public because I'd rather have it tested by one or two other volunteers first. If you are interested, drop me a PM.

CU
Bikeman
____________

Oliver
Project developer
Joined: Sep 4 07
Posts: 56
ID: 279320
Credit: 482,632
RAC: 1,135
Message 99973 - Posted 13 Oct 2009 10:20:00 UTC - in response to Message 99876.
Last modified: 13 Oct 2009 10:20:18 UTC

Hi,

For all of you who report CUDA-related problems, please do always include the BOINC version you are using. Please make sure that you use at least BOINC 6.6 because 6.4 still has several known CUDA-related issues.

Thanks,
Oliver
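
To make that easier, here is a small Python sketch that pulls the client version and the GPU line out of a BOINC startup log, so both can be pasted into a report. It assumes the message formats seen earlier in this thread ("Starting BOINC client version ..." and "CUDA device: ..." or "NVIDIA GPU 0: ..."); other client versions may word these lines differently, and the function names are just placeholders.

```python
import re

# Formats taken from the BOINC startup messages posted in this thread.
CLIENT_RE = re.compile(r"Starting BOINC client version (\d+)\.(\d+)\.(\d+)")
GPU_RE = re.compile(r"(?:CUDA device|NVIDIA GPU \d+): (.+?) \((.*)\)")


def report_info(log_text):
    """Collect the client version and any detected NVIDIA GPUs from a startup log."""
    info = {"client_version": None, "gpus": []}
    for line in log_text.splitlines():
        client = CLIENT_RE.search(line)
        if client:
            info["client_version"] = tuple(int(part) for part in client.groups())
        gpu = GPU_RE.search(line)
        if gpu:
            info["gpus"].append({"name": gpu.group(1), "details": gpu.group(2)})
    return info


def meets_minimum(version, minimum=(6, 6, 0)):
    """True if the parsed client version is at least the 6.6 series asked for above."""
    return version is not None and version >= minimum
```

The "details" string keeps the driver version and compute capability exactly as the client printed them, which is usually what a developer wants to see.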

Profile Olaf
Joined: Sep 16 06
Posts: 9
ID: 215241
Credit: 3,586,495
RAC: 17,829
Message 99979 - Posted 13 Oct 2009 14:56:50 UTC - in response to Message 99973.

Hi,

For all of you who report CUDA-related problems, please do always include the BOINC version you are using. Please make sure that you use at least BOINC 6.6 because 6.4 still has several known CUDA-related issues.

Thanks,
Oliver



At least on my Debian 5 (stable) systems BOINC 6.6 has the even worse issue,
that it does not connect to any project, therefore there seems to be no
newer useful version than 6.4 :-(
But this might be different with other distributions or experimental versions
of Debian or with the next stable version of Debian ...

Oliver
Project developer
Joined: Sep 4 07
Posts: 56
ID: 279320
Credit: 482,632
RAC: 1,135
Message 100040 - Posted 16 Oct 2009 9:21:14 UTC - in response to Message 99979.
Last modified: 16 Oct 2009 9:23:48 UTC

Hi Olaf,


At least on my Debian 5 (stable) systems BOINC 6.6 has the even worse issue,
that it does not connect to any project, therefore there seems to be no
newer useful version than 6.4 :-(
But this might be different with other distributions or experimental versions
of Debian or with the next stable version of Debian ...


Interesting. Which 6.6.x BOINC version did you try? Did you use one of the official binary downloads or did you compile it yourself?

Cheers,
Oliver

Profile Olaf
Joined: Sep 16 06
Posts: 9
ID: 215241
Credit: 3,586,495
RAC: 17,829
Message 100047 - Posted 16 Oct 2009 17:20:56 UTC - in response to Message 100040.

Hi Olaf,

Interesting. Which 6.6.x BOINC version did you try? Did you use one of the official binary downloads or did you compile it yourself?

Cheers,
Oliver


Well I tried 6.6.36 as provided directly from BOINC on three computers
(two without GPU, one with GPU).
I did not install it over a previous version, just in a new directory as
for previous versions too to avoid possible problems with residuals from
previous versions.
Well, on two other computers I still use 5.10.21 because three computers use the
same home directory - and this seems not to work anymore with BOINC version 6,
but those computers have no GPU, therefore no need to update ;o)
What can be considered as stable or convenient for a normal user seems to
depend on the number and the arrangement of the computers used for BOINC at
the same time ;o)
I think, within Debian, version 6.4.5 is considered to be the current
experimental/unstable version, stable is 6.2.14.


However, what is considered a stable or unstable NVIDIA driver at Debian
was already too old for the GPU in the newest computer, therefore I already
had something to do to get the GPU to work with the newest driver considered by
NVIDIA to be stable/worth publishing ;o)
Maybe there is a similar incompatibility with newer BOINC versions...

th3_1rzt
Joined: Aug 24 06
Posts: 208
ID: 210060
Credit: 1,950,700
RAC: 6,155
Message 100070 - Posted 17 Oct 2009 16:22:29 UTC - in response to Message 99979.
Last modified: 17 Oct 2009 16:22:43 UTC

Hi,

For all of you who report CUDA-related problems, please do always include the BOINC version you are using. Please make sure that you use at least BOINC 6.6 because 6.4 still has several known CUDA-related issues.

Thanks,
Oliver



At least on my Debian 5 (stable) systems BOINC 6.6 has the even worse issue,
that it does not connect to any project, therefore there seems to be no
newer useful version than 6.4 :-(
But this might be different with other distributions or experimental versions
of Debian or with the next stable version of Debian ...


I use the stable version of Debian Lenny and 6.6.36 doesn't have a problem connecting to any projects. Installed it by copying the files from the official boinc download into /usr/bin, iirc.

I'm still using the 2.6.26-1-686-bigmem kernel; I see you have 2.6.26-2-686-bigmem, but I doubt that's why.
____________
Team Philippines

Profile Olaf
Joined: Sep 16 06
Posts: 9
ID: 215241
Credit: 3,586,495
RAC: 17,829
Message 100179 - Posted 25 Oct 2009 14:14:22 UTC - in response to Message 100070.

After some connection problems again I have now managed to get BOINC 6.6.41 working,
including CUDA, for Intel Core i7, GeForce GTX 260, driver 185.18.36
(the machine already mentioned above) - hopefully with useful results now ;o)

th3_1rzt
Joined: Aug 24 06
Posts: 208
ID: 210060
Credit: 1,950,700
RAC: 6,155
Message 100200 - Posted 27 Oct 2009 8:07:17 UTC - in response to Message 100179.

After some connection problems again I have now managed to get BOINC 6.6.41 working,
including CUDA, for Intel Core i7, GeForce GTX 260, driver 185.18.36
(the machine already mentioned above) - hopefully with useful results now ;o)

Your rig is very similar to mine (i7 + GTX260, 32bit Lenny w. PAE kernel). Maybe I should give CUDA another try; I will test 6.6.41 and 185.18.36 later then.
____________
Team Philippines

Michael Karlinsky
Avatar
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,208,578
RAC: 1,928
Message 100331 - Posted 5 Nov 2009 15:21:55 UTC
Last modified: 5 Nov 2009 15:22:48 UTC

Hi all,

is it possible to run S6 application alongside CUDA with the app_info provided in the linux beta tar?

If not, can you tell me what to modify?

Michael

edit: S6 application means: "Hierarchical S5 all-sky GW search #6"
____________
Team Linux Users Everywhere

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 100336 - Posted 5 Nov 2009 21:46:19 UTC - in response to Message 100331.

Hi all,

is it possible to run S6 application alongside CUDA with the app_info provided in the linux beta tar?

If not, can you tell me what to modify?

Michael

edit: S6 application means: "Hierarchical S5 all-sky GW search #6"


Hi!

Sure, this app_info.xml should do (I assume you've already run out of S5R5 jobs)



<app_info>
<app>
<name>einstein_S5R6</name>
</app>
<file_info>
<name>einstein_S5R5_1.06_i686-pc-linux-gnu</name>
<executable/>
</file_info>
<file_info>
<name>einstein_S5R5_1.06_i686-pc-linux-gnu_0</name>
<executable/>
</file_info>
<file_info>
<name>einstein_S5R5_1.06_i686-pc-linux-gnu_1</name>
<executable/>
</file_info>
<file_info>
<name>einstein_S5R5_1.06_i686-pc-linux-gnu_2</name>
<executable/>
</file_info>
<file_info>
<name>einstein_S5R5_1.06_graphics_i686-pc-linux-gnu</name>
<executable/>
</file_info>

<app>
<name>einsteinbinary_ABP1</name>
</app>
<file_info>
<name>einsteinbinary_ABP1_1.10_graphics_i686-pc-linux-gnu</name>
<executable/>
</file_info>
<file_info>
<name>einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda</name>
<executable/>
</file_info>
<file_info>
<name>libcudart.so.2</name>
<executable/>
</file_info>
<file_info>
<name>libcufft.so.2</name>
<executable/>
</file_info>

<app_version>
<app_name>einstein_S5R6</app_name>
<version_num>101</version_num>
<api_version>6.3.0</api_version>
<file_ref>
<file_name>einstein_S5R5_1.06_i686-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einstein_S5R5_1.06_i686-pc-linux-gnu_0</file_name>
</file_ref>
<file_ref>
<file_name>einstein_S5R5_1.06_i686-pc-linux-gnu_1</file_name>
</file_ref>
<file_ref>
<file_name>einstein_S5R5_1.06_i686-pc-linux-gnu_2</file_name>
</file_ref>
<file_ref>
<file_name>einstein_S5R5_1.06_graphics_i686-pc-linux-gnu</file_name>
<open_name>graphics_app</open_name>
</file_ref>
</app_version>

<app_version>
<app_name>einsteinbinary_ABP1</app_name>
<version_num>107</version_num>
<plan_class>cuda</plan_class>
<flops>3000000000.0</flops>
<avg_ncpus>1.0</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<api_version>6.7.0</api_version>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_graphics_i686-pc-linux-gnu</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>einsteinbinary_ABP1</app_name>
<version_num>109</version_num>
<plan_class>cuda</plan_class>
<flops>3000000000.0</flops>
<avg_ncpus>1.0</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<api_version>6.7.0</api_version>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_graphics_i686-pc-linux-gnu</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>einsteinbinary_ABP1</app_name>
<version_num>110</version_num>
<plan_class>cuda</plan_class>
<flops>3000000000.0</flops>
<avg_ncpus>1.0</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<api_version>6.7.0</api_version>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>einsteinbinary_ABP1_1.10_graphics_i686-pc-linux-gnu</file_name>
<open_name>graphics_app</open_name>
</file_ref>
<file_ref>
<file_name>libcudart.so.2</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.2</file_name>
</file_ref>
</app_version>

</app_info>
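
Since a single mistyped file name in a hand-edited file like this is easy to miss, here is a minimal Python sketch that cross-checks it before the client is restarted. It only looks at the elements visible in the listing above (app, file_info, app_version, file_ref); the function name is made up, and the checks reflect my reading of how the entries refer to each other rather than any official validation rule.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml string."""
    root = ET.fromstring(xml_text)

    # Names declared at the top level of the file.
    declared_files = {fi.findtext("name") for fi in root.findall("file_info")}
    declared_apps = {app.findtext("name") for app in root.findall("app")}

    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        version_num = version.findtext("version_num")
        label = "%s version %s" % (app_name, version_num)

        # Every app_version should name an app declared in an <app> block.
        if app_name not in declared_apps:
            problems.append("%s: no matching <app> entry" % label)

        # Every file_ref should point at a file declared in a <file_info> block.
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append("%s: file '%s' has no <file_info>" % (label, file_name))
    return problems
```

An empty list means every reference lines up; it says nothing about whether the files actually exist in the project directory.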




The rationale for the additional <flops> tag (which is unrelated to the S5R6 change) is stated here: http://einstein.phys.uwm.edu/forum_thread.php?id=7539&nowrap=true#99856.
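
As a rough illustration of why the <flops> value can matter for "maximum elapsed time exceeded" errors: assuming the client derives its time limit by dividing the workunit's floating point operations bound by the speed it believes the app version has (an assumption on my part, not something confirmed in this thread), the effect can be sketched in Python. The bound below is a made-up number; only 3000000000.0 and the 979.78 GFLOPS figure come from this thread.

```python
def max_elapsed_seconds(fpops_bound, flops):
    """Assumed limit: operations bound divided by the speed credited to the app."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return fpops_bound / flops


# Hypothetical operations bound, chosen only for illustration.
HYPOTHETICAL_FPOPS_BOUND = 3.0e13

# <flops> value from the app_info.xml above.
APP_INFO_FLOPS = 3000000000.0

# GPU estimate of the kind the science app prints for the GTX 260 (979.78 GFLOPS).
GPU_ESTIMATE_FLOPS = 979.78e9

if __name__ == "__main__":
    # With the modest <flops> value the allowed run time is long ...
    print(max_elapsed_seconds(HYPOTHETICAL_FPOPS_BOUND, APP_INFO_FLOPS))
    # ... while a GPU-sized estimate would shrink it by more than two orders of magnitude.
    print(max_elapsed_seconds(HYPOTHETICAL_FPOPS_BOUND, GPU_ESTIMATE_FLOPS))
```

Under that assumption, an app that really spends most of its time on the CPU would hit the limit quickly if it were credited with the full GPU speed, which would fit the reports above.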

Hope this works for you, please let us know.

Bikeman
____________

Michael Karlinsky
Avatar
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,208,578
RAC: 1,928
Message 100347 - Posted 6 Nov 2009 7:06:34 UTC - in response to Message 100336.
Last modified: 6 Nov 2009 7:08:54 UTC

Hi Bikeman,

sure, all S5R5 jobs are long gone. Thanks for posting app_info, never did this before, because it was never really necessary.

Tried CUDA app. (again) without app_info, because your post was a little late.

I still get the same error messages I posted last time (BOINC 6.6.41; NVIDIA 190.42). GPUGRID is not working either. I suspect the card itself might be broken (the fan does not speed up, except while booting) or the OS is at fault (Suse 11.1 seems to have a problem; I read that on the GPUGRID boards).

So no CUDA for me ATM :(


Michael

PS [edit] I read somewhere (NVIDIA release notes?) that it is possible to adjust fan speed using the NVIDIA GUI if Coolbits is enabled. But I did not find it.


Hi!

Sure, this app_info.xml should do (I assume you've already run out of S5R5 jobs)



<app_info>

--snip--


</app_info>



The rationale for the additional <flops> tag (which is unrelated to the S5R6 change) is stated here: http://einstein.phys.uwm.edu/forum_thread.php?id=7539&nowrap=true#99856.

Hope this works for you, please let us know.

Bikeman

____________
Team Linux Users Everywhere

ziegenmelker
Joined: Jun 27 05
Posts: 306
ID: 91349
Credit: 3,290,806
RAC: 11,411
Message 100349 - Posted 6 Nov 2009 11:03:04 UTC - in response to Message 100347.
Last modified: 6 Nov 2009 11:08:51 UTC

PS [edit] I read somewhere (NVIDIA release notes?) that it is possible to adjust fan speed using the NVIDIA GUI if Coolbits is enabled. But I did not find it.

Maybe this will help. I didn't try it myself, because my card doesn't support cuda:
http://aldeby.org/blog/index.php/enable-nvidia-coolbits-frequency-tuner.html

Edit: Another link:
http://www.linuxhardware.org/nvclock/
____________

Profile Bikeman
Forum moderator
Volunteer developer
Avatar
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,083,882
RAC: 9,728
Message 100350 - Posted 6 Nov 2009 11:20:12 UTC

As for CUDA app and 190 drivers, you might want to try the latest 6.10.x Boinc version.

CU
Bikeman
____________

Michael Karlinsky
Avatar
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,208,578
RAC: 1,928
Message 100351 - Posted 6 Nov 2009 11:47:20 UTC
Last modified: 6 Nov 2009 11:47:58 UTC

Hi all.

Maybe this will help. I didn't try it myself, because my card doesn't support cuda:
http://aldeby.org/blog/index.php/enable-nvidia-coolbits-frequency-tuner.html

Edit: Another link:
http://www.linuxhardware.org/nvclock/


Thanks, but I thought it is possible via NVIDIA GUI. Overclocking options are available, but nothing about fans.

It's buried in the NVIDIA release notes:

"Added support for configuring the GPU's fan speed; see the "Coolbits" X configuration option in the README."

As for CUDA app and 190 drivers, you might want to try the latest 6.10.x Boinc version.


I will give it a try, next time I have physical access to the machine in question.

Michael
____________
Team Linux Users Everywhere

DanNeely
Joined: Sep 4 05
Posts: 782
ID: 106636
Credit: 4,562,302
RAC: 8,942
Message 100369 - Posted 8 Nov 2009 1:18:03 UTC

Are the 1.91 drivers available for linux yet? They fixed several CUDA issues on my windows box.
____________

Michael Karlinsky
Avatar
Joined: Jan 22 05
Posts: 665
ID: 6887
Credit: 1,208,578
RAC: 1,928
Message 100381 - Posted 9 Nov 2009 8:41:26 UTC - in response to Message 100369.

Are the 1.91 drivers available for linux yet? They fixed several CUDA issues on my windows box.


Hi Dan,

no.

You can check NVIDIA site.

Michael
____________
Team Linux Users Everywhere

ML1
Joined: Feb 20 05
Posts: 154
ID: 24273
Credit: 2,789,634
RAC: 3,868
Message 100443 - Posted 12 Nov 2009 18:45:29 UTC

What are the latest developments for this?

... beta test App package (S5R5 1.06 / ABP1 1.10 CUDA)

Is it under further development?
Working well?
Or still very much 'experimental'? And more debug results wanted?

Happy crunchin',
Martin

____________
Powered by Mandriva Linux A user friendly OS!
See the Boinc HELP Wiki


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration