Bernd Machenschalk Forum moderator Project developer
Joined: Oct 15 04 Posts: 2034 ID: 2 Credit: 21,982,968 RAC: 41,653
A new Einstein@home CUDA App for Linux is available for Beta Test on the Beta Test Page.
We stumbled over some bugs in the CUDA part that might have caused some segfaults, so this is mainly a bugfix release. Also, the CPU part of the App now uses SSE, as in the .09 Beta Apps.
Please test and report, and please include important information (like the NVIDIA Driver and Core Client version) in your posts.
BM
ID: 98831 |
Andris Pavenis
Joined: Feb 24 05 Posts: 3 ID: 36157 Credit: 798,698 RAC: 1,073
It crashes (see below). It shows 100% but does not stop; I had to abort the workunit.
Also:
- Seti@HOME Beta (CUDA) crashes similarly, but the workunit finishes with a failure
- GPUGRID - works OK
[22:53:40][14614][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:53:40][14614][INFO ] Starting data processing...
[22:53:40][14614][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:53:40][14614][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:53:40][14614][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:53:41][14614][INFO ] Seed for random number generator is 977043268.
[22:53:43][14614][ERROR] Application caught signal 11.
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
[22:54:18][14706][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:54:18][14706][INFO ] Starting data processing...
[22:54:18][14706][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:54:18][14706][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:54:18][14706][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:54:19][14706][INFO ] Seed for random number generator is 977043268.
[22:54:20][14706][ERROR] Application caught signal 11.
The WU starts running but jumps straight to 100%; according to stderrout.txt there is a "File format not recognized" error:
[14:53:33][5101][INFO ] Application startup - thank you for supporting Einstein@Home!
[14:53:33][5101][INFO ] Starting data processing...
[14:53:33][5101][INFO ] Using CUDA device #0 "GeForce GTX 275" (1010.88 GFLOPS)
[14:53:33][5101][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[14:53:33][5101][INFO ] Header contents:
------> Original WAPP file: p2030_54162_45910_0042_G41.76+01.37.C_0.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54162.531365740739
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 190240.781394
------> DEC (J2000): 82856.412406
------> Galactic l: 41.758
------> Galactic b: 1.3809
------> Name: G41.76+01.37.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 357.8171
------> ZA at start: 9.8367
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 70.2 cm^-3 pc
------> Scale factor: 6877.66
[14:53:35][5101][INFO ] Seed for random number generator is -1164413432.
[14:53:37][5101][ERROR] Application caught signal 11.
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
____________
ID: 98934 |
Andris Pavenis
Joined: Feb 24 05 Posts: 3 ID: 36157 Credit: 798,698 RAC: 1,073
The message about file format not recognized is only after the real problem:
[14:53:37][5101][ERROR] Application caught signal 11.
So the application crashes, and the message appears while the stack backtrace is being written. For comparison: do you have 32- or 64-bit Linux? I have 64-bit, and I'm getting crashes similar to those in an earlier message in this thread.
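The point generalizes: in these stderr logs the first [ERROR] line is the actual failure, and what follows it ("called boinc_finish", the frame dump, "File format not recognized") is written afterwards by the crash handling. A minimal Python sketch of that triage, assuming the "[hh:mm:ss][pid][LEVEL] message" format seen in this thread; first_error and signal_name are hypothetical helpers, not part of BOINC or the Einstein@Home app:

```python
import re
import signal

# Matches e.g. "[22:53:43][14614][ERROR] Application caught signal 11."
# groups: time, pid, level, message (level may be padded, as in "[INFO ]")
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")


def first_error(lines):
    """Return (time, pid, message) for the first ERROR entry, or None if there is none."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group(3) == "ERROR":
            return m.group(1), int(m.group(2)), m.group(4)
    return None


def signal_name(message):
    """Map '... signal N' in a message to a symbolic name such as SIGSEGV, if present."""
    m = re.search(r"signal (\d+)", message)
    return signal.Signals(int(m.group(1))).name if m else None


stderr = [
    "[22:53:41][14614][INFO ] Seed for random number generator is 977043268.",
    "[22:53:43][14614][ERROR] Application caught signal 11.",
    "called boinc_finish",
    "einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized",
]
err = first_error(stderr)
print(err, signal_name(err[2]))
```

Signal 11 is SIGSEGV (a segmentation fault), which is why the later "File format not recognized" line is a symptom of the crash report rather than its cause.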
It works without problems on my notebook (Intel Core2 Duo P7350 +
GeForce 9650M GT, driver 180.22-2).
It is about 20-30% faster for one job than with the CPU only.
Sometimes it runs three jobs at once: two S5R5 and one ABP1 on CPU+GPU. Note that there are only two CPUs and one GPU. In such a case one of the CPUs seems to run one S5R5 task and the ABP1 task (together with the GPU) at the same time. Is this intended?
Unfortunately, I get no work for the new application. BOINC reports:
Message from Server: (Project has no jobs available)
What seems strange is, while I get the message "Found app_info.xml; using anonymous platform" as described in the installation instructions, several lines further down I get "Can't load libcudart".
I am sure that the CUDA libs are in the search path: After re-running ldconfig, everything is visible to the binary:
I have installed the CUDA application as described on the website, but moved the libs manually to /usr/local/lib/cuda/. The system is a current Arch Linux with NVIDIA drivers version 185.18.31-1, BOINC version 6.4.5 and the CUDA libs that are delivered with the Einstein Beta download. Hardware: Pentium M 2.1 GHz (i686, not a 64-bit architecture) and a simple GeForce 8400 GS.
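The client's "Can't load libcudart" message refers to its own attempt to load the library, so a check from a shell is only meaningful if it runs in a comparable environment. A minimal Python sketch using the standard ctypes module; can_load is a hypothetical helper. It reflects only the environment of the process that runs it: if the BOINC client runs as a daemon with a different LD_LIBRARY_PATH, or the library and the process differ in word size (32 vs 64 bit), the client's result can differ.

```python
import ctypes
import ctypes.util


def can_load(name):
    """Return True if the dynamic loader can open lib<name> from this process.

    The search uses this process's environment (LD_LIBRARY_PATH, the ldconfig
    cache), which may differ from the BOINC client's environment.
    """
    candidates = []
    found = ctypes.util.find_library(name)  # a versioned soname, or None
    if found:
        candidates.append(found)
    candidates.append("lib%s.so" % name)  # unversioned name as a fallback
    for candidate in candidates:
        try:
            ctypes.CDLL(candidate)
            return True
        except OSError:
            pass  # not found or not loadable (e.g. wrong architecture)
    return False


if __name__ == "__main__":
    print("libcudart loadable:", can_load("cudart"))
```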
Unfortunately, I get no work for the new application. BOINC reports:
Message from Server: (Project has no jobs available)
I recently get that at every (successful) download:
27/08/2009 13:09:06|Einstein@Home|Scheduler request succeeded: got 1 new tasks
27/08/2009 13:09:06|Einstein@Home|[sched_ops_debug] Server version 607
27/08/2009 13:09:06|Einstein@Home|Message from server: (Project has no jobs available)
Gundolf, I can confirm that there is a workunit downloaded immediately before the "no jobs available" message. However, I think that that is a WU for the "old", non-CUDA application because after the post-install BOINC restart, I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you run the CUDA-enabled one?
I only wanted to say that you can't trust the "no jobs available" message.
I'm sure that I don't run CUDA, since I don't have such a device :-)
ID: 99077 |
Mike Hewson Forum moderator
Joined: Dec 1 05 Posts: 1868 ID: 135571 Credit: 4,435,243 RAC: 5,182
I only wanted to say that you can't trust the "no jobs available" message.
Yeah ..... I see that frequently and yet the machine(s) is/are certainly not idle. I've been ignoring it since all is otherwise running fine.
Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Well, on a different computer with a different card, it does not work.
Intel Core i7, GeForce GTX 260, driver 185.18.36.
BOINC shows only a CPU time of 1 or 2 seconds, a progress of 100%, and that the task is running with 1 CPU and 1 CUDA device. It has stayed like this for several hours, currently about 15.
The other 7 CPUs work until their current tasks are finished, but no new tasks are started.
This looks like the same or a similar problem to the one SciTechGrid reported, with a corresponding stderrout.txt.
ID: 99102 |
Michael Karlinsky
Joined: Jan 22 05 Posts: 665 ID: 6887 Credit: 1,208,578 RAC: 1,928
The message about file format not recognized is only after the real problem:
[14:53:37][5101][ERROR] Application caught signal 11.
So the application crashes, and the message appears while the stack backtrace is being written. For comparison: do you have 32- or 64-bit Linux? I have 64-bit, and I'm getting crashes similar to those in an earlier message in this thread.
Andris
Same here (SUSE 11.1 64-bit, NVIDIA 190.* beta (sorry, I could not get hold of 181.20 as suggested by Bernd), 9800GT Green).
Ubuntu Jaunty 64-bit, NVIDIA 9800GTX+, drivers 180.44.
Quote from log:
Sat 12 Sep 2009 10:36:31 AM CEST Einstein@Home [error] File einstein_S5R5_1.06_graphics_i686-pc-linux-gnu has wrong size: expected 3832104, got 12272855
It may be that I made a mistake... I'll try again when these workunits are finished, and report if something goes wrong.
Sorry.
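The log line shows which check failed: the client compared the size of the file on disk with the size it expected for that file. A minimal Python sketch of the same comparison, useful for finding by hand which files in a project directory were replaced; check_size is a hypothetical helper, and it checks only the size (the client may verify more than that):

```python
import os
import tempfile


def check_size(path, expected_bytes):
    """Return an error string in the style of the BOINC message, or None if sizes match."""
    actual = os.path.getsize(path)
    if actual != expected_bytes:
        return "File %s has wrong size: expected %d, got %d" % (
            os.path.basename(path), expected_bytes, actual)
    return None


# Example with a throwaway 5-byte file.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "example_file")
    with open(p, "wb") as f:
        f.write(b"12345")
    results = (check_size(p, 5), check_size(p, 3))
print(results)
```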
____________
ID: 99389 |
Andris Pavenis
Joined: Feb 24 05 Posts: 3 ID: 36157 Credit: 798,698 RAC: 1,073
I have not tested that with Einstein@HOME CUDA, only with the Seti@HOME beta test binary for Linux. It behaves similarly when started from BOINC (it crashes and terminates, but does not hang). When started directly in standalone mode, it completes, and the results were close to, but not exactly the same as, the test results.
Perhaps the reason is that the 9800GX has only 32-bit floating-point arithmetic, AFAIK.
What is similar is that it also crashes when started from BOINC.
On the other hand, GPUGRID does not crash in this way.
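The guess is plausible in principle: 9800-series GPUs (compute capability 1.1) have no double-precision hardware, and floating-point results depend on precision and on the order of operations. A minimal Python sketch, not a model of the SETI@home or Einstein@Home code, that emulates single-precision accumulation by rounding after every addition and compares it with double precision:

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(values, single=False):
    """Sum left to right, optionally rounding to float32 after each step."""
    acc = 0.0
    for v in values:
        if single:
            # Both operands are float32 values; rounding their double sum to
            # float32 gives the same result as a native float32 addition.
            acc = to_f32(acc + to_f32(v))
        else:
            acc += v
    return acc


data = [0.1] * 1000
s64 = running_sum(data)
s32 = running_sum(data, single=True)
rel_diff = abs(s32 - s64) / s64
print("double: %.10f  single: %.10f  relative difference: %.2e" % (s64, s32, rel_diff))
```

The two sums agree to several digits but not exactly, which is the kind of "close but not identical" result described above; whether precision explains this particular discrepancy is not established here.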
____________
ID: 99394 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
I have not tested that with Einstein@HOME CUDA, only with the Seti@HOME beta test binary for Linux. It behaves similarly when started from BOINC (it crashes and terminates, but does not hang). When started directly in standalone mode, it completes, and the results were close to, but not exactly the same as, the test results.
Perhaps the reason is that the 9800GX has only 32-bit floating-point arithmetic, AFAIK.
What is similar is that it also crashes when started from BOINC.
On the other hand, GPUGRID does not crash in this way.
It must be related to the combination of driver and BOINC version: I'm running this app with driver version 180.44 under Suse Linux 64 bit and BOINC version 6.4.5 on a 9800 GT eco card. No problem at all:
Also, when I was running GPUGRID, my graphics card got much hotter; while running Einstein@home, it's as if it's doing nothing. Everything in the BOINC settings seems to work fine and it can see my graphics processor, but something must be wrong.
____________
ID: 99449 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
Also, when I was running GPUGRID, my graphics card got much hotter; while running Einstein@home, it's as if it's doing nothing. Everything in the BOINC settings seems to work fine and it can see my graphics processor, but something must be wrong.
Thanks for the report. This error seems to be an "interesting" one; I forwarded it to the devs. I don't think it's related to the driver or libs; it could be something "deeper" in the implementation.
CU
Bikeman
I've finished a few more units; the same thing happens.
I'll try upgrading the drivers to version 185.18.36 and then see what happens.
____________
I've finished a few more units; the same thing happens.
I'll try upgrading the drivers to version 185.18.36 and then see what happens.
I tried the new drivers and now I have a different problem. The unit reached 100% in 2 seconds, but its status stayed "Running (1.00 CPUs, 1 CUDA)", and indeed one processor was unavailable even though it was idle.
These are really strange problems; I don't know if I can give you any data about this one.
____________
ID: 99490 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
I've finished a few more units; the same thing happens.
I'll try upgrading the drivers to version 185.18.36 and then see what happens.
I tried the new drivers and now I have a different problem. The unit reached 100% in 2 seconds, but its status stayed "Running (1.00 CPUs, 1 CUDA)", and indeed one processor was unavailable even though it was idle.
These are really strange problems; I don't know if I can give you any data about this one.
I have now got one of these "error code 3" errors myself:
with a similar driver and BOINC configuration. The developers are looking into this. For my host most of the results work just fine, tho.
Regards
Bikeman
____________
ID: 99495 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
Well, actually I got two error results in a row after weeks of flawless operation. I rebooted, and the next result was OK again. Go figure. This could well be a hardware or driver issue; from what I have heard from the developers, it doesn't look like an application bug so far.
CU
Bikeman
I'm not sure what to think. I returned to gpugrid with 185.18.36 drivers and there everything is working fine. I guess I'll try again when I finish units that I have.
Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R 97 0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R 87 0.9 70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R 52 0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_ (gpugrid unit)
And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@home I can see four processes at 100%, and the GPU stays at idle temperature (or the difference is within normal fluctuation), as if the CPU were doing all the work (and of course, in the end, I get the error I was talking about).
Do you see a notable increase in GPU temperature?
ID: 99517 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R 97 0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R 87 0.9 70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R 52 0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_ (gpugrid unit)
And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@home I can see four processes at 100%, and the GPU stays at idle temperature (or the difference is within normal fluctuation), as if the CPU were doing all the work (and of course, in the end, I get the error I was talking about).
Do you see a notable increase in GPU temperature?
There are two different kinds of Einstein@Home science apps: "S5R5" (search for gravitational waves in LIGO data) and "ABP1" (search for binary pulsars in Arecibo radio astronomy data). Only the ABP1 Beta test app (that's what this thread is about) will use the GPU; the S5R5 app is CPU only. Whether you get jobs for S5R5 or ABP1 is more or less random.
The ABP1 search will probably show up in your top output as "einsteinbinary_". If you see this one, you should notice a modest rise in GPU temperature, probably not as high as that of the GPUgrid app, tho.
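A quick way to tell the two Einstein@Home apps apart is therefore the command name in top. A minimal Python sketch that labels lines of a top listing by command prefix, assuming top's default column layout as in the listing quoted above (top may truncate long command names, so only prefixes are matched); classify_top_line and the labels are hypothetical:

```python
# Command-name prefixes as they show up in the top listings in this thread;
# acemd_ is the GPUGRID application in the listing above.
APP_PREFIXES = [
    ("einsteinbinary_", "Einstein@Home ABP1 (CPU + GPU)"),
    ("einstein_S5R5", "Einstein@Home S5R5 (CPU only)"),
    ("acemd_", "GPUGRID (GPU)"),
]


def classify_top_line(line):
    """Return (pid, %CPU, label) for a process line of top's default layout, else None."""
    fields = line.split()
    # Default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    if len(fields) < 12 or not fields[0].isdigit():
        return None  # header or unrelated line
    pid, cpu, command = int(fields[0]), float(fields[8]), fields[11]
    for prefix, label in APP_PREFIXES:
        if command.startswith(prefix):
            return pid, cpu, label
    return pid, cpu, "other"


listing = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1",
    "14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_",
]
classified = [classify_top_line(l) for l in listing]
for row in classified:
    print(row)
```

If only einstein_S5R5 processes appear, the host is running the CPU-only search, and no GPU temperature rise is expected.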
CU
Bikeman
____________
ID: 99520 |
ralph
Joined: Dec 11 08 Posts: 1 ID: 434814 Credit: 2,578 RAC: 0
Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@home. For example, when running GPUGRID (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R 97 0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R 87 0.9 70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R 52 0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_ (gpugrid unit)
And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@home I can see four processes at 100%, and the GPU stays at idle temperature (or the difference is within normal fluctuation), as if the CPU were doing all the work (and of course, in the end, I get the error I was talking about).
Do you see a notable increase in GPU temperature?
There are two different kinds of Einstein@Home science apps: "S5R5" (search for gravitational waves in LIGO data) and "ABP1" (search for binary pulsars in Arecibo radio astronomy data). Only the ABP1 Beta test app (that's what this thread is about) will use the GPU; the S5R5 app is CPU only. Whether you get jobs for S5R5 or ABP1 is more or less random.
The ABP1 search will probably show up in your top output as "einsteinbinary_". If you see this one, you should notice a modest rise in GPU temperature, probably not as high as that of the GPUgrid app, tho.
CU
Bikeman
GPUGRID experienced a series of errors when the NVIDIA Linux 185+ drivers came out. They found a workaround that solved the problem. The errors occurring here look similar in nature: people with 180 drivers can process the WUs, but people with 185 or 190 drivers cannot.
The programmers may want to contact the Gpugrid people to see how they fixed their issue with the 185+ Linux Nvidia drivers.
I was able to process WUs with the 1.09 version of the application, but the new 1.10 version goes to 100% immediately and stays stuck there. This is identical to the type of error I used to experience with GPUGRID when the new NVIDIA drivers were released.
Good luck in sorting out the problem.
ID: 99522 |
Stephan Goll
Joined: Dec 13 05 Posts: 7 ID: 147518 Credit: 745,270 RAC: 2,626
Dear Bernd,
I tried CUDA ... but I had only limited success. Only CUDA 2.3 is detected on my computer; older NVIDIA drivers will load, but the CUDA toolkit either will not compile (2.1) or simply does not work (2.2).
It's this little box:
http://einstein.phys.uwm.edu/show_host_detail.php?hostid=2069906
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5102101
Intel Atom 330, nVidia Ion chipset, 2 GB ram, debian 64, kernel 2.6.30 from debian backports, CUDA software from http://www.nvidia.com/object/cuda_get.html.
S@H WUs seem to work; E@H WUs will not even start.
19-Sep-2009 09:57:41 [---] Starting BOINC client version 6.6.36 for x86_64-pc-linux-gnu
19-Sep-2009 09:57:41 [---] log flags: task, file_xfer, sched_ops
19-Sep-2009 09:57:41 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
19-Sep-2009 09:57:41 [---] Running as a daemon
19-Sep-2009 09:57:41 [---] Data directory: /home/boinc
19-Sep-2009 09:57:41 [---] Processor: 4 GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz [Family 6 Model 28 Stepping 2]
19-Sep-2009 09:57:41 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm lahf_lm
19-Sep-2009 09:57:41 [---] OS: Linux: 2.6.30
19-Sep-2009 09:57:41 [---] Memory: 1.47 GB physical, 250.98 MB virtual
19-Sep-2009 09:57:41 [---] Disk: 4.58 GB total, 2.12 GB free
19-Sep-2009 09:57:41 [---] Local time is UTC +1 hours
19-Sep-2009 09:57:42 [---] CUDA device: ION (driver version 0, compute capability 1.1, 509MB, est. 6GFLOPS)
19-Sep-2009 09:57:42 [Einstein@Home] Found app_info.xml; using anonymous platform
19-Sep-2009 09:57:42 [---] Not using a proxy
19-Sep-2009 09:57:42 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 5102101; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 2069906; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] General prefs: from Einstein@Home (last modified 02-Mar-2007 22:05:08)
19-Sep-2009 09:57:42 [Einstein@Home] Computer location: home
19-Sep-2009 09:57:42 [---] General prefs: using separate prefs for home
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when active to 752.68MB
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when idle to 1354.82MB
19-Sep-2009 09:57:42 [---] Preferences limit disk usage to 2.29GB
Best regards,
Stephan
____________
ID: 99529 |
Jos van Wolput
Joined: Feb 11 05 Posts: 39 ID: 14928 Credit: 233,705 RAC: 458
I installed BOINC 6.10.6, which detects ATI GPUs.
Does this CUDA app 1.10 work with ATI GPU?
____________
ID: 99592 |
Richard Haselgrove
Joined: Dec 10 05 Posts: 579 ID: 144054 Credit: 2,965,981 RAC: 2,347
I installed BOINC 6.10.6, which detects ATI GPUs.
Does this CUDA app 1.10 work with ATI GPU?
No.
'CUDA' is specifically a trade name for the NVIDIA architecture.
sam 26 sep 2009 03:09:47 CEST Starting BOINC client version 6.10.4 for x86_64-pc-linux-gnu
sam 26 sep 2009 03:09:47 CEST log flags: task, file_xfer, sched_ops
sam 26 sep 2009 03:09:47 CEST Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
sam 26 sep 2009 03:09:47 CEST Data directory: /home/koubi/Desktop/BOINC
sam 26 sep 2009 03:09:47 CEST Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ [Family 15 Model 107 Stepping 2]
sam 26 sep 2009 03:09:47 CEST Processor: 512.00 KB cache
sam 26 sep 2009 03:09:47 CEST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefe
sam 26 sep 2009 03:09:47 CEST OS: Linux: 2.6.30.7
sam 26 sep 2009 03:09:47 CEST Memory: 3.86 GB physical, 956.93 MB virtual
sam 26 sep 2009 03:09:47 CEST Disk: 145.79 GB total, 16.66 GB free
sam 26 sep 2009 03:09:47 CEST Local time is UTC +2 hours
sam 26 sep 2009 03:09:47 CEST NVIDIA GPU 0: GeForce GTX 260 (driver version 0, CUDA version 2020, compute capability 1.3, 895MB, est. 117GFLOPS)
sam 26 sep 2009 03:09:47 CEST Can't load library libaticalrt.so
sam 26 sep 2009 03:09:47 CEST Einstein@Home Found app_info.xml; using anonymous platform
Task ID 140776480
Name p2030_53837_39307_0070_G63.81+00.12.C_6.dm_619_1
Workunit 59072881
Created 25 Sep 2009 1:57:53 UTC
Sent 25 Sep 2009 21:29:58 UTC
Received 26 Sep 2009 7:50:18 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 2093030
Report deadline 9 Oct 2009 21:29:58 UTC
CPU time 15969.72
stderr out
<core_client_version>6.10.4</core_client_version>
<![CDATA[
<stderr_txt>
[23:30:19][21754][INFO ] Starting data processing...
[23:30:19][21754][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[23:30:19][21754][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:30:19][21754][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[23:30:21][21754][INFO ] Seed for random number generator is -1148624978.
[23:30:22][21754][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[23:31:19][21754][INFO ] Checkpoint committed!
[00:26:09][27652][INFO ] Starting data processing...
[00:26:09][27652][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[00:26:09][27652][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4375
[00:26:09][27652][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[00:26:11][27652][INFO ] Seed for random number generator is -1148624978.
[00:26:12][27652][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[00:27:10][27652][INFO ] Checkpoint committed!
[00:28:10][27652][INFO ] Checkpoint committed!
[00:29:11][27652][INFO ] Checkpoint committed!
[00:30:11][27652][INFO ] Checkpoint committed!
[01:15:26][32689][INFO ] Starting data processing...
[01:15:26][32689][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:15:26][32689][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4698
[01:15:26][32689][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:15:28][32689][INFO ] Seed for random number generator is -1148624978.
[01:15:29][32689][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:52:58][8338][INFO ] Starting data processing...
[01:52:58][8338][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:52:58][8338][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7081
[01:52:58][8338][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:53:00][8338][INFO ] Seed for random number generator is -1148624978.
[01:53:01][8338][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:53:58][8338][INFO ] Checkpoint committed!
[06:47:37][27973][INFO ] Starting data processing...
[06:47:37][27973][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[06:47:37][27973][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7157
[06:47:37][27973][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[06:47:39][27973][INFO ] Seed for random number generator is -1148624978.
[06:47:40][27973][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[09:50:13][27973][INFO ] Data processing finished successfully!
called boinc_finish
</stderr_txt>
]]>
Validate state Valid
Claimed credit 77.6536025819315
Granted credit 250
application version 1.10
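A side note on the "Derived global search parameters" block in the log above: the printed thresholds appear to be consistent with a simple rule. thr1 equals -ln(1.2977e-08), and each thrN seems to be the value at which the tail probability of a sum of N unit-mean exponential (noise) powers equals that same single-bin probability. The sketch below only checks that arithmetic. It is an assumption that the app derives its thresholds this way, and the function names are hypothetical, not taken from the app's source.

```python
import math

# Values copied from the "Derived global search parameters" block of the log.
SINGLE_BIN_PROB = 1.2977e-08
LOGGED_THRESHOLDS = {1: 18.1601, 2: 21.263, 4: 26.2923, 8: 34.674, 16: 48.9881}


def gamma_survival(n: int, x: float) -> float:
    """P(S > x) for S = sum of n independent unit-mean exponentials.

    For integer n this is exp(-x) * sum_{k=0}^{n-1} x**k / k!.
    """
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= x / k  # builds x**k / k! incrementally
        total += term
    return math.exp(-x) * total


def threshold(n: int, p: float) -> float:
    """Solve gamma_survival(n, x) == p for x by bisection (survival decreases in x)."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_survival(n, mid) > p:
            lo = mid  # tail still too heavy, so the threshold must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n, logged in LOGGED_THRESHOLDS.items():
        print(f"thr{n}: computed {threshold(n, SINGLE_BIN_PROB):.4f}, logged {logged}")
```

For N = 1 the loop body never runs and the threshold reduces to -ln(p), which is why thr1 is simply 18.1601.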
GTX 260 216SP GPU is overclocked: core @ 756 MHz, memory @ 1096 MHz, shaders @ 1512 MHz
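The "979.78 GFLOPS" the app prints for this card matches the peak estimate commonly quoted for GeForce 8/9/GT200-class GPUs: shader count x shader clock x 3 flops per clock (a dual-issued MAD plus MUL), using the 216 shaders and the overclocked 1512 MHz shader clock given above. A minimal sketch of that arithmetic follows; whether the app computes its figure exactly this way is an assumption, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(shader_count: int, shader_clock_mhz: float, flops_per_clock: int = 3) -> float:
    """Peak single-precision GFLOPS estimate: shaders x shader clock x flops per clock.

    flops_per_clock=3 assumes one MAD (2 flops) plus one MUL (1 flop) per shader per
    clock, the figure commonly used for GeForce 8/9/GT200-class GPUs.
    """
    return shader_count * shader_clock_mhz * flops_per_clock / 1000.0  # MFLOPS -> GFLOPS


if __name__ == "__main__":
    # GTX 260 "216SP" at the overclocked 1512 MHz shader clock from the post above.
    print(f"{peak_gflops(216, 1512):.2f} GFLOPS")  # prints 979.78 GFLOPS
```

Because the estimate scales linearly with the shader clock, an overclocked card reports a proportionally higher GFLOPS figure than the same model at stock clocks.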
ID: 99645 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
Hello, I tried CUDA app 1.10:
Looks good!!!
CU
Bikeman
____________
ID: 99654 |
Skip Da Shu
Joined: Jan 18 05 Posts: 49 ID: 3628 Credit: 2,134,462 RAC: 2
I'm getting compute errors on between 1/3 and 1/2 of the WUs so far. Lowered the clock on the card a bit tonight, so we'll see if that makes any difference.
...which is strange because the runtime is anything but excessive.
Hmm... here's a question for the BOINC experts: how does BOINC decide, for a CUDA app, when the maximum time is reached? I know that workunits usually contain an estimated maximum number of floating-point operations that processing the result could reasonably consume in the worst case. Now, with a CUDA app that takes 1 CPU plus 1 GPU in parallel, what does this tell BOINC? Will BOINC try to limit CPU time using the CPU benchmark, or will it use the estimated GPU performance (usually two orders of magnitude greater than that of a single CPU core!!)?
Thanks for any insights
CU
Bikeman
____________
ID: 99796 |
Skip Da Shu
Joined: Jan 18 05 Posts: 49 ID: 3628 Credit: 2,134,462 RAC: 2
I'm up to over a 50% failure rate now. Funny thing: it doesn't seem to be the GPUGRID FFT error but a mix of things.
Think I need to put E@H on hold for a bit and see if the Collatz 64b Linux app works on this card.
UPDATE: Collatz seems to work once I figured out the right symlink to add: 2 of 2 since the symlink fix, and now under v6.10.11.
____________ - da shu @ the BOINC farm, SkipsJunk, Guru Mountain, Crunchers
ID: 99800 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
For those suffering from "signal 11" (segmentation fault) crashes of their ABP1 CUDA apps with Linux drivers 185.x and 190.x: I stumbled over a possible workaround for the moment, but I'm a bit reluctant to share it here in public because I'd rather have it tested by one or two other volunteers first. If you are interested, drop me a PM.
For all of you who report CUDA-related problems, please always include the BOINC version you are using. Please make sure that you use at least BOINC 6.6, because 6.4 still has several known CUDA-related issues.
Thanks,
Oliver
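When checking the "at least BOINC 6.6" advice against the versions mentioned in this thread, note that version strings have to be compared numerically: as plain strings, "6.10.11" sorts before "6.6". A minimal sketch, assuming purely numeric dotted versions; `parse_version` and `at_least` are hypothetical helpers, not part of BOINC.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "6.10.11" into (6, 10, 11)."""
    return tuple(int(part) for part in version.split("."))


def at_least(version: str, minimum: str) -> bool:
    """Numeric, component-by-component comparison of dotted versions."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    # Client versions mentioned in this thread.
    for v in ["5.10.21", "6.2.14", "6.4.5", "6.6.36", "6.6.41", "6.10.11"]:
        print(v, "OK for CUDA" if at_least(v, "6.6") else "too old")
```

Tuples compare element by element, so (6, 10, 11) >= (6, 6) holds, whereas the string comparison stops at "1" < "6".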
At least on my Debian 5 (stable) systems, BOINC 6.6 has an even worse issue:
it does not connect to any project, so there seems to be no
newer usable version than 6.4 :-(
But this might be different with other distributions, with experimental versions
of Debian, or with the next stable version of Debian ...
Interesting. Which 6.6.x BOINC version did you try? Did you use one of the official binary downloads or did you compile it yourself?
Cheers,
Oliver
Well, I tried 6.6.36 as provided directly by BOINC on three computers
(two without GPU, one with GPU).
I did not install it over a previous version, but in a new directory, as I
did for previous versions too, to avoid possible problems with leftovers from
previous versions.
On two other computers I still use 5.10.21, because three computers share the
same home directory, which no longer seems to work with BOINC version 6;
but those computers have no GPU, so there is no need to update ;o)
What counts as stable or convenient for a normal user seems to
depend on the number and arrangement of the computers used for BOINC at
the same time ;o)
I think that within Debian, version 6.4.5 is considered the current
experimental/unstable version, and stable is 6.2.14.
However, what Debian considers a stable or unstable NVIDIA driver
was already too old for the GPU in the newest computer, so I already had
some work to do to get the GPU working with the newest driver that
NVIDIA considers stable enough to publish ;o)
Maybe there is a similar incompatibility with newer BOINC versions...
I use the stable version of Debian Lenny, and 6.6.36 doesn't have a problem connecting to any projects. I installed it by copying the files from the official BOINC download into /usr/bin, IIRC.
I'm still using the 2.6.26-1-686-bigmem kernel; I see you have 2.6.26-2-686-bigmem, but I doubt that's why.
____________ Team Philippines
After some more connection problems, I have now managed to get BOINC 6.6.41 working,
including CUDA, on an Intel Core i7 with a GeForce GTX 260 and driver 185.18.36
(the machine already mentioned above), hopefully with useful results now ;o)
Your rig is very similar to mine (i7 + GTX 260, 32-bit Lenny with PAE kernel); maybe I should give CUDA another try. I will test 6.6.41 and 185.18.36 later, then.
____________ Team Philippines
ID: 100200 |
Michael Karlinsky
Joined: Jan 22 05 Posts: 665 ID: 6887 Credit: 1,208,578 RAC: 1,928
Hi all,
is it possible to run the S6 application alongside CUDA with the app_info provided in the Linux beta tar?
If not, can you tell me what to modify?
Michael
edit: S6 application means: "Hierarchical S5 all-sky GW search #6"
____________
Team Linux Users Everywhere
ID: 100331 |
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
Michael Karlinsky
Joined: Jan 22 05 Posts: 665 ID: 6887 Credit: 1,208,578 RAC: 1,928
Hi Bikeman,
Sure, all S5R5 jobs are long gone. Thanks for posting the app_info; I never did this before because it was never really necessary.
I tried the CUDA app (again) without app_info, because your post came a little late.
I still get the same error messages I posted last time (BOINC 6.6.41; NVIDIA 190.42). GPUGRID is not working either. I suspect the card itself might be broken (the fan only speeds up while booting) or the OS (SUSE 11.1 seems to have a problem; I read that on the GPUGRID boards).
So no CUDA for me ATM :(
Michael
PS [edit] I read somewhere (NVIDIA release notes?) that it is possible to adjust the fan speed using the NVIDIA GUI if Coolbits is enabled, but I did not find it.
Hi!
Sure, this app.xml should do (I assume you've already run out of S5R5 jobs)
Bikeman Forum moderator Volunteer developer
Joined: Aug 28 06 Posts: 2056 ID: 210833 Credit: 5,083,882 RAC: 9,728
As for the CUDA app and 190 drivers, you might want to try the latest 6.10.x BOINC version.
CU
Bikeman
____________
ID: 100350 |
Michael Karlinsky
Joined: Jan 22 05 Posts: 665 ID: 6887 Credit: 1,208,578 RAC: 1,928
Hi all.
Maybe this will help. I didn't try it myself, because my card doesn't support CUDA:
http://aldeby.org/blog/index.php/enable-nvidia-coolbits-frequency-tuner.html
Edit: Another link:
http://www.linuxhardware.org/nvclock/
Thanks, but I thought it was possible via the NVIDIA GUI. Overclocking options are available, but nothing about fans.
This material is based upon work supported by the National Science
Foundation (NSF) under Grant NSF-0200852 and by the Max Planck
Gesellschaft (MPG). Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the investigators
and do not necessarily reflect the views of the NSF or the MPG.