Posts by Claggy

1) Message boards : Problems and Bug Reports : Einstein@Home GPU/APU Application for AMD/ATI Graphics Cards: discussion thread (Message 137766)
Posted 4 days ago by Claggy
Thanks for your suggestions, but unfortunately things haven't worked. Although I can get things working immediately after I install BOINC (only after installing the ATI drivers first), as soon as I reboot, it no longer sees the GPUs.

How about:

Debian/Ubuntu/Mint/Derivatives - GPU recognition fixes

Issues & Solutions:

ISSUE #1: BOINC starts before GDM has finished starting up, which makes BOINC think the video card is absent. This can occur with both ATI and Nvidia cards, but I first found it on Nvidia-equipped machines. If BOINC doesn't find the card right after a reboot but does recognize it once you restart BOINC, this is likely your problem. It can be sporadic on some machines, where the card is found after one reboot but not another. If this is the only problem, the card will be found every time after a restart of just BOINC:
sudo /etc/init.d/boinc-client restart

See history here: https://bugs.launchpad.net/ubuntu/+source/boinc/+bug/414244

FIX #1: My fix was to add a delay to the start-up script so that GDM/X has a chance to load the drivers fully. This inserts a 6 second delay in the boinc start-up script, which has allowed ample time for the driver to become available on all machines tested to date.
(a) Edit the start-up script /etc/init.d/boinc-client with: sudo gedit /etc/init.d/boinc-client
(b) Find this function and add the line "sleep 6" where it's shown here:
start()
{
  log_begin_msg "Starting $DESC: $NAME"
  if is_running; then
    log_progress_msg "already running"
  else
    sleep 6
    start-stop-daemon --start --quiet --background --pidfile $PIDFILE \
      --make-pidfile --user $BOINC_USER --chuid $BOINC_USER \
      --chdir $BOINC_DIR --exec $BOINC_CLIENT -- $BOINC_OPTS
  fi
  log_end_msg 0

  if [ "$SCHEDULE" = "1" ]; then
    schedule
  fi
}


FIX #2: Modify the start-up sequence of the /etc/init.d/boinc-client script.
(a) I got this from gfarmerfr, an ATI/Ubuntu user on the DNETC forums, after telling him how I was putting the delay in. I haven't actually tried it, but it worked for gfarmerfr and he knows what he's doing. I believe this was tested on v9.10. In many ways this is a better fix, but I was already in the script for other reasons (adding fan speed control), so I stuck with my sleep 6 version.
(b) Execute the following two commands to move BOINC start-up to the end of start-up processing:
sudo update-rc.d -f boinc-client remove
sudo update-rc.d boinc-client defaults 99


Claggy
2) Message boards : Problems and Bug Reports : Progress bars stuck (Message 137749)
Posted 5 days ago by Claggy
I'm sorry, but I'm at a complete loss as to what you're trying to tell me :-).

Since FVIT hasn't responded, he has probably decided that E@H behaviour is not to his liking. He hasn't downloaded any further work and I suspect he may not even be listening to any suggestions being made.

However, since you responded to my post specifically, can you please enlighten me as to what I'm missing?

Basically his first two tasks made progress with no restarts, and had he not aborted his 2nd task it would probably have finished O.K.

The third task he was obviously watching when it started, and he didn't like it when Boinc showed progress and then jumped back to zero when he repeatedly caused the app to exit.
Basically, if he's going to watch the kettle come to the boil, it's going to take longer than he expects; he should go and do something else and not worry about it.
If he's got 'Leave tasks in memory while suspended?' set to no and he interrupts progress repeatedly, he's not going to make much progress on any project apps.

Boinc showing progress before an app checkpoints is a curveball that the devs thought was a good idea, but it confuses volunteers.

Claggy


Could the reason it repeatedly restarts Boinc crunching be the "while processor usage is less than" setting? I guess my question is: could his PC be doing something that forces Boinc to quit and resume multiple times?

No, it should stay in memory during 'Suspend work if CPU usage is above'. Even GPU apps stay in memory during it.
(There is a bug with the current stock ATI/AMD Seti v7 app: it should stop during benchmarks and 'Suspend work if CPU usage is above' occasions, but stay in memory.
The x41zc Cuda app does stop (and stays in memory); the ATI/AMD Seti v7 app doesn't respond.)

Claggy
3) Message boards : Problems and Bug Reports : Progress bars stuck (Message 137736)
Posted 5 days ago by Claggy
I'm sorry, but I'm at a complete loss as to what you're trying to tell me :-).

Since FVIT hasn't responded, he has probably decided that E@H behaviour is not to his liking. He hasn't downloaded any further work and I suspect he may not even be listening to any suggestions being made.

However, since you responded to my post specifically, can you please enlighten me as to what I'm missing?

Basically his first two tasks made progress with no restarts, and had he not aborted his 2nd task it would probably have finished O.K.

The third task he was obviously watching when it started, and he didn't like it when Boinc showed progress and then jumped back to zero when he repeatedly caused the app to exit.
Basically, if he's going to watch the kettle come to the boil, it's going to take longer than he expects; he should go and do something else and not worry about it.
If he's got 'Leave tasks in memory while suspended?' set to no and he interrupts progress repeatedly, he's not going to make much progress on any project apps.

Boinc showing progress before an app checkpoints is a curveball that the devs thought was a good idea, but it confuses volunteers.

Claggy
4) Message boards : Problems and Bug Reports : Can't contact scheduler, not even to report tasks (Message 137720)
Posted 6 days ago by Claggy
Your computers are hidden and even though you have provided a hostID, that only lets me see that you have 73 'in progress' results. It doesn't give me a link to the scheduler contact logs which would give the last successful exchange between your host and the scheduler. It's a bit of a long shot but perhaps you could take a look and tell us if there is anything of interest in that log?

You can work that out:

http://einstein.phys.uwm.edu/host_sched_logs/11708/11708865
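
How the link above can be worked out from a host ID, as a small sketch. The function name sched_log_url is made up for illustration, and the rule (the directory is the host ID with its last three digits dropped) is only an assumption inferred from the scheduler-log links quoted in these posts, not a documented scheme:

```shell
#!/bin/sh
# Build the scheduler-log URL for a host ID.
# Assumption: the directory component is the host ID with its last three
# digits dropped (integer division by 1000), as the quoted links suggest.
sched_log_url() {
    host_id=$1
    echo "http://einstein.phys.uwm.edu/host_sched_logs/$((host_id / 1000))/${host_id}"
}

sched_log_url 11708865
```

For host 11708865 this prints the same address as the link above.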

Claggy
5) Message boards : Problems and Bug Reports : Progress bars stuck (Message 137719)
Posted 6 days ago by Claggy
FernValleyIT's first task completed in one go without being restarted, all 262 checkpoints.

Their 2nd task got to Sky point 16/28 (checkpoint 15 of 29-odd) in one go without restarting, before being aborted.

Their 3rd task got about 80% of the way to the first checkpoint before being restarted (during this period Boinc estimates progress, counting up second by second),
then was restarted again very shortly afterwards. It then progressed all the way to the 1st checkpoint (from here Boinc shows the actual progress the app reports)
and 90% of the way to the 2nd before being restarted again (until it reaches the 2nd checkpoint the progress will not change).
It then progressed for a short period before being aborted.

a watched pot never boils

07:17:07 (6364): [normal]: This Einstein@home App was built at: Aug 21 2014 20:46:05

07:17:07 (6364): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe'.
07:17:07 (6364): [debug]: 2.1e+015 fp, 4e+009 fp/s, 525461 s, 145h57m40s79
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat --outputfile results.cand.out --alpha 2.78568035923 --delta -1.01473177713 --skyRadius 1.983062e-03 --ldiBins 15 --f0start 16 --f0Band 32 --firstSkyPoint 1064 --numSkyPoints 28 --f1dot -9.22e-10 --f1dotBand 1e-12 --df1dot 5.302732178e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_1'
07:17:07 (6364): [debug]: Flags: i386 SSE GNUC X86 GNUX86
07:17:07 (6364): [debug]: Set up communication with graphics process.
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat
% Total amount of photon times: 30000
% Preparing toplist of length: 5
read_checkpoint(): Couldn't open file 'results.cand.out.cpt': No such file or directory (2)
% fft_size: 67108864 (0x4000000)
% Sky point 1/28
% Creating FFT plan.
% Starting semicoherent search over f0 and f1.
% nf1dots: 190 df1dot: 5.302732178e-015 f1dot_start: -9.22e-010 f1dot_band: 1e-012
.
.
etc
.
.
07:55:36 (2208): [normal]: This Einstein@home App was built at: Aug 21 2014 20:46:05

07:55:36 (2208): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe'.
07:55:37 (2208): [debug]: 2.1e+015 fp, 4e+009 fp/s, 525461 s, 145h57m40s79
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat --outputfile results.cand.out --alpha 2.78568035923 --delta -1.01473177713 --skyRadius 1.983062e-03 --ldiBins 15 --f0start 16 --f0Band 32 --firstSkyPoint 1064 --numSkyPoints 28 --f1dot -9.22e-10 --f1dotBand 1e-12 --df1dot 5.302732178e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_1'
07:55:37 (2208): [debug]: Flags: i386 SSE GNUC X86 GNUX86
07:55:37 (2208): [debug]: Set up communication with graphics process.
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat
% Total amount of photon times: 30000
% Preparing toplist of length: 5
read_checkpoint(): Couldn't open file 'results.cand.out.cpt': No such file or directory (2)
% fft_size: 67108864 (0x4000000)
% Sky point 1/28
% Creating FFT plan.
% Starting semicoherent search over f0 and f1.
% nf1dots: 190 df1dot: 5.302732178e-015 f1dot_start: -9.22e-010 f1dot_band: 1e-012
.
.
etc
.
.
08:27:40 (3200): [normal]: This Einstein@home App was built at: Aug 21 2014 20:46:05

08:27:40 (3200): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe'.
08:27:40 (3200): [debug]: 2.1e+015 fp, 4e+009 fp/s, 525461 s, 145h57m40s79
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat --outputfile results.cand.out --alpha 2.78568035923 --delta -1.01473177713 --skyRadius 1.983062e-03 --ldiBins 15 --f0start 16 --f0Band 32 --firstSkyPoint 1064 --numSkyPoints 28 --f1dot -9.22e-10 --f1dotBand 1e-12 --df1dot 5.302732178e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_1'
08:27:40 (3200): [debug]: Flags: i386 SSE GNUC X86 GNUX86
08:27:40 (3200): [debug]: Set up communication with graphics process.
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat
% Total amount of photon times: 30000
% Preparing toplist of length: 5
read_checkpoint(): Couldn't open file 'results.cand.out.cpt': No such file or directory (2)
% fft_size: 67108864 (0x4000000)
% Sky point 1/28
% Creating FFT plan.
% Starting semicoherent search over f0 and f1.
% nf1dots: 190 df1dot: 5.302732178e-015 f1dot_start: -9.22e-010 f1dot_band: 1e-012
.
.
etc
.
.
INFO: Major Windows version: 6
% checkpoint 1
% Sky point 2/28
% Starting semicoherent search over f0 and f1.
% nf1dots: 190 df1dot: 5.302732178e-015 f1dot_start: -9.22e-010 f1dot_band: 1e-012
.
.
etc
.
.
09:00:09 (2176): [normal]: This Einstein@home App was built at: Aug 21 2014 20:46:05

09:00:09 (2176): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe'.
09:00:09 (2176): [debug]: 2.1e+015 fp, 4e+009 fp/s, 525461 s, 145h57m40s79
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.04_windows_intelx86__FGRP4-SSE2.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat --outputfile results.cand.out --alpha 2.78568035923 --delta -1.01473177713 --skyRadius 1.983062e-03 --ldiBins 15 --f0start 16 --f0Band 32 --firstSkyPoint 1064 --numSkyPoints 28 --f1dot -9.22e-10 --f1dotBand 1e-12 --df1dot 5.302732178e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0085E_48.0_1064_-9.21e-10_1_1'
09:00:09 (2176): [debug]: Flags: i386 SSE GNUC X86 GNUX86
09:00:09 (2176): [debug]: Set up communication with graphics process.
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0085E.dat
% Total amount of photon times: 30000
% Preparing toplist of length: 5
% checkpoint read: skypoint 1
% fft_size: 67108864 (0x4000000)
% Sky point 2/28
% Creating FFT plan.
% Starting semicoherent search over f0 and f1.
% nf1dots: 190 df1dot: 5.302732178e-015 f1dot_start: -9.22e-010 f1dot_band: 1e-012
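
A rough way to count the restarts in a stderr excerpt like the one above, as a sketch. The function name summarise_starts is made up, and it assumes every launch logs "Start of BOINC application", a launch with no checkpoint logs "read_checkpoint(): Couldn't open file", and a resumed launch logs "% checkpoint read:", which is what the lines shown here suggest:

```shell
#!/bin/sh
# Summarise how often a task was (re)started, reading its stderr text on stdin.
# Assumption: the three message patterns below are the ones the app prints,
# as seen in the excerpt; other app versions may word them differently.
summarise_starts() {
    awk '
        /Start of BOINC application/              { starts++ }
        /read_checkpoint\(\): Couldn.t open file/ { scratch++ }
        /^% checkpoint read:/                     { resumed++ }
        END { printf "starts=%d from_scratch=%d resumed=%d\n", starts, scratch, resumed }
    '
}
```

Usage would be something like summarise_starts < stderr.txt, where stderr.txt is a hypothetical file holding the task's stderr output. A start that finds no checkpoint file begins again from Sky point 1, which is why the progress bar drops back to zero.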


Claggy
6) Message boards : Problems and Bug Reports : Progress bars stuck (Message 137688)
Posted 6 days ago by Claggy
This means that when BOINC is set to change between applications every hour

That means Boinc may switch applications after an hour, not that it must. (The setting basically means 'don't switch applications unless the current one has run for at least an hour'.)

Boinc should also only switch applications once the app has checkpointed. If it hasn't, Boinc should continue running that app until either it does checkpoint or the app completes.
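
The rule described above, written out as a small sketch. This is only an illustration of the two conditions, not Boinc's actual scheduling code; the function name may_switch and its arguments are made up:

```shell
#!/bin/sh
# Illustrative rule only, not BOINC's real scheduler logic.
# may_switch ELAPSED_SECONDS CHECKPOINTED(0|1) [SWITCH_INTERVAL_SECONDS]
# Succeeds (exit status 0) when switching away from the running app is allowed.
may_switch() {
    elapsed=$1
    checkpointed=$2
    interval=${3:-3600}   # the "switch between applications every" setting, in seconds
    [ "$elapsed" -ge "$interval" ] && [ "$checkpointed" -eq 1 ]
}

# An app that has run for over an hour but has not checkpointed yet keeps running.
if may_switch 4000 0; then echo "switch"; else echo "keep running"; fi
```

Both conditions have to hold, so an app with long gaps between checkpoints can run well past the hour.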

Claggy
7) Message boards : Problems and Bug Reports : DCF disabled :-( (Message 137651)
Posted 8 days ago by Claggy
According to your last scheduler contact, DCF is being used (the time is UTC, so that was over three hours ago):

http://einstein.phys.uwm.edu/host_sched_logs/11723/11723146
2015-01-18 12:03:15.9447 [PID=6562] Request: [USER#xxxxx] [HOST#11723146] [IP xxx.xxx.xxx.110] client 7.4.36
2015-01-18 12:03:15.9460 [PID=6562 ] [handle] [HOST#11723146] [RESULT#478757445] [WU#209304066] got result (DB: server_state=4 outcome=0 client_state=0 validate_state=0 delete_state=0)
2015-01-18 12:03:15.9460 [PID=6562 ] [handle] cpu time 55723.050000 credit/sec 0.005297, claimed credit 295.187298
2015-01-18 12:03:15.9469 [PID=6562 ] [handle] [RESULT#478757445] [WU#209304066]: setting outcome SUCCESS
2015-01-18 12:03:15.9534 [PID=6562 ] [send] effective_ncpus 16 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-01-18 12:03:15.9534 [PID=6562 ] [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2015-01-18 12:03:15.9534 [PID=6562 ] [send] Not using matchmaker scheduling; Not using EDF sim
2015-01-18 12:03:15.9534 [PID=6562 ] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2015-01-18 12:03:15.9534 [PID=6562 ] [send] work_req_seconds: 0.00 secs
2015-01-18 12:03:15.9534 [PID=6562 ] [send] available disk 20.55 GB, work_buf_min 30239
2015-01-18 12:03:15.9534 [PID=6562 ] [send] active_frac 0.999969 on_frac 0.999988 DCF 1.342297
2015-01-18 12:03:15.9580 [PID=6562 ] Sending reply to [HOST#11723146]: 0 results, delay req 60.00
2015-01-18 12:03:15.9590 [PID=6562 ] Scheduler ran 0.018 seconds
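
To pick the DCF out of a scheduler log like the one above without reading every line, something along these lines should do. The function name extract_dcf is made up, and it assumes the value always appears as "DCF <number>" on a [send] line, as it does here:

```shell
#!/bin/sh
# Print the DCF value(s) found in a scheduler log read from stdin.
# Assumption: the value is logged as "DCF <number>" on a [send] line.
extract_dcf() {
    sed -n 's/.*\[send\].* DCF \([0-9][0-9.]*\).*/\1/p'
}
```

Fed the log above, it prints the value from the active_frac / on_frac / DCF line.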


Claggy
8) Message boards : Problems and Bug Reports : DCF disabled :-( (Message 137649)
Posted 8 days ago by Claggy
Set dcf_debug and post some output.

I'm running Boinc 7.4.36 and Einstein still uses DCF here. From the client_state.xml:

<duration_correction_factor>1.743956</duration_correction_factor>
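
A quick way to pull that value out of client_state.xml, as a sketch. The function name extract_client_dcf is made up, and it assumes the element sits on a single line as shown above. It prints every duration_correction_factor it finds, so with several projects attached you will get several values:

```shell
#!/bin/sh
# Print every <duration_correction_factor> value in the given file.
# Assumption: opening and closing tags are on the same line, as quoted above.
extract_client_dcf() {
    sed -n 's:.*<duration_correction_factor>\([^<]*\)</duration_correction_factor>.*:\1:p' "$1"
}
```

Call it with the path to client_state.xml in your Boinc data directory; that location varies by OS and install, so it is left as an argument.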


Edit: and on the website: Task duration correction factor 1.743956

Claggy
9) Message boards : Cruncher's Corner : Daily Quota (Message 137618)
Posted 9 days ago by Claggy
Yeah I saw that all 112 times.

Makes no sense because it is no different than it was before.

I rather not wait another 12hrs and 45mins to try getting tasks again and getting this one back to work......and yeah it is 4am here and I am still awake.

Try downloading the file manually and overwriting the original:

http://einstein2.aei.uni-hannover.de/download/cufft_xp32_32_16.dll

Claggy
10) Message boards : Cruncher's Corner : Daily Quota (Message 137612)
Posted 9 days ago by Claggy
They errored because:

<core_client_version>7.4.36</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>cufft_xp32_32_16.dll</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>


Claggy



This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2015 Bruce Allen