new units not downloading


kenlo
Joined: Jun 1 05
Posts: 17
ID: 85094
Credit: 28,206
RAC: 0
Message 14554 - Posted 27 Jun 2005 21:13:23 UTC

new h1 units not downloading
____________
kenlo

Thierry Van Driessche
Joined: Feb 9 05
Posts: 210
ID: 7904
Credit: 120,878
RAC: 65
Message 14561 - Posted 27 Jun 2005 22:24:40 UTC - in response to Message 14554.

new h1 units not downloading

It would help if you posted any relevant message(s) from the Messages tab of BOINC.
____________
Greetings from Belgium
Thierry


kenlo
Joined: Jun 1 05
Posts: 17
ID: 85094
Credit: 28,206
RAC: 0
Message 14563 - Posted 27 Jun 2005 22:59:34 UTC - in response to Message 14554.

new h1 units not downloading

06/27/05 19:02:34||Starting BOINC client version 4.43 for windows_intelx86
06/27/05 19:02:34||Data directory: D:\Program Files\BOINC
06/27/05 19:02:35|Einstein@Home|Computer ID: 307979; location: home; project prefs: default
06/27/05 19:02:35|orbit@home|Computer ID: 682; location: home; project prefs: default
06/27/05 19:02:35||General prefs: from Einstein@Home (last modified 2005-06-13 13:31:31)
06/27/05 19:02:35||General prefs: no separate prefs for home; using your defaults
06/27/05 19:02:35||Remote control not allowed; using loopback address
06/27/05 19:02:35|Einstein@Home|Resuming computation for result H1_0326.5__0326.9_0.1_T21_Fin1_2 using einstein version 4.79
06/27/05 19:02:35|orbit@home|Deferring communication with project for 14 hours, 48 minutes, and 26 seconds
06/27/05 19:02:35|Einstein@Home|Started download of h1_0326.5
06/27/05 19:02:35||schedule_cpus: must schedule
06/27/05 19:02:49|Einstein@Home|Temporarily failed download of h1_0326.5: 416
06/27/05 19:02:52|Einstein@Home|Started download of h1_0326.5
06/27/05 19:03:03|Einstein@Home|Temporarily failed download of h1_0326.5: 416
06/27/05 19:03:06|Einstein@Home|Started download of h1_0326.5

____________
kenlo

Ulrich Metzner
Joined: Jan 22 05
Posts: 113
ID: 5141
Credit: 313,260
RAC: 0
Message 14564 - Posted 27 Jun 2005 23:46:44 UTC
Last modified: 27 Jun 2005 23:58:22 UTC

Here is an excerpt from the Proxomitron log:

+++GET 30654+++
GET /download/38/h1_0205.0 HTTP/1.0
User-Agent: BOINC client
Host: einstein.astro.gla.ac.uk:80
Range: bytes=14736000-
Accept: */*
Connection: keep-alive

+++RESP 30654+++
HTTP/1.0 416 Requested Range Not Satisfiable
Date: Mon, 27 Jun 2005 23:41:50 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 PHP/4.3.10-15 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_perl/1.999.21 Perl/v5.8.4
Keep-Alive: timeout=15, max=89
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
+++CLOSE 30654+++

+++GET 30655+++
GET /download/38/h1_0205.0 HTTP/1.0
User-Agent: BOINC client
Host: einstein.astro.gla.ac.uk:80
Range: bytes=14736000-
Accept: */*
Connection: keep-alive

+++RESP 30655+++
HTTP/1.0 416 Requested Range Not Satisfiable
Date: Mon, 27 Jun 2005 23:41:54 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 PHP/4.3.10-15 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_perl/1.999.21 Perl/v5.8.4
Keep-Alive: timeout=15, max=88
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
+++CLOSE 30655+++

Something is wrong with the file size!
____________
greetz, Uli
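
To illustrate why the server answers 416 in the log above, here is a minimal Python sketch of the range check. The function name and the server-side file sizes are assumptions for illustration; only the offset 14736000 comes from the log.

```python
def range_status(start: int, file_size: int) -> int:
    """Return the HTTP status for an open-ended 'Range: bytes=<start>-' request."""
    # An open-ended range is satisfiable only if it starts inside the file.
    if start < file_size:
        return 206  # Partial Content
    return 416      # Requested Range Not Satisfiable


# The client resumes at offset 14736000 (taken from the log).
# Both server-side sizes below are made-up values.
print(range_status(14736000, 8_000_000))   # 416
print(range_status(14736000, 20_000_000))  # 206
```

In other words, a 416 is what one would expect if the partial file on the client is already at least as large as the file the server has under that name.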

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14566 - Posted 27 Jun 2005 23:55:51 UTC
Last modified: 27 Jun 2005 23:59:36 UTC

I had the same problem just now and I had to reset the project on that PC.


The reason :

It had two download tasks running on exactly the same file. (h1_0400.0)

One was downloaded successfully with the expected file size and the other downloader "wondered where those bytes all came from" and reported a file size error too with a retry every few seconds.

BOINC 4.19, Dual CPU P3s

After the reset it did download stuff successfully, but it still shows "download failed". Nothing missing but I guess I cannot allow BOINC to have two files with the same filename ;-)

There must be something damaged on server/scheduler side or in the WU XML config.

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14568 - Posted 28 Jun 2005 0:08:19 UTC
Last modified: 28 Jun 2005 0:08:50 UTC

After a reset I got a H1_501.0

Same problem first - but then after successful(!) transfer of H1_501.0 BOINC got a request to delete H1_501.0 while it was still downloading H1_501.0 on the other download thread.

Of course the client didn't like that too much either - now there's a checksum error, 2 tasks are crunching and a few are still in "downloading" state

Very weird !

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14569 - Posted 28 Jun 2005 0:12:15 UTC
Last modified: 28 Jun 2005 0:24:21 UTC

The story continues : After manually contacting the scheduler to report the error, it tried to delete H1_501.0

BOINC was very sad and told me it couldn't delete H1_501.0 .... but the work units are happy now and not trying to download H1_501.0 again (as it's still there of course)
___________

I guess it's the WU configuration that is wrong, the scheduler request which I saved after the first problem had this in it :

<file_info>
<name>H1_0400.0</name>
<report_on_rpc/>
</file_info>
<file_info>
<name>h1_0400.0</name>
<report_on_rpc/>
</file_info>

i.e. twice the same stuff
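
The two names in that scheduler request differ only in the case of the first letter. A small Python sketch of how such pairs could be spotted in a list of file names follows. The function name is hypothetical, and `casefold()` is only an approximation of how a case-insensitive file system compares names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Keep only groups where more than one distinct spelling occurs.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["H1_0400.0", "h1_0400.0", "l1_0400.0"]))
# [['H1_0400.0', 'h1_0400.0']]
```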


I would rate this as a critical problem

Robert Nelson
Joined: Mar 19 05
Posts: 5
ID: 60647
Credit: 505,130
RAC: 679
Message 14574 - Posted 28 Jun 2005 1:00:50 UTC - in response to Message 14569.



I would rate this as a critical problem

Same here. I just caught one machine in an endless loop; here is an excerpt:
6/27/2005 8:16:21 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:23 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:24 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:25 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:27 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:27 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:28 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:30 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:30 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:31 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
It went on until I aborted the transfer, which appears to have killed the issue. This machine is running the Einstein beta and BOINC 4.45 for Windows.
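
A loop like the one in this excerpt can be spotted by counting the failure messages per file. The following is a rough Python sketch, assuming the `timestamp|project|message` layout seen in the messages posted in this thread; the function name is hypothetical.

```python
from collections import Counter


def count_failed_downloads(log_lines):
    """Count 'Temporarily failed download' messages per file name."""
    marker = "Temporarily failed download of "
    counts = Counter()
    for line in log_lines:
        # Lines look like: timestamp|project|message
        parts = line.split("|", 2)
        if len(parts) != 3 or not parts[2].startswith(marker):
            continue
        # The message ends with ": <code>", for example ": 416".
        name, sep, _code = parts[2][len(marker):].rpartition(": ")
        if sep:
            counts[name] += 1
    return counts


log = [
    "6/27/2005 8:16:21 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416",
    "6/27/2005 8:16:24 PM|Einstein@Home|Started download of h1_0673.0",
    "6/27/2005 8:16:25 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416",
]
print(count_failed_downloads(log))  # Counter({'h1_0673.0': 2})
```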
____________

Walt Gribben
Forum moderator
Project developer
Joined: Feb 20 05
Posts: 219
ID: 25264
Credit: 1,192,408
RAC: 2,267
Message 14575 - Posted 28 Jun 2005 1:47:52 UTC - in response to Message 14574.
Last modified: 28 Jun 2005 1:50:33 UTC



I would rate this as a critical problem

Same here, just caught one machine in an endless loop here is an excerpt
6/27/2005 8:16:21 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:23 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:24 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:25 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:27 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:27 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:28 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:30 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:30 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:31 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
It went on till I aborted transfer which appears to have killed the issue. This machine running Einstein beta and 4.45 windows.



Shut down BOINC and restart it. Usually "exit" in boincmgr will do it, but the boinc process must end. If it doesn't, use the Task Manager to "kill" it.

There's a bug in BOINC where temporarily failed downloads keep the file open, which can cause the problems you see. When BOINC ends, Windows will close all the files.
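
For readers wondering what "keeping the file open" means in practice, here is a small Python sketch of the general pattern. It is not BOINC code (the function names and the simulated failure are made up); it only shows a writer that closes its handle even when the transfer fails, so the partial file can be deleted afterwards. On Windows, a file that is still open normally cannot be deleted.

```python
import os
import tempfile


def save_chunks(path, chunks):
    """Write chunks to path; the file is closed even if a chunk fails."""
    with open(path, "wb") as f:  # 'with' closes the handle on any exit
        for chunk in chunks:
            f.write(chunk)


def failing_chunks():
    """Yield one chunk, then simulate a transfer failure."""
    yield b"partial data"
    raise IOError("simulated transfer failure")


path = os.path.join(tempfile.mkdtemp(), "h1_0673.0")
try:
    save_chunks(path, failing_chunks())
except IOError:
    pass

# No handle is left open, so the partial file can be removed before a retry.
os.remove(path)
print(os.path.exists(path))  # False
```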

kenlo
Joined: Jun 1 05
Posts: 17
ID: 85094
Credit: 28,206
RAC: 0
Message 14576 - Posted 28 Jun 2005 2:00:33 UTC - in response to Message 14554.

new h1 units not downloading

All I did after the bad download was abort it, and it seems to be running OK now.
____________
kenlo

Walt Gribben
Forum moderator
Project developer
Joined: Feb 20 05
Posts: 219
ID: 25264
Credit: 1,192,408
RAC: 2,267
Message 14577 - Posted 28 Jun 2005 2:13:21 UTC - in response to Message 14576.
Last modified: 28 Jun 2005 2:20:14 UTC

new h1 units not downloading

all i did after the bad download was to abort it and it seems to be running ok now.


That's good. But run Process Explorer, look at the handles for the BOINC process, and see if there are any for h1_0326.5. Or any other h1_* file.

It's fine for the einstein application to use these, but BOINC shouldn't hold on to the file. It'll cause problems later, when BOINC has to delete it. Which shouldn't be for a few weeks yet, when the scheduler decides it's time to work in a different set of data.

EDIT:

The "download looping" problem is in boinc 4.43 and fixed with 4.45. Don't remember whether 4.45 fixes the "open handle" one though.


EDIT**2:

From Robert's post, I'd say the "open handle" bug isn't fixed in 4.45. That's what happens when downloads fail like that: if BOINC leaves the file open, it can't delete the file to download it again. That's a problem for Einstein@Home, where one file is downloaded for all the WUs to use. In that case, it's probably a good idea to restart BOINC.

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14586 - Posted 28 Jun 2005 6:06:17 UTC

This may be at least partly a screw-up on my side.

The "new" S4 data files are named l1_XXXX.X and h1_XXXX.X, in contrast to the "old" files which are named L1_XXXX.X and H1_XXXX.X.

Unfortunately I had not realized that on Win32, file names are case-insensitive.
So there may be some issues in the next few days if workunits which are supposed to use the file H1_0400.0 (which has a particular size and checksum) try to instead use the file h1_0400.0 (which has a DIFFERENT size and checksum).

Meanwhile, I'll see what I can do on the server side to ameliorate this issue.

Bruce
____________

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14587 - Posted 28 Jun 2005 6:14:27 UTC - in response to Message 14577.
Last modified: 28 Jun 2005 6:18:10 UTC

The "download looping" problem is in boinc 4.43 ...



4.19 here


... and it's still happening, on a different PC now, while it loops it needs most CPU power.

ABT Chuck P
Joined: Feb 9 05
Posts: 20
ID: 11725
Credit: 363,204
RAC: 0
Message 14589 - Posted 28 Jun 2005 6:22:34 UTC - in response to Message 14586.

This may be at least partly a screw-up on my side.


Bruce

==============
Whew, thought I was looking at Boinc Seti for a few minutes. Had 9 errors (7 DL and 2 computing) on ID 11073.


____________

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14590 - Posted 28 Jun 2005 6:29:59 UTC
Last modified: 28 Jun 2005 6:35:46 UTC

What about deleting all the uppercase or lowercase WUs on server side and then later reissuing them with new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 14592 - Posted 28 Jun 2005 7:04:18 UTC - in response to Message 14590.

What about deleting all the uppercase or lowercase WUs on server side and then later reissuing them with new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.


would this waste work that has already been done (even work that has been returned) on those wu?

____________
~~gravywavy

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 14593 - Posted 28 Jun 2005 7:23:04 UTC - in response to Message 14586.
Last modified: 28 Jun 2005 7:23:46 UTC


Unfortunately I had not realized that on Win32, file names are case-insensitive.


yes, when writing a cross-platform system, it is safest to use only lower case, (or only upper case !?) throughout. Maybe the BOINC developers community should add this requirement to the policy on filenames across all BOINC projects, which would reduce the chances of similar errors in future.

It is not fair to expect developers with a single-OS background to know all the cross-platform pitfalls and policies can help with that.

All versions of DOS & Win have been case insensitive, but then so too were many mainframe OS's. Sooner or later someone is going to put BOINC on a platform with some other case-insensitive filing system, so while Win makes the issue urgent here, this is one that would eventually have wanted sorting out anyway.
____________
~~gravywavy

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14596 - Posted 28 Jun 2005 7:50:19 UTC - in response to Message 14592.

What about deleting all the uppercase or lowercase WUs on server side and then later reissuing them with new naming convention?

This should "convert" the temporary download error into a permanent one (with "giving up") so the computers break out of their download loop.


would this waste work that has already been done (even work that has been returned) on those wu?




The current situation does the same, some of my team already did report lost WUs after the restart and it happened to me too.

Maybe it would help to remove the H1 and h1 ones for some time, later reissue only the h1 ones there and later (much later) reissue the H1 ones.


Those endless loops are very much a waste of CPU cycles too, the CPUs are heavily loaded mostly with the download, my system had a permanent high load on BOINC (not on the project client) and BOINC does not run with low priority. Not much CPU power left for any project client and (that's worst) for me.

If that happens on a production system where BOINC should stay in background, the users and admins of those systems might become really mad.

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14601 - Posted 28 Jun 2005 8:20:26 UTC
Last modified: 28 Jun 2005 8:20:59 UTC

After some discussions with David Anderson, I've taken the simple way out. I've cancelled the workunits with names that start "h1_" (NOTE: this is case sensitive, work starting "H1_" is NOT cancelled).

I've also removed the problematic h1_XXXX.X data files from the download servers. After these changes propagate to the data server mirrors (15 to 30 minutes) this should generate hard download errors for any client that attempts these WU.

I'll rename the workunits and files using "w1" (w for Washington state, where the Hanford detector is located) and reissue them.

Apologies to everyone for this fiasco. It's my fault. Hopefully we can recover quickly.

Please feel free to manually abort any h1_ workunits. My apologies for wasted CPU cycles. Fortunately these workunits have only been out there for a half-day so this shouldn't be too severe.

Bruce

____________

Ulrich Metzner
Joined: Jan 22 05
Posts: 113
ID: 5141
Credit: 313,260
RAC: 0
Message 14603 - Posted 28 Jun 2005 8:35:01 UTC - in response to Message 14601.

...Please feel free to manually abort any h1_ workunits. My apologies for wasted CPU cycles. Fortunately these workunits have only been out there for a half-day so this shouldn't be too severe.

Bruce

Thank you for handling this issue so quickly :)
A project reset (I only have h1_... left) should do the trick, right?
____________
greetz, Uli

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14604 - Posted 28 Jun 2005 8:42:01 UTC

Any chance to reset the "daily quota" things too for today?

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14607 - Posted 28 Jun 2005 9:24:35 UTC - in response to Message 14604.
Last modified: 28 Jun 2005 9:47:34 UTC

Any chance to reset the "daily quota" things too for today?


Good idea. I should be able to reset the daily quota for any host that has had WU cancelled. I'll work on this now.

[Update 10 minutes later]

DONE!

I've reset the daily result quota for any host that received an h1 workunit.
By the way, I don't think I ever said 'thank you' to those people who pointed out that something was wrong.

THANK YOU VERY MUCH!!

Could anyone suggest a simple and reliable way to abort h1_ workunits from any host, including those running old clients? Since the input data file is no longer on the download servers, I would have thought a simple and guaranteed solution was (1) stop BOINC (2) delete all files named h1_* (LOWER CASE!) and (3) restart BOINC. Can anyone confirm that this works? Is there an easier way?
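
Step (2) could be scripted along the following lines. This is only a sketch under assumptions: the function names are hypothetical, BOINC must already be stopped, and the directory to pass in is the project folder under your own BOINC data directory. The check uses `str.startswith`, which is always case-sensitive, because glob-style matching on Windows ignores case and would also catch the upper-case H1_ files.

```python
import os


def find_lowercase_h1(project_dir):
    """Return paths of files whose names start with lowercase 'h1_'."""
    matches = []
    for name in os.listdir(project_dir):
        # Compare the name exactly as stored on disk, case-sensitively.
        if name.startswith("h1_"):
            matches.append(os.path.join(project_dir, name))
    return sorted(matches)


def delete_files(paths, dry_run=True):
    """Print what would be deleted; remove the files only if dry_run is False."""
    for path in paths:
        print(("would delete " if dry_run else "deleting ") + path)
        if not dry_run:
            os.remove(path)
```

Running `delete_files(find_lowercase_h1(project_dir))` first, with the default `dry_run=True`, only lists the candidates, so the list can be checked before anything is removed.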

Bruce

____________

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14609 - Posted 28 Jun 2005 9:43:51 UTC - in response to Message 14607.

DONE!

I've reset the daily result quota for any host that received an h1 workunit.

Bruce



Great, that worked - thanks :-)

2 of my dual CPU machines have been sitting there with one SETI WU each; they didn't download more SETIs as Einstein has a much higher share. Now they are busy on both CPUs again :-)

Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14612 - Posted 28 Jun 2005 10:27:30 UTC

I've had a large number of boxes affected by this. I've only just noticed it a few minutes ago on one box. I stopped BOINC, deleted the h1 file (lower case h), restarted BOINC, forced an update and got a new file (l1 this time - lower case ell) and everything seems sweet again.

I've started looking at other boxes that I can't physically get to immediately and have found quite a number (probably about 10 so far) that have errored out work for no apparent reason today. Interestingly a number of these show signs of autorecovering in that fresh work is appearing in the list of results.

I'm not at all angry about this - c'est la vie, as they say. All I'd like to know is whether all affected boxes will autorecover now that the 8 per day has been reset, or will I physically have to go to each box and delete the offending h1 file?
____________
Cheers,
Gary.

littleBouncer
Joined: Jan 22 05
Posts: 63
ID: 5660
Credit: 103,970
RAC: 0
Message 14613 - Posted 28 Jun 2005 10:32:37 UTC

@ Bruce Allen,

Why haven't you changed the application from 4.79 to 0.03 (Windows) yet?

-only a Q.-

greetz littleBouncer

____________

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14614 - Posted 28 Jun 2005 10:40:00 UTC - in response to Message 14612.
Last modified: 28 Jun 2005 10:42:30 UTC

I've had a large number of boxes affected by this. I've only just noticed it a few minutes ago on one box. I stopped BOINC, deleted the h1 file (lower case h), restarted BOINC, forced an update and got a new file (l1 this time - lower case ell) and everything seems sweet again.


I'm glad this works. I think that this is probably the easiest procedure for most users.

I've started looking at other boxes that I can't physically get to immediately and have found quite a number (probably about 10 so far) that have errored out work for no apparent reason today. Interestingly a number of these show signs of autorecovering in that fresh work is appearing in the list of results.


The basic problem is that some hosts may have WU that refer to different files, named (for example) H1_0050.0 and h1_0050.0. These have different lengths and different checksums. But Windows treats these files as the same and will replace one with the other. Hence a WU may error out because the checksum stated in the workunit does not agree with the calculated checksum from the file. If this happens, then all is well because the WU will exit immediately with no wasted CPU time.
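
The replacement-and-mismatch effect described above can be imitated with a toy model. In this Python sketch the class, the file contents, and the use of MD5 are all assumptions made for illustration; it is not how BOINC or Windows is implemented.

```python
import hashlib


class CaseInsensitiveStore:
    """Toy stand-in for a case-insensitive file system."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        # 'H1_...' and 'h1_...' map to the same slot.
        self._files[name.lower()] = data

    def read(self, name):
        return self._files[name.lower()]


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


old_data = b"old H1 data set"        # placeholder bytes
new_data = b"new S4 h1 data set!!"   # placeholder bytes, different length
expected = md5_hex(old_data)         # checksum the workunit was issued with

disk = CaseInsensitiveStore()
disk.write("H1_0050.0", old_data)
disk.write("h1_0050.0", new_data)    # silently replaces the old file

actual = md5_hex(disk.read("H1_0050.0"))
print(actual == expected)  # False
```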


I'm not at all angry about this - c'est la vie, as they say. All I'd like to know is whether all affected boxes will autorecover now that the 8 per day has been reset, or will I physically have to go to each box and delete the offending h1 file?


I'm glad you're not mad, though I imagine that others will be! In a few hours I will again re-run the script that resets the daily result quota for machines that got h1_ workunits. This should help the machines to get more work right away.

If you don't delete the offending h1 file, I am not sure what will happen. In some cases, if there is no conflict with an H1 file name, the WU may well complete. Then the main issue is wasted CPU cycles, since I cancelled these WU on the server side.

Cheers,
Bruce

____________

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14615 - Posted 28 Jun 2005 10:43:37 UTC - in response to Message 14613.

@ Bruce Allen,

Why haven't you changed the application from 4.79 to 0.03 (Windows) yet?

-only a Q.-

greetz littleBouncer


We should probably have this discussion in the other thread. But the short answer is that the new executable seems to be slower in most cases. We need to understand and fix that problem before distributing it widely.

Bruce
____________

Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14617 - Posted 28 Jun 2005 11:05:24 UTC

OK, thanks very much for the reply. Let me get this straight. If I'm seeing repeated attempts to get a file and repeated checksum errors it's due to a clash between a H1_xxxx and a h1_xxxx and this results in rapidly errored out work.

However, if I see any box with work in its results list starting h1_nnnn then whilst it appears at the moment to be proceeding normally, I'm going to get a rude awakening when that work is finished and attempted to be reported so I'm going to be wasting cycles big time unless I go and delete all h1_ work on all boxes that have it.

Does that about sum it up in layman's terms? :).

If so, then AAAAAARRRRRRRRGGGGGGGGGGGHHHHHHHHHH!!!!!!! :).

Seriously, I'm still not at all mad about this. One redeeming feature is that because of my fetish for keeping small caches there is not a huge number of work units to be wasted even although (at a quick search) there are h1_ files on most of my boxes. In fact most of the wasted cycles have already occurred and if I spend the time it takes to get around every box, I'm probably not going to save very much anyway.

Would like to be assured that the basic layman's summary is correct though.
____________
Cheers,
Gary.

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14623 - Posted 28 Jun 2005 12:59:43 UTC - in response to Message 14617.
Last modified: 28 Jun 2005 13:36:59 UTC

OK, thanks very much for the reply. Let me get this straight. If I'm seeing repeated attempts to get a file and repeated checksum errors it's due to a clash between a H1_xxxx and a h1_xxxx and this results in rapidly errored out work.

However, if I see any box with work in its results list starting h1_nnnn then whilst it appears at the moment to be proceeding normally, I'm going to get a rude awakening when that work is finished and attempted to be reported so I'm going to be wasting cycles big time unless I go and delete all h1_ work on all boxes that have it.

Does that about sum it up in layman's terms? :).


Yes!

I have CANCELLED all h1_ workunits. That means that any CPU time spent on them is entirely wasted. No credits, no glory, no purpose.

Shoot those workunits before they tire out your CPUs.

(And once again, sincere apologies for this fiasco.)

Bruce

____________

Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14626 - Posted 28 Jun 2005 13:26:53 UTC - in response to Message 14623.



Yes!

I have CANCELLED all h1_ workunits. That means that any CPU time spent on them is entirely wasted. No credits, no glory, no good.

Shoot those workunits before they tire out your CPUs.


In anticipation of that answer I've just finished deleting h1_nnnn work on about a dozen boxes that I can actually get physical access to. Bit of a struggle for V4.19 as it doesn't have the nice abort button that the later CCs have. Here's basically what I had to do.

1. Stop BOINC
3. Delete the large h1_nnnn file in the einstein subdir of the projects dir
4. Restart BOINC. It would complain about missing files and would try to reget them.
5. The current WU would error out and the reget would mostly fail but occasionally it seemed to succeed.
6. Stop BOINC and repeat the procedure. The next h1_nnnn would then seem to error out.
7. I think on all second passes, BOINC would then get an l1_nnnn data file and I knew I was winning.
8. I'd throw in the odd "update" which occasionally seemed to help. I also had to stop and start BOINC to get processing started.

The interesting thing was that on at least three occasions BOINC claimed to be able to reget at least part of the h1_nnnn large file. I thought they were all supposedly deleted? Maybe BOINC was kidding itself :).
____________
Cheers,
Gary.

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14627 - Posted 28 Jun 2005 13:39:52 UTC - in response to Message 14626.

The interesting thing was that on at least three occasions BOINC claimed to be able to reget at least part of the h1_nnnn large file. I thought they were all supposedly deleted? Maybe BOINC was kidding itself :).


E@H uses five different data servers. Four are mirrored off the root server at UWM. I deleted the files from that root server about 8 hours ago, and the secondary servers are supposed to mirror that change after no more than 15 minutes. However if one or more of them failed to mirror the changes, then it will continue to serve out the files and might cause the behavior that you saw.

Bruce
____________

Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14630 - Posted 28 Jun 2005 13:57:23 UTC
Last modified: 28 Jun 2005 13:58:29 UTC

Ahhh OK... On one occasion, the re-download got to about 50K and then stalled. Maybe I was snagging the file just as the server was deleting it :). When I got the full download (about 8 megs) I'd just stop and delete again and that seemed to cure it. Bit of an eerie feeling when it's telling you it is getting a file that's not supposed to be there. Hopefully all servers are synced up now.

The interesting question is what is going to be the reaction of the silent majority out there who don't regularly follow the lists and are going to be mightily confused by these strange happenings. Is it possible to send a small email to all registered users to warn them to check if they have h1_nnnn style data files and if so check the web page for details? I can just imagine the complaints if someone has a couple of days of h1 work and they don't immediately notice that there is no credit. I watched one of mine do that and that spurred me into action :).
____________
Cheers,
Gary.

Divide Overflow
Joined: Feb 9 05
Posts: 91
ID: 12208
Credit: 182,409
RAC: 730
Message 14632 - Posted 28 Jun 2005 14:05:42 UTC

What about the "l1_xxx" WU's? I understand the point that lowercase h WU's are troublesome right now and should be aborted. What about lowercase l WU's?
____________

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14639 - Posted 28 Jun 2005 14:50:12 UTC - in response to Message 14632.

What about the "l1_xxx" WU's? I understand the point that lowercase h WU's are troublesome right now and should be aborted. What about lowercase l WU's?


Lowercase l workunits l1_XXXX.X__... are FINE! This is because we don't have any data sets labeled 'L1_XXXX.[05]' for them to get confused with.

Bruce
____________

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14641 - Posted 28 Jun 2005 14:56:57 UTC - in response to Message 14630.
Last modified: 28 Jun 2005 15:15:36 UTC

The interesting question is what is going to be the reaction of the silent majority out there who don't regularly follow the lists and are going to be mightily confused by these strange happenings. Is it possible to send a small email to all registered users to warn them to check if they have h1_nnnn style data files and if so check the web page for details? I can just imagine the complaints if someone has a couple of days of h1 work and they don't immediately notice that there is no credit. I watched one of mine do that and that spurred me into action :).


I have thought about doing this. There are about 6000 host machines that got these workunits, and about 5000 users. But it would take me some hours to cobble together and test scripts for mailing the users, and I would rather spend the time testing the new w1_XXXX workunits to make sure they are OK.

[Edit added 30 min later]
I found a script that I have used before, which I can use to grant credit to users/hosts/teams for workunits which I have cancelled. I am going to use this to grant credit to people who have had the misfortune of getting and doing work then having it cancelled.

Bruce

____________

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 14643 - Posted 28 Jun 2005 15:51:47 UTC - in response to Message 14641.


I found a script that I have used before, which I can use to grant credit to users/hosts/teams for workunits which I have cancelled. I am going to use this to grant credit to people who have had the misfortune of getting and doing work then having it cancelled.



that is a nice touch, Bruce.

Fortunately it does not affect me, but I'm pleased to see the swift way the problem has been dealt with.

____________
~~gravywavy

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14644 - Posted 28 Jun 2005 16:06:03 UTC - in response to Message 14643.


I found a script that I have used before, which I can use to grant credit to users/hosts/teams for workunits which I have cancelled. I am going to use this to grant credit to people who have had the misfortune of getting and doing work then having it cancelled.



that is a nice touch, Bruce.

Fortunately it does not affect me, but I'm pleased to see the swift way the problem has been dealt with.


Thank you very much.

Real science is VERY error prone. In fact one of the distinguishing characteristics of real research is that (especially the first and second time) one gets it wrong more often than not. The only saving grace in all of this is that with other scientists you get 99.9% forgiveness for being brutally honest about what happened and why. That's the one thing that I can promise Einstein@Home participants that they will get 100% of the time.
____________

CJOrtega
Joined: Feb 19 05
Posts: 28
ID: 20133
Credit: 894,857
RAC: 800
Message 14649 - Posted 28 Jun 2005 16:33:46 UTC - in response to Message 14626.



Yes!

I have CANCELLED all h1_ workunits. That means that any CPU time spent on them is entirely wasted. No credits, no glory, no good.

Shoot those workunits before they tire out your CPUs.


In anticipation of that answer I've just finished deleting h1_nnnn work on about a dozen boxes that I can actually get physical access to. Bit of a struggle for V4.19 as it doesn't have the nice abort button that the later CCs have. Here's basically what I had to do.

1. Stop BOINC
3. Delete the large h1_nnnn file in the einstein subdir of the projects dir
4. Restart BOINC. It would complain about missing files and would try to reget them.
5. The current WU would error out and the reget would mostly fail but occasionally it seemed to succeed.
6. Stop BOINC and repeat the procedure. The next h1_nnnn would then seem to error out.
7. I think on all second passes, BOINC would then get an l1_nnnn data file and I knew I was winning.
8. I'd throw in the odd "update" which occasionally seemed to help. I also had to stop and start BOINC to get processing started.

The interesting thing was that on at least three occasions BOINC claimed to be able to reget at least part of the h1_nnnn large file. I thought they were all supposedly deleted? Maybe BOINC was kidding itself :).


########

I took the easy way out. :-)

Stopped Boinc/service, waited 1/2 min., started Boinc/service.

Then did an update of the Einstein project via BoincView.
6/28/2005 11:16:55 AM||Starting BOINC client version 4.45 for windows_intelx86
6/28/2005 11:16:55 AM||Executing as a daemon
6/28/2005 11:16:55 AM||Data directory: C:\Program Files\BOINC
6/28/2005 11:16:55 AM|climateprediction.net|Computer ID: 105470; location: home; project prefs: home
6/28/2005 11:16:55 AM|Einstein@Home|Computer ID: 21342; location: home; project prefs: default
6/28/2005 11:16:55 AM|SETI@home|Computer ID: 56801; location: home; project prefs: home
6/28/2005 11:16:55 AM||General prefs: from Einstein@Home (last modified 2005-05-19 16:49:31)
6/28/2005 11:16:55 AM||General prefs: using separate prefs for home
6/28/2005 11:16:55 AM||Remote control allowed
6/28/2005 11:16:55 AM|climateprediction.net|Resuming computation for result 3ive_200186079_0 using hadsm3 version 4.12
6/28/2005 11:16:55 AM|climateprediction.net|Resuming computation for result 3vit_100202636_0 using hadsm3 version 4.12
6/28/2005 11:16:55 AM|SETI@home|Deferring computation for result 29ap04aa.22117.2656.709662.124_2
6/28/2005 11:16:55 AM|Einstein@Home|Deferring communication with project for 7 hours, 44 minutes, and 1 seconds
6/28/2005 11:18:59 AM||request_reschedule_cpus: project op
6/28/2005 11:18:59 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/28/2005 11:18:59 AM|Einstein@Home|Requesting 34560 seconds of work, returning 1 results
6/28/2005 11:19:08 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/28/2005 11:19:08 AM|Einstein@Home|Got server request to delete file H1_0592.5
6/28/2005 11:19:10 AM|Einstein@Home|Started download of Config_L_S4lA
6/28/2005 11:19:10 AM|Einstein@Home|Started download of l1_0277.5
6/28/2005 11:19:10 AM|Einstein@Home|Temporarily failed download of Config_L_S4lA: 404
6/28/2005 11:19:11 AM|Einstein@Home|Started download of Config_L_S4lA
6/28/2005 11:19:12 AM|Einstein@Home|Finished download of Config_L_S4lA
6/28/2005 11:19:12 AM|Einstein@Home|Throughput 3059 bytes/sec
6/28/2005 11:19:32 AM|Einstein@Home|Finished download of l1_0277.5
6/28/2005 11:19:32 AM|Einstein@Home|Throughput 287190 bytes/sec
6/28/2005 11:19:32 AM||request_reschedule_cpus: files downloaded
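For anyone who wants to scan a long messages tab for failed downloads, here is a minimal Python sketch. It assumes each line has the `timestamp|project|message` layout seen in the excerpts in this thread, and that the trailing number on a "Temporarily failed download" line is an HTTP status code (404 is Not Found, 416 is Range Not Satisfiable). The names `LogEntry`, `parse_line` and `failed_downloads` are made up for this illustration and are not part of BOINC.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    project: str  # empty string for client-wide messages
    message: str


# Matches e.g. "Temporarily failed download of h1_0326.5: 416"
FAILED_DOWNLOAD = re.compile(r"^Temporarily failed download of (\S+): (\d+)$")


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


def failed_downloads(lines):
    """Return (file name, status code) for each temporarily failed download."""
    failures = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = FAILED_DOWNLOAD.match(entry.message)
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures


# Three lines copied from the log above.
sample = [
    "6/28/2005 11:19:10 AM|Einstein@Home|Started download of Config_L_S4lA",
    "6/28/2005 11:19:10 AM|Einstein@Home|Temporarily failed download of Config_L_S4lA: 404",
    "6/28/2005 11:19:12 AM|Einstein@Home|Finished download of Config_L_S4lA",
]
print(failed_downloads(sample))
```

On the three sample lines this prints `[('Config_L_S4lA', 404)]`. A single 404 followed by a successful retry, as in the log above, is probably harmless; a repeating 416 on the same file is the pattern reported at the start of this thread.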


So all is well in the world again. :-)

Claude



____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14650 - Posted 28 Jun 2005 17:03:37 UTC - in response to Message 14649.


I took the easy way out. :-)

.......

So all is well in the world again. :-)

Claude


That's all right if you have 4.45. As I mentioned, I was running 4.19. My notes were for the benefit of those running that version.

____________
Cheers,
Gary.

Profile rbpeake
Joined: Jan 18 05
Posts: 190
ID: 3466
Credit: 734,274
RAC: 210
Message 14651 - Posted 28 Jun 2005 17:04:55 UTC - in response to Message 14644.

Real science is VERY error prone. In fact one of the distinguishing characteristics of real research is that (especially the first and second time) one gets it wrong more often than not. The only saving grace in all of this is that with other scientists you get 99.9% forgiveness for being brutally honest about what happened and why. That's the one thing that I can promise Einstein@Home participants that they will get 100% of the time.


I certainly appreciate that sentiment, and thank you!

Compared with most distributed computing projects I have participated in over the past number of years, you have gotten it right the first time more so than the majority of them!

From my perspective, it is very nice indeed to be associated with such professionals and with such a professionally run project.



____________
Regards,
Bob P.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14652 - Posted 28 Jun 2005 17:13:18 UTC - in response to Message 14641.


[Edit added 30 min later]
I found a script that I have used before, which I can use to grant credit to users/hosts/teams for workunits which I have cancelled. I am going to use this to grant credit to people who have had the misfortune of getting and doing work then having it cancelled.

Bruce


I'm very pleased that you have done that and it will be good for the silent majority who probably aren't even aware of the problem yet.

However, it's not my day today :). I took your advice and cancelled running work that was in many cases 80-90% complete!!! And I'm still not mad at you in the slightest :). I'd rather lose the credits than hold up the science by doing work that will only have to be repeated anyway so my cancelling the partly completed work was still the right thing to do.

It must have been one of those nightmare days (and nights) for you :).
____________
Cheers,
Gary.

Sharky T
Joined: Feb 19 05
Posts: 159
ID: 20395
Credit: 836,157
RAC: 4,482
Message 14653 - Posted 28 Jun 2005 17:29:04 UTC
Last modified: 28 Jun 2005 17:43:07 UTC

However, it's not my day today :). I took your advice and cancelled running work that was in many cases 80-90% complete!!! And I'm still not mad at you in the slightest :). I'd rather lose the credits than hold up the science by doing work that will only have to be repeated anyway so my cancelling the partly completed work was still the right thing to do.


I aborted 1 ongoing h1_WU and it's been granted the claimed credit, so I don't think you lose those credits. :)

Edit: Hmm.. 4.19.. Was there an abort/cancel button on those?
Hope they got reported. (Haven't read all posts here. (too long))

____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14656 - Posted 28 Jun 2005 17:48:36 UTC - in response to Message 14653.


Edit: Hmm.. 4.19.. Was there an abort/cancel button on those?
Hope they got reported. (Haven't read all posts here. (too long))


Yep, you worked it out exactly!! There is no abort button in 4.19 which is why I reported my procedure earlier thinking I might be helping other 4.19ers. The computation on the WU gets zeroed when BOINC restarts after deleting the h1_nnnn file. So no credit will be coming for those.

However it doesn't matter in the slightest as it would be a waste of science to keep spending cycles on a WU that won't contribute.
____________
Cheers,
Gary.

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14658 - Posted 28 Jun 2005 19:48:09 UTC - in response to Message 14652.


[Edit added 30 min later]
I found a script that I have used before, which I can use to grant credit to users/hosts/teams for workunits which I have cancelled. I am going to use this to grant credit to people who have had the misfortune of getting and doing work then having it cancelled.

Bruce


I'm very pleased that you have done that and it will be good for the silent majority who probably aren't even aware of the problem yet.

However, it's not my day today :). I took your advice and cancelled running work that was in many cases 80-90% complete!!! And I'm still not mad at you in the slightest :). I'd rather lose the credits than hold up the science by doing work that will only have to be repeated anyway so my cancelling the partly completed work was still the right thing to do.


Good news -- I'm giving credit for cancelled and 'download error' work as well as successful and valid results. Since these problems were my fault it seems the least I can do.



It must have been one of those nightmare days (and nights) for you :).


I confess to being in a pretty foul mood for most of the day today!
____________

Profile hih_tv-Greg
Joined: Feb 11 05
Posts: 94
ID: 14543
Credit: 31,815
RAC: 5
Message 14672 - Posted 29 Jun 2005 3:21:09 UTC

I just aborted "h1_0118.0__0118.1_0.1_T00_S4ha_0" from my machine.
06/28/2005 08:11:06 PM|Einstein@Home|Starting result l1_0315.5__0315.9_0.1_T00_S4lA_0 using einstein version 4.79

____________
Greg

Profile Mahray
Joined: Nov 11 04
Posts: 43
ID: 2002
Credit: 597,951
RAC: 935
Message 14677 - Posted 29 Jun 2005 4:34:00 UTC

I'd also like to say thanks for keeping us informed. Screw-ups happen, and I'm quite happy as long as I'm reasonably well informed.
____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,352,127
RAC: 174,818
Message 14679 - Posted 29 Jun 2005 6:09:41 UTC - in response to Message 14658.


I confess to being in a pretty foul mood for most of the day today!


Actually you deserve heaps of praise for the way you handled everything. I don't think you could have done more and the issue was completely defused before there were any nasty surprises and the accompanying flood of complaints that would normally be expected to follow.

It is this kind of professionalism that makes me proud to give my full support to this project. Well done, and many thanks for all your efforts!!
____________
Cheers,
Gary.

Ananas
Joined: Jan 22 05
Posts: 256
ID: 5031
Credit: 1,666,020
RAC: 200
Message 14686 - Posted 29 Jun 2005 7:40:13 UTC

I agree, good work from the country of cheese and Packers :-)

Especially for the good communication I'll give an A++

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 14706 - Posted 29 Jun 2005 14:10:10 UTC - in response to Message 14679.
Last modified: 29 Jun 2005 14:10:38 UTC

Actually you deserve heaps of praise for the way you handled everything. I don't think you could have done more


it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us


It is this kind of professionalism that makes me proud to give my full support to this project. Well done, and many thanks for all your efforts!!

agreed^2
____________
~~gravywavy

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 14708 - Posted 29 Jun 2005 14:19:07 UTC - in response to Message 14706.


it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us


Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce
____________

gravywavy
Joined: Jan 22 05
Posts: 394
ID: 4216
Credit: 68,962
RAC: 0
Message 14715 - Posted 29 Jun 2005 15:31:30 UTC - in response to Message 14708.
Last modified: 29 Jun 2005 16:09:06 UTC


it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us


Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce


Gary has pointed out to me that credit is not granted for wu that are killed by stealing their files. On consideration this makes sense if the xml that held the cpu time has gone. If the client re-starts the download when the files vanish, presumably it also deletes/overwrites the file that remembers the cpu time so far?

My thought is that it may be better, if running 4.19, to kill those wu from the operating system while BOINC is actually crunching them. This assumes the OS has some kind of task manager (eg not Win-98).

On win-XP for example, hit ctrl-alt-del and the task manager comes up. Highlight the Einstein task, right click, and kill process. The wu will report to BOINC that it ended with some error code that means killed. I think that this means that BOINC will report it back with a 'client error' message and they will get credit.

On linux: you probably already know how to use top or ps to get the pid, and how to use kill to abort. If not, I recommend the man pages on top, ps, kill.

Note: I have tried the win-xp method in the past, but not on these wu. If my suggestion won't work, please say so!

____________
~~gravywavy

NonValueAdded
Joined: Feb 20 05
Posts: 1
ID: 26298
Credit: 12,835
RAC: 0
Message 14720 - Posted 29 Jun 2005 18:34:52 UTC - in response to Message 14708.


it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us


Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce


FYI, it seems that my UPPER case "H1_" work units got caught up in the delete sequence. I had a reboot in there, so that didn't help, plus I'm going from memory on how many WUs showed up before and after the system cycle. I had the impression from the original note that only the lower case h1_'s were at issue. I'm not worried about the credit and agree with most other responders on that point. But it might be important for the BOINC people to model out the case sensitivity aspect of the delete. Maybe, given the combinations of OS and file system versions, simply avoid using file name case as a differentiator in the future. JMHO
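
To make the case-sensitivity concern concrete, here is a small Python sketch. It is only an illustration of how a case-sensitive and a case-insensitive match differ on file names taken from this thread; it is not how the BOINC client or the project server actually selects files, which the thread does not describe. The function names are made up for the example. The sketch uses `fnmatch.fnmatchcase` because plain `fnmatch.fnmatch` normalizes case according to the operating system, which is exactly the kind of platform difference being discussed.

```python
import fnmatch


def matches_cancelled(name: str, pattern: str = "h1_*") -> bool:
    """Case-sensitive match; gives the same answer on every operating system."""
    return fnmatch.fnmatchcase(name, pattern)


def matches_ignoring_case(name: str, pattern: str = "h1_*") -> bool:
    """What a case-insensitive comparison would select instead."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


# File names that appear in the logs posted in this thread.
names = ["h1_0326.5", "H1_0592.5", "l1_0277.5"]
print([n for n in names if matches_cancelled(n)])
print([n for n in names if matches_ignoring_case(n)])
```

The first line printed is `['h1_0326.5']` and the second is `['h1_0326.5', 'H1_0592.5']`, so any step that compares names without regard to case would also pick up the upper case H1_ files.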

Walt Gribben
Forum moderator
Project developer
Joined: Feb 20 05
Posts: 219
ID: 25264
Credit: 1,192,408
RAC: 2,267
Message 14723 - Posted 29 Jun 2005 19:35:07 UTC - in response to Message 14715.
Last modified: 29 Jun 2005 20:02:22 UTC


it wasn't till I saw this wu that I realised just how much Bruce had done to defuse anger: he has set things up so that people get credit for the part worked wu they cancel part way through - at least I think that is what this wu is telling us


Your interpretation is entirely correct. I am giving credit for partial/aborted/failed/completed h1_* workunits. Note that this is not instantaneous and may take a few hours. I have to run the script by hand and only do it a few times per day.

Bruce


Gary has pointed out to me that credit is not granted for wu that are killed by stealing their files. On consideration this makes sense if the xml that held the cpu time has gone. If the client re-starts the download when the files vanish, presumably it also deletes/overwrites the file that remembers the cpu time so far?

My thought is that it may be better, if running 4.19, to kill those wu from the operating system while BOINC is actually crunching them. This assumes the OS has some kind of task manager (eg not Win-98).

On win-XP for example, hit ctrl-alt-del and the task manager comes up. Highlight the Einstein task, right click, and kill process. The wu will report to BOINC that it ended with some error code that means killed. I think that this means that BOINC will report it back with a 'client error' message and they will get credit.

On linux: you probably already know how to use top or ps to get the pid, and how to use kill to abort. If not, I recommend the man pages on top, ps, kill.

Note: I have tried the win-xp method in the past, but not on these wu. If my suggestion won't work, please say so!


Using taskmanager (or similar utilities like Process Explorer) to kill the science application works great on Windows. Even Win95/98/ME, which has a task list instead of a task manager. It's still used to "kill" programs.

However, Linux seems to recover "better". Most of the time it just restarts the WU with the messages

Restarting result xxxxx
Result xxxx exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project.

Going thru the signals, SIGABRT works. Like this (note - you have to use the same userid that you run BOINC under):

List the user's tasks, enter:
ps -a

or if it doesn't show the boinc tasks, enter:
ps -x

Use the Process ID (PID) in the kill command:
kill -SIGABRT PID
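
The same signal can be sent from a script. Below is a minimal Python sketch for POSIX systems such as Linux; `abort_process` is a made-up helper name, and the child process is only a stand-in so the example can run on its own. In real use you would pass the PID reported by ps.

```python
import os
import signal
import subprocess
import sys


def abort_process(pid: int) -> None:
    """Send SIGABRT to pid, the same signal as `kill -SIGABRT PID`.

    Raises PermissionError if the process belongs to a different user,
    which is why the same userid that runs BOINC is needed.
    """
    os.kill(pid, signal.SIGABRT)


if __name__ == "__main__":
    # Stand-in child process, used here instead of a real science application.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    abort_process(child.pid)
    child.wait()
    # On POSIX, a negative return code is the number of the terminating signal.
    print(child.returncode == -signal.SIGABRT)
```

Whether the aborted result is then reported back and credited is up to the BOINC client and the project, as discussed above; the sketch only covers sending the signal.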




This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration