Kudos to Bruce!

ralic
Joined: 8 Nov 04
Posts: 128
Credit: 695810
RAC: 0

Thanks Bruce.

Your E@H team would be better advised to screen all these S@H refugees more closely in future. I suspect an unnoticed foreign body was brought onboard and got into the air conditioning system, causing the failure.

Maybe we should quarantine all the S@H refugees until further notice... ;-)

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

I also wish to state that I am highly impressed at the speed of recovery after that long of an outage. I saw very little lag time to seeing uploads happen, reports going through, and new work being downloaded. I know that this wasn't the case for everyone, because of location, and understand getting all the servers across the world in synch takes time, but still, WOW!

This project really is on its toes, and is really well set up. Of course, they have had help from watching other projects, and then doing it several times better.

Kudos! Keep up the impressive work.

Stefan
Joined: 15 Nov 05
Posts: 52
Credit: 761198
RAC: 0

Thanks to all that got Einstein back up again!

And a special thanks to Bruce, brilliant guy... ;)

Human Stupidity Is Infinite...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110411053503
RAC: 30702664

Message 35969 in response to message 35967

Quote:
I also wish to state that I am highly impressed at the speed of recovery after that long of an outage......

This is actually the most impressive part of the whole saga. Based on the experiences with the Seti servers after a long outage, one would expect to see some difficulties in getting results uploaded and reported, and new work downloaded. In my personal experience, I had 80+ machines with several thousand results to upload and report and all hungry for new work.

Virtually all of these boxes needed to be "kick started" because they were all out of work and had communications deferred for intervals of up to 300 hours!!! There was no way I was going to let "nature take its course" :). So, one by one in rapid succession, I made sure each machine's stuck results were uploaded and then updated. It took several hours to do them all. I was probably helped by the timezone as the servers had been up for an hour or two before I started. However, I can't say I ever saw an operation that needed to be retried. Every server contact was handled with little if any abnormal delay. The servers seemed to be able to cope with whatever was being thrown at them!!!

The servers are obviously well designed for the job with plenty of spare capacity for situations like this. Congratulations to all involved!!

Cheers,
Gary.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Thank you very much.

I spent most of a day babysitting our servers after restarting the project. At one point we had about 300 machines simultaneously uploading results and downloading new work. The only real bottleneck was validation, and I was able to fix that by running five copies of the validator at the same time.

We really try hard to keep the project up and running 100% of the time. Unfortunately we have still not received any project funding, although I am quite hopeful that the US National Science Foundation will provide funding for us in the future. If this happens we can hire a couple of professionals to help take care of our hardware and software, which should greatly improve our reliability and capability to deal with unexpected problems.

Cheers,
Bruce

Director, Einstein@Home

John Hunt
Joined: 4 Mar 05
Posts: 1227
Credit: 501906
RAC: 0

Quote:
We really try hard to keep the project up and running 100% of the time. Unfortunately we have still not received any project funding, although I am quite hopeful that the US National Science Foundation will provide funding for us in the future. If this happens we can hire a couple of professionals to help take care of our hardware and software, which should greatly improve our reliability and capability to deal with unexpected problems.

Wow! Bruce - you have performed above and beyond the call of duty!

We salute you!


Stan Pleban
Joined: 2 Dec 05
Posts: 73
Credit: 4635380
RAC: 0

John, I agree with your comments regarding Kudos for Bruce...

I was unaware of the funding situation. Cash flow is always important!!
