SETI technical news

Ed and Harriet Griffith
Topic 189449

June 29, 2005 - 23:00 UTC
Addendum to the previous post:

The outage took a bit longer than expected - the database dump had to be restarted twice (we reorganized our backup method a little bit, which required some "debugging"). We did everything we set out to do except the UPS testing, so that will be postponed.

The machine "gates" wasn't working out as a splitter, so we went with "sagan" instead (even though it is still the classic SETI@home data server and therefore quite busy). Every little bit helps. Eventually we added "kosh" as well, as it wasn't doing much at the time.

June 29, 2005 - 19:00 UTC
Since we're in the middle of an outage, why not write up another general update?

The validators are still disabled. The only public effect is a delay in crediting results. No credit should be lost, as it is always granted to results that still exist in the database, and they aren't deleted until they are validated and assimilated. So various queues are building up, but that's about it.
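To illustrate why credit is only delayed and not lost, here is a minimal sketch of the rule described above. It is a simplification and not the actual BOINC server code: the `Result` class, its field names, and the `validate`, `assimilate`, and `purge` functions are all hypothetical, and real credit granting is more involved than copying the claimed value.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record; the real database schema is not shown in this post.
    name: str
    claimed_credit: float
    validated: bool = False
    assimilated: bool = False
    granted_credit: float = 0.0


def validate(result: Result) -> None:
    """Validator step (assumed): mark the result valid and grant its credit."""
    result.validated = True
    result.granted_credit = result.claimed_credit


def assimilate(result: Result) -> None:
    """Assimilator step (assumed): only validated results are assimilated."""
    if result.validated:
        result.assimilated = True


def purge(results: list[Result]) -> list[Result]:
    """Return the results that remain after deletion.

    A result is deleted only once it is both validated and assimilated,
    so anything still waiting on the validator stays in the list.
    """
    return [r for r in results if not (r.validated and r.assimilated)]
```

With the validators off, `purge` removes nothing, so the queue simply grows. Once `validate` and `assimilate` have run on a result, its credit is granted and only then does it become eligible for deletion.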

While this is an inconvenience for our users, repairing the validator program has taken a back seat to higher-priority items (some that we expected, and some that appeared out of nowhere).

First and foremost, galileo crashed last night. We haven't yet fully diagnosed the cause (as we've been busy keeping to the scheduled outage for mundane but necessary items like database backups, rebooting servers to pick up new automounts, and UPS testing). At this point we think it is a CPU board failure, but the server is back up (and working as a scheduling server, but not much else). That's the bad news.

The good news is that arriving today (just in the nick of time) is a new/used E3500 identical to galileo (graciously donated by Patrick Jeski - thanks Patrick!). It should be arriving at the loading dock as I type this message. So at least we already have replacement parts on site. Whether or not we need these parts remains to be seen, but the extra server definitely creates a warm, fuzzy feeling.

With galileo failing, and other splitter machines buckling under the load of increased demand, we are slowly running out of work to send out. We tried to add the machine "gates," but due to its low RAM (and the fact that it is still serving a bunch of SETI classic cgi requests) it didn't work very well. We'll try to add more splitter power today after the outage.

One of our main priorities right now is ramping down all the remaining pieces of SETI classic and preparing for the final shutdown. This includes sending out a mass e-mail, converting all the cgi programs to prevent future editing (account updates, team creation, joining, etc.), and buffing up the BOINC servers as best we can before the dam breaks.

As well, the air conditioning in our closet began failing again over the past week. While the machines didn't get as hot this time as before, facilities took a long look at the system and determined that there is indeed a gas leak (freon or whatever they use besides freon these days). More gas was added, which should last a few weeks until the problem is fixed.