Searching for pulsars in PALFA data from Arecibo



Message boards : Problems and Bug Reports : Searching for pulsars in PALFA data from Arecibo

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 92812 - Posted 13 Dec 2008 21:00:04 UTC
Last modified: 13 Dec 2008 21:01:44 UTC

We are starting some limited public testing of a new pulsar search on Einstein@Home. This search uses data from the PALFA collaboration, taken at the Arecibo radio observatory. Science information will be available in this thread in the Science Message Board area.

Please use this thread (which is in the Problems and Bug Reports Message Board area) to report bugs and problems with this new search. This is especially useful during our initial public testing!

Bruce Allen
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 92813 - Posted 13 Dec 2008 21:03:19 UTC - in response to Message 92812.

How do we recognize these tasks? Do they have a different name or different application?

____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Profile Pooh Bear 27
Joined: Mar 20 05
Posts: 1330
ID: 61731
Credit: 3,487,843
RAC: 1,967
Message 92814 - Posted 13 Dec 2008 21:38:37 UTC

Will they download if running the optimized apps, or do we have to remove the app_info and run standard?

____________

Profile Chris S
Joined: Aug 27 05
Posts: 360
ID: 104423
Credit: 32,126
RAC: 0
Message 92818 - Posted 13 Dec 2008 23:12:37 UTC
Last modified: 13 Dec 2008 23:13:29 UTC

If we want to take part, could we please have instructions on how to do so?
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 92828 - Posted 14 Dec 2008 10:26:50 UTC

There is no need for you to do anything; the tasks will be issued randomly among the usual "HierarchicalSearch" tasks. The number of tasks will be very limited during the testing phase (628 workunits at a time), so it is rather unlikely that you will get one. The name of the application will be "einsteinbinary_ABP1" instead of the current "einstein_S5R4". You will be able to opt out of the "Arecibo binary pulsar search" in your Einstein@Home preferences (once the App is in the database, which will be some time tomorrow afternoon CET). Users currently running the Windows Beta App will not automatically get ABP1 tasks; I'll publish instructions and a new app_info.xml in the Beta App thread in the next few days. For various technical reasons the ABP1 App will not be available for Mac OS X on PowerPC.

BM

Klimax
Joined: Apr 27 07
Posts: 87
ID: 256704
Credit: 255,896
RAC: 210
Message 92890 - Posted 16 Dec 2008 13:13:14 UTC

It would be a good idea to show on the server status page how many WUs are present in the system.

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,080,997
RAC: 9,738
Message 92891 - Posted 16 Dec 2008 15:10:13 UTC - in response to Message 92890.

It would be a good idea to show on the server status page how many WUs are present in the system.


There sure will be once this goes into production, but with the test set of just a few hundred units, it will be more like now-you-see-them, now-you-don't.

CU
Bikeman
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 92909 - Posted 17 Dec 2008 11:19:50 UTC
Last modified: 17 Dec 2008 18:38:59 UTC

The first 628 "ABP1" workunits are being produced right now and should be delivered in about 1h.

To make this possible, our scheduler now runs in a new "mixed" mode, which could also break something in normal operation (S5R4 workunits). We did our best to test this, but have never run it at the scale of the whole project.

On a machine that needs about 9h10m for an S5R4 task, an ABP1 task ran for 6h12m.
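As a rough worked comparison of the two runtimes quoted above, here is a small Python sketch. The helper name `to_minutes` is only illustrative and is not part of any project tool; the single data point says nothing about other machines.

```python
import re

def to_minutes(duration: str) -> int:
    """Convert a string such as '9h10m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes

s5r4 = to_minutes("9h10m")  # S5R4 task on the reference machine
abp1 = to_minutes("6h12m")  # ABP1 task on the same machine
ratio = abp1 / s5r4         # fraction of the S5R4 runtime

print(f"S5R4: {s5r4} min, ABP1: {abp1} min, ratio: {ratio:.2f}")
```

On that one machine the ABP1 task took roughly two thirds of the S5R4 runtime.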

BM

Profile Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist
Joined: Oct 15 04
Posts: 985
ID: 3
Credit: 170,849,008
RAC: 0
Message 92915 - Posted 17 Dec 2008 14:11:58 UTC - in response to Message 92890.

It would be a good idea to show on the server status page how many WUs are present in the system.


That's a good idea. We'll modify the server status page to show this.
____________

Klimax
Joined: Apr 27 07
Posts: 87
ID: 256704
Credit: 255,896
RAC: 210
Message 92925 - Posted 17 Dec 2008 20:31:54 UTC - in response to Message 92915.

It would be a good idea to show on the server status page how many WUs are present in the system.


That's a good idea. We'll modify the server status page to show this.


Thank you.

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 92948 - Posted 18 Dec 2008 14:35:38 UTC

Two issues are (now) known for this application:

1. On Windows 98/ME the communication between the worker and the screensaver isn't working, i.e. you'll see the starsphere, but not the data that's usually displayed. This problem is common to all MinGW-built Windows Applications (i.e. S5R4 Beta Apps, too). We're working on this.

2. The App had a bug that occasionally led to "general access violations" / "segfaults", mostly on Windows. The bug was in the part of the application that communicates with the screensaver; it didn't affect the scientific computation. This bug has been fixed and a new application version has been published. Tasks that are resent because they errored out previously should be run with the new version of the application (minor version 02).

BM

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93050 - Posted 21 Dec 2008 23:17:44 UTC

The server that serves the ABP1 workunit data files shut itself down for an emergency at about 19:30 CET. We are investigating; it will probably not be back up before tomorrow. ABP1 workunits downloaded now will probably end up as download errors (nope, they are not served by the mirror network).

BM

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93122 - Posted 25 Dec 2008 0:10:18 UTC
Last modified: 25 Dec 2008 0:10:33 UTC

The ABP1 validator stumbled over a result file that apparently forced it into an infinite loop. The program has been disabled until the problem is fixed. No credit will be granted for ABP1 workunits in the next few days.

BM

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 93203 - Posted 27 Dec 2008 11:00:07 UTC

The ABP1 validator has been fixed and is running again.

BM

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 95816 - Posted 26 Mar 2009 10:36:59 UTC

Now that ABP1 has been officially released and new workunits have been created, could we have the validator running again, please?

Also, I note that the credit 'claim' passed back to the servers is the old benchmark*time figure. I presume that the eventual 'grant' will be a server-determined value, like the LIGO work: otherwise we'll have the old Linux/Windows imbalance back, as with my first WU:

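WU 50492230
Result ID | Host ID | Sent | Reported | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
122176858 | 886719 | 25 Mar 2009 5:33:47 UTC | 25 Mar 2009 21:56:21 UTC | Over | Success | Done | 33,730.76 | 65.61 | pending
122176859 | 1001562 | 25 Mar 2009 5:33:26 UTC | 26 Mar 2009 10:25:49 UTC | Over | Success | Done | 29,658.58 | 130.00 | pending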
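To put a number on the imbalance in the two results above, here is a short Python sketch that normalises each claim by CPU time. It assumes the two numeric columns are CPU time in seconds and claimed credit, in that order; the function name is illustrative only.

```python
# (CPU time in seconds, claimed credit) for the two results of WU 50492230.
# Assumption: the columns are in this order, as on a typical results page.
results = {
    122176858: (33730.76, 65.61),
    122176859: (29658.58, 130.00),
}

def credit_per_hour(cpu_seconds: float, claimed: float) -> float:
    """Claimed credit normalised to one hour of CPU time."""
    return claimed / cpu_seconds * 3600.0

rates = {rid: credit_per_hour(*values) for rid, values in results.items()}

for rid, rate in rates.items():
    print(f"result {rid}: {rate:.2f} claimed credits per CPU hour")
```

For the same workunit, the second host claims more than twice as much per CPU hour as the first, which is the kind of gap a server-determined grant would remove.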

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 95867 - Posted 31 Mar 2009 14:12:59 UTC - in response to Message 95816.

Now that ABP1 has been officially released and new workunits have been created, could we have the validator running again, please?

The handling of ABP1 results will be moved to a different machine; for some time the server status page won't be able to show the daemon status correctly.

Also, I note that the credit 'claim' passed back to the servers is the old benchmark*time figure. I presume that the eventual 'grant' will be a server-determined value

Yes.

BM

TWAGTN
Joined: Mar 15 09
Posts: 2
ID: 447154
Credit: 161,664
RAC: 2
Message 95991 - Posted 3 Apr 2009 14:05:18 UTC

Hello,

My system just finished an ABP1 task and cannot upload the results. I checked the messages and see the following message:

Error reported by file upload server: can't open file /BOINC/projects/EinsteinAtHome/upload/31b/<filename>:Permission denied.

May I assume that's an issue on the upload server?

Oliver
Project developer
Joined: Sep 4 07
Posts: 56
ID: 279320
Credit: 481,794
RAC: 1,091
Message 96036 - Posted 4 Apr 2009 16:23:37 UTC - in response to Message 95991.


My system just finished an ABP1 task and cannot upload the results. I checked the messages and see the following message:

Error reported by file upload server: can't open file /BOINC/projects/EinsteinAtHome/upload/31b/<filename>:Permission denied.

May I assume that's an issue on the upload server?


Hi, this problem has been solved. All uploads should now work flawlessly.

Cheers,
Oliver

RogerJ
Joined: Apr 21 07
Posts: 2
ID: 255433
Credit: 985,331
RAC: 2,241
Message 96552 - Posted 26 Apr 2009 10:27:24 UTC

Windows XP , BOINC 6.2.19

GDATA InternetSecurity reports it is downloading data when I try to download templates_400Hz.bank

BOINC cannot complete the task download.

RJ

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,080,997
RAC: 9,738
Message 96555 - Posted 26 Apr 2009 11:04:28 UTC - in response to Message 96552.

Windows XP , BOINC 6.2.19

GDATA InternetSecurity reports it is downloading data when I try to download templates_400Hz.bank

BOINC cannot complete the task download.

RJ


Yeah, it's unfortunate that "distributed computing" has been adopted for rather malicious applications (bot-nets etc.), so malware scanners are now detecting what is normal business in BOINC as a suspicious thing.

I'm sure there are ways to tell that scanner that the PALFA app (and the S5R5 app) are trusted by you, right?

CU
Bikeman
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 96561 - Posted 26 Apr 2009 11:49:46 UTC - in response to Message 96552.
Last modified: 26 Apr 2009 11:51:02 UTC

Windows XP , BOINC 6.2.19

GDATA InternetSecurity reports it is downloading data when I try to download templates_400Hz.bank

BOINC cannot complete the task download.

RJ

How stupid is that? templates_400Hz.bank is a plain ASCII file!

BM

RogerJ
Joined: Apr 21 07
Posts: 2
ID: 255433
Credit: 985,331
RAC: 2,241
Message 96562 - Posted 26 Apr 2009 12:23:58 UTC - in response to Message 96555.

Windows XP , BOINC 6.2.19

GDATA InternetSecurity reports it is downloading data when I try to download templates_400Hz.bank

BOINC cannot complete the task download.

RJ


Yeah, it's unfortunate that "distributed computing" has been adopted for rather malicious applications (bot-nets etc.), so malware scanners are now detecting what is normal business in BOINC as a suspicious thing.

I'm sure there are ways to tell that scanner that the PALFA app (and the S5R5 app) are trusted by you, right?

CU
Bikeman


I could not find the PALFA or S5R5 apps, so instead I set Firewall Rules for boinc, boinccmd and boincmgr. I don't know if this was the right thing to do. Although the download did not complete immediately, it did after a delay.

It has not been necessary to set any rules in the past.

Normally GDATA does not report that it is downloading files unless GDATA itself is being updated or it is updating virus signatures.

Thanks for your suggestion.

RJ

Profile Bikeman
Forum moderator
Volunteer developer
Joined: Aug 28 06
Posts: 2056
ID: 210833
Credit: 5,080,997
RAC: 9,738
Message 96563 - Posted 26 Apr 2009 12:24:38 UTC
Last modified: 26 Apr 2009 12:24:56 UTC

I guess the mere fact that an application is establishing an HTTP connection to the Internet is the reason for the alert, not the content (so it's more a personal firewall thing than a virus scanner thing). You'd have to tell it that BOINC (not the apps as I indicated earlier, as BOINC is doing the actual downloading) can be trusted.

CU
Bikeman
____________

TCU Computer Science
Joined: Dec 22 05
Posts: 3
ID: 153082
Credit: 20,308,914
RAC: 37,081
Message 96686 - Posted 30 Apr 2009 19:11:44 UTC

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task. I installed the latest BOINC client, but that did not help. I have not had problems with machines running 10.5. Does this application require 10.5?

Here are links to the results for those machines:

http://einstein.phys.uwm.edu/results.php?hostid=740255
http://einstein.phys.uwm.edu/results.php?hostid=740445

____________

Profile nevermore
Joined: Feb 14 06
Posts: 2719
ID: 171869
Credit: 1,388,406
RAC: 0
Message 96756 - Posted 3 May 2009 20:41:18 UTC - in response to Message 96686.
Last modified: 3 May 2009 20:41:45 UTC

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task.


In light of the lack of input from anyone who may be able to answer your query, I suggest disabling the "Arecibo Binary Pulsar Search" on your account. Perhaps at a later time, the issue will be answered and resolved.
____________

TCU Computer Science
Joined: Dec 22 05
Posts: 3
ID: 153082
Credit: 20,308,914
RAC: 37,081
Message 96758 - Posted 3 May 2009 22:38:09 UTC - in response to Message 96756.

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task.


In light of the lack of input from anyone who may be able to answer your query, I suggest disabling the "Arecibo Binary Pulsar Search" on your account. Perhaps at a later time, the issue will be answered and resolved.


Will do. Thanks.

____________

Profile nevermore
Joined: Feb 14 06
Posts: 2719
ID: 171869
Credit: 1,388,406
RAC: 0
Message 96760 - Posted 4 May 2009 0:29:22 UTC - in response to Message 96758.
Last modified: 4 May 2009 0:30:04 UTC

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task.


In light of the lack of input from anyone who may be able to answer your query, I suggest disabling the "Arecibo Binary Pulsar Search" on your account. Perhaps at a later time, the issue will be answered and resolved.


Will do. Thanks.


You're welcome. I also neglected to mention that you may see occasional messages (sometimes a few in a row) stating that there is no work for the 'Hierarchical' searches and thus no work available. Just ignore those, and eventually E@H will send non-ABP1 work. (I haven't run out of work because of this, but I also keep a 3 to 5 day cache of work units.)
____________

Profile Bernd Machenschalk
Forum moderator
Project developer
Joined: Oct 15 04
Posts: 2033
ID: 2
Credit: 21,971,104
RAC: 41,805
Message 96992 - Posted 15 May 2009 7:16:43 UTC - in response to Message 96756.

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task.


In light of the lack of input from anyone who may be able to answer your query, I suggest disabling the "Arecibo Binary Pulsar Search" on your account. Perhaps at a later time, the issue will be answered and resolved.

Are these Intel or PowerPC iMacs?

BM

TCU Computer Science
Joined: Dec 22 05
Posts: 3
ID: 153082
Credit: 20,308,914
RAC: 37,081
Message 97024 - Posted 16 May 2009 16:29:04 UTC - in response to Message 96992.

I just noticed that two iMacs running OS 10.4 have failed on every ABP1 task.


In light of the lack of input from anyone who may be able to answer your query, I suggest disabling the "Arecibo Binary Pulsar Search" on your account. Perhaps at a later time, the issue will be answered and resolved.

Are these Intel or PowerPC iMacs?

BM


Intel

Profile hoarfrost
Joined: Feb 9 05
Posts: 143
ID: 9971
Credit: 2,698,658
RAC: 4,412
Message 98430 - Posted 1 Aug 2009 12:00:49 UTC

Strange WU 55643247
____________

rroonnaalldd
Joined: Dec 12 05
Posts: 101
ID: 146004
Credit: 450,418
RAC: 444
Message 98433 - Posted 1 Aug 2009 12:14:37 UTC

WU 135113304 Why?
____________

samuel7
Joined: Feb 16 05
Posts: 26
ID: 17704
Credit: 740,440
RAC: 296
Message 98435 - Posted 1 Aug 2009 14:21:44 UTC

For the validate errors, see this thread.
____________

Profile Donald A. Tevault
Joined: Feb 17 06
Posts: 308
ID: 173034
Credit: 9,455,962
RAC: 25,839
Message 99538 - Posted 20 Sep 2009 12:33:27 UTC

Hi folks!

A couple of days ago, after app 1.09 became official, I set up a new Linux computer. It found and downloaded the new app just fine. Now though, I've emptied the cache on two other Linux machines, so that I could switch from the beta app to the official app. On both of these machines, I've not been able to download the official app. They keep trying, but apparently can't contact the download server. (They can, however, find and download the regular hierarchical 1.06 app.)
____________

Profile tullio
Joined: Jan 22 05
Posts: 1175
ID: 6186
Credit: 167,788
RAC: 180
Message 99539 - Posted 20 Sep 2009 12:55:21 UTC

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio
____________

Profile Donald A. Tevault
Joined: Feb 17 06
Posts: 308
ID: 173034
Credit: 9,455,962
RAC: 25,839
Message 99541 - Posted 20 Sep 2009 16:36:12 UTC - in response to Message 99539.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio



I'm getting the units. It's just that I'm not getting the application.
____________

Profile tullio
Joined: Jan 22 05
Posts: 1175
ID: 6186
Credit: 167,788
RAC: 180
Message 99543 - Posted 20 Sep 2009 17:08:40 UTC - in response to Message 99541.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio



I'm getting the units. It's just that I'm not getting the application.

How are your computing preferences in Einstein? Did you allow all applications?
____________

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 99544 - Posted 20 Sep 2009 17:14:02 UTC - in response to Message 99539.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.

Einstein has hundreds of hours of data in stock. Per day they use about 20 minutes worth, so they'll easily survive the outage at Arecibo.
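The arithmetic behind that estimate can be sketched in a few lines of Python. The 200-hour stock figure is purely an assumed stand-in for "hundreds of hours"; only the 20 minutes per day comes from the post.

```python
def days_of_stock(stock_hours: float, minutes_used_per_day: float) -> float:
    """How many days a data stock lasts at a constant daily usage."""
    return stock_hours * 60.0 / minutes_used_per_day

# Assumed stock of 200 hours (hypothetical), used at 20 minutes per day.
print(days_of_stock(200, 20))
```

Even with the assumed lower end of "hundreds of hours", the stock would last far longer than an outage of a couple of months.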
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Profile Donald A. Tevault
Joined: Feb 17 06
Posts: 308
ID: 173034
Credit: 9,455,962
RAC: 25,839
Message 99545 - Posted 20 Sep 2009 18:06:39 UTC - in response to Message 99543.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio



I'm getting the units. It's just that I'm not getting the application.

How are your computing preferences in Einstein? Did you allow all applications?


I haven't changed anything, other than deleting the ABP beta apps and the app_info.xml file.
____________

Profile tullio
Joined: Jan 22 05
Posts: 1175
ID: 6186
Credit: 167,788
RAC: 180
Message 99546 - Posted 20 Sep 2009 18:10:18 UTC - in response to Message 99545.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio



I'm getting the units. It's just that I'm not getting the application.

How are your computing preferences in Einstein? Did you allow all applications?


I haven't changed anything, other than deleting the ABP beta apps and the app_info.xml file.

Did you try restarting BOINC?
____________

Profile Donald A. Tevault
Joined: Feb 17 06
Posts: 308
ID: 173034
Credit: 9,455,962
RAC: 25,839
Message 99552 - Posted 20 Sep 2009 23:31:27 UTC - in response to Message 99546.

On the SETI boards it says that the ALFA receiver is down for maintenance until November. This might be the reason you are not getting ABP1 units.
Tullio



I'm getting the units. It's just that I'm not getting the application.

How are your computing preferences in Einstein? Did you allow all applications?


I haven't changed anything, other than deleting the ABP beta apps and the app_info.xml file.

Did you try restarting BOINC?



Yes. The problem isn't on my end.
____________

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,627
RAC: 174,508
Message 99576 - Posted 22 Sep 2009 9:36:30 UTC - in response to Message 99545.

... I haven't changed anything, other than deleting the ABP beta apps and the app_info.xml file.

I haven't been paying proper attention lately - I wasn't even aware that the test app had been made 'official'.

What I normally do is run the cache dry and report, stop BOINC, delete app_info.xml and delete the reference to the test app that is inserted into the state file. I don't delete the test app itself if the same app has become 'official'.

When BOINC restarts with no reference to the test app in the state file, it realises that it has to go to the project for the official app, but it also realises that it already has a copy of the official app so it skips the download and just continues to use the test app as the official app - since nothing has really changed.

If you still have a copy of the test app, maybe this technique would 'work around' the problem of not being able to download (for whatever reason) the new app. Just wondering out loud :-).

____________
Cheers,
Gary.

Profile Ageless
Joined: Jan 26 05
Posts: 1902
ID: 7430
Credit: 143,057
RAC: 332
Message 99577 - Posted 22 Sep 2009 9:47:42 UTC - in response to Message 99576.
Last modified: 22 Sep 2009 9:48:20 UTC

Resetting the project, after deleting the app_info.xml file and restarting BOINC, will at the very least make sure the correct application version is downloaded again.
Just deleting the app_info.xml file will leave information in the client_state.xml file that an anonymous platform is being used.
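For anyone who prefers the command line, the reset step can also be issued through boinccmd. The Python sketch below only builds and prints the command unless `dry_run` is switched off. The project URL is an assumption taken from the result links posted earlier in this thread, and the function names are illustrative.

```python
import subprocess
from typing import List

# Assumed project URL, taken from the links posted earlier in this thread.
PROJECT_URL = "http://einstein.phys.uwm.edu/"

def reset_command(project_url: str) -> List[str]:
    """Build the boinccmd call that resets one attached project."""
    return ["boinccmd", "--project", project_url, "reset"]

def reset_project(project_url: str, dry_run: bool = True) -> List[str]:
    """Return the command; only run it when dry_run is False."""
    cmd = reset_command(project_url)
    if not dry_run:
        # Run this only after app_info.xml is deleted and BOINC restarted.
        subprocess.run(cmd, check=True)
    return cmd

print(" ".join(reset_project(PROJECT_URL)))
```

Keep in mind that a reset discards the tasks currently held for that project, so it is best done with an empty or fully reported cache.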
____________
Jord

-The BOINC FAQ Service

-CUDA/CAL Stream FAQ

Richard Haselgrove
Joined: Dec 10 05
Posts: 579
ID: 144054
Credit: 2,965,572
RAC: 2,400
Message 99578 - Posted 22 Sep 2009 9:55:00 UTC - in response to Message 99577.

Resetting the project, after deleting the app_info.xml file and restarting BOINC, will at the very least make sure the correct application version is downloaded again.
Just deleting the app_info.xml file will leave information in the client_state.xml file that an anonymous platform is being used.

I was just going to suggest the same thing!

In fact, I'm surprised that Gary's method works, because in general Beta apps are unsigned, but BOINC requires a signature for the production apps.

Profile Gundolf Jahn
Joined: Mar 1 05
Posts: 364
ID: 43449
Credit: 156,631
RAC: 183
Message 99580 - Posted 22 Sep 2009 11:08:27 UTC - in response to Message 99578.

In fact, I'm surprised that Gary's method works, because in general Beta apps are unsigned, but BOINC requires a signature for the production apps.

That would explain this:-)
19/09/2009 15:41:20|Einstein@Home|[error] Application file einstein_S5R5_3.05_windows_intelx86.exe missing signature
19/09/2009 15:41:20|Einstein@Home|[error] BOINC cannot accept this file
I had only deleted the app_info.xml file. However, there have been no problems since, even without a reset.
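To spot this kind of error in a longer message log, a small Python sketch along these lines might help. It assumes the pipe-separated "timestamp|project|message" layout seen in the two lines above; the function name and the regular expression are illustrative, not part of BOINC.

```python
import re

# The two log lines quoted above, in "timestamp|project|message" form.
LOG_LINES = [
    "19/09/2009 15:41:20|Einstein@Home|[error] Application file "
    "einstein_S5R5_3.05_windows_intelx86.exe missing signature",
    "19/09/2009 15:41:20|Einstein@Home|[error] BOINC cannot accept this file",
]

SIGNATURE_RE = re.compile(r"Application file (\S+) missing signature")

def unsigned_files(lines):
    """Return the names of application files reported as unsigned."""
    found = []
    for line in lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not in the assumed three-field layout
        _timestamp, _project, message = parts
        match = SIGNATURE_RE.search(message)
        if match:
            found.append(match.group(1))
    return found

print(unsigned_files(LOG_LINES))
```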

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,627
RAC: 174,508
Message 99584 - Posted 22 Sep 2009 22:16:04 UTC - in response to Message 99577.

Resetting the project, after deleting the app_info.xml file and restarting BOINC, will at the very least make sure the correct application version is downloaded again.

True - but I've always had an aversion to resetting projects if there is another way ... :-).

Just deleting the app_info.xml file will leave information in the client_state.xml file that an anonymous platform is being used.

Which was why I said to "delete the reference to the test app that is inserted into the state file."

For someone with a single machine and no experience with playing around in the state file, resetting is obviously a far less risky way to go.

____________
Cheers,
Gary.

Profile Gary Roberts
Forum moderator
Joined: Feb 9 05
Posts: 2068
ID: 12521
Credit: 57,353,627
RAC: 174,508
Message 99586 - Posted 22 Sep 2009 23:57:49 UTC - in response to Message 99578.

... I'm surprised that Gary's method works, because in general Beta apps are unsigned, but BOINC requires a signature for the production apps.

Not only does the method work but cutting and pasting the signature from another machine running the official app works like a charm as well. I tried cutting and pasting first but then discovered I didn't even need to do that as long as I removed the AP stuff from the state file. The state file ended up with the signature inserted automatically - I guess as a result of the scheduler exchange that ended with the "file exists - skipping download" message.

It's a while since I last did that so I'm a bit hazy on the details but I'm sure that I was able to do this without even running the cache dry. I just examined the differences between two state files, one running under AP and the other running the same app 'officially'. It was pretty obvious what needed to be deleted. I used to pre-prepare the signature as a small block to be inserted and then do each machine sequentially. It took around a minute per machine to go from crunching under AP to crunching the same tasks with the same app - but 'officially'. These days I don't even bother inserting the signature - I just delete the AP stuff. I don't know if anything has changed with more recent BOINCs that would invalidate the procedure.

Another thing I've found I can do by playing around in the state file is completely recover trashed caches. On quad core machines, I quite often have a cache of 50-100 tasks or more. My internet plan with my ISP has a 6GB monthly limit with a 60GB off-peak free allowance. I run a cron job twice a day on one machine that uses boinccmd to stop network access on all machines just before the start of the peak period and then to allow access just after the start of the off-peak period. That way all uploading and downloading is done in the free period.
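The network toggle described above could look roughly like the following Python sketch, which a cron job might call once with "never" before the peak period and once with "auto" after it. The host names, the password and the choice of "auto" for re-enabling access are all assumptions for illustration; each remote BOINC client would also need to be configured to accept remote control.

```python
import subprocess
from typing import List

# Hypothetical host names; replace with the machines on your own network.
HOSTS = ["cruncher01", "cruncher02"]
VALID_MODES = ("always", "auto", "never")

def network_mode_command(host: str, password: str, mode: str) -> List[str]:
    """Build the boinccmd call that sets the network mode on one host."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown network mode: {mode!r}")
    return ["boinccmd", "--host", host, "--passwd", password,
            "--set_network_mode", mode]

def set_network_mode(hosts, password, mode, dry_run=True):
    """Return one command per host; run them only when dry_run is False."""
    commands = [network_mode_command(h, password, mode) for h in hosts]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

# Dry run: print what would be sent just before the peak period starts.
for cmd in set_network_mode(HOSTS, "secret", "never"):
    print(" ".join(cmd))
```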

So with this regimen in place and with the peak period being during the day and evening, I tend to keep an eye out for problems during that time. If something goes wrong and a cache gets trashed, nothing gets reported to the project and an easy recovery is usually possible. As an example, I had an overheating problem on a machine with a dodgy CPU fan. Fortunately it happened and was discovered during peak hours when comms were disabled. The machine didn't crash - it just trashed the entire cache :-). After replacing the fan, I was able to restart the machine and edit the state file to remove all the trashed results, leaving the small number of results that had been completed without error prior to the fan going south. When I restarted BOINC and re-enabled network access, the server promptly sent a flood of missing results and accepted the upload of the good results that had been saved. All I lost was the time that had been spent on the 4 current tasks in flight, up to the point that the temperature became too hot.

I would have done at least 10 full cache recoveries like this (for various reasons) over the last year or two. As yet, I haven't had a single recovery fail. Gotta love that 'resend lost results' BOINC feature :-).

____________
Cheers,
Gary.


This material is based upon work supported by the National Science Foundation (NSF) under Grant NSF-0200852 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2009 Bruce Allen for the LIGO Scientific Collaboration