S6GC1-Files necessary?

Marteinstein
Joined: 21 Jun 11
Posts: 4
Credit: 9758945
RAC: 1480
Topic 196562

Hi folks!

I have one question or problem:

I make occasional backups of the BOINC directory and recently noticed that the Einstein project directory has grown excessively large. I have been running Einstein for just over a year, and the directory is now above 1 GB. On closer inspection I found a lot of files (116) ending in S6GC1 (e.g. h1_0237.70_S6GC1). Some of them are new and some are older, spread over that year. These files alone take up almost 1 GB of disk space in total.
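A rough sketch of how the files can be tallied, in case anyone wants to check their own machine (the project path below is just an assumption for a typical Linux install - adjust it to your own BOINC data directory):

# Rough sketch: tally the *_S6GC1 data files in the Einstein@Home project
# directory. The path is an assumption (a typical Linux BOINC data
# directory) -- adjust it to your own installation.
import glob
import os

project_dir = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"
files = glob.glob(os.path.join(project_dir, "*_S6GC1"))
total_bytes = sum(os.path.getsize(f) for f in files)

print(f"{len(files)} S6GC1 files, {total_bytes / 1024**3:.2f} GiB in total")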

Question: Are these files (S6GC1) in any way necessary for further calculations/crunching or can I delete some or all of them?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5852
Credit: 111070875777
RAC: 34746888

S6GC1-Files necessary?

Yes, they are necessary - at least some of them anyway, and it takes quite a bit of manual intervention (which I wouldn't particularly recommend) to work out which ones could be safely removed.

They are the large data files that are continually being reused for additional tasks in the GW search. They should eventually be deleted by the project when they are no longer needed. This is automatic - you don't have to do anything.

If you delete any of them now, the server will simply send them to you again. To be able to delete any of these files permanently before automatic deletion kicks in, you would need to remove the entries for them in your state file (client_state.xml). This is not a normal or even trivial exercise unless you know exactly what you are doing.
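For anyone who only wants to see which of these files the state file still references (without changing anything), a minimal read-only sketch might look like the following - the element names are assumptions based on a typical client_state.xml, so check your own file first:

# Read-only sketch: list the S6GC1 file entries that client_state.xml
# still references. Element names (<file_info>/<file>, <name>) are
# assumptions based on typical BOINC state files -- this does NOT modify
# or delete anything.
import xml.etree.ElementTree as ET

state_file = "/var/lib/boinc-client/client_state.xml"  # assumed path
root = ET.parse(state_file).getroot()

for elem in root.iter():
    if elem.tag in ("file_info", "file"):
        name = elem.findtext("name", default="")
        if "S6GC1" in name:
            print(name)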

Each file is 'tagged' with a particular frequency identification - in the example you quoted the frequency is 237.70 Hz. My observation is that the current run has largely completed the lower frequencies, say below about 250 - 300 Hz, and that most of the remaining work is for frequencies above that. You don't have any 'in progress' work at the moment, but if you did, the name of a task would give you a clue as to which data files are needed and how many tasks are left for that particular frequency bin. Also, the server may send replacements for failed tasks at frequencies that are substantially completed, so it's efficient (if you can spare the space) to keep old frequency data 'just in case'.
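As a small aside, the frequency can be read straight out of a file name; a quick sketch, with the name pattern inferred purely from the h1_0237.70_S6GC1 example above:

# Sketch: extract the frequency (Hz) from a data file name such as
# "h1_0237.70_S6GC1". The pattern is inferred only from that example.
import re

def frequency_of(filename):
    m = re.match(r"[hl]1_(\d+\.\d+)_S6GC1$", filename)
    return float(m.group(1)) if m else None

print(frequency_of("h1_0237.70_S6GC1"))  # 237.7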

Cheers,
Gary.

Marteinstein
Joined: 21 Jun 11
Posts: 4
Credit: 9758945
RAC: 1480

Thank you for your quick,

Thank you for your quick, informative and interesting response. I quite understand what you mean. Thank you again.

Two additional questions that I'm interested in:

1) What is the difference between files starting with "l" and those starting with "h"?

2) According to the server status page, the S6LV1 project is estimated to run for a further 73 days. Does that mean that these files may be automatically deleted altogether after this application ends, roughly at the end of the year?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4276
Credit: 245596317
RAC: 10389

RE: 1) What is the

Quote:
1) What is the difference between files starting with "l" and those starting with "h"?

This marks data from two different LIGO observatories: H1 is the (larger) instrument in Hanford and L1 is the one in Livingston.

Quote:
2) According to the server status page, the S6LV1 project is estimated to run for a further 73 days. Does that mean that these files may be automatically deleted altogether after this application ends, roughly at the end of the year?

This is correct in principle. The files should be deleted when no more "work" can be sent out for them. But there are two things to keep in mind:

First: The prediction on the server status page actually shows an estimate of how long new "work" can be generated. After the work has been generated, it needs to be sent out to the clients, processed, returned and then pass validation. If any of this fails for the tasks originally sent out, new tasks will be generated from the same "work" and sent again. So the files necessary for a "workunit" are kept until all workunits that need these files have validated successfully. From past experience I'd say the files of the current run should be gone 6-8 weeks after the last workunit for that "run" has been generated. So when the server status page shows zero "work still remaining", add another ~50 days before all the files will be gone.

Second: We may decide to use the same data files for the next analysis run on Einstein@Home. The data files currently being used have "S6GC1" in their names just because they were originally prepared for a run named "S6GC1", which was finished about a year ago and superseded by a run "S5GC1HF", and now "S6LV1". All of these basically used the same set of data files, so some of the files on your computer might be rather old, dating back to the original S5GC1 run, and have not (yet) been deleted.

We are currently discussing the next GW analysis run, and it looks rather unlikely that we will use the same data again. But this hasn't been decided yet.

BM

Marteinstein
Joined: 21 Jun 11
Posts: 4
Credit: 9758945
RAC: 1480

Yes, these (sub-)projects

Yes, these (sub-)projects never seem to end. The same thing happens at WCG - projects often run weeks, months or even longer past their projected end. :lol:

Thank you for your answers. Case closed.
