Can't contact scheduler, not even to report tasks

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245259946
RAC: 12755
Topic 197939

I don't want to complain, but I still can't validate my workunits...

19-1-2015 23:03:59 | Einstein@Home | Sending scheduler request: To report completed tasks.
19-1-2015 23:03:59 | Einstein@Home | Reporting 73 completed tasks
19-1-2015 23:03:59 | Einstein@Home | Requesting new tasks for NVIDIA GPU and Intel GPU
19-1-2015 23:04:11 |  | Project communication failed: attempting access to reference site
19-1-2015 23:04:11 | Einstein@Home | Scheduler request failed: Server returned nothing (no headers, no data)
19-1-2015 23:04:14 |  | Internet access OK - project servers may be temporarily down.


Machine ID is 11708865
73 BRP4G units were uploaded fine last week but fail to validate.
As you can see, the server doesn't even want to communicate.
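
A log excerpt like the one above can be scanned mechanically for failed scheduler contacts. The following is only a minimal sketch: it assumes each event log line has the form `timestamp | project | message` with a day-month-year timestamp, as in the excerpt, and the function names are made up for illustration.

```python
from datetime import datetime

def parse_log_line(line):
    """Split one 'timestamp | project | message' line into its three fields."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumed timestamp layout, taken from the excerpt: day-month-year, 24h clock.
    when = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
    return when, project, message

def scheduler_failures(lines):
    """Return the parsed lines that record a failed scheduler request."""
    return [parse_log_line(l) for l in lines if "Scheduler request failed" in l]

log = [
    "19-1-2015 23:03:59 | Einstein@Home | Reporting 73 completed tasks",
    "19-1-2015 23:04:11 |  | Project communication failed: attempting access to reference site",
    "19-1-2015 23:04:11 | Einstein@Home | Scheduler request failed: Server returned nothing (no headers, no data)",
]
failures = scheduler_failures(log)
```

Each entry in `failures` carries the time, the project name and the full message, which is enough to see how often and since when the contact has been failing.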

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110008249276
RAC: 24424407

Can't contact scheduler, not even to report tasks

I'd be complaining like hell if I were you and if this had been going on for some time :-). The previous problems were with the upload server but that's not the issue here. There should be no reason why the scheduler shouldn't be talking to you whenever you ask, automatically or via the 'update' button if you want to force a contact. Your client seems to be unable to make contact and get a sensible response, even though your internet access is reported as 'OK'.

The steps in the process are:-

1. Completed results get uploaded to an upload server which just stores the files.
2. Your BOINC client subsequently attempts to 'report' the uploaded results.
3. Other backend processes can use the 'reporting' step as a signal to retrieve those particular 'stored' results and move them further along the result processing chain, ultimately getting them to the validator.

It would appear that all your results are successfully uploaded but for some reason step 2. is failing. Of course, validation can't even be contemplated until the tasks are 'reported' so you need to get step 2. happening by manual intervention if necessary. I would be going to the projects tab in BOINC Manager, selecting the Einstein project and clicking the 'update' button every 10 minutes or so until you get the scheduler to properly respond.
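
The three steps can be pictured as a simple one-way chain of states. This is only an illustrative model of the description above, not the project's actual server code; the state names and the `advance` helper are invented for the sketch.

```python
from enum import Enum

class ResultState(Enum):
    UPLOADED = 1   # step 1: files stored on the upload server
    REPORTED = 2   # step 2: client has told the scheduler about them
    VALIDATED = 3  # step 3: backend has picked them up and validated them

# Each state can only move on to the next one in the chain.
NEXT_STATE = {
    ResultState.UPLOADED: ResultState.REPORTED,
    ResultState.REPORTED: ResultState.VALIDATED,
}

def advance(state):
    """Move a result one step along the chain; VALIDATED is final."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is a final state")
    return NEXT_STATE[state]

def can_validate(state):
    """Validation is only possible once the result has been reported."""
    return state is ResultState.REPORTED
```

In this picture the 73 results are stuck in `UPLOADED`: `can_validate` stays false until a scheduler contact succeeds and moves them to `REPORTED`.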

Your computers are hidden and even though you have provided a hostID, that only lets me see that you have 73 'in progress' results. It doesn't give me a link to the scheduler contact logs which would give the last successful exchange between your host and the scheduler. It's a bit of a long shot but perhaps you could take a look and tell us if there is anything of interest in that log?

Cheers,
Gary.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 947
Credit: 25167626
RAC: 15

@aad: the last scheduler reply for your host 11708865 got logged on the 7th, so almost two weeks ago. It seems your BOINC client is indeed not able to communicate with the project scheduler.

Please check your private messages for what to do next.

Thanks,
Oliver

 

Einstein@Home Project

aad
Joined: 24 Dec 05
Posts: 13
Credit: 662464540
RAC: 25683

RE: @aad: the last

Quote:

@aad: the last scheduler reply for your host 11708865 got logged on the 7th, so almost two weeks ago. It seems your BOINC client is indeed not able to communicate with the project scheduler.

Please check your private messages for what to do next.

Thanks,
Oliver

Thanks Oliver,

Did that, but I can't get the email through.
It got bounced even when zipped and sent as plain text.
Maybe the email address is wrong?

Aad
(I made the machines visible)

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 947
Credit: 25167626
RAC: 15

RE: Did that, but I can't

Quote:

Did that, but I can't get the email through.
It got bounced even when zipped and sent as plain text.
Maybe the email address is wrong?

I'll forward this.

Hang in there,
Oliver

 

Einstein@Home Project

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

RE: Your computers are

Quote:
Your computers are hidden and even though you have provided a hostID, that only lets me see that you have 73 'in progress' results. It doesn't give me a link to the scheduler contact logs which would give the last successful exchange between your host and the scheduler. It's a bit of a long shot but perhaps you could take a look and tell us if there is anything of interest in that log?


You can work that out:

http://einstein.phys.uwm.edu/host_sched_logs/11708/11708865

Claggy

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110008249276
RAC: 24424407

RE: You can work that


Only if you're more persistent than I was. I took a look at how the URL is constructed by using one of my hosts and I saw the numerical bits were nnnn/nnnnxyz - I was looking at a seven digit hostID and made the erroneous assumption that it was the first 4 digits with the whole value as the final step in the path. I wasn't dedicated enough to keep trying other combinations such as the whole hostID minus the last 3 digits for the first bit of the path. I figured if someone wanted to hide their computers then they should learn how to check these logs for themselves :-).

Thanks for the tip on how to do it for hidden machines.
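
For anyone else wanting to do this, the pattern can be written down in a couple of lines. This is a sketch based only on the single URL posted above, so treat the rule (directory = host ID with its last three digits dropped) as an assumption, and the function name as made up.

```python
def sched_log_url(host_id, base="http://einstein.phys.uwm.edu/host_sched_logs"):
    """Build the scheduler log URL for a host ID.

    Assumed layout, inferred from one example: the directory part is the
    host ID without its last three digits, followed by the full host ID.
    """
    directory = host_id // 1000  # drop the last three digits
    return f"{base}/{directory}/{host_id}"

url = sched_log_url(11708865)
```

For host 11708865 this gives back the same address as the one in the quoted post.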

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245259946
RAC: 12755

I got the scheduler request file. Contrary to what I expected, the scheduler doesn't crash on that request, it just acts normally. It really looks like your requests simply don't get through.

The good news is that your 73 tasks have been reported, you'll be credited as usual (pending validation).

The original problem has not been resolved, though. Please try a project reset. You don't need to be afraid anymore that finished tasks get lost.

If that doesn't help, there are certainly people here who are more competent at helping you debug your client's HTTP communication than I am with my limited time.

BM

aad
Joined: 24 Dec 05
Posts: 13
Credit: 662464540
RAC: 25683

Strangely, the machine reported the 73 tasks and got new tasks according to my account page.
On the machine itself the 73 old tasks are still there and there are no new tasks....

21-1-2015 19:34:57 | Einstein@Home | update requested by user
21-1-2015 19:35:01 | Einstein@Home | Sending scheduler request: Requested by user.
21-1-2015 19:35:01 | Einstein@Home | Reporting 73 completed tasks
21-1-2015 19:35:01 | Einstein@Home | Requesting new tasks for NVIDIA GPU and Intel GPU
21-1-2015 19:35:14 | Einstein@Home | Scheduler request failed: Server returned nothing (no headers, no data)
21-1-2015 19:35:21 |  | Project communication failed: attempting access to reference site
21-1-2015 19:35:23 |  | Internet access OK - project servers may be temporarily down.

Since they are reported, it is safe to do a reset on this machine now.
Sorry to the wingmen of the new tasks (which I didn't actually get!)
Or maybe they will be reassigned to me.

Thanks for looking into this anyway.

Greetz
Aad

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110008249276
RAC: 24424407

RE: Strangely the machine

Quote:
Strangely, the machine reported the 73 tasks and got new tasks according to my account page.
On the machine itself the 73 old tasks are still there and there are no new tasks....


Something has obviously changed after you posted the above. The tasks list on the website shows all 73 'old' tasks - the vast majority as validated - and a bunch of new tasks, one of which has already been crunched and returned so your machine must have the new work if you are actually crunching it :-).

Is everything fully resolved now? I presume the 73 'old' results are no longer showing on your host and you must be able to see all the new ones? What did you do to 'break the deadlock'? :-).

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245259946
RAC: 12755

The change in the DB that you see on the host and account page came from my manually feeding the scheduler request that I got via email into the scheduler, originally with the intention of debugging a possible crash of the scheduler.

The host hasn't actually gotten through the normal way yet. When it does, even after the project reset, it will get the "new" tasks that the manually called scheduler assigned to it as "resent lost" ones.
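
The idea behind "resent lost" tasks can be illustrated roughly as follows. This is a simplified sketch of the principle, not the scheduler's real code: the assumption is that the scheduler compares the tasks its database lists as in progress for a host with the tasks the client says it has, and sends the missing ones again. The task names are invented.

```python
def lost_tasks(server_in_progress, client_has):
    """Tasks the server thinks the host holds but the client does not list."""
    return sorted(set(server_in_progress) - set(client_has))

# Hypothetical example: the DB assigned three tasks, the host
# (e.g. after a project reset) reports none of them.
assigned = ["task_a", "task_b", "task_c"]
on_host = []
to_resend = lost_tasks(assigned, on_host)
```

With an empty `on_host` list, everything the database has assigned ends up in `to_resend`, which matches the host receiving the "new" tasks on its first successful contact.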

BM
