Posts by Richard Haselgrove

1) Message boards : Problems and Bug Reports : Undeleted completed work units (Message 133612)
Posted 23 hours ago by Richard Haselgrove
Don't just delete them - you may get them back again, at some pain to your bandwidth.

Search for and read some of Gary Roberts' explanations of 'locality scheduling': Einstein distributes those big files once, and you reuse them again and again. That's the theory, anyway, though we're in a difficult 'wrap up' phase at the end of a run, just at the moment.
2) Message boards : Cruncher's Corner : intel gpu help (Message 133572)
Posted 2 days ago by Richard Haselgrove
There's a question we ought to have got clear before we started this: is the host in question running Windows or Linux?

At the moment (referring to the applications page), there's only one Einstein application for Intel GPU, and it's BRP4 for Windows (available for 32-bit or 64-bit clients). Linux, nada - unless you feel like compiling it yourself?
3) Message boards : Cruncher's Corner : intel gpu help (Message 133568)
Posted 3 days ago by Richard Haselgrove
1) The Intel I7-4770 is not, of itself, an Intel GPU. But it contains an Intel HD 4600 graphics component, which is the iGPU we refer to.

2) BOINC will allow you to crunch with both GPUs, but there are other pre-requisites too. First, your motherboard must support using both devices at the same time. I have a Dell Optiplex which absolutely refuses to use both GPUs, and a home-build based on a Gigabyte motherboard which is perfectly happy to do so.

3) As above. There may be a BIOS setting which needs to be flipped, or there may not. It depends on the motherboard. It is also easiest to set up if you have a monitor or dummy load connected to the on-board video output (whichever device you use as your primary display output) - there are (complicated) workrounds, but try the easy way first if you can.

4) You certainly do need Intel HD graphics drivers installed to use the HD 4600 as a cruncher. Be careful with that weasel word 'current' - there's an active thread Widespread BRP4 validation errors in the Help Desk area with version numbers to watch out for: 10.18.10.3621 is looking like a good bet at the moment.

Intel's automatic driver tools sometimes claim (wrongly) that your CPU is incompatible. If that happens, I find that downloading the .zip file version of the driver, and following the manual 'have disk' procedure documented in the ReadMe file, is most likely to be successful.
4) Message boards : Problems and Bug Reports : Widespread BRP4 validation errors (Message 133564)
Posted 3 days ago by Richard Haselgrove
There appears to be a problem with some of the workunits using the Intel GPU. I have been carefully tracking my BRP4 Intel GPU workunits ever since I upgraded my Intel driver and then downgraded it after it came up with serious validate errors and invalids. I just got a workunit from 09/13/2014 invalidated by 2 other 4600 HD Intel GPUs. From what I can tell those hosts GPUs are also having validate and invalid errors, possibly from the upgraded driver. The problem I had was with driver version 10.18.10.3907.

I am currently using Intel(R) Graphics Driver: 10.18.10.3621, released 21 May 2014, download name Win64_153322.exe/zip. Host 5744895 is only showing 5 error/invalid and over 800 completed tasks, so I think we can recommend that driver version.

It is currently showing as 'latest' on the Intel driver download pages, so perhaps the 3907 variant has been withdrawn at source.
5) Message boards : Problems and Bug Reports : BRP4 Intel GPU app feedback thread (Message 133303)
Posted 17 days ago by Richard Haselgrove
Seems to be flowing more freely this morning (famous last words...). Got a dozen on my last request.
6) Message boards : Problems and Bug Reports : BRP4 Intel GPU app feedback thread (Message 133278)
Posted 18 days ago by Richard Haselgrove
I'm seeing the same thing, on all intel_gpu requests - and BRP4 (arecibo, non-GPU) is the only app version available for intel_gpu.
7) Message boards : Technical News : Fermi LAT Gamma-ray pulsar search #4 "FGRP4" (Message 133078)
Posted 26 days ago by Richard Haselgrove
The 1.03 version passed the 8 sec marker ...

A strange behaviour: after reaching ~25% within 6 min or so the progress bar makes a major step backwards to 2.4%

If you are using the recommended BOINC v7.2.42, or a later Beta build, BOINC will estimate and display a 'pseudo-progress' percentage while waiting for the first actual checkpoint and progress report from the science application.

If the first checkpoint is made quickly, or if the overall runtime estimate for the task is reasonably accurate, then the transition from pseudo-progress to real progress is almost invisible.

But if the estimate for the whole task is seriously wrong, pseudo- and real progress have time to diverge before the real figure is available, and a large correction becomes necessary.

Having a pseudo-progress display avoids the progress bar displaying 0.000% for extended periods, which tends to make users nervous.

A great explanation!
But can a big change of the estimated runtime also be handled by the BM (every version)? The initial estimated runtime was ~20 to 25 min, jumping then to > 4hrs. Maybe this is the reason why they time out on some machines.
My first 2 wu's are at 73% now with a runtime of ~3hrs and reporting a remaining time of ~1hr.
I'm using BM 7.4.12

By BM, I assume you mean BOINC Manager. As the term 'Manager' implies, that's the command-and-control module for BOINC, and doesn't do any actual work - your question would be better directed to the BOINC client.

And yes, the BOINC ecosystem as a whole - client and server - can handle a big change like this.

If both the server and the client are to a recent (2010 or later) specification, the adjustment is handled on the server, using tools like CreditNew and RuntimeEstimation.

If either (or both) of the server and client pre-date 2010 - as the server here at Einstein does - then both components drop back to the older 'Duration Correction Factor' mechanism (no longer documented, since the demise of the Unofficial BOINC Wiki).

Unfortunately, catch-22 applies in both cases. Neither CN/RE nor DCF updates its estimates until a task has successfully completed: in the case of CN/RE, 11 tasks have to complete and validate; in the case of DCF, a single completed task is sufficient. But if BOINC aborts the tasks for 'Maximum elapsed time exceeded' before successful completion...

Hence the references in this thread to 'inoculation' - modifying <rsc_fpops_bound> to bypass the infinite-loop safety-valve, and allowing completion so that estimate-modification can proceed. These are the sort of issues we were grappling with at Albert before attention switched to the new web design.
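
The catch-22 described above can be sketched as follows. This is a minimal illustration, not BOINC's actual code: it assumes the client's time limit is simply <rsc_fpops_bound> divided by the speed estimate for the application, and every number and function name in it is hypothetical.

```python
def max_elapsed_seconds(rsc_fpops_bound, flops):
    """Time limit before the client gives up on a task.

    Assumption: the limit is the operation bound divided by the
    estimated speed (floating-point operations per second).
    """
    return rsc_fpops_bound / flops


def run_task(actual_runtime_s, rsc_fpops_bound, flops):
    """Return the outcome of a task that needs actual_runtime_s to finish."""
    if actual_runtime_s > max_elapsed_seconds(rsc_fpops_bound, flops):
        return "aborted: Maximum elapsed time exceeded"
    return "completed"


# Hypothetical figures: a speed estimate of 1e10 flops and a bound that
# allows 2 hours, for a task that really needs 4 hours.
flops = 1e10
original_bound = 7.2e13          # 7200 s at 1e10 flops
actual_runtime_s = 4 * 3600

# With the original bound the task never completes, so no estimate
# (CN/RE or DCF) can ever be corrected.
first_attempt = run_task(actual_runtime_s, original_bound, flops)

# 'Inoculation': raise the bound tenfold so the task is allowed to finish.
second_attempt = run_task(actual_runtime_s, original_bound * 10, flops)

print(first_attempt)
print(second_attempt)
```

Once a task has been allowed to complete, the estimate-correction mechanisms have something to work with; until then, raising the bound is the only way out of the loop.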
8) Message boards : Problems and Bug Reports : FGRP4 Observations and Problems (Message 133071)
Posted 26 days ago by Richard Haselgrove
They show a strange behaviour. After ~ 6 min the progress bar shows 25%, making a big step backwards then to 2,4%.
My first wu is now at 20% after 52 mins, the second at 16% after 47 min. Remaining time shows 3h35min and 3h16min now.
A third one started right now.
In the first stages the estimated remaining time is at ~20 min.

See my explanation in the technical news area, replying to your matching post there.
9) Message boards : Technical News : Fermi LAT Gamma-ray pulsar search #4 "FGRP4" (Message 133070)
Posted 26 days ago by Richard Haselgrove
The 1.03 version passed the 8 sec marker ...

A strange behaviour: after reaching ~25% within 6 min or so the progress bar makes a major step backwards to 2.4%

If you are using the recommended BOINC v7.2.42, or a later Beta build, BOINC will estimate and display a 'pseudo-progress' percentage while waiting for the first actual checkpoint and progress report from the science application.

If the first checkpoint is made quickly, or if the overall runtime estimate for the task is reasonably accurate, then the transition from pseudo-progress to real progress is almost invisible.

But if the estimate for the whole task is seriously wrong, pseudo- and real progress have time to diverge before the real figure is available, and a large correction becomes necessary.

Having a pseudo-progress display avoids the progress bar displaying 0.000% for extended periods, which tends to make users nervous.
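
A minimal sketch of why the bar can step backwards, assuming pseudo-progress is simply elapsed time divided by the estimated runtime (BOINC's real formula may differ). The function name and the 24-minute estimate are hypothetical, chosen to match the figures reported in this thread.

```python
def displayed_progress(elapsed_s, estimated_runtime_s, reported_fraction=None):
    """Fraction shown on the progress bar, between 0.0 and 1.0.

    Assumption: until the science application reports real progress,
    the display is modelled as elapsed / estimated runtime.
    """
    if reported_fraction is not None:
        # Real progress from the application always wins.
        return reported_fraction
    return min(elapsed_s / estimated_runtime_s, 1.0)


# Hypothetical task: estimated at 24 minutes, but really needing about 4 hours.
estimate_s = 24 * 60

# Six minutes in, no checkpoint yet: pseudo-progress only.
before = displayed_progress(6 * 60, estimate_s)

# The first real progress report arrives and replaces the pseudo figure.
after = displayed_progress(6 * 60, estimate_s, reported_fraction=0.024)

print(f"before first report: {before:.1%}, after: {after:.1%}")
```

With an accurate estimate the two figures would nearly coincide and the switch would go unnoticed; the worse the estimate, the bigger the visible correction.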
10) Message boards : Problems and Bug Reports : FGRP4 Observations and Problems (Message 133063)
Posted 26 days ago by Richard Haselgrove
The error on my fastest PC has a much longer stderr, of which one entry reads "Maximum elapsed time exceeded", although another entry buried deep in the output might be interesting and reads
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76343226

That's the normal error when Boinc aborts the job because of "Maximum elapsed time exceeded".

Rom Walton once told me it was a deliberate choice by the developers. One possible reason for a task running far longer than expected is that the execution path for that particular dataset has branched into a previously undetected infinite loop. The full program debug logs are to help the developer find that loop.



This material is based upon work supported by the National Science Foundation (NSF) under Grants PHY-1104902, PHY-1104617 and PHY-1105572 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2014 Bruce Allen