Sudden lurch in remaining work display |
Message boards : Cruncher's Corner : Sudden lurch in remaining work display
| Author | Message |
|---|---|
|
The S5R3 search progress pane on the server status page suddenly changed from saying we had well over 300 days of work to go to claiming only 6.2 days. | |
| ID: 79937 | | |
|
This is actually news, but I haven't gotten around to writing up the details yet. We found we had to break the current run into two parts at a frequency of 800Hz. The display shows the work remaining below 800Hz. We'll have the upper half set up in a few days. | |
| ID: 79939 | | |
|
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? | |
| ID: 79941 | | |
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? An unexpected side effect of the problems we have found with the >=800Hz WUs is that they run shorter than intended. The new ones will get the same credit, but run noticeably longer. BM | |
| ID: 79943 | | |
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? Would it be possible to keep the runtime unchanged and adjust the credit instead? This would reduce a lot of the grumbling from people with older machines that aren't on 24/7. | |
| ID: 79970 | | |
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? If the official Windows app becomes 4.26, there may be enough of a speed boost to help the GUM (Great Unwashed Masses). If not, then boosting deadlines up to 16-18 days until SSE can be implemented in the Windows app may also help... | |
| ID: 79971 | | |
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? Hmmm... I don't know. If I understand Bernd correctly, it sounds like these >= 800Hz workunits don't run long enough to complete all of the needed calculations. Thus, the need to create new workunits with longer runtimes. | |
| ID: 79972 | | |
I noticed that all the WUs I had above 800 give way too much credit. Will future WUs in that range give lower credit? Yes, and what I stated does depend on the runtime staying consistent between the < and the >=. If >= 800 takes longer than < 800, then there is definitely going to be some need for panic... | |
| ID: 79974 | | |
Depends on what Bernd meant. The way I read it was that the WUs were completing all the work they needed to do in significantly less time than was expected. | |
| ID: 80002 | | |
If that's the case, then I don't understand what the problem is. Hopefully, we'll get some more amplifying info on this later. | |
| ID: 80007 | | |
The way I translated it, the workunits ran much faster than anticipated. What isn't stated is why they ran faster than anticipated. Another related message here was about how tasks at the 799.xx frequency were erroring out immediately... I dunno... I've asked multiple times about deadline extensions. I was considering not asking again, based upon the increase in speed from Windows 4.26. Will have to wait and see... | |
| ID: 80010 | | |
|
I've finally received a pair of these >= 800Hz jobs. They completed in about 76,000 seconds, far less than the 110,000 - 120,000 seconds that would be normal for this machine. So, there's definitely something strange here. | |
| ID: 80024 | | |
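As a rough check on the runtimes reported above, the observed 76,000 seconds can be compared with the usual 110,000 to 120,000 second range. This is only a minimal sketch using the figures quoted in the post; the function name is illustrative and not part of any BOINC tool.

```python
def runtime_fraction(observed_s, normal_s):
    """Return the observed runtime as a fraction of the normal runtime."""
    if normal_s <= 0:
        raise ValueError("normal_s must be positive")
    return observed_s / normal_s


# Figures quoted in the post (seconds of CPU time on that one machine).
observed = 76_000
normal_low, normal_high = 110_000, 120_000

frac_vs_low = runtime_fraction(observed, normal_low)    # against the fast end of normal
frac_vs_high = runtime_fraction(observed, normal_high)  # against the slow end of normal

print(f"{frac_vs_high:.0%} to {frac_vs_low:.0%} of the normal runtime")
```

On these numbers the >= 800Hz jobs took roughly two thirds of the usual time on that host.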
I've finally received a pair of these >= 800Hz jobs. They completed in about 76,000 seconds, far less than the 110,000 - 120,000 seconds that would be normal for this machine. So, there's definitely something strange here. My timing always sucks... I am only up to 779... :-( | |
| ID: 80027 | | |
I've finally received a pair of these >= 800Hz jobs. They completed in about 76,000 seconds, far less than the 110,000 - 120,000 seconds that would be normal for this machine. So, there's definitely something strange here. Brian, here's a look at my Mobile AMD64 3700 laptop's WUs under Windows and the work done so far: ![]() | |
| ID: 80029 | | |
I've finally received a pair of these >= 800Hz jobs. They completed in about 76,000 seconds, far less than the 110,000 - 120,000 seconds that would be normal for this machine. So, there's definitely something strange here. Yeah yeah... rub it in... You got the credit boost from going above 799 and then the performance boost by going to 4.26... :-P on you too... | |
| ID: 80031 | | |
|
Well, to be honest, I hadn't looked at the credits for the 800's until you mentioned it. | |
| ID: 80032 | | |
|
The data files currently on Einstein@home of 800Hz and above (h1_0800.0_S5R2* / l1_0800.0_S5R2*) are wrong. While we are generating the correct ones, we stopped generating workunits for 800Hz and above. | |
| ID: 80033 | | |
|
Will the current app handle the new WUs? | |
| ID: 80034 | | |
Will the current app handle the new WUs? The new workunits will reference the same apps. No change there. BM | |
| ID: 80035 | | |
Thanks... The speed increase from 4.26 is definitely appreciated and would probably reduce the incidence of tasks missing the deadline by only a couple of days, as it appears to be 10-20% faster, depending on hardware. I guess it will all depend on how long the new results take... Anyway, as for the boundary tasks, do you know if all of those have already been distributed? Since they fail very quickly, any host that gets them will likely be driven down to only 1/day quota... | |
| ID: 80036 | | |
Anyway, as for the boundary tasks, do you know if all of those have already been distributed? Since they fail very quickly, any host that gets them will likely be driven down to only 1/day quota... You're right, I cancelled the workunits, which means that no new tasks should be generated for them. For the few dozen tasks that have already been generated for these in the DB I'm afraid I won't be able to do anything (without risking DB inconsistencies). BM | |
| ID: 80084 | | |
|
The Server Status page has been showing around 4 days or so remaining work (I didn't pay attention to the precise figures) but this morning it seems to be above 5 days which suggests that incremental additions to the stock remaining are possibly being made. Perhaps testing of small numbers of new +800 tasks?? | |
| ID: 80166 | | |
|
Update: We started to "drain" the current S5R3a workunit generator, i.e. it will generate all the workunits below ~799Hz that have not yet been generated, put them into the database and then terminate. | |
| ID: 80239 | | |
|
We started to send out the first (few hundred) "upper-half" S5R3 tasks for testing. For the curious: The task names end in "S5R3b", and the data files in "S5R3". The Delay Bound ("deadline") has been increased to 18 days. | |
| ID: 80356 | | |
|
With respect to the 800 Hz WUs, and given the criteria for receiving credit for WUs: I had believed that, in the interim, a computer that runs the work operates under the assumption of half time, meaning that if the work takes longer, the credit should not be any different than if it took a short time. I also believe that wear and tear on the CPU diminishes its life expectancy. For lack of a better word, the CPU dies out due to the excessive workload that it endures while processing data. | |
| ID: 82268 | | |
With respect to the 800 Hz WUs, and given the criteria for receiving credit for WUs: I had believed that, in the interim, a computer that runs the work operates under the assumption of half time, meaning that if the work takes longer, the credit should not be any different than if it took a short time. I also believe that wear and tear on the CPU diminishes its life expectancy. For lack of a better word, the CPU dies out due to the excessive workload that it endures while processing data. Hi! The fact that the old near-800 Hz units ran twice as fast was a special effect, so it was not reflected in the credits for WUs. Now the runtime is back to "normal" and all should be fine. As to wear and tear of the CPU: I would not be concerned about this. CPUs are designed to run at max load for years and years (if they run within the specified limits; overclocking is a different story, of course). Most components are stressed more when switching the system on and off, so 24/7 operation or continuous high-load operation should not lower the lifetime of a CPU below the time you usually expect to have a CPU in operation (few of us use a CPU built 10 years ago, and we won't use our current CPUs in 10 years). There are some moving parts like fans and disk drives that might show a lower life expectancy from BOINC, though. From an economic point of view, the cost of wear and tear should be insignificant anyway compared to the additional energy costs, so wear and tear is a non-issue, I guess. CU Bikeman | |
| ID: 82273 | | |
As to wear and tear of the CPU: I would not be concerned about this, CPUs are designed to run on max load for years and years (if they run within the specified limits, overclocking is a different story, of course). Most components are stressed more when switching the system on and off, so 24/7 operation or a continuous high-load operation should not lower the lifetime of a CPU below the time you usually expect to have a CPU in operation (few of us use a CPU build 10 years ago, and we won't use our current CPUs in 10 years). I was a reliability guy for a major semiconductor manufacturer for four years around 1990, and have had some further contact on this subject since. Your advice to users here differs from my understanding of the matter. Actually, these days the in-service reliability goal is set for the distribution of expected operating conditions, not on the worst-case assumption that all in-service parts see worst-case conditions. If everybody re-wrote their flash card at the maximum feasible rate, many more would fail far sooner than the goals. If everybody operated their CPU at 100% utilization with poor cooling, the fleet CPU failure rate would be much higher than the requirement. Hotter is worse, and higher voltage is worse, though the degree to which these two things hurt your chances varies with mechanism. Thermal cycling of a CPU to the degree presented by switching a system off and on is an utterly negligible stress. There have been cases (usually involving thin-film compatibility issues) of component/package combinations with appreciable thermal cycling failure rates stemming from delamination, but the accumulated harm varies as a quite high power of the cycling range (something like sixth power, if I recall a paper my colleague Rich Blish presented on the subject), and the range for desk-top CPUs is just not much. 
In summary, yes, you are raising the probability of failure of your CPU at any given moment (including the first minute after start) by running BOINC applications in time which would otherwise be idle. You are raising it further if you increase power consumption and temperature by overclocking. You are raising it further if you raise the CPU voltage. You are raising it further if you operate the PC during hours in which you would have shut it down. I do agree that there are probably components of the PC which don't like the system being powered up and down, but CPU failure probability is not likely in that category. So after all that negativism, let me switch sides and say: 1. You are far more likely to have your system fail from fouled-up software than from any hardware problem. 2. Among hardware problems, last time I saw the data, hard drive failures and monitor failures are considerably more common than failures in the CPU/motherboard parts of the system. 3. So if you don't carry overvoltage overclocking to extremes, and assure decent cooling, I don't think your extra risk is troublingly high. ____________ | |
| ID: 82276 | | |
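The power-law remark in the post above can be illustrated numerically. This is only a sketch: the exponent of 6 is the poster's recollection, and the two temperature swings below (20 K and 60 K) are made-up illustrative values, not measurements of any real system. It assumes the harm accumulated per thermal cycle scales as the cycling range raised to that power.

```python
def relative_harm_per_cycle(delta_t, reference_delta_t, exponent=6.0):
    """Harm per thermal cycle relative to a reference cycling range.

    Assumes harm per cycle is proportional to (cycling range) ** exponent.
    The default exponent of 6 is the recollected figure from the post.
    """
    if delta_t < 0 or reference_delta_t <= 0:
        raise ValueError("delta_t must be >= 0 and reference_delta_t must be > 0")
    return (delta_t / reference_delta_t) ** exponent


# Hypothetical temperature swings in kelvin, chosen only for illustration.
small_swing = 20.0
large_swing = 60.0

ratio = relative_harm_per_cycle(small_swing, large_swing)
print(f"Harm per cycle at {small_swing:.0f} K relative to {large_swing:.0f} K: {ratio:.5f}")
```

Under this assumption, a swing one third as large does about 1/729 of the harm per cycle, which is why a modest cycling range contributes so little.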
I cannot comment on the monitor bit (I usually shut mine off between sessions) but I can attest to the HDD failures/BOINC issue. I replaced one a few months ago that had been in use for about 4 years (crashed hard) and in another box I have one (2.5 years) that is showing the Spin Retry Count to be outside its acceptable threshold (debating whether to do an HD swap between boxes because the oldest one is going to the kids shortly). It is possible that both are a case of "averages" but none of my previous computers burned through their HDDs in their lifetimes (5+ years with no BOINC). IMO, the cost of an equivalent HDD is quite reasonable, though lost data, if unrecoverable from a crash, can be a bad thing (repeat mantra "backup data, backup data") | |
| ID: 82328 | | |