Important news on BRP7 and FGRPB1 work on E@H

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3745

Credit: 35580912771

RAC: 36609429

29 Sep 2023 16:05:14 UTC

Message 217664

completed and validated. ran for 4 hours.

https://einsteinathome.org/task/1530744930

_________________________________________________________________________

Allen

Joined: 23 Jan 06

Posts: 71

Credit: 433172901

RAC: 1229749

29 Sep 2023 18:28:29 UTC

Message 217667

Just a quick question.

I've switched everything over to BRP7, except the one machine running BRP4 (Intel GPU), and my RAC has dropped from 2.3 million to under 1 million.

Is this to be expected?

Thanks,

Allen

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3745

Credit: 35580912771

RAC: 36609429

29 Sep 2023 18:50:25 UTC

Message 217669 in response to message 217667

Allen wrote:

Just a quick question.

I've switched everything over to BRP7, except the one machine running BRP4 (Intel GPU), and my RAC has dropped from 2.3 million to under 1 million.

Is this to be expected?

Thanks,

Allen

yes, it's expected

_________________________________________________________________________

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 216

Credit: 8460854329

RAC: 2324420

29 Sep 2023 19:35:33 UTC

Message 217671 in response to message 217664

Ian&Steve C. wrote:

completed and validated. ran for 4 hours.

https://einsteinathome.org/task/1530744930

Ha! That is impressive (both that it worked and how long it actually took!).

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5851

Credit: 110827705257

RAC: 33670309

29 Sep 2023 20:41:06 UTC

Message 217672 in response to message 217630

Bernd Machenschalk wrote:

On a larger scale (overall project), 0.15 (Linux) had an invalid rate of >10% when paired with 0.12 (Windows), while 0.17 has 3-4%. From my perspective this looks like an improvement, though possibly not for each and every host.

Since it's early days, are you also tracking the rate for inconclusives? Whatever that rate is now, when enough time has elapsed for the 'decider' to be crunched and returned, 50% of current inconclusives will be extra invalids.

As a case in point, here are my latest figures for the 0.17 test machine:-

Pending = 14
Valid = 64
Invalid = 5
Error = 0
Inconclusive = 19

Because of the Windows/Linux imbalance, my guess is that ~80% of my inconclusives will be future invalids and ~20% will be valid - something like 15 to 4 tasks split for the 19 above. So, my current situation is more likely to become something like 68 valid to 20 invalid. This is actually worse than what I was seeing with the 0.15 app.
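The projection above is simple arithmetic, and a minimal Python sketch makes the steps explicit. The function name and the 80% share passed to it are illustrative assumptions taken from the guess in this post, not anything the project itself computes.

```python
def project_inconclusives(valid, invalid, inconclusive, invalid_share):
    """Project final valid/invalid counts once all inconclusives are decided.

    invalid_share is the assumed fraction of inconclusive tasks that will
    end up invalid (a guess, not a measured value).
    """
    extra_invalid = round(inconclusive * invalid_share)
    extra_valid = inconclusive - extra_invalid
    return valid + extra_valid, invalid + extra_invalid


# Figures from the 0.17 test machine quoted above.
final_valid, final_invalid = project_inconclusives(
    valid=64, invalid=5, inconclusive=19, invalid_share=0.80
)
print(final_valid, final_invalid)
```

With the 80/20 guess, 15 of the 19 inconclusives are counted as invalid and 4 as valid, which gives the 68 to 20 split mentioned above.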

Of course, a Windows user won't really see a problem, but as a Linux user, I do.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5851

Credit: 110827705257

RAC: 33670309

30 Sep 2023 1:29:00 UTC

Message 217694

On thinking about the 'inconclusives' problem, I wonder if there might be two possible options for a solution, if there isn't a 'fix the app or validator' way to get a validation match for properly computed results.

Option 1:- Break up BRP7 into 2 separate searches - BRP7_Win and BRP7_Lin, with others (MacOS, etc.) added to whichever is the best likely match for them.

Option 2:- Add functionality to the scheduler to 'know' the OS of the _0 task in an 'unfilled' quorum so that when it looks at allocating the _1 task, it attempts to match the OS to that of the _0. If that could be done (even just most of the time) it should largely eliminate the Windows/Linux pairing that appears to cause so many inconclusives.
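To make Option 2 concrete, here is a rough Python sketch of the matching idea. It is not BOINC scheduler code; `Quorum`, `pick_quorum`, and the OS strings are hypothetical names invented for illustration, and a real scheduler would have many other constraints to honour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quorum:
    """Hypothetical record of a two-task quorum for one workunit."""
    workunit: str
    first_os: Optional[str] = None   # OS of the host holding the _0 task
    second_os: Optional[str] = None  # OS of the host holding the _1 task


def pick_quorum(open_quorums: List[Quorum], host_os: str) -> Optional[Quorum]:
    """Choose which quorum a requesting host should get a task from."""
    # First preference: a half-filled quorum whose _0 task went to the same OS.
    for q in open_quorums:
        if q.first_os == host_os and q.second_os is None:
            return q
    # Second preference: an untouched quorum, so this host supplies the _0 task.
    for q in open_quorums:
        if q.first_os is None:
            return q
    # Fallback: any half-filled quorum, accepting a mixed-OS pairing.
    for q in open_quorums:
        if q.second_os is None:
            return q
    return None


quorums = [
    Quorum("wu_a", first_os="windows"),
    Quorum("wu_b", first_os="linux"),
    Quorum("wu_c"),
]
print(pick_quorum(quorums, "linux").workunit)
```

The fallback branch reflects the "even just most of the time" caveat: when no same-OS partner is available, the sketch still hands out work rather than leaving a host idle.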

My personal preference would be for option 2.

I'm not a programmer so I have no real idea of how complicated this might be. If there are people reading who have understanding of how the scheduler works, I'd like to hear comments on whether or not any of this is feasible.

Judging by how long FGRPB1G lasted, it would seem that BRP7 might be around for a long time as well. Solving the wastage of volunteered contributions (both hardware and electricity) should be not only worthwhile, but pretty much an imperative, in my humble opinion.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5851

Credit: 110827705257

RAC: 33670309

30 Sep 2023 4:42:07 UTC

Message 217701

I decided to look at the quorum for one of my 5 current invalids. There were 4 hosts, 2xWin 2xLinux. The 2 Win tasks were validated. Out of curiosity, I checked the task stats for the other Linux host. Below are the numbers at the time I looked:-

Pending = 395
Valid = 518
Invalid = 225
Error = 5
Inconclusive = 150
In progress = 25
All tasks = 1,318
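The tallies above can be cross-checked with a few lines of Python. The dictionary keys are made up for this sketch, and the "invalid rate among decided tasks" is one assumed way of defining a rate, not necessarily how the project measures it.

```python
# Task counts copied from the host's task list quoted above.
counts = {
    "pending": 395,
    "valid": 518,
    "invalid": 225,
    "error": 5,
    "inconclusive": 150,
    "in_progress": 25,
}

total = sum(counts.values())
flagged = counts["invalid"] + counts["inconclusive"]

# Assumed definition: only tasks with a final verdict count toward the rate.
decided = counts["valid"] + counts["invalid"] + counts["error"]
invalid_rate = counts["invalid"] / decided

print(f"all tasks: {total}")
print(f"invalid + inconclusive: {flagged}")
print(f"invalid rate among decided tasks: {invalid_rate:.1%}")
```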

The host has dual 1070Ti GPUs and is running an 'anonymous platform' app so maybe that has something to do with the 375 total for invalid plus inconclusive. By looking at the stderr output, the app is listed as:-

BRP7_einsteinbinary_x86_64-pc-linux-gnu__cuda1200

if that means anything to anybody. Is that the 0.16 beta test app?

Cheers,
Gary.

Keith Myers

Joined: 11 Feb 11

Posts: 4777

Credit: 17790837168

RAC: 3817348

30 Sep 2023 8:00:17 UTC

Message 217703 in response to message 217701

No, that is one of Petri's optimized app versions.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3745

Credit: 35580912771

RAC: 36609429

30 Sep 2023 13:27:37 UTC

Message 217713

the beta app is cuda 102, and actually validates really well on my test (3 invalids out of 200 tasks). slightly better than the v0.17 opencl app which has 5 invalids so far for the same number of tasks.

in my analysis, the invalids (as a linux/nvidia user) almost always come from a pair of windows hosts. either win_cuda55+win_cuda55 or win_cuda55+win_ati or win_ati+win_ati. I had one invalid from a win_cuda55+linux_nvopencl pair, but that was an outlier. it's definitely a windows vs linux thing. and due to the relative spread of windows vs linux hosts (many more windows) that puts the linux hosts at a disadvantage.

that user is also running the older version (mostly for speed testing purposes). I recompiled it with cuda1222 using GCC 7.3 per Bernd's comments earlier in this thread.

_________________________________________________________________________

Keith Myers

Joined: 11 Feb 11

Posts: 4777

Credit: 17790837168

RAC: 3817348

30 Sep 2023 17:14:49 UTC

Message 217722 in response to message 217713

And from what I have seen, the new app compiled with GCC 7.3 has really helped bring down the inconclusives.

Massive improvement for the one host starting with over 38% inconclusive rate on the older 1830 version I had on the host because I never got to updating to the 1200 version. Now on your 1222 version,

I thought it was a good idea to match how Bernd compiled the latest beta.
