Problem with GPU-CPU tasks

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

OK, so I understand if I run two GPU tasks on one card that BOINC is counting as part of the CPU, and it can be fixed via a config file change. Since I am running this on a HT machine with HT on (the BIOS option to shut it off is blocked), I just changed the CPU usage to 62.5%, and now I am running 4 CPU tasks and the 2 GPU tasks. This change has made no discernible difference in the CPU or GPU tasks. This is exactly what I expected.
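
For the record, the same 62.5% cap can be set per-host instead of via the web preferences. This is a minimal sketch of BOINC's local override file, assuming a standard BOINC data directory:

```xml
<!-- global_prefs_override.xml, placed in the BOINC data directory.
     62.5% of 8 HT threads allows 5; with the 2 GPU tasks reserving
     roughly 1 CPU between them, that leaves 4 threads for CPU tasks. -->
<global_preferences>
    <max_ncpus_pct>62.5</max_ncpus_pct>
</global_preferences>
```

BOINC reads this file at startup, or on demand via "Read local prefs file" in the Manager, and it overrides the web-based preference.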

Thanks for the education.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110457877278
RAC: 30933369

Quote:
OK, so I understand if I run two GPU tasks on one card that BOINC is counting as part of the CPU, and it can be fixed via a config file change.


I'm not sure I fully understand you, but BOINC doesn't count the GPU as "part of the CPU". It regards the GPU as a coprocessing unit - i.e. something that functions quite separately, provided there is some CPU support available when needed. With some projects, the needed support may be very small. It's rather more substantial here, particularly for AMD GPUs. The fractional CPU recommendations per GPU task are a compromise, since there is a wide variation in the systems that might attach to this project. Whilst the compromise values seem to be quite good, individual hosts may be able to run more efficiently with tweaked numbers. The only way to really know is to experiment.
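
For those who want to experiment, the per-task reservations can be overridden with an app_config.xml in the Einstein project directory. This is a sketch only - the app name below is an assumption, so confirm the exact name in client_state.xml on your own host:

```xml
<!-- app_config.xml in projects/einstein.phys.uwm.edu/
     The <name> value is assumed - check client_state.xml for the real one. -->
<app_config>
    <app>
        <name>hsgamma_FGRP3</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>  <!-- one task per GPU -->
            <cpu_usage>0.5</cpu_usage>  <!-- CPU fraction reserved per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```

Recent 7.x clients re-read this via "Read config files" in the Manager, so the numbers can be tweaked between experiments without restarting BOINC.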

Quote:
... now I am running 4 CPU tasks and the 2 GPU tasks. This change has made no discernible difference in the CPU or GPU tasks. This is exactly what I expected.


It's what I would have expected too. You're just wasting some CPU threads. Your problem is the relatively low power of your GPU. As I mentioned previously, you should try running just one GPU task. I wouldn't be surprised if you actually gained a little bit of GPU throughput by doing that. If you did this you would have all your virtual cores available for CPU tasks.

In a previous message you said:

Quote:
Einstein steals a CPU for any GPU work. I run 4 cores CPU, and my GPU, but when I run Einstein it takes one of the CPUs, so only 3 run. Other projects (MW, PG, Seti, etc), I can still run all 4 CPUs and the GPU and not have it steal one.


Einstein doesn't steal anything. BOINC decides what to run and you have some control over that with configuration files. The bit that puzzles me is that you are now saying that you can't turn HT off so you must be running 7 CPU jobs and not 3, surely?? From your previous messages, it appeared that you wanted to maximise your CPU output. So why not try just one GPU task and have 8 CPU tasks?

Cheers,
Gary.

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

Quote:
I wouldn't be surprised if you actually gained a little bit of GPU throughput by doing that. If you did this you would have all your virtual cores available for CPU tasks.


I have previously tested this. Running 2 was giving me a little speedup over 1. I didn't test with the current iteration, and you might be right, but I suspect it is minimally different.

Quote:
Einstein doesn't steal anything. BOINC decides what to run and you have some control over that with configuration files. The bit that puzzles me is that you are now saying that you can't turn HT off so you must be running 7 CPU jobs and not 3, surely?? From your previous messages, it appeared that you wanted to maximise your CPU output. So why not try just one GPU task and have 8 CPU tasks?


Einstein tells BOINC how much CPU to use for the GPU, so yes, it has control. It says to use .5 CPU per GPU task, so running two takes 1.0 CPU. My bet is that this can be pared back in the newer iterations of the software since it's not taking that much CPU anymore.

Some projects do not run very well with HT utilized, so I run at half the threads most of the time. One project that I run fairly often actually shows a major difference if I run it on 5 cores: it nearly doubles the run time. I've been doing some testing on many projects and I just leave most using 4 cores.

John Jamulla
Joined: 26 Feb 05
Posts: 32
Credit: 1089406147
RAC: 372263

Hi Guys,

I'm trying to follow the thread here, not getting it.

I'm pissed on whatever has gone on and changed, since my credit has gone to hell.

Overall, I think the real question is: as soon as these new apps came online (not sure exactly when, maybe 3 weeks or a month ago) with OpenCL etc., all of a sudden our credit dropped dramatically and einstein@home stopped using the CPU when GPU apps are running. WHY?

BTW - I ONLY run einstein@home and ONLY with NVIDIA GPUs, so why am I being forced to use crappy OpenCL apps with 1/2 as much credit that don't use the CPU, etc.? Why was this change made? I don't see anything good about it.

Each one of my GPU apps all of a sudden demands 1/3 of a CPU (they never did that before), so for example on 1 machine, when I have 3 GPUs each running 3 jobs, that's 9 CPUs being asked for, and I only have 8 (hyper-threaded quad core).
I am not noticing anything slower with these GPU jobs vs. previously, and now I just see CPUs doing nothing; there are no CPU jobs running.

Machine I'm sitting on now, I have a GTX 660 Ti with 3 GPU jobs running, and I have a quad core i7-3770K hyperthreaded, and it's only using like 40% of the CPU now, and won't run any CPU jobs! WTF?
This all happened when I loaded the new BOINC version.

I don't understand why. Doesn't make any sense to me. This wasn't like this not long ago. It doesn't seem to have benefited anything; now I'm trying to pick up the pieces and get myself running max jobs/credit again, and I don't really know how.

That's a MAJOR MAJOR change all of the sudden.
I also noticed that with the new version of BOINC, I think (maybe it's these APPS), it "hangs" my machine for say 15 seconds at a time when I let it run, even when the CPU is barely being used. It NEVER did that before either. The CPU isn't even really being used, so what is causing that?

So to me, this new set of updates is crap, it's slower, it's caused problems, and changes to setups for everyone without any notice.
I don't think I should have to go around to all my machines, check them and re-configure everything (which honestly I don't even know what is the proper thing to do at this point).

I have a machine (I mentioned above) with 3 GPUs, each running 3 GPU jobs, and a quad core hyper threaded, and I had 7 or 8 (can't remember) CPU jobs running, and I was averaging like 100K credits a day, was GREAT!
As soon as I made a BOINC update, it went from working well to all of a sudden getting 1/2 as much credit and running no CPU jobs only GPU jobs.
I paid good money for both GPUs and CPU just for einstein@home and my $400 CPU is sitting around idle now.

I don't understand what happened. And on top of it, MAJOR changes like that should have some sort of project notice right in BOINC. Kinda ridiculous to just say "well, you should only be running a single GPU app", when for years you wanted us to optimize how many run on a single GPU.

After my ranting here, would be GREAT if someone told me how I should be setting things on these machines given the current state of things.

Do I just change it to run 1 job on a GPU at a time and things will go back better? Sounds like maybe No.

It was nice that in the last year or two I got rid of the app_info.xml; now it sounds like I might have to go back to using it!

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

I'd bet the problems started when the GPU app for the Gamma-ray Pulsar search #3 (FGRP3) was released; it's still an app in development and it should get better as new versions are released.

The easiest way for you to get back to how things were is to go to your Einstein@Home preferences and set the "GPU utilization factor of FGRP apps" to -1. If you have set up different venues/locations, don't forget to change the setting in all of them. This will prevent you from receiving FGRP3 tasks, and when the tasks that are already in your cache are completed, things should go back to normal. Or do the drastic thing and just abort the "offending" tasks.

The app was released because it's faster on the GPU than on the CPU, so the project benefits. They run for something like 20 hours on my i7 3770K (I don't see one in my list of tasks so I can't give an exact number) but take only ~3 hours running x2 on my GTX660Ti.

As to using OpenCL, I think future apps released here will probably use OpenCL, as it's less work for the developers to write and maintain one app that, with only minor modifications, runs on all or close to all platforms.

Quote:
I also noticed that with the new version of BOINC, I think (maybe it's these APPS), it "hangs" my machine for say 15 seconds at a time when I let it run, even when the CPU is barely being used. It NEVER did that before either. The CPU isn't even really being used, so what is causing that?


I've noticed that when I run more than x2 FGRP3 tasks on my 660Ti I also get screen lags, not for 15 seconds but maybe 1-2 seconds - enough to be really irritating when using the machine, so I opt to run x2. I think the reason is that a lot of the processing is still done on the CPU and the app has to transfer a lot of data back and forth to the GPU, and this either saturates the PCIe bus or uses driver calls that block the CPU until done.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110457877278
RAC: 30933369

Quote:
I have a machine (I mentioned above) with 3 GPUs, each running 3 GPU jobs, and a quad core hyper threaded, and I had 7 or 8 (can't remember) CPU jobs running, and I was averaging like 100K credits a day, was GREAT!
As soon as I made a BOINC update, it went from working well to all of a sudden getting 1/2 as much credit and running no CPU jobs only GPU jobs.
I paid good money for both GPUs and CPU just for einstein@home and my $400 CPU is sitting around idle now.


The downward change in your RAC for this machine was not caused by the BOINC update. It wasn't even caused by the initial release of the FGRP3 GPU app because (until quite recently) FGRP3 GPU tasks were not allocated to GPUs with less than 2GB GPU RAM. Now that the limit has been revised down to 1GB, that host will be suffering, particularly if you are trying to run multiple FGRP3 tasks per GPU. Holmis has given good instructions for disabling FGRP3 tasks which you can use to solve the problem.

I'm responding because I think you can make other changes to improve things apart from just disabling FGRP3 GPU tasks. The 3 GPUs in this host are listed as GTX460 1GB. All Einstein GPU apps are pretty demanding on PCIe bandwidth, so trying to run 3 concurrent tasks on each GPU (9 in total) as well as a bunch of virtual CPU threads is going to impact performance, even if you have a pretty high end motherboard. If you could shift one of those GPUs to a different machine, you may be able to improve overall throughput. You should also experiment with freeing up one or more virtual CPUs (set less than 100% for CPU usage) so the GPU tasks get better CPU support.

Quote:
I don't understand what happened. And on top of it, MAJOR changes like that should have some sort of project notice right in BOINC.


The BOINC notice system seems to tell you about new BOINC versions but I don't recall seeing notices about new project apps and versions. I find it easier to look at the technical news board which always has announcements about such things. There's hardly any traffic on that board unless something important is happening so if you're concerned about performance you should check that board regularly.

Quote:
Kinda ridiculous to just say "well, you should only be running a single GPU app", when for years you wanted us to optimize how many run on a single GPU.


Do you mean "single GPU task" rather than "single GPU app"? Whatever you mean, nobody is telling you to do anything in particular. The important thing is that circumstances change, apps/versions come and go so you have to pay a bit of attention if you don't want surprises. If you go back and read the announcement thread in Technical News, it should give you a good idea of the various problems with the deployment of the FGRP3 CPU/GPU apps.

While you're at it, you might like to check this thread in Cruncher's Corner regarding some validation issues with the second half of the BRP5 run which has just started. That way you won't be taken by surprise if you notice a few validate errors turning up in BRP5 tasks.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110457877278
RAC: 30933369

A couple more points you might like to consider:

Quote:
BTW - I ONLY run einstein@home and ONLY with NVIDIA GPUs, so why am I being forced to use crappy OpenCL apps with 1/2 as much credit that don't use the CPU, etc.? Why was this change made? I don't see anything good about it.


You are not being forced to use "crappy OpenCL apps". You are perfectly able to run either BRP4 or BRP5, which have CUDA apps and are probably more suited to your GPUs. The FGRP3 app only comes in an OpenCL version. The app is not "crappy", just immature. A lot of the calculations are not yet being run on the GPU - hence the high CPU involvement. As happened with BRP4/5, this will be rectified over time. The FGRP3 GPU app probably suffers on your GPUs because of NVIDIA's 'crappy' OpenCL implementation :-). I don't know that for a fact because I've never tried to run an OpenCL app on an NVIDIA GPU, but I believe I've seen others mention it.

Many people don't seem to realise how much improvement there has been in AMD cards and particularly the drivers in the last year or three. I started with a whole bunch of GTX650 budget GPUs and I still think they are pretty good but I'm much more impressed with my more recent purchases of AMD HD7850 cards. I've been pleasantly surprised at the modest power consumption when the output is more than double that of the GTX650s. Take a look at this host. It has a quad core i5-3570K CPU (2 CPU tasks and 2 free cores) and runs tasks 4x on the GPU. It has a RAC of almost 80K (falling at the moment because of the BRP5 validation issue) from a GPU that cost just $US135 - if I wanted a 660Ti it would cost $100 more here. I don't have it hooked up to a power meter at the moment but from memory the total draw from the wall was around 170 watts when I first set it up. So much for the "crappy OpenCL apps" theory :-).

Quote:
Machine I'm sitting on now, I have a GTX 660 Ti with 3 GPU jobs running, and I have a quad core i7-3770K hyperthreaded, and it's only using like 40% of the CPU now, and won't run any CPU jobs! WTF?


I think you must be so focused on your rant that you're not checking things correctly. With 3 GPU tasks running on an 8 core host, your worst case would be that 3 of the 8 virtual cores would be reserved by BOINC for GPU support purposes if the 3 GPU tasks were FGRP3. If the other 5 cores are not trying to run CPU tasks then either you don't have any CPU tasks in your cache or your settings are somehow preventing them from trying to run.

Quote:
This all happened when I loaded the new BOINC version.


I'd be extremely surprised if it had anything to do with a BOINC version change.

Cheers,
Gary.

(retired account)
Joined: 28 Sep 11
Posts: 16
Credit: 7357648
RAC: 0

Anybody else having validate errors with Gamma-ray pulsar search #3 v1.11 (FGRPopencl-ati) against Gamma-ray pulsar search #3 v1.11 (FGRPSSE)? For me it appears to be a game of pure chance now. 4 validate errors and 10 valid so far.

On the bright side, I really appreciate the small up- and download volume here for the new #3 search. CPU usage for the OpenCL app is 60 to 70% for me. Given the substantial speedup over the SSE app, this app and the preset of 1.0 cpu resource does make sense, IMHO.

Mark my words and remember me. - 11th Hour, Lamb of God

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110457877278
RAC: 30933369

Quote:
Anybody else having validate errors with Gamma-ray pulsar search #3 v1.11 (FGRPopencl-ati) against Gamma-ray pulsar search #3 v1.11 (FGRPSSE)? For me it appears to be a game of pure chance now. 4 validate errors and 10 valid so far.


This is one of the things the Devs are most interested in. Quite a while ago Bernd said that they had full confidence in the CPU app but that the GPU app was a bit of an unknown quantity and they were hoping to study CPU app/GPU app validations in order to prove up and have confidence in the GPU app. I guess they would be most interested in checking cross app validation. A validate error is a bit different. It occurs, not from result to result comparison, but from a sanity check by the validator before any comparison is attempted. However you can be sure that the Devs will be looking closely at both validate errors and any invalid results that arise from direct comparison.

Quote:
On the bright side, I really appreciate the small up- and download volume here for the new #3 search. CPU usage for the OpenCL app is 60 to 70% for me. Given the substantial speedup over the SSE app, this app and the preset of 1.0 cpu resource does make sense, IMHO.


I'm sure the Devs really appreciate your commitment to running the FGRP3 GPU app. Now that the (potentially) large LATeah skygrid files are no longer needed, download volumes are indeed very much reduced. Until now, I've not run the FGRP3 GPU app on my GPU hosts at all but just this week I made the decision to purchase some extra HD7850 cards to put into a number of older hosts with Q8400 quad core CPUs that I have crunching FGRP3 on the CPUs. They are over 4 years old and only have DDR2 RAM and PCIe version 1.x so GPU performance will suffer anyway but I'm going to try running 2 CPU tasks (2 free cores for GPU support) and 3x on the 2GB GPU.

As a preliminary test, I've set up this Q8400 host with a HD7850 crunching BRP5 4x on the GPU. It's doing the 4 GPU tasks in around 24.7Ksecs - a bit under 7 hours. By way of comparison, this Phenom II X4 host which has the same vintage of (slightly lesser capability) CPU but a motherboard that has PCIe2 and DDR3 1333 RAM, manages to do the 4x GPU tasks in around 16Ksecs - around 4hrs 25mins. Just goes to show the value of PCIe2+ and DDR3 RAM.

EDIT: I should also point out that this very machine used to run Catalyst 13.4 drivers and a month or two ago I upgraded to 13.12. Before the upgrade, the 4x time was over 19Ksecs. The drop to 16Ksecs was purely from the driver upgrade - nothing else changed.

My next step will be to work out how to reshuffle my venues to allow a new class of host running FGRP3/FGRP3 as well as the existing FGRP3/BRP5. Long ago I'd exhausted all venues so I can't see how I'm going to achieve this :-).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110457877278
RAC: 30933369

Quote:
My next step will be to work out how to reshuffle my venues to allow a new class of host running FGRP3/FGRP3 as well as the existing FGRP3/BRP5. Long ago I'd exhausted all venues so I can't see how I'm going to achieve this :-).


Well I worked out a compromise with the venues so I now have 2 new 7850s in two old Q8400 hosts crunching both CPU and GPU versions of FGRP3. I started this one crunching GPU tasks singly to get baseline info, and this one crunching 2x to see the difference.

What I ended up doing reinforces why I love running Linux. The distro I use had a news item recently pointing out that they were switching to a 3.12.x kernel and were revamping video drivers and that certain types of GPUs would need the new kernel in order to run the latest drivers. I didn't do much research but I wanted to try the latest kernels anyway because for a long time this distro had been stuck with quite old kernel versions - 3.2.x and 3.4.x. So rather than just sticking in a card and adding a driver from the repo, I decided to reinstall the OS completely (using a live USB from October last year) and do a full upgrade, kernel and everything. I keep a fully updated local copy of the repo so everything was on hand.

So I just stopped BOINC, shutdown and installed the new hardware, rebooted from the live USB, reinstalled the OS without touching the /home partition (where BOINC lives), rebooted the new installation and did a full upgrade from the local repo, making sure the OpenCL libs were also installed. The machine had been running BOINC 6.10.58 so I threw in all the 7.2.42 files 'over-the-top' as well as the einstein app and a suitably adjusted app_config.xml in the project directory. I like running any new BOINC or project apps through ldd to check for any missing shared libs, so that was the final step. A final reboot, and then launch BOINC. BOINC noticed the version change and the app_config.xml file and it did a work request for some GPU tasks. The local copy of the app was also noticed together with all the existing data files so nothing had to be downloaded before crunching could start.

The whole exercise only took about an hour and a half. The biggest part was upgrading from the local repo - 5 packages to delete, 29 new packages and 357 to upgrade. That took about 15 mins or so. So after completing the first machine, I did the second one, this time setting the app_config.xml file to have 2x for GPU tasks.
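
For reference, the "suitably adjusted" app_config.xml mentioned above would look something like this for the 2x case - a sketch only, with the app name assumed rather than copied from a real client_state.xml:

```xml
<!-- app_config.xml in the Einstein project directory.
     The <name> value is an assumption - check client_state.xml. -->
<app_config>
    <app>
        <name>hsgamma_FGRP3</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>  <!-- 2x: two tasks share each GPU -->
            <cpu_usage>1.0</cpu_usage>  <!-- a full core reserved per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```

With gpu_usage at 0.5 BOINC schedules two GPU tasks concurrently, and with cpu_usage at 1.0 it reserves one core per task, which matches the 1CPU+0.5GPU figures below.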

These two new setups have been running about 12 hours. I allowed the first one to complete 1 GPU task and get about 20 mins into the second before changing app_config.xml to allow it to run 2x. The second host was running 2x from the start and I could see that it was going to run 2 tasks in much the same time as the first one running singly. Here is a summary of what I'm seeing so far (very early days):

1. Run time for a CPU task (no GPU) -- around 12+ hours
2. Run time for a 1CPU+1GPU GPU task -- around 3 hours (only one task done that way)
3. Run time for a 1CPU+0.5GPU GPU task (running 2x) -- around 3 hours (average of 16 results)

When running 1x, 3 CPU cores were running CPU tasks. When running 2x, 2 CPU cores were running CPU tasks.

The machines are at a remote location and I'll change them to run 3x when I next go there. I'm on home duties at the moment :-).

Cheers,
Gary.
