GPU and CPU

John
John
Joined: 1 Nov 13
Posts: 59
Credit: 573081286
RAC: 0
Topic 198624

A few questions, because there are lots of GPUs and CPUs to combine:

1. In the GPU statistics table, the GTX 580 takes 2nd place, ahead of the GTX Titan X. I don't know why yet, but the price difference is big. You can find GTX 580s on eBay or Amazon for somewhere between 50 and 100 EUR, much less than a Titan X. So, is it worth buying one? I was thinking about the 1.5 GB RAM version.

2. If the answer to the previous question is YES, which system would be better: one with an i7-920 (found at about 200 EUR) or one with an i5-6600 (at about 500 EUR)?

3. Is it a waste of time and money to look for old tech (cards), or not really? I'm thinking mostly about the fact that they may overheat, they are used, and you never know what you're buying, compared to a new card.

Cheers!

mikey
mikey
Joined: 22 Jan 05
Posts: 11973
Credit: 1834116705
RAC: 224573

GPU and CPU

Quote:

A few questions, because there are lots of GPUs and CPUs to combine:

1. In the GPU statistics table, the GTX 580 takes 2nd place, ahead of the GTX Titan X. I don't know why yet, but the price difference is big. You can find GTX 580s on eBay or Amazon for somewhere between 50 and 100 EUR, much less than a Titan X. So, is it worth buying one? I was thinking about the 1.5 GB RAM version.

2. If the answer to the previous question is YES, which system would be better: one with an i7-920 (found at about 200 EUR) or one with an i5-6600 (at about 500 EUR)?

3. Is it a waste of time and money to look for old tech (cards), or not really? I'm thinking mostly about the fact that they may overheat, they are used, and you never know what you're buying, compared to a new card.

Cheers!

I'm running Nvidia 760 and Nvidia 560 GPUs and they work just fine making credits without running too warm for me. I only run one work unit at a time and the GPUs run at about 69 °C.

I'm running mostly AMD CPUs so I can't say which of your Intel CPUs would be better here, but at most projects Intel CPUs are generally better at crunching than AMD CPUs. AMD CPUs, though, can be much cheaper to buy, which is why I run them.

archae86
archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7061564931
RAC: 1189116

RE: in the GPU statistics

Quote:
in the GPU statistics table


I don't think that table is adjusted for even some of the most basic variables, such as number of tasks run at once. Hence it is extremely unreliable as a means of estimating relative performance.

Quote:
Is it a waste of time and money to look for old tech (cards), or not really?


The big good thing is that you may be able to get an older card for a very good price. At a low enough purchase price, productivity per euro spent can be very good.
The big bad thing is that, generally speaking, the power efficiency of cards has steadily improved, so a proper total-cost calculation that gives the right weight to the cost of power may make a more modern card look better.

No one can do the power cost computation for you, as it depends on your own cost of incremental power, expected service life, and so on. If you are just dipping a toe in on a trial basis, the expected service life is probably short, which shifts the weight toward purchase price and away from power cost. If you are in for the long haul, not likely to change cards frequently, and live in a high power-cost location, then power cost is likely to dominate.
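To make that trade-off concrete, here is a minimal sketch of the arithmetic. Every number in it is an illustrative placeholder, not a measurement of any particular card; substitute your own purchase price, measured power draw, electricity tariff and expected service life.

[pre]
# Rough total-cost-of-ownership comparison for two hypothetical cards running 24/7.
# All figures are placeholders - substitute your own.

def total_cost_eur(purchase_eur, power_watts, years, eur_per_kwh):
    """Purchase price plus electricity cost over the card's service life."""
    kwh = power_watts / 1000 * 24 * 365 * years
    return purchase_eur + kwh * eur_per_kwh

old_card = total_cost_eur(purchase_eur=75, power_watts=245, years=2, eur_per_kwh=0.25)
new_card = total_cost_eur(purchase_eur=400, power_watts=150, years=2, eur_per_kwh=0.25)

print(f"old card: ~{old_card:.0f} EUR over 2 years")  # cheap to buy, dear to run
print(f"new card: ~{new_card:.0f} EUR over 2 years")  # dear to buy, cheaper to run
[/pre]

Dividing each total by the card's expected credit output over the same period gives a cost per credit you can actually compare between cards.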

At the moment the Einstein GRP6 CUDA55 (which you get if you opt to allow test applications in your Einstein preferences for the location) is much less demanding of the CPU support applications than has historically been the case here at Einstein for Nvidia cards. Thus you can get by with not too wonderful a CPU. However I/O matters, so I think you might be better off with a modern motherboard hosting a somewhat low-end modern processor than with other alternatives. I've been pretty happy with a dual-core Haswell host.

On the other hand, if Einstein offers a new gravity-wave application which uses GPUs some day, it may well be far more demanding of CPU support than the current GRP6 application.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110051164537
RAC: 22714200

RE: Is it a waste of time

Quote:
Is it a waste of time and money to look for old tech ...


No, it's not a waste but you need to really do your homework first. Here are some questions you need to ask yourself.

  * How strong is my commitment to this DC thing? Will I still be here in 5 yrs time?
  * Can I afford the up front costs AND the on-going power bills?
  * Have I researched the range of GPU types so that I understand which ones are likely to produce the best output for the TOTAL spend I'm prepared to make?
  * Do I properly understand the likely flaws of the GPU statistics table?
  * Have I tried to use the top computers list to work out a better performance metric than the GPU table?

There are also some general principles to keep in mind when trying to spend your money wisely. There is often quite a premium for top-of-the-range performance: for a lot more dollars, you get only a bit more performance. Those extra dollars may also go into higher running costs for that small performance gain.

There is always a lot of hype around new releases. You can sometimes turn this to your advantage. Lots of people in the gaming community get sucked into upgrading their rigs and they sell off the previous generation to fund the new stuff. It's probably a good time to pick up a decent crunching card at a considerable discount.

If you are interested in a particular model, be aware that the older it is, the less remaining life it will have and the more power it is likely to use. If you've considered all that and you're still interested, try to work out what it's going to produce at Einstein. Let's take the GTX 580 you mentioned as an example. Scroll through the top hosts list and see if you can find that card or a closely related one. I tried that and found host #88, owned by stoneageman. It has 2 GPUs, at least one of which is a 570. You would think a 580 might be slightly more productive. I didn't keep going, but you should, until you find a 580.

By looking at the tasks list for this host, I see lots of results taking around 4,400 secs with very little variation. To me that says that both GPUs are the same. You can confirm this by following enough task ID links to see the actual model being used as gpu0 and gpu1. The crunch times are so close that I didn't bother doing that.

So we assume that the RAC shown (172K) is being produced equally by each GPU - 86K each. There are no Einstein CPU tasks for this host (there could be for other projects), so the 86K is entirely from a GPU. Can we work out what concurrency is being used? On the assumption that the host runs 24/7, is stable, and is only doing Einstein GPU tasks (likely correct because of the good RAC), the concurrency can be calculated from the following formula

concurrency = daily credit X av. elapsed time / 380,160,000

where 380,160,000 = seconds per day X credit per task = 86,400 X 4,400


So, for an average elapsed time of 4,400 seconds, the above formula gives a concurrency of 0.995, i.e. 1x. The closeness to a whole number gives confidence that our assumptions are correct.
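For anyone who prefers to plug the numbers in directly, here is a small sketch of the same calculation (using the 86K per-GPU daily credit, 4,400 second elapsed time and 4,400 credits per task quoted above):

[pre]
# Back out the likely GPU task concurrency from a host's RAC and task times,
# under the same assumptions as above: 24/7 operation, Einstein GPU tasks only.

SECONDS_PER_DAY = 86_400
CREDIT_PER_TASK = 4_400        # credit per GPU task, as used above

def concurrency(daily_credit, avg_elapsed_secs):
    tasks_per_day = daily_credit / CREDIT_PER_TASK
    return tasks_per_day * avg_elapsed_secs / SECONDS_PER_DAY

print(concurrency(daily_credit=86_000, avg_elapsed_secs=4_400))  # ~0.995, i.e. 1x
[/pre]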

You should bear in mind that it should be possible to derive even higher output than 86K by running a concurrency of 2x. So the RAC looks attractive and now you just need to investigate power consumption :-).

Cheers,
Gary.

John
John
Joined: 1 Nov 13
Posts: 59
Credit: 573081286
RAC: 0

RE: At the moment the

Quote:


At the moment the Einstein GRP6 CUDA55 (which you get if you opt to allow test applications in your Einstein preferences for the location) is much less demanding of the CPU support applications than has historically been the case here at Einstein for Nvidia cards. Thus you can get by with not too wonderful a CPU. However I/O matters, so I think you might be better off with a modern motherboard hosting a somewhat low-end modern processor than with other alternatives. I've been pretty happy with a dual-core Haswell host.

On the other hand, if Einstein offers a new gravity-wave application which uses GPUs some day, it may well be far more demanding of CPU support than the current GRP6 application.

I was thinking about exactly this when I opened the topic: some sort of trade-off between a strong (server?) CPU and a consumer one (say an i5). It seems an i3, even a 6th-generation one, can be pretty weak and underpowered, at least judging by the top computers list and the processor table.

Regarding Gary's answer (fascinating and detailed), one dilemma is IHC (initial high cost) vs LHB (later higher bills).
About Q4 - the flaws of the GPU table - I didn't know much about how it was put together, but thanks to this demonstration it's clear now, at least as a general guideline.
Thanks for the card-upgrading tip; I had no idea about that habit.
Power consumption should be easy to investigate about 3-6 months after installing the GPU. You can work out an average from full days and partial ones, I guess.

About >1x concurrencies, why would someone reach for higher ones?

archae86
archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7061564931
RAC: 1189116

RE: About >1x

Quote:
About >1x concurrencies, why would someone reach for higher ones?


To get more total output.

In simplistic terms, for most of the distributed computing applications that employ GPUs, the GPU gets loaded up by a CPU task and set free to do some work. But shortly it finishes the bit it can do without more support from the CPU. If you are running a single task on the GPU (concurrency = 1), the GPU simply sits idle while the host OS notices the request for CPU activity, assigns the task to a (virtual) CPU core, and that task gets enough done to put the GPU back in condition to run a bit longer.

If you are running more than one GPU task, there is a decent chance that when one task stalls this way, another has already been serviced and is ready to go. Modern GPUs have internal hardware that allows the rather large context switch to be done quite quickly. Nevertheless, there is some cost to the switching, so running too high a concurrency costs extra power and extra CPU work and reduces total output. How high is "too high" is the question.

There are some GPUs in use here which don't produce correct results when run with concurrency greater than one (they are AMD models, I think). But in every other case I can recall, running a concurrency of 2 gives more total system throughput than a concurrency of 1. The amount of benefit varies between very big and modest. For illustration, I'll hazard a guess that a typical benefit is perhaps a 40% increase in output, at the cost of a little more power consumption and somewhat higher CPU loading.

The benefit, break-even, or harm from going above a concurrency of 2 varies greatly with the application, the model of GPU, and perhaps the host system characteristics. While it is rare for 3x to do substantially worse than 2x, it is not rare at all for it to fail to provide any appreciable improvement, and it can even do slight harm.

Gary Roberts is careful on these things, and I believe he has recently reported further gains up as high as 5X or 6X on particular AMD models running GRP6. In my own observations on about three applications and about four models of Nvidia GPU, I've never seen appreciable gain above 3X, and often seen clear loss up there. For the person not caring to do carefully monitored testing, my general advice is to set it to 2X and forget it. In very few cases will this leave much on the table, and in almost all cases it gives a nice benefit over 1X.

Regarding Intel chip models, you perhaps should be aware that Intel in a given generation commonly has very few distinct actual chips, but many part designations. Where the designations indicate functionality differences, they probably either burn fuses or break connections with a laser to convert the generic all-capable chips into the somewhat less capable chips.

Sometimes the reductions in capability don't matter at all for your purpose at hand, so buying the cheaper part with the less impressive name can be a pure win. Right now, I think all the following names are in use:
Celeron, Pentium, i3, i5, i7, Xeon.

People will tell you general rules such as "never buy a Celeron". That is not much smarter than saying "never buy a Chevrolet", as Celeron is just a brand, and has no permanent meaning save that it is usually the cheapest flavor on offer. In some generations Intel sold Celeron parts that were real dogs (the very first one, as I recall, disabled a crucial cache component and was no fun at all to own), while others were spectacular buys (soon after the initial disaster they sold one using the P55 chip which was often rather wonderfully overclockable, and was wildly the best buy of the generation even at stock clock).

You can spend hundreds of dollars at the Intel store buying almost nothing that will help you over a moderate model. I personally selected an i5-4690K in my most recent build, for a machine that is not just an Einstein workhorse but my personal daily use machine for the next 5 years. I think I probably over-spent on the CPU. I have zero regret at not getting an i7.

Gamboleer
Gamboleer
Joined: 5 Dec 10
Posts: 173
Credit: 168389195
RAC: 0

An example: On my

An example:

On my underclocked 7970, running as the only card on an i5-3570, my times are something like this:

1 unit: 50 minutes
2 units: 92 minutes (46 each)
3 units: 135 minutes (45 each)

4 shows an even smaller improvement, and 5 shows none. I've been running 3 simultaneously, but the CPU only has 4 cores and the GPU app really does use its full 50% of a core per task. I dropped it to 2 to verify the times above, and I will leave it there pending acquisition of an RX 480, so I can run two on both cards.
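Turned into throughput, those times work out roughly like this (a quick sketch using the numbers above):

[pre]
# Throughput at each concurrency, from the times quoted above.
timings = {1: 50, 2: 92, 3: 135}   # units run together -> wall-clock minutes

for units, minutes in timings.items():
    print(f"{units}x: {units / minutes * 60:.2f} tasks/hour")
# 1x: 1.20 tasks/hour
# 2x: 1.30 tasks/hour  (~9% more than 1x)
# 3x: 1.33 tasks/hour  (~2% more than 2x)
[/pre]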

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110051164537
RAC: 22714200

RE: ... he has recently

Quote:
... he has recently reported further gains up as high as 5X or 6X on particular AMD models ...


I bought an ex-gaming HD7950 recently on eBay and, after confirming it was producing valid results at 1x, I pretty much started it straight at 4x. I have a lot of HD7850s that have run very well for a long time at 4x, so I was confident this would be a suitable starting point. Going to 5x and 6x has produced measurable but quite modest further gains. There have been no problems with crashes or invalid results. I'm leaving it at 6x for other reasons.

When I bought the GPU, I also acquired the board and RAM it had been running with. The board is a Gigabyte GA-990FXA-UD3 and I decided to bid on it as well since it had 4 PCIe slots of which two should be usable for a dual GPU setup. I tried to buy the CPU (8 core FX-8320) but it went for way too much. I ended up buying a new FX-6300 to run the show.

I knew in advance that I would be testing 5x and 6x, so I decided to start off with 4 unloaded CPU cores by allowing just 2 CPU tasks. At 4x for GPU tasks, the average 'per task' time was around 47-48 minutes, i.e. ~30 tasks per day or a theoretical RAC of 132K. CPU tasks take over 12 hours each (GW-I series), so fewer than 4 per day - around 7.5K RAC. So the maximum I should expect is about 140K.

At these settings there is a long-term work cache problem. I use app_config.xml to control both CPU tasks and GPU tasks. For 4x concurrency, I use cpu_usage=1 and gpu_usage=0.25. The BOINC preference is still to use 100% of CPU cores, so BOINC sees 6 cores even though effectively only 2 are left free for CPU tasks. I like to do it this way because if there were a GPU task outage, the host could crunch 6 CPU tasks without any change and would automatically revert to 2 CPU tasks once the outage was over. The 4 GPU tasks (when running) 'tie up' 4 CPU cores to service them. With a 3-day cache setting, BOINC fetches enough CPU work for 6 cores - effectively 9 days' worth of CPU work, since only 2 cores will actually crunch those tasks.
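For reference, an app_config.xml fragment for that 4x setup would look roughly like the sketch below. The application name is a placeholder (use the app names BOINC reports on your own host), and the comments note the 6x values given later in this post:

[pre]
<!-- Sketch only: replace GPU_APP_NAME_HERE with the application name
     reported by BOINC on your host. -->
<app_config>
  <app>
    <name>GPU_APP_NAME_HERE</name>
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>  <!-- 0.25 GPU per task = 4 concurrent GPU tasks
                                        (0.16 gives the 6x setup mentioned below) -->
      <cpu_usage>1.0</cpu_usage>   <!-- each GPU task reserves a full CPU core, so 4
                                        of the 6 cores are tied up and only 2 remain
                                        for CPU tasks (0.67 is used at 6x) -->
    </gpu_versions>
  </app>
</app_config>
[/pre]

Note there is deliberately no explicit cap on CPU tasks; limiting them through the reserved cores is what lets the host fall back to 6 CPU tasks automatically during a GPU work outage.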

Fortunately, there is an easy way to keep the amount of CPU work in check. If I run 6x, I gain a little in production - around 45-46 minutes per task, or about 31.5 tasks per day. Because the elapsed time for GPU tasks is longer at 6x, BOINC increases the DCF (duration correction factor) accordingly, which inflates the estimates for CPU tasks markedly and so limits the number of CPU tasks in the cache. With a 3-day setting, I have 3 days of GPU work but only about 6 days of CPU work. So 6x seems to be win/win for me. Of course, when CPU tasks complete much faster than their estimates, BOINC will lower the DCF a bit, but this is quickly reversed when the next GPU task finishes. Because of the preponderance of GPU tasks, the CPU tasks can't make any headway in lowering the DCF, so the cache growth is restricted.

At the moment, the RAC on this machine seems to have risen to a new plateau. Here is a screenshot of the RAC graph taken just now. I reckon you can easily see that things were heading to a lower plateau when the change from 4x to 6x kicked in and upped the level a bit :-). The current parameters in app_config.xml are gpu_usage=0.16 and cpu_usage=0.67 which gives the 6 GPU tasks and 2 CPU tasks.

Cheers,
Gary.

AgentB
AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

RE: I knew in advance that

Quote:
I knew in advance that I would be testing 5x and 6x so I decided I would start off using 4 unloaded CPU cores by allowing just 2 CPU tasks.

Thanks for the write-up Gary. I would be interested to see how the GPU run times change with 1 CPU task and with 0 CPU tasks.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110051164537
RAC: 22714200

OK, I think the quickest and

OK, I think the quickest and least invasive way to see any difference is to suspend all CPU tasks on board, including the two running, and then suspend the 6 running GPU tasks. That allows 6 fresh tasks to start. I've just done that, so in about 4.5 hours' time there should be 6 freshly completed tasks, which should be enough to get a rough idea of any 'improvement' on offer. I expect it won't be much, but I've never tried it before so I really don't know for sure. I'll edit this post with the results as soon as they are available.

After that, the partly crunched tasks will be resumed and I plan to adjust app_config.xml to allow 3 CPU cores to crunch. This will have an effect on the GPU crunch time, but I'm hoping it won't be much. If it's not too bad, I'll let it run that way for a few days to get enough results for a more precise mean value for both CPU and GPU tasks at that setting. Because of the previously described CPU task over-fetch, I still have several days of O1AS-I tasks left - just enough to get the information before those tasks are gone and it's back to FGRPB1.

EDIT1: With the 6 GPU tasks at the 50% mark, the projected full crunch time per task works out to 45.5 mins. The long-term average for crunching 6+2 has been pretty much the same, so it seems losing the 2 concurrent CPU tasks has made no difference to the GPU crunch time. I'll report again when they finish.

EDIT2: The average for the 6 completed tasks was 45.51 mins and all 6 results were within 8 seconds of the mean. So there really was no gain from running no CPU tasks. I've adjusted app_config.xml to allow 3 CPU tasks with the next 6 GPU tasks. I'll report back later when I have data for both sets of times at this 6+3 configuration.

I should add that this all agrees very well with my experience with around 30 Pitcairn-series (7850) GPUs, where I've been running 4 GPU tasks plus 1 CPU task on G3258 Pentium dual cores, and plus 3 CPU tasks on i5-3570 quad cores. On i3-series CPUs (dual core plus HT), I only run 2 CPU tasks. I might try running 3 and see what happens. When I first tried that (before HB wove his magic) I'm sure it didn't go well. Maybe it'll be different now with the much improved app :-).

Cheers,
Gary.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110051164537
RAC: 22714200

I now have 18 completed GPU

I now have 18 completed GPU tasks and 3 completed CPU tasks since I made the change from 6+0 to the 6+3 configuration (concurrent GPU + CPU tasks) mentioned in the previous post. This is certainly not enough to predict long-term stable behaviour, but it is sufficient to judge whether or not I should persist with the test. The table below compares three configurations: 6+2, 6+0 and 6+3. The motherboard, CPU and GPU have been noted previously and everything is at stock frequency and voltage.

[pre]Config   Av Run Time (sec)     Tasks/day     Theoretical RAC (cr/day)   Total RAC
(G+C)    GPU tsk    CPU tsk    GPU    CPU    GPU tsk    CPU tsk
======   ==================    ==========    ========================   =========
6+2       2,735     43,847     31.6   3.9    138,998      7,882          146,880
6+0      ~2,731        -       31.6    -     139,202        -            139,202
6+3      ~2,751    ~48,000     31.4   5.4    138,190     10,800          148,990[/pre]

Notes
1. Both 6+0 and 6+3 times are approximate because they are based on small sample sizes.
2. 6+2 times are based on a bigger sample size (about a week) but still do not account for possible variations in the data over a longer time scale.
3. Theoretical tasks per day were truncated to one decimal place.
4. Theoretical RACs (expected daily credit) were calculated to full accuracy.
5. Depending on many factors including hardware and what else runs, theoretical values are a rough guide only.
6. Because of AMD architecture constraints (each FPU shared between 2 integer cores) I imagine trying to run more than 3 CPU tasks would be quite bad for performance.
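For anyone wanting to check or redo the arithmetic, the theoretical figures in the table can be reproduced from the run times as sketched below. Note that the credit-per-task values (4,400 for GPU tasks, 2,000 for CPU tasks) are inferred from the figures quoted in this thread, and the GPU run time is the effective per-task time, i.e. the elapsed time of a batch divided by the 6-task concurrency.

[pre]
# Reproduce the table's theoretical daily-output figures from the run times.
# GPU run time = effective per-task time, so tasks/day is 86,400 / run time;
# CPU run time = elapsed time of one task, so the number of concurrent CPU
# tasks scales the daily rate.

SECONDS_PER_DAY = 86_400

def gpu_rac(effective_secs, credit=4_400):
    return SECONDS_PER_DAY / effective_secs * credit

def cpu_rac(elapsed_secs, concurrent, credit=2_000):
    return concurrent * SECONDS_PER_DAY / elapsed_secs * credit

for label, gpu_t, cpu_t, n_cpu in [("6+2", 2_735, 43_847, 2),
                                   ("6+0", 2_731, None,   0),
                                   ("6+3", 2_751, 48_000, 3)]:
    total = gpu_rac(gpu_t) + (cpu_rac(cpu_t, n_cpu) if n_cpu else 0)
    print(f"{label}: ~{total:,.0f} credits/day")
# 6+2: ~146,880    6+0: ~139,202    6+3: ~148,990
[/pre]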

I intend to leave the machine 'as is' for a longer test. With the availability of advanced LIGO data and the prospect of continuous GW detection, the most important ongoing consideration for me is to maintain and improve if possible, CPU task output from GPU crunchers, even if that entails a small loss in GPU output. For me, it looks like 6+3 is a very nice compromise for that host.

Cheers,
Gary.
