Anyone GTX 980 running?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110382438962
RAC: 30457114

Quote:
Seems like 5 tasks run just fine. CPU tasks, however, seem to be quite inefficient.


I'm sure you could run 10 concurrent GPU tasks if you wanted to, based on the specs of the card, but I don't imagine it would gain you much (if anything) to do so. I had a quick look at 980s which have appeared locally and it seems they have 4GB RAM and cost around $700. You can certainly run 2 tasks per GB of video RAM and probably 3 per GB if you tried. It would certainly be a case of diminishing returns at some point.
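The "tasks per GB of video RAM" rule of thumb can be turned into a quick upper bound on concurrency. This is only a sketch, assuming the 2/GB and 3/GB figures are the sole constraint; it ignores memory bandwidth and the diminishing returns mentioned above.

```python
def max_concurrent_tasks(vram_gb: float, tasks_per_gb: float) -> int:
    """Largest whole number of GPU tasks the rule of thumb allows."""
    return int(vram_gb * tasks_per_gb)

# A 4GB GTX 980 at the "certainly" (2/GB) and "probably" (3/GB) densities
for per_gb in (2, 3):
    print(f"4GB card at {per_gb}/GB -> {max_concurrent_tasks(4, per_gb)} tasks")
```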

From your recent figures, running 4x (no CPU tasks) seems to give you a single task time of 17648/4=4412secs whereas running 5x gives 21570/5=4314secs. You gained slightly (98 secs, i.e. less than 2 mins per task). Quite some time ago, data was published by astrocrab for an AMD HD7970 with 3GB of RAM, all the way up to 10x task concurrency (BRP5 GPU tasks and no CPU tasks). There were a number of messages in addition to the one linked that gave results for different concurrency values. If I recall correctly, there were gains at every step up to 10x but after about 6x, the gains were quite small, just as you've seen in going from 4x to 5x. The 384-bit memory interface of the 7970 gives it a strong advantage, which is probably one reason why a 7970 can do 10 tasks in around 31000 secs, an average of 3100 secs per task compared to your >4300 secs per task at 5x.
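The comparison above boils down to one division: the elapsed time for a batch of concurrent tasks divided by the number of tasks in the batch. A minimal sketch, assuming all concurrent tasks start and finish together (in practice they drift apart a little), using only the figures quoted in this post:

```python
def per_task_seconds(batch_elapsed_s: float, concurrency: int) -> float:
    """Effective time per task when `concurrency` tasks run side by side."""
    return batch_elapsed_s / concurrency

runs = {
    "GTX 980 at 4x": (17648, 4),
    "GTX 980 at 5x": (21570, 5),
    "HD7970 at 10x": (31000, 10),
}
for label, (elapsed, n) in runs.items():
    print(f"{label}: {per_task_seconds(elapsed, n):.0f} s per task")

# Gain from going 4x -> 5x on the 980
gain = per_task_seconds(17648, 4) - per_task_seconds(21570, 5)
print(f"4x -> 5x saves {gain:.0f} s per task")  # 98 s, under 2 minutes
```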

I'm not sure why you say that CPU tasks are inefficient. I guess you mean that the CPU itself is an inefficient device for doing the calculations. The problem is that it is obviously quite difficult to develop an efficient GPU app for the calculation in FGRP tasks. There was a GPU app for FGRP3 but it needed a *lot* of CPU support and used the GPU quite inefficiently. The Devs are obviously having problems improving that since they have not released a GPU app for FGRP4.

Whether you choose to run BRP GPU apps only or to support FGRP4 as well is entirely up to you. I guess it all depends on your motivation. For me, I like to do both because I want to increase my chances of participating in the discovery of a new gamma ray pulsar as well as a radio pulsar. So I quite accept a small increase in BRP5 crunch time in order to get FGRP4 tasks done as well. I'm also keen to minimise total cost of ownership so I try to use budget hardware that is also economical on energy use. I also reuse as much hardware as possible from previous crunchers.

I've replaced a number of less efficient machines with i3 (dual core, 4 threads) hosts with a 2GB AMD 7850 GPU. This host is one of them. It's Ivy Bridge, not Haswell, and it has a RAC of around 77K. It's doing BRP5 tasks 4x and 2 FGRP4 tasks. GPU tasks take around 16Ksecs (i.e. 4Ksecs per task) and CPU tasks take around 25Ksecs. With 4 GPU tasks and 2 CPU tasks running concurrently, it's pulling only 160 watts from the wall. I've bought quite a few of these GPUs at around $140 each.
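Those run times translate directly into daily output for the box. A rough sketch, assuming steady-state crunching at the quoted "around" figures (16Ksecs per 4x GPU batch, 25Ksecs per CPU task, 160W at the wall around the clock):

```python
SECONDS_PER_DAY = 86_400

def tasks_per_day(concurrency: int, elapsed_s: float) -> float:
    """Steady-state completions per day for `concurrency` tasks of `elapsed_s` each."""
    return SECONDS_PER_DAY * concurrency / elapsed_s

gpu_per_day = tasks_per_day(4, 16_000)  # BRP5 at 4x, ~16 ks elapsed each
cpu_per_day = tasks_per_day(2, 25_000)  # 2 FGRP4 CPU tasks, ~25 ks each
kwh_per_day = 160 * 24 / 1000           # whole box at ~160 W from the wall

print(f"GPU: {gpu_per_day:.1f} BRP5/day, CPU: {cpu_per_day:.1f} FGRP4/day")
print(f"Energy: {kwh_per_day:.2f} kWh/day")
```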

It could be even better if I used a modern high efficiency PSU. I have a very large number of 2002 vintage PSUs that were designed to power late P3/early P4 systems. They are rated at 175W and can deliver around 110W at 12V. All my hosts like the above example use two of these PSUs, one to power the motherboard/CPU and one to power the GPU. They probably run at less than 80% efficiency but they cost nothing so I'm happy to use them.
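To put a rough number on what a more efficient PSU would save: wall draw is simply the DC load divided by PSU efficiency. The sketch below is hypothetical throughout. It treats the two old PSUs as one combined supply at 80% (the post says "less than 80%", so real savings could be larger) and compares against an assumed 90% efficient modern unit.

```python
def wall_watts(dc_load_w: float, efficiency: float) -> float:
    """AC power drawn from the wall for a given DC load and PSU efficiency."""
    return dc_load_w / efficiency

# Hypothetical: DC load implied by 160 W at the wall if the old PSUs were 80% efficient
dc_load = 160 * 0.80
for eff in (0.80, 0.90):  # 0.90 is an assumed figure for a modern PSU
    print(f"{eff:.0%} efficient: {wall_watts(dc_load, eff):.0f} W at the wall")
```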

Of course, I'm not suggesting that people shouldn't be investing in i7s and GTX980 GPUs. My comments are entirely directed towards building a budget box for crunching current EAH tasks only. Things could be quite different for other projects or if crunching was a low priority, secondary use of the machine.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7074524931
RAC: 1320148

Quote:
Of course, I'm not suggesting that people shouldn't be investing in i7s and GTX980 GPUs. My comments are entirely directed towards building a budget box for crunching current EAH tasks only. Things could be quite different for other projects or if crunching was a low priority, secondary use of the machine.


Thanks for your entire set of edifying comments, Gary.

A couple of weeks ago I was quite eager to re-equip my two main hosts running GTX 660s with either the GTX 970 or 980. I dreamed of very considerable Einstein output increase and very considerable Einstein work per watt increase. But the reported numbers suggest both improvements would be far less than my dreams--making the purchase cost and installation effort and risk seem rather questionable.

Do you have an opinion on whether the big Maxwells (and maybe even the baby 750 and 750 Ti) have substantial unrealized Einstein compute capability that might get unleashed sometime in the foreseeable future, with Einstein using a higher level of CUDA or making other architecture-specific improvements? I have little insight into either whether this is feasible or whether the project is likely to invest effort along these lines. Surely the hope of a single development thread held out by OpenCL must be a powerful lure to the project, but so far, if I understand correctly, OpenCL has left Nvidia at a severe disadvantage relative to AMD here.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5851
Credit: 110382438962
RAC: 30457114

Hi Peter,

Sorry for the late reply - I've had other commitments to attend to.

Quote:
Do you have an opinion on whether the big Maxwells (and maybe even the baby 750 and 750 ti) have substantial unrealized Einstein compute capability that might get unleashed sometime in the foreseeable future with Einstein using a higher level of CUDA or making other architecture-specific improvements?


Everyone usually has opinions but I should warn that mine (on this particular question) are not based on tangible knowledge :-). For years, there have been questions (and even some answers from Devs) about why EAH still uses apps based on CUDA 3.2, rather than the 'latest'. From memory, the most recent answers did indicate that an app would be put into beta test at some point 'soonish', but that was some time ago now.

My opinion is that the seemingly very long delay might be partly attributable to implementation difficulties, but it also might suggest that the performance improvement is rather less than stellar. I tend to think that time gets 'found' if the potential improvement is rather good. I don't recall if previous answers have mentioned the expected level of improvement or not.

I do remember conversations with Bernd and Oliver to the effect that AMD hardware and OpenCL based apps might turn out to be the preferred development direction for the project, but that was around 3 years ago. AMD drivers/libs have certainly improved over the intervening period and that would seem to strengthen the case for OpenCL.

When I started building GPU crunchers, I found the graph in this post by Robert to be quite compelling. Sure, the GPUs listed on the graph are quite old now but the basic message of a 'best value line' well away from the high end GPUs would still be just as true today as it was then. Robert has made a number of posts over the years that have proved very useful in assessing 'best value' GPUs for crunching EAH. I particularly liked this one as well. I know you have seen his recent post comparing the 750Ti, 970 and 7970. I'm getting 76K RAC (credits per day) from a 7850 that is drawing an estimated 80W. I'm basing that estimate on a measured 160W for the entire machine crunching 4xGPU and 2xCPU tasks, subtracting around 60W for the machine at idle and around 20W for the two fully loaded CPU cores (160 - 60 - 20 = 80W). So the performance/watt is 76000/80 = 950 credits per day per watt. From Robert's figures, it looks like the 750Ti would be best for you rather than the 970 or 980.
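The GPU power estimate and the performance/watt figure can be written out as a short calculation. This follows the subtraction method described above; note that it lumps the GPU's own idle draw into the 60W idle figure, so the true GPU share may be somewhat higher than 80W.

```python
def gpu_watts(total_load_w: float, idle_w: float, cpu_load_w: float) -> float:
    """Estimate GPU draw: full-load wall power minus idle power and CPU load."""
    return total_load_w - idle_w - cpu_load_w

gpu_w = gpu_watts(160, 60, 20)  # 4xGPU + 2xCPU load, idle, two busy CPU cores
rac = 76_000                    # credits per day from the 7850

print(f"GPU ~{gpu_w:.0f} W, {rac / gpu_w:.0f} credits/day per watt")
```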

My first purchase (a couple of years ago) was a 550Ti and I put it into a Windows machine running on a q9400 CPU and budget Asus mobo (PCIe 1.x). I was happy with the boost in credits but thought it should have been better. Subsequently the app improved and I also discovered the importance of using PCIe 2 rather than 1.x, so it now runs in a PCIe 2 16x slot. Currently it produces around 35K. In the PCIe 1.x slot it was in the low 20s.

I had only a single GPU endowed host for quite a while until the release of the GTX650. At that time I bought one GTX650 and one HD7770. Both were just over $100 and seemed to be about equally matched. I ran them in Linux machines. No trouble with the GTX650 - worked out of the box but the only way I could get the 7770 to crunch was to change the distro to openSUSE which could supply working drivers. Head to head, the GTX650 was better by about 10-15% in performance. So I ended up buying a bunch more 650s. I'm sure you remember my enthusiasm for them at the time :-). I'm still running them all with no problems. Each basic machine gives at least 30K RAC.

Some time later three things changed. Firstly, improvements in AMD drivers progressively boosted the performance of the 7770. Secondly, I learned how to build my own drivers using packages from the AMD website and thirdly, my distro of choice started releasing drivers in the repo that worked. The 7770 GPU is still in exactly the same host as always, but instead of a RAC of around 25K, it now gives 41K. The OS is now PCLinuxOS but the performance gain is coming from improvements in the drivers.

A lot of the improvement came from one particular driver upgrade. When I saw the boost, I started thinking about buying more 7770s but I thought it might be smarter to source a wider memory interface than the 128-bit one in the 7770. I also wanted more than 1GB RAM. I ended up stumbling on the 7850 (2GB, 256-bit), which was only $30 per unit dearer due to a fortunate end-of-model clearance that a local supplier was having.
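Why interface width matters: peak memory bandwidth is the bus width in bytes multiplied by the per-pin data rate. The sketch below uses a single hypothetical GDDR5 data rate (4.5 Gbps) for every card so that only the 128/256/384-bit widths mentioned in this thread differ; real cards ship with different memory clocks, so treat the outputs as relative, not as spec-sheet figures.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

RATE_GBPS = 4.5  # hypothetical, identical for all three cards
for name, width in (("HD7770", 128), ("HD7850", 256), ("HD7970", 384)):
    print(f"{name} ({width}-bit): {bandwidth_gb_s(width, RATE_GBPS):.0f} GB/s")
```

At equal memory clocks, the 7850's 256-bit bus doubles the 7770's bandwidth and the 7970's 384-bit bus triples it.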

I feel confident that the project will continue supporting both OpenCL and CUDA for quite a while yet but you are probably right in thinking that OpenCL might be 'preferred' for future development purposes. I don't know which manufacturer has the higher share of the gaming market, but the project would suffer if it didn't support both.

Cheers,
Gary.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Dave,
Sorry to butt in, but can you advise me 'how' to set the E@H configuration for 2 WU on the GPU? I altered the 1 to 0.5 in the settings under E@H configuration but BAM is still only running 1 instance on my GPU. BTW it's a GTX970.

Regards

Cliff,

Been there, Done that, Still no damm T Shirt.

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7074524931
RAC: 1320148

Quote:
I altered the 1 to 0.5 in the settings under e@h configuration but BAM is still only running 1 instance on my GPU..


Has your system downloaded any new work since you made the change? If not, then this is normal behavior. The message to change how many units are running simultaneously on the GPU somehow does not get transmitted to your system for action except with work download.

On the other hand if new GPU work has downloaded but nothing changed, you probably changed the wrong parameter for the type of work you are getting.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi,
Right, so altering 1 to 0.5 in E@H prefs for BRP was correct?
0.5 is the correct parameter? Not '.5' or whatever?

I had set NNT (No New Tasks), so despite the update taking place no actual d/l occurred.
I have a shed load of WU I'd like to get through before they expire:-)

I'll remove NNT and hope for the best.

Regards,

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7074524931
RAC: 1320148

Quote:
so altering 1 to 0.5 in E@H prefs for BRP was correct?
0.5 is the correct parameter? Not '.5' or whatever?

I believe either of those two formats would get you 2 WU at once when the change takes effect. It appears both types of work you have been running are in the BRP group, so I think you changed the correct parameter.
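As a sanity check on the "0.5 vs .5" question: both strings parse to the same number, and a utilization factor of f lets roughly 1/f tasks share one GPU. This is a sketch assuming, as the thread implies, that the client fits as many tasks as there are whole fractions in one GPU; the function name is hypothetical and not part of BOINC.

```python
import math

def concurrent_gpu_tasks(utilization_factor: str) -> int:
    """Tasks per GPU implied by a GPU utilization factor entered as text."""
    factor = float(utilization_factor)  # "0.5" and ".5" parse identically
    if not 0 < factor <= 1:
        raise ValueError("utilization factor must be in (0, 1]")
    # Small epsilon guards against float rounding (e.g. 1/0.2) before flooring
    return math.floor(1 / factor + 1e-9)

for text in ("1", "0.5", ".5", "0.33"):
    print(f"{text!r} -> {concurrent_gpu_tasks(text)} task(s) per GPU")
```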

Quote:
I'll remove NNT and hope for the best.


That should work eventually. If you get impatient, you can try nudging up the queue length requested by a little bit at a time and hitting update once per nudge (I think something like 5 minutes pause between requests is enough).

You just ran into one of the reasons that I suggest people set really low queue length parameters when they are starting new work types, installing new equipment, or are experimenting with the parallel settings and such. Some changes can trigger a LOT of extra download over what one might expect. If one has a very small queue this is not a problem. Also, in the case such as yours where one might want to force a download by raising the queue length, the needed rise will not in that case be worrisomely high.
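A rough way to see why a small queue keeps surprises small: the number of tasks needed to fill a work buffer scales with the buffer length. This is a naive model, not BOINC's actual work-fetch algorithm, and the 8000 s elapsed time at 2x is a hypothetical figure chosen for illustration.

```python
import math

SECONDS_PER_DAY = 86_400

def tasks_to_fill(queue_days: float, concurrency: int, task_elapsed_s: float) -> int:
    """Naive estimate of tasks needed to keep `concurrency` slots busy for the queue."""
    return math.ceil(queue_days * SECONDS_PER_DAY * concurrency / task_elapsed_s)

# Hypothetical: 2 concurrent GPU tasks, ~8000 s elapsed each
for days in (0.1, 3):
    print(f"{days} day queue -> about {tasks_to_fill(days, 2, 8000)} tasks")
```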

All too late to help you for the moment, but possibly someone else will see and consider.

By the way, I'm very interested in observing your GTX 970 results. I've not quite given up on substituting one for one of my 660s. The mix of BRP4 and BRP5 work you are currently running may complicate comparisons, as dissimilar mixed work often does not share "fairly".

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi,
Just got a d/l and now have 2 tasks running on GPU and am waiting to see how long my rig takes to complete them.

Regards,

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Back again:-)
the two WU now running are:-
11/10/2014 02:49:40 | Einstein@Home | Starting task p2030.20131208.G199.80-01.40.S.b4s0g0.00000_2896_0
11/10/2014 02:49:40 | Einstein@Home | Starting task p2030.20131208.G199.80-01.40.S.b5s0g0.00000_1312_0

Regards,

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi,
Completed the 2 BRP4G WU, but when BAM ran BRP5 it only ran a single instance.

Still waiting to see whether, now that it has actually d/l'd both types of WU, it will run 2 instances of BRP5.

It may take a bit of time since I'm running S@H as well and on a 60min switch over between projects.

Regards,
