Graphics card cooling

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690658349
RAC: 268026
Topic 195221

Hi all!

Recently I had some validation problems for ABP2 CUDA tasks on my Geforce 9800 GT eco (basically that is a slightly undervolted, underclocked 8800 GT). Apparently this was not related to some validator bug that was recently fixed, but was instead correlated to ambient temperature: on hot days, almost all results would fail to validate. On not-so-hot days, everything was just fine.

Ok, so I got a digital thermometer and played around with it a bit.

This is an ATX case with the usual configuration: motherboard is vertical, so the PCIe card is installed with the heat sink + fan horizontally, facing downwards. Hmmmm....how again is this supposed to work?? The fan is kind of small and convection just won't work because the sink is blocking the natural airflow in this position rather than encouraging it!

And indeed, the air temperature inside the case, measured halfway between the heat sink and the bottom of the case, was 40°C (about 104°F) under full load. The fan was just stirring the hot air, but not moving it anywhere! The heat sink itself reached up to 58°C.

That's a silly design, isn't it ??

OK, so I put the case on its side, so now the motherboard is horizontal and the gfx card is vertical, which in theory should help convection a bit. More importantly, I also installed an additional fan that sucks in the hot air and blows it out through the vacant extension slots. This makes sure that the air around the gfx card is now at almost room temperature, and I have had no validation trouble since.

So what did I learn from this:

1) I had thought that the gfx card featured thermal throttling (in hardware) that would slow down the card before it would overheat and produce funny results. That doesn't seem to work for my card and system, and might not work for others as well. Don't take it for granted.

2) Those "green" and "eco" cards consume "only" max 75 W, but still, if they are designed to dissipate that heat into the PC case instead of moving it out of the case like more powerful, 2-slot-width cards, you might still get heat problems. At the end of the day, the heat has to be transported out of the case or it will over-heat.
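As a rough check on point 2, here is a minimal sketch of how much air has to pass through the case to carry 75 W away. It uses the standard heat balance P = rho * c_p * Q * dT. The air properties and the 10 K allowed temperature rise are assumptions for illustration, not measurements from this system.

```python
# Back-of-the-envelope airflow estimate for removing heat from a PC case.
# Assumed values (not from the post): air properties at about room temperature.
AIR_DENSITY = 1.2            # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0   # J/(kg*K)
M3S_TO_CFM = 2118.88         # cubic metres per second -> cubic feet per minute


def required_airflow_m3s(power_w, delta_t_k):
    """Volume flow needed so the exhaust air is delta_t_k warmer than the intake."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)


def required_airflow_cfm(power_w, delta_t_k):
    """Same estimate, expressed in the CFM units fan vendors usually quote."""
    return required_airflow_m3s(power_w, delta_t_k) * M3S_TO_CFM


# 75 W card, exhaust air allowed to be 10 K warmer than room air (assumption)
print(f"{required_airflow_cfm(75, 10):.1f} CFM")
```

That works out to roughly 13 CFM, which a small case fan can deliver on paper. The catch is that the air actually has to pass the card and leave the case rather than being stirred in place.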

Anybody had similar problems/solutions?

happy crunching
HB

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Quote:
This is an ATX case with the usual configuration: motherboard is vertical, so the PCIe card is installed with the heat sink + fan horizontally, facing downwards. Hmmmm....how again is this supposed to work??


The fans on GPUs suck in air and blow it out through the back of the card: usually the cover over the GPU and memory has blowout holes at the rear end of the card (inside the PC), through which the hot air is blown into the case (hopefully into the path of a case fan).

Some cards have these blowout holes through the backplate, so they vent directly into the room.

Since air is cooler at the bottom of the case, the fans are pointed downwards.

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 93

Hi, I have two 9800 GX2s. On hot days temps were up to 100°C (ambient around 28°C), but there were no validation errors. I removed the housings of the graphics cards and got about 20 degrees less. Now, with 22°C ambient, I see around 60-70°C. The fans are at 75%, which would be too loud at home, but in the office it is all right.
Thermal throttling kicks in at around 105°C.
IMHO the *green* cards are the broken cards: they had to reduce the core frequency to get them to work (no offense).
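Since throttling only kicks in very late (if at all), it can help to watch the temperature yourself. Below is a minimal sketch that assumes the `nvidia-smi` tool shipped with the NVIDIA driver is on the PATH. The 80°C warning threshold and the function names are hypothetical choices for illustration, and older cards or drivers may report "N/A" instead of a number.

```python
import subprocess

# Query flags of the nvidia-smi CLI; prints one temperature (deg C) per GPU.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(output):
    """Turn nvidia-smi output (one reading per line) into a list.

    Lines that are not plain numbers (e.g. "N/A") become None.
    """
    temps = []
    for line in output.splitlines():
        value = line.strip()
        if not value:
            continue
        temps.append(int(value) if value.isdigit() else None)
    return temps


def read_gpu_temperatures():
    """Ask nvidia-smi for the current core temperature of every GPU."""
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return parse_temperatures(result.stdout)


def gpus_over_limit(temps, limit_c=80):
    """Return indices of GPUs at or above limit_c (hypothetical threshold)."""
    return [i for i, t in enumerate(temps) if t is not None and t >= limit_c]
```

Calling `gpus_over_limit(read_gpu_temperatures())` every minute or so and suspending GPU work whenever the list is non-empty would be one way to use it. Where the limit should sit depends on the card.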

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6540
Credit: 286821350
RAC: 89637

Message 98788 in response to message 98787

Quote:
Thermal throttling is around 105°C. imho *green* cards are the broken cards. They had to reduce the core frequency to get them to work.(no offense)


It's a marketing gag/scam. Flipping a bit requires a minimum energy; to do it quickly you either ramp up voltage or current or both. As power is the product of voltage and current, then for a given technology ( semi-conductor substrate that determines electron/hole energy states and differentials ), a given volume of active chip material and a given rate of bit-flip you can't escape some least expenditure. On the card, but off the chip, you can try to park 'savings' elsewhere - as you say by not including ( power consuming ) active cooling elements or by not working full time ( stop if too hot ).
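For reference, the usual textbook relations behind this argument can be written out as follows. They are standard approximations and not taken from any card's datasheet; the symbols alpha (switching activity), C (switched capacitance) and f (clock frequency) are introduced here only for illustration.

```latex
% Electrical power drawn by the chip
P = V \, I

% Common approximation for the dynamic (switching) power of CMOS logic
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f

% Landauer bound: least energy dissipated when erasing one bit at temperature T
E_{\min} = k_{B} T \ln 2 \approx 2.9 \times 10^{-21}\ \mathrm{J}
\quad \text{at } T = 300\ \mathrm{K}
```

The second line is why undervolting and underclocking cut power so effectively: voltage enters squared and clock frequency linearly. Real chips sit many orders of magnitude above the Landauer bound, so in practice the floor is set by the process technology.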

To match an alleged 'green' card with a suitably specified similar 'non-green' card you have to provide superior passive cooling or active cooling for the green one. Of course that active cooling device is likely from another manufacturer altogether - the one that makes stand-alone cooling fans - so the green card manufacturer doesn't include that in the calculation. All the while not divulging the need to spend elsewhere ( carbon credits and whatever included ) to actually get it to work to spec, fulltime and no nonsense.

It's rather like the toys you can buy with 'batteries not included', in this case it's a heat engine that calculates with 'provision for cooling not included'. Of course product lifetime at a higher temperature is now an issue too.

Another possible way to save is to have algorithmic efficiency using a cleverer design requiring fewer bit flips for some outcome. But by now nearly all cards are massively parallel anyway ....

Now what might save the day, green-wise, is whacking great lumps of metal ( copper is great for this! ) that conduct the heat away from the active zone(s) and provide a geometry of increased surface area for cooling using adjacent air. It would be nice to have even a modicum of airflow over that with, as Bikeman alludes, some arrangement to encourage convection ( smoke goes up chimneys ). That may not even be 'green' though, as truthfulness requires a full birth to death assessment, that is : how does either manufacturing or disposal compare between cards? [ compare with the logic of lighter aluminium based car frames being more economical after construction, but what of the costs of smelting? ]

Cheers, Mike.

( edit ) And if by manufacture you reduce the energy level differences between areas of the semiconductor lattice, that would reduce the bit flip energy cost. But now you've also reduced the physical distinction between 'off' and 'on' states ( 0's and 1's ). This increases the likelihood of an indeterminate state ( neither a 0 nor a 1 ), which also gets worse with increasing temperatures .. maybe our ex-Intel man Peter ( archae86 ) has some views?

( edit ) Sigh ... the devil is in the details, or should I say Maxwell's Demon ? ;-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

Mike: That's not strictly true. As a processor deviates from 'perfect', the energy it needs to run at a given performance level increases. This is used in binning chips: the better chips can either run faster at a given power level or use less power at a given performance level. Performance binning is much more common, but the reason low-voltage CPUs cost more than their standard-voltage equivalents at a given clock speed is that they're higher-binned parts that would run significantly faster at standard voltage. That said, I'm not sure whether anyone actually puts higher-bin GPUs in low-power card revisions. The primary GPU binning is on the number of functional units enabled, but if you compare the stock and maximum overclocked speeds of e.g. the 58xx series of cards, the chips picked for 5870s aren't chosen just because they have all the core clusters working; they're also capable of running at higher speeds. Unless ATI starts selling a green 5870 GPU, any chips that have all the core clusters working but can't reach the speed targets will have a cluster disabled and be sold as a 5850.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6540
Credit: 286821350
RAC: 89637

Message 98790 in response to message 98789

Quote:
Mike: That's not strictly true. As a processor deviates from 'perfect' the energy it needs to run at a given performance level increases. This is used in binning chips. The better chips can either run faster for a given power level or use less power at a given performance level. Performance binning is much more common; but the reason low voltage CPUs cost more than their standard voltage equivalent at a given clock speed is that they're higher binned parts and at a standard voltage would run significantly faster. That said, I'm not sure if anyone actually puts higher bin GPUs in low power card revisions or not. The primary GPU binning is on the number of functional units enabled, but if you compare the stock and maximum overclocked speeds of eg the 58xx series of cards the chips picked for 5870's aren't done just because they have all the core clusters working, they're also capable of running at higher speeds. Unless ATI starts selling a green 5870 GPU any chips that have all the core clusters working but can't reach the speed targets will have a cluster disabled and be sold as a 5850.


Yes, you're right. Assuming that the term 'binning' refers to selection via testing post die manufacture ( sort via the good, the bad and the ugly ), then yes manufacturing faults will increase power consumption and/or lower performance for a given power. So yes, I wasn't including that. But then, to be true one has to amortise true costs by including the duds and poorer performers in the batch. Which is what I was getting at: hiding real cost ( by whatever measure ) outside of some horizon, in this case outside of the box containing the product you just bought, and not mentioning what was discarded to achieve that.

Now I know that one has to draw a boundary circle around somewhere in the analysis, else you'll include the entire universe ( time and space ), but surely the manufacturers know these 'hidden' elements. In fact they must, to stay in business, otherwise they wouldn't know what costs in failure need to be offset by the selling price of successful products.

Cheers, Mike.

( edit ) BTW I'm all for green computing. However I'm also into reality checks ... :-)

( edit ) To be more helpful though, I have a recent Gigabyte variant - not especially touted as 'green' - with 'Silent Pipe technology' in the vein of sculptured lumps of copper without an on-card fan. But then the entire system is contained in an Antec 1200 case ( fan city ) which certainly manages the thermal load, and has a neat little speed/flow setting knob per fan. In the front case bays I have mounted an 8 for 6 TB RAID-5 array of WD SATA's, so what air gets to the rear is already pre-heated by that plus the cooling fins of 10GB of DDR3 main memory ( @ 1600 ) sitting up into the flow. So maybe my "I'm all for green computing" refers to someone else then, huh ??? :-) :-0

Cannibal Corpse
Joined: 21 Feb 05
Posts: 18
Credit: 1555535
RAC: 0

Hello all... Kind of a DUH, but I took an old power supply and a huge old aluminum heat sink with a 3000 rpm fan and placed it over my laptop CPU, along with an 80 mm 3000 rpm fan over my drive, and now I can crunch at 90% (was 60% max). Dropped the temp by 10°F.

DO WHAT THOU WILT SHALL BE THE WHOLE OF THE LAW.
PROUD MEMBER OF THE CARL SAGAN TEAM.

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86314215
RAC: 213

Message 98792 in response to message 98788

Quote:
... Now what might save the day, green-wise, is whacking great lumps of metal ( copper is great for this! ) that conduct the heat away from the active zone(s) and provides a geometry of increased surface area for cooling using adjacent air. ...

Better is to use heat pipes (recycling evaporative cooling), or liquid cooling, or even bolt on your own Stirling engine to move the heat out of the way.

Could you approximate a Maxwell demon by using a cooling gas with for example an etched silicon sieve where there are funnelled pits that present large entrance holes on one side that funnel down to tiny exit holes...?

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6540
Credit: 286821350
RAC: 89637

Message 98793 in response to message 98792

Quote:
Could you approximate a Maxwell demon by using a cooling gas with for example an etched silicon sieve where there are funnelled pits that present large entrance holes on one side that funnel down to tiny exit holes...?


Well, strictly speaking, a Maxwell Demon ought use a diminishing amount of energy ( in the limit, zero ) to separate some ensemble of particles with respect to some characteristic - say energy per particle. Now initially your setup would work but only until the sieve itself heats up by contact with the gas particles via impacts. Thus agitation of the ions in the silicon lattice will increase and hence help equilibrate average gas particle energies on both sides of the partition. You can't have passive one-way systems indefinitely, there must be a way of carting off energy from one side to maintain a temperature differential. The size or shape of the holes is quite irrelevant .... any 'little mechanism' ( the demon ) requires energy to operate, or at least energy to remember state ( is my little trapdoor open or shut? ) and change state. I can't ( after a while when things equilibrate ) use any heat engine to provide that energy as there is no temperature differential within the system to extract work from. So I have to import energy from elsewhere to run my demon ..... what keeps ( if at all ) my computer cool enough is whatever the air temperature is as the local weather systems come and go, or my air conditioner or .....

Cheers, Mike.

( edit ) In one of Richard Feynman's lectures he gives the terrific example of the 'ratchet and pawl' and why that cannot indefinitely be used to rotate a cogwheel one way. Basically as the situation unfolds the jittering of the components means that it becomes equally likely to turn one way as the other. The rotations/steps effectively pursue a 'random walk' sequence, which in the long term favors neither a nett clockwise nor anti-clockwise direction.
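The 'random walk' point can be illustrated with a toy simulation. This is only a sketch of an unbiased walk, where each step is equally likely to go clockwise (+1) or anti-clockwise (-1). It is not a physical model of Feynman's ratchet, and the function names and step counts are made up for the example.

```python
import random


def net_steps(n_steps, rng):
    """Net rotation after n_steps, each step being +1 or -1 with equal odds."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))


def mean_net_steps(n_walks, n_steps, seed=0):
    """Average net rotation over many independent walks (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(net_steps(n_steps, rng) for _ in range(n_walks)) / n_walks
```

Averaged over many walks, e.g. `mean_net_steps(500, 400)`, the net rotation stays close to zero even though any single walk wanders away from its starting point by a distance of the order of the square root of the number of steps.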

dutchie
Joined: 10 Jun 06
Posts: 34
Credit: 6102332
RAC: 0

Hello folks!
I run 3 different video cards on this project, all of them NVIDIA:
an 8600 GT, a GTS 250 and a GTX 460. I cool all of them with extra fans.
What I do is glue an 8 cm fan with instant glue (superglue) to the bottom
of the computer, pointing towards the video card and minding all the wires.
For the GTX 460 that brings the card from 56°C down to 42°C (that's 42 degrees above freezing).

grts-Rene-Dutchie.
