4 WUs on a GTX 480

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7084474931
RAC: 1365656

Bikeman wrote:

the "new" BRP4 units are quite a bit less CPU intensive than the formerly distributed BRP3 workunits. (while the app itself is the same, the signal data is different).

That means that the GPU load will be higher now, and the saving you get by running several units in parallel will be smaller. It will be interesting to see new runtime measurements.

I've been working with my new host, which has a GTX 460 graphics card of the Gigabyte SOC flavor.

Running a single WU at a time gives a very tight distribution of elapsed times, averaging 1900 seconds with a standard deviation of roughly 20 seconds. GPU load was generally just under 70%.

But running two WUs at a time exhibited bimodal behavior. Much of the time the system ran at little if any higher throughput than with a single WU (as shown by rate of progress, GPU load, and power consumption alike), but at times it ran materially faster (again manifested in all three indicators). About 50 WUs processed over about a day showed an average throughput advantage of a bit over 8% relative to single-WU running.
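For anyone wanting to check the arithmetic, here is a minimal sketch of how that figure falls out of the numbers above. It assumes the rounded values of 1900 s per single WU and 50 WUs in 24 hours, so the computed advantage only approximates the observed one:

```python
# Throughput comparison from the rounded figures quoted above:
# 1900 s per WU running singly, ~50 WUs in ~24 h running two at a time.
single_elapsed_s = 1900.0
single_rate = 3600.0 / single_elapsed_s      # WUs per hour, one at a time

dual_wus = 50
dual_hours = 24.0
dual_rate = dual_wus / dual_hours            # WUs per hour, two at a time

advantage = dual_rate / single_rate - 1.0
print(f"single: {single_rate:.2f} WU/h, dual: {dual_rate:.2f} WU/h, "
      f"advantage: {advantage:.1%}")
```

Because both the 50-WU count and the one-day interval are approximate, the computed number lands near, not exactly at, the 8% quoted.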

When I first started running three simultaneous WUs, all three indicators suggested considerable further improvement; in particular, GPU load was mostly about 88%. One of the three active WUs progressed much faster than the other two, but as soon as it finished, this desirable behavior vanished. Since then I've seen GPU load of 77%, power consumption to match, and matched progress on all three WUs, with throughput no higher than the 2-WU average, possibly a bit lower.

An additional problem is that the server is generally awarding this host only one BRP4 WU per request. Sometimes a second or third request is generated and one WU each is granted at one-minute intervals, but then the four-hour delay penalty for use of the anonymous platform is imposed. As the host consumes about eight WUs in four hours when they are available, this is a problem for unattended operation. When I was trying 2-WU operation yesterday, the host was given enough work to stay busy; I don't know what the difference is today.

In all three conditions, the forecast run time has not converged toward the real one on any reasonable time scale (well past the oft-mentioned point of 10 returned WUs). For single-WU work (with no app_info.xml file), after many dozens of returned results the estimate was still about double the true value. At 2-WU running the error was much larger, though I failed to log the values. Now at 3-WU running, for which the real elapsed time per WU is about an hour and a half, the initial estimate was over 21 hours and has so far declined only to 19.5 hours. Since I've requested a 3.5-day queue, however, this alone seems not enough to explain the parsimonious distribution of work.
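That sluggish decline is consistent with an estimate that moves only a small step toward the observed runtime after each returned result. A toy model of that behavior, assuming a 10% step per result (the step size here is my assumption for illustration, not a documented BOINC constant):

```python
# Toy model: the forecast moves 10% of the way toward the actual
# runtime after each returned result (the step size is an assumption).
actual_s = 1.5 * 3600.0          # real elapsed time per WU
estimate_s = 19.5 * 3600.0       # current forecast

results = 0
while estimate_s > 2.0 * actual_s:           # until within 2x of truth
    estimate_s = 0.9 * estimate_s + 0.1 * actual_s
    results += 1

print(f"results needed to get within 2x of truth: {results}")
```

Under that assumption it takes a couple dozen further results just to bring the forecast within a factor of two of the real runtime, which would match the slow convergence seen here.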

While both 2-WU and 3-WU operation have given a modest performance boost, unless some adjustment can make the higher-activity condition observed on some work typical, the improvement seems not worth the overhead and risk associated with anonymous platform operation.
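For anyone wanting to try the multi-WU setup: the number of simultaneous tasks is governed by the GPU share declared in app_info.xml. A minimal sketch of the relevant fragment for three tasks at once; the app name and version number shown are illustrative, not copied from my actual file:

```xml
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>  <!-- illustrative name -->
    <version_num>120</version_num>            <!-- illustrative version -->
    <coproc>
        <type>CUDA</type>
        <count>0.33</count>  <!-- each task claims 1/3 of the GPU,
                                  so three run concurrently -->
    </coproc>
    <!-- file_ref entries for the app and its libraries omitted -->
</app_version>
```

Setting count to 0.5 instead would give the 2-WU case.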
