Memory depletion--graphics driver related

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7064164931
RAC: 1221321

Perhaps I have very good news.

Short version: the major update to Windows 10 that I installed today seems to have fixed the leak behavior.

Long Version:

I've been rebooting every day or two, partly as a means to manage the memory leak problem on my 1060/1070 machine.  After today's reboot, Windows advised me it had an update ready to install, and that I could either restart, accept the offered scheduled install time, or schedule a different one.  Thinking this would be a short matter, I selected immediate restart. As soon as it got going, it warned me that this would take a while, and involve multiple reboots. In the event it took almost an hour.  I think this may be the "Anniversary Update" which has been much in the news.  But the description actually given was "Feature update to Windows 10, version 1607".

Since the update completed, I have run Einstein steadily for a couple of hours (3X BRP6/CUDA55 tasks on each of a 1070 and a 1060). I see no progressive increase in paged pool bytes, no steady decline in available bytes, and no relentless accumulation of Vi12 tagged pool bytes. Maybe somehow this update fixed this behavior.
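For anyone who wants to put a number on "progressive increase" rather than eyeballing a graph, here is a minimal Python sketch. It assumes you have already logged paged pool readings (for example the Windows "Pool Paged Bytes" performance counter) as (seconds, bytes) pairs by whatever means you prefer. The function names, the sample readings, and the threshold are all made up for illustration; they are not measurements from my machines.

```python
def slope_per_hour(samples):
    """Least-squares slope of (seconds, bytes) samples, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den * 3600.0


def looks_like_leak(samples, threshold_bytes_per_hour):
    """True if the fitted growth rate exceeds the chosen threshold."""
    return slope_per_hour(samples) > threshold_bytes_per_hour


# Hypothetical readings taken every 30 minutes (illustrative values only).
leaking = [(0, 500_000_000), (1800, 550_000_000),
           (3600, 600_000_000), (5400, 650_000_000)]
steady = [(0, 500_000_000), (1800, 501_000_000),
          (3600, 499_000_000), (5400, 500_000_000)]

# The 10 MB/hour threshold is an arbitrary assumption; pick one that suits your host.
print(looks_like_leak(leaking, 10_000_000))
print(looks_like_leak(steady, 10_000_000))
```

A fitted slope is less sensitive to a single noisy reading than comparing only the first and last samples, which is why the sketch uses it.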

I have another three Windows 10 machines, but only one includes a Maxwell2 or Pascal card and has been showing the memory leak symptoms.  I'll do this update on that machine soon, and report whether the leak behavior has also changed.

It would be helpful if others here who have observed similar memory leak symptoms on Windows 10 Maxwell2 or Pascal systems would check before and after this update to see if they have any such improvement.

 

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7064164931
RAC: 1221321

The second of my two machines which was having the Vi12 tagged pool memory leak has also apparently been healed with a Windows 10 Anniversary update. Oddly enough, in that case, when I first ran it as updated it still leaked, with the 359.06 driver then on it (at the suggestion of someone here). But I've rebooted after updating to the current driver (372.54) and now see none of the three primary symptoms of the problem.

The possibility that this particular update made a difference is enhanced by the observation that it installed new versions of both of the .sys driver files which contain the Vi12 tag.

driver     old_size   new_size
dxgmms1     393,056    402,272
dxgmms2     576,864    658,784
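To make the size of the change explicit, here is a small Python sketch that works out the difference for each file. It assumes the figures above are byte counts, and the helper name is just something I picked for illustration. A size change only shows the files were replaced; it says nothing by itself about what was fixed.

```python
# Old and new sizes as listed above (assumed to be bytes).
SIZES = {
    "dxgmms1": (393_056, 402_272),
    "dxgmms2": (576_864, 658_784),
}


def size_change(old, new):
    """Return (absolute change, percent change relative to the old size)."""
    delta = new - old
    return delta, 100.0 * delta / old


for name, (old, new) in SIZES.items():
    delta, pct = size_change(old, new)
    print(f"{name}: {delta:+,} ({pct:+.2f}%)")
```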

 

The Windows 10 Anniversary update has been in slowly rolling distribution since August 2. Published reports show under 20% of systems converted as of a couple of days ago, and as "born" Windows 10 systems are getting the update sooner than upgraded ones, it is likely you don't have the update if you have not noticed. If you want to try it, you might consider this rather detailed description of a simple means to force the process, together with running commentary on how it tends to look.

As we have had some inconsistent results on "switches" which enable or disable this behavior, I'm cautious about the optimistic view that the combination of an up-to-date Nvidia driver plus the Windows 10 Anniversary update heals the problem, but these are the best results I've gotten to date.

Both machines are currently running BRP6/CUDA55 work with the "test" or "beta" application designation.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

archae86 wrote:
healed with a Window 10 Anniversary update...

...it installed new version of both of the .sys driver files which contain the Vi12 tag.

driver     old_size   new_size
dxgmms1     393,056    402,272
dxgmms2     576,864    658,784

Hmm, build 14905.1000 seems to have those file sizes closer to new_size than old_size.

dxgmms1 398,096, dxgmms2 651,024

My host with a GTX 670 and 372.54 is currently showing a memory leak with these, but this week should bring out a new test build. I'll monitor whether anything changes.

 

edit:

Build 14915.1000, Nvidia driver 372.70, GTX 670. Memory still leaking.

dxgmms1 399,632, dxgmms2 649,488

 

edit 2:

Build 14926.1000, Nvidia driver 372.70, GTX 670. Memory still leaking.

dxgmms1 399,632, dxgmms2 648,464
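For what it's worth, the "closer to new_size than old_size" comparison can be checked with a few lines of Python. This is only a sketch using the numbers quoted in this thread, taken to be bytes (the posts don't state a unit); the dictionary and function names are mine.

```python
# Reference sizes from archae86's table (assumed to be bytes).
REFERENCE = {
    "dxgmms1": {"old_size": 393_056, "new_size": 402_272},
    "dxgmms2": {"old_size": 576_864, "new_size": 658_784},
}

# Sizes I saw on the three test builds listed above.
BUILDS = {
    "14905.1000": {"dxgmms1": 398_096, "dxgmms2": 651_024},
    "14915.1000": {"dxgmms1": 399_632, "dxgmms2": 649_488},
    "14926.1000": {"dxgmms1": 399_632, "dxgmms2": 648_464},
}


def closer_to(driver, size):
    """Return 'old_size' or 'new_size', whichever reference is nearer."""
    ref = REFERENCE[driver]
    return min(ref, key=lambda label: abs(ref[label] - size))


for build, sizes in BUILDS.items():
    for driver, size in sizes.items():
        print(build, driver, closer_to(driver, size))
```

Every entry comes out nearer to new_size, yet the GTX 670 host still leaked on all three builds, so file size alone doesn't tell whether a given build carries the fix.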

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Just a little update.

A few versions later... build 14955 and Nvidia driver 375.70 on five hosts.

While running the current app (Binary Radio Pulsar Search (Arecibo, GPU) v1.57 (BRP4G-Beta-cuda55) windows_intelx86), Paged Pool is being eaten up on all three machines that have a GTX 960. Hosts with a GTX 760 or GTX 670 are not affected. This was also happening with Nvidia driver 375.63.

Interestingly... at some point in the past, the situation was the other way around with those cards.
