NVIDIA driver crashing

David S
Joined: 6 Dec 05
Posts: 2473
Credit: 22936222
RAC: 0
Topic 198321

My i7's GT 440 suddenly started crashing last night. Sometime during the night, the whole computer froze. This continued after I restarted it, until I suspended the GPU in Boinc.

I also posted about this at Seti and someone suggested the Einstein app might be the problem. Is anyone else having trouble with driver 353.30? Should I try the latest one? Any good reason NOT to try the latest one?

David

Miserable old git
Patiently waiting for the asteroid with my name on it.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056644931
RAC: 1605764

I can report that I have two systems running 750 cards, a 660, and a 970 on 355.82, and one system running a pair of 750 cards on 355.98.

But current is 359.00, and I've not tried that--just because I've not gotten around to it. It is a November 19 release, so quite recent.

I've not been tracking forum discussion on issues with specific versions.

mikey
Joined: 22 Jan 05
Posts: 11944
Credit: 1832574087
RAC: 218327

Quote:

I also posted about this at Seti and someone suggested the Einstein app might be the problem. Is anyone else having trouble with driver 353.30? Should I try the latest one? Any good reason NOT to try the latest one?

And here at Einstein we say 'it's Seti causing the problems'. However, BOTH are probably bogus and misleading. If it worked a week ago and the projects haven't released any new workunits, then the problem is more likely to be on your end.

Check for dust, check for overheating, and check for other processes that could be causing the problems. Did you add any new projects that are now using resources that weren't being used before? What version of Boinc are you using? Did you upgrade/downgrade it since the problems began?
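One way to do the overheating check is to log what the card reports while it is under load. The following is only a minimal sketch: it assumes `nvidia-smi` (installed with the NVIDIA driver) is on the PATH, which is often not the case by default on Windows, and that the card reports these fields at all. Older GeForce cards may answer `[Not Supported]` or `N/A`, which the parser turns into `None`. The 85 C limit is an arbitrary placeholder, not a figure from NVIDIA.

```python
import subprocess

# Fields requested from nvidia-smi; any of them may be unsupported on a given card.
QUERY_FIELDS = ["temperature.gpu", "utilization.gpu", "clocks.sm", "clocks.mem"]


def parse_gpu_line(line):
    """Turn one CSV line of nvidia-smi output into a dict of ints (None if unsupported)."""
    values = [v.strip() for v in line.split(",")]
    reading = {}
    for field, value in zip(QUERY_FIELDS, values):
        try:
            reading[field] = int(value)
        except ValueError:
            # e.g. "[Not Supported]" or "N/A"
            reading[field] = None
    return reading


def read_gpus():
    """Query every GPU once and return a list of parsed readings."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def too_hot(reading, limit_c=85):
    """True if the card reported a temperature at or above limit_c (assumed threshold)."""
    temp = reading["temperature.gpu"]
    return temp is not None and temp >= limit_c
```

Calling `read_gpus()` every few seconds while a GPU task runs, and noting the last reading before a driver reset, would show whether the crashes line up with temperature or clock changes.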

Snow Crash
Joined: 24 Dec 09
Posts: 65
Credit: 100880785
RAC: 0

The 359.00 drivers are working OK for me ...
GTX480 Win10_64, GTX670 Win7_32, GTX970 Win10_64, 6GTX660Ti Win7_64

What Mikey said :-)

When I run into issues I generally use that as an opportunity to check everything ... dust, fans, update OS, update drivers, clean up disk, defrag, check the BOINC version - basically give the system an oil change and tune up ... for me, the few hours (at most) it takes to do this is fun - yes, I find it enjoyable ... it's my hobby :-) While this potentially introduces many changes at the same time ... I know it will work in the end (it didn't magically stop so it won't require magic to make it start again :-) and I know I'm going to do the maintenance and upgrades at some point anyway, why not now?


After everything is clean and my PC enthusiast hobby is satisfied, I get back to the crunching fun. I restart step by step, checking the crunching-specific areas one by one. Suspend BOINC manager and reboot; once it comes back up, check the BOINC manager log to see that everything is good and no errors are logged. Verify my GPU settings are what I want (fan, gpu, mem speeds). When it's all good, I make sure all tasks are suspended, note the WU numbers for the first couple, and then release one at a time while watching my cpu and gpu monitors to verify that under load they are behaving the way I want them to.

If the system is responding normally with the exception of GPU tasks crashing the machine, then I would consider aborting the GPU task that I had let run that "caused" the crash. Perhaps the WU got corrupted; rule it out and move on to the next. If you are running multiple WUs, you may need to do this a few times.
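The suspend-everything, release-one-at-a-time routine can also be driven from the command line with `boinccmd`, which ships with the BOINC client. This is a rough sketch, not a tested tool: it assumes `boinccmd` is on the PATH and that `boinccmd --get_tasks` prints `name:` and `project URL:` lines for each task, which may differ between client versions. The helper names are made up for illustration.

```python
import subprocess


def parse_tasks(get_tasks_output):
    """Extract (project_url, task_name) pairs from `boinccmd --get_tasks` text.

    Assumes each task block has a 'name:' line followed later by a
    'project URL:' line; 'WU name:' lines are ignored.
    """
    tasks = []
    name = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line[len("name:"):].strip()
        elif line.startswith("project URL:") and name is not None:
            url = line[len("project URL:"):].strip()
            tasks.append((url, name))
            name = None
    return tasks


def task_command(url, name, op):
    """Build the boinccmd argument list to suspend, resume or abort one task."""
    if op not in ("suspend", "resume", "abort"):
        raise ValueError("unsupported task operation: " + op)
    return ["boinccmd", "--task", url, name, op]


def list_tasks():
    """Ask the running client for its task list."""
    out = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    ).stdout
    return parse_tasks(out)


def apply_to_task(url, name, op):
    """Run one suspend/resume/abort command against the client."""
    subprocess.run(task_command(url, name, op), check=True)
```

A session would suspend every task returned by `list_tasks()`, then resume them one by one with `apply_to_task(url, name, "resume")`, and use `"abort"` on a task that reliably brings the driver down.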

Sorry I made that longer than likely necessary but hopefully there is something helpful buried inside.

--------------------------
- Crunch, Crunch, Crunch -
--------------------------

David S
Joined: 6 Dec 05
Posts: 2473
Credit: 22936222
RAC: 0

Okay, nothing is new on the box. No new software, no new hardware, no new Boinc version, no new drivers, no new Windows updates. Cleaning out the dust is probably a good idea.

I think after that my next step will be to suspend Einstein and let Seti use the GPU. If it's okay, I'll try it the other way around and see what happens. Killing the tasks in progress would be the next thing, if necessary.
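The "one project at a time" test described here can be written down as a small plan, so that each phase changes only one thing. This is just a sketch: `isolation_phases` is a hypothetical helper, the project URLs have to be the ones your own client shows, and note that `boinccmd --project <URL> suspend` pauses that project's CPU tasks as well as its GPU tasks.

```python
def isolation_phases(project_urls):
    """For each project, build the boinccmd calls that leave only it running.

    Returns a list of (active_url, commands) tuples; each command is an
    argument list suitable for subprocess.run.
    """
    phases = []
    for active in project_urls:
        commands = []
        for url in project_urls:
            op = "resume" if url == active else "suspend"
            commands.append(["boinccmd", "--project", url, op])
        phases.append((active, commands))
    return phases
```

Running each phase for long enough to expect a crash (overnight, in this case) and writing down which phase froze the machine narrows the problem to one project before any tasks are aborted.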

David

Miserable old git
Patiently waiting for the asteroid with my name on it.

m4573r
Joined: 8 Nov 15
Posts: 1
Credit: 1124125
RAC: 0

Yeah, cleaning the dust properly is always a good idea. I hope you solved your problem already, but if not, this should help:

Check your software setup: uninstall the GPU drivers, then do a clean install of a new version. Drivers can get messed up for no obvious reason. You can do the same with the DirectX runtime libraries, just in case.

If your software is OK, you may have a problem with some components overheating, so that the card just switches off for a moment. Try removing the heatsink from your GPU and check the state of the thermal pads and the components on the PCB. If the thermal pads have lost their softness or tore during disassembly, you have to replace them. For the GPU itself you can use thermal grease instead, but other elements should have thermal pads.

On the PCB, look for failed capacitors: they may be slightly swollen on top and/or leak something that looks like old glue. If so, they're bad and you should replace them. In most cases failed capacitors still work, but they can cause problems. Another thing to look at is the resistors. They look like small black/graphite cubes with two legs on the sides, and they are labelled with a number starting with "R", e.g. "R110". They're mostly placed close to the 6/8-pin power connector. I had some cases (3 nvidia gtx285 cards yesterday) where those resistors got so hot that they unsoldered themselves.

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1433361483
RAC: 595919

Quote:
Quote:

I also posted about this at Seti and someone suggested the Einstein app might be the problem. Is anyone else having trouble with driver 353.30? Should I try the latest one? Any good reason NOT to try the latest one?

And here at Einstein we say 'it's Seti causing the problems'. However, BOTH are probably bogus and misleading. If it worked a week ago and the projects haven't released any new workunits, then the problem is more likely to be on your end.

Check for dust, check for overheating, and check for other processes that could be causing the problems. Did you add any new projects that are now using resources that weren't being used before? What version of Boinc are you using? Did you upgrade/downgrade it since the problems began?


What I was trying to say was that running 2 BRP tasks at a time was the problem on my GT430, not that Einstein was the problem.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

David, is that card running at stock speeds or OC?

David S
Joined: 6 Dec 05
Posts: 2473
Credit: 22936222
RAC: 0

AFAIK, the card is running at stock speed. I certainly didn't do anything to it.

I opened the box this evening and found surprisingly little dust inside, but I blew out what I could anyway. I removed the GPU to blow it out. No components were obviously bad, but I wasn't looking at them.

In the course of troubleshooting the problem, I changed my Einstein preferences to only run one task at a time on the GPU, but this won't take effect until it downloads more work.
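A local alternative to the website preference is an `app_config.xml` file in the project's folder, which the client can re-read without waiting for new work. The sketch below only generates the XML text. It assumes the BRP6 application's short name is `einsteinbinary_BRP6` (check the `<app>` entries in your own `client_state.xml`), and the `cpu_usage` value of 0.5 is a placeholder rather than a recommendation. A `gpu_usage` of 1.0 means one task per GPU; 0.5 would allow two.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage=1.0, cpu_usage=0.5):
    """Return app_config.xml text limiting how many tasks of one app share a GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 1.0 -> one task per GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # cpu_usage is the fraction of a CPU core reserved per GPU task (assumed value).
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Assumed app name; verify it against client_state.xml before using the file.
xml_text = build_app_config("einsteinbinary_BRP6", gpu_usage=1.0)
```

The resulting text would be saved as `app_config.xml` in the Einstein project directory under the BOINC data folder, and picked up via the Manager's "Read config files" option or a client restart.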

I think I have tracked the problem down to Seti Beta OpenCL tasks. It's been running with those suspended and everything else enabled, and has not had a problem in over an hour. At this moment, it's doing 1 BRP6 cuda 55 and 1 Seti cuda 42.

David

Miserable old git
Patiently waiting for the asteroid with my name on it.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

David,

I replied on Seti Main.

That OpenCL is an OpenCl_nvidia_sah.. a new app for VLARs.

Should only be crunched on Kepler or Maxwell cards.

I'd recommend not crunching them any more on that GPU.
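For anyone unsure which generation their card belongs to: the BOINC event log reports a "compute capability" for each NVIDIA GPU at startup, and the major number maps onto the architecture. The GT 440 is a Fermi card (compute capability 2.1), so it falls outside the Kepler/Maxwell requirement. The sketch below assumes the log line contains the phrase `compute capability X.Y`; the mapping covers only the generations current at the time of this thread.

```python
import re

# Major compute capability -> NVIDIA architecture name.
ARCH_BY_MAJOR = {1: "Tesla", 2: "Fermi", 3: "Kepler", 5: "Maxwell"}


def compute_capability(log_line):
    """Pull (major, minor) out of a line containing 'compute capability X.Y'."""
    match = re.search(r"compute capability (\d+)\.(\d+)", log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def architecture(cc):
    """Name the architecture for a (major, minor) tuple, or 'unknown'."""
    return ARCH_BY_MAJOR.get(cc[0], "unknown")


def is_kepler_or_maxwell(cc):
    """True only for the two generations the new OpenCL app is meant for."""
    return architecture(cc) in ("Kepler", "Maxwell")
```

For example, `is_kepler_or_maxwell(compute_capability("... compute capability 2.1, ..."))` is false for the GT 440, while a 750 or 970 (5.x) passes.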

Zalster

David S
Joined: 6 Dec 05
Posts: 2473
Credit: 22936222
RAC: 0

Quote:

David,

I replied on Seti Main.

That OpenCL is an OpenCl_nvidia_sah.. a new app for VLARs.

Should only be crunched on Kepler or Maxwell cards.

I'd recommend not crunching them any more on that GPU.

Zalster


I'll reply there. Thanks.

David

Miserable old git
Patiently waiting for the asteroid with my name on it.
