Parkes PMPS XT on GPUs - a general problem?

Michael H.W. Weber
Joined: 22 Jan 05
Posts: 10
Credit: 399175195
RAC: 0
Topic 198057

I have previously reported very high error rates for this type of WU on GPUs.

Now it is happening again: these WUs seem to be a problem in general.
Just two examples:

http://einsteinathome.org/workunit/216250252
http://einsteinathome.org/workunit/216192690

Note that it is not GPU-dependent: it affects Intel IGP, NVIDIA and also AMD/ATI.

Could someone from the project team please look into this, so that we stop producing hot air instead of useful results?

Michael.

RNA World - A Distributed Supercomputer to Advance RNA Research

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

I am unsure where you are seeing that it affects NVIDIA or AMD/ATI. There are known bugs with some of the Intel drivers, as I am sure someone will point out. The second one in your list is awaiting a good return and has crunched OK on an ATI card.

Pollux_P3D
Joined: 8 Feb 11
Posts: 30
Credit: 212418648
RAC: 0

The problem in this case is the Intel driver

Maximilian Mieth wrote:
I finished one BRP6 1.52 on my HD4000 (driver: 10.18.10.3621). It was about 25% faster than the 1.39 ones I did with my NVIDIA 610M. As always all CPU cores were free.

Maximilian Mieth wrote:
MrS wrote:

That's the last trouble-free driver for BRP4, before the problems started. Which is probably the reason why you're using it ;)

Yes ;)

--> http://einsteinathome.org/host/6684045/tasks&offset=0&show_names=0&state=3&appid=29

Michael H.W. Weber
Joined: 22 Jan 05
Posts: 10
Credit: 399175195
RAC: 0

Quote:
I am unsure where you are seeing that it affects NVIDIA or AMD/ATI. There are known bugs with some of the Intel drivers, as I am sure someone will point out. The second one in your list is awaiting a good return and has crunched OK on an ATI card.


Well, I had posted about the AMD/ATI card issue on this board before. I am currently retrying with my AMD R9 290X, but the one WU I have completed so far, the one you mentioned, has still not been validated (pending).

The Intel issue apparently is a known problem (although I could not extract a solution from the posting below). I use a Dell system on which I cannot install any driver other than those provided by Dell, because I use other custom graphics hardware whose proper function might be affected by manually changing the drivers. Only a week ago, the latest Dell system driver update allowed me to at least receive WUs for the Intel HD 4000 IGP from the Einstein@home project; before that, this IGP was not recognized as an OpenCL-capable device.

In the case of NVIDIA, I made a mistake. Starting from this WU

http://einsteinathome.org/workunit/216192690

I clicked on this computer:

http://einsteinathome.org/host/10141451

which carries an NVIDIA GPU. So I concluded that the WU had been computed on that GPU, without looking at the details. Today, prompted by your posting, I realized that this is an Intel i7 system which also has an Intel HD 4000 IGP, and I looked further into the log:

http://einsteinathome.org/task/494397855

I found that this WU was actually NOT computed on the NVIDIA GPU but on the Intel IGP, which explains the error.

As said above, this posting:

Quote:

The problem in this case is the Intel driver

Maximilian Mieth wrote:
I finished one BRP6 1.52 on my HD4000 (driver: 10.18.10.3621). It was about 25% faster than the 1.39 ones I did with my NVIDIA 610M. As always all CPU cores were free.

Maximilian Mieth wrote:
MrS wrote:

That's the last trouble-free driver for BRP4, before the problems started. Which is probably the reason why you're using it ;)

Yes ;)

--> http://einsteinathome.org/host/6684045/tasks&offset=0&show_names=0&state=3&appid=29


makes it difficult for me to understand how exactly to solve the problem.

Michael

RNA World - A Distributed Supercomputer to Advance RNA Research

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2770855132
RAC: 902948

Quote:
The Intel issue apparently is a known problem (although I could not extract a solution from the posting below). I use a Dell system on which I cannot install any driver other than those provided by Dell, because I use other custom graphics hardware whose proper function might be affected by manually changing the drivers. Only a week ago, the latest Dell system driver update allowed me to at least receive WUs for the Intel HD 4000 IGP from the Einstein@home project; before that, this IGP was not recognized as an OpenCL-capable device.


I run the iGPU on a Dell platform (Optiplex 9020 with Haswell i5), and I can - eventually - use the generic Intel driver installers to switch between drivers at will. To get started, you may need to force one to install manually: Intel's 'Installation ReadMe' file is actually pretty good, if you follow the instructions exactly. I found the 'have disk' route the best way to get started.

I've got to rush out now, but if you search the boards for "10.18.10.3621" - the most reliable driver found so far - you should be able to find a download link. Or some kind soul might post it here.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109978970858
RAC: 29150074

RE: ... Or some kind soul

Michael H.W. Weber
Joined: 22 Jan 05
Posts: 10
Credit: 399175195
RAC: 0

Now that a few of these WUs, processed one at a time by my AMD/ATI 290X board

http://einsteinathome.org/host/11761322/tasks&offset=0&show_names=1&state=0&appid=29

have been validated, it seems that the previous errors I got with this same system were a result of processing several GPU WUs in parallel.

Michael.

RNA World - A Distributed Supercomputer to Advance RNA Research

Maximilian Mieth
Joined: 4 Oct 12
Posts: 128
Credit: 9903029
RAC: 912

Quote:
it seems that the previous errors which I got with this same system were a result of parallel processing of GPU WUs.


Maybe there was a temperature problem with the GPU? Did you check that?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109978970858
RAC: 29150074

Quote:
... it seems that the previous errors which I got with this same system were a result of parallel processing of GPU WUs.


I would think it's not just the parallel processing. I'm running 4x on lots of HD7850s and not seeing any issues at all. It's likely to be something else as well (like temperature, as suggested) that's really causing the problem.

Cheers,
Gary.

Yacob
Joined: 29 Nov 12
Posts: 4
Credit: 12911740
RAC: 0

How do you limit temperature on a GPU?
I know how to check the temperature of an NVIDIA GPU through the nvidia-settings or NVIDIA Control Panel programs, but not how to set an upper limit.
There are some programs for CPUs, but for GPUs?

Thanks,
Yacob
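For anyone who would rather script the temperature check than watch a control panel, here is a minimal Python sketch. It assumes an NVIDIA card with the `nvidia-smi` command-line tool on the PATH, and it only reads temperatures; it does not enforce any limit. The function names and the 80 °C threshold are made up for illustration and are not a recommendation from this thread.

```python
import subprocess
from typing import List


def parse_temperatures(output: str) -> List[int]:
    """Convert nvidia-smi's one-value-per-line output into integers (deg C)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temperatures() -> List[int]:
    """Ask nvidia-smi for the current temperature of every NVIDIA GPU.

    Raises FileNotFoundError if nvidia-smi is not installed.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def gpus_over_limit(temps: List[int], limit_c: int = 80) -> List[int]:
    """Return the indices of GPUs hotter than limit_c (hypothetical threshold)."""
    return [index for index, temp in enumerate(temps) if temp > limit_c]
```

Calling `gpus_over_limit(read_gpu_temperatures())` from a scheduled script would tell you which cards are running hot, and you could then suspend GPU work by hand or with a tool of your choice.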

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7058514931
RAC: 1609944

Quote:
How do you limit temperature on a GPU?


efmer's TThrottle program will do it.

Also, many of the standard GPU monitoring programs let you specify the relationship between fan speed and GPU temperature. While this is not a means of imposing an absolute limit, you can influence matters this way.
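To illustrate what such a fan-speed/temperature relationship amounts to, here is a rough Python sketch of a piecewise-linear fan curve. The curve points and the function name are hypothetical, chosen only for the example; real monitoring tools have their own settings and may interpolate differently.

```python
from typing import List, Tuple

# Hypothetical curve: (temperature in deg C, fan speed in percent).
# Temperatures must be distinct.
EXAMPLE_CURVE: List[Tuple[float, float]] = [
    (40, 30), (60, 50), (75, 80), (85, 100),
]


def fan_speed_percent(temp_c: float,
                      curve: List[Tuple[float, float]] = EXAMPLE_CURVE) -> float:
    """Linearly interpolate the fan speed for temp_c between curve points."""
    points = sorted(curve)
    # Below the first point or above the last one, clamp to the end values.
    if temp_c <= points[0][0]:
        return float(points[0][1])
    if temp_c >= points[-1][0]:
        return float(points[-1][1])
    # Find the segment containing temp_c and interpolate within it.
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
    raise ValueError("curve does not cover the requested temperature")
```

With the example curve, a GPU at 50 °C would get a 40% fan speed. A steeper curve makes the fan react earlier, which lowers the temperature the card settles at under load but, as noted above, does not cap it.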