Parallella, Raspberry Pi, FPGA & All That Stuff

Message boards : Cruncher's Corner : Parallella, Raspberry Pi, FPGA & All That Stuff

Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119519 - Posted: 8 Oct 2012, 4:38:40 UTC

This came up in a thread over at Cafe Einstein :

Parallella

which I think is well worth looking at. At a glance it would be absolutely red-hot for FFTs and thus have excellent performance in the signal processing area. If it can be done it could quite revolutionise distributed workflows as practised here at E@H. Notably, software can be developed and compiled for it using C/C++ on a GNU system. One to watch.

Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0
Message 119544 - Posted: 9 Oct 2012, 21:08:21 UTC - in response to Message 119519.

This came up in a thread over at Cafe Einstein :

Parallella

which I think is well worth looking at. At a glance it would be absolutely red-hot for FFTs and thus have excellent performance in the signal processing area. If it can be done it could quite revolutionise distributed workflows as practised here at E@H. Notably, software can be developed and compiled for it using C/C++ on a GNU system. One to watch.

Cheers, Mike.



Very good article. I've followed this development for a while; increasing CPU/GPU core frequency has its limits, as does (22 nm) process shrinking towards the molecular level.

Unfortunately I've never learned a computer language, except BASIC :-/
Programs need to be written in a different way, but C/C++ can be used.
And parallelization has already proved to be very effective (CUDA / OpenCL).


____________
Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119545 - Posted: 9 Oct 2012, 22:59:33 UTC - in response to Message 119544.
Last modified: 10 Oct 2012, 1:38:25 UTC

Very good article. I've followed this development for a while; increasing CPU/GPU core frequency has its limits, as does (22 nm) process shrinking towards the molecular level.

Unfortunately I've never learned a computer language, except BASIC :-/
Programs need to be written in a different way, but C/C++ can be used.
And parallelization has already proved to be very effective (CUDA / OpenCL).

It wouldn't currently compete on performance anywhere near the existing GPU porting of E@H WUs, as their current arrays are too small for that ( Adapteva's primary focus is on lowering price and power consumption ). But I envisage having an Epiphany array in a co-processor role, which could be handed the stuff it would be quite spectacular at, e.g. matrix manipulations, and thus deliver great performance on algorithms for which that is key ( Fast Fourier Transforms ). Their simplest offering is 4 x 4 for around $100 USD, however the design scales up to 64 x 64 ... I was most intrigued by their matrix multiplication using blocks within the matrix shifted synchronously between nodes, roughly speaking a two-dimensional pipeline.
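To make that block-shifting idea concrete, here is a rough single-threaded sketch in plain C of the textbook "Cannon style" scheme it resembles. The grid size, block size and test data are purely illustrative choices of mine - this is not Adapteva's SDK or their code, just the shape of the idea, simulated on one ordinary machine :

    /* Block-shifting ("Cannon style") matrix multiply, simulated on one
     * machine: a P x P grid of "nodes" each holds one BS x BS block of
     * A, B and C; A blocks circulate left along rows, B blocks circulate
     * up along columns, exactly one hop per step - the 2-D pipeline.    */

    #include <stdio.h>
    #include <string.h>

    #define P   4                 /* nodes per side (cf. the 4 x 4 Epiphany) */
    #define BS  2                 /* block side, kept tiny for the demo      */
    #define N   (P * BS)          /* full matrix is N x N                    */

    typedef double Block[BS][BS];
    static Block A[P][P], B[P][P], C[P][P];

    /* multiply-accumulate the blocks currently sitting on node (r,c) */
    static void node_macc(int r, int c)
    {
        for (int i = 0; i < BS; i++)
            for (int k = 0; k < BS; k++)
                for (int j = 0; j < BS; j++)
                    C[r][c][i][j] += A[r][c][i][k] * B[r][c][k][j];
    }

    /* rotate row r of the A blocks one node to the left (with wrap) */
    static void rotate_row_left(int r)
    {
        Block first;
        memcpy(first, A[r][0], sizeof first);
        for (int c = 0; c < P - 1; c++)
            memcpy(A[r][c], A[r][c + 1], sizeof(Block));
        memcpy(A[r][P - 1], first, sizeof first);
    }

    /* rotate column c of the B blocks one node upwards (with wrap) */
    static void rotate_col_up(int c)
    {
        Block first;
        memcpy(first, B[0][c], sizeof first);
        for (int r = 0; r < P - 1; r++)
            memcpy(B[r][c], B[r + 1][c], sizeof(Block));
        memcpy(B[P - 1][c], first, sizeof first);
    }

    int main(void)
    {
        /* simple test data: A = 2 * identity, B = a ramp, so C should be 2*B */
        for (int gi = 0; gi < N; gi++)
            for (int gj = 0; gj < N; gj++) {
                A[gi / BS][gj / BS][gi % BS][gj % BS] = (gi == gj) ? 2.0 : 0.0;
                B[gi / BS][gj / BS][gi % BS][gj % BS] = gi * N + gj;
            }

        /* initial skew: row r of A goes r hops left, column c of B goes c hops up */
        for (int r = 0; r < P; r++)
            for (int s = 0; s < r; s++)
                rotate_row_left(r);
        for (int c = 0; c < P; c++)
            for (int s = 0; s < c; s++)
                rotate_col_up(c);

        /* P rounds of "multiply what you hold, then pass it on" */
        for (int step = 0; step < P; step++) {
            for (int r = 0; r < P; r++)
                for (int c = 0; c < P; c++)
                    node_macc(r, c);
            for (int r = 0; r < P; r++) rotate_row_left(r);
            for (int c = 0; c < P; c++) rotate_col_up(c);
        }

        /* spot check one element: C(1,3) of the full matrix should be 2 * B(1,3) */
        printf("C(1,3) = %g (expect %g)\n", C[0][1][1][1], 2.0 * (1 * N + 3));
        return 0;
    }

On the real chip each 'rotate' would be a neighbour-to-neighbour write across the mesh rather than a memcpy, which is why the scheme maps so naturally onto a grid of cores.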

Cheers, Mike.

( edit ) The programming skill would largely be a matter of having a "parallel approach" and not language per se. For instance the address space within any given array is unprotected, meaning that any node/processor can read and write to any other's memory within a globally flat space, so the discipline required to prevent any incongruities arising from that would have to come from the program design and compilation. So you'd want to identify the elements in the problem space that could be simultaneously and independently executed, and if we have already written for GPU thread parallelism then that aspect is largely done.

( edit ) This also highlights an issue/query that arises here from time to time : why can't GPUs be used to speed up <insert Algorithm X here> ? Answer : Algorithm X, or the problem space it arises from, may not have sufficiently parallel aspects for that to yield a gain over non-massively-parallel solutions.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0
Message 119562 - Posted: 11 Oct 2012, 18:11:57 UTC - in response to Message 119545.

Very good article. I've followed this development for a while; increasing CPU/GPU core frequency has its limits, as does (22 nm) process shrinking towards the molecular level.

Unfortunately I've never learned a computer language, except BASIC :-/
Programs need to be written in a different way, but C/C++ can be used.
And parallelization has already proved to be very effective (CUDA / OpenCL).

It wouldn't currently compete on performance anywhere near the existing GPU porting of E@H WUs, as their current arrays are too small for that ( Adapteva's primary focus is on lowering price and power consumption ). But I envisage having an Epiphany array in a co-processor role, which could be handed the stuff it would be quite spectacular at, e.g. matrix manipulations, and thus deliver great performance on algorithms for which that is key ( Fast Fourier Transforms ). Their simplest offering is 4 x 4 for around $100 USD, however the design scales up to 64 x 64 ... I was most intrigued by their matrix multiplication using blocks within the matrix shifted synchronously between nodes, roughly speaking a two-dimensional pipeline.

Cheers, Mike.

( edit ) The programming skill would largely be a matter of having a "parallel approach" and not language per se. For instance the address space within any given array is unprotected, meaning that any node/processor can read and write to any other's memory within a globally flat space, so the discipline required to prevent any incongruities arising from that would have to come from the program design and compilation. So you'd want to identify the elements in the problem space that could be simultaneously and independently executed, and if we have already written for GPU thread parallelism then that aspect is largely done.

( edit ) This also highlights an issue/query that arises here from time to time : why can't GPUs be used to speed up <insert Algorithm X here> ? Answer : Algorithm X, or the problem space it arises from, may not have sufficiently parallel aspects for that to yield a gain over non-massively-parallel solutions.



It does appear to be quite a change in 'thinking' and programming, as not many replies or answers have arisen ;-)

It also took some time before GPGPU was being used, via CUDA or OpenCL.


____________
Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119564 - Posted: 11 Oct 2012, 22:48:50 UTC - in response to Message 119562.
Last modified: 12 Oct 2012, 4:24:22 UTC

It does appear to be quite a change in 'thinking' and programming, as not many replies or answers have arisen ;-)

There certainly is a lot to swallow ! :-)

It also took some time before GPGPU was being used, via CUDA or OpenCL.

Actually they have said they will consider developing an OpenCL facility for it. Now that's a clever move; these guys are forward thinkers for sure.

Also, each node ( RISC processor plus its slab of local memory ) is connected to each of three independent data buses that constitute the 'mesh' - two for writing and one for reading - with no latency on the channel that does fast on-chip writes between nodes. That goes ~ 16 x faster than corresponding reads! That's quite an asymmetry, and a processor can never stall using that type of write! That would imply a chunk of buffering by the network-on-chip system. Anyway, I think the hint there is : if a node_B needs results from a node_A, it is far more efficient for node_A to execute the above fast write into node_B's local memory THAN for node_B to execute a much slower read from node_A's memory. ( Plus node_A will know better when it has finished some computation step, as opposed to node_B polling. ) Given that said data transfer could also include flags/semaphores/etc to coordinate/validate any data state, you then have a really snappy mechanism in the hardware to satisfy most 'process/thread' cooperation paradigms. [ There is specifically a TESTSET command at hardware level, which is an atomic "test-if-not-zero" followed by a conditional write. This is the usual mechanism to prevent deadlocks/races and whatnot/troubles with semaphores et al. ]
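To illustrate why that write-over-read asymmetry matters for program structure, here is a little plain-C sketch of the push-then-flag handshake. The grid, the mailbox struct and neighbour_mailbox() are stand-ins I invented so the example is self-contained - this is not the Epiphany SDK, and on real silicon you would also use something like that TESTSET primitive or a barrier to nail down the write ordering :

    /* Sketch of the "node_A pushes its result into node_B's local memory,
     * then raises a flag" pattern.  Plain C run on one machine for
     * illustration only: the mesh and mailbox layout are invented.      */

    #include <stdio.h>
    #include <stdint.h>

    #define ROWS    4
    #define COLS    4
    #define BUF_LEN 8

    typedef struct {
        volatile uint32_t ready;        /* 0 = empty, 1 = data valid      */
        float             data[BUF_LEN];
    } mailbox_t;

    /* Stand-in for the flat global address map: every node's local
     * mailbox, indexed by its mesh coordinates.                          */
    static mailbox_t node_mem[ROWS][COLS];

    static mailbox_t *neighbour_mailbox(int row, int col)
    {
        return &node_mem[row][col];
    }

    /* node_A's side: compute something, then *write* it straight into
     * node_B's local memory - writes are the fast, non-stalling path.    */
    static void producer_step(int b_row, int b_col)
    {
        mailbox_t *box = neighbour_mailbox(b_row, b_col);

        for (int i = 0; i < BUF_LEN; i++)   /* payload first ...          */
            box->data[i] = (float)(i * i);

        box->ready = 1;                     /* ... then raise the flag    */
    }

    /* node_B's side: spin on its *own* local flag ( cheap ), never issue
     * a slow read across the mesh, then consume what was pushed in.      */
    static float consumer_step(int my_row, int my_col)
    {
        mailbox_t *box = &node_mem[my_row][my_col];

        while (box->ready == 0)
            ;                               /* wait for node_A's push     */

        float sum = 0.0f;
        for (int i = 0; i < BUF_LEN; i++)
            sum += box->data[i];

        box->ready = 0;                     /* hand the mailbox back      */
        return sum;
    }

    int main(void)
    {
        /* sequential stand-in for what would be two nodes running at once */
        producer_step(2, 3);
        printf("node (2,3) received sum = %g\n", consumer_step(2, 3));
        return 0;
    }

On the real, concurrent hardware the spin happens against the node's own local memory, so all the expensive mesh traffic travels in the cheap write direction.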

Cheers, Mike.

( edit ) Think of the Three Stooges trying to go through the same door at the same time : 'after you Larry' ... 'no, after you Curly' ... 'please, you first Moe' ... 'no I couldn't' ... 'I must insist' ... 'no, I couldn't possibly' ... eventually they jam in the doorway and fight.

( edit ) You may be thinking : how can one label a data bus as 'only for reading' or 'only for writing', that typically being a question of perspective or which end you're at ? Answer : the difference is in the specification of the addressing, who is controlling the transaction, and the buffering. A read request has a 'return to sender' address component that a write doesn't ( similar to a stamped self-addressed envelope ). That propagates through the mesh, going left/right along a row and then up/down along a column until the target node is found, and ditto for the return leg. A write simply does the first phase; in fact a node receiving data from another node's write does not know who sent it ( well, not from the hardware at least ).

( edit ) Of course such an array of processors can be task dedicated without double duty, unlike GPUs which generally also perform a system graphics role. BTW, currently either a USB or an Ethernet pathway is how to connect such a co-processor to some 'host' system. There is talk of other modes, say even digital video output. So one can attach it to all manner of devices via suitable I/O ports. The other, slower write channel, called xMesh, is for such off-chip connections - which could well be another similar chip, but may be anything with appropriate circuit-level compatibility/buffering. As you can tell, I am rather enthused ... :-)
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Rod
Joined: 3 Jan 06
Posts: 4483
Credit: 811,266
RAC: 0
Message 119761 - Posted: 26 Oct 2012, 1:55:02 UTC
Last modified: 26 Oct 2012, 1:59:38 UTC

I just increased my pledge again.
So close ... The platform has so much potential, and right now that's all it is: 'potential to do great things'.

Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119762 - Posted: 26 Oct 2012, 2:16:58 UTC - in response to Message 119761.
Last modified: 26 Oct 2012, 2:19:28 UTC

I just increased my pledge again.
So close ... The platform has so much potential, and right now that's all it is: 'potential to do great things'.

Hey they've bumped up ~ $100K in the last day!! I'd given up hope ... so I've just gone up myself - the $199+ package ( 64 core Epiphany IV on the stretch ).

Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,444,053
RAC: 1,160
Message 119771 - Posted: 26 Oct 2012, 9:56:22 UTC - in response to Message 119762.

I just increased my pledge again.
So close ... The platform has so much potential, and right now that's all it is: 'potential to do great things'.

Hey they've bumped up ~ $100K in the last day!! I'd given up hope ... so I've just gone up myself - the $199+ package ( 64 core Epiphany IV on the stretch ).

Cheers, Mike.

I've just gone for the same package too,

Claggy
dmike
Joined: 11 Oct 12
Posts: 76
Credit: 31,369,048
RAC: 0
Message 119776 - Posted: 26 Oct 2012, 12:01:32 UTC - in response to Message 119771.
Last modified: 26 Oct 2012, 12:08:48 UTC

Sure looks interesting. I love the open source nature of the product and the horsepower for such a low amount of power consumption.

I personally wouldn't have much use for one, but still the price and package are attractive. I was considering one to add for E@H, but I think I'd be better off buying a 550 Ti to add in another box, as they're claiming 90 GFLOPS vs the 550 Ti's 691 GFLOPS.

I'm a fan of what they're doing; I just wish I'd use it for something more than crunching. Unfortunately I don't have the need, but I would look forward to others posting their experiences with the system.

Oh, and the cluster reward for pledging $975 sounds beyond awesome!
In any case, thanks for sharing this with us, Mike. I'd have never known about it otherwise!

Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119784 - Posted: 26 Oct 2012, 16:42:27 UTC - in response to Message 119776.
Last modified: 26 Oct 2012, 16:48:45 UTC

Sure looks interesting. I love the open source nature of the product and the horsepower for such a low amount of power consumption.

Well, I don't want to get all poetical about it, but I think these guys are on an historic cusp. What they propose has massive potential and is so accessible. Why they're asking for grass-roots money is simply that the biggies can't re-tool their way out of current commitments to existing paradigms ( which is not a criticism, just a fact of entrenched investment ). With development the price comes down under threshold, and you have a card that'll fit a PCIe slot with open-source software that will hammer away.

I personally wouldn't have much use for one, but still the price and package are attractive. I was considering one to add for E@H, but I think I'd be better off buying a 550 Ti to add in another box, as they're claiming 90 GFLOPS vs the 550 Ti's 691 GFLOPS.

For now, yes. But you can scale the very same design up by several orders of magnitude while barely breaking a sweat ..... :-)

I'm a fan of what they're doing; I just wish I'd use it for something more than crunching. Unfortunately I don't have the need, but I would look forward to others posting their experiences with the system.

Oh, and the cluster reward for pledging $975 sounds beyond awesome!
In any case, thanks for sharing this with us, Mike. I'd have never known about it otherwise!

Thank Rod for that, he told me! :-)

I've also just noted that they drop a Ubuntu 12.04 onto the ARM A9 CPU on the board they supply.

Cheers, Mike.

( edit ) They're up another $100K, and just $110K shy now.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
MarkJ
Joined: 28 Feb 08
Posts: 309
Credit: 34,665,249
RAC: 9,554
Message 119789 - Posted: 26 Oct 2012, 20:57:02 UTC
Last modified: 26 Oct 2012, 20:59:33 UTC

As of this morning (7:55 am Sydney time) and with 25 hours left they need another $41,000 to get there. Yes, I have made a pledge; let's see if I need to pay.

I already have a Raspberry Pi but find it hard to find any BOINC projects that support it.
____________
BOINC blog

Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119790 - Posted: 26 Oct 2012, 21:05:49 UTC

I put in another $50 - for the book, a case with mounts, and any t-shirts that might turn up !! :-)

Cheers, Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1722
Credit: 64,904,104
RAC: 57,245
Message 119791 - Posted: 26 Oct 2012, 22:18:32 UTC - in response to Message 119789.

...with 25 hours left they need another $41,000 to get there...

23 hours to go, and $28,400 still needed - we can do this. My pledge is already included in that total.
Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119794 - Posted: 27 Oct 2012, 0:59:25 UTC

Yup, they made it just now!

Cheers, Mike
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal

dmike
Joined: 11 Oct 12
Posts: 76
Credit: 31,369,048
RAC: 0
Message 119795 - Posted: 27 Oct 2012, 1:21:15 UTC - in response to Message 119794.

Nice!

Mike, will you be doing a comprehensive review of your setup once you've had some time with it?

Mike Hewson
Volunteer moderator
Joined: 1 Dec 05
Posts: 5084
Credit: 41,747,214
RAC: 9,736
Message 119796 - Posted: 27 Oct 2012, 1:29:04 UTC - in response to Message 119795.

Nice!

Mike, will you be doing a comprehensive review of your setup once you've had some time with it?

Absolutely! It's going to be bleeding edge but delivery is next May, so I have time to bone up on all the docs :-)

Mike.
____________
"I have made this letter longer than usual, because I lack the time to make it short." - Blaise Pascal
Rod
Joined: 3 Jan 06
Posts: 4483
Credit: 811,266
RAC: 0
Message 119797 - Posted: 27 Oct 2012, 1:44:52 UTC
Last modified: 27 Oct 2012, 1:45:11 UTC

Remember, as with anything, there is still risk. Expectations always get in people's way.

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3502
Credit: 149,268,016
RAC: 109,222
Message 119808 - Posted: 27 Oct 2012, 19:05:35 UTC - in response to Message 119797.

Hi!
Nice that they got the money they asked for; I also made a small pledge.

In the meantime I'll play around with a Raspberry Pi, which is of course the opposite of HPC and parallel computing, but should still be fun.

CU
HB

____________

dmike
Joined: 11 Oct 12
Posts: 76
Credit: 31,369,048
RAC: 0
Message 119811 - Posted: 28 Oct 2012, 3:42:19 UTC - in response to Message 119797.

Remember, as with anything, there is still risk. Expectations always get in people's way.


I know this can be true; I've been subject to it myself at various times in the past.

I see no reason to doubt that the platform would be able to produce processing power as advertised. I do have questions about how complex it might be to design applications for it that would be able to utilize that power to its fullest. I wouldn't be surprised at all if getting it to run as desired is a lot more complicated than the developer makes it sound.

I do also worry that the effort required to make it a machine that is usable at the consumer level will be enough to leave it stranded, given that it's such a niche market. Non-standardized components rarely survive unless there is a big market behind them.

Overall, however, for people like the majority of us here, this technology is a big step in the right direction.
Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 3502
Credit: 149,268,016
RAC: 109,222
Message 119813 - Posted: 28 Oct 2012, 10:32:40 UTC

I'm also not convinced that this will have a chance in the marketplace, but they deserve the opportunity to try, I guess. They do offer OpenCL in their SDK, so it sounds like it won't be too complicated to get something going on the platform with reasonable effort. Something like OpenACC might be even better suited to this platform; maybe they can add that as well. We'll see.
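For what it's worth, the "something going" would probably start out as a trivially data-parallel kernel like the sketch below. This is generic OpenCL C, nothing Epiphany-specific, and I haven't tried it against their SDK - it just shows how little source-level ceremony the model demands once a vendor supplies the runtime :

    /* Generic OpenCL C: one work-item per array element, so the runtime
     * decides how to spread the work over however many cores exist.     */
    __kernel void saxpy(const float a,
                        __global const float *x,
                        __global float *y,
                        const unsigned int n)
    {
        size_t i = get_global_id(0);        /* which element am I?        */
        if (i < n)                          /* guard the tail             */
            y[i] = a * x[i] + y[i];
    }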

Cheers
HB

____________
