In my last posting, Parallel Programming: A First Look (Nov-16-2008), I introduced the subject of parallel programming in MATLAB. In that case, I briefly described my experiences with the MATLAB Parallel Computing Toolbox, from the MathWorks. Since then, I have been made aware of another parallel programming product for MATLAB, the Jacket Engine for MATLAB, from AccelerEyes.
Jacket differs from the Parallel Computing Toolbox in that Jacket off-loads work to the computer's GPU (graphics processing unit), whereas the Parallel Computer Toolbox distributes work over multiple cores or processors. Each solution has its merits, and it would be worth the time of MATLAB programmers interested in accelerating computation to investigate the nuances of each.
Having observed the computer hardware industry for several decades now, I have witnessed the arrival and departure of any number of special-purpose add-in cards which have been used to speed up math for things like neural networks, etc. For my part, I have resisted the urge to employ such hardware assistance for several reasons:
First, special hardware nearly always requires special software. Accommodating the new hardware environment with custom software means an added learning curve for the program and drastically reduced code portability.
Second, there is the cost of the hardware itself, which was often considerable.
Third, there was the fundamental fact that general-purpose computing hardware was inexorably propelled forward by a very large market demand. Within 2 or 3 years, even the coolest turbo board would be outclassed by new PCs, which didn't involve either of the two issues mentioned above.
Two significant items have emerged in today's computer hardware environment: multi-core processors and high-power graphics processors. Even low-end PCs today sport central processors featuring at least two cores, which you may think of more-or-less as "2 (or more) computers on a single chip". As chip complexity has continued to grow, chip makers like Intel and AMD have fit multiple "cores" on single chips. It is tempting to think that this would yield a direct benefit to the user, but the reality is more subtle. Most software was written to run on single-core computers, and is not equipped to take advantage of the extra computing power of today's multi-core computers. This is where the Parallel Computer Toolbox steps in, by providing programmers a way to distribute the execution of their programs over several cores or processors, resulting in a substantially improved performance.
Similarly, the graphics subsystem in desktop PCs has also evolved to a very sophisticated state. At the dawn of the IBM PC (around 1980), graphics display cards with few exceptions basically converted the contents of a section of memory into a display signal usable by a computer monitor. Graphics cards did little more.
Over time, though, greater processing functionality was added to the graphics cards culminating in compute engines which would rival supercomputer-class machines of only a few years ago. This evolution has been fueled by the inclusion of many processing units (today, some cards contain hundreds of these units). Originally designed to perform specific graphics functions, many of these units are not small, somewhat general-purpose computers and they can be programmed to do things having nothing to do with the image shown on the computer's monitor. Tapping into this power requires some sort of programming interface, though, which is where Jacket comes in.
Here is a simple assessment of the pros and cons of these two methods of achieving parallel computing on the desktop:
The required hardware is cheap. If you program in MATLAB, you probably have at least 2 cores at your disposal already, if not more.
Most systems top out 4 cores, limiting the potential speed-up with this method (although doubling or quadrupling performance isn't bad).
The number of processing units which can be harnessed by this method is quite large. Some of the fancier graphics cards have over 200 such units.
The required hardware may be a bit pricey, although the price/performance is probably still very attractive.
Most GPUs will only perform single-precision floating point math. Newer GPUs, though, will perform double-precision floating-point math.
Moving data from the main computer to the graphics card and back takes time, eating into the potential gain.
My use of the Parallel Computing Toolbox has been limited to certain, very specific tasks, and I have not used Jacket at all. The use of ubiquitous multi-core computers and widely-available GPUs avoids most of the problems I described regarding special-purpose hardware. It will be very interesting to see how these technologies fit into the technological landscape over the next few years, and I am eager to learn of readers' experiences with them.