Introduction
Recently, I have been experimenting with the MATLAB Parallel Computing Toolbox, which permits MATLAB programmers to spread work over multiple cores, processors or computers. My primary interest is in leveraging my quad-core desktop PC to accelerate the compute-intensive programs I use for data mining.
The Parallel Computing Toolbox is a MATLAB add-on package from MathWorks which provides a number of parallel programming mechanisms. The one I have spent the most time with is parallel looping, which is accomplished via the parfor command. The basic idea is to have separate iterations of a for-loop executed on separate cores or processors.
The required change to conventional code is tiny. For example, this conventional loop:
>> for i = 1:10, disp(int2str(i)), end
1
2
3
4
5
6
7
8
9
10
...becomes this parallel loop:
>> matlabpool open 4, parfor i = 1:10, disp(int2str(i)), end, matlabpool close
Starting matlabpool using the parallel configuration 'local'.
Waiting for parallel job to start...
Connected to a matlabpool session with 4 labs.
Sending a stop signal to all the labs...
Waiting for parallel job to finish...
4
3
2
1
6
5
9
8
10
7
Performing parallel job cleanup...
Done.
Notice three important differences:
First, the command "for" becomes "parfor": easy, right?
Second, the loop is bracketed by "matlabpool open 4" and "matlabpool close". These commands, respectively, start up and shut down the pool of parallel workers (here, 4 labs). They do not need to bracket every parfor-loop: you can start the matlabpool at the beginning of a program, use any number of parfor-loops and shut down the matlabpool at the end of the program.
Third, notice that the loop iterations did not execute in order. In many situations, this will not matter. In some, it will. This is one of the quirks of programming for a parallel processor. Being aware of this is the programmer's responsibility. Welcome to the future of computing!
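To make the third point concrete, here is a minimal sketch of a loop whose result does not depend on the order in which iterations run. The variable names (squares, total) are illustrative only, and the sketch assumes a matlabpool is already open, as described above.

```matlab
% Sketch: two parfor-loops whose results are independent of iteration order.
n = 10;

squares = zeros(1, n);       % preallocate; iteration i writes only squares(i)
parfor i = 1:n
    squares(i) = i^2;        % uses nothing computed by any other iteration
end

total = 0;                   % accumulated sum; addition order does not matter
parfor i = 1:n
    total = total + squares(i);
end

disp(squares)
disp(total)
```

Because each result is stored by its index rather than by its arrival time, the output is the same however the iterations are scheduled. By contrast, a loop in which iteration i reads a value written by iteration i-1 depends on order and is not a candidate for parfor.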
Experiences
My experiences programming with the Parallel Computing Toolbox have been mixed. The good news is that, just using the parallel looping functionality, I have seen code which runs as much as 3 times as fast on my quad-core computer. My tests have involved large numbers of regressions or clusterings (k-means): tasks typical of a data mining project, especially where parameter sweeps or bootstrapping are involved. The bad news is that I have not always seen such dramatic improvement, and in fact I sometimes see minor slow-downs.
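As an illustration of the kind of task that parallelizes well, the following is a rough sketch of a bootstrapped linear regression. The data are synthetic and every name in it (x, y, X, nBoot, coefs) is hypothetical; it is not the code behind the timings quoted above, only an example of the pattern.

```matlab
% Sketch: bootstrap a simple linear regression, one resample per iteration.
% Hypothetical data: y = 2*x + 1 plus noise.
n = 200;
x = linspace(0, 1, n)';
y = 2*x + 1 + 0.1*randn(n, 1);
X = [ones(n, 1), x];                      % design matrix with intercept column

nBoot = 500;
coefs = zeros(nBoot, 2);                  % row b is written only by iteration b
parfor b = 1:nBoot
    idx = ceil(n * rand(n, 1));           % resample row indices with replacement
    coefs(b, :) = (X(idx, :) \ y(idx))';  % least-squares fit on the resample
end

disp(mean(coefs))   % bootstrap mean of [intercept, slope]
disp(std(coefs))    % bootstrap standard errors
```

Each resample is fitted independently of the others, which is why this sort of job divides cleanly among cores.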
As far as I can tell, there is a limit to how much data I can juggle at one time. Going beyond it (remember that each core needs space for its own share of the problem) exceeds my system's available RAM, which slows parallel processing as the cores fight for memory. For reference, my current system is as follows:
Manufacturer: Velocity Micro
Model: Vector Z35
CPU: Intel Q6600, 2.4GHz (4 cores)
RAM: 4GB
OS: Windows XP (32-bit)
At present, Windows only shows about 3.24GB of that physical RAM. My strong suspicion is that moving to a 64-bit environment (there are 64-bit versions of both Windows XP and Windows Vista, as well as Linux) would permit access to more physical RAM and allow acceleration of parallel code which deals with larger data. In the meantime, though, at least some of my code is running 3 times as fast as it was, which would require the equivalent of a single-core processor running at about 7.2GHz (3 x 2.4GHz)!
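A back-of-the-envelope check of the memory pressure might look like the sketch below. It rests on a pessimistic assumption that I have not verified for my own programs: that each of the 4 workers ends up holding a full copy of the data, in addition to the copy in the main MATLAB session. The matrix dimensions are hypothetical.

```matlab
% Sketch: estimate the RAM needed if every worker holds its own copy of the data.
rows = 2e6;                 % hypothetical number of observations
cols = 50;                  % hypothetical number of variables
bytesPerDouble = 8;         % MATLAB doubles occupy 8 bytes each
nWorkers = 4;               % one worker per core
availableGB = 3.24;         % RAM that Windows reports on my machine

dataGB  = rows * cols * bytesPerDouble / 2^30;   % size of one copy
totalGB = dataGB * (nWorkers + 1);               % workers plus the main session
fitsInRAM = totalGB < availableGB;

fprintf('One copy: %.2f GB, all copies: %.2f GB, fits in RAM: %d\n', ...
        dataGB, totalGB, fitsInRAM);
```

Under these assumptions a data set well under 1GB already exceeds what the machine can hold once it is replicated, which is consistent with the slow-downs I see on larger problems.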
See also: Parallel Programming: Another Look (Feb-12-2009)