Saturday, February 28, 2009

Getting Data Into and Out of Excel

Introduction

Recently, I needed to assemble a report which is to be generated quarterly and distributed to those unfortunate enough not to have MATLAB. Currently, such reports are distributed as Excel workbooks. Nearly everyone who produces such reports where I work does so by generating tables of numbers and cutting-and-pasting them into Excel (yuck!). As I have a number of these reports to produce, I was motivated to construct a more automatic solution.

Happily, MATLAB can easily get data from or send data to Excel documents. The mechanics of this are not difficult, but I thought that readers might not be aware that this facility exists and just how easy this is to accomplish.

There is just one catch: for users who do not have Excel available to act as a COM server (UNIX users, for example), 'basic' mode is required and functionality will be limited. See help xlsread for details.


Getting Data From Excel in MATLAB

MATLAB's function for extracting data from Excel documents is xlsread. Using it is as simple as this:

[NumericData, TextData] = xlsread(FileName, SheetName, CellRange)

...where NumericData and TextData contain the numeric and text data read from the workbook, respectively, and FileName, SheetName and CellRange specify the Excel document, the sheet, and the cell range from which to read.

Often, I find myself needing to read data from growing ranges of cells within Excel spreadsheets. Think of daily rainfall data stored in a single column within a spreadsheet, which periodically has data appended to it. To load such data, simply set the range of cells to be much larger than the existing range: xlsread will ignore the extra empty spreadsheet cells.
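As a minimal sketch (the file, sheet, and range names here are made up for illustration), reading such a growing rainfall column might look like this:

% Read daily rainfall from column B; the range is deliberately oversized,
% and xlsread simply ignores the trailing empty cells.
Rainfall = xlsread('RainfallData.xls', 'Daily', 'B2:B10000');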


Getting Data Into Excel in MATLAB

Writing data to Excel documents is also quite simple. Just use xlswrite:

xlswrite(FileName, DataArray, SheetName, CellRange)

...where FileName, SheetName and CellRange specify the Excel document, the sheet, and the cell range to which to write, and DataArray contains the data to be written.
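As an illustrative sketch (again, the file, sheet, and range names are made up), writing a small table of quarterly figures might look like this:

% Each row: year, quarter, value; the range matches the 2-by-3 array being written.
QuarterlySummary = [2009 1 123.4; 2009 2 117.9];
xlswrite('QuarterlyReport.xls', QuarterlySummary, 'Summary', 'A2:C3')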


Final note

Refer to the help facility for diagnostic and other capabilities of both of these functions. See also wk1read and wk1write to handle the old Lotus 1-2-3 .WK1 format.

Thursday, February 12, 2009

Parallel Programming: Another Look

Introduction

In my last posting, Parallel Programming: A First Look (Nov-16-2008), I introduced the subject of parallel programming in MATLAB. In that case, I briefly described my experiences with the MATLAB Parallel Computing Toolbox, from the MathWorks. Since then, I have been made aware of another parallel programming product for MATLAB, the Jacket Engine for MATLAB, from AccelerEyes.

Jacket differs from the Parallel Computing Toolbox in that Jacket off-loads work to the computer's GPU (graphics processing unit), whereas the Parallel Computing Toolbox distributes work over multiple cores or processors. Each solution has its merits, and it would be worth the time of MATLAB programmers interested in accelerating computation to investigate the nuances of each.


Some History

Having observed the computer hardware industry for several decades now, I have witnessed the arrival and departure of any number of special-purpose add-in cards used to speed up math for things like neural networks. For my part, I have resisted the urge to employ such hardware assistance for several reasons:

First, special hardware nearly always requires special software. Accommodating the new hardware environment with custom software means an added learning curve for the programmer and drastically reduced code portability.

Second, there was the cost of the hardware itself, which was often considerable.

Third, there was the fundamental fact that general-purpose computing hardware was inexorably propelled forward by very large market demand. Within two or three years, even the coolest turbo board would be outclassed by new PCs, which involved neither of the two issues mentioned above.

Two significant items have emerged in today's computer hardware environment: multi-core processors and high-powered graphics processors. Even low-end PCs today sport central processors featuring at least two cores, which you may think of more or less as "2 (or more) computers on a single chip". As chip complexity has continued to grow, chip makers like Intel and AMD have fit multiple cores onto single chips. It is tempting to think that this would yield a direct benefit to the user, but the reality is more subtle. Most software was written to run on single-core computers and is not equipped to take advantage of the extra computing power of today's multi-core machines. This is where the Parallel Computing Toolbox steps in: it gives programmers a way to distribute the execution of their programs over several cores or processors, resulting in substantially improved performance.
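As a minimal sketch of the kind of thing the toolbox makes possible (not taken from my own work; the computation inside the loop is just a stand-in, and matlabpool is the pool syntax of this era), a loop whose iterations are independent can be spread across workers with parfor:

matlabpool open                        % start a pool of local workers
Results = zeros(1, 1000);
parfor k = 1:1000
    Results(k) = sum(svd(rand(200)));  % independent iterations run on separate workers
end
matlabpool close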

Similarly, the graphics subsystem in desktop PCs has also evolved to a very sophisticated state. At the dawn of the IBM PC (around 1980), graphics display cards, with few exceptions, did little more than convert the contents of a section of memory into a display signal usable by a computer monitor.

Over time, though, greater processing functionality was added to graphics cards, culminating in compute engines that rival supercomputer-class machines of only a few years ago. This evolution has been fueled by the inclusion of many processing units (today, some cards contain hundreds of them). Originally designed to perform specific graphics functions, many of these units are now small, somewhat general-purpose computers, and they can be programmed to do things having nothing to do with the image shown on the computer's monitor. Tapping into this power requires some sort of programming interface, though, which is where Jacket comes in.


Caveats

Here is a simple assessment of the pros and cons of these two methods of achieving parallel computing on the desktop:


Multi-Core:

Good:

The required hardware is cheap. If you program in MATLAB, you probably have at least 2 cores at your disposal already, if not more.

Bad:

Most systems top out at 4 cores, limiting the potential speed-up with this method (although doubling or quadrupling performance isn't bad).


GPU:

Good:

The number of processing units which can be harnessed by this method is quite large. Some of the fancier graphics cards have over 200 such units.

Bad:

The required hardware may be a bit pricey, although the price/performance is probably still very attractive.

Most GPUs will only perform single-precision floating-point math, although newer GPUs will perform double-precision math as well.

Moving data from the main computer to the graphics card and back takes time, eating into the potential gain.


Conclusion

My use of the Parallel Computing Toolbox has been limited to certain, very specific tasks, and I have not used Jacket at all. The use of ubiquitous multi-core computers and widely-available GPUs avoids most of the problems I described regarding special-purpose hardware. It will be very interesting to see how these technologies fit into the technological landscape over the next few years, and I am eager to learn of readers' experiences with them.