Thursday, December 07, 2006

Hardware for Data Mining in MATLAB

Appropriate hardware is important if data mining is to be performed on the desktop (MATLAB users on other platforms: you get the day off). Needs will vary, naturally, with the amount of data being handled and the type of analysis being performed. In my opinion, given what most organizations are willing to pay for data mining software, the incremental cost of going from a stock PC to a performance PC is comparatively small.

If, like me, one reads the data entirely into memory, then the two most important performance factors in selecting a PC will be the processor and the amount of RAM. Hard-drive size should generally not be an issue given the current, extremely low cost of storage. Hard-drive speed, however, may become important if the drive is being accessed frequently. I welcome input from readers, but I have never found MATLAB to be very demanding of the graphics hardware.

In my day job, I tend to work with data sets (individual MATLAB arrays which I load from tab-delimited files) that range (give or take) in size up to a few hundred thousand rows and as many as a few hundred columns, not necessarily at the same time. At home, I experiment data sets of with similar size.

Until recently, my work computer used an AMD Athlon64 FX-53 (2.4 GHz, no overclocking) and 2 GB RAM, which cost US$2,000 when purchased in 2004. My current work system features an Intel Core 2 Extreme X6800 (2.93 GHz) and 4GB RAM, which cost US$2,700 at purchase in Nov-2006. My home system sports an AMD Athlon64 3400+ with 1GB RAM and cost somewhere between, as best as I can recall, about US$1,500, at purchase in 2004. All of these systems run Windows XP (32-bit) and have met my performance needs.

Performance PC boutiques offer a number of advantages over large PC manufacturers. Component quality tends to be higher with performance boutiques (not all 120GB hard drives are the same). Service and attention to detail also tend to be better. Provision of all original OS, software and driver disks is typical. Individualized "owener's manuals", with all system specifications and settings are also typical, as are disk images of the system as shipped. I have had very good experiences buying from performance PC boutiques. Specifically, I have purchased machines from these two vendors, and been very satisfied:

Falcon Northwest
Velocity Micro

Some other performance PC boutiques which are popular, although I cannot vouch for them personally, are:

Hypersonic PC Systems
Voodoo PC

The performance PC boutiques started by catering to the gaming and science/engineering crowds. (If you're not aware, some games are among the most computationally demanding applications.) Over time, though, these vendors have diversified their product lines to address other interests. Data miners will consume the same amount of computational horsepower as gamers, but don't need the fancy graphics adapters, joysticks, speakers, etc. Some boutiques have machines labeled as "A/V" computers, specifically geared for multimedia audio/video work: These are ideal for data mining work. A good source of information on performance computers, including reviews of systems, is ExtremeTech.

Readers using 64-bit MATLAB (on Windows or Linux), please comment on your experiences!

No comments: