There are plenty of commercial data mining tools and statistics packages out there. Why choose MATLAB? In short: flexibility and control. Having reviewed a wide assortment of analytical software for PC AI magazine and used any number of others in my work, I've had the opportunity to sample many tools. Some commercial tools are very polished, providing all manner of "bells and whistles" for things like data import, data preprocessing, etc.
In my work, however, I always found something missing, even in the best software. Real-world data mining projects often involve technical constraints which are unanticipated by commercial software developers. Not that this should be surprising: real projects come from all different fields and some impose the most bizarre constraints, like special treatment of negative values, large numbers of missing values, strange performance functions, small data requiring special testing procedures and on and on.
MATLAB provides the flexibility to deal with these quirky issues, if the analyst is able to code a solution. As a programming language, MATLAB is very like other procedural languages such as Fortran or C (MATLAB does have object-oriented features, but I won't get into that here). If I need to use a special type of regression and want to use my own feature selection process within a k-fold cross-validation procedure that needs a special data sampling procedure, I can by programming it in MATLAB.
Stepping back, consider major tasks frequently undertaken in a data mining project: data acquisition, data preparation, modeling, model execution and reporting/graphing. MATLAB allows me to do all of these under one "roof". The one gap with MATLAB is that it is not very good at relational joins. Look-up tables (even large ones) for tacking on a single variable are fine, but MATLAB is not built to perform SQL-style joins.
Much of this would be possible in more conventional programming languages, such as C or Java, but MATLAB's native treatment of arrays as data types and provision of many analysis-oriented functions in the base product make it much more convenient, and ensure that my MATLAB code will run for any other MATLAB user, without the need for them to own the same code libraries as I do.
Statistics packages fall short in that most of them provide a collection of canned routines. However many routines and options they provide, there will eventually be something which is missing, or a new procedure which you will find difficult or impossible to implement. Some statistics packages include their own scripting languages, but most of these are weak in comparison to full-blown programming languages.
Data mining tools vary, but tend to be even more limited in the procedures they provide than the statistics packages. Some even have only a single algorithm! They tend to be even more polished and easy to use than the statistics packages, but are hence that much more confining.
Graphing capability in MATLAB is among the best in the business, and all MATLAB graphs are compeltely configurable through software. Cutting-and-pasting to get data out of a statistics package (some provide little or no graphing capability in the base product) and into Excel isn't so bad if there is only one graph to produce. Recently, I needed to generate graphs for 15 market segments. Doing that by hand would have been collosally wasteful. In MATLAB, I set up a single graph with the fonts, etc. the way I wanted them, and looped over the data, producing 15 graphs.
On an unrelated note, you can read more posts my myself at Data Mining and Predictive Analytics. Also, consider visiting my seriously out-dated Web page at will.dwinnell.com.