Friday, March 23, 2007

Two Bits of Code

Given some private requests for code, I figured that I would share two routines which I've mentioned in this log.

In the Nov-17-2006 entry, Mahalanobis Distance, I mentioned that I had implemented my own Mahalanobis distance routine in MATLAB. That routine is now available at:

MahalanobisDistance.m
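
For readers who do not have MATLAB handy, here is a minimal sketch of the calculation in plain Python. It is not a translation of MahalanobisDistance.m: the function names are made up for illustration, and it assumes the sample covariance (dividing by n - 1) and returns the distance itself rather than its square.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1; an assumption of this sketch)."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_distance(point, rows):
    """Distance from point to the mean of rows, scaled by their covariance."""
    mu = mean_vector(rows)
    cov = covariance_matrix(rows)
    diff = [p - m for p, m in zip(point, mu)]
    z = solve(cov, diff)  # z = inverse(cov) * diff, without forming the inverse
    return sum(d * zi for d, zi in zip(diff, z)) ** 0.5


reference = [[0, 0], [2, 0], [0, 2], [2, 2]]  # made-up reference data
print(mahalanobis_distance([3, 1], reference))
```

Solving the linear system instead of inverting the covariance matrix is a design choice of this sketch; it does the same job with less arithmetic and fails loudly when the covariance matrix is singular.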


In the Jan-26-2007 posting, Pixel Classification Project, one of the texture features which proved useful was the "edge detector". This routine is now available here:

DiffEdge.m

DiffEdge calculates a summary of the differences in brightness between opposing pixels on a square of user-specified size surrounding the pixel of interest. This operator was described in the article "Image Processing, Part 6: Advanced Edge Detection", by Dwayne Phillips, which appeared in the Jan-1992 issue of "C/C++ Users Journal".
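
To make the idea concrete, here is a rough sketch in plain Python. It is not DiffEdge.m: the function name and the radius parameter are hypothetical, and it assumes the "summary" is the largest absolute difference among the opposing pairs, which may not be the summary the MATLAB routine uses.

```python
def diff_edge(image, row, col, radius=1):
    """Largest absolute brightness difference between opposing pixels
    on the perimeter of the square centered at (row, col).

    image is a list of rows of brightness values; radius is half the
    side of the square (radius=1 gives a 3x3 neighborhood).
    """
    height, width = len(image), len(image[0])
    if not (radius <= row < height - radius and radius <= col < width - radius):
        raise ValueError("square does not fit inside the image")
    best = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            on_perimeter = max(abs(dr), abs(dc)) == radius
            # Each pixel at offset (dr, dc) opposes the one at (-dr, -dc);
            # keeping only offsets below (0, 0) visits every pair once.
            first_of_pair = (dr, dc) < (0, 0)
            if on_perimeter and first_of_pair:
                a = image[row + dr][col + dc]
                b = image[row - dr][col - dc]
                best = max(best, abs(a - b))
    return best


# Made-up 3x3 image with a horizontal edge between the middle and bottom rows.
image = [[10, 10, 10],
         [10, 10, 10],
         [90, 90, 90]]
print(diff_edge(image, 1, 1))
```

Note that the center pixel itself never enters the calculation; only pairs of pixels on opposite sides of it are compared.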

Saturday, March 03, 2007

MATLAB 2007a Released

The latest version of MATLAB, 2007a, has been released. While some changes to base MATLAB are of interest to data miners (multi-threading, in particular), owners of the Statistics Toolbox receive a number of new features in this major upgrade.

First, the Statistics Toolbox makes new data structures available for categorical data (categorical arrays) and mixed-type data (dataset arrays). Most MATLAB users performing statistical analysis or data mining store their data in numerical matrices (my preference) or cell arrays. Using ordinary matrices requires the programmer/analyst to manage things like variable names separately. Cell arrays deal with the variable name issue, but preclude some of the nice things about using MATLAB matrices. Hopefully these new structures will make statistical analysis in MATLAB more natural.

Second, the Statistics Toolbox updates the classify function, permitting it to output the discovered discriminant coefficients (at last!). I have been complaining about this omission for a long time. Why? Because classify provides quadratic discriminant analysis (QDA), an important non-linear modeling algorithm. Without the coefficients, though, it is impossible to deliver models to other (admittedly inferior) platforms.
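
As an illustration of what delivering a model elsewhere might look like, here is a small sketch in plain Python that scores an observation with exported quadratic coefficients. It assumes the boundary between two groups is expressed as a constant K, a linear vector L, and a quadratic matrix Q, with the first group chosen when K + L'x + x'Qx is positive. The function names and the coefficient values are made up, not output from classify.

```python
def quadratic_score(x, const, linear, quadratic):
    """Evaluate K + L'x + x'Qx for one observation x."""
    n = len(x)
    lin = sum(l * xi for l, xi in zip(linear, x))
    quad = sum(x[i] * quadratic[i][j] * x[j]
               for i in range(n) for j in range(n))
    return const + lin + quad


def classify_pair(x, const, linear, quadratic, group_i="A", group_j="B"):
    """Pick group_i when the boundary function is positive, else group_j."""
    score = quadratic_score(x, const, linear, quadratic)
    return group_i if score > 0 else group_j


# Hypothetical coefficients: the boundary is the unit circle, and
# group "A" lies outside it.
K = -1.0
L = [0.0, 0.0]
Q = [[1.0, 0.0],
     [0.0, 1.0]]

print(classify_pair([2.0, 0.0], K, L, Q))
print(classify_pair([0.5, 0.0], K, L, Q))
```

With more than two groups, one such comparison would be needed for each pair of groups, which is why having the coefficients exposed matters.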

Also of note: the Genetic Algorithm and Direct Search Toolbox now includes simulated annealing.

More information on 2007a is available at The MathWorks.