Overview
Linear discriminant analysis (LDA) is one of the oldest mechanical classification systems, dating back to statistical pioneer Ronald Fisher, whose original 1936 paper on the subject, The Use of Multiple Measurements in Taxonomic Problems, can be found online.
The basic idea of LDA is simple: for each class to be identified, calculate a (different) linear function of the attributes. The class function yielding the highest score represents the predicted class.
There are many linear classification models, and they differ largely in how the coefficients are established. One nice quality of LDA is that, unlike some of the alternatives, it does not require multiple passes over the data for optimization. Also, it naturally handles problems with more than two classes and it can provide probability estimates for each of the candidate classes.
Some analysts attempt to interpret the signs and magnitudes of the coefficients of the linear scores, but this can be tricky, especially when the number of classes is greater than 2.
LDA bears some resemblance to principal components analysis (PCA), in that a number of linear functions are produced (using all raw variables), which are intended, in some sense, to provide data reduction through rearrangement of information. (See the Feb-26-2010 posting to this log, Principal Components Analysis.) Note, though, some important differences: First, the objective of LDA is to maximize class discrimination, whereas the objective of PCA is to squeeze variance into as few components as possible. Second, LDA produces exactly as many linear functions as there are classes, whereas PCA produces as many linear functions as there are original variables. Last, principal components are always orthogonal to each other ("uncorrelated"), while that is not generally true for LDA's linear scores.
An Implementation
I have made a routine, aptly named LDA, available on MATLAB Central; it performs all the necessary calculations. I'd like to thank Deniz Seviş, whose prompting got me to finally write this code (with her), and whose collaboration is very much appreciated.
Note that the LDA function assumes that the data it is being fed is complete (no missing values) and performs no attribute selection. Also, it requires only base MATLAB (no toolboxes needed).
Use of LDA is straightforward: the programmer supplies the input and target variables and, optionally, prior probabilities. The function returns the fitted linear discriminant coefficients. help LDA provides a good example:
% Generate example data: 2 groups, of 10 and 15, respectively
X = [randn(10,2); randn(15,2) + 1.5]; Y = [zeros(10,1); ones(15,1)];
% Calculate linear discriminant coefficients
W = LDA(X,Y);
This example randomly generates an artificial data set of two classes (labeled 0 and 1) and two input variables. The LDA function fits linear discriminants to the data, and stores the result in W. So, what is in W? Let's take a look:
>> W
W =
-1.1997 0.2182 0.6110
-2.0697 0.4660 1.4718
The first row contains the coefficients for the linear score associated with the first class (this routine orders the linear functions the same way as unique()). In this model, -1.1997 is the constant and 0.2182 and 0.6110 are the coefficients for the input variables for the first class (class 0). Coefficients for the second class's linear function are in the second row. Calculating the linear scores is easy:
% Calculate linear scores for training data
L = [ones(25,1) X] * W';
Each column represents the output of the linear score for one class. In this case, the first column is class 0, and the second column is class 1. For any given observation, the higher the linear score, the more likely that class. Note that LDA's linear scores are not probabilities, and may even assume negative values. Here are the values from my run:
>> L
L =
-1.9072 -3.8060
1.0547 3.2517
-1.2493 -2.0547
-1.0502 -1.7608
-0.6935 -0.8692
-1.6103 -2.9808
-1.3702 -2.4545
-0.2148 0.2825
0.4419 1.6717
0.2704 1.3067
1.0694 3.2670
-0.0207 0.7529
-0.2608 0.0601
1.2369 3.6135
-0.8951 -1.4542
0.2073 1.1687
0.0551 0.8204
0.1729 1.1654
0.2993 1.4344
-0.6562 -0.8028
0.2195 1.2068
-0.3070 0.0598
0.1944 1.2628
0.5354 2.0689
0.0795 1.0976
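Though not part of the original example, turning these scores into hard class predictions is a one-liner: take, for each row, the column with the larger score (here, column 1 corresponds to class 0 and column 2 to class 1):
% Predicted class for each observation: the highest-scoring column
[~, col] = max(L, [], 2);   % index of the winning column, per row
Yhat = col - 1;             % map column 1 -> class 0, column 2 -> class 1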
To obtain estimated probabilities, simply run the linear scores through the softmax transform (exponentiate everything, then normalize so that each row sums to 1.0):
% Calculate class probabilities
P = exp(L) ./ repmat(sum(exp(L),2),[1 2]);
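One caution not addressed in the code above: if the linear scores grow large in magnitude, exp can overflow. A standard safeguard is to subtract each row's maximum score before exponentiating, which leaves the resulting probabilities unchanged:
% Numerically safer softmax: shift each row so its maximum is zero
Ls = L - repmat(max(L,[],2),[1 2]);
P = exp(Ls) ./ repmat(sum(exp(Ls),2),[1 2]);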
As we see, most of the first 10 cases exhibit higher probabilities for class 0 (the first column) than for class 1 (the second column) and the reverse is true for the last 15 cases:
>> P
P =
0.8697 0.1303
0.1000 0.9000
0.6911 0.3089
0.6705 0.3295
0.5438 0.4562
0.7975 0.2025
0.7473 0.2527
0.3782 0.6218
0.2262 0.7738
0.2619 0.7381
0.1000 0.9000
0.3157 0.6843
0.4205 0.5795
0.0850 0.9150
0.6363 0.3637
0.2766 0.7234
0.3175 0.6825
0.2704 0.7296
0.2432 0.7568
0.5366 0.4634
0.2714 0.7286
0.4093 0.5907
0.2557 0.7443
0.1775 0.8225
0.2654 0.7346
This model is not perfect, and would really need to be tested more rigorously (via holdout testing, k-fold cross-validation, etc.) to determine how well it generalizes to new data.
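As a sketch of what such testing might look like (purely illustrative, reusing the X and Y generated earlier): fit on a random 70% of the cases and measure accuracy on the remaining 30%:
% Simple holdout sketch: train on 70% of cases, test on the rest
n   = size(X,1);
idx = randperm(n);                    % shuffle the case indices
tr  = idx(1:round(0.7*n));            % training cases
te  = idx(round(0.7*n)+1:end);        % holdout cases
Wh  = LDA(X(tr,:), Y(tr));            % fit on the training portion only
Lh  = [ones(numel(te),1) X(te,:)] * Wh';
[~, col] = max(Lh, [], 2);            % winning column per holdout case
acc = mean((col - 1) == Y(te))        % fraction correct on held-out cases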
I will not demonstrate its use here, but the LDA routine offers a facility for modifying the prior probabilities. Briefly, the function assumes that the true distribution of classes is whatever it observes in the training data. Analysts, however, may wish to adjust this distribution for several reasons, and the third, optional, parameter allows this. Note that the LDA routine presented here always performs the adjustment for prior probabilities; some statistical software drops that adjustment altogether when the user specifies equally likely classes, and so will produce results different from LDA's.
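To give the flavor (an assumed call form; consult help LDA for the exact shape of the third argument), supplying equal priors for the two classes might look like:
% Hypothetical call: assert equal prior probabilities for both classes
% (the exact form of the priors argument is documented in help LDA)
W2 = LDA(X, Y, [0.5 0.5]);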
Closing Thoughts
Though it employs a fairly simple model structure, LDA has held up reasonably well, sometimes still besting more complex algorithms. When its assumptions are met, the literature records it performing better than logistic regression. It is very fast to execute, and fitted models are extremely portable: even a spreadsheet will support linear models (or, one supposes, paper and pencil!). LDA is at least worth trying at the beginning of a project, if for no other reason than to establish a lower bound on acceptable performance.
See Also
Feb-16-2010 posting, Single Neuron Training: The Delta Rule
Mar-15-2009 posting, Logistic Regression
Tuesday, February 16, 2010
Single Neuron Training: The Delta Rule
I have recently put together a routine, DeltaRule, to train a single artificial neuron using the delta rule. DeltaRule can be found at MATLAB Central.
This posting will not go into much detail, but this type of model is something like a logistic regression, where a linear model is calculated on the input variables, then passed through a squashing function (in this case the logistic curve). Such models are most often used to model binary outcomes, hence the dependent variable is normally composed of the values 0 and 1.
Single neurons with linear functions (with squashing functions or not) are only capable of separating classes that may be divided by a line (plane, hyperplane), yet they are often useful, either by themselves or in building more complex models.
Use help DeltaRule for syntax and a simple example of its use.
Anyway, I thought readers might find this routine useful. It trains quickly and the code is straightforward (I think), making modification easy. Please write to let me know if you do anything interesting with it.
If you are already familiar with simple neural models like this one, here are the technical details:
Learning rule: incremental delta rule
Learning rate: constant
Transfer function: logistic
Exemplar presentation order: random, by training epoch
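To make those details concrete, here is a minimal sketch of incremental delta-rule training with a logistic transfer function (illustrative only: the real DeltaRule routine's interface and options may differ):
% Minimal incremental delta-rule sketch (not the actual DeltaRule code)
X  = [randn(10,2); randn(15,2) + 1.5];    % example inputs, 2 classes
T  = [zeros(10,1); ones(15,1)];           % 0/1 targets
Xb = [ones(size(X,1),1) X];               % prepend a bias column
w  = zeros(size(Xb,2),1);                 % initial weights
lr = 0.1;                                 % constant learning rate
for epoch = 1:500
    for i = randperm(size(Xb,1))          % random order, each epoch
        y = 1 / (1 + exp(-Xb(i,:) * w));  % logistic transfer
        w = w + lr * (T(i) - y) * y * (1 - y) * Xb(i,:)';  % delta rule
    end
end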
See also the Mar-15-2009 posting, Logistic Regression and the Dec-11-2010 posting, Linear Discriminant Analysis (LDA).
Saturday, March 03, 2007
MATLAB 2007a Released
The latest version of MATLAB, 2007a, has been released. While some changes to base MATLAB are of interest to data miners (multi-threading, in particular), owners of the Statistics Toolbox receive a number of new features in this major upgrade.
First, the Statistics Toolbox makes new data structures available for categorical data (categorical arrays) and mixed-type data (dataset arrays). Most MATLAB users performing statistical analysis or data mining tend to store their data in numerical matrices (my preference) or cell arrays. Using ordinary matrices requires the programmer/analyst to manage things like variable names. Cell arrays deal with the variable name issue, but preclude some of the nice things about using MATLAB matrices. Hopefully these new structures make statistical analysis in MATLAB more natural.
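As a brief illustration of what these containers look like (per the Statistics Toolbox documentation of that era; see doc nominal and doc dataset for specifics):
% Sketch: a categorical array and a mixed-type dataset array
sex = nominal({'m';'f';'f';'m'});   % categorical (nominal) array
age = [31; 42; 27; 55];             % ordinary numeric data
ds = dataset(sex, age)              % dataset array; variable names are kept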
Second, the Statistics Toolbox updates the classify function, permitting it to output the discovered discriminant coefficients (at last!). I have been complaining about this for a long time. Why? Because classify provides quadratic discriminant analysis (QDA), an important non-linear modeling algorithm. Without the coefficients, though, it is impossible to deliver models to other (admittedly inferior to MATLAB) platforms.
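For the curious, the updated calling form looks roughly like the following (per the R2007a documentation, the fifth output is a structure array holding the coefficients for each pair of classes):
% Sketch: recovering discriminant coefficients from classify (R2007a+)
load fisheriris                     % example data shipped with the toolbox
[cls, err, post, logp, coeff] = classify(meas, meas, species, 'quadratic');
k = coeff(1,2).const;               % constant term, class 1 vs. class 2
lin = coeff(1,2).linear;            % linear coefficient vector
quad = coeff(1,2).quadratic;        % quadratic coefficient matrix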
Also of note: the Genetic Algorithm and Direct Search Toolbox now includes simulated annealing.
More information on 2007a is available at The Mathworks.