**Overview**

Linear discriminant analysis (LDA) is one of the oldest mechanical classification systems, dating back to statistical pioneer Ronald Fisher, whose original 1936 paper on the subject, *The Use of Multiple Measurements in Taxonomic Problems*, can be found online.

The basic idea of LDA is simple: for each class to be identified, calculate a (different) linear function of the attributes. The class function yielding the highest score represents the predicted class.
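In symbols, and assuming the usual textbook formulation (a pooled covariance matrix $\Sigma$, class means $\mu_k$ and prior probabilities $\pi_k$; this notation is mine and is not taken from Fisher's paper or from the routine discussed later), the score for class $k$ and the prediction rule can be sketched as:

$$
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k \;-\; \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k \;+\; \ln \pi_k,
\qquad
\hat{y}(x) = \arg\max_k \, \delta_k(x).
$$

Only the first term depends on $x$, so $\Sigma^{-1}\mu_k$ supplies the coefficients and the remaining two terms form the constant. The factor of $\tfrac{1}{2}$ comes from expanding the Gaussian log-density $-\tfrac{1}{2}(x-\mu_k)^{\top}\Sigma^{-1}(x-\mu_k)$ and dropping the $x^{\top}\Sigma^{-1}x$ term, which is the same for every class.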

There are many linear classification models, and they differ largely in how the coefficients are established. One nice quality of LDA is that, unlike some of the alternatives, it does not require multiple passes over the data for optimization. Also, it naturally handles problems with more than two classes and it can provide probability estimates for each of the candidate classes.

Some analysts attempt to interpret the signs and magnitudes of the coefficients of the linear scores, but this can be tricky, especially when the number of classes is greater than 2.

LDA bears some resemblance to principal components analysis (PCA), in that a number of linear functions are produced (using all raw variables), which are intended, in some sense, to provide data reduction through rearrangement of information. (See the Feb-26-2010 posting to this log, Principal Components Analysis.) Note, though, some important differences: First, the objective of LDA is to maximize class discrimination, whereas the objective of PCA is to squeeze variance into as few components as possible. Second, LDA produces exactly as many linear functions as there are classes, whereas PCA produces as many linear functions as there are original variables. Last, principal components are always orthogonal to each other ("uncorrelated"), while that is not generally true for LDA's linear scores.

**An Implementation**

I have made available on MATLAB Central a routine, aptly named *LDA*, which performs all the necessary calculations. I'd like to thank Deniz Seviş, whose prompting got me to finally write this code (with her) and whose collaboration is very much appreciated.

Note that the *LDA* function assumes that the data it is being fed is complete (no missing values) and performs no attribute selection. Also, it requires only base MATLAB (no toolboxes needed).

Use of *LDA* is straightforward: the programmer supplies the input and target variables and, optionally, prior probabilities. The function returns the fitted linear discriminant coefficients.

*help LDA* provides a good example:

% Generate example data: 2 groups, of 10 and 15, respectively

X = [randn(10,2); randn(15,2) + 1.5]; Y = [zeros(10,1); ones(15,1)];

% Calculate linear discriminant coefficients

W = LDA(X,Y);

This example randomly generates an artificial data set of two classes (labeled 0 and 1) and two input variables. The LDA function fits linear discriminants to the data, and stores the result in *W*. So, what is in *W*? Let's take a look:

>> W

W =

-1.1997 0.2182 0.6110

-2.0697 0.4660 1.4718

The first row contains the coefficients for the linear score associated with the first class (this routine orders the linear functions the same way as *unique()*). In this model, -1.1997 is the constant and 0.2182 and 0.6110 are the coefficients for the input variables for the first class (class 0). Coefficients for the second class's linear function are in the second row. Calculating the linear scores is easy:

% Calculate linear scores for training data

L = [ones(25,1) X] * W';
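The same expression can score observations that were not in the training set, provided the column of ones is sized to match the data rather than hard-coded as 25. Here is a small sketch using the coefficients printed above; `Xnew` and `Lnew` are names I have made up, and the two observations are invented purely for illustration:

```matlab
% Coefficients as printed above: one row per class, constant in column 1
W = [-1.1997 0.2182 0.6110;
     -2.0697 0.4660 1.4718];

% Two made-up observations (hypothetical values, for illustration only)
Xnew = [0.0 0.0;
        1.5 1.5];

% Prepend a column of ones for the constant, sized to match the data
Lnew = [ones(size(Xnew,1),1) Xnew] * W';
```

For an observation at the origin, the scores are simply the two constants, which makes for an easy sanity check.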

Each column represents the output of the linear score for one class. In this case, the first column is class 0, and the second column is class 1. For any given observation, the higher the linear score, the more likely that class. Note that LDA's linear scores are not probabilities, and may even assume negative values. Here are the values from my run:

>> L

L =

-1.9072 -3.8060

1.0547 3.2517

-1.2493 -2.0547

-1.0502 -1.7608

-0.6935 -0.8692

-1.6103 -2.9808

-1.3702 -2.4545

-0.2148 0.2825

0.4419 1.6717

0.2704 1.3067

1.0694 3.2670

-0.0207 0.7529

-0.2608 0.0601

1.2369 3.6135

-0.8951 -1.4542

0.2073 1.1687

0.0551 0.8204

0.1729 1.1654

0.2993 1.4344

-0.6562 -0.8028

0.2195 1.2068

-0.3070 0.0598

0.1944 1.2628

0.5354 2.0689

0.0795 1.0976
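Turning scores like these into a predicted class is just a matter of finding the column with the highest score in each row. A minimal sketch, using the first five rows of the output above and assuming the columns follow the order of `unique(Y)` as described earlier (`ClassLabels`, `Winner` and `Predicted` are my own names):

```matlab
% First five rows of the linear scores printed above (columns: class 0, class 1)
L = [-1.9072 -3.8060;
      1.0547  3.2517;
     -1.2493 -2.0547;
     -1.0502 -1.7608;
     -0.6935 -0.8692];

ClassLabels = [0; 1];             % same order as unique(Y) in the example

[~, Winner] = max(L, [], 2);      % column index of the highest score in each row
Predicted = ClassLabels(Winner);  % map the column index back to a class label
```

With these five rows, every case except the second is assigned to class 0.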

To obtain estimated probabilities, simply run the linear scores through the softmax transform (exponentiate everything, and normalize so that they sum to 1.0):

% Calculate class probabilities

P = exp(L) ./ repmat(sum(exp(L),2),[1 2]);

As we see, most of the first 10 cases exhibit higher probabilities for class 0 (the first column) than for class 1 (the second column) and the reverse is true for the last 15 cases:

>> P

P =

0.8697 0.1303

0.1000 0.9000

0.6911 0.3089

0.6705 0.3295

0.5438 0.4562

0.7975 0.2025

0.7473 0.2527

0.3782 0.6218

0.2262 0.7738

0.2619 0.7381

0.1000 0.9000

0.3157 0.6843

0.4205 0.5795

0.0850 0.9150

0.6363 0.3637

0.2766 0.7234

0.3175 0.6825

0.2704 0.7296

0.2432 0.7568

0.5366 0.4634

0.2714 0.7286

0.4093 0.5907

0.2557 0.7443

0.1775 0.8225

0.2654 0.7346
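The softmax line above hard-codes the number of classes in the call to `repmat()`. A variant that works for any number of classes, and that subtracts each row's maximum before exponentiating to guard against overflow when scores are large, might look like the following. This is only a sketch with my own variable names, shown on the first two rows of scores from the run above:

```matlab
% First two rows of the linear scores from the run above
L = [-1.9072 -3.8060;
      1.0547  3.2517];

% Subtract each row's maximum; this leaves the softmax result unchanged
Shifted = bsxfun(@minus, L, max(L, [], 2));

% Exponentiate and normalize each row so that it sums to 1.0
E = exp(Shifted);
P = bsxfun(@rdivide, E, sum(E, 2));
```

Up to rounding, the two rows of `P` match the first two rows of the probabilities printed above.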

This model is not perfect, and would really need to be tested more rigorously (via holdout testing, k-fold cross validation, etc.) to determine how well it approximates the data.
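Even before any holdout testing, a quick check against the training data illustrates the point. The sketch below retypes the class 0 column of the probabilities printed above, assigns each case to the more probable class, and tallies the results (the variable names are mine):

```matlab
% Class 0 probabilities, copied from the first column of P above
P0 = [0.8697 0.1000 0.6911 0.6705 0.5438 0.7975 0.7473 0.3782 0.2262 0.2619 ...
      0.1000 0.3157 0.4205 0.0850 0.6363 0.2766 0.3175 0.2704 0.2432 0.5366 ...
      0.2714 0.4093 0.2557 0.1775 0.2654]';

% Actual classes: 10 cases of class 0 followed by 15 cases of class 1
Y = [zeros(10,1); ones(15,1)];

% Predict class 1 whenever class 0 is the less probable of the two
Predicted = double(P0 < 0.5);

% Fraction of training cases classified correctly
Accuracy = mean(Predicted == Y);

% 2-by-2 table of counts: rows are actual class, columns are predicted class
ConfusionCounts = accumarray([Y Predicted] + 1, 1, [2 2]);
```

On my run this comes to 19 of 25 cases correct (6 of 10 for class 0, 13 of 15 for class 1). Because these are the same cases used to fit the model, that figure is likely optimistic, which is exactly why holdout testing or cross validation is needed.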

I will not demonstrate its use here, but the *LDA* routine offers a facility for modifying the prior probabilities. Briefly, the function assumes that the true distribution of classes is whatever it observes in the training data. Analysts, however, may wish to adjust this distribution for several reasons, and the third, optional, parameter allows this. Note that the LDA routine presented here always performs the adjustment for prior probabilities: Some statistical software drops the adjustment for prior probabilities altogether if the user specifies that classes are equally likely, and will produce different results than *LDA*.
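To give a feel for what a change of priors does, here is a rough sketch. It assumes the textbook form of LDA, in which the log of each class's prior probability is added to that class's constant; I have not shown that the routine encodes priors this way, so treat this as an illustration of the idea, not as a description of its internals. Under that assumption, moving from the observed class frequencies to some other set of priors only shifts the constants:

```matlab
% Coefficients as printed earlier: one row per class, constant in column 1
W = [-1.1997 0.2182 0.6110;
     -2.0697 0.4660 1.4718];

OldPrior = [10; 15] / 25;   % class frequencies in the example's training data
NewPrior = [0.5; 0.5];      % hypothetical: treat both classes as equally likely

% ASSUMPTION: the constant contains log(prior), so only column 1 changes
Wnew = W;
Wnew(:,1) = W(:,1) + log(NewPrior ./ OldPrior);
```

The coefficients on the input variables are untouched; raising a class's prior simply raises its score by the same amount for every observation.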

**Closing Thoughts**

Though it employs a fairly simple model structure, LDA has held up reasonably well, sometimes still besting more complex algorithms. When its assumptions are met, the literature records it doing better than logistic regression. It is very fast to execute, and fitted models are extremely portable: even a spreadsheet will support linear models (or, one supposes, paper and pencil!). LDA is at least worth trying at the beginning of a project, if for no other reason than to establish a lower bound on acceptable performance.

**See Also**

Feb-16-2010 posting, Single Neuron Training: The Delta Rule

Mar-15-2009 posting, Logistic Regression

## 24 comments:

Dear Will

Great to have you back on the blog after so long.

hi

tnx for your post

do u have any MatLab code or another programming languages about discriminant function analysis?

I would be appreciated you if you help me about the codes...my email is:mrebrahimi709@gmail.com

best regard for you

Jaidev Deshpande:

Thanks!

mohammad reza:

I have heard "discriminant function analysis" (also the term "multiple discriminant analysis") used to refer to linear discriminant analysis, so my understanding is that they are the same thing.

hi again

do know have any information about multigroup discriminant function?

best regards

Mani

mohammad reza:

I assume that you mean classification when there are more than 2 groups? Linear discriminants can do this. If you want to try this with my LDA() function, just use a target variable with more than 2 distinct values.

hi

thanks for your answer

Hello Mr Will Dwinnell

I was looking a matlab code about LDA and i found your code. It is nice, however i need to use 5 different data in it. So, If i will write the input and target as a following form, is it right or not:

X = [c1 ; c2; c3; c4; c5];

Y = [zeros(689,1); ones(689,1); 2*ones(309,1); 3*ones(692,1); 4*ones(689,1)];

It is urgent, please reply to me ASAP.

Thanks beforehand.

Hi Mr Will,

I found your post is very informative. Thanks!

However, I would like to ask any idea to use principal component prior to disciminant analysis? Im a beginner in Matlab, and Im facing difficulties in writing the script. Thank you! My email: megumi.wai@gmail.com

Thank you in advance!

Can you help me out in classification, I have 123 subjects and have a 16384 dimension column vector for each, I have been able to use PCA for projecting data in Eigen space and doing the recognition.But I am confused with how to do it with LDA, as in things appear to be hazy.

shreyas,

That is very many candidate input variables for so few observations. I expect that you'll need to either select a much smaller number of inputs from this list or reduce them some other way (as by PCA).

It's probably a stupid question, but is it possible to do dimension reduction with this code, and if it is, I would like to know how that would work.

Thank you.

with6:

In a sense, yes. Assuming that the number of classes is less than the number of predictor variables, then the set of discriminant functions is a reduced set of data.

Of course, as with PCA, all of the original predictor variables will still be needed, but any downstream analysis will have less variables to deal with.

Thx for your code.

How can I check the probabilities if my data has more than 2 classes.

Assuming that the target variable used in the example above, Y, had three distinct values, representing three classes to be categorized, then calculation of the linear scores would be exactly the same:

% Calculate linear scores for training data

L = [ones(25,1) X] * W';

...and calculation of the probability estimates would be nearly the same, with the last element of the second parameter to repmat() being the number of classes, which is 3 in this case:

% Calculate class probabilities

P = exp(L) ./ repmat(sum(exp(L),2),[1 3]);

Thank for your answer.

hello , I am trying to run your code but Matlab is not excepting LDA as a command :o

X = [randn(10,2); randn(15,2) + 1.5];

>> Y = [zeros(10,1); ones(15,1)];

>> W = LDA(X,Y)

??? Undefined function or method 'LDA' for input arguments of type 'double'.

Nauman, The error message you received indicates that your installation of MATLAB is not aware of any LDA function. MATLAB only knows about the functions which are built into itself, and any which you add to it. You need to download the LDA function from MATLAB Central (check near the top of this posting).

Hello,

I am wondering if you can help me, I have two variables one I know what it is i.e. the target and another I don't know, i.e. the input. Both variables are matrices (1281x1) however when I run it, W = NaN. what could be causing the problem?

can we get back the input variables after dimensionality reduction by LDA?

We can in PCA...

Could u please tell me...

do u have any matlab code for image quality identifier with using lda..?and explain about that plz..my mail:pavigiri1993@gmail.com

hi

i recently read a paper that uses LDA as a function to reduce dimentionality .in this article author said that :

"In order to reduce the dimensionality of the iriscode and remove the redundancy present in the code, LDA is applied to the

iriscode features. Only the top 80 LDA coefficients are retained and these ..."

i want to use your code as LDA, but i do'nt know how can i retain top 80 coefficient? i'm so confused .if you help me ,i will appreciate you .

Dear Will

Thank you so much for this very informative blog.

I have a question regarding prior probabilities:

My students and I are using LDA to classify satellite data into 2 categories - agriculture and non-agriculture using 9 features as inputs derived form satellite data. Our goal is to sample the afro-ecological zones of the Earth with randomly located satellite footprints (each about 150 miles wide) and extract the training data fro LDA within each footprint and get the discriminate coefficients and then apply those copefficients to other footprints without having to re-train. This is called spatial generalization. It turns out that it is not an easy problem but to help the issue, we also have another source of data for each footprint that shows probability of agriculture for every pixel. We have two ways of using these priors data: 1) calculate probability of agriculture from LDA for each pixel and merge that with the priors using the Bayes' rule; 2) modify the LDA prediction while using the priors directly in the LDA analysis. Do you have any suggestions about this? Would one work better than the other? Also if we want to use the small training data for predicting a large number of unseen cases in the satellite data, how do we bring the priors data into the LDA process?

Thank you again.

Mutlu..

I have obtained 2 discriminant functions for a three group model. There are 3 grouping variables, but all three variables are loading on first function and none loads on second. What should be done? How should I interpret it?

First function is explaining 90.1% variance and second function 9.9%

Dear Will.

Hello, i'm Jin.

I'm using your LDA function well.

But I wonder about why you multiply '-0.5' about (Temp*GroupMean).

I don't understand that....

i know that factor(-0.5 * Temp*GroupMean) using like a Threshold...

but WHY -0.5 ??

Can you give some intuition or explanation about that?
