Nice explanation! Thanks!
Could you also give an explanation for the case when the dimensionality (d) is greater than the number of samples (s), and how to extract more PCs than s?

For example:
Given:
s = 100
d = 3000

Problem:
Say I want to reduce d to 1500.

Thanks in advance
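On the d > s question above: with s samples, the centered data matrix has rank at most s - 1, so standard PCA can return at most s - 1 components with nonzero variance. Getting 1500 informative components from 100 samples is therefore not possible this way. A minimal MATLAB sketch, assuming random placeholder data (the names X, coeff, score, latent, and Xc are illustrative, not from the article):

```matlab
% Placeholder data: s = 100 samples (rows), d = 3000 variables (columns).
rng(0);                       % fixed seed so the run is repeatable
s = 100;
d = 3000;
X = randn(s, d);

% pca centers the columns by default and, when d > s - 1, returns only
% the components that can carry variance (its default 'Economy' mode).
[coeff, score, latent] = pca(X);

disp(size(coeff));            % d-by-(s-1): one column per usable PC
disp(size(score));            % s-by-(s-1)

% The rank of the centered data confirms the limit of s - 1 directions.
Xc = bsxfun(@minus, X, mean(X, 1));
disp(rank(Xc));
```

Any components beyond the first s - 1 would have zero variance, so they carry no information about the data.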
Nice article.

After applying PCA, if I keep only the first component, throw out the other two, and then reproduce the original data, my reproduced data still has three variables. Can you please tell me how this procedure affects my data?

Or, if I want to predict some Y variable based on the given data, how can we use PCA for that?

Thank you.
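On the reconstruction question above: keeping one component and mapping back gives data that still has three columns, but it is a rank-one approximation in which every row lies along the first principal direction (shifted by the mean). A minimal sketch, assuming a hypothetical 3-variable matrix X of placeholder data:

```matlab
rng(1);
% Placeholder data with correlated columns (50 observations, 3 variables).
X = randn(50, 3) * [2 0 0; 1 1 0; 0.5 0.2 0.1];

% The 4th output of pca (tsquared) is skipped with ~.
[coeff, score, latent, ~, explained, mu] = pca(X);

k = 1;                                   % number of components kept
% Map the retained scores back to the original 3-variable space.
Xhat = bsxfun(@plus, score(:, 1:k) * coeff(:, 1:k)', mu);

% Xhat has the same size as X, but its centered version has rank k.
disp(size(Xhat));
disp(rank(bsxfun(@minus, Xhat, mu)));

% Fraction of total variance lost by dropping components 2 and 3.
lost = sum(latent(k+1:end)) / sum(latent);
fprintf('Variance discarded: %.1f%%\n', 100 * lost);
```

For predicting a Y variable, one common option is principal component regression: regress Y on the retained columns of score instead of on the original variables.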
Hi Will, 
Nice post explaining PCA. I wonder if you can help with my simple problem. I wish to do a GPR with input from a PCA of my data, and I learned that the right way to do the CV is to run PCA on the training set, then use the training coefficients to map the test set to its PCs. The following is my attempt in MATLAB:

% Calculate xscores for training set
% (the 4th output of pca is tsquared, so it is skipped with ~)
[coef, score, latent, ~, explained, mu] = pca(data_train(:,2:end));
xscore_train = score;
ytrain = zdata_train(:,1);

% Calculate xscores for test set = standardized newX * coef
xscore_test = zdata_test(:,2:end)*coef;
ytest = zdata_test(:,1);

Am I computing xscore_test correctly? Also, the scores returned are not standardized, correct?
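For the question above, one consistent way to do this (a sketch under stated assumptions, not necessarily what the article's author would recommend) is to standardize the test set with the training mean and standard deviation, run pca on the standardized training predictors, and project the test set after subtracting the mu that pca returns. The data below are random placeholders, and the names Xtrain, Xtest, m, sd, Ztrain, and Ztest are hypothetical:

```matlab
rng(2);
Xtrain = randn(80, 5);        % placeholder training predictors
Xtest  = randn(20, 5);        % placeholder test predictors

% Standardize both sets using training statistics only.
m  = mean(Xtrain, 1);
sd = std(Xtrain, 0, 1);
Ztrain = bsxfun(@rdivide, bsxfun(@minus, Xtrain, m), sd);
Ztest  = bsxfun(@rdivide, bsxfun(@minus, Xtest,  m), sd);

% Fit PCA on the training set. The 4th output (tsquared) is skipped.
[coef, score, latent, ~, explained, mu] = pca(Ztrain);
xscore_train = score;

% Project the test set with the training loadings. mu is close to zero
% here because Ztrain is already centered, but subtracting it keeps the
% code correct if pca is run on unstandardized data.
xscore_test = bsxfun(@minus, Ztest, mu) * coef;

% Sanity check: projecting the training set reproduces score.
err = max(max(abs(bsxfun(@minus, Ztrain, mu) * coef - score)));
disp(err);

% The score columns are centered and their variances equal latent,
% so they are not scaled to unit variance.
disp([var(score)', latent]);
```

If unit-variance scores are wanted, dividing each column of the scores by the square root of the matching entry of latent (from the training fit) is one option.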

Thanks a bunch :)
Does dividing the data by the standard deviation mess it up? Will it lower the energy of the high-variance dimensions and increase the energy of the low-variance dimensions?
Let's assume you have 3 features of an image instead of 3 measurements taken of a particular subject. I have 10 images, so my training dataset will be 10 x 3. If I use the MATLAB built-in function princomp and get COEFF, SCORE, and LATENT, which one should I use? SCORE also gives me 3 columns. Do I need to use the first column only? How do I use this number for better interpretation of my results? How do I give the input to the classifier?
B'B is 14 times cov(B) because the covariance estimator is 1/(n-1) * B'B when E(B) = 0, i.e. the series are zero mean. Since n = 15, n - 1 = 14. Mystery solved. Very interesting. Covariance estimator link: http://www.encyclopediaofmath.org/index.php/Covariance_matrix
If you want to get the original data from B, you do not need COEFF at all. COEFF*COEFF' = identity matrix, or 1. I think you should point out that V or COEFF are the eigenvectors of cov(B). It took me a while to figure out.
In this case cov(B) is the same as corr(B), as they are z-scores. However, B'*B is 14*cov(B). Why the 14? It is puzzling me. I was thinking that it would give the covariance matrix.
After applying PCA to the original and modified data sets, I want to analyse the new data sets for accuracy, information loss, and entropy. Are there any functions or specific code for this?
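The relationship between B'*B and cov(B) discussed in the comments above can be checked numerically. A minimal sketch, assuming B is a 15-by-3 matrix of z-scores built from placeholder data (the raw matrix A and the matrices C and R are hypothetical names):

```matlab
rng(3);
A = randn(15, 3);             % placeholder raw data, n = 15 observations
B = zscore(A);                % columns now have zero mean and unit variance
n = size(B, 1);

C = cov(B);                   % uses the 1/(n-1) normalization
R = corr(B);

% For z-scores, covariance and correlation coincide.
disp(max(max(abs(C - R))));

% Because the columns of B have zero mean, B'*B equals (n-1)*cov(B),
% which is 14*cov(B) when n = 15.
disp(max(max(abs(B' * B - (n - 1) * C))));

% The columns of COEFF are orthonormal eigenvectors of cov(B),
% with the entries of LATENT as the matching eigenvalues.
[COEFF, SCORE, LATENT] = pca(B);
disp(max(max(abs(C * COEFF - COEFF * diag(LATENT)))));
disp(max(max(abs(COEFF' * COEFF - eye(3)))));
```

All four displayed values should be at the level of floating-point round-off.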
Nice post; the examples make it all very clear. I recently wrote an article about what PCA actually means. This might be helpful for some, to get a more intuitive understanding: http://www.visiondummy.com/2014/05/feature-extraction-using-pca/
Hello, please tell me how I can use PCA for privacy preservation in data mining. Thank you.
Very good and clear; after reading this I can understand PCA.
How can PCA be used with unbalanced data?