Fitting lines to data is a fundamental part of data mining and inferential statistics. Many more complicated schemes use line-fitting as a foundation, and least-squares linear regression has long been the workhorse technique of the field. Least-squares linear regression fits a line (or plane, hyperplane, etc.) with the minimum possible squared error. I explained how to perform least-squares linear regression in MATLAB in my Apr-21-2007 posting, Linear Regression in MATLAB.
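As a quick refresher, the core of that fit is a one-liner: MATLAB's backslash operator returns the coefficients which minimize the squared error. A minimal sketch, with illustrative variable names (X holding one independent variable per column, Y the dependent variable in a column vector, and a column of ones appended for the intercept):

% Least-squares fit with an intercept term (illustrative sketch)
n = size(X,1);
BetaLS = [ones(n,1) X] \ Y;    % first element of BetaLS is the intercept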
Why least squares?
Least-squares offers a number of esoteric technical strengths, but many students of statistics wonder: "Why least-squares?" The simplest (and most superficial) answer is: "Squaring the errors makes them all positive, so that errors with conflicting signs do not cancel each other out in sums or means". While this is true, squaring seems an odd way to accomplish this when simply taking the absolute values of the errors (ignoring the signs) is much more straightforward.
Taking the absolute values of the errors (instead of their squares) leads to an alternative regression procedure, known as least absolute errors regression or L-1 linear regression. Like least-squares linear regression, L-1 linear regression fits a line to the supplied data points. Taking the absolute values seems simpler, so why not use L-1 regression? For that matter, why is least-squares regression so popular, given the availability of a seemingly more natural alternative?
Despite the fact that L-1 regression was developed decades before least-squares regression, least-squares regression is much more widely used today. Though L-1 regression has a few quirks, they are not what is holding it back. The real reason that least squares is favored, the secret your stats professor never told you, is:
Least-squares makes the calculus behind the fitting process extremely easy!
That's it. Statisticians will give all manner of rationalizations, but the real reason least-squares regression is in vogue is that it is extremely easy to calculate.
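To see what "easy" means here: the sum of squared errors is a smooth, quadratic function of the coefficients, so setting its derivative to zero yields a closed-form answer, the familiar normal equations. A sketch in MATLAB terms (Xa denoting the predictor matrix with the column of ones for the intercept already appended):

% Normal equations: minimizing sum((Y - Xa*Beta).^2) and setting the
% gradient to zero gives Xa'*Xa*Beta = Xa'*Y, solved in one line:
BetaLS = (Xa' * Xa) \ (Xa' * Y);

The absolute-value criterion, by contrast, is not differentiable where an error equals zero, so no comparable closed-form solution exists and the fit must be found iteratively.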
L-1 Regression
There are several ways to perform L-1 regression, and all of them involve more computation than any of the least-squares procedures. Thankfully, we live in an age in which mechanical computation is plentiful and cheap! Also thankfully, I have written an L-1 regression routine in MATLAB, called L1LinearRegression.
L1LinearRegression assumes that an intercept term is to be included and takes two parameters: the independent variables (a matrix whose columns represent the independent variables) and the dependent variable (in a column vector).
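For the curious, here is a minimal sketch of one common way to compute such a fit, iteratively reweighted least squares. It is offered only as an illustration of the idea (the function name, iteration limit, and tolerances are mine), not as the implementation inside L1LinearRegression:

function Beta = L1RegressionSketch(X,Y)
% Illustrative sketch: L-1 regression via iteratively reweighted least
% squares; not necessarily how L1LinearRegression itself works.
% X: matrix of independent variables (one per column), Y: column vector
Xa   = [ones(size(X,1),1) X];    % prepend a column of ones for the intercept
Beta = Xa \ Y;                   % initialize with the least-squares fit
for Iteration = 1:100
    W       = 1 ./ max(abs(Y - Xa * Beta), 1e-8);   % weights ~ 1/|error|
    SqrtW   = sqrt(W);
    BetaOld = Beta;
    Beta    = (repmat(SqrtW,1,size(Xa,2)) .* Xa) \ (SqrtW .* Y);   % weighted least-squares step
    if norm(Beta - BetaOld) < 1e-6 * norm(BetaOld), break, end
end

The sketch starts from the least-squares solution and repeatedly downweights observations with large errors until the coefficients settle; a production routine would handle degenerate cases more carefully. The posted L1LinearRegression is invoked with the same two arguments described above, the matrix of independent variables and the dependent variable vector.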
L-1 regression is less affected by large errors than least squares regression. The following graph depicts this behavior:
This example intentionally demonstrates least-squares' slavish chasing of distant data points, but the effect is very real. The biggest drawback of L-1 regression is that it takes longer to run. Unless there are many such regressions to perform, execution time is a small matter, which gets smaller every year that computers get faster. L1LinearRegression runs in about 10 seconds for 100,000 observations with 10 predictors on fast PC hardware.
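To reproduce the flavor of that graph, a small experiment along these lines (all values are arbitrary, and L1RegressionSketch refers to the illustrative function above) shows the least-squares line being pulled toward a single wild observation while the L-1 line largely ignores it:

% One bad observation: compare least-squares and L-1 fits (illustrative)
x = (1:20)';
y = 2 + 0.5 * x + 0.1 * randn(20,1);
y(20) = 40;                          % a single grossly erroneous point
BetaLS = [ones(20,1) x] \ y;         % least-squares: slope chases the outlier
BetaL1 = L1RegressionSketch(x,y);    % L-1: slope stays near 0.5
disp([BetaLS BetaL1])                % columns: [intercept; slope] for each fit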
References
Alternative Methods of Regression, by Birkes and Dodge (ISBN-13: 978-0471568810)
Least absolute deviation estimation of linear econometric models: A literature review, by Dasgupta and Mishra (Jun-2004)
See also
L1LinearRegression Code Update (Mar-27-2009)