Tuesday, October 23, 2007

L-1 Linear Regression

Fitting lines to data is a fundamental part of data mining and inferential statistics. Many more complicated schemes use line-fitting as a foundation, and least-squares linear regression has, for years, been the workhorse technique of the field. Least-squares linear regression fits a line (or plane, hyperplane, etc.) with the minimum possible squared error. I explained the execution of least-squares linear regression in MATLAB in my Apr-21-2007 posting, Linear Regression in MATLAB.


Why least squares?

Least-squares offers a number of esoteric technical strengths, but many students of statistics wonder: "Why least-squares?" The simplest (and most superficial) answer is: "Squaring the errors makes them all positive, so that errors with conflicting signs do not cancel each other out in sums or means". While this is true, squaring should seem an odd way to go about this when taking the absolute values of the errors (simply ignoring the signs) is much more straightforward.

Taking the absolute values of the errors (instead of their squares) leads to an alternative regression procedure, known as least absolute errors regression or L-1 linear regression. Like least-squares linear regression, L-1 linear regression fits a line to the supplied data points. Taking the absolute values seems simpler, so why not use L-1 regression? For that matter, why is least-squares regression so popular, given the availability of a seemingly more natural alternative?

Despite the fact that L-1 regression was developed decades before least-squares regression, least-squares regression is much more widely used today. Though L-1 regression has a few quirks, they are not what is holding it back. The real reason that least squares is favored (the secret your stats professor never told you) is:

Least-squares makes the calculus behind the fitting process extremely easy!

That's it. Statisticians will give all manner of rationalizations, but the real reason least-squares regression is in vogue is that it is extremely easy to calculate.
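
To see just how easy, consider that minimizing the sum of squared errors has a closed-form solution (the "normal equations"), which MATLAB dispatches in a couple of lines; no comparable one-line formula exists for the absolute-error criterion. The toy data and variable names below are mine, purely for illustration:

x = (1:20)';                      % toy predictor
y = 3 + 2 * x + randn(20,1);      % toy response: a noisy line
X = [ones(size(x,1),1) x];        % prepend a column of ones for the intercept
Beta = (X' * X) \ (X' * y);       % the normal equations, solved in one step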

L-1 Regression

There are several ways to perform L-1 regression, and all of them involve more computation than any of the least-squares procedures. Thankfully, we live in an age in which mechanical computation is plentiful and cheap! Also thankfully, I have written an L-1 regression routine in MATLAB, called L1LinearRegression.
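
One common approach, and (per the comments below) the one L1LinearRegression itself uses, is iteratively reweighted least squares: repeatedly solve a weighted least-squares problem whose weights shrink the influence of points with large current absolute residuals. The sketch below is only an illustration of that idea, with names, tolerances, and an iteration cap of my own choosing; it is not the actual L1LinearRegression code.

function Beta = L1FitSketch(X,Y)
% Minimal IRLS illustration of L-1 regression (not the real routine).
% X: matrix whose columns are the independent variables; Y: column vector.
X    = [ones(size(X,1),1) X];                    % include an intercept term
Beta = (X' * X) \ (X' * Y);                      % start at the least-squares fit
for Iteration = 1:100                            % iteration cap (arbitrary)
    W       = 1 ./ max(abs(Y - X * Beta),1e-6);  % weights ~ 1 / |residual|
    XW      = X .* repmat(W,1,size(X,2));        % weight each observation's row
    BetaOld = Beta;
    Beta    = (XW' * X) \ (XW' * Y);             % one weighted least-squares pass
    if max(abs(Beta - BetaOld)) < 1e-8, break, end   % stop once estimates settle
end
end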

L1LinearRegression assumes that an intercept term is to be included and takes two parameters: the independent variables (a matrix whose columns represent the independent variables) and the dependent variable (in a column vector).
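
Assuming that interface (and that the routine hands back the fitted coefficients, intercept first; check its help text for the exact output), a call on made-up data looks like this:

X    = randn(200,3);                            % 200 observations, 3 predictors (made up)
Y    = 2 + X * [1; -0.5; 3] + randn(200,1);     % noisy linear response
Beta = L1LinearRegression(X,Y);                 % assumed to return the fitted coefficients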

L-1 regression is less affected by large errors than least-squares regression. The following graph depicts this behavior:



This example intentionally demonstrates least-squares' slavish chasing of distant data points, but the effect is very real. The biggest drawback of L-1 regression is that it takes longer to run. Unless there are many such regressions to perform, execution time is a small matter, which gets smaller every year that computers get faster. L1LinearRegression runs in about 10 seconds for 100,000 observations with 10 predictors on fast PC hardware.
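
A tiny numerical version of the same experiment (synthetic data, and again assuming the routine returns its fitted coefficients) makes the contrast concrete:

x      = (1:20)';                                % synthetic predictor
y      = 3 + 2 * x + randn(20,1);                % true intercept 3, true slope 2
y(20)  = 500;                                    % one distant data point
X      = [ones(size(x)) x];
BetaLS = (X' * X) \ (X' * y);                    % least-squares slope gets dragged upward
BetaL1 = L1LinearRegression(x,y);                % L-1 slope stays near 2 (assumed output)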

References
Alternative Methods of Regression, by Birkes and Dodge (ISBN-13: 978-0471568810)

Least absolute deviation estimation of linear econometric models: A literature review, by Dasgupta and Mishra (Jun-2004)

See also
L1LinearRegression Code Update (Mar-27-2009)

9 comments:

Anonymous said...

It is indeed true that least-squares regression results in a particularly easy computation of the optimal weights of the hyperplane to be learned. However, one should not forget that the least-squares solution to regression is also the maximum likelihood solution!

Will Dwinnell said...

I believe (someone correct me if I'm wrong) that the least squares regression is only the maximum likelihood solution when the errors have a Gaussian distribution.

Regardless, the performance function should be defined by the needs of the analysis, and the least absolute error is a logical choice for many problems.

Anonymous said...

The L-1 linear regression solution is the maximum likelihood solution when the errors are Laplace (double exponential) distributed.

Will Dwinnell said...

Thanks, Finn Årup Nielsen!

Unknown said...

Hi All,

I was wondering how I can learn more about these results, i.e., that least squares is the maximum likelihood estimate when the errors are Gaussian and L-1 when they are Laplace... I am starting out in this field and I like to understand what I am doing, not just plug data into equations...

Thanks
E.A.

Jim Van Zandt said...

Your L1LinearRegression function finds the L1 solution using a sequence of weighted least squares problems. The more standard approach is to solve a linear programming problem. Could you compare the two methods? E.g. is your method guaranteed to converge? How do run times compare?

Jim Van Zandt said...

Okay, I've found several iterative algorithms, for example in the survey by Dasgupta et al.: http://mpra.ub.uni-muenchen.de/1781/1/MPRA_paper_1781.pdf. The 2004 paper by Li and Arce (http://downloads.hindawi.com/journals/asp/2004/948982.pdf) points out that the naive algorithm can get stuck in a local minimum and suggests improvements. Several methods are compared by Kuzmanovic et al.: http://journal.aplimat.com/volume_2_2009/.../Kuzmanovic_Sabo.pdf

Song Sn said...

Hello,
I would really like to know how these functions work with 3D data points?