Thursday, May 03, 2007

Weighted Regression in MATLAB

Many predictive modeling techniques have weighted counterparts, which permit the analyst to assign weights representing the "importance" of individual observations. An observation with a weight of 8, for instance, is treated in the modeling process as though there were 8 individual observations with the same values. The usual, unweighted algorithms may be thought of as a special case of weighted algorithms, in which the weights of all observations equal 1.0.
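As a quick check of the "weight of 8 equals 8 copies" idea, here is a minimal sketch in base MATLAB. The data values and variable names are made up for illustration. It fits a straight line by weighted least squares (scaling each row by the square root of its weight) and compares the result with an ordinary fit to data in which the third observation has literally been repeated 8 times.

```matlab
% Small illustrative data set (values invented for this sketch)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
w = [1; 1; 8; 1; 1];              % observation 3 counts as 8 observations

X  = [ones(size(x)) x];           % design matrix: intercept and slope columns
sw = sqrt(w);                     % row scaling for weighted least squares

% Weighted fit: minimizes sum(w .* (y - X*b).^2)
bWeighted = (X .* repmat(sw, 1, size(X, 2))) \ (y .* sw);

% Unweighted fit to explicitly replicated data
idx = [1; 2; 3*ones(8, 1); 4; 5]; % row 3 appears 8 times
bReplicated = X(idx, :) \ y(idx);

% The two coefficient vectors should agree to within rounding error
disp(max(abs(bWeighted - bReplicated)))
```

Setting every element of w to 1 reduces the weighted fit to the usual X \ y.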

There are several reasons for using weighted methods. One is simply that some data sets have been pre-summarized, with identical records collapsed to a single record whose weight equals the original number of identical records. This is especially common where predictor variables have been binned, a practice many analysts favor, since binning can drastically reduce the number of distinct combinations of input variable values.

A second reason to use weighting is simple economy of space: once identical (or very similar) records are consolidated, each carrying a weight equal to the number of original observations it represents, the data can be much smaller (even by orders of magnitude!) than the original.
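Collapsing identical records into weighted records takes only a couple of lines in base MATLAB. The sketch below uses a tiny made-up matrix, raw, whose rows stand in for (already binned) observations. The variable names are illustrative only.

```matlab
% Each row is one observation; several rows are identical (made-up values)
raw = [1 0;
       1 0;
       2 1;
       1 0;
       2 1;
       3 1];

% Find the distinct rows and, for every original row, which distinct row it matches
[records, firstIdx, group] = unique(raw, 'rows');   % firstIdx is not used here

% Count how many original rows fell into each distinct record
weights = accumarray(group(:), 1);

disp([records weights])   % distinct records with their weights in the last column
```

Here six rows collapse to three weighted records. The weights always sum to the original number of rows, which makes a handy sanity check.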

Another important reason to weight observations is to "fix" class distributions in the data. Assume that the original data contains a million rows of bank loan data, of which only 2% represent bad loans. It is common to sample down the number of good loans, while retaining all of the bad loans. This can save time on learning, but the sampled data over-represents bad loans, so a model fit to it without correction will be systematically biased toward predicting bad loans. A learning system which can accept weights on the observations can correct for this bias, for example by giving each retained good loan a weight equal to the number of original good loans it stands for.
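To make the loan example concrete, here is a small sketch of how such correcting weights might be computed. The 10% retention rate for good loans is an assumption chosen for illustration. The original scenario does not specify one.

```matlab
% Original population from the example: 1,000,000 loans, 2% bad
nBad  = 20000;
nGood = 980000;

% Assumed for illustration: keep all bad loans and 10% of the good loans
keepFraction = 0.10;
nGoodKept    = round(keepFraction * nGood);

% Class labels for the sampled-down data set
isBad = [true(nBad, 1); false(nGoodKept, 1)];

% Each retained good loan stands in for nGood/nGoodKept original good loans
w = ones(size(isBad));
w(~isBad) = nGood / nGoodKept;

rawRate      = mean(isBad);                 % bad-loan rate in the sample
weightedRate = sum(w .* isBad) / sum(w);    % bad-loan rate after weighting

disp([rawRate weightedRate])
```

The unweighted sample shows roughly 17% bad loans, while the weighted rate recovers the original 2%. The vector w is what would be passed to a weighted learning routine.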

There are a number of on-line resources for performing weighted regression in base MATLAB, such as:

Optimization Tips and Tricks, by John D'Errico

The thread linked below records an interesting conversation about weighted linear regression, and some practical issues for implementation in MATLAB:

Weighted regression thread on Usenet

Weighted regression can also be accomplished using the Statistics Toolbox, via functions such as glmfit and nlinfit. See the help facility for these functions, or try wnlsdemo for more information.
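As a rough sketch of the Statistics Toolbox route, glmfit accepts a 'weights' parameter. With a normal distribution (and its default identity link), the result should match a weighted least squares fit done with backslash. The data below are made up, and the code requires the Statistics Toolbox.

```matlab
% Illustrative data and weights (values invented for this sketch)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
w = [1; 1; 8; 1; 1];

% Weighted linear regression via glmfit; an intercept term is added automatically
bGlm = glmfit(x, y, 'normal', 'weights', w);

% The same fit in base MATLAB, for comparison
X  = [ones(size(x)) x];
sw = sqrt(w);
bBackslash = (X .* repmat(sw, 1, size(X, 2))) \ (y .* sw);

% Both give [intercept; slope] and should agree to within rounding error
disp([bGlm bBackslash])
```

The same 'weights' parameter applies to the other distributions glmfit supports, such as 'binomial' for logistic regression.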

The Curve Fitting Toolbox also provides facilities for weighted regression (see: help fitoptions).
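A minimal sketch of the Curve Fitting Toolbox approach, assuming a straight-line ('poly1') library model and made-up data: set the Weights field of a fitoptions object, then pass that object to fit.

```matlab
% Illustrative data and weights (values invented for this sketch)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
w = [1; 1; 8; 1; 1];

% Default options for the linear polynomial model, with weights attached
opts = fitoptions('poly1');
opts.Weights = w;

% Weighted fit of y = p1*x + p2
f = fit(x, y, 'poly1', opts);

% Coefficients come back as [p1 p2], i.e. slope first, then intercept
coeffs = coeffvalues(f);
disp(coeffs)
```

Note the coefficient order: slope first here, whereas a design matrix of [ones x] with backslash gives the intercept first.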

See also:

The Apr-21-2007 posting, Linear Regression in MATLAB.

The Oct-23-2007 posting, L-1 Linear Regression.