## Saturday, April 21, 2007

### Linear Regression in MATLAB

Fitting a least-squares linear regression is easily accomplished in MATLAB using the backslash operator: '\'. In linear algebra, matrices may be multiplied like this:

output = input * coefficients

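The backslash in MATLAB allows the programmer to effectively "divide" the output by the input to get the linear coefficients. When there are more observations than coefficients, as is typical in regression, the result is the least-squares solution. This process is illustrated by the following examples: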

#### Simple Linear Regression

First, some data with a roughly linear relationship is needed:

>> X = [1 2 4 5 7 9 11 13 14 16]'; Y = [101 105 109 112 117 116 122 123 129 130]';

"Divide" using MATLAB's backslash operator to regress without an intercept:

>> B = X \ Y

B =

10.8900

Append a column of ones before dividing to include an intercept:

>> B = [ones(length(X),1) X] \ Y

B =

101.3021
1.8412

In this case, the first number is the intercept and the second is the slope coefficient for X.
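
As a cross-check, the same fit can be reproduced in two other ways: by solving the normal equations explicitly, and with the built-in polyfit function. This is only a minimal sketch using the data above; the variable names A, BNormal, and P are introduced here for illustration.

```matlab
% Data from the simple regression example above
X = [1 2 4 5 7 9 11 13 14 16]';
Y = [101 105 109 112 117 116 122 123 129 130]';

% Design matrix: column of ones (intercept) followed by the input
A = [ones(length(X),1) X];

% Backslash: least-squares solution of A * B = Y
B = A \ Y;

% Normal equations: (A' * A) * B = A' * Y
% (shown for comparison only; backslash is usually the numerically safer choice)
BNormal = (A' * A) \ (A' * Y);

% polyfit returns coefficients in descending powers: [slope intercept]
P = polyfit(X, Y, 1);

% Columns: backslash, normal equations, polyfit (reordered to intercept first)
disp([B BNormal flipud(P')])
```

All three columns should agree up to rounding error.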

#### Multiple Linear Regression

The following generates a matrix of 1000 observations of 5 random input variables:

>> X = rand(1e3,5);

Next, the true coefficients are defined (which wouldn't be known in a real problem). As is conventional, the intercept term is the first element of the coefficient vector. The problem at hand is to approximate these coefficients, knowing only the input and output data:

>> BTrue = [-1 2 -3 4 -5 6]';

Multiply the input matrix by the coefficients and add the intercept to get the output data:

>> Y = BTrue(1) + X * BTrue(2:end);

As before, append a column of ones and use the backslash operator:

>> B = [ones(size(X,1),1) X] \ Y

B =

-1.0000
2.0000
-3.0000
4.0000
-5.0000
6.0000

Again, the first element in the coefficient vector is the intercept. Note that, oh so conveniently, the discovered coefficients match the designed ones (to the displayed precision), since this data set is completely noise-free.
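
With real data the match is never exact. Here is a rough sketch of the same experiment with Gaussian noise added to the output. The noise level of 0.5, the fixed seed, and the name YNoisy are arbitrary choices made for illustration, and rng assumes a reasonably recent MATLAB release.

```matlab
rng(0);                          % fix the seed so the run is repeatable
X = rand(1e3,5);                 % 1000 observations of 5 random inputs
BTrue = [-1 2 -3 4 -5 6]';       % intercept first, as before

% Noise-free output plus zero-mean Gaussian noise (standard deviation 0.5)
YNoisy = BTrue(1) + X * BTrue(2:end) + 0.5 * randn(size(X,1),1);

% Same regression as before: append a column of ones and "divide"
B = [ones(size(X,1),1) X] \ YNoisy;

% True coefficients (left) next to the estimates (right)
disp([BTrue B])
```

The estimates should land near the true values without matching them exactly, and they should tighten as the number of observations grows or the noise shrinks.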

#### Model Recall

Executing a linear model is a simple matter of matrix multiplication, but how the multiplication is set up affects efficiency. One might append a column of ones and perform the complete matrix multiplication, thus:

>> Z = [ones(size(X,1),1) X] * B;

This is wasteful, though, because it builds a copy of the entire input matrix just to add one column. It is cheaper to multiply the input data matrix by the non-intercept coefficients and then add the intercept term:

>> Z = B(1) + X * B(2:end);

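Both forms of recall produce the same predictions up to floating-point rounding. The following small sketch checks this on the noise-free data from the previous section and also computes the root-mean-square error of the fit; Z1, Z2, and RMSE are names chosen here for illustration.

```matlab
% Rebuild the noise-free multiple regression example
X = rand(1e3,5);
BTrue = [-1 2 -3 4 -5 6]';
Y = BTrue(1) + X * BTrue(2:end);
B = [ones(size(X,1),1) X] \ Y;

% Recall with an explicit column of ones
Z1 = [ones(size(X,1),1) X] * B;

% Recall without building the augmented matrix
Z2 = B(1) + X * B(2:end);

% Largest disagreement between the two (expected to be near machine precision)
disp(max(abs(Z1 - Z2)))

% Root-mean-square error of the fitted model on the training data
RMSE = sqrt(mean((Y - Z2).^2));
disp(RMSE)
```
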
#### Regression in the Statistics Toolbox

The MATLAB Statistics Toolbox includes several linear regression functions. Among others, there are:

regress: least squares linear regression and diagnostics

stepwisefit: stepwise linear regression

robustfit: robust (non-least-squares) linear regression and diagnostics

See help stats for more information.
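
As a rough sketch of how two of these might be called on the small data set from the first example (this requires the Statistics Toolbox): note that regress takes the output first and expects the column of ones to be supplied explicitly, whereas robustfit adds an intercept term by default. The name BRobust is chosen here for illustration.

```matlab
X = [1 2 4 5 7 9 11 13 14 16]';
Y = [101 105 109 112 117 116 122 123 129 130]';

% regress: the response comes first, and the design matrix must include the ones
[B, BINT, R, RINT, STATS] = regress(Y, [ones(length(X),1) X]);
% B     : coefficients (intercept first)
% BINT  : 95% confidence intervals for the coefficients
% R     : residuals
% STATS : R-square, F statistic, its p-value, and an error variance estimate

% robustfit: the intercept is added automatically, so pass X alone
BRobust = robustfit(X, Y);

% Least-squares coefficients (left) next to the robust ones (right)
disp([B BRobust])
```

The coefficients from regress should match those obtained with the backslash operator earlier in this post.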

See also:

The May-03-2007 posting, Weighted Regression in MATLAB.

The Oct-23-2007 posting, L-1 Linear Regression.

The Mar-15-2009 posting, Logistic Regression.

#### 19 comments:

Sandro Saitta said...

Hello Will,

This comment is not related to your post but it may be of interest to you. I've just found a new book on amazon about data mining and matlab. Perhaps you already know it.

Will Dwinnell said...

I was not aware of that particular title. It looks like it's time to brush up on my linear algebra!

Anonymous said...

How does one attain simple diagnostic statistics about the multiple regression, such as:
-standard error
-t statistic
-P-value
-confidence interval
-r square
-adjusted r square

in matlab?
These are available in Excel with the click of a button, but I'm positive Matlab should be way better than Excel :P

Will Dwinnell said...

I never use those statistics, so I do not have any code immediately handy. The regress function in the Statistics Toolbox will generate a number of these diagnostics, and it should not be hard to create one's own calculations using MATLAB's built-in functions, like var.

Anonymous said...

[B,BINT,R,RINT,STATS] = REGRESS(Y,X) returns a vector STATS containing, in
the following order, the R-square statistic, the F statistic and p value
for the full model, and an estimate of the error variance.

Cristiano said...

Dear Will Dwinnell,
many compliments for your work on MATLAB.

I'd like to know if there is a solution/workaround for my statistical problem.

I'd like to understand, just as overview, how I can solve this problem: I know yi(outcome) and wi(initial weights) and xi (values) but I don't know the f(x,w)

My function to predict is:

y = f(w1*x1,w2*x2,...,w39*x39)

I'd like to minimize error for predicting yi and find the final weight.

I'm wondering if this is an optimization problem: http://www.mathworks.com/matlabcentral/fileexchange/8553

I'm a statistician and I'd think to solve this example with nonlinear regression with bound constraints, but I ask you whether MATLAB can solve it better.

Any kind of suggestions will be really appreciated.

Thanks in advance

Anonymous said...

Hi,

I am doing multiple regression using matlab with three dependent variables. Matlab is spitting out only 1 p-value, or strictly speaking, the one for the F-statistic. How can I get the p-values corresponding to all of the dependent variables?

Thanks.
