This posting covers basic summary statistics in MATLAB.
First, note that MATLAB is strongly array-oriented, so data sets to be analyzed are most often stored as a matrix of values. By convention in MATLAB, variables are stored in columns and observations in rows. This is not a hard-and-fast rule, but it is much more common than the alternative (variables in rows, observations in columns), and most MATLAB routines, whether from the MathWorks or elsewhere, assume this convention.
Basic summaries are easy to obtain from MATLAB. For the examples below, the following matrix of data, A, will be used (no, it's not very exciting, but it will do for our purposes):
>> A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1]

A =

     1     2     3     4
    -1    10     8     5
     9     8     7     0
     0     0     0     1
MATLAB matrices are indexed as MatrixName(row,column).
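As a small illustration using the matrix A above (the particular indices and the variable names x, col2 and row3 are arbitrary choices for this sketch), indexing can pull out a single element, a whole column, or a whole row. A colon in place of an index means "all rows" or "all columns":

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

x    = A(2,3)   % row 2, column 3: the single element 8
col2 = A(:,2)   % all rows of column 2, as a column vector [2; 10; 8; 0]
row3 = A(3,:)   % all columns of row 3, as a row vector [9 8 7 0]
```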
Common statistical summaries are available in MATLAB, such as mean (arithmetic mean), median (median), min (minimum value), max (maximum value) and std (standard deviation, normalized by n-1 by default). Their use is illustrated below:
>> mean(A)

ans =

    2.2500    5.0000    4.5000    2.5000

>> median(A)

ans =

    0.5000    5.0000    5.0000    2.5000

>> min(A)

ans =

    -1     0     0     0

>> max(A)

ans =

     9    10     8     5

>> std(A)

ans =

    4.5735    4.7610    3.6968    2.3805
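The min and max functions can also return a second output: the row in which each column's extreme value was found. A short sketch using the same matrix A (the output variable names are arbitrary):

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

% colMax holds each column's maximum; rowOfMax holds the row where it occurs
[colMax, rowOfMax] = max(A)   % colMax = [9 10 8 5], rowOfMax = [3 2 2 2]

% The same idea for the minimum
[colMin, rowOfMin] = min(A)   % colMin = [-1 0 0 0], rowOfMin = [2 4 4 3]
```

This is handy when the location of an extreme value matters as much as the value itself, for example to find which observation holds a column's largest reading.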
Note that each of these functions operates along the columns, yielding one summary per column, stored in a row vector. Sometimes it is desirable to calculate along the rows instead. Some routines accept an additional parameter that redirects them, like this:

>> mean(A,2)

ans =

    2.5000
    5.5000
    6.0000
    0.2500
The above calculates the arithmetic mean of each row, storing the results in a column vector. The optional second parameter of mean specifies the dimension along which it operates: 1 (the default) works down the columns, and 2 works across the rows.
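The other summaries shown earlier can also work along rows, though not every one takes the dimension in the same position. In the sketch below (variable names are arbitrary), median takes it as the second argument like mean; std takes it as the third, with 0 in the second slot keeping the default n-1 normalization; min and max take it as the third, because their second argument is reserved for another array to compare against, so an empty [] is passed in its place:

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

rowMedians = median(A,2)   % dimension is the 2nd argument, as with mean
rowStd     = std(A,0,2)    % dimension is the 3rd argument; 0 = default (n-1) normalization
rowMax     = max(A,[],2)   % 2nd argument would be a second array, so [] is passed
rowMin     = min(A,[],2)   % likewise for min
```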
For routines without this capability, the data matrix may be transposed (rows become columns and columns become rows) using the apostrophe operator while feeding it to the function:
>> mean(A')

ans =

    2.5000    5.5000    6.0000    0.2500
Note that this time the result is stored in a row vector.
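One way to see how the two approaches relate is to compare them directly: mean(A') produces the same numbers as mean(A,2), only laid out as a row rather than a column, so transposing one result recovers the other. (For real-valued data like A, the apostrophe operator gives the same result as the non-conjugating .' transpose; the two differ only for complex numbers.) The variable names below are arbitrary:

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

viaTranspose = mean(A')    % 1-by-4 row vector of row means
viaDimension = mean(A,2)   % 4-by-1 column vector of the same row means

% Transposing one result gives the other. An exact comparison is safe here
% because small integer sums divided by 4 involve no rounding error.
isequal(viaTranspose, viaDimension')   % true
```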
The colon operator, :, can be used to dump all of the contents of an array into one giant column vector (MATLAB stacks the columns one after another). The result of this operation can then be fed to any of our summary routines to summarize the whole array at once.
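A brief sketch of this with the matrix A (the intermediate variable v and the result names are arbitrary). Because A(:) stacks columns, its first four elements are the first column of A:

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

v = A(:);                         % 16-by-1 column vector: [1; -1; 9; 0; 2; 10; ...]
overallMean   = mean(v)           % 57/16 = 3.5625
overallMedian = median(v)         % average of the 8th and 9th sorted values: 2.5
overallRange  = [min(v) max(v)]   % [-1 10]

% The same summary without a temporary variable:
mean(A(:))
```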
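The reader will find more information on the summary routines in base MATLAB through MATLAB's built-in help: typing help followed by a function name (for example, help std) at the command prompt displays that function's documentation.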
The MATLAB Statistics Toolbox
MATLAB users lucky enough to own the Statistics Toolbox have still more summaries available, such as iqr (interquartile range), trimmean (trimmed mean) and geomean (geometric mean). There are also extended versions of several summary functions, such as nanmean and nanmax, which ignore NaN (IEEE floating-point "not-a-number") values; NaNs are commonly used to represent missing values in MATLAB.
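For readers without the toolbox, the effect of nanmean on a matrix can be approximated in base MATLAB by masking out the NaN values. The sketch below is not the toolbox's implementation: it marks one entry of a copy of A as missing (an arbitrary choice, for illustration), then divides the column sums of the remaining values by the number of non-NaN values in each column. The names B, present, Bz and colMeans are hypothetical. Newer releases of base MATLAB (R2015a and later) also accept an 'omitnan' flag, as in mean(B,'omitnan'), which does the same job directly.

```matlab
A = [1 2 3 4; -1 10 8 5; 9 8 7 0; 0 0 0 1];

B = A;
B(2,2) = NaN;            % pretend this observation is missing

present = ~isnan(B);     % true wherever a real value exists
Bz = B;
Bz(~present) = 0;        % zero out NaNs so they add nothing to the sums

% Column sums of the real values divided by the count of real values per column
colMeans = sum(Bz) ./ sum(present)   % [2.2500 3.3333 4.5000 2.5000]

% By contrast, plain mean lets the NaN propagate into column 2
mean(B)
```

A column containing only NaNs yields 0/0, which is NaN, so such a column is still flagged as having no usable data.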
To learn more, see the "Descriptive Statistics" section of the Statistics Toolbox help.