Sunday, April 08, 2007

Getting Data Into MATLAB Using textread

Before any analysis can be performed in MATLAB, the data must somehow be imported. MATLAB offers a number of functions for data import (dlmread, fscanf, xlsread, etc.; try help iofun for more information), which the reader should explore. In my work, I tend to use other software systems (relational databases and statistical packages) which can export data to text files. I have found it convenient for my purposes to convert all categorical data to numeric form (dummy variables or integer codes) in the originating system, and then to dump the data to a tab-delimited text file for consumption by MATLAB.

Below is a typical textread call, from one of my recent projects (Web formatting necessitates breaking this up with MATLAB's ... line continuation: it is supposed to be a single line of code):


A = ...
textread('F:\MyFile.dat','',-1,'delimiter','\t', ...
'headerlines',1,'emptyvalue',NaN);


An explanation of each textread parameter being used follows:

A is the variable containing the data after it is loaded from disk.

'F:\MyFile.dat' is the fully qualified name of the file being loaded.

The empty string indicates that no format string is being used.

The -1 indicates that all rows of data are to be loaded. To load only some of the rows, change this value to the number of rows to be loaded.

All parameters after this point come in pairs.

The 'delimiter' indicates that the data is delimited (as opposed to fixed-width). The '\t' lets textread know that the delimiter is the tab character. Comma-delimited .CSV files, for instance, would use 'delimiter',',' instead.

Next is the parameter pair 'headerlines',1, which tells textread to ignore the first row (it contains column headers).

Last, the parameter pair 'emptyvalue',NaN indicates that any missing values should be represented in the resulting MATLAB matrix as NaN ("not-a-number") values. Other values could be used. Some analysts are accustomed to using values like -99, -9999, etc., although it is generally the convention in MATLAB to use NaN to represent missing values.
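
As a rough, self-contained illustration of the call described above, the following sketch writes a small tab-delimited file and reads it back. The file name (built with tempname), the column names, and the values are all made up for this example and are not from my project. Note also that more recent MATLAB documentation marks textread as not recommended in favor of textscan, so treat this as a sketch of the technique rather than a recommendation.

```matlab
% Hypothetical demo file: the name and contents are illustrative only.
demoFile = [tempname '.dat'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'x\ty\tz\n');     % header row (column names)
fprintf(fid, '1\t2\t3\n');     % complete row
fprintf(fid, '4\t\t6\n');      % middle field is missing
fclose(fid);

% Same parameter pattern as the call above: empty format string,
% all rows, tab delimiter, skip one header line, NaN for empty fields.
A = textread(demoFile, '', -1, 'delimiter', '\t', ...
    'headerlines', 1, 'emptyvalue', NaN);

% Logical mask marking where values were missing in the file.
missingMask = isnan(A);

delete(demoFile);              % clean up the temporary file
```

Having missing values stored as NaN makes them easy to locate afterwards with isnan, as the last step shows.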

I use a separate chunk of code to read in the header line and digest the variable names. Some MATLAB programmers prefer using cell arrays instead. Yet a third possibility for dealing with variable names is to use the categorical arrays and dataset arrays from the Statistics Toolbox (new with MATLAB release R2007a), but I haven't had a chance to explore them yet.
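
One possible way to read the header line and pick out the variable names is sketched below. This is not the actual code from my projects, only a minimal illustration under stated assumptions: the demo file, its column names (age, income, score), and the variable names used here are hypothetical, and the header is assumed to be a single tab-delimited line.

```matlab
% Hypothetical demo file: the name and contents are illustrative only.
demoFile = [tempname '.dat'];
fid = fopen(demoFile, 'w');
fprintf(fid, 'age\tincome\tscore\n');   % header row
fprintf(fid, '34\t52000\t0.7\n');       % one data row
fclose(fid);

% Read only the first line of the file (the header).
fid = fopen(demoFile, 'r');
headerLine = fgetl(fid);
fclose(fid);

% Split the header on tabs into a cell array of variable names.
varNames = regexp(headerLine, '\t', 'split');

% Look up the column index of a variable by name.
incomeCol = find(strcmp(varNames, 'income'));

delete(demoFile);                        % clean up the temporary file
```

With the names held in a cell array, a column of the numeric matrix can then be selected by name, for instance A(:, incomeCol).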

19 comments:

nur azhani said...

Hi Will, I'm Nonie from Malaysia. Actually, I have a problem with importing a file into MATLAB. I've tried the example included with MATLAB, but it didn't work. I hope you can help me solve this problem. Thanks.

my email
sweet_nonie2003@yahoo.com

Will Dwinnell said...

Can you provide any more details? What was the format of the data you're trying to load? What was the exact MATLAB code you ran?

-Will

nur azhani said...

I'm trying to load .txt format data. I want to import an email message into MATLAB, then tokenize the string into tokens. The MATLAB code:

%fid=fopen('email1.txt')

textread('email1.txt', '%s', 'whitespace', '')

str = email1

remain = str;

for k=1:10
[token, remain] = strtok(remain);

token
end

%fclose(fid)

Sorry for troubling you, sir. I'm a beginner in MATLAB. Thanks.

Jake Walter said...

This was great, just what I needed!!! Thank you

Anonymous said...

"I use a separate chunk of code to read in the header line and digest the variable names"

Could you share this code? I would find it useful!

Great blog btw!

Matt said...

Just use textread's 'headerlines' option if you want to skip a certain number of lines at the beginning of the file. Type 'help textread' in Matlab for more info.

Anonymous said...

Hi. That was a great help to us.
I was wondering whether textread has a feature that would let me get the name of the file from the user and read from it, rather than going into the program every time and editing the name of the file to be read.

Beimnet said...

Thank you so much -- this is what I had been looking for!!!

Praveen said...

Thank you very much. I got from you what I was looking for.

Praveen said...

Sir, by doing so I got the output as different rows in the same cell. How do I convert the contents of the same cell into different columns?

nur azhani said...

Hi Will,

I really need your help.
Do you know how to compare a MATLAB result with an Excel database?

For example, I have this MATLAB result:

token =

address
advertisement
act
click
free

and I have an Excel database with two columns:

TOKEN SPAMICITY
add 0.7
address 0.54
break 0.43
duck 0.23
free 0.98
rest 0.65

I want to compare the tokens in MATLAB with the Excel database. If a token exists in the database, it should return the spamicity value of that token.

Thank you so much for your concern.

Anonymous said...

Hi Will!
I'd like to extract from the header of a text file
% E(x,y,z,f0) in V/m
% norm to 1W/423.3W at 0.064GHz
% Grid: 213x214x162x1
1 2 3 ....

the information [213 214 162].
Could you help me? thx
Sami

Sara Ferdousi said...

Can we use the textread command to read data from Excel?

ToddG said...

Will, I exported data from an Excel 2007 worksheet, into a .txt file. It contains data and time. Here's an example line...
"4:00 PM 22 12 1"

What's the best way to import the data? BTW, I have ~8600 lines of data, and want to plot it in Matlab (x = 4:00 PM; y=22,12,1).

Thanks.

Neela Saikrishnan said...

Thanks for the post, it really helped!

John Ball said...

Hi Will,

Wondering if you can help me! I have data as follows; latitude, longitude and ozone (concentration in ppbv).

I've managed to plot these onto a coastline overlay of my given area using the command "plot3m". However, the data set for ozone is sketchy at best: where my platform has not been able to obtain results, the data value is automatically set to -99.999.

The plotm3 function, however, plots these values. Is there a way to get MATLAB to simply skip plotting values lower than zero?

Regards

John

Will Dwinnell said...

I cannot find a plot3m (or plotm3; you spelled it both ways) function in MATLAB, so I cannot experiment. I know that some plotting functions will ignore NaN ("not a number") values, so you might try switching your flag values to NaNs:

>> A

A =

-99.9990 0.6324 0.9575 0.9572
0.9058 0.0975 -99.9990 -99.9990
0.1270 0.2785 0.1576 0.8003
0.9134 0.5469 0.9706 0.1419

>> A(A(:) == -99.999) = NaN

A =

NaN 0.6324 0.9575 0.9572
0.9058 0.0975 NaN NaN
0.1270 0.2785 0.1576 0.8003
0.9134 0.5469 0.9706 0.1419

Another possibility is interpolating the missing values, which may or may not be appropriate, depending on your situation.

bharani said...

Hi Will, I am new to data mining and to MATLAB as well. I am doing a project in data mining, and my question is: how do I start off? I don't know which data mining technique to use.

Anonymous said...

Hi Will, can MATLAB be used to predict 2 variables at the same time from the same data set?