*dlmread*, *fscanf*, *xlsread*, etc.; try *help iofun* for more information), which the reader should explore. In my work, I tend to work with other software systems (relational databases and statistical packages) which can export data to text files. I have found it convenient for my purposes to convert all categorical data to numeric form (dummy variables or integer codes) in the originating system, and then to dump the data to a tab-delimited text file for consumption by MATLAB.

Below is a typical *textread* call, from one of my recent projects (Web formatting necessitates breaking this up: it is supposed to be a single line of code):

A = ...

textread('F:\MyFile.dat','',-1,'delimiter','\t', ...

'headerlines',1,'emptyvalue',NaN);

An explanation of each *textread* parameter being used follows:

**A** is the variable containing the data after it is loaded from disk.

**'F:\MyFile.dat'** is the fully qualified name of the file being loaded.

The empty string indicates that no format string is being used.

The **-1** indicates that all rows of data are to be loaded. To load some specific fraction of all rows, change this value to the number of rows to be loaded.

All parameters after this point come in pairs.

The **'delimiter'** indicates that the data is delimited (as opposed to fixed-width). The **'\t'** lets *textread* know that the delimiter is the tab character. Comma-delimited .CSV files, for instance, would use *'delimiter',','* instead.
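As a rough illustration of the comma-delimited case, the sketch below writes a tiny made-up CSV file to a temporary folder and reads it back with the same *textread* pattern. The file name `example_data.csv` and its contents are assumptions for illustration only, not part of the project described above.

```matlab
% Minimal sketch: create a small comma-delimited file, then load it with
% the same textread pattern, swapping the tab delimiter for a comma.
% The file name and contents are hypothetical.
fname = fullfile(tempdir, 'example_data.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'x,y,z\n');     % header row, skipped via 'headerlines'
fprintf(fid, '1,2,3\n');
fprintf(fid, '4,,6\n');      % middle value is missing
fclose(fid);

A = textread(fname, '', -1, 'delimiter', ',', ...
    'headerlines', 1, 'emptyvalue', NaN);

disp(A)          % numeric matrix; the missing cell should appear as NaN
delete(fname);   % clean up the temporary file
```

The only change from the tab-delimited call is the value paired with *'delimiter'*; the other parameter pairs carry over unchanged.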

Next is the parameter pair **'headerlines',1**, which tells *textread* to ignore the first row (it contains column headers).

Last, the parameter pair **'emptyvalue',NaN** indicates that any missing values should be represented in the resulting MATLAB matrix as NaN ("not-a-number") values. Other values could be used. Some analysts are accustomed to using values like -99, -9999, etc., although it is generally the convention in MATLAB to use NaN to represent missing values.

I use a separate chunk of code to read in the header line and digest the variable names. Some MATLAB programmers prefer using cell arrays instead. Yet a third possibility of dealing with variable names is to use the *categorical arrays* and *dataset arrays* from the Statistics Toolbox (new with MATLAB version 2007a), but I haven't had a chance to explore them yet.
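For readers wondering what reading the header line might look like, here is one possible sketch. It is not the code used in the project above; the file name, the column names, and the choice to keep the names in a cell array of strings are all assumptions made for illustration.

```matlab
% Sketch only: one way to read a tab-delimited header line and split it
% into variable names. File name and column names are hypothetical.
fname = fullfile(tempdir, 'example_header.dat');
fid = fopen(fname, 'w');
fprintf(fid, 'Age\tIncome\tRegion\n');   % header line
fprintf(fid, '34\t52000\t2\n');          % one row of data
fclose(fid);

% Read just the first line of the file (fgetl strips the newline)
fid = fopen(fname, 'r');
headerLine = fgetl(fid);
fclose(fid);

% Split on tab characters; varNames is a 1-by-N cell array of strings
varNames = regexp(headerLine, '\t', 'split');

% Look up the column index of a variable by name
incomeCol = find(strcmp(varNames, 'Income'));

delete(fname);   % clean up the temporary file
```

With the names in hand, a column of the numeric matrix can be addressed by name, for example `A(:, incomeCol)`, rather than by a hard-coded position.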

## 19 comments:

Hi Will, I'm Nonie from Malaysia. Actually, I have a problem with importing a file into MATLAB. I tried the example attached in MATLAB, but it didn't work. I hope you can help me solve this problem. Thanks.

My email: sweet_nonie2003@yahoo.com

Can you provide any more details? What was the format of the data you're trying to load? What was the exact MATLAB code you ran?

-Will

I'm trying to load .txt format data. I want to import an email message into MATLAB, then tokenize the string into tokens. The MATLAB code:

%fid=fopen('email1.txt')

textread('email1.txt', '%s', 'whitespace', '')

str = email1

remain = str;

for k=1:10

[token, remain] = strtok(remain);

token

end

%fclose(fid)

Sorry for troubling you, sir. I'm a beginner in MATLAB. Thanks.

This was great, just what I needed!!! Thank you

"I use a separate chunk of code to read in the header line and digest the variable names"

Could you share this code? I would find it useful!

Great blog btw!

Just use textread's 'headerlines' option if you want to skip a certain number of lines at the beginning of the file. Type 'help textread' in Matlab for more info.

Hi .. That was a great help to us..

I was wondering if there is a feature in textread where I could get the name of the file from the user and read from it, rather than going into the program every time and editing the filename to be read.

Thank you so much -- this is what I had been looking for!!!

Thank you very much. I got from you what I was looking for.

Sir, by doing so I got the output as different rows in the same cell. How do I convert the contents of the same cell into different columns?

hi will..

i really need your help.

do you know how to compare matlab result with an excel database?

for example, i've matlab result;

token =

address

advertisement

act

click

free

and i've excel database with two column,

TOKEN SPAMICITY

add 0.7

address 0.54

break 0.43

duck 0.23

free 0.98

rest 0.65

I want to compare the tokens in MATLAB with the Excel database. If a token exists in the database, it should return the spamicity value of that token.

thank you so much for your concern.

Hi Will!

I'd like to extract from the header of a text file

% E(x,y,z,f0) in V/m

% norm to 1W/423.3W at 0.064GHz

% Grid: 213x214x162x1

1 2 3 ....

the information [213 214 162].

Could you help me? thx

Sami

can we use textread command to read data from excel?

Will, I exported data from an Excel 2007 worksheet, into a .txt file. It contains data and time. Here's an example line...

"4:00 PM 22 12 1"

What's the best way to import the data? BTW, I have ~8600 lines of data, and want to plot it in Matlab (x = 4:00 PM; y=22,12,1).

Thanks.

Thanks for the post, it really helped!

Hi Will,

Wondering if you can help me! I have data as follows; latitude, longitude and ozone (concentration in ppbv).

I've managed to plot these onto a coastline overlay of my given area using the command "plot3m". However, the data set for ozone is at best sketchy: where my platform has not been able to obtain results, the data value is automatically set to -99.999.

The plotm3 function, however, plots this. Is there a way to get MATLAB to just ignore plotting values lower than zero?

Regards

John

I cannot find a plot3m (or plotm3; you spelled it both ways) function in MATLAB, so I cannot experiment. I know that some plotting functions will ignore NaN ("not a number") values, so you might try switching your flag values to NaNs:

>> A

A =

-99.9990 0.6324 0.9575 0.9572

0.9058 0.0975 -99.9990 -99.9990

0.1270 0.2785 0.1576 0.8003

0.9134 0.5469 0.9706 0.1419

>> A(A(:) == -99.999) = NaN

A =

NaN 0.6324 0.9575 0.9572

0.9058 0.0975 NaN NaN

0.1270 0.2785 0.1576 0.8003

0.9134 0.5469 0.9706 0.1419

Another possibility is interpolating the missing values, which may or may not be appropriate, depending on your situation.
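As a rough sketch of the interpolation idea, the snippet below fills NaN gaps in a single vector by linear interpolation between the neighboring known values. The data values are made up, and treating the readings as an evenly spaced sequence is an assumption; whether that is sensible for latitude/longitude data depends on the situation.

```matlab
% Sketch: linear interpolation over NaN gaps in one vector.
% The values below are made up for illustration.
y = [0.9 NaN 0.5 0.3 NaN NaN 0.9];
x = 1:numel(y);          % assume evenly spaced sample positions

known = ~isnan(y);       % logical mask of the valid readings
yFilled = y;
yFilled(~known) = interp1(x(known), y(known), x(~known), 'linear');

disp(yFilled)
```

Note that gaps at the very start or end of the vector would stay NaN here, since *interp1* does not extrapolate by default.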

Hi Will, I am new to data mining and to MATLAB as well. I am doing a project in data mining. My question is: how do I start off? I don't know which data mining technique to use.

Hi Will, can matlab be used to predict 2 variables at the same time from same data set?
