Comments on Data Mining in MATLAB: Why MATLAB for Data Mining? (blog by Will Dwinnell, 11 comments, newest first)

Anonymous (2015-11-17):
In MATLAB R2013b and later there is the table data type, which allows joins on multiple keys.

Anonymous (2008-06-24):
Great function, but with a minor problem: it does not return the right results with your given matrices A and B.

It returns:

    C =

         1     1     2    99    88     4     6
         1     2     1    77    88     6     4
         1     1     2    66    55     4     6
         4     1     3    44    11   NaN   NaN
         1     2     1    66    22     6     4
         8     0     8    11    44   NaN   NaN
         4     1     3    22    88   NaN   NaN

But it should return:

    C =

         1     1     2    99    88     6     4
         1     2     1    77    88     4     6
         1     1     2    66    55     6     4
         4     1     3    44    11   NaN   NaN
         1     2     1    66    22     4     6
         8     0     8    11    44   NaN   NaN
         4     1     3    22    88   NaN   NaN

This is because your function assumes that the matching rows of B are sorted in the same order as the unique key rows of A.

You may wish to consider the following modifications:

    [ind,loc] = ismember(...

    ...

    loc_index = loc(ind);

    ...

    C(n==loc_index(i),colsToFill) = repmat(vals(i,:),sum(n==loc_index(i)),1);

Regards,
Andrew Lim
JPMorgan
(andrew.t.lim@jpmorgan.com)

Elventear (2007-03-04):
I would recommend the combination of Python, NumPy, SciPy, and matplotlib.
It's a natural replacement for MATLAB. It comes with many of the basic tools, and it's built on Python, which, IMHO, has a beautiful syntax, in contrast to MATLAB, which is a little clunky for my taste.

In addition, the many robust packages (and OS hooks) available for Python make it a good candidate for developing a final product, rather than just one-off prototyping.

Anonymous (2006-12-04):
True, but you can put it into a function and then forget about it:

    A = [1 1 2 99 88;
         1 2 1 77 88;
         1 1 2 66 55;
         4 1 3 44 11;
         1 2 1 66 22;
         8 0 8 11 44;
         4 1 3 22 88]

    B = [1 1 1 7 5;
         1 2 1 4 6;
         1 1 2 6 4;
         2 1 1 0 3;
         4 3 1 9 2]

    C = leftjoin(A,1:3,B,[1 3 2])

where

    function C = leftjoin(A,colIndA,B,colIndB)

    colIndA = unique(colIndA);
    colIndB = unique(colIndB);

    % Append NaN columns to A for the non-key columns of B
    C = [A nan(size(A,1),size(B,2)-colIndB(end))];

    % a: unique key rows of A; n maps each row of A to its row in a
    [a, m, n] = unique(A(:,colIndA),'rows');

    % Rows of B whose keys occur in A, and their non-key values
    ind = ismember(B(:,colIndB), a, 'rows');
    vals = B(ind,colIndB(end)+1:end);

    colsToFill = size(A,2)+1 : size(C,2);

    % Give the i-th matched row of B to every row of A with the i-th unique key
    for i = 1:size(vals,1)
        C(n==i,colsToFill) = repmat(vals(i,:),sum(n==i),1);
    end

Will Dwinnell (2006-12-01):
Thanks! I had thought about something like this, but give me some time to digest your code.
You can understand, though, why I consider this type of thing more cumbersome in MATLAB than, say, SQL.

Anonymous (2006-12-01):
Oops, that for loop should have been:

    for i = 1:size(vals,1)
        C(n==i,end-size(vals,2)+1:end) = repmat(vals(i,:),sum(n==i),1);
    end

Anonymous (2006-12-01):
Hi Will,

This isn't optimal, but it might give you some things to think about:

    A = [1 1 2 99 88;
         1 2 1 77 88;
         1 1 2 66 55;
         4 1 3 44 11;
         1 2 1 66 22;
         8 0 8 11 44;
         4 1 3 22 88]

    B = [1 1 1 7 5;
         1 1 2 4 6;
         1 2 1 6 4;
         2 1 1 0 3;
         4 1 3 9 2]

    numCols = 3;

    C = [A nan(size(A,1),size(B,2)-numCols)];

    [a, m, n] = unique(A(:,1:numCols),'rows')

    ind = ismember(B(:,1:numCols), a, 'rows')

    vals = B(ind,numCols+1:end);

    for i = 1:size(vals,1)
        C(n==i,6:7) = repmat(vals(i,:),sum(n==i),1);
    end

Will Dwinnell (2006-12-01):
In a relational database, these tables may be joined by matching up multiple key fields.

Let's say that there are two tables of data which we wish to join, A and B:

    A =
         1     1     2    99    88
         1     2     1    77    88
         1     1     2    66    55
         4     1     3    44    11
         1     2     1    66    22
         8     0     8    11    44
         4     1     3    22    88

    B =
         1     1     1     7
         1     1     2     4
         1     2     1     6
         2     1     1     0
         4     1     3     9

Relational joins are additionally defined by which table's rows must be included. Let's say that we want all of the rows in A, matched up with any matching rows in B, where the values in the first 3 columns match. The result, C, would be:

    C =
         1     1     2    99    88     4
         1     2     1    77    88     6
         1     1     2    66    55     4
         4     1     3    44    11     9
         1     2     1    66    22     6
         8     0     8    11    44   NaN
         4     1     3    22    88     9

Notice:
1. In the second-to-last row, there is no match in B for the row in A, so a missing value is generated (which I indicated as NaN, the convention in MATLAB).
2. There is no match in A for the first and second-to-last rows of B, so their values never appear in the result.

Essentially, this is a look-up process where more than one column must match.

Does this make more sense?

Anonymous (2006-11-27):
Will, I might be being dense, but I can't extend your example to what you would like to do. Could you provide another simple example with the desired output? That way I can try to work out the required code, if it is possible.

Cheers,
Eric Sampson

Will Dwinnell (2006-11-22):
It is very easy to write vectorized MATLAB code to use a look-up table:

    >> LookUp = [0.01 0.02 0.05 0.4 0.97]'

    LookUp =

        0.0100
        0.0200
        0.0500
        0.4000
        0.9700

    >> MyData = [1 17 100; 3 17 110; 3 16 95; 4 19 89]

    MyData =

         1    17   100
         3    17   110
         3    16    95
         4    19    89

    >> LookUp(MyData(:,1))

    ans =

        0.0100
        0.0500
        0.0500
        0.4000

Relational database joins, however, often involve more than one key field to indicate the join.
For instance, rows in the two tables being joined might need to match on four different columns.

If there is an easy and efficient way to do this in base MATLAB, I'd love to hear about it. Honestly, being proven wrong on this would make my week.

-Will

Anonymous (2006-11-21):
Hi Will,

Can you explain more about your comment "The one gap with MATLAB is that it is not very good at relational joins. Look-up tables (even large ones) for tacking on a single variable are fine, but MATLAB is not built to perform SQL-style joins."?

Regards,
Eric Sampson
The MathWorks
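For reference, the multi-key left join Will describes in his 2006-12-01 comment can also be written in base MATLAB without a loop. The idea is to use the second output of ismember(..., 'rows') to look up a matching row of B for every row of A, which is the same idea as the loc output in Andrew Lim's suggested fix.

This is a minimal sketch under stated assumptions, not code from the thread:

- The names keyColsA, keyColsB, and valColsB are hypothetical.
- It appends every non-key column of B.
- It assumes each key combination occurs at most once in B. If a key is repeated, ismember reports only one matching row (the first, in current MATLAB releases), so the result is a look-up rather than a full SQL join that would duplicate rows of A.

```matlab
% Sketch: multi-key left join in base MATLAB, using Will's example tables.
% Assumption: each key combination occurs at most once in B.
A = [1 1 2 99 88; 1 2 1 77 88; 1 1 2 66 55; 4 1 3 44 11; ...
     1 2 1 66 22; 8 0 8 11 44; 4 1 3 22 88];
B = [1 1 1 7; 1 1 2 4; 1 2 1 6; 2 1 1 0; 4 1 3 9];

keyColsA = 1:3;                              % key columns of A (hypothetical name)
keyColsB = 1:3;                              % matching key columns of B, same order
valColsB = setdiff(1:size(B,2), keyColsB);   % non-key columns of B to append

% tf(r) is true when row r of A has a matching key in B;
% loc(r) is the index of that matching row of B (0 when there is none).
[tf, loc] = ismember(A(:,keyColsA), B(:,keyColsB), 'rows');

% Pad A with NaN for the appended columns, then fill in the matched rows.
C = [A, nan(size(A,1), numel(valColsB))];
C(tf, size(A,2)+1:end) = B(loc(tf), valColsB);

disp(C)
```

Because keyColsB is passed to ismember as given, a reordered key specification such as keyColsB = [1 3 2] keeps its column order. By contrast, the leftjoin function in the 2006-12-04 comment passes its column indices through unique, which sorts [1 3 2] into [1 2 3].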
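The 2015 comment mentions the table data type. Below is a minimal sketch of Will's left join written with tables. It assumes a MATLAB release with table support (R2013b or later, which added table, array2table, and outerjoin). The variable names k1, k2, k3, x1, x2, y, and origOrder are hypothetical labels for Will's columns. outerjoin returns its rows sorted by key, so the sketch records A's original row order and restores it afterwards.

```matlab
% Sketch: the same multi-key left join using tables (assumes R2013b or later).
A = [1 1 2 99 88; 1 2 1 77 88; 1 1 2 66 55; 4 1 3 44 11; ...
     1 2 1 66 22; 8 0 8 11 44; 4 1 3 22 88];
B = [1 1 1 7; 1 1 2 4; 1 2 1 6; 2 1 1 0; 4 1 3 9];

% Hypothetical variable names; k1, k2, k3 are the join keys.
TA = array2table(A, 'VariableNames', {'k1','k2','k3','x1','x2'});
TB = array2table(B, 'VariableNames', {'k1','k2','k3','y'});

% Remember A's row order, because outerjoin sorts its output by key.
TA.origOrder = (1:height(TA))';

% Left outer join: keep every row of TA; unmatched rows get NaN in y.
TC = outerjoin(TA, TB, 'Type', 'left', 'Keys', {'k1','k2','k3'}, ...
               'MergeKeys', true);

% Restore A's original order and drop the helper variable.
TC = sortrows(TC, 'origOrder');
TC.origOrder = [];

disp(TC)
```

Unlike a single-match look-up, outerjoin produces one output row for each pair of rows that share a key, so a key repeated in B would duplicate the corresponding rows of A, as in SQL.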