Friday, January 26, 2007

Pixel Classification Project

Being interested in both machine learning and image processing, I built, on a lark, a pixel-level classifier whose output is the probability that a given pixel belongs to the class "foliage". In summary, the project followed these steps:

Pixel Classification Project Steps
1. Collect images, each containing pixels from only one class of interest
2. Extract samples (small windows surrounding pixels of interest) from images
3. Calculate derived features
4. Train classifier to distinguish between "foliage" and "not foliage" classes
5. Apply learned classifier to test images containing pixels from both classes

The salient details of each step follow:

1. Data Acquisition
Thirty-five images of each class ("foliage" and "non-foliage") were acquired. All training images were downloaded from the Web after being located with AllTheWeb (Pictures). All images needed to be of at least moderate resolution (about 640x480) to provide enough information for accurate classification.

For the "foliage" class, search terms such as "foliage", "leaves" and "grass" were used. Images in this class were screened visually to include images which contained foliage, and only foliage, meaning leaves, plant stalks and stems. Images containing any extraneous elements (colorful flowers, pets, children, wires, rocks, etc.) were excluded.

For the "non-foliage" class, arbitrary search terms were employed, which would likely find things other than plants, like "nail", "dancer", "hallway", etc. Images in this class were visually screened to include anything but foliage. Indoor scenes containing small potted plants, for instance, were excluded.

It would also have been possible to train on images containing mixtures of "foliage" and "non-foliage" pixels, but this would have required determining pixel class (at least for homogeneous regions within images) by hand. I tried this in a similar, earlier experiment, and can report that: 1. manual identification of pixels is time-consuming; 2. region-by-region classing complicates the next step, sampling; and 3. I suspect this approach is less accurate for pixels near the edge of a homogeneous region (which are much harder to distinguish).

2. Data Sampling
One thousand samples were drawn from each image, for a grand total of 70,000 examples (= 2 classes x 35 images per class x 1,000 samples per image). Each sample comprised the color information from a small window surrounding a pixel of interest at a random location within the image. The pixel at the center of the window was labeled "foliage" if it came from the "foliage" set of images, and "non-foliage" if it came from the "non-foliage" set.
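The sampling step can be sketched as follows. The project itself was done in MATLAB; this is an illustrative Python/NumPy equivalent, and the function name and default window half-width are my own choices, not the original code:

```python
import numpy as np

def sample_windows(image, n_samples, half_width=2, rng=None):
    """Draw n_samples random windows from an H x W x 3 image.

    Each sample is the flattened color data of a square window of side
    2*half_width + 1, centered on a randomly chosen interior pixel, so
    every window lies fully inside the image.
    """
    rng = np.random.default_rng(rng)
    h, w, _ = image.shape
    rows = rng.integers(half_width, h - half_width, size=n_samples)
    cols = rng.integers(half_width, w - half_width, size=n_samples)
    samples = np.stack([
        image[r - half_width:r + half_width + 1,
              c - half_width:c + half_width + 1, :].ravel()
        for r, c in zip(rows, cols)
    ])
    return samples
```

With half_width=2, each of the 1,000 samples per image is a 5x5x3 = 75-value vector; the class label comes from which image set the window was drawn from.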

3. Derived Features
In addition to the raw red, green and blue values, my program calculated several derived features from the image data. In all, 11 features were used:

1. red
2. green
3. blue
4. hue
5. saturation
6. value
7. hue2
8. edge detector: 5x5
9. edge detector: 9x9
10. edge detector: 13x13
11. edge detector: 21x21

Hue, saturation and value, taken together, are another way of representing colors and are easily calculated using MATLAB's rgb2hsv function. The "hue2" variable is a fuzzy function of the hue variable, using a curved plateau function (see my posting of Nov-16-2006, Fuzzy Logic In MATLAB Part 1), hand-tweaked to flag appropriate colors. The edge detectors perform a simple, quick edge detection process over varying window sizes, and are intended to capture texture information at different scales.
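To make the feature list concrete, here is a rough Python/NumPy sketch of computing all 11 features for one pixel. Note the assumptions: the "green" hue limits in the plateau function are my own guesses, not the hand-tweaked original, and a local standard deviation stands in for the post's quick edge detectors, since their exact form isn't given:

```python
import colorsys
import numpy as np

def plateau(x, lo, hi, soft=0.05):
    """Fuzzy membership: 1.0 on [lo, hi], falling linearly to 0 outside.
    A stand-in for the post's hand-tweaked 'hue2' curve."""
    rise = np.clip((x - (lo - soft)) / soft, 0.0, 1.0)
    fall = np.clip(((hi + soft) - x) / soft, 0.0, 1.0)
    return np.minimum(rise, fall)

def pixel_features(image, r, c, edge_sizes=(5, 9, 13, 21)):
    """11 features for pixel (r, c) of an H x W x 3 image in [0, 1]:
    R, G, B, H, S, V, hue2, plus a crude texture measure (local
    standard deviation of intensity) at four window sizes."""
    red, green, blue = image[r, c]
    h, s, v = colorsys.rgb_to_hsv(red, green, blue)
    hue2 = float(plateau(np.array(h), 0.20, 0.45))  # roughly "green" hues
    gray = image.mean(axis=2)                        # intensity plane
    feats = [red, green, blue, h, s, v, hue2]
    for size in edge_sizes:
        k = size // 2
        window = gray[max(r - k, 0):r + k + 1, max(c - k, 0):c + k + 1]
        feats.append(window.std())                   # texture at this scale
    return np.array(feats)
```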

There is nothing special about the above list, and indeed the list of possible derived features is limited only by the imagination. The image processing field has delivered a wide array of filters and other such manipulations. Interested readers are urged to examine either of these texts:

Algorithms for Image Processing and Computer Vision, by Parker (ISBN: 0-471-14056-2)

Digital Image Processing, by Gonzalez and Woods (ISBN: 0-201-50803-6)

4. Classifier Construction
My foliage classifier is a logistic regression, chosen only because logistic regression is quick to train and was handy as glmfit in the Statistics Toolbox. Any other machine learning or statistical classifier (linear discriminant, neural network, k-nearest neighbors, etc.) could have been used instead.
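As a sketch of the training step: the original used MATLAB's glmfit, which fits by iteratively reweighted least squares. The plain gradient-ascent version below (Python/NumPy, with my own function names) fits the same model family and is enough to show the idea:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic regression by plain gradient ascent on the
    log-likelihood. X is (n, d) features; y is (n,) labels in {0, 1}."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted P(class = 1)
        w += lr * X.T @ (y - p) / len(y)        # gradient of log-likelihood
    return w

def predict_proba(X, w):
    """P(class = 1) for each row of X under fitted weights w."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ w))
```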

As this was just a quick experiment, I didn't bother with rigorous testing, variable selection, etc. Still, results on test images were quite nice (see below), and flaws in the classifier could certainly be addressed through a more thorough and structured effort.

5. Classifier Recall
The finished model was executed on some of my own digital photographs, which contained both "foliage" and "non-foliage" elements. The results appear below.
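Scoring a full photograph just means evaluating the trained model at every pixel, yielding a probability map the same size as the image. A minimal Python/NumPy sketch (the original work used MATLAB; `foliage_map` and `feature_fn` are hypothetical names, where `feature_fn` computes whatever feature vector the model was trained on):

```python
import numpy as np

def foliage_map(image, weights, feature_fn):
    """Score every pixel of an H x W x 3 image with a trained logistic
    model, returning an H x W array of P(foliage). weights includes an
    intercept as its first element."""
    h, w, _ = image.shape
    probs = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            x = np.concatenate([[1.0], feature_fn(image, r, c)])
            probs[r, c] = 1.0 / (1.0 + np.exp(-x @ weights))
    return probs
```

The map can then be displayed as a grayscale image, which is what the detection panels below show.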

Overgrown Beams: Original Image
Overgrown Beams: Foliage Detection

Monarch Butterfly: Original Image
Monarch Butterfly: Foliage Detection

Potted Plant: Original Image
Potted Plant: Foliage Detection

I find working with image data to be particularly satisfying since the result is something one can actually look at. Images contain a great deal of data, both in terms of rich structure and in the sheer number of pixels. Even inexpensive digital cameras will deliver several million pixels per image, so consider image processing as a test-bed application for modeling experiments.

This was just a toy project, so it is hardly the last word in foliage detectors, and weaknesses in the model should be evident in the images above. I strongly encourage readers to explore this field and improve on what has been presented. Consider training on other classes, like "skin", "people", "sky", "brickface", etc. I would be very interested in hearing from any readers who have results to share. Good luck!

Note on Handling Images in MATLAB
Even without toolboxes, MATLAB provides several tools for dealing with images, such as imread, which is used to load images. Most color images are loaded into MATLAB as 3-dimensional arrays, and are accessed as Image(VerticalCoordinate,HorizontalCoordinate,ColorPlane). The color channels are numbered: red (1), green (2) and blue (3), so Image(:,:,2) is just the green color plane of the image.
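For comparison, the same indexing in Python with NumPy: arrays produced by typical image readers have the same rows x columns x channels layout, but indices start at 0, so the green plane is channel 1 rather than 2. (The small array below is just a stand-in for a loaded image.)

```python
import numpy as np

# Stand-in for a loaded image: 4 rows x 6 columns x 3 color channels.
image = np.zeros((4, 6, 3))
image[:, :, 1] = 0.5      # fill the green plane (channel index 1, not 2)

green = image[:, :, 1]    # MATLAB's Image(:,:,2) -- Python is 0-based
pixel = image[0, 0, :]    # R, G, B values of the top-left pixel
```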

Related Work
One interesting work on the subject of pixel classification (skin detection) is Skin Detection, by Jennifer Wortman. Although the classifier in that document is based only on color and constructed by hand, readers should still find some insights on pixel classification.

See Also

Feb-02-2007 posting, Pixel Classification Project: Response

Mar-23-2007 posting, Two Bits of Code


physicsman said...

Wow. The mind boggles at the applications of such a "toy".

Question: Am I right in reading that the only required human-judgment input in this detector was: initial choice of photos for training, and the tweaking of your "hue2" function to pick out foliage-appropriate colors?

At the risk of sounding like a conspiracy theorist... you realize, of course, that all of the intelligence agencies would be interested in the automatic detection of objects of interest in photos. The application of human judgment is expensive and time-consuming for such agencies. Given the flood of photographic and audio data available to them, as many tasks as possible are being automated.

It's likely they have processes like this already automated. But it's not every day you realize that you have marketable espionage-related skills. Congrats. I'm sure the men in black are currently sitting in some back room trying to decide whether to recruit you or "silence" you as I write this comment.

physicsman said...

Here are some other applications that come to mind. (You tell me how practicable they are.)

An improvement to the lasso tool in photoshop-like programs for selecting irregularly shaped objects. Typically, you select an area and set a color tolerance and then hope that the tool selects only the object you want for cutting/pasting. Your process could "learn" the object by having the user select an area around the object, including lots of background, and then select an area inside the object.

In a similar vein, green-screen techniques are still widely used in Hollywood. Wouldn't it be nice to be able to use any background? Your process could be trained for the actors and their clothing (or for the background, whichever's easier) and then cut them out.

I wonder if your process could somehow be used to improve on jpeg-like compression of photos. If you could get software to recognize areas, it could apply different compression rules inside as opposed to on the edges of portions of the images, avoiding the fuzzing of sharp edges, but still retaining high compression rates inside those areas.

Coming back to the military-industrial complex, it would be handy for targeting smart weapons. I imagine your process could be trained to seek some sort of target ahead of time. With enough computing speed, it could even be applied in parallel to live visible, infrared, and UV images, making it even more accurate. Someone back at HQ, viewing a live video feed, selects some portion of the target and some portion of the background until the software recognizes the target and tracks it. Then launch.

I guess that leads right back to security. Right now, thousands of properties are being "protected" by cameras with either dozing security guards or no one watching the images. Most of these are B/W cameras, but possibly the cost of color could be recouped by having an automated service that uses your process to detect humans. Motion sensors alone are unreliable outdoors for obvious reasons. Your process would be a nice add-on to limit false-positives, or to give a "likelihood of human intruder" score. Although people wear a variety of clothing, I'm sure your process could be trained to look for the blotchy patterns and the typical colors we tend to wear, as well as skin.

Your process could be trained to look for areas of shades of certain colors and then change them for those who are color-blind. You might find a nice niche market in display software or hardware add-ons for this small but non-negligible proportion of the population. If it were identified as a "handicapped access" issue, you might find your product required to be available in certain kinds of workplaces where users must look at multicolored images in order to do their job.

Here's a silly one: A "do I match today?" website. (it's available.) Users submit a digital photo and your process separates out the fuzzily-defined colors of each piece of clothing and checks a previously defined database of color combo acceptability scores. It could report an acceptability score and then possibly make some suggestions for low scorers. You could even create your database of acceptability scores by asking users at the site to judge from random photos how well outfits match. Here's a variation: it could scan various fashion or celeb sites, looking at the color combos and it could tell you how "in" your color choices are.

I wonder if your method would translate into the audio world at all. I know we've got voice-recognition for voice-to-text purposes, but no one's been able to make it work well enough for security purposes. I wonder if anyone has tried a point-by-point or peak-by-peak look-ahead and look-behind method on the waveform analogous to your look-around for the pixels. In other words, one isn't looking specifically at the frequency spectra of someone's voice (which is what I think everyone's tried), but rather at how waveform follows waveform. It's a shot in the dark, but I wonder if anybody's tried it. The old problem of someone recording your voice and using that could be defeated pretty easily, I would think. If the recording is digital, then it has a sampling rate and I bet you could produce some sort of interference pattern by sampling it as it came in. On the other hand, analog recordings have known noise issues which could be used against them.

fornit said...

Will, thank you for your blog devoted to data mining in MATLAB! It raises very interesting questions.
For example, I am very interested in this: what open-source MATLAB code for data mining (knowledge discovery) exists on the Internet?

2. What do you think about the Group Method of Data Handling (GMDH) for classification, prediction, and forecasting tasks?
Can you point to open-source MATLAB implementations of this method?

3. What do you think about ANN applications in data mining?

4. Do you think that high-dimensional models are the answer for poorly formalized plants (phenomena, objects)?


Will Dwinnell said...

I want to thank everyone who has written in response to this posting, both here in Comments and via e-mail. Quite a variety of items were raised, so I will be responding soon via another posting. Keep the feedback coming!

Naveen Sundar G. said...

Nice work!! I agree that working on images is interesting. Here is an interesting work on car identification (given an image, the code outputs the owner's name).
The interesting thing is that they were able to distinguish two almost identical cars.

Naveen Sundar G. said...

I am not able to see the full link in my previous post.

Will Dwinnell said...

Apologies for the absurdly late response to fornit's comments.

For free MATLAB tools and source code for data mining (and machine learning, pattern recognition, etc.), see my Nov-14-2007 posting, Finding MATLAB Source Code And Tools.

I have worked with GMDH outside of MATLAB and gotten good estimation and classification results. I am not aware of any MATLAB GMDH resources. It should be relatively easy to construct a GMDH routine in MATLAB, and I have been meaning to do so for some time.

Regarding neural networks: I think they are like any other tools, with their own particular mix of strengths and weaknesses. It is difficult to characterize "neural networks" as a class too precisely, as there are dozens of distinct architectures which vary widely in behavior. For an exploration of some issues surrounding modeling algorithm selection, see Modeling Methodology 3: Algorithm Selection, an article I published in the Mar/Apr, 1998 issue of PC AI magazine.

I don't follow your last question about high-dimensional problems. Could you elaborate?

Anonymous said...

I don't follow your last question about high-dimensional problems

Use SVD (singular value decomposition), which is a built-in function in MATLAB, for dimensionality reduction. Your other options are:

- PCA (principal component analysis)
- ICA (independent component analysis)
- NNMF (Non-negative matrix factorisation)

There is plenty of free MATLAB code for the algorithms listed above if you Google.
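A minimal sketch of the SVD route in Python/NumPy, equivalent to PCA on centered data (`reduce_dims` is just an illustrative name):

```python
import numpy as np

def reduce_dims(X, k):
    """Project centered data onto its top-k singular directions.
    X is (n, d); the result is (n, k), the PCA scores via SVD."""
    Xc = X - X.mean(axis=0)                            # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # thin SVD
    return Xc @ Vt[:k].T                               # top-k projection
```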

BTW, this is an excellent blog for MATLAB data mining.

Anonymous said...

I found this a very useful introduction to classification - thank you!

Anonymous said...

Respected Sir,
That was a really interesting article. I would like to know how to separately classify trees, water, ground, houses, people, etc. in the same picture using MATLAB.
A Researcher


Unknown said...

Hello Will,
I just came across this blog of yours. Very interesting and intelligent piece of work.
I have a rough idea for my undergraduate project: to scan an X-ray and diagnose a fracture or fault in the bones.
Your pixel classification project seems relevant to mine, if I can use image classification to pick out cracks and faults in bones and show them to the user.
As you are the expert, do you think my theory is worth the try?
Please feel free to contact me at my given e-mail address. Thanks in advance :)

Will Dwinnell said...

SM, I do not see an e-mail address for you, and your Blogger profile is set to 'private'.

Anonymous said...

Hello ~ I just want to ask what I should do to recognize the color of human skin. I want to get the HSV value for every pixel of the human face, then exclude the eye and mouth regions, and use those values as input to train a neural network. Can you help me? Urgent.. thank you ~

my email :