Take a couple words, alter them a bit and you've got a CAPTCHA. You've also got an image which is practically unidentifiable by even the most state of the art algorithms. Image analysis is hard, and even a simple task like distinguishing cats from dogs requires a large amount of graduate level mathematics.
Yet incredible progress has been made on these types of problems. One amazing use of machine vision has been for quality control in manufacturing. For an industry that relies heavily on optimizing automated processes, image analysis has demonstrated extremely promising results, as noted by Sight Machine CEO Jon Sobel in a recent WIRED article.
"The new computer vision, liberated from its hardware shackles and empowered by connectivity, unlimited data storage and Big Data-style statistical analysis, is beginning to change the role of vision in manufacturing. Instead of being a reactive tool to detect defects, computer vision is becoming a data collection tool supporting defect prevention initiatives, improving understanding of complex processes, and enabling greater collaboration across entire supply chains in real time."
Jon Sobel; Liberating Machine Vision From the Machines
Note: If you haven't heard of Sight Machine before, go watch the 2 min video on the homepage. Prepare for awesome.
Machine vision makes contributions to the mars rover, analyzing MRIs, detecting structural inefficiency and energy loss in buildings and neighborhoods (check out Essess), and numerous consumer products. If you're going to buy a video game console in the near future, it's more likely than not to have some sort of image tracking mechanism built right in. Google now allows handwritten input for a large portion of its services, and if you haven't spent an evening Google Goggling everything in your apartment, I'd highly recommend it.
Image source: http://googledrive.blogspot.com/2013/10/handwritingindocs.html
Facebook passed 240 billion photos back in 2012. Instagram reached 16 billion in it's three years as a company. Hubble made a million observations in a decade. Images haven't been left out of the recent data boom, and where there's data, there will always be data scientists ready to do something cool with it.
Unfortunately, for me, machine vision brings up a lot of time spent in a CS lab doing battle with MATLAB licenses. You can imagine how thrilled I was to see this on my timeline:
skimage - image processing in Python | http://t.co/hmxlktPXUW— yhat (@YhatHQ) January 21, 2014
Let's read in some images.
import skimage.io as io %matplotlib inline
mandrill = io.imread('mandrill.png') io.imshow(mandrill) io.show() lenna = io.imread('Lenna.png') io.imshow(lenna) io.show()
Lenna and Mandrill
Mandrill and Lenna are two classic specimens used by researchers in image processing.
I could find out less about Mandrill, unfortunately. But according to a thread on google groups, Mandrill comes from a National Geographic that was lying around the lab and was chosen for it's range of colors. Read More
Why Image Processing?
Emphasizing important traits and diluting noisy ones is the backbone of good feature design. In the context of machine vision, this means that image preprocessing plays a huge role. Before extracting features from an image, it's extremely useful to be able to augment it so that aspects which are important to the machine learning task stand out.
scikit-image holds a wide library of image processing algorithms: filters, transforms, point detection. Frankly, it's wonderful that an open source package like this exists.
Doing pretty displays for this blog will require a little
matplotlib customization. The following
pyplot function takes a list of images and displays them side by side.
import matplotlib.pyplot as plt import numpy as np def show_images(images,titles=None): """Display a list of images""" n_ims = len(images) if titles is None: titles = ['(%d)' % i for i in range(1,n_ims + 1)] fig = plt.figure() n = 1 for image,title in zip(images,titles): a = fig.add_subplot(1,n_ims,n) # Make subplot if image.ndim == 2: # Is image grayscale? plt.gray() # Only place in this blog you can't replace 'gray' with 'grey' plt.imshow(image) a.set_title(title) n += 1 fig.set_size_inches(np.array(fig.get_size_inches()) * n_ims) plt.show()
There are a lot of conventions with which to store colored images in computer memory, but the particular image I've imported uses the common RGB color model, where each pixel holds intensity values for red, green, and blue. In Python images are just
numpy.ndarrays. And though Python normally only allows for a handful of numeric types, images use NumPy's wide array of data types to store each color in a 8 bit unsigned integer. Practically, this means that within the RGB convention, color values are restricted to the range 0 to 255 (\(2^0\) to \(2^8 - 1\)).
Because three color values need to be stored, color images require more than just a \(y\) and \(x\) dimension. Since a third dimension is added for color, to access a particular pixel's value for a RGB image, the following convention is used:
image[ #ycoordinate , #xcoordinate , #red/green/blue ]
# Create an image (10 x 10 pixels) rgb_image = np.zeros(shape=(10,10,3),dtype=np.uint8) # <- unsigned 8 bit int rgb_image[:,:,0] = 255 # Set red value for all pixels rgb_image[:,:,1] = 255 # Set green value for all pixels rgb_image[:,:,2] = 0 # Set blue value for all pixels show_images(images=[rgb_image],titles=["Red plus Green equals..."])
Remember, there's no yellow coming from your computer monitor.
NumPy indexing makes separating out each color layer of an image easy. The slicing operator '
:' can be used to access all values of a specified dimension while a
tuple is used to requests a subset. In order to view a particular color in isolation, I can set the other two values to 0 with these conventions.
red, green, blue = image.copy(), image.copy(), image.copy()
red[:,:,(1,2)] = 0 green[:,:,(0,2)] = 0 blue[:,:,(0,1)] = 0
show_images(images=[red, green, blue], titles=['Red Intensity', 'Green Intensity', 'Blue Intensity']) print 'Note: lighter areas correspond to higher intensities\n'
Note: Lighter areas correspond to higher intensities
If you've been wondering what's wrong with my axes, image conventions dictate that the coordinate \((0,0)\) is in the top left of the image, meaning the \(y\)-axis is reversed from "normal" (blasphemous, I know).
Most image processing algorithms assume a two dimensional matrix, not an image with the third dimension of color. To bring the image into two dimensions, we need to summarize the three colors into a single value. This process is more commonly know as grayscaling, where the resulting image only holds different intensities of gray. The
color has the function to do this (even with your preferred spelling of 'gray'/'grey'). The user needn't worry about how to weight each color when producing gray, there's already a standard for this conversion.
from skimage.color import rgb2gray gray_image = rgb2gray(image) show_images(images=[image,gray_image], titles=["Color","Grayscale"]) print "Colored image shape:\n", image.shape print "Grayscale image shape:\n", gray_image.shape
Colored image shape: (480, 480, 3) Grayscale image shape: (480, 480)
As demonstrated, grayscaling doesn't just alter the color scheme, it fully reduces the dimensionality of the image. So with grayscale I can access a pixel value just by specifying an \((x,y)\) coordinate. Now that the image has been reduced to 2D there are a number of transformations and filters which
Each pixel of a grayscale image holds a float value between \(0\) and \(1\). However, this doesn't mean that a given image will use that entire spectrum, and the distribution will most commonly clump around specific values.
Histogram equalization takes a grayscale image and attempts to distribute intensity more evenly along the range of possible values. Pixels still rank the same (a coordinate with a higher value than another will still have a higher value after the transform), but the image as a whole becomes far more contrasted and normalized.
from skimage.exposure import equalize_hist equalized_image = equalize_hist(gray_image) show_images(images=[gray_image,equalized_image], titles=["Grayscale","Histogram Equalized"])
Drawing the density plots show that the original grayscale image (green density plot) only used intensities between ~\(0.05\) and ~\(0.8\). That's a whole lot of the possible values not being use. The resulting image (orange density plot) is also distributed far more evenly, causing pixels which started quite close in intensity, which were therefore relatively hard to differentiate, to be more distinctly separated.
import pandas as pd from ggplot import * ggplot(pd.DataFrame(),aes(fill=True,alpha=0.5)) + \ geom_density(aes(x=gray_image.flatten()),color='green') + \ geom_density(aes(x=equalized_image.flatten()),color='orange') + \ ggtitle("Histogram Equalization Process\n(From Green to Orange)") + \ xlab("pixel intensity") + \ ylab("density")
Binarizing and Blurring
Sometimes, it helps to simplify an image even further than grayscale by binarizing it. Binarization results in each pixel hold only one of two values (more commonly recognized as a pure black and white image). The object is to separate the foreground from the background to make feature generation even easier. A very simple way of doing this is to just choose a threshold, putting every pixel above that value to 1, and every pixel below it to 0. For this demonstration, I've chosen the mean of the grayscale image. So every pixel above the mean is set to white and those below are set to black. Since
skimage images are just multi dimensional
numpy arrays, a
np.where statement is enough to accomplish this goal.
More complex binarization methods put far more thought into this process than I'm exercising here. For instance, the threshold can be set to a local minimum somewhere near the center of pixel distribution. This makes the splitting a little less arbitrary. Differences in illumination can also confuse a thresholding method, because illuminated foreground objects will hold very different pixel values than those that are not. In those cases, it may be more appropriate to consider local subimages and binarize there rather than interpreting the whole image at once.
binary_image = np.where(gray_image > np.mean(gray_image),1.0,0.0) show_images(images=[gray_image,binary_image], titles=["Grayscale","Binary"])
An issue for the image above is that the binary may capture more detail than is helpful. For instance if the objective is to identify prominent features of the face, the position of every whisker isn't necessary. Blurring the image seems like a reasonable alternative. Scikit-image's Gaussian filter takes a weighted average of surrounding coordinates so individual pixels incorporate local intensities into their own. This produces a pretty recognizable blur effect. Raising the
sigma argument will use a wider "window" of surrounding pixels, resulting in greater blur.
The equalized image (rather than the pure grayscale) has been used to maintain a high level of contrast through the filtering.
from skimage.filter import gaussian_filter blurred_image = gaussian_filter(equalized_image,sigma=3) really_blurred_image = gaussian_filter(equalized_image,sigma=6) show_images(images=[equalized_image,blurred_image,really_blurred_image], titles=["Equalized","3 Sigma Blur","6 Sigma Blur"])
Ultimately, images have to be summarized somehow before feeding them to a machine learning algorithm. Many strategies use point of interest detection, where an algorithm is able to say "Hey there's something special here!" A commonly used flavor of these methods is corner detection. Scikit-image has a whole bunch of algorithms, but I've chosen the Harris method for its simplicity. The Harris algorithm itself doesn't produce corners, but returns a response matrix holding confidence values (the higher the value, the more likely that coordinate is a corner). The
corner_peaks() function then turns those values into a list of coordinates.
pyplot function is necessary to plot corner coordinates on top of an image.
from skimage.feature import corner_harris,corner_peaks # More pyplot! def show_corners(corners,image,title=None): """Display a list of corners overlapping an image""" fig = plt.figure() plt.imshow(image) # Convert coordinates to x and y lists y_corner,x_corner = zip(*corners) plt.plot(x_corner,y_corner,'o') # Plot corners if title: plt.title(title) plt.xlim(0,image.shape) plt.ylim(image.shape,0) # Images use weird axes fig.set_size_inches(np.array(fig.get_size_inches()) * 1.5) plt.show() print "Number of corners:",len(corners) # Make a checker board checkers = np.zeros((100,100),dtype=np.bool) ind = np.arange(100).reshape((10,10))[::2].flatten() checkers[ind,:] = True checkers[:,ind] = np.invert(checkers[:,ind]) checkers = np.where(checkers,1.,0.) # Run Harris checkers_corners = corner_peaks(corner_harris(checkers),min_distance=2) show_corners(checkers_corners,checkers)
Number of corners: 81
As one would hope, corner detection works well with simple examples like the one above. How does the algorithm do with the original image?
corners = corner_peaks(corner_harris(image),min_distance=2) show_corners(corners,image, title="Harris Corner Algorithm")
Number of corners: 108
For certain tasks like image alignment, its best when corner detection picks up prominent and constant meta features like the eyes or noise. However, when running Harris on the original image many of the hairs and whiskers were (rightly) detected as corners. In this case less detail is better, and running Harris on the blurred image seems like a reasonable alternative.
blurred_corners = corner_peaks(corner_harris(blurred_image),min_distance=7) show_corners(blurred_corners,blurred_image, title="Harris Corner Algorithm on Binary Blurred")
Number of corners: 85
Overlapping the original image with the corners detected in the blurred one, it becomes clear that Harris has guessed many more corners near prominent features.
show_corners(blurred_corners,image, title="Blurred Corners Overlapping Original Image")
Number of corners: 85
How The Harris Algorithm Works
Okay, so what just happened?
The Harris algorithm operates by considering gradients. Gradient images display information about the rate of change from one pixel to the next in either the horizontal or vertical directions. So if a pixel has a much higher intensity than the pixel before it, that pixel will have a large gradient value. The Sobel filter is a good way of producing two gradients from a grayscale image.
from skimage.filter import hsobel,vsobel h_gradient = hsobel(gray_image) v_gradient = vsobel(gray_image) show_images(images=[gray_image,v_gradient,h_gradient], titles=["Grayscale","Vertical Gradient","Horizontal Gradient"])
If a high gradient means a high rate of change in pixel intensity, we can consider the three following situations for any given \((x,y)\) coordinate. 1. Low gradient level in both the horizontal and vertical direction. 2. High gradient level in either the horizontal or vertical direction but low in the other. 3. High gradient level in both the horizontal and vertical direction.
For situation (1) it's likely there's nothing of interest at that particular coordinate; the pixel is the same as the ones around it! (2) probably indicates an edge because the gradient corresponds to a lot of change in one direction but not the other. Therefore (3) would mean the meeting of two edges, which certainly sounds like a corner.
Alas, though the Harris corner algorithm is one of the simplest corner detectors, it uses quite complicated linear algebra to achieve this. Though I'd love to show off my \(\LaTeX\) skills, instead what I'll do is pick coordinates with large values from each axis gradient and see which ones appear in both images. Just to be clear: this is an example for intuition, a pretty atrocious approximation, and if one of my old professors comes asking you didn't hear this from me.
top_n = 4000 # Take coordinates of top n "largest" derivative values from each image h_large = zip(*np.where(h_gradient > np.sort(h_gradient.flatten())[-top_n])) v_large = zip(*np.where(v_gradient > np.sort(v_gradient.flatten())[-top_n])) # Intersection of coordinates estimated_corners = list(set(h_large) & set(v_large)) show_corners(estimated_corners,gray_image)
Number of corners: 161
Once these point of interests have been generated, they are used very differently depending on the task. For aligning two images or stitching several together (like in a panorama) simple least squares optimization of the points can be sufficient. For image classification, more advanced summarizations are used. Though I won't be going over them in this blog, there are several links below if you're feeling ambitious!