Take a couple of words, alter them a bit, and you've got a CAPTCHA. You've also got an image that's practically unidentifiable by even the most state-of-the-art algorithms. Image analysis is hard, and even a simple task like distinguishing cats from dogs requires a large amount of graduate-level mathematics.
Yet incredible progress has been made on these types of problems. One amazing use of machine vision has been for quality control in manufacturing. For an industry that relies heavily on optimizing automated processes, image analysis has demonstrated extremely promising results, as noted by Sight Machine CEO Jon Sobel in a recent WIRED article.
"The new computer vision, liberated from its hardware shackles and empowered by connectivity, unlimited data storage and Big Data-style statistical analysis, is beginning to change the role of vision in manufacturing. Instead of being a reactive tool to detect defects, computer vision is becoming a data collection tool supporting defect prevention initiatives, improving understanding of complex processes, and enabling greater collaboration across entire supply chains in real time."
Jon Sobel; Liberating Machine Vision From the Machines
Note: If you haven't heard of Sight Machine before, go watch the 2 min video on the homepage. Prepare for awesome.
Machine vision contributes to the Mars rover, analyzing MRIs, detecting structural inefficiency and energy loss in buildings and neighborhoods (check out Essess), and numerous consumer products. If you're going to buy a video game console in the near future, it's more likely than not to have some sort of image-tracking mechanism built right in. Google now allows handwritten input for a large portion of its services, and if you haven't spent an evening Google Goggling everything in your apartment, I'd highly recommend it.
Image source: http://googledrive.blogspot.com/2013/10/handwritingindocs.html
Facebook passed 240 billion photos back in 2012. Instagram reached 16 billion in its three years as a company. Hubble made a million observations in a decade. Images haven't been left out of the recent data boom, and where there's data, there will always be data scientists ready to do something cool with it.
Unfortunately, for me, machine vision brings up memories of a lot of time spent in a CS lab doing battle with MATLAB licenses. You can imagine how thrilled I was to see this on my timeline:
skimage - image processing in Python | http://t.co/hmxlktPXUW— yhat (@YhatHQ) January 21, 2014
Let's read in some images.
import skimage.io as io
%matplotlib inline
mandrill = io.imread('mandrill.png')
io.imshow(mandrill)
io.show()

lenna = io.imread('Lenna.png')
io.imshow(lenna)
io.show()
Lenna and Mandrill
Mandrill and Lenna are two classic specimens used by researchers in image processing.
I could find less about Mandrill, unfortunately. But according to a thread on Google Groups, the Mandrill image comes from a National Geographic magazine that was lying around the lab and was chosen for its range of colors. Read More
Why Image Processing?
Emphasizing important traits and diluting noisy ones is the backbone of good feature design. In the context of machine vision, this means that image preprocessing plays a huge role. Before extracting features from an image, it's extremely useful to be able to augment it so that aspects which are important to the machine learning task stand out.
scikit-image holds a wide library of image processing algorithms: filters, transforms, point detection. Frankly, it's wonderful that an open source package like this exists.
Doing pretty displays for this blog will require a little matplotlib customization. The following pyplot function takes a list of images and displays them side by side.
import matplotlib.pyplot as plt
import numpy as np

def show_images(images, titles=None):
    """Display a list of images"""
    n_ims = len(images)
    if titles is None:
        titles = ['(%d)' % i for i in range(1, n_ims + 1)]
    fig = plt.figure()
    n = 1
    for image, title in zip(images, titles):
        a = fig.add_subplot(1, n_ims, n)  # Make subplot
        if image.ndim == 2:  # Is image grayscale?
            plt.gray()  # Only place in this blog you can't replace 'gray' with 'grey'
        plt.imshow(image)
        a.set_title(title)
        n += 1
    fig.set_size_inches(np.array(fig.get_size_inches()) * n_ims)
    plt.show()
There are a lot of conventions for storing colored images in computer memory, but the particular image I've imported uses the common RGB color model, where each pixel holds intensity values for red, green, and blue. In Python, images are just numpy.ndarrays. And though Python normally only allows for a handful of numeric types, images use NumPy's wide array of data types to store each color in an 8-bit unsigned integer. Practically, this means that within the RGB convention, color values are restricted to the range 0 to 255 (\(0\) to \(2^8 - 1\)).
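To make that 0-to-255 range concrete, here's a minimal sketch using just NumPy (none of the image data above is needed):

```python
import numpy as np

# An 8-bit unsigned integer can hold exactly the values 0 through 255
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# Beware: arithmetic on uint8 wraps around (modulo 256) rather than
# clipping, so naive brightness adjustments can produce surprises
wrapped = np.uint8(200) + np.uint8(100)
print(wrapped)  # 44, not 300 (and not 255)
```

That wraparound behavior is a classic source of bugs when adding or subtracting pixel values directly.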
Because three color values need to be stored, color images require more than just \(y\) and \(x\) dimensions. A third dimension is added for color, so to access a particular pixel's value in an RGB image, the following convention is used:
image[ #ycoordinate , #xcoordinate , #red/green/blue ]
# Create an image (10 x 10 pixels)
rgb_image = np.zeros(shape=(10,10,3), dtype=np.uint8)  # <- unsigned 8 bit int

rgb_image[:,:,0] = 255  # Set red value for all pixels
rgb_image[:,:,1] = 255  # Set green value for all pixels
rgb_image[:,:,2] = 0    # Set blue value for all pixels

show_images(images=[rgb_image], titles=["Red plus Green equals..."])
Remember, there's no yellow coming from your computer monitor.
NumPy indexing makes separating out each color layer of an image easy. The slicing operator ':' can be used to access all values of a specified dimension, while a tuple is used to request a subset. Using these conventions, I can view a particular color in isolation by setting the other two values to 0.
red, green, blue = image.copy(), image.copy(), image.copy()
red[:,:,(1,2)] = 0
green[:,:,(0,2)] = 0
blue[:,:,(0,1)] = 0
show_images(images=[red, green, blue],
            titles=['Red Intensity', 'Green Intensity', 'Blue Intensity'])
print 'Note: lighter areas correspond to higher intensities\n'
Note: Lighter areas correspond to higher intensities
If you've been wondering what's wrong with my axes, image conventions dictate that the coordinate \((0,0)\) is in the top left of the image, meaning the \(y\)-axis is reversed from "normal" (blasphemous, I know).
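A minimal sketch of that convention, assuming only NumPy: the first index walks down rows from the top, and the second walks across columns.

```python
import numpy as np

# A 3x3 grayscale image with a single bright pixel
img = np.zeros((3, 3), dtype=np.uint8)
img[0, 2] = 255  # row 0 (the TOP row), column 2 (the rightmost)

print(img)
# [[  0   0 255]
#  [  0   0   0]
#  [  0   0   0]]
```

Displayed with imshow, that pixel lands in the top-right corner: row index 0 is the top of the image, not the bottom.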
Most image processing algorithms assume a two-dimensional matrix, not an image with a third dimension for color. To bring the image into two dimensions, we need to summarize the three colors into a single value. This process is more commonly known as grayscaling, where the resulting image only holds different intensities of gray. The color module has a function to do this (even with your preferred spelling of 'gray'/'grey'). The user needn't worry about how to weight each color when producing gray; there's already a standard for this conversion.
from skimage.color import rgb2gray

gray_image = rgb2gray(image)
show_images(images=[image, gray_image], titles=["Color", "Grayscale"])

print "Colored image shape:\n", image.shape
print "Grayscale image shape:\n", gray_image.shape
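For the curious, here's a sketch of the weighted sum behind that standard. The constants below are the ITU-R 709 luminance weights that scikit-image's rgb2gray documents at the time of writing; treat the exact values as an assumption and check the skimage docs for your version.

```python
import numpy as np

# Luminance weights: green dominates because the eye is most sensitive
# to it; blue contributes least. (Assumed ITU-R 709 values, as used by
# scikit-image's rgb2gray at the time of writing.)
weights = np.array([0.2125, 0.7154, 0.0721])

# A 1x2 float RGB image: one pure-red pixel, one pure-white pixel
rgb = np.array([[[1.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0]]])

# Weighted sum over the color axis collapses (y, x, 3) down to (y, x)
gray = np.dot(rgb, weights)
print(gray.shape)  # (1, 2)
print(gray)        # red pixel -> 0.2125, white pixel -> 1.0
```

Notice the shape change: the color dimension disappears, which is exactly the difference the two print statements above report.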