Convolutions¶
Last time, we learned how to represent images in Python with numpy. In this lesson, we'll learn about convolutions, specifically on image data, and how to implement them. By the end of this lesson, students will be able to:
- Define convolutions in terms of the sliding window algorithm.
- Recognize common kernels for image convolution.
- Apply convolutions to recognize objects in an image.
import imageio.v3 as iio
import matplotlib.pyplot as plt
import numpy as np
def compare(*images):
    """Display one or more images side by side."""
    _, axs = plt.subplots(1, len(images), figsize=(15, 6))
    # np.atleast_1d handles the single-image case, where subplots returns one Axes instead of an array
    for ax, image in zip(np.atleast_1d(axs), images):
        ax.imshow(image, cmap="gray")
    plt.show()
Convolutions as mathematical operators¶
Mathematically, a convolution is an operator that takes two functions ($f$ and $g$) as input and produces a third function ($f * g$) as output. In this example, the function $f$ is our input image, Dubs!
# Read image as grayscale
dubs = iio.imread("dubs.jpg", mode="L")
plt.imshow(dubs, cmap="gray");
# f(0, 0)
dubs[0, 0]
238
If $f$ represents our input image, what is the role of $g$? Let's watch a clip from 3Blue1Brown (Grant Sanderson).
%%html
<iframe width="640" height="360" src="https://www.youtube-nocookie.com/embed/KuXjwB4LzSA?start=539&end=588" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
In this example, $g$ is a 3-by-3 box blur kernel that is applied to each patch or subimage in $f$.
$$g = \begin{bmatrix} \frac{1}{9} & \frac{1}{9} & \frac{1}{9}\\[0.3em] \frac{1}{9} & \frac{1}{9} & \frac{1}{9}\\[0.3em] \frac{1}{9} & \frac{1}{9} & \frac{1}{9} \end{bmatrix}$$
A convolution is like a special way of looping over the pixels in an image. Instead of looping over an image one pixel at a time, a convolution loops over an image one subimage (patch) at a time. Convolutions are often called "sliding window algorithms" because they start at the top-left corner, generate the first subimage there, slide one pixel to the right, and repeat until every subimage has been generated.
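To make the sliding window concrete, here's a minimal sketch (not part of the lesson code; the tiny array and window size are chosen arbitrarily) that prints every position a 3-by-3 window visits in a 4-by-5 array.
tiny = np.arange(20).reshape(4, 5)  # a small 4-by-5 "image"
kernel_h, kernel_w = 3, 3
# Slide the window one pixel at a time: left to right, then top to bottom
for y in range(tiny.shape[0] - kernel_h + 1):
    for x in range(tiny.shape[1] - kernel_w + 1):
        patch = tiny[y:y + kernel_h, x:x + kernel_w]
        print(f"patch at (y={y}, x={x}):")
        print(patch)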
Here's how we can create this kernel in Python:
np.ones((3, 3)) / 9
array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
box_blur_3x3 = 1 / 9 * np.ones((3, 3))
box_blur_3x3
array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
Practice: Get subimages¶
Uncomment the line of code in the body of the loop that will correctly slice the subimage matching the given kernel shape.
def get_subimages(image, kernel_shape):
    """Returns an array of subimages matching the given kernel shape."""
    image_h, image_w = image.shape
    kernel_h, kernel_w = kernel_shape  # kernel.shape == subimage.shape
    # With a 3-by-3 kernel, this version discards the first and last row and column
    # Number of rows (subimages_h): image_h - kernel_h + 1
    subimages_h = image_h - kernel_h + 1
    # Number of cols (subimages_w): image_w - kernel_w + 1
    subimages_w = image_w - kernel_w + 1
    # Our resulting subimages array that will be returned
    # For each row + col, we'll have a subimage where kernel.shape == subimage.shape
    subimages = np.zeros((subimages_h, subimages_w, kernel_h, kernel_w))
    for y in range(subimages_h):
        for x in range(subimages_w):
            # The commented-out options below either swap x and y or add an extra +1
            # subimages[y, x] = image[x:x + kernel_w, y:y + kernel_h]
            # subimages[y, x] = image[x:x + kernel_w + 1, y:y + kernel_h + 1]
            subimages[y, x] = image[y:y + kernel_h, x:x + kernel_w]
            # subimages[y, x] = image[y:y + kernel_h + 1, x:x + kernel_w + 1]
    return subimages
dubs_eye = dubs[50:100, 300:350]
dubs_eye_subimages = get_subimages(dubs_eye, (3, 3))
compare(dubs_eye_subimages[(30, 20)], dubs_eye[30:33, 20:23])
assert np.allclose(dubs_eye_subimages[(30, 20)], dubs_eye[30:33, 20:23])
dubs_eye.shape
(50, 50)
dubs_eye_subimages.shape
(48, 48, 3, 3)
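As a sanity check (not required for this lesson), NumPy 1.20+ provides np.lib.stride_tricks.sliding_window_view, which produces the same array of subimages without writing the loops ourselves:
# Requires NumPy 1.20 or later
windows = np.lib.stride_tricks.sliding_window_view(dubs_eye, (3, 3))
windows.shape  # (48, 48, 3, 3), same as dubs_eye_subimages.shape
assert np.allclose(windows, dubs_eye_subimages)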
Practice: Sum of element-wise products¶
Write a NumPy expression to compute the sum of element-wise products between the current subimage and the given kernel.
def convolve(image, kernel):
    """Returns the convolution of the image and kernel, where the result shrinks by the kernel shape."""
    subimages = get_subimages(image, kernel.shape)
    subimages_h, subimages_w = subimages.shape[:2]
    result = np.zeros((subimages_h, subimages_w))
    for y in range(subimages_h):
        for x in range(subimages_w):
            # np.sum(ndarray) or ndarray.sum() gives the sum of all values in an array
            # kernel and subimages[y, x] have the same shape, so * gives the element-wise products
            result[y, x] = (kernel * subimages[y, x]).sum()
    return result
dubs_blurred = convolve(dubs, box_blur_3x3)
compare(dubs, dubs_blurred)
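If SciPy is available, we can cross-check our implementation against scipy.signal.correlate2d. (Strictly speaking, our convolve computes a cross-correlation because it never flips the kernel, but for a symmetric kernel like the box blur the two are identical.)
from scipy.signal import correlate2d
# mode="valid" keeps only the positions where the kernel fits entirely inside the image,
# matching the shrinking behavior of our convolve function
assert np.allclose(correlate2d(dubs, box_blur_3x3, mode="valid"), dubs_blurred)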
What will the result look like if we make the kernel larger? How about a 10-by-10 kernel instead of a 3-by-3 kernel?
box_blur_10x10 = np.ones((10, 10))
box_blur_10x10 /= box_blur_10x10.size
box_blur_10x10
array([[0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
       [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])
The entries in this kernel sum to 1 so that each output pixel is an average of the pixels in its neighborhood, keeping the overall brightness of the image the same. 10-by-10 is a somewhat awkward shape since there is no longer an exact center pixel, but the convolution code still works.
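A quick check that the kernel entries really do sum to 1 (up to floating-point rounding):
assert np.isclose(box_blur_10x10.sum(), 1.0)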
dubs_blurred_10x10 = convolve(dubs, box_blur_10x10)
compare(dubs, dubs_blurred_10x10)
dubs_blurred_10x10
array([[239.94, 239.97, 239.99, ..., 231.81, 231.78, 231.83],
       [239.98, 239.99, 239.99, ..., 231.87, 231.83, 231.86],
       [240.01, 240.  , 239.99, ..., 231.93, 231.89, 231.9 ],
       ...,
       [240.  , 240.  , 240.  , ..., 236.88, 236.86, 236.84],
       [240.  , 240.  , 240.  , ..., 236.82, 236.79, 236.76],
       [240.  , 240.  , 240.  , ..., 236.76, 236.72, 236.68]])
Understanding kernels¶
What other kernels exist? Let's try to deduce what a kernel might do to an image.
This kernel has 0s everywhere except for a 1 at the center.
kernel = np.zeros((3, 3))
# Identity kernel
kernel[1, 1] = 1
kernel
array([[0., 0., 0.],
       [0., 1., 0.],
       [0., 0., 0.]])
compare(dubs, convolve(dubs, kernel))
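As expected, the identity kernel leaves every pixel unchanged; the result is just two pixels smaller in each dimension because the border is trimmed. A quick check:
# The convolved result equals the original image with its one-pixel border removed
assert np.allclose(convolve(dubs, kernel), dubs[1:-1, 1:-1])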
This kernel is computed by doubling the previous (identity) kernel and subtracting the box blur kernel.
# Set up the previous kernel (identity kernel)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1
# Double the identity kernel and subtract the box blur kernel
# This is called the sharpen kernel
# Taking 2 times the identity minus the box blur keeps the entries summing to 1,
# so the output stays in roughly the same brightness range
kernel = 2 * kernel - box_blur_3x3
kernel
array([[-0.11111111, -0.11111111, -0.11111111],
       [-0.11111111,  1.88888889, -0.11111111],
       [-0.11111111, -0.11111111, -0.11111111]])
compare(dubs, convolve(dubs, kernel))
The picture that we generated looks a little darker because the sharpened values can fall outside the original [0, 255] range, and Matplotlib rescales the grayscale colormap to the new minimum and maximum values.
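One way to view the sharpened image on the original brightness scale (a sketch, not part of the lesson code; dubs_sharpened is a name introduced here) is to clip the values back into [0, 255] and tell Matplotlib not to rescale:
dubs_sharpened = convolve(dubs, kernel)  # kernel is still the sharpen kernel here
# Clip out-of-range values and fix the color scale so 0 maps to black and 255 to white
plt.imshow(np.clip(dubs_sharpened, 0, 255), cmap="gray", vmin=0, vmax=255);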
This 10-by-10 kernel places a bell curve around the center, with values shrinking the farther you get from the center.
values = [[1, 2, 4, 7, 11, 11, 7, 4, 2, 1]]
kernel = np.array(values) * np.array(values).T
# Gaussian blur kernel
kernel = kernel / kernel.sum()
# Show values rounded to two decimal places
kernel.round(2)
array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.01, 0.01, 0.01, 0.01, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.01, 0.01, 0.02, 0.02, 0.01, 0.01, 0.  , 0.  ],
       [0.  , 0.01, 0.01, 0.02, 0.03, 0.03, 0.02, 0.01, 0.01, 0.  ],
       [0.  , 0.01, 0.02, 0.03, 0.05, 0.05, 0.03, 0.02, 0.01, 0.  ],
       [0.  , 0.01, 0.02, 0.03, 0.05, 0.05, 0.03, 0.02, 0.01, 0.  ],
       [0.  , 0.01, 0.01, 0.02, 0.03, 0.03, 0.02, 0.01, 0.01, 0.  ],
       [0.  , 0.  , 0.01, 0.01, 0.02, 0.02, 0.01, 0.01, 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.01, 0.01, 0.01, 0.01, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])
compare(dubs, convolve(dubs, kernel))
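The integer values above are a convenient approximation of a Gaussian (bell curve). For reference, here's a sketch (not part of the lesson code) of building a Gaussian kernel directly from the formula $e^{-(x^2 + y^2) / (2\sigma^2)}$, with the kernel size and $\sigma$ chosen arbitrarily:
def gaussian_kernel(size, sigma):
    """Returns a size-by-size Gaussian kernel with standard deviation sigma, normalized to sum to 1."""
    # Coordinates centered on the middle of the kernel
    coords = np.arange(size) - (size - 1) / 2
    x, y = np.meshgrid(coords, coords)
    kernel = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

compare(dubs, convolve(dubs, gaussian_kernel(10, 2)))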
Practice: Template match¶
Write a function template_match that finds instances of a small template inside a larger image. Given an image and a template that can be found within it, template_match returns an array of values representing the pixel-by-pixel similarity between the template and each corresponding subimage.
This algorithm essentially applies a convolution over the image using the template as the kernel. However, using the raw pixel values would favor brighter regions of the image regardless of how well they match the template. Instead, we first de-mean both the template and the current subimage by subtracting their average pixel values.
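In other words, for each position $(y, x)$, the similarity score is the sum of element-wise products of the de-meaned subimage and the de-meaned template:
$$\text{result}[y, x] = \sum_{i,j} \left(\text{subimage}[i, j] - \overline{\text{subimage}}\right)\left(\text{template}[i, j] - \overline{\text{template}}\right)$$
where the bars denote mean pixel values.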
def template_match(image, template):
    """
    Given a grayscale image and template, returns a numpy array that stores the similarity of the
    template at each position in the image.
    """
    image_h, image_w = image.shape
    template_h, template_w = template.shape
    # De-mean the template
    demeaned_template = template - template.mean()
    # Construct result of the expected output size
    result = np.zeros((image_h - template_h + 1, image_w - template_w + 1))
    result_h, result_w = result.shape
    for y in range(result_h):
        for x in range(result_w):
            # Select the corresponding subimage
            subimage = image[y:y + template_h, x:x + template_w]
            # De-mean the subimage
            demeaned_subimage = subimage - subimage.mean()
            # Compute the sum of element-wise products
            similarity = (demeaned_subimage * demeaned_template).sum()
            result[y, x] = similarity
    return result
match = template_match(dubs, dubs_eye)
compare(dubs, match)
np.unravel_index(match.argmax(), match.shape)
(50, 300)
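The brightest point in the similarity map is at row 50, column 300, exactly where we sliced dubs_eye out of the original image. As a closing sketch (not part of the lesson code), we can draw a box around the best match:
import matplotlib.patches as patches

best_y, best_x = np.unravel_index(match.argmax(), match.shape)
fig, ax = plt.subplots(figsize=(8, 6))
ax.imshow(dubs, cmap="gray")
# Rectangle takes the (x, y) of the top-left corner, then the width and height
ax.add_patch(patches.Rectangle((best_x, best_y), dubs_eye.shape[1], dubs_eye.shape[0],
                               fill=False, edgecolor="red", linewidth=2))
plt.show()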