Neural Networks¶
In this lesson, we'll learn how to train two kinds of artificial neural networks to detect handwritten digits in an image. By the end of this lesson, students will be able to:
- Identify neural network model parameters and hyperparameters.
- Determine the number of weights and biases in a multilayer perceptron and convolutional neural network.
- Explain how the layers of a convolutional neural network extract information from an input image.
First, let's watch the beginning of 3Blue1Brown's introduction to neural networks while we wait for the imports and dataset to load. We'll also later explore the TensorFlow Playground to learn more about neural networks from the perspective of linear models.
%%html
<iframe width="640" height="360" src="https://www.youtube-nocookie.com/embed/aircAruvnKk?start=163&end=331" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
For this lesson, we'll re-examine machine learning algorithms from scikit-learn. We'll also later use keras, a machine learning library designed specifically for building complex neural networks.
!pip install -q keras tensorflow-cpu
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
import matplotlib.pyplot as plt
We'll be working with the MNIST dataset, which is composed of 70,000 images of handwritten digits, of which 60,000 were written by employees of the U.S. Census Bureau and 10,000 by U.S. high school students.
In the video clip above, we saw how to transform an image from a 28-by-28 square to a 784-length vector that takes each of the 28 rows of 28-wide pixels and arranges them side-by-side in a line. This process flattens the image from 2 dimensions to 1 dimension.
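To make the flattening step concrete, here's a small standalone NumPy sketch (separate from the dataset we're about to load) showing how row-major reshaping lays the 28 rows side-by-side:

import numpy as np

# A stand-in image where entry (i, j) encodes its own row-major position
image = np.arange(28 * 28).reshape(28, 28)

# Flattening arranges the 28 rows side-by-side into one 784-length vector
flat = image.reshape(784)
assert (flat[28:56] == image[1]).all()  # row 1 occupies positions 28 through 55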
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, parser="auto")
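# Scale pixel values from [0, 255] down to [0, 1]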
X = X / 255
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X
 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | pixel10 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
69995 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69996 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69997 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69998 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69999 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
70000 rows × 784 columns
Because the MNIST dataset already comes flattened, if we want to display any one image, we need to reshape it back to a 28-by-28 square.
plt.imshow(X.loc[0].to_numpy().reshape(28, 28), cmap="gray")
<matplotlib.image.AxesImage at 0x7ece013ec910>
Multilayer perceptrons¶
To create a neural network, scikit-learn provides an MLPClassifier, or multilayer perceptron classifier, that can be used to match the video example with two hidden layers of 16 neurons each. While we wait for the training to complete, let's watch the rest of the video.
%%html
<iframe width="640" height="360" src="https://www.youtube-nocookie.com/embed/aircAruvnKk?start=332&end=806" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
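Before the results come back, we can count this network's parameters by hand: each neuron learns one weight per input plus one bias. Here is that arithmetic for the 784 → 16 → 16 → 10 architecture (the same total appears in the Keras summary for this architecture later in the lesson).

# Weights: one per connection between consecutive layers (784 -> 16 -> 16 -> 10)
weights = 784 * 16 + 16 * 16 + 16 * 10  # 12,544 + 256 + 160 = 12,960
# Biases: one per neuron in each hidden and output layer
biases = 16 + 16 + 10                   # 42
print(weights + biases)                 # 13,002 total parameters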
mlp_16x16 = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=50, verbose=True)
%time mlp_16x16.fit(X_train, y_train)
mlp_16x16.score(X_test, y_test)
Iteration 1, loss = 0.92057592 Iteration 2, loss = 0.29722795 Iteration 3, loss = 0.23756933 Iteration 4, loss = 0.21112271 Iteration 5, loss = 0.19422838 Iteration 6, loss = 0.18225105 Iteration 7, loss = 0.17324423 Iteration 8, loss = 0.16501392 Iteration 9, loss = 0.15857895 Iteration 10, loss = 0.15230066 Iteration 11, loss = 0.14720966 Iteration 12, loss = 0.14277418 Iteration 13, loss = 0.13936616 Iteration 14, loss = 0.13408516 Iteration 15, loss = 0.13174061 Iteration 16, loss = 0.12817753 Iteration 17, loss = 0.12503233 Iteration 18, loss = 0.12311539 Iteration 19, loss = 0.12014867 Iteration 20, loss = 0.11857385 Iteration 21, loss = 0.11512187 Iteration 22, loss = 0.11338866 Iteration 23, loss = 0.11157962 Iteration 24, loss = 0.10960935 Iteration 25, loss = 0.10812857 Iteration 26, loss = 0.10606686 Iteration 27, loss = 0.10418295 Iteration 28, loss = 0.10334225 Iteration 29, loss = 0.10256605 Iteration 30, loss = 0.10004637 Iteration 31, loss = 0.09941234 Iteration 32, loss = 0.09920379 Iteration 33, loss = 0.09666938 Iteration 34, loss = 0.09612929 Iteration 35, loss = 0.09478655 Iteration 36, loss = 0.09375015 Iteration 37, loss = 0.09305731 Iteration 38, loss = 0.09080475 Iteration 39, loss = 0.09038090 Iteration 40, loss = 0.08853457 Iteration 41, loss = 0.08850181 Iteration 42, loss = 0.08687455 Iteration 43, loss = 0.08565323 Iteration 44, loss = 0.08496024 Iteration 45, loss = 0.08425925 Iteration 46, loss = 0.08292657 Iteration 47, loss = 0.08190950 Iteration 48, loss = 0.08285384 Iteration 49, loss = 0.08134182 Iteration 50, loss = 0.08066025 CPU times: user 3min 16s, sys: 13min 49s, total: 17min 5s Wall time: 4min 17s
/opt/conda/lib/python3.10/site-packages/sklearn/neural_network/_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (50) reached and the optimization hasn't converged yet. warnings.warn(
0.9509285714285715
Neural networks are highly sensitive to hyperparameter values such as the width and depth of hidden layers. Other hyperparameters, like the initial learning rate for gradient descent, can also affect training efficacy. Early stopping monitors accuracy on a held-out validation set (rather than training set loss) to determine when to stop training; a sketch of this logic appears below.
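Here is a minimal sketch of that early stopping logic, with hypothetical train_one_epoch and validation_accuracy helpers standing in for scikit-learn's internals:

import random

def train_one_epoch():      # hypothetical stand-in for one pass of gradient descent
    pass

def validation_accuracy():  # hypothetical stand-in for scoring the held-out split
    return random.random()

tol, n_iter_no_change, max_iter = 1e-4, 10, 200  # scikit-learn's defaults
best_score, no_improvement = -float("inf"), 0
for epoch in range(max_iter):
    train_one_epoch()
    score = validation_accuracy()
    if score > best_score + tol:
        no_improvement = 0   # meaningful improvement: reset the counter
    else:
        no_improvement += 1
    best_score = max(best_score, score)
    if no_improvement >= n_iter_no_change:
        break                # validation accuracy has plateaued, so stop training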
mlp_40 = MLPClassifier(hidden_layer_sizes=(40,), learning_rate_init=0.001, early_stopping=True, verbose=True)
%time mlp_40.fit(X_train, y_train)
mlp_40.score(X_test, y_test)
Iteration 1, loss = 0.60359422 Validation score: 0.913214 Iteration 2, loss = 0.27752123 Validation score: 0.929821 Iteration 3, loss = 0.23023978 Validation score: 0.936786 Iteration 4, loss = 0.20003422 Validation score: 0.941964 Iteration 5, loss = 0.17619242 Validation score: 0.946429 Iteration 6, loss = 0.15717317 Validation score: 0.947143 Iteration 7, loss = 0.14302481 Validation score: 0.952679 Iteration 8, loss = 0.13088789 Validation score: 0.955714 Iteration 9, loss = 0.12137246 Validation score: 0.957679 Iteration 10, loss = 0.11299235 Validation score: 0.959107 Iteration 11, loss = 0.10611967 Validation score: 0.958036 Iteration 12, loss = 0.09999652 Validation score: 0.960357 Iteration 13, loss = 0.09289065 Validation score: 0.962321 Iteration 14, loss = 0.08751306 Validation score: 0.961250 Iteration 15, loss = 0.08340278 Validation score: 0.959643 Iteration 16, loss = 0.07910504 Validation score: 0.963214 Iteration 17, loss = 0.07509801 Validation score: 0.963036 Iteration 18, loss = 0.07037412 Validation score: 0.963214 Iteration 19, loss = 0.06697048 Validation score: 0.963214 Iteration 20, loss = 0.06465216 Validation score: 0.965357 Iteration 21, loss = 0.06047192 Validation score: 0.964464 Iteration 22, loss = 0.05835036 Validation score: 0.963036 Iteration 23, loss = 0.05459474 Validation score: 0.963929 Iteration 24, loss = 0.05188749 Validation score: 0.965714 Iteration 25, loss = 0.05021234 Validation score: 0.963750 Iteration 26, loss = 0.04813498 Validation score: 0.966786 Iteration 27, loss = 0.04548930 Validation score: 0.964643 Iteration 28, loss = 0.04357784 Validation score: 0.965000 Iteration 29, loss = 0.04084137 Validation score: 0.965536 Iteration 30, loss = 0.03956452 Validation score: 0.964464 Iteration 31, loss = 0.03745647 Validation score: 0.965357 Iteration 32, loss = 0.03611854 Validation score: 0.964643 Iteration 33, loss = 0.03410587 Validation score: 0.966964 Iteration 34, loss = 0.03331220 Validation score: 0.964643 Iteration 35, loss = 0.03142496 Validation score: 0.965536 Iteration 36, loss = 0.02974230 Validation score: 0.965714 Iteration 37, loss = 0.02897044 Validation score: 0.967500 Iteration 38, loss = 0.02705448 Validation score: 0.965357 Iteration 39, loss = 0.02548154 Validation score: 0.965893 Iteration 40, loss = 0.02482754 Validation score: 0.966071 Iteration 41, loss = 0.02299959 Validation score: 0.966786 Iteration 42, loss = 0.02222078 Validation score: 0.968393 Iteration 43, loss = 0.02150016 Validation score: 0.967500 Iteration 44, loss = 0.02063204 Validation score: 0.966786 Iteration 45, loss = 0.01967113 Validation score: 0.967143 Iteration 46, loss = 0.01864624 Validation score: 0.967500 Iteration 47, loss = 0.01752543 Validation score: 0.966786 Iteration 48, loss = 0.01693803 Validation score: 0.964286 Iteration 49, loss = 0.01614957 Validation score: 0.966607 Iteration 50, loss = 0.01528624 Validation score: 0.967500 Iteration 51, loss = 0.01470098 Validation score: 0.968393 Iteration 52, loss = 0.01385476 Validation score: 0.966786 Iteration 53, loss = 0.01303875 Validation score: 0.967500 Validation score did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping. CPU times: user 2min 45s, sys: 9min 55s, total: 12min 40s Wall time: 3min 11s
0.9647142857142857
We can also visualize MLP weights (coefficients) on MNIST. These 28-by-28 images represent the input weights for each of the 40 neurons in the hidden layer of this network.
fig, axs = plt.subplots(nrows=4, ncols=10, figsize=(12.5, 5))
# Constrain plots to the same scale (divided by 2 for better display)
vmin, vmax = mlp_40.coefs_[0].min() / 2, mlp_40.coefs_[0].max() / 2
for ax, coef in zip(axs.ravel(), mlp_40.coefs_[0].T):
    weights = coef.reshape(28, 28)
    ax.matshow(weights, vmin=vmin, vmax=vmax)
    ax.set_axis_off()
Convolutional neural networks¶
In the 3Blue1Brown video, we examined how a single neuron could serve as an edge detector. But in a plain multilayer perceptron, neurons are linked directly to specific inputs (or preceding hidden layers), so they are location-sensitive. The MNIST dataset was constructed by centering each digit individually in the middle of the box. In the real world, we might not have such perfectly arranged image data, particularly when we want to identify real-world objects in a complex scene (which is probably harder than identifying handwritten digits centered on a black background).
Convolutional neural networks take the idea of a neural network and apply it to learning the weights in a convolution kernel.
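To see what "learning a kernel" means, here is a hand-rolled NumPy sketch of a single convolution. The vertical edge detector kernel here is hand-picked for illustration; a Conv2D layer would instead learn its 9 weights (plus a bias) by gradient descent.

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1)
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the weighted sum of one image patch
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])  # responds strongly to vertical edges
image = np.zeros((28, 28))
image[:, 14:] = 1.0              # bright right half: one vertical edge down the middle
print(convolve2d(image, kernel).shape)  # (26, 26), like a Conv2D without padding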
The following example, courtesy of François Chollet (the original author of Keras), shows how to load in the MNIST dataset using Keras.
import keras
from keras import layers, models
import matplotlib.pyplot as plt
import numpy as np
# Load the data as (N, 28, 28) images split into 60,000 training and 10,000 test images
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
# Scale image values from [0, 255] to [0, 1]
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255
# Add an extra dimension to each image (28, 28, 1) as Keras requires at least 1 "color" channel
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)
input_shape = (28, 28, 1)
assert X_train.shape[1:] == input_shape and X_test.shape[1:] == input_shape
# Convert a class vector (integers) to binary class matrix
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
# Display an image without any need to reshape
plt.imshow(X_train[0], cmap="gray")
2024-05-17 19:32:07.196716: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
<matplotlib.image.AxesImage at 0x7ad6ebc5f0d0>
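The to_categorical call above one-hot encodes each integer label as a 10-element vector, which matches the 10 output neurons of the network we're about to build:

# Each digit label becomes a vector with a 1 in the corresponding position
print(keras.utils.to_categorical([3, 5], num_classes=10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]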
The Keras Sequential model allows us to specify the exact sequence of layers and operations that data passes through, one step to the next.
- The Input layer handles inputs of the given shape.
- The Conv2D layer learns a convolution kernel with kernel_size number of weights plus a bias. It outputs the given number of filters, such as the 32 or 64 used in the example below.
- The MaxPooling2D layer downsamples the output from a Conv2D layer: the maximum value in each 2-by-2 window is passed to the next layer (see the sketch after this list).
- The Flatten layer flattens the given data into a single dimension.
- The Dropout layer randomly sets input values to 0 at the given frequency during training to help prevent overfitting. (It is bypassed during inference: evaluation or use of the model.)
- The Dense layer is a regular densely-connected neural network layer like what we learned before.
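Here is the promised sketch of 2-by-2 max pooling, a hand-rolled NumPy illustration rather than Keras's implementation:

import numpy as np

def max_pool_2x2(x):
    # Group pixels into 2-by-2 windows and keep only the largest value in each
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # drop any odd remainder
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]], dtype=float)
print(max_pool_2x2(x))  # [[4. 2.]
                        #  [2. 8.]]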
Whereas MLPClassifier counted an entire round through the training data as an iteration, Keras uses the term epoch for the same idea: one pass through the entire training dataset, performing gradient descent updates along the way. Here, each gradient descent update step examines 200 images, so there are 270 update steps per epoch for the 54,000 images in the training set.
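To check that arithmetic:

import math

# 60,000 Keras training images minus the 10% validation split leaves 54,000
steps_per_epoch = math.ceil(54_000 / 200)
print(steps_per_epoch)  # 270, matching the progress bars in the output below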
# Build the model: in Keras, kernel_size is specified as (height, width)
kernel_size = (3, 3)
model = keras.Sequential([
    keras.Input(shape=input_shape),
    layers.Conv2D(32, kernel_size=kernel_size, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=kernel_size, activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.2),
    layers.Dense(num_classes, activation="softmax"),
])
model.summary(line_length=80)
# Train and evaluate the model (same loss, gradient descent optimizer, and metric as MLPClassifier)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
%time model.fit(X_train, y_train, batch_size=200, epochs=10, validation_split=0.1)
# Show the accuracy score on the test set
model.evaluate(X_test, y_test, verbose=0)[1]
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d (Conv2D) │ (None, 26, 26, 32) │ 320 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ max_pooling2d (MaxPooling2D) │ (None, 13, 13, 32) │ 0 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ conv2d_1 (Conv2D) │ (None, 11, 11, 64) │ 18,496 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ max_pooling2d_1 (MaxPooling2D) │ (None, 5, 5, 64) │ 0 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ flatten (Flatten) │ (None, 1600) │ 0 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ dropout (Dropout) │ (None, 1600) │ 0 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 10) │ 16,010 │ └───────────────────────────────────┴──────────────────────────┴───────────────┘
Total params: 34,826 (136.04 KB)
Trainable params: 34,826 (136.04 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 15s 53ms/step - accuracy: 0.7741 - loss: 0.8031 - val_accuracy: 0.9753 - val_loss: 0.0893 Epoch 2/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 43ms/step - accuracy: 0.9659 - loss: 0.1120 - val_accuracy: 0.9835 - val_loss: 0.0601 Epoch 3/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 44ms/step - accuracy: 0.9764 - loss: 0.0751 - val_accuracy: 0.9862 - val_loss: 0.0523 Epoch 4/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 43ms/step - accuracy: 0.9806 - loss: 0.0629 - val_accuracy: 0.9868 - val_loss: 0.0467 Epoch 5/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 21s 44ms/step - accuracy: 0.9836 - loss: 0.0547 - val_accuracy: 0.9872 - val_loss: 0.0457 Epoch 6/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 20s 43ms/step - accuracy: 0.9848 - loss: 0.0485 - val_accuracy: 0.9885 - val_loss: 0.0419 Epoch 7/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 43ms/step - accuracy: 0.9875 - loss: 0.0421 - val_accuracy: 0.9903 - val_loss: 0.0370 Epoch 8/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 43ms/step - accuracy: 0.9876 - loss: 0.0385 - val_accuracy: 0.9887 - val_loss: 0.0384 Epoch 9/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 43ms/step - accuracy: 0.9890 - loss: 0.0361 - val_accuracy: 0.9895 - val_loss: 0.0373 Epoch 10/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 12s 44ms/step - accuracy: 0.9904 - loss: 0.0331 - val_accuracy: 0.9910 - val_loss: 0.0320 CPU times: user 7min 34s, sys: 20.1 s, total: 7min 54s Wall time: 2min 18s
0.9890999794006348
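We can verify the Param # column in the summary above by hand: each Conv2D filter learns kernel_size weights per input channel plus one bias, and each Dense neuron learns one weight per input plus one bias.

conv2d = 3 * 3 * 1 * 32 + 32      # 32 filters, 9 weights each over 1 input channel: 320
conv2d_1 = 3 * 3 * 32 * 64 + 64   # 64 filters spanning all 32 input channels: 18,496
dense = 1600 * 10 + 10            # 5 * 5 * 64 = 1,600 flattened inputs to 10 outputs: 16,010
print(conv2d + conv2d_1 + dense)  # 34,826 total parameters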
Practice: Multilayer perceptron in Keras¶
Write Keras code to recreate the two-hidden-layer multilayer perceptron model that we built using scikit-learn with the expression MLPClassifier(hidden_layer_sizes=(16, 16)). For the hidden layers, specify activation="relu" to match scikit-learn.
# Build the model
mlp_keras = keras.Sequential([
    # Not a convolutional neural network, so no Conv2D or MaxPooling2D layers!
    # Instructions: flatten the image into a 784-length array, add two
    # densely-connected hidden layers of 16 neurons each, and output
    # 10 classes representing the digits 0 through 9.
    keras.Input(shape=input_shape),
    layers.Flatten(),  # shape=(784,)
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
mlp_keras.summary(line_length=80)
# Train and evaluate the model (same loss, gradient descent optimizer, and metric as MLPClassifier)
mlp_keras.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
%time mlp_keras.fit(X_train, y_train, batch_size=200, epochs=10, validation_split=0.1)
# Show the accuracy score on the test set
mlp_keras.evaluate(X_test, y_test, verbose=0)[1]
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten_1 (Flatten) │ (None, 784) │ 0 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 16) │ 12,560 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ dense_2 (Dense) │ (None, 16) │ 272 │ ├───────────────────────────────────┼──────────────────────────┼───────────────┤ │ dense_3 (Dense) │ (None, 10) │ 170 │ └───────────────────────────────────┴──────────────────────────┴───────────────┘
Total params: 13,002 (50.79 KB)
Trainable params: 13,002 (50.79 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5933 - loss: 1.3531 - val_accuracy: 0.9140 - val_loss: 0.3132 Epoch 2/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9058 - loss: 0.3319 - val_accuracy: 0.9368 - val_loss: 0.2248 Epoch 3/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9236 - loss: 0.2644 - val_accuracy: 0.9425 - val_loss: 0.2060 Epoch 4/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9310 - loss: 0.2381 - val_accuracy: 0.9458 - val_loss: 0.1893 Epoch 5/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9356 - loss: 0.2217 - val_accuracy: 0.9498 - val_loss: 0.1816 Epoch 6/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9399 - loss: 0.2105 - val_accuracy: 0.9485 - val_loss: 0.1811 Epoch 7/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9410 - loss: 0.2048 - val_accuracy: 0.9508 - val_loss: 0.1697 Epoch 8/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9450 - loss: 0.1903 - val_accuracy: 0.9518 - val_loss: 0.1711 Epoch 9/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9464 - loss: 0.1840 - val_accuracy: 0.9535 - val_loss: 0.1631 Epoch 10/10 270/270 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9493 - loss: 0.1744 - val_accuracy: 0.9547 - val_loss: 0.1589 CPU times: user 8.54 s, sys: 1.28 s, total: 9.82 s Wall time: 6.84 s
0.9460999965667725
Visualizing a convolutional neural network¶
To visualize a convolutional layer, we can apply a similar technique and plot the weights for each filter. Below are the 32 convolution kernels learned by the first Conv2D layer.
fig, axs = plt.subplots(nrows=4, ncols=8, figsize=(10, 5))
conv2d = model.layers[0].weights[0].numpy()
vmin = conv2d.min()
vmax = conv2d.max()
for ax, coef in zip(axs.ravel(), conv2d.T):
    ax.matshow(coef[0].T, vmin=vmin, vmax=vmax)
    for y in range(kernel_size[0]):
        for x in range(kernel_size[1]):
            # Display the weight values rounded to 1 decimal place
            ax.text(x, y, round(coef[0, x, y], 1), va="center", ha="center")
    ax.set_axis_off()
The remaining Conv2D and Dense layers become much harder to visualize because they have so many weights to examine. So let's instead visualize how the network activates in response to a sample image. The first plot below shows the result of convolving each of the above kernels with a sample image. The kernels above act as edge detectors.
# Construct a debugging model for extracting each layer activation from the real model.
# Recent Keras versions may leave model.input undefined for Sequential models, so we
# re-wire the trained layers onto a fresh Input to build an equivalent functional model.
inputs = keras.Input(shape=input_shape)
x = inputs
outputs = []
# Only include the first 4 layers (conv2d, max_pooling2d, conv2d_1, max_pooling2d_1)
for layer in model.layers[:4]:
    x = layer(x)
    outputs.append(x)
activations = models.Model(inputs=inputs, outputs=outputs).predict(X_train[0:1])

# Show how the input image responds to a convolution using the very first filter (kernel) above
plt.imshow(activations[0][0, ..., 0], cmap="gray")
Let's compare this result to another kernel by examining the table of filters above and changing the last index to a different value between 0 and 31.

# For example, the second filter (index 1); try any index from 0 to 31
plt.imshow(activations[0][0, ..., 1], cmap="gray")
The activations from this first layer are passed as inputs to the MaxPooling2D second layer, and so forth. We can visualize this whole process by creating a plot that shows how the inputs flow through the model.
images_per_row = 8
for i, activation in enumerate(activations):
    # Assume square images: image size is the same width or height
    assert activation.shape[1] == activation.shape[2]
    size = activation.shape[1]
    # Number of features (filters, learned kernels, etc) to display
    n_features = activation.shape[-1]
    n_cols = n_features // images_per_row
    # Tile all the images onto a single large grid; too many images to display individually
    grid = np.zeros((size * n_cols, images_per_row * size))
    for row in range(images_per_row):
        for col in range(n_cols):
            channel_image = activation[0, ..., col * images_per_row + row]
            grid[col * size:(col + 1) * size, row * size:(row + 1) * size] = channel_image
    # Display each grid with the same width
    scale = 1.2 / size
    plt.figure(figsize=(scale * grid.shape[1], scale * grid.shape[0]))
    plt.imshow(grid, cmap="gray")
    plt.title(model.layers[i].name)
    plt.grid(False)
What patterns do you notice about the visual representation of the handwritten digit as we proceed deeper into the convolutional neural network?