In this post, we’ll introduce you to the Convolutional Neural Network and its application to image classification.
In this assignment, we will
In case you aren’t already familiar with Google Colaboratory, head over to Colab and get started with your own cloud-hosted Jupyter notebook. There’s virtually no setup required and it’s free for our intended use. Google even throws in free GPU access, which we will use in the later part of this notebook. To get started, create a new notebook and upload our dataset by dragging the .zip file into the Files sidebar. The following steps will help you create a subdirectory and unzip the file.
For this image classification task, we will be using a popular multi-class classification dataset called the Flowers Recognition Kaggle dataset. According to its description, this dataset consists of 4,242 low-resolution 2-D images, each labeled with one of five flower types (the copy split below contains 4,317 images: 2,590 + 863 + 864).
To follow along with this notebook, you should obtain a copy of the dataset via Kaggle at the link here.
### Splitting the dataset
# import flowers_create_dataset
"""
Categorising the flower dataset
Creating the dataset
Author: Pierre Nugues
"""
import os
from os import path
import random
import shutil
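# The machine name (False if using Colab)
vilde = False
# To create the same dataset on every run, fix the random seed
# (the seed value is an assumption; the original line is missing)
random.seed(0)
# Here write the path to your dataset
# (only the 'src/' path survives in the original, so it is used in both cases)
base = 'src/'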
original_dataset_dir = os.path.join(base, 'flowers_original')
dataset = os.path.join(base, 'flowers_split')
train_dir = os.path.join(dataset, 'train')
validation_dir = os.path.join(dataset, 'validation')
test_dir = os.path.join(dataset, 'test')
categories = os.listdir(original_dataset_dir)
categories = [category for category in categories if not category.startswith('.')]
print('Image types:', categories)
data_folders = [os.path.join(original_dataset_dir, category) for category in categories]
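pairs = []
for folder, category in zip(data_folders, categories):
    images = os.listdir(folder)
    images = [image for image in images if not image.startswith('.')]
    pairs.extend([(image, category) for image in images])
# Shuffle so that every split contains all five classes
# (this step is assumed; without it the slices below would be ordered by class)
random.shuffle(pairs)
img_nbr = len(pairs)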
train_images = pairs[0:int(0.6 * img_nbr)]
val_images = pairs[int(0.6 * img_nbr):int(0.8 * img_nbr)]
test_images = pairs[int(0.8 * img_nbr):]
# print(train_images)
for image, label in train_images:
    src = os.path.join(original_dataset_dir, label, image)
    dst = os.path.join(train_dir, label, image)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copyfile(src, dst)
for image, label in val_images:
    src = os.path.join(original_dataset_dir, label, image)
    dst = os.path.join(validation_dir, label, image)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copyfile(src, dst)
for image, label in test_images:
    src = os.path.join(original_dataset_dir, label, image)
    dst = os.path.join(test_dir, label, image)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copyfile(src, dst)
Image types: ['sunflower', 'tulip', 'rose', 'daisy', 'dandelion']
The Flowers Recognition dataset consists of the following five class labels (flower types):
classes = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']
def set_variables(base):
    base_dir = base
    train_dir = os.path.join(base_dir, 'train')
    validation_dir = os.path.join(base_dir, 'validation')
    test_dir = os.path.join(base_dir, 'test')
    return train_dir, validation_dir, test_dir
base_dir = 'src/flowers_split/'
train_dir, validation_dir, test_dir = set_variables(base_dir)
Now that we’ve collected our dataset and split it into train, validation and test sets, let’s move on to creating our first Convolutional Neural Network model.
Below is an outline of the first task and some suggestions from the EDAN95 course instructors
# Model parameters
epochs = 20
batch_size = 128
target_size = (150, 150)
In order to prepare our data, the 2-D images, for use in our model, we have to first perform several pre-processing steps. This involves reading in our images in JPEG format, resizing them to 150x150px for faster processing, then converting their RGB pixel values into floating-point tensors. To help our neural network in the training process, we will also rescale pixel values from 0-255 to the same interval [0,1] for every image. This allows our images to contribute more evenly to the total loss (more on that here).
To accomplish this task in real-time, the Keras ImageDataGenerator class will be used.
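To make the rescaling step concrete, here is a minimal NumPy sketch of what `rescale=1./255` amounts to. The random array below is only a stand-in for one decoded image (it is not taken from the dataset), and the variable names are illustrative.

```python
import numpy as np

# A stand-in for one decoded RGB image: 8-bit integers in [0, 255].
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# Convert to floating point and map the interval [0, 255] onto [0, 1].
scaled = fake_image.astype("float32") / 255.0

print(scaled.dtype, scaled.shape)
print("min:", scaled.min(), "max:", scaled.max())
```

After this step every image lives on the same numeric scale, whatever its original brightness.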
# This is the module with image preprocessing utilities
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
def data_preprocessing():
    # All images will be rescaled by 1./255
    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=target_size,
        batch_size=batch_size,
        # Since we use categorical_crossentropy loss, we need categorical labels
        class_mode='categorical')
    validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical')
    return train_datagen, test_datagen, train_generator, validation_generator
Let’s run our data generator and see how many images we have to process…
# Data preprocessing
train_datagen, test_datagen, train_generator, validation_generator = data_preprocessing()
Found 2590 images belonging to 5 classes.
Found 863 images belonging to 5 classes.
Checking to make sure we’ve specified the right image size in our generator…
print('image dimensions:', train_generator.target_size)
image dimensions: (150, 150)
Furthermore, we can take a look at the output of our train_generator to verify our image dimensions and the batch size (number of images per batch):
for data, labels in train_generator:
    print("data batch shape: (# samples={}, width(px)={}, height(px)={}, channels={})".format(data.shape[0], data.shape[1], data.shape[2], data.shape[3]))
    print("labels batch shape: (# samples={}, # classes={})".format(labels.shape[0], labels.shape[1]))
    # The generator loops forever, so stop after the first batch
    break
data batch shape: (# samples=128, width(px)=150, height(px)=150, channels=3)
labels batch shape: (# samples=128, # classes=5)
Since our dataset consists of five different classes (flower types), we want to first visualise the number of samples for each output class. To do so, we can get the count of the unique samples (images) in each class from our training set, then plot the counts on a bar chart using a familiar Python library.
from collections import OrderedDict
import numpy as np
unique, counts = np.unique(train_generator.classes, return_counts=True)
vals = OrderedDict(zip(unique, counts))
class_counts = []
for i in range(5):
    class_counts.append(vals[i])
import matplotlib.pyplot as plt
# Bar chart of the number of training images per class
x = np.arange(len(class_counts))
plt.bar(x, class_counts)
xlabel = list(train_generator.class_indices.keys())
plt.xticks(x, xlabel)
plt.show()
# Pie chart of the same counts (plt.pie expects a 1-D sequence)
targets = counts
plt.pie(targets, labels=xlabel, explode=[0.05, 0.05, 0.05, 0.05, 0.05], autopct='%1.1f%%')
plt.show()
We’ll use the following plot_history() function to visualise our model’s training and validation accuracy and loss for each epoch. The input parameter is a Keras History object, which keeps track of the training metrics we want to visualise.
def plot_history(history):
    acc = history.history['categorical_accuracy']
    val_acc = history.history['val_categorical_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.plot(epochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    # Start a second figure for the loss curves
    plt.figure()
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.plot(epochs, val_loss, 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
The structure of our Convolutional Neural Network (CNN) is defined as a Keras Sequential model with a stack of alternating Conv2D and MaxPooling2D layers. At the core of our CNN is the Conv2D layer, which transforms its input and passes the result to the next layer. The transformation performed in Conv2D is known as a convolution operation. In this operation, a filter (a small matrix of learned weights) slides over the entire input, from the top-left to the bottom-right of the matrix. At each position, the dot product between the filter and the patch of input beneath it is computed and stored in the output channel. Once the filter has convolved the entire input, a new representation of our input is formed. This output channel is referred to as a feature map. Most commonly, a filter is applied over an input image to detect patterns such as edges, curves or textures. For a more complete understanding of the convolutional layer, read this article.
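The sliding-window operation described above can be sketched in a few lines of NumPy. This is a simplified illustration for a single channel with no padding, a stride of 1 and no bias term, not the Keras implementation; the function name `convolve2d_valid`, the toy image and the hand-picked filter are all assumptions made for the example (in a real CNN the filter weights are learned).

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + kh, col:col + kw]
            # Element-wise product summed over the window (a dot product).
            feature_map[row, col] = np.sum(window * kernel)
    return feature_map

# A 5x5 toy image: dark on the left (0), bright on the right (1).
toy_image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d_valid(toy_image, vertical_edge)
print(feature_map.shape)
print(feature_map)
```

The feature map is large where the window straddles the edge and zero where the window covers a uniform region, which is the sense in which a filter "detects" a pattern.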
The MaxPooling2D layer is used to reduce the dimensionality of the input (image) by reducing the number of pixels in the output from the previous convolutional layer: each 2x2 window of the feature map is replaced by its maximum value. This reduces the amount of computation performed and the number of parameters the following layers have to learn.
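As a rough illustration of the pooling step, the following NumPy sketch applies 2x2 max pooling with a stride of 2 to a single-channel feature map. The helper name `max_pool_2x2` and the toy input are assumptions for the example; it mirrors the behaviour of a `pool_size=(2, 2)` layer only in this simplified setting.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group the pixels into 2x2 blocks, then keep the maximum of each block.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

toy_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(toy_map)
print(toy_map)
print(pooled)
```

The 4x4 input shrinks to 2x2, so the next layer has a quarter as many values to process.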
To speed up our model training, we will use the ReLU (Rectified Linear Unit) activation function in our Conv2D layers. This function returns 0 if it receives any negative input; otherwise it returns the positive input value x. Because its gradient is simply 0 for negative inputs and 1 for positive ones, ReLU is cheap to compute and differentiate. For further reading, see this article about ReLU on Keras.
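For reference, ReLU and its gradient can be written in a couple of lines of NumPy. The function names below are illustrative, and this is only a sketch of the mathematical definition rather than what Keras runs internally.

```python
import numpy as np

def relu(x):
    """Return 0 for negative inputs and x itself otherwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 0 for negative inputs, 1 for positive inputs."""
    return (x > 0).astype(float)

inputs = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
outputs = relu(inputs)
grads = relu_grad(inputs)
print(outputs)
print(grads)
```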
from keras import layers
from keras import models
from keras import optimizers
from keras import metrics
from tensorflow.keras.utils import to_categorical
# Building our network (Conv2D/MaxPooling2D stack + Flatten/Dense layers)
def build_network():
    model = models.Sequential()
    model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(150,150,3)))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # This converts our 3-D feature maps into 1-D feature vectors
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(len(classes), activation='softmax'))
    # The optimiser is an assumption (the original compile call is truncated)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.RMSprop(),
                  metrics=['categorical_accuracy'])
    return model
model = build_network()
model.summary()
From the above model summary, we can see two things. One is that our feature map size decreases from 148x148 after our first Conv2D layer to 15x15 after our last Conv2D layer (and to 7x7 after the final MaxPooling2D layer). Another is that the feature map depth increases from 32 to 128 as the network progresses. François Chollet, the creator of Keras, notes that this is a common pattern in almost all convnets.
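The feature map sizes can be checked by hand. The sketch below assumes the Keras defaults used in the network above (3x3 kernels with no padding and a stride of 1, and 2x2 pooling with a stride of 2); the helper names are illustrative.

```python
def conv_output_size(size, kernel=3):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Side length after non-overlapping pooling (odd remainder is dropped)."""
    return size // pool

size = 150
trace = []
for filters in (32, 64, 128, 128):
    size = conv_output_size(size)
    trace.append(("Conv2D", size, filters))
    size = pool_output_size(size)
    trace.append(("MaxPooling2D", size, filters))

for name, side, depth in trace:
    print(f"{name:<13} {side}x{side}x{depth}")
```

Each convolution trims two pixels from every side length and each pooling layer halves it, while the depth follows the number of filters.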
The last layer in our model is specific to our task of multi-class classification. We expect a final layer of size 5, where each node corresponds to one of the five total classes (flower types). A softmax last-layer activation function is used to predict a multinomial probability distribution, or, in other words, the likelihood that each image corresponds to one of the five possible target classes. The softmax constraint specifies that the five probability values must add up to 1.0. After our prediction, we will use the argmax function to “select” the most likely class label (the one with the greatest probability value) for each predicted sample in our output.
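A small NumPy sketch shows how softmax and argmax work together. The raw scores (`logits`) below are made-up numbers for two imaginary images, not outputs of the model in this post, and the `softmax` helper is written out by hand for illustration.

```python
import numpy as np

classes = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum avoids overflow without changing the result.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up raw scores for two images (one row per image, one column per class).
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5],
                   [0.0, 0.0, 3.0, 0.0, 0.0]])

probs = softmax(logits)
predicted = [classes[i] for i in np.argmax(probs, axis=1)]

print(probs.round(3))
print(probs.sum(axis=1))
print(predicted)
```

Each row of `probs` sums to 1, and argmax simply picks the column with the largest probability.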
Now that we’ve built our first CNN, it’s time to run it through our training data. In order to do that, we use the train_generator we created earlier with the ImageDataGenerator class. The fit_generator method takes the same parameters as the standard fit method, aside from the first and most important one: the input train_generator object. This object “batches” our training data into pre-processed chunks, performing the resizing and normalising of our images according to the specifications we set at the start of this notebook. To be more precise about our other parameters, we must remember which batch_size we set for our train_generator. In our case, this was set to 128. This means that 128 images at a time will be fetched from our training set directory, pre-processed, then fed into our network. During each epoch, we pass through every example in our training set. Thus, we must also specify steps_per_epoch, the number of batches per epoch, such that every training example is seen once in each epoch. The following np.ceil calculation determines that number.
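As a worked example of that calculation, here are the numbers for the split reported earlier in this post (2590 training and 863 validation images, batch size 128). The sketch uses `math.ceil` from the standard library, which gives the same result as `np.ceil` here but returns an integer.

```python
import math

batch_size = 128
train_samples = 2590        # "Found 2590 images" in the training directory
validation_samples = 863    # "Found 863 images" in the validation directory

# Round up so that the final, smaller batch is not skipped.
steps_per_epoch = math.ceil(train_samples / batch_size)
validation_steps = math.ceil(validation_samples / batch_size)

# Size of the last (partial) training batch in each epoch.
last_batch = train_samples - (steps_per_epoch - 1) * batch_size

print("steps_per_epoch:", steps_per_epoch)
print("validation_steps:", validation_steps)
print("images in last training batch:", last_batch)
```

Rounding down instead would silently drop the last partial batch from every epoch.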
def train_model(model, epochs=1):
    history = model.fit_generator(
        train_generator,
        steps_per_epoch=int(np.ceil(train_generator.samples / train_generator.batch_size)),
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=int(np.ceil(validation_generator.samples / validation_generator.batch_size)))
    plot_history(history)
train_model(model, epochs)
Epoch 1/20
For a simple Convolutional Neural Network, our initial results don't look all that bad! However, examining the plots a bit closer we see that the validation accuracy reaches a maximum around the 10th epoch. We also see that our validation loss reaches a minimum around the 7th epoch. Conversely, our training loss appears to decrease linearly until it reaches 0. This is characteristic of overfitting.
Why does this behavior occur? Well, a simple answer is that the number of training samples we have (ca. 2,600) is relatively small. To combat overfitting, there are many popular techniques, from adding Dropout layers to penalising our model’s weights with weight decay (L2 regularisation). Another technique specific to computer vision is data augmentation. We’ll be using this in our next model to improve our results.
For the ML enthusiasts out there, we’ll report our model’s performance on several important metrics, namely precision, recall, and f1-score. In addition to generating a classification report, we’ll produce a confusion matrix to quantify how many misclassifications our model is making.
from sklearn.metrics import confusion_matrix, classification_report
from mlxtend.plotting import plot_confusion_matrix
def evaluate_model(model):
    test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical',
        # Keep the file order fixed so predictions line up with test_generator.classes
        shuffle=False)
    y_prob = model.predict_generator(test_generator, int(np.ceil(test_generator.samples / test_generator.batch_size)))
    # Select greatest class probability for each sample in y_prob
    y_pred = np.argmax(y_prob, axis=1)
    print('-'*10 + 'Classification Report' + '-'*5)
    print(classification_report(test_generator.classes, y_pred, target_names=classes))
    print('-'*10 + 'Confusion Matrix' + '-'*10)
    #print(confusion_matrix(test_generator.classes, y_pred))
    plot_confusion_matrix(confusion_matrix(test_generator.classes, y_pred))
    return y_pred
y_pred = evaluate_model(model)
Found 864 images belonging to 5 classes.
----------Classification Report-----
precision recall f1-score support
daisy 0.63 0.60 0.62 139
dandelion 0.68 0.80 0.74 209
rose 0.73 0.22 0.34 151
sunflower 0.71 0.75 0.73 166
tulip 0.58 0.77 0.66 199
accuracy 0.65 864
macro avg 0.67 0.63 0.62 864
weighted avg 0.66 0.65 0.63 864
----------Confusion Matrix----------
So, our model’s overall accuracy was 0.65 (ca. 65%), with a weighted-average F1 score of 0.63. We’ll use this as a benchmark to compare against as we make incremental progress in the following models.
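To see how the F1 figures in the report relate to precision and recall, the sketch below recomputes them from the rounded values printed above. Because the inputs are already rounded to two decimals, individual recomputed scores can differ from the report by about 0.01; the dictionary names are illustrative.

```python
# Precision, recall and support copied from the classification report above.
precision = {'daisy': 0.63, 'dandelion': 0.68, 'rose': 0.73, 'sunflower': 0.71, 'tulip': 0.58}
recall = {'daisy': 0.60, 'dandelion': 0.80, 'rose': 0.22, 'sunflower': 0.75, 'tulip': 0.77}
support = {'daisy': 139, 'dandelion': 209, 'rose': 151, 'sunflower': 166, 'tulip': 199}

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

per_class_f1 = {name: f1(precision[name], recall[name]) for name in precision}

# Macro average: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
# Weighted average: every class counts in proportion to its number of test images.
weighted_f1 = sum(per_class_f1[n] * support[n] for n in support) / sum(support.values())

for name, score in per_class_f1.items():
    print(f"{name:<10} {score:.2f}")
print(f"macro avg    {macro_f1:.2f}")
print(f"weighted avg {weighted_f1:.2f}")
```

The rose class shows why F1 is informative: its precision is the highest of the five, yet its low recall drags its F1 score far below the others.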
In the next model, we’ll be using a clever approach to tackle overfitting specific to deep learning models for computer vision applications. This approach involves generating new data (more images) by “augmenting” the existing samples. We accomplish this by applying a number of random transformations to the images in our dataset to produce more samples that appear new to the model. Our handy Keras ImageDataGenerator allows us to do just that while remaining consistent with our batched, pre-processed image pipeline.
Here’s an outline of what we will accomplish…
# The data augmentation generator (its arguments, described below, are missing from the original listing)
datagen = ImageDataGenerator()
In order to better understand each of the parameters in our datagen, here’s a description provided to us by F. Chollet:
rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures.
width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally.
shear_range is for randomly applying shearing transformations.
zoom_range is for randomly zooming inside pictures.
horizontal_flip is for randomly flipping half of the images horizontally, relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
With our datagen parameters specified, we have successfully completed the first step in our augmentation process.
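As a sketch of how these parameters fit together, the dictionary below collects one plausible set of values. The values are illustrative assumptions and not necessarily those used for the results reported in this post; the keys are the `ImageDataGenerator` keyword arguments (note that the shift arguments are spelled `width_shift_range` and `height_shift_range` in Keras).

```python
# Illustrative augmentation settings (assumed values, not taken from this post's run).
augmentation_config = {
    "rotation_range": 40,        # rotate by up to 40 degrees
    "width_shift_range": 0.2,    # shift horizontally by up to 20% of the width
    "height_shift_range": 0.2,   # shift vertically by up to 20% of the height
    "shear_range": 0.2,          # shearing intensity
    "zoom_range": 0.2,           # zoom in or out by up to 20%
    "horizontal_flip": True,     # randomly mirror images left to right
    "fill_mode": "nearest",      # fill newly created pixels with the nearest value
}

# With Keras imported, the generator could then be created as:
# datagen = ImageDataGenerator(**augmentation_config)

for name, value in augmentation_config.items():
    print(f"{name} = {value!r}")
```

Keeping the settings in one dictionary makes it easy to reuse the same augmentation for visualisation and for training.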
Let’s visualise a set of our augmented images…
from keras.preprocessing import image
# Selecting the "daisy" class images from the training set
folder_path = 'src/flowers_split/train/daisy'
file_names = [os.path.join(folder_path, fname) for fname in os.listdir(folder_path)]
# We pick one image to "augment"
img_path = file_names[50]
# Read the image and resize it
img = image.load_img(img_path, target_size=target_size)
# Convert it to a Numpy array with shape (150, 150, 3)
x = image.img_to_array(img)
# Reshape it to (1, 150, 150, 3)
x = x.reshape((1,) + x.shape)
# The .flow() command below generates batches of randomly transformed images.
# It will loop indefinitely, so we need to `break` the loop at some point!
i = 0
for batch in datagen.flow(x, batch_size=1):
    # Draw each augmented image in its own figure
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
plt.show()
Great! We’ll now be able to train our model on a wider variety of images like the samples above. However, there’s still one extra step we must take to prevent overfitting. The input images, although augmented, are still heavily intercorrelated with the original dataset. In other words, most of the “new” information we’ve introduced is from our original data.
To further fight overfitting, we will also add a Dropout layer to our model, right before the densely-connected classifier. Dropout is a regularization technique that helps prevent the model from overfitting by randomly “dropping” neurons from the network in each training iteration. The goal of dropout is to encourage each hidden unit in the neural network to “learn” to work with a random set of surviving hidden units (neurons), creating a more robust network. This happens because each hidden unit must learn to encode a representation of the feature map without relying on other hidden units that might be “dropped” over the training iterations.
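The mechanics of dropout can be sketched with NumPy. This is the commonly used "inverted dropout" formulation, shown as an illustration of the idea rather than the Keras source; the function name, the seed and the toy activations are assumptions made for the example.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the units and rescale the survivors."""
    keep_prob = 1.0 - rate
    # Each unit survives independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
activations = np.ones((4, 8))
dropped = dropout(activations, rate=0.2, rng=rng)

print(dropped)
print("fraction dropped:", np.mean(dropped == 0.0))
```

During training a fresh mask is drawn for every batch; at prediction time dropout is switched off and all units are used.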
# Building our network (Conv2D/MaxPooling2D stack + Flatten + Dropout + Dense layers)
def build_network():
    model = models.Sequential()
    model.add(layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(150,150,3)))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # This converts our 3-D feature maps into 1-D feature vectors
    model.add(layers.Flatten())
    # Here we set our dropout rate to 0.2
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(5, activation='softmax'))
    # Setting learning rate manually
    # (the optimiser and its value are assumptions; the original line is missing)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.RMSprop(learning_rate=1e-4),
                  metrics=['categorical_accuracy'])
    return model
model = build_network()
We’ll now train our updated model on the augmented images and visualise our training performance. Since we want to increase the number of samples seen by our model, we will train our model for 100 epochs (up from 20). This will allow our datagen to generate more augmented samples than previously seen in the original, unmodified set.
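import time
def train_model(model, epochs=100):
    start_time = time.time()
    # The augmentation arguments are missing from the original listing; add them after `rescale`
    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')
    # Note that validation data shouldn't be augmented
    validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')
    history = model.fit_generator(
        train_generator,
        steps_per_epoch=int(np.ceil(train_generator.samples / train_generator.batch_size)),
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=int(np.ceil(validation_generator.samples / validation_generator.batch_size)))
    plot_history(history)
    print('Total training time (sec):', time.time() - start_time)
train_model(model, epochs=100)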
Found 2590 images belonging to 5 classes.
Found 863 images belonging to 5 classes.
Epoch 90/100
Total training time (sec): 1925.0976030826569
def evaluate_model(model):
    test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical',
        # Keep the file order fixed so predictions line up with test_generator.classes
        shuffle=False)
    y_prob = model.predict_generator(test_generator,
                                     int(np.ceil(test_generator.samples / test_generator.batch_size)))
    # Select class label with highest probability
    y_pred = np.argmax(y_prob, axis=1)
    print('-'*10 + 'Classification Report' + '-'*5)
    print(classification_report(test_generator.classes, y_pred, target_names=classes))
    print('-'*10 + 'Confusion Matrix' + '-'*10)
    plot_confusion_matrix(confusion_matrix(test_generator.classes, y_pred))
Found 864 images belonging to 5 classes.
----------Classification Report-----
precision recall f1-score support
daisy 0.63 0.58 0.60 139
dandelion 0.73 0.61 0.67 209
rose 0.66 0.44 0.53 151
sunflower 0.53 0.88 0.66 166
tulip 0.64 0.59 0.61 199
accuracy 0.62 864
macro avg 0.64 0.62 0.61 864
weighted avg 0.64 0.62 0.62 864
----------Confusion Matrix----------
While the augmented images didn’t seem to improve the model’s F1 score, our model’s training curves indicate that we are no longer overfitting. In other words, our training curve more closely matches the validation curve. In the next section of this notebook, we will be improving the accuracy of our classifier with a pre-trained model via a technique called transfer learning.
A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. (F. Chollet, Ch. 5.3 - Deep Learning with Python). What makes the use of pretrained convolutional bases desirable is that these networks are often trained on large enough datasets (e.g. ImageNet’s 1.4 mil. labeled images of 1,000 classes) that the pretrained network’s learned representations can encompass a general enough model of the visual world. Through feature extraction and fine-tuning techniques, we can “port” the learned features of the pretrained convbases across other models, such as our own. In the next section of this notebook, we’ll be exploring the InceptionV3 pretrained convolutional base and applying it to our own model using a feature extraction approach.
Our first step is to select a pretrained network. For this task, we will use Google’s InceptionV3, a Keras image classification model that can optionally be loaded with weights pre-trained on the ImageNet dataset. The InceptionV3 is referred to as our “convolutional base”. Note that the densely-connected classifier of the InceptionV3 has been removed (with parameter include_top=False). The reason for this decision is that our pretrained network’s output, the densely-connected prediction layer, is not usable for our task of classifying five flower types. The ImageNet dataset used to train the InceptionV3 consists of 1,000 unique classes, and thus we will not have any use for the majority of the model’s output tensors (the distinct classes’ probability distributions).
Thankfully, Keras provides us with a simple way to leave this part of the model out. Instead, we will be preserving layers that come earlier in the InceptionV3 which extract local, highly generic feature maps (such as visual edges, colors, and textures).
# Importing the pretrained network as a Keras model
from tensorflow.keras.applications import InceptionV3
# Initialising our conv_base object with desired parameters
conv_base = InceptionV3(weights='imagenet',
                        include_top=False,
                        input_shape=(150, 150, 3))
Downloading data from
87916544/87910968 [==============================] - 0s 0us/step
Let’s look at the architecture of our convolutional base in detail…
Now we’ll check the last layer in our convolutional base. Its output shape will become the input shape of our Keras Sequential model.
conv_output_shape = conv_base.layers[-1].output_shape[1:]
(3, 3, 2048)
We’ll be covering two techniques to utilise our conv_base pretrained model. The first, demonstrated below, runs the convolutional base over our dataset, recording its output to NumPy arrays. The resulting output, our image features, will be generated by calling the predict method of the conv_base model. This will serve as input to a standalone densely-connected classifier. While this first technique is computationally cheap to run, we will not be able to leverage data augmentation.
from pathlib import Path
def count_files(path, extension):
    directory = Path(path)
    return len(list(directory.glob('**/*.{extension}'.format(extension=extension))))
def extract_features(directory):
    sample_count = count_files(directory, 'jpg')
    features = np.zeros(shape=(sample_count, *conv_output_shape))
    labels = np.zeros(shape=(sample_count, len(classes)))
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical')
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    # Flatten each (3, 3, 2048) feature map into a single vector
    return features.reshape(sample_count, -1), labels
x_train, y_train = extract_features(train_dir)
x_val, y_val = extract_features(validation_dir)
x_test, y_test = extract_features(test_dir)
Found 2590 images belonging to 5 classes.
Found 863 images belonging to 5 classes.
Found 864 images belonging to 5 classes.
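The batch-filling loop used for feature extraction can be tried out without Keras. In the sketch below, `fake_batches` is a hypothetical stand-in for the directory generator (it yields batches forever, with a smaller final batch), and the sample count and batch size are toy values chosen to keep the example small.

```python
import numpy as np
from itertools import cycle

def fake_batches(n_samples, batch_size, feature_shape):
    """Hypothetical stand-in for a Keras generator: yields batches forever."""
    starts = range(0, n_samples, batch_size)
    for start in cycle(starts):
        size = min(batch_size, n_samples - start)
        yield np.full((size, *feature_shape), float(start))

sample_count, batch_size, feature_shape = 10, 4, (3, 3, 2048)
features = np.zeros((sample_count, *feature_shape))

i = 0
for batch in fake_batches(sample_count, batch_size, feature_shape):
    # The final slice is shorter than batch_size and matches the final batch.
    features[i * batch_size:(i + 1) * batch_size] = batch
    i += 1
    if i * batch_size >= sample_count:
        # Every sample has been seen once, so leave the endless generator.
        break

# One flat vector of 3 * 3 * 2048 = 18432 values per sample.
flat = features.reshape(sample_count, -1)
print(i, flat.shape)
```

The same flattening is what lets a plain Dense classifier consume the convolutional base's output.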
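def build_model():
    model = models.Sequential()
    model.add(layers.Dense(512, activation='relu',
                           input_dim=int(np.prod(conv_output_shape))))
    model.add(layers.Dense(len(classes), activation='softmax'))
    # The optimiser is an assumption (the original compile call is truncated)
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.RMSprop(),
                  metrics=['categorical_accuracy'])
    return model
model = build_model()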
def train_model(model, x_train, y_train, batch_size, epochs):
    history = model.fit(x_train, y_train,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(x_val, y_val))
    plot_history(history)
train_model(model, x_train, y_train, batch_size, epochs=120)
Epoch 110/120
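# predict_classes was removed in recent Keras/TensorFlow releases; argmax over predict() is equivalent
y_pred = np.argmax(model.predict(x_test), axis=1)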
print('-'*10 + 'Classification Report' + '-'*5)
print(classification_report(np.argmax(y_test, axis=1), y_pred, target_names=classes))
print('-'*10 + 'Confusion Matrix' + '-'*10)
plot_confusion_matrix(confusion_matrix(np.argmax(y_test, axis=1), y_pred))
----------Classification Report-----
precision recall f1-score support
daisy 0.82 0.81 0.81 139
dandelion 0.89 0.87 0.88 209
rose 0.77 0.74 0.75 151
sunflower 0.87 0.81 0.84 166
tulip 0.75 0.85 0.80 199
accuracy 0.82 864
macro avg 0.82 0.81 0.82 864
weighted avg 0.82 0.82 0.82 864
----------Confusion Matrix----------
WARNING: “This technique is so expensive that you should only attempt it if you have access to a GPU–it’s absolutely intractable on CPU. If you can’t run your code on GPU, then the previous technique is the way to go” (F. Chollet, p.149).
Thankfully, Google provides us free access to a Tesla K80 GPU or equivalent on the Colab platform. To enable GPU hardware acceleration, go to Edit > Notebook settings and select GPU as our Hardware accelerator.
We can verify that our GPU is connected with the following script:
import tensorflow as tf
print(tf.test.gpu_device_name())
If you see a device name after running the above, you’re good to go!
epochs = 20
batch_size = 128
target_size = (150, 150)
conv_base = InceptionV3(weights='imagenet',
                        include_top=False,
                        input_shape=(150, 150, 3))
In this technique, we will be extending our model using the conv_base as a layer in our network. We can do so by adding Dense layers on top and running the whole thing end-to-end on the input data. This technique allows us to use data augmentation, because every input image goes through the convolutional base every time it is seen by the model. However, for this same reason, this technique is far more expensive than the first one.
Before we compile and train our model, we have to freeze the pretrained network weights. In other words, we want to prevent the conv_base weights from being updated during the training process. This is an important step since the use of randomly-initialised Dense layers would propagate very large weight updates through the network, effectively destroying the representations previously learned (Chollet, p.150).
# Freeze pretrained network weights
conv_base.trainable = False
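def build_model(conv_base):
    model3 = models.Sequential()
    # The frozen convolutional base acts as the first layer
    model3.add(conv_base)
    # Flattening is an assumption; the original lines before the Dense layers are missing
    model3.add(layers.Flatten())
    model3.add(layers.Dense(64, activation='relu'))
    model3.add(layers.Dense(len(classes), activation='softmax'))
    model3.compile(loss='categorical_crossentropy',
                   optimizer=optimizers.RMSprop(),  # optimiser assumed; original call is truncated
                   metrics=['categorical_accuracy'])
    return model3
model = build_model(conv_base)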
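def train_model(model, epochs=1):
    # The augmentation arguments are missing from the original listing; add them after `rescale`
    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')
    # Note that validation data shouldn't be augmented
    validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')
    history = model.fit_generator(
        train_generator,
        steps_per_epoch=int(np.ceil(train_generator.samples / train_generator.batch_size)),
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=int(np.ceil(validation_generator.samples / validation_generator.batch_size)))
    plot_history(history)
train_model(model, epochs=30)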
Found 2590 images belonging to 5 classes.
Found 863 images belonging to 5 classes.
Epoch 20/30
def evaluate_model(model):
    test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='categorical',
        # Keep the file order fixed so predictions line up with test_generator.classes
        shuffle=False)
    y_prob = model.predict_generator(test_generator, int(np.ceil(test_generator.samples / test_generator.batch_size)))
    y_pred = np.argmax(y_prob, axis=1)
    print('-'*10 + 'Classification Report' + '-'*5)
    print(classification_report(test_generator.classes, y_pred, target_names=classes))
    print('-'*10 + 'Confusion Matrix' + '-'*10)
    plot_confusion_matrix(confusion_matrix(test_generator.classes, y_pred))
Found 864 images belonging to 5 classes.
----------Classification Report-----
precision recall f1-score support
daisy 0.83 0.87 0.85 139
dandelion 0.86 0.88 0.87 209
rose 0.76 0.77 0.77 151
sunflower 0.82 0.85 0.84 166
tulip 0.88 0.81 0.85 199
accuracy 0.84 864
macro avg 0.83 0.84 0.84 864
weighted avg 0.84 0.84 0.84 864
----------Confusion Matrix----------
Fine-tuning refers to the unfreezing of a few layers in the frozen conv_base model. We choose to unfreeze the layers which encode more specialised features. These layers in the InceptionV3 are at the top of the network.
Warning: Before we proceed, make sure that you have already trained your fully-connected classifier from above.
To further clarify, we will be re-purposing the InceptionV3 pre-trained model by fine-tuning the layers in the base. We already removed the convolutional base’s original classifier by setting include_top=False, which allowed us to use only the layers in the base which contained abstract representations of features useful for general image classification tasks. We then added our own fully-connected classifier to the network that allowed us to restrict the predictions to our specific problem domain (predicting the five classes of the flowers dataset). In order to preserve the InceptionV3’s learned weight values, we set conv_base.trainable = False. This step froze all the pre-trained model layers and kept their weight values the same during the training process. Now, we want to fine-tune our pre-trained model by setting some of the conv_base top layers to layer.trainable = True. We choose to unfreeze the higher layers, whose specific features are said to be problem dependent, as opposed to the lower layers, whose general features are problem independent.
The figure below illustrates the two strategies we use in this notebook.
Figure 1. Fine-tuning strategies.
Strategy 2 was used previously to train only the fully-connected layers, while Strategy 3 will now be used to train both the fully-connected layers and some of the InceptionV3 top layers.
To help you choose which of the three transfer learning strategies is best for your specific dataset, you can refer to the following size-similarity perceptual map.
Figure 2. Size-similarity perceptual map.
Since we started off with a relatively small dataset, we chose Strategy 2 (Quadrant IV: small dataset, similar to pre-trained model dataset). We removed the original output layer from the InceptionV3 and ran the pre-trained model as a fixed feature extractor. The resulting features were used to train a new classifier, the fully-connected classifier we created in the previous model. In the model below, we will be assuming a balance between Quadrants I and III (small dataset, somewhat dissimilar to pre-trained model dataset).
import keras
print('Num layers in InceptionV3 base:', len(conv_base.layers))
Num layers in InceptionV3 base: 311
conv_base.trainable = True
### Fine-tuning
# set all layers trainable by default
for layer in conv_base.layers:
    layer.trainable = True
    if isinstance(layer, keras.layers.BatchNormalization):
        # we do aggressive exponential smoothing of batch norm
        # parameters to faster adjust to our new dataset
        layer.momentum = 0.9
# fix deep layers (fine-tuning only last 50 layers)
for layer in conv_base.layers[:-50]:
    # fix all but batch norm layers, because we need to update moving averages for a new dataset!
    if not isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False
We will be using the RMSprop optimiser with a very low learning rate in order to limit the magnitude of the modifications made to the representations of the layers that we are fine-tuning. Updates that are too large might harm these representations (F. Chollet, p.178).
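### Re-training model
# A very low learning rate (the exact value is an assumption; the original call is truncated)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(learning_rate=1e-5),
              metrics=['categorical_accuracy'])
train_model(model, epochs=100)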
Found 2590 images belonging to 5 classes.
Found 863 images belonging to 5 classes.
Epoch 90/100
### Evaluating model performance
Found 864 images belonging to 5 classes.
----------Classification Report-----
precision recall f1-score support
daisy 0.86 0.84 0.85 139
dandelion 0.90 0.90 0.90 209
rose 0.86 0.83 0.84 151
sunflower 0.89 0.86 0.88 166
tulip 0.84 0.90 0.87 199
accuracy 0.87 864
macro avg 0.87 0.87 0.87 864
weighted avg 0.87 0.87 0.87 864
----------Confusion Matrix----------
Great news: we were able to improve our model’s F1 score by ca. 3 percentage points (from 0.84 to 0.87) by fine-tuning our pre-trained convolutional base. Unfreezing some of the layers in the base let us successfully repurpose the pre-trained model weights to better fit our task. Let’s save our final model and call it a day! Thanks for sticking around this far.
### Saving final model
model.save('flowers_final.h5')
This assignment was prepared by Pierre Nugues, an extraordinary professor of Semantic Systems and Language Technology at Lunds Tekniska Högskola. More about him can be found here. Most of the inspiration for this notebook came from François Chollet, creator of Keras and author of Deep Learning with Python. His version of this task is found in notebooks 5.2-using-convnets-with-small-datasets.ipynb and 5.3-using-a-pretrained-convnet.ipynb. Another version can be found on the TensorFlow Guide blog here. Pedro Marcelino helped inspire Figures 1 and 2 in Fine-tuning, link to his article here. Also, special thanks to Alexander Maemev on Kaggle for the Flowers Recognition dataset used in this task, available publicly at the link here.
