Previously we learned how to prepare a dataset to feed to our TensorFlow model. In this tutorial, we will use the same dataset generator class that we wrote in the previous post to get data from the dataset directory. We will also build an image classifier using a convolutional neural network and train it on that dataset.
Prerequisites
- Download Kaggle Cats vs Dogs dataset
- DataSetGenerator Code that we wrote in our earlier post
The Building Blocks
To build an image recognition model we need to create some basic building blocks. These are:
- Convolution Layer
- Pooling Layer
- Activation Layer
- Fully Connected Layer
So we will create a network builder class, a helper class with methods for creating each of these layers, which will make our job easier while building the network. I will explain each of the above blocks while defining it in code.
Let’s Start Coding
We will start by creating the helper class, and let’s name it NetworkBuilder
import tensorflow as tf


class NetworkBuilder:
    def __init__(self):
        pass
So we created the NetworkBuilder class. We don’t have anything to initialize, so we just pass for now.
Convolution Layer Method
Let’s add a method to attach a convolution layer to our network. A convolution layer is a special kind of neural network layer that learns to detect features or patterns in its input. We will use convolution layers as the initial feature detectors in our image recognition model.
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def attach_conv_layer(self, input_layer, output_size=32, feature_size=(5, 5),
                          strides=[1, 1, 1, 1], padding='SAME', summary=False):
        with tf.name_scope("Convolution") as scope:
            input_size = input_layer.get_shape().as_list()[-1]
            weights = tf.Variable(tf.random_normal([feature_size[0], feature_size[1],
                                                    input_size, output_size]),
                                  name='conv_weights')
            if summary:
                tf.summary.histogram(weights.name, weights)
            biases = tf.Variable(tf.random_normal([output_size]), name='conv_biases')
            conv = tf.nn.conv2d(input_layer, weights, strides=strides, padding=padding) + biases
            return conv
So we created a method called attach_conv_layer. Let’s look at the arguments we are passing to this method.
The input_layer: to attach a new layer to the model, we need the layer that will feed into it.
To create a convolutional layer we need to know the output_size, which is the number of output feature maps (basically the number of features we want to extract), not a number of neurons.
feature_size is the size of the kernel. The layer creates a number of small kernels (patches of pixels) of this size, each with some learned feature in it, and tries to find matching patterns by sliding each kernel all over the input image. Each kernel produces a feature map indicating the coordinates where it matched the input image.
strides is how many steps we move in each direction every time we slide a kernel over the input image, i.e. [batch step, height step, width step, channel step].
Every time we slide a kernel over an input image, the generated feature map comes out smaller than the input image, and the size reduction depends on the size of the kernel. If we want to keep the feature map the same size as the input image, we have to pad zeros at the edges. For that we have the padding flag: 'SAME' pads, 'VALID' does not.
summary is just a flag that determines whether we want a summary for this layer in TensorBoard. If you don’t know how to use TensorBoard, I have a detailed explanation on it here. This is optional and not related to designing the network; it’s totally fine to ignore it.
Let’s see inside the method
Before doing anything, we started a new scope and named it “Convolution”. This will help us group things together while debugging in TensorBoard.
To create the new conv layer we first need the depth of the input, or more specifically the number of channels the input image has. For an RGB image the number of channels is 3, and it’s the last axis of the input tensor. So we used input_layer.get_shape().as_list()[-1] to get the size of the last axis, the channels.
After that, we created the weights for the conv layer. Weights for a conv layer should have this shape: [kernel height, kernel width, input channels, output channels]. We used tf.random_normal to initialize a random weight matrix and then tf.Variable to convert it into a TensorFlow variable.
Next (optionally), we checked the summary flag and added a histogram summary for TensorBoard.
Then we created the bias neurons and used tf.nn.conv2d to perform the convolution; adding the biases to the conv2d output gives us the final conv layer.
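As a quick sanity check (a hypothetical snippet of my own, not part of the original class), we can attach a conv layer to a placeholder and inspect the resulting shape:

nb = NetworkBuilder()
x = tf.placeholder('float', [None, 128, 128, 3])  # a batch of RGB images
conv = nb.attach_conv_layer(x, output_size=32, feature_size=(5, 5))
print(conv.get_shape().as_list())  # [None, 128, 128, 32], 'SAME' padding keeps 128x128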
The Pooling Layer
We created a method for the convolution layer; now let’s create a method for attaching a pooling layer to our network. A pooling layer down-samples the feature maps (resizes them to smaller images), because as we attach more and more convolution layers the number of feature maps keeps growing. Down-sampling the feature maps makes them easier and faster to process. We will use the max pooling method.
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def attach_pooling_layer(self, input_layer, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME'):
        with tf.name_scope("Pooling") as scope:
            return tf.nn.max_pool(input_layer, ksize=ksize, strides=strides, padding=padding)
So for pooling we need the input_layer just like in the previous method, plus strides and padding as arguments. We also need another parameter called ksize.
Like the previous method, we opened a scope and named it “Pooling”.
ksize is the pooling window size: how many pixels will be merged into one pixel in each direction, in the format [batch, height, width, channel]. We don’t want to merge any pixels along the batch and channel axes, so by default we put 1 in those positions and 2 in both the height and width positions. With the default parameters, the pooled image will be half the height and width of the input image.
We are also moving 2 steps at a time along the height and width, and the padding is 'SAME' so that pixels at the edges are not dropped while the window slides.
We used tf.nn.max_pool to create the pooling layer, passed in the arguments, and returned it.
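Continuing the hypothetical snippet from above, pooling the conv output with the defaults halves the height and width:

pooled = nb.attach_pooling_layer(conv)  # default ksize and strides: 2x2 window, step 2
print(pooled.get_shape().as_list())     # [None, 64, 64, 32], half of 128x128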
Activation Layer
We need activation layers to add nonlinearity to our network; otherwise the network won’t be able to learn complex functions. We will define three activation functions: relu, sigmoid and softmax. Let’s define them.
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def attach_relu_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.relu(input_layer)

    def attach_sigmoid_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.sigmoid(input_layer)

    def attach_softmax_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.softmax(input_layer)
So here are the three methods. I think they are self-explanatory and need no further explanation.
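Still, for a quick feel of what each activation does, here is a small standalone check (my own example, not from the original post):

import tensorflow as tf

x = tf.constant([-2.0, 0.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(x)))     # [0. 0. 3.], negatives clipped to zero
    print(sess.run(tf.nn.sigmoid(x)))  # each value squashed into (0, 1)
    print(sess.run(tf.nn.softmax(x)))  # outputs sum to 1, usable as class probabilities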
Fully Connected Layer
Until now we were working with images, or feature maps (which are also images in some form); excluding the batch axis, they are all 3D. But a fully connected layer only works with 1D input, so we have to convert that 3D input into a flat 1D vector. For that we need another method, which we will call flatten.
Flatten Layer
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def flatten(self, input_layer):
        with tf.name_scope("Flatten") as scope:
            input_size = input_layer.get_shape().as_list()
            new_size = input_size[-1] * input_size[-2] * input_size[-3]
            return tf.reshape(input_layer, [-1, new_size])
In the above code we opened a scope named “Flatten” and calculated the total number of neurons (each representing a pixel value) in the input layer, excluding the batch axis. The total is (neurons along the height axis) × (neurons along the width axis) × (neurons along the channels axis), i.e. the product of the last three axes of the input layer. That product is the length of the 1D vector for our fully connected layer. We then reshape the input layer to [batch size, new size], where -1 stands for the batch size (meaning it can take any value). That’s our flattened layer of features, ready to be classified by a fully connected layer.
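For example (a hypothetical check, assuming nb is a NetworkBuilder instance), a batch of 16x16 feature maps with 128 channels flattens to vectors of length 16 * 16 * 128 = 32768:

x = tf.placeholder('float', [None, 16, 16, 128])
flat = nb.flatten(x)
print(flat.get_shape().as_list())  # [None, 32768]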
Finally The Fully Connected (Dense) Layer
To make one thing clear: fully connected layers are the same kind of layers that we built in our previous tutorials.
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def attach_dense_layer(self, input_layer, size, summary=False):
        with tf.name_scope("Dense") as scope:
            input_size = input_layer.get_shape().as_list()[-1]
            weights = tf.Variable(tf.random_normal([input_size, size]), name='dense_weights')
            if summary:
                tf.summary.histogram(weights.name, weights)
            biases = tf.Variable(tf.random_normal([size]), name='dense_biases')
            dense = tf.matmul(input_layer, weights) + biases
            return dense
It’s just as I said: the same as in the previous tutorials. We get the size of the input layer, which is 1D, so we only need the last axis; we are not interested in the batch length (most of the time we never are). We created the weights with the input and output sizes, created the bias neurons, did a matmul with the input layer, added the biases, and we are done with a fully connected layer, named dense (which is what it is often called).
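A small shape check (again a hypothetical snippet): feeding a flattened [None, 32768] tensor into a dense layer of size 200 creates a [32768, 200] weight matrix and produces a [None, 200] output:

flat = tf.placeholder('float', [None, 32768])
dense = nb.attach_dense_layer(flat, 200)
print(dense.get_shape().as_list())  # [None, 200]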
This Is The Final Network Builder Class
We have completed the network builder class; the whole code should look like this:
class NetworkBuilder:
    def __init__(self):
        pass

    def attach_conv_layer(self, input_layer, output_size=32, feature_size=(5, 5),
                          strides=[1, 1, 1, 1], padding='SAME', summary=False):
        with tf.name_scope("Convolution") as scope:
            input_size = input_layer.get_shape().as_list()[-1]
            weights = tf.Variable(tf.random_normal([feature_size[0], feature_size[1],
                                                    input_size, output_size]),
                                  name='conv_weights')
            if summary:
                tf.summary.histogram(weights.name, weights)
            biases = tf.Variable(tf.random_normal([output_size]), name='conv_biases')
            conv = tf.nn.conv2d(input_layer, weights, strides=strides, padding=padding) + biases
            return conv

    def attach_relu_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.relu(input_layer)

    def attach_sigmoid_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.sigmoid(input_layer)

    def attach_softmax_layer(self, input_layer):
        with tf.name_scope("Activation") as scope:
            return tf.nn.softmax(input_layer)

    def attach_pooling_layer(self, input_layer, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME'):
        with tf.name_scope("Pooling") as scope:
            return tf.nn.max_pool(input_layer, ksize=ksize, strides=strides, padding=padding)

    def flatten(self, input_layer):
        with tf.name_scope("Flatten") as scope:
            input_size = input_layer.get_shape().as_list()
            new_size = input_size[-1] * input_size[-2] * input_size[-3]
            return tf.reshape(input_layer, [-1, new_size])

    def attach_dense_layer(self, input_layer, size, summary=False):
        with tf.name_scope("Dense") as scope:
            input_size = input_layer.get_shape().as_list()[-1]
            weights = tf.Variable(tf.random_normal([input_size, size]), name='dense_weights')
            if summary:
                tf.summary.histogram(weights.name, weights)
            biases = tf.Variable(tf.random_normal([size]), name='dense_biases')
            dense = tf.matmul(input_layer, weights) + biases
            return dense
So What is Next?
We completed the network builder class, so now what? Remember, our original goal was to build a network. We need to do the following steps before building our first CNN:
- Save this code in a file named exactly after the class, i.e. NetworkBuilder.py
- We will need the dataset generator class that we wrote in the earlier tutorial. Save that code in another file named after that class, which was DataSetGenerator.py
- Put both files in the same directory.
- Unzip the Cats vs Dogs train.zip into a train folder in the same directory.
- Create a new Python file and name it MyFirstCNNModel.py
So after these steps, your working folder should look something like this
[Image: the working directory contents after these steps]
Let’s Create a CNN Model in TensorFlow
Let’s open the new file MyFirstCNNModel.py and start building our model.
Importing libraries
import tensorflow as tf
from NetworkBuilder import NetworkBuilder
from DataSetGenerator import DataSetGenerator,seperateData
import datetime
import numpy as np
import os
These are the libraries that we are going to use. NetworkBuilder, DataSetGenerator and seperateData are the ones we wrote ourselves.
Now let’s create the placeholders.
with tf.name_scope("Input") as scope:
input_img = tf.placeholder(dtype='float', shape=[None, 128, 128, 3], name="input")
with tf.name_scope("Target") as scope:
target_labels = tf.placeholder(dtype='float', shape=[None, 2], name="Targets")
So these are the input and target placeholders. The input shape is in the format [batch size, height, width, channels]; we use a single channel because we will load the images as grayscale (allchannel=False) during training.
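The target placeholder expects one-hot labels, one column per class. For our two classes, a mini-batch of three labels might look like this (illustrative values, assuming the generator encodes cat as [1, 0] and dog as [0, 1]):

import numpy as np

labels = np.array([[1, 0], [0, 1], [1, 0]], dtype=np.float32)  # shape (3, 2)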
Let’s Design the Model
Now we will create a NetworkBuilder object and start adding layers.
nb = NetworkBuilder()

with tf.name_scope("ModelV2") as scope:
    model = input_img
    model = nb.attach_conv_layer(model, 32, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 32, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 64, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 64, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 128, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 128, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.flatten(model)
    model = nb.attach_dense_layer(model, 200, summary=True)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 32, summary=True)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 2)
    prediction = nb.attach_softmax_layer(model)
That’s our model, and it’s largely self-explanatory. Thanks to our network builder class, creating image classifier models is super easy now.
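Before moving on, it helps to trace the shapes through the network (batch axis shown as ?); this also explains the size that flatten produces:

# input:                 [?, 128, 128, 1]
# conv 32 -> conv 32:    [?, 128, 128, 32]  ('SAME' padding keeps height and width)
# pool:                  [?, 64, 64, 32]
# conv 64 -> conv 64:    [?, 64, 64, 64]
# pool:                  [?, 32, 32, 64]
# conv 128 -> conv 128:  [?, 32, 32, 128]
# pool:                  [?, 16, 16, 128]
# flatten:               [?, 32768]         (16 * 16 * 128)
# dense 200 -> sigmoid -> dense 32 -> sigmoid -> dense 2 -> softmax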
Now we will create the optimization and accuracy blocks
with tf.name_scope("Optimization") as scope:
global_step = tf.Variable(0, name='global_step', trainable=False)
cost = tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=target_labels)
cost = tf.reduce_mean(cost)
tf.summary.scalar("cost", cost)
optimizer = tf.train.AdamOptimizer().minimize(cost,global_step=global_step)
with tf.name_scope('accuracy') as scope:
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(target_labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Here we created a variable called global_step, which is incremented by one every time we run the optimizer. It helps us keep track of the total number of training steps performed; even if we load a saved model, the global step will be restored along with it.
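To see that mechanism in isolation, here is a minimal standalone sketch (my own illustration, not from the original post):

import tensorflow as tf

step = tf.Variable(0, name='global_step', trainable=False)
w = tf.Variable(1.0)
opt = tf.train.AdamOptimizer().minimize(tf.square(w), global_step=step)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for _ in range(5):
        sess.run(opt)
    print(sess.run(step))  # 5, incremented once per optimizer run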
Visualize in TensorBoard
This is the result in TensorBoard (if you read my TensorBoard tutorial, you already know how to use it).
[Image: the model graph in TensorBoard]
And this is the model after expanding it (sorry, I had to rotate it to fit the view).
[Image: the expanded ModelV2 graph, rotated]
We completed the CNN image classifier model; let’s train it.
Now we will use the DataSetGenerator class to create mini-batches from the train folder and train our newly created image classifier.
# we need to run this only once
# seperateData("./train")
dg = DataSetGenerator("./train")

epochs = 10
batchSize = 10

saver = tf.train.Saver()
model_save_path = "./saved model v2/"
model_name = 'model'

with tf.Session() as sess:
    summaryMerged = tf.summary.merge_all()
    filename = "./summary_log/run" + datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S")
    # setting global steps
    tf.global_variables_initializer().run()
    if os.path.exists(model_save_path + 'checkpoint'):
        # saver = tf.train.import_meta_graph('./saved '+modelName+'/model.ckpt.meta')
        saver.restore(sess, tf.train.latest_checkpoint(model_save_path))
    writer = tf.summary.FileWriter(filename, sess.graph)

    for epoch in range(epochs):
        batches = dg.get_mini_batches(batchSize, (128, 128), allchannel=False)
        for imgs, labels in batches:
            imgs = np.divide(imgs, 255)
            error, sumOut, acu, steps, _ = sess.run([cost, summaryMerged, accuracy, global_step, optimizer],
                                                    feed_dict={input_img: imgs, target_labels: labels})
            writer.add_summary(sumOut, steps)
            print("epoch=", epoch, "Total Samples Trained=", steps * batchSize,
                  "err=", error, "accuracy=", acu)
            if steps % 100 == 0:
                print("Saving the model")
                saver.save(sess, model_save_path + model_name, global_step=steps)
So this is our training setup. We run the seperateData function the first time to separate the images of dogs and cats into two different folders. After that, we created the DataSetGenerator object and some variables for the batch size and number of epochs.
The rest of the code is the same as in our previous tutorial. The only difference is that we added saver.restore and saver.save to restore and save the trained model. saver.save is called every 100 iterations and saves the model in the directory specified in the model_save_path variable, under the name specified in the model_name variable. If we close the program during training and run it again, it loads the last saved checkpoint using saver.restore and continues training from wherever it left off, so we don’t have to train from scratch every time we run the program.
So This Is the Complete CNN Model and Training Setup Code
This is the complete CNN image classifier training setup code. We can run it as many times as we want, and each time it will resume training from the last saved checkpoint, so the more we run and train this classifier, the more accurate it will become.
import tensorflow as tf
from NetworkBuilder import NetworkBuilder
from DataSetGenerator import DataSetGenerator, seperateData
import datetime
import numpy as np
import os

with tf.name_scope("Input") as scope:
    input_img = tf.placeholder(dtype='float', shape=[None, 128, 128, 1], name="input")

with tf.name_scope("Target") as scope:
    target_labels = tf.placeholder(dtype='float', shape=[None, 2], name="Targets")

with tf.name_scope("Keep_prob_input") as scope:
    keep_prob = tf.placeholder(dtype='float', name='keep_prob')  # reserved for dropout, unused in this model

nb = NetworkBuilder()

with tf.name_scope("ModelV2") as scope:
    model = input_img
    model = nb.attach_conv_layer(model, 32, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 32, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 64, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 64, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.attach_conv_layer(model, 128, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_conv_layer(model, 128, summary=True)
    model = nb.attach_relu_layer(model)
    model = nb.attach_pooling_layer(model)
    model = nb.flatten(model)
    model = nb.attach_dense_layer(model, 200, summary=True)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 32, summary=True)
    model = nb.attach_sigmoid_layer(model)
    model = nb.attach_dense_layer(model, 2)
    prediction = nb.attach_softmax_layer(model)

with tf.name_scope("Optimization") as scope:
    global_step = tf.Variable(0, name='global_step', trainable=False)
    cost = tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=target_labels)
    cost = tf.reduce_mean(cost)
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.AdamOptimizer().minimize(cost, global_step=global_step)

with tf.name_scope('accuracy') as scope:
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(target_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

dg = DataSetGenerator("./train")

epochs = 10
batchSize = 10

saver = tf.train.Saver()
model_save_path = "./saved model v2/"
model_name = 'model'

with tf.Session() as sess:
    summaryMerged = tf.summary.merge_all()
    filename = "./summary_log/run" + datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S")
    # setting global steps
    tf.global_variables_initializer().run()
    if os.path.exists(model_save_path + 'checkpoint'):
        # saver = tf.train.import_meta_graph('./saved '+modelName+'/model.ckpt.meta')
        saver.restore(sess, tf.train.latest_checkpoint(model_save_path))
    writer = tf.summary.FileWriter(filename, sess.graph)

    for epoch in range(epochs):
        batches = dg.get_mini_batches(batchSize, (128, 128), allchannel=False)
        for imgs, labels in batches:
            imgs = np.divide(imgs, 255)
            error, sumOut, acu, steps, _ = sess.run([cost, summaryMerged, accuracy, global_step, optimizer],
                                                    feed_dict={input_img: imgs, target_labels: labels})
            writer.add_summary(sumOut, steps)
            print("epoch=", epoch, "Total Samples Trained=", steps * batchSize,
                  "err=", error, "accuracy=", acu)
            if steps % 100 == 0:
                print("Saving the model")
                saver.save(sess, model_save_path + model_name, global_step=steps)
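Once training has produced a checkpoint, you can restore it and classify a single image. Here is a hypothetical inference sketch (not from the original post): it assumes the graph above has already been built in the same script, and it uses a zero array as a stand-in for a real 128x128 grayscale image.

with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint(model_save_path))
    image = np.zeros((128, 128, 1), dtype=np.float32)  # stand-in for a real image
    image = image / 255.0                              # same scaling as in training
    probs = sess.run(prediction, feed_dict={input_img: [image]})
    print(probs)  # e.g. [[0.93, 0.07]], softmax probabilities for the two classes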
Some Advanced Modifications
I added one extra method to the network builder class to create modules similar to the ones in the Inception model by Google.
Here is the code:
class NetworkBuilder:
    # ... (methods defined earlier) ...

    def attach_inception_module(self, input_layer, output_size):
        output_size_road1 = int(output_size * 0.2)
        road1 = self.attach_conv_layer(input_layer=input_layer, output_size=output_size_road1,
                                       feature_size=(1, 1))
        road1 = self.attach_relu_layer(road1)

        output_size_road2 = int(output_size * 0.3)
        road2 = self.attach_conv_layer(input_layer=input_layer, output_size=output_size_road2,
                                       feature_size=(1, 1))
        road2 = self.attach_relu_layer(road2)
        road2 = self.attach_conv_layer(input_layer=road2, output_size=output_size_road2,
                                       feature_size=(3, 3))

        output_size_road3 = int(output_size * 0.3)
        road3 = self.attach_conv_layer(input_layer=input_layer, output_size=output_size_road3,
                                       feature_size=(1, 1))
        road3 = self.attach_relu_layer(road3)
        road3 = self.attach_conv_layer(input_layer=road3, output_size=output_size_road3,
                                       feature_size=(5, 5))

        output_size_road4 = output_size - (output_size_road1 + output_size_road2 + output_size_road3)
        road4 = self.attach_pooling_layer(input_layer=input_layer, strides=[1, 1, 1, 1])
        road4 = self.attach_conv_layer(input_layer=road4, output_size=output_size_road4,
                                       feature_size=(1, 1))

        with tf.name_scope("FilterConcat") as scope:
            concat = tf.concat([road1, road2, road3, road4], axis=3, name="FilterConcat")
            concat = self.attach_relu_layer(concat)
            return concat
And this is how it looks in TensorBoard:
[Image: the inception module graph in TensorBoard]
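To use it, we can swap the module in anywhere a convolution layer would go; for example (a hypothetical modification of the model above):

model = nb.attach_inception_module(model, output_size=64)
model = nb.attach_pooling_layer(model)

Note that output_size is split across the four roads internally (20%, 30%, 30%, and the remainder for the pooling road), so the concatenated output has output_size feature maps in total.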
Limitations of our Image Classifier
Training this image classifier from scratch would take weeks on a CPU to get a decent result, and even on a GPU it is still very time-consuming. So if you are like me and don’t have a computer with a high-end GPU, you are doomed.
But there is good news: we can still train complex TensorFlow models that do this kind of work without so much computation power.
In the next tutorial, we will use transfer learning to load an already trained image classifier called Inception (trained by Google on multiple GPUs for weeks). We will use its pre-trained checkpoint as our starting point and modify the network structure to fit our own needs. That will take much less time to train, and we can train it even on a CPU.