Easy Guide to Image Recognition with CNN

Jan 3, 2024

This post focuses on classifying color images from the CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes). For an introduction to Convolutional Neural Networks (CNNs), see my previous post, “Comparing SVM and CNN in Recognizing Handwritten Digits: An Overview.”

The CIFAR-10 dataset, commonly used for testing and comparing computer vision models, consists of 60,000 color images divided into 10 classes (6,000 images per class): airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

We use a CNN to classify each image into one of these ten categories. The CIFAR-10 dataset is available in TensorFlow and includes:

train_images: These are the training images, each measuring 32x32 pixels. They represent various objects in 10 different classes.
train_labels: These labels correspond to the training images and specify the category (class) to which each image belongs. There are 10 possible class labels.
test_images: Similar to train_images, these are the test images used to evaluate the model’s performance. They are also 32x32-pixel images.
test_labels: These labels match the test_images and indicate the class for each test image. They are used to assess the accuracy of the model’s predictions.

The Python code for this tutorial appears as Example 5 in a Jupyter Notebook. You can download the notebook from this link: DLBasicsTutorial.ipynb.

# Install the required packages in the current Jupyter kernel
import sys
!{sys.executable} -m pip install tensorflow pandas numpy matplotlib seaborn scikit-learn scipy

# Importing libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

datasets.cifar10.load_data(): Loads the CIFAR-10 dataset through TensorFlow, already split into a training set of 50,000 images and a test set of 10,000 images.

# Load dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

Pixel values in 8-bit color images range from 0 to 255. Dividing them by 255 scales them to the range 0 to 1, which generally helps the model train more efficiently because all inputs sit on a small, consistent scale.

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
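
As a small illustration of this scaling step, the sketch below applies the same division to a made-up 2x2 RGB patch. The pixel values are invented for the example and are not taken from CIFAR-10.

```python
import numpy as np

# Hypothetical 2x2 RGB patch with 8-bit values (not real CIFAR-10 data)
patch = np.array([[[0, 128, 255], [64, 64, 64]],
                  [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats and maps [0, 255] onto [0, 1]
scaled = patch / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The result is a float64 array. If memory becomes a concern, casting with `.astype('float32')` is one possible option.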

Image Processing in CNN Layers:

First Convolutional Layer:
▪ Input Size: 32x32x3 (Height x Width x Channels), where Channels=3 indicates that the images are in color.
▪ Number of Filters: 32
▪ Convolutional Kernel Size: 3x3
▪ Activation: ReLU
▪ Output: 30x30x32. With a 3x3 kernel and no padding, the kernel cannot be centered on the border pixels, so the output shrinks by 2 pixels in both height and width, from 32x32 to 30x30.
Max-Pooling Layer:
▪ Pooling Window Size: 2x2, used to halve the height and width of the feature map.
▪ Output: 15x15x32
Second Convolutional Layer:
▪ Input: 15x15x32
▪ Filters: 64
▪ Kernel Size: 3x3
▪ Activation: ReLU
▪ Output: 13x13x64
Max-Pooling Layer:
▪ Pooling Window Size: 2x2, used to halve the height and width of the feature map.
▪ Output: 6x6x64 (13 / 2 is rounded down)
Third Convolutional Layer:
▪ Input: 6x6x64
▪ Filters: 64
▪ Kernel Size: 3x3
▪ Activation: ReLU
▪ Output: 4x4x64
Flatten Layer: Converts the 4x4x64 feature maps to a 1D vector of size 4 x 4 x 64 = 1024.
Two Dense Layers:
▪ First: 64 Neurons, ReLU activation.
▪ Second: 10 Neurons, one per class, with no activation, so the outputs are raw scores (logits).
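
The shape arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that assumes stride-1 convolutions without padding and non-overlapping 2x2 pooling, as in the layer list. The helper names (conv_out, pool_out, conv_params, dense_params) are made up for this illustration and are not part of Keras.

```python
def conv_out(size, kernel):
    """Output side length of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, window):
    """Output side length of non-overlapping pooling (rounds down)."""
    return size // window

def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

side, channels = 32, 3
trace = []          # (layer name, output shape)
total_params = 0

# (name, number of filters, followed by 2x2 max pooling?)
for name, filters, pooled in [("conv1", 32, True),
                              ("conv2", 64, True),
                              ("conv3", 64, False)]:
    total_params += conv_params(3, channels, filters)
    side, channels = conv_out(side, 3), filters
    trace.append((name, (side, side, channels)))
    if pooled:
        side = pool_out(side, 2)
        trace.append((name + "_pool", (side, side, channels)))

flat = side * side * channels
total_params += dense_params(flat, 64) + dense_params(64, 10)

for name, shape in trace:
    print(name, shape)
print("flatten", flat)                  # 1024
print("parameters", total_params)       # 122570
```

Running model.summary() on the Keras model built below should report the same shapes, which is a quick way to cross-check this kind of calculation.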

# Build the CNN model
model = models.Sequential([
  layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
  layers.MaxPooling2D((2, 2)),
  layers.Conv2D(64, (3, 3), activation='relu'),
  layers.MaxPooling2D((2, 2)),
  layers.Conv2D(64, (3, 3), activation='relu'),
  layers.Flatten(),
  layers.Dense(64, activation='relu'),
  layers.Dense(10)
])

The model is compiled with the Adam optimizer, Sparse Categorical Cross-Entropy loss, and Accuracy as the metric. Let’s look at these three key components:
▪ Adam Optimizer: This optimizer updates the network weights efficiently on large datasets and adapts the step size for each weight during training.
▪ Sparse Categorical Cross-Entropy Loss: This loss function is used for multi-class classification where each label is a single integer rather than a one-hot vector, which saves memory for problems with many classes. Because the last Dense layer has no softmax activation, the loss is created with from_logits=True.
▪ Accuracy Metric: It measures the ratio of correct predictions to total predictions. It is straightforward but can be misleading on imbalanced datasets; the CIFAR-10 classes are balanced, so it is a reasonable choice here.

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
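
To make the loss and the metric more concrete, here is a minimal NumPy sketch of what sparse categorical cross-entropy computes from raw logits. It is an illustration only, not the Keras implementation, and the logits and labels are made-up values for 2 samples and 3 classes.

```python
import numpy as np

def sparse_cce_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits."""
    # Subtract the row maximum before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of each class probability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Made-up logits (illustrative values only)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])   # integer class ids, not one-hot vectors

loss = sparse_cce_from_logits(logits, labels)
# Accuracy: fraction of rows where the highest logit matches the label
accuracy = (logits.argmax(axis=1) == labels).mean()

print(float(loss), float(accuracy))
```

The second sample has identical logits for all three classes, so its individual loss is log(3), the value expected from a uniform guess.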

The model is trained for 10 epochs. Accuracy and loss are recorded after each epoch for both the training data and the validation data (here, the test set is passed as validation_data), as seen below.

# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

Epoch 1/10
1563/1563 [==============================] - 15s 9ms/step - loss: 1.5258 - accuracy: 0.4379 - val_loss: 1.2129 - val_accuracy: 0.5595
Epoch 2/10
1563/1563 [==============================] - 16s 10ms/step - loss: 1.1607 - accuracy: 0.5902 - val_loss: 1.0670 - val_accuracy: 0.6195
Epoch 3/10
1563/1563 [==============================] - 16s 10ms/step - loss: 1.0169 - accuracy: 0.6429 - val_loss: 1.0025 - val_accuracy: 0.6526
Epoch 4/10
1563/1563 [==============================] - 15s 9ms/step - loss: 0.9235 - accuracy: 0.6767 - val_loss: 0.9296 - val_accuracy: 0.6740
Epoch 5/10
1563/1563 [==============================] - 15s 9ms/step - loss: 0.8563 - accuracy: 0.6998 - val_loss: 0.9376 - val_accuracy: 0.6714
Epoch 6/10
1563/1563 [==============================] - 15s 10ms/step - loss: 0.7990 - accuracy: 0.7214 - val_loss: 0.9069 - val_accuracy: 0.6883
Epoch 7/10
1563/1563 [==============================] - 15s 9ms/step - loss: 0.7457 - accuracy: 0.7413 - val_loss: 0.9674 - val_accuracy: 0.6690
Epoch 8/10
1563/1563 [==============================] - 14s 9ms/step - loss: 0.7081 - accuracy: 0.7532 - val_loss: 0.8844 - val_accuracy: 0.6925
Epoch 9/10
1563/1563 [==============================] - 15s 9ms/step - loss: 0.6684 - accuracy: 0.7637 - val_loss: 0.8768 - val_accuracy: 0.7033
Epoch 10/10
1563/1563 [==============================] - 15s 9ms/step - loss: 0.6304 - accuracy: 0.7787 - val_loss: 0.9212 - val_accuracy: 0.6938
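
One way to read this log is to look for the epoch with the lowest validation loss and at the gap between training and validation accuracy. The sketch below does this with the values copied from the log above; in a notebook, the same lists are available in history.history. The variable names are chosen for this example.

```python
import math

# Per-epoch values copied from the training log above
train_acc = [0.4379, 0.5902, 0.6429, 0.6767, 0.6998,
             0.7214, 0.7413, 0.7532, 0.7637, 0.7787]
val_acc = [0.5595, 0.6195, 0.6526, 0.6740, 0.6714,
           0.6883, 0.6690, 0.6925, 0.7033, 0.6938]
val_loss = [1.2129, 1.0670, 1.0025, 0.9296, 0.9376,
            0.9069, 0.9674, 0.8844, 0.8768, 0.9212]

# 50,000 training images with the Keras default batch size of 32
steps_per_epoch = math.ceil(50000 / 32)

# Epochs are numbered from 1 in the log
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
final_gap = train_acc[-1] - val_acc[-1]

print(f"Steps per epoch: {steps_per_epoch}")                     # 1563
print(f"Lowest validation loss at epoch {best_epoch}")           # 9
print(f"Final train/validation accuracy gap: {final_gap:.4f}")   # 0.0849
```

Training accuracy keeps climbing while validation accuracy levels off, which may be an early sign of overfitting.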

Figure 1 visualizes the model’s training history to trace accuracy changes over time on both training and validation data.

# Plot training history
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Figure 1: Training and validation accuracy over 10 epochs. Training accuracy keeps rising, while validation accuracy levels off at around 0.69 to 0.70.

Figure 2 visualizes the model’s training history to trace loss changes over time on both training and validation data.

plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Figure 2: Training and validation loss over 10 epochs. Training loss falls steadily, while validation loss flattens out at around 0.9.

model.evaluate(): Computes the model’s loss and accuracy on the test set. The test accuracy is approximately 69.38% for this model.

# Test the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'\nTest accuracy: {test_acc}')

Figure 3 displays the first 5 test images with their actual and predicted labels. The model outputs one score per category, and the predicted label is the category with the highest score (np.argmax).

# Class names in CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
               'dog', 'frog', 'horse', 'ship', 'truck']

# Make predictions
predictions = model.predict(test_images)

# Visualize the first 5 images and their predicted labels
plt.figure(figsize=(15, 3))
for i in range(5):
  plt.subplot(1, 5, i+1)
  plt.imshow(test_images[i], interpolation='nearest')
  plt.title(f"Actual: {class_names[test_labels[i][0]]}\nPredicted: {class_names[np.argmax(predictions[i])]}")
  plt.axis('off')
plt.show()

Figure 3: The first 5 test images with their actual and predicted labels.
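
Since the final Dense layer has no softmax, model.predict() returns raw scores (logits) rather than probabilities. The sketch below shows, with made-up logits for a single image, how those scores could be turned into probabilities and a class name. It also shows why the plotting code indexes the labels with [i][0]: Keras returns the CIFAR-10 labels as a column of shape (N, 1).

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up logits for one image (illustrative values, not real model output)
logits = np.array([[-1.2, 0.3, 0.1, 4.0, 0.5, 2.5, 0.0, -0.7, -2.0, 0.2]])
labels = np.array([[3]])  # same (N, 1) layout as test_labels

probs = softmax(logits)
pred = int(np.argmax(probs[0]))

print("Predicted:", class_names[pred])         # cat
print("Actual:", class_names[labels[0][0]])    # cat
```

Softmax does not change which score is largest, so np.argmax gives the same class whether it is applied to the logits or to the probabilities.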

We should conduct further tests to optimize the CNN model for the CIFAR-10 dataset. What steps are needed for this fine-tuning?

To optimize our CNN model for CIFAR-10, consider these key areas:

Layer Adjustments: Modify the number of layers and filters. Experiment with kernel sizes (e.g., 1x1, 3x3, 5x5).
Activation Functions: Try different functions like Leaky ReLU or ELU.
Pooling Layers: Experiment with the size and type, like Average Pooling.
Dense Layers: Adjust the number of neurons in the dense layer.
Output Layer: Ensure compatibility with CIFAR-10’s 10 classes.
Optimizer: Test different optimizers and their settings, like learning rate.
Loss Function: Confirm the suitability of SparseCategoricalCrossentropy for our data format.
Regularization: Implement dropout layers or L1/L2 regularization to prevent overfitting.
Batch Normalization: Use it to normalize layer inputs.
Learning Rate and Scheduling: Experiment with different rates and schedules.
Data Augmentation: Apply techniques like flipping and rotation.
Training Configuration: Vary batch sizes and epoch numbers.
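
As one example from this list, data augmentation can be sketched in plain NumPy. The function below is a hypothetical helper written for this post, not a Keras API: it randomly flips an image left to right and then takes a random 32x32 crop from a zero-padded copy. The pad width of 4 pixels is an assumed value.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Randomly flip a (H, W, C) image left-right, then take a random crop
    of the original size from a zero-padded copy."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # horizontal flip
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    # Offsets range from 0 to 2 * pad, so the crop always fits inside the padded image
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # random stand-in for a normalized CIFAR-10 image
augmented = augment(image, rng)

print(augmented.shape)  # (32, 32, 3)
```

In a Keras pipeline, preprocessing layers such as tf.keras.layers.RandomFlip could play a similar role inside the model itself.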

In my next post, I’ll explore predicting future air quality trends using the Long Short-Term Memory (LSTM) model in a time series analysis.

Written by Neda Peyrone, PhD