Introduction
The AlexNet architecture won the ImageNet competition (ILSVRC) in 2012. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton proposed this model in the research paper "ImageNet Classification with Deep Convolutional Neural Networks". AlexNet achieved a top-5 error of 15.3%, which was 10.8 percentage points lower than that of the runner-up.
AlexNet used a large, deep convolutional neural network to train on and classify the 1.2 million high-resolution images in ImageNet. It was one of the first convolutional architectures to use GPUs to speed up training.
The AlexNet architecture consists of 5 convolutional layers, 3 max-pooling layers, 2 fully connected layers, and 1 softmax output layer.
Objectives
- Understand the architecture of AlexNet.
- Understand why the AlexNet architecture gives better results.
AlexNet Architecture
The AlexNet architecture is a convolutional neural network consisting of 5 convolutional layers, 3 max-pooling layers, 2 fully connected layers, and 1 softmax output layer. Let us understand each layer in depth.
Input Layer
- The input size is 227*227*3.
- 3 corresponds to the RGB channels.
- The input size is fixed because the fully connected layers at the end of the network expect a fixed number of features.
Layer 1
- This is a convolutional layer with 96 filters of size 11*11.
- There is no padding and stride is equal to 4.
- The output size is calculated as: output = ((input + 2*padding – filter size)/stride) + 1. A small Python sketch of this formula follows this list.
- Here input = 227, padding = 0, filter size = 11, stride = 4.
- Output = ((227 + 2*0 – 11)/4) + 1 = 55
- Output feature map: 55*55*96
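The output-size formula above can be wrapped in a small helper. Here is a minimal Python sketch (the function name conv_output_size is my own, not from the paper) that reproduces the Layer 1 calculation:

```python
def conv_output_size(input_size, filter_size, padding, stride):
    # output = ((input + 2*padding - filter_size) / stride) + 1
    return (input_size + 2 * padding - filter_size) // stride + 1

# Layer 1: 227x227 input, 11x11 filters, padding 0, stride 4
print(conv_output_size(227, 11, 0, 4))  # 55
```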
Layer 2
- This is a max-pooling layer with a 3*3 filter and stride 2. There is no padding.
- Output = ((55 + 2*0 – 3)/2) + 1 = 27
- Output feature map: 27*27*96.
Layer 3
- This is a convolutional layer with 256 filters of size 5*5.
- Padding is equal to 2 and stride is equal to 1.
- Output = ((27 + 2*2 – 5)/1) + 1 = 27.
- Output feature map: 27*27*256.
Layer 4
- This is a max-pooling layer with a 3*3 filter and stride 2. There is no padding.
- Output = ((27 + 2*0 – 3)/2) + 1 = 13
- Output feature map: 13*13*256.
Layer 5
- This is a convolutional layer with 384 filters of size 3*3.
- Padding is equal to 1 and stride is equal to 1.
- Output = ((13 + 2*1 – 3)/1) + 1 = 13.
- Output feature map: 13*13*384.
Layer 6
- This is a convolutional layer with 384 filters of size 3*3.
- Padding is equal to 1 and stride is equal to 1.
- Output = ((13 + 2*1 – 3)/1) + 1 = 13.
- Output feature map: 13*13*384.
Layer 7
- This is a convolutional layer with 256 filters of size 3*3.
- Padding is equal to 1 and stride is equal to 1.
- Output = ((13 + 2*1 – 3)/1) + 1 = 13.
- Output feature map: 13*13*256.
Layer 8
- This is a max-pooling layer with a 3*3 filter and stride 2. There is no padding.
- Output = ((13 + 2*0 – 3)/2) + 1 = 6
- Output feature map: 6*6*256.
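As a quick sanity check, the conv_output_size helper sketched after Layer 1 can trace the spatial size through the whole convolution and pooling stack using the parameters listed above:

```python
size = 227
size = conv_output_size(size, 11, 0, 4)  # Layer 1 (conv):     55
size = conv_output_size(size, 3, 0, 2)   # Layer 2 (max pool): 27
size = conv_output_size(size, 5, 2, 1)   # Layer 3 (conv):     27
size = conv_output_size(size, 3, 0, 2)   # Layer 4 (max pool): 13
size = conv_output_size(size, 3, 1, 1)   # Layer 5 (conv):     13
size = conv_output_size(size, 3, 1, 1)   # Layer 6 (conv):     13
size = conv_output_size(size, 3, 1, 1)   # Layer 7 (conv):     13
size = conv_output_size(size, 3, 0, 2)   # Layer 8 (max pool): 6
print(size * size * 256)                 # 9216 values fed into the fully connected layers
```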
Layer 9
- This is a fully connected layer with 4096 neurons.
- It takes a flattened input of size 9216 (6*6*256).
- The activation function used is ReLU.
Layer 10
- This is a fully connected layer with 4096 neurons.
- The activation function used is ReLU.
Output
- This is a fully connected layer with 1000 neurons, one for each ImageNet class.
- The activation function used is Softmax.
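Putting the layers together, here is a minimal PyTorch sketch of the architecture described above. It is an illustrative reimplementation, not the authors' original code; following common PyTorch practice, the final softmax is left to the loss function (e.g. nn.CrossEntropyLoss) rather than applied inside the model.

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Minimal AlexNet sketch following the layer sizes described above."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # Layer 1: 55x55x96
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Layer 2: 27x27x96
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # Layer 3: 27x27x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Layer 4: 13x13x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # Layer 5: 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # Layer 6: 13x13x384
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # Layer 7: 13x13x256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),          # Layer 8: 6x6x256
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                   # Layer 9
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                          # Layer 10
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                   # Output layer
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten 6x6x256 to 9216 features
        return self.classifier(x)

model = AlexNet()
out = model(torch.randn(1, 3, 227, 227))
print(out.shape)  # torch.Size([1, 1000])
```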
Why does the AlexNet architecture give better results?
Deep Architecture
AlexNet introduced a deep architecture with eight layers, including five convolutional layers and three fully connected layers. This deep structure allowed the model to learn complex features and hierarchical representations of images, which were crucial for achieving high accuracy.
ReLU (Rectified Linear Unit)
AlexNet used ReLU instead of tanh as its activation function. Deep convolutional networks with ReLU train several times faster than equivalent networks using tanh or sigmoid activations.
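As a reminder of how simple this activation is, here is ReLU written out in plain Python (illustrative only):

```python
def relu(x):
    # ReLU clips negative values to zero: f(x) = max(0, x)
    return max(0.0, x)

print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```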
Dropout Regularization
AlexNet applied dropout as a regularization technique that randomly drops units during training to prevent overfitting. This made the model more robust and better able to generalize to unseen data.
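Here is a minimal sketch of dropout in PyTorch, assuming the 0.5 drop probability used in AlexNet's fully connected layers; note that units are only dropped in training mode:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the values zeroed, survivors scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))  # identity at evaluation time
```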
Data Augmentation
To avoid overfitting, AlexNet applied data augmentation techniques such as random cropping and horizontal flipping to artificially increase the size of the training dataset. This increased the diversity of the training data, allowing the model to learn more generalized features.
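A possible torchvision-based training transform implementing the two augmentations mentioned above; resizing to 256 before the random 227x227 crop is an assumption chosen to match the input size used in this article:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize(256),              # shorter side resized to 256 (assumption)
    transforms.RandomCrop(227),          # random 227x227 crop
    transforms.RandomHorizontalFlip(),   # flip left-right with probability 0.5
    transforms.ToTensor(),
])
```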
Local Response Normalization (LRN)
After ReLU, f(x) = max(0, x), the activations are unbounded, unlike tanh and sigmoid outputs, which lie in a fixed range. A normalization step is therefore usually applied after ReLU, and AlexNet introduced Local Response Normalization (LRN) for this purpose.
LRN is inspired by a concept from neuroscience called "lateral inhibition", which describes how an excited neuron suppresses the activity of its neighbors.
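PyTorch ships a LocalResponseNorm layer that can be applied after a ReLU; the hyperparameters below (size=5, alpha=1e-4, beta=0.75, k=2) are the values reported in the AlexNet paper:

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.relu(torch.randn(1, 96, 55, 55))  # e.g. the Layer 1 feature map after ReLU
print(lrn(x).shape)                         # torch.Size([1, 96, 55, 55])
```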
Use of GPUs
AlexNet was one of the first architectures to use GPUs (Graphics Processing Units) for training. This enabled the model to handle large-scale datasets like ImageNet and to train faster than previous methods that relied on CPUs.
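In a modern framework, using the GPU is a one-line change; a minimal sketch reusing the AlexNet class defined earlier in this article:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AlexNet().to(device)                      # model parameters moved to the GPU
images = torch.randn(32, 3, 227, 227).to(device)  # a batch of inputs on the same device
outputs = model(images)
print(outputs.shape)  # torch.Size([32, 1000])
```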