Logo Recognition with the Use of Deep Convolutional Neural Networks

Abstract: Automatic logo recognition is gaining importance due to the growing number of its applications. Unlike other object recognition tasks, logo recognition is more challenging because of the limited amount of original data available. In this paper

I. INTRODUCTION
Logos are symbols that firms use to identify themselves and their products. They normally contain colors, shapes, textures, and/or text [1]. Logo recognition is a key problem for many applications such as copyright infringement detection, online brand management, vehicle recognition, and contextual advertisement placement [2,3]. Companies rarely change their logos, and only the context in which a logo appears varies from one product to another, yet logo recognition remains a challenging task. Challenges for accurate logo recognition include perspective deformations, varying backgrounds, occlusions, warping, and variations in size and color [1]. Moreover, the growing number of brands with their own logos makes the task even harder, and logo recognition systems require substantial computational power to support multi-class classification efficiently.
Traditionally, Artificial Intelligence (AI) techniques have been used to solve object recognition problems. In particular, Convolutional Neural Networks (CNNs) with a deep structure and many hidden layers are a very popular model for solving object recognition problems [18,19]. The approach followed by these techniques is based on two tasks: feature extraction and feature classification. These tasks are commonly achieved by using the convolutional layers as feature extraction modules and the Fully Connected (FC) layers for classification [17]. Many techniques derived from CNNs have been applied to logo recognition. For example, the authors of [4,5] used pre-trained CNNs for logo recognition. However, such techniques carry a high computational overhead, which limits the contexts in which these computationally intensive solutions can be used. Hence, the problem of accurate logo recognition with low computational effort remains unresolved.
In this paper, the transfer learning technique is applied to a Deep Convolutional Neural Network (DCNN) model to achieve logo recognition without requiring large computational resources. An accuracy comparison with state-of-the-art methods shows that the proposed method performs similarly.
II. RELATED WORKS
Many solutions have been proposed for the problem of accurate logo recognition. Earlier works on logo recognition were mainly based on keypoint detectors and descriptors. A feature bundling technique was proposed by Romberg and Lienhart [6] for scalable logo recognition. Their method combined local features with features from their spatial neighbourhood into a Bag of Words (BoW). Similarly, Romberg et al. [7] proposed a logo recognition system based on encoding and indexing the relative spatial layout of local features (e.g. edges and triangles). The local features and the spatial layout helped them to quantize the regions in the logos. Francesconi [8] presented a Recursive Neural Network-based technique for the classification of black and white logos. The method also used contour trees to hold the topological structural information. Although Francesconi's method was efficient, its performance on more complex colored logos is not known. Moreover, this approach assumes that the maximum number of children for each node is known in advance. Duffner and Garcia [9] proposed a CNN-based solution for recognizing logos in television programs. In this technique, pixel values were fed directly into a CNN with two convolution layers to detect watermarks on television. The main limitations of this technique are its low detection rates and its limited applicability (i.e. only television logos can be detected). Zhu and Doermann [10] used Fisher classifiers for recognizing logos in documents. A problem with their technique is that it cannot handle large variations [11]. The authors of [3,4,5,12] proposed Deep Learning-based solutions for the problem of accurate logo recognition. The authors of [4,5] relied on pre-trained CNNs and synthetically generated data for logo detection.
Similarly, the authors of [3] proposed and evaluated several network architectures, while the authors of [12] used pre-trained CNN models along with Fast Region-Based Convolutional Networks. The main limitations of these techniques are their reliance on pre-trained CNNs and the limited training data available for logo recognition. The authors of [2] proposed a Deep Learning-based logo recognition method. Its recognition pipeline includes a logo region proposal module followed by a CNN module that is trained for logo identification. This method can also handle logos that are not localized. However, its accuracy is still limited.
III. PROPOSED ARCHITECTURE FOR LOGO RECOGNITION
CNNs are one of the most powerful Deep Learning models. A CNN is composed of multiple Convolution, Pooling, ReLU correction, and Fully-Connected (FC) layers stacked together. CNN models have demonstrated an impressive ability to generalize when trained on large datasets with millions of images. The input image passes through multiple hidden layers where it is filtered, corrected, and compressed many times to finally form a vector. For the classification task, the output vector gives the probabilities of class membership. A CNN starts with a convolutional layer and ends with a fully-connected layer. The intermediate layers can be stacked in different ways, provided that the output of one layer has the same structure as the input of the next. In general, a CNN stacks multiple Convolution and ReLU correction layers, optionally adds a Pooling layer, and repeats this pattern multiple times. Then, it stacks the FC layers. The more layers there are, the deeper the neural network is.
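As an illustration of how the final output vector can represent class-membership probabilities, the following minimal sketch applies a softmax to raw class scores. The use of softmax is an assumption (the text above does not name the output function), and the three scores are hypothetical values chosen only for the example.

```python
import math

def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores produced by the last FC layer for three logo classes.
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the one with the highest probability.
predicted_class = probs.index(max(probs))
```

In this sketch the class with the largest raw score also receives the largest probability, so the prediction does not depend on the normalization itself; the softmax only makes the scores interpretable as probabilities.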
To obtain more accurate CNNs, we must use a model that can learn more competitive representations without a dramatic increase in network parameters. As we tackle recognition with a limited amount of original data, we are interested in efficient representations that can be achieved with a small number of parameters. Densely Connected Convolutional Networks (DenseNet) [14] are a promising candidate for this setting. In DenseNet, the original CNN layers, except for the first convolutional layer, are replaced by dense blocks and transition layers. It outperforms state-of-the-art CNN models in the classification task on many datasets while using a lower-complexity network. A DenseNet architecture with three dense blocks and two transition layers is illustrated in Figure 1. A dense block is a group of convolution layers in which the layers are densely connected: each layer receives as input the outputs of all previous layers. A single dense block packs Convolution layers followed by a ReLU activation layer and a Batch Normalization layer. To reduce the size of the feature maps, DenseNet uses transition layers, which are composed of a Batch Normalization layer followed by a 1×1 convolution and a 2×2 average pooling. The transition layer reduces the height and the width dimensions but leaves the feature dimensions the same. DenseNet can stack hundreds of layers without optimization difficulties. Thus, DenseNet is one of the best deep CNN models for image classification and object recognition tasks. To classify logos, our custom-made model consists of four dense blocks and three transition layers. Figure 2 presents the proposed CNN model. Before the first dense block, a 7×7 convolution layer and a 3×3 Max Pooling layer are applied to the input images to extract the most important features and to detect small variations in the image.
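The dense connectivity described above (each layer receiving the outputs of all previous layers) can be sketched as simple channel bookkeeping. This is only an illustration: the function names are hypothetical, and the 64 input channels, 6 layers, and growth rate of 32 are assumed values borrowed from common DenseNet configurations, not figures stated in this paper. The `compression` parameter is likewise an assumption: 1.0 matches the description above (feature dimensions unchanged), while 0.5 corresponds to the compressed variant described in the DenseNet paper [14].

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the input channel count seen by each layer and the block output."""
    per_layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        # Each layer sees the concatenation of all earlier feature maps.
        per_layer_inputs.append(channels)
        # Its own output (growth_rate maps) is concatenated for later layers.
        channels += growth_rate
    return per_layer_inputs, channels

def transition_channels(in_channels, compression=1.0):
    """Channel count after the 1x1 convolution of a transition layer."""
    return int(in_channels * compression)

# Assumed configuration: 64 input channels, 6 layers, growth rate 32.
layer_inputs, block_out = dense_block_channels(64, 6, 32)

unchanged = transition_channels(block_out)        # as described in the text
compressed = transition_channels(block_out, 0.5)  # compressed variant
```

Because every layer adds only `growth_rate` new feature maps, the number of channels grows linearly with depth inside a block, which is one reason a densely connected network can stay compact.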
Between each pair of consecutive dense blocks, a transition layer composed of a 1×1 convolution followed by a 2×2 average pooling layer was used. A 7×7 global average pooling layer with a stride of 2 is placed after the fourth dense block to fix the size of the feature maps that are connected to the fully connected layer. Finally, the transfer learning technique was used to configure the output layer for the logo classes instead of the original ImageNet dataset classes [15].
When the collection of input images is limited, it is hard to train the CNN from scratch with random weight initialization. The number of parameters to learn is much higher than the number of images, which creates a risk of overfitting. Transfer learning is therefore very effective in this situation and is widely used in practice. It requires a CNN that has already been trained, preferably on a problem close to the one we want to solve.
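As a sanity check on the layer layout described above, the spatial size of the feature maps can be traced through the stem, the three transition layers, and the final pooling layer. This is a sketch under stated assumptions: the 224×224 input resolution and the stride and padding values of the 7×7 convolution and the 3×3 max pooling are not given in the text and are assumed here, and the dense blocks are assumed to preserve the spatial size. The helper `out_size` is a hypothetical name for the standard output-size formula of convolution and pooling layers.

```python
def out_size(size, kernel, stride, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224                       # assumed input resolution
trace = [("input", size)]

size = out_size(size, 7, 2, 3)   # 7x7 convolution (assumed stride 2, padding 3)
trace.append(("conv7x7", size))

size = out_size(size, 3, 2, 1)   # 3x3 max pooling (assumed stride 2, padding 1)
trace.append(("maxpool3x3", size))

# Dense blocks are assumed to keep the spatial size; only the 2x2 average
# pooling (stride 2) in each of the three transition layers halves it.
for i in range(1, 4):
    size = out_size(size, 2, 2)
    trace.append((f"transition{i}", size))

before_global_pool = size
size = out_size(size, 7, 2)      # 7x7 average pooling with stride 2
trace.append(("global_avgpool", size))
```

Under these assumptions the feature maps entering the last pooling layer are 7×7, so a 7×7 pooling window covers the whole map and yields a single value per channel, which is what allows a fixed-size vector to be passed to the fully connected layer.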
IV. EXPERIMENTS AND RESULTS
To evaluate the proposed model, the FlickrLogos-32 logo recognition dataset [7] was used for training and testing. FlickrLogos-32 is a publicly available dataset of 8240 real-world images of 32 different brand logos collected from Flickr. It was built for the evaluation of logo retrieval and multi-class logo detection and recognition systems. The dataset was divided into three separate subsets: a training set, a validation set, and a test set (P3). The weights of our custom-made network were initialized from a DenseNet model pre-trained on the ImageNet dataset. Then, the network was fine-tuned on the FlickrLogos-32 training set. Specifically, the pre-trained weights were loaded into the first convolution layer, the dense blocks, and the transition layers, while the weights of the output layer were updated until the loss function was minimized. Then, the entire network, including the first convolution layer, was fine-tuned using the validation set. After training, the test set P3 was used for testing. Table I shows the accuracy achieved by the custom-made network compared to existing models. The proposed model achieves state-of-the-art performance with high classification accuracy: an average accuracy of 92.8 % was achieved, compared to 91.7 % achieved in [2].
V. CONCLUSION
Logo recognition is an important task for many applications. In this paper, a Deep CNN model based on DenseNet was designed for logo recognition. The proposed model was trained and tested on the FlickrLogos-32 dataset. The results obtained were very encouraging compared to state-of-the-art works. The proposed logo recognition and classification model can be used in many applications such as online brand management and contextual advertisement placement. As potential future work, the design of a real-time logo recognition system for a mobile application may be considered.