Pneumonia and Eye Disease Detection using Convolutional Neural Networks

Abstract: Automatic disease detection systems based on Convolutional Neural Networks (CNNs) are proposed in this paper to help medical professionals detect diseases from scan and X-ray images. CNN-based classification supports prompt decision-making with high precision. CNNs are a class of deep learning models, and deep learning is a branch of Artificial Intelligence. The main advantage of CNNs compared to other deep learning algorithms is that they require minimal pre-processing. In the proposed disease detection system, two medical image datasets, consisting of Optical Coherence Tomography (OCT) images and chest X-ray images of 1-5 year-old children, are used as inputs. The medical images are processed and classified using CNNs, and performance measures such as accuracy, loss, and training time are recorded. The system is then implemented in hardware, where testing is done using the trained models. The results show that the validation accuracy is around 90% for the eye dataset and around 63% for the lung dataset. The proposed system aims to help medical professionals provide more accurate diagnoses, thus helping to reduce infant mortality due to pneumonia and allowing the severity of eye disease to be assessed at an earlier stage.

I. INTRODUCTION
A medical image based disease detection system using CNNs is proposed in this paper. The suggested system can detect pneumonia from X-ray images and eye disease from OCT scan images. The novel feature of the proposed system is that it has been implemented on low-cost hardware. In [1], a diagnostic system is proposed for detecting retinal diseases, and its performance is shown to be comparable to that of human experts. However, a hardware implementation of the system is not suggested. The Adam stochastic optimization method, a computationally efficient algorithm for training neural networks, is introduced in [2]. Empirical results demonstrate that Adam works well in practice and compares favourably to other stochastic optimization methods. In [3], the effect of convolutional network depth on accuracy is investigated, and architectural changes that improve the accuracy are proposed. A deep-learning-based approach to detect diseases and pests in tomato plants from images is presented in [4]; the images are captured in place by camera devices of various resolutions and then processed. The experimental results show that the system can effectively recognize nine different types of diseases and pests in tomato plants. In [5], the Face Detection and Face Recognition pipeline framework (FDREnet) is proposed, which detects faces using histograms of oriented gradients and trains a deep learning architecture using a Siamese technique with contrastive loss. However, disease detection is not investigated in that paper. A review of the applications of AI in soil management, crop management, weed management and disease management can be found in [6], but disease management and disease detection in humans using AI are not investigated there.

II. DATASETS USED
In order to test the proposed idea, two publicly available medical image datasets were considered for training two convolutional neural networks: the lung dataset consisted of images from [7] and the eye dataset of images from [8]. Data are essential for training any neural network; apart from other parameters, a neural network is only as good as the data it is trained on.

For eye disease detection, OCT images of the retina are considered. OCT is a non-invasive method of imaging biological tissues using low-coherence light, and it can capture two-dimensional and three-dimensional images at micrometer-level resolution. The OCT images are classified into 4 categories: i) choroidal neovascularization, ii) diabetic macular edema, iii) multiple drusen, and iv) normal. Choroidal neovascularization is the growth of new blood vessels in the choroid layer of the eye and is a major cause of vision loss. Macular edema is a build-up of fluid in the macula, an area in the center of the retina; this build-up causes the macula to thicken, distorting vision. Drusen are deposits of lipids (fatty material) and proteins under the retina. Having drusen may increase the risk of age-related macular degeneration. The dataset also contains images of normal, healthy retinas. The images were collected from the dataset in [8], which contains 84,438 images (more than 5 GB) from [9, 10], classified into the above categories.

Chest X-ray images of children belonging to 3 classes, i) viral pneumonia, ii) bacterial pneumonia, and iii) normal, were taken from [7] and are considered in this study. Pneumonia is an infection in which fluid accumulates in the air sacs of the lungs, making breathing difficult. The lung image dataset contains 5238 images (1 GB) belonging to these 3 categories.
Both datasets were split into three sets: training, validation and testing, using 70%, 20% and 10% of the data respectively. Figure 1 describes the training of a neural network. After each iteration of training, the neural network is evaluated on the validation data to assess its performance at that point. After the whole training process is complete, the performance is evaluated using the testing data. The proposed method is heavily inspired by [11] and by a similar neural network for the lung data presented in [12]. The image datasets [7][8][9][10] are first collected and annotated (labeled) in order to distinguish normal images from images with diseases. To generate the training dataset, the existing labeled data are then expanded into a new dataset using a technique called augmentation. The annotated and augmented data are used to train the proposed neural network.

Block diagram of the proposed system

III. CONVOLUTIONAL NEURAL NETWORKS
CNNs [3] are a type of deep artificial neural network used mainly to identify and cluster images and to perform object recognition. A CNN consists of image processing layers and neural network layers, namely: (a) convolutional layer, (b) pooling layer, (c) flattening layer, (d) ReLU layer, and (e) softmax layer. These layers are described briefly below.

A. Convolutional Layer
The convolutional layer is the main building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), whose number and size are chosen by the user; each filter is generally a 3×3 matrix that slides over every 3×3 submatrix of the input. The number of filters used is generally a power of two ($2^N$). During a forward pass, each filter is convolved across the width and height of the input image matrix: the dot product between the filter and each submatrix is computed, producing a 2-dimensional feature map for that filter. This reveals details such as vertical or horizontal edges in the image, which are extracted and fed into the next layer. The initial weights are generated randomly using the Glorot uniform distribution. Figure 2 shows the filters. Figure 3(c) shows the output image obtained when the input image in Figure 3(a) is convolved with one of these filters.

Fig. 2. The 32 filters used in the proposed CNN
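The following is a minimal NumPy sketch of what the convolutional layer computes: Glorot-uniform initialisation of 32 filters of size 3×3, followed by a stride-1 convolution without padding. The 50×50 single-channel input, the random seed, and the helper names `glorot_uniform` and `conv2d_valid` are illustrative assumptions, not the authors' code; frameworks such as Keras perform the same computation far more efficiently.

```python
import numpy as np

def glorot_uniform(shape, rng):
    """Glorot (Xavier) uniform initialisation for a conv kernel of shape
    (kernel_h, kernel_w, in_channels, out_channels)."""
    kh, kw, c_in, c_out = shape
    fan_in = kh * kw * c_in
    fan_out = kh * kw * c_out
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

def conv2d_valid(image, kernel):
    """Slide a 2-D kernel over a single-channel image with stride 1 and no
    padding, taking the dot product with each submatrix. As in most CNN
    libraries, the kernel is not flipped (cross-correlation)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

rng = np.random.default_rng(0)
filters = glorot_uniform((3, 3, 1, 32), rng)   # 32 randomly initialised 3x3 filters
image = rng.random((50, 50))                   # hypothetical 50x50 grayscale input
feature_maps = np.stack(
    [conv2d_valid(image, filters[:, :, 0, k]) for k in range(32)], axis=-1
)
print(feature_maps.shape)  # (48, 48, 32)

# A hand-made vertical-edge filter responds only where intensity changes left to right.
edge = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
step = np.zeros((5, 5))
step[:, 3:] = 1.0                              # dark left part, bright right part
print(conv2d_valid(step, edge))                # every row is [0, 3, 3]
```

The hand-made edge filter shows how one kernel highlights one kind of structure; in the trained network the 32 filters are learned from the data rather than designed by hand.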

B. Pooling Layer
Another important concept used in CNNs is pooling, which is a form of non-linear down-sampling. Of the several pooling functions analyzed in [13], max-pooling was found to be the most effective. Max-pooling partitions the input image into a set of non-overlapping n×n submatrices (generally 2×2) and outputs the maximum value of each. The convolved image is first converted into arrays, and then max-pooling is performed. Figure 3 displays the convolution and max-pooling steps, over which the image is reduced from a 50×50 matrix to a 24×24 matrix. These sizes are consistent with a 3×3 convolution without padding (50×50 to 48×48) followed by 2×2 max-pooling (48×48 to 24×24).
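A short NumPy sketch of 2×2 max-pooling, assuming non-overlapping windows with stride 2 (the usual default). The function name `max_pool2d` is hypothetical. Applied to a 48×48 feature map, which is what a 3×3 convolution without padding produces from a 50×50 image, it yields a 24×24 output.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping size x size max-pooling (stride = size) over the first
    two axes of x; any trailing channel axis is pooled independently. Rows or
    columns that do not fill a complete window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    # Group pixels into (h, size, w, size, ...) blocks and keep each block's maximum.
    blocks = x.reshape(h, size, w, size, *x.shape[2:])
    return blocks.max(axis=(1, 3))

print(max_pool2d(np.arange(16).reshape(4, 4)))   # [[5, 7], [13, 15]]
print(max_pool2d(np.zeros((48, 48, 32))).shape)  # (24, 24, 32)
```

Because each window keeps only its largest value, the strongest filter response in every 2×2 region survives while the number of values shrinks by a factor of four.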

C. Flattening Layer
The output from the pooling layer is in matrix form, which cannot be fed directly into the neural network. The flattening layer converts the n×n matrix from the pooling layer into an n²×1 matrix, a format compatible with the neural network.
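Flattening is only a reshape. A minimal sketch, assuming a hypothetical 24×24 pooled matrix, and then the same 24×24 output for each of 32 filters:

```python
import numpy as np

rng = np.random.default_rng(1)
single = rng.random((24, 24))           # hypothetical n x n pooling output, n = 24
print(single.reshape(-1, 1).shape)      # (576, 1): n^2 x 1

stacked = rng.random((24, 24, 32))      # hypothetical output of 32 filters
flat = stacked.reshape(-1, 1)           # one column of 24 * 24 * 32 values
print(flat.shape)                       # (18432, 1)
```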

D. RELU Layer
ReLU is the abbreviation of Rectified Linear Unit, which applies a non-saturating activation function. The standard ReLU replaces negative input values (activations) with zero, which increases the nonlinear properties of the decision function. This activation function is used in the input and hidden layers of the neural network. The type of ReLU used, as explained in [14], is the leaky ReLU, which instead of setting negative inputs to zero multiplies them by a small positive slope α, so that a small gradient still flows for negative inputs. Figure 4 shows the leaky ReLU activation function. Mathematically, the leaky ReLU can be defined as:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$
where α is a small positive constant.

Fig. 4. Graphical representation of the ReLU
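A minimal NumPy sketch contrasting the standard ReLU with the leaky ReLU. The slope α = 0.01 is an assumption; the paper does not state the value used.

```python
import numpy as np

def relu(x):
    # Standard ReLU: negative inputs become exactly zero.
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: negative inputs are scaled by alpha (assumed value) instead of zeroed.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # 0, 0, 0, 1.5
print(leaky_relu(x))  # -0.02, -0.005, 0, 1.5
```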

E. Softmax Layer
This layer is predominantly used when the neural network solves multiclass classification problems. It usually consists of one output node per class, with softmax as the activation function. The softmax function assigns a probability to each node in the output layer, and these probabilities sum to one. The node with the highest probability is the prediction of the neural network. The network, including the ReLU and softmax layers, is trained using forward propagation and backpropagation [15]. Figure 5 shows the softmax function.
Mathematically, the softmax function can be defined as:
$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, 2, \ldots, k, \qquad \mathbf{z} = (z_1, z_2, \ldots, z_k) \qquad (3)$$
Equation (3) applies the standard exponential function to each element $z_i$ of the input vector $\mathbf{z}$ and normalizes these values by dividing by their sum. This normalization ensures that the components of the output vector $\sigma(\mathbf{z})$ sum to 1.
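A small NumPy sketch of Equation (3) for a four-class output layer such as the eye model's. The logits are made-up values and the label strings are hypothetical abbreviations of the four OCT classes; subtracting the maximum logit is a standard numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(z):
    # exp(z_i) / sum_j exp(z_j); subtracting max(z) cancels in the ratio
    # but prevents overflow for large logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

labels = ["CNV", "DME", "DRUSEN", "NORMAL"]   # hypothetical class names
logits = np.array([2.0, 0.5, -1.0, 0.1])      # hypothetical output-layer values
p = softmax(logits)
print(p.round(3), p.sum())                    # probabilities summing to 1
print("prediction:", labels[int(np.argmax(p))])  # CNV
```

The predicted class is simply the node with the highest probability, which is why the softmax output can be read directly as the network's decision.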

1) Loss Function
The loss function (or cost function) measures the difference between the actual output and the predicted output. Training aims to minimize this loss, i.e. to reduce the difference between the predicted value and the actual value. The loss function predominantly used for both datasets is the mean squared error (MSE): the difference between each predicted and actual output is squared, and the sum of these squares is divided by their total number. Mathematically, this can be represented as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 \qquad (4)$$
where $n$ is the number of inputs, $Y_i$ is the actual output and $\hat{Y}_i$ is the predicted output. The squared error is differentiable, which makes it well suited to minimization with gradient-descent methods [16].
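A NumPy sketch of the mean squared error defined in this section, alongside the categorical cross-entropy that was used in one of the eye-dataset trials. The one-hot label and predicted probabilities are hypothetical values for a single three-class example.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # (1/n) * sum_i (Y_i - Yhat_i)^2
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum_i Y_i * log(Yhat_i); clipping with eps avoids log(0).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(y_true, dtype=float) * np.log(y_pred))

y_true = [0, 1, 0]          # hypothetical one-hot label
y_pred = [0.2, 0.7, 0.1]    # hypothetical softmax output
print(mean_squared_error(y_true, y_pred))         # (0.04 + 0.09 + 0.01) / 3 ≈ 0.0467
print(categorical_cross_entropy(y_true, y_pred))  # -ln(0.7) ≈ 0.357
```

MSE penalizes every output node, whereas cross-entropy only looks at the probability assigned to the correct class; the latter is the more common choice for softmax classifiers.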

2) Optimizer
The optimizer is the algorithm that updates the weights so that the loss is minimized. After every iteration it uses the gradient of the calculated loss function to update the weights, with the size of each update controlled by the learning rate; adaptive optimizers such as Adam and RMSProp additionally scale the step for each weight individually. If the learning rate is too high, training may overshoot the minimum of the loss and fail to converge. If the learning rate is too low, the neural network learns very slowly. The optimizer helps the neural network learn at a suitable speed and in a stable manner. The optimizers used were the Adam optimizer and the Root Mean Square Propagation (RMSProp) optimizer.
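To illustrate how the two optimizers named above adapt the step size, here is a minimal NumPy sketch of the RMSProp and Adam update rules minimizing a toy quadratic loss. The toy loss, the learning rate of 0.01, the number of steps and ρ = 0.9 are assumptions chosen for illustration; β₁ = 0.9, β₂ = 0.999 and ε = 1e-8 are the defaults suggested in [2]. This is not the authors' training code.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy loss L(theta) = sum((theta - 3)^2), minimum at theta = 3.
    return 2.0 * (theta - 3.0)

def rmsprop(theta, lr=0.01, rho=0.9, eps=1e-7, steps=5000):
    s = np.zeros_like(theta)                   # running mean of squared gradients
    for _ in range(steps):
        g = grad(theta)
        s = rho * s + (1 - rho) * g ** 2
        theta = theta - lr * g / (np.sqrt(s) + eps)   # per-parameter scaled step
    return theta

def adam(theta, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    m = np.zeros_like(theta)                   # first moment (mean) of gradients
    v = np.zeros_like(theta)                   # second raw moment of gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)           # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # Equation (5)
    return theta

start = np.array([0.0, 10.0])
print(rmsprop(start.copy()))  # both entries close to 3
print(adam(start.copy()))     # both entries close to 3
```

Both methods divide the gradient by a running estimate of its magnitude, so each weight moves at roughly the learning rate regardless of how steep the loss is in that direction; Adam additionally smooths the gradient itself with momentum.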

3) Adam Optimizer
Adam [2] stands for adaptive moment estimation and is one of the most widely used optimizers; [2] describes it as computationally efficient with little memory requirement. Instead of updating the weights based only on the first moment (mean) or only on the second moment (uncentered variance) of the gradients, Adam uses both the first and second moments to update the learning parameters:
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t \qquad (5)$$
where $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates of the first and second moments of the gradients respectively, $\eta$ is the learning rate, and $\epsilon$ is a small constant that prevents division by zero.

Table I shows the results for the lung dataset [7]. The architecture comprises an input layer, multiple hidden layers and an output layer. The training accuracy, training loss, validation accuracy and validation loss are listed with respect to the number of iterations used for the simulation, which was performed in the Spyder Python Integrated Development Environment (IDE). The maximum validation accuracy obtained for the lung dataset is only around 63%, with 10 epochs and 5215 steps per epoch. This result may be improved with a larger dataset. Table II shows the complete architecture, consisting of two pairs of convolution layers (named conv2d_13 and conv2d_14) and max-pooling layers (named max_pooling2d_13 and max_pooling2d_14), followed by a flattening layer. The optimum artificial neural network consists of an input layer of 7 nodes and an output layer of 3 nodes, one for each class.

The parameter counts in Tables II and IV give the number of weights (including biases) processed in each layer, and Total params is the sum of these counts over the entire network. The output shape gives the number of inputs processed at a time (shown as None), followed by the shape of a single input. Table III shows the observations for the eye dataset [8][9][10].
The maximum validation accuracy obtained for the eye dataset is around 90%, which may be further improved with a larger dataset. Table IV shows the summary of the neural network model that yielded the best parameters during training and validation for eye disease detection. The number of epochs used for the eye dataset ranged from 5 to 64, and the maximum validation accuracy was obtained with 15 epochs. Table III shows that the Adam optimizer was predominantly used for the eye dataset; in the 5th trial, the RMSProp optimizer was used instead. The loss function was the mean squared error, except in the 8th trial, where the categorical cross-entropy loss function was used. The complete architecture consists of two pairs of convolution layers (named conv2d and conv2d_1) and max-pooling layers (named max_pooling2d and max_pooling2d_1), followed by a flattening layer named Flatten.

B. Eye Dataset
The ANN has an output layer of four nodes, one for each class. For the two pairs of convolutional and max-pooling layers, the output shape is a 4-dimensional array. The first dimension (shown as None for every layer) is the number of inputs fed into that layer at a time. It is shown as None because the batch size is not fixed when the model is defined, so any number of images can be passed at once. The remaining 3 dimensions give the dimensions of a single input unit (height, width and number of channels or filters). The same holds for the flattening and neural network layers, except that their outputs have a single dimension after the batch dimension.
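To make the output shapes and parameter counts of Tables II and IV concrete, the following sketch tracks each layer's output shape and parameter count in the way a Keras model summary reports them: None for the batch dimension, and weights plus biases per layer. The 50×50 grayscale input, 32 filters per convolution and 7-unit hidden layer are hypothetical choices for illustration, so the numbers are not those of the paper's tables; only the layer types and the four output classes come from the text.

```python
def conv2d(shape, filters, k=3):
    h, w, c = shape
    out = (h - k + 1, w - k + 1, filters)      # 'valid' padding, stride 1
    params = (k * k * c + 1) * filters         # k*k*c weights + 1 bias per filter
    return out, params

def max_pool(shape, size=2):
    h, w, c = shape
    return (h // size, w // size, c), 0        # pooling has no trainable weights

def flatten(shape):
    n = 1
    for d in shape:
        n *= d
    return (n,), 0

def dense(shape, units):
    (n,) = shape
    return (units,), (n + 1) * units           # n weights + 1 bias per unit

# Hypothetical eye-disease model (sizes other than the 4 outputs are assumptions).
layers = [
    ("conv2d", lambda s: conv2d(s, 32)),
    ("max_pooling2d", max_pool),
    ("conv2d_1", lambda s: conv2d(s, 32)),
    ("max_pooling2d_1", max_pool),
    ("flatten", flatten),
    ("dense", lambda s: dense(s, 7)),
    ("dense_1", lambda s: dense(s, 4)),        # one output node per class
]

shape, total = (50, 50, 1), 0
for name, layer in layers:
    shape, params = layer(shape)
    total += params
    print(f"{name:<16}(None, {', '.join(map(str, shape))}){params:>10}")
print("Total params:", total)  # 36711 for these assumed sizes
```

Most of the parameters sit in the first dense layer, because every one of the 3872 flattened values connects to every hidden unit; this is typical of small CNNs that flatten directly after the last pooling layer.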

V. DEPLOYMENT
The neural networks that yielded the best parameters were saved in HDF5 (.h5) format and deployed on a Raspberry Pi running Raspbian with Python 3.5.3. A simple graphical user interface (GUI) was created in which the user enters the directory of the image; the neural network then makes the prediction and displays the result. Snapshots of the results and of the GUI output for both datasets can be seen in Figures 6-11. The complete setup used to implement the proposed system in hardware is shown in Figure 12. The hardware consists of a Raspberry Pi board running the GUI, which was built with the Tkinter library in Python. An LCD can be connected to the Raspberry Pi board in two ways: 4-bit mode and 8-bit mode. In this work, 4-bit mode was used, in which each byte to be sent is split into two sets of 4 bits (upper bits and lower bits) that are sent one after the other over 4 data wires.

GUI output predicting the given X-ray image has viral pneumonia
Fig. 11. GUI output of the neural network predicting the given X-ray image has bacterial pneumonia

Figures 13 and 14 show the eye disease detection system and the pneumonia detection system implemented in hardware.

Result obtained for human chest X-ray of bacterial pneumonia

VI. CONCLUSIONS AND FUTURE WORK
A medical image based disease detection system using Convolutional Neural Networks is proposed and developed. The eye disease detection system classifies normal eye images and eye images with diseases such as choroidal neovascularization, diabetic macular edema, and multiple drusen. The lung image dataset [7] consisted of bacterial pneumonia, viral pneumonia, and normal lung X-ray images of children aged 1-5 years. The model was trained using Python libraries such as TensorFlow, Keras, and Skimage, which helped improve training speed. The improved training speed made the system more suitable for real-time implementation. The proposed system has the potential to be used in a wide range of high-end biomedical imaging applications and provides a cost-effective solution on a single-board computer (Raspberry Pi).
Regarding future work, the focus will be on improving the current results. Another promising application is extending the idea to the identification of diseases not only in humans but also in plants and crops.

www.etasr.com Chakraborty & Tharini: Pneumonia and Eye Disease Detection using Convolutional Neural Networks