Chest X-Ray Classification of Lung Diseases Using Deep Learning

Chest X-ray images can be used to detect lung diseases such as COVID-19, viral pneumonia, and tuberculosis (TB). These diseases show similar patterns and symptoms, making it difficult for clinicians and radiologists to differentiate between them. This paper uses convolutional neural networks (CNNs) to diagnose lung disease from chest X-ray images obtained from online sources. Two classification tasks are considered: a four-class problem (COVID-19, normal, TB, and viral pneumonia) and a three-class problem that excludes the normal class. During testing, AlexNet and ResNet-18 gave promising results, with accuracies above 95%.


Introduction
Coronavirus disease 2019 (COVID-19) was first identified in December 2019 in Wuhan, China. It is a highly infectious respiratory disease that has since spread throughout the world [1]. As the number of cases increased dramatically, the World Health Organization (WHO) declared a pandemic on March 11, 2020. Unfortunately, COVID-19 shares characteristics with viral pneumonia and tuberculosis (TB), which makes its prevention and control more difficult. Because these diseases have similar symptoms, TB patients may be diagnosed in the same way as COVID-19 and viral pneumonia patients [2]. All of these lung diseases are characterized by a severe cough and a high fever, so distinguishing between them by manual observation is challenging.
The most common testing method for COVID-19 is a nasal and throat swab. However, demand for testing kits rose sharply while supply was limited. As a result, alternative detection technologies such as chest X-ray imaging and computed tomography (CT) scanning have been considered [3].
As machine learning techniques advance, classifying chest X-ray images into one of these lung diseases has become more feasible. In particular, deep learning has been used for this task. Deep learning is a subset of machine learning, loosely inspired by the structure of the human brain, that learns patterns from data which can then be used in decision making. A convolutional neural network (CNN) is a type of deep learning model designed to take images as input and is widely used for image classification.
DarkCovidNet was proposed as a modification of DarkNet-19 to improve its effectiveness in detecting and classifying COVID-19 [4]. The model achieved 98% and 87% accuracy on binary and multi-class problems, respectively. The authors of [5] compared eight models on classifying X-ray images of normal lungs and of lungs infected by pneumonia and COVID-19. The models were evaluated both with and without image augmentation so that the two cases could be compared. According to the results, DenseNet201 outperformed the other networks with image augmentation, while CheXNet outperformed the other networks without it.
The authors of [6] evaluated 15 CNNs for classifying three types of lung X-ray images: normal, pneumonia, and COVID-19. The VGG19 network achieved the best testing accuracy, 89.3%, while requiring the least training time per epoch. In [7], the authors trained and tested MobileNets and VGG-16 models on combinations of public and local data sets, obtaining accuracies between 89% and 96% depending on the combination. ResNet50V2 and MobileNet were used in [8] to categorize COVID-19, normal, and pneumonia chest X-ray images, achieving 97% accuracy. In [9], VGG-19 was used to classify chest X-ray images of normal lungs and of lungs with pneumonia caused by COVID-19 or bacteria, achieving an accuracy of more than 95%.
The objective of this paper is to classify chest X-ray images into four classes, i.e., normal, COVID-19, viral pneumonia, and TB, using five CNNs: AlexNet, GoogLeNet, ResNet-18, ShuffleNet, and SqueezeNet. We also perform a three-class classification that excludes the normal class, which allows us to assess how effectively the pretrained networks distinguish between lung diseases of similar appearance. In three-class testing, AlexNet gives the best accuracy of 97.89%, while in four-class testing, ResNet-18 gives the best accuracy of 97.14%.
The rest of this paper is structured as follows. Section 2 revisits the fundamentals of CNNs. Section 3 describes the research methodology, Section 4 presents and analyzes the simulation results, and Section 5 concludes the paper.

Convolutional Neural Network (CNN)
A CNN is built from many layers assembled into blocks. The three basic layer types in a CNN are the convolution layer, the pooling layer, and the fully connected layer, and in practice CNN architectures stack these layers repeatedly. A CNN learns from input data the patterns and characteristics that allow it to discriminate between classes during testing; images are the most common input. Forward propagation is the stage in which the layers of the CNN convert the input into an output. Backpropagation, in contrast, adjusts the network's weights during training based on the error between the network's output and the true label. It begins at the output layer, where this error is computed, and propagates the error backward through the hidden layers toward the input layer, so that the weights of each layer, which start from random values when a network is trained from scratch, are updated to reduce the error.
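To make the forward and backward passes concrete, the following is a minimal pure-Python sketch, not one of the networks used in this paper: a toy fully connected network with one hidden layer of two tanh units and a sigmoid output, trained on the logical-OR problem with a cross-entropy loss. The toy data, network size, learning rate, number of epochs, and the `train` function are illustrative assumptions. Weights start from random values, each forward pass produces a prediction, and backpropagation pushes the output error back through the hidden layer with the chain rule before a gradient-descent update.

```python
import math
import random

# Hypothetical toy data set (logical OR), standing in for real images.
OR_DATA = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, n_hidden=2, lr=0.5, epochs=1000, seed=0):
    """Train a 2-layer network with per-sample gradient descent; return per-epoch mean loss."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Weights start from random values (training from scratch).
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            # Forward propagation: input -> hidden layer (tanh) -> output (sigmoid).
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            p = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total += -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
            # Backpropagation starts at the output layer: for sigmoid + cross-entropy,
            # d(loss)/d(output pre-activation) simplifies to (p - y).
            dz2 = p - y
            # Chain rule back through w2 and the tanh derivative (1 - h^2).
            dz1 = [dz2 * w2[j] * (1.0 - h[j] ** 2) for j in range(n_hidden)]
            # Gradient-descent updates (all gradients were computed before any change).
            for j in range(n_hidden):
                w2[j] -= lr * dz2 * h[j]
                b1[j] -= lr * dz1[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dz1[j] * x[i]
            b2 -= lr * dz2
        losses.append(total / len(data))
    return losses


losses = train(OR_DATA)
print(f"first epoch loss: {losses[0]:.3f}, last epoch loss: {losses[-1]:.3f}")
```

Here the weights are updated after every sample, and one epoch is one pass over the four examples; the per-epoch mean loss is expected to fall as training proceeds. Real CNN training applies the same chain rule through convolution and pooling layers, usually with mini-batches.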
As the name implies, the convolution layer is the core layer of a CNN. It is defined mainly by its kernels (filters) and their size, which is usually 3x3 or 5x5. Each kernel slides over the image from left to right and top to bottom; at every position, the element-wise product of the kernel and the overlapping image patch is summed to give one convolution value. The output of the convolution layer is then passed to a nonlinear activation function, such as the Rectified Linear Unit (ReLU).
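The sliding-window computation described above can be sketched in pure Python as follows. This is a minimal, unoptimized illustration assuming a single-channel image, stride 1, and no padding; the 4x4 image and 3x3 kernel values are arbitrary, and `conv2d` and `relu` are hypothetical helper names. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):          # top to bottom
        row = []
        for j in range(out_w):      # left to right
            # Element-wise product of the kernel and the image patch, then sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output


def relu(feature_map):
    """Rectified Linear Unit applied element-wise: max(0, x)."""
    return [[max(0, x) for x in row] for row in feature_map]


image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 0],
         [1, 0, 1, 2]]
kernel = [[1, 0, -1],   # a simple vertical-edge detector
          [1, 0, -1],
          [1, 0, -1]]
conv = conv2d(image, kernel)
print(conv)        # [[0, 2], [-1, -1]]
print(relu(conv))  # [[0, 2], [0, 0]]
```

A 4x4 input and a 3x3 kernel give a 2x2 output, and ReLU zeroes the negative responses. In a real CNN the kernel values are learned during training rather than fixed by hand.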
A pooling layer is typically placed after the nonlinear activation. It can be regarded as a "summary" layer that downsamples the feature maps produced by the previous layers while keeping their most salient information. There are two common types of pooling: average pooling and maximum pooling. Maximum pooling takes the maximum value within each pooling window (for example, 2x2) of a feature map, whereas average pooling takes the average value within each window. Global pooling, such as global average pooling and global maximum pooling, applies the average or maximum operation over the entire feature map, reducing each map to a single value [10].
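The difference between windowed and global pooling is easiest to see on a small example. The pure-Python sketch below assumes a 2x2 window with stride 2, a common but not universal choice; the feature-map values and the helper names `pool2d`, `mean`, and `global_pool` are illustrative.

```python
def pool2d(feature_map, size=2, stride=2, op=max):
    """Slide a size x size window with the given stride and reduce each window with op."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        out.append([
            op([feature_map[i + di][j + dj] for di in range(size) for dj in range(size)])
            for j in range(0, cols - size + 1, stride)
        ])
    return out


def mean(values):
    return sum(values) / len(values)


def global_pool(feature_map, op=max):
    """Reduce the entire feature map to a single value."""
    return op([x for row in feature_map for x in row])


fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 5, 7],
        [1, 1, 3, 8]]
print(pool2d(fmap, op=max))    # [[6, 2], [2, 8]]
print(pool2d(fmap, op=mean))   # [[3.5, 1.0], [1.0, 5.75]]
print(global_pool(fmap, max), global_pool(fmap, mean))  # 8 2.8125
```

Windowed pooling halves each spatial dimension here, whereas global pooling collapses the whole 4x4 map to one number per operation.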
Finally, the output of the preceding layers is flattened into a one-dimensional array of real numbers and fed to the fully connected layer. In this dense layer, each input is connected to each output, and several fully connected layers may be stacked. In this paper, five CNNs are used: AlexNet, GoogLeNet, ResNet-18, ShuffleNet, and SqueezeNet. A summary of their architectures is shown in Table 1.
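Flattening and a fully connected layer reduce to a few lines of pure Python. In this sketch the weights and biases are arbitrary illustrative values, and the layer is assumed, for illustration, to have one output per class (three here, as in the three-class problem), followed by a softmax that turns the scores into class probabilities; the softmax is a common choice for classification outputs but is not described in the text. The function names are hypothetical.

```python
import math


def flatten(feature_maps):
    """Concatenate 2D feature maps into one 1D list of real numbers."""
    return [x for fmap in feature_maps for row in fmap for x in row]


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One 2x2 feature map (for example, the output of a pooling layer).
x = flatten([[[6, 2], [2, 8]]])          # [6, 2, 2, 8]
W = [[0.25, 0.0, 0.0, 0.25],             # 3 outputs x 4 inputs
     [0.0, 0.5, 0.5, 0.0],
     [-0.25, 0.0, 0.0, 0.5]]
b = [0.0, 0.5, -0.5]
scores = dense(x, W, b)                  # [3.5, 2.5, 2.0]
probs = softmax(scores)
print(scores, probs.index(max(probs)))   # [3.5, 2.5, 2.0] 0
```

Each of the three outputs depends on all four inputs, which is what makes the layer "fully connected"; the predicted class is the index of the largest probability.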

Research Methodology
Each class contains 200 images for training and validation. The chest X-ray images were collected from various sources. The COVID-19, viral pneumonia, and normal lung images are taken from [11], which contains chest X-ray images from several countries, such as Qatar, Bangladesh, Pakistan, and Malaysia. The TB chest X-ray images are taken from the database of the National Library of Medicine, which contains images from Montgomery County, USA, and Shenzhen No. 3 People's Hospital, Guangdong, China. Fig. 1 shows sample COVID-19, viral pneumonia, and TB chest X-ray images. These images are split into training and validation sets in a 70:30 ratio. An independent data set is used for testing: it contains 100 images per class for both the 3-class and 4-class sets, except for COVID-19, for which only 85 images were available. The data set details are summarized in Table 2.
The images are preprocessed before being fed to the networks. Because the X-ray images are grayscale, they are converted to three color channels to match the network input. The simulations were run on a computer with an AMD Ryzen 5 5400H @ 3 GHz processor, 8 GB of RAM, and a 4 GB NVIDIA GeForce GTX 1650 Ti graphics processing unit (GPU). The training and validation settings are shown in Table 3.
Model performance is evaluated using accuracy, precision, recall, and F1-score, where the F1-score is given by
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$
Here, "True Positive" refers to a correct prediction of the positive class and "True Negative" to a correct prediction of the negative class. Conversely, a "False Positive" is a negative sample incorrectly predicted as positive, and a "False Negative" is a positive sample incorrectly predicted as negative. These four outcomes can be visualized using the confusion matrix of a 2-class problem, as shown in Fig. 2. For multiclass problems, the overall precision, recall, and F1-score are obtained by averaging the per-class values.
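The two data-handling steps described above, converting grayscale images to three channels and splitting each class 70:30 into training and validation sets, might look like the following pure-Python sketch. The paper does not state how the channel conversion was done; replicating the single gray channel three times is assumed here. The helper names `gray_to_rgb` and `split_train_val` and the placeholder file names are hypothetical, and images are represented as nested lists only to keep the example self-contained.

```python
import random


def gray_to_rgb(gray):
    """Copy each grayscale pixel into three identical channels (H x W -> H x W x 3)."""
    return [[[v, v, v] for v in row] for row in gray]


def split_train_val(items, train_ratio=0.7, seed=0):
    """Shuffle a list reproducibly and split it into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 200 images of one class.
covid_images = [f"covid_{k:03d}.png" for k in range(200)]
train_set, val_set = split_train_val(covid_images)
print(len(train_set), len(val_set))  # 140 60
print(gray_to_rgb([[0, 128, 255]]))  # [[[0, 0, 0], [128, 128, 128], [255, 255, 255]]]
```

Applying the split to each class separately keeps the 70:30 ratio within every class; the independent test images are kept out of this split entirely.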

Simulation Results and Analysis
For the three-class classification, the training accuracies of the five networks are shown in Table 4. ResNet-18 has the highest training accuracy, and the validation loss shows no sign of overfitting. The testing results for this problem are given in Table 5: AlexNet has the best testing accuracy, although ResNet-18 is comparable. For the four-class classification problem, the training and testing results are shown in Tables 6 and 7, respectively. ResNet-18 gives the best results in both training and testing, while ShuffleNet and SqueezeNet have the same accuracy, which is the lowest of the five networks. From Tables 5 and 7, we conclude that AlexNet and ResNet-18 are the two strongest networks for this task, whereas ShuffleNet is less accurate and has a lower F1-score. Table 8 compares AlexNet and ResNet-18 on the three-class problem. A precision of 93% for COVID-19 means that 93% of the images the model predicts as COVID-19 are truly COVID-19. A recall of 100% means that the model correctly identifies all COVID-19 images. Lastly, the F1-score of 96%, the harmonic mean of precision and recall, indicates a good balance between the two.
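The metrics discussed above can be computed from a multiclass confusion matrix as in the following pure-Python sketch. The function names and the 3x3 confusion matrix are hypothetical illustrative values, not the paper's results. The last line checks the F1-score quoted above: with a precision of 93% and a recall of 100%, Eq. (4) gives 2 × 0.93 × 1.0 / 1.93 ≈ 0.964, i.e., about 96%.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of images of true class i predicted as class j."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Average precision, recall, and F1-score over classes (each class weighted equally)."""
    return tuple(sum(m[i] for m in metrics) / len(metrics) for i in range(3))


def accuracy(confusion):
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a 3-class problem (not the paper's results).
cm = [[10, 0, 0],
      [1, 8, 1],
      [0, 0, 10]]
print(per_class_metrics(cm))
print(macro_average(per_class_metrics(cm)), accuracy(cm))
print(f"{f1_score(0.93, 1.0):.3f}")  # 0.964
```

Macro-averaging weights every class equally, which matters when class sizes differ, as with the 85 COVID-19 test images versus 100 for the other classes.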

Conclusion
The goal of this paper is to use deep learning to diagnose, from chest X-ray images, several lung diseases that resemble COVID-19, which makes the classification task more complicated. To this end, five pretrained networks were trained, validated, and tested using data sets available online from hospitals in several countries. The acquired data were divided into three sets: training, validation, and testing, and the sets used for each class were independent of one another. Training was carried out separately for the three-class and four-class problems. The three-class problem consisted of COVID-19, TB, and viral pneumonia; the four-class problem consisted of normal, COVID-19, TB, and viral pneumonia. In testing, AlexNet and ResNet-18 were the two best networks in terms of accuracy.