Durian Species Classification Using Deep Learning Method

Durian is a popular fruit in Southeast Asia, and the market offers various species of durians. Accurate species classification is crucial for quality control, grading, and marketing. However, the complexity of this task has led to the utilization of machine learning and deep learning methods. Traditional machine learning algorithms, such as K-Nearest Neighbors, Linear Discriminant Analysis, Support Vector Machines, and Random Forests, have demonstrated good accuracy, but they require extensive feature engineering. Deep learning algorithms, particularly Convolutional Neural Networks, can automatically extract features, making them less dependent on manual feature selection. This research aims to review deep learning classification algorithms, including Convolutional Neural Networks and Recurrent Neural Networks, to determine the most suitable algorithm for an efficient and accurate durian classification system. The objective is to enhance the precision and speed of durian species classification, presenting potential advantages for both durian producers and consumers. The literature review revealed that Convolutional Neural Networks outperformed other deep learning and traditional machine learning algorithms on datasets of varying sizes, achieving the highest accuracy of 98.96% through techniques like image resizing, color conversion, and additional parameters such as days harvested and dry weight. Deep learning emerges as a promising approach for robust and accurate durian species recognition, with future directions including developing models to classify durian species from different plant parts and even real-time video analysis. However, while Convolutional Neural Networks lead the way, a critical research gap exists in identifying optimal features, necessitating further investigation to refine durian species recognition accuracy.


Introduction
In Southeast Asia, the durian is often referred to as the "King of Fruits." This fruit is famous for its spiky exterior, unparalleled flavor, and unmistakable scent. The name "durian" is derived from the Malay word "Duri," which means thorn, in reference to the fruit's unique thorny appearance. While the durian tree is indigenous to Malaysia, Brunei, and Indonesia, it is also cultivated in countries such as Thailand, the Philippines, and Australia [1].
Durian is the fruit of various tree species within the Durio genus, with about 30 recognized species. Indonesia, Thailand, and Malaysia have a rich diversity of durian varieties, with over 100 in Indonesia, around 300 in Thailand, and approximately 100 in Malaysia, all officially registered with their respective governments.
Over time, Southeast Asia has seen the emergence of numerous durian cultivars, propagated through cloning and selected for specific characteristics such as fruit shape, size, aroma, color, texture, taste, and tree attributes. These cultivars are typically given a common name and a code number, often starting with "D" [2]. The Malaysian Federal Agricultural Marketing Authority (FAMA) has been registering durian species since 1934 and currently maintains a list of over 200 registered durian species. Among these, there are around 13 common Malaysian durian species known for their favorable qualities such as color, texture, taste, and yield [3].
The quality of durian may be compromised due to the unregulated cultivation of the fruit, driven by increasing demand and high profitability. Over time, durian has become the most popular crop planted in Malaysia, making up to 41% of the cultivated land, or about 70,000 hectares [4]. It is in high demand in parts of southern and eastern China, mostly due to its high nutritional value and the widespread popularity of its derivative snack products, such as ice creams, moon cakes, and dumplings, as shown in Figure 1 [5]. The increasing global awareness of the health benefits of durian has significantly contributed to its popularity and high demand [7]. A survey [8] found that Western and Asian consumers have different perceptions of durian. Western consumers see durian as a healthy food and are more likely to purchase organically grown durians. In contrast, Asian consumers are more likely to buy durians based on their smell, taste, and texture. However, both Western and Asian consumers agree that the species of durian is important. The author argues that, given the importance of the durian species to consumers, durians should be classified based on their species to meet different consumer preferences.
Fruit classification is a crucial agricultural process, but traditional manual methods are repetitive, prone to error, and require skilled workers, a situation exacerbated by the COVID-19 pandemic [9]. This can lead to inconsistencies in quality and efficiency. Since most fruits, including durian, are perishable, non-destructive techniques are needed to evaluate the species of the fruit. New automated methods based on computer vision and machine learning (ML) can overcome these limitations and provide a more reliable and efficient way to classify fruit.
For the durian fruit, the potential use of emerging and non-destructive techniques, such as artificial intelligence utilizing image recognition, has been the focus of researchers [7,10-13]. This is because image recognition and classification can be done on the surface of the durians without using destructive techniques like opening or sampling parts of the fruit. Although FAMA provides a reference for classifying types and categories of durian, and ML has advanced, there is still a lack of research on developing a reliable method and classifier for accurate durian species classification [11,14]. The objective of this research is to review various deep learning classification algorithms, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and recommend the most suitable algorithm for implementing a durian classifier in terms of accuracy, speed, and ease of implementation. The goal of this research is to improve the accuracy and efficiency of durian species classification, which could benefit durian growers and consumers alike.

Literature Review
Artificial intelligence (AI) is a broad area in computer science, encompassing the study of computations that enable machines to perceive, reason, and act. The long-term goal of AI is to develop algorithms for machines that allow them to navigate complex environments, potentially surpassing human capabilities. AI systems are designed and trained to excel at specific tasks, demonstrating high accuracy and proficiency. As a rapidly evolving field, new applications of AI are continually being developed, impacting various industries such as healthcare, finance, manufacturing, and transportation. Examples of AI applications include predictive models, facial recognition systems, chatbots, virtual assistants, and more. As AI technology continues to advance, its impact on our lives is expected to grow [15].
ML is a subset of AI focused on developing algorithms and statistical models that enable machines to perform specific tasks and improve their performance through experience and training. Unlike traditional programming with strict rules, ML algorithms learn patterns from data and make decisions based on trained knowledge, enhancing their performance over time [16]. ML finds applications in image recognition, the process of identifying and classifying objects in images, and can be categorized into various types, such as supervised learning and unsupervised learning.

Supervised and unsupervised machine learning.
Supervised learning is a common and widely used type of ML. In this method, the ML algorithm is trained on a labelled dataset, where each input or data point is associated with a label corresponding to the desired output. The algorithm learns the pattern or mapping between input and output labels during training. Subsequently, the trained algorithm can make predictions on new, unlabelled data. Examples of applications utilizing supervised learning algorithms include text recognition, image classification, and face detection.
Unsupervised learning involves training the ML algorithm on an unlabelled dataset, where data points lack predefined labels. The algorithm's goal is to discover patterns, structure, and relationships within the data without external labelling. Unlike supervised learning, where all data points have pre-existing labels, unsupervised learning relies on the algorithm learning features from the data. When introduced to new data, the trained algorithm uses the previously acquired features to recognize and analyze this new data [17]. A diagrammatic representation of unsupervised and supervised learning is illustrated in Figure 2. In short, supervised learning trains a model on labelled data, whereas unsupervised learning trains on unlabelled data with the aim of identifying patterns, structures, or relationships within the dataset.
The evolution of machine learning algorithms has been remarkable, particularly in recent years with advancements in deep learning. This powerful sub-field, characterized by its use of artificial neural networks, has proven effective at modelling and solving complex problems.

Deep learning.
Deep learning is a specialized sub-field of machine learning [18], focusing on the development and utilization of artificial neural networks to model and solve complex problems. The structure and functions of deep learning neural networks are loosely modelled on those of the human brain, featuring multiple layers of interconnected nodes known as neurons. This layered arrangement allows the computer to learn hierarchical representations of data by constructing complex representations from simpler ones. Two key aspects define deep learning: models consisting of multiple layers of nonlinear information processing, and methods for both supervised and unsupervised learning at higher, more abstract layers [19]. This structure enables deep learning models to handle large-scale, high-dimensional datasets effectively. Deep learning gained popularity due to significantly improved chip processing power, particularly in general-purpose graphical processing units, the much larger volumes of data used for training, and advancements in machine learning and data processing research. These advances have allowed deep learning methods to solve complex, compositional nonlinear functions, learn distributed and hierarchical feature representations, and make efficient use of labelled and unlabelled data.
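To make the idea of "multiple layers of nonlinear information processing" concrete, the following is a minimal illustrative sketch of a forward pass through a small fully connected network. It is not taken from any of the reviewed studies: the function names, the example weights, and the choice of ReLU as the nonlinearity are assumptions made purely for illustration.

```python
def relu(values):
    """Element-wise nonlinearity: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, bias):
    """One fully connected layer: a weighted sum per output neuron plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def forward(inputs, layers):
    """Pass inputs through every layer, applying ReLU to all but the last."""
    activations = inputs
    for weights, bias in layers[:-1]:
        activations = relu(dense(activations, weights, bias))
    weights, bias = layers[-1]
    return dense(activations, weights, bias)


# Hypothetical two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
example_layers = [
    ([[1, -1], [-1, 1]], [0, 0]),   # hidden layer weights and biases
    ([[1, 1]], [0.5]),              # output layer weights and bias
]
print(forward([2, 1], example_layers))  # [1.5]
```

Each hidden layer re-represents the output of the previous one, which is the sense in which deeper layers build more abstract features from simpler ones. In practice the weights are learned from data rather than set by hand.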

Comparison between machine learning and deep learning.
Following the examination of ML and deep learning above, the two approaches are compared to determine which is more suitable for developing a durian species classification model. Since durian species recognition falls under image recognition and classification, Table 1 compares ML and deep learning for general image classification. Based on Table 1, deep learning appears to be more advantageous for a durian species recognition model. This is attributed to the considerable number of durian species in the market that require identification. Achieving accurate classification of durian species necessitates a significantly large dataset, a requirement that deep learning handles more effectively than traditional ML. Furthermore, the variations among durian species, such as shape, texture, color, and size, can be learned by the deep learning model directly, as it automatically extracts features from raw data. Several studies on fruit classification [13,20,21] utilize deep learning algorithms. Researchers have observed that while ML algorithms differentiate classes effectively, they struggle more than deep learning with noisy datasets. Additionally, they assert that ML models do not perform well with the various combinations of color features, a situation common in durian species with numerous color variations. Durian colors can range from yellow to yellowish-green and green, making deep learning better equipped than ML to handle color variance.

Related Work and Research
Several earlier researchers have developed classification algorithms for durians, employing diverse quantities of training data, various classification algorithms, and distinct image processing techniques, resulting in differing levels of accuracy. This section reviews and discusses these methods to identify the most suitable algorithms and image processing methods for classifying durian species. It is worth noting that the features extracted from durian images play a crucial role in enhancing the accuracy of the model.
Researchers in [14] utilized K-Nearest Neighbors (KNN), a supervised learning algorithm, focusing on four durian species: D24, D101, D160, and D197, with each species comprising ten samples. The study proposed a global shape method for extracting features from a durian's base, using area, perimeter, and circularity. Sixty percent of the data was used for training, while 40% of the samples were used for testing. The method achieved 100% accuracy on the training data, while validation testing achieved 75%. In [23], Linear Discriminant Analysis (LDA), another supervised learning algorithm, was employed for the classification of the same four durian species. Features were extracted based on shape signature and Local Binary Patterns, using 240 durian images for training and testing the model on 42,337 durian samples, achieving an accuracy of around 70%.
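The global shape features used with KNN in [14] (area, perimeter, and circularity) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the circularity formula 4πA/P², the value of k, the Euclidean distance, and all sample measurements below are assumptions, since the source does not state them.

```python
import math
from collections import Counter


def shape_features(area, perimeter):
    """Return (area, perimeter, circularity) for one outline.

    Assumption: circularity = 4*pi*A / P**2, which equals 1.0 for a
    perfect circle and decreases for more irregular outlines.
    """
    circularity = 4.0 * math.pi * area / (perimeter ** 2)
    return (area, perimeter, circularity)


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (area, perimeter) measurements; not data from the study.
train_x = [
    shape_features(a, p)
    for a, p in [(100.0, 36.0), (104.0, 37.0), (150.0, 50.0), (155.0, 51.0)]
]
train_y = ["D24", "D24", "D101", "D101"]
query = shape_features(102.0, 36.5)
print(knn_predict(train_x, train_y, query, k=3))  # D24
```

Because area and perimeter are on a much larger scale than circularity, they dominate the Euclidean distance here; a practical system would probably normalize the features first.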
The K-Nearest Neighbors algorithm and digital image processing based on the Gray-Level Co-occurrence Matrix (GLCM) were used in the study [23]. The researchers employed a dataset of durian fruits called "fruit-262" uploaded by MIHAI MINUT on Kaggle, consisting of 1,600 images. They split the dataset into 1,281 training samples and 321 testing samples, concluding that the KNN method with K=3 achieved the highest accuracy of 93%. In [12], researchers employed two supervised learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), for image-based classification. They used edge detection techniques such as the Canny Edge Detector together with color extraction, with 180 samples for the training and validation set and 120 samples for the testing set. The results showed that SVM achieved an accuracy of 89.3%, while RF achieved an accuracy of 84.3%.
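As an illustration of the GLCM texture features mentioned above, the sketch below builds a normalized co-occurrence matrix for a single pixel offset and derives two commonly used statistics, contrast and energy. The function names, the offset, the number of gray levels, and the toy image are assumptions for illustration; the source does not state which GLCM parameters the study used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which a pixel of level i
    has a neighbour of level j at the given offset. This version is not
    symmetrized, and it assumes at least one valid pixel pair exists.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Weighted sum of squared level differences; higher means rougher texture."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_energy(p):
    """Sum of squared entries; higher means more uniform texture."""
    return sum(v * v for row in p for v in row)


# Toy 3x3 image quantized to 3 gray levels (hypothetical values).
toy_image = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 2],
]
p = glcm(toy_image, levels=3)
print(glcm_contrast(p), glcm_energy(p))
```

Statistics such as these form a feature vector per image, which can then be passed to a classifier such as KNN with K=3.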
The study in [13] used a Convolutional Neural Network (CNN), a deep learning model, to process durian images. The researchers resized the images and applied color conversion, selecting the thorns and the star shape at the bottom of the durian shell as differentiating features. A total of 800 durian images from three species (D24, D175, and D197) were used. After testing with non-durian fruit images added in, the model achieved an accuracy of 81.25%. The researchers concluded that increasing the size of the training dataset is expected to improve the model's accuracy.
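The preprocessing described for [13] (image resizing followed by color conversion) might look roughly like the sketch below. The source does not give the target size, the interpolation method, or the color conversion used, so nearest-neighbour resizing and a luminance-weighted conversion to a single channel are assumptions, and the helper names are hypothetical.

```python
def to_single_channel(rgb_image):
    """Convert rows of (r, g, b) pixels to one luminance value per pixel.

    Assumption: the common 0.299/0.587/0.114 luminance weights.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(image, new_h, new_w):
    """Resize a 2D image by picking the nearest source pixel for each target pixel."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Toy 4x4 single-channel image with pixel values 0..15 (hypothetical).
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
print(resize_nearest(toy, 2, 2))  # [[0, 2], [8, 10]]
```

Fixing the input size and channel count in this way gives every image the same shape, which a CNN's input layer requires.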
In [24], researchers used KNN alongside the Fast Discrete Curvelet Transform for feature extraction. They utilized a dataset comprising 600 durian images, representing six distinct durian species (D24, D88, D101, D160, D175, and D197). Of these, 480 images were allocated for training and validation, while the remaining 120 were reserved for testing. The study's findings suggest that the combination of Fast Discrete Curvelet Transform and KNN yielded the most promising results, achieving an accuracy of 92.5%.
Learning Vector Quantization (LVQ), a supervised learning technique, was employed with GLCM parameters in [25]. The study focused on extracting various durian features, including shape, slimness, colors, and textures, using a dataset comprising 300 durian images. Fifty images were allocated for training across five different durian species, and the remaining 250 were reserved for testing. The model achieved an accuracy of 89%, though the researchers observed that this accuracy is influenced by factors such as lighting conditions, the positioning of the durian fruit, the angle of irradiation, and the way test data are retrieved.
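For readers unfamiliar with Learning Vector Quantization, the basic LVQ1 update rule can be sketched as follows: the prototype nearest to each training sample is moved toward the sample if their labels match and away from it otherwise. This is a generic sketch of LVQ1, not the configuration used in [25]; the fixed learning rate, the number of epochs, and the toy data are assumptions.

```python
import math


def nearest_prototype(prototypes, x):
    """Index of the prototype closest to x in Euclidean distance."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))


def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=10):
    """Return prototypes adjusted with the LVQ1 rule (fixed learning rate)."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            k = nearest_prototype(protos, x)
            # Attract the winner if its label is correct, repel it otherwise.
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] = [p + sign * lr * (xi - p) for p, xi in zip(protos[k], x)]
    return protos


def lvq_predict(prototypes, proto_labels, x):
    """Label of the prototype nearest to x."""
    return proto_labels[nearest_prototype(prototypes, x)]


# Hypothetical two-class feature vectors (for example, two GLCM statistics).
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "B", "B"]
trained = lvq1_train(samples, labels, [(1.0, 1.0), (4.0, 4.0)], ["A", "B"])
print(lvq_predict(trained, ["A", "B"], (0.5, 0.5)))  # A
```

Because classification depends only on distances to a few prototypes, shifts in the input features (for example from lighting or fruit positioning) move samples relative to those prototypes, which is consistent with the sensitivity the researchers report.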
The study in [21] assessed the maturity of durian fruit for global export and proposed a CNN approach to classify fruit based on the harvesting period. The researchers used 1,500 images, split into 1,200 training images, 150 validation images, and 150 testing images. The study categorized durians into five classes based on days harvested, and dry weight was analyzed to determine maturity. The results showed that durians harvested at 117 and 124 days were mature, with dry weights of 38.94% and 42.13%, respectively, while the others were immature. The study compared CNN architectures and found that the proposed DuNet-12 architecture with ReLU activation and the Adam optimizer achieved the highest testing accuracy of 98.96% and a prediction accuracy of 100%. This offers potential for the future development of durian export industry equipment.
Table 2 shows the final comparison of the various classification algorithms and image processing techniques used by the researchers. The table also includes the durian features focused on, the number of training and testing samples used for each algorithm, and the accuracy produced by the model.
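The 1,200/150/150 partition reported in [21] corresponds to an 80/10/10 train, validation, and test split. A minimal sketch of producing such a split is given below; the shuffling, the seed, and the use of integer image identifiers are assumptions, since the source does not describe how the images were assigned to each subset.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle items reproducibly and cut them into three disjoint subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must sum to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical image identifiers standing in for the 1,500 images.
image_ids = list(range(1500))
train_ids, val_ids, test_ids = split_dataset(image_ids, 1200, 150, 150)
print(len(train_ids), len(val_ids), len(test_ids))  # 1200 150 150
```

Keeping the test subset disjoint from the training and validation subsets is what makes a reported testing accuracy a meaningful estimate of performance on unseen fruit.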

Discussions
Durian, a popular fruit in Southeast Asia, has numerous species, making accurate classification crucial for quality control, grading, and marketing. Due to the high variance in their shapes, textures, colors, and sizes, human classification becomes challenging. This has motivated the adoption of automated classification methods such as ML and deep learning (DL). ML algorithms like KNN, LDA, SVM, and RF have shown good accuracy in durian species classification. However, ML algorithms require meticulous feature engineering and selection, which can be time-consuming and difficult. DL algorithms, particularly CNNs, have proven effective in various image recognition tasks, including durian species classification.
The features used for classification cover a wide array, and the algorithms' performance is evaluated across datasets of varying sizes. Training samples range from 40 to 1,200, and testing samples range from 40 to 42,337. The resulting accuracies vary from 70% to 100%, with KNN reaching 100% on its training data using global shape representation. CNN achieves the highest testing accuracy, 98.96%, with image resizing and color conversion, utilizing additional parameters such as days harvested and dry weight.
Deep learning emerges as a promising approach for accurate and robust durian species recognition. DL algorithms handle large datasets effectively, learn complex relationships, and automatically learn features, reducing reliance on feature engineering. Several studies support the effectiveness of deep learning for durian species recognition. Future research should explore deep learning models for classifying durian species from different plant parts and for real-time video analysis.

Conclusion
This study reviewed past research on durian species classification using machine learning and deep learning algorithms. It aimed to identify and recommend the best algorithm for implementing a durian classifier in terms of accuracy, speed, and ease of implementation. The study reveals that CNN is the most widely used and effective DL algorithm, achieving an accuracy of 98.96%, significantly higher than other DL algorithms such as RNNs and traditional ML algorithms such as SVMs and RFs. The study therefore suggests CNN as the preferred DL algorithm for durian species classification. However, a limitation of this study is that it does not recommend specific features; future studies should thoroughly investigate different feature combinations to improve classification accuracy.

Table 1. Comparison between machine learning and deep learning for general image classification.

Table 2. Comparison of different classification algorithms used in fruits classification.

Table 2 shows the results of various classification algorithms used to classify durian fruits based on different features. The algorithms used include K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Fast Discrete Curvelet Transform, and Learning Vector Quantization (LVQ). The features used for classification encompass global shape representation, area, perimeter, circularity, feature extraction and Local Binary Pattern (LBP), Gray-Level Co-occurrence Matrix (GLCM), edge detection and color extraction, image resizing and single-color channel conversion, durian visual features from the bottom view, and curvilinear features based on thorns. The number of training samples ranges from 40 to 1,200, while the number of testing samples ranges from 40 to 42,337. The accuracies achieved range from 70% to 100%, with KNN reaching 100% on its training data using global shape representation. CNN achieved the highest testing accuracy of 98.96% with image resizing and color conversion using days harvested and dry weight.