Investigating the Use of Active Transportation Modes Among University Employees Through an Advanced Decision Tree Algorithm

Now more than ever, the health and economic benefits of active transportation (AT) are evident, and several planning efforts and programs specifically target improved active transportation options for different populations, such as students and seniors. Administrative employees at universities have received less attention in the literature than other population groups. This population spends much of its working time in sedentary activities. Thus, the present study used a C5 decision tree to examine university employees' use of AT modes when they are off campus to get to work, shopping, and leisure. The effects of the sociodemographic and living environment of employees on their AT mode choice were also examined. According to the results, walking was the most frequently used mode to get to work and leisure, and public transport was the most frequently used mode to get to shopping. Transit station conditions (25), sidewalk availability and coverage (36), and bike path availability and coverage (30) were the most important factors in the use of AT modes by employees to get to work, shopping, and leisure, respectively. Furthermore, several decision rules were extracted from the C5 tree, which included combinations of multiple factors.


Introduction
Active transportation (AT) refers to all forms of human-powered transportation, such as walking and cycling [1]. In addition, the [2] considers rolling and public transportation as active modes of getting to essential destinations such as school and work. AT is also considered as a recreational activity for all people. Several studies have been conducted to identify the factors that influence the use of AT modes by general and special group populations such as children, schoolchildren, women, and seniors [3][4][5][6][7][8][9][10][11]. This wide range of benefits of AT for societies motivates policymakers and urban planners to identify factors and attributes that contribute to a higher level of AT modes' usage [12]. To this end, several studies were conducted to identify the influencing factors on the usage of AT modes by general and special group populations such as children, school children, women, and seniors [13][14][15][16][17][18][19][20][21]. However, some special groups, such as employees, attracted less attention from academia in relation to the use of AT modes.
Millions of employees across the world are more likely to drive a car or motorcycle to work, leisure, or shopping each day than they are to use any other modes. This dependence on motor vehicles might contribute to negative health outcomes caused to some extent by inactivity. This dependency can also be worse in developing countries where the rate of vehicle ownership is extremely high. Malaysia, for example, has the highest vehicle ownership rate in the ASEAN region [22] and the third highest level of vehicle ownership worldwide (93%) [23]. Thus, encouraging AT modes provides an opportunity to address improving employees' health through physical activity. However, several areas in developing countries lack characteristics that make them well-suited for AT. For example, an ineffective public transport system and sprawl are significant barriers to reducing vehicle commuting in developing countries.
University staff can benefit from AT promotion like other employees. Unlike academic staff, who have more varied schedules and activities, administrative staff have a monotonous schedule, which might leave them without physical activity options in their offices. Administrative university staff might find it difficult to be physically active while at work; they are less likely to engage in moderate physical activity and more likely to engage in sedentary behaviors, such as sitting. Thus, it is important to identify ways to promote the use of AT modes among this group. While a number of studies have examined the workplace factors that influence the use of AT modes by university employees [24], no study is available that assesses the factors that affect the use of AT by this group when they are away from their workplaces. Identifying such factors would help in developing better AT programs, in which university employees would be incentivized to start using AT modes in their daily routines.
Several studies examined factors that affect individuals' travel mode choice. Based on previous studies, factors related to sociodemographics, trip characteristics and purpose, and the built environment are among the most frequently used factors to examine the travel mode choice of the general population [25,26]. Individuals' personal characteristics influence their travel mode choice both directly and indirectly [27][28][29][30][31]. Factors including age, race, education, gender, vehicle ownership, occupation, and income are regarded as directly influencing mode choice (Figure 1). Besides, a number of the abovementioned factors influence the destination choice [27]. For example, in China, women were more interested than men in choosing a workplace near their houses, because they had to take care of family members and devoted less time to work than men. In fact, the destination itself might define the characteristics of the trip and the activity. These characteristics include travel distance, travel time, and travel cost. A certain mode is then chosen based on these characteristics. While sociodemographic factors define the trip characteristics, built environment attributes also influence them [27].
Built-environment attributes such as density and land use mixture are regarded as influential factors on travel distance and time [32]. Appropriate density and a mixture of land uses might contribute to a higher number of short trips by individuals. Indeed, these characteristics provide housing close to other activities and access to facilities within a certain radius. Several studies confirmed that these short distances can assist individuals in making these trips using active modes such as walking and biking [33][34][35][36][37]. The mixture of land use and service density near the workplace can also reduce the use of motorized transport and encourage employees to use active modes such as walking and biking [38]. In terms of AT research, several studies examined the impacts of sociodemographic and built environment attributes on AT mode choice. A previous study examined the associations of sociodemographic factors with AT to/from school and to other destinations among Hong Kong adolescents [39]. They found that age had a positive association with the frequency of AT; education had a positive association with the total weekly minutes of AT; household income, vehicle ownership, and the number of motor vehicles in the household had a positive and negative association, respectively, with the weekly frequency of non-school AT.
Among the university population, students receive more attention than staff. Zhou studied employees' carsharing at the University of California, Los Angeles (UCLA) [85]. The author identified that employees' gender and income influence their carsharing mode choice. Wang and Liu identified factors influencing public transport use by the staff at the University of Queensland (UQ), and these factors were travel time and distance [86]. Travel time was also mentioned by Shannon et al. [87], who attempted to identify influential factors on the use of AT at the University of Western Australia.
Uttley and Lovelace examined commuting behaviour and long-term behavioural shifts towards cycling in response to outside intervention at the organisational level at the University of Sheffield [88]. They also presented information regarding the staff's mode of travel, as well as commute distance and time.
Most AT-relevant attributes in the general population, and especially in children and seniors, can be thought of as encompassing both built environment and sociodemographic attributes. Most research examines the usage of AT modes among children, seniors, and the general population through the aforementioned factors. AT mode usage by employees while they are away from their workplaces is less studied. It is important to examine AT usage among different employees, since an individual's occupation can affect travel mode choice and AT usage [25,27,28].
In light of the above, this study was carried out through the analysis of the active travel data of university employees in Malaysia. Precisely, the aim of the study was to: (1) investigate the AT mode choice of university employees for different trip purposes; (2) identify the most important factors influencing the use of AT modes by university employees for different trip purposes; (3) identify behavioral patterns that occur within the active travel data by a rule extraction technique; and (4) provide insights for the development of AT improvement strategies focused on university employees. A data mining technique was used to analyze the AT data, with the aim of extracting previously unknown knowledge from observed data. It is assumption-free and does not require a priori probabilistic knowledge about the phenomena of interest. The main reason for using data mining techniques was also related to the complex character of the mode choice phenomenon. A mode choice can be viewed as a multi-factor event always preceded by a situation in which one or more built-environment and sociodemographic factors affect the mode choice decision. In fact, each mode choice is the result of a chain of factors, some of which are common to several mode choice decisions. Using data mining techniques to identify these factors and their interdependence can provide useful insights for the development of effective land use and transportation strategies. Precisely, the data mining method that we used was the decision tree (DT), which is also called hierarchical tree-based regression (HTBR). Among several algorithms of HTBR, C5 was chosen to build the model to predict the AT mode choice of university employees. Concerning AT mode choice, identifying important factors is the key to developing an effective model to characterize future transport planning initiatives that promote AT modes among different groups of people.
The remainder of the paper is structured as follows: section 2 provides details of the statistical methodologies used in the analysis and describes the case study; section 3 explains the results of the C5 trees and of their decision rules; and finally, a discussion places the results in the context of the practice of urban and transport planning, and conclusions are drawn.

Data
This study was conducted between mid-July and the end of December 2018 as an online survey among administrative staff of the Universiti Teknologi Malaysia (UTM). The whole population of 3274 administrative staff received a letter from the research team explaining the aims of the study. The internet address of the questionnaire was also included in the letter. After two weeks, a reminder letter was sent to all staff to increase the response rate. In total, 511 staff (15.6%) participated in the study. The analysis was restricted to those employees who reported using AT modes at least three times per week to get to work, shopping, or leisure. This approach helped this study to measure how frequent users employ the AT modes to get to different destinations. Thus, 45 participants (1.37%) who reported using AT modes to travel to work, 38 (1.16%) who reported using AT modes to travel to shopping, and 84 (2.56%) who reported using AT modes to travel to leisure destinations were included in the study. The response rate is low, but within the expected range for an online survey census.
The study was conducted at the main campus of UTM in Skudai, which is the second largest public university campus in Malaysia (Figure 2). The campus is about 20 km from the state capital, Johor Bahru. The connection between the UTM campus and the center of town (Skudai) is poorly suited to walking and biking. The main road to UTM lacks bike lanes. Besides, the sidewalks on the road are disconnected and narrow. On the other hand, the road to the campus is suitable for motorized transport. Regarding public transport, the campus is only connected to Taman Universiti through the city bus system. The bus service is free only for Malaysian students. Numerous UTM students and staff live in Taman Universiti. This neighborhood has many shop lots, offices, and restaurants. From Taman Universiti, students and staff can access other areas of Johor Bahru by bus and taxicab. As pointed out earlier, the present study used an online survey to investigate the UTM employees' use of AT modes and their travel frequency by these modes. This questionnaire also attempted to identify the factors associated with greater usage of AT modes by university employees. Eventually, we used the outcomes of the questionnaire to construct models that can predict employees' usage of AT modes. The questionnaire included 16 items related to sociodemographic information of respondents, their living environment, and their daily travel distance, time, and cost based on different trip purposes. Table 1 presents the questionnaire items, their types and values.

Hierarchical tree-based regression (HTBR)
In the present study, the AT mode choice is the target variable, classified into eight values. Several traditional techniques, such as multinomial logistic regression and ordinal logistic regression, are available to model categorical variables. However, the power and accuracy of these techniques are prone to be negatively affected by the violation of their pre-defined assumptions and functions [89]. To address the aforementioned flaws, hierarchical tree-based regression (HTBR) can be used as an assumption-free option suitable for forward stepwise and non-parametric variable selection [90]. Furthermore, HTBR is extremely efficient for classification and prediction problems [91]. This method identifies mutually exclusive and exhaustive groups/subgroups of cases that share common characteristics and impact the dependent variable of interest.
Several algorithms of HTBR are available, including CHAID, QUEST, CART, and C5. The present study used the C5 algorithm, which was developed by Quinlan [92] and is one of the most commonly used HTBR algorithms in transport research [93,94]. The C5 algorithm accepts both continuous and categorical input variables and a categorical output variable. C5 creates non-binary trees using the information gain ratio. To prepare data for analysis, we used Automated Data Preparation (ADP) in SPSS Modeler software. ADP makes data ready for quick and easy model building, without requiring prior knowledge of the statistical concepts involved. The cross-validation method (10-fold) was also used to assess the trees' structure and their accuracy.
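The splitting criterion named above can be illustrated with a minimal sketch. The code below computes the information gain ratio of a categorical attribute in the form introduced with C4.5; it is not the SPSS Modeler implementation, it ignores continuous inputs, pruning, and other C5 features, and the function names and toy data are hypothetical and not taken from the survey.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of a multiway split divided by its split information."""
    total = len(labels)
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many small branches.
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data (assumption, for illustration only).
tsc = ["no shelters", "no shelters", "some shelters", "some shelters"]
mode = ["walking", "walking", "public transport", "public transport"]
uninformative = ["a", "b", "a", "b"]

print(gain_ratio(tsc, mode))            # attribute separates the modes perfectly
print(gain_ratio(uninformative, mode))  # attribute carries no information
```

At each node, a tree builder of this kind would evaluate every candidate attribute and split on the one with the highest gain ratio, creating one branch per attribute value or group of values.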
The results of this paper are reported in two forms: the decision tree and the "if-then" rules. The latter is a presentation form of decision trees. In this format, the "if" part represents an attribute set of one or more variables and the "then" part represents the state of the dependent variable. Thus, in this study, the dependent variable (consequent) is the usage of AT modes of university employees based on their trip purpose, and the independent variables (antecedent) are factors that might affect the use of these modes. Typically, the number of rules achieved is equal to the number of terminal nodes of the tree. However, we used three parameters for every possible rule to extract significant rules that can provide useful information for implementing planning policies to increase the level of AT. In order to show how often any combination of antecedent and consequent occurs in a database, a measure of support (S) is used. To measure the percentage of cases in which a consequent appears given that the antecedent has occurred, a measure of probability (P) is defined. The third parameter is population (Po), which measures the statistical dependence of the rule by relating the observed frequency of co-occurrence to the expected frequency of co-occurrence under the assumption of conditional independence (Po = S/P) [95]. In this study, to obtain high quality rules, those that have P% ≥ 75, S% ≥ 0.9, and Po% ≥ 1.2 were chosen as significant.
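The three rule parameters can be sketched as follows, assuming that S, P, and Po are computed from simple case counts and that Po = S/P is expressed as a percentage. The function names are hypothetical, and the counts in the usage example (16 and 11 of 45 cases; 6 and 5 of 45 cases) are inferred from the percentages reported for Rules 8 and 4 of the work tree rather than taken from the original data.

```python
def rule_metrics(n_total, n_antecedent, n_both):
    """Return (S, P, Po) in percent for one if-then rule.

    n_total:      number of cases in the data set
    n_antecedent: cases that satisfy the "if" part
    n_both:       cases that satisfy both the "if" and the "then" part
    """
    support = 100 * n_both / n_total              # S
    probability = 100 * n_both / n_antecedent     # P
    population = 100 * support / probability      # Po = S/P, as a percentage
    return support, probability, population


def is_significant(support, probability, population,
                   min_p=75.0, min_s=0.9, min_po=1.2):
    """Apply the thresholds P% >= 75, S% >= 0.9, Po% >= 1.2."""
    return probability >= min_p and support >= min_s and population >= min_po


# Counts below are assumptions inferred from the reported percentages.
rule8 = rule_metrics(n_total=45, n_antecedent=16, n_both=11)
rule4 = rule_metrics(n_total=45, n_antecedent=6, n_both=5)

print([round(x, 2) for x in rule8], is_significant(*rule8))
print([round(x, 2) for x in rule4], is_significant(*rule4))
```

Under these assumed counts, Rule 8 fails the probability threshold while Rule 4 passes all three, which matches how the two rules are labelled in the results.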

Result
The present study conducted a questionnaire survey among 511 UTM administrative staff. After a careful review of the completed questionnaires, the respondents who reported using active modes for work, shopping, and leisure purposes were selected to participate in this study. Thus, 45 participants who reported using AT modes to travel to work, 38 who reported using AT modes to travel to shopping, and 84 who reported using AT modes to travel to leisure destinations were included in the study. The sociodemographic characteristics of the participants and the built environment attributes reported by them are presented in Appendixes A and B, respectively. The frequencies of daily travel distance (DTD), daily travel time (DTT), and daily travel cost (DTC) by the trip purposes of participants are shown in Table 2. In the work, shopping, and leisure categories, a DTC below RM 5 and a DTT of 10 min and below have the greatest frequencies. While the highest DTD frequency for travel to work and leisure is between 1 km and 5 km, the highest DTD frequency for travel to shopping is between 5 km and 10 km.
The frequencies of the usage of AT modes are presented in Table 3. In the "work" category, walking has the greatest usage frequency, while no participant used the combination of public transport and ride sourcing. Public transport had the highest usage frequency in the "shopping" category, while no UTM employees used a combination of private vehicles and active modes or a combination of public transport and ride sourcing. In the "leisure" category, the greatest usage frequency belongs to walking, and the frequency of the combination of private vehicles and active modes is zero. Across the categories, walking for leisure has the greatest usage frequency (50%).
Figure 3 shows the relative importance of input variables for work, shopping, and leisure trips of university employees through the AT modes. These influential factors and their importance were identified through the C5 decision trees. One C5 model was developed for each trip purpose. For work purposes, sidewalk availability and coverage (SAC) and daily travel time (DTT) were identified as the most important factors, with a relative importance of 16%. SAC is also the most important factor for shopping purposes (relative importance = 36%). For leisure trips, biking path availability and coverage (BAC) has the highest relative importance (30%). The least important factor for work is neighborhood type (N1), with a relative importance of 12%. For both shopping and leisure trips, the least important factor is neighborhood facilities within a walkable distance (NF), with a relative importance of 6% and 20%, respectively.

Work purpose
Figure 4 shows the C5 model that predicts the use of active transport modes for travelling to the workplace. The overall accuracy of this model is 71.11%. The tree identified eight rules associated with the use of AT modes (Table 4). Among these rules, five were identified as significant (P% ≥ 75, S% ≥ 0.9, and Po% ≥ 1.2). The model shows that node 0 is split based on the variable of transit stop conditions (TSC), which indicates that TSC is the single best variable to predict the use of AT modes by UTM employees to travel to work. C5 directs the TSCs of "some bus stop shelters" and "widely available bus stop shelters" to the left, forming node 1, and directs the TSC of "no shelters" to the right, forming node 14. As shown by node 14, if an employee perceives the transit stop conditions in his/her living area as "no shelters", the tree predicts that the employee's AT mode choice to work is most likely to be walking (Rule 8: P% = 68.75, S%=24.44, Po%=35.56/non-significant).
Node 1, conditioned on TSC, is split based on the employee's daily travel time (DTT) to work. C5 sends the DTT value of 10 min and below to node 2 and the DTT value of greater than 10 min to node 13. As shown by node 13, if an employee's daily travel time to work is greater than 10 min and he/she perceives the transit stop conditions in his/her living area as "some bus stop shelters" or "widely available bus stop shelters", the tree predicts that the employee's AT mode choice to work is most likely to be public transport (Rule 7: P% = 50, S%=8.89, Po%=17.78/non-significant). Node 2 is further split in relation to sidewalk availability and coverage (SAC): C5 sends the SAC values of "no sidewalks" and "discontinuous, narrow sidewalks" to node 3, and sends the SACs of "narrow sidewalks along all major streets" and "adequate sidewalks along all major streets" to node 10. Nodes 3 and 10 are further split in relation to SAC and vehicle ownership (VO), respectively. At node 3, C5 sends the SACs of "no sidewalks" and "discontinuous, narrow sidewalks" to nodes 4 and 5, respectively. As shown by node 4 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters" and DTT of "10 min and below", the tree predicts that the employee's AT mode choice to work is most likely to be a combination of private vehicle and public transport if an employee perceives the sidewalk conditions in his/her living area as "no sidewalks" or "discontinuous, narrow sidewalks" (Rule 1: P% = 100, S%=4.44, Po%=4.44). At node 10, C5 sends the VO of "no" and "yes" to nodes 11 and 12, respectively.
As shown by node 11 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", DTT of "10 min and below", and SACs of "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets", the tree predicts that the employee's AT mode choice to work is most likely to be walking if an employee does not own a private vehicle (Rule 5: P% = 75, S%=6.67, Po%=8.89). As shown by node 12 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", DTT of "10 min and below", and SACs of "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets", the tree predicts that the employee's AT mode choice to work is most likely to be a combination of walking, biking, and ride sourcing if an employee owns a private vehicle (Rule 6: P% = 60, S%=6.67). Turning to the division of node 5 and conditioned on SAC, DTT, TSC, and employees' education (E1), C5 sends the E1 values of "primary", "secondary", and "diploma" to node 6, and sends the E1 values of "bachelor's degree", "master's degree", and "doctorate degree" to node 7. As shown by node 6 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", DTT of "10 min and below", and SACs of "no sidewalks" or "discontinuous, narrow sidewalks", the tree predicts that the employee's AT mode choice to work is most likely to be walking if an employee's education is primary, secondary, or diploma (Rule 2: P% = 100, S%=4.44, Po%=4.44). C5 further splits node 7 in relation to the neighborhood type (N1) and sends the N1 values of "commercial area with some residential" and "mixed residential and commercial" to node 8, and sends the N1 values of "residential only", "residential with some commercial buildings", and "residential with some industrial facilities" to node 9.
As shown by node 8 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", DTT of "10 min and below", SACs of "no sidewalks" or "discontinuous, narrow sidewalks", and E1 values of "bachelor's degree", "master's degree", or "doctorate degree", the tree predicts that the employee's AT mode choice to work is most likely to be walking if the type of employee's neighborhood is "commercial area with some residential" or "mixed residential and commercial" (Rule 3: P% = 100, S%=4.44, Po%=4.44). At node 9 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", DTT of "10 min and below", SACs of "no sidewalks" or "discontinuous, narrow sidewalks", and E1 values of "bachelor's degree", "master's degree", or "doctorate degree", the tree predicts that the employee's AT mode choice to work is most likely to be public transport if the type of employee's neighborhood is "residential only", "residential with some commercial buildings", or "residential with some industrial facilities" (Rule 4: P% = 83.33, S%=11.11, Po%=13.33).

Shopping purpose
Figure 5 shows the C5 model that predicts the use of active transport modes for travelling to shopping. The overall accuracy of this model is 92.11%. The tree identified seven rules associated with the use of AT modes (Table 5). Among these rules, five were identified as significant (P% ≥ 75, S% ≥ 0.9, and Po% ≥ 1.2). The model shows that node 0 is split based on the variable of transit stop conditions (TSC), which indicates that TSC is the single best variable to predict the use of AT modes by UTM employees to travel to shopping. C5 directs the TSCs of "some bus stop shelters" and "widely available bus stop shelters" to the left, forming node 1, and directs the TSC of "no shelters" to the right, forming node 7. Nodes 1 and 7 are further split in relation to SAC and employees' race (R1), respectively. Conditioned on TSC, C5 splits node 1 based on SAC and sends the SACs of "narrow sidewalks along all major streets" and "adequate sidewalks along all major streets" to node 2, and sends the SACs of "no sidewalks" and "discontinuous, narrow sidewalks" to node 3. As shown by node 2, if an employee perceives the transit stop conditions in his/her living area as "some bus stop shelters" or "widely available bus stop shelters", and perceives the sidewalk conditions as "no sidewalks" or "discontinuous, narrow sidewalks", the tree predicts that the employee's AT mode choice for shopping is most likely to be public transport (Rule 1: P% = 73.33, S%=28.95, Po%=39.48/non-significant). Node 3 is further split based on neighborhood facilities within a walkable distance (NF). C5 sends the NF value of "parks and other open spaces" to node 4; sends the NF values of "childcare facilities", "public transport", and "three or more facilities" to node 5; and sends the NF values of "schools", "shops", "banks", "place of worship", and "two facilities" to node 6.
At node 4 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", and SACs of "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets", the tree predicts that the employee's AT mode choice for shopping is most likely to be a combination of walking, biking, and ride sourcing if the employee's neighborhood has parks and other open spaces within a walkable distance (Rule 2: P% = 100, S%=26.63). As shown by node 5 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", and SACs of "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets", the tree predicts that the employee's AT mode choice for shopping is most likely to be walking if the employee's neighborhood has childcare facilities, public transport, or three or more facilities within a walkable distance from his/her house (Rule 3: P% = 100, S%=7.89, Po%=7.89). At node 6 and conditioned on TSCs of "some bus stop shelters" or "widely available bus stop shelters", and SACs of "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets", the tree predicts that the employee's AT mode choice for shopping is most likely to be public transport if the employee's neighborhood has schools, shops, banks, places of worship, or "two facilities" within a walkable distance from his/her house (Rule 4: P% = 80, S% = 10.53, Po% = 80). Turning to the division of node 7, conditioned on the TSC of "no shelters", C5 divides node 7 based on employees' race (R1) and directs the races of "Indian" and "others" to node 8, the race of "Chinese" to node 9, and the race of "Malay" to node 10. The tree predicts that the employee's AT mode choice for shopping is most likely to be biking if the employee's race is Indian or others (Rule 5: P% = 66.66, S%=5.26, Po%=7.90).
The tree further predicts that the employee's AT mode choice for shopping is most likely to be a combination of walking, biking, and ride sourcing if the employee's race is Chinese (Rule 6: P% = 100, S%=2.63, Po%=2.63). The tree then predicts that the employee's AT mode choice for shopping is most likely to be walking if the employee's race is Malay (Rule 7: P% = 80, S%=21.05, Po%=26.32).

Leisure purpose
Figure 6 shows the C5 model that predicts the use of active transport modes for travelling to leisure destinations. The overall accuracy of this model is 84.52%. The tree identified eight rules associated with the use of AT modes (Table 6). Among these rules, four were identified as significant (P% ≥ 75, S% ≥ 0.9, and Po% ≥ 1.2). The model shows that node 0 is split based on the variable of biking path availability and coverage (BAC), which indicates that BAC is the single best variable to predict the use of AT modes by UTM employees to travel to leisure. C5 directs the BAC of "little or none" to the left, forming node 1, and directs the BACs of "some bike paths or routes" and "many cycle paths, lanes, or routes forming a network" to the right, forming node 2. As shown by node 1, if an employee perceives the biking facilities in his/her living area as "little or none", the tree predicts that the employee's AT mode choice for leisure is most likely to be walking (Rule 1: P% = 61.66, S%=44.05, Po%=71.44/non-significant).
Turning to the division of node 2, conditioned on the BACs of "some bike paths or routes" and "many cycle paths, lanes, or routes forming a network", C5 splits node 2 based on daily travel distance (DTD) to leisure and sends the DTD of 5 km and below to node 3 and the DTD of greater than 5 km to node 9. As shown by node 9, if an employee's daily travel distance to leisure is greater than 5 km, the tree predicts that the employee's AT mode choice for leisure is most likely to be a combination of walking and biking (Rule 6: P% = 75, S%=3.57, Po%=4.76). Node 3 is further split in relation to neighborhood facilities within a walkable distance (NF). C5 sends the NFs of schools, childcare facilities, public transport, shops, banks, leisure facilities, parks and other open spaces, and places of worship to node 4, and sends the NF of three or more facilities to node 5. At node 4, if an employee lives in a neighborhood with the abovementioned facilities within a walkable distance from his/her house, the tree predicts that the employee's AT mode choice for leisure is most likely to be biking (Rule 2: P% = 100, S%=11.90, Po%=11.90). Node 5 is further split in relation to employees' race (R1). C5 sends the R1 of "Indian" to node 6, the R1 of "Chinese" to node 7, and the R1 of "Malay" and "others" to node 8. As shown by node 6, if the race of the employee is Indian, the tree predicts that the employee's AT mode choice for leisure is most likely to be public transport (Rule 3: P% = 100, S%=1.19, Po%=1.19). At node 7, if the race of the employee is Chinese, the tree predicts that the employee's AT mode choice for leisure is most likely to be biking (Rule 4: P% = 80, S%=4.76, Po%=5.95). As shown by node 8, if the race of the employee is Malay or others, the tree predicts that the employee's AT mode choice for leisure is most likely to be biking (Rule 5: P% = 100, S%=4.76, Po%=4.76).
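The leisure tree described above can be read as a chain of nested conditions. The following is a minimal sketch of that reading, not the fitted SPSS Modeler model: the function name, argument names, and string encodings are hypothetical, DTD is assumed to be given in kilometres although the questionnaire may have recorded it as a category, and only the six rules spelled out in the text are encoded.

```python
def predict_leisure_mode(bac, dtd_km, nf, race):
    """Return the most likely AT mode for leisure trips under the rules in the text.

    bac:    perceived biking path availability and coverage
    dtd_km: daily travel distance to leisure in km (assumed numeric)
    nf:     neighborhood facilities within a walkable distance
    race:   "Indian", "Chinese", "Malay", or "others"
    """
    # Node 0 split on BAC.
    if bac == "little or none":
        return "walking"                 # Rule 1 (node 1)
    # Node 2 split on daily travel distance.
    if dtd_km > 5:
        return "walking and biking"      # Rule 6 (node 9)
    # Node 3 split on neighborhood facilities.
    if nf != "three or more facilities":
        return "biking"                  # Rule 2 (node 4)
    # Node 5 split on race.
    if race == "Indian":
        return "public transport"        # Rule 3 (node 6)
    return "biking"                      # Rules 4 and 5 (nodes 7 and 8)


# Hypothetical respondents (assumption, for illustration only).
print(predict_leisure_mode("little or none", 2, "shops", "Malay"))
print(predict_leisure_mode("some bike paths or routes", 8, "shops", "Chinese"))
print(predict_leisure_mode("some bike paths or routes", 3,
                           "three or more facilities", "Indian"))
```

Writing the tree this way makes it easy to see that each rule is the conjunction of every condition on the path from node 0 to a terminal node.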

Discussions
Analysis results showed that the usage of AT modes is strongly sensitive to several combinations of the living environment, sociodemographic, and trip attributes. The C5 models identified transit station conditions (TSC), sidewalk availability and coverage (SAC), and bike path availability and coverage (BAC) as the most important factors affecting the use of AT modes for work, shopping, and leisure trips, respectively. These findings are in line with those of several studies that have shown the positive influence of the availability of well-conditioned transit stations on the use of public transport, particularly buses [96][97][98]. Several studies also noted the importance of sidewalk and bike path conditions in increasing people's willingness to use them [99][100][101].
The present study created several rules associated with the use of AT modes for work, shopping, and leisure trips. Each rule was a combination of factors from the living environment, sociodemographics, and trip characteristics. Concerning work trips, the tree created five significant decision rules. According to the rules, two built environment factors, SAC and TSC, were present in all five rules, while another built environment factor, neighborhood type (N1), was present in only two rules. The presence of SAC and TSC in all significant rules is in line with the findings of Heinen [102], who showed that the availability of comfortable walking and biking paths, as well as proximity to bus stops and busways, increases the share of commute trips involving any active travel to work. Concerning the trip characteristics, the DTT was also present in all five significant rules, and its value was 10 min and below (value = 1). In general, therefore, it seems that the use of active modes is suitable for those employees who spend 10 min or less each day getting to work. This fixed value of DTT resulted in various AT outputs. The association of other factors, including TSC, SAC, E1, N1, and VO, with these decision rules can possibly explain these differences. This association and the frequent presence of built environment factors, especially those that support walking and public transport (e.g., SAC and TSC), imply that these could be major factors influencing the AT mode choice of employees to get to work. Taken together, these results suggest that the combination of SAC, DTT, and TSC can be a reliable predictor and can perform better than other combinations (or a single factor) in predicting the AT mode choice of employees for work trips.
An important practical implication is that any improvement in sidewalk and transit station conditions might increase the use of AT modes among employees for work trips. However, this action might not be sufficient for those who spend more than 10 min getting to work each day.
With regard to sociodemographic factors influencing the AT mode choice of employees for work trips, vehicle ownership (VO) and education level (E1) were present in significant decision rules. The VO appeared in only one significant rule (rule 5), while the E1 appeared in three significant rules (rules 2-4). While a number of studies have identified vehicle ownership as a significant factor in AT participation [39,103], it is surprising that vehicle ownership is not included in more of the significant rules for predicting AT mode choice for work trips. However, this finding does not imply that vehicle ownership is unimportant for this purpose. Rather, it may be that once an employee has decided to use an AT mode for work purposes, owning a motor vehicle has little impact on which AT mode he or she chooses. Concerning the employees' education level, the values of E1 were primary, secondary, and diploma (values = 1-3). It is somewhat surprising that the lower levels of education are associated with the use of AT modes. These results differ from some published studies noting that people with higher education are much more inclined to use AT modes [39,104]. The association of other factors, including TSC, SAC, DTT, and N1, with these decision rules can possibly explain this result. Within this sample, these findings suggest that higher AT levels are associated with lower education levels among employees.
The C5 created five significant decision rules with regard to employees' AT mode choice for shopping. Concerning the built environment factors, TSC appeared in all five significant rules. The largest share of employees used buses to get to shopping (39.5%), and the design, location, and condition of the bus stops were possibly of concern to them. This attention to bus stops can explain the presence of TSC in every rule, although it does not mean that TSC is the only predictor of AT mode choice for shopping. In addition, SAC and NF appeared in three rules. As the combination of SAC, TSC, and NF appeared frequently in the rules (three out of five), this combination could be an efficient predictor of employees' AT mode choice for shopping. In rules 2-4 for shopping trips, the values of SAC and TSC were "narrow sidewalks along all major streets" or "adequate sidewalks along all major streets" and "some bus stop shelters" or "widely available bus stop shelters", respectively. Combined with NF, the same values of SAC and TSC resulted in different AT modes, including a combination of walking/biking with ride sourcing, walking, and public transport. The variability in NF values can explain the variability in the AT modes predicted by these rules. It is therefore likely that a connection exists between the availability of sidewalks and transit stations in neighborhoods and higher rates of walking and public transport use by employees to get to shopping. R1, as a sociodemographic factor, appeared in two rules (6 and 7). Combined with R1, the same value of TSC (no shelters) resulted in different AT modes, namely a combination of walking/biking with ride sourcing, and walking.
It can thus be suggested that when the value of a built environment factor such as TSC does not change, the likelihood of a combination of walking/biking and ride sourcing to get to shopping is higher for Chinese employees than for other races, and the likelihood of walking to get to shopping is higher for Malay employees than for other races.
Concerning leisure trips, the tree created four significant rules. According to the rules, a built environment factor, BAC, and a trip characteristic factor, DTD, appeared in all rules. The presence of BAC in all of the decision rules was not surprising since biking was the second most used mode for leisure trips. Additionally, the bike is regarded as a means of recreation in Malaysia. Thus, the importance of BAC for leisure trips makes sense. Another built environment factor, NF, was present in three rules. Furthermore, race was influential in rules 4 and 5. While the values of NF, DTD, and BAC do not change in these rules, the different values of R1 resulted in various AT outputs, including biking and walking to leisure activities. It can thus be suggested that when the built environment and trip characteristic factors such as NF, BAC, and DTD are fixed, the likelihood of biking is higher for Chinese employees compared to other races, and the likelihood of walking is higher for Malay and others (e.g., foreigners and Indians). Because DTD and BAC appeared together in all of the rules, their combination can be an effective predictor of employees' AT mode choice for leisure. This presence also shows that the AT mode choice of employees for leisure is mostly influenced by DTD and BAC.

Limitations
Although the present study produced notable insights on the use of AT modes among university employees, it has some limitations. First, the study population was purposefully limited to those employees who used active modes for the purposes of work, shopping, and leisure three times per week. This ensured that the target population comprised frequent users of AT who are developing a consistent pattern of AT use. Second, the present study was conducted among university employees in a developing country where the overall condition of active transport infrastructure, including sidewalks, bike lanes, and bus stations, is undesirable. Therefore, caution must be applied, as the results of the proposed models may not be transferable to developed countries. Third, the sample size used in this study might have reduced the generalizability of the results. The findings may therefore not be generalizable to employees in universities of varying sizes with diverse demographics, and participation bias may have influenced the results. Finally, self-reported data on the usage of AT modes were used in this study. Further studies can complement self-reports with trip observations.

Conclusions
This study set out to investigate the AT mode choice of employees for different trip purposes, including work, shopping, and leisure. To this end, a sample of administrative university employees was selected, and the data were analyzed using the C5 algorithm. The results were presented in the form of decision trees and decision rules. The most obvious finding to emerge from this study is that TSC, SAC, and BAC are the most important factors affecting the use of AT modes by employees for work, shopping, and leisure trips, respectively. The second major finding was that the combinations of (a) SAC, DTT, and TSC to get to work, (b) SAC, TSC, and NF to get to shopping, and (c) DTD and BAC to get to leisure are the best predictors of the AT mode choice of university employees. The results of this research support the idea that, in order to achieve a high level of active transport among employees, it is necessary to simultaneously use strategies that improve built environment conditions, reduce travel time and distance with regard to the trip purpose, and consider the sociodemographic attributes of employees. The current findings also enhance our understanding of the characteristics of AT modes and their influencing factors. This research will serve as a base for future studies seeking to understand the underlying causes of employees' complex mode choice behavior as the living environment and sociodemographic attributes change.

Conflicts of Interest
The authors declare no conflict of interest.