Delineation of ecotourism suitability zone using machine learning based ensemble models

Raha, Shrinwantu; Deb, Sayan

Shrinwantu Raha ¹*; Sayan Deb ¹

1, Department of Geography, Bhairab Ganguly College, Belgharia, West Bengal, India

E-mail:
shrinwanturaha1@gmail.com

Received: 30/10/2025
Acceptance: 09/12/2025
Available Online: 12/12/2025
Published: 01/01/2026

Manuscript link
http://dx.doi.org/10.30493/DAS.2025.011212

Abstract

Precise demarcation of ecotourism-suitable zones is essential for achieving sustainable development and guiding infrastructure investment across regions. This research presents a machine learning approach to assess and demarcate ecotourism suitability zones (ESZs) in Odisha using two ensemble models: CatBoost and the Model Averaged Neural Network (MA-NNET). The classification framework divided the state’s landscape into four tourism potential categories (Very High, High, Moderate, and Low) based on several physical and social criteria. Both models achieved comparable accuracy, precision, recall, F1-score and AUC-ROC values on the training and test sets; however, CatBoost showed marginally better consistency between training and testing performance. The CatBoost spatial output revealed that more than half of the state’s area has high or very high potential for ecotourism. Approximately 31.44% of the total area was categorized under the moderate ecotourism potential class, and the remaining 13.31% was classified as having low ecotourism potential. SHAP analysis revealed that relief and relative relief are the most influential features driving model decisions in both MA-NNET and CatBoost. The study highlights the usefulness of machine learning algorithms in regional tourism planning and provides practical results for the development of data-driven policies and sustainable sectoral development (specifically SDG 8 and SDG 11) in Odisha.

Keywords: Ecotourism, Suitability zone, Model averaged neural network, CatBoost, SDG

Introduction

Tourism is defined as the temporary movement of people from their usual place of residence to destinations outside their everyday environment, for purposes such as leisure, business, or other activities, along with the services and facilities provided to support their needs during the visit [1][2]. Within this broad definition, ecotourism focuses specifically on responsible travel to natural environments that conserves the ecosystem, promotes the well-being of the local population, and involves educational and interpretive elements [3-7]. Ecotourism is also widely understood as visiting comparatively undisturbed or unpolluted natural regions with the express purpose of studying, admiring, and appreciating the landscape and any cultural manifestations present there [8][9]. It has proved to be a form of sustainable tourism that combines environmental accountability with the socio-economic engagement of host populations [10][11]. An ecotourism suitability zone is a geographical area identified as appropriate for the development and promotion of ecotourism projects, where natural attractions are available, the environment is ecologically sound, and the area is culturally significant and accessible [4][12]. Although the concept plays a crucial role in guiding sustainable land use planning and minimizing negative environmental impacts, research on ecotourism suitability zones remains limited despite extensive bibliometric interest. This gap highlights significant opportunities for advancing both scientific inquiry and practical applications in sustainable tourism management.

Historically, ecotourism planning and site selection have been based predominantly on anecdotes, expert opinions, and qualitative assessments [13]. These traditional approaches provide valuable information grounded in local knowledge, cultural understanding, and experience, and in many cases they allow planners to notice context-relevant peculiarities that a strictly quantitative model may fail to identify [14][15]. Nevertheless, despite their merits, such methods have inherent limitations associated with subjectivity, lack of consistency, and limited scalability [16][17]. Given the complexity of tourism systems (including multifaceted interactions between environmental, social, economic, and infrastructural dimensions), these conventional methods are no longer adequate for describing and analyzing the spatial dynamics of sustainable tourism planning [18][19]. The increasing sophistication of contemporary tourism requires rigorous analytical and quantitative models that can offer objective information for identifying potential tourism areas with a high level of feasibility and sustainability [20][21]. By incorporating geospatial technologies, multi-criteria decision-making frameworks, and modern data analysis algorithms, such models can provide a more systematic and replicable way of evaluating tourism potential over large geographical regions [21][22]. The lack of these techniques in the past has mostly led to ineffective exploitation of tourism resources, haphazard and uncoordinated development programs, and poor allocation of infrastructure growth in tourist areas, ultimately limiting the long-term competitiveness and sustainability of tourist destinations.

Several Multi-Criteria Decision-Making (MCDM) frameworks [such as the Analytic Hierarchy Process (AHP), Analytic Network Process (ANP), and TOPSIS] and machine learning models have already been applied in the field of tourism [23][24]. Although MCDM techniques are common in ecotourism suitability zone demarcation, they suffer from various limitations. Conventional MCDM techniques may oversimplify complex ecological, social and cultural interactions, reducing them to linear or numeric scales without considering the interconnected nature of natural systems [25][26]. Moreover, their results depend on data quality, which makes them resolution-sensitive and poorly suited to handling uncertainty or fuzziness in ecological and socio-economic data [27][28]. Additionally, scaling and weighting inconsistencies, failure to consider spatial relationships (connectivity and autocorrelation), and the absence of sound validation processes diminish the reliability of the results [29][30]. As a result, although MCDM provides a systematic scheme for including a variety of criteria, its application in ecotourism suitability mapping is constrained unless it is integrated with spatial and machine learning methods.

The combination of Machine Learning (ML) and Geographic Information Systems (GIS) has revolutionized regional and ecological planning [31][32]. ML models have been shown to identify and classify complex patterns in large and heterogeneous data effectively, and to produce highly accurate predictions and detailed classifications at a variety of spatial and environmental scales [33][34]. Machine learning models can broadly be classified into five categories [35]: supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning and deep learning. Supervised learning (e.g., SVM, Decision Trees, Linear Regression) trains models on labeled data to produce predictions or classifications; its advantages are high accuracy and interpretability in structured problems, while its disadvantages are that it requires large labeled datasets and can easily overfit [36]. Unsupervised learning (e.g., K-means, hierarchical clustering) is used for clustering and dimensionality reduction and for finding hidden patterns or structure in unlabeled data, but its key drawbacks are that performance is hard to assess and the results are less interpretable [37][38]. Semi-supervised learning (e.g., semi-supervised SVM, self-training) balances performance and data efficiency by using small quantities of labeled data together with large quantities of unlabeled data, but may perform poorly when the labeled data are not representative or when the assumptions about the unlabeled data are violated [39]. Reinforcement learning trains agents to make a sequence of decisions by rewarding desirable behaviors, which is very effective in dynamic and interactive tasks such as robotics or gaming, but has disadvantages such as high computational cost, tuning complexity, and training instability [40,41]. Lastly, deep learning, a branch of machine learning based on neural networks with many layers, is particularly good at learning complex representations of raw data (images, audio, and text) with high accuracy and automated feature extraction, but suffers from high resource consumption, the need for large datasets, and a lack of explainability (the black box problem) [42–45].

Ensemble models, which combine the results of multiple models, can counter most of the weaknesses of the five conventional machine learning categories [46][47]. Ensemble methods achieve better generalization, robustness and overall prediction accuracy by combining many diverse learners through bagging, boosting, or stacking [48][49]. By smoothing out errors, they can reduce the bias and variance associated with overfitting, which is prevalent in supervised and deep learning models [50][51]. They also reduce the effects of bias because different model perspectives are included, resulting in more stable and reliable outcomes even when a particular model is noisy or biased [52][53]. In contrast to unsupervised or semi-supervised models, which may not be able to handle ambiguity in data labelling, ensembles can use several learners to capture unknown patterns and relationships [54]. Moreover, where reinforcement learning or deep models demand large amounts of computation or are not easily interpretable, ensemble methods can offer a more balanced and interpretable structure by combining simpler models to achieve good predictive performance [55][56]. In general, the power of ensemble models lies in the collective intelligence of several algorithms, which enhances accuracy, stability and resilience across various types of data and learning tasks [57][58].

Gradient boosting, especially CatBoost, which handles both categorical and numeric data efficiently, can be of particular use in ecotourism suitability assessment since it can process a wide range of environmental variables, including land cover, elevation, climatic conditions, accessibility, and biodiversity, without complex preprocessing [59][60]. Because it deals efficiently with nonlinear relationships and interactions between variables, it is well suited to complex tasks in which ecological and socio-economic factors heavily affect site suitability [61][62]. In addition, CatBoost reduces overfitting and bias through ordered boosting and automatic regularization, which makes the predictions more stable and interpretable across different landscapes [63][64]. The Model Averaged Neural Network (MA-NNET), in turn, combines several neural network models and averages their predictions, which increases certainty and decreases the variance that would come with the use of a single model [65]. Combined, these models offer a stronger, evidence-based foundation for identifying the best locations to develop sustainable ecotourism and for facilitating informed decision-making that balances conservation goals and tourism opportunities. However, to date, hardly any research has focused on these models in the context of ecotourism.

Odisha is situated along the eastern coast of India and is bordered by Chhattisgarh to the west, Jharkhand to the north, West Bengal to the north-east and Andhra Pradesh to the south, with a 480 km long coastline on the Bay of Bengal (Fig. 1). The varied scenery of the state, including plateaus, fertile hills, and beaches, renders it a lively tourist destination [66][67]. Odisha’s tribal culture, art, and festivals such as the Chaiti and Parab Tribal Festivals, particularly in Koraput, Malkangiri and Rayagada, are very popular among visitors [68]. The state is famous for its abundance of handicrafts such as Bolangir blankets, elaborate Pattachitra paintings, silver filigree jewelry, and traditional handloom weaving such as Sambalpuri and Ikat [69][70]. The major tourist attractions of Odisha are Puri, Chilika Lake, and wildlife reserves including Satkosia Gorge Sanctuary, Bhitarkanika, Nandankanan and Similipal National Parks. Nandankanan and Similipal, which were declared national parks in the 20th century, are known for their biodiversity [71]. Notably, Odisha has high potential for eco-cultural tourism, as Similipal is a world-renowned tiger reserve and a UNESCO Biosphere Reserve [72]. However, ecotourism suitability zones have not yet been delineated for Odisha. Thus, the objective of this research is to demarcate the ecotourism suitability zones of Odisha using machine learning techniques, specifically the CatBoost and MA-NNET models.

**Figure 1.** Location of the study area

Methodology

Data collection

In this phase, a tourism inventory map was prepared [73]. This process relied on extensive field observation, Google Earth imagery, and the support of the Global Positioning System (GPS) to locate 415 points. Of the 415 inventory points identified, 253 were tourism points and 162 were non-tourism points. A tourism point is any geographically distinguishable site, location, or establishment that mainly serves to attract tourists owing to its cultural, religious, historical, recreational, natural or commercial interest. Tourism points therefore include religious spots (e.g. Buddhist temples, Gurudwaras, Masjids, and Churches), hospitality and leisure venues (e.g. hotels and restaurants), commercial zones (e.g. malls and shops), sports and recreational places (e.g. sports complexes and stadiums), and natural or wildlife reserves (e.g. national parks and sanctuaries). Together, these places make up the tourism landscape, as they attract visitors seeking various experiences, which is in line with the broader conception of tourist attractions as sites that inspire travel and visitation [74][75]. Non-tourism points, on the other hand, are those that do not satisfy any of the above conditions. The tourism inventory points were geocoded in ArcGIS 10.4 and randomly divided into training and testing datasets in an 80:20 ratio.
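The random 80:20 division can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_inventory`, the seed, and the placeholder point identifiers are assumptions, and the split in the study itself was carried out in ArcGIS.

```python
import random

def split_inventory(points, train_fraction=0.8, seed=42):
    """Shuffle the inventory points and split them into training and testing lists."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder inventory: 253 tourism points (label 1) and 162 non-tourism points (label 0).
inventory = [(i, 1) for i in range(253)] + [(i, 0) for i in range(253, 415)]
train_points, test_points = split_inventory(inventory)
print(len(train_points), len(test_points))
```

A purely random split does not guarantee that the two classes keep the same proportions in both subsets; a stratified split would be one way to enforce that.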

Enumeration of data

The tourism potential of the research area was assessed according to various environmental and infrastructural criteria derived from open-source geospatial data and remote sensing products (Table 1). Among the variables derived from the Shuttle Radar Topography Mission (SRTM DEM, 90 m resolution), relief, relative relief, and hill shade have a positive relationship with tourism, since changes in elevation and landscape attractiveness usually contribute to scenic value and attract tourists. In contrast, the dissection index, terrain ruggedness index, and topographic position index exhibit a negative relationship with tourism, since extreme ruggedness and landform dissection may inhibit the creation of tourism-related facilities. The climatic and ecological variables, namely the temperature condition index (TCI) and vegetation condition index (VCI), obtained from the NOAA NESDIS STAR Repository (4-km resolution, 7-day composite data), show a systematic effect: unfavorable temperature conditions limit tourism potential, whereas healthy and dense vegetation cover has a positive effect, increasing natural attractiveness and promoting ecotourism potential. To derive the TCI, the Satellite Meteorology and Climatology Division (SMCD) of NOAA NESDIS STAR uses AVHRR radiance measurements in the 10.3–11.3 µm range, which are converted into brightness temperature (BT) after complete removal of high-frequency noise. The BT is then expressed as an anomaly relative to the 25-year climatological mean, following three biophysical principles: the law of minimum, the law of tolerance, and carrying capacity. The Vegetation Condition Index (VCI) is derived from pre- and post-launch calibrated radiances, which are converted to the noise-free Normalized Difference Vegetation Index (NDVI) using the visible (VIS) and near-infrared (NIR) band values:
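In the notation commonly used for these products, NDVI and the two condition indices can be written as below. The subscripts max and min, denoting the multi-year extremes at a pixel for a given week, and the scaling to 0–100 are assumptions made for illustration; the study does not state the exact expressions.

```latex
\begin{align}
\mathrm{NDVI} &= \frac{\mathrm{NIR} - \mathrm{VIS}}{\mathrm{NIR} + \mathrm{VIS}} \\
\mathrm{VCI} &= 100 \times \frac{\mathrm{NDVI} - \mathrm{NDVI}_{\min}}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}} \\
\mathrm{TCI} &= 100 \times \frac{\mathrm{BT}_{\max} - \mathrm{BT}}{\mathrm{BT}_{\max} - \mathrm{BT}_{\min}}
\end{align}
```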

Tourism development is strongly supported by road and rail density (infrastructure). Road and rail layers were digitized in Google Earth Pro and then imported into ArcMap 10.4. Taken together, these criteria highlight the interdependence of terrain, climate, ecology and infrastructure in determining tourism suitability, with each playing either a positive or a negative role in overall tourism dynamics.

Assessment of nature of data

First, the nature of the data was assessed using descriptive statistics and boxplots to identify potential outliers. The distribution of the data was then checked.

Transformation of data

Power transformation and robust scaling were applied to the original datasets. A power (Box–Cox) transformation was used to stabilize the variance, reduce skewness, and make the distribution of the variables more symmetric or closer to normal [76]:
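For reference, the standard Box–Cox form, which matches the special cases of λ discussed here, can be written as follows. The symbol y^{(λ)} for the transformed value is a notational assumption.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
\ln y, & \lambda = 0
\end{cases}
```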

where y > 0 is the data and λ (the power parameter) determines the strength and direction of the transformation. When λ=1, the data remain unchanged; when λ=0, the logarithmic transformation is applied; when λ=0.5, it corresponds to the square root transformation; and when λ=-1, it becomes the reciprocal transformation.

The Box–Cox transformation assumes all data are strictly positive. For data that include zeros or negative values, a modified approach known as the Yeo–Johnson transformation was used as follows [74]:
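The usual piecewise definition of the Yeo–Johnson transformation is reproduced below as a reference. The symbol ψ(y, λ) is a notational assumption, and the study may have used an equivalent formulation.

```latex
\psi(y, \lambda) =
\begin{cases}
\dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & y \ge 0,\ \lambda \neq 0 \\[4pt]
\ln(y + 1), & y \ge 0,\ \lambda = 0 \\[4pt]
-\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & y < 0,\ \lambda \neq 2 \\[4pt]
-\ln(1 - y), & y < 0,\ \lambda = 2
\end{cases}
```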

The transformations are especially useful when the raw data are skewed. For example, right-skewed data (long tail to the right) can be transformed using a logarithm (λ=0) or square root (λ=0.5), which compresses large values more than small ones. Left-skewed data, on the other hand, generally call for powers greater than one (λ>1, e.g. squaring), which stretch large values more than small ones. Selecting an appropriate power parameter (often estimated by maximum likelihood) improves normality and variance stabilization, and thus the validity of parametric statistical methods and predictive models.
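A minimal sketch of the Yeo–Johnson transformation for a fixed λ is given below. The function name `yeo_johnson` and the sample values are hypothetical. In practice λ is usually estimated by maximum likelihood rather than fixed by hand, and that step is omitted here.

```python
import math

def yeo_johnson(y, lam):
    """Apply the Yeo-Johnson transformation to a single value for a fixed lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)  # lambda = 0 branch for non-negative values
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)  # lambda = 2 branch for negative values
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)

# Hypothetical right-skewed sample that also contains a zero and a negative value.
sample = [-2.0, 0.0, 0.5, 1.0, 4.0, 25.0]
transformed = [yeo_johnson(v, 0.0) for v in sample]
print([round(t, 3) for t in transformed])
```

With λ=1 the function returns its input unchanged, which is a quick sanity check on both branches.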

Robust scaling is a data pre-processing technique that is used to normalize features and minimize the impact of outliers [75][76]. Unlike standardization (which rescales the data using the mean and the standard deviation) or min-max scaling (which rescales using the minimum and the maximum), robust scaling uses statistics that are less sensitive to extreme values: the median and the interquartile range (IQR) [77]:
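Written out with the median and IQR named in the text, the scaling takes the form below. The subscript "scaled" is a notational assumption.

```latex
x_{\text{scaled}} = \frac{x - \operatorname{Median}(X)}{\operatorname{IQR}(X)}
```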

where x is the original value, Median(X) is the median of the feature distribution, and IQR(X)=Q3-Q1 is the interquartile range, computed as the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Subtracting the median centers the data around zero, and dividing by the IQR scales them by the spread of the middle 50% of values.
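As a small illustration, robust scaling can be computed with the Python standard library as sketched below. The function name `robust_scale` and the sample values are hypothetical. Quartile conventions differ between software packages, so the "inclusive" method chosen here is an assumption.

```python
import statistics

def robust_scale(values):
    """Center values on the median and divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature cannot be robustly scaled")
    return [(v - median) / iqr for v in values]

# Hypothetical feature with one extreme value.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]
print(robust_scale(feature))
```

The extreme value stays extreme after scaling, but it does not distort the scaled positions of the other values, which is the point of using the median and IQR.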

Feature selection

Checking correlation-coefficient

In statistical learning and predictive modelling, correlation-based feature selection is often used as an initial dimensionality reduction method to improve model interpretability and efficiency [78][79]. The method quantifies the strength of association between each predictor and the outcome variable using a correlation coefficient: Pearson’s correlation for continuous outcomes, or the point-biserial correlation for binary outcomes. Features with little or no correlation with the target are usually discarded, as they are considered to have low predictive value. Furthermore, the method accounts for redundancy by considering inter-feature correlations; when two predictors show a high linear correlation (e.g. |r| > 0.8), one of them can be removed to reduce multicollinearity, which can bias model estimation and inflate variance.
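The redundancy filter described above can be sketched as follows. The helper names `pearson` and `drop_collinear` and the toy values are hypothetical; the column labels merely borrow index names from this study, and VHI is constructed here as the mean of the other two columns purely for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_collinear(features, threshold=0.8):
    """Keep features in order, skipping any whose |r| with a kept feature exceeds threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy columns, not real data.
features = {
    "VCI": [10.0, 20.0, 30.0, 40.0, 50.0],
    "TCI": [30.0, 10.0, 50.0, 20.0, 40.0],
    "RD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
features["VHI"] = [(v + t) / 2 for v, t in zip(features["VCI"], features["TCI"])]
print(drop_collinear(features))
```

Which member of a correlated pair is dropped depends on the order in which features are visited, so the ordering is itself a modelling choice.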

Mutual information-based feature selection

While correlation coefficients provide an intuitive criterion for feature selection that is computationally efficient, they only capture linear dependencies and neglect non-linear and/or interaction effects. As a result, correlation-based feature selection is usually combined with more advanced methods like mutual information (MI) or embedded model-based methods to obtain a parsimonious and robust feature set. Features selected by the MI-based method are those that exhibit the highest statistical dependency with the target variable, quantified as follows [80][81]:
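For discrete variables, the standard definition of mutual information consistent with the symbols used here is given below. For continuous variables the sums become integrals, which is an extension assumed rather than stated in the text.

```latex
I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \, \log \frac{p(x, y)}{p(x)\, p(y)}
```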

where X represents a candidate feature, Y is the target, and p(x,y), p(x) and p(y) are the joint and marginal probability distributions, respectively. This measure captures both linear and nonlinear relationships that traditional correlation measures might miss. By maximizing mutual information, the selected features provide the most predictive power while minimizing redundancy, often through criteria such as Max-Relevance Min-Redundancy (mRMR), which balances high relevance I(X_i;Y) against low redundancy, (1/|S|) Σ_{X_j∈S} I(X_i;X_j), among the already chosen features S, ensuring that each new feature contributes unique information.

This approach is especially well suited to high-dimensional datasets, where irrelevant or correlated features can negatively affect model performance and increase computational complexity [82][83]. MI-based selection improves model generalization by retaining statistically informative variables, minimizes overfitting, and maximizes interpretability by generating a small subset of informative features. It may be applied to heterogeneous data (continuous, discrete or categorical) given proper estimation of the probability distributions through kernel density methods or discretization schemes, which makes it applicable across different domains [84][85].
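A plug-in estimate of mutual information for discrete (or discretized) variables can be sketched as below. The function name `mutual_information` is hypothetical, the result is expressed in nats, and continuous predictors would first need binning or a density estimate, which is not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A feature identical to the target carries maximal information about it,
# while an independent feature carries none.
target = [0, 1, 0, 1]
print(mutual_information([0, 1, 0, 1], target))
print(mutual_information([0, 0, 1, 1], target))
```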

L1 regularization-based feature selection

In L1-regularization-based feature selection, also known as Lasso (Least Absolute Shrinkage and Selection Operator), features are selected by imposing a penalty on the absolute magnitude of the regression coefficients. The penalty shrinks the coefficients of uninformative predictors to exactly zero, which reduces overfitting by constraining model complexity and improves interpretability by leaving a smaller set of predictors in which the most significant ones are easier to identify [86][87]. This method is especially useful with high-dimensional data, where the number of features is much larger than the number of observations [88][89].
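For a squared-error loss, the Lasso estimate is commonly written as below. The symbols β (coefficients), α (penalty strength), n (observations) and p (predictors) are notational assumptions, and a classification setting would replace the squared error with a log-loss term.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2 + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Predictors whose estimated coefficient is exactly zero are the ones removed from the feature set.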

Applied models

CatBoost model

CatBoost (Categorical Boosting) is a gradient boosting library created by Yandex. It is characterized by high performance, particularly with categorical features, and offers robust and fast training. CatBoost is built on the gradient boosting framework, which constructs an ensemble of weak learners (usually decision trees) sequentially [90]. Each new tree is trained to approximate the negative gradient of the loss function with respect to the predictions of the current ensemble.

If F_m (x) is the prediction of the ensemble after m trees, the next tree h_(m+1) (x) is trained to minimize the loss function L(y,F_m (x)+h_(m+1) (x)). In gradient boosting, this is approximated by fitting h_(m+1) (x) to the negative gradient of the loss function evaluated at the current prediction:
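In the usual gradient boosting notation, the targets fitted by the new tree (the pseudo-residuals) are as follows. The symbol r_i and the index i over the n training points are notational assumptions.

```latex
r_{i}^{(m+1)} = -\left[ \frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)} \right]_{F = F_m}, \qquad i = 1, \dots, n
```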

The new ensemble prediction is then:
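Using the symbols already introduced, the additive update takes the standard form:

```latex
F_{m+1}(x) = F_m(x) + \gamma_{m+1} \, h_{m+1}(x)
```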

where γ_(m+1) is the step size or learning rate.

CatBoost addresses the prediction shift that can arise in standard gradient boosting when the gradients are estimated on the same data points that the tree is then trained on. Its ordered boosting scheme trains each tree on a subset of the training data different from the one used to compute the gradients, which helps to avoid overfitting and improves generalization. CatBoost also offers mechanisms for handling categorical features without one-hot encoding them, the most important of which is ordered target statistics [59,91]. The values of a categorical variable are replaced with a numerical statistic based on the target variable. Given a categorical feature value, c, and a target variable, y:
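One commonly quoted form of this statistic, following the description in the CatBoost documentation, is shown below. The symbol TS(c) and the indicator notation are assumptions, and the exact smoothing used in the study is not stated.

```latex
\mathrm{TS}(c) = \frac{\sum_{j} \mathbb{1}\left[c_j = c\right] \, y_j + \mathrm{prior}}{\mathrm{count} + 1}
```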

where the sum is over the observed data points, prior is a user-defined prior, and count is the number of times the category c appeared in the historical data. The “ordered” aspect means that this calculation is based on a random permutation of the data, using only the data points that appear before the current data point in the permutation. CatBoost can also automatically combine categorical features to create new, more informative features. This is done greedily during the construction of the trees.
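The ordered calculation can be illustrated with the short sketch below. It is a simplified stand-in rather than CatBoost's internal implementation; the function name, the (sum + prior) / (count + 1) smoothing and the example categories are assumptions.

```python
import random

def ordered_target_statistic(categories, targets, prior=0.5, seed=0, permutation=None):
    """Encode each category using only the targets of rows earlier in a permutation."""
    n = len(categories)
    if permutation is None:
        permutation = list(range(n))
        random.Random(seed).shuffle(permutation)
    sums, counts = {}, {}
    encoded = [0.0] * n
    for idx in permutation:
        c = categories[idx]
        # Only rows already visited contribute, so a row never sees its own target.
        encoded[idx] = (sums.get(c, 0.0) + prior) / (counts.get(c, 0) + 1)
        sums[c] = sums.get(c, 0.0) + targets[idx]
        counts[c] = counts.get(c, 0) + 1
    return encoded

# Hypothetical categorical feature (e.g. a land-cover class) and a binary target.
land_cover = ["forest", "urban", "forest", "forest"]
is_tourism = [1, 0, 0, 1]
print(ordered_target_statistic(land_cover, is_tourism, permutation=[0, 1, 2, 3]))
```

Because each row is encoded before its own target is added to the running totals, the encoding does not leak the label of the row being encoded.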

CatBoost relies on oblivious decision trees, in which the same split is used for all nodes at a given tree level. This speeds up prediction and reduces the probability of overfitting. The prediction of a single tree is a piecewise constant function:
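With J denoting the number of leaf regions (a notational assumption), the piecewise constant form is:

```latex
h(x) = \sum_{j=1}^{J} c_j \, \mathbb{I}\left(x \in R_j\right)
```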

where R_j are the disjoint regions formed by the tree’s splits, c_j are the constant predictions in each region, and I is the indicator function.

CatBoost supports various loss functions depending on the task (regression, classification, ranking, etc.). For binary classification, the cross-entropy loss is commonly used:
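For a single observation, the binary cross-entropy (log loss) in its standard form is:

```latex
L(y, p) = -\left[ y \log p + (1 - y) \log (1 - p) \right]
```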

where y is the true label (0 or 1) and p is the predicted probability.

The CatBoost model was tuned using Randomized Search with Cross-Validation, which is a common and efficient technique [92]. First, a dictionary ‘catboost_param_grid’ was created to specify a range of values for key CatBoost hyperparameters such as ‘iterations’, ‘learning_rate’, ‘depth’, ‘l2_leaf_reg’, and ‘border_count’. This defines the search space for the tuning process. Next, a base CatBoostClassifier model was instantiated with a fixed random_state for reproducibility and verbose=0 to suppress training output during the search. The randomized search was then configured with the estimator, the CatBoost parameter grid, K-fold cross-validation, the scoring metric, and the random state. The number of parallel jobs was set to -1, which uses all available processor cores and speeds up the search.
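The tuning steps described above could look roughly like the sketch below, assuming scikit-learn's `RandomizedSearchCV` was the search implementation. The value ranges, the number of sampled configurations, the number of folds, the scoring metric and the function name `build_catboost_search` are all illustrative assumptions, since the study does not report them here.

```python
# Illustrative search space; the ranges are assumptions, not the values used in the study.
catboost_param_grid = {
    "iterations": [200, 500, 1000],
    "learning_rate": [0.01, 0.03, 0.1],
    "depth": [4, 6, 8],
    "l2_leaf_reg": [1, 3, 5, 7],
    "border_count": [32, 64, 128],
}

def build_catboost_search(n_iter=20, n_splits=5, scoring="accuracy", random_state=42):
    """Return an unfitted RandomizedSearchCV wrapped around a CatBoostClassifier."""
    # Imported lazily so the search space can be inspected without the libraries installed.
    from catboost import CatBoostClassifier
    from sklearn.model_selection import KFold, RandomizedSearchCV

    base_model = CatBoostClassifier(random_state=random_state, verbose=0)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return RandomizedSearchCV(
        estimator=base_model,
        param_distributions=catboost_param_grid,
        n_iter=n_iter,
        cv=cv,
        scoring=scoring,
        random_state=random_state,
        n_jobs=-1,  # use all available processor cores
    )

# Usage, with X_train and y_train standing for the prepared training data:
# search = build_catboost_search()
# search.fit(X_train, y_train)
# print(search.best_params_)
```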

Model Averaged Neural Network (MA-NNET)

The Model Averaged Neural Network (MA-NNET) is an ensemble method that combines the predictions of several individual neural networks to enhance overall performance and generalization and to minimize the chance of overfitting [93]. In contrast to other ensemble techniques such as bagging or boosting, which normally train models on different subsets of the data or with different sample weightings, model averaging typically trains two or more models with independent initializations, architectures, or hyperparameters and combines their results to achieve better generalization and robustness. The working principle of MA-NNET is as follows:

Train N individual neural networks, f₁(x),f₂(x),…,f_N(x), independently on the same training data (X_train,Y_train). Each network f_i(x) is a function that takes input features x and produces a prediction. The independence in training can come from random initialization, network architecture, or hyperparameter settings such as the optimizer, regularization and learning rate. Let the prediction of the ith network for a given input x be ŷ_i = f_i(x).

The final prediction ŷ_avg is the arithmetic mean of the individual model predictions:
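Using the networks f_i(x) and the count N defined above, the arithmetic mean can be written as:

```latex
\hat{y}_{\text{avg}} = \frac{1}{N} \sum_{i=1}^{N} f_i(x)
```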

While the specific combination method varies, the general idea is to combine the outputs of the individual models f_i(x) using a function C, so that ŷ = C(f₁(x), f₂(x), …, f_N(x)).

For this research, C is the mean function. Instead of a formal hyperparameter grid for automated search, a ‘nn_param_space’ was defined containing a list of distinct Neural Network configurations to evaluate. Each configuration specifies the number of layers, units per layer, activation function, optimizer, learning rate, epochs, and batch size. The code loops through each predefined configuration in nn_param_space[‘configs’]. For each configuration, Stratified K-Fold cross-validation (with 5 splits) was performed on the selected training data. Finally, predictions were made on the test data using each of the 5 trained models. The predicted probabilities from these models were then averaged to produce the final ‘averaged_tuned_predictions’ for the Model Averaged Neural Network [94].
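The final averaging step can be illustrated as below. The probabilities are made-up numbers standing in for the outputs of the five fold models, the helper name `average_predictions` is hypothetical, and the 0.5 decision threshold is an assumption, as the study does not state how averaged probabilities were converted to classes.

```python
def average_predictions(per_model_probabilities):
    """Average predicted probabilities across models, sample by sample."""
    n_models = len(per_model_probabilities)
    n_samples = len(per_model_probabilities[0])
    if any(len(p) != n_samples for p in per_model_probabilities):
        raise ValueError("every model must predict the same number of samples")
    return [
        sum(model[i] for model in per_model_probabilities) / n_models
        for i in range(n_samples)
    ]

# Made-up positive-class probabilities from 5 fold models on 3 test points.
fold_probabilities = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.5],
    [0.7, 0.3, 0.7],
    [0.9, 0.2, 0.6],
    [0.7, 0.2, 0.6],
]
averaged_tuned_predictions = average_predictions(fold_probabilities)
predicted_labels = [int(p >= 0.5) for p in averaged_tuned_predictions]
print(averaged_tuned_predictions, predicted_labels)
```

Averaging probabilities rather than hard labels keeps each model's confidence in play, so one uncertain network cannot flip the result on its own.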

Assessment of accuracy

In ecotourism suitability zone (ESZ) modeling, a high degree of predictive accuracy gives confidence that the delineated areas fairly represent tourism potential without exaggerating or misclassifying it. This study employed accuracy, precision, recall, F1-score, and AUC-ROC to evaluate the performance of the data-driven models [95][96]:
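The standard definitions of these measures, using the confusion-matrix counts and N = TP + TN + FP + FN as the number of observations, are reproduced below. The study's own typesetting of these formulas may differ.

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{N} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
F_1 &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{AUC} &= \int_{0}^{1} \mathrm{TPR}(\mathrm{FPR}) \, d(\mathrm{FPR})
\end{align}
```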

where TPR(FPR) is the ROC curve function, i.e. the true positive rate plotted against the false positive rate. TN represents True Negative, TP represents True Positive, FP represents False Positive, FN represents False Negative, TPR is the true positive rate, FPR is the false positive rate, AUC represents the Area Under Curve and N is the number of observations. The full methodological framework of this research is summarized in Fig. 2.
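These measures can be computed for a binary problem as sketched below. The function names and the example labels and scores are hypothetical. The AUC is obtained from pairwise comparisons of positive and negative scores, which is equivalent to the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc_roc(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions and predicted probabilities for six test points.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
print(classification_metrics(y_true, y_pred), auc_roc(y_true, y_score))
```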

Results

Assessment of data

Significant positive relationships were observed between some of the variables, especially among VCI, TCI, and VHI (Fig. 3 A). This implies that these variables are interdependent and carry overlapping information. Moderate correlations were also observed between TRI and RL (0.48), TRI and RLL (0.59), and RL and RLL (0.54). The remaining relationships were weak or near zero, indicating little linear association. Overall, the dataset showed a considerable level of multicollinearity between certain variables prior to transformation, which may affect the consistency of machine learning models. Additionally, the statistical distributions of the eleven terrain and vegetation parameters indicate environmental heterogeneity in the study area. VCI, TCI and VHI all showed strongly left-skewed distributions, whereas the Terrain Ruggedness Index (TRI), Relief (RL), Road density (RD), Dissection Index (DI), and Relative Relief (RLL) had strongly right-skewed distributions. By contrast, the Topographic Position Index (TPI) showed a sharp, nearly symmetric peak around zero, which means that most of the terrain lies at mid-slope or neutral elevations, while the Aspect (AS) distribution was almost uniform, which means that the area has no predominant slope orientation. The Hill shade (HS) values were concentrated around the middle of the range, which implies a rather uniform illumination environment under the assumed light-source position (Fig. 4). The VHI index was omitted due to its high correlation with other indices.

**Figure 3.** Correlation matrix between indices before (A) and after transformation (B). Indices followed by “y” or “r” in the transformed matrix refer to them being transformed using Yeo-Johnson or robust scaling, respectively

Yeo-Johnson transformation and robust scaling were applied to the data. After the transformation, the overall correlation pattern became weaker and more balanced. The high correlation coefficient between VCI and TCI declined significantly (Fig. 3 B), suggesting that the transformations were effective in mitigating multicollinearity. Although a few moderate correlations remained, such as TRI-RLL (0.63) and TRI-RL (0.61), the relationships were generally less extreme and more symmetrically distributed. This indicates that the dataset became more suitable for the subsequent machine learning models, because the applied transformations (Yeo-Johnson and robust scaling) have probably stabilized the variance and brought the variables closer to normality.
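For readers unfamiliar with the Yeo-Johnson transformation, a minimal sketch of its piecewise definition follows. In practice the parameter λ is usually estimated separately for each variable (commonly by maximum likelihood); the fixed value and the sample data used here are assumptions for illustration only.

```python
from math import log


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of a single value for a given lambda."""
    if y >= 0:
        if lam == 0:
            return log(y + 1)
        return ((y + 1) ** lam - 1) / lam
    if lam == 2:
        return -log(-y + 1)
    return -(((-y + 1) ** (2 - lam)) - 1) / (2 - lam)


LAMBDA = 0.0  # hypothetical value; normally estimated from the data

# Hypothetical right-skewed sample (illustration only).
right_skewed = [0.0, 1.0, 3.0, 7.0, 63.0]
transformed = [yeo_johnson(v, LAMBDA) for v in right_skewed]
```

With λ = 0 the transform reduces to log(y + 1) for non-negative values, which compresses the long right tail; with λ = 1 it leaves the data unchanged.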

All indices were normalized with the Yeo-Johnson method except for AS and HS, for which robust scaling was used (Fig. 4). The transformation and scaling methods were applied to improve the normality and comparability of the data distributions. Most of the variables exhibit approximately symmetric, near-normal distributions after transformation, with a noticeable improvement in variance homogeneity. Overall, the figure illustrates the effectiveness of the pre-processing techniques (Yeo–Johnson and robust scaling) in normalizing diverse terrain-related metrics prior to machine learning analyses. Further, the extremity of each selected variable was checked using boxplots (Fig. 5). In all boxplots, the distributions are generally centered on zero, with the interquartile range lying approximately between -1 and +1, indicating that the transformations have standardized the data. Slight outliers are present in some variables, such as TPI_yeojohnson (Fig. 5 D) and HS_robustscaled (Fig. 5 G), yet in general the spread is balanced, resulting in lower skewness. Others, such as TRI_yeojohnson (Fig. 5 C) and RL_yeojohnson (Fig. 5 F), are symmetrically distributed and contain few extreme values.
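Robust scaling can be sketched as centering on the median and dividing by the interquartile range, which keeps outliers from dominating the scale. The sample values below are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption; other software may interpolate quartiles slightly differently.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    iqr = q3 - q1
    return [(v - center) / iqr for v in values]


# Hypothetical sample with one extreme value (illustration only).
hs_sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
scaled = robust_scale(hs_sample)
```

The median maps to zero and the extreme value remains visible as an outlier, without inflating the scale of the remaining observations.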

**Figure 4.** Distribution of criteria after transformation: VCI_yeojohnson (A), TCI_yeojohnson (B), TRI_yeojohnson (C), TPI_yeojohnson (D), AS_yeojohnson (E), RL_yeojohnson (F), HS_robustscaled (G), RD_yeojohnson (H), DI_yeojohnson (I), RLL_yeojohnson (J)

**Figure 5.** Boxplots after transformation: VCI_yeojohnson (A), TCI_yeojohnson (B), TRI_yeojohnson (C), TPI_yeojohnson (D), AS_yeojohnson (E), RL_yeojohnson (F), HS_robustscaled (G), RD_yeojohnson (H), DI_yeojohnson (I), RLL_yeojohnson (J)

Analysis of criteria

The relief map (Fig. 6 A) shows that the plains bordering the Bay of Bengal form the low-lying part of the state (altitudes of 0–150 m), while the central and western regions of Odisha (altitudes of 150–700 m) consist of undulating uplands and plateaus. The southwestern region (bordering Andhra Pradesh and Chhattisgarh) contains the highest elevations (700–1,664 m), corresponding to the Eastern Ghats hill ranges, particularly in the Koraput and Rayagada districts. The Relative Relief map (Fig. 6 B) displays local differences in elevation, which reflect the ruggedness of the terrain. The coastal and river valley areas exhibit low relief (0.51–150 m), characterized by smooth and flat topography, whereas a large portion of central Odisha displays moderate relief (150.01–300 m), manifesting as gently rolling terrain. The south-western and north-western regions are marked by a rugged and mountainous landscape related to the Eastern Ghats and plateau margins (300.01–1151.14 m). These differences show the physiographic diversity of Odisha, with a clear east-to-west gradient from flat coastal plains to rugged highlands. High absolute elevation and high local relief coincide in the southwestern region, underlining the structural and erosional dominance of the Eastern Ghats system.

Aspect (Fig. 6 C) indicates slope orientation, categorized into four directional classes: −1° to 90° (east-facing), 90.01° to 180° (south-facing), 180.01° to 270° (west-facing), and 270.01° to 359.97° (north-facing). Variation in slope orientation is most evident in the plateau and hilly areas, especially in western and southern Odisha. Hill Shade (Fig. 6 D) indicates terrain illumination based on slope and aspect, with values ranging from 0 to 254. Lighter shades (yellow) represent more illuminated, steeper areas, while darker orange tones indicate shaded or less illuminated regions. The hill shade map shows that the western and southwestern parts of Odisha, bordering Chhattisgarh and Andhra Pradesh, are more rugged and elevated, while the coastal plains to the east are relatively flat and less shaded.

The Terrain Ruggedness Index (TRI) measures the amount of variation in elevation. Regions with higher TRI values (orange to red) have more rugged terrain, and regions with lower TRI values (yellow) are flat, mostly along the eastern coast of Odisha (Fig. 6 E). The Dissection Index (DI) describes the extent of landscape dissection or topographic irregularity resulting from erosion and drainage patterns. Higher DI values (dark brown) are found in the northern and north-eastern highlands, indicating deeply cut valleys and more dissected terrain, and lower DI values (light beige) are found on the coastal and central plains, representing smoother, less dissected landscapes (Fig. 6 F).

**Figure 6.** Spatial distribution of the included indices: Relief (A), Relative Relief (B), Aspect (C), Hill Shade (D), TRI (E), DI (F), TCI (G), VCI (H), TPI (I), and RD (J)

The Temperature Condition Index (TCI) of Odisha (Fig. 6 G) shows a high level of temperature stress, with large areas in red and orange, particularly in the southern and coastal belts, indicating poor thermal conditions. However, the Vegetation Condition Index (VCI) map (Fig. 6 H) depicts comparatively healthier vegetation cover, especially in the northern and central portions, where most of the area exhibits favorable vegetation conditions. This suggests that the vegetation is partially resilient to thermal stress.

The Topographic Position Index (TPI) (Fig. 6 I) ranges from −750.13 to 706.38. Most of Odisha falls within the moderate TPI range (−89.99 to 120.00), indicating relatively uniform topography with some hilly regions in the western part. Road and rail density (RD) (Fig. 6 J) ranges from low (0–30) to very high (200.01–366.24), with the highest densities found in the coastal and southern regions of Odisha, especially around the large cities and industrial regions. The state’s railway and road networks, along with broader transportation infrastructure, are predominantly concentrated in low-lying and accessible areas rather than in higher-altitude and rugged regions.

Ecotourism suitability zones

Ecotourism suitability zones were prepared using the Tuned Model Averaged Neural Network and CatBoost models. CatBoost tuning was specified through the number of boosting iterations, the learning rate (step-size shrinkage), the tree depth, the L2 leaf regularization coefficient, the border count, and the verbose setting. During tuning, 100, 250, and 500 iterations were tested, with learning rates of 0.01, 0.05, and 0.1 and decision-tree depths of 3, 5, and 7. The L2 regularization coefficient was applied to the leaves of the trees to prevent overfitting by penalizing large leaf weights; values of 1, 3, and 5 were used to represent different regularization strengths. Border counts of 32 and 64 were used for fine-grained splits. Verbose was set to zero to obtain clean output. Additionally, each neural network was compiled with the adam optimizer, the relu activation function, a learning rate of 0.001, 50 epochs, a batch_size of 32, and the binary_crossentropy loss function. In this research, the shape of the averaged tuned predictions was 83.
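The search space described above can be written down as a parameter grid. The sketch below only enumerates the candidate configurations using the CatBoost parameter names `iterations`, `learning_rate`, `depth`, `l2_leaf_reg`, and `border_count`; it does not fit any model, and the exhaustive grid search it implies is an assumption, since the text does not state the search strategy.

```python
from itertools import product

# Candidate values taken from the tuning description in the text.
param_grid = {
    "iterations": [100, 250, 500],
    "learning_rate": [0.01, 0.05, 0.1],
    "depth": [3, 5, 7],
    "l2_leaf_reg": [1, 3, 5],
    "border_count": [32, 64],
}

# Every combination of the candidate values, with verbose fixed at zero.
keys = list(param_grid)
configurations = [
    {**dict(zip(keys, values)), "verbose": 0}
    for values in product(*(param_grid[k] for k in keys))
]

n_configurations = len(configurations)  # 3 * 3 * 3 * 3 * 2 = 162
```

Each dictionary in `configurations` could be passed to a classifier constructor and scored with cross-validation; an exhaustive search over this grid would evaluate 162 candidate models per fold.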

The suitability predicted by the Tuned Model Averaged Neural Network was divided into four classes: Very High, High, Moderate, and Low, represented by dark blue, light blue, yellow, and red colors, respectively. Approximately 28.83% of the area falls under Very High suitability, 27.97% under High suitability, 28.49% under Moderate suitability, and 14.62% under Low suitability (Fig. 7 A). The coastal strip along the Bay of Bengal, particularly in eastern and southeastern Odisha, was found to be highly suitable, likely due to the presence of well-developed tourism infrastructure, extensive forest cover, and good accessibility. The western and southwestern areas, which are less developed, forested, or hilly, were predominantly classified as moderately to poorly suitable. The high-suitability regions are home to major tourist attractions including temples, beaches, and hotels, which indicates the capability of the neural network to capture non-linear relationships between tourism determinants and spatial features.
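Class shares such as those reported above can be derived by counting classified cells. The sketch below assumes equal-area raster cells and uses a tiny hypothetical list of cell labels; it is not the study's raster and does not reproduce the reported percentages.

```python
from collections import Counter

CLASSES = ["Very High", "High", "Moderate", "Low"]


def class_shares(cells):
    """Percentage of cells per suitability class, assuming equal-area cells."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {name: 100.0 * counts[name] / total for name in CLASSES}


# Hypothetical classified cells (illustration only).
cells = ["Very High"] * 3 + ["High"] * 3 + ["Moderate"] * 3 + ["Low"] * 1
shares = class_shares(cells)
```

With cells of unequal area (for example in a geographic coordinate system), each cell would need to be weighted by its area rather than counted once.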

The final tuned CatBoost model was developed using 50 iterations, 5-fold cross-validation, and a random state of 42. The ecotourism suitability zones of the Tuned CatBoost model have a classification pattern similar to that of the neural network, with some distinct spatial differences. Here, approximately 26.82% of the area is of Very High suitability, 28.43% of High, 31.44% of Moderate, and the remaining 13.31% of Low suitability (Fig. 7 B). This means that CatBoost predicts a slightly larger share of moderately suitable area than the neural network. As with the neural network, the eastern coastline was the most suitable for tourism; however, the CatBoost model tends to generalize suitability into a much smoother pattern across central Odisha, with less fragmented areas. Large low-suitability zones remain in the western and southern districts, similar to the neural network output. The presence of major tourist attractions, such as temples, water bodies, and accommodation facilities, is also in line with the predictions of higher suitability, which indicates that CatBoost can incorporate several explanatory factors while producing spatially consistent predictions. For both the CatBoost and MA-NNET models, Balasore, Jajpur, Kendrapara, Jagatsinghpur, Puri, Khordha, Cuttack, and parts of Bargarh, Subarnapur, Jharsuguda, Deogarh, and Bolangir are very highly suitable for ecotourism. Bolangir, Nuapada, Bargarh, Sambalpur, Deogarh, Sundargarh, Angul, Keonjhar, Mayurbhanj, Kalahandi, Malkangiri, Ganjam, Nayagarh, and some portions of Bhadrak are demarcated as highly suitable. On the other hand, moderate to low ecotourism suitability has been identified in Koraput, Nabarangpur, Rayagada, Gajapati, Kandhamal, Mayurbhanj, Keonjhar, and the eastern portions of Sundargarh.

**Figure 7.** Ecotourism suitability zone of the study area using Tuned Model Averaged Neural Network (A) and tuned CatBoost model (B)

Accuracy assessments

The MA-NNET model achieved a training accuracy of 0.913, with precision of 0.910, recall of 0.943, F1-score of 0.926, and an AUC-ROC of 0.984. On the test set, the model’s accuracy decreased slightly to 0.867, with precision of 0.960, recall of 0.840, and an F1-score of 0.901. Although these metrics show a slight drop in performance relative to the training data, the model nevertheless generalizes well, as indicated by a test AUC-ROC of 0.964 (Fig. 8 A).

**Figure 8.** AUC-ROC for Tuned Model Averaged Neural Network (A) and Tuned CatBoost model (B)

The CatBoost model exhibits superior performance during training, achieving an accuracy of 0.958, precision of 0.959, recall of 0.969, F1-score of 0.964, and an AUC-ROC of 0.997, which implies minimal training error. The CatBoost model attains the same test accuracy as MA-NNET (0.867), but with marginally improved precision (0.962) and recall (0.850), yielding an F1-score of 0.903 and an AUC-ROC of 0.962 (Fig. 8 B). This implies that both models perform comparably on unseen data; however, CatBoost shows marginally better consistency between training and testing performance, which suggests slightly stronger generalization and slightly less overfitting than the neural network.
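The F1-score is the harmonic mean of precision and recall, so it can be recomputed from those two values. The short sketch below does this for a few of the precision and recall pairs reported above; the function and variable names are introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the text.
mannet_train_f1 = round(f1_score(0.910, 0.943), 3)    # 0.926
catboost_train_f1 = round(f1_score(0.959, 0.969), 3)  # 0.964
catboost_test_f1 = round(f1_score(0.962, 0.850), 3)   # 0.903
```

Because the harmonic mean is pulled towards the smaller of the two inputs, a high precision cannot compensate for a noticeably lower recall.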

Discussion

The findings of this research have significant implications for the ecotourism industry and real-world land-use planning. First, the study pushes the methodological boundary in ecotourism suitability modelling by integrating two state-of-the-art machine learning techniques, a tuned and averaged neural network (MA-NNET) and a tuned CatBoost model, to produce spatial suitability maps. Optimizing each model through hyperparameter tuning (iterations, learning rate, tree depth, L2 leaf regularization) and averaging predictions over multiple instantiations of the neural network reduces model variance and overfitting.
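The averaging step can be sketched as taking the mean of the class probabilities produced by several independently initialized networks and then choosing the most probable class. The probability vectors below are hypothetical; the number of networks and the class ordering are assumptions made for the example.

```python
CLASSES = ["Low", "Moderate", "High", "Very High"]

# Hypothetical class probabilities for one location from three networks.
member_predictions = [
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.50, 0.40, 0.05],
    [0.10, 0.30, 0.50, 0.10],
]


def average_probabilities(predictions):
    """Element-wise mean of the members' class-probability vectors."""
    n_members = len(predictions)
    return [sum(column) / n_members for column in zip(*predictions)]


averaged = average_probabilities(member_predictions)
predicted_class = CLASSES[averaged.index(max(averaged))]
```

In this toy case the second network on its own would have chosen "Moderate", but the averaged prediction is "High", which illustrates how averaging dampens the variability of individual members.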

The resultant spatial categories of Very High, High, Moderate, and Low suitability offer a detailed picture of ecotourism potential, showing both the hotspots (eastern coastal Odisha) and the gradient of decreasing suitability towards the interior. These spatially explicit products are highly useful to policy makers, planners, and conservationists, as they can inform decisions on where to permit or encourage ecotourism development, where to limit access for conservation, and where to invest in infrastructure upgrades to increase suitability. On a broader level, the study shows that nonlinear black-box models can be used with a variety of spatial predictors (accessibility, forest cover, infrastructure, tourism facilities, etc.) and can identify complex interactions, which classical weighted overlay or linear models are frequently unable to do. In ecotourism suitability mapping, the MA-NNET and CatBoost models enhance predictive accuracy across spatial scales, the predictability of the models, and the reliability of the delineated ecotourism suitability zones by balancing model predictions.

Previous research used various machine learning models (including ensemble models) to predict ecotourism suitability in the Zhangjiajie region of China [97]; the ensemble model in that study classified approximately 19.34% and 28.78% of the area as highly and moderately suitable, respectively. In the present study, CatBoost predicts a somewhat smoother and more generalized spatial pattern (less fragmentation in central Odisha) and tends to assign a higher share to “Moderate” suitability (31.44% compared to 28.49% in MA-NNET) and a slightly lower share to “Very High”.

The CatBoost model achieves higher accuracy on the training set and matches MA-NNET on the test set, with marginally higher test precision and recall. The results of this research are consistent with the notion that gradient-boosted tree ensembles often enforce more regularization and smoother decision boundaries than neural networks, especially when combined with leaf regularization and shallow depth settings. The differences between the two model outputs (though broadly concordant) highlight that model choice and tuning influence the spatial detail and the “diffuseness” of classification boundaries.

SHAP (SHapley Additive exPlanations) is a game-theoretic approach used to assign each feature an importance value for a particular prediction, offering a unified and theoretically grounded framework for interpreting machine learning model outputs. By computing Shapley values, SHAP can quantify the marginal contribution of each feature across all possible feature combinations. This ensures consistency, local accuracy, and fairness in attribution, which makes SHAP particularly valuable for understanding complex, nonlinear models. Moreover, it provides both global feature importance rankings and local explanations that reveal how individual features influence predictions for specific instances. In the current research, SHAP was applied to the developed ecotourism suitability models, enabling a transparent comparison of how different algorithms leverage topographic and environmental variables to inform their decisions. SHAP feature importance plots (Fig. 9) reveal that, for both tuned models (MA-NNET and CatBoost), the features RL_yeojohnson and RLL_yeojohnson exhibit the highest mean absolute SHAP values and therefore exert the greatest influence on model predictions. The dominance of these two features indicates that terrain elevation and the extent of local elevation variation are critical in determining ecotourism suitability. These topographical features exert a substantial influence on scenic value, accessibility, landscape diversity, and overall visitor experience, making them central to identifying areas with high ecotourism potential. Similar findings have been reported in the context of ecotourism suitability mapping [98].
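The idea of averaging marginal contributions over all feature combinations can be made concrete with an exact Shapley computation on a toy model. The model, the feature ordering (rl, rll, rd), and the baseline below are hypothetical; SHAP implementations approximate this computation for real models and background datasets, so this is a sketch of the principle rather than of the procedure used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; absent features are set to their baseline value."""
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical score with an interaction between features 0 and 2."""
    rl, rll, rd = x
    return 3 * rl + 2 * rll + rl * rd


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The three values sum to the difference between the prediction for the instance and the prediction for the baseline (local accuracy), and the interaction term is shared equally between the two features involved.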

**Figure 9.** SHAP feature importance plot for Tuned Model Averaged Neural Network (A) and Tuned CatBoost model (B)

The policy implications of the ecotourism suitability analysis based on the CatBoost and MA-NNET models relate directly to Sustainable Development Goals (SDG) 8 (Decent Work and Economic Growth) and 11 (Sustainable Cities and Communities). Using machine learning ensembles such as CatBoost and MA-NNET, policymakers can identify and prioritize ecotourism zones in Odisha that offer the greatest ecological, cultural, and socio-economic value with reduced environmental damage. This analytical approach enables strategic allocation of investments in rural and peri-urban regions, promoting inclusive economic development, employment opportunities, and sustainable tourism, in line with SDG 8 principles. At the same time, the spatial accuracy of these models helps urban and regional planners to reinforce sustainable land-use patterns, support cultural heritage, and enhance infrastructural resilience, which are important themes of SDG 11. The model results allow evidence-based zoning to drive regulations that balance conservation and development, making tourism more sustainable in the long term. In addition, governments can empower local populations and strengthen the role of the tourism sector by encouraging fair benefit-sharing mechanisms and community involvement in tourism governance. Finally, introducing AI-based suitability analyses into policy frameworks can make ecotourism planning a driver of sustainable livelihoods, sustainable ecosystems, and environmentally conscious urbanization, in line with the overall 2030 Agenda for Sustainable Development.

Conclusion

Odisha is one of the most prominent and popular tourist states in India with an exceptionally rich heritage of culture, natural beauty, and landscapes. Using MA-NNET and CatBoost models, this study carefully examined ecotourism suitability areas of Odisha.

According to the developed models, some coastal, central and interior forested regions of Odisha have been identified as high potential regions, which have a favorable combination of both environmental and accessibility conditions. On the other hand, the less accessible or environmentally limited areas were assigned a lower tourism potential classification.

Nevertheless, the forecast accuracy of the tourism potential map can be improved further by incorporating a broader set of variables in future research, including socio-economic factors and the detailed preferences of tourists. In addition, any change in the natural environment caused by anthropogenic factors, or any change in relief, climatic patterns, or road networks, can substantially alter the current tourism prospects of an area. Hence, there is a strong need to update the tourism potential zone map frequently by accounting for all such dynamics. The regularly updated ESZ map can then serve as baseline information to guide informed planning and sustainable development across the state.

As far as we know, this study is the first attempt to assess the ecotourism suitability of Odisha by applying machine learning models in a spatial environment. This approach highlights the originality and unique contribution of the study to the domain. The research has great potential for improving sustainable tourism development by using machine learning algorithms to detect and focus on areas that have high tourism potential. Besides supporting policymakers and tourism stakeholders in making informed decisions on infrastructure development and marketing strategies, the findings add value to regional tourism development, heritage preservation, and a better distribution of tourism throughout the state, which is in line with the overall objectives of sustainable development (specifically SDG 8 and SDG 11) and digital governance. Increasing the spatial and thematic resolution of the data can further enhance the accuracy and scalability of the model. As tourism trends change, incorporating real-time tourist numbers, social media sentiment, environmental indicators, and updated transport network data can help reflect these changes. Participatory GIS methods and crowd-sourced local knowledge would further increase the accuracy of the models and their relevance to stakeholders. The interpretability and explainability of machine learning outputs should also be considered in future studies to make these outputs easier for policymakers and tourism planners to implement. Moreover, incorporating detailed sustainability indicators covering ecological, cultural, and socioeconomic aspects into the ecotourism system can help ensure that development plans are aligned with long-term conservation objectives.
Lastly, it might be helpful to introduce strong temporal elements into the models to study seasonal changes and the long-term trends to enhance the effectiveness of planning and the flexibility of tourism planning in Odisha.

Conflict of interest statement
The authors declared no conflict of interest.
Funding statement
The authors declared that no funding was received in relation to this manuscript.
Data availability statement
The authors declared that all used data sources are mentioned in the manuscript. Additionally, all used datasets and related codes/models will be available upon reasonable request from the corresponding author.

References

Jewpanya P, Nuangpirom P, Pitjamit S, Nakkiew W. Optimized Travel Itineraries: Combining Mandatory Visits and Personalized Activities. Algorithms. 2025;18(2):110. DOI
Kinczel A, Müller A. Study on Travel Habits and Leisure Activities in The Light of Covid-19 Triggered Changes in Romania and Hungary. Geo J. Tourism Geosites. 2022;41(2):440-7. DOI
da Silva AE, Maracajá KFB, Batalhão ACS, Silva VF, Borges IMS. Ecotourism and Co-Management: Strengthening Socio-Ecological Resilience in Local Food Systems. Sustainability. 2025;17(6):2443. DOI
Huang C, Li S, Chan Y, Hsieh M, Lai JM. Empirical Research on the Sustainable Development of Ecotourism with Environmental Education Concepts. Sustainability. 2023;15(13):10307. DOI
Satrya IDG, Kaihatu TS, Budidharmanto LP, Karya DF, Rusadi NWP. The role of ecotourism in preserving environmental awareness, cultural and natural attractiveness for promoting local communities in Bali, Indonesia. J. East. Eur. Cent. Asian Res. 2023;10(7):1063-75. DOI
Kelly AS, Lu X. Chinese Mass Nature Tourism and Ecotourism. In: Critical Landscape Planning during the Belt and Road Initiative. Springer Nature Singapore. 2021. DOI
Nguyen TD, Hoang HD, Nguyen TQ, Fumikazu U, Vo TPT, Nguyen CV. A multicriteria approach to assessing the sustainability of community-based ecotourism in Central Vietnam. APN Sci. Bull. 2022;12(1):123-40. DOI
Machnik A. Ecotourism as a Core of Sustainability in Tourism. In: World Sustainability Series. Springer International Publishing. 2021:223-40. DOI
Mbaiwa JE. The realities of ecotourism development in Botswana. In: Responsible tourism. Routledge. 2012:205-23.
Baloch QB, Shah SN, Iqbal N, Sheeraz M, Asadullah M, Mahar S, Khan AU. Impact of tourism development upon environmental sustainability: a suggested framework for sustainable ecotourism. Environ. Sci. Pollut. Res. 2022;30(3):5917-30. DOI
Gajić T, Vukolić D, Spasojević A, Blešić I, Petrović MD, Bugarčić J, Bugarčić M, Drašković BD, Milivojević M. Exploring Attitudes on the Sustainable Balance Between Nature Conservation and Economic Development Through Ecotourism—Lessons from EU and Non-EU Countries. Land. 2025;14(2):395. DOI
Withanage NC, Chanuwan Wijesinghe D, Mishra PK, Abdelrahman K, Mishra V, Fnais MS. An ecotourism suitability index for a world heritage city using GIS-multi criteria decision analysis techniques. Heliyon. 2024;10(11):e31585. DOI
Karataş E, Özköse A, Heyik MA. Sustainable Heritage Planning for Urban Mass Tourism and Rural Abandonment: An Integrated Approach to the Safranbolu–Amasra Eco-Cultural Route. Sustainability. 2025;17(7):3157. DOI
Gupta A, Arora N, Sharma R, Mishra A. Determinants of Tourists’ Site-Specific Environmentally Responsible Behavior: An Eco-Sensitive Zone Perspective. J. Travel Res. 2021;61(6):1267-86. DOI
Tiberghien G. Managing the Planning and Development of Authentic Eco-Cultural Tourism in Kazakhstan. Tour. Plan. Dev. 2018;16(5):494-513. DOI
Chen C, Huang J. Integrating Dynamic Bayesian Networks and Analytic Hierarchy Process for Time-Dependent Multi-Criteria Decision-Making. Mathematics. 2023;11(10):2362. DOI
Jorge-García D, Estruch-Guitart V. Comparative analysis between AHP and ANP in prioritization of ecosystem services – A case study in a rice field area raised in the Guadalquivir marshes (Spain). Ecol. Inform. 2022;70:101739. DOI
Fatina S, Soesilo TEB, Tambunan RP. Collaborative Integrated Sustainable Tourism Management Model Using System Dynamics: A Case of Labuan Bajo, Indonesia. Sustainability. 2023;15(15):11937. DOI
Rawat A, Joshi S, Rai SK. Evaluating the issue of sustainable tourism with a system dynamic approach: evidence from Uttarakhand, India. Environ. Dev. Sustain. 2023;26(10):1-28. DOI
Punzo G, Trunfio M, Castellano R, Buonocore M. A Multi-modelling Approach for Assessing Sustainable Tourism. Social Indic. Res. 2022;163(3):1399-443. DOI
Skrame A, Ciancio C, Corvello V, Musmanno R. A Quantitative Model Supporting Socially Responsible Public Investment Decisions for Sustainable Tourism. Int. J. Financ. Stud. 2020;8(2):33. DOI
Yang J, Lo H, Chao C, Shen C, Yang C. Establishing a Sustainable Sports Tourism Evaluation Framework with a Hybrid Multi-Criteria Decision-Making Model to Explore Potential Sports Tourism Attractions in Taiwan. Sustainability. 2020;12(4):1673. DOI
Raha S, Mondal M, Gayen SK. Ecotourism Potential Zone Mapping by Using Analytic Hierarchy Process (AHP) and Weighted Linear Algorithm: A Study on West Bengal, India. J. Geogr. Stud. 2021;5(2):44-64. DOI
Raha S, Gayen SK. Tourism Potentiality Zone Mapping by Using the AHP Technique: A Study on Bankura District, West Bengal, India. J. Geogr. Stud. 2022;6(2):58-85. DOI
Ezell B, Lynch C, Hester P. Methods for Weighting Decisions to Assist Modelers and Decision Analysts: A Review of Ratio Assignment and Approximate Techniques. Appl. Sci. 2021;11(21):10397. DOI
Kizielewicz B, Więckowski J, Franczyk B, Wątróbski J, Sałabun W. Comparative Analysis of Re-Identification Methods of Multi-Criteria Decision Analysis Models. IEEE Access. 2025;13:8338-54. DOI
Nabavi SR, Wang Z, Rangaiah GP. Sensitivity Analysis of Multi-Criteria Decision-Making Methods for Engineering Applications. Ind. Eng. Chem. Res. 2023;62(17):6707-22. DOI
Scholten L, Schuwirth N, Reichert P, Lienert J. Tackling uncertainty in multi-criteria decision analysis – An application to water supply infrastructure planning. Eur. J. Oper. Res. 2015;242(1):243-60. DOI
Faizi S, Rashid T, Sałabun W, Zafar S, Wątróbski J. Decision Making with Uncertainty Using Hesitant Fuzzy Sets. Int. J. Fuzzy Syst. 2017;20(1):93-103. DOI
Sotoudeh-Anvari A. The applications of MCDM methods in COVID-19 pandemic: A state of the art review. Appl. Soft Comput. 2022;126:109238. DOI
Chaturvedi V, de Vries WT. Machine Learning Algorithms for Urban Land Use Planning: A Review. Urban Sci. 2021;5(3):68. DOI
Li A, Zhang Z, Hong Z, Liu L, Liu L, Ashraf T, Liu Y. Spatial suitability evaluation based on multisource data and random forest algorithm: a case study of Yulin, China. Front. Environ. Sci. 2024;12:1338931. DOI
Schratz P, Muenchow J, Iturritxa E, Richter J, Brenning A. Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data. Ecol. Modell. 2019;406:109-20. DOI
Yang W, Deng M, Tang J, Luo L. Geographically weighted regression with the integration of machine learning for spatial prediction. J. Geogr. Syst. 2022;25(2):213-36. DOI
Alnuaimi AF, Albaldawi TH. An overview of machine learning classification techniques. In: BIO Web of Conferences. EDP Sciences. 2024;97:00133. DOI
Najjar E, Majeed Breesam A. Supervised Machine Learning a Brief Survey of Approaches. Al-Iraq. J. Sci. Eng. Res. 2023;2(4):71-82. DOI
Sinaga KP, Yang M. Unsupervised K-Means Clustering Algorithm. IEEE Access. 2020;8:80716-27. DOI
Chong B. K-means clustering algorithm: a brief review. Acad. J. Comput. Inf. Sci. 2021;4(5):37-40. DOI
Nartey OT, Yang G, Wu J, Asare SK. Semi-Supervised Learning for Fine-Grained Classification With Self-Training. IEEE Access. 2020;8:2109-21. DOI
Forestier G, Wemmert C. Semi-supervised learning using multiple clusterings with limited labeled data. Inf. Sci. 2016;361-362:48-65. DOI
Shen A, Dai M, Hu J, Liang Y, Wang S, Du J. Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data. In: 2024 4th International Conference on Electronic Information Engineering and Computer Communication (EIECC). IEEE. 2024:492-6. DOI
Cheng Y, Wang D, Zhou P, Zhang T. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges. IEEE Signal Process. Mag. 2018;35(1):126-36. DOI
Shrestha A, Mahmood A. Review of Deep Learning Algorithms and Architectures. IEEE Access. 2019;7:53040-65. DOI
Marinó GC, Petrini A, Malchiodi D, Frasca M. Deep neural networks compression: A comparative survey and choice recommendations. Neurocomputing. 2023;520:152-70. DOI
Ahmed SF, Alam MSB, Hassan M, Rozbu MR, Ishtiak T, Rafa N, Mofijur M, Shawkat Ali ABM, Gandomi AH. Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023;56(11):13521-617. DOI
Sagi O, Rokach L. Ensemble learning: A survey. WIREs Data Min. Knowl. Discov. 2018;8(4):e1249. DOI
Dong X, Yu Z, Cao W, Shi Y, Ma Q. A survey on ensemble learning. Front. Comput. Sci. 2019;14(2):241-58. DOI
Ribeiro MHDM, dos Santos Coelho L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 2020;86:105837. DOI
Zhao C, Peng R, Wu D. Bagging and boosting fine-tuning for ensemble learning. IEEE Transactions on Artificial Intelligence. 2023;5(4):1728-42. DOI
Mohammed A, Kora R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023;35(2):757-74. DOI
Ranglani H. Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models. Mach. Learn. Appl. 2024;11(4):01-12. DOI
Parker WS. Ensemble modeling, uncertainty and robust predictions. WIREs Clim. Change. 2013;4(3):213-23. DOI
Belitz K, Stackelberg P. Evaluation of six methods for correcting bias in estimates from ensemble tree machine learning regression models. Environ. Model. Softw. 2021;139:105006. DOI
Gharroudi O. Ensemble multi-label learning in supervised and semi-supervised settings. Doctoral dissertation, Université de Lyon. 2017.
Vouros GA. Explainable Deep Reinforcement Learning: State of the Art and Challenges. ACM Comput. Surv. 2022;55(5):1-39. DOI
Rane N, Choudhary SP, Rane J. Ensemble deep learning and machine learning: applications, opportunities, challenges, and future directions. Stud. Med. Health Sci. 2024;1(2):18-41. DOI
Mienye ID, Sun Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access. 2022;10:99129-49. DOI
Shaikh TA, Rasool T, Verma P, Mir WA. A fundamental overview of ensemble deep learning models and applications: systematic literature and state of the art. Ann. Oper. Res. 2024:1-77. DOI
Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J. Big Data. 2020;7(1):94. DOI
Bentéjac C, Csörgő A, Martínez-Muñoz G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2020;54(3):1937-67. DOI
Ajin RS, Segoni S, Fanti R. Optimization of SVR and CatBoost models using metaheuristic algorithms to assess landslide susceptibility. Sci. Rep. 2024;14(1):24851. DOI
Cai Y, Yuan Y, Zhou A. Predictive slope stability early warning model based on CatBoost. Sci. Rep. 2024;14(1):25727. DOI
Luo M, Wang Y, Xie Y, Zhou L, Qiao J, Qiu S, Sun Y. Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass. Forests. 2021;12(2):216. DOI
Maulana A, Afidh RPF, Maulydia NB, Idroes GM, Rahimah S. Predicting Obesity Levels with High Accuracy: Insights from a CatBoost Machine Learning Model. Infolitika J. Data Sci. 2024;2(1):17-27. DOI
Dey S, Das S, Roy SK. Demystifying the predictive capability of advanced heterogeneous machine learning ensembles for landslide susceptibility assessment and mapping in the Eastern Himalayan Region, India. Nat. Hazards. 2025;121(11):13407-46. DOI
Guha S, Paul AK, Sardar J. The Importance of Shoreline Beaches for Coastal Tourism Potential and Their Diversities on the Odisha Coast. In: Crisis on the Coast and Hinterland. Springer Nature Switzerland. 2023. DOI
Mahakul R, Panigrahi M. Geotourism: Determining Sustainable Development and Ecotourism Dynamics in Odisha. In: Advances in Geographical and Environmental Sciences. Springer Nature Singapore. 2025. DOI
Malhotra N. Introduction to Indigenous Tribes. In: Sustainable Pathways. Emerald Publishing Limited. 2024. DOI
Pandey AD. The Fabric of Life on the Edge. In: The Plural Social Sphere. Routledge India. 2024. DOI
Kumar R. An Artisan Heritage Crafts Village: Indigenous Sustainability of Raghurajpur. In: 8th International Conference on Recent Advances in Civil Engineering, Architecture and Environmental Engineering for Sustainable Development. 2015:13-4.
Mohapatra N. Economics of Ecotourism: A Study on Orissa. Indian J. Mark. 2011;41(7):36-42.
Sahu HK, Rath B, Mohanta BK, Nayak D, editors. Past Present and Future of Similipal. Newredmars Education Pvt Ltd. 2023.
Jovanovic V, Njegus A. The application of GIS and its components in tourism. Yugosl. J. Oper. Res. 2008;18(2):261-72. DOI
Swarbrooke J, Page SJ. Development and Management of Visitor Attractions. Routledge. 2012. DOI
Gunn CA, Var T. Tourism Planning. Routledge. 2020. DOI
Othman SA, Ali HT. On the Use of Yeo-Johnson Transformation in the Functional Multivariate Time Series. Stat. Optim. Inf. Comput. 2025;13(6):2634-46. DOI
Dhawas P, Dhore A, Bhagat D, Pawar RD, Kukade A, Kalbande K. Big Data Preprocessing, Techniques, Integration, Transformation, Normalisation, Cleaning, Discretization, and Binning. In: Advances in Business Information Systems and Analytics. IGI Global. 2024. DOI
Desai P, Karthik P, Loganathan D, Preethi S, Bharani BR. Different data cleaning techniques and normalization techniques with focus on current normalization techniques: A study. In: Advances in Electrical and Computer Technologies. CRC Press. 2025. DOI
Gopika N, ME AM. Correlation based feature selection algorithm for machine learning. In: 2018 3rd International Conference on Communication and Electronics Systems (ICCES). IEEE. 2018:692-5. DOI
Zhao T, Zheng Y, Wu Z. Feature selection-based machine learning modeling for distributed model predictive control of nonlinear processes. Comput. Chem. Eng. 2023;169:108074. DOI
Raha S, Deb S. Tourism Potential Zone Mapping Using MCDM and Machine Learning Models in The State of Madhya Pradesh India. Geoplanning J. Geomat. Plan. 2025;12(1):95-122. DOI
Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005;27(8):1226-38. DOI
Okan Sakar C, Kursun O. A method for combining mutual information and canonical correlation analysis: Predictive Mutual Information and its use in feature selection. Expert Syst. Appl. 2012;39(3):3333-44. DOI
Bonev B, Escolano F, Cazorla M. Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Anal. Appl. 2008;11(3-4):309-19. DOI
Win TZ, Kham NS. Mutual information-based feature selection approach to reduce high dimension of big data. In: Proceedings of the 2018 International Conference on Machine Learning and Machine Intelligence. 2018:3-7. DOI
Liu H, Sun J, Liu L, Zhang H. Feature selection with dynamic mutual information. Pattern Recognit. 2009;42(7):1330-9. DOI
An Z, Wang J, Wang J, Song J. Mutual information and error probability analysis on generalized spatial modulation system. IEEE Trans. Commun. 2016;65(3):1044-60. DOI
Wang Y, Cui Z, Ke R. Machine learning basics. In: Machine Learning for Transportation Research and Applications. Elsevier. 2023. DOI
Ravikumar N, Zakeri A, Xia Y, Frangi AF. Deep learning fundamentals. In: Medical Image Analysis. Elsevier. 2024. DOI
Witten IH, Frank E, Hall MA, Pal CJ. Probabilistic methods. In: Data Mining. Elsevier. 2017. DOI
Escobar CA, Morales-Menendez R. Classifier development. In: Machine Learning in Manufacturing. Elsevier. 2024. DOI
Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363. 2018. DOI
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Advances in neural information processing systems. Curran Associates, Inc. 2018;31.
Guan X, Xue R, He Z, Chen S, Chen X. CatBoost-Optimized Hyperspectral Modeling for Accurate Prediction of Wood Dyeing Formulations. Forests. 2025;16(8):1279. DOI
Zhang X, Liu S, Wang X, Li Y. A fragmented neural network ensemble method and its application to image classification. Sci. Rep. 2024;14(1):2291. DOI
Kim S, Geem ZW, Han G. Hyperparameter Optimization Method Based on Harmony Search Algorithm to Improve Performance of 1D CNN Human Respiration Pattern Recognition System. Sensors. 2020;20(13):3697. DOI
Huang Q, Zhou C, Li M, Ma Y, Hua S. An Approach for Mapping Ecotourism Suitability Using Machine Learning: A Case Study of Zhangjiajie, China. Land. 2024;13(8):1188. DOI
Yeo LB, Said I, Bak Yeo L. Mapping Recreational Ecosystem Service at Sub-Districts of Muar. European Proceedings of Social and Behavioural Sciences. 2018;40. DOI

Cite this article:

Raha S., Deb S. Delineation of ecotourism suitability zone using machine learning based ensemble models. DYSONA-Applied Science. 2026;7(1):152-175. doi: 10.30493/das.2025.011212