An Explorative Application of Random Forest Algorithm for Archaeological Predictive Modeling. A Swiss Case Study

The present work proposes an innovative approach to surveys and demonstrates the effectiveness of bringing together traditional archaeological questions, such as the exploration and the analysis of settlement patterns, with the most innovative technologies related to Machine Learning. Namely, we applied Random Forest, an ensemble learning method based on decision trees, to perform archaeological predictive modeling (APM) for the Canton of Zurich, in Switzerland. This was done based on a dataset of known archaeological sites dating back to the Roman Age. The APM represents an automated decision-making and probabilistic reasoning tool that is relevant for archaeological risk assessment and cultural heritage management. Machine learning-based approaches can learn from data and make predictions, starting from the acquired knowledge, through the modeling of the hidden relationships between a set of observations, representing the dependent variable (i.e. the archeological sites), and the independent variables (i.e. the geo-environmental features prone to influence the site locations). The main objective of the present study is to assess the spatial probability of presence for Roman settlements within the study area. As results, we produced: 1) a probability map, expressing the likelihood of finding a Roman site at different locations; 2) the importance ranking of the geo-environmental features influencing the presence of the archeological sites. These outputs in our results are of paramount importance, not only in verifying the reliability of the data, but also in stimulating experts in different ways. Also, these results help evaluate the benefits and constraints of using such innovative techniques and, ultimately, help explore the performance of machine learning-based models in processing archaeological information.


INTRODUCTION
The massive expansion of urban settlement areas and transport infrastructures, increasingly threatens our cultural heritage all over the world. In particular, in the few last decades, Switzerland has been experiencing constant growth in its infrastructures and modern agglomerations (Hafner 2013). Several areas, which were so far unexploited are nowadays critically threatened by modern development, which often results in permanent destruction to any possible archaeological remains not yet unearthed. Within the Swiss Confederation, archaeology per se is a prerogative of each single region or member state (i.e. canton). Due to the country's decentralized political organization, each canton has its own specific procedures for archaeological heritage management (Kaeser 2013;2012;Kaenel 2002). Each canton is also responsible for the protection of nature, landscape, and cultural heritage within its territorial boundaries. Hence, a multiplicity of approaches exists. This decentralized situation further strengthens the need of exploring new solutions and to develop objectified and quantitative tools that can help detect archaeological sites, identify them, and protect them.
Prediction and modeling have always played a relevant role in this regard (Nebbia et al. 2016;Arnoldus-Huyzendveld, Citter & Pizziolo 2015;Rogers, Fischer & Huss 2014;McEwan 2012;Danese et al. 2014;van Leusen & Kamermans 2005;Lock 2000). Based on the assumption that human behavior can be patterned, the possibility to map this pattern can result in a helpful tool for the assessment of where we have the highest likelihood of (re)-discover archaeological remains not yet unearthed. Archaeological Predictive Models (APMs) have been studied and implemented both in academia, as "locational preference maps" or "distribution maps", both in cultural resource management offices, making numerous steps forward (to cite few examples of APM: Cecamore & Castiello 2014; Nicu 2019; Visentin & Carrer 2017;Anichini et al. 2011;Verhagen & Whitley 2011;Ford, Clarke & Reisen 2009;Rua 2009;Oštir et al. 2007;de Vries 2007;Verhagen 2007;Kamermans et al. 2005;Ducke & Münch 2005;Ejstrud 2003;Kvamme 1990). The APMs produced so far share a number of common aspects, such as the use of archaeological data and environmental variables and a methodological approach based on multivariate statistical techniques such as Logistic Regression (on this topic see for example: Wachtel et al. 2018;Carrer 2013;Vaughn & Crawford 2009;Espa 2006;Kvamme 1999). Verhagen and Withley (2020) have recently published a comprehensive list of the most popular predictive models developed worldwide to predict the spatial location of archaeological sites. Authors highlight strengths and weaknesses of their applications stressing how, over the last decades, the academic research had to deal with the raising availability and complexity of archaeological datasets, as well as the complexity of the questions they raise. Questions that, in time, led archaeologists to establish new collaborations and exchange with experts in different research domains (Hintz, Laabs & Castiello 2019;Carlson 2017;Barcelo & Bogdanovic 2015;Dubbini & Lodoen 2014;Djindjian 2009;Giligny et al. 2010). It is only recently that archaeological researchers have started to explore more complex models, relying on innovative applications of spatial and statistical computing, as well as on Machine Learning (ML) techniques. So far, these studies are increasingly numerous when dealing with archeological site detection relying on high-resolution satellite or drone imagery, as well as in pottery classification (see for example the works of: Garcia-Molsosa et al. 2021;Orengo et al. 2020;Orengo & Garcia-Molsosa 2019;Caspari & Crespo 2019;Gattiglia 2018;Chen et al 2013). At the best of our knowledge, applications of ML in the field of archaeological site distribution dealing with settlement patterns and location probability assessment are very rare in literature. Märker and Heydari-Guran (2009) used Random Forest (RF) to predict the location of Paleolithic sites in the Zagros Mountains of Iran, which represents a first attempt of application of data-mining approaches in this domain. In a more recent study, Roalkvam (2020) compares logistic regression with RF to formalize and quantitatively evaluate environmental factors for coastal site location in Mesolithic Norway and to determine the evolution in time of the relative importance of these variables. Our study differs from the abovementioned ones not only in that it results in both the assessment of the environmental factors and the elaboration of a predictive map for archeological site location, but also in the implementation of an accurate procedure for the model evaluation and validation. In addition, despite the several examples produced at international level, only few studies concerning application in the domain of APM and mapping were realized for Switzerland. In this regard, the work carried out by Ebersbach (2015) provides a first and almost unique example in Switzerland where the author applies exploratory spatialstatistical analysis for evaluating locational criteria of Roman villas and Neolithic sites in the Canton of Bern.
In the present study, we explore the potential of RF, a learning algorithm based on a multitude of regression trees, to elaborate predictive maps for archaeological Roman settlements in the Canton of Zurich. The predisposing factors suggesting the presence of Roman sites in the area are the environmental features. They are described by: (i) topographic indices derived from the digital elevation model (DEM); (ii) different characteristics related to the soil and its aptitude to agricultural activities; (iii) strategic and water-related criteria on which past populations may have based their site location choice. Although predictive models based on ML have been successfully applied in different environmental studiessuch as geological prospection, geological and mineral mapping (see: Oonk & Spijker 2015;Baudron et al. 2013;Abedi & Nouruzi 2012;Abedi, Nouruzi & Bahroudi 2012) and natural hazard susceptibility mapping (see: Tonini et al. 2020;Tehrany et al. 2019;Reichenbach et al. 2018;Deluigi 2018;Leuenberger et al. 2017;Zêzere et al. 2017;Pham et al. 2016;Goetz et al. 2015) -the present research represents a first example of ML application to carry out an exhaustive analysis for APM and mapping in Switzerland.

STUDY AREA
The study area lies in the current administrative limits of the Canton of Zurich, located in the northeastern part of Switzerland (Figure 1). The territory covers 1729 km² and nowadays is considered productive for about 80% of its area. Forests cover 505 km² and lakes 73 km². Most of the canton consists of narrow river valleys that go towards the Rhine River in the north. Together with the Lakes Zurich, Greifensee, Pfäffikersee, the rivers crossing the region have played important roles for commercial and communication purposes since the antiquity. Turicum (the ancient settlement of the City of Zurich) arose in the Limmat valley as a small artisan settlement (vicus) occupying both sides of the valley and becoming the first military post in the area (Horisberger 2017).
The Roman epoch in Switzerland, lasting from 30 BC to 450 AD, is a well-known period of the history thanks to the numerous literature sources and archaeological discoveries. According to the Archaeological Department of the Canton of Zurich ("Amt fur Raumentwicklung Kantonsarchäologie Zurich"), the region in antiquity was particularly dense with roads and settlements (especially military camps). Numerous vici (small towns) were probably embedded in a wider and dense networking context connecting the heart of the Roman Empire and the Mediterranean coasts to the south, with its northern provinces. The settlements were mainly located on headlands or on the shores of main lakes and surrounded by trenches and fortifications, in an easy to defend position and suitable for the trades, (Flutsch et al. 2002;Furger et al. 2001;Frei-Stolba & Benedetti Martig 1991). At the time, they were probably provided for by a forum, tabernae or thermae, and one or more religious temples (Cramatte 2012;Bögli 1962). According to the historical sources, and to more recent archaeological investigations and analyses, the settlements were often located at a distance of 30 km from each other, corresponding to about one walking day. From the 1 st to 3 rd century AD, vici, as well as a certain number of urban domus, and hundreds of rural villae of varying sizes, intensively occupied the countryside, e.g. the villas of Dietikon, Neftenbach, Buchs, etc. (Ebnöther & Monnier 2002;Ebnöter 1995). Agriculture assumed a very important role as the main subsistence activity. Thus, environmental factors such as the suitability of the terrain for agriculture, the proximity to water resources, and the topographic indices are all highly relevant for the analysis developed in this study.

Roman archaeological sites
The Roman archaeological sites discovered in the study area represent the dependent variable of our model. The original dataset was provided by the cantonal archaeological service of Zurich. It was in the form of a digital table containing a list of surveys carried out in previous decades (5812 entries catalogued until October 2015) covering different epochs (from Mesolithic to Middle Age). But, our current analysis is limited to the Roman period. The table was structured into different columns (i.e. identifier, municipality, type of site, assigned epoch, X and Y geographic coordinates). However, a certain underlying degree of uncertainty exists at this stage, as not all entries had exact coordinates. The dataset was reworked and standardized in order to be processed into a GIS (Geographical Information System). The preprocessing was thus necessary to check information and correct it, and to elaborate it in the form of a well-structured geo-localized punctual dataset. This included the transformation of the coordinates into the official Swiss projected coordinate system and the precision of the name of the modern municipalities, where the archaeological evidences were found. Additionally, the description of each previous survey was verified, and a short interpretation correctly defined and embedded. These interpretations provided detailed information about the nature of the findings or the nature of the site itself, such as a specific socio-economic function, whether it was a permanent or non-permanent settlement (e.g. housing, villae urbanae, rusticae, vici, etc.), a place of worship and of religious identity (e.g. funum, temple, etc.), a burial grave, a necropolis, an artisanal production center, etc. Further screenings allowed us to divide the dataset into two main categories: "settlements" sensu stricto, and "single finds", like pottery shards, coins, etc. Figure 2 shows the amount of discoveries attributed to each class. The final geo-dataset, containing only the information related to the presence of Roman settlements, consists of 227 occurrences. In addition, a random set of pseudoabsences was generated, which are located across the landscape and assumed to be non-sites. This process resulted in a balanced binary geo-dataset of presences and absences for Roman settlements, which is essential for ML modeling purposes.

Predisposing factors
The following geo-environmental features prone to influence the Roman sites location, acting as independent variables of the model, were taken into account: Digital Elevation Model (DEM; altitude) and derivates (slope, northness and eastness); Distance to water (lakes and rivers); Agricultural suitability; Depth of vegetal soil; Soil skeleton; Water saturation and Water storage capacity; Permeability and Nutrient storage capacity (Figure 3). We used a DEM with a cell resolution of 100 m (pixel size = 100 m x 100 m), as for the digital layers on soil properties. Indeed, the present-day topography can be a useful factor to detect relations between site locations and their environmental surroundings (Märker & Bolus 2018). Northness and eastness, corresponding respectively to the cosine and to the minus sine of the aspect angle, were considered instead of the terrain orientation in order to avoid the use of a circular variable that can introduce bias, as very distinct values (0° and 360°) represent the same situation in reality (i.e. north orientation). We also considered the proximity of each archaeological site to lakes and rivers, intended as a source of water supply: the distance values were computed for each pixel, considering the Euclidean distance to the closest water element (i.e. lake or river). Furthermore, we used the soil map 1 (Digitale Bodeneignungskarte der Schweiz, 2012) providing us with the most valuable information about a terrain's suitability for agricultural. Such information provided specific soil properties, each stored as single digital raster layer (agricultural suitability, depth of vegetal soil, soil skeleton, water saturation capacity, water storage capacity, soil permeability capacity, nutrient storage capacity) ranked each into five or six classes based on the different aptitude of the soil for the specific characteristic (Table 1). Soil properties are considered to be significant factors in determining agricultural productivity, which, in turn,  Table 1 for the maps representing the soil properties (i.e. PERMEABILITY, DEEP SOIL, SKELETON SOIL, WATER SATURATION, WATER STORAGE, NUTRIENT STORAGE, AGRICULTURAL SUITABILITY); See Table 2 for the description of different categories of the geotechnical map.
shapes the Roman site distribution patterns (Simpson et al. 2002;Westcott & Brandon 1999). Intensive land use changes and deforestation occurred during the Roman period, probably related to the introduction of agriculture and to the mass movements of human population. These changes are discernible from the soil properties, and confirmed also by recent studies on pollen-based landcover reconstructions, focused on northern and central Europe (Roberts et al. 2018;Wickham 2011). Hence, we decided to use these maps representing soil properties, such as agricultural suitability and nutrient storage capability, texture and specific soil depth, permeability, water saturation and water storage capacity (Nussbaum et al. 2018;Ebersbach 2015). Finally, the geotechnical map of Switzerland provided information about the distribution of the uppermost rock strata ( Table 2). 2 The ensemble of these spatial layers was pre-processed in order to correct and eliminate construction errors (e.g. no-data, and resampled to match the same spatial resolution of 100 meters). The pre-and post-processing of the geographical layers was performed in a GIS environment using ArcGIS Desktop, release 10.7 (ESRI).

METHOD
Random Forest (RF) (Breiman et al. 2018), a machine learning based approach, was used to estimate the probability of discovering archaeological remains, namely Roman sites, in a given area. RF is an ensemble learning algorithm based on decision trees, capable of learning from data and making predictions, starting from the acquired knowledge (i.e., observations) through the modeling of the hidden relationships between a set of input variables (i.e, geo-environmental features prone to influence the location of Roman sites) and output variables (i.e., the archaeological sites). In detail, a decision tree is a decision support system using a treelike model. The paths from root to leaf represent the rules of the model which, for a binary classification problem (e.g., presence or absence of Roman sites) works as follows: internal nodes allow splitting the observations based on the value of a specific attribute (e.g., elevation below or above a certain value); each branch represents the outcome of the previous step, where data are split in two main groups by maximizing the difference among them in terms of presences and absences; each leaf node represents a class label, after computing all attributes (if a roman site is present or absent at pixel level). RF constructs a huge number of independent trees and the prediction of new data is finally computed taking the majority or the soft voting. The latter consists in converting the results of a binary classification, such as the prediction of presence ("yes") or absence ("no") of a Roman site, by counting how many times each observation is classified as "yes" or "no", and by normalizing the result over the total number of predictions. This provides probabilistic outputs, which can be used to elaborate maps identifying areas susceptible to experience the presence of a Roman site, over a rank from very low to very high.
Operationally, a subset of the training dataset is generated by bootstrapping (i.e. random sampling with replacement), while about one-third of the cases, called  'out-of-bag', are left out at each iteration. Then the algorithm creates a decision tree for each training subset, and a reduced number of independent variables are randomly sampled as candidates at each split, which is done by measuring the node impurity. This lets the trees grow and eventually stops when each terminal node contains less than a pre-fixed amount of observations. The out-of-bag are used to optimize the parameters of RF (basically, the number of trees and the reduced number of variables), while a third dataset, named the testing dataset, is used to evaluate the error rate of the final optimized model and to assess its generalization performance. Indeed, at the beginning of the computation, a fraction of the original dataset is normally held out from the construction of the learning model, and used in the final step to assess the ability of the algorithm, trained on independent data (the training dataset), to make good predictions on unused observations (the testing dataset).
Our model involves the generation of pseudo-absences representing the non-sites. To assure a good generalization of the model and to avoid the overestimation of the lower classes, a balanced number of pseudo-absences (i.e. in a number equal to the observed presences) need to be specified. Moreover, RF allows us to measure the relative importance of each variable on the prediction. This is assessed by evaluating the mean decrease accuracy, computed by looking at how much the tree nodes, which use that variable, reduce the mean square errors estimated with the out-of-bag, across all the trees in the forest. Additionally, the partial dependence plots give us a graphical depiction of the marginal effect that each variable has on the class probability.
Two models were compared in this study: the first, including all the geo-environmental features, and a second one considering only the first six most important features resulting from the previous model. The final probability map was elaborated based on the results of the second model. Analyses were performed with the R language and environment for statistical computing (R Core Team, 2018). Specifically, for probability mapping we used the package randomForest (Liaw and Wiener 2002).

Model validation
When dealing with spatial data, as in the present study, observations belonging to the testing dataset are most likely located close to the training ones. This leads to overestimating the predictive performance of the model. This circumstance is known as "spatial autocorrelation", which implies that observations close to each other hold similar characteristics. One way to solve this issue is to select the training and testing data far enough apart in the geographic space by adopting, for example, a statistical technique called spatial k-fold cross validation. This technique consists in splitting the original dataset into a number k of non-overlapping groups, training the model on k-1 sets, and then testing it on the hold out set. The process is repeated k-times and the k-error estimates are finally averaged to yield the overall error rate. In this study, we adopted a 5-folds spatial cross validation (Figure 4). The blockCV package (Valavi et al. 2018) was used for performing spatial cross validation. The model was finally evaluated by using the ROC curve (Receiver Operating Characteristic). This is a graphical technique based on the plotting of the true positives rate (TPR) against the false positives rate (FPR), both expressed as a percentage of the total number, where true positives (TP) (or true negatives, TN) are the correct classifications, while false positives (FP) occurs when outcomes are incorrectly predicted as "yes" when it is actually "no" (and vice-versa for false negatives, or FN). The value of the "Area Under the ROC Curve" (AUC) lies between 0.5, denoting a bad classifier, and 1, denoting an excellent classifier. Both the ROC curve and the corresponding AUC were estimated to evaluate the performance of the model.

RESULTS AND DISCUSSION
The spatial probability of Roman settlements presence in the Canton of Zurich was assessed by using the RF model we implemented. As by-product, the model returns the variable importance ranking. The six most important variables (i.e., DEM, slope, water saturation capacity, geotechnical map, distance to lakes, and soil skeleton) are highlighted in Figure 5a (red rectangle): these were retained as input independent variables in the second model. Their relative importance, resulting from this last, is shown in Figure 5b. Concerning the model validation and the estimation of its predictive performance (Figure 6), the AUC using all the variables is equal to 0.71 (blue line), while for the second model (i.e. considering only the six most important variables) it is slightly higher and equal to 0.72 (red line), which is considered as an acceptable discrimination according to the criteria of Hosmer and Lemeshow (2000). This result attests that variables low in the ranking do not add any supplementary predictive power to the model. They can thus be removed to estimate the final probabilistic output. Figure 7 shows the output probability map obtained by the second model. The probability of finding Roman settlements in a certain area is expressed as percentage, and displayed in ten classes of equal intervals. The highest probability (red areas) are located around the modern urban agglomerates of the cities of Zurich and Winterthur, and in the middle area between these two municipalities, as well as around the main lake, Lake Zurich. This phenomenon can be interpreted in two ways. (i) Modern urban centers in the region were built upon the remaining of ancient settlements, in continuum with the main old vici (like Zurich or Winterthur). This cultural factor may have affected the preference for location choices of past populations, and may have led to new settlements in their neighborhood. (ii) Urbanized areas are more intensively excavated as most archaeological surveys are rescue operations, which happen prior to any construction activity. By consequence, the majority of the archaeological sites discovered today are sites where such excavations take place.
Meanwhile, the lowest probability (blue areas) mainly occurs at the highest altitudes (above 700 m a.s.l. in the south-east of the region), and where unproductive zones (for agriculture) are located. Moreover, different partial dependence plots were evaluated, encapsulating how much each specific class for every single variable has positively or negatively influenced the location of Roman settlements. The partial dependence plots referring to the six most important variables are shown in Figure 8. When looking closely at the partial dependence plots of the DEM and the Slope variables, we can see that the high probability to re-discover sites is mainly located between 400 and 600 m a.s.l. (within a slope of less than eight degrees). With regard to the distance to lakes, the highest probability lies between 5.5 km and 8.5 km away from the lakes. This is not surprising, as a high proportion of the study area is located in a certain distance to the nearest lake. Nevertheless, a small peak of high probability can be observed in close vicinity to the lakes, but not in their immediate vicinity. This points, in fact, to the certain importance of the lakeshores as preferred settlement areas, while avoiding the marshy or humid areas that existed along the lake coasts (also attested by the historical Dufour Map of Switzerland, published between 1845 and 1865). The partial dependence plot of the geotechnical classes, showing clays, gravels and glacial moraine deposits, as wells as marls, reveals recurrent types, corresponding to soils that are well-suited for agricultural uses. The partial dependence plots of the water saturation capacity and soil skeleton variables show that the class #1 (unknown, see Table 1) occurs as the most influencing factor for the location of Roman settlements. Class #1 corresponds to the urbanized areas where no soil analyses were performed. It is essential to remember here that these variables, along with the depth of vegetal soil, water storage, permeability and nutrient storage capacity, derive directly from the soil map, and were compiled first for agricultural purposes by the Swiss Federal Office for Agriculture, in 2012. It therefore should come as no surprise that the digitization process may have produced less accurate information, or no data at all, with respect to those areas falling within modern urbanized agglomerations. This observation corroborates the correspondence we saw between high probability classes in urbanized areas, as discussed above.
Dealing with archeological remains, which are by nature under-sampled spatio-temporal data, is a fundamental issue in archeological predictive modeling. The final dataset of Roman settlements in the Canton of Zurich examined in the present study includes only 227 cases over an area of 1729 km 2 . The scarcity of observations can explain the model's fair performance, as evaluated with an AUC of 0.72. Nevertheless, our model allows us to discover an interesting pattern in the distribution of the areas falling into the probability classes above the 50% threshold. These areas are generally located at: (i) about 7 km walking distance from the observed sites; (ii) an elevation of about 500 m.a.s.l. and a slope of less than 10°; (iii) more than 8 km distance from a lake; (iv) they belong to the areas defined as already urbanized in the modern soil map.
Furthermore, the machine learning modeling procedure revealed significant advantages with regard to the state of the art in archaeological predictive modeling studies. It represents an alternative to more classical statistical approaches. The reason why and the advantage of this innovative approach are exposed in the following. (i) It is not affected by any kind of subjective assumption; that is, the selection of variables as well as their weight in the prediction procedure is performed independently, and without any previous assignment of reclassification or threshold values. (ii) No supplementary testing data is needed because the model quality assessment is part of the modeling procedure, as the input data are split in training, testing, and validation (performed by using the 'out of bag'). (iii) The spatial k-cross validation strategy for assessing the AUC using the testing dataset avoided inferring a model from new data that could be very close to, and hold the same characteristics of, the testing set. That would thus result in meaningless modeling performances. In other words, when assuming that two neighboring pixels are (almost) identical, the main goal of the spatial sampling is to avoid using archaeological site evidence for training, and identical evidence (in terms of geo-spatial characteristics) for testing the model. The quality of our classification was thus more honest since it was computed on different data. (iv) The variable ranking allows us to assess which environmental factor is a stronger player, while the partial dependence plot to determine the influence of each class belonging to the considered factors. (v) The model can manage thousands of variables and classes, both categorical and numerical at once, without internal conflict. (vi) The model we developed can be applied to investigate any kind of archaeological evidences and epochs, and (vii) it provides graphical outputs that non-expert readers can easily interpret, including a predictive map allowing them to identify areas where the likelihood of finding an archaeological site is very high.

CONCLUSIONS
This study aimed to outline and test a new predictive modeling technique based on Machine Learning approach, namely Random Forest, in order to identify Roman sites in the canton of Zurich, north-eastern Switzerland, with the help of institutional/legacy data. The predictive model we built here can be easily implemented and updated with the data collected during the most recent surveys (as from October 2015 to date). The model prediction can also be tested on the field through planned surveys. However the quality and quantity of the data used are an important constraint for such such kind of applications, as it is often the case when dealing with archaeological information and modern environmental variables. Thus, the results obtained in this study inevitably raise interesting questions for archaeological managers and researchers, and also shed further light on possible future research avenues. Can Archaeological Predictive Models (APMs) really be used to comply with inventory requirements? To implement and improve data collection and storage strategies? To forecast effects, and make proactive streamlined management decisions regarding where to focus Cultural Heritage Management efforts? As this study shows, the answer to these questions is affirmative. The Random Forest based approach has demonstrated that it is a helpful instrument in overcoming issues related to data size, structure, and reliability. This study showed the importance of quantitative analysis for assessing the reliability of data on Roman settlement patterns in the Canton of Zurich and it has provided important insights for the interpretation and quantification of the variables that were only empirically considered to be important factors for locational strategies. Machine learning-based approaches are indeed able to extract insights and knowledge directly from data, and the algorithm can successfully highlight the relationships among the observed events (i.e., the archeological sites) and the prone environmental features, thus identifying trends and patterns hardly discernible by the human eye.
Finally, it is worth pointing out that the research developed in the present study is very promising in terms of technological innovation. Given the lack of previous quantitative investigations in this region, our study raises awareness on the necessity of employing quantitative methods to tackling more urgent questions, such as the protection and preservation of endangered archaeological sites. It also helps assess the importance of research biases and locational choices.