A Model-Based Statistical Classification Analysis for Karamattepe Arrowheads

The Nif Dağı Research Project has been led by Prof. Dr. Elif Tül Tulunay since 2006. Nif Dağı (Mount Nif) is located on the eastern side of Izmir province, in Turkey's western Anatolia region. The project is carried out at four different excavation sites in the southeast of Nif Dağı (Tulunay 2006). A significant site of the project is Karamattepe, defined as a rural metal production area of the Archaic Period (6th century BC), which stands apart because of its rich metal finds, including approximately 500 iron and bronze arrowheads, four metal production furnaces and large quantities of slag. Given these rich metal finds, the area was probably used as a metal production workshop. No inclusive classification study, comparable to that for the early Neolithic El Khiam arrowheads (Echegaray 1966: 50), has been found that would allow the arrowheads produced in this region to be compared with those found in antecedent or contemporaneous regions. Although there is a significant lack of scholarly interest in metal finds dating from before the Persian invasion (5th century BC) in Western Anatolia, a common typology could help archaeologists working in the area to compare production processes and to clarify the social and economic relations of the period. Since 2006, a total of 483 arrowheads made of iron and bronze have been unearthed at the site. Over the course of the excavations, numerous academic publications on the arrowheads have been added to the literature. Associate Professor Dr. Daniş Baykan carried out a typological study in 2012 (Baykan 2012), in which the arrowheads were classified into nine different categories, seven of them present in Karamattepe.
Baykan's classification study only took into consideration the material (iron or bronze) and the observable similarities of the arrowheads, especially in appearance; the research was also limited to items from this particular excavation area. Since 2017, all metric data obtained during the excavations have been digitalized. Nevertheless, many variations within the 10 years of collected data, and various omissions due to missing measurements, have been detected in the data set. The main objective of the current study is to create an unbiased classification tool covering both the metric and the categorical characteristics of present and future arrowheads gathered from Karamattepe, an outstanding metal production region of West Anatolia in the Archaic period, as well as to resolve the lack of a common typological base by providing a comparison with similar, contemporaneous arrowheads found in nearby regions. Of course, it would not be accurate to create a comprehensive model defining all types of Archaic period arrowheads from such a limited data set; however, the common metrics for the arrowheads are thought to be sufficient for a general assessment.

Tuncali Yaman, T. 2019. A Model-Based Statistical Classification Analysis for Karamattepe Arrowheads. Journal of Computer Applications in Archaeology, 2(1), pp. 12–20. DOI: https://doi.org/10.5334/jcaa.20


Introduction
Given this objective, the following sections present information about the data at hand, followed by information on the still-missing parts of the data set and an overall introduction to the data completion model selected in the absence of a reliable estimation model. The classification results based on the imputed data will then be discussed in light of the aim of the study, followed by suggestions for future methodologies.

Material
This section presents descriptive statistics for the arrowheads, excavated over a 10-year interval. Below is the distribution of a total of 483 arrowheads, 468 made of iron and 15 of bronze, according to Baykan's typology (Table 1).
For the classification process, given the aim of the study, a Multinomial Logistic Regression Model (Hosmer & Lemeshow 2000) was run with the Stepwise method to include all the variables; however, the procedure could not be completed due to the missing elements in the data set. Therefore, a few variables with the highest proportions of incompleteness, such as Stem Width (cm), Driller Length (cm) and Helve Thickness (cm), were omitted and the model was run again, resulting only in an uninformative estimate that was statistically insignificant and retained none of the variables.
Hence, in order to establish a reliable model that fits our research purpose, the missing data must be completed first. Due to the physical inaccessibility of the excavated material, which is currently stored in local museum depots, a data completion methodology was adopted using a suitable data imputation method. In the following section, the theoretical information about the applied methods will first be detailed, followed by the results obtained from their application.

Methodology
This section discusses the missing data problem in statistics and categorization models, as well as the Multiple Imputation (MI) Method and cases from the literature.

The Missing Data Problem
The problem of missing variables in a data set arises in almost all empirical research (Little & Rubin 2002). The question "if the data were complete, would the results of the research be different?" (Gold & Bentler 2000; Sung & Geyer 2007) unfortunately does not, and will not, have a specific answer, since incompleteness reduces the performance of parameter estimation and the sensitivity of the method. If the missing data pattern is non-random, a concern about bias accompanies the whole course of the research. In this context, bias means the failure of the observed data to represent the incomplete parts (Gold & Bentler 2000). In recent years there has been growing interest in missing data mechanisms, and consequently different techniques have been introduced (Enders 2003).

The Missing Data Problem in Classification Studies
The incomplete data problem may arise either naturally or deliberately. The data set may include latent variables due to the design of the researcher or the non-response of respondents (Rubin 1991). For instance, any missing content within a data set based on examinations or tests, which would serve the identification of a specific illness or the categorization of the related patients, would create an error in the decision-making process. As omissions increase the complexity of the analysis, numerous studies have analysed missing data patterns and evaluated their effects on the results of regression models. Incompleteness patterns are primarily categorized into two types: missing at random and missing not at random.
The non-random pattern may introduce additional bias, since the observed data cannot be assumed to be similar to the missing parts of the data; hence it is more complicated to handle (Gold & Bentler 2000).
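The bias introduced by a non-random pattern can be illustrated with a small simulation. This is a sketch with hypothetical arrowhead lengths, not the excavation data: when long items are preferentially left unmeasured, the observed mean no longer represents the full assemblage, while purely random omission leaves it essentially unchanged.

```python
import random
import statistics

random.seed(42)

# Hypothetical lengths (cm) for 10,000 simulated arrowheads, N(5, 1)
full = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# Missing at random: every value has the same 30% chance of being unrecorded
mar = [x for x in full if random.random() > 0.3]

# Missing not at random: long arrowheads are rarely measured (80% dropped)
mnar = [x for x in full if not (x > 5.5 and random.random() < 0.8)]

print(round(statistics.mean(full), 2))   # close to 5.0
print(round(statistics.mean(mar), 2))    # still close to 5.0
print(round(statistics.mean(mnar), 2))   # visibly below 5.0: the observed data are biased
```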

Handling the Missing Data Problem
In the scientific literature, missing data is addressed in various ways. The first is to omit the missing components and treat only the observed ones (Soley-Bori 2013). While facilitating the implementation, this method discards the information of the incomplete cases. Thus, with the possible differences between the missing and observed parts being ignored, the results may fail to reflect the characteristics of the whole data set.
Another approach assigns a predicted value to each missing item in a data set, using the means of the observed components to replace the missing values. This algorithm is called Single Imputation and applies standard statistical procedures to the newly completed data set. Since single imputation treats the process as if there were no missing values in the data set, it cannot reflect the uncertainty of the predictions; hence it tends to yield biased variances of the estimated variables, which shrink toward zero (Yuan 2010). Rubin (1987) therefore introduced the concept of multiple imputation, which belongs to the family of Markov Chain Monte Carlo methods: all possible values for the latent variables are evaluated, and each of them is used in parameter estimation (Gold & Bentler 2000; Yuan 2010; Rubin 1991). Markov Chain Monte Carlo (MCMC) methods are mostly used in applications with intractable densities that are approximated by a finite number of samples (Gelman et al. 2014). The MI procedure imputes each missing variable with a set of possible values; the variety of these offered values represents the uncertainty about the missing variables. The algorithm can then perform the standard data analysis procedure on each completed data set, and finally combines and compares the results for the inference.
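The variance shrinkage caused by single (mean) imputation can be demonstrated with simulated data (hypothetical values, not the Karamattepe measurements): filling every gap with the same mean leaves the point estimate intact but visibly understates the spread of the variable.

```python
import random
import statistics

random.seed(1)

# Hypothetical weights (gr), N(10, 2); 40% lost completely at random
full = [random.gauss(10.0, 2.0) for _ in range(5_000)]
observed = [x for x in full if random.random() > 0.4]
n_missing = len(full) - len(observed)

# Single (mean) imputation: every gap gets the same value
imputed = observed + [statistics.mean(observed)] * n_missing

print(round(statistics.stdev(full), 2))     # near the true 2.0
print(round(statistics.stdev(observed), 2)) # also near 2.0
print(round(statistics.stdev(imputed), 2))  # clearly smaller: the uncertainty is understated
```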
In the following section, after a short description of MCMC to clarify the context, more detailed information about MI will be given. For more background on these methods in the literature, see Horton & Lipsitz (2001), Enders (2010), Raghunathan et al. (2001), and Brand et al. (2003).

Markov Chain Monte Carlo Methods (MCMC)
A Markov chain with a stationary distribution π is a sequence of random variables in which the distribution of each element depends on the value of the previous one. In MCMC, one constructs a Markov chain long enough for the distribution of its elements to stabilize to a common distribution; this stationary distribution is the distribution of interest. By repeatedly simulating steps of the chain, one simulates draws from the distribution of interest (Molenberghs & Kenward 2007: 113). For a detailed discussion of this method see Shafer (1997).
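A minimal illustration of convergence to a stationary distribution, using a two-state chain whose stationary distribution is known analytically (the transition probabilities here are arbitrary, chosen for the sketch):

```python
import random

random.seed(0)

# Two-state chain: P(0 -> 1) = 0.3, P(1 -> 0) = 0.6.
# The balance equation pi_0 * 0.3 = pi_1 * 0.6 gives pi = (2/3, 1/3).
leave = {0: 0.3, 1: 0.6}
state = 0
visits = [0, 0]
for _ in range(100_000):
    if random.random() < leave[state]:
        state = 1 - state
    visits[state] += 1

# Empirical visit frequencies approach the stationary distribution
pi_hat = [v / sum(visits) for v in visits]
print([round(x, 3) for x in pi_hat])  # approaches [0.667, 0.333]
```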
In Bayesian inference, information about unknown parameters is expressed in the form of a posterior probability distribution. That is, through MCMC one can simulate the entire joint posterior distribution of the unknown quantities and obtain simulation-based estimates of the posterior parameters of interest (O'Hara et al. 2002; Acquah 2013).
Suppose that we have an m × n matrix X consisting of observed and missing variables, such that X = (X_obs, X_mis); the corresponding n-dimensional parameter vector for the given model is θ = (θ_1, θ_2, …, θ_n)^T, and the m-dimensional output vector of the model is Y = (Y_1, Y_2, …, Y_m)^T. Using the Bayesian approach, we can write the posterior predictive distribution of X_mis as

p(X_mis | X_obs) = ∫ p(X_mis | X_obs, θ) p(θ | X_obs) dθ,

where the prior distribution of θ is assumed to be independent of X. The procedure starts with Bayesian learning for the variables, and then an MCMC technique is employed to draw samples from the probability distributions learned by the Bayesian network. The algorithm continues iteratively until convergence is reached. This method provides unbiased estimates, as well as confidence levels for the imputation results (Ni & Leonard II 2005).
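The iterative draw-θ-then-draw-X_mis scheme described above can be sketched as a small data-augmentation loop. Everything below is hypothetical: toy measurements, a measurement standard deviation assumed known, and a flat prior on θ, so that the conditional draws have simple closed forms.

```python
import random
import statistics

random.seed(7)

# Hypothetical measured lengths (cm); three more items were never measured
x_obs = [4.8, 5.1, 5.4, 4.9, 5.2, 5.0, 5.3, 4.7]
n_mis = 3
sigma = 0.25                              # measurement sd, assumed known here
x_mis = [statistics.mean(x_obs)] * n_mis  # crude starting values

theta_draws = []
for it in range(2_000):
    # (1) draw theta from its posterior given the completed data (flat prior)
    data = x_obs + x_mis
    theta = random.gauss(statistics.mean(data), sigma / len(data) ** 0.5)
    # (2) redraw the missing values from p(x | theta)
    x_mis = [random.gauss(theta, sigma) for _ in range(n_mis)]
    if it >= 500:  # discard burn-in
        theta_draws.append(theta)

print(round(statistics.mean(theta_draws), 2))  # settles near the observed mean (~5.05)
```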

Multiple Imputation (MI)
MI is an MCMC-based method for the analysis of incomplete data sets. The main assumption of the method is that the missing observations follow a random distribution. A statistical analysis using joint modelling (JM) starts by specifying a parametric multivariate density p(y|θ) for Y with given model parameters θ. The posterior predictive distribution p(y_mis | Y_obs) and the appropriate prior distributions of θ are generated in a Bayesian framework under the assumption of an ignorable missing data mechanism (Rubin 1987). Unlike JM, fully conditional specification (FCS) does not start with a prominent multivariate density. In FCS, a separate conditional density p(y_j | Y_−j, θ_j) is specified for each incomplete variable Y_j, and this density is used to impute Y_j^mis given Y_−j. When performing multiple imputation with FCS, all conditionally specified imputation models (such as linear regression for continuous variables, logistic regression for binary variables, and multinomial regression for polytomous variables) are iterated, each iteration consisting of one cycle through all Y_j. According to Brand et al. (2003), the advantages of the method are:

• Simple and versatile
• Attributes close to the data, omits impossible data
• Subset collection of predictors
• Modular, can conserve costly work
• Successful, both in simulations and execution

The simplicity of the method has resulted in its implementation in several software programs. FCS can be executed with the mice, transcan, mi, VIM and BaBooN packages in R, with the multiple imputation command in SPSS, with IVEware in SAS, and with the ice command in Stata.
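A bare-bones FCS cycle for two correlated variables can be sketched as follows. All of it is illustrative: simulated length/weight pairs, simple linear regressions standing in for the per-variable imputation models, and a fixed residual draw that keeps the imputations from collapsing onto the fitted line.

```python
import random

random.seed(3)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical length/weight pairs with a linear relation (slope 1.5) plus noise
data = []
for _ in range(400):
    length = random.uniform(3.0, 8.0)
    data.append([length, 1.5 * length + random.gauss(0.0, 0.4)])

# Knock out ~20% of each column at random, never both cells of a row
holes = []
for i, row in enumerate(data):
    for j in (0, 1):
        if row[1 - j] is not None and random.random() < 0.2:
            row[j] = None
            holes.append((i, j))

# Starting values: observed column means
for j in (0, 1):
    obs = [r[j] for r in data if r[j] is not None]
    mean_j = sum(obs) / len(obs)
    for r in data:
        if r[j] is None:
            r[j] = mean_j

# FCS cycles: regress each incomplete column on the other, refill its holes
for _ in range(10):
    for j in (0, 1):
        a, b = fit_line([r[1 - j] for r in data], [r[j] for r in data])
        for i, jj in holes:
            if jj == j:
                data[i][j] = a + b * data[i][1 - j] + random.gauss(0.0, 0.4)

a, b = fit_line([r[0] for r in data], [r[1] for r in data])
print(round(b, 2))  # slope on the completed data stays close to the generating 1.5
```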
Before the data imputation procedure, the missing data distribution within all the variables of the arrowhead data set (Weight (gr), Length (cm), Width (cm), Thickness (cm), Case Width (cm), Stem Width (cm), Driller Length (cm), Helve Thickness (cm) and Elevation (cm)) was examined. The "Missing Data Patterns" graph lists the distinct missing data patterns. It shows that the data set does not have a monotone missing pattern, which means that the distribution of absence does not depend on the data; more specifically, the missing data in the studied variables did not originate from (a) specific arrowhead(s) (see Figure 1). Given the missing data patterns and the diversity of variables to be imputed, the MI Method was applied with the FCS procedure in this study, because this model completes each variable with its own regression model and gives stable results when the missing data do not follow a specific pattern. In the imputation process, a separate regression model was established for each of the variables listed above.
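The pattern inspection described above amounts to tabulating, per variable and per record, which cells are absent. A toy sketch with hypothetical records and field names (not the real field notes):

```python
from collections import Counter

# Hypothetical records; None marks a measurement absent from the notes
records = [
    {"weight": 3.1, "length": 4.2, "stem_width": None},
    {"weight": 2.8, "length": None, "stem_width": 0.5},
    {"weight": None, "length": 4.0, "stem_width": 0.4},
    {"weight": 3.4, "length": 4.5, "stem_width": None},
    {"weight": 3.0, "length": 4.1, "stem_width": 0.6},
]
fields = ["weight", "length", "stem_width"]

# Gaps per variable, and the distinct per-record missingness patterns
per_var = {f: sum(r[f] is None for r in records) for f in fields}
patterns = Counter(tuple(r[f] is None for f in fields) for r in records)

print(per_var)        # {'weight': 1, 'length': 1, 'stem_width': 2}
print(len(patterns))  # 4 distinct missing-data patterns, scattered across records
```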
The following section describes the results from the selected data completion model (MI) and the classification study.

Results
Based on the FCS-based MI results, the most suitable model was selected from five different imputation models. The selection criterion was the minimum difference between the mean and standard deviation values of the incomplete and completed data sets. The descriptive statistics obtained from the procedure are presented below in comparison with the original data. The variables recorded for the arrowheads are Weight (gr), Length (cm), Width (cm), Thickness (cm), Case Width (cm), Stem Width (cm), Driller Length (cm), Helve Thickness (cm), Material (bronze, iron), Plan Square (a location code in the excavation area), Absence (Yes, No) and Elevation (cm).
After data completion, Multinomial Logistic Regression Analysis (Multinomial-LOGIT) was again carried out to determine the extent to which the arrowheads coincide with the given estimates and the types determined according to the existing typology. Prior to the analysis, the first three categories of the expert-based typology were preserved, while the other types (Type 4 to Type 7), which had low frequencies, were combined into a single group called Other. Multinomial-LOGIT was performed in IBM SPSS 24 with the NOMREG command. The interaction between the Weight and Length attributes (WEIGHT*LENGTH) was taken as an essential variable for the regression model, due to their high correlation coefficient (ρ = 0.493, p < 0.01). The other variables were selected by the Stepwise Forward Entry method (Le 1998: 112) on the basis of their statistical significance. The interaction variable (WEIGHT*LENGTH) and the MATERIAL, DRILLER LENGTH, Absence (ABS) and ELEVATION variables were chosen. The test statistic exceeds the critical value χ²(df = 15; α = 0.05) = 24.99, so according to the Chi-Square Test the null hypothesis is rejected (α = 0.05). The generated model is sufficient for prediction (Wilks 1938: 62). In logistic regression analysis the determination coefficient R² gives misleading results; thus, Cox-Snell's pseudo-R² and Nagelkerke's R² are preferred. According to Nagelkerke's R², the independent variables in the model explain 75% of the variation in the dependent variable (Table 4).
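The likelihood-ratio comparison reported here reduces to simple arithmetic on the two −2 log λ values given in the text; 24.996 is the standard chi-square critical value for df = 15 at α = 0.05:

```python
# Likelihood-ratio test assembled from the -2 log lambda values in the text
dev_ideal = 876.815       # intercept-only (ideal) model
dev_restricted = 417.427  # model with the selected variables
lr_stat = dev_ideal - dev_restricted

critical = 24.996  # chi-square critical value, df = 15, alpha = 0.05 (table value)
print(round(lr_stat, 3))   # 459.388
print(lr_stat > critical)  # True -> reject the null hypothesis
```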
Since large Chi-square values are taken as an indication of the model's predictive power and adequacy, the prediction of the generated unrestricted model is found to be effective here (Sullivan, Ferguson & Donndelinger 2011). The parameter estimates obtained from the Multinomial Logistic Regression Model and the goodness-of-fit measures are detailed in the following tables (Tables 5 and 6).
When the goodness-of-fit measures and likelihood ratio tests are examined in the context described above, it can be seen that all variables are statistically significant in the model.
To estimate the Multinomial Logistic Regression Model, the category with the highest frequency is chosen as the reference category. Here, Type 1 is selected as the reference, and in the estimation results the regression coefficient for each type is interpreted as how that category compares to the reference category (Kuhfeld, Tobias & Garratt 1994).
The raw coefficients of the logistic model are not directly interpretable; instead, the odds ratios (indicated as Exp(β) in Table 5) are examined: an odds ratio below 1 indicates that the variable decreases the odds of the category relative to the reference, while a value above 1 indicates that it increases them (Agresti 2002).
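The odds-ratio reading of the coefficients can be illustrated with hypothetical β values (invented for the sketch, not the fitted values from Table 5):

```python
import math

# Hypothetical coefficients (reference category: Type 1)
betas = {"MATERIAL": 0.9, "ELEVATION": -0.4}

for name, beta in betas.items():
    odds_ratio = math.exp(beta)  # Exp(beta), the quantity tabulated in Table 5
    direction = "raises" if odds_ratio > 1 else "lowers"
    print(f"{name}: Exp(beta) = {odds_ratio:.2f}, a one-unit increase {direction} "
          f"the odds of this type relative to Type 1")
```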
In the parameter estimates of the model, it is apparent that the ABS variable for Type 3, and MATERIAL and ELEVATION for Other, have insignificant Wald test statistics due to the insufficient sample size. Fortunately, this does not affect the overall fit of the model. The overall correct assignment rate of the predicted model is 84.1% (Table 7).
Finally, it can be reported that the percentage of correct assignment of the model is relatively high. The correct assignment ratios obtained for Types 1 and 2 are higher than those for Type 3 and Other, because of the lower sample sizes of the latter.

Summary, discussion and future research
This study first and foremost aims to create a proper classification based on the high number of arrowheads from Karamattepe. With the clear lack of scholarly interest in the metal finds of the area (Baykan 2015), the absence of a common typology makes it harder for the archaeologists working in the area to compare production processes and to clarify the social and economic relations of the period. This lack of a common typology formed the impetus for this study, and the results are expected to facilitate research on Archaic arrowheads in the region. The study also aims to present an example, for researchers working on similar cases, of the use of technology for the analysis of archaeological data; the completion of missing variables in an archaeological data set is not widely discussed in the literature despite its frequent occurrence (Sneath 1982).
The arrowhead finds in Karamattepe were first classified using an expert-based typology, based on the visual qualities of the previous finds. This study instead proposes an objective categorization tool that includes the metric and categorical qualities of these arrowheads. For the classification method, Multinomial Logistic Regression was employed, used in conjunction with the Multiple Imputation (MI) FCS method to complete the missing data and thereby provide a more coherent estimate. Analysis of all the indicators shows that the model is statistically coherent. In addition, the accordance of the model results with the expert-based typology is considered an important factor for its coherence: the results are approximately 84% in accordance with the expert-based typology.
As stated above, the data set at hand has several qualitative and regional limitations, and any common typology depicting an overall picture of Archaic arrowheads in this region should be considered imprecise. However, if common metrics could be provided for the arrowhead finds in the neighbouring regions, the model could serve as a guide for a general description.
Hopefully, this study will be considered a first step towards a comprehensive typology of the Archaic arrowheads of Western Anatolia. In order to create a more inclusive typological model, future studies should collect data from a wider area within the region, while the imputation and categorization techniques are updated with the newly acquired data. Future studies of wider extent are expected to contribute to the sociological, economic and technological analysis of archaeological research in Western Anatolia.
The multiple imputation method typically includes three main steps. The first step involves specifying and generating plausible synthetic data values, called imputations, for the missing values in the data. The second step consists of analysing each imputed data set with a statistical method that estimates the quantities of scientific interest. The third step pools the m estimates into one estimate, thereby combining the variation within and across the m imputed data sets (van Buuren 2007). In the notation of the procedure, Y_j denotes one of k incomplete random variables (j = 1, …, k) and Y = (Y_1, …, Y_k). Y_j^obs and Y_j^mis denote the observed and missing parts of Y_j. The missing part of each variable Y_j is modelled from the k−1 predictors Y_−j = (Y_1, …, Y_{j−1}, Y_{j+1}, …, Y_k), where the nature of the relation is primarily estimated from the units contributing to Y_j^obs. Joint modelling (JM) and fully conditional specification (FCS) have appeared as the two recent approaches to multiple imputation.
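The pooling in the third step follows Rubin's rules: the pooled estimate is the mean of the m per-imputation estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. A sketch with hypothetical numbers:

```python
import statistics

# Hypothetical point estimates and variances from m = 5 imputed data sets
estimates = [2.10, 2.25, 2.05, 2.18, 2.12]
variances = [0.040, 0.043, 0.039, 0.041, 0.042]
m = len(estimates)

q_bar = statistics.mean(estimates)     # pooled point estimate
u_bar = statistics.mean(variances)     # average within-imputation variance
b = statistics.variance(estimates)     # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b    # total variance (Rubin 1987)

print(round(q_bar, 2))      # 2.14
print(round(total_var, 4))  # 0.0481
```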

Table 3: Descriptive statistics before and after data completion.

Before evaluating the parameter estimates of the model, it is essential to check its overall effectiveness and predictive power. The goodness-of-fit coefficient (−2 log λ) was found to be 876.815 for the ideal model (which has only a fixed term) and 417.427 for the restricted model (predicted with the selected variables). The test statistic used to test goodness of fit under the hypothesis of validity of the ideal model is the difference χ² = 876.815 − 417.427 = 459.388.

Table 5: Goodness-of-fit measures of the multinomial logistic regression model.

Table 6: Parameter estimates of the multinomial logistic model.
a. The reference category is: Type 1.
b. This parameter is set to zero because it is redundant.
c. A floating point overflow occurred while computing this statistic; its value is therefore set to system missing.