The Nif Excavation Project is carried out by Elif Tül Tulunayin the southeastern part of
The Nif Dağı Research Project has been led by Prof. Dr. Elif Tül Tulunay since 2006. Nif Dağı (Mount Nif) is located at the eastern side of Izmir province, in Turkey’s western Anatolia region. The project is carried out in four different excavation sites, which are located in the southeast of Nif Dağı (
Baykan’s classification study only took into consideration the material (iron or bronze) and the observable similarities of the arrowheads, especially those of appearance; also, this research was limited to items from this particular excavation area.
Since 2017, all metric data obtained in the related excavation process have been digitalized. Nevertheless, many variations within 10 years of collected data and various omissions due to missing measurements have been detected in the data set.
The main objective of the current study is to create an unbiased classification tool which will cover both metric and categorical characteristics for present and future arrowheads gathered from Karamattepe, an outstanding region of West Anatolia for metal production in the Archaic period; as well as to resolve the lack of a common typological base by providing a comparison with similar –contemporaneous – arrowheads found in nearby regions. Of course, it would not be accurate to create a comprehensive model to define all types of Archaic period arrowheads from such a limited data set; however, the common metrics for the arrowheads are thought to be sufficient for a general assessment.
Given this objective, the following sections include information about the data at hand, followed by information on the as yet missing parts of the data set and an overall introduction of the data completion model selected in accordance with the lack of a reliable estimation model. Classification results of the Imputation Method will be discussed in light of the aim of the study, followed by suggestions for future methodologies.
This section contains illustrative statistics pertaining to the arrowheads, excavated within a 10-year interval. Below is the distribution of a total of 483 arrowheads; 468 made of iron and 15 made of bronze, according to Baykan’s typology (Table
Distribution of arrowheads according to the typology developed by Baykan (
Typology | Frequency | % |
---|---|---|
Type 1 | 289 | 59.8 |
Type 2 | 129 | 26.7 |
Type 3 | 28 | 5.8 |
Type 4 | 13 | 2.7 |
Type 5 | 8 | 1.7 |
Type 6 | 5 | 1 |
Type 7 | 2 | 0.4 |
N/A | 9 | 1.9 |
Total | 483 | 100 |
The descriptive statistics calculated from the data at hand for all the metric variables of arrowheads are: Weight (gr), Length (cm), Width (cm), Thickness (cm), Case Width (cm), Stem Width (cm), Driller Length (cm), Helve Thickness (cm), Material (bronze, iron), Absence (Yes, No) and Elevation (cm). They are shown in Table
Descriptive statistics of the arrowheads.
Total | Type | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type 1 | Type 2 | Type 3 | Type 4 | Type 5 | Type 6 | Type 7 | N/A | ||||||||||||
Mean | n | Mean | n | Mean | n | Mean | n | Mean | n | Mean | n | Mean | n | Mean | n | Mean | n | ||
Weight | 8.92 | 456 | 8.05 | 274 | 11.87 | 123 | 5.08 | 28 | 8.70 | 12 | 4.38 | 6 | 4.43 | 2 | 10.68 | 2 | 11.12 | 9 | |
Length | 5.67 | 471 | 5.79 | 286 | 5.59 | 123 | 5.74 | 28 | 5.62 | 13 | 3.56 | 8 | 3.75 | 4 | 4.40 | 2 | 5.99 | 7 | |
Width | 1.05 | 324 | 1.02 | 268 | 1.15 | 24 | 1.37 | 3 | 0.96 | 12 | 1.24 | 7 | 0.97 | 3 | 1.50 | 1 | 1.57 | 6 | |
Thickness | 0.80 | 220 | 0.80 | 128 | 0.94 | 57 | 0.45 | 18 | 0.77 | 3 | 0.66 | 6 | 0.40 | 1 | 0.85 | 2 | 0.66 | 5 | |
Case width | 1.13 | 110 | 0 | 1.16 | 102 | 0 | 0 | 0.70 | 4 | .070 | 3 | 0 | 1.2 | 1 | |||||
Helve thickness | 0.55 | 295 | 0.55 | 248 | 1.34 | 5 | 0.40 | 24 | 0.50 | 12 | 0.80 | 1 | 0 | 0.30 | 1 | 0.75 | 4 | ||
Stem width | 1.43 | 28 | 0.90 | 1 | 0 | 1.44 | 25 | 0 | 1.30 | 1 | 0 | 1.70 | 1 | 0 | |||||
Driller length | 3.65 | 425 | 4.12 | 261 | 2.59 | 116 | 4.52 | 26 | 2.57 | 12 | 2.50 | 4 | 2.35 | 2 | 4.40 | 1 | 2.73 | 3 | |
Elevation | 437.2 | 419 | 436.9 | 257 | 437.5 | 110 | 439.4 | 23 | 435.6 | 8 | 436.9 | 8 | 434.9 | 3 | 437 | 2 | 438.1 | 8 | |
Abs | Yes | 103 | 46 | 38 | 3 | 7 | 5 | 0 | 2 | ||||||||||
No | 338 | 225 | 77 | 20 | 4 | 3 | 2 | 6 | |||||||||||
Material | Iron | 468 | 289 | 129 | 28 | 13 | 0 | 0 | 8 | ||||||||||
Bronze | 15 | 0 | 0 | 0 | 0 | 8 | 2 | 1 |
For the classification process, given the aim of the study, a Multinomial Logistic Regression Model (
Hence, in order to establish a reliable model that fits our research purpose, the missing data must be completed first. Due to the physical inaccessibility of the excavated material that is currently stored in local museum depots, a data completion methodology was adopted using a suitable data imputation method. In the following section, first the theoretical information about the applied methods will be detailed followed by the results obtained by the application.
This section discusses the missing data problem in statistics and categorization models, as well as the Multiple Imputation (MI) Method and cases from literature.
The problem of missing variables in a data set can be found in almost all empirical research (
The incomplete data problem may either evolve naturally, or deliberately. The data set may include latent variables due to the design of the researcher or deviation of respondents (
In the scientific literature, missing data is addressed in various ways; the first one is to omit the missing components and only treat the observed ones (
Another approach assigns a predicted value for each missing item in a data set, using the means of observed components to replace the missing values. This algorithm is called Single Imputation and applies standard statistical procedures to a newly predicted complete dataset. Since single imputation considers the process as if there is no missing value in the dataset, it cannot reject the uncertainty of predictions; hence it might yield a biased resulting variance of estimated variables which tend to converge to zero (
Rubin (
In the following section, after a short description about MCMC to clarify the context, more detailed information about MI will be given. For more background about these methods in the literature, see Horton & Lipsitz (
A Markov chain with a stationary distribution π, is a sequence of random variables in which the distribution of each element depends on the value of the previous one. In MCMC, one constructs a Markov chain long enough for the distribution of the elements to stabilize to a common distribution. This stationary distribution is the distribution of interest. By repeatedly simulating steps of the chain, it simulates draws from the distribution of interest (
In Bayesian inference, information about unknown parameters is expressed in the form of a posterior probability distribution. That is, through MCMC, one can simulate the entire joint posterior distribution of the unknown quantities and obtain simulation-based estimates of posterior parameters that are of interest. (
Suppose that we have a
where we assume
The procedure starts with Bayesian learning for variables, and then an MCMC technique is employed to draw samples from the probability distributions learned by the Bayesian network. The algorithm continues iteratively, until convergence is reached. This method provides unbiased estimates, as well as the confidence levels of the imputation results (
MI is a MCMC-based method for the analysis of incomplete data sets. The main assumption of the method derives from the assumption that missing observations show a random distribution. A statistical analysis using multiple imputation method typically includes three main steps. The first step involves specifying and generating plausible synthetic data values, called imputations, for the missing values in the data. The second step consists of analysing each imputed data set by a statistical method that will estimate the quantities of scientific interest. The third step pools the m estimates into one estimate, thereby combining the variation within and across the m imputed data sets (
According to the notation of the procedure,
Joint modelling starts its procedure by determining a parametric multivariate density
Simple and versatile
Attributes close to the data, omits impossible data
Subset collection of predictors
Modular, can conserve costly work
Successful, both in simulations and execution
The simplicity of the method resulted in its applicability by several software programs. FCS application can be executed by the
Before the data imputation procedure, the missing data distribution within all the variables (Weight (gr), Length (cm), Width (cm), Thickness (cm), Case Width (cm), Stem Width (cm), Driller Length (cm), Helve Thickness (cm) and Elevation (cm)) of the arrowhead data set has been examined. The “Missing Data Patterns” graph lists distinct missing data patterns. It shows that the data set does not have a monotone missing pattern which means that the distribution of absence does not depend on the data. To be more specific, in this study, the missing data in the studied variables was not originating from (a) specific arrowhead(s) (see Figure
Distribution of missing values.
Detailed information and comparison regarding various data imputation methods can be found in Demirtas, Freels & Yucel (
The following section describes the results from the selected data completion model MI and the classification study.
Based on the FCS based MI results, the most suitable model was selected from 5 different imputation models. The selection criterion was set to be the minimum difference between average and standard deviation values of missing and completed datasets. Table
Descriptive statistics before and after data completion.
Original Data | Missing Data | Completed Data (N:483) | |||||
---|---|---|---|---|---|---|---|
N | Mean | Standard Deviation | Frequency | % | Mean | Standard Deviation | |
Weight | 456 | 8.91 | 3.82 | 27 | 6 | 8.92 | 3.86 |
Length | 471 | 5.67 | 1.30 | 12 | 2 | 5.67 | 1.30 |
Width | 324 | 1.04 | 1.19 | 159 | 33 | 1.16 | 1.20 |
Thickness | 220 | 0.80 | 0.24 | 263 | 54 | 0.78 | 0.33 |
Case Width | 110 | 1.12 | 0.25 | 373 | 77 | 1.00 | 0.31 |
Helve Thickness | 295 | 0.55 | 0.75 | 188 | 39 | 0.60 | 0.72 |
Stem Width | 28 | 1.42 | 0.18 | 455 | 94 | 1.51 | 0.29 |
Driller Length | 425 | 3.64 | 1.03 | 58 | 12 | 3.62 | 1.05 |
Elevation | 419 | 437.20 | 3.32 | 64 | 13 | 437.22 | 3.38 |
10 Iterations have been executed in the procedure, and the existing typology classification (
The descriptive statistics obtained as a result of the procedure are presented below in comparison with the original data. Measured metric variables of arrowheads are Weight (gr), Length (cm), Width (cm), Thickness (cm), Case Width (cm), Stem Width (cm), Driller Length (cm), Helve Thickness (cm), Material (bronze, iron), Plan Square (a location code in excavation area), Absence (Yes, No) and Elevation (cm).
After data completion, Multinomial Logistic Regression Analysis (Multinomial-LOGIT) was again carried out to determine the extent to which the arrowheads coincide with the given estimates and types determined according to the existing typology. Prior to the analysis, the first three categories of the type determined by this expert-based typology were preserved; while the other types (from Type 4 to Type 7) with low frequency were combined to form a single group called
According to the Chi-Square Test, the null hypothesis is rejected (α = 0.05). The generated model is sufficient for prediction (
Goodness-of-fit criteria.
Value | |
---|---|
882.815 | |
453.427 | |
Cox and Snell’s Pseudo R^{2} | 0.647 |
Nagelkerke R^{2} | 0.750 |
McFadden | 0.524 |
Pearson Chi – Squared | 11443.137 |
Relative Chi – Squared | 417.427 |
Since the large Chi-square values are taken as an indication of the model’s predictive power and adequacy, prediction of the generated unrestricted model is found to be effective here (
Measures of good compliance with the multi-territorial logistic regression model.
GoF (Restricted Model) | Likelihood Ratio Test | |||||
---|---|---|---|---|---|---|
AIC | BIC | –2 |
χ^{2} | df | p | |
Intercept | 453.427 | 527.030 | 417.427^{a} | 0.000 | 0 | |
WEIGHT * LENGTH | 581.545 | 642.881 | 551.545^{b} | 134.118 | 3 | 0.000 |
MATERIAL | 518.147 | 579.482 | 488.147 | 70.719 | 3 | 0.000 |
DRILLER LENGTH | 744.169 | 805.504 | 714.169^{b} | 296.742 | 3 | 0.000 |
ABS | 456.826 | 518.162 | 426.826^{b} | 9.399 | 3 | 0.024 |
ELEVATION | 461.745 | 523.081 | 431.745^{b} | 14.318 | 3 | 0.003 |
^{a} This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of freedom.
^{b} Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged.
Parameter estimates of the multinomial logistic model.
Type^{a} | Variables | B | St. Error | Wald | df | p. | Exp(β) | %95 CI for Exp(B) | |
---|---|---|---|---|---|---|---|---|---|
Lower Bound | Upper Bound | ||||||||
2 | Intercept | –91.247 | 35.72 | 6.522 | 1 | 0.01 | |||
WEIGHT * LENGTH | 0.057 | 0.008 | 52.05 | 1 | 0.00 | 1.058 | 1.042 | 1.075 | |
[MATERIAL=IRON] | –1.278 | 0.000 | 1 | 0.279 | 0.279 | 0.279 | |||
[MATERIAL=BRONZE] | 0^{b} | 0 | |||||||
DRILLER LENGTH | –2.745 | 0.275 | 99.64 | 1 | 0.00 | 0.064 | 0.037 | 0.110 | |
[ABS=YES] | 0.961 | 0.436 | 4.843 | 1 | 0.02 | 2.613 | 1.111 | 6.147 | |
[ABS=NO] | 0^{b} | 0 | |||||||
ELEVATION | 0.223 | 0.082 | 7.393 | 1 | 0.00 | 1.250 | 1.064 | 1.467 | |
3 | Intercept | –106.93 | 38.35 | 7.774 | 1 | 0.00 | |||
WEIGHT * LENGTH | –0.125 | 0.024 | 26.65 | 1 | 0.00 | 0.882 | 0.842 | 0.925 | |
[MATERIAL=IRON] | –0.112 | 0.000 | 1 | 0.894 | 0.894 | 0.894 | |||
[MATERIAL=BRONZE] | 0^{b} | 0 | |||||||
DRILLER LENGTH | 2.281 | 0.434 | 27.56 | 1 | 0.00 | 9.787 | 4.177 | 22.934 | |
[ABS=YES] | 0.826 | 0.756 | 1.193 | 1 | 0.27 | 2.284 | 0.519 | 10.054 | |
[ABS=NO] | 0^{b} | 0 | |||||||
ELEVATION | 0.228 | 0.087 | 6.803 | 1 | 0.00 | 1.256 | 1.058 | 1.490 | |
4 | Intercept | –43.918 | 1228 | 0.001 | 1 | 0.97 | |||
WEIGHT * LENGTH | 0.045 | 0.010 | 20.21 | 1 | 0.00 | 1.047 | 1.026 | 1.068 | |
[MATERIAL=IRON] | –20.958 | 1227 | 0.000 | 1 | 0.98 | 7.907E-10 | 0.000 | .^{c} | |
[MATERIAL=BRONZE] | 0^{b} | 0 | |||||||
DRILLER LENGTH | –2.280 | 0.374 | 37.24 | 1 | 0.00 | 0.102 | 0.049 | 0.213 | |
[ABS=YES] | 1.637 | 0.595 | 7.567 | 1 | 0.00 | 5.138 | 1.601 | 16.491 | |
[ABS=NO] | 0^{b} | 0 | |||||||
ELEVATION | 0.153 | 0.127 | 1.461 | 1 | 0.22 | 1.166 | 0.909 | 1.495 |
^{a} The reference category is: Type 1.
^{b} This parameter is set to zero because it is redundant.
^{c} Floating point overflow occurred while computing this statistic. Its value is therefore set to system missing.
When the compliance measures and likelihood ratio tests are examined in the context described above, it can be seen that all variables are statistically significant in the model.
In order to estimate the Multi-Nomial Logistic Regression Model, the category that has the highest frequency is determined as the reference category. Here, Type 1 is selected as the reference category and in the estimation results, For each type, the regression coefficient is interpreted as how that category compares to the reference category (Kuhfeld,
Interpretation of the coefficients in the logistic model does not make sense, and variables with values less than 1 odds ratio (indicated as Exp (β) on the Table
In the parameter estimates of the model, it is apparent that the ABS variable for
Classification results.
Type | Predicted | |||||
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | |||
1 | 255 | 14 | 2 | 0 | ||
2 | 20 | 95 | 0 | 0 | ||
3 | 16 | 0 | 7 | 0 | ||
4 | 5 | 12 | 1 | 14 | ||
Finally, it can be reported that the percentage of correct assignment of the model is relatively high. Correct assignment ratios obtained for
This study first and foremost aims to create a proper classification based on the high number of arrowheads from Karamattepe, With the clear lack of scholarly interest in the metal finds of the area (
The arrowhead findings in Karamattepe were first classified using an expert-based typology, based on the visual qualifications of the previous finds. This study proposes instead to develop an objective categorization tool to include the metric and categorical qualifications of these arrowheads. For the classification method, the Multinomial Logistic Regression has been employed, used in conjunction with the Multiple Imputation (MI) FCS method to complete the missing data in order to provide a more coherent estimate. Analysis of all the indicators shows that the model is statistically coherent. In addition to that, the accordance of the results from the model with the expert-based typology is considered as an important factor for its coherence. The results are approximately 84% in accordance with the expert-based typology.
As stated above, the data set at hand has several qualitative and regional limitations. Any common typology depicting an overall picture of archaic arrowheads in this region should be considered imprecise. However, if common metrics could be provided for the arrowhead findings in the neighbouring regions, the model could be considered as a guiding model for a general description.
Hopefully, this study will be considered as the first step to determine a comprehensive typology of the Archaic arrowheads of Western Anatolia. In order to create a more inclusive typological model for future studies, it is essential that data from a wider area within the region should be collected; while the imputation and categorization techniques are also updated with this newly acquired data. Future studies with wider extents are expected to contribute to the sociological, economic and technological analysis in archaeological researches in Western Anatolia.
I am grateful to the site director, Prof. Dr. Elif Tül Tulunay and her deputies Assoc. Prof. Dr. Daniş Baykan and Assoc. Prof. Dr. Müjde Peker for their support.
The author has no competing interests to declare.