A critical aspect of analysing an archaeological site is identifying the network of relationships between the things we find and the locations where we find them. These associations are typically determined by a combination of quantitative analyses and the professional knowledge and intuition of the archaeologist, but where exactly is the boundary between what is truly empirical field data and what is inferred through our prior knowledge and field methods? How can we best support those inferences? This paper is a critical evaluation of that boundary, intended to ground a quantitative analysis, as firmly as possible, on only that which we can directly observe – the thing and its location – and to derive associations from that basis alone. To do so, the approach described here relies on a combination of set and graph theories rather than statistical or spatial methods. This revised ontology allows a formalization, in combinatorial terms, of the underlying structure of contexts and assemblages, and suggests a clear association between archaeological site analysis and a well-studied class of set and graph covering problems. This, in turn, points towards potential algorithmic solutions for a more holistic parsing of the total relationships between sites, contexts, assemblages, proveniences, and artefacts.
A fundamental aspect of archaeological work is identifying patterns within a site of interest. The data most typically used to identify interpretable patterns are derived from our unit locations and stratigraphic levels, section profiles, numerous maps and detailed plans in addition to any artefacts collected. This information constitutes the empirical archaeological record – the physical, observed, measured, counted, and mapped samples from which we will build,
As we proceed from excavation to interpretation, each step of the archaeological process entails a certain increase in abstraction from those initial empirical data. Archaeologists commonly expect, due to the incomplete nature of archaeological materials, that our inferences will reflect a certain amount of necessarily interpolated and extrapolated conclusions. Thus, we infer patterns from both the consistencies and the discontinuities between and among the data of the archaeological record. Each interpretative step we take away from the empirical data leads to an aggregation of further inferences. Such abstractions also, necessarily, involve a corresponding degree of information loss as particulars are subsumed into generalizations.
The uncertainty introduced by moving incrementally further from the empirical basis of our data underscores the most difficult and pertinent question for interpreting the archaeological record – how can we show that our inferences, inasmuch as they are based on that empirical record, are correct? In other words, are we reasonably certain that we’ve correctly identified which samples belong to which contexts? Can we demonstrate that our assemblages are, in fact, related? How can we provide stronger evidence to support whether our samples are truly associated? Is there a way to penetrate the intervening layers of noise, untangle the cumulative transformations of postdepositional processes, and get a glimpse of the site as it was originally?
At the core of it, the problem is that the analyses of spatial and temporal patterns of interest are derived from the underlying structure of relationships between those empirical
The process must start, then, not with the site’s spatial patterns or assemblages, but by first establishing the underlying structural network of empirical associations within and between the site’s excavated samples. In other words, there is a certain amount of preprocessing of the field data that must occur in order to have a more solid and supportable means of access to the information needed to reconstruct the other patterns of interest. The first step should be to determine
This paper presents a methodology, with its quantitative and computational implications, to ascertain those
The premise of the methodology presented here is that an assemblage of artefacts from any excavated sample must be a subset of the total assemblage within its associated stratigraphic context (see Figure
The goals of intrasite analysis based on the relationships between the basic elements of a site.
Thus, this methodology entails what should be the initial stage of analysis – finding and defining the empirical interrelationships. Therefore, this step occurs
Different regions, specializations, and subdisciplines tend to evolve their own jargon and nomenclature for field excavation and its units (e.g., “levels” versus “splits”, “units” or “ops and subops”, etc.). As a Northeastern U.S. cultural resource archaeologist, I generally work (and think) in terms of discrete gridded units and their vertical subdivisions – i.e., systematic sampling rather than complete context excavation. This does have some ramifications for the terminology I will use here. Most notably, the usage of the terms
References in this paper to
Both
The term
Reconsidering the first stages of archaeological analysis requires reexamining some of our basic assumptions about how to approach decoding the archaeological record. We must start with thinking through our initial processes of establishing the linkages between each excavated sample within a site before we consider reconstructing stratigraphic sequences, spatial distributions, or formation processes. These linkages inform many of the initial questions of interpreting field data, such as:
How many distinct stratigraphic components are reflected by the samples?
Which layers in one unit correspond to which layers in other units?
Which artefact types occur together and may represent a deposited assemblage?
Which assemblages are related, and how?
Which units contain related layers or materials?
How many distinct formation processes produced the observed stratigraphy?
Often, these questions and the initial relationships between excavated samples are matters of intuition and professional judgement by the archaeologist. For small or single occupation sites, this is relatively manageable by direct assessment of artefact inventories and contingency tables, maps, field notes, and stratigraphic section profiles.
Larger or more complicated sites, however, can quickly become intractably difficult. Such sites can present numerous obstacles to assessing the stratigraphic contexts from soil section data. A variety of field and site conditions can lead to ambiguous boundaries between soil strata or other layered deposits. For example, sampling units may not be spatially contiguous. A site may have had multiple overlapping occupations and/or activities, each producing large and diverse artefact assemblages. Different contexts could intrude through or intermingle with previous or contemporaneous deposits. There may be significant postdeposition disturbance due to natural processes, later historical occupations, or modern construction. Soil strata may not have clear and obvious demarcations, or they may consist of multiple lenses of disparate soils.
As these complexities multiply, the cumulative processes affecting the spatial distributions and stratification of the soil matrix quickly preclude any simple assessment and interpretation of stratigraphic contexts. The original linkages between samples are not always apparent. In these cases, it may not be feasible to reconstruct the stratigraphic sequences and their spatial boundaries from soil descriptors or otherwise to determine associations between samples solely by their stratigraphic matrix. A supplemental or alternative diagnostic measure is required.
The artefact content of an excavated sample, though certainly not the only pertinent data, is generally considered an important diagnostic for reconstructing those associations. Whether by typology, known temporal range, or functional or morphological classification, the artefacts provide a significant source of information for identifying the related proveniences and contexts by which the sequence and spatial boundaries of a site’s deposits are determined. Much of the early work in statistical archaeology (see historical retrospectives such as
Being able to associate groups of artefacts into assemblages is an essential aspect of linking excavated samples to their contexts in conjunction with (or in the absence of) contextual stratigraphic matrices. With the advent of spatially oriented statistics in archaeology (e.g.,
Quantitative approaches for assemblage analysis still tend to focus, however, on either specific artefact subsets (e.g.,
I don’t intend an exhaustive survey of the literature and methods, merely to highlight that determining the patterned structure of deposition within an archaeological excavation has motivated significant progress in quantitative and computational analyses. Intrasite assemblage and stratigraphic analyses, in particular, present distinct methodological challenges. Even under ideal data recovery conditions – i.e., a well-preserved site, clear stratigraphic demarcations, complete feature and component excavation, point-provenience mapping of finds, well-established typologies and chronologies, etc. – it is difficult to untangle the spatial network of relationships within a site. Often, however, it is the less than ideal conditions for which these methods are needed most, yet such sites provide the least suitable data for the majority of current approaches.
Most of my work is with just such “less than ideal” sites, and various attempts at adapting quantitative methods to address problematic analyses have led to the methodology presented here. In cultural resource management (CRM) archaeology, it is very common that the terms and scope of excavations are either determined or substantially constrained by non-archaeological considerations. The spatial extent of excavations may be limited by the footprint of potential construction effects, for example, or only a certain percentage of a site’s estimated area may be designated for systematic excavation. Time, schedule, and budgets are always a concern. This, of course, frequently results in partial site excavations, limited samples, noncontiguous excavation units, and incomplete assemblages. Such sites can and should certainly still yield significant information, but the constraints and fragmentary data do limit the options for analysis. These restrictions also accentuate certain limitations and assumptions inherent in existing methods for spatial and contextual analyses of assemblages or excavated components.
The majority of approaches used to analyse associations within assemblage data consist of a target diagnostic of interest (e.g., particular types or categories of finds) and statistical inferences based on these tabulated type frequencies. Such frequency tables effectively reflect a compositional profile that characterizes each sample and highlights similarities and/or distinctions across these samples. The reliance on contingency tables emphasizes pairwise comparisons, based on various distance measures, within and between types and their sampling unit of analysis (i.e., site, context, activity area, or excavated component).
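As a concrete illustration of such pairwise comparisons, one widely used distance-style measure is the Brainerd-Robinson coefficient, which compares the percentage profiles of two samples' type frequencies. The sketch below is a minimal Python implementation; the type labels and counts are invented for illustration:

```python
from collections import Counter

# Hypothetical type-frequency profiles for two excavated samples
s1 = Counter({"creamware": 10, "pearlware": 5, "nail": 20})
s2 = Counter({"creamware": 8, "pearlware": 12, "nail": 15})

def brainerd_robinson(a: Counter, b: Counter) -> float:
    """Brainerd-Robinson similarity: 200 minus the summed absolute
    differences of the two samples' percentage profiles (range 0..200,
    where 200 means identical compositional profiles)."""
    types = set(a) | set(b)
    total_a, total_b = sum(a.values()), sum(b.values())
    return 200 - sum(
        abs(100 * a[t] / total_a - 100 * b[t] / total_b) for t in types
    )

print(round(brainerd_robinson(s1, s2), 1))  # 160.0
print(brainerd_robinson(s1, s1))            # 200 (identical profiles)
```

Measures like this operate purely on the compositional profile, which is exactly the level of abstraction the present methodology seeks to step beneath.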
These approaches entail an assumption that either: 1) the
Approaches grounded in spatial methods involve various analyses of topology or patterning in the distribution of a target feature of interest (e.g., artefact types, assemblages, soil morphology, or features), generally leading to some form of interpolation of density or probability across the area of study. For spatial analyses, there are a number of conditions and requirements for data that necessitate specific field sampling and recording strategies. Distribution analysis – whether by point patterns, lattices, or surfaces – requires sufficient areal sampling coverage and resolution for the study area to provide representative data. The shape of the distribution for the target feature of interest, and any clustering or patterning to that distribution, is highly sensitive to both the precision of the source data and selection and conceptualization of the target feature. Spatial statistics are predicated on proximity weighting of features across geographic (i.e., planar) space, so the lowest resolution of recording determines the overall quality of spatial analysis.
Most of the available methods of spatial analysis are robust for sites where it is feasible to have large area blocks of controlled excavation, with precise mapping of finds and features, or those that entail substantial regular grids of sample units. For many projects, however, that level of recording is
For both assemblage and spatial approaches, the data requirements of current methods present a serious conundrum for either exploratory intrasite analyses or for problematic sites. For analyses in which the assemblage structure through which to associate finds, the spatial structure through which to associate samples, or both, are unknown (i.e., are themselves the objective of the analysis), many of the current methods have limited application. Generally, one or the other is presumed, or required, to have been established previously by some other means.
The methodology presented here was developed specifically to address intrasite analyses in which there are significant ambiguities in the underlying spatial
Despite these restrictions, the sites were considered historically significant and clearly retained substantial areas of intact archaeological deposits. The spatial constraints and discontinuities as well as postdeposition intermingling of assemblages, however, made their evaluation difficult. My solution at the time was essentially to conduct a rudimentary form of biclustering, adapted from weighted gene coexpression network analysis (
Not only was this approach largely successful in identifying the areas of relatively intact deposition, it also showed that (in many cases) even the disturbed contexts retained at least some indications of their original stratigraphic coherence (
What I found was that it is preferable to consider artefact types for their semantic domains of associations rather than for their independent intensities of co-presence. The information content of artefact types is in their
The problem is that considering
If we consider the association between assemblages and stratigraphic contexts as
This small conceptual shift identifies a lower-level space of archaeological interpretation than compositional, type-frequency, or spatial approaches – one in which the local semantics of artefacts within an excavated context inform empirical partitioning of a site into interpretable subsets of assemblages and contexts. Not only does this retain more of the contextual information of artefact deposition, it reveals a substratum to the quantitative evaluation of both assemblages and contexts that might not otherwise be apparent. In short, an additional, otherwise overlooked stage of quantitative evaluation needs to occur prior to the tabulation of contingency tables and spatial analysis.
Ontology, in the broad sense, refers to the philosophical study of the essential nature of things and the relationships between them. Effectively, it is the study of what things
The objective of archaeological study has a certain intrinsic duality in relation to both its past and present existence. Even the most basic terms used in the description of archaeological entities reflect these dual connotations. For example, a
This dualistic and analogical nature to archaeological practice is, at this point, very well established (e.g.,
The emphasis here is on establishing a clear ontology of an archaeological site’s elements that isolates the empirical (i.e., directly observable) attributes inherent in field data. We cannot observe the activities that produced the archaeological deposits, nor can we observe the various formation processes that culminate in the observable archaeological record. Basically, we need to work backwards from effects to causes. The goal is to identify which empirical attributes of the archaeological record indicate the underlying structure of its internal associations, and to isolate those attributes that are distinct from the interpretive implications of those observations.
For an archaeological site, this essentially breaks down into five distinct but interrelated entities: 1) a site, 2) the contexts that represent finalstage site formation processes, 3) the excavated components (i.e., proveniences) that constitute the sampling from the site, 4) the assemblage(s) of archaeological material collected from that site, and 5) the artefacts found within the excavated samples that constitute the elements of the assemblage. Each of these relate to the others, but each entails a specific domain of information to be analysed. As will be discussed below, however, not all of those domains are necessarily appropriate for empirical or quantitative analyses.
The concept of a
In another sense, the modern entity of an archaeological site is a consequence of both postdeposition formation processes and our field methodologies. An archaeological site is defined by our sampled approximation of an extant material distribution. What remains for us to define as a site is the spatial extent and distribution of artefacts and features that comprise the final byproducts of behavior and their subsequent transformation by historical and natural processes (
The more interpretive connotation of site is as the
Thus, an archaeological site is in some measure a byproduct of excavation and field methodology rather than a direct expression of the processes and materials of its formation. In a very real sense, a site becomes defined as a collection of samples that constitute what is deemed the archaeological record. It is these empirical data choices that become the archaeological record and inform all interpretations that follow. Its empirical attributes are limited to those that archaeologists select to observe and record (i.e., the excavated and documented samples) and the methods used. Clearly, this constrains the empirical utility of site, as an analytical entity, to defining the overall sample space for analysis.
Similarly,
Contexts are also empirically problematic. Since the archaeological record of contexts is derived from the same selection of excavated samples as the site itself, contexts share the same problem of empirical ambiguities. Contexts are (empirically, at least) a related subset of the site’s samples. Interpreting a context as a unified area of interest, spatially and/or behaviourally, is wholly dependent on verification that samples from it are in fact related to the same underlying formation processes and initial deposits. This renders the empirical utility of context, and the related concepts of activity areas, somewhat problematic. As discussed in the preceding sections, there can be significant difficulties in
The remaining spatial entity in a site’s ontology is
Effectively, the idea of “provenience-as-sample” rolls together all the information associated with a given excavated component: location, stratigraphic sequence and juxtaposition, geomorphology, and material content. Each of these constituent attributes is an empirical observation, from which both context and site can then be securely built. The data referenced by provenience are the only unambiguously empirical entities related to the association and disposition of an archaeological record. Most critically, the empirical entity of provenience itself (despite its being a product of field methods) serves as a unit of collocation for all of the other attributes. There are, of course, theoretical and interpretive implications to proveniences since they are constituent samples from contexts and thereby related to formation processes and activity areas. For the methodology presented below, however, the focus is on their observable content and their utility as a discrete sample observation.
The concept of
As a coherent collection of (potentially) distinct but related objects, an assemblage also represents the material correlates and byproducts of a discrete episode of human activities in both time and space (see
Association through shared provenience or context (i.e., collocation) might be empirically measurable, but it would necessarily depend on an already determined identification for both the assemblage and its context. The individual objects and features that make up an assemblage do not, of themselves, specifically identify those associations. Therefore, the identification of assemblages, as they are recovered during excavation, is an objective of analysis rather than an empirical datum.
Like all other aspects of archaeological materials, assemblages are equally subject to various postdeposition processes that can introduce uncertainty into their concrete identification and associations. The dispersal and degradation of
What can unambiguously be observed
The network of associations between uniquely identifiable objects indicates their relationships within both assemblages and contexts, and those associations are observable irrespective of their interpretive implications. Associations by artefact collocations, then, constitute the primary empirical indicators of association for deposit assemblages. These associations establish the analytical utility of other observable measurements of artefacts and their spatial distributions.
Our traditional units of interpretive analysis –
With respect to quantitative analyses, the only unambiguously empirical data are those that result from the most basic archaeological practice – i.e., the excavated component (or
The question then becomes a matter of methods that can impute the appropriate associations – artefact to assemblage, and provenience to context – and discriminate the optimal partitioning of the total site into analytically verifiable subsets. If the problem is approached in terms of identifying the networks of associations between artefact and provenience, then the answer is a matter of optimally assigning those elements to their associated collection or set. Rather than trying to identify an unknown assemblage composition profile from type frequencies or discriminate spatial discontinuities between unknown contexts, it would be more appropriate to address the analysis as a
Set theory, or some aspect of it, has played a limited role in archaeological analysis at least since Petrie (
The computational revolutions in quantitative scientific analyses, including those in archaeology, have both benefited from and been limited by the greatest strength of computers – the ability to quickly perform numerical calculations. To find algorithmic (i.e., computational) solutions largely entails finding a way to translate phenomena and concepts into suitable numerical terms. For archaeological problems, however, those translations are not always obvious. Certain archaeological attributes (e.g., coordinates, measurements, dates, or counts) are obviously numerical in nature. When the objectives are the associations and relationships between those entities, enumeration becomes less obvious. It is easy to forget that numbers are merely symbolic representations of
Several branches of mathematics (e.g., set theory, graph theory, combinatorics, etc.) are specifically concerned with just those sorts of relationships in which quantification is more subtle. The basis of the formalization for the archaeological ontology presented above is a setbased, combinatorial approach. A
There are numerous approaches to solving set and combinatorial problems. The following sections present a brief review of set theory covering some of the basic properties of sets, set operations, and an introduction to two generalizations of classical set theory – multisets and fuzzy sets – that have useful potential for applications to archaeological problems. In addition, a generalization on graph theory (i.e., hypergraphs) is introduced that has applications both for visualization of the complexity of these types of sets and for methods specifically designed to evaluate the intersecting areas of such graphs as an independent means of partitioning complex sets.
Table
Definition and notation for common set properties and operations.
| Operation | Symbol | Example | Definition |
| --- | --- | --- | --- |
| Set | {…} | A = {a, b, c} | A collection of elements |
| Cardinality | \|…\| | \|{a, b, c}\| = 3 | Number of elements in a set |
| Empty set | ∅ | {}, \|∅\| = 0 | A set with no elements |
| Universe | Ω or U | A ⊆ Ω | Set of all possible elements in a set’s domain |
| Union | ∪ | {a, b} ∪ {b, c} = {a, b, c} | Elements in either A or B |
| Intersection | ∩ | {a, b} ∩ {b, c} = {b} | Elements in both A and B |
| Symmetric Difference | Δ | {a, b} Δ {b, c} = {a, c} | Elements in either A or B, but not both |
| Multiset Sum | ⊎ | {a, b} ⊎ {a, c} = {a, a, b, c} | Sum of the multiplicities of multisets |
| Subset, Superset | ⊆, ⊇ | {a, b} ⊆ {a, b, c} | A is contained in B; B contains A |
| Strict Sub-, Superset | ⊂, ⊃ | {a, b} ⊂ {a, b, c} | Subset or superset where A ≠ B |
| “in…” | ∈ | a ∈ {a, b} | Element is a member of a set |
| “for all…” | ∀ | ∀x ∈ A | Logical statement used in set declarations |
| “and” | ∧ | x ∈ A ∧ x ∈ B | Logical statement used in set declarations |
| “such that…” | \| or : | {x : x ∈ A} | Logical statement used in set declarations |
| Complement | Aᶜ | Aᶜ = Ω ∖ A | All elements not in A |
| Powerset | ℘(A) | ℘({1, 2}) = {∅, {1}, {2}, {1, 2}} | All possible subsets of a set |
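These classical operations map directly onto the built-in set type of most programming languages. The following Python sketch, with arbitrary example elements, mirrors several of the operations above:

```python
from itertools import chain, combinations

A = {"a", "b", "c"}
B = {"b", "c", "d"}

print(A | B)            # union: all elements in either set
print(A & B)            # intersection: elements common to both
print(A ^ B)            # symmetric difference: elements in exactly one set
print({"a", "b"} <= A)  # subset test: True
print(len(A))           # cardinality: 3

# Powerset of A: all subsets of every size, from the empty set up to A itself
powerset = list(
    chain.from_iterable(combinations(sorted(A), r) for r in range(len(A) + 1))
)
print(len(powerset))    # 2**3 = 8 subsets
```

Multiset and fuzzy-set operations, discussed in the following sections, require generalizations beyond this built-in type.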
A
A Venn diagram of unions and intersections for two sets,
It is first necessary to define a set’s
The size, or
A set may also be a member or
Various basic operations follow from these properties – unions, intersections, and differences. The
Those are some of the basic properties of sets, but two further properties bear directly on the archaeological methodology presented below. Firstly, the elements of a set may themselves be sets. More specifically, a set of sets (or
Using these concepts, it becomes relatively simple to translate the archaeological site ontology described previously in terms of sets. We can define a site in terms of a set
One significant limitation of the classical set concept is that the elements of a set must be
Each element of a multiset
The cardinality (i.e., size) of a multiset is simply the sum of all
As a generalization of classical sets, multisets have equivalent (although also generalized) forms of the various set operations described in the previous section (i.e., union, intersection, powerset, etc.). Since multisets need to consider element multiplicities, however, the operations and their expressions are somewhat different. For example, unions and intersections need to consider a multiset’s repeated elements and to specify subset relationships explicitly. This is needed in order to distinguish whether a union operation results in the sum of multiplicities (if the two multisets are declared disjoint) or the greater multiplicity (if one multiset is a subset of the other). In this case, the multiset sum (⊎) is generally used to indicate the former and the standard union (∪) the latter (e.g.,
Difference between set union ∪, multiset union ∪, and multiset sum ⊎. Note the handling of intersecting elements (element
For intrasite analysis, typically we are not starting with the entire collection of all artefacts contained within the site (i.e., the unknown universe Ω_{S} from which the site’s collection is sampled). The site’s
The objective of intrasite assemblage analysis is to find the natural combinations of these proveniences (i.e., sample subsets) to find whether there is a natural partitioning of the site’s assemblage
By addressing archaeological assemblages as multisets, however, these interrelationships between site, assemblage, and provenience can be formally specified for algorithmic expression while remaining archaeologically intuitive. If we consider site assemblages as multisets, then the mapping from the archaeological entities to multisets further simplifies our setbased ontology. Furthermore, a multiset’s multiplicity and support set introduce a clear method for relating artefact to typology.
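A minimal computational sketch of this mapping uses Python's `collections.Counter` as the multiset type; the artefact-type labels and counts below are invented for illustration:

```python
from collections import Counter

# Two hypothetical provenience assemblages as multisets of artefact types
p1 = Counter({"creamware": 4, "pearlware": 2, "nail": 7})
p2 = Counter({"creamware": 1, "bottle_glass": 3, "nail": 2})

site = p1 + p2               # multiset sum (⊎): multiplicities add
print(site["creamware"])     # multiplicity of one type: 5
print(sum(site.values()))    # cardinality = sum of all multiplicities: 19
print(sorted(site))          # support set: the distinct types, without counts
print(p1 | p2)               # multiset union (∪): greater multiplicity per type
```

Note how the multiplicity records individual artefacts while the support set recovers the typology, keeping both levels of description available in one structure.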
We can now define a site’s assemblage as a multiset
Another limitation of classical set theory is that an element’s membership in any given set is binary – i.e., an element either
Yet another generalization of classical sets specifically addresses this issue of partial, multiple, or ambiguous membership for an element in a set – a
The membership function allows for indistinct boundaries between sets. Fuzzy sets were derived specifically to deal with circumstances in which an element’s membership in any given classical set is not determined by a simple binary (0,1) or Boolean (True, False) decision boundary. In fuzzy set theory, a set with such binary membership is called a “crisp” set. An element’s membership in a fuzzy set, by contrast, consists of a gradient
The membership function for a fuzzy set should not, however, be simply viewed as a probability or proportion of membership. These would, in terms of fuzzy theory, be simple determinants derived from the composition and distribution of the sample space itself. Instead, membership is an ascribed and contextual zonal gradient, indicating the relative “truth” of the element’s assignment to that set. The degree of “truth” is determined relative to other defined sets. The combined
In this way, fuzzy sets can be defined to provide a set membership determinant in circumstances where the decision boundary entails some ambiguity, such as an element’s “proximity” (so to speak) within the mapping of the universal set’s sample space. Since the membership function is not necessarily a function of the elements, instead being imposed based on an ascribed distribution within the field of discourse, the set of output values of the membership function may itself be a fuzzy set.
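Such graded memberships are straightforward to sketch computationally. The following Python function assigns a piecewise-linear membership grade for a hypothetical fuzzy set "tall"; the boundary values are assumptions chosen purely for illustration:

```python
def mu_tall(height_cm: float) -> float:
    """Membership grade in the fuzzy set 'tall'.

    Below 150 cm the grade is 0.0, above 190 cm it is 1.0, and between
    the two boundaries the grade rises linearly. The cutoffs are
    illustrative assumptions, not empirically derived values.
    """
    if height_cm <= 150:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 150) / 40

print(mu_tall(120))  # 0.0  -- unambiguously not tall
print(mu_tall(170))  # 0.5  -- partially a member of 'tall'
print(mu_tall(200))  # 1.0  -- unambiguously tall
```

A crisp set would replace the middle branch with a hard cutoff; the gradient is what lets borderline cases carry partial membership rather than forcing a binary decision.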
Common examples of simple fuzzy set memberships are the assignment of certain continuous values to nominal classifications such as “short, tall” or “small, medium, large” that have indistinct boundaries but concrete connotations. In the shorttall example, a person that is four feet tall is generally considered “short” compared to a person that measures six feet or more in height. A “crisp” membership cutoff of “
Although the membership function
Whereas multisets address repetition of artefact types, fuzzy sets address ambiguous categorical boundaries (see
Multisets allow repetition of elements, and fuzzy sets allow gradient boundaries of element membership within a given set. There are also situations, however, in which a collection of elements might exhibit
In archaeological applications, this may occur for artefact types that have multiple possible categorizations depending on their context. Take, for example, the possible categorizations of a porcelain sherd in historical deposits. A small porcelain sherd could be refined tableware, or a piece from a toy doll or child’s tea set. A large chunk could be an electrical insulator or piece of a toilet. Each of these instances of ‘porcelain’ would have a completely different field of discourse depending on assemblage context. Such artefact types would require a multiplicity of membership functions to each context.
There is an additional generalization of fuzzy sets that allows multiple membership functions for an element – i.e., fuzzy multisets (see
A fuzzy multiset element has an associated membership function depending on its local embedding, but also has additional and contextually differentiated memberships in other separate and distinct fields of discourse. In that way, a word for which the context determines the specific meaning or connotation is associated with the word itself in the lexical support set of the corpus’ multiset, but additionally has separate relative gradients of membership associations dependent on its contextual embedding.
In many ways, the concept of fuzzy multisets addresses the various archaeological caveats noted in the preceding sections. They allow for both the multiplicity required to assert site assemblages as sets of independent discrete artefacts with related types, and the vague boundaries of assemblage and context membership for any given artefact type. Multisets address the repetition of artefacts within a type, but do not allow for ambiguous categorical membership. Fuzzy sets address categorical ambiguity for types, but do not allow for multiplicity. Fuzzy multisets provide a generalized framework that incorporates both, allowing classification schemata that potentially account for more contextually sensitive categories.
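One minimal way to make this concrete is to attach a separate set of context-dependent membership grades to each occurrence of a type. The Python sketch below encodes the porcelain example from above; all grades and context labels are invented for illustration:

```python
# A fuzzy multiset as a mapping from element to a list of occurrences,
# where each occurrence carries graded memberships in several candidate
# functional contexts (grades are illustrative assumptions).
fuzzy_multiset = {
    "porcelain": [
        {"tableware": 0.8, "toy": 0.2, "insulator": 0.0},  # small refined sherd
        {"tableware": 0.1, "toy": 0.1, "insulator": 0.7},  # thick heavy chunk
    ],
}

def multiplicity(fm: dict, element: str) -> int:
    """Multiplicity of an element: the number of occurrences recorded."""
    return len(fm.get(element, []))

def best_context(occurrence: dict) -> str:
    """Context with the highest membership grade for one occurrence."""
    return max(occurrence, key=occurrence.get)

print(multiplicity(fuzzy_multiset, "porcelain"))        # 2
print(best_context(fuzzy_multiset["porcelain"][0]))     # tableware
print(best_context(fuzzy_multiset["porcelain"][1]))     # insulator
```

The point of the structure is that two sherds of the same nominal type can carry entirely different contextual associations while still contributing to the same support set.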
In discrete mathematics, set theory and graph theory are closely related. A graph is, in its most general sense, just a representation of relationships and associations. In its most typical form, a graph is a collection of nodes (also called vertices), which are connected by edges indicating a relationship between each pair of nodes. Graphs can also, however, be considered in terms of sets – a given graph can be defined as a system of sets
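This set-theoretic view of a graph can be made concrete with a short Python sketch in which the edge family is literally a set of two-element sets; the vertex labels here are arbitrary placeholders.

```python
# A graph expressed purely in set terms: a vertex set V and an edge family E
# whose members are two-element subsets of V. Vertex names are arbitrary.
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})}

def adjacent(u: str, v: str) -> bool:
    """Two vertices are adjacent iff {u, v} is a member of the edge family."""
    return frozenset({u, v}) in E

def neighbours(v: str) -> set:
    """Neighbourhood of v: every vertex sharing an edge set with v."""
    return {w for e in E if v in e for w in e} - {v}

print(adjacent("a", "b"))       # True
print(sorted(neighbours("b")))  # ['a', 'c']
```

Adjacency here reduces to a set-membership test, which is the algorithmic efficiency discussed below.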
The recent archaeological interest in social network analysis is, in part, a consequence of this algorithmic efficiency. The major advantage of network analysis lies in using these innate properties of graphs to determine edge adjacency between nodes, rather than relying on more computationally intensive methods for determining proximity and relationships between data points. These efficiencies can also be applied to combinatorial problems, and so offer a potential suite of tools for addressing the archaeological ontology as well. The limitation here, however, remains the issue concerning the
The concept of a hypergraph generalizes the ordinary graph by allowing an edge (a hyperedge) to connect any number of vertices rather than only pairs.
Example hypergraph
Essentially, a hypergraph provides a generalized algebra for representing the various interactions between sets of sets. A hypergraph
Aside from the mathematical methods and implications, hypergraphs also provide a powerful visualization tool for sets of complex relationships (see
Of particular interest, in regards to site and assemblage analyses, are methods of decomposing
In terms of its application to the archaeological ontology as defined as a system of sets, consider a hypergraph of an archaeological site constructed in two possible ways – one in which the vertices are the artefacts, and one in which the vertices are proveniences. If we construct a site hypergraph consisting of artefact vertices, then the hypergraph edges constitute the relationships of artefacts within assemblages (i.e., the associated sets of artefacts), with the edges determined by their collocations through shared sets of proveniences. Conversely, a site hypergraph of provenience vertices would render the edges as contexts (i.e., associated sets of proveniences), with the edge sets determined by shared artefact content. What remains to be determined, then, is the criteria for finding optimal solutions for assigning those edge memberships.
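The two dual constructions just described can be sketched from a single incidence record mapping each provenience to the artefact types it contained. The following Python fragment is illustrative only, with invented provenience and artefact-type labels.

```python
from collections import defaultdict

# Toy incidence record: provenience -> artefact types recovered there.
incidence = {
    "P1": {"ceramic", "lithic"},
    "P2": {"ceramic", "bone"},
    "P3": {"lithic"},
}

# (1) Artefact-type vertices: each provenience induces a hyperedge of the
# types it contains (assemblage relationships).
artefact_hyperedges = {p: types for p, types in incidence.items()}

# (2) Provenience vertices (the dual construction): each artefact type
# induces a hyperedge of the proveniences sharing it (context relationships).
provenience_hyperedges = defaultdict(set)
for p, types in incidence.items():
    for t in types:
        provenience_hyperedges[t].add(p)

print(sorted(artefact_hyperedges["P1"]))          # ['ceramic', 'lithic']
print(sorted(provenience_hyperedges["ceramic"]))  # ['P1', 'P2']
```

The two hypergraphs are duals of one another: each is recoverable from the same incidence data by swapping the roles of vertices and edges.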
The previous discussion touched on some of the parallels between the ontology of archaeological field data and several aspects of sets. Conceptually, at least, the traditional views and treatment of archaeological assemblages, as collections of discrete objects, fit well with the logic of set theory. We can begin to formalize a mathematical structure of archaeological assemblages, using standard set notations from these initial premises, to describe the content of excavated proveniences on its own.
First, let’s stipulate that the total sampled collection of all artefacts at a site constitutes a fixed, finite multiset
Furthermore, each excavated provenience contains an individual sample of artefacts drawn from that unknown total site population Ω_{S}. The total sampled collection
Since
The goal of the intrasite analysis is to find a natural partitioning of
If we consider an assemblage to be a subset of a site’s artefacts indicative of an occupation and/or activity, and consider a context to be a subset of proveniences containing the assemblage deposited by that activity, then we’re looking at evaluating two specific families of sets over the two related domains – the total domain
More explicitly, the objective is to find a member of the powerset ℘(
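The evaluative criterion implied here (that a candidate family drawn from the powerset should partition the total collection) can be sketched as a simple check; the sets below are invented toy data.

```python
from itertools import combinations

def is_partition(total: set, family: list) -> bool:
    """True iff the family's members are pairwise disjoint and jointly
    cover the total set (i.e., the family is a partition of `total`)."""
    covers = set().union(*family) == total
    disjoint = all(a.isdisjoint(b) for a, b in combinations(family, 2))
    return covers and disjoint

artefacts = {1, 2, 3, 4, 5}
print(is_partition(artefacts, [{1, 2}, {3, 4, 5}]))    # True
print(is_partition(artefacts, [{1, 2, 3}, {3, 4, 5}])) # False: 3 repeats
```

In the idealized case every candidate family would pass this test; the following paragraphs concern the more realistic cases in which no family does.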
The more common cases are ones in which artefacts from different natural assemblages are found within the same excavated component, stratigraphic layers are mixed or inverted, later features cut through older deposits, and the spatial distribution of materials is anything but discrete. In these cases, there would be no complete solution in which assemblage or context families of sets would be internally disjoint. Instead, they could be
It is important to consider that a site, as excavated, reflects the cumulative spatial disposition of various sequences of deposit and redeposit. All of the original assemblages will be represented in the samples, but contexts may become conflated by later intrusions and/or post-occupation disturbances. The formalization presented above is assessing the natural partitions within that final
With the natural partitioning of a site defined as the system of sets (
Construed as a graph, the partitioning of a site’s assemblages and contexts entails the identification of discrete communities or
The basic premise of archaeological field practice is to record all necessary information to associate archaeological features and artefacts with their spatial contexts. Similarly, the basic premise of intrasite assemblage analysis is to discern the patterning of classes of artefacts within those spatial contexts in order to infer patterns of activities. The goal of each, essentially, is to provide a means of identifying coherent groups that provide interpretable information within both the total artefact assemblage and spatial units. Since the ideal group membership is an unknown, this is effectively a problem of simultaneous clustering across spatial units and assemblage artefact types – which artefact types consistently occur together in space, and which spatial units consistently contain similar artefact types (or appear to be drawn from the same source assemblage). There are a number of existing methods that can be (or have been) applied to this problem, but the specific nature of archaeological assemblages does present certain difficulties.
As previously noted, archaeological data are rarely “clean” in the sense of being discretely compartmentalized by occupation assemblage or context. Assemblages become mixed within stratigraphic contexts both through occupation activities and post-deposition disturbances. If we suppose, though, that these overlapping interactions of assemblages and contexts may be represented by the membership functions of fuzzy multisets within the site’s field of discourse,
Approaching assemblage analysis as a combinatorial problem, the discrimination of a site’s assemblages and their contexts of deposition become a matter of optimal set partitioning. As described above, this entails an optimal partitioning of the site’s total assemblage multiset
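Exact optimal partitioning is computationally hard in general, so approximation heuristics are the usual recourse. The following is a sketch of the standard greedy set-cover heuristic, offered as one well-known approach rather than the method prescribed here; the universe and candidate subsets are invented toy data.

```python
def greedy_cover(universe: set, candidates: list) -> list:
    """Greedy set cover: repeatedly choose the candidate subset covering
    the most still-uncovered elements. A classic approximation heuristic,
    not an exact optimizer."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidates, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            break  # remaining elements cannot be covered by any candidate
        chosen.append(best)
        uncovered -= best
    return chosen

universe = {1, 2, 3, 4, 5}
candidates = [{1, 2, 3}, {2, 4}, {4, 5}, {5}]
print(greedy_cover(universe, candidates))  # [{1, 2, 3}, {4, 5}]
```

Note that a cover, unlike a partition, permits the chosen subsets to overlap, which matches the mixed-deposit cases discussed above.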
Similarly, the objective of clique problems is to find an optimal subset of vertices (or
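For small instances, the clique problem can be solved by brute force, as in the following illustrative sketch over an invented toy graph; this exhaustive search is feasible only for very small vertex sets, since the general problem is NP-complete.

```python
from itertools import combinations

# Toy graph: vertices and an edge family of two-element sets (invented data).
V = {"a", "b", "c", "d"}
E = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}

def max_clique(vertices: set, edges: set) -> set:
    """Brute-force maximum clique: test candidate subsets from largest to
    smallest, returning the first whose vertices are all pairwise adjacent."""
    for k in range(len(vertices), 0, -1):
        for subset in combinations(sorted(vertices), k):
            if all(frozenset(p) in edges for p in combinations(subset, 2)):
                return set(subset)
    return set()

print(sorted(max_clique(V, E)))  # ['a', 'b', 'c']
```

Here the triangle {a, b, c} is the largest set of mutually adjacent vertices; vertex d connects only to c and so falls outside the clique.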
The formalization presented above is only the initial step toward the problem of evaluating the structuring relationships within archaeological deposits, and an algorithmic implementation of the framework is currently under development. This is still a preliminary effort with a number of issues remaining to be resolved. In particular, further work is needed towards establishing the relationship between the network of associations derived from the set and graph-based ontology and the network of stratigraphic and spatial associations. I believe, however, that the formal ontology presented here provides a more empirically grounded and semantically sensitive approach to those networks of archaeological relationships than is available by strictly statistical or spatial analyses. By readdressing such a fundamental concept as the underlying empirical basis of intrasite associations, my hope is to reinvigorate and broaden the discussion regarding the nature of data for quantitative analyses.
Correctly defining a problem and specifying its components are always necessary steps towards finding a solution. If the goal is a quantitative or computational solution, however, these steps are absolutely critical in the process. Formalizing the problem and specifying its evaluative criteria, typically in a mathematical or formal logical structure, delineates both the possibilities for an algorithmic solution and the form and content of its output (
Intuitively, archaeologists recognize and correct for such biases in their considerations of sites and assemblages and in their interpretations. Our training and insight allow us to identify patterns and their mismatches (e.g., artefacts out of context, disturbed proveniences, etc.). Quantitative and computational solutions, conversely, leave no room for the luxury of intuition. Underspecification and categorical errors in these definitions result in incoherent and erroneous algorithmic outputs. The biggest danger in quantitative and computational approaches is that they will nearly always output
The isolation of discrete assemblages and contexts within a site forms the basis of nearly all archaeological interpretation, and the most prevalent form of empirical data are artefacts and the proveniences from which they are recovered. These field data are central to archaeological methodology. The complex nature of those field data, however, presents significant problems for quantitative inference. The standard procedures of normalization, feature selection, and dimensionality reduction are certainly able to coerce the sparse and high-variance distributions, typical of archaeological assemblages, into suitable conditions to allow the application of any number of statistical techniques. In doing so, however, the complex interrelationships of assemblages and contexts are necessarily simplified with some corresponding degree of information loss. I believe these problems are only further exacerbated by reliance on compositional profiles and artefact type-frequency approaches.
I have presented a different approach that I believe to be more appropriate to the nature of archaeological field data. The combinatorial approach captures more of the internal complexities of assemblages and contexts by viewing them holistically rather than as discrete independent events or variables. A clear structural ontology of site, assemblage, and context suggests that the necessary empirical units of analysis devolve to the most basic elements – proveniences and artefacts. By addressing those elements in terms of combinatorial sets rather than compositional frequencies, the conceptual mappings between the standard archaeological units and various aspects of set theory render an obvious and relatively simple formalization. That formalization, in turn, leads to well-defined and well-studied classes of combinatorial problems in discrete mathematics and graph theory, but remains consistent with an archaeologically grounded ontology. The specific archaeological implementation of these concepts will, however, require further research.
A summary of the methods was presented at CAA2017 in Atlanta, GA as “Unscrambling the Egg – Quantitative, assemblage-based component consociation methods for densely mixed or disturbed contexts”.
Note that I am not referring here to the concepts of assemblage theory in the sense of Deleuze & Guattari (
Typically, there are
For the time being, the other attributes of proveniences as stratigraphic units (e.g., their spatial locations, soil morphology, etc.) are set aside as a separate analysis. Proveniences as spatial objects, with particular stratigraphic attributes, are better addressed by other quantitative methods. The formalization described here provides means to impute the related assemblage and context information to each artefact and provenience, which can then provide data for subsequent analyses.
An indexed set, indicated by the subscript
It is necessary in this case to specify that a given
i.e., the sample space of the site’s universe Ω_{S} with respect to the assemblage and context families of artefacts and proveniences, respectively.
“NP” stands for nondeterministic polynomial time. NP-complete problems are those for which no efficient (polynomial-time) algorithm is known. The time required to solve such a problem grows so quickly with its size that exhaustively computing all possible solutions becomes computationally infeasible. Verifying a given candidate solution, however, can be done in polynomial time.
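The scale of the difficulty can be illustrated with Bell numbers, which count the possible partitions of an n-element set. The short sketch below uses the standard Bell-number recurrence; it is offered purely as an illustration of how quickly the search space grows, while checking any single proposed partition remains only polynomial work.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bell(n: int) -> int:
    """Bell number B(n): the number of partitions of an n-element set,
    via the recurrence B(n) = sum_{k=0}^{n-1} C(n-1, k) * B(k)."""
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

# Even a modest site sample of 10 distinct elements admits over 100,000
# candidate partitions; exhaustive enumeration is quickly infeasible.
print(bell(10))  # 115975
```

Verifying one proposed partition, by contrast, requires only a single pass over its blocks, which is why such problems sit in NP.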
I extend my sincere gratitude for the support and thoughtful insights of the numerous people that have made this work possible, and to the editors and reviewers for their considered and constructive comments. My thanks to Christina Rieth for the opportunity and encouragement that began this research as well as comments through earlier drafts, and to John Hart for encouraging my independent research. My thanks also to Miquel Colobran, Oliver Nakoinz, Grégoire van Havre, and Georg Roth for their conversations and encouragement at recent CAA conferences. A special thanks to Peter Biro, James Cardinal, and Martin Pickands for commenting on earlier drafts. Most of all, I am thankful for my family’s support, especially for my brilliant wife Jennifer Loughmiller-Cardinal – without whom I would not have believed my ideas worth presenting, and whose keen insights always inspire.
The author has no competing interests to declare.