Seriation was one of the earliest applications of computers to an archaeological problem. Despite the origins of the technique in numismatics, the vast majority of coinage studies manually sequence coin hoards and issues. For many periods, the coin designs or legends can be used to provide a date. For the Republican series, however, detailed sequences rely on the use of coin hoard data. In recent years, Correspondence Analysis has become the

Dating artefacts, and by extension contexts, layers and assemblages, is a fundamental concern for all archaeologists. Relative dating by typology is one of the oldest techniques in the archaeologist’s toolbox, if one fraught with uncertainty as it relies so heavily on expert opinion. Petrie (

The process of ordering the rows and/or columns of a table on the basis of values within them is one which attracted statistical attention from quite early on (

O’Brien & Lyman (

Despite the origins of seriation techniques within numismatics, there has been almost no cross-fertilisation between the two disciplines. Numismatists have created quite complex sequences entirely manually (

For the Roman Republican series,

Following presentations of my work analysing Roman Republican coin hoards, the question has often been asked whether Correspondence Analysis could contribute to the problem of dating (

An extract from

The essential problem in assessing the various methods is that we do not know what the correct sequence is. There have been studies using simulated data sets to assess the results of Correspondence Analysis (

Three magistrates were elected each year to oversee the production of coinage. For most of the period in which we are interested, this leads to three different coin types being struck each year.

Each coin type is struck for

The sizes of coinage issues were highly variable. Although we do not know their absolute sizes, we can see their relative sizes via the hoard evidence.

Previously, it was found that including the whole series of late Roman Republican coin types in one analysis was unhelpful due to the complex mixture of underlying patterns (

using a minimum hoard size of 30 well-identified coins;

using the dates of coin types from the standard catalogue, hoards closing

also using the dates from the standard catalogue, all issues ten or more years older than the closing date of the oldest hoard were omitted.

In this way, it was possible to reduce the influence of early issues prior to the closing date of the earliest hoard without reducing the size of those hoards to the extent that they would not be viable in the analysis.
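The selection rules just described can be sketched in a few lines. This is an illustrative Python version, not the author's R program; the array names and toy values are invented for the example.

```python
import numpy as np

# Toy hoard-by-issue table: rows are hoards, columns are coin issues.
# All names and values here are invented for illustration.
rng = np.random.default_rng(0)
counts = rng.poisson(8, size=(6, 12))         # simulated coin counts
closing = np.array([15, 16, 18, 21, 24, 29])  # closing date of each hoard
dates = np.arange(12)                         # catalogue date of each issue

MIN_HOARD = 30  # minimum number of well-identified coins per hoard
TRIM = 10       # drop issues ten or more years older than the earliest closing date

keep_issue = dates > closing.min() - TRIM      # here, issues dated 5 or earlier go
trimmed = counts[:, keep_issue]
keep_hoard = trimmed.sum(axis=1) >= MIN_HOARD  # re-check hoard size after trimming
selected = trimmed[keep_hoard]
```

Trimming before the size check matters: a hoard must still contain at least the minimum number of coins once the early issues have been removed.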

Using this experience as a template, and our knowledge of the pattern of coinage production, it is possible to create simulated data sets that have the same characteristics as real data, but for which we know the correct sequence of hoards and issues.

The simulation program has been written in the statistical software system R (

A simplified flow diagram showing the sequence used in the simulation. Of the inputs shown on the left, only the number of hoards required is compulsory. The remainder have default values, shown on the right, which can be overridden if desired.

The simulation runs through a series of steps:


The sizes of real hoards also follow a log-normal distribution, so the sizes of the simulated hoards are created by sampling from a log-normal distribution with similar characteristics: a mean of 4.36 and a standard deviation of 1.44, restricted to hoard sizes between 30 and 1500. The minimum figure is that used by the author in his analyses of real data. Alternatively, a vector of hoard sizes can be passed to the program using the parameter
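This sampling scheme can be sketched as follows (in Python rather than the R of the actual program; the rejection-resampling loop and the function name are my own, and 4.36 and 1.44 are taken to be the log-scale parameters).

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_hoard_sizes(n, meanlog=4.36, sdlog=1.44, lo=30, hi=1500):
    """Draw n hoard sizes from a log-normal distribution, keeping only
    draws inside [lo, hi] (resampling until enough are accepted)."""
    sizes = []
    while len(sizes) < n:
        draw = rng.lognormal(mean=meanlog, sigma=sdlog, size=n)
        sizes.extend(int(x) for x in draw if lo <= x <= hi)
    return np.array(sizes[:n])

sizes = simulate_hoard_sizes(20)  # twenty hoard sizes between 30 and 1500
```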

Fourteen seriation techniques were tested with the simulated data sets. All the analyses were conducted within the R Statistical System (

The remaining methods listed below were applied to data sets in three forms. For the PCA and PCA Angle methods, raw counts, counts standardised by type, or percentages by hoard were used. As the Bond Energy Algorithm and the BEA Travelling Salesman Problem require non-negative data (
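The chord transform, used below with the BEA methods, preserves non-negativity: each hoard's counts are simply rescaled to unit Euclidean length. A minimal Python sketch (the helper name is mine):

```python
import numpy as np

def chord(counts):
    """Chord transform: scale each row (hoard) to unit Euclidean length."""
    counts = np.asarray(counts, dtype=float)
    norms = np.sqrt((counts ** 2).sum(axis=1, keepdims=True))
    norms[norms == 0] = 1.0  # leave any all-zero rows unchanged
    return counts / norms

x = np.array([[3, 4, 0], [0, 5, 12]])
y = chord(x)  # rows of y have Euclidean norm 1; values stay non-negative
```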

Although CA has become the preferred method for analysing count data since the late 1980s, PCA can be used in the analysis of artefact counts (
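To make the methods being compared concrete, here is a minimal correspondence analysis via the singular value decomposition of the standardised residuals; ordering rows and columns by their first-axis scores yields the seriation. This is a generic CA sketch in Python, not the author's R code, and the banded toy matrix is invented.

```python
import numpy as np

def ca_first_axis(counts):
    """First-axis principal coordinates from a correspondence analysis."""
    P = counts / counts.sum()                 # correspondence matrix
    r = P.sum(axis=1, keepdims=True)          # row masses
    c = P.sum(axis=0, keepdims=True)          # column masses
    S = (P - r @ c) / np.sqrt(r @ c)          # standardised residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, 0] / np.sqrt(r[:, 0]) * s[0]  # row principal coordinates
    cols = Vt[0] / np.sqrt(c[0]) * s[0]       # column principal coordinates
    return rows, cols

# A perfectly 'seriated' (banded) toy table: CA recovers the order,
# up to an arbitrary reversal of the axis.
counts = np.array([[10., 5, 0, 0],
                   [ 5, 10, 5, 0],
                   [ 0, 5, 10, 5],
                   [ 0, 0, 5, 10]])
row_scores, _ = ca_first_axis(counts)
order = np.argsort(row_scores)  # either 0,1,2,3 or its reverse
```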

Thus, three options were tested with all four methods.

To compare the results of the seriation routines with the original data, a Spearman's Rank Correlation coefficient (
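The comparison can be sketched as follows: a plain-Python Spearman's rho, assuming no ties. Since the direction of a seriation axis is arbitrary, the absolute value of rho is what matters; the 'recovered' sequence here is a hypothetical result with two adjacent swaps.

```python
import numpy as np

def spearman(a, b):
    """Spearman's rank correlation for tie-free sequences
    (Pearson's correlation computed on the ranks)."""
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # ranks of b
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

true_order = np.arange(10)                            # known simulated sequence
recovered = np.array([0, 2, 1, 3, 4, 5, 7, 6, 8, 9])  # hypothetical seriation
rho = spearman(true_order, recovered)
```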

For the issues, the first ten years of issue are, essentially, a randomising element as the earliest possible closing date for the hoards is year 11. I have, therefore, calculated

Issues which only occur in a single hoard present an additional problem. If the issues are large, and only occur in one hoard, we could probably be confident that they belong at the end of the sequence. A rare issue might, however, occur a few times in an especially large hoard, or a common issue only once in a small late hoard. Rather than introduce a measure of subjectivity in deciding what to exclude or include, I have run the analyses with all the data. This, again, might result in a conservative estimate of

Results of the simulations with 10 hoards.

Results of the simulations with 20 hoards.

Results of the simulations with 50 hoards.

The increase in sample size to 20 hoards improves the results slightly for CA/DCA (

To examine the results more fully, let us look at one data set —

Details of the hoards in data set

HOARD | CLOSING DATE | TOTAL | TOTAL EXCL. YEAR ZERO |
---|---|---|---|
H01 | 12 | 68 | 53 |
H02 | 14 | 33 | 23 |
H03 | 15 | 84 | 66 |
H04 | 15 | 895 | 730 |
H05 | 16 | 42 | 36 |
H06 | 16 | 650 | 521 |
H07 | 16 | 387 | 309 |
H08 | 17 | 201 | 165 |
H09 | 18 | 191 | 155 |
H10 | 20 | 37 | 33 |
H11 | 20 | 126 | 112 |
H12 | 21 | 1069 | 911 |
H13 | 22 | 225 | 193 |
H14 | 23 | 239 | 202 |
H15 | 24 | 145 | 126 |
H16 | 25 | 76 | 64 |
H17 | 27 | 245 | 216 |
H18 | 28 | 148 | 138 |
H19 | 28 | 77 | 65 |
H20 | 29 | 744 | 666 |
Totals | | 5682 | 4784 |

The original data in hoard

The data in hoard

Ordination maps from CA of data set

We can compare this to the result for the PCA angle method with standardised data (

The data in hoard

In contrast, the results from the BEA TSP method on chord transformed data are less than optimal (

The data in hoard

At present, just the first axis of inertia from the CA is being used to give the sequence. It might be preferable to be able to use the first two axes and to take the sequence around the horseshoe curve (

Principal curves through the issues from the CA of data set

An alternative method of ‘straightening out’ the curve is detrended CA, as applied in the above tests. Detrended CA has not met with universal acceptance (see references cited by

One weakness of the automated approach adopted in this study, compared to using CA to analyse real data sets, is that problematic data, which would ordinarily be dealt with by the analyst, are left 'as is'. As an example, the CA of data set

Symmetric maps from CA of data set

With real data we do not, of course, have the 'correct' answer to compare to the results of a CA. One thing that is clear from the simulations is that even with a large set of 'hoards' which are truly random selections from the coinage pools — something we cannot be sure of with real data sets — the results are never perfect. Although numismatists frequently attempt to assign precise dates to issues, and suggest sequences into which they all fit, we must take account of uncertainty. Clearly, small issues will be harder to place within a sequence than larger issues. One method by which we can assess uncertainty would be to use bootstrapped CA (
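One way such a bootstrap can be sketched (in Python): resample each hoard's coins with replacement, recompute the first CA axis for each replicate, and align the arbitrary sign of each replicate to the original solution before taking variances. This illustrates the idea rather than the cabootcrs algorithm itself; the CA helper and toy data are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def ca_col_scores(counts):
    """First-axis CA scores for the columns (issues)."""
    P = counts / counts.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    S = (P - r @ c) / np.sqrt(r @ c)   # standardised residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[0] / np.sqrt(c[0]) * s[0]

counts = np.array([[10., 5, 1, 0],     # toy hoard-by-issue table
                   [ 4, 10, 5, 1],
                   [ 1,  5, 10, 4],
                   [ 0,  1,  5, 10]])
base = ca_col_scores(counts)

reps = []
for _ in range(200):
    # Resample each hoard's coins with replacement (multinomial draw
    # of the same size, using the hoard's observed proportions).
    boot = np.array([rng.multinomial(int(row.sum()), row / row.sum())
                     for row in counts], dtype=float)
    sc = ca_col_scores(boot)
    if np.dot(sc, base) < 0:           # the axis is only defined up to sign
        sc = -sc
    reps.append(sc)

reps = np.array(reps)
half_width = 1.96 * reps.std(axis=0)   # ~95% interval around each issue's score
```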

In order to look at how we might use such an analysis, I have subjected a trimmed-down version of a previously analysed data set of hoards from Italy and the Iberian peninsula at the end of the second century

Ordination maps from the bootstrapped CA of hoards from Italy and Spain at the end of the second century

With a large analysis like this, assessing overlaps between individual issues is difficult from maps like

The circles plot score for each coin issue on the first axis of inertia from a CA. The lines represent approximately 95% of the values from the bootstrapped replicates calculated from the reported variances. The issue numbers are those from Crawford (

CA, which generally concentrates the largest values on the diagonal of a re-organised data matrix, represents the best method, of those tested here, to seriate coin hoard data, as shown by the trials using the simulated data. Although detrended CA may help with the issue of ‘stretching’ where the later hoards are more spread-out on the first axis from a CA than the earlier ones — as discussed in detail elsewhere (

The basic premise of using simulated data sets to test statistical methods has long been known. One of the principal issues, however, is knowing enough about the material to create realistic data sets. For this particular coinage series, we have a very good understanding of this thanks to several hundred years of research. Unlike pottery, for example, we do not need to take into account variable use-lives as, with one exception,

An appreciation of the variable degrees of uncertainty in the sequence can be obtained by bootstrapping. In general, we have to accept that despite there being a ‘correct’ answer for the striking of Republican issues, it is very unlikely we will be able to unequivocally recover that sequence. Rare issues are always going to be especially problematic. We should be prepared to think in terms of probabilities.

CA remains, however, somewhat of a blunt instrument. There are other sources of data which could contribute to the correct sequencing of issues such as die-links. Coin dies have a limited life, and where two issues can be shown to share dies, it is almost certain they were struck in the same year, or perhaps at most a year or two apart. Historical information on the coins, such as that mentioned in the introduction, could also contribute to refining the model.

One approach would be to use a modified version of the constrained CA method (

The additional file for this article can be found as follows:

Detailed results (Tables 2–4). DOI:

The standard Roman silver coin in circulation between 211

The program code is available at DOI

The closing date of a hoard is given by the date of the latest coin included within it.

For earlier studies using simulated hoard data sets, see Lockyear (

I would like to thank the anonymous referee for the reference and for suggesting the use of the chord transform.

Although the precise sequence for Roman Republican coins is unknown, the approximate sequence can be determined via historical and stylistic criteria and it is, therefore, possible to unambiguously identify the start and end of a sequence suggested by seriation.

As noted above, a moderate number of issues from before the closing date of the earliest hoard have to be included to avoid having hoards which, after trimming of the earlier issues, contain almost no coins.

CA full data set:

The 95% estimate has been calculated using the variances in the scores from the bootstrapped replicates for each issue reported by the cabootcrs package.

The debased legionary

I would like to thank Trevor Ringrose for his extremely helpful comments on an earlier draft of this paper, and for his help with bootstrapped CA.

The author has no competing interests to declare.