The modern online trade in anatomical, ethnographic and archaeological human remains, especially on social media and e-commerce platforms such as Instagram, Facebook, Etsy, Marktplaats and Amazon.com, is increasingly being documented and exposed (e.g. Huxley & Finnegan 2004; Huffer & Chappell 2014; Halling & Seidemann 2016; Huffer et al. in press). In our first paper on data mining the human remains trade on Instagram (Huffer & Graham 2017), we scraped several thousand posts to explore and map this well-connected network of collectors and dealers. We found individuals in this trade who are ‘specialist’ (collecting exclusively within specific categories) and ‘generalist’ (taxidermy, medical implements, bones of animals and humans, etc.) in their collecting focus. Furthermore, the display of their collections and the “true” stories of the acquisition of rare items seem to influence another group we identified, the ‘enthusiasts’: those who do not seem to collect themselves, but who re-broadcast the photographs of those who do, helping to form tastes. Our previous work is limited in that it mined only the text of the posts, neglecting the rich visual data of the photographs.
The problem is one of identifying, at scale, the legal and ethical ramifications of a given photograph when the language of the accompanying post itself is occluded, in code or innuendo. Not everything for sale is marked as such. What were we missing? It takes a long time to manually study each of the thousands of images in our dataset to understand broader patterns. The manual nature of this type of content analysis has been a major obstacle to analyzing images at a large scale in many fields. However, in recent years computational methods have become available that make it possible to train a typical desktop computer to recognize content in large collections. This technological breakthrough makes it possible for us to efficiently identify the contents of the thousands of images we have collected. In particular, the release of the Tensorflow library and the Google Inception-v3 convolutional neural network model (2017) trained on Imagenet (Stanford Vision Lab 2016) allows us to identify clusters of visually similar images, at scale, relatively quickly. These clusters can then be referenced against the data mining of the text in the accompanying posts.
Can we teach machines to automatically identify from photographs alone patterns in the ‘visual rhetoric’ which signal that an item is for sale? Our first step in analyzing visual rhetoric in our sample of Instagram images is to develop methods for the automatic clustering of images with similar patterns of features, not all of which are necessarily apparent to the human eye. We can then identify visual rhetoric through the choices made about how and what objects the photographer included in their image to convey meaning to their audience.
We first introduce neural networks and computer vision technology in general, and in archaeology in particular. Then we provide some brief background on the trade in human remains. We then walk the reader through our process using Tensorflow and Inception-v3 to turn our collection of images into image vectors. We describe how we reduced and visualized the multi-dimensional vector space in two dimensions using t-SNE, and clustered the results using affinity propagation. We close with remarks on the visual rhetoric of the human remains trade on Instagram.
Neural Networks and Computer Vision
Computer vision technologies are typically based on a neural network (NN), a system that allows a computer to learn how to perform some task by analyzing training examples. The NN consists of layers of computational ‘neurons’, which are mathematical functions that mimic the functioning of a biological neuron. In our case, we want the computer to learn to recognize the contents of images in our sample, so it can identify large numbers of visually similar images. Our NN has been trained (not by us) by inputting a large set of images with known and labeled contents, so that the NN has developed a system for recognizing image contents by itself. During the training process, the labeled images trigger the neurons to fire in particular ways depending on the visual information contained in the image; images that are similar in some fashion cause similar cascades of firing. These cascades are represented as mathematical functions that the NN stores to use when we give it the unlabeled images in our sample. The NN uses these functions on our images to detect the content that it has been trained to recognize. This is a necessarily simplified description of what NNs do; for an extended technical description see Bishop (1995). In our particular case, while it is the final layer of the NN that applies the label to the images, we can stop the process at the penultimate layer to retrieve the mathematical description of the images and use this data to identify visually similar materials.
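The idea of stopping at the penultimate layer can be illustrated with a deliberately tiny network. The sketch below is not Inception-v3: the weights, layer sizes and function names are invented for illustration, and a real model has millions of learned parameters. It only shows that the same forward pass yields both a feature vector (the penultimate layer) and label probabilities (the final layer), and that the feature vector can be kept while the labels are ignored.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias for each neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """A neuron 'fires' only when its weighted input is positive."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn final-layer scores into label probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy weights, chosen by hand purely for illustration.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.0], [-1.0, 1.0]]
OUTPUT_B = [0.0, 0.0]


def forward(pixels):
    """Return (penultimate-layer features, final-layer label probabilities)."""
    features = relu(dense(pixels, HIDDEN_W, HIDDEN_B))
    label_probs = softmax(dense(features, OUTPUT_W, OUTPUT_B))
    return features, label_probs


# Keep the feature vector; the label probabilities can simply be discarded.
features, label_probs = forward([0.9, 0.1, 0.4])
print(features)
```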
Schmidhuber (2014) provides a comprehensive overview of NN from their origins in the 1960s to the present day. It highlights how NN used for ‘deep learning’ today differ from earlier ‘shallow’ NN (that is, in the number and complexity of layers of neurons, and in the ‘credit assignment paths’ or weights between neurons in these layers that make the NN perform the desired behavior). Whereas shallow NN typically require a great deal of supervision and training, the power of deep NN lies in their ability to learn without such supervision. Google’s Inception-v3 model is a convolutional neural network (CNN), part of this class of deep NN (Szegedy et al. 2014). Because their architecture is explicitly engineered to mimic the connectivity in the visual cortex (the word ‘convolutional’ refers to the convolution operation with which small filters are passed across the image), convolutional networks have performed particularly well in image recognition tasks (cf. Olah 2014).
The ‘Inception-v3 model’ (Szegedy et al. 2014) was trained against the benchmark dataset, ‘ImageNet’ (Deng et al. 2009). ImageNet contains over 1000 classifications of nouns (i.e. items often found in images) where each noun is represented by hundreds or thousands of tagged images. While the model has not been trained specifically on human remains, it is useful for us in that the vector representations of images it produces (from the penultimate layer before labeling) can be used to determine visual similarity. These vectors can be clustered. By finding clusters of similar images according to their contents, we can elucidate patterns in the visual rhetoric of these images which may then become the basis for filtering and tracking this trade.
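One common way to compare such vectors is cosine similarity. The sketch below is illustrative only: cosine similarity is one possible measure, not necessarily the one implied by the later clustering steps, and the file names and four-value vectors are invented placeholders (real Inception-v3 penultimate-layer vectors are far longer, at 2,048 values per image).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, vectors):
    """Rank all other images by similarity to the query image."""
    scores = [(cosine_similarity(vectors[query], vec), name)
              for name, vec in vectors.items() if name != query]
    return [name for _, name in sorted(scores, reverse=True)]


# Hypothetical image vectors; names and values are placeholders.
image_vectors = {
    "skull_front.jpg": [0.9, 0.1, 0.0, 0.3],
    "skull_side.jpg": [0.8, 0.2, 0.1, 0.4],
    "sketch.jpg": [0.0, 0.9, 0.7, 0.1],
}

ranked = most_similar("skull_front.jpg", image_vectors)
print(ranked)
```

Images whose vectors point in nearly the same direction rank first, which is the sense of ‘visually similar’ used in the clustering described later.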
Neural Networks in Archaeology
With the advent of powerful consumer-grade graphical processing units and open-source software, applications of NN to scientific problems have proliferated alongside bodies of code and other packages. Neural networks have appeared in the archaeological literature from at least the 1990s (cf. Baxter 2014 who provides an overview). Some studies concluded that the method did not have any more explanatory power than other more common models (Gibson 1996; Everitt & Dunn 2001), though work such as Bell and Croson (1998) concluded that early NN were particularly well suited for sparse datasets. Important work on NN in general in archaeology has been done by Barceló from the mid-1990s onwards (in particular, 1995, 2004, 2008; Barceló & Faura, 1999). Aprile, Castellano and Eramo (2014) found success using NN to classify mineral inclusions in potsherds. Ma et al. (2000) employed NN to aid in the classification of pottery. Kadar et al. (2004) used NN for archaeometric work with copper artefacts. Other work uses NN to enhance information retrieval on, for instance, pottery databases (Benhabiles & Tabia, 2016), or to determine whether or not degraded statuary belongs to a particular ‘school’ for the purposes of restoration (Wang et al. 2017). The above are just examples of a growing sub-field of research.
The Human Remains Trade
The questionable or illicit sale of cultural property is frequently estimated to be the third most profitable black-market industry following narcotics and weapons trafficking, “bring[ing] in $2 billion to $6 billion annually” (Choi 2011). While the exact dollar value of the antiquities trade is difficult to quantify, the amount of money involved in the human remains trade itself, at least on first reckoning (Huffer & Chappell 2014; Huffer & Graham 2017), appears low compared to other categories of antiquities trafficking. However, research on the human remains trade is in its infancy. Financial considerations aside, indications are that its impact could be much larger. Recent research is bringing to light issues of damage to archaeological and ethnographic knowledge, vandalism of cemeteries historic and modern, theft from collections, (dealer) alleged complacency of professional scholars, loss of Indigenous cultural memory, and violation of the rights of descendant communities (Huffer et al. in press; Huffer & Charleton in review).
Redman (2016) provides a popular account of the various (often dubious) routes through which American and European museums and governments acquired human remains in the past. As some of these collections have been deaccessioned, individual items from them can sometimes find their way onto the commercial market. Even when circulating between collectors for years or decades, numerous questions remain surrounding how and when the skeletal remains appeared on the market, and (if archaeological in nature) what crucial contextual information could have been lost due to the looting act itself, regardless of when it took place (see for instance BABAO 2017).
Tensorflow is a software library developed by Google for machine learning using NN. It was open-sourced and released to the public in 2015. One of Google’s tutorials for Tensorflow (2017) walks the user through the process of classifying a folder of images on the user’s machine using the Inception-v3 CNN model. Enthusiasts worked out that the final step – classifying the images according to the vector representations in the model – could be skipped, leaving the user with the vector representations of similarity. Douglas Duhaime of Yale University’s DH Laboratory wrote a blog post (2017) explaining this process in detail, including Python code, which we follow and employ here. We provide the computed image vectors and code (including copies of Duhaime’s) used for the analysis we report here in our OSF research compendium (Graham 2018). We do not provide the images themselves in the compendium nor do we reproduce detailed versions of the images here, given the ethics of researching human remains (cf. BABAO 2010, item 6).
Our method can be summarized by the following steps:
- Turn our folder of images into vectors using Tensorflow and the Inception-v3 model. The result of this process is that for each image we have a vector, the mathematical description of the image. We first installed Tensorflow using an Anaconda install of Python 3 on a Mac OS 10.10.5 laptop. Then, to classify a folder of images using Tensorflow, we followed Duhaime (2017) and ran his modified ‘classify-images.py’ script, which writes the output of the model’s second-to-last layer to a folder as one vector for each of our 12,010 image files. This took approximately 2 hours.
- Visualize those vectors by reducing the complexity down to two dimensions using t-SNE (t-distributed stochastic neighbor embedding). Dimension reduction is a common step in handling NN output, regardless of the input type, and is necessary to visualize the data with currently available methods. t-SNE is a dimension reduction technique similar to Principal Component Analysis, but it uses non-linear methods that make it especially effective for high dimensional datasets such as images (Maaten & Hinton 2008). t-SNE optimizes for keeping points close to their neighbors, so it is an effective tool if we want to visualize which images are close together in our sample. We compute t-SNE by running Duhaime’s ‘get_tsne_vector_projections.py’ script on our newly-created folder of image vector files. The .json formatted results of this operation are also in our repository.
- Identify clusters using affinity propagation. Affinity propagation is a clustering algorithm that identifies exemplars among data points and forms clusters of data points around these exemplars. K-means is often used for clustering, but it is sensitive to the initial random selection of exemplars, and does not necessarily select the best representation of clusters; in the approach of Frey and Dueck (2007), all data points are considered as possible exemplars. We imported the vector projections resulting from the previous step into R and determined clusters with the affinity propagation algorithm of Frey and Dueck (2007), using the R package apcluster (Bodenhofer et al. 2017).
- Manually sample images in those clusters, together with the captions in the original posts, to determine what the machine is seeing and why.
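The clustering step can be sketched in plain Python. This is an illustrative re-implementation of the responsibility and availability message-passing updates of affinity propagation, not the apcluster code used for the analysis reported here. The function name, the toy two-dimensional points (standing in for t-SNE coordinates), the damping factor of 0.5, the 200 iterations and the choice of the median off-diagonal similarity as each point's preference are all assumptions made for the example.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=200):
    """Return (exemplar indices, exemplar index assigned to each point)."""
    n = len(points)
    # Similarity: negative squared Euclidean distance between points.
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
         for p in points]
    # Preference (self-similarity): median of the off-diagonal similarities.
    preference = statistics.median(
        S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference

    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # compared with i's best competing candidate.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: how much support k has gathered from other points.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # A point is an exemplar when its own responsibility plus
    # availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # messages did not settle on any exemplar
        return [], []
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Hypothetical 2-D points forming two loose groups of three.
toy_points = [(0, 0), (1, 1), (3, 1), (6, 6), (7, 7), (9, 7)]
exemplars, labels = affinity_propagation(toy_points)
print(exemplars, labels)
```

Unlike k-means, no number of clusters is supplied. The preference value controls how readily points become exemplars, so lowering it should yield fewer, larger clusters. On the toy points, the two groups of three should come out as separate clusters.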
Figure 1 shows the t-SNE projection of 12,010 image vectors. Figure 2 is the same data, but colored according to the clustering determined by the affinity propagation technique which also identifies ‘exemplar’ data points within clusters. We plotted these 84 exemplar images using the projection from the t-SNE and using the actual images as the data points as a locally-served website. Figure 3 gives a zoomed-out view of this plot (thus obscuring the details of any one particular image; the html framework for that visualization is that used by Duhaime 2017). Looking at the html visualization of the data, we see that pencil-sketches and other ‘artistic’ interpretations of human bones are for the most part located in the bottom right quadrant, while photos of people with bones are in the top left. Taxidermy and animal skeletons seem to be in the lower left, while human bones are in the upper right.
In the very center of the complete projection is an image of a skull upon a shelf, with a price tag. This exemplar is from cluster 35. Figure 4 plots clusters 35, 80 and 82, which we will discuss in more detail, in the context of the complete corpus as depicted in Figure 1. We plotted the 143 images assigned to cluster 35 in our html framework in order to explore it (Figure 5). It seems as if the critical feature that unifies these images is that they are of bones/skulls that are positioned on things – often, but not always, a shelf (see Figure 6, a detail from the exemplar image for this cluster). The image is composed so that the foreground is in sharp focus and the other items on the shelf are blurred. These images are reminiscent of mid-20th century museology, of items ranged in ordered rows, heightening the sense of ‘other’, distancing their humanity (cf. Redman 2016).
If we return to the html plot of the cluster exemplars, and consider the photos of human remains according to their original associated text, two more clusters attract our immediate interest – cluster 80 (168 images), and cluster 82 (110 images). These clusters are in fact adjacent in the t-SNE plot, and quite distinct from the location of cluster 35. Cluster 80 is skulls that often have been photographed square to the face, and largely fill the frame, while cluster 82 seems to be skulls that are turned slightly to the left or right or upside down. In the associated posts for the images in cluster 80, the language used is of the ‘look what just arrived in my collection’ or ‘look what I just gave away’ discourse. Some photos are indicated as having been taken in a museum, and there is at least one photograph from a well-known business in this trade where the associated post advertises that the store is seeking skulls to buy. Items marked for sale are discussed obliquely, e.g., “new skull arrived… come pay your respects at [the] most amazing curiosity shop in Texas”.
In the associated posts for the images in cluster 82 are many explicit notices of materials for sale. A number of active businesses (predominantly in Canada, but also in the UK) account for several of the posts and clearly state the item is for sale, often naming a price. As was reported in Huffer and Chappell (2014), active businesses (with brick-and-mortar storefronts and/or online websites with catalogs, PayPal account details, contact information, etc.) are also known to exist in the US, the Netherlands, Belgium, Australia, and elsewhere. Perhaps there are national trends in visual rhetoric?
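The comparison of clusters against caption text was done by manual reading, but a first-pass tally could be sketched as follows. Everything in this example is hypothetical: the cluster identifiers, the captions and the list of sale terms are invented placeholders, not data from our corpus, and simple keyword matching would miss exactly the oblique language noted for cluster 80.

```python
from collections import defaultdict

# Hypothetical list of explicit sale terms; a real list would need curation.
SALE_TERMS = ("for sale", "price", "dm to buy")


def sale_share_by_cluster(posts):
    """Share of captions in each cluster that contain an explicit sale term."""
    counts = defaultdict(lambda: [0, 0])  # cluster id -> [flagged, total]
    for cluster_id, caption in posts:
        text = caption.lower()
        counts[cluster_id][1] += 1
        if any(term in text for term in SALE_TERMS):
            counts[cluster_id][0] += 1
    return {cid: flagged / total for cid, (flagged, total) in counts.items()}


# Invented placeholder posts: (cluster id, caption).
toy_posts = [
    ("A", "New arrival in my collection"),
    ("A", "Look what I just gave away"),
    ("B", "Skull for sale, DM for price"),
    ("B", "Price on request"),
    ("B", "Seen at the museum today"),
]

shares = sale_share_by_cluster(toy_posts)
print(shares)
```

A tally of this kind could at most flag clusters for closer manual reading. It cannot replace that reading, since not everything for sale is marked as such.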
Visual Rhetoric in Images of Human Remains on Instagram
While the Inception-v3 model was never trained on human remains, the Tensorflow framework allows us to unpack its identification of visual similarities to the level where salient features of the images are identified. We can then use those vectors to explore patterns in the visual composition of these images that can then be cross-referenced with the original language in the posts. This initial experiment does seem to support the idea that items for sale are displayed in ways that are discernible to the machine, and so the machine can be taught to trawl other bodies of data for more evidence of the trade in human remains. The machine directs our attention to the framing of photographs, and the relationship of the human remains to other elements within the photograph. Exhibition design – rows of objects in cases on display – is recreated here. The interplay of foreground and background also seems to be important. Photos composed to show off a collection might also be subtly signaling that the item is for sale. These signals could be isolated, and used to train further iterations of a CNN, allowing a researcher to scale up their investigation. We intend to cross-reference this data with the network of followers and followed, to see how these visual clusters play out across networks of influence and on other platforms aside from Instagram.
Our research was motivated by the question of whether machine learning can detect visual signals in Instagram photographs indicating that the human remains depicted are for sale. We found that meaningful clusters (in terms of items for sale, or items for display) of similar images containing human remains can be identified by a neural network model, dimensionality reduction with t-SNE and affinity propagation clustering. We have demonstrated an approach to getting insights from large collections of images that may be useful in a variety of research contexts relating to cultural heritage and archaeology.
Can machine learning detect visual signals in Instagram photographs indicating that the human remains are for sale? The results of this initial experiment would seem to indicate ‘yes’. The positioning of a skull, for instance, relative to the plane of the camera; the arrangement of materials on shelving (or other objects), mimicking a museum display case; and foregrounds in sharp detail with backgrounds blurred all seem to be relevant signals. These all seem to be invitations to the viewer to consider the availability of an object for purchase or trade. No doubt we will discover more.
The results and discussion presented above are only the beginning of what we expect to be able to do using a neural network approach. Outside of human remains trade research itself, the use of this approach might be helpful in identifying many other kinds of materials bought and sold online, whether licit and with no public objections, or related to possible or confirmed illicit activity (such as drug or wildlife trafficking). Automated research into illicit markets of any kind on social media, let alone markets in cultural property, is still in its infancy (e.g. Yang & Luo 2017; Hernandez-Castro & Roberts 2015). Although our application of NN to the myriad questions raised by the human remains trade is still in its preliminary stages, it adds to the growing body of research that is expanding the use of NN in general to studies of human history. Moving forward, we intend to replicate our studies on a growing data set obtained from monitoring various platforms. Investigating the differences between social-media platforms could reveal how each platform is used by this particular community and give insights into why.
In addition to aiding other researchers investigating diverse forms of trafficking, the focus on using computer vision to study human remains places our work within the larger corpus of research that makes use of image classification, machine learning and NN in archaeology, bioarchaeology and forensic science in general. This corpus has included, for example, ceramic classification (e.g. Aprile et al. 2014), or sex, age and stature estimation of skeletons recovered from both archaeological and forensic contexts (e.g. Ionescu et al. 2016), and even examining whether or not the quick disposal of the dead during a conflict can leave a unique signature in the mortuary record of a place beyond the presence of a mass grave (Spars 2014). Other archaeological uses are possible, for example improving upon human abilities to detect new site looting from satellite images. The application of automated methods could speed up ‘citizen science’ methods to crowd-source data analysis efforts (Hersher 2017).