Kaspar Beelen and Kunika Kono

Description

This investigation explores novel ways of connecting digital museum collections. It takes inspiration from Anna-Maria’s previous research on museum collections as data. Similarly, we aim to build on the ideas driving Heritage Connector, in the sense that we want to reflect on

a) extracting meaning from unstructured information (in text and images)

b) developing methods for binding together otherwise isolated and lonely data points.

We investigate different strategies for weaving together the sparse, thin and often disconnected records that constitute these collections. Instead of focussing on linked open data or knowledge graphs, we experiment with embeddings and representation learning as a method for connecting heritage data.

Representation learning is the process of transforming unstructured data, such as images and texts, into meaningful vector representations (also referred to as ‘embeddings’). It is an integral ingredient of today’s deep learning frameworks, whose architectures both ingest and produce such embeddings. The process of encoding, however, captures only some aspects of meaning and is fraught with problems, especially when applied to heritage data. In this investigation we look more closely at:

a) how these models encode heritage data

b) what affordances embeddings provide for breaking down information silos and connecting heritage collections

c) how to improve representation learning for digital heritage.

More technically, we take a closer look at Sentence Transformers (free and open source) and GPT-3.5/4 (depending on budget) for encoding text. For converting images to vectors we harness the features extracted from Vision Transformers. Lastly, we investigate the application of multimodal models such as CLIP (and potentially ImageBind), which allow us to connect data across modalities (i.e. image and text).
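
To give a sense of what this looks like in practice, the sketch below encodes a few catalogue-style descriptions with the sentence-transformers library. The model name and example records are illustrative placeholders, not the data or configuration used in the investigation.

```python
# A minimal sketch: encoding catalogue descriptions with Sentence Transformers.
# The model name and example records below are placeholders.
from sentence_transformers import SentenceTransformer

# Load a small, general-purpose sentence encoder.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical, minimally described catalogue records.
descriptions = [
    "Glass bottle, late 19th century",
    "Telephone exchange switchboard, c. 1930",
    "Spinning mule used in a Lancashire cotton mill",
]

# Encode each description into a fixed-length vector (an 'embedding').
embeddings = model.encode(descriptions, normalize_embeddings=True)
print(embeddings.shape)  # (3, 384) for this particular model
```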

The Congruence Engine project has collected various database exports from museum collections across the United Kingdom. As mentioned, these records are often thin (i.e. only minimally described), sparse (some fields have missing data) and lack associated data (such as images of the objects). Collections we are considering at the moment include the Science Museum Group, National Museums Scotland and the British Telecom Archives. Where possible we will scrape the images.

We do not intend to change the data itself, in the sense of producing enrichments that feed back into the museum collection. The principal goal is to produce a report aimed at the GLAM sector, which showcases novel and meaningful ways of exploring, connecting and traversing the digitized collections through embeddings. We will publish all code and data following the example of Heritage Connector. We will also prepare a publication for the Computational Humanities Workshop in 2024.

Experiments (so far)

Below, we demonstrate some experiments we ran on a small, random sample of the data. We report these experiments to provide some initial intuition of how embedding information might help with connecting data, but equally of how we could harness these links to obtain novel views of museum collections at scale. Again, these are initial thought experiments, and over the course of the investigation we will look more critically at which algorithms and visualizations provide useful insights into these embedded collections.

We experimented with CLIP, a multimodal model that embeds text and images in a common vector space. Using these embeddings we can compare images directly, but also compute the similarity between a text and an image: for example, we can use the phrase “a bottle” as a query to retrieve images that portray a bottle. We indexed around 6,000 images taken from the Science Museum Group. Interestingly, if we search by an image, the results make sense, i.e. the retrieved images look similar to the query. However, when we retrieve images based on a text prompt, for example “a glass bottle”, the results are more confusing: bottles do appear, but we also noticed that, whatever we searched for, the same images with no connection to the query kept reappearing.

Figure 1: query is the image of a glass bottle

Figure 2: query is the string “a glass bottle”
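
As a rough sketch of this retrieval setup, the snippet below embeds a folder of images and a text prompt in the same CLIP space and ranks the images by cosine similarity. The folder name, file pattern and query string are illustrative assumptions, not the actual SMG data.

```python
# A sketch of text-to-image retrieval with CLIP (via sentence-transformers).
# The image folder and query string are illustrative placeholders.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Embed a (hypothetical) folder of collection images.
image_paths = sorted(Path("smg_images").glob("*.jpg"))
image_embeddings = model.encode(
    [Image.open(p) for p in image_paths],
    convert_to_tensor=True,
    normalize_embeddings=True,
)

# Embed the text prompt in the same vector space and rank images by cosine similarity.
query_embedding = model.encode("a glass bottle", convert_to_tensor=True, normalize_embeddings=True)
hits = util.semantic_search(query_embedding, image_embeddings, top_k=5)[0]
for hit in hits:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 3))
```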

We can also explore images across collections by embedding them in the same vector space. Figure 3 below is based on 5,000 images from the Science Museum Group and 600 from the British Telecom Archives. Converting images to vectors provides an effective way to break down silos: as shown in Figure 3, we can start studying where records from the BT Archives (red) sit in relation to objects in the SMG collection (blue). Nonetheless, the orderings proposed by vectorization and dimensionality reduction need to be interpreted critically, something we hope to look into during this investigation.
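
A plot like Figure 3 could be produced along the following lines: both sets of image embeddings are projected jointly into two dimensions and coloured by collection. In this sketch the embedding arrays are random placeholders standing in for the real image vectors, and PCA stands in for whichever dimensionality reduction is ultimately chosen (UMAP or t-SNE are obvious alternatives).

```python
# A sketch of projecting image embeddings from two collections into a shared 2D space.
# The embedding arrays are random placeholders; PCA is one possible reduction method.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
smg_embeddings = rng.normal(size=(5000, 512))  # placeholder: 5,000 SMG image embeddings
bt_embeddings = rng.normal(size=(600, 512))    # placeholder: 600 BT Archives image embeddings

# Reduce both collections jointly so they share the same coordinate system.
coords = PCA(n_components=2).fit_transform(np.vstack([smg_embeddings, bt_embeddings]))

plt.scatter(coords[:5000, 0], coords[:5000, 1], s=2, c="blue", label="Science Museum Group")
plt.scatter(coords[5000:, 0], coords[5000:, 1], s=2, c="red", label="BT Archives")
plt.legend()
plt.show()
```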

Besides images, we can also vectorize text. In this last experiment, we embedded the descriptions of (a subsection of) items in the National Museums Scotland (NMS) collection as well as the Science Museum data. For each description in NMS, we then inspect whether there are ‘similar’ entries in the SMG (‘similarity’ meaning that the vector representations of the object descriptions are close to each other).
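
The matching step could be sketched as follows: each NMS description is paired with its nearest SMG description in embedding space. The descriptions and model name below are illustrative assumptions, not the actual collection records.

```python
# A sketch of matching NMS descriptions to their nearest SMG descriptions in embedding space.
# The descriptions and model name are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

nms_descriptions = ["Hand loom for weaving tartan cloth", "Brass compound microscope, Edinburgh, 1870s"]
smg_descriptions = ["Power loom from a Yorkshire woollen mill", "Compound microscope by Ross, London"]

nms_emb = model.encode(nms_descriptions, convert_to_tensor=True, normalize_embeddings=True)
smg_emb = model.encode(smg_descriptions, convert_to_tensor=True, normalize_embeddings=True)

# For each NMS record, retrieve the closest SMG record by cosine similarity.
for i, hits in enumerate(util.semantic_search(nms_emb, smg_emb, top_k=1)):
    best = hits[0]
    print(f"{nms_descriptions[i]}  ->  {smg_descriptions[best['corpus_id']]}  "
          f"(similarity {best['score']:.2f})")
```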