The Visualise Oral History project aims to use recent advances in large language models and audio-to-text processing to enable visual exploratory analysis of oral history archives.

Background

With the advent of “history from below”, oral history has become a key source for connecting museum collections with their social history and with the skills, knowledge and memories attached to the objects’ stories. Over the past twenty years, several national and international digitisation projects have increased digital access to these sources and explored new ways of sharing this material with the public. However, the data currently available in these collections often do not allow machine-readable content to be extracted from the sources and linked digitally with other collections and knowledge.

Pessanha and Salah (2021) highlighted how recent advances in NLP might hold the key to “bringing new patterns to the light of day” (p. 12). Indeed, in the past five years, the emergence of transformer-based language models (Vaswani et al., 2017) has revolutionised the field of NLP, especially since the publication of open-source models such as BERT (Devlin et al., 2018). Transcriptions obtained through novel automatic speech recognition approaches (see, e.g., Radford et al., 2022) can be combined with document transformers for the effective numerical processing and categorisation of texts, including their spatial and temporal aspects (Liu and De Sabbata, 2021), and for the visual representation of their relational similarity (see also Suzen et al., 2020).

Methodology

[Image: voh-pipeline.png — the project’s processing pipeline]

Our methodology so far is summarised in the pipeline diagram above.

  1. The audio recordings are automatically transcribed to text. In our first experiment we used OpenAI’s Whisper, but doubts have been raised about the approach OpenAI took to train the model, so we are currently evaluating alternatives; the current candidates are Facebook's XLSR-Wav2Vec2, NVIDIA’s NeMo and Microsoft’s SpeechT5 (a minimal transcription sketch follows this list).
  2. The transcripts are then processed using the large language model Bloom to generate numerical representations (embeddings) of the text, and with the BERTopic approach to identify topics (see the topic-modelling sketch after this list).
  3. The text, its numerical representation and the identified topics are used to create a visual analytics tool that allows researchers in oral history to visually explore and interact with the entire collection.
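
For step 1, the sketch below uses the Hugging Face transformers ASR pipeline, so that candidate models can be compared by simply swapping the checkpoint name. The checkpoint and audio file name are illustrative, not the project’s final choices.

```python
# Minimal transcription sketch using the Hugging Face `transformers` ASR
# pipeline; changing the checkpoint name is enough to compare candidate models.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # illustrative checkpoint; alternatives can be slotted in
    chunk_length_s=30,             # process long interview recordings in 30-second chunks
)

# "interview_001.wav" is a hypothetical file name.
result = asr("interview_001.wav", return_timestamps=True)
print(result["text"])

# Keep the timestamped segments so topics can later be placed on the timeline.
sentences = [chunk["text"] for chunk in result["chunks"]]
```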
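
For step 2, a hedged sketch of the embedding and topic-identification stage follows, assuming `sentences` holds the transcript segments from the sketch above. The project uses Bloom for the embeddings; since BERTopic accepts any precomputed embedding array, a small sentence-transformers model stands in here.

```python
# Sketch of step 2: embed transcript sentences and identify topics with BERTopic.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
embeddings = embedder.encode(sentences)             # (n_sentences, dim) array

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(sentences, embeddings=embeddings)
print(topic_model.get_topic_info().head())          # overview of the topics found
```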

Case study

The images below illustrate how the presence of a series of identified topics throughout an archive can be visualised based on when those topics are discussed in the transcripts (first image below) or on how their numerical representations cluster in the embedding space created by the large language model (second image below); a minimal plotting sketch follows the images.

[Image: voh-mm-timelines.png — topics plotted along the interview timelines]

[Image: voh-mm-scatterplot.png — sentence embeddings clustered in two dimensions, coloured by topic]
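
The following is a minimal sketch of the two static views above, using umap-learn and matplotlib (illustrative library choices, not necessarily those of the tool). Dummy data stands in for the sentence embeddings, topic labels and timestamps produced by the pipeline.

```python
# Sketch of the timeline and scatterplot views, with dummy stand-in data.
import numpy as np
import matplotlib.pyplot as plt
from umap import UMAP

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 384))              # dummy sentence embeddings
topics = rng.integers(0, 8, size=200)                 # dummy topic labels
timestamps = np.sort(rng.uniform(0, 3600, size=200))  # dummy start times (seconds)

# Timeline view: when each topic is discussed during an interview.
plt.figure()
plt.scatter(timestamps, topics, s=8)
plt.xlabel("time in interview (s)")
plt.ylabel("topic")

# Scatterplot view: 2-D projection of the embeddings, coloured by topic.
xy = UMAP(n_components=2, random_state=42).fit_transform(embeddings)
plt.figure()
plt.scatter(xy[:, 0], xy[:, 1], c=topics, cmap="tab20", s=8)
plt.title("Sentences in embedding space, coloured by topic")
plt.show()
```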

The video below illustrates how the visual analytics tool can be used to interactively select a set of sentences that cluster together in the embedding space created by the language model, and then visualise where those sentences fall on the interview timelines; a rough sketch of such linked views follows the video.

[Video: oral_history_vis_prototype-131.mov — interactive linked selection in the prototype tool]
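
As a rough illustration of how such linked selection can be wired up (the prototype’s actual technology stack is not described here, and all data and column names are illustrative), the sketch below uses Plotly Dash: lasso-selecting points in the embedding scatterplot filters the timeline view to the same sentences.

```python
# Hedged sketch of linked brushing between embedding space and timelines.
import numpy as np
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=200), "y": rng.normal(size=200),  # 2-D embedding
    "time_s": rng.uniform(0, 3600, size=200),              # position in interview
    "interview": rng.integers(0, 5, size=200),             # interview id
})

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="embedding", figure=px.scatter(df, x="x", y="y")),
    dcc.Graph(id="timeline"),
])

@app.callback(Output("timeline", "figure"), Input("embedding", "selectedData"))
def update_timeline(selected):
    # With a single-trace scatter, pointIndex corresponds to the DataFrame row.
    rows = [p["pointIndex"] for p in (selected or {}).get("points", [])]
    sub = df.iloc[rows] if rows else df
    return px.scatter(sub, x="time_s", y="interview")

if __name__ == "__main__":
    app.run(debug=True)
```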

Future work

The next phases of this project will focus on discussing the proof-of-concept tool showcased above with oral history researchers, to better understand their requirements and the affordances the tool should offer.

Moreover, we plan to explore how this approach might allow us to link the contents of oral history archives with museum collection metadata. We aim to use the same large language model to obtain numerical representations of the descriptions of objects in museum collections, so that object descriptions and interview sentences can be compared directly. That would allow us to, for instance, retrieve the museum objects that most closely match what is being discussed in an oral history interview; those objects could then be integrated into an analysis or into the visual analytics tool. A hedged sketch of this matching step follows.
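
The sketch below embeds object descriptions and interview sentences with the same model and ranks objects by cosine similarity. The model and the example texts are placeholders, not the project’s actual choices.

```python
# Illustrative sketch of the planned collection-linking step: embed museum
# object descriptions with the same model used for the transcripts, then
# retrieve the objects closest to each interview sentence.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the shared model

object_descriptions = [
    "A miner's brass safety lamp, early twentieth century.",
    "A wooden weaving shuttle used in a hosiery factory.",
]
interview_sentences = ["We always checked the lamp before going down the pit."]

obj_emb = model.encode(object_descriptions, normalize_embeddings=True)
sent_emb = model.encode(interview_sentences, normalize_embeddings=True)

similarity = sent_emb @ obj_emb.T  # cosine similarity, as vectors are unit length
best = similarity.argmax(axis=1)   # closest object for each sentence
for sentence, obj in zip(interview_sentences, best):
    print(sentence, "->", object_descriptions[obj])
```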

On a more technical level, we are currently working towards: