The Visualise Oral History project aims to use recent advances in large language models and audio-to-text processing to enable visual exploratory analysis of oral history archives.
With the advent of history from below, oral history has become a key source for connecting museum collections with their social history and with the skills, knowledge and memories attached to the objects’ stories. Several national and international digitisation projects have increased digital access to these sources over the past twenty years and explored new ways to share this material with the public. However, the data currently available in these collections rarely support extracting machine-readable content from the sources or digitally linking them with other collections and knowledge.
Pessanha and Salah (2021) highlighted how recent advances in NLP might hold the key to “bringing new patterns to the light of day” (p. 12). Indeed, in the past five years, the emergence of large language models (Vaswani et al., 2017) has completely revolutionised the field of NLP, especially since the publication of open-source models such as BERT (Devlin et al., 2018). Transcriptions obtained by novel automatic speech recognition approaches (see, e.g., Radford et al., 2022) can be used in combination with document transformers for the effective numerical processing and categorisation of texts, including their spatial and temporal aspects (Liu and De Sabbata, 2021), and the visual representation of their relational similarity (see also, Suzen et al., 2020).
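As a minimal sketch of this core idea, the snippet below maps transcript sentences to numerical vectors and compares them by cosine similarity. Plain bag-of-words counts over toy sentences stand in here for a large language model's sentence embeddings; in practice a transformer encoder would produce the vectors.

```python
import numpy as np

# Illustrative sentences, not taken from any actual archive.
sentences = [
    "My father worked at the hosiery factory for thirty years",
    "The knitting machines in the factory ran day and night",
    "We used to walk to the market every Saturday morning",
]

# Build a shared vocabulary and one count vector per sentence.
vocab = sorted({w for s in sentences for w in s.lower().split()})
index = {w: i for i, w in enumerate(vocab)}

def embed(sentence):
    """Toy embedding: word counts over the shared vocabulary."""
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        v[index[w]] += 1
    return v

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = [embed(s) for s in sentences]

# The two sentences about the factory end up closer to each other
# than either is to the unrelated third sentence.
factory_pair = cosine(embeddings[0], embeddings[1])
unrelated_pair = cosine(embeddings[0], embeddings[2])
```

A transformer-based embedding would capture semantic similarity even without shared vocabulary, which is what makes the clustering and visualisation described below possible.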

Our methodology so far is summarised by the pipeline above.
The images below illustrate how the presence of a set of identified topics throughout an archive can be visualised, either based on when those topics are discussed in the transcripts (first image below) or based on how their numerical representations cluster in the space created by the large language model (second image below).


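The first kind of view can be sketched as follows: given a topic label and a timestamp (seconds into the interview) for each transcribed sentence, bin the interview timeline and count how often each topic appears in each bin. The resulting matrix is what a topic-over-time view would plot; the topic indices and timings below are illustrative, not drawn from the archive.

```python
import numpy as np

timestamps = np.array([12, 45, 130, 200, 310, 350, 420, 500])  # seconds
topics = np.array([0, 0, 1, 1, 0, 2, 2, 1])  # topic index per sentence

n_topics = 3
bin_width = 120  # two-minute bins along the interview timeline
n_bins = int(np.ceil((timestamps.max() + 1) / bin_width))

# Rows are topics, columns are timeline bins; each cell counts how many
# sentences on that topic fall in that stretch of the interview.
presence = np.zeros((n_topics, n_bins), dtype=int)
for b, topic in zip(timestamps // bin_width, topics):
    presence[topic, int(b)] += 1
```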
The video below illustrates how the visual analytics tool can be used to interactively select a set of sentences that group together in the numerical representation space created by the language model, which are then visualised based on their positions in the interview timelines.
oral_history_vis_prototype-131.mov
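The interaction shown in the video can be sketched as follows: sentences are plotted by a two-dimensional projection of their embeddings, the user drags a rectangular selection over that plot, and the selected sentences are then located at their positions on the interview timeline. The coordinates and timestamps below are made up for illustration.

```python
import numpy as np

# 2-D projected embedding coordinates, one row per sentence.
coords = np.array([[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.12, 0.22]])
timestamps = np.array([35.0, 610.0, 120.0, 945.0])  # seconds into interview

def select(coords, xmin, xmax, ymin, ymax):
    """Indices of points falling inside the selection rectangle."""
    inside = (
        (coords[:, 0] >= xmin) & (coords[:, 0] <= xmax)
        & (coords[:, 1] >= ymin) & (coords[:, 1] <= ymax)
    )
    return np.flatnonzero(inside)

selected = select(coords, 0.0, 0.3, 0.0, 0.3)  # the three nearby points
timeline_positions = timestamps[selected]      # where they occur in time
```

Sentences that sit close together in embedding space can thus be scattered across the interview, which is exactly what this coordinated view makes visible.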
The next phases of the project will focus on discussing the proof-of-concept tool showcased above with oral history researchers, to better understand their user requirements and the affordances the tool should offer as it develops.
Moreover, we plan to explore how this approach might allow us to link the contents of oral history archives with museum collection metadata. We aim to use the same large language model to obtain numerical representations of the descriptions of objects in museum collections, allowing a comparison between object descriptions and interview sentences. In turn, that would allow us to, for instance, retrieve the museum objects whose descriptions most closely match what is being discussed in an oral history interview. Those objects could then be integrated into an analysis or into the visual analytics tool.
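The proposed linking step could work along these lines: embed museum object descriptions with the same model used for interview sentences, then retrieve the objects whose descriptions are most similar to the sentence under discussion. The vectors below are random placeholders for real model embeddings, with one object's embedding perturbed to simulate a close match.

```python
import numpy as np

rng = np.random.default_rng(0)
object_embeddings = rng.normal(size=(50, 8))  # 50 objects, 8-dim vectors

# Simulate an interview sentence that closely matches object 17.
sentence_embedding = object_embeddings[17] + rng.normal(scale=0.1, size=8)

def top_k_objects(sentence_vec, object_mat, k=3):
    """Indices of the k objects most similar to the sentence (cosine)."""
    norms = np.linalg.norm(object_mat, axis=1) * np.linalg.norm(sentence_vec)
    sims = object_mat @ sentence_vec / norms
    return np.argsort(sims)[::-1][:k]

matches = top_k_objects(sentence_embedding, object_embeddings)
```

Because both collections share one embedding space, the same retrieval works in either direction, from a sentence to related objects or from an object to the interview passages that mention it.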
On a more technical level, we are currently working towards: