This has been, for me, a very productive couple of weeks, with a series of meetings dedicated to different aspects of the two investigations I am currently working on.

ORAL HISTORY INVESTIGATION

Daniel B. and I met Carol and Nina early last week to discuss how to frame the agreements to use Oral History datasets with the two partners we are currently in discussion with: the National Coal Mining Museum and the University of Leicester. This is a very important stage in the process, as we need not only to discuss the use of their data, but also to reflect on the ethical and copyright implications of working with this type of material. A further meeting was held with the SMG legal team, in which we realized that the contract drafted during the development of the Congruence Engine bid cannot simply be adapted to new partners, and that each new contract needs to be drafted according to the specificities of the dataset and the nature of the collaboration. Daniel was involved in a first meeting about drafting a data sharing agreement with the National Coal Mining Museum, and a second meeting has been scheduled for next week to talk about the University of Leicester. In this case, the agreement will also need to consider the intellectual contribution Stef De Sabbata is bringing to the project as the data scientist involved in the Oral History investigation.

This week has also been extremely productive in terms of structuring the next steps of the investigation. I was in Leicester on Thursday to meet Colin Hyde (EMOHA) and Stef De Sabbata and discuss the datasets. I have been working with Colin over the past few years, as the East Midlands Oral History Archive was one of the partners of the Unlocking Our Sound Heritage project, where I did my first PhD fieldwork research. In 2020 and 2021 I also worked with Colin on the module ‘Engaging Audiences’ at the School of Museum Studies, where we guided students in the creation of a Sound Exhibition inspired by the oral history archive. Ethics in the use of the data was a key concern across the module, and we reflected on the fact that this is still a persistent barrier to making this material more accessible in innovative ways. We agreed that reflecting on the ethical implications of processing this type of data needs to be a key aspect of the investigation, and I am keen to involve Anna Maria in this conversation soon. We then discussed the two datasets from the EMOHA that we identified for this preliminary stage of testing: Mines of Memory and Textile Tales. Both datasets include high quality audio recordings and digitized transcripts. Colin highlighted that Mines of Memory in particular is low risk in terms of content sensitivity, and suggested that we could also use reduced datasets in the experiment, excluding any sensitive interviews. In relation to Textile Tales, he explained that the recordings (but not the transcripts) are archived at the EMOHA, and that there is an agreement with the NTU, which retains the copyright, mentioning the opportunity to use the recordings for research and public use (including internet access). Next Monday I have a meeting with Tom Fisher, Tonya Outtram and Helen Foster, the EMOHA researcher who worked on the project, to discuss this in detail.

After this meeting, I also had a two-hour session with Stef in which we could talk extensively about our preliminary findings and reflections, and about their experiments with large language models and social media data. We discussed the phases of the pipeline (transforming audio into text using Whisper, and then text into numbers using the large language models) and what we think can be the core of the experiment: exploring different ways to visualize the topics and connections emerging from pre-processing the data through the models. I showed Stef the geolocalized timeline I created on Omeka to visualize the connections contained in (and manually extracted from) an oral history transcript, and we reflected on the fact that a visualization like this can help us imagine new ways to share oral history recordings online and make people interact with these stories. We compared this example to the visualizations Stef developed for their previous experiment on social media, which show different layers of aggregated data. We discussed the potential for this type of visualization, and for further types that we will explore in this experiment, to inspire and be inspired by specific historical questions. Once we have a preliminary test (hopefully by the end of March), our intention is to extend this conversation to a group of historians and researchers who know these datasets very well and can help us understand which visualizations can support different kinds of research questions and modes of interaction with the data. I feel that what we achieved in this in-person session could not have been achieved in an online meeting, and we decided that a monthly in-person day in Leicester would make our work together more productive and fruitful. I will discuss with Alex another potential date in March to go to Leicester together.
This would also allow me to see how Stef is progressing with the coding and the EMOHA data, considering that we need to have the agreement in place in order to access this dataset, and this could take time. However, we discussed the opportunity of involving a data scientist internal to Congruence Engine, with whom Stef would be able to share the code. This would also allow us to try the same pipeline on the OH datasets we previously collected from the Congruence Engine partners.
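
To make the two phases of the pipeline more concrete, here is a minimal sketch of its shape. It is only an illustration under stated assumptions, not the code Stef is writing: the transcription step is passed in as a function (in the real pipeline this is where Whisper would sit), and the "text into numbers" step is replaced by a simple word-count vector standing in for a large language model embedding. The file names and transcript texts are invented placeholders, not material from the archive.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str, vocabulary: List[str]) -> List[float]:
    """Turn a transcript into numbers: one word count per vocabulary entry.

    Stand-in only: the real experiment would use a language model here.
    """
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run_pipeline(
    audio_files: List[str], transcribe: Callable[[str], str]
) -> Dict[str, List[float]]:
    """Phase 1: audio -> text. Phase 2: text -> vector."""
    transcripts = {path: transcribe(path) for path in audio_files}
    vocabulary = sorted(
        {word for text in transcripts.values() for word in text.lower().split()}
    )
    return {path: embed(text, vocabulary) for path, text in transcripts.items()}


# Hypothetical placeholder data standing in for real recordings.
fake_transcripts = {
    "interview_a.wav": "placeholder words about mining work",
    "interview_b.wav": "placeholder words about textile work",
    "interview_c.wav": "something else entirely",
}

vectors = run_pipeline(list(fake_transcripts), lambda path: fake_transcripts[path])
similarity_ab = cosine(vectors["interview_a.wav"], vectors["interview_b.wav"])
similarity_ac = cosine(vectors["interview_a.wav"], vectors["interview_c.wav"])
```

Once every interview is a vector, interviews that talk about similar things end up close to each other, and it is this closeness that the visualizations would try to show.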

FOLK SONG INVESTIGATION

The folk song investigation is also progressing quickly. Tim put us in contact with Mick Grierson, Professor of Creative Computing at UAL, and we began a conversation about the opportunity to involve musical analysis software in this investigation. Mick shared our request with his team and we hope we can schedule a conversation soon. On Monday Daniel and I joined the first digital drop-in session with Kunika Kono and Alex Butterworth. Ahead of this meeting, Daniel shared with Kunika two preliminary visualization tools (the knowledge graph on ONODO and a digital story on Yarn), both showing the connections contained in the song ‘The Row between the Cages’. We reflected on the potential of Yarn to display the narrative dimension and the ability of the knowledge graph to visualize different types of connections. However, Kunika pointed out that, before choosing a visualization tool, we need to understand the data format we are going to use. She suggested exploring the use of XML, a machine-readable format that would allow us to break the song lyrics down into individual pieces, which might be useful for future analysis. We also met with Simon Popple to talk about the role of Yarn in this investigation and, more broadly, to explore the connections between personal stories and museum collections. In creating the story about my flat cap for the symposium organized by De Montfort University, I wanted to include a museum object from an online collection website, but I noticed that the only way to do that is to download the image, copy the metadata and upload both as an item on Yarn. Daniel had the same issue with his story. We reflected on the opportunity to develop a ‘museum collection’ retrieval function, similar to the one used to embed videos or sounds from YouTube or SoundCloud.
This would allow Yarn users to include museum objects in their stories more easily, and it would also be a way to encourage people to search for collection items to use in their stories. We agreed that we will share with Simon our reflections on potential features that might help us explore the role of Yarn in relation to the concept of national collection as a verb. We also talked about the role that YouTube could have in this investigation, both as an archival space, where we can find audio and video recordings that might allow us to compare different versions of the same song, and as a community space. Simon noted that folk song videos on YouTube often display comments from experts or performers, who might also explain the context of the performance or specific dialect terms.
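
As a rough illustration of Kunika's suggestion about XML, the sketch below shows one way song lyrics could be broken down into individually addressable pieces. It is only an assumption about what such a format might look like: the element names (`song`, `stanza`, `line`) and the numbering attribute `n` are invented for this example, and the lines are bracketed placeholders rather than the actual lyrics of the song.

```python
import xml.etree.ElementTree as ET
from typing import List


def lyrics_to_xml(title: str, stanzas: List[List[str]]) -> ET.Element:
    """Wrap each stanza and each line in its own numbered XML element."""
    song = ET.Element("song", {"title": title})
    for stanza_number, lines in enumerate(stanzas, start=1):
        stanza = ET.SubElement(song, "stanza", {"n": str(stanza_number)})
        for line_number, text in enumerate(lines, start=1):
            line = ET.SubElement(stanza, "line", {"n": str(line_number)})
            line.text = text
    return song


# Placeholder text only: these are not the real lyrics.
placeholder_stanzas = [
    ["[first line of stanza one]", "[second line of stanza one]"],
    ["[first line of stanza two]"],
]

song = lyrics_to_xml("The Row between the Cages", placeholder_stanzas)
xml_text = ET.tostring(song, encoding="unicode")

# Each piece can then be retrieved on its own, e.g. the first line of stanza 2.
second_stanza_first_line = song.find("stanza[@n='2']/line").text
```

With the lyrics stored this way, a dialect term or a place name could later be marked up inside a single line, and a visualization tool could pull out exactly the stanza or line it needs.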

We also started discussing with Tim, Helen and Arran the potential involvement of Jennifer Reid in this investigation. We would like to update her on how the investigation is progressing and explore to what extent she would like to be involved. She would be key in helping us reflect on the linguistic and performative dimensions of the songs, and we would also like to explore the opportunity to use her material from the Manchester Workshop (recordings of the performances, digital copies of her musical scores and any further data she would be happy to share).

Alongside the work for the investigations, I have been preparing a lecture for the SMG-led module Curating Science and Technology. In the lecture, which took place on Wednesday, I reflected on the changing relationship with audiences in a digital participatory culture and I presented Yarn as a space to experiment with the creation of an object biography. At the end of the session, we gave students time to think about a potential digital story and we encouraged them to use the platform for their assignment.