Present: Tim, Jane, Alex B, Alex A, Stefania, Arran, Sarah, Daniel, Alex F, Felix, Anna-Maria, Kunika, Stef

Apologies: Helen

Agenda:

  1. Deeper Catalogue Data - Tim. An intensive day with some progress, 28 April 2023
  2. Oral histories and folk songs investigation - Stefania, Daniel and Stef

Notes:

  1. We started by thinking about ‘bags of terms’ as fodder for linking, but this investigation, amongst others, is beginning to show that taxonomy and categories remain very important for linking. The 1921 catalogue contains a list of subject headings that could easily be transposed (though it would, of course, need to be added to).

    Investigation has 3 ways of extracting object names:

    1. Use the headings and break them down semantically. Requires a human in the loop.
    2. SPD
    3. KeyBERT

    Both SPD and KeyBERT are effective, and it is difficult to say whether one is much better than the other. A small amount of further work is needed to experiment with the specifics; this is not quite proof of concept yet.

    Questions asked by Tim:

    Link to Curatorial Voice project and published articles https://curatorialvoice.github.io/.

  2. Oral histories - Stef and Stefania. To what extent can what has already been done be understood as proof of concept?

    There may be a risk to pausing the investigation after getting the historians excited about the oral histories work. But we can un-pause at any point. Stef’s PhD student will be working on themes about oral histories and living well together in cities. So, there will be continued development work.

    Stef is keen to stress that expectations need to be managed: this is very much at the proof-of-concept phase, and not yet at prototype stage.

    A high-performance computer would be needed to run this again on a different dataset. Felix is interested in which aspects of the work require high-performance computing: the automated transcription does, and so does the embedding (using Bloom). (Alex B: as we get some VMs here at SMG, Stef might be able to tap into some of those too.)

    Neslihan is bringing together object records and oral histories. This might be the point to bring the data together in a vector database, particularly at the SMG (for example Pinecone or Weaviate).

    We agreed that it would be good to move forward with the extended meeting with historians. Also agreed that if Stef needs to / wants to move on with the work next year, we will support it as best as we can.

    Daniel mentioned that there are over 900 pages of transcription related to mining oral histories, if we want energy-related material too.
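
Illustrative sketch (not part of the meeting discussion): the idea under item 2 of bringing object records and oral histories together in a vector database can be pictured roughly as below. This is a minimal in-memory stand-in, not Pinecone or Weaviate, and it assumes embeddings already exist. The class `TinyVectorIndex`, the record IDs, the texts and the three-number vectors are all hypothetical, made up for illustration; real embeddings would come from a model and have many more dimensions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    kind: str            # "object" or "oral_history"
    text: str
    vector: list[float]  # toy embedding, invented for this sketch


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Hypothetical in-memory index; a real vector database would replace this."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def query(self, vector: list[float], kind: str | None = None,
              top_k: int = 3) -> list[tuple[float, Record]]:
        # Optionally restrict the search to one kind of record,
        # then rank by similarity to the query vector (highest first).
        candidates = [r for r in self._records if kind is None or r.kind == kind]
        scored = [(cosine(vector, r.vector), r) for r in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex()
index.add(Record("obj-1", "object", "Miner's safety lamp", [0.9, 0.1, 0.0]))
index.add(Record("obj-2", "object", "Domestic gas cooker", [0.1, 0.9, 0.1]))

passage = Record("oh-1", "oral_history",
                 "Interview passage about working underground", [0.8, 0.2, 0.1])
index.add(passage)

# Find the object record closest to the oral history passage.
matches = index.query(passage.vector, kind="object", top_k=1)
for score, record in matches:
    print(f"{record.record_id}: {record.text} (similarity {score:.2f})")
```

Filtering by `kind` at query time is one simple way to link across the two collections while keeping them in a single index; whether that suits the actual data is an open question.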