This week I have been looking at how to turn the questions from my last diary entry into real investigation proposals, which has meant a lot of (digital) searching of archives and trying to familiarise myself with the technologies available to us. The ideas I have been focusing on are 1) how to turn researcher notes into catalogue descriptions, 2) how to examine the use of gendered language in the BT Archives dataset, and 3) what we might do with some of the GPO Film Unit sources at BT Archives. Next week I will be exploring cross-institutional communication, joining the TagLab project, and continuing with some of the ideas developed below.

  1. Researcher notes into catalogue descriptions

This is an idea I am really excited by, and I have scheduled some time in the archive soon to look for files that a group of volunteers could explore. I am currently thinking that a simple table would suffice for users to input their notes: they would not need to learn any new technologies, and the information they produce would already be in tabular form for processing. I imagine KeyBERT would be a good way to process this information and turn it into something approaching a catalogue description, but I am also open to other digital techniques.
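A rough sketch of what this might look like, assuming volunteers fill in a table with two hypothetical columns, `file_ref` and `note`. The column names and sample rows are invented for illustration. The notes for each file are gathered together first, and the combined text could then be passed to KeyBERT, whose keywords would only be raw material for a description, not a finished one.

```python
import csv
import io
from collections import defaultdict


def group_notes(csv_text):
    """Gather every note for a file reference into one block of text."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        note = row["note"].strip()
        if note:  # skip rows where the volunteer left the note blank
            grouped[row["file_ref"].strip()].append(note)
    return {ref: " ".join(notes) for ref, notes in grouped.items()}


def suggest_keywords(text, top_n=5):
    """Ask KeyBERT for candidate keywords (requires `pip install keybert`)."""
    from keybert import KeyBERT  # imported here so grouping works without it

    model = KeyBERT()
    return model.extract_keywords(
        text, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=top_n
    )


# Invented sample rows; real file references will look different.
SAMPLE = """file_ref,note
FILE-001,Letters about a new telephone exchange
FILE-001,Includes staff lists for the exchange
FILE-002,
"""

notes_by_file = group_notes(SAMPLE)
```

Calling `suggest_keywords(notes_by_file["FILE-001"])` would then return a list of (phrase, score) pairs that a cataloguer could review. Keeping the grouping step separate means the table can be checked before any model is involved.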

  2. Gendered use of language in BT Archives

I started the week simply searching BT Archives metadata for gendered terms in the description and title fields. For example, the word “women” came up half as often as “men” in the description field, and a third as often in the title field. I think a next step would be to look at a combined catalogue that brings together object descriptions and authority records, so we could investigate the gender of the names linked most often to catalogue descriptions. This would be much more revealing than the gendered language I have been focusing on, as far more catalogue entries have an authority record connected to them than contain any of the gendered terms I searched for.
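To make this kind of search repeatable, a count like this could be scripted. The sketch below is only an illustration, not how I ran the searches: the two records are invented, and the field names `title` and `description` are assumptions about how an export of the metadata might be labelled.

```python
import re
from collections import Counter

TERMS = ("women", "men")
FIELDS = ("title", "description")


def count_terms(records, terms=TERMS, fields=FIELDS):
    """Count whole-word, case-insensitive matches of each term per field."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            text = record.get(field, "")
            for term in terms:
                # \b stops "men" from matching inside "women"
                pattern = r"\b" + re.escape(term) + r"\b"
                counts[field][term] += len(re.findall(pattern, text, re.IGNORECASE))
    return counts


# Invented records, only to show the shape of the data.
records = [
    {"title": "Men at work on cable laying",
     "description": "Photograph of men laying cable."},
    {"title": "Telephone exchange",
     "description": "Women operators and men engineers at the exchange."},
]

counts = count_terms(records)
```

Whole-word matching matters here, because a plain substring search for “men” would also count every occurrence of “women” and distort the comparison.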

I’ve also been looking at Lucy Havens’s work, which has focused on annotating datasets and using NLP to look at gender-biased language in archival collections. In her 2022 paper, Havens defined gender-biased language as “language that creates or reinforces inequitable power relations among people, harming certain people through simplified, dehumanizing, or judgmental words or phrases that restrict their [gender] identity; and privileging other people through words or phrases that favour their [gender] identity.”

Her work can produce clear visuals that reveal how gender is situated in a collection. For example, the chart below is taken from her 2022 article reflecting on archival documentation at the Centre for Research Collections at the University of Edinburgh. The article itself is here (https://aclanthology.org/2022.gebnlp-1.4v2.pdf) and explains the annotation labels (which, crucially, move away from the gender binary).

[Chart from Havens’s 2022 article; image not included]

  3. GPO Film Unit

Four collections readily connect when looking at the GPO Film Unit: BT Archives, who hold some of the films; the BFI, who also hold films, including duplicates of some held at BT Archives; TNA, who hold production documents; and the Postal Museum Archive, who hold films not held by BT, as well as a series of stills from the films, Public Relations documents related to film production, reports by a General Committee that investigated the Film Unit, and a handful of oral histories from those who worked for the Film Unit.

What can we do with this information?

I have been exploring some linkages between these archives myself, trying to focus on areas that are already relevant to the project. A couple of linkages I have found include The Story of Cotton, a film produced by the GPO and held at the BFI, and Coal Face, a film about coal miners held at BT and the BFI, with stills at the Postal Museum and accompanying material at BT. Another linkage, more relevant to my own interests than the energy or textiles strand, is What’s On Today, a film about the recording of a sporting event for distribution over the telephone (thus embracing the telephone as an unstable medium), which draws together the BFI, BT, TNA, and potentially even the BBC.
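One way these linkages might be recorded, as a minimal sketch: list each holding as a (title, institution, kind of material) row and group the rows by a normalised title. The rows below only restate the holdings described above. What’s On Today is left out because I have not yet pinned down what each institution holds, and the normalisation rule is an assumption that real catalogue titles would quickly test.

```python
import re
from collections import defaultdict

# Holdings as described in this entry: (film title, institution, kind of material).
HOLDINGS = [
    ("The Story of Cotton", "BFI", "film"),
    ("Coal Face", "BT Archives", "film"),
    ("Coal Face", "BFI", "film"),
    ("Coal Face", "Postal Museum", "stills"),
    ("Coal Face", "BT Archives", "accompanying material"),
]


def normalise(title):
    """Lowercase, drop punctuation and a leading article so variants match."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", title.lower())
    cleaned = re.sub(r"^(the|a|an) ", "", cleaned)
    return " ".join(cleaned.split())


def link_holdings(holdings):
    """Map each normalised title to the sorted institutions that hold something."""
    linked = defaultdict(set)
    for title, institution, _kind in holdings:
        linked[normalise(title)].add(institution)
    return {title: sorted(insts) for title, insts in linked.items()}


links = link_holdings(HOLDINGS)

# Titles that connect more than one institution are the candidate linkages.
cross_institutional = {t: i for t, i in links.items() if len(i) > 1}
```

Exact matching on a cleaned-up title is the simplest possible approach. Catalogues that record the same film under different titles would need fuzzy matching or manual checking instead.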

Digitally, I am not sure what we could do with this. Most of the files I am finding are not yet digitised, so there might be a case for digitising them and running OCR if this would allow for linkage. This could go hand in hand with using Whisper to transcribe the films, giving us more metadata to play with. Or perhaps annotation is the way forward. A goal for next week is certainly to think more about the technical approach for this potential project.
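As a starting point for that thinking, here is a hedged sketch of how Whisper output might become metadata, assuming the open-source `openai-whisper` package. The `film_id` value and the stand-in transcript are invented, and nothing here has been run on the actual films.

```python
def transcribe(path, model_name="base"):
    """Transcribe one audio or video file with openai-whisper (needs ffmpeg)."""
    import whisper  # imported lazily; install with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    return model.transcribe(path)


def segments_to_rows(film_id, result):
    """Turn Whisper's timed segments into flat rows for a metadata table."""
    return [
        {
            "film_id": film_id,
            "start_s": round(seg["start"], 1),
            "end_s": round(seg["end"], 1),
            "text": seg["text"].strip(),
        }
        for seg in result.get("segments", [])
    ]


# A hand-made stand-in with the same shape as Whisper's result dictionary,
# so the table step can be tried without running the model.
fake_result = {
    "text": " First line of speech. Second line of speech.",
    "segments": [
        {"start": 0.0, "end": 2.04, "text": " First line of speech."},
        {"start": 2.04, "end": 4.5, "text": " Second line of speech."},
    ],
}

rows = segments_to_rows("FILM-001", fake_result)
```

Keeping the timestamps alongside the text would let a transcript line be linked back to a moment in the film, which might be useful if we do go down the annotation route.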

Looking forward to more questions, more conversations, and more investigations next week!