03/11/2023

My week has mostly been spent getting my investigation proposals ready for Monday. These are now available in the investigations table, and I’ll organise this week’s diary under the same headings as the proposals and Monday’s upcoming agenda.

Investigating gendered language in the archive

On the way home from work last Thursday I received a very welcome ‘investigation collaborator wanted!’ email from Anna-Maria. We are now charging ahead with a proposal to use a customisable NLP model to investigate how gender is represented in BT Archives, with scope to expand to other archives later on. Much of this investigation hinged on how Tuesday’s conversation with Lucy Havens, the PhD student who created the NLP model, would go, and I came away motivated by our discussion with her.

A couple of key points from the conversation with Lucy were:

The NLP model is based on annotations by gender studies experts: individuals highly motivated to find disparity in the archive. This is definitely worth bearing in mind.
In the annotation exercise Lucy came across tropes similar to the ones I was expecting to find in my own work, including the labelling of adult women as girls and the gendering of inanimate objects (mostly ships, in Lucy’s case).
The annotation behind Lucy’s NLP model took 4 people working over 8 weeks, totalling 400 hours of work, to read just 10% of the catalogue. So if we want to tweak her process, we really need to rely on customising an NLP model rather than attempting our own annotation!
Lucy is not publishing until Christmas, but she plans to share some code with us within the next two weeks.
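To put that annotation effort in perspective, here is a rough back-of-the-envelope calculation. It assumes the effort scales linearly with the share of the catalogue read, which is my assumption and not something Lucy said.

```latex
\[
\frac{400\ \text{hours}}{4\ \text{people} \times 8\ \text{weeks}} = 12.5\ \text{hours per person per week}
\]
\[
\frac{400\ \text{hours}}{0.10} \approx 4{,}000\ \text{hours to annotate the whole catalogue at the same rate}
\]
```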

Another interesting comment from Lucy concerned the binary categories used by genderize.io. This is obviously problematic from a contemporary perspective, but she reflected that some gender scholars find it useful for historical work, because it reflects the binary that the historical actors we are studying lived within.
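For anyone unfamiliar with the tool, here is a minimal sketch of querying genderize.io from Python using only the standard library. The API takes a first name and returns JSON with `name`, `gender`, `probability` and `count` fields. The `gender` field is only ever "male", "female" or null, which is the binary Lucy was referring to. The `interpret` helper and its 0.9 probability cut-off are hypothetical choices for illustration. They are not part of the service or of Lucy’s method.

```python
import json
import urllib.parse
import urllib.request

GENDERIZE_URL = "https://api.genderize.io"


def build_query_url(names: list[str]) -> str:
    """Build a genderize.io query URL; several names are sent as repeated name[] parameters."""
    if len(names) == 1:
        query = urllib.parse.urlencode({"name": names[0]})
    else:
        query = urllib.parse.urlencode([("name[]", n) for n in names])
    return f"{GENDERIZE_URL}?{query}"


def fetch(names: list[str]) -> list[dict]:
    """Call the live API (requires network access) and always return a list of results."""
    with urllib.request.urlopen(build_query_url(names), timeout=10) as resp:
        data = json.load(resp)
    return data if isinstance(data, list) else [data]


def interpret(result: dict, min_probability: float = 0.9) -> str:
    """Collapse one result to 'male', 'female' or 'unknown'.

    Hypothetical helper: a null gender, or a probability below the cut-off,
    is reported as 'unknown' rather than forced into the binary.
    """
    gender = result.get("gender")
    if gender is None or result.get("probability", 0.0) < min_probability:
        return "unknown"
    return gender
```

Treating low-confidence results as "unknown" keeps the tool’s binary visible without letting uncertain guesses pass silently into the analysis.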

Transforming personal researcher notes into archival descriptions

This proposal is mostly on hold until Monday’s meeting, but I am intrigued by (1) Tim’s suggestion that we could make use of the many photos historians take of archival documents during their visits to archives, and (2) Stefania and Alex’s comment on how usable data such as Alex A’s would be for this project.

I am wondering (and this is an early, half-developed thought) whether this project should look towards creating some sort of collaborative space where any researcher can share whatever they want from their archival work in one location: notes, worked tables, images, published papers, and so on. These could be shared on the underlying understanding that “anything goes”, and archives or other scholars could use the resources however they see fit. Such a space would be useful for catalogue enhancement and for linkage, but could it also be crucial for preserving information held by archives in the event of changes to access?

Linking communications collections through computer vision

This week I finally managed to stump LLaVA by showing it images of telegraphy objects. The NMS telegraph pole below was described as “a wooden structure with a variety of knobs and dials on it”, while the telegraph pole climbers from the SMG collection were labelled as scissors.

[Image: NMS telegraph pole and SMG telegraph pole climbers]

Still, my goal is not to have LLaVA identify objects correctly every time. Rather, I am interested in whether the descriptions it produces might be useful for linkage. I have started a spreadsheet of the keywords these descriptions contain, colour-coded to show where they overlap (small sample below).

[Image: sample of the colour-coded keyword spreadsheet]
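Here is a minimal sketch of how the colour-coding could be automated. It assumes that "keywords" are simply the non-stopword words in each LLaVA description, and that overlap is scored with Jaccard similarity. The stopword list, the 0.2 threshold and the object IDs are all hypothetical. Only the telegraph pole description is LLaVA’s actual output; the other two descriptions are invented for illustration.

```python
import re
from itertools import combinations

# Hypothetical minimal stopword list; a fuller list would be needed for real descriptions.
STOPWORDS = {"a", "an", "the", "with", "of", "on", "it", "its", "and", "in", "is", "to", "for"}


def keywords(description: str) -> set[str]:
    """Lower-case a description, split it into words and drop stopwords."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared keywords divided by all distinct keywords across both sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_links(descriptions: dict[str, str], threshold: float = 0.2) -> list[tuple]:
    """Return (id_a, id_b, score, shared keywords) for every pair scoring at or above threshold."""
    kw = {obj_id: keywords(text) for obj_id, text in descriptions.items()}
    links = []
    for first, second in combinations(sorted(kw), 2):
        score = jaccard(kw[first], kw[second])
        if score >= threshold:
            links.append((first, second, round(score, 2), sorted(kw[first] & kw[second])))
    # Strongest candidate links first, like the most heavily shaded spreadsheet rows.
    return sorted(links, key=lambda link: link[2], reverse=True)


descriptions = {
    "nms_telegraph_pole": "a wooden structure with a variety of knobs and dials on it",
    "hypothetical_object_b": "a wooden pole with metal knobs",  # invented example
    "hypothetical_object_c": "a pair of metal scissors",  # invented example
}

for link in candidate_links(descriptions):
    print(link)  # ('hypothetical_object_b', 'nms_telegraph_pole', 0.29, ['knobs', 'wooden'])
```

Jaccard similarity is used here only because it is easy to check against a spreadsheet by hand: count the shared keywords and divide by the number of distinct keywords across both descriptions.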

I had been thinking about turning a more complete version of this sheet into the sort of map Nomic Atlas creates. However, I have since met with Kaspar and become very excited by the work he and the rest of the team have been doing on the Heritage Weaver project!

I am feeling optimistic that this computer vision project could go in a few different directions. While I’m interested to see how far I can take the LLaVA work, I am also very keen to combine forces with Heritage Weaver. Feeding NMS and SMG communications images into Heritage Weaver’s pipeline might mean we can start answering questions about what the linkages mean, rather than just wondering whether we can make the linkages at all.

Creating a searchable database of GPO circulars