A slightly early diary entry from me as I’ve had an exciting week, and wanted to get some things down before the equally exciting Bradford workshop tomorrow. It’s a very long one, so I have divided it into a technology and an archival work section.
Technology work
Monday kicked off a technical two days for me, with a meeting with Alex to go over the technical aspects of my investigation proposals for the 3rd November. It was great to work with quite a few of you in the digital drop-in to look at some Computer Vision techniques, and I ended the day with a catch-up with Jon to iron out the research questions for the investigation proposals.
On Tuesday, feeling a little overwhelmed by just how much I knew I didn’t know, I decided to experiment with some technologies myself. I have reached out to Lucy Havens about her gender-bias NLP work, which I hope to be able to apply to BT’s catalogue and possibly even a file of OCR’d data in the future, and have a meeting coming up with her next Tuesday. In the meantime, I’ve been familiarising myself with genderize, which can guess with reasonable accuracy the gender of any given name, although it can only handle one name at a time (so ‘Tasha’, but not ‘Tasha Kitcher’; see the example below):
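For anyone wanting to try this themselves: genderize is queried one name at a time over HTTP and answers with a small JSON object. A minimal sketch of handling such a response, where the sample values and the confidence threshold are my own illustrative choices rather than anything from the BT data:

```python
import json

# A response in the shape documented by the genderize.io API
# (name, gender, probability, count); the values here are illustrative.
sample = '{"name": "tasha", "gender": "female", "probability": 0.98, "count": 11000}'

def parse_genderize(raw, threshold=0.8):
    """Return the predicted gender, or None when the service is unsure
    (no gender returned, or confidence below the threshold)."""
    data = json.loads(raw)
    if data.get("gender") and data.get("probability", 0) >= threshold:
        return data["gender"]
    return None

print(parse_genderize(sample))  # female
```

The live service would be queried with something like `https://api.genderize.io?name=tasha`; the threshold is worth tuning, since rare names come back with low probability and tiny counts.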

I extracted around 1500 names from the BT dataset we already have, using the ‘Person’ field, which is only populated when the record relates to an object, and split these in OpenRefine. There is an issue I still need to address with the names acquired this way: they are sometimes written ‘Forename, Surname’, sometimes ‘Surname, Forename’ and sometimes ‘Title, Forename, Surname’, so running genderize on one column means it is sometimes picking up surnames, sometimes forenames, and sometimes titles. Still, initial results show a strong trend, with 150 names identified as female, 201 names not identified, and the remaining 1149 names all identified as male.
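The mixed-format problem above could be partly tamed before the gender lookup. Here is a rough sketch of one approach; the title list and the “try both ends” heuristic are my own assumptions, not part of the BT dataset or the OpenRefine workflow:

```python
# Rough normalisation of the mixed 'Person' field formats.
TITLES = {"mr", "mrs", "miss", "ms", "dr", "sir", "lady", "lord", "rev"}

def candidate_forenames(raw):
    """Split a 'Person' field on commas, drop recognised titles, and
    return the remaining parts as candidate forenames. Because
    'Forename, Surname' and 'Surname, Forename' are indistinguishable,
    both ends are returned so the higher-confidence lookup can win."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    parts = [p for p in parts if p.lower().rstrip(".") not in TITLES]
    if not parts:
        return []
    return [parts[0]] if len(parts) == 1 else [parts[0], parts[-1]]

print(candidate_forenames("Mrs, Tasha, Kitcher"))  # ['Tasha', 'Kitcher']
```

Running genderize on both candidates and keeping whichever comes back with the higher probability would at least stop titles being gendered, though surnames that double as forenames would still slip through.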
In the future, I would look to search the whole BT catalogue names database for gendered discussions. The goal would be to understand how gender is talked about, using Havens’ work or something similar, as well as how often male or female names are connected to catalogue entries. To get this list of names I would need to reconcile BT’s authority list with the catalogue export we have, which includes authority codes but not the actual people and company names.
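That reconciliation step is essentially a join between the two files. A minimal sketch, where the field names, codes and names are all invented for illustration (the real export and authority list will have their own structure):

```python
# Hypothetical authority list: code -> person or company name.
authority_list = {"P001": "Tasha Kitcher", "P002": "GPO Engineering Dept"}

# Hypothetical catalogue export rows carrying only the codes.
catalogue_rows = [
    {"record": "phone-706", "authority_code": "P001"},
    {"record": "switchboard", "authority_code": "P999"},  # no match
]

def reconcile(rows, authorities):
    """Attach the authority name to each row; unmatched codes get None
    so they can be reviewed rather than silently dropped."""
    return [{**row, "name": authorities.get(row["authority_code"])}
            for row in rows]

for row in reconcile(catalogue_rows, authority_list):
    print(row["record"], "->", row["name"])
```

In practice this is exactly what OpenRefine’s cell.cross / reconciliation features are for, but seeing it as a plain join makes clear what the unmatched-code cases will look like.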
After gender, I had a look at some of the Computer Vision tools from Monday’s digital drop-in discussion. For now, I am testing tools on two images of the same telephone, held in the NMS and SMG collections: the 706 dial telephone in ivory. Images are below, both clear images without background ‘noise’ but taken from slightly different angles.


I am interested to see if any pre-existing CV tools could be used to identify the connection between these objects, which would in some ways circumvent the need to train them. Putting the images through LLaVA (Large Language and Vision Assistant), I was pleased that both were recognised as white telephones; however, the descriptions did reveal some of the technology’s limitations:
The top image, when put through LLaVA, produced this description – “The image features a white rotary dial telephone sitting on a black surface. The phone has a clear dial pad, which is a characteristic of older telephone models. The phone is placed on a table, and it appears to be in good condition, ready for use. The simplicity of the design and the absence of any additional features or accessories make it a classic and nostalgic piece.”
The bottom image led to this – “The image features a white telephone with a clock on the front. The clock is positioned in the middle of the phone, and it appears to be a prominent feature of the device. The phone is placed on a black background, which contrasts with the white color of the phone and the clock. The overall scene gives the impression of a simple, functional device that combines the functionality of a phone with the convenience of a clock.”
The top image’s description is much more accurate, while surprisingly the dial being almost head-on to the camera in the bottom image has confused the programme, which interprets the dial as a clock. In both cases the programme identified a white telephone, which, thinking more generally, might be a useful way to group objects. Might a researcher ever think, “I am searching for an ivory telephone, so I will search for a white telephone and look through the N results this CV tool presents me with”?
I ran these descriptions through KeyBERT afterwards, wondering if keywords would be the way to link these images. The keywords are visible in the Python screenshot below, and might be useful if we were looking for a tool that could identify telephones in collections… but I think ultimately we would need something with finer granularity for the sort of collections communications historians deal with.
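To make the linking idea concrete: once each image has a keyword set, the two records could be matched on keyword overlap. A minimal sketch using Jaccard similarity, where the keyword sets below are invented stand-ins for the actual KeyBERT output in the screenshot:

```python
# KeyBERT itself is roughly: KeyBERT().extract_keywords(description)
# Here we just compare two already-extracted keyword sets.

def jaccard(a, b):
    """Overlap between two keyword sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Illustrative stand-ins for the keywords from the two descriptions.
top = {"telephone", "rotary", "dial", "white"}
bottom = {"telephone", "clock", "white"}

print(round(jaccard(top, bottom), 2))  # 0.4  (2 shared / 5 total)
```

Even in this toy case the “clock” mistake drags the score down, which illustrates the granularity problem: keyword overlap can say “both are telephones”, but distinguishing a 706 from any other white phone needs features the descriptions simply don’t contain.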

I have some more tools to trial next week, but am still very much pondering how a CV project linking communications collections would work. Dan ran these two images through TensorFlow and got a similar result – the tool knows they’re both phones, but thinks one contains a watch, and doesn’t know any more than that.
Archival work