10/11/2023

This week four communications investigations got the green light to go ahead. This was also the week I finally got cracking on the communications-specific dataset for TAGLab, so it’s been great to stop thinking and reading about things and start getting practical instead (although there is more thinking and reading to come, for sure).

Gender in BT Archives

Anna-Maria and I have already launched into some work on the gender project. We have run genderize.io on the full test dataset, which is a series of people records associated with objects in the BT collection. This list is 1500 entries long, with names repeating because each entry records a person being linked to a catalogue entry. I am anticipating the actual list we work with being much longer, as it will be a combination of the 57,000 entries in BT’s database and 7487 unique people records matched accordingly.
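Because names repeat across the list, it may be worth collapsing them to a unique set before sending anything to genderize.io. The snippet below is a minimal sketch of that idea, not a record of what we actually ran. It assumes the first names are already isolated, that the API accepts several names per request through a repeated `name[]` parameter, and that the per-request cap is 10 (worth checking against the current API documentation). The sample rows and function names are hypothetical, and the code only builds the request URLs rather than calling the service.

```python
from urllib.parse import urlencode

API_URL = "https://api.genderize.io"
BATCH_SIZE = 10  # assumed per-request cap on names; check the current API docs


def unique_first_names(first_names):
    """Return distinct names in first-seen order, ignoring case and blanks."""
    seen = {}
    for name in first_names:
        key = name.strip().lower()
        if key and key not in seen:
            seen[key] = name.strip()
    return list(seen.values())


def batch_urls(names, batch_size=BATCH_SIZE):
    """Build one request URL per batch of names, without sending anything."""
    urls = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        # A repeated "name[]" key is how several names go into one request.
        query = urlencode([("name[]", n) for n in batch])
        urls.append(f"{API_URL}?{query}")
    return urls


# Hypothetical rows: one first name per person-to-catalogue link.
rows = ["Mary", "John", "mary ", "Alice", "John", ""]
names = unique_first_names(rows)
urls = batch_urls(names)
```

The results for each unique name could then be joined back onto every row that carries it, so a name that appears many times in the list is only looked up once.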

We are not sure how to ensure genderize.io always runs on the first name, because first names and surnames in BT’s catalogue are stored inconsistently, so the order of the names often switches around. This is something to keep pondering – as I’m reluctant to manually correct 57,000 rows in OpenRefine. Any suggestions are very welcome!
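One possible way to narrow down the manual work is to let a couple of simple rules pick the first name where the answer is unambiguous, and flag everything else for review. The sketch below is only an illustration under stated assumptions: that some records use a "Surname, Forename" form, that titles and bare initials can be ignored, and that a lookup set of known first names is available (here a tiny hypothetical one, which in practice might be built from names genderize.io already recognised confidently). None of the names or rules come from BT’s catalogue.

```python
# Hypothetical lookup set; a real one would be far larger.
KNOWN_FIRST_NAMES = {"alice", "mary", "john", "william", "thomas"}

# Titles to ignore when looking for a first name (an illustrative list).
TITLES = {"mr", "mrs", "miss", "ms", "dr", "sir", "prof", "rev"}


def _tokens(text):
    """Split on whitespace, dropping titles, bare initials and punctuation."""
    out = []
    for token in text.split():
        clean = token.strip(".,")
        if not clean or len(clean) == 1 or clean.lower() in TITLES:
            continue
        out.append(clean)
    return out


def guess_first_name(raw, known=KNOWN_FIRST_NAMES):
    """Return (first_name, rule_used), or (None, "review") if unsure."""
    # Rule 1: "Surname, Forename" - take the first usable token after the comma.
    if "," in raw:
        after_comma = _tokens(raw.split(",", 1)[1])
        if after_comma:
            return after_comma[0], "comma"
    # Rule 2: exactly one token is a known first name.
    matches = [t for t in _tokens(raw) if t.lower() in known]
    if len(matches) == 1:
        return matches[0], "lookup"
    # Anything ambiguous is left for a human to check.
    return None, "review"


examples = ["Smith, Mary", "Bell Alice", "Thomas John", "Mr. J. Preece"]
results = [guess_first_name(e) for e in examples]
```

The design choice here is to prefer "don't know" over a wrong guess: a name such as "Thomas John", where both parts could be a first name, is sent to review rather than guessed, so only the ambiguous rows would need attention in OpenRefine.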

Today I was granted access to BT’s catalogue, and I have exported a full list of person names, which I will convert into a readable format next week and then look to reconcile with the full catalogue export we already have. As well as giving me the ability to export catalogue entries, having access to the catalogue means I can access most digitised files belonging to BT, so if we look to take the NLP testing beyond the catalogue data, we might find some strong files here to sample it on. Lucy Havens has been in touch to ask for my GitHub ID, so I’m optimistic we will have her NLP model to review shortly.

Searchable GPO Database & Bradford telegraphy

While I’m waiting to get set up on ABBYY FineReader, I have been trying to get to grips with how I could use LangChain and GPT for my work on the GPO Circulars Database. I’m keen to work out, soon, how much of this project will hinge on manual work with ABBYY and how much can lean on the documents as they stand. Once I know this, I’ll be in a better position to start estimating just how large this project I am embarking on is!

A question from Tim regarding information relevant to the international reach of Bradford’s telegraph network, alongside my own TAGLab work this week, which has been flagging hardly any telegraphy-related hits, made me wonder: where are the telegraph materials? Not the fleeting references, but the proper meaty content? And, once we find them, what are they saying about Bradford?

The answer may lie in the Postal Museum. For a bit of context, BT Archives used to hold the full history of the Post Office, owing to the GPO controlling all means of communication including telephony. Since the Postal Museum and its archive emerged as a separate space, the GPO records have been (and continue to be) split between the two archives. Letters in one, telephones in the other, and, apparently, telegraphy wherever it best fits. I suspect this has often been the Postal Museum, because the telegraph belongs to a period when it made a lot more sense for the GPO to be controlling all communications in the country, while the telephone was always written about and dealt with by separate departments.

The Postal Museum hold at least 12 years of Post Office Guides for Bradford and District, from 1903 – 1915. I imagine, once accessed, these would be valuable additions to a searchable database of the history of Bradford’s communications. I’m hoping to view them soon, and also open up conversations with the museum’s archive team about their interest in Congruence Engine.

TAGLab

For both my CV and researchers’ notes investigations I am waiting on some data, which has meant this week I had plenty of time to make progress with TAGLab. Today I finished the BT Archives dataset, and got to the Science Museum collection. Very quickly, it became clear my BT Archives mapping doesn’t fit particularly well with the SMG objects, for two reasons. 1 – the BT dataset is archival, so its topics revolve around the very focused contents of the literature, while the SMG dataset is made up of objects that fit into broader object-focused subtopics. 2 – the SMG dataset includes a lot of objects that have nothing to do with BT, notably cameras and cinema equipment.

It’s interesting to me that linkages I expected to come up through this work have not. For example, I thought BT’s collection would mix well with the SMG computing collection because significant objects, such as ERNIE I, were the result of work at Dollis Hill, the GPO research unit whose archival files live in BT. However, having reached the end of the BT mapping, I haven’t created a single computing-related master topic or subtopic.

Jack suggested we have a meeting next week to reflect on what we should be pulling from the Science Museum to link with BT, since what we have selected is too general. With hindsight, I can see that I was initially hoping that BT might map nicely against any reference to media in the Science Museum collection – but really BT is a telecommunications collection, not a general (all-encompassing) communications one. Just as BT don’t hold all the answers for telegraphy in Bradford, they also don’t hold enough in-depth information about the history of communications in general to link to cameras in the Science Museum collection.