A few investigation specific updates:

Gender

Thank you to Kunika, who worked with Anna-Maria and myself to merge our existing BT Archives dataset with the ‘people’ export from the authority files in BT’s catalogue! While the process was easy once the VIB extension was installed, the results were not ideal – with a lot of authority codes not linking through to authority records.

I have since linked the entire authority dataset with the BT catalogue, and while this initially still led to 15,331 codes not linking, after Kunika came to the rescue again this number is now down to 921. I intend to ask the BT team about possible reasons for these codes not matching authority records, perhaps things have been systematically moved in the recent past. Depending on what they say, I will then look to ignore or manually correct the 921 codes that do not attach to authority records. Once this dataset is complete and clean, we can then start to look at the different techniques we can employ to quantify gender representation in the archive.

GPO Circulars

I have converted the 1979 circular into an RTF file, which keeps the titular formatting of the circular but splits the columns up into page chunks. My understanding is that I now need to convert this file into pickle format, and then markdown, to start feeding it through GPT. Awaiting advice on how to move RTF to pickle, if that is indeed the correct next step!

I have also had a brief exchange with the Postal Museum, who would be willing for their Bradford telegraphy material to be fed into this pipeline without the need for a formal arrangement as long as the information was properly credited. I’m keen to get further on with the information I already have before exploring their archive and considering expanding the scope, but it’s good to know there is potential to take this project even further.

CV on Comms Collections

Looking forward to talking to Kaspar about next steps for this project in collaboration with Heritage Weaver!

In the meantime, I’ve been playing with my own proposed LLaVA -> KeyBERT -> nomic atlas process. After a few false starts, I’ve finally been able to feed my own data into nomic and produce maps. These are mostly meaningless for now as they have so few datapoints, but I look forward to seeing the results much later when I should have five key words for around 9,000 objects to unite in one space.

I am in the process of setting up the LLaVA Colab notebook (thank you again Kunika!) after a fun but unproductive day yesterday trying to set it up to run through Juypter on my larger computer (the environment that was already in place to run ARC-GIS was not compatible with LLaVA’s required extensions).

Personal Researcher Notes

I am still waiting to hear back from the THG about their involvement, but in the meantime, I decided to run a few tests of the proposed process for note conversion on my own PhD notes. This is also with the goal of informing the note-taking template I will send out to the participants.

My PhD notes were detailed down to page number, which is probably a bit much to expect any sane person to do (see small sample below):

Untitled

The current BT archives catalogue description for POST 33/1445, which is based on my own work looking into the files, is as follows:

“A collection of files relating to the Electrophone Company from 1922 – 1925. This box includes a copy of the license signed between the General Post Office and Electrophone Limited; a file containing details of royalty disputes between the GPO and Electrophone Limited; a file containing graphs showing monthly revenue and subscription numbers; an accounts file for the year 1922 – 1924; a file containing details of a serious review of financial arrangements which led to the termination of the company and the license between the GPO and the company in 1924/1925; a cessation file; a file documenting a GPO staffer’s proposal that GPO men be used to canvass Electrophone Services in 1924 and the secretaries subsequent refusal; a file documenting the reduction in the Electrophone Exchange Manager’s working time as the Electrophone business reduced; and files on the disposal of Electrophone equipment as well as the recovery of Electrophone Exchange equipment owned by the Post Office.”

I fed all of the notes in the ‘description’ section of my own notes into GPT and asked it ‘can you create a file summary using these notes written about the file?’ The output was very detailed:

The file chronicles the challenges faced by the Electrophone business since its transfer to the National Telephone Company's system. It encompasses a range of correspondence between the Electrophone Company, the General Post Office (GPO), and various internal memos discussing canvassing, commissions, declining business, and proposed solutions.

Highlights of the file include:

Throughout the file, there's a recurring theme of declining business attributed to the refusal of commissions for canvassers. Mr. Booth's persistent requests for reconsideration, along with internal debates, meetings, and attempts to find alternative solutions, shape the narrative of the Electrophone Company's struggle to maintain business under the new GPO arrangement.

I then asked ‘could you reduce that summary to a 100 word one?’ and the output was more akin to an archival description:

“The file details Electrophone's struggle post-transfer to the National Telephone Company. It covers disputes over canvasser commissions, declined proposals by Mr. Booth, internal debates on declining business, and objections from the Treasury. Persistent requests for reconsideration, numerous meetings, and suggestions for alternative solutions shape the narrative. Financial disputes and debates over performance metrics add complexity. Overall, it reflects the Electrophone Company's challenge in sustaining business under new GPO arrangements, primarily attributed to the refusal of canvasser commissions.”

A final small trial I did was to run the description column of my notes through KeyBERT first, to create a list of key words for each file level description. I then fed these to GPT, and asked ‘can you create a file summary using these keywords extracted from notes written about the file?’ GPT gave me this:

“It seems like these notes cover a wide array of topics related to an electrophone company, canvassing, commissions, letters, memos, and internal communications, often involving the GPO (General Post Office) and various commissions. There are discussions about canvassing techniques, commissions paid to canvassers, subscriptions, booths, telephones, and electrophones. Internal memos and letters seem to address objections, declining business, monitoring performance, and even investigations. The file appears to be a comprehensive record of the operations, challenges, and interactions of an electrophone business with governmental bodies like the GPO and its various commissions, along with its canvassing strategies and associated correspondence.”

This summary is less inclusive of organisations and people involved in the file, taking more of a generic listing format. I think the results of this mini-trial are quite exciting, but I still need to deal with how we might record and store other information that would be required in a CMS such as file creation dates, or relevant authority files to link through from the BT authority catalogue.