General updates
Anna-Maria and I submitted a pitch to The Conversation this week about ChatGPT’s potential for history, as well as the considerable ethical issues it raises. We will be submitting this to a journal in the future, regardless of the response we get from The Conversation.
I also spent a lot of this week tidying spreadsheets for the Pollution study, which gave me a nice opportunity to test ChatGPT’s RegEx abilities. The key highlight is: if you have about five rows of data, it’ll be great; if you’re working with a normal (larger) spreadsheet, prepare to do most of the heavy lifting yourself. That said, I’m pretty sure the RegEx aspect of ChatGPT is new, so it’s worth keeping an eye out to see if it gets better.
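For anyone curious what this sort of RegEx tidying looks like in practice, here is a minimal Python sketch. The cell values, patterns, and function names are made up for illustration; they are not the actual Pollution study data or the expressions ChatGPT suggested.

```python
import re

# Hypothetical patterns: collapse repeated whitespace, and find a four-digit year.
WHITESPACE = re.compile(r"\s+")
YEAR = re.compile(r"\b(1[6-9]\d{2}|20\d{2})\b")


def tidy_cell(value: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return WHITESPACE.sub(" ", value).strip()


def extract_year(value: str):
    """Return the first year between 1600 and 2099 found in the cell, or None."""
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


# Invented example cells standing in for a messy spreadsheet column.
rows = ["  Site   A,  1868 report ", "Site B (undated)"]
cleaned = [(tidy_cell(r), extract_year(r)) for r in rows]
```

Writing the patterns out like this and running them over the whole column is one way to do the heavy lifting yourself once the spreadsheet gets beyond a handful of rows.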
Circulars
Nayomi suggested we try ABBYY FineReader alongside the other parsing techniques she’s been trying for the RAG pipeline, so we can compare different ways of processing the circulars and the impact that has on the output. I’ve been running ABBYY on the circulars when I can (which is not often, as it makes my computer scream in complaint) and saving the output as plain text. This means the columns get split up, which is the key issue with the circulars’ formatting.
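One crude way to get a first feel for how much two parsing routes differ is to compare their plain-text outputs directly. The sketch below is only an assumption about how that might be done, using Python’s standard difflib module; it is not part of the RAG pipeline, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Collapse all whitespace so line breaks and spacing do not count as differences."""
    return " ".join(text.split())


def similarity(text_a: str, text_b: str) -> float:
    """Return a score between 0 and 1; 1.0 means identical after normalisation."""
    return SequenceMatcher(None, normalise(text_a), normalise(text_b)).ratio()
```

A low score between two outputs for the same circular would flag it as worth a closer look by eye; it says nothing on its own about which output is the better one.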
For the LLM we are thinking of using GPT, which would be a nice way to allow for direct comparison between the MyGPT pipeline and the RAG one.
Personal researcher notes
One last researcher has surfaced and sent me their notes. What they sent was totally different to anything I had received before: virtual ‘post-it notes’ stuck into a PDF from the archive at points they found noteworthy. To me, this seemed a bit of a lost cause.
However, the notes can be exported into an Excel sheet, which, when fed to my ‘Archive Scribe’ MyGPT, returned the following description:
A technical assessment report on British Telecom kiosks, analyzing design, functionality, and maintenance. The report includes historical comparisons, financial implications of color schemes, and concerns regarding preservation. It highlights safety aspects, market demand, and potential for monetary gain through advertising.
It will be a few weeks until I get the volunteers’ thoughts on how accurate and complete the description is, but my impression is this is actually quite good. Certainly, it’s better than the result I was expecting. I tentatively propose that this may mean we can work with even the most peculiar format of notes.
After Easter, I will see if I can get a write-up of this investigation into a journal.
In the meantime, I’m going to be trying out some of the same prompting methods I used for this investigation on collections data, to see if it can help create summaries for the data registry work.
Comms & CV (Heritage Weaver)
I had a brief chat with Kaspar about looking to ingest textiles and energy collections from SMG and NMS in two months’ time, which he’s very up for. We are interested to see how the model works with these (potentially more challenging) objects. Telephones are fairly bog-standard objects to train a model on, while CLIP has probably not seen many engines or looms before!
We also discussed potentially trialling a segment of moving image material, literally ten seconds, to see if the tool can identify images within a moving file. I told Kaspar he would be very popular indeed if this were to be possible, so more experiments to come in time! I imagine Once Upon a Sheep would be a good case study here.
Gender
We have reached out to Havens about possibly buying some of her time to finally get this NLP up and working. I’ll provide any updates as I get them.
Exhibit
We are hoping to send a brief out to designers in the next five weeks, for which we need a concrete idea of the data we have available, both for the mapping interface itself and for the “data points” where images, objects, and oral histories may be connected. I’m doing some digging around for this information, and will be working with Dave over the next few weeks to finalise the brief.
Tim and I had a meeting with Kate Burnett from NSMM about the potential for our Congruence Engine exhibit to feature within a wider ‘Bradford Screen Histories’ exhibition, which could mean we interlace some of our work throughout their exhibition, as well as have the exhibit and mapping interface. We will continue to talk to the team at NSMM over the next few weeks, and this could be a really exciting opportunity to showcase the way C/E techniques can help tell specific stories for an existing exhibition.