Two weeks mostly populated by preparing presentations with various co-workers, proposing book chapters, and a little bit of Bradford event admin. I will keep this update short and to the point, mostly to jog my own memory when I’m back!
Heritage Weaver
As discussed on Monday, Kaspar and I have agreed that the aims of this investigation, before the end of the project, are to:
- Evaluate how well HW works beyond telephones, radios, and valves
- Evaluate the effect of fine-tuning CLIP
- Identify looms in Once Upon a Sheep (compare to annotations from Max’s work) and objects in the Baird TV photo album (compare to Zooniverse work by Alex F)
- Run a group annotation exercise to understand how effective people find the linkage produced (and the effects of fine-tuning)
- Look to ingest a smaller museum collection to see how smaller museums could be part of a proposed national shared vector space (I have approached Birmingham Museum; thanks to Daniel for the recommendation)
- Release data and code, and write up publications:
  - A historical book chapter focused on computer vision; a section of this work will also feature in a chapter on moving image
  - A journal article with a DH focus, or a paper for a computational humanities research conference
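The “national shared vector space” idea above can be illustrated with a toy sketch: objects from different collections are embedded into the same space (here, hand-written stand-in vectors rather than real CLIP output; all object names and values are invented for illustration), and linkage is nearest-neighbour search by cosine similarity.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings standing in for CLIP image vectors from two collections
# (values are illustrative, not real model output).
large_collection = {"telephone-1905": [0.9, 0.1, 0.2], "loom-shuttle": [0.1, 0.8, 0.3]}
smaller_museum = {"switchboard": [0.85, 0.15, 0.25], "weaving-frame": [0.05, 0.9, 0.2]}

# Link each object in the smaller collection to its nearest neighbour
# in the larger one, i.e. the candidate linkage Heritage Weaver surfaces.
for name, vec in smaller_museum.items():
    best = max(large_collection, key=lambda k: cosine(vec, large_collection[k]))
    print(f"{name} -> {best}")
```

Fine-tuning CLIP would change the vectors, and so the neighbours found, which is what the group annotation exercise could help evaluate.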
Circulars
As also discussed on Monday, Nayomi and I have agreed that the next steps for this work are to:
- Further explore how the circulars can be used for linkage. I will take the lead on this, prompting RAG and then Heritage Weaver in turn to see what outputs we get
- Improve the search function of RAG, possibly by embedding a graph search
- Prepare RAG to absorb more data at a time, and see how this impacts the search results. It’s worth noting we’re meeting with Asa in two weeks’ time and hope to discuss this with him then
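One way to read “embedding a graph search” in the retrieval step is: retrieve top documents as usual, then expand the hits via a graph of cross-references between circulars. A minimal sketch, with invented documents and word-overlap scoring standing in for real embedding similarity:

```python
# Toy corpus standing in for circular documents (contents are invented).
docs = {
    "c1": "telephone exchange staffing circular",
    "c2": "exchange equipment maintenance notice",
    "c3": "staff pension scheme update",
}
# A hypothetical graph of related circulars (e.g. cross-references).
links = {"c1": ["c2"], "c2": ["c1"], "c3": []}

def retrieve(query, k=1):
    """Score documents by word overlap with the query (a stand-in for
    embedding similarity), then expand the top hits via the link graph."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(docs[d].split())), reverse=True)
    hits = scored[:k]
    # Graph expansion: pull in documents linked from the top hits.
    expanded = [n for h in hits for n in links[h] if n not in hits]
    return hits + expanded

print(retrieve("telephone exchange staffing"))  # ['c1', 'c2']
```

The expansion step is why a graph layer could improve search: `c2` never matches the query directly, but is surfaced because it is linked from the best hit.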
We presented the work to BT Archives on Monday, and they expressed excitement about the pipeline. They see it as a promising technique for sharing their data within BT, in a way that might reduce the workload of answering internal user enquiries.
Gender
Lucy has helpfully drawn up a schedule of work, which I think demonstrates the goal of this investigation well. So, to paraphrase Lucy, this investigation will:
- Read through a subset of the BT data and manually note examples of gender-biased language (especially “stereotype” and “omission”)
- Run Lucy Havens’ models on the BT data (there are three different series of models that can be run on the data)
- Evaluate the usefulness of the models’ labels on the three labelled BT datasets with domain experts
Then we can decide on next steps based on what we find in the evaluation stage. For example:
A. If the models are accurate and useful, we could run the models on another dataset
B. If the models aren’t that accurate or useful, we could try using other NLP methods (similar to Baker and Salway) to investigate the language in the BT dataset
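The evaluation stage above amounts to comparing model labels against the domain experts’ annotations, category by category. A minimal sketch with invented sentence-level labels for the “stereotype” category:

```python
# Hypothetical labels: model output vs. domain-expert annotation for the
# "stereotype" category (all data invented for illustration).
model  = ["stereotype", "none", "stereotype", "none", "stereotype"]
expert = ["stereotype", "none", "none", "none", "stereotype"]

def precision_recall(pred, gold, label):
    """Per-category precision and recall of predictions against gold labels."""
    tp = sum(p == label == g for p, g in zip(pred, gold))
    fp = sum(p == label != g for p, g in zip(pred, gold))
    fn = sum(g == label != p for p, g in zip(pred, gold))
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(model, expert, "stereotype")
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=1.00
```

Scores like these, computed per category (“stereotype”, “omission”, etc.) and per BT series, would give a concrete basis for choosing between options A and B.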
Anna-Maria and I will be talking to the BBC Archive team in two weeks’ time about possibly using their data as a comparison to the BT dataset.