Two weeks mostly taken up with preparing presentations with various co-workers, proposing book chapters, and a little bit of Bradford event admin. I will keep this update short and to the point, mostly to jog my own memory when I’m back!

Heritage Weaver

As discussed on Monday, Kaspar and I have agreed the aim of this investigation before the end of the project is to:

Circulars

As also discussed on Monday, Nayomi and I have agreed the next steps for this work are to:

We presented the work to BT Archives on Monday, and they expressed excitement about the pipeline. They see it as a potentially promising way to share their data within BT, one that might reduce the workload of answering internal user enquiries.

Gender

Lucy has helpfully drawn up a schedule of work, which I think demonstrates the goal of this investigation well. So, to paraphrase Lucy, this investigation will:

  1. Read through a subset of the BT data and manually note examples of gender-biased language (especially “stereotype” and “omission”)
  2. Run Lucy Havens’ models on the BT data (there are three different series of models that can be run on the data)
  3. Evaluate the usefulness of the models’ labels on the three labelled BT datasets with domain experts
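One way the evaluation in step 3 could be made concrete, purely as a sketch: compare the labels a model assigns to each record against the labels a domain expert assigns to the same record, and report precision and recall per label. The function name `per_label_scores` and the toy data are hypothetical and not taken from Lucy's models or the BT data; only the label names “stereotype” and “omission” come from the schedule above.

```python
from collections import Counter


def per_label_scores(model_labels, expert_labels):
    """Compare model labels with expert labels, record by record.

    Both arguments are lists of sets, one set of label names per record.
    Returns a dict mapping each label to a (precision, recall) tuple.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, expected in zip(model_labels, expert_labels):
        for label in predicted & expected:   # model and expert agree
            tp[label] += 1
        for label in predicted - expected:   # model only
            fp[label] += 1
        for label in expected - predicted:   # expert only
            fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        expected_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / expected_total if expected_total else 0.0
        scores[label] = (precision, recall)
    return scores


# Invented toy data: four records, labelled by a model and by an expert.
model_labels = [{"stereotype"}, {"omission"}, set(), {"stereotype", "omission"}]
expert_labels = [{"stereotype"}, {"omission"}, {"omission"}, {"stereotype"}]

scores = per_label_scores(model_labels, expert_labels)
for label, (precision, recall) in sorted(scores.items()):
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
```

Numbers like these would only cover accuracy. Whether the labels are actually useful to archivists is a separate question that the domain experts would need to answer.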

Then we can decide on next steps based on what we find in the evaluation stage. For example:

A. If the models are accurate and useful, we could run the models on another dataset

B. If the models aren’t that accurate or useful, we could try using other NLP methods (similar to Baker and Salway) to investigate the language in the BT dataset

Anna-Maria and I will be talking to the BBC Archive team in two weeks’ time about possibly using their data as a comparison to the BT dataset.