2 May 2023 | Notion

Present: Tim (chair), Alex, Arran, Jane, Daniel W, Stefania, Anna-Maria (online), Daniel B (online), Felix, Sarah, Carol, Nina

Apologies: Helen, Kunika

Agenda:

Data acquisition update - Anna-Maria
1. Data acquisition meeting
2. Overview of data acquisition so far
Trade Directories - Alex, Daniel W
1. Trade directories work in preparation for the occupations investigation & exhibit (Alex & Felix)
2. ML approaches to making trade directories machine readable (Daniel)
Folk songs update - Stefania and Daniel B

Notes:

Data acquisition:
1. report on data acquisition as of 28 April 2023
CE_data_management28042023.pdf

AMS talked through the report above. TB asked if there were different formats that the report could come in? Could we see what datasets are related to what investigation etc?

JW asked how we can get movement on the dataset documentation. Clarification that everyone is responsible for recording the datasheets on their datasets. SZL confirmed that she has datasheets for a number of her datasets, but they are in progress. AMS recommended uploading version, so we have the most up-to-date datasheet at all times. AMS - it is important to do some reflection work on the reasons why we’re collecting data and what we intend to do with the datasets.
Trade Directories: Alex and Felix - needed to do a bit of work related to the exhibit development. Trade Directory work at the Turing was doing particularly innovative work, but Felix and Alex needed to do some smaller (”lumpier”) work. Focused on Bradford and Newcastle due to the requirements of the exhibit. 2-3 weeks of data cleaning and modelling, using OpenRefine. Have mappable occupations data for about 30% of Newcastle, and think that can be scalable up to 70%. Difficulty of the data related to abbreviations etc. Interested in seeing how generalisable it can be. May experiment with segment everything - depending on what is decided with regards to DW’s investigation.

JW asked if 70% is enough for the needs of the exhibit and AB confirmed. It is useful for the first exhibit and understanding some of the social machine dynamics. There may be some human-in-the-loop (Memory mapping) experiments that can tie into the Bradford exhibit for the end of 2024. Also going to try and bring in some Historic England datasets too.
AMS - call for more documentation of this investigation (code on GitHub, updates on Notion), and on the exhibit (area on Notion, but nobody is documenting anything - something for @Nina to look into).
DW - what data have you been using? AB - Leicester. DW used Turing dataset (Living with Machine) and the work DW has done on Trade Directories currently in the Turing cloud. Needs to move over the CE space. (@Anna-Maria to chat with Daniel about this). We should probably stick to using the data from partners.

Details of Daniel’s investigation are recorded here - Historical Trade Directories. This is the investigation’s first time at this meeting and we’ll be discussing work to date and what the investigation needs to do next. Aspiration for a more programatic approach and something that can be generalisable. Leicester dataset is from a Jisc funded project and designed to be representative of the UK. Idea was to map out what would be needed to make this machine readable. Potentials for CV or layout segmentation. After the workshop, decision to focus on the Trade Listing sections and how to break the page into columns and then longer strings of text. When they are in a single stage, the text is more easily recognisable and segmentable.
- SZL - would be helpful to add Trade Directories data into the data sharing agreements SZL and CC are working on with Leciester (they were not initially counted as a data providing partner). @Arran to loop in William Farrell into conversation with DW, SZL and CC about the data sharing agreement.
DW sketched out three different options for the investigation going forward. Option 1 - create a generalisable (Trade directory wide) open tools and process. Quite a resource heavy option. Option 2 - more targeted, limited number of Trade Directories related to the Communications strand and run a proof of concept. Trade Directories are a communicative form in themselves, but this would look at trying to identify a communications-specific industry. Good discussion on potential communication topics, to be picked up at a communications meeting. Option 3 - do no more extra development work. Wrap up and document what has been done and make available. TB - documentation of whats been done is necessary, no matter what option is gone for.

Potential for the two approaches to trade directories to be complimentary for the project’s work on social machines. DW’s ‘upstream’ - high level, resource heavy. AB & FNS approach ‘downstream’ - ground level, ‘lumpy’ approach. Good comparative approaches to support scalable approaches to the social machine, rather than selecting one.

Folk songs: Details of the investigation can be found here - Connecting workers’ songs with mining and textile collections: a cross-strand investigation. The investigation is coming back to the investigation meeting to share progress and establish the next steps.

Reflections on the folk song investigation.pdf

Working with MS Word as annotation tool with Jennifer. Focusing on annotating the lyrics to create an annotation schema. Focus on the human-in-the-loop aspects of working with Jenn’s expertise. There was an assumption that Word was widely accessible, but Jenn did have it and that is a good prompt for us to consider the openness of tools we use. Have done two 2-hour sessions annotating songs with Jennifer on Teams. Want to bring in reflections on the annotation process and reflections on working with a domain expert.
- JW - often best to not go for tools you need to install (re. word). SM mentioned Recogito good for place based things, but not for others. However, it could be used if they have their own schema. Good to think about barriers to using tools. Potentially Google Docs, it can be saved as a word doc.
- DW - could we see examples of the type of annotation (SZL shared - link in notion associated with the investigation record).
- TB - lets take film question to an exhibit meeting.
- TB - potential points at which the other investigations start intersecting and where the collections illuminate materials rather than it being about starting at the collections and moving outwards.
- JW - impressed by the sensorial/sonic elements brought by Jennifer into the session, this is something that would be great to document through the film. Interesting also the use of Whatsapp to communicate/share links & images to reflect on the specific ways of working that the investigation is originating.
Key action points from the discussion on next steps:
- AB suggested to experiment with both Word and Recogito with Jennifer. This would require Stefania and Daniel playing around with Recogito first, and see how we can add our tagging system. Sarah highlighted how Recogito is very good with places but it needs to be adapted to our specific aims.
- Stefania and Daniel will create a tabular version of the categories we came up with on Notion, adding our key questions and a short description so Alex, Kunika, Kasper and Sarah can contribute with their thoughts. Alex will be in touch to organize a meeting in a couple of weeks to discuss this in details.
AB asked AMS if she can suggests reading from OH projects that can help us not only to reflect on the content but also of the sonic affordances of the data. A training about modeling with Sarah is already under discussion.