Guide for the uninitiated: Please Enter!

When is it?

6th-7th December, 10:30-17:30 on the first day and 09:00-17:30 on the second.

What is the aim of this workshop/hack thing?

We are hoping to experiment with some of the latest tools in computer vision, OCR and document analysis to test the feasibility of extracting machine-readable information from our collection of historical trade directories (provided by the UKDS and the University of Leicester).

What is a hack? What can I expect?

For those unfamiliar with computer/data science research, a hack is an occasion for group work and experimentation around a specific challenge, where there is a benefit to being in the room together, working and discussing collectively.

What will people be doing?

Those attending will largely be research software engineers, data scientists, digital curators and others, experimenting on data, tools and software libraries to achieve the aims of the workshop.

How does this look?

People will break into small sub-groups spontaneously around particular tasks and sub-tasks and pursue them. They will work at their laptops and ‘hack’ with data and code and, typically, report back to the main group periodically through the day(s) on their progress.

What is the programme?

Apart from the start time and the end time, there is no programme. The participants will decide and form the programme as determined by the work that takes place. It is free-flowing and unpredictable by its nature.

What can I contribute if I'm not a coder?

Historians, curators and other non-coders can bring lots of subject expertise to hack events: by listening to the way that problems are formulated and helping ensure that solutions are sensitive to the source material; by checking that any data or solutions proposed fit the possible needs of future researchers (including themselves!); and by answering questions about the detail of the material being worked on.

In addition: where Machine Learning (ML) is involved, there is a crucial role for annotations to play in the process of developing rigorous tools. This is the first instance of ML being used in Congruence Engine, so this will be an opportunity for colleagues to annotate data, images and documents, and thereby to develop a better understanding of how ML pipelines work: what they can and cannot do.

And further: there is plenty of information about the provenance and history of the collection we are using, which it would be very useful to gather. Colleagues can familiarise themselves with some of this literature in ways that will be very helpful for planning possible next steps that may result from this hack.

Even further: for anyone interested in questions of AI or ML ethics and practices, this may also be an interesting angle from which to explore the nature of our own annotation process and the nature of the models we will rely on.

Where is the hack/workshop?

This will take place within the Alan Turing Institute's offices in the British Library in London. We have two rooms booked, but you are free to use other co-working spaces within the offices to work in small groups or individually if you prefer. The office is open plan, so we ask you to respect the space where others are working and to keep noise down in shared areas.

What else can I do at the Turing?