Thomas Smits and Mike Kestemont, ‘Towards Multimodal Computational Humanities. Using CLIP to Analyze Late-Nineteenth Century Magic Lantern Slides’, n.d., 10, http://ceur-ws.org/Vol-2989/short_paper23.pdf
Notes
A useful discussion centred on how this paper explores and evaluates the prospects of combined text and visual analysis, as promised by CLIP and its proponents, when applied to historical data such as this collection of lantern slides from Exeter. Very preliminary results suggest a high sensitivity to textual prompts, but no great difference in accuracy between text-only, visual-only, and text+visual models. Could this be a way for institutions to classify large collections automatically, or, better, to search for content within collections automatically?
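The classification and search scenarios raised above both reduce to the same mechanism: CLIP embeds images and text prompts into a shared vector space, so labelling a slide means finding its nearest prompt, and searching a collection means ranking all slides against one query prompt. A minimal sketch of that core, using toy stand-in vectors rather than real CLIP embeddings (which are typically 512-dimensional and would come from a model such as OpenAI's CLIP):

```python
import numpy as np

def normalize(v):
    # L2-normalise rows so that dot products equal cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical stand-ins for CLIP embeddings; rows = items.
image_embs = normalize(np.array([
    [0.9, 0.1, 0.0],   # slide 0: "landscape"-like
    [0.1, 0.8, 0.2],   # slide 1: "portrait"-like
    [0.0, 0.2, 0.9],   # slide 2: "building"-like
]))
prompt_embs = normalize(np.array([
    [1.0, 0.0, 0.0],   # "a lantern slide of a landscape"
    [0.0, 1.0, 0.0],   # "a lantern slide of a portrait"
    [0.0, 0.0, 1.0],   # "a lantern slide of a building"
]))

# Zero-shot classification: each image takes the label of its nearest prompt.
sims = image_embs @ prompt_embs.T        # cosine similarity matrix
labels = sims.argmax(axis=1)             # best prompt per image

# Free-text search: rank every image against a single query prompt.
query = normalize(np.array([0.05, 0.9, 0.1]))   # e.g. "a portrait"
ranking = (image_embs @ query).argsort()[::-1]  # best match first

print(labels.tolist())    # -> [0, 1, 2]
print(ranking.tolist())   # -> [1, 2, 0]
```

The slides' sensitivity to textual prompts noted above would surface here in `prompt_embs`: rephrasing a prompt moves its vector, which can change both the labels and the search ranking.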