Trade Hack info here: Trade Directories Workshop, Dec 22

Post-Hack Update

With DONUT, I got to the point where we had end-to-end fine-tuning running on a binary classification task (trades, or other).

To make this actually useful/practical, however, it needs wrapping in a full training and evaluation loop – checkpointing, per-epoch evaluation on the test set, logging metrics (F1, mean accuracy, loss) to TensorBoard, etc. Then expanding to multi-category training (which is the easy bit!).
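For reference, here's a minimal sketch of what that wrapper might look like. It assumes a Hugging Face-style classifier whose outputs carry `loss` and `logits` (DONUT's seq2seq decoding would need its own prediction step); `model`, `train_loader` and `eval_loader` are placeholders:

```python
import torch
from sklearn.metrics import accuracy_score, f1_score
from torch.utils.tensorboard import SummaryWriter

def run_training(model, train_loader, eval_loader, epochs=10,
                 lr=3e-5, ckpt_path="best.pt", device="cuda"):
    writer = SummaryWriter()                     # logs to ./runs/ by default
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.to(device)
    best_f1 = 0.0

    for epoch in range(epochs):
        # --- training pass ---
        model.train()
        running_loss = 0.0
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)             # HF-style: loss on outputs
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            running_loss += outputs.loss.item()
        writer.add_scalar("train/loss", running_loss / len(train_loader), epoch)

        # --- per-epoch evaluation on the held-out set ---
        model.eval()
        preds, labels = [], []
        with torch.no_grad():
            for batch in eval_loader:
                batch = {k: v.to(device) for k, v in batch.items()}
                logits = model(**batch).logits
                preds.extend(logits.argmax(-1).cpu().tolist())
                labels.extend(batch["labels"].cpu().tolist())
        f1 = f1_score(labels, preds)
        acc = accuracy_score(labels, preds)
        writer.add_scalar("eval/f1", f1, epoch)
        writer.add_scalar("eval/accuracy", acc, epoch)

        # --- checkpoint whenever F1 improves ---
        if f1 > best_f1:
            best_f1 = f1
            torch.save(model.state_dict(), ckpt_path)

    writer.close()
```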

Potentially the most promising/useful avenue of exploration, however, in terms of ML models, was that followed by Ryan Chan (Research Software Engineer, Turing; link to bio). He was getting some good results from DiT, which is much easier to work with and fine-tune than DONUT, and has good support from Hugging Face for AutoTrain, fine-tuning, etc. DiT also looks to be the best candidate model for layout parsing.
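To illustrate why DiT is the easier route: a stock image-classification head plus the Hugging Face `Trainer` covers the whole loop. `microsoft/dit-base` is the real checkpoint id on the Hub; `train_ds`/`eval_ds` are placeholders for the trade-directory splits:

```python
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          TrainingArguments, Trainer)

processor = AutoImageProcessor.from_pretrained("microsoft/dit-base")
# Adds a fresh 2-way classification head on top of the self-supervised backbone
model = AutoModelForImageClassification.from_pretrained(
    "microsoft/dit-base", num_labels=2)  # trades vs. other

def transforms(examples):
    # Assumes a HF image dataset with "image" (PIL) and "label" (int) columns
    inputs = processor([img.convert("RGB") for img in examples["image"]],
                       return_tensors="pt")
    inputs["labels"] = examples["label"]
    return inputs

args = TrainingArguments(
    output_dir="dit-trades",
    evaluation_strategy="epoch",
    remove_unused_columns=False,  # keep "image" available to the transform
    num_train_epochs=5,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds.with_transform(transforms),
                  eval_dataset=eval_ds.with_transform(transforms))
trainer.train()
```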

Useful Links

davanstrien (Daniel van Strien)

ryanchan26 (Ryan Chan)

Hacking…

This article was the starting point 👇

Accelerating Document AI

Started off working with DONUT, since it looked like a good fit for a first image classification run. The thinking here is that if we can tackle a basic task first – is this image a page of trades or not? – then we can move on to other tasks like layout parsing once the data is ready to go.

After a bit of a false start (following this notebook), I landed on this set of tutorials, which is linked directly from the Donut page on Hugging Face:

Donut

Pre-trained DONUT needs a very specific dataset format (described in the paper) and doesn't work with AutoTrain. However, I have a Colab notebook up and running – here – which pulls Daniel van Strien's set of binary-classified trade directory pages (here), fettles it into DONUT format, then feeds that into a basic train/eval cycle.
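The fettling boils down to adding the `ground_truth` column DONUT expects: a JSON string whose `gt_parse` object carries the target (for classification, a single `class` key is enough). A rough sketch, with the dataset id left as a placeholder since the post only links to it:

```python
import json
from datasets import load_dataset

# Placeholder repo id: substitute the real id of Daniel van Strien's
# binary-classified trade directory pages linked above.
ds = load_dataset("user/trade-directory-pages", split="train")

id2label = {0: "other", 1: "trades"}  # assumed label mapping

def to_donut(example):
    # DONUT reads a "ground_truth" column holding a JSON string; the
    # "gt_parse" object inside it is what gets tokenised as the target.
    example["ground_truth"] = json.dumps(
        {"gt_parse": {"class": id2label[example["label"]]}})
    return example

ds = ds.map(to_donut)
```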

Pre-hacking…

(Re-)installing PyTorch and Jupyter via Anaconda, into its own environment:
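Something along these lines (environment name and Python version are my assumptions):

```
conda create -n trade-hack python=3.10
conda activate trade-hack
conda install pytorch torchvision -c pytorch
conda install jupyter
```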