(Results were rounded to the nearest whole number.)
Tesseract-OCR out of the box, original image. 👎
CER 100, WER 100, LDR 0
As the scores suggest, this text extraction would not be useful.
Tesseract-OCR out of the box with deskewing 😟 (gray scaling must be done for deskewing)
CER 46, WER 147, LDR 56
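<aside> 💡 Note that WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. Because the number of insertions is not bounded by N, the word error rate can be larger than 1.0 (100 on the percentage scale used here), and thus the word accuracy can be smaller than 0.0.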
</aside>
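To make the note above concrete, here is a minimal sketch of how CER and WER can be computed from a Levenshtein edit distance. It assumes the scores in this section are standard edit-distance rates reported on a 0-100 scale; the helper names (`edit_distance`, `cer`, `wer`) and the sample strings are made up for illustration and are not the tooling used for the results above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate as a percentage of the reference length."""
    return 100 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate as a percentage of the reference word count."""
    ref_words = reference.split()
    return 100 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical OCR output where stray spaces split every word in two.
reference = "the cat sat"
hypothesis = "t he c at s at"

print(round(cer(reference, hypothesis), 1))  # 27.3 (only 3 extra characters)
print(wer(reference, hypothesis))            # 200.0 (3 substitutions + 3 insertions over 3 words)
```

A few misplaced spaces barely move the CER but push the WER past 100, which is the same pattern as the deskewing-only run (CER 46, WER 147).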
With deskewing and gray scaling, the CER and LDR improve, but the WER gets worse (147, up from 100).
Tesseract-OCR out of the box with preprocessing (no deskewing) 😟
CER 76, WER 100, LDR 27
This was done to isolate the impact of deskewing from the other preprocessing steps.
Tesseract-OCR out of the box with deskewing and preprocessing 👍
CER 19, WER 66, LDR 74
The scores clearly show that combining deskewing with the preprocessing steps significantly improved the text extraction.
Tesseract-OCR with deskewing, preprocessing, and a previously custom-trained model built from StorySquad handwriting samples 😟 (the ssq.traineddata can be found here: https://github.com/dakotagporter/tesstrain)
CER 26, WER 85, LDR 71
The custom traineddata model did not generalize well in this instance: every score got worse with ssq.traineddata (CER 19 to 26, WER 66 to 85, LDR 74 to 71).
Google.cloud.vision with no deskewing or preprocessing
CER 8.73, WER 19.04, LDR 91
Google.cloud.vision with deskewing and gray scaling
CER 0, WER 5, LDR 95