<aside> 👉 Written by *Lucy* Jialin Lu, Nov 30th 2019. This is a review of a failed project.

</aside>

I started this project, namely deep learning for the generative modeling of Chinese fonts, two years ago when I interned as a research assistant at the University of Hong Kong, working with Prof. Li-yi Wei. The essence of this project is to learn a generative model for Chinese fonts, in the hope of harnessing the flexibility and promise of deep learning. I spent a lot of time on it and tried many approaches, even long after the internship ended, but the project unfortunately failed. Frankly, it failed for three reasons:

  1. The first reason is that I tried a lot of methods, but none of them worked out well.
  2. The second reason is simply a personal matter: I am now doing my master's in Canada on a different topic, so I have no spare time for this project and have had to stop working on it.
  3. The last reason is that Li-yi also gradually lost interest in this project, partly because of his new job, and partly because of the limited-or-next-to-zero research output from me.

So you can see that, in fact, it is entirely my fault. All of this happened because I failed to produce a reasonable POSITIVE research output.

But from these failures, I also began to understand the real obstacles in this problem. The painful and bitter lesson is: if we really wish to tackle the hard problem of learning a generative model for Chinese fonts, there are three key challenges that we inevitably need to conquer. So in this post, I will summarize the three key challenges I found during my failed quest for the generative modeling of Chinese fonts.

This post is more or less intended for interested researchers and for the future me, because sooner or later I will come back to it (at least I hope so). I have always had a mania for Chinese fonts. My father is a calligrapher specialized in Wei-Bei (魏碑, roughly the calligraphy works of the northern dynasties between AD 420–589) and makes a living for our family by it. I myself studied calligraphy, specializing in Zhao Mengfu and Chu Suiliang, and I twice made it to the national prize exhibition of young calligraphers, although for many reasons I have been away from my brush pen since the beginning of college. Anyway, I still think it is highly likely that I will resume this failed project.

Three Key Challenges


The three key challenges I found in the generative modeling of Chinese fonts are:

  1. Key challenge #1: A font consists of glyphs (one for each character), which are represented as vector images rather than the more comfortable pixel images. In particular, each glyph is parametrized by Bézier curves (see the first sketch after this list).

    However, current methods really only work properly on pixels (raster images). How to handle this unordered, variable-size, irregular representation of data is still an open challenge for deep learning. Pre-training on pixel images and then transferring to vector images seems to be the current best practice, but it is far from satisfactory.

  2. Key challenge #2: The Chinese language has an extremely large alphabet. Any font usable for commercial purposes will need a collection of at least 6,000–7,000 glyphs (the complete alphabet contains more than 40,000), and this does not even count some other critical issues.

    Current methods struggle even on a small, fixed set of glyphs such as the Latin alphabet. Scaling up to 6,000 glyphs seems to be computationally infeasible.

  3. Key challenge #3: The compositional nature of glyphs. The reason the Chinese language has such a large alphabet is that Chinese glyphs are compositional: one glyph may consist of many "parts", and any part can be re-used to form different glyphs. Parts are composed into a glyph following certain aesthetic principles. This is also what happens in the design studios of Chinese font designers: designers would first come up with a set of reusable parts and then compose them into full glyphs (see the second sketch after this list).

    Chinese glyphs are, by a large margin, more complex than Latin characters. My opinion is that explicit consideration of compositionality will be essential. It seems that we need a more "model-based" network rather than a "model-free" one (in the sense of model-based versus model-free RL methods). A proper new architecture, with inductive biases that can handle compositionality, is desired.
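To make challenge #1 concrete, here is a minimal Python sketch of what a vector glyph looks like as data. The types (`CubicBezier`, `VectorGlyph`) are hypothetical illustrations of mine, not the API of any real font library:

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in em-box coordinates

@dataclass
class CubicBezier:
    """One cubic Bezier segment: two endpoints plus two control points."""
    p0: Point
    c1: Point
    c2: Point
    p1: Point

@dataclass
class VectorGlyph:
    """A glyph is a variable-length list of closed contours, each itself a
    variable-length sequence of Bezier segments. Contrast this with a raster
    glyph: a fixed-size pixel grid (say 64x64) that a convolutional network
    can consume directly."""
    contours: List[List[CubicBezier]]

def num_segments(glyph: VectorGlyph) -> int:
    """Total segment count -- this varies wildly from glyph to glyph,
    which is exactly what makes batching and architecture design hard."""
    return sum(len(contour) for contour in glyph.contours)

# A trivially simple glyph: one roughly circular contour of 4 segments.
# A complex Chinese glyph can easily have 10+ contours and 100+ segments.
o_like = VectorGlyph(contours=[[
    CubicBezier((0.5, 1.0), (0.8, 1.0), (1.0, 0.8), (1.0, 0.5)),
    CubicBezier((1.0, 0.5), (1.0, 0.2), (0.8, 0.0), (0.5, 0.0)),
    CubicBezier((0.5, 0.0), (0.2, 0.0), (0.0, 0.2), (0.0, 0.5)),
    CubicBezier((0.0, 0.5), (0.0, 0.8), (0.2, 1.0), (0.5, 1.0)),
]])
print(num_segments(o_like))  # -> 4
```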
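For challenge #3, here is an equally hypothetical sketch of what an explicitly compositional representation might look like; again, `Part`, `Placement`, and `ComposedGlyph` are illustrative names of my own, not an existing system:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Part:
    """A reusable component, e.g. the water radical 氵."""
    name: str

@dataclass
class Placement:
    """An affine placement of a part inside the glyph's em-box:
    where the part goes and how much it is squeezed."""
    part: Part
    x: float
    y: float
    scale_x: float
    scale_y: float

@dataclass
class ComposedGlyph:
    """A glyph expressed as a layout of shared parts rather than raw curves.
    The same part recurs across many glyphs, so a model aware of this
    structure learns each part once plus the layout rules, instead of
    relearning the curves of 氵 thousands of times."""
    char: str
    placements: List[Placement]

water = Part("氵")
# 河 (river) = 氵 on the left + 可 on the right; 氵 reappears in 湖, 海, 洋, ...
he = ComposedGlyph("河", placements=[
    Placement(water,      x=0.00, y=0.0, scale_x=0.30, scale_y=1.0),
    Placement(Part("可"), x=0.35, y=0.0, scale_x=0.65, scale_y=1.0),
])
```

Whether such structure should be hand-annotated or discovered by the model itself is, of course, part of the open problem.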