<aside> 💡 "To draw you must close your eyes & sing." — Pablo Picasso

</aside>

"Style" is a notion that reaches back through the hallowed halls of art museums & human history, one oft-regarded as beautifully intangible & yet simultaneously visceral.

It is Picasso's Cubism and the impressionist paintings of Monet. It is the confusingly entrancing images of Salvador Dalí and the Baroque architecture that overwhelmed 17th-century Europe. And while one could certainly attempt to imitate these styles, there exists a sense of "soul" within art, something incredibly hard to pin down despite being exceedingly obvious at the same time.

Introduction to Style Transfer

This is why Neural Style Transfer is so incredibly cool! In recent years, we've seen increasing attempts to conduct "style transfer": recomposing the content of one image in the style of another. Earlier techniques such as non-photorealistic rendering did exist, but they were fairly inflexible and inefficient. Then came Leon.

In 2015, Leon Gatys et al. released the paper "A Neural Algorithm of Artistic Style", proving the viability of using Deep Neural Networks (DNNs) to conduct style transfer. This was exciting because it demonstrated that DNNs could disentangle the representation of an image's "content" (structure) from its "style" (appearance). Essentially, they had found a way to encode these somewhat abstract concepts into concrete mathematical functions that could be manipulated & computed.

Neural Style Transfer

The objective of the style transfer algorithm is to (i) synthesise a generated image by (ii) minimising the difference in "content" between the content image & the generated image, while simultaneously (iii) minimising the difference in "style" between the style image & the generated image.
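Concretely, the Gatys et al. paper frames this as minimising a single weighted loss over the generated image, where the weights $\alpha$ and $\beta$ control the trade-off between matching content and matching style:

$$
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) = \alpha\, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta\, \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})
$$

Here $\vec{p}$ is the content image, $\vec{a}$ is the style image, and $\vec{x}$ is the generated image being optimised.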

A represents the content image, while the smaller picture represents the style image — both of which are synthesised to produce composite image B.

To understand how the model captures what "content" & "style" are, we first have to understand the model itself. Gatys et al. built their model on the VGG-19 architecture, a type of Convolutional Neural Network (CNN), which is in turn a subset of the Deep Neural Networks we mentioned earlier.
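To make "using the VGG-19 architecture" a little more concrete, here is a minimal PyTorch sketch that loads a pretrained VGG-19 and records the intermediate feature maps an image produces as it passes through the network; those feature maps are the raw material from which the content & style representations are later built. The specific layer indices are illustrative picks, not necessarily the exact layers used in the paper, and the sketch assumes torchvision 0.13 or newer.

```python
# Minimal sketch: load a pretrained VGG-19 and capture intermediate feature maps.
# Assumes PyTorch + torchvision >= 0.13; layer indices are illustrative picks,
# not necessarily the exact layers chosen in the Gatys et al. paper.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()

CONTENT_LAYERS = {21}               # a deep conv layer (roughly conv4_2)
STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv layers spread across the network

def extract_features(image):
    """Pass an image through VGG-19, collecting the activations we care about."""
    content_feats, style_feats = {}, {}
    x = image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in CONTENT_LAYERS:
            content_feats[idx] = x
        if idx in STYLE_LAYERS:
            style_feats[idx] = x
    return content_feats, style_feats

# Usage: a random stand-in for a preprocessed 224x224 RGB image.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    content_feats, style_feats = extract_features(dummy)
print({k: tuple(v.shape) for k, v in style_feats.items()})
```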

Convolutional Neural Networks

CNNs can be thought of as self-assembling robots which train themselves to get really good at image recognition. Sounds like a weird analogy? It probably is - but I couldn't think of a better one and we're getting off-track.

This CNN is a model composed of multiple convolutional layers, in the same way that a robot is a structure composed of a bunch of modular components. (There's other stuff too, like non-linearities & max pooling, but that's non-essential right now.)
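If it helps to see the "modular components" laid out, here is a toy PyTorch sketch that stacks convolutional layers together with the non-linearity & max-pooling pieces just mentioned. It is purely illustrative; VGG-19 follows the same pattern, just much deeper.

```python
# A toy CNN: convolutional layers stacked with non-linearities & max pooling.
# Purely illustrative; VGG-19 follows the same pattern but is much deeper.
import torch
import torch.nn as nn

toy_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learns 16 local filters over an RGB input
    nn.ReLU(),                                    # non-linearity
    nn.MaxPool2d(2),                              # max pooling: halves spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer, more filters
    nn.ReLU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 3, 64, 64)    # a fake 64x64 RGB image
print(toy_cnn(x).shape)          # -> torch.Size([1, 32, 16, 16])
```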