1. Introduction

2. Related work

Image Classification

The Transformer architecture

Knowledge Distillation

3. Visual transformer: overview

Multi-head Self Attention layers (MSA)