Input space: $\mathcal{I}$, output space: $\mathcal{O}$
An ideal model is a connotative mapping function
$$ f : x \mapsto y, x \in \mathcal{I}, y \in \mathcal{O} $$
The knowledge of a neural network is the mapping it has learned
$$ \hat{f} : x \mapsto \hat{y}, x \in \mathcal{I}, \hat{y} \in \mathcal{O} $$
The difference between the two mappings is the dark part of the neural network’s knowledge
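A rough way to make the gap between two mappings concrete, assuming both output class probabilities, is to average a divergence between their outputs over a set of inputs. The sketch below uses KL divergence, which is an assumption for illustration and not a measure stated in these notes. The helper names (`softmax`, `kl_divergence`, `mean_gap`) and all numbers are hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def mean_gap(reference_probs, model_probs):
    """Average KL divergence between paired outputs over a batch of inputs."""
    pairs = list(zip(reference_probs, model_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

# Illustrative logits only (assumed values, not taken from any experiment).
reference = [softmax([2.0, 1.0, 0.1]), softmax([0.5, 2.5, 0.3])]
model = [softmax([1.5, 1.2, 0.3]), softmax([0.4, 2.0, 0.9])]

gap = mean_gap(reference, model)
print(f"mean KL gap: {gap:.4f}")
```

Since the ideal mapping is not observable in practice, the reference outputs here stand in for whatever mapping is being used as the target, for example a teacher network.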
Are all student networks equally capable of receiving knowledge from different teachers?
8 teacher models
5 different student architectures from the search space defined by MNAS
The students all performed differently, and no single student achieved the best result across all teacher networks
Distribution
Accuracy
Using a predetermined student is not an optimal solution: it merely spends the student’s parameters on learning the teacher’s architecture