Entropy Minimization
Encourage the model to make confident (low-entropy) predictions on unlabeled data
$$ p_{\text{model}}(y|x; \theta) $$
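A minimal numpy sketch of an entropy penalty over model predictions; the function name `entropy_loss` and the example values are illustrative, not from the paper:

```python
import numpy as np

def entropy_loss(probs, eps=1e-8):
    """Mean entropy of p_model(y|x; theta) over a batch of unlabeled examples.

    probs: (batch, num_classes) softmax outputs.
    Minimizing this term rewards confident, low-entropy predictions,
    pushing decision boundaries away from dense unlabeled regions.
    """
    return -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

# A confident prediction contributes less entropy than a flat one.
probs = np.array([[0.90, 0.05, 0.05],   # confident -> low entropy
                  [1/3, 1/3, 1/3]])     # flat      -> high entropy
print(entropy_loss(probs))  # ~0.75
```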
Consistency Regularization
Encourage the model to produce the same output distribution when its inputs are perturbed,
i.e., the output should be consistent across different augmentations of the same input:
$$ ||p_{\text{model}}(y | \text{Augment}(x);\theta)-p_{\text{model}}(y|\text{Augment}(x);\theta)||^2_2 $$
(Augment(x) is a stochastic transformation, so the two terms are evaluated on two different augmentations and are not identical.)
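A toy numpy sketch of this consistency loss, assuming a stand-in Gaussian-noise `augment()` and a throwaway linear model (both are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    # Stand-in stochastic Augment(): additive Gaussian noise.
    return x + 0.1 * rng.normal(size=x.shape)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_fn, x):
    """Squared L2 distance between predictions on two independent augmentations.

    Because augment() is stochastic, the two forward passes see different
    inputs, so the loss stays nonzero until the model is noise-invariant.
    """
    p1 = softmax(logits_fn(augment(x)))
    p2 = softmax(logits_fn(augment(x)))
    return np.mean(np.sum((p1 - p2) ** 2, axis=-1))

# Toy linear model, for illustration only.
W = rng.normal(size=(4, 3))
x = rng.normal(size=(8, 4))
print(consistency_loss(lambda v: v @ W, x))
```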
Mean Teacher
$$ J(\theta)=\mathbb{E}_{x,\eta^{\prime},\eta}[||f(x,\theta^{\prime},\eta^{\prime})-f(x,\theta,\eta)||^2] \newline \theta_t^{\prime}=\alpha \theta_{t-1}^{\prime}+(1-\alpha)\theta_t $$
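A small sketch of the EMA teacher update, assuming weights are stored as lists of numpy arrays:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Mean Teacher update: theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t.

    teacher, student: lists of numpy weight arrays with matching shapes.
    The teacher (an exponential moving average of the student's weights)
    produces the consistency target f(x, theta', eta').
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

# After each optimizer step on the student, refresh the teacher:
# teacher = ema_update(teacher, student)
```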
Virtual Adversarial Training
$$ \text{LDS}(x_*,\theta) := D[p(y|x_*,\hat{\theta}), p(y|x_*+r_{\text{vadv}},\theta)] \newline r_{\text{vadv}} := \text{argmax}_{r;||r||_2\le\epsilon} D[p(y|x_*,\hat{\theta}), p(y|x_*+r)] $$
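A rough, dependency-free sketch of the LDS term. Note the real method finds r_vadv with a gradient-based power-iteration step; lacking autodiff here, this stand-in samples random directions and keeps the worst one, which is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_div(p, q, eps=1e-8):
    # D[p || q] per example, for probability rows p and q.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def lds(predict, x, eps_norm=1.0, num_trials=32):
    """Crude Monte-Carlo stand-in for LDS(x_*, theta).

    predict: maps (batch, dim) inputs to (batch, classes) probabilities.
    We sample random perturbations r of norm eps_norm and keep the one
    maximizing D -- a simplification of the paper's power iteration.
    """
    p_clean = predict(x)                 # p(y | x_*, theta_hat), held fixed
    worst = np.zeros(x.shape[0])
    for _ in range(num_trials):
        r = rng.normal(size=x.shape)
        r *= eps_norm / (np.linalg.norm(r, axis=-1, keepdims=True) + 1e-12)
        worst = np.maximum(worst, kl_div(p_clean, predict(x + r)))
    return np.mean(worst)
```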
Generic Regularization
Encourage the model to behave linearly in-between training examples (MixUp)
$$ \tilde{x} = \lambda x_i + (1-\lambda) x_j, \text{where } x_i, x_j \text{ are raw input vectors} \newline \tilde{y} = \lambda y_i + (1-\lambda) y_j, \text{where } y_i, y_j \text{ are one-hot label encodings} $$
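A minimal numpy sketch of vanilla MixUp as defined above (the Beta parameter value is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x_i, y_i, x_j, y_j, alpha=0.75):
    """Vanilla MixUp: one convex combination of two raw inputs
    and their one-hot labels, with lambda ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j

x_i, x_j = rng.normal(size=4), rng.normal(size=4)
y_i, y_j = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(mixup(x_i, y_i, x_j, y_j))
```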
MixMatch uses all three of the above techniques! Its components are described below.
Data Augmentation
Apply a stochastic augmentation once to each labeled example and K times to each unlabeled example
$$ \hat{x}_b = \text{Augment}(x_b) \newline \hat{u}_{b,k} = \text{Augment}(u_b), \ k \in (1,\dots,K) $$
Label Guessing
Average the model's predictions across the K augmentations of each unlabeled example
$$ \bar{q}_b = \frac{1}{K}\sum^K_{k=1}p_{\text{model}}(y|\hat{u}_{b,k};\theta) $$
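A sketch of label guessing, reusing the stand-in stochastic `augment()` from the consistency sketch and a caller-supplied `predict` function (both assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(u):
    # Stand-in stochastic augmentation: additive Gaussian noise.
    return u + 0.1 * rng.normal(size=u.shape)

def guess_labels(predict, u_b, K=2):
    """q_bar_b: model predictions averaged over K augmentations of u_b.

    predict: maps (batch, dim) inputs to (batch, classes) probabilities.
    Averaging over augmentations gives a more stable guessed label.
    """
    return sum(predict(augment(u_b)) for _ in range(K)) / K
```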
Sharpening
Reduce the entropy of the guessed label distribution by lowering the temperature T (as T → 0 the output approaches one-hot)
$$ \text{Sharpen}(p,T)_i := p_i^{\frac{1}{T}} \Big/ \sum^L_{j=1}p_j^{\frac{1}{T}} $$
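A direct numpy transcription of the sharpening operator:

```python
import numpy as np

def sharpen(p, T=0.5):
    """Sharpen(p, T)_i = p_i^(1/T) / sum_j p_j^(1/T).

    Lowering the temperature T lowers the entropy of p;
    as T -> 0 the output approaches a one-hot (argmax) distribution.
    """
    powered = p ** (1.0 / T)
    return powered / powered.sum(axis=-1, keepdims=True)

print(sharpen(np.array([0.4, 0.35, 0.25])))  # mass shifts to the largest entry
```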
MixUp
Mix each augmented example with another example from the combined labeled-and-unlabeled batch; taking the max of λ and 1−λ keeps the result closer to its first argument
$$ \lambda \sim \text{Beta}(\alpha, \alpha) \newline \lambda^\prime=\text{max}(\lambda, 1-\lambda) \newline x^\prime=\lambda^\prime x_1 + (1-\lambda^\prime)x_2 \newline p^\prime = \lambda^\prime p_1 + (1-\lambda^\prime)p_2 $$
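A sketch of this modified MixUp; `mixmatch_mixup` is an illustrative name, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixmatch_mixup(x1, p1, x2, p2, alpha=0.75):
    """MixMatch's modified MixUp.

    Taking lambda' = max(lambda, 1 - lambda) >= 0.5 guarantees the mixed
    example stays closer to (x1, p1), so a mixed labeled example remains
    'mostly labeled' and a mixed unlabeled example 'mostly unlabeled'.
    """
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1 - lam) * x2, lam * p1 + (1 - lam) * p2
```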
Loss Functions
$$ \mathcal{X}^\prime, \mathcal{U}^\prime = \text{MixMatch}(\mathcal{X}, \mathcal{U}, T, K, \alpha) \newline \mathcal{L}_\mathcal{X}=\frac{1}{|\mathcal{X}^\prime|} \sum_{x,p\in\mathcal{X}^\prime} \text{H}(p,p_{\text{model}}(y|x;\theta)) \newline \mathcal{L}_\mathcal{U} = \frac{1}{L|\mathcal{U}^\prime|} \sum_{u,q\in\mathcal{U}^\prime} ||q-p_{\text{model}}(y|u;\theta)||^2_2 \newline \mathcal{L}=\mathcal{L}_\mathcal{X}+\lambda_\mathcal{U}\mathcal{L}_\mathcal{U} $$
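A numpy sketch of the combined loss, assuming the MixMatch targets and model predictions are already computed as probability matrices (the default λ_U = 75 follows the paper's suggested starting point):

```python
import numpy as np

def mixmatch_loss(p_lab, pred_lab, q_unlab, pred_unlab,
                  lambda_u=75.0, eps=1e-8):
    """Combined loss L = L_X + lambda_U * L_U.

    p_lab / pred_lab:     (B, L) targets and model probabilities on X'
    q_unlab / pred_unlab: (B_u, L) guessed targets and probabilities on U'
    L_X is cross-entropy; L_U is a squared L2 (Brier-style) loss, which is
    bounded and less sensitive to incorrect guessed labels.
    """
    L = p_lab.shape[1]
    l_x = -np.mean(np.sum(p_lab * np.log(pred_lab + eps), axis=1))
    l_u = np.mean(np.sum((q_unlab - pred_unlab) ** 2, axis=1)) / L
    return l_x + lambda_u * l_u
```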
Hyper-parameters
T = 0.5 and K = 2 are fixed in all of the paper's experiments; α = 0.75 and λ_U = 75 are suggested starting points for tuning, with λ_U linearly ramped up to its maximum over early training