
$f(\bold x;W,\bold b)=W\bold x+\bold b$
Bias trick: extend the vector $\bold x$ with an extra 1, and extend the matrix $W$ with $\bold b$ as an extra column
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])  # append a column of 1s to every example
$f(\bold x;W)=W\bold x$
(this trick is less convenient to use in neural networks, where weights and biases are usually kept as separate parameters)
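A minimal numpy sketch of the bias trick (shapes and variable names here are illustrative, not from the notes): fold $\bold b$ into $W$ as an extra column and append a 1 to each example; the two formulations then give identical scores.

```python
import numpy as np

# Hypothetical sizes: C classes, D features, N examples.
C, D, N = 10, 3072, 5
W = np.random.randn(C, D)   # weights
b = np.random.randn(C)      # biases
X = np.random.randn(N, D)   # examples stored as rows

# Bias trick: append b as an extra column of W, and a 1 to every example.
W_ext = np.hstack([W, b[:, None]])        # shape (C, D+1)
X_ext = np.hstack([X, np.ones((N, 1))])   # shape (N, D+1)

scores_with_bias = X @ W.T + b            # f(x; W, b) = Wx + b
scores_trick = X_ext @ W_ext.T            # f(x; W) = Wx
assert np.allclose(scores_with_bias, scores_trick)
```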
Multiclass Support Vector Machine (SVM) classifier
hinge/max-margin loss
- hyperparameter $\Delta$
- The Multiclass Support Vector Machine "wants" the score of the correct class to be higher than all other scores by at least a margin of $\Delta$.
- classes whose score is lower than the correct class's score by more than $\Delta$ contribute nothing to the loss
- otherwise, the difference (shifted by the margin) is accumulated into the loss
- $\text{example }i: (x_i,y_i) \to s$
- $y_i$ are the ground truth labels
- the output vector s contains the scores for each class
Structured SVM [Weston Watkins 1999]:
- hinge $L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$
- squared hinge loss: uses $\max(0,\cdot)^2$ instead, penalizing violated margins more strongly (quadratically)
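A small numpy sketch of the per-example hinge loss $L_i$ above (the function name and example numbers are made up for illustration):

```python
import numpy as np

def svm_loss_i(scores, y_i, delta=1.0):
    """Multiclass SVM (hinge) loss for one example.

    scores: 1-D array of class scores s, y_i: index of the correct class,
    delta: margin hyperparameter. Signature is illustrative only.
    """
    margins = np.maximum(0, scores - scores[y_i] + delta)
    margins[y_i] = 0  # the correct class is excluded from the sum
    return margins.sum()

# Correct class 0 beats class 2 by more than delta (no penalty),
# but not class 1, so only class 1 contributes: max(0, 2.9 - 3.2 + 1) = 0.7
print(svm_loss_i(np.array([3.2, 2.9, -1.0]), y_i=0))
```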
One vs All (OVA): a separate binary SVM is trained for every class independently. This arguably simplest strategy is likely to work just as well (as also argued by Rifkin et al. 2004 in In Defense of One-Vs-All Classification (pdf))
All-vs-All (AVA): a binary SVM for every pair of classes; the least common strategy
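For reference, a hedged OVA sketch using scikit-learn (not part of the original notes): `OneVsRestClassifier` fits one binary `LinearSVC` per class independently.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
# One-vs-All: internally trains one binary linear SVM per class.
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(clf.predict(X[:5]))
```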
observation