Linear Classification: Multiclass SVM

$f(\mathbf x;W,\mathbf b)=W\mathbf x+\mathbf b$

Bias trick: extend the input vector $\mathbf x$ with an extra constant 1, and extend the matrix $W$ with $\mathbf b$ as an extra column

```python
import numpy as np

X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])  # append a column of ones
```

$f(\mathbf x;W)=W\mathbf x$

(this trick is less convenient in neural networks, where the biases are usually kept as separate parameters)
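A minimal sketch checking that the two forms agree (the sizes and random values below are purely illustrative assumptions):

```python
import numpy as np

C, D = 3, 4                                  # assumed toy sizes: 3 classes, 4 features
rng = np.random.default_rng(0)
W = rng.standard_normal((C, D))
b = rng.standard_normal(C)
x = rng.standard_normal(D)

x_ext = np.append(x, 1.0)                    # x with an extra constant 1
W_ext = np.hstack([W, b[:, None]])           # W with b appended as a column

assert np.allclose(W.dot(x) + b, W_ext.dot(x_ext))   # f(x; W, b) == f(x_ext; W_ext)
```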

Multiclass Support Vector Machine (SVM) classifier

hinge/max-margin loss

Structured SVM [Weston & Watkins, 1999]:
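Written out with the score notation above (the margin $\Delta$, commonly set to 1, is an assumption since the note only names the loss), the Weston–Watkins loss for the $i$-th example is

$$L_i = \sum_{j \neq y_i} \max\bigl(0,\; f(\mathbf x_i;W)_j - f(\mathbf x_i;W)_{y_i} + \Delta\bigr)$$

A minimal NumPy sketch of this per-example loss (function name and shapes are assumptions):

```python
import numpy as np

def svm_loss_i(W, x, y, delta=1.0):
    """Weston–Watkins hinge loss for one example.

    W: (C, D+1) weights with the bias folded in via the bias trick
    x: (D+1,) input with the trailing constant 1 appended
    y: integer index of the correct class
    """
    scores = W.dot(x)                                    # (C,) class scores
    margins = np.maximum(0, scores - scores[y] + delta)  # margin violations
    margins[y] = 0                                       # correct class adds no loss
    return margins.sum()
```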

One-vs-All (OVA): a separate binary SVM is trained for every class independently (that class vs. all others). This arguably simpler strategy is likely to work just as well as the structured formulation (as also argued by Rifkin et al. 2004 in *In Defense of One-Vs-All Classification*).

All-vs-All (AVA): a binary SVM is trained for every pair of classes; this is the least common approach in practice. A minimal sketch of both strategies follows below.
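For reference, a minimal scikit-learn sketch of both reduction strategies (the library, dataset, and `max_iter` choice are assumptions; the note does not prescribe an implementation):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# One-vs-All: one binary SVM per class (that class vs. the rest).
ova = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

# All-vs-All (one-vs-one): one binary SVM per pair of classes.
ava = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)

print(ova.score(X, y), ava.score(X, y))      # training accuracy of each strategy
```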

observation