
Jacobian Matrix

Suppose $\mathbf{\hat{y}}$ is an $m$-length vector that is a function of another vector $\mathbf{w}$ of length $n$ (i.e. $\mathbf{\hat{y}} = \psi(\mathbf{w})$, where $\psi: \mathbb{R}^n \to \mathbb{R}^m$). The Jacobian matrix (the matrix of first-order partial derivatives) of $\mathbf{\hat{y}}$ with respect to $\mathbf{w}$ is:

$\mathbf{\hat{y}}=[\hat{y}_{i}] =\begin{bmatrix} \hat{y}_{1}\\ \hat{y}_{2}\\ \vdots\\ \hat{y}_{m} \end{bmatrix}, \;\;\; \mathbf{w}=[w_{j}] =\begin{bmatrix} w_{1}\\ w_{2}\\ \vdots\\ w_{n} \end{bmatrix}, \;\;\; \mathbf{J}_\psi(\mathbf{w})=\frac{\partial\mathbf{\hat{y}}}{\partial\mathbf{w}}=\begin{bmatrix} \frac{\partial\hat{y}_1}{\partial w_1} & \frac{\partial\hat{y}_1}{\partial w_2} & \cdots & \frac{\partial\hat{y}_1}{\partial w_n}\\ \frac{\partial\hat{y}_2}{\partial w_1} & \frac{\partial\hat{y}_2}{\partial w_2} & \cdots & \frac{\partial\hat{y}_2}{\partial w_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial\hat{y}_m}{\partial w_1} & \frac{\partial\hat{y}_m}{\partial w_2} & \cdots & \frac{\partial\hat{y}_m}{\partial w_n} \end{bmatrix}$
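As a quick illustration (a minimal NumPy sketch, not part of the original notebook; the example function `psi` and the helper `numerical_jacobian` are arbitrary choices for demonstration), the Jacobian of any $\psi: \mathbb{R}^n \to \mathbb{R}^m$ can be approximated column by column with finite differences:

```python
import numpy as np

def numerical_jacobian(psi, w, eps=1e-6):
    """Approximate the m x n Jacobian J[i, j] ~= d psi_i / d w_j by forward differences."""
    w = np.asarray(w, dtype=float)
    y0 = np.asarray(psi(w), dtype=float)
    J = np.zeros((y0.size, w.size))
    for j in range(w.size):
        w_step = w.copy()
        w_step[j] += eps                      # nudge only coordinate j
        J[:, j] = (np.asarray(psi(w_step)) - y0) / eps
    return J

# Arbitrary example function psi: R^2 -> R^3
psi = lambda w: np.array([w[0] * w[1], np.sin(w[0]), w[1] ** 2])
print(numerical_jacobian(psi, [1.0, 2.0]))
```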

Jacobian Matrix of Linear Combination

For instance, let $\mathbf{\hat{y}}=\mathbf{Xw} + b$, where $\mathbf{X}$ is an $m \times n$ matrix independent of $\mathbf{w}$ and $b$ is a scalar added to every component:

$\mathbf{X}=[x_{i,j}] =\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,n}\\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n}\\ \end{bmatrix}, \;\;\; \mathbf{\hat{y}} =\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,n}\\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n}\\ \end{bmatrix} \begin{bmatrix} w_{1}\\ w_{2}\\ \vdots\\ w_{n}\\ \end{bmatrix} + b =\begin{bmatrix} w_1x_{1,1} + w_2x_{1,2} + \cdots + w_nx_{1,n} + b\\ w_1x_{2,1} + w_2x_{2,2} + \cdots + w_nx_{2,n} + b\\ \vdots\\ w_1x_{m,1} + w_2x_{m,2} + \cdots + w_nx_{m,n} + b\\ \end{bmatrix}$

The Jacobian matrix of $\mathbf{\hat{y}}$ with respect to $\mathbf{w}$ is:

$\mathbf{J}_{\mathbf{Xw}+b}(\mathbf{w})=\mathbf{\frac{\partial \hat{y}}{\partial w}} =\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m,1} & x_{m,2} & \cdots & x_{m,n}\\ \end{bmatrix} = \mathbf{X}$
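A small numerical check of this result (a sketch with assumed random shapes and values, not from the notebook): the finite-difference Jacobian of `X @ w + b` with respect to `w` should match `X` itself.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.normal(size=(m, n))
w = rng.normal(size=n)
b = 0.5                                   # scalar bias, broadcast over the m components
eps = 1e-6

affine = lambda v: X @ v + b
# Column j of the Jacobian is the finite-difference quotient along coordinate j.
J = np.column_stack([(affine(w + eps * e) - affine(w)) / eps for e in np.eye(n)])
print(np.allclose(J, X, atol=1e-4))       # True: the Jacobian of Xw + b is X
```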

The Gradient of the Residual Sum of Squares

Suppose $f(\mathbf{\hat{y}})=\mathbf{\hat{y}}^T\mathbf{A}\mathbf{\hat{y}}$, where $\mathbf{A} \in \mathbb{R}^{m\times m}$ is a matrix independent of $\mathbf{\hat{y}}$; then $f(\mathbf{\hat{y}})$ expands as:

$f(\mathbf{\hat{y}}) =\begin{bmatrix} \hat{y}_1 & \hat{y}_2 & \cdots & \hat{y}_m \end{bmatrix}\begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,m}\\ \end{bmatrix}\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_m \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=1}^{m}\hat{y}_ia_{i,1} & \sum\limits_{i=1}^{m}\hat{y}_ia_{i,2} & \cdots & \sum\limits_{i=1}^{m}\hat{y}_ia_{i,m} \end{bmatrix} \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_m \end{bmatrix}=\sum\limits_{j=1}^{m}\sum\limits_{i=1}^{m}\hat{y}_j\hat{y}_ia_{i,j}$
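A quick sanity check of this expansion (a sketch with assumed random $\mathbf{A}$ and $\mathbf{\hat{y}}$, not from the notebook): the matrix expression and the explicit double sum agree.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
A = rng.normal(size=(m, m))
y_hat = rng.normal(size=m)

quadratic_form = y_hat @ A @ y_hat                    # y_hat^T A y_hat
double_sum = sum(y_hat[j] * y_hat[i] * A[i, j]
                 for j in range(m) for i in range(m))
print(np.isclose(quadratic_form, double_sum))         # True
```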

The gradient (the vector of partial derivatives of a function $f: \mathbb{R}^m \to \mathbb{R}$ with respect to an $m$-length vector) of the function $f(\mathbf{\hat{y}})$ above with respect to $\mathbf{\hat{y}}$ follows from differentiating the double sum: each $\hat{y}_k$ appears once through the index $j$ (contributing $\sum_{j=1}^{m}a_{k,j}\hat{y}_j$) and once through the index $i$ (contributing $\sum_{i=1}^{m}\hat{y}_ia_{i,k}$), so:

$\nabla_{\hat{y}}f =\begin{bmatrix} \frac{\partial f}{\partial\hat{y}_1} & \frac{\partial f}{\partial\hat{y}_2} & \cdots & \frac{\partial f}{\partial\hat{y}_m} \end{bmatrix}^T =\begin{bmatrix} \left(\sum\limits_{j=1}^{m}a_{1,j}\hat{y}_j + \sum\limits_{i=1}^{m}\hat{y}_ia_{i,1}\right) & \left(\sum\limits_{j=1}^{m}a_{2,j}\hat{y}_j + \sum\limits_{i=1}^{m}\hat{y}_ia_{i,2}\right) & \cdots & \left(\sum\limits_{j=1}^{m}a_{m,j}\hat{y}_j + \sum\limits_{i=1}^{m}\hat{y}_ia_{i,m}\right) \end{bmatrix}^T =(\mathbf{A}+\mathbf{A}^T)\mathbf{\hat{y}}$
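And a numerical check of the gradient formula (again a sketch with assumed random values), comparing a finite-difference gradient of $f$ against the closed form $(\mathbf{A}+\mathbf{A}^T)\mathbf{\hat{y}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.normal(size=(m, m))
y_hat = rng.normal(size=m)
eps = 1e-6

f = lambda y: y @ A @ y                               # f(y) = y^T A y
grad_numeric = np.array([(f(y_hat + eps * e) - f(y_hat)) / eps for e in np.eye(m)])
grad_closed_form = (A + A.T) @ y_hat
print(np.allclose(grad_numeric, grad_closed_form, atol=1e-4))   # True
```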

Suppose we have another function that maps a vector to a scalar, $L(\mathbf{\hat{y}}) = (\mathbf{y} - \mathbf{\hat{y}})^T\mathbf{I}(\mathbf{y} - \mathbf{\hat{y}})$, where $\mathbf{y}$ is an $m$-length vector independent of $\mathbf{\hat{y}}$ and $\mathbf{I}$ is the $m \times m$ identity matrix ($L$ is the residual sum of squares between $\mathbf{y}$ and the linear regression model $\mathbf{\hat{y}}$):

$\mathbf{y}=[y_{i}]=\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots\\ y_{m}\\ \end{bmatrix}, \;\;\; \mathbf{\hat{y}}=[\hat{y}_{i}]=\begin{bmatrix} \hat{y}_{1}\\ \hat{y}_{2}\\ \vdots\\ \hat{y}_{m}\\ \end{bmatrix}=\mathbf{Xw}+b, \;\;\; L(\mathbf{\hat{y}})=(\mathbf{y} - \mathbf{\hat{y}})^T\mathbf{I}(\mathbf{y} - \mathbf{\hat{y}})=\sum\limits_{i=1}^{m}(y_i - \hat{y}_i)^2$
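For the special case $\mathbf{A}=\mathbf{I}$, a short check (a sketch with assumed random data, not from the notebook) that the quadratic form reduces to the familiar sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 2
X = rng.normal(size=(m, n))
w = rng.normal(size=n)
b = 0.1
y = rng.normal(size=m)

y_hat = X @ w + b
r = y - y_hat
rss_quadratic = r @ np.eye(m) @ r             # (y - y_hat)^T I (y - y_hat)
rss_sum = np.sum((y - y_hat) ** 2)            # sum of squared residuals
print(np.isclose(rss_quadratic, rss_sum))     # True
```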