Matrix Calculus

Introduction

In the previous section, we talked about the gradient of a single function of many variables, which gave us a vector.

Now, we are going to talk about the gradients of multiple functions and explore matrix calculus. Previously, we had a function, $f(x, y) = 3x^2y$. Now let's say we also have another function, $g(x, y) = 2x + y^8$. We can follow the same strategy as in the previous section and find $\nabla f = \begin{bmatrix} 6xy & 3x^2 \end{bmatrix}$ and $\nabla g = \begin{bmatrix} 2 & 8y^7 \end{bmatrix}$.

Then we stack the two gradients vertically (or horizontally, depending on your choice of layout).

The matrix formed by stacking the gradients has a special name: the Jacobian, $J$. So,

$$ \begin{align}
J &= \begin{bmatrix} \nabla f \\ \nabla g \end{bmatrix} \\
&= \begin{bmatrix} 6xy & 3x^2 \\ 2 & 8y^7 \end{bmatrix}
\end{align} $$
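To make the example concrete, here is a minimal pure-Python sketch (no external libraries assumed) that builds the numerator-layout Jacobian of $f$ and $g$ from the hand-derived partial derivatives and cross-checks it against a central finite-difference approximation. The function names, the step size `h`, and the evaluation point $(1, 2)$ are our own choices for illustration.

```python
# Sketch: Jacobian of f(x, y) = 3x^2 y and g(x, y) = 2x + y^8 (numerator layout).

def f(x, y):
    return 3 * x ** 2 * y

def g(x, y):
    return 2 * x + y ** 8

def jacobian(x, y):
    # Each row is one gradient, using the hand-derived partials:
    # grad f = [6xy, 3x^2], grad g = [2, 8y^7].
    return [
        [6 * x * y, 3 * x ** 2],
        [2.0, 8 * y ** 7],
    ]

def finite_diff_jacobian(x, y, h=1e-6):
    # Central differences (F(v + h) - F(v - h)) / (2h) in each variable.
    rows = []
    for func in (f, g):
        d_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
        d_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
        rows.append([d_dx, d_dy])
    return rows

print(jacobian(1.0, 2.0))  # [[12.0, 3.0], [2.0, 1024.0]]
print(finite_diff_jacobian(1.0, 2.0))  # approximately the same values
```

The finite-difference version needs only the functions themselves, which makes it a handy sanity check on partial derivatives worked out by hand.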

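This is the numerator layout. As mentioned above, there is also the denominator layout, which stacks the gradients horizontally instead. The only difference is that the denominator-layout Jacobian is the transpose of the numerator-layout one: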

$$ J' = J^T = \begin{bmatrix} 6xy & 2 \\ 3x^2 & 8y^7 \end{bmatrix} $$
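A short sketch of how the two layouts relate, assuming the same $f$ and $g$ as above: the denominator-layout Jacobian is just the numerator-layout one with rows and columns swapped. The helper names here are hypothetical and chosen for illustration.

```python
def transpose(matrix):
    # Swap rows and columns: entry (i, j) moves to (j, i).
    return [list(column) for column in zip(*matrix)]

def jacobian_numerator(x, y):
    # Rows are the gradients of f(x, y) = 3x^2 y and g(x, y) = 2x + y^8.
    return [
        [6 * x * y, 3 * x ** 2],
        [2.0, 8 * y ** 7],
    ]

def jacobian_denominator(x, y):
    # Denominator layout: the gradients become columns, i.e. J' = J^T.
    return transpose(jacobian_numerator(x, y))

print(jacobian_denominator(1.0, 2.0))  # [[12.0, 2.0], [3.0, 1024.0]]
```

Since both layouts hold the same partial derivatives, pick one and use it consistently; mixing them is a common source of shape errors.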

Generalization of the Jacobian

To generalize the Jacobian matrix, we will first group all of a function's parameters into a single vector:

$$ f(x_1, x_2, \ldots, x_n) \rightarrow f(\vec{x}) $$

We also have to define an orientation for the vector $\vec{x}$. We'll assume that all vectors are vertical (column vectors) by default, of size $n \times 1$:

$$ \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} $$

If there are multiple scalar-valued functions of many parameters, we can group those into a vector as well:

$$ \vec{y} = \begin{bmatrix} f_1(\vec{x}) \\ f_2(\vec{x}) \\ \vdots \\ f_m(\vec{x}) \end{bmatrix} $$

Let $\vec{y} = \vec{f}(\vec{x})$ be a vector of $m$ scalar-valued functions that each take a vector $\vec{x}$ of length $n = |\vec{x}|$, where $|\vec{x}|$ is the cardinality (count) of elements in $\vec{x}$.
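A sketch of this general setup, assuming each $f_i$ is a Python function that takes a list of $n$ floats: the hypothetical helper `numerical_jacobian` approximates every partial derivative $\partial f_i / \partial x_j$ with a central difference and returns an $m \times n$ matrix in numerator layout (one row per function). The two example functions are made up for illustration, and Python lists are 0-indexed, so `v[0]` corresponds to $x_1$.

```python
from typing import Callable, List

Vector = List[float]

def numerical_jacobian(fs: List[Callable[[Vector], float]], x: Vector,
                       h: float = 1e-6) -> List[Vector]:
    """Approximate the m x n Jacobian of y = f(x) in numerator layout.

    fs holds the m scalar-valued functions f_1..f_m; x has n = len(x) entries.
    Entry (i, j) approximates the partial derivative of f_i with respect to
    x_j using a central difference.
    """
    n = len(x)
    jac = []
    for f_i in fs:
        row = []
        for j in range(n):
            # Perturb only the j-th coordinate, in both directions.
            x_plus = list(x)
            x_minus = list(x)
            x_plus[j] += h
            x_minus[j] -= h
            row.append((f_i(x_plus) - f_i(x_minus)) / (2 * h))
        jac.append(row)
    return jac

# Hypothetical example: m = 2 functions of n = 3 variables.
fs = [
    lambda v: v[0] * v[1] + v[2],    # f_1(x) = x_1 x_2 + x_3
    lambda v: v[0] ** 2 - 3 * v[2],  # f_2(x) = x_1^2 - 3 x_3
]
J = numerical_jacobian(fs, [1.0, 2.0, 3.0])
print(len(J), len(J[0]))  # 2 3
```

With $m = 2$ functions of $n = 3$ variables, the result has 2 rows and 3 columns, matching the $m \times n$ shape of the numerator layout; analytically, the entries at $\vec{x} = (1, 2, 3)$ are $\begin{bmatrix} 2 & 1 & 1 \\ 2 & 0 & -3 \end{bmatrix}$.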