In the previous section, we talked about gradients of one function of many variables. This gave us a vector.
Now we turn to gradients of several functions at once and begin exploring matrix calculus. Previously, we had the function $f(x, y) = 3x^2y$. Now suppose we also have a second function, $g(x, y) = 2x + y^8$. We can follow the same strategy as in the previous section and find $\nabla f$ and $\nabla g$.
Then we assemble the two gradients vertically (or horizontally depending on your choice of layout).
The assembled gradients form a matrix with a special name, the Jacobian, $J$. So,
$$ \begin{align}
J &= \begin{bmatrix} \nabla f \\ \nabla g \end{bmatrix} \\
&= \begin{bmatrix} 6xy & 3x^2 \\ 2 & 8y^7 \end{bmatrix}
\end{align} $$
This is the numerator layout, with one gradient per row. As noted above, there is also the denominator layout, which is simply the transpose:
$$ J' = J^T = \begin{bmatrix} 6xy & 2 \\ 3x^2 & 8y^7 \end{bmatrix} $$
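To make the two layouts concrete, here is a minimal Python sketch that evaluates the Jacobian of $f$ and $g$ at a sample point. The helper names (`grad_f`, `grad_g`, `jacobian_numerator`, `transpose`) and the point $(x, y) = (1, 2)$ are illustrative choices, not part of the original text.

```python
# Analytic gradients of f(x, y) = 3x^2 y and g(x, y) = 2x + y^8.
def grad_f(x, y):
    return [6 * x * y, 3 * x**2]   # [df/dx, df/dy]

def grad_g(x, y):
    return [2, 8 * y**7]           # [dg/dx, dg/dy]

def jacobian_numerator(x, y):
    # Numerator layout: one gradient per row.
    return [grad_f(x, y), grad_g(x, y)]

def transpose(matrix):
    # Swap rows and columns to obtain the denominator layout.
    return [list(col) for col in zip(*matrix)]

# Sample point (x, y) = (1, 2), chosen only for illustration.
J = jacobian_numerator(1, 2)
J_T = transpose(J)

print(J)    # [[12, 3], [2, 1024]]
print(J_T)  # [[12, 2], [3, 1024]]
```

The rows of `J` are the gradients of $f$ and $g$. In `J_T` the same gradients appear as columns.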
To generalize the Jacobian Matrix, we will first group all the function parameters into a single vector:
$$ f(x_1, x_2, \ldots, x_n) \rightarrow f(\vec{x}) $$
We also have to define an orientation for the vector $\vec{x}$. We’ll assume that all vectors are vertical by default, of size $n \times 1$:
$$ \begin{align}
\vec{x} &= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\end{align} $$
And if there are multiple scalar-valued functions of many parameters, we can group those into a vector as well:
$$ \begin{align}
\vec{y} &= \begin{bmatrix} f_1(\vec{x}) \\ f_2(\vec{x}) \\ \vdots \\ f_m(\vec{x}) \end{bmatrix}
\end{align} $$
Let $\vec{y} = \vec{f}(\vec{x})$ be a vector of $m$ scalar-valued functions, each of which takes a vector $\vec{x}$ of length $n = |\vec{x}|$, where $|\vec{x}|$ is the cardinality (count) of elements in $\vec{x}$.
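As a rough illustration of this setup, the sketch below treats $\vec{f}$ as a Python list of $m$ functions, each taking a list of $n$ numbers, and approximates the numerator-layout Jacobian by central finite differences. The function name `numerical_jacobian`, the step size `h`, and the third function `h_fn` are assumptions made for this example; only $f$ and $g$ come from the text.

```python
def numerical_jacobian(funcs, x, h=1e-6):
    """Approximate the m x n Jacobian (numerator layout) by central differences."""
    m, n = len(funcs), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th input, in both directions.
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        for i, func in enumerate(funcs):
            J[i][j] = (func(x_plus) - func(x_minus)) / (2 * h)
    return J

def f(v):
    return 3 * v[0]**2 * v[1]      # f(x, y) = 3x^2 y, from the text

def g(v):
    return 2 * v[0] + v[1]**8      # g(x, y) = 2x + y^8, from the text

def h_fn(v):
    return v[0] * v[1]             # hypothetical third function, so that m != n

x = [1.0, 2.0]                     # n = 2 inputs
J_num = numerical_jacobian([f, g, h_fn], x)   # m = 3 functions

print(len(J_num), len(J_num[0]))   # 3 2
```

With $m = 3$ functions and $n = 2$ inputs, the result has three rows and two columns: row $i$ approximates the gradient of the $i$-th function. Finite differences are only an approximation, so the entries match the analytic values $12, 3, 2, 1024, 2, 1$ up to a small numerical error.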