The gradient of a function of many variables, $\nabla f(x,y,z,...)$, is the vector of its partial derivatives, $\left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}, ... \right]$.
Neural networks are almost never functions of just one parameter, $f(x)$; instead they are functions of many variables, or parameters, $f(x, y, ...)$. But how do you take the derivative of a function like $f(x, y) = xy$?
We compute the derivative with respect to one variable at a time, treating the rest as constants. So in this case,
$$ \frac{\partial f}{\partial x}=\frac{\partial (xy)}{\partial x}=y \\ \frac{\partial f} {\partial y}= \frac{\partial (xy)}{\partial y} = x $$
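We can sanity-check these partials numerically with a central finite difference. This is a minimal sketch; the helper `partial`, the step size `h`, and the sample point are illustrative choices, not part of the math above.

```python
def f(x, y):
    # The example function f(x, y) = xy from the text.
    return x * y

def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative
    of f with respect to the i-th variable at `point`."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

x, y = 2.0, 3.0
print(partial(f, (x, y), 0))  # ∂f/∂x at (2, 3) ≈ y = 3.0
print(partial(f, (x, y), 1))  # ∂f/∂y at (2, 3) ≈ x = 2.0
```

Holding $y$ fixed while nudging only $x$ (and vice versa) is exactly the "treat the rest as constants" rule in numerical form.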
Or if $f(x, y) = 3x^{2}y$ then
$$ \frac{\partial f}{\partial x}=\frac{\partial (3x^2y)}{\partial x}=6xy \\ \frac{\partial f} {\partial y}= \frac{\partial (3x^2y)}{\partial y} = 3x^2 $$
Now that we have these partial derivatives, what exactly do we do with them? To make it clear that we are doing vector calculus, we collect the partials into a vector. For $f(x, y) = 3x^2y$ we then get
$$ \nabla f(x, y) = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] = [6xy, 3x^2] $$
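The same finite-difference idea extends to the whole gradient: estimate each partial in turn and stack them into a vector. A minimal sketch, where the `gradient` helper, step size `h`, and evaluation point are assumptions for illustration:

```python
def f(x, y):
    # The example function f(x, y) = 3x^2 y from the text.
    return 3 * x**2 * y

def gradient(f, point, h=1e-6):
    """Numerical gradient: one central difference per variable,
    collected into a list (the vector of partials)."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grads.append((f(*up) - f(*down)) / (2 * h))
    return grads

x, y = 2.0, 3.0
print(gradient(f, (x, y)))    # ≈ [36.0, 12.0]
print([6 * x * y, 3 * x**2])  # analytic gradient [6xy, 3x^2] = [36.0, 12.0]
```

At $(x, y) = (2, 3)$ the numerical estimate matches the analytic gradient $[6xy, 3x^2]$ to within floating-point error.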