*C. Hinsley*

*21 August 2021*

Whereas in elementary calculus we often find extrema of functions by obtaining their critical points — setting, say, $\frac{d}{dx}f(x) = 0$ and solving for the extreme point $x$ — we frequently find that we should like, instead, to find a *function* which minimizes some *functional*. Recall that a functional maps a function to (in our case) a real number.

Suppose we want to find a function $x^*(t)$ defined on an interval $t \in [a, b]$ for $a, b \in \mathbb{R}$. Our criterion for this function is that it minimizes a functional $J(x)$ such that $J(x^*) \leq J(x)$ for all $x(t)$ defined on $t \in [a, b]$. We call $x^*$ an *extremal* of $J$ — this is analogous to the familiar idea of an extreme point of a function.
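To make this concrete, here is a minimal numerical sketch. The quadratic functional $J(x) = \int_0^1 x(t)^2 \, dt$ and the trapezoidal quadrature are illustrative choices of ours, not anything imposed by the problem; the point is only that a functional consumes an entire function and returns a single real number.

```python
import numpy as np

def J(x, a=0.0, b=1.0, n=10_001):
    """Illustrative functional J(x) = integral over [a, b] of x(t)^2 dt,
    approximated with the trapezoidal rule on a fine grid."""
    t = np.linspace(a, b, n)
    return np.trapz(x(t) ** 2, t)

# J maps each candidate function on [0, 1] to a single real number.
print(J(lambda t: t))          # integral of t^2 on [0, 1]  ->  approximately 1/3
print(J(lambda t: np.sin(t)))  # integral of sin(t)^2 on [0, 1]  ->  approximately 0.273
```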

We clearly do not know how to differentiate $J(x)$, so we need some other way to determine an extremal. Suppose we have already found our extremal $x^*(t)$. For a sufficiently small deviation from this function (whatever "sufficiently small" might mean in a particular circumstance), we should expect an arbitrarily small change in the value of the functional $J(x)$. Letting that "sufficiently small deviation from $x^*(t)$" be denoted $\delta x(t)$, we may equivalently say that for any small $\epsilon > 0$, every sufficiently small deviation $\delta x(t)$ satisfies

$$
J(x^*(t) + \delta x(t)) - J(x^*(t)) < \epsilon. \tag{1}
$$

We refer to $\delta x(t)$ as the *first variation,* or simply the variation, of $x^*(t)$.

A nice fact from [K-1] is that $\epsilon$ only directly bounds a chosen *norm* of $\delta x(t)$, so that **$\delta x(t)$ can take on any shape**: a deviation of any shape can be scaled down by an arbitrarily small coefficient until its norm is as small as required.
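A small numerical sketch of this fact, again using the illustrative functional $J(x) = \int_0^1 x(t)^2 \, dt$ (whose minimizer, with no boundary conditions imposed, is simply $x^*(t) = 0$) and an arbitrarily chosen shape $\sin(2\pi t)$ for the deviation: holding the shape fixed and shrinking its coefficient drives the change in $J$ below any chosen $\epsilon$.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10_001)

def J(x):
    """Illustrative functional J(x) = integral over [0, 1] of x(t)^2 dt."""
    return np.trapz(x(t) ** 2, t)

x_star = lambda s: np.zeros_like(s)      # x*(t) = 0 minimizes this particular J
shape = lambda s: np.sin(2 * np.pi * s)  # a fixed, arbitrary shape for delta x

for c in [1.0, 0.1, 0.01, 0.001]:        # scale the same shape down by c
    increment = J(lambda s, c=c: x_star(s) + c * shape(s)) - J(x_star)
    print(c, increment)                  # shrinks like c^2: eventually below any epsilon
```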

That's great — we now have a rudimentary analog, for functionals, of a critical point of a real-valued function. We will generalize the left-hand side of the inequality above as the *increment* of $J$, written with a capital delta:

$$ \Delta J(x, \delta x) = J(x(t) + \delta x(t)) - J(x(t)). \tag{2} $$

The inequality $(1)$ then becomes $\Delta J(x^*, \delta x) < \epsilon$.
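For concreteness, with the illustrative quadratic functional $J(x) = \int_a^b x(t)^2 \, dt$ (an example of our choosing, not anything imposed by the general development), the increment works out to

$$
\Delta J(x, \delta x) = \int_a^b \left[ (x + \delta x)^2 - x^2 \right] dt = \int_a^b \left[ 2\,x\,\delta x + (\delta x)^2 \right] dt,
$$

a part linear in $\delta x$ plus a part of higher order in $\delta x$.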

We now return to the idea of choosing norms on $\delta x(t)$; it turns out that doing so provides us with a notion of a local region of neighboring functions (for example, selecting $||\delta x|| < \alpha$ for some $\alpha > 0$ allows one to deal with all the functions $x + \delta x$ in some way "close to" the function $x$). The norm of a function $x$ is itself a functional, assigning to each function $x$ some number $||x|| \in \mathbb{R}$. Norms of functions obey three properties (see the numerical check after this list):

- Positive-definiteness: $||x|| \geq 0$, where $||x|| = 0$ if and only if $x$ is the constant function $x = 0$.
- Homogeneity: $||c \cdot x|| = |c| \cdot ||x||$ for $c \in \mathbb{R}$.
- Triangle inequality: $||x + y|| \leq ||x|| + ||y||$ for functions $x, y$.
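
As a minimal numerical check of these three properties, here is a sketch using two common example norms, the sup-norm and the $L^2$ norm (both are choices of ours for illustration; neither is singled out by the text):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10_001)

def sup_norm(x):
    """||x|| = maximum of |x(t)| over [0, 1] (approximated on a grid)."""
    return np.max(np.abs(x(t)))

def l2_norm(x):
    """||x|| = square root of the integral of x(t)^2 over [0, 1]."""
    return np.sqrt(np.trapz(x(t) ** 2, t))

x = lambda s: np.sin(2 * np.pi * s)
y = lambda s: s ** 2
c = -3.0

for norm in (sup_norm, l2_norm):
    # Homogeneity: ||c * x|| == |c| * ||x||
    assert np.isclose(norm(lambda s: c * x(s)), abs(c) * norm(x))
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert norm(lambda s: x(s) + y(s)) <= norm(x) + norm(y) + 1e-12
    # Positive-definiteness: ||x|| >= 0, and the zero function has norm 0
    assert norm(x) >= 0 and np.isclose(norm(lambda s: 0.0 * s), 0.0)
```

The homogeneity property is exactly what lets a deviation of any shape be rescaled to have arbitrarily small norm, as noted earlier.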

Note that there are multiple functionals that qualify as norms. The norm you select will depend on the problem you are trying to solve; usually, the algebraic mess you find yourself in will signal what to look for in a norm.
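As a concrete illustration of that freedom (the particular norms here are again just examples): the sup-norm $||x|| = \max_{t \in [a, b]} |x(t)|$ and the $L^2$ norm $||x|| = \left( \int_a^b x(t)^2 \, dt \right)^{1/2}$ both satisfy the three properties above, yet a tall, narrow spike has a small $L^2$ norm while its sup-norm stays large, so the two norms disagree about which functions count as "close to" a given $x$.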

In order to make use of norms, we first note that we can rewrite the increment $\Delta J$ as