For a target parameter (estimand), we seek a regular and asymptotically linear (RAL) estimator; otherwise, confidence intervals based on the asymptotic variance may not be valid. (Spoiler: every RAL estimator has an influence function, and the variance of the influence function is the asymptotic variance.)
If we can remove the plug-in bias, the resulting estimator attains asymptotic linearity. However, asymptotic linearity alone doesn’t guarantee efficiency: RAL estimators can differ in their asymptotic variance. To reach the efficiency bound, the estimator’s influence function must be the efficient influence function.
<aside> 💡
efficiency bound: the lowest possible asymptotic variance among all RAL estimators.
</aside>
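As a quick sanity check of the RAL story, here is a toy simulation (my own example, not from any reference): for $\psi = E[X]$, the sample mean is RAL with influence function $\mathrm{IF}(x) = x - \psi$, so the variance of $\sqrt{n}(\hat\psi - \psi)$ should match $\mathrm{Var}(\mathrm{IF}) = \mathrm{Var}(X)$.

```python
import numpy as np

# Toy check: for psi = E[X], the sample mean is RAL with influence function
# IF(x) = x - psi, so sqrt(n) * (psi_hat - psi) should have variance close to
# Var(IF) = Var(X).  Using Exponential(1): psi_true = 1, Var(X) = 1.
rng = np.random.default_rng(0)
n, n_reps = 500, 2000
psi_true, var_if = 1.0, 1.0

estimates = np.array([rng.exponential(1.0, n).mean() for _ in range(n_reps)])
scaled_var = n * estimates.var()   # empirical variance of sqrt(n)*(psi_hat - psi_true)

print(scaled_var)                  # close to var_if = 1.0
```

Here the influence-function variance is also exactly the variance you would plug into a Wald-type confidence interval for the sample mean.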
<aside> 💡
Abstract
Goal: Efficiently estimate a target parameter in a semi-parametric model, such as ATE.
Challenge: In the semi-parametric context, while plug-in estimators are consistent under regularity conditions, they often fail to achieve asymptotic linearity because of plug-in bias. One reason is that a non-linear functional can propagate the error of the estimated distribution nonlinearly to the target parameter, leading to bias and loss of efficiency.
Method: Remove the first-order (leading) component of the plug-in bias by targeting the estimator along a parametric submodel. Specifically, TMLE updates the initial estimate so that the empirical mean of the efficient influence function (EIF) is approximately zero.
</aside>
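A minimal numerical sketch of this targeting step, under assumptions of my own choosing: the target is $\psi = E[Y]$ with $Y$ missing at random given $W$, the missingness probability $g(W)$ is treated as known, and the initial outcome regression $\bar{Q}$ is deliberately misspecified. The fluctuation uses the "clever covariate" $H = \Delta/g(W)$, and choosing $\epsilon$ by least squares makes the empirical mean of the EIF exactly zero.

```python
import numpy as np

# Toy TMLE sketch (my own setup): target psi = E[Y], with Y observed only
# when D = 1 and P(D=1 | W) = g(W) known.  The EIF is
#   EIF(O) = D/g(W) * (Y - Qbar(W)) + Qbar(W) - psi.
rng = np.random.default_rng(1)
n = 5000
W = rng.uniform(-1, 1, n)
g = 1 / (1 + np.exp(-W))             # known missingness mechanism P(D=1|W)
D = rng.binomial(1, g)
Y = 1 + 2 * W + rng.normal(0, 1, n)  # true Qbar_0(W) = 1 + 2W, so psi_0 = 1

Qbar = 1 + 1.0 * W                   # deliberately misspecified initial fit
H = D / g                            # "clever covariate" (zero where Y is missing)

# Targeting: fluctuate Qbar_eps(W) = Qbar(W) + eps/g(W), picking eps by least
# squares on observed outcomes; this solves the EIF score equation exactly.
eps = np.sum(H * (Y - Qbar)) / np.sum(D / g**2)
Qbar_star = Qbar + eps / g           # targeted outcome regression
psi_tmle = Qbar_star.mean()          # plug-in with the updated fit

eif = H * (Y - Qbar_star) + Qbar_star - psi_tmle
print(psi_tmle, eif.mean())          # eif.mean() is ~0 by construction
```

Note the update direction $1/g(W)$ upweights units that are rarely observed, which is exactly how the targeting step cancels the first-order bias left by the misspecified $\bar{Q}$.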
Components:
Given a dataset of $n$ observations $\{O_1, \dots, O_n\}$ sampled from a nonparametric distribution $P_0 \in \mathcal{M}$, we are interested in estimating a $d$-dimensional parameter $\psi$ (a.k.a. the target parameter), which is a map of the distribution: the target functional is $\Psi: \mathcal{M} \rightarrow \mathbb{R}^d$, with $\psi = \Psi(P_0)$.
For simplicity, I don’t formally define $\mathcal{M}$ here; we just need to know it is the space where the distribution lives. For its formal definition, see xyz.
Semi-parametric theory cares about perturbations of the distribution. Specifically, we ask: when the distribution is perturbed, how does the target parameter change? TMLE leverages this property and perturbs the estimated distribution toward the ground-truth distribution. Below, we define several tools to help describe the perturbation.
For any $p, q \in \mathcal{M}$, consider the $\epsilon$-perturbed distribution
$$ p_\epsilon = (1-\epsilon)\cdot p + \epsilon \cdot q = p + \epsilon (q-p)\text{, for small } \epsilon \in \mathbb{R} $$
$\Psi$ is Gateaux differentiable at $p$ if, for all $q \in \mathcal{M}$, the following limit (a.k.a. the Gateaux derivative) exists:
$$ \frac{d}{d\epsilon}\Psi(p_\epsilon)\Big|_{\epsilon=0} = \underset{\epsilon \rightarrow 0}{\lim}\frac{\Psi(p_\epsilon)-\Psi(p)}{\epsilon} $$
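The Gateaux derivative can be checked numerically. Here is a toy example of my own: for discrete distributions on a grid and the nonlinear functional $\Psi(p) = (E_p[X])^2$, the chain rule gives $\frac{d}{d\epsilon}\Psi(p_\epsilon)\big|_{\epsilon=0} = 2\,E_p[X]\,(E_q[X] - E_p[X])$, which a finite difference along the mixture path reproduces.

```python
import numpy as np

# Numerical check of the Gateaux derivative along p_eps = p + eps*(q - p)
# for the nonlinear functional Psi(p) = (E_p[X])^2 on a discrete support.
x = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.4, 0.3, 0.2, 0.1])   # E_p[X] = 1.0
q = np.array([0.1, 0.2, 0.3, 0.4])   # E_q[X] = 2.0

def psi(dist):
    return float(np.dot(dist, x)) ** 2

# Chain rule: d/deps Psi(p_eps)|_{eps=0} = 2 * E_p[X] * (E_q[X] - E_p[X]) = 2.0
analytic = 2 * np.dot(p, x) * (np.dot(q, x) - np.dot(p, x))

eps = 1e-6
finite_diff = (psi(p + eps * (q - p)) - psi(p)) / eps

print(analytic, finite_diff)  # the two values agree to ~1e-5
```

Note the derivative depends on the direction $q - p$; Gateaux differentiability requires this directional limit to exist for every $q \in \mathcal{M}$.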