Causal inference seeks to answer the question: “What would have happened if we had followed a different treatment policy?” Broadly, the methodology unfolds in two steps: (1) defining the causal world and specifying assumptions that connect it to the observable world, and (2) conducting estimation from observational data under those assumptions.

The first step is where the causality comes in. After that, the second step is just statistical inference. This post focuses on the first step, with an emphasis on the assumptions that make the causal estimand identifiable. In other words, these assumptions allow us to translate an “imagined” counterfactual quantity into something we can estimate from observed data.


Target of Causal Inference

“What would have happened if we had followed a different treatment policy?” — Depending on how we define the “what” and the “different treatment policy”, this question can be formulated in many different ways. Here I will use the Conditional Average Treatment Effect (CATE), the most commonly targeted estimand, as an example to explain the principle of causal inference. Once we understand the principle, generalizing it to other formulations will be straightforward.

Let me first introduce the minimal notation before we dive in: I will use $A$ to denote treatment, $Y$ to denote outcome, and $X$ to denote sample covariates. For simplicity, I discuss binary treatment, i.e. $A\in\{0,1\}$. Below is a simple causal directed acyclic graph (cDAG), where each arrow points from cause to effect. Here, $Y$ is affected by $X$ and $A$, and $A$ is affected by $X$.

(Figure: a causal DAG with edges X → A, X → Y, and A → Y.)

Following the Neyman-Rubin potential outcome framework, CATE is defined as

$$ E[Y^1 - Y^0|X=x] $$

where $Y^1$ and $Y^0$ are called the potential outcomes under $A=1$ (treat) and $A=0$ (control), respectively. In plain language, we first assume that a sample with covariate $X$ has ground-truth underlying outcomes for both the treatment and control arms, and then we want to know the difference between these two quantities.

Given $X$, we want to know $Y$ — ML people may already start thinking about how to train a regression model to estimate this quantity. However, here we run into the fundamental problem of causal inference: we can never observe the true label. In reality, an individual either receives treatment or not — we can never observe the outcomes for both situations. In this sense, this quantity lives in an “imaginary” world. Without the true label, we cannot fit a regression model to estimate the potential outcome. So, are we doomed?
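The fundamental problem is easy to see in simulation. In the sketch below (all variable names and the data-generating process are illustrative, not from the post), we generate both potential outcomes for every unit — something only possible in a simulated world — and then reveal only the one corresponding to the treatment actually received:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# In the simulated world, BOTH potential outcomes exist for every unit.
x = rng.normal(size=n)               # covariate X
y0 = x + rng.normal(size=n)          # potential outcome under control, Y^0
y1 = x + 2.0 + rng.normal(size=n)    # potential outcome under treatment, Y^1

# Treatment assignment depends on X, matching the cDAG edge X -> A.
a = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)

# The fundamental problem: the analyst only ever observes Y = Y^A.
y = np.where(a == 1, y1, y0)

for i in range(n):
    counterfactual = y0[i] if a[i] == 1 else y1[i]
    print(f"unit {i}: A={a[i]}, observed Y={y[i]:.2f}, "
          f"unobserved counterfactual={counterfactual:.2f}")
```

The last column exists only because we simulated it; with real data, that entry is missing for every single unit, which is why naive regression on $(X, Y)$ alone cannot target $E[Y^1 - Y^0 \mid X=x]$.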


Assumptions to make the target quantity identifiable

The solution is — we make assumptions that convert it into something observable. In statistical language, if these assumptions are met, the estimand is “identifiable” from the data. The standard assumptions are as follows:

1. Ignorability (unconfoundedness): $(Y^1, Y^0) \perp A \mid X$. Given the covariates, treatment assignment carries no information about the potential outcomes.
2. SUTVA / consistency: $Y = Y^A$. The observed outcome equals the potential outcome under the treatment actually received, and one unit’s treatment does not affect another unit’s outcome.
3. Positivity: $0 < P(A=1 \mid X=x) < 1$. Every covariate value has a nonzero chance of receiving each treatment arm.
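Among the standard assumptions, positivity — that $0 < P(A=1 \mid X=x) < 1$ for every covariate value — is the one that can be partially probed from data, by checking the empirical treatment rate within covariate strata. A minimal sketch on simulated data (the stratum setup and variable names are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Discrete covariate so strata are exact; stratum x=2 violates positivity.
x = rng.integers(0, 3, size=n)
p_true = np.array([0.5, 0.9, 1.0])[x]   # true P(A=1|X); 1.0 is a violation
a = (rng.random(n) < p_true).astype(int)

# Empirical propensity per stratum; values pinned at 0 or 1 flag trouble.
for xv in range(3):
    e_hat = a[x == xv].mean()
    flag = "  <- positivity violated" if e_hat in (0.0, 1.0) else ""
    print(f"x={xv}: empirical P(A=1|X=x) = {e_hat:.2f}{flag}")
```

In the violating stratum there are no control units at all, so the conditional mean $E[Y \mid A=0, X=2]$ that the identification argument needs simply does not exist in the data.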

With these three assumptions, CATE can be rewritten as follows:

$$
\begin{align}
E[Y^1-Y^0|X=x] &= E[Y^1|X=x]-E[Y^0|X=x] \notag\\
&= E[Y^1|A=1,X=x] - E[Y^0|A=0,X=x] \quad \text{by ignorability} \notag\\
&= E[Y|A=1,X=x] - E[Y|A=0,X=x] \quad\quad\ \ \text{by SUTVA}
\end{align}
$$
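The payoff of this identification result is that the final line involves only observable quantities, so a plug-in estimator works. The sketch below (a toy simulation with a discrete covariate; the data-generating process is my own illustrative assumption) estimates $E[Y|A=1,X=x] - E[Y|A=0,X=x]$ by differencing conditional means within each covariate stratum and compares it to the known ground-truth CATE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Discrete covariate X in {0, 1} so we can condition on it exactly.
x = rng.integers(0, 2, size=n)

# Potential outcomes constructed so the true CATE is 1 + x.
y0 = x + rng.normal(size=n)
y1 = x + (1 + x) + rng.normal(size=n)

# Confounded but positive assignment: P(A=1|X) is 0.3 or 0.7, never 0 or 1.
p = np.where(x == 1, 0.7, 0.3)
a = (rng.random(n) < p).astype(int)

# Consistency: the observed outcome is the potential outcome actually realized.
y = np.where(a == 1, y1, y0)

# Plug-in identification: E[Y|A=1,X=x] - E[Y|A=0,X=x].
for xv in (0, 1):
    est = (y[(a == 1) & (x == xv)].mean()
           - y[(a == 0) & (x == xv)].mean())
    print(f"x={xv}: estimated CATE = {est:.3f}, true CATE = {1 + xv}")
```

Note that the naive unconditional difference $E[Y|A=1] - E[Y|A=0]$ would be biased here, because $X$ influences both $A$ and $Y$; conditioning on $X$ is exactly what ignorability licenses.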