Weighted histograms

Date: @today

Topic: Weighted histograms

Recall

Reweighting

Suppose that we have a probability distribution $\mathbf{p}$ that is difficult to sample, and that we want to estimate this distribution by calculating a histogram. Sadly, we cannot do so directly because we do not know how to sample $\mathbf{p}$ effectively. In such a situation we can build a histogram from samples drawn from some other distribution $\mathbf{p}'$. We can then recover $\mathbf{p}$ by reweighting with the expression below, where the product and the ratio are taken element by element (one bin at a time):

$$ \mathbf{p} = \mathbf{w} \mathbf{p}' \qquad \textrm{where} \qquad \mathbf{w} = \frac{\mathbf{p}}{\mathbf{p}'} $$
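As a quick numerical check of this expression, the sketch below uses two small, made-up distributions over four bins. The values in `p` and `p_prime` are assumptions chosen purely for illustration. It confirms that multiplying $\mathbf{p}'$ by $\mathbf{w}$ bin by bin gives back $\mathbf{p}$.

```python
# Hypothetical target distribution p over four bins (pretend it is hard to sample)
p = [0.1, 0.2, 0.3, 0.4]
# Hypothetical easy-to-sample distribution p' (uniform over the same bins)
p_prime = [0.25, 0.25, 0.25, 0.25]

# Weights are the element-wise ratio w = p / p'
w = [pi / ppi for pi, ppi in zip(p, p_prime)]

# Reweighting: multiplying p' by w bin by bin recovers p
recovered = [wi * ppi for wi, ppi in zip(w, p_prime)]
print(w)          # → [0.4, 0.8, 1.2, 1.6]
print(recovered)  # → [0.1, 0.2, 0.3, 0.4]
```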

It is straightforward to show using maximum likelihood that the elements of the distribution of interest $\mathbf{p}$ can be estimated using:

$$ p_i \propto w_i x_i \qquad w_i = \frac{p_i}{p_i'} $$

where $x_i$ is the number of times bin $i$ is visited when we sample from $\mathbf{p}'$, and $p_i$ and $p_i'$ are the (unnormalized) probabilities of being in bin $i$ under $\mathbf{p}$ and $\mathbf{p}'$ respectively. A proof of this result is given in the following video:

https://www.youtube.com/watch?v=3dn7-vaIP1g
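The estimator can be sketched in Python as follows, assuming the weights are defined as $w_i = p_i/p_i'$ (the ratio in the reweighting expression $\mathbf{p} = \mathbf{w}\mathbf{p}'$), so that $p_i \propto w_i x_i$. The function `weighted_histogram`, the four-bin target `p`, and the uniform sampling distribution `p_prime` are hypothetical choices for illustration; in a real application $\mathbf{p}'$ would be whatever distribution you can actually sample.

```python
import random


def weighted_histogram(samples, weights, n_bins):
    """Estimate p from bin indices drawn from p', given per-bin weights w_i = p_i / p'_i."""
    # x_i: the number of times bin i is visited
    counts = [0] * n_bins
    for s in samples:
        counts[s] += 1
    # p_i is proportional to w_i * x_i; normalize so the estimate sums to one
    unnormalized = [w * x for w, x in zip(weights, counts)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical (unnormalized) target p and easy-to-sample p' over four bins
p = [1.0, 2.0, 3.0, 4.0]
p_prime = [1.0, 1.0, 1.0, 1.0]
w = [pi / ppi for pi, ppi in zip(p, p_prime)]

rng = random.Random(42)
# Draw bin indices from p'; random.choices accepts unnormalized weights
samples = rng.choices(range(len(p_prime)), weights=p_prime, k=100_000)

estimate = weighted_histogram(samples, w, len(p))
exact = [pi / sum(p) for pi in p]
print(estimate)  # prints values close to exact = [0.1, 0.2, 0.3, 0.4]
```

Because the estimate is normalized at the end, `p` and `p_prime` only need to be known up to multiplicative constants, which is why unnormalized probabilities are sufficient.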

Errors

To report error bars on your weighted histogram, and thus show how reproducible your estimate is, you need to estimate how much the value in each bin would change if the calculation were repeated. One way to do this is to generate multiple histograms and calculate percentiles of the values obtained in each bin. Alternatively, you can resample the histogram you generated, as is explained (for unweighted histograms) in this video:

https://www.youtube.com/watch?v=fyab5MG_Om4
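A minimal sketch of the resampling approach, assuming the weighted estimator $p_i \propto w_i x_i$ with $w_i = p_i/p_i'$: the samples drawn from $\mathbf{p}'$ are resampled with replacement (a bootstrap), the weighted histogram is rebuilt for each resample, and the 5th and 95th percentiles in each bin are reported as error bars. The helper names, the number of resamples (`n_boot`), and the choice of percentiles are assumptions. Replacing the resampling line with fresh draws from $\mathbf{p}'$ would give the "multiple histograms" variant instead.

```python
import random
import statistics


def weighted_histogram(samples, weights, n_bins):
    """Normalized estimate p_i proportional to w_i * x_i from bin indices drawn from p'."""
    counts = [0] * n_bins
    for s in samples:
        counts[s] += 1
    unnormalized = [w * x for w, x in zip(weights, counts)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bootstrap_error_bars(samples, weights, n_bins, n_boot=200, seed=0):
    """Return per-bin 5th and 95th percentiles of the estimate over bootstrap resamples."""
    rng = random.Random(seed)
    per_bin = [[] for _ in range(n_bins)]
    for _ in range(n_boot):
        # Resample the original samples with replacement and rebuild the histogram
        resample = rng.choices(samples, k=len(samples))
        for i, value in enumerate(weighted_histogram(resample, weights, n_bins)):
            per_bin[i].append(value)
    lower, upper = [], []
    for values in per_bin:
        # quantiles(..., n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles
        cuts = statistics.quantiles(values, n=20)
        lower.append(cuts[0])
        upper.append(cuts[-1])
    return lower, upper


# Hypothetical example: target p over four bins, uniform p'
p = [1.0, 2.0, 3.0, 4.0]
p_prime = [1.0, 1.0, 1.0, 1.0]
w = [pi / ppi for pi, ppi in zip(p, p_prime)]
rng = random.Random(1)
samples = rng.choices(range(len(p)), weights=p_prime, k=5_000)

estimate = weighted_histogram(samples, w, len(p))
lower, upper = bootstrap_error_bars(samples, w, len(p))
for i, (e, lo, hi) in enumerate(zip(estimate, lower, upper)):
    print(f"bin {i}: {e:.3f} (90% interval {lo:.3f} to {hi:.3f})")
```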

Lastly, you can recognise that each estimate of the probability is a weighted mean (with random weights). You can thus use what you know about the variance of weighted means and the central limit theorem to report the errors. This idea is explained (for unweighted histograms, where you use what you know about the expectation and variance of a sample mean) in the following video:

https://www.youtube.com/watch?v=UqRU6LtmJRA
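A sketch of the central-limit-theorem approach follows. Each bin estimate $p_i = \sum_t w_t f_t / \sum_t w_t$ is a weighted mean of indicator values $f_t$ (1 if sample $t$ lands in bin $i$, 0 otherwise) with random weights $w_t$, and it equals $w_i x_i / \sum_j w_j x_j$. The code assumes a standard delta-method approximation for the variance of such a self-normalized weighted mean, $\sum_t w_t^2 (f_t - p_i)^2 / (\sum_t w_t)^2$, and reports $\pm 1.96$ standard errors as an approximate 95% interval. This is one common choice, not necessarily the one used in the video, and the function name `clt_error_bars` is hypothetical.

```python
import math
import random


def clt_error_bars(samples, weights, n_bins, z=1.96):
    """Per-bin estimates and approximate 95% half-widths from the CLT.

    Each p_i is the weighted mean of indicators f_t = [sample t is in bin i]
    with random weights w_t = weights[sample t]. Its variance is approximated
    (delta method) by sum_t w_t^2 (f_t - p_i)^2 / (sum_t w_t)^2.
    """
    w_t = [weights[s] for s in samples]
    w_sum = sum(w_t)
    estimates, half_widths = [], []
    for i in range(n_bins):
        # Weighted mean of the indicator for bin i (equals w_i x_i / sum_j w_j x_j)
        p_i = sum(w for w, s in zip(w_t, samples) if s == i) / w_sum
        var = sum(w * w * ((1.0 if s == i else 0.0) - p_i) ** 2
                  for w, s in zip(w_t, samples)) / w_sum ** 2
        estimates.append(p_i)
        half_widths.append(z * math.sqrt(var))
    return estimates, half_widths


# Hypothetical example: target p over four bins, uniform p'
p = [1.0, 2.0, 3.0, 4.0]
p_prime = [1.0, 1.0, 1.0, 1.0]
w = [pi / ppi for pi, ppi in zip(p, p_prime)]
rng = random.Random(7)
samples = rng.choices(range(len(p)), weights=p_prime, k=20_000)

estimates, half_widths = clt_error_bars(samples, w, len(p))
for i, (e, h) in enumerate(zip(estimates, half_widths)):
    print(f"bin {i}: {e:.3f} +/- {h:.3f}")
```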

<aside> 📌 SUMMARY: You can use weighted histograms to estimate probability distributions that are hard to sample. This method works by sampling from some other distribution that is easier to sample and then reweighting the counts in each bin.

</aside>