Date: January 30, 2021

Topic: Histograms

Recall

Discrete case

We can estimate the probability mass function of a discrete random variable by computing a histogram. To do so we generate repeated samples from the distribution of interest and count the number of times each possible value of the random variable appears, $n_i$. We then normalise the histogram by dividing by the total number of samples generated, $N$. Our estimate of the probability mass function is thus given by:

$$ p_i = \frac{n_i}{N} $$

The way this procedure is implemented in Python is explained in the video below:

https://www.youtube.com/watch?v=l2h9oNUCJsM&t=352s
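
For concreteness, here is a minimal Python sketch of this recipe applied to a hypothetical biased die; the weights, sample size and seed are assumptions made purely for illustration:

```python
import numpy as np

# A sketch under assumed settings: the die weights and sample size
# are chosen purely for illustration.
rng = np.random.default_rng(seed=1)
N = 10_000
weights = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])  # assumed biased die
samples = rng.choice(6, size=N, p=weights)

# n_i: the number of times each of the six possible values appears.
n = np.bincount(samples, minlength=6)

# p_i = n_i / N: the histogram estimate of the probability mass function.
p = n / N
print(p)
```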

Continuous case

We can estimate the probability density function of a continuous random variable by computing a histogram. To do so we divide the continuous range of values the random variable might take into a discrete number of bins. We then generate repeated samples from the distribution of interest and count the number of times the random variable falls in each bin, $n_i$. We normalise the histogram by dividing by the total number of samples generated, $N$, multiplied by the width of the bin, $d_i$. Our estimate of the probability density is thus given by:

$$ p_i = \frac{n_i}{d_i N} $$

The way this procedure is implemented in Python is explained in the following video:

https://www.youtube.com/watch?v=-aS_CrskEYE
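
A corresponding Python sketch for the continuous case follows; the standard normal distribution and the number of bins are assumptions chosen for illustration:

```python
import numpy as np

# A sketch under assumed settings: the standard normal distribution and
# the number of bins are chosen purely for illustration.
rng = np.random.default_rng(seed=1)
N = 10_000
samples = rng.standard_normal(N)

# n_i: the number of samples that fall in each bin.
n, edges = np.histogram(samples, bins=50)

# d_i: the width of each bin.
d = np.diff(edges)

# p_i = n_i / (d_i N): the histogram estimate of the probability density.
p = n / (d * N)

# np.histogram(samples, bins=50, density=True) applies the same normalisation.
```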

Maximum likelihood

It is straightforward to use Lagrange's method of undetermined multipliers to show that the estimators for the probability mass/density functions in the previous two sections are maximum likelihood estimators for a multinomial distribution. This proof is explained in the video that follows:

https://www.youtube.com/watch?v=YRHY_98bMdU
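
In outline, the argument runs as follows. Up to terms that do not depend on the $p_i$, the log-likelihood of the observed counts $n_i$ under a multinomial distribution is:

$$ \ln L = \sum_i n_i \ln p_i $$

Maximising this subject to the normalisation constraint $\sum_i p_i = 1$ introduces a multiplier $\lambda$:

$$ \Lambda = \sum_i n_i \ln p_i - \lambda \left( \sum_i p_i - 1 \right) $$

Setting $\partial \Lambda / \partial p_i = n_i / p_i - \lambda = 0$ gives $p_i = n_i / \lambda$, and the constraint then fixes $\lambda = \sum_i n_i = N$, recovering the estimator $p_i = n_i / N$ from above.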

Errors

To quantify how reproducible your histogram is, you can calculate percentiles for the distribution of estimates in each bin and use these to report error bars on your estimate of the distribution. One way to do this is to generate multiple independent histograms, as is explained in this video:

https://www.youtube.com/watch?v=9nLQ8KGIdwQ
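
A minimal Python sketch of this approach; the distribution, bin edges, sample size and number of repeats are all assumptions made for illustration:

```python
import numpy as np

# A sketch under assumed settings: the distribution, bin edges, sample
# size and number of repeats are chosen purely for illustration.
rng = np.random.default_rng(seed=1)
N, n_repeats = 1_000, 100
edges = np.linspace(-4.0, 4.0, 21)
d = np.diff(edges)

# Generate many independent histograms, each from a fresh set of samples.
estimates = np.empty((n_repeats, len(d)))
for k in range(n_repeats):
    samples = rng.standard_normal(N)
    n, _ = np.histogram(samples, bins=edges)
    estimates[k] = n / (d * N)

# Report the median estimate with error bars from the 5th and 95th
# percentiles of the distribution of estimates in each bin.
lower, median, upper = np.percentile(estimates, [5, 50, 95], axis=0)
```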

You can also resample your first set of samples and use this to calculate your errors, as is explained in this video:

https://www.youtube.com/watch?v=fyab5MG_Om4
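
A minimal Python sketch of this resampling (bootstrap) approach, under the same illustrative assumptions as above:

```python
import numpy as np

# A sketch of resampling (a bootstrap) under assumed settings: the
# distribution, bin edges and sample size are chosen for illustration.
rng = np.random.default_rng(seed=1)
N, n_boot = 1_000, 200
edges = np.linspace(-4.0, 4.0, 21)
d = np.diff(edges)

samples = rng.standard_normal(N)  # the single data set we actually have

# Resample the data with replacement and recompute the histogram each time.
estimates = np.empty((n_boot, len(d)))
for k in range(n_boot):
    resample = rng.choice(samples, size=N, replace=True)
    n, _ = np.histogram(resample, bins=edges)
    estimates[k] = n / (d * N)

# Error bars from percentiles of the bootstrapped estimates.
lower, upper = np.percentile(estimates, [5, 95], axis=0)
```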

Lastly, you can recognise that each estimate of the probability is a sample mean. You can thus use what you know about the variance of the sample mean and the central limit theorem to report the errors, as is explained in the following video:

https://www.youtube.com/watch?v=UqRU6LtmJRA
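
A minimal Python sketch for the discrete case; the biased die and sample size are again assumptions made for illustration:

```python
import numpy as np

# A sketch for the discrete case under assumed settings.  Each estimate
# p_i = n_i / N is the sample mean of an indicator variable that is 1 when
# a sample takes the value i, so its variance is p_i (1 - p_i) / N and, by
# the central limit theorem, p_i is approximately normally distributed.
rng = np.random.default_rng(seed=1)
N = 10_000
weights = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.2])  # assumed biased die
samples = rng.choice(6, size=N, p=weights)

p = np.bincount(samples, minlength=6) / N

# 90% confidence interval: 1.645 is the 95th percentile of the standard
# normal distribution.
error = 1.645 * np.sqrt(p * (1 - p) / N)
print(p - error, p + error)
```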

<aside> 📌 SUMMARY: You can use maximum likelihood to derive an estimator for the probability mass/density function of a random variable. You can thus estimate these probability distributions by computing a histogram. It is important to quote error bars when you do so, as the probability mass/density function you obtain by sampling is only an estimate.

</aside>