Date: October 5, 2019

Topic:

Recall

Notes

<aside> 📌 SUMMARY:

</aside>


Date: January 24, 2021

Topic: The cumulative probability distribution function

Recall

What does the cumulative probability distribution function measure?

Give three properties that the cumulative probability distribution function has.

How can you obtain an estimate of the cumulative distribution from a data set?

Notes

The cumulative probability distribution function of a random variable is a function $P(X\le x)$ which tells you the probability that the random variable $X$ is less than or equal to $x$. This function has the following three properties:

$$ \begin{aligned} \lim_{x\rightarrow -\infty} P(X\le x) & = 0 \\ \lim_{x\rightarrow \infty} P(X\le x) & = 1 \\ \lim_{\epsilon \rightarrow 0^+} P(X\le x + \epsilon) & = P(X\le x) \end{aligned} $$

One formal definition of this function is as follows:

If each event, $s$, in the sample space, $\Omega$, has a value, $x(s)$, taken from the set of real numbers, $\mathbb{R}$, then there exists a cumulative probability distribution function, $P(X\le x)$, that maps each subset of $\Omega$ that can be formed using $A(x') = \{s: (s\in\Omega)\wedge (x(s)\le x')\}$ to a real number between 0 and 1. This function has the three properties that were given above.

This definition is explained in the following video:

https://www.youtube.com/embed/qbbTEZ4NlCI
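As a concrete illustration of the three properties, the sketch below writes out $P(X\le x)$ for a fair six-sided die. The die is an assumed example chosen for these notes, and `die_cdf` is a hypothetical helper name. Very large negative and positive arguments stand in for the two limits, and the last three values show that the function is continuous when $x$ is approached from above but jumps when it is approached from below.

```python
import math


def die_cdf(x):
    """P(X <= x) for a fair six-sided die (an assumed example distribution)."""
    # Count how many of the faces 1..6 are less than or equal to x.
    faces_at_or_below = min(max(math.floor(x), 0), 6)
    return faces_at_or_below / 6


far_left = die_cdf(-1e9)        # stands in for x -> -infinity
far_right = die_cdf(1e9)        # stands in for x -> +infinity
at_three = die_cdf(3)           # P(X <= 3)
just_above = die_cdf(3 + 1e-9)  # approaching 3 from above
just_below = die_cdf(3 - 1e-9)  # approaching 3 from below
```

Here `just_above` equals `at_three`, whereas `just_below` is smaller because the face 3 is not yet included.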

If you have repeated results from an experiment, you can estimate the cumulative probability distribution function that was sampled in the experiment by sorting the data into ascending order, as is discussed in this video:

https://www.youtube.com/embed/VaZTKmcxLvY
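A minimal sketch of the sorting idea is given below. It assumes the common convention that the $k$th smallest of $n$ data points is assigned the estimate $k/n$; the video may use a slightly different convention. The function name `empirical_cdf` and the sample values are made up for illustration.

```python
import numpy as np


def empirical_cdf(data):
    """Return the sorted values and the estimated P(X <= value) at each one."""
    sorted_values = np.sort(np.asarray(data, dtype=float))
    n = sorted_values.size
    # The k-th smallest value (k = 1..n) has k of the n samples at or below it.
    cumulative_probabilities = np.arange(1, n + 1) / n
    return sorted_values, cumulative_probabilities


samples = [2.3, 0.7, 1.9, 3.1, 0.2]  # made-up measurements
values, probabilities = empirical_cdf(samples)
```

If the data contain repeated values, the estimate at a tied value is the largest of the probabilities assigned to it.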

The following video explains how you can use Python to plot the cumulative probability distribution using this idea.

https://www.youtube.com/embed/fQ0Iy0Sew_U
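One way such a plot might be produced is sketched below. This is not necessarily the approach taken in the video. The names `ecdf_step_data` and `plot_ecdf`, the output file name, and the normally distributed test data are all assumptions made for this example. The data are sorted, the $k$th point is given height $k/n$, and the result is drawn as a step function.

```python
import numpy as np


def ecdf_step_data(data):
    """Sorted data values and the step heights k/n of the estimated distribution."""
    sorted_values = np.sort(np.asarray(data, dtype=float))
    heights = np.arange(1, sorted_values.size + 1) / sorted_values.size
    return sorted_values, heights


def plot_ecdf(data, filename="ecdf.png"):
    """Draw the estimated cumulative distribution as a step function and save it."""
    # Imported here so the rest of the sketch runs without matplotlib installed.
    import matplotlib.pyplot as plt

    x, y = ecdf_step_data(data)
    fig, ax = plt.subplots()
    # where="post" holds each height until the next data point is reached.
    ax.step(x, y, where="post")
    ax.set_xlabel("x")
    ax.set_ylabel("estimated P(X <= x)")
    fig.savefig(filename)
    plt.close(fig)


rng = np.random.default_rng(0)
dataset = rng.normal(size=200)  # stand-in for repeated experimental results
x, y = ecdf_step_data(dataset)
```

Calling `plot_ecdf(dataset)` would then write the figure to `ecdf.png`.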

Finally, notice that in Python you can calculate the $p$th percentile of the data in a NumPy array, `dataset`, using:

import numpy as np

percentile = np.percentile( dataset, p )

By default, this function uses linear interpolation between neighbouring points in the sorted data, as is described in the video below:

https://www.youtube.com/embed/UUbkt9nA3Mc
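To make the interpolation concrete, the sketch below recomputes a percentile by hand and compares it with `np.percentile`. It assumes NumPy's default behaviour, in which the $p$th percentile sits at the fractional position $(p/100)(n-1)$ in the sorted array (counting from zero). The helper `percentile_linear` and the data values are hypothetical and are only there for illustration.

```python
import numpy as np


def percentile_linear(data, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between sorted points."""
    sorted_values = np.sort(np.asarray(data, dtype=float))
    n = sorted_values.size
    # Fractional, zero-based position of the percentile in the sorted array.
    position = (p / 100) * (n - 1)
    lower = int(np.floor(position))
    upper = min(lower + 1, n - 1)
    fraction = position - lower
    # Move the given fraction of the way from the lower to the upper neighbour.
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


dataset = np.array([15.0, 20.0, 35.0, 40.0, 50.0])  # made-up data
by_hand = percentile_linear(dataset, 40)
from_numpy = np.percentile(dataset, 40)
```

For this data the 40th percentile lies 60% of the way from 20 to 35, so both values come out at 29 up to floating-point rounding.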

<aside> 📌 SUMMARY: The cumulative probability distribution function $P(X\le x)$ for a random variable $X$ gives the probability that the random variable is less than or equal to $x$. You can obtain an estimate of this function from a dataset by sorting the dataset into ascending order.

</aside>