Normal distribution with scipy and plotly

Normal Distribution

Application examples: the amount of time users spend on a website, or the heights of all the students in a class.

The mean, median, and mode all fall at the center of an ideal normal distribution.
The total area under the curve is 1.0 (100%).
The normal distribution follows the empirical rule 68-95-99.7:
- a width of 2 sigma (mean ± 1 sigma) covers about 68% of the data
- a width of 4 sigma (mean ± 2 sigma) covers about 95% of the data
- a width of 6 sigma (mean ± 3 sigma) covers about 99.7% of the data
- the two tails hold the remaining 0.3% of the data, divided equally on both sides (about 0.15% each).
The 95th percentile is the value below which 95% of the data lies.
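Other distributions often met alongside the Gaussian are the t-distribution and the exponential distribution; they are separate distributions, not forms of the Gaussian.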
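
The 68-95-99.7 figures and the 95th percentile can be checked numerically. The following is a minimal sketch using scipy's `norm.cdf` and `norm.ppf` on the standard normal distribution; the helper name `coverage` is hypothetical and only used for illustration.

```python
from scipy.stats import norm


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean.

    `coverage` is an illustrative helper, not part of scipy.
    """
    return norm.cdf(k) - norm.cdf(-k)


# Empirical rule: area within ±1, ±2 and ±3 standard deviations
for k in (1, 2, 3):
    print(f"within ±{k} sigma: {coverage(k):.4f}")

# 95th percentile of the standard normal: the z value with 95% of the area to its left
p95 = norm.ppf(0.95)
print(f"95th percentile: {p95:.3f}")
```

The printed coverages should come out close to 0.68, 0.95 and 0.997, and whatever lies outside ±3 sigma is what remains in the two tails.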

Z-score

The z-score measures how far a value is from the mean, in units of standard deviations.
$z = (X - \mu)/\sigma$, where $\mu$ is the mean, $X$ is the data point, and $\sigma$ is the standard deviation.
For example, if the mean is 16, the standard deviation is 1, and the value we are looking for is 17.5, then z = (17.5 - 16) / 1 = 1.5. The z-table entry for 1.5 is 0.9332, so this value lies above about 93.32% of all the values.
A z value of 1.5 means the value is 1.5 standard deviations above the mean, and the area under the curve to the left of a given z is always the same, whatever the mean and standard deviation are.
The z value and the z-table are used to find the percentage of area under the curve, which is also how a p-value is obtained (as a tail area).
A standard z-table always gives the area to the left of the z value.
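
The z-table lookup in the example can be reproduced with `norm.cdf`, which returns the area to the left of a z value under the standard normal curve. This is a small sketch using the example's numbers; the variable names are chosen here for illustration.

```python
from scipy.stats import norm

# Numbers from the worked example: mean 16, standard deviation 1, value 17.5
mean, std, value = 16, 1, 17.5

z = (value - mean) / std     # distance from the mean in standard deviations
area_left = norm.cdf(z)      # area under the standard normal curve to the left of z
area_right = 1 - area_left   # upper-tail area

print(z)                     # 1.5
print(round(area_left, 4))   # 0.9332
print(round(area_right, 4))  # 0.0668
```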

z-statistic

When the population standard deviation is known, we can use the z-score.
When the population standard deviation is unknown and the sample has more than 30 observations, we use the z-statistic (for proportions, only if we can assume the population is normally distributed) with the sample standard deviation in the formula; that can be proven by

Probability Density Function

The probability density function (PDF) describes how much probability is concentrated per unit length (d𝒙) near 𝒙, that is, how dense the probability is near 𝒙. Here 𝒙 is any point on the x-axis and the y value is the density P(𝒙).

<aside> 💡 The height of the probability density curve at 𝒙 is not the probability that X = 𝒙. A probability is the integral of the probability density function over an interval.

</aside>
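
To make the distinction concrete, the sketch below compares the height of the standard normal PDF at a point with the probability of an interval, obtained by integrating the PDF. The interval [-1, 1] and the variable names are assumptions chosen for illustration.

```python
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0  # illustrative interval, not taken from the notes

# Height of the density curve at x = 0: a density, not a probability
density_at_zero = norm.pdf(0.0)

# P(a < X < b) as the area under the PDF between a and b
prob_interval, _ = quad(norm.pdf, a, b)

# The same probability from the cumulative distribution function
prob_from_cdf = norm.cdf(b) - norm.cdf(a)

print(f"density at 0:   {density_at_zero:.4f}")
print(f"P({a} < X < {b}): {prob_interval:.4f}")
print(f"via CDF:        {prob_from_cdf:.4f}")
```

The two probabilities should agree, while the density at a single point is a different kind of quantity: for a small enough sigma it can even exceed 1.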

Plots

Probability Density Function Plot

norm.pdf returns the value of the PDF at each given point, so we can use it to plot the normal distribution.
The following plots the PDF of a normal distribution using scipy, numpy and plotly.
The domain is −10<𝑥<10 and the range is 0<P(𝑥)<0.20. norm defaults to 𝜇=0 and 𝜎=1; the code below keeps 𝜇=0 and sets 𝜎=2 through scale=2.

import numpy as np
import plotly.graph_objects as go
from scipy.stats import norm

colors = ['#151515', '#f0c24f']

# 1000 evenly spaced x values on [-10, 10]
x = np.linspace(-10, 10, 1000)
# PDF of a normal distribution with mu=0 (the default loc) and sigma=2
p = norm.pdf(x, scale=2)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=p, mode='lines+markers', marker_color=colors[1]))
fig.update_layout(
    title="Probability Density Function mu=0, sigma=2",
    legend=dict(x=0.05, y=0.95, traceorder='reversed', font=dict(size=16)),
    width=500,
    height=500,
    yaxis=dict(
        # title.font replaces the deprecated titlefont attribute
        title=dict(text="Probability Density Function P(x)", font=dict(color="#1f77b4")),
        tickfont=dict(color="#1f77b4"),
    ),
)
fig.show()

Probability Density Function with varying standard deviation

What does this really mean?
- It means we take 1000 linearly spaced points between -10 and 10 and plot the PDF with a constant mean of 0 while varying sigma over [2, 4, 6].
- The observation is that the graph gets broader and flatter as sigma increases, which means the probability density is spreading out.
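
The spreading can also be seen in numbers rather than in a plot. The sketch below uses the same grid and sigma values and records, for each sigma, the peak of the PDF and the probability of landing within ±2 of the mean; the dictionary names `peaks` and `mass_near_mean` and the ±2 window are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-10, 10, 1000)  # same grid as described above

peaks = {}           # highest PDF value on the grid for each sigma
mass_near_mean = {}  # P(-2 < X < 2) for each sigma

for sigma in (2, 4, 6):
    p = norm.pdf(x, loc=0, scale=sigma)
    peaks[sigma] = p.max()
    mass_near_mean[sigma] = norm.cdf(2, scale=sigma) - norm.cdf(-2, scale=sigma)
    print(f"sigma={sigma}: peak={peaks[sigma]:.4f}, "
          f"P(-2 < X < 2)={mass_near_mean[sigma]:.4f}")
```

As sigma grows, both the peak and the probability near the mean drop, which is the numerical counterpart of the curve getting broader.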