Let's take the example of height:
We take a bunch of height measurements from people.
We measured so many people that the points overlap each other, and some points are hidden.
To solve this, we divide the range of values into bins.
Then we stack the measurements that fall into the same bin.
The taller the stack within a bin, the more measurements fall into that bin.
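The binning and stacking steps above can be sketched in a few lines of Python. This is only an illustration: the heights are made-up values in centimetres, and the helper name `histogram`, the bin width of 10, and the starting edge of 150 are assumptions chosen for the example.

```python
import math
from collections import Counter


def histogram(values, bin_width, start):
    """Count how many values fall into each equal-width bin.

    Bin i covers the half-open interval
    [start + i * bin_width, start + (i + 1) * bin_width).
    """
    counts = Counter()
    for v in values:
        counts[math.floor((v - start) / bin_width)] += 1
    return counts


# Hypothetical heights in centimetres (made up for illustration).
heights = [150, 152, 158, 161, 163, 164, 166, 167,
           168, 169, 171, 172, 174, 178, 183, 191]

counts = histogram(heights, bin_width=10, start=150)

# Print each bin as a stack of '#' marks: a taller stack means more measurements.
for i in sorted(counts):
    low = 150 + i * 10
    print(f"{low}-{low + 10}: " + "#" * counts[i])
```

Each `#` stands for one measurement, so the printed rows are the stacks turned on their side.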
We can use the histogram to estimate the probability of future measurements.
The shorter the stack, the rarer those measurements are, so future measurements are less likely to fall in that section (the tails) of the histogram.
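One simple way to turn stack heights into probability estimates is to divide each bin's count by the total number of measurements. The sketch below assumes made-up heights and hand-picked bin edges; the function name `bin_probabilities` is hypothetical.

```python
def bin_probabilities(values, edges):
    """Return the fraction of values in each bin [edges[i], edges[i + 1]).

    Assumes every value lies inside the range covered by the edges.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = len(values)
    return [c / total for c in counts]


# Hypothetical heights in centimetres (made up for illustration).
heights = [150, 152, 158, 161, 163, 164, 166, 167,
           168, 169, 171, 172, 174, 178, 183, 191]
edges = [150, 160, 170, 180, 190, 200]

probs = bin_probabilities(heights, edges)
for low, high, p in zip(edges, edges[1:], probs):
    print(f"{low}-{high}: {p:.4f}")
```

The bins in the tails get the smallest fractions, which matches the idea that measurements there are rare.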
We can also use a probability distribution to approximate the histogram. Here, we can assume a normal distribution.
If the data looked like this instead, an exponential distribution would be a better fit.
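As a rough sketch of what assuming a normal distribution can look like in practice, the example below fits one to made-up heights with `statistics.NormalDist` from the Python standard library and compares the model's probability for one range with the fraction observed in the data. The data and the 160-170 range are assumptions for illustration only.

```python
from statistics import NormalDist

# Hypothetical heights in centimetres (made up for illustration).
heights = [150, 152, 158, 161, 163, 164, 166, 167,
           168, 169, 171, 172, 174, 178, 183, 191]

# Estimate the mean and standard deviation from the sample.
fitted = NormalDist.from_samples(heights)

# Probability that a future measurement falls between 160 and 170,
# according to the fitted normal curve.
p_model = fitted.cdf(170) - fitted.cdf(160)

# The same range, estimated directly from the data (the histogram view).
p_hist = sum(160 <= h < 170 for h in heights) / len(heights)

print(f"mean={fitted.mean:.2f}, stdev={fitted.stdev:.2f}")
print(f"model: {p_model:.3f}, histogram: {p_hist:.3f}")
```

With only a handful of points the two estimates will not agree closely; the curve is a smoothed approximation, not a replacement for the data.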
It is tricky to choose the right number of bins. One should try different numbers of bins to get a good picture of the distribution.
With too many bins, the points are spread out horizontally into many short stacks; with too few, they pile up vertically into a few tall stacks.
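The effect of the bin count can be seen by binning the same data with different bin widths. This is a minimal sketch with made-up heights; the helper `stack_heights` and the widths 1, 10, and 50 are assumptions picked to show the two extremes and a middle ground.

```python
import math
from collections import Counter


def stack_heights(values, bin_width, start):
    """Return the stack height of every bin from `start` up to the last non-empty bin."""
    counts = Counter(math.floor((v - start) / bin_width) for v in values)
    return [counts[i] for i in range(max(counts) + 1)]


# Hypothetical heights in centimetres (made up for illustration).
heights = [150, 152, 158, 161, 163, 164, 166, 167,
           168, 169, 171, 172, 174, 178, 183, 191]

for width in (1, 10, 50):
    stacks = stack_heights(heights, width, start=150)
    print(f"width={width}: {len(stacks)} bins, tallest stack={max(stacks)}")
```

Very narrow bins give many stacks that are at most one point tall, while one very wide bin puts every point in a single stack; neither shows the shape of the data.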