Sources
https://www.maartengrootendorst.com/blog/quantization/
https://www.youtube.com/watch?v=mii-xFaPCrA
https://www.youtube.com/watch?v=0VdNflU08yA
Quantization came into the picture because large language models contain billions of parameters (weights). During inference, activations are produced from the inputs and the weights, and these are similarly very large. Storing all of these values at full precision in GPU memory takes a lot of space, so we quantize them: we store them in a more compact representation so they don't take up as much room.
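To get a feel for the numbers, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights at different precisions. The 7B parameter count is an illustrative assumption, not something from the sources:

```python
# Rough memory footprint of the weights of a 7B-parameter model
# (7B is an assumed, illustrative size) at different precisions.
PARAMS = 7_000_000_000

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: {gib:.1f} GiB")
```

Halving the bits per weight halves the memory, which is why dropping from FP32 to FP16, or further to INT8, makes such a difference on a GPU with limited memory.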
Values are represented as “bits” (binary digits), split into three parts - sign, exponent, fraction (mantissa).
The more bits we use to represent a value, the more precise it is.
The more bits we have available, the larger the range of values that can be represented.
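A quick way to see this precision loss is to round a value down from FP32 to FP16 with numpy, a minimal sketch:

```python
import numpy as np

# FP32 has a 23-bit mantissa (~7 decimal digits of precision);
# FP16 has only a 10-bit mantissa (~3 decimal digits).
x = np.float32(3.1415927)
y = np.float16(x)  # rounds to the nearest value FP16 can represent

print(x)  # 3.1415927
print(y)  # 3.140625 - the closest FP16 neighbor of pi
```

The FP16 value is not "wrong", it is simply the nearest point on FP16's coarser grid of representable numbers.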

Dynamic range - the range of values a representation like FP32 or FP16 can represent, from its minimum to its maximum.
Precision - the distance between two neighboring representable values; the smaller that distance, the more precise the format.
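Both properties can be read straight off numpy's float-format metadata, a small sketch comparing FP32 and FP16:

```python
import numpy as np

for dtype in (np.float32, np.float16):
    info = np.finfo(dtype)
    # Dynamic range: smallest normal value up to the largest representable value.
    print(dtype.__name__, "range:", info.tiny, "to", info.max)
    # Precision: the gap between 1.0 and the next representable value (machine epsilon).
    print(dtype.__name__, "gap after 1.0:", info.eps)
```

FP16 tops out at 65504 with a gap of about 0.001 near 1.0, while FP32 reaches roughly 3.4e38 with a gap of about 1.2e-7, which is exactly the range-versus-precision trade-off described above.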