Why would fluctuations around equilibrium decrease exponentially with entropy (which is related somehow to the number of possible states)?

How do I connect the N in the $\sqrt{N}$ standard deviation of an N-step random walk to the sample size, or the number of timesteps, over which the distribution converges to equilibrium?

CLT → equilibrium

large deviations → fluctuation from equilibrium
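As a quick self-check on the $\sqrt{N}$ question (a toy sketch I added; the walk counts and lengths are arbitrary):

```python
import numpy as np

# Simulate many N-step ±1 random walks; the endpoint spread grows like
# sqrt(N). Fluctuations of the *mean* therefore shrink like 1/sqrt(N)
# (the CLT scale), while a fixed O(1) deviation of the mean becomes
# exponentially unlikely in N (the large-deviation scale).
rng = np.random.default_rng(0)
n_walks = 10_000
for N in (100, 400, 1600):
    steps = rng.integers(0, 2, size=(n_walks, N), dtype=np.int8) * 2 - 1
    endpoints = steps.sum(axis=1)
    print(f"N={N:4d}  std = {endpoints.std():7.2f}  sqrt(N) = {np.sqrt(N):7.2f}")
```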

I'm very confused by the account of fluctuations near equilibrium in chapter 12 of Landau's book on statistical physics. To be brief, the kernel of my doubt is that he states that if you have an observable X such that equilibrium is attained at X = 0, then the probability of a fluctuation of size x is given by the exponential of the entropy evaluated at x. I wonder: what is the mathematical definition of this notion of entropy?

$S = \log \Omega$

$p \sim \exp(\Delta S)$
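One way to unpack these two lines (a sketch of the standard Einstein-fluctuation argument, with $S(x) = \log \Omega(x)$ the Boltzmann entropy of the macrostate where the observable takes value $x$, in units where $k_B = 1$): if all microstates are equally likely, then

$$
p(x) \propto \Omega(x) = e^{S(x)}, \qquad \frac{p(x)}{p(0)} = e^{S(x) - S(0)} = e^{\Delta S}.
$$

Since $S$ is maximized at the equilibrium value $x = 0$, a second-order expansion gives $p(x) \propto \exp\!\left(-\tfrac{1}{2}\,|S''(0)|\,x^2\right)$: small fluctuations are Gaussian, and larger ones are exponentially suppressed by the entropy deficit $\Delta S < 0$.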

Large Deviations Overview

Motivation

“Type” is a concept I didn’t fully get in the data compression class. Yet I learned from the information theory textbook that types are widely used for interpreting samples: not only can we look at averages, we can also look at fluctuations and how far off a sample might be from the true distribution. So in this report I will look into the large deviations principle, especially its key result, Sanov’s theorem.
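To make “type” concrete, here is a small numeric check of the method-of-types identity from Cover & Thomas (my own illustration, with a made-up binary source): the probability of an i.i.d. sequence depends only on its type $T$, its empirical distribution, via $P(x^n) = 2^{-n(H(T) + D(T\|P))}$.

```python
import numpy as np
from collections import Counter

# Method-of-types identity: P(x^n) = 2^{-n (H(T) + D(T||P))}, where T is
# the type (empirical distribution) of the sequence x^n.
P = {0: 0.7, 1: 0.3}                    # toy source distribution (assumed)
x = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]      # an example sequence
n = len(x)
T = {s: c / n for s, c in Counter(x).items()}  # the type of x

H_T = -sum(t * np.log2(t) for t in T.values())            # entropy of the type
D_TP = sum(t * np.log2(t / P[s]) for s, t in T.items())   # D(T || P)

direct = np.prod([P[s] for s in x])     # probability computed symbol by symbol
via_type = 2 ** (-n * (H_T + D_TP))     # probability computed from the type
print(direct, via_type)                 # the two agree exactly
```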

Reflecting on the ways I learned best in class, I wanted to start with very introductory materials and do exercises that are contained in, or slightly harder than, those materials. Even concepts as fundamental and widely used as large deviations contain ideas beyond my current understanding. I consulted lectures from Touchette (2021) and Tulsiani (2022) to guide my notes and tried their exercises. For parts I am not solid on, I elaborate with prose and go as far as I can.

Background

When you look at symbols drawn from an alphabet according to a source distribution, not all sampled symbols sit exactly at the most visited symbol set. The central limit theorem describes symbols that are close to the most visited, via the sample mean. The large deviation principle describes symbols that are far from the most visited, i.e. fluctuations. The gist of large deviation principles is very simple: as the system size grows large ($n \to \infty$), the probability of visiting a non-typical state is exponentially small. This exponential form arises for all kinds of distributions, though we mostly look at sequences of IID random variables in discrete systems to start. Sanov's theorem is a key result that bounds the probability of observing a state relative to the typical state. I will elaborate on the terms in the quantitative exercises.
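As a rough numerical illustration (a sketch I added; the threshold $a = 0.7$ and trial counts are arbitrary): for IID fair coin flips, the probability that the sample mean exceeds $a$ decays like $e^{-n I(a)}$ with rate $I(a) = D(a \| 1/2)$, the Bernoulli special case of Sanov's theorem.

```python
import numpy as np

# Estimate P(sample mean of n fair coin flips >= a) by simulation and
# compare the empirical exponent -log(p)/n with the KL-divergence rate
# I(a) = D(a || 1/2). Agreement is up to polynomial prefactors that
# become negligible as n grows.
def rate(a, p=0.5):
    return a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))

rng = np.random.default_rng(0)
a, trials = 0.7, 1_000_000
for n in (20, 50, 100):
    means = rng.binomial(n, 0.5, size=trials) / n
    p_hat = (means >= a).mean()
    print(f"n={n:3d}  -log(p)/n = {-np.log(p_hat)/n:.4f}  I(a) = {rate(a):.4f}")
```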

Cartoon examples.

Applications

Hypothesis test: there are two hypotheses based on different distributions, e.g. $H_0: (X_1,\dots,X_n) \sim_{i.i.d.} P = N(0,1)$ and $H_1: (X_1,\dots,X_n) \sim_{i.i.d.} Q = N(2,1)$. The recurring question is: given a sample, which distribution was it drawn from? Without large deviations, one could enumerate all possible configurations (hard), or use a Gaussian approximation to the sample mean (the central limit theorem, which poorly approximates large fluctuations in the tails).
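To put a number on this (my own sketch, not from the lecture): for the two Gaussians above, the natural midpoint test "decide $H_1$ when the sample mean exceeds 1" has type-I error $P(Z \ge \sqrt{n})$ for $Z \sim N(0,1)$, which decays like $e^{-n/2}$ since $\inf_{x \ge 1} x^2/2 = 1/2$ is the Cramér rate for a standard normal sample mean. The exact tail can be evaluated directly:

```python
import math

# Exact type-I error of the midpoint test: P(mean of n N(0,1)'s >= 1)
# = P(Z >= sqrt(n)) for Z ~ N(0,1), computed via the complementary
# error function.
for n in (5, 20, 80):
    p = 0.5 * math.erfc(math.sqrt(n / 2))
    print(f"n={n:3d}  error = {p:.3e}  -log(error)/n = {-math.log(p)/n:.3f}")
# The empirical exponent -log(p)/n drifts down toward the rate 1/2.
```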

In the scaling limit of large system size, the exponent alone is usually a good enough approximation. With large deviations, we can count configurations, write the probability in exponential form, and estimate the rate in the exponent. Counting informs how hypothesis tests can be designed: if we expect an element to appear with a certain probability, its expected count is the number of samples times that probability, so observing it takes on the order of one over that probability many samples (sketched below). The lecture said that you can also make numerical estimates by generating samples from a simulation, with Monte Carlo methods.
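Here is the counting point as a naive Monte Carlo sketch (my own toy example): the rare event {sample mean of $n$ standard normals $\ge 1$} has probability roughly $e^{-n/2}$, so we need on the order of $e^{n/2}$ simulated samples before we expect even one hit; this is one reason smarter estimators such as importance sampling are used for large $n$.

```python
import numpy as np

# Count naive Monte Carlo hits for the rare event {sample mean >= 1}.
# Hits shrink roughly like trials * exp(-n/2) (up to polynomial
# prefactors); past n ≈ 2*log(trials) ≈ 24 we would expect none at all.
rng = np.random.default_rng(0)
trials = 200_000
for n in (2, 8, 16):
    hits = int((rng.normal(size=(trials, n)).mean(axis=1) >= 1.0).sum())
    print(f"n={n:2d}  hits = {hits:6d} out of {trials}")
```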

Data compression: the minimal probability of error tends to zero as the compression block length grows, but the rate at which the error probability converges can be made faster by assigning shorter codes to more typical sequences and longer codes to rarer sequences. The lecture remarked that the source coding theorem can be proved with Sanov's theorem, but I haven't gone through that yet. And I haven't looked at what the quantum version would look like, except that it probably requires extending to real-valued quantities for states.
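A sketch of the Sanov-flavoured coding argument I expect the proof to use (my own toy version for a Bernoulli($p$) source, not the lecture's): if we only assign codewords to sequences whose empirical frequency of ones is within $\varepsilon$ of $p$, the failure probability is a binomial tail, and Sanov's theorem predicts it decays like $e^{-n \min D(q\|p)}$ over the excluded types $q$.

```python
import math

def D(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in nats."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Failure probability of coding only the sequences whose type is within
# eps of p, computed exactly as a binomial tail, versus the Sanov rate.
p, eps = 0.3, 0.1
for n in (50, 200, 800):
    fail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)
    sanov = min(D(p - eps, p), D(p + eps, p))
    print(f"n={n:3d}  P(fail) = {fail:.2e}  -log/n = {-math.log(fail)/n:.4f}  "
          f"Sanov rate = {sanov:.4f}")
```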

In physical systems, there are quantities you can observe. In a gas these might be temperature or work, which come from random variables integrated over time. Each observed value can arise from several possible configurations of particles. You can calculate distributions whose most probable values are the equilibrium states, and then quantify fluctuations away from those typical values using large deviation principles.
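Tying this back to the entropy question at the top (my own toy example): for $n$ independent spins, the observable magnetization $m$ has $\binom{n}{k}$ configurations with $k = n(1+m)/2$ up spins, so the entropy per spin is $s(m) = H\!\left(\tfrac{1+m}{2}\right)$ and $P(m) \propto e^{n[s(m) - s(0)]}$: fluctuations away from the equilibrium value $m = 0$ are suppressed exponentially by the entropy difference, which is exactly the $p \sim \exp(\Delta S)$ formula from Landau.

```python
import math

def s(m):
    """Entropy per spin (in nats) at magnetization m, for free spins."""
    q = (1 + m) / 2
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

# Compare the exact log-probability of magnetization m (binomial counting,
# via lgamma) with the entropy difference s(m) - s(0) = s(m) - log 2.
# They match up to an O(log(n)/n) prefactor.
n = 1000
for m in (0.0, 0.1, 0.2):
    k = round(n * (1 + m) / 2)
    logP = (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2))
    print(f"m={m:.1f}  log P(m)/n = {logP/n:+.5f}  s(m) - log 2 = {s(m) - math.log(2):+.5f}")
```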

Concept map connecting large deviation theory to QCIT class concepts.