Sanov’s Theorem and the Geometry of Rare Events: A Modern Large‑Deviation Perspective
Large-deviation theory provides a powerful mathematical framework for quantifying the probability of rare events in stochastic systems. Among its foundational results, Sanov’s theorem plays a central role: it characterizes the exponential decay of probabilities associated with atypical empirical distributions. This perspective is deeply connected to information theory, statistical mechanics, and the analysis of stochastic processes.
1. Introduction
Many stochastic systems—ranging from molecular dynamics to neural activity—exhibit fluctuations that are typically small but occasionally produce rare, high-impact deviations. Understanding the probability of these deviations requires tools that go beyond classical variance-based approximations. Sanov’s theorem provides exactly such a tool: a geometric and information-theoretic description of rarity.
This post introduces the theorem, its intuition, and its connection to diffusion processes through discrete approximations and contraction principles. A companion post explores applications in biology and neuroscience.
Notation
For clarity, we summarize the variables and symbols used throughout this post:
- \(X_1, \dots, X_n\): independent and identically distributed (i.i.d.) random variables.
- \(P\): true underlying probability distribution of the samples.
- \(\hat{P}_n\): empirical measure of the sample, \(\hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}\).
- \(Q\): alternative probability distribution used to describe atypical empirical behavior.
- \(D_{\mathrm{KL}}(Q\|P)\): Kullback–Leibler divergence between \(Q\) and \(P\).
- \(\mathbb{P}(\hat{P}_n \approx Q)\): probability that the empirical measure is close to \(Q\).
- Rate function \(I\): the quantity governing the exponential decay rate of rare-event probabilities.
Note on KL divergence
The Kullback–Leibler divergence is defined as
\[ D_{\mathrm{KL}}(Q\|P) = \int \log\!\left(\frac{dQ}{dP}\right)\,dQ, \]
and measures the “cost” of observing empirical behavior consistent with \(Q\) when the true distribution is \(P\); by convention, \(D_{\mathrm{KL}}(Q\|P) = +\infty\) when \(Q\) is not absolutely continuous with respect to \(P\). It plays the same role as an energy barrier in large-deviation theory.
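As a concrete check, here is a minimal Python sketch for the discrete-alphabet case (the helper `kl_divergence` and the biased-die example are our own illustration, not from any particular library), using the conventions \(0\log 0 = 0\) and \(D_{\mathrm{KL}} = +\infty\) when \(Q\) charges a point that \(P\) does not:

```python
import numpy as np

def kl_divergence(q, p):
    """D_KL(Q || P) for distributions on a finite alphabet.

    Conventions: 0 * log(0 / p) = 0, and the divergence is +inf
    when Q puts mass where P does not (Q not absolutely continuous).
    """
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    if np.any((q > 0) & (p == 0)):
        return np.inf
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Example: fair die P versus a die biased toward 6.
p = np.full(6, 1 / 6)
q = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])
print(kl_divergence(q, p))  # ~0.294 nats
```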
2. Sanov’s Theorem: Statement and Intuition
2.1 Setup
Let \(X_1, \dots, X_n\) be i.i.d. random variables taking values in a Polish space \(E\), with common law \(P\). The empirical measure is
\[ \hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}. \]
Sanov’s theorem states that the sequence \(\hat{P}_n\) satisfies a large deviation principle (LDP) on the space of probability measures on \(E\), equipped with the weak topology, at speed \(n\) and with good rate function
\[ I(Q) = D_{\mathrm{KL}}(Q\|P), \]
the Kullback–Leibler divergence between \(Q\) and \(P\). Informally,
\[ \mathbb{P}(\hat{P}_n \approx Q) \approx e^{-n\,D_{\mathrm{KL}}(Q\|P)}. \]
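The exponent can be checked by simulation. The sketch below (our own illustration; \(p\), \(q\), and the sample sizes are arbitrary choices) estimates \(\mathbb{P}(\hat{p}_n \ge q)\) for i.i.d. Bernoulli(\(p\)) samples and compares the empirical decay rate \(-\frac{1}{n}\log\mathbb{P}\) with \(D_{\mathrm{KL}}(\mathrm{Ber}(q)\|\mathrm{Ber}(p))\). Strictly speaking this is Cramér's theorem, which follows from Sanov's theorem by the contraction principle applied to the mean map; the estimates approach the rate as \(n\) grows, up to polynomial prefactors:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q = 0.5, 0.7  # true success probability, atypical target

def kl_bernoulli(q, p):
    """D_KL(Ber(q) || Ber(p))."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

trials = 200_000
for n in (10, 20, 40, 80):
    # Fraction of simulated samples whose empirical mean is at least q.
    counts = rng.binomial(n, p, size=trials)
    prob = np.mean(counts >= n * q)
    rate = -np.log(prob) / n if prob > 0 else np.nan
    print(f"n={n:3d}: empirical rate {rate:.4f}")

print(f"Sanov/Cramér rate: {kl_bernoulli(q, p):.4f}")  # ~0.0823 nats
```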
2.2 Geometric and Informational Interpretation
The theorem reveals that the “cost” of observing an empirical distribution \(Q\) different from the true distribution \(P\) is precisely the relative entropy between them. This connects large deviations to:
- the geometry of probability measures,
- the principle of maximum entropy (made concrete in the I-projection sketch after this list),
- Bayesian updating and variational inference.
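The maximum-entropy connection can be made concrete via the I-projection: minimizing \(D_{\mathrm{KL}}(Q\|P)\) over all \(Q\) satisfying a moment constraint \(\mathbb{E}_Q[f] = c\) yields an exponentially tilted distribution \(Q_\lambda(x) \propto P(x)\,e^{\lambda f(x)}\). A minimal sketch (our own; the fair-die example, the constraint \(c = 4.5\), and the bracketing interval for the root-finder are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import brentq

# I-projection onto {Q : E_Q[f] = c}: the KL minimizer is the
# exponential tilt Q_lambda(x) ∝ P(x) * exp(lambda * f(x)).
p = np.full(6, 1 / 6)             # fair die
f = np.arange(1, 7, dtype=float)  # f(x) = face value
c = 4.5                           # target mean, above E_P[f] = 3.5

def tilted(lam):
    w = p * np.exp(lam * f)
    return w / w.sum()

def moment_gap(lam):
    return tilted(lam) @ f - c

lam = brentq(moment_gap, -10.0, 10.0)  # solve E_{Q_lam}[f] = c
q_star = tilted(lam)
rate = float(np.sum(q_star * np.log(q_star / p)))
print("tilt parameter:", lam)
print("I-projection:", q_star)
print("rate D_KL(Q*||P):", rate)
```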
3. From Discrete Noise to Diffusion Processes
Although Sanov’s theorem applies to i.i.d. samples, its influence extends to continuous-time stochastic processes through discretization and contraction principles.
3.1 Brownian Motion as a Limit of i.i.d. Increments
Consider the SDE
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t. \]
A standard discretization uses independent Gaussian increments \(\Delta W_i\). Sanov’s theorem provides an LDP for the empirical distribution of these increments, and the Euler scheme maps them, through a continuous function, to discrete approximations of SDE trajectories.
By the contraction principle, the LDP for increments induces an LDP for trajectories. In the small-step limit, one recovers the classical Freidlin–Wentzell rate functional:
\[ I(\phi) = \frac{1}{2}\int_0^T \big(\dot{\phi}(t) - b(\phi(t))\big)^{\top} a(\phi(t))^{-1} \big(\dot{\phi}(t) - b(\phi(t))\big)\,dt, \qquad a = \sigma\sigma^{\top}, \]
for absolutely continuous paths \(\phi\), with \(I(\phi) = +\infty\) otherwise.
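The following sketch (our own; the Ornstein–Uhlenbeck drift \(b(x) = -x\), \(\sigma = 1\), and the endpoints are illustrative choices) evaluates the discretized action for two candidate paths from \(0\) to \(a\). For this gradient system the quasipotential equals \(2U(a) = a^2\) with \(U(x) = x^2/2\), and the time-reversed relaxation path \(\phi(t) = a\,e^{t-T}\) approaches that optimal cost, while a straight line is costlier:

```python
import numpy as np

# Discretized Freidlin-Wentzell action for candidate paths of
# dX = b(X) dt + sigma dW, with b(x) = -x and sigma = 1.
def action(phi, dt, sigma=1.0):
    dphi = np.diff(phi) / dt
    drift = -phi[:-1]  # b evaluated along the path
    return 0.5 * np.sum((dphi - drift) ** 2) * dt / sigma**2

T, n, a = 5.0, 5000, 1.0
t = np.linspace(0.0, T, n + 1)
dt = T / n

straight = a * t / T        # straight-line path from 0 to a
uphill = a * np.exp(t - T)  # time-reversed relaxation (starts near 0)

print("straight-line action:", action(straight, dt))  # ~1.43
print("reversed-flow action:", action(uphill, dt))    # ~1.00 = a^2
```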
3.2 Extension to SPDEs
For SPDEs driven by space-time white noise, discretization produces arrays of i.i.d. Gaussian variables. Sanov’s theorem again yields an LDP for their empirical distribution. Finite-difference schemes map these increments to approximate SPDE trajectories (see the sketch after this list), and the contraction principle leads to infinite-dimensional LDPs such as:
- Dawson–Gärtner LDP for measure-valued processes,
- Budhiraja–Dupuis variational representations.
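To make the "arrays of i.i.d. Gaussians" concrete, here is a minimal finite-difference sketch (our own discretization of the one-dimensional stochastic heat equation \(du = \nu\,u_{xx}\,dt + dW\) with Dirichlet boundaries; \(\nu\), the grid sizes, and the stability safety factor are illustrative choices). Each space-time cell receives an independent \(\mathcal{N}(0, dt/dx)\) increment, which is exactly the i.i.d. array to which Sanov's theorem applies:

```python
import numpy as np

rng = np.random.default_rng(2)

nu, L = 1.0, 1.0
nx, nt = 100, 20_000
dx = L / nx
dt = 0.4 * dx**2 / (2 * nu)  # respect the explicit-scheme stability limit

u = np.zeros(nx + 1)  # u(0) = u(L) = 0 (Dirichlet) throughout
for _ in range(nt):
    lap = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    # Space-time white noise: cell averages of dW have variance dt/dx.
    noise = rng.normal(0.0, np.sqrt(dt / dx), size=nx - 1)
    u[1:-1] += nu * lap * dt + noise

print("interior mean and std:", u[1:-1].mean(), u[1:-1].std())
```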
4. Applications of LDPs for Diffusion Processes
- Exit times and metastability (illustrated in the sketch after this list)
- Hitting probabilities
- Occupation measures
- Invariant measures under perturbations
- Entropy production in non-equilibrium systems
- Error bounds for numerical schemes
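As one illustration of the first item, the sketch below (our own; the potential \(U(x) = x^2/2\), the exit interval \((-1, 1)\), and the step sizes are illustrative choices) estimates mean exit times for \(dX = -U'(X)\,dt + \sqrt{\varepsilon}\,dW\) started at the origin. Freidlin–Wentzell theory predicts \(\varepsilon \log \mathbb{E}[\tau] \to 2\,\Delta U = 1\) as \(\varepsilon \to 0\); convergence is slow because of sub-exponential prefactors:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_exit_time(eps, n_paths=400, dt=1e-3, max_steps=2_000_000):
    """Mean exit time from (-1, 1) for dX = -X dt + sqrt(eps) dW, X_0 = 0."""
    x = np.zeros(n_paths)
    tau = np.full(n_paths, np.nan)
    for step in range(1, max_steps + 1):
        alive = np.isnan(tau)
        if not alive.any():
            break
        xa = x[alive]
        xa += -xa * dt + np.sqrt(eps * dt) * rng.normal(size=xa.size)
        x[alive] = xa
        exited = alive.copy()
        exited[alive] = np.abs(xa) >= 1.0
        tau[exited] = step * dt
    return np.nanmean(tau)

for eps in (0.5, 0.33, 0.25):
    t = mean_exit_time(eps)
    print(f"eps={eps}: mean exit time {t:.1f}, eps*log(E[tau]) = {eps * np.log(t):.2f}")
```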
5. Concluding Remarks
Sanov’s theorem provides the microscopic entropy structure underlying many large-deviation principles for diffusion processes. Rare paths arise from rare configurations of noise increments, and their probabilities are governed by relative entropy. The next post explores how this perspective illuminates rare events in biology and neuroscience.