A Guided Path Through the Large Deviations Series

This post serves as a short guide to the four-part series on large deviations and their applications to stochastic processes, biology, and weak-noise dynamical systems. Each article can be read independently, but together they form a coherent narrative that moves from foundational principles to modern applications.

1. Sanov’s Theorem and the Geometry of Rare Events

The series begins with an intuitive introduction to Sanov’s theorem, highlighting how empirical distributions deviate from their expected behavior and how the Kullback–Leibler divergence emerges as the natural rate functional. This post lays the conceptual groundwork for understanding rare events in high-dimensional systems.

2. Sanov’s Theorem in Living Systems

The second article explores how Sanov’s theorem applies to biological and neural systems. Empirical measures, population variability, and rare transitions in gene expression or neural activity are framed through …

Sanov’s Theorem and the Geometry of Rare Events: A Modern Large‑Deviation Perspective

Large-deviation theory provides a powerful mathematical framework for quantifying the probability of rare events in stochastic systems. Among its foundational results, Sanov’s theorem plays a central role: it characterizes the exponential decay of probabilities associated with atypical empirical distributions. This perspective is deeply connected to information theory, statistical mechanics, and the analysis of stochastic processes.

1. Introduction

Many stochastic systems—ranging from molecular dynamics to neural activity—exhibit fluctuations that are typically small but occasionally produce rare, high-impact deviations. Understanding the probability of these deviations requires tools that go beyond classical variance-based approximations. Sanov’s theorem provides exactly such a tool: a geometric and information-theoretic description of rarity.

This post introduces the theorem, its intuition, and its connection to diffusion processes through discrete approximations and contraction principles. A companion post explores applications in biology and neuroscience.

Notation

For clarity, we summarize the variables and symbols used throughout this post:

  • \(X_1, \dots, X_n\): independent and identically distributed (i.i.d.) random variables.
  • \(P\): true underlying probability distribution of the samples.
  • \(\hat{P}_n\): empirical measure of the sample, \(\hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}\).
  • \(Q\): alternative probability distribution used to describe atypical empirical behavior.
  • \(D_{\mathrm{KL}}(Q\|P)\): Kullback–Leibler divergence between \(Q\) and \(P\).
  • \(\mathbb{P}(\hat{P}_n \approx Q)\): probability that the empirical measure is close to \(Q\).
  • Rate functional: the quantity governing the exponential decay of rare events.

Note on KL divergence

The Kullback–Leibler divergence is defined as

\[ D_{\mathrm{KL}}(Q\|P) = \int \log\!\left(\frac{dQ}{dP}\right)\,dQ, \]

and measures the “cost” of observing empirical behavior consistent with \(Q\) when the true distribution is \(P\). In large-deviation theory it plays a role analogous to an energy barrier in statistical mechanics.
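As a quick numerical sanity check, here is a minimal sketch of the discrete KL divergence in plain NumPy (the helper kl_divergence and the example distributions are illustrative choices, not tied to any particular library):

```python
import numpy as np

def kl_divergence(q, p):
    """Discrete Kullback-Leibler divergence D_KL(Q || P).

    q, p: probability vectors on the same finite alphabet.
    Convention: 0 * log(0 / p_i) = 0; if q_i > 0 while p_i = 0, the divergence is infinite.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any((q > 0) & (p == 0)):
        return np.inf
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Example: a fair die (P) versus an empirical tilt toward high faces (Q).
p = np.full(6, 1 / 6)
q = np.array([0.05, 0.05, 0.10, 0.20, 0.25, 0.35])
print(kl_divergence(q, p))   # "cost" per sample, in nats
```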

2. Sanov’s Theorem: Statement and Intuition

2.1 Setup

Let \(X_1, \dots, X_n\) be i.i.d. random variables taking values in a Polish space \(E\), with common law \(P\). The empirical measure is

\[ \hat{P}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}. \]

Sanov’s theorem states that the sequence \(\hat{P}_n\) satisfies a large deviation principle (LDP) with rate function

\[ I(Q) = D_{\mathrm{KL}}(Q\|P), \]

the Kullback–Leibler divergence between \(Q\) and \(P\).
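Informally, the theorem says that \(\mathbb{P}(\hat{P}_n \approx Q) \approx e^{-n\,D_{\mathrm{KL}}(Q\|P)}\), up to sub-exponential corrections. The following Monte Carlo sketch illustrates this for a fair coin whose empirical frequency of heads is atypically large (the sample sizes and the threshold 0.7 are illustrative choices); agreement with the KL rate is only on the exponential scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_bernoulli(q, p):
    """D_KL(Bernoulli(q) || Bernoulli(p)) in nats."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

p_true, q_target = 0.5, 0.7           # true bias vs. atypical empirical frequency
sanov_rate = kl_bernoulli(q_target, p_true)

for n in (20, 50, 100):
    flips = rng.random((200_000, n)) < p_true    # 200,000 samples of size n
    freq = flips.mean(axis=1)                    # empirical frequency of heads
    prob = (freq >= q_target).mean()             # Monte Carlo estimate of the rare event
    # Note: for much larger n the event becomes too rare for naive Monte Carlo.
    print(f"n={n:3d}  P_hat={prob:.2e}  "
          f"empirical rate={-np.log(prob) / n:.4f}  KL rate={sanov_rate:.4f}")
```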

2.2 Geometric and Informational Interpretation

The theorem reveals that the “cost” of observing an empirical distribution \(Q\) different from the true distribution \(P\) is precisely the relative entropy between them. This connects large deviations to:

  • the geometry of probability measures,
  • the principle of maximum entropy,
  • Bayesian updating and variational inference.

3. From Discrete Noise to Diffusion Processes

Although Sanov’s theorem applies to i.i.d. samples, its influence extends to continuous-time stochastic processes through discretization and contraction principles.

3.1 Brownian Motion as a Limit of i.i.d. Increments

Consider the SDE

\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t. \]

A standard discretization uses Gaussian increments \(\Delta W_i\), which are independent. Sanov’s theorem provides an LDP for the empirical distribution of these increments. Through the Euler scheme, these increments map continuously to discrete trajectories of the SDE.

By the contraction principle, the LDP for increments induces an LDP for trajectories. In the small-step limit, one recovers the classical Freidlin–Wentzell rate functional:

\[ I(\phi) = \frac{1}{2}\int_0^T \bigl\|\dot{\phi}(t) - b(\phi(t))\bigr\|^2_{\sigma^{-1}}\,dt, \]

where the norm is weighted by the inverse diffusion matrix, \(\|v\|^2_{\sigma^{-1}} = v^{\top}(\sigma\sigma^{\top})^{-1}v\), evaluated along the path \(\phi\).
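A minimal sketch of this pipeline, assuming a toy one-dimensional double-well drift (the drift b, the step sizes, and the helper functions below are illustrative choices, not part of the derivation above): the Euler scheme maps i.i.d. Gaussian increments to a trajectory, and a discretized Freidlin–Wentzell functional assigns a cost to a smooth candidate path, here a straight ramp between the two wells.

```python
import numpy as np

rng = np.random.default_rng(1)

def b(x):
    """Toy double-well drift b(x) = -V'(x) with V(x) = (x^2 - 1)^2 / 4."""
    return -x * (x**2 - 1)

def euler_maruyama(x0, sigma, dt, n_steps):
    """Map i.i.d. Gaussian increments to a discrete SDE trajectory
    (the continuous map used in the contraction-principle argument)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)   # i.i.d. Gaussian increments
    for k in range(n_steps):
        x[k + 1] = x[k] + b(x[k]) * dt + sigma * dW[k]
    return x

def fw_action(phi, sigma, dt):
    """Discretized Freidlin-Wentzell functional
    I(phi) ~ 0.5 * sum_k dt * ((phi_{k+1} - phi_k)/dt - b(phi_k))^2 / sigma^2,
    meant for smooth candidate paths phi, not for rough sample paths."""
    dphi = np.diff(phi) / dt
    return 0.5 * dt * np.sum((dphi - b(phi[:-1])) ** 2) / sigma**2

T, dt = 2.0, 1e-3
n_steps = int(T / dt)
sigma = 0.2

path = euler_maruyama(x0=-1.0, sigma=sigma, dt=dt, n_steps=n_steps)  # a typical noisy path

# Candidate rare path: a straight ramp from the left well (-1) to the right well (+1).
t = np.linspace(0.0, T, n_steps + 1)
ramp = -1.0 + 2.0 * t / T
# Its action quantifies the exponential cost of such a crossing in the weak-noise regime.
print("FW action of the ramp:", fw_action(ramp, sigma=sigma, dt=dt))
```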

3.2 Extension to SPDEs

For SPDEs driven by space-time white noise, discretization produces arrays of i.i.d. Gaussian variables. Sanov’s theorem again yields an LDP for their empirical distribution. Finite-difference schemes map these increments to approximate SPDE trajectories, and the contraction principle leads to infinite-dimensional LDPs such as:

  • Dawson–Gärtner LDP for measure-valued processes,
  • Budhiraja–Dupuis variational representations.
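The discretization step itself is easy to make concrete. Below is a minimal sketch for an assumed toy model, a one-dimensional stochastic heat equation with additive space-time white noise and Dirichlet boundaries (the grid sizes and noise scaling are illustrative); the point is simply that the driving noise, once discretized, is an array of i.i.d. Gaussians, exactly the setting of Sanov’s theorem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Explicit finite-difference scheme for a toy stochastic heat equation on [0, 1]:
#   du = u_xx dt + sqrt(eps) dW(t, x),   u(t, 0) = u(t, 1) = 0,
# driven by space-time white noise, discretized as an array of i.i.d. Gaussians.
nx, dx = 101, 0.01
dt = 0.4 * dx**2             # explicit scheme: dt <= dx^2 / 2 for stability
eps = 0.05
n_steps = 2000

u = np.zeros(nx)             # initial condition u(0, x) = 0
for _ in range(n_steps):
    lap = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    xi = rng.normal(size=nx)                       # i.i.d. noise array: Sanov's setting
    u = u + lap * dt + np.sqrt(eps * dt / dx) * xi
    u[0] = u[-1] = 0.0                             # Dirichlet boundary conditions

print("sup norm of u at final time:", np.abs(u).max())
```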

4. Applications of LDPs for Diffusion Processes

  • Exit times and metastability (see the sketch after this list)
  • Hitting probabilities
  • Occupation measures
  • Invariant measures under perturbations
  • Entropy production in non-equilibrium systems
  • Error bounds for numerical schemes
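As a concrete illustration of the first item, exit times and metastability, here is a small Monte Carlo sketch under assumed toy choices: a one-dimensional gradient double-well with noise strength \(\sqrt{2\varepsilon}\); the helper mean_exit_time and all numerical parameters are illustrative. Mean exit times grow exponentially in \(1/\varepsilon\), with the rate set by the barrier height of the potential.

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_exit_time(eps, dt=2e-3, n_runs=50):
    """Monte Carlo mean exit time from the left well of V(x) = (x^2 - 1)^2 / 4
    for dX = -V'(X) dt + sqrt(2 eps) dW, started at x = -1, exit when X >= 0."""
    times = []
    for _ in range(n_runs):
        x, t = -1.0, 0.0
        while x < 0.0:
            x += (x - x**3) * dt + np.sqrt(2.0 * eps * dt) * rng.standard_normal()
            t += dt
        times.append(t)
    return float(np.mean(times))

# Freidlin-Wentzell / Arrhenius prediction: log E[tau] ~ DeltaV / eps with DeltaV = 0.25,
# so eps * log(tau) should approach 0.25 slowly (sub-exponential prefactors are ignored).
for eps in (0.15, 0.10, 0.08):
    tau = mean_exit_time(eps)
    print(f"eps={eps:.2f}  mean exit time={tau:7.1f}  eps*log(tau)={eps * np.log(tau):.3f}")
```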

5. Concluding Remarks

Sanov’s theorem provides the microscopic entropy structure underlying many large-deviation principles for diffusion processes. Rare paths arise from rare configurations of noise increments, and their probabilities are governed by relative entropy. The next post explores how this perspective illuminates rare events in biology and neuroscience.


