Concept index

Hanzhe Li, Jinghan Liu, Jieyang Hu

Contents

This page collects definitions and tools that are used repeatedly in the main text but are not restated every time. Each entry keeps only the most common checks and formulas, so it can be used as a quick reference.

Basic modeling and distribution functions

Definition: probability space

A probability space is a triple (Ω,F,P)(\Omega,\mathcal F,\mathbb P). Here Ω\Omega is the sample space, F\mathcal F is the event space, and P:F[0,1]\mathbb P:\mathcal F\to[0,1] satisfies P(Ω)=1\mathbb P(\Omega)=1 and countable additivity. In problems, first identify what the outcomes are, what the events are, and how probabilities are assigned.

Definition: sigma-algebra

A collection F\mathcal F is a σ\sigma-algebra on Ω\Omega if ΩF\Omega\in\mathcal F, and if it is closed under complements and countable unions. By De Morgan's laws, it is also closed under countable intersections. It tells us which sets are allowed to have probabilities.

Tool: constructing a probability space from a random experiment

For a finite or countable model, write it in three steps:

  • Sample space Ω\Omega: list all possible outcomes.
  • Probability P\mathbb P: state whether outcomes are equally likely, or give the weights.
  • Random variable XX: map each outcome to a number.

This avoids mixing up the outcome of the random experiment with the value of a random variable.

Definition: random variable

A random variable is a measurable function X:ΩRX:\Omega\to\mathbb R from the sample space to the real line. Many random variables can be defined on the same probability space. In many problems, the clean order is to write Ω\Omega and P\mathbb P first, then define X(ω)X(\omega).

Definition: independence

A family of events {Ai:iI}\{A_i:i\in I\} is mutually independent if for every finite set of distinct indices i1,,iki_1,\dots,i_k,

P(Ai1Aik)=j=1kP(Aij).\mathbb P(A_{i_1}\cap\cdots\cap A_{i_k}) =\prod_{j=1}^k \mathbb P(A_{i_j}).

Pairwise independence only checks the case k=2k=2, and is strictly weaker than mutual independence.

Definition: distribution function

The distribution function of a random variable XX is F(x)=P(Xx)F(x)=\mathbb P(X\le x). It is nondecreasing, right-continuous, and satisfies

limxF(x)=0,limx+F(x)=1.\lim_{x\to-\infty}F(x)=0,\qquad \lim_{x\to+\infty}F(x)=1.

Point masses are jumps:

P(X=x)=F(x)F(x).\mathbb P(X=x)=F(x)-F(x-).
Tool: checking whether a function is a distribution function

To check that a function FF is a distribution function, usually verify:

  • FF is nondecreasing.
  • FF is right-continuous.
  • limxF(x)=0\lim_{x\to-\infty}F(x)=0.
  • limx+F(x)=1\lim_{x\to+\infty}F(x)=1.

If F,GF,G are distribution functions and 0λ10\leq\lambda\leq 1, then

λF+(1λ)G\lambda F+(1-\lambda)G

is also a distribution function.

Tool: constructing a random variable from a distribution function

If UU[0,1]U\sim U[0,1], the inverse transform construction

X=F1(U),F1(u)=inf{x:F(x)u}X=F^{-1}(U),\qquad F^{-1}(u)=\inf\{x:F(x)\ge u\}

gives a random variable with distribution function FF.

Conditional expectation, indicators, and second moments

Tool: tail-sum formula

If XX is a nonnegative integer-valued random variable, then

EX=n=0P(X>n).\mathbb E X=\sum_{n=0}^{\infty}\mathbb P(X>n).

If X0X\geq 0 is a general nonnegative random variable, then

EX=0P(X>t)dt\mathbb E X=\int_0^\infty \mathbb P(X>t)\,dt

in the extended sense, allowing the value ++\infty.

Tool: conditioning

For a mixture distribution or a multi-stage experiment, first choose a variable YY that simplifies the structure. Then use

P(A)=yP(AY=y)P(Y=y),EX=E[E(XY)].\mathbb P(A)=\sum_y \mathbb P(A\mid Y=y)\mathbb P(Y=y), \qquad \mathbb E X=\mathbb E[\mathbb E(X\mid Y)].

In the continuous case, replace sums by integrals.

Definition: conditional expectation
E[XF]\mathbb E[X\mid\mathcal F]

is the average prediction of XX after the information F\mathcal F is given. In the discrete case, one can think of F\mathcal F as dividing the sample space into conditional blocks; the conditional expectation is the average on each block. A common formula is the tower property

EX=E[E(XY)].\mathbb E X=\mathbb E[\mathbb E(X\mid Y)].
Tool: indicator decomposition

Counting random variables are often written as

N=iIi.N=\sum_i I_i.

Then

EN=iEIi,\mathbb E N=\sum_i\mathbb E I_i,

and the variance can be computed by

Var(N)=iVar(Ii)+2i<jCov(Ii,Ij).\operatorname{Var}(N)=\sum_i\operatorname{Var}(I_i) +2\sum_{i<j}\operatorname{Cov}(I_i,I_j).

This is useful for counting adjacency relations, local structures, and numbers of appearances.

Tool: linearity of covariance

Covariance is linear in each argument. For example,

Cov(aX+bY,Z)=aCov(X,Z)+bCov(Y,Z).\operatorname{Cov}(aX+bY,Z) =a\operatorname{Cov}(X,Z)+b\operatorname{Cov}(Y,Z).

If X,YX,Y are independent and have finite second moments, then

Cov(X,Y)=0.\operatorname{Cov}(X,Y)=0.

For sample means, centered variables, and projection residuals, covariance linearity often gives the answer in one line.

Tool: higher even-moment method and Markov's inequality

The 2m2m-th moment method rewrites a tail event in terms of a higher even power. If m1m\geq 1 and EX2m<\mathbb E|X|^{2m}<\infty, Markov's inequality gives

P(Xa)=P(X2ma2m)EX2ma2m.\mathbb P(|X|\ge a) =\mathbb P(|X|^{2m}\ge a^{2m}) \le \frac{\mathbb E|X|^{2m}}{a^{2m}}.

When m=1m=1 and EX=0\mathbb E X=0, this becomes Chebyshev's inequality:

P(Xa)Var(X)a2.\mathbb P(|X|\ge a)\le \frac{\operatorname{Var}(X)}{a^2}.

This is often used to prove convergence in probability. If

EXnc2m0,\mathbb E|X_n-c|^{2m}\to 0,

then

XnPc.X_n\xrightarrow{P}c.

The usual move is to write the target difference as XncX_n-c and bound an even moment. If the second moment is too weak, try the fourth, sixth, or another higher even moment.

Characteristic functions and independence

Definition: characteristic function

The characteristic function of a random variable XX is

φX(t)=EeitX.\varphi_X(t)=\mathbb E e^{itX}.

It always exists, and φX(0)=1\varphi_X(0)=1. A distribution is uniquely determined by its characteristic function, so this tool is well suited to independent sums and limiting distributions.

Tool: characteristic function of an independent sum

If X,YX,Y are independent, then

φX+Y(t)=φX(t)φY(t).\varphi_{X+Y}(t)=\varphi_X(t)\varphi_Y(t).

More generally, a sum of independent random variables corresponds to a product of characteristic functions. For limits of independent sums, first write the characteristic function of each term, then study the product.

Tool: testing independence by the joint characteristic function

The joint characteristic function is

φX,Y(s,t)=Eei(sX+tY).\varphi_{X,Y}(s,t)=\mathbb E e^{i(sX+tY)}.

If

φX,Y(s,t)=φX(s)φY(t)for all s,t,\varphi_{X,Y}(s,t)=\varphi_X(s)\varphi_Y(t) \quad\text{for all }s,t,

then XX and YY are independent. Knowing only

φX+Y(t)=φX(t)φY(t)\varphi_{X+Y}(t)=\varphi_X(t)\varphi_Y(t)

does not usually imply independence of X,YX,Y, because it only checks the diagonal of the joint characteristic function.

Tool: convergence of characteristic functions

If

φXn(t)φ(t),\varphi_{X_n}(t)\to \varphi(t),

and φ\varphi is the characteristic function of a random variable XX and is continuous at 00, then

XndX.X_n\xrightarrow{d}X.

In particular, if the limit is

et2/2,e^{-t^2/2},

then the limiting distribution is N(0,1)N(0,1).

Convergence in distribution and test functions

Definition: convergence in distribution
XndXX_n\xrightarrow{d}X

is equivalent to

Fn(x)F(x)F_n(x)\to F(x)

at every continuity point xx of the distribution function of XX. It is also equivalent to

Eh(Xn)Eh(X)\mathbb E h(X_n)\to \mathbb E h(X)

for every bounded continuous function hh. When using distribution functions, take limits directly only at continuity points.

Tool: Skorohod representation

If XnXX_n\Rightarrow X, then under suitable conditions one can construct copies with the same distributions,

X~n=dXn,X~=dX,\widetilde X_n\stackrel d=X_n,\qquad \widetilde X\stackrel d=X,

such that

X~nX~a.s.\widetilde X_n\to\widetilde X\quad a.s.

This can turn a weak convergence problem into an almost sure convergence problem. But it is a theorem; it does not mean the original XnX_n converges almost surely.

Tool: independence is preserved under almost sure limits

If XnXX_n\to X a.s., YnYY_n\to Y a.s., and Xn,YnX_n,Y_n are independent for each nn, then X,YX,Y are independent. One proof uses bounded continuous test functions. For any bounded continuous f,gf,g,

Ef(Xn)g(Yn)=Ef(Xn)Eg(Yn),\mathbb E f(X_n)g(Y_n) =\mathbb E f(X_n)\mathbb E g(Y_n),

and then the dominated convergence theorem passes to the limit.

Limit theorem toolbox

Tool: law of large numbers

If XiX_i are i.i.d. and EX1<\mathbb E|X_1|<\infty, then

1ni=1nXiPEX1.\frac1n\sum_{i=1}^n X_i\xrightarrow{P}\mathbb E X_1.

Use this to replace a sample average by the theoretical mean. Before applying it, check independence, identical distribution, and the first moment condition.

Tool: central limit theorem

If XiX_i are i.i.d., EXi=0\mathbb E X_i=0, and Var(Xi)=1\operatorname{Var}(X_i)=1, then

1ni=1nXidN(0,1).\frac1{\sqrt n}\sum_{i=1}^n X_i\xrightarrow{d}N(0,1).

In the general case, center first and divide by the standard deviation. Before applying it, check the mean, variance, and i.i.d. assumptions.

Tool: Slutsky's theorem

If

XndX,YnPc,X_n\xrightarrow{d}X,\qquad Y_n\xrightarrow{P}c,

then

XnYndcX,Xn+YndX+c.X_nY_n\xrightarrow{d}cX,\qquad X_n+Y_n\xrightarrow{d}X+c.

In particular, if the denominator converges in probability to 11, then

XnYndX.\frac{X_n}{Y_n}\xrightarrow{d}X.

This is commonly used for random normalizations and negligible error terms.

Tool: method of moments

If all moments converge to the moments of a distribution that is uniquely determined by its moments, then convergence in distribution follows. For a standard normal variable NN, odd moments are 00, and

E[N2k]=(2k1)!!.\mathbb E[N^{2k}]=(2k-1)!!.

When using this method, say why the target distribution is determined by its moments. It is not enough to write only "the moments converge."

Triangular arrays

Definition: normalized sum in a triangular array

For sums whose entries change with each row,

k=1nYn,k,\sum_{k=1}^n Y_{n,k},

we often write

Bn2=k=1nVar(Yn,k).B_n^2=\sum_{k=1}^n\operatorname{Var}(Y_{n,k}).

The normalized object is

1Bnk=1n(Yn,kEYn,k).\frac1{B_n}\sum_{k=1}^n (Y_{n,k}-\mathbb E Y_{n,k}).

First compute Bn2B_n^2, then check the relevant central limit theorem condition.

Tool: Lindeberg condition

For every ε>0\varepsilon>0, if

1Bn2k=1nE[Yn,k21{Yn,k>εBn}]0,\frac1{B_n^2}\sum_{k=1}^n \mathbb E\left[ Y_{n,k}^2\mathbf 1_{\{|Y_{n,k}|>\varepsilon B_n\}} \right]\to0,

then, under the usual surrounding assumptions, a central limit theorem holds. The steps are:

  • First compute Bn2B_n^2.
  • Then write the Lindeberg term.
  • Control it using tail integrability or a stronger moment condition.
Tool: third-moment criterion for Lindeberg

If

1Bn3k=1nEYn,k30,\frac1{B_n^3}\sum_{k=1}^n \mathbb E|Y_{n,k}|^3\to0,

then the Lindeberg condition holds. Indeed, on Yn,k>εBn|Y_{n,k}|>\varepsilon B_n,

Yn,k2Yn,k3εBn.Y_{n,k}^2\le \frac{|Y_{n,k}|^3}{\varepsilon B_n}.

This is a common quick check in textbooks.

Advanced tools: tail bounds and concentration

Tool: Paley-Zygmund and the second moment lower bound

If X0X\geq 0 and 0<θ<10<\theta<1, then

P(XθEX)(1θ)2(EX)2E[X2].\mathbb P(X\ge \theta \mathbb E X) \ge (1-\theta)^2\frac{(\mathbb E X)^2}{\mathbb E[X^2]}.

Letting θ0\theta\downarrow0 gives the second moment method:

P(X>0)(EX)2E[X2].\mathbb P(X>0)\ge \frac{(\mathbb E X)^2}{\mathbb E[X^2]}.

This is useful when you want to prove that some structure appears at least once. Usually XX is the number of appearances; compute EX\mathbb E X and control EX2\mathbb E X^2.

Tool: event version of the second moment method

Let

Bn=i=1mnAn,i,μn=i=1mnP(An,i).B_n=\bigcup_{i=1}^{m_n} A_{n,i},\qquad \mu_n=\sum_{i=1}^{m_n}\mathbb P(A_{n,i}).

If only the dependent pairs are included in

γn=ijP(An,iAn,j),\gamma_n=\sum_{i\sim j}\mathbb P(A_{n,i}\cap A_{n,j}),

then in many counting problems, μn\mu_n\to\infty and γn=o(μn2)\gamma_n=o(\mu_n^2) imply P(Bn)1\mathbb P(B_n)\to1. This is a common template in random graphs and random structures.

Tool: Chernoff-Cramer bound

If the moment generating function

MX(s)=EesXM_X(s)=\mathbb E e^{sX}

is finite in the relevant range, write

ΨX(s)=logMX(s).\Psi_X(s)=\log M_X(s).

For s>0s>0, exponential Markov gives

P(Xβ)exp{sβ+ΨX(s)}.\mathbb P(X\ge \beta) \le \exp\{-s\beta+\Psi_X(s)\}.

Thus one usually writes

P(Xβ)infs>0exp{sβ+ΨX(s)}.\mathbb P(X\ge \beta) \le \inf_{s>0}\exp\{-s\beta+\Psi_X(s)\}.

This is the starting point of many exponential tail bounds: write the moment generating function, then optimize over ss.

Definition: sub-Gaussian random variable

Let μ=EX\mu=\mathbb E X. If there is a ν>0\nu>0 such that for all sRs\in\mathbb R,

ΨXμ(s)νs22,\Psi_{X-\mu}(s)\le \frac{\nu s^2}{2},

then XX is called sub-Gaussian with parameter ν\nu. A typical tail bound is

P(Xμβ)2exp(β22ν).\mathbb P(|X-\mu|\ge \beta) \le 2\exp\left(-\frac{\beta^2}{2\nu}\right).

Bounded variables, normal variables, and many independent sums have this square-exponential tail behavior.

Tool: Hoeffding-type bound for weighted sums

Suppose XiX_i are independent and XisG(νi)X_i\in \mathrm{s}\mathcal G(\nu_i). Let

S=i=1nwiXi,V=i=1nwi2νi.S=\sum_{i=1}^n w_iX_i,\qquad V=\sum_{i=1}^n w_i^2\nu_i.

Then SS is again sub-Gaussian-type, and

P(SESβ)2exp(β22V).\mathbb P(|S-\mathbb ES|\ge \beta) \le 2\exp\left(-\frac{\beta^2}{2V}\right).

Use this for independent weighted sums, random signs, and deviations of empirical averages. The main step is computing the variance proxy VV correctly.

Definition: sub-exponential random variable

Let μ=EX\mu=\mathbb E X. If there are ν,α>0\nu,\alpha>0 such that for s<1/α|s|<1/\alpha,

ΨXμ(s)νs22,\Psi_{X-\mu}(s)\le \frac{\nu s^2}{2},

then XX is called sub-exponential with parameters (ν,α)(\nu,\alpha). Its one-sided tail bound is

P(Xμβ){exp(β22ν),0<βν/α,exp(β2α),β>ν/α.\mathbb P(X-\mu\ge \beta)\le \begin{cases} \exp\left(-\dfrac{\beta^2}{2\nu}\right),&0<\beta\le \nu/\alpha,\\ \exp\left(-\dfrac{\beta}{2\alpha}\right),&\beta>\nu/\alpha. \end{cases}

Small deviations look sub-Gaussian; large deviations become exponential.

Tool: Bernstein-type bound for bounded variables

Let X1,,XnX_1,\dots,X_n be independent, with μi=EXi\mu_i=\mathbb E X_i, Var(Xi)=σi2\operatorname{Var}(X_i)=\sigma_i^2, and

Xiμic.|X_i-\mu_i|\le c.

For Sn=iXiS_n=\sum_iX_i and V=iσi2V=\sum_i\sigma_i^2, a common one-sided Bernstein-type bound is

P(SnESnβ){exp(β24V),0<βV/c,exp(β4c),β>V/c.\mathbb P(S_n-\mathbb ES_n\ge \beta)\le \begin{cases} \exp\left(-\dfrac{\beta^2}{4V}\right),&0<\beta\le V/c,\\ \exp\left(-\dfrac{\beta}{4c}\right),&\beta>V/c. \end{cases}

For a two-sided bound, apply the same inequality to Xi-X_i as well. This is often much sharper than Chebyshev for sums of independent bounded variables.

Tool: Borel-Cantelli template

If

n=1P(An)<,\sum_{n=1}^{\infty}\mathbb P(A_n)<\infty,

then

P(An i.o.)=0.\mathbb P(A_n\ \text{i.o.})=0.

This is often used to prove that a bad event happens only finitely many times, which then gives an eventual almost sure bound. If the events AnA_n are independent and nP(An)=\sum_n\mathbb P(A_n)=\infty, the second Borel-Cantelli lemma gives P(An i.o.)=1\mathbb P(A_n\ \text{i.o.})=1.

Quick reference

  • To prove convergence in probability: first try a higher even-moment method or Chebyshev.
  • To prove convergence in distribution to a normal law: first try CLT plus Slutsky.
  • For triangular arrays: check Lindeberg or the third-moment criterion.
  • For independent sums: consider characteristic functions.
  • To prove a nonnegative count is positive: try Paley-Zygmund or a second moment lower bound.
  • For exponential tail bounds: write the moment generating function and try Chernoff-Cramer.
  • For independent weighted sums: check whether a Hoeffding-type bound applies.
  • For sums of independent bounded variables: consider a Bernstein-type bound.
  • For maximum probabilities: first try a union bound, then combine it with Chernoff-Cramer, Hoeffding, or Bernstein.
  • For eventual almost sure statements: consider Borel-Cantelli.
  • For counting problems: write the count as a sum of indicator variables.
  • For limits of distribution functions: take limits directly only at continuity points.
  • For limits of expectations when you only have convergence in distribution: consider Skorohod representation or uniform integrability.
Reading warning

In probability, many uses of "obvious" rely on countable additivity, monotone convergence, independence, moment conditions, or the assumptions of a limit theorem. When reading a proof, it is better to mark these conditions step by step.