Fifth recitation

Jieyang Hu

Reading guide
  • This chapter moves into convergence theory, the strong law of large numbers, characteristic functions, the central limit theorem, and Stein's method.
  • The four modes of convergence give the basic map: a.s., $L^p$, $P$, and $D$.
  • In each proof, keep track of where independence, moment assumptions, truncation, or Borel-Cantelli is used.

Tip. Whenever a limiting distribution appears, first ask what kind of limit it is: in probability, in distribution, or almost surely.

Exercise 4.2

Note

Keep the four modes of convergence separate: a.s., $L^p$, $P$, and $D$. When you see an arrow, first identify which mode it means.

Problem: 4.2.1

Prove the following two inequalities.

(1) (Lyapunov inequality) If $0<r<s$, then

$$\bigl(\mathbb{E}[|X|^r]\bigr)^{1/r} \leq \bigl(\mathbb{E}[|X|^s]\bigr)^{1/s}.$$

(2) ($C_r$ inequality) If $r>0$, then

$$\mathbb{E}[|X+Y|^r] \leq C_r\bigl(\mathbb{E}[|X|^r]+\mathbb{E}[|Y|^r]\bigr),$$

where

$$C_r= \begin{cases} 1, & 0<r<1,\\ 2^{r-1}, & r\geq 1. \end{cases}$$
Proof

(1) Let $\alpha=r/s\in(0,1)$. Since $x\mapsto x^\alpha$ is concave on $[0,\infty)$, Jensen's inequality gives

$$\mathbb{E}[|X|^r] =\mathbb{E}\bigl[(|X|^s)^\alpha\bigr] \leq \bigl(\mathbb{E}[|X|^s]\bigr)^\alpha.$$

Raising both sides to the power $1/r$ and using $\alpha/r=1/s$ gives

$$\bigl(\mathbb{E}[|X|^r]\bigr)^{1/r} \leq \bigl(\mathbb{E}[|X|^s]\bigr)^{1/s}.$$

(2) If $0<r<1$, then for all $a,b\geq 0$,

$$(a+b)^r\leq a^r+b^r.$$

Indeed, this is trivial if $a+b=0$; otherwise $u^r\geq u$ for $u\in[0,1]$, so $\bigl(\tfrac{a}{a+b}\bigr)^r+\bigl(\tfrac{b}{a+b}\bigr)^r\geq \tfrac{a}{a+b}+\tfrac{b}{a+b}=1$. Therefore

$$|X+Y|^r\leq (|X|+|Y|)^r\leq |X|^r+|Y|^r.$$

Taking expectations gives

$$\mathbb{E}[|X+Y|^r]\leq \mathbb{E}[|X|^r]+\mathbb{E}[|Y|^r].$$

If $r\geq 1$, then $x\mapsto x^r$ is convex on $[0,\infty)$, so

$$(a+b)^r =2^r\left(\frac{a+b}{2}\right)^r \leq 2^{r-1}(a^r+b^r).$$

Hence

$$|X+Y|^r\leq 2^{r-1}\bigl(|X|^r+|Y|^r\bigr),$$

and the result follows after taking expectations.
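
Both inequalities can be spot-checked numerically. The Python sketch below is only a sanity check, not part of the proof: the discrete distribution (`values`, `probs`) and the joint law `joint` are hypothetical examples, and the helper names are illustrative. It checks that $r\mapsto(\mathbb{E}[|X|^r])^{1/r}$ is nondecreasing and that the $C_r$ bound holds for several exponents on both sides of $r=1$.

```python
# Numerical sanity check (not a proof) of the Lyapunov and C_r inequalities
# for finite discrete distributions chosen purely for illustration.

def lp_norm(values, probs, r):
    """Return (E[|X|^r])^(1/r) for a discrete X taking `values` with `probs`."""
    return sum(p * abs(v) ** r for v, p in zip(values, probs)) ** (1.0 / r)

def c_r(r):
    """Constant in the C_r inequality."""
    return 1.0 if r < 1 else 2.0 ** (r - 1)

# Hypothetical law of X.
values = [-3.0, 0.5, 2.0, 7.0]
probs = [0.1, 0.4, 0.3, 0.2]

# Lyapunov: the map r -> ||X||_r should be nondecreasing.
rs = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
norms = [lp_norm(values, probs, r) for r in rs]
lyapunov_ok = all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))

# Hypothetical joint law of (X, Y) as (x, y, probability) triples.
joint = [(-1.0, 2.0, 0.3), (0.5, -0.5, 0.2), (3.0, 1.0, 0.5)]

def cr_holds(r):
    """Check E|X+Y|^r <= C_r (E|X|^r + E|Y|^r) under `joint`."""
    lhs = sum(p * abs(x + y) ** r for x, y, p in joint)
    rhs = c_r(r) * (sum(p * abs(x) ** r for x, _, p in joint)
                    + sum(p * abs(y) ** r for _, y, p in joint))
    return lhs <= rhs + 1e-12

cr_ok = all(cr_holds(r) for r in rs)
print(lyapunov_ok, cr_ok)
```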

Problem: 4.2.2

Let $\{X_n\}$ be a sequence of random variables, and let $\{c_n\}$ be a real sequence with $c_n\to c$. Prove that, for each of the four modes of convergence (a.s., $L^p$, in probability, and in distribution),

$$X_n\to X \Longrightarrow c_nX_n\to cX.$$
Proof

If $X_n\xrightarrow{\text{a.s.}}X$, then for almost every $\omega$,

$$c_nX_n(\omega)\to cX(\omega).$$

Thus $c_nX_n\xrightarrow{\text{a.s.}}cX$.

If $X_n\xrightarrow{L^p}X$, then $X\in L^p$, and $\{c_n\}$ is bounded. Writing $c_nX_n-cX=c_n(X_n-X)+(c_n-c)X$ and applying the $C_r$ inequality gives, for a constant $C_p$ depending only on $p$,

$$|c_nX_n-cX|^p \leq C_p\bigl(|c_n|^p|X_n-X|^p+|c_n-c|^p|X|^p\bigr).$$

Taking expectations and using $\sup_n|c_n|<\infty$, $\mathbb{E}[|X_n-X|^p]\to 0$, and $\mathbb{E}[|X|^p]<\infty$ gives

$$\mathbb{E}[|c_nX_n-cX|^p]\to 0,$$

so $c_nX_n\xrightarrow{L^p}cX$.

If $X_n\xrightarrow{P}X$, then

$$c_nX_n-cX=c_n(X_n-X)+(c_n-c)X.$$

Choose $C>0$ with $|c_n|\leq C$ for all $n$. Then $\mathbb{P}(|c_n(X_n-X)|>\varepsilon)\leq \mathbb{P}(|X_n-X|>\varepsilon/C)\to 0$, so the first term converges to $0$ in probability. The second term converges to $0$ a.s. because $X$ is finite, hence also in probability. Therefore

$$c_nX_n\xrightarrow{P}cX.$$

If $X_n\xrightarrow{D}X$, regard $c_n$ as a constant random variable. Then $c_n\xrightarrow{P}c$, and Slutsky's theorem gives

$$c_nX_n\xrightarrow{D}cX.$$
Problem: 4.2.3

Prove that, as $n\to\infty$,

$$X_n\xrightarrow{P}0 \quad\Longleftrightarrow\quad \mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right]\to 0.$$
Proof

Suppose $X_n\xrightarrow{P}0$. For every $\varepsilon>0$, since $\frac{x}{1+x}\leq \min\{x,1\}$ for $x\geq 0$,

$$\begin{aligned} \mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right] &= \mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}; |X_n|<\varepsilon\right] + \mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}; |X_n|\geq \varepsilon\right] \\ &\leq \varepsilon+\mathbb{P}(|X_n|\geq \varepsilon). \end{aligned}$$

Letting $n\to\infty$ gives

$$\limsup_{n\to\infty}\mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right]\leq \varepsilon.$$

Then let $\varepsilon\downarrow 0$. Hence

$$\mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right]\to 0.$$

Conversely, suppose

$$\mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right]\to 0.$$

Since $x\mapsto \frac{x}{1+x}$ is increasing on $[0,\infty)$, for every $\varepsilon>0$,

$$\mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}\right] \geq \mathbb{E}\!\left[\frac{|X_n|}{1+|X_n|}; |X_n|\geq \varepsilon\right] \geq \frac{\varepsilon}{1+\varepsilon}\mathbb{P}(|X_n|\geq \varepsilon).$$

Thus $\mathbb{P}(|X_n|\geq \varepsilon)\to 0$, so $X_n\xrightarrow{P}0$.

Problem: 4.2.4

Let random variable sequences $\{X_n\}$ and $\{Y_n\}$ satisfy $X_n\xrightarrow{D}X$ and $Y_n\xrightarrow{P}c$, where $X$ is a random variable and $c$ is a constant. Prove:

(1) $X_n+Y_n\xrightarrow{D}X+c$.

(2) $X_nY_n\xrightarrow{D}cX$, and if $c\neq 0$, then

$$\frac{X_n}{Y_n}\xrightarrow{D}\frac{X}{c}.$$
Proof

The first statement and the first part of the second statement are special cases of the full form of Slutsky's theorem below: for the sum, take the multiplying sequence identically equal to $1$ and the additive sequence equal to $Y_n$; for the product, take the multiplying sequence equal to $Y_n$ and the additive sequence identically equal to $0$. Hence

$$X_n+Y_n\xrightarrow{D}X+c,\qquad X_nY_n\xrightarrow{D}cX.$$

If $c\neq 0$, the function $x\mapsto 1/x$ is continuous at $c$, so the continuous mapping theorem gives

$$\frac{1}{Y_n}\xrightarrow{P}\frac{1}{c}.$$

Applying Slutsky's theorem to $\{X_n\}$ and $\{1/Y_n\}$ gives

$$\frac{X_n}{Y_n}=X_n\cdot\frac{1}{Y_n}\xrightarrow{D}\frac{X}{c}.$$

The full form of Slutsky's theorem is as follows. Suppose

$$X_n\xrightarrow{D}X,\qquad Y_n\xrightarrow{P}b,\qquad Z_n\xrightarrow{P}c,$$

where $X$ is a random variable and $b,c$ are constants. Then

$$X_nY_n+Z_n\xrightarrow{D}bX+c.$$

In particular, taking $Y_n\equiv 1$ or $Z_n\equiv 0$,

$$X_n+Z_n\xrightarrow{D}X+c,\qquad X_nY_n\xrightarrow{D}bX,$$

and if $b\neq 0$,

$$\frac{X_n}{Y_n}\xrightarrow{D}\frac{X}{b}.$$
Proof

First prove a standard lemma: if

$$U_n-V_n\xrightarrow{P}0,\qquad V_n\xrightarrow{D}V,$$

then $U_n\xrightarrow{D}V$.

Indeed, let $x$ be a continuity point of the distribution function $F_V$ of $V$, and let $\varepsilon>0$ be such that $x-\varepsilon$ and $x+\varepsilon$ are also continuity points of $F_V$ (all but countably many $\varepsilon$ qualify, because $F_V$ has at most countably many jumps). Then

$$\{V_n\leq x-\varepsilon\}\cap\{|U_n-V_n|\leq\varepsilon\} \subset \{U_n\leq x\},$$

and

$$\{U_n\leq x\} \subset \{V_n\leq x+\varepsilon\}\cup\{|U_n-V_n|>\varepsilon\}.$$

Therefore

$$\mathbb{P}(V_n\leq x-\varepsilon)-\mathbb{P}(|U_n-V_n|>\varepsilon) \leq \mathbb{P}(U_n\leq x),$$

and

$$\mathbb{P}(U_n\leq x) \leq \mathbb{P}(V_n\leq x+\varepsilon)+\mathbb{P}(|U_n-V_n|>\varepsilon).$$

Letting $n\to\infty$ gives

$$F_V(x-\varepsilon)\leq \liminf_{n\to\infty}\mathbb{P}(U_n\leq x) \leq \limsup_{n\to\infty}\mathbb{P}(U_n\leq x) \leq F_V(x+\varepsilon).$$

Letting $\varepsilon\downarrow 0$ through such values and using continuity of $F_V$ at $x$ gives

$$\mathbb{P}(U_n\leq x)\to F_V(x).$$

Thus $U_n\xrightarrow{D}V$.

Now prove Slutsky's theorem. For addition, the continuous mapping theorem gives

$$X_n+c\xrightarrow{D}X+c.$$

Since

$$(X_n+Z_n)-(X_n+c)=Z_n-c\xrightarrow{P}0,$$

the lemma gives

$$X_n+Z_n\xrightarrow{D}X+c.$$

For multiplication, $X_n\xrightarrow{D}X$ implies that $\{X_n\}$ is tight. Thus for every $\eta>0$, there exists $M>0$ such that, for all large $n$,

$$\mathbb{P}(|X_n|>M)<\eta.$$

Hence, for every $\varepsilon>0$,

$$\mathbb{P}(|X_n(Y_n-b)|>\varepsilon) \leq \mathbb{P}(|X_n|>M) + \mathbb{P}\!\left(|Y_n-b|>\frac{\varepsilon}{M}\right).$$

Letting $n\to\infty$ gives $\limsup_{n\to\infty}\mathbb{P}(|X_n(Y_n-b)|>\varepsilon)\leq\eta$. Since $\eta>0$ is arbitrary,

$$X_n(Y_n-b)\xrightarrow{P}0.$$

Also, by the continuous mapping theorem,

$$bX_n\xrightarrow{D}bX.$$

Since

$$X_nY_n-bX_n=X_n(Y_n-b)\xrightarrow{P}0,$$

the lemma gives

$$X_nY_n\xrightarrow{D}bX.$$

Finally, apply the addition part with $X_nY_n$ in place of $X_n$ (and $bX$ in place of $X$) to get

$$X_nY_n+Z_n\xrightarrow{D}bX+c.$$

If $b\neq 0$, then $x\mapsto 1/x$ is continuous at $b$, so

$$\frac{1}{Y_n}\xrightarrow{P}\frac{1}{b}.$$

The multiplication part applied to $X_n$ and $1/Y_n$ gives

$$\frac{X_n}{Y_n}=X_n\cdot\frac{1}{Y_n}\xrightarrow{D}\frac{X}{b}.$$

Exercise 4.3

Note

Borel-Cantelli, subsequence arguments, and extreme-value estimates often appear together. Almost sure conclusions usually come from building summable bad events.

Problem: 4.3.1

Let $\{X_n\}$ be independent standard normal random variables. Use the standard normal tail estimate from Chapter 3, Problem 14(1), to prove

$$\mathbb{P}\!\left(\limsup_{n\to\infty}\frac{X_n}{\sqrt{\log n}}=\sqrt{2}\right)=1.$$
Proof

For $a>0$, define

$$A_n(a)=\left\{X_n\geq \sqrt{2a\log n}\right\}.$$

By the normal tail estimate, there exist constants $C_1,C_2>0$ such that, for all large $n$,

$$C_1\frac{n^{-a}}{\sqrt{\log n}} \leq \mathbb{P}(A_n(a)) \leq C_2\frac{n^{-a}}{\sqrt{\log n}}.$$

If $0<a<1$, then

$$\sum_{n=2}^{\infty}\mathbb{P}(A_n(a))=\infty.$$

The events $A_n(a)$ are independent, so the second Borel-Cantelli lemma gives

$$\mathbb{P}(A_n(a)\ \text{i.o.})=1.$$

Thus

$$\limsup_{n\to\infty}\frac{X_n}{\sqrt{\log n}}\geq \sqrt{2a} \qquad\text{a.s.}$$

If $a>1$, then

$$\sum_{n=2}^{\infty}\mathbb{P}(A_n(a))<\infty.$$

By the first Borel-Cantelli lemma,

$$\mathbb{P}(A_n(a)\ \text{i.o.})=0,$$

that is, almost surely $X_n<\sqrt{2a\log n}$ for all large $n$. Hence

$$\limsup_{n\to\infty}\frac{X_n}{\sqrt{\log n}}\leq \sqrt{2a} \qquad\text{a.s.}$$

Therefore, for every $0<a<1<b$,

$$\sqrt{2a} \leq \limsup_{n\to\infty}\frac{X_n}{\sqrt{\log n}} \leq \sqrt{2b} \qquad\text{a.s.}$$

Letting $a\uparrow 1$ and $b\downarrow 1$ along countable sequences gives

$$\limsup_{n\to\infty}\frac{X_n}{\sqrt{\log n}}=\sqrt{2} \qquad\text{a.s.}$$
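
The two-sided bound on $\mathbb{P}(A_n(a))$ can be checked numerically. This Python sketch uses the exact tail $\mathbb{P}(X\geq x)=\tfrac12\operatorname{erfc}(x/\sqrt2)$ (via `math.erfc`) and divides it by $n^{-a}/\sqrt{\log n}$. By the Mills-ratio asymptotics $\mathbb{P}(X\geq x)\sim \varphi(x)/x$, the ratio should approach $1/(2\sqrt{\pi a})$ from below, which is consistent with the constants $C_1,C_2$ above. The function names are illustrative.

```python
import math

def tail_prob(n, a):
    """P(X >= sqrt(2 a log n)) for X ~ N(0, 1), via the complementary error function."""
    x = math.sqrt(2.0 * a * math.log(n))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def normalized_tail(n, a):
    """Ratio P(A_n(a)) / (n^{-a} / sqrt(log n)); it should stay between two positive constants."""
    return tail_prob(n, a) / (n ** (-a) / math.sqrt(math.log(n)))

a = 1.0
limit = 1.0 / (2.0 * math.sqrt(math.pi * a))   # Mills-ratio limit of the ratio
for n in [10**2, 10**4, 10**8, 10**50, 10**100]:
    print(n, normalized_tail(n, a), limit)
```

The ratio stays bounded away from $0$ and $\infty$, so the summability of $\sum_n\mathbb{P}(A_n(a))$ is decided entirely by $\sum_n n^{-a}/\sqrt{\log n}$.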
Problem: 4.3.6

Let $X_1,\cdots,X_n$ be i.i.d. uniform random variables on $[0,a]$, where $a>0$. Set

$$M_n=\max\{X_1,\cdots,X_n\}.$$

Prove that $M_n\to a$ both a.s. and in $L^p$ as $n\to\infty$.

Proof

For every $0<\varepsilon<a$,

$$\mathbb{P}(|M_n-a|>\varepsilon) =\mathbb{P}(M_n<a-\varepsilon) =\left(\frac{a-\varepsilon}{a}\right)^n.$$

Since

$$\sum_{n=1}^{\infty}\left(\frac{a-\varepsilon}{a}\right)^n<\infty,$$

the first Borel-Cantelli lemma implies that, almost surely, the event

$$|M_n-a|>\varepsilon$$

occurs only finitely many times. Taking a countable intersection over positive rational $\varepsilon$ gives

$$M_n\xrightarrow{\text{a.s.}}a.$$

Since $0\leq M_n\leq a$,

$$|M_n-a|^p\leq a^p.$$

Together with almost sure convergence, the dominated convergence theorem gives

$$\mathbb{E}[|M_n-a|^p]\to 0.$$

Thus

$$M_n\xrightarrow{L^p}a.$$
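
Both convergence statements can be made quantitative. A short computation, not in the original proof, shows that $(a-M_n)/a$ has the $\mathrm{Beta}(1,n)$ distribution (its tail is $(1-w)^n$), so $\mathbb{E}[|M_n-a|^p]=a^p\,\Gamma(p+1)\Gamma(n+1)/\Gamma(n+p+1)$, which decays like $n^{-p}$. The Python sketch below evaluates this formula in log space together with the tail probability used above; the helper names are hypothetical.

```python
import math

def tail_prob(n, a, eps):
    """P(|M_n - a| > eps) = ((a - eps) / a)^n for 0 < eps < a."""
    return ((a - eps) / a) ** n

def lp_error(n, a, p):
    """E|M_n - a|^p = a^p Gamma(p+1) Gamma(n+1) / Gamma(n+p+1), computed in log space."""
    log_val = (p * math.log(a) + math.lgamma(p + 1)
               + math.lgamma(n + 1) - math.lgamma(n + p + 1))
    return math.exp(log_val)

a, p = 2.0, 1.5
for n in [1, 10, 100, 1000]:
    # the last column, n^p * E|M_n - a|^p, should stay bounded
    print(n, tail_prob(n, a, 0.1), lp_error(n, a, p), lp_error(n, a, p) * n**p)
```

For $p=1$ the formula reduces to $\mathbb{E}[a-M_n]=a/(n+1)$, which matches $\mathbb{E}[M_n]=na/(n+1)$.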
Problem: 4.3.7

Suppose $X_n\xrightarrow{P}X$. Prove that there exists a subsequence $\{X_{n_k}\}$ such that

$$X_{n_k}\xrightarrow{\text{a.s.}}X.$$
Proof

Since $X_n\xrightarrow{P}X$, for each $k\in\mathbb{N}^*$ we may choose $n_k>n_{k-1}$ such that

$$\mathbb{P}(|X_{n_k}-X|>2^{-k})<2^{-k}.$$

Then

$$\sum_{k=1}^{\infty}\mathbb{P}(|X_{n_k}-X|>2^{-k})<\infty.$$

By the first Borel-Cantelli lemma, the events

$$|X_{n_k}-X|>2^{-k}$$

occur only finitely many times. Hence, for almost every $\omega$, there exists $K(\omega)$ such that for all $k\geq K(\omega)$,

$$|X_{n_k}(\omega)-X(\omega)|\leq 2^{-k}.$$

Therefore $X_{n_k}(\omega)\to X(\omega)$, and

$$X_{n_k}\xrightarrow{\text{a.s.}}X.$$
Problem: 4.3.8

(1) Let $\{X_n\}$ be independent real-valued random variables with $X_n\xrightarrow{P}0$, and let $\{a_n\}$ be a positive increasing sequence with $a_n\to+\infty$. Must we have

$$\frac{X_n}{a_n}\xrightarrow{\text{a.s.}}0?$$

(2) Let $\{X_n\}$ be any sequence of real-valued random variables. Construct positive numbers $\{c_n\}$ such that

$$\frac{X_n}{c_n}\xrightarrow{\text{a.s.}}0.$$
Proof

(1) The conclusion need not hold. For the given sequence $\{a_n\}$, define independent random variables by

$$\mathbb{P}(X_n=a_n)=\frac{1}{n+1}, \qquad \mathbb{P}(X_n=0)=1-\frac{1}{n+1}.$$

Since $a_n\to\infty$, for every $\varepsilon>0$ and all large $n$, $a_n>\varepsilon$. Hence

$$\mathbb{P}(|X_n|>\varepsilon)=\frac{1}{n+1}\to 0,$$

so $X_n\xrightarrow{P}0$. But

$$\mathbb{P}\!\left(\frac{X_n}{a_n}=1\right)=\frac{1}{n+1}, \qquad \sum_{n=1}^{\infty}\frac{1}{n+1}=\infty.$$

By the second Borel-Cantelli lemma, the event

$$\frac{X_n}{a_n}=1$$

occurs infinitely often. Thus $X_n/a_n$ does not converge to $0$ a.s.

(2) For each $n$, since $\mathbb{P}(|X_n|>t)\downarrow 0$ as $t\to\infty$, choose $c_n>0$ such that

$$\mathbb{P}(|X_n|>2^{-n}c_n)<2^{-n}.$$

Let

$$A_n=\{|X_n|>2^{-n}c_n\}.$$

Then

$$\sum_{n=1}^{\infty}\mathbb{P}(A_n)<\infty.$$

By the first Borel-Cantelli lemma, $A_n$ occurs only finitely many times. Hence, almost surely, there exists $N(\omega)$ such that for all $n\geq N(\omega)$,

$$\left|\frac{X_n}{c_n}\right|\leq 2^{-n}.$$

Therefore

$$\frac{X_n}{c_n}\xrightarrow{\text{a.s.}}0.$$

Exercise 4.4

Note

Proofs of the strong law often use truncation, fourth moments, or Borel-Cantelli. Watch which moment condition controls which tail event.

Problem: 4.4.1

Let $\{X_n\}$ be nonnegative i.i.d. random variables with $\mathbb{E}[X_1]=+\infty$. Prove that

$$\frac1n\sum_{k=1}^{n}X_k\xrightarrow{\text{a.s.}}+\infty.$$
Proof

For each $M>0$, let

$$Y_k^{(M)}=X_k\wedge M.$$

Then $\{Y_k^{(M)}\}$ is still a nonnegative i.i.d. sequence, and $\mathbb{E}[Y_1^{(M)}]\leq M<\infty$. By the strong law of large numbers,

$$\frac1n\sum_{k=1}^{n}Y_k^{(M)} \xrightarrow{\text{a.s.}} \mathbb{E}[Y_1^{(M)}].$$

Since $X_k\geq Y_k^{(M)}$,

$$\liminf_{n\to\infty}\frac1n\sum_{k=1}^{n}X_k \geq \mathbb{E}[Y_1^{(M)}] \qquad\text{a.s.}$$

Because $Y_1^{(M)}\uparrow X_1$ as $M\uparrow\infty$, the monotone convergence theorem gives

$$\mathbb{E}[Y_1^{(M)}]\uparrow \mathbb{E}[X_1]=+\infty.$$

Thus for every $L>0$, we can choose $M$ so that $\mathbb{E}[Y_1^{(M)}]\geq L$. Hence

$$\liminf_{n\to\infty}\frac1n\sum_{k=1}^{n}X_k\geq L \qquad\text{a.s.}$$

Applying this with $L=1,2,3,\dots$ and intersecting the countably many events of probability one gives

$$\frac1n\sum_{k=1}^{n}X_k\xrightarrow{\text{a.s.}}+\infty.$$
Problem: 4.4.2

(Weierstrass approximation theorem) For every continuous function $f:[0,1]\to\mathbb{R}$, let $S_n\sim B(n,x)$. Prove that

$$\lim_{n\to+\infty}\sup_{0\leq x\leq 1} \left| f(x)-\sum_{k=0}^{n}f\!\left(\frac{k}{n}\right)\binom{n}{k}x^k(1-x)^{n-k} \right|=0.$$
Proof

For fixed $x\in[0,1]$, let $S_n\sim B(n,x)$. Then

$$\mathbb{P}(S_n=k)=\binom{n}{k}x^k(1-x)^{n-k},$$

so

$$\sum_{k=0}^{n}f\!\left(\frac{k}{n}\right)\binom{n}{k}x^k(1-x)^{n-k} = \mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right].$$

It is enough to prove

$$\sup_{0\leq x\leq 1} \left| \mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right]-f(x) \right|\to 0.$$

Since $f$ is continuous on $[0,1]$, it is uniformly continuous. Given $\varepsilon>0$, choose $\delta>0$ such that $|u-v|<\delta$ implies

$$|f(u)-f(v)|<\varepsilon.$$

Let

$$M=\sup_{0\leq y\leq 1}|f(y)|.$$

Then

$$\begin{aligned} \left|\mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right]-f(x)\right| &\leq \mathbb{E}\!\left[\left|f\!\left(\frac{S_n}{n}\right)-f(x)\right|; \left|\frac{S_n}{n}-x\right|<\delta\right] \\ &\quad+ \mathbb{E}\!\left[\left|f\!\left(\frac{S_n}{n}\right)-f(x)\right|; \left|\frac{S_n}{n}-x\right|\geq\delta\right] \\ &\leq \varepsilon+2M\mathbb{P}\!\left(\left|\frac{S_n}{n}-x\right|\geq\delta\right). \end{aligned}$$

By Chebyshev's inequality,

$$\mathbb{P}\!\left(\left|\frac{S_n}{n}-x\right|\geq\delta\right) \leq \frac{\operatorname{Var}(S_n/n)}{\delta^2} = \frac{x(1-x)}{n\delta^2} \leq \frac{1}{4n\delta^2}.$$

Thus

$$\sup_{0\leq x\leq 1} \left| \mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right]-f(x) \right| \leq \varepsilon+\frac{M}{2n\delta^2}.$$

Letting $n\to\infty$ and then $\varepsilon\downarrow 0$ proves the result.
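
The polynomial $\sum_k f(k/n)\binom nk x^k(1-x)^{n-k}=\mathbb{E}[f(S_n/n)]$ (the Bernstein polynomial of $f$) can be evaluated directly. The Python sketch below does so from the binomial sum and estimates the sup-norm error on a uniform grid. The test function $f(x)=|x-1/2|$ and the grid size are illustrative choices, and a grid maximum only approximates the true supremum.

```python
import math

def bernstein(f, n, x):
    """B_n f(x) = sum_k f(k/n) C(n, k) x^k (1 - x)^(n - k) = E[f(S_n / n)] with S_n ~ B(n, x)."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

def sup_error(f, n, grid_size=201):
    """Grid approximation of sup_{0 <= x <= 1} |f(x) - B_n f(x)|."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)

def f(x):
    """Continuous on [0, 1] but not differentiable at 1/2."""
    return abs(x - 0.5)

for n in [10, 40, 160]:
    print(n, sup_error(f, n))
```

For this $f$ the error at $x=1/2$ is $\mathbb{E}|S_n/n-1/2|$, which decays like $n^{-1/2}$; that is consistent with the Chebyshev step, whose variance bound $x(1-x)/n$ drives the estimate.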

Problem: 4.4.3

Let $X_1,X_2,\cdots$ be i.i.d. random variables with $\mathbb{E}[X_1]=0$ and $\mathbb{E}[X_1^4]<\infty$. Without using the strong law directly, prove that

$$\frac1n\sum_{k=1}^{n}X_k\xrightarrow{\text{a.s.}}0.$$
Proof

Set

$$S_n=\sum_{k=1}^{n}X_k.$$

Expand $S_n^4$ and use independence together with $\mathbb{E}[X_1]=0$: every term in which some $X_i$ appears exactly once has mean zero, so only the terms $X_i^4$ and the terms $X_i^2X_j^2$ with $i<j$ (each appearing $\binom{4}{2}=6$ times) remain. This gives

$$\mathbb{E}[S_n^4] = n\mathbb{E}[X_1^4] + 6\binom{n}{2}\bigl(\mathbb{E}[X_1^2]\bigr)^2 =O(n^2).$$

Thus there is a constant $C>0$ such that, for all $n$,

$$\mathbb{E}[S_n^4]\leq Cn^2.$$

By Markov's inequality applied to $S_n^4$, for every $\varepsilon>0$,

$$\mathbb{P}(|S_n|>n\varepsilon) \leq \frac{\mathbb{E}[S_n^4]}{n^4\varepsilon^4} \leq \frac{C}{n^2\varepsilon^4}.$$

Therefore

$$\sum_{n=1}^{\infty}\mathbb{P}(|S_n|>n\varepsilon)<\infty.$$

By the first Borel-Cantelli lemma,

$$\mathbb{P}(|S_n|>n\varepsilon\ \text{i.o.})=0.$$

Since $\varepsilon>0$ is arbitrary (take $\varepsilon=1/m$ for $m=1,2,\dots$),

$$\frac{S_n}{n}\xrightarrow{\text{a.s.}}0.$$
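
The fourth-moment identity can be checked exactly in a special case. The sketch below assumes Rademacher summands ($X_i=\pm1$ with probability $\tfrac12$, so $\mathbb{E}[X_1^2]=\mathbb{E}[X_1^4]=1$), computes $\mathbb{E}[S_n^4]$ with exact rational arithmetic from the binomial law of $S_n$, and compares it with $n\mathbb{E}[X_1^4]+6\binom n2(\mathbb{E}[X_1^2])^2=3n^2-2n$. The function names are illustrative.

```python
import math
from fractions import Fraction

def fourth_moment_rademacher(n):
    """Exact E[S_n^4] when X_i = +-1 with probability 1/2 each, summed over the binomial law of S_n."""
    total = Fraction(0)
    for k in range(n + 1):               # k = number of +1 signs, so S_n = 2k - n
        total += Fraction(math.comb(n, k), 2**n) * (2 * k - n) ** 4
    return total

def formula(n, m4=1, m2=1):
    """n E[X^4] + 6 C(n, 2) (E[X^2])^2; with m4 = m2 = 1 this is 3n^2 - 2n."""
    return n * m4 + 6 * math.comb(n, 2) * m2**2

for n in [1, 2, 5, 20]:
    print(n, fourth_moment_rademacher(n), formula(n))
```

The $3n^2$ leading term is exactly what makes $\mathbb{P}(|S_n|>n\varepsilon)=O(n^{-2})$ summable.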
Problem: 4.4.4

Let $\{X_n\}$ be independent exponential random variables with parameter $1$.

(1) Prove that $(X_1\cdots X_n)^{1/n}$ converges a.s., and find the limit.

(2) Find the limiting distribution of

$$\frac{n}{\frac1{X_1}+\cdots+\frac1{X_n}}.$$
Proof

(1) Let $Y_n=\log X_n$. Since $X_n\sim\mathrm{Exp}(1)$,

$$\mathbb{E}[|Y_1|]<\infty, \qquad \mathbb{E}[Y_1]=\int_{0}^{\infty}(\log x)e^{-x}\,dx=-\gamma,$$

where $\gamma$ is Euler's constant. By the strong law,

$$\frac1n\sum_{k=1}^{n}Y_k\xrightarrow{\text{a.s.}}-\gamma.$$

Therefore

$$(X_1\cdots X_n)^{1/n} = \exp\!\left(\frac1n\sum_{k=1}^{n}Y_k\right) \xrightarrow{\text{a.s.}}e^{-\gamma}.$$
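
The value $\mathbb{E}[\log X_1]=-\gamma$ can be recovered numerically from the identity $\mathbb{E}[X_1^s]=\Gamma(1+s)$, whose derivative at $s=0$ is $\mathbb{E}[\log X_1]=\Gamma'(1)$. The Python sketch below approximates $\Gamma'(1)$ by a central difference using `math.gamma`. The step size `h` and the hard-coded decimal value of Euler's constant are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded for comparison

def mean_log_exp1(h=1e-5):
    """Approximate E[log X] for X ~ Exp(1) as Gamma'(1), using E[X^s] = Gamma(1 + s).

    The central difference (Gamma(1 + h) - Gamma(1 - h)) / (2h) estimates the derivative at s = 0.
    """
    return (math.gamma(1 + h) - math.gamma(1 - h)) / (2 * h)

m = mean_log_exp1()
print(m, -EULER_GAMMA)                       # E[log X_1] versus -gamma
print(math.exp(m), math.exp(-EULER_GAMMA))   # a.s. limit of (X_1 ... X_n)^{1/n}
```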

(2) Let

$$Z_k=\frac1{X_k}.$$

Then $Z_k\geq 0$ and the $Z_k$ are i.i.d. Also,

$$\mathbb{E}[Z_1]=\int_{0}^{\infty}\frac1x e^{-x}\,dx=+\infty.$$

By Problem 4.4.1,

$$\frac1n\sum_{k=1}^{n}Z_k\xrightarrow{\text{a.s.}}+\infty.$$

Hence

$$\frac{n}{\frac1{X_1}+\cdots+\frac1{X_n}} = \left(\frac1n\sum_{k=1}^{n}Z_k\right)^{-1} \xrightarrow{\text{a.s.}}0.$$

Since almost sure convergence implies convergence in distribution, the limiting distribution is the degenerate distribution $\delta_0$ (the point mass at $0$).

Problem: 4.4.5

The interval $[0,1]$ is divided into $n$ disjoint subintervals with lengths $p_1,p_2,\cdots,p_n$. Define the entropy of the partition by

$$h=-\sum_{i=1}^{n}p_i\log p_i.$$

Let $X_1,X_2,\cdots,X_m$ be independent uniform random variables on $[0,1]$. Let $Z_m(i)$ be the number of $X_1,\cdots,X_m$ that fall in the $i$-th interval, and define

$$R_m=\prod_{i=1}^{n}p_i^{Z_m(i)}.$$

Prove that, as $m\to\infty$,

$$\frac{\log R_m}{m}\xrightarrow{\text{a.s.}}-h.$$
Proof

For each $k$, define

$$Y_k=\sum_{i=1}^{n}(\log p_i)\mathbf{1}_{\{X_k\text{ falls in interval }i\}}.$$

Then $\{Y_k\}$ are i.i.d., and

$$\mathbb{P}(Y_k=\log p_i)=p_i,\qquad 1\leq i\leq n.$$

Thus

$$\mathbb{E}[Y_1]=\sum_{i=1}^{n}p_i\log p_i=-h.$$

Also,

$$\log R_m=\sum_{i=1}^{n}Z_m(i)\log p_i=\sum_{k=1}^{m}Y_k.$$

By the strong law,

$$\frac{\log R_m}{m} = \frac1m\sum_{k=1}^{m}Y_k \xrightarrow{\text{a.s.}} \mathbb{E}[Y_1] =-h.$$
Problem: 4.4.7

Let $\{X_k:k\geq 2\}$ be independent random variables such that

$$\mathbb{P}(X_k=2k)=\mathbb{P}(X_k=-2k)=\frac{1}{2k\log k}, \qquad \mathbb{P}(X_k=0)=1-\frac{1}{k\log k}.$$

Set

$$S_n=X_2+\cdots+X_n.$$

Prove that

$$\frac{S_n}{n}\xrightarrow{P}0, \qquad \frac{S_n}{n(n-1)}\xrightarrow{\text{a.s.}}0,$$

but

$$\frac{S_n}{n}$$

does not converge to $0$ a.s.

Proof

First, by symmetry,

$$\mathbb{E}[X_k]=0.$$

Also,

$$\mathbb{E}[X_k^2] =4k^2\cdot\frac1{k\log k} =\frac{4k}{\log k}.$$

Therefore, by independence,

$$\operatorname{Var}\!\left(\frac{S_n}{n}\right) = \frac1{n^2}\sum_{k=2}^{n}\mathbb{E}[X_k^2].$$

Using $\log k\geq\log 2$ for $2\leq k\leq\sqrt n$ and $\log k>\frac12\log n$ for $k>\sqrt n$,

$$\sum_{k=2}^{n}\frac{k}{\log k} \leq \sum_{k\leq \sqrt n}\frac{k}{\log 2} + \sum_{k>\sqrt n}\frac{2k}{\log n} = O\!\left(\frac{n^2}{\log n}\right),$$

so

$$\operatorname{Var}\!\left(\frac{S_n}{n}\right) =O\!\left(\frac1{\log n}\right)\to 0.$$

By Chebyshev's inequality,

$$\frac{S_n}{n}\xrightarrow{P}0.$$
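
The variance estimate can be checked directly. The Python sketch below evaluates $\operatorname{Var}(S_n/n)=n^{-2}\sum_{k=2}^n 4k/\log k$ and multiplies it by $\log n$. If the bound $O(1/\log n)$ is right, the product should stay bounded; heuristically it approaches $2$, since $\sum_{k\le n}k/\log k\approx n^2/(2\log n)$. The function name is illustrative.

```python
import math

def var_sn_over_n(n):
    """Var(S_n / n) = n^{-2} * sum_{k=2}^{n} 4k / log k, using independence and E[X_k] = 0."""
    return sum(4 * k / math.log(k) for k in range(2, n + 1)) / n**2

for n in [10**2, 10**3, 10**4, 10**5]:
    v = var_sn_over_n(n)
    print(n, v, v * math.log(n))   # second column -> 0, third column stays bounded
```

The decay is only logarithmic, which is why the weak law holds here while the corresponding Borel-Cantelli sum for $S_n/n$ diverges.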

For almost sure convergence under the larger normalization, the same estimate gives

$$\operatorname{Var}(S_n) = \sum_{k=2}^{n}\mathbb{E}[X_k^2] = O\!\left(\frac{n^2}{\log n}\right).$$

For every $\varepsilon>0$, Chebyshev's inequality gives

$$\mathbb{P}\!\left(\left|\frac{S_n}{n(n-1)}\right|>\varepsilon\right) \leq \frac{\operatorname{Var}(S_n)}{\varepsilon^2n^2(n-1)^2} = O\!\left(\frac1{n^2\log n}\right).$$

Hence

$$\sum_{n=2}^{\infty} \mathbb{P}\!\left(\left|\frac{S_n}{n(n-1)}\right|>\varepsilon\right) <\infty.$$

By the first Borel-Cantelli lemma, applied with $\varepsilon=1/m$ for every $m\in\mathbb{N}$,

$$\frac{S_n}{n(n-1)}\xrightarrow{\text{a.s.}}0.$$

Finally, prove that $S_n/n$ does not converge to $0$ a.s. Let

$$A_n=\{X_n=2n\}.$$

Then the events $A_n$ are independent, and

$$\sum_{n=2}^{\infty}\mathbb{P}(A_n) = \sum_{n=2}^{\infty}\frac1{2n\log n} =\infty.$$

By the second Borel-Cantelli lemma, $A_n$ occurs infinitely often a.s. If

$$\frac{S_n}{n}\xrightarrow{\text{a.s.}}0,$$

then

$$\frac{S_{n-1}}{n} = \frac{n-1}{n}\cdot\frac{S_{n-1}}{n-1} \xrightarrow{\text{a.s.}}0.$$

But on $A_n$,

$$\frac{S_n}{n}=\frac{S_{n-1}}{n}+2.$$

Since $A_n$ occurs infinitely often, this contradicts $S_n/n\to 0$. Therefore $S_n/n$ does not converge to $0$ a.s.

Exercise 5.1

Note

For characteristic functions, independent sums correspond to products, and linear changes correspond to rescaling. Convergence in distribution can often be checked through pointwise convergence of characteristic functions.

Problem: 5.1.1

The density of $X$ is

$$f(x)=\frac12e^{-|x|},\qquad -\infty<x<\infty.$$

Find the characteristic function of $X$.

Proof

$$\begin{aligned} \phi_X(t) &= \mathbb{E}[e^{itX}] = \frac12\int_{-\infty}^{\infty}e^{itx-|x|}\,dx \\ &= \int_{0}^{\infty}e^{-x}\cos(tx)\,dx = \frac1{1+t^2}. \end{aligned}$$
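
The closed form $1/(1+t^2)$ can be compared against numerical integration. This Python sketch applies the composite Simpson rule to $\int_0^L e^{-x}\cos(tx)\,dx$, truncating at an assumed cutoff $L=40$ (the neglected tail is below $e^{-40}$). The step count is an illustrative choice.

```python
import math

def laplace_cf_numeric(t, upper=40.0, steps=20_000):
    """Composite Simpson approximation of int_0^upper e^{-x} cos(t x) dx (steps must be even)."""
    h = upper / steps
    total = 1.0 + math.exp(-upper) * math.cos(t * upper)   # integrand at both endpoints
    for i in range(1, steps):
        x = i * h
        total += (4 if i % 2 else 2) * math.exp(-x) * math.cos(t * x)
    return total * h / 3

for t in [0.0, 0.5, 1.0, 3.0]:
    print(t, laplace_cf_numeric(t), 1 / (1 + t * t))
```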
Problem: 5.1.2

Assume that $\{U,V\}$ is independent of $\{X,Y\}$, and let

$$Z=\frac{UX+VY}{\sqrt{U^2+V^2}}.$$

Prove that if $X$ and $Y$ are independent $N(0,1)$ random variables, then $Z\sim N(0,1)$. If $(X,Y)$ is only standard bivariate normal, does the conclusion still hold?

Proof

If $X,Y$ are independent and both have distribution $N(0,1)$, then for every fixed $(u,v)\in\mathbb{R}^2$,

$$uX+vY\sim N(0,u^2+v^2).$$

Thus, conditional on $(U,V)=(u,v)$,

$$Z\mid (U,V)=(u,v)\sim N(0,1).$$

Equivalently, for every $t\in\mathbb{R}$,

$$\mathbb{E}[e^{itZ}\mid U,V]=e^{-t^2/2}.$$

Taking expectations gives

$$\mathbb{E}[e^{itZ}]=e^{-t^2/2},$$

so $Z\sim N(0,1)$.

If $(X,Y)$ is only standard bivariate normal and independence is not assumed, the conclusion need not hold. Let

$$\operatorname{Cov}(X,Y)=\rho\neq 0,$$

and take $U=V=1$. Then

$$Z=\frac{X+Y}{\sqrt2},$$

so

$$\operatorname{Var}(Z) = \frac12\operatorname{Var}(X+Y) = \frac12(1+1+2\rho) =1+\rho\neq 1.$$

Thus $Z\not\sim N(0,1)$ in general.

Problem: 5.1.3

Let

$$\phi(t)=\left(\frac{\sin t}{t}\right)^2.$$

Use a probabilistic argument to prove that, for real numbers $t_1,\cdots,t_n$, the matrix

$$H_n=\bigl(\phi(t_i-t_j)\bigr)_{i,j=1}^{n}$$

is nonnegative definite.

Proof

Take independent random variables $X,Y\sim U[-1,1]$. Then

$$\phi_X(t)=\phi_Y(t)=\frac{\sin t}{t}.$$

Hence

$$\phi_{X+Y}(t)=\phi_X(t)\phi_Y(t) = \left(\frac{\sin t}{t}\right)^2 =\phi(t).$$

Thus $\phi$ is the characteristic function of the random variable $X+Y$.

For any complex numbers $c_1,\cdots,c_n$,

$$\begin{aligned} \sum_{i,j=1}^{n}c_i\overline{c_j}\phi(t_i-t_j) &= \sum_{i,j=1}^{n}c_i\overline{c_j} \mathbb{E}\!\left[e^{i(t_i-t_j)(X+Y)}\right] \\ &= \mathbb{E}\!\left[\left|\sum_{j=1}^{n}c_j e^{it_j(X+Y)}\right|^2\right]\\ &\geq 0. \end{aligned}$$

Therefore $H_n$ is nonnegative definite.
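
The conclusion can be spot-checked on a concrete matrix. The Python sketch below builds $H_n$ for a hypothetical set of points $t_i$ and attempts a Cholesky factorization, which succeeds only for positive definite matrices. Because $\phi$ is the characteristic function of $X+Y$, which has a density, $H_n$ is in fact strictly positive definite for distinct $t_i$, so the factorization should succeed; the tolerance `tol` and the helper names are assumptions of this sketch, and a finite check is not a proof.

```python
import math

def phi(t):
    """phi(t) = (sin t / t)^2, with the removable singularity filled in by phi(0) = 1."""
    return 1.0 if t == 0 else (math.sin(t) / t) ** 2

def gram_matrix(ts):
    """H_n = (phi(t_i - t_j))_{i,j}."""
    return [[phi(ti - tj) for tj in ts] for ti in ts]

def is_positive_definite(A, tol=1e-12):
    """Attempt a Cholesky factorization A = L L^T; success means A is (numerically) positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # nonpositive pivot: the factorization fails
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

ts = [-2.0, -0.3, 0.0, 1.1, 2.5, 4.0]   # hypothetical points t_1, ..., t_6
print(is_positive_definite(gram_matrix(ts)))
```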

Problem: 5.1.5

Let $X_1,X_2,\cdots,X_n$ be independent random variables, and set

$$Y_n=X_1^2+X_2^2+\cdots+X_n^2.$$

(1) If $X_i\sim N(i,1)$, find the characteristic function of $Y_n$.

(2) If $X_i\sim N(1,1)$, and if $N\sim P(\lambda)$ is independent of all $X_i$, find the characteristic function of $Y_N$.

Proof

If $X\sim N(\mu,1)$, then

$$\begin{aligned} \mathbb{E}[e^{itX^2}] &= \frac1{\sqrt{2\pi}} \int_{\mathbb{R}} \exp\!\left(itx^2-\frac{(x-\mu)^2}{2}\right)\,dx\\ &= \frac1{\sqrt{1-2it}} \exp\!\left(\frac{i\mu^2t}{1-2it}\right). \end{aligned}$$
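
The closed form for $\mathbb{E}[e^{itX^2}]$ can be checked against direct numerical integration. The Python sketch below uses `cmath` for complex arithmetic, the principal square root for $(1-2it)^{1/2}$ (valid here because $1-2it$ has positive real part, so the branch cut is never crossed), and a Simpson rule on $[\mu-12,\mu+12]$. The window width and step count are assumptions chosen so that the Gaussian factor makes truncation negligible.

```python
import cmath
import math

def cf_closed_form(t, mu):
    """(1 - 2it)^{-1/2} exp(i mu^2 t / (1 - 2it)), using the principal square root."""
    z = 1 - 2j * t
    return cmath.exp(1j * mu**2 * t / z) / cmath.sqrt(z)

def cf_numeric(t, mu, half_width=12.0, steps=40_000):
    """Simpson approximation of (2 pi)^{-1/2} int exp(i t x^2 - (x - mu)^2 / 2) dx over [mu - w, mu + w]."""
    a, b = mu - half_width, mu + half_width
    h = (b - a) / steps

    def integrand(x):
        return cmath.exp(1j * t * x * x - (x - mu) ** 2 / 2)

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3 / math.sqrt(2 * math.pi)

for t, mu in [(0.3, 1.0), (1.0, 2.0), (-0.7, 0.5)]:
    print(t, mu, cf_numeric(t, mu), cf_closed_form(t, mu))
```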

(1) By independence,

$$\phi_{Y_n}(t) = \prod_{k=1}^{n}\mathbb{E}[e^{itX_k^2}] = (1-2it)^{-n/2} \exp\!\left(\frac{it}{1-2it}\sum_{k=1}^{n}k^2\right).$$

Thus

$$\phi_{Y_n}(t) = (1-2it)^{-n/2} \exp\!\left(\frac{it}{1-2it}\cdot\frac{n(n+1)(2n+1)}6\right).$$

(2) In this case,

$$\phi_{X_1^2}(t) = (1-2it)^{-1/2} \exp\!\left(\frac{it}{1-2it}\right).$$

Conditional on $N=m$,

$$\phi_{Y_N\mid N=m}(t)=\phi_{X_1^2}(t)^m.$$

Therefore, using the Poisson generating function $\mathbb{E}[z^N]=e^{\lambda(z-1)}$,

$$\phi_{Y_N}(t) = \mathbb{E}[\phi_{X_1^2}(t)^N] = \exp\{\lambda(\phi_{X_1^2}(t)-1)\}.$$

That is,

$$\phi_{Y_N}(t) = \exp\!\left\{\lambda\left((1-2it)^{-1/2} \exp\!\left(\frac{it}{1-2it}\right)-1\right)\right\}.$$
Problem: 5.1.7

Let $X_1,\cdots,X_n$ be i.i.d., and let

$$S_n=X_1+\cdots+X_n.$$

(1) If the moment generating function $M(t)=\mathbb{E}[e^{tX_1}]$ exists, prove the tail bound

$$\mathbb{P}(X_1\geq a) \leq \inf_{t>0}\{e^{-at}M(t)\}.$$

(2) If $\mathbb{P}(X_1=1)=\mathbb{P}(X_1=-1)=\frac12$, prove that for every $a>0$,

$$\mathbb{P}(S_n\geq a)\leq e^{-a^2/(2n)}.$$
Proof

(1) For every $t>0$, Markov's inequality gives

$$\mathbb{P}(X_1\geq a) = \mathbb{P}(e^{tX_1}\geq e^{ta}) \leq e^{-ta}\mathbb{E}[e^{tX_1}] = e^{-ta}M(t).$$

Taking the infimum over $t>0$ gives the result.

(2) Apply the argument of (1) to $S_n$ and use independence:

$$\mathbb{P}(S_n\geq a) \leq e^{-at}\mathbb{E}[e^{tS_n}] = e^{-at}\bigl(\mathbb{E}[e^{tX_1}]\bigr)^n.$$

Now

$$\mathbb{E}[e^{tX_1}] = \frac{e^t+e^{-t}}2 = \cosh t,$$

and, since $(2m)!=m!\,(m+1)(m+2)\cdots(2m)\geq 2^m m!$,

$$\cosh t = \sum_{m=0}^{\infty}\frac{t^{2m}}{(2m)!} \leq \sum_{m=0}^{\infty}\frac{(t^2/2)^m}{m!} = e^{t^2/2}.$$

Thus

$$\mathbb{P}(S_n\geq a) \leq \exp\!\left(-at+\frac{nt^2}{2}\right).$$

Taking $t=a/n$ gives

$$\mathbb{P}(S_n\geq a)\leq e^{-a^2/(2n)}.$$
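
The bound can be compared with the exact tail. Since $S_n=2K-n$ with $K\sim B(n,\tfrac12)$, $\mathbb{P}(S_n\geq a)$ is a binomial tail sum. The Python sketch below computes it with `math.comb` and exact integer arithmetic before a single final division, and compares it with $e^{-a^2/(2n)}$. The function names and the chosen pairs $(n,a)$ are illustrative.

```python
import math

def exact_tail(n, a):
    """P(S_n >= a) for a sum of n i.i.d. +-1 signs.

    S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n >= a exactly when K >= (n + a) / 2.
    """
    k_min = max(math.ceil((n + a) / 2), 0)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

def chernoff_bound(n, a):
    """The bound exp(-a^2 / (2n)) derived above."""
    return math.exp(-a * a / (2 * n))

for n, a in [(10, 4), (100, 20), (1000, 80)]:
    print(n, a, exact_tail(n, a), chernoff_bound(n, a))
```

The bound is not sharp (it ignores the polynomial prefactor of the true tail), but it has the correct Gaussian exponent $a^2/(2n)$.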
Problem: 5.1.8

A random variable $X$ is called sub-Gaussian if, for some constant $K>0$,

$$\mathbb{P}(|X|\geq t)\leq 2e^{-t^2/K^2}, \qquad \forall t\geq 0.$$

Prove:

(1) If

$$\mathbb{E}[e^{sX}]\leq e^{s^2/2}, \qquad \forall s\in\mathbb{R},$$

then $X$ is sub-Gaussian.

(2) The moments of a sub-Gaussian random variable satisfy

$$\mathbb{E}[|X|^p]\leq (K_1\sqrt p)^p, \qquad \forall p\geq 1,$$

where $K_1$ is a positive constant independent of $p$. You may use Stirling's formula

$$n!\sim n^n e^{-n}\sqrt{2\pi n}.$$
Proof

(1) For $s,t>0$, Markov's inequality gives

$$\mathbb{P}(X\geq t) = \mathbb{P}(e^{sX}\geq e^{st}) \leq e^{-st}\mathbb{E}[e^{sX}] \leq e^{-st+s^2/2}.$$

Taking $s=t$ yields

$$\mathbb{P}(X\geq t)\leq e^{-t^2/2}.$$

Since $\mathbb{E}[e^{s(-X)}]=\mathbb{E}[e^{(-s)X}]\leq e^{s^2/2}$ for all $s\in\mathbb{R}$, the same argument applied to $-X$ gives

$$\mathbb{P}(X\leq -t)\leq e^{-t^2/2}.$$

Therefore

$$\mathbb{P}(|X|\geq t)\leq 2e^{-t^2/2},$$

so $X$ is sub-Gaussian with $K=\sqrt2$.

(2) By the tail integral formula,

$$\mathbb{E}[|X|^p] = \int_{0}^{\infty}pt^{p-1}\mathbb{P}(|X|>t)\,dt \leq 2p\int_{0}^{\infty}t^{p-1}e^{-t^2/K^2}\,dt.$$

With $u=t^2/K^2$,

$$\mathbb{E}[|X|^p] \leq pK^p\int_{0}^{\infty}u^{p/2-1}e^{-u}\,du = pK^p\Gamma(p/2) = 2K^p\Gamma(p/2+1).$$

By Stirling's formula, there is a constant $C\geq 1$ such that for all $p\geq 1$,

$$\Gamma(p/2+1)\leq C^p p^{p/2}.$$

Hence, using $2\leq 2^p$ for $p\geq 1$,

$$\mathbb{E}[|X|^p]\leq 2K^pC^pp^{p/2}\leq (2CK\sqrt p)^p,$$

so the bound holds with $K_1=2CK$, which is independent of $p$.

Exercise 5.2

Note

This section looks at convergence in distribution and how independence passes to limits. The Cauchy example is a warning: without a first moment, the usual law-of-large-numbers intuition does not apply.

Problem: 5.2.2

Suppose that $X_n$ and $Y_n$ are independent for each $n$, that $X$ and $Y$ are independent, and that

$$X_n\xrightarrow{D}X,\qquad Y_n\xrightarrow{D}Y.$$

Prove that

$$X_n+Y_n\xrightarrow{D}X+Y.$$
Proof

By independence,

$$\phi_{X_n+Y_n}(t)=\phi_{X_n}(t)\phi_{Y_n}(t).$$

Since $X_n\xrightarrow{D}X$, $Y_n\xrightarrow{D}Y$, and $x\mapsto e^{itx}$ is bounded and continuous, for every $t\in\mathbb{R}$,

$$\phi_{X_n}(t)\to\phi_X(t), \qquad \phi_{Y_n}(t)\to\phi_Y(t).$$

Because $X,Y$ are independent,

$$\phi_X(t)\phi_Y(t)=\phi_{X+Y}(t).$$

Thus

$$\phi_{X_n+Y_n}(t)\to\phi_{X+Y}(t).$$

By Lévy's continuity theorem,

$$X_n+Y_n\xrightarrow{D}X+Y.$$
Problem: 5.2.3

Let $X_1,\cdots,X_n$ be independent Cauchy random variables. Prove that

$$\frac1n\sum_{k=1}^{n}X_k$$

also has the Cauchy distribution.

Proof

We treat the standard case, in which each $X_k$ has the standard Cauchy distribution. First compute the characteristic function of this distribution. If $X$ has density

$$f(x)=\frac1{\pi(1+x^2)},$$

then

$$\phi_X(t) = \frac1\pi\int_{-\infty}^{\infty}\frac{e^{itx}}{1+x^2}\,dx.$$

For $t>0$, consider

$$g(z)=\frac{e^{itz}}{1+z^2},$$

and integrate over the boundary of the upper half-disk of radius $R>1$. As $R\to\infty$, Jordan's lemma shows that the integral over the arc tends to $0$. The only pole inside the contour is $z=i$, with residue

$$\operatorname{Res}(g,i) = \lim_{z\to i}\frac{e^{itz}}{z+i} = \frac{e^{-t}}{2i}.$$

The residue theorem gives

$$\int_{-\infty}^{\infty}\frac{e^{itx}}{1+x^2}\,dx = 2\pi i\cdot\frac{e^{-t}}{2i} = \pi e^{-t}.$$

Thus

$$\phi_X(t)=e^{-t},\qquad t>0.$$

Since $f$ is even,

$$\phi_X(t) = \frac1\pi\int_{-\infty}^{\infty}\frac{\cos(tx)}{1+x^2}\,dx,$$

so $\phi_X$ is even. Hence, for $t<0$,

$$\phi_X(t)=\phi_X(-t)=e^{t}.$$

Together with $\phi_X(0)=1$,

$$\phi_X(t)=e^{-|t|},\qquad t\in\mathbb{R}.$$

Therefore

$$\phi_{X_k/n}(t)=\phi_{X_k}\!\left(\frac{t}{n}\right)=e^{-|t|/n}.$$

By independence,

$$\phi_{\frac1n\sum_{k=1}^{n}X_k}(t) = \prod_{k=1}^{n}\phi_{X_k/n}(t) = \left(e^{-|t|/n}\right)^n = e^{-|t|}.$$

This is the characteristic function of the standard Cauchy distribution, so

$$\frac1n\sum_{k=1}^{n}X_k$$

is again Cauchy.

Problem: 5.2.5

Let ϕn(t)=cosnt\phi_n(t)=\cos^n t, tRt\in\mathbb{R}.

11 Find the distribution function corresponding to the characteristic function ϕ2(t)\phi_2(t).

22 For general positive integers nn, is ϕn(t)\phi_n(t) a characteristic function? Answer and explain.

Proof

(1) Define a random variable XX by

P(X=2)=14,P(X=0)=12,P(X=2)=14.\mathbb{P}(X=-2)=\frac14,\qquad \mathbb{P}(X=0)=\frac12,\qquad \mathbb{P}(X=2)=\frac14.

Then

ϕX(t)=14e2it+12+14e2it=cos2t.\phi_X(t)=\frac14e^{-2it}+\frac12+\frac14e^{2it}=\cos^2t.

Hence the distribution function corresponding to ϕ2\phi_2 is

F2(x)={0,x<2,14,2x<0,34,0x<2,1,x2.F_2(x)= \begin{cases} 0, & x<-2,\\ \frac14, & -2\leq x<0,\\ \frac34, & 0\leq x<2,\\ 1, & x\geq 2. \end{cases}

(2) For any positive integer $n$, let $Y_1,\dots,Y_n$ be i.i.d. random variables with

P(Yk=1)=P(Yk=1)=12.\mathbb{P}(Y_k=1)=\mathbb{P}(Y_k=-1)=\frac12.

Then

ϕYk(t)=12(eit+eit)=cost.\phi_{Y_k}(t)=\frac12(e^{it}+e^{-it})=\cos t.

By independence,

ϕY1++Yn(t)=k=1nϕYk(t)=cosnt=ϕn(t).\phi_{Y_1+\cdots+Y_n}(t) = \prod_{k=1}^{n}\phi_{Y_k}(t) = \cos^n t = \phi_n(t).

Thus ϕn(t)\phi_n(t) is a characteristic function for every positive integer nn.
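Both parts can be checked exactly by computer. The sum of $n$ independent fair signs equals $2K-n$ with $K\sim\mathrm{Bin}(n,1/2)$, so its characteristic function is a finite sum. The sketch below uses illustrative helper names. It builds that law with exact fractions, evaluates its characteristic function, and compares it with $\cos^n t$. For $n=2$ the law is the three-point distribution from (1).

```python
import cmath
import math
from fractions import Fraction

def rademacher_sum_pmf(n):
    # S = Y_1 + ... + Y_n with P(Y = 1) = P(Y = -1) = 1/2, so S = 2K - n with K ~ Bin(n, 1/2).
    return {2 * k - n: Fraction(math.comb(n, k), 2 ** n) for k in range(n + 1)}

def cf_from_pmf(pmf, t):
    # phi(t) = sum_x P(S = x) * exp(i t x)
    return sum(float(p) * cmath.exp(1j * t * x) for x, p in pmf.items())

print(rademacher_sum_pmf(2))  # the law found in part (1)
for n in (1, 2, 5):
    t = 0.7
    print(n, cf_from_pmf(rademacher_sum_pmf(n), t), math.cos(t) ** n)
```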

Exercise 5.3

Note

For central limit theorem problems, first identify the centering and scaling. If the variance depends on nn, compute the scale before applying a theorem.

Problem: 5.3.1

Choose suitable sequences {μn}\{\mu_n\} and {σn}\{\sigma_n\} to prove

XnμnσnDN(0,1).\frac{X_n-\mu_n}{\sigma_n}\xrightarrow{D}N(0,1).

(1) $X_n$ has the Poisson distribution with parameter $n$, where $n$ is a positive integer.

(2) $X_n$ has the Gamma density

f(x)=xn1exΓ(n)1x0.f(x)=\frac{x^{n-1}e^{-x}}{\Gamma(n)}\mathbf{1}_{x\geq 0}.
Proof

(1) If Y1,,YnY_1,\cdots,Y_n are i.i.d. with YiP(1)Y_i\sim P(1), then

Xn:=Y1++YnP(n).X_n':=Y_1+\cdots+Y_n\sim P(n).

Thus XnX_n' and XnX_n have the same distribution. By the i.i.d. CLT,

XnnnDN(0,1).\frac{X_n'-n}{\sqrt n}\xrightarrow{D}N(0,1).

So take

μn=n,σn=n.\mu_n=n,\qquad \sigma_n=\sqrt n.

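Since convergence in distribution depends only on the laws involved, and $X_n$ has the same law as $X_n'$,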

XnμnσnDN(0,1).\frac{X_n-\mu_n}{\sigma_n}\xrightarrow{D}N(0,1).

(2) If $Z_1,\dots,Z_n$ are i.i.d. exponential random variables with parameter $1$, then

Xn:=Z1++ZnX_n':=Z_1+\cdots+Z_n

has density

f(x)=xn1exΓ(n)1x0.f(x)=\frac{x^{n-1}e^{-x}}{\Gamma(n)}\mathbf{1}_{x\geq 0}.

Thus XnX_n' and XnX_n have the same distribution. By the i.i.d. CLT,

XnnnDN(0,1).\frac{X_n'-n}{\sqrt n}\xrightarrow{D}N(0,1).

Again take

μn=n,σn=n.\mu_n=n,\qquad \sigma_n=\sqrt n.

As in (1), since $X_n$ and $X_n'$ have the same law, this gives

XnμnσnDN(0,1).\frac{X_n-\mu_n}{\sigma_n}\xrightarrow{D}N(0,1).
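This sketch gives a numerical look at both normal approximations. It computes the exact distribution functions at $n+z\sqrt n$ and compares them with $\Phi(z)$. The Gamma distribution function comes from the standard identity $\mathbb{P}(Z_1+\cdots+Z_n\le x)=\mathbb{P}(N\ge n)$ with $N\sim P(x)$. The identity holds for integer $n$ and $Z_i$ i.i.d. exponential with parameter $1$. The choice $n=10000$ and the function names are illustrative.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def poisson_cdf(lam, k):
    """P(N <= k) for N ~ Poisson(lam), lam > 0; the pmf is summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    log_lam = math.log(lam)
    return sum(math.exp(-lam + j * log_lam - math.lgamma(j + 1))
               for j in range(math.floor(k) + 1))

def gamma_cdf(n, x):
    """P(X <= x) for X with density x^(n-1) e^(-x) / Gamma(n), n a positive integer.

    Uses P(Z_1 + ... + Z_n <= x) = P(N >= n) with N ~ Poisson(x) and Z_i i.i.d. Exp(1).
    """
    if x <= 0:
        return 0.0
    return 1.0 - poisson_cdf(x, n - 1)

n = 10_000
for z in (-1.0, 0.0, 1.0):
    x = n + z * math.sqrt(n)
    print(z, round(poisson_cdf(n, x), 4), round(gamma_cdf(n, x), 4), round(std_normal_cdf(z), 4))
```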
Problem: 5.3.3

Let X1,,XnX_1,\cdots,X_n be i.i.d. random variables with

P(X1=1)=P(X1=1)=12.\mathbb{P}(X_1=1)=\mathbb{P}(X_1=-1)=\frac12.

Prove that

3n3/2k=1nkXkDN(0,1).\frac{\sqrt3}{n^{3/2}}\sum_{k=1}^{n}kX_k\xrightarrow{D}N(0,1).
Proof

Let

Yn,k=kXk,1kn.Y_{n,k}=kX_k,\qquad 1\leq k\leq n.

Then {Yn,k}k=1n\{Y_{n,k}\}_{k=1}^{n} are independent, with

E[Yn,k]=0,Var(Yn,k)=k2.\mathbb{E}[Y_{n,k}]=0,\qquad \operatorname{Var}(Y_{n,k})=k^2.

Set

Bn2=k=1nVar(Yn,k)=k=1nk2=n(n+1)(2n+1)6.B_n^2=\sum_{k=1}^{n}\operatorname{Var}(Y_{n,k}) = \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}6.

Since $B_n\asymp n^{3/2}$, for every $\varepsilon>0$ and all sufficiently large $n$,

Yn,k=kn<εBn,1kn.|Y_{n,k}|=k\leq n<\varepsilon B_n,\qquad 1\leq k\leq n.

Hence

k=1nE ⁣[Yn,k2; Yn,k>εBn]=0,\sum_{k=1}^{n} \mathbb{E}\!\left[Y_{n,k}^2;\ |Y_{n,k}|>\varepsilon B_n\right]=0,

so the Lindeberg condition holds. By the Lindeberg-Feller CLT,

k=1nkXkBnDN(0,1).\frac{\sum_{k=1}^{n}kX_k}{B_n}\xrightarrow{D}N(0,1).

Since

Bnn3/2=(n+1)(2n+1)6n213,\frac{B_n}{n^{3/2}} = \sqrt{\frac{(n+1)(2n+1)}{6n^2}} \longrightarrow \frac1{\sqrt3},

Slutsky's theorem gives

3n3/2k=1nkXkDN(0,1).\frac{\sqrt3}{n^{3/2}}\sum_{k=1}^{n}kX_k\xrightarrow{D}N(0,1).
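The proof uses two facts, $\sqrt3\,B_n/n^{3/2}\to1$ and $n<\varepsilon B_n$ for large $n$, and both are easy to check numerically. In this sketch, `B(n)` is the closed form for $B_n$. The hypothetical helper `lindeberg_threshold` finds the first $n$ with $n<\varepsilon B_n$. Since $B_n/n$ increases in $n$, the inequality then holds for all larger $n$ as well.

```python
import math

def B(n):
    # B_n = sqrt(sum_{k<=n} k^2) = sqrt(n(n+1)(2n+1)/6)
    return math.sqrt(n * (n + 1) * (2 * n + 1) / 6)

def lindeberg_threshold(eps):
    # First n with max_k |Y_{n,k}| = n < eps * B_n; from then on every truncated term vanishes.
    n = 1
    while n >= eps * B(n):
        n += 1
    return n

for n in (10, 1_000, 100_000):
    print(n, math.sqrt(3) * B(n) / n ** 1.5)   # tends to 1
print(lindeberg_threshold(0.1))
```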

Exercise 5.5

Note

Slutsky's theorem lets us replace a random error by its constant probability limit. The first thing to check is whether the added or multiplied factor converges in probability to a constant.

Problem: 5.5.12

Slutsky's theorem says: if random variables {Xn}\{X_n\}, {Yn}\{Y_n\}, and {Zn}\{Z_n\} satisfy

XnDX,YnPb,ZnPc,X_n\xrightarrow{D}X,\qquad Y_n\xrightarrow{P}b,\qquad Z_n\xrightarrow{P}c,

where XX is a random variable and b,cb,c are constants, then

XnYn+ZnDbX+c.X_nY_n+Z_n\xrightarrow{D}bX+c.

Use Slutsky's theorem to answer the following questions.

(1) Let $\{X_n\}$ be i.i.d., with $\mathbb{E}[X_1]=0$ and finite, nonzero second moment. Let

X=1nk=1nXk.\overline X=\frac1n\sum_{k=1}^{n}X_k.

Prove that

k=1nXkk=1n(XkX)2DN(0,1).\frac{\sum_{k=1}^{n}X_k} {\sqrt{\sum_{k=1}^{n}(X_k-\overline X)^2}} \xrightarrow{D}N(0,1).

(2) Let $\{X_n\}$ be independent and satisfy

P(Xn=±2n)=12n+1,P(Xn=±1)=1212n+1.\mathbb{P}(X_n=\pm 2^n)=\frac1{2^{n+1}}, \qquad \mathbb{P}(X_n=\pm 1)=\frac12-\frac1{2^{n+1}}.

Prove that

1nk=1nXkDN(0,1).\frac1{\sqrt n}\sum_{k=1}^{n}X_k\xrightarrow{D}N(0,1).

(3) Let $\{X_n\}$ be i.i.d. with

E[X1]=Var(X1)=1.\mathbb{E}[X_1]=\operatorname{Var}(X_1)=1.

Set

Sn=k=1nXk.S_n=\sum_{k=1}^{n}X_k.

Prove that

Sn3/2n3/232nDN(0,1).\frac{S_n^{3/2}-n^{3/2}}{\frac32 n}\xrightarrow{D}N(0,1).
Proof

(1) Let σ2=Var(X1)\sigma^2=\operatorname{Var}(X_1). By the CLT,

k=1nXkσnDN(0,1).\frac{\sum_{k=1}^{n}X_k}{\sigma\sqrt n}\xrightarrow{D}N(0,1).

Also,

1nk=1n(XkX)2=1nk=1nXk2X2.\frac1n\sum_{k=1}^{n}(X_k-\overline X)^2 = \frac1n\sum_{k=1}^{n}X_k^2-\overline X^{\,2}.

By the weak law of large numbers,

1nk=1nXk2PE[X12]=σ2,XP0.\frac1n\sum_{k=1}^{n}X_k^2\xrightarrow{P}\mathbb{E}[X_1^2]=\sigma^2, \qquad \overline X\xrightarrow{P}0.

Therefore

1nk=1n(XkX)2Pσ2,σ1nk=1n(XkX)2P1.\frac1n\sum_{k=1}^{n}(X_k-\overline X)^2\xrightarrow{P}\sigma^2, \qquad \frac{\sigma}{\sqrt{\frac1n\sum_{k=1}^{n}(X_k-\overline X)^2}} \xrightarrow{P}1.

Slutsky's theorem gives

k=1nXkk=1n(XkX)2=k=1nXkσnσ1nk=1n(XkX)2DN(0,1).\frac{\sum_{k=1}^{n}X_k} {\sqrt{\sum_{k=1}^{n}(X_k-\overline X)^2}} = \frac{\sum_{k=1}^{n}X_k}{\sigma\sqrt n} \cdot \frac{\sigma}{\sqrt{\frac1n\sum_{k=1}^{n}(X_k-\overline X)^2}} \xrightarrow{D}N(0,1).
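Here is a small simulation of the self-normalized sum in (1). For illustration only, it assumes $X_k$ uniform on $(-3,3)$, which has mean $0$ and variance $3$. The statistic never uses $\sigma$. Over many replications, its sample mean and variance should be close to $0$ and $1$. The sample sizes and the seed are arbitrary.

```python
import math
import random
import statistics

def self_normalized_sum(xs):
    # sum_k X_k / sqrt(sum_k (X_k - Xbar)^2): no knowledge of sigma is needed
    n = len(xs)
    xbar = sum(xs) / n
    return sum(xs) / math.sqrt(sum((x - xbar) ** 2 for x in xs))

def simulate(n=200, reps=4000, seed=2):
    # Illustrative assumption: X_k ~ Uniform(-3, 3), so E X_k = 0 and Var X_k = 3
    rng = random.Random(seed)
    return [self_normalized_sum([rng.uniform(-3, 3) for _ in range(n)]) for _ in range(reps)]

vals = simulate()
print(statistics.fmean(vals), statistics.pvariance(vals))   # close to 0 and 1
```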

(2) Only the distribution of $\sum_{k\le n}X_k$ matters, so we may realize the $X_k$ as

Xk=(1Bk)εk+Bk2kηk,X_k=(1-B_k)\varepsilon_k+B_k2^k\eta_k,

where {Bk}\{B_k\}, {εk}\{\varepsilon_k\}, and {ηk}\{\eta_k\} are mutually independent, and

P(Bk=1)=2k,P(εk=±1)=P(ηk=±1)=12.\mathbb{P}(B_k=1)=2^{-k}, \qquad \mathbb{P}(\varepsilon_k=\pm1)=\mathbb{P}(\eta_k=\pm1)=\frac12.

Then XkX_k has the required distribution. Set

Tn=k=1nεk,Rn=k=1nBk(2kηkεk).T_n=\sum_{k=1}^{n}\varepsilon_k, \qquad R_n=\sum_{k=1}^{n}B_k(2^k\eta_k-\varepsilon_k).

Then

k=1nXk=Tn+Rn.\sum_{k=1}^{n}X_k=T_n+R_n.

Since

k=1P(Bk=1)=k=12k<,\sum_{k=1}^{\infty}\mathbb{P}(B_k=1) = \sum_{k=1}^{\infty}2^{-k} <\infty,

the first Borel-Cantelli lemma shows that, almost surely, $B_k=1$ for only finitely many $k$. Hence $R_n$ is eventually constant a.s., and

Rnna.s.0.\frac{R_n}{\sqrt n}\xrightarrow{\text{a.s.}}0.

By the CLT,

TnnDN(0,1).\frac{T_n}{\sqrt n}\xrightarrow{D}N(0,1).

Slutsky's theorem gives

1nk=1nXk=Tnn+RnnDN(0,1).\frac1{\sqrt n}\sum_{k=1}^{n}X_k = \frac{T_n}{\sqrt n}+\frac{R_n}{\sqrt n} \xrightarrow{D}N(0,1).
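The decomposition in (2) can be verified exactly. This sketch computes the law of $(1-B_k)\varepsilon_k+B_k2^k\eta_k$ with exact fractions and compares it with the prescribed law. It also computes $\operatorname{Var}(X_k)=1-2^{-k}+2^k$, which grows like $2^k$. The variance of $\sum_{k\le n}X_k$ is therefore of order $2^n$, not $n$. The $\sqrt n$ normalization comes from the bounded part $T_n$ instead. Function names are illustrative.

```python
from fractions import Fraction

def law_from_decomposition(k):
    """Exact law of (1 - B) * eps + B * 2**k * eta, where B, eps, eta are independent,
    P(B = 1) = 2**-k, and eps, eta are fair +-1 signs."""
    p_big = Fraction(1, 2 ** k)
    law = {}
    for b, pb in ((0, 1 - p_big), (1, p_big)):
        for e in (-1, 1):
            for h in (-1, 1):
                x = (1 - b) * e + b * 2 ** k * h
                law[x] = law.get(x, Fraction(0)) + pb / 4
    return law

def target_law(k):
    # P(X_k = +-2^k) = 1/2^(k+1) each, P(X_k = +-1) = 1/2 - 1/2^(k+1) each
    big = Fraction(1, 2 ** (k + 1))
    small = Fraction(1, 2) - big
    return {1: small, -1: small, 2 ** k: big, -(2 ** k): big}

def variance(k):
    # E[X_k] = 0, so Var(X_k) = E[X_k^2]
    return sum(p * x * x for x, p in target_law(k).items())

print(all(law_from_decomposition(k) == target_law(k) for k in range(1, 20)))
print([variance(k) for k in range(1, 6)])
```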

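(3) The power $S_n^{3/2}$ needs $S_n\ge0$. Since $S_n/n\to1$ in probability, $\mathbb{P}(S_n<0)\to0$, so the statistic is well defined with probability tending to $1$, and this does not affect convergence in distribution. Let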

Tn=Snnn,Un=Snn.T_n=\frac{S_n-n}{\sqrt n}, \qquad U_n=\frac{S_n}{n}.

By the CLT,

TnDN(0,1).T_n\xrightarrow{D}N(0,1).

By the weak law,

UnP1.U_n\xrightarrow{P}1.

Also,

Sn3/2n3/232n=Tn23Un3/21Un1.\frac{S_n^{3/2}-n^{3/2}}{\frac32 n} = T_n\cdot\frac23\cdot\frac{U_n^{3/2}-1}{U_n-1}.

Define

g(u)=23u3/21u1(u1),g(1)=1.g(u)=\frac23\cdot\frac{u^{3/2}-1}{u-1}\quad (u\neq 1), \qquad g(1)=1.

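Then $g$ is continuous at $u=1$, since $(u^{3/2}-1)/(u-1)\to3/2$ as $u\to1$. The identity above reads $\frac{S_n^{3/2}-n^{3/2}}{\frac32 n}=T_n\,g(U_n)$, where both sides vanish when $U_n=1$. By the continuous mapping theorem,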

g(Un)P1.g(U_n)\xrightarrow{P}1.

Slutsky's theorem gives

Sn3/2n3/232nDN(0,1).\frac{S_n^{3/2}-n^{3/2}}{\frac32 n}\xrightarrow{D}N(0,1).
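This is a simulation sketch of (3). For illustration it assumes $X_i$ exponential with parameter $1$: mean and variance are both $1$, and $S_n\ge0$, so $S_n^{3/2}$ is always defined. Then $S_n$ has the Gamma density from Problem 5.3.1 and can be drawn directly with `random.gammavariate(n, 1.0)`. The mean and variance of the statistic should be close to $0$ and $1$.

```python
import random
import statistics

def delta_method_samples(n, reps=20_000, seed=1):
    """Replications of (S_n^{3/2} - n^{3/2}) / (1.5 n) with X_i ~ Exp(1).

    Illustrative assumption: Exp(1) has mean 1 and variance 1 and is nonnegative,
    and S_n = X_1 + ... + X_n ~ Gamma(n, 1) is drawn in one call.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s = rng.gammavariate(n, 1.0)          # law of the sum of n i.i.d. Exp(1) variables
        out.append((s ** 1.5 - n ** 1.5) / (1.5 * n))
    return out

xs = delta_method_samples(n=400)
print(statistics.fmean(xs), statistics.pvariance(xs))   # close to 0 and 1
```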

Exercise 5.4

Note

This section moves into stronger limit theorems and Stein's method. When reading the proofs, separate weak convergence, moment bounds, and integrability.

Problem: 5.4.1

Let X1,X2,X_1,X_2,\dots be i.i.d. with

P(X1=1)=P(X1=1)=12.\mathbb{P}(X_1=1)=\mathbb{P}(X_1=-1)=\frac12.

Prove that for every δ>0\delta>0,

1n1/2+δk=1nXka.s.0.\frac1{n^{1/2+\delta}}\sum_{k=1}^{n}X_k \xrightarrow{\text{a.s.}}0.
Proof

Chebyshev's inequality alone gives the bound $\varepsilon^{-2}n^{-2\delta}$, which is summable only when $\delta>1/2$, so we use higher moments. Let $m$ be a positive integer to be chosen later, and set

Sn=k=1nXk.S_n=\sum_{k=1}^{n}X_k.

For ε>0\varepsilon>0, Markov's inequality gives

P(Snn1/2+δ>ε)=P(Snn1/2+δ2m>ε2m)ESn2mε2mnm+2mδ.\mathbb{P}\left(\left|\frac{S_n}{n^{1/2+\delta}}\right|>\varepsilon\right) = \mathbb{P}\left(\left|\frac{S_n}{n^{1/2+\delta}}\right|^{2m}>\varepsilon^{2m}\right) \leq \frac{\mathbb{E}|S_n|^{2m}}{\varepsilon^{2m}n^{m+2m\delta}}.

We estimate ESn2m\mathbb{E}|S_n|^{2m}. Expanding,

ESn2m=i1,,i2m=1nE(Xi1Xi2m).\mathbb{E}S_n^{2m} = \sum_{i_1,\dots,i_{2m}=1}^{n} \mathbb{E}(X_{i_1}\cdots X_{i_{2m}}).

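The $X_i$ are independent, $\mathbb{E}X_i^{q}=0$ for odd $q$, and $\mathbb{E}X_i^{q}=1$ for even $q$ (because $X_i=\pm1$). So a term vanishes if some index appears an odd number of times, and equals $1$ otherwise. In a nonzero term every index therefore appears at least twice, so at most $m$ distinct indices occur. The number of such terms is at most $C_mn^m$: choose at most $m$ index values, then one of finitely many position patterns. Here $C_m$ depends only on $m$. Hence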

ESn2mCmnm.\mathbb{E}S_n^{2m}\leq C_m n^m.

Therefore

P(Snn1/2+δ>ε)Cmε2mn2mδ.\mathbb{P}\left(\left|\frac{S_n}{n^{1/2+\delta}}\right|>\varepsilon\right) \leq \frac{C_m}{\varepsilon^{2m}}\,n^{-2m\delta}.

Choose mm so that

2mδ>1.2m\delta>1.

Then

n=1P(Snn1/2+δ>ε)<.\sum_{n=1}^{\infty} \mathbb{P}\left(\left|\frac{S_n}{n^{1/2+\delta}}\right|>\varepsilon\right) <\infty.

By Borel-Cantelli,

P(Snn1/2+δ>ε i.o.)=0.\mathbb{P}\left( \left|\frac{S_n}{n^{1/2+\delta}}\right|>\varepsilon \ \text{i.o.} \right)=0.

Thus, for every fixed ε>0\varepsilon>0, almost surely there is N(ω)N(\omega) such that for nN(ω)n\geq N(\omega),

Snn1/2+δε.\left|\frac{S_n}{n^{1/2+\delta}}\right|\leq \varepsilon.

Letting ε\varepsilon run over the positive rationals gives

Snn1/2+δa.s.0.\frac{S_n}{n^{1/2+\delta}}\xrightarrow{\mathrm{a.s.}}0.
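For fair signs the moment $\mathbb{E}S_n^{2m}$ can be computed exactly, which makes the bound $\mathbb{E}S_n^{2m}\le C_mn^m$ concrete. The sketch checks the closed form $\mathbb{E}S_n^4=3n^2-2n$ and the admissible choice $C_m=(2m-1)!!$. That choice is an extra fact, not needed in the proof. Every term of the expansion is $0$ or $1$, and it is dominated by the corresponding term for i.i.d. standard normal summands, whose sum has $2m$-th moment $(2m-1)!!\,n^m$. Function names are illustrative.

```python
import math
from fractions import Fraction

def even_moment(n, m):
    # S_n = 2K - n with K ~ Bin(n, 1/2); returns E[S_n^(2m)] as an exact fraction
    total = sum(math.comb(n, k) * (2 * k - n) ** (2 * m) for k in range(n + 1))
    return Fraction(total, 2 ** n)

def odd_double_factorial(m):
    # (2m - 1)!! = 1 * 3 * ... * (2m - 1), the 2m-th moment of N(0, 1)
    return math.prod(range(1, 2 * m, 2))

for m in (1, 2, 3):
    ratios = [float(even_moment(n, m) / n ** m) for n in (1, 10, 100, 1000)]
    print(m, ratios, odd_double_factorial(m))   # the ratios stay below (2m - 1)!!
```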
Problem: 5.4.4

Let {Xk}\{X_k\} be i.i.d. random variables with

EX1=0,Var(X1)=1,EX13<.\mathbb{E}X_1=0,\qquad \operatorname{Var}(X_1)=1,\qquad \mathbb{E}|X_1|^3<\infty.

Use the Lindeberg replacement method to prove the CLT convergence rate

suptRP(1nk=1nXkt)Φ(t)=O(n1/8).\sup_{t\in\mathbb{R}} \left| \mathbb{P}\left(\frac1{\sqrt n}\sum_{k=1}^{n}X_k\leq t\right) -\Phi(t) \right| = O(n^{-1/8}).

Here Φ(t)\Phi(t) is the standard normal distribution function.

Proof

Set

Sn=k=1nXk,Wn=Snn.S_n=\sum_{k=1}^{n}X_k, \qquad W_n=\frac{S_n}{\sqrt n}.

Let Y1,,YnY_1,\dots,Y_n be i.i.d. standard normal random variables, independent of X1,,XnX_1,\dots,X_n, and set

Zn=1nk=1nYk.Z_n=\frac1{\sqrt n}\sum_{k=1}^{n}Y_k.

Then ZnN(0,1)Z_n\sim N(0,1), so

P(Znt)=Φ(t).\mathbb{P}(Z_n\leq t)=\Phi(t).

Fix ε>0\varepsilon>0. Choose a smooth function ft,εC3(R)f_{t,\varepsilon}\in C^3(\mathbb R) such that

1{xt}ft,ε(x)1{xt+ε},\mathbf{1}_{\{x\leq t\}} \leq f_{t,\varepsilon}(x) \leq \mathbf{1}_{\{x\leq t+\varepsilon\}},

and

ft,ε(3)Cε3,\|f_{t,\varepsilon}^{(3)}\|_\infty\leq C\varepsilon^{-3},

where CC is independent of t,ε,nt,\varepsilon,n.
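Here is one concrete choice of $f_{t,\varepsilon}$, given as a sketch. Fix a smooth $\psi$ with $\psi=1$ on $(-\infty,0]$, $\psi=0$ on $[1,\infty)$ and $0\le\psi\le1$, built from the classical function $e^{-1/s}$. Set $f_{t,\varepsilon}(x)=\psi((x-t)/\varepsilon)$. By the chain rule, $\|f_{t,\varepsilon}^{(3)}\|_\infty=\varepsilon^{-3}\|\psi^{(3)}\|_\infty$, so $C=\|\psi^{(3)}\|_\infty$ works for all $t,\varepsilon$. The code only checks the sandwich inequality and the scaling numerically. The names are illustrative.

```python
import math

def _bump(s):
    # exp(-1/s) for s > 0, else 0: a C-infinity function whose derivatives all vanish at 0
    return math.exp(-1.0 / s) if s > 0 else 0.0

def smooth_step(s):
    """psi(s): equal to 1 for s <= 0, to 0 for s >= 1, smooth and decreasing in between."""
    a, b = _bump(1.0 - s), _bump(s)
    return a / (a + b)

def f_smooth(x, t, eps):
    # f_{t,eps}(x) = psi((x - t) / eps); its third derivative is eps**-3 * psi'''((x - t) / eps)
    return smooth_step((x - t) / eps)

t, eps = 0.3, 0.05
for x in (0.25, 0.3, 0.32, 0.35, 0.4):
    print(x, f_smooth(x, t, eps))
```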

We estimate

Eft,ε(Wn)Eft,ε(Zn).\left| \mathbb{E}f_{t,\varepsilon}(W_n) - \mathbb{E}f_{t,\varepsilon}(Z_n) \right|.

Replace XkX_k by YkY_k one at a time. Let

Tk=1n(Y1++Yk1+Xk+1++Xn).T_k= \frac1{\sqrt n} \left( Y_1+\cdots+Y_{k-1} + X_{k+1}+\cdots+X_n \right).

Then TkT_k is independent of XkX_k and YkY_k. Taylor expansion gives

ft,ε(Tk+Xkn)=ft,ε(Tk)+Xknft,ε(Tk)+Xk22nft,ε(Tk)+Rk,X,f_{t,\varepsilon}\left(T_k+\frac{X_k}{\sqrt n}\right) = f_{t,\varepsilon}(T_k) + \frac{X_k}{\sqrt n}f_{t,\varepsilon}'(T_k) + \frac{X_k^2}{2n}f_{t,\varepsilon}''(T_k) + R_{k,X},

with

Rk,Xft,ε(3)6Xk3n3/2.|R_{k,X}| \leq \frac{\|f_{t,\varepsilon}^{(3)}\|_\infty}{6} \frac{|X_k|^3}{n^{3/2}}.

Similarly,

ft,ε(Tk+Ykn)=ft,ε(Tk)+Yknft,ε(Tk)+Yk22nft,ε(Tk)+Rk,Y,f_{t,\varepsilon}\left(T_k+\frac{Y_k}{\sqrt n}\right) = f_{t,\varepsilon}(T_k) + \frac{Y_k}{\sqrt n}f_{t,\varepsilon}'(T_k) + \frac{Y_k^2}{2n}f_{t,\varepsilon}''(T_k) + R_{k,Y},

and

Rk,Yft,ε(3)6Yk3n3/2.|R_{k,Y}| \leq \frac{\|f_{t,\varepsilon}^{(3)}\|_\infty}{6} \frac{|Y_k|^3}{n^{3/2}}.

Because

EXk=EYk=0,EXk2=EYk2=1,\mathbb{E}X_k=\mathbb{E}Y_k=0, \qquad \mathbb{E}X_k^2=\mathbb{E}Y_k^2=1,

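and $T_k$ is independent of $X_k,Y_k$, the first- and second-order terms cancel after taking expectations. Only the third-order remainders are left. We absorb $\mathbb{E}|X_1|^3$ and $\mathbb{E}|Y_1|^3$ into the constant; from here on, $C$ may change from line to line but never depends on $t,\varepsilon,n$. This gives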

Eft,ε(Tk+Xkn)Eft,ε(Tk+Ykn)Cε3n3/2.\left| \mathbb{E}f_{t,\varepsilon}\left(T_k+\frac{X_k}{\sqrt n}\right) - \mathbb{E}f_{t,\varepsilon}\left(T_k+\frac{Y_k}{\sqrt n}\right) \right| \leq C\varepsilon^{-3}n^{-3/2}.

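For $k<n$ we have $T_k+Y_k/\sqrt n=T_{k+1}+X_{k+1}/\sqrt n$. Moreover $T_1+X_1/\sqrt n=W_n$ and $T_n+Y_n/\sqrt n=Z_n$. So $\mathbb{E}f_{t,\varepsilon}(W_n)-\mathbb{E}f_{t,\varepsilon}(Z_n)$ telescopes into these $n$ one-step differences. Summing over $k=1,\dots,n$ gives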

Eft,ε(Wn)Eft,ε(Zn)Cε3n1/2.\left| \mathbb{E}f_{t,\varepsilon}(W_n) - \mathbb{E}f_{t,\varepsilon}(Z_n) \right| \leq C\varepsilon^{-3}n^{-1/2}.

Therefore

P(Wnt)Eft,ε(Wn)Eft,ε(Zn)+Cε3n1/2.\mathbb{P}(W_n\leq t) \leq \mathbb{E}f_{t,\varepsilon}(W_n) \leq \mathbb{E}f_{t,\varepsilon}(Z_n) + C\varepsilon^{-3}n^{-1/2}.

Since

ft,ε(x)1{xt+ε},f_{t,\varepsilon}(x)\leq \mathbf{1}_{\{x\leq t+\varepsilon\}},

we have

Eft,ε(Zn)P(Znt+ε)=Φ(t+ε).\mathbb{E}f_{t,\varepsilon}(Z_n) \leq \mathbb{P}(Z_n\leq t+\varepsilon) = \Phi(t+\varepsilon).

Thus

P(Wnt)Φ(t)Φ(t+ε)Φ(t)+Cε3n1/2.\mathbb{P}(W_n\leq t)-\Phi(t) \leq \Phi(t+\varepsilon)-\Phi(t) + C\varepsilon^{-3}n^{-1/2}.

Since the standard normal density is bounded,

Φ(t+ε)Φ(t)Cε.\Phi(t+\varepsilon)-\Phi(t)\leq C\varepsilon.

Hence

P(Wnt)Φ(t)Cε+Cε3n1/2.\mathbb{P}(W_n\leq t)-\Phi(t) \leq C\varepsilon+C\varepsilon^{-3}n^{-1/2}.

For the other direction, choose a smooth function gt,εg_{t,\varepsilon} such that

1{xtε}gt,ε(x)1{xt},gt,ε(3)Cε3.\mathbf{1}_{\{x\leq t-\varepsilon\}} \leq g_{t,\varepsilon}(x) \leq \mathbf{1}_{\{x\leq t\}}, \qquad \|g_{t,\varepsilon}^{(3)}\|_\infty\leq C\varepsilon^{-3}.

The same Lindeberg replacement argument gives

Egt,ε(Wn)Egt,ε(Zn)Cε3n1/2.\left| \mathbb{E}g_{t,\varepsilon}(W_n) - \mathbb{E}g_{t,\varepsilon}(Z_n) \right| \leq C\varepsilon^{-3}n^{-1/2}.

Therefore

P(Wnt)Egt,ε(Wn)Egt,ε(Zn)Cε3n1/2.\mathbb{P}(W_n\leq t) \geq \mathbb{E}g_{t,\varepsilon}(W_n) \geq \mathbb{E}g_{t,\varepsilon}(Z_n) - C\varepsilon^{-3}n^{-1/2}.

Also,

Egt,ε(Zn)P(Zntε)=Φ(tε).\mathbb{E}g_{t,\varepsilon}(Z_n) \geq \mathbb{P}(Z_n\leq t-\varepsilon) = \Phi(t-\varepsilon).

So

Φ(t)P(Wnt)Φ(t)Φ(tε)+Cε3n1/2Cε+Cε3n1/2.\Phi(t)-\mathbb{P}(W_n\leq t) \leq \Phi(t)-\Phi(t-\varepsilon) + C\varepsilon^{-3}n^{-1/2} \leq C\varepsilon+C\varepsilon^{-3}n^{-1/2}.

Combining the two bounds, for every tRt\in\mathbb R,

P(Wnt)Φ(t)Cε+Cε3n1/2.\left| \mathbb{P}(W_n\leq t)-\Phi(t) \right| \leq C\varepsilon+C\varepsilon^{-3}n^{-1/2}.

Take

ε=n1/8.\varepsilon=n^{-1/8}.

Then

P(Wnt)Φ(t)Cn1/8.\left| \mathbb{P}(W_n\leq t)-\Phi(t) \right| \leq Cn^{-1/8}.

Therefore

suptRP(1nk=1nXkt)Φ(t)=O(n1/8).\sup_{t\in\mathbb R} \left| \mathbb{P}\left(\frac1{\sqrt n}\sum_{k=1}^{n}X_k\leq t\right) - \Phi(t) \right| = O(n^{-1/8}).
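The rate $n^{-1/8}$ is what this simple smoothing argument yields, not the best possible rate. For fair $\pm1$ summands the Kolmogorov distance can be computed exactly. The distribution function of $W_n$ is a step function, so its supremum distance to $\Phi$ is attained next to an atom. The sketch below (illustrative names) does this computation. In this example the distance decays roughly like $n^{-1/2}$, comfortably inside the $O(n^{-1/8})$ bound.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance_rademacher(n):
    """sup_t |P(W_n <= t) - Phi(t)| with W_n = S_n / sqrt(n), S_n a sum of n fair +-1 signs.

    The distribution function of W_n is a step function and Phi is increasing, so the
    supremum is attained next to an atom, approached from the left or from the right.
    """
    total = 2 ** n
    cdf_left = 0.0                                       # P(W_n < atom)
    worst = 0.0
    for k in range(n + 1):
        atom = (2 * k - n) / math.sqrt(n)
        phi = std_normal_cdf(atom)
        cdf_right = cdf_left + math.comb(n, k) / total   # P(W_n <= atom)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst

for n in (10, 100, 1000):
    d = kolmogorov_distance_rademacher(n)
    print(n, round(d, 4), round(n ** (-1 / 8), 4), round(d * math.sqrt(n), 3))
```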
Remark

The main idea is to approximate the indicator function by a smooth function, and then replace XiX_i by normal variables YiY_i one at a time. Since XiX_i and YiY_i have the same first two moments, the first- and second-order Taylor terms cancel. Only the third-order remainder remains. The smoothing error is O(ε)O(\varepsilon), and the replacement error is O(ε3n1/2)O(\varepsilon^{-3}n^{-1/2}). Taking ε=n1/8\varepsilon=n^{-1/8} gives the bound.

Problem: 5.4.5

Let {Xn}\{X_n\} be i.i.d. random variables with

EX1=0,EX12=1,\mathbb{E}X_1=0,\qquad \mathbb{E}X_1^2=1,

and assume that for all l3l\geq 3,

EX1l<.\mathbb{E}|X_1|^l<\infty.

Set

Sn=X1++Xn.S_n=X_1+\cdots+X_n.

Let Hk(x)H_k(x) be the kk-th Hermite polynomial, defined by

H0=1,(1)kHk(x)ϕ(x)=ϕ(k)(x),H_0=1,\qquad (-1)^kH_k(x)\phi(x)=\phi^{(k)}(x),

where ϕ\phi is the standard normal density. Prove that

limnE[Hk(Snn)]=0,k1.\lim_{n\to\infty} \mathbb{E}\left[ H_k\left(\frac{S_n}{\sqrt n}\right) \right] =0,\qquad \forall k\geq 1.
Proof

Let

Wn=Snn.W_n=\frac{S_n}{\sqrt n}.

First prove that, for every fixed positive integer jj,

limnEWnj=EZj,\lim_{n\to\infty}\mathbb{E}W_n^j = \mathbb{E}Z^j,

where ZN(0,1)Z\sim N(0,1).

Expand:

EWnj=nj/2i1,,ij=1nE(Xi1Xij).\mathbb{E}W_n^j = n^{-j/2} \sum_{i_1,\dots,i_j=1}^{n} \mathbb{E}(X_{i_1}\cdots X_{i_j}).

Since EX1=0\mathbb{E}X_1=0 and the XiX_i are independent, a term is 00 if some index appears exactly once.

Thus, in a nonzero term, every appearing index appears at least twice. If there are rr distinct indices, then

rj2.r\leq \frac j2.

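By independence and Lyapunov's inequality, each expectation $\mathbb{E}(X_{i_1}\cdots X_{i_j})$ is bounded in absolute value by $\mathbb{E}|X_1|^j<\infty$. There are $O(n^r)$ index tuples with exactly $r$ distinct indices. Hence, if $r<j/2$, the total contribution of these terms is at most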

O(nr)nj/2=o(1).O(n^r)n^{-j/2}=o(1).

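Therefore the limit can only come from $r=j/2$. This requires $j$ to be even and every appearing index to appear exactly twice. Let $j=2m$. Such index tuples are obtained by pairing the $2m$ positions and then assigning distinct indices to the pairs. Each pairing admits $n(n-1)\cdots(n-m+1)=n^m(1+o(1))$ assignments. The number of pairings is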

(2m1)!!,(2m-1)!!,

and each contribution has expectation

EX12EXm2=1.\mathbb{E}X_1^2\cdots \mathbb{E}X_m^2=1.

Hence

limnEWn2m=(2m1)!!.\lim_{n\to\infty}\mathbb{E}W_n^{2m} = (2m-1)!!.

If jj is odd, no r=j/2r=j/2 case exists, so

limnEWnj=0.\lim_{n\to\infty}\mathbb{E}W_n^j=0.

These are exactly the moments of ZN(0,1)Z\sim N(0,1). Thus

limnEWnj=EZj.\lim_{n\to\infty}\mathbb{E}W_n^j = \mathbb{E}Z^j.

Since Hk(x)H_k(x) is a degree kk polynomial, write

Hk(x)=j=0kajxj.H_k(x)=\sum_{j=0}^{k}a_jx^j.

By moment convergence,

limnEHk(Wn)=j=0kajlimnEWnj=j=0kajEZj=EHk(Z).\lim_{n\to\infty}\mathbb{E}H_k(W_n) = \sum_{j=0}^{k}a_j\lim_{n\to\infty}\mathbb{E}W_n^j = \sum_{j=0}^{k}a_j\mathbb{E}Z^j = \mathbb{E}H_k(Z).

Finally compute EHk(Z)\mathbb{E}H_k(Z). Since ZN(0,1)Z\sim N(0,1),

EHk(Z)=Hk(x)ϕ(x)dx.\mathbb{E}H_k(Z) = \int_{-\infty}^{\infty}H_k(x)\phi(x)\,dx.

By the definition of the Hermite polynomial,

Hk(x)ϕ(x)=(1)kϕ(k)(x).H_k(x)\phi(x)=(-1)^k\phi^{(k)}(x).

Therefore

EHk(Z)=(1)kϕ(k)(x)dx.\mathbb{E}H_k(Z) = (-1)^k\int_{-\infty}^{\infty}\phi^{(k)}(x)\,dx.

For k1k\geq 1,

ϕ(k)(x)dx=ϕ(k1)()ϕ(k1)()=0.\int_{-\infty}^{\infty}\phi^{(k)}(x)\,dx = \phi^{(k-1)}(\infty)-\phi^{(k-1)}(-\infty) =0.

Hence

EHk(Z)=0,k1.\mathbb{E}H_k(Z)=0,\qquad k\geq 1.

Thus

limnE[Hk(Snn)]=0,k1.\lim_{n\to\infty} \mathbb{E}\left[ H_k\left(\frac{S_n}{\sqrt n}\right) \right] =0,\qquad \forall k\geq 1.
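This is an exact check of the two ingredients, assuming fair $\pm1$ summands (which have all moments). The Hermite polynomials come from the recurrence $H_{k+1}=xH_k-kH_{k-1}$, which agrees with the definition $(-1)^kH_k\phi=\phi^{(k)}$ used above. The expectation $\mathbb{E}H_k(W_n)$ is evaluated with exact fractions. For example, $\mathbb{E}H_4(W_n)=\mathbb{E}W_n^4-6\,\mathbb{E}W_n^2+3=-2/n$, which indeed tends to $0$. Function names are illustrative.

```python
import math
from fractions import Fraction

def hermite_coeffs(k):
    """Coefficients [a_0, ..., a_k] of H_k, using H_0 = 1, H_1 = x, H_{j+1} = x H_j - j H_{j-1}."""
    if k == 0:
        return [1]
    prev, cur = [1], [0, 1]
    for j in range(1, k):
        nxt = [0] + cur                    # x * H_j
        for i, c in enumerate(prev):       # minus j * H_{j-1}
            nxt[i] -= j * c
        prev, cur = cur, nxt
    return cur

def normal_moment(j):
    # E Z^j for Z ~ N(0, 1): 0 for odd j, (j - 1)!! for even j
    return 0 if j % 2 else math.prod(range(1, j, 2))

def rademacher_moment(n, j):
    """Exact E[W_n^j] for W_n = S_n / sqrt(n), S_n a sum of n fair +-1 signs.

    Odd moments vanish by symmetry; for even j, E W_n^j = E S_n^j / n^(j/2) is rational.
    """
    if j % 2:
        return Fraction(0)
    total = sum(math.comb(n, k) * (2 * k - n) ** j for k in range(n + 1))
    return Fraction(total, 2 ** n * n ** (j // 2))

def expect_hermite(k, moment):
    # E H_k(V) = sum_j a_j E V^j, where moment(j) returns E V^j
    return sum(a * moment(j) for j, a in enumerate(hermite_coeffs(k)))

for n in (10, 100, 1000):
    print(n, [float(expect_hermite(k, lambda j: rademacher_moment(n, j))) for k in range(1, 7)])
```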
Remark

The idea is to first show that the fixed moments of Sn/nS_n/\sqrt n converge to the moments of a standard normal variable. In the moment expansion, because EXi=0\mathbb{E}X_i=0, only terms with paired indices survive in the limit. Those are exactly the normal moments. Since HkH_k is a polynomial, moment convergence gives EHk(Sn/n)EHk(Z)\mathbb{E}H_k(S_n/\sqrt n)\to\mathbb{E}H_k(Z). Finally, Hermite polynomials satisfy EHk(Z)=0\mathbb{E}H_k(Z)=0 under the standard normal law for k1k\geq 1.

Problem: 5.4.8

(Stein's method) Prove that

XN(0,1)X\sim N(0,1)

if and only if for every bounded continuous function gg with bounded continuous derivative gg',

E[Xg(X)]=E[g(X)].\mathbb{E}[Xg(X)]=\mathbb{E}[g'(X)].

Hint: for ZN(0,1)Z\sim N(0,1) and bounded continuous hh, construct

g0(x)=ex2/2xey2/2(h(y)Eh(Z))dy.g_0(x) = e^{x^2/2} \int_{-\infty}^{x} e^{-y^2/2}\bigl(h(y)-\mathbb{E}h(Z)\bigr)\,dy.
Proof

First prove necessity. If XN(0,1)X\sim N(0,1), its density is

ϕ(x)=12πex2/2.\phi(x)=\frac1{\sqrt{2\pi}}e^{-x^2/2}.

Since

ϕ(x)=xϕ(x),\phi'(x)=-x\phi(x),

we have

E[Xg(X)]=xg(x)ϕ(x)dx=g(x)ϕ(x)dx.\mathbb{E}[Xg(X)] = \int_{-\infty}^{\infty}xg(x)\phi(x)\,dx = -\int_{-\infty}^{\infty}g(x)\phi'(x)\,dx.

Integrating by parts,

g(x)ϕ(x)dx=[g(x)ϕ(x)]+g(x)ϕ(x)dx.-\int_{-\infty}^{\infty}g(x)\phi'(x)\,dx = -[g(x)\phi(x)]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}g'(x)\phi(x)\,dx.

The boundary term is 00 because gg is bounded and ϕ(x)0\phi(x)\to 0. Hence

E[Xg(X)]=E[g(X)].\mathbb{E}[Xg(X)]=\mathbb{E}[g'(X)].

Now prove sufficiency. Suppose that for every bounded continuous gg with bounded continuous gg',

E[Xg(X)]=E[g(X)].\mathbb{E}[Xg(X)]=\mathbb{E}[g'(X)].

We show that XN(0,1)X\sim N(0,1).

Let ZN(0,1)Z\sim N(0,1). For any bounded continuous hh, define

g0(x)=ex2/2xey2/2(h(y)Eh(Z))dy.g_0(x) = e^{x^2/2} \int_{-\infty}^{x} e^{-y^2/2}\bigl(h(y)-\mathbb{E}h(Z)\bigr)\,dy.

Since

ey2/2(h(y)Eh(Z))dy=0,\int_{-\infty}^{\infty} e^{-y^2/2}\bigl(h(y)-\mathbb{E}h(Z)\bigr)\,dy =0,

we may also write

g0(x)=ex2/2xey2/2(h(y)Eh(Z))dy.g_0(x) = -e^{x^2/2} \int_{x}^{\infty} e^{-y^2/2}\bigl(h(y)-\mathbb{E}h(Z)\bigr)\,dy.

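For $x>0$, apply the Gaussian tail bound $\int_x^\infty e^{-y^2/2}\,dy\le e^{-x^2/2}/x$ to the second expression for $g_0$. For $x<0$, apply the symmetric bound to the first expression. This shows that $g_0$ is bounded and continuous and that $xg_0(x)$ is bounded, so $\mathbb{E}[Xg_0(X)]$ is finite. By the identity derived next, $g_0'$ is then bounded and continuous as well.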

Differentiating,

g0(x)=xex2/2xey2/2(h(y)Eh(Z))dy+h(x)Eh(Z).g_0'(x) = x e^{x^2/2} \int_{-\infty}^{x} e^{-y^2/2}\bigl(h(y)-\mathbb{E}h(Z)\bigr)\,dy + h(x)-\mathbb{E}h(Z).

Thus

g0(x)=xg0(x)+h(x)Eh(Z),g_0'(x)=xg_0(x)+h(x)-\mathbb{E}h(Z),

or

g0(x)xg0(x)=h(x)Eh(Z).g_0'(x)-xg_0(x)=h(x)-\mathbb{E}h(Z).

Using the assumption with g=g0g=g_0,

E[Xg0(X)]=E[g0(X)].\mathbb{E}[Xg_0(X)]=\mathbb{E}[g_0'(X)].

Hence

E[g0(X)Xg0(X)]=0.\mathbb{E}\bigl[g_0'(X)-Xg_0(X)\bigr]=0.

By the Stein equation,

g0(X)Xg0(X)=h(X)Eh(Z).g_0'(X)-Xg_0(X)=h(X)-\mathbb{E}h(Z).

Therefore

Eh(X)Eh(Z)=0.\mathbb{E}h(X)-\mathbb{E}h(Z)=0.

So

Eh(X)=Eh(Z)\mathbb{E}h(X)=\mathbb{E}h(Z)

for every bounded continuous hh. Thus XX and ZZ have the same distribution, and

XN(0,1).X\sim N(0,1).

This proves the equivalence.
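This is a numerical illustration of the identity $\mathbb{E}[Zg(Z)]=\mathbb{E}[g'(Z)]$ with the test function $g=\sin$, which is bounded with bounded continuous derivative $\cos$. Both sides equal $e^{-1/2}$. The expectation is approximated by the trapezoidal rule on $[-12,12]$. This is an illustrative choice that is very accurate for such smooth, rapidly decaying integrands. For a fair $\pm1$ variable the identity fails, as the last line shows.

```python
import math

def normal_expectation(func, half_width=12.0, steps=24_000):
    """E[func(Z)] for Z ~ N(0, 1) by the trapezoidal rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

g, g_prime = math.sin, math.cos   # bounded, with bounded continuous derivative
lhs = normal_expectation(lambda x: x * g(x))
rhs = normal_expectation(g_prime)
print(lhs, rhs, math.exp(-0.5))   # all three agree for the normal law

# For a fair +-1 sign X the identity fails: E[X sin X] = sin 1, while E[cos X] = cos 1.
print(math.sin(1.0), math.cos(1.0))
```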

Remark

The main point is the basic Stein characterization

XN(0,1)E[Xg(X)]=E[g(X)]X\sim N(0,1) \quad\Longleftrightarrow\quad \mathbb{E}[Xg(X)]=\mathbb{E}[g'(X)]

for a large enough class of test functions gg.

The necessary part comes from the special identity for the normal density,

ϕ(x)=xϕ(x),\phi'(x)=-x\phi(x),

which lets us turn E[Xg(X)]\mathbb{E}[Xg(X)] into E[g(X)]\mathbb{E}[g'(X)] by integration by parts.

For sufficiency, we want to prove that XX and a standard normal ZZ have the same distribution. It is enough to show that for every bounded continuous hh,

Eh(X)=Eh(Z).\mathbb{E}h(X)=\mathbb{E}h(Z).

The constructed function g0g_0 solves the Stein equation

g0(x)xg0(x)=h(x)Eh(Z).g_0'(x)-xg_0(x)=h(x)-\mathbb{E}h(Z).

Putting g0g_0 into the assumed identity gives

Eh(X)=Eh(Z).\mathbb{E}h(X)=\mathbb{E}h(Z).

So XX must be standard normal.

There are similar Stein characterizations for other classical distributions, such as the exponential and Poisson distributions. For example:

(Stein characterization of the exponential distribution) Let λ>0\lambda>0, and let WW be a continuous random variable supported on (0,)(0,\infty) with density qq. Under suitable regularity conditions, prove that

WExp(λ)W\sim \operatorname{Exp}(\lambda)

if and only if for every fCc1(0,)f\in C_c^1(0,\infty),

Ef(W)=λEf(W).\mathbb{E}f'(W)=\lambda\mathbb{E}f(W).

Extension: From the χ² distribution to the Wishart distribution

Reading map

The goal here is not to add one or two more named distributions to an already long list. What matters is the following path:

normal sampleorthogonal decompositionsum of squares / sum of outer products.\text{normal sample} \quad\Longrightarrow\quad \text{orthogonal decomposition} \quad\Longrightarrow\quad \text{sum of squares / sum of outer products}.

In one dimension, this path gives the χ2\chi^2 distribution and explains why Xˉ\bar X and s2s^2 are independent. In several dimensions, what should replace it?

1. One dimension: a sum of squares loses one direction

First recall a small fact from the course notes. If

Z1,,ZνiidN(0,1),Z_1,\ldots,Z_\nu\stackrel{\mathrm{iid}}{\sim}N(0,1),

then

i=1νZi2χν2.\sum_{i=1}^{\nu}Z_i^2\sim \chi^2_\nu.

This is the squared length of a standard normal vector in Rν\mathbb R^\nu. Equivalently, χν2\chi^2_\nu is the sum of ν\nu independent squared standard normals, or the square of a random radius.

Now recall the normal-sample theorem. If

X1,,XniidN(μ,σ2),s2=1n1i=1n(XiXˉ)2,X_1,\ldots,X_n\stackrel{\mathrm{iid}}{\sim}N(\mu,\sigma^2), \qquad s^2=\frac1{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2,

then

Xˉ ⁣ ⁣ ⁣s2,(n1)s2σ2χn12.\bar X\perp\!\!\!\perp s^2, \qquad \frac{(n-1)s^2}{\sigma^2}\sim \chi^2_{n-1}.

The $n-1$ here has a geometric meaning. The vector

(X1μ,,Xnμ)(X_1-\mu,\ldots,X_n-\mu)

lives in an nn-dimensional space, but the sample mean uses the special direction

span{(1,,1)}.\operatorname{span}\{(1,\ldots,1)\}.

After subtracting Xˉ\bar X, the residual vector

(X1Xˉ,,XnXˉ)(X_1-\bar X,\ldots,X_n-\bar X)

is orthogonal to that direction, so it lives in an $(n-1)$-dimensional subspace. Its squared length, divided by $\sigma^2$, has the $\chi^2_{n-1}$ distribution.

Two Gaussian facts are being used here. First, a standard normal vector is unchanged in distribution under orthogonal rotations. Second, for Gaussian vectors, orthogonal components are independent, not merely uncorrelated. The second fact is special to the Gaussian setting.

So the geometric source of n1n-1 is simple: estimating the mean uses one direction in the sample space.

2. What changes in p dimensions?

Now replace each scalar observation by a pp-dimensional vector. We often use pp for dimension, especially in high-dimensional problems. Let

Y1,,YνiidNp(0,Σ).Y_1,\ldots,Y_\nu\stackrel{\mathrm{iid}}{\sim}N_p(0,\Sigma).

For vectors, the natural analogue of a square is not Yi2Y_i^2, but the outer product

YiYi.Y_iY_i^\top.

Thus the matrix version of a sum of squares is

W=i=1νYiYi.W=\sum_{i=1}^{\nu}Y_iY_i^\top.

We write

WWp(Σ,ν),W\sim W_p(\Sigma,\nu),

and say that WW has a Wishart distribution with scale matrix Σ\Sigma and ν\nu degrees of freedom.

This is the same idea as the χν2\chi^2_\nu distribution. When p=1p=1, each YiY_i is just a scalar with YiN(0,σ2)Y_i\sim N(0,\sigma^2), so

W=i=1νYi2=σ2i=1νZi2σ2χν2.W=\sum_{i=1}^{\nu}Y_i^2 =\sigma^2\sum_{i=1}^{\nu}Z_i^2 \sim \sigma^2\chi^2_\nu.

In this sense, the Wishart distribution is a matrix-valued extension of the χ2\chi^2 distribution.

For example, when p=2p=2,

W=(iYi12iYi1Yi2iYi1Yi2iYi22).W= \begin{pmatrix} \sum_i Y_{i1}^2 & \sum_i Y_{i1}Y_{i2}\\ \sum_i Y_{i1}Y_{i2} & \sum_i Y_{i2}^2 \end{pmatrix}.

The diagonal entries record sums of squares in each coordinate. The off-diagonal entries record cross-products between coordinates. A χ2\chi^2 variable keeps only length; a Wishart matrix also keeps the relationships between directions.

3. Main theorem: the sample covariance matrix is Wishart

Let $\Sigma$ be positive definite, and let

X1,,XniidNp(μ,Σ),X_1,\ldots,X_n\stackrel{\mathrm{iid}}{\sim}N_p(\mu,\Sigma),

and define the sample mean vector and sample covariance matrix by

Xˉ=1ni=1nXi,S=1n1i=1n(XiXˉ)(XiXˉ).\bar X=\frac1n\sum_{i=1}^{n}X_i, \qquad S=\frac1{n-1}\sum_{i=1}^{n}(X_i-\bar X)(X_i-\bar X)^\top.

Then

(n1)SWp(Σ,n1),Xˉ ⁣ ⁣ ⁣S.(n-1)S\sim W_p(\Sigma,n-1), \qquad \bar X\perp\!\!\!\perp S.

If $n-1<p$, this distribution is singular: the rank of $(n-1)S$ is at most $n-1$, so the matrix cannot be positive definite. The construction above still makes sense. The usual density on the positive definite cone requires $\nu>p-1$; for an integer number of degrees of freedom this means $\nu\ge p$.

This is exactly the multivariate version of

Xˉ ⁣ ⁣ ⁣s2,(n1)s2σ2χn12.\bar X\perp\!\!\!\perp s^2, \qquad \frac{(n-1)s^2}{\sigma^2}\sim\chi^2_{n-1}.

In one dimension, after removing the sample mean, the residual sum of squares is χ2\chi^2. In several dimensions, after removing the sample mean vector, the residual sum of outer products is Wishart.

| One-dimensional normal sample | Multivariate normal sample |
|---|---|
| square $(X_i-\bar X)^2$ | outer product $(X_i-\bar X)(X_i-\bar X)^\top$ |
| sum of squares | sum of outer products |
| $\chi^2_{n-1}$ | $W_p(\Sigma,n-1)$ |
| $\bar X\perp\!\!\!\perp s^2$ | $\bar X\perp\!\!\!\perp S$ |

4. Proof: rotate the sample space

We do not start from the Wishart density. The density is useful, but if it is the first thing you see, Wishart may look like a pile of determinants and traces. Orthogonal decomposition is a better first view.

Put the data into an n×pn\times p matrix

X=(X1Xn),1n=(1,,1).X= \begin{pmatrix} X_1^\top\\ \vdots\\ X_n^\top \end{pmatrix}, \qquad \mathbf 1_n=(1,\ldots,1)^\top.

Choose an n×nn\times n orthogonal matrix HH whose first row is

1n1n.\frac1{\sqrt n}\mathbf 1_n^\top.

Define the standardized data matrix

Z=(X1nμ)Σ1/2.Z=(X-\mathbf 1_n\mu^\top)\Sigma^{-1/2}.

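The rows of $Z$ are independent $N_p(0,I_p)$ random vectors, so the entries of $Z$ are i.i.d. $N(0,1)$. Each column of $Z$ is therefore an $N_n(0,I_n)$ vector. Left multiplication by the orthogonal matrix $H$ maps it to another $N_n(0,I_n)$ vector and keeps the columns independent; it only rotates the sample-index direction. Hence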

U=HZU=HZ

again has independent Np(0,Ip)N_p(0,I_p) rows. Write the jj-th row as uju_j^\top, where ujRpu_j\in\mathbb R^p.

The first row is exactly the mean direction:

u1=1n1nZ=n(Xˉμ)Σ1/2.u_1^\top =\frac1{\sqrt n}\mathbf 1_n^\top Z =\sqrt n\,(\bar X-\mu)^\top\Sigma^{-1/2}.

Thus u1u_1 contains the information in Xˉ\bar X.

The remaining rows u2,,unu_2^\top,\ldots,u_n^\top are residual directions. Let

P0=1n1n1n,P1=InP0.P_0=\frac1n\mathbf 1_n\mathbf 1_n^\top, \qquad P_1=I_n-P_0.

Here P0P_0 is the projection onto the mean direction, and P1P_1 is the projection onto its orthogonal complement. Since the first row of HH is 1n/n\mathbf 1_n^\top/\sqrt n,

P1=H(000In1)H.P_1 =H^\top \begin{pmatrix} 0&0\\ 0&I_{n-1} \end{pmatrix} H.

Now compute the residual sum of outer products:

(n1)S=(X1nXˉ)(X1nXˉ)=XP1X=(X1nμ)P1(X1nμ)=Σ1/2ZP1ZΣ1/2=Σ1/2(j=2nujuj)Σ1/2=j=2n(Σ1/2uj)(Σ1/2uj).\begin{aligned} (n-1)S &=(X-\mathbf 1_n\bar X^\top)^\top(X-\mathbf 1_n\bar X^\top)\\ &=X^\top P_1X\\ &=(X-\mathbf 1_n\mu^\top)^\top P_1(X-\mathbf 1_n\mu^\top)\\ &=\Sigma^{1/2}Z^\top P_1Z\Sigma^{1/2}\\ &=\Sigma^{1/2}\left(\sum_{j=2}^{n}u_ju_j^\top\right)\Sigma^{1/2}\\ &=\sum_{j=2}^{n}(\Sigma^{1/2}u_j)(\Sigma^{1/2}u_j)^\top. \end{aligned}

The third line uses P11n=0P_1\mathbf 1_n=0: the residual projection removes any constant mean direction.

The last line is a sum of n1n-1 independent outer products of Np(0,Σ)N_p(0,\Sigma) vectors. Therefore

(n1)SWp(Σ,n1).(n-1)S\sim W_p(\Sigma,n-1).

Also, Xˉ\bar X depends only on u1u_1, while SS depends only on u2,,unu_2,\ldots,u_n. These vectors are independent, so

Xˉ ⁣ ⁣ ⁣S.\bar X\perp\!\!\!\perp S.
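The key algebraic identity in the proof can be checked on arbitrary data. This sketch takes the Helmert matrix as one concrete choice of $H$. Its first row is $\mathbf 1_n^\top/\sqrt n$. Row $j\ge2$ has $j-1$ entries $1/\sqrt{j(j-1)}$, followed by $-(j-1)/\sqrt{j(j-1)}$ and then zeros. The sketch applies $H$ to the raw data matrix. Rows $2,\dots,n$ of $H$ are orthogonal to $\mathbf 1_n$, so the $j$-th row of $HX$ is $(\Sigma^{1/2}u_j)^\top$. The sum of their outer products should therefore reproduce $(n-1)S$ exactly, without knowledge of $\mu$ or $\Sigma$. Plain lists keep it dependency-free.

```python
import math
import random

def helmert(n):
    """n x n orthogonal matrix with first row (1, ..., 1)/sqrt(n) (the Helmert matrix)."""
    rows = [[1 / math.sqrt(n)] * n]
    for j in range(2, n + 1):
        c = 1 / math.sqrt(j * (j - 1))
        rows.append([c] * (j - 1) + [-(j - 1) * c] + [0.0] * (n - j))
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def centered_scatter(x):
    """(n - 1) S = sum_i (x_i - xbar)(x_i - xbar)^T, where the x_i are the rows of x."""
    n, p = len(x), len(x[0])
    xbar = [sum(row[c] for row in x) / n for c in range(p)]
    d = [[row[c] - xbar[c] for c in range(p)] for row in x]
    return matmul(transpose(d), d)

def residual_outer_products(x):
    """sum_{j >= 2} v_j v_j^T, where v_j^T is the j-th row of H x (the first row is dropped)."""
    v = matmul(helmert(len(x)), x)
    return matmul(transpose(v[1:]), v[1:])

rng = random.Random(0)
x = [[rng.gauss(2.0, 1.0), rng.gauss(-1.0, 3.0)] for _ in range(6)]   # any data matrix works
print(centered_scatter(x))
print(residual_outer_products(x))   # same matrix up to rounding
```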

This already contains the main idea of Cochran's theorem. Here is the same argument in the projection-matrix form used in multivariate statistics.

Why the degrees of freedom are n-1, not n-p

Subtracting Xˉ\bar X removes one direction in the sample-index space, namely (1,,1)(1,\ldots,1). It does not remove pp directions. Each remaining residual direction is still a full pp-dimensional vector. Hence the Wishart degrees of freedom are n1n-1.

5. Cochran's theorem: a projection still gives Wishart

The more general form is this. Above we only projected away the mean direction. Cochran's theorem says that any symmetric idempotent projection cuts out a Wishart piece from a normal data matrix.

Theorem: Cochran's theorem

Let

z1,,zmiidNp(0,Σ),Z=(z1zm).z_1,\ldots,z_m\stackrel{\mathrm{iid}}{\sim}N_p(0,\Sigma), \qquad Z= \begin{pmatrix} z_1^\top\\ \vdots\\ z_m^\top \end{pmatrix}.

If PP is an m×mm\times m symmetric idempotent matrix and r=rank(P)r=\operatorname{rank}(P), then

ZPZWp(Σ,r),Z(ImP)ZWp(Σ,mr),Z^\top PZ\sim W_p(\Sigma,r), \qquad Z^\top(I_m-P)Z\sim W_p(\Sigma,m-r),

and the two random matrices are independent.

More generally, if P1,,PkP_1,\ldots,P_k are pairwise orthogonal symmetric idempotent matrices and a=1kPa=Im\sum_{a=1}^kP_a=I_m, then

ZPaZWp(Σ,rank(Pa)),a=1,,k,Z^\top P_aZ\sim W_p(\Sigma,\operatorname{rank}(P_a)), \qquad a=1,\ldots,k,

and these matrices are independent.

Proof

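We prove the statement for one projection $P$. For several projections, the same argument works with an orthogonal $H$ whose rows run through orthonormal bases of the ranges of $P_1,\dots,P_k$ in turn. Since $P$ is symmetric and idempotent, it is the orthogonal projection onto an $r$-dimensional subspace. Hence there is an orthogonal matrix $H$ such that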

P=H(Ir000)H,ImP=H(000Imr)H.P = H^\top \begin{pmatrix} I_r&0\\ 0&0 \end{pmatrix} H, \qquad I_m-P = H^\top \begin{pmatrix} 0&0\\ 0&I_{m-r} \end{pmatrix} H.

Set Y=HZY=HZ. Left multiplication by an orthogonal matrix only rotates the sample-index direction, so the rows of YY are still independent Np(0,Σ)N_p(0,\Sigma) vectors. Split the rows as

Y=(Y1Y2),Y1Rr×p,Y2R(mr)×p.Y= \begin{pmatrix} Y_1\\ Y_2 \end{pmatrix}, \qquad Y_1\in\mathbb R^{r\times p},\quad Y_2\in\mathbb R^{(m-r)\times p}.

Then

ZPZ=Y1Y1,Z(ImP)Z=Y2Y2.Z^\top PZ=Y_1^\top Y_1, \qquad Z^\top(I_m-P)Z=Y_2^\top Y_2.

The blocks Y1Y_1 and Y2Y_2 use disjoint normal rows, so they are independent. By the definition of the Wishart distribution, Y1Y1Wp(Σ,r)Y_1^\top Y_1\sim W_p(\Sigma,r) and Y2Y2Wp(Σ,mr)Y_2^\top Y_2\sim W_p(\Sigma,m-r).

For the sample covariance matrix, take

P=In1n1n1n.P=I_n-\frac1n\mathbf 1_n\mathbf 1_n^\top.

This is a projection matrix with rank n1n-1, and P1n=0P\mathbf 1_n=0. Therefore

(n1)S=XPX=(X1nμ)P(X1nμ)Wp(Σ,n1).(n-1)S =X^\top PX =(X-\mathbf 1_n\mu^\top)^\top P(X-\mathbf 1_n\mu^\top) \sim W_p(\Sigma,n-1).

This gives a compact proof that the sample covariance matrix has a Wishart distribution. It is shorter than rotating the sample space by hand, but the geometry is the same: the projection splits the sample-index space, and each piece contributes a sum of outer products.

6. A note on the Wishart density

If ν>p1\nu>p-1 and WW is positive definite, the Wishart density is

f(W)=W(νp1)/2exp{12tr(Σ1W)}2νp/2Σν/2Γp(ν/2),W>0,f(W) = \frac{ |W|^{(\nu-p-1)/2} \exp\left\{-\frac12\operatorname{tr}(\Sigma^{-1}W)\right\} }{ 2^{\nu p/2}|\Sigma|^{\nu/2}\Gamma_p(\nu/2) }, \qquad W>0,

where the multivariate Gamma function is

Γp(a)=πp(p1)/4j=1pΓ(aj12).\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p}\Gamma\left(a-\frac{j-1}{2}\right).

This formula is useful, but it is not the friendliest first encounter with Wishart. Its derivation needs a Jacobian calculation on the cone of positive definite matrices, which is rather technical, so we leave it aside here.
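Here is a sketch of the density formula in code. It assumes small positive definite matrices given as nested lists and $\nu>p-1$. The determinant and $\Sigma^{-1}W$ come from a plain Gauss-Jordan elimination, written as a hypothetical helper rather than a library call. The check confirms that for $p=1$ and $\Sigma=(\sigma^2)$ the formula reduces to the density of $\sigma^2\chi^2_\nu$, matching the one-dimensional case.

```python
import math

def log_multigamma(a, p):
    # log Gamma_p(a) = p(p-1)/4 * log(pi) + sum_{j=1}^{p} log Gamma(a - (j-1)/2)
    return (p * (p - 1) / 4 * math.log(math.pi)
            + sum(math.lgamma(a - (j - 1) / 2) for j in range(1, p + 1)))

def det_and_solve(a, b):
    """Return (det(a), a^{-1} b) by Gauss-Jordan elimination with partial pivoting.

    Intended for small, nonsingular matrices given as nested lists.
    """
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(n):                  # clear column `col` in every other row
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return det, [[m[i][j] / m[i][i] for j in range(n, len(m[i]))] for i in range(n)]

def wishart_logpdf(w, sigma, nu):
    """log f(W) for W ~ W_p(Sigma, nu), with W and Sigma positive definite and nu > p - 1."""
    p = len(w)
    det_w, _ = det_and_solve(w, w)              # only the determinant is used here
    det_s, sinv_w = det_and_solve(sigma, w)     # sinv_w = Sigma^{-1} W
    trace = sum(sinv_w[i][i] for i in range(p))
    return ((nu - p - 1) / 2 * math.log(det_w) - trace / 2
            - nu * p / 2 * math.log(2) - nu / 2 * math.log(det_s)
            - log_multigamma(nu / 2, p))

def chi2_logpdf(x, nu):
    # density of chi^2_nu: x^(nu/2 - 1) e^(-x/2) / (2^(nu/2) Gamma(nu/2))
    return (nu / 2 - 1) * math.log(x) - x / 2 - nu / 2 * math.log(2) - math.lgamma(nu / 2)

# p = 1: W_1(sigma^2, nu) should be the law of sigma^2 * chi^2_nu
s2, nu, w = 2.0, 3, 1.5
print(wishart_logpdf([[w]], [[s2]], nu), chi2_logpdf(w / s2, nu) - math.log(s2))
print(wishart_logpdf([[2.0, 0.3], [0.3, 1.0]], [[1.0, 0.0], [0.0, 1.0]], nu=5))
```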

Summary

In a one-dimensional normal sample, projecting away the mean direction leaves n1n-1 independent Gaussian residual directions, and their squared length gives χn12\chi^2_{n-1}. The Wishart theorem is the vector version of the same statement: square becomes outer product, variance becomes covariance matrix, and Xˉs2\bar X\perp s^2 becomes XˉS\bar X\perp S. Cochran's theorem replaces the mean direction by a general orthogonal projection.

End-of-chapter check
  • The original problems and solutions in this chapter come from the corresponding TeX source files.
  • You can first read only the problem boxes, write down the main identities, and then open the proof or solution.
  • If a conclusion uses independence, countable additivity, a change-of-variables formula, or a moment condition, it is worth marking that point explicitly.