Third recitation

Jieyang Hu

Contents
Reading guide
  • This chapter covers discrete random variables, expectation, conditional expectation, the probabilistic method, and a midterm review.
  • Linearity, indicator decompositions, and conditional distributions appear again and again.
  • The parts on random walks, Catalan numbers, and distribution functions are worth reading slowly.

Tip. When a random quantity looks complicated, first try to write it as a sum of indicator variables. Then decide whether independence is needed.

Exercise 2.1

Note

This section is mainly about probability mass functions, mixture distributions, and symmetry. Condition first, then sum. Independence is often used to pass symmetry from the marginals to the sum.

Problem: 2.1.1

Roll a fair die first. After seeing the result, toss that many fair coins. Let XX be the number of heads. Find the probability mass function of XX.

Proof

Let NN be the die result. Then the possible values of XX are 0,1,,60,1,\cdots,6, and

P(X=x)=n=x6P(X=xN=n)P(N=n)=16n=x6(nx)2n.\begin{aligned} \mathbb{P} (X=x) &=\sum_{n=x}^6 \mathbb{P} (X=x \mid N=n) \mathbb{P} (N=n)\\ &=\frac{1}{6}\sum_{n=x}^6 \binom{n}{x}2^{-n}. \end{aligned}
Problem: 2.1.2

The number of new posts in a unit time interval follows a Poisson distribution with parameter λ\lambda. The numbers of new posts in disjoint time intervals are independent. Find the probability of seeing kk new posts in two unit intervals.

Proof

Let XX be the number of posts in two unit intervals. Then

P(X=k)=m+n=kλmm!eλλnn!eλ=λke2λk!m=0k(km)=(2λ)ke2λk!.\mathbb{P} (X=k) =\sum_{m+n=k}\frac{\lambda^m}{m!}e^{-\lambda } \frac{\lambda^n}{n!}e^{-\lambda } =\frac{\lambda ^k e^{-2\lambda }}{k!}\sum_{m=0}^k \binom{k}{m} =\frac{(2\lambda )^k e^{-2\lambda }}{k!}.
Problem: 2.1.3

Let X1,,XnX_1,\cdots,X_n be independent discrete random variables, each symmetric about 00, meaning that XiX_i and Xi-X_i have the same probability mass function. Prove that for every xRx\in\mathbb{R},

P(Snx)=P(Snx),\mathbb{P}(S_n \ge x) = \mathbb{P}(S_n \le -x),

where

Sn=X1++Xn.S_n = X_1 + \cdots + X_n.

If independence is removed, does the conclusion still hold? Explain.

Proof
P(Snx)=x1++xnxP(X1=x1,,Xn=xn)=x1++xnxP(X1=x1)P(Xn=xn)=x1++xnxP(X1=x1)P(Xn=xn)=x1++xnxP(X1=x1,,Xn=xn)=P(Snx).\begin{aligned} \mathbb{P} (S_n\geq x) &=\sum_{x_{1}+\cdots +x_n\geq x }\mathbb{P} (X_1= x_1,\cdots, X_n=x_n)\\ &= \sum_{x_{1}+\cdots +x_n\geq x }\mathbb{P} (X_1= x_1)\cdots\mathbb{P} ( X_n=x_n)\\ &= \sum_{x_{1}+\cdots +x_n\geq x }\mathbb{P} (X_1= -x_1)\cdots\mathbb{P} ( X_n=-x_n)\\ &=\sum_{x_{1}+\cdots +x_n\leq -x }\mathbb{P} (X_1= x_1,\cdots, X_n=x_n)\\ &=\mathbb{P} (S_n \leq -x). \end{aligned}

Without independence, the conclusion can fail. Take n=2n=2 and let

P((X1,X2)=(1,0))=P((X1,X2)=(0,1))=P((X1,X2)=(1,1))=13.\mathbb{P}\bigl((X_1,X_2)=(-1,0)\bigr) =\mathbb{P}\bigl((X_1,X_2)=(0,-1)\bigr) =\mathbb{P}\bigl((X_1,X_2)=(1,1)\bigr) =\frac{1}{3}.

Then both X1X_1 and X2X_2 are uniform on {1,0,1}\{-1,0,1\}, so each is symmetric about 00. But

P(S22)=130=P(S22).\mathbb{P}(S_2\geq 2)=\frac{1}{3}\neq 0=\mathbb{P}(S_2\leq -2).
Problem: 2.1.4

Let XX have probability mass function P(X=xk)=pk\mathbb{P}(X=x_k)=p_k, k=1,2,,nk=1,2,\cdots,n. The Shannon entropy is

H(X)=k=1npklnpk.H(X)=-\sum_{k=1}^n p_k\ln p_k.

For fixed nn, which distribution of XX maximizes H(X)H(X)?

Proof

By Jensen's inequality,

H(X)=k=1npklogpklogn.H(X)=-\sum_{k=1}^n p_k\log p_k \leq \log n.

Equality holds when p1=p2==pn=1np_1=p_2=\cdots=p_n=\frac{1}{n}, that is, when XX is uniform on its nn possible values.

Exercise 2.2

Note

For expectation problems, common tools are generating functions, tail-sum formulas, and indicator variables. For higher moments, first try breaking the random variable into Bernoulli indicators.

Problem: 2.2.1

For XB(n,p)X\sim B(n,p), find E[X3]\mathbb{E}[X^3].

Proof

Write X=k=1nIkX=\sum_{k=1}^n I_k, where I1,,InI_1,\cdots,I_n are i.i.d. indicator random variables with P(Ik=1)=p\mathbb{P}(I_k=1)=p. Then

E(X3)=E((k=1nIk)3)=nE(I13)+6(n2)E(I12I2)+6(n3)E(I1I2I3)=n(n1)(n2)p3+3n(n1)p2+np.\begin{aligned} \mathbb{E} (X^3) &= \mathbb{E} \left( \left( \sum_{k=1}^n I_k \right)^3 \right) \\ &= n\mathbb{E} (I_1^3) +6\binom{n}{2}\mathbb{E} (I_1 ^{2} I_2) +6\binom{n}{3}\mathbb{E} (I_1 I_2 I_3)\\ &=n(n-1)(n-2)p^3 +3n(n-1)p^{2} +np. \end{aligned}
Problem: 2.2.2

Let the discrete random variable XX have probability mass function

f(x)={1x(x+1),x=1,2,,0,otherwise.f(x) = \begin{cases} \dfrac{1}{x(x+1)}, & x = 1, 2, \cdots, \\ 0, & \text{otherwise}. \end{cases}

For which real numbers α\alpha is the α\alpha-th moment E[Xα]\mathbb{E}[X^\alpha] finite?

Proof

If α<1\alpha<1, then

E[Xα]=k=11k1α(k+1)k=11k2α<+.\mathbb{E} [X^\alpha ] =\sum_{k=1}^{\infty}\frac{1}{k^{1-\alpha }(k+1)} \leq \sum_{k=1}^{\infty} \frac{1}{k^{2-\alpha } }<+\infty .

If α1\alpha\ge 1, then

E[Xα]k=11k+1=+.\mathbb{E} [X^\alpha ]\geq \sum_{k=1}^{\infty} \frac{1}{k+1}=+\infty .

Thus E[Xα]<\mathbb{E}[X^\alpha]<\infty exactly when α<1\alpha<1.

Problem: 2.2.3

Let XX be the total number of intersections crossed by a self-driving ride-hailing car in one day, and suppose

P(X=k)=(1p)k1p,0<p<1,k=1,2,.\mathbb{P}(X = k) = (1-p)^{k-1}p,\quad 0<p<1,\quad k=1,2,\cdots.

The traffic lights at different intersections work independently, and the car meets a red light at each intersection with probability pp.

11 Find the expectation and variance of the total number of intersections crossed.

22 Find the expected number of red lights met in one day.

Proof

Let q=1pq=1-p. From

k=0qk=11q,\sum_{k=0}^\infty q^k=\frac{1}{1-q},

differentiating gives

k=1kqk1=1(1q)2,k=2k(k1)qk2=2(1q)3.\sum_{k=1}^\infty kq^{k-1}=\frac{1}{(1-q)^2},\qquad \sum_{k=2}^\infty k(k-1)q^{k-2}=\frac{2}{(1-q)^3}.

Thus

E[X]=k=1kpqk1=p1(1q)2=1p.\mathbb{E}[X]=\sum_{k=1}^\infty kpq^{k-1} =p\cdot \frac{1}{(1-q)^2}=\frac{1}{p}.

Also

E[X(X1)]=k=2k(k1)pqk1=pqk=2k(k1)qk2=2qp2.\mathbb{E}[X(X-1)] =\sum_{k=2}^\infty k(k-1)pq^{k-1} =pq\sum_{k=2}^\infty k(k-1)q^{k-2} =\frac{2q}{p^2}.

So

E[X2]=E[X(X1)]+E[X]=2qp2+1p=2pp2,\mathbb{E}[X^2] =\mathbb{E}[X(X-1)]+\mathbb{E}[X] =\frac{2q}{p^2}+\frac{1}{p} =\frac{2-p}{p^2},

and

Var(X)=E[X2]E[X]2=2pp21p2=1pp2.\operatorname{Var}(X) =\mathbb{E}[X^2]-\mathbb{E}[X]^2 =\frac{2-p}{p^2}-\frac{1}{p^2} =\frac{1-p}{p^2}.

Let YY be the number of red lights met in one day. Given X=nX=n, we have YX=nB(n,p)Y\mid X=n\sim B(n,p), so

E[YX=n]=np.\mathbb{E}[Y\mid X=n]=np.

Therefore

E[Y]=E(E[YX])=pE[X]=1.\mathbb{E}[Y]=\mathbb{E}\bigl(\mathbb{E}[Y\mid X]\bigr)=p\mathbb{E}[X]=1.
Remark

Here XX is a geometric random variable with parameter pp. One may directly use

E[X]=1p,Var(X)=1pp2.\mathbb{E}[X]=\frac{1}{p},\qquad \operatorname{Var}(X)=\frac{1-p}{p^2}.
Problem: 2.2.4

For a nonnegative integer-valued random variable XX, prove that

E[X]=n=0P(X>n).\mathbb{E}[X] = \sum_{n=0}^{\infty} \mathbb{P}(X > n).
Proof

Since XX is nonnegative and integer-valued,

X=n=01{X>n}.X=\sum_{n=0}^{\infty}\mathbf{1}_{\{X>n\}}.

Taking expectations and exchanging expectation with the sum gives

E[X]=n=0E[1{X>n}]=n=0P(X>n).\mathbb{E}[X] =\sum_{n=0}^{\infty}\mathbb{E}\bigl[\mathbf{1}_{\{X>n\}}\bigr] =\sum_{n=0}^{\infty}\mathbb{P}(X>n).
Remark

For a nonnegative real-valued random variable XX, use

XXX.\lfloor X\rfloor \leq X \leq \lceil X\rceil.

Applying the previous result to X\lfloor X\rfloor and X\lceil X\rceil gives

E[X]=n=0P(X>n)=n=1P(Xn),\mathbb{E}[\lfloor X\rfloor] =\sum_{n=0}^{\infty}\mathbb{P}(\lfloor X\rfloor>n) =\sum_{n=1}^{\infty}\mathbb{P}(X\geq n),

and

E[X]=n=0P(X>n)=n=0P(X>n).\mathbb{E}[\lceil X\rceil] =\sum_{n=0}^{\infty}\mathbb{P}(\lceil X\rceil>n) =\sum_{n=0}^{\infty}\mathbb{P}(X>n).

Thus

n=1P(Xn)E[X]n=0P(X>n).\sum_{n=1}^{\infty}\mathbb{P}(X\geq n) \leq \mathbb{E}[X] \leq \sum_{n=0}^{\infty}\mathbb{P}(X>n).

This is a useful way to estimate an expectation by tail probabilities.

Problem: 2.2.6

The random graph model G(n,p)G(n,p) has vertex set V={1,2,,n}V=\{1,2,\cdots,n\}. Each pair of vertices is connected by an edge with probability pp, independently of all other pairs. The degree DiD_i of vertex ii is the number of edges incident to ii.

11 Find the distribution and expectation of DiD_i.

22 Let XX be the number of triangles in G(n,p)G(n,p). Find E[X]\mathbb{E}[X] and Var(X)\operatorname{Var}(X).

Proof

For a fixed vertex ii, the degree DiD_i counts how many of the other n1n-1 possible edges appear. Hence

DiB(n1,p),E[Di]=(n1)p.D_i\sim B(n-1,p),\qquad \mathbb{E}[D_i]=(n-1)p.

Let T\mathcal{T} be the set of all triangles. For each TTT\in\mathcal{T}, let ITI_T be the indicator that triangle TT appears. Then

X=TTIT,T=(n3).X=\sum_{T\in\mathcal{T}}I_T,\qquad |\mathcal{T}|=\binom{n}{3}.

Therefore

E[X]=TTE[IT]=(n3)p3.\mathbb{E}[X]=\sum_{T\in\mathcal{T}}\mathbb{E}[I_T]=\binom{n}{3}p^3.

For the variance,

Var(X)=TTVar(IT)+2T<SCov(IT,IS).\operatorname{Var}(X) =\sum_{T\in\mathcal{T}}\operatorname{Var}(I_T) +2\sum_{T<S}\operatorname{Cov}(I_T,I_S).

Here

Var(IT)=p3(1p3).\operatorname{Var}(I_T)=p^3(1-p^3).

If two different triangles have no common edge, their edge sets are independent, so the covariance is 00. If they share one edge, then

E[ITIS]=p5,Cov(IT,IS)=p5p6=p5(1p).\mathbb{E}[I_TI_S]=p^5,\qquad \operatorname{Cov}(I_T,I_S)=p^5-p^6=p^5(1-p).

The number of unordered pairs of triangles sharing an edge is

(n2)(n22)=6(n4).\binom{n}{2}\binom{n-2}{2}=6\binom{n}{4}.

Thus

Var(X)=(n3)p3(1p3)+12(n4)p5(1p).\operatorname{Var}(X) =\binom{n}{3}p^3(1-p^3)+12\binom{n}{4}p^5(1-p).

Exercise 2.3

Note

The probabilistic method usually starts with a random construction, then uses an expectation to prove existence. The final conclusion is deterministic; randomness is only a proof tool.

Problem: 2.3.1

Daniel Bernoulli described a diffusion model in 1769. Bottle A contains nn red balls, and bottle B contains nn blue balls. At each step, choose one ball from each bottle and exchange the two balls. Find the expected number of red balls in bottle A after kk steps.

Proof

For each ball that started in bottle A, let pkp_k be the probability that it is still in bottle A after the kk-th exchange. Then p0=1p_0=1, and

pk+1=n1npk+1n(1pk).p_{k+1}=\frac{n-1}{n}p_k+\frac{1}{n}(1-p_k).

Solving the recursion gives

pk=12[(n2n)k+1].p_k=\frac{1}{2}\left[\left(\frac{n-2}{n}\right)^k+1\right].

Label the nn balls originally in bottle A. Let IiI_i be the indicator that the ii-th such ball is in bottle A after kk exchanges. If N=I1++InN=I_1+\cdots+I_n, then

E(N)=i=1nE(Ii)=npk=n2[(n2n)k+1].\mathbb{E}(N) =\sum_{i=1}^n\mathbb{E}(I_i) =np_k =\frac{n}{2}\left[\left(\frac{n-2}{n}\right)^k+1\right].
Problem: 2.3.2

Let G=(V,E)G=(V,E) be a finite graph. For a vertex set WW and an edge eEe\in E, define

1W(e)={1,e connects W and Wc,0,otherwise.\mathbf{1}_W(e)= \begin{cases} 1, & e \text{ connects } W \text{ and } W^c,\\ 0, & \text{otherwise}. \end{cases}

Let

NW=eE1W(e).N_W=\sum_{e\in E}\mathbf{1}_W(e).

Use the probabilistic method to prove that there exists WVW\subset V such that NWE/2N_W\ge |E|/2.

Proof

Choose each vertex independently with probability 12\frac12, and let WW be the chosen set. For a fixed edge, the probability that its two endpoints are separated by WW and WcW^c is 212(112)=122\cdot\frac12(1-\frac12)=\frac12. Hence

E(NW)=eEE(1W(e))=E2.\mathbb{E}(N_W) =\sum_{e\in E}\mathbb{E}(\mathbf{1}_W(e)) =\frac{|E|}{2}.

Therefore at least one choice of WW satisfies NWE/2N_W\ge |E|/2.

Problem: 2.3.3

A box contains nn balls labeled 1,2,,n1,2,\cdots,n. Choose kk balls uniformly without replacement, and let XX be the sum of their labels. Find the expectation and variance of XX.

Proof

For each i=1,2,,ni=1,2,\cdots,n, let

Ii=1{ball i is chosen}.I_i=\mathbf{1}_{\{\text{ball }i\text{ is chosen}\}}.

Then

X=i=1niIi.X=\sum_{i=1}^n iI_i.

Each ball is chosen with probability kn\frac{k}{n}, so

E(X)=i=1niE(Ii)=kni=1ni=k(n+1)2.\mathbb{E}(X)=\sum_{i=1}^n i\mathbb{E}(I_i) =\frac{k}{n}\sum_{i=1}^n i =\frac{k(n+1)}{2}.

For the variance,

Var(X)=i=1ni2Var(Ii)+21i<jnijCov(Ii,Ij).\operatorname{Var}(X) =\sum_{i=1}^n i^2\operatorname{Var}(I_i) +2\sum_{1\leq i<j\leq n}ij\operatorname{Cov}(I_i,I_j).

We have

Var(Ii)=kn(1kn)=k(nk)n2.\operatorname{Var}(I_i) =\frac{k}{n}\left(1-\frac{k}{n}\right) =\frac{k(n-k)}{n^2}.

For iji\ne j,

P(Ii=1,Ij=1)=(n2k2)(nk)=k(k1)n(n1).\mathbb{P}(I_i=1,I_j=1) =\frac{\binom{n-2}{k-2}}{\binom{n}{k}} =\frac{k(k-1)}{n(n-1)}.

Thus

Cov(Ii,Ij)=k(k1)n(n1)k2n2=k(nk)n2(n1).\operatorname{Cov}(I_i,I_j) =\frac{k(k-1)}{n(n-1)}-\frac{k^2}{n^2} =-\frac{k(n-k)}{n^2(n-1)}.

Therefore

Var(X)=k(nk)n2i=1ni22k(nk)n2(n1)1i<jnij=k(nk)n2n(n+1)(2n+1)6k(nk)n2(n1)[(i=1ni)2i=1ni2]=k(nk)(n+1)12.\begin{aligned} \operatorname{Var}(X) &=\frac{k(n-k)}{n^2}\sum_{i=1}^n i^2 -\frac{2k(n-k)}{n^2(n-1)}\sum_{1\leq i<j\leq n}ij\\ &=\frac{k(n-k)}{n^2}\cdot \frac{n(n+1)(2n+1)}{6} -\frac{k(n-k)}{n^2(n-1)} \left[\left(\sum_{i=1}^n i\right)^2-\sum_{i=1}^n i^2\right]\\ &=\frac{k(n-k)(n+1)}{12}. \end{aligned}
Problem: 2.3.6

Let v1,v2,,vnRn\mathbf{v}_1,\mathbf{v}_2,\cdots,\mathbf{v}_n\in\mathbb{R}^n satisfy vi1|\mathbf{v}_i|\le 1 for all ii. Let

w=i=1npivi,pi[0,1].\mathbf{w}=\sum_{i=1}^n p_i\mathbf{v}_i,\quad p_i\in[0,1].

Use the probabilistic method to prove that there exist εi{0,1}\varepsilon_i\in\{0,1\} such that

i=1nεiviwn2.\left|\sum_{i=1}^n \varepsilon_i\mathbf{v}_i-\mathbf{w}\right| \le \frac{\sqrt n}{2}.
Proof

Choose εi{0,1}\varepsilon_i\in\{0,1\} independently, with P(εi=1)=pi\mathbb{P}(\varepsilon_i=1)=p_i. Consider

X:=i=1nεiviw2=i=1n(εipi)2vi2+21i<jn(εipi)(εjpj)vivj.X:=\left|\sum_{i=1}^n \varepsilon_i v_i-w\right|^2 =\sum_{i=1}^n(\varepsilon_i-p_i)^2|v_i|^2 +2\sum_{1\le i<j\le n}(\varepsilon_i-p_i)(\varepsilon_j-p_j)v_i\cdot v_j.

Taking expectations,

E[X]=i=1nE[(εipi)2]vi2+21i<jnE[(εipi)(εjpj)]vivj=i=1nE[(εipi)2]vi2=i=1npi(1pi)vi2n4.\begin{aligned} \mathbb{E}[X] &=\sum_{i=1}^n \mathbb{E}[(\varepsilon_i-p_i)^2]|v_i|^2 +2\sum_{1\leq i<j\leq n} \mathbb{E}[(\varepsilon_i-p_i)(\varepsilon_j-p_j)]v_i\cdot v_j\\ &=\sum_{i=1}^n \mathbb{E}[(\varepsilon_i-p_i)^2]|v_i|^2\\ &=\sum_{i=1}^n p_i(1-p_i)|v_i|^2\\ &\le \frac{n}{4}. \end{aligned}

So for at least one choice of the εi\varepsilon_i, we have Xn/4X\le n/4, which means

i=1nεiviwn2.\left|\sum_{i=1}^n \varepsilon_i v_i-w\right| \le \frac{\sqrt n}{2}.

Exercise 2.4

Note

Here it is useful to understand conditional expectation first in the discrete case: it is the average value after some information is given. It also satisfies linearity, positivity, and the tower property.

Problem: 2.4.1

Prove the following properties of conditional expectation:

11 E[aY+bZX]=aE[YX]+bE[ZX]\mathbb{E}[aY+bZ\mid X]=a\mathbb{E}[Y\mid X]+b\mathbb{E}[Z\mid X] for all a,bRa,b\in\mathbb{R}.

22 If Y0Y\ge0, then E[YX]0\mathbb{E}[Y\mid X]\ge0.

33 E[1X]=1\mathbb{E}[1\mid X]=1.

44 If XX and YY are independent, then E[YX]=E[Y]\mathbb{E}[Y\mid X]=\mathbb{E}[Y].

55 E[Yg(X)X]=g(X)E[YX]\mathbb{E}[Yg(X)\mid X]=g(X)\mathbb{E}[Y\mid X], whenever both sides are well-defined.

Proof

For any xx with P(X=x)>0\mathbb{P}(X=x)>0, by definition

E[YX=x]=yyP(Y=yX=x).\mathbb{E}[Y\mid X=x]=\sum_y y\mathbb{P}(Y=y\mid X=x).

It is enough to prove each statement for every such xx.

11

E[aY+bZX=x]=y,z(ay+bz)P(Y=y,Z=zX=x)=ay,zyP(Y=y,Z=zX=x)+by,zzP(Y=y,Z=zX=x)=aE[YX=x]+bE[ZX=x].\begin{aligned} \mathbb{E}[aY+bZ\mid X=x] &=\sum_{y,z}(ay+bz)\mathbb{P}(Y=y,Z=z\mid X=x)\\ &=a\sum_{y,z}y\mathbb{P}(Y=y,Z=z\mid X=x) +b\sum_{y,z}z\mathbb{P}(Y=y,Z=z\mid X=x)\\ &=a\mathbb{E}[Y\mid X=x]+b\mathbb{E}[Z\mid X=x]. \end{aligned}

22 If Y0Y\ge0, then

E[YX=x]=yyP(Y=yX=x)0.\mathbb{E}[Y\mid X=x]=\sum_y y\mathbb{P}(Y=y\mid X=x)\ge0.

33

E[1X=x]=1.\mathbb{E}[1\mid X=x]=1.

44 If XX and YY are independent, then

P(Y=yX=x)=P(Y=y),\mathbb{P}(Y=y\mid X=x)=\mathbb{P}(Y=y),

so

E[YX=x]=yyP(Y=y)=E[Y].\mathbb{E}[Y\mid X=x]=\sum_y y\mathbb{P}(Y=y)=\mathbb{E}[Y].

55 Given X=xX=x, the value g(X)=g(x)g(X)=g(x) is constant. Hence

E[Yg(X)X=x]=E[Yg(x)X=x]=g(x)E[YX=x].\mathbb{E}[Yg(X)\mid X=x] =\mathbb{E}[Yg(x)\mid X=x] =g(x)\mathbb{E}[Y\mid X=x].

These identities hold for every relevant xx, so the desired statements follow.

Problem: 2.4.2

Let XX and YY be independent Poisson random variables with parameters λ1\lambda_1 and λ2\lambda_2. Find E[XX+Y]\mathbb{E}[X\mid X+Y].

Proof

Let S=X+YS=X+Y. For 0kn0\le k\le n,

P(X=kS=n)=P(X=k,Y=nk)P(S=n)=λ1keλ1k!λ2nkeλ2(nk)!(λ1+λ2)ne(λ1+λ2)n!=(nk)(λ1λ1+λ2)k(λ2λ1+λ2)nk.\begin{aligned} \mathbb{P}(X=k\mid S=n) &=\frac{\mathbb{P}(X=k,Y=n-k)}{\mathbb{P}(S=n)}\\ &=\frac{\dfrac{\lambda_1^k e^{-\lambda_1}}{k!} \dfrac{\lambda_2^{n-k}e^{-\lambda_2}}{(n-k)!}} {\dfrac{(\lambda_1+\lambda_2)^n e^{-(\lambda_1+\lambda_2)}}{n!}}\\ &=\binom{n}{k} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k \left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-k}. \end{aligned}

Thus, conditional on S=nS=n, the random variable XX is binomial with parameters nn and λ1λ1+λ2\dfrac{\lambda_1}{\lambda_1+\lambda_2}. Therefore

E[XS=n]=nλ1λ1+λ2.\mathbb{E}[X\mid S=n] =n\frac{\lambda_1}{\lambda_1+\lambda_2}.

Equivalently,

E[XX+Y]=λ1λ1+λ2(X+Y).\mathbb{E}[X\mid X+Y] =\frac{\lambda_1}{\lambda_1+\lambda_2}(X+Y).
Problem: 2.4.3

Let X,YX,Y be discrete random variables with means 00, variances 11, and covariance ρ\rho. Prove that

E[max{X2,Y2}]1+1ρ2.\mathbb{E}[\max\{X^2,Y^2\}]\leq 1+\sqrt{1-\rho^2}.
Proof

Observe that

max{X2,Y2}=X2+Y2+X2Y22=X2+Y2+XYX+Y2.\max\{X^2,Y^2\} =\frac{X^2+Y^2+|X^2-Y^2|}{2} =\frac{X^2+Y^2+|X-Y||X+Y|}{2}.

Thus

E[max{X2,Y2}]=1+12E[XYX+Y].\mathbb{E}[\max\{X^2,Y^2\}] =1+\frac{1}{2}\mathbb{E}[|X-Y||X+Y|].

By Cauchy's inequality,

E[XYX+Y]E[(XY)2]E[(X+Y)2].\mathbb{E}[|X-Y||X+Y|] \leq \sqrt{\mathbb{E}[(X-Y)^2]\mathbb{E}[(X+Y)^2]}.

Also,

E[(XY)2]=Var(XY)=2(1ρ),\mathbb{E}[(X-Y)^2]=\operatorname{Var}(X-Y)=2(1-\rho),

and

E[(X+Y)2]=Var(X+Y)=2(1+ρ).\mathbb{E}[(X+Y)^2]=\operatorname{Var}(X+Y)=2(1+\rho).

Therefore

E[XYX+Y]2(1ρ)2(1+ρ)=21ρ2.\mathbb{E}[|X-Y||X+Y|] \leq \sqrt{2(1-\rho)\cdot2(1+\rho)} =2\sqrt{1-\rho^2}.

So

E[max{X2,Y2}]1+1ρ2.\mathbb{E}[\max\{X^2,Y^2\}] \leq 1+\sqrt{1-\rho^2}.
Problem: 2.4.5

The conditional variance of YY given XX, denoted Var(YX)\operatorname{Var}(Y\mid X), is usually defined as the variance of the conditional distribution YXY\mid X. Using the usual formula

Var(Y)=E[Y2]E[Y]2,\operatorname{Var}(Y)=\mathbb{E}[Y^2]-\mathbb{E}[Y]^2,

we may define it directly by

Var(YX)=E[Y2X]E[YX]2.\operatorname{Var}(Y\mid X) =\mathbb{E}[Y^2\mid X]-\mathbb{E}[Y\mid X]^2.

Prove that

Var(Y)=E[Var(YX)]+Var(E[YX]).\operatorname{Var}(Y) =\mathbb{E}[\operatorname{Var}(Y\mid X)] +\operatorname{Var}(\mathbb{E}[Y\mid X]).
Proof

By definition,

E[Var(YX)]=E[E[Y2X]]E[E[YX]2].\mathbb{E}[\operatorname{Var}(Y\mid X)] =\mathbb{E}[\mathbb{E}[Y^2\mid X]] -\mathbb{E}[\mathbb{E}[Y\mid X]^2].

Also,

E[E[Y2X]]=E[Y2],E[E[YX]]=E[Y].\mathbb{E}[\mathbb{E}[Y^2\mid X]]=\mathbb{E}[Y^2], \qquad \mathbb{E}[\mathbb{E}[Y\mid X]]=\mathbb{E}[Y].

Hence

E[Var(YX)]=E[Y2]E[E[YX]2].\mathbb{E}[\operatorname{Var}(Y\mid X)] =\mathbb{E}[Y^2]-\mathbb{E}[\mathbb{E}[Y\mid X]^2].

On the other hand,

Var(E[YX])=E[E[YX]2]E[E[YX]]2=E[E[YX]2]E[Y]2.\operatorname{Var}(\mathbb{E}[Y\mid X]) =\mathbb{E}[\mathbb{E}[Y\mid X]^2] -\mathbb{E}[\mathbb{E}[Y\mid X]]^2 =\mathbb{E}[\mathbb{E}[Y\mid X]^2]-\mathbb{E}[Y]^2.

Adding the two identities gives

E[Var(YX)]+Var(E[YX])=E[Y2]E[Y]2=Var(Y).\mathbb{E}[\operatorname{Var}(Y\mid X)] +\operatorname{Var}(\mathbb{E}[Y\mid X]) =\mathbb{E}[Y^2]-\mathbb{E}[Y]^2 =\operatorname{Var}(Y).
Problem: 2.4.8

The 2024 Nobel Prize in Physics was awarded to Hopfield and Hinton for foundational work related to machine learning with artificial neural networks. Building on the idea of Hopfield networks, Hinton introduced Boltzmann machines. Given weights wij=wjiw_{ij}=w_{ji} with wii=0w_{ii}=0, define an nn-dimensional random vector

X=(X1,,Xn)X=(X_1,\cdots,X_n)

taking values in {0,1}n\{0,1\}^n by

P(X=x)=1Znexp{1i<jnwijxixj+1inbixi},\mathbb{P}(X=x)=\frac{1}{Z_n}\exp\left\{ \sum_{1\leq i<j\leq n} w_{ij}x_ix_j+\sum_{1\leq i\leq n} b_ix_i \right\},

where the partition function is

Zn=x{0,1}nexp{1i<jnwijxixj+1inbixi}.Z_n=\sum_{x\in \{0,1\}^n}\exp\left\{ \sum_{1\leq i<j\leq n} w_{ij}x_ix_j+\sum_{1\leq i\leq n} b_ix_i \right\}.

Let X(k)X^{(k)} be the vector XX with its kk-th coordinate removed. Prove that

E[XkX(k)]=exp{bk+ikwkiXi}1+exp{bk+ikwkiXi}.\mathbb{E}[X_k\mid X^{(k)}] =\frac{\exp\left\{ b_k+\sum_{i\neq k} w_{ki}X_i \right\}} {1+\exp\left\{ b_k+\sum_{i\neq k} w_{ki}X_i \right\}}.
Proof

Fix x(k)x^{(k)}, and write

η=bk+ikwkixi.\eta=b_k+\sum_{i\neq k}w_{ki}x_i.

When X(k)=x(k)X^{(k)}=x^{(k)} is fixed, all factors in the joint probability that do not depend on xkx_k can be absorbed into a constant CC. Thus

P(Xk=xk,X(k)=x(k))=Cexp{xkη},xk=0,1.\mathbb{P}(X_k=x_k,X^{(k)}=x^{(k)}) =C\exp\{x_k\eta\},\qquad x_k=0,1.

Therefore

P(Xk=1X(k)=x(k))=CeηC+Ceη=eη1+eη.\mathbb{P}(X_k=1\mid X^{(k)}=x^{(k)}) =\frac{Ce^\eta}{C+Ce^\eta} =\frac{e^\eta}{1+e^\eta}.

Since XkX_k only takes the values 00 and 11,

E[XkX(k)=x(k)]=P(Xk=1X(k)=x(k))=eη1+eη.\mathbb{E}[X_k\mid X^{(k)}=x^{(k)}] =\mathbb{P}(X_k=1\mid X^{(k)}=x^{(k)}) =\frac{e^\eta}{1+e^\eta}.

This gives

E[XkX(k)]=exp{bk+ikwkiXi}1+exp{bk+ikwkiXi}.\mathbb{E}[X_k\mid X^{(k)}] =\frac{\exp\left\{ b_k+\sum_{i\neq k} w_{ki}X_i \right\}} {1+\exp\left\{ b_k+\sum_{i\neq k} w_{ki}X_i \right\}}.

Exercise 2.5

Note

For inequality problems, first identify the tool: Cauchy, Jensen, Markov, Chebyshev, or Chernoff. Equality cases are often important too.

Problem: 2.5.1

Consider the simple random walk on the line

Sn=k=1nXk,S0=0,S_n=\sum_{k=1}^n X_k,\quad S_0=0,

where

P(Xi=1)=p,P(Xi=1)=1p,0<p<1.P(X_i=1)=p,\quad P(X_i=-1)=1-p,\quad 0<p<1.

Find E(Sn)E(S_n), Var(Sn)\operatorname{Var}(S_n), Cov(Sm,Sn)\operatorname{Cov}(S_m,S_n), and E[SnSm]E[S_n\mid S_m].

Proof

First note that

E(X1)=p(1p)=2p1,Var(X1)=1(2p1)2=4p(1p).E(X_1)=p-(1-p)=2p-1,\qquad \operatorname{Var}(X_1)=1-(2p-1)^2=4p(1-p).

Therefore

E(Sn)=k=1nE(Xk)=n(2p1),E(S_n)=\sum_{k=1}^n E(X_k)=n(2p-1),

and

Var(Sn)=k=1nVar(Xk)=4np(1p).\operatorname{Var}(S_n)=\sum_{k=1}^n\operatorname{Var}(X_k)=4np(1-p).

By independence,

Cov(Sm,Sn)=i=1mj=1nCov(Xi,Xj)=k=1mnVar(Xk)=4p(1p)(mn).\operatorname{Cov}(S_m,S_n) =\sum_{i=1}^m\sum_{j=1}^n \operatorname{Cov}(X_i,X_j) =\sum_{k=1}^{m\wedge n}\operatorname{Var}(X_k) =4p(1-p)(m\wedge n).

Now compute the conditional expectation.

If nmn\ge m, then

Sn=Sm+k=m+1nXk,S_n=S_m+\sum_{k=m+1}^n X_k,

and the latter sum is independent of SmS_m. Hence

E[SnSm]=Sm+(nm)(2p1).E[S_n\mid S_m]=S_m+(n-m)(2p-1).

If nmn\le m, then conditional on SmS_m, the variables X1,,XmX_1,\dots,X_m are exchangeable. Thus

E[X1Sm]==E[XmSm].E[X_1\mid S_m]=\cdots=E[X_m\mid S_m].

Also,

k=1mE[XkSm]=E[k=1mXkSm]=E[SmSm]=Sm.\sum_{k=1}^m E[X_k\mid S_m] =E\left[\sum_{k=1}^m X_k\mid S_m\right] =E[S_m\mid S_m] =S_m.

So

E[XkSm]=Smm,1km.E[X_k\mid S_m]=\frac{S_m}{m},\qquad 1\le k\le m.

Therefore

E[SnSm]=k=1nE[XkSm]=nmSm.E[S_n\mid S_m] =\sum_{k=1}^n E[X_k\mid S_m] =\frac{n}{m}S_m.

In summary,

E[SnSm]={nmSm,nm,Sm+(nm)(2p1),nm.E[S_n\mid S_m]= \begin{cases} \dfrac{n}{m}S_m, & n\le m,\\[6pt] S_m+(n-m)(2p-1), & n\ge m. \end{cases}
Problem: 2.5.2

In an election with two candidates, every vote goes to exactly one candidate. Suppose the final result is α\alpha votes for AA and β\beta votes for BB, with αβ\alpha\ge\beta. All counting orders are equally likely.

  1. Find the probability that the two candidates are tied at some time during the count.

  2. Prove that the probability that AA is never behind BB during the count is

αβ+1α+1.\frac{\alpha-\beta+1}{\alpha+1}.
Proof

As in the standard ballot problem, build a random walk:

Xi={1,the i-th vote is for A,1,the i-th vote is for B,Sk=i=1kXi.X_i= \begin{cases} 1, & \text{the }i\text{-th vote is for }A,\\ -1, & \text{the }i\text{-th vote is for }B, \end{cases} \qquad S_k=\sum_{i=1}^k X_i.

After kk votes have been counted, SkS_k is the number of votes by which AA leads BB. Each counting order corresponds to a path from (0,0)(0,0) to (α+β,αβ)(\alpha+\beta,\alpha-\beta), and all such paths are equally likely. The total number of paths is

Nα+β(0,αβ)=(α+βα).N_{\alpha+\beta}(0,\alpha-\beta)=\binom{\alpha+\beta}{\alpha}.

11 The event that the vote counts are tied at some time means that the path returns to the xx-axis after the start.

If α=β\alpha=\beta, the endpoint is on the xx-axis, so the probability is 11.

If α>β\alpha>\beta, the complement is that the path never returns to the xx-axis, meaning AA is always strictly ahead. By the ballot theorem,

#{paths from (0,0) to (α+β,αβ) that never return to the x-axis}=αβα+βNα+β(0,αβ).\#\{\text{paths from }(0,0)\text{ to }(\alpha+\beta,\alpha-\beta) \text{ that never return to the }x\text{-axis}\} =\frac{\alpha-\beta}{\alpha+\beta}N_{\alpha+\beta}(0,\alpha-\beta).

Thus

P(a tie occurs)=1αβα+β=2βα+β.P(\text{a tie occurs}) =1-\frac{\alpha-\beta}{\alpha+\beta} =\frac{2\beta}{\alpha+\beta}.

This also gives 11 when α=β\alpha=\beta.

22 The event that AA is never behind BB is the event Sk0S_k\ge0 for all kk.

Add one upward step to the front of each such path. This gives a path from (0,0)(0,0) to (α+β+1,αβ+1)(\alpha+\beta+1,\alpha-\beta+1) that never returns to the xx-axis after the start. Conversely, removing the first step recovers the original path. This is a bijection.

By the ballot theorem,

#{paths where A is never behind B}=αβ+1α+β+1Nα+β+1(0,αβ+1).\#\{\text{paths where }A\text{ is never behind }B\} =\frac{\alpha-\beta+1}{\alpha+\beta+1} N_{\alpha+\beta+1}(0,\alpha-\beta+1).

Since

Nα+β+1(0,αβ+1)=(α+β+1α+1),N_{\alpha+\beta+1}(0,\alpha-\beta+1)=\binom{\alpha+\beta+1}{\alpha+1},

we get

#{paths where A is never behind B}=αβ+1α+β+1(α+β+1α+1)=αβ+1α+1(α+βα).\#\{\text{paths where }A\text{ is never behind }B\} =\frac{\alpha-\beta+1}{\alpha+\beta+1}\binom{\alpha+\beta+1}{\alpha+1} =\frac{\alpha-\beta+1}{\alpha+1}\binom{\alpha+\beta}{\alpha}.

Dividing by the total number (α+βα)\binom{\alpha+\beta}{\alpha} gives

P(A is never behind B)=αβ+1α+1.P(\text{$A$ is never behind $B$}) =\frac{\alpha-\beta+1}{\alpha+1}.
Problem: 2.5.3

Let SnS_n be the simple symmetric random walk on the line, with S0=0S_0=0. Let

T=min{n1:Sn=0}T=\min\{n\ge1:S_n=0\}

be the first return time to the origin. Prove that

P(T=2n)=12n1(2nn)22n,P(T=2n)=\frac{1}{2n-1}\binom{2n}{n}2^{-2n},

and determine for which α\alpha we have E[Tα]<E[T^\alpha]<\infty.

Note. You may use Stirling's formula: n!nnen2πnn!\sim n^n e^{-n}\sqrt{2\pi n}.

Proof

Clearly TT can only be even. Let

An+={T=2n, X1=1},An={T=2n, X1=1}.A_n^+=\{T=2n,\ X_1=1\},\qquad A_n^-=\{T=2n,\ X_1=-1\}.

By symmetry,

P(T=2n)=P(An+)+P(An)=2P(An+).P(T=2n)=P(A_n^+)+P(A_n^-)=2P(A_n^+).

Count the paths in An+A_n^+. If T=2nT=2n and the first step is to 11, then

S1,S2,,S2n1>0,S2n=0.S_1,S_2,\cdots,S_{2n-1}>0,\qquad S_{2n}=0.

Reading this path backwards gives a path from (0,0)(0,0) to (2n1,1)(2n-1,1) that does not return to the xx-axis after the start. This is a bijection. By the ballot theorem,

#{paths in An+}=12n1N2n1(0,1)=12n1(2n1n).\#\{\text{paths in }A_n^+\} =\frac{1}{2n-1}N_{2n-1}(0,1) =\frac{1}{2n-1}\binom{2n-1}{n}.

Each path of length 2n2n has probability 22n2^{-2n}. Hence

P(T=2n)=212n1(2n1n)22n=12n1(2nn)22n.\begin{aligned} P(T=2n) &=2\cdot \frac{1}{2n-1}\binom{2n-1}{n}2^{-2n} \\ &=\frac{1}{2n-1}\binom{2n}{n}2^{-2n}. \end{aligned}

Now consider E[Tα]E[T^\alpha]. By Stirling's formula,

(2nn)4nπn.\binom{2n}{n}\sim \frac{4^n}{\sqrt{\pi n}}.

Thus

P(T=2n)=12n1(2nn)22n12πn3/2.P(T=2n) =\frac{1}{2n-1}\binom{2n}{n}2^{-2n} \sim \frac{1}{2\sqrt{\pi}}n^{-3/2}.

Therefore

E[Tα]=n=1(2n)αP(T=2n)n=1nα3/2.E[T^\alpha] =\sum_{n=1}^{\infty}(2n)^\alpha P(T=2n) \asymp \sum_{n=1}^{\infty}n^{\alpha-3/2}.

The series nα3/2\sum n^{\alpha-3/2} converges if and only if

α32<1,\alpha-\frac32<-1,

that is,

α<12.\alpha<\frac12.

Hence

E[Tα]<    α<12.E[T^\alpha]<\infty \iff \alpha<\frac12.
Problem: 2.5.4

A particle moves on a cycle with nodes labeled 0,1,,m0,1,\cdots,m. At each step it moves to a neighboring node clockwise or counterclockwise with equal probability. Starting from node 00, the particle moves until all nodes 1,2,,m1,2,\cdots,m have been visited.

  1. Prove that the particle visits all nodes 1,2,,m1,2,\cdots,m with probability 11.

  2. Find the probability that node ii is the last node visited, for 1im1\le i\le m.

Proof

11 Fix i{1,2,,m}i\in\{1,2,\cdots,m\}, and let Ai={the particle eventually visits node i}A_i=\{\text{the particle eventually visits node }i\}. For each r1r\ge1, let

Br={during steps (r1)m+1,(r1)m+2,,rm, node i is never visited}.B_r=\{\text{during steps }(r-1)m+1,\,(r-1)m+2,\cdots,rm, \text{ node }i\text{ is never visited}\}.

No matter where the particle is at time (r1)m(r-1)m, there is a choice of direction that reaches ii in at most mm steps. The probability of following that particular route is at least 2m2^{-m}. Hence

P(Brall results before time (r1)m)12m.P(B_r\mid \text{all results before time }(r-1)m)\le 1-2^{-m}.

Thus, for every N1N\ge1,

P(B1BN)(12m)N.P(B_1\cap\cdots\cap B_N)\le (1-2^{-m})^N.

If AicA_i^c occurs, then node ii is missed in every block of length mm. Therefore, for every N1N\ge1,

AicB1BN.A_i^c\subseteq B_1\cap\cdots\cap B_N.

So

P(Aic)P(B1BN)(12m)N.P(A_i^c)\le P(B_1\cap\cdots\cap B_N)\le (1-2^{-m})^N.

Letting NN\to\infty gives P(Ai)=1P(A_i)=1. This holds for every i=1,2,,mi=1,2,\cdots,m. Since there are only finitely many nodes,

P(i=1mAi)=1.P\Bigl(\bigcap_{i=1}^m A_i\Bigr)=1.

So the particle visits all nodes with probability 11.

22 Let

pi=P(the last node visited is i),1im.p_i=P(\text{the last node visited is }i),\qquad 1\le i\le m.

By part (1), i=1mpi=1\sum_{i=1}^m p_i=1.

For 2im12\le i\le m-1, condition on the first step:

pi=12P(after first going to 1, the last node visited is i)+12P(after first going to m, the last node visited is i).\begin{aligned} p_i &=\frac12 P(\text{after first going to }1,\text{ the last node visited is }i)\\ &\quad+\frac12 P(\text{after first going to }m,\text{ the last node visited is }i). \end{aligned}

If the first step is to 11 and the last node visited is ii, then node 00 must be visited again before reaching ii; otherwise the particle cannot cross ii to visit the other side. Thus this event has the same form as the original problem, after relabeling

10,21,,mm1,0m.1\mapsto0,\quad 2\mapsto1,\quad \cdots,\quad m\mapsto m-1,\quad 0\mapsto m.

Therefore

P(after first going to 1, the last node visited is i)=pi1.P(\text{after first going to }1,\text{ the last node visited is }i)=p_{i-1}.

Similarly,

P(after first going to m, the last node visited is i)=pi+1.P(\text{after first going to }m,\text{ the last node visited is }i)=p_{i+1}.

Hence

pi=pi1+pi+12,2im1.p_i=\frac{p_{i-1}+p_{i+1}}{2},\qquad 2\le i\le m-1.

So p1,,pmp_1,\cdots,p_m form an arithmetic progression. By symmetry around node 00, p1=pmp_1=p_m. An arithmetic progression with equal endpoints is constant, so

p1=p2==pm.p_1=p_2=\cdots=p_m.

Since their sum is 11,

pi=1m,1im.p_i=\frac1m,\qquad 1\le i\le m.

Exercise 2.6

Note

This section repeatedly uses tail control: turn a probability bound into an expectation bound, then close it by summing or integrating.

Problem: 2.6.1

Let G1,G2G_1,G_2 be probability generating functions, and let 0α10\le \alpha\le1. Prove that G1G2G_1G_2 and αG1+(1α)G2\alpha G_1+(1-\alpha)G_2 are also probability generating functions. Is

G(αs)G(α)\frac{G(\alpha s)}{G(\alpha)}

also a probability generating function?

Proof

Write

Gi(s)=n=0pn(i)sn,pn(i)0,n=0pn(i)=1,i=1,2.G_i(s)=\sum_{n=0}^{\infty}p_n^{(i)}s^n,\qquad p_n^{(i)}\ge0,\qquad \sum_{n=0}^{\infty}p_n^{(i)}=1,\quad i=1,2.

Then

G1(s)G2(s)=n=0(k=0npk(1)pnk(2))sn.G_1(s)G_2(s) =\sum_{n=0}^{\infty} \left(\sum_{k=0}^n p_k^{(1)}p_{n-k}^{(2)}\right)s^n.

The coefficients are nonnegative, and their sum is

n=0k=0npk(1)pnk(2)=(n=0pn(1))(n=0pn(2))=1.\sum_{n=0}^{\infty}\sum_{k=0}^n p_k^{(1)}p_{n-k}^{(2)} =\left(\sum_{n=0}^{\infty}p_n^{(1)}\right) \left(\sum_{n=0}^{\infty}p_n^{(2)}\right) =1.

So G1G2G_1G_2 is a probability generating function.

Also,

αG1(s)+(1α)G2(s)=n=0(αpn(1)+(1α)pn(2))sn.\alpha G_1(s)+(1-\alpha)G_2(s) =\sum_{n=0}^{\infty} \left(\alpha p_n^{(1)}+(1-\alpha)p_n^{(2)}\right)s^n.

The coefficients are nonnegative and sum to

n=0(αpn(1)+(1α)pn(2))=α+(1α)=1.\sum_{n=0}^{\infty} \left(\alpha p_n^{(1)}+(1-\alpha)p_n^{(2)}\right) =\alpha+(1-\alpha)=1.

So αG1+(1α)G2\alpha G_1+(1-\alpha)G_2 is also a probability generating function.

Now write

G(s)=n=0pnsn.G(s)=\sum_{n=0}^{\infty}p_ns^n.

When G(α)>0G(\alpha)>0,

G(αs)G(α)=n=0pnαnG(α)sn.\frac{G(\alpha s)}{G(\alpha)} =\sum_{n=0}^{\infty}\frac{p_n\alpha^n}{G(\alpha)}s^n.

The coefficients are nonnegative and sum to

n=0pnαnG(α)=G(α)G(α)=1.\sum_{n=0}^{\infty}\frac{p_n\alpha^n}{G(\alpha)} =\frac{G(\alpha)}{G(\alpha)} =1.

Thus it is still a probability generating function. In particular, this always works for α(0,1]\alpha\in(0,1]. If α=0\alpha=0, the expression is meaningful only when G(0)>0G(0)>0; in that case it is identically 11, so it is still a probability generating function.

Problem: 2.6.3

Let XX have the geometric distribution with parameter pp, 0<p<10<p<1:

P(X=k)=(1p)k1p,k=1,2,.\mathbb{P}(X=k)=(1-p)^{k-1}p,\quad k=1,2,\cdots.

Let YY be a nonnegative integer-valued random variable with probability generating function G(s)G(s), and suppose YY is independent of XX. Prove that

P(X>Y)=G(1p).\mathbb{P}(X>Y)=G(1-p).
Proof

By the law of total probability and independence,

P(X>Y)=n=0P(X>n,Y=n)=n=0P(X>n)P(Y=n).\mathbb{P}(X>Y) =\sum_{n=0}^{\infty}\mathbb{P}(X>n,Y=n) =\sum_{n=0}^{\infty}\mathbb{P}(X>n)\mathbb{P}(Y=n).

For the geometric distribution,

P(X>n)=k=n+1(1p)k1p=(1p)n.\mathbb{P}(X>n) =\sum_{k=n+1}^{\infty}(1-p)^{k-1}p =(1-p)^n.

Therefore

P(X>Y)=n=0(1p)nP(Y=n)=G(1p).\mathbb{P}(X>Y) =\sum_{n=0}^{\infty}(1-p)^n\mathbb{P}(Y=n) =G(1-p).
Problem: 2.6.4

Prove that

G(x,y,z,w)=18(xyzw+xy+yz+zw+xw+yw+xz+1)G(x,y,z,w)=\frac{1}{8}(xyzw+xy+yz+zw+xw+yw+xz+1)

is the joint generating function of four random variables that are pairwise independent and three-wise independent, but not mutually independent.

Proof

All coefficients of

G(x,y,z,w)=18(xyzw+xy+yz+zw+xw+yw+xz+1)G(x,y,z,w)=\frac{1}{8}(xyzw+xy+yz+zw+xw+yw+xz+1)

are nonnegative, and the sum of the coefficients is 11. Hence it is indeed the joint generating function of some four-dimensional random vector.

For the marginal generating function,

GX(x)=G(x,1,1,1)=1+x2.G_X(x)=G(x,1,1,1)=\frac{1+x}{2}.

The other three marginals are the same. For two variables,

GX,Y(x,y)=G(x,y,1,1)=(1+x)(1+y)4=GX(x)GY(y).G_{X,Y}(x,y)=G(x,y,1,1) =\frac{(1+x)(1+y)}{4} =G_X(x)G_Y(y).

By symmetry, any two of the four variables are independent.

For three variables,

GX,Y,Z(x,y,z)=G(x,y,z,1)=(1+x)(1+y)(1+z)8=GX(x)GY(y)GZ(z).G_{X,Y,Z}(x,y,z)=G(x,y,z,1) =\frac{(1+x)(1+y)(1+z)}{8} =G_X(x)G_Y(y)G_Z(z).

Again by symmetry, any three variables are independent.

If all four variables were mutually independent, their joint generating function would be

GX(x)GY(y)GZ(z)GW(w)=(1+x)(1+y)(1+z)(1+w)16.G_X(x)G_Y(y)G_Z(z)G_W(w) =\frac{(1+x)(1+y)(1+z)(1+w)}{16}.

This is not equal to G(x,y,z,w)G(x,y,z,w); for instance, the right-hand side has an xx term, while GG does not. Hence the four variables are not mutually independent.

Extra material

Note

The characterization of distribution functions is mostly analytic. The random-walk examples are more about recursions and stopping times. It is fine to read the two parts separately.

Notes left from class

Theorem: characterization of distribution functions

Let F:RRF:\mathbb{R}\to\mathbb{R} be a function. Then FF is the distribution function of some random variable if and only if it satisfies the following three properties:

  1. Monotonicity: if x1<x2x_1<x_2, then F(x1)F(x2)F(x_1)\le F(x_2).

  2. Right-continuity: for every xRx\in\mathbb{R}, limyx+F(y)=F(x)\lim_{y\to x^+}F(y)=F(x).

  3. Normalization: limxF(x)=0\lim_{x\to-\infty}F(x)=0 and limx+F(x)=1\lim_{x\to+\infty}F(x)=1.

Proof

Necessity is omitted. We prove sufficiency.

Assume FF satisfies the three properties. Let UU(0,1)U\sim U(0,1) and define

X=inf{tR:F(t)U}.X=\inf\{t\in\mathbb{R}:F(t)\ge U\}.

Since limxF(x)=0\lim_{x\to-\infty}F(x)=0 and limx+F(x)=1\lim_{x\to+\infty}F(x)=1, the set above is nonempty and bounded below, so XX is well-defined.

For any xRx\in\mathbb{R}, if UF(x)U\le F(x), then x{t:F(t)U}x\in\{t:F(t)\ge U\}, so XxX\le x. Hence

{UF(x)}{Xx}.\{U\le F(x)\}\subseteq\{X\le x\}.

Conversely, if XxX\le x, then for every n1n\ge1 there exists some tn<x+1nt_n<x+\frac1n such that F(tn)UF(t_n)\ge U. Since FF is nondecreasing,

UF(tn)F(x+1n).U\le F(t_n)\le F\left(x+\frac1n\right).

Letting nn\to\infty and using right-continuity gives UF(x)U\le F(x). Thus

{Xx}{UF(x)}.\{X\le x\}\subseteq\{U\le F(x)\}.

Therefore

{Xx}={UF(x)},\{X\le x\}=\{U\le F(x)\},

and

P(Xx)=P(UF(x))=F(x).\mathbb{P}(X\le x)=\mathbb{P}(U\le F(x))=F(x).

So FF is the distribution function of XX.

Good problems

Problem: absorption time of a simple random walk

Let {Sn}n0\{S_n\}_{n\ge0} be a simple random walk on the state space {0,1,,L}\{0,1,\dots,L\}, where 00 and LL are absorbing states. If S0=1S_0=1, find the probability that the walk is absorbed exactly at time nn.

Proof

Let the absorption time be

τ=inf{n0:Sn{0,L}}.\tau=\inf\{n\ge0:S_n\in\{0,L\}\}.

For x=1,2,,L1x=1,2,\dots,L-1, set

P(x,n)=P(Sn=x, τ>n).P(x,n)=\mathbb{P}(S_n=x,\ \tau>n).

Then

P(x,n)=12P(x1,n1)+12P(x+1,n1),P(x,n)=\frac12P(x-1,n-1)+\frac12P(x+1,n-1),

with boundary conditions

P(0,n)=P(L,n)=0,P(0,n)=P(L,n)=0,

and initial condition

P(x,0)=1{x=1}.P(x,0)=\mathbf{1}_{\{x=1\}}.

Because the boundary values are zero, expand in sine functions:

P(x,n)=m=1L1am(n)sinmπxL.P(x,n)=\sum_{m=1}^{L-1}a_m(n)\sin\frac{m\pi x}{L}.

Substituting this into the recursion gives

am(n)=am(n1)cosmπL.a_m(n)=a_m(n-1)\cos\frac{m\pi}{L}.

Hence

am(n)=am(0)cosnmπL.a_m(n)=a_m(0)\cos^n\frac{m\pi}{L}.

The initial condition gives

am(0)=2LsinmπL.a_m(0)=\frac{2}{L}\sin\frac{m\pi}{L}.

Thus

P(x,n)=m=1L12LsinmπLcosnmπLsinmπxL.P(x,n)=\sum_{m=1}^{L-1} \frac{2}{L}\sin\frac{m\pi}{L} \cos^n\frac{m\pi}{L} \sin\frac{m\pi x}{L}.

Therefore

P(τ>n)=x=1L1P(x,n),\mathbb{P}(\tau>n)=\sum_{x=1}^{L-1}P(x,n),

and

P(τ=n)=P(τ>n1)P(τ>n).\mathbb{P}(\tau=n)=\mathbb{P}(\tau>n-1)-\mathbb{P}(\tau>n).

This gives the desired probability in terms of the explicit expression above.

Remark

If the walk starts from any i{1,2,,L1}i\in\{1,2,\dots,L-1\}, only the initial condition changes:

P(x,0)=1{x=i}.P(x,0)=\mathbf{1}_{\{x=i\}}.

The rest of the argument is the same.

Problem: distribution of the first run of successes

Toss a coin repeatedly and independently. Each toss is heads with probability pp and tails with probability q=1pq=1-p. Let NN be the number of tosses needed to see mm consecutive heads for the first time. Find the generating function of NN.

Proof

Let

Pn=P(N=n),nm.P_n=\mathbb{P}(N=n),\qquad n\ge m.

Clearly,

Pn=0(n<m),Pm=pm.P_n=0\quad(n<m),\qquad P_m=p^m.

For n>mn>m, classify by the first tail among the first mm tosses. This gives the recursion

Pn=qk=1mpk1Pnk,n>m.P_n=q\sum_{k=1}^m p^{k-1}P_{n-k},\qquad n>m.

Let

G(z)=n=mPnzn.G(z)=\sum_{n=m}^{\infty}P_nz^n.

Summing the recursion gives

G(z)pmzm=qn=m+1k=1mpk1Pnkzn.G(z)-p^mz^m =q\sum_{n=m+1}^{\infty}\sum_{k=1}^m p^{k-1}P_{n-k}z^n.

Rearranging,

G(z)pmzm=qz(1+pz++(pz)m1)G(z).G(z)-p^mz^m =qz\bigl(1+pz+\cdots+(pz)^{m-1}\bigr)G(z).

Thus

G(z)=pmzm1qz(1+pz++(pz)m1).G(z)=\frac{p^mz^m}{1-qz\bigl(1+pz+\cdots+(pz)^{m-1}\bigr)}.

Using

1+pz++(pz)m1=1(pz)m1pz,1+pz+\cdots+(pz)^{m-1} =\frac{1-(pz)^m}{1-pz},

we get

G(z)=(pz)m(1pz)1z+qpmzm+1.G(z)=\frac{(pz)^m(1-pz)}{1-z+qp^mz^{m+1}}.

This is the generating function of NN.

Midterm review

Note

For review, organize the material by counting, distribution calculations, conditional probability, expectation and variance, and random walks. Check both the answers and the conditions under which each method applies.

A useful review path is: concepts, homework problems, and then common hard points. In other words, start from the book's definitions, then go through the problem types in the homework, and finally review the methods used repeatedly in class.

1. Basic concepts

The easiest points to lose are often not from the hardest computations. They come from unclear definitions or not knowing which property to use. At minimum, you should be able to explain the following items in your own words and recognize when a problem is asking for them:

  1. The three parts of a probability space: sample space, event space, and probability measure.

  2. Random variables, distribution functions, and the basic properties of distribution functions.

  3. Discrete and continuous random variables, and the relation among probability mass functions, density functions, and distribution functions.

  4. Joint distributions, marginal distributions, conditional distributions, and independence for two-dimensional random variables.

  5. The definitions and basic properties of expectation, variance, covariance, and correlation.

  6. The meaning of conditional expectation, and the idea of "condition first, then take expectation."

  7. Basic facts about common distributions: Bernoulli, binomial, geometric, and Poisson.

Remark

Do not stop at "this looks familiar." Try to say the definitions without looking at the notes. In particular, be able to say clearly what independence, conditional expectation, and a distribution function mean.

Here are two examples.

Problem: Fall 2024, problem 1

Toss two fair coins. Write down the three parts of the probability space in detail, and explain why there exist two independent random variables on it.

Solution

Take

Ω={HH,HT,TH,TT},\Omega=\{HH,HT,TH,TT\},

where the first letter records the first coin and the second letter records the second coin. Let

F=2Ω,\mathcal{F}=2^\Omega,

and define the probability measure PP by

P({ω})=14,ωΩ.P(\{\omega\})=\frac14,\qquad \omega\in\Omega.

This gives the probability space (Ω,F,P)(\Omega,\mathcal{F},P).

Now define

X=1{the first coin is heads},Y=1{the second coin is heads}.X=\mathbf{1}_{\{\text{the first coin is heads}\}},\qquad Y=\mathbf{1}_{\{\text{the second coin is heads}\}}.

Then XX and YY both take values in {0,1}\{0,1\}, and

P(X=1)=P(Y=1)=12.P(X=1)=P(Y=1)=\frac12.

Moreover,

P(X=i,Y=j)=14=P(X=i)P(Y=j),i,j{0,1}.P(X=i,Y=j)=\frac14=P(X=i)P(Y=j),\qquad i,j\in\{0,1\}.

Thus XX and YY are independent.

Problem: Fall 2019, problem 2

Give a probability space on [0,1][0,1]. For

A1=[a1,b1],A2=[a2,b2],A_1=[a_1,b_1],\qquad A_2=[a_2,b_2],

when are A1A_1 and A2A_2 independent?

Solution

Take

Ω=[0,1],F=B([0,1]),P=μ,\Omega=[0,1],\qquad \mathcal{F}=\mathcal{B}([0,1]),\qquad P=\mu,

where μ\mu is the Borel probability measure given by interval length on [0,1][0,1]. Thus for every closed interval [a,b][0,1][a,b]\subset[0,1],

P([a,b])=ba.P([a,b])=b-a.

Let

1=b1a1,2=b2a2.\ell_1=b_1-a_1,\qquad \ell_2=b_2-a_2.

Then A1A_1 and A2A_2 are independent if and only if

P(A1A2)=P(A1)P(A2)=12.P(A_1\cap A_2)=P(A_1)P(A_2)=\ell_1\ell_2.

Assume without loss of generality that a1a2a_1\le a_2.

(1) If b1a2b_1\le a_2, then the two intervals intersect in at most one point, so

P(A1A2)=0.P(A_1\cap A_2)=0.

Independence then holds if and only if 12=0\ell_1\ell_2=0, meaning that at least one interval degenerates to a point.

(2) If a2b2b1a_2\le b_2\le b_1, then A2A1A_2\subset A_1, so

P(A1A2)=P(A2)=2.P(A_1\cap A_2)=P(A_2)=\ell_2.

Independence requires

2=12.\ell_2=\ell_1\ell_2.

Thus either 2=0\ell_2=0 or 1=1\ell_1=1. In words, either A2A_2 is a single point or A1=[0,1]A_1=[0,1].

(3) If a2<b1<b2a_2<b_1<b_2, the intervals overlap but neither contains the other. Let

x=a2a1,y=b1a2,z=b2b1.x=a_2-a_1,\qquad y=b_1-a_2,\qquad z=b_2-b_1.

Then x,y,z>0x,y,z>0, and

1=x+y,2=y+z,P(A1A2)=y.\ell_1=x+y,\qquad \ell_2=y+z,\qquad P(A_1\cap A_2)=y.

If the intervals were independent, then

y=(x+y)(y+z)=y2+y(x+z)+xz>y,y=(x+y)(y+z)=y^2+y(x+z)+xz>y,

a contradiction. So this case is impossible.

In this probability space, two closed intervals are independent if and only if at least one of them has probability 00 or 11. Equivalently, at least one interval is either a point or the whole interval [0,1][0,1].

2. Homework review

The homework problems are the most important review material. Many exam questions appear in similar, sometimes almost identical, forms, so every homework problem should be worked through carefully.

Problems worth reviewing include:

  1. If X,YX,Y are independent, XPoisson(λ1)X\sim \mathrm{Poisson}(\lambda_1), and YPoisson(λ2)Y\sim \mathrm{Poisson}(\lambda_2), find
E[XX+Y].\mathbb{E}[X\mid X+Y].
  1. If NPoisson(λ)N\sim\mathrm{Poisson}(\lambda), and conditional on NN we toss a coin NN times, with XX the number of heads, find
E[NX].\mathbb{E}[N\mid X].
  1. Let
Sn=k=1nXk,S0=0,S_n=\sum_{k=1}^n X_k,\qquad S_0=0,

be a simple random walk on the line, where

P(Xi=1)=p,P(Xi=1)=1p,0<p<1.\mathbb{P}(X_i=1)=p,\qquad \mathbb{P}(X_i=-1)=1-p,\qquad 0<p<1.

For mnm\le n, find

Cov(Sn,Sm)andVar(SnSm).\operatorname{Cov}(S_n,S_m) \quad\text{and}\quad \operatorname{Var}(S_n\mid S_m).

Methods to know:

  1. Cauchy's inequality, especially for estimates, proving variances are nonnegative, and controlling expectations.

  2. Markov's inequality, and the basic idea of controlling tail probabilities by expectations.

  3. The law of total expectation.

  4. Writing a random variable as a sum of indicator variables, which is useful for expectation, variance, and sometimes higher moments.

3. Examples

1. Generating functions and moment generating functions

Generating functions are useful for several reasons:

  1. An ordinary generating function encodes the distribution of a nonnegative integer-valued random variable as one function.

  2. Derivatives give expectation, variance, and higher moments.

  3. For independent random variables, the generating function or moment generating function of the sum is the product of the individual ones.

  4. Generating functions are especially useful for recursions and first-hitting or first-occurrence times.

  5. When a moment generating function exists, it can often characterize the distribution and makes it easier to compare moments.

Problem: symmetric random walk and Catalan numbers

Let {Sn}n0\{S_n\}_{n\ge0} be the simple symmetric random walk on the line, with S0=0S_0=0. At each step it moves right with probability 12\frac12 and left with probability 12\frac12. Find

P(S10,S20,,S2n0,S2n=0).\mathbb{P}(S_1\ge0,S_2\ge0,\dots,S_{2n}\ge0,S_{2n}=0).
Solution

Let

Cn:=#{(S1,,S2n):Si0, 1i2n, S2n=0},C0=1.C_n:=\#\{(S_1,\dots,S_{2n}):S_i\ge0,\ 1\le i\le2n,\ S_{2n}=0\}, \qquad C_0=1.

Then the desired probability is

P(S10,S20,,S2n0,S2n=0)=Cn22n,\mathbb{P}(S_1\ge0,S_2\ge0,\dots,S_{2n}\ge0,S_{2n}=0) =\frac{C_n}{2^{2n}},

because every path of length 2n2n has probability 22n2^{-2n}.

Suppose the path first returns to 00 at time 2k2k, where 1kn1\le k\le n. Then Si1S_i\ge1 for 1i2k11\le i\le 2k-1. Shifting this first part down by 11 gives a path starting at 00, of length 2k22k-2, staying nonnegative and ending at 00. There are Ck1C_{k-1} such paths. After the first return, the remaining 2(nk)2(n-k) steps form another path of the same type, giving CnkC_{n-k} choices. Hence

Cn=k=1nCk1Cnk,n1.C_n=\sum_{k=1}^n C_{k-1}C_{n-k},\qquad n\ge1.

Introduce the generating function

F(z):=n=0Cnzn.F(z):=\sum_{n=0}^{\infty}C_nz^n.

The recursion gives

F(z)=1+n=1k=1nCk1Cnkzn=1+zF(z)2.\begin{aligned} F(z) &=1+\sum_{n=1}^{\infty}\sum_{k=1}^n C_{k-1}C_{n-k}z^n\\ &=1+zF(z)^2. \end{aligned}

Therefore

zF(z)2F(z)+1=0.zF(z)^2-F(z)+1=0.

Solving,

F(z)=114z2z,F(z)=\frac{1-\sqrt{1-4z}}{2z},

where we choose the branch with F(0)=1F(0)=1. Hence

Cn=1n+1(2nn).C_n=\frac{1}{n+1}\binom{2n}{n}.

Thus

P(S10,S20,,S2n0,S2n=0)=122n1n+1(2nn).\mathbb{P}(S_1\ge0,S_2\ge0,\dots,S_{2n}\ge0,S_{2n}=0) =\frac{1}{2^{2n}}\cdot\frac{1}{n+1}\binom{2n}{n}.
Problem: Spring 2025, problem 6

Let XnX_n be a nonconstant random variable taking values in {0,1,,2n}\{0,1,\dots,2n\}. Its generating function G(z)=E[zXn]G(z)=\mathbb{E}[z^{X_n}] is a polynomial of degree 2n2n and satisfies the Lee--Yang property: all zeros of G(z)=0G(z)=0 lie on the unit circle.

  1. Give an example of a random variable whose generating function has the Lee--Yang property.

  2. Prove that for every nonnegative integer mm,

E[(Xnn)2m+1]=0.\mathbb{E}\bigl[(X_n-n)^{2m+1}\bigr]=0.
  1. Let
Xn:=XnE[Xn]Var(Xn).X_n^*:=\frac{X_n-\mathbb{E}[X_n]}{\sqrt{\operatorname{Var}(X_n)}}.

Prove that

1E[(Xn)4]<3.1\le \mathbb{E}\bigl[(X_n^*)^4\bigr]<3.
Solution

(i) A standard example is

XnBin ⁣(2n,12).X_n\sim\mathrm{Bin}\!\left(2n,\frac12\right).

Then

G(z)=(1+z2)2n,G(z)=\left(\frac{1+z}{2}\right)^{2n},

and all zeros are at z=1z=-1, which lies on the unit circle.

(ii) Let

Y:=Xnn.Y:=X_n-n.

Since the coefficients of GG are real and all roots lie on the unit circle, the roots come in conjugate pairs. We can write

G(z)=λk=1n(z2akz+1),ak[2,2].G(z)=\lambda\prod_{k=1}^n(z^2-a_kz+1), \qquad a_k\in[-2,2].

Then

MY(t):=E[etY]=entG(et)=λk=1n(et+etak).M_Y(t):=\mathbb{E}[e^{tY}] =e^{-nt}G(e^t) =\lambda\prod_{k=1}^n(e^t+e^{-t}-a_k).

The right-hand side is an even function of tt. Hence MY(t)M_Y(t) is even, and for every nonnegative integer mm,

E[Y2m+1]=MY(2m+1)(0)=0.\mathbb{E}[Y^{2m+1}] =M_Y^{(2m+1)}(0)=0.

Thus

E[(Xnn)2m+1]=0.\mathbb{E}\bigl[(X_n-n)^{2m+1}\bigr]=0.

(iii) Taking m=0m=0 in part (ii) gives

E[Xn]=n.\mathbb{E}[X_n]=n.

So Y=XnE[Xn]Y=X_n-\mathbb{E}[X_n]. Let

MY(t)=E[etY],f(t)=logMY(t).M_Y(t)=\mathbb{E}[e^{tY}],\qquad f(t)=\log M_Y(t).

Since all odd moments vanish,

MY(t)=1+E[Y2]2t2+E[Y4]24t4+o(t4),M_Y(t) =1+\frac{\mathbb{E}[Y^2]}{2}t^2 +\frac{\mathbb{E}[Y^4]}{24}t^4+o(t^4),

and therefore

f(t)=E[Y2]2t2+E[Y4]3(E[Y2])224t4+o(t4).f(t) =\frac{\mathbb{E}[Y^2]}{2}t^2 +\frac{\mathbb{E}[Y^4]-3(\mathbb{E}[Y^2])^2}{24}t^4 +o(t^4).

On the other hand, let ck:=2ak(0,4]c_k:=2-a_k\in(0,4]. Then

f(t)=k=1nlog ⁣(ck+t2+t412+o(t4))+logλ.f(t)=\sum_{k=1}^n \log\!\left(c_k+t^2+\frac{t^4}{12}+o(t^4)\right) +\log\lambda.

For each kk,

log ⁣(ck+t2+t412+o(t4))=logck+log ⁣(1+t2ck+t412ck+o(t4))=logck+t2ck+(112ck12ck2)t4+o(t4).\begin{aligned} \log\!\left(c_k+t^2+\frac{t^4}{12}+o(t^4)\right) &=\log c_k+\log\!\left(1+\frac{t^2}{c_k}+\frac{t^4}{12c_k}+o(t^4)\right)\\ &=\log c_k+\frac{t^2}{c_k} +\left(\frac{1}{12c_k}-\frac{1}{2c_k^2}\right)t^4 +o(t^4). \end{aligned}

Thus

f(t)=C+k=1n[t2ck+(112ck12ck2)t4]+o(t4),f(t) =C+\sum_{k=1}^n \left[ \frac{t^2}{c_k} +\left(\frac{1}{12c_k}-\frac{1}{2c_k^2}\right)t^4 \right]+o(t^4),

where CC is a constant. Since 0<ck4<60<c_k\le4<6,

112ck12ck2=ck612ck2<0.\frac{1}{12c_k}-\frac{1}{2c_k^2} =\frac{c_k-6}{12c_k^2}<0.

Equivalently,

f(4)(0)=24k=1n(112ck12ck2)=k=1n(2ck12ck2)<0.f^{(4)}(0) =24\sum_{k=1}^n \left(\frac{1}{12c_k}-\frac{1}{2c_k^2}\right) =\sum_{k=1}^n\left(\frac{2}{c_k}-\frac{12}{c_k^2}\right)<0.

So the coefficient of t4t^4 in f(t)f(t) is negative. Hence

E[Y4]3(E[Y2])2<0,\mathbb{E}[Y^4]-3(\mathbb{E}[Y^2])^2<0,

or

E[Y4]<3(E[Y2])2.\mathbb{E}[Y^4]<3(\mathbb{E}[Y^2])^2.

After standardization,

E[(Xn)4]=E[Y4](E[Y2])2<3.\mathbb{E}\bigl[(X_n^*)^4\bigr] =\frac{\mathbb{E}[Y^4]}{(\mathbb{E}[Y^2])^2}<3.

On the other hand, by Jensen's inequality, or by Cauchy's inequality,

E[(Xn)4](E[(Xn)2])2=1.\mathbb{E}\bigl[(X_n^*)^4\bigr] \ge \bigl(\mathbb{E}[(X_n^*)^2]\bigr)^2 =1.

Therefore

1E[(Xn)4]<3.1\le \mathbb{E}\bigl[(X_n^*)^4\bigr]<3.

2. Simple random walk and common variants

First make sure you know the random-walk material in the notes. The earlier homework explanations are also worth reviewing carefully. In particular, know Theorems 2.5.2 through 2.5.5 in the textbook. In an exam, the reflection principle and the ballot theorem may be used directly.

Problem: Fall 2024, problem 5

In an election with two candidates, every ballot goes to exactly one of them. Suppose the final count is α\alpha votes for TT and β\beta votes for HH, with αβ\alpha\ge\beta. If the ballots are counted in a random order, find the probability that TT is never more than one vote behind HH during the count.

Solution

As before, build a random walk:

Xi={1,the i-th vote is for T,1,the i-th vote is for H,Sk=i=1kXi.X_i= \begin{cases} 1, & \text{the }i\text{-th vote is for }T,\\ -1, & \text{the }i\text{-th vote is for }H, \end{cases} \qquad S_k=\sum_{i=1}^k X_i.

After kk votes, SkS_k is the number of votes by which TT leads HH. The condition in the problem is

Sk1,1kα+β.S_k\ge -1,\qquad 1\le k\le \alpha+\beta.

Now add one extra vote for TT to the beginning of every counting order. The new election has α+1\alpha+1 votes for TT and β\beta votes for HH, and the new path satisfies

1+Sk0.1+S_k\ge0.

Thus the original problem is the same as asking that, in the new election, TT is never behind HH.

Conversely, in any new counting order where TT is never behind HH, the first vote must be for TT. Removing that first vote recovers a valid order for the original problem. This is a bijection.

By the ballot theorem,

(α+1)β+1(α+1)+1(α+β+1α+1)=αβ+2α+2(α+β+1α+1)\frac{(\alpha+1)-\beta+1}{(\alpha+1)+1} \binom{\alpha+\beta+1}{\alpha+1} =\frac{\alpha-\beta+2}{\alpha+2} \binom{\alpha+\beta+1}{\alpha+1}

is the number of valid new orders. The total number of original counting orders is

(α+βα).\binom{\alpha+\beta}{\alpha}.

Therefore the desired probability is

αβ+2α+2(α+β+1α+1)(α+βα)=αβ+2α+2α+β+1α+1.\begin{aligned} &\frac{\dfrac{\alpha-\beta+2}{\alpha+2} \binom{\alpha+\beta+1}{\alpha+1}} {\binom{\alpha+\beta}{\alpha}}\\ &=\frac{\alpha-\beta+2}{\alpha+2} \cdot\frac{\alpha+\beta+1}{\alpha+1}. \end{aligned}

That is,

P(T is at most one vote behind H during the count)=(α+β+1)(αβ+2)(α+1)(α+2).P(\text{$T$ is at most one vote behind $H$ during the count}) =\frac{(\alpha+\beta+1)(\alpha-\beta+2)} {(\alpha+1)(\alpha+2)}.

3. Probability and other subjects

Problem: Spring 2025, problem 5

Here is an example where probability meets linear algebra. Let Xn=(Xij)X_n=(X_{ij}) be an n×nn\times n matrix whose n2n^2 entries are independent symmetric Bernoulli random variables:

P(Xij=0)=P(Xij=1)=12.\mathbb{P}(X_{ij}=0)=\mathbb{P}(X_{ij}=1)=\frac12.

Define pn=P(det(Xn) is odd)p_n=\mathbb{P}(\det(X_n)\text{ is odd}). Compute p2p_2 and p3p_3, then guess and prove a general formula for pnp_n.

Solution

The key observation is that an integer is odd if and only if it is nonzero modulo 22. Hence

det(Xn) is odddet(Xn)≢0(mod2).\det(X_n)\text{ is odd} \quad\Longleftrightarrow\quad \det(X_n)\not\equiv0\pmod2.

So, if we view XnX_n as a matrix over F2={0,1}\mathbf{F}_2=\{0,1\}, the problem becomes

pn=P(Xn is invertible over F2).p_n=\mathbb{P}(X_n\text{ is invertible over }\mathbf{F}_2).

Compute this by checking whether the rows are linearly independent. Let

R1,R2,,RnF2nR_1,R_2,\dots,R_n\in\mathbf{F}_2^n

be the rows of XnX_n. Since the entries are independent and equally likely to be 00 or 11, each row is uniform on F2n\mathbf{F}_2^n, and the rows are independent.

The probability that the first row is nonzero is

2n12n=12n.\frac{2^n-1}{2^n}=1-2^{-n}.

If the first kk rows are linearly independent, their span has 2k2^k vectors. Thus the conditional probability that row k+1k+1 lies outside this span is

2n2k2n=12kn.\frac{2^n-2^k}{2^n}=1-2^{k-n}.

Therefore

pn=k=0n1(12kn)=j=1n(12j).p_n=\prod_{k=0}^{n-1}(1-2^{k-n}) =\prod_{j=1}^n(1-2^{-j}).

In particular,

p2=(112)(114)=38,p_2=\left(1-\frac12\right)\left(1-\frac14\right)=\frac38,

and

p3=(112)(114)(118)=2164.p_3=\left(1-\frac12\right)\left(1-\frac14\right)\left(1-\frac18\right)=\frac{21}{64}.

The general formula is

pn=j=1n(112j).p_n=\prod_{j=1}^n\left(1-\frac{1}{2^j}\right).
Problem: Fall 2020, problem 4

Let SnS_n be the symmetric group, the set of all bijections from {1,2,,n}\{1,2,\cdots,n\} to itself. Choose σ\sigma uniformly from SnS_n. Let the number of fixed points be

X(σ)={k:σ(k)=k},X(\sigma)=\left|\{k:\sigma(k)=k\}\right|,

and let the number of transpositions be

Y(σ)={(i,j):i<j, σ(i)=j, σ(j)=i}.Y(\sigma)=\left|\{(i,j):i<j,\ \sigma(i)=j,\ \sigma(j)=i\}\right|.
  1. Give the probability space in detail.

  2. Are XX and YY independent?

  3. Find the probability mass function of XX.

  4. Compute E[Y]\mathbb{E}[Y].

Solution

(1) Take

Ω=Sn,F=2Sn,P(A)=An!(ASn).\Omega=S_n,\qquad \mathcal{F}=2^{S_n},\qquad P(A)=\frac{|A|}{n!}\quad(A\subset S_n).

A sample point is a permutation σ\sigma, and X,YX,Y are random variables on this probability space.

(2) For n2n\ge2, XX and YY are not independent. Indeed,

P(X=n)=1n!>0,P(X=n)=\frac1{n!}>0,

and

P(Y>0)>0,P(Y>0)>0,

because, for example, the transposition (1 2)(1\ 2) has one transposition. But if X=nX=n, then σ\sigma must be the identity permutation, so Y=0Y=0. Hence

P(X=n,Y>0)=0P(X=n)P(Y>0).P(X=n,Y>0)=0\ne P(X=n)P(Y>0).

Thus XX and YY are not independent.

(3) For k=0,1,,nk=0,1,\dots,n, first choose which kk points are fixed. This can be done in

(nk)\binom{n}{k}

ways. The remaining nkn-k points must form a derangement. Let DmD_m be the number of derangements of mm elements. Then

P(X=k)=(nk)Dnkn!,k=0,1,,n.P(X=k)=\frac{\binom{n}{k}D_{n-k}}{n!}, \qquad k=0,1,\dots,n.

By inclusion-exclusion,

Dm=m!j=0m(1)jj!.D_m=m!\sum_{j=0}^m\frac{(-1)^j}{j!}.

Therefore

P(X=k)=1k!j=0nk(1)jj!,k=0,1,,n.P(X=k)=\frac{1}{k!}\sum_{j=0}^{n-k}\frac{(-1)^j}{j!}, \qquad k=0,1,\dots,n.

This is the probability mass function of XX.

(4) For each 1i<jn1\le i<j\le n, define

Iij=1{σ(i)=j, σ(j)=i}.I_{ij}=\mathbf{1}_{\{\sigma(i)=j,\ \sigma(j)=i\}}.

Then

Y=1i<jnIij.Y=\sum_{1\le i<j\le n} I_{ij}.

By linearity of expectation,

E[Y]=1i<jnE[Iij]=1i<jnP(σ(i)=j,σ(j)=i).\mathbb{E}[Y] =\sum_{1\le i<j\le n}\mathbb{E}[I_{ij}] =\sum_{1\le i<j\le n}P(\sigma(i)=j,\sigma(j)=i).

For a fixed pair i<ji<j, there are (n2)!(n-2)! permutations with σ(i)=j\sigma(i)=j and σ(j)=i\sigma(j)=i. Hence

P(σ(i)=j,σ(j)=i)=(n2)!n!=1n(n1).P(\sigma(i)=j,\sigma(j)=i) =\frac{(n-2)!}{n!} =\frac{1}{n(n-1)}.

Thus

E[Y]=(n2)1n(n1)=12.\mathbb{E}[Y] =\binom{n}{2}\frac{1}{n(n-1)} =\frac12.
End-of-chapter checklist
  • The original problems and solutions in this chapter come from the corresponding TeX source file.
  • You can first read only the problem boxes, write down the key identities, and then open the proofs or solutions.
  • If a result uses independence, countable additivity, a change of variables, or a moment condition, it is worth marking that point explicitly.