Third recitation
Contents
- This chapter covers discrete random variables, expectation, conditional expectation, the probabilistic method, and a midterm review.
- Linearity, indicator decompositions, and conditional distributions appear again and again.
- The parts on random walks, Catalan numbers, and distribution functions are worth reading slowly.
Tip. When a random quantity looks complicated, first try to write it as a sum of indicator variables. Then decide whether independence is needed.
Exercise 2.1
This section is mainly about probability mass functions, mixture distributions, and symmetry. Condition first, then sum. Independence is often used to pass symmetry from the marginals to the sum.
Roll a fair die first. After seeing the result, toss that many fair coins. Let be the number of heads. Find the probability mass function of .
Proof
Let be the die result. Then the possible values of are , and
The number of new posts in a unit time interval follows a Poisson distribution with parameter . The numbers of new posts in disjoint time intervals are independent. Find the probability of seeing new posts in two unit intervals.
Proof
Let be the number of posts in two unit intervals. Then
Let be independent discrete random variables, each symmetric about , meaning that and have the same probability mass function. Prove that for every ,
where
If independence is removed, does the conclusion still hold? Explain.
Proof
Without independence, the conclusion can fail. Take and let
Then both and are uniform on , so each is symmetric about . But
Let have probability mass function , . The Shannon entropy is
For fixed , which distribution of maximizes ?
Proof
By Jensen's inequality,
Equality holds when , that is, when is uniform on its possible values.
Exercise 2.2
For expectation problems, common tools are generating functions, tail-sum formulas, and indicator variables. For higher moments, first try breaking the random variable into Bernoulli indicators.
For , find .
Proof
Write , where are i.i.d. indicator random variables with . Then
Let the discrete random variable have probability mass function
For which real numbers is the -th moment finite?
Proof
If , then
If , then
Thus exactly when .
Let be the total number of intersections crossed by a self-driving ride-hailing car in one day, and suppose
The traffic lights at different intersections work independently, and the car meets a red light at each intersection with probability .
Find the expectation and variance of the total number of intersections crossed.
Find the expected number of red lights met in one day.
Proof
Let . From
differentiating gives
Thus
Also
So
and
Let be the number of red lights met in one day. Given , we have , so
Therefore
Here is a geometric random variable with parameter . One may directly use
For a nonnegative integer-valued random variable , prove that
Proof
Since is nonnegative and integer-valued,
Taking expectations and exchanging expectation with the sum gives
For a nonnegative real-valued random variable , use
Applying the previous result to and gives
and
Thus
This is a useful way to estimate an expectation by tail probabilities.
The random graph model has vertex set . Each pair of vertices is connected by an edge with probability , independently of all other pairs. The degree of vertex is the number of edges incident to .
Find the distribution and expectation of .
Let be the number of triangles in . Find and .
Proof
For a fixed vertex , the degree counts how many of the other possible edges appear. Hence
Let be the set of all triangles. For each , let be the indicator that triangle appears. Then
Therefore
For the variance,
Here
If two different triangles have no common edge, their edge sets are independent, so the covariance is . If they share one edge, then
The number of unordered pairs of triangles sharing an edge is
Thus
Exercise 2.3
The probabilistic method usually starts with a random construction, then uses an expectation to prove existence. The final conclusion is deterministic; randomness is only a proof tool.
Daniel Bernoulli described a diffusion model in 1769. Bottle A contains red balls, and bottle B contains blue balls. At each step, choose one ball from each bottle and exchange the two balls. Find the expected number of red balls in bottle A after steps.
Proof
For each ball that started in bottle A, let be the probability that it is still in bottle A after the -th exchange. Then , and
Solving the recursion gives
Label the balls originally in bottle A. Let be the indicator that the -th such ball is in bottle A after exchanges. If , then
Let be a finite graph. For a vertex set and an edge , define
Let
Use the probabilistic method to prove that there exists such that .
Proof
Choose each vertex independently with probability , and let be the chosen set. For a fixed edge, the probability that its two endpoints are separated by and is . Hence
Therefore at least one choice of satisfies .
A box contains balls labeled . Choose balls uniformly without replacement, and let be the sum of their labels. Find the expectation and variance of .
Proof
For each , let
Then
Each ball is chosen with probability , so
For the variance,
We have
For ,
Thus
Therefore
Let satisfy for all . Let
Use the probabilistic method to prove that there exist such that
Proof
Choose independently, with . Consider
Taking expectations,
So for at least one choice of the , we have , which means
Exercise 2.4
Here it is useful to understand conditional expectation first in the discrete case: it is the average value after some information is given. It also satisfies linearity, positivity, and the tower property.
Prove the following properties of conditional expectation:
for all .
If , then .
.
If and are independent, then .
, whenever both sides are well-defined.
Proof
For any with , by definition
It is enough to prove each statement for every such .
If , then
If and are independent, then
so
Given , the value is constant. Hence
These identities hold for every relevant , so the desired statements follow.
Let and be independent Poisson random variables with parameters and . Find .
Proof
Let . For ,
Thus, conditional on , the random variable is binomial with parameters and . Therefore
Equivalently,
Let be discrete random variables with means , variances , and covariance . Prove that
Proof
Observe that
Thus
By Cauchy's inequality,
Also,
and
Therefore
So
The conditional variance of given , denoted , is usually defined as the variance of the conditional distribution . Using the usual formula
we may define it directly by
Prove that
Proof
By definition,
Also,
Hence
On the other hand,
Adding the two identities gives
The 2024 Nobel Prize in Physics was awarded to Hopfield and Hinton for foundational work related to machine learning with artificial neural networks. Building on the idea of Hopfield networks, Hinton introduced Boltzmann machines. Given weights with , define an -dimensional random vector
taking values in by
where the partition function is
Let be the vector with its -th coordinate removed. Prove that
Proof
Fix , and write
When is fixed, all factors in the joint probability that do not depend on can be absorbed into a constant . Thus
Therefore
Since only takes the values and ,
This gives
Exercise 2.5
For inequality problems, first identify the tool: Cauchy, Jensen, Markov, Chebyshev, or Chernoff. Equality cases are often important too.
Consider the simple random walk on the line
where
Find , , , and .
Proof
First note that
Therefore
and
By independence,
Now compute the conditional expectation.
If , then
and the latter sum is independent of . Hence
If , then conditional on , the variables are exchangeable. Thus
Also,
So
Therefore
In summary,
In an election with two candidates, every vote goes to exactly one candidate. Suppose the final result is votes for and votes for , with . All counting orders are equally likely.
-
Find the probability that the two candidates are tied at some time during the count.
-
Prove that the probability that is never behind during the count is
Proof
As in the standard ballot problem, build a random walk:
After votes have been counted, is the number of votes by which leads . Each counting order corresponds to a path from to , and all such paths are equally likely. The total number of paths is
The event that the vote counts are tied at some time means that the path returns to the -axis after the start.
If , the endpoint is on the -axis, so the probability is .
If , the complement is that the path never returns to the -axis, meaning is always strictly ahead. By the ballot theorem,
Thus
This also gives when .
The event that is never behind is the event for all .
Add one upward step to the front of each such path. This gives a path from to that never returns to the -axis after the start. Conversely, removing the first step recovers the original path. This is a bijection.
By the ballot theorem,
Since
we get
Dividing by the total number gives
Let be the simple symmetric random walk on the line, with . Let
be the first return time to the origin. Prove that
and determine for which we have .
Note. You may use Stirling's formula: .
Proof
Clearly can only be even. Let
By symmetry,
Count the paths in . If and the first step is to , then
Reading this path backwards gives a path from to that does not return to the -axis after the start. This is a bijection. By the ballot theorem,
Each path of length has probability . Hence
Now consider . By Stirling's formula,
Thus
Therefore
The series converges if and only if
that is,
Hence
A particle moves on a cycle with nodes labeled . At each step it moves to a neighboring node clockwise or counterclockwise with equal probability. Starting from node , the particle moves until all nodes have been visited.
-
Prove that the particle visits all nodes with probability .
-
Find the probability that node is the last node visited, for .
Proof
Fix , and let . For each , let
No matter where the particle is at time , there is a choice of direction that reaches in at most steps. The probability of following that particular route is at least . Hence
Thus, for every ,
If occurs, then node is missed in every block of length . Therefore, for every ,
So
Letting gives . This holds for every . Since there are only finitely many nodes,
So the particle visits all nodes with probability .
Let
By part (1), .
For , condition on the first step:
If the first step is to and the last node visited is , then node must be visited again before reaching ; otherwise the particle cannot cross to visit the other side. Thus this event has the same form as the original problem, after relabeling
Therefore
Similarly,
Hence
So form an arithmetic progression. By symmetry around node , . An arithmetic progression with equal endpoints is constant, so
Since their sum is ,
Exercise 2.6
This section repeatedly uses tail control: turn a probability bound into an expectation bound, then close it by summing or integrating.
Let be probability generating functions, and let . Prove that and are also probability generating functions. Is
also a probability generating function?
Proof
Write
Then
The coefficients are nonnegative, and their sum is
So is a probability generating function.
Also,
The coefficients are nonnegative and sum to
So is also a probability generating function.
Now write
When ,
The coefficients are nonnegative and sum to
Thus it is still a probability generating function. In particular, this always works for . If , the expression is meaningful only when ; in that case it is identically , so it is still a probability generating function.
Let have the geometric distribution with parameter , :
Let be a nonnegative integer-valued random variable with probability generating function , and suppose is independent of . Prove that
Proof
By the law of total probability and independence,
For the geometric distribution,
Therefore
Prove that
is the joint generating function of four random variables that are pairwise independent and three-wise independent, but not mutually independent.
Proof
All coefficients of
are nonnegative, and the sum of the coefficients is . Hence it is indeed the joint generating function of some four-dimensional random vector.
For the marginal generating function,
The other three marginals are the same. For two variables,
By symmetry, any two of the four variables are independent.
For three variables,
Again by symmetry, any three variables are independent.
If all four variables were mutually independent, their joint generating function would be
This is not equal to ; for instance, the right-hand side has an term, while does not. Hence the four variables are not mutually independent.
Extra material
The characterization of distribution functions is mostly analytic. The random-walk examples are more about recursions and stopping times. It is fine to read the two parts separately.
Notes left from class
Let be a function. Then is the distribution function of some random variable if and only if it satisfies the following three properties:
-
Monotonicity: if , then .
-
Right-continuity: for every , .
-
Normalization: and .
Proof
Necessity is omitted. We prove sufficiency.
Assume satisfies the three properties. Let and define
Since and , the set above is nonempty and bounded below, so is well-defined.
For any , if , then , so . Hence
Conversely, if , then for every there exists some such that . Since is nondecreasing,
Letting and using right-continuity gives . Thus
Therefore
and
So is the distribution function of .
Good problems
Let be a simple random walk on the state space , where and are absorbing states. If , find the probability that the walk is absorbed exactly at time .
Proof
Let the absorption time be
For , set
Then
with boundary conditions
and initial condition
Because the boundary values are zero, expand in sine functions:
Substituting this into the recursion gives
Hence
The initial condition gives
Thus
Therefore
and
This gives the desired probability in terms of the explicit expression above.
If the walk starts from any , only the initial condition changes:
The rest of the argument is the same.
Toss a coin repeatedly and independently. Each toss is heads with probability and tails with probability . Let be the number of tosses needed to see consecutive heads for the first time. Find the generating function of .
Proof
Let
Clearly,
For , classify by the first tail among the first tosses. This gives the recursion
Let
Summing the recursion gives
Rearranging,
Thus
Using
we get
This is the generating function of .
Midterm review
For review, organize the material by counting, distribution calculations, conditional probability, expectation and variance, and random walks. Check both the answers and the conditions under which each method applies.
A useful review path is: concepts, homework problems, and then common hard points. In other words, start from the book's definitions, then go through the problem types in the homework, and finally review the methods used repeatedly in class.
1. Basic concepts
The easiest points to lose are often not from the hardest computations. They come from unclear definitions or not knowing which property to use. At minimum, you should be able to explain the following items in your own words and recognize when a problem is asking for them:
-
The three parts of a probability space: sample space, event space, and probability measure.
-
Random variables, distribution functions, and the basic properties of distribution functions.
-
Discrete and continuous random variables, and the relation among probability mass functions, density functions, and distribution functions.
-
Joint distributions, marginal distributions, conditional distributions, and independence for two-dimensional random variables.
-
The definitions and basic properties of expectation, variance, covariance, and correlation.
-
The meaning of conditional expectation, and the idea of "condition first, then take expectation."
-
Basic facts about common distributions: Bernoulli, binomial, geometric, and Poisson.
Do not stop at "this looks familiar." Try to say the definitions without looking at the notes. In particular, be able to say clearly what independence, conditional expectation, and a distribution function mean.
Here are two examples.
Toss two fair coins. Write down the three parts of the probability space in detail, and explain why there exist two independent random variables on it.
Solution
Take
where the first letter records the first coin and the second letter records the second coin. Let
and define the probability measure by
This gives the probability space .
Now define
Then and both take values in , and
Moreover,
Thus and are independent.
Give a probability space on . For
when are and independent?
Solution
Take
where is the Borel probability measure given by interval length on . Thus for every closed interval ,
Let
Then and are independent if and only if
Assume without loss of generality that .
(1) If , then the two intervals intersect in at most one point, so
Independence then holds if and only if , meaning that at least one interval degenerates to a point.
(2) If , then , so
Independence requires
Thus either or . In words, either is a single point or .
(3) If , the intervals overlap but neither contains the other. Let
Then , and
If the intervals were independent, then
a contradiction. So this case is impossible.
In this probability space, two closed intervals are independent if and only if at least one of them has probability or . Equivalently, at least one interval is either a point or the whole interval .
2. Homework review
The homework problems are the most important review material. Many exam questions appear in similar, sometimes almost identical, forms, so every homework problem should be worked through carefully.
Problems worth reviewing include:
- If are independent, , and , find
- If , and conditional on we toss a coin times, with the number of heads, find
- Let
be a simple random walk on the line, where
For , find
Methods to know:
-
Cauchy's inequality, especially for estimates, proving variances are nonnegative, and controlling expectations.
-
Markov's inequality, and the basic idea of controlling tail probabilities by expectations.
-
The law of total expectation.
-
Writing a random variable as a sum of indicator variables, which is useful for expectation, variance, and sometimes higher moments.
3. Examples
1. Generating functions and moment generating functions
Generating functions are useful for several reasons:
-
An ordinary generating function encodes the distribution of a nonnegative integer-valued random variable as one function.
-
Derivatives give expectation, variance, and higher moments.
-
For independent random variables, the generating function or moment generating function of the sum is the product of the individual ones.
-
Generating functions are especially useful for recursions and first-hitting or first-occurrence times.
-
When a moment generating function exists, it can often characterize the distribution and makes it easier to compare moments.
Let be the simple symmetric random walk on the line, with . At each step it moves right with probability and left with probability . Find
Solution
Let
Then the desired probability is
because every path of length has probability .
Suppose the path first returns to at time , where . Then for . Shifting this first part down by gives a path starting at , of length , staying nonnegative and ending at . There are such paths. After the first return, the remaining steps form another path of the same type, giving choices. Hence
Introduce the generating function
The recursion gives
Therefore
Solving,
where we choose the branch with . Hence
Thus
Let be a nonconstant random variable taking values in . Its generating function is a polynomial of degree and satisfies the Lee--Yang property: all zeros of lie on the unit circle.
-
Give an example of a random variable whose generating function has the Lee--Yang property.
-
Prove that for every nonnegative integer ,
- Let
Prove that
Solution
(i) A standard example is
Then
and all zeros are at , which lies on the unit circle.
(ii) Let
Since the coefficients of are real and all roots lie on the unit circle, the roots come in conjugate pairs. We can write
Then
The right-hand side is an even function of . Hence is even, and for every nonnegative integer ,
Thus
(iii) Taking in part (ii) gives
So . Let
Since all odd moments vanish,
and therefore
On the other hand, let . Then
For each ,
Thus
where is a constant. Since ,
Equivalently,
So the coefficient of in is negative. Hence
or
After standardization,
On the other hand, by Jensen's inequality, or by Cauchy's inequality,
Therefore
2. Simple random walk and common variants
First make sure you know the random-walk material in the notes. The earlier homework explanations are also worth reviewing carefully. In particular, know Theorems 2.5.2 through 2.5.5 in the textbook. In an exam, the reflection principle and the ballot theorem may be used directly.
In an election with two candidates, every ballot goes to exactly one of them. Suppose the final count is votes for and votes for , with . If the ballots are counted in a random order, find the probability that is never more than one vote behind during the count.
Solution
As before, build a random walk:
After votes, is the number of votes by which leads . The condition in the problem is
Now add one extra vote for to the beginning of every counting order. The new election has votes for and votes for , and the new path satisfies
Thus the original problem is the same as asking that, in the new election, is never behind .
Conversely, in any new counting order where is never behind , the first vote must be for . Removing that first vote recovers a valid order for the original problem. This is a bijection.
By the ballot theorem,
is the number of valid new orders. The total number of original counting orders is
Therefore the desired probability is
That is,
3. Probability and other subjects
Here is an example where probability meets linear algebra. Let be an matrix whose entries are independent symmetric Bernoulli random variables:
Define . Compute and , then guess and prove a general formula for .
Solution
The key observation is that an integer is odd if and only if it is nonzero modulo . Hence
So, if we view as a matrix over , the problem becomes
Compute this by checking whether the rows are linearly independent. Let
be the rows of . Since the entries are independent and equally likely to be or , each row is uniform on , and the rows are independent.
The probability that the first row is nonzero is
If the first rows are linearly independent, their span has vectors. Thus the conditional probability that row lies outside this span is
Therefore
In particular,
and
The general formula is
Let be the symmetric group, the set of all bijections from to itself. Choose uniformly from . Let the number of fixed points be
and let the number of transpositions be
-
Give the probability space in detail.
-
Are and independent?
-
Find the probability mass function of .
-
Compute .
Solution
(1) Take
A sample point is a permutation , and are random variables on this probability space.
(2) For , and are not independent. Indeed,
and
because, for example, the transposition has one transposition. But if , then must be the identity permutation, so . Hence
Thus and are not independent.
(3) For , first choose which points are fixed. This can be done in
ways. The remaining points must form a derangement. Let be the number of derangements of elements. Then
By inclusion-exclusion,
Therefore
This is the probability mass function of .
(4) For each , define
Then
By linearity of expectation,
For a fixed pair , there are permutations with and . Hence
Thus
End-of-chapter checklist
- The original problems and solutions in this chapter come from the corresponding TeX source file.
- You can first read only the problem boxes, write down the key identities, and then open the proofs or solutions.
- If a result uses independence, countable additivity, a change of variables, or a moment condition, it is worth marking that point explicitly.