
Probability Note

Date: 2024/05/26
Last Updated: 2024-06-05T11:18:36.167Z
Categories: Probability
Tags: Probability
Read Time: 16 minutes


Random Variables and Probability Distributions

Random Variables

A random variable is a function that maps the outcomes \Omega of a random process to numerical values in \mathbb{R}. If \Omega is discrete, the random variable is called a discrete random variable. If \Omega is continuous, the random variable is called a continuous random variable.

Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of a random variable X is defined as

F(x) = P(X \leq x)

Probability Mass Function (PMF) for Discrete Random Variables

The probability mass function (PMF) of a discrete random variable X is defined as

p(x) = P(X = x)

Probability Density Function (PDF) for Continuous Random Variables

The probability density function (PDF) of a continuous random variable X is defined as

f(x) = \frac{dF(x)}{dx}

where F(x) is the CDF of X.

Expectation and Variance of Random Variables

The expectation of a random variable X is defined as

E[X] = \sum_{x} x p(x) \quad \text{for discrete random variables}

where p(x) is the PMF of X.

E[X] = \int_{-\infty}^{\infty} x f(x) \, dx \quad \text{for continuous random variables}

where f(x) is the PDF of X.

The variance of a random variable X is defined as

\text{Var}(X) = E[(X - E[X])^2]

By the definition of variance, the variance is always non-negative.

Alternatively, the variance can be calculated as

\text{Var}(X) = E[X^2] - E[X]^2
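
As a quick numerical illustration (a minimal sketch in Python, assuming NumPy is available; the PMF below is hypothetical), the following computes E[X] and Var(X) directly from a table of values and probabilities:

```python
import numpy as np

# Hypothetical PMF on {0, 1, 2, 3}.
values = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.2, 0.3, 0.4])      # must sum to 1

mean = np.sum(values * probs)               # E[X] = sum_x x p(x)
second_moment = np.sum(values**2 * probs)   # E[X^2]
variance = second_moment - mean**2          # Var(X) = E[X^2] - E[X]^2

print(mean, variance)  # 2.0 1.0
```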

Properties of Expectation and Variance

  1. E[aX + b] = aE[X] + b
  2. E[aX + bY] = aE[X] + bE[Y]
  3. \text{Var}(aX + b) = a^2 \text{Var}(X)
  4. \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y)

where a and b are constants, and \text{Cov}(X, Y) is the covariance between X and Y.

Moments of Random Variables

The n-th moment of a random variable X is defined as

E[X^n] = \sum_{x} x^n p(x) \quad \text{for discrete random variables}

E[X^n] = \int_{-\infty}^{\infty} x^n f(x) \, dx \quad \text{for continuous random variables}

Standard Deviation of Random Variables

The standard deviation of a random variable X is defined as

\text{SD}(X) = \sqrt{\text{Var}(X)}

Standard Discrete Distributions

Bernoulli Distribution

The Bernoulli distribution is a discrete distribution with two possible outcomes: 0 and 1. The PMF of a Bernoulli random variable X is defined as

p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}

where p is the probability of success.

The expectation and variance of a Bernoulli random variable X are

E[X] = p \quad \text{and} \quad \text{Var}(X) = p(1 - p)

Binomial Distribution

The Binomial distribution is a discrete distribution that models the number of successes in a fixed number of independent Bernoulli trials. The PMF of a Binomial random variable X is defined as

p(x) = \binom{n}{x} p^x (1 - p)^{n - x}

where n is the number of trials, x is the number of successes, and p is the probability of success.

The expectation and variance of a Binomial random variable X are

E[X] = np \quad \text{and} \quad \text{Var}(X) = np(1 - p)

Poisson Distribution

The Poisson distribution is a discrete distribution that models the number of events occurring in a fixed interval of time or space. The PMF of a Poisson random variable X is defined as

p(x) = \frac{\lambda^x e^{-\lambda}}{x!}

where \lambda is the average rate of events.

The expectation and variance of a Poisson random variable X are

E[X] = \lambda \quad \text{and} \quad \text{Var}(X) = \lambda

Geometric Distribution

The Geometric distribution is a discrete distribution that models the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials. The PMF of a Geometric random variable X is defined as

p(x) = (1 - p)^{x - 1} p

where x is the number of trials needed to achieve the first success, and p is the probability of success.

The expectation and variance of a Geometric random variable X are

E[X] = \frac{1}{p} \quad \text{and} \quad \text{Var}(X) = \frac{1 - p}{p^2}
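
As a sanity check (a minimal sketch assuming SciPy is available; the parameter values are arbitrary), the closed-form means and variances above can be compared against scipy.stats:

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0   # arbitrary parameters

print(stats.bernoulli(p).mean(), p)                # E[X] = p
print(stats.binom(n, p).var(), n * p * (1 - p))    # Var(X) = np(1 - p)
print(stats.poisson(lam).mean(), lam)              # E[X] = lambda
print(stats.geom(p).var(), (1 - p) / p**2)         # Var(X) = (1 - p)/p^2
```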

Standard Continuous Distributions

Uniform Distribution

The Uniform distribution is a continuous distribution with a constant probability density function (PDF) over a fixed interval. The PDF of a Uniform random variable X is defined as

f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}

where a and b are the lower and upper bounds of the interval.

The expectation and variance of a Uniform random variable X are

E[X] = \frac{a + b}{2} \quad \text{and} \quad \text{Var}(X) = \frac{(b - a)^2}{12}

Exponential Distribution

The Exponential distribution is a continuous distribution that models the time between events in a Poisson process. The PDF of an Exponential random variable X is defined as

f(x) = \lambda e^{-\lambda x}, \quad x \geq 0

where \lambda is the rate parameter.

The expectation and variance of an Exponential random variable X are

E[X] = \frac{1}{\lambda} \quad \text{and} \quad \text{Var}(X) = \frac{1}{\lambda^2}

Normal Distribution

The Normal distribution is a continuous distribution that is symmetric and bell-shaped. The PDF of a Normal random variable X is defined as

f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}

where \mu is the mean and \sigma is the standard deviation.

The expectation and variance of a Normal random variable X are

E[X] = \mu \quad \text{and} \quad \text{Var}(X) = \sigma^2

We usually write X \sim N(\mu, \sigma^2) to denote that X follows a Normal distribution with mean \mu and variance \sigma^2.

Gamma Distribution

The Gamma distribution is a continuous distribution that generalizes the Exponential distribution. The PDF of a Gamma random variable X is defined as

f(x) = \frac{\lambda^k x^{k - 1} e^{-\lambda x}}{\Gamma(k)}, \quad x > 0

where \lambda is the rate parameter, k is the shape parameter, and \Gamma(k) is the gamma function.

\Gamma(k) = \int_{0}^{\infty} x^{k - 1} e^{-x} \, dx

The expectation and variance of a Gamma random variable X are

E[X] = \frac{k}{\lambda} \quad \text{and} \quad \text{Var}(X) = \frac{k}{\lambda^2}

Beta Distribution

The Beta distribution is a continuous distribution defined on the interval [0, 1]. The PDF of a Beta random variable X is defined as

f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}

where \alpha and \beta are the shape parameters, and B(\alpha, \beta) is the beta function.

B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}

The expectation and variance of a Beta random variable X are

E[X] = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \text{Var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
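
The same kind of check works for the continuous families (a minimal sketch assuming SciPy; note that scipy.stats parameterizes the Exponential and Gamma distributions by scale = 1/\lambda rather than by rate):

```python
from scipy import stats

a, b = 2.0, 5.0          # Uniform bounds
lam = 1.5                # Exponential / Gamma rate
mu, sigma = 1.0, 2.0     # Normal parameters
k = 3.0                  # Gamma shape
alpha, beta = 2.0, 5.0   # Beta shapes

print(stats.uniform(loc=a, scale=b - a).var(), (b - a)**2 / 12)
print(stats.expon(scale=1 / lam).mean(), 1 / lam)
print(stats.norm(mu, sigma).var(), sigma**2)
print(stats.gamma(k, scale=1 / lam).mean(), k / lam)
print(stats.beta(alpha, beta).var(),
      alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1)))
```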

Discrete Bivariate Distributions

A bivariate distribution is a probability distribution that describes the joint behaviour of two random variables.

Joint Probability Mass Function (PMF)

Given two random variables X and Y, the joint probability mass function (PMF) for discrete random variables is defined as

p(x, y) = P(X = x, Y = y)

Marginal Probability Mass Function (PMF)

The marginal probability mass function (PMF) of a random variable X is defined as

p_X(x) = \sum_{y} p(x, y)

Conditional Probability Mass Function (PMF)

The conditional probability mass function (PMF) of a random variable X given Y = y is defined as

p_{X|Y}(x|y) = \frac{p(x, y)}{p_Y(y)}

Expectation and Variance of Bivariate Distributions

The expectation of a function g(X, Y) under a bivariate distribution with joint PMF p(x, y) is defined as

E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y) p(x, y)

The covariance of two random variables X and Y is defined as

\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]

Unlike the variance, the covariance can be negative, zero, or positive.

It can also be calculated as

\text{Cov}(X, Y) = E[XY] - E[X]E[Y]

The correlation coefficient of two random variables X and Y is defined as

\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}}

We can prove that -1 \leq \rho(X, Y) \leq 1.
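
These definitions translate directly into code. The sketch below (Python with NumPy, using a small hypothetical joint PMF) computes the marginals, E[XY], the covariance, and the correlation coefficient:

```python
import numpy as np

# Hypothetical joint PMF p(x, y); rows index x in {0, 1}, columns index y in {0, 1, 2}.
p = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.15, 0.30]])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

px = p.sum(axis=1)                # marginal PMF of X
py = p.sum(axis=0)                # marginal PMF of Y
EX, EY = x_vals @ px, y_vals @ py
EXY = x_vals @ p @ y_vals         # E[XY] = sum_x sum_y x y p(x, y)

cov = EXY - EX * EY               # Cov(X, Y) = E[XY] - E[X]E[Y]
varX = (x_vals**2) @ px - EX**2
varY = (y_vals**2) @ py - EY**2
rho = cov / np.sqrt(varX * varY)  # always lies in [-1, 1]
print(cov, rho)
```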

Independent Random Variables

Two random variables X and Y are independent if and only if

p(x, y) = p_X(x) p_Y(y)

for all x and y.

If X and Y are independent, then

E[XY] = E[X]E[Y] \quad \text{and} \quad \text{Cov}(X, Y) = 0

Uncorrelated Random Variables

Two random variables X and Y are uncorrelated if and only if

\text{Cov}(X, Y) = 0

If X and Y are uncorrelated, then

E[XY] = E[X]E[Y]

Note: Uncorrelated random variables are not necessarily independent.
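
A standard example of this note: take X uniform on \{-1, 0, 1\} and Y = X^2. Then E[XY] = E[X^3] = 0 = E[X]E[Y], so X and Y are uncorrelated, yet Y is a deterministic function of X. A minimal simulation sketch (Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1, 0, 1], size=1_000_000)  # X uniform on {-1, 0, 1}
y = x**2                                    # Y is a function of X

print(np.cov(x, y)[0, 1])                   # ~0: uncorrelated

# Not independent: P(X = 1, Y = 0) = 0, but P(X = 1) P(Y = 0) = 1/9.
print(np.mean((x == 1) & (y == 0)), np.mean(x == 1) * np.mean(y == 0))
```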

Continuous Bivariate Distributions

Joint Cumulative Distribution Function (CDF)

Given two random variables X and Y, the joint cumulative distribution function (CDF) for continuous random variables is defined as

F_{X,Y}(x, y) = P(X \leq x, Y \leq y)

Marginal Cumulative Distribution Function (CDF)

The marginal cumulative distribution function (CDF) of a random variable X is defined as

F_X(x) = P(X \leq x) = P(X \leq x, Y < \infty) = F_{X,Y}(x, \infty)

Joint Probability Density Function (PDF)

Given two random variables X and Y, if there exists a function f(x, y) such that

P((X, Y) \in A) = \iint_{A} f(x, y) \, dx \, dy

for all Lebesgue-measurable sets A, then f(x, y) is the joint probability density function (PDF) of X and Y, and X and Y are called jointly continuous random variables.

By the definition of the joint PDF, we have

F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v) \, du \, dv

And

f(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x \partial y}

Marginal Probability Density Function (PDF)

The marginal probability density function (PDF) of a random variable X is defined as

f_X(x) = \frac{dF_X(x)}{dx} = \int_{-\infty}^{\infty} f(x, y) \, dy

Continuous Multivariate Distributions

Joint Cumulative Distribution Function (CDF)

Given n random variables X_1, X_2, \ldots, X_n, let \mathbf{X} = (X_1, X_2, \ldots, X_n). The joint cumulative distribution function (CDF) for continuous random variables is defined as

F_\mathbf{X}(\mathbf{x}) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n)

Joint Probability Density Function (PDF)

Given n random variables X_1, X_2, \ldots, X_n, let \mathbf{X} = (X_1, X_2, \ldots, X_n). If there exists a function f(x_1, x_2, \ldots, x_n) such that

P(\mathbf{X} \in A) = \int_{A} f(\mathbf{x}) \, d\mathbf{x}

for all Lebesgue-measurable sets A, then f(\mathbf{x}) is the joint probability density function (PDF) of \mathbf{X}, and X_1, X_2, \ldots, X_n are called jointly continuous random variables.

By the definition of the joint PDF, we have

F_\mathbf{X}(\mathbf{x}) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \ldots \int_{-\infty}^{x_n} f(u_1, u_2, \ldots, u_n) \, du_1 \, du_2 \ldots du_n

And

f(\mathbf{x}) = \frac{\partial^n F_\mathbf{X}(\mathbf{x})}{\partial x_1 \partial x_2 \ldots \partial x_n}

Marginal Probability Density Function (PDF)

The marginal probability density function (PDF) of a subset of the random variables, X_{k_1}, X_{k_2}, \ldots, X_{k_m}, is obtained by integrating the joint PDF over the remaining variables:

f_{X_{k_1}, X_{k_2}, \ldots, X_{k_m}}(x_{k_1}, x_{k_2}, \ldots, x_{k_m}) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n) \prod_{j \notin \{k_1, \ldots, k_m\}} dx_j

Independence of Random Variables

Two random variables X and Y are independent if and only if

F(x, y) = F_X(x) F_Y(y)

for all x and y.

This can be thought of as saying that the joint behaviour of X and Y factorizes into the product of their marginal behaviours.

For jointly continuous random variables, the definition can equivalently be formulated in terms of the joint PDF:

f(x, y) = f_X(x) f_Y(y)

To show that two random variables are not independent, we only need to find one pair of x and y for which the equation does not hold.

Functions of Independent Random Variables

Given two independent random variables X and Y and functions g and h, the random variables Z = g(X) and W = h(Y) are also independent.

Mutual Independence of Random Variables

A set of random variables X_1, X_2, \ldots, X_n are mutually independent if and only if

F(x_1, x_2, \ldots, x_n) = F_{X_1}(x_1) F_{X_2}(x_2) \ldots F_{X_n}(x_n)

Note: Mutual independence implies pairwise independence; however, the converse is not true.

Independent and Identically Distributed (IID) Random Variables

A set of random variables X_1, X_2, \ldots, X_n are independent and identically distributed (IID) if and only if

  1. They are mutually independent.
  2. They have the same distribution.

Sum of Random Variables

Given two independent random variables X and Y, define their sum as

Z = X + Y

Then, the CDF of Z can be calculated as

\begin{align}
F_Z(z) &= P(Z \leq z) \\
&= P(X + Y \leq z) \\
&= \int_{-\infty}^{\infty} P(X + Y \leq z \mid X = x) f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} P(Y \leq z - x) f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} F_Y(z - x) f_X(x) \, dx
\end{align}

The PDF of Z can be calculated as

f_Z(z) = \int_{-\infty}^{\infty} f_Y(z - x) f_X(x) \, dx

This is called the convolution of the PDFs of XX and YY.
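
As a numerical illustration of the convolution formula (a sketch assuming NumPy/SciPy): the sum of two independent Exponential(\lambda) random variables has a Gamma(k = 2, rate = \lambda) density, and a discretized convolution of the exponential PDF with itself reproduces it:

```python
import numpy as np
from scipy import stats

lam, dx = 1.0, 0.001
x = np.arange(0, 20, dx)

f = stats.expon(scale=1 / lam).pdf(x)       # common PDF of X and Y
f_sum = np.convolve(f, f)[: len(x)] * dx    # discretized (f_X * f_Y)(z)

gamma_pdf = stats.gamma(2, scale=1 / lam).pdf(x)
print(np.max(np.abs(f_sum - gamma_pdf)))    # small discretization error
```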

Expectation, Covariance and Correlation of Multiple Random Variables

Expectation of Multiple Random Variables

Let \mathbf{X} = (X_1, X_2, \ldots, X_n) be a vector of random variables. For a Lebesgue-measurable function g: \mathbb{R}^n \rightarrow \mathbb{R}, the expectation of g(\mathbf{X}) is defined as

E[g(\mathbf{X})] = \int_{\mathbb{R}^n} g(\mathbf{x}) f(\mathbf{x}) \, d\mathbf{x}

Properties of Expectation of Multiple Random Variables

  1. E[a g(\mathbf{X}) + b h(\mathbf{X}) + c] = a E[g(\mathbf{X})] + b E[h(\mathbf{X})] + c
  2. If X and Y are independent, then E[g(X)h(Y)] = E[g(X)]E[h(Y)] for any functions g and h.

Covariance of Multiple Random Variables

The covariance of two random variables X and Y is defined as

\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]

The covariance can be calculated as

\text{Cov}(X, Y) = E[XY] - E[X]E[Y]

Properties of Covariance of Multiple Random Variables

  1. \text{Cov}(X, Y) = \text{Cov}(Y, X)
  2. \text{Cov}(X, X) = \text{Var}(X)
  3. \text{Cov}(aX + b, cY + d) = ac \, \text{Cov}(X, Y)
  4. \text{Cov}\left(\sum_i a_i X_i, \sum_j b_j Y_j\right) = \sum_i \sum_j a_i b_j \text{Cov}(X_i, Y_j)
  5. \text{Cov}(X, Y) = E[XY] - E[X]E[Y]

Cauchy-Schwarz Inequality In Terms of Covariance

|\text{Cov}(X, Y)| \leq \sqrt{\text{Var}(X) \text{Var}(Y)}

Equality holds if and only if X and Y are linearly related.

Correlation Coefficient of Multiple Random Variables

The correlation coefficient of two random variables X and Y is defined as

\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}}

By the Cauchy-Schwarz inequality, we have -1 \leq \rho(X, Y) \leq 1.

Moments of Multiple Random Variables

The n-th (raw) moment of a random variable X is defined as E[X^n], and the n-th central moment of X is defined as E[(X - E[X])^n].

The joint (raw) moment of random variables X and Y is defined as E[X^i Y^j], and the joint central moment of X and Y is defined as E[(X - E[X])^i (Y - E[Y])^j].

Expectation and Variance of Multiple Random Variables

Given a set of random variables \mathbf{X} = (X_1, X_2, \ldots, X_n)^T,

E[\mathbf{X}] = (E[X_1], E[X_2], \ldots, E[X_n])^T

which is an n \times 1 vector.

The covariance matrix of \mathbf{X} is defined as

\text{Cov}(\mathbf{X}) = E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^T]

which is an n \times n matrix.

Conditional Distribution and Expectation

Conditional Distribution

Conditional Probability Mass Function (PMF)

Given two random variables X and Y, the conditional probability mass function (PMF) of X given Y = y is defined as

p_{X|Y}(x|y) = \frac{p(x, y)}{p_Y(y)}

Conditional Cumulative Distribution Function (CDF) for Discrete Random Variables

Given two random variables X and Y, the conditional cumulative distribution function (CDF) of X given Y = y is defined as

F_{X|Y}(x|y) = P(X \leq x | Y = y)

Conditional Probability Density Function (PDF) for Continuous Random Variables

Given two random variables X and Y, the conditional probability density function (PDF) of X given Y = y is defined as

f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}

Conditional Cumulative Distribution Function (CDF) for Continuous Random Variables

Given two random variables X and Y, the conditional cumulative distribution function (CDF) of X given Y = y is defined as

F_{X|Y}(x|y) = P(X \leq x | Y = y) = \int_{-\infty}^{x} f_{X|Y}(u|y) \, du

Conditional Expectation

Given two random variables X and Y, the conditional expectation of X given Y = y is defined as

E[X|Y = y] = \sum_{x} x \, p_{X|Y}(x|y) \quad \text{for discrete random variables}

E[X|Y = y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, dx \quad \text{for continuous random variables}

We can also define a function of y as

\psi(y) = E[X|Y = y]

Evaluated at the random variable Y, \psi(Y) is itself a random variable, which we call the conditional expectation of X given Y.

The Law of Iterated Expectations (The Tower Law)

Given two random variables X and Y, the law of iterated expectations states that

E[E[X|Y]] = E[X]

Proof:

\begin{align}
E[E[X|Y]] &= \int_{-\infty}^{\infty} E[X|Y = y] f_Y(y) \, dy \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} x f_{X|Y}(x|y) \, dx \right) f_Y(y) \, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y) \, dx \, dy \\
&= E[X]
\end{align}
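
A quick Monte Carlo check of the tower law (a sketch assuming NumPy; the model is hypothetical): draw Y ~ Uniform(0, 1) and then X | Y = y ~ N(y, 1), so that E[X|Y] = Y and hence E[X] should equal E[Y] = 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

y = rng.uniform(0.0, 1.0, size=n)   # Y ~ Uniform(0, 1)
x = rng.normal(loc=y, scale=1.0)    # X | Y = y ~ N(y, 1), so E[X|Y] = Y

print(np.mean(x), np.mean(y))       # both ~0.5, i.e. E[X] = E[E[X|Y]]
```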

Law of Total Probability

Given a random variable X and an event A, the law of total probability states that

P(A) = \int_{-\infty}^{\infty} P(A|X = x) f_X(x) \, dx

Wald's Equation

Given IID random variables X_1, X_2, \ldots with common mean E[X], and a stopping time N, which is an integer-valued random variable, then

E\left[\sum_{i=1}^{N} X_i\right] = E[X]E[N]

Properties of Conditional Expectation

  1. E[aX + bY + c|Z] = aE[X|Z] + bE[Y|Z] + c
  2. If X \geq 0, then E[X|Y] \geq 0.
  3. If X and Y are independent, then E[X|Y] = E[X].
  4. For any functions g and h, E[g(X)h(Y)|Y] = h(Y)E[g(X)|Y].

Conditional Variance for Multiple Random Variables

Given two random variables X and Y, the conditional variance of X given Y is defined as

\text{Var}(X|Y) = E[(X - E[X|Y])^2|Y]

The conditional variance can be calculated as

\text{Var}(X|Y) = E[X^2|Y] - E[X|Y]^2

Note that the conditional variance is a function of Y, and is therefore itself a random variable.

Law of Total Variance

Given two random variables X and Y, the law of total variance states that

\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])
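
Continuing the same hypothetical model (Y ~ Uniform(0, 1), X | Y ~ N(Y, 1)), we have Var(X|Y) = 1 and E[X|Y] = Y, so the law of total variance predicts Var(X) = 1 + Var(Y) = 1 + 1/12. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

y = rng.uniform(0.0, 1.0, size=n)   # Y ~ Uniform(0, 1)
x = rng.normal(loc=y, scale=1.0)    # X | Y ~ N(Y, 1)

# Var(X) vs. E[Var(X|Y)] + Var(E[X|Y]) = 1 + 1/12
print(np.var(x), 1 + 1 / 12)
```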

Transformations of Random Variables

Support of Probability Density Function (PDF)

Given a random variable X with a PDF f_X(x), the support of f_X(x) is the set of values of x where f_X(x) > 0.

Monotonic Transformations

Given a random variable X with a PDF f_X(x) and a function Y = g(X), if g is a monotonic function, then the CDF of Y is

F_Y(y) = \begin{cases} F_X(g^{-1}(y)) & \text{if } g \text{ is increasing} \\ 1 - F_X(g^{-1}(y)) & \text{if } g \text{ is decreasing} \end{cases}

Then, the PDF of Y is

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|

For non-monotonic transformations, we can break the transformation into monotonic parts.
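
For a concrete monotonic case (a sketch assuming SciPy): with X ~ N(0, 1) and the increasing map Y = e^X, we have g^{-1}(y) = \ln y and |d g^{-1}/dy| = 1/y, so f_Y(y) = f_X(\ln y)/y, which is the standard lognormal density:

```python
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5.0, 50)
f_y = stats.norm.pdf(np.log(y)) / y          # f_X(g^{-1}(y)) |d g^{-1}(y)/dy|

print(np.max(np.abs(f_y - stats.lognorm(s=1.0).pdf(y))))  # ~0
```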

Transformation of Bivariate Random Variables

Given random variables X_1 and X_2 with joint PDF f_{X_1, X_2}(x_1, x_2) and a one-to-one transformation (Y_1, Y_2) = T(X_1, X_2), where T: \mathbb{R}^2 \rightarrow \mathbb{R}^2, let H = T^{-1}.

We define J_H, the Jacobian determinant of H, as

J_{H} = \frac{\partial (H_1, H_2)}{\partial (y_1, y_2)} = \det\begin{bmatrix} \frac{\partial H_1}{\partial y_1} & \frac{\partial H_1}{\partial y_2} \\ \frac{\partial H_2}{\partial y_1} & \frac{\partial H_2}{\partial y_2} \end{bmatrix}

Then, the joint PDF of (Y_1, Y_2) is

f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(H_1(y_1, y_2), H_2(y_1, y_2)) \, |J_{H}|

Note: The Jacobian determinant satisfies J_{H} = J_{H^{-1}}^{-1} = J_T^{-1}.
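
A classical instance of this formula is the Box-Muller transform: mapping two independent Uniform(0, 1) variables through Y_1 = \sqrt{-2\ln X_1}\cos(2\pi X_2) and Y_2 = \sqrt{-2\ln X_1}\sin(2\pi X_2) yields two independent standard normals. The sketch below (assuming NumPy/SciPy) checks this empirically rather than via the Jacobian algebra:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
u1, u2 = rng.uniform(size=1_000_000), rng.uniform(size=1_000_000)

r = np.sqrt(-2 * np.log(u1))                     # Box-Muller transform
y1, y2 = r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

print(stats.kstest(y1[:10_000], "norm").pvalue)  # p-value typically not small
print(np.corrcoef(y1, y2)[0, 1])                 # ~0
```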

Transformation of Multivariate Random Variables

The theorem from Transformation of Bivariate Random Variables can be generalized to multiple random variables.

Given random variables X_1, X_2, \ldots, X_n with joint PDF f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) and a one-to-one transformation (Y_1, Y_2, \ldots, Y_n) = T(X_1, X_2, \ldots, X_n), where T: \mathbb{R}^n \rightarrow \mathbb{R}^n, let H = T^{-1}.

We define J_H, the Jacobian determinant of H, as

J_{H} = \frac{\partial (H_1, H_2, \ldots, H_n)}{\partial (y_1, y_2, \ldots, y_n)} = \det\begin{bmatrix} \frac{\partial H_1}{\partial y_1} & \frac{\partial H_1}{\partial y_2} & \ldots & \frac{\partial H_1}{\partial y_n} \\ \frac{\partial H_2}{\partial y_1} & \frac{\partial H_2}{\partial y_2} & \ldots & \frac{\partial H_2}{\partial y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial H_n}{\partial y_1} & \frac{\partial H_n}{\partial y_2} & \ldots & \frac{\partial H_n}{\partial y_n} \end{bmatrix}

Then, the joint PDF of (Y_1, Y_2, \ldots, Y_n) is

f_{Y_1, Y_2, \ldots, Y_n}(y_1, y_2, \ldots, y_n) = f_{X_1, X_2, \ldots, X_n}(H_1(y_1, \ldots, y_n), H_2(y_1, \ldots, y_n), \ldots, H_n(y_1, \ldots, y_n)) \, |J_{H}|

Generating Functions

The Moment Generating Function (MGF)

Given a random variable X, the moment generating function (MGF) M_X(t) of X is defined as

M_X(t) = E[e^{tX}]

The domain of the MGF is the set of t such that M_X(t) exists and is finite.

If the domain does not contain an open neighbourhood of 0, then we say the MGF does not exist.

Example: The Moment Generating Function of the Standard Normal Distribution

Given a random variable X that follows the standard normal distribution,

f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}

By definition,

\begin{align}
M_X(t) &= E[e^{tX}] \\
&= \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx - \frac{x^2}{2}} \, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 - 2tx)} \, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - t)^2 + \frac{t^2}{2}} \, dx \\
&= e^{\frac{t^2}{2}}
\end{align}

where the last step uses the fact that \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - t)^2} is the PDF of N(t, 1) and integrates to 1.
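
The closed form can be checked numerically (a sketch assuming SciPy): evaluate E[e^{tX}] by quadrature and compare with e^{t^2/2}:

```python
import numpy as np
from scipy import integrate, stats

def mgf_std_normal(t: float) -> float:
    # Numerically evaluate E[e^{tX}] for X ~ N(0, 1).
    value, _ = integrate.quad(lambda x: np.exp(t * x) * stats.norm.pdf(x),
                              -np.inf, np.inf)
    return value

for t in (0.0, 0.5, 1.0, 2.0):
    print(mgf_std_normal(t), np.exp(t**2 / 2))   # the two values should agree
```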

Example: The Moment Generating Function of the Exponential Distribution

Given a random variable X that follows the exponential distribution with rate \lambda,

f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0

By definition, for t < \lambda,

\begin{align}
M_{X}(t) &= E[e^{tX}] \\
&= \int_{0}^{\infty} e^{tx} \lambda e^{-\lambda x} \, dx \\
&= \lambda \int_{0}^{\infty} e^{-(\lambda - t)x} \, dx \\
&= \frac{\lambda}{\lambda - t} \left[ -e^{-(\lambda - t)x} \right]_{0}^{\infty} \\
&= \frac{\lambda}{\lambda - t}
\end{align}

Properties of the Moment Generating Function

M_X(0) = E[1] = 1

The n-th derivative of the MGF satisfies M_X^{(n)}(t) = E[X^n e^{tX}], so at t = 0:

M_X^{(n)}(0) = E[X^{n}]

By the previous property, the Maclaurin series of the MGF is:

M_X(t) = \sum_{n=0}^{\infty} \frac{E[X^{n}]}{n!} t^{n}

Also, if X has MGF M_X(t) and Y = aX + b, then Y has MGF M_Y(t) = e^{tb} M_X(at).

Uniqueness of the Moment Generating Function

Given two random variables X and Y with MGFs M_X(t) and M_Y(t), if M_X(t) = M_Y(t) for all t in an open neighbourhood of 0, then X and Y have the same distribution.

Joint Moment Generating Function (JMGF)

The joint moment generating function (JMGF) of random variables X_1, X_2, \ldots, X_n is defined as a function from \mathbb{R}^n to \mathbb{R}:

M_\mathbf{X}(\mathbf{t}) = E[e^{\mathbf{t}^T \mathbf{X}}]

where \mathbf{t} = (t_1, t_2, \ldots, t_n)^T and \mathbf{X} = (X_1, X_2, \ldots, X_n)^T.

If the JMGF exists and is finite on an open neighbourhood of \mathbf{0}, then we say the JMGF exists.

Properties of the Joint Moment Generating Function

If the JMGF exists and is finite on an open neighbourhood of \mathbf{0}, then it uniquely determines the joint distribution of X_1, X_2, \ldots, X_n.

The MGF of X_i can be expressed as:

M_{X_i}(t_i) = M_{\mathbf{X}}(0, \ldots, 0, t_i, 0, \ldots, 0)

The joint moments of X_1, X_2, \ldots, X_n can be expressed as:

E[X_1^{i_1} X_2^{i_2} \ldots X_n^{i_n}] = \left. \frac{\partial^{i_1 + i_2 + \ldots + i_n} M_{\mathbf{X}}(\mathbf{t})}{\partial t_1^{i_1} \partial t_2^{i_2} \ldots \partial t_n^{i_n}} \right|_{\mathbf{t} = \mathbf{0}}

Relation Between the Joint Moment Generating Function and Moment Generating Function

Given random variables X_1, X_2, \ldots, X_n with MGFs M_{X_i}(t_i) and JMGF M_{X_1, X_2, \ldots, X_n}(\mathbf{t}), then X_1, X_2, \ldots, X_n are mutually independent if and only if

M_{X_1, X_2, \ldots, X_n}(\mathbf{t}) = M_{X_1}(t_1) M_{X_2}(t_2) \ldots M_{X_n}(t_n)

Sums of Independent Random Variables

Given independent random variables X_1, X_2, \ldots, X_n and S = a_1X_1 + a_2X_2 + \ldots + a_nX_n, the MGF of S is

M_S(t) = M_{X_1}(a_1t) M_{X_2}(a_2t) \ldots M_{X_n}(a_nt)

Probability Generating Function (PGF)

Given a random variable X that takes non-negative integer values, the probability generating function (PGF) \phi_X(z) of X is defined as

\phi_X(z) = E[z^X] = \sum_{x=0}^{\infty} z^x P(X = x)

Properties of the Probability Generating Function

\phi_X(1) = 1

The PMF of X is uniquely determined by \phi_X(z).

The n-th factorial moment of X is

E[X(X-1)\ldots(X-n+1)] = \left. \frac{d^n \phi_X(z)}{dz^n} \right|_{z=1}

Random variables X_1, X_2, \ldots, X_n are mutually independent if and only if the joint PGF \phi_{X_1, X_2, \ldots, X_n}(z_1, z_2, \ldots, z_n) factorizes:

\phi_{X_1, X_2, \ldots, X_n}(z_1, z_2, \ldots, z_n) = E[z_1^{X_1} z_2^{X_2} \ldots z_n^{X_n}] = \phi_{X_1}(z_1) \phi_{X_2}(z_2) \ldots \phi_{X_n}(z_n)

The PGF of the sum of independent random variables X_1, X_2, \ldots, X_n is

\phi_{X_1 + X_2 + \ldots + X_n}(z) = \phi_{X_1}(z) \phi_{X_2}(z) \ldots \phi_{X_n}(z)

Relation of PGF and MGF

Given a random variable X that takes non-negative integer values, with PGF \phi_X(z) and MGF M_X(t),

\begin{align}
\phi_X(e^t) &= M_X(t) \\
M_X(\ln(t)) &= \phi_X(t)
\end{align}

Markov and Chebyshev Inequalities

Markov Inequality

Given a non-negative random variable X and a > 0, then

P(X \geq a) \leq \frac{E[X]}{a}

Proof:

\begin{align}
P(X \geq a) &= \int_a^{\infty} f_X(x) \, dx \\
&= \frac{1}{a} \int_a^{\infty} a f_X(x) \, dx \\
&\le \frac{1}{a} \int_a^{\infty} x f_X(x) \, dx \\
&\le \frac{1}{a} \int_0^{\infty} x f_X(x) \, dx \\
&= \frac{E[X]}{a}
\end{align}

Chebyshev Inequality

Given a random variable X with mean \mu and variance \sigma^2, and a > 0, then

P(|X - \mu| \geq a) \leq \frac{\sigma^2}{a^2}

Proof:

Define Y = (X - \mu)^2; then Y is a non-negative random variable, and E[Y] = \text{Var}(X) = \sigma^2.

By the Markov inequality,

P(Y \geq a^2) \leq \frac{E[Y]}{a^2} = \frac{\sigma^2}{a^2}

Then,

P(|X - \mu| \geq a) = P((X - \mu)^2 \geq a^2) = P(Y \geq a^2) \leq \frac{\sigma^2}{a^2}
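
Both bounds are easy to see in simulation (a sketch assuming NumPy, with X ~ Exponential(1), so E[X] = Var(X) = 1); the empirical tail probabilities sit well below the bounds:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=1_000_000)   # X >= 0, E[X] = 1, Var(X) = 1

a = 3.0
print(np.mean(x >= a), 1 / a)                    # Markov:    P(X >= a) <= E[X]/a
print(np.mean(np.abs(x - 1.0) >= a), 1 / a**2)   # Chebyshev: P(|X - mu| >= a) <= sigma^2/a^2
```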

Multivariate Normal Distribution

We define the higher-dimensional normal distribution as an analogue of the one-dimensional normal distribution.

We say a random vector \mathbf{X} = (X_1, X_2, \ldots, X_n)^T follows a multivariate normal distribution if it can be expressed as

\mathbf{X} = \mathbf{\mu} + \mathbf{A}\mathbf{Z}

where l \le n, \mathbf{\mu} is an n \times 1 vector of means, \mathbf{A} is an n \times l matrix of constants, and \mathbf{Z} is an l \times 1 vector of independent standard normal random variables.

By convention, we write \Sigma = A A^T, and we denote the multivariate normal distribution as

\mathbf{X} \sim N_n(\mathbf{\mu}, \Sigma)

Joint PDF of Multivariate Normal Distribution

If we assume that \Sigma has full rank (so that A is an invertible n \times n matrix), we can use the multivariate transformation formula to derive the joint PDF of \mathbf{X}:

\begin{align}
f_X(x) &= f_Z(A^{-1}(x-\mu)) \, |\det(A^{-1})| \\
&= |\det(A^{-1})| \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z_i^2}, \quad z = A^{-1}(x-\mu) \\
&= |\det(A^{-1})| \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} z^T z} \\
&= |\det(A^{-1})| \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}(A^{-1}(x-\mu))^T A^{-1}(x-\mu)} \\
&= \frac{1}{(2\pi)^{n/2} \sqrt{\det\Sigma}} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)}
\end{align}

where the last step uses (A^{-1})^T A^{-1} = (A A^T)^{-1} = \Sigma^{-1} and |\det(A^{-1})| = 1/\sqrt{\det\Sigma}.

Joint Moment Generating Function of Multivariate Normal Distribution

If we assume that \Sigma has full rank, the joint moment generating function of \mathbf{X} is

\begin{align}
M_{X}(t) &= E[e^{t^T X}] \\
&= \int_{\mathbb{R}^n} e^{t^T x} f_X(x) \, dx \\
&= \frac{1}{(2\pi)^{n/2} \sqrt{\det\Sigma}} \int_{\mathbb{R}^n} e^{t^T x - \frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)} \, dx \\
&= \frac{1}{(2\pi)^{n/2} \sqrt{\det\Sigma}} \int_{\mathbb{R}^n} e^{-\frac{1}{2}\left[ x^T \Sigma^{-1} x - 2(\mu + \Sigma t)^T \Sigma^{-1} x + \mu^T \Sigma^{-1} \mu \right]} \, dx \\
&= e^{\frac{1}{2}\left[ (\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t) - \mu^T \Sigma^{-1} \mu \right]} \cdot \frac{1}{(2\pi)^{n/2} \sqrt{\det\Sigma}} \int_{\mathbb{R}^n} e^{-\frac{1}{2}(x - \mu - \Sigma t)^T \Sigma^{-1} (x - \mu - \Sigma t)} \, dx \\
&= e^{\frac{1}{2}\left[ (\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t) - \mu^T \Sigma^{-1} \mu \right]} \\
&= e^{t^T \mu + \frac{1}{2} t^T \Sigma t}
\end{align}

Here the fourth line uses t^T x = (\Sigma t)^T \Sigma^{-1} x (\Sigma is symmetric) to combine the linear terms, the fifth completes the square in x, and the remaining integral is that of a N(\mu + \Sigma t, \Sigma) density and therefore equals 1.

Moments of Multivariate Normal Distribution

By the joint moment generating function of the multivariate normal distribution,

E[X_{1}^{k_1} X_{2}^{k_2} \ldots X_{n}^{k_n}] = \left. \frac{\partial^{k_1 + k_2 + \ldots + k_n} M_{X}(\mathbf{t})}{\partial t_1^{k_1} \partial t_2^{k_2} \ldots \partial t_n^{k_n}} \right|_{\mathbf{t} = \mathbf{0}}

In particular, since

\begin{align}
\frac{\partial}{\partial t_i} M_{X}(\mathbf{t}) &= \frac{\partial}{\partial t_i} e^{t^T \mu + \frac{1}{2} t^T \Sigma t} \\
&= \left[ e_i^T \mu + \frac{1}{2} e_i^T \Sigma t + \frac{1}{2} t^T \Sigma e_i \right] M_{X}(\mathbf{t}) \\
&= e_i^T \left[ \mu + \Sigma t \right] M_{X}(\mathbf{t})
\end{align}

\begin{align}
\frac{\partial^2}{\partial t_j \partial t_i} M_{X}(\mathbf{t}) &= \frac{\partial}{\partial t_j} \left( e_i^T \left[ \mu + \Sigma t \right] M_{X}(\mathbf{t}) \right) \\
&= e_i^T \Sigma e_j \, M_{X}(\mathbf{t}) + e_i^T \left[ \mu + \Sigma t \right] e_j^T \left[ \mu + \Sigma t \right] M_{X}(\mathbf{t}) \\
&= e_i^T \Sigma e_j \, M_{X}(\mathbf{t}) + e_i^T \left[ \mu + \Sigma t \right] \left[ \mu + \Sigma t \right]^T e_j \, M_{X}(\mathbf{t})
\end{align}

where e_i is the i-th unit vector.

Then, we can calculate the moments of the multivariate normal distribution.

E(X_i) = \left. \frac{\partial}{\partial t_i} M_{X}(\mathbf{t}) \right|_{\mathbf{t} = \mathbf{0}} = e_i^T \mu \, M_{X}(\mathbf{0}) = \mu_i

\begin{align}
E(X_i X_j) &= \left. \frac{\partial^2}{\partial t_j \partial t_i} M_{X}(\mathbf{t}) \right|_{\mathbf{t} = \mathbf{0}} \\
&= e_i^T \Sigma e_j \, M_{X}(\mathbf{0}) + e_i^T \mu \mu^T e_j \, M_{X}(\mathbf{0}) \\
&= \Sigma_{ij} + \mu_i \mu_j
\end{align}

And the covariance:

\text{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = \Sigma_{ij}

Thus, the covariance matrix of \mathbf{X} is \Sigma.
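
The definition \mathbf{X} = \mu + A\mathbf{Z} also gives a direct way to sample from the distribution and to check E[\mathbf{X}] = \mu and \text{Cov}(\mathbf{X}) = \Sigma = AA^T empirically (a sketch assuming NumPy; \mu and A are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

mu = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
Sigma = A @ A.T                          # covariance matrix

z = rng.standard_normal((1_000_000, 3))  # rows of independent standard normals
x = mu + z @ A.T                         # each row is mu + A z

print(np.allclose(x.mean(axis=0), mu, atol=0.01))              # E[X] = mu
print(np.allclose(np.cov(x, rowvar=False), Sigma, atol=0.01))  # Cov(X) = Sigma
```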

Bivariate Normal Distribution

Taking n = 2 in the multivariate normal distribution gives the bivariate normal distribution.

The joint PDF of the bivariate normal distribution is

f_{X_1, X_2}(x_1, x_2) = \frac{|\det A^{-1}|}{2\pi} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)}

where \mu = (\mu_1, \mu_2)^T and \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}.

By the moments of the multivariate normal distribution, \Sigma can also be expressed as

\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) \\ \text{Cov}(X_1, X_2) & \text{Var}(X_2) \end{bmatrix}

Properties of Multivariate Normal Distribution

Affine Transformation of Multivariate Normal Distribution

Given \mathbf{X} a multivariate normal random vector and Y = A\mathbf{X} + b, then Y also follows a multivariate normal distribution.

Marginal Distribution of Multivariate Normal Distribution

Given \mathbf{X} a multivariate normal random vector, to get the marginal distribution of Y = (X_{k_1}, X_{k_2}, \ldots, X_{k_i}), we can let A be the i \times n matrix whose j-th row is the k_j-th unit vector (a 1 in position k_j and 0 elsewhere), and b be an i \times 1 vector of zeros.

Then Y = A\mathbf{X} + b, so by the affine transformation property Y is also multivariate normal.

Condition of Independence of Multivariate Normal Distribution

Given \mathbf{X} a multivariate normal random vector, X_i and X_j are independent if and only if \text{Cov}(X_i, X_j) = 0.

Degenerate Multivariate Normal Distribution

Given \mathbf{X} a multivariate normal random vector, if \Sigma is a singular matrix, then \mathbf{X} follows a degenerate multivariate normal distribution.

Suppose x is an eigenvector of \Sigma with eigenvalue 0, and let Y = x^T \mathbf{X}. Then the mean of Y is x^T \mu and the variance of Y is x^T \Sigma x = 0.

Hence Y is almost surely a constant.

Limiting Behaviors of Sums of Random Variables

In this section, we assume X_1, X_2, \ldots to be IID random variables with \mu = E[X_i] and \sigma^2 = \text{Var}(X_i).

Let S_{n} = \sum_{i=1}^{n} X_i.

Then,

E[S_{n}] = n\mu

\text{Var}(S_{n}) = n\sigma^2

E\left[\frac{S_{n}}{n}\right] = \mu

\text{Var}\left(\frac{S_{n}}{n}\right) = \frac{\sigma^2}{n}

Weak Law of Large Numbers

By intuition, as n increases, the sample mean \frac{S_{n}}{n} converges to the population mean \mu. In formal terms, the weak law of large numbers states that \frac{S_{n}}{n} converges to \mu in probability: for all \epsilon > 0,

\lim_{n\rightarrow\infty} P\left(\left|\frac{S_{n}}{n} - \mu\right| \geq \epsilon\right) = 0

The stronger, almost-sure statement

P\left(\lim_{n\rightarrow\infty} \frac{S_{n}}{n} = \mu\right) = 1

is the strong law of large numbers.
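
A short simulation makes the convergence visible (a sketch assuming NumPy, with Bernoulli(0.3) samples so that \mu = 0.3):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = 0.3                                  # mean of Bernoulli(0.3)

for n in (10, 1_000, 100_000):
    x = rng.binomial(1, mu, size=n)
    print(n, abs(x.mean() - mu))          # deviation from mu typically shrinks as n grows
```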

Central Limit Theorem

Limit of Distribution

Given a random variable X with CDF F_X(x) and a sequence of random variables X_1, X_2, \ldots with CDFs F_{X_n}(x), we say that the sequence converges in distribution to X if F_{X_n}(x) converges pointwise to F_X(x) at every continuity point of F_X.

As the MGF uniquely determines the distribution of a random variable, we have the following theorem:

Given a random variable X with MGF M_X(t) and a sequence of random variables X_1, X_2, \ldots with MGFs M_{X_n}(t), where all of the MGFs exist and are finite on a common open neighbourhood of 0, the sequence X_1, X_2, \ldots converges in distribution to X if and only if M_{X_n}(t) converges pointwise to M_X(t) on that neighbourhood.

Central Limit Theorem

Given IID random variables X_1, X_2, \ldots with mean \mu and finite variance \sigma^2, and S_{n} = \sum_{i=1}^{n} X_i, then \frac{S_{n} - n\mu}{\sqrt{n}\sigma} converges in distribution to the standard normal distribution.
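
A simulation sketch (assuming NumPy, with Exponential(rate 2) summands, which are far from normal) shows the standardized sums looking approximately standard normal for moderate n:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, lam = 500, 20_000, 2.0
mu, sigma = 1 / lam, 1 / lam                 # mean and sd of Exponential(rate = 2)

s_n = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)
z = (s_n - n * mu) / (np.sqrt(n) * sigma)    # standardized sums

print(z.mean(), z.std())                     # ~0 and ~1
print(np.quantile(z, 0.975))                 # close to the N(0, 1) quantile 1.96
```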