# properties of expectation

From the 6.436 lecture notes.

Recall that a random variable \(X\) on a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) is *discrete* if the range of \(X\) is finite or countable.

## 1. Definition: expectation

A discrete random variable \(X\) with PMF \(p_X\) has expectation \[ \mathbb{E}[X] = \sum_{x} x p_X(x) \] whenever the sum is well defined. (The sum runs over the countable range of \(X\).)

## 2. Fact: alternative formula

If \(X\) only takes non-negative integer values, then \[ \mathbb{E}[X] = \sum_{n\geq 0}\mathbb{P}(X > n) \]

### 2.1. proof

\[\begin{align*} \sum_{n\geq 0 } \mathbb{P}(X > n) &= \sum_{n\geq 0} \sum_{i=n+1}^{\infty} \mathbb{P}(X = i)\\ &=\sum_{i\geq 1} i \mathbb{P}(X=i)\\ &=\mathbb{E}[X] \end{align*}\]

The second line follows by swapping the order of summation: the term \(\mathbb{P}(X=i)\) appears once for each \(n \in \{0, 1, \dots, i-1\}\), i.e. \(i\) times in total.
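As a quick sanity check of the tail-sum formula, here is a sketch that computes \(\mathbb{E}[X]\) both ways for a fair six-sided die (a hypothetical example, not from the notes):

```python
# Sketch: verify E[X] = sum_{n >= 0} P(X > n) for a fair six-sided die.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # P(X = k) = 1/6, k = 1..6

# Direct definition: E[X] = sum_x x * p_X(x)
expectation = sum(x * p for x, p in pmf.items())

# Tail-sum formula; terms with n >= max value vanish, so stop there
tail_sum = sum(
    sum(p for x, p in pmf.items() if x > n)
    for n in range(max(pmf))
)

assert expectation == tail_sum == Fraction(7, 2)
```

Using `Fraction` keeps the arithmetic exact, so the two quantities match without any floating-point tolerance.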

## 3. Expectation of a function of a random variable: \(g(X)\)

If \(X\) is a discrete random variable and \(g\) is a function, then \(Y = g(X)\) is also a discrete random variable, and by the *law of the unconscious statistician* \[ \mathbb{E}[g(X)] = \sum_x g(x) p_X(x) \] whenever the sum is well defined.

## 5. linearity of expectation

For \(a,b\in \mathbb{R}\), we have \(\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\), provided the sums are well defined.
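Notably, linearity does not require independence. A minimal sketch on a hypothetical joint PMF where \(X\) and \(Y\) are dependent:

```python
# Sketch: linearity of expectation holds even for dependent X, Y.
from fractions import Fraction

# A hypothetical joint PMF on pairs (x, y); probabilities sum to 1.
# Note the joint does not factor into marginals, so X and Y are dependent.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 2),
}

a, b = 3, -2

E_X = sum(x * p for (x, _), p in joint.items())
E_Y = sum(y * p for (_, y), p in joint.items())
E_aXbY = sum((a * x + b * y) * p for (x, y), p in joint.items())

assert E_aXbY == a * E_X + b * E_Y
```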

## 6. product of independent r.v.'s

If \(X\) and \(Y\) are independent, then \(\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]\). Indeed, independence gives \(p_{X,Y}(x,y) = p_X(x)p_Y(y)\), so \[\begin{align*} \mathbb{E}[XY] &= \sum_{x,y} xy\, p_{X,Y}(x,y)\\ &= \sum_{x} x p_{X}(x) \sum_{y} yp_{Y}(y) \\ &= \mathbb{E}[X]\mathbb{E}[Y] \end{align*}\]
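The factoring step above can be sketched numerically by building a joint PMF as the product of two hypothetical marginals:

```python
# Sketch: E[XY] = E[X] E[Y] when the joint PMF factors as p_X * p_Y.
from fractions import Fraction

p_X = {1: Fraction(1, 2), 2: Fraction(1, 2)}  # hypothetical marginal of X
p_Y = {0: Fraction(1, 3), 3: Fraction(2, 3)}  # hypothetical marginal of Y

# Independence: p_{X,Y}(x, y) = p_X(x) * p_Y(y)
joint = {(x, y): px * py for x, px in p_X.items() for y, py in p_Y.items()}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for x, p in p_X.items())
E_Y = sum(y * p for y, p in p_Y.items())

assert E_XY == E_X * E_Y
```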

## 7. moments

Let \(Y=X^2\). By the law of the unconscious statistician, we have \(\mathbb{E}[Y] = \mathbb{E}[X^2] = \sum_x x^2 p_X(x)\).

\(\mathbb{E}[X^2]\) is called the *second moment* of \(X\).

The quantity \(\mathbb{E}[(X-\mathbb{E}[X])^r]\) is called the *\(r\)-th central moment* of \(X\).

The second central moment \(\mathbb{E}[(X-\mathbb{E}[X])^2]\) is called the *variance* of \(X\). See this interesting Stack Overflow question about why the variance is so often used to measure distance from the mean (as opposed to the absolute value or some other norm). One thing that stood out to me: the variance is essentially the squared \(\ell^2\) distance between a vector of samples \(X_i\) (in the limit, as the number of samples increases, we recover an expectation) and the vector \(\mu\mathbf{1}\).

The square root of the variance is the *standard deviation*.
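A sketch computing the second moment, variance, and standard deviation of a fair die directly from the definitions (hypothetical example), together with the standard identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\):

```python
# Sketch: moments of a fair six-sided die from its PMF.
from fractions import Fraction
import math

pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                  # E[X] = 7/2
second_moment = sum(x**2 * p for x, p in pmf.items())      # E[X^2] = 91/6
variance = sum((x - mean)**2 * p for x, p in pmf.items())  # E[(X - E[X])^2]

# Equivalent shortcut: Var(X) = E[X^2] - E[X]^2
assert variance == second_moment - mean**2                 # = 35/12
std_dev = math.sqrt(variance)                              # sqrt of variance
```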

## 8. definition: conditional expectation

Let \(A\) be an event with \(\mathbb{P}(A)>0\). Let \(X\) be a random variable. Then, the conditional expectation of \(X\) given \(A\) is: \[ \mathbb{E}[X\mid A] = \sum_{x} xp_{X\mid A}(x) \]
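A minimal sketch of this definition, assuming the usual conditional PMF \(p_{X\mid A}(x) = \mathbb{P}(X=x)/\mathbb{P}(A)\) for \(x\) consistent with \(A\) (hypothetical example: a fair die conditioned on landing even):

```python
# Sketch: E[X | A] for a fair die and the event A = {X is even}.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}
A = {x for x in pmf if x % 2 == 0}       # event: X takes an even value
P_A = sum(pmf[x] for x in A)             # P(A) = 1/2 > 0, as required

cond_pmf = {x: pmf[x] / P_A for x in A}  # conditional PMF p_{X|A}
E_given_A = sum(x * p for x, p in cond_pmf.items())

assert E_given_A == Fraction(4)          # (2 + 4 + 6) / 3 = 4
```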