# covariance

From the MIT 6.436 lecture notes.

## 1. formula

\[ \text{cov}(X,Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y-\mathbb{E}[Y])\right] \]

You can think of \(X - \mathbb{E}[X]\) and \(Y - \mathbb{E}[Y]\) as deviations of \(X\) and \(Y\) from their respective means. The covariance is large and positive when \(X\) tends to be above its mean whenever \(Y\) is above its mean (and below whenever \(Y\) is below). It is large and negative when the opposite tends to hold.

## 2. alternative formula

Expanding the product and using linearity of expectation gives an equivalent formula: \[ \text{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \]
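As a quick numerical sanity check, here is a sketch that computes the covariance both ways over a small joint pmf (the pmf values are made up for illustration) and confirms the two formulas agree:

```python
# A small joint pmf over (x, y) pairs (values chosen purely for illustration).
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}

def expect(f):
    """Expectation of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)

# Definition: E[(X - E[X])(Y - E[Y])]
cov_def = expect(lambda x, y: (x - ex) * (y - ey))

# Alternative formula: E[XY] - E[X]E[Y]
cov_alt = expect(lambda x, y: x * y) - ex * ey

assert abs(cov_def - cov_alt) < 1e-12
```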

## 3. independent r.v.'s

If \(X\) and \(Y\) are independent, then the covariance is 0. This can be seen from the above alternative formula, and the fact that the expectation of a product of independent r.v.'s is the product of their expectations (see properties of expectation). It can also be seen from the original formula, at least when \(X\) is symmetric about its mean. Look at the terms of the expectation. Imagine fixing some \(x\), then looking at how \(Y\) varies around its mean. If instead we were looking at \(-x\), the distribution of \(Y\) would be the same (because of independence). Then all the terms involving \(x\) will cancel out with the terms involving \(-x\).

The converse does not always hold: zero covariance does not imply independence. Think of \((X, Y)\) taking the values \((0,1), (1,0), (-1,0), (0,-1)\) with probability \(\frac{1}{4}\) each. Then \(\mathbb{E}[X] = \mathbb{E}[Y] = 0\) and \(\mathbb{E}[XY] = 0\), so \(\text{cov}(X,Y) = 0\). But \(X\) and \(Y\) are not independent, because if we know that \(X=1\), then \(Y\) must be \(0\).
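The four-point counterexample above can be verified directly. This sketch checks that the covariance is exactly zero, yet the joint probability of \((1,1)\) differs from the product of the marginals:

```python
# The four-point counterexample: uncorrelated but dependent.
pmf = {(0, 1): 0.25, (1, 0): 0.25, (-1, 0): 0.25, (0, -1): 0.25}

def expect(f):
    return sum(p * f(x, y) for (x, y), p in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey
assert cov == 0  # zero covariance...

# ...but not independent: P(X=1, Y=1) = 0 while P(X=1)P(Y=1) > 0
p_x1 = sum(p for (x, y), p in pmf.items() if x == 1)
p_y1 = sum(p for (x, y), p in pmf.items() if y == 1)
p_x1_y1 = pmf.get((1, 1), 0)
assert p_x1_y1 != p_x1 * p_y1
```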

## 4. variance of a sum of random variables

For any random variables \(X_i\) (not necessarily independent): \[ \text{var}\left( \sum_{i=1}^n X_i \right) = \sum_{i=1}^n \text{var}(X_i) + 2\sum_{i<j} \text{cov}(X_i, X_j) \]
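The formula can be checked exactly on a tiny sample space. A minimal sketch, using two fair coin flips as a hypothetical example with \(X_1\) the first flip, \(X_2\) the second, and \(X_3 = X_1 + X_2\) (so the variables are dependent):

```python
from itertools import product

# Sample space: two fair coin flips, each outcome with probability 1/4.
outcomes = list(product([0, 1], repeat=2))
p = 1 / len(outcomes)

def E(f):
    """Expectation of f(a, b) over the uniform sample space."""
    return sum(p * f(a, b) for a, b in outcomes)

def cov(f, g):
    return E(lambda a, b: f(a, b) * g(a, b)) - E(f) * E(g)

# X1 = first flip, X2 = second flip, X3 = X1 + X2 (dependent on both).
Xs = [lambda a, b: a, lambda a, b: b, lambda a, b: a + b]
S = lambda a, b: sum(X(a, b) for X in Xs)

lhs = cov(S, S)  # var of the sum, computed directly
rhs = sum(cov(X, X) for X in Xs) + 2 * sum(
    cov(Xs[i], Xs[j]) for i in range(3) for j in range(i + 1, 3)
)
assert abs(lhs - rhs) < 1e-12
```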

## 5. correlation coefficient

\[ \rho = \frac{\text{cov}(X,Y)}{\sqrt{\text{var}(X)\text{var}(Y)}} \]

## 6. Cauchy-Schwarz inequality for expectations

If \(X\) and \(Y\) are two r.v.'s with finite variance, then \[ \mathbb{E}[XY]^2 \leq \mathbb{E}[X^2]\mathbb{E}[Y^2] \]
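A quick check of the inequality on an arbitrary small joint distribution (the pmf values here are made up):

```python
# Check (E[XY])^2 <= E[X^2] E[Y^2] on a small finite distribution.
pmf = {(1, 2): 0.5, (3, -1): 0.3, (0, 4): 0.2}

def expect(f):
    return sum(p * f(x, y) for (x, y), p in pmf.items())

lhs = expect(lambda x, y: x * y) ** 2
rhs = expect(lambda x, y: x * x) * expect(lambda x, y: y * y)
assert lhs <= rhs
```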

### 6.1. proof

Assume \(\mathbb{E}[Y^2] > 0\) (if \(\mathbb{E}[Y^2] = 0\), then \(Y = 0\) with probability 1 and both sides are 0), and let \(a = \frac{\mathbb{E}[XY]}{\mathbb{E}[Y^2]}\). Then

\[
\begin{aligned}
0 &\leq \mathbb{E}\left[(X - aY)^2\right] \\
&= \mathbb{E}\left[X^2 - 2aXY + a^2 Y^2\right] \\
&= \mathbb{E}[X^2] - 2a\,\mathbb{E}[XY] + a^2\,\mathbb{E}[Y^2] \\
&= \mathbb{E}[X^2] - \frac{\mathbb{E}[XY]^2}{\mathbb{E}[Y^2]},
\end{aligned}
\]

and rearranging gives the inequality. The first line comes from the fact that we are taking an expectation over squared values. The third line comes from the linearity of expectation. How did someone come up with this proof? Maybe working backwards and trying to complete the square?

## 7. theorem: correlation measures linear relationship between \(X\) and \(Y\)

Let \(X\) and \(Y\) be discrete random variables with correlation coefficient \(\rho\). Then

- We have \(-1\leq \rho \leq 1\).
- We have \(\rho = 1\) (respectively \(\rho = -1\)) if and only if there exists a positive (respectively negative) constant \(a\) such that \(Y - \mathbb{E}[Y] = a(X - \mathbb{E}[X])\) with probability 1.
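The second bullet can be illustrated numerically: when \(Y\) is an exact linear function of \(X\), the correlation coefficient is \(\pm 1\) depending on the sign of the slope. A minimal sketch (the values of \(X\) and the constants \(a, b\) are made up):

```python
import math

# X takes each of these values with equal probability; Y = a*X + b exactly.
xs = [1.0, 2.0, 5.0, 9.0]
a, b = -3.0, 7.0  # negative slope, so we expect rho = -1
ys = [a * x + b for x in xs]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n

rho = cov / math.sqrt(vx * vy)
assert abs(rho - (-1.0)) < 1e-12
```

Algebraically, \(\text{cov}(X, aX+b) = a\,\text{var}(X)\) and \(\text{var}(aX+b) = a^2\,\text{var}(X)\), so \(\rho = a/|a|\), which matches what the check above confirms.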

### 7.1. proof

- Let \(\tilde{X}=X-\mathbb{E}[X]\) and \(\tilde{Y} = Y-\mathbb{E}[Y]\). From the Cauchy-Schwarz inequality, we have \((\mathbb{E}[\tilde{X}\tilde{Y}])^2 \leq (\mathbb{E}[\tilde{X}^2])(\mathbb{E}[\tilde{Y}^2])\), so \(\left|\frac{\mathbb{E}[\tilde{X}\tilde{Y}]}{\sqrt{\mathbb{E}[\tilde{X}^2]\mathbb{E}[\tilde{Y}^2]}}\right| \leq 1\), i.e. \(|\rho| \leq 1\).
- One direction: if \(\tilde{Y} = a\tilde{X}\), then:

\[ \rho = \frac{\mathbb{E}[\tilde{X}\tilde{Y}]}{\sqrt{\mathbb{E}[\tilde{X}^2]\mathbb{E}[\tilde{Y}^2]}} = \frac{a\,\mathbb{E}[\tilde{X}^2]}{\sqrt{\mathbb{E}[\tilde{X}^2]\cdot a^2\,\mathbb{E}[\tilde{X}^2]}} = \frac{a}{|a|} \]

which is either \(1\) or \(-1\), depending on the sign of \(a\).
- Now, the other direction. Let \(\rho=1\) (the case \(\rho = -1\) is analogous). Then \(\mathbb{E}[\tilde{X}\tilde{Y}] = \sqrt{\mathbb{E}[\tilde{X}^2]\mathbb{E}[\tilde{Y}^2]}\), so \(\mathbb{E}[\tilde{X}\tilde{Y}]^2 = \mathbb{E}[\tilde{X}^2] \mathbb{E}[\tilde{Y}^2]\), and therefore \(\mathbb{E}[\tilde{X}^2] - \frac{\mathbb{E}[\tilde{X}\tilde{Y}]^2}{\mathbb{E}[\tilde{Y}^2]} = 0\). Now, look at the first and last lines in the proof of the Cauchy-Schwarz inequality. Since the expectation in the first line is taken over squared (i.e. non-negative) values, the expectation is \(0\) if and only if the quantity inside is \(0\) with probability 1. So we have \[ 0 = \tilde{X} - \frac{\mathbb{E}[\tilde{X}\tilde{Y}]}{\mathbb{E}[\tilde{Y}^2]}\tilde{Y} \] with probability 1. In other words, \[ \tilde{X} = \frac{\mathbb{E}[\tilde{X}\tilde{Y}]}{\mathbb{E}[\tilde{Y}^2]}\tilde{Y} = \sqrt{\frac{\mathbb{E}[\tilde{X}^2]}{\mathbb{E}[\tilde{Y}^2]}}\,\rho\,\tilde{Y} \] with probability 1. Notice that this is a linear relationship, where the multiplying constant has a sign that is determined by \(\rho\).