# student's t test

Recall hypothesis testing. The gist of the game is: we assume that the test statistic has a t distribution under the null hypothesis. With this assumption, we can calculate the probability of observing a statistic at least as extreme as the one we actually observed, and this probability is the p-value.

## 1. difference between two means

If the two population variances are assumed equal, the standardized difference between the two sample means, computed with a pooled variance estimate, follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom. If we don't assume that the variances are equal, then we use Welch's test.

In Welch's test, the statistic is only approximately t-distributed, and the degrees of freedom come from the more complicated Welch–Satterthwaite formula:

\[ \nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]

which is generally not an integer.
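As a concrete sketch (assuming numpy and scipy are installed, with made-up sample data), scipy's `ttest_ind` implements both variants, so we can contrast the pooled test with Welch's test:

```python
# Pooled-variance t-test vs Welch's test, using scipy.
# The samples below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=30)  # smaller variance
b = rng.normal(loc=0.5, scale=3.0, size=40)  # larger variance

# Student's t-test: assumes equal variances (pooled estimate, df = n1 + n2 - 2).
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)

# Welch's test: drops the equal-variance assumption; the df comes from the
# Welch-Satterthwaite approximation and is generally non-integer.
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

print(t_pooled, p_pooled)
print(t_welch, p_welch)
```

With unequal sample sizes and variances, the two tests give different statistics and p-values; Welch's version is the safer default when equal variances can't be justified.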

## 2. significance of correlation

- Say that we have the Pearson correlation coefficient \(r\) between two samples of size \(n\). How likely is a coefficient at least this large in absolute value under the null hypothesis that the true correlation is 0?
- It turns out that

\[ t_{score} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \] follows a t distribution with \(n-2\) degrees of freedom under the null hypothesis.
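A quick sketch of this (assuming numpy and scipy are installed, with made-up data): compute the t score by hand and compare its p-value to the one reported by `scipy.stats.pearsonr`, which performs the same test:

```python
# Significance test for a Pearson correlation, done by hand and via scipy.
# The data are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)  # weakly correlated with x

r = np.corrcoef(x, y)[0, 1]
t_score = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t_score), df=n - 2)  # two-sided p-value from t_{n-2}

r_scipy, p_scipy = stats.pearsonr(x, y)  # same test, done by scipy
print(p, p_scipy)
```

The hand-computed p-value and scipy's agree up to floating-point error.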

## 3. student's t distribution

- For a positive integer \(d\), the Student's t distribution with \(d\) degrees of freedom is the law of the random variable \(\frac{Z}{\sqrt{V/d}}\), where \(Z\sim N(0,1)\), \(V\sim \chi^2_d\), and \(Z \perp V\).
- Then, under the null hypothesis that the mean is \(\mu_0\), can we find a statistic that follows the t-distribution? yes!
- Let \(Z=\sqrt{n} \frac{\bar{X}_n - \mu_0}{\sigma} \sim N(0,1)\)
- Let \(V = \frac{nS_n}{\sigma^2} \sim \chi_{n-1}^2\), where \(S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2\) is the (biased) sample variance (see chi squared distribution)
- Then, \(\frac{Z}{\sqrt{V/(n-1)}} = \sqrt{n-1}\frac{\bar{X}_n - \mu_0}{\sqrt{S_n}} \sim t_{n-1}\)
- Here we use Cochran's theorem, which says that for an i.i.d. normal sample the sample mean and sample variance are independent, to justify the claim that this ratio follows the Student's t distribution.
- advantage: this lets us use the sample variance \(S_n\) in place of the true variance \(\sigma^2\), which is usually unknown.
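The definition above can be checked with a quick Monte Carlo sketch (assuming numpy and scipy are installed): sample \(Z\) and \(V\) independently, form \(Z/\sqrt{V/d}\), and compare an empirical quantile against the exact \(t_d\) quantile:

```python
# Monte Carlo check of the definition: Z / sqrt(V/d) with Z ~ N(0,1),
# V ~ chi^2_d, and Z independent of V, should be distributed as t_d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d = 5
N = 100_000
Z = rng.standard_normal(N)
V = rng.chisquare(d, size=N)
T = Z / np.sqrt(V / d)

# Compare an empirical upper quantile with the exact t_d quantile.
q_emp = np.quantile(T, 0.975)
q_theory = stats.t.ppf(0.975, df=d)
print(q_emp, q_theory)  # should be close
```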

- Let's remember how we got here from our original hypothesis-testing motivation. We wanted a test statistic whose distribution under the null is known. We didn't know the variance, so we estimated it using \(S_n\). It turns out that when we do this, the resulting statistic follows the t-distribution.
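To check the derivation numerically (assuming numpy and scipy are installed, with made-up data), we can compute \(\sqrt{n-1}\,\frac{\bar{X}_n - \mu_0}{\sqrt{S_n}}\) by hand and compare it to `scipy.stats.ttest_1samp`, which computes an algebraically identical statistic:

```python
# One-sample t statistic by hand vs scipy.stats.ttest_1samp.
# The data are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 25
x = rng.normal(loc=0.2, size=n)
mu0 = 0.0

# S_n is the biased sample variance (divide by n); np.var uses ddof=0 by default.
S_n = np.var(x)
t_manual = np.sqrt(n - 1) * (x.mean() - mu0) / np.sqrt(S_n)

# scipy computes (xbar - mu0) / (s / sqrt(n)) with the unbiased s; since
# S_n = (n-1)/n * s^2, the two expressions coincide.
t_scipy, p = stats.ttest_1samp(x, popmean=mu0)
print(t_manual, t_scipy)
```

The two values match up to floating-point error, confirming that the \(\sqrt{n-1}/\sqrt{S_n}\) form and the usual \(s/\sqrt{n}\) form are the same statistic.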