p-value
1. as relates to linear regression
1.1. statistical model
- \(Y=\beta X + \epsilon\), where \(\epsilon\) is noise with mean 0 and variance \(\sigma^2\).
- In a regression problem, we have data \((x_i, y_i)\), which we assume are sampled independently of each other
- If we consider the \(x\) 's to be fixed, then \(y\) is a random variable
- And the line of best fit \(\hat{\beta}\) is also a random variable, since it is computed from the random \(y\)'s
- And in fact \(\mathbb{E}[\hat{\beta}] = \beta\) (the estimate is unbiased), where \(\beta\) is the true line of fit (which we will never observe, but assume to exist)
- And if we assume that \(\epsilon\) is Gaussian, then \(\hat{\beta}\) is Gaussian too, so we can write down its distribution for any assumed value of \(\beta\)
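- So, the p-value of \(\hat{\beta}\) is the probability, assuming \(\beta=0\), that \(\hat{\beta}\) takes a value at least as extreme as the one we observed (for a continuous distribution, the chance of hitting the observed value exactly is 0)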
- extremely useful slides
- notes
- glm
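
The reasoning in the statistical model above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not in the notes: a single predictor with no intercept, Gaussian noise, and a noise standard deviation \(\sigma\) that is known (in practice \(\sigma\) is estimated from the residuals and a t distribution replaces the Gaussian). The names `fit_slope` and `two_sided_p_value` are hypothetical helpers chosen for this example.

```python
import math
import random


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = beta * x + noise."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def two_sided_p_value(beta_hat, xs, sigma):
    """P(|estimate| >= |beta_hat|) if the true beta were 0.

    Assumes Gaussian noise with KNOWN standard deviation sigma, so that
    under beta = 0 the estimate is distributed N(0, sigma^2 / sum(x_i^2)).
    """
    sxx = sum(x * x for x in xs)
    std_err = sigma / math.sqrt(sxx)
    z = beta_hat / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)
xs = [i / 10 for i in range(1, 51)]  # the x's are treated as fixed
true_beta, sigma = 0.5, 1.0          # illustrative values, not from the notes

# One observed data set and its p-value against the hypothesis beta = 0.
ys = [true_beta * x + rng.gauss(0, sigma) for x in xs]
beta_hat = fit_slope(xs, ys)
p_value = two_sided_p_value(beta_hat, xs, sigma)
print("beta_hat:", beta_hat, "p-value:", p_value)

# beta_hat is a random variable: redraw the noise many times and average.
estimates = []
for _ in range(2000):
    ys_sim = [true_beta * x + rng.gauss(0, sigma) for x in xs]
    estimates.append(fit_slope(xs, ys_sim))
mean_estimate = sum(estimates) / len(estimates)
print("mean of beta_hat over simulations:", mean_estimate)
```

The second half redraws the noise while keeping the \(x\)'s fixed, so the average of the estimates should land close to the true slope, which is the unbiasedness claim \(\mathbb{E}[\hat{\beta}] = \beta\). A small p-value means an estimate this far from 0 would be unlikely if \(\beta\) were really 0.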