# confidence interval

Mostly from the Wikipedia page.

## 1. formal definition

- We have a dataset of samples \(x_1, ..., x_n\) sampled from random variables \(X_1, ..., X_n\)
- Let \(\theta\) be some parameter that we are interested in, e.g. \(\mu\)
- Let \(L(X_1,...,X_n)\) and \(U(X_1,...,X_n)\) be two statistics – note that they are also random variables
- Let \(0 \leq \gamma \leq 1\) – this is the confidence level
- Then, \(L\) and \(U\) define a confidence interval for confidence level \(\gamma\) if:

\[ P(L < \theta < U) = \gamma \] for every \(\theta\)

Then, for a realized dataset, we can compute the specific interval \((L(x_1,...,x_n), U(x_1,...,x_n))\).

### 1.1. commentary

Note that \(\theta\) is not a random variable. \(P(L < \theta < U)\) can be written \(P(L < \theta \wedge U > \theta)\).

Requiring \(P(L < \theta < U) = \gamma\) for all \(\theta\) means that no matter what the parameter is, for a \(\gamma\) fraction of sampled datasets, the confidence interval will contain the parameter.

Note that this does not mean that, for a particular sampled dataset, the interval contains the parameter with probability \(\gamma\). \(\gamma\) only tells us the reliability of the interval-construction procedure. As far as I can tell this is mostly a philosophical distinction. Neyman's view is that for a given realized dataset, there is no probability that the interval contains the true parameter: it is simply a fact that it either does or doesn't.

Think of a roulette wheel. The casino knows that gamblers making a certain bet should win a \(\gamma\) fraction of the time. Suppose that the winning number of a spin is 1. This does not imply that there is a \(\gamma\) chance that the gambler picked 1.
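The "reliability of the procedure" reading is easy to check by simulation. A minimal sketch (assuming, for simplicity, normally distributed data with *known* \(\sigma\), so the interval is exact): draw many datasets, build the interval for each, and count how often the realized interval contains the true \(\mu\). All constants here are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

# Simulate the *procedure*: draw many datasets from N(mu, sigma^2),
# build a known-sigma z-interval for each, and count how often the
# realized interval contains the true mu.
random.seed(0)
mu, sigma, n, gamma = 5.0, 2.0, 30, 0.95
z = NormalDist().inv_cdf((1 + gamma) / 2)  # ~1.96 for gamma = 0.95

trials, hits = 10_000, 0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    half = z * sigma / n ** 0.5
    if xbar - half < mu < xbar + half:
        hits += 1

print(hits / trials)  # close to gamma = 0.95
```

Note that each individual realized interval either contains \(\mu = 5\) or it doesn't; only the long-run fraction is \(\approx \gamma\).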

## 2. confidence intervals and p-values

- "if the hypothesis is not inside the 95% confidence interval of the null, the probability of the null being true is 5%"
- TODO: think about whether or not this makes sense
- https://online.stat.psu.edu/stat500/lesson/6a/6a.5
- also here

## 3. from 18.650 lectures

### 3.1. confidence intervals

Let \((\Omega, (P_{\theta})_{\theta\in \Theta})\) be a statistical model where \(\Theta \subset \mathbb{R}\). A level \(1-\alpha\) confidence interval is a random interval \(I(X_1,\dots,X_n)\) that does not depend on \(\theta\) such that: \[ P[\theta \in I] \geq 1 - \alpha \] for all \(\theta \in \Theta\)

### 3.2. asymptotic confidence intervals

An asymptotic level \(1-\alpha\) confidence interval is a sequence of random intervals \(I_n(X_1,\dots,X_n)\) that do not depend on \(\theta\) such that: \[ \lim_{n\rightarrow\infty} P_{\theta}[\theta \in I_n] \geq 1-\alpha \] for all \(\theta \in \Theta\)

### 3.3. example and statistician's approximation

Consider the case where we want to estimate the parameter \(p\) of a Bernoulli random variable. We take \(n\) trials and compute the sample average \(\bar{X}_n\).

By the central limit theorem, we have \(\sqrt{n}(\bar{X}_n - p) \overset{d}{\rightarrow} \mathcal{N}(0, \sigma^2)\) as \(n\rightarrow \infty\), where \(\sigma^2 = p(1-p)\). Statisticians often make the following approximation for very large \(n\): \[ \sqrt{n}(\bar{X}_n - p) \approx \mathcal{N}(0, \sigma^2) \]

Then, we can use this approximation to bound the probability that \(|\bar{X}_n - p| \geq \frac{a}{\sqrt{n}}\): \[ P\left[|\bar{X}_n - p| \geq \frac{a}{\sqrt{n}}\right] \approx P\left[|\mathcal{N}(0,1)| \geq \frac{a}{\sigma} \right] = 2\left(1-\Phi\left(\frac{a}{\sigma}\right)\right) \]

If \(q_{\alpha/2}\) is the \((1-\alpha/2)\) quantile of \(\mathcal{N}(0,1)\), then, taking \(a = q_{\alpha/2}\sigma\) above: \[ P\left[|\bar{X}_n - p| \geq \frac{q_{\alpha/2}\sigma}{\sqrt{n}}\right] \approx P\left[|\mathcal{N}(0,1)| \geq q_{\alpha/2} \right] = 2\left(1-\Phi\left(q_{\alpha/2}\right)\right) = \alpha \]

That is, with probability \(1-\alpha\) (if we make the approximation): \[|\bar{X}_n - p| \leq \frac{q_{\alpha/2}\sigma}{\sqrt{n}}\]

So the interval \(\bar{X}_n \pm \frac{q_{\alpha/2}\sigma}{\sqrt{n}}\) is a \(1-\alpha\) confidence interval for \(p\).
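In practice \(\sigma = \sqrt{p(1-p)}\) depends on the unknown \(p\). One standard workaround (not derived above; the plug-in method is another) is the conservative bound \(p(1-p) \leq 1/4\), i.e. \(\sigma \leq 1/2\). A sketch using the stdlib `NormalDist`; the values \(\bar{x}_n = 0.62\), \(n = 400\) are made up for illustration:

```python
from statistics import NormalDist

def bernoulli_ci_conservative(xbar: float, n: int, alpha: float = 0.05):
    """Conservative 1-alpha CI for a Bernoulli p, bounding sigma = sqrt(p(1-p)) by 1/2."""
    q = NormalDist().inv_cdf(1 - alpha / 2)  # q_{alpha/2}, ~1.96 for alpha = 0.05
    half = q * 0.5 / n ** 0.5                # worst-case half-width
    return xbar - half, xbar + half

lo, hi = bernoulli_ci_conservative(xbar=0.62, n=400, alpha=0.05)
print(lo, hi)  # roughly (0.571, 0.669)
```

The price of the conservative bound is a wider interval whenever \(p\) is far from \(1/2\).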

### 3.4. commentary

#### 3.4.1. interpretation 1

If we run the confidence interval procedure many times, then \(\gamma\) percent of the time the resulting interval will contain the true parameter.

#### 3.4.2. interpretation 2

Given the data, consider all possible values of the null-hypothesis parameter such that the data would not be grounds to reject the null hypothesis under a level-\((1-\gamma)\) test. The range of these values is the confidence interval.

Can we make the two interpretations gel with each other? I think this only works here because the estimate is distributed normally (which is symmetric) about the true mean; in general it is not true. Note that the definition gives us a lot of leeway in how we construct our intervals, so you can imagine a construction that does not satisfy interpretation 2: under the true distribution with true parameter \(\theta\), consider some region of the outcome space that is a statistically significant distance away from \(\theta\). What if, for that region, we expanded the confidence interval so that \(\theta\) was captured? Then, to keep the balance, we pick some region of equal probability that is *not* a statistically significant distance away from \(\theta\) and make those confidence intervals so narrow that they do not capture \(\theta\). Then \(\gamma\) percent of our intervals will still capture \(\theta\), but interpretation 2 is not satisfied.

But at least for the above Bernoulli example, it is true. Say the true parameter is \(p\). Consider a null hypothesis that puts the parameter at \(p\), and a level-\((1-\gamma)\) test. Then \(\gamma\) percent of the samples should not reject the null hypothesis. That is, for \(\gamma\) percent of the samples, the confidence interval contains the true parameter. This happens if and only if \(\bar{X}_n\) is within \(\frac{q_{\alpha/2}\sigma}{\sqrt{n}}\) of \(p\).

### 3.5. plug-in method

But what if we don't know \(\sigma\)? Use the plug-in method (see bootstrapping (statistics)): plug in \(\hat{\sigma}\), our estimate of the standard deviation.

This is fine because of Slutsky's theorem: \[ \left|P\left[\sqrt{n}|\bar{X}_n - \theta| \geq q_{\alpha/2}\hat{\sigma}\right] - P\left[|\mathcal{N}(0,1)| \geq q_{\alpha/2} \right] \right| \rightarrow 0 \] as \(n\rightarrow\infty\). In other words, for big enough \(n\), statisticians approximate \(\sqrt{n}(\bar{X}_n - \theta)\) with a normal distribution with variance \(\hat{\sigma}^2\).
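For the Bernoulli example, the natural plug-in estimate is \(\hat{\sigma} = \sqrt{\bar{X}_n(1-\bar{X}_n)}\). A sketch, with made-up values \(\bar{x}_n = 0.62\), \(n = 400\):

```python
from statistics import NormalDist

def bernoulli_ci_plugin(xbar: float, n: int, alpha: float = 0.05):
    """Asymptotic 1-alpha CI for a Bernoulli p via the plug-in sigma_hat."""
    q = NormalDist().inv_cdf(1 - alpha / 2)   # q_{alpha/2}
    sigma_hat = (xbar * (1 - xbar)) ** 0.5    # plug xbar in for the unknown p
    half = q * sigma_hat / n ** 0.5
    return xbar - half, xbar + half

lo, hi = bernoulli_ci_plugin(xbar=0.62, n=400, alpha=0.05)
print(lo, hi)  # roughly (0.572, 0.668)
```

Since \(\hat{\sigma} \leq 1/2\) always, this interval is never wider than the conservative one; the guarantee is only asymptotic, justified by Slutsky's theorem as above.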

### 3.6. example

Consider the exponential statistical model: \[ \left((0,\infty), (\text{Exp}(\lambda))_{\lambda \in (0,\infty)}\right) \]

Let \(T_1,...,T_n\) be samples drawn from a particular \(P_{\lambda}\).

Recall:

- \(\mathbb{E}[T_i] = \lambda^{-1}\)
- \(\text{var}[T_i] = \lambda^{-2}\)
- density \(f(x) = \lambda e^{-\lambda x}\)

Then, by the Law of Large Numbers, \(\bar{T}_n \rightarrow \lambda^{-1}\) almost surely and in probability. So, by the continuous mapping theorem, \(\frac{1}{\bar{T}_n} \rightarrow \lambda\) almost surely and in probability. So \(\hat{\lambda} = \frac{1}{\bar{T}_n}\) is a consistent estimator for \(\lambda\).

Then, by central limit theorem, \[ \sqrt{n}(\bar{T}_n - \lambda^{-1}) \overset{d}{\rightarrow} \mathcal{N}(0, \lambda^{-2}) \]

Then, by the delta method with \(g(x) = \frac{1}{x}\) (the asymptotic variance is \(g'(\lambda^{-1})^2 \cdot \lambda^{-2} = \lambda^4 \cdot \lambda^{-2} = \lambda^2\)), we have: \[ \sqrt{n}(\bar{T}_n^{-1} - \lambda) \overset{d}{\rightarrow} \mathcal{N}(0, \lambda^{2}) \]

So by the above discussion, we have that:
\[
\hat{\lambda} \pm \frac{q_{\alpha/2}\lambda}{\sqrt{n}}
\]
is a \(1-\alpha\) asymptotic confidence interval for \(\lambda\). (Note that we do not use the *statistician's approximation* here, we merely say that the confidence interval is asymptotic.)

But we don't know \(\lambda\), so by Slutsky's theorem we can replace it with \(\hat{\lambda}\): \[ \hat{\lambda} \pm \frac{q_{\alpha/2}\hat{\lambda}}{\sqrt{n}} \] (remember, we only care about what happens asymptotically)
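A simulation sketch of this exponential interval's coverage (the true \(\lambda\), \(n\), trial count, and seed are arbitrary choices; the coverage guarantee is only asymptotic, so it should be *close* to \(1-\alpha\) for large \(n\)):

```python
import random
from statistics import NormalDist

# Check the asymptotic CI  lambda_hat +/- q * lambda_hat / sqrt(n)  by simulation.
random.seed(1)
lam, n, alpha = 2.0, 500, 0.05
q = NormalDist().inv_cdf(1 - alpha / 2)

trials, hits = 5_000, 0
for _ in range(trials):
    ts = [random.expovariate(lam) for _ in range(n)]
    lam_hat = n / sum(ts)              # 1 / T_bar
    half = q * lam_hat / n ** 0.5
    if lam_hat - half < lam < lam_hat + half:
        hits += 1

coverage = hits / trials
print(coverage)  # should be near 1 - alpha = 0.95
```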