Frequentist vs Bayesian Inference
1. Quick heuristics
Typically, when we talk about Bayesian inference, we mean that we place a prior on the parameter \(\theta\). With a prior, we can easily derive a maximum a posteriori (MAP) decision rule. Without a prior, things become more difficult: instead of the MAP hypothesis, we usually look for a decision rule that performs well enough under every possible value of \(\theta\), since we have no idea what the true \(\theta\) is. A sketch of this contrast follows below.
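A minimal sketch of the contrast, assuming a coin with unknown bias \(\theta\); the Beta(2, 2) prior and the data here are invented for illustration, not taken from these notes:

```python
# Hypothetical data: 7 heads in 10 flips of a coin with unknown bias theta.
heads, flips = 7, 10

# Bayesian: an assumed Beta(2, 2) prior is conjugate to the binomial
# likelihood, so the posterior is Beta(2 + heads, 2 + tails).
a_post, b_post = 2 + heads, 2 + (flips - heads)

# MAP estimate: the mode of a Beta(a, b), which is (a - 1) / (a + b - 2).
theta_map = (a_post - 1) / (a_post + b_post - 2)

# Frequentist: no prior, so maximize the likelihood instead.
# For a binomial model, the MLE is the sample proportion.
theta_mle = heads / flips

print(f"MAP under a Beta(2, 2) prior: {theta_map:.3f}")  # 0.667
print(f"MLE (no prior):               {theta_mle:.3f}")  # 0.700
```

The prior pulls the MAP estimate slightly toward its mean of 0.5, while the MLE depends only on the observed proportion.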
There's also a philosophical difference. In the frequentist world, \(\theta\) has a fixed value, albeit an unknown one; it cannot take any value other than its "true" one. In the Bayesian world, we instead place a prior, a distribution over \(\theta\).
2. Old notes
In the frequentist interpretation of probability, the probability of an event \(E\) is the limiting fraction of times that \(E\) occurs as the experiment is repeated many times.
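In symbols (a standard formalization, not in the original notes): if \(n_E\) is the number of times \(E\) occurs in the first \(n\) repetitions of the experiment, then
\[
P(E) = \lim_{n \to \infty} \frac{n_E}{n}.
\]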
Consider the problem of trying to infer a parameter \(\theta\) from data. For example, if we know that the data is distributed according to \(\mathcal{N}(\mu, 1)\), we might want to find \(\mu\). In frequentist interpretations, there is no notion of a prior probability distribution over possible \(\mu\), because there is only one \(\mu\) which generated the data. In the frequentist interpretation, we can talk about probabilities of experimental outcomes, but not probabilities of parameters (e.g. the bias of a coin). Instead, when we do inference, we compare likelihoods. For example, we might ask "What is the likelihood of \(\mu_0\), given the data?"
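A sketch of what comparing likelihoods can look like; the data and the candidate values \(\mu_0\) below are invented for illustration:

```python
import math

def normal_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of the mean mu for i.i.d. N(mu, sigma^2) data."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * sigma**2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

# Hypothetical observations, assumed drawn from N(mu, 1).
data = [0.8, 1.3, 0.4, 1.1, 0.9]

# Compare the likelihood of two candidate values of mu.
for mu0 in (0.0, 1.0):
    print(f"log L(mu0 = {mu0}) = {normal_log_likelihood(mu0, data):.3f}")
```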
Often, the goal of frequentist inference is a true-or-false answer, as in a hypothesis test that rejects or fails to reject.
In contrast, the goal of Bayesian inference is often a distribution.
In Bayesian inference, we view probabilities as degrees of belief. Thus, we are able to talk about a prior distribution over unknown parameters such as \(\mu\). The true prior is often unknowable, so we give the best prior we can, along with an explanation for why we chose it. Given evidence, we update our distribution with Bayes' rule.
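For reference, the update for the \(\mathcal{N}(\mu, 1)\) example is
\[
p(\mu \mid D) = \frac{p(D \mid \mu)\, p(\mu)}{p(D)}.
\]
A minimal grid-approximation sketch of such an update, assuming an invented \(\mathcal{N}(0, 1)\) prior on \(\mu\) and made-up data:

```python
import math

def posterior_on_grid(prior_pdf, data, sigma=1.0):
    """Grid approximation of the posterior over mu for i.i.d. N(mu, sigma^2) data."""
    grid = [i / 100 for i in range(-300, 301)]  # mu in [-3, 3]
    # Unnormalized posterior: prior(mu) * likelihood(data | mu).
    unnorm = [prior_pdf(mu) * math.exp(sum(-(x - mu) ** 2 / (2 * sigma**2)
                                           for x in data))
              for mu in grid]
    z = sum(unnorm)
    return grid, [u / z for u in unnorm]

# Assumed N(0, 1) prior on mu, expressed as a density.
prior = lambda mu: math.exp(-mu**2 / 2) / math.sqrt(2 * math.pi)

data = [0.8, 1.3, 0.4, 1.1, 0.9]
grid, post = posterior_on_grid(prior, data)
mu_mode = grid[post.index(max(post))]
print(f"Posterior mode (grid): {mu_mode:.2f}")  # pulled toward 0 relative to the MLE
```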