parametric vs non-parametric models
1. What is the difference between parametric and non-parametric models?
Answer drawn from Rodvi's answer on Stack Overflow, Wikipedia, and the 18.650 lectures.
First, what do we mean when we say model?
We could be talking about statistical models. A statistical model is a family of probability distributions, one of which is assumed to have given rise to the observed data. This family is indexed by a set \(\Theta\). Formally, a statistical model is a pair \[ (\Omega, (\mathbb{P}_{\theta})_{\theta \in \Theta}) \] where \(\Omega\) is the outcome space and \(\mathbb{P}_{\theta}\) is a probability measure on \(\Omega\) for each \(\theta \in \Theta\). The model is parametric when \(\Theta\) is finite-dimensional, i.e. \(\Theta \subseteq \mathbb{R}^d\) for some fixed \(d\), so that every member of the family is described by a parameter vector of the same length.
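For example, we could assume that our data are distributed according to a Gaussian. Then our family would be parameterized by the mean \(\mu\) and the standard deviation \(\sigma\), so that \(\theta = (\mu, \sigma)\) and \(\Theta = \mathbb{R} \times (0, \infty)\).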
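To make the Gaussian example concrete, here is a minimal sketch of fitting that two-parameter family by maximum likelihood. The function names (`fit_gaussian`, `gaussian_pdf`) and the synthetic data are assumptions made for illustration, not something from the sources above. The point is that however many samples we see, the fitted model is summarized by the same two numbers.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood estimate (mu, sigma) for a Gaussian model."""
    mu = statistics.fmean(samples)
    # pstdev divides by n (not n - 1), which is the maximum-likelihood estimate.
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    """Density of the family member indexed by theta = (mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Synthetic data (an assumption for this sketch): true mu = 2.0, sigma = 0.5.
rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(5000)]

mu_hat, sigma_hat = fit_gaussian(data)
theta_hat = (mu_hat, sigma_hat)

# The parameter vector has length 2 regardless of how much data we collect.
print(len(theta_hat), round(mu_hat, 2), round(sigma_hat, 2))
```

Once `theta_hat` is known, the data can be thrown away: `gaussian_pdf(x, *theta_hat)` is the whole fitted model.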
In contrast, we could make no such assumption about the generating distribution. We could merely require that it be continuous, or symmetric. Then there is no fixed, finite-length parameter vector that indexes every member of the family, and we would be talking about a non-parametric model.
Alternatively, we could be talking about hypotheses \(h: \mathcal{X} \rightarrow \mathcal{Y}\) that try to map data examples to labels, i.e. what we do in machine learning. Then a parametric family of hypotheses can be defined, where each hypothesis in the family is specified by a parameter vector of the same fixed length. For example, \(\mathcal{H} = \{h(x;w,b) = wx + b \mid (w,b) \in \mathbb{R}^2 \}\) is the family of linear hypotheses on \(\mathbb{R}\) (thresholding the output gives a linear classifier). In contrast, we could have a family of hypotheses in which different hypotheses have different numbers of parameters. Usually this is because the number of parameters depends on the data. Examples of non-parametric hypothesis families include K-nearest-neighbors (which has to keep the training examples themselves in order to make predictions) and decision trees (whose parameters are the number of leaves and the regions of the input space \(\mathcal{X}\) that the leaves cover, both of which can vary between trees).
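The contrast can be sketched in a few lines. The helper names below (`fit_linear`, `fit_knn`, `predict_knn`) are hypothetical, the data is synthetic, and the K-nearest-neighbors variant shown averages the neighbors' labels (a regression flavor chosen only to keep the example short). The linear fit always returns two numbers, while what K-nearest-neighbors has to store grows with the training set.

```python
def fit_linear(xs, ys):
    """Least-squares fit of h(x; w, b) = w*x + b; always returns two numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def fit_knn(xs, ys):
    """K-nearest-neighbors has no fixed-length parameter vector:
    'fitting' just stores every training pair."""
    return list(zip(xs, ys))


def predict_knn(stored, x, k=3):
    """Average the labels of the k stored points closest to x."""
    nearest = sorted(stored, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic data (an assumption for this sketch): y = 3x + 1 on [0, 1).
for n in (10, 100, 1000):
    xs = [i / n for i in range(n)]
    ys = [3.0 * x + 1.0 for x in xs]
    linear_params = fit_linear(xs, ys)
    knn_state = fit_knn(xs, ys)
    # Columns: training set size, size of linear model, size of k-NN "model".
    print(n, len(linear_params), len(knn_state))
```

The second column stays at 2 while the third tracks `n`, which is the sense in which the number of parameters "depends on the data".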
Another example of a non-parametric model: the set of all valid density functions. That is, all the functions \(f\) such that \(f \geq 0\) and \(f\) integrates to 1. There is no finite-dimensional set of parameters that can describe all such functions. In practice, trying to select such a function from data often amounts to building a histogram.
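A rough sketch of the histogram idea, under a few stated assumptions: equal-width bins spanning the range of the data, at least two distinct sample values, and a bin count chosen by hand. The name `histogram_density` is hypothetical. The estimate is a piecewise-constant function that is non-negative and integrates to 1, so it is a valid density, but the number of bin heights needed to describe it is not fixed in advance.

```python
import random


def histogram_density(samples, num_bins):
    """Histogram density estimate with equal-width bins over [min, max].

    Assumes the samples are not all identical (otherwise the bin width is 0).
    Returns (density function, bin heights, bin width).
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clip so that x == hi falls in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(samples)
    # Dividing by n * width makes the total area under the bars equal to 1.
    heights = [c / (n * width) for c in counts]

    def density(x):
        if x < lo or x > hi:
            return 0.0
        idx = min(int((x - lo) / width), num_bins - 1)
        return heights[idx]

    return density, heights, width


# Synthetic data (an assumption for this sketch).
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(2000)]

density, heights, width = histogram_density(samples, num_bins=20)
area = sum(h * width for h in heights)
print(len(heights), round(area, 6))
```

With more data one would typically use more bins, so the description of the estimate grows with the sample size rather than staying at a fixed length.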