no free lunch

A learned mapping \(f\) is, averaged across all possible problems, just as good as any other.

Here, "averaged across all possible problems" is meant for the case where there is no structure. So precisely, if the space of all possible problems is the set of all functions from \(V\) into \(S\), \(f: V \rightarrow S\). And some ground truth \(f^*\) is selected from this set uniformly at random, and the task is to make a 'guess', \(\hat{f}\), then any algorithm to produce \(\hat{f}\) is as good as another. To see this, imagine selecting a certain \(\hat{f}\). Averaged across all possible choices of \(f^*\), it will be close to some choices of \(\hat{f}\) and further from some others, but to the same degree that any other \(\hat{f}\) would be. This isn't really a profound statment.

Created: 2025-11-02 Sun 18:55