hammersley-clifford theorem
Any strictly positive probability distribution that satisfies the markov property with respect to a graph is a gibbs random field over that graph, and vice versa.
- the markov property means that if any two nodes \(X\) and \(Y\) are separated by a set of nodes \(S\) (i.e. every path from \(X\) to \(Y\) passes through \(S\)), then \(X \perp Y \mid S\)
- a gibbs random field is any probability distribution that can be written as a product of factors over cliques: \(P(U) = \prod_{d\in D} f_d(X_d)\), where \(D\) is the set of cliques of the graph and \(X_d\) are the variables in clique \(d\)
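As a quick sanity check of my own (not part of the proof below), take the chain graph \(a - b - c\). Its cliques include \(\{a,b\}\) and \(\{b,c\}\), so the gibbs form reads \[ P(a,b,c) = f_{ab}(a,b)\,f_{bc}(b,c) \] while the markov property says \(a \perp c \mid b\), since \(b\) separates \(a\) from \(c\). The theorem asserts that these two descriptions pick out exactly the same (strictly positive) distributions.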
1. proof:
Proof from Wikipedia with my commentary
1.1. mrf \(\Rightarrow\) grf
1.1.1. lemma 1
- Let \(U\) be all the variables under consideration.
- Let \(\Theta, \Phi_1, \ldots, \Phi_n \subseteq U\) and \(\Psi_1, \ldots, \Psi_m \subseteq U\) be arbitrary sets of variables
- Here, when applicable, \(X\) will also refer to an (arbitrary) assignment to the variables \(X\)
If \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i) = \prod_{j=1}^{m}h_j(\Psi_j) \] for functions \(f\), \(g_i\), and \(h_j\), then there exist functions \(g_i'\) and \(h_j'\) such that: \[ Pr(U) = \prod_{j=1}^{m}h_j'(\Theta \cap \Psi_j)\prod_{i=1}^{n}g_i'(\Phi_i) \]
In other words, \(h_j\) gives a template for further factoring \(f\).
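Here is a tiny instance I made up to see the lemma in action (my own example, not from Wikipedia). Take \(U = \{a,b,c\}\), \(\Theta = \{a,b\}\), \(\Phi_1 = \{b,c\}\), \(\Psi_1 = \{a\}\), \(\Psi_2 = \{b,c\}\), and suppose \[ Pr(U) = f(a,b)\,g_1(b,c) = h_1(a)\,h_2(b,c) \] Fixing some value \(\bar c\) and setting \(g_1'(b,c) = g_1(b,c)/g_1(b,\bar c)\), the construction in the proof below gives \[ Pr(U) = h_1(a)\,h_2(b,\bar c)\,g_1'(b,c) \] which is a product of functions over \(\Theta\cap\Psi_1 = \{a\}\), \(\Theta\cap\Psi_2 = \{b\}\), and \(\Phi_1 = \{b,c\}\), exactly the promised form.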
Now, with lemma 1, we can finish the proof. The local Markov property tells us that for any variable \(x\), we can write:
\begin{equation} Pr(U) = f_{x}(x,\delta x)f_{-x}(U\setminus\{x\}) \label{factor} \end{equation} where \(\delta x\) denotes the neighbors of \(x\). Namely: \[ Pr(U) = \underbrace{P(\delta x)P(x \mid \delta x)}_{f_{x}} \underbrace{P(U\setminus(\{x\} \cup \delta x) \mid \delta x)}_{f_{-x}} \]
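For instance (my own illustration), on the path graph \(w - x - y - z\) we have \(\delta x = \{w, y\}\), and the factorization reads \[ Pr(w,x,y,z) = \underbrace{P(w,y)P(x \mid w,y)}_{f_{x}}\,\underbrace{P(z \mid w,y)}_{f_{-x}} \] where the local Markov property is what lets us drop \(x\) from the conditioning in the last factor.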
Then, let's repeatedly apply lemma 1 to Equation \ref{factor}. The goal is to end up with an expression that only contains functions over cliques. Let's take a look at what happens when our template is: \[ Pr(U) = f_{y}(y, \delta y) f_{-y}(U\setminus\{y\}) \] Then, by lemma 1, applying this template to Equation \ref{factor} gets us: \[ Pr(U) = f_{x,y}'\big((\{x\}\cup\delta x) \cap (\{y\}\cup \delta y)\big)\,h'\big((\{x\}\cup\delta x) \cap (U\setminus \{y\})\big)\,f_{-x}'(U\setminus \{x\}) \] Since we get to choose which template to apply, let's pick \(y\) to be a node in the same maximal clique as \(x\).
To take stock:
- \(f_{x,y}'\) is a function over \((\{x\}\cup\delta x) \cap (\{y\}\cup\delta y)\), i.e. over \(x\), \(y\), and their common neighbors
- \(h'\) is now a function over \((\{x\}\cup\delta x) \setminus \{y\}\)
- \(f_{-x}'\) is still a function over \(U \setminus \{x\}\)
Here is the key realization:
- Let's recursively take templates and factor each one of the terms
- so for \(f_{x,y}'\), let's continue to factor it, e.g. by using the template for some node \(z\) in the same maximal clique as \(x\) and \(y\)
- once that first term consists only of nodes in a single maximal clique, we can move on to the next term and rinse and repeat (a worked run-through follows this list)
- how do we know that this process will terminate? because the set of variables involved in the term being factored strictly shrinks every time we factor
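To make the end point concrete, here is a small graph of my own choosing: edges \(x-y\), \(x-z\), \(y-z\), and \(x-w\), so the maximal cliques are \(\{x,y,z\}\) and \(\{x,w\}\). Here \(\delta x = \{y,z,w\}\), so the first term of Equation \ref{factor} is a function over \(\{x,y,z,w\}\), which is not a clique; one application of the template for \(y\) already cuts it down to a function over \((\{x\}\cup\delta x)\cap(\{y\}\cup\delta y) = \{x,y,z\}\). Continuing the recursion on the remaining terms eventually leaves \[ Pr(U) = \phi_1(x,y,z)\,\phi_2(x,w) \] a product of factors over the maximal cliques.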
1.1.1.1. proof of lemma 1
Let \(\bar\theta\) denote a particular (fixed) assignment to the variables \(U\setminus \Theta\). For an arbitrary set of variables \(X\), let \(\bar\theta[X]\) denote the assignment that \(\bar\theta\) gives to the variables in \(X\) that fall outside \(\Theta\).
Then, we will write \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i) \] as \[ Pr(U) = f(\Theta)\prod_{i=1}^{n}g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])\prod_{i=1}^{n}\frac{g_i(\Phi_i)}{g_i(\Phi_i\cap\Theta, \bar\theta[\Phi_i])} \] where \(g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])\) is the function \(g_i\) with the variables in \(\Phi_i\) that lie outside \(\Theta\) fixed by \(\bar\theta\). Then, let:
- \(f'(\Theta) = f(\Theta)\prod_{i=1}^{n}g_i(\Theta\cap\Phi_i, \bar\theta[\Phi_i])\)
- \(g_i'(\Phi_i) = \frac{g_i(\Phi_i)}{g_i(\Phi_i \cap \Theta, \bar\theta[\Phi_i])}\)
Then, \[ Pr(U) = f'(\Theta)\prod_{i=1}^{n} g_i'(\Phi_i) = \prod_{j=1}^{m}h_j(\Psi_j) \]
Now, note that \(g_i'(\Phi_i) = 1\) when the variables in \(\Phi_i\) that lie outside \(\Theta\) are assigned exactly the values in \(\bar\theta\). So, let's see what happens when we fix the assignments outside \(\Theta\) to be exactly the assignments in \(\bar\theta\). Then, we have:
\[ Pr(\Theta, \bar\theta) = f'(\Theta)\prod_{i=1}^n 1 = \prod_{j=1}^{m} h_j(\Psi_j\cap \Theta, \bar\theta[\Psi_j]) \] So, \[ f'(\Theta) = \prod_{j=1}^{m} h_j(\Psi_j\cap \Theta, \bar\theta[\Psi_j]) \] Then, let's set \[ h'_j(\Psi_j \cap \Theta) = h_j(\Psi_j \cap \Theta, \bar\theta[\Psi_j]) \] Substituting this expression for \(f'(\Theta)\) back into \(Pr(U) = f'(\Theta)\prod_{i=1}^{n} g_i'(\Phi_i)\), we can finally say: \[ Pr(U) =\left( \prod_{j=1}^{m}h_j'(\Psi_j \cap \Theta)\right) \left(\prod_{i=1}^{n} g_i'(\Phi_i) \right) \]
1.2. grf \(\Rightarrow\) mrf
Consider \(X\) and \(Y\) separated by a set of nodes \(S\). No clique can contain nodes from both \(X\)'s side of \(S\) and \(Y\)'s side, since any two nodes in a clique are joined by an edge, and an edge crossing between the two sides would contradict separation. So once all the separating variables have been fixed, the clique product breaks into two neat pieces: one piece whose factors only involve \(X\)'s side (and \(S\)), and one piece whose factors only involve \(Y\)'s side (and \(S\)). Hence \(X \perp Y \mid S\).
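Spelled out in my own notation: let \(\mathcal{C}_X\) be the cliques containing at least one node on \(X\)'s side (by the argument above these lie entirely within \(X\)'s side plus \(S\)), and let \(\mathcal{C}_Y\) be all the remaining cliques. Then \[ P(U) = \underbrace{\prod_{C \in \mathcal{C}_X} f_C(X_C)}_{\text{only } X\text{'s side and } S}\;\underbrace{\prod_{C \in \mathcal{C}_Y} f_C(X_C)}_{\text{no nodes from } X\text{'s side}} \] so after conditioning on \(S\), the distribution is proportional to a product of a function of \(X\)'s side and a function of everything else, which gives \(X \perp Y \mid S\).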