The Hammersley–Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a strictly positive probability distribution can be represented as events generated by a Markov network (also known as a Markov random field). It is the fundamental theorem of random fields.[1] It states that a probability distribution that has a strictly positive mass or density satisfies one of the Markov properties with respect to an undirected graph $G$ if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.
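Concretely, the Gibbs (clique-factorization) condition can be written as
$$\Pr(X = x) = \frac{1}{Z} \prod_{C \in \operatorname{cl}(G)} \varphi_{C}(x_{C}),$$
where $\operatorname{cl}(G)$ denotes the set of cliques of $G$, each $\varphi_{C}$ is a strictly positive function of the variables $x_{C}$ belonging to the clique $C$, and $Z$ is a normalizing constant. (The symbols $\varphi_{C}$, $x_{C}$ and $Z$ are introduced here only to spell out the factorized form and are not used elsewhere in the article.)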
The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin[2] and Frank Spitzer[3] in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971.[4][5] Simpler proofs using the inclusion–exclusion principle were given independently by Geoffrey Grimmett,[6] Preston[7] and Sherman[8] in 1973, with a further proof by Julian Besag in 1974.[9]
[Figure: A simple Markov network for demonstrating that any Gibbs random field satisfies every Markov property.]

It is a trivial matter to show that a Gibbs random field satisfies every Markov property. As an example of this fact, see the following:
In the image to the right, a Gibbs random field over the provided graph has the form $\Pr(A,B,C,D,E,F) \propto f_{1}(A,B,D)\,f_{2}(A,C,D)\,f_{3}(C,D,F)\,f_{4}(C,E,F)$. If variables $C$ and $D$ are fixed, then the global Markov property requires that $A,B \perp E,F \mid C,D$ (see conditional independence), since $C,D$ forms a barrier between $A,B$ and $E,F$.
With $C$ and $D$ constant,
$$\Pr(A,B,E,F \mid C=c, D=d) \propto [f_{1}(A,B,d)\,f_{2}(A,c,d)] \cdot [f_{3}(c,d,F)\,f_{4}(c,E,F)] = g_{1}(A,B)\,g_{2}(E,F),$$
where $g_{1}(A,B) = f_{1}(A,B,d)\,f_{2}(A,c,d)$ and $g_{2}(E,F) = f_{3}(c,d,F)\,f_{4}(c,E,F)$. This implies that $A,B \perp E,F \mid C,D$.
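This conditional independence can also be checked numerically. The sketch below fills the four clique potentials with arbitrary strictly positive random tables (the specific values are not part of the example above), builds the resulting joint distribution over six binary variables, and verifies that $A,B \perp E,F \mid C,D$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary strictly positive potentials, one per clique of the example graph:
# {A,B,D}, {A,C,D}, {C,D,F}, {C,E,F}; all variables are binary here.
f1 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # f1(A, B, D)
f2 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # f2(A, C, D)
f3 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # f3(C, D, F)
f4 = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # f4(C, E, F)

# Unnormalised Gibbs density Pr(A,B,C,D,E,F) ∝ f1·f2·f3·f4, as a full table.
p = np.zeros((2,) * 6)
for a, b, c, d, e, f in itertools.product(range(2), repeat=6):
    p[a, b, c, d, e, f] = f1[a, b, d] * f2[a, c, d] * f3[c, d, f] * f4[c, e, f]
p /= p.sum()

# Global Markov property A,B ⟂ E,F | C,D:
# Pr(A,B,E,F | c,d) must equal Pr(A,B | c,d) · Pr(E,F | c,d) for every (c, d).
for c, d in itertools.product(range(2), repeat=2):
    cond = p[:, :, c, d, :, :]
    cond = cond / cond.sum()                   # Pr(A,B,E,F | c,d)
    ab = cond.sum(axis=(2, 3), keepdims=True)  # Pr(A,B | c,d)
    ef = cond.sum(axis=(0, 1), keepdims=True)  # Pr(E,F | c,d)
    assert np.allclose(cond, ab * ef)

print("A,B ⟂ E,F | C,D holds for this Gibbs random field")
```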
To establish that every positive probability distribution that satisfies the local Markov property is also a Gibbs random field, the following lemma, which provides a means for combining different factorizations, needs to be proved:
[Figure: Lemma 1 provides a means for combining factorizations as shown in this diagram. Note that in this image, the overlap between sets is ignored.]

Lemma 1
Let $U$ denote the set of all random variables under consideration, and let $\Theta, \Phi_{1}, \Phi_{2}, \dots, \Phi_{n} \subseteq U$ and $\Psi_{1}, \Psi_{2}, \dots, \Psi_{m} \subseteq U$ denote arbitrary sets of variables. (Here, given an arbitrary set of variables $X$, $X$ will also denote an arbitrary assignment to the variables from $X$.)
If
$$\Pr(U) = f(\Theta) \prod_{i=1}^{n} g_{i}(\Phi_{i}) = \prod_{j=1}^{m} h_{j}(\Psi_{j})$$
for functions $f, g_{1}, g_{2}, \dots, g_{n}$ and $h_{1}, h_{2}, \dots, h_{m}$, then there exist functions $h'_{1}, h'_{2}, \dots, h'_{m}$ and $g'_{1}, g'_{2}, \dots, g'_{n}$ such that
$$\Pr(U) = \bigg(\prod_{j=1}^{m} h'_{j}(\Theta \cap \Psi_{j})\bigg) \bigg(\prod_{i=1}^{n} g'_{i}(\Phi_{i})\bigg)$$
In other words, $\prod_{j=1}^{m} h_{j}(\Psi_{j})$ provides a template for further factorization of $f(\Theta)$.
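A minimal concrete instance (the variables and functions below are chosen purely for illustration and do not appear elsewhere): take $U = \{x, y, z\}$, $\Theta = \{x, y\}$, $\Phi_{1} = \{y, z\}$, $\Psi_{1} = \{x, z\}$, $\Psi_{2} = \{y, z\}$, and
$$\Pr(x,y,z) \propto e^{x+y+yz} = \underbrace{e^{x+y}}_{f(\Theta)}\,\underbrace{e^{yz}}_{g_{1}(\Phi_{1})} = \underbrace{e^{x}}_{h_{1}(\Psi_{1})}\,\underbrace{e^{y+yz}}_{h_{2}(\Psi_{2})}.$$
The lemma then yields the refined factorization
$$\Pr(x,y,z) \propto \underbrace{e^{x}}_{h'_{1}(\Theta \cap \Psi_{1})}\,\underbrace{e^{y}}_{h'_{2}(\Theta \cap \Psi_{2})}\,\underbrace{e^{yz}}_{g'_{1}(\Phi_{1})},$$
in which the factor $f(\Theta) = e^{x+y}$ has been split along the template $h_{1} h_{2}$.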
Proof of Lemma 1
In order to use $\prod_{j=1}^{m} h_{j}(\Psi_{j})$ as a template to further factorize $f(\Theta)$, all variables outside of $\Theta$ need to be fixed. To this end, let $\bar{\theta}$ be an arbitrary fixed assignment to the variables from $U \setminus \Theta$ (the variables not in $\Theta$). For an arbitrary set of variables $X$, let $\bar{\theta}[X]$ denote the assignment $\bar{\theta}$ restricted to the variables from $X \setminus \Theta$ (the variables from $X$, excluding the variables from $\Theta$).
Moreover, to factorize only $f(\Theta)$, the other factors $g_{1}(\Phi_{1}), g_{2}(\Phi_{2}), \dots, g_{n}(\Phi_{n})$ need to be rendered moot for the variables from $\Theta$. To do this, the factorization
$$\Pr(U) = f(\Theta) \prod_{i=1}^{n} g_{i}(\Phi_{i})$$
will be re-expressed as
$$\Pr(U) = \bigg(f(\Theta) \prod_{i=1}^{n} g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])\bigg) \bigg(\prod_{i=1}^{n} \frac{g_{i}(\Phi_{i})}{g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])}\bigg)$$
For each $i = 1, 2, \dots, n$: $g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])$ is $g_{i}(\Phi_{i})$ where all variables outside of $\Theta$ have been fixed to the values prescribed by $\bar{\theta}$.
Let $f'(\Theta) = f(\Theta) \prod_{i=1}^{n} g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])$ and $g'_{i}(\Phi_{i}) = \frac{g_{i}(\Phi_{i})}{g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])}$ for each $i = 1, 2, \dots, n$, so that
$$\Pr(U) = f'(\Theta) \prod_{i=1}^{n} g'_{i}(\Phi_{i}) = \prod_{j=1}^{m} h_{j}(\Psi_{j})$$
The key point is that $g'_{i}(\Phi_{i}) = \frac{g_{i}(\Phi_{i})}{g_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}])} = 1$ whenever the values assigned to $\Phi_{i}$ do not conflict with the values prescribed by $\bar{\theta}$, so each $g'_{i}(\Phi_{i})$ "disappears" when all variables not in $\Theta$ are fixed to the values from $\bar{\theta}$.
Fixing all variables not in $\Theta$ to the values from $\bar{\theta}$ gives
$$\Pr(\Theta, \bar{\theta}) = f'(\Theta) \prod_{i=1}^{n} g'_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}]) = \prod_{j=1}^{m} h_{j}(\Psi_{j} \cap \Theta, \bar{\theta}[\Psi_{j}])$$
Since $g'_{i}(\Phi_{i} \cap \Theta, \bar{\theta}[\Phi_{i}]) = 1$,
$$f'(\Theta) = \prod_{j=1}^{m} h_{j}(\Psi_{j} \cap \Theta, \bar{\theta}[\Psi_{j}])$$
Letting $h'_{j}(\Theta \cap \Psi_{j}) = h_{j}(\Psi_{j} \cap \Theta, \bar{\theta}[\Psi_{j}])$ gives:
$$f'(\Theta) = \prod_{j=1}^{m} h'_{j}(\Theta \cap \Psi_{j})$$
which finally gives:
$$\Pr(U) = \bigg(\prod_{j=1}^{m} h'_{j}(\Theta \cap \Psi_{j})\bigg) \bigg(\prod_{i=1}^{n} g'_{i}(\Phi_{i})\bigg)$$
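The construction in the proof can be traced on a toy example. The sketch below (the variables $w, x, y$, the tables $a, b, c$, and the two factorizations are all made up here for illustration) applies the formulas for $g'_{i}$ and $h'_{j}$ with a fixed assignment $\bar{\theta}$ to the single variable outside $\Theta$, and checks that the combined factorization reproduces $\Pr(U)$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary strictly positive tables over binary variables w, x, y.
a = rng.uniform(0.1, 1.0, size=2)        # a(w)
b = rng.uniform(0.1, 1.0, size=2)        # b(x)
c = rng.uniform(0.1, 1.0, size=(2, 2))   # c(x, y)

# Two factorizations of the same unnormalised Pr(w, x, y) = a(w) b(x) c(x, y):
#   Pr = f(Θ) g1(Φ1)   with Θ = {w, x}, f(w, x) = a(w) b(x), Φ1 = {x, y}, g1 = c
#   Pr = h1(Ψ1) h2(Ψ2) with Ψ1 = {w}, h1(w) = a(w), Ψ2 = {x, y}, h2(x, y) = b(x) c(x, y)
f  = lambda w, x: a[w] * b[x]
g1 = lambda x, y: c[x, y]
h1 = lambda w: a[w]
h2 = lambda x, y: b[x] * c[x, y]

# Fix an arbitrary assignment θ̄ to the variables outside Θ; here U \ Θ = {y}.
y_bar = 0

# Construction from the proof:
#   g1'(Φ1) = g1(Φ1) / g1(Φ1 ∩ Θ, θ̄[Φ1])   (equals 1 once y is fixed to θ̄)
#   h'_j(Θ ∩ Ψ_j) = h_j with its variables outside Θ fixed to θ̄
g1p = lambda x, y: g1(x, y) / g1(x, y_bar)
h1p = lambda w: h1(w)            # Θ ∩ Ψ1 = {w}; nothing to fix
h2p = lambda x: h2(x, y_bar)     # Θ ∩ Ψ2 = {x}; y fixed to θ̄

# Verify Pr(U) = (∏_j h'_j)(∏_i g'_i) for every assignment, i.e. f(Θ) has been
# refactorized along the template h1·h2.
for w, x, y in itertools.product(range(2), repeat=3):
    assert np.isclose(f(w, x) * g1(x, y), h1p(w) * h2p(x) * g1p(x, y))

print("Lemma 1 construction reproduces Pr(U) on this example")
```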
[Figure: The clique formed by vertices $x_{1}$, $x_{2}$, and $x_{3}$ is the intersection of $\{x_{1}\} \cup \partial x_{1}$, $\{x_{2}\} \cup \partial x_{2}$, and $\{x_{3}\} \cup \partial x_{3}$.]

Lemma 1 provides a means of combining two different factorizations of $\Pr(U)$. The local Markov property implies that for any random variable $x \in U$, there exist factors $f_{x}$ and $f_{-x}$ such that:
$$\Pr(U) = f_{x}(x, \partial x)\, f_{-x}(U \setminus \{x\})$$
where $\partial x$ denotes the neighbors of node $x$. Applying Lemma 1 repeatedly eventually factors $\Pr(U)$ into a product of clique potentials (see the image on the right), as sketched below.
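As a sketch of one such application (with the intermediate sets spelled out here for illustration), consider the clique in the figure. Starting from the factorization given by the local Markov property at $x_{1}$, apply Lemma 1 with $\Theta = \{x_{1}\} \cup \partial x_{1}$, using the factorization at $x_{2}$ as the template: the factor over $\{x_{1}\} \cup \partial x_{1}$ is replaced by factors over $(\{x_{1}\} \cup \partial x_{1}) \cap (\{x_{2}\} \cup \partial x_{2})$ and over $(\{x_{1}\} \cup \partial x_{1}) \cap (U \setminus \{x_{2}\})$. Applying the lemma once more, with the factorization at $x_{3}$ as the template, splits the first of these into a factor over
$$(\{x_{1}\} \cup \partial x_{1}) \cap (\{x_{2}\} \cup \partial x_{2}) \cap (\{x_{3}\} \cup \partial x_{3})$$
together with a factor avoiding $x_{3}$; the displayed intersection is exactly the clique $\{x_{1}, x_{2}, x_{3}\}$ shown in the figure. Continuing in this way leaves only factors whose arguments are sets of mutually adjacent vertices, i.e. cliques.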
End of Proof
References

1. Lafferty, John D.; McCallum, Andrew (2001). "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". Proc. of the 18th Intl. Conf. on Machine Learning (ICML-2001). Morgan Kaufmann. ISBN 9781558607781. "by the fundamental theorem of random fields (Hammersley & Clifford 1971)"
2. Dobrushin, P. L. (1968), "The Description of a Random Field by Means of Conditional Probabilities and Conditions of Its Regularity", Theory of Probability and Its Applications, 13 (2): 197–224, doi:10.1137/1113026
3. Spitzer, Frank (1971), "Markov Random Fields and Gibbs Ensembles", The American Mathematical Monthly, 78 (2): 142–154, doi:10.2307/2317621, JSTOR 2317621
4. Hammersley, J. M.; Clifford, P. (1971), Markov fields on finite graphs and lattices (PDF)
5. Clifford, P. (1990), "Markov random fields in statistics", in Grimmett, G. R.; Welsh, D. J. A. (eds.), Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, Oxford University Press, pp. 19–32, ISBN 978-0-19-853215-6, MR 1064553
6. Grimmett, G. R. (1973), "A theorem about random fields", Bulletin of the London Mathematical Society, 5 (1): 81–84, CiteSeerX 10.1.1.318.3375, doi:10.1112/blms/5.1.81, MR 0329039
7. Preston, C. J. (1973), "Generalized Gibbs states and Markov random fields", Advances in Applied Probability, 5 (2): 242–261, doi:10.2307/1426035, JSTOR 1426035, MR 0405645
8. Sherman, S. (1973), "Markov random fields and Gibbs random fields", Israel Journal of Mathematics, 14 (1): 92–103, doi:10.1007/BF02761538, MR 0321185
9. Besag, J. (1974), "Spatial interaction and the statistical analysis of lattice systems", Journal of the Royal Statistical Society, Series B, 36 (2): 192–236, JSTOR 2984812, MR 0373208