Graphical models

Notation

  • $\mathrm{pa}(x_i)$ denotes the parent nodes of the r.v. $x_i$ in the BN
  • $C$ denotes a clique
  • $\mathbf{x}_C$ denotes the set of variables in the clique $C$
  • $\psi_C(\mathbf{x}_C)$ are potential functions

Markov Random Fields

A Markov random field, also known as a Markov network or an undirected graphical model, has a set of nodes each of which corresponds to a variable or group of variables, as well as a set of links each of which connects a pair of nodes.

The links are undirected, i.e. do not carry arrows.

Belief Network

A belief network is a distribution of the form

$$p(x_1, \dots, x_D) = \prod_{i=1}^{D} p(x_i \mid \mathrm{pa}(x_i))$$

where $\mathrm{pa}(x_i)$ denotes the parental variables of variable $x_i$.

This can be written as a DAG, with the i-th node in the graph corresponding to the factor $p(x_i \mid \mathrm{pa}(x_i))$.
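As a quick sanity check, the factorization can be evaluated directly. A minimal sketch in Python, with made-up conditional probability tables for an assumed chain a → b → c:

```python
# Sketch (made-up CPTs): a belief network a -> b -> c over binary variables,
# whose joint is the product of the per-node factors p(x_i | pa(x_i)).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3,   # keys are (b, a)
               (0, 1): 0.2, (1, 1): 0.8}
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1,   # keys are (c, b)
               (0, 1): 0.5, (1, 1): 0.5}

def joint(a, b, c):
    """p(a, b, c) = p(a) p(b | a) p(c | b)."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

# Summing over all configurations recovers 1, confirming normalization.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

Because each factor is a normalized conditional, the product is automatically normalized — no partition function is needed, unlike the undirected case below.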

Factorization properties

  • Without loss of generality, only consider maximal cliques, since all other cliques are subsets of maximal ones and their potentials can be absorbed into the maximal-clique potentials

The joint distribution is written as a product of potential functions $\psi_C(\mathbf{x}_C)$ over the maximal cliques of the graph

$$p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$$

where the partition function is given by

$$Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$$

which ensures that the distribution $p(\mathbf{x})$ is correctly normalized.

To make a formal connection between conditional independence and factorization for undirected graphs, we restrict attention to potential functions $\psi_C(\mathbf{x}_C)$ that are strictly positive, so we can express them as

$$\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$$

where $E(\mathbf{x}_C)$ is called the energy function, and the exponential representation is called the Boltzmann distribution or Gibbs distribution.
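A brute-force sketch of the potentials, partition function, and Boltzmann form for a tiny assumed MRF (chain and energies are made up):

```python
import itertools
import math

# Sketch (made-up energies): a pairwise MRF on the chain x1 - x2 - x3 with
# potentials psi(x_C) = exp(-E(x_C)); this energy favours agreeing neighbours.
def E(xi, xj):
    return 0.0 if xi == xj else 1.0

def unnormalized(x1, x2, x3):
    return math.exp(-E(x1, x2)) * math.exp(-E(x2, x3))

# Partition function: brute-force sum over all joint states (feasible only
# for tiny models; computing Z is what makes exact inference hard in general).
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z
```

Here fully agreeing states such as (0, 0, 0) get the highest probability, since they have the lowest total energy.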

A graph is said to be a D map (for dependency map) of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph.

Thus a completely disconnected graph (no links) will be a trivial D map for any distribution.

A graph is said to be an I map (for independence map) if every conditional independence statement implied by the graph is satisfied by a specific distribution.

Clearly a fully connected graph will be a trivial I map for any distribution.

If it is the case that every conditional independence property of the distribution is reflected in the graph, and vice versa, then the graph is said to be a perfect map.

Inference in Graphical Models


  • $x$ denotes a variable (can be a scalar, vector, matrix, etc.)
  • $\mathbf{x}_s$ denotes a subset of the variables
  • $\mathrm{ne}(x)$ denotes the set of factor nodes that are neighbours of $x$
  • $X_s$ denotes the set of variables in the subtree connected to the variable node $x$ via a factor node $f_s$
  • $X_{sm}$ denotes the set of variables in the subtree connected to the variable node $x_m$ via a factor node $f_s$
  • $f_s$ depends on the variables $(x, x_1, \dots, x_M)$, which we denote $\mathbf{x}_s$
  • $F_s(x, X_s)$ represents the product of all the factors in the group associated with factor $f_s$
  • Message from factor node $f_s$ to variable node $x$:

    $$\mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)$$

  • Message from variable node $x_m$ to factor node $f_s$:

    $$\mu_{x_m \to f_s}(x_m) = \sum_{X_{sm}} G_m(x_m, X_{sm})$$

    where $G_m(x_m, X_{sm})$ denotes the product of the factors in the subtree connected to the factor node $f_s$ via the variable node $x_m$

Factor graphs

  • Can write the joint distribution over a set of variables in the form of a product of factors

    $$p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$$

    where $\mathbf{x}_s$ denotes a subset of the variables.

  • Directed graphs represent special cases in which the factors $f_s(\mathbf{x}_s)$ are local conditional distributions
  • Undirected graphs are a special case in which the factors are potential functions over the maximal cliques (the normalizing coefficient $1/Z$ can be viewed as a factor defined over the empty set of variables)

Converting distribution expressed in terms of undirected graph to factor graph

  1. Create variable nodes corresponding to the nodes in the original undirected graph
  2. Create additional factor nodes corresponding to the maximal cliques $\mathbf{x}_s$
  3. The factors $f_s(\mathbf{x}_s)$ are set equal to the clique potentials $\psi_s(\mathbf{x}_s)$

There might be several factor graphs corresponding to the same undirected graph, i.e. this construction is not unique.
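The construction above can be sketched as follows (toy graph, hand-listed cliques, and placeholder potential are all assumptions):

```python
# Sketch (hypothetical toy graph): converting an undirected graph to a factor
# graph via the three steps above. Maximal cliques are listed by inspection.
undirected_edges = {("x1", "x2"), ("x2", "x3"), ("x1", "x3"), ("x3", "x4")}
maximal_cliques = [("x1", "x2", "x3"), ("x3", "x4")]

# Step 1: variable nodes are the nodes of the original graph.
variable_nodes = {v for edge in undirected_edges for v in edge}

# Steps 2-3: one factor node per maximal clique; its factor is the clique
# potential (here a placeholder that scores agreement of the clique's values).
def potential(assignment):
    return 2.0 if len(set(assignment)) == 1 else 1.0

factor_nodes = {f"f{i}": (clique, potential)
                for i, clique in enumerate(maximal_cliques)}

# Bipartite structure: each factor node links to exactly its clique's variables.
fg_edges = [(name, v) for name, (clique, _) in factor_nodes.items()
            for v in clique]
```

A different clique-to-factor assignment (e.g. splitting a potential across several factors over the same clique) would give another valid factor graph, which is exactly the non-uniqueness noted above.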

Sum-product algorithm

  • Problem: evaluate local marginals over nodes or subsets of nodes
  • Applicable to tree-structured graphs
  • Exploit structure of the graph to achieve:
    1. an efficient, exact inference algorithm for finding marginals
    2. efficient sharing of computations in situations where several marginals are required
  • Summarized: barber2012

sum_product_part1.png sum_product_part2.png

Belief propagation is a special case of sum-product algorithm, which performs exact inference on directed graphs without loops. (You have already written about this)

For actual implementation it's useful to use the logarithmic version of the messages. See barber2012.

  • The marginal is obtained by summing the joint distribution over all variables except $x$, so that

    $$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$$

    where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with the variable $x$ omitted.

  • The joint distribution of the set of variables $\mathbf{x}$ can be written as a product of the form

    $$p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$$

  • Rewriting the marginal above using the joint distribution notation:

    $$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$

    i.e. the marginal is given by the product of all incoming messages arriving at node $x$.

  • Each factor $F_s(x, X_s)$ can be described by a factor (sub-)graph and so can itself be factorized:

    $$F_s(x, X_s) = f_s(x, x_1, \dots, x_M) \, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$

    where, for convenience, we have denoted the variables associated with factor $f_s$, in addition to $x$, by $x_1, \dots, x_M$.

  • Message $f_s$ to $x$ can be rewritten:

    $$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \dots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$

    i.e. we marginalize over all the variables which $f_s$ depends on, weighting by the "messages" sent from those variables to the factor $f_s$. Thus, to evaluate the message sent by a factor to a variable along the link connecting them:

    1. Take the product of incoming messages along all other links coming into the factor node
    2. Multiply by the factor associated with that node
    3. Marginalize over all of the variables associated with the incoming messages
  • Message $x_m$ to $f_s$ can be written:

    $$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$

    i.e. to evaluate messages passed from variables to adjacent factors we simply take the product of the incoming messages along all other links.

  • Messages can be computed recursively
    • Start: if a leaf node is a:
      • variable: the message it sends along its one and only link is given by

        $$\mu_{x \to f}(x) = 1$$

      • factor: the message it sends is simply

        $$\mu_{f \to x}(x) = f(x)$$

  • Finally, if we wish to find the marginal distributions $p(\mathbf{x}_s)$ associated with the sets of variables belonging to each factor:

    $$p(\mathbf{x}_s) = f_s(\mathbf{x}_s) \prod_{i \in \mathrm{ne}(f_s)} \mu_{x_i \to f_s}(x_i)$$


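The two message updates can be sketched for a single discrete factor (the factor table and the incoming messages are made up):

```python
import itertools

# Sketch (made-up numbers): the two message updates for a discrete factor
# f(x, x1, x2) over binary variables.
def f(x, x1, x2):
    return float(4 * x + 2 * x1 + x2)  # placeholder factor table

# Incoming variable-to-factor messages on the other two links.
mu_x1_to_f = [1.0, 1.0]   # e.g. x1 is a leaf variable, so it sends ones
mu_x2_to_f = [0.9, 0.1]

# Factor-to-variable: multiply incoming messages, multiply by the factor,
# marginalize over the other variables (the three steps above).
mu_f_to_x = [
    sum(f(x, x1, x2) * mu_x1_to_f[x1] * mu_x2_to_f[x2]
        for x1, x2 in itertools.product((0, 1), repeat=2))
    for x in (0, 1)
]

# Variable-to-factor: product of incoming factor-to-variable messages on the
# other links (here two hypothetical other factors g and h).
mu_g_to_x = [0.5, 0.5]
mu_h_to_x = [0.8, 0.2]
mu_x_to_f = [g * h for g, h in zip(mu_g_to_x, mu_h_to_x)]
```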

The sum-product algorithm can then be summarized as:

  1. A variable node $x$ is chosen as the root of the factor graph
  2. Messages are initiated at the leaves of the graph; if a leaf is a:
    • variable: the message it sends along its one and only link is given by

      $$\mu_{x \to f}(x) = 1$$

    • factor: the message it sends is simply

      $$\mu_{f \to x}(x) = f(x)$$

  3. The message passing steps

      $$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \dots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$

      $$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$

     are applied recursively until messages have been propagated along every link

  4. Once the root node has received messages from all of its neighbours, the marginal can be evaluated by:

      $$p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$

     or, if one wants the marginal distribution $p(\mathbf{x}_s)$ associated with the sets of variables belonging to each of the factors:

      $$p(\mathbf{x}_s) = f_s(\mathbf{x}_s) \prod_{i \in \mathrm{ne}(f_s)} \mu_{x_i \to f_s}(x_i)$$

  5. If we started from a:
    • directed graph: the marginals are already correctly normalized
    • undirected graph: we started with an unknown normalization constant $Z$, so steps 1-4 yield an unnormalized marginal $\tilde{p}(x)$. Summing it over the single variable $x$ then gives the normalization constant:

      $$Z = \sum_x \tilde{p}(x)$$

We could find the marginals for every variable node in the graph by first passing messages from the leaves to the root, as in the sum-product algorithm, and then, once the root has received all messages, passing messages from the root back out to the leaves.

Then we would have computed the messages along each of the links / edges in both directions, and the wanted marginal at any node is simply the product of its incoming messages.
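This two-pass scheme can be sketched on an assumed three-variable chain with made-up factor tables:

```python
# Sketch (made-up factor tables): two-pass sum-product on the chain
#   x1 -- f_a -- x2 -- f_b -- x3   with binary variables and x3 as root.
f_a = {(x1, x2): 1.0 + x1 + 2 * x2 for x1 in (0, 1) for x2 in (0, 1)}
f_b = {(x2, x3): 2.0 - x2 + x3 for x2 in (0, 1) for x3 in (0, 1)}

# Forward pass: leaves -> root. Leaf variable x1 sends the unit message.
mu_x1_fa = [1.0, 1.0]
mu_fa_x2 = [sum(f_a[x1, x2] * mu_x1_fa[x1] for x1 in (0, 1)) for x2 in (0, 1)]
mu_x2_fb = mu_fa_x2                     # x2 has only one other link
mu_fb_x3 = [sum(f_b[x2, x3] * mu_x2_fb[x2] for x2 in (0, 1)) for x3 in (0, 1)]

# Backward pass: root -> leaves, so every link carries messages both ways.
mu_x3_fb = [1.0, 1.0]
mu_fb_x2 = [sum(f_b[x2, x3] * mu_x3_fb[x3] for x3 in (0, 1)) for x2 in (0, 1)]
mu_x2_fa = mu_fb_x2
mu_fa_x1 = [sum(f_a[x1, x2] * mu_x2_fa[x2] for x2 in (0, 1)) for x1 in (0, 1)]

# Unnormalized marginals: product of all incoming messages at each variable.
tilde_p1 = mu_fa_x1
tilde_p2 = [mu_fa_x2[x] * mu_fb_x2[x] for x in (0, 1)]
tilde_p3 = mu_fb_x3

# Z is obtained by summing any one unnormalized marginal.
Z = sum(tilde_p1)

# Sanity check against brute-force marginalization of f_a(x1,x2) f_b(x2,x3).
brute_p1 = [sum(f_a[x1, x2] * f_b[x2, x3] for x2 in (0, 1) for x3 in (0, 1))
            for x1 in (0, 1)]
```

Note that all three unnormalized marginals sum to the same Z, as they must, and the message-passing result agrees with direct summation over the joint.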

Max-sum algorithm

  • Problem: find the most probable state or sequence:

    $$\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$$

    which will have the joint probability

    $$p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x})$$

  • Summarized: barber2012


Figure 1: barber2012

  • Will make use of

    $$\max(ab, ac) = a \max(b, c)$$

    which holds if $a \geq 0$ (always true for factors in graphical models!)

  • Work with $\ln p(\mathbf{x})$ instead (for numerical purposes), i.e.

    $$\ln p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} \ln p(\mathbf{x})$$

    which holds because the logarithm is monotonic; the distributive property is preserved since

    $$\max(a + b, a + c) = a + \max(b, c)$$

    hence we replace products with sums.
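A quick numerical check of the identity and its log-domain counterpart (the values are arbitrary):

```python
import math

# Sketch (made-up numbers): the distributivity identity behind max-sum.
a, b, c = 0.5, 0.2, 0.9
lhs = max(a * b, a * c)           # max distributes over products when a >= 0
rhs = a * max(b, c)

# In the log domain the same identity holds with sums instead of products,
# which is what lets max-sum push max inside sums of log-factors.
log_lhs = max(math.log(a) + math.log(b), math.log(a) + math.log(c))
log_rhs = math.log(a) + max(math.log(b), math.log(c))
```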


Causality

In setting any variable into a particular state we need to surgically remove all parental links of that variable.

This is called the do operator, and it contrasts observational ("see") inference $p(y \mid x)$ with causal ("make" or "do") inference $p(y \mid \mathrm{do}(x))$.

Formally, let all variables $\mathbf{x}$ be written in terms of the intervention variables $x_c$ and the non-intervention variables $\mathbf{x}_{\setminus c}$.

For a Belief Network $p(\mathbf{x}) = \prod_i p(x_i \mid \mathrm{pa}(x_i))$, inferring the effect of setting the variables $X_c$, $c \in \mathcal{C}$, in states $x_c$, is equivalent to standard evidential inference in the post-intervention distribution:

$$p(\mathbf{x}_{\setminus \mathcal{C}} \mid \mathrm{do}(x_c)) = \prod_{j \notin \mathcal{C}} p(x_j \mid \mathrm{pa}(x_j))$$

where any parental variable included in the intervention set is set to its intervention state. An alternative notation is $p(\mathbf{x}_{\setminus \mathcal{C}} \,\|\, x_c)$.

Intervention here refers to the "act of intervening", as in making some variable take on a particular state, i.e. we're interfering with the "total" state.
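The "see" vs "do" distinction can be sketched on an assumed confounded network z → x, z → y, x → y (all numbers made up):

```python
# Sketch (made-up numbers): a confounded belief network z -> x, z -> y, x -> y.
# Conditioning ("see") differs from intervening ("do"): the do operator severs
# the parental link z -> x and keeps p(z) at its prior.
p_z = {0: 0.5, 1: 0.5}

def p_x1_given_z(z):
    return {0: 0.1, 1: 0.9}[z]          # p(x=1 | z)

def p_y1_given_xz(x, z):
    return {(0, 0): 0.2, (0, 1): 0.6,   # p(y=1 | x, z)
            (1, 0): 0.5, (1, 1): 0.9}[(x, z)]

# Observational: p(y=1 | x=1) = sum_z p(z | x=1) p(y=1 | x=1, z),
# where Bayes' rule pulls the posterior over the confounder z.
p_x1 = sum(p_z[z] * p_x1_given_z(z) for z in (0, 1))
p_see = sum(p_z[z] * p_x1_given_z(z) / p_x1 * p_y1_given_xz(1, z)
            for z in (0, 1))

# Interventional: p(y=1 | do(x=1)) = sum_z p(z) p(y=1 | x=1, z),
# i.e. evidential inference in the post-intervention distribution.
p_do = sum(p_z[z] * p_y1_given_xz(1, z) for z in (0, 1))
```

Here seeing x = 1 makes z = 1 more likely (and z = 1 independently raises y), so the observational probability overstates the causal effect of setting x = 1.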


  • [barber2012] Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (2012).