Graphical models

Table of Contents

Notation

  • graphical_models_da88015a6ebb980cd5f5027bf3834222f55a8107.png denotes the parent nodes of the rv. graphical_models_0207be880056b9a69e22e729dd37bced29cd174a.png in the BN
  • graphical_models_e20dcda5f035650122343c61053a7c3ad6acacaa.png denotes a clique
  • graphical_models_b842f34d1222c4678df9b0ee65af007a264d0f81.png denotes a set of variables in the clique graphical_models_e20dcda5f035650122343c61053a7c3ad6acacaa.png
  • graphical_models_99697882edd351397d90bdc69dc329b82aca2312.png are potential functions

Markov Random Fields

A Markov random field, also known as a Markov network or an undirected graphical model, has a set of nodes each of which corresponds to a variable or group of variables, as well as a set of links each of which connects a pair of nodes.

The links are undirected, i.e. do not carry arrows.

Belief Network

A belief network is a distribution of the form

graphical_models_5e48b14096a37f6cd3c05d751170207f41ef0969.png

where graphical_models_f9f07a49fd74596d9a2d01866fd1d2cbecf0a505.png denotes the parental variables of variable graphical_models_e9f4d218474bc10aa94958ec30139aee865c0173.png

This can be written as a DAG, with the i-th node in the graph corresponding to the factor graphical_models_af3ec5df73b411165e19b04a7901a090d4000096.png.

Factorization properties

  • Without loss of generality, only consider maximal cliques, since all other cliques are subsets

The joint distribution is written as a product of potential functions graphical_models_99697882edd351397d90bdc69dc329b82aca2312.png over the maximal cliques of the graph

graphical_models_614cc1201eb832f43d57aa13bd57eae037cb5c10.png

where the partition function is given by

graphical_models_30f34d04ab98d7cd1c930a12855e84f8374b7d32.png

which ensures that the distribution graphical_models_1ed25cf746e36b58c8d980488e7f49fe0ccb1b39.png is correctly normalized.

To make a formal connection between the conditional independence and factorization for undirected graphs, we restrict attention to potential functions graphical_models_99697882edd351397d90bdc69dc329b82aca2312.png that are strictly positive, thus we can express them as:

graphical_models_c27619f6c1d6adc604b84520ecc2a215e904a130.png

where graphical_models_8853e473e7d8a944e8d8d7e080a19c817957ffa9.png is called the energy function , and the exponential representation is called the Boltzmann distribution or Gibbs distribution.

A graph is said to be a D map (for dependency map) of a distribution if every conditional indepedence statement satisfied by the distribution is reflected in the graph.

Thus a completely disconnected graph (no links) wil l be a trivial D map for any distribution.

A graph is said to be a I map (for independence map) if every conditional independence statement implied by the graph is satisfied by a specific distribution.

Clearly a fully connected graph will be a trivial I map for any distribution.

If is the case that every conditional independence property of the distribution is reflected in the graph, and vice versa, then graph is said to be a perfect map.

Inference in Graphical Models

Notation

  • graphical_models_e9f4d218474bc10aa94958ec30139aee865c0173.png denotes a variable (can be scalar, vector, matrices, etc.)
  • graphical_models_b261d99faf9dfd9111794d0f799206942eeba277.png denotes a subset of variables graphical_models_e9f4d218474bc10aa94958ec30139aee865c0173.png
  • graphical_models_003e80375fa7638a9064f8b558ee0f0d19d5c829.png denotes the set of factor nodes that are neighbours of graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png
  • graphical_models_1d78d6215e0f6431cc81b63de3eddd90d15863c7.png denotes the set of variables in the subtree connected to the variable node graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png via a factor node graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png
  • graphical_models_c0902a7fe009edf65e5c238a4a01f013bdba12cc.png denotes the set of variables in the subtree connected to the variable node graphical_models_a251bc25c59730a4f11ac86802be0523baafb2f0.png via a factor node graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png
  • fgraphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png depends on graphical_models_89a3397790b75c60c6de297434a2590dab3ee77e.png, which we denote graphical_models_b261d99faf9dfd9111794d0f799206942eeba277.png
  • graphical_models_1c88c057f84485f7e7fc2b322f4c7d686176c079.png represents the product of all the factors in the group associated with factor graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png
  • Message from factor node graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png to variable node graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png:

    graphical_models_eff817fdb48874c120c64905b08af0a4559e9430.png

  • Message from variable node graphical_models_a251bc25c59730a4f11ac86802be0523baafb2f0.png to factor node graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png:

    graphical_models_5a7b30f40b34166ad22cd3efc06e0e465f2f92a9.png

    where graphical_models_c0902a7fe009edf65e5c238a4a01f013bdba12cc.png denotes the set of factors in the subtree connected to the factor node graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png via the variable node graphical_models_a251bc25c59730a4f11ac86802be0523baafb2f0.png

Factor graphs

  • Can write joint distribution over a set of variables in the form of a product of factors

    graphical_models_fd322b992fdc86bacf71e2d9b9a41ebbbead93c8.png

    where graphical_models_b261d99faf9dfd9111794d0f799206942eeba277.png denotes a subset of the variables.

  • Directed graphs represent special cases in which factors graphical_models_28af69c88c33c0097756ca8602735ec5914b6108.png are local conditional distributions
  • Undirect graphs are a special case in which factors are potential functions over the maximal cliques (the normalizing coefficient graphical_models_35a140ad0c9652490119f9c6d18ea3095da2bd11.png can be viewed as a factor defined over the empty set of variables)

Converting distribution expressed in terms of undirected graph to factor graph

  1. Create variable nodes corresponding to the nodes in the original undirected graph
  2. Create additional factor nodes corresponding to the maximal cliques graphical_models_b261d99faf9dfd9111794d0f799206942eeba277.png
  3. Factors graphical_models_28af69c88c33c0097756ca8602735ec5914b6108.png are set equal to the clique potentials graphical_models_ce5d583a61678e9bddc18a63ef5b8acb183ec79d.png

There might be several factor graphs corresponding to the same undirected graph, i.e. this construction is not unique.

Sum-product algorithm

  • Problem: evaluate local marginals over nodes or subsets of nodes
  • Applicable to tree-structured graphs
  • Exploit structure of the graph to achieve:
    1. efficient, exact inference algorithm for finding marginals
    2. in situations where several marginals are required to allow computations to be shared efficiently
  • Summarized: barber2012

sum_product_part1.png sum_product_part2.png

Belief propagation is a special case of sum-product algorithm, which performs exact inference on directed graphs without loops. (You have already written about this)

For actual implementation it's useful to use the logarithmic version of the messages. See barber2012.

Derivation
  • Marginal is obtained by summing the joint distribution over all variables except graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png so that

    graphical_models_5a71cd6462c2cf0a24bbce16136f3d32943cebd3.png

    where graphical_models_b04e29a498edca2060e5cecf54f8d9d1b8af7631.png denotes the set of variables in graphical_models_ed39d9a397196f8f0ce6388b0ea4e0c1dd8becee.png with the variable graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png omitted.

  • Joint distribution of the set of variables graphical_models_ed39d9a397196f8f0ce6388b0ea4e0c1dd8becee.png can be written as a product of the form

    graphical_models_fbb8c3d2b47c64a0691087cb82a9ae99e0efa707.png

  • Rewriting the marginal above using the joint distribution notation:

    graphical_models_37de6e7db638713bfcd7686afa87496d64151597.png

    i.e. marginal is given by the product of all incoming messages arriving at node graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png.

  • Each factor graphical_models_1c88c057f84485f7e7fc2b322f4c7d686176c079.png can be descrbied by a factor (sub-)graph and so can itself be factorized:

    graphical_models_39a61fad7a49b9fbc821531beebbabf33cf77167.png

    where, for convenience, we have denoted the variable associated with factor graphical_models_daa8271e2ed6b587f3467b7789f0a0cd95d44cb6.png, in addition to graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png, by graphical_models_07b14bf3140c92cf62c1be71c940c43cee459070.png.

  • Message graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png to graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png can be rewritten:

    graphical_models_7541936d2542cde88c14f117a4f0b7a2bae38a43.png

    i.e. marginalizing over all the variables which graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png depend on, and the "messages" which are being sent from the variables to the factor graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png. Thus, to evaluate message sent by factor to variable along the link connceting them:

    1. Take product of incoming messages along all other links coming into the factor node
    2. Multiply by the factor associated with that node
    3. Marginalize over all of the variables associated with the incoming messages
  • Message graphical_models_a251bc25c59730a4f11ac86802be0523baafb2f0.png to graphical_models_762cf0fd00793655d58ce4bac08fb4d97e17f7bb.png can be written:

    graphical_models_7372999f2f82d577bc0fd07ab49aceb795f29ad4.png

    i.e. to evaluate messages passed from variables to adjacent factors we simply take the product of incoming messages the incoming messages along all other links.

  • Messages can be computed recursively
    • Start: if leaf-node is:
      • variable: message it sends along its one and only link is given by

        graphical_models_f2d4b06439fcca48ba33c0cd0908397da6e49307.png

      • factor: message it sends is simply

        graphical_models_b1929f57d70204bd9945f4e1b03044ed4a74e4a5.png

  • Finally, if we wish to find the marginal distributions graphical_models_d54ac1fd1dc1fe01979a4cd514c2913a2660ab91.png associated with the sets of variables belonging to each factors:

    graphical_models_d8eeac96594f1c5c875f0fb7ae8b9395c2286602.png

Summarized

The sum-product algorithm can then be summarized as:

  1. Variable node graphical_models_3c314f80373742988ad542a6d4ce66a111b17847.png is chosen as root of the factor graph
  2. Messages are initiaed at the leaves of the graph, if leaf is:
    • variable: message it sends along its one and only link is given by

      graphical_models_f2d4b06439fcca48ba33c0cd0908397da6e49307.png

    • factor: message it sends is simply

      graphical_models_b1929f57d70204bd9945f4e1b03044ed4a74e4a5.png

  3. Message passing steps are applied recursively until messages have been propagated along every link:

    graphical_models_1064c67fbbe20e3325d5846a67a7df82103fb589.png

    and

    graphical_models_7372999f2f82d577bc0fd07ab49aceb795f29ad4.png

  4. Once root node has received messages from all of its of its neighbors, the marginal can be evaluated by:

    graphical_models_97457eaf3c4a4d82889c22305b5248588cec5971.png

    or, if one wants the marginal distribution graphical_models_d54ac1fd1dc1fe01979a4cd514c2913a2660ab91.png associated with the sets of variables belong to each of the factors:

    graphical_models_ea85aeaf323cc4998dc852bf053e1affbe1656b5.png

  5. If we started from a:
    • directed graph: already correctly normalized
    • undirected graph: started with unknown normalization constant graphical_models_35a140ad0c9652490119f9c6d18ea3095da2bd11.png. From steps 1-4 we now have unnormalized marginal graphical_models_df826dcd5f3c217da58f0f0182cb55531e4cebb3.png, which we can marginalize over a single one of these graphical_models_e9f4d218474bc10aa94958ec30139aee865c0173.png (holding all other variables constant), which then gives us the normalization constant:

      graphical_models_65911b8c03bedc596b0e4b87e6d2f95cb13b09f5.png

Could find marginals for every variable in the node in the graph by first passing messages from leaf to root, as in sum-product algorithm, and then when receiving all messages to the root, pass messages from the root back to the leaves.

Then we would have computed the messages along each of the links / edges in both directions, hence we could simply marginalize over these messages to obtain the wanted marginal.

Max-sum algorithm

  • Problem: find most probable state or sequence:

    graphical_models_b6b2b7688d4c264e1e28dc4763bfdaf2c4452945.png

    which will have the joint probability

    graphical_models_bee4a84c56d0fc80167c1e5952743eb98e0897e1.png

  • Summarized: barber2012

max_product.png

Figure 1: barber2012

Derivation
  • Will make use of

    graphical_models_5cbd22d427f66fe3cb50460cddb9c443c7641688.png

    which holds if graphical_models_17d13b37eb97f5258609b9c4f50c8e4a147f4da2.png (always true for factors in graphical models!)

  • Work with graphical_models_2bb56c8ea50dceff8f0c2c90cbbec3bbb59537d8.png instead (for numerical purposes), i.e.

    graphical_models_97d5952f6a007860bc465dd5a267029c1bed82c2.png

    where the distributive property is preserved since

    graphical_models_dcb422c2740ed8b5d3acaf1dee0a71a3ac52724a.png

    hence we replace products with sums.

Causality

In setting any variable into a particular state we need to surgically remove all parental links of that variable.

This is called the do operator, and constrants observational ("see") inference graphical_models_7b21dda0564051f333f63fb274f66ecd8ca6d490.png with a casual ("make" or "do") inference graphical_models_394d742d6b074656ef05ff1c750eadb0a2ca1bab.png.

Formally, let all variables graphical_models_fb418593f861516616e0cfba2a36c194db631a56.png be written in terms of the interventon variables graphical_models_10ddd4f51c00c7d25eb40ad30a8fd9630274e144.png and the non-intervention variables graphical_models_53d76e66ece4bf63a774e103ed1fb4b04652ae4c.png.

For a Belief Network graphical_models_fec0942488d25ce7a04868049b94a20037d8127b.png, inferring the effect of setting variables graphical_models_c9988e92acaea677b0226abfd47344665fa9f92d.png, in states graphical_models_82bebf7fec0fe63abe739427b5852d005ccd8604.png, is equivalent to standard evidential inference in the post intervention distribution:

graphical_models_323bd50a37bc162eb690e91d50c96faeed9c298f.png

where any parental variable inclded in the intervention set is set to its intervention state. An alternative notation is

graphical_models_24cb17a6d5f2134b1b596c8414327d75ea78a18c.png

Intervention here refers to the "act of intervening", as in making some variable take on a particular state, i.e. we're interferring with the "total" state.

Bibliography

  • [barber2012] Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (2012).