Graph Theory

Table of Contents

Notation

  • graph_theory_cd688c6eee1ca6434b864971de956368f81e50f7.png denotes set of vertices in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png (graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png is used when there is no scope for ambiguity)
  • graph_theory_c85b3b9bf3f64f7e7c181ebeffb64cad96aff734.png denotes set of edges in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png (graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is used when there is no scope for ambiguity)
  • graph_theory_f07b932b12c91bca424fd428d383495413b5b6a9.png and graph_theory_fe52c58a96549528260fe0d7d04caf390dd47865.png denotes # of vertices and edges, respectively, unless stated otherwise
  • graph_theory_c7fba9fc5da1ed024e7e601960cce476921a74cb.png denotes the incidence function which maps an edge to the set of vertices connected by this edge
  • graph_theory_2bd7a39487a4c620e744bb72fabc0c9790f85909.png denotes the set of neigbours of a vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png
  • A set graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png, together with a set graph_theory_fbe98aaf8359e7eed3ba031caeaf6a3c13ae8690.png of two-element subsets of graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png, defines a simple graph graph_theory_d53a16f7b121fd8b4c5f047530f753bdcd4b95b9.png, where the ends of an edge graph_theory_71362c7cfff0cecc62fb17e777c41c99adab7d67.png are precisely the vertices graph_theory_250c724db45bd043e48d12edb0b92c91430f27a2.png
  • incident matrix is graph_theory_92a34268abe3452db5b3c17f734bc5b045d8ed93.png matrix graph_theory_5ee2b542e5afa24aa5667ff38f610714fd66b63a.png, where graph_theory_50a95975bf26677c9d619294fd1da2e9030386e1.png is # of times (0, 1, or 2) that vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png and egde graph_theory_eaac7646cb022aa963b396c0e936a271d8f5e34f.png are incident
  • adjacency matrix is graph_theory_812a7be1c6b4e6d0c54540febd3d26b36f71738b.png matrix graph_theory_dad6c004a3c06bfba95c6ec7b86bf8777e5d3f80.png, where graph_theory_54a99099afaf1358279325dad050ced404c62a90.png is # of edges joining graph_theory_de4b3256c9441a9fb95ba92da7c59c7d1f70d1c7.png and graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png, with each loop counting as two edges
  • biparite adjaceny matrix of some biparite graph graph_theory_75de11970197d8a5c4ca993f5a411070c382463e.png is the graph_theory_51239909b4d330c428ed2eb8eca7dbfb8cf9aaf4.png matrix graph_theory_a55cbf5a59a92b906c7b87cffe7ccdaeb514b3ac.png, where graph_theory_99b358db6cadb8ab3ac805ee4a7cda51d5fe84f4.png is # of edges joining graph_theory_84548688b41e8f40f42c4548eab80934e69ebc1b.png and graph_theory_074586a88f47170b63fa8c3fd039d6de046ae6d8.png
  • graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png denotes a sub-graph of some graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, i.e. graph_theory_499f8574e7fe0fe78bac864e9afafe4a037254be.png (unless otherwise specified)

Definitions

Terminology

We say a graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is densily connected at vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png if graph_theory_09fd7d33ab83023ffe772d9b176718e41225cb99.png is "large".

If we have a cluster of such vertices, we might refer to this cluster as being assortative ; in contrast we refer to clusters of vertices which connect similarily to the rest of the graph (vertices "outside" of the cluster) without necessarily having higher inner density, as disassortative communities / clusters.

General

A tree is a undirected graph in which any two vertices are connected by exactly one path.

E.g. a acyclic connected graph is a tree.

Sometimes a tree also has the additional constraint that every node has at most a single parent, and polytrees refer to what we just defined.

A spanning tree graph_theory_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png of an undirected graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is a subgraph that is a tree which includes all of the vertices of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, with miniumum possible number of edges.

In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree.

If all of the edges of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png are also edges of a spanning tree graph_theory_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, then graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is a tree and is identical to graph_theory_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png (i.e. a tree has a unique spanning tree and it's itself).

A cut is a partition of a the vertices of a graph into two disjoint subsets.

Any cut determines a cut-set, that is the set of edges which have one endpoint in each subset of the partition.

A simple graph is a graph where both multiple edges and self-loops are dissallowed.

Thus, in a simple graph with graph_theory_f07b932b12c91bca424fd428d383495413b5b6a9.png vertices, each vertex is at most of degree graph_theory_c0bc17f451d8a243f5e125bf2d4efbf8877ed765.png.

Let graph_theory_a5177eee309f705d90af4a252c1cb0e6a1e3a05d.png and graph_theory_5bc7790d720315b839e08b76ac44d78dcffe855b.png be two graphs.

Graph graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png is a sub-graph of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, written graph_theory_499f8574e7fe0fe78bac864e9afafe4a037254be.png, if

graph_theory_acdaf81a06be0f954ae794aec469e425455f4ed9.png

If graph_theory_499f8574e7fe0fe78bac864e9afafe4a037254be.png contains all of the edges graph_theory_3fa5d1f7af7e6a16e052a97b9c0ac6a6c38b7496.png with graph_theory_64eb5f2e864581766e40383e0307a0ec39957b37.png, then graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png is an induced sub-graph of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png.

When graph_theory_b12b454c9b7f07eafb7d56f1677b093d26a38a8e.png and there exists an isomorphism between the sub-graph graph_theory_3d9c1248fcc6b307af0f63960c5fd592fa860e64.png and a graph graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png, this mapping represents an appearance of graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png in graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png.

A graph is called recurrent or frequent in graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, when its frequency graph_theory_696e95e98c496dfbae263c325aba6107d245834d.png is aboe a predefined threshold or cut-off value.

Intersection graph

An intersection graph is a graph that represents the pattern of intersections of a family of sets.

Formally, an intersection graph is an undirected graph formed from a family of sets

graph_theory_dec1a5a1fa57e270d644beb2dcb0eccc9cb43e80.png

by creating one vertex graph_theory_5d4395016d30850d955a0eb3b9c99ae498007d3d.png for each set graph_theory_7166a9c57a06a3bf27ae0bab9601df91d707d194.png, and connecting two vertices graph_theory_5d4395016d30850d955a0eb3b9c99ae498007d3d.png and graph_theory_1bf6b6b25c0eeaeb6ada96b941136a791fc2ef4f.png by an edge whenever the corresponding two sets have a nonempty intersection, i.e.

graph_theory_bbcd1b15062b60a36078abd4ddf4975f46e679cf.png

Chordal graph

A chordal graph is one in which all cycles of 4 or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.

Sometimes these are referred to as triangulated graphs.

Network motifs

Network motifs are defined as recurrent and statistically significant sub-graphs or patterns in a graph.

There is an ensemble graph_theory_5d76da979cb12ab27fa71a7f9732248b7a3f84a5.png of random graphs corresponding to the null-model associated to graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png. We should choose graph_theory_e10e2b430f95617381cdd6d6b52aed29fb971dff.png random graphs uniformly from graph_theory_5d76da979cb12ab27fa71a7f9732248b7a3f84a5.png and calculate the frequency for a particular frequent sub-graph graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png in graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png.

If the frequency of graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png in graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is higher than its arithmetic mean frequency in graph_theory_e10e2b430f95617381cdd6d6b52aed29fb971dff.png random graphs graph_theory_d9e2fed90d2d92222b37a4b9f57c4bedecd8a86f.png, we acll this recurrent pattern significant and hence treat graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png as a network motif for graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png.

For a small graph graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png, the network graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png and a set of randomized networks graph_theory_4db380bbc2b588482717fcc34fd16095956ed802.png wher graph_theory_727061f88e9b72c468ad5062377439313222ec52.png, the z-score of graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png is defined as:

graph_theory_945dd0addff065b17fce6495a4b8409aab83e4fd.png

where graph_theory_add7a5b453f8775f39303aad0f4866b761095831.png and graph_theory_fc8ee8ce017f39deecff4e28af32b312604e85fa.png stand for mean and std. dev. frequency in set graph_theory_5379e2a89d5f2dae09fa94f02400cbce94575e68.png respectively.

The larger the graph_theory_93fe2549a65bf3859f7ca6daa56cfe27e54c8756.png, the more significant is the sub-graph graph_theory_8742c133599a272857583ac2b89b9502a78aa8c4.png as a motif.

Clique

A clique is defined as a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset.

In other words, a clique is fully connected.

A maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique.

Theorems

Book: Graph Theory

1. Graphs

Notation

  • graph_theory_cd688c6eee1ca6434b864971de956368f81e50f7.png denotes set of vertices in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png (graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png is used when there is no scope for ambiguity)
  • graph_theory_c85b3b9bf3f64f7e7c181ebeffb64cad96aff734.png denotes set of edges in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png (graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is used when there is no scope for ambiguity)
  • graph_theory_f07b932b12c91bca424fd428d383495413b5b6a9.png and graph_theory_fe52c58a96549528260fe0d7d04caf390dd47865.png denotes # of vertices and edges, respectively, unless stated otherwise
  • graph_theory_c7fba9fc5da1ed024e7e601960cce476921a74cb.png denotes the incidence function which maps an edge to the set of vertices connected by this edge
  • graph_theory_2bd7a39487a4c620e744bb72fabc0c9790f85909.png denotes the set of neigbours of a vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png
  • A set graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png, together with a set graph_theory_fbe98aaf8359e7eed3ba031caeaf6a3c13ae8690.png of two-element subsets of graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png, defines a simple graph graph_theory_d53a16f7b121fd8b4c5f047530f753bdcd4b95b9.png, where the ends of an edge graph_theory_71362c7cfff0cecc62fb17e777c41c99adab7d67.png are precisely the vertices graph_theory_250c724db45bd043e48d12edb0b92c91430f27a2.png
  • incident matrix is graph_theory_92a34268abe3452db5b3c17f734bc5b045d8ed93.png matrix graph_theory_5ee2b542e5afa24aa5667ff38f610714fd66b63a.png, where graph_theory_50a95975bf26677c9d619294fd1da2e9030386e1.png is # of times (0, 1, or 2) that vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png and egde graph_theory_eaac7646cb022aa963b396c0e936a271d8f5e34f.png are incident
  • adjacency matrix is graph_theory_812a7be1c6b4e6d0c54540febd3d26b36f71738b.png matrix graph_theory_dad6c004a3c06bfba95c6ec7b86bf8777e5d3f80.png, where graph_theory_54a99099afaf1358279325dad050ced404c62a90.png is # of edges joining graph_theory_de4b3256c9441a9fb95ba92da7c59c7d1f70d1c7.png and graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png, with each loop counting as two edges
  • biparite adjaceny matrix of some biparite graph graph_theory_75de11970197d8a5c4ca993f5a411070c382463e.png is the graph_theory_51239909b4d330c428ed2eb8eca7dbfb8cf9aaf4.png matrix graph_theory_a55cbf5a59a92b906c7b87cffe7ccdaeb514b3ac.png, where graph_theory_99b358db6cadb8ab3ac805ee4a7cda51d5fe84f4.png is # of edges joining graph_theory_84548688b41e8f40f42c4548eab80934e69ebc1b.png and graph_theory_074586a88f47170b63fa8c3fd039d6de046ae6d8.png
  • graph_theory_40d7edb4e628ae5ee727132056085ba8d76f92e2.png denotes the degree of graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png, i.e. # of edges incident with graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png, each loop counting as two edges
  • graph_theory_996dd76daf1aa11704e2fec1e626afa7d92ffb94.png and graph_theory_91a7b3be008cf33a3d12e0a41aed42f44214805e.png are the minimum and maximum degrees of the vertices of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png
  • graph_theory_814309bd7f23083c6e2fb47cf63bdd2ae6a486a4.png is the average degree of the vertices of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png

1.1 Graphs and Their Representations

A graph is an ordered pair graph_theory_642d4dc8ac7c6dcca14ddf76e308bb524035609c.png consisting of a set graph_theory_cd688c6eee1ca6434b864971de956368f81e50f7.png of vertices and a set graph_theory_c85b3b9bf3f64f7e7c181ebeffb64cad96aff734.png, disjoint from graph_theory_cd688c6eee1ca6434b864971de956368f81e50f7.png, of edges, together with an incidence function graph_theory_340f706b7fcec770158aaa064371fa925c6d29f3.png that associates with each edge of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png an unordered pair of vertices of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png.

If graph_theory_eaac7646cb022aa963b396c0e936a271d8f5e34f.png is an edge and graph_theory_de4b3256c9441a9fb95ba92da7c59c7d1f70d1c7.png and graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png are vertices such that graph_theory_1a94f66b2f735ac553a8db0c89043bac00f3a422.png.

A biparite graph is a simple graph where the vertices graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png can be partitioned into two subsets graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png and graph_theory_3ca302b80fb078e124ac6b194794815069394f1a.png such that every edge has one end in graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png and one end in graph_theory_3ca302b80fb078e124ac6b194794815069394f1a.png; such a partition graph_theory_3a2e2752cac74a1a51b468d725a7cfaf3ff49621.png is called a bipartition of the graph, and graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png and graph_theory_3ca302b80fb078e124ac6b194794815069394f1a.png its parts.

A path is a simple graph whose vertices can be arranged in a linear sequence s.t. two vertices are adjacent if they are consecutive in the sequence, and are non-adjacent otherwise.

A cycle on three or more vertices in a simple graph whose vertices can be arranged in a cyclic sequence s.t. two vertices are adjacent if they are consecutive in the sequence, and are non-adjacent otherwise.

A graph is connected if and only if for every partition of its vertex set into two nonempty sets graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png and graph_theory_3ca302b80fb078e124ac6b194794815069394f1a.png, there is an edge with one end in graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png and one end in graph_theory_3ca302b80fb078e124ac6b194794815069394f1a.png; otherwise the graph is disconnected.

A graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is k-regular if

graph_theory_ada20d2f24684a20f4d277fce229daf19e7ec166.png

A regular graph is one that is k-regular for some graph_theory_094b02afce734f4ce51933d0093ef3d2da9f8123.png.

Counting in two ways is a proof-technique where we consider a suitable matrix and compute the sum of its entries in two different ways:

  1. sum of its row sums
  2. sum of its column sums

Equating these two quantities results in an identity.

Exercises
  • 1.1.5
    • Question

      For graph_theory_279dff3210a4ce50d1f61a577a84a324f4a569ae.png, characterize the k-regular graphs.

    • Answer

      Let graph_theory_3998a097cc93a9c56a25086ff51c2b65340d5225.png, i.e.

      graph_theory_d1be26e4acaa27414cf3ee2411b79475ce0ede67.png

      which is simply a graph without any connections between the vertices.

      For graph_theory_76ad4212542b94774b27bea1e8dceb373f0f448c.png, we have

      graph_theory_38c418b104ca4c81764bd0fd6dfa3adefd125ce5.png

      For any graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png we then the following possibilities:

      • connected to itself, thus not connected to any other vertices, which in turn implies that we can consider two graphs; graph_theory_4a2ea331936e2aa856380cb049c383a83560c09f.png and graph_theory_f008d8ef1d8cfba066aa887b74a939887607dc9f.png
      • connected to another single vertex graph_theory_c3b570332bc2bf31fdcbbe8ddda1cc17432a8b98.png: a simple sequence of vertices, i.e. a path, since each vertex is connected only to one other

      In total, a 1-regular graph can be separeted into graphs consisting of a single vertex with no edges (for all the vertices connected by loops) and a single graph which is a path.

      For graph_theory_d1ce0769b32ee0f544eb7d7cfd83ff873317145f.png, we have

      graph_theory_5368c641c1d8b9508120dd9b8bcdc5e2a0b560d5.png

      For any graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png we then have the following possibilities:

      • both edges are loops: we can consider two graphs; graph_theory_4a2ea331936e2aa856380cb049c383a83560c09f.png and graph_theory_f008d8ef1d8cfba066aa887b74a939887607dc9f.png
      • both edges are not loops: here we can have two cases
        • graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png is part of a cycle graph_theory_32e55c6e155c100dc50ef35a36a56df33f4c0687.png: can consider two separate graphs graph_theory_d53dcbf9eb8abf6d1bda38c53fa5c4bde4780684.png and graph_theory_6616644ddc22ce0dc00e0afdfeaafea50066e8d7.png, since every graph_theory_6aa7eb11b3d2739b1c82279acb6b5621bf5078e9.png is connected to two other vertices in graph_theory_dfb0326eb7fa86ea469405d7bf5fac6d160e2a15.png for graph_theory_d53dcbf9eb8abf6d1bda38c53fa5c4bde4780684.png to form a cycle. graph_theory_d53dcbf9eb8abf6d1bda38c53fa5c4bde4780684.png is then simply a polygon
        • graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png is not a part of a cycle: we have a simple graph
      • one edge is loop, one edge is non-loop: graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png can then only be connected to a single graph_theory_0cc3cd03378d1c11d1f3d1517eef8436aa58e99e.png which also has one loop and one non-loop, thus they form:

        ex1_5_5.png

        cycle.png

      All in all, we can separate a 2-regular graph into graphs of the following classification:

      • empty graphs
      • k-cycles
      • graphs of the form 1

Terminology

loop
edge with identical ends
link
edge with distinct ends
parallel edges
two or more links with the same pair of ends
null graph
graph with zero vertices
trivial graph
graph with one vertex
complete graph
simple graph in which any two vertices are adjacent
empty graph
simple graph in which no two vertices are adjacent, i.e. graph_theory_83cf33cafaed526a59ce6f4ac19851db4ae94c8c.png
k-path
path of length graph_theory_094b02afce734f4ce51933d0093ef3d2da9f8123.png
k-cycle
cycle of length graph_theory_094b02afce734f4ce51933d0093ef3d2da9f8123.png
planar graph
graph which can be drawn in a plane (2D) s.t. edges meet only at points corresponding to their common ends, i.e. no intersections when drawn in a 2D plane, where the drawing itself is called a planar embedding
embedding
"drawing" of a graph on some n-dimensional structure so that its edges intersect only at their ends

Book: Bayesian Reasoning and Machine Learning

Notation

  • graph_theory_da88015a6ebb980cd5f5027bf3834222f55a8107.png denotes the parent nodes of the rv. graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png in the BN

7. Making Decisions

Notation

  • graph_theory_92e49821f75d8a8bdcd063e02e781c73fb366bcd.png denotes the expected utility of decision graph_theory_b689cba8d7566f6adaf605a844e193a27e155078.png over the set of rvs. graph_theory_76879b948635123ecd29d7cd65a1060145ba040e.png

7.3 Extending Bayesian Networks for Decisions

An Influence Diagram (ID) states which information is required in order to make each decision, and the order in which these decisions are to be made.

7.4 Solving Influence Diagrams

A decision potential on a clique graph_theory_e20dcda5f035650122343c61053a7c3ad6acacaa.png contains two potentials: a probability potential graph_theory_31c02b128273ae1118e243a30100316f0356bd18.png and a utility potential graph_theory_9648b8261a1e2792c09d4c880eef1e1b376f473c.png. The join potentials for the junction tree are defined as

graph_theory_ff6f9d2a5df6d4762ed9a6c13b5598c81007ba4b.png

with the junction tree representing the term graph_theory_45de47c57a020c08e43b72b54b23fdad2bd163d0.png.

But for IDs there are constraints on the triangulation since we need to preserve the partial ordering induced by the decisions, which gives rise to strong Junction Tree.

  1. Remove information edges
  2. Moralization: marry all parents of the remaining nodes
  3. Remove utility nodes: remove the utility nodes and their parental links
  4. Strong triangulation: form a triangulation based on an elimination order which obeys the partial ordering of the variables.
  5. Strong Junction Tree: From the strongly triangulated graph, form a junction tree and orient the edges towards the strong root (the clique that appears last in the eliminiation sequence).

Spectral Clustering

Notation

  • graph_theory_5d4395016d30850d955a0eb3b9c99ae498007d3d.png denotes the i-th vertex
  • graph_theory_813c3887ca6c4f15c269d4f538a036f2b167ac1a.png when graph_theory_5d4395016d30850d955a0eb3b9c99ae498007d3d.png is adjacent to graph_theory_1bf6b6b25c0eeaeb6ada96b941136a791fc2ef4f.png
  • graph_theory_5589994da098fe45f4407bc86f0d24b6027d155e.png denotes the edge between graph_theory_5d4395016d30850d955a0eb3b9c99ae498007d3d.png and graph_theory_1bf6b6b25c0eeaeb6ada96b941136a791fc2ef4f.png
  • Edges are non-negatively weighted with the weight matrix graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png
  • graph_theory_b689cba8d7566f6adaf605a844e193a27e155078.png is the diagonal matrix of graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png, also called the degree matrix, defined as:

    graph_theory_779b07fb51207783f94852bbed381ec119cff2c6.png

  • graph_theory_25f4e4c75097921d0a5829443961bcfbd8fe2cfd.png denotes the unnormalized graph Laplacian
  • graph_theory_266088085bdc5922fbb35bdc23939ff16099523e.png denotes the normalized graph Laplacian

Stuff

  • Want to partition a graph in parts which are roughly the same size and are connected by as few edges as possible, or informally, "as disjoint as possible"
    • NP-hard (even approximations are difficult)
  • Good heuristic: write cost function as a quadratic form and relaxing the discrete optimization problem of finding the characteristic function of the cut to a real-valued optimization problem
    • Leads to an eigenproblem

The volume of a subset of vertices graph_theory_707218c0ea8696b5447a93938aa8f21d1e2cce76.png is the total degree of all vertices in the subset:

graph_theory_272c949a01d6832c900e1c62d885a8411894dfbd.png

If graph_theory_707218c0ea8696b5447a93938aa8f21d1e2cce76.png is a set of vertices of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, we define it's edge boundary graph_theory_8bbb82e1012761b2b940c8ec2a702d2c350e3429.png to be the set of edges connecting graph_theory_a7a9870d411fa7be56753e7bbd48731b3acb06b9.png and its complement graph_theory_d6fe0ad0beff24bf824da48126e3c358dbc2edbb.png.

Similiarily, the volume of graph_theory_8bbb82e1012761b2b940c8ec2a702d2c350e3429.png is defined as

graph_theory_bc50b2cd357b7a44fa9de37474291d3f507e3d32.png

This quantity is also known as the given by graph_theory_a7a9870d411fa7be56753e7bbd48731b3acb06b9.png and graph_theory_07e493b8df25595b9aed62aadeafdbaf82f4666a.png.

The minimum cut of a graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png refers to the partition graph_theory_489b01640c13ccfb7506d3e024082e076dde56d0.png which minimizes graph_theory_92ed0058f1d4c96e545b5cb1081cd7fdc982933c.png, i.e. the smallest total weight "crossing" between the partitions graph_theory_a7a9870d411fa7be56753e7bbd48731b3acb06b9.png and graph_theory_07e493b8df25595b9aed62aadeafdbaf82f4666a.png.

Problem with mincut is that no penalty is paid for unbalanced partitions and therefore nothing prevents the algorithm from making single vertices "clusters".

The Cheeger constant (minimum conductance) of a cut is defined

graph_theory_a9d4fafe0709b07865da46403ae918df08b47a25.png

The normalized cut is defined as

graph_theory_493a33bea642cee2e9c85abae9e9ff3074acfa85.png

For graph_theory_507af1d4e5558962854469d66b255553e33618de.png,

graph_theory_e331342798e6d4a61dc07c4539d64eea3ee4511c.png

from which we can see that Cheeger constant and normalized cut are closely related.

Both are NP-hard, thus need to be approximated.

The weighted and unweighted , respectively, balanced cuts of a graph are given:

graph_theory_d54ddac7458fe5e5e951e16e7d7fe39bef4eb107.png

We'll assume balanced partitions exists (e.g. for unweighted version the number of vertices has to be even). For the general case one can easily modify the definition requiring the partition to be "almost balanced", and this will lead to the same relaxed optimization problems in the end.

The unnormalized graph Laplacian is defined to be

graph_theory_d300a14df49932035ef5eef22e8be9867d4f1cc2.png

and the normalized graph Laplacian

graph_theory_657f7a79befc5394d9887df10fc829e0b905f68f.png

From graph_theory_9dfb630910b0b3d2a61cb5f1673b9adaa08fe150.png and graph_theory_b4eb105ddb70f9f674810cbe153b7690d3ec93e5.png various graph invariants can be estimated in terms of eigenvalues of these matrices.

Given a vector graph_theory_07ca1a0803343720dccd77b1d23e3d6fd327faf5.png, we have the following identity:

graph_theory_1771778ad13b853c3ac5cd8f335a922b387a72ae.png

thus graph_theory_9dfb630910b0b3d2a61cb5f1673b9adaa08fe150.png is positive semi-definite, and since graph_theory_24c9b1652180e32a7110ec48c93f9c2de54bf043.png is positive definite the same holds for graph_theory_b4eb105ddb70f9f674810cbe153b7690d3ec93e5.png.

Spectral clustering

Given a subset graph_theory_707218c0ea8696b5447a93938aa8f21d1e2cce76.png, define the column vector graph_theory_e41f8b1519143d62fe46909fcef53f19ac511219.png as follows:

graph_theory_af20a195eaf0a57980cac8cfed7b5c2695448335.png

Then it's immediate that

graph_theory_908e1bbefeab5a1dd210cdf4cd669768002372c4.png

since graph_theory_9b360dacd90e8f8f0853091c47477ab4d3e14f38.png and graph_theory_eeeffe7a031cd02eedffd9b55ebd13b5387d8e91.png. For all graph_theory_707218c0ea8696b5447a93938aa8f21d1e2cce76.png we have

graph_theory_55cdeed1649cfd609d72f936e394b757e23b5c8c.png

Consider the weighted balanced cut:

graph_theory_b0a124eb049df3668d0d16d31e967c851d47a203.png

Hence, graph_theory_1ad9f4b967476949a6e5482a5c45427abf918ae5.png if and only if the cut corresponding to graph_theory_a7a9870d411fa7be56753e7bbd48731b3acb06b9.png is volume-balanced, i.e.

graph_theory_c13f808fbf4c86547f98c6855c107e38ac60aa43.png

Therefore, we can reformulate the problem of computing weighted balanced cut as:

graph_theory_4710f420298be3fdb91e48c894e41b25acd4ff21.png

Moreover, as graph_theory_dd27aecf6a8bcfa867f53ed87954e4b1f00c9019.png has constant value graph_theory_f450151402c408e4c1d3072c370c1cd5467cbe6c.png for all graph_theory_a2e2251bdc3ca052a4813d7bed58c8bd996f798b.png, we can rewrite this as

graph_theory_bf7255437c50da55076dcfb05afcf35caa448422.png

I'll be honest and say I do not see where this graph_theory_a63aae439f113d8059b6b9520d51130c9e6d166d.png comes from.

In this form, the discrete optimization problem admits a simple relaxation by letting graph_theory_2d9d24485dd95b28e0b2d68a5aed229a99e26b48.png instead of graph_theory_1bc0e870e3bb96b227514efdf7c1035df5553746.png. Noticing that

graph_theory_d2f4a5813511ac79a354c9faf64923d851e9bed4.png

a standard linear algebra argument shows that

graph_theory_da654bbd83badbb6088c55f279ae6d8490fc0a5d.png

I tried to figure out why "since graph_theory_d1d29515853f26bc3369eab89ce7ea5804938d5a.png the minimizer is instead graph_theory_97edc990c61f092b1a52218bc612180fd885db92.png!" and what I can gather is that:

  • graph_theory_d1d29515853f26bc3369eab89ce7ea5804938d5a.png means that the matrix is non-invertible
  • Then only consider graph_theory_daeaece0368d28a7925c2bc7782354f3255ee614.png, i.e. subspace not spanned by eigenspace of graph_theory_4948d53a91e437ba6124a0426878e933b4a1b559.png
  • Left with this stuff, which makes sense; the Rayleigh quotient is minimized by the first eigenvalue

where graph_theory_97edc990c61f092b1a52218bc612180fd885db92.png is the second smallest eigenvalue of the generalized eigenvector problem graph_theory_bcd3fc5a7de1950ae9cf40604decd7e3f9858a6f.png. It is clear that the smallest eigenvalue graph_theory_4948d53a91e437ba6124a0426878e933b4a1b559.png of graph_theory_9dfb630910b0b3d2a61cb5f1673b9adaa08fe150.png is graph_theory_d83ba0b34682ac87f8d84c8310f6bbd28d1fe65b.png (since graph_theory_740f73efaa1d073fc6d7c2847a6d8331f06eaef8.png as noted earlier) and the corresponding eigenvector is graph_theory_3f8426b61732a55b02e9d83d7e9892a70d0906fd.png.

Further, we can show that for a connected graph, graph_theory_e8b7b18ab1f76a7f3267328e4cfa422e494fe5b8.png. Thus, the vector graph_theory_cdd1cc131da6040eca078917132a377727053c44.png for which the minimum above is attained is the eigenvector corresponding to graph_theory_97edc990c61f092b1a52218bc612180fd885db92.png.

This leads directly to the bipartitioning algortihm as relaxation of the weighted balanced cut:

  1. Compute the matrices graph_theory_9dfb630910b0b3d2a61cb5f1673b9adaa08fe150.png and graph_theory_b689cba8d7566f6adaf605a844e193a27e155078.png
  2. Find the eigenvector corresponding to the second smallest eigenvalue of the following generalized eigenvalue problem:

    graph_theory_ea2ecaa11f5de3ba1215704272176c18c97ca91e.png

  3. Obtain the partition:

    graph_theory_481e589adb173241db093aebd108572d39a7239f.png

where graph_theory_4dfbb4efa24b820c3d05ca44f97337b43d713ff2.png denotes the i-th entry of the corresponding eigenvector.

For unnormalized spectral clustering, a similar argument shows that

graph_theory_9327764002fce9d060f86055ab964a359eede7a2.png

and by relaxation we obtain an identical algorithm, except taht the eigenproblem

graph_theory_5dde309175b9bbe9264cecb990d2c592e2306f6b.png

is sovled instead.

Network Data Analysis

Notation

  • graph_theory_661c791039d54369a2013d9aecb898e6bdf88a3c.png denotes the linear space of all bounded symmetric measureable functions graph_theory_1290b779c9c1e31c0879a12beb0bb4da52ab6ba4.png
  • graph_theory_57023b917e8c6ccffa044979259c9f18879ad538.png is a kernel:

    graph_theory_5052171cf80de26918517c33a25630e4e7f967c7.png

    Considered as an operator graph_theory_90051b8f374e570df5fe4cfea105b0caefac4ce2.png

    graph_theory_a59d96b3fd56a31ecc18748919feebe579b1714e.png

  • graph_theory_6228af426f28d4195540bf6f886e392a0ce5d752.png is used to control sparsity
  • graph_theory_9096e5b0b9631266d63bd588fa85c1d0b2192cac.png is the
  • graph_theory_5f29a2bd050a2975a3c0c44919603b79de453a64.png is the expected number of occurences of subgraph graph_theory_e08421ac16591ee60adecc1e14a359550f81a1e2.png in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png
  • "random graph" = Erdos-Renyi

Stuff

  • Treating adjacency-matrix as a multi-dimensional Bernoulli rv. => allows possibility "construction" and "destruction" of connections

Real networks often follow:

  • Limiting behaviour

    graph_theory_a950c84fcd0074c80014aac56b5d286802c07fa9.png

  • Heterogenous: graph_theory_4786cad8c666708a4cdf737d82cb451928d7d116.png for some choices of graph_theory_53f6529f5b9b18f06d222ec877fef937cc50ecdd.png or graph_theory_9527176a32138b1f37501b929d15c4d5c5c3fb1e.png and graph_theory_374d3bbefa18f8f684b0c576cf8d3f3fa308e6da.png
  • No ordering

Hence, we consider the limiting behaviour

Graphons

A graphon is a symmetric measurable function graph_theory_c4f267e4d2fbe956e051304561cbfdb6912df9cc.png.

Usually a graphon is understood as defining an exchangeable random graph model according to the following scheme:

  1. Each vertex graph_theory_92483d0bb3bbad1c0b4630e7e529b0b5f3f8c48b.png of the graph is assigned an independent random value graph_theory_31e3eb14f2ade7d3ca7caf5d23d019586f181572.png
  2. Edge graph_theory_d4f50e72444617cdd48de82c238131603993949f.png is independently included in the graph with probability graph_theory_72ed0fbb411adb9fa818d2d62723197a05140c47.png

A random graph model is an exchangeable random graph model if and only if can be defined in terms of a (possibly random) graphon in this way.

It is an immediate consequence of this definition and the law of large numbers that, if graph_theory_eef55df91be05261c7d2a602d62dcebdf67cc1e7.png, exchangeable random graph models are dense almost surely.

Define a generative model of random graphs

  1. Fix a nonegative kernel graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png and scale graph_theory_4e2e56a190a8369d4cb21ba9abfff2e075678861.png
  2. Generate iid Uniform(0, 1) variates graph_theory_d2df35d2475fb9808a46bb97b9409a35ed155b24.png
  3. Indep. connect vertices graph_theory_f38eca84601d8086530e3e0e9664a1927b70eeae.png w.p. graph_theory_03fdc60cda351883499645fba42f50315d450855.png

Can we recover graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png up to scale from an observed graph?

We therefore consider the cut distance, which can be defined naturally on the linera space graph_theory_661c791039d54369a2013d9aecb898e6bdf88a3c.png of kernels.

Constructing an exchangeable model

Suppose we give graph_theory_2bb5e22609e9d4b4a084d37146266b8fcd08416c.png a univariate "prior" distribution

graph_theory_0660d390df6db2c11e6449a7ddfd81cd36976394.png

eqn scale the resulting mean matrix graph_theory_1362564cbd3ebebaeb5d6844e43718cd2430dde7.png by graph_theory_6228af426f28d4195540bf6f886e392a0ce5d752.png to control sparsity:

graph_theory_8a381164f728454a937707ca6e8d8e381e820780.png

Let graph_theory_985bbe1fcd92971791923ccf1dbf02fc46341915.png, then

graph_theory_19fcb3e9301b0116e0f2cd2c950cb848bfe1d3d2.png

where graph_theory_6c03018fd4253198b850bc6339e99699b567a798.png is a "Kartauff effect" (?)

Stochastic block models

Assume each node is assigned categorical variable graph_theory_e9426235ddd7db107cc54ee7d14f0fd6de07ea74.png:

graph_theory_03fc0cfd8676fafbe5c3004fe554a26d4148a25e.png

The a-th fitted community then comprises graph_theory_21942de8a40577888c06576ad5f241df3f915040.png > 1 nodes:

graph_theory_27b57515f41a2080f5f4951d66d6457751f297a8.png

i.e. graph_theory_21942de8a40577888c06576ad5f241df3f915040.png is the number of members in the community.

These "bandwiths" define a CDF graph_theory_f1330523c697c7f312e3dcdd0386cfbf0532a867.png an its inverse graph_theory_025d1e5ea0b227bbaf8057a16842faf4c0264bb1.png.

Any community assignment function graph_theory_9c15196dd07b1add486b8b54592e74bfe946ed95.png has two components:

  1. A vector graph_theory_644359fed082618f072d9fbc7feba30148ce0f88.png of community sizes (defines graph_theory_f1330523c697c7f312e3dcdd0386cfbf0532a867.png)
  2. A permutation graph_theory_ade9fe1b6de0968c993e78d814ec777d9a4c4632.png of graph_theory_1a2f7f655b54270c43cf9f525ab027e8f596c2f0.png that re-orders the nodes prior to applying the quantile function graph_theory_62d38a7d93f3e91f2cd3141fd8eaeca725ddd8b4.png.

Community to which graph_theory_9c15196dd07b1add486b8b54592e74bfe946ed95.png assigns node graph_theory_97eb714dfbd8abb06c6ee1fb2cb049cdaa7defd1.png is determined by the composition graph_theory_b93625a6a318146ee27625989688acd0438a9ce2.png:

graph_theory_4cf7aeacaa39d093e95a8b5407ef54cae084d30d.png

Thus, each graph_theory_9c15196dd07b1add486b8b54592e74bfe946ed95.png represents a re-ordering of the network nodes, followed by a partitioning of the unit interval.

Comparing features

  1. Inexact graph matching - maximum common subgraph, etc.
  2. Edit distance - convergence implies that motif prevalences converge also
  3. Motif prevelance

One way

  1. Count prevalance of subgraph graph_theory_e08421ac16591ee60adecc1e14a359550f81a1e2.png in graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png

    graph_theory_c7e0532804c21a03d198c96bf0ccabba780f8747.png

  2. Compare to a null-model, typically moment matched (i.e. matching something like mean, variance, etc.)
  3. Compare subgraphs graph_theory_e08421ac16591ee60adecc1e14a359550f81a1e2.png somehow
  4. Finite connected graph graph_theory_281b393b3b567ae9f6cfec1011e739b7835199f5.png can be mapped out by a closed walk graph_theory_1c7de91d00bb815cf0006004154b2a0704675857.png (i.e. sequence of vertices)
  5. Walks in a network determine its classification as a topological space
  6. Euler characteristic: graph_theory_ff4faf6d2f7d9f83199a4675aebb2340bea356b9.png
  7. Number of closed k-walks in graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png derives from its adjancey matrix: graph_theory_d52d4121b7dd4a6a3128b2f8f8ad5a177de731d7.png
  8. For large classes of networks, walks mapping out maximal trees and cycles dominate => allows comparison between "random graphs"
  9. Turns out that counting cycles allows one to determine whether or not we're just looking at a "random graph"; i.e. if the graph exhibits any interesting behavior or not

Tree decomposition

  • Also called juntion trees, clique trees or join trees

Moralization of a graph is the process of turning a directed graph into an undirected graph by "marrying" of parent nodes.

Triangulation refers to the process of turning a graph into a chordal graph.

One of the reasons why this is so useful is when we turn triangulated graph into a clique tree for use in representing a factor graph.

A junction tree or tree decomposition represents the vertices graph_theory_79dc7b66c784a82e557ef3df794e82b67a292b17.png of a given graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png as subtrees of a tree, in such a way that the vertices in the given graph are adjacent only when the corresponding subtrees intersect.

Thus, graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png forms a subgraph of the intersection graph of the subtrees. The full intersection graph is a chordal graph.

Given a graph graph_theory_a5177eee309f705d90af4a252c1cb0e6a1e3a05d.png, a junction tree is a pair graph_theory_6cb70a93971ba92c7cc83018bb0b3ef4ade0ff6e.png

  • graph_theory_86f0433614ffd62df83d657e6d78fe4525c92d7f.png where graph_theory_9957bab4893d19cc166b76bf1350d39c91da1d1d.png (in some literature referred to as "supernodes")
  • graph_theory_70b5c2822229cb13eb02538c05793f89fd5eef92.png is a tree whose nodes correspond to the subsets graph_theory_42b3c8f6ed56c7cfb60e6194029bee525c549d11.png, i.e. a vertex graph_theory_bbfed3606479c75a7c29136c63d1b924dddf330a.png is defined by graph_theory_c200c1d741cd8258d414f13487a139e45cf33f5a.png

And this pair graph_theory_6cb70a93971ba92c7cc83018bb0b3ef4ade0ff6e.png is required to satisfy the following:

  1. graph_theory_67ddcd2d02dda52712aeac26b78d9c922693ef54.png, i.e. each vertex is associated with at least one tree node in graph_theory_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png
  2. Vertices are adjacent in the graph graph_theory_b3731d37cd447bd6c31809075a6be43b3d0b04ec.png if and only if the corresponding subtrees have a node in common, i.e.

    graph_theory_48f8287c1d91f8ea713feaf2a6be6c17570fa038.png

  3. If graph_theory_42b3c8f6ed56c7cfb60e6194029bee525c549d11.png and graph_theory_cc919856ca77385845d81cfd831bcb8a0384b2af.png both contain a vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png, then all nodes graph_theory_fa043b69cf999720c284d3a06a7f2033d0e4394b.png of the tree in (unique) path between graph_theory_42b3c8f6ed56c7cfb60e6194029bee525c549d11.png and graph_theory_cc919856ca77385845d81cfd831bcb8a0384b2af.png contain graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png as well, i.e.

    graph_theory_f8ef137fef8f9a91f260aaefea4ca03c88af4ef7.png

    where graph_theory_339c704fc9bf0c2f6780a0a0e68b51e128bb7f74.png denotes the set of notes in the (unique) path between graph_theory_42b3c8f6ed56c7cfb60e6194029bee525c549d11.png and graph_theory_cc919856ca77385845d81cfd831bcb8a0384b2af.png

A junction tree is not unique, i.e. there exists multiple ways to perform a decomposition described above.

Often you'll see that it's suggested to transform a graph into a chordal graph before attempting to decompose it.

This is due to a theorem which states that the following two are equivalent:

  1. graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is triangulated
  2. Clique graph of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png has a junction tree

One method of exact marginalization in general graphs is called the junction tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree.

The basic premise is to eliminate cycles by clustering them into single nodes.

Given a triangulated graph, weight the edges of the clique graph by their cardinality, graph_theory_43975debce6456f0864d287f096e7b043a63db5c.png, of the intersection of the adjacent cliques graph_theory_8939ac39ed16887f3b3c033a47d4ab60d120251a.png and graph_theory_aa45b78ca4fcfb1128b81983bc69cebaddb6486d.png. Then any maximum-weight spanning tree of the clique graph is a junction tree.

Thus, to construct a junction tree, we just have to extract a maximum weight spanning tree out of the clique graph.

A width of a tree decomposition is the size of its largest set graph_theory_42b3c8f6ed56c7cfb60e6194029bee525c549d11.png minus one, i.e.

graph_theory_d90e1a66c62bce203230d5ccf4e993c365adf828.png

And the treewidth graph_theory_a65c47d9a5678cd359529f86f0114e2e26f21b9f.png of a graph graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png is the minimum width among all possible tree decompositions of graph_theory_2de98136973021abb46a5a3fc1e4318bafb84264.png, i.e.

graph_theory_b2651cfc69c6ea39e24af61e853bd09043416706.png

Stochastic Block Modelling

Notation

  • graph_theory_f07b932b12c91bca424fd428d383495413b5b6a9.png number of vertices, thus graph_theory_ff8e4c60834c77f8d266e62555138f286ac7405a.png
  • graph_theory_094b02afce734f4ce51933d0093ef3d2da9f8123.png number of communities
  • graph_theory_0207be880056b9a69e22e729dd37bced29cd174a.png n-dimensional random vector with i.i.d. components distributed under graph_theory_fefe9e556d399665a26a37824ec578cbffb0cabe.png, i.e. representing the community-assignment of each vertex
  • graph_theory_59a49612564b3b646c7292ca45a900c769fde1ed.png is the random variable representing the community of which vertex graph_theory_ebac26f7eba81eb516f423f028596d8e6018aa7a.png belongs to, taking on values in graph_theory_86b4bcd268fe853a9b908d628122c9a0598dfe67.png, i.e. graph_theory_525ca4ace76c00743562e706ac70aa99b6a8da14.png with graph_theory_5aff125458168146a263539eb0ac44a051003376.png
  • graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png is a graph_theory_49fd3b25474425ce0e536633b89ac031379e90fb.png symmetric matrix with vertices graph_theory_97eb714dfbd8abb06c6ee1fb2cb049cdaa7defd1.png and graph_theory_92483d0bb3bbad1c0b4630e7e529b0b5f3f8c48b.png being connected with probability graph_theory_f3895ebac67d205bbe639f04344141203651efbc.png, indep. of other pairs of vertices.
  • graph_theory_222595d5261b9951d6c0fb7f187c8a9ca73adc90.png denotes a community set
  • graph_theory_0a34dc6b3c6b4a604fb2f19376ce210b1f778d11.png denotes a community-graph pair

Stuff

  • graph_theory_bef6d28673ac96aa8341200bc5837edfd3605307.png
  • Probability vector graph_theory_fefe9e556d399665a26a37824ec578cbffb0cabe.png of dimension graph_theory_094b02afce734f4ce51933d0093ef3d2da9f8123.png
  • Symmetric matrix graph_theory_fe7108f38f2db55db9cb7eee3f63077437a2e510.png of dimension graph_theory_49fd3b25474425ce0e536633b89ac031379e90fb.png with entries in graph_theory_14ad9fc7ad19fd0d1dd65feedc0d5b8c171b6486.png

Then graph_theory_5a742264177c282513450904a78d5bc1a8ca6f43.png defines a n-vertex random graph with labelled vertices, where each vertex is assigned a community label in graph_theory_68dfe46e0a1c2b74a9cd84c1bc35cff0d0c46fab.png independently under the community prior graph_theory_fefe9e556d399665a26a37824ec578cbffb0cabe.png, and pairs of vertices with labels graph_theory_97eb714dfbd8abb06c6ee1fb2cb049cdaa7defd1.png and graph_theory_92483d0bb3bbad1c0b4630e7e529b0b5f3f8c48b.png connect independently with probability graph_theory_08183b7f8e16f15028600bc4495d51ed0bbed198.png.

Further generalizations allow for labelled edges and continuous vertex labels, conencting to low-rank approximation models and graphons.

Thus the distribution of graph_theory_0a34dc6b3c6b4a604fb2f19376ce210b1f778d11.png where graph_theory_9498f1d2c33413a91da098f4b8309ccf0ceedcca.png is

graph_theory_c22c6f91ba8ddb2adeec352c6bfe5866bd1c0e57.png

which is just saying:

graph_theory_b0c333b1ca4ab0616316a1a5493b36717eb7d18f.png

Graphical models

Min-Max propagation

Algorithms

Shortest path

Dijkstra's Shortest Path

dijkstra-animation.gif

And here's a super-inefficient implementation of Djikstra's shortest path algorithm in Python:

import numpy as np
from collections import defaultdict, deque, namedtuple

def is_empty(q):
    return len(q) < 1


def get_neighbors(v, graph):
    neighs = []

    for e in graph.edges:
        if e.vertices[0] == v:
            neighs.append((e, e.vertices[1]))
        elif e.vertices[1] == v:
            neighs.append((e, e.vertices[0]))

    return neighs


# structures
Node = namedtuple('Node', ['value'])
Edge = namedtuple('Edge', ['vertices', 'weight', 'directed'])
Graph = namedtuple('Graph', ['vertices', 'edges'])

# straight foward problem
V = [Node(i) for i in range(10)]
E = [Edge([V[i - 1], V[i]], i, False) for i in range(1, 9)] + [Edge([V[8], V[9]], 0, False)]
G = Graph(V, E)

# shortcut!
E = E + [Edge([V[4], V[8]], -6, False)]
G = Graph(V, E)

v_0 = V[0]   # start 
v_n = V[-1]  # stop

paths = {}
path_weights = defaultdict(lambda: np.inf)

seen = set()
queue = deque([v_0, v_n])

seen.add(v_n)

v_curr = queue.popleft()
path_weights[v_curr] = 0 # initialize path weight to 0 from starting point
path_w = path_weights[v_curr]
paths[v_curr] = None

edge, v_neigh = get_neighbors(v_curr, G)[0]
# update best prev vertex candidate for `v_neigh`
paths[v_neigh] = v_curr
# update best candidate weight
path_weights[v_neigh] = edge.weight + path_w

while not is_empty(queue) and v_curr != v_n:
    for (edge, v_neigh) in get_neighbors(v_curr, G):
        if v_neigh == paths[v_curr]:
            # if it's the neighboring vertex we just came from => skip it
            continue

        # proposed path is improvement over previous path with `v_neigh` ?
        candidate_path_weight = path_weights[v_neigh]
        if edge.weight + path_w < candidate_path_weight:
            # update best prev vertex candidate for `v_neigh`
            paths[v_neigh] = v_curr
            # update best candidate weight
            path_weights[v_neigh] = edge.weight + path_w

        if v_neigh not in seen:
            queue.append(v_neigh)
            seen.add(v_neigh)

    # sort `queue` so that the next visit is to the current "shortest" path
    queue = deque(sorted(queue, key=lambda v: path_weights[v]))

    v_curr = queue.popleft()
    path_w = path_weights[v_curr]

# upon termination we simply walk backwards to get the shortest path
optimal_path = [v_n]
v_curr = v_n
while v_curr != v_0:
    v_curr = paths[v_curr]
    optimal_path.append(v_curr)


def display_path(path):
    print("Optimal path: " + " --> ".join([str(v.value) for v in path]))


display_path(list(reversed(optimal_path)))
print("Observe that some nodes are NEVER seen, as we'd like!")

from pprint import pprint
pprint(list(paths.keys()))
Optimal path: 0 --> 1 --> 2 --> 3 --> 4 --> 8 --> 9
Observe that some nodes are NEVER seen, as we'd like!
[Node(value=0),
 Node(value=1),
 Node(value=2),
 Node(value=3),
 Node(value=4),
 Node(value=5),
 Node(value=8),
 Node(value=7),
 Node(value=9)]
#include <iostream>
#include <memory>

template<typename T>
struct BinTreeNode
{
  T value;
  std::shared_ptr<BinTreeNode<T>> left = nullptr;
  std::shared_ptr<BinTreeNode<T>> right = nullptr;

  BinTreeNode(T v) : value(v) { }
};

int main()
{
  std::shared_ptr<BinTreeNode<int>> root = std::shared_ptr<BinTreeNode<int>>(new BinTreeNode<int>(0));
  root->left = std::make_shared<BinTreeNode<int>>(1);
  root->right = std::make_shared<BinTreeNode<int>>(2);

  root->left->left = std::make_shared<BinTreeNode<int>>(4);

  std::cout << root->value << std::endl;
  std::cout << root->left->value << std::endl;
  std::cout << root->left->left->value << std::endl;

  std::shared_ptr<BinTreeNode<int>> current = root;
  for (auto i = 0; i < 10; i++) {
    // using "raw" pointers, these `BinTreeNode` instances will live long enough
    // but risking dangling pointers in the case were we delete a parent node
    // UNLESS we specificly delete them (which we can do using a custom destructor for `BinTreeNode`)

    // Instead we can use `std::shared_ptr` which has an internal reference-counter,
    // deallocating only when there are no references to the object
    std::shared_ptr<BinTreeNode<int>> next = std::make_shared<BinTreeNode<int>>(i);
    std::cout << next->value << std::endl;
    if (next->value >= current->value) {
      current->right = next;
    }
    else {
      current->left = next;
    }

    current = next;
  }

  std::cout << root->right->right->value << std::endl;
  return 0;
}

Bellman-Ford algorithm

  • Computes shortest paths from single source vertex to all other vertices in a weighted digraph
  • Slower than Dijkstra's Algorithm for same problem, but more versatile
    • Capable of handling edges with negative weights
  • Can be used to discover negative cycles (i.e. a cycle whose edges sum to a negative value)
    • E.g. arbitrage opportunities on Forex markets

Relation to Dijkstra's algorithm:

  • Both based on principle of relaxation:
    • Approximation of the correct distance is gradually replaced by more accurate values until eventually reaching the optimum solution
    • Approx. distnace to each vertex is an overestimate of true distance, and is replacted by the minimum of its old value and the length of the newly found path
      • All nodes are initialized to be graph_theory_30dd1bf5261c6231fab304afac9382dac65bc232.png away from the source vertex graph_theory_34f5a229e8b4f260d3af2fc9e36fa0aaf62394e2.png, i.e. an overestimate
      • Update these distances at each iteration
  • Difference:
    • Dijkstra's: uses priority queue to greedily select the closest vertex that has not been processed, and performs this relaxation process on all of its outgoing edges
    • BF: simply relaxes all the edges, and does this graph_theory_2d56522dcbe84a5858a7d471261557fff556f0cb.png times (since there can be at most graph_theory_2d56522dcbe84a5858a7d471261557fff556f0cb.png edges on a path between any two vertices)
      • Can terminate early if shortest paths does not change between two iterations
      • In practive the ordering of which we update the edges can improve the runtime
Impl
def bellman_ford(vertices, edges, source):
    distances = {source: 0.0}
    previous = {}

    # instead of using infinity as initial value, we simply
    # check if the vertex is already in ``distances`` as a
    # proxy for "infinity" distance. Hence we do not need an initialization.

    # perform relaxation iterations
    diff = True
    i = 0

    while diff and i < len(vertices) - 1:
        diff = False

        # iterate over vertices which are reachable
        for v in (v for v in vertices if v in distances):
            v_dist = distances[v]
            outgoing_edges = [e for e in edges if e[0] == v]

            for (_, u, w) in outgoing_edges:
                if u not in distances or v_dist + w < distances[u]:
                    # found new shortert path to vertex ``u``
                    distances[u] = v_dist + w
                    previous[u] = v

                    # made a change
                    diff = True

        # update counter
        i += 1

    return distances


# negative cycle
edges = [
    (0, 1, 1.0),
    (1, 2, 1.0),
    (0, 2, 4.0),
    (2, 3, 1.0),
    (3, 0, -5.0)
]
vertices = [v for v in range(4)]
bellman_ford(vertices, edges, 0)

Conclusion

Algorithm Greedy Negative edges? Complexity  
Dijkstra Y N    
Bellman-Ford N Y graph_theory_05f86f1b1f42971e8e9b934f6046dd8967ccd341.png  

Code

graph-tool   python

import graph_tool.all as gt

g = gt.collection.data["football"]
g
# Out[4]:
: <Graph object, undirected, with 115 vertices and 613 edges at 0x7f25204da438>

state = gt.minimize_blockmodel_dl(g)
state
# Out[6]:
: <BlockState object with 10 blocks (10 nonempty), degree-corrected, for graph <Graph object, undirected, with 115 vertices and 613 edges at 0x7f25204da438>, at 0x7f24d806d588>

_ = state.draw(pos=g.vp.pos, output="./.graph_theory/figures/football-sbm-fit.png")

football-sbm-fit.png

Figure 7: Graph of the football-dataset with the different communities colored.

%matplotlib inline
import matplotlib.pyplot as plt

edges = state.get_matrix()
_ plt.matshow(edges.todense())

football-edge-counts.png

Figure 8: Connections between the different clusters (and the degrees within the clusters along the diagonal)