Graph Theory

Table of Contents


  • $V(G)$ denotes the set of vertices in graph $G$ ($V$ is used when there is no scope for ambiguity)
  • $E(G)$ denotes the set of edges in graph $G$ ($E$ is used when there is no scope for ambiguity)
  • $n$ and $m$ denote # of vertices and edges, respectively, unless stated otherwise
  • $\psi_G: E(G) \to V(G) \times V(G)$ denotes the incidence function which maps an edge to the pair of vertices connected by this edge
  • $N_G(v)$ denotes the set of neighbours of a vertex $v$
  • A set $V$, together with a set $E$ of two-element subsets of $V$, defines a simple graph $(V, E)$, where the ends of an edge $uv \in E$ are precisely the vertices $u, v \in V$
  • incidence matrix is the $n \times m$ matrix $\mathbf{M}_G := (m_{ve})$, where $m_{ve}$ is the # of times (0, 1, or 2) that vertex $v$ and edge $e$ are incident
  • adjacency matrix is the $n \times n$ matrix $\mathbf{A}_G := (a_{uv})$, where $a_{uv}$ is the # of edges joining $u$ and $v$, with each loop counting as two edges
  • bipartite adjacency matrix of some bipartite graph $G[X, Y]$ is the $r \times s$ matrix $\mathbf{B}_G = (b_{ij})$, where $b_{ij}$ is the # of edges joining $x_i \in X := \{ x_1, \dots, x_r \}$ and $y_j \in Y := \{ y_1, \dots, y_s \}$
  • $G'$ denotes a sub-graph of some graph $G$, i.e. $G' \subseteq G$ (unless otherwise specified)
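A quick way to internalize the matrix definitions is to build them for a tiny example; a minimal sketch (the multigraph below, with parallel edges and a loop, is made up) showing that the row sums of the adjacency matrix recover the vertex degrees:

```python
import numpy as np

# hypothetical multigraph on vertices {0, 1, 2}:
# parallel edges 0-1, 0-1, an edge 1-2, and a loop at 2
edges = [(0, 1), (0, 1), (1, 2), (2, 2)]
n = 3

A = np.zeros((n, n), dtype=int)
for (u, v) in edges:
    if u == v:
        A[u, u] += 2            # each loop counts as two edges
    else:
        A[u, v] += 1
        A[v, u] += 1

degrees = A.sum(axis=1)         # row sums give the degrees d_G(v)
```

Here `degrees` comes out as `[2, 3, 3]`, whose total is $2m = 8$: each edge contributes two edge-ends.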



We say a graph $G$ is densely connected at vertex $v$ if $\text{deg}(v)$ is "large".

If we have a cluster of such vertices, we might refer to this cluster as being assortative; in contrast, we refer to clusters of vertices which connect similarly to the rest of the graph (vertices "outside" the cluster) without necessarily having higher inner density as disassortative communities / clusters.


A tree is an undirected graph in which any two vertices are connected by exactly one path.

Equivalently, a tree is an acyclic connected graph.

Sometimes a tree is also given the additional constraint that every node has at most a single parent; a polytree is a directed graph whose underlying undirected graph is a tree in the sense just defined.

A spanning tree $T$ of an undirected graph $G$ is a subgraph that is a tree which includes all of the vertices of $G$, with the minimum possible number of edges.

In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree.

If all of the edges of $G$ are also edges of a spanning tree $T$ of $G$, then $G$ is a tree and is identical to $T$ (i.e. a tree has a unique spanning tree and it's itself).
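Extracting a spanning tree from a connected graph is a plain traversal; a minimal sketch (BFS, on a hypothetical adjacency-list graph: a 4-cycle with a chord):

```python
from collections import deque

def spanning_tree(adj, root=0):
    """Edges of a BFS spanning tree of a connected graph given as an adjacency list."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges

# 4-cycle 0-1-2-3 with the chord 1-3
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
T = spanning_tree(adj)
```

Any spanning tree of this graph has $n - 1 = 3$ edges; different traversal orders give different (equally valid) spanning trees.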

A cut is a partition of the vertices of a graph into two disjoint subsets.

Any cut determines a cut-set, that is, the set of edges which have one endpoint in each subset of the partition.

A simple graph is a graph where both multiple edges and self-loops are disallowed.

Thus, in a simple graph with $n$ vertices, each vertex is at most of degree $n - 1$.

Let $G = (V, E)$ and $G' = (V', E')$ be two graphs.

Graph $G'$ is a sub-graph of $G$, written $G' \subseteq G$, if

V' \subseteq V, \quad E' \subseteq E \cap (V' \times V')

If $G' \subseteq G$ contains all of the edges $e \in E$ with $\psi(e) = \{ u, v \}$ for $u, v \in V'$, then $G'$ is an induced sub-graph of $G$.

When $G'' \subset G$ and there exists an isomorphism between the sub-graph $G''$ and a graph $G'$, this mapping represents an appearance of $G'$ in $G$.

A sub-graph $G'$ is called recurrent or frequent in $G$ when its frequency $F_g(G')$ is above a predefined threshold or cut-off value.

Intersection graph

An intersection graph is a graph that represents the pattern of intersections of a family of sets.

Formally, an intersection graph is an undirected graph formed from a family of sets

S_i, \quad i = 0, 1, 2, \dots

by creating one vertex $v_i$ for each set $S_i$, and connecting two vertices $v_i$ and $v_j$ by an edge whenever the corresponding two sets have a nonempty intersection, i.e.

E(G) = \{ e_{ij} \mid S_i \cap S_j \ne \emptyset \}
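The construction translates directly into code; a small sketch with a made-up family of sets:

```python
from itertools import combinations

def intersection_graph(sets):
    """One vertex per set; an edge {i, j} whenever S_i and S_j intersect."""
    E = set()
    for (i, S_i), (j, S_j) in combinations(enumerate(sets), 2):
        if S_i & S_j:           # nonempty intersection
            E.add((i, j))
    return E

sets = [{1, 2}, {2, 3}, {4, 5}, {5, 1}]
E = intersection_graph(sets)    # {(0, 1), (0, 3), (2, 3)}
```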

Chordal graph

A chordal graph is one in which all cycles of 4 or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.

Sometimes these are referred to as triangulated graphs.

Network motifs

Network motifs are defined as recurrent and statistically significant sub-graphs or patterns in a graph.

There is an ensemble $\Omega(G)$ of random graphs corresponding to the null model associated with $G$. We choose $N$ random graphs uniformly from $\Omega(G)$ and calculate the frequency of a particular frequent sub-graph $G'$ in each of them.

If the frequency of $G'$ in $G$ is higher than its arithmetic mean frequency over the $N$ random graphs $R_i$, we call this recurrent pattern significant and hence treat $G'$ as a network motif for $G$.

For a small graph $G'$, the network $G$ and a set of randomized networks $R(G) \subseteq \Omega(G)$ where $|R(G)| = N$, the z-score of $G'$ is defined as:

Z(G') = \frac{F_g(G') - \mu_R(G')}{\sigma_R(G')}

where $\mu_R(G')$ and $\sigma_R(G')$ stand for mean and std. dev. frequency in set $R(G)$ respectively.

The larger the $Z(G')$, the more significant is the sub-graph $G'$ as a motif.
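Given the frequency counts, computing $Z(G')$ is a one-liner; a minimal sketch with made-up frequencies (the sub-graph census itself, i.e. obtaining these counts, is the expensive part and is omitted):

```python
import math

def motif_z_score(f_obs, f_random):
    """Z-score of a sub-graph: observed frequency vs. frequencies in randomized graphs."""
    N = len(f_random)
    mu = sum(f_random) / N
    var = sum((f - mu) ** 2 for f in f_random) / N
    return (f_obs - mu) / math.sqrt(var)

# hypothetical counts: the motif occurs 42 times in G, around 10 times in the null model
z = motif_z_score(42, [10, 12, 9, 11, 8])
```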


A clique is defined as a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset.

In other words, a clique is fully connected.

A maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique.
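Maximal cliques can be enumerated with the classic Bron-Kerbosch recursion; a minimal sketch (the basic variant, without pivoting) on a small made-up graph:

```python
def bron_kerbosch(R, P, X, adj, out):
    """Collect maximal cliques extending R, with candidate set P and excluded set X."""
    if not P and not X:
        out.append(R)           # R cannot be extended: it is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}

# triangle {0, 1, 2} plus the pendant edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)   # [{0, 1, 2}, {2, 3}]
```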


Book: Graph Theory

1. Graphs


  • $V(G)$ denotes the set of vertices in graph $G$ ($V$ is used when there is no scope for ambiguity)
  • $E(G)$ denotes the set of edges in graph $G$ ($E$ is used when there is no scope for ambiguity)
  • $n$ and $m$ denote # of vertices and edges, respectively, unless stated otherwise
  • $\psi_G: E(G) \to V(G) \times V(G)$ denotes the incidence function which maps an edge to the pair of vertices connected by this edge
  • $N_G(v)$ denotes the set of neighbours of a vertex $v$
  • A set $V$, together with a set $E$ of two-element subsets of $V$, defines a simple graph $(V, E)$, where the ends of an edge $uv \in E$ are precisely the vertices $u, v \in V$
  • incidence matrix is the $n \times m$ matrix $\mathbf{M}_G := (m_{ve})$, where $m_{ve}$ is the # of times (0, 1, or 2) that vertex $v$ and edge $e$ are incident
  • adjacency matrix is the $n \times n$ matrix $\mathbf{A}_G := (a_{uv})$, where $a_{uv}$ is the # of edges joining $u$ and $v$, with each loop counting as two edges
  • bipartite adjacency matrix of some bipartite graph $G[X, Y]$ is the $r \times s$ matrix $\mathbf{B}_G = (b_{ij})$, where $b_{ij}$ is the # of edges joining $x_i \in X := \{ x_1, \dots, x_r \}$ and $y_j \in Y := \{ y_1, \dots, y_s \}$
  • $d_G(v)$ denotes the degree of $v$, i.e. # of edges incident with $v$, each loop counting as two edges
  • $\delta(G)$ and $\Delta(G)$ are the minimum and maximum degrees of the vertices of $G$
  • $d(G)$ is the average degree of the vertices of $G$

1.1 Graphs and Their Representations

A graph is an ordered pair $\big( V(G), E(G) \big)$ consisting of a set $V(G)$ of vertices and a set $E(G)$, disjoint from $V(G)$, of edges, together with an incidence function $\psi_G$ that associates with each edge of $G$ an unordered pair of vertices of $G$.

If $e$ is an edge and $u$ and $v$ are vertices such that $\psi_G(e) = \{ u, v \}$, then $e$ is said to join $u$ and $v$, and the vertices $u$ and $v$ are called the ends of $e$.

A bipartite graph is a simple graph where the vertices $V$ can be partitioned into two subsets $X$ and $Y$ such that every edge has one end in $X$ and one end in $Y$; such a partition $(X, Y)$ is called a bipartition of the graph, and $X$ and $Y$ its parts.

A path is a simple graph whose vertices can be arranged in a linear sequence s.t. two vertices are adjacent if they are consecutive in the sequence, and are non-adjacent otherwise.

A cycle on three or more vertices is a simple graph whose vertices can be arranged in a cyclic sequence s.t. two vertices are adjacent if they are consecutive in the sequence, and are non-adjacent otherwise.

A graph is connected if and only if for every partition of its vertex set into two nonempty sets $X$ and $Y$, there is an edge with one end in $X$ and one end in $Y$; otherwise the graph is disconnected.

Equivalently, a graph is connected if there is a u-v-path in $G$ (a path from $u$ to $v$) for every pair of vertices $u, v \in V(G)$.

A graph $G$ is k-regular if

d(v) = k \quad \forall v \in V

A regular graph is one that is k-regular for some $k$.

Counting in two ways is a proof-technique where we consider a suitable matrix and compute the sum of its entries in two different ways:

  1. sum of its row sums
  2. sum of its column sums

Equating these two quantities results in an identity.
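A classic instance of the technique, using the incidence matrix from the notation above: row sums of $\mathbf{M}_G$ are the vertex degrees, while every column sums to 2, so equating the two total sums gives the handshake identity $\sum_v d(v) = 2m$. A quick numerical check on a made-up graph:

```python
import numpy as np

# incidence matrix: vertices {0, 1, 2}, edges 0-1, 1-2, 0-2, and a loop at 0
M = np.array([
    [1, 0, 1, 2],   # vertex 0 (the loop contributes 2)
    [1, 1, 0, 0],   # vertex 1
    [0, 1, 1, 0],   # vertex 2
])

row_total = M.sum(axis=1).sum()   # sum of row sums    = sum of degrees
col_total = M.sum(axis=0).sum()   # sum of column sums = 2 * (# of edges)
```

Both totals equal 8 here, i.e. $\sum_v d(v) = 2m$ with $m = 4$.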

  • 1.1.5
    • Question

      For $k = 0, 1, 2$, characterize the k-regular graphs.

    • Answer

      Let $k = 0$, i.e.

d(v) = 0, \quad \forall v \in V

      which is simply a graph without any connections between the vertices.

      For $k = 1$, we have

d(v) = 1, \quad \forall v \in V

      Since each loop counts twice towards the degree, no vertex can carry a loop. Every vertex is therefore incident with exactly one link, whose other end likewise has no capacity for further edges.

      In total, a 1-regular graph is a disjoint union of single edges, i.e. a perfect matching on its vertex set.

      For $k = 2$, we have

d(v) = 2, \quad \forall v \in V

      For any $v$ we then have the following possibilities:

      • both edge-ends at $v$ belong to a loop: $v$ is an isolated vertex carrying one loop, and we can consider the two graphs $(\{ v \}, \{ e_{vv} \})$ and $(V \setminus \{ v \}, E \setminus \{ e_{vv} \})$ separately
      • $v$ is joined to a single other vertex $v'$ by a pair of parallel edges: $v$ and $v'$ form a 2-cycle, which splits off from the rest of the graph in the same way
      • $v$ is joined to two distinct vertices by links: following the edges out of $v$, every vertex reached also has degree 2, so the walk can never terminate and must eventually close a cycle $(V_c, E_c)$; since every $v' \in V_c$ already has both of its edges inside $V_c$, we can consider the two graphs $(V_c, E_c)$ and $(V \setminus V_c, E \setminus E_c)$ separately, where $(V_c, E_c)$ is simply a polygon

      (A vertex with one loop and one link is impossible, since the loop alone already contributes 2 to the degree.)

      All in all, a 2-regular graph is a disjoint union of cycles, counting a loop as a 1-cycle and a pair of parallel edges as a 2-cycle.


loop
edge with identical ends
link
edge with distinct ends
parallel edges
two or more links with the same pair of ends
null graph
graph with zero vertices
trivial graph
graph with one vertex
complete graph
simple graph in which any two vertices are adjacent
empty graph
simple graph in which no two vertices are adjacent, i.e. $E = \emptyset$
path of length $k$
path with $k$ edges
cycle of length $k$
cycle with $k$ edges
planar graph
graph which can be drawn in a plane (2D) s.t. edges meet only at points corresponding to their common ends, i.e. no intersections when drawn in a 2D plane, where the drawing itself is called a planar embedding
embedding
"drawing" of a graph on some n-dimensional structure so that its edges intersect only at their ends

Book: Bayesian Reasoning and Machine Learning


  • $\text{pa}(X)$ denotes the parent nodes of the rv. $X$ in the BN

7. Making Decisions


  • $U(D \mid \mathcal{X})$ denotes the expected utility of decision $D$ over the set of rvs. $\mathcal{X}$

7.3 Extending Bayesian Networks for Decisions

An Influence Diagram (ID) states which information is required in order to make each decision, and the order in which these decisions are to be made.

We call an information link fundamental if its removal would alter the partial ordering.

7.4 Solving Influence Diagrams

A decision potential on a clique $C$ contains two potentials: a probability potential $\rho_C$ and a utility potential $\mu_C$. The joint potentials for the junction tree are defined as

\rho = \prod_{C \in \mathcal{C}} \rho_C, \quad \mu = \sum_{C \in \mathcal{C}} \mu_C

with the junction tree representing the term $\rho \mu$.

But for IDs there are constraints on the triangulation, since we need to preserve the partial ordering induced by the decisions; this gives rise to the strong junction tree.

  1. Remove information edges
  2. Moralization: marry all parents of the remaining nodes
  3. Remove utility nodes: remove the utility nodes and their parental links
  4. Strong triangulation: form a triangulation based on an elimination order which obeys the partial ordering of the variables.
  5. Strong Junction Tree: from the strongly triangulated graph, form a junction tree and orient the edges towards the strong root (the clique that appears last in the elimination sequence).

Spectral Clustering


  • $v_i$ denotes the i-th vertex
  • $v_i \sim v_j$ when $v_i$ is adjacent to $v_j$
  • $e_{ij}$ denotes the edge between $v_i$ and $v_j$
  • Edges are non-negatively weighted with the weight matrix $W$
  • $D$ is the diagonal matrix of $W$, also called the degree matrix, defined as:

D := \begin{cases} \text{diag} \big( \text{deg}(v_i) \big) & \text{if un-weighted} \\ \text{diag} \big( \sum_{j} w_{ij} \big) & \text{if weighted} \end{cases}
  • $L = D - W$ denotes the unnormalized graph Laplacian
  • $L' = I - D^{- \frac{1}{2}} W D^{- \frac{1}{2}}$ denotes the normalized graph Laplacian


  • Want to partition a graph in parts which are roughly the same size and are connected by as few edges as possible, or informally, "as disjoint as possible"
    • NP-hard (even approximations are difficult)
  • Good heuristic: write cost function as a quadratic form and relaxing the discrete optimization problem of finding the characteristic function of the cut to a real-valued optimization problem
    • Leads to an eigenproblem

The volume of a subset of vertices $S \subset V$ is the total degree of all vertices in the subset:

\text{vol}(S) = \sum_{v_i \in S} \text{deg}(v_i)

If $S \subset V$ is a set of vertices of $G$, we define its edge boundary $\delta S$ to be the set of edges connecting $S$ and its complement $\bar{S} = V - S$.

Similarly, the volume of $\delta S$ is defined as

\text{vol}(\delta S) = \sum_{e_{ij} \in E \ : \  v_i \in S, v_j \in \bar{S}} w_{ij}

This quantity is also known as the cut given by $S$ and $\bar{S}$.

The minimum cut of a graph $G$ refers to the partition $S, \bar{S}$ which minimizes $\text{vol}(\delta S)$, i.e. the smallest total weight "crossing" between the partitions $S$ and $\bar{S}$.

Problem with mincut is that no penalty is paid for unbalanced partitions and therefore nothing prevents the algorithm from making single vertices "clusters".

The Cheeger constant (minimum conductance) of a graph is defined as

c(G) = \min_{S \subset V}\ \frac{\text{vol}(\delta S)}{\min \big( \text{vol}(S), \text{vol}(\bar{S}) \big)}

The normalized cut is defined as

n(G) = \min_{S \subset V}\ \Bigg( \frac{\text{vol}(\delta S)}{\text{vol}(S)} + \frac{\text{vol}(\delta S)}{\text{vol}(\bar{S})} \Bigg)

For $a, b > 0$,

\min \{ a, b \} \le \frac{1}{\frac{1}{a} + \frac{1}{b}} \le 2 \min \{ a, b \}

from which we can see that Cheeger constant and normalized cut are closely related.

Both are NP-hard, thus need to be approximated.

The weighted and unweighted balanced cuts of a graph, respectively, are given by:

  b_w (G) &= \min_{S \subset V,\ \text{vol}(S) = \text{vol}(\bar{S})}\ \text{vol}(\delta S) \\
  b_{uw}(G) &= \min_{S \subset V,\ |S| = |\bar{S}|}\ \text{vol}(\delta S)

We'll assume balanced partitions exist (e.g. for the unweighted version the number of vertices has to be even). For the general case one can easily modify the definition by requiring the partition to be "almost balanced", and this leads to the same relaxed optimization problems in the end.

The unnormalized graph Laplacian is defined to be

L = D - W

and the normalized graph Laplacian

  L' &= D^{- \frac{1}{2}} L D^{- \frac{1}{2}} \\
  &= I - D^{- \frac{1}{2}} W D^{- \frac{1}{2}}

From $L$ and $L'$ various graph invariants can be estimated in terms of eigenvalues of these matrices.

Given a vector $f = (f_1, \dots, f_n)^T \in \mathbb{R}^n$, we have the following identity:

f^T L f = \sum_{e_{ij}} w_{ij} (f_i - f_j)^2

thus $L$ is positive semi-definite, and since $D^{\frac{1}{2}}$ is positive definite the same holds for $L'$.

Spectral clustering

Given a subset $S \subset V$, define the column vector $f_S = (f_{S_1}, \dots, f_{S_n})' \in \mathbb{R}^n$ as follows:

f_{S_i} = \begin{cases} 1 & v_i \in S \\ -1 & v_i \in \bar{S} \end{cases}

Then it's immediate that

f_S^T L f_S = \sum_{e_{ij}} w_{ij} \big( f_{S_i} - f_{S_j} \big)^2 = 4 \text{vol}(\delta S)

since $f_{S_i} - f_{S_j} = 0$ when $v_i$ and $v_j$ lie on the same side of the cut, and $(f_{S_i} - f_{S_j})^2 = 4$ when they lie on opposite sides. For all $S \subset V$ we have

f_S^T D f_S = \sum_{v_i \in V} \text{deg}(v_i) = \text{vol}(V)

Consider the weighted balanced cut:

f_S^T D \mathbf{1} = \sum_{v_i \in S} \text{deg}(v_i) - \sum_{v_i \in \bar{S}} \text{deg}(v_i) = \text{vol}(S) - \text{vol}(\bar{S})

Hence, $f_S^T D \mathbf{1} = 0$ if and only if the cut corresponding to $S$ is volume-balanced, i.e.

\text{vol}(S) = \text{vol}(\bar{S})

Therefore, we can reformulate the problem of computing weighted balanced cut as:

b_w(G) = \frac{1}{4} \min_{f \in \{ -1, 1 \}^n, \ f^T D \mathbf{1} = 0} f^T L f

Moreover, as $f^T D f$ has constant value $\text{vol}(V)$ for all $f \in \{ -1, 1 \}^n$, we can rewrite this as

b_w(G) = \frac{1}{4} \text{vol}(V) \Bigg(  \min_{f \in \{ -1, 1 \}^n,\ f^T D \mathbf{1} = 0} \frac{f^T L f}{f^T D f} \Bigg)

(The factor $\frac{1}{4}$ is exactly the $4$ from $f_S^T L f_S = 4 \, \text{vol}(\delta S)$ above: minimizing $f^T L f$ over cut vectors counts each boundary edge four times.)

In this form, the discrete optimization problem admits a simple relaxation by letting $f_i \in \mathbb{R}$ instead of $f_i \in \{ -1, 1 \}$. Noticing that

  L \mathbf{1} &= (D - W) \mathbf{1} = D \mathbf{1} - W \mathbf{1} \\
  &= \begin{bmatrix}
    \text{deg}(v_1) & \dots & \text{deg}(v_n)
  \end{bmatrix}^T -
  \begin{bmatrix}
    \sum_{j} w_{1j} & \dots & \sum_{j} w_{nj}
  \end{bmatrix}^T \\
  &= \begin{bmatrix}
    \text{deg}(v_1) & \dots & \text{deg}(v_n)
  \end{bmatrix}^T -
  \begin{bmatrix}
    \text{deg}(v_1) & \dots & \text{deg}(v_n)
  \end{bmatrix}^T \\
  &= \mathbf{0}

a standard linear algebra argument shows that

\lambda_2 = \min_{f \in \mathbb{R}^n,\ f^T D \mathbf{1} = 0} \frac{f^T L f}{f^T D f}

Why is the minimum $\lambda_2$ rather than $\lambda_1$?

  • $\lambda_1 = 0$ with eigenvector $\mathbf{1}$, since $L \mathbf{1} = 0$
  • The constraint $f^T D \mathbf{1} = 0$ restricts $f$ to the $D$-orthogonal complement of this eigenvector
  • On that subspace the Rayleigh quotient $\frac{f^T L f}{f^T D f}$ is minimized by the smallest remaining eigenvalue, which is $\lambda_2$ (Courant-Fischer)

where $\lambda_2$ is the second smallest eigenvalue of the generalized eigenvalue problem $L f = \lambda D f$. It is clear that the smallest eigenvalue $\lambda_1$ of $L$ is $0$ (since $L \mathbf{1} = 0$ as noted earlier) and the corresponding eigenvector is $\mathbf{1}$.

Further, we can show that for a connected graph, $\lambda_2 > 0$. Thus, the vector $f$ for which the minimum above is attained is the eigenvector corresponding to $\lambda_2$.

This leads directly to the bipartitioning algorithm as a relaxation of the weighted balanced cut:

  1. Compute the matrices $L$ and $D$
  2. Find the eigenvector corresponding to the second smallest eigenvalue of the following generalized eigenvalue problem:

 Lf = \lambda D f
  3. Obtain the partition:

 S = \{ v_i: p_i > 0 \}, \quad \bar{S} = \{ v_i : p_i \le 0 \}

where $p_i$ denotes the i-th entry of the corresponding eigenvector.
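A minimal numpy sketch of these steps, on a made-up graph of two triangles joined by a single edge. The generalized eigenproblem $L f = \lambda D f$ is reduced to a standard one for the normalized Laplacian via the substitution $f = D^{-1/2} u$:

```python
import numpy as np

# two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
W = np.zeros((6, 6))
for (i, j) in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W

# L f = lambda D f  <=>  (D^{-1/2} L D^{-1/2}) u = lambda u, with f = D^{-1/2} u
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt

eigvals, eigvecs = np.linalg.eigh(L_norm)   # eigenvalues in ascending order
f = D_inv_sqrt @ eigvecs[:, 1]              # eigenvector for the 2nd smallest

S = {i for i in range(6) if f[i] > 0}
S_bar = set(range(6)) - S
```

The sign pattern of $f$ recovers the "obvious" cut: one triangle lands in $S$ and the other in $\bar{S}$ (which side is which depends on the arbitrary sign of the eigenvector).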

For unnormalized spectral clustering, a similar argument shows that

b_{uw}(G) = \frac{1}{4} |V| \min_{f \in \{ -1, 1 \}^n, \ f^T \mathbf{1} = 0} \frac{f^T L f}{f^T f}

and by relaxation we obtain an identical algorithm, except that the eigenproblem

L f = \lambda f

is solved instead.

Network Data Analysis


  • $\mathcal{W}$ denotes the linear space of all bounded symmetric measurable functions $[0, 1]^2 \to \mathbb{R}$
  • $W \in \mathcal{W}$ is a kernel:

  \big( T_W f \big)(x) = \int_{0}^{1} W(x, y) f(y) \ dy

    Considered as an operator $L_\infty[0, 1] \to L_1[0, 1]$ with norm

  \| T_W \|_{\infty \to 1} = \sup_{f \in L_{\infty}[0, 1] : \| f \|_\infty = 1} \| T_W f \|_1
  • $\rho_n$ is used to control sparsity
  • $I(\cdot)$ is the indicator function
  • $\text{ind}_k(F, G)$ is the expected number of occurrences of subgraph $F$ in graph $G$
  • "random graph" = Erdos-Renyi


  • Treating the adjacency matrix as a multi-dimensional Bernoulli rv. => allows the possibility of "construction" and "destruction" of connections

Real networks often follow:

  • Limiting behaviour

  |V| \to \infty \implies p(e_{ij}) \to 0
  • Heterogeneous: $\frac{p_{ij}}{p_{kl}} \propto n^{-\gamma}$ for some choices of $i \ne k$ or $j \ne l$ and $\gamma > 0$
  • No ordering

Hence, we consider the limiting behaviour as $n \to \infty$.


A graphon is a symmetric measurable function $W: [0, 1]^2 \to [0, 1]$.

Usually a graphon is understood as defining an exchangeable random graph model according to the following scheme:

  1. Each vertex $j$ of the graph is assigned an independent random value $u_j \sim U[0, 1]$
  2. Edge $(i, j)$ is independently included in the graph with probability $W(u_i, u_j)$

A random graph model is an exchangeable random graph model if and only if it can be defined in terms of a (possibly random) graphon in this way.

It is an immediate consequence of this definition and the law of large numbers that, if $W \ne 0$, exchangeable random graph models are dense almost surely.

Define a generative model of random graphs

  1. Fix a nonnegative kernel $W$ and scale $0 \le \rho_n \le \frac{1}{\| W \|_\infty}$
  2. Generate iid Uniform(0, 1) variates $\xi_1, \dots, \xi_n$
  3. Indep. connect vertices $i, j$ w.p. $p_{ij} = \rho_n W(\xi_i, \xi_j)$
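The three steps above, sketched for a made-up rank-one kernel $W(x, y) = xy$ (any bounded symmetric nonnegative kernel works the same way):

```python
import numpy as np

def sample_graph(W, n, rho_n, rng):
    """xi_i iid Uniform(0, 1); connect i, j independently w.p. rho_n * W(xi_i, xi_j)."""
    xi = rng.uniform(size=n)
    P = rho_n * W(xi[:, None], xi[None, :])   # matrix of connection probabilities
    U = rng.uniform(size=(n, n))
    A = np.triu((U < P).astype(int), k=1)     # one draw per pair, no self-loops
    return A + A.T                            # symmetrize

rng = np.random.default_rng(42)
A = sample_graph(lambda x, y: x * y, n=50, rho_n=0.5, rng=rng)
```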

Can we recover $W$ up to scale from an observed graph?

We therefore consider the cut distance, which can be defined naturally on the linear space $\mathcal{W}$ of kernels.

Constructing an exchangeable model

Suppose we give $\pi$ a univariate "prior" distribution

\pi_i \overset{iid}{\sim} g(\cdot)

and scale the resulting mean matrix $\pi \pi'$ by $\rho_n$ to control sparsity:

\mathbb{E}[A_{ij} \mid \pi ] = p_{ij} = \rho_n \cdot \pi_i \pi_j

Let $\mu_n = \mathbb{E}[p_{ij}] = \omega(1 / n)$, then

n \mu_n \mathbb{P}(d_i = k) = g \bigg( \frac{k}{n \mu_n} \bigg) I(k \le n \mu_n)

where $I$ is the indicator function

Stochastic block models

Assume each node is assigned categorical variable $z_i$:

\mathbb{E}[A_{ij} \mid z] = \sum_{a, b} \theta_{ab} I(z_i = a) I(z_j = b) = \theta_{z_i z_j}

The a-th fitted community then comprises $h_a \ge 1$ nodes:

h_a = \sum_{i=1}^{n} I(z_i = a)

i.e. $h_a$ is the number of members in the community.

These "bandwidths" define a CDF $H$ and its inverse $H^{-1}$.

Any community assignment function $z$ has two components:

  1. A vector $h(z) = (h_1, \dots, h_k)$ of community sizes (defines $H$)
  2. A permutation $\pi_z$ of $\{ 1, \dots, n \}$ that re-orders the nodes prior to applying the quantile function $H^{-1}(\cdot / n)$.

Community to which $z$ assigns node $i$ is determined by the composition $H^{-1} \circ \pi_z$:

z_i = H^{-1} \{ \pi_z(i) / n \}, \quad 1 \le i \le n

Thus, each $z$ represents a re-ordering of the network nodes, followed by a partitioning of the unit interval.
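A small sketch of this decomposition with made-up sizes $h = (3, 2, 5)$ on $n = 10$ nodes: $H^{-1}$ becomes a search into the cumulative community sizes, and composing it with a permutation reproduces exactly the prescribed sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
h = [3, 2, 5]                      # community sizes h(z)
cum = np.cumsum(h)                 # H is supported on cum / n
perm = rng.permutation(n) + 1      # pi_z: re-ordering of the nodes, values 1..n

# z_i = H^{-1}(pi_z(i) / n): the smallest community index a with cum[a] >= pi_z(i)
z = np.searchsorted(cum, perm, side='left')

sizes = np.bincount(z, minlength=len(h))   # recovers [3, 2, 5]
```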

Comparing features

  1. Inexact graph matching - maximum common subgraph, etc.
  2. Edit distance - convergence implies that motif prevalences converge also
  3. Motif prevalence

One way

  1. Count the prevalence of subgraph $F$ in graph $G$

 X_F(G) = \sum_{F' \subset G} 1_{F' = F}
  2. Compare to a null-model, typically moment matched (i.e. matching something like mean, variance, etc.)
  3. Compare subgraphs $F$ somehow
  4. Finite connected graph $F_w$ can be mapped out by a closed walk $w = (v_i)_{0 \le i \le k}$ (i.e. a sequence of vertices)
  5. Walks in a network determine its classification as a topological space
  6. Euler characteristic: $v(F_w) - e(F_w)$
  7. Number of closed k-walks in $G$ derives from its adjacency matrix: $\text{tr}(A^k)$
  8. For large classes of networks, walks mapping out maximal trees and cycles dominate => allows comparison between "random graphs"
  9. Turns out that counting cycles allows one to determine whether or not we're just looking at a "random graph"; i.e. if the graph exhibits any interesting behavior or not
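Point 7 is easy to check numerically: the number of closed $k$-walks equals $\text{tr}(A^k)$. A brute-force comparison on a triangle:

```python
import numpy as np
from itertools import product

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])   # adjacency matrix of a triangle
k = 3

def count_closed_walks(A, k):
    """Brute force: enumerate all vertex sequences and keep the closed k-walks."""
    n = A.shape[0]
    count = 0
    for walk in product(range(n), repeat=k):
        steps = list(walk) + [walk[0]]        # close the walk
        if all(A[steps[i], steps[i + 1]] for i in range(k)):
            count += 1
    return count

trace_count = int(np.trace(np.linalg.matrix_power(A, k)))
brute_count = count_closed_walks(A, k)        # both equal 6 for the triangle
```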

Tree decomposition

  • Also called junction trees, clique trees or join trees

Moralization of a graph is the process of turning a directed graph into an undirected one by "marrying" the parents of each node (adding an edge between every pair of its parents) and then dropping edge directions.
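A minimal sketch of moralization, assuming the DAG is given as a dict mapping each node to its list of parents:

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG: marry the parents of each node, then drop directions."""
    undirected = set()
    for child, pa in parents.items():
        for p in pa:
            undirected.add(frozenset((p, child)))   # parent-child edge, undirected
        for p, q in combinations(pa, 2):
            undirected.add(frozenset((p, q)))       # "marry" co-parents
    return undirected

# v-structure a -> c <- b: moralization adds the marrying edge a - b
edges = moralize({'a': [], 'b': [], 'c': ['a', 'b']})
```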

Triangulation refers to the process of turning a graph into a chordal graph.

One of the reasons why this is so useful is when we turn triangulated graph into a clique tree for use in representing a factor graph.

A junction tree or tree decomposition represents the vertices $V$ of a given graph $G$ as subtrees of a tree, in such a way that the vertices in the given graph are adjacent only when the corresponding subtrees intersect.

Thus, $G$ forms a subgraph of the intersection graph of the subtrees. The full intersection graph is a chordal graph.

Given a graph $G = (V, E)$, a junction tree is a pair $(X, T)$

  • $X = \{ X_1, \dots, X_n \}$ where $X_i \subset V$ (in some literature referred to as "supernodes")
  • $T = (V_T, E_T)$ is a tree whose nodes correspond to the subsets $X_i$, i.e. a vertex $v_T^{(i)} \in V_T$ is defined by $v_T^{(i)} = X_i$

And this pair $(X, T)$ is required to satisfy the following:

  1. $\bigcup_i X_i = V$, i.e. each vertex is associated with at least one tree node in $T$
  2. For every edge of the graph there is a supernode containing both of its endpoints, i.e.

 \forall e_{v,w} \in E, \exists X_i : v, w \in X_i
  3. If $X_i$ and $X_j$ both contain a vertex $v$, then all nodes $X_k$ of the tree in (unique) path between $X_i$ and $X_j$ contain $v$ as well, i.e.

  \forall X_i, X_j : v \in X_i \text{ and } v \in X_j \implies \forall X_k \in \gamma(X_i, X_j) \quad v \in X_k

    where $\gamma$ denotes the set of nodes in the (unique) path between $X_i$ and $X_j$

A junction tree is not unique, i.e. there exists multiple ways to perform a decomposition described above.


Figure 5: Tree decomposition

Often you'll see that it's suggested to transform a graph into a chordal graph before attempting to decompose it.

This is due to a theorem which states that the following two are equivalent:

  1. $G$ is triangulated
  2. Clique graph of $G$ has a junction tree

One method of exact marginalization in general graphs is called the junction tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree.

The basic premise is to eliminate cycles by clustering them into single nodes.

Given a triangulated graph, weight the edges of the clique graph by their cardinality, $|A \cap B|$, of the intersection of the adjacent cliques $A$ and $B$. Then any maximum-weight spanning tree of the clique graph is a junction tree.

Thus, to construct a junction tree, we just have to extract a maximum weight spanning tree out of the clique graph.
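That extraction can be sketched with Kruskal's algorithm run on the clique graph, taking heavier separators first (the cliques below are a made-up example from some triangulated graph):

```python
from itertools import combinations

def junction_tree(cliques):
    """Maximum-weight spanning tree of the clique graph, weights |A ∩ B|."""
    edges = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i, j in combinations(range(len(cliques)), 2)),
        reverse=True)                       # heaviest separators first
    parent = list(range(len(cliques)))

    def find(x):                            # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and w > 0:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

cliques = [{'a', 'b', 'c'}, {'b', 'c', 'd'}, {'c', 'd', 'e'}]
T = junction_tree(cliques)
```

The result links clique 0 to 1 and 1 to 2, each with separator size 2, satisfying the running-intersection property above.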

The width of a tree decomposition $(X, T)$ is the size of its largest set $X_i$ minus one, i.e.

w(X, T) = \max_i |X_i| - 1

And the treewidth $\text{tw}(G)$ of a graph $G$ is the minimum width among all possible tree decompositions of $G$, i.e.

\text{tw}(G) = \min_{(X, T)} w(X, T)

Stochastic Block Modelling


  • $n$ number of vertices, thus $v \in [n]$
  • $k$ number of communities
  • $X$ n-dimensional random vector with i.i.d. components distributed under $p$, i.e. representing the community-assignment of each vertex
  • $X_v$ is the random variable representing the community of which vertex $v$ belongs to, taking on values in $[k]$, i.e. $X_v = x_v$ with $x_v \in [k]$
  • $W$ is a $k \times k$ symmetric matrix with vertices $i$ and $j$ being connected with probability $W_{X_i, X_j}$, indep. of other pairs of vertices.
  • $\Omega_i = \Omega_i(X) := \{ v \in [n] \mid X_v = i, i \in [k] \}$ denotes a community set
  • $(X, G)$ denotes a community-graph pair


  • $n, k \in \mathbb{N}$
  • Probability vector $p$ of dimension $k$
  • Symmetric matrix $W$ of dimension $k \times k$ with entries in $[0, 1]$

Then $\text{SBM}(n, p, W)$ defines a n-vertex random graph with labelled vertices, where each vertex is assigned a community label in $\{ 1, \dots, k \}$ independently under the community prior $p$, and pairs of vertices with labels $i$ and $j$ connect independently with probability $W_{i,j}$.

Further generalizations allow for labelled edges and continuous vertex labels, connecting to low-rank approximation models and graphons.

Thus the distribution of $(X, G)$ where $G = \big( [n], E(G) \big)$ is

  \mathbb{P} \{ X = x \} & := \prod_{u = 1}^n p_{x_u} = \prod_{i = 1}^k p_i^{|\Omega_i(x)|} \\
  \mathbb{P} \Big\{ E(G) = e \mid X = x \Big\} & := \prod_{1 \le u < v \le n} W_{x_u, x_v}^{e_{uv}} \big( 1 - W_{x_u, x_v} \big)^{1 - e_{uv}}

which is just saying:

  X_u & \sim \text{Categorical}(p), \quad \forall u \in [n] \\
  E_{uv} & \sim \text{Bernoulli}(W_{x_u, x_v}), \quad \forall u, v \in [n], u \ne v
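Those two sampling lines translate directly; a minimal sketch of drawing $(X, G) \sim \text{SBM}(n, p, W)$ with a made-up assortative $W$:

```python
import numpy as np

def sample_sbm(n, p, W, rng):
    """X ~ p i.i.d. community labels, then A_uv ~ Bernoulli(W[X_u, X_v]) independently."""
    X = rng.choice(len(p), size=n, p=p)
    P = W[X[:, None], X[None, :]]               # pairwise connection probabilities
    A = np.triu((rng.uniform(size=(n, n)) < P).astype(int), k=1)
    return X, A + A.T                           # symmetric, no self-loops

rng = np.random.default_rng(0)
p = [0.5, 0.5]
W = np.array([[0.9, 0.05],
              [0.05, 0.9]])                     # dense within, sparse across communities
X, A = sample_sbm(200, p, W, rng)
```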

Graphical models

Min-Max propagation


Shortest path

Dijkstra's Shortest Path


And here's a super-inefficient implementation of Dijkstra's shortest path algorithm in Python:

import numpy as np
from collections import defaultdict, deque, namedtuple

def is_empty(q):
    return len(q) < 1

def get_neighbors(v, graph):
    neighs = []

    for e in graph.edges:
        if e.vertices[0] == v:
            neighs.append((e, e.vertices[1]))
        elif e.vertices[1] == v:
            neighs.append((e, e.vertices[0]))

    return neighs

# structures
Node = namedtuple('Node', ['value'])
Edge = namedtuple('Edge', ['vertices', 'weight', 'directed'])
Graph = namedtuple('Graph', ['vertices', 'edges'])

# straightforward problem: a chain of vertices with increasing edge weights
V = [Node(i) for i in range(10)]
E = [Edge([V[i - 1], V[i]], i, False) for i in range(1, 9)] + [Edge([V[8], V[9]], 0, False)]
G = Graph(V, E)

# shortcut!
E = E + [Edge([V[4], V[8]], -6, False)]
G = Graph(V, E)

v_0 = V[0]   # start
v_n = V[-1]  # stop

paths = {}
path_weights = defaultdict(lambda: np.inf)

seen = set()
queue = deque([v_0, v_n])


v_curr = queue.popleft()
path_weights[v_curr] = 0  # initialize path weight to 0 from starting point
path_w = path_weights[v_curr]
paths[v_curr] = None

edge, v_neigh = get_neighbors(v_curr, G)[0]
# update best prev vertex candidate for `v_neigh`
paths[v_neigh] = v_curr
# update best candidate weight
path_weights[v_neigh] = edge.weight + path_w

while not is_empty(queue) and v_curr != v_n:
    for (edge, v_neigh) in get_neighbors(v_curr, G):
        if v_neigh == paths[v_curr]:
            # if it's the neighboring vertex we just came from => skip it
            continue

        # proposed path is improvement over previous path with `v_neigh` ?
        candidate_path_weight = path_weights[v_neigh]
        if edge.weight + path_w < candidate_path_weight:
            # update best prev vertex candidate for `v_neigh`
            paths[v_neigh] = v_curr
            # update best candidate weight
            path_weights[v_neigh] = edge.weight + path_w

        if v_neigh not in seen:
            # newly discovered vertex => schedule a visit
            seen.add(v_neigh)
            queue.append(v_neigh)

    # sort `queue` so that the next visit is to the current "shortest" path
    queue = deque(sorted(queue, key=lambda v: path_weights[v]))

    v_curr = queue.popleft()
    path_w = path_weights[v_curr]

# upon termination we simply walk backwards to get the shortest path
optimal_path = [v_n]
v_curr = v_n
while v_curr != v_0:
    v_curr = paths[v_curr]
    optimal_path.append(v_curr)
optimal_path.reverse()

def display_path(path):
    print("Optimal path: " + " --> ".join([str(v.value) for v in path]))

display_path(optimal_path)
print("Observe that some nodes are NEVER seen, as we'd like!")
Optimal path: 0 --> 1 --> 2 --> 3 --> 4 --> 8 --> 9
Observe that some nodes are NEVER seen, as we'd like!
#include <iostream>
#include <memory>

template<typename T>
struct BinTreeNode {
  T value;
  std::shared_ptr<BinTreeNode<T>> left = nullptr;
  std::shared_ptr<BinTreeNode<T>> right = nullptr;

  BinTreeNode(T v) : value(v) { }
};

int main() {
  auto root = std::make_shared<BinTreeNode<int>>(0);
  root->left = std::make_shared<BinTreeNode<int>>(1);
  root->right = std::make_shared<BinTreeNode<int>>(2);

  root->left->left = std::make_shared<BinTreeNode<int>>(4);

  std::cout << root->value << std::endl;
  std::cout << root->left->value << std::endl;
  std::cout << root->left->left->value << std::endl;

  std::shared_ptr<BinTreeNode<int>> current = root;
  for (auto i = 0; i < 10; i++) {
    // with "raw" pointers these `BinTreeNode` instances would live long enough,
    // but we'd risk dangling pointers in the case where we delete a parent node,
    // UNLESS we explicitly delete the children too (e.g. via a custom destructor
    // for `BinTreeNode`)

    // Instead we use `std::shared_ptr`, which keeps an internal reference count
    // and deallocates only when there are no references left to the object
    auto next = std::make_shared<BinTreeNode<int>>(i);
    std::cout << next->value << std::endl;
    if (next->value >= current->value) {
      current->right = next;
    } else {
      current->left = next;
    }

    current = next;
  }

  std::cout << root->right->right->value << std::endl;
  return 0;
}
Bellman-Ford algorithm

  • Computes shortest paths from single source vertex to all other vertices in a weighted digraph
  • Slower than Dijkstra's Algorithm for same problem, but more versatile
    • Capable of handling edges with negative weights
  • Can be used to discover negative cycles (i.e. a cycle whose edges sum to a negative value)
    • E.g. arbitrage opportunities on Forex markets
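The arbitrage point can be made concrete: exchange rates multiply along a cycle, so giving each edge the weight $-\log(\text{rate})$ turns a profitable cycle (product of rates $> 1$) into a negative cycle (sum of weights $< 0$), which Bellman-Ford can detect. A minimal sketch with made-up rates (not real market data):

```python
import math

# hypothetical exchange rates, chosen so the cycle is profitable
rates = {
    ("USD", "EUR"): 0.9,
    ("EUR", "GBP"): 0.9,
    ("GBP", "USD"): 1.3,
}

# product of rates around the cycle USD -> EUR -> GBP -> USD
cycle_product = 0.9 * 0.9 * 1.3            # = 1.053 > 1  => arbitrage

# as edge weights: w = -log(rate); the cycle sum is < 0 iff the product is > 1
weights = {pair: -math.log(r) for pair, r in rates.items()}
print(sum(weights.values()) < 0)           # → True
```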

Relation to Dijkstra's algorithm:

  • Both based on principle of relaxation:
    • Approximation of the correct distance is gradually replaced by more accurate values until eventually reaching the optimum solution
    • Approx. distance to each vertex is an overestimate of the true distance, and is replaced by the minimum of its old value and the length of the newly found path
      • All nodes are initialized to be $\infty$ away from the source vertex $v_s$, i.e. an overestimate
      • Update these distances at each iteration
  • Difference:
    • Dijkstra's: uses priority queue to greedily select the closest vertex that has not been processed, and performs this relaxation process on all of its outgoing edges
    • BF: simply relaxes all the edges, and does this $|V| - 1$ times (since there can be at most $|V| - 1$ edges on a path between any two vertices)
      • Can terminate early if the shortest paths do not change between two iterations
      • In practice, the order in which we update the edges can improve the runtime
def bellman_ford(vertices, edges, source):
    distances = {source: 0.0}
    previous = {}

    # instead of using infinity as initial value, we simply
    # check if the vertex is already in ``distances`` as a
    # proxy for "infinity" distance. Hence we do not need an initialization.

    # perform relaxation iterations
    diff = True
    i = 0

    while diff and i < len(vertices) - 1:
        diff = False

        # iterate over vertices which are reachable
        for v in (v for v in vertices if v in distances):
            v_dist = distances[v]
            outgoing_edges = [e for e in edges if e[0] == v]

            for (_, u, w) in outgoing_edges:
                if u not in distances or v_dist + w < distances[u]:
                    # found new shortest path to vertex ``u``
                    distances[u] = v_dist + w
                    previous[u] = v

                    # made a change
                    diff = True

        # update counter
        i += 1

    return distances

# negative cycle
edges = [
    (0, 1, 1.0),
    (1, 2, 1.0),
    (0, 2, 4.0),
    (2, 3, 1.0),
    (3, 0, -5.0),
]
vertices = [v for v in range(4)]
bellman_ford(vertices, edges, 0)
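The implementation above stops after at most $|V| - 1$ rounds but never reports whether a negative cycle exists. A sketch of how detection could be added: run one extra relaxation pass after convergence, and if any distance still improves, a negative cycle is reachable from the source. (`has_negative_cycle` is an illustrative helper, not part of the code above; the $|V| - 1$ relaxation rounds are re-implemented inline so the sketch stays self-contained.)

```python
def has_negative_cycle(edges, distances):
    # one extra relaxation pass after |V| - 1 rounds: if any edge can
    # still be relaxed, a negative cycle is reachable from the source
    for (v, u, w) in edges:
        if v in distances and distances[v] + w < distances.get(u, float("inf")):
            return True
    return False

# the negative-cycle example from above
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 4.0), (2, 3, 1.0), (3, 0, -5.0)]
vertices = list(range(4))

# |V| - 1 rounds of edge relaxation (inlined Bellman-Ford)
distances = {0: 0.0}
for _ in range(len(vertices) - 1):
    for (v, u, w) in edges:
        if v in distances and distances[v] + w < distances.get(u, float("inf")):
            distances[u] = distances[v] + w

print(has_negative_cycle(edges, distances))  # → True
```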


  Algorithm      Greedy   Negative edges?   Complexity
  Dijkstra       Y        N                 $\mathcal{O}((V + E) \log V)$ (binary heap)
  Bellman-Ford   N        Y                 $\mathcal{O}(V \times E)$


graph-tool   python

import graph_tool.all as gt

g = gt.collection.data["football"]
# Out[4]:
: <Graph object, undirected, with 115 vertices and 613 edges at 0x7f25204da438>
state = gt.minimize_blockmodel_dl(g)
# Out[6]:
: <BlockState object with 10 blocks (10 nonempty), degree-corrected, for graph <Graph object, undirected, with 115 vertices and 613 edges at 0x7f25204da438>, at 0x7f24d806d588>
_ = state.draw(pos=g.vp.pos, output="./.graph_theory/figures/football-sbm-fit.png")


Figure 7: Graph of the football-dataset with the different communities colored.

%matplotlib inline
import matplotlib.pyplot as plt

edges = state.get_matrix()
_ = plt.matshow(edges.todense())


Figure 8: Connections between the different clusters (and the degrees within the clusters along the diagonal)



  • Graph is a pair $G = \big( V, E \big)$ where $E \subset V^{(2)} = \left\{ Y \subset V: \left| Y \right| = 2 \right\}$, e.g.

V = \left\{ a, b, c, d \right\}, \quad E = \left\{ \left\{ a, b \right\}, \left\{ a, c \right\}, \left\{ a, d \right\}, \left\{ b, c \right\} \right\}
  • Order of $G$ refers to $|V(G)|$, which is often denoted $|G|$
  • Size of $G$ is $|E(G)|$, often denoted $e(G)$
  • Will only deal with simple graphs (or rather, the def above only includes simple graphs)
  • $ab$ often used to refer to the edge $\left\{ a, b \right\}$
  • If $ab \in E(G)$ we say "$a$ is adjacent to $b$" and might write $ab \in G$
  • Two graphs $G, H$ are *isomorphic* if there exists a bijection $f: V(G) \to V(H)$ such that

uv \in E(G) \quad \iff \quad f(u) f(v) \in E(H)

    i.e. edge-preserving bijection between vertices

  • $E_n$ is empty graph of order $n$ and size 0
  • $K_n$ is complete graph of order $n$ and size ${n \choose 2}$, i.e. all pairs of vertices adjacent
  • $P_n$ is a path of order $n$ and size $n - 1$
  • $C_n$ is a circuit of order $n$ and size $n$, i.e. just a path with the ends joined, or a cycle
  • A subgraph of $G$ is a graph $H$ with $V(H) \subset V(G)$ and $E(H) \subset E(G)$ (so every graph of order $\le n$ is a subgraph of $K_n$)
  • A subgraph of $G$ induced by the subset $W \subset V(G)$, written $G[W]$, is the graph $\big( W, E(G) \cap W^{(2)} \big)$, i.e. the vertex set $W$ and all edges of $G$ lying inside $W$
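The isomorphism definition above lends itself to a direct, if hopelessly inefficient, computational check: try every bijection $f: V(G) \to V(H)$ and test the edge-preservation condition. A brute-force sketch ($\mathcal{O}(n!)$, illustration only; edges represented as two-element frozensets):

```python
from itertools import permutations

def are_isomorphic(V1, E1, V2, E2):
    # brute-force search for an edge-preserving bijection f: V1 -> V2
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    V1, V2 = list(V1), list(V2)
    for perm in permutations(V2):
        f = dict(zip(V1, perm))
        # f is an isomorphism iff every edge of E1 maps to an edge of E2
        if all(frozenset({f[u], f[v]}) in E2 for (u, v) in map(tuple, E1)):
            return True
    return False

# the example graph from above: V = {a, b, c, d}
E = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]}
# the same graph relabelled: a -> 1, b -> 2, c -> 3, d -> 4
E_relabel = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (2, 3)]}
print(are_isomorphic({"a", "b", "c", "d"}, E, {1, 2, 3, 4}, E_relabel))  # → True
```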

Lecture 1

The components of $G$ are the maximal connected subgraphs induced by the equivalence classes of the relation "there is a u-v-path or $u = v$".

A forest is a graph containing no circuit.

A tree is a connected forest, i.e. components of a forest are trees. Equivalently, it's connected but has no circuit.

The following are equivalent

  1. $G$ is a tree (connected but has no circuit)
  2. $G$ is minimal connected, i.e. $G$ is connected but removal of any edge kills connectivity
  3. $G$ is maximal circuit-free, i.e. $G$ has no circuit, but addition of any new edge creates a circuit

$(1) \implies (2)$: Let $G$ be connected and circuit-free. Suppose $uv \in G$ and $G - uv$ is connected (i.e. $G$ without the edge $uv$). Then there is a u-v-path $P$ in $G - uv$, so $G$ contains the circuit $P + uv$, a contradiction. Hence removing any edge disconnects $G$, i.e. $G$ is minimal connected.

$(2) \implies (1)$: Let $G$ be minimal connected. Suppose $G$ has a circuit $C$ and let $xy \in C$. Then $x$ and $y$ are connected by at least two paths, so we can remove one edge along one of these paths and $G$ remains connected, contradicting minimality. Hence $G$ is circuit-free.

$(1) \implies (3)$: Let $G$ be a tree. If $uv \notin E(G)$ there is a path $P$ in $G$ from $u$ to $v$, so $P + uv$ is a circuit in $G + uv$. So $G$ is maximal circuit-free.

$(3) \implies (1)$: Let $G$ be maximal circuit-free. Let $u, v \in V(G)$. Either $uv \in E(G)$ or $G + uv$ has a circuit $C$ so $C - uv$ is a u-v-path in $G$. Either way $u$ is joined to $v$, so $G$ is connected.
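Characterization (1) can also be checked mechanically: a graph is a tree iff adding its edges to a union-find structure never closes a circuit and exactly $n - 1$ edges are used (a circuit-free graph with $n$ vertices and $n - 1$ edges must be connected). A sketch, assuming vertices are labelled $0, \dots, n-1$:

```python
def is_tree(n, edges):
    # union-find over vertices 0..n-1
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v) in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False        # edge inside a component => circuit
        parent[ru] = rv

    # circuit-free with exactly n - 1 edges <=> connected, i.e. a tree
    return len(edges) == n - 1

# a path P_4 is a tree; adding the edge {0, 3} creates the circuit C_4
path = [(0, 1), (1, 2), (2, 3)]
print(is_tree(4, path))                 # → True
print(is_tree(4, path + [(0, 3)]))      # → False
```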

Discrete Exterior Calculus