Measure theory

Notation

  • $1_S$ and $\chi_S$ are used to denote the indicator or characteristic function of a set $S$

Definition

Motivation

The motivation behind defining such a thing is related to the Banach-Tarski paradox, which says that it is possible to decompose the 3-dimensional solid unit ball into finitely many pieces and, using only rotations and translations, reassemble the pieces into two solid balls each with the same volume as the original. The pieces in the decomposition, constructed using the axiom of choice, are non-measurable sets.

Informally, the axiom of choice says that given a collection of bins, each containing at least one object, it is possible to make a selection of exactly one object from each bin.

Measure space

If $X$ is a set with a sigma-algebra $\Sigma$ and a measure $\mu$, then the triple $(X, \Sigma, \mu)$ is a measure space.

Product measure

Given two measurable spaces and measures on them, one can obtain a product measurable space and a product measure on that space.

A product measure $\mu_1 \times \mu_2$ is defined to be a measure on the measurable space $(X_1 \times X_2, \Sigma_1 \otimes \Sigma_2)$, where we've let $\Sigma_1 \otimes \Sigma_2$ be the σ-algebra on the Cartesian product $X_1 \times X_2$ called the tensor-product sigma-algebra on the product space, which is defined

\begin{equation*}
\Sigma_1 \otimes \Sigma_2 := \sigma \big( A_1 \times A_2 : A_i \in \Sigma_i \big)
\end{equation*}

A product measure $\mu_1 \times \mu_2$ is defined to be a measure on the measurable space $(X_1 \times X_2, \Sigma_1 \otimes \Sigma_2)$ satisfying the property

\begin{equation*}
(\mu_1 \times \mu_2) (B_1 \times B_2) = \mu_1(B_1) \mu_2 (B_2), \quad \forall \ B_1 \in \Sigma_1, \ B_2 \in \Sigma_2
\end{equation*}
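
As a quick sanity check of this defining identity, here is a minimal sketch using counting measures on two small finite sets (the sets and helper names are illustrative only):

#+begin_src python
from itertools import product

def counting_measure(A):
    """Counting measure of a finite set: just the number of elements."""
    return len(A)

X1, X2 = {1, 2, 3}, {"a", "b"}
B1, B2 = {1, 3}, {"b"}

# The measurable "rectangle" B1 x B2 inside X1 x X2.
B = set(product(B1, B2))

# For counting measures, the product measure of a rectangle is its
# cardinality, which factors as mu1(B1) * mu2(B2).
assert counting_measure(B) == counting_measure(B1) * counting_measure(B2) == 2
#+end_src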

$\liminf$ and $\limsup$

Let $\big( a_m \big)_{m = 0}^{\infty}$ be a sequence of extended real numbers.

The limit inferior is defined

\begin{equation*}
\liminf_{n \to \infty} a_n = \ \uparrow \lim_{n \to \infty} \Big( \inf_{m \ge n} a_m \Big)
\end{equation*}

The limit superior is defined

\begin{equation*}
\limsup_{n \to \infty} a_n = \ \downarrow \lim_{n \to \infty} \Big( \sup_{m \ge n} a_m \Big)
\end{equation*}
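
Numerically, the tail infima increase and the tail suprema decrease, which is exactly what the arrows above record. A small sketch; the sequence $a_n = (-1)^n (1 + 1/(n+1))$ is an arbitrary example of mine, with $\liminf = -1$ and $\limsup = 1$:

#+begin_src python
a = [(-1) ** n * (1 + 1 / (n + 1)) for n in range(1000)]

def tail_inf(seq, n):
    return min(seq[n:])  # inf_{m >= n} a_m (truncated to the finite tail)

def tail_sup(seq, n):
    return max(seq[n:])  # sup_{m >= n} a_m

# Tail infima increase in n, tail suprema decrease in n:
assert tail_inf(a, 10) <= tail_inf(a, 100) <= tail_sup(a, 100) <= tail_sup(a, 10)
print(tail_inf(a, 900), tail_sup(a, 900))  # roughly -1.0 and 1.0
#+end_src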

Premeasure

Given a space $\Omega$, a collection of sets $\mathcal{A} \subseteq \text{Pow}(\Omega)$ is an algebra of sets on $\Omega$ if

  • $\emptyset \in \mathcal{A}$
  • If $S \in \mathcal{A}$, then $S^c \in \mathcal{A}$
  • If $S$ and $T$ are in $\mathcal{A}$, then $S \cup T \in \mathcal{A}$

Thus, an algebra of sets allows only finite unions, unlike σ-algebras, where we allow countable unions.

Given a space $\Omega$ and an algebra $\mathcal{A}$, a premeasure is a function $\lambda: \mathcal{A} \to [0, \infty]$ such that

  • $\lambda(\emptyset) = 0$
  • For every finite or countable collection of disjoint sets $\{ S_i \}_{i = 1}^N$ with $N \in \mathbb{N} \cup \left\{ \infty \right\}$, if $\bigcup_{i = 1}^{N} S_i \in \mathcal{A}$ then

    \begin{equation*}
\lambda \bigg( \bigcup_{i=1}^{N} S_i \bigg) = \sum_{i=1}^{N} \lambda (S_i)
\end{equation*}

Observe that the last property says that IF this "possibly infinite" union happens to be in the algebra, THEN its premeasure must equal that sum.

A premeasure space is a triple $\big( \Omega, \mathcal{A}, \lambda \big)$ where $\Omega$ is a space, $\mathcal{A}$ is an algebra, and $\lambda$ is a premeasure.

Complete measure

A complete measure (or, more precisely, a complete measure space) is a measure space in which every subset of every null set is measurable (and thus itself has measure zero).

More formally, $(\Omega, \mathcal{A}, \mu)$ is complete if and only if

\begin{equation*}
B \subseteq A \in \mathcal{A} \quad \text{and} \quad \mu(A) = 0 \implies B \in \mathcal{A}
\end{equation*}

If $\big( \Omega, \mathcal{A}, \lambda \big)$ is a premeasure space, then there is a complete measure space $\big( \Omega, \mathcal{M}, \mu \big)$ such that

  • $\mathcal{A} \subseteq \mathcal{M}$
  • $\forall A \in \mathcal{A}$ we have $\mu(A) = \lambda(A)$

If $\lambda$ is σ-finite, then $\mu|_{\sigma(\mathcal{A})}$ is the only measure on $\sigma(\mathcal{A})$ that is equal to $\lambda$ on $\mathcal{A}$.

Atomic measure

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure space.

Then a set $A \in \mathcal{A}$ is called an atom if

\begin{equation*}
\mu(A) > 0
\end{equation*}

and

\begin{equation*}
\mu(B) < \mu(A) \implies \mu(B) = 0, \quad \forall B \subseteq A \text{ measurable}
\end{equation*}

A measure $\mu$ which has no atoms is called non-atomic or diffuse.

In other words, a measure $\mu$ is non-atomic if for any measurable set $A$ with $\mu(A) > 0$, there exists a measurable subset $B \subset A$ s.t.

\begin{equation*}
\mu(A) > \mu(B) > 0
\end{equation*}

π-system

Let $X$ be any set. A family $\mathcal{F}$ of subsets of $X$ is called a π-system if

  1. $\emptyset \in \mathcal{F}$
  2. If $A_1, \dots, A_n \in \mathcal{F}$, then

    \begin{equation*}
\bigcap_{i = 1}^{n} A_i \in \mathcal{F}
\end{equation*}

So this is an even weaker notion than being a (Boolean) algebra. We introduce it because it's sufficient to prove uniqueness of measures:

Let $\big( X, \mathcal{A} \big)$ be a measurable space and $\mu_1, \mu_2$ be two finite measures on $X$ s.t.

\begin{equation*}
\mu_1(F) = \mu_2(F), \quad \forall F \in \mathcal{F} \cup \left\{ X \right\}
\end{equation*}

where $\mathcal{F}$ is a pi-system such that

\begin{equation*}
\sigma(\mathcal{F}) = \mathcal{A}
\end{equation*}

Then

\begin{equation*}
\mu_1 = \mu_2
\end{equation*}

Theorems

Jensen's inequality

Let

  • $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space
  • $X: \Omega \to \mathbb{R}$ be random variable
  • $c: \mathbb{R} \to (-\infty, \infty]$ be a convex function. Then $c$ is the supremum of a sequence of affine functions

    \begin{equation*}
c(x) = \sup_{n \in \mathbb{N}} \big( a_n x + b_n \big), \quad x \in \mathbb{R}
\end{equation*}

    for $(a_n, b_n)_{n \in \mathbb{N}}$, with $a_n, b_n \in \mathbb{R}$.

Then $\mathbb{E}[c(X)]$ is well-defined, and since $c(X) \ge a_n X + b_n$ almost surely, taking expectations gives

\begin{equation*}
\mathbb{E}[c(X)] \ge a_n \mathbb{E}[X] + b_n
\end{equation*}

Taking the supremum over $n \in \mathbb{N}$ in this inequality, we obtain

\begin{equation*}
\mathbb{E}[c(X)] \ge c \big( \mathbb{E}[X] \big)
\end{equation*}

Let $c: \mathbb{R} \to (-\infty, \infty]$ be a convex function. Then $c(x)$ is the supremum of a sequence of affine functions

\begin{equation*}
c(x) = \sup_{n \in \mathbb{N}} \big( a_n x + b_n \big), \quad x \in \mathbb{R}
\end{equation*}

Suppose $c$ is convex, then for each point $\big( \alpha, c(\alpha) \big)$ there exists an affine function $f_{\alpha}(x) = a_{\alpha} x + b_{\alpha}$ s.t.

  • the line $L_{\alpha}$ corresponding to $f_{\alpha}$ passes through $\big( \alpha, c(\alpha) \big)$
  • the graph of $c$ lies entirely above $L_{\alpha}$

Let $A = \left\{ f_{\alpha} \mid \alpha \in \mathbb{R} \right\}$ be the set of all such functions. We have

  • $\sup_{f_{\alpha} \in A} f_{\alpha}(x) \ge f_x(x) = c(x)$ because $f_x$ passes through the point $\big( x, c(x) \big)$
  • $\sup_{f_{\alpha} \in A} f_{\alpha}(x) \le c(x)$ because all $f_{\alpha}$ lie below $c$

Hence

\begin{equation*}
\sup_{f_{\alpha} \in A} f_{\alpha}(x) = c(x)
\end{equation*}

(note this is for each $x$, i.e. pointwise).
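
A quick Monte Carlo sanity check of the conclusion; the convex function $c = \exp$ and $X \sim \mathcal{N}(0, 1)$ are arbitrary choices for illustration:

#+begin_src python
import math
import random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

mean_x = sum(xs) / len(xs)                        # estimates E[X] = 0
mean_cx = sum(math.exp(x) for x in xs) / len(xs)  # estimates E[e^X] = e^{1/2}

# Jensen: E[c(X)] >= c(E[X]); here roughly 1.65 >= 1.
assert mean_cx >= math.exp(mean_x)
#+end_src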

Sobolev space

Notation

  • $\Omega$ is an open subset of $\mathbb{R}^n$
  • $\varphi \in C_c^{\infty}(\Omega)$ denotes an infinitely differentiable function $\varphi$ with compact support
  • $\alpha$ is a multi-index of order $|\alpha| = k$, i.e.

    \begin{equation*}
D^{\alpha} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \dots \partial x_n^{\alpha_n}}
\end{equation*}

Definition

Vector space of functions equipped with a norm that is a combination of $L^p$ norms of the function itself and its derivatives up to a given order.

Intuitively, a Sobolev space is a space of functions with sufficiently many derivatives for some application domain, e.g. PDEs, and equipped with a norm that measures both size and regularity of a function.

The Sobolev spaces $W^{k,p}(\Omega)$ combine the concepts of weak differentiability and Lebesgue norms (i.e. $L^p$ spaces).

For a proper definition covering the different cases (depending on $k$, $p$, and the dimension $n$), have a look at Wikipedia.

Motivation

Integration by parts yields that for every $u \in C^k(\Omega)$ where $k \in \mathbb{N}$, and for all infinitely differentiable functions with compact support $\varphi \in C_c^{\infty}(\Omega)$:

\begin{equation*}
\int_{\Omega} u D^\alpha \varphi \ dx = \big( -1 \big)^{|\alpha|} \int_{\Omega} \varphi D^{\alpha} u \ dx
\end{equation*}

Observe that LHS only makes sense if we assume $u$ to be locally integrable. If there exists a locally integrable function $v$, such that

\begin{equation*}
\int_{\Omega} u D^{\alpha} \varphi \ dx = \big( -1 \big)^{|\alpha|} \int_{\Omega} \varphi v \ dx ,\quad \varphi \in C_c^{\infty}(\Omega)
\end{equation*}

we call $v$ the weak $\alpha \text{-th}$ partial derivative of $u$. If this exists, then it is uniquely defined almost everywhere, and thus it is uniquely determined as an element of a Lebesgue space (i.e. an $L^p$ function space).

On the other hand, if $u \in C^k(\Omega)$, then the classical and the weak derivative coincide!

Thus, if $v = \partial_{\alpha} u$, we may denote it by $D^{\alpha} u := v$.

Example

\begin{equation*}
u(x)=\begin{cases}1+x&-1<x<0\\10&x=0\\1-x&0<x<1\\0&{\text{otherwise}}\end{cases}
\end{equation*}

is not continuous at zero, and not differentiable at −1, 0, or 1. Yet the function

\begin{equation*}
v(x)= \begin{cases}1&-1<x<0\\-1&0<x<1\\0&{\text{otherwise}}\end{cases}
\end{equation*}

satisfies the definition of being the weak derivative of $u(x)$, which then qualifies $u$ as being in the Sobolev space $W^{1, p}$ (for any allowed $p$).
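
The defining identity can also be checked numerically. A sketch, where the bump test function $\varphi$ (supported on $[-0.2, 0.8]$) and the quadrature parameters are my own choices:

#+begin_src python
import math

def u(x):  # the hat function above (the value at 0 is irrelevant a.e.)
    if -1 < x < 0: return 1 + x
    if 0 < x < 1:  return 1 - x
    return 0.0

def v(x):  # the claimed weak derivative
    if -1 < x < 0: return 1.0
    if 0 < x < 1:  return -1.0
    return 0.0

def phi(x):  # smooth bump in C_c^infty, supported on [-0.2, 0.8]
    s = (x - 0.3) / 0.5
    return math.exp(-1 / (1 - s * s)) if abs(s) < 1 else 0.0

def dphi(x):  # phi'(x), via the chain rule
    s = (x - 0.3) / 0.5
    return phi(x) * (-4 * s / (1 - s * s) ** 2) if abs(s) < 1 else 0.0

# Midpoint-rule quadrature of both sides of  int u phi' dx = -int v phi dx.
N = 20_000
h = 2.0 / N
pts = [-1 + (k + 0.5) * h for k in range(N)]
lhs = h * sum(u(x) * dphi(x) for x in pts)
rhs = -h * sum(v(x) * phi(x) for x in pts)
assert abs(lhs - rhs) < 1e-6
#+end_src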

Lebesgue measure

Notation

  • $\mathcal{M}$ denotes the collection of all measurable sets

Stuff

Given a subset $E \subseteq \mathbb{R}$, with the length of a closed interval $I = [a,b]$ given by $\ell (I) = b - a$, the Lebesgue outer measure $\lambda^* (E)$ is defined as

\begin{equation*}
\lambda^*(E) = \inf \Bigg\{ \sum_{k=1}^{\infty} \ell(I_k) : (I_k)_{k \in \mathbb{N}} \text{ is a sequence of open intervals with } E \subseteq \underset{k=1}{\overset{\infty}{\cup}} I_k \Bigg\}
\end{equation*}

Lebesgue outer-measure has the following properties:

  1. $\lambda^*(\emptyset) = 0$ Idea: Cover by $I_j = \emptyset, \forall j$.
  2. (Monotonicity) $\forall S, T \in \text{Pow}(\mathbb{R})$ if $S \subseteq T$, then

    \begin{equation*}
\lambda^*(S) \le \lambda^*(T)
\end{equation*}

    Idea: a cover of $T$ is a cover of $S$.

  3. (Countable subadditivity) For every set $S \in \text{Pow}(\mathbb{R})$ and every sequence of sets $\{ S_i \}_{i = 0}^{\infty}$, if $S \subseteq \bigcup_{i = 0}^{\infty} S_i$ then

    \begin{equation*}
\lambda^*(S) \le \sum_{i=0}^{\infty} \lambda^*(S_i)
\end{equation*}

    Idea: construct a cover of each $S_i$, $\{ I_{i, k} \}$ such that $\sum_{k=0}^{\infty} \ell(I_{i, k}) < \lambda^*(S_i) + \varepsilon 2^{-i}$:

    • Every point in $S$ is in one of the $S_i$
    • $\lambda^*(S) \le \sum_{i, k}^{} \ell(I_{i, k}) \le \sum_{i=0}^{\infty} \big( \lambda^*(S_i) + \varepsilon 2^{-i} \big) = \sum_{i=0}^{\infty} \lambda^*(S_i) + \varepsilon$

Q: Is it possible for every $S \subseteq \mathbb{R}$ to find a cover $\{ I_i \}$ such that $\lambda^*(S) = \sum_{i=0}^{\infty} \ell (I_i)$? A: No. Consider $\{ 0 \}$. Given $\varepsilon > 0$, consider $\{ \big( -\varepsilon, \varepsilon \big), \emptyset, \emptyset, \dots \}$. This is a cover of $\{ 0 \}$, so $\lambda^*(\left\{ 0 \right\}) \le \ell \Big( (-\varepsilon, \varepsilon) \Big) = 2 \varepsilon \to 0$. If $\{ I_i \}_{i = 0}^\infty$ is a cover by open intervals of $\{ 0 \}$, then there is at least one $j$ such that $I_j$ is a nonempty open interval, so it has a strictly positive length, and

\begin{equation*}
\sum_{i=0}^{\infty} \ell(I_i) \ge \ell(I_j) > 0 = \lambda^*(\left\{ 0 \right\})
\end{equation*}

If $- \infty < a < b < \infty$, then

\begin{equation*}
\lambda^* \big( [a, b] \big) = \lambda^*([a, b)) = \lambda^* \big( (a, b] \big) = \lambda^* \big( (a, b) \big)
\end{equation*}

Idea: $\big( a ,b \big) \subseteq [a, b]$, so $\lambda^* \big( (a, b) \big) \le \lambda^* \big( [a, b] \big)$. For the reverse, cover $\big( a, b \big)$ by intervals giving a sum within $\varepsilon$. Then cover $\{ a \}$ and $\{ b \}$ by intervals of length $2 \varepsilon$. Put the 2 new sets at the start of the sequence, to get a cover of $[a, b]$, whose sum of lengths is at most $2 \varepsilon + 2 \varepsilon + \lambda^* \big( (a, b) \big) + \varepsilon = \lambda^* \big( (a, b) \big) + 5 \varepsilon \searrow \lambda^* \big( (a, b) \big)$. Hence,

\begin{equation*}
\lambda^* \big( (a, b) \big) \le \lambda^* \big( [a, b] \big) \text{ and } \lambda^* \big( [a, b] \big) \le \lambda^* \big( (a, b) \big)
\end{equation*}

If $I$ is an open interval, then $\lambda^* (I) = \ell(I)$.

Idea: lower bound from $\{ I, \emptyset, \emptyset, \dots \}$. Only bounded nonempty intervals are interesting. Take the closure to get a compact set. Given a countable cover by open intervals, reduce to a finite subcover. Then arrange a finite collection of intervals in something like increasing order, possibly dropping unnecessary sets. Call these new intervals $J_i = \big( c_i, d_i \big)$ and let $p$ be the number of such intervals, and such that

\begin{equation*}
c_1 < a \quad \text{and} \quad d_p > b
\end{equation*}

i.e. the left-most interval covers the starting point, and the right-most interval covers the end point. Then

\begin{equation*}
\sum_{k=0}^{\infty} \ell \big( I_k \big) \ge \sum_{i=1}^{p} \ell (J_i) = \sum_{i=1}^{p} \big( d_i - c_i \big) > d_p - c_1 > b - a
\end{equation*}

Taking the infimum,

\begin{equation*}
\lambda^* \big( (a, b) \big) \ge \ell \big( (a, b) \big)
\end{equation*}

The Lebesgue measure is then defined on the Lebesgue sigma-algebra, which is the collection of all the sets $E$ which satisfy the condition that, for every $A \subseteq \mathbb{R}$

\begin{equation*}
\lambda^*(A) = \lambda^* (A \cap E) + \lambda^* (A \cap E^c)
\end{equation*}

For any set in the Lebesgue sigma-algebra, its Lebesgue measure is given by its Lebesgue outer measure: $\lambda (E) = \lambda^*(E)$.

IMPORTANT!!! This is not necessarily related to the Lebesgue integral! It CAN be, but the integral is more general than JUST being over some Lebesgue measure.

Intuition

  • First part of definition states that the subset $E$ is reduced to its outer measure by coverage by sets of open intervals
  • Each set of intervals $I$ covers $E$ in the sense that when the intervals are combined together by union, they contain $E$
  • Total length of any covering interval set can easily overestimate the measure of $E$, because $E$ is a subset of the union of the intervals, and so the intervals include points which are not in $E$

Lebesgue outer measure emerges as the greatest lower bound (infimum) of the lengths from among all possible such sets. Intuitively, it is the total length of those interval sets which fit $E$ most tightly and do not overlap.

In my own words: the Lebesgue outer measure is the smallest sum of the lengths of subintervals $I_k$ s.t. the union of these subintervals $I_k$ completely "covers" (i.e. contains) $E$.

If you take a real interval $I = [a, b]$, then the Lebesgue outer measure is simply $\ell(I) = b - a$.
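
For a finite union of open intervals, the infimum is attained by simply merging overlaps, which makes for a nice toy computation (the helper below is a sketch for this special case, not a general outer measure):

#+begin_src python
def length_of_interval_union(intervals):
    """Total length of a finite union of open intervals (a, b)."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:          # overlaps the previous one,
            merged[-1][1] = max(merged[-1][1], b)  # so extend it
        else:
            merged.append([a, b])
    return sum(b - a for a, b in merged)

# (0,1) u (0.5,2) u (3,4) merges to (0,2) u (3,4), so lambda* = 2 + 1 = 3.
assert length_of_interval_union([(0, 1), (0.5, 2), (3, 4)]) == 3.0
#+end_src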

Properties

Notation

  • For $\alpha \in \mathbb{R}$ and $S \subseteq \mathbb{R}$, we let

    \begin{equation*}
\begin{split}
S + \alpha &= \left\{ x \in \mathbb{R} : \exists y \in S \text{ s.t. } x = y + \alpha \right\} \\
\alpha S &= \left\{ x \in \mathbb{R} : \exists y \in S \text{ s.t. } x = \alpha y \right\}
\end{split}
\end{equation*}

Stuff

The collection of Lebesgue measurable sets is a sigma-algebra.

  1. Easy to see $\emptyset$ is in this collection:

    \begin{equation*}
m^*(E) = m^* (E \cap \emptyset) + m^*(E \cap \emptyset^c) = m^*(\emptyset) + m^*(E) = m^*(E)
\end{equation*}
  2. Closed under complements is clear: let $S \subseteq \mathbb{R}$ be Lebesgue measurable, then

    \begin{equation*}
m^*(E) = m^*(E \cap S) + m^*(E \cap S^c ) = m^*\big(E \cap (S^c)^c \big) + m^* \big( E \cap S^c \big)
\end{equation*}

    hence this is also true for $S^c$, and so $S^c$ is Lebesgue measurable.

  3. Closed under countable unions:
    • Finite case: $m^*(E) \le m^* \big( E \cap S \big) + m^* (E \cap S^c)$. Consider $S_1, S_2$ both Lebesgue measurable and some set $E$. Since $S_2$ is L. measurable:

      \begin{equation*}
m^* (E) = m^* (E \cap S_2) + m^* (E \cap S_2^c)
\end{equation*}

      Since $S_1$ is L. measurable:

      \begin{equation*}
m^* (E \cap S_2^c) = m^* \Big( (E \cap S_2^c) \cap S_1 \Big) + m^* \Big( (E \cap S_2^c) \cap S_1^c \Big)
\end{equation*}

      which allows us to rewrite the above equation for $m^*(E)$:

      \begin{equation*}
m^*(E) = m^* \big( E \cap S_2 \big) + m^* \big( E \cap S_1 \cap S_2^c \big) + m^* \big( E \cap S_1^c \cap S_2^c \big)
\end{equation*}

      Observe that

      \begin{equation*}
E \cap \big( S_1 \cup S_2 \big) = E \cap \Big( S_1 \cup (S_2 \cap S_1^c) \Big) \underset{\text{distributivity}}{=} \big( E \cap S_1 \big) \cup \big( E \cap S_2 \cap S_1^c \big)
\end{equation*}

      By subadditivity:

      \begin{equation*}
m^* \big( E \cap (S_1 \cup S_2) \big) \le m^*(E \cap S_1) + m^*(E \cap S_2 \cap S_1^c)
\end{equation*}

      Hence,

      \begin{equation*}
\begin{split}
  m^*(E) &\ge m^* \big( E \cap (S_1 \cup S_2) \big) + m^* \big( E \cap S_1^c \cap S_2^c \big) \\
  &= m^* \big( E \cap (S_1 \cup S_2) \big) + m^* \big( E \cap (S_1 \cup S_2)^c \big)
\end{split}
\end{equation*}

      Then this follows for all finite cases by induction.

    • Countable disjoint case: Let $S = \bigcup_{ i = 0}^\infty S_i$, and $E \subseteq \mathbb{R}$. Further, let $R_n = \bigcup_{i  = 0}^n S_i$, which is L. measurable by the finite case. Moreover,

      \begin{equation*}
R_n \subseteq S \implies S^c \subseteq R_n^c
\end{equation*}

      Thus,

      \begin{equation*}
\begin{split}
   m^*(E) &= m^*(E \cap R_n) + m^* (E \cap R_n^c) \\
   & \ge m^*(E \cap R_n) + m^*(E \cap S^c) \quad \text{by monotonicity}
\end{split}
\end{equation*}

      Since the $S_i$ are disjoint, $R_n \cap S_n^c = R_{n - 1}$ and $R_n \cap S_n = S_n$:

      \begin{equation*}
\begin{split}
  m^*(E \cap R_n) &= m^* \big( E \cap R_n \cap S_n \big) + m^* \big( E \cap R_n \cap S_n^c \big) \\
  &= m^*(E \cap S_n) + m^* (E \cap R_{n - 1})
\end{split}
\end{equation*}

      Let $R_0 = S_0$ and note that $m^*(R_0) = m^*(S_0)$. Thus, by induction

      \begin{equation*}
m^*(E \cap R_n) = \sum_{i=0}^{n} m^*(E \cap S_i)
\end{equation*}

      Thus,

      \begin{equation*}
m^*(E) \ge \sum_{i=0}^{n} m^*(E \cap S_i) + m^*(E \cap S^c)
\end{equation*}

      Taking $n \to \infty$:

      \begin{equation*}
\begin{split}
  m^*(E) &\ge \sum_{i=0}^{\infty} m^*(E \cap S_i) + m^* (E \cap S^c) \\
  & \ge m^* \Big( E \cap \bigcup_{i = 0}^\infty S_i \Big) + m^* (E \cap S^c) \quad \text{by countable subadditivity} \\
  &= m^*(E \cap S) + m^*(E \cap S^c)
\end{split}
\end{equation*}

      Thus, $S = \bigcup_{i = 0}^\infty S_i$ is L. measurable if the $S_i$ are disjoint and L. measurable!

    • Countable (not-necessarily-disjoint) case: If the $S_i$ are not disjoint, let $R_n = \bigcup_{i=0}^{n} S_i$, $T_0 = R_0$, and $T_{n + 1} = R_{n + 1} \cap R_n^c$, which gives a sequence of disjoint measurable sets with the same union, hence the above proof applies.

Every open interval is Lebesgue measurable, and the Borel sigma-algebra is a subset of the sigma-algebra of Lebesgue measurable sets.

Want to prove measurability of intervals of the form $(a, \infty)$.

Idea:

  1. split any set $E$ into the left and right part
  2. split any cover in the same way
  3. extend covers by $\varepsilon 2^{- i}$ to make them open

$\big( \mathbb{R}, \mathcal{M}, m \big)$ is a measure space, and for all intervals $I$, the measure is the length.

Cantor set

Define

\begin{equation*}
\begin{split}
  \Phi: \quad & \text{Pow}(\mathbb{R}) \to \text{Pow}(\mathbb{R}) \\
  & S \mapsto \frac{1}{3} S \cup \bigg( \frac{1}{3} S + \frac{2}{3} \bigg)
\end{split}
\end{equation*}

For $n \in \mathbb{N}$, define the iterates $\Phi^n$ recursively, with $\Phi^0$ being the identity, and

\begin{equation*}
\Phi^{n + 1} = \Phi^n \circ \Phi = \Phi \circ \Phi^n
\end{equation*}

Let $E_0 = [0, 1]$ and $E_n = \Phi^n(E_0)$. Then the Cantor set is defined

\begin{equation*}
E = \bigcap_{n = 0}^{\infty} E_n
\end{equation*}

The Cantor set has Lebesgue measure zero.

We make the following observations:

  • Scaled and shifted closed sets are closed
  • $E_n$ is a finite union of closed intervals and so is in the Borel sigma-algebra
  • σ-algebras are closed under countable intersections, hence Cantor set is in the Borel σ-algebra
  • Finally, the Borel σ-algebra is a subset of the Lebesgue measurable sets, hence the Cantor set is Lebesgue measurable!

The Lebesgue measure satisfies $m(\alpha S) = \left| \alpha \right| \ m(S)$ for any Lebesgue measurable set $S$ with finite measure and any $\alpha \in \mathbb{R}$ with $\alpha \ne 0$, it is translation-invariant, and it is subadditive. Hence, for any $n \in \mathbb{Z}^+$

\begin{equation*}
m(E_n) = m \Big( \Phi(E_{n - 1}) \Big) \le m \bigg( \frac{1}{3} E_{n - 1} \bigg) + m \bigg( \frac{1}{3} E_{n - 1} + \frac{2}{3} \bigg) = \frac{2}{3} m(E_{n - 1})
\end{equation*}

Since $m(E_0) = 1$, by induction, it follows that

\begin{equation*}
m(E_n) \le \bigg( \frac{2}{3} \bigg)^n
\end{equation*}

Taking the limit over $n$, we have that the Cantor set has measure zero:

\begin{equation*}
m(E_n) \searrow 0 \implies m(E) = 0
\end{equation*}
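
The map $\Phi$ is easy to iterate on finite unions of intervals; a small sketch checking $m(E_n) = (2/3)^n$ for the first few stages (equality holds here, since the $2^n$ pieces are disjoint):

#+begin_src python
def Phi(intervals):
    """One Cantor step: S -> (1/3)S u ((1/3)S + 2/3), on lists of intervals."""
    return [(a / 3, b / 3) for a, b in intervals] + \
           [(a / 3 + 2 / 3, b / 3 + 2 / 3) for a, b in intervals]

E = [(0.0, 1.0)]  # E_0 = [0, 1]
for n in range(6):
    total = sum(b - a for a, b in E)
    assert abs(total - (2 / 3) ** n) < 1e-12  # m(E_n) = (2/3)^n
    E = Phi(E)
#+end_src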

Cardinality of the Cantor set

Let $x \in [0, 1]$.

The ternary expansion is a sequence $\{ b_i \}_{i = 1}^\infty$ with $b_i \in \left\{ 0, 1, 2 \right\}$ such that

\begin{equation*}
x = \sum_{i=1}^{\infty} b_i 3^{-i}
\end{equation*}

The Cantor set $E$ is uncountable.

We observe that if the first $n$ elements of the expansion for $x$ are in $\{ 0, 2 \}$, then $x \in E_n$. But importantly, observe that some numbers have more than one ternary expansion, e.g.

\begin{equation*}
\frac{1}{3} \equiv \big( 1, 0, 0, \dots \big) \equiv \big( 0, 2, 2, \dots \big)
\end{equation*}

in the ternary expansion. One can show that a number $x \in E$ if and only if $x$ has a ternary expansion with no 1 digits. Such expansions are in bijection with $\left\{ 0, 2 \right\}^{\mathbb{N}}$, which is uncountable. Hence, the Cantor set $E$ is uncountable!

One can see that $x \in E$ if and only if $x$ has a ternary expansion with no 1 digits: a forced digit 1 means $x$ lands in one of the "gaps" removed in the construction of the Cantor set.
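
A sketch of the digit criterion. Note that greedy digit extraction picks only one of the two possible expansions, so endpoint cases like $1/3 = 0.0222\dots_3$ can be misclassified; the helper is purely illustrative:

#+begin_src python
def in_cantor(x, digits=30):
    """Test x in [0, 1] for Cantor membership via greedy ternary digits."""
    for _ in range(digits):
        x *= 3
        d = int(x)   # next ternary digit
        if d == 1:   # a digit 1 lands x in one of the removed "gaps"
            return False
        x -= d
    return True

assert in_cantor(0.0) and in_cantor(0.25)  # 1/4 = 0.020202..._3 is in E
assert not in_cantor(0.5)                  # 1/2 = 0.111..._3 is not
#+end_src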

Uncountable Lebesgue measurable set

There exist uncountable Lebesgue measurable sets; the Cantor set is an example (and of measure zero, no less).

Menger sponge

Vitali sets

Let $x \sim y$ if and only if $x - y \in \mathbb{Q}$.

  • There are uncountably many equivalence classes, with each equivalence class being countable (as a set).
  • By axiom of choice, we can pick one element from each equivalence class.
  • We can assume each representative picked is in $[0, 1]$; this set of representatives we denote $R$

Suppose, for the sake of contradiction, that $R$ is measurable.

Observe if $x \in [0, 1]$, then there are $q \in \mathbb{Q}$ and $r \in R$ s.t. $x = r + q$, with $q \in [-1, 1]$, i.e.

\begin{equation*}
[0, 1] \subseteq \bigcup_{q \in [-1, 1] \cap \mathbb{Q}}^{} \Big( R + q \Big) \subseteq [-1, 2]
\end{equation*}

Then, by countable additivity

\begin{equation*}
\begin{split}
  m([0, 1]) &\le m \bigg( \bigcup_{q \in [-1, 1] \cap \mathbb{Q}}^{} R + q \bigg) \le m \big( [-1, 2] \big) = 3 \\
  m([0, 1]) &\le \sum_{q \in [-1, 1] \cap \mathbb{Q}}^{} m (R + q) \le  3 \\
  m([0, 1]) &\le \sum_{q \in [-1, 1] \cap \mathbb{Q}}^{} m (R) \le 3
\end{split}
\end{equation*}

where we've used

\begin{equation*}
m \bigg( \bigcup_{q \in [-1, 1] \cap \mathbb{Q}}^{} R + q \bigg) = \sum_{q \in [-1, 1] \cap \mathbb{Q}}^{} m (R + q) = \sum_{q \in [-1, 1] \cap \mathbb{Q}}^{} m (R)
\end{equation*}

Since the $R + q$ are disjoint and $m$ is translation invariant, the sum $\sum_{q} m(R)$ is either $0$ (if $m(R) = 0$) or $\infty$ (if $m(R) > 0$); neither is compatible with $1 \le \sum_{q} m(R) \le 3$. Hence, we have our contradiction, and so this set, the Vitali set, is not measurable!

There exists a subset of $\mathbb{R}$ that is not measurable wrt. Lebesgue measure.

Lebesgue Integral

The Lebesgue integral of a function $f$ over a measure space $(X, \Sigma, \mu)$ is written

\begin{equation*}
\int_X f \ d \mu
\end{equation*}

which means we're taking the integral wrt. the measure $\mu$.

riemann_vs_lebesgue_integral.png

Special case: non-negative real-valued function

Suppose that $f : \mathbb{R} \to \mathbb{R}^+$ is a non-negative real-valued function.

Using the "partitioning of range of $f$" philosophy, the integral of $f$ should be the sum over $t$ of the elementary area contained in the thin horizontal strip between $y = t$ and $y = t + dt$, which is just

\begin{equation*}
\mu (\{x \ | \ f(x) > t\} ) dt
\end{equation*}

Letting

\begin{equation*}
f^*(t) = \mu (\{x \ | \ f(x) > t\} )
\end{equation*}

The Lebesgue integral of $f$ is then defined by

\begin{equation*}
\int f \ d\mu = \int_0^\infty f^*(t) \ dt
\end{equation*}

where the integral on the right is an ordinary improper Riemann integral. For the set of measurable functions, this defines the Lebesgue integral.
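
A small numerical sketch of this "layer cake" formula (all choices arbitrary): take $f(x) = 1 - x$ on $[0, 1]$ with Lebesgue measure, approximate $f^*(t)$ by the fraction of grid points where $f > t$, and integrate; both sides come out to $\int_0^1 (1 - x) \ dx = 1/2$:

#+begin_src python
N = 1_000
xs = [(k + 0.5) / N for k in range(N)]  # midpoint grid on [0, 1]
f = lambda x: 1 - x

def f_star(t):
    # mu({x : f(x) > t}), approximated by a fraction of grid points
    return sum(1 for x in xs if f(x) > t) / N

ts = [(k + 0.5) / N for k in range(N)]
layer_cake = sum(f_star(t) for t in ts) / N  # int_0^inf f*(t) dt; f* = 0 for t >= 1
assert abs(layer_cake - 0.5) < 1e-2          # int_0^1 (1 - x) dx = 1/2
#+end_src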

Radon measure

  • Hard to find a good notion of measure on a topological space that is compatible with the topology in some sense
  • One way is to define a measure on the Borel set of the topological space

Let $\mu$ be a measure on the sigma-algebra of Borel sets of a Hausdorff topological space $X$.

  • $\mu$ is called inner regular or tight if, for any Borel set $B$, $\mu(B)$ is the supremum of $\mu(K)$ over all compact subsets $K$ of $B$, i.e.

    \begin{equation*}
\mu(B) = \sup \big\{ \mu(K) : K \subseteq B, \ K \text{ compact} \big\}
\end{equation*}

  • $\mu$ is called outer regular if, for any Borel set $B$, $\mu(B)$ is the infimum of $\mu(U)$ over all open sets $U$ containing $B$, i.e.

    \begin{equation*}
\mu(B) = \inf \big\{ \mu(U) : B \subseteq U, \ U \text{ open} \big\}
\end{equation*}

  • $\mu$ is called locally finite if every point of $X$ has a neighborhood $U$ for which $\mu(U)$ is finite (if $\mu$ is locally finite, then it follows that $\mu$ is finite on compact sets)

The measure $\mu$ is called a Radon measure if it is inner regular and locally finite.

Suppose $\mu$ and $\nu$ are two $\sigma \text{-finite}$ measures on a measurable space $\big( \Omega, \mathcal{A} \big)$ and $\nu$ is absolutely continuous wrt. $\mu$.

Then there exists a non-negative, measurable function $f$ on $\Omega$ such that

\begin{equation*}
\nu(E) = \int_E f \ d \mu, \quad \forall E \in \mathcal{A}
\end{equation*}

The function $f$ is called the density or Radon-Nikodym derivative of $\nu$ wrt. $\mu$.

If a Radon-Nikodym derivative of $\nu$ wrt. $\mu$ exists, then $\dv{\nu}{\mu}$ denotes the equivalence class of measurable functions that are Radon-Nikodym derivatives of $\nu$ wrt. $\mu$.

$f = \dv{\nu}{\mu}$ is often used to denote $f \in \dv{\nu}{\mu}$, i.e. $f$ is just in the equivalence class of measurable functions such that this is the case.

This comes from the fact that we have

\begin{equation*}
\nu(S) = \int_S \dd{\nu} = \int_S \dv{\nu}{\mu} \dd{\mu}
\end{equation*}

If $f$ and $g$ are both Radon-Nikodym derivatives of $\nu$ wrt. $\mu$, then $f \overset{\text{a.e.}}{=} g$; conversely, if $f$ is a Radon-Nikodym derivative and $g \overset{\text{a.e.}}{=} f$, then $g$ is one as well.

The δ measure (at a point $a$) cannot have a Radon-Nikodym derivative wrt. Lebesgue measure: it is not absolutely continuous, since $\lambda(\{a\}) = 0$ but $\delta_a(\{a\}) = 1$, and any candidate density would vanish a.e., making every integral zero.

Continuity of measure

Suppose $\mu$ and $\nu$ are two sigma-finite measures on a measurable space $(X, \Omega)$.

Then we say that $\mu$ is absolutely continuous wrt. $\nu$ if

\begin{equation*}
\nu(E) = 0 \implies \mu(E) = 0, \quad \forall E \in \Omega
\end{equation*}

We say that $\mu$ and $\nu$ are equivalent if each measure is absolutely continuous wrt. to the other.

Density

Suppose $\mu$ and $\nu$ are two sigma-finite measures on a measurable space $(X, \Omega)$ and that $\mu$ is absolutely continuous wrt. $\nu$. Then there exists a non-negative, measurable function $\rho$ on $X$ such that

\begin{equation*}
\mu(E) = \int_{E} \rho \ \ d \nu
\end{equation*}

Measure-preserving transformation

A transformation $T: X \to X$ on the measure space $(X, \Sigma, \mu)$ is a measure-preserving transformation if

\begin{equation*}
\mu \Big( T^{-1}(A) \Big) = \mu(A), \quad \forall A \in \Sigma
\end{equation*}

Measure

A measure on a set is a systematic way of assigning a number to each subset of that set, intuitively interpreted as its size.

In this sense, a measure is a generalization of the concepts of length, area, volume, etc.

Formally, let $\mathcal{A}$ be a $\sigma \text{-algebra}$ of subsets of $X$.

Suppose $\mu: \mathcal{A} \to [0, \infty]$ is a function. Then $\mu$ is a measure if

  1. $\mu(\emptyset) = 0$
  2. Whenever $A_0, A_1, \dots$ are pairwise disjoint subsets of $X$ in $\mathcal{A}$, then

    \begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty A_k \bigg) = \sum_{k=0}^{\infty} \mu \big( A_k \big)
\end{equation*}
    • Called σ-additivity or countable additivity

Properties

Let $\big( X, \mathcal{A}, \mu \big)$ be a measure space, and $A, B \in \mathcal{A}$ such that $A \subseteq B$.

Then $\mu(A) \le \mu(B)$.

Let

\begin{equation*}
Z = B \setminus A = B \cap A^c \in \mathcal{A}
\end{equation*}

Then $B = Z \cup A$, and by finite additivity property of a measure:

\begin{equation*}
\mu(B) = \mu (A) + \mu(Z) \ge \mu(A)
\end{equation*}

since $\mu(Z) \ge 0$ by definition of a measure.

If $A_0, A_1, \dots$ are $\mu \text{-measurable}$ subsets of $X$, then

\begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty A_k \bigg) \le \sum_{k=0}^{\infty} \mu(A_k)
\end{equation*}

We know for a sequence of disjoint sets $\{ B_k \}$ we have

\begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty B_k \bigg) = \sum_{k=0}^{\infty} \mu(B_k)
\end{equation*}

So we just let

\begin{equation*}
\begin{split}
  B_0 &= A_0 \\
  B_1 &= A_1 \setminus B_0 \\
  B_2 &= A_2 \setminus \big( B_0 \cup B_1 \big) \\
  &\vdots \\
  B_k &= A_k \setminus \bigg( \bigcup_{i = 0}^{k - 1} B_i \bigg) \\
  &\vdots
\end{split}
\end{equation*}

Then the $\{ B_k \}$ are pairwise disjoint, with

\begin{equation*}
\bigcup_{k = 0}^\infty B_k = \bigcup_{k = 0}^\infty A_k
\end{equation*}

and $B_k \subseteq A_k$, so by monotonicity $\mu(B_k) \le \mu(A_k)$. Thus,

\begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty A_k \bigg) = \sum_{k=0}^{\infty} \mu(B_k) \le \sum_{k=0}^{\infty} \mu(A_k)
\end{equation*}

Concluding our proof!

Let $A_0 \subseteq A_1 \subseteq A_2 \subseteq \cdots$ be an increasing sequence of measurable sets.

Then

\begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty A_k \bigg) = \lim_{k \to \infty} \mu(A_k)
\end{equation*}

Let $Z_0 = A_0$ and $Z_{k + 1} = A_{k + 1} \setminus A_k$. The $Z_k$ are disjoint, with

\begin{equation*}
A_k = \bigcup_{i = 0}^k Z_i \quad \text{and} \quad \bigcup_{k = 0}^\infty A_k = \bigcup_{k = 0}^\infty Z_k
\end{equation*}

so by countable additivity

\begin{equation*}
\mu \bigg( \bigcup_{k = 0}^\infty A_k \bigg) = \sum_{i=0}^{\infty} \mu(Z_i) = \lim_{k \to \infty} \sum_{i=0}^{k} \mu(Z_i) = \lim_{k \to \infty} \mu(A_k)
\end{equation*}

as wanted.

Let $A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots$ be a decreasing sequence of sets from some $\sigma \text{-algebra}$ $\mathcal{A}$.

If $\mu(A_0) < \infty$, then

\begin{equation*}
\mu \bigg( \bigcap_{k = 0}^{\infty} A_k \bigg) = \lim_{k \to \infty} \mu (A_k)
\end{equation*}

Examples of measures

Let

  • $\Omega$ be a space
  • $a \in \Omega$

The δ-measure (at $a$) is

\begin{equation*}
\begin{split}
  \delta_a: \quad & \text{Pow}(\Omega) \to [0, \infty) \\
  & A \mapsto \delta_a(A) = 
  \begin{cases}
    1 & \text{if } a \in A \\
    0 & \text{otherwise}
  \end{cases}
\end{split}
\end{equation*}
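
The definition transcribes directly into code; a minimal sketch with sets modeled as Python containers (names illustrative):

#+begin_src python
def delta(a):
    """The delta-measure at a, as a function on (finite) sets."""
    return lambda A: 1 if a in A else 0

d0 = delta(0)
assert d0({-1, 0, 1}) == 1 and d0({1, 2}) == 0
#+end_src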

Sigma-algebra

Definition

Let $X$ be some set, and let $2^X$ be its power set. Then a subset $\Sigma \subseteq 2^X$ is called a σ-algebra on $X$ if it satisfies the following three properties:

  1. $X \in \Sigma$
  2. $\Sigma$ is closed under complement: $A \in \Sigma \implies A^c \in \Sigma$
  3. $\Sigma$ is closed under countable unions: $A_1, A_2, A_3, ... \in \Sigma \implies \cup_{i=1}^\infty A_i \in \Sigma$

These properties also imply the following:

  • $\emptyset \in \Sigma$
  • $\Sigma$ is closed under countable intersections: if $A_1, A_2, A_3, ... \in \Sigma \implies \cap_{i=1}^\infty A_i \in \Sigma$

Generated σ-algebras

Given a space $\Omega$ and a collection of subsets $C \subseteq \text{Pow}(\Omega)$, the σ-algebra generated by $C$, denoted $\sigma(C)$, is defined to be the intersection of all σ-algebras on $\Omega$ that contain $C$, i.e.

\begin{equation*}
\sigma(C) = \bigcap_{\alpha \in A} \mathcal{A}_{\alpha}
\end{equation*}

where

\begin{equation*}
\left\{ \mathcal{A}_{\alpha} \right\}_{\alpha \in A} := \left\{ \mathcal{A} \subseteq \text{Pow}(\Omega) : \mathcal{A} \text{ is a } \sigma \text{-algebra on }  \Omega \text{ and } C \subseteq \mathcal{A} \right\}
\end{equation*}

Let $\big( \Omega, \mathcal{A} \big)$ be a measurable space and $f: \Sigma \to \Omega$ a function from some space $\Sigma$ to $\Omega$.

The σ-algebra generated by $f$ is

\begin{equation*}
\sigma(f) = \sigma \Big( \left\{ f^{-1}(S) \mid S \in \mathcal{A} \right\} \Big)
\end{equation*}

Observe that though this is similar to the σ-algebra generated by a MEASURABLE function, the definition differs in the sense that the preimages do not have to be measurable. In particular, the σ-algebra generated by a measurable function can be defined as above, where $f^{-1}(S)$ is measurable by definition of $f$ being a measurable function, hence corresponding exactly to the other definition.

Let $\big( \Omega_1, \mathcal{A}_1, \mu_1 \big)$ and $\big( \Omega_2, \mathcal{A}_2, \mu_2 \big)$ be measure spaces and $f: \Omega_1 \to \Omega_2$ a measurable function.

The σ-algebra generated by $f$ is

\begin{equation*}
\sigma \Big( \left\{ A_1 \in \mathcal{A}_1 \mid \exists A_2 \in \mathcal{A}_2: f^{-1} (A_2) = A_1 \right\} \Big)
\end{equation*}

Let $\big( \Omega, \mathcal{A}, P \big)$ be a probability space and $X: \Omega \to \mathbb{R}$ a random variable.

The σ-algebra generated by $X$ is

\begin{equation*}
\sigma \Big( \left\{ A \in \mathcal{A} \mid \exists c \in [- \infty, \infty): X^{-1} \big( (c, \infty] \big) = A \right\} \Big)
\end{equation*}

Let $\Omega$ be a space.

If $\left\{ \mathcal{A}_{\alpha} \right\}_{\alpha \in A}$ is a collection of σ-algebras, then $\bigcap_{\alpha \in A}^{} \mathcal{A}_{\alpha}$ is also a σ-algebra.

σ-finite

A measure or premeasure space $\big( \Omega, \mathcal{A}, \mu \big)$ is finite if $\mu(\Omega) < \infty$.

A measure $\mu$ on a measurable space $(X, \mathcal{A})$ is said to be sigma-finite if $X$ can be written as a countable union of measurable sets of finite measure.

Example: counting measure on uncountable set is not σ-finite

Let $\Omega$ be a space.

The counting measure is defined to be $\#: \text{Pow}(\Omega) \to \mathbb{N} \cup \left\{ \infty \right\}$ such that

\begin{equation*}
\#(A) = 
\begin{cases}
  |A| & \text{if } |A| < \infty \\
  \infty & \text{otherwise}
\end{cases},
\quad \forall A \in \text{Pow}(\Omega)
\end{equation*}

On any uncountable set, the counting measure is not σ-finite, since if a set has finite counting measure it has countably many elements, and a countable union of finite sets is countable.

Properties

Let $\mathcal{A}$ be a $\sigma \text{-algebra}$ of subsets of a set $X$. Then

  1. $X, \emptyset \in \mathcal{A}$
  2. If $A_k \in \mathcal{A}$, then

    \begin{equation*}
\bigcap_{k = 0}^\infty A_k \in \mathcal{A}
\end{equation*}
  3. If $A, B \in \mathcal{A}$ then $A \cup B, \ A \cap B \in \mathcal{A}$

Borel sigma-algebra

Any set in a topological space that can be formed from the open sets through the operations of:

  • countable union
  • countable intersection
  • complement

is called a Borel set.

Thus, for some topological space $X$, the collection of all Borel sets on $X$ forms a σ-algebra, called the Borel algebra or Borel σ-algebra.

More compactly, the Borel σ-algebra on $\mathbb{R}$ is

\begin{equation*}
\mathcal{B} = \mathcal{B}(\mathbb{R}) = \sigma(\mathcal{O}_{\text{std.}})
\end{equation*}

where $\sigma(\mathcal{O}_{\text{std.}})$ is the σ-algebra generated by the standard topology on $\mathbb{R}$.

Borel sets are important in measure theory, since any measure defined on the open sets of a space, or on the closed sets of a space, must also be defined on all Borel sets of that space.

Any measure defined on the Borel sets is called a Borel measure.

Lebesgue sigma-algebra

Basically the same as the Borel sigma-algebra, but larger: the Lebesgue sigma-algebra is the completion of the Borel sigma-algebra wrt. the Lebesgue measure, so that the resulting measure space is complete.

Motivation

Suppose we have constructed Lebesgue measure on the real line: denote this measure space by $(\mathbb{R}, B, \lambda)$. We now wish to construct some two-dimensional Lebesgue measure $\lambda^2$ on the plane $\mathbb{R}^2$ as a product measure.

Naïvely, we could take the sigma-algebra on $\mathbb{R}^2$ to be $B \otimes B$, the smallest sigma-algebra containing all measurable "rectangles" $A_1 \times A_2$ for $A_i \in B$.

While this approach does define a measure space, it has a flaw: since every singleton set has one-dimensional Lebesgue measure zero,

\begin{equation*}
\lambda^2 ( \{0\} \times A) = \lambda(\{0\}) \cdot \lambda(A) = 0
\end{equation*}

for any $A \in B$.

What follows is the important part!

However, suppose that $A$ is a non-measurable subset of the real line, such as the Vitali set. Then the $\lambda^2$ measure of $\{0\} \times A$ is not defined (since we just supposed that $A$ is non-measurable), but

\begin{equation*}
\{0\} \times A \subseteq \{0\} \times \mathbb{R}
\end{equation*}

and this larger set ( $\{0\} \times \mathbb{R}$ ) does have $\lambda^2$ measure zero, i.e. it's not complete !

Construction

Given a (possibly incomplete) measure space $(X, \Sigma, \mu)$, there is an extension $(X, \Sigma_0, \mu_0)$ of this measure space that is complete.

The smallest such extension (i.e. the smallest sigma-algebra $\Sigma_0$) is called the completion of the measure space.

It can be constructed as follows:

  • Let $Z$ be the set of all $\mu$ measure zero subsets of $X$ (intuitively, those elements of $Z$ that are not already in $\Sigma$ are the ones preventing completeness from holding true)
  • Let $\Sigma_0$ be the sigma-algebra generated by $\Sigma$ and $Z$ (i.e. the smallest sigma-algebra that contains every element of $\Sigma$ and of $Z$)
  • $\mu$ has an extension to $\Sigma_0$ (which is unique if $\mu$ is sigma-finite), called the outer measure of $\mu$, given by the infimum
\begin{equation*}
\mu_0(C) := \inf \{\mu(D) \ | \ C \subseteq D \in \Sigma \}
\end{equation*}

Then $(X, \Sigma_0, \mu_0)$ is a complete measure space, and is the completion of $(X, \Sigma, \mu)$.

What we're saying here is:

  • For the "multi-dimensional" case we need to take into account the zero-elements in the resulting sigma-algebra due to the product between the 1D zero-element and some element NOT in our original sigma-algebra
  • The above point means that we do NOT necessarily get completeness, despite the sigma-algebras defined on the sets individually prior to taking the Cartesian product being complete
  • To "fix" this, we construct an outer measure $\mu_0$ on the sigma-algebra where we have included all those zero-elements which are "missed" by the naïve approach, $\Sigma_0$

Measurable functions

Let $\big( X, \mathcal{S} \big)$ and $\big( Y, \mathcal{T} \big)$ be measurable spaces.

A function $f: X \to Y$ is a measurable function if

\begin{equation*}
f^{-1} \big( T \big) \in \mathcal{S}, \quad \forall T \in \mathcal{T}
\end{equation*}

where $f^{-1}$ denotes the preimage of the $f$ for the measurable set $T \in \mathcal{T}$.

Let $S \subset X$.

We define the indicator function of $S$ to be the function $1_S: X \to \mathbb{R}$ given by

\begin{equation*}
1_S(x) = 
\begin{cases}
  1 & \text{if } x \in S \\
  0 & \text{if } x \notin S
\end{cases}
\end{equation*}

Let $\big( X, \mathcal{A} \big)$ be a measurable space and $S \subseteq X$. Then $1_S$ is measurable if and only if $S \in \mathcal{A}$.

Let $\big( X, \mathcal{A}, \mu \big)$ be a measure space or a probability space.

Let $\{ f_n \}_{n = 0}^\infty$ be a sequence of measurable functions.

  1. For each $m \in \mathbb{N}$, the function $\inf_{n \ge m} f_n$ is measurable
  2. The function $\liminf_{n \to \infty} f_n$ is measurable
  3. Thus, if $f_n$ converge pointwise, $\lim_{n \to \infty} f_n$ is measurable.

Let $\big( X, \mathcal{A} \big)$ be a measurable space, and let $f: X \to \big[ - \infty, \infty \big]$.

The following statements are equivalent:

  1. $f$ is measurable.
  2. $\forall c \in [- \infty, \infty)$ we have $f^{-1} \big( (c, \infty] \big) \in \mathcal{A}$.
  3. $\forall c \in (- \infty, \infty)$ we have $f^{-1} \big( [c, \infty] \big) \in \mathcal{A}$.
  4. $\forall c \in (- \infty, \infty]$ we have $f^{-1} \big( [-\infty, c) \big) \in \mathcal{A}$.
  5. $\forall c \in (- \infty, \infty)$ we have $f^{-1} \big( [-\infty, c] \big) \in \mathcal{A}$.

Recall that a function $g$ is measurable if

\begin{equation*}
\forall c \in [- \infty, \infty) : g^{-1} \Big( (c, \infty] \Big) \in \mathcal{A}
\end{equation*}

We also observe that, by the proposition above on equivalent statements to being a measurable function, it's sufficient to prove

\begin{equation*}
\forall c \in (-\infty, \infty) : g^{-1} \big( [c, \infty] \big) \in \mathcal{A}
\end{equation*}

so that's what we set out to do.

For $m \in \mathbb{N}$ and $x \in X$, consider the following equivalent statements:

\begin{align*}
  & & x &\in \bigg( \inf_{n \ge m} f_n \bigg)^{- 1} \Big( [c, \infty] \Big) \\
  & & \inf_{n \ge m} f_n(x) &\in [c, \infty] \\
  & & \inf_{n \ge m} f_n(x) &\ge c \\
  &\forall n \ge m : & f_n(x) &\ge c \\
  & \forall n \ge m : & x & \in f_n^{-1} \big( [c, \infty] \big) \\
  & & x & \in \bigcap_{n \ge m}^{} f_n^{-1} \big( [c, \infty] \big)
\end{align*}

Thus,

\begin{equation*}
\Big( \inf_{n \ge m} f_n \Big)^{-1} \big( [c, \infty] \big) = \bigcap_{n \ge m}^{} f_n^{-1} \big( [c, \infty] \big)
\end{equation*}

so

\begin{equation*}
\big( \inf_{n \ge m} f_n \big)^{-1} \big( [c, \infty] \big) \in \mathcal{A}
\end{equation*}

Recall that for each $x \in X$, the sequence $\{ \inf_{n \ge m} f_n(x) \}$ is an increasing sequence in $m$. Therefore, similarly, the following are equivalent:

\begin{align*}
  & & x & \in \big( \liminf_{n \to \infty} f_n \big)^{- 1} \big( [c, \infty] \big) \\
  & & \liminf_{n \to \infty} f_n(x) & \in [c, \infty] \\
  & & \uparrow \lim_{m \to \infty} \inf_{n \ge m} f_n(x) &\ge c \\
  & & \sup_m \inf_{n \ge m} f_n(x) &\ge c \\
  & \forall N \in \mathbb{Z}^+ : \exists m \in \mathbb{N} : & \inf_{n \ge m} f_n(x) &\ge c - \frac{1}{N} \\
  & \forall N \in \mathbb{Z}^+ : & x &\in \bigcup_{m \in \mathbb{N}}^{} \bigg( \inf_{n \ge m} f_n \bigg)^{-1} \bigg( \bigg[ c - \frac{1}{N}, \infty  \bigg] \bigg) \\
  & & x & \in \bigcap_{N \in \mathbb{Z}^+}^{} \bigcup_{m \in \mathbb{N}}^{} \bigg( \inf_{n \ge m} f_n \bigg)^{-1} \bigg( \bigg[ c - \frac{1}{N}, \infty \bigg] \bigg)
\end{align*}

Thus,

\begin{equation*}
\big( \liminf f_n \big)^{-1} \big( [c, \infty] \big) = \bigcap_{N \in \mathbb{Z}^+}^{} \bigcup_{m \in \mathbb{N}}^{} \bigg( \inf_{n \ge m} f_n \bigg)^{-1} \bigg( \bigg[ c - \frac{1}{N}, \infty \bigg] \bigg)
\end{equation*}

Hence,

\begin{equation*}
\big( \liminf f_n \big)^{-1} \big( [c, \infty] \big) \in \mathcal{A}
\end{equation*}

concluding our proof!

Basically says the same as the proposition above that limits of measurable functions are measurable, but a bit more "concrete".

Let $\mathcal{A}$ be a $\sigma \text{-algebra}$ of subsets of a set $X$, and let $\{ f_k \}_{k = 0}^\infty$ with $f_k: X \to \overline{\mathbb{R}}$ be a sequence of measurable functions.

Furthermore, let

\begin{equation*}
f(x) = \lim_{k \to \infty} f_k(x), \quad \forall x \in X
\end{equation*}

Then $f$ is a measurable function.

Simple functions

Let $\mathcal{A}$ be a $\sigma \text{-algebra}$ of subsets of a set $X$.

A function $\varphi: X \to \mathbb{R}$ is called a simple function if

  • it is measurable
  • only takes a finite number of values

Let $\mathcal{A}$ be a $\sigma \text{-algebra}$ of subsets of a set $X$.

Let $f : X \to [0, \infty]$ be a nonnegative measurable function.

Then there exists a sequence $\big( \varphi_n \big)$ of simple functions such that

  1. $0 \le \varphi_1(x) \le \varphi_2(x) \le \dots \le f(x)$ for all $x \in X$
  2. Converges to $f$:

    \begin{equation*}
f(x) = \lim_{n \to \infty} \varphi_n(x), \quad \forall x \in X
\end{equation*}

Define a function $\varphi_n$ as follows. For $k \in \left\{ 0, 1, \dots, n 2^n - 1 \right\}$, let

\begin{equation*}
A_{n, k} = \left\{ x \in X \mid \frac{k}{2^n} \le f(x) < \frac{k + 1}{2^n} \right\}
\end{equation*}

and let

\begin{equation*}
A_{n, n 2^n} = \left\{ x \in X \mid f(x) \ge n \right\}
\end{equation*}
\end{equation*}

Then the function

\begin{equation*}
\varphi_n = \sum_{k=0}^{n 2^n} \frac{k}{2^n} 1_{A_{n ,k}}
\end{equation*}

obeys the required properties!
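
Pointwise, $\varphi_n$ is just a dyadic floor of $f$ capped at $n$. A sketch for the arbitrarily chosen function $f = \exp$, checking monotonicity and convergence at a few points:

#+begin_src python
import math

def phi_n(n, f, x):
    """phi_n(x) = k/2^n on {k/2^n <= f < (k+1)/2^n}, capped at n."""
    fx = f(x)
    if fx >= n:
        return float(n)  # the top piece A_{n, n 2^n}
    return math.floor(fx * 2 ** n) / 2 ** n

f = math.exp
for x in (0.0, 0.5, 1.7):
    vals = [phi_n(n, f, x) for n in range(1, 12)]
    assert all(a <= b <= f(x) for a, b in zip(vals, vals[1:]))  # increasing, below f
    assert abs(vals[-1] - f(x)) < 2 ** -10                      # converging to f(x)
#+end_src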

Almost everywhere and almost surely

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure space. Let $\phi$ be a condition in one variable.

$\phi$ holds almost everywhere (a.e.) if

\begin{equation*}
\mu \big( \left\{ \omega \in \Omega \mid \phi(\omega) \text{ does not hold} \right\} \big) = 0
\end{equation*}

Let $\big( \Omega, \mathcal{A}, P \big)$ be a probability space and $\phi$ be a condition in one variable, then $\phi$ holds almost surely (a.s.) if

\begin{equation*}
P \big( \left\{ \omega \in \Omega \mid \phi(\omega) \text{ holds} \right\} \big) = 1
\end{equation*}

also denoted

\begin{equation*}
P(\phi) = 1
\end{equation*}

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a complete measure space.

  1. If $f$ is measurable and if $g = f$ a.e. then $g$ is measurable.
  2. Being equal a.e. is an equivalence relation on measurable functions.

Convergence theorems for nonnegative functions

Problems

Let $S$ be Lebesgue measurable and $m(S) \in (0, \infty)$.

Then $\exists a, b \in \mathbb{R}$ with $a < b$ s.t.

\begin{equation*}
m \big( S \cap (a, b) \big) > \frac{b - a}{2}
\end{equation*}

Clearly if $\exists a, b \in \mathbb{R}$ with $a < b$ s.t. $(a, b) \subseteq S$, then

\begin{equation*}
 m \big( S \cap (a, b) \big) = m \big( (a, b) \big) = \ell \big( (a, b) \big) = b - a
\end{equation*}

hence

\begin{equation*}
m \big( S \cap (a, b) \big) = b - a > \frac{b - a}{2}
\end{equation*}

Therefore it's sufficient to prove that if $m(S) \in (0, \infty)$, then there exists a non-degenerate open interval $(a, b)$ s.t. $m \big( S^c \cap (a, b) \big) = 0$. (First I said contained in $S$, but that is an unnecessarily strong statement; if contained, then what we want would hold, but what we want does not imply containment.)

As we know, for every $\varepsilon > 0$ there exists $O = \bigcup_{k = 0}^{\infty} I_k$ such that $S \subseteq O$ and

\begin{equation*}
m \big( O \setminus S \big) = m \big( O \cap S^c \big) \le \varepsilon
\end{equation*}

Which implies

\begin{equation*}
m \big( O \cap S^c \big) = m \bigg( \bigcup_{k=0}^{\infty} I_k \cap S^c \bigg) = \sum_{k=0}^{\infty} m \big( I_k \cap S^c \big) \le \varepsilon
\end{equation*}

which implies

\begin{equation*}
m \big( I_k \cap S^c \big) \le \varepsilon, \quad \forall k
\end{equation*}

Letting $\varepsilon \to 0$, this implies that there exists an open cover $\left\{ J_k \right\}_{k = 0}^{\infty}$ s.t.

\begin{equation*}
\exists n: \quad m \big( J_n \cap S^c \big) = 0
\end{equation*}

and

\begin{equation*}
m \big( J_n \big) \in (0, \infty)
\end{equation*}

(That this is true can be seen by considering $m(J_k) = 0$ for all $k$ and seeing that this would imply $\left\{ J_k \right\}$ not being a cover of $S$; and if $m(J_k) = \infty$, then since $m(S) < \infty$ there exists a "smaller" cover.)

Thus,

\begin{equation*}
m \big( J_n \big) = m \big( J_n \cap S \big) + \underbrace{m \big( J_n \cap S^c \big)}_{=0} = m \big( J_n \cap S \big) \in (0, \infty)
\end{equation*}

Hence, letting $a, b \in \mathbb{R}$ be s.t.

\begin{equation*}
J_n = (a, b)
\end{equation*}

we have

\begin{equation*}
m \big( J_n \cap S \big) = m \big( J_n \big) = m \big( (a, b) \big) = b - a > \frac{b - a}{2}
\end{equation*}

as wanted!

$\forall n \in \mathbb{N}$, we have $f_n(\omega) > 0$ for almost every $\omega \in \Omega$ if and only if for almost every $\omega \in \Omega$, $f_n(\omega) > 0$ for all $n \in \mathbb{N}$.

This is equivalent to saying

\begin{equation*}
m \Big( f_n^{-1} \big( (-\infty, 0] \big) \Big) = 0, \quad \forall n \in \mathbb{N}
\end{equation*}

if and only if

\begin{equation*}
m \bigg( \bigcup_{n = 0}^{\infty} f_n^{-1} \Big( (-\infty, 0] \Big) \bigg) = 0
\end{equation*}

i.e. $\left\{ \omega: f_n(\omega) \le 0 \right\}$ is a set of measure zero.

$\big( \implies \big):$ By countable subadditivity,

\begin{equation*}
m \bigg( \bigcup_{n=0}^{\infty} f_n^{-1} \big( (- \infty, 0] \big) \bigg) \le \sum_{n=0}^{\infty} \underbrace{m \Big( f_n^{-1} \big( (- \infty, 0] \big) \Big)}_{= 0} = 0
\end{equation*}

by the assumption.

$\big( \impliedby \big):$ By monotonicity, for every $n \in \mathbb{N}$:

\begin{equation*}
m \Big( f_n^{-1} \big( (- \infty, 0] \big) \Big) \le \underbrace{m \bigg( \bigcup_{k=0}^{\infty} f_k^{-1} \big( (- \infty, 0] \big) \bigg)}_{= 0} = 0
\end{equation*}

This concludes our proof.

Integration

Notation

  • We let

    \begin{equation*}
f(x) = f^+(x) - f^-(x)
\end{equation*}

    where

    \begin{equation*}
f^+ := \max \left\{ f, 0 \right\}, \quad f^- := - \min \left\{ f, 0 \right\}
\end{equation*}

Stuff

Let

\begin{equation*}
\varphi = \sum_{j = 1}^{m} a_j I_{A_j}
\end{equation*}

where $\{ a_j \}_{j = 1}^m$ is a set of positive values.

Then the integral $\int_X \varphi \ d\mu$ of $\varphi$ over $X$ wrt. $\mu$ is given by

\begin{equation*}
\int_X \varphi \ d\mu = \sum_{j=1}^{m} a_j \mu (A_j)
\end{equation*}

Let $f: X \to [0, \infty]$ be a nonnegative measurable function.

We define the integral $\int_X f \ d\mu$ of $f$ over $X$ wrt. $\mu$ by

\begin{equation*}
\int_X f \ d \mu = \sup \left\{ \int_X \varphi \ d \mu : \varphi \text{ is simple, and } 0 \le \varphi \le f \text{ on X} \right\}
\end{equation*}

Let $\big( f_k \big)$ be a sequence of nonnegative measurable functions on $X$. Assume that

  • $0 \le f_0(x) \le f_1(x) \le \dots$ for each $x \in X$
  • $\lim_{k \to \infty} f_k(x) = f(x)$ for each $x \in X$.

Then, we write $f_k \nearrow f$ pointwise.

Then $f$ is measurable, and

\begin{equation*}
\lim_{k \to \infty} \int_X f_k \ d\mu = \int \big( \lim_{k \to \infty} f_k \big) \ d \mu = \int_X f \ d\mu
\end{equation*}

Let $f = \lim_{n \to \infty} f_n$. By the proposition that limits of measurable functions are measurable, $f$ is measurable.

Since each $f_n$ satisfies $f_n \le f$, we know $\int f_n \ d \mu \le \int f \ d \mu$.

  • If $\int f \ d \mu = 0$, then since $0 \le \int f_n \ d\mu \le \int f \ d\mu = 0$ for all $n$, we have $\int f_n \ d\mu = 0$, and thus $\lim_{n \to \infty} \int f_n \ d\mu = 0 = \int f \ d\mu$.

Let $\alpha \in \big(0, \int f \ d\mu \big)$ and $\varepsilon > 0$.

Step 1: Approximate $f$ by a simple function.

Let $h$ be a simple function such that $h \le f$ and $\int h \ d\mu > \alpha$. Such an $h$ exists by definition of the Lebesgue integral. Thus, there are $K \in \mathbb{N}$, values $\{ c_i \}_{i = 1}^K \subset (0, \infty)$, and disjoint measurable sets $\{ B_i \}_{i = 1}^K$ such that

\begin{equation*}
h = \sum_{i=1}^{K} c_i \chi_{B_i}
\end{equation*}

If any $\mu(B_i) = 0$, it doesn't contribute to the integral, so we may ignore it and assume that there are no such sets.

Step 2: Find sets of large measure where the convergence is controlled.

Note that for all $i$ we have

\begin{equation*}
\lim_{n \to \infty} f_n(\omega) = f(\omega) \ge h(\omega) = c_i, \quad \forall \omega \in B_i
\end{equation*}

That is, for each $i$ and $\omega \in B_i$,

\begin{equation*}
\exists m \in \mathbb{N} :  \quad f_m(\omega) \ge \bigg( 1 - \frac{\varepsilon}{2} \bigg) h(\omega)
\end{equation*}

For $i$ and $n$, let

\begin{equation*}
S_{i, n} = \left\{ \omega \in B_i : f_n(\omega) \ge \bigg( 1 - \frac{\varepsilon}{2} \bigg)c_i \right\}
\end{equation*}

And since it's easier to work with disjoint sets,

\begin{equation*}
\begin{split}
  T_{i, 0} &= S_{i, 0} \\
  T_{i, n + 1} &= S_{i, n + 1} \setminus S_{i, n}
\end{split}
\end{equation*}

Observe that,

\begin{equation*}
\bigcup_{j = 0}^{n} T_{i, j} = S_{i, n} \quad \text{and}\quad \bigcup_{j = 0}^{\infty} T_{i, j} = B_i
\end{equation*}

Then,

\begin{equation*}
\begin{split}
  \mu(B_i) = \mu \bigg( \bigcup_{j=0}^{\infty} T_{i, j} \bigg) &= \sum_{j=0}^{\infty} \mu (T_{i, j}) \\
  &= \lim_{n \to \infty} \sum_{j=0}^{n} \mu(T_{i, j}) \\
  &= \lim_{n \to \infty} \mu \bigg( \bigcup_{j=0}^{n} T_{i, j} \bigg) \\
  &= \lim_{n \to \infty} \mu (S_{i, n})
\end{split}
\end{equation*}

We don't have a "rate of convergence" on $B_i$, but on $S_{i, n}$ we know that we are $\varepsilon / 2$ close, and so we can "control" the convergence.

Step 3: Approximate $f_n$ from below.

For each $i$ if $\mu(B_i) = \infty$, then let $N_i$ be such that

\begin{equation*}
\mu (S_{i, N_i}) > \alpha c_i^{-1}
\end{equation*}

and otherwise, let $N_i$ be such that

\begin{equation*}
\mu(S_{i, N_i}) > \bigg( 1 - \frac{\varepsilon}{2} \bigg) \mu (B_i)
\end{equation*}

Let $N = \max \left\{ N_1, \dots, N_K \right\}$, and let $S_i = S_{i, N}$.

For each $n \ge N$, $i \in \left\{ 1, \dots, K \right\}$ and $\omega \in S_i$ we have

\begin{equation*}
f_n(\omega) \ge f_{N} ( \omega) \ge \bigg( 1 - \frac{\varepsilon}{2} \bigg) c_i
\end{equation*}

Thus, $\forall \omega \in \Omega$ and $n \ge N$,

\begin{equation*}
f_n(\omega) \ge \sum_{i=1}^{K} \bigg( 1 - \frac{\varepsilon}{2} \bigg) c_i \chi_{S_i}(\omega)
\end{equation*}

If there is a $j$ such that $\mu(B_j) = \infty$, then for $n \ge N$

\begin{equation*}
\begin{split}
  \int f_n \ d\mu & \ge c_j \bigg( 1 - \frac{\varepsilon}{2} \bigg) \mu(S_j)  \\
   &> c_j \bigg( 1 - \frac{\varepsilon}{2} \bigg) \alpha c_j^{-1} \\
   &= \alpha \bigg( 1 - \frac{\varepsilon}{2} \bigg)
\end{split}
\end{equation*}

Otherwise (if every $\mu(B_i)$ is finite), then for $n \ge N$

\begin{equation*}
\begin{split}
  \int f_n \ d \mu &\ge \sum_{i=1}^{K} \bigg( 1 - \frac{\varepsilon}{2} \bigg) c_i \mu(S_i) \\
  & \ge \sum_{i=1}^{K} \bigg( 1 - \frac{\varepsilon}{2} \bigg) c_i \bigg( 1 - \frac{\varepsilon}{2} \bigg) \mu(B_i) \\
  & \ge \sum_{i=1}^{K} \bigg( 1 - \frac{\varepsilon}{2} \bigg)^2 c_i \mu(B_i) \\
  & \ge ( 1 - \varepsilon ) \sum_{i=1}^{K} c_i \mu(B_i) \\
  & \ge  ( 1 - \varepsilon ) \int h \ d\mu \qquad \text{(by def. of } h \text{ )} \\
  & > (1 - \varepsilon) \alpha
\end{split}
\end{equation*}

For every $\alpha \in (0, \int f \ d\mu)$ and $\varepsilon > 0$, there is an $N$ such that

\begin{equation*}
\forall n \ge N: \quad \int f_n \ d\mu \ge (1 - \varepsilon) \alpha
\end{equation*}

It follows that for every $\alpha \in (0, \int f \ d\mu)$

\begin{equation*}
\lim_{n \to \infty} \int f_n \ d \mu \ge \alpha
\end{equation*}

Therefore

\begin{equation*}
\lim_{n \to \infty} \int f_n \ d \mu \ge \int f \ d \mu
\end{equation*}

Thus,

\begin{equation*}
\lim_{n \to \infty} \int f_n \ d\mu = \int \lim_{n \to \infty} f_n \ d \mu
\end{equation*}

as wanted.
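
As a concrete sanity check (my own example): on $(0, 1]$ with Lebesgue measure take $f(x) = x^{-1/2}$, so $\int f \ d\mu = 2$, and the increasing truncations $f_n = \min(f, n)$, for which $\int f_n \ d\mu = 2 - 1/n \uparrow 2$. A sketch verifying this by midpoint quadrature (grid size arbitrary):

#+begin_src python
def integral_f_n(n, N=200_000):
    """Midpoint quadrature of min(x^{-1/2}, n) on (0, 1]; exact value 2 - 1/n."""
    h = 1.0 / N
    return h * sum(min(((k + 0.5) * h) ** -0.5, n) for k in range(N))

vals = [integral_f_n(n) for n in (1, 2, 4, 8)]
assert all(a <= b for a, b in zip(vals, vals[1:]))  # increasing in n
assert all(abs(v - (2 - 1 / n)) < 1e-3 for v, n in zip(vals, (1, 2, 4, 8)))
#+end_src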

Let $\{ f_n \}$ be any nonnegative measurable functions on $X$.

Then

\begin{equation*}
\int \liminf f_n \ d \mu \le \liminf \int f_n  \ d \mu
\end{equation*}

Let $g_n = \inf_{m \ge n} f_m$ and observe that the $\{ g_n \}$ are pointwise increasing.

\begin{equation*}
\begin{split}
  \int \liminf_{n \to \infty} f_n \ d \mu &= \int \uparrow \lim_{n \to \infty} g_n \ d \mu \\
  &= \uparrow \lim_{n \to \infty} \int g_n \ d \mu \qquad \text{(by monotone convergence)} \\
  &= \liminf_{n \to \infty} \int g_n \ d \mu \\
  & \le \liminf_{n \to \infty} \int f_n \ d \mu
\end{split}
\end{equation*}
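
The inequality can be strict. A classic worked example, on $(0, 1)$ with Lebesgue measure: take $f_n = n \, 1_{(0, 1/n)}$. Then $\liminf_{n \to \infty} f_n = 0$ pointwise, while $\int f_n \ d\mu = n \cdot \frac{1}{n} = 1$ for every $n$, so

\begin{equation*}
\int \liminf_{n \to \infty} f_n \ d \mu = 0 < 1 = \liminf_{n \to \infty} \int f_n \ d \mu
\end{equation*}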

Properties of integrals

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure space.

If $f$ is a nonnegative measurable function, then there is an increasing sequence of simple functions $\{ f_n \}$ such that

\begin{equation*}
\uparrow \lim_{n \to \infty} f_n = f
\end{equation*}

Given $f$ as above and $n \in \mathbb{N}$, for $k \in \left\{ 0, 1, \dots, 4^n - 1 \right\}$ let

\begin{equation*}
\begin{split}
  S_{n, k} &= f^{-1} \Big( \big[k 2^{-n}, (k + 1) 2^{-n} \big) \Big) \\
  S_{n, 4^n} &= f^{-1} \Big( [2^n, \infty] \Big)
\end{split}
\end{equation*}

and

\begin{equation*}
f_n = \sum_{k=0}^{4^n} \big( k 2^{-n} \big) \chi_{S_{n, k}}
\end{equation*}

Or a bit more explicitly (and maybe a bit clearer),

\begin{equation*}
f_n = \underbrace{4^n 2^{-n}}_{= 2^n} \chi_{f^{-1} \big( [2^n, \infty] \big)} +  \sum_{k=0}^{4^n - 1} \big( k 2^{-n} \big) \chi_{f^{-1} \big( \big[\frac{k}{2^n}, \frac{k + 1}{2^n} \big) \big)}
\end{equation*}

For each $n$, $\{ S_{n, k} \}_{k = 0}^{4^n}$ is a partition of $\Omega$. On each $S_{n, k}$ we have $f_n \le f$, hence $f_n \le f$ on all of $\Omega$.

Consider $\omega \in \Omega$. If $f(\omega) < \infty$, then for $n > \log_2 f(\omega)$ we have $\omega \notin S_{n, 4^n}$, which in turn implies

\begin{equation*}
\left| f_n(\omega) - f(\omega) \right| < 2^{-n}
\end{equation*}

Hence $f_n(\omega) \to f(\omega)$.

Finally, if $f(\omega) = \infty$, then $\omega \in S_{n, 4^n}$ for all $n$, so $f_n$ takes on values

\begin{equation*}
f_n(\omega) = 2^n \implies f_n(\omega) \to f(\omega)
\end{equation*}

Hence, $f_n \to f$ in all cases.

Furthermore, for any $n \in \mathbb{N}$ and $k < 4^n$, there is the nesting property

\begin{equation*}
S_{n, k} = S_{n + 1, 2k} \cup S_{n + 1, 2k + 1}
\end{equation*}

so on $S_{n, k}$ we have $f_n(\omega) \le f_{n + 1}(\omega)$.

(This can be seen by observing that what we're really doing here is dividing the values $f$ takes on into a grid, and observing that if we're in $S_{n, k}$ then we're either in $S_{n + 1, 2k}$ or $S_{n + 1, 2k + 1}$).

For $k = 4^n$, we have

\begin{equation*}
S_{n, k} = \bigcup_{j=2 \cdot 4^n}^{4 \cdot 4^n} S_{n + 1, j}
\end{equation*}

so again $f_n \le f_{n + 1}$ and $\{ f_n \}$ is pointwise increasing.
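
As a quick numerical sketch of this construction (Python with NumPy; the grid and the test function $e^{x/3}$ are arbitrary choices, not part of the notes), one can check the monotonicity in $n$, the bound $f_n \le f$, and the convergence:

\begin{verbatim}
import numpy as np

def dyadic_approx(f_vals, n):
    # f_n = k 2^{-n} on S_{n,k}, capped at 2^n on f^{-1}([2^n, inf])
    return np.minimum(np.floor(f_vals * 2.0**n) / 2.0**n, 2.0**n)

x = np.linspace(0.0, 10.0, 1001)
f = np.exp(x / 3.0)                     # a nonnegative function, max ~ 28

prev = dyadic_approx(f, 0)
for n in range(1, 12):
    cur = dyadic_approx(f, n)
    assert np.all(prev <= cur)          # pointwise increasing in n
    assert np.all(cur <= f + 1e-12)     # f_n <= f (tolerance for rounding)
    prev = cur

print(np.max(f - prev))                 # <= 2^{-11} once 2^n exceeds max f
\end{verbatim}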

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure space.

Let

  • $f, g$ be nonnegative, measurable functions
  • $\alpha \in [0, \infty]$ s.t.

    \begin{equation*}
\forall \omega \in \Omega: \quad \alpha f(\omega) \text{ and } \alpha \int f \ d\mu
\end{equation*}

    are defined

  • $\{ h_n \}$ be a sequence of nonnegative measurable functions.

Then

  1. Finite sum

    \begin{equation*}
\int \big( f + g \big) \ d\mu = \int f \ d\mu + \int g \ d\mu
\end{equation*}
  2. Scalar multiplication

    \begin{equation*}
\int \alpha f \ d\mu = \alpha \int f \ d \mu
\end{equation*}
  3. Infinite sums

    \begin{equation*}
\sum_{n= 0}^{\infty} \int h_n \ d \mu = \int \sum_{n = 0}^{\infty} h_n \ d \mu
\end{equation*}

Let $\{ f_n \}$ and $\{ g_n \}$ be increasing sequences of simple functions converging to $f$ and $g$, respectively.

Note $\{ f_n + g_n \}$ is also increasing to $f + g$.

By monotone convergence theorem

\begin{equation*}
\begin{split}
    \int_{}^{} \big( f + g \big) \ d\mu &= \int \lim_{n \to \infty} \big( f_n + g_n \big) \ d\mu \\
  &= \lim_{n \to \infty} \int \big( f_n + g_n \big) \ d\mu \\
  &= \lim_{n \to \infty} \bigg( \int f_n d \mu + \int g_n \ d\mu \bigg) \\
  &= \lim_{n \to \infty} \int f_n \ d\mu + \lim_{n \to \infty} \int g_n \ d\mu \\
  &= \int \lim_{n \to \infty} f_n \ d \mu + \int \lim_{n \to \infty} g_n \ d\mu \\
  &= \int f \ d\mu + \int g \ d \mu
\end{split}
\end{equation*}

The argument for scalar multiplication is similar.

Finally, $\{ \sum_{n=0}^{N} h_n \}_{N = 0}^\infty$ is an increasing sequence of nonnegative measurable functions, since a sum of measurable functions is again measurable.

Thus, by monotone convergence and the result for finite sums

\begin{equation*}
\begin{split}
  \int \lim_{N \to \infty} \sum_{n = 0}^{N} h_n \ d\mu &= \lim_{N \to \infty} \int \sum_{n=0}^{N} h_n \ d \mu \\
  &=  \lim_{N \to \infty} \sum_{n=0}^{N} \int h_n \ d \mu \quad \text{(by the finite-sum result above)} \\
  &= \sum_{n = 0}^{\infty} \int h_n \ d \mu
\end{split}
\end{equation*}

Integrals on sets

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $\{ S_i \}_{i = 0}^\infty$ is a sequence of disjoint measurable sets then

\begin{equation*}
\chi_{\bigcup_{i=0}^{\infty} S_i} = \sum_{i=0}^{\infty} \chi_{S_i}
\end{equation*}

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $h$ is a simple function and $S$ is a measurable set, then $h \chi_S$ is a simple function.

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

Let $f$ be a nonnegative measurable function and $S \in \mathcal{A}$.

The integral of $f$ on $S$ is defined to be

\begin{equation*}
\int_S f \dd{\mu} = \int f \chi_S \dd{\mu}
\end{equation*}

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

Let $f$ be a nonnegative measurable function.

  • If $S_1$ and $S_2$ are disjoint measurable sets, then

    \begin{equation*}
\int_{S_1 \cup S_2} f \dd{\mu} = \int_{S_1}^{} f \dd{\mu} + \int_{S_2}^{} f \dd{\mu}
\end{equation*}
  • If $\{ S_i \}_{i = 0}^\infty$ are disjoint measurable sets, then

    \begin{equation*}
\int_{\bigcup_{i=0}^{\infty} S_i}^{} f \dd{\mu} = \sum_{i=0}^{\infty} \int_{S_i}^{} f \dd{\mu}
\end{equation*}

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $f$ is a nonnegative measurable function, then $\mu_f: \mathcal{A} \to [0, \infty]$ defined by $\forall S \in \mathcal{A}$:

\begin{equation*}
\mu_f(S) = \int_S f \dd{\mu}
\end{equation*}

is a measure on $\mathcal{A}$.

If $0 < \int f \dd{\mu} < \infty$, then $P_f: \mathcal{A} \to [0, \infty]$ defined by $\forall S \in \mathcal{A}$:

\begin{equation*}
P_f(S) = \frac{1}{\int_{\Omega}^{} f \dd{\mu}} \int_{S}^{} f \dd{\mu}
\end{equation*}

is a probability measure.

The (real) Gaussian measure on $\mathbb{R}$ is defined as:

\begin{equation*}
P_{\text{Gaussian}}(S) = \frac{1}{\int e^{-x^2} \dd{m(x)}} \int_{S}^{} e^{-x^2} \dd{m(x)}
\end{equation*}

where $m$ denotes the Lebesgue measure.
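
As a numerical sketch (Python; the cutoff $\left| x \right| \le 20$ and the grid width are arbitrary choices, harmless here since $e^{-x^2}$ decays so fast), the normalizing constant is $\int e^{-x^2} \dd{m(x)} = \sqrt{\pi}$ and the Gaussian measure of a set is its normalized weight:

\begin{verbatim}
import numpy as np

dx = 1e-4
x = np.arange(-20.0, 20.0, dx)   # effectively all of R for e^{-x^2}
w = np.exp(-x**2)
Z = w.sum() * dx                 # ~ sqrt(pi) = 1.77245...
print(Z, np.sqrt(np.pi))

def P_gaussian(a, b):
    # P_Gaussian([a, b]) via the same Riemann sum, normalized by Z
    mask = (x >= a) & (x <= b)
    return w[mask].sum() * dx / Z

print(P_gaussian(-20, 20))       # ~ 1: it is a probability measure
print(P_gaussian(0, 20))         # ~ 0.5, by symmetry
\end{verbatim}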

A Gaussian probability measure can also be defined on an arbitrary Banach space $\Omega$ as follows.

We say $\mu$ is a Gaussian probability measure on $\Omega$ if and only if $\mu$ is a Borel measure, i.e.

\begin{equation*}
\mu: \mathcal{B}(\Omega) \to \mathbb{R}
\end{equation*}

such that $\ell^* \mu$ is a real Gaussian probability measure on $\mathbb{R}$ for every linear functional $\ell: \Omega \to \mathbb{R}$, i.e. $\forall \ell \in \Omega^*$.

Here we have used the notation $\ell^*: \mathcal{B}^*(\Omega) \to \mathcal{B}^*(\mathbb{R})$, defined

\begin{equation*}
\big( \ell^* \mu \big)(A) = \mu \big( \ell^{-1}(A) \big)
\end{equation*}

where $\mathcal{B}^*(\Omega)$ denotes the Borel measures on $\Omega$.

Integrals of general functions

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $f: \Omega \to [- \infty, \infty]$ is a measurable function, then the positive and negative parts are defined by

\begin{equation*}
\begin{split}
  f^+ &= f \chi_{\left\{ x: f(x) > 0 \right\}} \\
  f^- &= f \chi_{\left\{ x: f(x) < 0 \right\}}
\end{split}
\end{equation*}

Note: $f^+$ and $(-1) \cdot f^-$ are nonnegative.

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $f: \Omega \to [- \infty, \infty]$ is a measurable function, then $f^+$ and $f^-$ are measurable functions.

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

  • A nonnegative function is defined to be integrable if it is measurable and $\int f \dd{\mu} < \infty$.
  • A function $f$ is defined to be integrable if it is measurable and $f^+ + (- f^-)$ is integrable.

For an integrable function $f$, the integral of $f$ is defined to be

\begin{equation*}
\int f \dd{\mu} = \int f^+ \dd{\mu} - \int \big( - f^- \big) \dd{\mu}
\end{equation*}

On a set $S$, the integral is defined to be

\begin{equation*}
\int_S f \dd{\mu} = \int_S f^+ \dd{\mu} - \int_S \big( - f^- \big) \dd{\mu}
\end{equation*}

Note that $\left| f \right| = f^+ + (- f^-)$, but in the actual definition of the integral, we use $- \int (- f^-) \dd{\mu}$, i.e. we subtract the integral of the nonnegative function $- f^-$.

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

If $f$ and $g$ are real-valued integrable functions and $\alpha \in \mathbb{R}$, then

  1. (Scalar multiplication)

    \begin{equation*}
\int \alpha f \dd{\mu} = \alpha \int f \dd{\mu}
\end{equation*}
  2. (Additive)

    \begin{equation*}
\int \big( f + g \big) \dd{\mu} = \int f \dd{\mu} + \int g  \dd{\mu}
\end{equation*}

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

Let $f$ and $g$ be measurable functions s.t.

\begin{equation*}
\forall \omega \in \Omega: \quad \left| f(\omega) \right| \le g(\omega)
\end{equation*}

If $g$ is integrable then $f$ is integrable.

Examples

Consider $\mathbb{R}$ with Lebesgue measure. Is $\frac{1}{1 + x^2}$ integrable?
\begin{equation*}
\frac{1}{1 + x^2} < \frac{1}{x^2}
\end{equation*}

And

\begin{equation*}
\int_{1}^{\infty} \frac{1}{x^2} \dd{x} = \int_{-\infty}^{-1} \frac{1}{x^2} \dd{x} = 1
\end{equation*}

and

\begin{equation*}
\int_{-1}^{1} \frac{1}{1 + x^2} \le 2
\end{equation*}

therefore

\begin{equation*}
\int \frac{1}{1 + x^2} \dd{x} \le 1 + 2 + 1 < \infty
\end{equation*}

Thus, $\frac{1}{1 + x^2}$ is integrable.
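
Numerically (a Python sketch; in fact $\int \frac{1}{1 + x^2} \dd{x} = \pi$, since $\arctan$ is an antiderivative, comfortably below the crude bound $4$ derived above):

\begin{verbatim}
import numpy as np

dx = 1e-3
x = np.arange(-1000.0, 1000.0, dx)
approx = (1.0 / (1.0 + x**2)).sum() * dx
print(approx, np.pi)   # ~ 3.1396 vs pi; tails beyond |x| = 1000 add ~ 0.002
\end{verbatim}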

Lebesgue dominated convergence theorem

Let $\big( \Omega, \mathcal{A}, \mu \big)$ be a measure or probability space.

Let $g$ be a nonnegative integrable function and let $\left\{ f_n \right\}_{n = 0}^\infty$ be a sequence of (not necessarily nonnegative!) measurable functions.

Assume $g$ and all $f_n$ are real-valued.

If for all $\omega \in \Omega$ and $n \in \mathbb{N}$ we have

\begin{equation*}
\left| f_n(\omega) \right| \le g(\omega)
\end{equation*}

and the pointwise limit

\begin{equation*}
\lim_{n \to \infty} f_n(\omega)
\end{equation*}

exists.

Then

\begin{equation*}
\lim_{n\to \infty} \int f_n \dd{\mu} = \int \Big( \lim_{n \to \infty} f_n \Big) \dd{\mu}
\end{equation*}

That is, if there exists a "dominating function" $g(\omega)$, then we can "move" the limit into the integral.

Since $\left| f_n(\omega) \right| \le g(\omega)$ for all $n \in \mathbb{N}$ and $\omega \in \Omega$, we find that for every $n \in \mathbb{N}$ the functions $g - f_n$ and $g + f_n$ are nonnegative.

Consider $a \in \left\{ -1, 1 \right\}$

From Fatou's lemma, we have

\begin{equation*}
\begin{split}
  \int \Big( g + \lim_{n \to \infty} \big( a f_n \big) \Big) \dd{\mu} &= \int \liminf_{n \to \infty} \big( g + a f_n \big) \dd{\mu} \\
  & \le \liminf_{n \to \infty} \int \big( g + a f_n \big) \dd{\mu} \\
  &= \int g \dd{\mu} + \liminf_{n \to \infty} \int a f_n \dd{\mu}
\end{split}
\end{equation*}

Therefore, subtracting the finite $\int g \dd{\mu}$ from both sides,

\begin{equation*}
\int \lim_{n \to \infty} \big( a f_n \big) \dd{\mu} \le \liminf_{n \to \infty} \int a f_n \dd{\mu}
\end{equation*}

Consider $a = 1$, then

\begin{equation*}
\int \lim_{n \to \infty} f_n \dd{\mu} \le \liminf_{n \to \infty} \int f_n \dd{\mu}
\end{equation*}

(this looks very much like Fatou's lemma, but it ain't; $f_n$ does not necessarily have to be nonnegative as in Fatou's lemma)

Consider $a = -1$

\begin{equation*}
- \int \lim_{n \to \infty} f_n \dd{\mu} \le \liminf_{n \to \infty} - \int f_n \dd{\mu} = - \limsup_{n \to \infty} \int f_n \dd{\mu}
\end{equation*}

Therefore,

\begin{equation*}
\int \lim_{n \to \infty} f_n \dd{\mu} \ge \limsup_{n \to \infty} \int f_n \dd{\mu}
\end{equation*}

Which implies

\begin{equation*}
\limsup_{n \to \infty} \int f_n \dd{\mu} \le \int \lim_{n \to \infty} f_n \dd{\mu} \le \liminf_{n \to \infty} \int f_n \dd{\mu}
\end{equation*}

Since $\liminf \le \limsup$, we then have that $\lim_{n \to \infty} \int f_n \dd{\mu}$ exists and is equal to $\int \lim_{n \to \infty} f_n \dd{\mu}$.

Examples of failure of dominated convergence

Where dominated convergence does not work

On $\mathbb{R}$ with Lebesgue measure, consider

\begin{equation*}
\begin{split}
  f_n &= \chi_{[n, n + \frac{1}{2}]} \\
  g_n &= \arctan \big( x - n \big) + \frac{\pi}{2}
\end{split}
\end{equation*}

such that $g_n \in [0, \pi]$ instead of $[- \frac{\pi}{2}, \frac{\pi}{2}]$ as "usual" with $\arctan$.

Both of these are nonnegative sequences that converge to $0$ pointwise.

Notice there is no integrable dominating function for either of these sequences:

  • $f_n$ would require a dominating function to have infinite integral, therefore no dominating integrable function exists.
  • for each fixed $n$, $g_n(x) \longrightarrow \pi$ as $x \to \infty$, and so a dominating function would have to stay above, say, $\frac{\pi}{2}$ on some interval $[a, \infty)$, which would lead to infinite integral.

Thus, Lebesgue dominated convergence does not apply

\begin{equation*}
\begin{split}
  \int \lim_{n \to \infty} f_n \dd{m} = 0 &< \frac{1}{2} = \lim_{n \to \infty} \int f_n \dd{m} \\
  \int \lim_{n \to \infty} g_n \dd{m} = 0 & < \infty = \lim_{n \to \infty} \int g_n \dd{m}
\end{split}
\end{equation*}
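
A numerical sketch of the first failure (Python): every $f_n$ has integral $\frac{1}{2}$, yet at each fixed point the sequence is eventually $0$:

\begin{verbatim}
import numpy as np

x = np.linspace(0.0, 200.0, 2_000_001)
dx = x[1] - x[0]

def f_n(n):
    return ((x >= n) & (x <= n + 0.5)).astype(float)  # indicator of [n, n + 1/2]

for n in [1, 10, 100]:
    print(n, f_n(n).sum() * dx)   # ~ 0.5 for every n: the mass escapes to infinity

x0 = 3.0                          # at a fixed x0, f_n(x0) = 0 once n > x0
print([1.0 if n <= x0 <= n + 0.5 else 0.0 for n in range(1, 8)])
\end{verbatim}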
Noncommutative limits: simple case
\begin{equation*}
\lim_{N \to \infty} \lim_{M \to \infty} \big( N - M \big) = - \infty \ne \infty = \lim_{M \to \infty} \lim_{N \to \infty} \big( N - M \big)
\end{equation*}
Noncommutative limits: another one

Consider $\mathbb{R}$ with Lebesgue measure and

\begin{equation*}
f(x) = \frac{1}{x} \chi_{[1, \infty)} - \frac{1}{\left| x \right|} \chi_{(-\infty, -1]}
\end{equation*}

Consider $a < -1$ and $ b > 1$ and

\begin{equation*}
\begin{split}
  \int_{a}^{b} f \dd{x} &= - \int_{a}^{-1} \frac{1}{\left| x \right|} \dd{x} + \int_{1}^{b} \frac{1}{x} \dd{x} \\
  &= - \int_{1}^{\left| a \right|} \frac{1}{x} \dd{x} + \int_{1}^{b} \frac{1}{x} \dd{x} \\
  &= - \log \left| a \right| + \log b
\end{split}
\end{equation*}

Note that $\int_{1}^{\infty} \frac{1}{x} \dd{x} = \infty$, so $f$ is not integrable.

Consider

\begin{equation*}
\begin{split}
  \lim_{N \to \infty} \int \chi_{[-N, N]} f \dd{x} &= \lim_{N \to \infty} \Big( - \log \left| - N \right| + \log N \Big) = \lim_{N \to \infty} 0 = 0 \\
  \lim_{N \to \infty} \int \chi_{[-N, 2N]} f \dd{x} &= \lim_{N \to \infty} \Big( - \log N + \log 2N  \Big) = \lim_{N \to \infty} \log 2 = \log 2
\end{split}
\end{equation*}
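
Numerically (a Python sketch using the closed forms just derived, plus a direct Riemann sum as a sanity check):

\begin{verbatim}
import numpy as np

for N in [10, 100, 1000]:
    sym  = -np.log(N) + np.log(N)       # window [-N, N]:  always 0
    asym = -np.log(N) + np.log(2 * N)   # window [-N, 2N]: always log 2
    print(N, sym, asym)

# direct Riemann-sum check of the [-N, 2N] window for N = 100
dx = 1e-4
pos = (1.0 / np.arange(1.0, 200.0, dx)).sum() * dx   # int_1^{2N} dx/x
neg = (1.0 / np.arange(1.0, 100.0, dx)).sum() * dx   # int_1^{N}  dx/x
print(pos - neg, np.log(2))
\end{verbatim}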

Commutative limits

Consider

\begin{equation*}
f(x) = \frac{1}{1 + x^2} \sin(x)
\end{equation*}
\begin{equation*}
\lim_{N \to \infty} \lim_{M \to \infty} \int \chi_{[-M, N]} f \dd{x} = \lim_{M \to \infty} \lim_{N \to \infty} \int \chi_{[- M, N]} f \dd{x}
\end{equation*}

We know that $\frac{\left| \sin(x) \right|}{1 + x^2}$ is integrable and for all $M$ and $N$,

\begin{equation*}
\left| \chi_{[-M, N]}(x) f(x) \right| \le \frac{\left| \sin(x) \right|}{1 + x^2}
\end{equation*}

By multiple applications of LDCT

\begin{equation*}
\begin{split}
  \lim_{N \to \infty} \lim_{M \to \infty} \int \chi_{[-M, N]} f \dd{x} &= \lim_{N \to \infty} \int \chi_{(-\infty, N]} f \dd{x} \\
  &= \int \chi_{(-\infty, \infty)} f \dd{x} \\
  &= \int f \dd{x} \\
  &= \int \chi_{(-\infty, \infty)} f \dd{x} \\
  &= \lim_{M \to \infty} \int \chi_{[-M, \infty)} f \dd{x} \\
  &= \lim_{M \to \infty} \lim_{N \to \infty} \int \chi_{[-M, N]} f \dd{x}
\end{split}
\end{equation*}

Showing that in this case the limits do in fact commute.

Riemann integrable functions are measurable

All Riemann integrable functions are measurable.

For any Riemann integrable function, the Riemann integral and the Lebesgue integral are equal.

Almost everywhere and Lp spaces

If $f$ is a nonnegative, measurable function, and $\int f \dd{\mu} = 0$, then $f \overset{\text{a.e.}}{=} 0$.

Let

\begin{equation*}
\begin{split}
  T_0 &= f^{-1} \Big( (1, \infty] \Big) \\
  T_n &= f^{-1} \bigg( \bigg( \frac{1}{n + 1}, \frac{1}{n} \bigg] \bigg), \quad n \in \mathbb{Z}^+
\end{split}
\end{equation*}

Observe the $\left\{ T_n \right\}$ are disjoint and

\begin{equation*}
\bigcup_{i = 0}^{\infty} T_i = f^{-1} \bigg( (0, \infty] \bigg)
\end{equation*}

Suppose that $f \overset{\text{a.e.}}{\ne} 0$. This implies that $f > 0$ on a set of positive measure, i.e.

\begin{equation*}
\mu \bigg( f^{-1} \Big( (0, \infty] \Big) \bigg) > 0
\end{equation*}

but this implies that

\begin{equation*}
\begin{split}
  \mu \bigg( \bigcup_{i=0}^{\infty} T_i \bigg) &> 0 \\
  \iff \quad \sum_{i=0}^{\infty} \mu (T_i) & > 0 \\
  \implies \quad \exists j \in \mathbb{N}: \quad \mu(T_j) & > 0
\end{split}
\end{equation*}

Thus,

\begin{equation*}
\begin{split}
  f(\omega) &> \frac{1}{j + 1}, \quad \forall \omega \in T_j \\
  \implies \quad f &\ge \frac{1}{j + 1} \chi_{T_j} \\
  \implies \quad \int f \dd{\mu} &\ge \frac{1}{j + 1} \mu(T_j) > 0
\end{split}
\end{equation*}

which is a contradiction, hence $f \overset{\text{a.e.}}{=} 0$.

Let $f$ and $g$ be integrable.

\begin{equation*}
f \overset{\text{a.e.}}{=} g \iff \forall S \in \mathcal{A} : \quad \int_S f \dd{\mu} = \int_S g \dd{\mu}
\end{equation*}

$L^1(\dd{\mu})$ is the set of all equivalence classes of integrable functions wrt. the equivalence relation given by a.e. equality, i.e.

\begin{equation*}
L^1(\dd{\mu}) = \left\{ f \text{ integrable wrt. } \mu \right\} / \underset{\text{a.e.}}{\sim} \quad \text{where} \quad f \underset{\text{a.e.}}{\sim} g \iff f \overset{\text{a.e.}}{=} g
\end{equation*}

If $f$ is an integrable function, the $L^1$ norm is

\begin{equation*}
\norm{f}_{L^1} = \int \left| f \right| \dd{\mu}
\end{equation*}

If $[f] \in L^1(\dd{\mu})$ and $g \in [f]$, the integral and norm are defined to be

\begin{equation*}
\begin{split}
  \int [f] \dd{\mu} &= \int g \dd{\mu} \\
  \norm{[f]}_{L^1} &= \norm{g}_{L^1}
\end{split}
\end{equation*}

If $[f], [g], [h] \in L^1$, then $[f + g] \in L^1$, and

\begin{equation*}
\norm{[f - g]}_{L^1} \le \norm{[f - h]}_{L^1} + \norm{[h - g]}_{L^1}
\end{equation*}

$L^1$ is a real vector space with addition and scalar multiplication given pointwise almost everywhere.

Functions taking on $\pm \infty$ on a set of zero measure are fine!

These functions are still almost everywhere equal to some real-valued integrable function (even though they take infinite values somewhere), hence these are in $L^1$.

$L^1$ with the metric

\begin{equation*}
d \big( [f], [g] \big) = \norm{[f - g]}_{L^1}
\end{equation*}

is a complete metric space.

Let $\left\{ [f_k] \right\}$ be a Cauchy sequence. Since the $f_k$ are integrable, we may assume we choose $\mathbb{R}$ valued representatives.

For $k \in \mathbb{N}$, let $N_k$ be such that for $m, n \ge N_k$,

\begin{equation*}
\int \left| f_m - f_n \right| \dd{\mu} < 2^{-k}
\end{equation*}

and $N_{k + 1} > N_k$.

Thus,

\begin{equation*}
\int \left| f_{N_k} - f_{N_{k + 1}} \right| \dd{\mu} < 2^{-k}
\end{equation*}

and

\begin{equation*}
\int \sum_{k=1}^{\infty} \left| f_{N_k}  - f_{N_{k + 1}} \right| \dd{\mu} = \sum_{k=1}^{\infty} \int \left| f_{N_k} - f_{N_{k+1}} \right| \dd{\mu} < 1
\end{equation*}

Thus, $\sum_{k=1}^{\infty} \left| f_{N_k} - f_{N_{k + 1}} \right|$ is finite almost everywhere. Thus, this series is infinite on a set of measure zero, so we may assume the representatives $f_k$ are zero there and the sum is finite at each $\omega \in \Omega$.

Thus, $\sum_{k=1}^{\infty} \big( - f_{N_k} + f_{N_{k +1}} \big)$ converges everywhere.

Let

\begin{equation*}
\begin{split}
  g &= \left| f_{N_1} \right| + \sum_{k=1}^{\infty} \left| f_{N_k} - f_{N_{k + 1}} \right| \\
  f &= f_{N_1} + \sum_{k=1}^{\infty} \big( - f_{N_k} + f_{N_{k + 1}} \big)
\end{split}
\end{equation*}

(observe that the last part is just a telescoping rewriting of $\lim_{k \to \infty} f_{N_k}$).

By monotone convergence theorem

\begin{equation*}
\begin{split}
  \int g \dd{\mu} &\le \int \left| f_{N_1} \right| \dd{\mu} + \int \sum_{k=1}^{\infty} \left| f_{N_k} - f_{N_{k + 1}} \right| \dd{\mu} \\
  &= \int \left| f_{N_1} \right| \dd{\mu} + \sum_{k=1}^{\infty} \int \left| f_{N_k} - f_{N_{k + 1}} \right| \dd{\mu} \\
 &< \infty
\end{split}
\end{equation*}

Observe that pointwise $\left| f \right| \le g$, so $f$ is integrable, and similarly $\left| f - f_{N_k} \right| \le g$ for every $k$. Since $f_{N_k} \to f$ pointwise, dominated convergence gives $\norm{f - f_{N_k}}_{L^1} \to 0$. A Cauchy sequence with a convergent subsequence converges, so $[f_k] \to [f]$ in $L^1$, and the space is complete.

Applications to Probability

Notation

Probability and cumulative distributions

An elementary event is an element of $\Omega$.

A random event is an element of $\mathcal{A}$.

A random variable is a measurable function from $\big( \Omega, \mathcal{A} \big)$ to $\Big( [- \infty, \infty], \mathcal{B} \big( [- \infty, \infty] \big) \Big)$.

Let

  • $\big( \Omega_1, \mathcal{A}_1, \mu \big)$ be a measure space and $\big( \Omega_2, \mathcal{A}_2 \big)$ be a measurable space
  • $f: \Omega_1 \to \Omega_2$ be a measurable function

Then we say that the push-forward of $\mu$ by $f$ is defined

\begin{equation*}
\big( f_* \mu \big)(A) = \mu \big( f^{-1}(A) \big), \quad \forall A \in \mathcal{A}_2
\end{equation*}

The probability distribution measure of $X$, denoted $\rho_X: \mathcal{B}(\mathbb{R}) \to [0, 1]$, is defined

\begin{equation*}
\forall S \in \mathcal{B}(\mathbb{R}), \quad \rho_X(S) = P(X \in S)
\end{equation*}

Equivalently, it's the push-forward of $P$ by $X$:

\begin{equation*}
\rho_X := X_* P = P \circ X^{-1}
\end{equation*}

In certain circles not including measure-theorists (existence of such circles is trivial), you might hear talks about "probability distributions". Usually what is meant by this is $\rho_X$ for some random variable $X$.

That is, a "distribution of $X$" usually means that there is some probability space $\big( \Omega, \mathcal{A}, P \big)$ in which $X$ is a random variable, i.e. $X: \Omega \to \mathbb{R}$ and the "distribution of $X$" is the corresponding probability distribution measure!

Confusingly enough, they will often talk about "$P$ distribution of $X$", in which case $P$ is NOT a probability measure, but denotes a probability distribution measure of the random variable.

The cumulative distribution function of $X$, denoted $F_X: \mathbb{R} \to [0, 1]$, is defined by

\begin{equation*}
\forall x \in \mathbb{R}, \quad F_X(x) = P(X \le x) = \rho_X \big( (- \infty, x] \big)
\end{equation*}

where $\rho_X$ is the probability distribution measure of $X$.

The probability distribution measure $\rho_X$ is a probability measure on the Borel sets $\mathcal{B}$.

\begin{equation*}
\rho_X(\emptyset) = P(X \in \emptyset) = P \Big( X^{-1}(\emptyset) \Big) = P(\emptyset) = 0
\end{equation*}

If $\left\{ S_i \right\}_{i = 0}^\infty$ is a disjoint sequence of sets in $\mathcal{B}(\mathbb{R})$, then

\begin{equation*}
\begin{split}
  \rho_X \big( \bigcup_{i=0}^{\infty} S_i \big) &= P \Big( X^{-1} \big( \bigcup_{i = 0}^{\infty} S_i \big) \Big) \\
  &= P \Big( \bigcup_{i=0}^{\infty} X^{-1}(S_i) \Big) \\
  &= \sum_{i=0}^{\infty} P \Big( X^{-1}(S_i) \Big) \\
  &= \sum_{i=0}^{\infty} \rho_X(S_i)
\end{split}
\end{equation*}

so $\rho_X$ satisfies countable additivity and is a measure.

Finally,

\begin{equation*}
\rho_X(\mathbb{R}) = P(X \in \mathbb{R}) = P(\Omega) = 1
\end{equation*}

so $\rho_X$ is a probability measure.

  1. $F_X$ is increasing
  2. $\lim_{x \to \infty} F_X = 1$ and $\lim_{x \to - \infty} F_X(x) = 0$
  3. $F_X$ is right continuous (i.e. continuous from the right)

    \begin{equation*}
\forall x \in \mathbb{R}, \forall \varepsilon > 0, \exists \delta > 0: \quad \forall y \in \mathbb{R}: \quad 0 < y - x < \delta \implies |F_X(y) - F_X(x)| < \varepsilon
\end{equation*}
  1. If $x < y$, then

    \begin{equation*}
\begin{split}
  F_X(x) &= P(X \le x) \\
  &= P \Big( X^{-1} \big( (- \infty, x] \big) \Big) \\
  & \le P \Big( X^{-1} \big( ( -\infty, x] \big) \Big) + P \Big( X^{-1} \big( (x, y] \big) \Big) \\
  &= P \Big( X^{-1} \big( (-\infty, x] \cup (x, y] \big) \Big) \\
  &= P \Big( X^{-1} \big( (- \infty, y] \big) \Big) \\
  &= F_X(y)
\end{split}
\end{equation*}
  2. Consider the limit as $x \to \infty$. Let

    \begin{equation*}
\begin{split}
  T_0 &= X^{-1} \big( (- \infty, 0] \big) \\
  T_n &= X^{-1} \big( (n - 1, n] \big), \quad \forall n \in \mathbb{Z}^+
\end{split}
\end{equation*}

    so

    \begin{equation*}
\Omega = X^{-1} ( \mathbb{R}) = \bigcup_{n = 0}^{\infty} T_n
\end{equation*}

    Then,

    \begin{equation*}
\begin{split}
  1 &= P (\Omega) = P \bigg( \bigcup_{i=0}^{\infty} T_i \bigg) \\
  &= \lim_{n \to \infty} \sum_{i=0}^{n} P(T_i) \\
  &= \lim_{n \to \infty} \Bigg( P \bigg( X^{-1} \big( (- \infty, 0] \big) \bigg) + \sum_{i=1}^{n} P \bigg( X^{-1} \Big( (i - 1, i] \Big) \bigg) \Bigg) \\
  &= \lim_{n \to \infty} P \bigg( X^{-1} \Big( (- \infty, n] \Big) \bigg) \\
  &= \lim_{n \to \infty} F_X(n)
\end{split}
\end{equation*}

    which, since $F_X$ is increasing, implies

    \begin{equation*}
\lim_{x \to \infty} F_X(x) = 1
\end{equation*}
  3. Let $x \in \mathbb{R}$ and $\varepsilon > 0$. Let

    \begin{equation*}
S_n = \bigg( x, x + \frac{1}{n} \bigg]
\end{equation*}

    The $S_n$ are nested (decreasing), and similarly the $X^{-1}(S_n)$ are nested.

    \begin{equation*}
\begin{split}
  0 &= P (\emptyset) = P \bigg( X^{-1} \bigg( \bigcap_{i=1}^{\infty} S_i \bigg) \bigg) \\
  &= \lim_{n \to \infty} P \Big( X^{-1}(S_n) \Big) \\
  &= \lim_{n \to \infty} P \bigg( X^{-1} \bigg( \bigg(x, x + \frac{1}{n} \bigg] \bigg) \bigg) \\
  &= \lim_{n \to \infty} P \bigg( X^{-1} \bigg( \bigg( - \infty, x + \frac{1}{n} \bigg] \bigg) \bigg) - P \bigg( X^{-1} \bigg( \bigg(-\infty, x \bigg] \bigg) \bigg) \\
  &= \lim_{n \to \infty} \bigg[ F_X \bigg( x + \frac{1}{n} \bigg) - F_X(x) \bigg]
\end{split}
\end{equation*}

    Thus, given $\varepsilon > 0$, there exists $N$ such that

    \begin{equation*}
F_X \bigg( x + \frac{1}{N} \bigg) - F_X(x) < \varepsilon
\end{equation*}

    Let $\delta = \frac{1}{N}$ so

    \begin{equation*}
0 < y -x < \delta \implies y < x + \frac{1}{N}
\end{equation*}
    \begin{equation*}
F_X(y) - F_X(x) \le F_X \bigg( x + \frac{1}{N} \bigg) - F_X (x) < \varepsilon
\end{equation*}

Radon-Nikodym derivatives and expectations

Let $X$ be a random variable and let $f: \mathbb{R} \to [0, \infty)$ be a nonnegative Borel measurable function.

The following are equivalent:

  1. $\forall x \in \mathbb{R}: F_X(x) = \int_{-\infty}^{x} f(s) \dd{s}$
  2. $f$ is a Radon-Nikodym derivative for $\rho_X$ wrt. $m \big|_{\mathcal{B}}$ (the Lebesgue measure but restricted to Borel measurable sets)
  3. $\rho_X = f m\big|_{\mathcal{B}}$

(2) and (3) are immediately equivalent, by definition. Moreover,

\begin{equation*}
\rho_X \Big( (- \infty, x] \Big) = F_X(x) = \int_{-\infty}^{x} f(s) \dd{s} = \int_{-\infty}^{x} f \dd{m}
\end{equation*}

holds for all $x \in \mathbb{R}$ iff (2) (equivalently (3)) holds when considering only sets of the form $(- \infty, x]$; and this statement is precisely (1).

Thus (1) is equivalent to (2)/(3) restricted to sets of the form $(-\infty, x]$.

However, sets of the form $(-\infty, x]$ generate $\mathcal{B}(\mathbb{R})$, so from the Carathéodory extension theorem this gives $(1) \iff (2) \iff (3)$.

To prove $(1) \implies (2)$ more rigorously, let

\begin{equation*}
S \in \mathcal{L}_{\text{finite}} \iff S = \bigcup_{i=1}^{n} [c_i, d_i)
\end{equation*}

for $c_i, d_i \in [-\infty, \infty]$ s.t. $c_i < d_i$ and none of these intervals overlap; that is, all finite unions of disjoint left-closed, right-open intervals.

Also let

\begin{equation*}
\lambda(S) = \sum_{i=1}^{n} \big[ F_X(d_i) - F_X(c_i) \big]
\end{equation*}

Observe that

\begin{equation*}
\lambda(S) = \sum_{i=1}^{n} \bigg[ \int_{-\infty}^{d_i} f(x) \dd{x} - \int_{-\infty}^{c_i} f(x) \dd{x} \bigg] = \sum_{i=1}^{n} \int_{c_i}^{d_i} f(x) \dd{x} = \int_S f \dd{m}
\end{equation*}

and that

\begin{equation*}
\begin{split}
  \lambda(S) &= \sum_{i=1}^{n} P \Big( X^{-1}([c_i, d_i)) \Big) \\
  &= P \bigg( \bigcup_{i=1}^{n} X^{-1} \big( [c_i, d_i) \big) \bigg) \\
  &= P \Bigg( X^{-1} \bigg( \bigcup_{i=1}^{n} [c_i, d_i) \bigg) \Bigg) \\
  &= P \Big( X^{-1}(S) \Big) \\
  &= \rho_X(S)
\end{split}
\end{equation*}

One can show that $\big( \mathcal{L}_{\text{finite}}, \lambda \big)$ is a premeasure space. Therefore, by the Carathéodory extension theorem, there is a measure $\mu$ on $\mathcal{B}$ s.t.

\begin{equation*}
\mu(S) = \lambda(S), \quad \forall S \in \mathcal{L}_{\text{finite}}
\end{equation*}

Furthermore, since $\lambda(\mathbb{R}) = 1 < \infty$, $\mu$ is unique! But both $S \mapsto \int_S f \dd{m}$ and $\rho_X$ are measures on $\mathcal{B}$ agreeing with $\lambda$ on $\mathcal{L}_{\text{finite}}$, thus

\begin{equation*}
\rho_X(S) = \int_{S}^{} f \dd{m}, \quad \forall S \in \mathcal{B}
\end{equation*}

which is the definition of $f$ being a Radon-Nikodym derivative of $\rho_X$ wrt. Lebesgue measure restricted to the Borel σ-algebra, as wanted.

A function $f$ is a probability density function for $X$ if $f$ is a Radon-Nikodym derivative of the probability distribution measure $\rho_X$, wrt. Lebesgue measure restricted to Borel sets, i.e.

\begin{equation*}
\rho_X = f \dd{m} \big|_{\mathcal{B}}
\end{equation*}

Expectation via distributions

Expectation of a random variable is

\begin{equation*}
\mathbb{E}(X) = \int X \dd{P}
\end{equation*}

If $g: \mathbb{R} \to [0, \infty)$ is a nonnegative function that is $\mathcal{B}(\mathbb{R})$ measurable, then

\begin{equation*}
\mathbb{E} \big[ g(X) \big] = \int g(s) \dd{\rho_X(s)}
\end{equation*}

If $g$ is a characteristic function, say $g = \chi_S$ for some $S \in \mathcal{B}(\mathbb{R})$, then

\begin{equation*}
g \Big( X(\omega) \Big) = 
\begin{cases}
  1 & \text{if } X(\omega) \in S \\
  0 & \text{if } X(\omega) \notin S
\end{cases}
\end{equation*}

so

\begin{equation*}
\int g \Big( X(\omega) \Big) \dd{P(\omega)} = \int \chi_{X^{-1}(S)} \dd{P} = P(X \in S) = \rho_X(S) = \int \chi_S \dd{\rho_X} = \int g \dd{\rho_X}
\end{equation*}

Multiplying by constants and summing over different characteristic functions, we get the result for any simple function.

Given a nonnegative function $g$, let $\left\{ g_n \right\}$ be an increasing sequence of simple functions converging pointwise to $g$.

Note $g \circ X$ is the increasing limit of $g_n \circ X$. By two applications of Monotone Convergence

\begin{equation*}
\begin{split}
  \mathbb{E} \big[ g(X) \big] &= \int g(X) \dd{P}\\
  &= \int \lim_{n \to \infty} g_n(X) \dd{P} \quad\\
  &= \lim_{n \to \infty} \int g_n(X) \dd{P} \quad \text{(by MC)} \\
  &= \lim_{n \to \infty} \int g_n \dd{\rho_X} \quad \text{(by above)}\\
  &= \int g \dd{\rho_X} \quad \text{(by MC)}
\end{split}
\end{equation*}

This technique, of going from characteristic functions → simple functions → general functions, is used heavily, not just in probability theory.
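
A Monte Carlo sketch (Python; the $\mathrm{Exp}(1)$ distribution and $g(s) = s^2$ are arbitrary choices for illustration) of the identity $\mathbb{E}[g(X)] = \int g \dd{\rho_X}$: averaging $g(X)$ over samples matches integrating $g$ against the density $e^{-s}$ of $\rho_X$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)  # rho_X has density e^{-s}, s >= 0
g = lambda s: s**2

lhs = g(X).mean()                               # E[g(X)]: integral over Omega

ds = 1e-4
s = np.arange(0.0, 50.0, ds)
rhs = (g(s) * np.exp(-s)).sum() * ds            # int g d(rho_X)

print(lhs, rhs)                                 # both ~ 2 = E[X^2] for Exp(1)
\end{verbatim}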

Independent events & Borel-Cantelli theorem

A collection of random events $\left\{ E_{\alpha} \right\}_{\alpha \in A} \subseteq \mathcal{A}$ are independent events if for every finite collection of distinct indices $\alpha_1, \dots, \alpha_N$,

\begin{equation*}
P \Bigg( \bigcap_{i=1}^{N} E_{\alpha_i} \Bigg) = \prod_{i=1}^{N} P \big( E_{\alpha_i} \big)
\end{equation*}

A random event $E_i$ occurs at $\omega \in \Omega$ if $\omega \in E_i$.

The probability that the event occurs is $P(E_i)$.

If $F_1, F_2, \dots, F_n$ are independent then $F_1^C, F_2^C, \dots, F_n^C$ are also independent.

It suffices to prove that $F_1, \dots, F_{n - 1}, F_n^C$ are independent; iterating then gives the full statement.

Consider $F_{i_1}, \dots, F_{i_k}, F_n^C$ with $i_j \ne n$; we want to prove

\begin{equation*}
P \big( F_{i_1} \cap \dots \cap F_{i_k} \cap F_n^C \big) = P \big( F_n^C \big) \prod_{j=1}^{k} P \big( F_{i_j} \big)
\end{equation*}

The RHS can be written

\begin{equation*}
\begin{split}
  P \big( F_n^C \big) \prod_{j=1}^{k} P \big( F_{i_j} \big) &= \big( 1 - P(F_n) \big) \prod_{j=1}^{k} P \big( F_{i_j} \big) \\
  &= \prod_{j=1}^{k} P(F_{i_j}) - P(F_n) \prod_{j=1}^{k} P(F_{i_j}) \\
  &= P \big( F_{i_1} \cap \dots \cap F_{i_k} \big) - P \big( F_{i_1} \cap \dots \cap F_{i_k} \cap F_n \big)
\end{split}
\end{equation*}

which, by additivity, is equal to the LHS above, and implies that the complement is indeed independent.

The condition that infinitely many of the events occur at $\omega$ is

\begin{equation*}
\omega \in \bigcap_{n = 0}^{\infty} \bigcup_{m = n}^{\infty} E_m := \limsup_{n \to \infty} E_n
\end{equation*}

This is equivalent to

\begin{equation*}
\forall n, \ \exists m: \quad \omega \in E_m \quad \text{and} \quad m \ge n
\end{equation*}

where we have converted the $\bigcap_{n=0}^{\infty} \mapsto \forall n$ and $\bigcup_{m=n}^{\infty} \mapsto \exists m, \ m \ge n$.

Furthermore, $\bigcap_{n = 0}^{\infty} \bigcup_{m = n}^{\infty} E_m$ is itself a random event.

  1. If $\sum_{i=0}^{\infty} P(E_i) < \infty$ then the probability of infinitely many of the events occurring is 0, i.e.

    \begin{equation*}
P \Bigg( \bigcap_{n = 0}^{\infty} \bigcup_{m = n}^{\infty} E_m \Bigg) = 0
\end{equation*}
  2. If the $\left\{ E_i \right\}_{i = 0}^{\infty}$ are independent and $\sum_{i=0}^{\infty} P(E_i) = \infty$, then the probability of infinitely many of the events occurring is 1, i.e.

    \begin{equation*}
P \Bigg( \bigcap_{n = 0}^{\infty} \bigcup_{m = n}^{\infty} E_m \Bigg) = 1
\end{equation*}
  1. Suppose $\sum_{i=0}^{\infty} P (E_i) < \infty$.

    \begin{equation*}
\begin{split}
  P \Bigg( \bigcap_{n = 0}^{\infty} \underbrace{\bigcup_{m = n}^{\infty} E_m}_{\text{nested}} \Bigg) &= \lim_{n \to \infty} P \Bigg( \bigcup_{m = n}^{\infty} E_m \Bigg) \\
  & \le \lim_{n\to \infty} \sum_{m=n}^{\infty} P(E_m) \\
  &= 0
\end{split}
\end{equation*}
  2. Suppose $\left\{ E_i \right\}_{i = 0}^{\infty}$ are now independent and that $\sum_{i=0}^{\infty} P(E_i) = \infty$. Fix $n \in \mathbb{N}$. Then

    \begin{alignat*}{2}
P \Bigg( \bigg( \bigcup_{m=n}^{\infty} E_m \bigg)^C \Bigg) &= P \Bigg( \bigcap_{m=n}^{\infty} E_m^C \Bigg) & \\
&= \prod_{m=n}^{\infty} P(E_m^C) \quad & \text{(by indep.)}\\
&= \prod_{m=n}^{\infty} \big( 1 - P(E_m) \big) &  \\
& \le \prod_{m=n}^{\infty} e^{- P(E_m)} \quad & \text{(by } e^{-x} \ge 1 - x ) \\
&= e^{- \sum_{m=n}^{\infty} P(E_m)} & \\
&= 0 &
\end{alignat*}

    Hence $P \big( \bigcup_{m=n}^{\infty} E_m \big) = 1$ for every $n$, and since these unions are nested decreasing in $n$,

    \begin{equation*}
P \Bigg( \bigcap_{n = 0}^{\infty} \bigcup_{m = n}^{\infty} E_m \Bigg) = \lim_{n \to \infty} P \Bigg( \bigcup_{m = n}^{\infty} E_m \Bigg) = 1
\end{equation*}
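
A simulation sketch (Python) contrasting the two cases with independent events: $P(E_n) = n^{-2}$ is summable, so almost surely only finitely many events occur, while $P(E_n) = 1/n$ diverges, so occurrences keep appearing for as long as we simulate:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = np.arange(1, 1_000_001)
u = rng.random(n.size)

occ_summable  = n[u < 1.0 / n**2]   # P(E_n) = n^{-2}: finitely many occurrences
occ_divergent = n[u < 1.0 / n]      # P(E_n) = 1/n:    infinitely many (a.s.)

print(occ_summable)                 # typically a handful of small indices
print(occ_divergent[-5:])           # occurrences persist out to the horizon
\end{verbatim}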

Chebyshev's inequality

Let $\big( \Omega, \mathcal{A}, P \big)$ be a probability space.

If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then

\begin{equation*}
P \big( \left| X - \mu \right| \ge \lambda \big) \le \frac{\sigma^2}{\lambda^2}, \quad \forall \lambda > 0
\end{equation*}

Let

\begin{equation*}
E = \left\{ \omega \in \Omega: \left| X(\omega) - \mu \right| \ge \lambda \right\}
\end{equation*}

Then $\big( X - \mu \big)^2 \ge \lambda^2 \chi_E$ everywhere, so

\begin{equation*}
\sigma^2 = \mathbb{E} \big[ \big( X - \mu \big)^2 \big] \ge \lambda^2 \mathbb{E} \big[ \chi_E \big] = \lambda^2 P(E)
\end{equation*}

Hence,

\begin{equation*}
P(E) \le \frac{\sigma^2}{\lambda^2}
\end{equation*}
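
An empirical sketch (Python; standard normal samples, so $\mu = 0$ and $\sigma^2 = 1$) comparing the actual tail probability with Chebyshev's bound:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(1_000_000)   # mu = 0, sigma^2 = 1

for lam in [1.0, 2.0, 3.0]:
    emp = (np.abs(X) >= lam).mean()  # empirical P(|X - mu| >= lambda)
    print(lam, emp, 1.0 / lam**2)    # stays below the bound sigma^2 / lambda^2
\end{verbatim}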

Independent random variables

Let

  • $\big( \Omega, \mathcal{A}, P \big)$ be a probability space.
  • $N \in \mathbb{N} \cup \left\{ \infty \right\}$

A collection of σ-algebras $\left\{ \mathcal{A}_i \right\}_{i = 1}^N$, where $\mathcal{A}_i \subseteq \mathcal{A}$ for all $i$, is independent if for every collection of events $\left\{ A_i \right\}_{i = 1}^N$ s.t $A_i \in \mathcal{A}_i$ for all $i$, then $\left\{ A_i \right\}_{i = 1}^N$ is a set of independent events.

A collection of random variables $\left\{ X_n \right\}_{n = 1}^N$ is independent if the collection of σ-algebras they generate is independent.

A sequence of random variables $\left\{ X_n \right\}$ is independent and identically distributed (i.i.d.) if they are independent and for all $m, n$ we have

\begin{equation*}
F_{X_m} = F_{X_n}
\end{equation*}

where $F_{X_i}$ is the cumulative distribution function for $X_i$.

Let $X$ and $Y$ be independent.

  1. We have
    1. If $\mathbb{E} \big[ \left| X \right| \big] = 0$ or $\mathbb{E} \big[ \left| Y \right| \big] = 0$ then $\mathbb{E} \big[ \left| XY \right| \big] = 0$
    2. If $\mathbb{E} \big[ \left| X \right| \big] > 0$ and $\mathbb{E} \big[ \left| Y \right| \big] > 0$, then

      \begin{equation*}
\mathbb{E} \big[ \left| XY \right| \big] = \mathbb{E} \big[ \left| X \right| \big] \mathbb{E} \big[ \left| Y \right| \big]
\end{equation*}
  2. Furthermore, if $\mathbb{E} \big[ \left| X \right| \big] < \infty$ and $\mathbb{E} \big[ \left| Y \right| \big] < \infty$, then

    \begin{equation*}
\mathbb{E} \big[ XY \big] = \mathbb{E} \big[ X \big] \mathbb{E} \big[ Y \big]
\end{equation*}

Consider

  • first nonnegative functions
  • subcase $\mathbb{E} \big[ X \big] = 0$

Since $X$ is nonnegative

\begin{equation*}
0 = \mathbb{E} \big[ X \big] = \int X \dd{P} \implies X \overset{\text{a.s.}}{=} 0
\end{equation*}

Thus, $XY \overset{\text{a.s.}}{=} 0$ so $\mathbb{E} \big[ XY \big] = 0$.

Now consider the subcase where $\mathbb{E} \big[ X \big] > 0$ and $\mathbb{E} \big[ Y \big] > 0$.

Let $\mathcal{A}_X$ and $\mathcal{A}_Y$ be the σ-algebras generated by $X$ and $Y$.

Observe that $\big( \Omega, \mathcal{A}_X, P \big|_{\mathcal{A}_X} \big)$ and $\big( \Omega, \mathcal{A}_Y, P \big|_{\mathcal{A}_Y} \big)$ are measure spaces. Let $\left\{ X_n \right\}$ be an increasing sequence of simple functions increasing to $X$ and measurable wrt. $\mathcal{A}_X$, and similarly let $\left\{ Y_n \right\}$ be simple, increasing to $Y$, and $\mathcal{A}_Y$ measurable.

As simple functions, these can be written as

\begin{equation*}
\begin{split}
  X_n &= \sum_{i=1}^{M_n} c_{n, i} \chi_{S_{n, i}} \\
  Y_n &= \sum_{j=1}^{N_n} d_{n, j} \chi_{T_{n, j}}
\end{split}
\end{equation*}

Then,

\begin{equation*}
\begin{split}
  \mathbb{E} \big[ X_n Y_n \big] &= \int \sum_{i=1}^{M_n} \sum_{j=1}^{N_n} c_{n, i} d_{n, j} \chi_{S_{n, i}} \chi_{T_{n, j}} \dd{P} \\
  &= \sum_{i=1}^{M_n} \sum_{j=1}^{N_n} c_{n, i} d_{n, j} P \Big( S_{n, i} \cap T_{n, j} \Big) \\
  &= \sum_{i=1}^{M_n} \sum_{j=1}^{N_n} c_{n, i} d_{n, j} P \big( S_{n, i} \big) P \big( T_{n, j} \big) \\
  &= \bigg( \sum_{i=1}^{M_n} c_{n, i} P \big( S_{n, i} \big) \bigg) \bigg( \sum_{j=1}^{N_n} d_{n, j} P\big(T_{n, j} \big) \bigg) \\
  &= \mathbb{E} \big[ X_n \big] \mathbb{E} \big[ Y_n \big]
\end{split}
\end{equation*}

Since $X_n Y_n$ increases to $XY$, by MCT

\begin{equation*}
\mathbb{E} \big[ XY \big] = \mathbb{E} \big[ X \big] \mathbb{E} \big[ Y \big]
\end{equation*}

Dividing into positive & negative parts & summing gives $(2)$.

Strong Law of Large numbers

Notation

  • $\left\{ X_i \right\}_{i = 1}^{\infty}$ are i.i.d. random variables, and we will assume

    \begin{equation*}
\mathbb{E}[X_i] < \infty \quad \text{and} \quad \text{Var}(X_i) < \infty
\end{equation*}

Stuff

Let $\big( \Omega, \mathcal{A}, P \big)$ be a probability space and $\left\{ X_i \right\}_{i = 1}^{\infty}$ be a sequence of i.i.d. random variables with

\begin{equation*}
\mu := \mathbb{E}[X_i] < \infty \quad \text{and} \quad \sigma^2 := \text{Var}(X_i) < \infty
\end{equation*}

Then the sequence of random variables $\frac{1}{n} \sum_{i=1}^{n} X_i$ converges almost surely to $\mu$, i.e.

\begin{equation*}
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \overset{\text{a.s.}}{=} \mu
\end{equation*}

This is equivalent to the event that $\left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| > \varepsilon$ for infinitely many $n$ occurring with probability 0 (for every $\varepsilon > 0$), and this is the approach we will take.

First consider $X_i \ge 0$.

For $\varepsilon > 0$ and $k \in \mathbb{N}$, let

\begin{equation*}
S_{k, \varepsilon} = \left\{ \omega : \left| \frac{1}{k} \sum_{i=1}^{k} \big( X_i - \mu \big) \right| > \varepsilon \right\}
\end{equation*}

and

\begin{equation*}
A_{\varepsilon} = \bigcap_{n=0}^{\infty} \bigcup_{m=n}^{\infty} S_{m^2, \varepsilon}
\end{equation*}

Since $\left\{ X_i \right\}$ are i.i.d. we have

\begin{equation*}
\text{Var} \bigg( \sum_{i=1}^{k} X_i \bigg) = \sum_{i=1}^{k} \text{Var} \big( X_i \big) = k \sigma^2
\end{equation*}

and since variance rescales quadratically,

\begin{equation*}
\text{Var} \bigg( \frac{1}{k} \sum_{i=1}^{k} X_i \bigg) = \frac{\sigma^2}{k}
\end{equation*}

Using Chebyshev's inequality

\begin{equation*}
P \big( S_{k, \varepsilon} \big) = P \Bigg( \left\{ \left| \sum_{i=1}^{k} \big( X_i - \mu \big) \right| > k\  \varepsilon \right\} \Bigg) \le \frac{\sigma^2 k}{k^2 \varepsilon^2} = \frac{\sigma^2}{k \varepsilon^2}
\end{equation*}

Observe then that with $k = m^2$, we have

\begin{equation*}
\sum_{m=1}^{\infty} P \big( S_{m^2, \varepsilon} \big) \le \frac{\sigma^2}{\varepsilon^2} \sum_{m=1}^{\infty} m^{-2} < \infty
\end{equation*}

And so by Borel-Cantelli (the first part, which requires no independence), we have

\begin{equation*}
P(A_{\varepsilon}) = P \Bigg( \bigcap_{n=0}^{\infty} \bigcup_{m = n}^{\infty} S_{m^2, \varepsilon} \Bigg) = 0
\end{equation*}

In particular, for any $\varepsilon$, there are almost surely only finitely many $m$ with

\begin{equation*}
\left| \frac{1}{m^2} \sum_{i=1}^{m^2} \big( X_i - \mu \big) \right| > \varepsilon
\end{equation*}

Step: showing that we can do this for any $\varepsilon$.

Consider $\bigcup_{r = 1}^{\infty} A_{1 / r}$. Observe that by countable subadditivity,

\begin{equation*}
P \Bigg( \bigcup_{r = 1}^{\infty} A_{1 / r} \Bigg) = 0
\end{equation*}

Now let $\omega \notin \bigcup_{r = 1}^{\infty} A_{1 / r}$, which occurs almost surely from the above. For any $\varepsilon > 0$, choose $r$ such that

\begin{equation*}
r > \varepsilon^{-1}
\end{equation*}

Since $\omega \notin A_{1 / r}$, there are only finitely many $m$ s.t.

\begin{equation*}
\left| \bigg( \frac{1}{m^2} \sum_{i=1}^{m^2} X_i \bigg) - \mu \right| > \frac{1}{r}
\end{equation*}

and since $\varepsilon > \frac{1}{r}$, in particular only finitely many $m$ for which the deviation exceeds $\varepsilon$ (the parentheses are indeed placed differently here, compared to before). Therefore

\begin{equation*}
\exists M: \quad \left| \bigg( \frac{1}{m^2} \sum_{i=1}^{m^2} X_i \bigg) - \mu \right| \le \varepsilon, \quad \forall m > M
\end{equation*}

$\varepsilon > 0$ is arbitrary, so this is true for all $\varepsilon$. Hence,

\begin{equation*}
\lim_{m \to \infty} \bigg( \frac{1}{m^2} \sum_{i=1}^{m^2} X_i \bigg) - \mu = 0 \implies \lim_{m \to \infty} \frac{1}{m^2} \sum_{i=1}^{m^2} X_i = \mu
\end{equation*}

This proves that, almost surely, the subsequence along $m^2$ converges to $\mu$.

Step: subsequential limit to "sequential" limit. Given $n \in \mathbb{N}$, let $m$ be such that $m^2 \le n < (m + 1)^2$. Since $X_i$ are nonnegative

\begin{equation*}
\sum_{i=1}^{m^2} X_i \le \sum_{i=1}^{n} X_i \le \sum_{i=1}^{(m + 1)^2} X_i
\end{equation*}

and therefore

\begin{equation*}
\frac{m^2}{n} \frac{1}{m^2} \sum_{i=1}^{m^2} X_i \le \frac{1}{n} \sum_{i=1}^{n} X_i \le \frac{(m + 1)^2}{n} \frac{1}{(m + 1)^2} \sum_{i=1}^{(m + 1)^2} X_i
\end{equation*}

and since $m^2 \le n < (m + 1)^2$,

\begin{equation*}
\frac{m^2}{(m + 1)^2} \bigg( \frac{1}{m^2} \sum_{i=1}^{m^2} X_i \bigg) \le \frac{1}{n} \sum_{i=1}^{n} X_i \le \frac{(m + 1)^2}{m^2} \bigg( \frac{1}{(m + 1)^2} \sum_{i=1}^{(m + 1)^2} X_i \bigg)
\end{equation*}

Since the first and the last expressions converge to $\mu$, by the squeeze theorem we have

\begin{equation*}
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mu
\end{equation*}

Step: Relaxing nonnegativity assumption on $X_i$.

Suppose $X_i$ is not necessarily nonnegative. Since, by assumption, $X_i$ has finite expectation, $X_i$ is integrable. Therefore we know that the positive and negative parts of $X_i$, denoted $X_i^{\pm}$, are also integrable. Therefore we can compute the expectations

\begin{equation*}
\mu^{+} = \int X_i^{ + } \dd{P} \quad \text{and} \quad \big( - \mu^{-} \big) = \int \big( - X_i^{-} \big) \dd{P}
\end{equation*}

Similarly, we have that the variance of $X_i^{\pm}$ is finite, which allows us to apply the result we found for nonnegative variables to both $X_i^{+}$ and $- X_i^{-}$:

\begin{equation*}
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i^+ = \mu^+ \quad \text{and} \quad \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \big( - X_i^- \big) = \big( - \mu^{-} \big)
\end{equation*}

Let $A^{\pm}$ be the set where the mean of the positive / negative part converges. By the almost sure convergence just established,

\begin{equation*}
P \Big( \big( A^{\pm} \big)^c \Big) = 0
\end{equation*}

We then have

\begin{equation*}
\begin{split}
  P \Big( \big( A^+ \cap A^- \big)^c \Big) &= P \Big( \big( A^+ \big)^c \cup \big( A^- \big)^c \Big) \\
  &\le \underbrace{P \Big( \big( A^+ \big)^c \Big)}_{=0} + \underbrace{P \Big( \big( A^- \big)^c \Big)}_{=0} \\
  &= 0
\end{split}
\end{equation*}

Thus, almost surely, $\omega \in A^+ \cap A^-$, and on this we have convergence, so

\begin{equation*}
\begin{split}
  \mu &= \mu^+ - \big( - \mu^- \big) \\
  &= \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i^+ - \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \big( - X_i^- \big) \\
  &= \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \Big( X_i^+ + X_i^- \Big) \\
  &= \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i
\end{split}
\end{equation*}

This concludes the proof.
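
A simulation sketch (Python; i.i.d. $\mathrm{Uniform}[0, 1]$ variables, so $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$, both finite as the theorem requires) of the running mean settling at $\mu$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
X = rng.random(1_000_000)                          # i.i.d. Uniform[0,1], mu = 1/2

running = np.cumsum(X) / np.arange(1, X.size + 1)  # (1/n) sum_{i=1}^{n} X_i
for n in [10, 1_000, 100_000, 1_000_000]:
    print(n, running[n - 1])                       # -> 0.5
\end{verbatim}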

Ergodic Theory

Let $T: X \to X$ be a measure-preserving transformation on a measure space $(X, \Sigma, \mu)$ with $\mu(X) = 1$, i.e. it's a probability space.

Then $T$ is ergodic if for every $E \in \Sigma$ we have

\begin{equation*}
T^{-1}(E) = E \implies \mu(E) = 0 \text{ or } \mu(E) = 1
\end{equation*}
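
A concrete sketch (Python): rotation by an irrational angle, $T(x) = x + \alpha \bmod 1$ on $[0, 1)$, preserves Lebesgue measure and is a standard example of an ergodic transformation; by Birkhoff's ergodic theorem, the time average of an observable along a single orbit then matches its space average:

\begin{verbatim}
import numpy as np

alpha = (np.sqrt(5.0) - 1.0) / 2.0             # irrational => ergodic rotation
N = 1_000_000
orbit = (0.123 + alpha * np.arange(N)) % 1.0   # T^k(x0) = x0 + k alpha mod 1

f = lambda x: np.cos(2 * np.pi * x)**2         # observable; int_0^1 f dm = 1/2
print(f(orbit).mean())                         # Birkhoff time average ~ 0.5
\end{verbatim}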

Bochner integrable

The Bochner integral is a notion of integrability on Banach spaces, and is defined in very much the same way as integrability wrt. Lebesgue-measure.

Let $\big( X, \mathcal{A}, \mu \big)$ be a measure space and $\big(B, \norm{\cdot}_B\big)$ a Banach space.

A simple function is defined similarly as before, but now taking values in a Banach space instead. That is,

\begin{equation*}
f_n(x) = \sum_{i=1}^{n} \chi_{E_i}(x) \ b_i, \quad E_i \in \mathcal{A} \text{ and } b_i \in B
\end{equation*}

with the integral

\begin{equation*}
\int_X f_n \dd{\mu} = \sum_{i=1}^{n} \mu(E_i) \ b_i
\end{equation*}

A measurable function $f:X \to B$ is said to be Bochner integrable if there exists a sequence of integrable simple functions $f_n$ such that

\begin{equation*}
\lim_{n \to \infty} \int_X \norm{f - f_n}_{B} \dd{\mu} = 0
\end{equation*}

where the integral on the LHS is an ordinary Lebesgue integral.

If this is the case, then the Bochner integral is defined

\begin{equation*}
\int_X f \dd{\mu} = \lim_{n \to \infty} \int_X f_n \dd{\mu}
\end{equation*}

It can indeed be shown that a function $f$ is Bochner integrable if and only if $f \in L^1_B$, the $L^1$ Bochner space, defined similarly to the $L^1$ space of real-valued functions but with the absolute value replaced by $\norm{\cdot}_B$.

Concentration inequalities

Stochastic processes

Let $\big( X_t \big)_{t = 0}^{\infty}$ be a martingale wrt. a filtration $\big( \mathcal{F}_t \big)_{t = 0}^{\infty}$ and $\tau$ a stopping time, such that one of the following holds:

  1. $\exists n \in \mathbb{N}$ such that $\mathbb{P} (\tau > n) = 0$
  2. $\mathbb{E}[\tau] < \infty$ and there exists a constant $c \in \mathbb{R}$ s.t. for all $t \in \mathbb{N}$,

    \begin{equation*}
\mathbb{E} \big[ \left| X_{t + 1} - X_t \right| \mid \mathcal{F}_t \big] \le c
\end{equation*}

    almost surely on the event that $\tau > t$.

  3. $\exists c \in \mathbb{R}$ such that $\left| X_{t \wedge \tau} \right| \le c$ almost surely for all $t \in \mathbb{N}$

Then $X_{\tau}$ is a.s. well-defined and $\mathbb{E}[X_{\tau}] = \mathbb{E}[X_0]$.

Furthermore, when $\big( X_t \big)$ is a super- or sub-martingale rather than a martingale, the equality is replaced with less-than or greater-than, respectively.

Let $\big( X_t \big)_{t = 0}^{\infty}$ be a supermartingale with $X_t \ge 0$ a.s. for all $t$.

Then for any $\varepsilon > 0$

\begin{equation*}
\mathbb{P} \bigg( \sup_{t \in \mathbb{N}} X_t \ge \varepsilon \bigg) \le \frac{\mathbb{E}[X_0]}{\varepsilon}
\end{equation*}

Let $A_n$ be the event that $\sup_{t \le n} X_t \ge \varepsilon$ and $\tau = (n + 1) \wedge \min \left\{ t \le n : X_t \ge \varepsilon \right\}$, where we assume $\min \emptyset = \infty$ so that $\tau = n + 1$ if $X_t < \varepsilon$ for all $0 \le t \le n$.

Clearly $\tau$ is a stopping time and $\mathbb{P} \big( \tau \le n + 1 \big) = 1$. Then by Doob's optional stopping theorem and an elementary calculation

\begin{equation*}
\begin{split}
  \mathbb{E} [X_0] & \ge \mathbb{E}[X_{\tau}] \\
  &\ge \mathbb{E} \big[ X_{\tau} 1_{\left\{ \tau \le n \right\}} \big] \\
  &\ge \mathbb{E} \big[ \varepsilon 1_{\left\{ \tau \le n \right\}} \big] \\
  &= \varepsilon \mathbb{P} (\tau \le n) = \varepsilon \mathbb{P} (A_n)
\end{split}
\end{equation*}

Letting $n \to \infty$ gives the claim.
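
A simulation sketch (Python): products of i.i.d. nonnegative factors with mean $1$ form a nonnegative martingale (in particular a supermartingale) with $X_0 = 1$, so the inequality predicts $\mathbb{P}(\sup_t X_t \ge \varepsilon) \le 1 / \varepsilon$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
paths, T = 20_000, 200
factors = rng.choice([0.5, 1.5], size=(paths, T))  # mean 1 => martingale steps
X = np.cumprod(factors, axis=1)                    # X_t = product of t factors
X = np.hstack([np.ones((paths, 1)), X])            # prepend X_0 = 1

sup = X.max(axis=1)
for eps in [2.0, 5.0, 10.0]:
    print(eps, (sup >= eps).mean(), 1.0 / eps)     # empirical prob vs bound
\end{verbatim}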

Course: Advanced Probability

Notation

  • $\wedge$ is used as a binary operation which takes the minimum of the two arguments
  • $\lor$ is used as a binary operation which takes the maximum of the two arguments

Lecture 1

Notation

  • $(E, \mathcal{E})$ denotes a measurable space (with a measure $\mu$ it becomes a measure space)
  • $m \mathcal{E}$ denotes the set of measurable functions wrt. $\mathcal{E}$

    \begin{equation*}
m \mathcal{E} = \left\{ f : E \to \mathbb{R}, f \text{ measurable wrt. } \mathcal{B}(\mathbb{R}) \right\}
\end{equation*}

    and non-negative measurable functions

    \begin{equation*}
m \mathcal{E}^{ + } := \left\{ f: E \to [0, \infty], f \text{ measurable wrt. } \mathcal{B}(\mathbb{R}) \right\}
\end{equation*}
  • We write

    \begin{equation*}
\mu(f) = \int_E f(x) \mu (\dd{x})
\end{equation*}

Stuff

Let $\big( E, \mathcal{E}, \mu \big)$ be a measure space.

Then there exists a unique $\tilde{\mu}: m \mathcal{E}^{ + } \to [0, \infty]$ s.t.

  1. $\tilde{\mu}(1_A) = \mu(A)$ for all $A \in \mathcal{E}$
  2. Linearity

    \begin{equation*}
\tilde{\mu} (\alpha f + \beta g) = \alpha \tilde{\mu} (f) + \beta \tilde{\mu}(g)
\end{equation*}

    for all $f, g \in m \mathcal{E}^{ + }$ with $\alpha , \beta \ge 0$.

  3. Monotone convergence

    \begin{equation*}
\tilde{\mu}(f_m) \to \tilde{\mu}(f)
\end{equation*}

    for $f_m \nearrow f$ pointwise.

Let $\big( E_1, \mathcal{E}_1, \mu_1 \big)$ and $\big( E_2, \mathcal{E}_2, \mu_2 \big)$ be finite or sigma-finite measure spaces. Then

\begin{equation*}
\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2 := \sigma \big( \left\{ A_1 \times A_2 : A_i \in \mathcal{E}_i \right\} \big)
\end{equation*}

is the product σ-algebra on $E_1 \times E_2$.

There exists a unique measure $\mu = \mu_1 \otimes \mu_2$ on $\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2$ called the product measure

\begin{equation*}
\mu(A_1 \times A_2) = \mu_1(A_1) \mu_2(A_2), \quad \forall A_i \in \mathcal{E}_i
\end{equation*}

Let $f \in m \mathcal{E}^{ + }$. For $x_1 \in E_1$ define

\begin{equation*}
\begin{split}
  f_{x_1}: \quad & E_2 \to \mathbb{R} \\
  & x_2 \mapsto f_{x_1}(x_2) := f(x_1, x_2)
\end{split}
\end{equation*}

Then $f_{x_1}$ is $\mathcal{E}_2 \text{-measurable}$ for all $x_1 \in E_1$. Hence, we can define

\begin{equation*}
\begin{split}
  f_1: \quad & E_1 \to \mathbb{R} \\
  & x_1 \mapsto f_1(x_1) := \mu_2(f_{x_1})
\end{split}
\end{equation*}

Then $f_1$ is $\mathcal{E}_1 \text{-measurable}$ and

\begin{equation*}
\mu_1(f_1) = \mu(f)
\end{equation*}

where $\mu$ is the product measure.

Applying the above in both directions, we have

\begin{equation*}
\mu(f) = \hat{\mu}(\hat{f})
\end{equation*}

with

\begin{equation*}
\hat{\mu} = \mu_2 \otimes \mu_1
\end{equation*}

and

\begin{equation*}
\begin{split}
  \hat{f}: \quad & E_2 \times E_1 \to \mathbb{R} \\
  & (x_2, x_1) \mapsto \hat{f}(x_2, x_1) := f(x_1, x_2)
\end{split}
\end{equation*}

Conclusion:

\begin{equation*}
\int_{E_1}^{} \bigg( \int_{E_2}^{} f(x_1, x_2) \mu_2(\dd{x_2}) \bigg) \mu_1(\dd{x_1}) = \int_{E_2}^{} \bigg( \int_{E_1}^{} f(x_1, x_2) \mu_1(\dd{x_1}) \bigg) \mu_2(\dd{x_2})
\end{equation*}
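
A numerical sketch (Python) of this conclusion, taking $E_1 = E_2 = [0, 1]$ with both $\mu_i$ Lebesgue measure and $f(x_1, x_2) = e^{-x_1 x_2}$ (an arbitrary nonnegative choice): the two iterated midpoint sums agree:

\begin{verbatim}
import numpy as np

n = 2000
x1 = (np.arange(n) + 0.5) / n              # midpoint grid on E_1 = [0, 1]
x2 = (np.arange(n) + 0.5) / n              # midpoint grid on E_2 = [0, 1]
F = np.exp(-np.outer(x1, x2))              # f(x_1, x_2) = exp(-x_1 x_2)

x2_first = (F.sum(axis=1) / n).sum() / n   # integrate over x_2, then x_1
x1_first = (F.sum(axis=0) / n).sum() / n   # integrate over x_1, then x_2
print(x2_first, x1_first)                  # equal, ~ 0.7966
\end{verbatim}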

Lecture 2: conditional expectation

Notation

  • $\big( \Omega, \mathcal{F}, \mathbb{P} \big)$ a probability space, i.e. $\mathbb{P}(\Omega) = 1$
  • $X$ denotes rv, i.e. $X: \Omega \to \mathbb{R}$ is $\mathcal{F} \text{-measurable}$ and integrable, with expectation

    \begin{equation*}
\mathbb{E} \big[ X \big] = \int X \dd{\mathbb{P}}
\end{equation*}
  • Also write

    \begin{equation*}
\mathbb{E} \big[ X 1_A \big] = \int_A X \dd{\mathbb{P}}, \quad \forall A \in \mathcal{F}
\end{equation*}

    or, as we're used to, $\chi_A$ instead of $1_A$

Stuff

Let $B \in \mathcal{F}$ with $\mathbb{P}(B) > 0$.

Then

\begin{equation*}
\mathbb{P}\big( A \mid B \big) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}
\end{equation*}

is called the conditional probability of $A$ given $B$.

Similarly, we define

\begin{equation*}
\mathbb{E} \big[ X \mid B \big] = \frac{\mathbb{E}[X 1_B]}{\mathbb{P}(B)}
\end{equation*}

to be the conditional expectation of $X$ given $B$.

  • Quite restrictive since we require probability of $B$ to be non-zero
  • Goal: improve prediction for $X$ if additional "information" is available

Let $\big( B_n \big)_{n \in \mathbb{N}}$ be a sequence of disjoint events, whose union is $\Omega$. Set

\begin{equation*}
\mathcal{G} := \sigma \big( \left\{ B_n, n \in \mathbb{N} \right\} \big) = \left\{ \bigcup_{n \in I}^{} B_n : I \subseteq \mathbb{N} \right\}
\end{equation*}

For any integrable random variable $X$, we can define

\begin{equation*}
Y = \sum_{n \in \mathbb{N}}^{} \mathbb{E} \big[ X \mid B_n \big] 1_{B_n}
\end{equation*}

where we set

\begin{equation*}
\mathbb{E} \big[ X \mid B_n \big] = 
\begin{cases}
  \frac{\mathbb{E} [ X 1_{B_n}]}{\mathbb{P}(B_n)} & \text{if } \mathbb{P}(B_n) > 0 \\
  0 & \text{if } \mathbb{P}(B_n) = 0
\end{cases}
\end{equation*}

Notice that

  1. $Y$ in (discrete) definition of conditional expectation is $\mathcal{G} \text{-measurable}$

    \begin{equation*}
Y(\omega) = \sum_{n \in \mathbb{N}}^{} \mathbb{E} \big[ X \mid B_n \big] 1_{B_n}(\omega)
\end{equation*}

    Let $A \in \mathcal{B}([-\infty, \infty])$, then

    \begin{equation*}
\begin{split}
  Y^{-1}(A) &= \left\{ \omega \in \Omega \mid Y(\omega) \in A \right\} \\
  &= \left\{ \omega \in \Omega \mid \sum_{n \in \mathbb{N}}^{} \mathbb{E}[X \mid B_n] 1_{B_n}(\omega) \in A \right\} \\
  &= \bigcup_{n \in \mathbb{N}}^{} \left\{ \omega \in B_n \mid \mathbb{E} [X \mid B_n] 1_{B_n}(\omega) \in A \right\}
\end{split}
\end{equation*}

    because $\omega \in B_n \iff \omega \notin B_{m}, \forall m \ne n$, and each of these sets is measurable. Notice that this is simply the union of intersections

    \begin{equation*}
B_n \cap \bigcup_{m \in \mathbb{N}: \mathbb{E}[X \mid B_m] = \mathbb{E}[X \mid B_n]}^{} \big( 1_{B_m} \big)^{-1} \big( \left\{ \mathbb{E}[X \mid B_m] \right\} \big)
\end{equation*}

    which is just

    \begin{equation*}
B_n \cap \bigcup_{m \in \mathbb{N}: \mathbb{E}[X \mid B_m] = \mathbb{E}[X \mid B_n]}^{} B_m
\end{equation*}

    But this is just $B_n$ since $B_n \cap B_m = \emptyset, \forall m \ne n$! That is,

    \begin{equation*}
B_n \cap \bigcup_{m \in \mathbb{N}: \mathbb{E}[X \mid B_m] = \mathbb{E}[X \mid B_n]}^{} B_m = B_n
\end{equation*}

    Which means we end up with

    \begin{equation*}
Y^{-1}(A) = \bigcup_{n \in \mathbb{N} : \mathbb{E}[X \mid B_n] \in A}^{} B_n
\end{equation*}

    which is union of $\mathcal{G} \text{-measurable}$ sets and so $Y$ is $\mathcal{G} \text{-measurable}$ random variable.

  2. $Y$ is integrable and

    \begin{equation*}
\mathbb{E} \big[ X 1_G \big] = \mathbb{E} \big[ Y 1_G \big], \quad \forall G \in \mathcal{G}
\end{equation*}

    This is easily seen from

    \begin{equation*}
\begin{split}
  \mathbb{E} \bigg[ \Big| \sum_{n \in \mathbb{N}}^{} \mathbb{E}[X \mid B_n] 1_{B_n} 1_G \Big| \bigg] &\le \sum_{n \in \mathbb{N}}^{} \big| \mathbb{E}[X \mid B_n] \big| \ \mathbb{P}(B_n \cap G) \\
  &\le \sum_{n \in \mathbb{N}}^{} \mathbb{E} \big[ \left| X \right| 1_{B_n} \big] \\
  &= \mathbb{E} \big[ \left| X \right| \big] \\
  &< \infty
\end{split}
\end{equation*}

    since $\mathbb{P}(B_n \cap G) \le \mathbb{P}(B_n)$ and $\big| \mathbb{E}[X \mid B_n] \big| \mathbb{P}(B_n) = \big| \mathbb{E}[X 1_{B_n}] \big| \le \mathbb{E} \big[ \left| X \right| 1_{B_n} \big]$, while the $B_n$ are disjoint with union $\Omega$ and $X$ is integrable. A small numerical sketch of this construction follows the list.
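
A small numerical sketch (Python) of the discrete construction: take $\Omega = (0, 1]$ with $\mathbb{P}$ Lebesgue measure, partition it into $B_k = \big( \frac{k}{m}, \frac{k+1}{m} \big]$, and set $Y = \mathbb{E}[X \mid B_k]$ on each $B_k$; then $\mathbb{E}[X 1_G] = \mathbb{E}[Y 1_G]$ for $G \in \mathcal{G}$:

\begin{verbatim}
import numpy as np

N, m = 1_000_000, 4
omega = (np.arange(N) + 0.5) / N      # grid on Omega = (0, 1], P ~ Lebesgue
X = omega**2                          # a random variable on Omega

k = np.minimum((omega * m).astype(int), m - 1)  # index of B_k containing omega
Y = np.zeros(N)
for j in range(m):
    Y[k == j] = X[k == j].mean()      # E[X | B_j] = E[X 1_{B_j}] / P(B_j)

G = (k <= 1)                          # G = B_0 u B_1 lies in the sigma-algebra
print((X * G).mean(), (Y * G).mean()) # E[X 1_G] = E[Y 1_G]
\end{verbatim}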

There's an issue with the (discrete) definition of conditional expectation though. Example:

  • $\Omega = (0, 1]$, $\mathcal{F} = \mathcal{B}(\Omega)$ and $\mathbb{P}$ be the Lebesgue measure
  • Consider the case

    \begin{equation*}
\mathcal{G} = \sigma \Bigg( \bigg( \frac{k}{m}, \frac{k + 1}{m} \bigg], k = 0, \dots, m - 1 \Bigg)
\end{equation*}
  • Then consider a rv $Z: \Omega \to \left\{ z_1, z_2, \dots \right\} \subset \mathbb{R}$ taking countably many values.
  • Then let

    \begin{equation*}
\mathcal{G} = \sigma(Z) = \sigma \big( \left\{ \left\{ Z = z_i \right\}, i = 1, 2, \dots \right\} \big)
\end{equation*}
  • Then

    \begin{equation*}
\begin{split}
  \mathbb{E} \big[ X \mid Z \big] &:= \mathbb{E} \big[ X \mid \sigma(Z) \big] \\
  &= \sum_{i : \mathbb{P}(\left\{ Z = z_i \right\}) > 0}^{} \mathbb{E} \big[ X \mid \left\{ Z = z_i \right\} \big] 1_{\left\{ Z = z_i \right\}}
\end{split}
\end{equation*}
  • Issue: if $Z$ has an absolutely continuous distribution, e.g. $\mathcal{N}(0, 1)$, i.e.

    \begin{equation*}
\mathbb{P} (Z = z) = 0, \quad \forall z
\end{equation*}

    then the set we're summing over $\left\{ i : \mathbb{P}( \left\{ Z = z_i \right\}) > 0  \right\}$ is the empty set!

    • This motivates the more general definition which comes next!

Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathcal{G} \subseteq \mathcal{F}$ is a σ-algebra.

A random variable $Y$ is called (a version of) the conditional expectation of $X$ given $\mathcal{G}$ if

  1. $Y$ is $\mathcal{G} \text{-measurable}$
  2. And

    \begin{equation*}
\mathbb{E} \big[ X 1_A \big] = \mathbb{E} \big[ Y 1_A \big], \quad \forall A \in \mathcal{G}
\end{equation*}

So we write $Y = \mathbb{E}[X \mid \mathcal{G}]$

  1. $X \in L^1$ can be replaced by $X \ge 0$ throughout
  2. If $\mathcal{G} = \sigma(\varphi)$ with $\varphi \subseteq \mathcal{F}$ a π-system (i.e. closed under finite intersections), it suffices to check the condition for all $A \in \varphi$
  3. If $\mathcal{G} = \sigma(Z)$ with $Z$ a rv., then $\mathbb{E}[X \mid Z] = \mathbb{E}[X \mid \sigma(Z)]$ is $\sigma(Z) \text{-measurable}$ by condition (1) in def of conditional expectation, so it's of the form $f(Z)$ for some function $f$; therefore it's common to define

    \begin{equation*}
\mathbb{E} \big[ X \mid Z = z \big] = f(z)
\end{equation*}
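As a sanity check of remark 3, here is a small sketch (numpy assumed; the joint law of $(X, Z)$ is purely illustrative) estimating $f(z) = \mathbb{E}[X \mid Z = z]$ for a discrete $Z$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10**6

Z = rng.integers(0, 3, size=N)         # Z takes values in {0, 1, 2}
X = Z + rng.normal(0.0, 1.0, size=N)   # X = Z + independent noise, so E[X | Z = z] = z

# f(z) = E[X | Z = z], estimated by averaging X over {Z = z}.
f = {z: X[Z == z].mean() for z in range(3)}
print(f)                               # f(z) ≈ z

# E[X | Z] is then the sigma(Z)-measurable random variable f(Z).
lookup = np.array([f[z] for z in range(3)])
Y = lookup[Z]
print(X.mean(), Y.mean())              # E[E[X | Z]] = E[X], up to Monte Carlo error
```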

Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathcal{G} \subseteq \mathcal{F}$ a sigma-algebra.

Then

  1. $Y = \mathbb{E} \big[ X \mid \mathcal{G} \big]$ exists
  2. Any two versions of $\mathbb{E} \big[ X \mid \mathcal{G} \big]$ coincide $\mathbb{P} \text{-a.s.}$
  1. (Uniqueness.) Let $Y$ be as in the def of conditional expectation and let $Y'$ satisfy the conditions in the same def for some $X' \in L^1$ with $X \le X'$ almost surely.
    • Let $Z = (Y - Y') 1_A$ with $A = \left\{ Y \ge Y' \right\} \in \mathcal{G}$ (in $\mathcal{G}$ because both $Y, Y'$ are $\mathcal{G} \text{-measurable}$)
    • Then

      \begin{equation*}
\mathbb{E} \big[ Y 1_A \big] = \mathbb{E} \big[ X 1_A \big] \le \mathbb{E} \big[ X' 1_A \big] = \mathbb{E} \big[ Y' 1_A \big] < \infty
\end{equation*}

      since $Y' \in L^1$. The first equality is due to condition (2) in def of cond. expectation.

    • $\mathbb{E}[Y 1_A] \le \mathbb{E}[Y' 1_A]$ implies, by def of $Z$, that

      \begin{equation*}
\mathbb{E}[Z] \le 0 \underset{Z \ge 0 \text{ a.s.}}{\implies} Z \overset{\text{a.s.}}{=} 0 \implies Y \overset{\text{a.s.}}{\le} Y'
\end{equation*}
    • If $X = X'$, a similar argument shows that $Y \overset{\text{a.s.}}{=} Y'$ (using $A = \left\{ Y > Y' \right\}$ and $A = \left\{ Y < Y' \right\}$)
      • The reason why we did the inequality first is because we'll need that later on.
  2. (Existence.) We're going to do this by orthogonal projection in $L^2(\mathcal{F})$.
    1. Assume $X \in L^2(\mathcal{F})$. Since $L^2(\mathcal{G})$ is a complete subspace of $L^2(\mathcal{F})$, such an $X$ has an orthogonal projection $Y$ onto $L^2(\mathcal{G})$, i.e.

      \begin{equation*}
\exists Y \in L^2(\mathcal{G}) : \quad \mathbb{E} \big[ (X - Y) Z  \big] = 0, \quad \forall Z \in L^2(\mathcal{G})
\end{equation*}

      Choosing $Z = 1_A$ for some $A \in \mathcal{G}$, we get

      \begin{equation*}
\mathbb{E} \big[ X 1_A \big] = \mathbb{E} \big[ Y 1_A \big]
\end{equation*}

      so $Y$ satisfies (1) and (2) in the def of cond. expectation, from the equation above. But this assumes $X \in L^2$, which is more restrictive than $X \in L^1$, so we still have some work to do.

    2. Assume $X \ge 0$. Then $X_m = X \land m \in L^2$ and $0 \le X_m \nearrow X$ as $m \to \infty$. By Step 1, we know that

      \begin{equation*}
\exists Y_m \in L^2(\mathcal{G}): \quad \mathbb{E} \big[ X_m 1_A \big] = \mathbb{E} \big[ Y_m 1_A \big], \quad \forall A \in \mathcal{G}
\end{equation*}

      and $0 \le Y_m \le Y_{m + 1}$ a.s. (by the monotonicity established in the uniqueness part above). Further, let

      \begin{equation*}
Y_{\infty} := \lim_{m \to \infty} Y_m 1_{\Omega_0}
\end{equation*}

      with

      \begin{equation*}
\Omega_0 := \left\{ \omega \in \Omega \mid 0 \le Y_m(\omega) \le Y_{m + 1} (\omega), \quad \forall m \right\}
\end{equation*}

      which is just the set where the sequence is increasing; note that $\Omega_0 \in \mathcal{G}$ and $\mathbb{P}(\Omega_0) = 1$. Then $Y_{\infty} \ge 0$ is $\mathcal{G} \text{-measurable}$ and by MCT we get

      \begin{equation*}
\mathbb{E} \big[ X 1_A \big] = \mathbb{E} \big[ Y_{\infty} 1_A \big]
\end{equation*}

      Then, letting $A = \Omega$,

      \begin{equation*}
\underset{A = \Omega}{\implies} \mathbb{E}[Y_{\infty}] = \mathbb{E}[X] \underset{X \in L^1}{<} \infty
\end{equation*}

      so $Y_{\infty} < \infty$ a.s. (and thus $Y_{\infty} \in L^1$) and

      \begin{equation*}
Y := Y_{\infty} 1_{\left\{ Y_{\infty} < \infty \right\}}
\end{equation*}

      satisfies the conditions in the def of cond. expectation.

    3. For general $X \in L^1$, apply Step 2 on $X^{ + }$ and $X^{ - }$ to obtain $Y^{ + }$ and $Y^{ - }$. Then

      \begin{equation*}
Y = Y^{ + } - Y^{ - }
\end{equation*}

      satisfies the conditions in the def of cond. expectation.
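Step 1 above can be visualised numerically: for $X \in L^2$ and $\mathcal{G} = \sigma(Z)$ with $Z$ discrete, $L^2(\mathcal{G})$ is spanned by the indicators $1_{\left\{ Z = z \right\}}$, and least-squares projection onto that span recovers exactly the cell averages. A minimal sketch (numpy assumed, setup illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10**5

Z = rng.integers(0, 4, size=N)
X = np.sin(Z) + rng.normal(0.0, 0.5, size=N)

# L^2(sigma(Z)) is spanned by the indicators 1_{Z = z}; the orthogonal
# (least-squares) projection of X onto this span is E[X | sigma(Z)].
basis = np.column_stack([(Z == z).astype(float) for z in range(4)])
coef, *_ = np.linalg.lstsq(basis, X, rcond=None)

# The projection coefficients coincide with the conditional means:
print(coef)
print([X[Z == z].mean() for z in range(4)])
```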

Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, i.e. integrable random variable, and let $\mathcal{G} \subseteq \mathcal{F}$ be a σ-algebra.

We have the following properties:

  1. $\mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] \big]  = \mathbb{E}[X]$
  2. If $X$ is $\mathcal{G} \text{-measurable}$, then $\mathbb{E}[X \mid \mathcal{G}] \overset{\text{a.s.}}{=} X$
  3. If $X$ is independent of $\mathcal{G}$, then $\mathbb{E}[X \mid \mathcal{G}] \overset{\text{a.s.}}{=} \mathbb{E}[X]$
  4. If $X \overset{a.s.}{\ge} 0$, then $\mathbb{E}[X \mid \mathcal{G}] \overset{a.s.}{\ge} 0$.
  5. For $\alpha, \beta \in \mathbb{R}$ and any integrable random variable $Y$, we have

    \begin{equation*}
\mathbb{E} \big[ \alpha X + \beta Y \mid \mathcal{G} \big] \overset{\text{a.s.}}{=} \alpha \mathbb{E} [X \mid \mathcal{G}] + \beta \mathbb{E}[Y \mid \mathcal{G}]
\end{equation*}
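These properties lend themselves to a quick Monte Carlo check; below is a sketch (numpy assumed, with $\mathcal{G} = \sigma(Z)$ for a discrete $Z$; all distributional choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6

Z = rng.integers(0, 2, size=N)   # generates G = sigma(Z)
X = Z + rng.normal(size=N)       # X depends on G
W = rng.normal(size=N)           # W independent of G

def cond_exp(V, Z):
    """E[V | sigma(Z)] for discrete Z: cell averages of V over {Z = z}."""
    out = np.empty(len(V))
    for z in np.unique(Z):
        out[Z == z] = V[Z == z].mean()
    return out

print(cond_exp(X, Z).mean(), X.mean())      # (1): E[E[X | G]] = E[X]
print(np.allclose(cond_exp(Z, Z), Z))       # (2): G-measurable variables are unchanged
print(np.unique(cond_exp(W, Z)), W.mean())  # (3): ≈ E[W] on every cell
Y = rng.normal(size=N)
print(np.allclose(cond_exp(2*X + 3*Y, Z),   # (5): linearity (exact for cell averages)
                  2*cond_exp(X, Z) + 3*cond_exp(Y, Z)))
```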

Let $\big( X_n \big)_{n \in \mathbb{N}}$ be a sequence of random variables. Suppose further $0 \le X_n \uparrow X$ a.s., then $\mathbb{E}[X_n \mid \mathcal{G}] \uparrow Y$ a.s., for some $\mathcal{G} \text{-measurable}$ random variable $Y$.

  1. (conditional MCT) By MCT, we therefore have

    \begin{equation*}
\mathbb{E} [X 1_A] = \lim_{n \to \infty} \mathbb{E}[X_n 1_A] = \lim_{n \to \infty} \mathbb{E} \big[ \mathbb{E}[X_n \mid \mathcal{G}] 1_A \big] = \mathbb{E}[Y 1_A], \quad \forall A \in \mathcal{G}
\end{equation*}

    which implies that

    \begin{equation*}
Y \overset{\text{a.s.}}{=} \mathbb{E}[X \mid \mathcal{G}]
\end{equation*}

    This is basically the conditional MCT:

    \begin{equation*}
0 \le X_n \uparrow X \text{ a.s. } \quad \implies \quad \mathbb{E}[X_n \mid \mathcal{G}] \uparrow \mathbb{E}[X \mid \mathcal{G}] \text{ a.s.}
\end{equation*}
  2. (conditional Fatou's lemma)

    \begin{equation*}
X_n \ge 0, \quad \forall n \in \mathbb{N} \quad \implies \quad \mathbb{E} [\liminf_{n \to \infty} X_n \mid \mathcal{G}] \overset{a.s.}{\le} \liminf_{n \to \infty} \mathbb{E}[X_n \mid \mathcal{G}]
\end{equation*}
  3. (conditional Dominated convergence) If $X_n \to X$ and $|X_n| \le Y$ for all $n$, almost surely, for some integrable random variable $Y$, then

    \begin{equation*}
\mathbb{E}[X_n \mid \mathcal{G}] \overset{a.s.}{\to} \mathbb{E}[X \mid \mathcal{G}]
\end{equation*}
  4. (conditional Jensen's inequality) If $c: \mathbb{R} \to (-\infty, \infty]$ is convex, then

    \begin{equation*}
\mathbb{E} \big[ c(X) \mid \mathcal{G} \big] \overset{a.s.}{\ge} c \big( \mathbb{E}[X \mid \mathcal{G}] \big)
\end{equation*}
  5. In particular, for $1 \le p < \infty$, we have

    \begin{equation*}
\norm{\mathbb{E}[X \mid \mathcal{G}]}_p^p = \mathbb{E} \big[ \left| \mathbb{E}[X \mid \mathcal{G}] \right|^p \big] \le \mathbb{E} \big[ \mathbb{E}[\left| X \right|^p \mid \mathcal{G}] \big] = \mathbb{E} \big[ \left| X \right|^p \big] = \norm{X}_p^p
\end{equation*}

    where we used Jensen's inequality for the inequality. Thus we have

    \begin{equation*}
\norm{\mathbb{E}[X \mid \mathcal{G}]}_p \le \norm{X}_p, \quad \forall \ 1 \le p < \infty
\end{equation*}
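A quick numerical illustration of this contraction (numpy assumed, $p = 2$, conditioning on a discrete $Z$ as before):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10**6
p = 2

Z = rng.integers(0, 4, size=N)
X = Z + rng.normal(size=N)

# Y = E[X | sigma(Z)]: cell averages.
Y = np.empty(N)
for z in range(4):
    Y[Z == z] = X[Z == z].mean()

print(np.mean(np.abs(Y)**p)**(1/p))   # ||E[X | G]||_p ...
print(np.mean(np.abs(X)**p)**(1/p))   # ... is bounded by ||X||_p
```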

For any σ-algebra $\mathcal{H} \subseteq \mathcal{G}$, the rv. $Y = \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H} \big]$ is $\mathcal{H} \text{-measurable}$ and satisfies, for all $A \in \mathcal{H}$,

\begin{equation*}
\mathbb{E} \big[ Y 1_A \big] = \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A \big] = \mathbb{E}[X 1_A]
\end{equation*}
  1. (Tower property)

    \begin{equation*}
\mathcal{H} \subseteq \mathcal{G} \quad \implies \quad \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H} \big] \overset{\text{a.s.}}{=} \mathbb{E}[X \mid \mathcal{H}]
\end{equation*}
  2. "Take out what is known": if $Y$ is bounded and $\mathcal{G} \text{-measurable}$, then

    \begin{equation*}
\mathbb{E}[Y X \mid \mathcal{G}] = Y \mathbb{E}[X \mid \mathcal{G}]
\end{equation*}

    since $YX$ is then $\mathcal{G} \text{-measurable}$. For $Y = 1_B$ with $B \in \mathcal{G}$ we have, for all $A \in \mathcal{G}$,

    \begin{equation*}
\mathbb{E} \big[ 1_A Y X \big] = \mathbb{E} \big[ 1_{A \cap B} X \big] = \mathbb{E} \big[ 1_{A \cap B} \mathbb{E}[X \mid \mathcal{G}] \big] = \mathbb{E} \big[ 1_A Y \mathbb{E}[X \mid \mathcal{G}] \big]
\end{equation*}

    using $A \cap B \in \mathcal{G}$ and the defining property of $\mathbb{E}[X \mid \mathcal{G}]$; one then extends to general bounded $\mathcal{G} \text{-measurable}$ $Y$ by linearity and monotone limits of simple functions, as usual.

  3. If $\sigma(X,  \mathcal{G})$ is independent of $\mathcal{H}$, then

    \begin{equation*}
\mathbb{E} \big[ X \mid \sigma(\mathcal{G}, \mathcal{H}) \big] \overset{\text{a.s.}}{=} \mathbb{E}[X \mid \mathcal{G}] 
\end{equation*}

    Because suppose $A \in \mathcal{G}$ and $B \in \mathcal{H}$; then, using independence for the second and fourth equalities and the defining property of $\mathbb{E}[X \mid \mathcal{G}]$ for the third,

    \begin{equation*}
\begin{split}
  \mathbb{E} \big[ \mathbb{E}[X \mid \sigma(\mathcal{G}, \mathcal{H})] 1_{A \cap B} \big] & = \mathbb{E} \big[ X 1_{A \cap B} \big] \\
  &= \mathbb{E} \big[ X 1_A \big] \mathbb{P}(B) \\
  &= \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A \big] \mathbb{P}(B) \\
  &= \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_{A \cap B} \big]
\end{split}
\end{equation*}

    The set of such intersections $A \cap B$ is a π-system generating $\sigma(\mathcal{G}, \mathcal{H})$, so the desired formula follows from the uniqueness of measures agreeing on a π-system (Dynkin's lemma).
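The tower property (1) above is also easy to confirm by simulation; here is a sketch (numpy assumed) with two nested discrete σ-algebras, $\mathcal{H} = \sigma(Z \bmod 2) \subseteq \mathcal{G} = \sigma(Z)$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10**6

Z = rng.integers(0, 4, size=N)
X = Z**2 + rng.normal(size=N)

def cond_exp(V, labels):
    """E[V | sigma(labels)] for a discrete labelling: cell averages."""
    out = np.empty(len(V))
    for c in np.unique(labels):
        out[labels == c] = V[labels == c].mean()
    return out

G = Z        # fine partition:   G = sigma(Z)
H = Z % 2    # coarse partition: each H-cell is a union of G-cells, so H ⊆ G

lhs = cond_exp(cond_exp(X, G), H)   # E[ E[X | G] | H ]
rhs = cond_exp(X, H)                # E[X | H]
print(np.allclose(lhs, rhs))        # tower property (exact for cell averages)
```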

  1. From the definition of conditional expectation we know that

    \begin{equation*}
\mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A \big] = \mathbb{E} \big[ X 1_A \big], \quad \forall A \in \mathcal{G}
\end{equation*}

    And since $\Omega \in \mathcal{G}$, we must also have

    \begin{equation*}
\mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] \big] = \mathbb{E} \big[ X \big]
\end{equation*}
  2. Let $Y = \mathbb{E}[X \mid \mathcal{G}]$ and $A_+ = \left\{ \omega \in \Omega \mid Y(\omega) > X(\omega) \right\}$. Since $X$ is $\mathcal{G} \text{-measurable}$ by assumption and $Y$ is $\mathcal{G} \text{-measurable}$ by definition, so is $Y - X$, and hence

    \begin{equation*}
A_+ = (Y - X)^{-1} \big( (0, \infty] \big) \in \mathcal{G}
\end{equation*}

    From the definition of $\mathbb{E}[X \mid \mathcal{G}]$, we know that

    \begin{equation*}
\mathbb{E} \big[ X 1_{A_+} \big] = \mathbb{E} \big[ Y 1_{A_+} \big] \quad \implies \quad \mathbb{E} \big[ (Y - X) 1_{A_+} \big] = 0
\end{equation*}

    and since $(Y - X) 1_{A_+} \ge 0$, with strict inequality on $A_+$, this forces

    \begin{equation*}
\mathbb{P}(A_+) = 0
\end{equation*}

    The same argument applied to $A_- = \left\{ Y < X \right\}$ gives $\mathbb{P}(A_-) = 0$, i.e.

    \begin{equation*}
X \overset{\text{a.s.}}{=} \mathbb{E}[X \mid \mathcal{G}]
\end{equation*}
  3. $X$ being independent of $\mathcal{G}$ means that

    \begin{equation*}
\mathbb{P}(X^{-1}(E) \cap A) = \mathbb{P}\big(X^{-1}(E)\big) \mathbb{P}(A), \quad \forall E \in \mathcal{B}([-\infty, \infty]), \quad \forall A \in \mathcal{G}
\end{equation*}

    which, by the usual approximation through indicators and simple functions, implies

    \begin{equation*}
\mathbb{E}[X 1_A] = \mathbb{E}[X] \mathbb{P}(A), \quad \forall A \in \mathcal{G}
\end{equation*}

    The constant random variable $\mathbb{E}[X]$ is trivially $\mathcal{G} \text{-measurable}$, and for every $A \in \mathcal{G}$

    \begin{equation*}
\mathbb{E} \big[ \mathbb{E}[X] 1_A \big] = \mathbb{E}[X] \mathbb{P}(A) = \mathbb{E}[X 1_A]
\end{equation*}

    so the constant $\mathbb{E}[X]$ satisfies conditions (1) and (2) in the def of cond. expectation. By a.s. uniqueness of versions, $\mathbb{E}[X \mid \mathcal{G}] \overset{\text{a.s.}}{=} \mathbb{E}[X]$.
  4. Proven in proof:1.4.-lec-existence-and-uniqueness-of-conditional-expectation
  5. If we can show that properties (1) and (2) from def:conditional-expectation are satisfied by $\alpha \mathbb{E}[X \mid \mathcal{G}] + \beta \mathbb{E}[Y \mid \mathcal{G}]$, then the claim follows immediately from the a.s. uniqueness of conditional expectation. Measurability follows from a linear combination of measurables being measurable. And observe that for all $A \in \mathcal{G}$

    \begin{equation*}
\begin{split}
  \mathbb{E} \big[ (\alpha \mathbb{E}[X \mid \mathcal{G}] + \beta \mathbb{E}[Y \mid \mathcal{G}]) 1_A \big] &= \mathbb{E} \big[ \alpha \mathbb{E}[X \mid \mathcal{G}] 1_A + \beta \mathbb{E}[Y \mid \mathcal{G}] 1_A \big] \\
  &= \alpha \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A \big] + \beta \mathbb{E} \big[ \mathbb{E}[Y \mid \mathcal{G}] 1_A \big] \\
  &= \alpha \mathbb{E} [X 1_A] + \beta \mathbb{E}[Y 1_A] \\
  &= \mathbb{E}[(\alpha X + \beta Y) 1_A]
\end{split}
\end{equation*}

    by the fact that $\mathbb{E}[X \mid \mathcal{G}]$ and $\mathbb{E}[Y \mid \mathcal{G}]$ are both $\mathcal{G} \text{-measurable}$.

Let $X \in L^1$. Then the set of random variables $Y$ of the form

\begin{equation*}
Y = \mathbb{E}[ X \mid \mathcal{G}] 
\end{equation*}

where $\mathcal{G}  \subseteq \mathcal{F}$ is a σ-algebra is uniformly integrable.

Given $\varepsilon > 0$, by absolute continuity of the integral (as $X \in L^1$) we can find $\delta > 0$ so that

\begin{equation*}
A \in \mathcal{F} : \mathbb{P}(A) \le \delta \quad \implies \quad \mathbb{E} [ \left| X \right| 1_A ] \le \varepsilon
\end{equation*}

Then choose $\lambda < \infty$ so that

\begin{equation*}
\mathbb{E}[|X|] \le \lambda \delta
\end{equation*}

Suppose $Y = \mathbb{E}[X \mid \mathcal{G}]$; then $|Y| \le \mathbb{E}[\left| X \right| \mid \mathcal{G}]$ a.s., by conditional Jensen's inequality. In particular,

\begin{equation*}
\mathbb{E}[\left| Y \right|] \le \mathbb{E}[\left| X \right|] 
\end{equation*}

so, by Markov's inequality, we have

\begin{equation*}
\mathbb{P} \big( \left| Y \right| \ge \lambda \big) \le \frac{\mathbb{E}[\left| Y \right|]}{\lambda} \le \delta 
\end{equation*}

Then

\begin{equation*}
\mathbb{E} \big[ \left| Y \right| 1_{\left| Y \right| \ge \lambda} \big] \le \mathbb{E} \big[ \left| X \right| 1_{\left| Y \right| \ge \lambda} \big] \le \varepsilon 
\end{equation*}

where the first inequality uses $\left\{ |Y| \ge \lambda \right\} \in \mathcal{G}$ together with $|Y| \le \mathbb{E}[|X| \mid \mathcal{G}]$, and the second uses the choice of $\delta$. Since our choice of $\lambda$ was independent of $\mathcal{G}$, we have our proof for any σ-algebra $\mathcal{G} \subseteq \mathcal{F}$.

Martingales in discrete time

Let $\big( \Omega, \mathcal{F}, \mathbb{P} \big)$ be a probability space.

A filtration on this space is a sequence $\big( \mathcal{F}_{n} \big)_{n \ge 0}$ of σ-algebras such that,

\begin{equation*}
\mathcal{F}_n \subseteq \mathcal{F}_{n + 1} \subseteq \mathcal{F}, \quad \forall n \ge 0
\end{equation*}

We also define

\begin{equation*}
\mathcal{F}_{\infty} := \sigma \big( \mathcal{F}_n : n \ge 0 \big)
\end{equation*}

Then $\mathcal{F}_{\infty} \subseteq \mathcal{F}$.

A random process (in discrete time) is a sequence of random variables $\big( X_n \big)_{n \ge 0}$.

Let $X = \big( X_n \big)_{n \ge 0}$ be a random process (discrete time).

Then we define the natural filtration of $X$ to be $\big( \mathcal{F}_n^X \big)_{n \ge 0}$, given by

\begin{equation*}
\mathcal{F}_n^X := \sigma(X_0, \dots, X_n)
\end{equation*}

Then $\mathcal{F}_n^X$ models what we know about $X$ by time $n$.

We say $\big( X_n \big)_{n \ge 0}$ is adapted to $\big( \mathcal{F}_n \big)_{n \ge 0}$ if $X_n$ is $\mathcal{F}_n \text{-measurable}$ for all $n \ge 0$.

This is equivalent to requiring that $\mathcal{F}_n^X \subseteq \mathcal{F}_n$ for all $n \ge 0$.

Let

  • $\mathbb{F} = \big( \mathcal{F}_t \big)_{t = 0}^n$ be a filtration

$\big( X_t \big)_{t = 1}^n$ is $\mathbb{F} \text{-predictable}$ if $X_t$ is $\mathcal{F}_{t - 1} \text{-measurable}$ for each $t \in [n]$.

We say a random process is integrable if $X_n$ is an integrable random variable for all $n \ge 0$.

A martingale is an adapted random process (discrete time) $X = \big( X_n \big)_{n \ge 0}$ such that

\begin{equation*}
\mathbb{E} \big[ X_{n + 1} \mid \mathcal{F}_n \big] \overset{\text{a.s.}}{=} X_n, \quad \forall n \ge 0
\end{equation*}

If instead

\begin{equation*}
\mathbb{E} \big[ X_{n + 1} \mid \mathcal{F}_n \big] \overset{\text{a.s.}}{\le} X_n, \quad \forall n \ge 0
\end{equation*}

we say $X$ is a supermartingale.

And if instead

\begin{equation*}
\mathbb{E} \big[ X_{n + 1} \mid \mathcal{F}_n \big] \overset{\text{a.s.}}{\ge} X_n, \quad \forall n \ge 0
\end{equation*}

we say $X$ is a submartingale.
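The canonical example is the simple symmetric random walk. A short sketch (numpy assumed; the horizon and sample size are illustrative) checking $\mathbb{E}[X_n \mid \mathcal{F}_{n-1}] \overset{\text{a.s.}}{=} X_{n-1}$ empirically, grouping paths by the current value $X_{n-1}$ (enough here, since the increments are iid):

```python
import numpy as np

rng = np.random.default_rng(6)
paths, n = 10**6, 10

# Simple symmetric random walk: X_k = xi_1 + ... + xi_k with iid xi = ±1.
xi = rng.choice([-1, 1], size=(paths, n))
X = np.cumsum(xi, axis=1)

# Empirical check of E[X_n | F_{n-1}] = X_{n-1}: average X_n over all paths
# sharing the same current value X_{n-1}.
for v in np.unique(X[:, -2]):
    sel = X[:, -2] == v
    print(v, round(X[sel, -1].mean(), 3))   # ≈ v for every level
```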

Every process which is a martingale wrt. a given filtration $\big( \mathcal{F}_n \big)_{n \ge 0}$ is also a martingale wrt. its natural filtration (by the tower property, since $\mathcal{F}_n^X \subseteq \mathcal{F}_n$ for an adapted process).

We say a random variable

\begin{equation*}
T: \Omega \to \left\{ 0, 1, 2, \dots \right\} \cup \left\{ \infty \right\}
\end{equation*}

is a stopping time if $\left\{ T \le n \right\} = \left\{ \omega \in \Omega : T(\omega) \le n \right\} \in \mathcal{F}_n$ for all $n \ge 0$.

For a stopping time $T$, we set

\begin{equation*}
\mathcal{F}_T := \left\{ A \in \mathcal{F}_{\infty}: A \cap \left\{ T \le n \right\} \in \mathcal{F}_n, \quad \forall n \ge 0 \right\}
\end{equation*}

Given a random process (discrete time) $X$, we define

\begin{equation*}
X_{T}(\omega) := X_{T(\omega)}(\omega) \quad \text{whenever} \quad T(\omega) < \infty
\end{equation*}

and we define the stopped process $X^T = \big( X_n^T \big)_{n \ge 0}$ by

\begin{equation*}
X_n^T(\omega) = X_{T(\omega) \wedge n}(\omega), \quad n \ge 0
\end{equation*}
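A small sketch of the stopped process (numpy assumed; the walk and the stopping time, first hitting of level 3, are illustrative). Since $T \wedge n$ is a bounded stopping time and the walk is a martingale, the standard fact $\mathbb{E}[X_{T \wedge n}] = \mathbb{E}[X_0] = 0$ should be (and is) reproduced by the simulation:

```python
import numpy as np

rng = np.random.default_rng(7)
paths, n = 10**5, 50

# Simple symmetric random walk with X_0 = 0.
xi = rng.choice([-1, 1], size=(paths, n))
X = np.concatenate([np.zeros((paths, 1)), np.cumsum(xi, axis=1)], axis=1)

# T = first hitting time of level 3 (rows that never hit get T > n,
# so that T ∧ k below is just k for them).
hit = X >= 3
T = np.where(hit.any(axis=1), hit.argmax(axis=1), n + 1)

# Stopped process X^T_k = X_{T ∧ k}: each path is frozen once it hits the level.
k = np.arange(n + 1)
XT = X[np.arange(paths)[:, None], np.minimum(T[:, None], k[None, :])]

print(X[:, -1].mean(), XT[:, -1].mean())   # both ≈ 0 = E[X_0]
```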

Let $S$ and $T$ be stopping times and let $X$ be an adapted process. Then

  1. $S \wedge T$ is a stopping time
  2. $\mathcal{F}_T$ is a σ-algebra
  3. If $S \le T$, then $\mathcal{F}_S \subseteq \mathcal{F}_T$
  4. $X_T 1_{\left\{ T < \infty \right\}}$ is an $\mathcal{F}_T \text{-measurable}$ random variable
  5. $X^T$ is adapted
  6. If $X$ is integrable, then $X^T$ is integrable.

Let

  • $\big( \Omega, \mathcal{F}, \mathbb{P} \big)$ denote a probability space
  • $\big( \mathcal{F}_n \big)_{n \ge 0}$ be a filtration
  • $X$ adapted to $\big( \mathcal{F}_n \big)_{n \ge 0}$
  • Note that

    \begin{equation*}
\left\{ \omega \in \Omega \mid (S \land T)(\omega) \le n \right\} = S^{-1}\big((-\infty, n]\big) \cup T^{-1}\big((-\infty, n]\big)
\end{equation*}

    since $(S \land T)(\omega) \le n$ iff $S(\omega) \le n$ or $T(\omega) \le n$. And, since $S$ and $T$ are stopping times,

    • $S^{-1} \big( (-\infty, n] \big) \in \mathcal{F}_n$
    • Similarly $T^{-1} \big( (-\infty, n] \big) \in \mathcal{F}_n$
    • a σ-algebra is closed under finite unions

    Hence

    \begin{equation*}
\big( S \land T \big)^{-1} \big( (-\infty, n] \big) \in \mathcal{F}_n
\end{equation*}

    Importantly this holds for all $n \ge 0$, and so

    \begin{equation*}
(S \land T)^{-1} \big( (-\infty, n] \big) \in \mathcal{F}_n, \quad \forall n \ge 0
\end{equation*}

    i.e. $S \land T$ is a stopping time.

  • Recall

    \begin{equation*}
\mathcal{F}_T := \left\{ A \in \mathcal{F}_{\infty}: A \cap \left\{ T \le n \right\} \in \mathcal{F}_n, \quad \forall n \ge 0 \right\}
\end{equation*}

    which can equivalently be written

    \begin{equation*}
\mathcal{F}_T = \left\{ A \in \mathcal{F}_{\infty}: A \cap \left\{ T = n \right\} \in \mathcal{F}_n, \quad \forall n \ge 0 \right\}
\end{equation*}

    since $A \cap \left\{ T = n \right\} = \big( A \cap \left\{ T \le n \right\} \big) \setminus \big( A \cap \left\{ T \le n - 1 \right\} \big)$ and, conversely, $A \cap \left\{ T \le n \right\} = \bigcup_{m = 0}^{n} \big( A \cap \left\{ T = m \right\} \big)$.

Problem sheets

PS1

1.1

Let $X, Y \in L^1(\mathbb{P})$ and let $\mathcal{G}$ be a σ-algebra. Then

\begin{equation*}
\mathbb{E}[X + Y \mid \mathcal{G}] \overset{\text{a.s.}}{=} \mathbb{E}[X \mid \mathcal{G}] + \mathbb{E}[Y \mid \mathcal{G}] 
\end{equation*}

First off, $\mathbb{E}[X \mid \mathcal{G}] + \mathbb{E}[Y \mid \mathcal{G}]$ is $\mathcal{G} \text{-measurable}$ because each of $\mathbb{E}[X \mid \mathcal{G}]$ and $\mathbb{E}[Y \mid \mathcal{G}]$ is $\mathcal{G} \text{-measurable}$, and linear combinations of measurables are measurable. Second, for all $A \in \mathcal{G}$,

\begin{equation*}
\begin{split}
  \mathbb{E} \big[ (\mathbb{E}[X \mid \mathcal{G}] + \mathbb{E}[Y \mid \mathcal{G}]) 1_A \big] &= \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A + \mathbb{E}[Y \mid \mathcal{G}] 1_A \big] \\
  &= \mathbb{E} \big[ \mathbb{E}[X \mid \mathcal{G}] 1_A \big] + \mathbb{E} \big[ \mathbb{E}[Y \mid \mathcal{G}] 1_A \big] \\
  &= \mathbb{E} [X 1_A] + \mathbb{E}[Y 1_A] \\
  &= \mathbb{E} \big[ X 1_A + Y 1_A \big] \\
  &= \mathbb{E} \big[ (X + Y) 1_A \big]
\end{split}
\end{equation*}

which verifies condition (2), so the claim follows by a.s. uniqueness of conditional expectation.

1.2

Let

  • $X$ be a non-negative rv.
  • $Y$ be a version of $\mathbb{E}[X \mid \mathcal{G}]$

Then

\begin{equation*}
\left\{ X > 0 \right\} \overset{a.s.}{\subseteq} \left\{ Y > 0 \right\}
\end{equation*}

i.e.

\begin{equation*}
1_{X > 0} \overset{\text{a.s.}}{\le} 1_{Y > 0}
\end{equation*}

Further,

\begin{equation*}
\forall A \in \mathcal{G}, \quad \left\{ X > 0 \right\} \overset{a.s.}{\subseteq} A \quad \implies \quad \left\{ Y > 0 \right\} \overset{a.s.}{\subseteq} A
\end{equation*}

Course: Advanced Financial Models

Notation

  • $\mathbb{P}(A \mid \mathcal{G}) = \mathbb{E} \big[ 1_A \mid \mathcal{G} \big]$

Lecture 2

If $A$ is $\mathcal{G} \text{-measurable}$, then $\mathbb{P}(A \mid \mathcal{G}) \in \left\{ 0, 1 \right\}$ a.s.

Conversely, if $\mathbb{P}(A \mid \mathcal{G}) \in \left\{ 0, 1 \right\}$ a.s. then $\exists A'$ $\mathcal{G} \text{-measurable}$ s.t

\begin{equation*}
\mathbb{P}(A \setminus A') = \mathbb{P}(A' \setminus A) = 0
\end{equation*}

i.e. everything "interesting" happens in the intersection of $A$ and $A'$

If $A \in \mathcal{G}$ then

\begin{equation*}
\begin{split}
  \mathbb{P}(A \mid \mathcal{G}) &= \mathbb{E} \big[ 1_A \mid \mathcal{G} \big] \\
  &= 1_A \in \left\{ 0, 1 \right\}
\end{split}
\end{equation*}

since $1_A$ is $\mathcal{G} \text{-measurable}$.

Suppose $\mathbb{P}(A \mid \mathcal{G}) \in \left\{ 0, 1 \right\}$ a.s. and set $A' := \left\{ \mathbb{P}(A \mid \mathcal{G}) = 1 \right\} \in \mathcal{G}$, so that $\mathbb{P}(A \mid \mathcal{G}) \overset{\text{a.s.}}{=} 1_{A'}$.

Note that

\begin{equation*}
\begin{split}
  \mathbb{E} \big[ (1_A - 1_{A'})^2 \big] &= \mathbb{P}(A) + \mathbb{P}(A') - 2 \mathbb{P}(A \cap A') \\
  &= 2 \big( \mathbb{P}(A) - \mathbb{P}(A \cap A') \big) \\
  &= 0
\end{split}
\end{equation*}

where in the last two equalities we've used ("take out what is known" with the $\mathcal{G} \text{-measurable}$ $1_{A'}$)

\begin{equation*}
\begin{split}
  \mathbb{P}(A \cap A') &= \mathbb{E} \Big[ \mathbb{E} \big[ 1_A 1_{A'} \mid \mathcal{G} \big] \Big] \\
  &= \mathbb{E} \big[ 1_{A'} \mathbb{P}(A \mid \mathcal{G}) \big] \\
  &= \mathbb{E} \big[ 1_{A'} \big] \\
  &= \mathbb{P}(A')
\end{split}
\end{equation*}

together with $\mathbb{P}(A) = \mathbb{E} \big[ \mathbb{P}(A \mid \mathcal{G}) \big] = \mathbb{E} \big[ 1_{A'} \big] = \mathbb{P}(A')$. Hence $\mathbb{E} \big[ (1_A - 1_{A'})^2 \big] = 0$, so $1_A \overset{\text{a.s.}}{=} 1_{A'}$, i.e. $\mathbb{P}(A \setminus A') = \mathbb{P}(A' \setminus A) = 0$.