Analysis

Definitions
Theorems
Measure
Limits of sequences
Infinite Series of Real Numbers
- Theorems
Infinite Series of Functions
- Uniform Convergence
- Uniform Continuity
Power series
Integrability on R
- Definitions
- Theorems
Integrability on R (alternative)
- Integrals and uniform limits of sequences and series of functions
- Problems
Metric spaces
Fixed Point Theory
- Differential Equations
- Exercises
Fourier Series
Functional Analysis
Reproducing Kernel Hilbert Spaces (RKHSs)
Exam prep

Definitions

General

The support of a real-valued function $f: X \to \mathbb{R}$ is given by

$\begin{equation*} \supp(f) = \{ x \in X \mid f(x) \ne 0 \} \end{equation*}$

L^p space

$L^p$ spaces are function spaces defined using a generalization of the p-norm for a finite-dimensional vector space.

p-norm

Let $p \ge 1$ be a real number. Then the p-norm (also called the $\ell_p$ -norm) of a vector $\mathbf{x} = (x_1, ..., x_n)$ , i.e. over a finite-dimensional vector space, is

$\begin{equation*} || \mathbf{x} ||_p = \Big( \sum_{i=1}^n | x_i |^p \Big)^{1 / p} \end{equation*}$
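A minimal sketch of the p-norm in Python, just to make the formula concrete. The function name `p_norm` and the example vector are my own choices for illustration.

```python
def p_norm(x, p):
    """Return the l_p norm of a finite sequence of real numbers, for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to define a norm")
    # Sum the p-th powers of the absolute values, then take the p-th root.
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
print(p_norm(x, 1))  # sum of absolute values
print(p_norm(x, 2))  # Euclidean length
```

For $p = 2$ this is the usual Euclidean length, and for $p = 1$ it is the sum of the absolute values of the coordinates.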

Banach space

A Banach space is a normed vector space that:

Allows computation of vector length and distance between vectors (due to the norm and the metric it induces)
Is complete in the sense that every Cauchy sequence of vectors converges to a well-defined limit that is within the space

Sequences

Sequences of real numbers

Definition of a convergent sequence

A sequence of real numbers

$\begin{equation*} (a_n)_{n \in \mathbb{N}} \end{equation*}$

is said to converge if $\exists a \in \mathbb{R}$ such that $\forall \varepsilon > 0$ $\exists N \in \mathbb{N}$ such that $\forall j \ge N$ we have $| a_j - a| < \varepsilon$

Bounded sequences

A sequence $(a_j)$ is bounded if $\exists M \ge 0$ such that

$\begin{equation*} |a_j| \le M, \quad \forall j \ge 1 \end{equation*}$

Cauchy Sequence

A sequence of points $x_n \in \mathbb{R}$ is said to be Cauchy (in $\mathbb{R}$ ) if and only if for every $\epsilon > 0$ there is an $N \in \mathbb{N}$ such that

$\begin{equation*} n, m \ge N \quad \implies \quad | x_n - x_m | < \epsilon \end{equation*}$

For a real sequence this is equivalent to convergence.

TODO Uniform convergence

Series of functions

Pointwise convergence

Let $E$ be a nonempty subset of $\mathbb{R}$ . A sequence of functions $f_n : E \to \mathbb{R}$ is said to converge pointwise on $E$ if and only if $f(x) = \lim_{n \to \infty} f_n(x)$ exists for each $x \in E$ .

We use the following notation to express pointwise convergence :

$\begin{equation*} f_n \to f \text { pointwise on } E \text{ as } n \to \infty \end{equation*}$

Remarks

The pointwise limit of continuous (respectively, differentiable) functions is not necessarily continuous (respectively, differentiable).
The pointwise limit of integrable functions is not necessarily integrable.
There exist differentiable functions $f_n$ and $f$ such that $f_n \to f$ pointwise on $[0, 1]$ , but

$\begin{equation*} \lim_{n \to \infty} f_n'(x) \ne \Big( \lim_{n \to \infty} f_n(x) \Big)' \end{equation*}$

There exist continuous functions $f_n$ and $f$ such that $f_n \to f$ pointwise on $[0, 1]$ , but

$\begin{equation*} \lim_{n \to \infty} \int_0^1 f_n(x) \ dx \ne \int_0^1 \Big( \lim_{n \to \infty} f_n(x) \Big) \ dx \end{equation*}$
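To see the last remark in action, here is a small numerical sketch. The family $f_n(x) = n x (1 - x^2)^n$ is my own choice of example (it is not from the notes above): it tends to $0$ for each fixed $x \in [0, 1]$ , while its integral over $[0, 1]$ equals $n / (2 (n + 1))$ and tends to $1/2$ . The helper names and the midpoint rule are assumptions made for illustration.

```python
def f(n, x):
    """f_n(x) = n * x * (1 - x^2)^n, which tends to 0 for each fixed x in [0, 1]."""
    return n * x * (1.0 - x * x) ** n


def integral_on_unit_interval(n, cells=100_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / cells
    return h * sum(f(n, (i + 0.5) * h) for i in range(cells))


# Pointwise values shrink to 0, but the integrals approach 1/2.
for n in (1, 10, 100, 1000):
    print(n, f(n, 0.5), integral_on_unit_interval(n))
```

The integral of the pointwise limit is $0$ , so limit and integral cannot be swapped here. The mass of $f_n$ concentrates in a spike near $x = 0$ , which is exactly what uniform convergence would rule out.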

Uniform convergence

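Let $E$ be a nonempty subset of $\mathbb{R}$ . A sequence of functions $f_n : E \to \mathbb{R}$ is said to converge uniformly on $E$ to a function $f$ if and only if for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that

$\begin{equation*} n \ge N \quad \implies \quad |f_n(x) - f(x)| < \varepsilon \end{equation*}$

for all $x \in E$ .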

Continuity

Lipschitz continuity

Given two metric spaces $(X, d_X)$ and $(Y, d_Y)$ , where $d_X$ denotes the metric on $X$ , and same for $d_Y$ and $Y$ , a function $f : X \to Y$ is called Lipschitz continuous if there exists a real constant $K \ge 0$ such that

$\begin{equation*} d_Y \Big( f(x_2), f(x_1) \Big) \le K d_X (x_2, x_1), \quad \forall x_1, x_2 \in X \end{equation*}$

Where the constant $K$ is referred to as the Lipschitz constant for the function $f$ .

If the Lipschitz constant equals one, $K = 1$ , we say $f$ is a short-map.

If the Lipschitz constant $K$ is

$\begin{equation*} 0 \le K < 1 \end{equation*}$

we call the map $f$ a contraction or contraction mapping.

A fixed point (or invariant point ) $x^*$ of a function $f$ is an element of the function's domain which is mapped onto itself, i.e.

$\begin{equation*} x^* \in X : f(x^*) = x^* \end{equation*}$

Observe that if a function $f$ crosses the line $y = x$ , then it does indeed have a fixed point.

The mean-value theorem can be incredibly useful for checking if a mapping is a contraction mapping, since it states that

$\begin{equation*} |f(x) - f(y)| = |f'(c)| |x - y| \end{equation*}$

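for some $c$ between $x$ and $y$ .

Therefore, if there exists a constant $q < 1$ such that $|f'(c)| \le q$ for all $c$ in the domain, then

$\begin{equation*} |f(x) - f(y)| \le q |x - y| \end{equation*}$

hence $f$ is a contraction. Note that the pointwise bound $|f'(c)| < 1$ alone is not enough, since the Lipschitz constant must be strictly less than one.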

Hölder continuity

A real- or complex-valued function $f$ on a d-dimensional Euclidean space is said to satisfy a Hölder condition or is Hölder continuous if there exists non-negative $C$ and $\alpha$ such that

$\begin{equation*} |f(x) - f(y)| \le C \norm{x - y}^{\alpha} \end{equation*}$

for all $x, y$ in the domain of $f$ .

This definition can easily be generalized to mappings between two different metric spaces.

Observe that if $\alpha = 1$ , we have Lipschitz continuity.

Càdlàg function

Let $\big( X, d \big)$ be a metric space, and let $E \subseteq \mathbb{R}$ .

A function $f: E \to X$ is called a càdlàg function if, for every $t \in E$ :

The left limit

$\begin{equation*} f(t - ) := \lim_{s \uparrow t} f(s) \end{equation*}$

exists.
The right limit

$\begin{equation*} f(t + ) := \lim_{s \downarrow t} f(s) \end{equation*}$

exists and equals $f(t)$

That is, $f$ is right-continuous with left limits.

Affine space

An affine space generalizes the properties of Euclidean spaces in such a way that these are independent of concepts of distance and measure of angles, keeping only properties related to parallelism and ratio of lengths for parallel line segments.

In an affine space, there is no distinguished point that serves as an origin. Hence, no vector has a fixed origin and no vector can be uniquely associated to a point. We instead work with displacement vectors, also called translation vectors or simply translations, between two points of the space.

More formally, it's a set $A$ to which is associated a vector space $\mathbf{A}$ and a transitive and free action $\Theta$ of the additive group of $\mathbf{A}$ . Explicitly, the definition above means that there is a map, generally denoted as an addition

$\begin{equation*} \Theta : A \times \mathbf{A} \to A \quad \text{such that} \quad (a, v) \mapsto a + v \end{equation*}$

which has the following properties:

Right identity: $\forall a \in A, \ a + \mathbf{0} = a$
Associativity:

$\begin{equation*} (a + \mathbf{v}) + \mathbf{w} = a + (\mathbf{v} + \mathbf{w}), \quad \forall \mathbf{v}, \mathbf{w} \in \mathbf{A}, \ \forall a \in A \end{equation*}$
Free and transitive action: for every $a \in A$ , the restriction of the group action to $\{ a \} \times \mathbf{A} \approx \mathbf{A}$ , the induced mapping $\mathbf{A} \to A \ : \ \mathbf{v} \mapsto a + \mathbf{v}$ is a bijection.
Existence of one-to-one translations: For all $\mathbf{v} \in \mathbf{A}$ , the restriction of the group action to $A \times \{ \mathbf{v} \} \approx A$ , the induced mapping $A \to A: a \mapsto a + \mathbf{v}$ is a bijection.

This very rigorous definition might seem confusing. In particular, I remember finding

$\begin{equation*} A \to A: \mathbf{v} \mapsto a + \mathbf{v} \end{equation*}$

to be quite confusing.

"How can you map from some set $A$ to itself when clearly the LHS contains an element from the vector space $\mathbf{A}$ !?"

I think it all becomes apparent by considering the following example: $A = \mathbb{R}^n$ and $\mathbf{A} = (A, + ) = (\mathbb{R}^n , + )$ . Then, the map

$\begin{equation*} \mathbb{R}^n \to \mathbb{R}^n : \mathbf{v} \mapsto a + \mathbf{v} \quad a \in \mathbb{R}^n, \mathbf{v} \in (\mathbb{R}^n, +) \end{equation*}$

Simply means that we're using the structure of the vector space $(\mathbb{R}^n, + )$ to map an element from $\mathbb{R}^n$ to $\mathbb{R}^n$ !

Could have written

$\begin{equation*} A \to A : \mathbf{v} \mapsto a +_{\mathbf{A}} \mathbf{v} \end{equation*}$

to make it a bit more apparent (but of course, a set does not have any inherent structure, e.g. addition).

Schwartz space

The Schwartz space $\mathcal{S}(\mathbb{R}^n)$ is the space of all $C^{\infty}$ functions $\psi$ on $\mathbb{R}^n$ s.t.

$\begin{equation*} \lim_{\norm{\mathbf{x}} \to \infty} \left| \mathbf{x}^{\mathbf{j}} \partial^{\mathbf{k}} \psi(\mathbf{x}) \right| = 0 \end{equation*}$

for all $\mathbf{j}, \mathbf{k} \in \mathbb{N}^n$ .

Here if $\mathbf{j} = (j_1, \dots, j_n)$ then $\mathbf{x}^{\mathbf{j}} = x_1^{j_1} \cdots x_n^{j_n}$ and

$\begin{equation*} \partial^{\mathbf{j}} = \Bigg( \frac{\partial }{\partial x_1} \Bigg)^{j_1} \cdots \Bigg( \frac{\partial }{\partial x_n} \Bigg)^{j_n} \end{equation*}$

An element of the Schwartz space is called a Schwartz function.

Covering and packing

Let

$\big( \mathcal{X}, \norm{\cdot} \big)$ be a normed space
$A \subset \mathcal{X}$

We say a set of points $\left\{ x_1, \dots, x_M \right\} \subseteq A$ is an $\epsilon \text{-packing}$ of $A$ if

$\begin{equation*} \min_{i \ne j} \norm{x_i - x_j} > \epsilon \end{equation*}$

or, equivalently, if the closed balls of radius $\epsilon / 2$ around the points are pairwise disjoint:

$\begin{equation*} B(x_i, \epsilon / 2) \cap B(x_j, \epsilon / 2) = \emptyset, \quad \forall i \ne j \end{equation*}$

The packing number of $A \subseteq \mathcal{X}$ is defined as

$\begin{equation*} M_{\epsilon}(A) = \max \left\{ n : \exists \epsilon \text{-packing of } A \text{ of size } n \right\} \end{equation*}$
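A rough sketch of how one could build an $\epsilon$ -packing of a finite set of real numbers greedily. This is only an illustration under my own assumptions: the space is $\mathbb{R}$ with the absolute value as norm, and the name `greedy_packing` is hypothetical. The greedy result is a valid packing, so its size is a lower bound on the packing number, not necessarily the maximum.

```python
def greedy_packing(points, eps):
    """Greedily pick real numbers from `points` that are pairwise more than eps apart."""
    chosen = []
    for x in points:
        # Keep x only if it is farther than eps from every point chosen so far.
        if all(abs(x - c) > eps for c in chosen):
            chosen.append(x)
    return chosen


A = [0.0, 0.5, 1.0, 1.5, 2.0]
packing = greedy_packing(A, 0.9)
print(packing, len(packing))
```

Every pair of chosen points is more than `eps` apart by construction, which is exactly the defining condition of a packing.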

Theorems

Cauchy's Theorem

This theorem gives us another way of telling if a sequence of real numbers is Cauchy.

Let $\{x_n\}$ be a sequence of real numbers. Then $\{x_n\}$ is Cauchy if and only if $\{x_n\}$ converges (to some point $a$ in $\mathbb{R}$ ).

Suppose that $\{x_n\}$ is Cauchy. Given $\epsilon = 1$ , choose $N \in \mathbb{N}$ such that

$\begin{equation*} | x_N - x_m | < 1, \quad \forall m \ge N \end{equation*}$

By the Triangle Inequality,

$\begin{equation*} | x_m | < 1 + |x_N| \quad \forall m \ge N \end{equation*}$

Therefore, $\{x_n\}$ is bounded by $M = \max \{ |x_1|, |x_2|, ..., |x_{N - 1}|, 1 + |x_N| \}$ .

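By the Bolzano-Weierstrass Theorem, $\{x_n\}$ has a convergent subsequence, say $x_{n_k} \to a$ as $k \to \infty$ . Since $\{x_n\}$ is Cauchy, the whole sequence then converges to $a$ .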

Telescoping series

$\begin{equation*} \sum_{k=1}^\infty \frac{1}{k (k + 1)} = \sum_{k=1}^\infty \Big( \frac{1}{k} - \frac{1}{k+1} \Big) = 1 \end{equation*}$
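A quick check of why the sum equals one: all the inner terms cancel, so the $n$ -th partial sum is $1 - \frac{1}{n + 1}$ . The sketch below verifies this with exact rational arithmetic. The function name `partial_sum` is just for illustration.

```python
from fractions import Fraction


def partial_sum(n):
    """Exact n-th partial sum of the series with terms 1 / (k (k + 1))."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))


# Each partial sum should equal 1 - 1/(n + 1), which tends to 1.
for n in (1, 2, 10, 100):
    s = partial_sum(n)
    print(n, s, s == 1 - Fraction(1, n + 1))
```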

Bolzano-Weierstrass Theorem

A sequence $(I_n)_{n \in \mathbb{N}}$ of sets is said to be nested if

$\begin{equation*} I_1 \supset I_2 \supset I_3 \supset \dots \end{equation*}$

If $(I_n)_{n \in \mathbb{N}}$ is a nested sequence of nonempty closed bounded intervals, then

$\begin{equation*} E = \bigcap_{n \in \mathbb{N}} I_n = \left\{ x \in \mathbb{R} : x \in I_n, \quad \forall n \in \mathbb{N} \right\} \end{equation*}$

is non-empty (i.e. contains at least one number).

Moreover, if $|I_n| \to 0$ then $E$ contains exactly one number (by non-emptiness of $E$ ).

Each bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.

Assume that $a$ is the lower and $b$ the upper bound of the given sequence. Let $I_0 = [a, b]$ .

Divide $I_0$ into two halves, $I'$ and $I''$ :

$\begin{equation*} I' = \Bigg[a, \frac{a + b}{2} \Bigg], \quad I'' = \Bigg[ \frac{a + b}{2}, b \Bigg] \end{equation*}$

Since $I_0 = I' \cup I''$ , at least one of these intervals contains $x_n$ for infinitely many values of $n$ . We denote the interval with this property $I_1$ . Let $n_1 > 1$ be such that $x_{n_1} \in I_1$ .

We proceed by induction. Divide the interval $I_m$ into two halves (like we did with $I_0$ ). At least one of the two halves will contain infinitely many $x_n$ , which we denote $I_{m + 1}$ . We choose $n_{m + 1} > n_m$ such that $x_{n_{m + 1}} \in I_{m + 1}$ .

Observe that $(I_n)$ is a nested sequence of bounded and closed intervals, hence there exists $x \in \mathbb{R}$ that belongs to every interval $I_k$ . Since both $x$ and $x_{n_k}$ lie in $I_k$ , we have

$\begin{equation*} |x - x_{n_k}| \le |I_k| = \frac{b - a}{2^k} \end{equation*}$

By the Squeeze Theorem $x_{n_k} \to x$ as $k \to \infty$ .

Triangle Inequality

$\begin{equation*} || \mathbf{x} + \mathbf{y} || \le || \mathbf{x} || + || \mathbf{y} || \end{equation*}$

Mean Value Theorem

If a function $f$ is continuous on the closed interval $[a, b]$ , and differentiable on the open interval $(a, b)$ , then there exists a point $c$ in $(a, b)$ such that:

$\begin{equation*} f'(c) = \frac{f(b) - f(a)}{b - a} \end{equation*}$

Rolle's Theorem

Suppose $a, b \in \mathbb{R}$ with $a < b$ . If $f$ is continuous on $[a, b]$ , differentiable on $(a, b)$ and $f(a) = f(b)$ then $f'(c) = 0$ for some $c \in (a, b)$ .

Intermediate Value Theorem

Consider an interval $I = [a, b]$ on $\mathbb{R}$ and a continuous function $f : I \to \mathbb{R}$ . If $u$ is a number between $f(a)$ and $f(b)$ , then

$\begin{equation*} \exists c \in (a, b) : f(c) = u \end{equation*}$

Useful identities

Upper bound on abs of sin(x)

$\begin{equation*} | \sin x | \le |x| \end{equation*}$

due to the Mean Value Theorem.

M-test

Let $E$ be a nonempty subset of $\mathbb{R}$ and

$\begin{equation*} f_k : E \to \mathbb{R}, \quad k \in \mathbb{N} \end{equation*}$

and suppose there exist constants $M_k \ge 0$ such that

$\begin{equation*} \sum_{k=1}^{\infty} M_k < \infty \end{equation*}$

(i.e. the series of bounds converges). If $|f_k(x)| \le M_k$ for all $x \in E$ and all $k \in \mathbb{N}$ , then

$\begin{equation*} \sum_{k=1}^{\infty} f_k \end{equation*}$

converges absolutely and uniformly on $E$ .

Fixed Point Theory

Banach Fixed Point Theorem

Let $(X, d)$ be a non-empty complete metric space with a contraction mapping $f: X \to X$ . Then $f$ admits a unique fixed-point $x^*$ in $X$ .

Furthermore, $x^*$ can be found as follows:

Start with an arbitrary element $x_0$ in $X$ and define a sequence $\{x_n\}$ by $x_n = f(x_{n-1})$ , then $x_n \to x^* \text{ as } n \to \infty$
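The iteration can be sketched in a few lines of Python. As an example of my own choosing, take $f = \cos$ on $X = [0, 1]$ : it maps $[0, 1]$ into itself and $|f'(x)| = |\sin x| \le \sin 1 < 1$ there, so it is a contraction on a complete metric space. The function name `fixed_point`, the tolerance and the iteration cap are assumptions for illustration.

```python
import math


def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_n = f(x_{n-1}) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star))  # the two values should agree closely
```

Starting from a different $x_0 \in [0, 1]$ should give the same limit, since the fixed point is unique.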

When using this theorem in practice, apparently the most difficult part is to define the domain $X$ such that $f(X) \subseteq X$ .

Fundamental Contraction Inequality

By the triangle inequality we have

$\begin{equation*} \begin{split} d(x, y) & \le d(x, f(x)) + d(f(x), f(y)) + d(f(y), y) \\ & \le d(x, f(x)) + q d(x, y) + d(y, f(y)) \end{split} \end{equation*}$

where we're just using the fact that for any $x$ and $y$ , $d(f(x), f(y))$ is at most $q d(x, y)$ by assumption of $f$ being a contraction mapping with Lipschitz constant $q < 1$ .

Solving for $d(x, y)$ we get

$\begin{equation*} d(x, y) \le \frac{d(x, f(x)) + d(y, f(y))}{1 - q} \end{equation*}$

Measure

Definition

A measure on a set is a systematic way of assigning a number to each suitable subset of that set, intuitively interpreted as size.

In this sense, a measure is a generalization of the concepts of length, area, volume, etc.

Motivation

The motivation behind defining such a thing is related to the Banach-Tarski paradox, which says that it is possible to decompose the 3-dimensional solid unit ball into finitely many pieces and, using only rotations and translations, reassemble the pieces into two solid balls each with the same volume as the original. The pieces in the decomposition, constructed using the axiom of choice, are non-measurable sets.

Informally, the axiom of choice says that given a collection of bins, each containing at least one object, it's possible to make a selection of exactly one object from each bin.

Measure space

If $X$ is a set with the sigma-algebra $\Sigma$ and the measure $\mu$ , then we have a measure space .

Sigma-algebra

Let $X$ be some set, and let $2^X$ be its power set. Then a collection of subsets $\Sigma \subseteq 2^X$ is called a σ-algebra on $X$ if it satisfies the following three properties:

$X \in \Sigma$
$\Sigma$ is closed under complement: $A \in \Sigma \implies A^c \in \Sigma$
$\Sigma$ is closed under countable unions: $A_1, A_2, A_3, ... \in \Sigma \implies \cup_{i=1}^\infty A_i \in \Sigma$

These properties also imply the following:

$\emptyset \in \Sigma$
$\Sigma$ is closed under countable intersections: if $A_1, A_2, A_3, ... \in \Sigma \implies \cap_{i=1}^\infty A_i \in \Sigma$
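On a finite set these axioms can be checked by brute force, which I find helpful for building intuition. The sketch below is only an illustration under one assumption: since $X$ is finite, closure under countable unions reduces to closure under pairwise unions. The name `is_sigma_algebra` is hypothetical.

```python
from itertools import combinations


def is_sigma_algebra(X, sigma):
    """Check the sigma-algebra axioms for a collection of subsets of a finite set X."""
    X = frozenset(X)
    sigma = {frozenset(A) for A in sigma}
    # Axiom 1: the whole set belongs to the collection.
    if X not in sigma:
        return False
    # Axiom 2: closed under complement.
    if any(X - A not in sigma for A in sigma):
        return False
    # Axiom 3: closed under unions (pairwise suffices on a finite set).
    return all(A | B in sigma for A, B in combinations(sigma, 2))


X = {1, 2, 3, 4}
print(is_sigma_algebra(X, [set(), {1, 2}, {3, 4}, X]))  # satisfies all three axioms
print(is_sigma_algebra(X, [set(), {1}, X]))  # the complement of {1} is missing
```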

A measure $\mu$ on a measure space $(X, \Sigma, \mu)$ is said to be sigma-finite if $X$ can be written as a countable union of measurable sets of finite measure.

Borel sigma-algebra

Any set in a topological space that can be formed from the open sets through the operations of:

countable union
countable intersection
complement

is called a Borel set.

Thus, for some topological space $X$ , the collection of all Borel sets on $X$ forms a σ-algebra, called the Borel algebra or Borel σ-algebra .

Borel sets are important in measure theory, since any measure defined on the open sets of a space, or on the closed sets of a space, must also be defined on all Borel sets of that space. Any measure defined on the Borel sets is called a Borel measure.

Lebesgue sigma-algebra

Basically the same as the Borel sigma-algebra, but enlarged so that the Lebesgue measure defined on it is complete.

Note to self

Suppose we have a Lebesgue measure on the real line, with measure space $(\mathbb{R}, B, \lambda)$ .

Suppose that $A$ is a non-measurable subset of the real line, such as the Vitali set. Then the $\lambda^2$ measure of $\{0\} \times A$ is not defined, but

$\begin{equation*} \{0\} \times A \subseteq \{0\} \times \mathbb{R} \end{equation*}$

and this larger set ( $\{0\} \times \mathbb{R}$ ) does have $\lambda^2$ measure zero, i.e. it's not complete !
Motivation

Suppose we have constructed Lebesgue measure on the real line: denote this measure space by $(\mathbb{R}, B, \lambda)$ . We now wish to construct some two-dimensional Lebesgue measure $\lambda^2$ on the plane $\mathbb{R}^2$ as a product measure.

Naïvely, we could take the sigma-algebra on $\mathbb{R}^2$ to be $B \otimes B$ , the smallest sigma-algebra containing all measurable "rectangles" $A_1 \times A_2$ for $A_i \in B$ .

While this approach does define a measure space, it has a flaw: since every singleton set has one-dimensional Lebesgue measure zero,

$\begin{equation*} \lambda^2 ( \{0\} \times A) = \lambda(\{0\}) \cdot \lambda(A) = 0 \end{equation*}$

for any subset $A$ of $\mathbb{R}$ .

What follows is the important part!

However, suppose that $A$ is a non-measurable subset of the real line, such as the Vitali set. Then the $\lambda^2$ measure of $\{0\} \times A$ is not defined (since we just supposed that $A$ is non-measurable), but

$\begin{equation*} \{0\} \times A \subseteq \{0\} \times \mathbb{R} \end{equation*}$

and this larger set ( $\{0\} \times \mathbb{R}$ ) does have $\lambda^2$ measure zero, i.e. it's not complete !
Construction
Given a (possibly incomplete) measure space $(X, \Sigma, \mu)$ , there is an extension $(X, \Sigma_0, \mu_0)$ of this measure space that is complete .

The smallest such extension (i.e. the smallest sigma-algebra $\Sigma_0$ ) is called the completion of the measure space.

It can be constructed as follows:
- Let $Z$ be the set of all subsets of $\mu$ -measure-zero sets in $\Sigma$ (intuitively, those elements of $Z$ that are not already in $\Sigma$ are the ones preventing completeness from holding true)
- Let $\Sigma_0$ be the sigma-algebra generated by $\Sigma$ and $Z$ (i.e. the smallest sigma-algebra that contains every element of $\Sigma$ and of $Z$ )
- $\mu$ has an extension to $\Sigma_0$ (which is unique if $\mu$ is sigma-finite), called the outer measure of $\mu$ , given by the infimum
$\begin{equation*} \mu_0(C) := \inf \{\mu(D) \ | \ C \subseteq D \in \Sigma \} \end{equation*}$

Then $(X, \Sigma_0, \mu_0)$ is a complete measure space, and is the completion of $(X, \Sigma, \mu)$ .
What we're saying here is:
- For the "multi-dimensional" case we need to take into account the zero-elements in the resulting sigma-algebra due to the product between the 1D zero-element and some element NOT in our original sigma-algebra
- The above point means that we do NOT necessarily get completeness, despite the sigma-algebras defined on the sets individually prior to taking the Cartesian product being complete
- To "fix" this, we construct an outer measure $\mu_0$ on the sigma-algebra $\Sigma_0$ , where we have included all those zero-elements which are "missed" by the naïve approach

Product measure

Given two measurable spaces and measures on them, one can obtain a product measurable space and a product measure on that space.

Let $(X_1, \Sigma_1)$ and $(X_2, \Sigma_2)$ be measurable spaces, and let $\Sigma_1 \otimes \Sigma_2$ be the sigma-algebra on the Cartesian product $X_1 \times X_2$ generated by the sets of the form $B_1 \times B_2$ with $B_1 \in \Sigma_1$ and $B_2 \in \Sigma_2$ . This sigma-algebra is called the tensor-product sigma-algebra on the product space.

A product measure $\mu_1 \times \mu_2$ is defined to be a measure on the measurable space $(X_1 \times X_2, \Sigma_1 \otimes \Sigma_2)$ satisfying the property

$\begin{equation*} (\mu_1 \times \mu_2) (B_1 \times B_2) = \mu_1(B_1) \mu_2 (B_2), \quad \forall \ B_1 \in \Sigma_1, \ B_2 \in \Sigma_2 \end{equation*}$

Complete measure

A complete measure (or, more precisely, a complete measure space ) is a measure space in which every subset of every null set is measurable (having measure zero).

More formally, $(X, \Sigma, \mu)$ is complete if and only if

$\begin{equation*} S \subseteq N \in \Sigma \text{ and } \mu(N) = 0 \implies S \in \Sigma \end{equation*}$

Lebesgue measure

Given a subset $E \subseteq \mathbb{R}$ , with the length of a closed interval $I = [a,b]$ given by $\ell (I) = b - a$ , the Lebesgue outer measure $\lambda^* (E)$ is defined as

$\begin{equation*} \lambda^*(E) = \inf \Bigg\{ \sum_{k=1}^{\infty} \ell(I_k) : (I_k)_{k \in \mathbb{N}} \text{ is a sequence of closed intervals with } E \subseteq \underset{k=1}{\overset{\infty}{\cup}} I_k \Bigg\} \end{equation*}$

The Lebesgue measure is then defined on the Lebesgue sigma-algebra, which is the collection of all the sets $E$ which satisfy the condition that, for every $A \subseteq \mathbb{R}$

$\begin{equation*} \lambda^*(A) = \lambda^* (A \cap E) + \lambda^* (A \cap E^c) \end{equation*}$

For any set in the Lebesgue sigma-algebra, its Lebesgue measure is given by its Lebesgue outer measure $\lambda (E) = \lambda^*(E)$ .

IMPORTANT!!! This is not necessarily related to the Lebesgue integral! It CAN be, but the integral is more general than JUST integration wrt. the Lebesgue measure.

Intuition

First part of definition states that the subset $E$ is reduced to its outer measure by coverage by sets of closed intervals
Each set of intervals $I$ covers $E$ in the sense that when the intervals are combined together by union, they contain $E$
Total length of any covering interval set can easily overestimate the measure of $E$ , because $E$ is a subset of the union of the intervals, and so the intervals include points which are not in $E$

Lebesgue outer measure emerges as the greatest lower bound (infimum) of the lengths from among all possible such sets. Intuitively, it is the total length of those interval sets which fit $E$ most tightly and do not overlap.

In my own words: the Lebesgue outer measure is the infimum of the total length $\sum_k \ell(I_k)$ over all countable collections of intervals $I_k$ whose union "covers" (i.e. contains) $E$ .

If you take a real interval $I = [a, b]$ , then the Lebesgue outer measure is simply $\ell(I) = b - a$ .

Lebesgue Integral

The Lebesgue integral of a function $f$ over a measure space $(X, \Sigma, \mu)$ is written

$\begin{equation*} \int_X f \ d \mu \end{equation*}$

which means we're taking the integral wrt. the measure $\mu$ .

Special case: non-negative real-valued function

Suppose that $f : \mathbb{R} \to \mathbb{R}^+$ is a non-negative real-valued function.

Using the "partitioning of range of $f$ " philosophy, the integral of $f$ should be the sum over $t$ of the elementary area contained in the thin horizontal strip between $y = t$ and $y = t + dt$ , which is just

$\begin{equation*} \mu (\{x \ | \ f(x) > t\} ) dt \end{equation*}$

Letting

$\begin{equation*} f^*(t) = \mu (\{x \ | \ f(x) > t\} ) \end{equation*}$

The Lebesgue integral of $f$ is then defined by

$\begin{equation*} \int f \ d\mu = \int_0^\infty f^*(t) \ dt \end{equation*}$

where the integral on the right is an ordinary improper Riemann integral. For the set of measurable functions, this defines the Lebesgue integral.
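A small numerical sanity check of this "horizontal strips" formula, as a sketch under my own assumptions: take $f(x) = x^2$ for $x \in [0, 1]$ and $f(x) = 0$ elsewhere, with $\mu$ the Lebesgue measure. Then $\{ x \mid f(x) > t \} = (\sqrt{t}, 1]$ for $0 \le t < 1$ , so $f^*(t) = 1 - \sqrt{t}$ . The function names and the midpoint rule are just for illustration.

```python
import math


def distribution_function(t):
    """Lebesgue measure of {x in [0, 1] : x^2 > t}, namely 1 - sqrt(t) for 0 <= t < 1."""
    return 1.0 - math.sqrt(t) if t < 1.0 else 0.0


def layer_cake_integral(cells=100_000):
    """Midpoint-rule approximation of the integral of f^*(t) over t in [0, 1]."""
    h = 1.0 / cells
    return h * sum(distribution_function((i + 0.5) * h) for i in range(cells))


# Compare with the Riemann integral of x^2 over [0, 1], which is 1/3.
print(layer_cake_integral())
```

Both ways of slicing the area under the graph (vertically for Riemann, horizontally here) should agree for such a well-behaved function.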

Measurable function

Let $(X, \Sigma)$ and $(Y, T)$ be measurable spaces.

A function $f: X \to Y$ is said to be measurable if the preimage of $E$ under $f$ is in $\Sigma$ for every $E \in T$ , i.e.

$\begin{equation*} \text{preim}_f (E) := \left\{ x \in X \mid f(x) \in E \right\} \in \Sigma, \quad \forall E \in T \end{equation*}$

Radon measure

Hard to find a good notion of measure on a topological space that is compatible with the topology in some sense
One way is to define a measure on the Borel sets of the topological space

Let $\mu$ be a measure on the sigma-algebra of Borel sets of a Hausdorff topological space $X$ .

$\mu$ is called inner regular or tight if, for any Borel set $B$ , $\mu(B)$ is the supremum of $\mu(K)$ over all compact subsets $K$ of $B$
$\mu$ is called outer regular if, for any Borel set $B$ , $\mu(B)$ is the infimum of $\mu(U)$ over all open sets $U$ containing $B$
$\mu$ is called locally finite if every point of $X$ has a neighborhood $U$ for which $\mu(U)$ is finite (if $\mu$ is locally finite, then it follows that $\mu$ is finite on compact sets)

The measure $\mu$ is called a Radon measure if it is inner regular and locally finite.

Suppose $\mu$ and $\nu$ are two $\sigma \text{-finite}$ measures on a measure space $\big( X, \Omega \big)$ and that $\mu$ is absolutely continuous wrt. $\nu$ .

Then there exists a non-negative, measurable function $\rho$ on $X$ such that

$\begin{equation*} \mu(E) = \int_E \rho \ d \nu, \quad \forall E \in \Omega \end{equation*}$

The function $\rho$ is called the density or Radon-Nikodym derivative of $\mu$ wrt. $\nu$ .

Continuity of measure

Suppose $\mu$ and $\nu$ are two sigma-finite measures on a measure space $(X, \Omega)$ .

Then we say that $\mu$ is absolutely continuous wrt. $\nu$ if

$\begin{equation*} \nu(E) = 0 \implies \mu(E) = 0, \quad \forall E \in \Omega \end{equation*}$

We say that $\mu$ and $\nu$ are equivalent if each measure is absolutely continuous wrt. the other.

Density

Suppose $\mu$ and $\nu$ are two sigma-finite measures on a measure space $(X, \Omega)$ and that $\mu$ is absolutely continuous wrt. $\nu$ . Then there exists a non-negative, measurable function $\rho$ on $X$ such that

$\begin{equation*} \mu(E) = \int_{E} \rho \ \ d \nu \end{equation*}$

Measure-preserving transformation

A transformation $T: X \to X$ on the measure space $(X, \Sigma, \mu)$ is called a measure-preserving transformation if

$\begin{equation*} \mu \Big( T^{-1}(A) \Big) = \mu(A), \quad \forall A \in \Sigma \end{equation*}$
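As a concrete example (my own choice, not from the notes above), the doubling map $T(x) = 2x \bmod 1$ on $[0, 1)$ preserves Lebesgue measure. The sketch below only checks the defining property on half-open intervals $[a, b)$ with $0 \le a \le b \le 1$ , whose preimage consists of two intervals of half the length each. Extending from intervals to all measurable sets needs a separate argument, which is not shown here. The function names are hypothetical.

```python
from fractions import Fraction


def preimage_under_doubling(a, b):
    """Preimage of [a, b) under T(x) = 2x mod 1 on [0, 1), as two disjoint intervals."""
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]


def total_length(intervals):
    """Sum of the lengths of disjoint intervals given as (left, right) pairs."""
    return sum(right - left for left, right in intervals)


a, b = Fraction(1, 5), Fraction(3, 4)
pre = preimage_under_doubling(a, b)
print(pre, total_length(pre) == b - a)
```

Note that the definition uses the preimage $T^{-1}(A)$ and not the image $T(A)$ : the image of $[0, 1/2)$ under the doubling map is all of $[0, 1)$ , which has twice the length.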

Sobolev space

Notation

$\Omega$ is an open subset of $\mathbb{R}^n$
$\varphi \in C_c^{\infty}(\Omega)$ denotes an infinitely differentiable function $\varphi$ with compact support
$\alpha$ is a multi-index of order $|\alpha| = k$ , i.e.

$\begin{equation*} D^{\alpha} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \dots \partial x_n^{\alpha_n}} \end{equation*}$

Definition

Vector space of functions equipped with a norm that is a combination of $L^p$ norms of the function itself and its derivatives to a given order.

Intuitively, a Sobolev space is a space of functions with sufficiently many derivatives for some application domain, e.g. PDEs, and equipped with a norm that measures both size and regularity of a function.

The Sobolev spaces $W^{k,p}(\Omega)$ combine the concepts of weak differentiability and Lebesgue norms (i.e. $L^p$ spaces).

For a proper definition for different cases of dimension of the space $k$ , have a look at Wikipedia.

Motivation

Integration by parts yields that for every $u \in C^k(\Omega)$ where $k \in \mathbb{N}$ , and for all infinitely differentiable functions with compact support $\varphi \in C_c^{\infty}(\Omega)$ :

$\begin{equation*} \int_{\Omega} u D^\alpha \varphi \ dx = \big( -1 \big)^{|\alpha|} \int_{\Omega} \varphi D^{\alpha} u \ dx \end{equation*}$

Observe that the LHS only makes sense if we assume $u$ to be locally integrable. If there exists a locally integrable function $v$ , such that

$\begin{equation*} \int_{\Omega} u D^{\alpha} \varphi \ dx = \big( -1 \big)^{|\alpha|} \int_{\Omega} \varphi v \ dx ,\quad \varphi \in C_c^{\infty}(\Omega) \end{equation*}$

we call $v$ the weak $\alpha$ -th partial derivative of $u$ . If this exists, then it is uniquely defined almost everywhere, and thus it is uniquely determined as an element of a Lebesgue space (i.e. $L^p$ function space).

On the other hand, if $u \in C^k(\Omega)$ , then the classical and the weak derivative coincide!

Thus, if $v = \partial_{\alpha} u$ , we may denote it by $D^{\alpha} u := v$ .

Example

$\begin{equation*} u(x)=\begin{cases}1+x&-1<x<0\\10&x=0\\1-x&0<x<1\\0&{\text{otherwise}}\end{cases} \end{equation*}$

is not continuous at zero, and not differentiable at −1, 0, or 1. Yet the function

$\begin{equation*} v(x)= \begin{cases}1&-1<x<0\\-1&0<x<1\\0&{\text{otherwise}}\end{cases} \end{equation*}$

satisfies the definition of being the weak derivative of $u(x)$ , which then qualifies as being in the Sobolev space $W^{1, p}$ (for any allowed $p$ ).
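The defining identity $\int u \varphi' \ dx = - \int \varphi v \ dx$ can be checked numerically for one test function, as a rough sanity check rather than a proof. The sketch below uses a smooth bump $\varphi$ with compact support, deliberately centred off zero so that neither integral vanishes by symmetry. The constants `CENTER` and `RADIUS`, the helper names and the midpoint rule are all my own assumptions.

```python
import math

CENTER, RADIUS = 0.3, 2.0  # hypothetical bump parameters; support is (-1.7, 2.3)


def u(x):
    if x == 0:
        return 10.0
    if -1 < x < 0:
        return 1.0 + x
    if 0 < x < 1:
        return 1.0 - x
    return 0.0


def v(x):
    if -1 < x < 0:
        return 1.0
    if 0 < x < 1:
        return -1.0
    return 0.0


def phi(x):
    """Smooth bump exp(-1 / (1 - s^2)) with s = (x - CENTER) / RADIUS, zero for |s| >= 1."""
    s = (x - CENTER) / RADIUS
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1 else 0.0


def dphi(x):
    """Derivative of phi, obtained from the chain rule."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1:
        return 0.0
    return phi(x) * (-2.0 * s) / ((1.0 - s * s) ** 2 * RADIUS)


def midpoint(g, a, b, cells=20_000):
    h = (b - a) / cells
    return h * sum(g(a + (i + 0.5) * h) for i in range(cells))


# u and v vanish outside (-1, 1), so integrating over [-1, 1] is enough.
lhs = midpoint(lambda x: u(x) * dphi(x), -1.0, 1.0)
rhs = -midpoint(lambda x: v(x) * phi(x), -1.0, 1.0)
print(lhs, rhs)
```

The value $u(0) = 10$ plays no role, since a single point has measure zero. This is the sense in which weak derivatives only see functions up to almost-everywhere equality.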

Ergodic Theory

Let $T: X \to X$ be a measure-preserving transformation on a measure space $(X, \Sigma, \mu)$ with $\mu(X) = 1$ , i.e. it's a probability space.

Then $T$ is ergodic if for every $E \in \Sigma$ we have

$\begin{equation*} T^{-1}(E) = E \implies \mu(E) = 0 \text{ or } \mu(E) = 1 \end{equation*}$

Limits of sequences

Infinite Series of Real Numbers

Theorems

Abel's formula

Let $\{a_k\}_{k \in \mathbb{N}}$ and $\{b_k\}_{k \in \mathbb{N}}$ be real sequences, and for each pair of integers $n \ge m \ge 1$ set

$\begin{equation*} A_{n,m} := \sum_{k=m}^{n} a_k \end{equation*}$

Then

$\begin{equation*} \sum_{k=m}^{n}a_k b_k = A_{n, m} b_n - \sum_{k=m}^{n-1}A_{k,m} (b_{k+1} - b_k) \end{equation*}$

for all integers $n > m \ge 1$ .

Since $A_{k, m} - A_{(k-1), m} = a_k$ for $k > m$ and $A_{m,m} = a_m$ , we have

$\begin{equation*} \begin{split} \sum_{k=m}^{n}a_k b_k =& a_m b_m + \sum_{k=m+1}^{n}(A_{k,m} - A_{(k-1), m}) b_k \\ =& a_m b_m + \textcolor{green}{\sum_{k=m+1}^{n}A_{k,m} b_k} - \textcolor{red}{\sum_{k=m}^{n - 1} A_{k, m} b_{k+1}} \\ =& a_m b_m + \textcolor{green}{\Bigg( A_{n,m} b_n + \sum_{k=m+1}^{n-1}A_{k,m} b_k \Bigg)} - \textcolor{red}{\Bigg( A_{m,m} b_{m+1} + \sum_{k=m+1}^{n - 1}A_{k,m} b_{k+1} \Bigg)} \\ =& A_{n,m} b_n - A_{m,m} (b_{m+1} - b_m) - \sum_{k=m+1}^{n-1}A_{k,m} (b_{k+1} - b_k) \\ =& A_{n,m} b_n - \sum_{k=m}^{n-1}A_{k, m} (b_{k+1} - b_k) \end{split} \end{equation*}$
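Abel's formula is easy to get wrong by one index, so here is a quick exact check with rational numbers. The sequences $a_k = 1/k$ and $b_k = k^2$ are arbitrary choices of mine, and the helper names `A` and `abel_rhs` are only for illustration. Lists are padded at index 0 so that `a[k]` plays the role of $a_k$ .

```python
from fractions import Fraction


def A(a, n, m):
    """Partial sum A_{n,m} = a_m + ... + a_n (index 0 of the list is unused)."""
    return sum(a[k] for k in range(m, n + 1))


def abel_rhs(a, b, n, m):
    """Right-hand side of Abel's formula: A_{n,m} b_n - sum_{k=m}^{n-1} A_{k,m} (b_{k+1} - b_k)."""
    return A(a, n, m) * b[n] - sum(A(a, k, m) * (b[k + 1] - b[k]) for k in range(m, n))


a = [None] + [Fraction(1, k) for k in range(1, 9)]
b = [None] + [Fraction(k * k) for k in range(1, 9)]
m, n = 2, 7
lhs = sum(a[k] * b[k] for k in range(m, n + 1))
print(lhs, abel_rhs(a, b, n, m), lhs == abel_rhs(a, b, n, m))
```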

Infinite Series of Functions

Uniform Convergence

Theorems

Cauchy criterion

Let $E$ be a nonempty subset of $\mathbb{R}$ , and let $f_n : E \to \mathbb{R}$ be a sequence of functions.

Then $f_n$ converges uniformly on $E$ if and only if for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that

$\begin{equation*} n, m \ge N \quad \implies \quad |f_n(x) - f_m(x)| < \varepsilon \end{equation*}$

for all $x \in E$ .

Generally about uniform convergence

Let $E$ be a nonempty subset of $\mathbb{R}$ and let $\{f_k\}$ be a sequence of real functions defined on $E$

i) Suppose that $x_0 \in E$ and that each $f_k$ is continuous at $x_0 \in E$ . If $f = \sum_{k=1}^{\infty} f_k$ converges uniformly on $E$ , then $f$ is continuous at $x_0 \in E$

ii) [Term-by-term integration] Suppose that $E = [a, b]$ and that each $f_k$ is integrable on $[a, b]$ . If $f = \sum_{k=1}^{\infty} f_k$ converges uniformly on $[a, b]$ , then $f$ is integrable on $[a, b]$ and

$\begin{equation*} \int_a^b \sum_{k=1}^{\infty} f_k(x) \ dx = \sum_{k=1}^{\infty} \int_a^b f_k(x) \ dx \end{equation*}$

iii) [Term-by-term differentiation] Suppose that $E$ is a bounded, open interval and that each $f_k$ is differentiable on $E$ . If $\sum_{k=1}^{\infty} f_k$ converges at some $x_0 \in E$ , and $\sum_{k=1}^{\infty} f_k'$ converges uniformly on $E$ , then $f := \sum_{k=1}^{\infty} f_k$ converges uniformly on $E$ , $f$ is differentiable on $E$ , and

$\begin{equation*} \Bigg( \sum_{k=1}^{\infty} f_k(x) \Bigg)' = \sum_{k=1}^{\infty} f_k'(x) \end{equation*}$

Suppose that $f_n \to f$ uniformly on a closed interval $[a, b]$ . If each $f_n$ is integrable on $[a, b]$ , then so is $f$ and

$\begin{equation*} \lim_{n \to \infty} \int_{a}^{b} f_n(x) \ dx = \int_a^b \Big( \lim_{n \to \infty} f_n(x) \Big) \ dx \end{equation*}$

In fact,

$\begin{equation*} \lim_{n \to \infty} \int_a^x f_n(t) \ dt = \int_a^x f(t) \ dt \end{equation*}$

uniformly for $x \in [a, b]$ .

Problems

7.2.4

Let

$\begin{equation*} f(x) = \sum_{k=1}^\infty \frac{\cos(kx)}{k^2} \end{equation*}$

Show that

$\begin{equation*} \int_0^{\pi / 2} f(x) \ dx = \sum_{k=0}^\infty \frac{(-1)^k}{(2k + 1)^3} \end{equation*}$

The plan:

1. Show that the series converges uniformly on $[0, \frac{\pi}{2}]$
2. Conclude that we can integrate the series term by term

Start by bounding the terms in the sum:

$\begin{equation*} \Bigg| \frac{\cos kx}{k^2} \Bigg| \le \frac{1}{k^2} = M_k \end{equation*}$

And since the series $\sum_k 1 / k^2$ converges, the M-test shows that the series in question converges uniformly (and absolutely) on $[0, \frac{\pi}{2}]$ .

Further,

$\begin{equation*} \begin{split} \int_0^{\pi / 2} f(x) \ dx &= \sum_{k=1}^\infty \int_0^{\pi / 2} \frac{\cos kx}{k^2} \ dx \\ &= \sum_{k=1}^\infty \Big[ \frac{\sin kx}{k^3} \Big]_0^{\pi / 2} \\ &= \sum_{k=1}^\infty \frac{\sin (k \pi / 2)}{k^3} \end{split} \end{equation*}$

Here we note that $\sin(k \pi / 2) = 0$ for even $k$ , while for odd $k = 2j + 1$ we have $\sin(k \pi / 2) = (-1)^j$ . Only the odd terms survive, so the sum equals $\sum_{j=0}^\infty \frac{(-1)^j}{(2j + 1)^3}$ , as in the claim.
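As a numerical sanity check of the term-by-term integration (a sketch, not part of the proof), one can compare partial sums of the integrated series with partial sums of the claimed alternating series. The function names and truncation points below are illustrative choices, not from the source.

```python
import math


def integrated_series(num_terms):
    """Partial sum over k = 1..num_terms of sin(k*pi/2) / k**3."""
    return sum(math.sin(k * math.pi / 2) / k**3 for k in range(1, num_terms + 1))


def claimed_series(num_terms):
    """Partial sum over j = 0..num_terms-1 of (-1)**j / (2j + 1)**3."""
    return sum((-1) ** j / (2 * j + 1) ** 3 for j in range(num_terms))


# k = 1..2001 contains exactly the odd values 1, 3, ..., 2001,
# which correspond to j = 0..1000 in the claimed series.
left = integrated_series(2001)
right = claimed_series(1001)
print(left, right, abs(left - right))
```

The two partial sums should agree up to floating-point error, since the even terms of the first sum vanish.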

TODO 7.2.5

Show that

$\begin{equation*} \sum_{k=1}^\infty \frac{1}{k} \sin \Big( \frac{x}{k + 1} \Big) \end{equation*}$

converges pointwise on $\mathbb{R}$ and uniformly on each bounded interval in $\mathbb{R}$ to a differentiable function $f$ which satisfies

$\begin{equation*} |f(x)| \le |x| \text{ and } |f'(x)| \le 1 \end{equation*}$

for all $x \in \mathbb{R}$ .

We need to show:

1. Pointwise convergence on $\mathbb{R}$
2. Uniform convergence on $[a, b]$ for $a, b \in \mathbb{R}$

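For 1. we observe that

$\begin{equation*} \mathbb{R} = \underset{m \ge 1}{\cup} [ -m, m] \end{equation*}$

so pointwise convergence on $\mathbb{R}$ follows from uniform convergence on each bounded interval, i.e. from 2. For 2., let $x \in [a, b]$ . Using $|\sin t| \le |t|$ ,

$\begin{equation*} \begin{split} \Big| \frac{1}{k} \sin \Big( \frac{x}{k + 1} \Big) \Big| &= \frac{1}{k} \Big| \sin \Big(\frac{x}{k + 1} \Big) \Big| \\ &\le \frac{|x|}{k (k + 1)} \\ &\le \frac{\max \{ |a|, |b| \}}{k(k+1)} =: M_k \end{split} \end{equation*}$

and

$\begin{equation*} \sum_{k=1}^\infty M_k = \max \{ |a|, |b| \} \sum_{k=1}^\infty \frac{1}{k(k+1)} = \max \{ |a|, |b| \} \end{equation*}$

where the last step is due to the sum being a telescoping series, which equals 1.

We then use the M-test, hence we get uniform convergence on $[a, b]$ .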

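Now that we know that the series converges, we need to establish that the function satisfies the bounds. The bound $|f(x)| \le |x|$ follows from the estimate above, since $|f(x)| \le |x| \sum_{k=1}^\infty \frac{1}{k(k+1)} = |x|$ . For the derivative,

$\begin{equation*} f'(x) \overset{?}{=} \sum_{k=1}^\infty \Big( \frac{1}{k} \sin \big( \frac{x}{k + 1} \big) \Big)' = \sum_{k=1}^\infty \frac{1}{k(k+1)} \cos \big( \frac{x}{k + 1} \big) \end{equation*}$

Where $\overset{?}{=}$ becomes $=$ if we can prove that the RHS converges uniformly on each bounded interval (term-by-term differentiation). This again follows from the M-test, since each term is bounded in absolute value by $\frac{1}{k(k+1)}$ , which also gives $|f'(x)| \le \sum_{k=1}^\infty \frac{1}{k(k+1)} = 1$ .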
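The two bounds can be checked numerically on partial sums. This is only a sketch: the sample points and the truncation at 5000 terms are arbitrary choices, and a finite check does not replace the M-test argument.

```python
import math


def partial_sum(x, num_terms):
    """Partial sum of (1/k) * sin(x / (k + 1)) for k = 1..num_terms."""
    return sum(math.sin(x / (k + 1)) / k for k in range(1, num_terms + 1))


def partial_sum_derivative(x, num_terms):
    """Partial sum of the termwise derivatives cos(x / (k + 1)) / (k (k + 1))."""
    return sum(math.cos(x / (k + 1)) / (k * (k + 1)) for k in range(1, num_terms + 1))


sample_points = [-50.0, -3.0, -0.1, 0.0, 0.1, 3.0, 50.0]
for x in sample_points:
    # Expect |partial_sum| <= |x| and |partial_sum_derivative| <= 1.
    print(x, partial_sum(x, 5000), partial_sum_derivative(x, 5000))
```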

TODO 7.2.6

$\begin{equation*} \Big| \sum_{k=1}^\infty (1 - \cos ( 1 / k)) \Big| \le 2 \end{equation*}$

Look at the more general case

$\begin{equation*} f(x) = \sum_{k=1}^\infty \Big(1 - \cos(x / k) \Big) \end{equation*}$

Look also at $f'(x)$

Workshop 2

6
- Question
  
  Let $f_n(x) = n x(1 - x^2)^n$ for $x \in [0, 1]$ . Prove that $f_n$ converges pointwise on $[0, 1]$ and find the limit function. Is the convergence uniform on $[0, 1]$ ? Is the convergence uniform on $[a, 1]$ with $a \in (0, 1)$ ?
- Answer
  
  First observe that for $x= 0$ and $x = 1$ we have
  
  $\begin{equation*} f_n(0) = f_n(1) = 0 \end{equation*}$
  
  And for $x \in (0, 1)$
  
  $\begin{equation*} f_n(x) = n x (1 - x^2)^n < n (1 - x^2)^n \to 0 \end{equation*}$
  
  Therefore the limiting function is
  
  $\begin{equation*} f(x) = 0 \end{equation*}$
  
  Is the convergence uniform on $[0, 1]$ ? No! By Thm. 7.10 in introduction_to_analysis we know that if $f_n \to f$ uniformly then
  
  $\begin{equation*} \lim_{n \to \infty} \int_0^1 f_n(x) \ dx = \int_0^1 \lim_{n \to \infty} f_n(x) \ dx = \int_0^1 f(x) \ dx \end{equation*}$
  
  but in this case
  
  $\begin{equation*} \int_0^1 f_n(x) \ dx = \frac{n}{2 (n + 1)} \to \frac{1}{2} \ne \int_0^1 f(x) \ dx = 0 \end{equation*}$
  
  Hence we have a proof by contradiction.
  
  Is the convergence uniform on $[a, 1]$ ? Yes!
  
  $\begin{equation*} f_n(x) = n x (1 - x^2)^n \le n (1 - x^2)^n \le n (1 - a^2)^n \to 0 \end{equation*}$
  
  and
  
  $\begin{equation*} f_n(1) = 0 \end{equation*}$
  
  hence $f_n \to f$ uniformly on $[a, 1]$ for $a \in (0, 1)$ .
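The following sketch illustrates both halves of the answer numerically: the integrals over $[0, 1]$ approach $1/2$ rather than $0$, the sup over $[0, 1]$ grows with $n$, and the sup over $[a, 1]$ shrinks. The midpoint rule, the grid-based sup and the choice $a = 0.5$ are assumptions made for illustration only.

```python
def f(n, x):
    """f_n(x) = n * x * (1 - x^2)^n on [0, 1]."""
    return n * x * (1 - x * x) ** n


def integral_closed_form(n):
    """The value n / (2 (n + 1)) of the integral of f_n over [0, 1] used in the answer."""
    return n / (2 * (n + 1))


def integral_midpoint(n, steps=20_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f(n, (i + 0.5) * h) for i in range(steps))


def grid_sup(n, a, steps=10_000):
    """Largest value of f_n on an evenly spaced grid in [a, 1] (approximates the sup)."""
    return max(f(n, a + (1 - a) * i / steps) for i in range(steps + 1))


for n in (10, 50, 200):
    print(n, integral_midpoint(n), integral_closed_form(n),
          grid_sup(n, 0.0), grid_sup(n, 0.5))
```

The growing sup on $[0, 1]$ comes from a peak near $x = 1/\sqrt{2n + 1}$ that moves towards $0$, which is exactly the region excluded by restricting to $[a, 1]$.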
7
- Question
  
  Let $f_n: \mathbb{R} \to \mathbb{R}$ be a sequence of continuous functions which converge uniformly to a function $f: \mathbb{R} \to \mathbb{R}$ . Let $\big( x_n \big)$ be a sequence of real numbers which converges to $x \in \mathbb{R}$ . Show that $f_n(x_n) \to f(x)$ .
- Answer
  
  Observe that
  
  $\begin{equation*} | f_n(x_n) - f(x) | \le \underbrace{|f_n(x_n) - f(x_n)|}_{< \varepsilon_1} + \underbrace{|f(x_n) - f(x)|}_{< \varepsilon_2} \end{equation*}$
  
  for some $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$ . We know
  
  $\begin{equation*} f_n \to f \quad \text{uniformly} \implies \exists N_1 \in \mathbb{N} : |f_n(x_n) - f(x_n)| < \varepsilon_1, \quad \forall n \ge N_1 \end{equation*}$
  
  and, since $x_n \to x$ , for all $\delta > 0$
  
  $\begin{equation*} \exists N_2 \in \mathbb{N}: n \ge N_2 \implies |x_n - x| < \delta \end{equation*}$
  
  Further, by Theorem 7.10 introduction_to_analysis,
  
  $\begin{equation*} f_n \text{ continuous and } f_n \to f \text{ uniformly} \implies f \text{ continuous} \end{equation*}$
  
  which implies
  
  $\begin{equation*} \exists \delta > 0: |x_n - x| < \delta \implies |f(x_n) - f(x)| < \varepsilon_2, \quad \forall n \ge N_2 \end{equation*}$
  
  Therefore, for $\varepsilon > 0$ , let
  
  $\begin{equation*} \varepsilon_1 = \varepsilon_2 = \varepsilon / 2 \end{equation*}$
  
  and
  
  $\begin{equation*} N := \max \left\{ N_1, N_2 \right\} \end{equation*}$
  
  then
  
  $\begin{equation*} |f_n(x_n) - f(x)| < \varepsilon_1 + \varepsilon_2 = \varepsilon, \quad \forall n \ge N \end{equation*}$
  
  that is, $f_n(x_n) \to f(x)$ ,
  
  as wanted.

Uniform Continuity

Theorems

Suppose $f: [a, b] \to \mathbb{R}$ is continuous. Then it is uniformly continuous.

Problems

Workshop 3

5
- Question
  
  Let $I$ be an open interval in $\mathbb{R}$ . Suppose $f: I \to \mathbb{R}$ is differentiable and its derivative $f'$ is bounded on $I$ . Prove that $f$ is uniformly continuous on $I$ .
- Answer
  
  Suppose
  
  $\begin{equation*} |f'(x)| \le M, \forall x \in I \end{equation*}$
  
  for some $M > 0$ . Then by the Mean Value theorem, for any $x, y \in I$ there is a $c$ between $x$ and $y$ such that
  
  $\begin{equation*} |f(y) - f(x)| = |f'(c)| |x - y| \end{equation*}$
  
  Therefore,
  
  $\begin{equation*} |f(y) - f(x)| \le M |x - y| \end{equation*}$
  
  Thus, let
  
  $\begin{equation*} \delta = \varepsilon / M \end{equation*}$
  
  Then
  
  $\begin{equation*} |x - y| < \delta \implies |f(y) - f(x)| \le M |x - y| < M \delta = \varepsilon \end{equation*}$
  
  Hence $f$ is uniformly continuous.

Power series

Definitions

Let $(a_n)$ be a sequence of real numbers, and $c \in \mathbb{R}$ . A power series is a series of the form

$\begin{equation*} \sum_{n=0}^\infty a_n (x - c)^n \end{equation*}$

The numbers $a_n$ are called coefficients and the constant $c$ is called the center.

Suppose we have the power series

$\begin{equation*} \sum_{n=0}^\infty a_n (x - c)^n \end{equation*}$

then the radius of convergence $R$ is given by

$\begin{equation*} R = \sup \{ r \ge 0 : (a_n r^n) \text{ is bounded} \} \end{equation*}$

unless $(a_n r^n)$ is bounded for all $r \ge 0$ , in which case we declare $R = \infty$

I.e. $R$ is the largest number such that the sequence $(a_n r^n)$ is bounded for every $0 \le r < R$ .

Analytic functions

We say a function is analytic if it can be expressed as a power series.

More precisely, $f$ is analytic on $\{x : |x-c| < r\}$ if there is a power series which converges to $f$ on $\{x : |x-c| < r\}$ .

Theorems

$\begin{equation*} R = \underset{m \to \infty}{\lim} \underset{k \ge m}{\inf} \frac{1}{|a_k|^{1 / k}} \end{equation*}$

This holds in general, and is thus basically another way of defining the radius of convergence. In particular,

$\begin{equation*} R = \underset{k \to \infty}{\lim} \frac{1}{|a_k|^{1 / k}} \end{equation*}$

provided this limit exists.

Converges to a continuous function

Assume that $R > 0$ . Suppose that $0 < r < R$ . Then the power series converges uniformly and absolutely on $| x - c| \le r$ to a continuous function $f$ . That is,

$\begin{equation*} f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n \end{equation*}$

Taylor's Theorem

Suppose the radius of convergence is $R$ . Then the function

$\begin{equation*} f(x) = \sum_{n=0}^{\infty} a_n (x - c)^n \end{equation*}$

is infinitely differentiable on $|x - c| < R$ , and for such $x$ ,

$\begin{equation*} f'(x) = \sum_{n=1}^{\infty} n a_n (x - c)^{n-1} \end{equation*}$

and the series converges absolutely, and also uniformly on $[c - r, c + r]$ for any $r < R$ . Moreover

$\begin{equation*} a_n = \frac{f^{(n)}(c)}{n!} \end{equation*}$

Consider the series

$\begin{equation*} \sum_{n=1}^{\infty} n a_n (x - c)^{n-1} \end{equation*}$

which has radius of convergence $R$ and so converges uniformly on $[c - r, c + r]$ for any $r < R$ .

Since

$\begin{equation*} n a_n (x - c)^{n-1} = \frac{d}{dx} a_n (x - c)^n \end{equation*}$

and the series $\sum_{n=0}^{\infty} a_n (x - c)^n$ converges at least at one point (e.g. at $x = c$ ), by Theorem 7.14 in Wade's we know that

$\begin{equation*} f'(x) = \sum_{n=1}^{\infty} n a_n (x - c)^{n-1} \end{equation*}$

Further, we have $f(c) = a_0$ and $f'(c) = a_1$ , which we can keep on doing and end up with

$\begin{equation*} f^{(n)}(c) = a_n n! \iff a_n = \frac{f^{(n)}(c)}{n!} \end{equation*}$

Problems

Finding radius of convergence

The power series

$\begin{equation*} \sum_{n=1}^\infty \frac{x^n}{n} \end{equation*}$

has a radius of convergence $R = 1$ , and converges on the interval $x \in [-1, 1)$ (absolutely only for $x \in (-1, 1)$ ).

One can easily see that the series converges on the interval $x \in (-1, 1)$ , and so we consider the endpoints of this interval.

$x = -1 \implies \sum_{n=1}^\infty \frac{(-1)^n}{n}$ , which converges by the alternating series test
$x = 1 \implies \sum_{n=1}^\infty \frac{1^n}{n}$ , which is known as the harmonic series and is known to diverge
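The different behaviour at the two endpoints is easy to see numerically. In the sketch below (function name and truncation points are illustrative), the partial sums at $x = -1$ settle near $-\ln 2$, the known value of the alternating harmonic series with this sign convention, while those at $x = 1$ keep growing roughly like $\ln N$.

```python
import math


def partial_sum(x, num_terms):
    """Partial sum of x**n / n for n = 1..num_terms."""
    return sum(x**n / n for n in range(1, num_terms + 1))


for num_terms in (10, 1_000, 100_000):
    at_minus_one = partial_sum(-1.0, num_terms)  # settles near -ln(2)
    at_plus_one = partial_sum(1.0, num_terms)    # grows without bound
    print(num_terms, at_minus_one, at_plus_one, math.log(num_terms))
```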

7.3.1

a)

The series

$\begin{equation*} \sum_{k=0}^{\infty} \frac{k x^k}{(2k + 1)^2} \end{equation*}$

converges on the interval $x \in [-1, 1)$ , and has radius of convergence $R = 1$ .

$\begin{equation*} \sum_{k=0}^{\infty} \frac{k x^k}{(2k + 1)^2} \end{equation*}$

Letting $a_k = \frac{k}{(2k + 1)^2}$ we write

$\begin{equation*} \sum_{k=0}^{\infty} a_k x^k \end{equation*}$

The ratio test gives us

$\begin{equation*} R = \underset{k \to \infty}{\lim} \frac{\frac{k}{(2k + 1)^2}}{\frac{k + 1}{(2k + 3)^2}} = \underset{k \to \infty}{\lim} \frac{k}{k+1} \frac{(2k + 3)^2}{(2k + 1)^2} = 1 \end{equation*}$

Thus we have the endpoints $-1$ and $1$ .

For $x = 1$ we get the series

$\begin{equation*} \sum_{k=0}^{\infty} \frac{k}{(2k + 1)^2} \end{equation*}$

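which diverges by limit comparison with the harmonic series, since $\frac{k}{(2k + 1)^2}$ behaves like $\frac{1}{4k}$ for large $k$ .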

For $x = -1$ we get the series

$\begin{equation*} \sum_{k=0}^{\infty} \frac{k (-1)^k}{(2k + 1)^2} \end{equation*}$

for which we can use the alternating series test to show that it converges.

Thus, we have the series converging on the interval $x \in [-1, 1)$ .

b)

$\begin{equation*} \sum_{k=0}^{\infty} \Big(2 + (-1)^k \Big)^k x^{2k} \end{equation*}$

We observe that

$\begin{equation*} a_k = \begin{cases} 3^k & \quad k \text{ is even} \\ 1 & \quad k \text{ is odd} \end{cases} \end{equation*}$

and let $y = x^2$ , writing the series as

$\begin{equation*} \sum_{k=0}^{\infty} a_k y^k \end{equation*}$

and using the root test, we get

$\begin{equation*} b_k = \frac{1}{|a_k|^{1 / k}} = \begin{cases} \frac{1}{3} & \quad k \text{ is even} \\ 1 & \quad k \text{ is odd} \end{cases} \end{equation*}$

where we let the above equal $b_k$ for the sake of convenience.

Then we use the lim-inf definition of radius of convergence for the series in $y$

$\begin{equation*} R_y = \underset{m \to \infty}{\lim} \underset{k \ge m}{\inf} b_k = \underset{m \to \infty}{\lim} \frac{1}{3} = \frac{1}{3} \end{equation*}$

Since $y = x^2$ , the series in $x$ converges for $x^2 < \frac{1}{3}$ , so its radius of convergence is $R = \frac{1}{\sqrt{3}}$ .
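A small numerical sketch of this computation, under the assumption that a finite tail $m \le k \le 200$ is a good enough stand-in for the infimum over all $k \ge m$. In the variable $y = x^2$ the values $1 / |a_k|^{1/k}$ alternate between $1/3$ (even $k$) and $1$ (odd $k$), so the tail infimum is $1/3$ and the radius in $x$ is its square root. All function names are hypothetical.

```python
import math


def coefficient(k):
    """a_k = (2 + (-1)^k)^k, the coefficient of y^k with y = x^2."""
    return (2 + (-1) ** k) ** k


def inverse_root(k):
    """1 / |a_k|^(1/k)."""
    return 1.0 / coefficient(k) ** (1.0 / k)


def tail_infimum(m, k_max=200):
    """Minimum of 1/|a_k|^(1/k) over m <= k <= k_max (finite stand-in for the tail inf)."""
    return min(inverse_root(k) for k in range(m, k_max + 1))


radius_in_y = tail_infimum(50)
radius_in_x = math.sqrt(radius_in_y)
print(radius_in_y, radius_in_x)
```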

c)

$\begin{equation*} \sum_{k=0}^{\infty} 3^{k^2} x^{k^2} \end{equation*}$

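Writing the series as $\sum_{m=0}^{\infty} a_m x^m$ , we then let

$\begin{equation*} a_m = \begin{cases} 0 & \quad \text{if } m \ne k^2 \\ 3^{k^2} & \quad \text{if } m = k^2 \end{cases} \end{equation*}$

Then we can apply the root test: for $m = k^2 \ge 1$ we have $|a_m|^{1 / m} = 3$ , while the zero coefficients do not affect the radius, so $R = \frac{1}{3}$ .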

7.3.2

a)

Look at solution to 7.3.1. a)

b)

c)

Integrability on R

Definitions

Partition

Let $a, b \in \mathbb{R}$ with $a < b$

i) A partition of the interval $[a,b]$ is a set of points $P = \{x_0, x_1, \dots, x_n \}$ such that

$\begin{equation*} a = x_0 < x_1 < \dots < x_n = b \end{equation*}$

ii) The norm of a partition $P = \{x_0, x_1, \dots, x_n\}$ is the number

$\begin{equation*} || P || = \underset{1 \le j \le n}{\max} | x_j - x_{j-1} | \end{equation*}$

iii) A refinement of a partition $P = \{x_0, x_1, \dots, x_n\}$ is a partition $Q$ of $[a, b]$ which satisfies $Q \supseteq P$ . In this case we say that $Q$ is finer than $P$ .

Riemann sum

Let $a, b \in \mathbb{R}$ with $a < b$ , let $P = \{x_0, x_1, x_2, \dots, x_n\}$ be a partition of the interval $[a, b]$ , set $\Delta x_j := x_j - x_{j-1}$ for $j = 1, 2, \dots, n$ and suppose that $f : [a, b] \to \mathbb{R}$ is bounded.

i) The upper Riemann/Darboux sum of $f$ over $P$ is the number

$\begin{equation*} U(f, P) := \sum_{j=1}^{n} M_j(f) \Delta x_j \end{equation*}$

where

$\begin{equation*} M_j(f) := \sup f([x_{j-1}, x_j]) = \underset{t \in [x_{j-1}, x_j]}{\sup} f(t) \end{equation*}$

ii) The lower Riemann/Darboux sum of $f$ over $P$ is the number

$\begin{equation*} L(f, P) := \sum_{j=1}^{n} m_j(f) \Delta x_j \end{equation*}$

where

$\begin{equation*} m_j(f) := \inf f([x_{j-1}, x_j]) = \underset{t \in [x_{j-1}, x_j]}{\inf} f(t) \end{equation*}$

Since we assumed $f$ to be bounded, the numbers $M_j$ and $m_j$ exist and are finite.
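To make the definitions concrete, here is a minimal sketch that computes upper and lower Darboux sums. It assumes $f$ is increasing, so that $M_j(f) = f(x_j)$ and $m_j(f) = f(x_{j-1})$; for a general bounded $f$ the sup and inf on each subinterval would have to be found some other way. The helper names are hypothetical.

```python
def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * j / n for j in range(n + 1)]


def darboux_sums_increasing(f, partition):
    """Return (U(f, P), L(f, P)) for an increasing function f.

    For increasing f the sup on [x_{j-1}, x_j] is f(x_j)
    and the inf is f(x_{j-1}).
    """
    upper = 0.0
    lower = 0.0
    for left, right in zip(partition, partition[1:]):
        width = right - left
        upper += f(right) * width
        lower += f(left) * width
    return upper, lower


def square(x):
    return x * x


for n in (10, 100, 1000):
    upper, lower = darboux_sums_increasing(square, uniform_partition(0.0, 1.0, n))
    print(n, lower, upper, upper - lower)
```

For $f(x) = x^2$ on $[0, 1]$ with a uniform partition, the gap $U(f, P) - L(f, P)$ equals $1/n$, and both sums squeeze the value $1/3$.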

Riemann integrable

Let $a, b \in \mathbb{R}$ with $a < b$ . A function $f: [a, b] \to \mathbb{R}$ is said to be Riemann integrable on $[a, b]$ if and only if $f$ is bounded on $[a, b]$ , and for every $\varepsilon > 0$

$\begin{equation*} \exists P = \{ x_0, x_1, \dots, x_n \} \text{ partition of } [a, b] : U(f, P) - L(f, P) < \varepsilon \end{equation*}$

I.e. the upper and lower Riemann / Darboux sums have to converge to the same value as the partition is refined.

Riemann integral

Let $a, b \in \mathbb{R}$ with $a < b$ , and $f : [a, b] \to \mathbb{R}$ be bounded.

i) The upper integral of $f$ on $[a, b]$ is the number

$\begin{equation*} (U) \int_a^b f(x) \ dx := \inf \Big\{ U(f, P) : P \text{ is a partition of } [a,b] \Big\} \end{equation*}$

ii) The lower integral of $f$ on $[a, b]$ is the number

$\begin{equation*} (L) \int_a^b f(x) \ dx := \sup \Big\{ L(f, P) : P \text{ is a partition of } [a,b] \Big\} \end{equation*}$

iii) If the upper and lower integral are equal, we define the integral to be this number

$\begin{equation*} \int_a^b f(x) \ dx := (U) \int_a^b f(x) \ dx = (L) \int_a^b f(x) \ dx \end{equation*}$

The following definition of the integral via Riemann sums can be proven to be equivalent to the one via upper and lower integrals using introduction_to_analysis.

Let $f : [a, b] \to \mathbb{R}$

i) A Riemann sum of $f$ wrt. a partition $P = \{ x_0, x_1, \dots, x_n \}$ of $[a, b]$ generated by samples $t_j \in [x_{j - 1}, x_j]$ is a sum

$\begin{equation*} \mathcal{S}(f, P, t_j) := \sum_{j=1}^{n} f(t_j) \Delta x_j \end{equation*}$

ii) The Riemann sums of $f$ are said to converge to $I(f)$ as $||P|| \to 0$ if and only if given $\varepsilon > 0$ there is a partition $P_\varepsilon$ of $[a ,b]$ such that

$\begin{equation*} P = \{ x_0, x_1, \dots, x_n \} \supseteq P_\varepsilon \implies | \mathcal{S}(f, P, t_j) - I(f) | < \varepsilon \end{equation*}$

for all choices of $t_j \in [x_{j-1}, x_j], j = 1, 2, \dots, n$ . In this case we shall use the notation

$\begin{equation*} I(f) = \underset{||P|| \to 0}{\lim} \mathcal{S}(f, P, t_j) := \underset{||P|| \to 0}{\lim} \sum_{j=1}^{n} f(t_j) \Delta x_j \end{equation*}$

$t_j$ is just some arbitrary number in the given interval, e.g. one could take the midpoint $t_j := \frac{1}{2}(x_{j-1} + x_j)$ .
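A Riemann sum with a pluggable choice of sample points can be sketched as follows. The tag functions `midpoint` and `left_endpoint`, the test function and the uniform partition are illustrative assumptions.

```python
def riemann_sum(f, partition, choose_tag):
    """Sum of f(t_j) * (x_j - x_{j-1}) with t_j = choose_tag(x_{j-1}, x_j)."""
    total = 0.0
    for left, right in zip(partition, partition[1:]):
        total += f(choose_tag(left, right)) * (right - left)
    return total


def midpoint(left, right):
    return 0.5 * (left + right)


def left_endpoint(left, right):
    return left


def cube(x):
    return x**3


# Uniform partition of [0, 1] with norm 1/1000.
partition = [j / 1000 for j in range(1001)]
print(riemann_sum(cube, partition, midpoint),
      riemann_sum(cube, partition, left_endpoint))
```

Both choices of $t_j$ give values close to $\int_0^1 x^3 \ dx = 1/4$, as the definition requires for an integrable function once the partition is fine enough.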

Theorems

Suppose $a, b \in \mathbb{R}$ with $a < b$ . If $f$ is continuous on the interval $[a, b]$ , then $f$ is integrable on $[a, b]$ .

Telescoping

If $g : \mathbb{N} \to \mathbb{R}$ , then

$\begin{equation*} \sum_{k = m}^{n} \Big( g(k + 1) - g(k) \Big) = g(n + 1) - g(m), \quad \forall n \ge m \in \mathbb{N} \end{equation*}$

This is more of a remark which is very useful for proofs involving Riemann sums, since we can write

$\begin{equation*} \sum_{j=1}^{n} \Delta x_j = \sum_{j=1}^{n} (x_j - x_{j-1}) = b - a \end{equation*}$

This allows us to write the following inequality for the upper and lower Riemann / Darboux sums of an increasing function $f$ , for which $M_j(f) = f(x_j)$ and $m_j(f) = f(x_{j-1})$ :

$\begin{equation*} \sum_{j=1}^{n} \big( M_j(f) - m_j(f) \big) \Delta x_j \le \sum_{j=1}^{n} \big( M_j(f) - m_j(f) \big) || P || = || P || \sum_{j=1}^{n} \big( M_j(f) - m_j(f) \big) = || P || \big( f(b) - f(a) \big) \end{equation*}$

in which case all we need to prove for $f$ to be Riemann integrable is that this goes to zero as $|| P || \to 0$ , or equiv. as we get a finer partition.

Mean Value Theorem for Integrals

Suppose that $f$ and $g$ are integrable on $[a, b]$ with $g(x) \ge 0$ for all $x \in [a, b]$ . If

$\begin{equation*} m = \inf f[a, b] \quad \text{and} \quad M = \sup f[a,b] \end{equation*}$

then

$\begin{equation*} \exists c \in [m, M] : \int_a^b f(x) g(x) \ dx = c \int_a^b g(x) \ dx \end{equation*}$

In particular, if $f$ is continuous on $[a, b]$ , then

$\begin{equation*} \exists x_0 \in [a,b] : \int_a^b f(x) g(x) \ dx = f(x_0) \int_a^b g(x) \ dx \end{equation*}$

introduction_to_analysis

Suppose that $f, g$ are integrable on $[a, b]$ , that $g$ is nonnegative on $[a, b]$ , and that

$\begin{equation*} m, M \in \mathbb{R} : m \le \inf f([a, b]) \text{ and } M \ge \sup f([a, b]) \end{equation*}$

Then there is a $c \in [a, b]$ s.t.

$\begin{equation*} \int_a^b f(x) g(x) \ dx = m \int_a^c g(x) \ dx + M \int_c^b g(x) \ dx \end{equation*}$

In particular, if $f$ is also nonnegative on $[a, b]$ , then there is a $c \in [a, b]$ which satisfies

$\begin{equation*} \int_a^b f(x) g(x) \ dx = M \int_c^b g(x) \ dx \end{equation*}$

introduction_to_analysis

Let $f$ be a real-valued function which is continuous on the closed interval $[a, b]$ . Then $f$ must attain its maximum and minimum at least once. That is,

$\begin{equation*} \exists c, d \in [a, b] : f(c) \le f(x) \le f(d), \quad \forall x \in [a, b] \end{equation*}$

Fundamental Theorem of Calculus

Let $[a, b]$ be non-degenerate and suppose that $f : [a, b] \to \mathbb{R}$ .

i) If $f$ is continuous on $[a, b]$ and $F(x) = \int_a^x f(t) \ dt$ , then $F \in \mathcal{C}^1 [a, b]$ and

$\begin{equation*} \frac{d}{dx} \int_a^x f(t) \ dt := F'(x) = f(x), \quad \forall x \in [a, b] \end{equation*}$

ii) If $f$ is differentiable on $[a, b]$ and $f'$ is integrable on $[a, b]$ then

$\begin{equation*} \int_a^x f'(t) \ dt = f(x) - f(a), \quad \forall x \in [a,b] \end{equation*}$

introduction_to_analysis

Non-degenerate interval means that $a \ne b$ .

Integrability on R (alternative)

Let $E \subseteq \mathbb{R}$ . Then we define the characteristic function

$\begin{equation*} \begin{split} \chi_E: \quad & \mathbb{R} \to \mathbb{R} \\ & \chi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases} \end{split} \end{equation*}$

Let $I$ be a bounded interval, then we define the integral of $\chi_I$ as

$\begin{equation*} \int \chi_I := \text{length}(I) \end{equation*}$

Reminds you about Lebesgue-integral, dunnit?

We say $\phi: \mathbb{R} \to \mathbb{R}$ is a step function if there exist real numbers

$\begin{equation*} x_0 < x_1 < \dots < x_n, \quad n \in \mathbb{N} \end{equation*}$

such that

$\phi(x) = 0$ for $x < x_0$ and $x > x_n$
$\phi$ is constant on $(x_{j - 1}, x_j)$ for $1 \le j \le n$

We will use the phrase " $\phi$ is a step function wrt. $\{ x_0, x_1, \dots, x_n \}$ " to describe this situation.

In other words, $\phi$ is a step function wrt. $\{ x_0, x_1, \dots, x_n \}$ iff there exist $c_1, \dots, c_n$ such that

$\begin{equation*} \phi(x) = \sum_{j=1}^{n} c_j \chi_{(x_{j - 1}, x_j)}(x) \end{equation*}$

for $x \ne x_0, x_1, \dots, x_n$ .

If $\phi$ is a step function wrt. $\{ x_0, x_1, \dots, x_n \}$ which takes the value $c_j$ on $(x_{j - 1}, x_j)$ , then

$\begin{equation*} \int \phi := \sum_{j=1}^{n} c_j (x_j - x_{j - 1}) \end{equation*}$

Notice that the values $\{ \phi(x_j) \}$ have no effect on the value of $\int \phi$ , as one would expect.
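A direct transcription of this definition, as a sketch: a step function is represented by its breakpoints $x_0 < \dots < x_n$ and the values $c_1, \dots, c_n$ on the open intervals. The representation and the function names are assumptions made for illustration.

```python
def step_integral(breakpoints, values):
    """Integral of the step function equal to values[j-1] on (x_{j-1}, x_j)
    and zero outside [x_0, x_n]."""
    if len(values) != len(breakpoints) - 1:
        raise ValueError("need exactly one value per open interval")
    return sum(c * (right - left)
               for c, left, right in zip(values, breakpoints, breakpoints[1:]))


def step_value(breakpoints, values, x):
    """Value of the step function at x, for x not equal to any breakpoint."""
    for c, left, right in zip(values, breakpoints, breakpoints[1:]):
        if left < x < right:
            return c
    return 0.0


breakpoints = [0.0, 1.0, 3.0, 4.0]
values = [2.0, -1.0, 5.0]
# 2 * (1 - 0) + (-1) * (3 - 1) + 5 * (4 - 3)
print(step_integral(breakpoints, values))
```

Note that `step_integral` never looks at the function's values at the breakpoints themselves, matching the remark above.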

Let $f: \mathbb{R} \to \mathbb{R}$ . We say that $f$ is Riemann integrable if for every $\varepsilon > 0$ there exist step functions $\phi$ and $\psi$ such that

$\begin{equation*} \phi \le f \le \psi \end{equation*}$

and

$\begin{equation*} \int \psi - \int \phi < \varepsilon \end{equation*}$

A function $f: \mathbb{R} \to \mathbb{R}$ is Riemann integrable if and only if

$\begin{equation*} \sup \left\{ \int \phi : \phi \text{ is a step function and } \phi \le f \right\} = \inf \left\{ \int \psi : \psi \text{ is a step function and } \psi \ge f \right\} \end{equation*}$

If $f$ is Riemann integrable we define the integral $\int f$ as the common value

$\begin{equation*} \int f : = \sup \left\{ \int \phi : \phi \text{ is a step function and } \phi \le f \right\} \end{equation*}$

or equivalently,

$\begin{equation*} \int f := \inf \left\{ \int \psi : \psi \text{ is a step function and } \psi \ge f \right\} \end{equation*}$

A function $f: \mathbb{R} \to \mathbb{R}$ is Riemann integrable if and only if there exist sequences of step functions $\phi_n$ and $\psi_n$ such that

$\begin{equation*} \phi_n \le f \le \psi_n, \forall n \in \mathbb{N} \end{equation*}$

and

$\begin{equation*} \int \psi_n - \int \phi_n \to 0 \end{equation*}$

If $\phi_n$ and $\psi_n$ are any sequences of step functions satisfying the above, then

$\begin{equation*} \int \phi_n \to \int f \quad \text{and} \quad \int \psi_n \to \int f \end{equation*}$

as $n \to \infty$ .

Suppose $f$ and $g$ are Riemann integrable, and $\alpha, \beta \in \mathbb{R}$ . Then

$\alpha f + \beta g$ is Riemann-integrable and

$\begin{equation*} \int \big( \alpha f + \beta g \big) = \alpha \int f + \beta \int g \end{equation*}$
If $f \ge 0$ then $\int f \ge 0$ . Further,

$\begin{equation*} f \le g \implies \int f \le \int g \end{equation*}$
$|f|$ is Riemann integrable and

$\begin{equation*} \Big| \int f \Big| \le \int |f| \end{equation*}$
$\max \left\{ f, g \right\}$ and $\min \left\{ f, g \right\}$ are Riemann integrable
$fg$ is Riemann integrable

Integrals and uniform limits of sequences and series of functions

Suppose $f_n: \mathbb{R} \to \mathbb{R}$ is a sequence of Riemann integrable functions such that

$\begin{equation*} f_n \to f \end{equation*}$

uniformly.

Suppose that $f_n$ and $f$ are zero outside some common interval $[a, b]$ . Then $f$ is Riemann integrable and

$\begin{equation*} \int f = \lim_{n \to \infty} \int f_n \end{equation*}$

Suppose $\big( a_n \big)_{n = 1}^\infty$ is a non-negative sequence of numbers and $f: [1, \infty) \to (0, \infty)$ is a function such that:

For some $K$ and $\forall n \in \mathbb{N}$

$\begin{equation*} \int_1^n f \le K \end{equation*}$
For $n \le x \le n + 1$ we have

$\begin{equation*} a_n \le f(x) \end{equation*}$

Then

$\begin{equation*} \sum_{n} a_n \to y \le K \end{equation*}$

for some $y \in \mathbb{R}$ .
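As an illustration of this comparison result, take the hypothetical example $a_n = 1/(n + 1)^2$ and $f(x) = 1/x^2$. Then $a_n \le f(x)$ for $n \le x \le n + 1$, and $\int_1^n f = 1 - 1/n \le 1$, so $K = 1$ works. The sketch below checks that the partial sums are increasing and stay below $K$.

```python
def a(n):
    """Example sequence a_n = 1 / (n + 1)^2 (an illustrative choice)."""
    return 1.0 / (n + 1) ** 2


def f(x):
    """Example comparison function f(x) = 1 / x^2."""
    return 1.0 / x**2


def integral_of_f(n):
    """Exact value of the integral of 1/x^2 over [1, n], namely 1 - 1/n."""
    return 1.0 - 1.0 / n


K = 1.0  # upper bound for every integral_of_f(n)

partial = 0.0
partial_sums = []
for n in range(1, 10_001):
    partial += a(n)
    partial_sums.append(partial)

print(partial_sums[-1], K)
```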

Problems

Workshop 5

5

Question

Suppose $f: \mathbb{R} \to \mathbb{R}$ is Riemann integrable, and that $f(x) = 0$ outside of $[a, b]$ where $a < b$ .

Show that

$\begin{equation*} \big( \exp f \big) \chi_{[a, b]} \end{equation*}$

is also Riemann integrable.
Answer

Since $f$ is Riemann integrable, for every $\varepsilon > 0$ there exist step functions $\phi$ and $\psi$ such that

$\begin{equation*} \phi \le f \le \psi \quad \text{and} \quad \int \psi - \int \phi < \varepsilon \end{equation*}$

Or rather, for all $\varepsilon_f > 0$ , there exists $a = x_0 < x_1 < \dots < x_n = b$ such that

$\begin{equation*} \sum_{j = 1}^{n} \sup_{x, y \in (x_{j - 1}, x_j)} \big| f(x) - f(y) \big| \ (x_j - x_{j - 1}) < \varepsilon_f \end{equation*}$

Since $f$ is Riemann integrable, $f$ is bounded and has bounded support. In particular there is an $M$ such that

$\begin{equation*} \sup_{x \in [a, b]} f(x) \le M \end{equation*}$

Therefore, by the Mean Value theorem applied to $\exp$ , for some $\xi$ between $f(x)$ and $f(y)$ we have

$\begin{equation*} \begin{split} \Big| \exp \big( f(x) \big) - \exp \big( f(y) \big) \Big| &= e^{\xi} \big| f(x) - f(y) \big| \\ & \le e^M \big| f(x) - f(y) \big| \end{split} \end{equation*}$

Therefore the "integral sum" for $\exp(f) \chi_{[a, b]}$ satisfies the following inequality

$\begin{equation*} \sum_{j = 1}^{n} \sup_{x, y \in (x_{j - 1}, x_j)} \Big| \exp \Big( f(x) \Big) - \exp \Big( f(y) \Big) \Big| \big( x_j - x_{j - 1} \big) \le e^M \sum_{j=1}^{n} \sup_{x, y \in (x_{j - 1}, x_j)} \big| f(x) - f(y) \big| \big( x_j - x_{j - 1} \big) \end{equation*}$

Therefore, for $\varepsilon > 0$ , we choose

$\begin{equation*} \varepsilon_f = \varepsilon / e^M \end{equation*}$

which gives us

$\begin{equation*} \sum_{j = 1}^{n} \sup_{x, y \in (x_{j - 1}, x_j)} \Big| \exp \Big( f(x) \Big) - \exp \Big( f(y) \Big) \Big| \big( x_j - x_{j - 1} \big) < e^M \Big( \varepsilon / e^M \Big) = \varepsilon \end{equation*}$

That is, $f$ is Riemann integrable implies $\exp(f) \chi_{[a, b]}$ is Riemann integrable.

We've left out the $\chi_{[a, b]}$ in all the expressions above for brevity, but they ought to be included in a proper treatment.

Metric spaces

Definitions

Metric

A metric on a set $X$ is a function $\rho : X \times X \to \mathbb{R}$ which satisfies the following properties for all $x, y, z \in X$ :

$\begin{equation*} \begin{split} \text{POSITIVE DEFINITE} \quad & \rho(x, y) \ge 0 \text{ with } \rho(x, y) = 0 \text{ iff } x = y \\ \text{SYMMETRIC} \quad & \rho(x, y) = \rho(y, x) \\ \text{TRIANGLE INEQUALITY} \quad & \rho(x, y) \le \rho(x, z) + \rho(z, y) \end{split} \end{equation*}$

Metric space

A metric space is a set $X$ together with a metric $\rho$ on $X$ , i.e. a function $\rho : X \times X \to \mathbb{R}$ which is positive definite, symmetric and satisfies the triangle inequality.

Balls

Let $a \in X$ and $r > 0$ . The open ball (in $X$ ) with center $a$ and radius $r$ is the set

$\begin{equation*} B_r(a) := \{ x \in X : \rho(x, a) < r \} \end{equation*}$

Let $a \in X$ and $r > 0$ . The closed ball (in $X$ ) with center $a$ and radius $r$ is the set

$\begin{equation*} \bar{B}_r(a) := \{ x \in X : \rho(x, a) \le r \} \end{equation*}$

Equivalence of metrics

We say two metrics $d$ and $\rho$ on a set $X$ are strongly equivalent if and only if

$\begin{equation*} \exists A, B > 0 : d(x, y) \le A \rho(x, y) \quad \text{and} \quad \rho(x, y) \le B d(x, y), \quad \forall x, y \in X \end{equation*}$
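A standard example of strongly equivalent metrics, chosen here only for illustration, is the pair $d_\infty(p, q) = \max_i |p_i - q_i|$ and $d_1(p, q) = \sum_i |p_i - q_i|$ on $\mathbb{R}^n$, which satisfy $d_\infty \le d_1 \le n \, d_\infty$. The sketch below spot-checks these two inequalities on random points in $\mathbb{R}^2$; the random sampling is a sanity check, not a proof.

```python
import random


def d_one(p, q):
    """Taxicab metric: sum of coordinate-wise absolute differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def d_max(p, q):
    """Maximum metric: largest coordinate-wise absolute difference."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


random.seed(0)
dimension = 2
pairs = [
    ([random.uniform(-10, 10) for _ in range(dimension)],
     [random.uniform(-10, 10) for _ in range(dimension)])
    for _ in range(1000)
]

# Strong equivalence with constants 1 and `dimension`.
holds = all(d_max(p, q) <= d_one(p, q) <= dimension * d_max(p, q)
            for p, q in pairs)
print(holds)
```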

We say two metrics $d$ and $\rho$ on a set $X$ are equivalent if and only if for every $x \in X$ and every $\varepsilon > 0$ there exists $\delta > 0$ such that

$\begin{equation*} \begin{split} d(x, y) < \delta \implies \rho(x, y) < \varepsilon \\ \rho(x, y) < \delta \implies d(x, y) < \varepsilon \end{split} \end{equation*}$

Closedness and openness

A set $V \subseteq X$ is said to be open if and only if for every $x \in V$ there is an $\varepsilon > 0$ such that the open ball $B_\varepsilon(x)$ is contained in $V$ .

A set $E \subseteq X$ is said to be closed if and only if $E^c := X \ \backslash \ E$ is open.

Closure and interior

For $A \subseteq X$

$\begin{equation*} \text{int} A := \bigcup_{U \subseteq A} U, \quad U \text{ is open} \end{equation*}$

is the interior of $A$ ; it is the largest subset of $A$ which is open.

Or equivalently, the interior of a subset $A$ of points of a topological space $X$ consists of all points of $A$ that do not belong to the boundary of $A$ .

A point that is in the interior of $A$ is an interior point of $A$ .

For $A \subseteq X$

$\begin{equation*} \bar{A} := \bigcap_{A \subseteq F} F, \quad F \text{ is closed} \end{equation*}$

is the closure of $A$ ; it is the smallest set containing $A$ which is closed .

Or, equivalently, the closure of $A$ is the union of $A$ and all its limit points (points "arbitrarily close to $A$ "):

$\begin{equation*} \bar{A} := A \cup \left\{ \underset{n \to \infty}{\lim} \ a_n : \forall n \ge 0, a_n \in A \right\} \end{equation*}$

For $A \subseteq X$

$\begin{equation*} \partial A := \bar{A} \ \backslash \ \text{int} A \end{equation*}$

is the boundary of $A$ .

Convergence, Cauchy sequences and completeness

Most theorems and definitions used for sequences are readily generalized to metric spaces.

We say a metric space $X$ is complete if and only if every Cauchy sequence in $X$ converges (to a point of $X$ ).

In a metric space $(X, d)$ , a sequence $(x_n)$ with $x_n \in X$ is bounded if there exists some ball $B(a, r)$ such that $x_n \in B(a, r)$ for all $n$ .

In a metric space $(X, d)$ , a sequence $(x_n)$ with $x_n \in X$ is a Cauchy sequence iff for every $\varepsilon > 0$ ,

$\begin{equation*} \exists N \in \mathbb{N} : m, n \ge N \implies d(x_n, x_m) < \varepsilon \end{equation*}$

Let $(X, d)$ be a metric space, then $(X, d)$ is said to satisfy the Bolzano-Weierstrass Property iff every bounded sequence $x_n \in X$ has a convergent subsequence .

Closedness, limit points, cluster points and completeness

$x \in X$ is a limit point for $A \subseteq X$ if and only if there is a sequence $(x_n) \subseteq A$ such that $x_n \to x$ as $n \to \infty$ .

$x \in X$ is a cluster point for $A \subseteq X$ if and only if every open ball centred at $x$ contains infinitely many points of $A$ .

The following statements are equivalent:

$x \in X$ is a cluster point for $A \subseteq X$
for all $r > 0$ , $B(x, r) \ \backslash \ \{x\}$ contains a point of $A$
$\exists (x_n) \subseteq A$ , with $x_n \ne x$ for all $n$ , s.t. $x_n \to x$ as $n \to \infty$

Every cluster point for $A$ is a limit point for $A$ . But $x$ can be a limit point for $A$ without being a cluster point .

A closed subset of a complete metric space is complete.

A complete subset of any metric space is closed.

Every convergent sequence is Cauchy, but the converse is not necessarily true.

Compactness

Let $\mathcal{V} = \{V_{\alpha}\}_{\alpha \in \mathcal{A}}$ be a collection of subsets of a metric space $X$ and suppose that $E$ is a subset of $X$ .

$\mathcal{V}$ is said to cover $E$ if and only if

$\begin{equation*} E \subseteq \bigcup_{\alpha \in \mathcal{A}} V_{\alpha} \end{equation*}$

$\mathcal{V}$ is said to be an open covering of $E$ iff $\mathcal{V}$ covers $E$ and each $V_{\alpha}$ is open.

Let $\mathcal{V}$ be a covering of $E$ .

$\mathcal{V}$ is said to have a finite (respectively, countable) subcovering iff there is a finite (respectively, countable) subset $\mathcal{A}_0$ of $\mathcal{A}$ s.t. $\{V_{\alpha}\}_{\alpha \in \mathcal{A}_0}$ covers $E$ .

Let $(X, d)$ be a metric space.

A subset $E \subseteq X$ is compact iff for every open cover $\{ U_{\alpha} : \alpha \in \mathcal{A} \}$ of $E$ , there is a finite subcover $\{U_{\alpha_1}, \dots, U_{\alpha_n} \}$ of $E$ .

I often find myself wondering "what's so cool about this compactness?! It shows up everywhere, but why?"

Well, mainly it's just a lowest common denominator of a lot of nice properties we can deduce about a metric space. Also, one could imagine compactness being important since the basic building blocks of topology are in fact open sets, and so by saying that any open cover has a finite subcover, we're saying the space can be described using "finite topological constructs". But honestly, I'm still not sure about all of this :)

One very interesting theorem which relies on compactness is the Stone-Weierstrass theorem, which allows us to show that for example polynomials are dense in the space of continuous functions! Suuuuper-important when we want to create an approximating function.

Let $(X,d)$ be a metric space and let $S \subseteq X$ . Then $S$ is said to be dense in $X$ if for every $a \in X$ and for every $r > 0$ we have that $B(a, r) \cap S \ne \emptyset$ i.e., every open ball in $(X, d)$ contains a point of $S$ .

Or, alternatively, as described in the theorem on dense sets and closures further down:

$\bar{S} = X$

A metric space $X$ is said to be separable iff it contains a countable dense subset.

Where with countable dense subset we simply mean a dense subset which is countable.

We say the metric space $(X, d)$ is a precompact metric space if for every $r > 0$ there is a cover of $X$ by finitely many closed balls of the form

$\begin{equation*} B(a, r) := \left\{ x \in X : d(x, a) \le r \right\} \end{equation*}$

Let $(X, d)$ be a complete metric space and precompact, then $X$ is compact.

Let $(X, d)$ be a metric space. Then $X$ is said to be sequentially compact if and only if every sequence in $X$ has a convergent subsequence.

Let $X$ be a topological space, and $A$ a subspace.

Then $A$ is compact (as a topological space with subspace topology) if and only if every cover of $A$ by open subsets of $X$ has a finite subcover.

If $A = \bigcup_{i \in I}^{} U_i$ for $U_i$ open in $A$ (subspace topology), then $\exists W_i$ open in $X$ s.t. $U_i = W_i \cap A$ .

Therefore

$\begin{equation*} A \subseteq \bigcup_{i \in I}^{} W_i \end{equation*}$

$(\impliedby)$ : Choose finite subcover $\{ W_j \}_{j \in J} \subset \left\{ W_i \right\}_{i \in I}$ . Then $\{ U_j \}_{j \in J}$ is a finite subcover of $A$ .

$(\implies)$ : Let $A \subseteq \bigcup_{i \in I}^{} W_i$ , then each $W_i \cap A$ is open in $A$ . So we have

$\begin{equation*} A = \bigcup_{i \in I}^{} \big( W_i \cap A \big) \end{equation*}$

so, by compactness of $A$ , there exists finite $J \subset I$ with

$\begin{equation*} A = \bigcup_{j \in J}^{} \big( W_j \cap A \big) \subset \bigcup_{j \in J}^{} W_j \end{equation*}$

Every space with cofinite topology is compact.

Let $X = \bigcup_{i \in I}^{} U_i$ be an open cover. Take some nonempty $U_{i_0}$ , so that $A = X \setminus U_{i_0}$ is finite.

Then $\forall a \in A$ , there exists $U_{i_a}$ in cover with $a \in U_{i_a}$ . Therefore

$\begin{equation*} \left\{ U_{i_0}, U_{i_a} : a \in A \right\} \end{equation*}$

is a finite cover.

Idea: take away one set of the cover → left with finitely many points → we good.

Motivation

Let $f: X \to \mathbb{R}$ . For which $X$ must $f$ be bounded?

$X$ finite
If $X = \bigcup_{j=1}^{n} A_j$ of opens with $f \big|_{A_j}$ bounded then $f$ is bounded.
Any continuous $f: X \to \mathbb{R}$ is locally bounded

$\begin{equation*} \forall x \in X, \exists U_x \in \mathcal{T}_x : \quad x \in U_x \end{equation*}$

with $f \big|_{U_x}$ bounded, e.g.

$\begin{equation*} U_x = f^{-1} \Big( f(x) - 1, f(x) + 1 \Big) \end{equation*}$
If there exist finitely many $U_i \ni x_i$ as above, with

$\begin{equation*} X = \bigcup_{i=1}^{n} U_i \end{equation*}$

then $f$ is bounded.

Compactness NOT equivalent to:

$\{ \mathbb{R} \}$ is a cover of $\mathbb{R}$
covers but not finite.
- $\mathcal{T}$ cover $[0, 1]$ → clearly not finite mate
Same as 2, but take finitely many $n$
Follows from 2 by taking subcover to be the whole cover.
$\{ X \}$ always covers $X$ and has finite subcover (e.g. $X = \mathbb{R}$ )

Examples

Non-compact
1. $\bigcup_{n}^{} (-n, n) = \mathbb{R}$ , and $\{ (-n, n) \}_{n \in \mathbb{N}}$ has no finite subcover (a finite union is just the largest interval $(-N, N)$ ), so $\mathbb{R}$ is not compact
2. Infinite discrete space is not compact. Consider $X = \bigcup_{x \in X}^{} \left\{ x \right\}$ which is an open cover, but has no finite subcover.
Compact
1. $X$ indiscrete so $\mathcal{T} = \left\{ \emptyset, X \right\}$ . Only open covers are $\{ X \}$ and $\{ \emptyset, X \}$ , and $\{ X \}$ is a finite subcover, hence $X$ is compact.
2. Any finite space is compact (for any topology)

Some specific spaces

Banach space

See Banach space

Hilbert space

A Hilbert space $H$ is a vector space equipped with an inner product such that the norm induced by the inner product

$\begin{equation*} \norm{f} = \sqrt{\langle f, f \rangle} \end{equation*}$

turns $H$ into a complete metric space.

A Hilbert space is thus an instance of a Banach space where the norm is specifically defined as the square root of the inner product of a vector with itself.

Reproducing Kernel Hilbert Space

Here we only discuss the construction of Reproducing Kernel Hilbert Spaces of real-valued functions, but the results can easily be extended to complex-valued functions too.

Let $X$ be an arbitrary set and $H$ a Hilbert space of real-valued functions on $X$ . The evaluation functional over the Hilbert space of functions $H$ is a linear functional that evaluates each function at a point $x$ ,

$\begin{equation*} L_x : f \mapsto f(x) \quad \forall f \in H \end{equation*}$

We say that $H$ is a reproducing kernel Hilbert space if, for all $x$ in $X$ , $L_x$ is continuous at any $f$ in $H$ , or, equivalently, if $L_x$ is a bounded operator on $H$ , i.e. there exists some $M_x > 0$ such that

$\begin{equation*} |L_x(f)| := |f(x)| \le M_x \norm{f}_H \quad \forall f \in H \end{equation*}$

While this property of $L_x$ ensures both the existence of an inner product and the evaluation of every function in $H$ at every point in the domain, it does not lend itself to easy application in practice.

A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional $L_x$ can be represented by taking the inner product of $f$ with a function $K_x$ in $H$ . This function is the so-called reproducing kernel of the Hilbert space $H$ from which the RKHS takes its name.

The Riesz representation theorem implies that for all $x \in X$ there exists a unique element $K_x$ of $H$ with the reproducing property

$\begin{equation*} f(x) = L_x(f) = \langle f, K_x \rangle \quad \forall f \in H \end{equation*}$

Since $K_x$ is itself a function in $H$ , it holds that for every $y$ in $X$ there exists a $K_y \in H$ s.t.

$\begin{equation*} K_x(y) = \langle K_x, K_y \rangle \end{equation*}$

This allows us to define the reproducing kernel of $H$ as a function $K : X \times X \to \mathbb{R}$ by

$\begin{equation*} K(x, y) = \langle K_x, K_y \rangle \end{equation*}$

From this definition it is easy to see that $K : X \times X \to \mathbb{R}$ is both symmetric and positive definite, i.e.

$\begin{equation*} \sum_{i, j = 1}^{n} c_i c_j K(x_i, x_j) \ge 0 \end{equation*}$

for any $n \in \mathbb{N}$ , $x_1, \dots, x_n \in X$ and all $c_1, \dots, c_n \in \mathbb{R}$ .
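To spell out the "easy to see" step (a short sketch using only the definition of $K$ and bilinearity of the real inner product): symmetry is just $\langle K_x, K_y \rangle = \langle K_y, K_x \rangle$ , and for the quadratic form we get

$\begin{equation*} \sum_{i, j = 1}^{n} c_i c_j K(x_i, x_j) = \sum_{i, j = 1}^{n} c_i c_j \langle K_{x_i}, K_{x_j} \rangle = \bigg\langle \sum_{i=1}^{n} c_i K_{x_i}, \ \sum_{j=1}^{n} c_j K_{x_j} \bigg\rangle = \norm{\sum_{i=1}^{n} c_i K_{x_i}}_H^2 \ge 0 \end{equation*}$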

RKHS in statistical learning theory

The representer theorem states that every function in an RKHS that minimizes an empirical risk function can be written as a linear combination of the kernel function evaluated at the training points.

Let $\mathcal{X}$ be a nonempty set and $k$ a positive-definite real-valued kernel on $\mathcal{X} \times \mathcal{X}$ with corresponding RKHS $H$ .

Given a training sample $(x_1, y_1), \dots, (x_n, y_n) \in \mathcal{X} \times \mathbb{R}$ , a strictly monotonically increasing real-valued function $g: [0, \infty) \to \mathbb{R}$ , and an arbitrary empirical risk function $E: (\mathcal{X} \times \mathbb{R}^2)^n \to \mathbb{R} \cup \{ \infty \}$ , then for any $f^* \in H$ satisfying

$\begin{equation*} f^* = \underset{f \in H}{\text{argmin}}\ \Bigg\{ E \Big( \big(x_1, y_1, f(x_1) \big), \dots, \big(x_n, y_n, f(x_n) \big) \Big) + g \big( \norm{f} \big) \Bigg\} \end{equation*}$

$f^*$ admits a representation of the form

$\begin{equation*} f^*(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i) \end{equation*}$

where $\alpha_i \in \mathbb{R}$ for all $1 \le i \le n$ and, as stated before, $k$ is the kernel on the RKHS $H$ .

Let

$\begin{equation*} \begin{split} \varphi: \mathcal{X} &\to \mathbb{R}^{\mathcal{X}} \\ \varphi(x) &= k(\cdot, x) \end{split} \end{equation*}$

(so that $\varphi(x) = k(\cdot, x)$ is itself a map $\mathcal{X} \to \mathbb{R}$ )

Since $k$ is a reproducing kernel, then

$\begin{equation*} \varphi(x)(x') = k(x', x) = \langle \varphi(x'), \varphi(x) \rangle \end{equation*}$

where $\langle \cdot, \cdot \rangle$ is the inner product on $H$ .

Given any $x_1, \dots, x_n$ , one can use the orthogonal projection to decompose any $f \in H$ into a sum of two functions, one lying in $\span \{ \varphi(x_1), \dots, \varphi(x_n) \}$ and the other lying in the orthogonal complement:

$\begin{equation*} f = \sum_{i=1}^{n} \alpha_i \varphi(x_i) + v \end{equation*}$

where $\langle v, \varphi(x_i) \rangle = 0$ for all $i$ .

The above orthogonal decomposition and the reproducing property of $\varphi$ show that applying $f$ to any training point $x_j$ produces

$\begin{equation*} f(x_j) = \bigg\langle \sum_{i=1}^{n} \alpha_i \varphi(x_i) + v, \ \varphi(x_j) \bigg\rangle = \sum_{i=1}^{n} \alpha_i \Big\langle \varphi(x_i), \varphi(x_j) \Big\rangle \end{equation*}$

which we observe is independent of $v$ . Consequently, the value of the empirical risk $E$ defined in the representer theorem above is likewise independent of $v$ .

For the second term (the regularization term), since $v$ is orthogonal to $\sum_{i=1}^{n} \alpha_i \varphi(x_i)$ and $g$ is strictly monotonically increasing, we have

$\begin{equation*} \begin{split} g \big( \norm{f} \big) &= g \Bigg( \norm{ \sum_{i=1}^{n} \alpha_i \varphi(x_i) + v } \Bigg) \\ &= g \Bigg( \sqrt{\norm{\sum_{i=1}^{n} \alpha_i \varphi(x_i)}^2 + \norm{v}^2} \Bigg) \\ &\ge g \Bigg( \norm{\sum_{i=1}^{n} \alpha_i \varphi(x_i)} \Bigg) \end{split} \end{equation*}$

Therefore setting $v = 0$ does not affect the first term of the regularized empirical risk, while it strictly decreases the second term whenever $v \ne 0$ .

Consequently, any minimizer $f^*$ of the empirical risk must have $v = 0$ , i.e., it must be of the form

$\begin{equation*} f^*(\cdot) = \sum_{i=1}^{n} \alpha_i \varphi(x_i) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i) \end{equation*}$

which is the desired result.

This dramatically simplifies the regularized empirical risk minimization problem. Usually the search domain $H$ for the minimizing function will be an infinite-dimensional subspace of $L^2(\mathcal{X})$ (square-integrable functions).

But, by the representer theorem, we know that the representation of $f^*(\cdot)$ reduces the original (infinite-dimensional) minimization problem to a search for the optimal n-dimensional vector of coefficients $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$ for the kernel for each data-point.
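As a concrete illustration of this reduction, here is a minimal sketch assuming squared loss for $E$ and $g(\norm{f}) = \lambda \norm{f}^2$ (i.e. kernel ridge regression), with a Gaussian kernel. In that special case the coefficient vector solves the linear system $(K + \lambda I) \boldsymbol{\alpha} = y$ , where $K_{ij} = k(x_i, x_j)$ . The names `rbf`, `solve`, `fit`, `predict` and the value of `lam` are illustrative choices, not something taken from the notes above.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * (x - y)^2), a positive-definite kernel."""
    return math.exp(-gamma * (x - y) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit(xs, ys, lam=0.1):
    """Coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, ys)

def predict(alpha, xs, x):
    """f*(x) = sum_i alpha_i k(x, x_i): the form given by the representer theorem."""
    return sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))

# Toy training sample (hypothetical data): noiseless samples of sin.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
alpha = fit(xs, ys)
```

The search over an infinite-dimensional $H$ has become a 5-by-5 linear solve; `predict(alpha, xs, x)` then evaluates $f^*$ at any new point.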

Theorems

Let $a, b \in \mathbb{R}^n$ , then

$\begin{equation*} \sum_{i=1}^{n} | a_i b_i| \le \Bigg( \sum_{i=1}^{n} a_i^2 \Bigg)^{1 / 2} \Bigg( \sum_{i=1}^{n} b_i^2 \Bigg)^{1 / 2} \end{equation*}$

Let $a, b \in \mathbb{R}^n$ . Consider the following polynomial

$\begin{equation*} 0 \le \big( |a_1| x + |b_1| \big)^2 + \dots + \big( |a_n| x + |b_n| \big)^2 = \Bigg( \sum_{i=1}^{n} a_i^2 \Bigg) x^2 + 2 \Bigg( \sum_{i=1}^{n} |a_i b_i| \Bigg) x + \sum_{i=1}^{n} b_i^2 \end{equation*}$

where we've used the fact that $|a_i| |b_i| = |a_i b_i|$ for $a_i, b_i \in \mathbb{R}$ .

Since it's nonnegative for all $x$ , it has at most one real root for $x$ , hence its discriminant is less than or equal to zero. That is,

$\begin{equation*} \Bigg( \sum_{i=1}^{n} |a_i b_i| \Bigg)^2 - \Bigg( \sum_{i=1}^{n} a_i^2 \Bigg) \Bigg( \sum_{i=1}^{n} b_i^2 \Bigg) \le 0 \end{equation*}$

Hence,

$\begin{equation*} \sum_{i=1}^{n} |a_i b_i| \le \Bigg( \sum_{i=1}^{n} a_i^2 \Bigg)^{1 / 2} \Bigg( \sum_{i=1}^{n} b_i^2 \Bigg)^{1 / 2} \end{equation*}$

as claimed.

Let $X$ be a metric space.

A sequence in $X$ can have at most one limit.
If $x_n \in X$ converges to $a$ and $\{x_{n_k}\}$ is any subsequence of $\{x_n\}$ , then $x_{n_k}$ converges to $a$ as $k \to \infty$
Every convergent sequence in $X$ is bounded
Every convergent sequence in $X$ is Cauchy

Let $x_n \in X$ . Then $x_n \to a$ as $n \to \infty$ if and only if

$\begin{equation*} \forall V \subseteq X \text{ open with } a \in V, \quad \exists N \in \mathbb{N} : \quad n \ge N \implies x_n \in V \end{equation*}$

Let $E \subseteq X$ . Then $E$ is closed if and only if the limit of every convergent sequence $x_k \in E$ satisfies

$\begin{equation*} \lim_{k \to \infty} x_k \in E \end{equation*}$

A set is open iff it equals its interior ; a set is closed iff it equals its closure .

Let $(X, d)$ be a metric space and let $S \subseteq X$ . Then, $S$ is dense if and only if $\bar{S} = X$ .

Any compact set must be closed and bounded.

The converse is not necessarily true (Heine-Borel Theorem addresses when this is true).

Bounded:

$\begin{equation*} K \subseteq \bigcup_{r > 0}^{} B_r(0) = X \end{equation*}$

$K$ is compact implies that $\exists r_1, \dots, r_N$ s.t.

$\begin{equation*} K \subseteq \bigcup_{j=1}^{N} B_{r_j}(0) = B_{r_*} (0) \end{equation*}$

where $r_* = \max \left\{ r_j \right\}$ .

Let $(X, d)$ be a separable metric space which satisfies the Bolzano-Weierstrass Property and $H \subseteq X$ . Then $H$ is compact if and only if it is closed and bounded.

Observe that

$\begin{equation*} \overline{B}_1(0) = \left\{ x \in X : \norm{x} \le 1 \right\} \end{equation*}$

is closed and bounded.

$\overline{B}_1(0)$ is compact if and only if $\dim X < \infty$ .

Suppose $\dim X = \infty$ . Idea is to construct a sequence $\big( e_j \big)_{j = 1}^\infty$ of unit vectors s.t.

$\begin{equation*} \norm{e_j - e_k} \ge \frac{1}{2}, \quad \forall j \ne k \end{equation*}$

Example: continuous functions

$\big( C([0, 1]), \norm{\cdot}_{L^\infty} \big)$ is a Banach space.

What are the compact sets $K \subseteq C([0, 1])$ ?:

$K$ is compact if and only if $K$ is closed, bounded AND equicontinuous (this is the Arzelà-Ascoli theorem)

Continuity and limits of functions

Let $f: X \to Y$ where $(X, d)$ and $(Y, \rho)$ are metric spaces.

Then $\underset{x \to a}{\lim} f(x) = b$ if and only if for every $\varepsilon > 0$ we have

$\begin{equation*} \exists \delta > 0 \quad : \quad 0 < d(x, a) < \delta \quad \implies \quad \rho \big( f(x), b \big) < \varepsilon \end{equation*}$

Or equivalently, in terms of open balls,

$\begin{equation*} \exists \delta > 0 \quad : \quad x \in B_d(a, \delta), \ x \ne a \quad \implies \quad f(x) \in B_{\rho} \big( b, \varepsilon \big) \end{equation*}$

If $f$ is continuous at $a$ , then $b = f(a)$ .

Connected sets

Let $X$ be a metric space.

A pair of nonempty open sets $U, V \subset X$ is said to separate $X$ if and only if $X = U \cup V$ and $U \cap V = \emptyset$
$X$ is said to be connected if and only if $X$ cannot be separated by any pair of open sets $U, V$

Loosely speaking, a connected space cannot be broken into smaller, nonempty, open pieces which do not share any common points.

Let $X$ be a metric space, and $U, V \subset X$ open.

$U, V$ are said to separate $X$ if and only if:

$U, V \ne \emptyset$ (non-empty)
$U \cup V = X$
$U \cap V = \emptyset$

$X$ is connected if and only if it cannot be separated by any $U, V$ .

A subset $E \subset \mathbb{R}$ is connected if and only if $E$ is an interval

A subset $E$ of a metric space $X$ is path-connected if for every $a, b \in E$ there is a continuous function (path) $\phi: [0, 1] \to E$ such that

$\begin{equation*} \phi(0) = a, \qquad \phi(1) = b \end{equation*}$

Let $X$ be a metric space, and $E \subset X$ .

If $E$ is path-connected , then $E$ is connected.

Stone-Weierstrass Theorem

Notation

$X$ is a metric space
$A$ denotes an algebra in $\mathcal{C}(X)$
$\norm{f} = \sup_{x \in X} | f(x) |$
$\norm{g - f} = \sup_{x \in X} |g(x) - f(x)|$

Goal

The goal of this section is to answer the following question:

Can one use polynomials to approximate continuous functions on an interval $[a, b]$ ?

Stuff

Let $X$ be a metric space.

A set $A$ is said to be a (real function) algebra in $\mathcal{C}(X)$ if and only if

$\emptyset \ne A \subseteq \mathcal{C}(X)$
If $f, g \in A$ , then $f + g$ and $fg$ both belong to $A$
If $f \in A$ and $c \in \mathbb{R}$ , then $cf \in A$

A subset $A$ of $\mathcal{C}(X)$ is said to be (uniformly) closed if and only if for each sequence $f_n \in A$ that satisfies $\norm{f_n - f} \to 0$ as $n \to \infty$ , the limit function $f$ belongs to $A$ .

A subset $A$ of $\mathcal{C}(X)$ is said to be uniformly dense in $\mathcal{C}(X)$ if and only if given $\varepsilon > 0$ and $f \in \mathcal{C}(X)$ there is a function $g \in A$ such that $\norm{g - f} < \varepsilon$ .

A subset $A$ of $\mathcal{C}(X)$ separates points of $X$ if and only if given $x, y \in X$ with $x \ne y$

$\begin{equation*} \exists f \in A : f(x) \ne f(y) \end{equation*}$

Stone-Weierstrass Theorem

Suppose that $X$ is a compact metric space.

If $A$ is an algebra in $\mathcal{C}(X)$ that separates points of $X$ and contains the constant functions, then $A$ is uniformly dense in $\mathcal{C}(X)$ .

This is HUGE. It basically says that on any compact metric space, we can approximate any continuous function arbitrarily well using only functions from an algebra which contains the constant functions and separates points in the metric space!

You know what satisfies this on $[a, b]$ ? The space of all polynomials!
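To see the polynomial case in action, here is a small numerical sketch. It uses Bernstein polynomials, which are one classical constructive route to Weierstrass approximation on $[0, 1]$ and not the argument used to prove Stone-Weierstrass itself; the test function and the helper names `bernstein` and `sup_error` are illustrative assumptions.

```python
import math

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=200):
    """Approximate the sup-norm distance between B_n f and f on a uniform grid."""
    return max(abs(bernstein(f, n, i / grid) - f(i / grid))
               for i in range(grid + 1))

# A continuous function with a kink at 1/2, so it is not itself a polynomial.
f = lambda x: abs(x - 0.5)

# Sup-norm errors for increasing degree; they should shrink as n grows.
errors = [sup_error(f, n) for n in (4, 16, 64)]
```

The grid maximum only approximates $\norm{g - f}$ , but it is enough to watch the error decrease as the degree grows.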

Q & A

DONE Alternative definition of compact; consequences?

What's the difference between the definition of compactness and the following:

A set $X$ is compact if for every subset $E \subseteq X$ there exists a finite covering $\mathcal{V}$ .

Is there any difference; and if so, what are the consequences?

Answer

Yes, there is a difference.

In our new definition we're only saying that there EXISTS some finite covering (which is always true, since e.g. $\{ X \}$ is one), while the proper definition is saying that EVERY open covering of $E$ does in fact contain a finite subcovering, a sort of "the lowest common denominator covering has to be finite, and each of these coverings does in fact have this".

Fixed Point Theory

Differential Equations

Notation

Stuff

In this section we're considering the large class of ODEs of the form

$\begin{equation*} \begin{split} \frac{dx}{dt} &= F(x, t) \\ x(0) &= A \end{split} \end{equation*}$

for which there will exist a unique solution $x(t)$ for $t$ sufficiently small.

To ensure the existence of a unique solution, we need to consider the following:

Suppose $A \in \mathbb{R}$ and $A > 0$ . Also suppose $r > 0$ and $\rho > 0$ and

$\begin{equation*} F: [A - \rho, A + \rho] \times [-r, r] \to \mathbb{R} \end{equation*}$

is continuous.

Further, suppose that there exists $M > 0$ such that for all $x, y \in [A - \rho, A + \rho]$ and $t \in [-r, r]$

$\begin{equation*} \abs{F(x, t) - F(y, t)} \le M \abs{x - y} \end{equation*}$

Due to the Mean Value Theorem, if $\frac{\partial F}{\partial x}$ exists and is continuous on $[A - \rho, A + \rho] \times [-r, r]$ then the above is satisfied.

Suppose $F$ satisfies a Lipschitz condition as above. Then there exists an $s > 0$ such that the ODE

$\begin{equation*} \begin{split} \frac{dx}{dt} &= F(x, t) \\ x(0) &= A \end{split} \end{equation*}$

has a unique solution $x(t)$ for $|t| < s$ .
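The usual fixed-point route to this result is Picard iteration: rewrite the ODE as $x(t) = A + \int_0^t F \big( x(s), s \big) \ ds$ and iterate the right-hand side. Below is a rough numerical sketch of that iteration, assuming the test problem $F(x, t) = x$ , $A = 1$ (exact solution $e^t$ ), only $t \ge 0$ , and a trapezoid rule on a uniform grid; the function name `picard` and all parameter values are illustrative.

```python
import math

def picard(F, A, t_max, steps=200, iterations=20):
    """Iterate x_{k+1}(t) = A + int_0^t F(x_k(s), s) ds on a grid over [0, t_max]."""
    h = t_max / steps
    ts = [i * h for i in range(steps + 1)]
    xs = [A] * (steps + 1)  # x_0(t) = A, the constant initial guess
    for _ in range(iterations):
        integrand = [F(x, t) for x, t in zip(xs, ts)]
        new = [A]
        for i in range(1, steps + 1):
            # cumulative trapezoid rule for the integral from 0 to ts[i]
            new.append(new[-1] + 0.5 * h * (integrand[i - 1] + integrand[i]))
        xs = new
    return ts, xs

# dx/dt = x, x(0) = 1 on [0, 0.5]; the last grid value approximates exp(0.5).
ts, xs = picard(lambda x, t: x, A=1.0, t_max=0.5)
```

Each pass applies the integral operator once; the Lipschitz condition is what makes this operator a contraction for small enough $s$ .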

Exercises

From the notes

A contraction is continuous

A contraction $f$ is continuous.

Let $(X, d)$ be a metric space, and let $f: X \to X$ be a contraction. Now, for the sake of contradiction, suppose $f$ is discontinuous at some point $a \in X$ .

Due to $f$ being a contraction, then for any two points $x, a \in X$ we have

$\begin{equation*} d \big( f(x), f(a) \big) \le \alpha d(x, a) \end{equation*}$

where $\alpha \in (0, 1)$ . Fix any $\varepsilon > 0$ . Since $x$ is arbitrary we can let $x$ be such that

$\begin{equation*} d(x, a) < \frac{\varepsilon}{\alpha + 1} \end{equation*}$

Clearly,

$\begin{equation*} \alpha \frac{\varepsilon}{\alpha + 1} \le \alpha \frac{\varepsilon}{\alpha} = \varepsilon \end{equation*}$

Thus,

$\begin{equation*} \alpha d(x, a) < \varepsilon \end{equation*}$

Which implies,

$\begin{equation*} d \big( f(x), f(a) \big) < \varepsilon \end{equation*}$

But this holds for every $\varepsilon > 0$ , which is exactly the statement that $f$ is continuous at $a$ ; hence we have our contradiction.

TODO Exercise 1

Suppose $f: X \to X$ is a mapping satisfying

$\begin{equation*} d \big( f(x), f(y) \big) < d(x, y), \quad \forall x \ne y \in X \end{equation*}$

then any fixed point will be unique, whether or not $X$ is complete.

Further, show that if $X$ is not complete, then a fixed-point does not necessarily exist.

Let $(X, d)$ be a metric space.

For the first part of our claim, suppose the mapping $f: X \to X$ satisfies

$\begin{equation*} d \big( f(x), f(y) \big) < d(x, y), \forall x \ne y \in X \end{equation*}$

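Note that this alone does not make $f$ a contraction: the ratio $d \big( f(x), f(y) \big) / d(x, y)$ may come arbitrarily close to $1$ , so there need not be a single $\alpha \in (0, 1)$ with

$\begin{equation*} d \big( f(x), f(y) \big) \le \alpha d(x, y), \quad \forall x, y \in X \end{equation*}$

Fortunately, uniqueness only needs the strict inequality above.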

Now, for the sake of contradiction suppose there exists two different fixed-points, $x_1^* \ne x_2^*$ . Then, from the property above of $f$ , we have

$\begin{equation*} d \big( f(x_1^*), f(x_2^*) \big) < d \big( x_1^*, x_2^* \big) \end{equation*}$

But, since $x_1^*$ and $x_2^*$ are fixed-points, we have

$\begin{equation*} f(x_1^*) = x_1^*, \quad f(x_2^*) = x_2^* \end{equation*}$

Hence, the above inequality implies

$\begin{equation*} d (x_1^*, x_2^*) < d(x_1^*, x_2^*) \end{equation*}$

Which clearly is a contradiction, hence if there exists a fixed-point of $f$ , then that is a unique fixed-point.

Now, for the second part of the claim,

PROBABLY EXIST SOME COUNTER-EXAMPLE THAT I CAN'T THINK OF. SOME $f$ SUCH THAT THIS IS NOT THE CASE.

TODO Exercise 2

Let $(X, d)$ be a metric space which is not complete. Then there exists contractions with no fixed point.

Probably some counter-example I can't think of.
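One standard counter-example, offered as a sketch rather than as the intended solution: take $X = (0, 1]$ with the usual metric $d(x, y) = |x - y|$ , which is not complete (the Cauchy sequence $1/n$ has no limit in $X$ ), and let

$\begin{equation*} f: (0, 1] \to (0, 1], \qquad f(x) = \frac{x}{2} \end{equation*}$

Then $f$ is a contraction with $\alpha = \frac{1}{2}$ , since

$\begin{equation*} d \big( f(x), f(y) \big) = \frac{1}{2} |x - y| = \frac{1}{2} d(x, y) \end{equation*}$

but a fixed point would have to satisfy $x = \frac{x}{2}$ , i.e. $x = 0 \notin X$ . The same example also covers the second part of Exercise 1.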

TODO Exercise 3

Note taken on [2017-11-18 Sat 15:59]
Regarding the previous note, could we not have

$\begin{equation*} f(x) = \begin{cases} -x & \text{if } x = x^* \\ x^* & \text{otherwise} \end{cases} \end{equation*}$

which for even $n$ we would still have $f^{(n)}$ be a contraction mapping, but for odd $n$ it would not be! Therefore I believe it's reasonable to assume that they mean for any $n$ .
Note taken on [2017-11-18 Sat 15:50]
But $f^{(n)} (x^*) = x^*$ alone does not imply that $f(x^*) = x^*$ since $(f \circ f)(x^*) = x^*$ , could be $f(f(x)) = f(-x) = x$ , hence there needs to be something else which ensures this implication. That is, if the claim is supposed to be true for any $n$ , then yeah, this implication would definitively hold, but I'm not sure that is what they mean
Note taken on [2017-11-18 Sat 15:50]
$f^{(n)}$ being a contraction in a complete metric space => $f$ is a contraction in a complete metric space => $f$ has a unique fixed point in this space: NOPE! We can have fixed points without the function $f$ being a contraction, also $f^{(n)}$ being a contraction does not in fact imply that $f$ is a contraction! See Exercise 4 for a counter-example.

Let $(X, d)$ be a complete metric space, and suppose $f: X \to X$ is such that

$\begin{equation*} f^{(n)} := f \circ f \circ \dots \circ f, \quad \text{n times} \end{equation*}$

is a contraction. Then $f$ has a unique fixed point.

By the Banach fixed point theorem (with uniqueness as in the first part of Exercise 1) we know that any contraction $g: X \to X$ has a unique fixed point when $X$ is a complete metric space. Thus, we know that there exists a unique $x^*$ such that

$\begin{equation*} f^{(n)}(x^*) = x^* \end{equation*}$

Applying $f$ to both sides gives $f^{(n)} \big( f(x^*) \big) = f \big( f^{(n)}(x^*) \big) = f(x^*)$ , i.e. $f(x^*)$ is also a fixed point of $f^{(n)}$ . Since the fixed point of $f^{(n)}$ is unique, this implies

$\begin{equation*} f(x^*) = x^* \end{equation*}$

Further, if $x^*$ was not the only fixed point of $f$ , then

$\begin{equation*} f^{(n)}(x') = f^{(n - 1)} \big( f(x') \big) = f^{(n - 1)} \big( x' \big) = \dots = f(x') = x' \end{equation*}$

for some fixed point $x' \ne x^*$ of $f$ , but this implies that $f^{(n)}$ has another fixed point, which we know cannot be.

Hence, if $f^{(n)}$ is a contraction, then $f$ has a unique fixed point.

TODO Exercise 4

Note taken on [2017-11-18 Sat 17:29]
Maaaybe you can build some argument by creating two linear functions $g_1$ and $g_2$ which intersect $\cos$ at a single point $x = t$ , and such that

$\begin{equation*} g_1(t) = g_2(t) = \cos t, \quad g_1(x) \le \cos x, \quad g_2(x) \ge \cos x, \quad x \in [0, t] \end{equation*}$

Basically, consider two functions for which we can easily compute the effect of $d(x, y)$ for two points $x$ and $y$ on the distance between them $d \big( g_1(x), g_2(y) \big)$ which we can clearly tell has the property that $d \big( g_1(x), g_2(x) \big) \ge d \big( \cos x, \cos y \big)$ , buuut I'm not going to bother spending time on this now.
Note taken on [2017-11-18 Sat 17:08]
But, if we can show that

$\begin{equation*} \cos(x) \in [-1, 1], \quad \forall x \in \mathbb{R} \end{equation*}$

and then

$\begin{equation*} d \big( \cos (x_2), \cos (x_1) \big) \le \alpha d(x_2, x_1), \quad \forall x_1, x_2 \in [-1, 1] \end{equation*}$

we're good!
Note taken on [2017-11-18 Sat 16:59]
I was wondering if we can use some function $g: \mathbb{R} \to \mathbb{R}$ on the interval $x \in (0, \frac{\pi}{4})$ which is defined such that

$\begin{equation*} g(x) = 1 - ax, \quad g(0) = 1, \quad g(\pi / 4) = \frac{1}{\sqrt{2}} \end{equation*}$

which also has the property that $g(x) \le \cos x$ for $x \in (0, \frac{\pi}{4})$ , and then we potentially compute the distance of the difference between these functions to obtain some upper-bound on $\cos$ which related to $d(x_1, x_2)$ .

$\begin{equation*} d \Big( \cos x_1 - (1 - ax), \cos x_2 - (1 - a x_2) \Big) \end{equation*}$

Doesn't seem to work very well though
Note taken on [2017-11-18 Sat 16:34]
Technically we only need to prove that $\cos^{(2)}$ is a contraction mapping on the open interval $x \in (- \frac{\pi}{4}, \frac{\pi}{4} )$ , since for $x$ not in this interval, $\cos(x)$ will lie in the interval specified before.

$\cos : \mathbb{R} \to \mathbb{R}$ is not a contraction, but $\cos^{(2)} = \cos \circ \cos$ is a contraction.

Further, this implies that there is a unique solution in $\mathbb{R}$ to the equation

$\begin{equation*} \cos^{(2)}(x) = \cos \big( \cos x \big) = x \end{equation*}$

Clearly $f = \cos$ is not a contraction on $\mathbb{R}$ : for $h \ne 0$ we have

$\begin{equation*} d \Big( \cos \big( \frac{\pi}{2} + h \big), \cos \big( \frac{\pi}{2} \big) \Big) = \left| \sin h \right|, \qquad d \Big( \frac{\pi}{2} + h, \frac{\pi}{2} \Big) = |h| \end{equation*}$

and

$\begin{equation*} \lim_{h \to 0} \frac{|\sin h|}{|h|} = 1 \end{equation*}$

Hence, for any $\alpha \in (0, 1)$ there is some $h \ne 0$ such that

$\begin{equation*} d \Big( \cos \big( \frac{\pi}{2} + h \big), \cos \big( \frac{\pi}{2} \big) \Big) > \alpha d \Big( \frac{\pi}{2} + h, \frac{\pi}{2} \Big) \end{equation*}$

so no single $\alpha \in (0, 1)$ works for all pairs of points.

NOW PROVE THAT $\cos \circ \cos$ IS A CONTRACTION MAPPING YAH FOOL
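One way to close this gap (a sketch, not necessarily the intended solution): by the chain rule $\big| \frac{d}{dx} \cos(\cos x) \big| = |\sin(\cos x)| \cdot |\sin x| \le \sin(1) < 1$ , because $|\cos x| \le 1 < \frac{\pi}{2}$ and $\sin$ is increasing on $[0, \frac{\pi}{2}]$ ; the Mean Value Theorem then gives the contraction estimate with $\alpha = \sin(1)$ . The snippet below only sanity-checks this numerically and runs the fixed-point iteration; the names `cos2` and `fixed_point`, the starting point and the tolerances are illustrative assumptions.

```python
import math

def cos2(x):
    """The twice-iterated map cos(cos(x))."""
    return math.cos(math.cos(x))

def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- g(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        nxt = g(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence")

# Largest |d/dx cos(cos t)| = |sin(cos t) * sin t| on a grid over one period.
lipschitz = max(abs(math.sin(math.cos(t)) * math.sin(t))
                for t in (2 * math.pi * i / 10000 for i in range(10001)))

# Fixed point of cos∘cos, started far from it; it is also a fixed point of cos.
x_star = fixed_point(cos2, x0=5.0)
```

The grid maximum stays below $\sin(1)$ , and the limit `x_star` also satisfies $\cos(x^*) = x^*$ , in line with Exercise 3.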

Fourier Series

Notation

Definition

Let $f$ be integrable on $[- \pi, \pi]$ .

The Fourier coefficients of $f$ are numbers

$\begin{equation*} a_k(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \ dx, \quad k = 0, 1, 2, \dots \end{equation*}$

and

$\begin{equation*} b_k(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \ dx, \quad k = 1, 2, 3, \dots \end{equation*}$

Let $f$ be integrable on $[- \pi, \pi]$ and let $N$ be a nonnegative integer.

We define the Fourier series of $f$ as the trigonometric series

$\begin{equation*} \big( S f \big) (x) = \frac{a_0(f)}{2} + \sum_{k=1}^{\infty} \Big( a_k(f) \cos kx + b_k(f) \sin kx \Big) \end{equation*}$

and the partial sum of $Sf$ of order $N$ to be the trigonometric polynomial defined

$\begin{equation*} \big( S_N f \big) (x) = \frac{a_0(f)}{2} + \sum_{k=1}^{N} \Big( a_k(f) \cos kx + b_k(f) \sin kx \Big) \end{equation*}$

Kernels

Dirichlet kernel

Let $N$ be a nonnegative integer.

The Dirichlet kernel of order $N$ is the function defined

$\begin{equation*} D_N(x) = \frac{1}{2} + \sum_{k=1}^{N} \cos kx, \quad N = 1, 2, 3, \dots \end{equation*}$

with the special case of $D_0 = \frac{1}{2}$ .

It turns out it can also be written as

$\begin{equation*} D_N(x) = \frac{\sin \big( N + \frac{1}{2} \big) x}{2 \sin \big( \frac{x}{2} \big)} \end{equation*}$

whenever $x$ is not an integer multiple of $2 \pi$ . Furthermore,

$\begin{equation*} \big( S_N f \big) (x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) D_N(x - t) \ dt \end{equation*}$

for all $x \in [-\pi, \pi]$ and $N \in \mathbb{N}$ .

That is, we can write the N-th order Fourier partial sum as a convolution between $f$ and the Dirichlet kernel.

$\begin{equation*} \begin{split} \big( S_N f \big) (x) &= \frac{1}{2 \pi} \int_{-\pi}^{\pi} f(t) \ dt + \frac{1}{\pi} \sum_{k=1}^{N} \int_{-\pi}^{\pi} f(t) \Big( \cos kt \cos kx + \sin kt \sin kx \Big) \ dt \\ &= \frac{1}{\pi} \Bigg[ \int_{-\pi}^{\pi} \Bigg( \frac{1}{2} + \sum_{k=1}^{N} \cos \Big( k (x - t) \Big) \Bigg) f(t) \ dt \Bigg] \end{split} \end{equation*}$

where we've brought the integrals together and used the trigonometric identity

$\begin{equation*} \cos kt \cos kx + \sin kt \sin kx = \cos \big( k (x - t) \big) \end{equation*}$

Finally remembering that

$\begin{equation*} D_N(x) = \frac{1}{2} + \sum_{k=1}^{N} \cos kx \end{equation*}$

We see that the above expression is simply

$\begin{equation*} \big( S_N f \big)(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} D_N(x - t) f(t) \ dt \end{equation*}$

as claimed.
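A quick numerical cross-check of both facts about $D_N$ , as a sketch: the closed form against the cosine sum, and the partial sum computed from the coefficients against the convolution with $D_N$ . The midpoint quadrature, the test function $f(t) = t^2$ and all helper names are illustrative assumptions.

```python
import math

def dirichlet_sum(N, x):
    """D_N(x) = 1/2 + sum_{k=1}^N cos(kx)."""
    return 0.5 + sum(math.cos(k * x) for k in range(1, N + 1))

def dirichlet_closed(N, x):
    """Closed form sin((N + 1/2) x) / (2 sin(x / 2)), valid when sin(x/2) != 0."""
    return math.sin((N + 0.5) * x) / (2 * math.sin(x / 2))

def integrate(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def partial_sum_coeffs(f, N, x):
    """S_N f(x) assembled from the Fourier coefficients a_k(f), b_k(f)."""
    a = lambda k: integrate(lambda t: f(t) * math.cos(k * t), -math.pi, math.pi) / math.pi
    b = lambda k: integrate(lambda t: f(t) * math.sin(k * t), -math.pi, math.pi) / math.pi
    return a(0) / 2 + sum(a(k) * math.cos(k * x) + b(k) * math.sin(k * x)
                          for k in range(1, N + 1))

def partial_sum_kernel(f, N, x):
    """S_N f(x) as (1/pi) * integral of f(t) D_N(x - t) dt."""
    return integrate(lambda t: f(t) * dirichlet_sum(N, x - t), -math.pi, math.pi) / math.pi

f = lambda t: t * t
by_coeffs = partial_sum_coeffs(f, 3, 1.0)
by_kernel = partial_sum_kernel(f, 3, 1.0)
```

Both routes use the same quadrature nodes, so `by_coeffs` and `by_kernel` agree up to floating-point rounding, mirroring the algebra in the proof.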

Suppose that $f_N: [-\pi, \pi] \to \mathbb{R}$ is integrable and that $f_N \to f$ uniformly on $[- \pi, \pi]$ . Then,

$\begin{equation*} a_k(f_N) \to a_k(f), \quad b_k(f_N) \to b_k(f) \end{equation*}$

as $N \to \infty$ uniformly in $k$ .

We know that $\forall \varepsilon > 0, \exists N_0 \in \mathbb{N}$ such that

$\begin{equation*} \forall N \ge N_0 \quad \implies \quad |f_N(x) - f(x)| < \varepsilon, \quad \forall x \in [-\pi, \pi] \end{equation*}$

Then we have,

$\begin{equation*} \begin{split} | a_k (f_N) - a_k(f) | &= \frac{1}{\pi} \Bigg| \int_{-\pi}^{\pi} f_N(t) \cos kt \ dt - \int_{-\pi}^{\pi} f(t) \cos kt \ dt \Bigg| \\ &\le \frac{1}{\pi} \int_{-\pi}^{\pi} \Big| f_N(t) - f(t) \Big| \left| \cos kt \right| \ dt , \qquad \Bigg( \left| \cos kt \right| \le 1 \Bigg) \\ &\le \frac{1}{\pi} \int_{-\pi}^{\pi} \Big| f_N(t) - f(t) \Big| \ dt \\ &< \frac{1}{\pi} \int_{-\pi}^{\pi} \varepsilon \ dt \\ &= 2 \varepsilon \end{split} \end{equation*}$

I.e. we can make the difference between the coefficients as small as we'd like, hence

$\begin{equation*} a_k(f_N) \to a_k(f) \quad \text{as} \quad N \to \infty \end{equation*}$

The very same argument holds for $b_k$ .

Fejér kernel

The Fejér kernel of order $N$ is the function defined

$\begin{equation*} K_N(x) = \frac{1}{2} + \sum_{k=1}^{N} \Bigg( 1 - \frac{k}{N + 1} \Bigg) \cos kx, \quad N = 1, 2, 3, \dots \end{equation*}$

with the special case of $K_0 = \frac{1}{2}$ .
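A standard fact worth checking here is that the Fejér kernel is the average of the Dirichlet kernels, $K_N = \frac{1}{N + 1} \sum_{n=0}^{N} D_n$ , and that it is nonnegative, unlike $D_N$ . The sketch below verifies both on a grid; the order $N = 6$ , the grid and the function names are arbitrary illustrative choices.

```python
import math

def dirichlet(N, x):
    """D_N(x) = 1/2 + sum_{k=1}^N cos(kx)."""
    return 0.5 + sum(math.cos(k * x) for k in range(1, N + 1))

def fejer(N, x):
    """K_N(x) = 1/2 + sum_{k=1}^N (1 - k/(N+1)) cos(kx)."""
    return 0.5 + sum((1 - k / (N + 1)) * math.cos(k * x) for k in range(1, N + 1))

def fejer_as_mean(N, x):
    """Average of the Dirichlet kernels D_0, ..., D_N."""
    return sum(dirichlet(n, x) for n in range(N + 1)) / (N + 1)

grid = [-math.pi + 2 * math.pi * i / 400 for i in range(401)]
max_gap = max(abs(fejer(6, x) - fejer_as_mean(6, x)) for x in grid)
min_fejer = min(fejer(6, x) for x in grid)          # stays >= 0 up to rounding
min_dirichlet = min(dirichlet(6, x) for x in grid)  # clearly negative
```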

Functional Analysis

Notation

$\mathbb{F} \in \left\{ \mathbb{R}, \mathbb{C} \right\}$ is the field we'll be using
$\mathbb{F}^{\infty} = \left\{ x = (x_1, x_2, x_3, \dots) : x_j \in \mathbb{F} \text{ for every } j \right\}$
"Small Lp" space:

$\begin{equation*} \ell^p := \left\{ (x_j)_{j \ge 1} \in \mathbb{F}^{\infty} : \sum_{j=1}^{\infty} \left| x_j \right|^p < \infty \right\} \end{equation*}$
Let $\big( X, \mathcal{S}, \mu \big)$ be a measure space and $0 < p < \infty$ , then

$\begin{equation*} \mathcal{L}^p \big( X, \mathcal{S}, \mu \big) := \mathcal{L}^p(X, \mathcal{S}, \mu, \mathbb{R}) \end{equation*}$

denotes the set of all measurable functions $f$ on $X$ such that $\int |f|^p \ d \mu < \infty$ and the values of $f$ are real numbers, except possibly on a set of measure $0$ .
Lp space:

$\begin{equation*} L^p = \left\{ f: [0, 1] \to \mathbb{R} : \norm{f}_{L^p} < \infty \right\} \end{equation*}$

where

$\begin{equation*} \norm{f}_{L^p} := \bigg( \int_{0}^{1} |f(x)|^p \ dx \bigg)^{1 / p} \end{equation*}$

Or more accurately,

$\begin{equation*} L^p(X, \mathcal{S}, \mu) := \left\{ f^{\tilde{}} : f \in \mathcal{L}^p(X, \mathcal{S}, \mu) \right\} \end{equation*}$

where $f^{\tilde{}}$ denotes the equivalence class of $f$ under the equivalence relation

$\begin{equation*} f \sim h \iff h = f \text{ a.e.} \end{equation*}$
NLS means normed linear spaces

Theorems

Inequalities

Let $a, b \ge 0$ and $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$ . Then

$\begin{equation*} ab \le \frac{1}{p} a^p + \frac{1}{q} b^q \end{equation*}$

For any two sequences $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ of nonnegative numbers, we have

$\begin{equation*} \sum_{j=1}^{n} a_j b_j \le \bigg( \sum_{j=1}^{n} a_j^p \bigg)^{1 / p } \bigg( \sum_{j=1}^{n} b_j^q \bigg)^{1 / q} \end{equation*}$

for any $1 \le p \le \infty$ where $q$ is the conjugate exponent of $p$ , i.e.

$\begin{equation*} q = \frac{p}{p - 1} \implies \frac{1}{p} + \frac{1}{q} = 1 \end{equation*}$

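Observe that this also includes $p = \infty$ and $q = 1$ (and, symmetrically, $p = 1$ and $q = \infty$ ), where the factor with exponent $\infty$ is read as $\max_j$ !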
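The two inequalities above can be sanity-checked numerically. The following is a minimal sketch, assuming $1 < p < \infty$ and random nonnegative inputs; the helper names `conjugate`, `young_gap` and `holder_gap` are made up for this illustration. A passing check is of course not a proof.

```python
import random

def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1, for 1 < p < infinity."""
    return p / (p - 1)

def young_gap(a, b, p):
    """Right side minus left side of Young's inequality (should be >= 0)."""
    q = conjugate(p)
    return a**p / p + b**q / q - a * b

def holder_gap(a, b, p):
    """Right side minus left side of Hölder's inequality for finite sequences."""
    q = conjugate(p)
    lhs = sum(x * y for x, y in zip(a, b))
    rhs = sum(x**p for x in a) ** (1 / p) * sum(y**q for y in b) ** (1 / q)
    return rhs - lhs

random.seed(0)
for _ in range(1000):
    p = random.uniform(1.2, 5.0)
    a = [random.uniform(0, 2) for _ in range(5)]
    b = [random.uniform(0, 2) for _ in range(5)]
    # Small negative tolerance to absorb floating-point rounding.
    assert young_gap(a[0], b[0], p) >= -1e-9
    assert holder_gap(a, b, p) >= -1e-9
```

Equality in Young's inequality holds when $a^p = b^q$ , e.g. `young_gap(1.0, 1.0, 3.0)` is zero up to rounding.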

For $a, b \ge 0$ and $p \ge 1$ :

$\begin{equation*} \big( a^{1/p} + b^{1/p} \big)^p \le 2^{p - 1} (a + b) \end{equation*}$

From Hölder's inequality we have, for $x_1, x_2 \ge 0$ and $p > 1$ (the case $p = 1$ is trivial),

$\begin{equation*} \begin{split} \big( x_1 + x_2 \big) &= \big( x_1 \cdot 1 + x_2 \cdot 1 \big) \\ &\le \big( x_1^{p} + x_2^{p} \big)^{1 / p} \big( \underbrace{1^q + 1^q}_{= 1 + 1 = 2} \big)^{1 / q} \\ &= 2^{\frac{p - 1}{p}} \big( x_1^{p} + x_2^{p} \big)^{1 / p} \end{split} \end{equation*}$

where $q = \frac{p}{p - 1}$ . Finally, letting $x_1 := a^{1 / p}$ and $x_2 := b^{1 / p}$ , we get

$\begin{equation*} \big( a^{1 / p} + b^{1 / p} \big) \le 2^{\frac{p - 1}{p}} \big( a^{(1 / p) \cdot p} + b^{(1 / p) \cdot p} \big)^{1 / p} = 2^{\frac{p - 1}{p}} \big( a + b \big)^{1 / p} \end{equation*}$

And taking both sides to the power of $p$ :

$\begin{equation*} \big( a^{1 / p} + b^{1 / p} \big)^p \le 2^{p - 1} (a + b) \end{equation*}$

as wanted.

For any two elements $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ of $\mathbb{F}^n$ , we have

$\begin{equation*} \begin{split} \norm{x + y}_p &= \bigg( \sum_{j=1}^{n} \left| x_j + y_j \right|^p \bigg)^{1 / p} \\ &\le \bigg( \sum_{j=1}^{n} \left| x_j \right|^p \bigg)^{1 / p} + \bigg( \sum_{j=1}^{n} \left| y_j \right|^p \bigg)^{1 / p} \\ &= \norm{x}_p + \norm{y}_p \end{split} \end{equation*}$

for any $1 \le p \le \infty$ .
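As a small illustration of why $p \ge 1$ is needed here, the sketch below evaluates the gap $\norm{x}_p + \norm{y}_p - \norm{x + y}_p$ for two unit vectors. The function names are chosen for this example only. For $p = 2$ the gap is positive, while for $p = 1/2$ it is negative, so the triangle inequality fails and $\norm{\cdot}_{1/2}$ is not a norm.

```python
def p_norm(x, p):
    """(sum |x_j|^p)^(1/p) for a finite sequence x and finite p > 0."""
    return sum(abs(t) ** p for t in x) ** (1 / p)

def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p; nonnegative whenever p >= 1."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

x, y = [1.0, 0.0], [0.0, 1.0]
print(minkowski_gap(x, y, 2))    # positive: triangle inequality holds
print(minkowski_gap(x, y, 0.5))  # negative: triangle inequality fails
```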

Mercer's theorem

Let $K: [a, b]^2 \to \mathbb{R}$ be a symmetric continuous function, often called a kernel.

$K$ is said to be non-negative definite (or positive semi-definite) if and only if

$\begin{equation*} \sum_{i=1}^{n} \sum_{j=1}^{n} K(x_i, x_j) c_i c_j \ge 0 \end{equation*}$

for all finite sequences of points $x_1, \dots, x_n \in [a, b]$ and all choices of real numbers $c_1, \dots, c_n$ .

We associate with $K$ a linear operator $T_K: L^2([a, b]) \to L^2([a, b])$ by

$\begin{equation*} \big( T_K \varphi \big) (x) = \int_{a}^{b} K(x, s) \varphi(s) \ ds, \quad \varphi \in L^2([a, b]) \end{equation*}$

The theorem then states that there is an orthonormal basis $\{ e_i \}$ of $L^2([a, b])$ consisting of eigenfunctions of $T_K$ such that the corresponding sequence of eigenvalues $\{ \lambda_i \}$ is nonnegative.

The eigenfunctions corresponding to non-zero eigenvalues are continuous on $[a, b]$ and $K$ has the representation

$\begin{equation*} K(s, t) = \sum_{j=1}^{\infty} \lambda_j e_j(s) e_j(t) \end{equation*}$

where the convergence is absolute and uniform.

There are also more general versions of Mercer's theorem which establish the same result for measurable kernels, i.e. $K \in L_{\mu \otimes \mu}^2(X \times X)$ on any compact Hausdorff space $X$ .
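The non-negative definiteness condition is easy to probe numerically. The sketch below is only an illustration: it evaluates the quadratic form $\sum_{i,j} K(x_i, x_j) c_i c_j$ for the kernel $K(s, t) = \min(s, t)$ on $[0, 1]$ (a standard non-negative definite kernel, chosen here as an example) and for $K(s, t) = |s - t|$ , which is symmetric and continuous but not non-negative definite. The function names are hypothetical.

```python
import random

def quadratic_form(kernel, xs, cs):
    """sum_{i,j} K(x_i, x_j) c_i c_j for points xs and real coefficients cs."""
    return sum(kernel(xi, xj) * ci * cj
               for xi, ci in zip(xs, cs)
               for xj, cj in zip(xs, cs))

def min_kernel(s, t):
    return min(s, t)

def abs_diff_kernel(s, t):
    return abs(s - t)

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0, 1) for _ in range(6)]
    cs = [random.uniform(-1, 1) for _ in range(6)]
    # Tolerance absorbs floating-point rounding.
    assert quadratic_form(min_kernel, xs, cs) >= -1e-12

# A single choice of points and coefficients rules out |s - t|.
print(quadratic_form(abs_diff_kernel, [0.0, 1.0], [1.0, -1.0]))
```

Random sampling can only refute non-negative definiteness, never prove it; the loop above is a sanity check, not a verification.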

Banach spaces

A norm on a vector space $V$ over $\mathbb{F}$ ( $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ ) is a map

$\begin{equation*} \begin{split} \norm{\cdot}: \quad & V \to \mathbb{R} \\ & \psi \mapsto \norm{\psi} \end{split} \end{equation*}$

with the following properties:

For all $\psi \in V$ , $\norm{\psi} \ge 0$ , with equality if and only if $\psi = 0$
For all $\psi \in V$ and $c \in \mathbb{F}$ , we have $\norm{c \psi} = |c| \norm{\psi}$
For all $\phi, \psi \in V$ , we have

$\begin{equation*} \norm{\phi + \psi} \le \norm{\phi} + \norm{\psi} \end{equation*}$

If $\norm{\cdot}$ is a norm on $V$ , then we can define a metric $d$ on $V$ by setting $d(\phi, \psi) = \norm{\psi - \phi}$ .

A normed vector space is said to be a Banach space if it is complete wrt. the associated metric.

A Banach space is said to be separable if it contains a countable dense subset.

Let $V_1$ be a normed space and $V_2$ a Banach space.

Suppose $W$ is a dense subspace of $V_1$ and that $T: W \to V_2$ is a bounded linear operator.

Then there exists a unique bounded linear map $\tilde{T}: V_1 \to V_2$ such that

$\begin{equation*} \tilde{T}|_W = T \end{equation*}$

where $\tilde{T}|_W$ denotes the restriction of $\tilde{T}$ to the subspace $W$ .

Furthermore, the norm of $\tilde{T}$ equals the norm of $T$ .

Suppose that $V_1$ and $V_2$ are both Banach spaces.

For any linear map $T: V_1 \to V_2$ , let $\text{Graph}(T)$ denote the set of pairs $(\psi, T\psi)$ in $V_1 \times V_2$ such that $\psi \in V_1$ .

If the graph of $T$ is a closed subset of $V_1 \times V_2$ , then $T$ is bounded.

Examples

Equip $C \big( [0, 1] \big)$ with the norm

$\begin{equation*} \norm{f}_{L^p} = \bigg( \int_{0}^{1} \left| f(x) \right|^p \ dx \bigg)^{1 / p} \end{equation*}$

Then,

$\begin{equation*} \bigg( \int_{0}^{1} \left| f(x) + g(x) \right|^p \ dx \bigg)^{1 / p} \le \bigg( \int_{0}^{1} \left| f(x) \right|^p \ dx \bigg)^{1 / p} + \bigg( \int_{0}^{1} \left| g(x) \right|^p \ dx \bigg)^{1 / p} \end{equation*}$

Metric space structure on NLS

Let $\big( X, \norm{\cdot} \big)$ be an NLS and let $S \subseteq X$ be a subspace.

If $S$ is open, then $S = X$ .

Clearly $\mathbf{0} \in S$ , hence, since $S$ is open, there exists $r > 0$ such that

$\begin{equation*} B_r(\mathbf{0}) = \left\{ x \in X : \norm{x} < r \right\} \subseteq S \end{equation*}$

Observe that for any $\lambda \ne 0$ , $\mathbf{x} \in S$ if and only if $\lambda \mathbf{x} \in S$ . $(\implies)$ holds by definition of a subspace, and for the reverse, $\lambda \mathbf{x} \in S$ implies that $\frac{1}{\lambda} (\lambda \mathbf{x}) = \mathbf{x} \in S$ .

Now take any $\mathbf{x} \in X$ with $\mathbf{x} \ne \mathbf{0}$ and consider $\lambda = \frac{r}{2 \norm{\mathbf{x}}} > 0$ . Then,

$\begin{equation*} \norm{\lambda \mathbf{x}} = \lambda \norm{\mathbf{x}} = \frac{r}{2} < r \implies \lambda \mathbf{x} \in B_r(\mathbf{0}) \subseteq S \implies \mathbf{x} \in S \end{equation*}$

So basically, since a linear space is closed under scalar multiplication, any open subspace $S \subseteq X$ contains a ball around $\mathbf{0}$ , and every vector can be scaled into that ball, hence $S$ must contain the entire space.

Let $\big( X, \norm{\cdot} \big)$ be a NLS with $\dim X < \infty$ .

Then every subspace $S \subseteq X$ is closed.

Completion of NLS

Notation

$\big( X, \norm{\cdot} \big)$ denotes a NLS which is not necessarily complete
$\big( E, \norm{\cdot}' \big)$ denotes the completion of $X$
Set of Cauchy sequences in $X$ :

$\begin{equation*} C = \left\{ \mathbf{x} = \big( x_j \big)_{j \ge 1} : \mathbf{x} \text{ is Cauchy in } X \right\} \end{equation*}$

Stuff

Formal procedure to "fill holes" in an NLS, i.e. making a non-complete NLS (an NLS in which Cauchy sequences do not necessarily converge), well, complete!
Observe that $X$ is dense in the completion of $X$ , i.e. $\big( E, \norm{\cdot}' \big)$
- Let $\sim$ on $C$ be the equivalence relation such that
  
  $\begin{equation*} \mathbf{x} \sim \mathbf{y} \iff \norm{x_j - y_j} \to 0 \text{ as } j \to \infty \end{equation*}$

Let $\big( X, \norm{\cdot} \big)$ be a Banach space, and let $S \subseteq X$ be a subspace.

Then $\big( S, \norm{\cdot} \big)$ is a Banach space if and only if $S$ is closed.

$(\impliedby)$ Suppose $S$ is closed. Let $\big( x_n \big) \subseteq S$ be Cauchy. Then $\big( x_n \big)$ is Cauchy in $X$ , and since $X$ is Banach, $x_n \to x$ for some $x \in X$ . Since $S$ is closed, $S$ contains all its limit points, hence $x \in S$ , i.e. every Cauchy sequence in $S$ converges to a point in $S$ and $S$ is Banach.

$\big( \implies \big)$ . Suppose $\big( S, \norm{\cdot} \big)$ is a Banach space. Let $\big( x_n \big) \subseteq S$ s.t. $x_n \to x$ for some $x \in X$ . Note that $\big( x_n \big)$ is Cauchy, and since $S$ is Banach, there exists a $y \in S$ s.t. $\norm{x_n - y} \to 0$ as $n \to \infty$ . Then

$\begin{equation*} \norm{x - y} \le \norm{x - x_n} + \norm{x_n - y} \to 0 \text{ as } n \to \infty \end{equation*}$

Hence, $x = y \in S$ , and $S$ is closed.

Let $\big( X, \norm{\cdot}_X \big)$ and $\big( Y, \norm{\cdot}_Y \big)$ be two normed linear spaces.

A linear map $T: X \to Y$ is said to be an isometry if $\norm{Tx}_Y = \norm{x}_X$ for all $x \in X$ .
We say $X$ and $Y$ are isometrically isomorphic if there exists a surjective isometry $T$ from $X$ to $Y$ .
- Note that $T^{-1}: Y \to X$ is automatically a surjective isometry.
The Banach space completion of $\big( X, \norm{\cdot}_X \big)$ is a pair, consisting of a Banach space $\big( Y, \norm{\cdot}_Y \big)$ and an isometry $T: X \to Y$ s.t. $T(X)$ is a dense subspace of $Y$ .

Let $\{ T, \big( Y, \norm{\cdot}_Y \big) \}$ and $\{ S, \big( Z, \norm{\cdot}_Z \big) \}$ be two completions of $\big( X, \norm{\cdot}_X \big)$ .

Then $Y$ and $Z$ are isometrically isomorphic.

Let $\big( X, \norm{\cdot}_X \big)$ be an NLS.

Then there exists a Banach space completion of $X$ , unique up to isometric isomorphism.

Let

$\begin{equation*} \begin{split} C &= \left\{ \text{Cauchy sequences in } X \right\} \\ &= \left\{ \mathbf{x} = \big( x_j \big)_{j \ge 1} : x_j \in X, \forall j \text{ s.t. } \norm{x_j - x_k} \to 0 \text{ as } j,k \to \infty \right\} \end{split} \end{equation*}$
Equivalence relation $\sim$ on $C$

$\begin{equation*} \mathbf{x} \sim \mathbf{y} \iff \norm{x_j - y_j} \to 0 \quad \text{as} \quad j \to \infty \end{equation*}$
Let $\tilde{X} = C / \sim$ , and define

$\begin{equation*} \begin{split} T: \quad & X \to \tilde{X} \\ & x \mapsto Tx = [\mathbf{x}], \quad \mathbf{x} = (x, x, x, \dots) \end{split} \end{equation*}$
- Observe that
  
  $\begin{equation*} \mathbf{y} \in [\mathbf{x}] \iff \norm{y_j - x} \to 0 \end{equation*}$
  
  i.e. $Tx$ consists of all sequences which converge to $x$ .
Equip $\tilde{X}$ with a vector space structure, i.e. addition

$\begin{equation*} [\mathbf{x}] + [ \mathbf{y}] = [\mathbf{z}], \quad \mathbf{z} = \big( x_j + y_j \big)_{j \ge 1} \in C \end{equation*}$

and scalar multiplication

$\begin{equation*} \lambda [\mathbf{x}] = [\lambda \mathbf{x}] \end{equation*}$
Equip $\tilde{X}$ with a norm $\norm{\cdot}_{*}$

$\begin{equation*} \norm{[\mathbf{x}]}_{*} = \lim_{j \to \infty} \norm{x_j} \end{equation*}$
- Observe that
  
  $\begin{equation*} \norm{x_j - x_k} \to 0 \text{ as } j, k \to \infty \implies \left| \norm{x_j} - \norm{x_k} \right| \le \norm{x_j - x_k} \to 0 \end{equation*}$
  
  By completeness of $\mathbb{R}$ , $\big( \norm{x_j} \big)_{j \ge 1}$ is Cauchy and thus converges. Hence the above norm is defined.
- Need to check that this is well-defined:
  
  $\begin{equation*} \mathbf{y} \in [\mathbf{x}] \iff \norm{y_j - x_j} \to 0 \implies \lim_{j \to \infty} \norm{y_j} = \lim_{j \to \infty} \norm{x_j} \end{equation*}$
$\big( \tilde{X}, \norm{\cdot}_{*} \big)$ is a Banach space
$T: X \to \tilde{X}$ is a linear isometry, since for the constant sequence $\mathbf{x} = (x, x, \dots)$ we have

$\begin{equation*} \norm{T x}_{*} = \norm{[\mathbf{x}]}_{*} = \lim_{j\to \infty} \norm{x_j} = \norm{x} \end{equation*}$

Finally, one can show that $T(X)$ is dense in $\tilde{X}$ , hence we have our Banach space completion.

Example

Let $C_0 \subset \ell^\infty$ be the subspace of sequences converging to $0$ .

$\overline{C}_0 = C_0$ , i.e. $C_0$ is closed in $\ell^\infty$ .

Let $\mathbf{x}_n = \big( x_{n, j} \big)_{j = 1}^\infty \in C_0$ , thus $x_{n, j} \to 0$ as $j \to \infty$ for each $n$ , and suppose

$\begin{equation*} \norm{\mathbf{x}_n - \mathbf{x}}_{\infty} \to 0 \end{equation*}$

for some $\mathbf{x} = \big( x_j \big)_{j = 1}^\infty \in \ell^{\infty}$ .

We then need to show that $\mathbf{x} \in C_0$ , i.e. $x_j \to 0$ as $j \to \infty$ , i.e.

$\begin{equation*} \forall \varepsilon > 0, \exists \tau : |x_j| < \varepsilon, \quad \forall j \ge \tau \end{equation*}$

Given $\varepsilon > 0$ . Then $\exists n$ s.t. $\norm{\mathbf{x}_n - \mathbf{x}}_{\infty} < \varepsilon / 2$ . But $\mathbf{x}_n \in C_0$ implies

$\begin{equation*} \exists \tau : |x_{n, j}| < \varepsilon / 2, \quad \forall j \ge \tau \end{equation*}$

So for $j \ge \tau$ ,

$\begin{equation*} \left| x_j \right| \le \left| x_j - x_{n, j} \right| + \left| x_{n, j} \right| < \norm{\mathbf{x} - \mathbf{x}_n}_{\infty} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \end{equation*}$

So basically, we have shown that if a sequence of sequences $\big( \mathbf{x}_n \big) \subseteq C_0$ converges to some sequence $\mathbf{x} \in \ell^\infty$ , then $\mathbf{x} \in C_0$ . Thus $C_0$ contains all its limit points, hence $C_0$ is closed.

Consider the NLS $\big( C([0, 1]), \norm{\cdot}_{L^p} \big)$ for $1 \le p < \infty$ .

Let

$\begin{equation*} S = \left\{ f \in C([0, 1]) : f(0) = 0 \right\} \end{equation*}$

Then $S$ is not closed, but $S$ is dense in $C([0, 1])$ , i.e.

$\begin{equation*} \overline{S} = C([0, 1]) \end{equation*}$

Let

$\begin{equation*} f_n(x) = \min(nx, 1) = \begin{cases} nx & \text{if } x \in \big[ 0, \frac{1}{n} \big) \\ 1 & \text{otherwise} \end{cases} \end{equation*}$

Then $f_n$ is continuous with $f_n(0) = 0$ , therefore $f_n \in S$ . But $f_n \to 1$ in $\norm{\cdot}_{L^p}$ , since $\norm{f_n - 1}_{L^p}^p = \int_{0}^{1/n} (1 - nx)^p \ dx = \frac{1}{n (p + 1)} \to 0$ , while $1 \notin S$ . Hence $S$ does not contain all of its limit points, i.e. is not closed, proving our first claim.

Now suppose that $f \in C([0, 1])$ . We now want to show that $\exists \big( f_n \big)_{n = 1}^\infty$ such that $f_n \in S$ and $f_n \to f$ .

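Take $f_n(x) := \min(nx, 1) f(x)$ . Then $f_n$ is continuous with $f_n(0) = 0$ , so $f_n \in S$ . Moreover $f_n = f$ on $\big[ \frac{1}{n}, 1 \big]$ and $|f - f_n| \le \sup |f|$ on $\big[ 0, \frac{1}{n} \big]$ , so $\norm{f - f_n}_{L^p} \le \sup |f| \cdot n^{-1/p} \to 0$ .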
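To see numerically how functions vanishing at $0$ can approach the constant function $1$ in $\norm{\cdot}_{L^p}$ , here is a minimal sketch. It assumes the ramp functions $g_n(x) = \min(nx, 1)$ , a choice made for this illustration, and approximates the integral with a midpoint rule; `ramp` and `lp_distance` are hypothetical helper names. For these ramps the exact distance is $\big( n (p + 1) \big)^{-1/p}$ .

```python
def ramp(n):
    """Continuous function on [0, 1] with value 0 at x = 0 and value 1 on [1/n, 1]."""
    return lambda x: min(n * x, 1.0)

def lp_distance(f, g, p, steps=20_000):
    """Midpoint-rule approximation of the L^p([0, 1]) distance between f and g."""
    h = 1.0 / steps
    total = sum(abs(f((k + 0.5) * h) - g((k + 0.5) * h)) ** p
                for k in range(steps))
    return (total * h) ** (1 / p)

def one(x):
    return 1.0

for n in (1, 10, 100):
    approx = lp_distance(ramp(n), one, p=2)
    exact = (1.0 / (n * 3)) ** 0.5  # (n (p + 1))^(-1/p) with p = 2
    print(n, approx, exact)
```

The distances shrink as $n$ grows even though every $g_n$ satisfies $g_n(0) = 0$ and the limit does not.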

Equivalence of norms

We'll be referring to norms as equivalent if the induced metrics are strongly equivalent.

Two norms $\norm{\cdot}_1$ and $\norm{\cdot}_2$ on $X$ are said to be equivalent if there is a constant $A > 0$ s.t.

$\begin{equation*} \norm{x}_1 \le A \norm{x}_2 \quad \text{and} \quad \norm{x}_2 \le A \norm{x}_1 \quad \forall x \in X \end{equation*}$

This does in fact define an equivalence relation on norms.
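As a concrete finite-dimensional instance, on $\mathbb{R}^n$ the norms $\norm{\cdot}_1$ and $\norm{\cdot}_{\infty}$ are equivalent with $A = n$ , since $\norm{x}_{\infty} \le \norm{x}_1 \le n \norm{x}_{\infty}$ . The sketch below checks the definition on random vectors; the function names are made up for the example.

```python
import random

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

def equivalence_holds(x, A):
    """Check ||x||_inf <= A ||x||_1 and ||x||_1 <= A ||x||_inf for one vector x."""
    return norm_inf(x) <= A * norm_1(x) and norm_1(x) <= A * norm_inf(x)

n = 4
random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(1000)]
print(all(equivalence_holds(x, A=n) for x in samples))
```

A constant smaller than $n$ fails in general: for $x = (1, 1, 1)$ we have $\norm{x}_1 = 3 \norm{x}_{\infty}$ , so `equivalence_holds([1.0, 1.0, 1.0], 2)` is `False`.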

Hilbert spaces

Let $H$ be a $\mathbb{R} \text{-Hilbert space}$ and $H^*$ its dual space. Consider the map

$\begin{equation*} \begin{split} i: \quad & H \to H^* \\ & x \mapsto \left\langle \cdot, x \right\rangle, \quad \forall x \in H \end{split} \end{equation*}$

Then $i$ is an isometric isomorphism.

If $\xi: \mathbf{H} \to \mathbb{C}$ is a bounded linear functional, then there exists a unique $\chi \in \mathbf{H}$ such that

$\begin{equation*} \xi(\psi) = \left\langle \chi, \psi \right\rangle, \quad \forall \psi \in \mathbf{H} \end{equation*}$

Furthermore, the operator norm of $\xi$ as a linear functional is equal to the norm of $\chi$ as an element of $\mathbf{H}$ .

Proof sketch: given $\Lambda \in H^*$ , we look for $y \in H$ s.t. $\Lambda x = \left\langle x, y \right\rangle = \Lambda_y x$ for all $x \in H$ .

Assume $\Lambda \ne 0$ and consider

$\begin{equation*} \text{ker}(\Lambda) = \left\{ x \in H : \Lambda x = 0 \right\} = : M \end{equation*}$
$M$ is closed ( $\Lambda$ cont.) and subspace ( $\Lambda$ linear) of $H$ . Then

$\begin{equation*} M \ne H \end{equation*}$

since $\Lambda \ne 0$ .
But $H = M \oplus M^{\perp}$ implies there exists nonzero $z \in M^{\perp}$ .
Take

$\begin{equation*} y = \frac{\overline{\Lambda z}}{\norm{z}^2} z \end{equation*}$

and check

$\begin{equation*} \Lambda x = \left\langle x, y \right\rangle, \quad \forall x \in H \end{equation*}$

Basis in Hilbert spaces

Orthogonal decomposition

If $f \perp g$ , then

$\begin{equation*} \norm{f + g}^2 = \norm{f}^2 + \norm{g}^2 \end{equation*}$

For any seminorm $\norm{\cdot}$ defined by a semi-inner product and any $f, g \in H$ ,

$\begin{equation*} \norm{f + g}^2 + \norm{f - g}^2 = 2 \norm{f}^2 + 2 \norm{g}^2 \end{equation*}$
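The parallelogram law is what separates norms coming from an inner product from general norms. As a quick illustration (a sketch with made-up helper names), the defect $\norm{f + g}^2 + \norm{f - g}^2 - 2 \norm{f}^2 - 2 \norm{g}^2$ vanishes for the Euclidean norm on $\mathbb{R}^2$ but not for the $1$ -norm:

```python
def p_norm(x, p):
    """(sum |x_j|^p)^(1/p) for a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1 / p)

def parallelogram_defect(f, g, p):
    """||f+g||^2 + ||f-g||^2 - 2||f||^2 - 2||g||^2, all in the p-norm."""
    plus = [a + b for a, b in zip(f, g)]
    minus = [a - b for a, b in zip(f, g)]
    lhs = p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
    rhs = 2 * p_norm(f, p) ** 2 + 2 * p_norm(g, p) ** 2
    return lhs - rhs

f, g = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(f, g, p=2))  # zero up to rounding
print(parallelogram_defect(f, g, p=1))  # nonzero: the 1-norm has no inner product
```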

For any inner product space $\big( H, \left\langle \cdot, \cdot \right\rangle \big)$ , and any Hilbert subspace $F \subset H$ , and $x \in H$ there is a unique representation

$\begin{equation*} x = y + z \end{equation*}$

with $y \in F$ and $z \in F^{\perp}$ .

Idea:

Prove existence of orthogonal decomposition
- Proof by contradiction
- Consider a lower-bound on the distance $\norm{x - f}$ for $f \in F$ , and then show that this is violated if $\left\langle z, f \right\rangle \ne 0$ , i.e. the "rest" of $x$ is not in $F^{\perp}$ .
Prove uniqueness

Let

$\begin{equation*} c := \inf \left\{ \norm{x - f}, f \in F \right\} \end{equation*}$

Then let $\big( f_{n} \big)_{n = 1}^{\infty}$ be such that $f_n \in F$ and $\norm{x - f_n} \downarrow c$ .

Then, by the parallelogram law we have

$\begin{equation*} 2 \norm{x - f_n}^2 + 2 \norm{x - f_m}^2 = \norm{2x - (f_n + f_m)}^2 + \norm{f_n - f_m}^2 \end{equation*}$

thus,

$\begin{equation*} \norm{f_n - f_m}^2 = 2 \norm{x - f_n}^2 + 2 \norm{x - f_m}^2 - 4 \norm{x - \frac{f_n + f_m}{2}}^2 \end{equation*}$

Observe that $\frac{1}{2} \big( f_n + f_m \big) \in F$ , thus by def. of $c$ ,

$\begin{equation*} \norm{x - \frac{f_n + f_m}{2}} \ge c \end{equation*}$

Combining this with the previous identity gives $\norm{f_n - f_m}^2 \le 2 \norm{x - f_n}^2 + 2 \norm{x - f_m}^2 - 4 c^2 \to 0$ as $m, n \to \infty$ , so $\big( f_n \big)$ is Cauchy. By completeness of $F$ , we have $f_n \to y$ for some $y \in F$ .

Now let $z := x - y$ . Then

$\begin{equation*} \norm{z} = c \end{equation*}$

by continuity of $\left\langle \cdot, \cdot \right\rangle$ and the fact that $\norm{x - f_n} \to c$ and $f_n \to y$ .

Further, suppose that $\left\langle z, f \right\rangle \ne 0$ for some $f \in F$ . Let

$\begin{equation*} u := \left\langle z, f \right\rangle \ v, \quad v > 0 \end{equation*}$

and consider $v \downarrow 0$ .

Then we observe that

$\begin{equation*} \norm{x - (y + uf)}^2 = \norm{z - uf}^2 = \norm{z}^2 + \norm{uf}^2 - 2 \, \text{Re} \left\langle z, uf \right\rangle \end{equation*}$

Substituting back in our expression for $u$ :

$\begin{equation*} \norm{x - (y + uf)}^2 = \norm{z}^2 + v^2 \left| \left\langle z, f \right\rangle \right|^2 \norm{f}^2 - 2 \, \text{Re} \left\langle z, \left\langle z, f \right\rangle v f \right\rangle \end{equation*}$

The last term is simply $2 \left| \left\langle z, f \right\rangle \right|^2 v$ , thus

$\begin{equation*} \norm{x - (y + uf)}^2 = \norm{z}^2 + v^2 \left| \left\langle z, f \right\rangle \right|^2 \norm{f}^2 - 2 \left| \left\langle z, f \right\rangle \right|^2 v \end{equation*}$

For sufficiently small $v$ , the term linear in $v$ dominates the $v^2$ term, which implies that

$\begin{equation*} \norm{x - (y + uf)}^2 < c^2 \end{equation*}$

for $y + uf \in F$ .

This clearly contradicts our definition of $c$ , hence we have the proof of decomposition existence by contradiction.

For uniqueness of the decomposition, we simply observe that if also $x = g + h$ for some $g \in F$ and $h \in F^{\perp}$ , then

$\begin{equation*} \big( y - g \big) + \big( z - h \big) = 0 \end{equation*}$

thus, since $y - g \in F$ and $z - h \in F^{\perp}$ are orthogonal,

$\begin{equation*} 0 = \norm{y - g + z - h}^2 = \norm{y - g}^2 + \norm{z - h}^2 \end{equation*}$

which implies $y = g$ and $z = h$ .

Many functions in $\mathcal{L}^2$ of Lebesgue measure, being unbounded, cannot be integrated with the classical Riemann integral. Therefore spaces of Riemann integrable functions would not be complete in the $\mathcal{L}^2$ norm, and the orthogonal decomposition would not apply to them.

Another victory for good ole' Lebesgue!

Orthonormal sets and bases

A set $\{ e_i \}_{i \in I}$ in a semi-inner product space $H$ is called orthonormal if and only if

$\begin{equation*} \left\langle e_i, e_j \right\rangle = \delta_{ij} \end{equation*}$

For any orthonormal set $\{ e_{\alpha} \}_{\alpha \in I}$ and $x \in H$ ,

$\begin{equation*} \norm{x}^2 \ge \sum_{\alpha \in I}^{} \left| \left\langle x, e_{\alpha} \right\rangle \right|^2 \end{equation*}$

If $\{ e_{\alpha} \}$ is an orthonormal set in $H$ , and $x, y \in H$ , where

$\begin{equation*} x = \sum_{\alpha \in I}^{} x_{\alpha} e_{\alpha} \end{equation*}$

and

$\begin{equation*} y = \sum_{\alpha \in I}^{} y_{\alpha} e_{\alpha} \end{equation*}$

Then $x_{\alpha} = \left\langle x, e_{\alpha} \right\rangle$ and $y_{\alpha} = \left\langle y, e_{\alpha} \right\rangle$ for all $\alpha \in I$ .

Furthermore,

$\begin{equation*} \left\langle x, y \right\rangle = \sum_{\alpha \in I}^{} x_{\alpha} \overline{y}_{\alpha} \end{equation*}$

For any Hilbert space $H$ , any orthonormal set $\{ e_{\alpha} \}$ and any $x_{\alpha} \in \mathbb{F}$ , we have

$\begin{equation*} \sum_{\alpha \in I}^{} \left| x_{\alpha} \right|^2 < \infty \iff \sum_{\alpha \in I}^{} x_{\alpha} e_{\alpha} \text{ converges to some } x \in H \end{equation*}$

$\big( \impliedby \big)$ : Follows from Bessel's inequality and Parseval-Bessel equality, since we have

$\begin{equation*} \norm{x}^2 \ge \sum_{\alpha \in I}^{} \left| \left\langle x, e_{\alpha} \right\rangle \right|^2 = \sum_{\alpha \in I}^{} \left| x_{\alpha} \right|^2 \end{equation*}$

where the inequality is from Bessel's inequality and the equality from the Parseval-Bessel equality.

$\big( \implies \big)$ : For each $n = 1, 2, \dots$ choose a finite set $F(n) \subset I$ such that $F(n)$ increases with $n$ and

$\begin{equation*} \sum_{\alpha \notin F(n)}^{} \left| x_{\alpha} \right|^2 < \frac{1}{n^2} \end{equation*}$

Then

$\begin{equation*} \sum_{\alpha \in F(n)} x_{\alpha} e_{\alpha} \end{equation*}$

is a Cauchy sequence, hence converges to some $x \in H$ since $H$ is complete.

Then the net of all partial sums converges to the same limit, concluding our proof.

Let $H$ be any inner product space and $\{ f_n \}$ be any linearly independent sequence in $H$ .

Then there is an orthonormal sequence $\{ e_n \}$ in $H$ s.t. for each $n$ , $\{ f_1, \dots, f_n \}$ and $\{ e_1, \dots, e_n \}$ have the same linear span.
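The construction behind this statement is the Gram-Schmidt process. Below is a minimal sketch for finitely many vectors in $\mathbb{R}^d$ with the standard inner product (both assumptions of this example, as are the function names). It subtracts projections from the running residual (the "modified" variant), which agrees with the textbook formula in exact arithmetic.

```python
import math

def dot(u, v):
    """Standard inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return orthonormal e_1..e_n with span{e_1..e_k} = span{f_1..f_k} for each k."""
    basis = []
    for f in vectors:
        r = list(f)
        for e in basis:
            c = dot(r, e)  # coefficient of the residual along e
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(dot(r, r))
        if norm < tol:
            raise ValueError("vectors are not linearly independent")
        basis.append([ri / norm for ri in r])
    return basis

es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i, ei in enumerate(es):
    print([round(dot(ei, ej), 12) for ej in es])  # rows of the Gram matrix
```

The Gram matrix of the output should be the identity up to rounding, which is exactly the orthonormality condition $\left\langle e_i, e_j \right\rangle = \delta_{ij}$ .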

Side-note on why Hilbert space > Banach space when talking about bases

In any vector space $S$ , a Hamel basis is a set $\{ e_{\alpha} \}_{\alpha \in I}$ such that every $s \in S$ can be written uniquely as

$\begin{equation*} \sum_{\alpha \in I}^{} s_{\alpha} e_{\alpha} \end{equation*}$

with only finitely many $s_{\alpha} \ne 0$ .

So, Hamel basis is an algebraic notion, which does not relate to any topology on $S$ .

In a Banach space $\big( S, \norm{\cdot} \big)$ , an unconditional basis is a collection $\{ e_{\alpha} \}_{\alpha \in I}$ such that for every $s \in S$ ,

$\begin{equation*} \exists! \left\{ s_{\alpha} \right\}_{\alpha \in I} : \quad \sum_{\alpha \in I}^{} s_{\alpha} e_{\alpha} = s \end{equation*}$

converging for $\norm{\cdot}$ .

In a separable Banach space $S$ , a Schauder basis is a sequence $\{ f_n \}_{n \ge 1}$ such that for every $s \in S$ ,

$\begin{equation*} \exists! \ \big( s_n \big)_{n \ge 1}: \quad \norm{s - \sum_{j = 1}^{n} s_j f_j} \to 0 \text{ as } n \to \infty \end{equation*}$

It is possible to find a Schauder basis in the "most useful" separable Banach spaces, but Schauder bases may not be unconditional bases, and in general it may be very hard to find unconditional bases.
Orthonormal basis for Hilbert space
Coming up with a basis for an infinite-dimensional space comes down to constructing a sequence of orthonormal vectors $e_k$ by taking some vector for which the projection $P_{X_k} u_k \ne u_k$ , i.e. $u_k$ does not lie entirely in $X_k$ .

$\begin{equation*} e_{k + 1} = \frac{u_k - \sum_{i=1}^{k} \left\langle u_k, e_i \right\rangle e_i}{\norm{u_k - \sum_{i=1}^{k} \left\langle u_k, e_i \right\rangle e_i}} \end{equation*}$

Then we prove that this gives us a space which is dense in the "parent" space.

More concretely, let $X_k = \text{span} \left\{ e_1, \dots, e_k \right\}$ and let $\big( e_k \big)_{k \ge 1}$ be defined by choosing some $u_{k + 1}$ such that $(P_{X_k} - I)u_{k + 1} \ne 0$ , i.e. $u_{k + 1} \notin X_k$ , and

$\begin{equation*} e_{k + 1} = \frac{u_{k + 1} - P_{X_k} u_{k + 1}}{\norm{u_{k + 1} - P_{X_k} u_{k + 1}}} \end{equation*}$

Every Hilbert space has an orthonormal basis.

If a collection of orthonormal sets is linearly ordered by inclusion (a chain), then their union is clearly an orthonormal set.

Thus by Zorn's lemma, there exists a maximal orthonormal set $\{ e_{\alpha} \}_{\alpha \in I}$ .

Take any $x \in H$ . Let

$\begin{equation*} y := \sum_{\alpha \in I}^{} \left\langle x, e_{\alpha} \right\rangle \ e_{\alpha} \end{equation*}$

where the sum converges by Bessel's inequality and the Riesz-Fischer theorem.

If $y = x$ , we are done. Otherwise, $x - y \ne 0$ and $x - y \perp e_{\alpha}$ for all $\alpha \in I$ , so we can adjoin a new element

$\begin{equation*} \frac{x - y}{\norm{x - y}} \end{equation*}$

contradicting the maximality of the orthonormal set.

Every Hilbert space is isometric to a space $\ell^2(X)$ for some set $X$ .

Let $\{ e_{\alpha} \}_{\alpha \in X}$ be an orthonormal basis for $H$ .

Then $x \mapsto \left\{ \left\langle x, e_{\alpha} \right\rangle \right\}_{\alpha \in X}$ takes $H$ into $\ell^2(X)$ by Bessel's inequality.

This function preserves inner products by the Parseval-Bessel equality. It is onto $\ell^2(X)$ by the Riesz-Fischer theorem, concluding our proof.

For any inner product space $\big( H, \left\langle \cdot, \cdot \right\rangle \big)$ , an orthonormal set $\{ e_{\alpha} \}_{\alpha \in I}$ is an orthonormal basis of $H$ if and only if its linear span $S$ is dense in $H$ .
Let $\{ e_j \}_{j = 1}^\infty$ be an orthonormal sequence in a Hilbert space $H$ .

The following are equivalent:
1. If $x \in H$ s.t. $\hat{x}(j) = \left\langle x, e_j \right\rangle = 0$ for all $j$ , then
  
  $\begin{equation*} x = 0 \end{equation*}$
  
  In other words, the sequence $\left\{ e_j \right\}_{j \ge 1}$ is a maximal orthonormal family of vectors.
2. Span of $\{ e_j \}_{j = 1}^\infty$ is dense in $H$
  
  $\begin{equation*} H = \overline{\text{Span}(\left\{ e_j \right\}_{j = 1}^\infty)} \end{equation*}$
3. Unique convergence
  
  $\begin{equation*} x = \sum_{j=1}^{\infty} \left\langle x, e_j \right\rangle e_j = \lim_{n \to \infty} \sum_{j=1}^{n} \left\langle x, e_j \right\rangle e_j, \quad \forall x \in H \end{equation*}$
  
  i.e.
  
  $\begin{equation*} \norm{x - \sum_{j=1}^{n} \left\langle x, e_j \right\rangle e_j} \to 0 \quad \text{as} \quad n \to \infty \end{equation*}$
4. Inner product on basis
  
  $\begin{equation*} \left\langle x, y \right\rangle = \sum_{j=1}^{\infty} \hat{x}(j) \overline{\hat{y}(j)} = \sum_{j=1}^{\infty} \left\langle x, e_j \right\rangle \overline{\left\langle y, e_j \right\rangle}, \quad \forall x, y \in H \end{equation*}$
5. The norm
  
  $\begin{equation*} \norm{x}^2 = \sum_{j=1}^{\infty} \left| \hat{x}(j) \right|^2 = \sum_{j=1}^{\infty} \left| \left\langle x, e_j \right\rangle \right|^2, \quad \forall x \in H \end{equation*}$
If one of these statements holds (and thus all of them hold), we say $\{ e_j \}_{j = 1}^\infty$ is an orthonormal basis of $H$ .
$\big( 1 \implies 2 \big)$ : Idea is that $(1)$ implies that $\text{Span} \big( \left\{ e_j \right\}_{j = 1}^{\infty} \big)^{\perp} \subset H$ is simply $\left\{ 0 \right\}$ , hence the span is dense in $H$ .

Suppose $M = \overline{\text{Span}(\left\{ e_j \right\}_{j = 1}^\infty)} \ne H$ .

Then since $H = M \oplus M^{\perp}$ , we have $M^{\perp} \ne \left\{ 0 \right\}$ and so $\exists x \in M^{\perp}$ s.t. $x \ne 0$ and $\hat{x}(j) = \left\langle x, e_j \right\rangle = 0$ for all $j$ . But this is a contradiction wrt. $(1)$ , hence we have our proof.

$(2 \implies 3)$ : Let $x \in H$ . Since $H = \overline{\text{Span}(\left\{ e_j \right\}_{j = 1}^\infty)}$ , there exist finite linear combinations

$\begin{equation*} Z_N = \sum_{j=1}^{n_N} \alpha_j e_j \to x \quad \text{as} \quad N \to \infty \end{equation*}$

Set

$\begin{equation*} y_n = \sum_{j=1}^{n} \left\langle x, e_j \right\rangle e_j \end{equation*}$

We make the following observations:
1. $y_n = P_{Y_n} x$ and $Y_n = \text{Span}(\left\{ e_1, \dots, e_n \right\})$ .
2. $\norm{x - P_{Y_{n_N}} x } \le \norm{x - Z_N} \to 0$ as $N \to \infty$ which tells us that a particular subsequence converges.
3. $\forall n \ge n_N$ we have $Y_{n_N} \subseteq Y_n$ , so
  
  $\begin{equation*} \norm{x - y_n} = \norm{x - P_{Y_n} x} \le \norm{x - P_{Y_{n_N}} x} \to 0 \end{equation*}$
  
  as $N \to \infty$ , since $P_{Y_{n_N}} x \in Y_{n_N} \subseteq Y_n$ and $P_{Y_n} x$ is the closest point to $x$ in $Y_n$ . This tells us that the whole sequence converges, not just a subsequence.
Therefore $y_n \to x$ .

$\big( 3 \implies 4 \big)$ :

$\begin{equation*} \begin{split} \left\langle x, y \right\rangle &= \left\langle \bigg( \sum_{i=1}^{\infty} \left\langle x, e_i \right\rangle e_i \bigg) , \bigg( \sum_{j=1}^{\infty} \left\langle y, e_j \right\rangle e_j \bigg) \right\rangle \\ &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \Big\langle \left\langle x, e_i \right\rangle e_i, \left\langle y, e_j \right\rangle e_j \Big\rangle \\ &= \sum_{j=1}^{\infty} \left\langle x, e_j \right\rangle \overline{\left\langle y, e_j \right\rangle} \end{split} \end{equation*}$

where we have used the continuity of the inner product to pull the limits out, and in the final equality used the orthonormality of the $\left\{ e_j \right\}_{j = 1}^{\infty}$ .

$\big( 4 \implies 5 \big)$ : Apply 4 with $y = x$ .

$\big( 5 \implies 1 \big)$ : If $\hat{x}(j) = 0$ for all $j$ , then by $(5)$ we get $\norm{x}^2 = 0$ , so $x = 0$ . Otherwise we would have extra terms for the norm, rather than just the "Fourier" coefficients → contradiction.

Bounded Linear Operators

Notation

$T: X \to Y$ is a linear operator, where $X$ and $Y$ are normed linear spaces
$\mathcal{L}(X, Y)$ is the vector space of bounded linear operators from $X$ to $Y$ .

Stuff

Let $T: X \to Y$ be a linear operator, where $X$ and $Y$ are normed linear spaces.

Then the following are equivalent

$T$ is continuous on all of $X$
$T$ is continuous at $x = 0$
$T$ is bounded

$(1) \iff (2)$ : $(1) \implies (2)$ is immediate. Conversely, if $T$ is continuous at $0$ , then for a sequence $x_n \to x$ we have $x_n - x \to 0$ and so, by linearity,

$\begin{equation*} T(x_n) - T(x) = T(x_n - x) \to T (0) = 0 \end{equation*}$

and so $(1)$ and $(2)$ are equivalent.

$(2) \implies (3)$ : Suppose $T$ is continuous at $x = 0$ . Thus $\forall \varepsilon > 0$

$\begin{equation*} \exists \delta > 0 : \quad \norm{x} \le \delta \implies \norm{T x} \le \varepsilon \end{equation*}$

Let $\varepsilon = 1$ , and let $z \in X \setminus \left\{ 0 \right\}$ and

$\begin{equation*} x = \delta \frac{z}{\norm{z}} \implies \norm{x} = \delta \end{equation*}$

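$\begin{equation*} \norm{T x} \le 1 \implies \norm{T \bigg( \delta \frac{z}{\norm{z}} \bigg)} = \frac{\delta}{\norm{z}} \norm{T z} \le 1 \end{equation*}$

which gives us

$\begin{equation*} \norm{T z} \le \frac{1}{\delta} \norm{z} \end{equation*}$

i.e. $T$ is bounded. Conversely, $(3) \implies (2)$ : if $\norm{T x} \le A \norm{x}$ for all $x$ , then $x_n \to 0$ implies $T x_n \to 0$ .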

Let $\mathcal{L}(X, Y)$ be vector space of bounded linear operators from $X$ to $Y$ .

Then

$\begin{equation*} \norm{T}_{\mathcal{L}} := \sup_{x \in X: \norm{x}_X = 1} \norm{Tx}_Y \end{equation*}$

defines a norm on $\mathcal{L}(X, Y)$ .

If $\dim X < \infty$ , then all linear maps $T: X \to Y$ are continuous.

Hence $\mathcal{L}(X, Y) = \text{Mat}(X, Y)$ .

Let $H$ be a Hilbert space, then $H^* \cong H$ .

There exists a conjugate isometric isomorphism

$\begin{equation*} \begin{split} T: \quad & H \to H^* \\ & y \mapsto \Lambda_y \end{split} \end{equation*}$

where

$\begin{equation*} \begin{split} \Lambda_y x &= \left\langle x,y \right\rangle \\ \norm{\Lambda_y} &= \norm{y} \end{split} \end{equation*}$

$T$ is onto: let $\Lambda \in H^*$ and

$\begin{equation*} M := \left\{ x \in H: \Lambda x = 0 \right\} \end{equation*}$

Then $M$ is a closed subspace of $H$ .

Suppose $M \ne H$ , then we can decompose $H = M \oplus M^{\perp}$ , then this implies there exists some nonzero $z \in M^\perp$ .

Let $x \in H$ and consider

$\begin{equation*} w = \big( \Lambda x \big) z - \big( \Lambda z \big) x \end{equation*}$

Then observe that

$\begin{equation*} \Lambda w = \Lambda x \Lambda z - \Lambda z \Lambda x = 0 \implies w \in M \end{equation*}$

which implies that

$\begin{equation*} 0 = \left\langle w, z \right\rangle = \Lambda x \left\langle z, z \right\rangle - \Lambda z \left\langle x, z \right\rangle \end{equation*}$

which implies

$\begin{equation*} \Lambda x = \left\langle x, \frac{\overline{\Lambda z}}{\norm{z}^2} z \right\rangle \end{equation*}$

So, setting

$\begin{equation*} y = \frac{\overline{\Lambda z}}{\norm{z}^2} z \end{equation*}$

we have

$\begin{equation*} \Lambda x = \left\langle x, y \right\rangle, \quad \forall x \in H \end{equation*}$

or equiv,

$\begin{equation*} \Lambda = \Lambda_y \end{equation*}$

$\dim X = \infty$ and $\dim \text{im}(T) < \infty$

Let

$\begin{equation*} \text{FR}(X, Y) = \left\{ T \in \mathcal{L}(X, Y) : \dim \text{im}(T) < \infty \right\} \end{equation*}$

be the subspace of finite rank operators.

Fix $T \in \text{FR}(X, Y)$ and a basis $\left\{ e_j \right\}$ of $\text{im}(T)$ . Then for every $x \in X$ there exist $\left\{ f_j(x) \right\}_{j = 1}^{\dim \text{im}(T)}$ , i.e. constants which depend on $x$ (functions yo!) s.t.

$\begin{equation*} Tx = \sum_{j=1}^{\dim \text{im}(T)} f_j(x) e_j \end{equation*}$

Then:

$T$ is linear $\iff$ $f_j: X \to \mathbb{F}$ are linear for all $j$
$T$ bounded / cont. $\iff$ $f_j: X \to \mathbb{F}$ are bounded / cont. for all $j$

If $Y = \mathbb{F}$ then $\mathcal{L}(X, \mathbb{F}) = X^*$ , i.e. the dual space of $X$ !

If $Y$ is a Banach space, then $\mathcal{L}(X, Y)$ is a Banach space.

Examples with $\dim Y = 1$ , i.e. $Y = \mathbb{F}$

$X = \ell^p$ for $1 < p < \infty$

$\begin{equation*} X^* = \mathcal{L}(X, \mathbb{F}) \cong \ell^q, \quad \frac{1}{p} + \frac{1}{q} = 1 \end{equation*}$

Fix $y = \big( y_j \big)_{j \ge 1} \in \ell^q$ and define

$\begin{equation*} \Lambda_y(x) = \sum_{j=1}^{\infty} x_j \overline{y_j} \end{equation*}$

Then

$\begin{equation*} \left| \Lambda_y(x) \right| \le \sum_{j=1}^{\infty} \left| x_j \right| \left| y_j \right| \le \norm{x}_p \norm{y}_q \end{equation*}$

by Hölder's inequality. So $\Lambda_y$ is bounded,

$\begin{equation*} \left| \Lambda_y(x) \right| \le A \norm{x}_p, \quad A = \norm{y}_q \end{equation*}$

(remember we fixed $y$ ).

Hence $\Lambda_y \in \big( \ell^p \big)^*$ and $\norm{\Lambda_y} \le \norm{y}_q$ .

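Letting

$\begin{equation*} x_j := \begin{cases} \left| y_j \right|^{q - 2} y_j & \text{if } y_j \ne 0 \\ 0 & \text{otherwise} \end{cases} \end{equation*}$

we have $x \in \ell^p$ and $\Lambda_y(x) = \norm{y}_q^q = \norm{x}_p \norm{y}_q$ , so we attain the upper bound and hence $\norm{\Lambda_y} = \norm{y}_q$ .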

We can then isometrically embed $\ell^q$ into $\big( \ell^p \big)^*$ via $y \mapsto \Lambda_y$ . In fact this embedding is onto: one can show that for any $\Lambda \in \big( \ell^p \big)^*$ , there exists $y \in \ell^q$ s.t.

$\begin{equation*} \Lambda_y = \Lambda \end{equation*}$
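The equality case in Hölder's inequality can be checked numerically. The sketch below assumes a finitely supported $y$ (the entries of `y` and the exponent `p = 3` are hypothetical) and uses the choice $x_j = \left| y_j \right|^{q - 2} y_j$ for the nonzero entries.

```python
# Check that x_j = |y_j|^(q-2) * y_j attains equality in
# |Lambda_y(x)| <= ||x||_p * ||y||_q.

def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so 1/p + 1/q = 1

# Hypothetical finitely supported y in l^q (complex entries, one zero entry).
y = [1 + 1j, -2.0 + 0j, 0j, 0.5j]

# The extremal x from the text; zero entries of y are mapped to zero.
x = [abs(t) ** (q - 2) * t if t != 0 else 0j for t in y]

lam = sum(xj * yj.conjugate() for xj, yj in zip(x, y))  # Lambda_y(x)
upper = p_norm(x, p) * p_norm(y, q)                     # Hoelder upper bound
holder_gap = abs(abs(lam) - upper)
```

With this choice `holder_gap` should be zero up to rounding, which is the statement $\norm{\Lambda_y} = \norm{y}_q$ .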

Hilbert-Schmidt operators

Let $T \in \mathcal{L}(H, K)$ where $H, K$ are separable Hilbert spaces.

We say that $T$ is a Hilbert-Schmidt operator if, for an ONB $\left\{ f_m \right\}_{m \ge 1}$ of $K$ ,

$\begin{equation*} \sum_{m \ge 1}^{} \norm{T^* f_m}^2 < \infty \end{equation*}$

The space of all such operators is denoted $\mathcal{H} \mathcal{S}(H, K)$ , with the norm defined by

$\begin{equation*} \norm{T}_{HS} = \sqrt{\sum_{m \ge 1}^{} \norm{T^* f_m}^2} \end{equation*}$

$\mathcal{H} \mathcal{S}(H, K)$ is a vector space and

$\begin{equation*} \mathcal{H} \mathcal{S}(H, K) \subseteq \mathcal{L}(H, K) \end{equation*}$

but $\mathcal{H} \mathcal{S}(H, K)$ is in general not a closed subspace of $\mathcal{L}(H, K)$ with respect to the operator norm. The containment itself follows from

$\begin{equation*} \norm{T} \le \norm{T}_{HS} = \sqrt{\sum_{m \ge 1}^{} \norm{T^* f_m}^2} \end{equation*}$

Let $\left\{ f_m \right\}_{m \ge 1}$ be an ONB for $K$ and $\left\{ e_n \right\}_{n \ge 1}$ an ONB for $H$ .
Let $T \in \mathcal{H} \mathcal{S}(H, K)$ . Then

$\begin{equation*} \begin{split} T e_n \in K & \implies T e_n = \sum_{m \ge 1}^{} \left\langle T e_n, f_m \right\rangle f_m \\ T^* f_m \in H & \implies T^* f_m = \sum_{n \ge 1}^{} \left\langle T^* f_m, e_n \right\rangle e_n \end{split} \end{equation*}$

and

$\begin{equation*} \begin{split} \norm{T e_n}^2 &= \sum_{m \ge 1}^{} \left| \left\langle T e_n, f_m \right\rangle \right|^2 = \sum_{m \ge 1}^{} \left| \left\langle e_n, T^* f_m \right\rangle \right|^2 \\ \norm{T^* f_m}^2 &= \sum_{n \ge 1}^{} \left| \left\langle T^* f_m, e_n \right\rangle \right|^2 \end{split} \end{equation*}$
Thus,

$\begin{equation*} \begin{split} \sum_{ m \ge 1}^{} \norm{T^* f_m}^2 &= \sum_{m \ge 1}^{} \sum_{n \ge 1}^{} \left| \left\langle T^* f_m, e_n \right\rangle \right|^2 \\ &= \sum_{n \ge 1}^{} \sum_{m \ge 1}^{} \left| \left\langle e_n, T^* f_m \right\rangle \right|^2 \\ &= \sum_{n \ge 1}^{} \norm{T e_n}^2 \end{split} \end{equation*}$

(we can always switch the order of summation when all the terms are nonnegative).
Hence, the definition of a Hilbert-Schmidt operator (and also the norm $\norm{\cdot}_{HS}$ ) is independent of the choice of ONB.
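In finite dimensions the Hilbert-Schmidt norm is the Frobenius norm, so the basis independence can be observed directly. A minimal sketch, assuming $H = K = \mathbb{R}^3$ ; the matrix `T` and the rotation angle `theta` are hypothetical.

```python
import math

# Hypothetical 3x3 real matrix standing in for T: H -> K with H = K = R^3.
T = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0],
     [4.0, 0.5, 1.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def sq_norm(v):
    return sum(t * t for t in v)

# Standard ONB {e_n} and a second ONB {f_m}, obtained by rotating
# the first two coordinates by the angle theta.
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = 0.7
f = [[math.cos(theta), math.sin(theta), 0.0],
     [-math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]

hs_via_e = sum(sq_norm(matvec(T, en)) for en in e)                     # sum_n ||T e_n||^2
hs_via_f_adjoint = sum(sq_norm(matvec(transpose(T), fm)) for fm in f)  # sum_m ||T^* f_m||^2
hs_via_f = sum(sq_norm(matvec(T, fm)) for fm in f)                     # sum_m ||T f_m||^2
frobenius = sum(t * t for row in T for t in row)                       # sum of squared entries
```

All four quantities should agree up to rounding.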

Let

$\begin{equation*} T_N x = \sum_{m=1}^{N} \left\langle x, T^* f_m \right\rangle f_m \end{equation*}$

Then $T_N$ is a FR operator.

If $\norm{T - T_N} \to 0$ as $N \to \infty$ , then

$\begin{equation*} \text{FR}(H, K) \subseteq \mathcal{H} \mathcal{S}(H, K) \subseteq \overline{\text{FR}(H, K)} \end{equation*}$

since then $T = \lim_{N \to \infty} T_N$ in operator norm, i.e. every $T \in \mathcal{H} \mathcal{S}(H, K)$ is a limit point of $\text{FR}(H, K)$ . This convergence does indeed hold:

$\begin{equation*} \begin{split} \norm{T x - T_N x}^2 &= \sum_{m = N + 1}^{\infty} \left| \left\langle x, T^* f_m \right\rangle \right|^2 \\ & \le \bigg( \sum_{m = N + 1}^{\infty} \norm{T^* f_m}^2 \bigg) \cdot \norm{x}^2 \\ \implies \norm{T - T_N} & \le \sqrt{\sum_{m = N + 1}^{\infty} \norm{T^* f_m}^2} \to 0 \end{split} \end{equation*}$

since $T \in \mathcal{H} \mathcal{S}(H, K)$ i.e.

$\begin{equation*} \sum_{m \ge 1}^{} \norm{T^* f_m}^2 < \infty \end{equation*}$

Example: kernel operators on $X = C([0, 1])$

Let $K \in C \big( [0, 1] \times [0, 1] \big)$ be what's called a kernel on $C([0, 1])$ and let

$\begin{equation*} T_K f(x) := \int_{0}^{1} K(x, y) f(y) \dd{y} \end{equation*}$

called the integral operator with kernel $K$ .

Then

$\begin{equation*} \begin{split} \left| T_K f(x) \right| &\le \int_{0}^{1} \left| K(x, y) \right| \left| f(y) \right| \dd{y} \\ & \le \sqrt{\int_{0}^{1} \left| K(x, y) \right|^2 \dd{y}} \sqrt{\int_{0}^{1} \left| f(y) \right|^2 \dd{y}} \end{split} \end{equation*}$

which implies that

$\begin{equation*} \norm{T_K f}_{L^2}^2 = \int_{0}^{1} \left| T_Kf(x) \right|^2 \dd{x} \le \underbrace{\bigg( \int_{0}^{1} \int_{0}^{1} \left| K(x, y) \right|^2 \dd{x} \dd{y} \bigg)}_{= A^2} \cdot \norm{f}_{L^2}^2 \end{equation*}$

That is,

$\begin{equation*} \norm{T_K f}_{L^2} \le A \norm{f}_{L^2} \end{equation*}$

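So $T_K$ is bounded and $\norm{T_K} \le A$ . In fact, $T_K$ is Hilbert-Schmidt with $\norm{T_K}_{HS} = A$ , i.e. for an ONB $\left\{ e_n \right\}$ we have

$\begin{equation*} \int_{0}^{1} \int_{0}^{1} \left| K(x, y) \right|^2 \dd{x} \dd{y} = \sum_{n \ge 1}^{} \norm{T_K e_n}^2 \end{equation*}$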

Let $H = L^2([0 , 1])$ be the completion of $\big( C([0, 1]), \left\langle \cdot, \cdot \right\rangle \big)$ .

Consider the ONB given by the Fourier basis $\left\{ e_n(y) = e^{2 \pi i n y} \right\}_{n \in \mathbb{Z}}$ (re-indexed by $n \ge 1$ in the sums below), and write $T = T_K$ .

$\begin{equation*} \norm{T e_n}_{L^2}^2 = \int_{0}^{1} \left| T e_n(x) \right|^2 \dd{x} = \int_{0}^{1} \left| \left\langle \overline{K}_x, e_n \right\rangle \right|^2 \dd{x} \end{equation*}$

where $K_x(y) := K(x, y)$ and

$\begin{equation*} \left\langle \overline{K}_x, e_n \right\rangle = T e_n(x) \end{equation*}$

Summing over $n$

$\begin{equation*} \begin{split} \sum_{n \ge 1}^{} \norm{T e_n}^2 &= \int_{0}^{1} \sum_{n \ge 1}^{} \left| \left\langle \overline{K}_x, e_n \right\rangle \right|^2 \dd{x} \\ &= \int_{0}^{1} \norm{K_x}_{L^2}^2 \dd{x} \\ &= \int_{0}^{1} \int_{0}^{1} \left| K(x, y) \right|^2 \dd{x} \dd{y} \end{split} \end{equation*}$

Compact operators

Notation

$K(X, Y) \subseteq \mathcal{L}(X, Y)$ denotes the space of compact operators
$c_0 = \left\{ \boldsymbol{\lambda} = \big( \lambda_n \big): \lambda_n \to 0 \right\}$ , i.e. denotes the set of all sequences converging to zero

Stuff

Let $X$ and $Y$ be normed spaces and let $T \in \mathcal{L}(X, Y)$ .

Then $T$ is said to be a compact operator if $\overline{T \big( \bar{B}_1(0) \big)} = \overline{T \big( \left\{ x : \norm{x} \le 1 \right\} \big)}$ is a compact subset of $Y$ .

That is, $T$ is compact if the closure of the image of the closed unit ball is compact.

Recall that in a finite-dimensional space, by the Heine-Borel theorem, closed and bounded subsets are compact.

Hence bounded finite-rank operators are always compact!

Suppose that $X$ is a normed space, $Y$ is a Banach space and that $T_j \in \mathcal{L}(X, Y)$ are compact operators.

Suppose there is a $T \in \mathcal{L}(X, Y)$ s.t. $\norm{T_j - T} \to 0$ as $j \to \infty$ . Then $T$ is compact.

$K(X, Y)$ is a subspace of $\mathcal{L}(X, Y)$
If $Y$ is a Banach space, then $K(X, Y)$ is closed (by the theorem above on limits of compact operators, it contains all its limit points)

Suppose $H, K$ are separable Hilbert spaces (thus there exist countable orthonormal bases)
We have

$\begin{equation*} \text{FR}(H, K) \subset \mathcal{H} \mathcal{S} (H, K) \subset \overline{\text{FR}(H, K)} = K(H, K) \end{equation*}$
- Though we will only prove $\overline{\text{FR}(H, K)} \subseteq K(H, K)$
Let $T: H \to H$ be linear and $\left\{ e_n \right\}$ be an ONB in $H$ , and assume $\dim H = \infty$ since otherwise all the containments above would be equalities.
Furthermore, suppose

$\begin{equation*} T e_n = \lambda_n e_n \end{equation*}$
$T \in \mathcal{L}(H)$ iff $\boldsymbol{\lambda} = \big( \lambda_n \big) \in \ell^{\infty}$ and $\norm{T} = \norm{\boldsymbol{\lambda}}_{\infty}$ , since

$\begin{equation*} x = \sum_{n \ge 1}^{} \left\langle x, e_n \right\rangle e_n \longrightarrow T x = \sum_{n \ge 1}^{ } \left\langle x, e_n \right\rangle \lambda_n e_n \end{equation*}$
$T \in \mathcal{H} \mathcal{S}(H)$ iff $\boldsymbol{\lambda} \in \ell^{2}$ and $\norm{T}_{\mathcal{H} \mathcal{S}} = \norm{\boldsymbol{\lambda}}_{2}$
$T \in \text{FR}(H)$ iff $\boldsymbol{\lambda} \in \mathbb{F}_0^{\infty} = \left\{ \boldsymbol{\lambda} = \big( \lambda_n \big)_{n \ge 1}: \lambda_n = 0 \text{ for all but finitely many } n \right\}$
$T \in K(H)$ iff $\boldsymbol{\lambda} \in c_0$
$T \in \mathcal{L}(H)$ iff $\boldsymbol{\lambda} \in \ell^{\infty}$
From this we get the above sequence of containments, since

$\begin{equation*} \mathbb{F}_0^{\infty} \subset \ell^2 \subset c_0 \subset \ell^{\infty} \end{equation*}$
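The diagonal picture is easy to explore numerically. The sketch below assumes the hypothetical choice $\lambda_n = 1 / n$ , truncated to finitely many coordinates for the computation: this sequence lies in $\ell^2$ (so $T$ is Hilbert-Schmidt, hence compact), and the finite-rank truncations converge to $T$ in operator norm.

```python
import math

# Diagonal operator T e_n = lambda_n e_n with the hypothetical choice lambda_n = 1/n,
# truncated to the first n_max coordinates for the purpose of the computation.
n_max = 100000
lam = [1.0 / n for n in range(1, n_max + 1)]

op_norm = max(abs(t) for t in lam)            # ||T|| = sup_n |lambda_n| = 1
hs_norm = math.sqrt(sum(t * t for t in lam))  # ||T||_HS = ||lambda||_2, close to pi / sqrt(6)

def tail_op_norm(N):
    # T_N keeps lambda_1, ..., lambda_N and sets the rest to zero (finite rank),
    # so ||T - T_N|| = sup_{n > N} |lambda_n| = 1 / (N + 1).
    return max(abs(t) for t in lam[N:])

tails = [tail_op_norm(N) for N in (1, 10, 100, 1000)]
```

The entries of `tails` decrease towards zero, which is the statement $\boldsymbol{\lambda} \in c_0$ in this example.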

Spectral Theorem for Bounded Self-Adjoint Operators

Notation

$\mathbf{H}$ is a separable complex Hilbert space
Operator norm of $A$ on $\mathbf{H}$ is

$\begin{equation*} ||A|| := \sup_{\psi \in \mathbf{H} \setminus \left\{ 0 \right\}} \frac{||A \psi||}{||\psi||} \end{equation*}$

Banach space of bounded operators on $\mathbf{H}$ , wrt. the operator norm, is denoted $\mathcal{B}(\mathbf{H})$ .
$\rho(A)$ denotes the resolvent set of $A$
$\sigma(A)$ denotes the spectrum of $A$
$\mu^A$ denotes the projection-valued measure associated with the self-adjoint operator $A$
For any projection-valued measure $\mu$ and $\psi \in \mathbf{H}$ , we have an ordinary (positive) real-valued measure $\mu_{\psi}$ given by

$\begin{equation*} \mu_{\psi} (E) = \left\langle \psi, \mu(E) \psi \right\rangle \end{equation*}$
$Q_f: \mathbf{H} \to \mathbb{C}$ is a map defined by

$\begin{equation*} Q_f (\psi) = \int_X f \ d \mu_{\psi} = \left\langle \psi, \bigg( \int_X f \ d \mu \bigg) \psi \right\rangle \end{equation*}$
Spectral subspace for each Borel set $E \subset \mathbb{R}$

$\begin{equation*} V_E = \text{Range} \big( \mu^A (E) \big) \end{equation*}$

of $\mathbf{H}$
$\{ e_j( \cdot ) \}_{j = 1}^\infty$ defines a simultaneous orthonormal basis for a family $\{ \mathbf{H}_{\lambda}, \lambda \in X \}$ of separable Hilbert spaces

Properties of Bounded Operators

Linear operator $A$ on $\mathbf{H}$ is said to be bounded if the operator norm of $A$

$\begin{equation*} ||A|| := \sup_{\psi \in \mathbf{H} \setminus \left\{ 0 \right\}} \frac{||A \psi||}{||\psi||} \end{equation*}$

is finite.
Space of bounded operators on $\mathbf{H}$ forms a Banach space under the operator norm, and we have the inequality

$\begin{equation*} ||AB|| \le ||A|| \ ||B|| \end{equation*}$

for all bounded operators $A$ and $B$ .

For $A \in \mathcal{B}(\mathbf{H})$ , the resolvent set of $A$ , denoted $\rho(A)$ is the set of all $\lambda \in \mathbb{C}$ such that the operator $\big( A - \lambda I \big)$ has a bounded inverse.

The spectrum of $A$ , denoted by $\sigma(A)$ , is the complement in $\mathbb{C}$ of the resolvent set.

For $\lambda$ in the resolvent set of $A$ , the operator $\big( A - \lambda I \big)^{-1}$ is called the resolvent of $A$ at $\lambda$ .

Alternatively, the resolvent set of $A$ can be described as the set of $\lambda \in \mathbb{C}$ for which $\big( A - \lambda I \big)$ is one-to-one and onto.

For all $A \in \mathcal{B}(\mathbf{H})$ , the following results hold.

The spectrum $\sigma(A)$ of $A$ is a closed, bounded and nonempty subset of $\mathbb{C}$ .
If $|\lambda| > ||A||$ , then $\lambda$ is in the resolvent set of $A$

Point 2 of the proposition above (hall13-quant-7.5) establishes that $\sigma(A)$ is bounded if $A$ is bounded.

Suppose $A \in \mathcal{B}(\mathbf{H})$ satisfies $||A|| < 1$ .

Then the operator $\big( I - A \big)$ is invertible, with the inverse given by the following convergent series in $\mathcal{B}(\mathbf{H})$ :

$\begin{equation*} \big( I - A \big)^{-1} = I + A + A^2 + A^3 + \dots \end{equation*}$
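The series above (the Neumann series) can be illustrated with a small matrix. This is only a sketch: the $2 \times 2$ matrix `A` is a hypothetical example whose Frobenius norm, and hence operator norm, is below 1, and 200 terms is an arbitrary cutoff.

```python
# Partial sums of I + A + A^2 + ... for a 2x2 matrix with ||A|| < 1.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X))] for i in range(len(X))]

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.2, 0.3], [-0.1, 0.4]]  # hypothetical example; Frobenius norm sqrt(0.3) < 1

S = I2      # running partial sum I + A + ... + A^k
power = I2  # running power A^k
for _ in range(200):
    power = matmul(power, A)
    S = matadd(S, power)

# (I - A) S should be close to the identity if S approximates (I - A)^{-1}.
I_minus_A = [[I2[i][j] - A[i][j] for j in range(2)] for i in range(2)]
check = matmul(I_minus_A, S)
neumann_err = max(abs(check[i][j] - I2[i][j]) for i in range(2) for j in range(2))
```

Since $(I - A)(I + A + \dots + A^k) = I - A^{k + 1}$ , the error `neumann_err` is controlled by $\norm{A}^{k + 1}$ .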

For all $A \in \mathcal{B}(\mathbf{H})$ , we have

$\begin{equation*} \big[ \text{Range}(A) \big]^{\perp} = \text{ker}(A^*) \end{equation*}$

Spectral Theorem for Bounded Self-Adjoint Operators

Given a bounded self-adjoint operator $A$ , we hope to associate with each Borel set $E \subset \sigma(A)$ a closed subspace $V_E$ of $\mathbf{H}$ , where we think intuitively that $V_E$ is the closed span of the generalized eigenvectors for $A$ with eigenvalues in $E$ .

We would expect the following properties of these subspaces:

$V_{\sigma(A)} = \mathbf{H}$ and $V_{\emptyset} = \left\{ 0 \right\}$
- Captures idea that generalized eigenvectors should span $\mathbf{H}$
If $E$ and $F$ are disjoint, then $V_E \perp V_F$
- Generalized eigenvectors ought to have some sort of orthogonality for distinct eigenvalues (even if not actually in $\mathbf{H}$ )
For any $E$ and $F$ , $V_{E \cap F} = V_E \cap V_F$
If $E_1, E_2, \dots$ are disjoint and $E = \cup_j E_j$ , then

$\begin{equation*} V_E = \bigoplus_j V_{E_j} \end{equation*}$
For any $E$ , $V_E$ is invariant under $A$ .
If $E \subset [\lambda_0 - \varepsilon, \lambda_0 + \varepsilon]$ and $\psi \in V_E$ , then

$\begin{equation*} || (A - \lambda_0 I) \psi || \le \varepsilon ||\psi|| \end{equation*}$

Projection-Valued measures

For any closed subspace $V \subset \mathbf{H}$ , there exists a unique bounded operator $P$ such that

$\begin{equation*} P v = \begin{cases} v & \text{if } v \in V \\ 0 & \text{if } v \in V^{\perp} \end{cases} \end{equation*}$

where $V^{\perp}$ is the orthogonal complement.

This operator is called the orthogonal projection onto $V$ and it satisfies

$\begin{equation*} P^2 = P \quad \text{and} \quad P^* = P \end{equation*}$

i.e. it's self-adjoint.

One also has the properties

$\begin{equation*} \left\langle Px, (y - Py) \right\rangle = \left\langle (x - Px), Py \right\rangle = 0 \end{equation*}$

or equivalently,

$\begin{equation*} \left\langle x, Py \right\rangle = \left\langle Px, Py \right\rangle = \left\langle Px, y \right\rangle \end{equation*}$

Conversely, if $P$ is any bounded operator on $\mathbf{H}$ satisfying $P^2 = P$ and $P^* = P$ , then $P$ is the orthogonal projection onto a closed subspace $V$ , where

$\begin{equation*} V = \text{range}(P) \end{equation*}$
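The properties of orthogonal projections listed above can be verified on a concrete example. A minimal sketch, assuming $\mathbf{H} = \mathbb{R}^3$ and $V = \text{span}\{v\}$ , so that $P = v v^T / \norm{v}^2$ ; the vectors `v`, `x` and `y` are hypothetical.

```python
# Orthogonal projection onto V = span{v} in R^3: P = v v^T / ||v||^2.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(M, x):
    return [dot(row, x) for row in M]

v = [1.0, 2.0, 2.0]  # hypothetical spanning vector, ||v||^2 = 9
P = [[vi * vj / dot(v, v) for vj in v] for vi in v]

x = [3.0, -1.0, 0.5]
y = [0.0, 4.0, -2.0]
Px, Py = matvec(P, x), matvec(P, y)

idempotent_err = max(abs(a - b) for a, b in zip(matvec(P, Px), Px))              # P^2 x = P x
symmetric_err = max(abs(P[i][j] - P[j][i]) for i in range(3) for j in range(3))  # P^* = P
orth_err = abs(dot(Px, [b - c for b, c in zip(y, Py)]))                          # <Px, y - Py> = 0
adjoint_err = abs(dot(x, Py) - dot(Px, y))                                       # <x, Py> = <Px, y>
```

All four error terms should be zero up to rounding.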

Convenient to describe closed subspaces of $\mathbf{H}$ in terms of the associated orthogonal projection operators
Expressed in terms of projection operators, the first four properties of the spectral subspaces are similar to those of a measure, so we use the term projection-valued measure

Let $X$ be a set and $\Omega$ a $\sigma \text{-algebra}$ in $X$ .

A map $\mu : \Omega \to \mathcal{B}(\mathbf{H})$ is called a projection-valued measure if the following properties are satisfied:

For each $E \in \Omega$ , $\mu(E)$ is an orthogonal projection
$\mu(\emptyset) = 0$ and $\mu(X) = I$
If $E_1, E_2, \dots \in \Omega$ are disjoint, then for all $v \in \mathbf{H}$ , we have

$\begin{equation*} \mu \Bigg( \bigcup_{j = 1}^\infty E_j \Bigg) v = \sum_{j=1}^{\infty} \mu(E_j) v \end{equation*}$

where the convergence of the sum is in the norm-topology on $\mathbf{H}$ .
For all $E_1, E_2 \in \Omega$ , we have $\mu(E_1 \cap E_2) = \mu(E_1) \mu(E_2)$

Properties 2 and 4 of a projection-valued measure tell us that if $E_1$ and $E_2$ are disjoint, then

$\begin{equation*} \mu(E_1) \mu(E_2) = 0 \end{equation*}$

from which it follows that the range of $\mu(E_1)$ and the range of $\mu(E_2)$ are perpendicular.

Let $\Omega$ be a $\sigma \text{-algebra}$ in a set $X$ and let $\mu: \Omega \to \mathcal{B}(\mathbf{H})$ be a projection-valued measure.

Then there exists a unique linear map, denoted

$\begin{equation*} f \mapsto \int_{X} f \ d \mu \end{equation*}$

from the space of bounded, measurable, complex-valued functions on $X$ into $\mathcal{B}(\mathbf{H})$ with the property that

$\begin{equation*} \left\langle \psi, \bigg( \int_X f \ d\mu \bigg) \psi \right\rangle = \int_X f \ d \mu_{\psi} \end{equation*}$

for all $f$ and all $\psi \in \mathbf{H}$ .

This integral has the following properties:

For all $E \in \Omega$ , we have

$\begin{equation*} \int_X 1_E \ d\mu = \mu(E) \end{equation*}$

In particular, the integral of the constant function $1$ is $I$ .
For all $f$ , we have

$\begin{equation*} \norm{\int_X f \ d \mu} \le \sup_{\lambda \in X} | f(\lambda)| \end{equation*}$
Integration is multiplicative: For all $f$ and $g$ , we have

$\begin{equation*} \int_X fg \ d \mu = \bigg( \int_X f \ d \mu \bigg) \bigg( \int_X g \ d \mu \bigg) \end{equation*}$
For all $f$ , we have

$\begin{equation*} \int_X \bar{f} \ d \mu = \bigg( \int_X f \ d \mu \bigg)^* \end{equation*}$

In particular, if $f$ is real-valued, then $\int_X f \ d \mu$ is self-adjoint.

By Property 1 and linearity, integration wrt. $\mu$ has the expected behavior on simple functions. It then follows from Property 2 that the integral of an arbitrary bounded measurable function $f$ can be computed as follows:

Take sequence $s_n$ of simple functions converging uniformly to $f$
The integral of $f$ is then the limit, in the norm-topology, of the integral of the $s_n$ .
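For a self-adjoint matrix the projection-valued measure is just a sum of eigenprojections, which makes the properties above concrete. The sketch below assumes the hypothetical matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with eigenvalues 1 and 3, takes $X = \sigma(A) = \{1, 3\}$ , and represents measurable sets as Python sets; `mu` and `integral` are illustrative names.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def outer(u):
    return [[ui * uj for uj in u] for ui in u]

# Hypothetical self-adjoint A = [[2, 1], [1, 2]] with eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
eigen = {1.0: outer([r, -r]), 3.0: outer([r, r])}  # eigenvalue -> orthogonal projection

def mu(E):
    # Projection-valued measure of a set E of real numbers.
    M = [[0.0, 0.0], [0.0, 0.0]]
    for lam, P in eigen.items():
        if lam in E:
            M = [[M[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return M

def integral(f):
    # int f dmu = sum over the spectrum of f(lambda) * P_lambda.
    return [[sum(f(lam) * P[i][j] for lam, P in eigen.items()) for j in range(2)]
            for i in range(2)]

def dist(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

f = lambda t: t * t + 1.0
g = lambda t: math.cos(t)

err_identity = dist(integral(lambda t: t), A)  # int lambda dmu = A
err_mult = dist(integral(lambda t: f(t) * g(t)), matmul(integral(f), integral(g)))
err_disjoint = dist(matmul(mu({1.0}), mu({3.0})), [[0.0, 0.0], [0.0, 0.0]])
err_total = dist(mu({1.0, 3.0}), [[1.0, 0.0], [0.0, 1.0]])  # mu(X) = I
```

Here the integral reduces to a finite sum because the measure is supported on two points; in infinite dimensions the uniform approximation by simple functions described above replaces this sum.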

A quadratic form on a Hilbert space $\mathbf{H}$ is a map $Q: \mathbf{H} \to \mathbb{C}$ with the following properties:

$Q(\lambda \psi) = |\lambda|^2 Q(\psi)$ for all $\psi \in \mathbf{H}$ and $\lambda \in \mathbb{C}$
the map $L: \mathbf{H} \times \mathbf{H} \to \mathbb{C}$ defined by

$\begin{equation*} \begin{split} L(\phi, \psi) =& \frac{1}{2} \big[ Q(\phi + \psi) - Q(\phi) - Q(\psi) \big] \\ & - \frac{i}{2} \big[ Q(\phi + i \psi) - Q(\phi) - Q(i \psi) \big] \end{split} \end{equation*}$

is a sesquilinear form.

A quadratic form $Q$ is bounded if there exists a constant $C$ such that

$\begin{equation*} \left| Q(\phi) \right| \le C \norm{\phi}^2, \quad \forall \phi \in \mathbf{H} \end{equation*}$

The smallest such constant $C$ is the norm of $Q$ .

If $Q$ is a bounded quadratic form on $\mathbf{H}$ , there is a unique $A \in \mathcal{B}(\mathbf{H})$ such that

$\begin{equation*} Q(\psi) = \left\langle \psi, A \psi \right\rangle, \quad \forall \psi \in \mathbf{H} \end{equation*}$

If $Q(\psi)$ belongs to $\mathbb{R}$ for all $\psi \in \mathbf{H}$ , then the operator $A$ is self-adjoint.

Spectral Theorem for Bounded Self-Adjoint Operators: direct integral approach

Notation

$\mu$ is a $\sigma \text{-finite}$ measure on a $\sigma \text{-algebra}$ $\Omega$ of sets in $X$
For each $\lambda \in X$ we have a separable Hilbert space $\mathbf{H}_{\lambda}$ with inner product $\left\langle \cdot, \cdot \right\rangle_{\lambda}$
Elements of the direct integral are called sections $s$

Stuff

There are several benefits to this approach compared to the simpler "multiplication operator" approach.

The set and the function become canonical:
- $X = \sigma(A)$
- $h(\lambda) = \lambda$
The direct integral carries with it a notion of generalized eigenvectors / kets, since the space $\mathbf{H}_{\lambda}$ can be thought of as the space of generalized eigenvectors with eigenvalue $\lambda$ .
A simple way to classify self-adjoint operators up to unitary equivalence: two self-adjoint operators are unitarily equivalent if and only if their direct integral representations are equivalent in a natural sense.

Elements of the direct integral are called sections $s$ , which are functions on $X$ with values in the union of the $\mathbf{H}_{\lambda}$ , with property

$\begin{equation*} s(\lambda) \in \mathbf{H}_{\lambda} \quad \forall \lambda \in X \end{equation*}$

We define the norm of a section $s$ by the formula

$\begin{equation*} \norm{s}^2 = \int_X \left\langle s(\lambda), s(\lambda) \right\rangle_{\lambda} \ d \mu(\lambda) \end{equation*}$

provided that the integral on the RHS is finite.

The inner product between two sections $s_1$ and $s_2$ (with finite norm) should then be given by the formula

$\begin{equation*} \left\langle s_1, s_2 \right\rangle := \int_X \left\langle s_1(\lambda), s_2(\lambda) \right\rangle_{\lambda} \ d \mu(\lambda) \end{equation*}$

This is very reminiscent of sections in differential geometry:

$\mathbf{H}_{\lambda}$ is the fibre at each point $\lambda$ in the manifold
$X$ is the manifold

First we slightly alter the concept of an orthonormal basis. We say a family $\{ e_j \}$ of vectors is an orthonormal basis for a Hilbert space $\mathbf{H}$ if

$\begin{equation*} \left\langle e_j, e_k \right\rangle = 0, \quad j \ne k \end{equation*}$

and

$\begin{equation*} \norm{e_j} = 1 \text{ or } 0 \end{equation*}$

This just means that we allow some of the vectors in our basis to be zero.

We define a simultaneous orthonormal basis for a family $\{ \mathbf{H}_{\lambda}, \lambda \in X \}$ of separable Hilbert spaces to be a collection $\{ e_j( \cdot ) \}_{j = 1}^\infty$ of sections with the property that

$\begin{equation*} \left\{ e_j(\lambda) \right\}_{j = 1}^\infty \text{ is an orthonormal basis for } \mathbf{H}_{\lambda}, \quad \forall \lambda \in X \end{equation*}$

Provided that the function $\lambda \mapsto \dim \mathbf{H}_{\lambda}$ is a measurable function from $X$ into $[0, \infty]$ , it is possible to choose a simultaneous orthonormal basis $\{ e_j(\cdot) \}$ such that

$\begin{equation*} \left\langle e_j(\lambda), e_k(\lambda) \right\rangle \end{equation*}$

is measurable for all $j$ and $k$ .

Given a simultaneous orthonormal basis with the property that the function

$\begin{equation*} \lambda \mapsto \left\langle e_j(\lambda), e_k(\lambda) \right\rangle_{\lambda} \end{equation*}$

is measurable for all $j$ and $k$ , we can define a section $s$ to be measurable if the function

$\begin{equation*} \lambda \mapsto \left\langle e_j(\lambda), s(\lambda) \right\rangle_{\lambda} \end{equation*}$

is a measurable complex-valued function for each $j$ . This also means that the $e_j$ are also measurable sections.

We refer to such a choice of simultaneous orthonormal basis as a measurability structure on the collection $\{ \mathbf{H}_{\lambda}, \lambda \in X \}$ .

Given two measurable sections $s_1$ and $s_2$ , the function

$\begin{equation*} \lambda \mapsto \left\langle s_1(\lambda), s_2(\lambda) \right\rangle_{\lambda} = \sum_{j=1}^{\infty} \left\langle s_1(\lambda), e_j(\lambda) \right\rangle_{\lambda} \left\langle e_j(\lambda), s_2(\lambda) \right\rangle_{\lambda} \end{equation*}$

is also measurable.

Suppose the following structures are given:

a $\sigma \text{-finite}$ measure space $(X, \Omega, \mu)$
a collection $\{ \mathbf{H}_{\lambda} \}_{\lambda \in X}$ of separable Hilbert spaces for which the dimension function is measurable
a measurability structure on $\{ \mathbf{H}_{\lambda} \}_{\lambda \in X}$

Then the direct integral of $\mathbf{H}_{\lambda}$ wrt. $\mu$ , denoted

$\begin{equation*} \int_X^{\oplus} \mathbf{H}_{\lambda} \ d \mu(\lambda) \end{equation*}$

is the space of equivalence classes of almost-everywhere-equal measurable sections $s$ for which

$\begin{equation*} \norm{s}^2 := \int_X \left\langle s(\lambda), s(\lambda) \right\rangle_{\lambda} \ d \mu(\lambda) < \infty \end{equation*}$

The inner product $\left\langle s_1, s_2 \right\rangle$ of two sections $s_1$ and $s_2$ is given by the formula

$\begin{equation*} \left\langle s_1, s_2 \right\rangle := \int_X \left\langle s_1(\lambda), s_2(\lambda) \right\rangle_{\lambda} \ d \mu(\lambda) \end{equation*}$

If $A \in \mathcal{B}(\mathbf{H})$ is self-adjoint, then there exists a σ-finite measure $\mu$ on $\sigma(A)$ , a direct integral

$\begin{equation*} \int_{\sigma(A)}^{\oplus} \mathbf{H}_{\lambda} \dd{\mu(\lambda)} \end{equation*}$

and a unitary map $U$ between $\mathbf{H}$ and the direct integral such that

$\begin{equation*} \Big( U A U^{-1}(s) \Big)(\lambda) = \lambda \ s(\lambda) \end{equation*}$

for all sections $s \in \int_{\sigma(A)}^{\oplus} \mathbf{H}_{\lambda} \dd{\mu(\lambda)}$ .

Proofs

Notation

$A \in \mathcal{B}(\mathbf{H})$ with spectral radius

$\begin{equation*} R(A) := \sup_{\lambda \in \sigma(A)} \left| \lambda \right| \end{equation*}$

Stage 1: Continuous Functional Calculus

Stage 2: An Operator-Valued Riesz Representation Theorem

Let $X$ be a compact metric space and let $\mathcal{C}(X; \mathbb{R})$ denote the space of continuous, real-valued functions on $X$ .

Suppose $\Lambda: \mathcal{C}(X ; \mathbb{R}) \to \mathbb{R}$ is a linear functional with the property that $\Lambda(f)$ is non-negative if $f$ is non-negative.

Then there exists a unique (real-valued, positive) measure $\mu$ on the Borel sigma-algebra in $X$ for which

$\begin{equation*} \Lambda(f) = \int_X f \ d \mu, \qquad \forall f \in \mathcal{C}(X ; \mathbb{R}) \end{equation*}$

Observe that $\mu$ is a finite measure, with

$\begin{equation*} \mu(X) = \Lambda(\mathbf{1}) \end{equation*}$

where $\mathbf{1}$ is the constant function.

Continuous one-parameter groups

$C_0$ semigroup or strongly continuous one-parameter semigroup

A strongly continuous one-parameter semigroup on a Banach space $X$ is a map $T: \mathbb{R}_+ \to L(X)$ such that

$T(0) = I$ (i.e. the identity operator on $X$ )
$\forall t, s \ge 0$ we have

$\begin{equation*} T(t + s) = T(t) T(s) \end{equation*}$
$\forall x_0 \in X$ we have

$\begin{equation*} \lim_{t \downarrow 0} \norm{T(t) x_0 - x_0} = 0 \end{equation*}$

The first two axioms are algebraic, and state $T$ is a representation of a semigroup $\big( \mathbb{R}_+, + \big)$ , and the last axiom is topological and states that $T$ is continuous in the strong operator topology.

Infinitesimal generator

The infinitesimal generator $A$ of a strongly continuous semigroup $T$ is defined by

$\begin{equation*} A x = \lim_{t \downarrow 0 } \frac{(T(t) - I) x}{t} \end{equation*}$

whenever the limit exists.

The domain of $A$ , denoted $D(A)$ , is the set of $x \in X$ for which the limit does exist; $D(A)$ is a linear subspace and $A$ is linear on this domain.

The operator $A$ is closed, although not necessarily bounded, and the domain $D(A)$ is dense in $X$ .

The strongly continuous semigroup $T$ with generator $A$ is often denoted by the symbol $e^{t A}$ , which is compatible with the notation for matrix exponentials.
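The three semigroup axioms and the definition of the generator can be checked on a matrix example. The sketch below assumes $X = \mathbb{R}^2$ and the hypothetical generator $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ , for which $e^{tA}$ is the rotation by angle $t$ ; the closed form is used instead of a general matrix exponential.

```python
import math

# Hypothetical generator A = [[0, -1], [1, 0]] on X = R^2;
# here e^{tA} is rotation by the angle t, written in closed form.
A = [[0.0, -1.0], [1.0, 0.0]]

def T(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

# Semigroup property T(t + s) = T(t) T(s).
t, s = 0.4, 1.3
lhs, rhs = T(t + s), matmul(T(t), T(s))
semigroup_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

x0 = [1.0, 2.0]
Ax0 = matvec(A, x0)

def generator_quotient(h):
    # (T(h) x0 - x0) / h, which should approach A x0 as h decreases to 0.
    Thx = matvec(T(h), x0)
    return [(Thx[i] - x0[i]) / h for i in range(2)]

generator_errs = [max(abs(q - a) for q, a in zip(generator_quotient(h), Ax0))
                  for h in (1e-1, 1e-2, 1e-3)]
```

In this bounded example $D(A)$ is the whole space; the interesting unbounded case cannot be captured by a finite matrix.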

Reproducing Kernel Hilbert Spaces (RKHSs)

Definitions

Let $\mathcal{X}$ be a non-empty set, sometimes referred to as the index set.

A symmetric function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a positive-definite kernel on $\mathcal{X}$ if

$\begin{equation*} \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \ge 0 \end{equation*}$

holds for any $n \in \mathbb{N}$ , $x_1, \dots, x_n \in \mathcal{X}$ , $c_1, \dots, c_n \in \mathbb{R}$ .

Or more generally, for a real or complex field $\mathbb{K} \in \left\{ \mathbb{R}, \mathbb{C} \right\}$ , a function $K: \mathcal{X} \times \mathcal{X} \to \mathbb{K}$ is called a kernel on $\mathcal{X}$ if there exists a $\mathbb{K}-\text{Hilbert space } H$ and a map $\Phi: \mathcal{X} \to H$ such that for all $x, x' \in \mathcal{X}$ we have

$\begin{equation*} K(x, x') = \left\langle \Phi(x'), \Phi(x) \right\rangle_H \end{equation*}$

In a machine learning context, you'll often see $\Phi$ called a feature map and $H$ a feature space of $K$ .

The definition of a positive-definite kernel is equivalent to the following: A symmetric function $k$ is positive-definite if the matrix $k_{X X} \in \mathbb{R}^{n \times n}$ with elements

$\begin{equation*} \big[ k_{X X} \big]_{ij} = k(x_i, x_j) \end{equation*}$

is positive semi-definite for any finite set $X := \left\{ x_1, \dots, x_n \right\} \subset \mathcal{X}$ of any size $n \in \mathbb{N}$ .

This matrix $k_{X X}$ is often referred to as the kernel matrix or Gram matrix.

Note the swapping of order in the kernel and inner product in the definition $K(x, x') = \left\langle \Phi(x'), \Phi(x) \right\rangle_H$ above; this only matters in the case of a complex Hilbert space, where the inner product is sesquilinear:

$\begin{equation*} \left\langle y, \alpha y' \right\rangle = \overline{\alpha} \left\langle y, y' \right\rangle \end{equation*}$
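The positive-definiteness condition can be probed numerically by building a Gram matrix and evaluating the quadratic form for random coefficient vectors. This is only a sketch and not a proof: the Gaussian RBF kernel with $\gamma = 1$ , the sample points and the number of trials are hypothetical choices.

```python
import math
import random

def rbf(x, xp, gamma=1.0):
    # Gaussian RBF kernel exp(-||x - x'||^2 / gamma^2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xp)) / gamma ** 2)

random.seed(0)
n, d = 6, 2
X = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]  # hypothetical sample points
gram = [[rbf(xi, xj) for xj in X] for xi in X]                          # kernel (Gram) matrix k_XX

def quad_form(c):
    # sum_i sum_j c_i c_j K(x_i, x_j)
    return sum(c[i] * c[j] * gram[i][j] for i in range(n) for j in range(n))

# Smallest value of the quadratic form over 1000 random coefficient vectors.
min_quad = min(quad_form([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(1000))
is_symmetric = all(gram[i][j] == gram[j][i] for i in range(n) for j in range(n))
```

A nonnegative `min_quad` is consistent with the Gram matrix being positive semi-definite; an eigenvalue computation would be the more direct test if a linear algebra library is available.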

Examples of kernels

Let

$\mathcal{X} \subseteq \mathbb{R}^d$
$\gamma > 0$

Then a Gaussian RBF kernel or squared exponential kernel is defined by

$\begin{equation*} \begin{split} k_{\gamma}: \quad & \mathcal{X} \times \mathcal{X} \to \mathbb{R} \\ & (x, x') \mapsto k_{\gamma}(x, x') := \exp \bigg( - \frac{\norm{x - x'}^2}{\gamma^2} \bigg), \quad x, x' \in \mathcal{X} \end{split} \end{equation*}$

Let

$\mathcal{X} \subseteq \mathbb{R}^d$
$\alpha > 0$
$h > 0$

The Matérn kernel is defined

$\begin{equation*} \begin{split} k_{\alpha, h}: \quad & \mathcal{X} \times \mathcal{X} \to \mathbb{R} \\ & (x, x') \mapsto k_{\alpha, h}(x, x') := \frac{1}{2^{\alpha - 1} \Gamma(\alpha)} \bigg( \frac{\sqrt{2 \alpha} \norm{x - x'}}{h} \bigg)^{\alpha} K_{\alpha} \bigg( \frac{\sqrt{2 \alpha} \norm{x - x'}}{h} \bigg), \quad x, x' \in \mathcal{X} \end{split} \end{equation*}$

where

$\Gamma$ is the gamma-function
$K_{\alpha}$ is the modified Bessel function of the second kind of order $\alpha$

$h$ determines the scale
$\alpha$ determines the smoothness of the functions in the associated RKHS
- As $\alpha$ increases, the functions get smoother

If $\alpha$ can be written as

$\begin{equation*} \alpha = m + \frac{1}{2} \end{equation*}$

for a nonnegative integer $m$ , then the expression for the Matérn kernel reduces to a product of an exponential function and a polynomial of degree $m$ , which can be computed easily (rasmussen2006gaussian):

$\begin{equation*} k_{\alpha, h}(x, x') = \exp \bigg( - \frac{\sqrt{2 \alpha} \norm{x - x'}}{h} \bigg) \frac{\Gamma(m + 1)}{\Gamma(2m + 1)} \sum_{i=0}^{m} \frac{(m + i)!}{i! (m - i)!} \bigg( \frac{\sqrt{8 \alpha} \norm{x - x'}}{h} \bigg)^{m - i} \end{equation*}$

For example, $\alpha = \frac{1}{2}$ (i.e. $m = 0$ ) results in what's known as the Laplace or exponential kernel:

$\begin{equation*} k_{1 / 2, h}(x, x') = \exp \bigg( - \frac{\norm{x - x'}}{h} \bigg) \end{equation*}$

Gaussian RBF kernels can be obtained as limits of Matérn kernels for $\alpha \to \infty$ , i.e. for a Matérn kernel $k_{\alpha, h}$ with $h > 0$ being fixed, we have

$\begin{equation*} \lim_{\alpha \to \infty} k_{\alpha, h}(x, x') = \exp \bigg( - \frac{\norm{x - x'}^2}{2h^2} \bigg), \quad x, x' \in \mathbb{R}^d \end{equation*}$
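The half-integer case can be cross-checked against the commonly quoted closed forms for $\alpha = 1/2, 3/2, 5/2$ . The sketch below implements the exponential-times-polynomial expression with the sum running over $i = 0, \dots, m$ and coefficients $(m + i)! / (i! (m - i)!)$ , which is the form assumed here; the function names, the scale `h = 0.8` and the test distances are hypothetical.

```python
import math

def matern_half_integer(r, m, h=1.0):
    # Matern kernel for alpha = m + 1/2 as a function of r = ||x - x'||,
    # using the polynomial-times-exponential form with the sum over i = 0..m.
    alpha = m + 0.5
    u = r / h
    poly = sum(
        math.factorial(m + i) / (math.factorial(i) * math.factorial(m - i))
        * (math.sqrt(8.0 * alpha) * u) ** (m - i)
        for i in range(m + 1)
    )
    return math.exp(-math.sqrt(2.0 * alpha) * u) * math.gamma(m + 1) / math.gamma(2 * m + 1) * poly

def matern_closed_form(r, m, h=1.0):
    # Commonly quoted closed forms for alpha = 1/2, 3/2, 5/2.
    u = r / h
    if m == 0:
        return math.exp(-u)
    if m == 1:
        return (1.0 + math.sqrt(3.0) * u) * math.exp(-math.sqrt(3.0) * u)
    if m == 2:
        return (1.0 + math.sqrt(5.0) * u + 5.0 * u * u / 3.0) * math.exp(-math.sqrt(5.0) * u)
    raise ValueError("closed form only provided for m in {0, 1, 2}")

matern_err = max(
    abs(matern_half_integer(r, m, h=0.8) - matern_closed_form(r, m, h=0.8))
    for m in (0, 1, 2)
    for r in (0.0, 0.3, 1.0, 2.5)
)
```

Working with $r = \norm{x - x'}$ directly keeps the example independent of the dimension $d$ .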

Let

$\mathcal{X} \subseteq \mathbb{R}^d$
$c > 0$
$m \in \mathbb{N}$

Then a polynomial kernel is defined

$\begin{equation*} \begin{split} k_{m, c}: \quad & \mathcal{X} \times \mathcal{X} \to \mathbb{R} \\ & (x, x') \mapsto k_{m, c}(x, x') := (x^T x' + c)^m, \quad x, x' \in \mathcal{X} \end{split} \end{equation*}$
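The polynomial kernel gives a simple instance of the feature-map definition of a kernel. As a sketch, for $d = 2$ and $m = 2$ one possible explicit feature map into $\mathbb{R}^6$ is written out below; the name `feature_map`, the value of `c` and the test points are hypothetical.

```python
import math

def poly_kernel(x, xp, c=1.0, m=2):
    # k_{m,c}(x, x') = (x^T x' + c)^m
    return (sum(a * b for a, b in zip(x, xp)) + c) ** m

def feature_map(x, c=1.0):
    # Explicit feature map for d = 2, m = 2, chosen so that
    # k_{2,c}(x, x') = <Phi(x), Phi(x')> in R^6.
    x1, x2 = x
    return [x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2,
            math.sqrt(2.0 * c) * x1, math.sqrt(2.0 * c) * x2, c]

x, xp = [0.5, -1.5], [2.0, 0.25]
c = 0.7
k_direct = poly_kernel(x, xp, c=c, m=2)
k_feature = sum(a * b for a, b in zip(feature_map(x, c), feature_map(xp, c)))
```

Expanding $(x^T x' + c)^2$ term by term shows why `k_direct` and `k_feature` agree.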

TODO Kernel embeddings

Exam prep

May 2017

1

c

If $f_m: \mathbb{R} \to \mathbb{R}$ are uniformly continuous, and $f_m \to f$ uniformly, then $f$ is also uniformly continuous.

Given $\varepsilon > 0$ , find $N$ such that

$\begin{equation*} \forall m \ge N : \quad | f_m(x) - f(x)| < \frac{\varepsilon}{3}, \quad \forall x \in \mathbb{R} \end{equation*}$

In particular, for $m = N$ ,

$\begin{equation*} | f_N(x) - f(x)| < \frac{\varepsilon}{3}, \quad \forall x \in \mathbb{R} \end{equation*}$

Since $f_N$ is uniformly continuous, there exists $\delta > 0$ such that

$\begin{equation*} | x - y | < \delta \implies |f_N(x) - f_N(y)| < \frac{\varepsilon}{3} \end{equation*}$

Hence, by the triangle inequality,

$\begin{equation*} |f(x) - f(y)| \le \underbrace{|f(x) - f_N(x)|}_{< \varepsilon / 3} + \underbrace{|f_N(x) - f_N(y)|}_{< \varepsilon / 3} + \underbrace{|f_N(y) - f(y)|}_{< \varepsilon / 3} < \varepsilon, \quad \forall | x - y | < \delta \end{equation*}$

Compactness

If $C \subseteq X$ , $C$ is closed and $X$ is compact, then $C$ is also compact.

Let $\big( U_i \big)$ be any open cover of $C$ .

$\begin{equation*} \bigcup_{i \in I} U_i \supset C \end{equation*}$

$\big( U_i \big)$ might not cover the whole of $X$ . Since $C$ is closed, $X \setminus C$ is open, therefore

$\begin{equation*} \big( U_i \big)_{i \in I} \cup \left\{ X \setminus C \right\} \end{equation*}$

is an open cover of $X$ . Thus, due to $X$ being compact, there exists a finite subcover of the above cover,

$\begin{equation*} \exists i_1, i_2, \dots, i_k : U_{i_1} \cup U_{i_2} \cup \dots \cup U_{i_k} \cup (X \setminus C) \end{equation*}$

covers $X$ . Since $X \setminus C$ contains no points of $C$ , clearly $U_{i_1} \cup U_{i_2} \cup \dots \cup U_{i_k}$ is a finite subcover of $C$ , hence $C$ is compact, as claimed.

Proving connectedness of a set

Best way to prove a set $A$ is connected is often to check whether it's path connected (path connectedness implies connectedness)
Best way to prove a set is not connected (disconnected) is to use the definition of connectedness:
- Find 2 open sets $U, V$ such that
  
  $\begin{equation*} A \subset U \cup V, \quad U \cap V = \emptyset, \quad A \cap U \ne \emptyset, \quad A \cap V \ne \emptyset \end{equation*}$
  
  i.e. two disjoint open sets whose union covers $A$ and which each contain points of $A$ .

Example: $A = \mathbb{Q}^2 \subset \mathbb{R}^2$

$U = \left\{ (x, y) : x < \pi \right\}$
$V = \left\{ (x, y) : x > \pi \right\}$
$\mathbb{Q}^2 \subset U \cup V$ since $\pi \notin \mathbb{Q}$ , and both $U$ and $V$ contain points of $\mathbb{Q}^2$
