# Optimization

## THE resource

## Notation

- The **support of a vector** $x \in \mathbb{R}^n$ is denoted by $\operatorname{supp}(x) = \{ i : x_i \neq 0 \}$.
- A vector $x$ is referred to as **s-sparse** if $|\operatorname{supp}(x)| \le s$.
- $\| x \|_0$ denotes the number of non-zero entries in $x$.
- $\sigma_1(A) \ge \sigma_2(A) \ge \dots$ denote $A$'s **singular values**.
- **Frobenius norm** of $A$: $\| A \|_F = \sqrt{\sum_{ij} A_{ij}^2} = \sqrt{\sum_i \sigma_i^2(A)}$.
- **Nuclear norm** of $A$: $\| A \|_* = \sum_i \sigma_i(A)$.
- **Spectral norm** of $A$: $\| A \|_2 = \sigma_1(A) = \max_i \sigma_i(A)$.
- **Random variables** are denoted with upper-case letters, e.g. $X$.
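
As a quick sanity check of the notation above, all three matrix norms can be computed from the singular values (a NumPy sketch; the vector and matrix are made-up examples):

```python
import numpy as np

# Support and sparsity of a vector
x = np.array([0.0, 3.0, 0.0, -1.5, 2.0])
support = np.nonzero(x)[0]      # indices of non-zero entries
l0 = np.count_nonzero(x)        # the l0 "norm": number of non-zero entries
# x is s-sparse for any s >= 3

# Matrix norms via the singular values sigma_1 >= sigma_2 >= ...
A = np.array([[3.0, 0.0], [0.0, 4.0]])
sigma = np.linalg.svd(A, compute_uv=False)
frobenius = np.sqrt((sigma ** 2).sum())  # equals np.linalg.norm(A, 'fro')
nuclear = sigma.sum()                    # equals np.linalg.norm(A, 'nuc')
spectral = sigma.max()                   # equals np.linalg.norm(A, 2)
print(frobenius, nuclear, spectral)      # 5.0 7.0 4.0
```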

## Definitions

### Algebraic Riccati equation

The **(algebraic) Riccati equation** is the problem of finding an unknown matrix $X$ in

*Continuous* case:

$$A^\top X + X A - X B R^{-1} B^\top X + Q = 0$$

*Discrete* case:

$$X = A^\top X A - (A^\top X B)(R + B^\top X B)^{-1}(B^\top X A) + Q$$

where $A$, $B$, $Q$, and $R$ are known real coefficient matrices.
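
For intuition, the scalar continuous case can be solved in closed form (a sketch with made-up coefficients; for the matrix case one would typically reach for `scipy.linalg.solve_continuous_are`):

```python
import math

# Scalar continuous Riccati equation: 2*a*x - (b**2 / r) * x**2 + q = 0,
# taking the positive root (arbitrary example coefficients).
a, b, q, r = -1.0, 1.0, 2.0, 1.0
c = b ** 2 / r
x = (2 * a + math.sqrt(4 * a ** 2 + 4 * c * q)) / (2 * c)

residual = 2 * a * x - c * x ** 2 + q
print(x, residual)  # residual ~ 0 up to floating-point error
```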

## Lagrange multipliers

### TODO Modern formulation via differentiable manifolds

#### Notation

- $M$ is a smooth manifold
- $f : M \to \mathbb{R}$ is a differentiable function we wish to maximize
- $\varphi : U \subseteq M \to \mathbb{R}^n$ is a local chart

#### Stuff

By extension of Fermat's theorem, the local maxima can only be points where the exterior derivative vanishes: $\mathrm{d}f_x = 0$.

In particular, the extrema are a subset of the critical points of the function $f$.

Here $\mathrm{d}f_x$ is just a linear mapping from the tangent space $T_x M$ to $\mathbb{R}$.

## Convexity

A **convex combination** of a set of vectors $x_1, \dots, x_k$ in an arbitrary real space is a vector

$$x = \sum_{i=1}^{k} \theta_i x_i$$

where:

$$\theta_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{k} \theta_i = 1$$
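
A minimal NumPy sketch of forming a convex combination (the points and weights are arbitrary):

```python
import numpy as np

points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # x_1, x_2, x_3
theta = np.array([0.25, 0.25, 0.5])                      # theta_i >= 0, sum = 1
assert theta.min() >= 0 and abs(theta.sum() - 1.0) < 1e-12

x = theta @ points  # the convex combination sum_i theta_i * x_i
print(x)            # [0.5 1. ]
```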

A **convex set** is a subset of an affine space that is closed under convex combinations.

More specifically, in a Euclidean space, a **convex region** is a region where, for every pair of points within the region, every point on the straight line segment that joins the pair of points is also within the region, i.e. we say the set $C$ is a **convex set** if and only if

$$\lambda x + (1 - \lambda) y \in C, \quad \forall x, y \in C,\ \forall \lambda \in [0, 1]$$

The *boundary* of a **convex set** is always a **convex curve**, i.e. a curve in the Euclidean plane which lies completely on one side of each of its tangent lines.

The intersection of all convex sets containing a given subset $S$ of Euclidean space is called the **convex hull** of $S$, i.e. the smallest convex set containing the subset $S$.

It's like a "convex closure" of $S$.
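
The convex hull can be computed with `scipy.spatial.ConvexHull` (assuming SciPy is available); points interior to the hull are discarded:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Unit-square corners plus an interior point; the hull keeps only the corners.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]], dtype=float)
hull = ConvexHull(points)
print(sorted(hull.vertices.tolist()))  # the interior point (index 4) is excluded
```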

A continuously differentiable function $f$ is considered **convex** if

$$f(y) \ge f(x) + \nabla f(x)^\top (y - x), \quad \forall x, y$$

A continuously differentiable function $f$ is considered *$\alpha$-strongly convex* (SC) and *$\beta$-strongly smooth* (SS) if

$$\frac{\alpha}{2} \| y - x \|_2^2 \le f(y) - f(x) - \nabla f(x)^\top (y - x) \le \frac{\beta}{2} \| y - x \|_2^2$$

### Simplex

A **simplex** is a generalization of the notion of a triangle or a tetrahedron to arbitrary dimensions.

Specifically, a **k-simplex** is a k-dimensional polytope which is the convex hull of its vertices.

More formally, suppose the $k + 1$ points $u_0, \dots, u_k \in \mathbb{R}^n$ are affinely independent, which means $u_1 - u_0, \dots, u_k - u_0$ are linearly independent. Then, the simplex determined by them is the set of points

$$C = \left\{ \theta_0 u_0 + \dots + \theta_k u_k \;\middle|\; \theta_i \ge 0,\ \sum_{i=0}^{k} \theta_i = 1 \right\}$$
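
One convenient way to generate points of a simplex is to sample the barycentric coordinates $\theta$ from a Dirichlet distribution (a NumPy sketch; the triangle vertices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Barycentric coordinates for a 2-simplex: theta_i >= 0, sum_i theta_i = 1
theta = rng.dirichlet(alpha=np.ones(3), size=5)

u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # vertices u_0, u_1, u_2
points = theta @ u  # five points inside the triangle spanned by u_0, u_1, u_2
```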

### Subgradient

#### Notation

- $\partial f(x)$ denotes the subdifferential (the set of subgradients) of $f$ at $x$

#### Stuff

We say $g \in \mathbb{R}^n$ is a **subgradient** of $f : \mathbb{R}^n \to \mathbb{R}$ at $x$ if

$$f(y) \ge f(x) + g^\top (y - x), \quad \forall y$$

where $g$ replaces the gradient $\nabla f(x)$ in the definition of a convex function.

Observe that the line $f(x) + g^\top (y - x)$ is a *global lower bound* of $f$.

We say $f$ is **subdifferentiable at $x$** if there exists at least *one* subgradient at $x$.

The *set of all subgradients at $x$* is called the **subdifferential**: $\partial f(x) = \{ g : g \text{ is a subgradient of } f \text{ at } x \}$.

Some remarks:

- If $f$ is convex and differentiable at $x$, then $\partial f(x) = \{ \nabla f(x) \}$.
- At any point $x$, there can be 0, 1, or infinitely many subgradients.
- If $\partial f(x) = \emptyset$ for some $x$ in the interior of $\operatorname{dom} f$, then $f$ is *not* convex.

If $0 \in \partial f(x)$, then $x$ is a *global minimizer* of $f$.

This is easily seen from the definition of a subgradient, since if $0 \in \partial f(x)$, then $f(y) \ge f(x) + 0^\top (y - x) = f(x)$ for all $y$, i.e. $x$ is a global minimizer.

In general there might exist multiple global minimizers.
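
The classic example is $f(x) = |x|$: differentiable away from $0$, but with infinitely many subgradients at the kink, including $0$ (a small sketch):

```python
# Subdifferential of f(x) = |x|:
#   x > 0 -> {1};  x < 0 -> {-1};  x = 0 -> the whole interval [-1, 1].
def subdifferential_abs(x):
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)  # infinitely many subgradients at the kink

lo, hi = subdifferential_abs(0.0)
print(lo <= 0.0 <= hi)  # True: 0 is a subgradient, so x = 0 is the global minimizer
```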

#### Subgradient Descent

- Suppose $f$ is convex, and we start optimizing at $x^{(0)}$
- Repeat: step in a negative subgradient direction:

$$x^{(t+1)} = x^{(t)} - \eta_t g^{(t)}$$

where $\eta_t > 0$ is the step-size and $g^{(t)} \in \partial f(x^{(t)})$.
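
The loop above can be sketched as follows, minimizing $f(x) = \|x - c\|_1$ with the decreasing step-size $\eta_t = 1/t$ (the target $c$ is a made-up example):

```python
import numpy as np

def subgradient_descent(subgrad, x0, steps=500):
    """Minimize a convex f given a subgradient oracle, decreasing step-size."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        eta = 1.0 / t              # decreasing step-size eta_t
        x = x - eta * subgrad(x)   # negative subgradient step
    return x

# f(x) = ||x - c||_1 has minimizer c; sign(x - c) is a subgradient.
c = np.array([1.0, -2.0])
x_hat = subgradient_descent(lambda x: np.sign(x - c), x0=np.zeros(2))
print(np.round(x_hat, 2))
```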

The following theorem tells us that the subgradient step gets us closer to the minimizer.

Suppose $f$ is convex.

- Let $x^{(t+1)} = x^{(t)} - \eta_t g^{(t)}$ for $g^{(t)} \in \partial f(x^{(t)})$
- Let $x^\star$ be any point for which $f(x^\star) = \min_x f(x)$

Then for small enough $\eta_t$,

$$\| x^{(t+1)} - x^\star \|_2 < \| x^{(t)} - x^\star \|_2$$

i.e. it's a contraction mapping and it has a unique fixed point at $x^\star$.

That is, the negative subgradient step gets us closer to the minimizer.

Suppose $f$ is convex.

- Let $x^{(t+1)} = x^{(t)} - \eta g^{(t)}$ for $g^{(t)} \in \partial f(x^{(t)})$
- Let $x^\star$ be any point for which $f(x^\star) = \min_x f(x)$

Then

$$
\begin{aligned}
\| x^{(t+1)} - x^\star \|_2^2 &= \| x^{(t)} - \eta g^{(t)} - x^\star \|_2^2 \\
&= \| x^{(t)} - x^\star \|_2^2 - 2 \eta \, (g^{(t)})^\top (x^{(t)} - x^\star) + \eta^2 \| g^{(t)} \|_2^2 \\
&\le \| x^{(t)} - x^\star \|_2^2 - 2 \eta \big( f(x^{(t)}) - f(x^\star) \big) + \eta^2 \| g^{(t)} \|_2^2
\end{aligned}
$$

where we've used the definition of a subgradient, $f(x^\star) \ge f(x^{(t)}) + (g^{(t)})^\top (x^\star - x^{(t)})$, to get the inequality above.

Now consider $h(\eta) = - 2 \eta \big( f(x^{(t)}) - f(x^\star) \big) + \eta^2 \| g^{(t)} \|_2^2$.

- It's a convex quadratic in $\eta$ (facing upwards)
- Has zeros at $\eta = 0$ and $\eta = 2 \big( f(x^{(t)}) - f(x^\star) \big) / \| g^{(t)} \|_2^2$

Therefore, due to the intermediate value theorem and convexity (facing upwards), we know the term above is negative if

$$0 < \eta < \frac{2 \big( f(x^{(t)}) - f(x^\star) \big)}{\| g^{(t)} \|_2^2}$$

Hence,

$$\| x^{(t+1)} - x^\star \|_2^2 < \| x^{(t)} - x^\star \|_2^2$$

We then observe that if $f$ is $G$-Lipschitz, we have

$$\| g^{(t)} \|_2 \le G, \quad \forall t$$

With a *decreasing step-size* $\eta_t \to 0$ satisfying $\sum_t \eta_t = \infty$, the step-size eventually enters the required interval, so the **subgradient descent** is a contraction mapping with a fixed point at $x^\star$! Hence,

$$\lim_{t \to \infty} f(x^{(t)}) = f(x^\star)$$

If we're using a *fixed step-size* $\eta$, we cannot guarantee that the term considered in the proof above is negative (since at some point we get too close to $x^\star$, and so the *fixed* $\eta$ we're using is no longer in the required interval).

That is why we use a *decreasing step-size* in the theorem above.

Using a *fixed step-size* $\eta$, we still have the following bound on the best iterate after $k$ steps:

$$f(x^{(k)}_{\text{best}}) - f(x^\star) \le \frac{\| x^{(0)} - x^\star \|_2^2 + G^2 \eta^2 k}{2 \eta k} \xrightarrow{k \to \infty} \frac{G^2 \eta}{2}$$
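
A quick numerical illustration on $f(x) = |x|$ with fixed $\eta = 0.1$: the iterates stall at a floor proportional to the step-size (for $G$-Lipschitz $f$ the floor is $G^2 \eta / 2$; here $G = 1$, so the best value reached is $\eta / 2 = 0.05$):

```python
# Fixed-step subgradient descent on f(x) = |x| ends up oscillating
# around 0 at the scale of the step-size instead of converging.
eta = 0.1
x = 1.05
best = abs(x)
for _ in range(100):
    g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # subgradient of |x|
    x = x - eta * g
    best = min(best, abs(x))
print(best)  # ~0.05 = eta/2: the oscillation floor, not 0
```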

## Non-convex optimization

### Convex Relaxation

- Relax the problem such that we end up with a *convex* problem
- E.g. by relaxation of *sparse linear regression* you end up with Lasso regression
  - It's really $\ell_1$-regularized regression, but it can be viewed as an approximation to the non-convex $\ell_0$ "norm" $\| x \|_0$ (it's not *really* a norm)
- Can sometimes talk about the *relaxation gap*, which tells how different the optimal convex solution is from the non-convex one
- The relaxation approach can often be very hard to *scale*
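
As a concrete instance of the relaxation, here is a minimal ISTA (proximal gradient) sketch for the Lasso $\min_x \frac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1$; the problem data are randomly generated and the regularization weight is an arbitrary choice:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 -- the key ingredient of the l1 relaxation."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, steps=500):
    """ISTA for the Lasso: min_x 0.5*||Ax - b||^2 + lam*||x||_1 (a sketch)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]  # 3-sparse ground truth
b = A @ x_true
x_hat = ista(A, b, lam=0.1)
print(np.round(x_hat[:3], 2))  # close to the 3-sparse ground truth
```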

### TODO Convex Projections

- You're at Ch. 2.2 in jain17_non_convex_optim_machin_learn

## Under uncertainty

### Stochastic programming

#### Stochastic Linear Programming

A **two-stage stochastic linear program** is usually formulated as

$$\min_x \; c^\top x + \mathbb{E}_\xi \big[ Q(x, \xi) \big] \quad \text{s.t.} \quad A x = b, \; x \ge 0$$

where the second-stage (recourse) cost is

$$Q(x, \xi) = \min_y \; \big\{ q^\top y \;\big|\; T x + W y = h, \; y \ge 0 \big\}, \quad \xi = (q, T, W, h)$$

The *first stage* of the problem is the minimization above, which is performed *prior to the realization of the uncertain* parameter $\xi$.

The *second stage* is finding the variables $y$.
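
A tiny sketch with discrete scenarios (a newsvendor-style example with made-up numbers): the first stage picks an order quantity $x$ before demand is known; the second-stage recourse LP ("sell $y \le \min(x, d)$ at price $p$") solves in closed form to $\min(x, d)$, so we can minimize the expected cost by enumeration:

```python
# First-stage cost c per unit ordered; second-stage recourse sells
# min(x, d) units at price p under random demand d.
c, p = 1.0, 3.0
scenarios = [(0.3, 5), (0.5, 10), (0.2, 15)]  # (probability, demand)

def expected_cost(x):
    # first-stage cost + expected second-stage cost (negative revenue)
    return c * x + sum(prob * (-p * min(x, d)) for prob, d in scenarios)

best_x = min(range(0, 21), key=expected_cost)
print(best_x, expected_cost(best_x))  # 10 -15.5
```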

## Linear quadratic programming

- Class of dynamic optimization problems
- Related to the Kalman filter:
  - Recursive formulations of linear-quadratic control problems and Kalman filtering both involve matrix Riccati equations
  - Classical formulations of linear control and linear filtering problems make use of similar matrix decompositions

- "Linear" part refeers refers to the "motion" of the state
- "Quadratic" refers to the preferences