Optimization

Notation

  • Support of a vector $x$ is denoted by

    $\mathrm{supp}(x) := \{ i : x_i \neq 0 \}$

    A vector $x$ is referred to as $s$-sparse if $\| x \|_0 \leq s$

  • $\| x \|_0$ denotes the number of non-zero entries in $x$
  • $\sigma_1(A) \geq \sigma_2(A) \geq \dots$ denote $A$'s singular values
  • Frobenius norm of $A$

    $\| A \|_F = \sqrt{\sum_{i, j} A_{ij}^2} = \sqrt{\sum_i \sigma_i^2(A)}$

  • Nuclear norm of $A$

    $\| A \|_* = \sum_i \sigma_i(A)$

  • Spectral norm of $A$ (a small numpy check of these three norms follows this list)

    $\| A \|_2 = \sigma_1(A)$

  • Random variables are denoted with upper-case letters, e.g. $X$
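
A quick numerical sanity check of these matrix norms (a sketch using numpy; the matrix A below is an arbitrary example):

  import numpy as np

  A = np.array([[3.0, 0.0],
                [0.0, 4.0]])

  sigma = np.linalg.svd(A, compute_uv=False)  # singular values, sorted in decreasing order

  fro = np.linalg.norm(A, 'fro')   # Frobenius norm: sqrt of the sum of squared entries
  nuc = np.linalg.norm(A, 'nuc')   # nuclear norm: sum of singular values
  spec = np.linalg.norm(A, 2)      # spectral norm: largest singular value

  assert np.isclose(fro, np.sqrt(np.sum(sigma ** 2)))
  assert np.isclose(nuc, np.sum(sigma))
  assert np.isclose(spec, sigma[0])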

Definitions

Algebraic Riccati equation

The (algebraic) Riccati equation is the problem of finding an unknown matrix $X$ in

  • Continuous case:

    $A^\top X + X A - X B R^{-1} B^\top X + Q = 0$

  • Discrete case:

    $X = A^\top X A - (A^\top X B)(R + B^\top X B)^{-1}(B^\top X A) + Q$

where $A$, $B$, $Q$, and $R$ are known real coefficient matrices.
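
In practice these are rarely solved by hand; scipy ships direct solvers. A minimal sketch (the system matrices below are arbitrary toy values):

  import numpy as np
  from scipy.linalg import solve_continuous_are, solve_discrete_are

  # Toy system matrices (arbitrary values, for illustration only)
  A = np.array([[0.0, 1.0],
                [0.0, -0.5]])
  B = np.array([[0.0],
                [1.0]])
  Q = np.eye(2)           # state weight
  R = np.array([[1.0]])   # input weight

  X_cont = solve_continuous_are(A, B, Q, R)  # A^T X + X A - X B R^{-1} B^T X + Q = 0
  X_disc = solve_discrete_are(A, B, Q, R)    # discrete-time counterpart

  # Residual check for the continuous case
  residual = A.T @ X_cont + X_cont @ A - X_cont @ B @ np.linalg.inv(R) @ B.T @ X_cont + Q
  assert np.allclose(residual, 0.0, atol=1e-8)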

Lagrange multipliers

TODO Modern formulation via differentiable manifolds

Notation

  • $M$ is a smooth manifold
  • $f : M \to \mathbb{R}$ is a differentiable function we wish to maximize
  • optimization_2d8e2df8ccfef4be6866e20fb607661489515d5f.png
  • $\varphi$ is a local chart

Stuff

By an extension of Fermat's theorem, the local maxima can only be points where the exterior derivative vanishes, i.e. $\mathrm{d} f_x = 0$.

In particular, the extrema are a subset of the critical points of the function $f \circ \varphi^{-1}$.

This is just a mapping from $\mathbb{R}^n$ to $\mathbb{R}$.
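
As a concrete classical example (a sketch, identifying $M$ with a level set of a constraint function $g$, which is how the manifold usually arises): maximize $f(x, y) = x + y$ on the unit circle $M = \{ (x, y) : g(x, y) = x^2 + y^2 - 1 = 0 \}$. On $M$ (where $\nabla g \neq 0$),

$\mathrm{d}(f|_M)_p = 0 \iff \nabla f(p) = \lambda \, \nabla g(p) \iff (1, 1) = \lambda \, (2x, 2y)$

so $x = y = \frac{1}{2 \lambda}$; substituting into $g = 0$ gives $x = y = \pm \frac{1}{\sqrt{2}}$, and the maximum is attained at $\big( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \big)$ with value $\sqrt{2}$.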

Convexity

A convex combination of a set of $n$ vectors $x_1, \dots, x_n$ in an arbitrary real space is a vector

$\alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_n x_n$

where:

  • $\alpha_i \in \mathbb{R}$
  • $\alpha_i \geq 0$ for all $i$
  • $\sum_{i=1}^{n} \alpha_i = 1$

A convex set is a subset of an affine space that is closed under convex combinations.

More specifically, in a Euclidean space, a convex region is a region where, for every pair of points within the region, every point on the straight line segment that joins the pair of points is also within the region, i.e. we say the set $C$ is a convex set if and only if

$\forall x_1, x_2 \in C, \ \forall t \in [0, 1]: \quad t x_1 + (1 - t) x_2 \in C$

The boundary of a convex set is always a convex curve, i.e. a curve in the Euclidean plane which lies completely on one side of each of its tangents.

The intersection of all convex sets containing a given subset $A$ of Euclidean space is called the convex hull of $A$, i.e. the smallest convex set containing the subset $A$.

It's like a "convex closure" of $A$.

A continuously differentiable function $f$ is considered convex if

$f(y) \geq f(x) + \nabla f(x)^\top (y - x), \quad \forall x, y$

A continuously differentiable function $f$ is considered $\alpha$-strongly convex (SC) and $\beta$-strongly smooth (SS) if

$\frac{\alpha}{2} \| y - x \|_2^2 \ \leq\ f(y) - f(x) - \nabla f(x)^\top (y - x) \ \leq\ \frac{\beta}{2} \| y - x \|_2^2, \quad \forall x, y$
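
A quick numerical illustration of these inequalities (a sketch; the quadratic below is a made-up example, chosen because for $f(x) = \frac{1}{2} x^\top H x$ the constants are exactly $\alpha = \lambda_{\min}(H)$ and $\beta = \lambda_{\max}(H)$):

  import numpy as np

  rng = np.random.default_rng(0)

  # f(x) = 0.5 * x^T H x with H symmetric positive definite
  H = np.array([[2.0, 0.5],
                [0.5, 1.0]])
  f = lambda x: 0.5 * x @ H @ x
  grad_f = lambda x: H @ x

  alpha, beta = np.linalg.eigvalsh(H).min(), np.linalg.eigvalsh(H).max()

  for _ in range(1000):
      x, y = rng.standard_normal(2), rng.standard_normal(2)
      gap = f(y) - f(x) - grad_f(x) @ (y - x)   # first-order (Bregman) gap
      d2 = np.sum((y - x) ** 2)
      assert gap >= -1e-9                       # convexity: first-order condition
      assert alpha / 2 * d2 - 1e-9 <= gap <= beta / 2 * d2 + 1e-9   # alpha-SC and beta-SS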

Simplex

A simplex is a generalization of the notion of a triangle or a tetrahedron to arbitrary dimensions.

Specifically, a k-simplex is a k-dimensional polytope which is the convex hull of its $k + 1$ vertices.

More formally, suppose the $k + 1$ points $u_0, \dots, u_k \in \mathbb{R}^k$ are affinely independent, which means $u_1 - u_0, \dots, u_k - u_0$ are linearly independent. Then, the simplex determined by them is the set of points

$C = \left\{ \theta_0 u_0 + \dots + \theta_k u_k \ : \ \theta_i \geq 0, \ \sum_{i=0}^{k} \theta_i = 1 \right\}$
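
For instance, the standard 2-simplex (a triangle) is the convex hull of the three unit vectors; a small sketch that generates a random point of it (the variable names are illustrative):

  import numpy as np

  rng = np.random.default_rng(0)

  # Vertices of the standard 2-simplex: the unit vectors e_0, e_1, e_2
  U = np.eye(3)

  # A uniformly random point of the simplex: convex weights theta drawn from Dirichlet(1, 1, 1)
  theta = rng.dirichlet(alpha=np.ones(3))
  point = theta @ U   # theta_0 * u_0 + theta_1 * u_1 + theta_2 * u_2

  assert np.all(theta >= 0) and np.isclose(theta.sum(), 1.0)
  assert np.isclose(point.sum(), 1.0)   # points of this simplex have coordinates summing to 1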

Subgradient

Notation

Stuff

We say $g$ is a subgradient of $f$ at $x$ if

$f(y) \geq f(x) + g^\top (y - x), \quad \forall y$

where $g$ replaces the gradient in the definition of a convex function.

Observe that the line $f(x) + g^\top (y - x)$ is a global lower bound of $f$.

We say a function $f$ is subdifferentiable at $x$ if there exists at least one subgradient at $x$.

The set of all subgradients at $x$ is called the subdifferential: $\partial f(x)$.

Some remarks:

  • $f$ is convex and differentiable $\implies$ $\partial f(x) = \{ \nabla f(x) \}$.
  • At any point $x$, there can be 0, 1, or infinitely many subgradients.
  • If $\partial f(x) = \emptyset$ for some $x$, then $f$ is not convex.

If $0 \in \partial f(x)$, then $x$ is a global minimizer of $f$.

This is easily seen from the definition of a subgradient, since if $g = 0$, then $f(y) \geq f(x)$ for all $y$, i.e. $x$ is a global minimizer.

In general there might exist multiple global minimizers.
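
As an example of these definitions, a sketch for $f(x) = \| x \|_1$: at a non-zero coordinate the subdifferential contains only $\mathrm{sign}(x_i)$, while at a zero coordinate any value in $[-1, 1]$ works, so np.sign yields one valid subgradient. Note also that $0 \in \partial f(0)$, consistent with $x = 0$ being a global minimizer of $\| x \|_1$.

  import numpy as np

  def l1_subgradient(x):
      # One valid subgradient of f(x) = ||x||_1: sign(x_i) at non-zero coordinates,
      # and 0 (which lies in [-1, 1]) at zero coordinates.
      return np.sign(x)

  x = np.array([1.5, 0.0, -2.0])
  g = l1_subgradient(x)   # -> [ 1.,  0., -1.]

  # Check the subgradient inequality f(y) >= f(x) + g^T (y - x) at random points
  rng = np.random.default_rng(0)
  f = lambda v: np.abs(v).sum()
  for _ in range(1000):
      y = rng.standard_normal(3)
      assert f(y) >= f(x) + g @ (y - x) - 1e-12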

Subgradient Descent

  1. Suppose $f$ is convex, and we start optimizing at $x^{(0)}$
  2. Repeat
    1. Step in a negative subgradient direction:

      $x^{(k+1)} = x^{(k)} - t_k \, g^{(k)}$

      where $t_k$ is the step-size and $g^{(k)} \in \partial f(x^{(k)})$ (a small numpy sketch of this loop follows the list).
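
A minimal sketch of this loop, on a made-up non-differentiable convex objective $f(x) = \| A x - b \|_1$ with a decreasing step-size (all problem data below is arbitrary):

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy convex, non-differentiable objective: f(x) = ||A x - b||_1
  A = rng.standard_normal((20, 5))
  b = rng.standard_normal(20)
  f = lambda x: np.abs(A @ x - b).sum()

  def subgradient(x):
      # A^T sign(A x - b) is a valid subgradient of f at x
      return A.T @ np.sign(A @ x - b)

  x = np.zeros(5)   # x^(0)
  f_best, x_best = f(x), x.copy()

  for k in range(1, 5001):
      t_k = 0.1 / np.sqrt(k)              # decreasing step-size
      x = x - t_k * subgradient(x)
      if f(x) < f_best:                   # a subgradient step need not decrease f,
          f_best, x_best = f(x), x.copy() # so we keep track of the best iterate

  print(f_best)

Note that, unlike a gradient step on a differentiable function, a subgradient step is not guaranteed to decrease $f$, which is why the best iterate is tracked explicitly.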

The following theorem tells us that the subgradient gets us closer to the minimizer.

Suppose $f$ is convex.

  • Let $x^+ = x - t \, g$ for $g \in \partial f(x)$
  • Let $x^*$ be any point for which $f(x^*) < f(x)$

Then for small enough $t$,

$\| x^+ - x^* \|_2 < \| x - x^* \|_2$

i.e. it's a contraction mapping and it has a unique fixed point at

$x = x^*$

That is, the negative subgradient step gets us closer to the minimizer.

Suppose $f$ is convex.

  • Let $x^+ = x - t \, g$ for $g \in \partial f(x)$
  • Let $x^*$ be any point for which $f(x^*) < f(x)$

Then

$\| x^+ - x^* \|_2^2 = \| x - t g - x^* \|_2^2 = \| x - x^* \|_2^2 - 2 t \, g^\top (x - x^*) + t^2 \| g \|_2^2 \leq \| x - x^* \|_2^2 - 2 t \big( f(x) - f(x^*) \big) + t^2 \| g \|_2^2$

where we've used the definition of a subgradient to get the inequality above.

Now consider $-2 t \big( f(x) - f(x^*) \big) + t^2 \| g \|_2^2$ as a function of $t$.

  • It's a convex quadratic in $t$ (facing upwards)
  • Has zeros at

    $t = 0 \quad \text{and} \quad t = \frac{2 \big( f(x) - f(x^*) \big)}{\| g \|_2^2}$

  • Therefore, due to the intermediate value theorem and convexity (facing upwards), we know the term above is negative if

    $0 < t < \frac{2 \big( f(x) - f(x^*) \big)}{\| g \|_2^2}$

    Hence,

    $\| x^+ - x^* \|_2^2 < \| x - x^* \|_2^2$

We then observe that if $f$ is Lipschitz, i.e.

$| f(x) - f(y) | \leq G \| x - y \|_2, \quad \forall x, y$

with $G < \infty$, then subgradient descent using a decreasing step-size is a contraction mapping with a fixed point at $x^*$! Hence,

$f\big( x^{(k)} \big) \to f(x^*) \quad \text{as} \quad k \to \infty$

If we're using a fixed step-size, we cannot guarantee that the term considered in the proof above is negative (since at some point we get too close to $x^*$, and so the fixed $t$ we're using is no longer in the required interval).

That is why we use a decreasing step-size in the theorem above.

Using a fixed step-size $t$, we still have the following bound

$f\big( x^{(k)}_{\text{best}} \big) - f(x^*) \ \leq\ \frac{\| x^{(0)} - x^* \|_2^2}{2 t k} + \frac{G^2 t}{2}, \quad \text{where} \quad f\big( x^{(k)}_{\text{best}} \big) = \min_{i = 0, \dots, k} f\big( x^{(i)} \big)$

Non-convex optimization

Convex Relaxation

  • Relax the problem such that we end up with a convex problem
  • E.g. relaxation of sparse linear regression, where you end up with Lasso regression
    • It's really $\ell_1$-regularized regression, but as you can read at that link, it can be viewed as an approximation to the non-convex $\ell_0$ "norm", $\| x \|_0$ (it's not really a norm); a small numerical sketch of the relaxed problem follows this list
  • Can sometimes talk about the relaxation gap, which tells us how different the optimal solution of the convex relaxation is from that of the non-convex problem
  • The relaxation approach can often be very hard to scale
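
A minimal numerical sketch of the relaxed ($\ell_1$) problem, solved with ISTA, i.e. proximal gradient with soft-thresholding, which is one standard way to handle the Lasso objective (problem sizes, noise level, and regularization strength below are arbitrary):

  import numpy as np

  rng = np.random.default_rng(0)

  # Sparse linear regression data: only 3 of the 50 coefficients are non-zero
  n, p = 100, 50
  X = rng.standard_normal((n, p))
  w_true = np.zeros(p)
  w_true[:3] = [2.0, -3.0, 1.5]
  y = X @ w_true + 0.1 * rng.standard_normal(n)

  # Lasso: min_w 0.5/n * ||X w - y||_2^2 + lam * ||w||_1  (the l1 relaxation of the l0 problem)
  lam = 0.1
  step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant of the smooth part

  def soft_threshold(v, thresh):
      return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

  w = np.zeros(p)
  for _ in range(500):
      grad = X.T @ (X @ w - y) / n
      w = soft_threshold(w - step * grad, step * lam)   # ISTA update

  print(np.flatnonzero(np.abs(w) > 1e-3))   # approximately recovers the true support {0, 1, 2}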

TODO Convex Projections

Under uncertainty

Stochastic programming

Stochastic Linear Programming

A two-stage stochastic linear program is usually formulated as

$\min_{x} \ c^\top x + \mathbb{E}_{\xi} \big[ Q(x, \xi) \big] \quad \text{s.t.} \quad A x = b, \ x \geq 0$

where $Q(x, \xi) = \min_{y} \big\{ q(\xi)^\top y \ : \ T(\xi) x + W(\xi) y = h(\xi), \ y \geq 0 \big\}$ is the optimal value of the second-stage problem.

The first stage of the problem is the minimization over $x$ above, which is performed prior to the realization of the uncertain parameter $\xi$.

The second stage is finding the (recourse) variables $y(\xi)$ once $\xi$ has been realized.
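
A minimal sketch of this two-stage structure, solved through its deterministic equivalent over a finite scenario set with scipy.optimize.linprog (all costs and demand scenarios below are made up):

  import numpy as np
  from scipy.optimize import linprog

  # First stage: order x units at unit cost 1, before demand is known.
  # Second stage (recourse): in scenario s with demand d_s, buy the shortfall
  # y_s >= d_s - x at unit penalty 5.
  demands = np.array([1.0, 2.0, 3.0])
  probs = np.array([0.3, 0.4, 0.3])
  order_cost, penalty = 1.0, 5.0

  # Deterministic equivalent over z = [x, y_1, y_2, y_3]:
  #   min  order_cost * x + sum_s probs[s] * penalty * y_s
  #   s.t. -x - y_s <= -d_s  (i.e. y_s >= d_s - x),  x >= 0, y_s >= 0
  c = np.concatenate(([order_cost], penalty * probs))
  A_ub = np.hstack([-np.ones((3, 1)), -np.eye(3)])
  b_ub = -demands

  res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
  x_here_and_now = res.x[0]   # first-stage decision; the y_s are the wait-and-see recourse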

Linear quadratic programming

  • Class of dynamic optimization problems
  • Related to Kalman filter:
    • Recursive formulations of linear-quadratic control problems and Kalman filtering both involve matrix Riccati equations
    • Classical formulations of linear control and linear filtering problems make use of similar matrix decompositions
  • "Linear" part refeers refers to the "motion" of the state
  • "Quadratic" refers to the preferences