Normalizing flows

"Why the name?!"

I've received this question or a similar one on more than one occasion. "Normalizing flows" is really not a good name for what it is supposed to represent. It is also unclear whether it refers to the base distribution together with the transformation, or just to the transformation itself, ignoring the base distribution being transformed.

• "Is this related to gradient flows in differential equations and manifolds?" This is a completely valid question, because that is indeed what it sounds like! I'd say something like: "Well, kind of, but also not really. A gradient flow could technically be used in a normalizing flow, but it's way too strict a requirement; you really just need a differentiable bijection with a differentiable inverse. A gradient flow is indeed this (at least locally), but you can also just have, say, addition of a constant."

Today, I'd say a normalizing flow is a piecewise [[file:..mathematicsgeometry.org::def:diffeomorphism][diffeomorphic]] [[file:..mathematicsmeasuretheory.org::def:push-forward-measure][push-forward]], or in simpler terms, a differentiable function with a differentiable inverse, together with a base distribution that has a density.
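A minimal sketch of this definition in Python, assuming a one-dimensional standard normal base distribution and an affine map `x = scale * z + shift` as the diffeomorphism. Both choices, and the names `forward`, `inverse`, `log_density` and `sample`, are hypothetical and picked only for illustration. The density of the transformed variable follows from the change-of-variables formula: evaluate the base density at the inverse image, then correct by the log absolute derivative of the inverse.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def base_log_density(z):
    """Log density of the standard normal base distribution (an assumed choice)."""
    return -0.5 * z * z - LOG_SQRT_2PI


def forward(z, scale=2.0, shift=1.0):
    """Differentiable bijection mapping a base sample z to x (requires scale != 0)."""
    return scale * z + shift


def inverse(x, scale=2.0, shift=1.0):
    """Differentiable inverse of `forward`."""
    return (x - shift) / scale


def log_density(x, scale=2.0, shift=1.0):
    """Log density of x = forward(z) with z drawn from the base distribution."""
    z = inverse(x, scale, shift)
    # d(inverse)/dx = 1 / scale, so the log-Jacobian term is -log|scale|.
    return base_log_density(z) - math.log(abs(scale))


def sample(rng, scale=2.0, shift=1.0):
    """Draw from the flow: sample the base, then push it through `forward`."""
    return forward(rng.gauss(0.0, 1.0), scale, shift)


rng = random.Random(0)
xs = [sample(rng) for _ in range(5)]
```

With `scale = 1` the map reduces to adding a constant, the simplest example of such a bijection; with the defaults used here the result is a normal density with mean 1 and standard deviation 2.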

So, why isn't it just called that? It seems the term "normalizing flows" was popularized in 2015 by rezende15_variat_infer_with_normal_flows. This paper refers to the "method of normalizing flows" from tabak2010density, in which we can probably find the first use of the term. In this work they do the following:

1. Define

where is a known density and denotes the Jacobian of the map .

2. Define the mapping as an "infinite composition of infinitesimal transformations", i.e. a (gradient) flow s.t.

3. Define

then

4. Given a set of samples , we can measure the quality of the estimated density by the log-likelihood, treating it as a functional on :

This suggests constructing the flow by following the direction of ascent of , i.e. s.t.

and such that is the (local) minimizer.

where

Note that

Basically, in this work they use the log-likelihood to define a gradient flow which transforms a known density into the unknown density. They also consider the dual of the flow, which transforms the unknown density into the known one, and which they refer to as "transforming to normality", i.e. a normalizing flow in the sense that it's a flow which transforms a density to normality / a normal distribution. For practical purposes they consider "infinitesimally" small additive changes, which is basically what we today refer to as residual normalizing flows behrmann18_inver_resid_networ. They also point out that the work on "Gaussianization", which was done in 2002, follows a similar idea, though not using flows.
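To make the construction above a bit more concrete, here is a sketch in notation assumed for illustration (it is not copied from tabak2010density): $\rho$ is the unknown density, $\mu$ the known density, $\phi_t$ the flow with limit $\phi$, $J_{\phi}$ the absolute Jacobian determinant of $\phi$, $x_1, \dots, x_m$ the samples, and $L$ the log-likelihood.

```latex
\begin{align*}
  % change of variables under the map \phi, with J_\phi = |\det \nabla \phi|
  \tilde{\rho}(x) &= J_{\phi}(x) \, \mu\big(\phi(x)\big) \\
  % the flow starts at the identity and converges to \phi
  \phi_0(x) &= x, \qquad \lim_{t \to \infty} \phi_t(x) = \phi(x) \\
  % density estimate along the flow; at t = 0 it is the known density
  \tilde{\rho}_t(x) &= J_{\phi_t}(x) \, \mu\big(\phi_t(x)\big), \qquad \tilde{\rho}_0 = \mu \\
  % average log-likelihood of the samples, non-decreasing along an ascent flow
  L[\phi_t] &= \frac{1}{m} \sum_{j=1}^{m} \log \tilde{\rho}_t(x_j), \qquad \frac{d}{dt} L[\phi_t] \ge 0
\end{align*}
```

Read this way, $\tilde{\rho}_t$ starts at the known density $\mu$ and, ideally, moves towards the unknown density $\rho$, while the density of $\phi_t(x)$ for $x$ drawn from $\rho$ moves in the opposite direction, from $\rho$ towards $\mu$. The latter is the "normalizing" direction.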