Variational Inference


In variational inference, the posterior distribution over a set of unobserved variables $z = z_{1:m}$ given some data $x$ is approximated by a variational distribution $q(z)$:

$$p(z \mid x) \approx q(z)$$

The distribution $q(z)$ is restricted to belong to a family of distributions of simpler form than $p(z \mid x)$, selected with the intention of making $q(z)$ similar to the true posterior, $p(z \mid x)$.

We're basically making life simpler for ourselves by casting approximate conditional inference as an optimization problem.

The evidence lower bound (ELBO)

  • Specify a family $\mathcal{Q}$ of densities over the latent variables
  • Each $q(z) \in \mathcal{Q}$ is a candidate approximation to the exact conditional distribution
  • Goal is then to find the best candidate, i.e. the one closest in KL divergence to the exact conditional distribution

$$q^*(z) = \operatorname*{arg\,min}_{q(z) \in \mathcal{Q}} \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$

  • $q^*(z)$ is the best approx. to the conditional distribution within that family
  • However, the equation above requires us to compute the log evidence $\log p(x)$, which may be intractable, OR DOES IT?!

$$\mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_q\big[\log q(z)\big] - \mathbb{E}_q\big[\log p(z, x)\big] + \log p(x)$$

  • Buuut, since we want to minimize wrt. $q(z)$, we can simply minimize the above without having to worry about $\log p(x)$!
  • By dropping the constant term (wrt. $q$) and moving the sign outside, we get

$$\mathrm{ELBO}(q) = \mathbb{E}_q\big[\log p(z, x)\big] - \mathbb{E}_q\big[\log q(z)\big]$$

  • Thus, maximizing the $\mathrm{ELBO}$ is equivalent to minimizing the $\mathrm{KL}$ divergence

Why use the above representation of the ELBO as our objective function? Because $\log p(z, x)$ can be rewritten as $\log p(x \mid z) + \log p(z)$, thus we simply need to come up with:

  • A model for the likelihood given some latent variables, $p(x \mid z)$
  • A prior probability for the latent variables, $p(z)$

We can rewrite the $\mathrm{ELBO}$ to give us some intuition about the optimal variational density:

$$\mathrm{ELBO}(q) = \mathbb{E}_q\big[\log p(x \mid z)\big] - \mathrm{KL}\big(q(z) \,\|\, p(z)\big)$$

  • Basically says: "the evidence lower bound combines maximization of the likelihood with minimization of the divergence from the prior"
    • $\mathbb{E}_q\big[\log p(x \mid z)\big]$ encourages densities that increase the likelihood of the data
    • $-\mathrm{KL}\big(q(z) \,\|\, p(z)\big)$ encourages densities close to the prior
  • Another property (and the reason for the name) is that it puts a lower bound on the (log) evidence, $\log p(x)$:

$$\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) \ge \mathrm{ELBO}(q)$$

  • Which means that if we increase the $\mathrm{ELBO}$ => the $\mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)$ must decrease (since $\log p(x)$ is fixed wrt. $q$)
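
To sanity-check the decomposition numerically, here's a tiny made-up two-state discrete model (my own toy numbers, not from any reference) verifying $\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q \,\|\, p(z \mid x)\big)$ for an arbitrary $q$:

import numpy as np

# Toy discrete model: z ∈ {0, 1}, one observation x.
p_z = np.array([0.6, 0.4])          # prior p(z)
p_x_given_z = np.array([0.2, 0.9])  # likelihood p(x | z) of the observed x

p_joint = p_z * p_x_given_z         # p(z, x) evaluated at the observed x
p_x = p_joint.sum()                 # evidence p(x)
p_z_given_x = p_joint / p_x         # exact posterior p(z | x)

q = np.array([0.5, 0.5])            # an arbitrary variational distribution

elbo = np.sum(q * (np.log(p_joint) - np.log(q)))
kl = np.sum(q * (np.log(q) - np.log(p_z_given_x)))

# log p(x) = ELBO(q) + KL(q || p(z | x)) >= ELBO(q)
assert np.isclose(np.log(p_x), elbo + kl)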

Examples of families

Mean-field variational family

  • Assumes the latent variables $z_j$ to be mutually independent, i.e.

$$q(z) = \prod_{j=1}^{m} q_j(z_j)$$

  • Does not model the observed data; $x$ does not appear in the equation => it's the $\mathrm{ELBO}$ which connects the fitted variational density to the data

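For example, for a Bayesian GMM with mixture means $\mu_k$ and cluster assignments $c_i$ (the example alluded to in the pseudo-code below), the standard mean-field family factorizes as

$$q(\mu, c) = \prod_{k=1}^{K} q(\mu_k; m_k, s_k^2) \prod_{i=1}^{n} q(c_i; \varphi_i)$$

with variational parameters $m_k$, $s_k^2$ and $\varphi_i$.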

Coordinate ascent mean-field variational inference (CAVI)


  • $x = x_{1:n}$ is the data
  • $z = z_{1:m}$ are the latent variables of our model for $x$
  • $q(z)$ is the variational density
  • $q_j(z_j)$ is a variational factor
  • $\mathbb{E}_{-j}[\cdot]$ denotes the expectation over all factors of $q$ except the $j$-th, i.e. the $j$-th factor is held fixed

$$q_j^*(z_j) \propto \exp\big\{\mathbb{E}_{-j}\big[\log p(z_j \mid z_{-j}, x)\big]\big\}$$

Based on coordinate ascent, thus no gradient information required.

Iteratively optimizes each factor of the mean-field variational density, while holding others fixed. Thus, climbing the ELBO to a local optimum.


  • Input: Model $p(x, z)$, data set $x$
  • Output: Variational density $q(z) = \prod_{j=1}^{m} q_j(z_j)$

Then the actual algorithm goes as follows:
  1. Initialize the variational factors $q_j(z_j)$
  2. While the $\mathrm{ELBO}$ has not converged:
     a. For each $j \in \{1, \dots, m\}$, set $q_j(z_j) \propto \exp\big\{\mathbb{E}_{-j}[\log p(z_j \mid z_{-j}, x)]\big\}$
     b. Compute $\mathrm{ELBO}(q)$
  3. Return $q(z)$

Pseudo implementation in Python

Which I believe could look something like this when written in Python:

import numpy as np

class UpdateableModel:
    """A single variational factor q_j(z_j), updated in place by CAVI."""

    def __init__(self):
        self.latent_var = np.random.rand()

    def __call__(self, z):
        # evaluate q_j(z_j); under the mean-field approximation the factors
        # are independent, so q(z_j | z_{l != j}) = q_j(z_j)
        # (placeholder density; a real implementation would evaluate the
        # current variational factor at z)
        return self.latent_var

def proba_product(qs, z):
    # compute q_1(z_1) * q_2(z_2) * ... * q_m(z_m)
    res = 1.0
    for q_j, z_j in zip(qs, z):
        res *= q_j(z_j)
    return res

def compute_elbo(q, p_joint, z, x):
    # ELBO(q) = E_q[log p(z, x)] - E_q[log q(z)]
    joint_expect = 0.0
    latent_expect = 0.0

    for j in range(len(z)):
        q_j, z_j = q[j], z[j]
        joint_expect += q_j(z_j) * np.log(p_joint(z=z, x=x))
        latent_expect += q_j(z_j) * np.log(q_j(z_j))

    return joint_expect - latent_expect

def cavi(model, z, x, epsilon=1e-4):
    """
    model : callable
        Computes our model probability p(z, x) or p(z_j | z_{l != j}, x).
    z : array-like
        Initialized values for latent variables, e.g. for a GMM we would have
        mu = z[0], sigma = z[1].
    x : array-like
        Represents the data.
    """
    m = len(z)
    q = [UpdateableModel() for _ in range(m)]

    prev_elbo, elbo = -np.inf, compute_elbo(q, model, z=z, x=x)

    # iterate until the ELBO stops improving
    while abs(elbo - prev_elbo) > epsilon:
        for j in range(m):
            # expectation of log p wrt. all factors EXCEPT the j-th,
            # which is held fixed
            expect_log = 0.0
            for j2 in range(m):
                if j2 != j:
                    expect_log += q[j2](z[j2]) * np.log(model(fixed=j, z=z, x=x))
            q[j].latent_var = np.exp(expect_log)

        prev_elbo = elbo
        elbo = compute_elbo(q, model, z=z, x=x)

    return q
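
And a (contrived) usage sketch: toy_model below is a hypothetical stand-in just to illustrate the calling convention the pseudo-code assumes, not a statistically meaningful joint:

def toy_model(z, x, fixed=None):
    # pretend "joint": a Gaussian-ish score in (0, 1], so its log is finite
    mu, sigma = z[0], abs(z[1]) + 1e-3
    return float(np.exp(-0.5 * np.mean((np.asarray(x) - mu) ** 2) / sigma ** 2))

q = cavi(toy_model, z=[0.0, 1.0], x=[0.3, -0.1, 0.5])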

Mean-field approx. → assume latent variables are independent of each other → $q(z)$ can simply be represented as a vector with the $j$-th entry being an independent model corresponding to $q_j(z_j)$

Derivation of update

We simply rewrite the ELBO as a function of $q_j$, since the independence between the variational factors $q_j$ implies that we can maximize the ELBO wrt. each of those separately.

$$\mathrm{ELBO}(q_j) = \mathbb{E}_j\Big[\mathbb{E}_{-j}\big[\log p(z_j, z_{-j}, x)\big]\Big] - \mathbb{E}_j\big[\log q_j(z_j)\big] + \mathrm{const}$$

where we have written the expectation $\mathbb{E}\big[\log p(z_j, z_{-j}, x)\big]$ wrt. $q(z)$ using iterated expectation.

Recognizing this as a negative KL divergence up to a constant, the optimal $j$-th factor is

$$q_j^*(z_j) \propto \exp\big\{\mathbb{E}_{-j}\big[\log p(z_j, z_{-j}, x)\big]\big\} \propto \exp\big\{\mathbb{E}_{-j}\big[\log p(z_j \mid z_{-j}, x)\big]\big\}$$

which is exactly the CAVI update from before.

Score-function gradient estimator (REINFORCE)

  • Can obtain an unbiased gradient estimator by sampling from the variational distribution $q(z; \lambda)$ without having to compute the ELBO analytically
  • Only requires computation of the score function of the variational posterior: $\nabla_\lambda \log q(z; \lambda)$
  • Given by

    $$\nabla_\lambda\, \mathrm{ELBO}(\lambda) = \mathbb{E}_{q(z; \lambda)}\Big[\nabla_\lambda \log q(z; \lambda)\,\big(\log p(x, z) - \log q(z; \lambda)\big)\Big]$$

    • Unbiased estimator obtained by sampling from $q(z; \lambda)$
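
As a concrete sketch of this estimator, consider a toy conjugate model chosen purely for illustration ($p(z) = \mathcal{N}(0, 1)$, $p(x \mid z) = \mathcal{N}(z, 1)$, a single datum) with a Gaussian variational posterior $q(z; \mu, \log\sigma)$; none of this is tied to any particular library:

import numpy as np

rng = np.random.default_rng(0)
x = 1.5  # single observation; exact posterior is N(x/2, 1/2)

def log_joint(z):
    # log p(x, z) up to additive constants
    return -0.5 * z**2 - 0.5 * (x - z)**2

def score_grad_estimate(mu, log_sigma, num_samples=100_000):
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=num_samples)
    # score function ∇_λ log q(z; λ) for λ = (mu, log_sigma)
    score_mu = (z - mu) / sigma**2
    score_log_sigma = (z - mu)**2 / sigma**2 - 1.0
    # log p(x, z) - log q(z; λ); dropped constants don't bias the estimate
    weight = log_joint(z) - (-0.5 * ((z - mu) / sigma)**2 - log_sigma)
    return np.mean(score_mu * weight), np.mean(score_log_sigma * weight)

print(score_grad_estimate(0.0, 0.0))  # ≈ (1.5, -1.0) for this toy model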

Reparametrization trick

  • $q(z; \lambda)$ needs to be reparametrizable
    • E.g. $q(z; \lambda) = \mathcal{N}(z; \mu, \sigma^2)$ and

      $$z = \mu + \sigma \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, 1)$$

      where $\mu$ and $\sigma$ are the parametrized mean and std. dev.

  • Then we instead do the following:

    $$\nabla_\lambda\, \mathbb{E}_{q(z; \lambda)}\big[f(z)\big] = \mathbb{E}_{p(\varepsilon)}\Big[\nabla_\lambda f\big(g(\varepsilon, \lambda)\big)\Big]$$

    where $g$ is some deterministic function and $p(\varepsilon)$ is the "noise" distribution

  • Observe that this then also takes advantage of the structure of the joint distribution
    • But it also requires the joint distribution to be differentiable wrt. $z$
  • Often the entropy of $q(z; \lambda)$ can be computed analytically, which reduces the variance of the gradient estimate since we only have to estimate the expectation of the first term
    • Recall that

      $$\mathrm{ELBO}(q) = \mathbb{E}_{q(z; \lambda)}\big[\log p(x, z)\big] + \mathbb{H}\big[q(z; \lambda)\big]$$

    • E.g. in ADVI, where we use a Gaussian mean-field approx., the entropy term reduces to $\sum_k \omega_k$ (up to an additive constant), where $\sigma_k = e^{\omega_k}$
  • Assumptions
    • $g(\varepsilon, \lambda)$ must be differentiable
    • $\log p(x, z)$ must be differentiable wrt. $z$
  • Notes
    • ✓ Usually lower variance than REINFORCE, and can potentially be reduced further if an analytical entropy is available
    • ✗ Being reparametrizable (and this reparametrization being differentiable) limits the family of variational posteriors
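
For comparison, the same toy model as in the REINFORCE sketch above, now with the reparametrization trick ($z = \mu + \sigma\varepsilon$, $\varepsilon \sim \mathcal{N}(0, 1)$) and the entropy handled analytically as discussed; again just a self-contained sketch:

import numpy as np

rng = np.random.default_rng(0)
x = 1.5

def reparam_grad_estimate(mu, log_sigma, num_samples=1000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(num_samples)
    z = mu + sigma * eps
    # ∂ log p(x, z) / ∂z for log p(x, z) = -0.5 z² - 0.5 (x - z)² + const
    dlogp_dz = -z + (x - z)
    # chain rule: ∂z/∂mu = 1, ∂z/∂log_sigma = sigma * eps;
    # the analytical entropy log σ + const contributes +1 to the log_sigma gradient
    grad_mu = np.mean(dlogp_dz)
    grad_log_sigma = np.mean(dlogp_dz * sigma * eps) + 1.0
    return grad_mu, grad_log_sigma

print(reparam_grad_estimate(0.0, 0.0))  # ≈ (1.5, -1.0), with far fewer samples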

Differentiable lower bounds of the ELBO


  • Automatic Differentiation Variational Inference (ADVI)
    • Objective:

      $$\mathcal{L}(\mu, \omega) = \mathbb{E}_{\eta \sim \mathcal{N}(0, I)}\Big[\log p\big(x, T^{-1}(\zeta)\big) + \log \big|\det J_{T^{-1}}(\zeta)\big|\Big] + \mathbb{H}\big[q(\zeta; \mu, \omega)\big], \qquad \zeta = \mu + e^{\omega} \odot \eta$$

      where $T$ maps the support of the latent variables to all of $\mathbb{R}$

    • Gradient estimate:

      $$\nabla_{\mu, \omega}\, \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_{\mu, \omega}\Big[\log p\big(x, T^{-1}(\zeta^{(s)})\big) + \log \big|\det J_{T^{-1}}(\zeta^{(s)})\big|\Big] + \nabla_{\mu, \omega}\, \mathbb{H}\big[q(\zeta; \mu, \omega)\big], \qquad \zeta^{(s)} = \mu + e^{\omega} \odot \eta^{(s)}$$

    • Variational posterior:
      • Gaussian Mean-field approx.
    • Assumptions:
      • $p(x, z)$ is transformable in each component, i.e. the support of each $z_i$ can be mapped to $\mathbb{R}$
      • $z_i$ and $z_j$ are independent under the variational posterior (mean-field)
    • Notes:
      • ✓ If applicable, it just works
      • ✓ Easy to implement
      • ✗ Assumes independence
      • ✗ Restrictive in choice of variational posterior $q(z; \lambda)$ (i.e. Gaussian mean-field)
  • Black-box Variational Inference (BBVI)
    • Objective

      $$\mathcal{L}(\lambda) = \mathbb{E}_{q(z; \lambda)}\big[\log p(x, z) - \log q(z; \lambda)\big]$$

    • Gradient estimate

      $$\nabla_\lambda\, \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q\big(z^{(s)}; \lambda\big)\Big(\log p\big(x, z^{(s)}\big) - \log q\big(z^{(s)}; \lambda\big)\Big), \qquad z^{(s)} \sim q(z; \lambda)$$

      • Using variance-reduction techniques (Rao-Blackwellization and control variates), for each $\lambda_i$,

        $$\nabla_{\lambda_i}\, \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_{\lambda_i} \log q_i\big(z_i^{(s)}; \lambda_i\big)\Big(\log p_i\big(x, z_{(i)}^{(s)}\big) - \log q_i\big(z_i^{(s)}; \lambda_i\big) - \hat{a}_i^*\Big)$$

        where $p_i$ collects the terms of the joint involving the Markov blanket of $z_i$ (see "Controlling the variance" below)

    • Variational posterior
      • Any family we can sample from and whose score function $\nabla_\lambda \log q(z; \lambda)$ we can evaluate
    • Assumptions
      • Can sample $z \sim q(z; \lambda)$ and evaluate $\log p(x, z)$ pointwise
    • Notes
      • ✓ Black-box: no differentiability requirements on the model, so it also handles discrete latent variables
      • ✗ Gradient estimator usually has high variance, even with variance-reduction techniques
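
Before the full Julia implementation below, here is a minimal numpy sketch of the ADVI-style objective for a single positive-constrained latent variable (a toy model of my own: $\sigma \sim \mathrm{Exponential}(1)$, $x_i \sim \mathcal{N}(0, \sigma^2)$, with transform $\zeta = T(\sigma) = \log\sigma$):

import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 0.8])

def log_joint(sigma):
    # log p(x, sigma) up to additive constants:
    # Exponential(1) prior plus Gaussian likelihood
    return -sigma + np.sum(-0.5 * (x / sigma)**2 - np.log(sigma))

def elbo_estimate(mu, omega, num_samples=1000):
    eta = rng.standard_normal(num_samples)
    zeta = mu + np.exp(omega) * eta  # reparametrized sample in R
    sigma = np.exp(zeta)             # map back to the constrained space
    log_det_jac = zeta               # log|det J_{T⁻¹}(ζ)| = ζ for the exp transform
    entropy = omega                  # Gaussian entropy up to an additive constant
    return np.mean([log_joint(s) + ld for s, ld in zip(sigma, log_det_jac)]) + entropy

print(elbo_estimate(0.0, -1.0))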

Automatic Differentiation Variational Inference (ADVI)

Example implementation

using ForwardDiff
using Flux.Tracker
using Flux.Optimise
using DiffResults, ProgressMeter
using LinearAlgebra: norm, logabsdet

# (assumes the surrounding Turing.jl internals are in scope: `Model`, `VarInfo`,
# `MeanField`, `ELBO`, `ADBackend`, `invlink`, etc.)

"""
    ADVI(samples_per_step = 10, max_iters = 5000)

Automatic Differentiation Variational Inference (ADVI) for a given model.
"""
struct ADVI{AD} <: VariationalInference{AD}
    samples_per_step # number of samples used to estimate the ELBO in each optimization step
    max_iters        # maximum number of gradient steps used in optimization
end

ADVI(args...) = ADVI{ADBackend()}(args...)
ADVI() = ADVI(10, 5000)

alg_str(::ADVI) = "ADVI"

vi(model::Model, alg::ADVI; optimizer = ADAGrad()) = begin
    # setup
    var_info = VarInfo()
    model(var_info, SampleFromUniform())
    num_params = size(var_info.vals, 1)

    dists = var_info.dists
    ranges = var_info.ranges

    q = MeanField(zeros(num_params), zeros(num_params), dists, ranges)

    # construct objective
    elbo = ELBO()

    Turing.DEBUG && @debug "Optimizing ADVI..."
    θ = optimize(elbo, alg, q, model; optimizer = optimizer)
    μ, ω = θ[1:length(q)], θ[length(q) + 1:end]

    # TODO: make mutable instead?
    MeanField(μ, ω, dists, ranges)
end

# TODO: implement optimize like this?
# (advi::ADVI)(elbo::EBLO, q::MeanField, model::Model) = begin
# end

function optimize(elbo::ELBO, alg::ADVI, q::MeanField, model::Model; optimizer = ADAGrad())
    θ = randn(2 * length(q))
    optimize!(elbo, alg, q, model, θ; optimizer = optimizer)

    return θ
end

function optimize!(elbo::ELBO, alg::ADVI{AD}, q::MeanField, model::Model, θ; optimizer = ADAGrad()) where AD
    alg_name = alg_str(alg)
    samples_per_step = alg.samples_per_step
    max_iters = alg.max_iters

    # number of previous gradients to use to compute `s` in adaGrad
    stepsize_num_prev = 10

    # setup
    # var_info = Turing.VarInfo()
    # model(var_info, Turing.SampleFromUniform())
    # num_params = size(var_info.vals, 1)
    num_params = length(q)

    # # buffer
    # θ = zeros(2 * num_params)

    # HACK: re-use previous gradient `acc` if equal in value
    # Can cause issues if two entries have identical values
    if θ ∉ keys(optimizer.acc)
        vs = [v for v ∈ keys(optimizer.acc)]
        idx = findfirst(w -> vcat(q.μ, q.ω) == w, vs)
        if idx !== nothing
            @info "[$alg_name] Re-using previous optimizer accumulator"
            θ .= vs[idx]
        end
    else
        @info "[$alg_name] Already present in optimizer acc"
    end

    diff_result = DiffResults.GradientResult(θ)

    # TODO: in (Blei et al, 2015) TRUNCATED ADAGrad is suggested; this is not available in Flux.Optimise
    # Maybe consider contributed a truncated ADAGrad to Flux.Optimise

    i = 0
    prog = PROGRESS[] ? ProgressMeter.Progress(max_iters, 1, "[$alg_name] Optimizing...", 0) : 0

    time_elapsed = @elapsed while (i < max_iters) # & converged # <= add criterion? A running mean maybe?
        # TODO: separate into a `grad(...)` call; need to manually provide `diff_result` buffers
        # ForwardDiff.gradient!(diff_result, f, x)
        grad!(elbo, alg, q, model, θ, diff_result, samples_per_step)

        # apply update rule
        Δ = DiffResults.gradient(diff_result)
        Δ = Optimise.apply!(optimizer, θ, Δ)
        @. θ = θ - Δ

        Turing.DEBUG && @debug "Step $i" Δ DiffResults.value(diff_result) norm(DiffResults.gradient(diff_result))
        PROGRESS[] && ProgressMeter.next!(prog)

        i += 1
    end

    @info time_elapsed

    return θ
end

function grad!(vo::ELBO, alg::ADVI{AD}, q::MeanField, model::Model, θ::AbstractVector{T}, out::DiffResults.MutableDiffResult, args...) where {T <: Real, AD <: ForwardDiffAD}
    # TODO: this probably slows down executation quite a bit; exists a better way of doing this?
    f(θ_) = - vo(alg, q, model, θ_, args...)

    chunk_size = getchunksize(alg)
    # Set chunk size and do ForwardMode.
    chunk = ForwardDiff.Chunk(min(length(θ), chunk_size))
    config = ForwardDiff.GradientConfig(f, θ, chunk)
    ForwardDiff.gradient!(out, f, θ, config)
end

# TODO: implement for `Tracker`
# function grad(vo::ELBO, alg::ADVI, q::MeanField, model::Model, f, autodiff::Val{:backward})
#     vo_tracked, vo_pullback = Tracker.forward()
# end
function grad!(vo::ELBO, alg::ADVI{AD}, q::MeanField, model::Model, θ::AbstractVector{T}, out::DiffResults.MutableDiffResult, args...) where {T <: Real, AD <: TrackerAD}
    θ_tracked = Tracker.param(θ)
    y = - vo(alg, q, model, θ_tracked, args...)
    Tracker.back!(y, 1.0)

    DiffResults.gradient!(out, Tracker.grad(θ_tracked))
end

function (elbo::ELBO)(alg::ADVI, q::MeanField, model::Model, θ::AbstractVector{T}, num_samples) where T <: Real
    # setup
    var_info = Turing.VarInfo()

    # initialize `VarInfo` object
    model(var_info, Turing.SampleFromUniform())

    num_params = length(q)
    μ, ω = θ[1:num_params], θ[num_params + 1: end]

    elbo_acc = 0.0

    # TODO: instead use `rand(q, num_samples)` and iterate through?

    for i = 1:num_samples
        # iterate through priors, sample and update
        for j = 1:size(q.dists, 1)
            prior = q.dists[j]
            r = q.ranges[j]

            # mean-field params for this set of model params
            μ_i = μ[r]
            ω_i = ω[r]

            # obtain samples from mean-field posterior approximation
            η = randn(length(μ_i))
            ζ = center_diag_gaussian_inv(η, μ_i, exp.(ω_i))

            # inverse-transform back to domain of original prior
            θ = invlink(prior, ζ)

            # update
            var_info.vals[r] = θ

            # add the log-det-jacobian of inverse transform;
            # `logabsdet` returns `(log(abs(det(M))), sign(det(M)))` so take the first entry
            elbo_acc += logabsdet(jac_inv_transform(prior, ζ))[1] / num_samples
        end

        # compute log density
        elbo_acc += var_info.logp / num_samples
    end

    # add the term for the entropy of the variational posterior
    variational_posterior_entropy = sum(ω)
    elbo_acc += variational_posterior_entropy

    return elbo_acc
end

function (elbo::ELBO)(alg::ADVI, q::MeanField, model::Model, num_samples)
    # extract the mean-field Gaussian params
    θ = vcat(q.μ, q.ω)

    elbo(alg, q, model, θ, num_samples)
end

Black-box Variational Inference (BBVI)

The gradient estimator at the heart of BBVI is the score-function (REINFORCE) identity from before:

$$\nabla_\lambda\, \mathcal{L} = \mathbb{E}_{q(z; \lambda)}\Big[\nabla_\lambda \log q(z; \lambda)\,\big(\log p(x, z) - \log q(z; \lambda)\big)\Big]$$

Controlling the variance

  • Variance of the gradient estimator (under MC estimation of the ELBO) can be too large to be useful

Rao-Blackwellization

  • Reduces the variance of a rv. by replacing it with its conditional expectation wrt. a subset of variables
  • How

    Simple example:

    • Two rvs $x$ and $y$
    • Function $J(x, y)$
    • Goal: compute the expectation $\mathbb{E}\big[J(x, y)\big]$
    • Letting

      $$\hat{J}(x) = \mathbb{E}_y\big[J(x, y) \mid x\big]$$

      we note that

      $$\mathbb{E}_x\big[\hat{J}(x)\big] = \mathbb{E}_x\Big[\mathbb{E}_y\big[J(x, y) \mid x\big]\Big] = \mathbb{E}_{x, y}\big[J(x, y)\big]$$

    • Therefore: can use $\hat{J}(x)$ as a MC approx. of $\mathbb{E}\big[J(x, y)\big]$, with variance

      $$\mathrm{Var}\big(\hat{J}(x)\big) = \mathrm{Var}\big(J(x, y)\big) - \mathbb{E}\Big[\big(J(x, y) - \hat{J}(x)\big)^2\Big]$$

      which means that $\hat{J}(x)$ is a lower-variance estimator than $J(x, y)$ (see the numerical sketch after this list).

  • In this case

    Consider the mean-field approximation:

    $$q(z; \lambda) = \prod_{i=1}^{m} q_i(z_i; \lambda_i)$$

    where $\lambda_i$ denotes the parameter(s) of the variational posterior of $z_i$. Then the MC estimator for the gradient wrt. $\lambda_i$ is simply

    $$\nabla_{\lambda_i}\, \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_{\lambda_i} \log q_i\big(z_i^{(s)}; \lambda_i\big)\Big(\log p_i\big(x, z_{(i)}^{(s)}\big) - \log q_i\big(z_i^{(s)}; \lambda_i\big)\Big), \qquad z_{(i)}^{(s)} \sim q_{(i)}(z; \lambda)$$

    <2019-06-03 Mon> But what the heck is $p_i(x, z_{(i)})$? Surely it should be $p(x, z_i)$, right?

    <2019-06-03 Mon> So you missed the bit where $q_{(i)}$ denotes the pdf of the variables in the model that depend on the i-th variable, i.e. the Markov blanket of $z_i$. Then $p_i(x, z_{(i)})$ is the part of the joint probability that depends on those variables.

    Important: "model" here refers to the variational distribution $q$! This means that in the case of a mean-field approximation, where the Markov blanket is the empty set, we simply sample $z_i \sim q_i(z_i; \lambda_i)$.
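
A numerical sketch of the variance reduction (with toy distributions of my own choosing): for independent $x \sim \mathcal{N}(1, 1)$, $y \sim \mathcal{N}(2, 1)$ and $J(x, y) = xy$, we have $\hat{J}(x) = \mathbb{E}_y[J(x, y) \mid x] = 2x$:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(1.0, 1.0, size=n)
y = rng.normal(2.0, 1.0, size=n)

J = x * y        # plain MC samples of J(x, y)
J_hat = 2.0 * x  # Rao-Blackwellized: y integrated out analytically

print(J.mean(), J_hat.mean())  # both ≈ E[J(x, y)] = 2
print(J.var(), J_hat.var())    # ≈ 6 vs ≈ 4: strictly smaller variance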

Control variates

  • The idea is to replace the function $f(z)$ being approximated by MC with another function $\hat{f}(z)$ which has the same expectation but lower variance, i.e. choose $\hat{f}$ s.t.

    $$\mathbb{E}_q\big[\hat{f}(z)\big] = \mathbb{E}_q\big[f(z)\big] \quad \text{and} \quad \mathrm{Var}_q\big(\hat{f}(z)\big) < \mathrm{Var}_q\big(f(z)\big)$$


  • One particular example is, for some function $h$ with known expectation,

    $$\hat{f}(z) \triangleq f(z) - a\big(h(z) - \mathbb{E}[h(z)]\big)$$

    • We can then choose $a$ to minimize $\mathrm{Var}\big(\hat{f}(z)\big)$
    • Variance of $\hat{f}$ is then

      $$\mathrm{Var}\big(\hat{f}(z)\big) = \mathrm{Var}\big(f(z)\big) + a^2\,\mathrm{Var}\big(h(z)\big) - 2a\,\mathrm{Cov}\big(f(z), h(z)\big)$$

    • Therefore, good control variates have high covariance with $f$
    • Taking the derivative wrt. $a$ and setting it equal to zero, we get the optimal choice of $a$, denoted $a^*$:

      $$a^* = \frac{\mathrm{Cov}\big(f(z), h(z)\big)}{\mathrm{Var}\big(h(z)\big)}$$

Maybe you recognize this form from somewhere? OH YEAH YOU DO, it's the same expression we have for the slope of the ordinary least squares (OLS) estimator:

$$\hat{\beta} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$$

which we also know is the minimum-variance estimator in that particular case.

We already know $\mathbb{E}\big[\hat{f}(z)\big] = \mathbb{E}\big[f(z)\big]$, which in the linear case just means that the intercepts are the same. If we rearrange:

$$f(z) = \hat{f}(z) + a\big(h(z) - \mathbb{E}[h(z)]\big)$$

So it's like we're performing a linear regression in the expectation space, given some function $h$?

Fun stuff we could do is to let $h$ be a parametrized function and then minimize the variance wrt. $h$ also, right, future-Tor?

  • This case

    In this particular case, we can choose

    $$h(z) = \nabla_\lambda \log q(z; \lambda)$$

    This always has expectation zero, which simply follows from

    $$\mathbb{E}_{q(z; \lambda)}\big[\nabla_\lambda \log q(z; \lambda)\big] = \int q(z; \lambda)\, \frac{\nabla_\lambda q(z; \lambda)}{q(z; \lambda)}\, \mathrm{d}z = \nabla_\lambda \int q(z; \lambda)\, \mathrm{d}z = \nabla_\lambda 1 = 0$$

    under sufficient restrictions allowing us to "move" the partial derivative outside of the integral (e.g. smoothness wrt. $\lambda$ is sufficient).

    With the Rao-Blackwellized estimator obtained in the previous section, we then have

    $$f_i(z) = \nabla_{\lambda_i} \log q_i(z_i; \lambda_i)\Big(\log p_i\big(x, z_{(i)}\big) - \log q_i(z_i; \lambda_i)\Big), \qquad h_i(z) = \nabla_{\lambda_i} \log q_i(z_i; \lambda_i)$$

    The estimate of the optimal choice of scaling $a$ is then

    $$\hat{a}_i^* = \frac{\sum_{d=1}^{D} \widehat{\mathrm{Cov}}\big(f_i^d, h_i^d\big)}{\sum_{d=1}^{D} \widehat{\mathrm{Var}}\big(h_i^d\big)}$$

    • $\widehat{\mathrm{Cov}}$ and $\widehat{\mathrm{Var}}$ denote the empirical estimators
    • $f_i^d$ denotes the d-th component of $f_i$, i.e. it can be multi-dimensional

    Therefore, we end up with the MC gradient estimator (with lower variance than the previous one):

    $$\nabla_{\lambda_i}\, \mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_{\lambda_i} \log q_i\big(z_i^{(s)}; \lambda_i\big)\Big(\log p_i\big(x, z_{(i)}^{(s)}\big) - \log q_i\big(z_i^{(s)}; \lambda_i\big) - \hat{a}_i^*\Big), \qquad z_{(i)}^{(s)} \sim q_{(i)}(z; \lambda)$$


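To see the control-variate part in isolation, here's a numerical sketch (again a toy of my own: $z \sim \mathcal{N}(\mu, 1)$, $f(z) = z^2$, and $h(z) = \nabla_\mu \log q(z; \mu) = z - \mu$, which has expectation zero):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mu = 1.0

z = rng.normal(mu, 1.0, size=n)

f = z**2    # E[f] = mu² + 1 = 2
h = z - mu  # score function; E[h] = 0, so f - a·h has the same expectation

# empirical estimate of the optimal scaling a* = Cov(f, h) / Var(h)
a_star = np.cov(f, h)[0, 1] / h.var()

f_cv = f - a_star * h

print(f.mean(), f_cv.mean())  # both ≈ 2
print(f.var(), f_cv.var())    # ≈ 6 vs ≈ 2: much lower variance
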
In ranganath13_black_box_variat_infer the step size is written as $\rho_t$, the t-th value of a Robbins-Monro sequence (i.e. $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$), computed with an AdaGrad-style rule.

Appendix A: Definitions

variational density
our approximate probability distribution $q(z)$
variational parameter
a parameter required to compute our variational density, i.e. a parameter which defines the approximating distribution over the latent variables. So sort of like "latent-latent variables", or as I like to call them, doubly latent variables (•_•) / ( •_•)>⌐■-■ / (⌐■_■). Disclaimer: I've never called them that before in my life..