Timeseries modelling

Autoregressive (AR)

Idea

Output variable depends linearly on its own previous values and on a stochastic term (accounting for the variance in the data).

Model

AR(p) refers to an autoregressive model of order $p$, and is written

$$ X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t $$

where

  • $\varphi_1, \dots, \varphi_p$ are parameters
  • $c$ is a constant
  • $\varepsilon_t$ is white noise (a Gaussian rv.), i.e. $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$

The unknowns are:

  • the constant $c$
  • the weights / dependence $\varphi_i$ on the previous observations
  • the variance $\sigma^2$ of the white noise

Estimating the parameters

  • Ordinary Least Squares (OLS)
  • Method of moments (through Yule-Walker equations)

There is a direct correspondence between the parameters $\varphi_i$ and the covariance of the process, and this can be "inverted" to determine the parameters from the autocorrelation function itself.

Yule-Walker equations

$$ \gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0} $$

where

  • $\gamma_m$ is the autocovariance function of $X_t$
  • $\sigma_\varepsilon$ is the std. dev. of the input noise process
  • $\delta_{m,0}$ is the Kronecker delta function, which is only non-zero when $m = 0$

Now, if we only consider $m > 0$, we can ignore the last term, and can simply write the above in matrix form:

$$ \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_{-1} & \cdots & \gamma_{1-p} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{2-p} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{bmatrix} \begin{bmatrix} \varphi_1 \\ \varphi_2 \\ \vdots \\ \varphi_p \end{bmatrix} $$

where $\gamma_{-k} = \gamma_k$, so the matrix is a symmetric Toeplitz matrix.
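
To make this concrete, here is a minimal sketch of the method-of-moments route in Python (NumPy/SciPy assumed; the helper name yule_walker_ar is my own): build the Toeplitz system from sample autocovariances and solve it for the $\varphi_k$.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def yule_walker_ar(x, p):
        """Estimate AR(p) coefficients via the Yule-Walker equations."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        # Sample autocovariances gamma_0, ..., gamma_p (biased estimator).
        gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
        # Solve the symmetric Toeplitz system Gamma @ phi = (gamma_1, ..., gamma_p).
        phi = solve_toeplitz(gamma[:p], gamma[1 : p + 1])
        # Noise variance from the m = 0 equation.
        sigma2 = gamma[0] - phi @ gamma[1 : p + 1]
        return phi, sigma2

(statsmodels ships a yule_walker helper that does essentially this, I believe, if you would rather not roll your own.)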

My thinking

Our plan is as follows:

  1. Consider each sequence of $p$ steps (the order of the AR)
  2. Deduce the distribution for $X_t$ conditioned on the parameters of the model and the $p$ previous observations, $X_{t-1}, \dots, X_{t-p}$.
  3. Obtain the conditional log-likelihood function for this conditional distribution
  4. Get the gradient of said conditional log-likelihood function
  5. Maximize said conditional log-likelihood wrt. the parameters $\theta = (c, \varphi_1, \dots, \varphi_p, \sigma^2)$
  6. ???
  7. PROFIT!!!

Initial observation sequence

We start by collecting the first $p$ observations in the sample vector $\mathbf{x}_p = (x_1, \dots, x_p)$, writing it as $\mathbf{X}_p \sim \mathcal{N}(\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p)$, which has a mean vector $\boldsymbol{\mu}_p$ where

$$ \boldsymbol{\mu}_p = (\mu, \dots, \mu)^\top, \qquad \mu = \frac{c}{1 - \sum_{i=1}^{p} \varphi_i} $$

Why? Well, we start out by taking the expectation of the AR(p) equation wrt. the random variables, i.e. the $X_t$:

$$ \mathbb{E}[X_t] = \mathbb{E}\left[ c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t \right] $$

Then we start expanding:

$$ \mathbb{E}[X_t] = c + \sum_{i=1}^{p} \varphi_i\, \mathbb{E}[X_{t-i}] + \underbrace{\mathbb{E}[\varepsilon_t]}_{=\, 0} $$

And since $\mathbb{E}[X_t] = \mathbb{E}[X_{t-i}] = \mu$, we rearrange to end up with:

$$ \mu = \frac{c}{1 - \sum_{i=1}^{p} \varphi_i} $$

Voilà!

and the variance-covariance matrix is given by

$$ \boldsymbol{\Sigma}_p = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{bmatrix} $$

where $\gamma_{|i-j|} = \operatorname{Cov}(X_i, X_j)$ for two different steps $i$ and $j$. This is due to the process being weak-sense stationary.

The covariance matrix takes this form because, by our assumption of weak-sense stationarity (is that a word? at least it is now), the covariance between two observations depends only on the lag between them.
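
As a quick sanity check of both the mean formula and the Toeplitz structure, here is a minimal sketch (the AR(1) parameters are made up):

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    c, phi, sigma = 0.5, 0.8, 1.0      # made-up AR(1) parameters

    # Simulate a long AR(1) path: X_t = c + phi * X_{t-1} + eps_t.
    T = 100_000
    x = np.empty(T)
    x[0] = c / (1 - phi)               # start at the stationary mean
    for t in range(1, T):
        x[t] = c + phi * x[t - 1] + sigma * rng.standard_normal()

    print(x.mean(), c / (1 - phi))     # empirical mean vs. c / (1 - sum of phis)

    # Sigma_p as a Toeplitz matrix of autocovariances (here p = 3).
    xd, n, p = x - x.mean(), len(x), 3
    gamma = np.array([xd[: n - k] @ xd[k:] / n for k in range(p)])
    Sigma_p = toeplitz(gamma)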

Conditional distribution

First we note the following:

$$ X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2) $$

is equivalent to

$$ X_t \mid X_{t-1}, \dots, X_{t-p} \sim \mathcal{N}\left( c + \sum_{i=1}^{p} \varphi_i X_{t-i},\ \sigma^2 \right) $$

We then observe that if we condition $X_t$ on the previous $p$ observations, it is independent of everything before that. That is:

$$ p\left( x_t \mid x_{t-1}, \dots, x_1 \right) = p\left( x_t \mid x_{t-1}, \dots, x_{t-p} \right) $$

where $t > p$.

Therefore,

$$ X_t \mid X_{t-1}, \dots, X_{t-p} \sim \mathcal{N}\left( c + \sum_{i=1}^{p} \varphi_i x_{t-i},\ \sigma^2 \right) $$

For notation's sake, we note that we can write it in the following form:

$$ p\left( x_t \mid x_{t-1}, \dots, x_{t-p} \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left( x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i} \right)^2}{2 \sigma^2} \right) $$

We can write the total probability of seeing the data conditioned on the parameters as follows:

$$ p\left( x_T, x_{T-1}, \dots, x_{p+1} \mid \mathbf{x}_p; \theta \right) = \prod_{t=p+1}^{T} p\left( x_t \mid x_{t-1}, \dots, x_{t-p}; \theta \right) $$

Which, when we substitute in the normal distributions obtained for $X_t$, gives

$$ p\left( x_T, \dots, x_{p+1} \mid \mathbf{x}_p; \theta \right) = \prod_{t=p+1}^{T} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left( x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i} \right)^2}{2 \sigma^2} \right) $$

Conditional log-likelihood

Taking the logarithm of the previous expression, we obtain the conditional log-likelihood:

$$ \ell(\theta) = - \frac{T - p}{2} \log (2 \pi) - \frac{T - p}{2} \log (\sigma^2) - \sum_{t=p+1}^{T} \frac{\left( x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i} \right)^2}{2 \sigma^2} $$

And just to remind you: what we want to do is find the parameters which maximize the conditional log-likelihood. If we frame it as a minimization problem, simply by taking the negative of the log-likelihood, we end up with the objective function:

$$ J(\theta) = \frac{T - p}{2} \log (2 \pi) + \frac{T - p}{2} \log (\sigma^2) + \sum_{t=p+1}^{T} \frac{\left( x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i} \right)^2}{2 \sigma^2} $$
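
Spelled out in code, the objective looks like this (a minimal sketch; the helper name neg_cond_loglik and the parameter packing are my own choices):

    import numpy as np

    def neg_cond_loglik(theta, x, p):
        """Negative conditional log-likelihood J(theta) of an AR(p).

        theta = (c, phi_1, ..., phi_p, sigma2); x is the observed series.
        """
        x = np.asarray(x, dtype=float)
        c, phi, sigma2 = theta[0], np.asarray(theta[1 : p + 1]), theta[p + 1]
        T = len(x)
        # Columns are the lagged values x_{t-1}, ..., x_{t-p} for each t.
        lags = np.column_stack([x[p - i : T - i] for i in range(1, p + 1)])
        resid = x[p:] - c - lags @ phi
        return ((T - p) / 2) * np.log(2 * np.pi * sigma2) + np.sum(resid**2) / (2 * sigma2)

You could hand this straight to a generic optimizer such as scipy.optimize.minimize, but as shown next, the $c, \varphi$ part actually reduces to OLS.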

Now, we optimize!

Optimization

If we only consider the objective function wrt. $c$ and $\varphi_1, \dots, \varphi_p$, we end up with:

$$ \sum_{t=p+1}^{T} \left( x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i} \right)^2 $$

This is what I arrived at independently, but I really thought this was a huge issue, because $X_t$ is defined as a function of the other variables and so we have a recursive relationship!

But then I realized, after confirming that the above is in fact what you want, that the $x_t$ in this equation are observed values, not random variables, duh! Might be easier to think about it as $\hat{x}_t = c + \sum_{i=1}^{p} \varphi_i x_{t-i}$, and so we're looking at $\left( x_t - \hat{x}_t \right)^2$, since the last term is in fact our estimate for $X_t$.

Disclaimer: I think…

Thus, the conditional MLE of these parameters can be obtained from an OLS regression of $X_t$ on a constant $c$ and $p$ of its own lagged values.

The conditional MLE estimator of $\sigma^2$ turns out to be:

$$ \hat{\sigma}^2 = \frac{1}{T - p} \sum_{t=p+1}^{T} \left( x_t - \hat{c} - \sum_{i=1}^{p} \hat{\varphi}_i x_{t-i} \right)^2 $$

This can be solved with iterative methods like:

  • Gauss-Newton algorithm
  • Gradient Descent algorithm

Or by finding the exact solution using linear algebra, but the naive approach to this is quite expensive. There are definitely libraries which do this efficiently, though nothing that I can implement myself, I think.
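
For completeness, a minimal sketch of the OLS route (np.linalg.lstsq is one of those libraries doing the linear algebra efficiently; the helper name fit_ar_ols is my own):

    import numpy as np

    def fit_ar_ols(x, p):
        """Conditional MLE of an AR(p): OLS of x_t on a constant and p lags."""
        x = np.asarray(x, dtype=float)
        T = len(x)
        lags = np.column_stack([x[p - i : T - i] for i in range(1, p + 1)])
        A = np.column_stack([np.ones(T - p), lags])  # design matrix [1, x_{t-1}, ..., x_{t-p}]
        beta, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
        c, phi = beta[0], beta[1:]
        resid = x[p:] - A @ beta
        sigma2 = resid @ resid / (T - p)             # conditional MLE of sigma^2
        return c, phi, sigma2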

Moving average (MA)

Autoregressive Moving average (ARMA)

ARMA(p, q) refers to the model with $p$ autoregressive terms and $q$ moving-average terms. This model contains the AR(p) and MA(q) models,

$$ X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} $$
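
As an illustration of the equation, a minimal ARMA(1, 1) simulation sketch (the parameters are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    c, phi, theta, sigma, T = 0.1, 0.7, 0.4, 1.0, 1_000  # made-up ARMA(1, 1) parameters

    eps = sigma * rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        # X_t = c + eps_t + phi * X_{t-1} + theta * eps_{t-1}
        x[t] = c + eps[t] + phi * x[t - 1] + theta * eps[t - 1]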

Autoregressive Conditional Heteroskedasticity (ARCH)

Notation

  • Heteroskedastic refers to sub-populations of a collection of rvs. having different variability

Stuff

  • Describes the variance of the current error term as a function of the actual sizes of the previous time periods' error terms
  • If we assume the error variance is described by an ARMA model, then we have a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model

Let $\varepsilon_t$ denote a real-valued discrete-time stochastic process, and let $\Omega_t$ be the information set of all information up to time $t$, i.e. $\Omega_t = \{ \varepsilon_t, \varepsilon_{t-1}, \dots \}$.

The process $\{ \varepsilon_t \}$ is said to be an Autoregressive Conditional Heteroskedastic model, or ARCH(q), whenever

$$ \varepsilon_t \mid \Omega_{t-1} \sim \mathcal{N}(0, \sigma_t^2) $$

with

$$ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 $$

The conditional variance can generally be expressed as

$$ \sigma_t^2 = h\left( \varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_{t-q};\, \boldsymbol{\alpha} \right) $$

where $h(\cdot)$ is a nonnegative function of its arguments and $\boldsymbol{\alpha}$ is a vector of unknown parameters.

The above is sometimes represented as

$$ \varepsilon_t = \sigma_t z_t $$

with $\mathbb{E}[z_t] = 0$ and $\operatorname{Var}(z_t) = 1$, where the $z_t$ are uncorrelated.

We can estimate an ARCH(q) model using OLS.
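
Concretely, that OLS estimate regresses the squared series on its own lags (a minimal sketch, reusing the design-matrix trick from the AR fit above; fit_arch_ols is my own name):

    import numpy as np

    def fit_arch_ols(eps, q):
        """Estimate ARCH(q): OLS of eps_t^2 on a constant and q lagged eps^2."""
        e2 = np.asarray(eps, dtype=float) ** 2
        T = len(e2)
        lags = np.column_stack([e2[q - i : T - i] for i in range(1, q + 1)])
        A = np.column_stack([np.ones(T - q), lags])
        alpha, *_ = np.linalg.lstsq(A, e2[q:], rcond=None)
        return alpha  # (alpha_0, alpha_1, ..., alpha_q)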

The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, or GARCH(p, q), is simply an ARCH(q) in combination with an ARMA(p) model, i.e.

$$ \sigma_t^2 = \alpha_0 + \alpha(L)\, \varepsilon_t^2 + \beta(L)\, \sigma_t^2 $$

where $\alpha(L)$ and $\beta(L)$ are polynomials in the lag operator $L$ such that

$$ \alpha(L) = \alpha_1 L + \alpha_2 L^2 + \dots + \alpha_q L^q, \qquad \beta(L) = \beta_1 L + \beta_2 L^2 + \dots + \beta_p L^p $$

Or equivalently,

$$ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 $$

where

$$ \alpha_0 > 0, \qquad \alpha_i \ge 0, \qquad \beta_j \ge 0 $$

which simply ensures that the conditional variance $\sigma_t^2$ stays positive.
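
As an illustration, a minimal GARCH(1, 1) simulation sketch (made-up parameters, chosen so that $\alpha_1 + \beta_1 < 1$):

    import numpy as np

    rng = np.random.default_rng(0)
    a0, a1, b1, T = 0.1, 0.1, 0.8, 1_000   # made-up GARCH(1, 1) parameters

    eps, sigma2 = np.zeros(T), np.zeros(T)
    sigma2[0] = a0 / (1 - a1 - b1)         # start at the unconditional variance
    for t in range(1, T):
        # sigma_t^2 = alpha_0 + alpha_1 * eps_{t-1}^2 + beta_1 * sigma_{t-1}^2
        sigma2[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()  # eps_t = z_t * sigma_t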

Integrated GARCH (IGARCH)

Fractional Integrated GARCH (FIGARCH)

TODO Markov Switching Multifractal (MSM)

Useful tricks

Autocorrelation using Fourier Transform

Observe that the autocorrelation for a function $f(t)$ can be computed as follows:

$$ (f \star g)(\tau) = \int_{-\infty}^{\infty} f(t)\, g(t + \tau)\, \mathrm{d}t $$

since this is just summing over the interactions between every value $t$. If $g = f$, then the above defines the autocorrelation of the series, as wanted.

Now, observe that the Fourier Transform has the property:

$$ \mathcal{F}\{ f \star g \} = \overline{\mathcal{F}\{ f \}} \cdot \mathcal{F}\{ g \} $$

Hence,

$$ f \star f = \mathcal{F}^{-1}\left\{ \left| \mathcal{F}\{ f \} \right|^2 \right\} $$

Which means that, given a dataset, we can use the efficient Fast Fourier Transform (FFT) to compute $\mathcal{F}\{ f \}$, take its squared magnitude, and then take the inverse FFT, to obtain $f \star f$, i.e. the autocorrelation!
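
In code this is only a few lines (a minimal sketch; autocorrelation_fft is my own name, and the zero-padding to $2n$ avoids the circular wrap-around of the plain FFT):

    import numpy as np

    def autocorrelation_fft(x):
        """Autocorrelation of x at lags 0, ..., n - 1 via FFT."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        fx = np.fft.rfft(x, 2 * n)                     # F{f}, zero-padded
        acov = np.fft.irfft(fx * np.conj(fx))[:n] / n  # F^{-1}{|F{f}|^2}
        return acov / acov[0]                          # normalize so lag 0 is 1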

This is super-dope.

Appendix A: Vocabulary

autocorrelation
also known as serial correlation, and is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, similarity between observations as a function of the time lag between them.
wide-sense stationary
also known as weak stationarity; only requires that the mean (1st moment) and autocovariance do not vary wrt. time. This is in contrast to (strictly) stationary processes, where we require the joint probability distribution to not vary wrt. time.