# Timeseries modelling

## Table of Contents

## Autoregressive (AR)

### Idea

The output variable depends linearly on its own *previous values* and
on a stochastic term (accounting for the variance in the data).

### Model

**AR(p)** refers to an **autoregressive model of order p**, and is written

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$

where

- $\varphi_1, \dots, \varphi_p$ are parameters
- $c$ is a constant
- $\varepsilon_t$ is white noise (Gaussian rv.), i.e. $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$

The unknowns are:

- the constant $c$
- the weights $\varphi_1, \dots, \varphi_p$, i.e. the dependence on the previous observations
- the variance $\sigma^2$ of the white noise
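
To make the model concrete, here is a minimal simulation sketch; the function name, the burn-in handling, and the defaults are my own choices, not taken from any particular library:

```python
import numpy as np

def simulate_ar(c, phi, sigma, n, burn_in=100, seed=0):
    """Simulate n samples from an AR(p) process (minimal sketch)."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    x = np.zeros(n + burn_in + p)
    for t in range(p, len(x)):
        # X_t = c + sum_i phi_i * X_{t-i} + eps_t,  with eps_t ~ N(0, sigma^2)
        x[t] = c + phi @ x[t - p:t][::-1] + rng.normal(0.0, sigma)
    return x[-n:]   # drop the burn-in so the start-up transient has died out
```

For example, `simulate_ar(c=0.5, phi=[0.6, -0.2], sigma=1.0, n=1000)` generates an AR(2) series.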

### Estimating the parameters

- Ordinary Least Squares (OLS)
- Method of moments (through Yule-Walker equations)

There is a direct correspondence between the model parameters and the autocovariance of the process, and this correspondence can be "inverted" to determine the parameters from the autocorrelation function itself.

#### Yule-Walker equations

$$\gamma_m = \sum_{k=1}^{p} \varphi_k \gamma_{m-k} + \sigma_\varepsilon^2 \delta_{m,0}, \qquad m = 0, 1, \dots, p$$

where

- $\gamma_m$ is the autocovariance function of $X_t$
- $\sigma_\varepsilon$ is the std. dev. of the *input noise process*
- $\delta_{m,0}$ is the Kronecker delta function, which is only non-zero when $m = 0$

Now, if we only consider $m > 0$, we can ignore the last term, and can simply write the above in matrix form:

$$\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_{-1} & \cdots & \gamma_{1-p} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{2-p} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{bmatrix} \begin{bmatrix} \varphi_1 \\ \varphi_2 \\ \vdots \\ \varphi_p \end{bmatrix}$$
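
Solving this system from sample autocovariances is straightforward; here is a minimal numpy sketch (the function name and interface are my own choices):

```python
import numpy as np

def yule_walker(x, p):
    """Estimate AR(p) coefficients from sample autocovariances (minimal sketch)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Sample autocovariances gamma_0, ..., gamma_p
    gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    # Toeplitz system: gamma[1:p+1] = R @ phi, with R[i, j] = gamma[|i - j|]
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1 : p + 1])
    # Noise variance from the m = 0 equation
    sigma2 = gamma[0] - phi @ gamma[1 : p + 1]
    return phi, sigma2
```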

#### My thinking

Our plan is as follows:

- Consider each sequence of $p$ steps ($p$ being the order of the `AR` model)
- Deduce the distribution for $X_t$ *conditioned* on the parameters of the model and the previous observations $x_{t-1}, \dots, x_{t-p}$
- Obtain the conditional log-likelihood function for this conditional distribution
- Get gradient of said conditional log-likelihood function
- Maximize said conditional log-likelihood wrt. parameters
- ???
- PROFIT!!!

##### Initial observation sequence

We start by collecting the first $p$ observations in the sample, writing them as the vector $\mathbf{x}_p = (x_1, \dots, x_p)^\top$, which has a mean vector $\boldsymbol{\mu}_p = (\mu, \dots, \mu)^\top$ where

$$\mu = \frac{c}{1 - \varphi_1 - \varphi_2 - \dots - \varphi_p}$$

Why? Well, we start out by taking the expectation of $X_t$ wrt. the random variables, i.e. the $\varepsilon_t$:

$$\mathbb{E}[X_t] = \mathbb{E}\!\left[c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t\right]$$

Then we start expanding:

$$\mathbb{E}[X_t] = c + \sum_{i=1}^{p} \varphi_i\, \mathbb{E}[X_{t-i}]$$

And since $\mathbb{E}[X_t] = \mathbb{E}[X_{t-i}] = \mu$, we rearrange to end up with:

$$\mu = \frac{c}{1 - \sum_{i=1}^{p} \varphi_i}$$

Voilà!

and the variance-covariance matrix is given by

$$(\boldsymbol{\Sigma}_p)_{ij} = \operatorname{Cov}(X_i, X_j) = \gamma_{|i - j|}$$

where $\operatorname{Cov}(X_i, X_j) = \gamma_{|i - j|}$ depends only on the lag between the two different steps $i$ and $j$.
This is due to the process being *weak-sense stationary*.

The reason the covariance matrix takes this form is our assumption of weak-sense stationarity (is that a word? at least it is now).

##### Conditional distribution

First we note the following:

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)$$

is equivalent to

$$X_t - c - \sum_{i=1}^{p} \varphi_i X_{t-i} \sim \mathcal{N}(0, \sigma^2)$$

We then observe that if we condition $X_t$ on the previous $p$ observations, it is independent of everything before that. That is,

$$p(x_t \mid x_{t-1}, \dots, x_1, \boldsymbol{\theta}) = p(x_t \mid x_{t-1}, \dots, x_{t-p}, \boldsymbol{\theta})$$

where $\boldsymbol{\theta} = (c, \varphi_1, \dots, \varphi_p, \sigma^2)$ denotes the parameters.

Therefore,

$$X_t \mid x_{t-1}, \dots, x_{t-p}, \boldsymbol{\theta} \sim \mathcal{N}\!\left(c + \sum_{i=1}^{p} \varphi_i x_{t-i},\; \sigma^2\right)$$

For notational sake, we note that we can write it in the following form:

$$p(x_t \mid x_{t-1}, \dots, x_{t-p}, \boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(- \frac{\left(x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i}\right)^2}{2\sigma^2}\right)$$

We can write the total probability of seeing the data conditioned on the parameters as follows:

$$p(x_{p+1}, \dots, x_T \mid x_1, \dots, x_p, \boldsymbol{\theta}) = \prod_{t=p+1}^{T} p(x_t \mid x_{t-1}, \dots, x_{t-p}, \boldsymbol{\theta})$$

Which, when we substitute in the normal distributions obtained for the conditionals, becomes

$$\prod_{t=p+1}^{T} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(- \frac{\left(x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i}\right)^2}{2\sigma^2}\right)$$

##### Conditional log-likelihood

Taking the logarithm of the previous expression, we obtain the conditional log-likelihood:

$$\ell(\boldsymbol{\theta}) = -\frac{T - p}{2} \log\!\left(2\pi\sigma^2\right) - \sum_{t=p+1}^{T} \frac{\left(x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i}\right)^2}{2\sigma^2}$$

And just to remind you: what we want to do is find the parameters which *maximize* the conditional log-likelihood.
If we frame it as a *minimization* problem, simply by taking the negative of the log-likelihood, we end up with the
objective function:

$$J(\boldsymbol{\theta}) = \frac{T - p}{2} \log\!\left(2\pi\sigma^2\right) + \sum_{t=p+1}^{T} \frac{\left(x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i}\right)^2}{2\sigma^2}$$

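To make the objective concrete, here is a minimal numpy sketch of the conditional negative log-likelihood; the function name and the parameter packing $\boldsymbol{\theta} = (c, \varphi_1, \dots, \varphi_p, \sigma^2)$ are my own choices. It could be handed to a generic optimizer such as `scipy.optimize.minimize`.

```python
import numpy as np

def conditional_nll(theta, x, p):
    """Conditional negative log-likelihood of an AR(p) model (minimal sketch).

    theta = (c, phi_1, ..., phi_p, sigma2); x is the observed series.
    """
    c, phi, sigma2 = theta[0], np.asarray(theta[1 : p + 1]), theta[p + 1]
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Row t of the lag matrix holds (x_{t-1}, ..., x_{t-p}) for t = p, ..., T-1
    lags = np.column_stack([x[p - i - 1 : T - i - 1] for i in range(p)])
    resid = x[p:] - c - lags @ phi
    return 0.5 * (T - p) * np.log(2 * np.pi * sigma2) + np.sum(resid ** 2) / (2 * sigma2)
```
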
Now, we *optimize*!

##### Optimization

If we only consider the objective function wrt. $c$ and $\varphi_1, \dots, \varphi_p$ (treating $\sigma^2$ as fixed), we end up with the sum of squared residuals:

$$\sum_{t=p+1}^{T} \left(x_t - c - \sum_{i=1}^{p} \varphi_i x_{t-i}\right)^2$$

This is what I arrived at independently, but I really thought this was a huge issue, because $x_t$ is defined as a function of the other variables, and so we have a recursive relationship!

*But* then I realized, after confirming that the above is in fact what you want, that the $x_{t-i}$
in this equation are *observed* values, not random variables, duh!
Might be easier to think about it as $\hat{x}_t = c + \sum_{i=1}^{p} \varphi_i x_{t-i}$, and so we're looking at $\sum_{t} (x_t - \hat{x}_t)^2$,
since the last term is in fact our estimate for $x_t$.

Disclaimer: *I think…*

Thus, the conditional MLE of these parameters can be obtained from an OLS regression of $x_t$ on a constant and $p$ of its own lagged values.

The conditional MLE estimator of $\sigma^2$ turns out to be:

$$\hat{\sigma}^2 = \frac{1}{T - p} \sum_{t=p+1}^{T} \left(x_t - \hat{c} - \sum_{i=1}^{p} \hat{\varphi}_i x_{t-i}\right)^2$$
This can be solved with *iterative* methods like:

- Gauss-Newton algorithm
- Gradient Descent algorithm

Or by finding the exact solution using linear algebra, but the naive approach to this is quite expensive.
There are definitely libraries which do this *efficiently*, but nothing that I can implement myself, *I think*.
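
For completeness, a minimal sketch of the linear-algebra route (the function name and interface are my own; `np.linalg.lstsq` uses a numerically stable solver rather than the naive normal equations):

```python
import numpy as np

def ar_conditional_mle(x, p):
    """Conditional MLE of an AR(p) model via OLS on lagged values (minimal sketch)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Design matrix: a constant column followed by the p lagged values
    lags = np.column_stack([x[p - i - 1 : T - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(T - p), lags])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta = (c, phi_1, ..., phi_p)
    resid = y - X @ beta
    sigma2 = resid @ resid / (T - p)              # the conditional MLE of sigma^2 from above
    return beta[0], beta[1:], sigma2
```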

## Moving average (MA)

## Autoregressive Moving average (ARMA)

**ARMA(p, q)** refers to the model with $p$ autoregressive terms and $q$ moving-average terms. This model contains the *AR(p)* and *MA(q)* models:

$$X_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

## Autoregressive Conditional Heteroskedasticity (ARCH)

### Notation

- Heteroskedastic refers to sub-populations of a collection of rvs. having different variability

### Stuff

- Describes the variance of the current error term as a function of the actual sizes of the previous time periods' error terms
- If we assume the error variance is described by an ARMA model, then we have a **Generalized Autoregressive Conditional Heteroskedasticity (GARCH)** model

Let $\{\varepsilon_t\}$ denote a real-valued discrete-time stochastic process, and let $\psi_t$ be the information set of all information up to time $t$, i.e. $\psi_t = \{\varepsilon_t, \varepsilon_{t-1}, \dots\}$.

The process $\{\varepsilon_t\}$ is said to be an **Autoregressive Conditional Heteroskedastic** model, or **ARCH(q)**, whenever

$$\varepsilon_t \mid \psi_{t-1} \sim \mathcal{N}(0, h_t)$$

with

$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \qquad \alpha_0 > 0, \quad \alpha_i \ge 0$$

The conditional variance can generally be expressed as

$$h_t = h(\varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_{t-q}; \boldsymbol{\alpha})$$

where $h(\cdot)$ is a nonnegative function of its arguments and $\boldsymbol{\alpha}$ is a vector of unknown parameters.

The above is sometimes represented as

$$\varepsilon_t = z_t \sqrt{h_t}$$

with $\mathbb{E}[z_t] = 0$ and $\operatorname{Var}(z_t) = 1$, where the $z_t$ are *uncorrelated*.
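
To make the definition concrete, a minimal simulation sketch; the function name, burn-in handling, and defaults are my own choices:

```python
import numpy as np

def simulate_arch(alpha0, alphas, n, burn_in=100, seed=0):
    """Simulate n samples from an ARCH(q) process (minimal sketch)."""
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    q = len(alphas)
    eps = np.zeros(n + burn_in + q)
    for t in range(q, len(eps)):
        # h_t = alpha_0 + sum_i alpha_i * eps_{t-i}^2;  eps_t = z_t * sqrt(h_t)
        h_t = alpha0 + alphas @ (eps[t - q:t][::-1] ** 2)
        eps[t] = rng.normal() * np.sqrt(h_t)
    return eps[-n:]
```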

We can estimate an **ARCH(q)** model using OLS.
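
A minimal sketch of what that OLS estimate might look like (regressing the squared residuals on a constant and their own $q$ lagged values; the interface is my own choice):

```python
import numpy as np

def arch_ols(eps, q):
    """OLS estimate of ARCH(q) parameters from residuals eps (minimal sketch)."""
    e2 = np.asarray(eps, dtype=float) ** 2
    T = len(e2)
    # Regress eps_t^2 on a constant and (eps_{t-1}^2, ..., eps_{t-q}^2)
    lags = np.column_stack([e2[q - i - 1 : T - i - 1] for i in range(q)])
    X = np.column_stack([np.ones(T - q), lags])
    y = e2[q:]
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)  # alpha = (alpha_0, alpha_1, ..., alpha_q)
    return alpha
```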

### Integrated GARCH (IGARCH)

### Fractional Integrated GARCH (FIGARCH)

## TODO Markov Switching Multifractal (MSM)

## Useful tricks

### Autocorrelation using Fourier Transform

Observe that the correlation of a function $f$ with another function $g$ can be computed as follows:

$$(f \star g)(\tau) = \sum_{t} \overline{f(t)}\, g(t + \tau)$$

since this is just summing over the interactions between every pair of values separated by the lag $\tau$. If $g = f$, then the above defines the autocorrelation of the series, as wanted.

Now, observe that the Fourier Transform has the property:

$$\mathcal{F}\{f \star g\} = \overline{\mathcal{F}\{f\}} \cdot \mathcal{F}\{g\}$$

Hence,

$$f \star f = \mathcal{F}^{-1}\!\left\{\, \left|\mathcal{F}\{f\}\right|^2 \,\right\}$$

Which means that, given a dataset, we can use the efficient *Fast Fourier Transform (FFT)* to compute $\mathcal{F}\{f\}$, take its squared magnitude, and then take the *inverse FFT* to obtain $f \star f$, i.e. the *autocorrelation*!

This is super-dope.
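
A minimal numpy sketch of the trick (the zero-padding to avoid the circular wrap-around of the discrete FFT is my addition):

```python
import numpy as np

def autocorrelation_fft(x):
    """Autocorrelation via the FFT (minimal sketch of the trick above)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Zero-pad to length 2n so the circular correlation matches the linear one
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n] / n   # autocovariance at lags 0..n-1
    return acov / acov[0]                          # normalize to autocorrelation
```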

## Appendix A: Vocabulary

- autocorrelation
  - also known as *serial correlation*; the correlation of a signal with a delayed copy of itself, as a function of the delay. Informally, the *similarity between observations as a function of the time lag between them*.
- wide-sense stationary
  - also known as *weak* stationarity; only requires that the mean (1st moment) and the autocovariance do not vary wrt. time. This is in contrast to *stationary* processes, where we require the *joint probability* to not vary wrt. time.