### What is an Autoregressive Model?

An autoregressive (AR) model predicts future behavior based on past results. It is used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them. You only use past data to model the behavior, hence the name autoregressive (the Greek prefix auto means 'self'). The process is basically a linear regression of the data in the current series against one or more past values in the same series.

In an AR model, the value of the outcome variable (Y) at some point t in time is, as in a 'regular' linear regression, directly related to the predictor variable (X). Where simple linear regression and AR models differ is that Y depends on X and on previous values of Y.

The AR process is an example of a stochastic process, which has a degree of uncertainty or randomness built in. The randomness means that you might be able to predict future trends fairly well from past data, but you're never going to get 100% accuracy.

AR models are also called conditional models, Markov models, or transition models.

### AR(p) Models

An AR(p) model is an autoregressive model where specific lagged values of y_{t} are used as predictor variables. Lags are where results from one time period affect following periods.

The value for 'p' is called the order. For example, an AR(1) would be a 'first order autoregressive process.' The outcome variable in a first order AR process at some point in time t is related only to time periods that are one period apart (i.e. the value of the variable at t – 1). A second or third order AR process would be related to data two or three periods apart.

**The AR(p) model is defined by the equation:**

y_{t} = δ + φ_{1}y_{t-1} + φ_{2}y_{t-2} + … + φ_{p}y_{t-p} + ε_{t}

Where:

- y_{t-1}, y_{t-2}, …, y_{t-p} are the past series values (lags),
- ε_{t} is a random error term (also called white noise),
- and δ is a constant term; for a stationary process with mean μ, it satisfies δ = (1 - φ_{1} - φ_{2} - … - φ_{p})μ.
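
To make the equation concrete, here is a minimal Python sketch that simulates an AR(p) series by applying the recursion one step at a time. The function name `simulate_ar`, the Gaussian noise, the zero starting values, and the example coefficients are all assumptions chosen for illustration; none of them come from the definition itself.

```python
import numpy as np

def simulate_ar(delta, phis, n_steps, noise_std=1.0, seed=0):
    """Simulate y_t = delta + phis[0]*y_{t-1} + ... + phis[p-1]*y_{t-p} + eps_t."""
    rng = np.random.default_rng(seed)
    phis = np.asarray(phis, dtype=float)
    p = len(phis)
    # Assumption: start the recursion from p zeros (any starting values would do).
    y = np.zeros(n_steps + p)
    for t in range(p, n_steps + p):
        lags = y[t - p:t][::-1]  # ordered as y_{t-1}, y_{t-2}, ..., y_{t-p}
        y[t] = delta + phis @ lags + rng.normal(0.0, noise_std)
    return y[p:]  # drop the artificial starting values

# Hypothetical AR(2) example: delta = 1.0, phi_1 = 0.6, phi_2 = 0.2
series = simulate_ar(delta=1.0, phis=[0.6, 0.2], n_steps=200)
print(series[:5])
```

Setting `noise_std=0.0` removes the random term, which is a handy way to check the recursion by hand.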

### Parameter Estimation

From the equation, all {y_{t}} terms are already observed and known to us. What we want to estimate are the coefficients, i.e. δ, φ_{1}, φ_{2}, …, φ_{p}.

#### (A) Least Squares Regression

One estimation method is to formulate this as a least squares regression problem, basing the prediction of y_{t} on the *p* previous values of the same series. A general multiple linear regression is written as:

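Y_{i} = β_{0} + β_{1}X_{i,1} + β_{2}X_{i,2} + … + β_{p}X_{i,p} + ε_{i}

For the AR(p) model, Y_{i} is y_{t}, each X_{i,j} is the lag y_{t-j}, β_{0} is δ, and β_{j} is φ_{j}. Then, we try to minimize the sum of squared errors:

L(β) := Σ(ε_{i}^{2})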

= ||Xβ - Y||^{2}

= (Xβ - Y)^{T}(Xβ - Y)

= Y^{T}Y - Y^{T}Xβ - β^{T}X^{T}Y + β^{T}X^{T}Xβ

Because L(β) is a convex function, the optimal solution lies where the gradient is zero. So we first take the partial derivative with respect to β:

∂L(β)/∂β = ∂ (Y^{T}Y - Y^{T}Xβ - β^{T}X^{T}Y + β^{T}X^{T}Xβ) / ∂β

= -2X^{T}Y + 2X^{T}Xβ

Setting this gradient to zero gives the optimal parameters (assuming X^{T}X is invertible):

β = (X^{T}X)^{-1}X^{T}Y
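
The closed-form solution can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `fit_ar_least_squares` is made up for this post, and the toy series is generated without noise from y_{t} = 2 + 0.5*y_{t-1} so that the recovered coefficients are easy to check.

```python
import numpy as np

def fit_ar_least_squares(y, p):
    """Estimate [delta, phi_1, ..., phi_p] via beta = (X^T X)^{-1} X^T Y."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Each row of X is [1, y_{t-1}, ..., y_{t-p}] for t = p, ..., n-1.
    lag_columns = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    Y = y[p:]
    # Solve the normal equations X^T X beta = X^T Y without forming the inverse.
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    return beta, X, Y

# Toy, noise-free series (an assumption for illustration only).
toy = [1.0]
for _ in range(5):
    toy.append(2.0 + 0.5 * toy[-1])

beta, X, Y = fit_ar_least_squares(toy, p=1)
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta  # should vanish at the optimum
print(beta, gradient)
```

In practice, `np.linalg.lstsq(X, Y, rcond=None)` solves the same problem and tends to behave better when X^{T}X is close to singular.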

### Example

Suppose we collected the previous 20 daily closing prices of stock A:

10, 11.3, 10.71, 10.87, 10.48, 11.36, 10.49, 10.57, 10.58, 9.42, 9.11, 9.75, 11.14, 9.72, 10.73, 10.57, 11.91, 13.09, 14.34, 14.09

Now, we want to use an AR(1) model (i.e. y_{t} = δ + φ_{1}y_{t-1} + ε_{t}) to explain this series.

Using the least squares estimation method above, we obtain δ = 1.319141067 and φ_{1} = 0.898255165, so the fitted model is:

y_{t} = 1.319141067 + 0.898255165*y_{t-1}

Then, we can calculate the expected value of y at time t given that we know the value at t-1, i.e. E(y_{t}|y_{t-1}):

Seq | y_{t} | Estimated y_{t} |
---|---|---|
1 | 10 | - |
2 | 11.3 | 10.3016927176918 |
3 | 10.71 | 11.4694244322527 |
4 | 10.87 | 10.939453884875 |
5 | 10.48 | 11.0831747112825 |
6 | 11.36 | 10.7328551969143 |
7 | 10.49 | 11.5233197421555 |
8 | 10.57 | 10.7418377485647 |
9 | 10.58 | 10.8136981617685 |
10 | 9.42 | 10.8226807134189 |
11 | 9.11 | 9.78070472196458 |
12 | 9.75 | 9.50224562080005 |
13 | 11.14 | 10.07712892643 |
14 | 9.72 | 11.3257036058452 |
15 | 10.73 | 10.0501812714786 |
16 | 10.57 | 10.957418988176 |
17 | 11.91 | 10.8136981617685 |
18 | 13.09 | 12.0173600829313 |
19 | 14.34 | 13.0773011776866 |
20 | 14.09 | 14.2001201339952 |
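
If you want to reproduce these numbers yourself, the following sketch fits the AR(1) model to the 20 prices with NumPy and computes the one-step estimates. The use of `np.linalg.lstsq` and the variable names are my own choices here, not necessarily how the figures in the table were originally produced.

```python
import numpy as np

prices = np.array([
    10, 11.3, 10.71, 10.87, 10.48, 11.36, 10.49, 10.57, 10.58, 9.42,
    9.11, 9.75, 11.14, 9.72, 10.73, 10.57, 11.91, 13.09, 14.34, 14.09,
])

y_prev = prices[:-1]  # y_{t-1} for t = 2, ..., 20
y_curr = prices[1:]   # y_t     for t = 2, ..., 20

# Design matrix with an intercept column (for delta) and one lag column (for phi_1).
X = np.column_stack([np.ones(len(y_prev)), y_prev])
(delta, phi1), *_ = np.linalg.lstsq(X, y_curr, rcond=None)

# One-step estimates E(y_t | y_{t-1}) for Seq 2 through 20.
estimates = delta + phi1 * y_prev

print(delta, phi1)
for seq, (actual, est) in enumerate(zip(y_curr, estimates), start=2):
    print(seq, actual, est)
```

The fitted `delta` and `phi1` should come out close to the values of δ and φ_{1} used for the table, and `estimates` should line up with its third column.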

Moreover, we can iterate the formula, feeding each estimate back in as the previous value, to forecast the next stock prices.

Seq | y_{t} | Estimated y_{t} |
---|---|---|
21 | - | 13.9755563427335 |
22 | - | 13.8727567364869 |
23 | - | 13.7804164592112 |
24 | - | 13.6974713282064 |
25 | - | 13.6229654358659 |
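
The iteration is short enough to sketch in plain Python. The function name `forecast_ar1` is hypothetical; the coefficients and the last observed price (14.09) are the ones from this example.

```python
def forecast_ar1(delta, phi1, last_value, steps):
    """Iterate y_t = delta + phi1 * y_{t-1}, feeding each forecast back in."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = delta + phi1 * current  # noise term replaced by its mean, zero
        forecasts.append(current)
    return forecasts

# Coefficients from the fitted AR(1) model and the last observed price (Seq 20).
delta, phi1 = 1.319141067, 0.898255165
forecasts = forecast_ar1(delta, phi1, last_value=14.09, steps=5)

for seq, value in enumerate(forecasts, start=21):
    print(seq, value)
```

Because φ_{1} is below 1 in absolute value, each step pulls the forecast a little closer to the long-run level δ / (1 - φ_{1}), which is roughly 12.97 for these coefficients. That is why the forecasts decline gradually from the last observed price.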

Now you understand the statistical theory behind an autoregressive model. Head to the next post to see how to implement it as a trading strategy!