
Why does a time series need to be stationary for analysis?

The first step in any time series analysis is to make the data set stationary. In simple terms, stationarity means a near-constant mean and variance across time.

The red line above shows an increasing trend, and the blue line is the de-trended series. De-trending means fitting a regression line to the data and then subtracting the fitted line from the original series.
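
As a minimal sketch of that in R (the toy data and variable names below are made up, not from the post):

set.seed(1)
y <- 0.5 * (1:120) + rnorm(120)    # a toy series with an increasing trend (the 'red line' idea)
t <- seq_along(y)
fit <- lm(y ~ t)                   # fit a straight regression line against time
detrended <- y - fitted(fit)       # subtract the fitted trend: the 'blue line' style series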

Stationarity does not mean that the series never changes over time, just that the way it changes does not itself change over time.

The reason why we need stationary data is simple – it is easier to analyze and predict a data set that is stationary. If a series is consistently increasing over time (like the one above), then the sample mean and variance grow with the size of the sample, and your model or proposed time series solution will keep underestimating the mean and variance in future periods.
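
A rough illustration with made-up data: for a trending series, the sample mean keeps growing as more observations arrive, so an estimate based on the past under-shoots the future.

set.seed(2)
y <- 1:100 + rnorm(100)   # a consistently increasing series
mean(y[1:50])             # mean of the first half
mean(y[51:100])           # mean of the second half is much larger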

How do you check the stationarity of a series?
In general, we use the Augmented Dickey-Fuller (ADF) test or the KPSS test to check the stationarity of a series. Here we will discuss only the ADF test; KPSS some other time.

ADF is a statistical significance test (a test that involves a null and an alternative hypothesis), and it falls under the category of 'unit root tests'. Now, what is a unit root test?
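
As a rough sketch (in my notation here; the exact deterministic terms depend on the flavour of the test), the model behind a unit root test can be written as

Yt = a1·Yt−1 + b·Xe + εt

and the series is said to have a unit root when a1 = 1.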

Yt is the value of the time series at time ‘t’ and Xe is an exogenous variable (a separate explanatory variable, which is also a time series).

The presence of a unit root means the time series is non-stationary. Besides, the number of unit roots contained in the series corresponds to the number of differencing operations required to make the series stationary.

A time series is generated by a process that can be written in terms of its components, and those components contain 'roots'. For example:

v(t) = c + a1·v(t−1) + ε(t)

The coefficient a1 is the root of this process. You can interpret this formula as 'the value of today depends on the value of yesterday plus some randomness we can't predict'. As long as a1 is smaller than 1 in absolute value, we expect this process to keep converging back towards its long-run level (c/(1−a1), which is simply 0 when c = 0).

Try this out with an example:
suppose c = 0 and a1 = 0.5.

If yesterday's value v(t−1) was 100, then we expect today's value to be around 50. Tomorrow, we expect the value to be around 25, and so on.
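
A tiny sketch of that arithmetic in R (deterministic part only, ignoring the noise term):

v <- 100
for (i in 1:5) {
  v <- 0.5 * v   # c = 0, a1 = 0.5: today is half of yesterday
  print(v)       # prints 50, 25, 12.5, 6.25, 3.125
}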

You see that this series will 'come home', in this case meaning it will converge back to the value of c.

When one of the roots is a unit, i.e. equal to 1 (in this example, when a1 = 1), the series will not recover to its origin. You can see this by re-running the example above with a1 = 1: today's expected value is simply yesterday's value, so every shock sticks around.
That is why the concepts of unit roots and unit root tests are useful: they tell us whether the time series will recover to its expected value. If it will not, the process will be very susceptible to shocks and hard to predict and control.
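
Here is a rough simulation (my own illustration, not from the post) contrasting the two cases: with a1 = 0.5 the series keeps snapping back towards zero, while with a1 = 1 (a unit root) every shock is carried forward.

set.seed(42)
n <- 200
eps <- rnorm(n)
stationary <- numeric(n)    # v(t) = 0.5 * v(t-1) + eps(t)
random_walk <- numeric(n)   # v(t) = 1.0 * v(t-1) + eps(t), i.e. a unit root
stationary[1] <- random_walk[1] <- 100
for (t in 2:n) {
  stationary[t]  <- 0.5 * stationary[t - 1] + eps[t]
  random_walk[t] <- 1.0 * random_walk[t - 1] + eps[t]
}
plot.ts(cbind(stationary, random_walk))   # the first series 'comes home', the second wanders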

What is the significance of the p-value in the ADF test?
The null hypothesis of the ADF test is that the series has a unit root, i.e. is non-stationary. A high p-value, say 0.87, means we do not have enough evidence to reject that null, so we treat the series as non-stationary.
We then difference the dataset (once or more, if needed) to make it stationary.

library(tseries)              # adf.test() comes from the tseries package
adf.test(diff(time_series))

In the snippet above, we difference the time series once and then test its stationarity using the ADF test in R.
You can also try second-order differencing, or differencing after a log transform, to check stationarity (useful if the noise is high):

adf.test(diff(log(time_series)))

A rule of thumb – don't over-difference, i.e. don't apply six or seven rounds of differencing just to push the p-value down and call the dataset stationary.

In the case of a first difference, we are literally taking the difference between each value and the one for the time period immediately before it, as in the tiny example below. If you find yourself needing a very high order of differencing, it usually means your data is too noisy to support a clean time series pattern.
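
For instance, on a tiny made-up vector:

x <- c(100, 110, 125, 130)
diff(x)   # returns 10 15 5, i.e. each value minus the one immediately before it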

Differencing can help stabilize the mean of a time series by removing changes in its level, and therefore eliminating (or reducing) trend and seasonality.
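
For reference, base R's diff() exposes both ideas directly (the toy data below is made up):

set.seed(7)
x <- cumsum(rnorm(48)) + rep(sin(2 * pi * (1:12) / 12), times = 4)   # rough trend plus crude seasonality
diff(x, differences = 2)   # second-order differencing: the difference of the difference
diff(x, lag = 12)          # seasonal differencing: subtract the value 12 periods back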

Bottom line:
- The value of today depends on the value of yesterday plus some randomness we can't predict.
- Stationarity is useful for identifying the pattern in order to predict values.
- You difference the series once, twice, three times, etc. until the dataset becomes stationary.
- Run an ADF or a KPSS test to check whether the series is stationary.
- After that, chill 🙂

Keep Learning 🙂

The Data Monk
