Home Data Science How to R - Beginner Guide Time Series Analysis

How to R - Beginner Guide Time Series Analysis

Wiki How to R - Beginners Guide (1)

In the tutorial "How to R - Beginner Guide Time Series Analysis", a time series analysis with R is carried out, outlining the essential steps of data preparation, modelling and visualization. The focus is on the use of the R packages tidyverse, tsibble and fable.

Table of contents

Time series analysis with R

In this short tutorial, we will limit ourselves to briefly outlining the essential steps in time series analysis in the R programming language.

These are:

  • Data preparation,
  • Modeling and
  • Visualization.  

We use the R packages tidyverse, tsibble and fable for this.

Data preparation 

As a rule, all necessary functions for general data preparation are contained in the "tidyverse" package collection. For data preparation in the "tidy" style, however, the "tsibble" package is also needed. First we load the packages and the data:

				
					library(fable) 

library(tsibble) 
library(tidyverse, quietly = TRUE, warn.conflicts = FALSE) 

raw_data  = read_delim("~/Documents/Progonosewerkstatt/Peer_Labs/R/2_DataPreparation/01_Data/Company.csv", delim=";") 

glimpse(raw_data) 

## Rows: 101 
## Columns: 9 
## $ Month       <dttm> 2011-01-01, 2011-02-01, 2011-03-01, 2011-04-01, 2011-05-… 
## $ Germany     <dbl> 19722000, 25062000, 47066000, 52625000, 66489000, 5660100… 
## $ Canada      <dbl> 1809000, 2206000, 3035000, 3793000, 4813000, 4370000, 454… 
## $ Switzerland <dbl> 3624000, 4972000, 7010000, 7084000, 9452000, 7971000, 733… 
## $ Austria     <dbl> 1436000, 2390000, 6556000, 7544000, 9069000, 8154000, 817… 
## $ US          <dbl> 3598000, 3349000, 6101000, 6098000, 6935000, 8644000, 757… 
## $ France      <dbl> 4861000, 5465000, 8440000, 8223000, 9706000, 8809000, 806… 
## $ Sweden      <dbl> 10723000, 12333000, 23562000, 30595000, 46253000, 3886900… 
## $ China       <dbl> 8103000, 4040000, 25974000, 34776000, 31125000, 31576000,… 

 

Im “tidy”-Style werden die Daten im Long-Format benötigt. Wir nutzen die “pivot_longer” Funktion. Anschliessend konvertieren wir das tibble-Objekt (= tidy dataframe) in ein tsibble Objekt, welches ein dataframe für Zeitreihen ist. 

prep_data = raw_data %>%  
  pivot_longer(., cols = 2:9, names_to = "Subsidiary", values_to = "Revenues") %>%  
  mutate(Month = yearmonth(Month)) %>%  
  tsibble(., key = Subsidiary, index = Month) 
glimpse(prep_data) 

## Rows: 808 
## Columns: 3 
## Key: Subsidiary [8] 
## $ Month      <mth> 2011 Jan, 2011 Feb, 2011 Mar, 2011 Apr, 2011 May, 2011 Jun… 
## $ Subsidiary <chr> "Austria", "Austria", "Austria", "Austria", "Austria", "Au… 
## $ Revenues   <dbl> 1436000, 2390000, 6556000, 7544000, 9069000, 8154000, 8173… 
				
			

To get a feel for the data, let's look at it first.

				
					prep_data %>% autoplot(Revenues) 
				
			
Plot in R: Graph from Revenues per Month and Country

We see that there is a break in the data between 2016 and early 2020. After consulting with the technical expert, we know that it is a system change and all values must be non-negative. We correct this.

				
					prep_data = prep_data %>% 
  mutate(Revenues = abs(Revenues))
prep_data %>% autoplot(Revenues)

				
			
Plot in R Revenues per Month and Country

It can now be seen that the available time series have a strong seasonal character. The data are thus prepared and can be modelled.

Modelling the time series analysis

Two widely used models available in the Fable package are ETS and ARIMA. These models are specified with a compact formula representation. The response variable (Revenues) and all transformations are included on the left-hand side, while the model specification is on the right-hand side of the formula. If a model is not fully specified (or if the right-hand side of the formula is completely missing), the non-specified components are automatically selected.

We use both model classes to find the best model for the respective time series.

				
					fit <- prep_data %>%
  model(
    ets = ETS(Revenues),
    arima = ARIMA(Revenues)
  )
## Warning in sqrt(diag(best$var.coef)): NaNs produced

				
			

With coef() one can access all fit parameters of a model.

				
					fit %>% coef()
## # A tibble: 145 x 7
##    Subsidiary .model term  estimate std.error statistic p.value
##    <chr>      <chr>  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
##  1 Austria    ets    alpha  2.21e-1        NA        NA      NA
##  2 Austria    ets    gamma  1.00e-4        NA        NA      NA
##  3 Austria    ets    l      6.43e+6        NA        NA      NA
##  4 Austria    ets    s0    -4.15e+6        NA        NA      NA
##  5 Austria    ets    s1    -3.35e+5        NA        NA      NA
##  6 Austria    ets    s2     2.47e+6        NA        NA      NA
##  7 Austria    ets    s3     2.42e+6        NA        NA      NA
##  8 Austria    ets    s4     1.35e+6        NA        NA      NA
##  9 Austria    ets    s5     1.78e+6        NA        NA      NA
## 10 Austria    ets    s6     1.82e+6        NA        NA      NA
## # … with 135 more rows

				
			

With glance, various statistical parameters of the fit of the model to the time series can be viewed.

				
					fit %>% glance
## # A tibble: 16 x 12
##    Subsidiary .model   sigma2 log_lik   AIC  AICc   BIC      MSE     AMSE
##    <chr>      <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1 Austria    ets    3.51e+11  -1568. 3166. 3172. 3205.  3.02e11  3.14e11
##  2 Canada     ets    1.08e- 2  -1529. 3088. 3093. 3127.  1.47e11  1.53e11
##  3 China      ets    1.84e+13  -1768. 3566. 3572. 3605.  1.59e13  2.76e13
##  4 France     ets    5.57e+11  -1591. 3213. 3218. 3252.  4.80e11  4.79e11
##  5 Germany    ets    1.98e+13  -1772. 3573. 3579. 3613.  1.71e13  1.73e13
##  6 Sweden     ets    1.08e+13  -1741. 3512. 3517. 3551.  9.26e12  1.02e13
##  7 Switzerla… ets    3.61e+11  -1569. 3169. 3175. 3208.  3.11e11  3.20e11
##  8 US         ets    5.81e- 3  -1568. 3172. 3181. 3219.  3.20e11  3.23e11
##  9 Austria    arima  4.55e+11  -1308. 2624. 2625. 2634. NA       NA      
## 10 Canada     arima  1.82e+11  -1285. 2581. 2582. 2593. NA       NA      
## 11 China      arima  2.07e+13  -1492. 2992. 2992. 3002. NA       NA      
## 12 France     arima  5.13e+11  -1329. 2672. 2674. 2690. NA       NA      
## 13 Germany    arima  2.20e+13  -1498. 3003. 3003. 3010. NA       NA      
## 14 Sweden     arima  1.10e+13  -1468. 2944. 2944. 2954. NA       NA      
## 15 Switzerla… arima  4.56e+11  -1322. 2655. 2655. 2667. NA       NA      
## 16 US         arima  4.78e+11  -1311. 2633. 2634. 2648. NA       NA      
## # … with 3 more variables: MAE <dbl>, ar_roots <list>, ma_roots <list>


				
			

With the forecast function we generate the forecasts for each time series from the models.

				
					fc = fit %>% 
  forecast(h = "2 years")

				
			

Finally, we visualize the forecasts. It can be seen that both methods pick up the patterns of the individual time series and incorporate them into the forecast of their respective methodology, as is the case, for example, with the "summer slump" in the French subsidiary or the rising trend in the US company.

				
					fc %>% autoplot(prep_data) +
  ggtitle("Reveunes of Company's Subsidiaries")

				
			
Forecast in R as plot

Download Wiki article as a .pdf

Know more?

Would you like to delve deeper into this topic? Then we would be happy to talk to you personally about the areas of application of R - gladly also in connection with analytics products from SAP. Simply get in touch with us!

Your analytics contact
Christiane Maria Kallfass is Recruiting and Marketing Specialist at s-peers AG
Christiane Kallfass
Recruiting and Marketing Specialist

Published by:

Dr. Stefan Lieder

Former Head of Data Science Workshop

author

How did you like the article?

How helpful was this post?

Click on a star to rate!

Average rating 5 / 5.
Number of ratings: 1

No votes so far! Be the first person to rate this post!

INFORMATION

More information

The connection between SAC and DSP

Seamless planning combines SAC and DSP

The seamless integration of SAP Analytics Cloud (SAC) and SAP Datasphere (DSP) optimizes, accelerates and simplifies your planning processes. So...
Wiki project management

3 steps to successful stakeholder management

Stakeholder management is a key success factor for any project. It...
Business Content in the SAP Datasphere - Wiki Visual

Business content in the SAP Datasphere - best practices for rapid progress

SAP Datasphere offers predefined business content, similar to SAP BW or SAP Analytics Cloud. This consists of data models...

Optimized vs. classic mode in the SAP Analytics Cloud

The SAP Analytics Cloud (SAC) currently offers users two main modes...
Wiki_Banner_Using Sharepoint as a Datasource_2

SharePoint as a data source for BW/DS

SharePoint is a widely used platform in companies and offers...
What can the new SAC Add-in for Microsoft PowerPoint (Composable Analytics) do? Visual

What can the new SAC Add-in for Microsoft PowerPoint (Composable Analytics) do?

This wiki article is about the new SAP Analytics Cloud (SAC) add-in for Microsoft PowerPoint, which provides a...