Updated on: 2022-07-30

1 Introduction

In the project “Forecasting Singapore CPI Using Different Time Series Models”, I mentioned that I had ignored the effects of seasonality when estimating the current Consumer Price Index (CPI) of Singapore. In reality, seasonal patterns can be found in many different types of data, such as the total number of air passengers in a month over different seasons and monthly business activity during festive seasons. Accounting for such patterns may help to improve estimation of a variable using different models.

The project uses Autoregressive (Integrated) Moving Average models or AR(I)MA, seasonal ARIMA or SARIMA, as well as ARIMA/SARIMA with exogenous regressors (ARIMAX/SARIMAX) to estimate CPI at time \(t\).

2 Packages Required

# Use install.packages("packagename") if they are not already installed

# Packages that will be needed for obtaining data via an API
library(httr)
library(jsonlite)

# Packages that will be needed for the main parts of the project
library(corrplot) # For visualizing correlation between variables
library(foreach) # Foreach loops
library(forecast) # For ARIMA models, forecasting and evaluation
library(tidyverse) # For ggplot2 and dplyr
library(zoo) # For converting data to and working with zoo objects

3 Importing Data

The data used in this project is the same as the previous project. I imported the data from the Department of Statistics Singapore using an API and did the necessary data cleaning and manipulation. For a detailed explanation of how to import data using API and the data cleaning procedures, please see Section 3 and 4 of the previous project.

# Import Singapore Monthly Consumer Price Index 

cpi_url <- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M212881?isTestApi=true&seriesNoORrowNo=1"

raw_cpi <- httr::GET(url = cpi_url)

cpi_content <- jsonlite::fromJSON(rawToChar(raw_cpi$content))

cpi_data <- as.data.frame(cpi_content$Data$row$columns[[1]])

# order.by argument helps to sort the data in time ascending order
cpi <- zoo(x = cpi_data$value, order.by = as.yearmon(cpi_data$key, format = "%Y %b"))

# Values are of class "character", need to change them to numeric 
storage.mode(cpi) <- "numeric"

head(cpi); tail(cpi)
## Jan 1961 Feb 1961 Mar 1961 Apr 1961 May 1961 Jun 1961 
##   24.542   24.565   24.585   24.187   24.053   24.223
## Jan 2022 Feb 2022 Mar 2022 Apr 2022 May 2022 Jun 2022 
##  104.472  105.379  106.691  106.547  107.598  108.671
# Import Singapore Monthly Domestic Supply Price Index

dspi_url <- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M212701?isTestApi=true&seriesNoORrowNo=1"

raw_dspi <- httr::GET(url = dspi_url)

dspi_content <- jsonlite::fromJSON(rawToChar(raw_dspi$content))

dspi_data <- as.data.frame(dspi_content$Data$row$columns[[1]])

dspi <- zoo(x = dspi_data$value, order.by = as.yearmon(dspi_data$key, format = "%Y  %b"))

storage.mode(dspi) <- "numeric"

head(dspi); tail(dspi)
## Jan 1974 Feb 1974 Mar 1974 Apr 1974 May 1974 Jun 1974 
##   79.774   80.715   81.278   81.560   81.842   81.278
## Jan 2022 Feb 2022 Mar 2022 Apr 2022 May 2022 Jun 2022 
##  111.692  115.835  125.088  127.240  129.836  129.371
# Import Singapore Quarterly Composite Leading Index

cli_url <- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M240421?isTestApi=true"

raw_cli <- httr::GET(url = cli_url)

cli_content <- jsonlite::fromJSON(rawToChar(raw_cli$content))

cli_data <- as.data.frame(cli_content$Data$row$columns[[1]])

cli <- zoo(x = cli_data$value, order.by = as.yearqtr(cli_data$key, format = "%Y %q"))

storage.mode(cli) <- "numeric"

head(cli); tail(cli)
## 1978 Q1 1978 Q2 1978 Q3 1978 Q4 1979 Q1 1979 Q2 
##    25.7    26.5    27.2    26.6    27.8    27.4
## 2020 Q4 2021 Q1 2021 Q2 2021 Q3 2021 Q4 2022 Q1 
##   111.0   110.9   112.1   112.0   112.9   112.3

4 Exploratory Data Analysis

In this section, I used some plots to visualize if there are any seasonal patterns in the CPI, DSPI and CLI data. Before that, I will need to create quarterly CPI and DSPI data from the monthly data and match the start and end periods of CLI since it has the lowest frequency and observations.

# Aggregate monthly data to quarterly data using mean value of the months in a quarter

cpi_q <- aggregate(cpi, as.yearqtr, mean)

dspi_q <- aggregate(dspi, as.yearqtr, mean)

# Adjust the sample size to match CLI data

cpi_q <- window(x = cpi_q, start = start(cli), end = end(cli))

head(cpi_q); tail(cpi_q)
##  1978 Q1  1978 Q2  1978 Q3  1978 Q4  1979 Q1  1979 Q2 
## 44.05733 44.16000 44.97500 45.09767 45.05167 45.37067
##  2020 Q4  2021 Q1  2021 Q2  2021 Q3  2021 Q4  2022 Q1 
## 100.0950 100.8980 101.6077 102.1867 103.7827 105.5140
dspi_q <- window(x = dspi_q, start = start(cli), end = end(cli))

head(dspi_q); tail(dspi_q)
##   1978 Q1   1978 Q2   1978 Q3   1978 Q4   1979 Q1   1979 Q2 
##  88.73200  89.92267  91.01833  91.45767  93.86900 100.88533
##   2020 Q4   2021 Q1   2021 Q2   2021 Q3   2021 Q4   2022 Q1 
##  87.67033  94.50600  99.25533 104.35233 109.15000 117.53833

Merge data into a single object:

dat <- cbind(CPI = cpi_q, DSPI = dspi_q, CLI = cli)

head(dat); dim(dat)
##              CPI      DSPI  CLI
## 1978 Q1 44.05733  88.73200 25.7
## 1978 Q2 44.16000  89.92267 26.5
## 1978 Q3 44.97500  91.01833 27.2
## 1978 Q4 45.09767  91.45767 26.6
## 1979 Q1 45.05167  93.86900 27.8
## 1979 Q2 45.37067 100.88533 27.4
## [1] 177   3

To create a boxplot of grouped by the quarters, I would use the first-differenced data. This is because the CPI, DSPI and CLI series at level is not stationary and the boxplots I had plotted in my previous project is very likely to be erroneous. Furthermore, I would group the plots by several decades, starting from 1978Q1-1987Q4 and ending with 2008Q1-2017Q4, since the trends/patterns may vary across years.

Take first-difference on level data:

diff_dat <- diff(x = dat, lag = 1, differences = 1) %>% 
  `colnames<-`(c("dCPI", "dDSPI", "dCLI"))

head(diff_dat); dim(diff_dat)
##               dCPI     dDSPI dCLI
## 1978 Q2  0.1026667 1.1906667  0.8
## 1978 Q3  0.8150000 1.0956667  0.7
## 1978 Q4  0.1226667 0.4393333 -0.6
## 1979 Q1 -0.0460000 2.4113333  1.2
## 1979 Q2  0.3190000 7.0163333 -0.4
## 1979 Q3  1.5480000 6.6393333  0.0
## [1] 176   3

Plot level and differenced CPI series:

par(mfrow = c(2, 1))

plot(dat$CPI, main = "Singapore Quarterly CPI from 1978Q1 to 2022Q1", ylab = "CPI")

plot(diff_dat$dCPI, main = "Differenced CPI from 1978Q2 to 2022Q1", ylab = "dCPI")

Boxplot showing range of differenced CPI values in different quarters across different years:

startdates = c("1978Q1", "1988Q1", "1998Q1", "2008Q1")
enddates = c("1987Q4", "1997Q4", "2007Q4", "2017Q4")

par(mfrow = c(2, 2))

invisible(
foreach(i = startdates, j = enddates) %do% {
  
  boxplot(window(x = diff_dat$dCPI, start = i, end = j) ~ cycle(window(x = diff_dat$dCPI, start = i, end = j)),
          ylab = "Quarterly dCPI", xlab = "Quarters",
          main = paste(i, "to", j, sep = " "))
  
}
)

In each of the plot, we can see varying median values of differenced CPI in different quarters, which might indicate that seasonality is present in the CPI series. The seasonal patterns were also not constant, which can be seen by comparing the different plots.

5 Model Selection

To add the effects of seasonality into the models, I would use seasonal differencing or seasonal dummies and test how they differ. For seasonal dummies, I would need to create \(k - 1\) columns of dummy variables representing the different quarters.

Create dummy variables for quarters:

sdum <- zoo(forecast::seasonaldummy(x = as.ts(diff_dat)), order.by = index(diff_dat))

diff_dat <- merge(diff_dat, sdum)

head(diff_dat); tail(diff_dat)
##               dCPI     dDSPI dCLI Q1 Q2 Q3
## 1978 Q2  0.1026667 1.1906667  0.8  0  1  0
## 1978 Q3  0.8150000 1.0956667  0.7  0  0  1
## 1978 Q4  0.1226667 0.4393333 -0.6  0  0  0
## 1979 Q1 -0.0460000 2.4113333  1.2  1  0  0
## 1979 Q2  0.3190000 7.0163333 -0.4  0  1  0
## 1979 Q3  1.5480000 6.6393333  0.0  0  0  1
##              dCPI     dDSPI dCLI Q1 Q2 Q3
## 2020 Q4 0.3870000 -0.358000  3.6  0  0  0
## 2021 Q1 0.8030000  6.835667 -0.1  1  0  0
## 2021 Q2 0.7096667  4.749333  1.2  0  1  0
## 2021 Q3 0.5790000  5.097000 -0.1  0  0  1
## 2021 Q4 1.5960000  4.797667  0.9  0  0  0
## 2022 Q1 1.7313333  8.388333 -0.6  1  0  0

Before estimating the models, I would split the data into training and testing sets, with the training set containing data up to end-2020 and leave the rest for testing.

train <- window(x = diff_dat, end = "2020 Q4")

test <- window(x = diff_dat, start = "2021 Q1")

dim(train); dim(test)
## [1] 171   6
## [1] 5 6

5.1 Autoregressive (Integrated) Moving Average Model

The ARIMA model is the simplest time series model, where it uses the lags of itself and the lags of the error terms. An \(ARIMA(p, d, q)\) model has three components/parameters, \(p\) is the number of AR lags, \(q\) is the number of MA lags, and \(d\) is the number of differencing required to achieve stationarity in the series.

Looking at the autocorrelation and partial autocorrelation plots can help us guess the lags of the AR and MA terms.

train$dCPI %>% tsdisplay(main = "ACF and PACF of dCPI")

# From the PACF, we might use 1 or 2 AR lags. From the ACF, we might use up to 4 MA lags

mod1 <- forecast::auto.arima(y = train$dCPI,
                             max.p = 4, max.q = 4, 
                             seasonal = F, ic = "aic", 
                             stepwise = F, approximation = F, trace = F)

mod1
## Series: train$dCPI 
## ARIMA(1,0,4) with non-zero mean 
## 
## Coefficients:
##           ar1     ma1     ma2    ma3     ma4    mean
##       -0.8769  1.3394  0.6345  0.358  0.2446  0.3281
## s.e.   0.0709  0.1024  0.1281  0.113  0.0726  0.0637
## 
## sigma^2 = 0.1996:  log likelihood = -102.21
## AIC=218.42   AICc=219.11   BIC=240.41
checkresiduals(object = mod1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,4) with non-zero mean
## Q* = 5.5025, df = 3, p-value = 0.1385
## 
## Model df: 6.   Total lags used: 9

The estimated model is:

\[ \widehat{CPI}_t = 0.62 - 0.88 CPI_{t-1} + 1.34 u_{t-1} + 0.63 u_{t-2} + 0.36 u_{t-3} + 0.24 u_{t-4} \]

The Ljung-Box test shows that the errors are not serially correlated (we do not reject the null based on p-value of 0.14)

Plot fitted dCPI of ARMA model and actual dCPI:

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year", 
     main = "Fitted dCPI from ARMA Model")

lines(zoo(x = mod1$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

5.2 Seasonal ARMA Model

The SARMA model adds in seasonal components by seasonal differencing and/or using seasonal lags of AR and MA terms, which is similar to a standard ARMA model.

The model can be written as SARIMA(p, d, q)(P, D, Q)s, where the additional P and Q are the seasonal lags of AR and MA terms, D is the seasonal difference, and s is the seasonal frequency (monthly = 12, quarterly = 4, etc.).

The ACF and PACF plots of dCPI in Section 5.1 shows significant spikes at lag 4 and 8 in the ACF plot and significant spike at lag 8 in the PACF plot. This would indicate possibility of seasonal components in the data.

mod2 <- forecast::auto.arima(y = train$dCPI, 
                             max.p = 4, max.q = 4, 
                             seasonal = T, ic = "aic", 
                             stepwise = F, approximation = F, trace = F)

mod2
## Series: train$dCPI 
## ARIMA(1,0,0)(2,0,2)[4] with non-zero mean 
## 
## Coefficients:
##          ar1    sar1     sar2     sma1    sma2    mean
##       0.4937  0.9830  -0.6436  -0.9344  0.8346  0.3167
## s.e.  0.0697  0.1548   0.1779   0.1371  0.1506  0.0874
## 
## sigma^2 = 0.1914:  log likelihood = -99.69
## AIC=213.37   AICc=214.06   BIC=235.36
checkresiduals(object = mod2, plot = F)
## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(2,0,2)[4] with non-zero mean
## Q* = 10.742, df = 3, p-value = 0.01321
## 
## Model df: 6.   Total lags used: 9

There is no seasonal differencing, but there are seasonal lags added to the model. The estimated model is:

\[ \widehat{CPI}_t = 0.05 + 0.49 CPI_{t-1} + 0.98 CPI_{t-4} - 0.64 CPI_{t-8} - 0.93 u_{t-4} + 0.83 u_{t-8} \]

The Ljung-Box test shows that the errors are serially correlated (we reject the null based on p-value of 0.01)

Plot fitted dCPI of SARMA model and actual dCPI:

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year",
     main = "Fitted dCPI from SARMA Model")

lines(zoo(x = mod2$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

5.3 ARMA Model with Seasonal Dummies

Another way we can include seasonal effects is to add seasonal dummies to the models. The seasonal dummies are called exogenous regressors, and adding them to an ARMA model forms the ARMAX model. According to Hyndman (2010), the coefficient of the exogenous regressors cannot be interpreted as the effect on \(y_t\) when \(x_t\) changes, as it is conditional on the lags of the dependent variable. He suggested using ARMA errors instead, which allows the coefficients of exogenous regressors to be interpreted normally, which is implemented by the forecast package when the xreg argument is used.

mod3 <- forecast::auto.arima(y = train$dCPI, 
                             max.p = 4, max.q = 4,
                             seasonal = F, ic = "aic", 
                             stepwise = F, approximation = F, trace = F,
                             xreg = train[, 4:6])

mod3
## Series: train$dCPI 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1      ma1  intercept      Q1       Q2      Q3
##       0.6822  -0.2506     0.2889  0.0239  -0.0647  0.1939
## s.e.  0.1110   0.1505     0.0907  0.0756   0.0801  0.0750
## 
## sigma^2 = 0.1961:  log likelihood = -100.47
## AIC=214.94   AICc=215.63   BIC=236.93
checkresiduals(object = mod3, plot = F)
## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 9.8405, df = 3, p-value = 0.01997
## 
## Model df: 6.   Total lags used: 9

The estimated model is:

\[ \widehat{CPI}_t = 0.29 + 0.02(Q1) - 0.06(Q2) + 0.19(Q3) + \eta_t \\ \eta_t = 0.68\eta_{t-1} - 0.25u_{t-1} + u_t \]

Based on the estimated model, Quarters 1 and 3 generally has higher dCPI, while Quarter 2 generally has lower dCPI compared to Quarter 4 (base equation), ceteris paribus.

Plot fitted dCPI of ARMA model with seasonal dummies and actual dCPI:

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year",
     main = "Fitted dCPI from ARMA Model with Seasonal Dummies")

lines(zoo(x = mod3$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

5.4 (S)ARMA Model with Exogenous Variables

The (S)ARMAX model, which was already shown above using seasonal dummies, can include other variables that are exogenous factors of CPI. For this, I assumed that DSPI and CLI are external factors that affect CPI.

I would compare three different ARMAX models: one without including seasonal dummies, one including seasonal dummies and one using SARMA without seasonal dummies.

Estimate ARMAX model without seasonal dummies:

mod4 <- forecast::auto.arima(y = train$dCPI, 
                             max.p = 4, max.q = 4,
                             seasonal = F, ic = "aic", 
                             stepwise = F, approximation = F, trace = F,
                             xreg = train[, 2:3])

mod4
## Series: train$dCPI 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1      ma1  intercept   dDSPI     dCLI
##       0.7719  -0.4564     0.3523  0.0386  -0.0495
## s.e.  0.1154   0.1725     0.0783  0.0099   0.0206
## 
## sigma^2 = 0.192:  log likelihood = -99.13
## AIC=210.27   AICc=210.78   BIC=229.12
checkresiduals(object = mod4, plot = F)
## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 11.303, df = 3, p-value = 0.0102
## 
## Model df: 5.   Total lags used: 8

\[ \widehat{CPI}_t = 0.35 + 0.04(dDSPI) - 0.05(dCLI) + \eta_t \\ \eta_t = 0.77\eta_{t-1} - 0.46u_{t-1} + u_t \]

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year",
     main = "Fitted dCPI from ARMAX Model without Seasonal Dummies")

lines(zoo(x = mod4$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

Estimate ARMAX model with seasonal dummies:

mod5 <- forecast::auto.arima(y = train$dCPI, 
                             max.p = 4, max.q = 4,
                             seasonal = F, ic = "aic", 
                             stepwise = F, approximation = F, trace = F,
                             xreg = train[, 2:6])

mod5
## Series: train$dCPI 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1      ma1  intercept   dDSPI     dCLI      Q1       Q2      Q3
##       0.7381  -0.3746     0.3340  0.0364  -0.0471  0.0097  -0.0981  0.1597
## s.e.  0.1201   0.1772     0.0891  0.0096   0.0201  0.0751   0.0780  0.0749
## 
## sigma^2 = 0.1826:  log likelihood = -93.3
## AIC=204.6   AICc=205.72   BIC=232.87
checkresiduals(object = mod5, plot = F)
## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 9.612, df = 3, p-value = 0.02217
## 
## Model df: 8.   Total lags used: 11

\[ \widehat{CPI}_t = 0.33 + 0.04(dDSPI) - 0.05(dCLI) + 0.01(Q1) - 0.10(Q2) + 0.16(Q3) + \eta_t \\ \eta_t = 0.74\eta_{t-1} - 0.37u_{t-1} + u_t \]

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year",
     main = "Fitted dCPI from ARMAX Model with Seasonal Dummies")

lines(zoo(x = mod5$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

Estimate SARMAX model without seasonal dummies:

mod6 <- forecast::auto.arima(y = train$dCPI, 
                             max.p = 4, max.q = 4,
                             seasonal = T, ic = "aic", 
                             stepwise = F, approximation = F, trace = F,
                             xreg = train[, 2:3])

mod6
## Series: train$dCPI 
## Regression with ARIMA(1,0,0)(2,0,0)[4] errors 
## 
## Coefficients:
##          ar1    sar1    sar2  intercept   dDSPI     dCLI
##       0.4197  0.1227  0.2349     0.3483  0.0377  -0.0456
## s.e.  0.0734  0.0766  0.0777     0.0848  0.0096   0.0204
## 
## sigma^2 = 0.1835:  log likelihood = -94.99
## AIC=203.98   AICc=204.67   BIC=225.98
checkresiduals(object = mod6, plot = F)
## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,0)(2,0,0)[4] errors
## Q* = 5.4754, df = 3, p-value = 0.1401
## 
## Model df: 6.   Total lags used: 9

\[ \widehat{CPI}_t = 0.35 + 0.04(dDSPI) - 0.05(dCLI) + \eta_t \\ \eta_t = 0.42\eta_{t-1} + 0.12\eta_{t-4} + 0.23\eta_{t-8} + u_t \]

plot(train$dCPI, 
     type = "l", lwd = 2, col = "black", 
     ylab = "Quarterly dCPI", xlab = "Year",
     main = "Fitted dCPI from SARMAX Model")

lines(zoo(x = mod6$fitted, order.by = index(train)), 
      type = "l", lwd = 2, col = "red")

legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)

6 Forecasting and Model Evaluation

In this section, I would forecast and evaluate the out-of-sample performance of the ARMA, SARMA, ARMAX and SARMAX models estimated in Section 5. The forecasts cover the four quarters in 2021 and the first quarter of 2022.

I produced the forecasts of all 5 models and plotted the forecasted and actual CPI series at levels before evaluating the models’ performance and accuracy using the actual out-of-sample CPI at level. It uses the forecast and accuracy functions in the forecast package.

6.1 ARMA Model

f_mod1 <- forecast(object = mod1, h = 5, level = 95)

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod1$mean, `Lo 95` = f_mod1$lower, `Hi 95` = f_mod1$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.2803639 -0.5953413 1.156069
## 2021 Q2   0.7096667      0.3921006 -0.5727374 1.356939
## 2021 Q3   0.5790000      0.4225619 -0.5628867 1.408010
## 2021 Q4   1.5960000      0.2619227 -0.7331056 1.256951
## 2022 Q1   1.7313333      0.3861960 -0.6132118 1.385604
f_mod1.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod1.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMA Model",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

6.2 SARMA Model

f_mod2 <- forecast(object = mod2, h = 5, level = 95)

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod2$mean, `Lo 95` = f_mod2$lower, `Hi 95` = f_mod2$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.5075483 -0.3499336 1.365030
## 2021 Q2   0.7096667      0.2989128 -0.6573614 1.255187
## 2021 Q3   0.5790000      0.4726703 -0.5061694 1.451510
## 2021 Q4   1.5960000      0.4901704 -0.4940899 1.474431
## 2022 Q1   1.7313333      0.2025612 -0.7860406 1.191163
f_mod2.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod2.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using SARMA Model",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

6.3 ARMA Model with Seasonal Dummies

f_mod3 <- forecast(object = mod3, h = 5, level = 95, xreg = test[, 4:6])

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod3$mean, `Lo 95` = f_mod3$lower, `Hi 95` = f_mod3$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.3127841 -0.5552458 1.180814
## 2021 Q2   0.7096667      0.2241780 -0.7212457 1.169602
## 2021 Q3   0.5790000      0.4827793 -0.4965837 1.462142
## 2021 Q4   1.5960000      0.2889091 -0.7058560 1.283674
## 2022 Q1   1.7313333      0.3127856 -0.6890676 1.314639
f_mod3.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod3.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMA Model with Seasonal Dummies",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

6.4 (S)ARMA Model with Exogenous Variables

Forecasts from ARMAX model without seasonal dummies:

f_mod4 <- forecast(object = mod4, h = 5, level = 95, xreg = test[, 2:3])

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod4$mean, `Lo 95` = f_mod4$lower, `Hi 95` = f_mod4$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.6483145 -0.2104694 1.507098
## 2021 Q2   0.7096667      0.4971612 -0.4033559 1.397678
## 2021 Q3   0.5790000      0.5702153 -0.3542743 1.494705
## 2021 Q4   1.5960000      0.5054231 -0.4330600 1.443906
## 2022 Q1   1.7313333      0.7155289 -0.2311943 1.662252
f_mod4.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod4.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMAX Model without Seasonal Dummies",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

Forecasts from ARMAX model with seasonal dummies:

f_mod5 <- forecast(object = mod5, h = 5, level = 95, xreg = test[, 2:6])

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod5$mean, `Lo 95` = f_mod5$lower, `Hi 95` = f_mod5$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.6421346 -0.1954001 1.479669
## 2021 Q2   0.7096667      0.3854001 -0.5057326 1.276533
## 2021 Q3   0.5790000      0.7083434 -0.2106720 1.627359
## 2021 Q4   1.5960000      0.4842991 -0.4495550 1.418153
## 2022 Q1   1.7313333      0.6906237 -0.2512153 1.632463
f_mod5.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod5.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMAX Model with Seasonal Dummies",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

Forecasts from SARMAX model:

f_mod6 <- forecast(object = mod6, h = 5, level = 95, xreg = test[, 2:3])

merge(`Actual dCPI` = test$dCPI, 
      as.zoo(cbind(`Point Forecast` = f_mod6$mean, `Lo 95` = f_mod6$lower, `Hi 95` = f_mod6$upper)))
##         Actual dCPI Point Forecast      Lo 95    Hi 95
## 2021 Q1   0.8030000      0.6574018 -0.1821677 1.496971
## 2021 Q2   0.7096667      0.3785194 -0.5319943 1.289033
## 2021 Q3   0.5790000      0.5021116 -0.4203333 1.424556
## 2021 Q4   1.5960000      0.4945078 -0.4300227 1.419038
## 2022 Q1   1.7313333      0.5981741 -0.3353193 1.531667
f_mod6.level <- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$mean),
                          fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$lower),
                          fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$upper)),
                    order.by = index(test))

plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod6.level), 
     type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
     ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using SARMAX Model",
     plot.type = "single")

legend(x = "topleft", 
       legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"), 
       col = c("black", "red", "gray"), lwd = 2)

6.5 Forecasting Performance of Models

Merge point forecasts to a data frame:

ptfor <- cbind(model1 = f_mod1.level$fCPI, 
               model2 = f_mod2.level$fCPI,
               model3 = f_mod3.level$fCPI, 
               model4 = f_mod4.level$fCPI,
               model5 = f_mod5.level$fCPI, 
               model6 = f_mod6.level$fCPI)

ptfor
##           model1   model2   model3   model4   model5   model6
## 2021 Q1 100.3754 100.6025 100.4078 100.7433 100.7371 100.7524
## 2021 Q2 100.7675 100.9015 100.6320 101.2405 101.1225 101.1309
## 2021 Q3 101.1900 101.3741 101.1147 101.8107 101.8309 101.6330
## 2021 Q4 101.4519 101.8643 101.4037 102.3161 102.3152 102.1275
## 2022 Q1 101.8381 102.0669 101.7164 103.0316 103.0058 102.7257

Use for loop to obtain accuracy measures:

results <- foreach (i = 1:6) %do% {
  accuracy(object = as.ts(ptfor[,i]), x = dat$CPI[index(dat) >= "2021 Q1"])
}

names(results) <- colnames(ptfor)

results
## $model1
##               ME     RMSE     MAE      MPE     MAPE
## Test set 1.67321 2.045313 1.67321 1.609947 1.609947
## 
## $model2
##                ME     RMSE      MAE      MPE     MAPE
## Test set 1.435939 1.833527 1.435939 1.379688 1.379688
## 
## $model3
##                ME     RMSE      MAE      MPE     MAPE
## Test set 1.742885 2.117666 1.742885 1.677304 1.677304
## 
## $model4
##                 ME     RMSE       MAE       MPE      MAPE
## Test set 0.9693524 1.312478 0.9693524 0.9296706 0.9296706
## 
## $model5
##                ME    RMSE      MAE       MPE      MAPE
## Test set 0.995495 1.32909 0.995495 0.9552384 0.9552384
## 
## $model6
##                ME     RMSE      MAE      MPE     MAPE
## Test set 1.123878 1.487883 1.123878 1.078533 1.078533

Based on the calculated accuracy measures, model 4 (ARMA with DSPI and CLI as external regressors, without seasonal dummies) performed best on out-of-sample forecasting. The caveat is I had used the actual data of DSPI and CLI for forecasting CPI. Real-world forecasting would require models to forecast these variables either separately or jointly, using an ARMA model or VAR model for example, or to use expert forecast/opinion. These would add to the uncertainty in forecasting and would decrease accuracy of the models.

Seasonal differencing in model 2 (SARIMA) yield better forecasting performance and seasonal dummies in model 3 (ARMA with seasonal dummies as external regressors) had worse performance compared to model 1(standard ARIMA). However, model 5 (ARMA with DSPI, CLI and seasonal dummies as external regressors) had better performance than model 6 (SARMA with DSPI and CLI as external regressors). More work is required to find out why there is such a difference between model 2 and 3 versus model 5 and 6.

7 Final Remarks

The project set out to test the effects of seasonality on forecasting Singapore’s CPI and found that the model with the better forecasting performance was the one without any seasonal components.

The project may not be fully accurate in its estimation, since the mean of the months in a quarter were used to calculate quarterly CPI and DSPI, which may or may not have effects on seasonality. This may be an interesting topic to work on next, since I forced quarterly CPI and DSPI to be used, given that CLI was assumed to affect CPI and it had quarterly frequency. The seasonal effects also seemed to change across decades, as seen in the boxplots in Section 4, and accounting for these changes may lead to better models.

References

Hyndman, R. (2010). The ARIMAX model muddle. Retrieved from https://robjhyndman.com/hyndsight/arimax/

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. Retrieved 20 July 2022, from https://otexts.com/fpp2/.