Updated on: 2022-07-30
In the project “Forecasting Singapore CPI Using Different Time Series Models”, I mentioned that I had ignored the effects of seasonality when estimating the current Consumer Price Index (CPI) of Singapore. In reality, seasonal patterns can be found in many different types of data, such as the total number of air passengers in a month over different seasons and monthly business activity during festive seasons. Accounting for such patterns may help to improve estimation of a variable using different models.
The project uses Autoregressive (Integrated) Moving Average models or AR(I)MA, seasonal ARIMA or SARIMA, as well as ARIMA/SARIMA with exogenous regressors (ARIMAX/SARIMAX) to estimate CPI at time \(t\).
# Use install.packages("packagename") if they are not already installed
# Packages that will be needed for obtaining data via an API
library(httr)
library(jsonlite)
# Packages that will be needed for the main parts of the project
library(corrplot) # For visualizing correlation between variables
library(foreach) # Foreach loops
library(forecast) # For ARIMA models, forecasting and evaluation
library(tidyverse) # For ggplot2 and dplyr
library(zoo) # For converting data to and working with zoo objects
The data used in this project is the same as the previous project. I imported the data from the Department of Statistics Singapore using an API and did the necessary data cleaning and manipulation. For a detailed explanation of how to import data using API and the data cleaning procedures, please see Section 3 and 4 of the previous project.
# Import Singapore Monthly Consumer Price Index
<- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M212881?isTestApi=true&seriesNoORrowNo=1"
cpi_url
<- httr::GET(url = cpi_url)
raw_cpi
<- jsonlite::fromJSON(rawToChar(raw_cpi$content))
cpi_content
<- as.data.frame(cpi_content$Data$row$columns[[1]])
cpi_data
# order.by argument helps to sort the data in time ascending order
<- zoo(x = cpi_data$value, order.by = as.yearmon(cpi_data$key, format = "%Y %b"))
cpi
# Values are of class "character", need to change them to numeric
storage.mode(cpi) <- "numeric"
head(cpi); tail(cpi)
## Jan 1961 Feb 1961 Mar 1961 Apr 1961 May 1961 Jun 1961
## 24.542 24.565 24.585 24.187 24.053 24.223
## Jan 2022 Feb 2022 Mar 2022 Apr 2022 May 2022 Jun 2022
## 104.472 105.379 106.691 106.547 107.598 108.671
# Import Singapore Monthly Domestic Supply Price Index
<- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M212701?isTestApi=true&seriesNoORrowNo=1"
dspi_url
<- httr::GET(url = dspi_url)
raw_dspi
<- jsonlite::fromJSON(rawToChar(raw_dspi$content))
dspi_content
<- as.data.frame(dspi_content$Data$row$columns[[1]])
dspi_data
<- zoo(x = dspi_data$value, order.by = as.yearmon(dspi_data$key, format = "%Y %b"))
dspi
storage.mode(dspi) <- "numeric"
head(dspi); tail(dspi)
## Jan 1974 Feb 1974 Mar 1974 Apr 1974 May 1974 Jun 1974
## 79.774 80.715 81.278 81.560 81.842 81.278
## Jan 2022 Feb 2022 Mar 2022 Apr 2022 May 2022 Jun 2022
## 111.692 115.835 125.088 127.240 129.836 129.371
# Import Singapore Quarterly Composite Leading Index
<- "https://tablebuilder.singstat.gov.sg/api/table/tabledata/M240421?isTestApi=true"
cli_url
<- httr::GET(url = cli_url)
raw_cli
<- jsonlite::fromJSON(rawToChar(raw_cli$content))
cli_content
<- as.data.frame(cli_content$Data$row$columns[[1]])
cli_data
<- zoo(x = cli_data$value, order.by = as.yearqtr(cli_data$key, format = "%Y %q"))
cli
storage.mode(cli) <- "numeric"
head(cli); tail(cli)
## 1978 Q1 1978 Q2 1978 Q3 1978 Q4 1979 Q1 1979 Q2
## 25.7 26.5 27.2 26.6 27.8 27.4
## 2020 Q4 2021 Q1 2021 Q2 2021 Q3 2021 Q4 2022 Q1
## 111.0 110.9 112.1 112.0 112.9 112.3
In this section, I used some plots to visualize if there are any seasonal patterns in the CPI, DSPI and CLI data. Before that, I will need to create quarterly CPI and DSPI data from the monthly data and match the start and end periods of CLI since it has the lowest frequency and observations.
# Aggregate monthly data to quarterly data using mean value of the months in a quarter
<- aggregate(cpi, as.yearqtr, mean)
cpi_q
<- aggregate(dspi, as.yearqtr, mean)
dspi_q
# Adjust the sample size to match CLI data
<- window(x = cpi_q, start = start(cli), end = end(cli))
cpi_q
head(cpi_q); tail(cpi_q)
## 1978 Q1 1978 Q2 1978 Q3 1978 Q4 1979 Q1 1979 Q2
## 44.05733 44.16000 44.97500 45.09767 45.05167 45.37067
## 2020 Q4 2021 Q1 2021 Q2 2021 Q3 2021 Q4 2022 Q1
## 100.0950 100.8980 101.6077 102.1867 103.7827 105.5140
<- window(x = dspi_q, start = start(cli), end = end(cli))
dspi_q
head(dspi_q); tail(dspi_q)
## 1978 Q1 1978 Q2 1978 Q3 1978 Q4 1979 Q1 1979 Q2
## 88.73200 89.92267 91.01833 91.45767 93.86900 100.88533
## 2020 Q4 2021 Q1 2021 Q2 2021 Q3 2021 Q4 2022 Q1
## 87.67033 94.50600 99.25533 104.35233 109.15000 117.53833
Merge data into a single object:
<- cbind(CPI = cpi_q, DSPI = dspi_q, CLI = cli)
dat
head(dat); dim(dat)
## CPI DSPI CLI
## 1978 Q1 44.05733 88.73200 25.7
## 1978 Q2 44.16000 89.92267 26.5
## 1978 Q3 44.97500 91.01833 27.2
## 1978 Q4 45.09767 91.45767 26.6
## 1979 Q1 45.05167 93.86900 27.8
## 1979 Q2 45.37067 100.88533 27.4
## [1] 177 3
To create a boxplot of grouped by the quarters, I would use the first-differenced data. This is because the CPI, DSPI and CLI series at level is not stationary and the boxplots I had plotted in my previous project is very likely to be erroneous. Furthermore, I would group the plots by several decades, starting from 1978Q1-1987Q4 and ending with 2008Q1-2017Q4, since the trends/patterns may vary across years.
Take first-difference on level data:
<- diff(x = dat, lag = 1, differences = 1) %>%
diff_dat `colnames<-`(c("dCPI", "dDSPI", "dCLI"))
head(diff_dat); dim(diff_dat)
## dCPI dDSPI dCLI
## 1978 Q2 0.1026667 1.1906667 0.8
## 1978 Q3 0.8150000 1.0956667 0.7
## 1978 Q4 0.1226667 0.4393333 -0.6
## 1979 Q1 -0.0460000 2.4113333 1.2
## 1979 Q2 0.3190000 7.0163333 -0.4
## 1979 Q3 1.5480000 6.6393333 0.0
## [1] 176 3
Plot level and differenced CPI series:
par(mfrow = c(2, 1))
plot(dat$CPI, main = "Singapore Quarterly CPI from 1978Q1 to 2022Q1", ylab = "CPI")
plot(diff_dat$dCPI, main = "Differenced CPI from 1978Q2 to 2022Q1", ylab = "dCPI")
Boxplot showing range of differenced CPI values in different quarters across different years:
= c("1978Q1", "1988Q1", "1998Q1", "2008Q1")
startdates = c("1987Q4", "1997Q4", "2007Q4", "2017Q4")
enddates
par(mfrow = c(2, 2))
invisible(
foreach(i = startdates, j = enddates) %do% {
boxplot(window(x = diff_dat$dCPI, start = i, end = j) ~ cycle(window(x = diff_dat$dCPI, start = i, end = j)),
ylab = "Quarterly dCPI", xlab = "Quarters",
main = paste(i, "to", j, sep = " "))
} )
In each of the plot, we can see varying median values of differenced CPI in different quarters, which might indicate that seasonality is present in the CPI series. The seasonal patterns were also not constant, which can be seen by comparing the different plots.
To add the effects of seasonality into the models, I would use seasonal differencing or seasonal dummies and test how they differ. For seasonal dummies, I would need to create \(k - 1\) columns of dummy variables representing the different quarters.
Create dummy variables for quarters:
<- zoo(forecast::seasonaldummy(x = as.ts(diff_dat)), order.by = index(diff_dat))
sdum
<- merge(diff_dat, sdum)
diff_dat
head(diff_dat); tail(diff_dat)
## dCPI dDSPI dCLI Q1 Q2 Q3
## 1978 Q2 0.1026667 1.1906667 0.8 0 1 0
## 1978 Q3 0.8150000 1.0956667 0.7 0 0 1
## 1978 Q4 0.1226667 0.4393333 -0.6 0 0 0
## 1979 Q1 -0.0460000 2.4113333 1.2 1 0 0
## 1979 Q2 0.3190000 7.0163333 -0.4 0 1 0
## 1979 Q3 1.5480000 6.6393333 0.0 0 0 1
## dCPI dDSPI dCLI Q1 Q2 Q3
## 2020 Q4 0.3870000 -0.358000 3.6 0 0 0
## 2021 Q1 0.8030000 6.835667 -0.1 1 0 0
## 2021 Q2 0.7096667 4.749333 1.2 0 1 0
## 2021 Q3 0.5790000 5.097000 -0.1 0 0 1
## 2021 Q4 1.5960000 4.797667 0.9 0 0 0
## 2022 Q1 1.7313333 8.388333 -0.6 1 0 0
Before estimating the models, I would split the data into training and testing sets, with the training set containing data up to end-2020 and leave the rest for testing.
<- window(x = diff_dat, end = "2020 Q4")
train
<- window(x = diff_dat, start = "2021 Q1")
test
dim(train); dim(test)
## [1] 171 6
## [1] 5 6
The ARIMA model is the simplest time series model, where it uses the lags of itself and the lags of the error terms. An \(ARIMA(p, d, q)\) model has three components/parameters, \(p\) is the number of AR lags, \(q\) is the number of MA lags, and \(d\) is the number of differencing required to achieve stationarity in the series.
Looking at the autocorrelation and partial autocorrelation plots can help us guess the lags of the AR and MA terms.
$dCPI %>% tsdisplay(main = "ACF and PACF of dCPI") train
# From the PACF, we might use 1 or 2 AR lags. From the ACF, we might use up to 4 MA lags
<- forecast::auto.arima(y = train$dCPI,
mod1 max.p = 4, max.q = 4,
seasonal = F, ic = "aic",
stepwise = F, approximation = F, trace = F)
mod1
## Series: train$dCPI
## ARIMA(1,0,4) with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 mean
## -0.8769 1.3394 0.6345 0.358 0.2446 0.3281
## s.e. 0.0709 0.1024 0.1281 0.113 0.0726 0.0637
##
## sigma^2 = 0.1996: log likelihood = -102.21
## AIC=218.42 AICc=219.11 BIC=240.41
checkresiduals(object = mod1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,4) with non-zero mean
## Q* = 5.5025, df = 3, p-value = 0.1385
##
## Model df: 6. Total lags used: 9
The estimated model is:
\[ \widehat{CPI}_t = 0.62 - 0.88 CPI_{t-1} + 1.34 u_{t-1} + 0.63 u_{t-2} + 0.36 u_{t-3} + 0.24 u_{t-4} \]
The Ljung-Box test shows that the errors are not serially correlated (we do not reject the null based on p-value of 0.14)
Plot fitted dCPI of ARMA model and actual dCPI:
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from ARMA Model")
lines(zoo(x = mod1$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
The SARMA model adds in seasonal components by seasonal differencing and/or using seasonal lags of AR and MA terms, which is similar to a standard ARMA model.
The model can be written as SARIMA(p, d, q)(P, D, Q)s, where the additional P and Q are the seasonal lags of AR and MA terms, D is the seasonal difference, and s is the seasonal frequency (monthly = 12, quarterly = 4, etc.).
The ACF and PACF plots of dCPI in Section 5.1 shows significant spikes at lag 4 and 8 in the ACF plot and significant spike at lag 8 in the PACF plot. This would indicate possibility of seasonal components in the data.
<- forecast::auto.arima(y = train$dCPI,
mod2 max.p = 4, max.q = 4,
seasonal = T, ic = "aic",
stepwise = F, approximation = F, trace = F)
mod2
## Series: train$dCPI
## ARIMA(1,0,0)(2,0,2)[4] with non-zero mean
##
## Coefficients:
## ar1 sar1 sar2 sma1 sma2 mean
## 0.4937 0.9830 -0.6436 -0.9344 0.8346 0.3167
## s.e. 0.0697 0.1548 0.1779 0.1371 0.1506 0.0874
##
## sigma^2 = 0.1914: log likelihood = -99.69
## AIC=213.37 AICc=214.06 BIC=235.36
checkresiduals(object = mod2, plot = F)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(2,0,2)[4] with non-zero mean
## Q* = 10.742, df = 3, p-value = 0.01321
##
## Model df: 6. Total lags used: 9
There is no seasonal differencing, but there are seasonal lags added to the model. The estimated model is:
\[ \widehat{CPI}_t = 0.05 + 0.49 CPI_{t-1} + 0.98 CPI_{t-4} - 0.64 CPI_{t-8} - 0.93 u_{t-4} + 0.83 u_{t-8} \]
The Ljung-Box test shows that the errors are serially correlated (we reject the null based on p-value of 0.01)
Plot fitted dCPI of SARMA model and actual dCPI:
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from SARMA Model")
lines(zoo(x = mod2$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
Another way we can include seasonal effects is to add seasonal
dummies to the models. The seasonal dummies are called exogenous
regressors, and adding them to an ARMA model forms the ARMAX model.
According to Hyndman (2010), the coefficient of the exogenous regressors
cannot be interpreted as the effect on \(y_t\) when \(x_t\) changes, as it is conditional on the
lags of the dependent variable. He suggested using ARMA errors instead,
which allows the coefficients of exogenous regressors to be interpreted
normally, which is implemented by the
forecast
package when the
xreg
argument is used.
<- forecast::auto.arima(y = train$dCPI,
mod3 max.p = 4, max.q = 4,
seasonal = F, ic = "aic",
stepwise = F, approximation = F, trace = F,
xreg = train[, 4:6])
mod3
## Series: train$dCPI
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept Q1 Q2 Q3
## 0.6822 -0.2506 0.2889 0.0239 -0.0647 0.1939
## s.e. 0.1110 0.1505 0.0907 0.0756 0.0801 0.0750
##
## sigma^2 = 0.1961: log likelihood = -100.47
## AIC=214.94 AICc=215.63 BIC=236.93
checkresiduals(object = mod3, plot = F)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 9.8405, df = 3, p-value = 0.01997
##
## Model df: 6. Total lags used: 9
The estimated model is:
\[ \widehat{CPI}_t = 0.29 + 0.02(Q1) - 0.06(Q2) + 0.19(Q3) + \eta_t \\ \eta_t = 0.68\eta_{t-1} - 0.25u_{t-1} + u_t \]
Based on the estimated model, Quarters 1 and 3 generally has higher dCPI, while Quarter 2 generally has lower dCPI compared to Quarter 4 (base equation), ceteris paribus.
Plot fitted dCPI of ARMA model with seasonal dummies and actual dCPI:
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from ARMA Model with Seasonal Dummies")
lines(zoo(x = mod3$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
The (S)ARMAX model, which was already shown above using seasonal dummies, can include other variables that are exogenous factors of CPI. For this, I assumed that DSPI and CLI are external factors that affect CPI.
I would compare three different ARMAX models: one without including seasonal dummies, one including seasonal dummies and one using SARMA without seasonal dummies.
Estimate ARMAX model without seasonal dummies:
<- forecast::auto.arima(y = train$dCPI,
mod4 max.p = 4, max.q = 4,
seasonal = F, ic = "aic",
stepwise = F, approximation = F, trace = F,
xreg = train[, 2:3])
mod4
## Series: train$dCPI
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept dDSPI dCLI
## 0.7719 -0.4564 0.3523 0.0386 -0.0495
## s.e. 0.1154 0.1725 0.0783 0.0099 0.0206
##
## sigma^2 = 0.192: log likelihood = -99.13
## AIC=210.27 AICc=210.78 BIC=229.12
checkresiduals(object = mod4, plot = F)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 11.303, df = 3, p-value = 0.0102
##
## Model df: 5. Total lags used: 8
\[ \widehat{CPI}_t = 0.35 + 0.04(dDSPI) - 0.05(dCLI) + \eta_t \\ \eta_t = 0.77\eta_{t-1} - 0.46u_{t-1} + u_t \]
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from ARMAX Model without Seasonal Dummies")
lines(zoo(x = mod4$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
Estimate ARMAX model with seasonal dummies:
<- forecast::auto.arima(y = train$dCPI,
mod5 max.p = 4, max.q = 4,
seasonal = F, ic = "aic",
stepwise = F, approximation = F, trace = F,
xreg = train[, 2:6])
mod5
## Series: train$dCPI
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept dDSPI dCLI Q1 Q2 Q3
## 0.7381 -0.3746 0.3340 0.0364 -0.0471 0.0097 -0.0981 0.1597
## s.e. 0.1201 0.1772 0.0891 0.0096 0.0201 0.0751 0.0780 0.0749
##
## sigma^2 = 0.1826: log likelihood = -93.3
## AIC=204.6 AICc=205.72 BIC=232.87
checkresiduals(object = mod5, plot = F)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 9.612, df = 3, p-value = 0.02217
##
## Model df: 8. Total lags used: 11
\[ \widehat{CPI}_t = 0.33 + 0.04(dDSPI) - 0.05(dCLI) + 0.01(Q1) - 0.10(Q2) + 0.16(Q3) + \eta_t \\ \eta_t = 0.74\eta_{t-1} - 0.37u_{t-1} + u_t \]
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from ARMAX Model with Seasonal Dummies")
lines(zoo(x = mod5$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
Estimate SARMAX model without seasonal dummies:
<- forecast::auto.arima(y = train$dCPI,
mod6 max.p = 4, max.q = 4,
seasonal = T, ic = "aic",
stepwise = F, approximation = F, trace = F,
xreg = train[, 2:3])
mod6
## Series: train$dCPI
## Regression with ARIMA(1,0,0)(2,0,0)[4] errors
##
## Coefficients:
## ar1 sar1 sar2 intercept dDSPI dCLI
## 0.4197 0.1227 0.2349 0.3483 0.0377 -0.0456
## s.e. 0.0734 0.0766 0.0777 0.0848 0.0096 0.0204
##
## sigma^2 = 0.1835: log likelihood = -94.99
## AIC=203.98 AICc=204.67 BIC=225.98
checkresiduals(object = mod6, plot = F)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,0)(2,0,0)[4] errors
## Q* = 5.4754, df = 3, p-value = 0.1401
##
## Model df: 6. Total lags used: 9
\[ \widehat{CPI}_t = 0.35 + 0.04(dDSPI) - 0.05(dCLI) + \eta_t \\ \eta_t = 0.42\eta_{t-1} + 0.12\eta_{t-4} + 0.23\eta_{t-8} + u_t \]
plot(train$dCPI,
type = "l", lwd = 2, col = "black",
ylab = "Quarterly dCPI", xlab = "Year",
main = "Fitted dCPI from SARMAX Model")
lines(zoo(x = mod6$fitted, order.by = index(train)),
type = "l", lwd = 2, col = "red")
legend(x = "bottomleft", legend = c("Actual", "Fitted"), col = c("black", "red"), lwd = 2)
In this section, I would forecast and evaluate the out-of-sample performance of the ARMA, SARMA, ARMAX and SARMAX models estimated in Section 5. The forecasts cover the four quarters in 2021 and the first quarter of 2022.
I produced the forecasts of all 5 models and plotted the forecasted
and actual CPI series at levels before evaluating the models’
performance and accuracy using the actual out-of-sample CPI at level. It
uses the forecast
and
accuracy
functions in the
forecast
package.
<- forecast(object = mod1, h = 5, level = 95)
f_mod1
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod1$mean, `Lo 95` = f_mod1$lower, `Hi 95` = f_mod1$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.2803639 -0.5953413 1.156069
## 2021 Q2 0.7096667 0.3921006 -0.5727374 1.356939
## 2021 Q3 0.5790000 0.4225619 -0.5628867 1.408010
## 2021 Q4 1.5960000 0.2619227 -0.7331056 1.256951
## 2022 Q1 1.7313333 0.3861960 -0.6132118 1.385604
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$mean),
f_mod1.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod1$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod1.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMA Model",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
<- forecast(object = mod2, h = 5, level = 95)
f_mod2
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod2$mean, `Lo 95` = f_mod2$lower, `Hi 95` = f_mod2$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.5075483 -0.3499336 1.365030
## 2021 Q2 0.7096667 0.2989128 -0.6573614 1.255187
## 2021 Q3 0.5790000 0.4726703 -0.5061694 1.451510
## 2021 Q4 1.5960000 0.4901704 -0.4940899 1.474431
## 2022 Q1 1.7313333 0.2025612 -0.7860406 1.191163
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$mean),
f_mod2.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod2$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod2.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using SARMA Model",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
<- forecast(object = mod3, h = 5, level = 95, xreg = test[, 4:6])
f_mod3
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod3$mean, `Lo 95` = f_mod3$lower, `Hi 95` = f_mod3$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.3127841 -0.5552458 1.180814
## 2021 Q2 0.7096667 0.2241780 -0.7212457 1.169602
## 2021 Q3 0.5790000 0.4827793 -0.4965837 1.462142
## 2021 Q4 1.5960000 0.2889091 -0.7058560 1.283674
## 2022 Q1 1.7313333 0.3127856 -0.6890676 1.314639
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$mean),
f_mod3.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod3$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod3.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMA Model with Seasonal Dummies",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
Forecasts from ARMAX model without seasonal dummies:
<- forecast(object = mod4, h = 5, level = 95, xreg = test[, 2:3])
f_mod4
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod4$mean, `Lo 95` = f_mod4$lower, `Hi 95` = f_mod4$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.6483145 -0.2104694 1.507098
## 2021 Q2 0.7096667 0.4971612 -0.4033559 1.397678
## 2021 Q3 0.5790000 0.5702153 -0.3542743 1.494705
## 2021 Q4 1.5960000 0.5054231 -0.4330600 1.443906
## 2022 Q1 1.7313333 0.7155289 -0.2311943 1.662252
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$mean),
f_mod4.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod4$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod4.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMAX Model without Seasonal Dummies",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
Forecasts from ARMAX model with seasonal dummies:
<- forecast(object = mod5, h = 5, level = 95, xreg = test[, 2:6])
f_mod5
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod5$mean, `Lo 95` = f_mod5$lower, `Hi 95` = f_mod5$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.6421346 -0.1954001 1.479669
## 2021 Q2 0.7096667 0.3854001 -0.5057326 1.276533
## 2021 Q3 0.5790000 0.7083434 -0.2106720 1.627359
## 2021 Q4 1.5960000 0.4842991 -0.4495550 1.418153
## 2022 Q1 1.7313333 0.6906237 -0.2512153 1.632463
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$mean),
f_mod5.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod5$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod5.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using ARMAX Model with Seasonal Dummies",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
Forecasts from SARMAX model:
<- forecast(object = mod6, h = 5, level = 95, xreg = test[, 2:3])
f_mod6
merge(`Actual dCPI` = test$dCPI,
as.zoo(cbind(`Point Forecast` = f_mod6$mean, `Lo 95` = f_mod6$lower, `Hi 95` = f_mod6$upper)))
## Actual dCPI Point Forecast Lo 95 Hi 95
## 2021 Q1 0.8030000 0.6574018 -0.1821677 1.496971
## 2021 Q2 0.7096667 0.3785194 -0.5319943 1.289033
## 2021 Q3 0.5790000 0.5021116 -0.4203333 1.424556
## 2021 Q4 1.5960000 0.4945078 -0.4300227 1.419038
## 2022 Q1 1.7313333 0.5981741 -0.3353193 1.531667
<- zoo(cbind(fCPI = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$mean),
f_mod6.level fCPI_lower = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$lower),
fCPI_upper = as.vector(dat$CPI[index(dat) == "2020 Q4",]) + cumsum(f_mod6$upper)),
order.by = index(test))
plot(merge(dat$CPI[index(dat) >= "2018 Q1"], f_mod6.level),
type = "l", col = c("black", "red", "gray", "gray"), lwd = 2,
ylab = "Quarterly CPI", xlab = "Year", main = "Forecasted CPI Using SARMAX Model",
plot.type = "single")
legend(x = "topleft",
legend = c("Actual CPI 2018Q1 to 2022Q1", "Forecasted CPI 2021Q1 to 2022Q2", "Lower and Upper Bound"),
col = c("black", "red", "gray"), lwd = 2)
Merge point forecasts to a data frame:
<- cbind(model1 = f_mod1.level$fCPI,
ptfor model2 = f_mod2.level$fCPI,
model3 = f_mod3.level$fCPI,
model4 = f_mod4.level$fCPI,
model5 = f_mod5.level$fCPI,
model6 = f_mod6.level$fCPI)
ptfor
## model1 model2 model3 model4 model5 model6
## 2021 Q1 100.3754 100.6025 100.4078 100.7433 100.7371 100.7524
## 2021 Q2 100.7675 100.9015 100.6320 101.2405 101.1225 101.1309
## 2021 Q3 101.1900 101.3741 101.1147 101.8107 101.8309 101.6330
## 2021 Q4 101.4519 101.8643 101.4037 102.3161 102.3152 102.1275
## 2022 Q1 101.8381 102.0669 101.7164 103.0316 103.0058 102.7257
Use for loop to obtain accuracy measures:
<- foreach (i = 1:6) %do% {
results accuracy(object = as.ts(ptfor[,i]), x = dat$CPI[index(dat) >= "2021 Q1"])
}
names(results) <- colnames(ptfor)
results
## $model1
## ME RMSE MAE MPE MAPE
## Test set 1.67321 2.045313 1.67321 1.609947 1.609947
##
## $model2
## ME RMSE MAE MPE MAPE
## Test set 1.435939 1.833527 1.435939 1.379688 1.379688
##
## $model3
## ME RMSE MAE MPE MAPE
## Test set 1.742885 2.117666 1.742885 1.677304 1.677304
##
## $model4
## ME RMSE MAE MPE MAPE
## Test set 0.9693524 1.312478 0.9693524 0.9296706 0.9296706
##
## $model5
## ME RMSE MAE MPE MAPE
## Test set 0.995495 1.32909 0.995495 0.9552384 0.9552384
##
## $model6
## ME RMSE MAE MPE MAPE
## Test set 1.123878 1.487883 1.123878 1.078533 1.078533
Based on the calculated accuracy measures, model 4 (ARMA with DSPI and CLI as external regressors, without seasonal dummies) performed best on out-of-sample forecasting. The caveat is I had used the actual data of DSPI and CLI for forecasting CPI. Real-world forecasting would require models to forecast these variables either separately or jointly, using an ARMA model or VAR model for example, or to use expert forecast/opinion. These would add to the uncertainty in forecasting and would decrease accuracy of the models.
Seasonal differencing in model 2 (SARIMA) yield better forecasting performance and seasonal dummies in model 3 (ARMA with seasonal dummies as external regressors) had worse performance compared to model 1(standard ARIMA). However, model 5 (ARMA with DSPI, CLI and seasonal dummies as external regressors) had better performance than model 6 (SARMA with DSPI and CLI as external regressors). More work is required to find out why there is such a difference between model 2 and 3 versus model 5 and 6.
The project set out to test the effects of seasonality on forecasting Singapore’s CPI and found that the model with the better forecasting performance was the one without any seasonal components.
The project may not be fully accurate in its estimation, since the mean of the months in a quarter were used to calculate quarterly CPI and DSPI, which may or may not have effects on seasonality. This may be an interesting topic to work on next, since I forced quarterly CPI and DSPI to be used, given that CLI was assumed to affect CPI and it had quarterly frequency. The seasonal effects also seemed to change across decades, as seen in the boxplots in Section 4, and accounting for these changes may lead to better models.
Hyndman, R. (2010). The ARIMAX model muddle. Retrieved from https://robjhyndman.com/hyndsight/arimax/
Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. Retrieved 20 July 2022, from https://otexts.com/fpp2/.