Updated on: 2022-07-26

1 Introduction

The purpose of this project was to learn about Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models and how to implement them in R to measure volatility of stock returns. It introduces the packages in R that is used for building GARCH models and briefly touches on basic n-ahead forecasting to evaluate the models’ out-of-sample forecast performance.

2 Packages Required

library(dplyr)
library(forecast) # For ARIMA models, forecasting and evaluation
library(PerformanceAnalytics) # For portfolio performance and risk analysis
library(PortfolioAnalytics) # For portfolio optimization and analysis
library(quantmod) # For obtaining historical prices from Yahoo Finance
library(rugarch) # For univariate GARCH models
library(urca) # For unit root tests

3 Retrieve Data and Check Stationarity

3.1 Data and Plots

In typical time series analysis, we usually require that the time series data is stationary with a constant mean and constant variance. However, we may have stationarity with changing conditional variance (conditional heteroskedasticity), commonly seen in financial data. In such data, the plots would show periods of volatility clustering, where there are periods of high and low volatility and variance is not constant in these periods. For instance, let us take a look at the weekly returns of Apple Inc. (AAPL), which we will be using for the rest of this project.

# Load AAPL price data from Yahoo Finance using quantmod package

# Set start and end dates for data retrieval
startdate <- as.Date("2010-01-01")
enddate <- as.Date("2022-07-01")

# Retrieve price data
AAPL_price <- quantmod::getSymbols(Symbols = "AAPL", 
                                   src = "yahoo", 
                                   from = startdate, to = enddate, 
                                   periodicity = "weekly", 
                                   auto.assign = F)

# View the first and last 6 observations in AAPL_price
c(head(AAPL_price), tail(AAPL_price))
##             AAPL.Open  AAPL.High   AAPL.Low AAPL.Close AAPL.Volume
## 2010-01-01   7.622500   7.699643   7.466071   7.520714  2124925600
## 2010-01-08   7.510714   7.607143   7.289286   7.479643  2543086000
## 2010-01-15   7.533214   7.698214   7.352500   7.431071  2544382400
## 2010-01-22   7.385000   7.632500   7.041429   7.117500  6710648000
## 2010-01-29   7.181429   7.221429   6.794643   6.858929  4067151200
## 2010-02-05   6.879643   7.133929   6.816071   7.095357  2882171600
## 2022-05-20 139.089996 144.339996 132.610001 143.779999   542369300
## 2022-05-27 145.389999 151.740005 145.259995 151.210007   341331600
## 2022-06-03 146.899994 149.869995 142.529999 142.639999   351400100
## 2022-06-10 140.279999 140.759995 129.039993 130.059998   498086200
## 2022-06-17 130.070007 138.589996 129.809998 138.270004   361363800
## 2022-06-24 139.899994 143.490005 133.770004 136.720001   391615000
##            AAPL.Adjusted
## 2010-01-01      6.430343
## 2010-01-08      6.395227
## 2010-01-15      6.353699
## 2010-01-22      6.085590
## 2010-01-29      5.864506
## 2010-02-05      6.066657
## 2022-05-20    143.779999
## 2022-05-27    151.210007
## 2022-06-03    142.639999
## 2022-06-10    130.059998
## 2022-06-17    138.270004
## 2022-06-24    136.720001
# Check number of observations
nrow(AAPL_price)
## [1] 652
# Check for missing data
colSums(is.na(AAPL_price))
##     AAPL.Open     AAPL.High      AAPL.Low    AAPL.Close   AAPL.Volume 
##             0             0             0             0             0 
## AAPL.Adjusted 
##             0
# Calculate log/continuous returns using Adjusted column (column 6)

# Remove first row of calculated returns since returns cannot be calculated for first observation
rAAPL <- PerformanceAnalytics::Return.calculate(prices = AAPL_price[, 6], method = "log")[-1,] * 100
colnames(rAAPL) <- "Returns"

# Check that first row has been removed
head(rAAPL)
##               Returns
## 2010-01-08 -0.5475950
## 2010-01-15 -0.6514768
## 2010-01-22 -4.3113482
## 2010-01-29 -3.7005431
## 2010-02-05  3.3889462
## 2010-02-12  2.1215958
nrow(rAAPL)
## [1] 651
# Chart weekly log-returns over time
plot.zoo(rAAPL, main = "Weekly Log-Returns of AAPL", xlab = "Time", ylab = "Log-Return (in %)")

If we were to only look at the period in early 2020 (COVID-19 outbreak), we see that there is a period where large absolute returns are followed by large absolute returns, which then starts to return to “normal”. I charted the monthly volatilty (measured using annualized standard deviation) of rAAPL to better present the fluctuations in returns that might not be easily seen in the weekly returns plot.

# Plot rolling volatility of AAPL returns

# width = 4 to approximately calculate annualized sd using a monthly rolling-window
PerformanceAnalytics::chart.RollingPerformance(R = rAAPL, width = 4, FUN = "sd.annualized", scale = 52, main = "1-Month Rolling Volatility of AAPL Returns")

3.2 Unit Root and Stationarity Tests

We can check if the series is stationary using an Augmented Dickey Fuller Test and Kwiatkowski-Phillips-Schmidt-Shin Test.

# ADF Test with Drift, lags selected based on AIC

rAAPL %>% urca::ur.df(type = "drift", selectlags = "AIC") %>% summary()
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression drift 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17.0125  -2.1538   0.2441   2.2200  16.4332 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4726270  0.1479833   3.194  0.00147 ** 
## z.lag.1     -0.9994788  0.0556699 -17.954  < 2e-16 ***
## z.diff.lag  -0.0008002  0.0394159  -0.020  0.98381    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.711 on 646 degrees of freedom
## Multiple R-squared:  0.5001, Adjusted R-squared:  0.4986 
## F-statistic: 323.1 on 2 and 646 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -17.9537 161.1678 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau2 -3.43 -2.86 -2.57
## phi1  6.43  4.59  3.78
# KPSS Test

rAAPL %>% urca::ur.kpss(type = "mu", lags = "long") %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 19 lags. 
## 
## Value of test-statistic is: 0.0525 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

The test-statistic value of tau2 -17.95 against the critical value of -2.86 at the 5% significance level means that we can reject the null hypothesis (alternative hypothesis is that the time series does not have unit root). The KPSS test-statistic 0.052 against the 5% critical value of 0.46 means that we cannot reject the null hypothesis (alternative hypothesis that the time series is not stationary).

Despite varying variances at different periods, the ADF and KPSS tests shows that the series is stationary. The problem of conditional heteroskedasticity is not addressed in ARIMA and other models that assumed constant variance. We can use GARCH models to estimate this volatility to better manage risk and improve forecasts of returns. Before discussing GARCH, it may be helpful to talk about the ARCH model since GARCH was an improvement of it.

4 Understanding GARCH Models

4.1 ARCH Models

ARCH or AutoRegressive Conditional Heteroskedasticity models can be thought of as an Autoregressive model. Suppose that the returns was modelled as such (remains true for all the GARCH models):

\[r_t = \mu + \varepsilon_t\]

where \(\mu\) can be a constant or represented with ARMA models, and \(\varepsilon_t\) is the residual at time \(t\). \(\varepsilon_t\) can be separated into a time-varying volatility or standard deviation \(\sigma_t\) and a white noise error term (or called innovations/shocks) \(z_t\) which is i.i.d. with mean of 0 and variance of 1, which we write as \(\varepsilon_t = \sigma_t z_t\).

In most cases, \(z_t\) is assumed to have a normal distribution, but with the fat tails and slight skew that is typical of financial returns, it is also possible to use a Student’s t-distribution or a skewed t-distribution.

# Plot distribution of AAPL's weekly log returns

PerformanceAnalytics::chart.Histogram(R = rAAPL,
                                      main = "Distribution of AAPL Weekly Log-Returns", 
                                      methods = c("add.density", "add.normal"), 
                                      colorset = c("blue", "red", "black"),
                                      ylim = c(0, 0.15))

legend(x = "topleft", legend = c("Log-Return", "Density", "Normal"), col = c("blue", "red", "black"), lty = 1)

# Look at skewness and kurtosis of AAPL returns

PerformanceAnalytics::table.Distributions(R = rAAPL)

An ARCH(1) model is then formulated as:

\[Var(\varepsilon_t) = \sigma_t^2 = \omega + \alpha_1 \varepsilon^2_{t-1}\]

What the formula essentially say is that the conditional variance in the current period (\(\sigma_t^2\)) is affected by the errors in the previous period. \(\alpha_0 > 0\) since variance of first period should also be non-negative and \(\alpha_1 > 0\) so that larger error terms in the previous period means higher conditional variance but \(\alpha_1 < 1\) to prevent “explosive” variance. This can be extended to an ARCH(q) model where we have \(q\) lags of error terms.

Moving forward, I would estimate the models using returns data up to end-April 2022 and use the data in May and June 2022 to evaluate the models.

# Split the data for training and testing models
trainset <- rAAPL["/2022-04",]
nrow(trainset)
## [1] 643
testset <- rAAPL["2022-05/",]
nrow(testset)
## [1] 8

4.1.1 ARCH(1) Model

I used an ARMA model to estimate the mean equation before generating the ARCH model.

# Find mean equation using auto.arima from forecast package

meaneqn <- forecast::auto.arima(y = trainset, 
                                max.p = 2, max.q = 2, 
                                stepwise = F, approximation = F, trace = F, 
                                ic = "aic", method = "ML")
summary(meaneqn)
## Series: trainset 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       0.4965
## s.e.  0.1440
## 
## sigma^2 = 13.36:  log likelihood = -1745.25
## AIC=3494.5   AICc=3494.52   BIC=3503.43
## 
## Training set error measures:
##                         ME    RMSE      MAE  MPE MAPE     MASE         ACF1
## Training set -4.804446e-13 3.65206 2.751707 -Inf  Inf 0.981016 -0.007041325

Based on the ARMA output, a ARMA(0, 0) with non-zero mean has the lowest AIC. With this, we can create the model specification for the ARCH model using ugarchspec.

# Create ARCH(1) model specification using rugarch package, arguments to be in a list
# Check ?ugarchspec for the arguments

# Specify ARCH(1) model by setting order of GARCH terms to 0
archmodel <- list(model = "sGARCH", garchOrder = c(1, 0))

# Specify mean model as found using auto.arima
meanmodel <- list(armaOrder = c(0, 0), include.mean = T)

# Assume a normal distribution of the error term z_t
archspec <- rugarch::ugarchspec(variance.model = archmodel, mean.model = meanmodel, distribution.model = "norm")
# Fit the data to the ARCH(1) specification

fitARCH <- rugarch::ugarchfit(spec = archspec, data = trainset, solver = "hybrid")

fitARCH
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,0)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.51948    0.142665   3.6413 0.000271
## omega   12.00707    0.907333  13.2334 0.000000
## alpha1   0.10810    0.059532   1.8159 0.069385
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.51948     0.14860   3.4959 0.000472
## omega   12.00707     1.34500   8.9272 0.000000
## alpha1   0.10810     0.07645   1.4140 0.157349
## 
## LogLikelihood : -1742.841 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.4303
## Bayes        5.4511
## Shibata      5.4303
## Hannan-Quinn 5.4384
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.02494  0.8745
## Lag[2*(p+q)+(p+q)-1][2]   0.08917  0.9273
## Lag[4*(p+q)+(p+q)-1][5]   4.18949  0.2314
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                     0.6875 4.070e-01
## Lag[2*(p+q)+(p+q)-1][2]   17.0268 2.689e-05
## Lag[4*(p+q)+(p+q)-1][5]   28.0623 1.109e-07
## d.o.f=1
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale   P-Value
## ARCH Lag[2]     32.48 0.500 2.000 1.207e-08
## ARCH Lag[4]     34.00 1.397 1.611 2.633e-09
## ARCH Lag[6]     35.04 2.222 1.500 3.179e-09
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.5946
## Individual Statistics:              
## mu     0.07162
## omega  0.15271
## alpha1 0.23159
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          0.846 1.01 1.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.5830 0.5601    
## Negative Sign Bias  0.4720 0.6371    
## Positive Sign Bias  0.5776 0.5637    
## Joint Effect        3.5984 0.3082    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     28.88      0.06787
## 2    30     38.23      0.11734
## 3    40     48.45      0.14279
## 4    50     52.41      0.34310
## 
## 
## Elapsed time : 0.151643

The output is split into several tables. The first table Optimal Parameters gives us the coefficients, standard errors and significance for the mean and ARCH equations, as well as the robust errors. Using the values, we would write the equations:

\[\hat{r}_t = 0.519 + \hat{\varepsilon}_t \\ \widehat{Var(\varepsilon_t)} = \hat{\sigma}_t^2 = 12.007 + 0.108 \varepsilon_{t-1}^2\]

However, coefficient for \(\varepsilon_{t-1}^2\) is insignificant at the 5% significance level. The second table Information Criteria shows the different information criteria values, which can be compared across models to check which fits the data better. The third table Weighted Ljung-Box Test on Standardized Residuals shows results for serial correlation in the residuals and based on the test, the null hypothesis could not be rejected. The next table that interests us is the Weighted ARCH LM Tests and the low p-values means that we reject the null hypothesis (alternative is that there is ARCH effects). We may need to increase the order of ARCH terms to account for the rest of the ARCH effects. The last table Adjusted Pearson Goodness-of-Fit Test shows that the assumption of normally distributed errors cannot be rejected, but it is still good practice to try other distributions.

4.1.2 ARCH(2) Model

I created an ARCH(p) model to account for remaining ARCH effects, still using the assumed normal distribution of errors.

# Specify an ARCH(2) model
arch2model <- list(model = "sGARCH", garchOrder = c(2, 0))

# Assume normal distribution of error term z_t
arch2spec <- rugarch::ugarchspec(variance.model = arch2model, mean.model = meanmodel, distribution.model = "norm")

fitARCH2 <- ugarchfit(spec = arch2spec, data = trainset, solver = "hybrid")

fitARCH2
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(2,0)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.650963    0.138547   4.6985 0.000003
## omega   9.989957    0.922085  10.8341 0.000000
## alpha1  0.057729    0.045654   1.2645 0.206057
## alpha2  0.199779    0.065030   3.0721 0.002126
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.650963    0.155456   4.1874 0.000028
## omega   9.989957    1.209668   8.2584 0.000000
## alpha1  0.057729    0.048738   1.1845 0.236219
## alpha2  0.199779    0.070242   2.8442 0.004453
## 
## LogLikelihood : -1731.135 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3970
## Bayes        5.4248
## Shibata      5.3969
## Hannan-Quinn 5.4078
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1618  0.6875
## Lag[2*(p+q)+(p+q)-1][2]    0.1915  0.8594
## Lag[4*(p+q)+(p+q)-1][5]    3.3677  0.3440
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.00948  0.9224
## Lag[2*(p+q)+(p+q)-1][5]   0.29446  0.9843
## Lag[4*(p+q)+(p+q)-1][9]   1.37920  0.9649
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2177 0.500 2.000  0.6408
## ARCH Lag[5]    0.4174 1.440 1.667  0.9077
## ARCH Lag[7]    1.2924 2.315 1.543  0.8619
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.6254
## Individual Statistics:              
## mu     0.17240
## omega  0.08037
## alpha1 0.15925
## alpha2 0.13576
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.07 1.24 1.6
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias          0.09764 0.92225    
## Negative Sign Bias 1.23187 0.21845    
## Positive Sign Bias 1.44054 0.15020    
## Joint Effect       8.95549 0.02989  **
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     23.78      0.20470
## 2    30     33.28      0.26647
## 3    40     41.98      0.34313
## 4    50     65.01      0.06248
## 
## 
## Elapsed time : 0.1133151

ARCH LM Tests show that we cannot reject the null hypothesis, so we have sufficiently accounted for ARCH effects in the model.

4.1.3 ARCH(2) Model with t-Distribution

I tried an ARCH(2) model with a Student’s t-distribution to account for the fat-tailed distribution.

# Assume a normal distribution of the error term z_t
arch2spec_std <- rugarch::ugarchspec(variance.model = arch2model, mean.model = meanmodel, distribution.model = "std")

fitARCH2_std <- ugarchfit(spec = arch2spec_std, data = trainset, solver = "hybrid")

fitARCH2_std
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(2,0)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.667203    0.131293   5.0818 0.000000
## omega   9.877722    1.122702   8.7982 0.000000
## alpha1  0.064211    0.053893   1.1914 0.233482
## alpha2  0.207263    0.075341   2.7510 0.005942
## shape   7.546520    2.149598   3.5107 0.000447
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.667203    0.140562   4.7467 0.000002
## omega   9.877722    1.121571   8.8070 0.000000
## alpha1  0.064211    0.051120   1.2561 0.209084
## alpha2  0.207263    0.074721   2.7738 0.005540
## shape   7.546520    2.075020   3.6368 0.000276
## 
## LogLikelihood : -1720.172 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3660
## Bayes        5.4007
## Shibata      5.3659
## Hannan-Quinn 5.3795
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1812  0.6703
## Lag[2*(p+q)+(p+q)-1][2]    0.2107  0.8477
## Lag[4*(p+q)+(p+q)-1][5]    3.3750  0.3428
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.03283  0.8562
## Lag[2*(p+q)+(p+q)-1][5]   0.29956  0.9838
## Lag[4*(p+q)+(p+q)-1][9]   1.36826  0.9656
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2427 0.500 2.000  0.6223
## ARCH Lag[5]    0.4306 1.440 1.667  0.9039
## ARCH Lag[7]    1.2882 2.315 1.543  0.8627
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.203
## Individual Statistics:              
## mu     0.12363
## omega  0.17708
## alpha1 0.09359
## alpha2 0.29712
## shape  0.61475
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias          0.04524 0.96393    
## Negative Sign Bias 1.15905 0.24687    
## Positive Sign Bias 1.52698 0.12726    
## Joint Effect       8.71932 0.03327  **
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     12.65       0.8562
## 2    30     26.10       0.6202
## 3    40     31.03       0.8148
## 4    50     46.50       0.5750
## 
## 
## Elapsed time : 0.1592081

With the t-distribution ARCH(2) model, the coefficient of \(\varepsilon_{t-1}^2\) is still insignificant, but we have lower AIC.

4.1.4 ARCH(2) Model with Skewed t-Distribution

Finally, I created an ARCH(2) model with a skewed t-distribution to account for the heavy tail and skewed distribution.

# Use skewed t-distribution

arch2spec_sstd <- ugarchspec(variance.model = arch2model, mean.model = meanmodel, distribution.model = "sstd")

fitARCH2_sstd <- rugarch::ugarchfit(spec = arch2spec_sstd, data = trainset, solver = "hybrid")

fitARCH2_sstd
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(2,0)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.615281    0.138353   4.4472 0.000009
## omega   9.851054    1.115402   8.8318 0.000000
## alpha1  0.073452    0.055048   1.3343 0.182095
## alpha2  0.190350    0.073117   2.6034 0.009231
## skew    0.933738    0.051096  18.2742 0.000000
## shape   7.826487    2.310906   3.3868 0.000707
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.615281    0.148502   4.1432 0.000034
## omega   9.851054    1.122155   8.7787 0.000000
## alpha1  0.073452    0.055413   1.3256 0.184988
## alpha2  0.190350    0.072292   2.6331 0.008462
## skew    0.933738    0.048882  19.1018 0.000000
## shape   7.826487    2.222069   3.5222 0.000428
## 
## LogLikelihood : -1719.394 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3667
## Bayes        5.4084
## Shibata      5.3665
## Hannan-Quinn 5.3829
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1818  0.6698
## Lag[2*(p+q)+(p+q)-1][2]    0.2158  0.8446
## Lag[4*(p+q)+(p+q)-1][5]    3.3947  0.3397
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1139  0.7358
## Lag[2*(p+q)+(p+q)-1][5]    0.4651  0.9629
## Lag[4*(p+q)+(p+q)-1][9]    1.5191  0.9540
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.2610 0.500 2.000  0.6094
## ARCH Lag[5]    0.4191 1.440 1.667  0.9072
## ARCH Lag[7]    1.2610 2.315 1.543  0.8678
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.3748
## Individual Statistics:              
## mu     0.11719
## omega  0.18642
## alpha1 0.09345
## alpha2 0.28928
## skew   0.10917
## shape  0.61556
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias          0.09865 0.92145    
## Negative Sign Bias 1.03059 0.30312    
## Positive Sign Bias 1.52180 0.12855    
## Joint Effect       8.33019 0.03966  **
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     14.39       0.7607
## 2    30     26.56       0.5952
## 3    40     26.30       0.9400
## 4    50     44.95       0.6381
## 
## 
## Elapsed time : 0.229322

The ARCH(2) with skewed t-distribution has slightly higher AIC than the ARCH(2) with t-distribution, but the skew coefficient is significant.

4.1.5 Plots of ARCH(2) Model

# Plot some data from the ARCH(2) with t-distribution

par(mfrow = c(2, 2))

# Plot 1: Actual return series with conditional SD
# Plot 3: Conditional volatility vs absolute returns
# Plot 8: Distribution of residuals
# Plot 9: Q-Q plot of residuals to check for normality

plotnum <- c(1, 3, 8, 9)

for (i in plotnum) {
  plot(fitARCH2_std, which = i)
}

4.2 Standard GARCH Models

In order to capture the effects of volatility clustering, an ARCH model with many lags may be required. To solve this issue, the standard Generalized ARCH (sGARCH) can be used. The model can be thought of as an Autoregressive Moving Average Model, and is the basic model of in the GARCH variation of models. A GARCH(1, 1) model can be written as:

\[\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 ~, \text{ where } \alpha_0,~ \alpha_1,~ \beta_1 > 0\]

Essentially, the conditional variance in the current period is affected by the errors and the conditional variance in the previous period. Similar to ARCH, the model can be extended to a GARCH(p, q) model with \(p\) lags of conditional variance and \(q\) lags of error terms.

4.2.1 GARCH(1, 1) Model

Create an GARCH(1, 1) model specification with normally distributed errors:

# Specify GARCH(1,1) model 
garchmodel <- list(model = "sGARCH", garchOrder = c(1, 1))

# Assume a normal distribution of the error term z_t
garchspec <- rugarch::ugarchspec(variance.model = garchmodel, mean.model = meanmodel, distribution.model = "norm")

# Fit GARCH(1, 1)
fitGARCH11 <- rugarch::ugarchfit(spec = garchspec, data = trainset, solver = "hybrid")

fitGARCH11
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.60537    0.142198   4.2573 0.000021
## omega    3.73165    1.531777   2.4362 0.014844
## alpha1   0.10892    0.044753   2.4339 0.014938
## beta1    0.61147    0.141684   4.3157 0.000016
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.60537    0.148316   4.0816 0.000045
## omega    3.73165    1.504923   2.4796 0.013152
## alpha1   0.10892    0.039491   2.7582 0.005813
## beta1    0.61147    0.128058   4.7749 0.000002
## 
## LogLikelihood : -1734.833 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.4085
## Bayes        5.4363
## Shibata      5.4084
## Hannan-Quinn 5.4193
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1829  0.6689
## Lag[2*(p+q)+(p+q)-1][2]    0.2539  0.8219
## Lag[4*(p+q)+(p+q)-1][5]    3.0361  0.4005
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.7269 0.39389
## Lag[2*(p+q)+(p+q)-1][5]    8.3103 0.02467
## Lag[4*(p+q)+(p+q)-1][9]    9.8487 0.05404
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.968 0.500 2.000  0.1606
## ARCH Lag[5]     2.137 1.440 1.667  0.4418
## ARCH Lag[7]     2.355 2.315 1.543  0.6425
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.3917
## Individual Statistics:              
## mu     0.12627
## omega  0.08493
## alpha1 0.07316
## beta1  0.09560
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.07 1.24 1.6
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.2338 0.8153    
## Negative Sign Bias  0.3893 0.6972    
## Positive Sign Bias  1.4266 0.1542    
## Joint Effect        5.5742 0.1343    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     25.90       0.1331
## 2    30     26.19       0.6152
## 3    40     40.98       0.3836
## 4    50     59.88       0.1372
## 
## 
## Elapsed time : 0.09278607

With a GARCH(1, 1) model, the \(\alpha_1\) is statistically significant, standardized residuals do not have serial correlation and there are no ARCH effects left after including the GARCH term. The Adjusted Pearson Goodness-of-Fit Test statistic also shows that we cannot reject that the errors follow a normal distribution. We can write the formula as:

\[\hat{r}_t = 0.605 + \hat{\sigma}_t \hat{z}_t \\ \widehat{Var(\varepsilon_t)} = \hat{\sigma}_t^2 = 3.732 + 0.109 \varepsilon_{t-1}^2 + 0.611 \sigma_{t-1}^2\]

As per the ARCH models, I created GARCH(1, 1) with the assumption that errors follow a t-distribution and a skewed t-distribution.

4.2.2 GARCH(1, 1) Model with t-Distribution

# GARCH(1, 1) with errors that follow a t-distribution

garchspec_std <- rugarch::ugarchspec(variance.model = garchmodel, mean.model = meanmodel, distribution.model = "std")

fitGARCH11_std <- rugarch::ugarchfit(spec = garchspec_std, data = trainset, solver = "hybrid")

fitGARCH11_std
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.644531    0.133241   4.8373 0.000001
## omega   1.335765    1.162309   1.1492 0.250460
## alpha1  0.076522    0.039344   1.9449 0.051782
## beta1   0.825073    0.115850   7.1219 0.000000
## shape   6.635424    1.642166   4.0407 0.000053
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.644531    0.143570  4.48933 0.000007
## omega   1.335765    1.613021  0.82811 0.407606
## alpha1  0.076522    0.049609  1.54252 0.122947
## beta1   0.825073    0.160723  5.13352 0.000000
## shape   6.635424    1.703154  3.89596 0.000098
## 
## LogLikelihood : -1720.528 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3671
## Bayes        5.4018
## Shibata      5.3670
## Hannan-Quinn 5.3806
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1283  0.7202
## Lag[2*(p+q)+(p+q)-1][2]    0.2301  0.8359
## Lag[4*(p+q)+(p+q)-1][5]    3.0580  0.3966
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.1381 0.71018
## Lag[2*(p+q)+(p+q)-1][5]    9.0970 0.01564
## Lag[4*(p+q)+(p+q)-1][9]   10.7614 0.03431
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     2.091 0.500 2.000  0.1482
## ARCH Lag[5]     2.396 1.440 1.667  0.3901
## ARCH Lag[7]     2.507 2.315 1.543  0.6110
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  0.966
## Individual Statistics:              
## mu     0.08553
## omega  0.23612
## alpha1 0.15839
## beta1  0.23547
## shape  0.66597
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.1688 0.8660    
## Negative Sign Bias  0.6056 0.5450    
## Positive Sign Bias  1.3342 0.1826    
## Joint Effect        5.4915 0.1391    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     13.95       0.7865
## 2    30     32.63       0.2929
## 3    40     34.14       0.6910
## 4    50     41.21       0.7776
## 
## 
## Elapsed time : 0.09906483

Using the t-distribution led to insignificant \(\omega\) and \(\alpha_1\).

4.2.3 GARCH(1, 1) Model with Skewed t-Distribution

# GARCH(1, 1) with errors that follow a skewed t-distribution

garchspec_sstd <- rugarch::ugarchspec(variance.model = garchmodel, mean.model = meanmodel, distribution.model = "sstd")

fitGARCH11_sstd <- rugarch::ugarchfit(spec = garchspec_sstd, data = trainset, solver = "hybrid")

fitGARCH11_sstd
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.563131    0.140625   4.0045 0.000062
## omega   1.081351    0.854691   1.2652 0.205802
## alpha1  0.068029    0.031637   2.1503 0.031531
## beta1   0.851328    0.086314   9.8631 0.000000
## skew    0.910202    0.050148  18.1504 0.000000
## shape   6.879183    1.756956   3.9154 0.000090
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu      0.563131    0.144220   3.9047 0.000094
## omega   1.081351    1.043376   1.0364 0.300018
## alpha1  0.068029    0.034996   1.9439 0.051908
## beta1   0.851328    0.105308   8.0841 0.000000
## skew    0.910202    0.045698  19.9176 0.000000
## shape   6.879183    1.794090   3.8344 0.000126
## 
## LogLikelihood : -1719.081 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3657
## Bayes        5.4074
## Shibata      5.3656
## Hannan-Quinn 5.3819
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.09793  0.7543
## Lag[2*(p+q)+(p+q)-1][2]   0.21648  0.8441
## Lag[4*(p+q)+(p+q)-1][5]   3.12267  0.3851
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                    0.06918 0.79253
## Lag[2*(p+q)+(p+q)-1][5]   9.76845 0.01055
## Lag[4*(p+q)+(p+q)-1][9]  11.51797 0.02331
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.988 0.500 2.000  0.1585
## ARCH Lag[5]     2.324 1.440 1.667  0.4039
## ARCH Lag[7]     2.437 2.315 1.543  0.6255
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.0998
## Individual Statistics:              
## mu     0.07835
## omega  0.22466
## alpha1 0.15568
## beta1  0.21385
## skew   0.09560
## shape  0.68766
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.2614 0.7939    
## Negative Sign Bias  0.6847 0.4938    
## Positive Sign Bias  1.2174 0.2239    
## Joint Effect        5.7490 0.1245    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     15.32       0.7020
## 2    30     27.59       0.5399
## 3    40     33.77       0.7072
## 4    50     37.33       0.8887
## 
## 
## Elapsed time : 0.1600549

4.2.4 Plots of GARCH(1, 1) Model

# Plot some data from the GARCH(1, 1) with t-distribution

par(mfrow = c(2, 2))

for (i in plotnum) {
  plot(fitGARCH11_std, which = i)
}

4.3 Glosten-Jagannathan-Runkle GARCH Models

GJR-GARCH models can capture the asymmetric patterns in volatility due to positive and negative shocks (such as bad/good financial results, etc.). The volatility asymmetry is not captured by the standard GARCH models. We write the GJR-GARCH(1, 1) as:

\[\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \gamma_1 \varepsilon_{t-1}^2 I_{t-1} + \beta_1 \sigma_{t-1}^2\]

\(I_{t-1} = 0\) if \(\varepsilon_{t-1}^2 \ge 0\) and \(I_{t-1} = 1\) if \(\varepsilon_{t-1}^2 < 0\). The inclusion of the dummy variable \(I_{t-1}\) simply means that we expect higher volatility in the current period, if there was a negative shock in the previous period.

4.3.1 GJR-GARCH(1, 1) Model

Create a GJR-GARCH(1, 1) model specification with normally distributed errors:

# Specify GJR-GARCH(1,1) model 
gjrmodel <- list(model = "gjrGARCH", garchOrder = c(1, 1))

# Assume a normal distribution of the error term z_t
gjrspec <- rugarch::ugarchspec(variance.model = gjrmodel, mean.model = meanmodel, distribution.model = "norm")

# Fit GARCH(1, 1)
fitGJR11 <- rugarch::ugarchfit(spec = gjrspec, data = trainset, solver = "hybrid")

fitGJR11
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.54378      1.1933 0.455677  0.64862
## omega    4.48858      5.8324 0.769599  0.44154
## alpha1   0.00000      4.2990 0.000000  1.00000
## beta1    0.50477      1.3321 0.378920  0.70475
## gamma1   0.31532      3.3832 0.093203  0.92574
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.54378      91.101 0.005969  0.99524
## omega    4.48858     441.601 0.010164  0.99189
## alpha1   0.00000     330.322 0.000000  1.00000
## beta1    0.50477     102.073 0.004945  0.99605
## gamma1   0.31532     259.861 0.001213  0.99903
## 
## LogLikelihood : -1722.357 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3728
## Bayes        5.4075
## Shibata      5.3727
## Hannan-Quinn 5.3863
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.5627  0.4532
## Lag[2*(p+q)+(p+q)-1][2]    0.6598  0.6231
## Lag[4*(p+q)+(p+q)-1][5]    3.9271  0.2633
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.544 0.05975
## Lag[2*(p+q)+(p+q)-1][5]     7.739 0.03421
## Lag[4*(p+q)+(p+q)-1][9]     8.960 0.08284
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.581 0.500 2.000  0.2086
## ARCH Lag[5]     1.626 1.440 1.667  0.5599
## ARCH Lag[7]     1.753 2.315 1.543  0.7694
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.9467
## Individual Statistics:             
## mu     0.2152
## omega  0.1254
## alpha1 0.6448
## beta1  0.1288
## gamma1 0.1718
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.1138 0.9094    
## Negative Sign Bias  1.0726 0.2839    
## Positive Sign Bias  1.5283 0.1269    
## Joint Effect        3.8146 0.2822    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     29.01      0.06589
## 2    30     46.25      0.02214
## 3    40     60.02      0.01686
## 4    50     65.32      0.05935
## 
## 
## Elapsed time : 0.2745059

With the GJR-GARCH(1, 1) with normally distributed errors, none of the coefficients in the models are significant. This might hint at a mis-specification of models. Let’s try the assumption of t-distribution and skewed t-distribution.

4.3.2 GJR-GARCH(1, 1) Model with t-Distribution

# Assume Student's t-distribution of error term z_t
gjrspec_std <- rugarch::ugarchspec(variance.model = gjrmodel, mean.model = meanmodel, distribution.model = "std")

# Fit GARCH(1, 1)
fitGJR11_std <- rugarch::ugarchfit(spec = gjrspec_std, data = trainset, solver = "hybrid")

fitGJR11_std
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.59694    0.131810   4.5288 0.000006
## omega    3.76777    1.259606   2.9912 0.002779
## alpha1   0.00000    0.094145   0.0000 1.000000
## beta1    0.56513    0.111797   5.0550 0.000000
## gamma1   0.29505    0.135978   2.1698 0.030020
## shape    8.29014    2.631474   3.1504 0.001631
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.59694    0.150087   3.9773 0.000070
## omega    3.76777    1.500677   2.5107 0.012049
## alpha1   0.00000    0.155180   0.0000 1.000000
## beta1    0.56513    0.094059   6.0082 0.000000
## gamma1   0.29505    0.175437   1.6818 0.092608
## shape    8.29014    3.352420   2.4729 0.013403
## 
## LogLikelihood : -1711.757 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3429
## Bayes        5.3846
## Shibata      5.3428
## Hannan-Quinn 5.3591
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.5652  0.4522
## Lag[2*(p+q)+(p+q)-1][2]    0.6509  0.6268
## Lag[4*(p+q)+(p+q)-1][5]    3.8003  0.2800
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.187 0.07422
## Lag[2*(p+q)+(p+q)-1][5]     7.249 0.04512
## Lag[4*(p+q)+(p+q)-1][9]     8.326 0.11120
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.677 0.500 2.000  0.1953
## ARCH Lag[5]     1.723 1.440 1.667  0.5357
## ARCH Lag[7]     1.771 2.315 1.543  0.7656
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.802
## Individual Statistics:              
## mu     0.13648
## omega  0.30992
## alpha1 0.39793
## beta1  0.25869
## gamma1 0.09779
## shape  0.28461
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value    prob sig
## Sign Bias           0.1049 0.91645    
## Negative Sign Bias  0.9505 0.34220    
## Positive Sign Bias  1.7135 0.08711   *
## Joint Effect        4.0820 0.25274    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     22.04    0.2823255
## 2    30     33.19    0.2701523
## 3    40     56.53    0.0343318
## 4    50     89.12    0.0004023
## 
## 
## Elapsed time : 0.4018228

4.3.3 GJR-GARCH(1, 1) Model with Skewed t-Distribution

# Assume skewed t-distribution of error term z_t
gjrspec_sstd <- rugarch::ugarchspec(variance.model = gjrmodel, mean.model = meanmodel, distribution.model = "sstd")

# Fit GARCH(1, 1)
fitGJR11_sstd <- rugarch::ugarchfit(spec = gjrspec_sstd, data = trainset, solver = "hybrid")

fitGJR11_sstd
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : gjrGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.53099    0.135904   3.9071 0.000093
## omega    3.83977    1.280450   2.9988 0.002711
## alpha1   0.00000    0.085656   0.0000 1.000000
## beta1    0.56348    0.116819   4.8235 0.000001
## gamma1   0.29048    0.128188   2.2661 0.023447
## skew     0.90785    0.050240  18.0703 0.000000
## shape    8.51249    2.712575   3.1382 0.001700
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.53099    0.148100   3.5854 0.000337
## omega    3.83977    1.367220   2.8085 0.004978
## alpha1   0.00000    0.127236   0.0000 1.000000
## beta1    0.56348    0.095472   5.9020 0.000000
## gamma1   0.29048    0.147930   1.9636 0.049571
## skew     0.90785    0.048060  18.8899 0.000000
## shape    8.51249    3.194274   2.6649 0.007701
## 
## LogLikelihood : -1710.23 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3413
## Bayes        5.3899
## Shibata      5.3411
## Hannan-Quinn 5.3602
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.6114  0.4343
## Lag[2*(p+q)+(p+q)-1][2]    0.7120  0.6019
## Lag[4*(p+q)+(p+q)-1][5]    3.9143  0.2650
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                      3.158 0.07556
## Lag[2*(p+q)+(p+q)-1][5]     7.126 0.04833
## Lag[4*(p+q)+(p+q)-1][9]     8.185 0.11855
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]     1.594 0.500 2.000  0.2068
## ARCH Lag[5]     1.643 1.440 1.667  0.5556
## ARCH Lag[7]     1.699 2.315 1.543  0.7807
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.9299
## Individual Statistics:             
## mu     0.1385
## omega  0.2983
## alpha1 0.4199
## beta1  0.2464
## gamma1 0.1113
## skew   0.1295
## shape  0.2823
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.69 1.9 2.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias          0.07627 0.9392    
## Negative Sign Bias 0.98712 0.3240    
## Positive Sign Bias 1.58021 0.1146    
## Joint Effect       3.87723 0.2750    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     14.70     0.741571
## 2    30     33.56     0.255617
## 3    40     50.56     0.101653
## 4    50     77.61     0.005714
## 
## 
## Elapsed time : 0.5618382

With the t-distribution and skewed t-distribution assumption, only the coefficient of \(\varepsilon_{t-1}^2\) is insignificant and \(\gamma_1\) is insignificant when using robust errors.

4.3.4 Plots of GJR-GARCH(1, 1) Model

# Plot some data from the GJR-GARCH(1, 1) with t-distribution

par(mfrow = c(2, 2))

for (i in plotnum) {
  plot(fitGJR11_std, which = i)
}

4.4 Exponential GARCH Models

EGARCH models, similar to GJR-GARCH models, resolves the asymmetric patterns of volatility to positive and negative shocks. An EGARCH(1, 1) model is written as:

\[\ln(\sigma_t^2) = \omega + \bigg[ \alpha_1 z_{t-1} + \gamma_1 \bigg( |z_{t-1}| - E|z_{t-1}| \bigg) \bigg] + \beta_1 \ln(\sigma_{t-1}^2)\]

\(\gamma_1\) is the coefficient of the EGARCH term \(\gamma_1 \bigg( |z_{t-1}| - E|z_{t-1}| \bigg)\). If negative shocks cause higher volatility than positive shocks, then \(\alpha_1\) should be negative. This is because, if \(z_{t-1} \ge 0\), we would have \((-\alpha_1 + \gamma_1)z_{t-1} - \gamma_1 E(|z_{t-1}|)\) and if \(z_{t-1} < 0\), we would have \((\alpha_1 + \gamma_1)z_{t-1} - \gamma_1 E(|z_{t-1}|)\), which creates higher conditional volatility when there is a negative shock.

4.4.1 EGARCH(1, 1) Model

Create an EGARCH(1, 1) model specification with normally distributed errors:

# Specify EGARCH(1,1) model 
egarchmodel <- list(model = "eGARCH", garchOrder = c(1, 1))

# Assume a normal distribution of the error term z_t
egarchspec <- rugarch::ugarchspec(variance.model = egarchmodel, mean.model = meanmodel, distribution.model = "norm")

# Fit GARCH(1, 1)
fitEGARCH11 <- rugarch::ugarchfit(spec = egarchspec, data = trainset, solver = "hybrid")

fitEGARCH11
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : eGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : norm 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.53455    0.135192   3.9540 0.000077
## omega    1.04537    0.204634   5.1085 0.000000
## alpha1  -0.27588    0.056221  -4.9071 0.000001
## beta1    0.58685    0.080277   7.3103 0.000000
## gamma1   0.16260    0.067583   2.4059 0.016131
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.53455    0.147463   3.6250 0.000289
## omega    1.04537    0.199551   5.2386 0.000000
## alpha1  -0.27588    0.057280  -4.8164 0.000001
## beta1    0.58685    0.075056   7.8188 0.000000
## gamma1   0.16260    0.065427   2.4852 0.012947
## 
## LogLikelihood : -1721.632 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3705
## Bayes        5.4053
## Shibata      5.3704
## Hannan-Quinn 5.3840
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.4971  0.4808
## Lag[2*(p+q)+(p+q)-1][2]    0.6019  0.6476
## Lag[4*(p+q)+(p+q)-1][5]    5.1678  0.1402
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic   p-value
## Lag[1]                      2.877 0.0898311
## Lag[2*(p+q)+(p+q)-1][5]    15.004 0.0004434
## Lag[4*(p+q)+(p+q)-1][9]    18.009 0.0006453
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.5821 0.500 2.000  0.4455
## ARCH Lag[5]    1.0638 1.440 1.667  0.7141
## ARCH Lag[7]    1.7477 2.315 1.543  0.7705
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.2164
## Individual Statistics:             
## mu     0.1887
## omega  0.1932
## alpha1 0.7019
## beta1  0.1783
## gamma1 0.1430
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.28 1.47 1.88
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.5684 0.5700    
## Negative Sign Bias  0.8028 0.4224    
## Positive Sign Bias  0.9806 0.3272    
## Joint Effect        2.0914 0.5537    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     24.03      0.19502
## 2    30     38.42      0.11338
## 3    40     46.21      0.19904
## 4    50     63.30      0.08231
## 
## 
## Elapsed time : 0.2226751

Based on the output, we can write the formula as:

\[\hat{r}_t = 0.535 + \hat{\sigma}_t \hat{z}_t \\ \ln(\hat{\sigma}_t^2) = 1.045 - 0.275z_{t-1} + 0.163 \bigg( |z_{t-1}| - E|z_{t-1}| \bigg) + 0.587 \ln(\sigma_{t-1}^2)\]

All the coefficients are significant, standardized residuals are not serially correlated and ARCH effects have been sufficiently captured. I have also estimated the EGARCH with assumed t-distribution and skewed t-distribution of errors.

4.4.2 EGARCH(1, 1) Model with t-Distribution

# Assume a t-distribution of the error term z_t
egarchspec_std <- rugarch::ugarchspec(variance.model = egarchmodel, mean.model = meanmodel, distribution.model = "std")

# Fit GARCH(1, 1)
fitEGARCH11_std <- rugarch::ugarchfit(spec = egarchspec_std, data = trainset, solver = "hybrid")

fitEGARCH11_std
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : eGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : std 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.58466    0.122449   4.7748 0.000002
## omega    0.84197    0.264839   3.1792 0.001477
## alpha1  -0.24192    0.063613  -3.8030 0.000143
## beta1    0.66407    0.104998   6.3246 0.000000
## gamma1   0.16515    0.073542   2.2457 0.024726
## shape    8.46226    2.552073   3.3158 0.000914
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.58466    0.128296   4.5572 0.000005
## omega    0.84197    0.307692   2.7364 0.006211
## alpha1  -0.24192    0.055913  -4.3267 0.000015
## beta1    0.66407    0.121533   5.4641 0.000000
## gamma1   0.16515    0.068635   2.4062 0.016119
## shape    8.46226    2.699698   3.1345 0.001721
## 
## LogLikelihood : -1712.211 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3444
## Bayes        5.3860
## Shibata      5.3442
## Hannan-Quinn 5.3605
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.5006  0.4792
## Lag[2*(p+q)+(p+q)-1][2]    0.6018  0.6477
## Lag[4*(p+q)+(p+q)-1][5]    4.9355  0.1583
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                      2.583 0.108018
## Lag[2*(p+q)+(p+q)-1][5]    13.408 0.001183
## Lag[4*(p+q)+(p+q)-1][9]    15.906 0.002152
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.7679 0.500 2.000  0.3809
## ARCH Lag[5]    1.0585 1.440 1.667  0.7156
## ARCH Lag[7]    1.5424 2.315 1.543  0.8128
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.1313
## Individual Statistics:             
## mu     0.1426
## omega  0.3173
## alpha1 0.4081
## beta1  0.2791
## gamma1 0.1515
## shape  0.2131
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.49 1.68 2.12
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.7249 0.4688    
## Negative Sign Bias  0.6355 0.5253    
## Positive Sign Bias  1.3163 0.1885    
## Joint Effect        2.4070 0.4923    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     22.72      0.24976
## 2    30     33.66      0.25206
## 3    40     57.40      0.02889
## 4    50     51.01      0.39445
## 
## 
## Elapsed time : 0.591047

4.4.3 EGARCH(1, 1) Model with Skewed t-Distribution

# Assume a skewed t-distribution of the error term z_t
egarchspec_sstd <- rugarch::ugarchspec(variance.model = egarchmodel, mean.model = meanmodel, distribution.model = "sstd")

# Fit GARCH(1, 1)
fitEGARCH11_sstd <- rugarch::ugarchfit(spec = egarchspec_sstd, data = trainset, solver = "hybrid")

fitEGARCH11_sstd
## 
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## 
## Conditional Variance Dynamics    
## -----------------------------------
## GARCH Model  : eGARCH(1,1)
## Mean Model   : ARFIMA(0,0,0)
## Distribution : sstd 
## 
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.52678    0.134316   3.9219 0.000088
## omega    0.85533    0.284003   3.0117 0.002598
## alpha1  -0.23630    0.063857  -3.7004 0.000215
## beta1    0.66023    0.112343   5.8769 0.000000
## gamma1   0.16247    0.073062   2.2237 0.026166
## skew     0.91305    0.050314  18.1472 0.000000
## shape    8.65667    2.662249   3.2516 0.001147
## 
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.52678    0.146255   3.6018 0.000316
## omega    0.85533    0.333274   2.5665 0.010274
## alpha1  -0.23630    0.057426  -4.1148 0.000039
## beta1    0.66023    0.131672   5.0142 0.000001
## gamma1   0.16247    0.065391   2.4846 0.012970
## skew     0.91305    0.047507  19.2195 0.000000
## shape    8.65667    2.775810   3.1186 0.001817
## 
## LogLikelihood : -1710.853 
## 
## Information Criteria
## ------------------------------------
##                    
## Akaike       5.3432
## Bayes        5.3919
## Shibata      5.3430
## Hannan-Quinn 5.3621
## 
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
##                         statistic p-value
## Lag[1]                     0.5486  0.4589
## Lag[2*(p+q)+(p+q)-1][2]    0.6675  0.6199
## Lag[4*(p+q)+(p+q)-1][5]    5.0387  0.1500
## d.o.f=0
## H0 : No serial correlation
## 
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
##                         statistic  p-value
## Lag[1]                      2.487 0.114803
## Lag[2*(p+q)+(p+q)-1][5]    13.257 0.001297
## Lag[4*(p+q)+(p+q)-1][9]    15.720 0.002390
## d.o.f=2
## 
## Weighted ARCH LM Tests
## ------------------------------------
##             Statistic Shape Scale P-Value
## ARCH Lag[3]    0.7273 0.500 2.000  0.3937
## ARCH Lag[5]    1.0128 1.440 1.667  0.7291
## ARCH Lag[7]    1.5156 2.315 1.543  0.8182
## 
## Nyblom stability test
## ------------------------------------
## Joint Statistic:  1.3488
## Individual Statistics:             
## mu     0.1522
## omega  0.3067
## alpha1 0.4119
## beta1  0.2714
## gamma1 0.1355
## skew   0.1176
## shape  0.2086
## 
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic:          1.69 1.9 2.35
## Individual Statistic:     0.35 0.47 0.75
## 
## Sign Bias Test
## ------------------------------------
##                    t-value   prob sig
## Sign Bias           0.5094 0.6106    
## Negative Sign Bias  0.6706 0.5027    
## Positive Sign Bias  1.1665 0.2439    
## Joint Effect        1.9281 0.5875    
## 
## 
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
##   group statistic p-value(g-1)
## 1    20     16.19       0.6445
## 2    30     31.60       0.3375
## 3    40     46.33       0.1956
## 4    50     56.30       0.2206
## 
## 
## Elapsed time : 0.3819659

4.4.4 Plots of EGARCH(1, 1) Model

# Plot some data from the EGARCH(1, 1) with t-distribution

par(mfrow = c(2, 2))

for (i in plotnum) {
  plot(fitEGARCH11_std, which = i)
}

4.5 Summary

In this section, I have estimated 4 different models, with 3 different assumed distribution of the error \(z_t\). I have created a table comparing their AIC values below.

Models Normal Distribution t-Distribution Skewed t-Distribution
ARCH(2) 5.397 5.366 5.367
GARCH(1, 1) 5.409 5.367 5.366
GJR-GARCH(1, 1) 5.373 5.343 5.341
EGARCH(1, 1) 5.371 5.344 5.343

Despite the rather low AIC values of the GJR models, the \(\gamma\) coefficient for the \(\varepsilon_{t-1}^2 I_{t-1}\) term was insignificant when looking at the robust errors. However, I would still consider this model in the forecast evaluation in the next section.

5 Evaluating Forecast Performance

In this section, I evaluated the out-of-sample forecast performance of the 12 models created in Section 3. However, before jumping into the forecast, there is a need to highlight the difficulty of evaluating GARCH models. The difficulty lies in using an appropriate proxy of volatility, since it is not observable at a point in time. The paper by Wennström (2014) describes some proxies of volatility, such as squared returns, using moving averages of variance and cumulative squared intra-day returns. In the paper, and another by Schmidt (2021), the High-Low Range proxy was used and the formula is given by:

\[v_t = \ln \bigg( \frac{H_t}{L_t} \bigg)\]

I extracted the weekly high and weekly low prices of AAPL of the test period and calculate the volatility proxy.

# Extract high (column 2) and low (column 3) prices within test period

proxy <- AAPL_price["2022-05/", 2:3]

proxy$Volatility <- log(proxy[, 1] / proxy[, 2]) * 100

proxy
##            AAPL.High AAPL.Low Volatility
## 2022-05-06    159.44   138.80  13.863362
## 2022-05-13    149.77   136.60   9.204382
## 2022-05-20    144.34   132.61   8.475910
## 2022-05-27    151.74   145.26   4.364336
## 2022-06-03    149.87   142.53   5.021572
## 2022-06-10    140.76   129.04   8.693390
## 2022-06-17    138.59   129.81   6.544808
## 2022-06-24    143.49   133.77   7.014344

In the following sub-sections, I evaluated the models against the volatility proxy using Mean Squared Error, Root Mean Squared Error and Mean Absolute Error, grouped by the model types (ARCH, GARCH, GJR and EGARCH).

The formula for each evaluation metric is:

\[ MSE = \frac{1}{n} \sum_{n=1}^N (v_{n+1}^2 - \hat{\sigma}_{n+1}^2)^2 \\ RMSE = \sqrt{MSE} \\ MAE = \frac{1}{n} \sum_{n=1}^N |v_{n+1}^2 - \hat{\sigma}_{n+1}^2| \]

5.1 ARCH Forecasts

# Forecast ARCH(2) with normal distribution of innovations 8 periods ahead
fcstARCH2_nd <- rugarch::ugarchforecast(fitORspec = fitARCH2, n.ahead = 8)

# Forecast ARCH(2) with t-distribution of innovations 8 periods ahead
fcstARCH2_std <- rugarch::ugarchforecast(fitORspec = fitARCH2_std, n.ahead = 8)

# Forecast ARCH(2) with t-distribution of innovations 8 periods ahead
fcstARCH2_sstd <- rugarch::ugarchforecast(fitORspec = fitARCH2_sstd, n.ahead = 8)

Store forecasted conditional standard deviation of ARCH models into a data frame:

fcstARCH2 <- cbind(sigma(fcstARCH2_nd), sigma(fcstARCH2_std), sigma(fcstARCH2_sstd)) %>%
  `colnames<-`(c("ARCH2_ND", "ARCH2_STD", "ARCH2_SSTD"))

fcstARCH2
##     ARCH2_ND ARCH2_STD ARCH2_SSTD
## T+1 3.533897  3.549828   3.553096
## T+2 3.947907  3.972137   3.918734
## T+3 3.658503  3.674590   3.658155
## T+4 3.725102  3.743649   3.709058
## T+5 3.669469  3.684593   3.661807
## T+6 3.679605  3.695164   3.668056
## T+7 3.669006  3.683521   3.659467
## T+8 3.670421  3.684966   3.660027

Calculate evaluation metrics for each ARCH model:

evalARCH <- NULL

for (i in 1:3) {
  mse <- mean((proxy$Volatility^2 - fcstARCH2[,i]^2)^2)
  
  rmse <- sqrt(mse)
  
  mae <- mean(abs(proxy$Volatility^2 - fcstARCH2[,i]^2))
  
  evalARCH <- rbind(evalARCH, c(MSE = mse, RMSE = rmse, MAE= mae))
}

rownames(evalARCH) <- colnames(fcstARCH2)

evalARCH
##                 MSE     RMSE      MAE
## ARCH2_ND   5830.373 76.35688 56.41940
## ARCH2_STD  5816.254 76.26437 56.29428
## ARCH2_SSTD 5831.107 76.36168 56.48219

5.2 GARCH Forecasts

# Forecast GARCH(1, 1) with normal distribution of innovations 8 periods ahead
fcstGARCH11_nd <- rugarch::ugarchforecast(fitORspec = fitGARCH11, n.ahead = 8)

# Forecast GARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstGARCH11_std <- rugarch::ugarchforecast(fitORspec = fitGARCH11_std, n.ahead = 8)

# Forecast GARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstGARCH11_sstd <- rugarch::ugarchforecast(fitORspec = fitGARCH11_sstd, n.ahead = 8)

Store forecasted conditional standard deviation of GARCH models into a data frame:

fcstGARCH11 <- cbind(sigma(fcstGARCH11_nd), sigma(fcstGARCH11_std), sigma(fcstGARCH11_sstd)) %>%
  `colnames<-`(c("GARCH11_ND", "GARCH11_STD", "GARCH11_SSTD"))

fcstGARCH11
##     GARCH11_ND GARCH11_STD GARCH11_SSTD
## T+1   3.685383    3.711890     3.687765
## T+2   3.676414    3.709187     3.685680
## T+3   3.669939    3.706749     3.683762
## T+4   3.665268    3.704549     3.681998
## T+5   3.661899    3.702564     3.680376
## T+6   3.659470    3.700774     3.678883
## T+7   3.657719    3.699159     3.677511
## T+8   3.656457    3.697702     3.676249

Calculate evaluation metrics for each GARCH model:

evalGARCH <- NULL

for (i in 1:3) {
  mse <- mean((proxy$Volatility^2 - fcstGARCH11[,i]^2)^2)
  
  rmse <- sqrt(mse)
  
  mae <- mean(abs(proxy$Volatility^2 - fcstGARCH11[,i]^2))
  
  evalGARCH <- rbind(evalGARCH, c(MSE = mse, RMSE = rmse, MAE= mae))
}

rownames(evalGARCH) <- colnames(fcstGARCH11)

evalGARCH
##                   MSE     RMSE      MAE
## GARCH11_ND   5821.056 76.29584 56.63477
## GARCH11_STD  5793.129 76.11260 56.35842
## GARCH11_SSTD 5812.432 76.23930 56.52492

5.3 GJR-GARCH Forecasts

# Forecast GJR-GARCH(1, 1) with normal distribution of innovations 8 periods ahead
fcstGJR11_nd <- rugarch::ugarchforecast(fitORspec = fitGJR11, n.ahead = 8)

# Forecast GJR-GARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstGJR11_std <- rugarch::ugarchforecast(fitORspec = fitGJR11_std, n.ahead = 8)

# Forecast GJR-GARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstGJR11_sstd <- rugarch::ugarchforecast(fitORspec = fitGJR11_sstd, n.ahead = 8)

Store forecasted conditional standard deviation of GJR-GARCH models into a data frame:

fcstGJR11 <- cbind(sigma(fcstGJR11_nd), sigma(fcstGJR11_std), sigma(fcstGJR11_sstd)) %>%
  `colnames<-`(c("GJR11_ND", "GJR11_STD", "GJR11_SSTD"))

fcstGJR11
##     GJR11_ND GJR11_STD GJR11_SSTD
## T+1 4.266492  4.238812   4.202596
## T+2 4.067775  4.070917   4.033529
## T+3 3.930610  3.946910   3.910168
## T+4 3.837050  3.856104   3.820965
## T+5 3.773795  3.790063   3.756919
## T+6 3.731303  3.742287   3.711183
## T+7 3.702886  3.707864   3.678655
## T+8 3.683941  3.683137   3.655590

Calculate evaluation metrics for each GJR-GARCH model:

evalGJR <- NULL

for (i in 1:3) {
  mse <- mean((proxy$Volatility^2 - fcstGJR11[,i]^2)^2)
  
  rmse <- sqrt(mse)
  
  mae <- mean(abs(proxy$Volatility^2 - fcstGJR11[,i]^2))
  
  evalGJR <- rbind(evalGJR, c(MSE = mse, RMSE = rmse, MAE= mae))
}

rownames(evalGJR) <- colnames(fcstGJR11)

evalGJR
##                 MSE     RMSE      MAE
## GJR11_ND   5519.740 74.29495 55.03242
## GJR11_STD  5525.771 74.33553 54.99476
## GJR11_SSTD 5556.277 74.54044 55.25315

5.4 EGARCH Forecasts

# Forecast EGARCH(1, 1) with normal distribution of innovations 8 periods ahead
fcstEGARCH11_nd <- rugarch::ugarchforecast(fitORspec = fitEGARCH11, n.ahead = 8)

# Forecast EGARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstEGARCH11_std <- rugarch::ugarchforecast(fitORspec = fitEGARCH11_std, n.ahead = 8)

# Forecast EGARCH(1, 1) with t-distribution of innovations 8 periods ahead
fcstEGARCH11_sstd <- rugarch::ugarchforecast(fitORspec = fitEGARCH11_sstd, n.ahead = 8)

Store forecasted conditional standard deviation of EGARCH models into a data frame:

fcstEGARCH11 <- cbind(sigma(fcstEGARCH11_nd), sigma(fcstEGARCH11_std), sigma(fcstEGARCH11_sstd)) %>%
  `colnames<-`(c("EGARCH11_ND", "EGARCH11_STD", "EGARCH11_SSTD"))

fcstEGARCH11
##     EGARCH11_ND EGARCH11_STD EGARCH11_SSTD
## T+1    4.614321     4.553992      4.519062
## T+2    4.137426     4.169141      4.151596
## T+3    3.880849     3.931721      3.925516
## T+4    3.737751     3.781577      3.783040
## T+5    3.656243     3.685053      3.691820
## T+6    3.609240     3.622320      3.632802
## T+7    3.581939     3.581253      3.594355
## T+8    3.566013     3.554239      3.569195

Calculate evaluation metrics for each EGARCH model:

evalEGARCH <- NULL

for (i in 1:3) {
  mse <- mean((proxy$Volatility^2 - fcstEGARCH11[,i]^2)^2)
  
  rmse <- sqrt(mse)
  
  mae <- mean(abs(proxy$Volatility^2 - fcstEGARCH11[,i]^2))
  
  evalEGARCH <- rbind(evalEGARCH, c(MSE = mse, RMSE = rmse, MAE= mae))
}

rownames(evalEGARCH) <- colnames(fcstEGARCH11)

evalEGARCH
##                    MSE     RMSE      MAE
## EGARCH11_ND   5413.873 73.57903 55.15565
## EGARCH11_STD  5425.776 73.65987 55.07382
## EGARCH11_SSTD 5439.500 73.75296 55.09558

5.5 Summary

Summarizing the evaluation metrics of all the models:

rbind(evalARCH, evalGARCH, evalGJR, evalEGARCH)
##                    MSE     RMSE      MAE
## ARCH2_ND      5830.373 76.35688 56.41940
## ARCH2_STD     5816.254 76.26437 56.29428
## ARCH2_SSTD    5831.107 76.36168 56.48219
## GARCH11_ND    5821.056 76.29584 56.63477
## GARCH11_STD   5793.129 76.11260 56.35842
## GARCH11_SSTD  5812.432 76.23930 56.52492
## GJR11_ND      5519.740 74.29495 55.03242
## GJR11_STD     5525.771 74.33553 54.99476
## GJR11_SSTD    5556.277 74.54044 55.25315
## EGARCH11_ND   5413.873 73.57903 55.15565
## EGARCH11_STD  5425.776 73.65987 55.07382
## EGARCH11_SSTD 5439.500 73.75296 55.09558

The EGARCH(1, 1) model with assumed normal distribution of innovations had the best MSE and RMSE, but it was surprising that the GJR-GARCH(1, 1) with assumed t-distribution of innovations had the best MAE. The EGARCH and GJR-GARCH models seem to be able to capture the higher volatility in the first 2 forecast periods better than the standard ARCH and GARCH models, likely due to the negative returns in those periods. However, all models were weak in capturing the volatility of returns past the 4th period in the forecast period using the basic forecast function. Since forecasting many periods ahead is bound to fail regardless of the type of models used, it is better to keep the forecast period short when using such models.

6 Final Remarks

While the project intended to show how ARCH and GARCH (and its variants) can be used in R to model volatility, I believe there is much to be desired from the forecasting results in Section 4. Due to the innovations (\(z_t\)) that cannot be observed, the forecasts does not include the 8-period ahead returns. The bootstrapping method of forecasting in the rugarch package may be a solution, which I may tackle in another project and determine its feasibility and accuracy. I am also curious about the performance of rolling estimation since there can be structural changes to AAPL, such as business environment, company prospects, new innovations, etc. that could affect the estimation of the models.

References

Costa, F. J. M. (2017). Forecasting volatility using GARCH models. Retrieved from https://core.ac.uk/download/pdf/132797589.pdf

Schmidt, L. (2021). Volatility Forecasting Performance of GARCH Models: A Study on Nordic Indices During COVID-19. Retrieved from https://www.diva-portal.org/smash/get/diva2:1566342/FULLTEXT01.pdf

Wennström, A. (2014). Volatility Forecasting Performance: Evaluation of GARCH type volatility models on Nordic equity indices. Retrieved from https://www.math.kth.se/matstat/seminarier/reports/M-exjobb14/140616b.pdf

Wikipedia. (n.d.) Autoregressive conditional heteroskedasticity. Retrieved 15 July 2022, from https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity.

Zivot, E. (2013). Univariate GARCH. Presentation, University of Washington. Retrieved from https://faculty.washington.edu/ezivot/econ589/ch18-garch.pdf