Updated on: 2022-07-28

1 Introduction

Through this project, my intention is to apply what I have learnt in R onto one of the areas that I always had an interest in, which is stock portfolio optimization. It documents the commonly used packages and functions in R that are useful for portfolio analysis and possibly even for back-testing. It is not intended to be a detailed or advanced guide to portfolio theory and R application.

The first part starts off with understanding some basic functions in the relevant packages, which would include importing stock prices from Yahoo Finance to calculating returns and important metrics, such as Beta and Sharpe Ratio. The second part would involve tools that we can use for single-period optimization and multi-period optimization with back-testing of portfolio optimization strategy.

2 Packages Required

#Use install.packages() or go to the Packages panel to install packages if they have not been installed
library(dplyr)
library(PerformanceAnalytics) # For portfolio performance and risk analysis
library(PortfolioAnalytics) # For portfolio optimization and analysis
library(quantmod) # For obtaining historical prices from Yahoo Finance
library(ROI) # For ROI solver in portfolio optimization
library(ROI.plugin.glpk) # Part of the ROI solver for linear optimization
library(ROI.plugin.quadprog) #Part of the ROI solver for quadratic optimization

3 Getting To Know Some Useful Functions

3.1 Retrieving Historical Price Data

The getSymbols function allows us to retrieve historical price data from sources such as Yahoo Finance and FRED. For this project, since we are dealing with portfolio optimization, I have retrieved the stock price data of Amazon (AMZN) from Yahoo Finance. Data can be retrieved for any stock that has a ticker listed in Yahoo Finance.

AMZN <- getSymbols(Symbols = "AMZN",
                   src = "yahoo",
                   # Can instead choose "weekly" or "monthly" to import weekly or monthly price data
                   periodicity = "daily",
                   auto.assign = F)

Let us get some details about the object AMZN.

head(AMZN, n = 4); tail(AMZN, n = 4)
##            AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
## 2007-01-03    1.9340    1.9530   1.9025     1.9350   248102000        1.9350
## 2007-01-04    1.9295    1.9570   1.9130     1.9450   126368000        1.9450
## 2007-01-05    1.9360    1.9395   1.8800     1.9185   132394000        1.9185
## 2007-01-08    1.9110    1.9155   1.8585     1.8750   135660000        1.8750
##            AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume AMZN.Adjusted
## 2022-07-22    125.01    125.50   121.35     122.42    51402700        122.42
## 2022-07-25    122.70    123.64   120.03     121.14    50221300        121.14
## 2022-07-26    115.79    118.15   114.53     114.81    67075100        114.81
## 2022-07-27    117.31    121.90   117.16     120.97    61480400        120.97
colSums(is.na(AMZN))
##     AMZN.Open     AMZN.High      AMZN.Low    AMZN.Close   AMZN.Volume 
##             0             0             0             0             0 
## AMZN.Adjusted 
##             0

AMZN contains the Opening Prices, High and Low Prices, Closing Prices, Volume and Adjusted Closing Prices of each day. It ranges from 03 January 2007 (the earliest date for which data can be imported) to 27 May 2022 (the most recent date when this was written). The colSums function can be used to find if there are any NA rows, which represents missing data, in the columns.

3.2 Calculating Daily Returns and Plotting Cumulative Returns

The Adjusted Price column will be most useful for our analysis in general as it has been adjusted for splits and dividends. I have subset that column into a new object called AMZN.Adj, calculate the discrete return of the stock and plot its cumulative returns over time.

AMZN.Adj <- AMZN[,6]

# na.omit() removes the first row, which is an NA value as there are no prior data to calculate returns
AMZN_Returns <- na.omit(Return.calculate(AMZN.Adj, method = "discrete"))
# We could use Return.calculate(AMZN.Adj, method = "log") to calculate the log return instead.

chart.CumReturns(AMZN_Returns, 
                 legend.loc = "topleft",
                 # Could specify geometric = F to use simple/arithmetic return when using log returns in the previous step
                 geometric = T,
                 main = "Cumulative Daily Return of Amazon")

To make interpretation easier, I have opted to use discrete returns in the Return.calculate function and geometric chaining in the chart.CumReturns function. For continuous compounding, log returns can be used instead, which can be added across time and simple/arithmetic chaining should be chosen instead of geometric chaining. However, log returns cannot be added across securities in a portfolio, which is another reason for choosing discrete returns.

4 Building a Portfolio With 2 or More Securities

4.1 Import Historical Data

Now that we have an understanding of how to import stock prices and calculate returns, we can start to create a portfolio with more stocks. Firstly, we can create a vector of the stock tickers that we are interested in analyzing and adding to our portfolio.

tickers <- c("JNJ", "PG", "AAPL", "TSM", "MSFT", "NVDA")

I have created an empty object called price_data which will be used to aggregate the stock price data we will be importing. I had also indicated a start and end date for the period which data will be collected. We can also subset the Adjusted Closing Price column when we import the price data to simplify the work.

price_data <- NULL

startdate <- "2012-01-01"
enddate <- "2022-05-23"

for (ticker in tickers) {
  price_data <- cbind(price_data,
                      getSymbols(ticker, 
                                 src = "yahoo",
                                 from = startdate, to = enddate, 
                                 periodicity = "daily", auto.assign = F)[,6])
}

colnames(price_data) <- tickers

Let us see some details about the object price_data.

head(price_data, n = 4); tail(price_data, n = 4)
##                 JNJ       PG     AAPL      TSM     MSFT     NVDA
## 2012-01-03 49.02983 48.56105 12.55747 9.633207 21.57289 3.223092
## 2012-01-04 48.73216 48.53924 12.62495 9.546028 22.08058 3.259823
## 2012-01-05 48.67261 48.33577 12.76511 9.633207 22.30622 3.376900
## 2012-01-06 48.24839 48.21952 12.89856 9.553293 22.65274 3.337875
##                 JNJ       PG   AAPL      TSM   MSFT     NVDA
## 2022-05-17 177.6783 153.6823 149.24 92.83105 266.20 181.7316
## 2022-05-18 174.3795 144.1045 140.82 90.05566 254.08 169.3442
## 2022-05-19 172.8294 140.7860 137.35 89.73734 253.14 171.2038
## 2022-05-20 175.8500 140.8754 137.59 90.30435 252.56 166.9047
nrow(price_data); colSums(is.na(price_data))
## [1] 2614
##  JNJ   PG AAPL  TSM MSFT NVDA 
##    0    0    0    0    0    0

The nrow function returns us the number of rows, representing the number of data points of each stock, and colSums shows us that there are no missing data in any of the columns.

4.2 Daily Returns of Securities and Portfolio Return

Based on most textbooks on portfolio theory, the calculation of portfolio return is: \[ R_p = \sum_{i=1}^n w_iR_i \] where \(R_i\) is the return of security \(i\).

Therefore, we need to calculate the discrete returns of the securities.

returns <- na.omit(Return.calculate(price_data, method = "discrete"))

Subsequently, we can use the Return.portfolio function and geometric chaining to calculate the portfolio’s daily returns. There is an option to specify a vector of desired weights for the individual securities and leaving it out uses an equal weight as the default.

To view the object r_noRebal we have to use the lapply function as the object is a list. Due to the size of the output, I have indicated to show only the first four rows of each element in the list.

r_noRebal <- Return.portfolio(returns,
                              geometric = T,
                              verbose = T)

colnames(r_noRebal$returns) <- "Rp_noRebal"

lapply(r_noRebal, head, n = 4)
## $returns
##               Rp_noRebal
## 2012-01-04  0.0041223277
## 2012-01-05  0.0102229941
## 2012-01-06 -0.0007821064
## 2012-01-09  0.0012500645
## 
## $contribution
##                      JNJ            PG          AAPL          TSM         MSFT
## 2012-01-04 -0.0010118738 -7.483707e-05  0.0008956558 -0.001508307  0.003922323
## 2012-01-05 -0.0002015900 -6.954705e-04  0.0018526637  0.001502115  0.001736058
## 2012-01-06 -0.0014215841 -3.933200e-04  0.0017459907 -0.001363003  0.002639178
## 2012-01-09  0.0002495992  6.889309e-04 -0.0002679227  0.002852148 -0.002272691
##                    NVDA
## 2012-01-04  0.001899367
## 2012-01-05  0.006029218
## 2012-01-06 -0.001989368
## 2012-01-09  0.000000000
## 
## $BOP.Weight
##                  JNJ        PG      AAPL       TSM      MSFT      NVDA
## 2012-01-04 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 2012-01-05 0.1649747 0.1659079 0.1668744 0.1644803 0.1698887 0.1678740
## 2012-01-06 0.1631057 0.1635406 0.1670196 0.1643028 0.1698879 0.1721434
## 2012-01-09 0.1618107 0.1632749 0.1688977 0.1630673 0.1726622 0.1702872
## 
## $EOP.Weight
##                  JNJ        PG      AAPL       TSM      MSFT      NVDA
## 2012-01-04 0.1649747 0.1659079 0.1668744 0.1644803 0.1698887 0.1678740
## 2012-01-05 0.1631057 0.1635406 0.1670196 0.1643028 0.1698879 0.1721434
## 2012-01-06 0.1618107 0.1632749 0.1688977 0.1630673 0.1726622 0.1702872
## 2012-01-09 0.1618579 0.1637592 0.1684193 0.1657123 0.1701767 0.1700746
## 
## $BOP.Value
##                  JNJ        PG      AAPL       TSM      MSFT      NVDA
## 2012-01-04 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 2012-01-05 0.1656548 0.1665918 0.1675623 0.1651584 0.1705890 0.1685660
## 2012-01-06 0.1654524 0.1658935 0.1694226 0.1666667 0.1723322 0.1746201
## 2012-01-09 0.1640103 0.1654945 0.1711937 0.1652841 0.1750094 0.1726021
## 
## $EOP.Value
##                  JNJ        PG      AAPL       TSM      MSFT      NVDA
## 2012-01-04 0.1656548 0.1665918 0.1675623 0.1651584 0.1705890 0.1685660
## 2012-01-05 0.1654524 0.1658935 0.1694226 0.1666667 0.1723322 0.1746201
## 2012-01-06 0.1640103 0.1654945 0.1711937 0.1652841 0.1750094 0.1726021
## 2012-01-09 0.1642633 0.1661928 0.1709222 0.1681750 0.1727058 0.1726021

An awesome feature of this function is that we can input re-balancing periods, which can be monthly, quarterly, or yearly. If rebalance_on was left out, the default assumes a buy-and-hold strategy (no re-balancing), and you may experiment with this based on your investing preferences.

r_withRebal <- Return.portfolio(returns, 
                                geometric = T, 
                                rebalance_on = "quarters", 
                                verbose = T)

colnames(r_withRebal$returns) <- "Rp_withRebal"

4.3 Compare Portfolios With and Without Re-balancing

Indicating verbose = T in Return.portfolio allows us to extract more details about the underlying portfolio return calculation, such as the weights at the beginning and end of periods. We take a look at how the end-of-period weight for JNJ changes when the portfolio is re-balanced versus when it is not re-balanced.

eop_weight_noRebal <- r_noRebal$EOP.Weight

eop_weight_wRebal <- r_withRebal$EOP.Weight

  par(mfrow = c(2,1), mar = c(2, 4, 2, 2))

  plot.zoo(eop_weight_wRebal$JNJ, 
           main = "JNJ End-of-Period Weights Over Time",
           ylab = "With Rebalancing")
           
  abline(h = 1/length(colnames(eop_weight_wRebal)), col = "red", lwd = 2)

  plot.zoo(eop_weight_noRebal$JNJ,
          ylab = "Without Rebalancing")
  
  abline(h = 1/length(colnames(eop_weight_wRebal)), col = "red", lwd = 2)

When we re-balance the portfolio quarterly, the end-of-period weights tend to fluctuate around the weight of approximately 17% (which is the weight in an equal weight portfolio). In contrast, the weight without re-balancing showed a decreasing trend. We can find out why this happened by plotting the end-of period weights of all stocks in the portfolio. Weight of NVDA had increased drastically over time as the stock’s value grew faster in that span of time compared to other stocks.

par(mfrow = c(1,1), mar = c(2, 2, 2, 2))

plot.zoo(eop_weight_noRebal,
         main = "End-of-Period Weights Over Time Without Rebalancing")

We can use the charts.PerformanceSummary function to compare the cumulative return of the portfolios with re-balancing and without re-balancing.

ret_comp <- cbind(r_noRebal$returns, r_withRebal$returns)

charts.PerformanceSummary(R = ret_comp,
                          main = "Comparison of Cumulative Returns",
                          legend.loc = "topleft")

We can compare the statistics of this two portfolios using table.stats function. We can see that re-balancing reduced standard deviation but has also slightly reduced returns (values are based on daily returns).

table.Stats(ret_comp)

4.4 Comparison With A Benchmark

We may also wish to compare our portfolios against a benchmark that is relevant to the portfolio we created. In this example, I have added large-cap U.S. stocks into my portfolio and so my benchmark would be the S&P500 index (ticker: SPY)

benchmark <- getSymbols("SPY", 
                        src = "yahoo",
                        from = startdate, to = enddate, 
                        periodicity = "daily", auto.assign = F)[,6]

nrow(benchmark)
## [1] 2614
colSums(is.na(benchmark))
## SPY.Adjusted 
##            0
benchmarkReturn <- na.omit(Return.calculate(benchmark, method = "discrete")) %>%
  `colnames<-`("SPY")

With the benchmark return, we can calculate the portfolios’ betas and the alphas using the functions CAPM.beta and CAPM.jensenAlpha. These functions have an optional input for the risk-free rate, which I had assumed to be 3% and since we are dealing with daily returns, we need to divide by 252 trading days. Other metrics that might be of interest are the Sharpe, Information, and Treynor Ratios.

perfMetrics <- round(rbind(CAPM.beta(ret_comp, benchmarkReturn, Rf = 0.03/252),
                           CAPM.alpha(ret_comp, benchmarkReturn, Rf = 0.03/252),
                           SharpeRatio(ret_comp, Rf = 0.03/252, FUN = "StdDev"),
                           TreynorRatio(ret_comp, benchmarkReturn, Rf = 0.03/252),
                           InformationRatio(ret_comp, benchmarkReturn)), digits = 4
)

perfMetrics
##                               Rp_noRebal Rp_withRebal
## Beta: SPY                         1.2705       1.0314
## Alpha: SPY                        0.0005       0.0004
## StdDev Sharpe (Rf=0%, p=95%):     0.0627       0.0723
## Treynor Ratio: SPY                0.2051       0.2214
## Information Ratio: SPY            0.9366       1.3389

The table.AnnualizeReturns function can be used to find the annualized return, standard deviation, and Sharpe ratio.

cbind(table.AnnualizedReturns(ret_comp, Rf = 0.03/252),
      table.AnnualizedReturns(benchmarkReturn, Rf = 0.03/252))

For those who are interested in the breakdown of returns by months in each year, the function table.CalendarReturns can be used.

table.CalendarReturns(benchmarkReturn)

5 Portfolio Optimization

In Section 4, I had assumed that weights were given (using the default option to calculate the returns of an equal weight portfolio). In this part, we are going to optimize the weight of the stocks in our portfolio based on certain objectives.

5.1 Minimizing Variance

Before we begin with the optimization, we need to create a list of constraints and objectives for our portfolio specification. To do so, we need to use the portfolio.spec function before adding constrains or objectives using add.constraint or add.objective.

portspec1 <- portfolio.spec(colnames(returns))

# Sum of weights constrained to 1, can also specify as type = "full investment"
portspec1 <- add.constraint(portspec1, 
                            type = "weight_sum", 
                            min_sum = 1, max_sum = 1)

# Weight constraint on each stock, max is 35% of portfolio
portspec1 <- add.constraint(portspec1, 
                            type="box",
                            min = 0, max = 0.35)

# Objective to minimize risk based on variance (function will default to standard deviation as measure of risk)
portspec1 <- add.objective(portspec1,
                           type = "risk",
                           name = "var")

After indicating our desired constraints and objectives, we can proceed to use the function optimize.portfolio to calculate the optimal weights that minimizes variance. optimize.portfolio uses different types of solvers to calculate the optimal weight, and since minimizing variance is a quadratic optimization problem, we use the quadprog solver from the ROI.plugin.quadprog. We can instead indicate ROI which automatically chooses between glpk or quadprog solvers based on whether it is a linear or quadratic optimization problem.

port_MV <- optimize.portfolio(returns, 
                              portspec1,
                              optimize_method = "quadprog",
                              # Indicating "trace = T" allows us to use 
                              # additional information in the later parts
                              trace = T)

port_MV
## ***********************************
## PortfolioAnalytics Optimization
## ***********************************
## 
## Call:
## optimize.portfolio(R = returns, portfolio = portspec1, optimize_method = "quadprog", 
##     trace = T)
## 
## Optimal Weights:
##    JNJ     PG   AAPL    TSM   MSFT   NVDA 
## 0.3500 0.3500 0.0773 0.1426 0.0800 0.0000 
## 
## Objective Measure:
##   StdDev 
## 0.009587

5.2 Maximizing Return

This time, I am optimizing for the other extreme case, which maximizes returns of the portfolio. The steps are similar to those used in Section 5.1, except for a change in the objective and the type of solver used. Since maximizing return is a linear optimization problem, we can use the glpk solver, or simply indicate ROI.

portspec2 <- portfolio.spec(colnames(returns))

# We use the same constraints as in Case 1
portspec2 <- add.constraint(portspec2, 
                            type = "weight_sum", 
                            min_sum = 1, max_sum = 1)

portspec2 <- add.constraint(portspec2, 
                            type="box", 
                            min=0, max=0.35)

# Objective to maximize return based on mean return
portspec2 <- add.objective(portspec2,
                           type = "return",
                           name = "mean")
port_MR <- optimize.portfolio(returns, 
                              portspec2,
                              optimize_method = "glpk",
                              trace = T)

port_MR
## ***********************************
## PortfolioAnalytics Optimization
## ***********************************
## 
## Call:
## optimize.portfolio(R = returns, portfolio = portspec2, optimize_method = "glpk", 
##     trace = T)
## 
## Optimal Weights:
##  JNJ   PG AAPL  TSM MSFT NVDA 
## 0.00 0.00 0.35 0.00 0.30 0.35 
## 
## Objective Measure:
##     mean 
## 0.001352

5.3 Comparing Portfolios and Performance Against Market Benchmark

To compare the portfolios, we need to extract the optimal weights that had been calculated in the previous steps and calculate the portfolio returns. I have created a bar chart to show how the optimal weights differ in each portfolio.

weight_MV <- extractWeights(port_MV)

weight_MR <- extractWeights(port_MR)

par(mfrow = c(2,1), mar = c(2, 4, 2, 2))

barplot(weight_MV, 
        main = "Weights in Different Portfolios",
        ylab = "Portfolio MV")

barplot(weight_MR,
        ylab = "Portfolio MR")

With the weights, we can calculate the daily returns of the different portfolio using the Return.portfolio function. I have included the re-balancing periodicity, but you may choose to leave it out or change to months or years depending on your preference.

rp_MV <- Return.portfolio(returns, 
                          weights = weight_MV,
                          rebalance_on = "quarters",
                          geometric = T) %>%
  `colnames<-`("Portfolio MV")

rp_MR <- Return.portfolio(returns, 
                          weights = weight_MR,
                          rebalance_on = "quarters",
                          geometric = T) %>%
  `colnames<-`("Portfolio MR")

With the portfolio returns, we can compare the how the portfolios performed against the market benchmark. I have included a chart of cumulative returns and some statistics and metrics as I had done in Section 4.

comparison <- cbind(rp_MV, rp_MR, benchmarkReturn)

charts.PerformanceSummary(comparison,
                          main = "Comparing Performance of Portfolios",
                          legend.loc = "topleft")

table.Stats(comparison)
table.AnnualizedReturns(comparison, 0.03/252, scale = 252)

6 Back-Testing Optimization Strategy

6.1 Create Mean-Variance Specification

Let us create a new portfolio specification for back-testing that follows the mean-variance portfolio theory.

portspec3 <- portfolio.spec(colnames(returns))

# Slight change in weight_sum to reduce restrictiveness while optimizing
portspec3 <- add.constraint(portspec3, 
                            type = "weight_sum",
                            min_sum = 0.99, max_sum = 1.01)

portspec3 <- add.constraint(portspec3, 
                            type="box", 
                            min = 0, max = 0.4)

# Add a constraint to have a target return
portspec3 <- add.constraint(portspec3,
                            type = "return",
                            return_target = 0.15/252)

# Adding a return objective, thus we maximize mean return per unit of risk
portspec3 <- add.objective(portspec3,
                           type = "return",
                           name = "mean")

portspec3 <- add.objective(portspec3,
                           type = "risk",
                           name = "var")

6.2 Optimize Portfolio

The PortfolioAnalytics package provides a handy function called optimize.portfolio.rebalancing which can provide back-testing capabilities. It is similar to the function optimize.portfolio but has requires other inputs. To do back-testing, we need to input the training period for when the portfolio will be optimized, and the rolling window which rolls over historical data. For this part, I will be using the random portfolios solver and in order to do so, I will first generate a matrix of random portfolios. This will then be fed to the optimization function to prevent recalculation. A larger number of random portfolios tend to lead to better optimization results but will also require more time for calculation.

set.seed(123)

randport <- random_portfolios(portfolio = portspec3, 
                              permutations = 20000, 
                              rp_method = "sample", 
                              eliminate = T)

port_MeanVar_qRebal <- optimize.portfolio.rebalancing(returns,
                                                      portspec3,
                                                      optimize_method = "random",
                                                      rebalance_on = "quarters",
                                                      rp = randport,
                                                      maxSR = T, # maximize Sharpe Ratio
                                                      search_size = 5000,
                                                      training_period = 504,
                                                      rolling_window = 45)

I have generated the randport first so that the optimize.portfolio.rebalancing function does not recalculate these portfolios. Since our data is in daily periods, I have used 504 days as the optimization period for the portfolio for which weights are calculated, and this historical data rolls forward by 45 days each time. We can take a look at how the weights changed over time by using the chart.Weights function.

chart.Weights(port_MeanVar_qRebal)

6.3 Portfolio Returns and Performance

Using the function Return.portfolio, we calculate the returns of the portfolio. Since the optimization included quarterly re-balancing, there is no need to indicate it in this function.

w_qOptim <- extractWeights(port_MeanVar_qRebal)

rp_MeanVar <- Return.portfolio(returns, 
                               weights = w_qOptim, 
                               geometric = T)

Lastly, we can visualize the performance of this portfolio by plotting its cumulative returns and compare it against the market benchmark.

evaluation <- cbind(rp_MeanVar,benchmarkReturn)

charts.PerformanceSummary(evaluation, Rf=0.03/252, main = "Performance of Mean-Variance Portfolio")

table.AnnualizedReturns(evaluation, Rf=0.03/252)

7 Final Remarks

This project serves to apply what I have learnt in R to portfolio optimization. It documents the functions and packages that are used and how to create interesting visualizations of portfolio performance. While optimization and back-testing is a useful and important concept, it is critical to note that past performance of stocks should not be used to indicate future performance. It only serves to test how certain investment strategies performed based on historical data and whether it would suit an investor’s preference for risk and return.