\documentclass[12pt]{article}
% \usepackage{abstract} % Allows abstract customization
\usepackage{microtype}
\usepackage{fancyhdr} % Headers and footers
%% \pagestyle{fancy} % All pages have headers and footers
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{parskip}
\usepackage{booktabs}
\usepackage{float}
\usepackage{framed}
\usepackage{hyperref}
\usepackage{amssymb,amsmath}

% \renewcommand{\abstractnamefont}{\normalfont\bfseries} % Set the "Abstract" text to bold
% \renewcommand{\abstracttextfont}{\normalfont\small\itshape} % Set the abstract itself to small italic text


% Expectation symbol
\DeclareMathOperator*{\E}{\mathbb{E}}


\title{\vspace{-15mm}\fontsize{16pt}{16pt}\selectfont\textbf{Constructing an ETF Portfolio}}
%% \author{Ian L. Kaplan}
\author{
\large
\textsc{Ian L. Kaplan}
%% \thanks{}
\\[2mm]
%% \normalsize University of Washington \\ % Your institution
\normalsize{iank@bearcave.com} \\[1mm]% Your email address
\normalsize{http://www.bearcave.com}
\vspace{-5mm}
}
\date{December 2, 2014}
\begin{document}

\maketitle

\graphicspath{ {../diagrams/} }
\bibliographystyle{plain}
\raggedright

\begin{abstract} 

These notes discusses the construction of investment portfolios
consisting of Exchange Traded Funds, ETFs. These portfolios are
constructed using quantitative portfolio techniques. This document is
somewhere between an academic paper and a lab notebook. As a result,
this paper is written in a more informal style. As a record of
experimentation, these notes are a "warts and all" account. They cover
the process of developing the portfolio, including back-testing
mistakes and the experiments that didn't work out.

Nothing in these notes should be viewed as investment advice.
\end{abstract}

\newpage

<<echo=FALSE, message=FALSE>>=
library(knitr)
library(tseries)
library(timeDate)
library(timeSeries)
library(zoo)
library(xts)
library(its)
library(data.table)
library(PerformanceAnalytics)
library(quadprog)
library(quantmod)
library(TTR)
library(corrplot)
library(xtable)
library(SiZer)
library(tawny)
library(tawny.types)
library(stringr)

@

<<echo=FALSE, message=FALSE>>=

roundWeights = function(wtsRaw) {
  wVec = round(round(wtsRaw * 1000, -1)/1000, 2)
  return(wVec)
}

calcRet = function( v ) {
  n = length(v)
  val = as.numeric(v)
  R = (val[2:n]/val[1:(n-1)]) - 1
  if (class(v) == "zoo") {
    R = zoo(R)
    index(R) = index(v)[2:n]
  } else {
    names(R) = names(v)[2:n]
  }
  return(R)
}

#
# Find the nearest market date (moving backward in time)
#
findMarketDate = function( date )
{
  while(! isBizday(x = as.timeDate(date), holidays=holidayNYSE(as.numeric(format(date, "%Y"))))) {
    date = date - 1
  }
  return(date)
}


fixNAHoles = function(vec.z)
{
  if (sum(is.na(vec.z)) > 0) {
    zooIx = index(vec.z)
    v = coredata(vec.z)
    v_approx = na.approx(v, maxgap=8, na.rm=FALSE)
    v_fill = na.fill(v_approx, fill="extend")
    newData.z = zoo(v_fill)
    index(newData.z) = zooIx
  } else {
    newData.z = vec.z
  }
  return(newData.z)
}


#
getCloseData = function(symbols, startDate, endDate, period="w")
{
  closeData.z = c()
  firstTime = TRUE
  minDate = c()
  maxDate = c()
  fetchedSyms = c()
  for (i in 1:length(symbols)) {
    sym = symbols[i]
    symClose.z = NULL
    timeOut = 1
    startDate.ch = as.character( findMarketDate(as.Date(startDate)))
    endDate.ch = as.character( findMarketDate(as.Date(endDate)))
    while ((timeOut < 7) && is.null(symClose.z)) {
      try(
        (symClose.z = get.hist.quote(instrument=sym, start=startDate.ch, end=endDate.ch, quote="AdjClose",
                                     provider="yahoo", compression=period, retclass="zoo", quiet=T)),
        silent = TRUE)
      endDate.ch = as.character( findMarketDate( (as.Date(endDate) - 1)))
      timeOut = timeOut + 1
    }
    if (! is.null(symClose.z)) {
      fetchedSyms = c(fetchedSyms, sym)
      dateIx = index(symClose.z)
      if (firstTime) {
        closeData.z = symClose.z
        firstTime = FALSE
        minDate = min(dateIx)
        maxDate = max(dateIx)
      } else {
        minDate = max(minDate, min(dateIx))
        maxDate = min(maxDate, max(dateIx))
        matIx = index(closeData.z)
        repeat {
          startIx = which(matIx == minDate)
          if (length(startIx) > 0 && startIx > 0) {
            break()
          } else {
            minDate = minDate + 1
          }
        }
        endIx = which(matIx == maxDate)
        matIxAdj = matIx[startIx:endIx]
        closeData.z = cbind(closeData.z[matIxAdj,], symClose.z[matIxAdj])
      }
    }
  }
  if (length(closeData.z) > 0) {
     colnames(closeData.z) = fetchedSyms
  }
  return( closeData.z )
}


countNA = function( data.z )
{
  counts = apply(data.z, 2, FUN=function(v) {
                                  cntNA = sum(is.na(v))
                                  return(cntNA)
                                })
  names(counts) = colnames(data.z)
  return(counts)
}

#
# Take a risk free rate time series, which is reported on Wednesday and set the dates to the 
# corresponding dates in the ETF return data frame.
#
adjustRfDates = function(rf.z, retData.z)
{
  rfDates = index(rf.z)
}


meanPredictors = function(close.v, winSize, ahead, emaWin )
{
  end = seq(from=1, to=(length(close.v)-(winSize + ahead)), by=ahead)
  xvals = seq(from=1, to=winSize, by=1)
  result.df = data.frame()
  for (i in end) {
    start = i + (winSize-1)
    buyIx = (i + winSize)
    sellIx = buyIx + ahead
    closeBlock = close.v[i:buyIx]
    ret = calcRet(closeBlock)
    buy = close.v[buyIx]
    sell = close.v[sellIx]
    actual = log(coredata(sell)) - log(coredata(buy))
    mu = mean(ret)
    ema = EMA(ret, n=emaWin)[winSize]
    l = lm(coredata(ret) ~ xvals)
    lmavg = l$coef[1] + l$coef[2] * (winSize + (ahead-1))
    row.df = data.frame(mu, ema, lmavg, actual)
    result.df = rbind(result.df, row.df)
  }
  colnames(result.df) = c("mean", "EMA", "ols", "actual")
  return(result.df)
}


lmPredictor = function( close.v, winSize, ahead, emaWin )
{
  end = seq(from=1, to=(length(close.v)-(winSize + ahead)), by=ahead)
  result.df = data.frame()
  for (i in end) {
    start = i + (winSize-1)
    buyIx = (i + winSize)
    sellIx = buyIx + ahead
    closeBlock = close.v[i:buyIx]
    lmClose = segmentedLM( closeBlock )
    lmRet = calcRet(lmClose)
    ret = calcRet(closeBlock)
    buy = close.v[buyIx]
    sell = close.v[sellIx]
    actual = log(coredata(sell)) - log(coredata(buy))
    mu = mean(ret)
    t = EMA(ret, n=emaWin)
    ema = t[length(t)]
    lmPred = mean(lmRet)
    row.df = data.frame(mu, ema, lmPred, actual)
    result.df = rbind(result.df, row.df)
  }
  colnames(result.df) = c("mean", "EMA", "segLM", "actual")
  return(result.df)
}


#
# Build a correlation table for the predictor functions vs. the "ahead" period returns.
#
predictorCorr = function(closeData.df, winSize = 52, ahead = 12, emaWin = 12, FUN)
{
  colNames = colnames(closeData.z)
  corRslt = matrix(0, nrow=3, ncol=length(colNames))
  colnames(corRslt) = colNames
  for (i in 1:length(colNames)) {
    clos.v = closeData.df[,colNames[i]]
    pred = FUN(clos.v, winSize, ahead, emaWin)
    corRow = cor(pred)[1:3,"actual"]
    corRslt[,colNames[i]] = corRow
  }
  rownames(corRslt) = names(corRow)
  return(corRslt)
}


#
# retVec.z : a vector of returns
# alphaFunc : the function to use for forecasting the future return (mean, EMA or ols)
# ahead : number of steps ahead for the linear model prediction
# emaWin the EMA window
#
# The function returns the forecasted return
#
forecastRet = function(retVec.z, Func, mult, emaWin = 12)
{
  forecast = NULL
  retVec = coredata(retVec.z)
  if (Func == "mean") {
    forecast = mean(retVec)
  } else if (Func == "EMA") {
    ema = EMA(retVec, n=emaWin)
    forecast = ema[length(ema)]
  } else if (Func == "ols") {
    xvals = seq(from=1, to=length(retVec), by=1)
    l = lm(retVec ~ xvals)
    forecast = l$coef[1] + l$coef[2] * (length(retVec) + 1)
  } else {
    cat(paste("forecastRet: unrecognized forecast function:", Func))
  }
  forecastAdj = forecast * mult
  return( forecastAdj )
}


forecastRetVec = function(retMat.z, Func, multiplier, emaWin = 12)
{
  numCols = ncol(retMat.z)
  r_p = rep(0, numCols)
  for (i in 1:numCols) {
    ret.v = as.vector(retMat.z[,i])
    func = Func[i]
    mult = multiplier[i]
    r_p[i] = forecastRet(ret.v, func, mult, emaWin)
  }  
  names(r_p) = colnames(retMat.z)
  return(r_p)
}


#
# retTarget - the target return
# r_p - a vector of forecasted returns
# retMat.z - an n x m matrix (52 x 7, for seven assets and 52 weeks)
#
# The function returns the portfolio weights or null
#
portfolioWeights = function(retTarget, r_p, S.hat )
{
  numCols = ncol(S.hat)
  D = 2 * S.hat
  one = rep(1, numCols)
  diagOne = diag(numCols)
  A = cbind(one, r_p, diagOne)
  zero = rep(0, numCols)
  b0 = c(1, retTarget, zero)
  d = zero
  w = NULL
  try(
  {
    solveRslt = solve.QP(Dmat = D, dvec=d, Amat=A, bvec=b0, meq=2)
    w = round(solveRslt$solution, 6)
    names(w) = colnames(S.hat)
  }, silent = TRUE)
  return(w)
}


weightsPortfolio.optim = function(retTarget, r_p, S.hat )
{
  r_p.m = matrix(r_p, nrow=1)
  numAssets = ncol(r_p.m)
  reshigh = rep(1, numAssets)
  reslow = rep(0, numAssets)
  port.sol = NULL
  try(
    (port.sol = portfolio.optim(x = r_p.m, pm = retTarget, covmat = S.hat, shorts=FALSE,
                               reslow = reslow, reshigh = reshigh )),
    silent=TRUE
  )
  w = NULL
  if (! is.null(port.sol)) {
    w = port.sol$pw
    names(w) = names(r_p)
  }
  return(w)
}


calcLongOnlyWeights = function(retMat.z, Funcs, multiplier, adj = 0.05 )
{
  r_p = forecastRetVec(retMat.z, Funcs, multiplier )
  targetRet = max(r_p)
  targetRetAdj = targetRet - (targetRet * adj)
  t_port = TawnyPortfolio(retMat.z, nrow(retMat.z))
  S.hat = cov_shrink(t_port)
  wRaw = portfolioWeights(targetRetAdj, r_p, S.hat )
  # wVec = round(wRaw * 10000, -1)/10000
  wVec = round(wRaw, 4)
  stdDev = sqrt(t(wVec) %*% S.hat %*% wVec)
  wRow = c(wVec, targetRetAdj, stdDev)
  names(wRow) = c(names(wVec), "targRet", "stdDev")
    return(wRow)
}


#
# Calculate portfolio weights over a backtest period
#
calcLongOnlyPort = function(retData.z, Funcs, multiplier, winSize = 52, ahead = 12, adj = 0.05 )
{
  qtrSeq = rev(seq(from=(nrow(retData.z) - ahead), to=1, by=-ahead))
  endSeq = qtrSeq[ qtrSeq > winSize]
  startSeq = endSeq - 51
  wMat = c()
  allDates = rownames(retData.z)
  dates = c()
  for (j in 1:length(startSeq)) {
    start = startSeq[j]
    end = endSeq[j]
    retMat.z = retData.z[start:end,]
    wRow = calcLongOnlyWeights(retMat.z, Funcs, multiplier, adj=adj )
    wMat = rbind(wMat, wRow)
    d = allDates[end]
    dates = c(dates, d)
  }
  rownames(wMat) = dates
  return(wMat)
}

#
# Calculate the backtest weights and the actual portfolio return.
#
calcLongPortReturn = function(retData.z, closeData.z, Funcs, multiplier, winSize = 52, ahead = 12 )
{
  wMat = c()
  allDates = rownames(retData.z)
  dates = c()
  closeDataAdj.z = closeData.z[,colnames(retData.z)]
  startSeq = seq(from=1, to=(nrow(retData.z) - (winSize + ahead)), by=ahead)
  for (j in 1:length(startSeq)) {
    start = startSeq[j]
    end = start + (winSize-1)
    retMat.z = retData.z[start:end,]
    r_p = forecastRetVec(retMat.z, Funcs, multiplier )
    targetRet = max(r_p)
    targetRetAdj = targetRet - (targetRet * 0.05)
    S.hat = cov(retMat.z)
    wRaw = portfolioWeights(targetRetAdj, r_p, S.hat )
    # wVec = round(wRaw * 10000, -1)/10000
    wVec = round(wRaw, 4)
    stdDev = sqrt(12 * (t(wVec) %*% S.hat %*% wVec))
    buy = end + 1
    sell = end + ahead
    qtrRet = log(coredata(closeDataAdj.z[sell,])) - log(coredata(closeDataAdj.z[buy,]))
    portRet = t(wVec) %*% t(qtrRet)
    wRow = c(wVec, portRet, stdDev)
    names(wRow) = c(names(wVec), "QtrRet", "QtrStdDev")
    wMat = rbind(wMat, wRow)
    d = allDates[end + (ahead-1)]
    dates = c(dates, d)
  }
  rownames(wMat) = dates
  return(wMat)
}

#
# Given a set of weights, calculate a portfolio return vector the close prices
#
# The weights are calculated on the holding period boundaries. For example, if there
# is a quarterly holding period, the weights are calculated at the start of the quarter.
# Assets are bought at that point and then sold when the next set of weights are calculated,
# yielding a return.
calcPortReturn = function(weights.m, closeData.z, ahead )
{
  wtDates = as.Date(rownames(weights.m))
  closeDates = index(closeData.z)
  ix = which(closeDates %in% wtDates)
  closeEnd = length(closeDates)
  weightsAdj.m = weights.m
  if ((max(ix) + 12) <= closeEnd ) {
    ix = c(ix, (max(ix) + 12))
  } else {
    weightsAdj.m = weights.m[-nrow(weights.m),]
  }
  qtrClose.z = closeData.z[ix,]
  ret.z = apply(qtrClose.z, 2, FUN=function(v) {calcRet(v)})
  portRet = rep(0, nrow(weightsAdj.m))
  names(portRet) = rownames(ret.z)
  for (i in 1:nrow(weightsAdj.m)) {
    w = weightsAdj.m[i,]
    r = ret.z[i,]
    r_p = t(w) %*% r
    portRet[i] = r_p
  }
  rslt.l = list(portRet = portRet, closeDates = index(qtrClose.z))
  return(rslt.l)
}


segmentedLM = function(data.z) {
  data = as.numeric(data.z)
  x = seq(from=1, to=length(data.z))
  plm = piecewise.linear(x = x, y = data )
  seg.z = zoo(plm$model$fitted.values)
  index(seg.z) = index(data.z)
  return(seg.z)
}


gainLossRatio = function( retVec ) {
  n = length(retVec)
  bl_ratio = (sum(pmax(retVec, 0))/n) / (sum(pmax(0 - retVec, 0))/n)
  return(bl_ratio)
}

# 
# This is a modified version of the Sharpe Ratio, where the benchmark is not subtracted from 
# the asset/portfolio return. Subtracting the risk-free rate in an era of low interst rates would
# not make much of a different. Subtracting a benchmark like the S\&P 500 would give bond funds a
# negative Sharpe ratio which would cause problems in portfolio optimization.
# 
sharpeRatio = function( retVec) {
  sharpe = mean(retVec) / sd(retVec)
  return(sharpe)
}


#
# Filter out ETFs that have a correlation of 0.9 or more with other ETFs.
# By filtering out assets with high correlation, assets in the same sector
# will tend to be filtered out.
#
correlationFilter = function(glNames, retBlock, n) {
  filtBlock = retBlock[,which(colnames(retBlock) %in% glNames)]
  corBlock = cor(filtBlock)
  filtNames = glNames
  ix = 1
  while (ix < length(filtNames)) {
    curName = filtNames[ix]
    end = length(filtNames)
    others = filtNames[(ix+1):end]
    corLine = corBlock[curName, others]
    remove = corLine[corLine >= 0.90]
    if (length(remove) > 0) {
      removeNames = names(remove)
      removeIxRaw = which(filtNames %in% removeNames)
      removeIx = removeIxRaw[ removeIxRaw > ix ]
      if (length(removeIx) > 0) {
         filtNames = filtNames[-removeIx]
      }
    }
    ix = ix + 1
  }
  end = min(n, length(filtNames))
  rsltNames = filtNames[1:end]
  return(rsltNames)
}


#
# Filter assets on the basis of their gain-loss ratio and on the basis of correlation.
#
filterAssets = function( retBlock, func, n = 40 ) {
  lessCorNames = NULL
  if (n > 0) {
    assetVals = apply( coredata(retBlock), 2, FUN=func)
    names(assetVals) = colnames(retBlock)
    assetValsOrd = sort(assetVals, decreasing=TRUE) # larger is assumed to be better
    assetValsNames = names(assetValsOrd)
    lessCorNames = correlationFilter(assetValsNames, retBlock, n)
  }
  return( lessCorNames )
}


calcWeights = function( position ) {
  shares = as.numeric(as.vector(position[,"numShares"]))
  names(shares) = as.vector(position[,"sym"])
  price = as.numeric(as.vector(position[,"fillPrice"]))
  names(price) = as.vector(position[,"sym"])
  assets = sort(unique(as.vector(position[,"sym"])))
  dollarPos = shares * price
  totalPos = rep(0, length(assets))
  names(totalPos) = assets
  for (i in 1:length(assets)) {
    ix = which(names(dollarPos) %in% assets[i])
    totalPos[assets[i]] = sum(dollarPos[ix])
  }
  weights = totalPos/sum(totalPos)
  weightsRnd = round(weights, 4)
  return(weightsRnd)
}


#
# For a single period, calculate the portfolio weights for that period.
#
# func: the function to provide the values for filtering. For example, gainLossRatio or
# shareRatio
#
calcPortfolioWeights = function(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  selectAssets = filterAssets(retBlock, filterFunc )
  if (length(selectAssets) > 1) {
    assetRet.z = zoo(retBlock[,selectAssets])
    t_port = TawnyPortfolio(assetRet.z, nrow(assetRet.z))
    S.hat = cov_shrink(t_port)
    alpha = apply(assetRet.z, 2, FUN=alphaFunc, win=emaWin)
    targetRet = max(alpha)
    targetRetAdj = targetRet - (targetRet * adj)
    wRaw = portfolioFunc(targetRetAdj, alpha, S.hat)
    # wVec = round(wRaw * 10000, -1)/10000
    wVec = roundWeights(wRaw)
    wVecTrunc = wVec[ wVec != 0]
  } else {
    wVecTrunc = 1
    names(wVecTrunc) = selectAssets
  }
  return(wVecTrunc)
}

#
# Calculate a set of portfolio returns.
#
# etfRet: a matrix of N x M returns 
#
# The function returns a list consisting of a data and the protfolio weights calculated for
# that date.
#
# func: the filter function for assets.
#
backtestPortfolio = function(etfRet, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  dataDates = rownames(etfRet)
  numWeeks = length(dataDates)
  endIx = rev( seq(from=numWeeks, to=53, by=-12))
  startIx = endIx - 51
  weight.l = list()
  for (i in 1:length(startIx)) {
    retBlock = etfRet[startIx[i]:endIx[i],]
    blockDates = rownames(retBlock)
    startDate = blockDates[1]
    endDate = blockDates[length(blockDates)]
    w = calcPortfolioWeights(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin, adj)
    weight.l[[i]] = list(date = endDate, weights = w)
  }
  return(weight.l)
}


backtestEqWtPortfolio = function(etfRet, func ) {
  dataDates = rownames(etfRet)
  numWeeks = length(dataDates)
  endIx = rev( seq(from=numWeeks, to=53, by=-12))
  startIx = endIx - 51
  weight.l = list()
  for (i in 1:length(startIx)) {
    retBlock = etfRet[startIx[i]:endIx[i],]
    blockDates = rownames(retBlock)
    startDate = blockDates[1]
    endDate = blockDates[length(blockDates)]
    selectAssets = filterAssets(retBlock, func, n=20)
    w = rep(1/length(selectAssets), length(selectAssets))
    names(w) = selectAssets
    weight.l[[i]] = list(date = endDate, weights = w)
  }
  return(weight.l)
}


#
# the port.l has a set of portfolios (ETF symbols and weights). The associated date is the
# date that the portfolio will be bought and the date when the previous portfolio will be sold.
#
calcBacktestRet = function(port.l, etfClose) {
  buyDates = as.Date(sapply(port.l, FUN=function(e) {e$date}))
  closeDates = as.Date(as.vector(rownames(etfClose)))
  closeIx = which(closeDates %in% buyDates)
  port.lAdj = port.l
  if ((max(closeIx) + 12) <= length(closeDates) ) {
    closeIx = c(closeIx, (max(closeIx) + 12))
  } else {
    port.lAdj = port.l[-length(port.l)]
  }
  qtrClose = etfClose[closeIx,]
  qtrRet = apply(qtrClose, 2, FUN=function(v) {calcRet(v)})
  qtrDates = rownames(qtrRet)
  portRet = rep(0, length(port.lAdj))
  names(portRet) = qtrDates
  for (i in 1:length(port.lAdj)) {
    elem = port.lAdj[[i]]
    wts = elem$weights
    investDate = elem$date
    syms = names(wts)
    sellDate = qtrDates[i]
    r = qtrRet[i,syms]
    r_p = r %*% wts
    portRet[i] = r_p
  }
  return(portRet)
}


getBenchmark = function(benchSym, port.l, etfClose) {
  buyDates = as.Date(sapply(port.l, FUN=function(e) {e$date}))
  closeDates = as.Date(as.vector(rownames(etfClose)))
  closeIx = which(closeDates %in% buyDates)
  if ((max(closeIx) + 12) <= length(closeDates) ) {
    closeIx = c(closeIx, (max(closeIx) + 12))
  }
  qtrDates = closeDates[closeIx]
  startDate = qtrDates[1]
  endDate = qtrDates[ length(qtrDates) ]
  benchCloseData = getCloseData(benchSym, startDate, endDate)
  benchQtr = NULL
  if ((! is.null(benchCloseData)) && length(benchCloseData) > 0) {
    benchCloseDates = as.Date(index(benchCloseData))
    benchIx = which(benchCloseDates %in% qtrDates)
    benchQtr = benchCloseData[benchIx]
  }
  return(benchQtr)
}


emaMean = function(v, win=12) {
  ema = EMA(v, n = win)
  est = ema[ length(ema)]
  return(est)
}

# The win argument is ingored. This argument allows the meanMean and emaMean functions to have
# the same function "signature".
meanMean = function( assetRet, win ) {
  est = mean(assetRet)
  return(est)
}

#
# Filter out weights that are less than 1% (0.01) and evenly distribute the weight to 
# the other assets.
#
weightFilter = function( wts ) {
  newWts = wts
  ix = which( wts < 0.01)
  if (length(ix) > 0) {
    newWts = wts[-ix]
    bits = 1 - sum(newWts)
    newWts = round((newWts + (bits/length(newWts))), 3)
  }
  return(newWts)
}


calcWeeklyReturn = function(blockRet, wts) {
  retFilt = blockRet[, which(colnames(blockRet) %in% names(wts))]
  if (ncol(retFilt) != length(wts)) {
    cat("calcWeeklyReturn: there are missing stocks")
    ix = which(names(wts) %in% colnames(blockRet))
    cat("missing: ", names(wts[-ix]))
  }
  retSeries = as.vector( retFilt %*% wts )
  names(retSeries) = rownames(retFilt)
  return( retSeries )
}


#
# Round portfolio positions by removing the positions below the threshold and
# proportionally redistributing the weight to other assets that have weights
# that are above the threshold.
#
roundPortfolio = function( wts, thresh) {
  newWt = wts
  ix = which(wts <= thresh)
  if (length(ix) > 0) {
    gtWt = wts
    gtWt[ ix ] = 0
    ltWt = sum(wts[ix])
    sumGtWt = sum(gtWt)
    frac = gtWt / sumGtWt
    fracInc = frac * ltWt
    newWtRaw = gtWt + fracInc
    newWt = newWtRaw[ -which(newWtRaw == 0)]
    newWt = roundWeights( newWt )
  }
  return( newWt)
}


@


<<echo=FALSE>>=
# set the 
dataFileRoot = "/home/iank/Documents/etf_portfolios/portfolio_2014"

# ETF symbols plus explaination
etfPort = c("SHY", "Core short-term US Bond", "Fixed Income", "2002-07-22",
            "AGG", "Core Total US Bond Market",  "Fixed Income", "2003-09-22",
            "HYG", "High Yield Corporate Bond ETF", "Fixed Income", "2007-04-04",
            "SPY", "S&P 500", "Equity", "2000-07-24",
            "XLU", "Utilities Select Sector", "Equity", "1998-12-16",
            "XPH", "S&P Pharmaceuticals", "Equity", "2006-06-19", 
            "XSD", "S&P Semiconductor", "Equity", "2006-01-31",
            "XRT", "S&P Retail", "Equity", "2006-06-19",
            "XHB", "S&P Homebuilders", "Equity", "2006-01-31",
            "XOP", "Oil and Gas", "Equity", "2006-06-19")

etfPort.m = matrix(etfPort, ncol=4, byrow=T)
colnames(etfPort.m) = c("sym", "desc", "type", "start")
@

\section*{Introduction}

These notes discuss the construction of an investment portfolios constructed from Exchange Traded Funds (ETF). 

Why ETFs and not other assets?  

ETFs are constructed with a wide
variety of market exposures. For example, there are ETFs that attempt to
replicate indexes like the S\&P 500 and the S\&P Mid-cap 400 indexes.
There are ETFs that provide negative market exposure and
leveraged long exposure. There are also specialized ETFs in a wide
variety of sectors (software, semiconductors, market indexes, precious
metals). 

The number of ETFs has exploded since they were first introduced in
1993. One drawback of ETFs is that many of them, especially the
newer exotic ETFs, have a short history (sometimes only a year or
less).

Unlike index mutual funds, ETFs trade
like stocks. A portfolio constructed from a few ETFs have some
level of diversification, since an ETF itself is composed on a portfolio of
assets. ETFs also have tax advantages compared to mutual funds or
index funds (unlike mutual funds, trading within an ETF does not incur tax liability).

\subsection*{Past is Prologue}

The approach used here in constructing investment portfolios
assumes that the past has some influence on the future. This includes
the assumption that there is serial dependence in asset
returns and that assets that have increased in value in the recent past have a
higher likelihood of doing so in the next quarter.

Empirically, the returns in the universe of ETF assets are not
normally distributed. This is especially true for the back-test period
used here, which includes the 2008 crash where a number of assets have
(negative) returns that would be far in the tail of a normal
distribution.

\subsection*{Challenges with ETFs}

\subsubsection*{Liquidity}

Some ETFs are thinly capitalized with only a few million dollars worth of net assets. These ETFs also tend to be thinly traded. For example, the net assets of the LSTK ETF (iPath Pure Beta Livestock ETN) were only 2.09 million in November 2014 and the three month average volume is only 788 shares. The LSTK ETF had a year-to-date return in November of 2014 of 26.88 percent. However, the low liquidity and the potential market impact in trading a low liquidity ETF like LSTK makes this a riskier investment than the return statistics suggest. The etf.com screen used to select the current ETF universe filters out ETFs like LSTK.

\subsubsection*{Exotic Market Behavior}

ETFs package a wide variety of market exposure. This includes currency ETFs like YCS (ProShares UltraShort Yen). 

For the three months ending on November 28, 2014, YCS had a 29.4 percent return. In the same three month period ending on November 28, 2014 the exchange rate between the dollar and the Japanese yen went from about 104 yen/dollar to about 118 yen/dollar, causing YCS to increase proportionally in value. The yen/dollar exchange rate was at a ten year high (for the dollar and a low for the yen) of about 120 yen/dollar.

For the appreciation in YCS to continue, the exchange rate would have to rise (relative to the dollar) to levels that have not been seen since 1998. The 1998 exchange rate was also accompanied by extreme volatility In 1998 the yen/dollar exchange rate fell about 34 percent in two months, from July to August. These factors make YCS an inappropriate choice for the type of portfolio that is being constructed in this paper.

\section*{Holding Period}

A quarterly holding period is used for the portfolios. This time
period allows the portfolio to be adjusted for new market conditions. 

To provide sufficient data for back-testing, weekly return
data is used. Weekly returns do not have exactly the same distribution
as monthly or quarterly returns, so this is an imperfect solution.

\section*{Calculating Returns}

In academic finance, continuously compounded log returns are commonly used:

\begin{displaymath}
  r_t = log( P_t ) - log(P_{t-1})
\end{displaymath}

Calculating returns in this manner is correct if returns have a distribution that is
close to normal. This is not the case for the ETF
returns during the back-test period which includes the market crash of 2008. For example, 
SKF (ProShares UltraShort Financials) dropped from
753.00 to 190.08 over a quarter. 

\begin{displaymath}
r_t = log( P_t ) - log(P_{t-1}) = log(190.08) - log(753.00) = -1.37662
\end{displaymath}

This is a loss greater than the value of the asset, which is not
possible with stocks (or ETFs).

A better way to estimate returns for non-normal data is to use arithmetic returns:

\begin{displaymath}
  R_t = \frac{ P_t }{P_{t-1}} - 1
\end{displaymath}

Using arithmetic returns $R_t = -0.7475697$, which accurately estimates the loss. 

This approach to estimating returns is also a better model for how an
investor behaves, compared to compound returns. An investor buys
the assets at the start of the period and then sells at the end of the
period. 

Arithmetic returns vs. compound returns are discussed in the paper \textit{Linear
vs. Compounded Returns Common Pitfalls in Portfolio Management} by
Attilio Meucci, available on SSRN.

\section*{Back-testing}

Some people emphatically proclaim that they never back-test. They point out that back-test results are never the same as out-of-sample actual results. So why bother?

Perhaps the core reason to back-test is to catch problems in the
construction of the portfolio construction algorithm.
For example, in the gain-loss portfolio below, assets are pre-filtered by the gain-loss ratio. Without back-testing I might not have noticed that assets with similar gain-loss ratios tend to be similar ETFs (for example, Oil or Energy ETFs). To avoid a portfolio heavy in a single asset type (energy) or sector, I added code to filter out assets with high paired correlation.

As the Mutual Fund adds say, past performance is not indication of future performance. But past performance does provide an example of how the algorithm behaves. If the algorithm performs poorly in the back-test it is unlikely to suddenly perform much better with out-of-sample data.

There is a lurking danger in back-testing, however. There is rarely
enough financial data for both back-testing and out-of-sample tests.
This is especially true of ETFs, which are relatively new investment
instruments.  

The back-tests here use a period of about seven years. This means that
there is no out-of-sample data except for the future. 

When back-testing a portfolio construction algorithm, features of the
algorithm are adjusted to yield better performance. In doing this,
future information is applied to the past. If this is done frequently
enough the result will be an algorithm that performs well on past
data, but doesn't necessarily do well on the out-of-sample future
data. This is over-fitting.

A thoughtful discussion of the pitfalls of back-testing can be found
in \textit{Pseudo-Mathematics and Financial Charlatanism: The Effects
of Back-test Over-fitting on Out-of-Sample Performance} by David
H. Bailey, Jonathan M. Borwein, Marcos Lopez de Prado, and Qiji Jim
Zhu available on SSRN. This paper also proposes a way to estimate some
of the inaccuracy introduced by back-testing.

\section*{Portfolio Optimization}

Portfolio optimization involves finding the asset weights that yield
highest portfolio return for a given level of risk. Calculating an optimal
portfolio for a particular level of risk requires an estimate of the
risk and forecasted (future) return.

\subsection*{Risk}

Risk is a retrospective measure that is reported as variance, value-at-risk (VaR) or Conditional Value at Risk (CVaR, also known as Expected Shortfall or Extended Tail Loss). One problem with risk estimates is that they are difficult verify in back testing where the actual "future" is know.

CVaR is, theoretically, a more attractive risk measure since it measures down-side risk, without penalizing upside volatility. A problem with CVaR is that it relies on an estimate of the negative tail of the distribution where, empirically, there are few values. To address this problem a normal distribution is sometimes used. The back-test period includes a market crash that contains extreme values. A normal distribution differs dramatically from this empirical distribution. Another solution is to fit a distribution (perhaps using R's \texttt{density} function) and use the tail of this distribution to estimate the CVaR value.

Each portfolio is calculated from a year of past weekly data. 
With only a year of weekly data the error in CVaR estimate is so high that any advantage it has over using variance ($\sigma^2$) is lost.

\subsection*{Forecasting Future Returns}

In classic (Markowitz) mean-variance portfolio optimization the future return is forecast by the mean return over a historical window. This assumes that the mean is stable (which is only the case over a very long or very short period). 

To forecast future returns, one year of weekly past data (e.g., 52
values) is used to forecast the return twelve weeks ahead. 

The portfolio is bought at the "current week" based on the portfolio estimate using the past
year of weekly data. The portfolio is held for 12 weeks and then sold realizing a
return.

Some functions can be used to forecast future return are:

\begin{itemize}
  \item mean
  \item exponential moving average (EMA) - from R's TRR package
  \item linear mean (e.g., the value predicted by an ordinary least squares line through the past returns).
\end{itemize}

\subsection*{Portfolio Weights}

In the paper, long-only portfolios are constructed. The analytic mean-variance equations cannot be used (since mean-variance may short sell assets). One way to calculate an optimal long-only portfolio is to use R's \texttt{solve.QP} in the \texttt{quadprog} package \footnote{The material in this section is based on Guy Yollen's lecture notes for \textit{Financial Data Modeling and Analysis in R}, University of Washington, 2012}. 

The \texttt{solve.QP} function minimizes:

\begin{displaymath}
  \frac{1}{2} \textbf{b}^T \textbf{D} \textbf{b} - \textbf{b}^T \textbf{d}
  \text{ such that } \textbf{A}^T\textbf{b} \geq \textbf{b}_0
\end{displaymath}

If there are \textit{n} stocks with $T = 52$ time periods 
\begin{itemize}
\item $\textbf{D} = 2\boldsymbol\Sigma = 2 \cdot cov(\textbf{R})$ an $n \times n$ matrix
\item $b = w$ (the portfolio weigths) a vector of length $n$
\item $\textbf{d} = 0$
\item $\textbf{A}$ is a constraint matrix ($n \times m$)
\item $\textbf{b}_0$ is a vector of constraint bounds (length $m$)
\end{itemize}

There are \textit{n} stocks with a forecasted return 
$\textbf{r} = r_1, \dots r_\textit{n}$. The portfolio return is 
$r_p = \textbf{w}^T \textbf{r}$

For a portfolio where

\begin{itemize}
\item weights sum to 1
\item portfolio return target is $r_p$
\item no short sales (e.g., all portfolio weights are greater than 1)
\end{itemize}

\begin{displaymath}
\textbf{A} = \begin{bmatrix}
             1 & r_1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
             1 & r_2 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
             \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
             1 & r_{\textit{n}} & 0 & 0 & 0 & 0 & 0 & 0 & 1 
             \end{bmatrix}
\end{displaymath}

\begin{displaymath}
\textbf{b} = \begin{bmatrix}
             w_1 \\
             w_2 \\
             \vdots \\
             w_{\textit{n}}
             \end{bmatrix}
\end{displaymath}

\begin{displaymath}
\textbf{b}_0 = \begin{bmatrix}
             1 \\
             r_p \\
             0 \\
             \vdots \\
             0
             \end{bmatrix}
\end{displaymath}

The $\textbf{b}_0$ vector contains the portfolio target return, $r_p$. 

For each portfolio, a target return must be chosen. If this return can be realized, the optimizer will choose portfolio weights that realize this return, with the lowest risk (variance).

In order to choose a maximum return that is likely to be realized the equation below is used for the target return:

\begin{displaymath}
  r_{target} = max(\textbf{r}_p) -  max(\textbf{r}_p) \times \textit{frac}
\end{displaymath}

Where $\textbf{r}_p$ is a vector of forecasted returns for the portfolio assets and \textit{frac} is usually between 0.1 and 0.3.

An alternative picking a single target return is to calculate an efficient frontier for the portfolio by constructing portfolios for a range of values. The portfolio with the maximum Sharpe ratio is used in some of the models discussed below.

\begin{framed}
One of the challenges of using \texttt{solve.QP} is getting the arguments right. There is a GNU port of an optimization language that can be used to describe the constraints that is much clearer than the vector definitions used for \texttt{solve.QP}. The R-blogger Systematic Investor has written a post that describes this:

\href{http://systematicinvestor.wordpress.com/2012/03/14/portfolio-optimization-specify-constraints-with-gnu-mathprog-language}{Portfolio optimization constraints with the GNU MathProg language}

\end{framed} 


\section*{A Hand Selected ETF Portfolio}

The first portfolio discussed consists of ETFs that were selected "by
hand". Selecting ETFs for this portfolio was an iterative and time
consuming process. The portfolios discussed later in this report 
use software screening to select a set of ETFs from a large ETF universe.

In an early version of this portfolio, ETFs with negative exposure where
included. For example, ETFs with -1x S\&P 500 exposure and -1x
oil and gas exposure. These negative $\beta$ assets were selected
because these assets are expected to have positive return when the
market went down. The speculation was that ETFs with negative beta
would be similar to a short
position, without the complexity of a short position. 

The portfolios calculated here are all long-only portfolios. Unlike long-short portfolios, a long-only
portfolio does not profit from a
market drop. It was hoped that negative beta ETFs would act like a short position and profit
from market down turns. However, the portfolio
optimizer never chose these assets, even when the market went
down. 

The negative beta ETFs attempt to replicate a negative market position on a 
daily basis. Holding a negative beta S\&P 500 ETF for month is not the same as
holding a short position in the S\&P 500 for a month because the ETF is re-balanced.

The ETFs that were selected by hand for the first portfolio are listed below in Table \ref{etf_table}. 

<<ETFTable, echo=FALSE, results='asis'>>=
tblData = etfPort.m
colnames(tblData) = c("ETF Symbol", "Description", "ETF Type", "Inception Date")
t = xtable(tblData, label="etf_table", caption="ETF Universe")
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=T, caption.placement="bottom")
@

<<echo=FALSE>>=

portfolioEndDate = "2014-10-28"

startDates = as.Date(etfPort.m[,"start"], format="%Y-%m-%d")
# the end of the period, going backward
startDateIx = which.max(startDates)
startDate = as.Date(startDates[startDateIx])
endDate = findMarketDate(as.Date(portfolioEndDate))
@

In a portfolio of ETFs, the back-test time period is set by the ETF with the most recent inception data. In this portfolio this is the \Sexpr{paste(etfPort.m[startDateIx, "sym"], etfPort.m[startDateIx, "desc"] )}. The inception date is \Sexpr{startDate}.

<<echo=FALSE, message=FALSE>>=
symbols = etfPort.m[,"sym"]
etfDataFile = "etf_close_data.csv"
etfFilePath = paste(dataFileRoot, etfDataFile, sep="/")
if (file.exists(etfFilePath)) {
  # cat(paste("Reading close data from file", etfFilePath))
  closeData.df = read.csv(file = etfFilePath)
} else {
  # cat("Reading close data from finance.yahoo.com")
  closeData.z = getCloseData(symbols, startDate, endDate)
  naCounts = countNA( closeData.z )
  dateIx = as.character(as.Date(index(closeData.z)))
  if (! all(naCounts == 0)) {
    cat(paste("NA values in Yahoo data:", naCounts, "\n"))
    cat("Fixing holes in the data")
    closeDataFilled.z = apply(closeData.z, 2, fixNAHoles)
    closeData.df = data.frame(dateIx, closeDataFilled.z)
  } else {
    closeData.df = data.frame(dateIx, closeData.z)
  }
  colnames(closeData.df) = c("date", symbols)
  if (all(! is.na(closeData.df))) {
    # cat(paste("The close data has no NAs. Writing to file", etfFilePath, "\n"))
    write.csv(x = closeData.df, file=etfFilePath, quote=F, row.names=F)
  } else {
    cat("Close data has NA values\n")
  }
}

closeData.z = zoo( closeData.df[,-which(colnames(closeData.df) %in% "date")], as.Date(closeData.df[, "date"]))
retData.z = zoo(apply(closeData.z, 2, FUN=function(v) {calcRet(v)}))
@

The correlation plot for the portfolio ETFs is shown below in Figure \ref{fig:corrplot}

<<corrplot, fig.lp="fig:", echo=FALSE, fig.cap = 'Correlation of Asset Returns', fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
#
# Correlation plot
#
M = round(cor(retData.z), 4)
#  mar = c(bottom, left, top, right)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2))
corrplot(M, type="lower", diag=F, addgrid.col=T, addCoef.col=T, method="square", order="hclust")
par(mfrow=c(1,1))
@

The portfolio $\beta$ between the ETF funds and the S\&P 500
ETF (SPY) is shown below in Table \ref{beta_table}. The $\beta$ is
calculated from weekly returns.

<<assetBeta, echo=FALSE, results='asis'>>=
#
# Calculate the asset beta
#
etfDesc = etfPort.m[,"desc"]
etfSym = etfPort.m[,"sym"]
sp500Ret = retData.z[,"SPY"]
varsp500 = var(sp500Ret)
beta = apply(retData.z, 2, FUN=function(v, rb, varb) {return(cov(v,rb)/varb)}, rb=sp500Ret, varb=varsp500)
beta.df = data.frame(etfSym, etfDesc, round(coredata(beta), 4))
colnames(beta.df) = c("ETF Sym", "ETF Description", "Beta w/S&P 500")
ix = order(beta)
betaOrd.df = beta.df[ix,]
t = xtable(betaOrd.df, label="beta_table", caption="ETF Beta with the SP 500", digits=4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=T, caption.placement="bottom")
@

The weekly return distribution for the selected ETF assets is shown below:

<<boxPlot,  fig.lp="fig:", echo=FALSE, fig.cap="ETF Weekly Return Distribution", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
tmpData.z = retData.z
colnames(tmpData.z) = etfDesc
chart.Boxplot(tmpData.z, xlab="Return", main=paste("ETF Weekly Returns", startDate, "to", endDate),
              colorset=rainbow(n=ncol(tmpData.z)), lwd=4)
par(mfrow=c(1,1))
@

Table \ref{mean_table} shows the weekly mean and standard deviation values (in percent).

<<assetMeans, echo=FALSE, results='asis'>>=
etfMeans = apply(retData.z, 2, mean)
etfSD = apply(retData.z, 2, sd)
etfMeans.df = data.frame(etfSym, etfDesc, (etfMeans * 100), (etfSD * 100))
colnames(etfMeans.df) = c("ETF Sym", "Description", "Weekly Mean (percent)", "StdDev (percent)")
ordIx = order(etfMeans)
t = xtable(etfMeans.df[ordIx,], label="mean_table", caption="Weekly ETF mean and StdDev (percent)", digits=4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=T, caption.placement="bottom")
@


If the HYG high yield corporate bond ETF were dropped from the portfolio, there would be a larger back-test time period available (no high yield bond fund with an earlier inception data has been identified.) This ETF does have some attractive characteristics that make it difficult to discard. In particular it has good return and low volatility compared to the equity ETFs.

\section*{Forecasting Returns}

An important step in building a portfolio is forecasting the future returns that will be used in portfolio optimization. In classic mean-variance optimization the future return is estimated by the mean return.

Using a window of 52-weeks of past weekly returns, tables \ref{mon_cor_func_table} and \ref{qtr_cor_func_table} show the correlation between the value forecasted by the function and the actual monthly and quarterly return.

<<echo=FALSE, results='asis'>>=
monCorRslt = predictorCorr(closeData.df, ahead = 4, FUN=meanPredictors, emaWin=12)
t = xtable(monCorRslt, label="mon_cor_func_table", 
           caption="Correlation of Mean Functions with 4-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=T, caption.placement="bottom")
@

<<echo=FALSE, results='asis'>>=
qtrCorRslt = predictorCorr(closeData.df, ahead = 12, FUN=meanPredictors, emaWin=12)
t = xtable(qtrCorRslt, label="qtr_cor_func_table", 
           caption="Correlation of Mean Functions with 12-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=T, caption.placement="bottom")
@

The correlation between the predicted return and the quarterly (12-week) return is much stronger for all functions than the correlation with the monthly (4-week) return. This suggests that a portfolio that trades quarterly may be a better choice than a portfolio that trades monthly.

Table \ref{func_table} shows the function that produce the highest absolute correlation with quarterly future returns. Some of these correlations are negative correlations.

<<echo=FALSE, results='asis'>>=
funcIx = apply(abs(qtrCorRslt), 2, which.max)
funcs = c("mean", "EMA", "ols")
Funcs = funcs[funcIx]
names(Funcs) = colnames(qtrCorRslt)
Funcs.df = data.frame(Funcs)
shouldBeCor = ifelse(beta > 0, "postive", "negative")
corVals = rep(0, ncol(qtrCorRslt))
names(corVals) = colnames(qtrCorRslt)
for (i in 1:ncol(qtrCorRslt)) {
  rowIx = funcIx[i]
  corVals[i] = qtrCorRslt[rowIx, i]
}
corIs = ifelse(corVals > 0, "postive", "negative")
table = cbind(Funcs, corIs, shouldBeCor)
rownames(table) = colnames(qtrCorRslt)
colnames(table) = c("function", "correlation", "beta sign")
t = xtable(table, label="func_table", caption="Functions vs. correlation and market beta")
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=T, caption.placement="bottom")
@

Selecting these functions and the associated adjustment factor is over-fitting, since information from the future is used in the past (when it could not have been known).

The mean return is negatively correlated with the future return. 

As Table \ref{func_table} shows, the bond funds SHY and AGG have a negative beta with the benchmark (which is to be expected with bonds). In the case where the beta is not negative, the negative correlation of the mean can be converted to a positive correlation by scaling the forecast by a multiplier:

\begin{displaymath}
  \text{multiplier} = sign(\beta) \cdot sign(\rho) \quad \text{if $|\beta| >= 0.1$}
\end{displaymath}

where $\rho$ is the correlation. The result is shown in Table \ref{multiplier}

<<echo=FALSE, results='asis'>>=
multiplier = ifelse(beta >= 0.1, (sign(beta) * sign(corVals)), 1)
table.df = data.frame(Funcs, multiplier)
colnames(table.df) = c("function", "multiplier")
t = xtable(table.df, label="multiplier", caption="Prediction Multiplier")
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=T, caption.placement="bottom")
@


<<echo=FALSE, message=FALSE>>=
retBlock.z = retData.z[1:52,]
minWt = rep(0, ncol(retBlock.z))
names(minWt) = colnames(retBlock.z)
minWt["SHY"] = 0.2

weights = calcLongOnlyPort(retData.z, Funcs, multiplier, adj = 0.2 )
assetCols = colnames(weights)[-which(colnames(weights) %in% c("targRet", "stdDev"))]
weightsFilt = weights[,assetCols]
@


<<portWeights, fig.lp="fig:", echo=FALSE, fig.cap = 'Quarterly Portfolio Weights', fig.align='left', fig.pos='H', fig.width=10, fig.height=4, message=FALSE>>=
ix = which(etfPort.m[,"sym"] %in% assetCols)
weightsLongName = weightsFilt
colnames(weightsLongName) = etfPort.m[ix,"desc"]
# colors from ColorBrewer - colorbrewer.org
colorRange = c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99",
               "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6", "#6a3d9a")
chart.StackedBar(w = weightsLongName, colorset = colorRange, ylab="Weight", space=0.1,
                 main="Quarterly Portfolio Weights", cex.legend = 0.72)
@

The table below shows the numeric portfolio weights. Portfolio weights are rounded to two decimal places.

<<echo=FALSE, results='asis'>>=
t = xtable(weightsFilt, label="portfolio_weights_table", 
           caption="Portfolio Weights")
print.xtable(t, include.rownames=T, size="\\tiny", floating=T, caption.placement="bottom")
@

<<echo=FALSE, results='asis'>>=
s = matrix(apply(weightsFilt, 2, sum), nrow=1)
rownames(s) = "sum"
colnames(s) = colnames(weightsFilt)
t = xtable(s, label="sum_table", caption="Column Sum of Weights")
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=T, caption.placement="bottom")
@

Figure \ref{fig:mvPortRet} shows the cumulative gain for the portfolio, along with the S\&P 500 benchmark. This is the \^GSPC index, not the S\&P 500 ETF (SPY). By choosing the index rather than a tradable asset there is a consistent benchmark.

<<mvPortRet, fig.lp="fig:", echo=FALSE, fig.cap = 'Portfolio Cumulative Return with SP 500 Return', fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=
portInfo.l = calcPortReturn(weightsFilt, closeData.z, ahead=12 )
closeDates = portInfo.l$closeDates
portRet = portInfo.l$portRet
startCloseDate = closeDates[1]
endCloseDate = closeDates[length(closeDates)]

midcapCloseFile = "sp500.csv"
midcapClosePath = paste(dataFileRoot, midcapCloseFile, sep="/")
sp500Sym = "^GSPC"
if (file.exists(midcapClosePath)) {
  # cat(paste("Reading midcap index data from file", midcapClosePath))
  sp500.df = read.csv(file=midcapClosePath)
  colnames(sp500.df) = c("date", sp500Sym)
} else {
  # cat("Reading midcap data from finance.yahoo.com")
  sp500.z = getCloseData(sp500Sym, startCloseDate, endCloseDate)
  dateIx = as.character(as.Date(index(sp500.z)))
  sp500.df = data.frame(dateIx, sp500.z)
  colNames = c("date", colnames(sp500.z))
  colnames(sp500.df) = colNames
  write.csv(x = sp500.df, file=midcapClosePath, quote=T, row.names=F )
}
sp500Dates = as.Date(as.vector(sp500.df[,"date"]))
weightDates = as.Date(rownames(weightsFilt))
midIx = which(sp500Dates %in% closeDates)
midClose = as.vector(sp500.df[midIx,sp500Sym])
names(midClose) = sp500.df[midIx,"date"]
midRet = calcRet(midClose)
retData = cbind(portRet, midRet)
colnames(retData) = c("Portfolio", "S&P 500")

par(mfrow=c(1,1))
chart.CumReturns(retData, wealth.index=T, main="Quarterly Long Only Portfolio and S&P 500", 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="bottomright", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))
@

The draw-down chart is shown below in Figure \ref{fig:qtrDrawDown}

<<qtrDrawDown, fig.lp="fig:", echo=FALSE, fig.cap = 'Portfolio and SP 500 Drawdown', fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=
par(mfrow=c(1,1))
chart.Drawdown(R = retData, colorset = c("blue", "red"), legend.loc="bottomright", lwd=2, ylab="Drawdown", 
               main="Portfolio and S&P 500 Drawdown")
par(mfrow=c(1,1))
@

The information ratio is:

\begin{equation}
  IR = \frac{E[R_p - R_b]}{\sigma} = \frac{E[R_p - R_b]}{\sqrt{var(R_p - R_b)}} = \frac{\alpha}{\omega}
\end{equation}

<<echo=FALSE>>=
mvIR = mean(portRet - midRet) / sd(portRet - midRet)
@

Using the S\&P 500 as the benchmark, the information ratio of this portfolio is \Sexpr{mvIR}.

As Figure \ref{fig:mvPortRet} shows, this portfolio has performance that is very similar to the benchmark. 

For taxable funds, the S\&P 500 would be a better choice than this portfolio, which is traded every quarter.

A holding period of less than a year-and-a-day will incur short term capital gains taxes. Short-term capital gains are taxed at the income tax rate (e.g., if your tax rate is 28 percent, the short-term capital gains tax will be 28 percent). 

In contrast, long-term capital gains (for stocks held at least a year and a day) are 15 percent. The higher short-term capital gains rate incurs a 13 percent penalty compared to simply holding the S\&P 500 ETF (SPY).

If a portfolio has performance that is similar to the benchmark (as is the case here), the tax consequences of short term capital gains mean that it would be better to purchase the benchmark and hold it for at least a year.

\subsection*{Mean-Variance and Equally Weighted Portfolios}

In this section, for the universe of 10 ETFs (Table \ref{etf_table}), an equally equally weighted portfolio is compared to a Markowitz style mean-variance portfolio.

The previous portfolio, shown in Figure \ref{fig:mvPortRet}, is not a classic mean variance portfolio. This portfolio uses the functions shown in Table \ref{multiplier} to estimate the future return. This future return estimate is then used in portfolio optimization.

A Markowitz style mean-variance portfolio uses the mean to estimate the future return. The mean tends to be negatively correlated with the future return, but mean-variance portfolio models do not attempt to correct for this. As Table \ref{qtr_cor_func_table} shows, the mean is not strongly correlated with future returns.

The difference in the estimation of the future return accounts for the difference between the portfolio shown in Figure \ref{fig:mvPortRet} and the mean-variance portfolio shown in the plot below.


<<echo=FALSE, message=FALSE>>=
meanFuncs = rep("mean", length(colnames(retData.z)))
names(meanFuncs) = colnames(retData.z)
meanMult = rep(1, length(meanFuncs))
mvWeights = calcLongOnlyPort(retData.z, meanFuncs, meanMult, adj = 0.1 )
mvWeightsFilt = mvWeights[,assetCols]
eqWeights = rep(1/length(meanFuncs), length(meanFuncs) * nrow(mvWeights))
eqWeights.m = matrix(eqWeights, ncol=length(meanFuncs), nrow=nrow(mvWeights))
rownames(eqWeights.m) = rownames(mvWeights)
colnames(eqWeights.m) = colnames(mvWeightsFilt)
mvPort.l = calcPortReturn(mvWeightsFilt, closeData.z, ahead=12 )
eqPort.l = calcPortReturn(eqWeights.m, closeData.z, ahead=12 )
mvPortRet = mvPort.l$portRet
eqPortRet = eqPort.l$portRet
@

<<comparePort, fig.lp="fig:", echo=FALSE, fig.cap = 'MV Portfolio, Eq. Weighted and SP 500', fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=
compData = cbind(mvPortRet, eqPortRet, midRet)
colnames(compData) = c("Mean-Variance", "Equal Weight", "S&P 500")
par(mfrow=c(1,1))
chart.CumReturns(compData, wealth.index=T, main="MV, EQ and S&P 500 Portfolios", 
                 col=c("blue", "magenta", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))

@

As Figure \ref{fig:comparePort} shows, the equally weighted portfolio performs better than the mean-variance portfolio and the portfolio in \ref{fig:mvPortRet}. The equally weighted portfolio also has low turnover (resulting in a lower tax rate). If the tax penalty is taken into account, the equally weighted portfolio may be one of the top performing portfolios described in this paper.

\section*{Predicting Returns, Redux}

In calculating the forecasted return for the first portfolio, an adjustment factor is used to adjust for negative correlation (in the case of XHB and XPH).  The correlation adjustment and the choice of mean function (mean, EMA or OLS) is based on future information (e.g., over-fitting)

\section*{Segmented Linear Model}

The more accurately the future return can be estimated, the more likely it is that the portfolio will realize its target return. This section investigates using a segmented linear model to predict future returns. The motivation for this was the idea that a segmented linear model would produce an more accurate estimate of the local mean.

A segmented linear model maps a set of least squares lines through a time series. These fitted lines are the mean through the local distributions.

A segmented linear model is created using the close price. An example is shown below in 
Figure \ref{fig:segmentExample}. The return can be calculated from the segmented linear model result.

As with the models discussed previously, one year (52 weeks) of past data is used to develop a prediction of the return, one quarter ahead.

<<segmentExample, fig.lp="fig:", echo=FALSE, fig.cap = 'XHB Segmented Linear Model and Close Price', fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=
len = nrow(closeData.z)
xhbLastYear.z = closeData.z[(len-52):len,"XHB"]
segClose.z = segmentedLM(xhbLastYear.z)
par(mfrow=c(1,1))
years = format(index(xhbLastYear.z), "%Y")
plot(xhbLastYear.z, type="l", col="blue", lwd=2, xlab=paste("Date", years[1], "-", years[length(years)]),
     ylab="XHB Weekly Close Price")
lines(segClose.z, col="magenta", lwd=2)
legend(x="topleft", legend=c("Close Price", "Linear Model"), lwd=4, col=c("blue", "magenta"))
par(mfrow=c(1,1))
@

A return sequence is calculated from the fitted values from the linear model. The predicted return is the mean of the fitted values. By using the fitted values from the linear model, there is, in theory, less noise and the price trend is captured.

The table below compares the mean, the segmented linear model mean and the EMA on the segmented linear model values.  

<<echo=FALSE, results='asis'>>=
seg.z = zoo( apply(closeData.z[(len-52):len,], 2, FUN=segmentedLM) )
index(seg.z) = index(closeData.z[(len-52):len,])
ret.z = zoo(apply(closeData.z[(len-52):len,], 2, FUN=function(vec) { return(calcRet(vec)) }))
segRet.z = zoo(apply(seg.z, 2, FUN=function(vec) { return(calcRet(vec)) }))
stdMean = apply(ret.z, 2, mean)
segMu = apply(segRet.z, 2, mean)
emaWindow = 12
segEMA = apply(segRet.z, 2, FUN=function(v) { ema = EMA(v, n=emaWindow)
                                              last = ema[length(ema)]
                                              return(last)})

means = cbind(stdMean, segMu, segEMA)
colnames(means) = c("mean", "LM mean", "EMA")
t = xtable(means, label="LMmeans_table", caption="Mean, Segmented Mean and EMA", digits = 4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=F, caption.placement="bottom")
@

The tables below shows the correlation between the values predicted by the mean, EMA and segmented linear model functions with the actual future return.

Mean functions vs. 2-week ahead return:

<<echo=FALSE, results='asis', cache=TRUE>>=
lmCorRslt2 = predictorCorr(closeData.df, ahead = 2, FUN=lmPredictor )
t = xtable(lmCorRslt2, label="lm_cor_func_table2", 
           caption="Linear Model and Mean Functions Correlated with 2-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=F, caption.placement="bottom")
@

Mean functions vs. 4-week ahead return:

<<echo=FALSE, results='asis', cache=TRUE>>=
lmCorRslt4 = predictorCorr(closeData.df, ahead = 4, FUN=lmPredictor )
t = xtable(lmCorRslt4, label="lm_cor_func_table4", 
           caption="Linear Model and Mean Functions Correlated with 4-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=F, caption.placement="bottom")
@

Mean functions vs. 8-week ahead return:

<<echo=FALSE, results='asis', cache=TRUE>>=
lmCorRslt8 = predictorCorr(closeData.df, ahead = 8, FUN=lmPredictor )
t = xtable(lmCorRslt8, label="lm_cor_func_table8", 
           caption="Linear Model and Mean Functions Correlated with 8-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=F, caption.placement="bottom")
@

Mean functions vs. 12-week ahead return:

<<echo=FALSE, results='asis', cache=TRUE>>=
lmCorRslt12 = predictorCorr(closeData.df, ahead = 12, FUN=lmPredictor )
t = xtable(lmCorRslt12, label="lm_cor_func_table12", 
           caption="Linear Model and Mean Functions Correlated with 12-week return", digits=4)
print.xtable(t, include.rownames=T, size="\\scriptsize", floating=F, caption.placement="bottom")
@

As the comparison point moves out in time (from 2 to 12 weeks), the correlations increase. The EMA is a notably better predictor in most cases than the segmented linear model, the mean or ordinary least squares.

\section*{Looking Back at the Hand Selected Portfolio}

The portfolio chosen from the hand selected ETFs by the optimizer for the May to August 2014 time period was:

<<May14Weights, echo=FALSE, results='asis'>>=
may2014 = t(data.frame(c("HYG", 0.24),
                       c("XLU", 0.76)))
t = xtable(may2014, caption="Portfolio Weights, May to August 2014", digits = 4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=F, caption.placement="bottom")
@

The optimization was not constrained to limit portfolio weight and
there were only nine ETFs in the ETF universe. 

This portfolio may suffers from over-fitting. In selecting the return
forecast functions and the correlation adjustment, future knowledge is
used to select the function with the bast historical correlation and
to select the adjustment factor (to adjust for negative correlation).
Small changes in the mean functions and the correlation adjustment
result in large changes on the portfolio result. This suggests that these
factors are not stable.

Another problem is that the correlation adjustments for the 
various functions are estimated from the distributions of only a few
assets.

\section*{Portfolio Redux}

The rest of the paper investigates portfolios that have better performance than the portfolio shown in Figure \ref{fig:mvPortRet}, which was constructed from the hand selected set of ETFs.

\subsection*{Asset Selection}

Portfolio returns are extremely sensitive to the choice of assets. Quantitative techniques cannot turn lead
into gold. A portfolio algorithm can only produce returns within its universe of assets. For example, the maximum gain-loss and maximum Sharpe ratio algorithm produced extremely poor returns with an initial set of ETFs. Much better returns were produced with the etf.com list, discussed below.

The literature on asset selection for stocks goes back at least as far as the first edition of Graham and Dodd's 1934 book \textit{Securities Analysis}. A variety of factors have been proposed for selecting the stocks for a portfolio.

ETFs are collections of stocks and other assets (like futures contracts) that can be traded without tax consequences. This makes the analysis of ETFs more complicated than single stocks. ETFs also have a wide variety of assets and strategies. Their complex nature makes ETFs more difficult to select on an empirical basis.

The portfolios select ETFs on the basis of past performance. The problem with always looking at the past, without any tools to forecast future performance, is that sometimes the desired performance is also in the past.

\subsection*{Selecting ETFs}

In the case of stocks, the universe can be selected from the stocks that make up an index, like 
the S\&P 500. With ETFs there is no canonical choice for the ETF universe.

Originally a list of ETFs was constructed from a list from the Morningstar web site. The Morningstar ETF filter is difficult to use and does not allow data to be exported to a CSV file. Morningstar has also placed a number of features behind a pay wall.

I later discovered that a better source of ETF information is the etf.com web site.
This site allows ETFs to be selected on the basis of minimum capitalization (among other
criteria). The data can be easily exported in CSV format.

The "settings" used in the etf.com screen are:

\begin{itemize}
\item Efficiency: 80 and above
\item AUM: \textgreater \$10 million
\item 3 Mo Return: \textgreater 0 percent
\item 1 Year Return: \textgreater 0 percent
\item Fund Closure Risk: Low Risk
\end{itemize}

In November of 2014 this resulted in a universe of 436 ETFs.

<<echo=FALSE>>=

startDate.d = as.Date("2007-04-04")
startDate.d = findMarketDate( startDate.d )
# endDate.d = startDate.d + 30
endDate.d = findMarketDate( as.Date(portfolioEndDate) )
startDate = as.character( startDate.d )
endDate = as.character( endDate.d )

# The etf_symbols.csv file contains ETFs with history going back to at least 2007-04-04. This is the 
# ETF universe that is used in the Portfolio Redux (the second back test portfolio)
#
etfSymbolFile = "etf_symbols.csv"
etfSymbolPath = paste(dataFileRoot, etfSymbolFile, sep="/")
symlist.m = read.csv(file=etfSymbolPath, header=TRUE)
@

Most ETFs have been constructed in the last few years (relative to 2014). To allow sufficient data for back testing, a start date of \Sexpr{startDate.d} was used. Out of the 436 ETS, there were \Sexpr{nrow(symlist.m)} that had an inception date in this time frame.

\section*{Asset Selection and Risk}

If there is sufficient data, assets can be selected from an asset universe using mean-variance optimization. Mean-variance analysis relies on the calculation of a covariance matrix. In most cases, we do not have sufficient past data, relative to the asset universe. This creates a significant problem when it comes to estimating the covariance matrix. With just a year of past data (52 samples), the ETF universe is larger than the sample data. The resulting covariance matrix may not be full rank and may have a great deal of error (see \textit{Estimating High Dimensional Covariance Matrices and its Applications} by Bai and Shi (2011)).

In \textit{Honey, I Shrunk the Sample Covariance Matrix} (2003) Ledoit and Wolf describe how "shrinkage" can be used to construct a more accurate covariance matrix. Ledoit and Wolf's shrinkage algorithms is implemented by the R function \texttt{cov\_shrink} in the \texttt{tawny} package. This package is used to estimate the covariance matrix for this portfolio.

Even if Ledoit and Wolf's shrinkage method is used, such a large mismatch between the number of assets and the number of historical samples may still yield an inaccurate covariance matrix. By pre-filtering assets to yield an asset universe that is closer to the number of historical data points, a more accurate covariance matrix may be achieved.

\section*{Asset Selection using the Gain-Loss Ratio}

In the paper \textit{Gain, Loss and Asset Pricing} by Bernardo and Ledoit (1996) the authors propose an asset performance metric that improves on the Sharpe Ratio by only penalizing for the down side loss. This metric also has the advantage that, unlike the Sharpe Ratio, the sample standard deviation does not have to be estimated (standard deviation has a lot of error for small samples). Unlike their later article on covariance matrix shrinkage, this article is not exactly a paragon of clarity. However, the gain-loss equation is nicely summarized in the book \textit{Practical Risk-Adjusted Performance Measurement} by Bacon, page 110 (available on Google Books). The equation is:

\begin{displaymath}
\text{gain-loss ratio} = \frac{ \frac{1}{n} \times \sum_{i=1}^{n} max(r_i, 0)}{\frac{1}{n} \times \sum_{i=1}^{n} max(0 - r_i, 0)}
\end{displaymath}

Note that bond ETFs tend to have high gain-loss ratios, since they tend to have a small number of losses and lots of (small) gains. Although bond ETFs will have a high gain-loss ratio, their return may be low.

<<echo=FALSE, cache=TRUE, message=FALSE>>=
etfUniv = as.vector(symlist.m[,"sym"])
etfCloseFile = "etf_close_prices.csv"
etfClosePath = paste(dataFileRoot, etfCloseFile, sep="/")
if (file.exists(etfClosePath)) {
  etfClose = read.csv(file=etfClosePath, header=TRUE, row.names=1)
} else {
  t = getCloseData(etfUniv, startDate, endDate)
  etfClose = data.frame(t)
  write.csv(etfClose, file=etfClosePath, row.names=T)
}
@


<<echo=FALSE, cache=TRUE>>=
etfRet = apply(etfClose, 2, FUN=function(v) {calcRet(v)})
etfDates = as.Date(rownames(etfRet))
blockStart = etfDates[1]
blockEnd = etfDates[52]
retBlock = etfRet[1:52,]
glVals = apply(retBlock, 2, FUN=gainLossRatio)
ix = order(glVals, decreasing=TRUE)
glValsSort = glVals[ix]
@

The plot below shows the distribution of gain-loss ratio values for the time period \Sexpr{blockStart} to \Sexpr{blockEnd}. The back-test time period includes the crash of 2007-2008, so there are probably an unusually large number of gain-loss ratio values that are less than one (e.g., more losses than gains).

<<glHistPlot,  fig.lp="fig:", echo=FALSE, fig.cap="Histogram of the log10of the gain-loss ratio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
par(mfrow=c(1,1))
chart.Histogram(log10(glVals), col="slategrey", xlab="log10 of the gain-loss ratio values", breaks = 22)
par(mfrow=c(1,1))
@

\section*{Back-testing Portfolio Algorithms}

Back-testing uses a rolling 1-year window that moves forward in 12-week steps.

For back-testing the asset universe is limited to ETFs that span the entire back test period. This limitation exists for the sake of simplicity (a new asset universe doesn't have to be calculated every quarter).

\subsection*{Maximum Return Portfolios}

The algorithm for calculating a portfolio that targets the maximum forecasted return is:

\begin{enumerate}
\item From 52 weekly returns, calculate the Sharpe ratio or the gain-loss ratio for all assets.
\item Sort the assets in decreasing order on the basis of the Sharpe or gain-loss ratio result.
\item For each ETF, remove the other ETFs that are highly correlated $(\rho >= 0.9)$ This helps avoid a portfolio that is composed the same type of ETF (e.g., energy, technology, etc).
\item Select the top N assets from the set that remains after correlation filtering.
\item Calculate the forecasted return for the selected assets. Return is forecasted using either the EMA function.
\item Use shrinkage to estimate the covariance matrix based on 52 weekly returns
\item Pick the target return from forecasted future returns: $r_{target} = max(\textbf{r}_p) - (max(\textbf{r}_p) \times \textit{frac})$, where $\textbf{r}_p$. This target return is near the maximum projected return for the portfolio and \textit{frac} is usually between 0.1 and 0.3. For this sets of ETFs, $\text{frac} = 0.3$
\item Using numeric optimization, calculate the optimal portfolio for the selected ETFs
\end{enumerate}

In the Sharpe ratio the risk free rate is subtracted from the return. Here a modified Sharpe ratio is used: 

\begin{displaymath}
\text{Sharpe ratio}_{modified} = \frac{\E[\textbf{r}] }{ \sigma_r}
\end{displaymath}

The back-test result, for the maximum gain-loss and Sharpe ratio filtered portfolios, is shown below in Figure \ref{fig:plotReduxPort1}.

<<plotReduxPort1,  fig.lp="fig:", echo=FALSE, fig.cap="Plot of Portfolio Redux Performance", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=

# Maximum gain-loss portfolio
port.l = backtestPortfolio( etfRet, portfolioWeights, gainLossRatio, emaMean, emaWin=12, adj=0.3 )
portRet = calcBacktestRet(port.l, etfClose)

# Maximum Sharpe ratio portfolio
portSharpe.l = backtestPortfolio( etfRet, portfolioWeights, sharpeRatio, emaMean, emaWin=12, adj=0.3)
portRetSharpe = calcBacktestRet(portSharpe.l, etfClose)

benchQtrClose = getBenchmark( sp500Sym, port.l, etfClose)
benchRet = calcRet(benchQtrClose)

maxRetData = cbind(portRet, portRetSharpe, as.numeric(benchRet))
colnames(maxRetData) = c("gain-loss", "Sharpe", "^GSPC")
par(mfrow=c(1,1))
chart.CumReturns(maxRetData, wealth.index=T, main="Maximum Gain-loss and Sharpe Ratio portfolios, with ^GSPC", 
                 col=c("blue", "magenta", "red"), legend.loc="topleft", 
                 lwd=4, lty = c(2, 4, 1), grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))
@

The Maximum gain-loss and Sharpe ratio portfolios are extremely sensitive to the choice of assets. With an earlier set of ETFs this portfolio lost about 60 percent of its value in the crash of 2008-2009. By the end of the back-test period in 2014 this portfolio had only recovered about 20 percent of its value, while the S\&P 500 had recovered its loss, plus a 40 percent gain. 

In contrast, with the current set of ETFs, both of maximum gain-loss and Sharpe ratio portfolios significantly out perform the S\&P 500 benchmark.

\subsection*{Optimized Portfolios and Diversification}

The initial portfolio, before optimization, consists of 40 ETFs. The plot in Figure \ref{fig:portfolioDist} shows the number of ETFs in the optimized portfolio. In some cases the portfolio consists of only two ETFs.

<<portfolioDist, echo=FALSE, message=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Number of ETFs per Portfolio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
numStocksGL = unlist(lapply(port.l, FUN=function(e) { length(e$weights)}))
numStocksSharpe = unlist(lapply(portSharpe.l, FUN=function(e) { length(e$weights)}))
numStockDates = unlist(lapply(port.l, FUN=function(e) {e$date}))
numStocks = cbind(numStocksGL, numStocksSharpe )
numStocks.z = as.zoo(numStocks)
index(numStocks.z) = as.Date(numStockDates)
par(mfrow=c(1,1))
barplot(numStocks.z, main="Number of Stocks in the Gain-Loss and Sharpe Ratio Portfolios", 
        ylab="Number of ETFs in Portfolio", xlab="Portfolio Buy Date", 
        beside=TRUE, 
        legend.text=c("gain-loss", "Sharpe"), 
        col=c("blue", "magenta"), 
        args.legend = list(x="topleft"))
par(mfrow=c(1,1))
@

One of the criticisms of optimized portfolios is that they concentrate the portfolio in a few assets, violating the portfolio construction dogma of diversification. While reciting point, the critic will point out that diversification reduces risk, which can be shown mathematically.

The maximum Sharpe and gain-loss ratio portfolios are given a target return (the maximum forecasted return, using the EMA function). The optimizer will find the combination of assets from the input portfolio that produces that return with the lowest risk. The optimizer can be given constraints that limit the weight for any asset (e.g., no asset can make up more than 10 percent of the portfolio). Constraining the portfolio in this way usually reduces the possible portfolio return. To arrive at a portfolio where the optimizer can find a solution, a lower target return will be necessary. The portfolio that results will have a lower return than the unconstrained portfolio, but also a lower risk.

One argument for a diversified portfolio is that it performs better (and has lower losses) in changing market conditions. The longer the portfolio is held, the more important this is.

The design of the portfolios in this paper take a different approach. Changing market conditions are adapted to by re-balancing the portfolios with a new set of assets.

\subsection*{Using \texttt{portfolio.optim()}}

The portfolio optimization above uses \texttt{solve.QP}, whose arguments can be confusing. In this section \texttt{portfolio.optim} is applied to check that the results are the \texttt{solve.QP} results.

<<plotPortfolioOptimPort,  fig.lp="fig:", echo=FALSE, fig.cap="portfolio.optim() optimized Portfolio (gain-loss filtered)", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
portOptim.l = backtestPortfolio( etfRet, weightsPortfolio.optim, gainLossRatio, emaMean, emaWin=12 )
portOptimRet = calcBacktestRet(portOptim.l, etfClose)

par(mfrow=c(1,1))
chart.CumReturns(portOptimRet, wealth.index=T, main="Portfolio optimized with portfolio.optim", 
                 col=c("blue"), 
                 lwd=4, type="l", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))
@

This is the same result as the gain-loss and Sharpe ratio portfolios optimized using \texttt{solve.QP}. The application of \texttt{solve.QP} appears to be correct.

\section*{Equal Weighted Portfolios}

Comparing an equally weighted portfolio to an optimized portfolio provides a useful perspective on whether the optimization resulted in a portfolio with better performance.

An equally weighted portfolio is examined in this section.

\begin{enumerate}
\item From 52 weekly returns, calculate the gain-loss or Sharpe ratio for all assets.
\item Sort the assets in decreasing order on the basis of the gain-loss ratio
\item For each ETF, remove the other ETFs that are highly correlated $(\rho >= 0.9)$
\item Pick the top \textit{n} assets $n = min(\text{numAssets}, 20)$
\item Set the portfolio weights to $\frac{1}{n}$
\end{enumerate}

<<plotEqWtPort,  fig.lp="fig:", echo=FALSE, fig.cap="Plot of Equal Weighted Portfolio (gain-loss filtered)", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
portEqWt.l = backtestEqWtPortfolio( etfRet, gainLossRatio )
portRetEqWt = calcBacktestRet(portEqWt.l, etfClose)
portEqWtSharpe.l = backtestEqWtPortfolio( etfRet, sharpeRatio )
portRetEqWtSharpe = calcBacktestRet(portEqWtSharpe.l, etfClose)
eqWtData = cbind(portRetEqWt, portRetEqWtSharpe, as.numeric(benchRet))
colnames(eqWtData) = c("EqWt G-L Portfolio", "EqWt Shapre Portfolio", sp500Sym)
par(mfrow=c(1,1))
chart.CumReturns(eqWtData, wealth.index=T, main=paste("Equal Weight Portfolios and ", sp500Sym), 
                 col=c("blue", "magenta", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))
@

The equally weighted portfolio out performed the benchmark portfolio, but with a lower loss during the market crash (20 vs. a 60 percent loss). 

\section*{Efficient Frontier Portfolios}

The maximum return portfolio targets a return close to the maximum forecasted return.

The portfolios constructed in this section start in the same way: a set of assets are selected using the Sharpe ratio or the gain-loss ratio. However, rather building a single portfolio for a target return, an efficient frontier is calculated for a range of returns. The portfolio on the efficient frontier with the highest Sharpe or gain-loss ratio is selected.

The previous portfolios used the EMA function to estimate the future return. The portfolios below use both EMA and the mean to estimate the future return.

\section*{Efficient Frontier Portfolios}

This section explores the construction of efficient frontier portfolios. The result portfolio is selected on the basis of the highest gain-loss or Sharpe ratio for the portfolio.

For each back-test step:

\begin{enumerate}
\item From 52 weekly returns, calculate the gain-loss ratio for all assets.
\item Sort the assets in decreasing order on the basis of the gain-loss ratio
\item For each ETF, remove the other ETFs that are highly correlated $(\rho >= 0.9)$
\item Pick the top \textit{n} assets $n = min(\text{numAssets}, 40)$
\item Estimate the future returns, $\textbf{r}_p$
\item Find the efficient frontier for the long-only portfolios using target returns in the range $min(r_p) \cdots max(\textbf{r}_p) - (max(\textbf{r}_p) \times 0.10)$
\item For each portfolio, return the estimated future return and the portfolio standard deviation.
\item Find the maximum Sharpe Ratio portfolio.
\end{enumerate}

<<echo=FALSE, cache=TRUE>>=

calcEfficientFrontier = function(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  selectAssets = filterAssets(retBlock, filterFunc )
  assetRet.z = zoo(retBlock[,selectAssets])
  t_port = TawnyPortfolio(assetRet.z, nrow(assetRet.z))
  S.hat = cov_shrink(t_port)
  alpha = apply(assetRet.z, 2, FUN=alphaFunc, win=emaWin)
  maxRet = max(alpha)
  maxRetAdj = maxRet - (maxRet * adj)
  minRet = min(alpha[alpha > 0])
  portRet = c()
  portSd = c()
  wts = c()
  if (minRet < maxRetAdj) {
    targetRet = seq(from=minRet, to=maxRetAdj, length=50)
    for (i in 1:length(targetRet)) {
      wRaw = portfolioFunc(targetRet[i], alpha, S.hat)
      if (! is.null(wRaw)) {
        # wVec = round(wRaw * 10000, -1)/10000
        wVec = round(wRaw, 4)
        r_p = wVec %*% alpha
        sd = sqrt(as.vector(wVec %*% S.hat %*% wVec))
        portRet = c(portRet, r_p)
        portSd = c(portSd, sd)
        wts = rbind(wts, wVec)
      }
    }
  } else {
    wRaw = portfolioFunc(targetRetAdj, alpha, S.hat)
    # wVec = round(wRaw * 10000, -1)/10000
    wVec = round(wRaw, 4)
    r_p = wVec %*% alpha
    sd = sqrt(as.vector(wVec %*% S.hat %*% wVec))
    portRet = r_p
    portSd = sd
    wts = wVec
  }
  
  return( list(portRet = portRet, portSd = portSd, wts = wts))
} # calcEfficientFrontier


backtestSharpePortfolio = function(etfRet, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  dataDates = rownames(etfRet)
  numWeeks = length(dataDates)
  endIx = rev( seq(from=numWeeks, to=53, by=-12))
  startIx = endIx - 51
  weight.l = list()
  for (i in 1:length(startIx)) {
    retBlock = etfRet[startIx[i]:endIx[i],]
    blockDates = rownames(retBlock)
    startDate = blockDates[1]
    endDate = blockDates[length(blockDates)]
    frontier = calcEfficientFrontier(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin, adj)
    wts = NULL
    if (length(frontier$portRet) > 1) {
      r_p = frontier$portRet
      stdDev = frontier$portSd
      sharpe = r_p / stdDev
      ix = which.max(sharpe)
      wts = frontier$wts[ix,]
    } else {
      wts = frontier$wts
    }
    wVec = wts[wts > 0]
    weight.l[[i]] = list(date = endDate, weights = wVec)
  }
  return(weight.l)
}


calcEfficientFrontierGL = function(retBlock, portfolioFunc, filterFunc, alphaFunc=emaMean, emaWin = 12, adj = 0.10) {
  dot = function(v, w) {
    r = v %*% w
    return(r)
  }
  selectAssets = filterAssets(retBlock, filterFunc )
  assetRet.z = zoo(retBlock[,selectAssets])
  t_port = TawnyPortfolio(assetRet.z, nrow(assetRet.z))
  S.hat = cov_shrink(t_port)
  alpha = apply(assetRet.z, 2, FUN=alphaFunc, win=emaWin)
  maxRet = max(alpha)
  maxRetAdj = maxRet - (maxRet * adj)
  minRet = min(alpha[alpha > 0])
  portGL = c()
  portRet = c()
  wts = c()
  if (minRet < maxRetAdj) {
    targetRet = seq(from=minRet, to=maxRetAdj, length=50)
    for (i in 1:length(targetRet)) {
      wRaw = portfolioFunc(targetRet[i], alpha, S.hat)
      if (! is.null(wRaw)) {
        # wVec = round(wRaw * 10000, -1)/10000
        wVec = round(wRaw, 4)
        retVec = apply(assetRet.z, 1, FUN=dot, w = wVec)
        gl = gainLossRatio( retVec )
        portGL = c(portGL, gl)
        wts = rbind(wts, wVec)
      }
    }
  } else {
    wRaw = portfolioFunc(targetRetAdj, alpha, S.hat)
    # wVec = round(wRaw * 10000, -1)/10000
    wVec = round(wRaw, 4)
    retVec = apply(retBlock, 1, FUN=function(v, w) { v %*% w}, w = wVec)
    portGL = gainLossRatio( retVec )
    wts = wVec
  }
  
  return( list(portGL = portGL, wts = wts))
}


backtestGainLossPortfolio = function(etfRet, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  dataDates = rownames(etfRet)
  numWeeks = length(dataDates)
  endIx = rev( seq(from=numWeeks, to=53, by=-12))
  startIx = endIx - 51
  weight.l = list()
  for (i in 1:length(startIx)) {
    retBlock = etfRet[startIx[i]:endIx[i],]
    blockDates = rownames(retBlock)
    startDate = blockDates[1]
    endDate = blockDates[length(blockDates)]
    frontier = calcEfficientFrontierGL(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin, adj)
    wts = NULL
    if (length(frontier$portGL) > 1) {
      gl = frontier$portGL
      ix = which.max(gl)
      wts = frontier$wts[ix,]
    } else {
      wts = frontier$wts
    }
    wVec = wts[wts > 0]
    weight.l[[i]] = list(date = endDate, weights = wVec)
  }
  return(weight.l)
}

@

The maximum Sharpe Ratio portfolio chosen from the efficient frontier has no loss during the 2008 crash and shows steady appreciation. However, the terminal value of the portfolio is less than the S\&P 500 benchmark portfolio.

<<plotMaxSharpePortfolio,  fig.lp="fig:", echo=FALSE, fig.cap="Plot of the maximum Sharpe Ratio Portfolio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6, message=FALSE>>=

maxSharpe.l = backtestSharpePortfolio( etfRet, portfolioWeights, gainLossRatio, emaMean, emaWin = 12 )
maxSharpePort = calcBacktestRet(maxSharpe.l, etfClose)
maxSharpeData = cbind(maxSharpePort, as.numeric(benchRet))
colnames(maxSharpeData) = c("Max Sharpe", sp500Sym)
par(mfrow=c(1,1))
chart.CumReturns(maxSharpeData, wealth.index=T, main=paste("Maximum Sharpe Ratio Portfolio (EMA) and ", sp500Sym), 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))

maxSharpeIR = mean(maxSharpePort - benchRet) / sd(maxSharpePort - benchRet)

@

The IR of the portfolio is \Sexpr{maxSharpeIR}. The draw-down plot is shown below.

<<maxSharpeDrawDown, fig.lp="fig:", echo=FALSE, fig.cap = 'Max Sharpe Portfolio and SP 500 Drawdown', fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=
par(mfrow=c(1,1))
chart.Drawdown(R = maxSharpeData, colorset = c("blue", "red"), legend.loc="bottomright", lwd=2, ylab="Drawdown", 
               main="Max Sharpe Portfolio and S&P 500 Drawdown")
par(mfrow=c(1,1))
@


For the risk averse, this portfolio seems promising for this set of assets.

The Sharpe Ratio has the disadvantage of penalizing for upside volatility, in addition to downside volatility. Instead of choosing the efficient frontier portfolio with the maximum Sharpe Ratio, this algorithm chooses the portfolio with the maximum gain-loss ratio.

<<plotMaxGainLossPortfolio,  fig.lp="fig:", echo=FALSE, fig.cap="Plot of the maximum Gain-Loss Ratio Portfolio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6, message=FALSE>>=

maxGainLoss.l = backtestGainLossPortfolio( etfRet, portfolioWeights, gainLossRatio, meanMean, emaWin=12 )
maxGainLossPort = calcBacktestRet(maxGainLoss.l, etfClose)
maxGainLossData = cbind(maxGainLossPort, as.numeric(benchRet))
colnames(maxGainLossData) = c("Max Gain-Loss", sp500Sym)
par(mfrow=c(1,1))
chart.CumReturns(maxGainLossData, wealth.index=T, main=paste("Maximum Gain-Loss Ratio Portfolio (mean) and ", 
                                                             sp500Sym), 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))

maxGainLossIR = mean(maxGainLossPort - benchRet) / sd(maxGainLossPort - benchRet)

@

The IR of this portfolio is \Sexpr{maxGainLossIR}.

For this set of assets, the efficient frontier portfolio selected by its gain-loss ratio performs worse than the portfolio selected using the Sharpe ratio.

Here again it is worth noting that the portfolio characteristics depend heavily on the assets. For another set of assets the Sharpe and gain-loss portfolios had similar performance.

<<echo=FALSE>>=
#
# Calculate a current investment portfolio
#
currentGainLossPortfolio = function(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin = 12, adj = 0.10) {
  frontier = calcEfficientFrontierGL(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin, adj)
  wts = NULL
  if (length(frontier$portGL) > 1) {
    gl = frontier$portGL
    ix = which.max(gl)
    wts = frontier$wts[ix,]
  } else {
    wts = frontier$wts
  }
  wVec = wts[wts > 0]
  return(wVec)
}


currentSharpePortfolio = function(retBlock, portfolioFunc, filterFunc, alphaFunc=emaMean, emaWin = 12, adj = 0.10) {
  frontier = calcEfficientFrontier(retBlock, portfolioFunc, filterFunc, alphaFunc, emaWin, adj)
  wts = NULL
  if (length(frontier$portRet) > 1) {
    r_p = frontier$portRet
    stdDev = frontier$portSd
    sharpe = r_p / stdDev
    ix = which.max(sharpe)
    wts = frontier$wts[ix,]
  } else {
    wts = frontier$wts
  }
  wVec = weightFilter( wts[wts > 0] )
  return(wVec)
}


getAssetData = function(sym, startDate, endDate, period="w") {
  close.z = NULL
  symClose.z = NULL
  try(
  (close.z = get.hist.quote(instrument=sym, start=startDate, end=endDate, quote="AdjClose",
                              provider="yahoo", compression=period, retclass="zoo", quiet=T)),
  silent = TRUE
  )
  if (! is.null(close.z)) {
    dates = index(close.z)
    adjEnd = as.Date(endDate) - 7
    if ((min(dates) <= as.Date(startDate)) && (max(dates) >= adjEnd)) {
      symClose.z = close.z
    }
  }
  return( symClose.z )
}

readETFCloseList = function(etfList, startDate, endDate) {
  etfClose.m = c()
  symNames = c()
  dates = c()
  for (i in 1:length(etfList)) {
    sym = etfList[i]
    close.z = getAssetData(sym, startDate = startDate, endDate = endDate)
    close = as.numeric( close.z )
    if (! is.null(close.z)) {
      etfClose.m = cbind(etfClose.m, close)
      symNames = c(symNames, sym)
      dates = as.character( as.Date(index(close.z)) )
    }
  }
  colnames(etfClose.m) = symNames
  rownames(etfClose.m) = dates
  return(etfClose.m)
}


@

\section*{Return Ranked Portfolios}

In this section portfolios are constructed by ranking the assets by their mean return. The implication of this approach is that the mean return is assumed to be a predictor of future return (an idea that was resisted in previous portfolios).

The previous section shows results for portfolios constructed from assets that are ranked either by their Share ratio or their gain-loss ratio. These ratios can be independent of the magnitude of return. For example, an assets with a small return but an even smaller standard deviation will have a high Sharpe ratio. In the case of the gain-loss ratio, the actual magnitude of the gain or loss doesn't matter.

Portfolios constructed using the Sharpe or gain-loss filters can be heavily weighted toward bonds. While these portfolios have lower risk, they may be overly conservative.  This section examines portfolios where the assets are ranked on the basis of their estimated future return.

\subsection*{Return Ranked, Maximum Sharpe Ratio Portfolio}

\begin{itemize}
\item Rank the assets by their mean return
\item Filter out assets that have a high paired correlation with a reference asset.
\item Select \textit{N} assets for the portfolio. 
\item Use the mean return as an estimate of the future return
\footnote{Previous portfolios use EMA to estimate the future return. In theory this is a better estimate than the mean because it has a higher correlation with future return. However, this doesn't seem to be true here. Using the mean to estimate future return is a result of the observation that this produces a portfolio with better performance. However, in doing this there is a real danger of over-fitting.} 
\item Calculate the efficient frontier for a range of target returns. 
\item Select the portfolio with the highest Sharpe ratio.
\end{itemize}

<<maxSharpeMeanRet, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Plot Return Filtered Max Sharpe Ratio Portfolio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6, message=FALSE>>=

maxSharpeRet.l = backtestSharpePortfolio( etfRet, portfolioWeights, meanMean, meanMean, emaWin = 12 )

maxSharpeRetPort = calcBacktestRet(maxSharpeRet.l, etfClose)
maxSharpeRetData = cbind(maxSharpeRetPort, as.numeric(benchRet))
colnames(maxSharpeRetData) = c("Max Ret Sharpe", sp500Sym)
par(mfrow=c(1,1))
chart.CumReturns(maxSharpeRetData, wealth.index=T, main=paste("Maximum Sharpe Ratio Portfolio (mean) and ", sp500Sym), 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))

@

By ranking the portfolio assets by their mean return, higher return assets are selected, which may compltely omit assets like bond funds with lower return and risk. 

The next portfolio mixes 90 percent assets that are selected by return ranking and 10 percent selected by gain-loss ranking (these percentages were selected by observing back-test performance).

<<mixedPortfolio,  fig.lp="fig:", echo=FALSE, fig.cap="Plot of the Maximum Return + 10 percent Gain-Loss Ratio Portfolio", fig.align='left', fig.pos='H', fig.width=10, fig.height=6, message=FALSE>>=

calcMixedEfficientFrontier = function(retBlock, alphaFunc, emaWin = 12, adj = 0.10, weightGL ) {
  maxRetAssets = filterAssets(retBlock, mean, n = 40 * (1 - weightGL) )
  maxGL = filterAssets(retBlock, gainLossRatio, n = 40 * weightGL ) 
  selectAssets = unique( c(maxRetAssets, maxGL) )
  assetRet.z = zoo(retBlock[,selectAssets])
  t_port = TawnyPortfolio(assetRet.z, nrow(assetRet.z))
  S.hat = cov_shrink(t_port)
  alpha = apply(assetRet.z, 2, FUN=alphaFunc, win=emaWin)
  maxRet = max(alpha)
  maxRetAdj = maxRet - (maxRet * adj)
  minRet = min(alpha[alpha > 0])
  portRet = c()
  portSd = c()
  wts = c()
  if (minRet < maxRetAdj) {
    targetRet = seq(from=minRet, to=maxRetAdj, length=50)
    for (i in 1:length(targetRet)) {
      wRaw = portfolioWeights(targetRet[i], alpha, S.hat)
      if (! is.null(wRaw)) {
        # wVec = round(wRaw * 10000, -1)/10000
        wVec = round(wRaw, 4)
        r_p = wVec %*% alpha
        sd = sqrt(as.vector(wVec %*% S.hat %*% wVec))
        portRet = c(portRet, r_p)
        portSd = c(portSd, sd)
        wts = rbind(wts, wVec)
      }
    }
  } else {
    wRaw = portfolioWeights(targetRetAdj, alpha, S.hat)
    # wVec = round(wRaw * 10000, -1)/10000
    wVec = round(wRaw, 4)
    r_p = wVec %*% alpha
    sd = sqrt(as.vector(wVec %*% S.hat %*% wVec))
    portRet = r_p
    portSd = sd
    wts = wVec
  }
  
  return( list(portRet = portRet, portSd = portSd, wts = wts))
} # calcMixedEfficientFrontier


backtestMixedPortfolio = function(etfRet, alphaFunc, emaWin = 12, adj = 0.10, weightGL = 0.3) {
  dataDates = rownames(etfRet)
  numWeeks = length(dataDates)
  endIx = rev( seq(from=numWeeks, to=53, by=-12))
  startIx = endIx - 51
  weight.l = list()
  for (i in 1:length(startIx)) {
    retBlock = etfRet[startIx[i]:endIx[i],]
    blockDates = rownames(retBlock)
    startDate = blockDates[1]
    endDate = blockDates[length(blockDates)]
    frontier = calcMixedEfficientFrontier(retBlock, alphaFunc, emaWin, adj, weightGL)
    wts = NULL
    # for the efficient frontier, find the maximum Sharpe ratio portfolio
    if (length(frontier$portRet) > 1) {
      r_p = frontier$portRet
      stdDev = frontier$portSd
      sharpe = r_p / stdDev
      ix = which.max(sharpe)
      wts = frontier$wts[ix,]
    } else {
      wts = frontier$wts
    }
    wVec = wts[wts > 0]
    weight.l[[i]] = list(date = endDate, weights = wVec)
  }
  return(weight.l)
} # backtestMixedPortfolio

mixedPortRet.l = backtestMixedPortfolio( etfRet, meanMean, emaWin = 12, weightGL = 0.1 )
mixedPort = calcBacktestRet(mixedPortRet.l, etfClose)
mixedRetData = cbind(mixedPort, as.numeric(benchRet))
colnames(mixedRetData) = c("Mixed Return Port", sp500Sym)
par(mfrow=c(1,1))
chart.CumReturns(mixedRetData, wealth.index=T, main=paste("Ranked Return with 10% Max Gain-Loss Ratio (mean) and ", sp500Sym), 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))

@

<<echo=FALSE>>=

currentMixedPortfolio = function(retBlock, alphaFunc, emaWin = 12, adj = 0.10, weightGL = 0.3 ) {
  frontier = calcMixedEfficientFrontier(retBlock, alphaFunc, emaWin, adj, weightGL )
  wts = NULL
  if (length(frontier$portRet) > 1) {
    r_p = frontier$portRet
    stdDev = frontier$portSd
    sharpe = r_p / stdDev
    ix = which.max(sharpe)
    wts = frontier$wts[ix,]
  } else {
    wts = frontier$wts
  }
  wVec = weightFilter( wts[wts > 0] )
  return(wVec)
}


@

The best performing portfolios are the portfolios that target the maximum forecasted return from assets that were filtered using the gain-loss or Sharpe ratio (see Figure \ref{fig:plotReduxPort1}). 

Unfortunately these portfolios are not without their problems. In constructing the optimized portfolio, a target return near the maximum asset value is used. This can lead to selecting assets that perform very badly in market crashes. Although these portfolios perform well with the current set of assets, there was very poor performance with a previous set of ETFs.

Of the ranked portfolios that do not have significant draw-down in the market crash, best performing portfolio seems to the the maximum Sharpe ratio portfolio (Figure \ref{fig:maxSharpeMeanRet}) using the mean to calculate the future return and filter the portfolio set. This performance seems to be relatively independent of the set of assets.

\section*{Actual Performance}


<<echo=FALSE, cache=TRUE, message=FALSE>>=
today = findMarketDate( as.Date(portfolioEndDate) )
# 1-year plus one week of close prices, to yield 52 weeks of returns
closeStart = findMarketDate( today - (53 * 7) )

# Full list of all ETFs
etfListFile = "etf_list.txt"
etfListPath = paste(dataFileRoot, etfListFile, sep="/")
etfList.m = read.table(file = etfListPath)
etfList = as.vector(etfList.m[,1])
closeStart.ch = as.character(closeStart)
today.ch = as.character(today)

etf1yrCloseFile = "etf_1yr_close.csv"
etf1yrClosePath = paste(dataFileRoot, etf1yrCloseFile, sep="/")
if (file.exists(etf1yrClosePath)) {
  etf1yrCloseRaw = read.csv( etf1yrClosePath, row.names=1)
  etf1yrClose = as.matrix(etf1yrCloseRaw)
} else {
   etf1yrClose = readETFCloseList(etfList, closeStart.ch, today.ch)
   write.csv(x = etf1yrClose, file=etf1yrClosePath, quote=F, row.names=T)
}

len = nrow(etf1yrClose)
blockClose = etf1yrClose[(len-52):len,]

blockRet = apply(blockClose, 2, calcRet)
len = nrow(blockClose)
rownames(blockRet) = rownames(blockClose)[2:len]
@


\subsection*{August 28, 2014 - November 28, 2014}

This section looks back the performance of the portfolio selected for the period August 28 to November 28, 2014.
The portfolio is shown in the table below. The portfolio is a maximum Sharpe ratio portfolio (see Figure \ref{fig:maxSharpeMeanRet}).

<<echo=FALSE, message=FALSE, results='asis'>>=
aug28Names = c("GSY", "DIV", "PZA", "MINT", "IYW", "CSM", "BAB", "NIB", "FBT", "GVT")
aug28Wts = c(0.3410, 0.0130, 0.0840, 0.4720, 0.0240, 0.0040, 0.0340, 0.0110, 0.0100, 0.007 )
names(aug28Wts) = aug28Names
aug28Wts.df = t( data.frame( aug28Wts ) )
rownames(aug28Wts.df) = rep("", nrow(aug28Wts.df))
t = xtable(aug28Wts.df, label="portfolio_aug_28", caption="Portfolio August 28, 2014", digits = 4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=F, caption.placement="bottom")
@

This plot shows weekly continuously compounded returns. This return is unrealistically high, since the portfolio is not continuously compounded. This plot does give an idea of what the relative performance of the portfolio vs. the benchmark would be.

<<aug28_2014Portfolio,  fig.lp="fig:", echo=FALSE, fig.cap="Aug. 28, 2014 portfolio vs. SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5, message=FALSE>>=

getYearBlock = function(etfRet, endDate, len) {
  etfDates = rownames(etfRet)
  end = max(which(etfDates <= endDate))
  start = end - (len - 1)
  block = etfRet[start:end,]
  return(block)
}

# This will have to be fixed in the future
yearBlock = getYearBlock(blockRet,"2014-10-28", len=10)
aug28RetSeries = calcWeeklyReturn(yearBlock, aug28Wts)
aug28Dates = names(aug28RetSeries)
closeDatesYr = rownames(etf1yrClose)
ix = which(closeDatesYr == aug28Dates[1])
ix = ix - 1
startDateQtr = closeDatesYr[ix]
endDateQtr = closeDatesYr[ length(closeDatesYr)]
benchCloseQtr = getCloseData(sp500Sym, startDateQtr, endDateQtr, period="w")
benchRetQtr = calcRet( benchCloseQtr )

aug28RetSeries.z = zoo(aug28RetSeries)
index(aug28RetSeries.z) = as.Date(names(aug28RetSeries))
qtrRetData.z = cbind(aug28RetSeries.z, benchRetQtr)

colnames(qtrRetData.z) = c("Aug 2014 Portfolio", "S&P 500")

par(mfrow=c(1,1))
chart.CumReturns(qtrRetData.z, wealth.index=T, main="August 28, 2014 Portfolio and S&P 500", 
                 col=c("blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="bottomleft", grid.color="slategrey")
abline(h = 1, col="grey30")
par(mfrow=c(1,1))
@

\section*{Current Investment Portfolio: December 1, 2014}

The current portfolios are constructed for the current 1-year period. 
 This provides a larger universe of ETFs than the back-test. Of the \Sexpr{nrow(etfList.m)} ETFs there were \Sexpr{ncol(etf1yrClose)} which had a year of past data.

Two portfolio models were considered for the current investment period (e.g., December 1, 2014):

\begin{itemize}
\item Maximum return target portfolio. From August 2008 to November 2014 the maximum return portfolios in \ref{fig:plotReduxPort1} has a return of almost 150 percent. This portfolio did have a loss during the 2008-2009 market crash. This portfolio is also extremely sensitive to its ETF universe composition.

\item Efficient frontier maximum Sharpe ratio portfolio (Figure \ref{fig:maxSharpeMeanRet}). This is a more conservative portfolio. This portfolio had no loss during the market crash, although in the same period it had a very small return. During the investment period it had a return of almost 40 percent. This portfolio model is less sensitive to it's ETF universe and had similar performance with past ETF universes.
\end{itemize}

\subsection*{Maximum return target portfolio}

<<echo=FALSE, message=FALSE>>=

fillInNames = function(nameTable, wts) {
  wtNames = names(wts)
  ix = which(as.vector(nameTable[,"sym"]) %in% wtNames)
  etfNames = as.vector(nameTable[ix, "names"])
  etfSyms = as.vector(nameTable[ix,"sym"])
  wtTable = data.frame(etfSyms, wts[etfSyms], etfNames)
  colnames(wtTable) = c("sym", "weight", "name")
  return(wtTable)
}

rawDataFile = "etf_list_etf_dot_com.csv"
rawDataPath = paste(dataFileRoot, rawDataFile, sep="/")
rawData = read.csv(rawDataPath)
nameTable = rawData[,c("Ticker", "Fund.Name")]
colnames(nameTable) = c("sym", "names")
@


<<echo=FALSE, message=FALSE, results='asis'>>=
nov28_max_port_wts = calcPortfolioWeights(blockRet, portfolioWeights, sharpeRatio, emaMean, emaWin=12, adj=0.3)
nov28_max_port_table = fillInNames(nameTable, nov28_max_port_wts)

t = xtable(nov28_max_port_table, label="nov28_max_port", caption="Maximum Return Portfolio November 28, 2014", digits = 4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=F, caption.placement="bottom")
@

The correlations between the assets are:

<<corrPlotnov28_max_port, fig.lp="fig:", echo=FALSE, fig.cap = 'Correlation of Max Sharpe Ratio Current Portfolio', fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
#
# Correlation plot
#
portRet = blockRet[,names(nov28_max_port_wts)]
M = round(cor(portRet), 4)
#  mar = c(bottom, left, top, right)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2))
corrplot(M, type="lower", diag=F, addgrid.col=T, addCoef.col=T, method="square", order="hclust")
par(mfrow=c(1,1))
@

\subsection*{Efficient Frontier Maximum Sharpe Ratio Portfolio}

<<echo=FALSE, message=FALSE, results='asis'>>=

maxSharpeMeanRet = currentSharpePortfolio(blockRet, portfolioWeights, meanMean, meanMean )
maxSharpeTable = fillInNames(nameTable, maxSharpeMeanRet)
t = xtable(maxSharpeTable, label="maxSharpeMeanRet_port", caption="Efficient Maximum Sharpe Ratio Portfolio Nov 28, 2014", 
           digits = 4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=F, caption.placement="bottom")
@


The correlation between the portfolio assets is shown below.

<<currentPortCorrPlot, fig.lp="fig:", echo=FALSE, fig.cap = 'Correlation of Max Sharpe Ratio Current Portfolio', fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
#
# Correlation plot
#
portRet = blockRet[,names(maxSharpeMeanRet)]
M = round(cor(portRet), 4)
#  mar = c(bottom, left, top, right)
par(mfrow=c(1,1), mar=c(2, 2, 2, 2))
corrplot(M, type="lower", diag=F, addgrid.col=T, addCoef.col=T, method="square", order="hclust")
par(mfrow=c(1,1))
@

For smaller portfolios, asset weights of only a few percent can result in trading costs that consume a significant fraction of the asset return (at least for retail brokers like Charles Schwab). Portfolio optimization can allocate a portfolio that has no more than a certain percentage for each asset. But there is no easy way to allocate at least X percent per asset.

To address this issue, the weights for assets below a particular threshold can be proportionally reallocated to the other assets. This will change the risk return profile for the portfolio, but it should not be a huge change.

The portfolio, rounded to reallocate any asset with a weight of 5 percent or less, is shown below.

<<echo=FALSE, message=FALSE, results='asis'>>=

# Round a set of weights that are less than or equal to the threshold
#

portRounded = t( data.frame( roundPortfolio(maxSharpeMeanRet, 0.05) ) )
rownames(portRounded) = rep("", nrow(portRounded))
t = xtable(portRounded, label="rounded_portfolio_nov_28", caption="Rounded Portfolio November 28, 2014", digits = 4)
print.xtable(t, include.rownames=F, size="\\scriptsize", floating=F, caption.placement="bottom")
@

\subsection*{And the winner is...}

Although the maximum target return portfolio has more risk, it's historical returns are hard to resist. We'll see in early March 2015 how this worked out. The portfolio is shown below:

{\scriptsize
\begin{tabular}{lrl}
  \hline
sym & weight & name \\ 
  \hline
FBT & 0.4400 & First Trust NYSE Arca Biotechnology \\ 
  PJP & 0.0900 & PowerShares Dynamic Pharmaceuticals \\ 
  RYU & 0.4300 & Guggenheim S\&P Equal Weight Utilities \\ 
  INY & 0.0500 & SPDR Nuveen Barclays New York Municipal Bond \\ 
   \hline
\end{tabular}
}


\section*{Reproducibility}

This is reproducible research. This paper is compiled from a combination of executable R code and \LaTeX text. All of the tables and plots in the paper were generated by executing the R code that makes up the source of the paper. This paper is published along with its source and the ETF lists that were used. All of the ETF market data used in the paper was downloaded from \texttt{finance.yahoo.com}. The reader can examine the R code in the paper source to see how each of the plots was generated. The paper can be rerun using Knitr and R to regenerate the paper.

\end{document}