CAPM, a Review
Note that we will be using the Sharpe-Linter version of CAPM:
\begin{equation} E[R_{i}-R_{f}] = \beta_{im} E[(R_{m}-R_{f})] \end{equation}
\begin{equation} \beta_{im} := \frac{Cov[(R_{i}-R_{f}),(R_{m}-R_{f})]}{Var[R_{m}-R_{f}]} \end{equation}
Recall that we declare \(R_{f}\) (the risk-free rate) to be non-stochastic.
Let us begin. We will create a generic function to analyze some given stock.
Data Import
We will first import our utilities
import pandas as pd
import numpy as np
Let’s load the data from our market (NYSE) as well as our 10 year t-bill data.
t_bill = pd.read_csv("./linearity_test_data/10yrTBill.csv")
nyse = pd.read_csv("./linearity_test_data/NYSEComposite.csv")
nyse.head()
Date Close
0 11/7/2013 16:00:00 9924.37
1 11/8/2013 16:00:00 10032.14
2 11/11/2013 16:00:00 10042.95
3 11/12/2013 16:00:00 10009.84
4 11/13/2013 16:00:00 10079.89
Excellent. Let’s load in the data for that stock.
def load_stock(stock):
return pd.read_csv(f"./linearity_test_data/{stock}.csv")
load_stock("LMT").head()
Date Close
0 11/7/2013 16:00:00 136.20
1 11/8/2013 16:00:00 138.11
2 11/11/2013 16:00:00 137.15
3 11/12/2013 16:00:00 137.23
4 11/13/2013 16:00:00 137.26
Raw Data
And now, let’s load all three stocks, then concatenate them all into a big-ol DataFrame.
# load data
df = { "Date": nyse.Date,
"NYSE": nyse.Close,
"TBill": t_bill.Close,
"LMT": load_stock("LMT").Close,
"TWTR": load_stock("TWTR").Close,
"MCD": load_stock("MCD").Close }
# convert to dataframe
df = pd.DataFrame(df)
# drop empty
df.dropna(inplace=True)
df
Date NYSE TBill LMT TWTR MCD
0 11/7/2013 16:00:00 9924.37 26.13 136.20 44.90 97.20
1 11/8/2013 16:00:00 10032.14 27.46 138.11 41.65 97.01
2 11/11/2013 16:00:00 10042.95 27.51 137.15 42.90 97.09
3 11/12/2013 16:00:00 10009.84 27.68 137.23 41.90 97.66
4 11/13/2013 16:00:00 10079.89 27.25 137.26 42.60 98.11
... ... ... ... ... ... ...
2154 10/24/2022 16:00:00 14226.11 30.44 440.11 39.60 252.21
2155 10/25/2022 16:00:00 14440.69 31.56 439.30 39.30 249.28
2156 10/26/2022 16:00:00 14531.69 33.66 441.11 39.91 250.38
2157 10/27/2022 16:00:00 14569.90 34.83 442.69 40.16 248.36
2158 10/28/2022 16:00:00 14795.63 33.95 443.41 39.56 248.07
[2159 rows x 6 columns]
Log Returns
Excellent. Now, let’s convert all of these values into daily log-returns (we don’t really care about the actual pricing.)
log_returns = df[["NYSE", "TBill", "LMT", "TWTR", "MCD"]].apply(np.log, inplace=True)
df.loc[:, ["NYSE", "TBill", "LMT", "TWTR", "MCD"]] = log_returns
df
Date NYSE TBill LMT TWTR MCD
0 11/7/2013 16:00:00 9.202749 3.263084 4.914124 3.804438 4.576771
1 11/8/2013 16:00:00 9.213549 3.312730 4.928050 3.729301 4.574814
2 11/11/2013 16:00:00 9.214626 3.314550 4.921075 3.758872 4.575638
3 11/12/2013 16:00:00 9.211324 3.320710 4.921658 3.735286 4.581492
4 11/13/2013 16:00:00 9.218298 3.305054 4.921877 3.751854 4.586089
... ... ... ... ... ... ...
2154 10/24/2022 16:00:00 9.562834 3.415758 6.087025 3.678829 5.530262
2155 10/25/2022 16:00:00 9.577805 3.451890 6.085183 3.671225 5.518577
2156 10/26/2022 16:00:00 9.584087 3.516310 6.089294 3.686627 5.522980
2157 10/27/2022 16:00:00 9.586713 3.550479 6.092870 3.692871 5.514879
2158 10/28/2022 16:00:00 9.602087 3.524889 6.094495 3.677819 5.513711
[2159 rows x 6 columns]
And now, the log returns! We will shift this data by one column and subtract.
returns = df.drop(columns=["Date"]) - df.drop(columns=["Date"]).shift(1)
returns.dropna(inplace=True)
returns
NYSE TBill LMT TWTR MCD
1 0.010801 0.049646 0.013926 -0.075136 -0.001957
2 0.001077 0.001819 -0.006975 0.029570 0.000824
3 -0.003302 0.006161 0.000583 -0.023586 0.005854
4 0.006974 -0.015657 0.000219 0.016568 0.004597
5 0.005010 -0.008476 0.007476 0.047896 -0.005622
... ... ... ... ... ...
2154 0.005785 0.004940 -0.023467 -0.014291 0.001349
2155 0.014971 0.036133 -0.001842 -0.007605 -0.011685
2156 0.006282 0.064420 0.004112 0.015402 0.004403
2157 0.002626 0.034169 0.003575 0.006245 -0.008100
2158 0.015374 -0.025590 0.001625 -0.015053 -0.001168
[2158 rows x 5 columns]
Risk-Free Excess
Recall that we want to be working with the excess-to-risk-free rates \(R_{T}-R_{f}\), where \(R_{T}\) is some security. So, we will go through and subtract everything by the risk-free rate (and drop the RFR itself):
risk_free_excess = returns.drop(columns="TBill").apply(lambda x: x-returns.TBill)
risk_free_excess
NYSE LMT TWTR MCD
1 -0.038846 -0.035720 -0.124783 -0.051603
2 -0.000742 -0.008794 0.027751 -0.000995
3 -0.009463 -0.005577 -0.029747 -0.000307
4 0.022630 0.015875 0.032225 0.020254
5 0.013486 0.015952 0.056372 0.002854
... ... ... ... ...
2154 0.000845 -0.028406 -0.019231 -0.003591
2155 -0.021162 -0.037975 -0.043738 -0.047818
2156 -0.058138 -0.060308 -0.049017 -0.060017
2157 -0.031543 -0.030593 -0.027924 -0.042269
2158 0.040964 0.027215 0.010537 0.024422
[2158 rows x 4 columns]
Actual Regression
It is now time to perform the actual linear regression! We will use statsmodels’ Ordinary Least Squares API to make our work easier, but we will go through a full regression in the end.
import statsmodels.api as sm
CAPM Regression: Lockheed Martin
Let’s work with Lockheed Martin first for regression, fitting an ordinary least squares. Remember that the OLS functions reads the endogenous variable first (for us, the return of the asset.)
# add a column of ones to our input market excess returns
nyse_with_bias = sm.add_constant(risk_free_excess.NYSE)
# perform linreg
lmt_model = sm.OLS(risk_free_excess.LMT, nyse_with_bias).fit()
lmt_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: LMT R-squared: 0.859
Model: OLS Adj. R-squared: 0.859
Method: Least Squares F-statistic: 1.312e+04
No. Observations: 2158 AIC: -1.263e+04
Df Residuals: 2156 BIC: -1.262e+04
Df Model: 1 Prob (F-statistic): 0.00
Covariance Type: nonrobust Log-Likelihood: 6318.9
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0004 0.000 1.311 0.190 -0.000 0.001
NYSE 0.9449 0.008 114.552 0.000 0.929 0.961
==============================================================================
Based on the constants row, we can see that—within \(95\%\) confidence—the intercept is generally \(0\) and CAPM applies. However, we do see a slight positive compared to the market. Furthermore, we can see that the regression has a beta value of \(0.9449\) — according the CAPM model, it being slightly undervarying that the market.
CAPM Regression: MacDonald’s
# perform linreg
mcd_model = sm.OLS(risk_free_excess.MCD, nyse_with_bias).fit()
mcd_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: MCD R-squared: 0.887
Model: OLS Adj. R-squared: 0.887
Method: Least Squares F-statistic: 1.697e+04
No. Observations: 2158 AIC: -1.310e+04
Df Residuals: 2156 BIC: -1.309e+04
Df Model: 1 Prob (F-statistic): 0.00
Covariance Type: nonrobust Log-Likelihood: 6551.1
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0003 0.000 1.004 0.315 -0.000 0.001
NYSE 0.9651 0.007 130.287 0.000 0.951 0.980
==============================================================================
Same thing as before, we are within \(95\%\) confidence having a intercept of \(0\) (with a slight positive edge), and it looks like MacDonald’s vary a little bit more than Lockheed Martin. The food industry is probably a tougher business than that in defense.
CAPM Regression: Twitter
Lastly, to analyze the recently delisted Twitter!
# perform linreg
twtr_model = sm.OLS(risk_free_excess.TWTR, nyse_with_bias).fit()
twtr_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: TWTR R-squared: 0.522
Model: OLS Adj. R-squared: 0.522
Method: Least Squares F-statistic: 2357.
No. Observations: 2158 AIC: -8610.
Df Residuals: 2156 BIC: -8599.
Df Model: 1 Prob (F-statistic): 0.00
Covariance Type: nonrobust Log-Likelihood: 4307.1
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -0.0002 0.001 -0.346 0.730 -0.002 0.001
NYSE 1.0173 0.021 48.549 0.000 0.976 1.058
==============================================================================
Evidently, Twitter is much more variable. It looks like it has a nontrivial bias (the intercept being -0.001 being within the \(95\%\) confidence band — that the security is possibly significantly underperforming the CAPM expectation in the market.) Furthermore, we have a positive beta value: that the asset is more variable than the market.
Manual Checking
We can also use the betas formula to manually calculate what we expect for the beta values (i.e. as if they were one IID random variable.)
risk_free_cov = risk_free_excess.cov()
risk_free_cov
NYSE LMT TWTR MCD
NYSE 0.001143 0.001080 0.001163 0.001103
LMT 0.001080 0.001188 0.001116 0.001083
TWTR 0.001163 0.001116 0.002264 0.001155
MCD 0.001103 0.001083 0.001155 0.001200
Finally, to construct the beta values. Recall that:
\begin{equation} \beta_{im} := \frac{Cov[(R_{i}-R_{f}),(R_{m}-R_{f})]}{Var[R_{m}-R_{f}]} \end{equation}
and that:
\begin{equation} Var[X] = Cov[X,X], \forall X \end{equation}
# get the market variance (covariance with itself)
market_variation = risk_free_cov.NYSE.NYSE
# calculate betas
betas = {"LMT": (risk_free_cov.LMT.NYSE/market_variation),
"TWTR": (risk_free_cov.TWTR.NYSE/market_variation),
"MCD": (risk_free_cov.MCD.NYSE/market_variation)}
# and make dataframe
betas = pd.Series(betas)
betas
LMT 0.944899
TWTR 1.017294
MCD 0.965081
dtype: float64
Apparently, all of our assets swing less than the overall NYSE market! Especially Lockheed—it is only \(94.4\%\) of the market variation. Furthermore, it is interesting to see that Twitter swings much more dramatically compared to the market.
Equal-Part Fund
We will now create two funds with the three securities, one with equal parts and one which attempts to maximizes the bias (max returns) while minimizing the beta variance value compared to the market.
First, let’s create a baseline fund in equal parts. Here it is:
fund_1_returns = returns.LMT + returns.TWTR + returns.MCD
fund_1_returns
1 -0.063167
2 0.023420
3 -0.017149
4 0.021384
5 0.049750
...
2154 -0.036409
2155 -0.021132
2156 0.023917
2157 0.001720
2158 -0.014596
Length: 2158, dtype: float64
We will calculate the excess returns of this fund:
fund_1_excess = fund_1_returns-returns.TBill
fund_1_excess
1 -0.112813
2 0.021600
3 -0.023310
4 0.037041
5 0.058226
...
2154 -0.041349
2155 -0.057265
2156 -0.040503
2157 -0.032449
2158 0.010994
Length: 2158, dtype: float64
Performance of the Equal-Part Fund
# perform linreg
fund_1_model = sm.OLS(fund_1_excess, nyse_with_bias).fit()
fund_1_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.473
Model: OLS Adj. R-squared: 0.473
Method: Least Squares F-statistic: 1935.
No. Observations: 2158 AIC: -7735.
Df Residuals: 2156 BIC: -7724.
Df Model: 1 Prob (F-statistic): 3.01e-302
Covariance Type: nonrobust Log-Likelihood: 3869.5
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0007 0.001 0.841 0.401 -0.001 0.002
NYSE 1.1290 0.026 43.993 0.000 1.079 1.179
==============================================================================
Surprisingly, we have now created a significantly riskier investment that, though riskier, generates a much higher probability of reward (\(+0.001\) is now within the \(99\%\) band!)
A More Optimized Fund
To me, this is the payoff of this assignment. We will now use CAPM to create the “best” fund combination—given some variance, the funds which match CAPM. To do this, let’s create a generic linear combination of the assets.
import sympy as sym
x = sym.Symbol('x')
y = sym.Symbol('y')
z = sym.Symbol('z')
fund_2_returns = x*returns.LMT + y*returns.TWTR + z*returns.MCD
fund_2_returns
1 0.0139260753744255*x - 0.0751364261353569*y - ...
2 -0.00697525170622448*x + 0.0295704573211193*y ...
3 0.000583132897928884*x - 0.0235859990058791*y ...
4 0.000218587198947517*x + 0.016568426347233*y +...
5 0.00747599199607762*x + 0.0478955096700351*y -...
...
2154 -0.0234665578621085*x - 0.0142913301107561*y +...
2155 -0.00184214468578059*x - 0.0076045993852194*y ...
2156 0.00411172646842317*x + 0.0154024001854269*y +...
2157 0.00357547337231878*x + 0.0062445563228315*y -...
2158 0.00162509910496933*x - 0.0150529686289622*y -...
Length: 2158, dtype: object
Excellent. We will also calculate the excess returns of this fund:
fund_2_excess = fund_2_returns-returns.TBill
Y = fund_2_excess.to_numpy()
Y
[0.0139260753744255*x - 0.0751364261353569*y - 0.00195664549320096*z - 0.0496463208073039
-0.00697525170622448*x + 0.0295704573211193*y + 0.000824317408861575*z - 0.00181917459665826
0.000583132897928884*x - 0.0235859990058791*y + 0.00585367525146019*z - 0.00616055581298536
...
0.00411172646842317*x + 0.0154024001854269*y + 0.00440300114913317*z - 0.0644196927849867
0.00357547337231878*x + 0.0062445563228315*y - 0.0081004573348249*z - 0.0341688956152497
0.00162509910496933*x - 0.0150529686289622*y - 0.00116834209450634*z + 0.0255902303732043]
We cast this type to a numpy array because we are about to perform some matrix operations upon it.
Optimizing the Optimized Fund: Linreg
Now, let us perform the actual linear regression ourselves. Recall that the pseudoinverse linear regression estimator is:
\begin{equation} \beta = (X^{T}X)^{-1}X^{T}Y \end{equation}
We have already our \(Y\) as a vector (above), and our \(X\) is:
X = nyse_with_bias.to_numpy()
X
[[ 1.00000000e+00 -3.88457302e-02]
[ 1.00000000e+00 -7.42217926e-04]
[ 1.00000000e+00 -9.46284244e-03]
...
[ 1.00000000e+00 -5.81378271e-02]
[ 1.00000000e+00 -3.15429207e-02]
[ 1.00000000e+00 4.09643405e-02]]
We now have our matrices, let’s perform the linear regression!
linear_model = np.linalg.inv((X.transpose()@X))@X.transpose()@Y
linear_model
[0.000544056413840724*x - 6.62061061591867e-5*y + 0.000429966553373172*z - 0.000178620725465344
0.0457830563134785*x + 0.118178191274045*y + 0.0659651260604729*z + 0.899115719100281]
Excellent. So we now have two rows; the top row represents the “bias”—how much deviation there is from CAPM, and the bottom row represents the “rate”—the “beta” value which represents how much excess variance there is.
Optimizing the Optimized Fund: Picking Optimizing Parameters
We can will solve for a combination of solutions to give us specific values of returns vs risk. For instance, we can fix the variance to 1 (i.e. we can vary as much as the market.) We subtract one here for the solver, which expects the expressions equaling to \(0\).
risk_expr = linear_model[1] - 1
risk_expr
0.0457830563134785*x + 0.118178191274045*y + 0.0659651260604729*z - 0.100884280899719
Now, we will set a certain earning value, and solve for possible solutions. We will try to get the largest possible bias without needing to short something (i.e. cause a negative solution). By hand-fisting a value, it seems 0.001 is a good bet.
deviance_expr = linear_model[0] - 0.001
deviance_expr
0.000544056413840724*x - 6.62061061591867e-5*y + 0.000429966553373172*z - 0.00117862072546534
Optimizing the Optimized Fund: Optimize!
solution = sym.solvers.solve([deviance_expr, risk_expr], x,y,z)
solution
{x: 2.16803104555387 - 0.819584899551304*z, y: 0.0137520589394366 - 0.24067066980814*z}
We have one degree of freedom here: how much MacDonald’s we want! Let’s say we want none (which would, according to this, be an equally efficient solution.)
How Does Our Fund Do?
This would create the following plan:
# for our case
z_val = 0
# numerical solutions
s_x = solution[x].subs(z,z_val)
s_y = solution[y].subs(z,z_val)
# solution
fund_2_nobias_nomac = s_x*returns.LMT + s_y*returns.TWTR + z_val*returns.MCD
fund_2_nobias_nomac.mean()
0.001185050286566688
Recall that this is the performance of the balanced portfolio:
fund_1_returns.mean()
0.0009224705380695683
So, for market-level risk (\(\beta =1\), instead of the balanced portfolio’s \(\beta =1.1290\)), this is a pretty good deal!
Some Plots
Finally, let’s plot the prices of our various funds:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import datetime
sns.set()
fund_2_price = s_x*df.LMT + s_y*df.TWTR + z_val*df.MCD
fund_1_price = df.LMT + df.TWTR
fund_l_price = df.LMT
fund_t_price = df.TWTR
dates = df.Date.apply(lambda x:datetime.strptime(x, "%m/%d/%Y %H:%M:%S"))
sns.lineplot(x=dates, y=fund_2_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_1_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_l_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_t_price.apply(sym.Float).astype(float))
plt.gca().xaxis.set_major_locator(mdates.YearLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.gca().set_ylabel("Price")
plt.show()
Recall that we didn’t actually buy any MacDonald’s. So, we have—blue, being our “optimal” portfolio, yellow, our balanced portfolio, green, being Lockheed, and red, being Twitter.
Our portfolio works surprisingly well!