Running a SLR in Python is fairly simple once you know how to use the relevant functions. What might be confusing is that several packages provide functions for linear regression. We will use functions from the statsmodels (sub-)package. Other packages such as scikit-learn have linear regression functions too, but what makes statsmodels stand out is its broad set of auxiliary functions for regression diagnostics. As usual, we start by importing the packages needed for our task.
In [1]:
# Load relevant packages
%matplotlib inline
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
plt.rcParams['font.size'] = 14
As toy data we will use the 'Advertising' data set introduced in chapter 4 of the script. The data is taken from James et al. (2013); a copy is provided on the book's website, from where we will download it.
In [2]:
# Read columns 1-4 (TV, radio, newspaper, sales) from the Advertising data set
url = 'http://faculty.marshall.usc.edu/gareth-james/ISL/Advertising.csv'
ad = pd.read_csv(url, sep=',', usecols=np.arange(1, 5))
print(ad.head())
Next we run a linear regression of sales on TV to calculate the coefficients and print a summary output.
In [3]:
# Run regression and calculate fit
reg = sm.OLS(ad.sales, exog=sm.add_constant(ad.TV)).fit()
# Alternatively: reg = sm.OLS(ad.sales, sm.add_constant(ad.TV)).fit()
print(reg.summary())
Side note: If you are an R user and prefer that syntax, you can import the statsmodels.formula.api subpackage and run something like:

from statsmodels.formula.api import ols
reg = ols('sales ~ TV', data=ad).fit()
reg.summary()

With the formula interface, the intercept is added automatically.
Both p-values for intercept ($\hat{\beta}_0$) and slope ($\hat{\beta}_1$) are smaller than any reasonable significance level and thus we can reject the null hypothesis that either of the coefficients is zero (or irrelevant).
Instead of printing the whole summary, we could also access each of the three summary tables individually, as follows:
In [4]:
reg.summary().tables[1]
Out[4]:
The sm.OLS() function calculates all kinds of regression-related results, and all this information is attached to the reg object. For example, to plot the data we need the model's fitted values. These can be accessed through the regression object's .fittedvalues attribute, as in reg.fittedvalues. The plot below shows how this can be of use: we plot the data and the fit using the standard plotting functions.
In [5]:
# Plot scatter & lm
plt.figure(figsize=(12, 8))
plt.scatter(ad.TV, ad.sales, marker='.', label='Sample') # Training data
plt.plot(ad.TV, reg.fittedvalues, c='k', label='Fit') # Linear fit
plt.ylabel('Sales')
plt.xlabel('TV')
plt.legend();
Let us plot residuals versus fitted values to do some visual regression diagnostics:
In [6]:
plt.figure(figsize=(12, 8))
plt.scatter(reg.fittedvalues, reg.resid)
plt.axhline(y=0, c='k')  # Black horizontal line at 0
plt.xlabel('Fitted values')
plt.ylabel('Residuals');
The above two plots are just standard matplotlib plots serving the purpose of visual diagnostics. Beyond that, the statsmodels package has separate built-in plotting functions suited for visual regression diagnostics. We will not discuss them in detail here, but you will find plenty of resources on the web.
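To give a flavor, here is a minimal sketch of two of these built-in tools: a normal Q-Q plot of the residuals and the combined regression plots for the TV regressor.

# Built-in diagnostic plots from statsmodels (sketch)
fig1 = sm.qqplot(reg.resid, line='s')  # Q-Q plot of residuals vs. normal
fig2 = plt.figure(figsize=(12, 8))
fig2 = sm.graphics.plot_regress_exog(reg, 'TV', fig=fig2)  # Fit, residual & partial plots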
The reg object contains a ton of information, all of which you can access. To see what there is, type reg. and press tab. Two examples are shown below. Notice that some are attributes (like x.shape, x.size) and do not need parentheses when called. Others are methods (similar to .sum(), .min()) and require parentheses.
In [7]:
# Regression coefficients
print(reg.params, '\n')
print(reg.resid.head())
In [8]:
# 99% CI (alpha = 1%) based on t-distribution
reg.conf_int(alpha=0.01)
Out[8]:
The regression summary provides the $t$-statistic and p-value for the null hypothesis $H_0: \beta_j = 0$ vs. $H_a: \beta_j \neq 0$. You can access the resulting $t$-statistics and p-values through the corresponding attributes.
In [9]:
print(reg.tvalues, '\n')
print(reg.pvalues)
If you wish to test a different null hypothesis, e.g. $H_0: \beta_{TV} = 0.054$ vs. $H_1: \beta_{TV} \neq 0.054$, use the following code:
In [10]:
reg.t_test('TV=0.054')
Out[10]:
As far as I know, statsmodels does not provide a function for 'greater than' or 'smaller than' alternative hypotheses. The reason: with symmetric distributions, the one-sided test can be derived from the two-sided test, as a one-sided p-value is just half of the two-sided p-value. This means that, given the p and $t$ values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and $t$ > 0, and of a less-than test when p/2 < alpha and $t$ < 0.
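As a minimal sketch (at a 5% significance level), a 'greater than' test for the hypothesis above could thus be pieced together as follows:

# One-sided test Ha: beta_TV > 0.054, derived from the two-sided t-test
tt = reg.t_test('TV=0.054')
tstat = tt.tvalue.item()
pOneSided = tt.pvalue.item() / 2
print('Reject H0 in favor of beta_TV > 0.054:',
      (pOneSided < 0.05) and (tstat > 0))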
The $R^2$ measure, or "coefficient of determination", displays the proportion of the variability in $y$ that is explained by the regression fit. It is defined as

$$\begin{equation} R^2 = \frac{TSS - SSR}{TSS} = 1 - \frac{SSR}{TSS} \end{equation}$$

where TSS is the total sum of squares, defined as $TSS = \sum (y_i - \bar{y})^2$, and SSR is the sum of squared residuals, given by $SSR = \sum (y_i - \hat{y}_i)^2$.
It is easy to call the $R^2$ value from the regression object reg
as the following line shows.
In [11]:
# R squared measure
reg.rsquared
Out[11]:
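As a sanity check, we can reproduce this value directly from the definition above:

# Reproduce R^2 from its definition
SSR = np.sum(reg.resid**2)
TSS = np.sum((ad.sales - ad.sales.mean())**2)
print(1 - SSR/TSS)  # Matches reg.rsquared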
In general we assume a constant variance of the error term (homoskedasticity; $Var(\epsilon_i) = \sigma^2$ for $i = 1, \ldots, N$). From the residuals vs. fitted plot we have to question this assumption. To test it formally, you can run a heteroskedasticity test. The statsmodels package offers several test options; the more common ones are White's test and the Breusch-Pagan test.
Below, the White test is applied as an example. The argument reg.model.exog simply contains the matrix X (here a [200 x 2] matrix with the constant 1 in the first column and the values for TV in the second). The output should become clearer when you check the function's help page (use ?sm.stats.diagnostic.het_white).

The null hypothesis is that the error variance does not depend on X, i.e. is homoskedastic. Based on the large f-statistic value (and correspondingly small p-value) we can safely reject the null hypothesis that the error variance is homoskedastic.
In [12]:
# Test for heteroskedasticity with White test
wht = sm.stats.diagnostic.het_white(resid=reg.resid, exog=reg.model.exog)
print('f-statistic: {0:>19.4f} \n'
'p-value for f-statistic: {1:>7.4f}'.format(wht[2], wht[3]))
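The Breusch-Pagan test mentioned above works analogously; here is a minimal sketch (the function returns the LM statistic and its p-value as the first two elements):

# Test for heteroskedasticity with Breusch-Pagan test
bp = sm.stats.diagnostic.het_breuschpagan(resid=reg.resid,
                                          exog_het=reg.model.exog)
print('LM statistic: {0:>18.4f} \n'
      'p-value for LM statistic: {1:>6.4f}'.format(bp[0], bp[1]))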
If you wish to run tests with heteroskedasticity-robust standard errors, you can either access the reg object's robust standard errors (reg.HC0_se, reg.HC1_se, reg.HC2_se, reg.HC3_se) or, more conveniently, directly define the covariance estimator (e.g. 'HC3' as below) when you create the object in the first place. The example below shows how you can do this.
In [13]:
regRobust = sm.OLS(ad.sales, exog=sm.add_constant(ad.TV)).fit(cov_type='HC3')
print(regRobust.HC3_se, '\n')
print(reg.HC3_se)
The statsmodels package offers many more functions to run regression diagnostics, e.g. checks for autocorrelation, non-linearity, normality of residuals etc. These functions are applicable to both simple and multiple linear regression models. There is also a short Jupyter notebook on the web detailing some of the options.
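As a small teaser, two such checks applied to our reg object might look like this (a sketch):

# Durbin-Watson statistic for autocorrelation (values near 2 suggest none)
print('Durbin-Watson: ', sm.stats.durbin_watson(reg.resid))
# Jarque-Bera test for normality of residuals
jb, jbPval, skew, kurt = sm.stats.jarque_bera(reg.resid)
print('Jarque-Bera p-value: ', jbPval)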
A stock beta measures the systematic risk of a security, i.e. its tendency to respond to swings in the broad market. Typically, a large, well diversified index is taken as a proxy for the market portfolio (e.g. S&P 500, Euro Stoxx 50, SPI, etc.). There are different ways to calculate a stock beta. We will show the regression approach, where a stock's beta is the slope of the following linear regression:

$$\begin{equation} r - r_f = \alpha + \beta(r_M - r_f) + e \end{equation}$$

Let us look into Geberit's stock beta. As a proxy for the market portfolio we use the Swiss Market Index (SMI). The risk-free rate is set to $r_f=0$, which is a fairly reasonable approach in light of the Swiss National Bank's (SNB) interest rates over the past five years (at the time of this writing in 2017/2018). We will work with monthly returns for the past five years (60 months), though other approaches (e.g. the last 24 monthly returns, weekly returns for the last two years, etc.) are reasonable choices too. We load the stock and SMI data from a csv file that was sourced through a financial data provider.
In [14]:
df = pd.read_csv('Data/SMIDataMonthly.csv', sep=',',
parse_dates=['Date'], dayfirst=True,
index_col=['Date'])
df.head()
Out[14]:
The dataframe df contains monthly closing prices of all SMI stocks (incl. the SMI index) with a date index in descending order. Let us create a separate pandas dataframe dfRets with the returns of the past 60 months for both GEBN and the index.
In [15]:
# Calculate log-returns and assign to variable dfRets
# (prices are sorted in descending order, hence .shift(-1))
dfRets = pd.DataFrame()
dfRets['GEBNrets'] = np.log(df['GEBN'] / df['GEBN'].shift(-1))
dfRets['SMIrets'] = np.log(df['SMI'] / df['SMI'].shift(-1))
print(dfRets.head())
Having done that, we are already set to run the regression and print the results.
In [16]:
# Set observation period (last 60 monthly returns)
months = 60
# Create OLS object, run regression and calculate fit
regBeta = sm.OLS(endog=dfRets.iloc[:months, 0],
exog=sm.add_constant(dfRets.iloc[:months, 1])).fit()
# Show table on coefficients
print(regBeta.summary())
Based on the regression output we can clearly reject the null of Geberit's stock beta being equal to zero. The $R^2$ measure, though, shows that only a small portion of the variation in Geberit's monthly returns is explained by the SMI's monthly returns.
In [17]:
# Get relevant information
beta = regBeta.params['SMIrets']
alpha = regBeta.params["const"]
rsqr = regBeta.rsquared
# Plot scatter & lm; add text with alpha, beta, R2
plt.figure(figsize=(12, 8))
plt.scatter(dfRets.iloc[:months, 1],
dfRets.iloc[:months, 0],
marker='.', label='Monthly Returns')
plt.plot(dfRets.iloc[:months, 1], regBeta.fittedvalues, c='k', label='Fit')
plt.gca().set_aspect('equal')
plt.gca().set_xlim(-0.1, 0.15)
plt.ylabel('Geberit Monthly Returns')
plt.xlabel('SMI Monthly Returns')
plt.legend(loc='lower right')
plt.text(-0.08, 0.13, 'Beta: {0: .2f}'.format(beta))
plt.text(-0.08, 0.12, 'Alpha: {0: .2f}'.format(alpha))
plt.text(-0.08, 0.11, 'R^2: {0: .2f}'.format(rsqr));
What we calculated above is often referred to as the raw beta. The beta value of a stock has been found to be, on average, closer to the mean value of 1.0, the beta of an average-systematic-risk portfolio, than the raw beta would suggest (Pinto et al. (2016)). This is why data providers such as Bloomberg publish the adjusted beta as first introduced by Blume (1971), which is calculated as

$$ \text{Adjusted beta} = 2/3 \cdot \text{raw beta} + 1/3 \cdot 1$$

Now, let us assume we are given the task to investigate whether a beta indeed regresses to 1 over time. For that we could, as a starting point, assess a stock's rolling beta over the past years. Note that this is just an illustrative use case; computationally it would be much faster to calculate the stock beta via the covariance/variance formula.
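For illustration, both the Blume adjustment and the covariance/variance shortcut take only a few lines. A minimal sketch, reusing the beta variable and the dfRets dataframe from above:

# Blume-adjusted beta from the raw beta calculated above
adjBeta = 2/3 * beta + 1/3 * 1
print('Adjusted beta: {0: .4f}'.format(adjBeta))
# Cross-check: raw beta via the covariance/variance formula
covMat = dfRets.iloc[:months].cov()
print('Beta (cov/var): {0: .4f}'.format(covMat.loc['GEBNrets', 'SMIrets']
                                        / covMat.loc['SMIrets', 'SMIrets']))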
In [18]:
def rollingBeta(df, window=60):
    '''Calculates the rolling beta of a stock.

    Parameters
    ==========
    df : [n x 2] pandas dataframe with log-returns for
         stock and market portfolio. Index should be a
         datetime series.
    window : rolling window with default value 60 [optional]

    Returns
    =======
    rb : pandas dataframe with (backward-looking) rolling beta.
    '''
    # Drop NA rows from df
    df = df.dropna()
    # Set up empty results array
    res = np.empty(len(df) - window + 1)
    # Loop over all subsets of full window length
    for i in range(len(df) - window + 1):
        # Subset df
        sub = df.iloc[i:window+i, :]
        # Run regression
        model = sm.OLS(endog=sub.iloc[:, 0],
                       exog=sm.add_constant(sub.iloc[:, 1])).fit()
        # Read out beta coefficient (slope)
        res[i] = model.params.iloc[1]
    # Format output to dataframe
    rb = pd.DataFrame(data=res, index=df.index[:(len(df)-window+1)])
    rb.columns = ['RollingBeta']
    return rb
Now we are ready to call the function. This time we use the last two years of monthly returns. Thus we set window=24
to overwrite the default value of 60.
In [19]:
# Call function and save output to 'rollBeta'
rollBeta = rollingBeta(df=dfRets, window=24)
Though this is far from a thorough analysis, plotting the results shows that, at least in Geberit's case, there is indeed some truth to the assessment that the beta exhibits some reversion to the market beta value of 1.
In [20]:
# Import 'mdates' library to format dates in x-axis
import matplotlib.dates as mdates
# Plot rolling beta
fig, ax = plt.subplots(1, figsize=(12, 8))
ax.plot(rollBeta, label='Geberit Rolling Beta')
ax.axhline(y=1, c='gray', linestyle=':') # Horizontal line
ax.legend(fontsize=12)
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%Y'))
fig.autofmt_xdate(); # Autorotate x-axis for readability
Simple linear regression serves well to introduce the concept and to build a good understanding. However, in reality we often have to work with more than one predictor. In the advertising data set, for example, we had data not only on TV advertising spending but also on radio and newspaper spending. It thus makes sense to extend the simple to a multiple linear regression model.
We again use the Advertising data set to see how this is done in Python. The same functions from the statsmodels package apply to multiple linear regression. We run the following regression:
In [21]:
# Assign features and response to X and y
y = ad.sales
X = ad[['TV', 'radio', 'newspaper']]
X = sm.add_constant(X)
In [22]:
# Run regression and print summary
mlReg = sm.OLS(endog=y, exog=X).fit()
print(mlReg.summary())
The coefficient for radio (0.1885) tells us that, holding all other factors fixed, an additional 1'000 dollars in radio advertising spending will boost the product's sales by 188.5 units.
Again: If you are an R user and prefer that syntax, you can import the statsmodels.formula.api subpackage and run something like:

from statsmodels.formula.api import ols
mlReg = ols('sales ~ TV + radio + newspaper', data=ad).fit()
mlReg.summary()
Again, the summary above provides the $t$-statistic and p-value for each individual regression coefficient. As was the case for the simple linear regression, the underlying null hypothesis is that each parameter is zero ($H_0: \beta_j = 0$). For TV and radio we reject the null even at the 1% significance level. However, given the large p-value for newspaper, we fail to reject the null $\beta_{\text{newspaper}} = 0$ at any reasonable level. Thus we can conclude that leaving the newspaper data out might be a reasonable option. If other null hypotheses ought to be tested, we can use the same command as shown above.
In [23]:
# t-test on H0: beta(TV) = 0.0475
mlReg.t_test('TV=0.0475')
Out[23]:
The output shows that we fail to reject the null hypothesis that $\beta_{TV} = 0.0475$.
Beyond the element-wise hypothesis tests, the regression summary also provides the $F$-statistic (and the corresponding p-value) for the combined hypothesis

$$\begin{align} H_0&: \quad \beta_1 = \beta_2 = \ldots = \beta_p = 0 \\ H_a&: \quad \beta_j \neq 0 \text{ for at least one $j$} \end{align}$$

On the basis of the corresponding p-value (i.e. 1.58e-96) we can reject the null at any reasonable significance level. Should we be interested in assessing a particular hypothesis, say

$$\begin{align} H_0&: \quad \beta_{\text{radio}} = \beta_{\text{newspaper}} = 0.1 \\ H_a&: \quad \beta_{\text{radio}} \neq 0.1 \text{ or } \beta_{\text{newspaper}} \neq 0.1 \end{align}$$

we use the .f_test() method.
In [24]:
# Test H0: beta(radio) = beta(newspaper) = 0.1
mlReg.f_test('radio = newspaper = 0.1')
Out[24]:
See the documentation page for further examples on how this function can be used.
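By the way, the overall $F$-statistic and its p-value reported in the summary can also be accessed directly as attributes:

# Overall F-test of the regression
print('F-statistic: ', mlReg.fvalue)
print('p-value: ', mlReg.f_pvalue)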
The $R^2$ measure for the MLR is the same as for the SLR. However, in the case of MLR it has one drawback: the value will always increase when more explanatory variables are added to the model, even if those variables are only weakly associated with the response. To make good on this disadvantage, a modified measure is often used: the adjusted $R^2$.

$$\begin{equation} R^2_{adj} = 1 - (1-R^2) \frac{n-1}{n-p-1} \end{equation}$$

To get this measure in Python, simply use the OLS object and call its .rsquared_adj attribute.
In [25]:
mlReg.rsquared_adj
Out[25]:
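Again, a quick sanity check reproduces this value from the formula above:

# Reproduce adjusted R^2 from its definition
n = mlReg.nobs   # Number of observations
p = 3            # Number of predictors (TV, radio, newspaper)
print(1 - (1 - mlReg.rsquared) * (n - 1) / (n - p - 1))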
We will apply the concept of multiple linear regression in the context of Fama-French's three-factor model (Fama and French (1993)). Their model follows Ross' arbitrage pricing theory, which postulates that excess returns are linearly related to a set of systematic risk factors (Ross et al. (1973)). The factors can be returns on other assets, such as the market portfolio, or any other variable (e.g. interest rates, inflation, consumption growth, market sentiment, hedging demands, etc.). Fama-French empirically discovered three factors that capture the systematic risk: (A) firm size, (B) book-to-market ratio (B/M) and (C) market risk. To quantify their findings, Fama-French constructed zero-net-investment factor portfolios capturing the systematic risk of firm size (the factor is labeled 'small minus big' (SMB) and is constructed by going long on small-size and short on big-size stocks) and B/M (labeled 'high minus low' (HML), i.e. going long on high-B/M and short on low-B/M stocks). The sensitivity of individual stocks to the three factors is then given by the estimated coefficients of a multiple linear regression. As a group, the three factors combine for the total risk premium.
The expected excess return $R_{it}$ of asset $i$ at time $t$ in the Fama-French three-factor model is described by
$$\begin{equation} R_{it} = \alpha_i + \beta_i^{M} (r_{M,t} - r_{f,t}) + \beta_i^{SMB} SMB_t + \beta_i^{HML} HML_t + \epsilon_{it} \end{equation}$$

The above Fama-French factors are calculated on a monthly basis and published on Kenneth R. French's website. There you will also find information on the methodology of the model and lots of other possible factors besides the three we look into here. To run this regression in Python we use a shortcut: the pandas_datareader package is capable of loading the data without having to download a txt or csv file in a separate step.
We will calculate the factor betas for all 20 SMI stocks. For that we use Fama-French's 'European 3 Factors' data. Following our 'Stock Beta' example from above, and for the sake of simplicity, the risk-free rate for Switzerland is again assumed to be zero (wrt. $R_{it}$).
In [26]:
import pandas_datareader as web
# Define obs. period, start & enddate
months = 60
startdate = '2012-06-01'
enddate = '2017-12-31'
Side Note: If you want to know what data is available (and the corresponding labels), you can run the get_available_datasets() function:

from pandas_datareader.famafrench import get_available_datasets
get_available_datasets()

or simply check Kenneth R. French's website.
In [27]:
# Load FF factors
data = web.DataReader('Europe_3_Factors', data_source='famafrench',
start=startdate, end=enddate)
The variable data is a dictionary with three entries: monthly data, annual data and a description. We select the monthly rates in dictionary entry 0 and format the data.
In [28]:
# Select monthly data
ff = data[0]
# Sort data in descending order
ff = ff.sort_index(ascending=False)
# Convert returns to decimal percentages
ff = ff/100
print(ff.head(3))
Next we calculate the log-returns of all SMI stocks. The share prices are taken from dataframe df
which we loaded above.
In [29]:
shsRets = np.log(df / df.shift(-1))
shsRets = shsRets['2017-12-31':'2012-06-01']
shsRets = shsRets.iloc[:, :-1] # We exclude the last column (with SMI data)
shsRets.head(3)
Out[29]:
In order to run the regression we need matching indices. Fama-French's index has the format 'yyyy-mm' while our dataframe with returns has the format 'yyyy-mm-dd'. Since we know that both dataframes have the same length, we can simply overwrite the index of one of them.
In [30]:
# Create matching indices
ff.index = shsRets.index
We are now in a position to run the multiple linear regression. We will again use the past 60 months. From the Fama French set we just need the first three columns. Column 4 is the risk free rate which we do not use.
In [31]:
# Add constant to matrix for alphas (=intercept)
X = sm.add_constant(ff.iloc[:months, :3])
In [32]:
# Assign ticker to variable
tickers = shsRets.columns
In [33]:
# Create results matrix for alpha, the three betas and adj. R2 per ticker
res = np.empty(shape=(5, len(tickers)))
In [34]:
# Run regression for each ticker
for i in range(len(tickers)):
    # Select returns of share i
    sub = shsRets.iloc[:months, i]
    # Run regression
    model = sm.OLS(endog=sub, exog=X).fit()
    # Paste intercept and beta factors to 'res' matrix
    res[0:4, i] = model.params
    res[4, i] = model.rsquared_adj
In [35]:
# Format output to dataframe
ff3f = pd.DataFrame(data=res, index=['Alpha', 'BetaMkt', 'BetaSMB', 'BetaHML', 'R2_adj'])
ff3f.columns = tickers
And here are our factors:
In [36]:
ff3f
Out[36]:
In [37]:
# Transpose matrix (.T) and display stats summary
print(ff3f.T.describe())
Now that we have all the factors, let's assume we want to build a portfolio with all 20 SMI stocks that maximizes the Sharpe ratio (SR). As a further condition we want to limit our exposure to the SMB factor to, let's say, $\beta_p^{SMB} = 0$. How would we allocate our investment under these conditions? In mathematical terms we have the following optimization problem:
$$\begin{equation} \max_{w_i} SR = \frac{\mathbb{E}[r_p] - r_f}{\sigma_p} \qquad s.t. \qquad \begin{cases} \sum w_i &= 1 \\ \beta_p^{SMB} &= 0 \end{cases} \end{equation}$$

Usually, historical returns are taken to estimate the expected return $\mathbb{E}[r_p]$. In our case, we will instead take the expected returns given by our Fama-French three-factor model (denoted $\mathbf{R_{ff}}$). The portfolio volatility $\sigma_p$, however, we estimate using historical data. Alternatively, one could think of taking the SMI volatility index value as a proxy, but since our portfolio will not have the same weights per stock as the SMI, this might be questionable. With that we have
$$\begin{equation} \max_{w_i} SR = \frac{\mathbf{w}^T \left(r_f + \mathbf{\alpha} + \mathbf{\beta}^{M} (r_{M} - r_{f}) + \mathbf{\beta}^{SMB} SMB + \mathbf{\beta}^{HML} HML \right) - r_f}{\sqrt{\mathbf{w}^T \mathbf{\Sigma}\mathbf{w}}} \qquad s.t. \qquad \begin{cases} \sum w_i &= 1 \\ \beta_p^{SMB} &= 0 \end{cases} \end{equation}$$

Python can solve this problem numerically. We first set the stage by defining an auxiliary function pfStats that returns the expected portfolio return, volatility and Sharpe ratio for a given vector of weights. Note that the function also makes use of other data, like the monthly returns and the previously defined risk-free rate (set to 0), but only the weights enter as a function input. This is necessary for the optimization function.
In [38]:
# Define rf and (equally spread) start weights
rf = 0
wghts = np.repeat(1. / len(tickers), len(tickers))
# Expected stock returns based on ff3f model
expShsRets = rf + ff3f.T.Alpha + \
ff3f.T.BetaMkt * ff['Mkt-RF'].mean() + \
ff3f.T.BetaSMB * ff.SMB.mean() + \
ff3f.T.BetaHML * ff.HML.mean()
In [39]:
def pfStats(weights):
    '''Returns basic measures for a portfolio.

    Parameters
    ==========
    weights : array-like
        weights for different securities in portfolio

    Returns
    =======
    expPfRet : float
        weighted, annualized expected portfolio return based on ff3f model
    pfVol : float
        historical annualized portfolio volatility
    SR : float
        portfolio Sharpe ratio for given riskfree rate
    '''
    expPfRet = np.sum(weights * expShsRets) * 12
    pfVol = np.sqrt(np.dot(weights.T, np.dot(shsRets.cov() * 12, weights)))
    SR = (expPfRet - rf) / pfVol
    return np.array([expPfRet, pfVol, SR])
Python's scipy package has a sublibrary for constrained optimization problems. We will use the minimize function and minimize the negative value of the Sharpe ratio (which is obviously equivalent to maximizing the SR).
In [40]:
import scipy.optimize as sco

def minSR(wghts):
    # Objective function: negative Sharpe ratio (3rd element of pfStats)
    return -pfStats(wghts)[2]
Our constraints are as stated above: $\sum w_i = 1$, $\beta_p^{SMB} = 0$. Additionally, we set bounds for the weights such that short and long positions are allowed, but only up to 100% per share ($w_i \in [-1, 1]\; \forall i \in \{1, 2, \ldots, n\}$).
In [41]:
# Constraints and bounds
constr = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1},
{'type': 'eq', 'fun': lambda x: np.sum(x * ff3f.T.BetaSMB) - 0}]
bnds = tuple((-1,1) for x in range(len(tickers)))
In [42]:
# Minimization function
optPf = sco.minimize(minSR, x0=wghts, method='SLSQP', bounds=bnds, constraints=constr)
In [43]:
# Check if conditions are actually met
print('Sum of weights: ', np.sum(optPf['x']))
print('Beta SMB factor: ', np.sum(optPf['x'] * ff3f.T.BetaSMB))
In [44]:
# Calculate portfolio stats given optimal weights
rsltsOptPf = pfStats(optPf['x'])
# Format weights into dataframe with Tickers as heading
optWghts = pd.DataFrame(data=optPf['x'], index=tickers)
optWghts.columns = ['optimalWghts']
# Print results
print('Portfolio return: ', str(rsltsOptPf[0]))
print('Portfolio volatility: ', str(rsltsOptPf[1]))
print('Portfolio SR: ', str(rsltsOptPf[2]), '\n')
print(str(optWghts))
In writing this notebook, many resources were consulted. For internet resources the links are provided within the text above and will therefore not be listed again. Beyond these links, the following resources were consulted and are recommended as further reading on the discussed topics: