Many studies have documented long-term historical phenomena in securities markets that contradict the Efficient Market Hypothesis (EMH). The EMH states that it is impossible to "beat the market" because market efficiency causes existing share prices to always incorporate and reflect all relevant information.
Behavioral finance attempts to fill the void by proposing psychology-based theories to explain market anomalies. Recently, several papers have focused on what is called "investor sentiment" -- the propensity of individuals to trade on "noise" and emotions rather than facts. Sentiment leads investors to hold beliefs about future cash flows and investment risks that are not justified by fundamentals.
Warren Buffett once said that as an investor it is wise to be "fearful when others are greedy and greedy when others are fearful." This contrarian view of stock markets relates directly to asset prices: when others are greedy, prices typically spike, and one should be cautious not to overpay for an asset; when others are fearful, an undervalued asset may present a good buying opportunity. This is the intriguing question we set out to explore in our project.
In this project, we ask ourselves one simple question: is aggregate retail investor sentiment a contrary indicator of future stock market returns? To investigate, we explore the possible relationship between investor sentiment and actual stock returns. Our project uses easily accessible public data to examine whether a negative correlation exists between the two variables. In turn, we test Buffett's famous investment philosophy against our actual results.
In [1]:
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import sys # system module, used to get Python version
import os # operating system tools (check files)
import datetime as dt # date tools, used to note current date
import seaborn as sns
# plotly imports
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
print('\nPython version: ', sys.version)
print('Pandas version: ', pd.__version__)
print("Today's date:", dt.date.today())
%matplotlib inline
plotly.offline.init_notebook_mode()
We use pandas, a Python package that allows for fast data manipulation and analysis; in pandas, a dataframe stores related columns of data. We use matplotlib to generate a variety of figures and graphics, the sys module to report the Python version, and os, an operating-system tool, to check files.
Using the sentiment data from the American Association of Individual Investors (AAII) and weekly stock return data from the Fama-French Weekly Factors, we created a list of dataframes. Each sentiment dataframe contains Date, % Bullish, % Neutral, and % Bearish. Each stock return dataframe contains Date, Excess Return, SMB, HML, RF, and Stock Index. One problem with the sentiment data is that it contains some noise: some rows are not relevant to the project, such as annual sentiment summaries. We therefore clean the data by slicing out the portion we want to examine. We also notice that the dates in the two datasets are not exactly aligned for many rows, so we fuzzy-pair the stock return data with the dates of the sentiment data. We then concatenate the two datasets, which leaves us with a clean side-by-side comparison.
In [2]:
# sentiment data
sentiment = pd.read_excel("http://www.aaii.com/files/surveys/sentiment.xls",skiprows=3);
stm = sentiment[["Date","Bullish","Neutral","Bearish"]]
In [3]:
stm.head()
Out[3]:
In [4]:
# There is some noise in the dataset, for example:
stm.tail()
Out[4]:
In [5]:
# clean sentiment data: keep only rows whose Date entry is an actual date
k = []
for i in range(len(stm.index)):
    if type(stm["Date"][i]) == type(stm["Date"][3]):
        k.append(i)
stm2 = stm.loc[k].reset_index(drop=True)
stm2["Date"] = pd.to_datetime(stm2["Date"])
In [6]:
stm2.head()
Out[6]:
In [7]:
# weekly stock return data
from pandas_datareader.famafrench import get_available_datasets
import pandas_datareader.data as web
get_available_datasets();
r = web.DataReader('F-F_Research_Data_factors_weekly', 'famafrench')[0];
names = ["Excess_Return","SMB","HML","RF"];
r.columns = names
# Slice stock return data starting from the sentiment data's first date
start = r.index.searchsorted(stm2["Date"][0])
rd = r.iloc[start:]
r2 = rd.reset_index()
Here, we construct a synthetic stock index that starts at 1 and compounds the weekly returns from the first date of the dataset. Why do this? Because we want to plot the stock price movement and calculate returns over arbitrary windows, and an index lets us do both:
In [8]:
# build an index that starts at 1 and compounds the weekly total return
w = []
kk = 1
for i in range(len(r2)):
    kk = kk * (1 + r2["Excess_Return"][i] / 100 + r2["RF"][i] / 100)
    w.append(kk)
r2["Stock_Index"] = pd.Series(w)
In [9]:
iplot_mpl(r2.set_index("Date")["Stock_Index"].plot(figsize=(12,6)).get_figure())
r2.head()
Out[9]:
The sample data below shows that the dates are not paired exactly, so fuzzy pairing, i.e., matching dates approximately, is necessary here:
In [10]:
#The dates are not exactly paired for many cells. For example:
print(r2["Date"].loc[1496:1499])
print(stm2["Date"].loc[1493:1496])
In [11]:
# fuzzy-pair the dates in the two datasets (caution: this may run for a while)
dates = stm2["Date"]
dates2 = r2["Date"]
ii = []
jj = []
for i in range(len(dates)):
    for j in range(i, len(dates2)):
        timediff = dates2[j] - dates[i]
        if abs(timediff.days) < 3:
            ii.append(i)
            jj.append(j)
print(len(ii) == len(jj))  # check that the matches are paired one-to-one
In [12]:
# Concatenate the two datasets
stm3 = stm2.iloc[ii].reset_index(drop=True)
stm3 = stm3.rename(columns={"Date": "Report_Date"})
r3 = r2.iloc[jj].reset_index(drop=True)
result = pd.concat([stm3, r3], axis=1)
In [13]:
result.tail()
Out[13]:
In [14]:
# Select Columns we would like to examine
examine = result[["Date","Bullish","Bearish","Excess_Return","Stock_Index"]].set_index("Date")
examine.head()
Out[14]:
In the box plot of the sentiment data, we can see that the typical share of investors feeling bullish is generally higher than the share feeling bearish.
In [15]:
#Boxplot of Sentiment Data
long = pd.melt(examine, value_vars=['Bullish','Bearish'], var_name='Sentiment', value_name='Ratio')
plt.figure(figsize=(6,8))
sns.boxplot(data=long, x="Sentiment", y="Ratio",palette=["g","r"])
plt.xlabel("Sentiment",fontsize=18)
plt.show()
In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Our plot below shows that the Bearish sentiment distribution has a heavier right tail, while the Bullish distribution is closer to normal.
In [16]:
#KDE of Sentiment Data
fig, ax1 = plt.subplots(figsize=(10,6))
sns.kdeplot(examine["Bullish"], ax=ax1, color="g")
sns.kdeplot(examine["Bearish"],ax=ax1, color="r")
ax1.legend()
fig.suptitle("Kernel Density",fontsize=18)
plt.show()
We use scatterplots to examine the correlation between Bullish and Bearish sentiment, and find a clear negative correlation between the two. From the scatterplots of stock returns versus sentiment, we notice nonconstant variance across the Bullish and Bearish readings. However, in both cases we see no correlation between sentiment and stock returns.
In [17]:
#Scatterplot and Correlation of Bullish versus Bearish Data
plt.figure(figsize=(6,6))
k = sns.regplot(x="Bullish", y="Bearish", data=examine)
note = "Correlation is " + str( round(examine["Bullish"].corr(examine["Bearish"]),3))
k.figure.text(0.4, 0.8, note, fontsize=18, weight="bold")
Out[17]:
In [18]:
sns.jointplot(x="Bullish", y="Excess_Return", data=examine,color="g")
sns.jointplot(x="Bearish", y="Excess_Return", data=examine,color="r")
plt.show()
The regression results show that the coefficients on the Bullish and Bearish sentiment ratios are not significantly different from zero, and the model barely explains the variation in weekly stock returns, with an R-squared of 0.006. This is not surprising: otherwise, intelligent players would have earned fortunes simply by tracking investor sentiment.
In [19]:
# Regression of Excess Return on Bullish and Bearish Sentiment Data
import statsmodels.formula.api as sm
reg = sm.ols(formula="Excess_Return ~ Bullish + Bearish", data=examine).fit()
reg.summary()
Out[19]:
Although the regression tells us nothing useful about the relationship between investor sentiment and stock returns, we may still ask whether an extremely bullish or bearish reading says something about market timing. In other words, does an extremely bullish reading imply that the market is about to peak, and does an extremely bearish reading suggest that the market has bottomed? To check, we first select the bullish and bearish observations that lie more than 3.5 standard deviations above their respective means. We then use the synthetic stock index generated above to see what happened to the market in the year after each extreme reading.
In [23]:
# Define outliers as observations more than 3.5 standard deviations above the mean. Subject to change.
toobull = examine[(examine.Bullish - examine.Bullish.mean()) >= (3.5 * examine.Bullish.std())].reset_index()
toobull
Out[23]:
In [24]:
# Define outliers as observations more than 3.5 standard deviations above the mean. Subject to change.
toobear = examine[(examine.Bearish - examine.Bearish.mean()) >= (3.5 * examine.Bearish.std())].reset_index()
toobear
Out[24]:
In [22]:
for i in range(len(toobull)):
    num = examine.index.get_loc(toobull["Date"][i])
    num2 = num - 52
    num3 = num + 53
    toobull_index_before = examine["Stock_Index"][num2:num + 1]  # from 52 weeks before up to this week
    toobull_index = examine["Stock_Index"][num:num3]  # from this week to 52 weeks after
    Annual_Return = str(round((toobull_index.iloc[-1] -
                               toobull_index.iloc[0]) * 100 / toobull_index.iloc[0], 2)) + "%"
    plt.figure(figsize=(12, 6))
    plt.plot(toobull_index_before, color="b", alpha=0.2)
    plt.plot(toobull_index, color="r")
    plt.ylabel("Stock Index")
    plt.title("Annual Return: " + Annual_Return, fontsize=20, loc="left", weight="bold")
    plt.annotate("Sell",
                 xy=(toobull["Date"][i], toobull["Stock_Index"][i] + 0.04),
                 xytext=(toobull["Date"][i], toobull["Stock_Index"][i] + 0.2),
                 fontsize=12,
                 weight="bold",
                 arrowprops=dict(facecolor='red', shrink=0.05))
for i in range(len(toobear)):
    num = examine.index.get_loc(toobear["Date"][i])
    num2 = num - 52
    num3 = num + 53
    toobear_index_before = examine["Stock_Index"][num2:num + 1]  # from 52 weeks before up to this week
    toobear_index = examine["Stock_Index"][num:num3]  # from this week to 52 weeks after
    Annual_Return = str(round((toobear_index.iloc[-1] -
                               toobear_index.iloc[0]) * 100 / toobear_index.iloc[0], 2)) + "%"
    plt.figure(figsize=(12, 6))
    plt.plot(toobear_index_before, color="b", alpha=0.2)
    plt.plot(toobear_index, color="g")
    plt.ylabel("Stock Index")
    plt.title("Annual Return: " + Annual_Return, fontsize=20, loc="left", weight="bold")
    plt.annotate("Buy",
                 xy=(toobear["Date"][i], toobear["Stock_Index"][i] - 0.02),
                 xytext=(toobear["Date"][i], toobear["Stock_Index"][i] - 0.06),
                 fontsize=12,
                 weight="bold",
                 arrowprops=dict(facecolor='green', shrink=0.05))
The results reveal an intriguing market-timing pattern: all three extreme data points look like turning points in the stock index. For example, when the extremely bullish reading appeared in January 2000, the dot-com bubble was about to collapse; and when everyone was extremely pessimistic about financial markets, a five-year bull market was beginning. Admittedly, this experimental design is not perfect. First, only a handful of extreme observations exist in the dataset, so there is not sufficient evidence to establish these market-timing patterns. Second, the experiment is retrospective rather than prospective: we used full-sample standard deviations of the predictors and looked back at the results. Nevertheless, these outliers could still be indicative when evaluating market inflection points.
We believe that Buffett's saying is somewhat informative to investors: "be fearful" when the market feels extremely bullish and "be greedy" when the market is far too pessimistic. Our findings from historical data suggest that when a positive outlier appears in the bullish or bearish sentiment data, the date of that outlier may mark the point where the market reverses direction. We could extend the study by connecting the sentiment data with macroeconomic factors to gain better predictive power in marking the turning points of the stock market.
"AAII | AAII Investor Sentiment Data." AAII | AAII Investor Sentiment Data. Quandl, n.d. Web. 4 May 2016.
"Efficient Market Hypothesis (EMH) Definition | Investopedia." Investopedia. N.p., 18 Nov. 2003. Web. 2 May 2016.
"How Investor Sentiment Affects Returns." CBSNews. CBS Interactive, n.d. Web. 2 May 2016.
"KFRENCH | Fama/French Factors (Weekly)." KFRENCH | Fama/French Factors (Weekly). Quandl, n.d. Web. 3 May 2016.