In [1]:
import pandas as pd
import datetime
import numpy as np
import scipy as sp
import os
import matplotlib.pyplot as plt
import matplotlib
from ggplot import geom_point
%matplotlib inline
# font = {'size' : 18}
# matplotlib.rc('font', **font)
matplotlib.rcParams['figure.figsize'] = (12.0, 6.0)
os.chdir("/Users/yulongyang/Envs/btc-analysis/btc-price-analysis")
In [3]:
time_format = "%Y-%m-%dT%H:%M:%S"
data = pd.read_csv("./data/price.csv", names=['time', 'price'], index_col='time',
parse_dates=[0], date_parser=lambda x: datetime.datetime.strptime(x[:-6], time_format))
data.head()
Out[3]:
The data points are spaced roughly 10 seconds apart.
In [4]:
data.plot(rot=0)
Out[4]:
In [5]:
# select all rows from 2015 (partial-string indexing on the DatetimeIndex)
df2015 = data.loc['2015']
df2015.plot(rot=0)
Out[5]:
A moving average is the mean of a sliding window of consecutive data points. It is mostly used to smooth out short-term fluctuations and highlight the overall trend.
In [9]:
# 60-point moving average (pd.rolling_mean has been removed from pandas)
df2015.rolling(60).mean().plot(rot=0, label='mov avg')
plt.legend()
Out[9]:
A classic trading strategy involves detecting the momentum of stock prices [1]. To detect momentum, we can use moving averages. By defining two moving averages with different window sizes, we can spot the discrepancy between long-term and short-term momentum, and trade the stock when appropriate.
In [16]:
fig, axes = plt.subplots()
# short window: 6*2 = 12 points
df2015.rolling(6*2).mean().plot(rot=0, ax=axes, legend=False)
# long window: 6*24 = 144 points
df2015.rolling(6*24).mean().plot(rot=0, ax=axes, legend=False)
plt.legend(['short','long'], loc="lower right")
Out[16]:
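The basic gist: when the two lines cross, short-term momentum has changed direction relative to the long-term trend, which is a candidate signal to trade.
If the transaction time or duration were different, how would this strategy change? Could we already form a trading strategy based on this?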
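The two-moving-average idea can be sketched roughly as follows. This is only an illustration on synthetic prices, not a tested strategy: the helper name `crossover_signals` and the window sizes are assumptions, and it expects `short_window` to be smaller than `long_window`.

```python
import pandas as pd


def crossover_signals(prices, short_window, long_window):
    """Mark the points where the short moving average crosses the long one.

    Returns a Series of +1 (short crosses above long), -1 (short crosses
    below long) and 0 (no crossing). Hypothetical helper for illustration.
    """
    short_ma = prices.rolling(short_window).mean()
    long_ma = prices.rolling(long_window).mean()

    above = short_ma > long_ma
    prev_above = above.shift(1, fill_value=False)
    # Only compare points where the long average exists now and one step
    # earlier, so the warm-up period does not produce a spurious signal.
    valid = long_ma.notna() & long_ma.shift(1).notna()

    signals = pd.Series(0, index=prices.index)
    signals[valid & above & ~prev_above] = 1
    signals[valid & ~above & prev_above] = -1
    return signals


# Synthetic V-shaped price path: falls from 20 to 1, then rises back to 20.
prices = pd.Series([float(v) for v in list(range(20, 0, -1)) + list(range(2, 21))])
signals = crossover_signals(prices, short_window=2, long_window=5)
print(signals[signals != 0])
```

On this V-shaped path the only signal is a single upward crossing shortly after the bottom. On real, noisy prices the two averages can cross back and forth many times, so the window sizes matter a lot.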
Another moving metric of interest would be a pairwise moving correlation, for example between price and trading volume.
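A moving correlation could look something like the sketch below. Note that the price file used here has no volume column, so both series are synthetic stand-ins and `rolling_correlation` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def rolling_correlation(a, b, window):
    """Pearson correlation of two aligned Series over a sliding window."""
    return a.rolling(window).corr(b)


# Synthetic stand-ins: the notebook's data set has no volume column.
rng = np.random.RandomState(0)
index = pd.date_range("2015-01-01", periods=50, freq="D")
price_change = pd.Series(rng.normal(size=50), index=index)
# A "volume" series that is an exact linear function of the price change,
# so every full window should show a correlation of (almost exactly) 1.
volume_change = 2.0 * price_change + 3.0

corr = rolling_correlation(price_change, volume_change, window=10)
print(corr.dropna().head())
```

The first `window - 1` values are NaN because the window is not yet full.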
The return is defined as the relative change in price from one period to the next:
$$r_t = \frac{p_t}{p_{t-1}} - 1$$
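As a quick sanity check of the formula, here is a tiny worked example on made-up prices. It writes the definition out by hand and compares it with pandas' `pct_change`.

```python
import numpy as np
import pandas as pd

# Made-up prices: up 10%, then down 10%.
prices = pd.Series([100.0, 110.0, 99.0])

# r_t = p_t / p_{t-1} - 1, written out with a one-step shift
manual = prices / prices.shift(1) - 1
# pandas' built-in equivalent
builtin = prices.pct_change()

print(manual.tolist())
print(builtin.tolist())
```

The first return is NaN in both cases because there is no previous price to compare with.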
In [18]:
# daily open/high/low/close (the how= argument has been removed from resample)
df2015ohlc = df2015.resample('D').ohlc()
df2015ohlc.head()
Out[18]:
In [19]:
df2015pct = df2015ohlc['price']['close'].pct_change()
In [20]:
df2015pct.plot(legend=True, label='daily return index')
Out[20]:
The return also depends on how frequently one can trade. However, BTC transactions may settle at a different pace from regular stock transactions. A BTC transaction is appended to the block chain and verified by BTC hashing machines (miners). In general, a transaction is considered complete and safe enough after 6 or more confirmations. Each additional confirmation reduces the risk of fraud exponentially. The average wait for a confirmation is about 10 minutes. On the other hand, it is said that for newly mined coins one has to wait for more than 100 confirmations.
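To make the "exponentially" claim concrete, here is a sketch of the attacker catch-up calculation from section 11 of the Bitcoin whitepaper. The function name and the 10% attacker hash-power share are assumptions for illustration only.

```python
import math


def attacker_success_probability(q, z):
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind (calculation from the Bitcoin whitepaper, section 11).
    """
    p = 1.0 - q
    lam = z * (q / p)
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


# Assumed attacker share of 10% of the network hash power.
for z in range(0, 7):
    print(z, attacker_success_probability(0.1, z))
```

With these assumptions the probability shrinks by roughly a factor of four with each extra confirmation, which is why a handful of confirmations is usually treated as safe enough.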
If we look at a particular platform, such as Coinbase, its confirmation time might differ. Our strategies or algorithms may need to account for this, because the prediction horizon changes if we can only sell a coin after a certain period of time.
For now, let's say we can only sell a coin 10 minutes after buying it. Therefore, we are looking at the 10-minute return.
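A 10-minute return could be computed along these lines. This is a minimal sketch on synthetic data at 10-second spacing; `ten_minute_returns` is a hypothetical helper, and taking the last price in each 10-minute bin is one possible choice among several.

```python
import numpy as np
import pandas as pd


def ten_minute_returns(prices):
    """Return r_t = p_t / p_{t-1} - 1 on a 10-minute grid."""
    # last observed price in each 10-minute bin
    closes = prices.resample("10min").last()
    return closes.pct_change()


# Synthetic 30 minutes of data at 10-second spacing (180 points):
# the price sits at 100, then 110, then 99, for 10 minutes each.
index = pd.date_range("2015-01-01", periods=180, freq=pd.Timedelta(seconds=10))
prices = pd.Series(np.repeat([100.0, 110.0, 99.0], 60), index=index)

returns = ten_minute_returns(prices)
print(returns)
```

The three bins give one undefined return followed by +10% and -10%.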
In [21]:
# select only weekend data of 2015 (weekday: Monday=0, ..., Saturday=5, Sunday=6)
df2015ohlc_wknd = df2015[df2015.index.weekday>4].resample('D').ohlc()
# compute daily returns
wknd_return = df2015ohlc_wknd['price']['close'].pct_change()
wknd_return.describe()
Out[21]:
In [22]:
wknd_return.plot(legend=True, label='daily return index')
Out[22]:
Compare with weekdays:
In [23]:
# select only weekday data of 2015 (Monday=0 to Friday=4)
df2015ohlc_wkdy = df2015[df2015.index.weekday<5].resample('D').ohlc()
# compute daily returns
wkdy_return = df2015ohlc_wkdy['price']['close'].pct_change()
wkdy_return.describe()
Out[23]:
In [24]:
wkdy_return.plot(legend=True, label='daily return index')
Out[24]:
So the returns seem rather arbitrary here. There is no clear difference between weekdays and weekends.
In [25]:
df2015pct_groupby = df2015ohlc['price']['close'].pct_change().groupby(df2015ohlc.index.weekday)
df2015pct_groupby.mean().plot(yerr=df2015pct_groupby.std(), xlim=[-1,7])
Out[25]:
Let's try all data.
In [26]:
dataohlc = data.resample('D').ohlc()
datapct = dataohlc['price']['close'].pct_change()
In [27]:
datapct_groupby = datapct.groupby(datapct.index.weekday)
datapct_groupby.mean().plot(yerr=datapct_groupby.std(), xlim=[-1,7])
Out[27]:
After all, although the mean return varies somewhat from day to day, the huge standard deviation shows that returns on any given day do not reliably follow that trend. This again shows the randomness and instability of the BTC price at present.
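One way to put a number on "huge std" is to compare the size of each weekday's mean return with its standard deviation. The sketch below does this on synthetic daily returns; the helper name, the column name `abs_mean_over_std` and the random data are all assumptions, not results from the BTC data set.

```python
import numpy as np
import pandas as pd


def weekday_summary(daily_returns):
    """Mean, std and |mean|/std of daily returns for each weekday (Monday=0)."""
    grouped = daily_returns.groupby(daily_returns.index.weekday)
    summary = grouped.agg(["mean", "std", "count"])
    summary["abs_mean_over_std"] = summary["mean"].abs() / summary["std"]
    return summary


# Synthetic daily returns: 52 full weeks starting on a Monday.
rng = np.random.RandomState(0)
index = pd.date_range("2015-01-05", periods=364, freq="D")
daily_returns = pd.Series(rng.normal(loc=0.001, scale=0.03, size=364), index=index)

summary = weekday_summary(daily_returns)
print(summary)
```

A ratio far below 1 means the day-to-day scatter dwarfs the average, so the weekday means carry little predictive information.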
In [28]:
df2015hourly = df2015.resample('h').ohlc()
df2015hourly.head()
Out[28]:
In [29]:
df2015hourly_groupby = df2015hourly['price']['close'].groupby(df2015hourly.index.hour)
df2015hourly_groupby.mean().plot(yerr=df2015hourly_groupby.std())
Out[29]:
There is definitely no pattern here.
In [30]:
plt.figure(figsize=(6,6))
plt.scatter(datapct.mean(), datapct.std(), label='all data')
plt.scatter(df2015pct.mean(), df2015pct.std(), label='2015 data', color='red')
plt.scatter(datapct['1/1/2015':'1/15/2015'].mean(), datapct['1/1/2015':'1/15/2015'].std(),
label='2015 down trend', color='green')
plt.scatter(datapct['1/16/2015':].mean(), datapct['1/16/2015':].std(), label='2015 up trend', color='grey')
plt.legend()
plt.xlabel("Average return")
plt.ylabel("Risk (std of return)")
Out[30]:
One can see that, first, BTC has generally delivered a high return over the whole data set. This is probably largely due to the spike to $1000 in 2013. In 2015, however, the return has not yet turned positive.
Looking a bit deeper, we notice that the 2015 return is probably dominated by a sharp fall at the beginning of the year, which drove the price down to ~$180. Since then, the price has been rising and falling. If we plot the returns of the two periods separately, we find that the period after the free fall shows a very good return with relatively small risk.
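The same risk/return comparison can also be tabulated instead of plotted. Below is a small sketch on synthetic daily returns; `risk_return_table` and its column names are hypothetical, and the standard deviation is used as the risk measure, as in the scatter plot.

```python
import pandas as pd


def risk_return_table(returns, periods):
    """Mean return and std (used here as the risk measure) for each named period."""
    rows = {}
    for name, (start, end) in periods.items():
        chunk = returns.loc[start:end].dropna()
        rows[name] = {"mean_return": chunk.mean(), "risk_std": chunk.std()}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic daily returns for January 2015:
# -5% a day until the 15th, +1% a day afterwards.
index = pd.date_range("2015-01-01", "2015-01-31", freq="D")
returns = pd.Series([-0.05] * 15 + [0.01] * 16, index=index)

table = risk_return_table(returns, {
    "down trend": ("2015-01-01", "2015-01-15"),
    "up trend": ("2015-01-16", None),
})
print(table)
```

Passing `None` as the end date leaves the period open-ended, which mirrors the `'1/16/2015':` slice used for the up trend.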