In [1]:
import pandas as pd
import datetime
import numpy as np
import scipy as sp
import os
import matplotlib.pyplot as plt
import matplotlib
from ggplot import geom_point
%matplotlib inline
# font = {'size'   : 18}
# matplotlib.rc('font', **font)
matplotlib.rcParams['figure.figsize'] = (12.0, 6.0)
os.chdir("/Users/yulongyang/Envs/btc-analysis/btc-price-analysis")

In [3]:
time_format = "%Y-%m-%dT%H:%M:%S"
data = pd.read_csv("./data/price.csv", names=['time', 'price'], index_col='time',
                   parse_dates=[0], date_parser=lambda x: datetime.datetime.strptime(x[:-6], time_format))
data.head()


Out[3]:
price
time
2015-03-23 20:53:03 263.00
2015-03-23 20:43:03 264.04
2015-03-23 20:33:03 263.49
2015-03-23 20:23:03 264.61
2015-03-23 20:13:03 264.25

Data points are roughly 10 minutes apart, as the timestamps above show.
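The sampling interval can be checked from the index itself rather than by eye. A minimal sketch, using only the five rows printed in Out[3] as input; the names `sample`, `spacing` and `typical_spacing` are illustrative and not part of the notebook:

```python
import pandas as pd

# The five rows shown in Out[3] (newest first, as in the CSV).
sample = pd.Series(
    [263.00, 264.04, 263.49, 264.61, 264.25],
    index=pd.to_datetime([
        "2015-03-23 20:53:03",
        "2015-03-23 20:43:03",
        "2015-03-23 20:33:03",
        "2015-03-23 20:23:03",
        "2015-03-23 20:13:03",
    ]),
    name="price",
)

# Sort oldest-first, then take the gap between consecutive timestamps.
spacing = sample.sort_index().index.to_series().diff().dropna()

# The median is less sensitive to occasional missing samples than the mean.
typical_spacing = spacing.median()
print(typical_spacing)
```

On the full data set the same three lines would be applied to `data.index`, assuming the index parsed as timestamps.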

Overall plot


In [4]:
data.plot(rot=0)


Out[4]:
<matplotlib.axes.AxesSubplot at 0x104472fd0>

2015 data


In [5]:
df2015 = data['2015']
df2015.plot(rot=0)


Out[5]:
<matplotlib.axes.AxesSubplot at 0x10c5268d0>

Moving average

A moving average is the mean of a fixed-size window of consecutive data points, recomputed as the window slides along the series. It is mostly used to smooth out short-term fluctuations and highlight the overall trend.
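To make the definition concrete, here is a small sketch on made-up numbers. It uses the `Series.rolling(...).mean()` spelling from newer pandas releases, where the older `pd.rolling_mean` function is no longer available; the price values and the window of 3 are assumptions chosen only for illustration.

```python
import pandas as pd

# Synthetic 10-minute price series; the values are made up for illustration.
idx = pd.date_range("2015-01-01", periods=12, freq="10min")
prices = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 16.0, 15.0, 17.0],
    index=idx,
)

# Mean over the last 3 observations; the first 2 rows have no full window yet
# and are therefore NaN.
window = 3
mov_avg = prices.rolling(window).mean()

# The same value computed by hand for the first complete window.
by_hand = prices.iloc[0:window].mean()

print(mov_avg.head())
```

A larger window smooths more but reacts more slowly to changes, which is the trade-off behind using a short and a long window together.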

10-hour moving average of the 2015 data (60 points at 10 minutes each)


In [9]:
pd.rolling_mean(df2015, 60).plot(rot=0, label='mov avg')
plt.legend()


Out[9]:
<matplotlib.legend.Legend at 0x10a477650>

Long window vs. short window (dual moving average)

A classic trading strategy involves detecting the momentum of stock prices [1]. To detect momentum, we can use moving averages. By defining two moving averages with different window sizes, we can spot discrepancies between long-term and short-term momentum, and trade the stock accordingly.


In [16]:
fig, axes = plt.subplots()
# 2-hour window (12 points at 10 minutes each)
pd.rolling_mean(df2015, 6*2).plot(rot=0, ax=axes, legend=False)
# 1-day window
pd.rolling_mean(df2015, 6*24).plot(rot=0, ax=axes, legend=False)
plt.legend(['short','long'], loc="lower right")


Out[16]:
<matplotlib.legend.Legend at 0x10e4ac850>

The basic idea is that when the two lines cross,

  1. the short-term average rising above the long-term average indicates a price rise
  2. the short-term average falling below the long-term average indicates a price drop
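The two crossing rules can be turned into explicit signals. The following is only a sketch on synthetic prices, not a tested strategy: the window sizes (2 and 4), the price path, and the names `short_ma`, `long_ma`, `buy_signals` and `sell_signals` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic prices: a fall followed by a rise, so the averages cross once.
prices = pd.Series(
    [10, 9, 8, 7, 6, 5, 6, 7, 8, 9, 10, 11, 12],
    dtype=float,
    index=pd.date_range("2015-01-01", periods=13, freq="10min"),
)

short_ma = prices.rolling(2).mean()  # short window
long_ma = prices.rolling(4).mean()   # long window

# +1 while the short average is above the long one, -1 while it is below,
# NaN until both averages exist.
position = np.sign(short_ma - long_ma)

# A crossover shows up as a change of sign between consecutive points.
change = position.diff()
buy_signals = change[change > 0].index   # short crosses above long
sell_signals = change[change < 0].index  # short crosses below long

print("buy at:", list(buy_signals))
print("sell at:", list(sell_signals))
```

Because both averages lag the price, the signal arrives after the turn has already started; how much that matters depends on how quickly a trade can be executed.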

If the transaction time/duration is different, how would this strategy change? Could we already form a trading strategy based on this?

Another rolling metric of interest would be a pairwise rolling correlation (perhaps between price and volume?).

Returns

With $p_t$ the price at time $t$, the return is defined as the fractional change in price from one period to the next:

$$r_t = \frac{p_t}{p_{t-1}} - 1$$
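As a quick check that `pct_change` computes exactly this quantity, the sketch below applies the formula by hand to the first five daily closing prices of 2015 from this notebook and compares the result with the built-in method. The names `manual` and `builtin` are illustrative.

```python
import pandas as pd

# First five daily closes of 2015, taken from the daily OHLC table in this notebook.
close = pd.Series(
    [313.28, 306.49, 279.94, 273.52, 277.80],
    index=pd.date_range("2015-01-01", periods=5, freq="D"),
)

# r_t = p_t / p_{t-1} - 1, written out with an explicit one-step shift.
manual = close / close.shift(1) - 1

# pandas' built-in equivalent.
builtin = close.pct_change()

# The first entry is NaN in both cases because there is no previous price.
print(pd.DataFrame({"manual": manual, "pct_change": builtin}))
```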

Daily return index


In [18]:
df2015ohlc = df2015.resample('d', how='ohlc')
df2015ohlc.head()


Out[18]:
price
open high low close
time
2015-01-01 315.70 317.33 312.93 313.28
2015-01-02 314.24 316.74 306.49 306.49
2015-01-03 306.30 311.36 278.57 279.94
2015-01-04 281.80 287.93 257.02 273.52
2015-01-05 277.15 279.03 264.39 277.80

In [19]:
df2015pct = df2015ohlc['price']['close'].pct_change()

In [20]:
df2015pct.plot(legend=True, label='daily return index')


Out[20]:
<matplotlib.axes.AxesSubplot at 0x1109c4d10>

The return also depends on the frequency of transactions. However, the transaction frequency of BTC might differ from that of regular stocks. A BTC transaction is appended to the block chain and verified by BTC hashing machines (miners). In general, a transaction is considered complete and safe enough after 6 or more confirmations. Each additional confirmation reduces the risk of fraud exponentially. The average wait for a confirmation is about 10 minutes. On the other hand, it is said that newly mined coins have to wait for more than 100 confirmations.
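The exponential drop in risk can be illustrated with the calculation from the Bitcoin whitepaper (section 11), which gives the probability that an attacker ever catches up with a chain that is z blocks ahead. This is a sketch of that formula only; the attacker's share of hashing power, q = 0.10, is an assumption chosen for illustration and says nothing about the real network.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-rate share q overtakes
    the honest chain after z confirmations (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with a majority always catches up eventually
    lam = z * (q / p)  # expected attacker progress while z honest blocks are mined
    total = 0.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total += poisson * (1.0 - (q / p) ** (z - k))
    return 1.0 - total


# Assumed attacker share of 10 percent, for 0 to 6 confirmations.
probs = [attacker_success_probability(0.10, z) for z in range(7)]
for z, prob in enumerate(probs):
    print(z, prob)
```

Under this assumption the probability falls by roughly a constant factor with each confirmation, which is why a fixed number such as 6 is commonly treated as sufficient.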

If we are looking at a particular platform, such as Coinbase, its confirmation time might be different. Our strategies or algorithms may need to account for this, because the prediction horizon changes if we can only sell a coin after a certain waiting period.

For now, let's say we can only sell a coin 10 minutes after we bought it. Therefore, we are looking at the 10-minute return.
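A 10-minute return can be computed by putting the prices on a regular 10-minute grid and comparing each price with the next one. The sketch below uses the five prices printed at the top of this notebook; treating the last price in each 10-minute bin as the tradable price is an assumption, and the names `ten_min`, `realized` and `forward` are illustrative.

```python
import pandas as pd

# The five most recent prices shown at the top of the notebook, oldest first.
prices = pd.Series(
    [264.25, 264.61, 263.49, 264.04, 263.00],
    index=pd.to_datetime([
        "2015-03-23 20:13:03",
        "2015-03-23 20:23:03",
        "2015-03-23 20:33:03",
        "2015-03-23 20:43:03",
        "2015-03-23 20:53:03",
    ]),
)

# Regular 10-minute grid; keep the last observed price in each bin.
ten_min = prices.resample("10min").last()

# Return realized over the previous 10 minutes.
realized = ten_min.pct_change()

# Return earned by buying now and selling one step (10 minutes) later.
forward = ten_min.shift(-1) / ten_min - 1

print(pd.DataFrame({"price": ten_min, "realized": realized, "forward": forward}))
```

The forward return is the quantity a prediction would target under the 10-minute holding constraint; it is simply the realized return shifted back by one step.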

Weekend effect?

Since BTC can be traded over the weekend, does the price behave differently then?


In [21]:
# select Friday through Sunday of 2015 (weekday > 3, with Monday = 0);
# use weekday > 4 to keep Saturday and Sunday only
df2015ohlc_wknd = df2015[df2015.index.weekday>3].resample('d', how='ohlc')
# compute daily returns
wknd_return = df2015ohlc_wknd['price']['close'].pct_change()
wknd_return.describe()


Out[21]:
count    35.000000
mean     -0.000178
std       0.081141
min      -0.264245
25%      -0.025278
50%       0.006581
75%       0.040831
max       0.153357
Name: close, dtype: float64

In [22]:
wknd_return.plot(legend=True, label='daily return index')


Out[22]:
<matplotlib.axes.AxesSubplot at 0x110a6c650>

Compare with weekdays:


In [23]:
# select only weekday data of 2015 (Monday through Friday)
df2015ohlc_wkdy = df2015[df2015.index.weekday<5].resample('d', how='ohlc')
# compute daily returns
wkdy_return = df2015ohlc_wkdy['price']['close'].pct_change()
wkdy_return.describe()


Out[23]:
count    56.000000
mean     -0.001272
std       0.060435
min      -0.233355
25%      -0.025529
50%       0.000388
75%       0.033654
max       0.118431
Name: close, dtype: float64

In [24]:
wkdy_return.plot(legend=True, label='daily return index')


Out[24]:
<matplotlib.axes.AxesSubplot at 0x1116e8210>

So the returns look rather arbitrary here; there is no clear difference between weekdays and weekends. Note that the weekend selection (weekday > 3) also includes Fridays, so the two groups overlap.
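One way to make this comparison cleaner is to compute the daily returns on the full calendar first and only then split them by day of the week, so that no return spans a gap left by the filtering and no day falls into both groups. A sketch on synthetic prices; the random-walk price path and its parameters are assumptions, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes for the same date range as the 2015 data (made-up values).
rng = np.random.default_rng(0)
days = pd.date_range("2015-01-01", "2015-03-23", freq="D")
close = pd.Series(250 * np.exp(rng.normal(0, 0.03, len(days)).cumsum()), index=days)

# Daily returns on the full calendar, before any filtering.
daily_return = close.pct_change().dropna()

# Monday is 0, so 5 and 6 are Saturday and Sunday.
is_weekend = daily_return.index.weekday >= 5
weekend_return = daily_return[is_weekend]
weekday_return = daily_return[~is_weekend]

summary = pd.DataFrame({
    "weekend": weekend_return.describe(),
    "weekday": weekday_return.describe(),
})
print(summary)
```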

Monday effect? (single day return)

Let's see if the price behaves differently on each day.


In [25]:
df2015pct_groupby = df2015ohlc['price']['close'].pct_change().groupby(df2015ohlc.index.weekday)
df2015pct_groupby.mean().plot(yerr=df2015pct_groupby.std(), xlim=[-1,7])


Out[25]:
<matplotlib.axes.AxesSubplot at 0x1120131d0>

Let's try all data.


In [26]:
dataohlc = data.resample('d', how='ohlc')
datapct = dataohlc['price']['close'].pct_change()

In [27]:
datapct_groupby = datapct.groupby(datapct.index.weekday)
datapct_groupby.mean().plot(yerr=datapct_groupby.std(), xlim=[-1,7])


Out[27]:
<matplotlib.axes.AxesSubplot at 0x1111daa90>

Overall, although the mean return shows something of a trend across the days of the week, the large standard deviation means that the returns on any given day do not reliably follow it. This again shows the randomness and instability of the BTC price at present.
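To put a rough number on how large the spread is relative to the mean, one could compare each day's mean return with its standard error. This is only a sketch on synthetic returns drawn with zero mean; the distribution, the seed and the column names `sem` and `t_stat` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns with no day-of-week effect built in (made-up values).
rng = np.random.default_rng(1)
days = pd.date_range("2014-01-01", "2014-12-31", freq="D")
daily_return = pd.Series(rng.normal(0.0, 0.05, len(days)), index=days)

# Mean, standard deviation and sample size per day of the week (Monday = 0).
by_day = daily_return.groupby(daily_return.index.weekday)
stats = by_day.agg(["mean", "std", "count"])

# Standard error of the mean, and the mean expressed in units of that error.
stats["sem"] = stats["std"] / np.sqrt(stats["count"])
stats["t_stat"] = stats["mean"] / stats["sem"]

print(stats)
```

As a rule of thumb, a mean that is within about two standard errors of zero is hard to distinguish from no effect at all.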

Hourly price

Does the time of day have any effect on the price?


In [28]:
df2015hourly = df2015.resample('h', how='ohlc')
df2015hourly.head()


Out[28]:
price
open high low close
time
2015-01-01 00:00:00 315.70 316.30 314.73 316.30
2015-01-01 01:00:00 316.12 316.93 315.76 316.41
2015-01-01 02:00:00 315.70 317.33 314.78 316.30
2015-01-01 03:00:00 316.12 316.93 315.11 315.60
2015-01-01 04:00:00 317.33 317.33 315.30 316.86

In [29]:
df2015hourly_groupby = df2015hourly['price']['close'].groupby(df2015hourly.index.hour)
df2015hourly_groupby.mean().plot(yerr=df2015hourly_groupby.std())


Out[29]:
<matplotlib.axes.AxesSubplot at 0x112c46ed0>

There is definitely no pattern here. Note that this plot groups hourly closing prices rather than returns.

Return v. risk

It is useful to plot the return against the associated risk. The risk is measured here by the standard deviation of the return.


In [30]:
plt.figure(figsize=(6,6))
plt.scatter(datapct.mean(), datapct.std(), label='all data')
plt.scatter(df2015pct.mean(), df2015pct.std(), label='2015 data', color='red')
plt.scatter(datapct['1/1/2015':'1/15/2015'].mean(), datapct['1/1/2015':'1/15/2015'].std(), 
            label='2015 down trend', color='green')
plt.scatter(datapct['1/16/2015':].mean(), datapct['1/16/2015':].std(), label='2015 up trend', color='grey')
plt.legend()
plt.xlabel("Average return")
plt.ylabel("Average risk")


Out[30]:
<matplotlib.text.Text at 0x112fa69d0>

One can see that, first, BTC has delivered a high return over the full history. This is probably largely due to the spike to $1000 in 2013. In 2015, however, the average return has not yet turned positive.

Looking a bit deeper, we notice that the 2015 return is largely due to a steep fall at the beginning of the year, which took the price down to ~$180. Since then, the price has been rising and falling. If we plot the returns of the two periods separately, we find that the period after the free fall shows a very good return associated with a relatively small risk.
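The same return and risk comparison can also be collected in a small table instead of a scatter plot. The sketch below uses synthetic daily returns with a decline built into the first 15 days; the numbers are made up, and only the period boundaries (1 to 15 January and 16 January onward) are taken from the cell above.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: a decline in the first half of January, mild drift after.
rng = np.random.default_rng(2)
days = pd.date_range("2015-01-01", "2015-03-23", freq="D")
drift = np.where(days < pd.Timestamp("2015-01-16"), -0.03, 0.005)
returns = pd.Series(drift + rng.normal(0, 0.03, len(days)), index=days)

# Same period boundaries as in the scatter plot.
periods = {
    "2015 data": returns,
    "2015 down trend": returns.loc["2015-01-01":"2015-01-15"],
    "2015 up trend": returns.loc["2015-01-16":],
}

# One row per period: mean return and risk (standard deviation of the return).
risk_return = pd.DataFrame(
    {name: {"mean_return": r.mean(), "risk_std": r.std()} for name, r in periods.items()}
).T

print(risk_return)
```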