In [9]:
import os
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)
Quandl provides a free API for Bitcoin pricing data.
The function below downloads a dataset from Quandl and caches it locally.
In [4]:
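def get_quandl_data(quandl_id):
    '''Download and cache a Quandl data series'''
    # '/' is not valid in file names, e.g. 'BCHARTS/KRAKENUSD' -> 'BCHARTS-KRAKENUSD.pkl'
    cache_path = '{}.pkl'.format(quandl_id).replace('/', '-')
    try:
        # Cache hit: load the previously pickled DataFrame (the file is closed afterwards)
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError):
        # Cache miss: download from Quandl and pickle the result for next time
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df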
I'm using pickle to serialize the downloaded data and save it to a file, so it won't be re-downloaded every time this script is run (this file is the cache referred to above).
The data returned is a pandas DataFrame.
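The caching idea can be tried without Quandl or a network connection. In this minimal sketch, `cached_fetch` and `fake_download` are hypothetical names, the "download" is simulated with a tiny in-memory DataFrame, and the cache is checked with `os.path.exists` rather than by catching the error as `get_quandl_data` does.

```python
import os
import tempfile

import pandas as pd


def cached_fetch(key, fetch_fn, cache_dir="."):
    """Return a DataFrame for `key`, calling `fetch_fn` only on a cache miss.

    `fetch_fn` is any zero-argument callable returning a DataFrame
    (a hypothetical stand-in for quandl.get).
    """
    # Same naming scheme as get_quandl_data: '/' cannot appear in a file name
    cache_path = os.path.join(cache_dir, "{}.pkl".format(key).replace("/", "-"))
    if os.path.exists(cache_path):
        # Cache hit: no download needed
        return pd.read_pickle(cache_path)
    # Cache miss: fetch once and store the pickled result
    df = fetch_fn()
    df.to_pickle(cache_path)
    return df


calls = []  # records every time the simulated download runs


def fake_download():
    calls.append(1)
    return pd.DataFrame(
        {"Weighted Price": [100.0, 101.5]},
        index=pd.to_datetime(["2017-01-01", "2017-01-02"]),
    )


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("BCHARTS/KRAKENUSD", fake_download, cache_dir=tmp)
    second = cached_fetch("BCHARTS/KRAKENUSD", fake_download, cache_dir=tmp)
    n_calls = len(calls)
    same = first.equals(second)

print(n_calls)  # 1
```

The second call is served from the pickle file, so the simulated download runs only once.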
We will now pull the Bitcoin data from the Kraken Bitcoin exchange.
In [5]:
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')
In [6]:
btc_usd_price_kraken.head()
Out[6]:
Generating a sample chart to see how the data looks.
In [10]:
btc_trace=go.Scatter(x=btc_usd_price_kraken.index,y=btc_usd_price_kraken['Weighted Price'])
py.iplot([btc_trace])
One thing is immediately apparent: the sharp drops in late 2014 and early 2016 indicate that some data is missing.
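Assuming those drops are days recorded with a price of exactly 0 (the zero values mentioned at the end of this notebook), a boolean mask is a quick way to list them. `zero_price_dates` is a hypothetical helper and the sample data is made up; with the real data the call would be `zero_price_dates(btc_usd_price_kraken)`.

```python
import pandas as pd


def zero_price_dates(df, col="Weighted Price"):
    """Return the index labels (dates) where `col` is exactly zero."""
    return df.index[df[col] == 0]


# Hypothetical sample shaped like the Kraken data: a DatetimeIndex and a
# 'Weighted Price' column, with one day recorded as 0.
sample = pd.DataFrame(
    {"Weighted Price": [320.5, 0.0, 318.2]},
    index=pd.to_datetime(["2014-11-01", "2014-11-02", "2014-11-03"]),
)

gaps = zero_price_dates(sample)
print(list(gaps.strftime("%Y-%m-%d")))  # ['2014-11-02']
```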
Bitcoin's price is largely driven by supply and demand, so each exchange's data depends entirely on whether, and how heavily, that exchange was in use at the time. (This is a little unclear to me; it is my basic understanding of cryptocurrency trading, so apologies if it is inaccurate!)
To fill these gaps, the most effective approach is to pull datasets from a wider range of exchanges, which should cover most of the gaps in our plots.
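As a sketch of how a second exchange can plug a gap in the first (one possible approach, not what this notebook does, since it merges and plots the exchanges side by side), pandas' `Series.combine_first` keeps the first series' values and fills its NaNs from the second. The prices below are hypothetical, and zeros are assumed to mean "no data".

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"])

# Hypothetical prices: Kraken has no data on 2016-01-02 (recorded as 0),
# while another exchange does have a value for that day.
kraken = pd.Series([430.0, 0.0, 432.0], index=dates)
bitstamp = pd.Series([429.0, 431.0, np.nan], index=dates)

# Treat zeros as missing, then fill Kraken's gaps from Bitstamp;
# wherever Kraken has a value, combine_first keeps it.
filled = kraken.replace(0, np.nan).combine_first(bitstamp)
print(filled.tolist())  # [430.0, 431.0, 432.0]
```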
In [11]:
# Pull data for 3 more exchanges
exchanges = ['COINBASE', 'BITSTAMP', 'ITBIT']
exchange_data = {}
exchange_data['Kraken'] = btc_usd_price_kraken
for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    btc_exchange_df = get_quandl_data(exchange_code)
    exchange_data[exchange] = btc_exchange_df
In [19]:
def merge_dfs_on_column(dataframes, labels, col):
    '''Merge a single column of each dataframe into a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
    # pandas aligns the series on their date index; dates an exchange lacks become NaN
    return pd.DataFrame(series_dict)
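Missing dates are not a problem for `merge_dfs_on_column` because building a `DataFrame` from a dict of Series aligns the Series on their (date) index: the result covers the union of all dates, and NaN marks the days an exchange has no data. A small sketch with made-up prices, restating the helper with `zip` so the snippet runs on its own:

```python
import pandas as pd


def merge_dfs_on_column(dataframes, labels, col):
    """Same behaviour as the helper above: one column per labelled dataframe."""
    return pd.DataFrame({label: df[col] for label, df in zip(labels, dataframes)})


# Hypothetical data: COINBASE has no row for 2017-01-01
kraken = pd.DataFrame(
    {"Weighted Price": [1.0, 2.0]},
    index=pd.to_datetime(["2017-01-01", "2017-01-02"]),
)
coinbase = pd.DataFrame(
    {"Weighted Price": [3.0]},
    index=pd.to_datetime(["2017-01-02"]),
)

merged = merge_dfs_on_column([kraken, coinbase], ["Kraken", "COINBASE"], "Weighted Price")
print(merged)
```

The printed frame has both dates as rows, with NaN in the COINBASE column for 2017-01-01.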
In [21]:
#merge the BTC prices into one dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()),list(exchange_data.keys()),'Weighted Price')
In [22]:
btc_usd_datasets.tail()
Out[22]:
In [27]:
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    '''Generate a scatter plot of the entire dataframe'''
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    layout = go.Layout(
        title=title,
        legend=dict(orientation="h"),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels=not seperate_y_axis,
            type=scale
        )
    )
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale
    )
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
    # Form a trace for each series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index,
            y=series,
            name=label_arr[index],
            visible=visibility
        )
        # Add a separate axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config
        trace_arr.append(trace)
    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)
# Plot all of the BTC exchange prices
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
The next steps would be to clean the datasets by removing the zero values, and then to analyze how the price trends correlate with specific market events.
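A minimal sketch of the cleaning step, assuming the zeros should be treated as missing data rather than as real prices: `DataFrame.replace(0, np.nan)` converts them, and `DataFrame.corr()` then gives a first, rough look at how the exchanges' prices move together. The `prices` frame is a hypothetical stand-in for `btc_usd_datasets`; correlating with market events would additionally need event dates, which are not part of this data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for btc_usd_datasets: one column per exchange,
# with a 0 on a day where one exchange has no data.
prices = pd.DataFrame(
    {
        "Kraken": [950.0, 0.0, 990.0, 1010.0],
        "COINBASE": [955.0, 970.0, 995.0, 1015.0],
    },
    index=pd.date_range("2017-01-01", periods=4, freq="D"),
)

# Step 1: treat zero prices as missing values
cleaned = prices.replace(0, np.nan)

# Step 2: pairwise Pearson correlation between exchanges;
# rows where either value is NaN are skipped for that pair.
print(cleaned.corr())
```

With the real data this would be `btc_usd_datasets.replace(0, np.nan)`, and the cleaned frame could be passed straight to `df_scatter`.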
This exercise was done to further my understanding of visualizing data and to demonstrate working proficiency with various Python libraries for getting data from the web.
It has also become clear that domain knowledge is essential to make complete sense of a dataset and analyze its trends.
In this particular case, there is a clear spike across all the exchanges after January 2017. Could this be attributed to bigger investors or hedge funds getting into Bitcoin trading and driving the prices up? Since cryptocurrency is not tied to conventional market volatility or to the standing of any particular institution, and is driven purely by supply and demand, could the spikes indicate rising investor interest in cryptocurrency?
P.S. My next topic will come from a field where I have more domain knowledge, so that I can practice finding correlations in the data.