In [9]:
import os
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)


Quandl offers a free API that provides Bitcoin pricing data.

This function will download and cache datasets from Quandl.


In [4]:
def get_quandl_data(quandl_id):
    '''Download and cache a Quandl data series'''
    cache_path = '{}.pkl'.format(quandl_id).replace('/', '-')
    try:
        # Use a context manager so the cache file is always closed
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError):
        # Cache miss: download the series and save it for the next run
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('cached {} at {}'.format(quandl_id, cache_path))
    return df

I'm using pickle to serialize the downloaded data and save it to a file, so the data is not re-downloaded every time the script runs. (This file is the cache referred to above.)

The data returned is a Pandas dataframe.
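
The caching pattern can be tried without a network connection. This is only a minimal sketch: `load_or_fetch` and `fake_download` are hypothetical names used for illustration, and the fake downloader stands in for the real Quandl call.

```python
import os
import pickle
import tempfile

def load_or_fetch(cache_path, fetch):
    """Return (data, from_cache); fetch() is called only on a cache miss."""
    try:
        with open(cache_path, 'rb') as f:
            return pickle.load(f), True
    except OSError:
        # No cache file yet: fetch the data and save it for next time
        data = fetch()
        with open(cache_path, 'wb') as f:
            pickle.dump(data, f)
        return data, False

# Hypothetical stand-in for a network download; counts how often it runs
calls = []
def fake_download():
    calls.append(1)
    return {'2014-01-07': 841.84}

cache_file = os.path.join(tempfile.mkdtemp(), 'demo.pkl')
first, first_cached = load_or_fetch(cache_file, fake_download)
second, second_cached = load_or_fetch(cache_file, fake_download)
```

The second call reads the pickle file instead of calling the downloader again, which is the behaviour the "Loaded ... from cache" message reports in the real function.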

We will now pull the Bitcoin price data from the Kraken exchange.


In [5]:
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')


Downloading BCHARTS/KRAKENUSD from Quandl
cached BCHARTS/KRAKENUSD at BCHARTS-KRAKENUSD.pkl

In [6]:
btc_usd_price_kraken.head()


Out[6]:
                 Open       High        Low      Close  Volume (BTC)  Volume (Currency)  Weighted Price
Date
2014-01-07  874.67040  892.06753  810.00000  810.00000     15.622378       13151.472844      841.835522
2014-01-08  810.00000  899.84281  788.00000  824.98287     19.182756       16097.329584      839.156269
2014-01-09  825.56345  870.00000  807.42084  841.86934      8.158335        6784.249982      831.572913
2014-01-10  839.99000  857.34056  817.00000  857.33056      8.024510        6780.220188      844.938794
2014-01-11  858.20000  918.05471  857.16554  899.84105     18.748285       16698.566929      890.671709

Generate a sample chart to see how the data looks visually.


In [10]:
btc_trace=go.Scatter(x=btc_usd_price_kraken.index,y=btc_usd_price_kraken['Weighted Price'])
py.iplot([btc_trace])


What can be inferred immediately is that some data is missing, as is evident from the sharp drops in late 2014 and early 2016.

Bitcoin pricing is driven largely by supply and demand, so the data for a given period depends on whether, and how much, the particular exchange was in use at that time. (This is a little unclear, but it is my basic understanding of cryptocurrency trading; apologies if it is inaccurate!)

The most effective way to fill these gaps is to pull datasets from several more exchanges, which should cover most of the gaps in our plots.
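
Before pulling more data, one way to confirm where a single series has gaps is to look for zero-valued days. The sketch below uses made-up prices rather than the Kraken data, and assumes that missing days show up as zeros in the 'Weighted Price' column.

```python
import pandas as pd

# Toy daily price series with two zero-valued (missing) days
prices = pd.Series(
    [841.8, 0.0, 831.6, 0.0, 890.7],
    index=pd.date_range('2014-01-07', periods=5, freq='D'),
    name='Weighted Price',
)

# Dates on which the price is recorded as zero
zero_days = prices[prices == 0].index

# Fraction of days that are missing
share_missing = (prices == 0).mean()
```

The same two expressions could be applied to `btc_usd_price_kraken['Weighted Price']` to list the dates behind the drops in the chart.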


In [11]:
#pull data for 3 more exchanges
exchanges = ['COINBASE','BITSTAMP','ITBIT']
exchange_data={}
exchange_data['Kraken']= btc_usd_price_kraken

for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    btc_exchange_df= get_quandl_data(exchange_code)
    exchange_data[exchange]=btc_exchange_df


Downloading BCHARTS/COINBASEUSD from Quandl
cached BCHARTS/COINBASEUSD at BCHARTS-COINBASEUSD.pkl
Downloading BCHARTS/BITSTAMPUSD from Quandl
cached BCHARTS/BITSTAMPUSD at BCHARTS-BITSTAMPUSD.pkl
Downloading BCHARTS/ITBITUSD from Quandl
cached BCHARTS/ITBITUSD at BCHARTS-ITBITUSD.pkl

In [19]:
def merge_dfs_on_column(dataframes,labels,col):
    '''merge a single column of each dataframe into a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]]=dataframes[index][col]
    return pd.DataFrame(series_dict)
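
To see what the merge does when exchanges cover different dates, here is a small sketch with two toy dataframes (the labels 'A' and 'B' and all values are invented). The function is restated in a compact form so the example runs on its own.

```python
import pandas as pd

def merge_dfs_on_column(dataframes, labels, col):
    '''Merge a single column of each dataframe into a new combined dataframe'''
    series_dict = {label: df[col] for label, df in zip(labels, dataframes)}
    return pd.DataFrame(series_dict)

# Two toy exchanges whose date ranges only partly overlap
a = pd.DataFrame({'Weighted Price': [100.0, 101.0]},
                 index=pd.to_datetime(['2017-01-01', '2017-01-02']))
b = pd.DataFrame({'Weighted Price': [102.0, 103.0]},
                 index=pd.to_datetime(['2017-01-02', '2017-01-03']))

merged = merge_dfs_on_column([a, b], ['A', 'B'], 'Weighted Price')
```

pandas aligns the series on the union of their dates, so a date missing from one exchange appears as NaN in that exchange's column rather than being dropped.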

In [21]:
#merge the BTC prices into one dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()),list(exchange_data.keys()),'Weighted Price')

In [22]:
btc_usd_datasets.tail()


Out[22]:
               BITSTAMP     COINBASE        ITBIT       Kraken
Date
2017-09-01  4808.168193  4826.977522  4815.011536  4814.612632
2017-09-02  4682.157680  4707.420113  4713.761706  4709.216263
2017-09-03  4553.445351  4632.109206  4591.110215  4611.887809
2017-09-04  4299.815249  4437.194533  4414.376755  4384.824318
2017-09-05  4226.855281  4384.321722  4286.346417  4336.218000

In [27]:
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    '''Generate a scatter plot of the entire dataframe'''
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    
    layout = go.Layout(
        title=title,
        legend=dict(orientation="h"),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels= not seperate_y_axis,
            type=scale
        )
    )
    
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale )
    
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
        
    # Form Trace For Each Series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index, 
            y=series, 
            name=label_arr[index],
            visible=visibility
        )
        
        # Add a separate axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config    
        trace_arr.append(trace)

    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)
# Plot all of the BTC exchange prices
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')


The next steps would be to clean the datasets by removing zero values, and then to analyze how price trends correlate with specific market events.
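
The zero-value cleanup could look roughly like this. It is a sketch on invented numbers, not the downloaded data: zeros are replaced with NaN so they are ignored, and a per-day average across exchanges is then computed (the averaging step is my own assumption about a useful follow-up).

```python
import numpy as np
import pandas as pd

# Toy combined price table with zero placeholders on some days
idx = pd.date_range('2017-09-01', periods=3, freq='D')
prices = pd.DataFrame(
    {'Kraken': [4814.6, 0.0, 4611.9],
     'BITSTAMP': [4808.2, 4682.2, 0.0]},
    index=idx,
)

# Treat zeros as missing data rather than real prices
cleaned = prices.replace(0, np.nan)

# Row-wise mean; NaN values are skipped by default
avg_price = cleaned.mean(axis=1)
```

With the zeros removed, a day that is missing on one exchange falls back to the prices reported by the others instead of dragging the average down.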

This exercise was done to further my understanding of data visualization and to demonstrate working proficiency with the various Python libraries for retrieving data from the web.

It has also become clear that domain knowledge is essential to make complete sense of the dataset and analyze its trends.

In this particular case, prices rise sharply across all the exchanges after January 2017. This could perhaps be attributed to bigger investors and hedge funds entering Bitcoin trading and driving prices up. Since cryptocurrency prices are not linked to the standing of any particular establishment and are driven largely by supply and demand, the spikes may indicate rising investor interest in cryptocurrency.

P.S. The next topic will come from a field where I have more domain knowledge, so I can practice finding correlations in the data.

