Cryptocurrency prediction - DEMO

This is a simple demo for cryptocurrency prediction based on daily data. It does not work, so don't blame me if you lose your money.

Created by Judit Acs

Data source on kaggle.com



In [1]:

    
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from keras.layers import Input, Dense, Bidirectional, Dropout
from keras.layers import LSTM
from keras.models import Model
from keras.callbacks import EarlyStopping

import numpy as np









    



Using TensorFlow backend.



In [2]:

    
os.listdir("data/cryptocurrency/")









    Out[2]:





['bitcoin_cash_price.csv',
 'bitcoin_dataset.csv',
 'bitcoin_price.csv',
 'bitconnect_price.csv',
 'dash_price.csv',
 'ethereum_classic_price.csv',
 'ethereum_dataset.csv',
 'ethereum_price.csv',
 'iota_price.csv',
 'litecoin_price.csv',
 'monero_price.csv',
 'nem_price.csv',
 'neo_price.csv',
 'numeraire_price.csv',
 'omisego_price.csv',
 'qtum_price.csv',
 'ripple_price.csv',
 'stratis_price.csv',
 'waves_price.csv']

Load data

We load each currency into a separate dataframe and store the dataframes in a dictionary.



In [3]:

    
coin_dataframes = {}

def convert_comma_int(field):
    try:
        return int(field.replace(',', ''))
    except ValueError:
        return None
    
for fn in os.listdir("data/cryptocurrency/"):
    # skip Bitcoin Cash: its file name would also map to the key "bitcoin"
    if "bitcoin_cash" in fn:
        continue
    if fn.endswith("_price.csv"):
        coin_name = fn.split("_")[0]
        df = pd.read_csv(os.path.join("data", "cryptocurrency", fn), parse_dates=["Date"])
        df['Market Cap'] = df['Market Cap'].map(convert_comma_int)
        coin_dataframes[coin_name] = df.sort_values('Date')
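
As a quick check of the parsing helper used above, the sketch below runs convert_comma_int on a few sample strings. The "-" placeholder for a missing value is an assumption made for illustration; the CSV files may mark missing market caps differently.

```python
def convert_comma_int(field):
    """Parse strings such as '3,604,110'; return None if the field is not a number."""
    try:
        return int(field.replace(',', ''))
    except ValueError:
        return None

# "-" is a hypothetical placeholder for a missing value
parsed = [convert_comma_int(v) for v in ["3,604,110", "65,525", "-"]]
print(parsed)  # [3604110, 65525, None]
```

Note that the helper expects strings; if pandas has already parsed a column as numbers, calling .replace on a value would raise AttributeError rather than ValueError.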



In [4]:

    
coin_dataframes.keys()









    Out[4]:





dict_keys(['bitcoin', 'bitconnect', 'dash', 'ethereum', 'iota', 'litecoin', 'monero', 'nem', 'neo', 'numeraire', 'omisego', 'qtum', 'ripple', 'stratis', 'waves'])

Each dataframe looks like this:



In [5]:

    
coin_dataframes['nem'].head()

Bitcoin value growth

Just for fun.



In [6]:

    
coin_dataframes['bitcoin'].plot(x='Date', y='Close')









    Out[6]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f73fab9f320>

Compute relative growth and other relative values

We add these values as new columns to the dataframes:



In [7]:

    
def add_relative_columns(df):
    day_diff = df['Close'] - df['Open']
    df['rel_close'] = day_diff / df['Open']
    df['high_low_ratio'] = df['High'] / df['Low']
    df['rel_high'] = df['High'] / df['Close']
    df['rel_low'] = df['Low'] / df['Close']
    
    
for df in coin_dataframes.values():
    add_relative_columns(df)
    
coin_dataframes["nem"].head()









    Out[7]:

           Date      Open      High       Low     Close  Volume  Market Cap  rel_close  high_low_ratio  rel_high   rel_low
916  2015-04-01  0.000400  0.000458  0.000170  0.000242  65,525     3604110  -0.395000        2.694118  1.892562  0.702479
915  2015-04-02  0.000242  0.000323  0.000227  0.000314  28,549     2179320   0.297521        1.422907  1.028662  0.722930
914  2015-04-03  0.000309  0.000330  0.000291  0.000310  20,678     2778750   0.003236        1.134021  1.064516  0.938710
913  2015-04-04  0.000310  0.000318  0.000251  0.000277  23,550     2792120  -0.106452        1.266932  1.148014  0.906137
912  2015-04-05  0.000272  0.000283  0.000218  0.000232  26,800     2443720  -0.147059        1.298165  1.219828  0.939655
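
To make the four relative columns concrete, the sketch below recomputes them for the first nem row (2015-04-01) from its Open, High, Low and Close values. The one-row dataframe is built by hand here purely for illustration.

```python
import pandas as pd

# first nem row, copied from the table above
row = pd.DataFrame({"Open": [0.000400], "High": [0.000458],
                    "Low": [0.000170], "Close": [0.000242]})

rel_close = ((row["Close"] - row["Open"]) / row["Open"]).iloc[0]  # relative daily change
high_low_ratio = (row["High"] / row["Low"]).iloc[0]               # daily range as a ratio
rel_high = (row["High"] / row["Close"]).iloc[0]                   # high relative to close
rel_low = (row["Low"] / row["Close"]).iloc[0]                     # low relative to close

print(round(rel_close, 6), round(high_low_ratio, 6), round(rel_high, 6), round(rel_low, 6))
```

A rel_close of -0.395 means the price closed 39.5% below its opening value that day.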

Create historical training data

For each day, the history table holds the feature values of the previous 10 days as separate columns.



In [8]:

    
def create_history_frames(coin_dataframes):
    history_frames = {}
    for coin_name, df in coin_dataframes.items():
        history_frames[coin_name], x_cols = create_history_frame(df)
    return history_frames, x_cols
        

def create_history_frame(df):
    feature_cols = ['rel_close', 'rel_high', 'rel_low', 'high_low_ratio']
    y_col = ['rel_close']
    x_cols = []
    days = 10
    history = df[['Date'] + y_col].copy()
    for n in range(1, days+1):
        for feat_col in feature_cols:
            colname = '{}_{}'.format(feat_col, n)
            history[colname] = df[feat_col].shift(n)
            x_cols.append(colname)
    history = history[days:]
    return history, x_cols

y_col = 'rel_close'
coin_history, x_cols = create_history_frames(coin_dataframes)
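
The shifting step is easier to follow on a toy frame. The sketch below applies the same shift(n) idea to a single made-up feature with a 2-day window instead of 10; the values are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"rel_close": [0.1, 0.2, 0.3, 0.4, 0.5]})  # made-up values
days = 2  # the notebook uses 10

history = toy[["rel_close"]].copy()
for n in range(1, days + 1):
    # column rel_close_n holds the value from n days earlier
    history["rel_close_{}".format(n)] = toy["rel_close"].shift(n)

# the first `days` rows have no complete history, so drop them
history = history[days:]
print(history)
```

Each remaining row pairs the target (rel_close) with the values of the preceding days, so the first kept row holds 0.3 together with 0.2 and 0.1.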

Define model

We will train a separate model for each currency. The models' architecture is identical.



In [9]:

    
def create_model():
    input_layer = Input(batch_shape=(None, len(x_cols), 1))
    layer = Bidirectional(LSTM(128, return_sequences=True))(input_layer)
    layer = Bidirectional(LSTM(128))(layer)
    # NOTE: a sigmoid output lies in (0, 1), so this model can never predict a negative change
    out = Dense(1, activation="sigmoid")(layer)
    m = Model(inputs=input_layer, outputs=out)
    m.compile("rmsprop", loss='mean_squared_error')
    return m

def create_train_test_mtx(history):
    # .values replaces .as_matrix(), which was removed in pandas 1.0
    X = history[x_cols].values
    y = history[y_col].values
    X = X.reshape(X.shape[0], X.shape[1], 1)
    rand_mtx = np.random.permutation(X.shape[0])
    train_split = int(X.shape[0] * 0.9)
    train_indices = rand_mtx[:train_split]
    test_indices = rand_mtx[train_split:]

    X_train = X[train_indices]
    X_test = X[test_indices]
    y_train = y[train_indices]
    y_test = y[test_indices]
    
    return X_train, X_test, y_train, y_test

def train_model(model, X, y):
    ea = EarlyStopping(monitor='val_loss', patience=2)
    val_loss = model.fit(X, y, epochs=500, batch_size=64, callbacks=[ea], verbose=0, validation_split=.1)
    return val_loss
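
The reshape and the 90/10 random split in create_train_test_mtx can be checked on random data without Keras. The sketch below assumes 50 samples and 40 input columns (10 days times 4 features); both numbers and the random values are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(50, 40)  # 50 samples, 40 = 10 days x 4 features (illustrative sizes)
y = rng.rand(50)

# add a trailing axis: the LSTM then sees 40 time steps with 1 value each
X = X.reshape(X.shape[0], X.shape[1], 1)

perm = rng.permutation(X.shape[0])      # shuffled row indices
train_split = int(X.shape[0] * 0.9)     # 90% of the rows go to training
X_train, X_test = X[perm[:train_split]], X[perm[train_split:]]
y_train, y_test = y[perm[:train_split]], y[perm[train_split:]]

print(X_train.shape, X_test.shape)  # (45, 40, 1) (5, 40, 1)
```

Because the rows are shuffled before splitting, test days are interleaved with training days rather than following them in time.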

Train a model for each currency

We save RMSE as well as the predictions on each test set.



In [10]:

    
rmse = {}
pred = {}
test = {}

for coin_name, history in coin_history.items():
    model = create_model()
    X_train, X_test, y_train, y_test = create_train_test_mtx(history)
    train_model(model, X_train, y_train)
    test[coin_name] = y_test
    
    # run prediction on test set
    pred[coin_name] = model.predict(X_test)
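    # compute test loss
    # NOTE: pred has shape (n, 1) and y_test has shape (n,), so the difference broadcasts to (n, n);
    # use pred[coin_name].ravel() for an element-wise RMSE
    rmse[coin_name] = np.sqrt(np.mean((pred[coin_name] - y_test)**2))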
    print(coin_name, rmse[coin_name])









    



bitcoin 0.0395829767904
bitconnect 0.105878993163
dash 0.067239614115
ethereum 0.0779520952451
iota 0.116951050119
litecoin 0.120207499218
monero 0.05655138178
nem 0.0749458345141
neo 0.081203570008
numeraire 0.143091012434
omisego 0.100255266302
qtum 0.0939036685546
ripple 0.165650145426
stratis 0.103679609164
waves 0.0830779387789
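
One thing to watch in the RMSE line: model.predict returns an array of shape (n, 1), while y_test is one-dimensional, so NumPy broadcasts the subtraction to an (n, n) matrix. The sketch below shows the effect on three made-up values and an element-wise alternative using ravel(); it illustrates the pitfall and is not a recomputation of the numbers above.

```python
import numpy as np

pred = np.array([[0.1], [0.2], [0.3]])  # shape (3, 1), like the output of model.predict
y_test = np.array([0.1, 0.2, 0.3])      # shape (3,)

broadcast_diff = pred - y_test          # every prediction minus every target
print(broadcast_diff.shape)             # (3, 3)

rmse_broadcast = np.sqrt(np.mean(broadcast_diff ** 2))
rmse_elementwise = np.sqrt(np.mean((pred.ravel() - y_test) ** 2))
print(rmse_broadcast, rmse_elementwise)
```

Here the predictions match the targets exactly, so the element-wise RMSE is 0, while the broadcast version is not.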

Do our models predict the sign of the value change correctly?



In [13]:

    
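# NOTE: pred[coin_name] has shape (n, 1) and test[coin_name] has shape (n,), so the product
# broadcasts to (n, n). np.unique returns sorted values, so cnt[0] counts the -1 entries
# (signs differ) and cnt[1] counts the next value (0 if any target is exactly zero, otherwise +1).
pred_sign = {coin_name: np.sign(pred[coin_name]) * np.sign(test[coin_name]) for coin_name in pred.keys()}
for coin, val in sorted(pred_sign.items()):
    cnt = np.unique(pred_sign[coin], return_counts=True)[1]
    print("[{}] pos/neg change guessed correctly: {}, incorrectly: {}, correct%: {}".format(
        coin, cnt[0], cnt[1], cnt[0]/ (cnt[0]+cnt[1]) * 100))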









    



[bitcoin] pos/neg change guessed correctly: 11914, incorrectly: 14007, correct%: 45.962732919254655
[bitconnect] pos/neg change guessed correctly: 350, incorrectly: 275, correct%: 56.00000000000001
[dash] pos/neg change guessed correctly: 7524, incorrectly: 396, correct%: 95.0
[ethereum] pos/neg change guessed correctly: 3120, incorrectly: 2964, correct%: 51.28205128205128
[iota] pos/neg change guessed correctly: 66, incorrectly: 55, correct%: 54.54545454545454
[litecoin] pos/neg change guessed correctly: 13041, incorrectly: 1449, correct%: 90.0
[monero] pos/neg change guessed correctly: 6765, incorrectly: 8364, correct%: 44.71544715447154
[nem] pos/neg change guessed correctly: 4914, incorrectly: 3367, correct%: 59.34065934065934
[neo] pos/neg change guessed correctly: 760, incorrectly: 684, correct%: 52.63157894736842
[numeraire] pos/neg change guessed correctly: 50, incorrectly: 50, correct%: 50.0
[omisego] pos/neg change guessed correctly: 16, incorrectly: 48, correct%: 25.0
[qtum] pos/neg change guessed correctly: 91, incorrectly: 78, correct%: 53.84615384615385
[ripple] pos/neg change guessed correctly: 12616, incorrectly: 152, correct%: 98.80952380952381
[stratis] pos/neg change guessed correctly: 1025, incorrectly: 656, correct%: 60.97560975609756
[waves] pos/neg change guessed correctly: 1104, incorrectly: 1200, correct%: 47.91666666666667
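
The counts above add up to n squared per coin (for bitcoin, 11914 + 14007 = 25921 = 161 * 161), which points to the same (n, 1) versus (n,) broadcasting. A minimal sketch of an element-wise sign comparison follows, on made-up values; it counts agreements explicitly instead of relying on the order of np.unique.

```python
import numpy as np

pred = np.array([[0.02], [0.01], [0.03], [0.02]])  # made-up predictions, shape (4, 1)
y_test = np.array([0.05, -0.01, 0.0, -0.2])        # made-up targets, shape (4,)

# +1 where the signs agree, -1 where they differ, 0 where either value is exactly zero
agreement = np.sign(pred.ravel()) * np.sign(y_test)

correct = int(np.sum(agreement == 1))
incorrect = int(np.sum(agreement == -1))
undecided = int(np.sum(agreement == 0))
accuracy = correct / (correct + incorrect) * 100

print(correct, incorrect, undecided)  # 1 2 1
```

Days with a target of exactly zero are kept out of the percentage here; whether to count them as hits or misses is a design choice.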

Did we guess anything useful at all?



In [12]:

    
pred_sign = {coin_name: np.sign(pred[coin_name]) for coin_name in pred.keys()}
for coin, val in sorted(pred_sign.items()):
    e, cnt = np.unique(val, return_counts=True)
    print("[{}] guesses: {}".format(coin, dict(zip(e, cnt))))









    



[bitcoin] guesses: {1.0: 161}
[bitconnect] guesses: {1.0: 25}
[dash] guesses: {1.0: 132}
[ethereum] guesses: {1.0: 78}
[iota] guesses: {1.0: 11}
[litecoin] guesses: {1.0: 161}
[monero] guesses: {1.0: 123}
[nem] guesses: {1.0: 91}
[neo] guesses: {1.0: 38}
[numeraire] guesses: {1.0: 10}
[omisego] guesses: {1.0: 8}
[qtum] guesses: {1.0: 13}
[ripple] guesses: {1.0: 152}
[stratis] guesses: {1.0: 41}
[waves] guesses: {1.0: 48}
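
The all-positive guesses are consistent with the output layer defined earlier: Dense(1, activation="sigmoid") can only emit values between 0 and 1. The sketch below checks this with a hand-written sigmoid in NumPy; the input range is arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, the same shape as Keras' "sigmoid" activation."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 5)        # arbitrary pre-activation values
out = sigmoid(z)
signs = np.unique(np.sign(out))

print(out.min() > 0, signs)
```

Whatever the network computes before the last layer, the sign of its prediction is always +1, so it cannot express a falling price.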

Output of In [5], coin_dataframes['nem'].head(), before the relative columns are added:

           Date      Open      High       Low     Close  Volume  Market Cap
916  2015-04-01  0.000400  0.000458  0.000170  0.000242  65,525     3604110
915  2015-04-02  0.000242  0.000323  0.000227  0.000314  28,549     2179320
914  2015-04-03  0.000309  0.000330  0.000291  0.000310  20,678     2778750
913  2015-04-04  0.000310  0.000318  0.000251  0.000277  23,550     2792120
912  2015-04-05  0.000272  0.000283  0.000218  0.000232  26,800     2443720