Spot resources Analytics

Here we perform some initial process and analysis on the dataset.


With static dataset, e.g. load the grabbed data.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline
import time
import dateutil.parser


/home/shwsun/py2_kernel/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)

In [2]:
# The filtered file make sure to use the fi
#filename = 'data/aws-recent-filtered.csv'
filename = './data/subset.csv'

In [3]:
start = time.time()
# parse the data file and extra the results
df = pd.read_csv(filename)
print(time.time()-start)
df.head(3)


1.2520980835
Out[3]:
TimeStamp AvailabilityZone InstanceType SpotPrice
0 2016-12-12 10:46:15 ap-northeast-2a m4.large 0.0353
1 2016-12-12 10:46:05 ap-northeast-2a m4.large 0.0351
2 2016-12-12 10:45:05 ap-northeast-2c m4.large 0.0244

In [4]:
df = df.dropna()
df = df[df.AvailabilityZone != "AvailabilityZone"]
df['TimeStamp'] = pd.to_datetime(df.TimeStamp)
df.index = df.TimeStamp
df = df.drop('TimeStamp', axis=1)
df= df.sort_index()
df.head(3)


Out[4]:
AvailabilityZone InstanceType SpotPrice
TimeStamp
2016-10-11 11:09:34 ap-northeast-2c i2.4xlarge 1.136
2016-10-11 11:09:34 ap-northeast-2a i2.4xlarge 1.136
2016-10-11 11:10:00 ap-northeast-2a i2.2xlarge 0.568


In [5]:
print (len(df))
print (df['InstanceType'].unique())
print (df['AvailabilityZone'].unique())
print (df['SpotPrice'].unique())


1496361
['i2.4xlarge' 'i2.2xlarge' 'm4.16xlarge' 'x1.32xlarge' 'x1.16xlarge'
 'd2.8xlarge' 'r3.large' 'r3.4xlarge' 'r3.8xlarge' 'd2.xlarge' 'i2.8xlarge'
 'c4.2xlarge' 'i2.xlarge' 'd2.2xlarge' 'r3.2xlarge' 'c4.4xlarge'
 'd2.4xlarge' 'r3.xlarge' 'm4.10xlarge' 'm4.2xlarge' 'c4.xlarge'
 'm4.xlarge' 'm4.large' 'c4.large' 'c4.8xlarge' 'm4.4xlarge']
['ap-northeast-2c' 'ap-northeast-2a']
[ 1.136   0.568   3.472  ...,  2.2382  2.2814  2.3162]

Hypothesis #2

For each machine type there exists a region that is more favorable to use, as the market volatility is very low and the prices tend to stay cheaper than the other regions.

With in proving this hypothesis users will be able to find the best region they should be bidding in, as long as latency is not an issue for them.

Data Science tools & Techniques: We can use clustering and classification methods.


In [6]:
def corrGraph(title, df):
    corr_df = df.corr()
    mask = np.zeros_like(corr_df)
    mask[np.triu_indices_from(mask)] = True
    seaborn.heatmap(corr_df, cmap='RdYlGn_r', vmax=1.0, vmin=-1.0 , mask = mask, linewidths=2.5)
    plt.yticks(rotation=0) 
    plt.title(title)
    plt.xticks(rotation=90) 
    plt.show()

In [7]:
# Some info about the data
df = df.truncate(before='2016-10-13 00:00:00', after='2016-12-11 00:00:00')
print (df.index.min())
print (df.index.max())
print(df.index.max()- df.index.min()) 

df.head(3)


2016-10-13 00:00:13
2016-12-11 00:00:00
58 days 23:59:47
Out[7]:
AvailabilityZone InstanceType SpotPrice
TimeStamp
2016-10-13 00:00:13 ap-northeast-2c c4.8xlarge 1.8506
2016-10-13 00:00:13 ap-northeast-2a m4.large 0.0193
2016-10-13 00:00:14 ap-northeast-2a r3.2xlarge 0.0838

In [8]:
depa = df.groupby(['AvailabilityZone', 'InstanceType'])

In [9]:
#depa = awsResampler(df)
# Initialize dictionary of all combos of  empty dfs we want to graph and corr
zonedfs={}
typedfs={}
for item in df['InstanceType'].unique():
    typedfs.update({item: pd.DataFrame()})
for item in df['AvailabilityZone'].unique():
    zonedfs.update({item: pd.DataFrame()})

#Fill zonedfs with dataframes of all machines in that zone pricing
for name, group in depa:
    #We have to create the data frame by merging the rows and the first run we need our first row
    if zonedfs[name[0]].empty:
        #RESAMPLE DATA HOURLY getting the avg of the hour
        zonedfs[name[0]] = group
        zonedfs[name[0]] = zonedfs[name[0]].resample('H').mean()
        zonedfs[name[0]] = zonedfs[name[0]].fillna(method="ffill")
        #zonedfs[name[0]] = zonedfs[name[0]].drop('InstanceType', axis=1).drop(['AvailabilityZone'],axis=1)
        #Remove index so that you can merge
        zonedfs[name[0]] = zonedfs[name[0]].reset_index()
        #print(zonedfs[name[0]].head(40) )
        zonedfs[name[0]].rename(columns = {'SpotPrice':name[1]}, inplace = True)

    else:
        #RESAMPLE DATA HOURLY getting the avg of the hour
        group1 = group.resample('H').mean()
        group1 = group1.fillna(method="ffill")
        #print(zonedfs[name[0]].head(20) )
        #group1 = group.drop('InstanceType', axis=1).drop(['AvailabilityZone'],axis=1)
        group1.rename(columns = {'SpotPrice':name[1]}, inplace = True)    
        #Remove index so that you can merge
        group1 = group1.reset_index()
        #print(group1.head(20))
        zonedfs[name[0]] = zonedfs[name[0]].merge(group1,how='right')


#Fill typedfs with dataframes of all machines in that zone pricing
for name, group in depa:
    if typedfs[name[1]].empty:
        typedfs[name[1]] = group
        typedfs[name[1]] = typedfs[name[1]].resample('H').mean()
        typedfs[name[1]] = typedfs[name[1]].fillna(method="ffill")
        #typedfs[name[1]] = typedfs[name[1]].drop('InstanceType', axis=1).drop(['AvailabilityZone'],axis=1)
        typedfs[name[1]].rename(columns = {'SpotPrice':name[0]}, inplace = True)
        typedfs[name[1]] = typedfs[name[1]].reset_index()
    else:
        group1 = group.resample('H').mean()
        group1 = group1.fillna(method="ffill")
        #group1 = group.drop('InstanceType', axis=1).drop(['AvailabilityZone'],axis=1)
        group1.rename(columns = {'SpotPrice':name[0]}, inplace = True)
        group1 = group1.reset_index()
        typedfs[name[1]] = typedfs[name[1]].merge(group1,how='right')

In [23]:
#for key in typedfs:
#    print(typedfs[key])
    
type(typedfs)
df = typedfs["c4.2xlarge"]
df.head(10)
#print(typedfs)
#s = pd.Series(data, index=index)
#df.index
#df.columns
#s = df['c4.2xlarge'].values
#s = list(df.values)

array = []
def populate_array(df):
    for c in df.columns:
        #print(c)
        #if c != 'TimeStamp':
        xs = df[c].values
        array.append(xs.tolist())
print("finished")

for x in [ 'd2.2xlarge', 'r3.2xlarge', 'c4.4xlarge', 
 'd2.4xlarge', 'r3.xlarge', 'm4.10xlarge', 'm4.2xlarge', 'c4.xlarge', 
 'm4.xlarge', 'm4.large', 'c4.large', 'c4.8xlarge', 'm4.4xlarge']:
    populate_array(typedfs[x])


finished

In [24]:
len(array[23])
df = pd.DataFrame(array)
df
df.to_csv("vmarrays.csv")
df


Out[24]:
0 1 2 3 4 5 6 7 8 9 ... 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416
0 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
1 2.364273e-01 2.088333e-01 2.120308e-01 2.118571e-01 2.143889e-01 2.089091e-01 2.133778e-01 2.080300e-01 2.414500e-01 2.260083e-01 ... 1.791500e-01 1.828182e-01 1.929143e-01 1.825333e-01 1.910000e-01 2.643500e-01 1.806000e-01 1.786286e-01 1.791500e-01 NaN
2 2.416500e-01 2.291500e-01 2.337769e-01 2.416500e-01 2.546455e-01 2.236818e-01 2.668455e-01 2.224667e-01 2.692143e-01 2.224667e-01 ... 2.462500e-01 2.462500e-01 2.462500e-01 2.462500e-01 2.572857e-01 2.996250e-01 2.462500e-01 2.462500e-01 2.462500e-01 NaN
3 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 1.476353e+18 ... 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN NaN
4 8.400000e-02 8.400000e-02 8.445000e-02 8.395000e-02 8.446000e-02 8.390000e-02 8.455000e-02 8.396667e-02 8.478333e-02 9.986667e-02 ... 8.800909e-02 9.090556e-02 8.768947e-02 1.029409e-01 1.084760e-01 8.391500e-02 8.516842e-02 8.713043e-02 NaN NaN
5 8.738571e-02 8.658000e-02 8.705000e-02 8.659000e-02 8.705000e-02 8.801000e-02 8.705000e-02 1.030333e-01 1.030333e-01 1.030333e-01 ... 1.116050e-01 9.747857e-02 1.036250e-01 1.082063e-01 1.039957e-01 8.755625e-02 8.468462e-02 8.868824e-02 NaN NaN
6 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
7 1.063500e-01 1.069571e-01 1.064667e-01 1.066500e-01 1.061500e-01 1.061500e-01 1.063000e-01 1.063500e-01 1.044333e-01 1.008250e-01 ... 2.300300e-01 1.136808e-01 9.935000e-02 9.940000e-02 9.910000e-02 9.942500e-02 1.045286e-01 2.458200e-01 9.926667e-02 NaN
8 9.875000e-02 9.925000e-02 9.870000e-02 9.915000e-02 9.880000e-02 9.900000e-02 9.880000e-02 9.880000e-02 9.894000e-02 9.893333e-02 ... 1.833000e-01 1.134000e-01 1.085750e-01 1.064500e-01 1.111111e-01 1.272667e-01 1.345143e-01 3.028000e-01 1.278222e-01 NaN
9 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
10 7.938600e-01 7.950700e-01 5.233400e-01 4.703857e-01 8.334000e-01 4.002838e+00 7.692250e-01 8.813143e-01 6.063571e-01 4.716111e-01 ... 7.699500e-01 4.848125e-01 7.288875e-01 9.946500e-01 7.769000e+00 9.162000e-01 5.955262e+00 9.321360e+00 6.520000e-01 NaN
11 4.663375e-01 3.846667e-01 5.118833e-01 3.708571e-01 3.801429e-01 3.707778e-01 5.710750e-01 5.021286e-01 3.821667e-01 3.741111e-01 ... 5.485000e-01 5.485000e-01 5.485000e-01 5.485000e-01 5.485000e-01 7.378000e-01 6.929600e-01 4.924500e-01 5.233400e-01 NaN
12 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
13 4.236667e-02 4.262609e-02 4.230000e-02 4.253333e-02 4.258333e-02 4.333043e-02 4.253889e-02 5.100870e-02 4.267000e-02 4.212667e-02 ... 4.096667e-02 4.138182e-02 4.117000e-02 4.146000e-02 4.131111e-02 4.598571e-02 4.598571e-02 4.134000e-02 4.090000e-02 NaN
14 4.252500e-02 4.228571e-02 4.262857e-02 4.275000e-02 4.226667e-02 4.200000e-02 4.250000e-02 7.386667e-02 4.288000e-02 4.240000e-02 ... 4.160000e-02 4.162500e-02 4.278000e-02 4.160000e-02 4.265000e-02 4.160000e-02 4.187500e-02 4.187500e-02 4.470000e-02 NaN
15 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
16 1.268650e+00 9.653417e-01 1.154667e+00 1.338475e+00 1.338475e+00 5.035000e-01 1.248340e+00 1.338906e+00 1.341280e+00 1.344037e+00 ... 1.025317e+00 1.309791e+00 6.443789e-01 1.228133e+00 8.222437e-01 6.494385e-01 1.004458e+00 8.456737e-01 5.290273e-01 NaN
17 1.334500e+00 1.334500e+00 1.334500e+00 1.334500e+00 1.334500e+00 1.334500e+00 1.334500e+00 1.329967e+00 1.343050e+00 8.453875e-01 ... 7.680900e-01 3.907125e-01 3.547889e-01 1.135160e+00 6.204643e-01 3.725917e-01 3.650933e-01 3.599385e-01 3.717600e-01 NaN
18 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
19 1.806156e-01 1.562085e-01 1.761507e-01 1.650441e-01 1.968392e-01 1.874987e-01 1.953101e-01 2.217125e-01 2.074171e-01 2.004338e-01 ... 1.316382e+00 1.130829e+00 1.669315e+00 1.109614e+00 1.438936e+00 1.694169e+00 1.240611e+00 1.517286e+00 1.494544e+00 NaN
20 3.684143e-01 4.343000e-01 3.394818e-01 4.343000e-01 3.810577e-01 3.655000e-01 3.851762e-01 3.566613e-01 3.852107e-01 4.057125e-01 ... 4.703150e+00 4.703150e+00 5.357000e+00 5.004927e+00 4.670818e+00 5.004927e+00 5.357000e+00 3.497386e+00 5.200300e+00 NaN
21 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
22 1.181773e-01 1.127667e-01 1.201341e-01 1.092500e-01 1.270592e-01 1.176068e-01 1.082000e-01 1.275390e-01 9.603684e-02 1.073355e-01 ... 6.376922e-01 4.413427e-01 1.635538e-01 1.344519e-01 1.288868e-01 1.231579e-01 1.235017e-01 1.230212e-01 1.245984e-01 NaN
23 1.439536e-01 1.722400e-01 1.484367e-01 1.297543e-01 1.456837e-01 1.397852e-01 1.539903e-01 1.507842e-01 1.267429e-01 1.301400e-01 ... 1.323727e+00 3.875881e-01 1.542968e-01 1.548909e-01 1.491651e-01 1.647941e-01 1.696043e-01 1.830057e-01 1.605836e-01 NaN
24 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
25 1.335050e-01 1.189048e-01 1.248053e-01 1.213267e-01 1.384694e-01 1.292167e-01 1.267282e-01 1.290680e-01 1.058400e-01 1.239222e-01 ... 1.572306e-01 1.510196e-01 1.940516e-01 1.420909e-01 1.742342e-01 1.587875e-01 1.940677e-01 1.367234e-01 1.932567e-01 NaN
26 1.755136e-01 1.749769e-01 2.171500e-01 1.911500e-01 1.760290e-01 1.868667e-01 1.717125e-01 1.990200e-01 1.717167e-01 1.716500e-01 ... 1.673576e-01 1.713312e-01 1.428531e-01 1.487700e-01 1.598184e-01 1.594053e-01 1.560889e-01 1.578838e-01 1.496842e-01 NaN
27 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 1.481414e+18
28 1.914035e-02 1.912157e-02 1.917119e-02 1.913108e-02 1.911143e-02 1.911690e-02 1.913235e-02 1.917297e-02 1.918358e-02 1.917857e-02 ... 4.478962e-02 4.585351e-02 4.416740e-02 4.484307e-02 4.971029e-02 4.496480e-02 4.386188e-02 4.791724e-02 4.409716e-02 3.060000e-02
29 1.894000e-02 1.890000e-02 1.881250e-02 1.888000e-02 1.887500e-02 1.885000e-02 1.883333e-02 1.878000e-02 1.880000e-02 1.889000e-02 ... 3.176143e-02 3.360617e-02 3.715000e-02 3.180128e-02 3.078108e-02 2.979610e-02 3.162179e-02 3.543297e-02 3.348286e-02 2.360000e-02
30 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
31 3.423483e-02 3.898851e-02 3.750353e-02 2.513026e-02 3.345778e-02 3.507692e-02 3.514390e-02 3.707363e-02 3.605663e-02 3.795341e-02 ... 1.595093e-01 1.287920e-01 1.041591e-01 6.940621e-02 8.768844e-02 5.340598e-02 4.141720e-01 2.234383e-01 1.518701e-01 NaN
32 6.610645e-02 7.390556e-02 5.455484e-02 1.385882e-02 1.378333e-02 1.380909e-02 1.382000e-02 1.377619e-02 1.378462e-02 3.734783e-02 ... 4.463544e-01 1.254983e-01 5.582650e-02 5.189053e-02 4.986064e-02 4.976122e-02 1.397969e-01 1.669336e-01 2.463385e-01 NaN
33 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
34 1.178892e+00 1.006281e+00 1.118136e+00 1.205720e+00 1.098240e+00 1.342353e+00 1.128300e+00 1.672711e+00 1.335984e+00 1.206670e+00 ... 2.446592e+00 2.483583e+00 2.401888e+00 2.481939e+00 2.980458e+00 2.773621e+00 2.611614e+00 2.662794e+00 1.935991e+00 NaN
35 1.286660e+00 8.426625e-01 8.838000e-01 1.160043e+00 8.659333e-01 1.529150e+00 1.259657e+00 1.621286e+00 1.368055e+00 1.851825e+00 ... 3.290883e+00 2.265493e+00 2.553725e+00 2.505039e+00 3.483338e+00 2.411938e+00 2.467071e+00 2.354741e+00 1.730284e+00 NaN
36 1.476317e+18 1.476320e+18 1.476324e+18 1.476328e+18 1.476331e+18 1.476335e+18 1.476338e+18 1.476342e+18 1.476346e+18 1.476349e+18 ... 1.481382e+18 1.481386e+18 1.481389e+18 1.481393e+18 1.481396e+18 1.481400e+18 1.481404e+18 1.481407e+18 1.481411e+18 NaN
37 4.735780e-01 4.849351e-01 4.977585e-01 4.828412e-01 4.870508e-01 4.575491e-01 4.804319e-01 4.979441e-01 4.715333e-01 5.509462e-01 ... 5.315315e-01 5.872872e-01 4.818849e-01 5.418191e-01 4.953549e-01 5.104957e-01 5.191038e-01 5.303433e-01 5.360159e-01 NaN
38 4.801196e-01 4.623446e-01 5.017792e-01 5.016625e-01 5.093061e-01 4.685682e-01 4.652875e-01 5.097277e-01 4.699478e-01 5.753100e-01 ... 8.137115e-01 7.656357e-01 8.137115e-01 7.887333e-01 8.160037e-01 8.137231e-01 8.137231e-01 7.656571e-01 8.137231e-01 NaN

39 rows × 1417 columns


In [ ]:

Correlations of timeseries


In [ ]:


In [ ]:


AVGS, MINS, MAXs vs REAL Prices


In [ ]: