2017 NCTU Data Maning HW2

0416037 李家安

Info

Group 3
Dataset: New York Citi Bike Trip Histories, first data

Schema

Station id, name, lat, lng
Station's flow id, time, in-flow, out-flow

Task

Using listed algorithms as following:
- Spatial clustering
  - stations’ geo-information
  - Kmeans (k=?)
    - elbow method
  - DBscan (eps=?,min_sample=?,metric=?)
- Temporal clustering
  - in-flow and out-flow data in the first week
  - Agglomerative Clustering (affinity=?)
  - PCA (n_components=?) => Agglomerative Clustering(affinity=?)
- Other

Need

Make some observation, compare different method and parameters, explain the result and see if the output meet your expect.

Data Query

Connect SQL



In [1]:

    
from sqlalchemy import create_engine

engine = create_engine('mysql://calee:110010@localhost/citybike')

Load Station info



In [2]:

    
import pandas as pd

station = pd.read_sql_table(table_name='station', con=engine)
station









    



/usr/local/lib/python3.5/dist-packages/pandas/core/computation/__init__.py:18: UserWarning: The installed version of numexpr 2.4.3 is not supported in pandas and will be not be used
The minimum supported version is 2.4.6

  ver=ver, min_ver=_MIN_NUMEXPR_VERSION), UserWarning)






    Out[2]:







  
    
      
      id
      name
      latitude
      longitude
    
  
  
    
      0
      72
      W 52 St & 11 Ave
      40.767272
      -73.993929
    
    
      1
      79
      Franklin St & W Broadway
      40.719116
      -74.006667
    
    
      2
      82
      St James Pl & Pearl St
      40.711174
      -74.000165
    
    
      3
      83
      Atlantic Ave & Fort Greene Pl
      40.683826
      -73.976323
    
    
      4
      116
      W 17 St & 8 Ave
      40.741776
      -74.001497
    
    
      5
      119
      Park Ave & St Edwards St
      40.696089
      -73.978034
    
    
      6
      120
      Lexington Ave & Classon Ave
      40.686768
      -73.959282
    
    
      7
      127
      Barrow St & Hudson St
      40.731724
      -74.006744
    
    
      8
      128
      MacDougal St & Prince St
      40.727103
      -74.002971
    
    
      9
      143
      Clinton St & Joralemon St
      40.692395
      -73.993379
    
    
      10
      144
      Nassau St & Navy St
      40.698399
      -73.980689
    
    
      11
      146
      Hudson St & Reade St
      40.716250
      -74.009106
    
    
      12
      150
      E 2 St & Avenue C
      40.720874
      -73.980858
    
    
      13
      151
      Cleveland Pl & Spring St
      40.722104
      -73.997249
    
    
      14
      152
      Warren St & Church St
      40.714740
      -74.009106
    
    
      15
      153
      E 40 St & 5 Ave
      40.752062
      -73.981632
    
    
      16
      157
      Henry St & Atlantic Ave
      40.690893
      -73.996123
    
    
      17
      161
      LaGuardia Pl & W 3 St
      40.729170
      -73.998102
    
    
      18
      164
      E 47 St & 2 Ave
      40.753231
      -73.970325
    
    
      19
      167
      E 39 St & 3 Ave
      40.748901
      -73.976049
    
    
      20
      168
      W 18 St & 6 Ave
      40.739713
      -73.994564
    
    
      21
      173
      Broadway & W 49 St
      40.760683
      -73.984527
    
    
      22
      174
      E 25 St & 1 Ave
      40.738177
      -73.977387
    
    
      23
      195
      Liberty St & Broadway
      40.709056
      -74.010434
    
    
      24
      212
      W 16 St & The High Line
      40.743349
      -74.006818
    
    
      25
      216
      Columbia Heights & Cranberry St
      40.700379
      -73.995481
    
    
      26
      217
      Old Fulton St
      40.702772
      -73.993836
    
    
      27
      223
      W 13 St & 7 Ave
      40.737815
      -73.999947
    
    
      28
      228
      E 48 St & 3 Ave
      40.754601
      -73.971879
    
    
      29
      229
      Great Jones St
      40.727434
      -73.993790
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      604
      3436
      Greenwich St & Hubert St
      40.721319
      -74.010065
    
    
      605
      3437
      Riverside Dr & W 91 St
      40.793135
      -73.977004
    
    
      606
      3438
      E 76 St & 3 Ave
      40.772249
      -73.958421
    
    
      607
      3440
      Fulton St & Adams St
      40.692418
      -73.989495
    
    
      608
      3441
      10 Hudson Yards
      40.752957
      -74.002640
    
    
      609
      3443
      W 52 St & 6 Ave
      40.761330
      -73.979820
    
    
      610
      3445
      Riverside Dr & W 89 St
      40.791812
      -73.978602
    
    
      611
      3447
      E 71 St & 1 Ave
      40.767034
      -73.956227
    
    
      612
      3449
      Eckford St & Engert Ave
      40.721463
      -73.948009
    
    
      613
      3452
      Bayard St & Leonard St
      40.719156
      -73.948854
    
    
      614
      3453
      Devoe St & Lorimer St
      40.713352
      -73.949103
    
    
      615
      3454
      Leonard St & Maujer St
      40.710369
      -73.947060
    
    
      616
      3455
      Schermerhorn St & 3 Ave
      40.686808
      -73.980362
    
    
      617
      3456
      Jackson St & Leonard St
      40.716380
      -73.948213
    
    
      618
      3457
      E 58 St & Madison Ave
      40.763026
      -73.972095
    
    
      619
      3458
      W 55 St & 6 Ave
      40.763094
      -73.978350
    
    
      620
      3459
      E 53 St & 3 Ave
      40.757632
      -73.969306
    
    
      621
      3461
      Murray St & Greenwich St
      40.714852
      -74.011223
    
    
      622
      3462
      E 44 St & 2 Ave
      40.751184
      -73.971387
    
    
      623
      3463
      E 16 St & Irving Pl
      40.735367
      -73.987974
    
    
      624
      3464
      W 37 St & Broadway
      40.752271
      -73.987706
    
    
      625
      3466
      W 45 St & 6 Ave
      40.756687
      -73.982577
    
    
      626
      3468
      NYCBS Depot - STY - Garage 4
      40.730380
      -73.974750
    
    
      627
      3469
      India St & West St
      40.731814
      -73.959950
    
    
      628
      3470
      Gowanus Tech Station
      40.669802
      -73.994905
    
    
      629
      3472
      W 15 St & 10 Ave
      40.742754
      -74.007474
    
    
      630
      3474
      6 Ave & Spring St
      40.725256
      -74.004121
    
    
      631
      3476
      Norman Ave & Leonard St
      40.725770
      -73.950740
    
    
      632
      3477
      39 St & 2 Ave - Citi Bike HQ at Industry City
      40.655400
      -74.010628
    
    
      633
      3478
      2 Ave & 36 St - Citi Bike HQ at Industry City
      40.657089
      -74.008702
    
  

634 rows × 4 columns

Load station flow info



In [3]:

    
import pandas as pd
import datetime
import numpy as np

query = '''
SELECT * FROM in_out
WHERE time between "2017-07-01" AND "2017-07-07 23:59:59"
ORDER BY time;
'''

flow = pd.read_sql_query(query, con=engine)
flow['time'] = flow['time'] - datetime.datetime(2017, 7, 1, 0, 0, 0)
flow['time'] = flow['time'] / np.timedelta64(1, 's')
flow['time'] = flow[['time']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
flow









    Out[3]:







  
    
      
      id
      time
      in
      out
    
  
  
    
      0
      546
      0.0
      0
      3
    
    
      1
      3131
      0.0
      1
      0
    
    
      2
      212
      0.0
      3
      1
    
    
      3
      398
      0.0
      0
      3
    
    
      4
      3120
      0.0
      0
      2
    
    
      5
      3466
      0.0
      1
      0
    
    
      6
      153
      0.0
      1
      0
    
    
      7
      3171
      0.0
      1
      0
    
    
      8
      416
      0.0
      2
      0
    
    
      9
      502
      0.0
      2
      2
    
    
      10
      243
      0.0
      0
      1
    
    
      11
      3233
      0.0
      2
      0
    
    
      12
      532
      0.0
      0
      1
    
    
      13
      344
      0.0
      0
      1
    
    
      14
      3381
      0.0
      1
      0
    
    
      15
      474
      0.0
      1
      0
    
    
      16
      468
      0.0
      0
      1
    
    
      17
      505
      0.0
      5
      1
    
    
      18
      3242
      0.0
      2
      0
    
    
      19
      3067
      0.0
      3
      1
    
    
      20
      492
      0.0
      0
      1
    
    
      21
      308
      0.0
      0
      1
    
    
      22
      457
      0.0
      3
      1
    
    
      23
      79
      0.0
      0
      1
    
    
      24
      3300
      0.0
      1
      2
    
    
      25
      440
      0.0
      1
      0
    
    
      26
      514
      0.0
      0
      2
    
    
      27
      3090
      0.0
      4
      0
    
    
      28
      3305
      0.0
      0
      1
    
    
      29
      3125
      0.0
      1
      0
    
    
      ...
      ...
      ...
      ...
      ...
    
    
      131107
      394
      1.0
      2
      3
    
    
      131108
      3461
      1.0
      2
      2
    
    
      131109
      348
      1.0
      1
      1
    
    
      131110
      3443
      1.0
      0
      2
    
    
      131111
      363
      1.0
      0
      2
    
    
      131112
      3325
      1.0
      1
      1
    
    
      131113
      3142
      1.0
      3
      2
    
    
      131114
      432
      1.0
      3
      2
    
    
      131115
      405
      1.0
      2
      1
    
    
      131116
      3238
      1.0
      0
      1
    
    
      131117
      173
      1.0
      2
      1
    
    
      131118
      3319
      1.0
      0
      1
    
    
      131119
      344
      1.0
      0
      1
    
    
      131120
      3359
      1.0
      1
      0
    
    
      131121
      410
      1.0
      1
      1
    
    
      131122
      469
      1.0
      7
      0
    
    
      131123
      244
      1.0
      0
      3
    
    
      131124
      358
      1.0
      0
      1
    
    
      131125
      3162
      1.0
      0
      1
    
    
      131126
      3463
      1.0
      3
      0
    
    
      131127
      453
      1.0
      3
      2
    
    
      131128
      450
      1.0
      1
      4
    
    
      131129
      3119
      1.0
      1
      0
    
    
      131130
      3082
      1.0
      0
      2
    
    
      131131
      308
      1.0
      1
      0
    
    
      131132
      3175
      1.0
      1
      1
    
    
      131133
      3120
      1.0
      0
      1
    
    
      131134
      387
      1.0
      2
      0
    
    
      131135
      265
      1.0
      2
      1
    
    
      131136
      467
      1.0
      2
      1
    
  

131137 rows × 4 columns

Clustering

K-Means

kmeans 切出中心點並作圖



In [4]:

    
from mpl_toolkits.basemap import Basemap
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np

my_map = Basemap(projection='merc', lat_0=40.7, lon_0=-73.98,
    resolution = 'h', area_thresh = 0.01,
    llcrnrlon=-74.1, llcrnrlat=40.64,
    urcrnrlon=-73.9, urcrnrlat=40.85)

def draw_kmeans(num):
    kmeans = KMeans(n_clusters=num, random_state=0).fit(station[['latitude','longitude']])
    area = kmeans.cluster_centers_
    
    lon = station['longitude'].tolist()
    lat = station['latitude'].tolist()
    cen_lon = [ a[0] for a in area[:,[1]] ]
    cen_lat = [ a[0] for a in area[:,[0]] ]
    labels = station['id'].tolist()

    fig = plt.figure(frameon=False)
    fig.set_size_inches(15,10)

    my_map.drawcoastlines()
    my_map.drawcountries()
    my_map.fillcontinents(color='coral')
    my_map.drawmapboundary()

    x,y = my_map(lon, lat)
    my_map.plot(x, y, 'bo', markersize=1)

    x,y = my_map(cen_lon, cen_lat)
    my_map.plot(x, y, 'g+', markersize=10)

    plt.show()
    
for num in (2, 3, 5, 10, 15, 20, 30):
    draw_kmeans(num)









    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)

由上面的資料顯示，用 kmeans 時，將 n 設為 20 似乎是不錯的分群，可以大略把每一個小群集切割出來

DBScan

DBScan 做出分群並作圖



In [6]:

    
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
import numpy as np
from sklearn import metrics
import matplotlib.cm as cm

def draw_DBSCAN(method, min_sup, eps):
    dbscan = DBSCAN(eps=eps, min_samples=min_sup, metric=method).fit(station[['latitude','longitude']])
    lb = dbscan.labels_
    lbmx = max(lb)
    llat = []
    llon = []
    for i in range(-1, lbmx+1):
        la = []
        lo = []
        for idx, j in enumerate(lb):
            if i == j:
                lo.append(station['longitude'].tolist()[idx])
                la.append(station['latitude'].tolist()[idx])
        llat.append(la)
        llon.append(lo)
    # draw
    my_map = Basemap(projection='merc', lat_0=40.7, lon_0=-73.98,
        resolution = 'h', area_thresh = 0.01,
        llcrnrlon=-74.1, llcrnrlat=40.64,
        urcrnrlon=-73.9, urcrnrlat=40.85)

    lon = station['longitude'].tolist()
    lat = station['latitude'].tolist()
    labels = station['id'].tolist()

    fig = plt.figure(frameon=False)
    fig.set_size_inches(15,10)

    my_map.drawcoastlines()
    my_map.drawcountries()
    my_map.fillcontinents(color='coral')
    my_map.drawmapboundary()

    colors = cm.rainbow(np.linspace(0, 1, lbmx+2))
    #print(colors)
    #colors = np.array([0,0,0,1]) + colors
    #print(colors)

    for i in range(lbmx+2):
        x,y = my_map(llon[i], llat[i])
        my_map.plot(x, y, color=colors[i], markersize=2, marker='o', linestyle='')

    plt.show()

for method in ['euclidean']:
    for eps in (0.002, 0.004, 0.006):
        for min_sup in (5, 10, 15):
#             if eps == 0.004 and min_sup == 5: continue
            print('method=', method)
            print('eps=', eps)
            print('min_support=', min_sup)
            draw_DBSCAN(method=method, min_sup=min_sup, eps=eps)









    



method= euclidean
eps= 0.002
min_support= 5






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.002
min_support= 10






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.002
min_support= 15






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.004
min_support= 5






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.004
min_support= 10






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.004
min_support= 15






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.006
min_support= 5






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.006
min_support= 10






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)






    












    



method= euclidean
eps= 0.006
min_support= 15






    



/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1767: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  axisbgc = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1623: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  fill_color = ax.get_axis_bgcolor()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:1698: MatplotlibDeprecationWarning: The axesPatch function was deprecated in version 2.1. Use Axes.patch instead.
  limb = ax.axesPatch
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3260: MatplotlibDeprecationWarning: The ishold function was deprecated in version 2.0.
  b = ax.ishold()
/usr/local/lib/python3.5/dist-packages/mpl_toolkits/basemap/__init__.py:3269: MatplotlibDeprecationWarning: axes.hold is deprecated.
    See the API Changes document (http://matplotlib.org/api/api_changes.html)
    for more details.
  ax.hold(b)

由上面的圖可以看出，若 eps 與 min_support 太低，容易出現非常方散，沒有鑑別度的分群，而若 eps 與 min_support 太大，又容易出現把全部點化成同一區，完全沒有分群到。

在這次的資料中，看起來似乎設定 eps = 0.004 min_support = 2，eps = 0.006 min_support = 10，eps = 0.006 min_support = 5，eps = 0.06 min_support = 2，eps = 0.06 min_support = 1 等，都可以切出比較不錯的分群。

Temporal clustering

Agglomerative

將 in-flow, out-flow, timestemp 抽出來，用 Agglomerative 分群



In [7]:

    
from sklearn.cluster import AgglomerativeClustering

array = []
for method in ('ward', 'complete', 'average'):
    for aff in ('euclidean', 'l1', 'l2', 'manhattan', 'cosine'):
        if method == 'ward' and not aff == 'euclidean':
            continue
        agg = AgglomerativeClustering(n_clusters=2, affinity=aff, linkage=method, memory="agg_cache")
        agg.fit(flow.iloc[1:][:20000])
        array.append(agg.labels_)
array









    Out[7]:





[array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 0, 0, ..., 0, 0, 0]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 1, 1, ..., 1, 1, 1]),
 array([0, 0, 0, ..., 0, 0, 0])]



In [8]:

    
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np


def randrange(n, vmin, vmax):
    '''
    Helper function to make an array of random numbers having shape (n, )
    with each number distributed Uniform(vmin, vmax).
    '''
    return (vmax - vmin)*np.random.rand(n) + vmin

def draw(data):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    n = 100

    # For each set of style and range settings, plot n random points in the box
    # defined by x in [23, 32], y in [0, 100], z in [zlow, zhigh].
    xs1 = np.array([])
    xs2 = np.array([])
    ys1 = np.array([])
    ys2 = np.array([])
    zs1 = np.array([])
    zs2 = np.array([])
    s1 = []
    s2 = []
    for idx, d in enumerate(data):
        #print(flow['time'][idx])
        if d == 0:
            xs1 = np.append(xs1, flow['time'][idx])
            ys1 = np.append(ys1, flow['in'][idx])
            zs1 = np.append(zs1, flow['out'][idx])
            s1.append(1)
        else:
            xs2 = np.append(xs2, flow['time'][idx])
            ys2 = np.append(ys2, flow['in'][idx])
            zs2 = np.append(zs2, flow['out'][idx])
            s2.append(1)
        #print(xs1)
#     for c, m, zlow, zhigh in [('r', 'o', -50, -25), ('b', '^', -30, -5)]:
#         xs = randrange(n, 23, 32)
#         ys = randrange(n, 0, 100)
#         zs = randrange(n, zlow, zhigh)
    ax.scatter(xs1, ys1, zs1, c='r', marker='o', s=s1)
    ax.scatter(xs2, ys2, zs2, c='b', marker='^', s=s2)

    ax.set_xlabel('time stemp')
    ax.set_ylabel('in-flow')
    ax.set_zlabel('out-flow')

    plt.show()

for tmp in array:
    #print(tmp)
    draw(tmp)

不管用任何算法，感覺看不出明顯的分佈差異

PCA + Agglomerative

試著先用 PCA 降維



In [23]:

    
from sklearn.decomposition import PCA

pca = PCA(n_components=3).fit(flow.iloc[1:])
print(pca.components_)









    



[[  9.99999833e-01   2.35510194e-06  -4.10646657e-04  -4.07537190e-04]
 [  5.78176873e-04   7.37263743e-03   7.33414737e-01   6.79741226e-01]
 [ -1.97542706e-05  -9.93559792e-04   6.79764859e-01  -7.33429443e-01]]



In [ ]:

	id	name	latitude	longitude
0	72	W 52 St & 11 Ave	40.767272	-73.993929
1	79	Franklin St & W Broadway	40.719116	-74.006667
2	82	St James Pl & Pearl St	40.711174	-74.000165
3	83	Atlantic Ave & Fort Greene Pl	40.683826	-73.976323
4	116	W 17 St & 8 Ave	40.741776	-74.001497
5	119	Park Ave & St Edwards St	40.696089	-73.978034
6	120	Lexington Ave & Classon Ave	40.686768	-73.959282
7	127	Barrow St & Hudson St	40.731724	-74.006744
8	128	MacDougal St & Prince St	40.727103	-74.002971
9	143	Clinton St & Joralemon St	40.692395	-73.993379
10	144	Nassau St & Navy St	40.698399	-73.980689
11	146	Hudson St & Reade St	40.716250	-74.009106
12	150	E 2 St & Avenue C	40.720874	-73.980858
13	151	Cleveland Pl & Spring St	40.722104	-73.997249
14	152	Warren St & Church St	40.714740	-74.009106
15	153	E 40 St & 5 Ave	40.752062	-73.981632
16	157	Henry St & Atlantic Ave	40.690893	-73.996123
17	161	LaGuardia Pl & W 3 St	40.729170	-73.998102
18	164	E 47 St & 2 Ave	40.753231	-73.970325
19	167	E 39 St & 3 Ave	40.748901	-73.976049
20	168	W 18 St & 6 Ave	40.739713	-73.994564
21	173	Broadway & W 49 St	40.760683	-73.984527
22	174	E 25 St & 1 Ave	40.738177	-73.977387
23	195	Liberty St & Broadway	40.709056	-74.010434
24	212	W 16 St & The High Line	40.743349	-74.006818
25	216	Columbia Heights & Cranberry St	40.700379	-73.995481
26	217	Old Fulton St	40.702772	-73.993836
27	223	W 13 St & 7 Ave	40.737815	-73.999947
28	228	E 48 St & 3 Ave	40.754601	-73.971879
29	229	Great Jones St	40.727434	-73.993790
...	...	...	...	...
604	3436	Greenwich St & Hubert St	40.721319	-74.010065
605	3437	Riverside Dr & W 91 St	40.793135	-73.977004
606	3438	E 76 St & 3 Ave	40.772249	-73.958421
607	3440	Fulton St & Adams St	40.692418	-73.989495
608	3441	10 Hudson Yards	40.752957	-74.002640
609	3443	W 52 St & 6 Ave	40.761330	-73.979820
610	3445	Riverside Dr & W 89 St	40.791812	-73.978602
611	3447	E 71 St & 1 Ave	40.767034	-73.956227
612	3449	Eckford St & Engert Ave	40.721463	-73.948009
613	3452	Bayard St & Leonard St	40.719156	-73.948854
614	3453	Devoe St & Lorimer St	40.713352	-73.949103
615	3454	Leonard St & Maujer St	40.710369	-73.947060
616	3455	Schermerhorn St & 3 Ave	40.686808	-73.980362
617	3456	Jackson St & Leonard St	40.716380	-73.948213
618	3457	E 58 St & Madison Ave	40.763026	-73.972095
619	3458	W 55 St & 6 Ave	40.763094	-73.978350
620	3459	E 53 St & 3 Ave	40.757632	-73.969306
621	3461	Murray St & Greenwich St	40.714852	-74.011223
622	3462	E 44 St & 2 Ave	40.751184	-73.971387
623	3463	E 16 St & Irving Pl	40.735367	-73.987974
624	3464	W 37 St & Broadway	40.752271	-73.987706
625	3466	W 45 St & 6 Ave	40.756687	-73.982577
626	3468	NYCBS Depot - STY - Garage 4	40.730380	-73.974750
627	3469	India St & West St	40.731814	-73.959950
628	3470	Gowanus Tech Station	40.669802	-73.994905
629	3472	W 15 St & 10 Ave	40.742754	-74.007474
630	3474	6 Ave & Spring St	40.725256	-74.004121
631	3476	Norman Ave & Leonard St	40.725770	-73.950740
632	3477	39 St & 2 Ave - Citi Bike HQ at Industry City	40.655400	-74.010628
633	3478	2 Ave & 36 St - Citi Bike HQ at Industry City	40.657089	-74.008702

	id	time	in	out
0	546	0.0	0	3
1	3131	0.0	1	0
2	212	0.0	3	1
3	398	0.0	0	3
4	3120	0.0	0	2
5	3466	0.0	1	0
6	153	0.0	1	0
7	3171	0.0	1	0
8	416	0.0	2	0
9	502	0.0	2	2
10	243	0.0	0	1
11	3233	0.0	2	0
12	532	0.0	0	1
13	344	0.0	0	1
14	3381	0.0	1	0
15	474	0.0	1	0
16	468	0.0	0	1
17	505	0.0	5	1
18	3242	0.0	2	0
19	3067	0.0	3	1
20	492	0.0	0	1
21	308	0.0	0	1
22	457	0.0	3	1
23	79	0.0	0	1
24	3300	0.0	1	2
25	440	0.0	1	0
26	514	0.0	0	2
27	3090	0.0	4	0
28	3305	0.0	0	1
29	3125	0.0	1	0
...	...	...	...	...
131107	394	1.0	2	3
131108	3461	1.0	2	2
131109	348	1.0	1	1
131110	3443	1.0	0	2
131111	363	1.0	0	2
131112	3325	1.0	1	1
131113	3142	1.0	3	2
131114	432	1.0	3	2
131115	405	1.0	2	1
131116	3238	1.0	0	1
131117	173	1.0	2	1
131118	3319	1.0	0	1
131119	344	1.0	0	1
131120	3359	1.0	1	0
131121	410	1.0	1	1
131122	469	1.0	7	0
131123	244	1.0	0	3
131124	358	1.0	0	1
131125	3162	1.0	0	1
131126	3463	1.0	3	0
131127	453	1.0	3	2
131128	450	1.0	1	4
131129	3119	1.0	1	0
131130	3082	1.0	0	2
131131	308	1.0	1	0
131132	3175	1.0	1	1
131133	3120	1.0	0	1
131134	387	1.0	2	0
131135	265	1.0	2	1
131136	467	1.0	2	1