Pokemon GO Visualizations based on London spawn data

Let's have a first look at the data dump from the crawler that was populating a database.

Generation

Using PokéVision.com I was able to query pokemon spawns by lat/long values. For this notebook I'm restricting the data to what I collected from London, UK, sampling from a grid at a constant stride every 5 minutes.
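As a rough illustration of the grid sampling described above, here is a minimal sketch of how such query points could be generated. The function name grid_points, the bounding box and the stride are all assumptions made for this example; they are not the crawler's actual values.

```python
import numpy as np

def grid_points(south, north, west, east, stride):
    """Return (lat, long) pairs covering the box at a constant stride in degrees."""
    # number of grid lines per axis, including both edges of the box
    n_lat = int(round((north - south) / stride)) + 1
    n_long = int(round((east - west) / stride)) + 1
    lats = np.linspace(south, north, n_lat)
    longs = np.linspace(west, east, n_long)
    return [(float(lat), float(lon)) for lat in lats for lon in longs]

# Hypothetical box and stride, roughly central London (not the crawler's real settings).
points = grid_points(south=51.45, north=51.55, west=-0.24, east=0.08, stride=0.01)
print('%d query points, first one at %f/%f' % (len(points), points[0][0], points[0][1]))
```

Each of these points would then be queried once per crawl round, which in this notebook's setup was every 5 minutes.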

I ignored the five most common pokemon up front while crawling, which were:

16, // pidgey
19, // rattata
21, // spearow
41, // zubat
96 // drowzee

Format

The data format is really simple: the API returns a unique identifier and an expiration time, which tells when a pokemon despawns. Together these will act as our "key" to deduplicate and clean the data first. The real juicy content is in the identifier of the pokemon species ("pokemon_id") and its lat/long location value.


In [2]:
import pandas as pd
df = pd.read_csv('data/data.csv', encoding="utf-8-sig")
df.head()


Out[2]:
id expiration_time pokemon_id lat long
0 1437290 1469531977 86 51.585117 0.079396
1 3258228 1469531978 86 51.585117 0.079396
2 3258226 1469532675 18 51.585281 0.079672
3 2403432 1469532300 7 51.576583 -0.396503
4 2387733 1469532339 13 51.575416 -0.350026

In [3]:
original_count = df['id'].count()
print('%d rows' % original_count)


1008222 rows

Cleaning the data

First off we are going to deduplicate the data, since the crawler might return some duplicated rows on failure conditions. The ID and expiration time come in very handy here and can be used as a safe unique identifier.

Afterwards we turn the expiration time into a Python datetime and add the pokemon's real name.


In [4]:
df = df.drop_duplicates(['id', 'expiration_time'])

# let's see how much we have removed
count = df['id'].count()
print('removed %d rows' % (original_count - count))
print('remaining %d rows' % count)

# convert the unix timestamp (in seconds) into a naive UTC datetime
df['expiration_time'] = pd.to_datetime(df['expiration_time'], unit='s')


removed 150779 rows
remaining 857443 rows

In [5]:
# join with the list of pokemons
pkmn = pd.read_csv('data/pokemon.csv', encoding="utf-8")
df = pd.merge(df, pkmn, on='pokemon_id', how='inner')
df.head()


Out[5]:
id expiration_time pokemon_id lat long Name
0 1437290 2016-07-26 11:19:37 86 51.585117 0.079396 Seel
1 3258228 2016-07-26 11:19:38 86 51.585117 0.079396 Seel
2 1666641 2016-07-26 11:21:21 86 51.562280 -0.051938 Seel
3 1572259 2016-07-26 11:21:22 86 51.562280 -0.051938 Seel
4 2957601 2016-07-26 11:20:49 86 51.536197 -0.103055 Seel

Top 10 most common Pokemon in London


In [6]:
hist = df.groupby('Name')['id'].count().sort_values(ascending=False)
hist[0:10]


Out[6]:
Name
Magikarp    66179
Krabby      51800
Weedle      49428
Gastly      48836
Goldeen     40247
Jynx        39906
Psyduck     33934
Poliwag     32849
Staryu      32742
Caterpie    32617
Name: id, dtype: int64

London is full of Magikarp! It seems that London is filled with water-type pokemon: 6 of the top 10. Later on we will figure out whether the rumoured affinity between bodies of water and water-type pokemon holds.
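The water-type count can be checked with a few lines. This is only a sketch: the counts are copied from the output above, and the type lookup is written by hand here because pokemon.csv appears to carry only names.

```python
# Counts copied from the top 10 output above.
top10 = {
    'Magikarp': 66179, 'Krabby': 51800, 'Weedle': 49428, 'Gastly': 48836,
    'Goldeen': 40247, 'Jynx': 39906, 'Psyduck': 33934, 'Poliwag': 32849,
    'Staryu': 32742, 'Caterpie': 32617,
}
# Hand-written lookup of the water types among them (an assumption of this sketch).
water_types = {'Magikarp', 'Krabby', 'Goldeen', 'Psyduck', 'Poliwag', 'Staryu'}

water_names = [name for name in top10 if name in water_types]
water_share = float(sum(top10[n] for n in water_names)) / sum(top10.values())
print('%d of %d are water type' % (len(water_names), len(top10)))
print('share of top 10 sightings: %.2f' % water_share)
```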

Top 10 least common Pokemon in London


In [7]:
hist.tail(10)


Out[7]:
Name
Dugtrio       8
Machamp       7
Raichu        6
Jolteon       5
Rapidash      5
Sandslash     4
Charmeleon    3
Dodrio        2
Exeggutor     2
Charizard     1
Name: id, dtype: int64

Not so surprising that the final evolutions are amongst the most uncommon pokemon.

Pokemon not found in London


In [8]:
allPokemonNames = set(pkmn['Name'])
# ignore the pokemon we ignored upfront
filteredPokemonNames = allPokemonNames - set(['Pidgey', 'Rattata', 'Spearow', 'Zubat', 'Drowzee'])
# ignore the legendaries
filteredPokemonNames = filteredPokemonNames - set(['Mew', 'Mewtwo', 'Moltres', 'Zapdos', 'Articuno'])
actualPokemonNames = set(df['Name'].unique())

filteredPokemonNames.difference(actualPokemonNames)


Out[8]:
{'Ditto', "Farfetch'd", 'Flareon', 'Kangaskhan', 'Tauros'}

I originally expected only the region-locked pokemon (Farfetch'd, Tauros, Kangaskhan) and Ditto to show up here. However, there is also an evolution in there (Flareon) that seems to have a very small chance of spawning and wasn't covered by the data yet. Other rare evolutions that I've personally seen, like Golem and Machamp, did make it into the data. Let's see how this changes over time with more data available.

Plotting locations


In [9]:
import geoplotlib as g
from geoplotlib.utils import BoundingBox, DataAccessObject

def savePlotOnLondonMap(data, fileName):
    # define london
    LDN = BoundingBox(north=51.547, west=-0.239, south=51.451011, east=0.072161)
    # massage the data into a format that geoplotlib understands
    geodf = data[['lat', 'long']].copy()
    geodf.rename(columns={'lat': 'lat', 'long': 'lon'}, inplace=True)
    
    g.tiles_provider('positron')
    g.kde(DataAccessObject.from_dataframe(geodf), bw=1)
    g.set_bbox(LDN)
    g.savefig(fileName)
    # inline unfortunately doesn't work :/
    # g.inline()

# plot everything
savePlotOnLondonMap(df, 'img/alllocations')

# give me all charmander locations
geodf = df[df['pokemon_id'] == 4] 
savePlotOnLondonMap(geodf, 'img/charmanderlocations')

# give me all Blastoise locations
geodf = df[df['pokemon_id'] == 9] 
savePlotOnLondonMap(geodf, 'img/blastoiselocations')


('smallest non-zero count', 0.037444165055830925)
('max count:', 242.57823777953465)
('smallest non-zero count', 7.1642544339087712e-08)
('max count:', 18.19639468235092)
('smallest non-zero count', 3.5821272169543856e-08)
('max count:', 1.9098707009025571)

Let's look at the output:

A very busy map. Most pokemon are sighted where a lot of people seem to be, especially parks and touristy areas like Westminster/Greenwich/Hyde Park/The O2. Keep in mind that there is a selection bias: the PokeVision API caches the results heavily and only refreshes on a user request, so we won't have any data in parts of London where nobody is using their service.

Let's look at the Charmander spawns:

There seems to be a so-called "nest" in Holland Park and another in the Stratford Olympic Park, where there is an unusually high density of Charmanders.

Let's look at the much rarer Blastoise spawns:

Random places, no real cluster pattern here! I have already missed two chances of catching it. I guess I have to grind and evolve one :/

Automatically finding nests of pokemon

To find nests, we are going to use a simple density-based clustering algorithm called DBSCAN for every pokemon where we have more than twenty data points available. Then we simply print out the mean of the points in each cluster and plot the clusters on the map.
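One caveat worth illustrating: scikit-learn's haversine metric expects [lat, long] in radians, whereas the cell below passes degrees, so its eps is not a true ground distance. The following is a minimal sketch on synthetic points (not the crawl data) of how the same clustering could be expressed with eps given in kilometres. The nest positions, eps_km and min_samples are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

rng = np.random.RandomState(0)
# Synthetic sightings in degrees: two tight groups plus a few scattered points.
nest_a = rng.normal([51.50, -0.20], 0.0005, size=(60, 2))
nest_b = rng.normal([51.54, -0.01], 0.0005, size=(60, 2))
noise = rng.uniform([51.45, -0.24], [51.55, 0.07], size=(10, 2))
coords_deg = np.vstack([nest_a, nest_b, noise])

# The haversine metric works on radians, so eps becomes an angle:
# ground distance divided by the earth radius.
eps_km = 0.2
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=20,
            metric='haversine', algorithm='ball_tree').fit(np.radians(coords_deg))

labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print('number of clusters: %d' % n_clusters)
for k in sorted(set(labels) - {-1}):
    # report the cluster centre back in degrees
    center = coords_deg[labels == k].mean(axis=0)
    print('cluster %d at %f/%f' % (k, center[0], center[1]))
```

Switching the real analysis to radians would change which clusters are found, so the results printed in this notebook would not carry over unchanged.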


In [13]:
import numpy as np

from sklearn.cluster import DBSCAN
from sklearn import metrics

for x in df['pokemon_id'].sort_values().unique():   
    dff = df[df['pokemon_id'] == x]   
    cnt = dff['id'].count()
    if cnt > 20:
       dff = dff[['lat', 'long']].copy()
       db = DBSCAN(eps=0.001, min_samples=50, metric='haversine', algorithm='ball_tree').fit(dff.values)
       pkname = pkmn[pkmn['pokemon_id'] == x].iloc[0]['Name']
       labels = db.labels_
       core_samples_mask = np.zeros_like(labels, dtype=bool)
       core_samples_mask[db.core_sample_indices_] = True
       unique_labels = set(labels)
        
       n_clusters = len(unique_labels) - (1 if -1 in labels else 0)
       if(n_clusters > 1):
           print('found clusters for %s, number of pokemon %d' % (pkname, cnt))
           print('number of clusters: %d' % n_clusters)
           tp = pd.DataFrame(columns=['lat','long'])
           for k in unique_labels:
              # ignore class noise when printing            
              if k != -1:           
                 class_member_mask = (labels == k)
                 output = dff[class_member_mask & core_samples_mask]
                 tp = pd.concat([tp, output])
                 center = output.mean()
                 kcount = output['lat'].count()
                 # limit the verbosity of the output
                 if len(unique_labels) < 5:
                    print('found cluster \'%d\' at %f/%f with %d occurrences' % (k, center['lat'], center['long'], kcount))    
           # plot all the clusters in a single map
           savePlotOnLondonMap(tp, 'img/{}_nests'.format(pkname))
           print()


found clusters for Bulbasaur, number of pokemon 4271
number of clusters: 16
('smallest non-zero count', 1.7910636084771928e-07)
('max count:', 17.940581673447781)

found clusters for Charmander, number of pokemon 5893
number of clusters: 14
('smallest non-zero count', 3.5821272169543856e-08)
('max count:', 18.19639468235092)

found clusters for Squirtle, number of pokemon 18547
number of clusters: 23
('smallest non-zero count', 3.5821272169543856e-08)
('max count:', 7.8117030065067254)

found clusters for Caterpie, number of pokemon 32617
number of clusters: 64
('smallest non-zero count', 1.4328508867817542e-07)
('max count:', 11.569769285486398)

found clusters for Weedle, number of pokemon 49428
number of clusters: 136
('smallest non-zero count', 5.3731908254315777e-08)
('max count:', 11.801651576267837)

found clusters for Pidgeotto, number of pokemon 22829
number of clusters: 26
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 2.4271631488844747)

found clusters for Ekans, number of pokemon 2551
number of clusters: 8
('smallest non-zero count', 4.4776590211929819e-07)
('max count:', 11.632801810407553)

found clusters for Pikachu, number of pokemon 2088
number of clusters: 9
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 10.669613423399158)

found clusters for Sandshrew, number of pokemon 899
number of clusters: 4
('smallest non-zero count', 3.7612335778021048e-07)
('max count:', 10.552989245572373)

found clusters for Nidoran♀, number of pokemon 13419
number of clusters: 25
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 7.6367546284282062)

found clusters for Nidoran♂, number of pokemon 13207
number of clusters: 25
('smallest non-zero count', 1.2537445259340348e-07)
('max count:', 9.6571906583751836)

found clusters for Clefairy, number of pokemon 5664
number of clusters: 12
('smallest non-zero count', 2.1492763301726311e-07)
('max count:', 15.01355229030624)

found clusters for Vulpix, number of pokemon 2081
number of clusters: 9
('smallest non-zero count', 3.7612335778021048e-07)
('max count:', 9.3471319657837615)

found clusters for Jigglypuff, number of pokemon 8871
number of clusters: 7
('smallest non-zero count', 4.1194462994975431e-07)
('max count:', 9.7180680599170337)

found clusters for Oddish, number of pokemon 14269
number of clusters: 23
('smallest non-zero count', 8.4179989598428052e-07)
('max count:', 45.415433118079775)

found clusters for Paras, number of pokemon 21232
number of clusters: 23
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 16.237355051520243)

found clusters for Venonat, number of pokemon 14057
number of clusters: 22
('smallest non-zero count', 1.2537445259340348e-07)
('max count:', 6.7092260709129841)

found clusters for Diglett, number of pokemon 1555
number of clusters: 5
('smallest non-zero count', 3.4030208561066659e-07)
('max count:', 9.0718858292871456)

found clusters for Meowth, number of pokemon 8104
number of clusters: 6
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 8.1996686616786096)

found clusters for Psyduck, number of pokemon 33934
number of clusters: 86
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 31.103892470203661)

found clusters for Mankey, number of pokemon 1605
number of clusters: 4
('smallest non-zero count', 7.1642544339087712e-08)
('max count:', 10.504288854964065)

found clusters for Growlithe, number of pokemon 2134
number of clusters: 4
('smallest non-zero count', 2.3283826910203505e-07)
('max count:', 6.6928434485406303)

found clusters for Poliwag, number of pokemon 32849
number of clusters: 89
('smallest non-zero count', 1.7910636084771928e-07)
('max count:', 29.857855892006228)

found clusters for Abra, number of pokemon 4135
number of clusters: 10
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 9.3901976127709048)

found clusters for Machop, number of pokemon 4049
number of clusters: 12
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 10.375979525084205)

found clusters for Bellsprout, number of pokemon 12216
number of clusters: 24
('smallest non-zero count', 1.2537445259340348e-07)
('max count:', 3.9788972935469942)

found clusters for Tentacool, number of pokemon 8661
number of clusters: 7
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 7.0631223129209033)

found clusters for Geodude, number of pokemon 4813
number of clusters: 7
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 5.8799411084362267)

found clusters for Ponyta, number of pokemon 2695
number of clusters: 10
('smallest non-zero count', 2.5074890518680697e-07)
('max count:', 10.840225311261634)

found clusters for Slowpoke, number of pokemon 20147
number of clusters: 54
('smallest non-zero count', 1.2537445259340348e-07)
('max count:', 19.840092639724958)

found clusters for Magnemite, number of pokemon 8194
number of clusters: 7
('smallest non-zero count', 3.5821272169543856e-08)
('max count:', 9.0699684481472431)

found clusters for Doduo, number of pokemon 1539
number of clusters: 4
('smallest non-zero count', 4.8358717428884202e-07)
('max count:', 8.0097642356053349)

found clusters for Seel, number of pokemon 14329
number of clusters: 25
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 4.6407670489622843)

found clusters for Shellder, number of pokemon 26670
number of clusters: 44
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 6.2466076885395205)

found clusters for Gastly, number of pokemon 48836
number of clusters: 98
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 9.1928751494220418)

found clusters for Onix, number of pokemon 1390
number of clusters: 6
('smallest non-zero count', 3.9403399386498244e-07)
('max count:', 5.7135320738351272)

found clusters for Hypno, number of pokemon 10748
number of clusters: 14
('smallest non-zero count', 7.1642544339087712e-08)
('max count:', 2.8796384247588547)

found clusters for Krabby, number of pokemon 51800
number of clusters: 104
('smallest non-zero count', 5.3731908254315777e-08)
('max count:', 8.7749519665347169)

found clusters for Voltorb, number of pokemon 4208
number of clusters: 7
('smallest non-zero count', 5.3731908254315777e-08)
('max count:', 8.8034471325353891)

found clusters for Exeggcute, number of pokemon 1423
number of clusters: 4
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 6.6353233183041551)

found clusters for Cubone, number of pokemon 1886
number of clusters: 3
found cluster '0' at 51.523119/-0.129579 with 103 occurrences
found cluster '1' at 51.554295/-0.061798 with 168 occurrences
found cluster '2' at 51.538445/-0.158835 with 410 occurrences
('smallest non-zero count', 3.9403399386498244e-07)
('max count:', 5.8189929306050363)

found clusters for Hitmonlee, number of pokemon 1211
number of clusters: 6
('smallest non-zero count', 1.7910636084771928e-07)
('max count:', 10.501239841894693)

found clusters for Hitmonchan, number of pokemon 1304
number of clusters: 8
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 9.5372586678172038)

found clusters for Lickitung, number of pokemon 1038
number of clusters: 6
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 5.9384494949494435)

found clusters for Rhyhorn, number of pokemon 5186
number of clusters: 12
('smallest non-zero count', 1.7910636084771928e-07)
('max count:', 16.145632173864275)

found clusters for Tangela, number of pokemon 516
number of clusters: 3
found cluster '0' at 51.520615/-0.096254 with 83 occurrences
found cluster '1' at 51.507817/-0.158754 with 54 occurrences
found cluster '2' at 51.500607/-0.127759 with 70 occurrences
('smallest non-zero count', 6.2687226296701755e-07)
('max count:', 9.2058680490535174)

found clusters for Horsea, number of pokemon 31383
number of clusters: 55
('smallest non-zero count', 3.5821272169543856e-08)
('max count:', 15.730872401528064)

found clusters for Goldeen, number of pokemon 40247
number of clusters: 83
('smallest non-zero count', 1.6119572476294734e-07)
('max count:', 31.432908297251299)

found clusters for Staryu, number of pokemon 32742
number of clusters: 82
('smallest non-zero count', 1.6119572476294734e-07)
('max count:', 29.594810360485496)

found clusters for Mr. Mime, number of pokemon 969
number of clusters: 4
('smallest non-zero count', 4.656765382040701e-07)
('max count:', 11.781543718273685)

found clusters for Scyther, number of pokemon 1613
number of clusters: 11
('smallest non-zero count', 1.0746381650863155e-07)
('max count:', 12.163148929934703)

found clusters for Jynx, number of pokemon 39906
number of clusters: 95
('smallest non-zero count', 1.4328508867817542e-07)
('max count:', 14.554283832251251)

found clusters for Electabuzz, number of pokemon 2324
number of clusters: 5
('smallest non-zero count', 1.7910636084771928e-07)
('max count:', 6.4308234498310943)

found clusters for Magmar, number of pokemon 1775
number of clusters: 10
('smallest non-zero count', 4.1194462994975431e-07)
('max count:', 9.8415473972837191)

found clusters for Pinsir, number of pokemon 1230
number of clusters: 4
('smallest non-zero count', 4.2985526603452622e-07)
('max count:', 5.4165883094682874)

found clusters for Magikarp, number of pokemon 66179
number of clusters: 125
('smallest non-zero count', 1.7910636084771928e-08)
('max count:', 55.032553159587209)

found clusters for Eevee, number of pokemon 18415
number of clusters: 32
('smallest non-zero count', 8.955318042385964e-08)
('max count:', 8.62929324326781)

found clusters for Dratini, number of pokemon 8771
number of clusters: 17
('smallest non-zero count', 2.3283826910203505e-07)
('max count:', 21.091428902372307)

Plotting nest locations


In [14]:
%matplotlib inline
%pylab inline
figsize(25, 12)
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from ipywidgets import Dropdown
from glob import glob
from IPython.display import Image, display, HTML, clear_output

imagePaths = glob('img/*_nests.png')

w = Dropdown(
    options=imagePaths,
    value=imagePaths[0]
)

def handleOnChange(value):      
    displayImage(value['new'])
    
def displayImage(fileName):
    clear_output()
    display(HTML('<a href=\'%s\' target="_blank">Open "%s" in a new tab</a>' % (fileName, fileName)))
    display(Image(filename=fileName))

w.observe(handleOnChange, names='value')

display(w)
displayImage(imagePaths[0])




Analysing spawn timings

Let's see if we can find any pattern in the times at which different pokemon spawn. Note that we only have the expiration time, but for most pokemon the despawn timer should be 15 minutes, so we subtract that to estimate the spawn time.


In [15]:
%matplotlib inline
%pylab inline
figsize(18, 8)

from IPython.display import Markdown
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime

def minuteChart(occurrence_list):
    # histogram over the minute of the hour (0..59)
    minute_list = [t.minute for t in occurrence_list]
    numbers = list(range(0, 60))
    labels = [str(x) for x in numbers]
    plt.xticks(numbers, labels)
    plt.xlim(0, 60)
    plt.hist(minute_list, bins=60)
    plt.show()

def hourChart(occurrence_list):
    # histogram over the hour of the day (0..23)
    hour_list = [t.hour for t in occurrence_list]
    numbers = list(range(0, 24))
    labels = [str(x) for x in numbers]
    plt.xticks(numbers, labels)
    plt.xlim(0, 24)
    plt.hist(hour_list, bins=24)
    plt.show()

delta = datetime.timedelta(minutes=15)
allString = 'All'

wNameList = [allString] + sorted(list(actualPokemonNames))
w = Dropdown(
    options=wNameList,
    value=allString
)

def handleOnChange(value):      
    displayHistograms(value['new'])
    
def displayHistograms(pkName):
    clear_output()
    
    ddf = df
    if pkName != allString:
        ddf = df[df['Name'] == pkName]
        
    # rebase to the spawn delta
    deltadf = ddf['expiration_time'].apply(lambda x: (x - delta))
    display(Markdown('## Histogram by spawn minute'))
    minuteChart(deltadf)
    display(Markdown('## Histogram by spawn hour'))
    hourChart(deltadf)

w.observe(handleOnChange, names='value')

display(w)
displayHistograms(allString)


Histogram by spawn minute

Per minute, there is a sawtooth pattern in spawns. It seems like their spawn process runs on a cron every 10 minutes.
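One way to probe the 10 minute hypothesis is to fold the estimated spawn minute into a single 10 minute window and count per position. A minimal sketch follows, using a handful of made-up expiration times in place of df['expiration_time'] and assuming the 15 minute despawn timer from above:

```python
import pandas as pd

# Made-up expiration times standing in for df['expiration_time'].
expiration = pd.Series(pd.to_datetime([
    '2016-07-26 11:19:37', '2016-07-26 11:21:21', '2016-07-26 11:30:02',
    '2016-07-26 11:49:59', '2016-07-26 12:05:10',
]))

# estimated spawn time, assuming a 15 minute despawn timer
spawn = expiration - pd.Timedelta(minutes=15)
# position of each spawn inside a 10 minute window (0..9)
phase = spawn.dt.minute % 10
counts = phase.value_counts().reindex(range(10), fill_value=0)
print(counts)
```

If the spawn process really ran on a 10 minute schedule, the real data should pile up at a few positions of this window instead of spreading evenly.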

Across the whole day, I couldn't gather enough data yet and there are gaps from availability issues with the API. This will be completed over the next few days.