Automating GIS-processes - Lecture 8: Python GIS


In [1]:
# Import necessary modules used in this lecture
import pandas as pd
import geopandas as gpd   # geopandas is usually shortened as gpd
from shapely.geometry import Point
from fiona.crs import from_epsg
import sys


# Set filepath (in this case the data are read from a local copy of the Shapefile)
fp = r"C:\HY-Data\HENTENKA\Flickr\Flickr_Kruger_2014-2015_May.shp"
#parkName = "Mkuze-Phinda"
#fp = "C:\HY-Data\HENTENKA\Africa_NPs\InstagramData_Shapes\Instagram_%s_2014.shp" % parkName

#fp = r"C:\HY-Data\HENTENKA\Africa_NPs\InstagramData_Shapes\Instagram_%s_2013-2015_October.shp" % parkName

# Read data
data = gpd.read_file(fp)

Cleaning the data from duplicates

There seem to be duplicates in the first rows of the dataset, so we can assume that there are more duplicates in the rest of the dataset as well. We don't want any duplicate rows, so we delete them with a handy pandas function called .drop_duplicates(). Here the subset parameter restricts the comparison to the photoid column, so two rows count as duplicates whenever they share the same photo id.


In [2]:
# Drop duplicate values from the data set
data = data.drop_duplicates(subset=['photoid'])

# Let's see what the data looks like
data.head()


Out[2]:
comm_cnt comm_txt comm_user coord_acc flickr_lan geometry hashtags id language lat ... photo_cnt photo_desc photoid photourl placeid text time_local userid username views
0 37 ;Hierdie is sommer spiekeries Piet!! Baie goed... 30076469@N02;64815591@N08;41039814@N08;9904906... 16 und POINT (31.371653 -23.758765) fuji;finepix;fujifilm;krugerpark;cattleegret;b... 1 english -23.758765 ... 2037 Photos taken in the Kruger Park from the bridg... 13430688124 https://farm8.staticflickr.com/7248/1343068812... cQIoWFBUV7NSkfdOMA Veereier - Cattle Egret 2014-01-08 06:17:50 30076469@N02 Piet Grobler 1826
1 1 None None 16 da POINT (31.42543 -24.473062) safari;hoedspruit;krugerpark;kruger;safári;áfr... 2 danish -24.473062 ... 18766 None 12238796325 https://farm6.staticflickr.com/5494/1223879632... 2xZfQlBUV7JlNCw04w Kruger Park 2014-01-04 15:19:53 44452722@N03 Leo Soares - DF 73
2 1 None None 16 und POINT (31.425061 -24.472362) safari;hoedspruit;krugerpark;kruger;safári;áfr... 3 english -24.472362 ... 18766 None 12238975883 https://farm6.staticflickr.com/5540/1223897588... 2xZfQlBUV7JlNCw04w Kruger Park 2014-01-04 15:20:40 44452722@N03 Leo Soares - DF 76
3 1 None None 16 da POINT (31.452702 -24.468156) safari;hoedspruit;krugerpark;kruger;safári;áfr... 4 danish -24.468156 ... 18766 None 12238795615 https://farm4.staticflickr.com/3676/1223879561... yqYwfQ1UV7OJqf3REQ Kruger Park 2014-01-04 15:25:38 44452722@N03 Leo Soares - DF 77
4 1 None None 16 da POINT (31.452711 -24.468159) safari;hoedspruit;krugerpark;kruger;safári;áfr... 5 danish -24.468159 ... 18766 None 12238795265 https://farm8.staticflickr.com/7296/1223879526... yqYwfQ1UV7OJqf3REQ Kruger Park 2014-01-04 15:25:44 44452722@N03 Leo Soares - DF 63

5 rows × 25 columns


In [4]:
len(data)


Out[4]:
9156
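
To see how many rows a call like the one above removes, the duplicates can be counted before they are dropped. The following is a minimal sketch on an invented toy table (only the column name photoid comes from the data above; the values and the views numbers are made up for illustration):

```python
import pandas as pd

# Toy table; 'photoid' mirrors the column used above, the values are invented
df = pd.DataFrame({
    "photoid": [101, 101, 102, 103, 103],
    "views": [5, 5, 7, 1, 2],
})

# Flag rows whose photoid has already appeared earlier in the table
is_repeat = df.duplicated(subset=["photoid"])
n_repeats = int(is_repeat.sum())

# Keep the first occurrence of each photoid (the default, keep="first")
deduped = df.drop_duplicates(subset=["photoid"])

print(n_repeats, len(df), len(deduped))
```

Note that with subset=['photoid'] the second row for photo 103 is dropped even though its views value differs, so it is worth checking that the first occurrence is really the one you want to keep.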

Process the text characters

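Texts longer than 150 characters are cut and marked with " [...]" at the end. The timestamp in the time_local column is also split into separate date and time columns.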


In [3]:
# Function for truncating long texts (over 150 characters) and marking the cut with " [...]"
def cleanText(row, incol, outcol):
    if len(str(row[incol])) > 150:
        row[outcol] = str(row[incol])[0:146] + " [...]"
    else:
        row[outcol] = row[incol]
    return row

def parseDateTime(row, incol, outcol, type="Date"):
    if type == "Date":
        row[outcol] = row[incol][0:10]
    elif type == "Time":
        # Skip the space separator at index 10 so the time has no leading blank
        row[outcol] = row[incol][11:]
    return row

# Create column
data['textcut'] = None

# Create columns for date and time
data['date'] = None
data['time'] = None


# Iterate over the rows and truncate long texts
data = data.apply(cleanText, axis=1, incol='text', outcol='textcut')

# Parse dates and times
data = data.apply(parseDateTime, axis=1, incol="time_local", outcol='date', type='Date')
data = data.apply(parseDateTime, axis=1, incol="time_local", outcol='time', type='Time')

data.head()


Out[3]:
comm_cnt comm_txt comm_user coord_acc flickr_lan geometry hashtags id language lat ... photourl placeid text time_local userid username views textcut date time
0 37 ;Hierdie is sommer spiekeries Piet!! Baie goed... 30076469@N02;64815591@N08;41039814@N08;9904906... 16 und POINT (31.371653 -23.758765) fuji;finepix;fujifilm;krugerpark;cattleegret;b... 1 english -23.758765 ... https://farm8.staticflickr.com/7248/1343068812... cQIoWFBUV7NSkfdOMA Veereier - Cattle Egret 2014-01-08 06:17:50 30076469@N02 Piet Grobler 1826 Veereier - Cattle Egret 2014-01-08 06:17:50
1 1 None None 16 da POINT (31.42543 -24.473062) safari;hoedspruit;krugerpark;kruger;safári;áfr... 2 danish -24.473062 ... https://farm6.staticflickr.com/5494/1223879632... 2xZfQlBUV7JlNCw04w Kruger Park 2014-01-04 15:19:53 44452722@N03 Leo Soares - DF 73 Kruger Park 2014-01-04 15:19:53
2 1 None None 16 und POINT (31.425061 -24.472362) safari;hoedspruit;krugerpark;kruger;safári;áfr... 3 english -24.472362 ... https://farm6.staticflickr.com/5540/1223897588... 2xZfQlBUV7JlNCw04w Kruger Park 2014-01-04 15:20:40 44452722@N03 Leo Soares - DF 76 Kruger Park 2014-01-04 15:20:40
3 1 None None 16 da POINT (31.452702 -24.468156) safari;hoedspruit;krugerpark;kruger;safári;áfr... 4 danish -24.468156 ... https://farm4.staticflickr.com/3676/1223879561... yqYwfQ1UV7OJqf3REQ Kruger Park 2014-01-04 15:25:38 44452722@N03 Leo Soares - DF 77 Kruger Park 2014-01-04 15:25:38
4 1 None None 16 da POINT (31.452711 -24.468159) safari;hoedspruit;krugerpark;kruger;safári;áfr... 5 danish -24.468159 ... https://farm8.staticflickr.com/7296/1223879526... yqYwfQ1UV7OJqf3REQ Kruger Park 2014-01-04 15:25:44 44452722@N03 Leo Soares - DF 63 Kruger Park 2014-01-04 15:25:44

5 rows × 28 columns
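
Row-wise apply calls like the ones above work, but they can be slow on large tables. As a rough alternative, the same columns could be built with vectorized pandas operations. This is only a sketch on invented sample values; the column names follow the data above, max_len is a hypothetical variable name, and the timestamp format is assumed to be "YYYY-MM-DD HH:MM:SS" as in the rows shown:

```python
import pandas as pd

# Invented sample values in the same layout as the 'text' and 'time_local' columns
df = pd.DataFrame({
    "text": ["Kruger Park", "x" * 200, None],
    "time_local": ["2014-01-08 06:17:50", "2014-01-04 15:19:53", "2014-01-04 15:20:40"],
})

max_len = 150  # hypothetical name; same threshold as in cleanText above

# Mark texts longer than max_len (missing texts count as length 0)
too_long = df["text"].str.len().fillna(0) > max_len

# Keep short and missing texts as they are, cut the long ones and add a marker
df["textcut"] = df["text"].where(~too_long, df["text"].str.slice(0, 146) + " [...]")

# Parse the timestamp once, then format the date and time parts as strings
stamp = pd.to_datetime(df["time_local"], format="%Y-%m-%d %H:%M:%S")
df["date"] = stamp.dt.strftime("%Y-%m-%d")
df["time"] = stamp.dt.strftime("%H:%M:%S")

print(df[["textcut", "date", "time"]])
```

Parsing with pd.to_datetime also has the side effect of raising an error if a timestamp does not match the assumed format, which plain string slicing would silently pass through.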

Extra - Create an interactive visualization of social media data using bokeh

The following lines of code demonstrate how to create an interactive map of our Flickr points on top of Google Maps using the bokeh module. The result is shown at the end of this page; you can zoom in and out and move around the map.

The bokeh module is not installed on the GIS-lab computers, so unfortunately you cannot test this yourself at the moment.


In [10]:
# Import bokeh stuff to visualize our map
from bokeh.models.glyphs import Circle
from bokeh.plotting import figure, show, output_notebook, output_file
from bokeh.models import GMapPlot, Range1d, ColumnDataSource, HoverTool, PanTool, WheelZoomTool, BoxSelectTool, GMapOptions, ResizeTool
from collections import OrderedDict
from shapely.geometry import box

# Plot the data using bokeh - Initialize
output_notebook()

# Parse coordinates into a list
x_coords, y_coords = data['lon'].values[:], data['lat'].values[:]

# Parse texts
texts = data['textcut'].values[:]

# Parse Images
imgs = data['photourl'].values[:]

# Parse User
user = data['username'].values[:]

# Parse Likes
likes = data['likes'].values[:]

# Parse date
date = data['date'].values[:]

# Parse time
time = data['time'].values[:]

# Parse the bounding box of the Geometries
b = data['geometry'].total_bounds
bbox = box(b[0], b[1], b[2], b[3])
print(bbox)

# Parse the centroid
bb_centroid = bbox.centroid

# Set the centre of the map manually (latitude, longitude)
clat, clon = -23.8, 31.4

# Set the ranges for the map 
x_range = Range1d(17.9, 25.0)
y_range = Range1d(-34.15, -36.2)

# Centre the Google Map on the Kruger area
map_options = GMapOptions(lat=clat, lng=clon, map_type="roadmap", zoom=8)

# Create an interactive Google map of the Kruger area
plot = GMapPlot(
    x_range=x_range, y_range=y_range,
    map_options=map_options,
    title = "Kruger - Flickr Points",
    plot_width=1600,
    plot_height=850,
    webgl=True
)

# Create a Data source
source = ColumnDataSource(
    data=dict(
        lat=y_coords,
        lon=x_coords,
        text=texts,
        imgs=imgs,
        time=time,
        date=date,
        user=user,
        likes=likes
        
    )
)

# Configure hover tooltips
#hover = HoverTool(
#        tooltips=[
#        ("x", "@lon"),
#        ("y", "@lat"),
#        ("Text", "@text")
#    ]
#)

# Save to disk
outfile = r"C:\HY-Data\HENTENKA\Africa_NPs\InteractiveMaps\Flickr_Kruger_2014-2015May.html" 
output_file(outfile)


# Leftover alternative style for the tooltip text: font-size 15px, bold

hover = HoverTool(
        
        tooltips="""
        <div>
            <div>
                <img
                    src="@imgs" height="200" alt="@imgs" width="200"
                    style="float: left; margin:0px 15px 15px 0px;"
                    border="2"
                ></img>
            </div>
            <div>
                <span style="font-size: 13px; font-weight: bold;">@text</span>
                <span style="font-size: 13px; font-weight: bold; color: #FF0000;">&#10084; @likes</span>
            </div>
            <div>
                <span style="font-size: 13px; font-weight: bold; color: #696;">@user </span>
            </div>
            <div>
                <span style="font-size: 12px; ">&#128197; @date    </span>
                <span style="font-size: 12px; ">&#9200; @time </span>
            </div>
            <div>
                <span style="font-size: 20px;"> &target; </span>
                <span style="font-size: 13px;"> @lat,  </span>
                <span style="font-size: 13px;"> @lon </span>
            </div>
            
        </div>
        """
)


# Create the circle glyph used to draw each point
circle = Circle(x="lon",y="lat", fill_color='#DE2D26', size=3.75)

# Add the circles on top of the map
plot.add_glyph(source, circle)

# Add tools
plot.add_tools(PanTool(), WheelZoomTool(), hover)

# Show the map
show(plot)


BokehJS successfully loaded.
POLYGON ((31.976674 -34.354339, 31.976674 0, 0 0, 0 -34.354339, 31.976674 -34.354339))
Out[10]:
<bokeh.io._CommsHandle at 0x87fb160>

In [5]:
import bokeh
bokeh.__version__


Out[5]:
'0.11.1'
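
A note on the bounding box printed by the mapping cell above: it reaches 0 on both axes, which suggests that some records may carry zero coordinates, although this has not been verified here. If that is the case, such points would need to be filtered out before the extent or its centre can be used for centring a map. A minimal sketch in plain Python, using invented coordinates and a hypothetical helper called bounds:

```python
# Invented (lon, lat) pairs; the last one imitates a record with zero coordinates
points = [(31.37, -23.76), (31.43, -24.47), (31.45, -24.47), (0.0, 0.0)]


def bounds(pts):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) pairs."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)


# Drop points that sit exactly at the origin before measuring the extent
valid = [(x, y) for x, y in points if (x, y) != (0.0, 0.0)]

raw_box = bounds(points)
clean_box = bounds(valid)

# Centre of the cleaned box, usable as a map centre (latitude, longitude)
center_lat = (clean_box[1] + clean_box[3]) / 2
center_lon = (clean_box[0] + clean_box[2]) / 2

print(raw_box)
print(clean_box)
```

With a GeoDataFrame the same idea could be applied by selecting only the rows with non-zero lon and lat values before reading total_bounds.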