Peak times at the WU Mensa

We want to quickly showcase how some of the Open Data published by WU Vienna can be used to estimate the peak times at the WU mensa. The idea is to estimate the number of students and their arrival times from the lecture schedule and the geolocation of the lecture rooms.
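
In a nutshell, the whole estimate boils down to two simple formulas, shown here as a small standalone sketch. The occupancy ratio, the share of students going to the mensa and the walking speed are assumptions that are introduced step by step below.


In [ ]:
from datetime import datetime, timedelta

def sketch_estimate(capacity, lecture_end, distance_m,
                    occupancy=0.75, mensa_ratio=0.5, walking_speed=1.0):
    """Sketch of the two estimates used below (all ratios and the walking speed are assumptions)."""
    #estimated number of students heading to the mensa from this room
    est_students = capacity * occupancy * mensa_ratio
    #estimated arrival time: lecture end plus walking time (distance in meters, speed in m/s)
    arrival = lecture_end + timedelta(seconds=distance_m / walking_speed)
    return est_students, arrival

#example: a 120-seat room, lecture ends at 13:00, roughly 300m to the mensa
sketch_estimate(120, datetime(2015, 10, 12, 13, 0), 300)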


In [3]:
#import necessary python modules
%matplotlib inline
%pylab inline
pylab.rcParams['figure.figsize'] = (15, 6)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()
import datetime


Populating the interactive namespace from numpy and matplotlib

Data sources

We will use three different data sources from the Open Data portal of WU Vienna for this experiment:

The All Course-Events during WS15 at WU Vienna dataset contains information about lecture events. We will use this data to get the lecture rooms, the buildings and the end times of the courses.


In [4]:
course_events='http://data.wu.ac.at/dataset/812a45f5-3f69-457a-b496-dbd598456829/resource/bfe6cda0-258f-47b2-876d-2913e5459a66/download/allcoursesandevents15w.csv'

The All Campus Rooms at WU Vienna dataset contains information about the lecture rooms. We will use this data to get the capacity of each lecture room.


In [5]:
room_info='http://data.wu.ac.at/dataset/fed3bae6-397c-4f4c-9c14-15aa8443d268/resource/d17a0d32-562a-4b37-9f32-ce06c4482583/download/allcampusrooms.csv'

The Mapping of campus rooms and main entrances at WU Vienna dataset maps each campus room to the main entrance of its building. We will use this data to get the geo-locations of the entrances of the lecture rooms.


In [6]:
entrance_coord='http://data.wu.ac.at/dataset/cc76ca38-a904-4909-b621-1f7be63b821b/resource/8b67906a-7459-46eb-b4e0-ee312150a330/download/entrancesroomsmapping.csv'

Collecting and inspecting the data


In [7]:
#get the course events and parse the start and end timestamps
d_courseEvents = pd.read_csv(course_events, parse_dates=['start', 'end'], dtype={'course_id': object})
d_courseEvents.head()


Out[7]:
course_id semester name roomcode buildingname start end
0 0001 15W Grundlagen der Volkswirtschaftslehre TC.0.10 Audimax TC 2015-10-12 13:30:00 2015-10-12 16:00:00
1 0001 15W Grundlagen der Volkswirtschaftslehre TC.0.10 Audimax TC 2015-10-14 13:30:00 2015-10-14 16:00:00
2 0001 15W Grundlagen der Volkswirtschaftslehre TC.0.10 Audimax TC 2015-10-19 13:30:00 2015-10-19 16:00:00
3 0001 15W Grundlagen der Volkswirtschaftslehre TC.0.10 Audimax TC 2015-10-21 13:30:00 2015-10-21 16:00:00
4 0001 15W Grundlagen der Volkswirtschaftslehre TC.0.10 Audimax TC 2015-10-28 13:30:00 2015-10-28 16:00:00

In [8]:
#get room information
d_roomInfo=pd.read_csv(room_info)
d_roomInfo.head()


Out[8]:
location_id capacity roomcode floorname buildingname category_en category_de roomname_en roomname_de fancyname_de fancyname_en
0 001_10_OG01_110900 3 LC.2.169 OG01 LC Tandem room Tandemraum LC.2.169 Tandem room LC.2.169 Tandemraum Tandemraum Tandem room
1 001_20_OG01_350300 8 D1.1.002 OG01 D1 Project room Projektraum D1.1.002 D1.1.002 NaN NaN
2 001_10_OG01_310800 8 LC.2.024 OG01 LC Project room Projektraum LC.2.024 LC.2.024 NaN NaN
3 001_10_OG01_112500 15 LC.2.232 OG01 LC Meeting room Besprechungsraum LC.2.232 LC.2.232 NaN NaN
4 001_30_OG01_530000 8 D2.1.512 OG01 D2 Meeting room Besprechungsraum D2.1.512 D2.1.512 NaN NaN

In [9]:
#load data about entrance coordinates for the lecture rooms
d_entrCoords = pd.read_csv(entrance_coord)
d_entrCoords.head()


Out[9]:
location_id entrance_id entrance_name_de entrance_name_en latitude longitude
0 001_10_E101_310100 10251 LC Eingang LC Entrance 48.213554 16.408457
1 001_10_E101_310200 10251 LC Eingang LC Entrance 48.213554 16.408457
2 001_10_E101_310300 10251 LC Eingang LC Entrance 48.213554 16.408457
3 001_10_E101_310500 10251 LC Eingang LC Entrance 48.213554 16.408457
4 001_10_E101_310600 10251 LC Eingang LC Entrance 48.213554 16.408457

Combining the three data sources


In [10]:
#merge lecture events and room information on roomcode
merge = d_roomInfo.merge(d_courseEvents, on='roomcode')

#select the necessary columns
merge = merge[['roomcode', 'location_id', 'end', 'capacity']].copy()
merge.head()


Out[10]:
roomcode location_id end capacity
0 D1.1.002 001_20_OG01_350300 2015-11-27 14:00:00 8
1 D1.1.002 001_20_OG01_350300 2015-12-18 14:00:00 8
2 D1.1.002 001_20_OG01_350300 2015-12-07 17:00:00 8
3 D1.1.002 001_20_OG01_350300 2015-10-12 14:00:00 8
4 D1.1.002 001_20_OG01_350300 2015-10-13 14:00:00 8

In [11]:
#combine the location information data based on the location_id
data= pd.merge(merge, d_entrCoords[['location_id', 'latitude', 'longitude']],  on=['location_id'])
data.head()


Out[11]:
roomcode location_id end capacity latitude longitude
0 D1.1.002 001_20_OG01_350300 2015-11-27 14:00:00 8 48.213301 16.409166
1 D1.1.002 001_20_OG01_350300 2015-12-18 14:00:00 8 48.213301 16.409166
2 D1.1.002 001_20_OG01_350300 2015-12-07 17:00:00 8 48.213301 16.409166
3 D1.1.002 001_20_OG01_350300 2015-10-12 14:00:00 8 48.213301 16.409166
4 D1.1.002 001_20_OG01_350300 2015-10-13 14:00:00 8 48.213301 16.409166

Estimation of how many students will go to the mensa after a lecture


In [12]:
# We assume that a lecture room is occupied by around 75% of its capacity
ratio_of_students_in_room=0.75
# Further, we assume that only 50% of the students in a lecture room will go to the mensa
ratio_of_students_going_to_mensa=0.5

data['est_students'] = data['capacity'] * ratio_of_students_in_room * ratio_of_students_going_to_mensa
data.sort_values('est_students', ascending=False).head()


Out[12]:
roomcode location_id end capacity latitude longitude est_students
5923 TC.1.02 001_20_OG01_020200 2015-11-06 13:00:00 120 48.212588 16.410134 45
6011 TC.1.02 001_20_OG01_020200 2015-11-26 14:00:00 120 48.212588 16.410134 45
6004 TC.1.02 001_20_OG01_020200 2016-01-27 18:00:00 120 48.212588 16.410134 45
6005 TC.1.02 001_20_OG01_020200 2015-11-13 16:00:00 120 48.212588 16.410134 45
6006 TC.1.02 001_20_OG01_020200 2015-11-16 18:00:00 120 48.212588 16.410134 45

Inspecting the data further


In [13]:
time_idx = pd.DatetimeIndex(data.end)
data['date'] = time_idx.normalize()
data['weekday'] = time_idx.weekday
data['hour'] = time_idx.hour
data['min'] = time_idx.minute

data.head()


Out[13]:
roomcode location_id end capacity latitude longitude est_students date weekday hour min
0 D1.1.002 001_20_OG01_350300 2015-11-27 14:00:00 8 48.213301 16.409166 3 2015-11-27 4 14 0
1 D1.1.002 001_20_OG01_350300 2015-12-18 14:00:00 8 48.213301 16.409166 3 2015-12-18 4 14 0
2 D1.1.002 001_20_OG01_350300 2015-12-07 17:00:00 8 48.213301 16.409166 3 2015-12-07 0 17 0
3 D1.1.002 001_20_OG01_350300 2015-10-12 14:00:00 8 48.213301 16.409166 3 2015-10-12 0 14 0
4 D1.1.002 001_20_OG01_350300 2015-10-13 14:00:00 8 48.213301 16.409166 3 2015-10-13 1 14 0

In [113]:
d_idx = data.set_index('end')

hourly = d_idx.resample('h').sum(numeric_only=True)
daily = hourly.resample('D').sum()
weekly = d_idx.resample('W').sum(numeric_only=True)
weekly[['est_students']].plot(kind='bar')


Out[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x110879e10>

In [115]:
days = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
daily['dayofweek'] = daily.index.dayofweek
daily['hour'] = daily.index.hour

grouped = daily.groupby('dayofweek')['est_students'].mean()
grouped.index = days

grouped.plot(kind='bar')
plt.title("Average Estimated Number of Students by Day")
plt.ylabel("Average number");



In [130]:
hourly['hour'] = hourly.index.hour
hourly['weekday'] = hourly.index.weekday

grouped = hourly.groupby('hour')['est_students'].mean()

grouped.plot(kind='bar')
plt.title("Average Estimated Number of Students by Hour")
plt.ylabel("Average number");


Out[130]:
<matplotlib.text.Text at 0x115d17bd0>

In [188]:
#average estimated number of students per hour and weekday
g_h = hourly.groupby(['hour', 'weekday']).mean().reset_index()

#flag the slots where we expect more than 200 students
g_h['avoid'] = g_h['est_students'] > 200

ax = g_h.plot(kind='scatter', x='hour', y='weekday', s=g_h['est_students'], c='est_students');

g_h[g_h['est_students'] > 200].head()


Out[188]:
hour weekday capacity latitude longitude est_students min avoid
77 11 0 672.571429 843.724185 287.170434 252.214286 120.000000 True
79 11 2 574.000000 694.265235 236.296370 215.250000 157.000000 True
85 12 1 599.250000 729.225641 248.182419 224.718750 171.250000 True
86 12 2 660.583333 795.516774 270.750695 247.718750 194.375000 True
87 12 3 607.260870 758.830541 258.267204 227.722826 211.304348 True

In [192]:
from bokeh.plotting import output_notebook, show
from bokeh.charts import Scatter
output_notebook()


BokehJS successfully loaded.

In [197]:
p = Scatter(g_h, x='hour', y='weekday', title="Estimated number of students by hour and weekday",
            color="est_students", xlabel="Hour of day", ylabel="Weekday")
show(p)


Filter for this week


In [ ]:
#get current date and weekday
today = datetime.date.today()
weekday = today.weekday()

#get start and end of this week
start_of_week = today - datetime.timedelta(days=weekday)
end_of_week   = start_of_week + datetime.timedelta(days=5)

#generate filter for lectures which happen this week
mask = (data['end'] > pd.Timestamp(start_of_week)) & (data['end'] <= pd.Timestamp(end_of_week))
cur_week= data.loc[mask].copy()
cur_week.head()

Filter only lectures which end between 11:00 and 15:00


In [ ]:
#filter for all lectures ending between 11:00 and 15:00
mask = (cur_week['hour'] >= 11) & (cur_week['hour'] <= 15)
filtered_week= cur_week.loc[mask].copy()
filtered_week.head(5)

Aggregate by weekday, hour and minute


In [ ]:
#group by weekday, hour and minute and sum up the estimated students
g_wh = filtered_week.groupby(['weekday', 'hour', 'min'])[['est_students']].sum()

#the top 10 worst times to go to the mensa
g_wh.sort_values('est_students', ascending=False).head(10)

Let's do some plotting

What are the worst days and times of this week to go to the mensa?


In [ ]:
#sort by estimated number of students
ax = g_wh.sort_values('est_students', ascending=False).plot(y='est_students', kind='bar', legend=False, title="Estimated number of students")
ax.set_xlabel("weekday, hour and minute", fontsize=22)
ax.set_ylabel("#students", fontsize=22)
plt.show()

Better estimation: incorporating the distance to the mensa

Get Mensa coordinates

The mensa is located in building D1 and we get the coordinates from the All main entrances at WU Vienna dataset (http://data.wu.ac.at/dataset/entrances/resource/f92ecfd9-3f0a-4dfd-b6ff-1b0acf0b5340)


In [ ]:
mensa_lat=48.21330080810941
mensa_long=16.409165877046618
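
Instead of hardcoding the values, the coordinates could also be looked up from that dataset directly. The following is only a sketch: the download URL and the column names (building, latitude, longitude) are assumptions and may differ from the actual CSV, so check the dataset page linked above.


In [ ]:
#sketch only: the download URL and the column names are assumptions,
#see the dataset page linked above for the actual ones
main_entrances='http://data.wu.ac.at/dataset/entrances/resource/f92ecfd9-3f0a-4dfd-b6ff-1b0acf0b5340/download/allmainentrances.csv'

d_mainEntr = pd.read_csv(main_entrances)
d1_entrance = d_mainEntr[d_mainEntr['building'] == 'D1'].iloc[0]
mensa_lat, mensa_long = d1_entrance['latitude'], d1_entrance['longitude']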

Function to estimate the distance from lecture building to mensa entrance


In [ ]:
import math

#from http://www.johndcook.com/blog/python_longitude_latitude/
def distance_on_unit_sphere(lat1, long1, lat2, long2):
 
    # Convert latitude and longitude to 
    # spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0
         
    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
         
    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians
         
    # Compute spherical distance from spherical coordinates.
         
    # For two locations in spherical coordinates 
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) = 
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length
     
    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) + 
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos( cos )
 
    # Remember to multiply arc by the radius of the earth 
    # in your favorite set of units to get length.
    
    #convert the arc length to meters by multiplying with the earth radius (6373 km = 6373*1000 m)
    m = arc*6373*1000

    #add additional 100m to cater for walking up and down in buildings
    return (m + 100)
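
As a quick sanity check, we can compute the distance from the LC main entrance (coordinates taken from the entrance mapping in Out[9] above) to the mensa entrance; including the fixed 100m offset for walking inside the buildings this should come out at roughly 160m.


In [ ]:
#sanity check: LC main entrance (48.213554, 16.408457) to the mensa entrance
distance_on_unit_sphere(48.213554, 16.408457, mensa_lat, mensa_long)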

Add estimated distance to our data


In [ ]:
filtered_week['distance'] = filtered_week.apply(lambda row: distance_on_unit_sphere(row['latitude'], row['longitude'], mensa_lat, mensa_long), axis=1)
filtered_week.head()
#distance_on_unit_sphere(48.213301, 16.409166, mensa_lat, mensa_long)

Estimate the arrival time of the students based on lecture end time and distance to mensa


In [ ]:
#compute additional delay considering an average walking speed of 1m/s (https://www.google.at/search?client=safari&rls=en&q=how+fast+do+we+walk+on+average&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=NDqWVrHkNcGF8QfC0L7oDw#q=how+fast+do+we+walk+on+average+meters+per+second)

avg_walking_speed=1 # 1 m/s

def estimated_arrival(end, distance, avg_walking_speed):
    """Estimate the arrival time based on the lecture end time and the time it takes to walk from the lecture room to the mensa."""
    walking_time = distance / avg_walking_speed  #seconds, since distance is in meters and speed in m/s
    return end + pd.Timedelta(seconds=walking_time)

filtered_week['avg_arrival_time'] = filtered_week.apply(lambda row: estimated_arrival(row['end'], row['distance'],avg_walking_speed ), axis=1)


est_data=filtered_week[['est_students','weekday','hour','min','distance','avg_arrival_time']].copy()

#build again an index on the arrival time
time_idx = pd.DatetimeIndex(est_data.avg_arrival_time)
est_data['arr_min'] = time_idx.minute

est_data.head()

In [ ]:
#aggregate by weekday, hour and estimated arrival minute
g_data = est_data.sort_values(['weekday', 'hour', 'arr_min'])
g_data = g_data.groupby(['weekday', 'hour', 'arr_min']).sum(numeric_only=True)

#flag the time slots where we expect more than 100 students
g_data['avoid'] = g_data['est_students'] > 100
g_data.head()

In [ ]:
#sort by estimated number of students and color the slots to avoid in red
g_sorted = g_data.sort_values('est_students', ascending=False)

ax = g_sorted.plot(y='est_students', kind='bar', legend=False,
                   title="Estimated number of students",
                   color=g_sorted.avoid.map({True: 'r', False: 'k'}),
                   alpha=0.75)

plt.setp(ax.get_xticklabels(), rotation='vertical', fontsize=14)
plt.setp(ax.get_yticklabels(), fontsize=14)

ax.set_xlabel("weekday, hour and arrival minute", fontsize=22)
ax.set_ylabel("#students", fontsize=22)
plt.show()
