In [3]:
#import the necessary Python modules
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()
import datetime

plt.rcParams['figure.figsize'] = (15, 6)
We will use three different data sources from the Open Data portal of WU Vienna for this experiment.
The All Course-Events during WS15 at WU Vienna dataset contains information about lecture events. We will use it to get the lecture rooms, the buildings and the end times of the courses.
In [4]:
course_events='http://data.wu.ac.at/dataset/812a45f5-3f69-457a-b496-dbd598456829/resource/bfe6cda0-258f-47b2-876d-2913e5459a66/download/allcoursesandevents15w.csv'
The All Campus Rooms at WU Vienna dataset contains information about lecture rooms. We will use it to get the capacity of each lecture room.
In [5]:
room_info='http://data.wu.ac.at/dataset/fed3bae6-397c-4f4c-9c14-15aa8443d268/resource/d17a0d32-562a-4b37-9f32-ce06c4482583/download/allcampusrooms.csv'
The Mapping of campus rooms and main entrances at WU Vienna dataset maps the campus rooms to the main entrance of their building. We will use it to get the geo-locations of the entrances of the lecture rooms.
In [6]:
entrance_coord='http://data.wu.ac.at/dataset/cc76ca38-a904-4909-b621-1f7be63b821b/resource/8b67906a-7459-46eb-b4e0-ee312150a330/download/entrancesroomsmapping.csv'
In [7]:
#load the course events
d_courseEvents = pd.read_csv(course_events, parse_dates=['start', 'end'], dtype={'course_id': object})
d_courseEvents.head()
Out[7]:
In [8]:
#get room information
d_roomInfo=pd.read_csv(room_info)
d_roomInfo.head()
Out[8]:
In [9]:
#load data about entrance coordinates for the lecture rooms
d_entrCoords= pd.read_csv(entrance_coord, dtype={'course_id': object})
d_entrCoords.head()
Out[9]:
In [10]:
#merge lecture events and room information on roomcode
merge = d_roomInfo.merge(d_courseEvents, on='roomcode')
#select the necessary columns
merge = merge[['roomcode', 'location_id', 'end', 'capacity']].copy()
merge.head()
Out[10]:
In [11]:
#combine the location information data based on the location_id
data= pd.merge(merge, d_entrCoords[['location_id', 'latitude', 'longitude']], on=['location_id'])
data.head()
Out[11]:
In [12]:
# We assume that a lecture room is occupied to around 75% of its capacity
ratio_of_students_in_room = 0.75
# Further, we assume that only 50% of the students in a lecture room will go to the mensa
ratio_of_students_going_to_mensa = 0.5
data['est_students'] = data['capacity'] * ratio_of_students_in_room * ratio_of_students_going_to_mensa
data.sort_values('est_students', ascending=False).head()
Out[12]:
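As a quick illustration of these two assumptions (the 180-seat room below is a made-up figure, not taken from the data): a full 180-seat lecture hall would contribute roughly 68 potential mensa visitors.
In [ ]:
# illustrative example: a hypothetical 180-seat room under the two assumptions above
180 * ratio_of_students_in_room * ratio_of_students_going_to_mensa  # -> 67.5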
In [13]:
time_idx = pd.DatetimeIndex(data.end)
data['date'] = time_idx.normalize()
data['weekday'] = time_idx.weekday
data['hour'] = time_idx.hour
data['min'] = time_idx.minute
data.head()
Out[13]:
In [113]:
d_idx = data.set_index('end')
hourly = d_idx.resample('h').sum(numeric_only=True)
daily = hourly.resample('D').sum()
weekly = d_idx.resample('W').sum(numeric_only=True)
weekly[['est_students']].plot(kind='bar')
Out[113]:
In [115]:
days = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
daily['dayofweek'] = daily.index.dayofweek
daily['hour'] = daily.index.hour
grouped = daily.groupby(['dayofweek'])['est_students'].mean()
grouped.index = days
grouped.plot(kind='bar')
plt.title("Average Estimated Number of Students by Day")
plt.ylabel("Average number");
In [130]:
hourly['hour'] = hourly.index.hour
hourly['weekday'] = hourly.index.weekday
grouped = hourly.groupby(['hour'])['est_students'].mean()
grouped.plot(kind='bar')
plt.title("Average Estimated Number of Students by Hour")
plt.ylabel("Average number");
Out[130]:
In [188]:
g_h = hourly.groupby(['hour', 'weekday']).mean().reset_index()
g_h['avoid'] = g_h['est_students'] > 200
ax = g_h.plot(kind='scatter', x='hour', y='weekday', s=g_h['est_students'], c='est_students');
g_h[g_h['est_students'] > 200].head()
Out[188]:
In [192]:
from bokeh.plotting import figure, gridplot, output_notebook, show
from bokeh.charts import Scatter
output_notebook()
In [197]:
p = Scatter(g_h, x='hour', y='weekday', title="Estimated number of students by hour and weekday",
            color="est_students", xlabel="hour of day", ylabel="day of week")
show(p)
In [ ]:
#get current date and weekday
today = datetime.date.today()
weekday = today.weekday()
#get start and end of the working week (Monday to Saturday)
start_of_week = today - datetime.timedelta(days=weekday)
end_of_week = start_of_week + datetime.timedelta(days=5)
#generate filter for lectures that end this week
mask = (data['end'] > pd.Timestamp(start_of_week)) & (data['end'] <= pd.Timestamp(end_of_week))
cur_week = data.loc[mask].copy()
cur_week.head()
In [ ]:
#filter for all lectures ending between 11:00 and 15:59
mask = (cur_week['hour'] >= 11) & (cur_week['hour'] <= 15)
filtered_week= cur_week.loc[mask].copy()
filtered_week.head(5)
In [ ]:
#group by weekday, hour and minute and aggregate
g_wh = filtered_week.groupby(['weekday', 'hour', 'min']).sum(numeric_only=True)
g_wh = g_wh[['est_students']]
#the top 10 worst times to go to the mensa
g_wh.sort_values('est_students', ascending=False).head(10)
In [ ]:
#sort by the estimated number of students
ax = g_wh.sort_values('est_students', ascending=False).plot(y=['est_students'], kind='bar', legend=False, title="Maximum number of students")
ax.set_xlabel("day and hour of week",fontsize=26)
ax.set_ylabel("#students",fontsize=22)
plt.show()
The mensa building is D1 and we take its coordinates from the All main entrances at WU Vienna dataset (http://data.wu.ac.at/dataset/entrances/resource/f92ecfd9-3f0a-4dfd-b6ff-1b0acf0b5340).
In [ ]:
mensa_lat=48.21330080810941
mensa_long=16.409165877046618
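As a quick plausibility check (not part of the original analysis), the hardcoded mensa coordinates should fall inside the bounding box spanned by the lecture-room entrance coordinates already loaded above in d_entrCoords.
In [ ]:
# plausibility check: the mensa entrance should lie within the bounding box
# of the room-entrance coordinates loaded earlier (d_entrCoords)
lat_ok = d_entrCoords['latitude'].min() <= mensa_lat <= d_entrCoords['latitude'].max()
long_ok = d_entrCoords['longitude'].min() <= mensa_long <= d_entrCoords['longitude'].max()
lat_ok and long_ok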
In [ ]:
import math
#from http://www.johndcook.com/blog/python_longitude_latitude/
def distance_on_unit_sphere(lat1, long1, lat2, long2):
    """Return the walking distance in metres between two (lat, long) points,
    including a fixed 100 m offset for walking inside the buildings."""
    # Convert latitude and longitude to spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0
    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians
    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians
    # Compute the spherical distance from the spherical coordinates.
    # For two locations (1, theta, phi) and (1, theta', phi') on the unit sphere:
    #   cosine(arc length) = sin phi sin phi' cos(theta - theta') + cos phi cos phi'
    #   distance = rho * arc length
    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) +
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos(cos)
    # Multiply the arc by the radius of the earth (6373 km = 6,373,000 m)
    # to convert it to metres.
    m = arc*6373*1000
    # Add 100 m to account for walking up and down inside the buildings.
    return m + 100
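A quick plausibility check of the helper (the coordinates below are illustrative, not taken from any of the datasets): two points that differ by 0.001 degrees of latitude are about 111 m apart, so with the fixed 100 m offset the function should return roughly 211 m.
In [ ]:
# illustrative check: ~111 m of latitude difference plus the 100 m offset -> ~211 m
distance_on_unit_sphere(48.2133, 16.4092, 48.2143, 16.4092)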
In [ ]:
filtered_week['distance'] = filtered_week.apply(lambda row: distance_on_unit_sphere(row['latitude'], row['longitude'], mensa_lat, mensa_long), axis=1)
filtered_week.head()
In [ ]:
#compute the additional delay assuming an average walking speed of 1 m/s (https://www.google.at/search?client=safari&rls=en&q=how+fast+do+we+walk+on+average&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=NDqWVrHkNcGF8QfC0L7oDw#q=how+fast+do+we+walk+on+average+meters+per+second)
avg_walking_speed = 1  # 1 m/s

def estimated_arrival(end, distance, avg_walking_speed):
    """Estimate the arrival time at the mensa based on the lecture end time
    and the time needed to walk from the lecture room to the mensa."""
    walking_time = distance / avg_walking_speed  # seconds, since distance is in metres and speed in m/s
    return end + pd.Timedelta(seconds=walking_time)

filtered_week['avg_arrival_time'] = filtered_week.apply(lambda row: estimated_arrival(row['end'], row['distance'], avg_walking_speed), axis=1)
est_data=filtered_week[['est_students','weekday','hour','min','distance','avg_arrival_time']].copy()
#build again an index on the arrival time
time_idx = pd.DatetimeIndex(est_data.avg_arrival_time)
est_data['arr_min'] = time_idx.minute
est_data.head()
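A quick check of estimated_arrival with a made-up end time and distance: a lecture ending at 13:00 in a room 300 m away from the mensa yields an estimated arrival time of 13:05 at 1 m/s.
In [ ]:
# made-up example: lecture ends at 13:00, room is 300 m away -> arrival around 13:05
estimated_arrival(pd.Timestamp('2015-11-16 13:00'), 300, avg_walking_speed)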
In [ ]:
g_data = est_data.sort_values(['weekday', 'hour', 'arr_min'], ascending=True)
g_data = g_data.groupby(['weekday', 'hour', 'min']).sum(numeric_only=True)
g_data['avoid'] = g_data['est_students'] > 100
g_data.head()
In [ ]:
#plot the estimated number of students per time slot, highlighting slots to avoid in red
g_sorted = g_data.sort_values('est_students', ascending=False)
ax = g_sorted.plot(y=['est_students'],
                   kind='bar', legend=False,
                   title="Maximum number of students",
                   color=g_sorted.avoid.map({True: 'r', False: 'k'}),
                   alpha=0.75)
plt.setp(ax.get_xticklabels(), rotation='vertical', fontsize=14)
plt.setp(ax.get_yticklabels(), fontsize=14)
ax.set_xlabel("weekday, hour and minute", fontsize=22)
ax.set_ylabel("estimated #students", fontsize=22)
plt.show()