Data set 2: railway station equipment for disabled people

Source and documentation: http://data.sbb.ch/explore/dataset/equipement/

Open your dataset up using pandas in a Jupyter notebook
Do a .head() to get a feel for your data
Write down 12 questions to ask your data, or 12 things to hunt for in the data
Attempt to answer those 12 questions using the magic of pandas
Make three charts with your dataset
Keep track of anything that was problematic - it can be non-standard country names, extra spaces in columns, trouble getting multiple colors in scatterplots, whatever you'd like.



In [1]:

    
import pandas as pd



In [2]:

    
df = pd.read_csv("stations.csv", delimiter=';')



In [3]:

    
df.head(5)









    Out[3]:

       didok  station_name    accessible_ticket  accessible_wc  wheelchair_load  mobilift  stepless_perron  bats                                               autelca  automat_ktu  geopos                        TUNummer  Betriebspunkttyp
    0  10     Basel SBB       1.0                1.0            1.0              1.0       1.0              10 Min. Übergangszeit für Züge von und nach Fr...  0.0      19.0         47.5474041527, 7.58955146721  1         Haltestelle
    1  2213   Wohlen          1.0                1.0            1.0              1.0       1.0              NaN                                                0.0      2.0          47.3484618374, 8.26978121872  1         Haltestelle_und_Bedienpunkt
    2  2116   Schinznach Bad  0.0                0.0            0.0              0.0       0.0              NaN                                                0.0      1.0          47.4517752695, 8.16692832223  1         Haltestelle
    3  2211   Hendschiken     1.0                0.0            1.0              0.0       0.0              NaN                                                0.0      1.0          47.3894242237, 8.20737399629  1         Haltestelle
    4  2226   Knonau          0.0                0.0            0.0              1.0       1.0              NaN                                                0.0      1.0          47.2202591199, 8.4667085831   1         Haltestelle



In [4]:

    
print("Q1: How many train stations are there in Switzerland? How many of them are accessible for wheelchairs?")

accessible_perron = df[df['stepless_perron'] == 1]

print("A: There are", df.shape[0], "stations.", accessible_perron.shape[0], "of them have a stepless perron.")
#df.describe()
#df['accessible_ticket'].value_counts()



Q1: How many train stations are there in Switzerland? How many of them are accessible for wheelchairs?
A: There are 731 stations. 460 of them have a stepless perron.
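
A side note on the counting step in Q1: the same count can be obtained without building a filtered DataFrame, and it is worth checking how many rows have no value at all. This is a minimal sketch on a made-up toy table (the `toy` frame and its values are assumptions, not the real stations file):

```python
import pandas as pd

# Toy stand-in for the stations table (values are made up for illustration).
toy = pd.DataFrame({
    "station_name": ["A", "B", "C", "D"],
    "stepless_perron": [1.0, 0.0, None, 1.0],
})

# Comparing to 1 gives a boolean Series; summing it counts the True values.
n_stepless = int((toy["stepless_perron"] == 1).sum())

# value_counts(dropna=False) also reports how many rows have no value.
counts = toy["stepless_perron"].value_counts(dropna=False)

print(n_stepless, "of", len(toy), "stations have a stepless perron")
print(counts)
```

On the real data, the `dropna=False` line would show whether some stations simply have no recorded answer, which is different from a recorded 0.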



In [5]:

    
print("Q2: What does “bats” stand for? Is there interesting data about that?")

print("A: This field contains comments in French, Italian and German about the stations.")

print("The metadata says: “Comment field (Billet Automat Touch Screen?)” with a question mark.")

bats = df[df['bats'].notna()] # keep only the rows that have a comment (drops the NaN values)
#bats[['station_name','bats']].head(5) # to explore the data

print("Some stations are marked as “not served” in French. The “bats” column gives us a non-exhaustive list of these:")

bats[bats['bats'] == 'non desservi'][['station_name','bats']]



Q2: What does “bats” stand for? Is there interesting data about that?
A: This field contains comments in French, Italian and German about the stations.
The metadata says: “Comment field (Billet Automat Touch Screen?)” with a question mark.
Some stations are marked as “not served” in French. The “bats” column gives us a non-exhaustive list of these:






    Out[5]:

         station_name          bats
    54   Evouettes, Les        non desservi
    147  Vouvry                non desservi
    333  Collombey             non desservi
    334  Vionnaz               non desservi
    335  St-Gingolph (Suisse)  non desservi
    421  Bouveret              non desservi


In [6]:

    
print("Q3: How many stations offer help to board and get out of the trains? What percentage of the stations does it represent?")
with_help = df['wheelchair_load'].value_counts()[1]
total = with_help + df['wheelchair_load'].value_counts()[0]
percentage = (with_help*100) / total
print("A:", with_help, "stations offer help to board and get out of trains. This represents", "%.2f" % percentage + "% of the total.")









    



Q3: How many stations offer help to board and get out of the trains? What percentage of the stations does it represent?
A: 193 stations offer help to board and get out of trains. This represents 27.03% of the total.
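
A note on the denominator in Q3: the total is built from the stations that have a 0 or a 1 in `wheelchair_load`. Since 193 is 27.03% of about 714, and the table has 731 rows, some rows appear to have no value in that column. The sketch below, on a made-up toy Series, shows the two possible percentages side by side:

```python
import pandas as pd

# Toy column with one missing value (made-up data).
toy = pd.Series([1.0, 0.0, 0.0, None, 1.0], name="wheelchair_load")

with_help = int((toy == 1).sum())

# Share among stations that have a recorded value (what the cell above computes).
share_of_known = (toy == 1).sum() / toy.notna().sum() * 100

# Share among all rows, counting missing values as "no help".
share_of_all = (toy == 1).mean() * 100

print(with_help, "%.2f" % share_of_known, "%.2f" % share_of_all)
```

Which of the two is the right answer depends on whether a missing value means "no service" or "unknown", and the source metadata would have to settle that.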



In [7]:

    
print("Q4: Which are the 10 least equipped stations?")

# For our rating, we just want to know whether “automat_ktu” is at least 1, not add the actual number.
def automat(x):
    if x > 0:
        return 1
    else:
        return 0

df['rating'] = df['accessible_ticket'] + df['accessible_wc'] + df['wheelchair_load'] + df['mobilift'] + df['stepless_perron'] + df['autelca'] + df['automat_ktu'].apply(automat)

zeroRatingCount = df['didok'][df['rating'] == 0].count()

print("A: These 10 stations have zero recorded equipment. But", zeroRatingCount-10, "other stations also have no equipment.")


df[['station_name', 'rating']].sort_values(by='rating').head(10)



Q4: Which are the 10 least equipped stations?
A: These 10 stations have zero recorded equipment. But 33 other stations also have no equipment.






    Out[7]:

         station_name                 rating
    98   Deurres, Les                 0.0
    589  Kerzers Papiliorama          0.0
    114  Kölliken Oberdorf            0.0
    606  Schloss Laufen am Rheinfall  0.0
    408  Tuileries, Les               0.0
    293  Grandson                     0.0
    462  Mühlau                       0.0
    465  Chamoson                     0.0
    710  Worb SBB                     0.0
    119  Rekingen AG                  0.0


In [8]:

    
print('Q5: Are there areas with a lot of non-equipped stations?')

zeroRating = df[df['rating'] == 0]
geopos_array = []
neighbors_list = []

# 1) store all positions (and the matching station names) in two parallel lists
for index, row in zeroRating.iterrows():
    geopos = row['geopos'].split(', ')
    latlng = [float(item) for item in geopos]
    geopos_array.append(latlng)
    neighbors_list.append(row['station_name'])

# 2) store “neighbour” non-equipped stations in two new columns

# [latitude = 47.452833265 N, longitude = 8.70557069755 E]
# In Switzerland, 1 degree is about 110 km of latitude and about 75 km of longitude
# Let's say 50 km is enough to be "neighbour" stations (31 miles, a bit more than Columbia University - Newark)
# With the thresholds below: 0.4 degrees of latitude is about 44 km, 0.6 degrees of longitude about 45 km
# It's a rough calculation: the surface in which cities are "neighbour" is a rectangle and not a circle.
latdistance = 0.4
longdistance = 0.6

df['neighbors_unequipped'] = 0
df['neighbors_names'] = ''

for index, row in zeroRating.iterrows(): # iteration 1: through the stations
    neighbors_currlist = []
    geopos = row['geopos'].split(', ')
    latlng = [float(item) for item in geopos]

    for loc, name in zip(geopos_array, neighbors_list): # iteration 2: through the stored locations
        if loc != latlng: # skip the current station itself
            if (abs(loc[0] - latlng[0]) < latdistance) and (abs(loc[1] - latlng[1]) < longdistance):
                neighbors_currlist.append(name)

    df.loc[index, 'neighbors_unequipped'] = len(neighbors_currlist)
    df.loc[index, 'neighbors_names'] = ', '.join(neighbors_currlist)

top_unequipped = df[df['neighbors_unequipped'] > 0].sort_values(by='neighbors_unequipped', ascending=False)


print("[We create two new columns and get this list -- text answer in the next cell]")
top_unequipped[['station_name', 'neighbors_unequipped', 'neighbors_names']].head(10)



Q5: Are there areas with a lot of non-equipped stations?
[We create two new columns and get this list -- text answer in the next cell]






    Out[8]:

         station_name       neighbors_unequipped  neighbors_names
    543  Baldegg Kloster    16                    Kemptthal, Blumenau, Walterswil-Striegel, Köll...
    351  Küngoldingen       15                    Walterswil-Striegel, Kölliken Oberdorf, Reking...
    671  Emmenmatt          15                    Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
    201  Trimbach           14                    Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
    674  Littau             14                    Kemptthal, Blumenau, Walterswil-Striegel, Köll...
    114  Kölliken Oberdorf  14                    Walterswil-Striegel, Rekingen AG, Rothenburg, ...
    635  Lyssach            14                    Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
    158  Rothenburg         14                    Kemptthal, Blumenau, Walterswil-Striegel, Köll...
    236  Olten Hammer       14                    Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
    595  Signau             13                    Suberg-Grossaffoltern, Rothenburg, Brügg BE, K...



In [9]:

    
# 3) new for loop to get a “clean” list of different locations,
# which means we avoid printing two stations from the same area

name_list = []
reject_list = []

for index, row in top_unequipped.iterrows(): # iteration 1: through the stations
    rejects = row['neighbors_names'] # we want to get the different areas
    reject_list.extend(rejects.split(', '))
    if row['station_name'] not in reject_list:
        name_list.append("“" + row['station_name'] + "”: " + str(row['neighbors_unequipped']))

print("Answer to Q5: These places count the most unequipped stations within about 50 km:", ', '.join(name_list))









    



Answer to Q5: These places count the most unequipped stations within about 50 km: “Baldegg Kloster”: 16, “Deurres, Les”: 9, “Quartino”: 4, “Creux-de-Genthod”: 1, “Tuileries, Les”: 1
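
The neighbour test in Q5 uses a latitude/longitude box. If a true "within N km" radius were wanted, one common alternative is the haversine great-circle distance. The sketch below is not what the notebook does; the helper names `haversine_km` and `parse_geopos` are hypothetical, and the Earth radius of 6371 km is the usual mean value. The two coordinates are the Basel SBB and Wohlen rows from the first table.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def parse_geopos(text):
    """Turn a 'lat, lon' string (the geopos column format) into two floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon

basel = parse_geopos("47.5474041527, 7.58955146721")
wohlen = parse_geopos("47.3484618374, 8.26978121872")

d = haversine_km(*basel, *wohlen)
print(round(d), "km between Basel SBB and Wohlen")
```

A station pair would then count as neighbours when `haversine_km(...) < 50`, which gives a circle rather than a rectangle around each station.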



In [10]:

    
print("Q6: What is the average station “rating”, based on the amount of equipment?")
print("A: The average rating is", df['rating'].mean())



Q6: What is the average station “rating”, based on the amount of equipment?
A: The average rating is 2.32112676056



In [11]:

    
print("Q7: Merge this data with the passenger frequency data. Is it a perfect match?")
dfp = pd.read_csv('passagierfrequenz.csv', delimiter=';')
dfm = df.merge(dfp, how='inner', left_on='station_name', right_on='Station')

print("A: It isn’t a perfect match. The “equipment” dataset has", df.shape[0], "rows; the “frequency” dataset has", dfp.shape[0], "rows; the merged dataset has only", dfm.shape[0], "rows.")
print("Here is the merged dataset:")

dfm



Q7: Merge this data with the passenger frequency data. Is it a perfect match?
A: It isn’t a perfect match. The “equipment” dataset has 731 rows; the “frequency” dataset has 724 rows; the merged dataset has only 646 rows.
Here is the merged dataset:






    Out[11]:






  
    
      
      didok
      station_name
      accessible_ticket
      accessible_wc
      wheelchair_load
      mobilift
      stepless_perron
      bats
      autelca
      automat_ktu
      ...
      neighbors_unequipped
      neighbors_names
      Code
      Station
      Year
      DTV
      DWV
      Owner
      Comments
      geopos_y
    
  
  
    
      0
      10
      Basel SBB
      1.0
      1.0
      1.0
      1.0
      1.0
      10 Min. Übergangszeit für Züge von und nach Fr...
      0.0
      19.0
      ...
      0
      
      BS
      Basel SBB
      2014
      101400
      112900
      SBB
      Passagierfrequenzen: ohne SNCF.
      47.5474041527, 7.58955146721
    
    
      1
      2213
      Wohlen
      1.0
      1.0
      1.0
      1.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      WO
      Wohlen
      2014
      5200
      6200
      SBB
      Passagierfrequenzen: ohne BDWM.
      47.3484618374, 8.26978121872
    
    
      2
      2116
      Schinznach Bad
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      SBAD
      Schinznach Bad
      2014
      530
      600
      SBB
      NaN
      47.4517752695, 8.16692832223
    
    
      3
      2211
      Hendschiken
      1.0
      0.0
      1.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      HDK
      Hendschiken
      2014
      380
      450
      SBB
      NaN
      47.3894242237, 8.20737399629
    
    
      4
      2226
      Knonau
      0.0
      0.0
      0.0
      1.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      KNO
      Knonau
      2014
      980
      1100
      SBB
      NaN
      47.2202591199, 8.4667085831
    
    
      5
      3105
      Uetikon
      0.0
      0.0
      1.0
      1.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      UET
      Uetikon
      2014
      4500
      5500
      SBB
      NaN
      47.2590294057, 8.678774746
    
    
      6
      3015
      Zürich Wipkingen
      0.0
      0.0
      1.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      ZWIP
      Zürich Wipkingen
      2014
      3300
      3800
      SBB
      NaN
      47.3930313233, 8.52935905447
    
    
      7
      3112
      Kempraten
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      KPT
      Kempraten
      2014
      850
      990
      SBB
      NaN
      47.2383929337, 8.81429184846
    
    
      8
      3100
      Zollikon
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      ZK
      Zollikon
      2014
      1100
      1300
      SBB
      NaN
      47.3373317337, 8.569717612
    
    
      9
      3502
      Siggenthal-Würenlingen
      0.0
      0.0
      0.0
      0.0
      0.0
      Postauto fahren ab Bahnhofplatz
      0.0
      1.0
      ...
      0
      
      SIG
      Siggenthal-Würenlingen
      2014
      1300
      1600
      SBB
      NaN
      47.5179218559, 8.24015363326
    
    
      10
      3505
      Wettingen
      0.0
      0.0
      1.0
      1.0
      1.0
      Gleise 1, 3, 4 und 5 stufenlos zugänglich.
      0.0
      0.0
      ...
      0
      
      WE
      Wettingen
      2014
      4400
      5400
      SBB
      NaN
      47.4596367944, 8.3159863528
    
    
      11
      3427
      Schlatt
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      SCHT
      Schlatt
      2014
      270
      310
      SBB
      NaN
      47.6795011585, 8.68749675648
    
    
      12
      3527
      Buchs-Dällikon
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      BUD
      Buchs-Dällikon
      2014
      2500
      3100
      SBB
      NaN
      47.4533973239, 8.43557852512
    
    
      13
      3429
      Schlattingen
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      SCHN
      Schlattingen
      2014
      240
      270
      SBB
      NaN
      47.6666263051, 8.77041438581
    
    
      14
      4007
      Moreillon
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      MRL
      Moreillon
      2014
      70
      80
      CFF
      NaN
      46.5083872786, 6.78564812794
    
    
      15
      2237
      Lupfig
      0.0
      0.0
      0.0
      1.0
      1.0
      Postautohaltestelle Lupfig Bahnhof = 3 Minuten...
      0.0
      0.0
      ...
      0
      
      LUPF
      Lupfig
      2014
      220
      270
      SBB
      NaN
      47.4451758366, 8.21496733295
    
    
      16
      3007
      Zürich Seebach
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      ZSEB
      Zürich Seebach
      2014
      1500
      1700
      SBB
      NaN
      47.4187469105, 8.54463614173
    
    
      17
      3001
      Zürich Altstetten
      1.0
      1.0
      1.0
      1.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      ZAS
      Zürich Altstetten
      2014
      32100
      39900
      SBB
      NaN
      47.3914808361, 8.4889402654
    
    
      18
      2229
      Urdorf Weihermatt
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      URDW
      Urdorf Weihermatt
      2014
      2200
      2500
      SBB
      NaN
      47.3809708849, 8.43033031194
    
    
      19
      3011
      Zürich Wiedikon
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      ZWIE
      Zürich Wiedikon
      2014
      7700
      9500
      SBB
      NaN
      47.3714716204, 8.52346258297
    
    
      20
      2300
      Zug Postplatz
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      ZGPP
      Zug Postplatz
      2014
      160
      190
      SBB
      NaN
      47.168273486, 8.51691604796
    
    
      21
      2231
      Zug Oberwil
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      ZGO
      Zug Oberwil
      2014
      280
      320
      SBB
      NaN
      47.147675872, 8.50992153777
    
    
      22
      3313
      Niederglatt
      0.0
      0.0
      0.0
      1.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      NG
      Niederglatt
      2014
      1100
      1400
      SBB
      NaN
      47.4874689477, 8.50329612568
    
    
      23
      3316
      Steinmaur
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      STMR
      Steinmaur
      2014
      940
      1100
      SBB
      NaN
      47.4899801946, 8.4468407224
    
    
      24
      3304
      Kemptthal
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      0.0
      ...
      10
      Blumenau, Rekingen AG, Rothenburg, Oberrüti, M...
      KE
      Kemptthal
      2014
      600
      690
      SBB
      NaN
      47.452833265, 8.70557069755
    
    
      25
      3420
      Lottstetten
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      LOT
      Lottstetten
      2014
      180
      220
      SBB
      NaN
      47.6257356296, 8.56680664761
    
    
      26
      3409
      Bad Zurzach
      0.0
      0.0
      1.0
      1.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      ZZ
      Bad Zurzach
      2014
      1400
      1600
      SBB
      NaN
      47.5883101087, 8.29556465398
    
    
      27
      3425
      Feuerthalen
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      FT
      Feuerthalen
      2014
      300
      330
      SBB
      NaN
      47.6921636376, 8.64595208918
    
    
      28
      3421
      Jestetten
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      JE
      Jestetten
      2014
      550
      700
      SBB
      NaN
      47.6544006039, 8.57327389271
    
    
      29
      3414
      Mellikon
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      MELN
      Mellikon
      2014
      70
      80
      SBB
      NaN
      47.5684839744, 8.35249723256
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      616
      4134
      Payerne
      0.0
      0.0
      1.0
      1.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      PAY
      Payerne
      2014
      4500
      5100
      CFF
      NaN
      46.8196387484, 6.93987910663
    
    
      617
      4127
      Faoug
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      FG
      Faoug
      2014
      130
      140
      CFF
      NaN
      46.9082315978, 7.07459265792
    
    
      618
      4219
      Auvernier
      0.0
      0.0
      0.0
      0.0
      0.0
      Accès au quai sans escalier (rampe ou ascenseu...
      0.0
      1.0
      ...
      0
      
      AUV
      Auvernier
      2014
      380
      410
      CFF
      NaN
      46.9796312118, 6.87770603996
    
    
      619
      4223
      Cornaux
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      CORN
      Cornaux
      2014
      440
      530
      CFF
      NaN
      47.0385599223, 7.0233637328
    
    
      620
      4229
      Tüscherz
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      TUE
      Tüscherz
      2014
      100
      100
      SBB
      NaN
      47.1149950289, 7.19730481483
    
    
      621
      4228
      Twann
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      TWN
      Twann
      2014
      550
      500
      SBB
      NaN
      47.0936555927, 7.15649171643
    
    
      622
      5111
      Sisikon
      0.0
      0.0
      0.0
      1.0
      0.0
      NaN
      0.0
      0.0
      ...
      0
      
      SK
      Sisikon
      2014
      170
      170
      SBB
      NaN
      46.9492536033, 8.62073068694
    
    
      623
      6300
      St. Gallen Winkeln
      0.0
      0.0
      0.0
      0.0
      1.0
      VBSG-Haltestelle ca. 150m vom Bahnhof
      0.0
      1.0
      ...
      0
      
      SGWI
      St. Gallen Winkeln
      2014
      1100
      1300
      SBB
      NaN
      47.4039256735, 9.30083970711
    
    
      624
      6215
      Bischofszell Nord
      0.0
      0.0
      0.0
      1.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      BZN
      Bischofszell Nord
      2014
      200
      250
      SBB
      NaN
      47.5004169871, 9.23401945255
    
    
      625
      8005
      Burgdorf
      1.0
      1.0
      1.0
      1.0
      1.0
      NaN
      0.0
      4.0
      ...
      0
      
      BDF
      Burgdorf
      2014
      14500
      16800
      SBB
      NaN
      47.0606957305, 7.62168894507
    
    
      626
      6308
      Arbon
      0.0
      0.0
      1.0
      1.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      ARB
      Arbon
      2014
      1600
      1700
      SBB
      NaN
      47.5105854479, 9.43333858591
    
    
      627
      6217
      Sulgen
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      SLG
      Sulgen
      2014
      1200
      1400
      SBB
      NaN
      47.5387993681, 9.18367857671
    
    
      628
      6316
      Au SG
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      AUSG
      Au SG
      2014
      290
      330
      SBB
      NaN
      47.4361134139, 9.64124454757
    
    
      629
      6304
      Mörschwil
      0.0
      0.0
      1.0
      1.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      MOER
      Mörschwil
      2014
      270
      280
      SBB
      NaN
      47.4747707737, 9.41437284119
    
    
      630
      6309
      Egnach
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      EGN
      Egnach
      2014
      310
      330
      SBB
      NaN
      47.5439914495, 9.38300007743
    
    
      631
      9000
      Chur
      1.0
      1.0
      1.0
      1.0
      1.0
      Postautostation per Rolltreppe und Lift erreic...
      0.0
      6.0
      ...
      0
      
      CH
      Chur
      2014
      23600
      24100
      SBB
      NaN
      46.8530797552, 9.52892561961
    
    
      632
      8100
      Langenthal
      1.0
      1.0
      1.0
      0.0
      1.0
      NaN
      0.0
      3.0
      ...
      0
      
      LTH
      Langenthal
      2014
      11400
      13100
      SBB
      NaN
      47.2173003777, 7.78470885276
    
    
      633
      9004
      Bad Ragaz
      0.0
      1.0
      1.0
      1.0
      1.0
      Postautos fahren ab Bahnhofplatz
      0.0
      1.0
      ...
      0
      
      BRAG
      Bad Ragaz
      2014
      2200
      2500
      SBB
      NaN
      47.0103066077, 9.50521579767
    
    
      634
      9413
      Flums
      0.0
      1.0
      1.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      FMS
      Flums
      2014
      470
      520
      SBB
      NaN
      47.0966128468, 9.34791662857
    
    
      635
      16218
      Winterthur Hegi
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      WHE
      Winterthur Hegi
      2014
      490
      570
      SBB
      NaN
      47.5015644748, 8.76930571536
    
    
      636
      15992
      Baar Neufeld
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      BAAN
      Baar Neufeld
      2014
      840
      1200
      SBB
      NaN
      47.1884876743, 8.51775897551
    
    
      637
      6139
      Stein am Rhein
      0.0
      0.0
      1.0
      0.0
      1.0
      Gleis 3 Unterführung, steile Rampe
      0.0
      2.0
      ...
      0
      
      STR
      Stein am Rhein
      2014
      2200
      2400
      SBB
      NaN
      47.656108874, 8.85500693112
    
    
      638
      6209
      Flawil
      1.0
      1.0
      1.0
      1.0
      1.0
      NaN
      0.0
      2.0
      ...
      0
      
      FLA
      Flawil
      2014
      3200
      3800
      SBB
      NaN
      47.4151898192, 9.1897661689
    
    
      639
      6046
      Henggart
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      HGT
      Henggart
      2014
      1500
      1700
      SBB
      NaN
      47.5642523105, 8.68505196166
    
    
      640
      6044
      Winterthur Töss
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      WTOE
      Winterthur Töss
      2014
      630
      720
      SBB
      NaN
      47.4898075094, 8.70929159585
    
    
      641
      6009
      Saland
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      SD
      Saland
      2014
      210
      230
      SBB
      NaN
      47.3942126094, 8.85420315671
    
    
      642
      6017
      Wiesendangen
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      WD
      Wiesendangen
      2014
      760
      880
      SBB
      NaN
      47.5255401974, 8.77601496118
    
    
      643
      6021
      Dinhard
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      0.0
      ...
      0
      
      DIH
      Dinhard
      2014
      180
      190
      SBB
      NaN
      47.5533468989, 8.75295793686
    
    
      644
      5204
      Faido
      0.0
      0.0
      0.0
      0.0
      0.0
      NaN
      0.0
      1.0
      ...
      0
      
      FA
      Faido
      2014
      300
      330
      FFS
      NaN
      46.48307671, 8.79070214577
    
    
      645
      16271
      Steinach
      0.0
      0.0
      0.0
      0.0
      1.0
      NaN
      0.0
      1.0
      ...
      0
      
      STCH
      Steinach
      2014
      450
      500
      SBB
      NaN
      47.5008611902, 9.44212099652
    
  

646 rows × 24 columns
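
The row counts in Q7 already say how much is lost: 731 - 646 = 85 equipment rows and 724 - 646 = 78 frequency rows found no partner (assuming station names are unique in both files). A way to see this directly is the `indicator=True` option of `merge`, sketched here on made-up toy tables (names and DTV values are invented):

```python
import pandas as pd

# Toy stand-ins for the two tables; names and DTV values are made up.
equipment = pd.DataFrame({"station_name": ["Alpha", "Beta-Nord", "Gamma"]})
frequency = pd.DataFrame({"Station": ["Alpha", "Beta Nord", "Gamma"],
                          "DTV": [100, 200, 300]})

# indicator=True adds a "_merge" column: left_only, right_only or both.
merged = equipment.merge(frequency, how="outer",
                         left_on="station_name", right_on="Station",
                         indicator=True)

print(merged["_merge"].value_counts())

only_equipment = merged[merged["_merge"] == "left_only"]
only_frequency = merged[merged["_merge"] == "right_only"]
matched = merged[merged["_merge"] == "both"]
```

The `_merge` column makes it explicit which side each unmatched row came from, instead of inferring it from NaN values.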



In [12]:

    
print("Q8: Find the mismatches.")

# We do an outer join, then we find out which rows are incomplete

dfouter = df.merge(dfp, how='outer', left_on='station_name', right_on='Station')

mismatches = dfouter[dfouter['Station'] != dfouter['station_name']][['Station', 'station_name']]

print("A: Here are the", len(mismatches), "unmatched rows:")

mismatches









    



Q8: Find the mismatches.
A: Here are the 163 unmatched rows:






    Out[12]:

         Station                 station_name
    43   NaN                     Kaltbrunn
    64   NaN                     Cossonay-Penthalaz
    79   NaN                     Dotzigen
    89   NaN                     Corcelles-Nord
    92   NaN                     Niederwangen
    93   NaN                     Wünnewil
    99   NaN                     Suberg-Grossaffoltern
    122  NaN                     Pully-Nord
    135  NaN                     Thörishaus Dorf
    199  NaN                     Studen BE
    273  NaN                     Brügg BE
    275  NaN                     Kallnach
    278  NaN                     Bern Ausserholligen
    279  NaN                     Fribourg/Freiburg
    318  NaN                     Cointrin
    346  NaN                     Boniswil
    381  NaN                     Otelfingen Golfpark
    385  NaN                     Oberwangen
    390  NaN                     Münchenbuchsee
    401  NaN                     Basel St. Jakob
    402  NaN                     Büren an der Aare
    429  NaN                     Emmenbrücke Gersag
    498  NaN                     Bargen
    499  NaN                     Busswil
    500  NaN                     Fräschels
    506  NaN                     Yverdon-Champ Pittet
    507  NaN                     Muntelier-Löwenberg
    508  NaN                     Corcelles-Sud
    537  NaN                     Bern Wankdorf
    540  NaN                     Luzern Verkehrshaus
    ...  ...                     ...
    779  Brienz                  NaN
    780  Bex                     NaN
    781  Bettwiesen              NaN
    782  Corcelles Sud           NaN
    783  Cossonay                NaN
    784  Ebligen                 NaN
    785  Hochdorf Schönau        NaN
    786  Gersag                  NaN
    787  Wolfenschiessen         NaN
    788  Stans                   NaN
    789  Vallorbe                NaN
    790  Mendrisio San Martino   NaN
    791  Riehen b. Basel         NaN
    792  Sachseln                NaN
    793  Sarnen                  NaN
    794  Vernier                 NaN
    795  Stansstad               NaN
    796  Siegershausen           NaN
    797  Tägerschen              NaN
    798  Tobel-Affeltrangen      NaN
    799  Pully Nord              NaN
    800  Riazzino                NaN
    801  Ringgenberg             NaN
    802  Tägerwilen Dorf         NaN
    803  Zug Casino/Frauenstein  NaN
    804  Zollikofen              NaN
    805  Yverdon - Champ Pittet  NaN
    806  Walchwil Hörndli        NaN
    807  Luzern-Verkehrshaus     NaN
    808  Luzern Allmend          NaN

163 rows × 2 columns
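
Several of the mismatches listed above differ only in punctuation: "Pully-Nord" vs "Pully Nord", "Yverdon-Champ Pittet" vs "Yverdon - Champ Pittet", "Luzern Verkehrshaus" vs "Luzern-Verkehrshaus". A possible fix is to merge on a normalised key. This is only a sketch: the `normalize_name` helper is hypothetical, and it will not catch real name differences such as "Cossonay-Penthalaz" vs "Cossonay".

```python
import re
import pandas as pd

def normalize_name(name):
    """Lower-case a station name and collapse hyphens, slashes and spaces."""
    name = name.lower()
    name = re.sub(r"[\s\-/]+", " ", name)
    return name.strip()

# A few name pairs taken from the mismatch list above.
equipment = pd.DataFrame({"station_name": [
    "Pully-Nord", "Yverdon-Champ Pittet", "Luzern Verkehrshaus", "Cossonay-Penthalaz"]})
frequency = pd.DataFrame({"Station": [
    "Pully Nord", "Yverdon - Champ Pittet", "Luzern-Verkehrshaus", "Cossonay"]})

# Merge on the normalised key instead of the raw names.
equipment["key"] = equipment["station_name"].map(normalize_name)
frequency["key"] = frequency["Station"].map(normalize_name)
matched = equipment.merge(frequency, on="key", how="inner")

print(matched[["station_name", "Station"]])
```

The remaining pairs, where one file uses a longer or different name, would still need a small hand-made lookup table.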



In [13]:

    
print("Q9: Which are the most frequented stations with no equipment?")
print("A: These are the top 5:")
dfm[['Station', 'rating', 'DTV']][dfm['rating'] == 0].sort_values('DTV', ascending=False).head(5)









    



Q9: Which are the most frequented stations with no equipment?
A: These are the top 5:






    Out[13]:






  
    
      
     Station                 rating  DTV
486  Winterthur Grüze        0.0     1700
452  Lugano-Paradiso         0.0     850
149  Rothenburg              0.0     720
24   Kemptthal               0.0     600
598  Kurzrickenbach Seepark  0.0     530



In [14]:

    
print("Q10: Which are the most frequented stations with poor equipment (a rating of 3)?")
print("A: These are the top 5. The first three (Zürich Hardbrücke, Stettbach and Schlieren) each have more than 10,000 daily passengers.")
dfm.loc[dfm['rating'] == 3, ['Station', 'rating', 'DTV']].sort_values('DTV', ascending=False).head(5)




Q10: Which are the most frequented stations with poor equipment (a rating of 3)?
A: These are the top 5. The first three (Zürich Hardbrücke, Stettbach and Schlieren) each have more than 10,000 daily passengers.






    Out[14]:






  
    
      
     Station            rating  DTV
166  Zürich Hardbrücke  3.0     47200
169  Stettbach          3.0     19900
467  Schlieren          3.0     10100
470  Turgi              3.0     6100
165  Stäfa              3.0     5500
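
The two filters above (rating 0 and rating 3) follow the same pattern, so they can be wrapped in one small function. This is a sketch under assumptions: `busiest_with_rating` is a hypothetical helper, the sample rows are copied from the outputs above rather than loaded from the full `dfm`, and `DTV` is assumed to be numeric.

```python
import pandas as pd

# Small sample copied from Out[13] and Out[14]; the notebook itself uses the full `dfm`.
sample = pd.DataFrame({
    "Station": ["Zürich Hardbrücke", "Stettbach", "Schlieren",
                "Winterthur Grüze", "Lugano-Paradiso"],
    "rating": [3.0, 3.0, 3.0, 0.0, 0.0],
    "DTV": [47200, 19900, 10100, 1700, 850],
})


def busiest_with_rating(frame, rating, n=5):
    """Return the n rows with the given rating and the highest DTV (hypothetical helper)."""
    subset = frame.loc[frame["rating"] == rating, ["Station", "rating", "DTV"]]
    return subset.nlargest(n, "DTV")  # sorted by DTV, largest first


print(busiest_with_rating(sample, 0))
print(busiest_with_rating(sample, 3, n=2))
```

`nlargest` does the sorting and the truncation in one call, which replaces the `sort_values(...).head(...)` pair.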



In [15]:

    
print("Q11: Do any of the 20 most frequented stations have missing equipment?")
most_frequented = dfm.sort_values('DTV', ascending=False).head(20)
most_frequented_missing = most_frequented[most_frequented['rating'] < 6]

print("A: These heavily frequented stations do not have complete equipment:")
most_frequented_missing[['Station', 'DTV', 'rating']]




Q11: Do any of the 20 most frequented stations have missing equipment?
A: These heavily frequented stations do not have complete equipment:






    Out[15]:






  
    
      
     Station            DTV    rating
166  Zürich Hardbrücke  47200  3.0
346  Wetzikon           24100  5.0
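
Q11 is a yes/no question, so the answer can also be computed instead of read off the table. The sketch below assumes that 6 is the rating of a fully equipped station, which is inferred from the `rating < 6` filter above. Two sample rows are copied from Out[15]; the third row is invented so that the sample contains one fully equipped station.

```python
import pandas as pd

FULL_RATING = 6  # assumption, inferred from the `rating < 6` filter in the cell above

# Rows 1 and 2 come from Out[15]; "Invented station" is made up for illustration.
sample = pd.DataFrame({
    "Station": ["Zürich Hardbrücke", "Wetzikon", "Invented station"],
    "DTV": [47200, 24100, 30000],
    "rating": [3.0, 5.0, 6.0],
})

top = sample.nlargest(20, "DTV")           # at most 20 rows, highest DTV first
incomplete = top["rating"] < FULL_RATING   # boolean mask over the top rows

print("Any incomplete station in the top 20?", bool(incomplete.any()))
print("How many:", int(incomplete.sum()))
print(top.loc[incomplete, "Station"].tolist())
```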



In [16]:

    
print("Q12: How many “automat KTU” are there in all Swiss railway stations? How many stations have at least one?")
print("A: There are", int(df['automat_ktu'].sum()), "“automat KTU” in total.")
print(df['automat_ktu'].apply(automat).sum(), "stations have at least one “automat KTU”.")




Q12: How many “automat KTU” are there in all Swiss railway stations? How many stations have at least one?
A: There are 1009 “automat KTU” in total.
558 stations have at least one “automat KTU”.
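
The `automat` helper used above is defined earlier in the notebook and is not shown in this part. Assuming it simply flags stations whose `automat_ktu` count is positive, both numbers can be sketched with a plain boolean comparison. The first five sample values are the `automat_ktu` values from `df.head()`; the trailing `0.0` and the missing value are added here only to show the edge cases.

```python
import pandas as pd

# First five values come from df.head(); the last two are added for illustration.
automat_ktu = pd.Series([19.0, 2.0, 1.0, 1.0, 1.0, 0.0, float("nan")])

total_machines = int(automat_ktu.sum())            # missing values are skipped by sum()
stations_with_one = int((automat_ktu > 0).sum())   # NaN > 0 is False, so it is not counted

print("Total machines:", total_machines)
print("Stations with at least one:", stations_with_one)
```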

Graphics



In [17]:

    
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use('fivethirtyeight')
df['rating'].value_counts().plot(kind='barh').invert_yaxis()
plt.ylabel('Rating')
plt.xlabel('Number of stations')


print("Most railway stations (more than 250) have a rating of 1 or 2:")









    



Most railway stations (more than 250) have a rating of 1 or 2:
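
`value_counts()` orders its result by frequency, so the bars in the chart are not necessarily in rating order. The sketch below shows how the counts could be ordered by rating and how a statement such as "most stations have a rating of 1 or 2" could be checked as a share. The ratings are toy values invented for illustration; the notebook would use `df['rating']`.

```python
import pandas as pd

# Toy ratings, invented for illustration only.
ratings = pd.Series([0, 1, 1, 2, 2, 2, 3, 6], name="rating")

counts = ratings.value_counts().sort_index()   # index is the rating, in ascending order
share_1_or_2 = ratings.isin([1, 2]).mean()     # fraction of stations rated 1 or 2

print(counts)
print("Share rated 1 or 2:", share_1_or_2)
```

Calling `counts.plot(kind='barh')` on the sorted counts would then draw the bars in rating order.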



In [18]:

    
plt.style.use('ggplot')
most_frequented.plot(kind='barh', x='Station', y='DTV').invert_yaxis()

fig = plt.gcf()
fig.set_size_inches(18, 7, forward=True)
plt.rcParams.update({'font.size': 16})
plt.rc('ytick', labelsize=14)

print("These are the most frequented stations, in average daily passengers:")









    



These are the most frequented stations, in average daily passengers:



In [19]:

    
most_frequented.plot(kind='scatter', x='rating', y='DTV')


print("The same stations and their equipment rating:")









    



The same stations and their equipment rating:


