Data set 2: railway stations equipment for disabled people

Source and documentation: http://data.sbb.ch/explore/dataset/equipement/

  1. Open your dataset up using pandas in a Jupyter notebook
  2. Do a .head() to get a feel for your data
  3. Write down 12 questions to ask your data, or 12 things to hunt for in the data
  4. Attempt to answer those ten questions using the magic of pandas
  5. Make three charts with your dataset
  6. Keep track of anything that was problematic - it can be non-standard country names, extra spaces in columns, trouble getting multiple colors in scatterplots, whatever you'd like.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("stations.csv", delimiter=';')

In [3]:
df.head(5)


Out[3]:
didok station_name accessible_ticket accessible_wc wheelchair_load mobilift stepless_perron bats autelca automat_ktu geopos TUNummer Betriebspunkttyp
0 10 Basel SBB 1.0 1.0 1.0 1.0 1.0 10 Min. Übergangszeit für Züge von und nach Fr... 0.0 19.0 47.5474041527, 7.58955146721 1 Haltestelle
1 2213 Wohlen 1.0 1.0 1.0 1.0 1.0 NaN 0.0 2.0 47.3484618374, 8.26978121872 1 Haltestelle_und_Bedienpunkt
2 2116 Schinznach Bad 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 47.4517752695, 8.16692832223 1 Haltestelle
3 2211 Hendschiken 1.0 0.0 1.0 0.0 0.0 NaN 0.0 1.0 47.3894242237, 8.20737399629 1 Haltestelle
4 2226 Knonau 0.0 0.0 0.0 1.0 1.0 NaN 0.0 1.0 47.2202591199, 8.4667085831 1 Haltestelle

In [4]:
print("Q1: How many train station is there in Switzerland? How many of them are accessible for wheelchairs?")

accessible_perron = df[df['stepless_perron'] == 1]

print("A: There is", df.shape[0], "stations.", accessible_perron.shape[0], "of them have a stepless perron.")
#df.describe()
#df['accessible_ticket'].value_counts()


Q1: How many train station is there in Switzerland? How many of them are accessible for wheelchairs?
A: There is 731 stations. 460 of them have a stepless perron.

In [5]:
import math
print("Q2: What does “bats” stand for? Is there interesting data about that?")

print("A: This field contains comments in french, italian and german about the stations.")

print("The metadata says: “Comment field (Billet Automat Touch Screen?)” with a question mark.")

bats = df[df['bats'] == df['bats']] # check if equals to himself to get rid of the NaN
#bats[['station_name','bats']].head(5) # to explore the data

print("Some stations are marked as “not served” in french. The “bats” column enables us to get a non-extensive list of these:")

bats[bats['bats'] == 'non desservi'][['station_name','bats']]


Q2: What does “bats” stand for? Is there interesting data about that?
A: This field contains comments in french, italian and german about the stations.
The metadata says: “Comment field (Billet Automat Touch Screen?)” with a question mark.
Some stations are marked as “not served” in french. The “bats” column enables us to get a non-extensive list of these:
Out[5]:
station_name bats
54 Evouettes, Les non desservi
147 Vouvry non desservi
333 Collombey non desservi
334 Vionnaz non desservi
335 St-Gingolph (Suisse) non desservi
421 Bouveret non desservi

In [6]:
print("Q3: How many stations offer help to board and get out of the trains? What percentage of the stations does it represent?")
with_help = df['wheelchair_load'].value_counts()[1]
total = with_help + df['wheelchair_load'].value_counts()[0]
percentage = (with_help*100) / total
print("A:", with_help, "stations offer help to board and get out of trains. This represents", "%.2f" % percentage + "% of the total.")


Q3: How many stations offer help to board and get out of the trains? What percentage of the stations does it represent?
A: 193 stations offer help to board and get out of trains. This represents 27.03% of the total.

In [7]:
print("Q4: Which are the 10 least equipped stations?")

# For our rating, we just want to know if “automat_ktu” is more than 1, and not add the actual number.
def automat(x):
    if x > 0:
        return 1
    else:
        return 0

df['rating'] = df['accessible_ticket'] + df['accessible_wc'] + df['wheelchair_load'] + df['mobilift'] + df['stepless_perron'] + df['autelca'] + df['automat_ktu'].apply(automat)

zeroRatingCount = df['didok'][df['rating'] == 0].count()

print("A: These 10 stations have zero recorded equipment. But", zeroRatingCount-10, "other station have no equipment.")


df[['station_name', 'rating']].sort_values(by='rating').head(10)


Q4: Which are the 10 least equipped stations?
A: These 10 stations have zero recorded equipment. But 33 other station have no equipment.
Out[7]:
station_name rating
98 Deurres, Les 0.0
589 Kerzers Papiliorama 0.0
114 Kölliken Oberdorf 0.0
606 Schloss Laufen am Rheinfall 0.0
408 Tuileries, Les 0.0
293 Grandson 0.0
462 Mühlau 0.0
465 Chamoson 0.0
710 Worb SBB 0.0
119 Rekingen AG 0.0

In [8]:
print('Q5: Is there areas with a lot of non-equipped stations?')

zeroRating = df[df['rating'] == 0]
# posList = df['geopos'].tolist
geopos_array = []
neighbors_list = []

# 1) store all positions in an array
for index, row in zeroRating.iterrows():
    geopos = row['geopos'].split(', ')
    latlng = [float(item) for item in geopos]
    geopos_array.append(latlng)
    neighbors_list.append(row['station_name'])

# 2) store “neighbour” non-equipped stations in two new columns

# [latitude = 47.452833265 N, longitude = 8.70557069755 E]
# In Switzerland, 1 ~= 110 km for latitude, 75 km for longitude
# Let's say 50 km is enough to be "neighbour" stations (31 miles, a bit more than Columbia University - Newark)
# It's a rough calculation: the surface in which cities are "neighbour" is a square and not a circle.
latdistance = 0.4
longdistance = 0.6

df['neighbors_unequipped'] = 0
df['neighbors_names'] = ''

for index, row in zeroRating.iterrows(): # iteration 1: through the stations
    neighbors_count = 0
    neighbors_index = 0
    neighbors_currlist = []
    geopos = row['geopos'].split(', ')
    latlng = [float(item) for item in geopos]
    
    for loc in geopos_array: # iteration 2: through the array of locations
        if loc != latlng: # we check if this isn't the current station location
            if (abs(loc[0] - latlng[0]) < latdistance) and (abs(loc[1] - latlng[1]) < longdistance):
                neighbors_count += 1
                neighbors_currlist.append(neighbors_list[neighbors_index])
        neighbors_index += 1

    df.loc[index, 'neighbors_unequipped'] = neighbors_count
    df.loc[index, 'neighbors_names'] = str.join(', ', neighbors_currlist)

df[df['neighbors_unequipped'] > 0].sort_values(by='neighbors_unequipped', ascending=False).head(10)

top_unequipped = df[df['neighbors_unequipped'] > 0].sort_values(by='neighbors_unequipped', ascending=False)


print("[We create two new columns and get this list -- text answer in the next cell]")
top_unequipped[['station_name', 'neighbors_unequipped', 'neighbors_names']].head(10)


Q5: Is there areas with a lot of non-equipped stations?
[We create two new columns and get this list -- text answer in the next cell]
Out[8]:
station_name neighbors_unequipped neighbors_names
543 Baldegg Kloster 16 Kemptthal, Blumenau, Walterswil-Striegel, Köll...
351 Küngoldingen 15 Walterswil-Striegel, Kölliken Oberdorf, Reking...
671 Emmenmatt 15 Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
201 Trimbach 14 Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
674 Littau 14 Kemptthal, Blumenau, Walterswil-Striegel, Köll...
114 Kölliken Oberdorf 14 Walterswil-Striegel, Rekingen AG, Rothenburg, ...
635 Lyssach 14 Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
158 Rothenburg 14 Kemptthal, Blumenau, Walterswil-Striegel, Köll...
236 Olten Hammer 14 Suberg-Grossaffoltern, Walterswil-Striegel, Kö...
595 Signau 13 Suberg-Grossaffoltern, Rothenburg, Brügg BE, K...

In [9]:
# 3) new for loop to get a “clean” list of different locations,
# which means we avoid printing two stations from the same area

name_list = []
reject_list = []

for index, row in top_unequipped.iterrows(): # iteration 1: through the stations
    rejects = row['neighbors_names'] # we want to get the different areas
    reject_list.extend(rejects.split(', '))
    if row['station_name'] not in reject_list:
        name_list.append("“" + row['station_name'] + "”: " + str(row['neighbors_unequipped']))

print("Answer to Q5: These places count the most unequipped stations within about 50 km:", str.join(', ', name_list))


Answer to Q5: These places count the most unequipped stations within about 50 km: “Baldegg Kloster”: 16, “Deurres, Les”: 9, “Quartino”: 4, “Creux-de-Genthod”: 1, “Tuileries, Les”: 1

In [10]:
print("Q6: What is the average station “rating”, based on the number of equipments?")
print("A: The average rating is", df['rating'].mean())


Q6: What is the average station “rating”, based on the number of equipments?
A: The average rating is 2.32112676056

In [11]:
print("Q7: Merge this data with the passenger frequence. Is it a perfect match?")
dfp = pd.read_csv('passagierfrequenz.csv', delimiter=';')
dfm = df.merge(dfp, how='inner', left_on='station_name', right_on='Station')

print("A: It isn’t a perfect match. The “equipment” dataset has", df.shape[0], "rows; the “frequency” dataset has", dfp.shape[0], "rows; the merged dataset has only", dfm.shape[0], "rows.")
print("Here is the merged dataset:")

dfm


Q7: Merge this data with the passenger frequence. Is it a perfect match?
A: It isn’t a perfect match. The “equipment” dataset has 731 rows; the “frequency” dataset has 724 rows; the merged dataset has only 646 rows.
Here is the merged dataset:
Out[11]:
didok station_name accessible_ticket accessible_wc wheelchair_load mobilift stepless_perron bats autelca automat_ktu ... neighbors_unequipped neighbors_names Code Station Year DTV DWV Owner Comments geopos_y
0 10 Basel SBB 1.0 1.0 1.0 1.0 1.0 10 Min. Übergangszeit für Züge von und nach Fr... 0.0 19.0 ... 0 BS Basel SBB 2014 101400 112900 SBB Passagierfrequenzen: ohne SNCF. 47.5474041527, 7.58955146721
1 2213 Wohlen 1.0 1.0 1.0 1.0 1.0 NaN 0.0 2.0 ... 0 WO Wohlen 2014 5200 6200 SBB Passagierfrequenzen: ohne BDWM. 47.3484618374, 8.26978121872
2 2116 Schinznach Bad 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 SBAD Schinznach Bad 2014 530 600 SBB NaN 47.4517752695, 8.16692832223
3 2211 Hendschiken 1.0 0.0 1.0 0.0 0.0 NaN 0.0 1.0 ... 0 HDK Hendschiken 2014 380 450 SBB NaN 47.3894242237, 8.20737399629
4 2226 Knonau 0.0 0.0 0.0 1.0 1.0 NaN 0.0 1.0 ... 0 KNO Knonau 2014 980 1100 SBB NaN 47.2202591199, 8.4667085831
5 3105 Uetikon 0.0 0.0 1.0 1.0 1.0 NaN 0.0 1.0 ... 0 UET Uetikon 2014 4500 5500 SBB NaN 47.2590294057, 8.678774746
6 3015 Zürich Wipkingen 0.0 0.0 1.0 0.0 0.0 NaN 0.0 1.0 ... 0 ZWIP Zürich Wipkingen 2014 3300 3800 SBB NaN 47.3930313233, 8.52935905447
7 3112 Kempraten 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 KPT Kempraten 2014 850 990 SBB NaN 47.2383929337, 8.81429184846
8 3100 Zollikon 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 ZK Zollikon 2014 1100 1300 SBB NaN 47.3373317337, 8.569717612
9 3502 Siggenthal-Würenlingen 0.0 0.0 0.0 0.0 0.0 Postauto fahren ab Bahnhofplatz 0.0 1.0 ... 0 SIG Siggenthal-Würenlingen 2014 1300 1600 SBB NaN 47.5179218559, 8.24015363326
10 3505 Wettingen 0.0 0.0 1.0 1.0 1.0 Gleise 1, 3, 4 und 5 stufenlos zugänglich. 0.0 0.0 ... 0 WE Wettingen 2014 4400 5400 SBB NaN 47.4596367944, 8.3159863528
11 3427 Schlatt 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 SCHT Schlatt 2014 270 310 SBB NaN 47.6795011585, 8.68749675648
12 3527 Buchs-Dällikon 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 BUD Buchs-Dällikon 2014 2500 3100 SBB NaN 47.4533973239, 8.43557852512
13 3429 Schlattingen 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 SCHN Schlattingen 2014 240 270 SBB NaN 47.6666263051, 8.77041438581
14 4007 Moreillon 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 MRL Moreillon 2014 70 80 CFF NaN 46.5083872786, 6.78564812794
15 2237 Lupfig 0.0 0.0 0.0 1.0 1.0 Postautohaltestelle Lupfig Bahnhof = 3 Minuten... 0.0 0.0 ... 0 LUPF Lupfig 2014 220 270 SBB NaN 47.4451758366, 8.21496733295
16 3007 Zürich Seebach 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 ZSEB Zürich Seebach 2014 1500 1700 SBB NaN 47.4187469105, 8.54463614173
17 3001 Zürich Altstetten 1.0 1.0 1.0 1.0 1.0 NaN 0.0 2.0 ... 0 ZAS Zürich Altstetten 2014 32100 39900 SBB NaN 47.3914808361, 8.4889402654
18 2229 Urdorf Weihermatt 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 URDW Urdorf Weihermatt 2014 2200 2500 SBB NaN 47.3809708849, 8.43033031194
19 3011 Zürich Wiedikon 0.0 0.0 0.0 0.0 1.0 NaN 0.0 2.0 ... 0 ZWIE Zürich Wiedikon 2014 7700 9500 SBB NaN 47.3714716204, 8.52346258297
20 2300 Zug Postplatz 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 ZGPP Zug Postplatz 2014 160 190 SBB NaN 47.168273486, 8.51691604796
21 2231 Zug Oberwil 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 ZGO Zug Oberwil 2014 280 320 SBB NaN 47.147675872, 8.50992153777
22 3313 Niederglatt 0.0 0.0 0.0 1.0 1.0 NaN 0.0 1.0 ... 0 NG Niederglatt 2014 1100 1400 SBB NaN 47.4874689477, 8.50329612568
23 3316 Steinmaur 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 STMR Steinmaur 2014 940 1100 SBB NaN 47.4899801946, 8.4468407224
24 3304 Kemptthal 0.0 0.0 0.0 0.0 0.0 NaN 0.0 0.0 ... 10 Blumenau, Rekingen AG, Rothenburg, Oberrüti, M... KE Kemptthal 2014 600 690 SBB NaN 47.452833265, 8.70557069755
25 3420 Lottstetten 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 LOT Lottstetten 2014 180 220 SBB NaN 47.6257356296, 8.56680664761
26 3409 Bad Zurzach 0.0 0.0 1.0 1.0 1.0 NaN 0.0 1.0 ... 0 ZZ Bad Zurzach 2014 1400 1600 SBB NaN 47.5883101087, 8.29556465398
27 3425 Feuerthalen 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 FT Feuerthalen 2014 300 330 SBB NaN 47.6921636376, 8.64595208918
28 3421 Jestetten 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 JE Jestetten 2014 550 700 SBB NaN 47.6544006039, 8.57327389271
29 3414 Mellikon 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 MELN Mellikon 2014 70 80 SBB NaN 47.5684839744, 8.35249723256
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
616 4134 Payerne 0.0 0.0 1.0 1.0 1.0 NaN 0.0 2.0 ... 0 PAY Payerne 2014 4500 5100 CFF NaN 46.8196387484, 6.93987910663
617 4127 Faoug 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 FG Faoug 2014 130 140 CFF NaN 46.9082315978, 7.07459265792
618 4219 Auvernier 0.0 0.0 0.0 0.0 0.0 Accès au quai sans escalier (rampe ou ascenseu... 0.0 1.0 ... 0 AUV Auvernier 2014 380 410 CFF NaN 46.9796312118, 6.87770603996
619 4223 Cornaux 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 CORN Cornaux 2014 440 530 CFF NaN 47.0385599223, 7.0233637328
620 4229 Tüscherz 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 TUE Tüscherz 2014 100 100 SBB NaN 47.1149950289, 7.19730481483
621 4228 Twann 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 TWN Twann 2014 550 500 SBB NaN 47.0936555927, 7.15649171643
622 5111 Sisikon 0.0 0.0 0.0 1.0 0.0 NaN 0.0 0.0 ... 0 SK Sisikon 2014 170 170 SBB NaN 46.9492536033, 8.62073068694
623 6300 St. Gallen Winkeln 0.0 0.0 0.0 0.0 1.0 VBSG-Haltestelle ca. 150m vom Bahnhof 0.0 1.0 ... 0 SGWI St. Gallen Winkeln 2014 1100 1300 SBB NaN 47.4039256735, 9.30083970711
624 6215 Bischofszell Nord 0.0 0.0 0.0 1.0 1.0 NaN 0.0 1.0 ... 0 BZN Bischofszell Nord 2014 200 250 SBB NaN 47.5004169871, 9.23401945255
625 8005 Burgdorf 1.0 1.0 1.0 1.0 1.0 NaN 0.0 4.0 ... 0 BDF Burgdorf 2014 14500 16800 SBB NaN 47.0606957305, 7.62168894507
626 6308 Arbon 0.0 0.0 1.0 1.0 1.0 NaN 0.0 2.0 ... 0 ARB Arbon 2014 1600 1700 SBB NaN 47.5105854479, 9.43333858591
627 6217 Sulgen 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 SLG Sulgen 2014 1200 1400 SBB NaN 47.5387993681, 9.18367857671
628 6316 Au SG 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 AUSG Au SG 2014 290 330 SBB NaN 47.4361134139, 9.64124454757
629 6304 Mörschwil 0.0 0.0 1.0 1.0 0.0 NaN 0.0 1.0 ... 0 MOER Mörschwil 2014 270 280 SBB NaN 47.4747707737, 9.41437284119
630 6309 Egnach 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 EGN Egnach 2014 310 330 SBB NaN 47.5439914495, 9.38300007743
631 9000 Chur 1.0 1.0 1.0 1.0 1.0 Postautostation per Rolltreppe und Lift erreic... 0.0 6.0 ... 0 CH Chur 2014 23600 24100 SBB NaN 46.8530797552, 9.52892561961
632 8100 Langenthal 1.0 1.0 1.0 0.0 1.0 NaN 0.0 3.0 ... 0 LTH Langenthal 2014 11400 13100 SBB NaN 47.2173003777, 7.78470885276
633 9004 Bad Ragaz 0.0 1.0 1.0 1.0 1.0 Postautos fahren ab Bahnhofplatz 0.0 1.0 ... 0 BRAG Bad Ragaz 2014 2200 2500 SBB NaN 47.0103066077, 9.50521579767
634 9413 Flums 0.0 1.0 1.0 0.0 0.0 NaN 0.0 1.0 ... 0 FMS Flums 2014 470 520 SBB NaN 47.0966128468, 9.34791662857
635 16218 Winterthur Hegi 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 WHE Winterthur Hegi 2014 490 570 SBB NaN 47.5015644748, 8.76930571536
636 15992 Baar Neufeld 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 BAAN Baar Neufeld 2014 840 1200 SBB NaN 47.1884876743, 8.51775897551
637 6139 Stein am Rhein 0.0 0.0 1.0 0.0 1.0 Gleis 3 Unterführung, steile Rampe 0.0 2.0 ... 0 STR Stein am Rhein 2014 2200 2400 SBB NaN 47.656108874, 8.85500693112
638 6209 Flawil 1.0 1.0 1.0 1.0 1.0 NaN 0.0 2.0 ... 0 FLA Flawil 2014 3200 3800 SBB NaN 47.4151898192, 9.1897661689
639 6046 Henggart 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 HGT Henggart 2014 1500 1700 SBB NaN 47.5642523105, 8.68505196166
640 6044 Winterthur Töss 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 WTOE Winterthur Töss 2014 630 720 SBB NaN 47.4898075094, 8.70929159585
641 6009 Saland 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 SD Saland 2014 210 230 SBB NaN 47.3942126094, 8.85420315671
642 6017 Wiesendangen 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 WD Wiesendangen 2014 760 880 SBB NaN 47.5255401974, 8.77601496118
643 6021 Dinhard 0.0 0.0 0.0 0.0 1.0 NaN 0.0 0.0 ... 0 DIH Dinhard 2014 180 190 SBB NaN 47.5533468989, 8.75295793686
644 5204 Faido 0.0 0.0 0.0 0.0 0.0 NaN 0.0 1.0 ... 0 FA Faido 2014 300 330 FFS NaN 46.48307671, 8.79070214577
645 16271 Steinach 0.0 0.0 0.0 0.0 1.0 NaN 0.0 1.0 ... 0 STCH Steinach 2014 450 500 SBB NaN 47.5008611902, 9.44212099652

646 rows × 24 columns


In [12]:
print("Q8: Find the mismatches.")

# We do an outer join, then we find out which rows are incomplete

dfouter = df.merge(dfp, how='outer', left_on='station_name', right_on='Station')

mismatches = dfouter[dfouter['Station'] != dfouter['station_name']][['Station', 'station_name']]

print("A: Here are the", len(mismatches), "unmatched rows:")

mismatches


Q8: Find the mismatches.
A: Here are the 163 unmatched rows:
Out[12]:
Station station_name
43 NaN Kaltbrunn
64 NaN Cossonay-Penthalaz
79 NaN Dotzigen
89 NaN Corcelles-Nord
92 NaN Niederwangen
93 NaN Wünnewil
99 NaN Suberg-Grossaffoltern
122 NaN Pully-Nord
135 NaN Thörishaus Dorf
199 NaN Studen BE
273 NaN Brügg BE
275 NaN Kallnach
278 NaN Bern Ausserholligen
279 NaN Fribourg/Freiburg
318 NaN Cointrin
346 NaN Boniswil
381 NaN Otelfingen Golfpark
385 NaN Oberwangen
390 NaN Münchenbuchsee
401 NaN Basel St. Jakob
402 NaN Büren an der Aare
429 NaN Emmenbrücke Gersag
498 NaN Bargen
499 NaN Busswil
500 NaN Fräschels
506 NaN Yverdon-Champ Pittet
507 NaN Muntelier-Löwenberg
508 NaN Corcelles-Sud
537 NaN Bern Wankdorf
540 NaN Luzern Verkehrshaus
... ... ...
779 Brienz NaN
780 Bex NaN
781 Bettwiesen NaN
782 Corcelles Sud NaN
783 Cossonay NaN
784 Ebligen NaN
785 Hochdorf Schönau NaN
786 Gersag NaN
787 Wolfenschiessen NaN
788 Stans NaN
789 Vallorbe NaN
790 Mendrisio San Martino NaN
791 Riehen b. Basel NaN
792 Sachseln NaN
793 Sarnen NaN
794 Vernier NaN
795 Stansstad NaN
796 Siegershausen NaN
797 Tägerschen NaN
798 Tobel-Affeltrangen NaN
799 Pully Nord NaN
800 Riazzino NaN
801 Ringgenberg NaN
802 Tägerwilen Dorf NaN
803 Zug Casino/Frauenstein NaN
804 Zollikofen NaN
805 Yverdon - Champ Pittet NaN
806 Walchwil Hörndli NaN
807 Luzern-Verkehrshaus NaN
808 Luzern Allmend NaN

163 rows × 2 columns


In [13]:
print("Q9: Which are the most frequented stations with no equipment?")
print("A: These are the top 5:")
dfm[['Station', 'rating', 'DTV']][dfm['rating'] == 0].sort_values('DTV', ascending=False).head(5)


Q9: Which are the most frequented stations with no equipment?
A: These are the top 5:
Out[13]:
Station rating DTV
486 Winterthur Grüze 0.0 1700
452 Lugano-Paradiso 0.0 850
149 Rothenburg 0.0 720
24 Kemptthal 0.0 600
598 Kurzrickenbach Seepark 0.0 530

In [14]:
print("Q10: Which are the most frequented stations with poor equipment (a rating of 3)?")
print("A: These are the top 5. Schlieren has more than 10,000 daily passengers.")
dfm[['Station', 'rating', 'DTV']][dfm['rating'] == 3].sort_values('DTV', ascending=False).head(5)


Q10: Which are the most frequented stations with poor equipment (a rating of 3)?
A: These are the top 5. Schlieren has more than 10,000 daily passengers.
Out[14]:
Station rating DTV
166 Zürich Hardbrücke 3.0 47200
169 Stettbach 3.0 19900
467 Schlieren 3.0 10100
470 Turgi 3.0 6100
165 Stäfa 3.0 5500

In [15]:
print("Q11: Does any of the top 20 most frequented stations have missing equipments?")
most_frequented = dfm.sort_values('DTV', ascending=False).head(20)
most_frequented_missing = most_frequented[most_frequented['rating'] < 6]

print("These very frequented stations don't have a complete equipment:")
most_frequented_missing[['Station', 'DTV', 'rating']]


Q11: Does any of the top 20 most frequented stations have missing equipments?
These very frequented stations don't have a complete equipment:
Out[15]:
Station DTV rating
166 Zürich Hardbrücke 47200 3.0
346 Wetzikon 24100 5.0

In [16]:
print("Q12: How many “automat KTU” is there in all Swiss railway stations? How many stations have one?")
print("A: There is", int(df['automat_ktu'].sum()), "“automat KTU”.")
print(df['automat_ktu'].apply(automat).sum(), "stations have at least one “automat KTU”.")


Q12: How many “automat KTU” is there in all Swiss railway stations? How many stations have one?
A: There is 1009 “automat KTU”.
558 stations have at least one “automat KTU”.

Graphics


In [17]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use('fivethirtyeight')
df['rating'].value_counts().plot(kind='barh', label='Yolool').invert_yaxis()
plt.ylabel('Rating')
plt.xlabel('Number of stations')


print("Most railway stations (more than 250) have a rating of 1 or 2:")


Most railway stations (more than 250) have a rating of 1 or 2:

In [18]:
plt.style.use('ggplot')
most_frequented.plot(kind='barh', x='Station', y='DTV').invert_yaxis()

fig = plt.gcf()
fig.set_size_inches(18, 7, forward=True)
plt.rcParams.update({'font.size': 16})
plt.rc('ytick', labelsize=14)

print("These are the most frequented stations, in average daily passengers:")


These are the most frequented stations, in average daily passengers:

In [19]:
most_frequented.plot(kind='scatter', x='rating', y='DTV')


print("The same stations and their equipment rating:")


The same stations and their equipment rating:

In [ ]: