Train Station Data Cleaner

Data source: https://www.data.vic.gov.au/data/dataset/train-station-entries-2008-09-to-2011-12-new

Data temporal coverage: 01/07/2008 to 30/06/2012

Comparable data with buses and trams: weekday by time ('AM Peak', 'Interpeak', 'PM Peak')


In [21]:
rawtrain = './raw/Train Station Entries 2008-09 to 2011-12 - data.XLS'

Step 1: Download the raw train station entry data and save a local copy in the ./raw directory

Download the train station entries XLS file manually. The web page has an 'I consent to terms and conditions / I am not a robot' button that prevents automated downloading (or at least makes it harder than I expected). Save the file to the './raw' directory.
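Because the download is manual, it can help to confirm the file is in place before trying to read it. The helper below is a minimal sketch; `find_raw_file` is a hypothetical name that does not appear elsewhere in this notebook.

```python
from pathlib import Path


def find_raw_file(path):
    """Return the path as a Path if the file exists, otherwise raise with a hint.

    Hypothetical helper, not part of the original notebook.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"{p} not found: download the XLS manually and save it in {p.parent}/"
        )
    return p
```

Calling `find_raw_file(rawtrain)` before the read step would then fail early with a reminder about the manual download.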


In [22]:
import pandas as pd

In [25]:
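# Read the 'Data' sheet: skip the first row, use the next row as the header, drop the 5 footer rows.
# Current pandas spells these keywords sheet_name and skipfooter (formerly sheetname and skip_footer).
df = pd.read_excel(rawtrain, sheet_name='Data', header=0, skiprows=1, skipfooter=5)
df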


Out[25]:
Station Notes Line Group Network Segment Line Segment 2008-09 2009-10 2010-11 2011-12 2008-09.1 ... 2011-12.1 Normal Weekday Saturday Sunday Weekly Pre AM Peak AM Peak Interpeak PM Peak PM Late
0 Aircraft NaN Northern Newport Corridor Seaholme-Werribee 0.313692 0.305483 0.324742 0.315160 1088.699986 ... 1061.929966 1061.929966 525.683292 397.080300 6232.413425 237.897730 456.425690 225.740549 130.721038 11.144959
1 Alamein NaN Burnley Camberwell Corridor Riversdale-Alamein 0.176067 0.169994 0.175858 0.153094 628.730293 ... 492.705744 492.705744 292.156843 201.637095 2957.322660 48.946191 267.239895 114.143413 44.118682 18.257564
2 Albion NaN Northern Sunbury Line Albion-Sunbury 0.785147 0.808922 0.802370 0.679577 2859.820566 ... 2529.307073 2529.307073 928.927817 632.559319 14208.022498 319.162044 1310.765719 485.731707 389.188179 24.459423
3 Alphington NaN Clifton Hill Hurstbridge Line Westgarth-Hurstbridge 0.316047 0.323038 0.321010 0.287551 1160.002749 ... 1011.836954 1011.836954 476.803206 348.284321 5884.272295 42.208523 537.959378 241.489810 153.019594 37.159648
4 Altona NaN Northern Newport Corridor Seaholme-Werribee 0.409006 0.391029 0.385586 0.281536 1381.436326 ... 962.443514 962.443514 486.406093 298.350946 5596.974608 120.828059 400.638040 231.197189 170.906116 38.874109
5 Anstey NaN Northern Upfield Line Macauly-Upfield 0.353803 0.379469 0.376610 0.361545 1181.831716 ... 1143.501139 1143.501139 797.290704 612.842409 7127.638809 95.506592 453.776044 306.652866 216.409306 71.156331
6 Armadale NaN Caulfield Hawksburn-Caulfield Hawksburn-Caulfield 0.568183 0.581113 0.623642 0.563734 2040.527713 ... 1844.564584 1844.564584 1356.511188 870.567424 11449.901533 80.678582 866.786739 381.513971 437.373267 78.212026
7 Ascot Vale NaN Northern Craigieburn Line Kensington-Craigieburn 0.570532 0.587112 0.548968 0.544282 2026.388357 ... 1890.138774 1890.138774 970.636798 678.919272 11100.249942 126.057396 987.481205 443.258259 280.724595 52.617319
8 Ashburton NaN Burnley Camberwell Corridor Riversdale-Alamein 0.295785 0.303425 0.307057 0.286560 1091.933975 ... 1023.164969 1023.164969 370.467082 244.203530 5730.495455 93.941294 487.340828 236.751601 189.405391 15.725855
9 Aspendale NaN Caulfield Frankston Line Glenhuntly-Frankston 0.388967 0.380146 0.355486 0.305565 1340.146017 ... 959.853901 959.853901 606.947772 455.699106 5861.916380 89.744631 496.647928 221.980937 127.979531 23.500874
10 Auburn NaN Burnley Camberwell Corridor East Richmond-Camberwell 0.635399 0.648381 0.688923 0.613467 2266.200792 ... 2081.810274 2081.810274 1135.421043 795.840744 12340.313158 67.355354 934.446348 470.993373 547.819489 61.195710
11 Balaclava NaN Caulfield Sandringham Line Prahran-Sandringham 1.098892 1.151161 1.163197 1.046436 3741.274838 ... 3337.522752 3337.522752 2077.868744 1423.519443 20189.001948 172.824574 1430.279922 873.153018 696.786645 164.478593
12 Batman NaN Northern Upfield Line Macauly-Upfield 0.319489 0.332657 0.313373 0.285973 1122.869572 ... 942.042074 942.042074 547.579899 414.645318 5672.435585 72.398862 349.459620 251.456742 226.989400 41.737449
13 Bayswater NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.492183 0.505117 0.537508 0.511002 1812.828583 ... 1835.545303 1835.545303 631.862149 412.857766 10222.446430 245.530767 745.709352 373.007350 439.275548 32.022286
14 Beaconsfield NaN Caulfield Dandenong Corridor Hallam-Pakenham 0.226322 0.237028 0.236643 0.239137 873.914346 ... 871.291170 871.291170 301.775939 167.785238 4826.017027 268.899229 346.181736 128.043337 120.307591 7.859277
15 Belgrave NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.583975 0.515926 0.503393 0.504070 2024.730476 ... 1652.450451 1652.450451 1114.243947 868.381874 10244.878075 331.174281 547.349043 503.170925 208.329372 62.426830
16 Bell NaN Clifton Hill South Morang Line Rushall-South Morang 0.508278 0.549773 0.532537 0.523225 1521.364317 ... 1499.630670 1499.630670 1042.658928 588.063886 9128.876165 101.922958 514.820324 458.908365 358.462697 65.516326
17 Bentleigh NaN Caulfield Frankston Line Glenhuntly-Frankston 0.989939 1.029037 1.005742 0.898005 3431.420241 ... 2914.405406 2914.405406 1948.659309 1191.924518 17712.610857 194.787844 1189.354853 739.945272 652.710848 137.606589
18 Berwick NaN Caulfield Dandenong Corridor Hallam-Pakenham 0.750368 0.776350 0.848875 0.890924 2744.285531 ... 3294.239303 3294.239303 1108.860239 660.833356 18240.890108 706.732340 1113.437915 657.417195 757.705830 58.946022
19 Blackburn NaN Burnley Camberwell Corridor East Camberwell-Ringwood 1.122865 1.236781 1.219206 1.258705 4086.825560 ... 4501.305165 4501.305165 1873.691482 1279.779202 25659.996510 321.212872 1632.293465 1031.163692 1439.485115 77.150022
20 Bonbeach NaN Caulfield Frankston Line Glenhuntly-Frankston 0.323577 0.374199 0.352353 0.327569 1103.141173 ... 1162.376105 1162.376105 495.426288 359.665707 6666.972522 161.135056 494.020680 278.844128 203.547200 24.829040
21 Boronia NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.617362 0.617459 0.682181 0.680487 2210.058218 ... 2400.005271 2400.005271 1123.285829 734.930380 13858.242566 362.587305 1180.790457 582.552389 232.773466 41.301654
22 Box Hill NaN Burnley Camberwell Corridor East Camberwell-Ringwood 3.025194 3.160572 3.160397 2.742972 10802.406934 ... 9439.572570 9439.572570 4872.990487 3696.963732 55767.817070 645.497818 2836.825887 2329.658722 3315.365319 312.224826
23 Brighton Beach NaN Caulfield Sandringham Line Prahran-Sandringham 0.426686 0.455663 0.456528 0.455037 1431.306544 ... 1404.270361 1404.270361 1141.109034 734.918867 8897.379707 125.536489 692.023089 283.925004 234.231847 68.553932
24 Broadmeadows NaN Northern Craigieburn Line Kensington-Craigieburn 0.682694 0.716055 0.684284 0.733842 2375.824425 ... 2495.276916 2495.276916 1516.368108 1068.392379 15061.145066 125.306601 528.369305 797.819080 937.784941 105.996988
25 Brunswick NaN Northern Upfield Line Macauly-Upfield 0.269396 0.302548 0.299091 0.286981 928.983373 ... 903.549940 903.549940 610.316329 546.586310 5674.652339 55.109711 278.477907 242.412666 251.464289 76.085368
26 Burnley NaN Burnley Camberwell Corridor East Richmond-Camberwell 0.835919 0.872310 0.927393 0.841847 3079.315691 ... 2856.361922 2856.361922 1446.591082 1216.115853 16944.516543 90.742622 1042.181843 655.646547 819.610721 248.180189
27 Burwood NaN Burnley Camberwell Corridor Riversdale-Alamein 0.373574 0.378675 0.381104 0.327392 1360.296141 ... 1092.312271 1092.312271 512.134465 350.504723 6324.200544 83.863248 573.984562 245.421586 147.814210 41.228664
28 Camberwell NaN Burnley Camberwell Corridor East Richmond-Camberwell 2.102035 2.162117 2.267186 2.051733 7133.635858 ... 6571.171421 6571.171421 4866.553363 4154.747040 41877.157506 149.017955 1616.374441 1496.695877 2983.135401 325.947746
29 Canterbury NaN Burnley Camberwell Corridor East Camberwell-Ringwood 0.373434 0.369572 0.400612 0.374332 1364.624793 ... 1288.984767 1288.984767 677.029825 458.238233 7580.191892 65.440852 512.759240 290.348537 367.015625 53.420512
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
174 St Albans NaN Northern Sunbury Line Albion-Sunbury 1.680160 1.796735 1.738504 1.527953 6396.812565 ... 5992.341035 5992.341035 1806.815393 1427.711854 33196.232421 832.692601 1441.624471 2073.866798 1551.267383 92.889781
175 Strathmore NaN Northern Craigieburn Line Kensington-Craigieburn 0.347295 0.370580 0.371052 0.342511 1294.759543 ... 1208.319884 1208.319884 531.827415 388.384319 6961.811152 93.203205 603.140060 258.220581 208.207228 45.548811
176 Sunshine NaN Northern Sunbury Line Middle Footscray-Sunshine 1.936390 2.060345 2.108567 1.981379 6680.718475 ... 7031.655174 7031.655174 3380.918589 2554.978257 41094.172714 431.921305 2198.619706 2427.945911 1717.151291 256.016960
177 Surrey Hills NaN Burnley Camberwell Corridor East Camberwell-Ringwood 0.725658 0.771550 0.764162 0.690202 2637.424929 ... 2414.611194 2414.611194 1079.079619 694.833815 13846.969401 185.709287 1443.280073 424.742808 271.856242 89.022784
178 Syndal NaN Burnley Glen Waverley Line Heyington-Glen Waverley 0.682937 0.708319 0.794046 0.718235 2587.706316 ... 2620.955481 2620.955481 1023.918465 836.465851 14965.161721 245.261822 1245.037990 441.789188 654.421807 34.444673
179 Tecoma NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.077525 0.077407 0.076761 0.068889 273.462042 ... 230.136503 230.136503 136.223265 90.989461 1377.895238 35.922475 103.858737 51.818768 30.389748 8.146773
180 Thomastown NaN Clifton Hill South Morang Line Rushall-South Morang 0.871835 0.901163 0.888904 0.734174 3046.669561 ... 2527.507042 2527.507042 1095.940052 861.450962 14594.926223 388.599436 1117.061367 601.356534 382.526078 37.963627
181 Thornbury NaN Clifton Hill South Morang Line Rushall-South Morang 0.448927 0.464390 0.458635 0.430616 1498.962519 ... 1411.857573 1411.857573 882.713790 590.162534 8532.164186 114.174008 676.782173 373.452575 174.319919 73.128898
182 Toorak NaN Caulfield Hawksburn-Caulfield Hawksburn-Caulfield 0.433521 0.437247 0.510177 0.463455 1630.276771 ... 1618.950346 1618.950346 795.644933 521.861696 9412.258359 58.886934 640.003109 360.324474 459.894247 99.841582
183 Tooronga NaN Burnley Glen Waverley Line Heyington-Glen Waverley 0.478729 0.486046 0.520267 0.514741 1738.533697 ... 1914.659795 1914.659795 697.375276 459.383185 10730.057437 73.151999 698.342134 305.493214 770.773744 66.898704
184 Tottenham NaN Northern Sunbury Line Middle Footscray-Sunshine 0.469777 0.468401 0.491280 0.432533 1556.018763 ... 1443.319332 1443.319332 813.400230 638.377893 8668.374781 94.677604 527.962757 443.970478 318.180963 58.527530
185 Upfield NaN Northern Upfield Line Macauly-Upfield 0.369930 0.390478 0.344048 0.318588 1313.895815 ... 1038.912949 1038.912949 651.600717 515.726271 6361.891731 90.588282 303.450119 296.174540 293.119960 55.580047
186 Upper Ferntree Gully NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.280888 0.272581 0.284428 0.277734 1068.713157 ... 1012.280107 1012.280107 373.971284 262.097378 5697.469198 151.671462 379.641304 181.238533 285.439881 14.288928
187 Upwey NaN Burnley Camberwell Corridor Heathmont-Belgrave 0.259953 0.259661 0.266529 0.247125 987.372375 ... 917.909988 917.909988 393.596805 242.739845 5225.886590 105.054676 272.127373 190.715960 326.858393 23.153586
188 Victoria Park NaN Clifton Hill Jolimont-Clifton Hill Jolimont-Clifton Hill 0.473573 0.490286 0.536820 0.490913 1625.194994 ... 1663.360474 1663.360474 754.669610 576.205756 9647.677734 51.293375 322.804558 453.590678 693.733180 141.938684
189 Watergardens (Sydenham) NaN Northern Sunbury Line Albion-Sunbury 1.495998 1.593798 1.618270 1.530111 5378.913782 ... 5656.270772 5656.270772 1654.080020 1190.351987 31125.785865 762.812912 2629.834401 1141.700017 1044.005266 77.918175
190 Watsonia NaN Clifton Hill Hurstbridge Line Westgarth-Hurstbridge 0.598247 0.631850 0.655479 0.618701 2229.805077 ... 2132.908547 2132.908547 1018.285554 623.510226 12306.338516 376.316357 1021.160452 369.120734 342.395663 23.915341
191 Wattle Glen NaN Clifton Hill Hurstbridge Line Westgarth-Hurstbridge 0.055077 0.052024 0.057772 0.054709 208.871315 ... 198.806241 198.806241 80.914341 49.230690 1124.176236 39.305513 114.857693 28.018355 14.056897 2.567783
192 Werribee NaN Northern Newport Corridor Seaholme-Werribee 0.907355 0.963761 1.039778 1.067810 3258.678132 ... 3876.673129 3876.673129 1293.267310 857.371906 21534.004858 829.421538 1386.440447 790.375452 833.605216 36.830475
193 West Footscray NaN Northern Sunbury Line Middle Footscray-Sunshine 0.392650 0.398015 0.426490 0.355557 1350.105643 ... 1271.002280 1271.002280 530.768423 406.402300 7292.182124 89.763223 646.716898 337.187272 151.455349 45.879538
194 West Richmond NaN Clifton Hill Jolimont-Clifton Hill Jolimont-Clifton Hill 0.257459 0.275671 0.323192 0.329447 838.879647 ... 1017.492390 1017.492390 705.480935 536.581639 6329.524525 33.549801 309.462662 305.537884 246.830707 122.111336
195 Westall NaN Caulfield Dandenong Corridor Carnegie-Dandenong 0.532592 0.531782 0.472518 0.538259 1947.779806 ... 1934.108350 1934.108350 796.999722 744.054810 11211.596282 145.111788 646.624349 460.052595 626.319618 56.000000
196 Westgarth NaN Clifton Hill Hurstbridge Line Westgarth-Hurstbridge 0.292978 0.297036 0.323125 0.294679 995.441400 ... 925.326738 925.326738 596.150002 511.853785 5734.637477 41.008089 472.091032 237.504996 112.466101 62.256521
197 Westona NaN Northern Newport Corridor Seaholme-Werribee 0.349476 0.350423 0.309620 0.245212 1277.981415 ... 904.565930 904.565930 340.342664 242.350998 5105.523310 113.297640 338.617880 161.776385 264.103584 26.770440
198 Williamstown NaN Northern Newport Corridor North Williamstown-Williamstown 0.138101 0.136182 0.131095 0.132888 427.733161 ... 409.158103 409.158103 322.180337 255.810031 2623.780882 23.242671 163.086869 109.204693 91.822772 21.801098
199 Williamstown Beach NaN Northern Newport Corridor North Williamstown-Williamstown 0.244128 0.246207 0.231888 0.223411 896.700118 ... 767.226590 767.226590 395.207348 307.424717 4538.765015 52.581414 286.566304 201.609692 190.596130 35.873050
200 Willison NaN Burnley Camberwell Corridor Riversdale-Alamein 0.096312 0.096346 0.097112 0.087210 356.774033 ... 327.341435 327.341435 94.561587 54.368455 1785.637216 20.185304 220.568194 52.445619 28.980636 5.161682
201 Windsor NaN Caulfield Sandringham Line Prahran-Sandringham 0.965600 1.025300 1.066929 1.067218 3538.695263 ... 3910.891157 3910.891157 1825.528466 1245.843533 22625.827784 110.343773 1131.072837 678.606690 1657.848612 333.019244
202 Yarraman NaN Caulfield Dandenong Corridor Carnegie-Dandenong 0.336235 0.337992 0.325546 0.305602 1251.189383 ... 1094.094070 1094.094070 439.487909 300.954366 6210.912624 128.327438 380.112882 297.736253 259.604945 28.312552
203 Yarraville NaN Northern Newport Corridor Seddon-Newport 0.843157 0.846625 0.886215 0.876868 2850.785297 ... 2972.277302 2972.277302 1612.628586 1369.080996 17843.096094 178.287096 1291.790472 767.455020 613.035122 121.709591

204 rows × 22 columns
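One quick check on the loaded frame: in the rows displayed above, the five time-period columns appear to add up to the 'Normal Weekday' column. The sketch below tests that on three rows copied from the Out[25] display. The tolerance is an assumption, chosen because the displayed values are rounded to six decimals.

```python
import pandas as pd

# Three rows copied from the Out[25] display (values rounded as shown there).
sample = pd.DataFrame({
    'Station': ['Aircraft', 'Alamein', 'Albion'],
    'Normal Weekday': [1061.929966, 492.705744, 2529.307073],
    'Pre AM Peak': [237.897730, 48.946191, 319.162044],
    'AM Peak': [456.425690, 267.239895, 1310.765719],
    'Interpeak': [225.740549, 114.143413, 485.731707],
    'PM Peak': [130.721038, 44.118682, 389.188179],
    'PM Late': [11.144959, 18.257564, 24.459423],
})

period_cols = ['Pre AM Peak', 'AM Peak', 'Interpeak', 'PM Peak', 'PM Late']

# Absolute difference between the five-period total and 'Normal Weekday'.
gap = (sample[period_cols].sum(axis=1) - sample['Normal Weekday']).abs()

# Assumed tolerance of 1e-3 to allow for display rounding.
mismatched = sample.loc[gap > 1e-3, 'Station'].tolist()
print(mismatched)
```

Running the same comparison on the full `df` would list any station whose period columns do not add up to its weekday total.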

Comparison with bus and tram reports.

Station entries v Boardings and alightings

The train station entry data does not record how many people got on or off a train (there is no boarding or alighting information). Station entries can only measure the level of activity at a particular station over the course of a day.

Step 2: Subset out the weekday 7am to 7pm station entry data

The train station entry report covers the entire operating day. The bus and tram reports cover only the 7am to 7pm period.

Train station entries are broken into five time periods. They are not specified on the data.vic.gov.au website, but they are specified on the PTV research website: https://www.ptv.vic.gov.au/about-ptv/ptv-data-and-reports/research-and-statistics/

  • Pre AM Peak - first service to 6:59am
  • AM Peak - 7:00am to 9:29am
  • Interpeak - 9:30am to 2:59pm
  • PM Peak - 3:00pm to 6:59pm
  • PM Late - 7:00pm to last service
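The periods listed above can be written down as data, which makes it explicit which columns fall inside the 7am to 7pm window used by the bus and tram reports. This is a sketch only: `PERIODS` and `periods_within` are illustrative names, and `None` stands in for the "first service" and "last service" ends, which the source does not give as clock times.

```python
from datetime import time

# Period boundaries as listed above. None marks an open end
# ("first service" / "last service"), which has no stated clock time.
PERIODS = {
    'Pre AM Peak': (None, time(6, 59)),
    'AM Peak': (time(7, 0), time(9, 29)),
    'Interpeak': (time(9, 30), time(14, 59)),
    'PM Peak': (time(15, 0), time(18, 59)),
    'PM Late': (time(19, 0), None),
}


def periods_within(start, end):
    """Names of periods that lie entirely inside [start, end)."""
    return [
        name
        for name, (p_start, p_end) in PERIODS.items()
        if p_start is not None and p_end is not None
        and p_start >= start and p_end < end
    ]


daytime_cols = periods_within(time(7, 0), time(19, 0))
print(daytime_cols)  # ['AM Peak', 'Interpeak', 'PM Peak']
```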

To compare with bus and tram boardings, sum the 'AM Peak', 'Interpeak' and 'PM Peak' columns from the 2011-12 weekday dataset to create a 'wk7am7pm' value.


In [26]:
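# Keep the station name and the three periods between 7am and 7pm; copy so the new column is not written to a view of df
trains = df.loc[:, ['Station','AM Peak','Interpeak','PM Peak']].copy()
# Weekday 7am to 7pm station entries
trains['wk7am7pm'] = trains['AM Peak'] + trains['Interpeak'] + trains['PM Peak']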
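As a worked example of the 'wk7am7pm' arithmetic, the sketch below uses two rows copied from the Out[25] display and a row-wise `sum(axis=1)` in place of chained `+`. The frame name `demo` is illustrative only.

```python
import pandas as pd

# Two rows copied from the Out[25] display.
demo = pd.DataFrame({
    'Station': ['Aircraft', 'Box Hill'],
    'AM Peak': [456.425690, 2836.825887],
    'Interpeak': [225.740549, 2329.658722],
    'PM Peak': [130.721038, 3315.365319],
})

# Row-wise sum over the three daytime columns.
demo['wk7am7pm'] = demo[['AM Peak', 'Interpeak', 'PM Peak']].sum(axis=1)
print(demo[['Station', 'wk7am7pm']])
```

For Aircraft this gives 456.425690 + 225.740549 + 130.721038 = 812.887277. The two forms differ when a value is missing: `sum(axis=1)` skips NaN by default, while `+` propagates it, so the chained addition is the stricter choice if gaps should stay visible.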

Step 3: Create a .csv file with weekday 7am to 7pm station entries for each station

This script writes one row per station, with the 'AM Peak', 'Interpeak', 'PM Peak' and 'wk7am7pm' station entries.

Results are saved as

'./clean/TrainStationEntries.csv'


In [24]:
trains.to_csv('./clean/TrainStationEntries.csv')

trains


Out[24]:
Station AM Peak Interpeak PM Peak wk7am7pm
0 Aircraft 456.425690 225.740549 130.721038 812.887277
1 Alamein 267.239895 114.143413 44.118682 425.501990
2 Albion 1310.765719 485.731707 389.188179 2185.685605
3 Alphington 537.959378 241.489810 153.019594 932.468782
4 Altona 400.638040 231.197189 170.906116 802.741346
5 Anstey 453.776044 306.652866 216.409306 976.838216
6 Armadale 866.786739 381.513971 437.373267 1685.673976
7 Ascot Vale 987.481205 443.258259 280.724595 1711.464059
8 Ashburton 487.340828 236.751601 189.405391 913.497819
9 Aspendale 496.647928 221.980937 127.979531 846.608396
10 Auburn 934.446348 470.993373 547.819489 1953.259209
11 Balaclava 1430.279922 873.153018 696.786645 3000.219585
12 Batman 349.459620 251.456742 226.989400 827.905763
13 Bayswater 745.709352 373.007350 439.275548 1557.992250
14 Beaconsfield 346.181736 128.043337 120.307591 594.532664
15 Belgrave 547.349043 503.170925 208.329372 1258.849340
16 Bell 514.820324 458.908365 358.462697 1332.191386
17 Bentleigh 1189.354853 739.945272 652.710848 2582.010973
18 Berwick 1113.437915 657.417195 757.705830 2528.560940
19 Blackburn 1632.293465 1031.163692 1439.485115 4102.942272
20 Bonbeach 494.020680 278.844128 203.547200 976.412009
21 Boronia 1180.790457 582.552389 232.773466 1996.116313
22 Box Hill 2836.825887 2329.658722 3315.365319 8481.849927
23 Brighton Beach 692.023089 283.925004 234.231847 1210.179940
24 Broadmeadows 528.369305 797.819080 937.784941 2263.973326
25 Brunswick 278.477907 242.412666 251.464289 772.354861
26 Burnley 1042.181843 655.646547 819.610721 2517.439111
27 Burwood 573.984562 245.421586 147.814210 967.220358
28 Camberwell 1616.374441 1496.695877 2983.135401 6096.205719
29 Canterbury 512.759240 290.348537 367.015625 1170.123403
... ... ... ... ... ...
174 St Albans 1441.624471 2073.866798 1551.267383 5066.758652
175 Strathmore 603.140060 258.220581 208.207228 1069.567868
176 Sunshine 2198.619706 2427.945911 1717.151291 6343.716908
177 Surrey Hills 1443.280073 424.742808 271.856242 2139.879123
178 Syndal 1245.037990 441.789188 654.421807 2341.248986
179 Tecoma 103.858737 51.818768 30.389748 186.067254
180 Thomastown 1117.061367 601.356534 382.526078 2100.943979
181 Thornbury 676.782173 373.452575 174.319919 1224.554667
182 Toorak 640.003109 360.324474 459.894247 1460.221831
183 Tooronga 698.342134 305.493214 770.773744 1774.609092
184 Tottenham 527.962757 443.970478 318.180963 1290.114198
185 Upfield 303.450119 296.174540 293.119960 892.744620
186 Upper Ferntree Gully 379.641304 181.238533 285.439881 846.319717
187 Upwey 272.127373 190.715960 326.858393 789.701726
188 Victoria Park 322.804558 453.590678 693.733180 1470.128415
189 Watergardens (Sydenham) 2629.834401 1141.700017 1044.005266 4815.539685
190 Watsonia 1021.160452 369.120734 342.395663 1732.676849
191 Wattle Glen 114.857693 28.018355 14.056897 156.932945
192 Werribee 1386.440447 790.375452 833.605216 3010.421116
193 West Footscray 646.716898 337.187272 151.455349 1135.359519
194 West Richmond 309.462662 305.537884 246.830707 861.831254
195 Westall 646.624349 460.052595 626.319618 1732.996563
196 Westgarth 472.091032 237.504996 112.466101 822.062128
197 Westona 338.617880 161.776385 264.103584 764.497849
198 Williamstown 163.086869 109.204693 91.822772 364.114333
199 Williamstown Beach 286.566304 201.609692 190.596130 678.772127
200 Willison 220.568194 52.445619 28.980636 301.994449
201 Windsor 1131.072837 678.606690 1657.848612 3467.528139
202 Yarraman 380.112882 297.736253 259.604945 937.454080
203 Yarraville 1291.790472 767.455020 613.035122 2672.280615

204 rows × 5 columns
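Note that `to_csv` with default arguments also writes the row index as an unnamed first column. If that extra column is unwanted in the joined layer, passing `index=False` is one option. The sketch below compares the two header lines on a two-row frame built from the output above; `demo` is an illustrative name.

```python
import pandas as pd

# Two rows copied from the Out[24] display.
demo = pd.DataFrame({
    'Station': ['Aircraft', 'Alamein'],
    'wk7am7pm': [812.887277, 425.501990],
})

# With no path, to_csv returns the CSV text instead of writing a file.
csv_default = demo.to_csv()
csv_no_index = demo.to_csv(index=False)

header_default = csv_default.splitlines()[0]
header_no_index = csv_no_index.splitlines()[0]
print(header_default)   # ,Station,wk7am7pm
print(header_no_index)  # Station,wk7am7pm
```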

Step 4: Map train specific results

Use QGIS to join TrainStationEntries.csv to 'layer ptv_train_station' using the common column 'Station'

Use Display properties to colour-code train stations by wk7am7pm to find the busiest station.

Note: 'layer ptv_train_station' includes a 'Metlink_StopID' column, providing a common index for all public transport stops.
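A join on a text column such as 'Station' only matches when the names are spelled identically on both sides, so it may be worth checking for unmatched names before joining in QGIS. The sketch below does that check in pandas. The `layer` frame is a HYPOTHETICAL stand-in for the attribute table of 'layer ptv_train_station'; the real layer's station spellings are not shown in this notebook.

```python
import pandas as pd

# Three rows copied from the Out[24] display.
entries = pd.DataFrame({
    'Station': ['Aircraft', 'Alamein', 'Watergardens (Sydenham)'],
    'wk7am7pm': [812.887277, 425.501990, 4815.539685],
})

# HYPOTHETICAL layer attribute table; the spellings here are assumptions.
layer = pd.DataFrame({'Station': ['Aircraft', 'Alamein', 'Watergardens']})

# Left join from the layer, flagging rows with no match in the CSV.
joined = layer.merge(entries, on='Station', how='left', indicator=True)
unmatched = joined.loc[joined['_merge'] == 'left_only', 'Station'].tolist()
print(unmatched)

# Busiest matched station by weekday 7am to 7pm entries (NaN rows are skipped).
busiest = joined.loc[joined['wk7am7pm'].idxmax(), 'Station']
print(busiest)
```

In this made-up case the differently spelled name would be left without a value, which is the kind of gap that otherwise only shows up as an uncoloured station on the map.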

