In [2]:
using DataFrames
using JSON
using Iterators
using taxis
using HDF5, JLD
using Interact
using Gadfly
using Color
using Stats
using kNN
using sequenceCompare

#reload("taxis")

nprocs()


Warning: replacing module taxis
Out[2]:
8

In [3]:
taxi_df, taxi_validation_df = taxis.LoadData("/home/tony/ML/taxi/taxi2_time/train_100k.csv", 
                                        "/home/tony/ML/taxi/taxi2_time/test.csv")


Begin
loading csv files
loading coords
getting coords counts
deleting unneeded data rows/columns
done!
Out[3]:
(99632x13 DataFrame
| Row   | TRIP_ID             | CALL_TYPE | ORIGIN_CALL | ORIGIN_STAND |
|-------|---------------------|-----------|-------------|--------------|
| 1     | 1372636858620000589 | "C"       | NA          | NA           |
| 2     | 1372637303620000596 | "B"       | NA          | 7            |
| 3     | 1372636951620000320 | "C"       | NA          | NA           |
| 4     | 1372636854620000520 | "C"       | NA          | NA           |
| 5     | 1372637091620000337 | "C"       | NA          | NA           |
| 6     | 1372636965620000231 | "C"       | NA          | NA           |
| 7     | 1372637210620000456 | "C"       | NA          | NA           |
| 8     | 1372637299620000011 | "C"       | NA          | NA           |
| 9     | 1372637274620000403 | "C"       | NA          | NA           |
| 10    | 1372637905620000320 | "C"       | NA          | NA           |
| 11    | 1372636875620000233 | "C"       | NA          | NA           |
⋮
| 99621 | 1374431063620000419 | "C"       | NA          | NA           |
| 99622 | 1374432016620000658 | "B"       | NA          | 57           |
| 99623 | 1374434617620000146 | "B"       | NA          | 34           |
| 99624 | 1374433008620000250 | "B"       | NA          | 9            |
| 99625 | 1374432749620000023 | "A"       | 59871       | NA           |
| 99626 | 1374434381620000618 | "A"       | 2002        | NA           |
| 99627 | 1374431340620000384 | "B"       | NA          | 10           |
| 99628 | 1374431335620000271 | "A"       | 41673       | NA           |
| 99629 | 1374433338620000398 | "B"       | NA          | 36           |
| 99630 | 1374433379620000506 | "B"       | NA          | 53           |
| 99631 | 1374433756620000435 | "B"       | NA          | 9            |
| 99632 | 1374434789620000074 | "A"       | 60678       | NA           |

| Row   | TAXI_ID  | TIMESTAMP  | DAY_TYPE | MISSING_DATA |
|-------|----------|------------|----------|--------------|
| 1     | 20000589 | 1372636858 | "A"      | "False"      |
| 2     | 20000596 | 1372637303 | "A"      | "False"      |
| 3     | 20000320 | 1372636951 | "A"      | "False"      |
| 4     | 20000520 | 1372636854 | "A"      | "False"      |
| 5     | 20000337 | 1372637091 | "A"      | "False"      |
| 6     | 20000231 | 1372636965 | "A"      | "False"      |
| 7     | 20000456 | 1372637210 | "A"      | "False"      |
| 8     | 20000011 | 1372637299 | "A"      | "False"      |
| 9     | 20000403 | 1372637274 | "A"      | "False"      |
| 10    | 20000320 | 1372637905 | "A"      | "False"      |
| 11    | 20000233 | 1372636875 | "A"      | "False"      |
⋮
| 99621 | 20000419 | 1374431063 | "A"      | "False"      |
| 99622 | 20000658 | 1374432016 | "A"      | "False"      |
| 99623 | 20000146 | 1374434617 | "A"      | "False"      |
| 99624 | 20000250 | 1374433008 | "A"      | "False"      |
| 99625 | 20000023 | 1374432749 | "A"      | "False"      |
| 99626 | 20000618 | 1374434381 | "A"      | "False"      |
| 99627 | 20000384 | 1374431340 | "A"      | "False"      |
| 99628 | 20000271 | 1374431335 | "A"      | "False"      |
| 99629 | 20000398 | 1374433338 | "A"      | "False"      |
| 99630 | 20000506 | 1374433379 | "A"      | "False"      |
| 99631 | 20000435 | 1374433756 | "A"      | "False"      |
| 99632 | 20000074 | 1374434789 | "A"      | "False"      |

| Row   | COORDS                                                                                                                                                                      |
|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1     | 2x23 Array{Float64,2}:
 -8.61864  -8.6185  -8.62033  -8.62215  …  -8.63083  -8.63083  -8.63084
 41.1414   41.1414  41.1425   41.1438      41.1545   41.1545   41.1545     |
| 2     | 2x19 Array{Float64,2}:
 -8.63985  -8.64035  -8.6422  -8.64446  …  -8.66664  -8.66577  -8.66574
 41.1598   41.1599   41.1601  41.1605      41.17     41.1706   41.1707     |
| 3     | 2x65 Array{Float64,2}:
 -8.61296  -8.61338  -8.61421  -8.61477  …  -8.61944  -8.61736  -8.61597
 41.1404   41.1403   41.1403   41.1404      41.142    41.1409   41.1405   |
| 4     | 2x43 Array{Float64,2}:
 -8.57468  -8.5747  -8.5747  -8.57466  …  -8.60799  -8.60801  -8.608 
 41.152    41.1519  41.1519  41.152       41.1429   41.1429   41.1429        |
| 5     | 2x29 Array{Float64,2}:
 -8.64599  -8.64595  -8.64605  -8.6468  …  -8.68726  -8.68726  -8.68727
 41.1805   41.1805   41.18     41.1789     41.1778   41.1781   41.1781     |
| 6     | 2x26 Array{Float64,2}:
 -8.6155  -8.61485  -8.61335  -8.60998  …  -8.57863  -8.57852  -8.57822
 41.1407  41.1409   41.1415   41.1409      41.1577   41.1594   41.1607     |
| 7     | 2x36 Array{Float64,2}:
 -8.57952  -8.58094  -8.58271  -8.58409  …  -8.604   -8.60398  -8.60397
 41.1459   41.145    41.145    41.1462      41.1428  41.1428   41.1428     |
| 8     | 2x34 Array{Float64,2}:
 -8.61756  -8.61753  -8.61698  -8.61575  …  -8.62474  -8.62473  -8.6247
 41.1462   41.1458   41.1448   41.1454      41.1615   41.1615   41.1616    |
| 9     | 2x38 Array{Float64,2}:
 -8.61179  -8.61178  -8.612   -8.61262  …  -8.59145  -8.58992  -8.5894
 41.1406   41.1406   41.1406  41.1405      41.1616   41.1635   41.1633      |
| 10    | 2x19 Array{Float64,2}:
 -8.61591  -8.61445  -8.61352  -8.6099  …  -8.60548  -8.6046  -8.60459
 41.1406   41.1411   41.1414   41.1408     41.1344   41.1342  41.1342       |
| 11    | 2x22 Array{Float64,2}:
 -8.61989  -8.62016  -8.62065  -8.62092  …  -8.61021  -8.60951  -8.60949
 41.148    41.1477   41.1485   41.1503      41.1573   41.1573   41.1574   |
⋮
| 99621 | 2x16 Array{Float64,2}:
 -8.61082  -8.61085  -8.61078  -8.61046  …  -8.62457  -8.62454  -8.625 
 41.1456   41.1457   41.1457   41.1461      41.1484   41.1484   41.1485    |
| 99622 | 2x45 Array{Float64,2}:
 -8.6109  -8.61095  -8.61096  -8.61097  …  -8.64152  -8.64117  -8.64068
 41.1456  41.1457   41.1457   41.1457      41.1602   41.1609   41.161      |
| 99623 | 2x37 Array{Float64,2}:
 -8.61554  -8.61552  -8.61575  -8.61669  …  -8.64974  -8.65036  -8.65035
 41.1407   41.1407   41.1407   41.1406      41.1512   41.151    41.151    |
| 99624 | 2x52 Array{Float64,2}:
 -8.60652  -8.60697  -8.60749  -8.60795  …  -8.55941  -8.5594  -8.5594
 41.1446   41.1447   41.144    41.1434      41.1448   41.1448  41.1448      |
| 99625 | 2x114 Array{Float64,2}:
 -8.68889  -8.68902  -8.68917  -8.68933  …  -8.64545  -8.64544  -8.64541
 41.1744   41.1758   41.1772   41.1786      41.1644   41.1644   41.1644  |
| 99626 | 2x24 Array{Float64,2}:
 -8.57473  -8.57472  -8.57489  -8.57586  …  -8.60882  -8.60948  -8.60967
 41.1427   41.1427   41.1428   41.1433      41.141    41.1409   41.1409   |
| 99627 | 2x23 Array{Float64,2}:
 -8.60711  -8.60711  -8.60716  -8.60812  …  -8.61983  -8.62088  -8.6214
 41.1503   41.1503   41.1502   41.1502      41.1585   41.1599   41.1605    |
| 99628 | 2x81 Array{Float64,2}:
 -8.61325  -8.61309  -8.61429  -8.61467  …  -8.67027  -8.67027  -8.67005
 41.1479   41.1483   41.1484   41.1476      41.2375   41.2375   41.237    |
| 99629 | 2x133 Array{Float64,2}:
 -8.6493  -8.65004  -8.65017  -8.64935  …  -8.63551  -8.63552  -8.63551
 41.1543  41.1542   41.154    41.154       40.976    40.976    40.9759    |
| 99630 | 2x56 Array{Float64,2}:
 -8.6139  -8.61389  -8.61436  -8.61546  …  -8.67428  -8.67452  -8.67453
 41.1411  41.1411   41.141    41.1406      41.1543   41.1542   41.1542     |
| 99631 | 2x83 Array{Float64,2}:
 -8.60647  -8.60652  -8.60666  -8.60704  …  -8.67821  -8.67781  -8.6778
 41.1445   41.1445   41.1446   41.1448      41.1575   41.1577   41.1577    |
| 99632 | 2x32 Array{Float64,2}:
 -8.60495  -8.60537  -8.60565  -8.60567  …  -8.6304  -8.63042  -8.63041
 41.1497   41.1498   41.1498   41.1497      41.1579  41.158    41.1579     |

| Row   | NUM_COORDS | START              | END                |
|-------|------------|--------------------|--------------------|
| 1     | 23         | [-8.61864,41.1414] | [-8.63084,41.1545] |
| 2     | 19         | [-8.63985,41.1598] | [-8.66574,41.1707] |
| 3     | 65         | [-8.61296,41.1404] | [-8.61597,41.1405] |
| 4     | 43         | [-8.57468,41.152]  | [-8.608,41.1429]   |
| 5     | 29         | [-8.64599,41.1805] | [-8.68727,41.1781] |
| 6     | 26         | [-8.6155,41.1407]  | [-8.57822,41.1607] |
| 7     | 36         | [-8.57952,41.1459] | [-8.60397,41.1428] |
| 8     | 34         | [-8.61756,41.1462] | [-8.6247,41.1616]  |
| 9     | 38         | [-8.61179,41.1406] | [-8.5894,41.1633]  |
| 10    | 19         | [-8.61591,41.1406] | [-8.60459,41.1342] |
| 11    | 22         | [-8.61989,41.148]  | [-8.60949,41.1574] |
⋮
| 99621 | 16         | [-8.61082,41.1456] | [-8.625,41.1485]   |
| 99622 | 45         | [-8.6109,41.1456]  | [-8.64068,41.161]  |
| 99623 | 37         | [-8.61554,41.1407] | [-8.65035,41.151]  |
| 99624 | 52         | [-8.60652,41.1446] | [-8.5594,41.1448]  |
| 99625 | 114        | [-8.68889,41.1744] | [-8.64541,41.1644] |
| 99626 | 24         | [-8.57473,41.1427] | [-8.60967,41.1409] |
| 99627 | 23         | [-8.60711,41.1503] | [-8.6214,41.1605]  |
| 99628 | 81         | [-8.61325,41.1479] | [-8.67005,41.237]  |
| 99629 | 133        | [-8.6493,41.1543]  | [-8.63551,40.9759] |
| 99630 | 56         | [-8.6139,41.1411]  | [-8.67453,41.1542] |
| 99631 | 83         | [-8.60647,41.1445] | [-8.6778,41.1577]  |
| 99632 | 32         | [-8.60495,41.1497] | [-8.63041,41.1579] |

| Row   | COORDS_TEST                                                                                                                                                                |
|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1     | 2x11 Array{Float64,2}:
 -8.61864  -8.6185  -8.62033  -8.62215  …  -8.63275  -8.63174  -8.62994
 41.1414   41.1414  41.1425   41.1438      41.1469   41.1482   41.1504    |
| 2     | 2x15 Array{Float64,2}:
 -8.63985  -8.64035  -8.6422  -8.64446  …  -8.67085  -8.67094  -8.66961
 41.1598   41.1599   41.1601  41.1605      41.1651   41.1666   41.168     |
| 3     | 2x44 Array{Float64,2}:
 -8.61296  -8.61338  -8.61421  -8.61477  …  -8.64937  -8.6492  -8.64971
 41.1404   41.1403   41.1403   41.1404      41.1526   41.1524  41.1512    |
| 4     | 2x14 Array{Float64,2}:
 -8.57468  -8.5747  -8.5747  -8.57466  …  -8.57936  -8.58074  -8.5829
 41.152    41.1519  41.1519  41.152       41.1462   41.145    41.1451       |
| 5     | 2x26 Array{Float64,2}:
 -8.64599  -8.64595  -8.64605  -8.6468  …  -8.68909  -8.68906  -8.6875
 41.1805   41.1805   41.18     41.1789     41.1764   41.1766   41.1768     |
| 6     | 2x14 Array{Float64,2}:
 -8.6155  -8.61485  -8.61335  -8.60998  …  -8.58383  -8.58109  -8.57913
 41.1407  41.1409   41.1415   41.1409      41.1436   41.1437   41.1442    |
| 7     | 2x23 Array{Float64,2}:
 -8.57952  -8.58094  -8.58271  -8.58409  …  -8.60179  -8.60177  -8.60223
 41.1459   41.145    41.145    41.1462      41.1473   41.1473   41.147   |
| 8     | 2x12 Array{Float64,2}:
 -8.61756  -8.61753  -8.61698  -8.61575  …  -8.61745  -8.61772  -8.61801
 41.1462   41.1458   41.1448   41.1454      41.1472   41.1472   41.1475  |
| 9     | 2x26 Array{Float64,2}:
 -8.61179  -8.61178  -8.612   -8.61262  …  -8.60249  -8.60104  -8.60044
 41.1406   41.1406   41.1406  41.1405      41.1456   41.1458   41.1464    |
| 10    | 2x10 Array{Float64,2}:
 -8.61591  -8.61445  -8.61352  -8.6099  …  -8.61145  -8.61062  -8.60932
 41.1406   41.1411   41.1414   41.1408     41.136    41.1346   41.1344    |
| 11    | 2x10 Array{Float64,2}:
 -8.61989  -8.62016  -8.62065  -8.62092  …  -8.62094  -8.62097  -8.62103
 41.148    41.1477   41.1485   41.1503      41.1555   41.1555   41.1555  |
⋮
| 99621 | 2x5 Array{Float64,2}:
 -8.61082  -8.61085  -8.61078  -8.61046  -8.60965
 41.1456   41.1457   41.1457   41.1461   41.1467                                                   |
| 99622 | 2x21 Array{Float64,2}:
 -8.6109  -8.61095  -8.61096  -8.61097  …  -8.62589  -8.62605  -8.626 
 41.1456  41.1457   41.1457   41.1457      41.1502   41.1518   41.1519     |
| 99623 | 2x27 Array{Float64,2}:
 -8.61554  -8.61552  -8.61575  -8.61669  …  -8.64128  -8.64302  -8.64472
 41.1407   41.1407   41.1407   41.1406      41.1483   41.148    41.1483  |
| 99624 | 2x20 Array{Float64,2}:
 -8.60652  -8.60697  -8.60749  -8.60795  …  -8.59077  -8.58905  -8.58875
 41.1446   41.1447   41.144    41.1434      41.1468   41.1471   41.1472  |
| 99625 | 2x97 Array{Float64,2}:
 -8.68889  -8.68902  -8.68917  -8.68933  …  -8.64214  -8.6401  -8.64129
 41.1744   41.1758   41.1772   41.1786      41.167    41.1653  41.1649    |
| 99626 | 2x21 Array{Float64,2}:
 -8.57473  -8.57472  -8.57489  -8.57586  …  -8.60284  -8.60493  -8.60681
 41.1427   41.1427   41.1428   41.1433      41.1419   41.1418   41.1415  |
| 99627 | 2x11 Array{Float64,2}:
 -8.60711  -8.60711  -8.60716  -8.60812  …  -8.60927  -8.61053  -8.61199
 41.1503   41.1503   41.1502   41.1502      41.1534   41.1536   41.1539  |
| 99628 | 2x60 Array{Float64,2}:
 -8.61325  -8.61309  -8.61429  -8.61467  …  -8.66325  -8.66616  -8.66798
 41.1479   41.1483   41.1484   41.1476      41.2244   41.2273   41.2302  |
| 99629 | 2x90 Array{Float64,2}:
 -8.6493  -8.65004  -8.65017  -8.64935  …  -8.63551  -8.6355  -8.6355
 41.1543  41.1542   41.154    41.154       40.9759   40.976   40.976        |
| 99630 | 2x51 Array{Float64,2}:
 -8.6139  -8.61389  -8.61436  -8.61546  …  -8.67105  -8.67225  -8.67305
 41.1411  41.1411   41.141    41.1406      41.1536   41.1549   41.1555    |
| 99631 | 2x35 Array{Float64,2}:
 -8.60647  -8.60652  -8.60666  -8.60704  …  -8.63276  -8.63274  -8.63269
 41.1445   41.1445   41.1446   41.1448      41.147    41.1471   41.1472  |
| 99632 | 2x15 Array{Float64,2}:
 -8.60495  -8.60537  -8.60565  -8.60567  …  -8.60877  -8.61122  -8.61201
 41.1497   41.1498   41.1498   41.1497      41.1536   41.1536   41.1543  |,320x12 DataFrame
| Row | TRIP_ID | CALL_TYPE | ORIGIN_CALL | ORIGIN_STAND | TAXI_ID  |
|-----|---------|-----------|-------------|--------------|----------|
| 1   | "T1"    | "B"       | NA          | 15           | 20000542 |
| 2   | "T2"    | "B"       | NA          | 57           | 20000108 |
| 3   | "T3"    | "B"       | NA          | 15           | 20000370 |
| 4   | "T4"    | "B"       | NA          | 53           | 20000492 |
| 5   | "T5"    | "B"       | NA          | 18           | 20000621 |
| 6   | "T6"    | "A"       | 42612       | NA           | 20000607 |
| 7   | "T7"    | "B"       | NA          | 15           | 20000310 |
| 8   | "T8"    | "A"       | 31780       | NA           | 20000619 |
| 9   | "T9"    | "B"       | NA          | 9            | 20000503 |
| 10  | "T10"   | "B"       | NA          | 15           | 20000327 |
| 11  | "T11"   | "B"       | NA          | 56           | 20000664 |
⋮
| 309 | "T316"  | "C"       | NA          | NA           | 20000496 |
| 310 | "T317"  | "A"       | 48578       | NA           | 20000436 |
| 311 | "T318"  | "B"       | NA          | 22           | 20000325 |
| 312 | "T319"  | "A"       | 80148       | NA           | 20000281 |
| 313 | "T320"  | "A"       | 66812       | NA           | 20000549 |
| 314 | "T321"  | "C"       | NA          | NA           | 20000393 |
| 315 | "T322"  | "C"       | NA          | NA           | 20000391 |
| 316 | "T323"  | "A"       | 70885       | NA           | 20000430 |
| 317 | "T324"  | "B"       | NA          | 53           | 20000020 |
| 318 | "T325"  | "C"       | NA          | NA           | 20000207 |
| 319 | "T326"  | "A"       | 76232       | NA           | 20000667 |
| 320 | "T327"  | "A"       | 31208       | NA           | 20000255 |

| Row | TIMESTAMP  | DAY_TYPE | MISSING_DATA |
|-----|------------|----------|--------------|
| 1   | 1408039037 | "A"      | "False"      |
| 2   | 1408038611 | "A"      | "False"      |
| 3   | 1408038568 | "A"      | "False"      |
| 4   | 1408039090 | "A"      | "False"      |
| 5   | 1408039177 | "A"      | "False"      |
| 6   | 1408037146 | "A"      | "False"      |
| 7   | 1408038846 | "A"      | "False"      |
| 8   | 1408038948 | "A"      | "False"      |
| 9   | 1408038563 | "A"      | "False"      |
| 10  | 1408038021 | "A"      | "False"      |
| 11  | 1408038267 | "A"      | "False"      |
⋮
| 309 | 1419171893 | "A"      | "False"      |
| 310 | 1419171826 | "A"      | "False"      |
| 311 | 1419171921 | "A"      | "False"      |
| 312 | 1419171095 | "A"      | "False"      |
| 313 | 1419164220 | "A"      | "False"      |
| 314 | 1419168199 | "A"      | "False"      |
| 315 | 1419171201 | "A"      | "False"      |
| 316 | 1419171485 | "A"      | "False"      |
| 317 | 1419170802 | "A"      | "False"      |
| 318 | 1419172121 | "A"      | "False"      |
| 319 | 1419171980 | "A"      | "False"      |
| 320 | 1419171420 | "A"      | "False"      |

| Row | COORDS                                                                                                                                                                      |
|-----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1   | 2x11 Array{Float64,2}:
 -8.58568  -8.58571  -8.58568  -8.58573  …  -8.587   -8.58658  -8.58488
 41.1485   41.1486   41.1489   41.1489      41.1475  41.1472   41.1466     |
| 2   | 2x40 Array{Float64,2}:
 -8.61088  -8.61086  -8.6109  -8.61044  …  -8.60293  -8.60255  -8.60189
 41.1456   41.1456   41.1458  41.1462      41.1628   41.1631   41.1636     |
| 3   | 2x40 Array{Float64,2}:
 -8.58574  -8.58573  -8.58572  -8.58629  …  -8.57695  -8.5759  -8.5749
 41.1486   41.1488   41.149    41.149       41.1664   41.1672  41.1677      |
| 4   | 2x8 Array{Float64,2}:
 -8.61396  -8.61412  -8.61509  -8.61528  …  -8.61524  -8.61505  -8.61464
 41.1412   41.1411   41.1409   41.1408      41.1408   41.1408   41.141     |
| 5   | 2x2 Array{Float64,2}:
 -8.6199  -8.61989
 41.148   41.148                                                                                                                   |
| 6   | 2x137 Array{Float64,2}:
 -8.63061  -8.63061  -8.63074  -8.63151  …  -8.62639  -8.6264  -8.62641
 41.1782   41.1782   41.1782   41.1781      41.172    41.172   41.172     |
| 7   | 2x24 Array{Float64,2}:
 -8.58562  -8.58564  -8.58592  -8.58637  …  -8.58156  -8.58181  -8.58205
 41.1489   41.1489   41.1489   41.1489      41.1533   41.1535   41.1538   |
| 8   | 2x17 Array{Float64,2}:
 -8.58292  -8.582   -8.58108  -8.58011  …  -8.57703  -8.57753  -8.57877
 41.1811   41.1818  41.183    41.184       41.1861   41.1861   41.1852     |
| 9   | 2x43 Array{Float64,2}:
 -8.60653  -8.60667  -8.6068  -8.60679  …  -8.60548  -8.60549  -8.60549
 41.1447   41.1447   41.1447  41.1447      41.1257   41.1258   41.1258     |
| 10  | 2x79 Array{Float64,2}:
 -8.58566  -8.5857  -8.58573  -8.58574  …  -8.59117  -8.58826  -8.58631
 41.1486   41.1486  41.1486   41.1486      41.1942   41.1974   41.1993     |
| 11  | 2x63 Array{Float64,2}:
 -8.59123  -8.59123  -8.59122  -8.591   …  -8.58767  -8.5881  -8.58823
 41.1627   41.1627   41.1627   41.1626     41.1687   41.1689  41.1689       |
⋮
| 309 | 2x21 Array{Float64,2}:
 -8.61072  -8.61049  -8.6094  -8.6085  …  -8.59074  -8.58956  -8.58816
 41.1445   41.1437   41.1432  41.1431     41.1469   41.1471   41.1473       |
| 310 | 2x25 Array{Float64,2}:
 -8.6406  -8.64005  -8.64022  -8.63974  …  -8.63605  -8.63605  -8.63605
 41.1549  41.1547   41.1536   41.1533      41.1405   41.1405   41.1406     |
| 311 | 2x19 Array{Float64,2}:
 -8.68929  -8.6893  -8.68873  -8.68765  …  -8.678   -8.67778  -8.67773
 41.1682   41.1682  41.1674   41.1663      41.1521  41.1517   41.1515       |
| 312 | 2x72 Array{Float64,2}:
 -8.60636  -8.60636  -8.60711  -8.6073  …  -8.68831  -8.6866  -8.68486
 41.1445   41.1446   41.1451   41.1457     41.1728   41.1734  41.1734       |
| 313 | 2x45 Array{Float64,2}:
 -8.61253  -8.61253  -8.61287  -8.61289  …  -8.58566  -8.58584  -8.58584
 41.1595   41.1595   41.1595   41.1595      41.1489   41.149    41.149    |
| 314 | 2x267 Array{Float64,2}:
 -8.66747  -8.66735  -8.66717  -8.66798  …  -8.53496  -8.53497  -8.53498
 41.2381   41.2383   41.2384   41.2387      41.1433   41.1433   41.1433  |
| 315 | 2x47 Array{Float64,2}:
 -8.60647  -8.60648  -8.60649  -8.60667  …  -8.5917  -8.59579  -8.60045
 41.1447   41.1447   41.1447   41.1448      41.1973  41.1973   41.1988     |
| 316 | 2x48 Array{Float64,2}:
 -8.5702  -8.57019  -8.56947  -8.56733  …  -8.59311  -8.59333  -8.59331
 41.1595  41.159    41.1591   41.1606      41.1511   41.151    41.1511     |
| 317 | 2x94 Array{Float64,2}:
 -8.61387  -8.61388  -8.61472  -8.61584  …  -8.62978  -8.62977  -8.62979
 41.1412   41.1412   41.1411   41.1407      41.1526   41.1526   41.1527   |
| 318 | 2x6 Array{Float64,2}:
 -8.6481  -8.64746  -8.64688  -8.64593  -8.64534  -8.6433
 41.1525  41.1524   41.1531   41.1538   41.1544   41.1543                                   |
| 319 | 2x15 Array{Float64,2}:
 -8.5717  -8.57058  -8.569   -8.57006  …  -8.5658  -8.56669  -8.56921
 41.1561  41.1559   41.1555  41.1561      41.1647  41.1667   41.1676         |
| 320 | 2x52 Array{Float64,2}:
 -8.57456  -8.57225  -8.57049  -8.56883  …  -8.59046  -8.59078  -8.59234
 41.1802   41.1799   41.1795   41.1806      41.1978   41.1952   41.1922   |

| Row | NUM_COORDS | START              | END                |
|-----|------------|--------------------|--------------------|
| 1   | 11         | [-8.58568,41.1485] | [-8.58488,41.1466] |
| 2   | 40         | [-8.61088,41.1456] | [-8.60189,41.1636] |
| 3   | 40         | [-8.58574,41.1486] | [-8.5749,41.1677]  |
| 4   | 8          | [-8.61396,41.1412] | [-8.61464,41.141]  |
| 5   | 2          | [-8.6199,41.148]   | [-8.61989,41.148]  |
| 6   | 137        | [-8.63061,41.1782] | [-8.62641,41.172]  |
| 7   | 24         | [-8.58562,41.1489] | [-8.58205,41.1538] |
| 8   | 17         | [-8.58292,41.1811] | [-8.57877,41.1852] |
| 9   | 43         | [-8.60653,41.1447] | [-8.60549,41.1258] |
| 10  | 79         | [-8.58566,41.1486] | [-8.58631,41.1993] |
| 11  | 63         | [-8.59123,41.1627] | [-8.58823,41.1689] |
⋮
| 309 | 21         | [-8.61072,41.1445] | [-8.58816,41.1473] |
| 310 | 25         | [-8.6406,41.1549]  | [-8.63605,41.1406] |
| 311 | 19         | [-8.68929,41.1682] | [-8.67773,41.1515] |
| 312 | 72         | [-8.60636,41.1445] | [-8.68486,41.1734] |
| 313 | 45         | [-8.61253,41.1595] | [-8.58584,41.149]  |
| 314 | 267        | [-8.66747,41.2381] | [-8.53498,41.1433] |
| 315 | 47         | [-8.60647,41.1447] | [-8.60045,41.1988] |
| 316 | 48         | [-8.5702,41.1595]  | [-8.59331,41.1511] |
| 317 | 94         | [-8.61387,41.1412] | [-8.62979,41.1527] |
| 318 | 6          | [-8.6481,41.1525]  | [-8.6433,41.1543]  |
| 319 | 15         | [-8.5717,41.1561]  | [-8.56921,41.1676] |
| 320 | 52         | [-8.57456,41.1802] | [-8.59234,41.1922] |)

In [4]:
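# Column names as symbols; names(df) avoids reaching into the internal colindex field.
taxi_cols = names(taxi_df)
# Drop the full-trajectory COORDS column so the displayed frame stays compact.
taxi_cols = taxi_cols[taxi_cols .!= :COORDS]

taxi_df_simp = taxi_df[taxi_cols]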


Out[4]:
| Row | TRIP_ID             | CALL_TYPE | ORIGIN_CALL | ORIGIN_STAND | TAXI_ID  | TIMESTAMP  | DAY_TYPE | MISSING_DATA |
|-----|---------------------|-----------|-------------|--------------|----------|------------|----------|--------------|
| 1   | 1372636858620000589 | C         | NA          | NA           | 20000589 | 1372636858 | A        | False        |
| 2   | 1372637303620000596 | B         | NA          | 7            | 20000596 | 1372637303 | A        | False        |
| 3   | 1372636951620000320 | C         | NA          | NA           | 20000320 | 1372636951 | A        | False        |
| 4   | 1372636854620000520 | C         | NA          | NA           | 20000520 | 1372636854 | A        | False        |
| 5   | 1372637091620000337 | C         | NA          | NA           | 20000337 | 1372637091 | A        | False        |
| 6   | 1372636965620000231 | C         | NA          | NA           | 20000231 | 1372636965 | A        | False        |
| 7   | 1372637210620000456 | C         | NA          | NA           | 20000456 | 1372637210 | A        | False        |
| 8   | 1372637299620000011 | C         | NA          | NA           | 20000011 | 1372637299 | A        | False        |
| 9   | 1372637274620000403 | C         | NA          | NA           | 20000403 | 1372637274 | A        | False        |
| 10  | 1372637905620000320 | C         | NA          | NA           | 20000320 | 1372637905 | A        | False        |
| 11  | 1372636875620000233 | C         | NA          | NA           | 20000233 | 1372636875 | A        | False        |
| 12  | 1372637984620000520 | C         | NA          | NA           | 20000520 | 1372637984 | A        | False        |
| 13  | 1372637343620000571 | A         | 31508       | NA           | 20000571 | 1372637343 | A        | False        |
| 14  | 1372638595620000233 | C         | NA          | NA           | 20000233 | 1372638595 | A        | False        |
| 15  | 1372638151620000231 | C         | NA          | NA           | 20000231 | 1372638151 | A        | False        |
| 16  | 1372637610620000497 | B         | NA          | 13           | 20000497 | 1372637610 | A        | False        |
| 17  | 1372638481620000403 | B         | NA          | 28           | 20000403 | 1372638481 | A        | False        |
| 18  | 1372639135620000570 | A         | 33180       | NA           | 20000570 | 1372639135 | A        | False        |
| 19  | 1372637482620000005 | C         | NA          | NA           | 20000005 | 1372637482 | A        | False        |
| 20  | 1372639181620000089 | C         | NA          | NA           | 20000089 | 1372639181 | A        | False        |
| 21  | 1372638161620000423 | C         | NA          | NA           | 20000423 | 1372638161 | A        | False        |

| Row | NUM_COORDS | START                 | END                   |
|-----|------------|-----------------------|-----------------------|
| 1   | 23         | [-8.618643,41.141412] | [-8.630838,41.154489] |
| 2   | 19         | [-8.639847,41.159826] | [-8.66574,41.170671]  |
| 3   | 65         | [-8.612964,41.140359] | [-8.61597,41.14053]   |
| 4   | 43         | [-8.574678,41.151951] | [-8.607996,41.142915] |
| 5   | 29         | [-8.645994,41.18049]  | [-8.687268,41.178087] |
| 6   | 26         | [-8.615502,41.140674] | [-8.578224,41.160717] |
| 7   | 36         | [-8.57952,41.145948]  | [-8.603973,41.142816] |
| 8   | 34         | [-8.617563,41.146182] | [-8.6247,41.161554]   |
| 9   | 38         | [-8.611794,41.140557] | [-8.589402,41.163309] |
| 10  | 19         | [-8.615907,41.140557] | [-8.604594,41.134158] |
| 11  | 22         | [-8.619894,41.148009] | [-8.60949,41.157351]  |
| 12  | 44         | [-8.56242,41.168403]  | [-8.623017,41.164218] |
| 13  | 32         | [-8.618868,41.155101] | [-8.575065,41.162265] |
| 14  | 34         | [-8.608716,41.153499] | [-8.632737,41.168295] |
| 15  | 28         | [-8.612208,41.14053]  | [-8.606772,41.15826]  |
| 16  | 64         | [-8.585145,41.164857] | [-8.628147,41.157522] |
| 17  | 65         | [-8.584263,41.163156] | [-8.641566,41.142672] |
| 18  | 19         | [-8.666757,41.174055] | [-8.662554,41.181714] |
| 19  | 17         | [-8.599239,41.149188] | [-8.584326,41.169258] |
| 20  | 9          | [-8.643807,41.168979] | [-8.647911,41.1786]   |
| 21  | 15         | [-8.609706,41.151303] | [-8.621064,41.16114]  |

| Row | COORDS_TEST |
|-----|-------------|
| 1   | [-8.618643 -8.618499 -8.620326 -8.622153 -8.623953 -8.62668 -8.627373 -8.630226 -8.632746 -8.631738 -8.629938 41.141412 41.141376 41.14251 41.143815 41.144373 41.144778 41.144697 41.14521 41.14692 41.148225 41.150385] |
| 2   | [-8.639847 -8.640351 -8.642196 -8.644455 -8.646921 -8.649999 -8.653167 -8.656434 -8.660178 -8.663112 -8.666235 -8.669169 -8.670852 -8.670942 -8.66961 41.159826 41.159871 41.160114 41.160492 41.160951 41.161491 41.162031 41.16258 41.163192 41.163687 41.1642 41.164704 41.165136 41.166576 41.167962] |
| 3   | [-8.612964 -8.613378 -8.614215 -8.614773 -8.615907 -8.616609 -8.618472 -8.620623 -8.622558 -8.62506 -8.627436 -8.630082 -8.6319 -8.632584 -8.631252 -8.629713 -8.628804 -8.628579 -8.62875 -8.630424 -8.632683 -8.635131 -8.637705 -8.64036 -8.642205 -8.644068 -8.646453 -8.648613 -8.649504 -8.649837 -8.649837 -8.649882 -8.649936 -8.6499 -8.599383 -8.59653 -8.65008 -8.650395 -8.650377 -8.650359 -8.649891 -8.649369 -8.649198 -8.649711 41.140359 41.14035 41.140278 41.140368 41.140449 41.140602 41.141412 41.142789 41.144094 41.144805 41.144733 41.145174 41.146461 41.147316 41.148774 41.150628 41.152077 41.152464 41.152662 41.15277 41.152779 41.152563 41.153013 41.15358 41.154021 41.154507 41.154336 41.1543 41.154336 41.154354 41.1543 41.154282 41.1543 41.154264 41.141736 41.140566 41.154291 41.153814 41.153832 41.153787 41.153166 41.152572 41.152374 41.151213] |
| 4   | [-8.574678 -8.574705 -8.574696 -8.57466 -8.574723 -8.574714 -8.574714 -8.575164 -8.577135 -8.57853 -8.579745 -8.579358 -8.580744 -8.582904 41.151951 41.151942 41.151933 41.15196 41.151933 41.151924 41.151924 41.150934 41.150232 41.148639 41.147316 41.146173 41.14503 41.14512] |
| 5   | [-8.645994 -8.645949 -8.646048 -8.646804 -8.649495 -8.65215 -8.654049 -8.655012 -8.656353 -8.659647 -8.662518 -8.664561 -8.667432 -8.668944 -8.671374 -8.673894 -8.676918 -8.680032 -8.682615 -8.685441 -8.688105 -8.688879 -8.689059 -8.689086 -8.689059 -8.687502 41.18049 41.180517 41.180049 41.178888 41.178465 41.177961 41.177196 41.177925 41.177853 41.177277 41.177619 41.179221 41.178537 41.176674 41.17518 41.173308 41.171841 41.171949 41.173191 41.173776 41.17365 41.174379 41.17608 41.176431 41.176593 41.176755] |
| 6   | [-8.615502 -8.614854 -8.613351 -8.609976 -8.607537 -8.603676 -8.599833 -8.596458 -8.592993 -8.589384 -8.587026 -8.583831 -8.581086 -8.579133 41.140674 41.140926 41.14152 41.140854 41.141295 41.141808 41.141916 41.140494 41.140008 41.140674 41.142753 41.143644 41.143698 41.144238] |
| 7   | [-8.57952 -8.580942 -8.582706 -8.584092 -8.58546 -8.587116 -8.586171 -8.58609 -8.588016 -8.590401 -8.593119 -8.593506 -8.593668 -8.595342 -8.59608 -8.597466 -8.598051 -8.598663 -8.600688 -8.601723 -8.601795 -8.601768 -8.602227 41.145948 41.145039 41.145021 41.146164 41.14683 41.147397 41.148018 41.148963 41.149368 41.150016 41.150736 41.150853 41.15097 41.150232 41.149827 41.148909 41.148567 41.148342 41.148585 41.147325 41.147262 41.147289 41.147001] |
| 8   | [-8.617563 -8.617527 -8.616978 -8.615754 -8.615745 -8.615466 -8.615142 -8.615142 -8.61579 -8.617455 -8.617716 -8.618013 41.146182 41.145849 41.144832 41.145426 41.145408 41.145714 41.147046 41.147118 41.147298 41.147235 41.147217 41.147505] |
| 9   | [-8.611794 -8.611785 -8.612001 -8.612622 -8.613702 -8.614665 -8.615844 -8.61561 -8.614566 -8.614395 -8.613936 -8.612793 -8.611488 -8.610543 -8.610282 -8.610255 -8.608824 -8.608419 -8.606565 -8.605179 -8.604549 -8.604297 -8.603505 -8.602488 -8.601039 -8.600436 41.140557 41.140575 41.140566 41.140503 41.140341 41.140386 41.140485 41.140683 41.141088 41.141979 41.142942 41.143851 41.144787 41.144391 41.143536 41.143401 41.143239 41.143149 41.142348 41.143446 41.144796 41.1453 41.145561 41.145633 41.145759 41.146443] |
| 10  | [-8.615907 -8.614449 -8.613522 -8.609904 -8.609301 -8.609544 -8.610777 -8.611452 -8.610624 -8.609319 41.140557 41.141088 41.14143 41.140827 41.139522 41.138865 41.137551 41.136012 41.134563 41.134446] |
| 11  | [-8.619894 -8.620164 -8.62065 -8.62092 -8.621208 -8.621118 -8.620884 -8.620938 -8.620974 -8.621028 41.148009 41.14773 41.148513 41.150313 41.151951 41.153517 41.155416 41.155479 41.155461 41.155461] |
| 12  | [-8.56242 -8.562429 -8.562348 -8.564571 -8.566596 -8.568 -8.570295 -8.570223 -8.570898 -8.572626 -8.57403 -8.576046 -8.577711 -8.580087 -8.58222 -8.584056 -8.586387 -8.588772 -8.589546 -8.590158 -8.591607 -8.593272 -8.595792 -8.596701 -8.597898 -8.598528 41.168403 41.168358 41.167953 41.167125 41.166621 41.167476 41.167755 41.169321 41.168817 41.16924 41.168457 41.167197 41.165442 41.164065 41.164452 41.163201 41.163381 41.1642 41.164164 41.163444 41.16186 41.160564 41.160744 41.160942 41.161158 41.161221] |
| 13  | [-8.618868 -8.6175 -8.615079 -8.613468 -8.613261 -8.613297 -8.612037 -8.611929 -8.610876 -8.610183 -8.610138 -8.609508 41.155101 41.154912 41.154525 41.154228 41.154102 41.153832 41.153904 41.155803 41.157171 41.157252 41.15727 41.157369] |
| 14  | [-8.608716 -8.607627 -8.606502 -8.606493 -8.605269 -8.604756 -8.604648 -8.604477 -8.604279 -8.604351 -8.604387 -8.603892 -8.60562 -8.607987 -8.6103 -8.610957 -8.610957 -8.611965 -8.614242 -8.616573 -8.61894 -8.621082 -8.623305 -8.624889 -8.625042 -8.626068 -8.626554 -8.628309 -8.630352 -8.631639 -8.631981 -8.632206 -8.63271 41.153499 41.153481 41.153472 41.153472 41.153427 41.154156 41.155623 41.157351 41.159016 41.16015 41.160276 41.161545 41.162085 41.162391 41.162661 41.162751 41.162805 41.162841 41.163129 41.16339 41.163705 41.163993 41.164281 41.164722 41.166513 41.166972 41.166945 41.16744 41.168079 41.16798 41.168484 41.16843 41.168304] |
| 15  | [-8.612208 -8.612235 -8.614035 -8.614809 -8.61561 -8.616024 -8.616006 -8.61462 -8.614314 -8.613711 -8.612811 -8.611983 -8.611299 -8.611056 -8.610777 -8.609112 -8.60886 -8.610876 -8.610237 -8.610066 -8.609985 -8.609805 -8.610003 -8.609328 -8.609238 41.14053 41.140521 41.140323 41.14035 41.140287 41.140467 41.140548 41.141097 41.142393 41.143167 41.143806 41.144436 41.144985 41.145084 41.145921 41.146839 41.147442 41.147955 41.149584 41.150727 41.150898 41.151861 41.153472 41.155056 41.156847] |
| 16  | [-8.585145 -8.584146 -8.583147 -8.627931 -8.628813 -8.628264 -8.627589 -8.627508 -8.627517 -8.627184 -8.628759 -8.63127 -8.633736 -8.63613 -8.63613 -8.638209 -8.640144 -8.642079 -8.643915 -8.606169 -8.610615 -8.642412 -8.643042 -8.644041 -8.645868 -8.647686 -8.649207 -8.648784 -8.648064 -8.647992 -8.647992 -8.647983 -8.647974 -8.647983 -8.647983 -8.647983 -8.647983 -8.647974 41.164857 41.164704 41.164758 41.157954 41.159106 41.160978 41.163057 41.1633 41.1633 41.164407 41.164947 41.164893 41.163759 41.164254 41.164272 41.16474 41.165658 41.16726 41.168907 41.172093 41.174352 41.170365 41.17041 41.169546 41.169042 41.170482 41.169879 41.16924 41.169384 41.16942 41.169402 41.169393 41.169393 41.169393 41.169393 41.169393 41.169393 41.169393] |
| 17  | [-8.584263 -8.584695 -8.585595 -8.585487 -8.583561 -8.582319 -8.581995 -8.582769 -8.581032 -8.582265 -8.584803 -8.588295 -8.591805 -8.595045 -8.599203 -8.603586 -8.607474 -8.610966 -8.614827 -8.618922 -8.622918 -8.627103 -8.63118 -8.63532 -8.639154 -8.64261 -8.646489 -8.649405 -8.647623 -8.645337 -8.642835 -8.640729 -8.639946 -8.640315 -8.641503 -8.641755 -8.640414 -8.639649 -8.639667 -8.63964 -8.639694 -8.639703 -8.63973 -8.639739 -8.639748 -8.639739 -8.639721 -8.639712 -8.639721 41.163156 41.163003 41.162652 41.161437 41.160276 41.160519 41.161851 41.163354 41.163516 41.165226 41.166693 41.167845 41.169924 41.171877 41.1714 41.171562 41.173002 41.174442 41.174388 41.173857 41.173443 41.173029 41.172264 41.171373 41.169465 41.166945 41.164947 41.162238 41.158593 41.154903 41.151753 41.148756 41.145462 41.143167 41.142555 41.141934 41.141583 41.141538 41.141529 41.141538 41.141619 41.141619 41.141601 41.141592 41.141565 41.141565 41.141565 41.141574 41.141574] |
| 18  | [-8.666757 -8.666784 -8.666649 -8.666325 -8.666811 -8.665524 -8.663841 -8.663013 -8.662032 -8.65962 -8.657757 -8.658576 41.174055 41.174064 41.174073 41.174847 41.175549 41.176368 41.17716 41.177862 41.176872 41.17707 41.177457 41.178546] |
| 19  | [-8.599239 -8.598681 -8.597943 -8.596962 -8.595765 -8.594388 -8.59293 -8.591589 -8.59005 41.149188 41.149296 41.150583 41.152266 41.154345 41.156739 41.159286 41.161464 41.163336] |
| 20  | [-8.643807 -8.642529 -8.642133 -8.64324 -8.644788 -8.646534 41.168979 41.170113 41.171202 41.172723 41.173947 41.175558] |
| 21  | [-8.609706 -8.609562 -8.610021 -8.609769 -8.609355 -8.610732 -8.611893 -8.612784 -8.614989 -8.61813 -8.619606 -8.619777 -8.620497 -8.621163 41.151303 41.151474 41.152509 41.153409 41.153535 41.153688 41.154345 41.155164 41.155425 41.155704 41.156478 41.157954 41.159412 41.160357] |
221372637254620000657A39233NA200006571372637254AFalse43[-8.660646,41.168574][-8.601894,41.181813][-8.660646 -8.661087 -8.661231 -8.660637 -8.660295 -8.658954 -8.657649 -8.656371 -8.654706 -8.653014 -8.651349 -8.652213 -8.651358 -8.65071 -8.65125 -8.649648 -8.647515 -8.644491 -8.641701 -8.638677 -8.635284 -8.632476 -8.629983 -8.629857 -8.629857 -8.629263 -8.625996 -8.622018 -8.61885 -8.617077 41.168574 41.167926 41.166576 41.166396 41.166819 41.168394 41.169906 41.171454 41.173479 41.17527 41.17644 41.177241 41.178069 41.178924 41.17968 41.180643 41.18193 41.183541 41.184108 41.182452 41.180589 41.179104 41.179365 41.179374 41.179374 41.179437 41.179779 41.180463 41.181912 41.182821]
231372638502620000320CNANA200003201372638502AFalse35[-8.612955,41.140377][-8.60247,41.15592][-8.612955 -8.612991 -8.61309 -8.613297 -8.614512 -8.615358 -8.615988 -8.615556 -8.614548 -8.614269 -8.61372 -8.612631 -8.61174 -8.611461 -8.611488 -8.611434 -8.611308 -8.610615 -8.609715 -8.608599 -8.608581 -8.608563 -8.60733 -8.605224 -8.604468 -8.604405 -8.603703 -8.603307 -8.602038 -8.602011 -8.60193 -8.602074 -8.602317 -8.602452 41.140377 41.140359 41.140368 41.14035 41.140269 41.140215 41.140422 41.140701 41.14116 41.142285 41.143032 41.143932 41.14458 41.144787 41.144787 41.144796 41.144886 41.144103 41.143158 41.143149 41.143167 41.143158 41.142501 41.143338 41.145084 41.145192 41.145444 41.145525 41.145687 41.145678 41.146875 41.149197 41.151816 41.154129]
241372639960620000309BNA38200003091372639960AFalse20[-8.60418,41.160969][-8.579763,41.167899][-8.60418 -8.603874 -8.60391 -8.603919 -8.602614 -8.601309 41.160969 41.1615 41.161527 41.162022 41.163039 41.163993]
251372637658620000596A22864NA200005961372637658AFalse26[-8.665686,41.170626][-8.654337,41.187816][-8.665686 -8.665677 -8.664786 -8.663238 -8.66205 -8.662248 -8.663733 -8.663751 -8.663229 -8.66322 -8.66322 -8.662986 41.170626 41.170653 41.171319 41.172453 41.173875 41.175378 41.176413 41.17725 41.177754 41.177763 41.177763 41.17779]
261372639092620000233CNANA200002331372639092AFalse8[-8.632737,41.168295][-8.6328,41.168232][-8.632737 -8.6328 -8.632845 -8.632836 41.168295 41.16825 41.168205 41.168178]
271372639535620000161A25862NA200001611372639535AFalse57[-8.648226,41.148333][-8.58825,41.097357][-8.648226 -8.648514 -8.648784 -8.650827 -8.653158 -8.652906 -8.651799 -8.649999 -8.649216 -8.648469 -8.648379 -8.648325 -8.648217 -8.648379 -8.649252 -8.648298 -8.646435 -8.645202 -8.643042 -8.64081 -8.640081 -8.639361 -8.637804 41.148333 41.148297 41.148153 41.148045 41.14836 41.149206 41.150313 41.151096 41.152311 41.153481 41.154021 41.154003 41.154282 41.15583 41.15799 41.159079 41.157216 41.154507 41.151915 41.149269 41.146272 41.143248 41.139909]
281372640499620000596CNANA200005961372640499AFalse29[-8.638227,41.159592][-8.611596,41.150439][-8.638227 -8.637543 -8.635158 -8.633196 -8.632728 -8.633052 -8.632656 -8.632593 -8.631315 -8.629092 -8.626662 -8.624799 -8.622864 -8.620425 -8.617752 -8.61525 -8.612811 -8.611299 -8.609742 -8.60994 -8.610075 -8.610003 -8.610084 -8.610075 -8.610714 -8.611335 -8.611524 -8.61156 41.159592 41.15925 41.158881 41.158926 41.160276 41.161095 41.161842 41.161806 41.162598 41.163597 41.163471 41.163354 41.16303 41.162724 41.16249 41.162202 41.161905 41.161707 41.1615 41.160105 41.158242 41.156145 41.15412 41.152455 41.15106 41.150331 41.150367 41.150403]
291372639635620000178BNA52200001781372639635AFalse18[-8.613243,41.154444][-8.613315,41.166891][-8.613243 -8.612811 -8.611965 -8.61192 -8.612541 -8.612073 -8.611605 -8.611371 -8.611236 -8.611011 -8.611011 -8.61102 41.154444 41.153733 41.154642 41.156667 41.157324 41.158701 41.160429 41.161581 41.162292 41.164092 41.164434 41.164902]
301372640555620000235CNANA200002351372640555AFalse11[-8.611065,41.149431][-8.619993,41.146839][-8.611065 -8.611209 -8.611236 -8.611794 -8.612937 -8.620308 -8.620353 41.149431 41.149368 41.148243 41.147901 41.148207 41.147469 41.147352]
⋮

Data Analysis

Creating coord dict


In [8]:
small_taxi_df = GetTableOrderedSubset(taxi_df, 20000)
coordsDB = ConstructCoordsDatabase(small_taxi_df, 4)

In [1]:
coord_counts = [length(x)::Int64 for x in values(coordsDB)]
#coord_counts
describe(coord_counts)


coordsDB not defined
while loading In[1], in expression starting on line 1

 in anonymous at no file

In [4]:
#all_coords_val = hcat(taxi_validation_df[:COORDS]...)
all_coords = hcat(taxi_df[:COORDS][1:700]...)'
x = all_coords[:,1]
y = all_coords[:,2]
#taxi_df[:COORDS][1:50]
Gadfly.plot(x=x, y=y)


taxi_df not defined
while loading In[4], in expression starting on line 2

In [5]:
println("Showing arbitrary paths")

function plotCoords(coords, show_limit=40)
    num_paths = min(length(coords), show_limit)
    num_coords = length(coords)
    coordsToPlot = coords[randperm(num_coords)[1:num_paths]]
    layers = [layer(x=round(c'[:,1],3),y=round(c'[:,2],3),Geom.point, Geom.path, Theme(default_color=RGB(rand(3)...))) for c in coordsToPlot]
    Gadfly.plot(layers...) 
end

taxi_id = taxi_df[:TAXI_ID][11]
plotCoords(taxi_df[:COORDS][taxi_df[:TAXI_ID] .== taxi_id][1:30])


Showing arbitrary paths
Out[5]:
[Plot: sampled paths for one taxi, drawn as points joined by lines; axes: x (longitude), y (latitude)]

In [11]:
println("Plotting start and end points")

function plotStartAndEndPoints(coords)
    points = [[x[:,1] x[:,end]]' for x in coords]
    colors = [RGB(rand(3)...) for _ in points]
    
    layers = [layer(x=round(p[:,1],5),y=round(p[:,2],5), Geom.point, Geom.path, Theme(default_color=RGB(rand(3)...))) for p in points]
    Gadfly.plot(layers...) 
end

taxi_id = taxi_df[:TAXI_ID][11]
plotStartAndEndPoints(taxi_df[taxi_df[:TAXI_ID] .== taxi_id,:][:COORDS][100:180])


Plotting start and end points
Out[11]:
[Plot: start and end point of each path, joined by a line; axes: x (longitude), y (latitude)]

In [91]:
println("Plotting paths for driving looking at start distance")

function plotCoords(coords, show_limit=40)
    num_paths = min(length(coords), show_limit)
    num_coords = length(coords)
    coordsToPlot = coords[randperm(num_coords)[1:num_paths]]
    layers = [layer(x=round(c'[:,1],3),y=round(c'[:,2],3),Geom.point, Geom.path, Theme(default_color=RGB(rand(3)...))) for c in coordsToPlot]
    Gadfly.plot(layers...) 
end

taxi_id = taxi_df[:TAXI_ID][11]
test_path = taxi_df[taxi_df[:TAXI_ID] .== taxi_id,:][:COORDS][248]
a_taxi_df = taxi_df[taxi_df[:TAXI_ID] .== taxi_id,:][1:200,:]
println("test path size: ", size(test_path, 2), " partial path size: ", int(0.40*size(test_path,2)))
test_path = test_path[:,1:int(0.4*size(test_path,2))]

#a_taxi_df[:inv_DTW] = [float(1/sequenceCompare.DTWDistance(train_path, test_path)) for train_path in a_taxi_df[:COORDS]]
a_taxi_df[:inv_DTW] = [float(1/sequenceCompare.DTWDistance(train_path, test_path[:,1:min(end,size(train_path,2)+2)], 1)) for train_path in a_taxi_df[:COORDS]]
a_taxi_df[:NUM_COORDS_DIFF] = [float(n-size(test_path,2)) for n in a_taxi_df[:NUM_COORDS]]
a_taxi_df[:START_DIFF] = [float(euclideanDist(test_path[:,1], p)+0.000001) for p in a_taxi_df[:START]]
a_taxi_df[:START_DTW_score] = a_taxi_df[:inv_DTW] ./ a_taxi_df[:START_DIFF]
a_taxi_df[:inv_START_DTW_score] = a_taxi_df[:START_DIFF] ./ a_taxi_df[:inv_DTW]
sort!(a_taxi_df, cols=[:inv_DTW], rev=true)
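# Use the actual row count instead of a hard-coded 398 (a_taxi_df holds 200 rows here)
p1 = Gadfly.plot(layer(x=1:size(a_taxi_df, 1), y=a_taxi_df[:inv_DTW], Geom.point), Scale.y_log10)
p2 = Gadfly.plot(layer(x=1:size(a_taxi_df, 1), y=a_taxi_df[:NUM_COORDS], Geom.point))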
p3 = Gadfly.plot(layer(x=a_taxi_df[:inv_DTW], y=a_taxi_df[:NUM_COORDS_DIFF], Geom.point), Scale.x_log10)
p4 = Gadfly.plot(layer(x=1./a_taxi_df[:START_DIFF], y=a_taxi_df[:NUM_COORDS_DIFF], Geom.point))
p5 = Gadfly.plot(layer(x=a_taxi_df[:inv_START_DTW_score], y=a_taxi_df[:NUM_COORDS_DIFF], Geom.point), Scale.x_log10)
vstack(p5)
#a_taxi_df[:NUM_COORDS]


Plotting paths for driving looking at start distance
test path size: 33 partial path size: 13
Out[91]:
[Plot: NUM_COORDS_DIFF (y) against inv_START_DTW_score (x, log10 scale), one point per training path]
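
The cell above relies on `sequenceCompare.DTWDistance`, whose implementation is not shown in this notebook. As a rough illustration of what a dynamic time warping distance computes, here is a minimal sketch of the textbook, unconstrained algorithm. The name `dtw_distance` is hypothetical, paths are assumed to be 2×N matrices like the `COORDS` entries, and the meaning of the third argument passed to `DTWDistance` above is not known, so it is not modelled. The sketch is written for Julia 1.x using only Base, so its syntax differs slightly from the older Julia used in the cells.

```julia
# Textbook dynamic time warping between two 2×N coordinate matrices.
# Assumption: Euclidean point cost and no warping window; the real
# sequenceCompare.DTWDistance may differ on both counts.
function dtw_distance(a, b)
    n, m = size(a, 2), size(b, 2)
    D = fill(Inf, n + 1, m + 1)   # D[i+1, j+1] = best cost aligning a[:, 1:i] with b[:, 1:j]
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        cost = sqrt(sum(abs2, a[:, i] .- b[:, j]))
        # Extend the cheapest of the three admissible predecessor alignments.
        D[i+1, j+1] = cost + min(D[i, j+1], D[i+1, j], D[i, j])
    end
    return D[n+1, m+1]
end
```

Two identical paths have distance 0 under this definition, so a score of the form `1/distance` becomes `Inf` for an exact match; adding a small constant to the distance, as the cell does for `START_DIFF`, avoids that.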

In [66]:
df = DataFrame(A = [1,2,3], B = [1,10,100])
df[:A] ./ df[:B]


Out[66]:
3-element DataArray{Float64,1}:
 1.0 
 0.2 
 0.03

In [34]:
function euclideanDist(p1, p2)
    return sqrt((p1[1]-p2[1])^2 + (p1[2]-p2[2])^2)
end

s1, s2 = taxi_df[:START][13], taxi_df[:START][42]
euclideanDist(s1,s2)


Out[34]:
0.04586648040781201

Visualization

Math

Divide the map into an $n \times n$ grid of cells: $ M = \left( \begin{array}{ccc} m_{1,1} & \cdots & m_{1,n} \\ \vdots & \ddots & \vdots \\ m_{n,1} & \cdots & m_{n,n} \end{array} \right) $

Certain cells will have a much higher "ending prior": $ \Pr(m_{i,j} = \mathrm{END}) = \frac{\text{no. of paths that end in } m_{i,j}}{\text{no. of paths that pass through } m_{i,j}} $
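
A minimal sketch of how this ending prior could be counted, assuming each path is a 2×N matrix (row 1 longitude, row 2 latitude) like the `COORDS` entries. The helper names `cell_of` and `ending_prior`, the grid origin, and the cell size are all illustrative assumptions, not part of the `taxis` module. It is written for Julia 1.x using only Base.

```julia
# Hypothetical helper: map a (lon, lat) point to a grid cell index (i, j).
# lon0/lat0 (grid origin) and cell (cell size in degrees) are assumed values.
cell_of(lon, lat; lon0=-8.70, lat0=41.10, cell=0.01) =
    (floor(Int, (lon - lon0) / cell) + 1, floor(Int, (lat - lat0) / cell) + 1)

# For every visited cell: (paths ending in the cell) / (paths passing through it).
function ending_prior(paths)
    passes = Dict{Tuple{Int,Int},Int}()   # number of paths that visit the cell
    ends   = Dict{Tuple{Int,Int},Int}()   # number of paths whose last point is in the cell
    for path in paths
        size(path, 2) == 0 && continue    # skip empty paths
        cells = [cell_of(path[1, k], path[2, k]) for k in 1:size(path, 2)]
        for c in unique(cells)            # count each path at most once per cell
            passes[c] = get(passes, c, 0) + 1
        end
        ends[cells[end]] = get(ends, cells[end], 0) + 1
    end
    return Dict(c => get(ends, c, 0) / n for (c, n) in passes)
end
```

A path's final cell is always among the cells it passes through, so every ratio lies between 0 and 1.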

We could also impose a Markov assumption, where $p_k$ is the cell visited at step $k$ and depends only on the previous cell: $ \Pr(p_k = m_{i,j} \mid p_{k-1} = m_{i-1,j}) $

The problem can then be treated as a Markov model. We could use a Markov random field (MRF) or a conditional random field (CRF) that predicts a direction vector.
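
Under a first-order Markov assumption, the transition probabilities between cells can be estimated by counting consecutive cell pairs. The sketch below assumes the paths have already been mapped to sequences of grid cells; `transition_probs` and its input layout are illustrative choices, not something defined elsewhere in this notebook. It is written for Julia 1.x using only Base.

```julia
# cell_paths: vector of paths, each a vector of (i, j) grid cells (assumed input).
# Consecutive duplicates are skipped so only real cell changes count as transitions.
function transition_probs(cell_paths)
    counts = Dict{Tuple{Int,Int},Dict{Tuple{Int,Int},Int}}()
    for cells in cell_paths
        for k in 2:length(cells)
            a, b = cells[k-1], cells[k]
            a == b && continue
            row = get!(counts, a, Dict{Tuple{Int,Int},Int}())
            row[b] = get(row, b, 0) + 1
        end
    end
    # Normalise each row of counts into Pr(p_k = b | p_{k-1} = a).
    probs = Dict{Tuple{Int,Int},Dict{Tuple{Int,Int},Float64}}()
    for (a, row) in counts
        total = sum(values(row))
        probs[a] = Dict(b => n / total for (b, n) in row)
    end
    return probs
end
```

Cells that were never left in the training data have no row in the result, so a predictor built on this would need some fallback for them.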

We could also impose start/end priors: $ \Pr(p_k = \mathrm{dest}_j \mid p_0 = \mathrm{src}_j) $

  • Instead of a list of raw coordinates, it may make more sense to map each trip to a sequence of interest points, i.e. points that occur with medium or high frequency. This would remove a lot of noise and simplify the paths.

  • Idea: use k-means on the coordinate set to find the $K$ most common points on the map, $p_1, \cdots, p_K$. Each path coordinate $c_i$ can then be re-encoded as its nearest point, $c_i^* = \arg\min_{k} \lVert p_k - c_i \rVert$. Runs of repeated elements could be collapsed for simplicity, optionally encoded as ($c_i^*$, number of repeats).
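
The re-encoding and run-collapsing steps above can be sketched as follows. The k-means step itself is left out: `centers` is assumed to be a 2×K matrix of interest points obtained from some clustering, and all three function names are hypothetical. The code is written for Julia 1.x using only Base.

```julia
# Index of the centre nearest to point c (squared Euclidean distance).
# centers is a 2×K matrix, assumed to come from k-means or similar.
nearest_center(c, centers) =
    argmin([sum(abs2, centers[:, k] .- c) for k in 1:size(centers, 2)])

# Re-encode a 2×N path as a sequence of centre indices.
encode_path(path, centers) =
    [nearest_center(path[:, i], centers) for i in 1:size(path, 2)]

# Collapse runs of repeated symbols into (symbol, count) pairs.
function collapse_runs(seq)
    runs = Tuple{eltype(seq),Int}[]
    for s in seq
        if !isempty(runs) && runs[end][1] == s
            runs[end] = (s, runs[end][2] + 1)   # extend the current run
        else
            push!(runs, (s, 1))                 # start a new run
        end
    end
    return runs
end
```

Calling `collapse_runs(encode_path(path, centers))` gives the compact (point, repeats) form of a trip; the repeat count roughly tracks how long the taxi stayed near each interest point.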

Idea: find the conduits, i.e. the longest frequently traveled roads where most people who enter will also leave. For a new trip, if we see it entering a conduit, we can be confident it will exit the conduit.


In [156]:
# deleting rows/ filtering rows from a dataframe
is_even = x -> x % 2 .== 0
df = DataFrame(A = [11,12,13,14,15], B = [1,2,3,4,5])
#deleterows!(df, find(is_even(df[:A])))
df[is_even(df[:A]),:]


Out[156]:
| Row | A  | B |
|-----|----|---|
| 1   | 12 | 2 |
| 2   | 14 | 4 |

In [174]:
#deleterows!(taxi_df, find(df[:NUM_COORDS]))

taxi_df[taxi_df[:NUM_COORDS] .== 0,:]
#deleterows!(taxi_df, find(taxi_df[:NUM_COORDS] .== 0))


Out[174]:
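| TRIP_ID | CALL_TYPE | ORIGIN_CALL | ORIGIN_STAND | TAXI_ID | TIMESTAMP | DAY_TYPE | MISSING_DATA | COORDS | NUM_COORDS |
|---------|-----------|-------------|--------------|---------|-----------|----------|--------------|--------|------------|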
