Lesson 8: Cross-validation

So far, we've learned about splitting our data into training and testing sets to validate our models. This helps ensure that the model we create on one sample performs well on another sample we want to predict.

However, we don't have to use just TWO samples to train and test our models. Instead, we can split our data up into MULTIPLE samples to train and test on multiple segments of the data. This is called CROSS-VALIDATION. This allows us to ensure that our model predicts outcomes over a wider range of circumstances.

Let's begin by importing our packages.



In [1]:

    
! conda install geopandas -qy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 

import geopandas as gpd
from shapely.geometry import Point, Polygon

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold









    




# All requested packages already installed.
# packages in environment at /opt/conda:
#
geopandas                 0.3.0                    py36_0    conda-forge



In [2]:

    
import os
os.getcwd()
os.chdir('/home/jovyan/assignment-08-cross-validation-drewgobbi')

Today we'll be looking at 311 service requests for rodent inspection and abatement aggregated at the Census block level. The data set is already prepared for you and available in the same folder as this assignment. Census blocks are a good geographic level to analyze rodent infestations because they are drawn along natural and human-made boundaries, like rivers and roads, that rats tend not to cross.

We will look at the 'activity' variable, which indicates whether inspectors found rat burrows during an inspection (1) or not (0). Here we are looking only at inpsections in 2016. About 43 percent on inspections in 2016 led to inspectors finding and treating rat burrows, as you can see below.



In [3]:

    
data = pd.read_csv('rat_data_2016.csv')



In [4]:

    
data.columns









    Out[4]:





Index(['activity', 'alley_condition', 'bbl_hotel', 'bbl_multifamily_rental',
       'bbl_restaurant', 'bbl_single_family_rental', 'bbl_storage',
       'bbl_two_family_rental', 'communitygarden_area', 'communitygarden_id',
       'dcrapermit_addition', 'dcrapermit_demolition', 'dcrapermit_excavation',
       'dcrapermit_new_building', 'dcrapermit_raze', 'impervious_area',
       'month', 'num_mixed_use', 'num_non_residential', 'num_residential',
       'park', 'pct_mixed_use', 'pct_non_residential', 'pct_residential',
       'pop_density', 'sidewalk_grates', 'ssl_cndtn_Average_comm',
       'ssl_cndtn_Average_res', 'ssl_cndtn_Excellent_comm',
       'ssl_cndtn_Excellent_res', 'ssl_cndtn_Fair_comm', 'ssl_cndtn_Fair_res',
       'ssl_cndtn_Good_comm', 'ssl_cndtn_Good_res', 'ssl_cndtn_Poor_comm',
       'ssl_cndtn_Poor_res', 'ssl_cndtn_VeryGood_comm',
       'ssl_cndtn_VeryGood_res', 'tot_pop', 'well_activity', 'WARD'],
      dtype='object')



In [5]:

    
data.describe().T









    Out[5]:







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      activity
      2606.0
      0.431696
      0.495408
      0.000000
      0.000000
      0.000000
      1.000000
      1.000000
    
    
      alley_condition
      2606.0
      11.111282
      8.900166
      0.000000
      4.000000
      10.000000
      16.000000
      79.000000
    
    
      bbl_hotel
      2606.0
      0.082118
      0.376073
      0.000000
      0.000000
      0.000000
      0.000000
      8.000000
    
    
      bbl_multifamily_rental
      2606.0
      1.388718
      2.376244
      0.000000
      0.000000
      0.000000
      2.000000
      22.000000
    
    
      bbl_restaurant
      2606.0
      0.569455
      1.518526
      0.000000
      0.000000
      0.000000
      0.000000
      17.000000
    
    
      bbl_single_family_rental
      2606.0
      4.709133
      8.375165
      0.000000
      1.000000
      2.000000
      5.000000
      147.000000
    
    
      bbl_storage
      2606.0
      0.002686
      0.051768
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      bbl_two_family_rental
      2606.0
      0.743668
      1.378860
      0.000000
      0.000000
      0.000000
      1.000000
      15.000000
    
    
      communitygarden_area
      2606.0
      18.727920
      326.332382
      0.000000
      0.000000
      0.000000
      0.000000
      11004.319881
    
    
      communitygarden_id
      2606.0
      0.242134
      3.234069
      0.000000
      0.000000
      0.000000
      0.000000
      80.000000
    
    
      dcrapermit_addition
      2606.0
      0.054490
      0.265959
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      dcrapermit_demolition
      2606.0
      0.280507
      0.738300
      0.000000
      0.000000
      0.000000
      0.000000
      12.000000
    
    
      dcrapermit_excavation
      2606.0
      0.010361
      0.105000
      0.000000
      0.000000
      0.000000
      0.000000
      2.000000
    
    
      dcrapermit_new_building
      2606.0
      0.053722
      0.286938
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      dcrapermit_raze
      2606.0
      0.044896
      0.308390
      0.000000
      0.000000
      0.000000
      0.000000
      5.000000
    
    
      impervious_area
      2606.0
      18450.549489
      18774.921307
      2150.037473
      11356.602831
      14532.464194
      19920.166029
      473222.487756
    
    
      month
      2606.0
      7.194935
      3.001022
      1.000000
      5.000000
      7.000000
      10.000000
      12.000000
    
    
      num_mixed_use
      2606.0
      0.153876
      0.474006
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      num_non_residential
      2606.0
      3.804682
      5.956966
      0.000000
      0.000000
      1.000000
      5.000000
      58.000000
    
    
      num_residential
      2606.0
      39.287797
      26.661136
      0.000000
      21.000000
      36.000000
      53.000000
      334.000000
    
    
      park
      2606.0
      0.046815
      0.225350
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      pct_mixed_use
      2606.0
      0.003899
      0.013706
      0.000000
      0.000000
      0.000000
      0.000000
      0.250000
    
    
      pct_non_residential
      2606.0
      0.139389
      0.242231
      0.000000
      0.000000
      0.038462
      0.150000
      1.000000
    
    
      pct_residential
      2606.0
      0.854794
      0.247308
      0.000000
      0.838200
      0.959184
      1.000000
      1.000000
    
    
      pop_density
      2606.0
      24969.466549
      19217.566990
      0.000000
      13524.013052
      21965.055541
      30907.352412
      182709.507271
    
    
      sidewalk_grates
      2606.0
      2.109363
      5.186348
      0.000000
      0.000000
      0.000000
      2.000000
      73.000000
    
    
      ssl_cndtn_Average_comm
      2606.0
      0.402058
      0.401132
      0.000000
      0.000000
      0.333333
      0.800000
      1.000000
    
    
      ssl_cndtn_Average_res
      2606.0
      0.574651
      0.265928
      0.000000
      0.444444
      0.635642
      0.767857
      1.000000
    
    
      ssl_cndtn_Excellent_comm
      2606.0
      0.022413
      0.095254
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Excellent_res
      2606.0
      0.001871
      0.029081
      0.000000
      0.000000
      0.000000
      0.000000
      0.766990
    
    
      ssl_cndtn_Fair_comm
      2606.0
      0.024810
      0.101711
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Fair_res
      2606.0
      0.018239
      0.047230
      0.000000
      0.000000
      0.000000
      0.017857
      0.666667
    
    
      ssl_cndtn_Good_comm
      2606.0
      0.162147
      0.261734
      0.000000
      0.000000
      0.000000
      0.250000
      1.000000
    
    
      ssl_cndtn_Good_res
      2606.0
      0.285093
      0.206829
      0.000000
      0.133333
      0.266667
      0.403509
      1.000000
    
    
      ssl_cndtn_Poor_comm
      2606.0
      0.002487
      0.032450
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Poor_res
      2606.0
      0.002067
      0.010527
      0.000000
      0.000000
      0.000000
      0.000000
      0.200000
    
    
      ssl_cndtn_VeryGood_comm
      2606.0
      0.104810
      0.232111
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_VeryGood_res
      2606.0
      0.036727
      0.071472
      0.000000
      0.000000
      0.000000
      0.052632
      1.000000
    
    
      tot_pop
      2606.0
      188.892172
      211.199274
      0.000000
      82.000000
      137.000000
      231.000000
      3888.000000
    
    
      well_activity
      2606.0
      0.572141
      1.569373
      0.000000
      0.000000
      0.000000
      0.000000
      19.000000
    
    
      WARD
      2606.0
      3.994244
      2.112836
      1.000000
      2.000000
      4.000000
      6.000000
      8.000000

Recall from last week that, when we do predictive analysis, we usually are not interested in the relationship between two different variables as we are when we do traditional hypothesis testing. Instead, we're interested in training a model that generates predictions that best fit our target population. Therefore, when we are doing any kind of validation, including cross-validation, it is important for us to choose the metric by which we will evaluate the performance of our models.

For this model, we will predict the locations of requests for rodent inspection and abatement in the District of Columbia. When we select a validation metric, it's important for us to think about what we want to optimize. For example, do we want to make sure that our top predictions accurately identify places with rodent infestations, so we don't send our inspectors on a wild goose chase? Then we may to look at the models precision, or what proportion of its positive predictions turn out to be positive. Or do we want to make sure we don't miss any infestations? If so, we may want to look at recall, or the proportion of positive cases that are correctly categorized by the model. If we care a lot about how the model ranks our observations, then we may want to look at the area under the ROC curve, or ROC-AUC, while if we care more about how well the model fits the data, or its "calibration," we may want to look at Brier score or logarithmic loss (log-loss).

In the case of rodent inspections, we most likely want to make sure that we send our inspectors to places where they are most likely to find rats and to avoid sending them on wild goose [rat] chases. Therefore, we will optimize for precision, which we will call from the metrics library in scikit-learn.

The metrics library in scikit-learn provides a number of different options. You should take some time to look at the different metrics that are available to you and consider which ones are most appropriate for your own research



In [6]:

    
from sklearn.metrics import precision_score

The next important decision we need to make when cross-validating our models is how we will define our "folds." Folds are the independent subsamples on which we train and test the data. Keep in mind that it is important that our folds are INDEPENDENT, which means we must guarantee that there's no overlap between our training and test set (i.e., no observation is in both the training and test set). Independence can also have other implications for how we slice the data, which we will discuss as we progress through this lesson.

One of the most common approaches to cross-validation is to make random splits in the data. This is often referred to as k-fold cross-validation, in which the only thing we define is the number of folds (k) that want to split our sample into. Here, I'll use the KFold function from scikit-learn's model_selection library. Let's begin by importing the library and then taking a look at how it splits our data.



In [7]:

    
from sklearn.model_selection import KFold

KFold divides our data into a pre-specified number of (approximately) equally-sized folds so that each observation is in the test set once. When we specify that shuffle=True, KFold first shuffles our data into a random order to ensure that the observations are randomly selected. By selecting a random_state, we can ensure that KFold selects observations the same way each time.

While there are other functions in the model_selection library that will do much of this work for us, KFold will allow us to look at what's going on in the background of our cross-validation process. Let's begin by just looking at how KFold splits our data. Here we split our data into 10 folds each with 10 percent of the data (.1).



In [8]:

    
cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", train_index, "TEST:", test_index)









    



TRAIN: [   0    1    2 ..., 2603 2604 2605] TEST: [   9   22   27   33   53   70   92  104  109  117  121  135  137  156  182
  192  195  196  215  217  224  227  252  259  271  276  289  314  317  326
  333  351  398  399  418  422  427  436  438  443  452  465  478  480  482
  489  518  547  562  563  567  569  578  581  597  609  616  618  619  674
  682  686  700  704  710  711  720  722  728  743  745  746  748  764  778
  795  817  831  855  868  878  880  899  913  916  921  927  933  961  962
  982  983  988  998 1000 1012 1013 1018 1023 1032 1036 1051 1052 1059 1078
 1079 1096 1100 1101 1106 1108 1109 1147 1187 1192 1213 1263 1264 1285 1287
 1300 1323 1326 1327 1371 1396 1418 1421 1432 1452 1484 1507 1515 1520 1544
 1568 1570 1577 1580 1585 1588 1590 1592 1597 1622 1627 1656 1657 1668 1680
 1681 1686 1689 1708 1710 1719 1729 1732 1735 1757 1761 1762 1763 1765 1780
 1783 1790 1798 1803 1814 1816 1818 1820 1821 1827 1832 1836 1853 1873 1874
 1895 1898 1901 1902 1921 1927 1929 1939 1943 1961 1968 1969 1972 1974 1979
 1982 1989 1997 1998 2019 2027 2030 2033 2055 2074 2085 2087 2090 2115 2123
 2134 2138 2151 2156 2161 2183 2186 2191 2225 2230 2254 2258 2274 2281 2289
 2302 2312 2316 2317 2343 2345 2346 2358 2359 2378 2386 2387 2398 2400 2410
 2416 2417 2419 2429 2450 2468 2498 2499 2502 2510 2520 2543 2544 2550 2555
 2564 2567 2579 2585 2596 2597]
TRAIN: [   0    1    2 ..., 2603 2604 2605] TEST: [   4   10   14   23   37   39   41   57   69   76   98  113  124  132  145
  148  162  179  191  204  232  234  245  248  251  296  302  303  311  320
  330  353  357  361  379  385  390  402  405  414  425  440  446  454  457
  477  486  487  501  526  527  536  543  558  565  582  610  615  621  634
  638  652  653  666  667  672  676  687  688  692  702  703  708  712  713
  715  716  758  776  789  812  828  838  840  847  852  876  891  897  898
  905  909  914  924  926  935  966  989  997 1002 1003 1025 1041 1068 1070
 1091 1093 1110 1118 1122 1138 1146 1150 1161 1173 1185 1193 1197 1199 1210
 1211 1222 1228 1231 1232 1239 1242 1244 1270 1292 1294 1303 1332 1334 1362
 1366 1377 1380 1405 1412 1414 1424 1426 1449 1465 1467 1486 1487 1493 1496
 1504 1506 1512 1525 1535 1539 1543 1548 1549 1553 1555 1573 1594 1599 1601
 1625 1637 1642 1646 1663 1664 1665 1675 1678 1702 1703 1712 1714 1727 1728
 1768 1770 1774 1781 1809 1813 1824 1825 1826 1839 1845 1851 1852 1859 1885
 1928 1937 1946 1949 1952 1955 1988 2001 2007 2013 2037 2038 2045 2054 2067
 2069 2073 2088 2095 2116 2131 2143 2149 2175 2201 2207 2213 2229 2233 2245
 2246 2262 2266 2272 2276 2288 2293 2297 2321 2333 2341 2354 2356 2376 2381
 2395 2399 2405 2409 2442 2482 2491 2497 2508 2512 2515 2526 2531 2556 2559
 2561 2562 2566 2588 2592 2594]
TRAIN: [   0    2    3 ..., 2603 2604 2605] TEST: [   1    6   11   17   18   30   40   47   48   52   58  107  125  133  149
  157  161  173  175  189  194  200  206  220  229  249  254  264  283  286
  300  305  322  342  384  386  391  392  411  442  444  453  458  459  461
  488  503  505  517  529  530  535  538  557  568  570  574  575  579  587
  596  602  641  646  648  651  657  661  665  670  684  723  727  731  757
  762  768  792  805  841  850  886  892  895  900  906  918  936  949  951
  953  963  995  996 1005 1009 1015 1017 1027 1042 1055 1058 1063 1073 1081
 1098 1103 1116 1126 1127 1139 1140 1157 1160 1174 1188 1190 1203 1205 1226
 1236 1246 1256 1267 1271 1273 1283 1295 1302 1317 1322 1328 1357 1367 1370
 1373 1386 1393 1411 1422 1428 1431 1448 1450 1459 1471 1492 1503 1513 1521
 1523 1528 1533 1540 1547 1567 1569 1598 1602 1621 1631 1632 1643 1673 1677
 1697 1704 1713 1715 1724 1725 1726 1736 1737 1753 1754 1773 1802 1819 1829
 1840 1854 1855 1861 1864 1867 1869 1872 1897 1900 1931 1942 1945 1947 1964
 1983 1985 1996 1999 2012 2015 2026 2039 2052 2072 2083 2097 2099 2109 2124
 2144 2145 2165 2188 2190 2198 2208 2223 2235 2238 2240 2249 2256 2279 2284
 2285 2290 2300 2308 2320 2326 2336 2352 2355 2357 2367 2372 2382 2396 2407
 2414 2422 2430 2432 2462 2467 2471 2477 2480 2495 2500 2511 2518 2525 2548
 2549 2560 2569 2576 2582 2601]
TRAIN: [   0    1    2 ..., 2603 2604 2605] TEST: [  15   31   34   43   61   64   77   80   85   87  106  118  139  141  144
  169  187  202  223  233  240  250  253  260  262  267  270  279  287  294
  295  298  299  309  310  359  360  369  376  381  393  415  474  475  483
  485  491  502  506  512  516  519  521  522  532  546  553  564  572  600
  620  629  633  636  643  654  655  663  701  721  733  735  766  775  781
  791  793  794  799  806  810  815  818  820  825  829  832  836  842  861
  882  883  890  896  910  917  923  937  938  944  946  958  964  971  977
  979  980  981  986  991 1010 1038 1043 1045 1047 1049 1056 1074 1077 1083
 1087 1097 1099 1114 1119 1136 1148 1151 1170 1180 1183 1212 1217 1240 1254
 1255 1279 1301 1325 1330 1339 1341 1347 1355 1368 1374 1385 1388 1390 1397
 1399 1417 1453 1458 1474 1485 1490 1491 1516 1559 1560 1571 1581 1587 1593
 1603 1610 1611 1647 1648 1674 1694 1696 1709 1758 1759 1760 1779 1786 1787
 1789 1801 1808 1817 1860 1865 1875 1876 1892 1903 1907 1916 1919 1923 1935
 1948 1980 1994 2020 2032 2044 2050 2053 2057 2061 2063 2064 2075 2078 2092
 2122 2129 2133 2140 2158 2164 2173 2178 2203 2204 2210 2219 2259 2263 2273
 2296 2298 2305 2311 2319 2328 2351 2360 2391 2392 2394 2397 2401 2402 2406
 2408 2413 2441 2444 2447 2457 2458 2469 2478 2479 2493 2507 2509 2519 2522
 2536 2538 2571 2574 2575 2581]
TRAIN: [   0    1    3 ..., 2602 2604 2605] TEST: [   2    5   19   36   42   45   54   55   65   66   72   73   82   89  102
  108  122  140  142  152  154  159  170  171  177  178  184  185  190  198
  203  210  214  219  268  272  278  308  312  315  318  319  335  340  341
  347  364  378  383  408  412  416  434  467  468  473  479  481  498  511
  534  539  551  561  583  590  598  632  644  658  689  717  718  724  726
  740  744  750  759  760  772  773  785  796  801  811  813  823  839  856
  863  866  893  930  955  965  974  978  985  987  992 1001 1008 1021 1026
 1029 1050 1069 1076 1080 1082 1117 1129 1132 1135 1137 1145 1154 1165 1166
 1175 1195 1216 1225 1235 1257 1259 1261 1266 1275 1277 1280 1284 1293 1338
 1343 1349 1358 1359 1363 1376 1378 1379 1387 1423 1427 1454 1456 1457 1460
 1462 1473 1478 1482 1489 1499 1500 1518 1526 1530 1537 1538 1550 1606 1612
 1615 1633 1635 1661 1666 1679 1717 1742 1748 1749 1752 1764 1775 1785 1846
 1848 1858 1878 1886 1890 1909 1914 1934 1950 1957 1960 1976 2000 2005 2010
 2017 2018 2025 2028 2031 2034 2035 2056 2058 2093 2100 2102 2108 2119 2160
 2166 2172 2181 2194 2202 2218 2228 2236 2242 2244 2248 2253 2264 2265 2267
 2269 2270 2278 2283 2303 2310 2324 2325 2349 2364 2365 2393 2421 2434 2438
 2452 2453 2456 2460 2463 2465 2475 2484 2485 2487 2492 2505 2506 2514 2528
 2532 2552 2570 2598 2599 2603]
TRAIN: [   0    1    2 ..., 2602 2603 2605] TEST: [   8   13   16   29   32   35   49   56   60   68   71   75   96  110  114
  134  147  155  186  211  212  218  226  231  241  243  244  247  258  265
  285  293  304  336  339  354  355  358  362  367  372  377  401  406  420
  426  431  432  435  445  450  455  456  464  471  494  499  500  513  520
  524  533  540  542  548  549  552  554  576  580  585  589  593  611  631
  642  649  668  677  678  679  693  706  725  737  752  769  782  788  803
  814  862  871  875  881  884  887  889  939  940  942  943  948  957  960
  970  984  994  999 1014 1016 1031 1044 1060 1064 1065 1075 1105 1120 1123
 1124 1128 1143 1164 1171 1184 1189 1200 1218 1227 1229 1245 1248 1252 1265
 1299 1310 1312 1315 1319 1336 1340 1354 1361 1403 1407 1436 1444 1446 1451
 1455 1463 1464 1476 1495 1498 1501 1505 1511 1519 1554 1572 1586 1591 1604
 1607 1626 1644 1650 1652 1654 1655 1662 1669 1670 1683 1698 1716 1721 1738
 1739 1741 1746 1766 1767 1771 1776 1794 1806 1837 1850 1870 1880 1888 1893
 1894 1922 1936 1944 1951 1953 1981 1984 1992 2016 2029 2043 2060 2062 2065
 2070 2082 2096 2111 2125 2126 2130 2132 2157 2167 2174 2192 2193 2196 2205
 2216 2224 2231 2243 2250 2255 2268 2301 2307 2314 2323 2330 2337 2363 2368
 2403 2404 2423 2436 2440 2461 2466 2481 2516 2537 2540 2541 2547 2557 2563
 2572 2580 2586 2589 2600 2604]
TRAIN: [   0    1    2 ..., 2602 2603 2604] TEST: [  38   44   51   59   78   81   83   88   97  103  111  119  123  129  131
  165  183  188  205  213  238  239  261  263  269  313  316  328  338  349
  356  363  371  374  382  395  397  410  439  448  463  466  472  476  484
  490  493  507  510  514  528  541  571  599  601  608  613  625  635  645
  656  660  681  685  695  697  729  751  761  771  779  783  784  808  819
  826  833  844  846  849  853  858  867  874  879  904  907  911  919  934
  947  969  993 1019 1034 1035 1039 1057 1067 1084 1088 1089 1094 1102 1115
 1121 1125 1156 1158 1191 1196 1220 1223 1224 1233 1238 1260 1262 1268 1269
 1274 1276 1281 1290 1291 1296 1307 1311 1318 1320 1321 1324 1335 1344 1351
 1360 1364 1375 1382 1383 1391 1401 1408 1410 1415 1419 1420 1430 1439 1440
 1442 1475 1477 1494 1502 1510 1524 1527 1529 1534 1557 1564 1609 1613 1614
 1618 1624 1630 1636 1639 1651 1658 1687 1691 1693 1695 1700 1706 1711 1733
 1743 1756 1772 1784 1804 1811 1830 1831 1833 1841 1842 1857 1881 1882 1887
 1906 1912 1918 1959 1977 1991 2004 2014 2086 2094 2101 2103 2104 2106 2110
 2114 2118 2127 2147 2179 2184 2199 2214 2220 2226 2227 2232 2239 2241 2275
 2286 2287 2294 2318 2327 2329 2332 2342 2348 2373 2375 2411 2420 2428 2433
 2437 2439 2451 2459 2464 2472 2474 2489 2503 2521 2529 2530 2534 2535 2545
 2546 2551 2553 2590 2605]
TRAIN: [   0    1    2 ..., 2603 2604 2605] TEST: [  12   20   28   46   50   62   74   79   90   95  101  105  115  116  120
  127  128  138  143  150  158  167  172  181  193  208  222  225  228  230
  235  236  242  255  266  284  288  290  301  306  331  332  337  344  345
  346  352  366  370  380  389  394  396  403  409  413  417  421  441  462
  492  495  496  515  523  531  545  559  566  588  592  612  614  617  622
  626  628  630  662  669  675  683  690  707  719  734  742  747  753  765
  777  787  790  798  822  824  857  864  870  877  901  912  915  920  922
  959  968 1024 1030 1037 1054 1061 1062 1066 1072 1086 1092 1095 1142 1168
 1169 1178 1181 1182 1186 1214 1230 1234 1237 1258 1282 1286 1288 1309 1313
 1342 1356 1365 1372 1394 1402 1404 1406 1433 1437 1481 1508 1509 1517 1522
 1545 1546 1575 1576 1583 1584 1595 1600 1616 1620 1623 1628 1629 1638 1649
 1659 1676 1682 1685 1688 1692 1730 1745 1751 1755 1782 1788 1791 1793 1796
 1797 1799 1807 1810 1812 1815 1835 1843 1856 1866 1884 1891 1911 1917 1926
 1932 1958 1962 1965 1986 1990 1995 2002 2041 2047 2048 2049 2066 2068 2077
 2079 2098 2112 2113 2128 2137 2142 2148 2155 2162 2168 2170 2200 2206 2209
 2211 2212 2221 2234 2252 2261 2277 2291 2299 2313 2350 2353 2361 2366 2369
 2370 2374 2384 2388 2426 2445 2448 2449 2455 2476 2486 2504 2513 2533 2539
 2542 2565 2573 2583 2595]
TRAIN: [   1    2    4 ..., 2603 2604 2605] TEST: [   0    3    7   21   26   63   93  100  112  126  153  160  163  164  174
  237  246  280  281  282  292  321  325  327  329  334  343  348  350  365
  375  387  400  404  407  419  424  428  437  447  449  451  460  470  497
  504  550  573  577  584  586  594  595  603  604  605  606  624  627  640
  647  650  664  671  673  680  691  694  696  698  699  709  732  736  738
  739  741  754  780  786  800  804  827  830  834  837  845  848  851  854
  859  869  873  902  903  929  932  941  945  950  952  975  990 1004 1006
 1011 1028 1040 1046 1048 1090 1111 1113 1130 1131 1133 1144 1149 1159 1163
 1177 1179 1194 1201 1202 1209 1215 1219 1221 1243 1247 1249 1250 1251 1253
 1289 1298 1305 1306 1308 1314 1331 1333 1337 1348 1353 1369 1384 1389 1398
 1409 1413 1416 1425 1438 1441 1443 1461 1468 1479 1480 1497 1514 1532 1541
 1542 1551 1556 1558 1562 1566 1574 1579 1582 1596 1608 1617 1660 1667 1671
 1672 1690 1705 1707 1722 1734 1744 1747 1769 1800 1805 1834 1838 1844 1847
 1849 1862 1868 1879 1883 1889 1915 1924 1933 1938 1941 1963 1967 1971 1975
 1978 1993 2003 2006 2009 2021 2040 2042 2051 2107 2136 2150 2152 2153 2154
 2180 2182 2185 2189 2195 2215 2247 2271 2295 2304 2306 2309 2315 2331 2340
 2347 2377 2379 2380 2385 2390 2412 2424 2427 2443 2454 2470 2473 2488 2501
 2523 2527 2591 2593 2602]
TRAIN: [   0    1    2 ..., 2603 2604 2605] TEST: [  24   25   67   84   86   91   94   99  130  136  146  151  166  168  176
  180  197  199  201  207  209  216  221  256  257  273  274  275  277  291
  297  307  323  324  368  373  388  423  429  430  433  469  508  509  525
  537  544  555  556  560  591  607  623  637  639  659  705  714  730  749
  755  756  763  767  770  774  797  802  807  809  816  821  835  843  860
  865  872  885  888  894  908  925  928  931  954  956  967  972  973  976
 1007 1020 1022 1033 1053 1071 1085 1104 1107 1112 1134 1141 1152 1153 1155
 1162 1167 1172 1176 1198 1204 1206 1207 1208 1241 1272 1278 1297 1304 1316
 1329 1345 1346 1350 1352 1381 1392 1395 1400 1429 1434 1435 1445 1447 1466
 1469 1470 1472 1483 1488 1531 1536 1552 1561 1563 1565 1578 1589 1605 1619
 1634 1640 1641 1645 1653 1684 1699 1701 1718 1720 1723 1731 1740 1750 1777
 1778 1792 1795 1822 1823 1828 1863 1871 1877 1896 1899 1904 1905 1908 1910
 1913 1920 1925 1930 1940 1954 1956 1966 1970 1973 1987 2008 2011 2022 2023
 2024 2036 2046 2059 2071 2076 2080 2081 2084 2089 2091 2105 2117 2120 2121
 2135 2139 2141 2146 2159 2163 2169 2171 2176 2177 2187 2197 2217 2222 2237
 2251 2257 2260 2280 2282 2292 2322 2334 2335 2338 2339 2344 2362 2371 2383
 2389 2415 2418 2425 2431 2435 2446 2483 2490 2494 2496 2517 2524 2554 2558
 2568 2577 2578 2584 2587]

You can see that ShuffleSplit has selected a random set of observations from the index of our data set for each fold of our cross-validation. Let's look at the size of our training and test set for each fold.



In [9]:

    
cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", len(train_index), "TEST:", len(test_index))









    



TRAIN: 2345 TEST: 261
TRAIN: 2345 TEST: 261
TRAIN: 2345 TEST: 261
TRAIN: 2345 TEST: 261
TRAIN: 2345 TEST: 261
TRAIN: 2345 TEST: 261
TRAIN: 2346 TEST: 260
TRAIN: 2346 TEST: 260
TRAIN: 2346 TEST: 260
TRAIN: 2346 TEST: 260

Now let's try using KFold to train and test our model on 10 different subsets of our data. Below we set our cross-validator as 'cv'. We then loop through the various splits in our data that cv creates and use it to make our training and test sets. We then use our training set to fit a Logistic Regression model and generate predictions from our test set, which we compare to the actual outcomes we observed.



In [10]:

    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=0)

## Create for-loop
for train_index, test_index in cv.split(data):

    ## Define training and test sets
    X_train = data.loc[train_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_train = data.loc[train_index]['activity']
    X_test = data.loc[test_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_test = data.loc[test_index]['activity']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))









    



Precision: 56.9
Precision: 55.8
Precision: 60.4
Precision: 63.5
Precision: 55.4
Precision: 52.8
Precision: 56.9
Precision: 57.1
Precision: 65.8
Precision: 41.0

We can see that, for the most part, about 50 to 60 percent of the inspections our model predicts will lead our inspectors to rat burrows actually do. This is a modest improvement over our inspectors' current performance in the field. Based on these results, if we used our models to determine which locations our inspectors go to in the field, we'd probably see a 10 to 20 point increase in their likelihood of finding rat burrows.

Exercise 1

Try running the k-fold cross-validation a few times with the same random state. Then try running it a few times with different random states. How do the results change?



In [11]:

    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=0)

## Create for-loop
for train_index, test_index in cv.split(data):

    ## Define training and test sets
    X_train = data.loc[train_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_train = data.loc[train_index]['activity']
    X_test = data.loc[test_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_test = data.loc[test_index]['activity']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))









    



Precision: 56.9
Precision: 55.8
Precision: 60.4
Precision: 63.5
Precision: 55.4
Precision: 52.8
Precision: 56.9
Precision: 57.1
Precision: 65.8
Precision: 41.0



In [12]:

    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=1)

## Create for-loop
for train_index, test_index in cv.split(data):

    ## Define training and test sets
    X_train = data.loc[train_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_train = data.loc[train_index]['activity']
    X_test = data.loc[test_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_test = data.loc[test_index]['activity']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))









    



Precision: 60.3
Precision: 52.7
Precision: 57.4
Precision: 60.4
Precision: 57.1
Precision: 47.9
Precision: 54.2
Precision: 43.2
Precision: 71.4
Precision: 53.3



In [13]:

    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=17)

## Create for-loop
for train_index, test_index in cv.split(data):

    ## Define training and test sets
    X_train = data.loc[train_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_train = data.loc[train_index]['activity']
    X_test = data.loc[test_index].drop(['activity', 'month', 'WARD'], axis=1)
    y_test = data.loc[test_index]['activity']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))









    



Precision: 60.0
Precision: 62.2
Precision: 62.7
Precision: 47.7
Precision: 53.2
Precision: 55.7
Precision: 43.6
Precision: 47.5
Precision: 55.4
Precision: 60.4

Different random states create different measures of precision/datapoints across the 10 subsamples. However, the mean still seems to be in the mid 50's,suggesting some level of consitency

It's important to point out here that, because we have TIME SERIES data, the same Census blocks may be appearing in our training AND our test sets. This is a challenge to ensuring that our training and test samples are INDEPENDENT. While Rodent Control does not inspect the same blocks every month, some of the same blocks may be re-inspected from month to month depending on where 311 requests are coming from.

However, this also affords us an opportunity. More than likely, when we make predictions about which inspections will lead our inspectors to rat burrows, we are interested in predicting FUTURE inspections with observations from PAST inspections. In this case, cross-validating over time can be a very good way of looking at how well our models are performing.

Cross-validating over time requires more than just splitting by month. Rather, we will use observations from each month as a test set and train our models on all PRIOR months. Which we do below.

Cross-validation by Month

Let's begin by seeing what our cross-validation sets look like. Below, we loop through each of the sets to see which months end up in our training and test sets. You can see that as we move from month to month, we have more and more past observations in our training set.



In [14]:

    
months = np.sort(data.month.unique())

for month in range(2,13):
    test = data[data.month==month]
    train = data[(data.month < month)]

    print('Test Month: '+str(test.month.unique()), 'Training Months: '+str(train.month.unique()))









    



Test Month: [2] Training Months: [1]
Test Month: [3] Training Months: [1 2]
Test Month: [4] Training Months: [1 2 3]
Test Month: [5] Training Months: [1 2 3 4]
Test Month: [6] Training Months: [1 2 3 4 5]
Test Month: [7] Training Months: [1 2 3 4 5 6]
Test Month: [8] Training Months: [1 2 3 4 5 6 7]
Test Month: [9] Training Months: [1 2 3 4 5 6 7 8]
Test Month: [10] Training Months: [1 2 3 4 5 6 7 8 9]
Test Month: [11] Training Months: [ 1  2  3  4  5  6  7  8  9 10]
Test Month: [12] Training Months: [ 1  2  3  4  5  6  7  8  9 10 11]



In [15]:

    
months = np.sort(data.month.unique())

for month in range(2,13):

    test = data[data.month==month]
    train = data[(data.month < month)]
    X_test = test.drop(['activity', 'month', 'WARD'], axis=1)
    y_test = test['activity']
    X_train = test.drop(['activity', 'month', 'WARD'], axis=1)
    y_train = test['activity']
        
    clf = LogisticRegression()
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)
    print('Precision for Month '+str(month)+': '+str(100*round(precision_score(y_test, predicted),3)))









    



Precision for Month 2: 79.2
Precision for Month 3: 67.9
Precision for Month 4: 48.9
Precision for Month 5: 63.5
Precision for Month 6: 61.3
Precision for Month 7: 67.9
Precision for Month 8: 67.2
Precision for Month 9: 67.6
Precision for Month 10: 57.3
Precision for Month 11: 68.3
Precision for Month 12: 70.5

Our model seems to be performing even better when we cross-validate over months, possibly because we're structuring the cross-validation such that inspections in some of the same blocks appear consistently over time.

Exercise 2

Try re-creating this cross-validation, but with the training set restricted to only the 3 months prior to the test set. Now do the same with the last 1 and 2 months. Do the results change?



In [16]:

    
months = np.sort(data.month.unique())

for month in range(2,13):
    test = data[data.month==month]
    train = data[data.month>=month-3] 
    train = train.drop[train.month>=month]
    print('Test Month: '+str(test.month.unique()), 'Training Months: '+str(train.month.unique()))









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-6ab3699e1c10> in <module>()
      4     test = data[data.month==month]
      5     train = data[data.month>=month-3]
----> 6     train = train.drop[train.month>=month]
      7     print('Test Month: '+str(test.month.unique()), 'Training Months: '+str(train.month.unique()))

TypeError: 'method' object is not subscriptable



In [ ]:



In [ ]:



In [ ]:

We may still be concerned about the independence of our training and test sets. In particular, as I've pointed out, the same Census blocks may appear repeatedly in our data over time. In this case, it may be good to cross-validate geographically to make sure that our model is performing well in different parts of the city. In particular, we know that requests for rodent abatement (and rats themselves) are more common in some parts of the city than in others. In particular, rats are more common in the more densely-populated parts of downtown and less common in less densely-populated places like Wards 3, 7, and 8. Therefore, we may be interested in cross-validating by ward.

Again, this is as simple as looping through each of the 8 wards, holding out each ward as a test set and training the models on observations from the remaining wards.

Cross-validate by Ward



In [ ]:

    
data.WARD.value_counts().sort_index()



In [75]:

    
for ward in np.sort(data.WARD.unique()):

    test = data[data.WARD == ward]
    train = data[data.WARD != ward]
    X_test = test.drop(['activity', 'month', 'WARD'], axis=1)
    y_test = test['activity']
    X_train = test.drop(['activity', 'month', 'WARD'], axis=1)
    y_train = test['activity']
        
    clf = LogisticRegression()
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)
    print('Precision for Ward '+str(ward)+': '+str(100*round(precision_score(y_test, predicted),3)))









    



Precision for Ward 1: 60.5
Precision for Ward 2: 60.2
Precision for Ward 3: 81.2
Precision for Ward 4: 62.0
Precision for Ward 5: 62.3
Precision for Ward 6: 55.6
Precision for Ward 7: 0.0
Precision for Ward 8: 0.0






    



/opt/conda/lib/python3.6/site-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)

Here we see that the model performs very well predicting the outcomes of inspections in wards 1 through 4, but less well in wards 5 though 8. In wards 7 and 8 in particular, the model fails to predict any positive cases. This means that our model may be overfit to observations in Wards 1 through 6, and we may want to re-evaluate our approach.

Exercise 3

Explore the data and our model and try to come up with some reasons that the model is performing poorly on Wards 7 and 8. Is there a way we can fix the model to perform better on those wards? How might we fix the model?



In [19]:

    
data.head().T









    Out[19]:







  
    
      
      0
      1
      2
      3
      4
    
  
  
    
      activity
      1.000000
      0.000000
      0.000000
      1.000000
      1.000000
    
    
      alley_condition
      10.000000
      25.000000
      15.000000
      0.000000
      10.000000
    
    
      bbl_hotel
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      bbl_multifamily_rental
      1.000000
      0.000000
      0.000000
      1.000000
      2.000000
    
    
      bbl_restaurant
      1.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      bbl_single_family_rental
      2.000000
      8.000000
      3.000000
      3.000000
      1.000000
    
    
      bbl_storage
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      bbl_two_family_rental
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      communitygarden_area
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      communitygarden_id
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      dcrapermit_addition
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      dcrapermit_demolition
      0.000000
      0.000000
      0.000000
      1.000000
      0.000000
    
    
      dcrapermit_excavation
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      dcrapermit_new_building
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      dcrapermit_raze
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      impervious_area
      25753.280567
      20933.568033
      11002.566472
      10855.879189
      15560.200354
    
    
      month
      1.000000
      1.000000
      1.000000
      1.000000
      1.000000
    
    
      num_mixed_use
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      num_non_residential
      17.000000
      0.000000
      0.000000
      4.000000
      2.000000
    
    
      num_residential
      54.000000
      102.000000
      26.000000
      14.000000
      64.000000
    
    
      park
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      pct_mixed_use
      0.000000
      0.000000
      0.000000
      0.000000
      0.014925
    
    
      pct_non_residential
      0.239437
      0.000000
      0.000000
      0.222222
      0.029851
    
    
      pct_residential
      0.760563
      1.000000
      1.000000
      0.777778
      0.955224
    
    
      pop_density
      5069.093059
      8738.804149
      12064.771513
      32435.135124
      43599.659702
    
    
      sidewalk_grates
      2.000000
      0.000000
      0.000000
      1.000000
      3.000000
    
    
      ssl_cndtn_Average_comm
      0.000000
      0.000000
      0.000000
      0.000000
      0.285714
    
    
      ssl_cndtn_Average_res
      0.583333
      0.627451
      0.720000
      0.416667
      0.622642
    
    
      ssl_cndtn_Excellent_comm
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      ssl_cndtn_Excellent_res
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      ssl_cndtn_Fair_comm
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      ssl_cndtn_Fair_res
      0.041667
      0.000000
      0.000000
      0.000000
      0.037736
    
    
      ssl_cndtn_Good_comm
      0.400000
      0.000000
      0.000000
      0.000000
      0.142857
    
    
      ssl_cndtn_Good_res
      0.312500
      0.343137
      0.240000
      0.416667
      0.339623
    
    
      ssl_cndtn_Poor_comm
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      ssl_cndtn_Poor_res
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      ssl_cndtn_VeryGood_comm
      0.600000
      0.000000
      0.000000
      1.000000
      0.571429
    
    
      ssl_cndtn_VeryGood_res
      0.062500
      0.029412
      0.040000
      0.166667
      0.000000
    
    
      tot_pop
      137.000000
      273.000000
      81.000000
      132.000000
      216.000000
    
    
      well_activity
      0.000000
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      WARD
      3.000000
      3.000000
      3.000000
      2.000000
      2.000000



In [24]:

    
data.describe().T









    Out[24]:







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      activity
      2606.0
      0.431696
      0.495408
      0.000000
      0.000000
      0.000000
      1.000000
      1.000000
    
    
      alley_condition
      2606.0
      11.111282
      8.900166
      0.000000
      4.000000
      10.000000
      16.000000
      79.000000
    
    
      bbl_hotel
      2606.0
      0.082118
      0.376073
      0.000000
      0.000000
      0.000000
      0.000000
      8.000000
    
    
      bbl_multifamily_rental
      2606.0
      1.388718
      2.376244
      0.000000
      0.000000
      0.000000
      2.000000
      22.000000
    
    
      bbl_restaurant
      2606.0
      0.569455
      1.518526
      0.000000
      0.000000
      0.000000
      0.000000
      17.000000
    
    
      bbl_single_family_rental
      2606.0
      4.709133
      8.375165
      0.000000
      1.000000
      2.000000
      5.000000
      147.000000
    
    
      bbl_storage
      2606.0
      0.002686
      0.051768
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      bbl_two_family_rental
      2606.0
      0.743668
      1.378860
      0.000000
      0.000000
      0.000000
      1.000000
      15.000000
    
    
      communitygarden_area
      2606.0
      18.727920
      326.332382
      0.000000
      0.000000
      0.000000
      0.000000
      11004.319881
    
    
      communitygarden_id
      2606.0
      0.242134
      3.234069
      0.000000
      0.000000
      0.000000
      0.000000
      80.000000
    
    
      dcrapermit_addition
      2606.0
      0.054490
      0.265959
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      dcrapermit_demolition
      2606.0
      0.280507
      0.738300
      0.000000
      0.000000
      0.000000
      0.000000
      12.000000
    
    
      dcrapermit_excavation
      2606.0
      0.010361
      0.105000
      0.000000
      0.000000
      0.000000
      0.000000
      2.000000
    
    
      dcrapermit_new_building
      2606.0
      0.053722
      0.286938
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      dcrapermit_raze
      2606.0
      0.044896
      0.308390
      0.000000
      0.000000
      0.000000
      0.000000
      5.000000
    
    
      impervious_area
      2606.0
      18450.549489
      18774.921307
      2150.037473
      11356.602831
      14532.464194
      19920.166029
      473222.487756
    
    
      month
      2606.0
      7.194935
      3.001022
      1.000000
      5.000000
      7.000000
      10.000000
      12.000000
    
    
      num_mixed_use
      2606.0
      0.153876
      0.474006
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      num_non_residential
      2606.0
      3.804682
      5.956966
      0.000000
      0.000000
      1.000000
      5.000000
      58.000000
    
    
      num_residential
      2606.0
      39.287797
      26.661136
      0.000000
      21.000000
      36.000000
      53.000000
      334.000000
    
    
      park
      2606.0
      0.046815
      0.225350
      0.000000
      0.000000
      0.000000
      0.000000
      4.000000
    
    
      pct_mixed_use
      2606.0
      0.003899
      0.013706
      0.000000
      0.000000
      0.000000
      0.000000
      0.250000
    
    
      pct_non_residential
      2606.0
      0.139389
      0.242231
      0.000000
      0.000000
      0.038462
      0.150000
      1.000000
    
    
      pct_residential
      2606.0
      0.854794
      0.247308
      0.000000
      0.838200
      0.959184
      1.000000
      1.000000
    
    
      pop_density
      2606.0
      24969.466549
      19217.566990
      0.000000
      13524.013052
      21965.055541
      30907.352412
      182709.507271
    
    
      sidewalk_grates
      2606.0
      2.109363
      5.186348
      0.000000
      0.000000
      0.000000
      2.000000
      73.000000
    
    
      ssl_cndtn_Average_comm
      2606.0
      0.402058
      0.401132
      0.000000
      0.000000
      0.333333
      0.800000
      1.000000
    
    
      ssl_cndtn_Average_res
      2606.0
      0.574651
      0.265928
      0.000000
      0.444444
      0.635642
      0.767857
      1.000000
    
    
      ssl_cndtn_Excellent_comm
      2606.0
      0.022413
      0.095254
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Excellent_res
      2606.0
      0.001871
      0.029081
      0.000000
      0.000000
      0.000000
      0.000000
      0.766990
    
    
      ssl_cndtn_Fair_comm
      2606.0
      0.024810
      0.101711
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Fair_res
      2606.0
      0.018239
      0.047230
      0.000000
      0.000000
      0.000000
      0.017857
      0.666667
    
    
      ssl_cndtn_Good_comm
      2606.0
      0.162147
      0.261734
      0.000000
      0.000000
      0.000000
      0.250000
      1.000000
    
    
      ssl_cndtn_Good_res
      2606.0
      0.285093
      0.206829
      0.000000
      0.133333
      0.266667
      0.403509
      1.000000
    
    
      ssl_cndtn_Poor_comm
      2606.0
      0.002487
      0.032450
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_Poor_res
      2606.0
      0.002067
      0.010527
      0.000000
      0.000000
      0.000000
      0.000000
      0.200000
    
    
      ssl_cndtn_VeryGood_comm
      2606.0
      0.104810
      0.232111
      0.000000
      0.000000
      0.000000
      0.000000
      1.000000
    
    
      ssl_cndtn_VeryGood_res
      2606.0
      0.036727
      0.071472
      0.000000
      0.000000
      0.000000
      0.052632
      1.000000
    
    
      tot_pop
      2606.0
      188.892172
      211.199274
      0.000000
      82.000000
      137.000000
      231.000000
      3888.000000
    
    
      well_activity
      2606.0
      0.572141
      1.569373
      0.000000
      0.000000
      0.000000
      0.000000
      19.000000
    
    
      WARD
      2606.0
      3.994244
      2.112836
      1.000000
      2.000000
      4.000000
      6.000000
      8.000000



In [41]:

    
data.bbl_restaurant.value_counts()









    Out[41]:





0.0     2015
1.0      260
2.0      137
3.0       83
4.0       38
5.0       24
8.0       11
6.0       10
9.0        9
7.0        7
12.0       4
10.0       2
15.0       2
11.0       2
17.0       1
14.0       1
Name: bbl_restaurant, dtype: int64



In [42]:

    
data.groupby('WARD').bbl_restaurant.value_counts()









    Out[42]:





WARD  bbl_restaurant
1     0.0               283
      1.0                61
      2.0                37
      3.0                18
      4.0                 8
      6.0                 2
      8.0                 2
      11.0                2
      12.0                2
      5.0                 1
      9.0                 1
      15.0                1
      17.0                1
2     0.0               255
      1.0                70
      2.0                49
      3.0                37
      4.0                23
      5.0                19
      6.0                 7
      7.0                 5
      9.0                 5
      8.0                 4
      10.0                2
      14.0                1
3     0.0                71
      1.0                11
      2.0                10
      3.0                 5
      9.0                 3
      7.0                 2
      4.0                 1
      5.0                 1
      8.0                 1
4     0.0               431
      1.0                44
      3.0                11
      2.0                10
5     0.0               336
      1.0                10
      2.0                 7
      3.0                 1
      4.0                 1
6     0.0               358
      1.0                51
      2.0                22
      3.0                11
      4.0                 5
      8.0                 4
      5.0                 3
      12.0                2
      6.0                 1
      15.0                1
7     0.0               148
      1.0                 5
      2.0                 1
8     0.0               133
      1.0                 8
      2.0                 1
Name: bbl_restaurant, dtype: int64



In [65]:

    
data.groupby(data.activity==1).WARD.value_counts().sort_values(ascending = False)









    Out[65]:





activity  WARD
False     6       353
True      2       271
False     4       260
True      4       236
          1       231
False     2       206
True      5       199
False     1       188
          5       156
          7       134
          8       119
True      6       105
False     3        65
True      3        40
          8        23
          7        20
Name: WARD, dtype: int64



In [79]:

    
data.groupby('WARD').tot_pop.sum().sort_values(ascending = False)









    Out[79]:





WARD
1    123400
2     90829
4     72237
6     66442
5     58077
3     27970
7     27699
8     25599
Name: tot_pop, dtype: int64

Looks like Ward 3, 7, 8 are about the same size. Model is MOST Accurate in Ward 3 and not accurate at all in Wards 7 and 8. Maybe our model is overfit to 3 -- what about these wards are different?



In [96]:

    
three = data[data.WARD==3]
seven = data[data.WARD==7]
eight = data[data.WARD==8]

three.activity.value_counts()









    Out[96]:





0.0    65
1.0    40
Name: activity, dtype: int64



In [97]:

    
seven.activity.value_counts()









    Out[97]:





0.0    134
1.0     20
Name: activity, dtype: int64



In [98]:

    
eight.activity.value_counts()









    Out[98]:





0.0    119
1.0     23
Name: activity, dtype: int64



In [108]:

    
data.groupby('WARD').activity.value_counts(sort=True)









    Out[108]:





WARD  activity
1     1.0         231
      0.0         188
2     1.0         271
      0.0         206
3     0.0          65
      1.0          40
4     0.0         260
      1.0         236
5     1.0         199
      0.0         156
6     0.0         353
      1.0         105
7     0.0         134
      1.0          20
8     0.0         119
      1.0          23
Name: activity, dtype: int64

Ward's 5-8 have different active-not active ratios. They're most out of whack in 6-8, the least accurate of our predictions. Ratio of not active/to active is about 1 (give or take .25) for every value.

Things are about 3 times as inactive and 5-6 times as inactive in wards 6-8. Ward 3, the sample we could be overfitting to is about 1.5 times more inactive. A bit weird, but more reasonable. This could be a source of our issue?

Exercise 4

Now try running some cross-validations with the data from your project. What are some different ways you might slice the data you're using for your project? Try them out here. This will be a good way to begin making progress toward your final submission.

PLEASE REMEMBER TO SUBMIT THIS HOMEWORK BY CLASS TIME ON THURSDAY.



In [124]:

    
data.describe().T









    Out[124]:







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      X
      53155.0
      -7.701390e+01
      3.896925e-02
      -7.711317e+01
      -7.703758e+01
      -7.701876e+01
      -7.699048e+01
      -7.691005e+01
    
    
      Y
      53155.0
      3.891153e+01
      2.755631e-02
      3.881366e+01
      3.889805e+01
      3.891023e+01
      3.892769e+01
      3.899398e+01
    
    
      OBJECTID
      53155.0
      2.584204e+07
      4.772180e+04
      2.557148e+07
      2.580968e+07
      2.583192e+07
      2.585243e+07
      2.599323e+07
    
    
      BBL_LICENSE_FACT_ID
      53155.0
      3.573744e+05
      3.762054e+04
      3.117700e+05
      3.340900e+05
      3.484900e+05
      3.606180e+05
      4.720330e+05
    
    
      CUST_NUM
      53155.0
      3.838201e+11
      2.744648e+11
      1.950124e+07
      7.010872e+07
      4.103160e+11
      5.005168e+11
      9.313170e+11
    
    
      LATITUDE
      53155.0
      3.891153e+01
      2.755631e-02
      3.881365e+01
      3.889804e+01
      3.891022e+01
      3.892768e+01
      3.899398e+01
    
    
      LONGITUDE
      53155.0
      -7.701390e+01
      3.896912e-02
      -7.711317e+01
      -7.703758e+01
      -7.701876e+01
      -7.699048e+01
      -7.691005e+01
    
    
      XCOORD
      53155.0
      3.987948e+05
      3.379723e+03
      3.901879e+05
      3.967413e+05
      3.983744e+05
      4.008254e+05
      4.078038e+05
    
    
      YCOORD
      53155.0
      1.381857e+05
      3.059101e+03
      1.273200e+05
      1.366886e+05
      1.380399e+05
      1.399780e+05
      1.473384e+05
    
    
      ZIPCODE
      53068.0
      2.001509e+04
      2.909145e+01
      2.000100e+04
      2.000500e+04
      2.001000e+04
      2.001800e+04
      2.059300e+04
    
    
      MARADDRESSREPOSITORYID
      53155.0
      2.280445e+05
      8.922920e+04
      0.000000e+00
      2.250700e+05
      2.445020e+05
      2.896090e+05
      3.147560e+05
    
    
      DCSTATADDRESSKEY
      53155.0
      1.390966e+05
      1.283098e+05
      2.000000e+00
      6.841700e+04
      8.885700e+04
      1.285030e+05
      4.819170e+05
    
    
      DCSTATLOCATIONKEY
      53155.0
      1.309599e+05
      1.090843e+05
      2.000000e+00
      6.841700e+04
      8.885700e+04
      1.285030e+05
      4.138170e+05
    
    
      WARD
      53155.0
      4.105954e+00
      2.080958e+00
      1.000000e+00
      2.000000e+00
      4.000000e+00
      6.000000e+00
      8.000000e+00



In [132]:

    
df1 = pd.read_csv('https://opendata.arcgis.com/datasets/82ab09c9541b4eb8ba4b537e131998ce_22.csv')

df2 = pd.read_csv('https://opendata.arcgis.com/datasets/f2e1c2ef9eb44f2899f4a310a80ecec9_2.csv')



In [131]:









    Out[131]:







  
    
      
      0
      1
      2
      3
      4
    
  
  
    
      X
      -76.9841
      -76.9841
      -76.9841
      -76.9841
      -76.9841
    
    
      Y
      38.8363
      38.8363
      38.8363
      38.8363
      38.8363
    
    
      OBJECTID
      25571475
      25571476
      25571477
      25571478
      25571479
    
    
      BBL_LICENSE_FACT_ID
      313246
      313246
      313246
      313246
      313246
    
    
      LICENSESTATUS
      ACTIVE
      ACTIVE
      ACTIVE
      ACTIVE
      ACTIVE
    
    
      LICENSECATEGORY
      Charitable Solicitation
      Charitable Solicitation
      Charitable Solicitation
      Charitable Solicitation
      Charitable Solicitation
    
    
      CUST_NUM
      400212000180
      400212000180
      400212000180
      400212000180
      400212000180
    
    
      TRADE_NAME
      BREATHE DC
      BREATHE DC
      BREATHE DC
      BREATHE DC
      BREATHE DC
    
    
      LICENSE_START_DATE
      2016-03-01T00:00:00.000Z
      2016-03-01T00:00:00.000Z
      2016-03-01T00:00:00.000Z
      2016-03-01T00:00:00.000Z
      2016-03-01T00:00:00.000Z
    
    
      LICENSE_EXPIRATION_DATE
      2018-02-28T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      2018-02-28T00:00:00.000Z
    
    
      LICENSE_ISSUE_DATE
      2016-02-25T00:00:00.000Z
      2016-02-25T00:00:00.000Z
      2016-02-25T00:00:00.000Z
      2016-02-25T00:00:00.000Z
      2016-02-25T00:00:00.000Z
    
    
      AGENT_PHONE
      2027265555
      2027265555
      2027265555
      2027265555
      2027265555
    
    
      LASTMODIFIEDDATE
      2017-06-20T15:10:12.000Z
      2017-06-20T15:10:12.000Z
      2017-06-20T15:10:12.000Z
      2017-06-20T15:10:12.000Z
      2017-06-20T15:10:12.000Z
    
    
      CITY
      WASHINGTON
      WASHINGTON
      WASHINGTON
      WASHINGTON
      WASHINGTON
    
    
      STATE
      DC
      DC
      DC
      DC
      DC
    
    
      SITEADDRESS
      1310 SOUTHERN AVENUE SE
      1310 SOUTHERN AVENUE SE
      1310 SOUTHERN AVENUE SE
      1310 SOUTHERN AVENUE SE
      1310 SOUTHERN AVENUE SE
    
    
      LATITUDE
      38.8363
      38.8363
      38.8363
      38.8363
      38.8363
    
    
      LONGITUDE
      -76.9841
      -76.9841
      -76.9841
      -76.9841
      -76.9841
    
    
      XCOORD
      401383
      401383
      401383
      401383
      401383
    
    
      YCOORD
      129834
      129834
      129834
      129834
      129834
    
    
      ZIPCODE
      20032
      20032
      20032
      20032
      20032
    
    
      MARADDRESSREPOSITORYID
      277936
      277936
      277936
      277936
      277936
    
    
      DCSTATADDRESSKEY
      120049
      120049
      120049
      120049
      120049
    
    
      DCSTATLOCATIONKEY
      120049
      120049
      120049
      120049
      120049
    
    
      WARD
      8
      8
      8
      8
      8
    
    
      ANC
      8E
      8E
      8E
      8E
      8E
    
    
      SMD
      8E03
      8E03
      8E03
      8E03
      8E03
    
    
      DISTRICT
      SEVENTH
      SEVENTH
      SEVENTH
      SEVENTH
      SEVENTH
    
    
      PSA
      706
      706
      706
      706
      706
    
    
      NEIGHBORHOODCLUSTER
      38
      38
      38
      38
      38
    
    
      HOTSPOT2006NAME
      NONE
      NONE
      NONE
      NONE
      NONE
    
    
      HOTSPOT2005NAME
      NONE
      NONE
      NONE
      NONE
      NONE
    
    
      HOTSPOT2004NAME
      NONE
      NONE
      NONE
      NONE
      NONE
    
    
      BUSINESSIMPROVEMENTDISTRICT
      NONE
      NONE
      NONE
      NONE
      NONE



In [133]:

    
df2.describe().T









    Out[133]:







  
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
  
  
    
      OBJECTID
      22.0
      11.500000
      6.493587
      1.000000
      6.25000
      11.500000
      16.75
      22.00
    
    
      LAST_UPDAT
      22.0
      2009.000000
      0.000000
      2009.000000
      2009.00000
      2009.000000
      2009.00
      2009.00
    
    
      X
      22.0
      397703.338738
      3329.244112
      391615.268950
      395425.71250
      397380.225005
      400186.49
      406439.23
    
    
      Y
      22.0
      138079.263527
      2909.785057
      130590.010007
      136453.63472
      137979.804471
      139832.70
      144234.39
    
    
      ADDRID
      22.0
      132709.454545
      134544.668464
      5616.000000
      15750.00000
      18006.000000
      275965.25
      307772.00
    
    
      PRIORITY_LEVEL
      13.0
      1.000000
      0.000000
      1.000000
      1.00000
      1.000000
      1.00
      1.00



In [162]:

    
df1 = pd.read_csv('https://opendata.arcgis.com/datasets/82ab09c9541b4eb8ba4b537e131998ce_22.csv')

df2 = pd.read_csv('https://opendata.arcgis.com/datasets/f2e1c2ef9eb44f2899f4a310a80ecec9_2.csv')


DF = df1.merge(df2, on ='X', how = 'outer')

from sklearn.metrics import precision_score
from sklearn.model_selection import KFold

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", train_index, "TEST:", test_index)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", len(train_index), "TEST:", len(test_index))
    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=0)

## Create for-loop
for train_index, test_index in cv.split(data):

## Define training and test sets
    X_train = DF.loc[train_index].drop(['BBL_LICENSE_FACT_ID'], axis=1)
    y_train = DF.loc[train_index]['BBL_LICENSE_FACT_ID']
    X_test = DF.loc[test_index].drop(['BBL_LICENSE_FACT_ID'], axis=1)
    y_test = DF.loc[test_index]['BBL_LICENSE_FACT_ID']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))



In [164]:

    
DF









    Out[164]:







  
    
      
      X
      Y_x
      OBJECTID_x
      BBL_LICENSE_FACT_ID
      LICENSESTATUS
      LICENSECATEGORY
      CUST_NUM
      TRADE_NAME
      LICENSE_START_DATE
      LICENSE_EXPIRATION_DATE
      ...
      WEB_URL
      WEB_URL2
      STEWARD
      SOURCE
      LAST_UPDAT
      Y_y
      ADDRID
      PRIORITY_LEVEL
      DESCRIPTION
      EMAIL_PHONE
    
  
  
    
      0
      -76.984071
      38.836306
      25571475.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      1
      -76.984071
      38.836306
      25571476.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      2
      -76.984071
      38.836306
      25571477.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      3
      -76.984071
      38.836306
      25571478.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      4
      -76.984071
      38.836306
      25571479.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      5
      -76.984071
      38.836306
      25571480.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      6
      -76.984071
      38.836306
      25571481.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      7
      -76.984071
      38.836306
      25571482.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      8
      -76.984071
      38.836306
      25571483.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      9
      -76.984071
      38.836306
      25571484.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      10
      -76.984071
      38.836306
      25571485.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      11
      -76.984071
      38.836306
      25571486.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      12
      -76.984071
      38.836306
      25571487.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      13
      -76.984071
      38.836306
      25571488.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      14
      -76.984071
      38.836306
      25784321.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      15
      -76.984071
      38.836306
      25784322.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      16
      -76.984071
      38.836306
      25784323.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      17
      -76.984071
      38.836306
      25784324.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      18
      -76.984071
      38.836306
      25784325.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      19
      -76.984071
      38.836306
      25784326.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      20
      -76.984071
      38.836306
      25784327.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      21
      -76.984071
      38.836306
      25784328.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      22
      -76.984071
      38.836306
      25784329.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      23
      -76.984071
      38.836306
      25784330.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      24
      -76.984071
      38.836306
      25784331.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      25
      -76.984071
      38.836306
      25784332.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      26
      -76.984071
      38.836306
      25784333.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      27
      -76.984071
      38.836306
      25784334.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      28
      -76.984071
      38.836306
      25784335.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      29
      -76.984071
      38.836306
      25784336.0
      313246.0
      ACTIVE
      Charitable Solicitation
      4.002120e+11
      BREATHE DC
      2016-03-01T00:00:00.000Z
      2018-02-28T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      53147
      -76.998701
      38.903386
      25991812.0
      470369.0
      EXPIRED
      One Family Rental
      5.005149e+11
      Matthew Godwin
      2014-03-01T00:00:00.000Z
      2016-02-29T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53148
      -77.065015
      38.933024
      25991814.0
      470371.0
      EXPIRED
      One Family Rental
      5.005149e+11
      Lindsay Lincoln
      2014-03-01T00:00:00.000Z
      2016-02-29T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53149
      -77.071379
      38.933291
      25991818.0
      470375.0
      EXPIRED
      One Family Rental
      5.005149e+11
      Cohen/Kornblut residence; Jonathan Cohen
      2014-06-01T00:00:00.000Z
      2016-05-31T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53150
      -77.074664
      38.937192
      25992005.0
      470376.0
      EXPIRED
      One Family Rental
      5.005149e+11
      Jay Hewlin ; Jay
      2014-09-01T00:00:00.000Z
      2016-08-31T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53151
      -77.042141
      38.930821
      25992197.0
      470281.0
      EXPIRED
      Apartment
      6.600022e+07
      NIELS THIESS
      2014-01-01T00:00:00.000Z
      2015-12-31T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53152
      -77.030820
      38.921718
      25992208.0
      470294.0
      EXPIRED
      Apartment
      6.800379e+07
      WARDMAN COURT CO/OF INTERSTATE REALTY MGR CO
      2014-06-01T00:00:00.000Z
      2016-05-31T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53153
      -76.935101
      38.890048
      25992430.0
      470658.0
      EXPIRED
      Two Family Rental
      7.010751e+07
      NaN
      2014-09-01T00:00:00.000Z
      2016-08-31T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53154
      -76.980528
      38.866850
      25993234.0
      470560.0
      EXPIRED
      One Family Rental
      6.800645e+07
      HENRY H STRONG
      2014-10-01T00:00:00.000Z
      2016-09-30T00:00:00.000Z
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      0
      NaN
      NaN
    
    
      53155
      395248.988794
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      137857.987965
      15719.0
      0
      Locally grown fresh fruits and vegetables. Apr...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53156
      402347.350000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      135781.360000
      307772.0
      0
      Locally grown fresh fruits and vegetables. Sum...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53157
      396725.040000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://mtpfm.com/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      140373.590000
      226249.0
      0
      Locally grown fresh fruits and vegetables. May...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53158
      395414.540000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      NaN
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      139969.700000
      219212.0
      0
      Locally grown fresh fruits and vegetables. Sum...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53159
      400773.519999
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      140627.060000
      16406.0
      0
      Locally grown fresh fruits and vegetables. WIC...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53160
      400134.560000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      www.congressheightsontherise.com
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      130590.010007
      17957.0
      0
      Locally grown fresh fruits and vegetables. Sat...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53161
      391615.268950
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.palisadesfarmersmarket.com/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      138847.436638
      17316.0
      0
      Locally grown fresh fruits and vegetables. Sun...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53162
      393836.944730
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.newmorningfarm.net/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      142134.741057
      12254.0
      0
      Locally grown fresh fruits and vegetables. Sat...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53163
      406439.230000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      136608.140000
      5742.0
      0
      Locally grown fresh fruits and vegetables. Fri...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53164
      394042.440000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.chevychasefarmersmarket.org/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      144234.390000
      263178.0
      0
      Locally grown fresh fruits and vegetables. Sat...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53165
      398008.015772
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.freshfarmmarkets.org/
      http://www.freshfarmmarket.org/
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      136402.132960
      11946.0
      0
      Locally grown fresh fruits and vegetables. Apr...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53166
      400330.039998
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.easternmarket-dc.org/
      www.goodgenerallink.org
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      135504.990010
      18055.0
      0
      DC's oldest continually operated fresh food pu...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53167
      397531.320000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.ams.usda.gov/AMSv1.0/farmersmarkets
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      135490.010000
      294873.0
      0
      Locally grown fresh fruits and vegetables. Fri...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53168
      398132.500000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      135138.410000
      276612.0
      0
      Seasonal produce direct from farmers. Tuesday.
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53169
      398945.580000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.14andufarmersmarket.com/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      138304.650014
      15468.0
      0
      Locally grown fresh fruits and vegetables. May...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53170
      396107.063994
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://freshfarmmarkets.org/farmers_markets/ma...
      http://www.freshfarmmarket.org/
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      138086.718942
      5616.0
      0
      Locally grown fresh fruits and vegetables. Yea...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53171
      400226.570000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://freshfarmmarkets.org/farmers_markets/ma...
      http://www.freshfarmmarket.org/
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      136899.100000
      288803.0
      0
      Locally grown fresh fruits and vegetables. May...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53172
      395459.230000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.freshfarmmarket.org/markets/foggy_b...
      http://www.freshfarmmarket.org/
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      136966.330000
      274025.0
      0
      Locally grown fresh fruits and vegetables. Apr...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53173
      400203.800000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      137872.890000
      301991.0
      0
      Locally grown fresh fruits and vegetables. WIC...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53174
      396298.579990
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      139421.699999
      16877.0
      0
      Locally grown fresh fruits and vegetables. May...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53175
      397229.130010
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      http://www.14andufarmersmarket.com/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      138792.679998
      15843.0
      0
      Locally grown fresh fruits and vegetables. Apr...
      jennifer.guillaume@dc.gov _ 535-2252
    
    
      53176
      394423.740000
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      ...
      https://www.udc.edu/event/farmers-market/
      
      Jen
      http://www.dcfoodfinder.org/
      2009.0
      141839.770000
      297694.0
      0
      Locally grown fresh fruits and vegetables. Jul...
      jennifer.guillaume@dc.gov _ 535-2252
    
  

53177 rows × 51 columns



In [145]:

    
from sklearn.metrics import precision_score
from sklearn.model_selection import KFold

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", train_index, "TEST:", test_index)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in cv.split(data):
    print("TRAIN:", len(train_index), "TEST:", len(test_index))
    
## Define function
cv = KFold(n_splits=10, shuffle=True, random_state=0)

## Create for-loop
for train_index, test_index in cv.split(data):

## Define training and test sets
    X_train = DF.loc[train_index].drop(['BBL_LICENSE_FACT_ID'], axis=1)
    y_train = DF.loc[train_index]['BBL_LICENSE_FACT_ID']
    X_test = DF.loc[test_index].drop(['BBL_LICENSE_FACT_ID'], axis=1)
    y_test = DF.loc[test_index]['BBL_LICENSE_FACT_ID']
        
    ## Fit model
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    ## Generate predictions
    predicted = clf.predict(X_test)
    
    ## Compare to actual outcomes and return precision
    print('Precision: '+str(100 * round(precision_score(y_test, predicted),3)))









    



TRAIN: [    1     2     3 ..., 53152 53153 53154] TEST: [    0    17    18 ..., 53130 53142 53147]
TRAIN: [    0     2     4 ..., 53152 53153 53154] TEST: [    1     3    14 ..., 53140 53150 53151]
TRAIN: [    0     1     2 ..., 53152 53153 53154] TEST: [    6    33    62 ..., 53141 53144 53149]
TRAIN: [    0     1     2 ..., 53151 53152 53153] TEST: [   15    31    37 ..., 53112 53120 53154]
TRAIN: [    0     1     2 ..., 53152 53153 53154] TEST: [    4     7     9 ..., 53077 53132 53146]
TRAIN: [    0     1     2 ..., 53152 53153 53154] TEST: [   29    35    40 ..., 53121 53122 53125]
TRAIN: [    0     1     2 ..., 53151 53152 53154] TEST: [    8    12    23 ..., 53088 53101 53153]
TRAIN: [    0     1     3 ..., 53151 53153 53154] TEST: [    2     5    20 ..., 53143 53145 53152]
TRAIN: [    0     1     2 ..., 53152 53153 53154] TEST: [   11    13    21 ..., 53090 53094 53103]
TRAIN: [    0     1     2 ..., 53152 53153 53154] TEST: [   10    19    43 ..., 53128 53131 53148]
TRAIN: 47839 TEST: 5316
TRAIN: 47839 TEST: 5316
TRAIN: 47839 TEST: 5316
TRAIN: 47839 TEST: 5316
TRAIN: 47839 TEST: 5316
TRAIN: 47840 TEST: 5315
TRAIN: 47840 TEST: 5315
TRAIN: 47840 TEST: 5315
TRAIN: 47840 TEST: 5315
TRAIN: 47840 TEST: 5315






    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-145-a51de7a1b7be> in <module>()
     24     ## Fit model
     25     clf = LogisticRegression()
---> 26     clf.fit(X_train, y_train)
     27 
     28     ## Generate predictions

/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/logistic.py in fit(self, X, y, sample_weight)
   1171 
   1172         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
-> 1173                          order="C")
   1174         check_classification_targets(y)
   1175         self.classes_ = np.unique(y)

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    520                     ensure_2d, allow_nd, ensure_min_samples,
--> 521                     ensure_min_features, warn_on_dtype, estimator)
    522     if multi_output:
    523         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                       force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383 
    384         if ensure_2d:

ValueError: could not convert string to float: 'HENRY H STRONG'



In [ ]:

	count	mean	std	min	25%	50%	75%	max
activity	2606.0	0.431696	0.495408	0.000000	0.000000	0.000000	1.000000	1.000000
alley_condition	2606.0	11.111282	8.900166	0.000000	4.000000	10.000000	16.000000	79.000000
bbl_hotel	2606.0	0.082118	0.376073	0.000000	0.000000	0.000000	0.000000	8.000000
bbl_multifamily_rental	2606.0	1.388718	2.376244	0.000000	0.000000	0.000000	2.000000	22.000000
bbl_restaurant	2606.0	0.569455	1.518526	0.000000	0.000000	0.000000	0.000000	17.000000
bbl_single_family_rental	2606.0	4.709133	8.375165	0.000000	1.000000	2.000000	5.000000	147.000000
bbl_storage	2606.0	0.002686	0.051768	0.000000	0.000000	0.000000	0.000000	1.000000
bbl_two_family_rental	2606.0	0.743668	1.378860	0.000000	0.000000	0.000000	1.000000	15.000000
communitygarden_area	2606.0	18.727920	326.332382	0.000000	0.000000	0.000000	0.000000	11004.319881
communitygarden_id	2606.0	0.242134	3.234069	0.000000	0.000000	0.000000	0.000000	80.000000
dcrapermit_addition	2606.0	0.054490	0.265959	0.000000	0.000000	0.000000	0.000000	4.000000
dcrapermit_demolition	2606.0	0.280507	0.738300	0.000000	0.000000	0.000000	0.000000	12.000000
dcrapermit_excavation	2606.0	0.010361	0.105000	0.000000	0.000000	0.000000	0.000000	2.000000
dcrapermit_new_building	2606.0	0.053722	0.286938	0.000000	0.000000	0.000000	0.000000	4.000000
dcrapermit_raze	2606.0	0.044896	0.308390	0.000000	0.000000	0.000000	0.000000	5.000000
impervious_area	2606.0	18450.549489	18774.921307	2150.037473	11356.602831	14532.464194	19920.166029	473222.487756
month	2606.0	7.194935	3.001022	1.000000	5.000000	7.000000	10.000000	12.000000
num_mixed_use	2606.0	0.153876	0.474006	0.000000	0.000000	0.000000	0.000000	4.000000
num_non_residential	2606.0	3.804682	5.956966	0.000000	0.000000	1.000000	5.000000	58.000000
num_residential	2606.0	39.287797	26.661136	0.000000	21.000000	36.000000	53.000000	334.000000
park	2606.0	0.046815	0.225350	0.000000	0.000000	0.000000	0.000000	4.000000
pct_mixed_use	2606.0	0.003899	0.013706	0.000000	0.000000	0.000000	0.000000	0.250000
pct_non_residential	2606.0	0.139389	0.242231	0.000000	0.000000	0.038462	0.150000	1.000000
pct_residential	2606.0	0.854794	0.247308	0.000000	0.838200	0.959184	1.000000	1.000000
pop_density	2606.0	24969.466549	19217.566990	0.000000	13524.013052	21965.055541	30907.352412	182709.507271
sidewalk_grates	2606.0	2.109363	5.186348	0.000000	0.000000	0.000000	2.000000	73.000000
ssl_cndtn_Average_comm	2606.0	0.402058	0.401132	0.000000	0.000000	0.333333	0.800000	1.000000
ssl_cndtn_Average_res	2606.0	0.574651	0.265928	0.000000	0.444444	0.635642	0.767857	1.000000
ssl_cndtn_Excellent_comm	2606.0	0.022413	0.095254	0.000000	0.000000	0.000000	0.000000	1.000000
ssl_cndtn_Excellent_res	2606.0	0.001871	0.029081	0.000000	0.000000	0.000000	0.000000	0.766990
ssl_cndtn_Fair_comm	2606.0	0.024810	0.101711	0.000000	0.000000	0.000000	0.000000	1.000000
ssl_cndtn_Fair_res	2606.0	0.018239	0.047230	0.000000	0.000000	0.000000	0.017857	0.666667
ssl_cndtn_Good_comm	2606.0	0.162147	0.261734	0.000000	0.000000	0.000000	0.250000	1.000000
ssl_cndtn_Good_res	2606.0	0.285093	0.206829	0.000000	0.133333	0.266667	0.403509	1.000000
ssl_cndtn_Poor_comm	2606.0	0.002487	0.032450	0.000000	0.000000	0.000000	0.000000	1.000000
ssl_cndtn_Poor_res	2606.0	0.002067	0.010527	0.000000	0.000000	0.000000	0.000000	0.200000
ssl_cndtn_VeryGood_comm	2606.0	0.104810	0.232111	0.000000	0.000000	0.000000	0.000000	1.000000
ssl_cndtn_VeryGood_res	2606.0	0.036727	0.071472	0.000000	0.000000	0.000000	0.052632	1.000000
tot_pop	2606.0	188.892172	211.199274	0.000000	82.000000	137.000000	231.000000	3888.000000
well_activity	2606.0	0.572141	1.569373	0.000000	0.000000	0.000000	0.000000	19.000000
WARD	2606.0	3.994244	2.112836	1.000000	2.000000	4.000000	6.000000	8.000000

	count	mean	std	min	25%	50%	75%	max
X	53155.0	-7.701390e+01	3.896925e-02	-7.711317e+01	-7.703758e+01	-7.701876e+01	-7.699048e+01	-7.691005e+01
Y	53155.0	3.891153e+01	2.755631e-02	3.881366e+01	3.889805e+01	3.891023e+01	3.892769e+01	3.899398e+01
OBJECTID	53155.0	2.584204e+07	4.772180e+04	2.557148e+07	2.580968e+07	2.583192e+07	2.585243e+07	2.599323e+07
BBL_LICENSE_FACT_ID	53155.0	3.573744e+05	3.762054e+04	3.117700e+05	3.340900e+05	3.484900e+05	3.606180e+05	4.720330e+05
CUST_NUM	53155.0	3.838201e+11	2.744648e+11	1.950124e+07	7.010872e+07	4.103160e+11	5.005168e+11	9.313170e+11
LATITUDE	53155.0	3.891153e+01	2.755631e-02	3.881365e+01	3.889804e+01	3.891022e+01	3.892768e+01	3.899398e+01
LONGITUDE	53155.0	-7.701390e+01	3.896912e-02	-7.711317e+01	-7.703758e+01	-7.701876e+01	-7.699048e+01	-7.691005e+01
XCOORD	53155.0	3.987948e+05	3.379723e+03	3.901879e+05	3.967413e+05	3.983744e+05	4.008254e+05	4.078038e+05
YCOORD	53155.0	1.381857e+05	3.059101e+03	1.273200e+05	1.366886e+05	1.380399e+05	1.399780e+05	1.473384e+05
ZIPCODE	53068.0	2.001509e+04	2.909145e+01	2.000100e+04	2.000500e+04	2.001000e+04	2.001800e+04	2.059300e+04
MARADDRESSREPOSITORYID	53155.0	2.280445e+05	8.922920e+04	0.000000e+00	2.250700e+05	2.445020e+05	2.896090e+05	3.147560e+05
DCSTATADDRESSKEY	53155.0	1.390966e+05	1.283098e+05	2.000000e+00	6.841700e+04	8.885700e+04	1.285030e+05	4.819170e+05
DCSTATLOCATIONKEY	53155.0	1.309599e+05	1.090843e+05	2.000000e+00	6.841700e+04	8.885700e+04	1.285030e+05	4.138170e+05
WARD	53155.0	4.105954e+00	2.080958e+00	1.000000e+00	2.000000e+00	4.000000e+00	6.000000e+00	8.000000e+00

	0	1	2	3	4
X	-76.9841	-76.9841	-76.9841	-76.9841	-76.9841
Y	38.8363	38.8363	38.8363	38.8363	38.8363
OBJECTID	25571475	25571476	25571477	25571478	25571479
BBL_LICENSE_FACT_ID	313246	313246	313246	313246	313246
LICENSESTATUS	ACTIVE	ACTIVE	ACTIVE	ACTIVE	ACTIVE
LICENSECATEGORY	Charitable Solicitation	Charitable Solicitation	Charitable Solicitation	Charitable Solicitation	Charitable Solicitation
CUST_NUM	400212000180	400212000180	400212000180	400212000180	400212000180
TRADE_NAME	BREATHE DC	BREATHE DC	BREATHE DC	BREATHE DC	BREATHE DC
LICENSE_START_DATE	2016-03-01T00:00:00.000Z	2016-03-01T00:00:00.000Z	2016-03-01T00:00:00.000Z	2016-03-01T00:00:00.000Z	2016-03-01T00:00:00.000Z
LICENSE_EXPIRATION_DATE	2018-02-28T00:00:00.000Z	2018-02-28T00:00:00.000Z	2018-02-28T00:00:00.000Z	2018-02-28T00:00:00.000Z	2018-02-28T00:00:00.000Z
LICENSE_ISSUE_DATE	2016-02-25T00:00:00.000Z	2016-02-25T00:00:00.000Z	2016-02-25T00:00:00.000Z	2016-02-25T00:00:00.000Z	2016-02-25T00:00:00.000Z
AGENT_PHONE	2027265555	2027265555	2027265555	2027265555	2027265555
LASTMODIFIEDDATE	2017-06-20T15:10:12.000Z	2017-06-20T15:10:12.000Z	2017-06-20T15:10:12.000Z	2017-06-20T15:10:12.000Z	2017-06-20T15:10:12.000Z
CITY	WASHINGTON	WASHINGTON	WASHINGTON	WASHINGTON	WASHINGTON
STATE	DC	DC	DC	DC	DC
SITEADDRESS	1310 SOUTHERN AVENUE SE	1310 SOUTHERN AVENUE SE	1310 SOUTHERN AVENUE SE	1310 SOUTHERN AVENUE SE	1310 SOUTHERN AVENUE SE
LATITUDE	38.8363	38.8363	38.8363	38.8363	38.8363
LONGITUDE	-76.9841	-76.9841	-76.9841	-76.9841	-76.9841
XCOORD	401383	401383	401383	401383	401383
YCOORD	129834	129834	129834	129834	129834
ZIPCODE	20032	20032	20032	20032	20032
MARADDRESSREPOSITORYID	277936	277936	277936	277936	277936
DCSTATADDRESSKEY	120049	120049	120049	120049	120049
DCSTATLOCATIONKEY	120049	120049	120049	120049	120049
WARD	8	8	8	8	8
ANC	8E	8E	8E	8E	8E
SMD	8E03	8E03	8E03	8E03	8E03
DISTRICT	SEVENTH	SEVENTH	SEVENTH	SEVENTH	SEVENTH
PSA	706	706	706	706	706
NEIGHBORHOODCLUSTER	38	38	38	38	38
HOTSPOT2006NAME	NONE	NONE	NONE	NONE	NONE
HOTSPOT2005NAME	NONE	NONE	NONE	NONE	NONE
HOTSPOT2004NAME	NONE	NONE	NONE	NONE	NONE
BUSINESSIMPROVEMENTDISTRICT	NONE	NONE	NONE	NONE	NONE

	count	mean	std	min	25%	50%	75%	max
OBJECTID	22.0	11.500000	6.493587	1.000000	6.25000	11.500000	16.75	22.00
LAST_UPDAT	22.0	2009.000000	0.000000	2009.000000	2009.00000	2009.000000	2009.00	2009.00
X	22.0	397703.338738	3329.244112	391615.268950	395425.71250	397380.225005	400186.49	406439.23
Y	22.0	138079.263527	2909.785057	130590.010007	136453.63472	137979.804471	139832.70	144234.39
ADDRID	22.0	132709.454545	134544.668464	5616.000000	15750.00000	18006.000000	275965.25	307772.00
PRIORITY_LEVEL	13.0	1.000000	0.000000	1.000000	1.00000	1.000000	1.00	1.00

	X	Y_x	OBJECTID_x	BBL_LICENSE_FACT_ID	LICENSESTATUS	LICENSECATEGORY	CUST_NUM	TRADE_NAME	LICENSE_START_DATE	LICENSE_EXPIRATION_DATE	...	WEB_URL	WEB_URL2	STEWARD	SOURCE	LAST_UPDAT	Y_y	ADDRID	PRIORITY_LEVEL	DESCRIPTION	EMAIL_PHONE
0	-76.984071	38.836306	25571475.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
1	-76.984071	38.836306	25571476.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
2	-76.984071	38.836306	25571477.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
3	-76.984071	38.836306	25571478.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
4	-76.984071	38.836306	25571479.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
5	-76.984071	38.836306	25571480.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
6	-76.984071	38.836306	25571481.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
7	-76.984071	38.836306	25571482.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
8	-76.984071	38.836306	25571483.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
9	-76.984071	38.836306	25571484.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
10	-76.984071	38.836306	25571485.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
11	-76.984071	38.836306	25571486.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
12	-76.984071	38.836306	25571487.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
13	-76.984071	38.836306	25571488.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
14	-76.984071	38.836306	25784321.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
15	-76.984071	38.836306	25784322.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
16	-76.984071	38.836306	25784323.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
17	-76.984071	38.836306	25784324.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
18	-76.984071	38.836306	25784325.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
19	-76.984071	38.836306	25784326.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
20	-76.984071	38.836306	25784327.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
21	-76.984071	38.836306	25784328.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
22	-76.984071	38.836306	25784329.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
23	-76.984071	38.836306	25784330.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
24	-76.984071	38.836306	25784331.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
25	-76.984071	38.836306	25784332.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
26	-76.984071	38.836306	25784333.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
27	-76.984071	38.836306	25784334.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
28	-76.984071	38.836306	25784335.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
29	-76.984071	38.836306	25784336.0	313246.0	ACTIVE	Charitable Solicitation	4.002120e+11	BREATHE DC	2016-03-01T00:00:00.000Z	2018-02-28T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
53147	-76.998701	38.903386	25991812.0	470369.0	EXPIRED	One Family Rental	5.005149e+11	Matthew Godwin	2014-03-01T00:00:00.000Z	2016-02-29T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53148	-77.065015	38.933024	25991814.0	470371.0	EXPIRED	One Family Rental	5.005149e+11	Lindsay Lincoln	2014-03-01T00:00:00.000Z	2016-02-29T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53149	-77.071379	38.933291	25991818.0	470375.0	EXPIRED	One Family Rental	5.005149e+11	Cohen/Kornblut residence; Jonathan Cohen	2014-06-01T00:00:00.000Z	2016-05-31T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53150	-77.074664	38.937192	25992005.0	470376.0	EXPIRED	One Family Rental	5.005149e+11	Jay Hewlin ; Jay	2014-09-01T00:00:00.000Z	2016-08-31T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53151	-77.042141	38.930821	25992197.0	470281.0	EXPIRED	Apartment	6.600022e+07	NIELS THIESS	2014-01-01T00:00:00.000Z	2015-12-31T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53152	-77.030820	38.921718	25992208.0	470294.0	EXPIRED	Apartment	6.800379e+07	WARDMAN COURT CO/OF INTERSTATE REALTY MGR CO	2014-06-01T00:00:00.000Z	2016-05-31T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53153	-76.935101	38.890048	25992430.0	470658.0	EXPIRED	Two Family Rental	7.010751e+07	NaN	2014-09-01T00:00:00.000Z	2016-08-31T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53154	-76.980528	38.866850	25993234.0	470560.0	EXPIRED	One Family Rental	6.800645e+07	HENRY H STRONG	2014-10-01T00:00:00.000Z	2016-09-30T00:00:00.000Z	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0	NaN	NaN
53155	395248.988794	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	137857.987965	15719.0	0	Locally grown fresh fruits and vegetables. Apr...	jennifer.guillaume@dc.gov _ 535-2252
53156	402347.350000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	135781.360000	307772.0	0	Locally grown fresh fruits and vegetables. Sum...	jennifer.guillaume@dc.gov _ 535-2252
53157	396725.040000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://mtpfm.com/		Jen	http://www.dcfoodfinder.org/	2009.0	140373.590000	226249.0	0	Locally grown fresh fruits and vegetables. May...	jennifer.guillaume@dc.gov _ 535-2252
53158	395414.540000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN		Jen	http://www.dcfoodfinder.org/	2009.0	139969.700000	219212.0	0	Locally grown fresh fruits and vegetables. Sum...	jennifer.guillaume@dc.gov _ 535-2252
53159	400773.519999	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	140627.060000	16406.0	0	Locally grown fresh fruits and vegetables. WIC...	jennifer.guillaume@dc.gov _ 535-2252
53160	400134.560000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	www.congressheightsontherise.com		Jen	http://www.dcfoodfinder.org/	2009.0	130590.010007	17957.0	0	Locally grown fresh fruits and vegetables. Sat...	jennifer.guillaume@dc.gov _ 535-2252
53161	391615.268950	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.palisadesfarmersmarket.com/		Jen	http://www.dcfoodfinder.org/	2009.0	138847.436638	17316.0	0	Locally grown fresh fruits and vegetables. Sun...	jennifer.guillaume@dc.gov _ 535-2252
53162	393836.944730	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.newmorningfarm.net/		Jen	http://www.dcfoodfinder.org/	2009.0	142134.741057	12254.0	0	Locally grown fresh fruits and vegetables. Sat...	jennifer.guillaume@dc.gov _ 535-2252
53163	406439.230000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	136608.140000	5742.0	0	Locally grown fresh fruits and vegetables. Fri...	jennifer.guillaume@dc.gov _ 535-2252
53164	394042.440000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.chevychasefarmersmarket.org/		Jen	http://www.dcfoodfinder.org/	2009.0	144234.390000	263178.0	0	Locally grown fresh fruits and vegetables. Sat...	jennifer.guillaume@dc.gov _ 535-2252
53165	398008.015772	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.freshfarmmarkets.org/	http://www.freshfarmmarket.org/	Jen	http://www.dcfoodfinder.org/	2009.0	136402.132960	11946.0	0	Locally grown fresh fruits and vegetables. Apr...	jennifer.guillaume@dc.gov _ 535-2252
53166	400330.039998	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.easternmarket-dc.org/	www.goodgenerallink.org	Jen	http://www.dcfoodfinder.org/	2009.0	135504.990010	18055.0	0	DC's oldest continually operated fresh food pu...	jennifer.guillaume@dc.gov _ 535-2252
53167	397531.320000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.ams.usda.gov/AMSv1.0/farmersmarkets		Jen	http://www.dcfoodfinder.org/	2009.0	135490.010000	294873.0	0	Locally grown fresh fruits and vegetables. Fri...	jennifer.guillaume@dc.gov _ 535-2252
53168	398132.500000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	135138.410000	276612.0	0	Seasonal produce direct from farmers. Tuesday.	jennifer.guillaume@dc.gov _ 535-2252
53169	398945.580000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.14andufarmersmarket.com/		Jen	http://www.dcfoodfinder.org/	2009.0	138304.650014	15468.0	0	Locally grown fresh fruits and vegetables. May...	jennifer.guillaume@dc.gov _ 535-2252
53170	396107.063994	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://freshfarmmarkets.org/farmers_markets/ma...	http://www.freshfarmmarket.org/	Jen	http://www.dcfoodfinder.org/	2009.0	138086.718942	5616.0	0	Locally grown fresh fruits and vegetables. Yea...	jennifer.guillaume@dc.gov _ 535-2252
53171	400226.570000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://freshfarmmarkets.org/farmers_markets/ma...	http://www.freshfarmmarket.org/	Jen	http://www.dcfoodfinder.org/	2009.0	136899.100000	288803.0	0	Locally grown fresh fruits and vegetables. May...	jennifer.guillaume@dc.gov _ 535-2252
53172	395459.230000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.freshfarmmarket.org/markets/foggy_b...	http://www.freshfarmmarket.org/	Jen	http://www.dcfoodfinder.org/	2009.0	136966.330000	274025.0	0	Locally grown fresh fruits and vegetables. Apr...	jennifer.guillaume@dc.gov _ 535-2252
53173	400203.800000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	137872.890000	301991.0	0	Locally grown fresh fruits and vegetables. WIC...	jennifer.guillaume@dc.gov _ 535-2252
53174	396298.579990	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...			Jen	http://www.dcfoodfinder.org/	2009.0	139421.699999	16877.0	0	Locally grown fresh fruits and vegetables. May...	jennifer.guillaume@dc.gov _ 535-2252
53175	397229.130010	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	http://www.14andufarmersmarket.com/		Jen	http://www.dcfoodfinder.org/	2009.0	138792.679998	15843.0	0	Locally grown fresh fruits and vegetables. Apr...	jennifer.guillaume@dc.gov _ 535-2252
53176	394423.740000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	https://www.udc.edu/event/farmers-market/		Jen	http://www.dcfoodfinder.org/	2009.0	141839.770000	297694.0	0	Locally grown fresh fruits and vegetables. Jul...	jennifer.guillaume@dc.gov _ 535-2252