The main dataset was the NYC Street Tree Census data from 2015, the result of a community survey carried out mainly by volunteers, cataloging all street trees in NYC. As secondary datasets we used the Street Tree data from 1995 and 2005. Moreover, we used air pollution data for New York City in order to understand the influence of trees on air quality, and we started analyzing the "311" dataset to explore complaints regarding trees. The Street Tree dataset was chosen because it could give new insights and perspectives on urban planning, reveal the status of the trees (how healthy they are, whether people are taking care of them, etc.), and show whether they influence the quality of life in the city. Moreover, we hoped to discover facts that most people would probably not be aware of beforehand.
The goal was to enlighten users about trees in NYC. Are certain types of trees more suitable for streets than others? Where are they located? Is it possible to know which kind of tree you might encounter based on the location, the health of the tree, the diameter, or even the number of problems it has? From this project it should be possible to learn something new about a topic you might never have considered exploring.
There were some outliers in the dataset which had to be removed to get useful results. One example was the latitude/longitude coordinates, which contained an extreme outlier far outside NYC.
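To make this cleaning step concrete, below is a minimal sketch of how such coordinate outliers can be filtered out with pandas; the bounding-box values are approximate NYC limits, not the exact thresholds we used.
In [ ]:
#A minimal sketch of the coordinate outlier filtering; the bounds are
#approximate NYC limits and not the exact values used in the project
import pandas as pd
tree_data = pd.read_csv('2015_tree_data_updated.csv')
in_nyc = (tree_data['Latitude'].between(40.4, 41.0) &
          tree_data['Longitude'].between(-74.3, -73.6))
print(len(tree_data) - in_nyc.sum(), "rows dropped as coordinate outliers")
tree_data = tree_data[in_nyc]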
Regarding the 311 dataset, we selected only the 2015 data and the complaints regarding trees, as these were the only records relevant for our domain.
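A hedged sketch of this selection is shown below; the file name and the column names ('Created Date', 'Complaint Type') follow the public NYC 311 export and are assumptions to be checked against the actual download.
In [ ]:
#Hypothetical sketch of the 311 filtering; file name and column names
#are assumptions based on the NYC 311 export format
import pandas as pd
complaints = pd.read_csv('311_service_requests.csv', parse_dates=['Created Date'])
is_2015 = complaints['Created Date'].dt.year == 2015
about_trees = complaints['Complaint Type'].str.contains('Tree', case=False, na=False)
tree_complaints = complaints[is_2015 & about_trees]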
For the air pollution there was data for all the community districts, but only for some of the neighbourhoods. The measurements were mean percentiles. We took the mean values for the community districts and assigned them to the corresponding borough. This was because the neighbourhood names in this dataset and in our own dataset were so different that it was very difficult to figure out which neighbourhoods were the same in the two sets.
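The aggregation itself is a simple groupby; the sketch below assumes a hypothetical dataframe with one mean-percentile measurement per community district.
In [ ]:
#Sketch of the borough aggregation; the file and column names are
#hypothetical stand-ins for the air pollution data we used
import pandas as pd
air = pd.read_csv('air_pollution.csv')
borough_means = air.groupby('Borough')['MeanPercentile'].mean()
print(borough_means)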
When taking a first glance at the dataset it was a bit overwhelming, as it is huge and contains a lot of rows that do not necessarily make sense at first glance, as well as a number of variables that are not particularly interesting or necessary for what we wanted to do. Each variable was carefully examined, and the variables deemed unnecessary were excluded. Among these was "Tree_Id", a unique ID for each tree; however, the IDs were only unique within each of the three datasets (1995, 2005, 2015), meaning it was not possible to join the datasets on this ID, rendering it irrelevant. Other excluded variables were address information, since multiple variables delivered address information on different levels, and it was not relevant to distinguish between all of these.
It was decided to focus only on the top 20 tree species: since there were a lot of different species without a significant number of observations, it would be difficult to describe them all properly, and it would also be very difficult to make good predictions on sparse observations. For some machine learning tools we focused only on the top 10 or top 5 species, because the data was too sparse above this limit.
There were a lot of trees without a species listed, and those were disregarded completely. The dead trees were also excluded from the dataset. A sketch of this filtering is shown below.
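A minimal sketch of this filtering, under the assumption that dead trees are the rows without a health status, could look as follows.
In [ ]:
#Minimal sketch of the species filtering described above; the assumption
#that dead trees are the rows without a health status should be verified
import pandas as pd
tree_data = pd.read_csv('2015_tree_data_updated.csv')
tree_data = tree_data.dropna(subset=['Spc_Common'])   #drop trees without a species listed
tree_data = tree_data.dropna(subset=['Health'])       #dead trees carry no health status (assumption)
top20 = tree_data['Spc_Common'].value_counts().head(20).index
tree_data = tree_data[tree_data['Spc_Common'].isin(top20)]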
It was considered to focus on only one of the five boroughs in NYC to get a more detailed view. This was not implemented, since it was deemed more interesting to show the differences between the boroughs as well.
The final dataset "Street Tree Data 2015" consists of 534,514 tree observations and 21 variables/features, totalling 74.5 MB. The selected features were:
Amount of trees in each borough:
In general, the top 20 species were the same for the 5 boroughs, but the order of the "top 20" list differed. Manhattan had more trees with general problems as well as more unhealthy trees.
We computed a lot of Pearson correlations among the different variables, only to find that few of them were correlated. In the end, we did find some correlation between the air quality, the amount of trees, the tree diameter, and the health states, as sketched below.
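As an illustration, the sketch below computes such a correlation matrix with pandas; the per-district summary file and its column names are hypothetical stand-ins for the aggregates we used.
In [ ]:
#Sketch of the pairwise Pearson correlations; the summary file and its
#column names are hypothetical stand-ins for the per-district aggregates
import pandas as pd
per_district = pd.read_csv('per_district_summary.csv')
cols = ['TreeCount', 'MeanDiameter', 'AirPollution', 'GoodHealthShare']
print(per_district[cols].corr(method='pearson'))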
But let's start looking at the main dataset, the 2015 Street Tree Census (https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh).
Multiple secondary datasets were inspected, e.g. the 311 dataset and the air pollution dataset, as well as the Street Tree datasets from 2005 and 1995. In the 311 set there were several complaints about trees in NYC, but no significant correlations were found. It was hoped that certain types of complaints would correlate with different problems or with the health of the trees, but unfortunately data does not always behave as hoped or suspected, and patterns cannot (and should not) be forced to appear.
One could also be inclined to wonder whether more "green" areas, meaning areas with a lot of trees, have higher house prices. Again, after investigation, this turned out to be challenging, since there is not a lot of information about house prices available - at least not on a neighbourhood level.
It was also considered whether there was a correlation between the trees/features of the trees and the air pollution. This dataset was used for simple linear regression.
For the different maps, a few other datasets have been included in the shape of geojson files. These include the data needed for drawing the d3 maps (polygons) as well as basic information about the parts of the city they represent, such as borough, community district, etc., which was used in combination with our own data from the Street Tree dataset to produce interactive maps. The geojsons used can be found and downloaded at https://github.com/cecli/cecli.github.io/tree/master/data/geojson.
When doing predictions it can be difficult to find the appropriate tools to use. Different tools have different qualities and it all depends on the data and the patterns in your data. In this project, different tools have been tried out, typically multiple tools for the same prediction to inspect the model performance of each tool.
KNN is a tool that is rather easy to grasp and implement. It was chosen for predicting the health of a tree based on GPS coordinates, as well as for predicting species based on GPS coordinates. An argument for KNN being an appropriate choice is that when planting trees, one would be inclined to plant the same species together. One could also expect unhealthy trees to cluster in the same area, presumably because of a disease in the area, a pollution problem, soil problems, or something completely different. A drawback of the KNN method is that when dealing with an unbalanced dataset it will favour the most frequently occurring class.
Decision trees can often be a good choice because they are easy to visualize. A drawback is that they tend to overfit the training data. In spite of this drawback, they were used for predicting health based on GPS coordinates and species based on GPS coordinates, as well as for predicting the tree species based on location and the diameter based on species and location. When predicting species, different features were added, e.g. the diameter, to see if they contributed to the predictions. The main reason was to compare with the other results: if the decision trees did not overfit and still performed well, they would be nice to visualize. To mitigate the overfitting issue, a random forest was also tried out.
Decision trees were also used to predict diameter based on species and problems, as well as predicting diameter based on the number of problems. Here, the diameter was binned into bins of varying sizes (1-10, 10-15, 15-20, ..., 45-50, 50-60, 60-70, ..., 90-100, 100-150, 150-200, ...).
As a third tool, Support Vector Machines were tried out. SVMs can do linear classification by creating a maximum-margin separating hyperplane between classes. They can also do non-linear classification using the so-called kernel trick, where inputs are mapped into a high-dimensional feature space. This was used to predict health based on GPS coordinates.
Apriori is an algorithm for frequent itemset mining. It was used to inspect which problems appear together in the same observation, and we found that some problems do indeed co-occur in the same trees. [We used the Apyori package: https://pypi.python.org/pypi/apyori/1.1.1]
Linear regression was used to inspect correlations between different features; it is not really a machine learning tool as much as a tool for investigating linear relationships. It was used to predict air pollution based on the amount of trees as well as their diameter. Apart from standard linear regression, Lasso and Elastic Net were used, as sketched below.
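A hedged sketch of that setup is shown below; the summary dataframe and its column names are hypothetical stand-ins for the per-district data, and the regularization strengths are illustrative.
In [ ]:
#Hedged sketch of the regression setup; the summary file, its column names
#and the alpha values are hypothetical, not the project's exact choices
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score
per_district = pd.read_csv('per_district_summary.csv')
X = per_district[['TreeCount', 'MeanDiameter']].values
y = per_district['AirPollution'].values
for model in (LinearRegression(), Lasso(alpha=0.1), ElasticNet(alpha=0.1)):
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(type(model).__name__, np.mean(scores))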
When selecting appropriate models, the first step is to split the data into a training set and a test set. When predicting health (or species) based on GPS coordinates, a test set consisting of 15% of the total number of observations was used. Hereafter the training set was "split" into a training set and a validation set using 5-fold and 10-fold cross-validation. The best model was chosen based on accuracy scores, with computation time taken into account as well. For the KNN, different values of $k$ were tried, ranging over $k=2,\ldots,10$. The upper limit was set to 10 because we did not expect whole areas of unhealthy trees, and larger neighbourhoods might just confuse the predictions.
For predicting species and health based on GPS coordinates, KNN was selected as the best model. SVM simply took a significant amount of time to run, making it difficult to fine-tune and handle, and decision trees overfitted the training data and were not good at handling sparse data.
For predicting species, the KNN classifier predicted $51.7$% accurately for $k=4$ on the test data, whereas the average (over the 5 folds) performance on the validation set was $49.9$% (compared to $49$% for decision trees). This is considered rather good, taking into account that it is labelling $20$ different species, but it also suggests that the same species is not always planted next to each other - in fact, less often than we expected before investigating the data. In comparison, when only trying to predict the top 5 species instead of all 20, the average performance on the validation set was $70.4$% for $k=4$. This also confirms that the same species are not always planted next to each other, and shows, as expected, that the model performs better when addressing fewer species.
For predicting health, the KNN classifier predicted $80.7$% accurately for $k=5$ on the test data, whereas the average (over the 5 folds) performance on the validation set was $80.4$% (compared to $74.8$% for decision trees). An accuracy of $80.7$% is rather good considering the sparsity of the "fair" and "poor" tree observations. The KNN handled the sparse classes better than the decision tree classifier: when it labeled a tree as "fair" it was right around $43$% of the time (on the validation set for $k=5$), while for the "poor" class it only predicted around $1/3$ correctly. It was, as we expected, much better at predicting the good trees, which makes sense since there was a lot more training data available for them. In comparison, the decision tree classifier was right around $31$% of the time for the "fair" trees, and $17$% for the "poor" trees. The number of misclassifications of "poor" and "fair" trees suggests that the condition of a tree does not really reflect on its neighbours and is most likely caused by other, individual factors.
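The per-class percentages quoted above can be read off a confusion matrix; the sketch below assumes the health classifier `knn` and the test split fitted in the cells further down in this notebook.
In [ ]:
#Sketch of how the per-class percentages can be read off a confusion
#matrix; `knn`, `X_test` and `y_test` refer to the health classifier
#fitted in the cells further down
import numpy as np
from sklearn.metrics import confusion_matrix
y_pred = knn.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=['Good', 'Fair', 'Poor'])
print(cm)  #rows: true class, columns: predicted class
#share of correct predictions per predicted label (diagonal over column sums)
print(np.diag(cm) / cm.sum(axis=0).astype(float))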
The decision trees performed almost like the random forest in our case. When predicting species, we obtained a maximum accuracy of 0.5 with the top 5 species; with the top 10 and top 20 the accuracy was even lower, which made us conclude it was not a good model. Moreover, in the graphviz visualization we saw that it was predicting just two species, locust and pin oak, so it was definitely not a good tool for our case.
The Apriori algorithm found some associations among the problems, e.g. stones around the roots. The scores were not that good, but they improved when taking into account only the trees with at least one problem or the trees with exactly one problem. This is understandable, because some of the trees might be new, which can also be seen from their diameter, so it is possible that they do not have many problems yet.
Elastic Net and Lasso are generally good tools, but we lacked variables and data points (at most 188), which may be the reason behind the non-optimal performance.
The initial histogram shows all the trees in New York City. It can clearly be seen that some species are present in very low numbers. That is why we focused on a subset of species, mostly the top 20 (for all the analysis), and sometimes the top 10 or top 5. The pie chart then shows the distribution of the top 20 species, with the possibility of hovering over a slice for a clearer explanation of it.
The d3 bar chart shows the distribution of the 5 most common species for each borough, as well as for NYC as a whole. This illustrates the borough-wise differences regarding the most common trees. E.g. the London planetree is the most common tree in NYC, and it is also the most common tree in Queens. Queens is the largest area in NYC both regarding trees and size, as shown in the pie charts. Therefore one might expect that most of NYC's London planetrees are located in Queens, but as can be seen from the chart, most of them actually come from Brooklyn. The chart also has the option of changing the year from 2015 to either 2005 or 1995, thereby showing the differences over time as well.
This chart is important for the project since it enables the user to view the distribution of the most common trees in NYC on borough level for the three different years available: 1995, 2005, 2015. This helps us show the changes in street trees in NYC over time.
We have also appended a plot made in Python (using GeoPlotLib) in order to give the user an overview of how the distribution/amount of trees has changed over the years.
This page enables the user to hover over pictures of leaves for the 10 most common species to see the species name, and then when clicking a leaf image, fun facts about these trees appear.
This function is important for the project since it sets each of the trees in a context: What is it called? What is its ranking? How many trees are there of this species? What is special/interesting about the species? Why is it a good street tree compared to others?
A basic histogram and a pie chart help the user understand the health distribution of the trees. It can clearly be seen that there are far more healthy trees than poor ones, while the histogram gives more details about the various species, possibly revealing some interesting ones.
The d3 map enables the user to view a prediction of the health of the NYC street trees, using KNN as a classifier. It is possible to hover over the individual boroughs for details, and to switch between visualizing the good, fair, and poor trees. When hovering over a borough, a tooltip is displayed, showing the borough name, the borough size, the number of trees in that borough broken down by species, and finally the percentages of 'good' and 'poor' trees in that borough. The boroughs are colored according to the number of trees present there.
The map is an important visualization for the project since it shows the location of the trees with regard to their health. Because of the large number of data points, visualizing all three health states at the same time would have created too confusing a picture, which is why it has been split up into 'good' and 'poor', with the 'fair' trees available to append at the click of a button.
The site includes three scatterplots. The first scatterplot visualizes the most common problems trees can have according to the dataset. The problems visualized are split into three categories: 'Trunk', 'Root' and 'Branch'. These problems are caused by humans, such as trees growing into phone lines, wires around the trunk, or stones over the roots. Further definitions can be found in the dataset manual in the link provided.
The second and third scatterplots show the correlation between the amount of trees, the diameter, and the air pollution in each of the neighbourhoods. We took the community district measurements and assigned the mean to each neighbourhood. The user can explore each neighbourhood by hovering over the map (which actually shows community districts) and the corresponding points in the scatterplot (and vice versa). The scatterplot colors are per borough, so the user can explore the amount of air pollution in each area at various levels of detail.
References for inspiration for the d3 visualization (scatterplot):
When working with the project, two things became clear to us: 1) real-life data is messy; 2) data does not care what you think of it.
It is possible to have good intentions and a lot of good ideas about what to do when analyzing a dataset, but the data itself has limitations that are not always possible to overcome.
During the project, a lot of things did not go as expected. First of all, the patterns we expected to find in the data were just not there. The intention was to find correlation between the problems of the trees and their health, and possibly also with the diameter. A lot of basic Pearson correlations were computed on the data, but it turned out that there were no significant correlations. Then a lot of other datasets were inspected to see if they correlated with some of the tree data features. The only interesting finding regarded the air pollution: there was a correlation between the amount of trees, their diameter, and the air quality. Moreover, we discovered that the diameter of a tree is influenced by its problems. We managed to create a lot of plots and visualizations of the data showing the fundamental counts. We also managed to apply different machine learning methods, though they only confirmed what the preliminary analysis showed. In general, there were no really large areas with problematic trees, and the problems could not really be related to the health. The health classes were sparse, which influenced the predictions.
In general, we feel that we have acquired familiarity with Python for data analysis. On the other hand, it took much more time to get the visualizations working, especially when the d3 visualizations were put online. Sadly, we spent a lot of time fixing minor compatibility problems with the website/d3, which took time away from our analysis/visualization work.
If more data had been available to join with the street tree data, we might have been able to find other nice patterns/correlations. The air pollution was one, and it would have been nice to focus more on this, also exploring other variables, like the amount of traffic in New York City.
We could also have focused on one prediction goal instead of trying to find a lot of different patterns that turned out not to be there. E.g. we could have focused on predicting species but with other tools, since KNN obviously was not the best choice. A suggestion would be to try some binary classifications, locating e.g. London planetrees using SVMs (so just one species instead of focusing on several).
In the end, what we actually missed were patterns in the data that would have allowed more advanced predictions. But patterns cannot be forced, so with that in mind, we could instead have used more visualizations for the predictions we did make. We could also have shown more interactive features regarding health and problems in the various years, analyzing the changes of the trees in more depth.
We could have assigned the air pollution to each neighbourhood available in the dataset instead of using the community district mean; this was too much work because of the differing neighbourhood names. Moreover, we could have explored the 311 dataset more than we did, to see if areas with more complaints performed worse in regards to air pollution.
We also started analyzing house prices, but the neighbourhood names were too different, so time prohibited us from continuing down this path.
The visualization of the map on the site is a bit slow, and we could not figure out how to optimize it. Moreover, the scatterplots were not working initially, so we spent a lot of time figuring out how to improve the site in this regard, and one axis is still missing in the second scatterplot (even though it worked locally before).
In [1]:
#Import the whole dataset
import pandas as pd
import csv
tree_data = pd.read_csv('2015_tree_data_updated.csv')
tree_data
Out[1]:
In [4]:
#Convert health categories to numbers. 1: Poor, 2: Fair, 3: Good. Higher = better
health_map = {'Good': 3, 'Fair': 2, 'Poor': 1}
#anything else (e.g. missing values) is mapped to 0
health = [health_map.get(h, 0) for h in tree_data['Health']]
In [5]:
#Finding the total number of trees and how many there are of the different species of trees
tree_amount = tree_data['Spc_Common'].value_counts()
print(tree_amount)
In [7]:
#Plotting the results to get an overview
#plt.style.use('ggplot')
%matplotlib inline
def barplot(series, title, figsize, ylabel, flag, rotation):
    ax = series.plot(kind='bar',
                     title=title,
                     figsize=figsize,
                     fontsize=13)
    # set ylabel
    ax.set_ylabel(ylabel)
    # show/hide the x axis (depending on the flag that comes as a function parameter)
    ax.get_xaxis().set_visible(flag)
    # set series index as xlabels and rotate them
    ax.set_xticklabels(series.index, rotation=rotation)

barplot(tree_amount, 'Tree types', figsize=(20,8), ylabel='tree count', flag=True, rotation=90)
In [9]:
#Putting the percentages on a pie chart
ax = tree_amount.plot(kind='pie', title='Top 20 tree species in NYC', autopct='%1.0f%%', pctdistance=0.9)
ax.set_ylabel('')
ax.set_aspect('equal')
In [10]:
#Count no. of trees in each borough:
boros = tree_data['Borough'].unique()
print(tree_data['Borough'].value_counts())
In [11]:
#Finding how many trees there are of the different health types
tree_health = tree_data['Health'].value_counts()
print(tree_health)
In [12]:
#Comparing the count of each tree species in the whole city with a borough. This was used in the initial analysis to try and
#determine if focus should be put on a single borough, and which borough this should be.
queens_tree_types = tree_data.loc[tree_data['Borough'] == 'Queens', 'Spc_Common'].value_counts()
brooklyn_tree_types = tree_data.loc[tree_data['Borough'] == 'Brooklyn', 'Spc_Common'].value_counts()
staten_tree_types = tree_data.loc[tree_data['Borough'] == 'Staten Island', 'Spc_Common'].value_counts()
bronx_tree_types = tree_data.loc[tree_data['Borough'] == 'Bronx', 'Spc_Common'].value_counts()
manhattan_tree_types = tree_data.loc[tree_data['Borough'] == 'Manhattan', 'Spc_Common'].value_counts()
df = pd.concat([tree_amount, queens_tree_types], axis=1)
print(df)
df.columns = ['NYC', 'Queens']
df = df.sort_values('NYC', ascending=False) # sort the df using NYC values
df.plot.bar(color=['red','blue'])
Out[12]:
In [17]:
#Comparing the number of trees in each of the five boroughs
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=5)
plt.subplots_adjust(wspace=1, hspace=0.5)
plot = queens_tree_types.plot(ax=axes[0], kind='bar', figsize=(8,30)); axes[0].set_title('Queens');
brooklyn_tree_types.plot(ax=axes[1], kind='bar'); axes[1].set_title('Brooklyn');
manhattan_tree_types.plot(ax=axes[2], kind='bar'); axes[2].set_title('Manhattan');
staten_tree_types.plot(ax=axes[3], kind='bar'); axes[3].set_title('Staten Island');
bronx_tree_types.plot(ax=axes[4], kind='bar'); axes[4].set_title('Bronx');
fig = plot.get_figure()
In [19]:
with open('2015_tree_data_updated.csv', 'r') as infile:
    # read the file as a dictionary for each row ({header : value})
    reader = csv.DictReader(infile)
    data = {}  #empty dict
    for row in reader:
        for header, value in row.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]
Diameter = data['Diameter']
Health = data['Health']
Spc_Latin = data['Spc_Latin']
Spc_Common = data['Spc_Common']
Sidewalk_Condition = data['Sidewalk_Condition']
problems = data['problems']
root_stone = data['root_stone']
root_grate = data['root_grate']
root_other = data['root_other']
trunk_wire = data['trunk_wire']
trnk_light = data['trnk_light']
trnk_other = data['trnk_other']
brch_light = data['brch_light']
brch_shoe = data['brch_shoe']
brch_other = data['brch_other']
Address = data['Address']
Zipcode = data['Zipcode']
CB = data['CB']
Borough = data['Borough']
Latitude = data['Latitude']
Longitude = data['Longitude']
#Heatmap of tree distribution
lon_new = []
lat_new = []
for i in range(len(CB)):
    lon_new.append(float(Longitude[i]))
    lat_new.append(float(Latitude[i]))
with open('Coordinates_trees.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(('lon', 'lat'))
    for xd, yd in zip(lon_new, lat_new):
        writer.writerow((xd, yd))
import geoplotlib
from geoplotlib.utils import read_csv, BoundingBox, DataAccessObject
min_lon, max_lon = min(lon_new), max(lon_new)
min_lat, max_lat = min(lat_new), max(lat_new)
bbox = BoundingBox(north=float(max_lat), west=float(min_lon), south=float(min_lat), east=float(max_lon))
print("Trees:", bbox)
data_trees = read_csv('Coordinates_trees.csv')
geoplotlib.kde(data_trees, bw=0.5, cmap='jet', cut_below=1e-4)
geoplotlib.set_bbox(bbox)
geoplotlib.inline()
In the repository there is also data from 2005 and 1995, but it is not included in full in this explainer notebook, as it does not provide additional value for explaining our analysis.
In [20]:
#2015 data
import numpy as np
#NYC top 20 species
unique, counts = np.unique(tree_data['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Extract data for each borough
tree_data_bronx = tree_data.loc[tree_data['Borough'] == 'Bronx']
tree_data_brook = tree_data.loc[tree_data['Borough'] == 'Brooklyn']
tree_data_stat = tree_data.loc[tree_data['Borough'] == 'Staten Island']
tree_data_manh = tree_data.loc[tree_data['Borough'] == 'Manhattan']
tree_data_queens = tree_data.loc[tree_data['Borough'] == 'Queens']
#Bronx
unique, counts = np.unique(tree_data_bronx['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Brooklyn
unique, counts = np.unique(tree_data_brook['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Staten Island
unique, counts = np.unique(tree_data_stat['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Manhattan
unique, counts = np.unique(tree_data_manh['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Queens
unique, counts = np.unique(tree_data_queens['Spc_Common'], return_counts=True)
print("Queens data:")
print(sorted(zip(counts, unique), reverse=True))
In [23]:
#2005 data
import pandas as pd
import numpy as np
tree_data = pd.read_csv('2005_tree_data_updated.csv')
#NYC top 20 species
unique, counts = np.unique(tree_data['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Extract data for each borough
tree_data_bronx = tree_data.loc[tree_data['Borough'] == 'Bronx']
tree_data_brook = tree_data.loc[tree_data['Borough'] == 'Brooklyn']
tree_data_stat = tree_data.loc[tree_data['Borough'] == 'Staten Island']
tree_data_manh = tree_data.loc[tree_data['Borough'] == 'Manhattan']
tree_data_queens = tree_data.loc[tree_data['Borough'] == 'Queens']
#Bronx
unique, counts = np.unique(tree_data_bronx['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Brooklyn
unique, counts = np.unique(tree_data_brook['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Staten Island
unique, counts = np.unique(tree_data_stat['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Manhattan
unique, counts = np.unique(tree_data_manh['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Queens
unique, counts = np.unique(tree_data_queens['Spc_Common'], return_counts=True)
print("Queens data:")
print(sorted(zip(counts, unique), reverse=True))
In [ ]:
#1995 data
import pandas as pd
import numpy as np
tree_data = pd.read_csv('1995_tree_data_updated.csv')
#NYC top 20 species
unique, counts = np.unique(tree_data['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Extract data for each borough
tree_data_bronx = tree_data.loc[tree_data['Borough'] == 'Bronx']
tree_data_brook = tree_data.loc[tree_data['Borough'] == 'Brooklyn']
tree_data_stat = tree_data.loc[tree_data['Borough'] == 'Staten Island']
tree_data_manh = tree_data.loc[tree_data['Borough'] == 'Manhattan']
tree_data_queens = tree_data.loc[tree_data['Borough'] == 'Queens']
#Bronx
unique, counts = np.unique(tree_data_bronx['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Brooklyn
unique, counts = np.unique(tree_data_brook['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Staten Island
unique, counts = np.unique(tree_data_stat['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Manhattan
unique, counts = np.unique(tree_data_manh['Spc_Common'], return_counts=True)
sorted(zip(counts, unique), reverse=True)
#Queens
unique, counts = np.unique(tree_data_queens['Spc_Common'], return_counts=True)
print("Queens data:")
print(sorted(zip(counts, unique), reverse=True))
In [25]:
bor_list = list(set(list(Borough)))
In [26]:
dic_all_boro = {}
for b in bor_list:
    dic_all_boro[b] = [0, 0, 0, 0, 0]
#Count root, trunk, branch and sidewalk problems for each tree and add the
#counts to the tree's borough: [root, trunk, branch, sidewalk, total]
for i in range(0, len(CB)):
    temp_root = 0
    temp_trunk = 0
    temp_branch = 0
    sidewalk = 0
    if root_stone[i] == 'Yes':
        temp_root += 1
    if root_grate[i] == 'Yes':
        temp_root += 1
    if root_other[i] == 'Yes':
        temp_root += 1
    if trunk_wire[i] == 'Yes':
        temp_trunk += 1
    if trnk_light[i] == 'Yes':
        temp_trunk += 1
    if trnk_other[i] == 'Yes':
        temp_trunk += 1
    if brch_light[i] == 'Yes':
        temp_branch += 1
    if brch_shoe[i] == 'Yes':
        temp_branch += 1
    if brch_other[i] == 'Yes':
        temp_branch += 1
    if Sidewalk_Condition[i] == 'Damage':
        sidewalk += 1
    temp_tot = temp_root + temp_trunk + temp_branch + sidewalk
    temp_list = [temp_root, temp_trunk, temp_branch, sidewalk, temp_tot]
    #update the count list of this tree's borough
    for c, t in enumerate(temp_list):
        dic_all_boro[Borough[i]][c] += t
In [27]:
with open('problem_count.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(('Borough', 'Root_Prob', 'Trunk_Prob', 'Branch_Prob', 'Sidewalk', 'Tot_Prob'))
    for d in dic_all_boro.keys():
        writer.writerow((d, dic_all_boro[d][0], dic_all_boro[d][1], dic_all_boro[d][2], dic_all_boro[d][3], dic_all_boro[d][4]))
In [29]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
#Problem counts per borough, in a fixed order so the labels match the points
boro_order = ['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island']
root = [dic_all_boro[b][0] for b in boro_order]
trunk = [dic_all_boro[b][1] for b in boro_order]
branch = [dic_all_boro[b][2] for b in boro_order]
tot = [dic_all_boro[b][4] for b in boro_order]  #index 3 holds the sidewalk count
fig, ax = plt.subplots()
ax.set_xlabel('Root')
ax.set_ylabel('Trunk')
ax.scatter(np.asarray(root), np.asarray(trunk))
for label, x, y in zip(boro_order, root, trunk):
    ax.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
fig.savefig("problem1")
plt.show()
f, ax = plt.subplots()
ax.set_xlabel('Root')
ax.set_ylabel('Branch')
ax.scatter(np.asarray(root), np.asarray(branch))
for label, x, y in zip(boro_order, root, branch):
    ax.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
f.savefig("problem2")
plt.show()
f, ax = plt.subplots()
ax.set_xlabel('Trunk')
ax.set_ylabel('Branch')
ax.scatter(np.asarray(trunk), np.asarray(branch))
for label, x, y in zip(boro_order, trunk, branch):
    ax.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
f.savefig("problem3")
plt.show()
We have used the Apriori algorithm to explore whether some problems tend to appear together.
In [30]:
index = 0
root_stone_lon = []
root_stone_lat = []
root_grate_lon = []
root_grate_lat = []
trunk_wire_lon = []
trunk_wire_lat = []
trunk_light_lon = []
trunk_light_lat = []
branch_light_lon = []
branch_light_lat = []
branch_shoe_lon = []
branch_shoe_lat = []
count_br = 0
count2 = 0
for i in range(0, len(Latitude)):
    if root_stone[i] == 'Yes':
        root_stone_lat.append(float(Latitude[i]))
        root_stone_lon.append(float(Longitude[i]))
    if root_grate[i] == 'Yes':
        root_grate_lat.append(Latitude[i])
        root_grate_lon.append(Longitude[i])
    if trunk_wire[i] == 'Yes':
        trunk_wire_lat.append(Latitude[i])
        trunk_wire_lon.append(Longitude[i])
    if trnk_light[i] == 'Yes':
        trunk_light_lat.append(Latitude[i])
        trunk_light_lon.append(Longitude[i])
    if brch_light[i] == 'Yes':
        branch_light_lat.append(Latitude[i])
        branch_light_lon.append(Longitude[i])
    if brch_shoe[i] == 'Yes':
        branch_shoe_lat.append(Latitude[i])
        branch_shoe_lon.append(Longitude[i])
    if Borough[i] == 'Brooklyn' and brch_light[i] == 'Yes' and trunk_wire[i] == 'Yes':
        count_br += 1
    if Borough[i] == 'Brooklyn':
        count2 += 1
print('Count: ', count_br, count2)
print(set(Borough))
root_stone_zip = list(zip(root_stone_lon, root_stone_lat))
root_grate_zip = list(zip(root_grate_lon, root_grate_lat))
trunk_wire_zip = list(zip(trunk_wire_lon, trunk_wire_lat))
trunk_light_zip = list(zip(trunk_light_lon, trunk_light_lat))
branch_light_zip = list(zip(branch_light_lon, branch_light_lat))
branch_shoe_zip = list(zip(branch_shoe_lon, branch_shoe_lat))
In [31]:
sidewalk_cond_lon = []
sidewalk_cond_lat = []
for i in range(0, len(Latitude)):
    if Sidewalk_Condition[i] == 'Damage':
        sidewalk_cond_lat.append(float(Latitude[i]))
        sidewalk_cond_lon.append(float(Longitude[i]))
with open('sidewalk_dam.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(('lon', 'lat'))
    for xd, yd in zip(sidewalk_cond_lon, sidewalk_cond_lat):
        writer.writerow((xd, yd))
In [32]:
#Write each problem type's coordinates to its own csv file
problem_files = [('root_stone.csv', root_stone_zip),
                 ('root_grate.csv', root_grate_zip),
                 ('trk_wire.csv', trunk_wire_zip),
                 ('trk_light.csv', trunk_light_zip),
                 ('brc_light.csv', branch_light_zip),
                 ('brc_shoe.csv', branch_shoe_zip)]
for fname, coords in problem_files:
    with open(fname, 'w') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(('lon', 'lat'))
        for xd, yd in coords:
            writer.writerow((xd, yd))
In [33]:
import geoplotlib
from geoplotlib.utils import read_csv, BoundingBox, DataAccessObject
min_lat = min(root_stone_lat)
max_lat = max(root_stone_lat)
min_lon = min(root_stone_lon)
max_lon = max(root_stone_lon)
bbox = BoundingBox(north=float(max_lat), west=float(min_lon), south=float(min_lat), east=float(max_lon))
print("Trees:", bbox)
geoplotlib.set_bbox(bbox)
#Plot each problem type as a dot map
for fname, title in [('root_stone.csv', 'Root stone:'),
                     ('root_grate.csv', 'Root grate:'),
                     ('trk_wire.csv', 'Trunk wire:'),
                     ('trk_light.csv', 'Trunk light:'),
                     ('brc_light.csv', 'Branch light:'),
                     ('brc_shoe.csv', 'Branch shoe:')]:
    data = read_csv(fname)
    print(title)
    geoplotlib.dot(data, 'r', point_size=0.4)
    geoplotlib.inline()
data1 = read_csv('brc_light.csv')
data2 = read_csv('trk_wire.csv')
print('Branch light and Trunk Wire:')
geoplotlib.dot(data1, 'r', point_size=0.4)
geoplotlib.dot(data2, 'g', point_size=0.4)
geoplotlib.inline()
In [34]:
data = read_csv('sidewalk_dam.csv')
print('Sidewalk damaged:')
geoplotlib.dot(data, 'r', point_size=0.4)
geoplotlib.inline()
In [35]:
!pip install apyori-1.1.1.tar.gz
## Trying association mining
from apyori import apriori
transactions = [
    ['cheese', 'nuggets'],
    ['burgers', 'balls'],
]
results = list(apriori(transactions))
In [36]:
## Trying association mining
from apyori import apriori
transactions = [
    ['beer', 'nuts'],
    ['beer', 'cheese'],
    ['nuts', 'cheese'],
]
transactions.append(['nuts', 'cheese'])
results = list(apriori(transactions))
print(results[0])
print('')
print(results[1])
print('')
print(results[4])
In [37]:
import numpy as np
#root_stone = data['root_stone']
#root_grate = data['root_grate']
#root_other = data['root_other']
#trunk_wire = data['trunk_wire']
#trnk_light = data['trnk_light']
#trnk_other = data['trnk_other']
#brch_light = data['brch_light']
#brch_shoe = data['brch_shoe']
#brch_other = data['brch_other']
transactions = []
np_count = 0    #trees with no problems
nuno_count = 0  #trees with exactly one problem
counter = 0     #trees with more than one problem
#the *_other categories are deliberately left out of the itemsets
for i in range(0, len(root_stone)):
    temp = []
    if root_stone[i] == 'Yes':
        temp.append("Root_Stone")
    if root_grate[i] == 'Yes':
        temp.append("Root_Grate")
    if trunk_wire[i] == 'Yes':
        temp.append("Trunk_Wire")
    if trnk_light[i] == 'Yes':
        temp.append("Trunk_Light")
    if brch_light[i] == 'Yes':
        temp.append("Branch_Light")
    if brch_shoe[i] == 'Yes':
        temp.append("Branch_Shoe")
    if Sidewalk_Condition[i] == 'Damage':
        temp.append("Sidewalk")
    if len(temp) > 1:
        transactions.append(temp)
        counter += 1
    elif len(temp) == 0:
        np_count += 1
    elif len(temp) == 1:
        nuno_count += 1
results = list(apriori(transactions))
print('Associated:', len(transactions))
print(len(results))
print('Empty:', np_count)
print('One Item:', nuno_count)
print('More: ', counter)
In [38]:
import numpy as np
transactions = []
np_count = 0
nuno_count = 0
counter = 0
#only root stone, trunk wire and branch light are considered here
for i in range(0, len(root_stone)):
    temp = []
    if root_stone[i] == 'Yes':
        temp.append("Root_Stone")
    if trunk_wire[i] == 'Yes':
        temp.append("Trunk_Wire")
    if brch_light[i] == 'Yes':
        temp.append("Branch_Light")
    if len(temp) > 1:
        transactions.append(temp)
        counter += 1
    elif len(temp) == 0:
        np_count += 1
    elif len(temp) == 1:
        nuno_count += 1
results = list(apriori(transactions))
#print the itemset, the support and the lift of each found relation
for i in range(0, len(results)):
    print('- ', i, ':', results[i][0], results[i][1], ', Lift:', results[i][-1][-1][-1])
The results show that branch light problems appear together with trunk wire problems, which can also be seen in the plot. This could be because trees placed such that their branches grow into street lights also have their trunks interfered with by the lighting structures.
We have also explored the 311 dataset in the context of trees, as there is some data in there that is specific to our domain.
Some of the most interesting 311 requests that we found were related to overgrown trees and new tree requests. The Python analysis is not included in detail in the notebook, but two images are included showing geoplots of the mentioned complaints (note that there are some hotspots).
In [39]:
from IPython.display import Image
Image("new_requests.png")
Out[39]:
In [40]:
Image("overgrown_trees.png")
Out[40]:
We have found that, sadly, the problems are not related to the health. In fact, the Pearson correlation between these two parameters was very low.
We explored the diameter and the amount of trees in the areas of the city and discovered that these two factors influence the air quality. We also discovered that the problems seem to have an influence on the diameter.
In [41]:
Image("problem_amount.png")
Out[41]:
The above image shows the results of the regression, which can also be seen here:
These are the results of the regressions for predicting the amount of air pollution (and O2) given the amount of trees and their diameter as parameters. The results were almost the same for the other types of particles. The whole regression notebook is included in the repo along with the necessary data files. (https://github.com/cecli/cecli.github.io/blob/master/regression_notebook.ipynb)
The images showing the correlation between the amount of trees and the pollution are in the other notebook, and this is also the data on which the regression visualization on the website is based.
In [ ]:
In [42]:
#KNN classifier
#Load relevant libraries
import numpy as np
import pylab as pl
from sklearn import neighbors, datasets, model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Latitude'], tree_data['Longitude'])),
    tree_data['Spc_Common'],
    test_size=0.15,
    random_state=42)
accuracy = []
#Classify KNN with K=2-10
for k in range(2, 11):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k, weights="distance")
    #Fit the data and make predictions
    knn.fit(X_train, y_train).predict(X_test)
    #Calculate the cross-validated accuracy on the training set
    n_folds = 5
    score = np.mean(model_selection.cross_val_score(knn.fit(X_train, y_train), X_train, y_train, cv=n_folds))
    print("KNN score for k =", k, ":", score)
    #Save accuracy into a list
    accuracy.append(score)
#KNN score for k = 2 : 0.523009486208
#KNN score for k = 3 : 0.518710892965
#KNN score for k = 4 : 0.520454115483
#KNN score for k = 5 : 0.519925851647
#KNN score for k = 6 : 0.51948343831
#KNN score for k = 7 : 0.518847340502
#KNN score for k = 8 : 0.518138623743
#KNN score for k = 9 : 0.517289048133
#KNN score for k = 10 : 0.51639320774
In [43]:
#Plot accuracy as a function of the number of K (2-10)
import matplotlib.pyplot as plt
plt.figure(figsize=(20,5))
ks = range(2, 11)
plt.plot(ks, accuracy)
plt.xticks(ks)
plt.xlabel("k")
plt.ylabel("Accuracy")
plt.title("Prediction accuracy as a function of k")
plt.show()
In [44]:
#K=4 was chosen for simplicity compared to accuracy
#Test score
knn = neighbors.KNeighborsClassifier(n_neighbors=4, weights = "distance")
#Fit the data and make predictions
knn.fit(X_train, y_train).predict(X_test)
#Calculate accuracy
score = knn.fit(X_train, y_train).score(X_test, y_test)
print(score)  #0.543241288134
In [45]:
tree_data = pd.read_csv('2015_tree_data_updated.csv')
unique, counts = np.unique(tree_data['Spc_Common'], return_counts=True)
print(sorted(zip(counts, unique), reverse=True))
#Try doing KNN for only the top 5 species
top5_spec = ['London planetree', 'honeylocust', 'Callery pear', 'pin oak', 'Norway maple']
tree_spec5 = []
tree_lat5 = []
tree_lon5 = []
for i in range(len(tree_data)):
    if tree_data['Spc_Common'][i] in top5_spec:
        tree_spec5.append(tree_data['Spc_Common'][i])
        tree_lat5.append(tree_data['Latitude'][i])
        tree_lon5.append(tree_data['Longitude'][i])
print(len(tree_spec5))
#KNN classifier
#Load relevant libraries
import numpy as np
import pylab as pl
from sklearn import neighbors, datasets, model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_lat5, tree_lon5)),
    tree_spec5,
    test_size=0.15,
    random_state=42)
accuracy = []
#Classify KNN with K=2-10
for k in range(2, 11):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k, weights="distance")
    #Fit the data and make predictions
    knn.fit(X_train, y_train).predict(X_test)
    #Calculate the cross-validated accuracy
    n_folds = 5
    score = np.mean(model_selection.cross_val_score(knn.fit(X_train, y_train), X_train, y_train, cv=n_folds))
    print("KNN score for k =", k, ":", score)
    #Save accuracy into a list
    accuracy.append(score)
In [46]:
#Create Decision tree classifier
#Load relevant libraries
import numpy as np
from sklearn import tree
from sklearn import model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Latitude'], tree_data['Longitude'])),
    tree_data['Spc_Common'],
    test_size=0.15,
    random_state=42)
#Classify with a decision tree
dt = tree.DecisionTreeClassifier(random_state=42)
#Fit the data and make predictions
dt.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracy
n_folds = 5
score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Decision tree accuracy:", score)
In [47]:
#Adjust KNN classifyer
#Load relevant libraries
import numpy as np
import pylab as pl
from sklearn import neighbors, datasets, model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Latitude'], tree_data['Longitude'])),
    tree_data['Health'],
    test_size=0.15,
    random_state=42)
accuracy = []
#Classify KNN with K=2-10
for k in range(2, 11):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k, weights="distance")
    #Fit the data and make predictions
    knn_pred = knn.fit(X_train, y_train).predict(X_test)
    #Calculate the cross-validated accuracy
    n_folds = 5
    score = np.mean(model_selection.cross_val_score(knn.fit(X_train, y_train), X_train, y_train, cv=n_folds))
    print("KNN score for k =", k, ":", score)
    #Save accuracy into a list
    accuracy.append(score)
#KNN score for k = 2 : 0.768112149227
#KNN score for k = 3 : 0.789842752784
#KNN score for k = 4 : 0.798109764942
#KNN score for k = 5 : 0.804213182933
#KNN score for k = 6 : 0.80804514699
#KNN score for k = 7 : 0.810998909425
#KNN score for k = 8 : 0.813439834781
#KNN score for k = 9 : 0.815178637038
#KNN score for k = 10 : 0.816549869474
In [48]:
#Test accuracy
knn = neighbors.KNeighborsClassifier(n_neighbors=5, weights="distance")
#Fit the data and make predictions
knn_pred = knn.fit(X_train, y_train).predict(X_test)
#Calculate accuracy
score = knn.fit(X_train, y_train).score(X_test, y_test)
print(score)  #0.806917109432
In [ ]:
#Create Decision tree classifier
#Load relevant libraries
import numpy as np
from sklearn import tree
from sklearn import model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Latitude'], tree_data['Longitude'])),
    tree_data['Health'],
    test_size=0.15,
    random_state=42)
#Classify with a decision tree
dt = tree.DecisionTreeClassifier(random_state=42)
#Fit the data and make predictions
dt_pred = dt.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracy
n_folds = 5
score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Decision tree accuracy:", score)  #0.748111525321
In [ ]:
# Create SVM classifier
#Load relevant libraries
import numpy as np
from sklearn import svm
from sklearn import model_selection
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Latitude'], tree_data['Longitude'])),
    tree_data['Health'],
    test_size=0.15,
    random_state=42)
#Classify with an SVM (named clf so we do not shadow the svm module)
clf = svm.SVC(random_state=42)
#Fit the data and make predictions
clf.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracy
n_folds = 5
score = np.mean(model_selection.cross_val_score(clf.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("SVM accuracy:", score)
We tried a decision tree and a random forest to predict species based on location and diameter, classifying the top 20, 10 and 5 species. We found that the decision tree was not an ideal solution, as the best result was 0.5 for the top 5 species; the random forest had the same result. We have included the images of the decision trees, and as can be seen, they just predict two species, locust and pin oak, so this is clearly not a good model. We also tried predicting neighbourhoods based on problems, but that did not work either, as the accuracy was even worse. The images are included in the repository.
In [ ]:
#Extract features for the top species; note that only the top 5 species
#are listed here, although the variables are named "top 10"
top10_spec = ['London planetree', 'pin oak', 'honeylocust', 'Norway maple', 'Callery pear']
tree_spec10 = []
tree_lat10 = []
tree_lon10 = []
tree_health10 = []
tree_nth10 = []
tree_diam10 = []
tree_cb10 = []
tree_boro10 = []
tree_root10 = []
tree_branch10 = []
tree_trunk10 = []
tree_total10 = []
print(len(health), len(tree_data))
for i in range(len(health)):
    if tree_data['Spc_Common'][i] in top10_spec and float(tree_data['Diameter'][i]) >= 10.00:
        tree_spec10.append(tree_data['Spc_Common'][i])
        tree_lat10.append(tree_data['Latitude'][i])
        tree_lon10.append(tree_data['Longitude'][i])
        tree_health10.append(health[i])
        tree_nth10.append(tree_data['Neighbourhoods'][i])
        tree_diam10.append(tree_data['Diameter'][i])
        tree_cb10.append(tree_data['CB'][i])
        tree_boro10.append(tree_data['Borough'][i])
        #per-tree problem counts (root_list etc. are computed in the
        #problem-count analysis, not shown in this notebook)
        tree_root10.append(root_list[i])
        tree_branch10.append(branch_list[i])
        tree_trunk10.append(trunk_list[i])
        tree_total10.append(total_prob_list[i])
print(len(tree_spec10))
print(tree_spec10[:10])
print(tree_lat10[:10])
print(tree_lon10[:10])
In [ ]:
#Bin edges for the diameter (10-15, ..., 45-50, 50-60, ..., 90-100,
#100-150, 150-200, 250, 300), listed from the largest edge downwards
list_bins = [300.00, 250.00, 200.00, 150.00, 100.00, 90.00, 80.00, 70.00, 60.00,
             50.00, 45.00, 40.00, 35.00, 30.00, 25.00, 20.00, 15.00, 10.00]
#Label each tree with the largest bin edge its diameter reaches
new_diam = []
for t in tree_diam10:
    ft = float(t)
    for l in list_bins:
        if ft >= l:
            new_diam.append(l)
            break
In [ ]:
#Decision tree for classifying tree species based on diameter and location
#Load relevant libraries
import numpy as np
from sklearn import tree
from sklearn import model_selection
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
#le.fit(tree_data['Neighbourhoods'])
#list(le.classes_)
#trans_nbh = le.transform(tree_data['Neighbourhoods'])
#Split data set into a training and a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_data['Diameter'], tree_data['Longitude'], tree_data['Latitude'])),
    tree_data['Spc_Common'],
    test_size=0.10,
    random_state=42)
#Classify with a decision tree
dt = tree.DecisionTreeClassifier(random_state=42)
#Fit the data and make predictions
dt.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracy
n_folds = 10
score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Decision tree accuracy:", score)
#trans_nbh10 = le.transform(tree_nth10)
#X_train, X_test, y_train, y_test = model_selection.train_test_split(zip(tree_diam10, tree_health10, trans_cb10)
#, tree_spec10
#, test_size=0.33
#, random_state=42)
#Classify Decision trees
#dt = tree.DecisionTreeClassifier(random_state = 42)
#Fit the data and make predictions
#dt.fit(X_train, y_train).predict(X_test)
#Calculate accuracy
#score = dtnn.fit(X_train, y_train).score(X_test, y_test)
#n_folds = 10
#score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train),X_train, y_train,cv=n_folds))
#print "Decision tree accuracy for classifying top 10 species:", score
In [ ]:
import numpy as np
from sklearn import tree
from sklearn import model_selection
from sklearn import preprocessing
from sklearn import ensemble
le = preprocessing.LabelEncoder()
le.fit(tree_nth10)
list(le.classes_)
trans_nbh10 = le.transform(tree_nth10)
le = preprocessing.LabelEncoder()
le.fit(tree_boro10)
list(le.classes_)
trans_boro10 = le.transform(tree_boro10)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(tree_lon10, tree_lat10)),
    tree_spec10,
    test_size=0.10,
    random_state=42)
#Classify with a decision tree and a random forest
dt = tree.DecisionTreeClassifier(random_state=42)
dt2 = ensemble.RandomForestClassifier(random_state=42)
#Fit the data and make predictions
dt.fit(X_train, y_train).predict(X_test)
dt2.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracies
n_folds = 10
score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Decision tree accuracy for classifying top 10 species:", score)
score = np.mean(model_selection.cross_val_score(dt2.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Random forest accuracy for classifying top 10 species:", score)
In [ ]:
import numpy as np
from sklearn import tree
from sklearn import model_selection
from sklearn import preprocessing
from sklearn import ensemble
le = preprocessing.LabelEncoder()
le.fit(tree_nth10)
list(le.classes_)
trans_nbh10 = le.transform(tree_nth10)
le = preprocessing.LabelEncoder()
le.fit(tree_boro10)
list(le.classes_)
trans_boro10 = le.transform(tree_boro10)
le = preprocessing.LabelEncoder()
le.fit(tree_spec10)
list(le.classes_)
trans_species10 = le.transform(tree_spec10)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    list(zip(trans_species10, tree_total10)),
    new_diam,
    test_size=0.25,
    random_state=42)
#Classify with a decision tree and a random forest
dt = tree.DecisionTreeClassifier(max_depth=20, max_leaf_nodes=40, random_state=42)
dt2 = ensemble.RandomForestClassifier(random_state=42)
#Fit the data and make predictions
dt.fit(X_train, y_train).predict(X_test)
dt2.fit(X_train, y_train).predict(X_test)
#Calculate the cross-validated accuracies
n_folds = 5
score = np.mean(model_selection.cross_val_score(dt.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Decision tree accuracy for predicting binned diameter:", score)
score = np.mean(model_selection.cross_val_score(dt2.fit(X_train, y_train), X_train, y_train, cv=n_folds))
print("Random forest accuracy for predicting binned diameter:", score)