In [1]:
sc


Out[1]:
<pyspark.context.SparkContext at 0x1053a0b90>

This notebook mainly checks the buffers built from the entrances shapefile (.shp, used previously) against those built from the GeoJSON file (.json) used in the most recent Dumbo-Can-Run scripts.

Load


In [2]:
from shapely.geometry import Point
import pyproj
import geopandas as gpd

# EPSG:2263 is NAD83 / New York Long Island in US survey feet;
# preserve_units=True keeps the projected coordinates in feet.
proj = pyproj.Proj(init='epsg:2263', preserve_units=True)

# The GeoJSON file loads as a single record; its 'features' field holds one entry per entrance.
entr_points = sqlContext.read.load('../why_yellow_taxi/Data/2016_(May)_New_York_City_Subway_Station_Entrances.json',
                                format='json', header=True, inferSchema=True).collect()[0].asDict()['features']
routes = ['route_'+str(i) for i in range(1,12)]

# Collect one row per entrance, then build the GeoDataFrame once
# instead of appending inside the loop.
rows = []
for entr in entr_points:
    entr_coor = entr.asDict()['geometry'].asDict()['coordinates']
    # Project lon/lat to EPSG:2263 and buffer by 100 (feet).
    entr_buffer = Point(proj(float(entr_coor[0]), float(entr_coor[1]))).buffer(100)
    entr_prop = entr.asDict()['properties'].asDict()
    # Keep only the non-empty route_1 ... route_11 values.
    entr_lines = [entr_prop[r] for r in routes if entr_prop[r]]
    rows.append({'geometry': entr_buffer, 'lines': entr_lines})
entr_geo = gpd.GeoDataFrame(rows, columns=['geometry', 'lines'])

In [3]:
shp = gpd.read_file('../why_yellow_taxi/Buffer/entr_buffer_100_feet_epsg4269_nad83/entr_buffer_100_feet_epsg4269_nad83.shp')

List


In [4]:
entr_geo.head(2)


Out[4]:
geometry lines
0 POLYGON ((1008702.708067201 221696.7163773214,... [N, Q]
1 POLYGON ((1008681.505385148 221573.1859670931,... [N, Q]

In [5]:
shp.head(2)


Out[5]:
ADA ADA_Notes Corner Division East_West_ Entrance_T Entry Exit_Only Free_Cross GEOID ... Route_7 Route_8 Route_9 Staff_Hour Staffing Station_La Station_Lo Station_Na Vending geometry
0 FALSE None NW BMT 23rd Ave Stair YES None TRUE 36081 ... None NaN NaN None FULL 40.775036 -73.912034 Ditmars Blvd YES POLYGON ((1008702.708067201 221696.7163773818,...
1 FALSE None NE BMT 23rd Ave Stair YES None TRUE 36081 ... None NaN NaN None FULL 40.775036 -73.912034 Ditmars Blvd YES POLYGON ((1008681.505385144 221573.1859671536,...

2 rows × 33 columns

Identical or Not?


In [6]:
entr_geo.head(2).geometry[1]  == shp.head(2).geometry[1]


Out[6]:
False
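An exact `==` on two polygons fails as soon as any coordinate differs in its last digits, so a tolerance-based comparison is usually more informative. The sketch below is only an illustration: `vertices_close` is a hypothetical helper (not part of the notebook or of shapely), the default tolerance of `1e-6` feet is an assumption, and the two vertices are the first coordinates of row 1 as printed in Out[4] and Out[5].

```python
import math

def vertices_close(a, b, tol=1e-6):
    """Return True if two vertex lists match pairwise within tol.

    tol is an absolute tolerance in the coordinate units (feet here).
    """
    if len(a) != len(b):
        return False
    return all(
        math.isclose(xa, xb, rel_tol=0.0, abs_tol=tol)
        and math.isclose(ya, yb, rel_tol=0.0, abs_tol=tol)
        for (xa, ya), (xb, yb) in zip(a, b)
    )

# First vertex of row 1, copied from Out[4] (GeoJSON) and Out[5] (shapefile).
geojson_vertex = [(1008681.505385148, 221573.1859670931)]
shp_vertex = [(1008681.505385144, 221573.1859671536)]

print(geojson_vertex == shp_vertex)                # False: exact comparison
print(vertices_close(geojson_vertex, shp_vertex))  # True: equal within 1e-6 feet
```

On full geometries, shapely's `equals_exact(other, tolerance)` performs the same kind of vertex-by-vertex check.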

Detail


In [7]:
shp.head(2).geometry[0].centroid.x


Out[7]:
1008602.7080672013

In [8]:
shp.head(2).geometry[0].centroid.y


Out[8]:
221696.71637738176

In [9]:
entr_geo.head(2).geometry[0].centroid.x


Out[9]:
1008602.7080672013

In [10]:
entr_geo.head(2).geometry[0].centroid.y


Out[10]:
221696.71637732137
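The four centroid values above can be reduced to a single offset to see how far apart the two buffers are. This is a small worked example that uses only the printed numbers. It assumes the coordinates are in US survey feet (the unit of EPSG:2263), and the variable names are illustrative.

```python
import math

# One US survey foot is defined as 1200/3937 meters.
US_SURVEY_FOOT_IN_METERS = 1200 / 3937

# Centroids of buffer 0, copied from Out[7]-Out[10].
shp_centroid = (1008602.7080672013, 221696.71637738176)
geojson_centroid = (1008602.7080672013, 221696.71637732137)

dx = shp_centroid[0] - geojson_centroid[0]
dy = shp_centroid[1] - geojson_centroid[1]

# Straight-line distance between the two centroids.
offset_ft = math.hypot(dx, dy)
offset_m = offset_ft * US_SURVEY_FOOT_IN_METERS

print("dx = %.3e ft, dy = %.3e ft" % (dx, dy))
print("offset = %.3e ft (%.3e m)" % (offset_ft, offset_m))
```

The x-values agree exactly and the y-values differ by roughly 6e-8 feet, far below any distance that matters for a 100-foot buffer.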

The two buffer lists differ slightly in their coordinates (the centroid y-values above differ at the eighth decimal place), so we take the GeoJSON version for further study.


Use '2016_(May)_New_York_City_Subway_Station_Entrances.json'


Not '2016_(May)_New_York_City_Subway_Station_Entrances.zip'