This notebook demonstrates some of the utility provided by the pygw Python package.
In this guide, we will show how you can use pygw to easily define a data schema, ingest features into a data store, and query them.
To make this guide more interesting, we will be playing with a toy dataset from Kaggle on Boston Public School buildings.
In [1]:
# Install pygw
!pip install ../main/python/
In [2]:
import pygw
# --- Importing Relevant Modules ---
# Data Stores module
import pygw.stores
# Index module
import pygw.indices
# Geotools support
import pygw.geotools
# Query module
import pygw.query
In [ ]:
import csv
with open("public_schools.csv", encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)
    raw_data = [row for row in reader]
In [ ]:
# Let's take a look at what the data looks like
raw_data[0]
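To make the shape of each row concrete, here is a minimal sketch that parses an in-memory CSV the same way. The sample rows and values are invented for illustration and are not taken from the Kaggle file; only the column names come from the properties used in this guide.

```python
import csv
import io

# Hypothetical sample standing in for public_schools.csv; values are made up.
# The leading "\ufeff" is the byte-order mark that 'utf-8-sig' strips on read.
sample_bytes = (
    "\ufeffBLDG_ID,BLDG_NAME,ADDRESS,X,Y\n"
    "1,Example School,1 Example Avenue,-71.05,42.35\n"
    "2,Sample Academy,2 Sample Street,-71.12,42.30\n"
).encode("utf-8")

with io.TextIOWrapper(io.BytesIO(sample_bytes), encoding="utf-8-sig", newline="") as f:
    sample_rows = list(csv.DictReader(f))

# Each row is a dict keyed by the header names; every value is a string.
first_row = sample_rows[0]
print(first_row["BLDG_NAME"], first_row["X"], first_row["Y"])
```

Because every value comes back as a string, numeric fields such as X and Y need an explicit conversion before they can be used as coordinates.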
For the purposes of this exercise, let's just look at the ADDRESS, X, Y, and BLDG_NAME properties of each datapoint.
We can define a data schema for our needs and create an appropriate SimpleFeatureType. The SimpleFeatureType constructor takes in varargs for the kinds of attributes we want our type to have.
We can easily create these with data-type-specific convenience methods for constructing attributes, such as SimpleFeatureTypeAttribute.string.
In [ ]:
from pygw.geotools import SimpleFeatureType as SFT
from pygw.geotools import SimpleFeatureTypeAttribute as SFTAttr
# Creating the Data Type for Public Schools data
pub_school_dt = SFT("public_schools",
                    SFTAttr.string("building_name"),
                    SFTAttr.string("address"),
                    SFTAttr.geometry("coordinates"))  # Let's group X and Y as a coordinate
pygw allows you to create SimpleFeature instances straight from a SimpleFeatureType. We can use the SimpleFeatureType.create_feature method to do so easily!
SimpleFeatureType.create_feature takes in an id and kwargs corresponding to the attribute descriptions associated with the type when we first created it.
In [ ]:
features = []
for bldg in raw_data:
    data_id = int(bldg["BLDG_ID"])
    addr = bldg["ADDRESS"]
    name = bldg["BLDG_NAME"]
    coords = (float(bldg["X"]), float(bldg["Y"]))
    ft = pub_school_dt.create_feature(data_id, building_name=name, address=addr, coordinates=coords)
    features.append(ft)
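The conversion of X and Y assumes every row holds a valid number in both columns. A minimal sketch of a more defensive conversion follows, assuming X is a longitude and Y is a latitude (as the zoo coordinates later in this guide suggest). `parse_coords` is a hypothetical helper and is not part of pygw.

```python
from typing import Mapping, Optional, Tuple


def parse_coords(row: Mapping[str, str]) -> Optional[Tuple[float, float]]:
    """Return (x, y) as floats, or None if either value is missing or invalid."""
    try:
        x = float(row["X"])
        y = float(row["Y"])
    except (KeyError, TypeError, ValueError):
        return None
    # Reject values outside the valid longitude/latitude ranges (NaN fails too).
    if not (-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0):
        return None
    return (x, y)


rows = [
    {"X": "-71.05", "Y": "42.35"},  # valid
    {"X": "", "Y": "42.35"},        # empty X
    {"Y": "42.35"},                 # missing X
]
parsed = [parse_coords(r) for r in rows]
print(parsed)
```

Rows for which the helper returns None could be skipped or logged rather than letting a single bad value abort the whole loop.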
In [ ]:
store = pygw.stores.RocksDbDs(gw_namespace="pygw.boston_schools.example", dir="./schools")
In [ ]:
help(store)
In [ ]:
# We provide a convenience method to get the type adapter straight from the SimpleFeatureType!
pub_school_adapter = pub_school_dt.get_type_adapter()
In [ ]:
# We want to index by coordinates so we want a spatial index
index = pygw.indices.SpatialIndex()
In [ ]:
# Add our type to our data store
store.add_type(pub_school_adapter, index)
In [ ]:
# Check that we've successfully registered an index and type
store.get_types()
In [ ]:
store.get_indices()
In [ ]:
# Create a writer for our data
writer = store.create_writer(pub_school_dt.get_name())
In [ ]:
# Writing data to the data store
for ft in features:
    writer.write(ft)
In [ ]:
writer.close()
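If a write fails partway through, a `close()` call that lives in a separate cell may never run. One way to guard against that is `contextlib.closing` from the standard library, which calls `close()` on exit even when an exception is raised. The `FakeWriter` class below is a stand-in so the sketch runs without a data store; that the pygw writer can be wrapped the same way is an assumption based only on it exposing `write` and `close`.

```python
from contextlib import closing


class FakeWriter:
    """Hypothetical stand-in exposing write/close methods like the writer above."""

    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, item):
        self.written.append(item)

    def close(self):
        self.closed = True


fake_writer = FakeWriter()

# closing() guarantees close() is called when the block exits.
with closing(fake_writer) as w:
    for item in ["a", "b", "c"]:
        w.write(item)

print(fake_writer.written, fake_writer.closed)
```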
In [ ]:
from pygw.query import Query
# `Query.everything` is a convenience method for creating an 'Everything' query
results = store.query(Query.everything())
In [ ]:
# The results returned above are an iterator, so let's convert them to a list
results = [r for r in results]
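Materializing the results matters because a Python iterator can only be consumed once. The sketch below shows that behavior without pygw, assuming the query results behave like a standard Python iterator; `fake_results` is a hypothetical stand-in.

```python
def fake_results():
    """Hypothetical stand-in for a query result iterator."""
    yield from ["row-1", "row-2", "row-3"]


it = fake_results()
first_pass = list(it)   # consumes the iterator
second_pass = list(it)  # nothing left to read

print(len(first_pass), len(second_pass))
```

Storing the list once lets later cells call `len` and index into the results without re-running the query.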
In [ ]:
# Do we have anything?
len(results)
Unfortunately, pygw does not yet support pretty wrapping of the results returned from a query. However, we can use the pygw.debug.print_obj method to see what things look like:
In [ ]:
from pygw.debug import print_obj
In [ ]:
print_obj(results[0])
Woo-hoo! We've successfully ingested our custom data into our data store. That's cool, but now what? Can pygw do more?
Let's say we wanted to retrieve only the public school buildings in East Boston. How would we go about doing that? For the purposes of this exercise, let's just say we want schools to the east of Franklin Park Zoo, which has coordinates:
42.3055° N, 71.0900° W --> (-71.0900, 42.3055)
In [ ]:
# A CQL query for things east of the zoo
# BBOX arguments are (geometry, min_x, min_y, max_x, max_y)
cql_query_string = "BBOX(coordinates,-71.0900,-180,180,180)"
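In CQL, `BBOX` takes the geometry attribute followed by the minimum x, minimum y, maximum x, and maximum y of the box. The sketch below builds such a filter string and mirrors the box test in plain Python. Both `bbox_cql` and `in_bbox` are hypothetical helpers, not part of pygw, and the sketch uses latitude bounds of -90 to 90 (the valid latitude range) rather than the wider bounds in the query string.

```python
def bbox_cql(attribute, min_x, min_y, max_x, max_y):
    """Build a CQL BBOX filter string (hypothetical helper, not part of pygw)."""
    if min_x > max_x or min_y > max_y:
        raise ValueError("minimum bounds must not exceed maximum bounds")
    return f"BBOX({attribute},{min_x},{min_y},{max_x},{max_y})"


def in_bbox(point, min_x, min_y, max_x, max_y):
    """Pure-Python version of the box test for an (x, y) point."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


ZOO_LONGITUDE = -71.09

# Everything east of the zoo: longitude from the zoo to 180, any latitude.
east_of_zoo = bbox_cql("coordinates", ZOO_LONGITUDE, -90, 180, 90)
print(east_of_zoo)

# Invented points: one east of the zoo, one west of it.
print(in_bbox((-71.05, 42.35), ZOO_LONGITUDE, -90, 180, 90))
print(in_bbox((-71.12, 42.30), ZOO_LONGITUDE, -90, 180, 90))
```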
In [ ]:
# Getting the results iterable
results = store.query(Query.cql(cql_query_string))
In [ ]:
# list of results
results = [r for r in results]
In [ ]:
# Less than before!
len(results)
In [ ]:
print_obj(results[0])
Let's say we still want to query for buildings to the east of the zoo, but we also only want buildings whose address ends in "Avenue". We can do that!
In [ ]:
cql_query_string = "BBOX(coordinates,-71.0900,-180,180,180) and address like '%Avenue'"
In [ ]:
# Getting the results iterable
results = store.query(Query.cql(cql_query_string))
results = [r for r in results]
len(results)
In [ ]:
print_obj(results[0])
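In a CQL `like` pattern, `%` matches any sequence of characters, so `'%Avenue'` matches addresses that end in "Avenue". A rough pure-Python equivalent of the combined filter is sketched below over invented sample records, which are not taken from the dataset.

```python
# Invented sample records: (address, (x, y)); not taken from the dataset.
records = [
    ("1 Example Avenue", (-71.05, 42.35)),  # east, ends with "Avenue"
    ("2 Sample Street", (-71.05, 42.36)),   # east, wrong suffix
    ("3 Another Avenue", (-71.12, 42.30)),  # ends with "Avenue", but west
    ("4 Short Ave", (-71.04, 42.33)),       # east, abbreviated suffix
]

ZOO_LONGITUDE = -71.09

# Keep records that are east of the zoo AND whose address ends in "Avenue".
matches = [
    address
    for address, (x, _y) in records
    if x >= ZOO_LONGITUDE and address.endswith("Avenue")
]
print(matches)
```

Note that an abbreviated suffix such as "Ave" does not match the pattern, so the result depends on how addresses are spelled in the source data.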
In [ ]:
# DELETE EVERYTHING
store.delete_all()