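This notebook demonstrates some of the utility provided by the pygw Python package.
In this guide, we will show how you can use pygw to easily:
- define a data schema (a SimpleFeatureType) for vector data,
- create features of that type,
- create a RocksDB GeoWave data store,
- register a data type adapter and indices with the data store,
- write data into the data store, and
- query data out of the data store with CQL, spatial/temporal constraints, and filters, including loading the results into a pandas DataFrame.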
In [ ]:
%pip install ../../../../python/src/main/python
In [1]:
import csv
with open("../../../java-api/src/main/resources/stateCapitals.csv", encoding="utf-8-sig") as f:
    reader = csv.reader(f)
    raw_data = [row for row in reader]
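The encoding="utf-8-sig" argument matters when a CSV file begins with a UTF-8 byte order mark (BOM). With plain "utf-8", the BOM survives as an invisible \ufeff character glued to the first field of the first row. The sketch below shows the difference on a small in-memory CSV; the row contents are hypothetical placeholders, not data from stateCapitals.csv.

```python
import csv
import io

# Hypothetical one-row CSV that starts with a UTF-8 byte order mark (BOM)
RAW_BYTES = "\ufeffExampleState,ExampleCity,-90.0,40.0,1850\n".encode("utf-8")


def first_field(encoding):
    """Decode RAW_BYTES with the given codec, parse it as CSV, and return row 0, column 0."""
    text = RAW_BYTES.decode(encoding)
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0][0]


print(repr(first_field("utf-8")))      # BOM kept: '\ufeffExampleState'
print(repr(first_field("utf-8-sig")))  # BOM stripped: 'ExampleState'
```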
In [2]:
# Let's take a look at what the data looks like
raw_data[0]
For the purposes of this exercise, we will use the state name ([0]), capital name ([1]), longitude ([2]), latitude ([3]), and the year that the capital was established ([4]).
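Every CSV value arrives as a string, so the coordinate and year columns must be converted before they can be used as geometry coordinates or dates. A minimal sketch of that conversion follows, using a hypothetical CapitalRecord type and a made-up row; the notebook performs the same conversions inline when it builds features. Because the source column holds only a year, the date is set to January 1 of that year.

```python
from datetime import datetime
from typing import NamedTuple


class CapitalRecord(NamedTuple):
    """Hypothetical typed view of the five columns used in this exercise."""
    state_name: str
    capital_name: str
    longitude: float
    latitude: float
    established: datetime


def parse_row(row):
    # Columns: [0] state, [1] capital, [2] longitude, [3] latitude, [4] year established
    return CapitalRecord(
        state_name=row[0],
        capital_name=row[1],
        longitude=float(row[2]),
        latitude=float(row[3]),
        established=datetime(int(row[4]), 1, 1),
    )


record = parse_row(["ExampleState", "ExampleCity", "-90.0", "40.0", "1850"])
print(record.capital_name, record.longitude, record.established.year)  # ExampleCity -90.0 1850
```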
In [3]:
from pygw.geotools import SimpleFeatureTypeBuilder
from pygw.geotools import AttributeDescriptor
# Create the feature type builder
type_builder = SimpleFeatureTypeBuilder()
# Set the name of the feature type
type_builder.set_name("StateCapitals")
# Add the attributes
type_builder.add(AttributeDescriptor.point("location"))
type_builder.add(AttributeDescriptor.string("state_name"))
type_builder.add(AttributeDescriptor.string("capital_name"))
type_builder.add(AttributeDescriptor.date("established"))
# Build the feature type
state_capitals_type = type_builder.build_feature_type()
pygw allows you to create SimpleFeature instances for a SimpleFeatureType using a SimpleFeatureBuilder.
The SimpleFeatureBuilder allows us to specify all of the attributes of a feature, and then build it by providing a feature ID. For this exercise, we will use the row index of each capital in the data as its unique feature ID. We will use shapely to create the geometries for each feature.
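The set-attributes-then-build pattern can be illustrated without pygw. The sketch below is a hypothetical, pure-Python stand-in (DictFeatureBuilder is not part of pygw): it checks attribute names against a fixed schema and clears its state after each build so one builder can be reused for every row. Whether pygw's SimpleFeatureBuilder clears its state between builds is not shown by this notebook, which sets every attribute for each feature regardless.

```python
class DictFeatureBuilder:
    """Hypothetical stand-in for a feature builder; not the pygw API."""

    def __init__(self, attribute_names):
        self._names = tuple(attribute_names)
        self._values = {}

    def set_attr(self, name, value):
        # Reject attributes that are not part of the schema
        if name not in self._names:
            raise KeyError(f"unknown attribute: {name}")
        self._values[name] = value

    def build(self, feature_id):
        # Any attribute that was never set is recorded as None
        feature = {"id": feature_id}
        feature.update({name: self._values.get(name) for name in self._names})
        # Reset so the next feature starts from a clean slate
        self._values = {}
        return feature


builder = DictFeatureBuilder(["location", "state_name", "capital_name", "established"])
builder.set_attr("state_name", "ExampleState")
builder.set_attr("capital_name", "ExampleCity")
first = builder.build("0")
second = builder.build("1")
print(first["capital_name"], second["capital_name"])  # ExampleCity None
```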
In [4]:
from pygw.geotools import SimpleFeatureBuilder
from shapely.geometry import Point
from datetime import datetime
feature_builder = SimpleFeatureBuilder(state_capitals_type)
features = []
for idx, capital in enumerate(raw_data):
    state_name = capital[0]
    capital_name = capital[1]
    longitude = float(capital[2])
    latitude = float(capital[3])
    established = datetime(int(capital[4]), 1, 1)
    feature_builder.set_attr("location", Point(longitude, latitude))
    feature_builder.set_attr("state_name", state_name)
    feature_builder.set_attr("capital_name", capital_name)
    feature_builder.set_attr("established", established)
    feature = feature_builder.build(str(idx))
    features.append(feature)
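Because shapely's Point takes x before y, the longitude must be passed first and the latitude second. Swapped columns are easy to miss, so a cheap range check before building features can catch some of them. The helper below is a hypothetical addition, not part of the original notebook. It can only flag values outside the valid degree ranges, so a swap where both values fall within [-90, 90] goes undetected.

```python
def check_lon_lat(longitude, latitude):
    """Raise ValueError if a longitude/latitude pair is outside the valid degree ranges."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    return longitude, latitude


# A correctly ordered (longitude, latitude) pair passes unchanged
print(check_lon_lat(-100.0, 45.0))  # (-100.0, 45.0)

# The same pair with its columns swapped is rejected: -100 is not a valid latitude
try:
    check_lon_lat(45.0, -100.0)
except ValueError as err:
    print(err)  # latitude out of range: -100.0
```

In the loop above, such a check could be applied to the parsed values just before Point(longitude, latitude) is constructed.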
Now that we have a set of SimpleFeatures, let's create a data store to write the features into. pygw supports all of the data store types that GeoWave supports. All that is needed is to first construct the appropriate DataStoreOptions variant that defines the parameters of the data store, and then pass those options to a DataStoreFactory to construct the DataStore. In this example we will create a new RocksDB data store.
In [5]:
from pygw.store import DataStoreFactory
from pygw.store.rocksdb import RocksDBOptions
# Specify the options for the data store
options = RocksDBOptions()
options.set_geowave_namespace("geowave.example")
# NOTE: Directory is relative to the JVM working directory.
options.set_directory("./datastore")
# Create the data store
datastore = DataStoreFactory.create_data_store(options)
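Because the RocksDB directory is resolved against the JVM's working directory rather than Python's, a relative path such as ./datastore can end up somewhere unexpected if the two differ. One way to sidestep this, assuming (untested here) that set_directory accepts absolute paths, is to resolve the path in Python first and pass the absolute string. The sketch only shows the path handling with pathlib and does not call pygw.

```python
from pathlib import Path

# Resolve the relative directory against Python's current working directory
datastore_dir = Path("./datastore").resolve()

# str(datastore_dir) could then be passed to options.set_directory(...)
print(datastore_dir.is_absolute(), datastore_dir.name)  # True datastore
```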
In [6]:
help(datastore)
To store data in our data store, we first have to register a DataTypeAdapter for our simple feature data and create an index that defines how the data is queried. GeoWave supports simple feature data through the use of a FeatureDataAdapter. All that is needed for a FeatureDataAdapter is a SimpleFeatureType. We will also add both spatial and spatial/temporal indices.
In [7]:
from pygw.geotools import FeatureDataAdapter
# Create an adapter for the feature type
state_capitals_adapter = FeatureDataAdapter(state_capitals_type)
In [8]:
from pygw.index import SpatialIndexBuilder
from pygw.index import SpatialTemporalIndexBuilder
# Add a spatial index
spatial_idx = SpatialIndexBuilder().set_name("spatial_idx").create_index()
# Add a spatial/temporal index
spatial_temporal_idx = SpatialTemporalIndexBuilder().set_name("spatial_temporal_idx").create_index()
In [9]:
# Now we can add our type to the data store with both the spatial and spatial/temporal indices
datastore.add_type(state_capitals_adapter, spatial_idx, spatial_temporal_idx)
In [10]:
# Check that we've successfully registered an index and type
registered_types = datastore.get_types()
for t in registered_types:
    print(t.get_type_name())
In [11]:
registered_indices = datastore.get_indices(state_capitals_adapter.get_type_name())
for i in registered_indices:
    print(i.get_name())
In [12]:
# Create a writer for our data
writer = datastore.create_writer(state_capitals_adapter.get_type_name())
In [13]:
# Writing data to the data store
for ft in features:
    writer.write(ft)
In [14]:
# Close the writer when we are done with it
writer.close()
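If an exception is raised partway through the write loop, the writer.close() call in the separate cell is never reached. Since the writer exposes a close() method, the standard library's contextlib.closing can guarantee that it is called. The sketch below demonstrates this with a hypothetical RecordingWriter stand-in rather than a real pygw writer, so it runs without a data store.

```python
from contextlib import closing


class RecordingWriter:
    """Hypothetical stand-in with the same write()/close() shape as the notebook's writer."""

    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, item):
        if self.closed:
            raise RuntimeError("writer is closed")
        self.written.append(item)

    def close(self):
        self.closed = True


writer_stub = RecordingWriter()
try:
    # closing() calls writer_stub.close() when the block exits, even on an exception
    with closing(writer_stub) as w:
        for item in ["a", "b", None]:
            if item is None:
                raise ValueError("bad record")
            w.write(item)
except ValueError:
    pass

print(writer_stub.written, writer_stub.closed)  # ['a', 'b'] True
```

With the notebook's objects, the same pattern would wrap datastore.create_writer(state_capitals_adapter.get_type_name()) in closing(...) around the write loop.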
In [15]:
from pygw.query import VectorQueryBuilder
# Create the query builder
query_builder = VectorQueryBuilder()
# When you don't supply any constraints to the query builder, everything will be queried
query = query_builder.build()
# Execute the query
results = datastore.query(query)
The results object returned above is a closeable iterator of SimpleFeature objects. Let's define a function that prints some information about these features and then closes the iterator when we are finished with it.
In [16]:
def print_results(results):
    try:
        for result in results:
            capital_name = result.get_attribute("capital_name")
            state_name = result.get_attribute("state_name")
            established = result.get_attribute("established")
            print("{}, {} was established in {}".format(capital_name, state_name, established.year))
    finally:
        # Close the iterator, even if printing fails partway through
        results.close()
In [17]:
# Print the results
print_results(results)
In [18]:
# A CQL expression for capitals that are in the northeastern part of the US
cql_expression = "BBOX(location, -87.83,36.64,-66.74,48.44)"
In [19]:
# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())
# If we want, we can tell the query builder to use the spatial index, since we aren't using time
query_builder.index_name(spatial_idx.get_name())
# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the cql constraints
constraints = constraints_factory.cql_constraints(cql_expression)
# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)
In [20]:
# Display the results
print_results(results)
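The CQL BBOX expression takes the geometry attribute followed by the box corners in the order minimum x, minimum y, maximum x, maximum y, which for longitude/latitude data means west, south, east, north. For point data the test reduces to two range checks, sketched below with hypothetical points. This illustrates the semantics only, not how GeoWave evaluates the query internally.

```python
# Bounding box from the CQL expression: (min lon, min lat, max lon, max lat)
NORTHEAST_BBOX = (-87.83, 36.64, -66.74, 48.44)


def point_in_bbox(longitude, latitude, bbox):
    """Return True if the point lies inside or on the edge of the box."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= longitude <= max_x and min_y <= latitude <= max_y


print(point_in_bbox(-75.0, 40.0, NORTHEAST_BBOX))   # True
print(point_in_bbox(-100.0, 40.0, NORTHEAST_BBOX))  # False: west of -87.83
print(point_in_bbox(-75.0, 30.0, NORTHEAST_BBOX))   # False: south of 36.64
```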
In [21]:
# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())
# We can tell the builder to use the spatial/temporal index
query_builder.index_name(spatial_temporal_idx.get_name())
# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the spatial/temporal constraints builder
constraints_builder = constraints_factory.spatial_temporal_constraints()
# Create the spatial constraint geometry: a 10.0 buffer around Washington, DC, in coordinate units (degrees), not kilometers
washington_dc_buffer = Point(-77.035, 38.894).buffer(10.0)
# Set the spatial constraint
constraints_builder.spatial_constraints(washington_dc_buffer)
# Set the temporal constraint
constraints_builder.add_time_range(datetime(1800,1,1), datetime.now())
# Build the constraints
constraints = constraints_builder.build()
# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)
In [22]:
# Display the results
print_results(results)
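Two details of the spatial/temporal query are worth spelling out. First, because the features are stored in longitude/latitude degrees, the buffer radius of 10.0 is also in degrees, so it does not correspond to a fixed ground distance: a degree of longitude shrinks with the cosine of the latitude. Second, the time range keeps features established between January 1, 1800 and the moment the query runs. The pure-Python sketch below mimics both checks on hypothetical records. It uses a planar distance in degree space (which a shapely buffer of a point approximates with a polygon) and treats the time range as inclusive, an assumption about GeoWave's boundary handling.

```python
import math
from datetime import datetime

DC_LON, DC_LAT = -77.035, 38.894
BUFFER_DEGREES = 10.0
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def within_degree_buffer(longitude, latitude):
    """Planar distance test in degree units, approximating the buffered-point polygon."""
    return math.hypot(longitude - DC_LON, latitude - DC_LAT) <= BUFFER_DEGREES


def within_time_range(established, start=datetime(1800, 1, 1), end=None):
    """Inclusive time-range check; end defaults to the current time."""
    end = end or datetime.now()
    return start <= established <= end


# Approximate ground length of one degree at Washington, DC's latitude
km_per_degree_lat = EARTH_RADIUS_KM * math.pi / 180
km_per_degree_lon = km_per_degree_lat * math.cos(math.radians(DC_LAT))
print(round(km_per_degree_lat), round(km_per_degree_lon))

# Hypothetical records: (longitude, latitude, established)
records = [
    (-75.0, 40.0, datetime(1850, 1, 1)),   # near DC, after 1800 -> kept
    (-75.0, 40.0, datetime(1790, 1, 1)),   # near DC, before 1800 -> dropped
    (-100.0, 40.0, datetime(1850, 1, 1)),  # about 23 degrees west of DC -> dropped
]
kept = [r for r in records if within_degree_buffer(r[0], r[1]) and within_time_range(r[2])]
print(len(kept))  # 1
```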
In [23]:
from pygw.query import FilterFactory
# Create the filter factory
filter_factory = FilterFactory()
# Create a filter that passes when the capital location is within 500 miles of the
# literal location of Washington DC
location_prop = filter_factory.property("location")
washington_dc_lit = filter_factory.literal(Point(-77.035, 38.894))
distance_km = 500 * 1.609344 # Convert miles to kilometers
distance_filter = filter_factory.dwithin(location_prop, washington_dc_lit, distance_km, "kilometers")
# Create a filter that passes when the capital name contains the letter L.
capital_name_prop = filter_factory.property("capital_name")
name_filter = filter_factory.like(capital_name_prop, "*l*")
# Create a filter that passes when the established date is after 1830
established_prop = filter_factory.property("established")
date_lit = filter_factory.literal(datetime(1830, 1, 1))
date_filter = filter_factory.after(established_prop, date_lit)
# Combine the name, distance, and date filters
combined_filter = filter_factory.and_([distance_filter, name_filter, date_filter])
# Create the query builder
query_builder = VectorQueryBuilder()
query_builder.add_type_name(state_capitals_adapter.get_type_name())
# Get the constraints factory
constraints_factory = query_builder.constraints_factory()
# Create the filter constraints
constraints = constraints_factory.filter_constraints(combined_filter)
# Set the constraints and build the query
query = query_builder.constraints(constraints).build()
# Execute the query
results = datastore.query(query)
In [24]:
# Display the results
print_results(results)
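The combined filter is evaluated per feature, so its logic can be mirrored in plain Python to sanity-check expectations. The sketch below does that for hypothetical records under assumptions the notebook does not confirm: distances use the haversine great-circle formula on a spherical Earth (GeoTools' DWITHIN may compute distance differently), the * in the LIKE pattern matches any run of characters with case-sensitive matching, and AFTER is a strict comparison. The 500-mile radius converts to 500 * 1.609344 = 804.672 km.

```python
import math
from datetime import datetime
from fnmatch import fnmatchcase

EARTH_RADIUS_KM = 6371.0088
DC_LON, DC_LAT = -77.035, 38.894
MAX_DISTANCE_KM = 500 * 1.609344  # 500 miles in kilometers


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def passes_combined_filter(capital_name, longitude, latitude, established):
    """Mirror of the distance AND name AND date filters, under the assumptions above."""
    near_dc = haversine_km(longitude, latitude, DC_LON, DC_LAT) <= MAX_DISTANCE_KM
    # fnmatchcase: '*' matches any run of characters, and matching is case-sensitive
    name_matches = fnmatchcase(capital_name, "*l*")
    after_1830 = established > datetime(1830, 1, 1)
    return near_dc and name_matches and after_1830


# Hypothetical capitals: (name, longitude, latitude, established)
print(passes_combined_filter("Hillsboro", -76.0, 39.0, datetime(1850, 1, 1)))   # True
print(passes_combined_filter("Riverton", -76.0, 39.0, datetime(1850, 1, 1)))    # False: no 'l'
print(passes_combined_filter("Hillsboro", -76.0, 39.0, datetime(1820, 1, 1)))   # False: too early
print(passes_combined_filter("Hillsboro", -120.0, 39.0, datetime(1850, 1, 1)))  # False: too far
```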
In [ ]:
%pip install pandas
Next, we will import pandas, query the data store, and load the results into a pandas DataFrame.
In [25]:
from pandas import DataFrame
# Query everything
query = VectorQueryBuilder().build()
results = datastore.query(query)
# Load the results into a pandas dataframe
dataframe = DataFrame.from_records([feature.to_dict() for feature in results])
# Close the iterator now that every result has been consumed
results.close()
# Display the dataframe
dataframe