Author: Hu Shao shaohutiger@gmail.com
Testing whether a point locates inside a polygon is a time consuming work. Especially when the point number is huge and the boundary of study area is complex.
Here we implement a "Fast Point in Polygon Testing Method" to help users determine whether a point is inside a specific polygon rapidly.
The Quadtree structure is employed to divide a polygon into plenty of lattices. During the dividing, each lattices is marked as in, out or maybe, which means "totally lies inside the polygon", "totally lies out side the polygon" and "intersect with the boundary if the polygon".
For each point, we firstly allocate it into a lattice. If the lattice is marked as "in" or "out", we will immediately know the result; if the lattice is marked as "maybe", we then need to do some further calculation, which won't be complex.
Once the quadtree structure is constructed, the "point in polygon testing" will be very fast.
This method is suitable for the situations where point numbers are huge, polygon numbers are small and polygon boundary is complex. E.g., simulation in point pattern analysis.
In [1]:
# Import essential libraries for following calculation
import libpysal as ps
import numpy as np
from libpysal.cg.shapes import Ring, Polygon
from libpysal.cg.segmentLocator import BruteSegmentLocator
from libpysal.cg.polygonQuadTreeStructure import QuadTreeStructureSingleRing
import libpysal.examples as pysal_examples
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import time
# import codecs
# import shapely
# from pysal.cg.shapes import Polygon, Point
# from shapely.geometry import Polygon as spPolygon
# from shapely.geometry import Point as spPoint
# import time
import random
print("import finished!")
In [5]:
def get_ring_from_file(path):
vertices = []
file = open(path, "r")
for line in file:
if len(line)<2:
continue
coordinates = line.split(",")
x = (float)(coordinates[0])
y = (float)(coordinates[1])
vertices.append((x, y))
file.close()
return vertices
# Prepare the polygons for future use.
Texas = Polygon(get_ring_from_file("data/texas_points.txt"))
Pecos = Polygon(get_ring_from_file("data/pecos_points.txt"))
San_Saba = Polygon(get_ring_from_file("data/san_saba_points.txt"))
Texas_with_holes = Polygon(Texas.vertices,
[Pecos.vertices, San_Saba.vertices])
In [6]:
# construct the quadtree structure by explicitly calling the function "build_quad_tree_structure" in polygon
Texas.build_quad_tree_structure()
print "quad tree build finished"
# create some random point and test if these points locate in the polygon
for i in range(0, 10):
x = random.uniform(Texas.bbox[0], Texas.bbox[2])
y = random.uniform(Texas.bbox[1], Texas.bbox[3])
print(x, y, Texas.contains_point([x, y]))
In [7]:
# The region of Texas, to make the steps more clear, here we only use the main region
texas_main_vertices = Texas.parts[0]
fig, ax = plt.subplots(figsize=(16,11))
qts_singlering = QuadTreeStructureSingleRing(Ring(texas_main_vertices))
patches = []
color_array = []
cells_to_draw = [qts_singlering.root_cell]
while len(cells_to_draw) > 0:
cell = cells_to_draw.pop()
if cell.children_l_b is None:
# this is a leaf in the quad tree structure, draw it
verts = [
[cell.min_x, cell.min_y],
[cell.min_x, cell.min_y + cell.length_y],
[cell.min_x + cell.length_x, cell.min_y + cell.length_y],
[cell.min_x + cell.length_x, cell.min_y],
[cell.min_x, cell.min_y]
]
patches.append(verts)
if cell.status == "in":
color_array.append("#c8e6c9") # in color green
elif cell.status == "out":
color_array.append("#b0bec5") # in color grey
else: # means "maybe"
color_array.append("#ffa726") # in color orange
else:
cells_to_draw.append(cell.children_l_b)
cells_to_draw.append(cell.children_l_t)
cells_to_draw.append(cell.children_r_b)
cells_to_draw.append(cell.children_r_t)
coll = matplotlib.collections.PolyCollection(np.array(patches), facecolors=color_array, edgecolors='#eceff1')
ax.add_collection(coll)
point_x_list = []
point_y_list = []
for point in texas_main_vertices:
point_x_list.append(point[0])
point_y_list.append(point[1])
plt.plot(point_x_list, point_y_list)
ax.autoscale_view()
plt.show()
In [9]:
plt.figure(figsize=(16,12))
for vertices in Texas_with_holes.parts:
line_x_list = []
line_y_list = []
for point in vertices:
line_x_list.append(point[0])
line_y_list.append(point[1])
plt.plot(line_x_list, line_y_list, c="#6a1b9a")
for vertices in Texas_with_holes.holes:
line_x_list = []
line_y_list = []
for point in vertices:
line_x_list.append(point[0])
line_y_list.append(point[1])
plt.plot(line_x_list, line_y_list, c="#1565c0")
point_x_list = []
point_y_list = []
point_colors = []
bbox = Texas_with_holes.bbox
for i in range(0, 1000):
x = random.uniform(bbox[0], bbox[2])
y = random.uniform(bbox[1], bbox[3])
point_x_list.append(x)
point_y_list.append(y)
if Texas_with_holes.contains_point([x, y]):
point_colors.append("#e57373") # inside, red
else:
point_colors.append("#4db6ac") # outside, green
plt.scatter(point_x_list, point_y_list, c = point_colors, linewidth = 0)
plt.show()
In [10]:
# construct a study area with 3000+ vertices
Huangshan = Polygon(get_ring_from_file("data/study_region_huangshan_point.txt"))
points = []
bbox = Huangshan.bounding_box
for i in range(0, 10000):
x = random.uniform(bbox[0], bbox[2])
y = random.uniform(bbox[1], bbox[3])
points.append((x, y))
print str(len(points)) + " random points generated"
print "------------------------------"
print "Begin test without quad-tree-structure"
time_begin = int(round(time.time() * 1000))
for point in points:
Huangshan.contains_point(point)
time_end = int(round(time.time() * 1000))
print "Test without quad-tree-structure finished, time used = " + str((time_end-time_begin)/1000.0) + "s"
print "------------------------------"
print "Begin test with quad-tree-structure"
time_begin = int(round(time.time() * 1000))
Huangshan.build_quad_tree_structure()
count_error = 0
for point in points:
Huangshan.contains_point(point)
time_end = int(round(time.time() * 1000))
print "Test with quad-tree-structure finished, time used = " + str((time_end-time_begin)/1000.0) + "s"
In [12]:
# polygons = ps.open("data/Huangshan_region.shp") # read the research region shape file
# research_region = polygons[0] # set the first polygon as research polygon
# len(research_region.vertices)
vertices = get_ring_from_file("data/study_region_huangshan_point.txt")
print "Study region read finished, with vertices of " + str(len(vertices))
huangshan = Ring(vertices)
points = []
bbox = huangshan.bounding_box
for i in range(0, 1000):
x = random.uniform(bbox[0], bbox[2])
y = random.uniform(bbox[1], bbox[3])
points.append([x, y, True])
print str(len(points)) + " random points generated"
# First, test if these points are inside of the polygon by using the conventional method, record the result
for point in points:
is_in = huangshan.contains_point((point[0], point[1]))
point[2] = is_in
# Then, build the quad-tree and do the test again. Compare the results of two methods.
count_error = 0
huangshan.build_quad_tree_structure()
for point in points:
is_in = huangshan.contains_point((point[0], point[1]))
if point[2] != is_in:
print "Error found!!!"
count_error += 1
if count_error == 0:
print "finished ==================== no error found"
else:
print "finished ==================== ERROR FOUND"
A huge number of points will be simulated and test if falls in the study area in the real case. This calculation process is very computing intensive. Especially when the boundary of study area is complex and contains a lot of segments, or the simulation time is also large.
In order to fast decide whether a point is contained in the study area, we can prepare an grid structure which divide the study area into quadtree based regular rectangles. Each rectangles will have a specific status from ['in', 'out', 'maybe']. After we prepared this kind of grid structure, deciding whether a points falls in the study area will be very easy: first, allocate the point into a specific cell. For the cell with different status:
Treat the boundary of the study area as arc. Each time of dividing the study area means use two straight lines (on horizontal and one vertical) to split a big rectangle (cell) into 4 smaller ones. During this process, the arc should also be used to intersect with the straight lines and break into small segments. Different segments should belong to different cells and can be used to determine the status of the cell (as we mentioned: in, out or maybe inside of the study area.) Repeat this process until the cell's size is small enough.
During the dividing, there are some special properties of the arcs we need to know:
The following image depicts the categorize rule of cell boundary when being divided into sub cells:
cell_boundary_category_rule
segment_sequence_search_rule
In the situations that there are some arcs intersect with a cell and we need to extract the segment squence, here is the rule:
From the image we can see that the red border lines also belong to the segment sequence.
During the quadtree dividing, when there is a cell who doesn't intersect with any arc, we need to determine whether this cell is totally within the study area or not. we Can use the method above to determine: If this cell share a border which belongs to other cells' segment sequence, then this cell is totally within the study area; vice versa
extract_connecting_boders_between_points
situation_segment_intersect_with_two_split_line
Under the sitiation that a single segment intersects with both split-lines. This kind of situation should be carefully treated.
Point in Polygon Strategies http://erich.realtimerendering.com/ptinpoly/
Samet, Hanan. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006.
Jiménez, Juan J., Francisco R. Feito, and Rafael J. Segura. "A new hierarchical triangle-based point-in-polygon data structure." Computers & Geosciences 35, no. 9 (2009): 1843-1853.
http://stackoverflow.com/questions/12881848/draw-polygons-more-efficiently-with-matplotlib
http://matplotlib.org/api/collections_api.html
http://materializecss.com/color.html