If any part of this notebook is used in your research, please cite with the reference found in README.md.


Caveats

Demonstrating known caveats and tweaks in spaghetti

Author: James D. Gaboardi jgaboardi@gmail.com

This notebook demonstrates the following limitations and caveats:

  1. Limitations in non-planarity
  2. Relaxing vertex coordinate precision
  3. Distance matrices and shortest paths between (dis)connected components

In [1]:
%load_ext watermark
%watermark


2020-05-02T21:41:58-04:00

CPython 3.7.3
IPython 7.10.2

compiler   : Clang 9.0.0 (tags/RELEASE_900/final)
system     : Darwin
release    : 19.4.0
machine    : x86_64
processor  : i386
CPU cores  : 4
interpreter: 64bit

In [2]:
import geopandas
import libpysal
from libpysal.cg import Point, Chain
import matplotlib
import matplotlib.pyplot as plt
import spaghetti
%matplotlib inline
%watermark -w
%watermark -iv


watermark 2.0.2
spaghetti  1.5.0.rc0
libpysal   4.2.2
matplotlib 3.1.2
geopandas  0.7.0


In [3]:
try:
    from IPython.display import set_matplotlib_formats
    set_matplotlib_formats("retina")
except ImportError:
    pass
Helper functions for arc and vertex labeling

In [4]:
def arc_labels(a, b, s, offset=[-0.025, 0.025]):
    """Label each network arc."""
    def _lab_loc(_x):
        """Helper for labeling network arcs."""
        xy = _x.geometry.interpolate(0.5, normalized=True).coords[0]
        xy = tuple([_xy+o for (_xy,o) in zip(xy,offset)])
        return xy
    kws = {"ha":"right", "va":"bottom","weight":"bold","color":"k","size":s}
    a.apply(lambda x: b.annotate(s=x.comp_label, xy=_lab_loc(x), **kws), axis=1)

def vert_labels(v, b, s, offset=[0.025, 0.025], col="comp_label", c="r"):
    """Label each network vertex."""
    def _lab_loc(_x):
        """Internal helper for labeling vertices."""
        xy = _x.geometry.coords[0]
        xy = tuple([_xy+o for (_xy,o) in zip(xy,offset)])
        return xy
    kws = {"ha":"left", "va":"bottom","weight":"bold","color":c,"size":s}
    v.apply(lambda x: b.annotate(s=x[col], xy=_lab_loc(x), **kws), axis=1)

1. Limitations in non-planarity

1.a — Barbed wire example
Create some line segments of libpysal.cg.Chain objects

In [5]:
lines = [
    Chain([Point((2,7)), Point((2,3))]),
    Chain([Point((2,3)), Point((0,3))]),
    Chain([Point((0,3)), Point((0,5))]),
    Chain([Point((0,5)), Point((5,5))]),
    Chain([Point((5,5)), Point((5,7))]),
    Chain([Point((5,7)), Point((3,7))]),
    Chain([Point((3,7)), Point((3,3))])
]
lines


Out[5]:
[<libpysal.cg.shapes.Chain at 0x12aeb4a90>,
 <libpysal.cg.shapes.Chain at 0x12aeb4b38>,
 <libpysal.cg.shapes.Chain at 0x12aeb4be0>,
 <libpysal.cg.shapes.Chain at 0x12aeb4c88>,
 <libpysal.cg.shapes.Chain at 0x12aeb4d30>,
 <libpysal.cg.shapes.Chain at 0x12aeb4dd8>,
 <libpysal.cg.shapes.Chain at 0x12aeb4e80>]
Instantiate a spaghetti.Network object

In [6]:
ntw = spaghetti.Network(in_data=lines, extractgraph=False)
Extract the network vertices and arcs

In [7]:
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)
vertices_df


Out[7]:
id geometry comp_label
0 0 POINT (2.00000 7.00000) 0
1 1 POINT (2.00000 3.00000) 0
2 2 POINT (0.00000 3.00000) 0
3 3 POINT (0.00000 5.00000) 0
4 4 POINT (5.00000 5.00000) 0
5 5 POINT (5.00000 7.00000) 0
6 6 POINT (3.00000 7.00000) 0
7 7 POINT (3.00000 3.00000) 0

In [8]:
arcs_df


Out[8]:
id geometry comp_label
0 (0, 1) LINESTRING (2.00000 7.00000, 2.00000 3.00000) 0
1 (1, 2) LINESTRING (2.00000 3.00000, 0.00000 3.00000) 0
2 (2, 3) LINESTRING (0.00000 3.00000, 0.00000 5.00000) 0
3 (3, 4) LINESTRING (0.00000 5.00000, 5.00000 5.00000) 0
4 (4, 5) LINESTRING (5.00000 5.00000, 5.00000 7.00000) 0
5 (5, 6) LINESTRING (5.00000 7.00000, 3.00000 7.00000) 0
6 (6, 7) LINESTRING (3.00000 7.00000, 3.00000 3.00000) 0
Plot

In [9]:
base = arcs_df.plot(lw=3, color="k", alpha=.35, figsize=(9, 9))
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# vertex labels
vert_labels(vertices_df, base, 14)
vert_labels(vertices_df, base, 14, offset=[-0.15, 0.025], col="id", c="k")


The plot above shows network arcs in gray with vertices in red. Also, vertices are labled in black and component membership is labeled in red. Since there are no actual intersections between segments (0,1),(3,4) or (3,4),(6,7), there are no vertices. Therefore, a shortest path from vertex 0 to 7 must follow the route {1,2,3,4,5,6} for a total of 21 units of distance.


In [10]:
ntw.full_distance_matrix(n_processes=1, gen_tree=True)

In [11]:
ntw.network_trees[0][7]


Out[11]:
[6, 5, 4, 3, 2, 1, 0]

In [12]:
ntw.distance_matrix[0][7]


Out[12]:
21.0

1.b — Goalpost example
Create some line segments of libpysal.cg.Chain objects

In [13]:
U = [Chain([Point((2,2)), Point((2,1)), Point((4,1)), Point((4,2))])]
I = [Chain([Point((3,0)), Point((3,1))])]
arcs = U + I
arcs


Out[13]:
[<libpysal.cg.shapes.Chain at 0x131839898>,
 <libpysal.cg.shapes.Chain at 0x131839cf8>]
Instantiate a spaghetti.Network object

In [14]:
ntw = spaghetti.Network(in_data=arcs)


/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There is 1 island with id: (4, 5).
  warnings.warn(message)
/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There are 2 islands with ids: (0, 3), (4, 5).
  warnings.warn(message)
Examine the components

There are two components.


In [15]:
ntw.network_n_components


Out[15]:
2

Component 0 includes 4 vertices (the "goal") and component 1 includes 2 (the "post").


In [16]:
ntw.network_component_vertices


Out[16]:
{0: [0, 1, 2, 3], 1: [4, 5]}
Extract the network vertices and arcs

In [17]:
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)
vertices_df


Out[17]:
id geometry comp_label
0 0 POINT (2.00000 2.00000) 0
1 1 POINT (2.00000 1.00000) 0
2 2 POINT (4.00000 1.00000) 0
3 3 POINT (4.00000 2.00000) 0
4 4 POINT (3.00000 0.00000) 1
5 5 POINT (3.00000 1.00000) 1

In [18]:
arcs_df


Out[18]:
id geometry comp_label
0 (0, 1) LINESTRING (2.00000 2.00000, 2.00000 1.00000) 0
1 (1, 2) LINESTRING (2.00000 1.00000, 4.00000 1.00000) 0
2 (2, 3) LINESTRING (4.00000 1.00000, 4.00000 2.00000) 0
3 (4, 5) LINESTRING (3.00000 0.00000, 3.00000 1.00000) 1
Plot

In [19]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), **base_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# vertex labels
vert_labels(vertices_df, base, 14)


As stated previously, flow from one network segment to another is only possible when segments share a vertex. In this case, no vertex is actually shared, although superficially there does appear to be. The plot above visualizes this concept by showing the two (dis)connected components of the "goalpost" network. Network arcs are colored by component membership and vertices are labeled by component membership. This is also demonstrated below when overlaying a plot of libpysal.weights.W. This is accomplished below with arcs also shown with component membership (black nodes connected with black edges and labeled in black).


In [20]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# arc labels
arc_labels(arcs_df, base, 12)
# vertex labels
vert_labels(vertices_df, base, 14)


However, if the line segments were digitized "properly" a single connected component can be obtained (we emphasize "properly" because segments of a spatial network do not always intersect to form a single component in reality, for example highway over/underpases). In order to obtained a single connected component in this case we can simply add another point—(3,1)—to the initial collection of lines prior to network instantiation.


In [21]:
U = [Chain([Point((2,2)), Point((2,1)), Point((3,1)), Point((4,1)), Point((4,2))])]
I = [Chain([Point((3,0)), Point((3,1))])]
arcs = U + I
ntw = spaghetti.Network(in_data=arcs)
print("Network components:\t", ntw.network_n_components)
print("Network vertices:\t", ntw.network_component_vertices)
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)


Network components:	 1
Network vertices:	 {0: [0, 1, 2, 3, 4, 5]}

In [22]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# arc labels
arc_labels(arcs_df, base, 12)
# vertex labels
vert_labels(vertices_df, base, 14)


Now both the network vertices and spatially-weighted arcs are all members of a single connected component.


1.c — Empirical Example: John Snow's Cholera Map

In [23]:
libpysal.examples.explain("snow_maps")


snow_maps
=========

Public water pumps and Cholera deaths in London 1954 (Snow's Map)
-----------------------------------------------------------------

* SohoPeople.dbf: attribute data for Cholera deaths. (k=2)
* SohoPeople.prj: ESRI projection file.
* SohoPeople.sbn: spatial index.
* SohoPeople.sbx: spatial index.
* SohoPeople.shp: Point shapefile for Cholera deaths. (n=324)
* SohoPeople.shx: spatial index.
* SohoWater.dbf: attribute data for public water pumps. (k=1)
* SohoWater.prj: ESRI projection file.
* SohoWater.sbn: spatial index.
* SohoWater.sbx: spatial index.
* SohoWater.shp: Point shapefile for public water pumps. (n=13)
* SohoWater.shx: spatial index.
* Soho_Network.dbf: attribute data for street network. (k=1)
* Soho_Network.prj: ESRI projection file.
* Soho_Network.sbn: spatial index.
* Soho_Network.sbx: spatial index.
* Soho_Network.shp: Line shapefile for street network. (n=118)
* Soho_Network.shx: spatial index.


In [24]:
soho = geopandas.read_file(libpysal.examples.get_path("Soho_Network.shp"))
soho = soho.to_crs(epsg=32630)
soho


Out[24]:
Id geometry
0 0 LINESTRING (698394.862 5710772.742, 698230.823...
1 0 LINESTRING (698373.590 5710607.200, 698481.244...
2 0 LINESTRING (698466.845 5710652.970, 698381.993...
3 0 LINESTRING (698373.590 5710607.200, 698322.058...
4 0 LINESTRING (698322.058 5710573.561, 698304.893...
... ... ...
113 0 LINESTRING (698851.874 5710691.953, 698837.518...
114 0 LINESTRING (698860.680 5710803.234, 698825.980...
115 0 LINESTRING (698748.567 5710903.073, 698703.656...
116 0 LINESTRING (698715.115 5710881.223, 698695.567...
117 0 LINESTRING (698673.506 5710931.132, 698695.567...

118 rows × 2 columns


In [25]:
ntw = spaghetti.Network(in_data=soho)


/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 45 disconnected components.
 There are 24 islands with ids: (22, 23), (28, 29), (30, 31), (47, 48), (49, 50), (58, 59), (67, 68), (75, 76), (84, 85), (86, 87), (108, 109), (110, 111), (121, 122), (128, 129), (172, 173), (183, 184), (185, 186), (190, 191), (208, 209), (212, 213), (216, 217), (218, 219), (220, 221), (222, 223).
  warnings.warn(message)
/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 45 disconnected components.
 There are 37 islands with ids: (5, 149), (22, 23), (24, 27), (28, 29), (30, 31), (47, 48), (49, 50), (58, 59), (67, 68), (72, 74), (75, 76), (84, 85), (86, 87), (89, 90), (108, 109), (110, 111), (121, 122), (124, 127), (128, 129), (150, 152), (153, 155), (163, 215), (165, 211), (168, 170), (172, 173), (183, 184), (185, 186), (190, 191), (193, 195), (197, 199), (208, 209), (212, 213), (216, 217), (218, 219), (220, 221), (222, 223), (224, 226).
  warnings.warn(message)

In [26]:
ntw.network_n_components


Out[26]:
45

In [27]:
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(12, 12), **base_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)


Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1321d1cc0>

In this dataset that are 45 components of the networks, indicating some serious issues with intended connectivity. When overlayed with arc spatial weights it becomes clear that many of the arcs that appear to (and should) intersect, actually do not.


In [28]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(12, 12), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2);


Fuzzy contiguity can be acheived through libpysal.weights.fuzzy_contiguity(), which provides us with a single connected component, but does not capture the true nature of the network stucture, as demonstrated below.


In [29]:
fuzzy = libpysal.weights.fuzzy_contiguity(arcs_df)
fuzzy.remap_ids(arcs_df.id)
fuzzy.n_components


Out[29]:
1

In [30]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(12, 12), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
fuzzy.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2);


The issue with the Soho_Network.shp dataset is not precision, it is that not all true intersections are demarcated with vertices, as in the "goalpost" example above. This is clearly seen in the upper two segments of the network exterior where each segment is a single object, not broken at joining segments. In order to properly use the Soho_Network.shp dataset as a network, geometric preprocessing would need to be performed in order to extract all intersections.


2. Precision limitations and/or known issues with digitization

Revisting the "goalpost" network — Slightly imprecise coordinates

Here the "I" of the goalpost has been digitized slightly short of the intersection with the "U" component.


In [31]:
U = [Chain([Point((2,2)), Point((2,1)), Point((3,1)), Point((4,1)), Point((4,2))])]
I = [Chain([Point((3,0)), Point((3,0.99999))])]
arcs = U + I
ntw = spaghetti.Network(in_data=arcs)
print("Network components:\t", ntw.network_n_components)
print("Network vertices:\t", ntw.network_component_vertices)
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)


Network components:	 2
Network vertices:	 {0: [0, 1, 2, 3, 4], 1: [5, 6]}
/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There is 1 island with id: (5, 6).
  warnings.warn(message)
/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There are 2 islands with ids: (0, 4), (5, 6).
  warnings.warn(message)

In [32]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
plt.xlim(2.9999, 3.0001)
plt.ylim(0.9999, 1.0001);


A small gap is shown in the plot above resulting in two network components.


In [33]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# arc labels
arc_labels(arcs_df, base, 12)
# vertex labels
vert_labels(vertices_df, base, 14)


In order to accomodate situations like these where network elements are digitized in generally the correct locations but the precision is lacking, the vertex_atol parameter can be invoked when instantiating the network. This parameter allows for the rounding of decimal places (not significant digits — see vertex_sig), but should be used with caution.


In [34]:
ntw = spaghetti.Network(in_data=arcs, vertex_atol=3)
print("Network components:\t", ntw.network_n_components)
print("Network vertices:\t", ntw.network_component_vertices)
vertices_df, arcs_df = spaghetti.element_as_gdf(ntw, vertices=True, arcs=True)


Network components:	 1
Network vertices:	 {0: [0, 1, 2, 3, 4, 5]}

In [35]:
base_kws = {"alpha":.5, "lw":5, "cmap":"Paired", "column":"comp_label"}
base = arcs_df.plot(figsize=(9, 9), zorder=0, **base_kws)
# plot keywords
node_kws, edge_kws = {"s":100, "zorder":2}, {"zorder":1}
w_kws = {"edge_kws":edge_kws, "node_kws":node_kws}
ntw.w_network.plot(arcs_df, indexed_on="id", ax=base, **w_kws)
vertices_df.plot(ax=base, fc="r", ec="k", markersize=50, zorder=2)
# arc labels
arc_labels(arcs_df, base, 12)
# vertex labels
vert_labels(vertices_df, base, 14)



3. Distance matrices and shortest paths for multiple connected components

Continuing with the original "goalpost" network comprised of 2 components

In [36]:
U = [Chain([Point((2,2)), Point((2,1)), Point((4,1)), Point((4,2))])]
I = [Chain([Point((3,0)), Point((3,1))])]
arcs = U + I
ntw = spaghetti.Network(in_data=arcs)
ntw.full_distance_matrix(n_processes=1, gen_tree=True)


/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There is 1 island with id: (4, 5).
  warnings.warn(message)
/Users/jgaboardi/miniconda3/envs/py3_spgh_dev/lib/python3.7/site-packages/libpysal/weights/weights.py:167: UserWarning: The weights matrix is not fully connected: 
 There are 2 disconnected components.
 There are 2 islands with ids: (0, 3), (4, 5).
  warnings.warn(message)

In [37]:
ntw.network_component_vertices


Out[37]:
{0: [0, 1, 2, 3], 1: [4, 5]}

In [38]:
ntw.distance_matrix


Out[38]:
array([[ 0.,  1.,  3.,  4., inf, inf],
       [ 1.,  0.,  2.,  3., inf, inf],
       [ 3.,  2.,  0.,  1., inf, inf],
       [ 4.,  3.,  1.,  0., inf, inf],
       [inf, inf, inf, inf,  0.,  1.],
       [inf, inf, inf, inf,  1.,  0.]])

From vertex 0 only vertices {1,2,3} can be reached. This is shown in the distance matrix by inf values in the 4th and 5th columns, and by identical origin-destination shortest paths in the network trees.


In [39]:
ntw.distance_matrix[0]


Out[39]:
array([ 0.,  1.,  3.,  4., inf, inf])

In [40]:
ntw.network_trees[0]


Out[40]:
{0: [0], 1: [0], 2: [1, 0], 3: [2, 1, 0], 4: [4], 5: [5]}

Like the example in cell 21 above, by inserting a vertex complete network connectivity can be achieved.


In [41]:
U = [Chain([Point((2,2)), Point((2,1)), Point((3,1)), Point((4,1)), Point((4,2))])]
I = [Chain([Point((3,0)), Point((3,1))])]
arcs = U + I
ntw = spaghetti.Network(in_data=arcs)
ntw.full_distance_matrix(n_processes=1, gen_tree=True)

In [42]:
ntw.network_component_vertices


Out[42]:
{0: [0, 1, 2, 3, 4, 5]}

In [43]:
ntw.distance_matrix


Out[43]:
array([[0., 1., 2., 3., 4., 3.],
       [1., 0., 1., 2., 3., 2.],
       [2., 1., 0., 1., 2., 1.],
       [3., 2., 1., 0., 1., 2.],
       [4., 3., 2., 1., 0., 3.],
       [3., 2., 1., 2., 3., 0.]])

In [44]:
ntw.distance_matrix[0]


Out[44]:
array([0., 1., 2., 3., 4., 3.])

In [45]:
ntw.network_trees[0]


Out[45]:
{0: [0], 1: [0], 2: [1, 0], 3: [2, 1, 0], 4: [3, 2, 1, 0], 5: [2, 1, 0]}