Karta provides a set of tools for analysing geographical data. The organization of Karta is around a set of classes for representing vector and raster data. These classes contain built-in methods for common tasks, and are easily extended for more specialized processing. This tutorial provides a brief introduction to the elements of Karta.
Should you come across any mistakes, please file a bug or submit a pull request on Github!
The following examples are shown using Python 3, however Karta is supported on Python 2.7+ and Python 3.4+.
Vector data are data that can be treated as a set of connected or disconnected vertices. Examples might be road networks, a set of borders, geophysical survey lines, or the path taken by a bottle floating in an ocean current. In Karta, these data are classified as belonging to Point, Line and Polygon classes, and their Multipart equivalents Multipoint, Multiline, and Multipolygon. Some questions that might be asked of vector data include
Raster data is data that are typically thought of in terms of pixels or a grid of values covering a surface. Examples might be an elevation model, a satellite image, or an upstream area map. Depending on what the data represents, one might
The term coordinate reference system refers to a system of relating numerical coordinates to actual positions on Earth. Karta includes methods for geodetic calculations and basic support of projected and geographical coordinates, as well as coordinate system classes backed by pyproj.
In [1]:
from karta import Point, Line, Polygon, Multipoint, Multiline, Multipolygon
The Point
, Line
, and Polygon
classes can all be instantiated by providing vertices, and optionally, associated properties. The Multipart Multipoint
, Multiline
, and Multipolygon
classes are similar, and can additionally include part-specific tabular metadata.
In [2]:
pt = Point((-123.1, 49.25))
print(pt)
In [3]:
mpt = Multipoint([(-122.93, 48.62),
(-123.10, 48.54),
(-122.90, 48.49),
(-122.81, 48.56)],
data={"color": ["red", "blue", "green", "yellow"],
"value": [2, 1, 3, 5]})
print(mpt)
print(mpt.data)
In [4]:
line = Line([(-124.35713, 49.31437),
(-124.37857, 49.31720),
(-124.39442, 49.31833),
(-124.40311, 49.31942),
(-124.41052, 49.32203),
(-124.41681, 49.32477),
(-124.42278, 49.32588)])
print(line)
In [5]:
poly = Polygon([(-25.41, 67.03),
(-24.83, 62.92),
(-12.76, 63.15),
(-11.44, 66.82)])
print(poly)
Each geometrical object now contains a vertex/vertices in a cartesian plane.
We may be interested in determining whether the Point
created about is within the Polygon
:
In [6]:
print(poly.contains(pt)) # False
In [7]:
# but this one is:
pt2 = Point((-25, 65))
print(poly.contains(pt2)) # True
We also can test whether the Line
from above crosses the Polygon
:
In [8]:
print(line.intersects(poly)) # False
Or compute the shortest distance between the Point
and the Line
:
In [9]:
print(line.shortest_distance_to(pt))
There are methods for computing the nearest vertex to an external point, or the nearest point on an edge to an external point:
In [10]:
pt = Point((0.0, 60.0))
print(poly.nearest_vertex_to(pt))
print(poly.nearest_on_boundary(pt))
The positions of objects with multiple vertices can be sliced and iterated through:
In [11]:
subline = line[2:-2]
print(subline)
for pt in subline:
print(pt)
A slice that takes part of a polygon returns a line.
In [12]:
print(poly[:2])
Points have a distance()
method that calculates the distance to another point.
In [13]:
pt = Point((-123.1, 49.25))
pt2 = Point((-70.66, 41.52))
print(pt.distance(pt2))
By default, geometries in Karta use a planar cartesian coordinate system. If our positions are meant to be geographical coordinates, then we can provide the crs
argument to each geometry at creation, as in
In [14]:
from karta.crs import LonLatWGS84
pt = Point((-123.1, 49.25), crs=LonLatWGS84)
pt2 = Point((-70.66, 41.52), crs=LonLatWGS84)
pt.distance(pt2)
Out[14]:
which now gives the great circle distance between point on the Earth, in meters. We can mix coordinate systems to some degree, with Karta performing the necessary transformations in the background:
In [15]:
from karta.crs import WebMercator
pt_web = Point((-14000000, 6300000), crs=WebMercator)
print(pt.distance(pt_web)) # distance in coordinate system units of *pt*
When the coordinate system is specified, all geometrical methods obey that coordinate system. We can use this to perform queries, such which American state capitols are within 2000 km of Mexico City?
In [16]:
from karta.examples import us_capitols
mexico_city = Point((-99.13, 19.43), crs=LonLatWGS84)
# Filter those within 2000 km of Mexico City
nearby = list(filter(lambda pt: pt.distance(mexico_city) < 2000e3, us_capitols))
for capitol in nearby:
print("{0:4.0f} km {1}".format(mexico_city.distance(capitol)/1e3, capitol.properties["n"]))
In [17]:
# Or, list capitols from nearest to furthest from Mexico City
distances = map(lambda pt: mexico_city.distance(pt), us_capitols)
distances_capitols = sorted(zip(distances, us_capitols))
for d, pt in distances_capitols:
print("{km:.0f} km {name}".format(km=d/1e3, name=pt.properties["n"]))
All of the above calculations are performed on a geoid. The LonLatWGS84
coordinate system means to use geographical (longitude and latitude) coordinates on the WGS 84 ellipsoid.
In [18]:
mp = Multipoint([(1, 1), (3, 1), (4, 3), (2, 2)],
data={"species": ["T. officianale", "C. tectorum",
"M. alba", "V. cracca"]})
These data live in the .data
attribute, which is a Table
instance. For convenience, the data can also be accessed via the .d
attribute, which provides a streamlined syntax supporting key-lookups, indexing, and slicing.
In [19]:
mp.d
Out[19]:
In [20]:
mp.d["species"]
Out[20]:
In [21]:
mp.d[1:3]
Out[21]:
The data are propagated through indexing operations on their parent geometry:
In [22]:
pt = mp[2]
print(pt, "-", pt.properties["species"])
Geometry-level metadata at the geometry level can be provided using the properties
keyword argument, which accepts a dictionary. Derived geometries carry the properties of their parent geometry.
In [23]:
poly = Polygon([(-25.41, 67.03),
(-24.83, 62.92),
(-12.76, 63.15),
(-11.44, 66.82)],
properties={"geology": "volcanic",
"alcohol": "brennivin"})
print(poly[0:3].properties)
The get_coordinate_lists
method and coordinates
attribute provide lists of coordinates for plotting or data export.
Higher-level plotting operations are provided by the separate karta.plotting
submodule, not described here.
In [24]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(*line.coords())
Out[24]:
Data can be read from and written to several common formats, including ESRI shapefiles (through bindings to the pyshp module), GeoJSON, and GPX. Convenience functions are kept in the karta.vector.read
namespace.
Each geometry has appropriate methods to write data:
In [25]:
line.to_shapefile("line.shp")
pt_web.to_geojson("point.geojson")
Raster data are primarily represented by the karta.RegularGrid
class. RegularGrid
instances have a CRS, a Null-data value, a geotransform, and one or more bands, which containing the actual data.
To provide flexibility, different band classes are provided by karta.raster.bands
using different strategies for data storage.
SimpleBand
, uses a numpy array to store all data. This makes it reasonably fast, but can be memory-hungry with large rasters.CompressedBand
, uses chunking and compression via the blosc library to reduce the memory footprint of the raster data at a small speed cost.GdalFileBand
reads data directly from a valid GDAL datasource, using the least memory but performing the slowest.Note:
GdalFileBand
doesn't currently handle all raster operations supported by the other band types. Many operations implicitly convert to in-memoryCompressedBand
representation.
In [26]:
import numpy as np
from karta.raster import RegularGrid, SimpleBand, CompressedBand, read_gtiff
ls8 = read_gtiff("LC08_L1TP_011031_20180930_20181010_01_T1_B8.TIF")
print(ls8.bands) # list of one CompressedBand instance
In [27]:
# Print grid dimensions
print(ls8.size)
In [28]:
# Print grid extent
print(ls8.extent())
In [29]:
# Visualize data
plt.imshow(ls8[::10,::10, 0], origin="bottom", extent=ls8.extent(), cmap=plt.cm.binary, vmin=3e3, vmax=10e3)
plt.colorbar()
Out[29]:
When opening or creating a RegularGrid
, a non-default band type can be specified as a keyword argument. The following code re-opens the same grid as a SimpleBand
and verifies that all data are the same.
In [30]:
ls8_numpy = read_gtiff("LC08_L1TP_011031_20180930_20181010_01_T1_B8.TIF", bandclass=SimpleBand)
np.all(ls8[:,:] == ls8_numpy[:,:]) # True
Out[30]:
In the above, the slice syntax [:,:]
is used to get an array of all grid data. Because the grid ls8
has only a single band in this case, the data array has two dimensions. The normal simple slicing rules apply, i.e. one can do things like:
In [31]:
subgrid = ls8[2000:3000, 4000:4500]
print(subgrid.shape)
In [32]:
every_other = ls8[::2, ::2]
print(every_other.shape)
In [33]:
ls8.transform
Out[33]:
Grid geolocation is based on an affine matrix transformation represented by the .transform
attribute, as well as an associated coordinate system under the .crs
attribute. The extent and bounding box of the grid can be retrieved using the respective properties:
In [34]:
print(ls8.bbox())
print(ls8.extent())
The extent
is of the form (xmin, xmax, ymin, xmax), and refers to grid centers. The bbox
is of the form (xmin, ymin, xmax, ymax), and refers to grid edges.
To get swaths of grid coordinates conveniently and in arbitrary coordinate systems, used the .coordinates()
method, which returns a CoordinateGenerator
instance that can be indexed to generate coordinate pairs.
In [35]:
coords = ls8.coordinates(crs=LonLatWGS84)
print(coords[0,0])
print(coords[-1,-1])
print(coords[:5,:5])
Nearest-neighbour and bilinear sampling of grid points is supported via the .sample()
method. It is also possible to resample the full grid at a new resolution, or to sample along a profile.
In [36]:
val = ls8.sample(Point((500000, 8750000), crs=ls8.crs)) # sample a point using the grid CRS
print(val)
In [37]:
# Resample grid at 100 m postings:
ls8_coarse = ls8.resample(100, 100)
print("Original resolution:", ls8.resolution)
print("Resampled resolution:", ls8_coarse.resolution)
In [38]:
# Generate a line and sample at intervals along it
transit = Line(zip(np.linspace(350000, 450000), np.linspace(4550000, 4700000)), crs=ls8.crs)
mp, val = ls8.profile(transit)
plt.subplot(2, 1, 1)
x, y = mp.coords()
plt.scatter(x, y, c=val.flatten(), edgecolor="none", cmap=plt.cm.binary)
plt.subplot(2, 1, 2)
dist = Line(mp).cumulength() # convert sample points to a line and extract
# the cumulative distance along it
plt.plot(dist, val[0])
Out[38]:
Note
When getting raster data, the array provided by slicing is not necessarily a view of the underlying data, and may be a copy instead. Modifying the array is not guaranteed to modify the raster. When the raster data must be replaced by an element-wise computation, use the Grid.apply(func)
method, which operates in-place. The apply
method may be chained.
# Example
grid.apply(lambda x: x**2) \
.apply(np.sin) \
.apply(lambda x: np.where(x < 0.5, grid.nodata, x))
This handles nodata pixels automatically. If the raster data must be replaced by arbitrary data, set it explicitly with Grid[:,:] = ...
.
# Example
grid[:,:] = np.convolve(np.ones([3,3])/9.0, grid[:,:], mode='same')
In [39]:
ls8_small = ls8.resize((350000, 4600000, 450000, 4650000))
plt.imshow(ls8_small[:,:,0], origin="bottom", extent=ls8_small.extent(), cmap=plt.cm.binary, vmin=3e3, vmax=12e3)
Out[39]:
New RegularGrid
instances are created by specifying a geotransform. The geotransform is represented by a tuple of the form
transform = (xll, yll, dx, dy, sx, sy)
where xll
and yll
are the coordinates of the lower left grid corner, dx
and dy
specify resolution, and sx
and sy
specify grid skew and rotation.
The following creates an empty global grid with 5 degree resolution, oriented "north-up" and "east-right", and then plots the pixel centers:
In [40]:
from karta.crs import LonLatWGS84, GallPetersEqualArea
newgrid = RegularGrid((-180, -80, 5, 5, 0, 0), values=np.zeros((160//5, 360//5)), crs=LonLatWGS84)
# visualize the coordinate positions on a Gall-Peters projection
coords = newgrid.coordinates(crs=GallPetersEqualArea)
x, y = coords[:,:]
_ = plt.plot(x, y, ".k", ms=2)