This notebook goes with the article
Bianco, E and M Hall (2015). Programming a seismic program. CSEG Recorder, November 2015.
It's based on some other notebooks, which are themselves accompaniments to various blog posts at Agile Geoscience.
We'll start with the usual prelims...
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString
import geopandas as gpd
import pandas as pd
from fiona.crs import from_epsg
%matplotlib inline
Define some survey parameters:
In [2]:
xmi = 575000 # Easting of bottom-left corner of grid (m)
ymi = 4710000 # Northing of bottom-left corner (m)
SL = 600 # Source line interval (m)
RL = 600 # Receiver line interval (m)
si = 100 # Source point interval (m)
ri = 100 # Receiver point interval (m)
x = 3000 # x extent of survey (m)
y = 1800 # y extent of survey (m)
epsg = 26911 # A CRS to place it on the earth
Now we can compute some properties of the survey:
In [3]:
rlines = int(y/RL) + 1
slines = int(x/SL) + 1
In [4]:
rperline = int(x/ri) + 2
sperline = int(y/si) + 2
In [5]:
shiftx = -si/2.0
shifty = -ri/2.0
We compute lists of the x- and y-locations of the receivers and sources with nested loops. We'll use Python's list comprehensions, which are shorthand for for loops that generate lists.
In [45]:
rcvrx, rcvry = zip(*[(xmi + rcvr*ri + shiftx, ymi + line*RL - shifty)
for line in range(rlines)
for rcvr in range(rperline)])
srcx, srcy = zip(*[(xmi + line*SL, ymi + src*si)
for line in range(slines)
for src in range(sperline)])
In [46]:
rcvrs = [Point(x, y) for x, y in zip(rcvrx, rcvry)]
srcs = [Point(x, y) for x, y in zip(srcx, srcy)]
We can make a list of 'r' and 's' labels then compile into a dataframe.
In [8]:
station_list = ['r']*len(rcvrs) + ['s']*len(srcs)
survey = gpd.GeoDataFrame({'geometry': rcvrs+srcs, 'station': station_list})
survey.crs = from_epsg(epsg)
In [10]:
try:
# Needs geopandas fork: https://github.com/kwinkunks/geopandas
survey.plot(figsize=(12,12), column='station', colormap="bwr", markersize=20)
except:
# This will work regardless.
survey.plot()
plt.grid()
plt.show()
Make a Station ID row, so we can recognize the stations again.
In [11]:
sid = np.arange(len(survey))
survey['SID'] = sid
In [12]:
survey.to_file('data/survey_orig.shp')
We need midpoints. There is a midpoint between every source-receiver pair.
Hopefully it's not too inelegant to get to the midpoints now that we're using this layout object thing.
In [13]:
midpoint_list = [LineString([r, s]).interpolate(0.5, normalized=True)
for r in rcvrs
for s in srcs]
As well as knowing the (x,y) of the midpoints, we'd also like to record the distance from each s to each live r (each r in the live patch). This is easy enough to compute:
Point(x1, y1).distance(Point(x2, y2))
Then we can make a list of all the offsets when we count the midpoints into the bins.
In [14]:
offsets = [r.distance(s)
for r in rcvrs
for s in srcs]
In [15]:
azimuths = [np.arctan((r.x - s.x)/(r.y - s.y))
for r in rcvrs
for s in srcs]
Make a Geoseries of the midpoints, offsets and azimths:
In [16]:
midpoints = gpd.GeoDataFrame({'geometry': midpoint_list,
'offset': offsets,
'azimuth': np.degrees(azimuths),
})
midpoints[:5]
Out[16]:
In [17]:
ax = midpoints.plot(color='b')
Save to a shapefile if desired.
In [18]:
# midpoints.to_file('midpoints.shp')
In [18]:
midpoints['offsetx'] = offsets * np.cos(azimuths)
midpoints['offsety'] = offsets * np.sin(azimuths)
midpoints[:5].offsetx # Easy!
Out[18]:
In [19]:
midpoints.ix[3].geometry.x # Less easy :(
Out[19]:
We need lists (or arrays) to pass into the matplotlib quiver plot. This takes four main parameters: x, y, u, and v, where x, y will be our coordinates, and u, v will be the offset vector for that midpoint.
We can get at the GeoDataFrame's attributes easily, but I can't see how to get at the coordinates in the geometry GeoSeries (seems like a user error — it feels like it should be really easy) so I am resorting to this:
In [20]:
x = [m.geometry.x for i, m in midpoints.iterrows()]
y = [m.geometry.y for i, m in midpoints.iterrows()]
In [21]:
fig = plt.figure(figsize=(12,8))
plt.quiver(x, y, midpoints.offsetx, midpoints.offsety, units='xy', width=0.5, scale=1/0.025, pivot='mid', headlength=0)
plt.axis('equal')
plt.show()
The bins are a new geometry, related to but separate from the survey itself, and the midpoints. We will model them as a GeoDataFrame of polygons. The steps are:
In [19]:
# Factor to shift the bins relative to source and receiver points
jig = si / 4.
bin_centres = gpd.GeoSeries([Point(xmi + 0.5*r*ri + jig, ymi + 0.5*s*si + jig)
for r in range(2*rperline - 3)
for s in range(2*sperline - 2)
])
# Buffers are diamond shaped so we have to scale and rotate them.
scale_factor = np.sin(np.pi/4.)/2.
bin_polys = bin_centres.buffer(scale_factor*ri, 1).rotate(-45)
bins = gpd.GeoDataFrame(geometry=bin_polys)
bins[:3]
Out[19]:
In [20]:
ax = bins.plot()
Thank you to Jake Wasserman and Kelsey Jordahl for this code snippet, and many pointers.
This takes about 20 seconds to run on my iMac, compared to something close to 30 minutes for the old nested loops.
In [21]:
def bin_the_midpoints(bins, midpoints):
b = bins.copy()
m = midpoints.copy()
reindexed = b.reset_index().rename(columns={'index':'bins_index'})
joined = gpd.tools.sjoin(reindexed, m)
bin_stats = joined.groupby('bins_index')['offset']\
.agg({'fold': len, 'min_offset': np.min})
return gpd.GeoDataFrame(b.join(bin_stats))
In [39]:
bin_stats = bin_the_midpoints(bins, midpoints)
In [40]:
bin_stats[:10]
Out[40]:
In [41]:
ax = bin_stats.plot(column="fold")
In [28]:
ax = bin_stats.plot(column="min_offset")
In [26]:
new_survey = gpd.GeoDataFrame.from_file('data/adjusted.shp')
Join the old survey to the new, based on SID.
In [27]:
complete = pd.merge(survey, new_survey[['geometry', 'SID']], how='outer', on='SID')
Rename the columns.
In [28]:
complete.columns = ['geometry', 'station', 'SID', 'new_geometry']
In [1]:
complete[18:23]
Calculate the distance each station has moved. We have to use GeoSeries to do this, and everything was transformed to ordinary Series when we did the join.
In [30]:
old = gpd.GeoSeries(complete.geometry)
new = gpd.GeoSeries(complete['new_geometry'])
complete['skid'] = old.distance(new)
In [31]:
complete[15:20]
Out[31]:
Now that we have the new geometry, we can recompute everything, this time from the dataframe rather than from lists.
First the midpoints.
In [49]:
rcvrs = complete[complete['station']=='r'][complete['new_geometry'].notnull()]['new_geometry']
srcs = complete[complete['station']!='r'][complete['new_geometry'].notnull()]['new_geometry']
midpoint_list = [LineString([r, s]).interpolate(0.5, normalized=True)
for r in rcvrs
for s in srcs]
In [50]:
len(srcs)
Out[50]:
In [51]:
offsets = [r.distance(s)
for r in rcvrs
for s in srcs]
In [52]:
azimuths = [np.arctan((r.x - s.x)/(r.y - s.y))
for r in rcvrs
for s in srcs]
In [53]:
new_midpoints = gpd.GeoDataFrame({'geometry': midpoint_list,
'offset': offsets,
'azimuth': np.degrees(azimuths),
})
Now we can fill the bins and get the statistics.
In [54]:
new_stats = bin_the_midpoints(bins, new_midpoints).fillna(-1)
In [55]:
new_stats.plot(column="fold", figsize=(12,8))
plt.show()
A quick diff shows how the fold has changed.
In [56]:
new_stats['diff'] = bin_stats.fold - new_stats.fold
new_stats.plot(column="diff", figsize=(12,8))
plt.show()
In [58]:
# new_stats.to_file('data/new_stats.shp')