Seismic acquisition fiddling

The idea is to replicate what we've done so far but with 3 enhancements:

With a Survey object to hold the various features of a survey.
With more GeoPandas stuff, and less fussing with (x,y)'s directly.
Making bins and assigning midpoints to them.

We'll start with the usual prelims...



In [5]:

    
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString
import geopandas as gpd
import pandas as pd
from fiona.crs import from_epsg

%matplotlib inline

Survey object



In [6]:

    
class Survey:
    """
    A seismic survey.
    """

    def __init__(self, params):
        
        # Assign the variables from the parameter dict,
        # using dict.items() for Python 3 compatibility.
        for k, v in params.items(): 
            setattr(self, k, v)
          
        # These are just a convenience; we could use the
        # tuples directly, or make objects with attrs.
        self.xmi = self.corner[0]
        self.ymi = self.corner[1]
        
        self.x = self.size[0]
        self.y = self.size[1]
        
        self.SL = self.line_spacing[0]
        self.RL = self.line_spacing[1]
        
        self.si = self.point_spacing[0]
        self.ri = self.point_spacing[1]
        
        self.shiftx = -self.si/2.
        self.shifty = -self.ri/2.
           
    @property
    def lines(self):
        """
        Returns number of (src, rcvr) lines.
        """
        slines = int(self.x/self.SL) + 1
        rlines = int(self.y/self.RL) + 1
        return slines, rlines

    @property
    def points_per_line(self):
        """
        Returns number of (src, rcvr) points per line.
        """
        spoints = int(self.y/self.si) + 2
        rpoints = int(self.x/self.ri) + 2
        return spoints, rpoints
    
    @property
    def src(self):
        s = [Point(self.xmi+line*self.SL, self.ymi+s*self.si)
             for line in range(self.lines[0])
             for s in range(self.points_per_line[0])
             ]
        S = gpd.GeoSeries(s)
        S.crs = from_epsg(26911)
        return S

    @property
    def rcvr(self):
        r = [Point(self.xmi + r*self.ri + self.shiftx, self.ymi + line*self.RL - self.shifty)
             for line in range(self.lines[1])
             for r in range(self.points_per_line[1])
             ]
        R = gpd.GeoSeries(r)
        R.crs = from_epsg(self.epsg)
        return R
    
    @property
    def layout(self):
        """
        Provide a GeoDataFrame of all points,
        labelled as columns and in hierarchical index.
        """
        # Feels like there might be a better way to do this...
        sgdf = gpd.GeoDataFrame({'geometry': self.src, 'station': 'src'})
        rgdf = gpd.GeoDataFrame({'geometry': self.rcvr, 'station': 'rcvr'})

        # Concatenate with a hierarchical index
        layout = pd.concat([sgdf,rgdf], keys=['sources','receivers'])
        layout.crs = from_epsg(self.epsg)

        return layout

Perhaps s and r should be objects too. I think you might want to have survey.receivers.x for the list of x locations, for example.

Instantiate and plot



In [7]:

    
params = {'corner': (5750000,4710000),
          'size': (3000,1800),
          'line_spacing': (600,600),
          'point_spacing': (100,100),
          'epsg': 26911 # http://spatialreference.org/ref/epsg/26911/
          }

survey = Survey(params)



In [8]:

    
s = survey.src
r = survey.rcvr
r[:10]









    Out[8]:





0    POINT (5749950 4710050)
1    POINT (5750050 4710050)
2    POINT (5750150 4710050)
3    POINT (5750250 4710050)
4    POINT (5750350 4710050)
5    POINT (5750450 4710050)
6    POINT (5750550 4710050)
7    POINT (5750650 4710050)
8    POINT (5750750 4710050)
9    POINT (5750850 4710050)
dtype: object



In [9]:

    
layout = survey.layout
layout[:10]









    Out[9]:






  
    
      
      
      geometry
      station
    
  
  
    
      sources
      0
      POINT (5750000 4710000)
      src
    
    
      1
      POINT (5750000 4710100)
      src
    
    
      2
      POINT (5750000 4710200)
      src
    
    
      3
      POINT (5750000 4710300)
      src
    
    
      4
      POINT (5750000 4710400)
      src
    
    
      5
      POINT (5750000 4710500)
      src
    
    
      6
      POINT (5750000 4710600)
      src
    
    
      7
      POINT (5750000 4710700)
      src
    
    
      8
      POINT (5750000 4710800)
      src
    
    
      9
      POINT (5750000 4710900)
      src

With a hierarchical index you can do cool things, e.g. show the last five sources:



In [10]:

    
layout.ix['sources'][-5:]









    Out[10]:






  
    
      
      geometry
      station
    
  
  
    
      115
      POINT (5753000 4711500)
      src
    
    
      116
      POINT (5753000 4711600)
      src
    
    
      117
      POINT (5753000 4711700)
      src
    
    
      118
      POINT (5753000 4711800)
      src
    
    
      119
      POINT (5753000 4711900)
      src



In [11]:

    
layout.crs









    Out[11]:





{'init': 'epsg:26911', 'no_defs': True}



In [12]:

    
ax = layout.plot()

Export GeoDataFrames to GIS shapefile.



In [13]:

    
# gdf.to_file('src_and_rcvr.shp')

Midpoint calculations

We need midpoints. There is a midpoint between every source-receiver pair.

Hopefully it's not too inelegant to get to the midpoints now that we're using this layout object thing.



In [14]:

    
midpoint_list = [LineString([r, s]).interpolate(0.5, normalized=True)
                  for r in layout.ix['receivers'].geometry
                  for s in layout.ix['sources'].geometry
                  ]

As well as knowing the (x,y) of the midpoints, we'd also like to record the distance from each s to each live r (each r in the live patch). This is easy enough to compute:

Point(x1, y1).distance(Point(x2, y2))

Then we can make a list of all the offsets when we count the midpoints into the bins.



In [15]:

    
offsets = [r.distance(s)
           for r in layout.ix['receivers'].geometry
           for s in layout.ix['sources'].geometry
           ]



In [16]:

    
azimuths = [(180.0/np.pi) * np.arctan((r.x - s.x)/(r.y - s.y))
            for r in layout.ix['receivers'].geometry
            for s in layout.ix['sources'].geometry
            ]



In [17]:

    
offsetx = np.array(offsets)*np.cos(np.array(azimuths)*np.pi/180.)
offsety = np.array(offsets)*np.sin(np.array(azimuths)*np.pi/180.)

Make a Geoseries of the midpoints, offsets and azimths:



In [18]:

    
midpoints = gpd.GeoDataFrame({
                   'geometry' : midpoint_list,
                   'offset' : offsets,
                   'azimuth': azimuths,
                   'offsetx' : offsetx,
                   'offsety' : offsety
                   })
midpoints[:5]









    Out[18]:






  
    
      
      azimuth
      geometry
      offset
      offsetx
      offsety
    
  
  
    
      0
      -45.000000
      POINT (5749975 4710025)
      70.710678
      50
      -50
    
    
      1
      45.000000
      POINT (5749975 4710075)
      70.710678
      50
      50
    
    
      2
      18.434949
      POINT (5749975 4710125)
      158.113883
      150
      50
    
    
      3
      11.309932
      POINT (5749975 4710175)
      254.950976
      250
      50
    
    
      4
      8.130102
      POINT (5749975 4710225)
      353.553391
      350
      50



In [19]:

    
ax = midpoints.plot()

Save to a shapefile if desired.



In [20]:

    
#midpt.to_file('CMPs.shp')

Spider plot



In [21]:

    
midpoints[:5].offsetx # Easy!









    Out[21]:





0     50
1     50
2    150
3    250
4    350
Name: offsetx, dtype: float64



In [22]:

    
midpoints.ix[3].geometry.x # Less easy :(









    Out[22]:





5749975.0

We need lists (or arrays) to pass into the matplotlib quiver plot. This takes four main parameters: x, y, u, and v, where x, y will be our coordinates, and u, v will be the offset vector for that midpoint.

We can get at the GeoDataFrame's attributes easily, but I can't see how to get at the coordinates in the geometry GeoSeries (seems like a user error — it feels like it should be really easy) so I am resorting to this:



In [23]:

    
x = [m.geometry.x for i, m in midpoints.iterrows()]
y = [m.geometry.y for i, m in midpoints.iterrows()]



In [24]:

    
fig = plt.figure(figsize=(12,8))
plt.quiver(x, y, midpoints.offsetx, midpoints.offsety, units='xy', width=0.5, scale=1/0.025, pivot='mid', headlength=0)
plt.axis('equal')
plt.show()

Bins

The bins are a new geometry, related to but separate from the survey itself, and the midpoints. We will model them as a GeoDataFrame of polygons. The steps are:

Compute the bin centre locations with our usual list comprehension trick.
Buffer the centres with a square.
Gather the buffered polygons into a GeoDataFrame.



In [30]:

    
# Factor to shift the bins relative to source and receiver points
jig = survey.si / 4.
bin_centres = gpd.GeoSeries([Point(survey.xmi + 0.5*r*survey.ri + jig, survey.ymi + 0.5*s*survey.si + jig)
                             for r in range(2*survey.points_per_line[1] - 3)
                             for s in range(2*survey.points_per_line[0] - 2)
                            ])

# Buffers are diamond shaped so we have to scale and rotate them.
scale_factor = np.sin(np.pi/4.)/2.
bin_polys = bin_centres.buffer(scale_factor*survey.ri, 1).rotate(-45)
bins = gpd.GeoDataFrame(geometry=bin_polys)

bins[:3]









    Out[30]:






  
    
      
      geometry
    
  
  
    
      0
      POLYGON ((5750050 4710000, 5750000 4710000, 57...
    
    
      1
      POLYGON ((5750050.000000001 4710050, 5750000 4...
    
    
      2
      POLYGON ((5750050.000000001 4710100, 5749999.9...



In [31]:

    
ax = bins.plot()

New spatial join

Thank you to Jake Wasserman and Kelsey Jordahl for this code snippet, and many pointers.

This takes about 20 seconds to run on my iMac, compared to something close to 30 minutes for the old nested loops.



In [32]:

    
reindexed = bins.reset_index().rename(columns={'index':'bins_index'})
joined = gpd.tools.sjoin(reindexed, midpoints)
bin_stats = joined.groupby('bins_index')['offset']\
                  .agg({'fold': len, 'min_offset': np.min})
bins = gpd.GeoDataFrame(bins.join(bin_stats))



In [33]:

    
joined[:10]









    Out[33]:






  
    
      
      bins_index
      geometry
      azimuth
      offset
      offsetx
      offsety
    
  
  
    
      0
      0
      POLYGON ((5750050 4710000, 5750000 4710000, 57...
      45.000000
      70.710678
      50
      50
    
    
      1
      1
      POLYGON ((5750050.000000001 4710050, 5750000 4...
      -45.000000
      70.710678
      50
      -50
    
    
      2
      2
      POLYGON ((5750050.000000001 4710100, 5749999.9...
      -18.434949
      158.113883
      150
      -50
    
    
      3
      3
      POLYGON ((5750050 4710150, 5750000 4710150, 57...
      -11.309932
      254.950976
      250
      -50
    
    
      4
      4
      POLYGON ((5750050 4710200, 5750000 4710200, 57...
      -8.130102
      353.553391
      350
      -50
    
    
      5
      5
      POLYGON ((5750050 4710250, 5750000 4710250, 57...
      -6.340192
      452.769257
      450
      -50
    
    
      6
      6
      POLYGON ((5750050 4710300, 5750000 4710300, 57...
      4.398705
      651.920241
      650
      50
    
    
      7
      6
      POLYGON ((5750050 4710300, 5750000 4710300, 57...
      -5.194429
      552.268051
      550
      -50
    
    
      8
      7
      POLYGON ((5750050 4710350, 5750000 4710350, 57...
      5.194429
      552.268051
      550
      50
    
    
      9
      7
      POLYGON ((5750050 4710350, 5750000 4710350, 57...
      -4.398705
      651.920241
      650
      -50



In [34]:

    
bins[:10]









    Out[34]:






  
    
      
      geometry
      min_offset
      fold
    
  
  
    
      0
      POLYGON ((5750050 4710000, 5750000 4710000, 57...
      70.710678
      1
    
    
      1
      POLYGON ((5750050.000000001 4710050, 5750000 4...
      70.710678
      1
    
    
      2
      POLYGON ((5750050.000000001 4710100, 5749999.9...
      158.113883
      1
    
    
      3
      POLYGON ((5750050 4710150, 5750000 4710150, 57...
      254.950976
      1
    
    
      4
      POLYGON ((5750050 4710200, 5750000 4710200, 57...
      353.553391
      1
    
    
      5
      POLYGON ((5750050 4710250, 5750000 4710250, 57...
      452.769257
      1
    
    
      6
      POLYGON ((5750050 4710300, 5750000 4710300, 57...
      552.268051
      2
    
    
      7
      POLYGON ((5750050 4710350, 5750000 4710350, 57...
      552.268051
      2
    
    
      8
      POLYGON ((5750050.000000001 4710400, 5750000 4...
      452.769257
      2
    
    
      9
      POLYGON ((5750050.000000001 4710450, 5749999.9...
      353.553391
      2



In [35]:

    
ax = bins.plot(column="fold")



In [36]:

    
ax = bins.plot(column="min_offset")



In [ ]:

		geometry	station
sources	0	POINT (5750000 4710000)	src
	1	POINT (5750000 4710100)	src
	2	POINT (5750000 4710200)	src
	3	POINT (5750000 4710300)	src
	4	POINT (5750000 4710400)	src
	5	POINT (5750000 4710500)	src
	6	POINT (5750000 4710600)	src
	7	POINT (5750000 4710700)	src
	8	POINT (5750000 4710800)	src
	9	POINT (5750000 4710900)	src

	geometry	station
115	POINT (5753000 4711500)	src
116	POINT (5753000 4711600)	src
117	POINT (5753000 4711700)	src
118	POINT (5753000 4711800)	src
119	POINT (5753000 4711900)	src

	azimuth	geometry	offset	offsetx	offsety
0	-45.000000	POINT (5749975 4710025)	70.710678	50	-50
1	45.000000	POINT (5749975 4710075)	70.710678	50	50
2	18.434949	POINT (5749975 4710125)	158.113883	150	50
3	11.309932	POINT (5749975 4710175)	254.950976	250	50
4	8.130102	POINT (5749975 4710225)	353.553391	350	50

	geometry
0	POLYGON ((5750050 4710000, 5750000 4710000, 57...
1	POLYGON ((5750050.000000001 4710050, 5750000 4...
2	POLYGON ((5750050.000000001 4710100, 5749999.9...