In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import ticker
from matplotlib.colors import LogNorm

import seaborn as sns
sns.set(context="poster", style="ticks", font_scale=2)

import numpy as np
import pandas as pd

from astropy import constants as const
M_solar = const.M_sun.cgs.value
m_proton = const.m_p.cgs.value

Goal

Take in data sampled on a uneven 2D grid, and plot what a smoothed surface would look like.

Constraints:

  • the extrema of the smoothed surface shouldn't exceed the extrema of the data (for science reasons the maximum of this surface is important; it would be bad if the interpolated maximum was just an artifact of the smoothing). This means splines are out.
  • This should be in loglog space (this makes some standard matplotlib gridding routines a bit more difficult)

Read in my sample data


In [2]:
df = pd.read_hdf("sample_data.h5")

In [3]:
x_in = df.N_SNe
y_in = df.density / m_proton
z_in = df.momentum / (100 * M_solar * df.N_SNe)

Define some useful functions


In [4]:
def distance_metric_squared(x_a, y_a, x_b, y_b, spacing_x, spacing_y):
    """Find the squared distance between two points, using a cartesian norm
    in log space, which each dimension scaled by a `spacing` variable"""
    log_x_a = np.log10(x_a)
    log_x_b = np.log10(x_b)
    log_y_a = np.log10(y_a)
    log_y_b = np.log10(y_b)
    return ((log_x_a - log_x_b)/spacing_x)**2 \
          + ((log_y_a - log_y_b)/spacing_y)**2

In [5]:
def rbf_interpolate_logloglog(x_in, y_in, z_in, x_out, y_out, spacing_x, spacing_y):
    """Interpolate z_in(x_in, y_in) to the points x_out, y_out, 
    using a radial basis function and a log10,log10,log10 metric
    
    Inputs
    ------
        x_in : 1D numpy array
            x coordinates of input sample points
        y_in : 1D numpy array
            y coordinates of input sample points
        z_in : 1D numpy array
            value of target function at sampled points
            
        x_out : 1D numpy array
            x coordinates of points you would like to interpolate at
        y_out : 1D numpy array
            y coordinates of points you would like to interpolate at
            
        spacing_x : float
            The desired gaussian kernel size (in log space) in the y direction
        spacing_y : float
            The desired gaussian kernel size (in log space) in the y direction

    
    Returns
    -------
        zz_out : 2d numpy array
            the interpolated value at every combination of x_out and y_out
            
    Notes
    -----
        You are passing 1D arrays for x_out and y_out; zz_out is then interpolate
        to a 2D grid of all combinations of x_out[i] and y_out[j] for all i,j
         - this effectively evaluates the interpolation at the 
         coordinates given by a np.meshgrid of x_out and y_out
    """



    zz_out = np.empty((y_out.size, x_out.size))
    for j in range(x_out.size):
        for i in range(y_out.size):
            weights = np.exp(-.5*distance_metric_squared(x_in, y_in,
                                                         x_out[j], y_out[i],
                                                         spacing_x, spacing_y))
            weights /= weights.sum()
            zz_out[i,j] = 10**(weights * np.log10(z_in)).sum()
    
    return zz_out

In [6]:
# this isn't exciting, but it'll make the plot a bit prettier
# (puts the colorbar labels into scientific notation)

def momentum_tick_formatter(x,pos):
    a, b = '{:.2e}'.format(x).split('e')
    b = int(b)
    if int(float(a)) in {1,3}:
        return r'${} \times 10^{{{}}}$'.format(int(float(a)), b)
    else:
        return ""

Make the plot

In the next cell I actually show how to use these interpolated points. In any plotting step (labeled "plot: " in a commented line), only the first line is necessary; any other lines are just to make it look prettier for publication.


In [7]:
## Interpolate to a uniform grid in loglog space
x_out = np.logspace(np.log10(x_in).min(), np.log10(x_in).max(), num=100)
y_out = np.logspace(np.log10(y_in).min(), np.log10(y_in).max(), num=100)

# These 'spacing' variables are hand-tuned parameters!
spacing_x = .3
spacing_y = .6
z_out = rbf_interpolate_logloglog(x_in, y_in, z_in, x_out, y_out, spacing_x, spacing_y)

# plot: the color image
plt.pcolormesh(x_out, y_out, z_out, 
                cmap=plt.cm.viridis, 
                norm=LogNorm(vmin=3e3, vmax=6e4),
                edgecolors="none")

# plot: add a colorbar
plt.colorbar(
             label=r"$ p / (100$ $ M_\odot$ $ N_\mathrm{SNe}) $ $[\mathrm{km}$ $\mathrm{s}^{-1}]$",
             ticks=[3e3,4e3,5e3,6e3,7e3,8e3,9e3,1e4,2e4,3e4,4e4,5e4,6e4,7e4,8e4,9e4,1e5],
             format=ticker.FuncFormatter(momentum_tick_formatter))

# plot: add contours
plt.contour(x_out, y_out, z_out,
            levels=np.logspace(np.log10(3e3),np.log10(6e4),num=7+2)[1:-1], 
            cmap=plt.cm.Greys_r,
            locator=ticker.LogLocator(),
           )

# plot: add black scatter points to show the input points for my interpolation
plt.scatter(x_in,
            y_in,
            color="k",s=60)


# The rest is just to make it look prettier
plt.xlabel(r"$ N_\mathrm{SNe}$")
plt.ylabel(r"$ \rho $ $[m_\mathrm{H}$ $\mathrm{cm}^{-3}]$")
plt.xscale("log")
plt.yscale("log")
plt.xlim(x_out.min()*10**-.25, x_out.max()*10**.25)
plt.ylim(y_out.min()*10**-.25, y_out.max()*10**.25)


Out[7]:
(0.0007479139748048511, 236.51116542531167)

What about alternatives?

Triangulation

Basically, for any desired location, it finds the 3 closest input data, and linearly interpolates between them. This is related to Delaunay triangulation / Voronoi tesselation.


In [8]:
import matplotlib.tri as tri

In [9]:
triang = tri.Triangulation(np.log10(x_in), np.log10(y_in))

Flat shading

Way too many linear features, that aren't actually representative of the data. It might be okay for a quick diagnostic of the data, but your eyes get too drawn to artifacts for it to be useful for publications-quality figures.


In [10]:
plt.tripcolor(triang, np.log10(z_in), shading='flat',cmap=plt.cm.viridis )
plt.colorbar()
plt.tricontour(triang, np.log10(z_in), cmap=plt.cm.Greys_r, )


plt.scatter(np.log10(x_in), np.log10(y_in), color="k")


Out[10]:
<matplotlib.collections.PathCollection at 0x111857390>

Smoothed shading

This is actually pretty decent. There's also still some artifacts (e.g. along the bottom x axis), but overall I'd say this is pretty good for having put no effort into customizing it.

The axes are a little screwed up, but you could probably find a way to get around that.


In [11]:
plt.tripcolor(triang, np.log10(z_in), shading='gouraud',cmap=plt.cm.viridis )
plt.colorbar()
plt.tricontour(triang, np.log10(z_in), cmap=plt.cm.Greys_r, )

plt.scatter(np.log10(x_in), np.log10(y_in), color="k")


Out[11]:
<matplotlib.collections.PathCollection at 0x11189f198>