Before coming into the Notebook, spend some time in an interactive session learning about sequences (strings, lists), and doing basic indexing, slicing, append()
, in
, etc.
Then you can come in here...
In [1]:
layers = [0.23, 0.34, 0.45, 0.25, 0.23, 0.35]
In [2]:
uppers = layers[:-1]
lowers = layers[1:]
In [3]:
rcs = []
for pair in zip(lowers, uppers):
rc = (pair[1] - pair[0]) / (pair[1] + pair[0])
rcs.append(rc)
In [4]:
rcs
Out[4]:
Definition, inputs, side-effects, returning, scope, docstrings
In [5]:
# Exercise
def compute_rc(layers):
"""
Computes reflection coefficients given
a list of layer impedances.
"""
uppers = layers[:-1]
lowers = layers[1:]
rcs = []
for pair in zip(lowers, uppers):
rc = (pair[1] - pair[0]) / (pair[1] + pair[0])
rcs.append(rc)
return rcs
In [6]:
compute_rc(layers)
Out[6]:
Put in a file and import into a new notebook
In [7]:
import numpy as np # Just like importing file
In [8]:
biglog = np.random.random(10000000)
%timeit compute_rc(biglog)
Note that the log has to be fairly big for the benchmarking to work properly, because otherwise the CPU caches the computation and this skews the results.
Now we can re-write our function using arrays instead of lists.
In [9]:
# Exercise
def compute_rc_vector(layers):
uppers = layers[:-1]
lowers = layers[1:]
return (lowers - uppers) / (uppers + lowers)
In [10]:
%timeit compute_rc_vector(biglog)
60 times faster on my machine!
In [11]:
from numba import jit
In [12]:
@jit
def compute_rc_numba(layers):
uppers = layers[:-1]
lowers = layers[1:]
return (lowers - uppers) / (uppers + lowers)
In [13]:
%timeit compute_rc_numba(biglog)
OK, we'll make a fake example.
In [14]:
def compute_rc_slow(layers):
uppers = layers[:-1]
lowers = layers[1:]
rcs = np.zeros_like(uppers)
for i in range(rcs.size):
rcs[i] = (lowers[i] - uppers[i]) / (uppers[i] + lowers[i])
return rcs
In [15]:
%timeit compute_rc_slow(biglog)
In [16]:
@jit
def compute_rc_faster(layers):
uppers = layers[:-1]
lowers = layers[1:]
rcs = np.zeros_like(uppers)
for i in range(rcs.size):
rcs[i] = (lowers[i] - uppers[i]) / (uppers[i] + lowers[i])
return rcs
In [17]:
%timeit compute_rc_faster(biglog)
However, you can't speed up our original list-based function this way.
In [18]:
@jit
def compute_rc_hopeful(layers):
"""
Computes reflection coefficients given
a list of layer impedances.
"""
uppers = layers[:-1]
lowers = layers[1:]
rcs = []
for pair in zip(lowers, uppers):
rc = (pair[1] - pair[0]) / (pair[1] + pair[0])
rcs.append(rc)
return rcs
In [19]:
%timeit compute_rc_hopeful(biglog)
In [20]:
import matplotlib.pyplot as plt
%matplotlib inline
Not we can only plot part of biglog
because it contains too many points for matplotlib
(and for our screen!). If we really wanted to plot it, we'd have to find a way to upscale it.
In [21]:
plt.plot(biglog[:500])
Out[21]:
In [22]:
fig = plt.figure(figsize=(15,2))
ax = fig.add_subplot(111)
ax.plot(biglog[:500])
ax.set_title("big log")
plt.show()
The point is that we often want to store data along with relevant functions (methods) in one 'thing' — an object.
Build this up, piece by piece.
Start with __init__()
which is required anyway. Only define self.layers
.
Then add rcs()
, then plot()
, and finally __len__()
, once you discover that len(l)
doesn't work, because this object doesn't have that property.
In [34]:
class Layers(object):
def __init__(self, layers, label=None):
# Just make sure we end up with an array
self.layers = np.array(layers)
self.label = label or "My log"
self.length = self.layers.size # But storing len in an attribute is unexpected...
def __len__(self): # ...better to do this.
return len(self.layers)
def rcs(self):
uppers = self.layers[:-1]
lowers = self.layers[1:]
return (lowers-uppers) / (uppers+lowers)
def plot(self, lw=0.5, color='#6699ff'):
fig = plt.figure(figsize=(2,6))
ax = fig.add_subplot(111)
ax.barh(range(len(self.layers)), self.layers, color=color, lw=lw, align='edge', height=1.0, alpha=1.0, zorder=10)
ax.grid(zorder=2)
ax.set_ylabel('Layers')
ax.set_title(self.label)
ax.set_xlim([-0.5,1.0])
ax.set_xlabel('Measurement (units)')
ax.invert_yaxis()
#ax.set_xticks(ax.get_xticks()[::2]) # take out every second tick
ax.spines['right'].set_visible(False) # hide the spine on the right
ax.yaxis.set_ticks_position('left') # Only show ticks on the left and bottom spines
plt.show()
In [35]:
l = Layers(layers, label='Well # 1')
In [36]:
l.rcs()
Out[36]:
In [37]:
len(l)
Out[37]:
In [38]:
l.plot()
In [39]:
rel_interval = np.cumsum(l.rcs(), dtype=float)
len(rel_interval)
Out[39]:
In [40]:
relative_layers = np.insert(rel_interval, 0, 0)
In [41]:
relative_layers
Out[41]:
In [42]:
relative = Layers(relative_layers, "relative")
In [43]:
relative.layers
Out[43]:
In [44]:
relative.plot()
In [ ]:
url = "http://en.wikipedia.org/wiki/Jurassic"
Use View Source in your browser to figure out where the age range is on the page, and what it looks like.
Try to find the same string here.
In [ ]:
import requests
r = requests.get(url)
r.text[:500]
Using a regular expression:
In [ ]:
import re
s = re.search(r'<i>(.+?million years ago)</i>', r.text)
text = s.group(1)
Exercise: Make a function to get the start and end ages of any geologic period, taking the name of the period as an argument.
In [ ]:
def get_age(period):
url = "http://en.wikipedia.org/wiki/" + period
r = requests.get(url)
start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+) million years ago</i>', r.text).groups()
return float(start), float(end)
In [ ]:
period = "Jurassic"
get_age(period)
In [ ]:
def duration(period):
t0, t1 = get_age(period)
duration = t0 - t1
response = "According to Wikipedia, the {0} period was {1:.2f} Ma long.".format(period, duration)
return response
In [ ]:
duration('Cretaceous')
In Python, functions are first class objects — you can pass them around like other objects.
Sometimes it's convenient to think in terms of map()
and reduce()
. This pattern is fundamental in data analytics and some packages, such as pandas
.
A couple of very common functions, sorted()
and filter()
, always works with a function as one of their arguments. It can be confusing the first time you see it.
With no functional argument, sorted
does what you'd expect:
In [ ]:
l = [0.001, 1, 3, 51, 41 , 601]
sorted(l)
What if we want to sort based on the number of characters? Don't ask why, we just do. Then we write a function that returns a key which, when sorted, will give the ordering we want.
In [ ]:
def strlen(n):
return len(str(n))
In [ ]:
sorted(l, key=strlen)
We could rewrite that tiny function as a lambda, which is basically a little unnamed function:
In [ ]:
sorted(l, key=lambda n: len(str(n)))
When would you make a named function vs a lambda? It all depends on whether you want to use it again or not.
In [ ]:
def sq(n):
return n**2
# In Python 3, map produces an iterator, not a list.
# So we must cast to list to inspect its contents.
list(map(sq, l))
We can get around defining that tiny function sq()
with a lambda, which you can think of as a temporary, 'throwaway' function:
In [ ]:
list(map(lambda n: n**2, l))
In practice, we'd often write this as a list comprehension. Then we can skip the creation of the funciton or lambda entirely:
In [ ]:
[n**2 for n in l]
One of the advantages of map
is that it is 'lazy', so if you map a function to a giant list, you don't get a giant list back, you get an iterator. A list-comp would give you a giant list, possibly jamming the memory on your box.
reduce
reduce
takes a sequence and applies some function to it recursively. You could think of it like a running sum, say, but for any function, not just summing.
In [ ]:
def runsum(a, b):
return a + b
# For some reason reduce is not in the main namespace like map
from functools import reduce
reduce(runsum, l)
In [ ]:
def runmult(a, b):
return a * b
reduce(runmult, l)
partial
for making curry'Preapplying' an argument to a function is sometimes useful. For example, we might have a general function for raising a number a to the power b, but then want another function which raises numbers to the 3rd power.
I could do this by simply calling the first function from the second:
In [ ]:
def power(a, b):
return a**b
def cuber(a):
return power(a, 3)
cuber(2)
But some people might find it more inuitive to do it this way:
In [ ]:
from functools import partial
cuber = partial(power, b=3)
cuber(2)
This is called 'currying' the function power()
.