In [1]:
# http://ipython.org/ipython-doc/rel-1.1.0/api/generated/IPython.core.magics.pylab.html#
%pylab --no-import-all inline
http://mathandmultimedia.com/2010/09/15/sum-first-n-positive-integers/
Gauss displayed his genius at an early age. According to anecdotes, when he was in primary school, he was punished by his teacher due to misbehavior. He was told to add the numbers from 1 to 100. He was able to compute its sum, which is 5050, in a matter of seconds.
Now, how on earth did he do it?
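One common explanation is the pairing argument: pair 1 with 100, 2 with 99, and so on. Each of the 50 pairs sums to 101, so the total is 50 * 101 = 5050. The sketch below only illustrates that argument (it is not a claim about how Gauss actually reasoned); the helper name gauss_pairs is made up here, and the code is written to run on Python 3 as well as Python 2.

```python
def gauss_pairs(n):
    """Return the (low, high) pairs of the pairing argument; n must be even."""
    if n % 2 != 0:
        raise ValueError("this sketch only handles even n")
    return [(k, n + 1 - k) for k in range(1, n // 2 + 1)]

pairs = gauss_pairs(100)
pair_sums = set(low + high for low, high in pairs)  # distinct pair sums
total = sum(low + high for low, high in pairs)

print("pairs: {0}, distinct pair sums: {1}, total: {2}".format(
    len(pairs), sorted(pair_sums), total))
```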
See also:
Let's verify this result in a number of ways. Take some time now to write some code to add up 1 to 100.
Specifically:
sum
Beware: in IPython with pylab mode, sum might be overwritten by numpy's sum -- use __builtin__.sum if you want the built-in sum (http://docs.python.org/2/library/functions.html#sum) as opposed to numpy.sum (http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html).
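A minimal sketch of reaching the built-in sum explicitly, even when the plain name sum has been rebound. The module is called __builtin__ on Python 2 and builtins on Python 3, so the import below tries both. The function name total_with_builtin and the local shadowing are only for illustration.

```python
try:
    import builtins                   # Python 3 name of the module
except ImportError:
    import __builtin__ as builtins    # Python 2 name of the module

def total_with_builtin(values):
    # A local name hides the built-in here, standing in for what a
    # star-import of numpy would do to the name "sum".
    sum = "shadowed"
    # The module attribute still refers to the real built-in function.
    return builtins.sum(values)

print(total_with_builtin(range(101)))
```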
In [2]:
# using loop and xrange
n = 100
s = 0L
for i in xrange(n+1):
    s += i
print s
In [3]:
# using builtin sum and range
print range(101)
sum(range(101))
Out[3]:
In [4]:
# xrange
sum(xrange(101))
Out[4]:
In [5]:
# itertools is a great library
# http://docs.python.org/2/library/itertools.html#itertools.count
# itertools.count(start=0, step=1):
# Makes an iterator that returns evenly spaced values, starting at start and advancing by step.
from itertools import islice, count
c = count(0, 1)
In [6]:
# look at how count() works by repeatedly calling c.next()
print c.next()
print c.next()
print c.next()
In [7]:
# let's add up using count and islice to limit how high we count
# also make sure we're also using the builtin sum
# http://docs.python.org/2/library/functions.html#sum
__builtin__.sum(islice(count(0,1), 101L))
Out[7]:
In [8]:
# generator for the lowercase English alphabet
import string
def alpha1():
    m = list(string.lowercase)
    while m:
        yield m.pop(0)
In [1]:
import string
# make a generator expression -- generates items on demand
k = (s for s in list(string.lowercase))
k #--> generator
# the same items as a list comprehension, bound to a different name so that k stays a generator
k_list = [s for s in list(string.lowercase)]
k_list #--> list
Out[1]:
In [10]:
k.next()
Out[10]:
In [11]:
# compare to k1, a list comprehension
k1 = [s for s in list(string.lowercase)]
k1
Out[11]:
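The difference between the two forms can be sketched as follows: a generator expression produces items lazily and is exhausted after one pass, while a list comprehension builds the whole list up front. The variable names are illustrative. The built-in next() is used instead of the .next() method so that the snippet also runs on Python 3.

```python
squares_gen = (n * n for n in range(5))    # generator expression: lazy
squares_list = [n * n for n in range(5)]   # list comprehension: built immediately

first = next(squares_gen)      # pulls just one item from the generator
rest = list(squares_gen)       # consumes whatever is left
leftover = list(squares_gen)   # nothing remains after the first full pass

print(first)
print(rest)
print(leftover)
print(squares_list)            # the list can be re-read as often as needed
```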
In [12]:
# create my own version of itertools.count
def my_count(start, step):
    n = start
    while True:
        yield n
        n += step

__builtin__.sum(islice(my_count(0,1), 101L))
Out[12]:
$T_n= \sum_{k=1}^n k = 1+2+3+ \dotsb +n = \frac{n(n+1)}{2} = {n+1 \choose 2}$
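As a quick sanity check of the formula above, the sketch below compares a brute-force sum with the closed form n(n+1)/2 and with the binomial coefficient written out as (n+1)n/2. The function names are made up for this example; integer division keeps the arithmetic exact.

```python
def triangular_number(n):
    """Closed form T_n = n(n+1)/2."""
    return n * (n + 1) // 2

def choose_2(m):
    """Binomial coefficient C(m, 2) = m(m-1)/2."""
    return m * (m - 1) // 2

for n in (1, 2, 3, 10, 100):
    brute_force = sum(range(1, n + 1))
    # all three expressions in the formula should agree
    assert brute_force == triangular_number(n) == choose_2(n + 1)

print(triangular_number(100))
```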
In [2]:
from itertools import islice
def triangular():  # yields the triangular numbers 1, 3, 6, 10, ... (not the Fibonacci series)
    n = 1
    i = 1
    while True:
        yield n
        i += 1
        n += i
In [3]:
for i, n in enumerate(islice(triangular(), 10)):
    print i+1, n
In [15]:
list(islice(triangular(), 100))[-1]
Out[15]:
In [16]:
list(islice(triangular(),99,100))[0]
Out[16]:
http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem :
If a chessboard were to have wheat placed upon each square such that one grain were placed on the first square, two on the second, four on the third, and so on (doubling the number of grains on each subsequent square), how many grains of wheat would be on the chessboard at the finish?
The total number of grains equals 18,446,744,073,709,551,615, which is a much higher number than most people intuitively expect.
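The quoted total follows from the geometric series 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1: the running total through any square is one grain short of the next square's count. A small sketch of that check, with helper names invented for illustration:

```python
def grains_on_square(k):
    """Grains on the k-th square, counting squares from 1: 1, 2, 4, ..."""
    return 2 ** (k - 1)

def total_through_square(k):
    """Running total of grains on squares 1..k, added up term by term."""
    return sum(grains_on_square(j) for j in range(1, k + 1))

for k in (1, 2, 3, 8, 64):
    # the running total should match the closed form 2^k - 1
    assert total_through_square(k) == 2 ** k - 1

print("{0:,}".format(total_through_square(64)))
```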
In [17]:
# Legend of the Chessboard YouTube video
from IPython.display import YouTubeVideo
YouTubeVideo('t3d0Y-JpRRg')
Out[17]:
In [18]:
# generator comprehension
k = (pow(2,n) for n in xrange(64))
k.next()
Out[18]:
In [19]:
__builtin__.sum((pow(2,n) for n in xrange(64)))
Out[19]:
In [20]:
pow(2,64) -1
Out[20]:
http://stackoverflow.com/a/509295/7782
Use on any of the sequence types (python docs on sequence types):
There are seven sequence types: strings, Unicode strings, lists, tuples, bytearrays, buffers, and xrange objects.
Square brackets are used to access items and slices of a sequence.
Let's remind ourselves of how to use slices
s[i]
s[i:j]
s[i:j:k]
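A short sketch of the three slice forms on a string (any sequence type behaves the same way). The variable names are illustrative only.

```python
letters = "abcdefghij"

single = letters[0]            # s[i]: one item
middle = letters[2:5]          # s[i:j]: items i up to, but not including, j
stepped = letters[1:9:3]       # s[i:j:k]: every k-th item from i up to j
backwards = letters[::-1]      # omitted bounds plus a negative step reverses

# The bracket syntax is shorthand for passing a slice object.
same_as_middle = letters[slice(2, 5)]

print(single)
print(middle)
print(stepped)
print(backwards)
```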
In [21]:
m = range(10)
m
Out[21]:
In [22]:
m[0]
Out[22]:
In [23]:
m[-1]
Out[23]:
In [24]:
m[::-1]
Out[24]:
In [25]:
m[2:3]
Out[25]:
In [26]:
import string
alphabet = string.lowercase
alphabet
Out[26]:
In [27]:
# 13th letter of the alphabet (index 12, since indexing starts at 0)
alphabet[12]
Out[27]:
We will revisit generalized slicing in NumPy.
http://my.safaribooksonline.com/book/programming/python/9781449323592/1dot-preliminaries/id2699702
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
In pylab mode the numpy and matplotlib imports (np, plt) are done for you; pandas still has to be imported explicitly.
ipython --help
yields
--pylab=<CaselessStrEnum> (InteractiveShellApp.pylab)
Default: None
Choices: ['tk', 'qt', 'wx', 'gtk', 'osx', 'inline', 'auto']
Pre-load matplotlib and numpy for interactive use, selecting a particular
matplotlib backend and loop integration.
In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas import Series, DataFrame
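NumPy is the fundamental package for scientific computing with Python. It contains among other things:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities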
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
See PfDA, Chapter 4
In [6]:
# first: a numpy array of zero-dimension
a0 = np.array(5)
a0
Out[6]:
Use ndim to get the number of dimensions and shape to get a tuple of array dimensions.
In [7]:
a0.ndim, a0.shape
Out[7]:
In [8]:
# 1-d array
a1 = np.array([1,2])
a1.ndim, a1.shape
Out[8]:
In [9]:
# 2-d array
a2 = np.array(([1,2], [3,4]))
a2.ndim, a2.shape
Out[9]:
In [16]:
a2.dtype
Out[16]:
In [14]:
a3 = np.array(([1,2], ['a',4]))
a3.ndim, a3.shape
a3.dtype
Out[14]:
arange is one of NumPy's ndarray-creating functions.
Compare to xrange.
In [34]:
from numpy import arange
In [ ]:
type(arange(10))
In [17]:
np.arange(3,7) #--> step defaults to 1, so this gives 3, 4, 5, 6 (the stop value 7 is excluded)
Out[17]:
In [36]:
for k in arange(10):
    print k
In [37]:
list(arange(10)) == list(xrange(10))
Out[37]:
In [18]:
# how to map 0..63 -> 2-d (8x8) array
a3 = np.arange(64).reshape(8,8)
a3
Out[18]:
In [19]:
# 2nd row, 3rd column --> remember index starts at 0
a3[1,2]
Out[19]:
In [21]:
# check that reshape works --> Nice!!
for i in range(8):
    for j in range(8):
        if a3[i,j] != i*8 + j:
            print i, j
example of broadcasting:
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
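A minimal sketch of the broadcasting rule described above, using small arrays so the stretching is easy to follow. The array names are illustrative. A scalar, a row of shape (3,), and a column of shape (2, 1) are each combined with a (2, 3) array without explicit loops.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

scaled = 2 * grid        # the scalar is broadcast to every element
plus_row = grid + row    # the row is repeated for each of the 2 rows
plus_col = grid + col    # the column is repeated across the 3 columns

print(scaled)
print(plus_row)
print(plus_col)
```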
In [41]:
2*a3
Out[41]:
In [42]:
a3+2
Out[42]:
In [43]:
# reverse sort -- best way?
#http://stackoverflow.com/a/6771620/7782
np.sort(np.arange(100))[::-1]
Out[43]:
This stuff is a bit tricky (see PfDA, pp. 89-92)
Consider the example of picking out the whole numbers less than 20 that are evenly divisible by 3. First, generate a list of such numbers.
In [44]:
# list comprehension
[i for i in xrange(20) if i % 3 == 0]
Out[44]:
In [45]:
a3 = np.arange(20)
a3
Out[45]:
In [46]:
# basic indexing
print a3[0]
print a3[::-1]
print a3[2:5]
In [47]:
np.mod(a3, 3)
Out[47]:
In [48]:
np.mod(a3, 3) == 0
Out[48]:
In [49]:
divisible_by_3 = np.mod(a3, 3) == 0
In [50]:
a3[divisible_by_3]
Out[50]:
In [51]:
# if you want to understand this in terms of the overloaded operators -- don't worry if you don't get this.
a3.__getitem__(np.mod(a3,3).__eq__(0))
Out[51]:
Use arange, np.sqrt, astype
In [52]:
a4 = arange(100)
a4sqrt = np.sqrt(a4)
a4[a4sqrt == a4sqrt.astype(int)]  # perfect squares: the sqrt is unchanged by truncation to int
Out[52]:
Make a series out of an array
In [53]:
s1 = Series(arange(5))
confirm that the type of s1 is what you would expect
In [54]:
type(s1)
Out[54]:
Check whether the series is also an ndarray (true in older pandas, where Series subclassed ndarray; from pandas 0.13 on it is not).
In [55]:
s1.ndim, isinstance(s1, np.ndarray)
Out[55]:
In [56]:
s1.index
Out[56]:
In [57]:
import string
allTheLetters = string.lowercase
allTheLetters
Out[57]:
In [58]:
s2 = Series(data=arange(5), index=list(allTheLetters)[:5])
s2
Out[58]:
In [59]:
s2.index
Out[59]:
Compared with a regular NumPy array, you can use values in the index when selecting single values or a set of values
In [60]:
# can use both numeric indexing and the labels
s2[0], s2['a']
Out[60]:
In [61]:
for i in range(len(s2)):
    print i, s2[i]
Label-based and positional indexing can conflict when the index itself holds integers -- consider
In [62]:
s3 = Series(data=['albert', 'betty', 'cathy'], index=[3,1, 0])
s3
Out[62]:
In [63]:
s3[0], list(s3)[0]
Out[63]:
But slicing is positional, so it can be used to pick out items by numeric position.
In [64]:
s3[::-1]
Out[64]:
In [65]:
for i in range(len(s3)):
    print i, s3[i:i+1]
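One way to avoid the ambiguity altogether is to say explicitly which kind of lookup is meant. The sketch below assumes a pandas version that provides the .loc (label-based) and .iloc (position-based) indexers; the variable names are illustrative.

```python
import pandas as pd

people = pd.Series(['albert', 'betty', 'cathy'], index=[3, 1, 0])

by_label = people.loc[0]        # the entry whose index label is 0
by_position = people.iloc[0]    # the first entry, whatever its label

print(by_label)
print(by_position)
```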
In [66]:
s3.name = 'person names'
s3.name
Out[66]:
In [67]:
s3.index.name = 'confounding label'
s3.index.name
Out[67]:
In [68]:
s3
Out[68]:
Important points remaining:
You get some nice matplotlib integration via pandas
In [69]:
# Gauss addition using np.arange, Series
from pandas import Series
Series(arange(101).cumsum()).plot()
Out[69]:
In [70]:
from pandas import Series
Series((pow(2,k) for k in xrange(64)), dtype=np.float64).cumsum().plot()
Out[70]:
In [71]:
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html
from numpy import ones
In [72]:
2*ones(64, dtype=int)
Out[72]:
In [73]:
arange(64)
Out[73]:
In [74]:
sum(np.power(2, arange(64, dtype=np.uint64)))
Out[74]:
In [75]:
sum(np.power(2*ones(64, dtype=np.uint64), arange(64)))
Out[75]:
In [76]:
precise_ans = sum([pow(2,n) for n in xrange(64)])
np_ans = sum(np.power(2*ones(64, dtype=np.uint64), arange(64)))
precise_ans, np_ans
Out[76]:
In [77]:
# Raise an assertion if two items are not equal up to desired precision.
np.testing.assert_almost_equal(precise_ans, np_ans) is None
Out[77]:
There are many ways to construct DataFrames -- let's try them out in the context of the census calculations.
In [78]:
# not really intuitive to me: reversal of column/row
DataFrame(dict([('06', {'name': 'California', 'abbreviation':'CA'})] ))
Out[78]:
In [79]:
DataFrame([{'name': 'California', 'abbreviation':'CA'}], index= ['06'])
Out[79]:
In [80]:
Series(['06'], name='FIPS')
Out[80]:
In [81]:
DataFrame([{'name': 'California', 'abbreviation':'CA'}],
index=Series(['06'], name='FIPS'))
Out[81]:
In [82]:
n0 = 5
n0 == 5
Out[82]:
Now I thought I'd be able to use n0.__eq__(5), but nope: in Python 2, int implements comparison through __cmp__ rather than __eq__ -- it's complicated -- see http://stackoverflow.com/questions/2281222/why-when-in-python-does-x-y-call-y-eq-x#comment2254663_2282795
In [83]:
try:
    n0.__eq__(5)
except Exception as e:
    print e
can do: int.__cmp__(x)
In [84]:
(n0.__cmp__(4), n0.__cmp__(5), n0.__cmp__(6))
Out[84]:
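The behaviour above is specific to Python 2. Python 3 removed __cmp__, and its int type defines the rich comparison methods such as __eq__ directly. The sketch below just probes which of the two methods the running interpreter provides, so it reports the situation on either version rather than assuming one.

```python
import sys

n0 = 5
has_eq = hasattr(n0, '__eq__')     # rich comparison method
has_cmp = hasattr(n0, '__cmp__')   # old three-way comparison method

print("Python {0}: __eq__ present: {1}, __cmp__ present: {2}".format(
    sys.version_info[0], has_eq, has_cmp))

# Only call the method the interpreter actually has.
if has_eq:
    print(n0.__eq__(5))
if has_cmp:
    print(n0.__cmp__(5))
```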
how about ndarray?
In [85]:
arange(5) == 2
Out[85]:
In [86]:
#
# http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html
np.array_equal(arange(5) == 2 , arange(5).__eq__(2))
Out[86]:
Useful if you want to understand how the slicing syntax really works.
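A small sketch that makes the mechanism visible: a class whose __getitem__ simply returns the key it receives, so each bracket expression shows exactly what Python passes along. The class name ShowKey is made up for this illustration.

```python
class ShowKey(object):
    """Hands back whatever key the square-bracket syntax passes in."""
    def __getitem__(self, key):
        return key

probe = ShowKey()

print(probe[3])         # a plain integer
print(probe[1:4])       # a slice object with start and stop
print(probe[::-1])      # a slice object with only a step
print(probe[1:4, 2])    # a tuple holding a slice and an integer
```

Lists, strings, and NumPy arrays each decide for themselves how to interpret these keys, which is why an ndarray can also accept a boolean array inside the brackets.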
In [87]:
isinstance([1,2], list)
Out[87]:
In [88]:
isinstance(arange(5), list) # what does that mean -- could still be list-like
Out[88]:
In [89]:
l1 = range(5)
In [90]:
type(l1)
Out[90]:
In [91]:
l1[0], l1.__getitem__(0), l1[0] == l1.__getitem__(0)
Out[91]:
In [92]:
l1[::-1], l1.__getitem__(slice(None, None, -1))
Out[92]:
In [93]:
ar1 = arange(5)
ar1[3], ar1.__getitem__(3)
Out[93]:
In [94]:
ar1 == 2
Out[94]:
In [95]:
ar1[ar1 == 2].shape
Out[95]:
In [96]:
ar1.__eq__(2)
Out[96]:
In [97]:
ar1.__getitem__(slice(2, 4, None))
Out[97]:
In [98]:
slice(ar1.__eq__(2), None, None)
Out[98]:
In [99]:
ar1.__getitem__(ar1.__eq__(2))
Out[99]:
In [100]:
ar1[:2], ar1.__getitem__(slice(2))
Out[100]:
In [101]:
ar1 + 7
Out[101]:
In [102]:
ar1.__add__(7)
Out[102]:
In [103]:
min(ar1 + 7)
Out[103]:
In [104]:
alphabet[:]
Out[104]: