Abhishek Gupta, Artificial Intelligence Club, DA-IICT
version 0.1, Released 25/1/2014
This work is licensed under a GNU General Public License, version 2
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. NumPy arrays differ from the standard Python sequences in several important ways.
Benefits of NumPy:
NumPy's arrays are more compact than Python lists, and reading and writing items is also faster with NumPy. You may not care much for just a million cells, but you definitely would for a billion cells: neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with about 4 GB, while Python alone would need at least about 12 GB (lots of pointers, which double in size), a much costlier piece of hardware!
The difference is mostly due to "indirectness": a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object on a 32-bit build (4 for the type pointer, 4 for the reference count, 4 for the value, and the memory allocator rounds up to 16). A NumPy array is an array of uniform values: single-precision numbers take 4 bytes each, double-precision ones 8 bytes. It is less flexible, but you pay substantially for the flexibility of standard Python lists!
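The size difference can be checked directly. The following is a minimal sketch, assuming Python 3 and the conventional `np` alias (the notebook itself uses `from numpy import *` and Python 2); the element count `n` is a hypothetical value picked for illustration, and the exact byte counts depend on the interpreter build, so none are quoted here.

```python
import sys
import numpy as np

n = 1000  # hypothetical element count, chosen only for illustration

# A Python list holds pointers to separately allocated float objects,
# so we add the container size and the size of every element object.
py_list = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# A NumPy array stores the raw 8-byte doubles in one contiguous buffer.
arr = np.arange(n, dtype=np.float64)

print("list (container + objects):", list_bytes, "bytes")
print("ndarray data buffer:       ", arr.nbytes, "bytes")
print("bytes per array element:   ", arr.itemsize)
```

The array buffer is exactly `n * itemsize` bytes, while the list total includes one full Python object per element.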
NumPy is not just more efficient, it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes lets you avoid unnecessary work, and they are efficiently implemented.
Functionality: You get a lot built in with NumPy: FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc. And really, who can live without FFTs?
In [2]:
from numpy import *
In [108]:
a = array((1,2,3,4)) #What is (1,2,3,4)
array?
print 'type:', type(a)
print 'dtype', a.dtype
print 'itemsize', a.itemsize
print 'shape', a.shape
print 'size', a.size
print 'len', len(a)
print 'nbytes', a.nbytes
print 'ndim', a.ndim
print a
By default, operations are element-wise.
In [7]:
a + 1
Out[7]:
In [9]:
b = a + 1
print a + b
print a ** b
In [20]:
print sin(a)
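As a small illustration of element-wise behaviour, here is a sketch assuming Python 3 and the `np` alias rather than the notebook's star import. It contrasts array arithmetic with the same operator on plain lists, where `+` concatenates instead.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a + 1                      # the scalar is applied to every element

sum_ab = (a + b).tolist()      # element-wise addition
prod_ab = (a * b).tolist()     # element-wise multiplication, not a dot product
pow_ab = (a ** b).tolist()     # each a[i] raised to b[i]

# The same operator on plain Python lists concatenates them instead.
list_sum = [1, 2, 3, 4] + [2, 3, 4, 5]

print(sum_ab)    # [3, 5, 7, 9]
print(prod_ab)   # [2, 6, 12, 20]
print(pow_ab)    # [1, 8, 81, 1024]
print(list_sum)  # [1, 2, 3, 4, 2, 3, 4, 5]
```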
NumPy's counterpart to range():
In [10]:
arange(11)
Out[10]:
In [11]:
arange(1,11,2)
Out[11]:
In [17]:
linspace(2,20,10)
Out[17]:
In [19]:
linspace(0.1,0.1254,10)
Out[19]:
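The main difference between the two constructors is how the end point is treated. A short sketch, assuming Python 3 and the `np` alias: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded, like range().
r = np.arange(1, 11, 2)

# linspace(start, stop, num): num evenly spaced points, stop included.
l = np.linspace(2, 20, 10)

print(r.tolist())   # [1, 3, 5, 7, 9]
print(l.tolist())   # ten values from 2.0 to 20.0 in steps of 2.0
print(len(np.arange(11)), len(l))
```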
Setting array elements:
In [109]:
a[2] = 10
print a
In [110]:
a.fill(0)
print a
In [111]:
a.fill(-4.8) # a has an integer dtype, so the value is truncated to -4
print a
In [24]:
from matplotlib import pyplot as plt
%matplotlib inline
In [25]:
plt.plot(a,b)
Out[25]:
In [32]:
x = linspace(0,20,100)
plt.plot(x,sin(x))
plt.grid()
In [33]:
plt.plot(x, sin(x), x, sin(2*x))
Out[33]:
In [35]:
plt.plot(x, sin(x), 'r-^')
Out[35]:
In [36]:
plt.plot?
In [61]:
N = 70
a = random.rand(N)
b = random.rand(N)
area = pi * (15 * random.rand(N))**2 # radii from 0 to 15 points
plt.scatter(a, b, s=area, alpha=0.5)
Out[61]:
In [64]:
plt.plot(a, b, 'bo', markersize=20, alpha=0.5)
Out[64]:
Multiple Figures:
In [68]:
fig1 = plt.figure()
plt.plot(a)
fig2 = plt.figure()
plt.plot(b)
Out[68]:
In [69]:
plt.subplot(2,1,1)
plt.plot(a)
plt.subplot(2,1,2)
plt.plot(b)
Out[69]:
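By default hold is true: successive plot calls draw on the same axes. (plt.hold was deprecated in matplotlib 2.0 and removed in 3.0, where this is always the behaviour.)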
In [71]:
plt.plot(a)
plt.plot(b)
Out[71]:
In [77]:
plt.plot(a)
plt.hold(False)
plt.plot(b)
plt.hold(True);
In [85]:
plt.plot(sin(x))
plt.xlabel('radians')
plt.ylabel('amplitude', fontsize='large')
plt.title('Sin(x)')
plt.show()
In [83]:
plt.plot(sin(x), label='sin(x)')
plt.plot(cos(x), label='cos(x)')
plt.legend()
Out[83]:
In [96]:
from scipy.misc import lena # lena() was removed in SciPy 0.17; this import needs an older SciPy
img = lena()
plt.imshow(img, cmap=plt.get_cmap('gray'))
plt.colorbar()
Out[96]:
In [100]:
plt.hist(random.randn(1000))
Out[100]:
In [101]:
plt.hist(random.randn(1000),30)
Out[101]:
In [106]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
theta = linspace(-4 * pi, 4 * pi, 100)
z = linspace(-2, 2, 100)
r = z**2 + 1
x = r * sin(theta)
y = r * cos(theta)
ax.plot(x, y, z, label='parametric curve')
ax.legend()
plt.tight_layout()
In [113]:
a = linspace(0,10,10)
print a[-2:]
In [115]:
am = array([[1,2,3,4],
            [5,6,7,8]])
print 'type:', type(am)
print 'dtype', am.dtype
print 'itemsize', am.itemsize
print 'shape', am.shape
print 'size', am.size
print 'len', len(am)
print 'nbytes', am.nbytes
print 'ndim', am.ndim
print am
In [119]:
am.shape = (4,-1)
print am
In [121]:
am = arange(50).reshape(-1, 5)
print am
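The -1 in a shape asks NumPy to infer that dimension from the total number of elements. A minimal sketch, assuming Python 3 and the `np` alias; it uses `reshape` throughout rather than assigning to `.shape`.

```python
import numpy as np

am = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8]])

# 8 elements with 4 rows: the -1 is inferred as 2 columns.
tall = am.reshape(4, -1)

# 50 elements with 5 columns: the -1 is inferred as 10 rows.
grid = np.arange(50).reshape(-1, 5)

# The inference fails when the size is not divisible.
try:
    np.arange(50).reshape(-1, 7)
    divisible = True
except ValueError:
    divisible = False

print(tall.shape, grid.shape, divisible)
```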
In [125]:
#Python's way
#file = open('data.txt')
with open('data.txt') as file:
    data = []
    for line in file:
        fields = line.split()
        raw_data = [float(x) for x in fields]
        data.append(raw_data)
data = array(data)
In [156]:
dat = loadtxt('data.txt', skiprows=1, dtype=int, delimiter=',', usecols=(0,1,2,4), comments='%')
print dat
In [157]:
savetxt('data2.txt', dat)
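Since data.txt is not included here, the following sketch runs the same loadtxt options on an in-memory buffer. The file contents are hypothetical (one header row, one '%' comment line, five comma-separated columns), and Python 3 with the `np` alias is assumed.

```python
import io
import numpy as np

# Hypothetical contents standing in for data.txt.
text = """a,b,c,d,e
% this line is ignored because it starts with the comment marker
1,2,3,4,5
6,7,8,9,10
"""

dat = np.loadtxt(io.StringIO(text),
                 skiprows=1,            # skip the header row
                 dtype=int,
                 delimiter=',',
                 usecols=(0, 1, 2, 4),  # drop the fourth column
                 comments='%')
print(dat)

# savetxt writes floats in scientific notation by default;
# fmt='%d' keeps the integers readable.
buf = io.StringIO()
np.savetxt(buf, dat, fmt='%d')
back = np.loadtxt(io.StringIO(buf.getvalue()), dtype=int)
```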
In [132]:
a = arange(0, 40).reshape(4,-1)
print a
In [131]:
a[0, 3:8]
Out[131]:
In [133]:
a[1:,5:]
Out[133]:
In [134]:
a[:,2]
Out[134]:
In [135]:
a[1,:]
Out[135]:
In [154]:
a = arange(0,80,10)
print a
indices = [1,2,3]
print a[indices]
In [162]:
mask = array([0,1,1,0,0,1,0,0], dtype=bool)
print mask
print a[mask]
In [164]:
mask2 = a < 30
print mask2
print a[mask2]
In [169]:
a = arange(40).reshape(5,-1)
print a
In [170]:
print a[(0,1,2,3,4), (1,2,3,4,5)]
In [171]:
print a[3:, [0, 2, 5]]
In [173]:
a[2,mask]
Out[173]:
Unlike slicing, fancy indexing creates a copy. Do you know why? Hint: strides (reshape, slice, transpose).
What is the difference between [(),()] and [[]/:, []/:]?
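One way to see the view/copy distinction is to test whether the result shares memory with the original. A minimal sketch, assuming Python 3 and the `np` alias: a slice can be described by a new offset and stride into the same buffer, whereas an arbitrary index list cannot, so NumPy has to copy.

```python
import numpy as np

a = np.arange(10)

s = a[2:5]          # basic slice: a view on the same buffer
f = a[[2, 3, 4]]    # fancy index: a new array with copied data

print(np.shares_memory(a, s))   # True
print(np.shares_memory(a, f))   # False

s[0] = 99           # writes through to a
f[0] = -1           # only changes the copy
print(a[2])         # 99

# A strided slice reuses the buffer and simply doubles the stride.
print(a.strides, a[::2].strides)
```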
In [149]:
a = arange(36).reshape(-1,6)
a[:3]
Out[149]:
In [151]:
a[a[:,4]>10]
Out[151]:
When to use this?
In [178]:
print a
print where(a>10)
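A boolean mask gives the matching values, while where() gives their positions, which helps when the indices themselves are needed. A small sketch, assuming Python 3 and the `np` alias; the threshold 30 is picked only to keep the output short.

```python
import numpy as np

a = np.arange(36).reshape(-1, 6)

# With one argument, where() returns one index array per axis.
rows, cols = np.where(a > 30)
print(rows.tolist())   # [5, 5, 5, 5, 5]
print(cols.tolist())   # [1, 2, 3, 4, 5]

# Indexing with those arrays gives the same values as the mask.
same = np.array_equal(a[rows, cols], a[a > 30])

# With three arguments, where() selects element-wise between two sources.
kept = np.where(a > 30, a, 0)
```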
In [179]:
a = arange(12)
b = a[:,newaxis]
print b
In [181]:
c = a[newaxis,:]
print c
In [183]:
d = a[:, newaxis, newaxis]
print d
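newaxis only inserts an axis of length 1; it becomes useful when combined with broadcasting. A minimal sketch, assuming Python 3 and the `np` alias, with a shorter array than the cells above to keep the result readable.

```python
import numpy as np

a = np.arange(3)

col = a[:, np.newaxis]   # shape (3, 1): a column
row = a[np.newaxis, :]   # shape (1, 3): a row

# Broadcasting stretches the length-1 axes, giving every pairwise sum.
table = col + row

print(col.shape, row.shape, table.shape)
print(table.tolist())    # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```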
In [185]:
a = arange(4).reshape(2,2)
print a
In [189]:
b = a.flatten() #always returns a new copy of the array
print b
In [190]:
c = a.flat #an iterator object that accesses multi-dimensional data as 1-D (akin to pass by reference)
print c
In [193]:
print a
b = a.ravel() #pass by ?
print b
In [194]:
c = a.T.ravel() #pass by ?
print c
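The "pass by ?" questions can be explored by checking whether each result shares memory with the original. A sketch, assuming Python 3 and the `np` alias: ravel() returns a view when the data is already laid out in the requested order and falls back to a copy otherwise, as with the transpose here.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)

flat_copy = a.flatten()   # always a copy
rav = a.ravel()           # contiguous data: a view
rav_t = a.T.ravel()       # transposed layout: a copy is needed

print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, rav))        # True
print(np.shares_memory(a, rav_t))      # False
print(rav_t.tolist())                  # [0, 2, 1, 3]
```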
In [196]:
a = arange(40).reshape(4,1,1,-1)
print a
In [197]:
print a.shape
print a.squeeze().shape
In [200]:
a = array([[11,21,31],
           [12,22,32],
           [13,23,33]])
print a.diagonal()
In [201]:
print a.diagonal(offset=1)
print a.diagonal(offset=-1)
In [203]:
i = 0,1,2
print a[i,i]
In [206]:
i= array([0,1])
print a[i,i+1]
print a[i+1,i]
Why does the second i need to be an ndarray?
In [207]:
a = array([1+1j, 3,2,1j,4])
print a
In [208]:
print a.real
print a.imag
In [209]:
a.imag = [1,2,3,4,2]
print a
In [211]:
print a.conj()
For float and other non-complex arrays, real and imag are available, but imag is read-only: you cannot set it.
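The read-only behaviour can be seen with a short sketch, assuming Python 3 and the `np` alias. Converting to a complex dtype first is one way to get a settable imaginary part.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])

print(f.real.tolist())   # [1.0, 2.0, 3.0]
print(f.imag.tolist())   # [0.0, 0.0, 0.0]

# Assigning to imag fails because a float array has no imaginary storage.
try:
    f.imag = [1, 2, 3]
    assigned = True
except TypeError:
    assigned = False
print(assigned)          # False

# After converting to complex, imag can be set.
c = f.astype(complex)
c.imag = [1, 2, 3]
print(c.tolist())        # [(1+1j), (2+2j), (3+3j)]
```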
In [214]:
a = arange(30).reshape(3,-1)
print a
In [215]:
print a.sum()
print a.sum(axis=0)
In [217]:
print a.prod()
print a.prod(axis=1)
In [219]:
print a.min(axis=0)
print amin(a, axis=0)
In [220]:
print a.max(axis=0)
print amax(a, axis=0)
Which one to use?
In [222]:
print a.argmin(axis=0)
print a.argmax(axis=0)
In [223]:
print a.mean(axis=0)
In [224]:
print average(a,axis=0)
In [232]:
print average(a,weights=[4,1,2],axis=0)
In [233]:
a.std(axis=0)
Out[233]:
In [234]:
a.var(axis=0)
Out[234]:
In [237]:
b = a.clip(3,20) #set all values less than 3 to 3 and greater than 20 to 20
print a
print b
In [240]:
b = random.randn(10)
print b
print b.round()
In [241]:
b.round(decimals=1)
Out[241]:
In [248]:
print a.nonzero()
In [260]:
b= random.randn(3,5).round()
print b
b.sort(axis=1)
print b
In [265]:
a[0].searchsorted(5)
Out[265]:
In [266]:
print arange(5, 100, 5, dtype=float)
In [267]:
print ones((2,3), dtype=int)
In [268]:
print zeros((2,3), dtype=float)
In [269]:
print identity(3, dtype=short)
In [273]:
empty((2,4), dtype=float)
Out[273]:
In [271]:
a.fill(5.)
a[:] = 5.
Which one to use?
In [295]:
a = arange(4).reshape(2,-1)
b = a.T
print add(a, b, a) # the third argument is the output array, so the sum is stored in a
In [298]:
print a == b
print equal(a,b)
In [300]:
print a != b
print not_equal(a,b)
In [299]:
print a > b
print greater(a,b)
Similarly, we have less()/<, greater_equal()/>=, less_equal()/<=.
In [301]:
logical_and(a,b)
Out[301]:
Similarly, we have logical_or(), logical_not(), logical_xor().
In [303]:
print all(logical_and(a,b))
print any(logical_and(a,b))
In [308]:
print (a>1) & (b>1)
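The parentheses in the last cell matter, because & binds more tightly than the comparison operators. A minimal sketch, assuming Python 3 and the `np` alias, using fresh arrays rather than the ones modified above.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]
b = a.T                          # [[0, 2], [1, 3]]

both = (a > 1) & (b > 1)
same = np.array_equal(both, np.logical_and(a > 1, b > 1))
print(both.tolist())             # [[False, False], [False, True]]

# Without parentheses this parses as the chain a > (1 & b) > 1, which
# needs the truth value of a whole array and therefore raises.
try:
    a > 1 & b > 1
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)                 # True
```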
In [275]:
m = mat('1,2,3;4,5,6;7,8,9') #matlab style
print m
In [276]:
m * m
Out[276]:
In [277]:
m**4
Out[277]:
All are matrix operations, not element-wise
In [282]:
ma = mat(arange(4).reshape(2,-1))
mb = ma.T
print ma
print mb
In [285]:
mBlock = bmat('ma,mb;mb,ma')
print mBlock
op : {mathematical, comparative, logical, bitwise}
op.reduce(a, axis=0): Reduces a along the given axis by applying op; a 1-D array becomes a single value.
op.accumulate(a, axis=0): Creates a new array containing the intermediate results of op.reduce at each element.
op.outer(a, b): Forms all possible combinations of elements of a and b using op.
op.reduceat(a, indices): Reduces over the slices delimited by the given 1-D indices. For multidimensional arrays, it is applied along the axis given by the axis argument (axis 0 by default).
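The four methods can be compared on a small 1-D input. This is a sketch with add as the op, assuming Python 3 and the `np` alias; the input values are chosen only to make the results easy to check by hand.

```python
import numpy as np

a = np.arange(6)                 # [0, 1, 2, 3, 4, 5]

total = np.add.reduce(a)         # 0+1+2+3+4+5
running = np.add.accumulate(a)   # partial sums at each position
pairs = np.add.outer([1, 2], [10, 20, 30])   # every x + y combination

# reduceat sums the slices a[0:2], a[2:5] and a[5:].
chunks = np.add.reduceat(a, [0, 2, 5])

print(total)             # 15
print(running.tolist())  # [0, 1, 3, 6, 10, 15]
print(pairs.tolist())    # [[11, 21, 31], [12, 22, 32]]
print(chunks.tolist())   # [1, 9, 5]
```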
In [309]:
a = arange(30).reshape(3,-1)
print a
In [313]:
add.reduce(a,axis=0)
Out[313]:
In [315]:
add.accumulate(a,axis=1)
Out[315]:
In [319]:
add.outer(a[0:2,0:2], a[0:2,1:3])
Out[319]:
In [322]:
add.reduceat(a[0],[2,5,6])
Out[322]:
In [328]:
#y = choose(choice_array, (option1, option2, option3))
a = array([[1,11,20],
           [21,2,12],
           [10,3,22]])
choice_array = array([[0,1,2],
                      [0,2,1],
                      [2,0,1]])
print choose(choice_array, (a, [-1,-2,-3], 100))
In [329]:
#Use case
print choose((a<10) + 2*(a>15), (a,10,15))
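The use case above amounts to clipping. In the sketch below (Python 3 and the `np` alias assumed), the index array is 1 where a < 10, 2 where a > 15 and 0 elsewhere, so choose picks 10, 15 or the original value respectively, and the result matches clip(10, 15).

```python
import numpy as np

a = np.array([[1, 11, 20],
              [21, 2, 12],
              [10, 3, 22]])

# 0 -> keep a, 1 -> use 10 (too small), 2 -> use 15 (too large)
index = (a < 10) + 2 * (a > 15)
clipped = np.choose(index, (a, 10, 15))

print(index.tolist())    # [[1, 0, 2], [2, 1, 0], [0, 1, 2]]
print(clipped.tolist())  # [[10, 11, 15], [15, 10, 12], [10, 10, 15]]
print(np.array_equal(clipped, a.clip(10, 15)))  # True
```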
In [331]:
x = array([[0,1],[10,11]])
y = x + 5
print x
print y
In [338]:
cat = concatenate((x,y))
print cat
print cat.shape
In [337]:
cat = concatenate((x,y),axis=1)
print cat
print cat.shape
In [336]:
cat = array((x,y))
print cat
print cat.shape
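The three cells differ only in where the second array ends up. A compact sketch of the resulting shapes, assuming Python 3 and the `np` alias: concatenate joins along an existing axis, while array((x, y)) adds a new leading axis, which is also what np.stack does by default.

```python
import numpy as np

x = np.array([[0, 1], [10, 11]])
y = x + 5

rows = np.concatenate((x, y))           # joined along axis 0
cols = np.concatenate((x, y), axis=1)   # joined along axis 1
stacked = np.array((x, y))              # new leading axis

print(rows.shape, cols.shape, stacked.shape)   # (4, 2) (2, 4) (2, 2, 2)
print(np.array_equal(stacked, np.stack((x, y))))  # True
```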
In [339]:
import sys
print sys.version