NumPy is a fundamental Python library for scientific computing and for manipulating large arrays of data. NumPy performs mathematical, statistical, and data manipulation operations on arrays much faster than regular Python operations. This difference matters most when you are performing a large number of calculations on arrays.
The major advantage of using NumPy for handling arrays is that NumPy implements all the heavy lifting in C, which is much faster. This lets you tap into the performance of C from the comfort of Python.
NumPy is part of the SciPy ecosystem of libraries used for mathematics, science, and engineering.
ndarray object
The ndarray (pronounced "N D array") object is the main object for representing your array. This object can handle a multi-dimensional array of any size that your memory can store. The biggest differences between a Python list and an ndarray object are these:
- An ndarray is a fixed-size array, while a list has a dynamic size. When you change the size of an ndarray, a new array is created with the new size and the original is deleted from memory.
- An ndarray allows mathematical and logical operations on complete multi-dimensional arrays. With a Python list you have to iterate over the sequence, which takes more time and code.
- An ndarray has a homogeneous data type for the complete array, while a Python list can contain multiple data types within a single list.
Note: You could have multiple data types in a single object that has multiple dimensions.
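The three differences above can be sketched in a few lines. This is only an illustration using standard NumPy calls; the variable names are made up for the example.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# 1. Size: a list grows in place, while np.append returns a new array
#    and leaves the original untouched.
py_list.append(4)
bigger = np.append(arr, 4)
print(len(py_list), arr.size, bigger.size)  # 4 3 4

# 2. Whole-array operations: the list needs an explicit loop or
#    comprehension, the array does not.
doubled_list = [x * 2 for x in py_list]
doubled_arr = arr * 2
print(doubled_arr)  # [2 4 6]

# 3. Data type: mixed numeric inputs are converted to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```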
You have to import NumPy in your code to use it. A common alias for NumPy is np.
In [1]:
import numpy as np
In [2]:
#%%timeit
python_list_1 = list(range(5000000))
python_list_2 = list(range(5000000))
In [3]:
#%%timeit
np_array_1 = np.arange(5000000)
np_array_2 = np.arange(5000000)
In [4]:
%%timeit
np_array_1 + 7
In plain Python this is more complicated because you have to do it manually with a loop. It is also much slower.
In [5]:
%%timeit
python_output = []
for i in python_list_1:
    python_output.append(i + 7)
In [6]:
%%timeit
python_output = [i + 7 for i in python_list_1]
In [7]:
%%timeit
np_array_1 + np_array_2
In [8]:
%%timeit
python_output = []
for i in range(len(python_list_1)):
    python_output.append(python_list_1[i] + python_list_2[i])
In [9]:
%%timeit
python_output = [python_list_1[i] + python_list_2[i] for i in range(len(python_list_1))]
In [10]:
%%timeit
np_array_1 * np_array_2
In [11]:
%%timeit
python_output = []
for i in range(len(python_list_1)):
    python_output.append(python_list_1[i] * python_list_2[i])
In [12]:
%%timeit
python_output = [python_list_1[i] * python_list_2[i] for i in range(len(python_list_1))]
This is different from the previous element-wise multiplication. To multiply matrix $A$ by matrix $B$, the following must be true.
$A$ has size $m,n$ (rows, columns) and $B$ has size $o,p$. To multiply:
$$A_{m,n} \cdot B_{o,p} = C_{m,p}$$ if $n=o$ (the inner dimensions), resulting in a matrix of size $(m,p)$ (the outer dimensions).
In [13]:
A = np.random.randint(1,10,(3,1))
B = np.random.randint(1,10,(1,3))
print("A:\n==")
print(A)
print("B:\n==")
print(B)
Zhiang
numpy.random.randint(low, high=None, size=None)
Return random integers from low (inclusive) to high (exclusive).
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.random.randint.html
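As a quick check of the inclusive and exclusive bounds described above, the sketch below draws a large sample and inspects its extremes. The sample size of 10000 is an arbitrary choice for illustration.

```python
import numpy as np

# Draw many samples so both ends of the range are very likely to appear.
samples = np.random.randint(1, 10, size=10000)

print(samples.min() >= 1)  # True: low is inclusive
print(samples.max() <= 9)  # True: high is exclusive, so 10 is never drawn

# With a single bound, values are drawn from 0 (inclusive) up to that bound (exclusive).
small = np.random.randint(3, size=5)
print(small)
```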
In [14]:
C = A.dot(B)
print(C)
In [15]:
C = B.dot(A)
print(C)
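The shape rule for matrix multiplication can be verified directly. The sketch below uses arrays of ones (names X, Y and Z are illustrative, not from the cells above) and shows that mismatched inner dimensions raise an error.

```python
import numpy as np

X = np.ones((3, 1), dtype=int)   # m=3, n=1
Y = np.ones((1, 3), dtype=int)   # o=1, p=3

print(X.dot(Y).shape)   # (3, 3): inner dimensions 1 and 1 match
print(Y.dot(X).shape)   # (1, 1): inner dimensions 3 and 3 match
print((X @ Y).shape)    # (3, 3): the @ operator gives the same product for 2-D arrays

# Inner dimensions that do not match raise a ValueError.
Z = np.ones((2, 3), dtype=int)
try:
    X.dot(Z)            # inner dimensions 1 and 2 differ
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```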
In [16]:
np_array_1 % 2 == 0
Out[16]:
In [17]:
python_output = [i % 2 == 0 for i in python_list_1]
In [18]:
python_list_1 = [[i for l in range(2000)] for i in range(2000)]
python_list_2 = [[i for l in range(2000)] for i in range(2000)]
np_array_1 = np.array(python_list_1)
np_array_2 = np.array(python_list_2)
In [19]:
np_array_1
Out[19]:
Zhiang
numpy.array(multi_list)
Convert a nested Python list (multi_list) to a NumPy array.
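A small sketch of this conversion, using a 2 x 3 nested list chosen only for illustration:

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)

print(arr.shape)  # (2, 3): two rows, three columns
print(arr.ndim)   # 2

# tolist() converts the array back to a nested Python list.
back = arr.tolist()
print(back == nested)  # True
```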
In [20]:
%%timeit
np_array_1 * np_array_2
In [21]:
%%timeit
python_output = []
for i in range(len(python_list_1)):
    python_output.append([])
    for l in range(len(python_list_1[i])):
        # append to the last row itself; python_output[-1:] would be a throwaway copy
        python_output[-1].append(python_list_1[i][l] * python_list_2[i][l])
In [22]:
np_array_1 = np.array(range(5000000))
In [23]:
np_array_1[0]
Out[23]:
In [24]:
np_array_1[-1]
Out[24]:
In [25]:
np_array_1[1:10]
Out[25]:
In [26]:
np_array_1[1:10:2]
Out[26]:
In [27]:
python_list_1 = [[i+l for l in range(2000)] for i in range(2000)]
np_array_1 = np.array(python_list_1)
np_array_1
Out[27]:
In [28]:
np_array_1[2]
Out[28]:
In [29]:
np_array_1[2][0]
Out[29]:
In [30]:
np_array_1[2,0]
Out[30]:
In [31]:
np_array_1[1:5,0]
Out[31]:
[Figure: Array Filter]
In [33]:
np_array_1 = np.array(range(10))
print(np_array_1)
In [34]:
mask = np.array([True, False, False, False, False, False, False, False, False, True])
print(np_array_1[mask])
In [35]:
np_array_1[np_array_1 < 5]
Out[35]:
In [36]:
np_array_1[np_array_1 % 2 == 0]
Out[36]:
In [37]:
def isprime(n):
    '''check if integer n is a prime'''
    # make sure n is a positive integer
    n = abs(int(n))
    # 0 and 1 are not primes
    if n < 2:
        return False
    # 2 is the only even prime number
    if n == 2:
        return True
    # all other even numbers are not primes
    if not n & 1:
        return False
    # range starts with 3 and only needs to go up to the square root of n
    # for all odd numbers
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0:
            return False
    return True

visprime = np.vectorize(isprime)
In [38]:
np_array_1[visprime(np_array_1)]
Out[38]:
In [39]:
np_array_1 = np.random.randint(1,10, size=(500000,))
np_array_1
Out[39]:
In [40]:
np_array_1 % 2 == 0
Out[40]:
In [41]:
(np_array_1 % 2 == 0).all()
Out[41]:
In [42]:
(np_array_1 % 2 == 0).any()
Out[42]:
In [43]:
np_array_1.max()
Out[43]:
In [44]:
np_array_1.min()
Out[44]:
In [45]:
np_array_1.mean()
Out[45]:
In [46]:
np_array_1.std()
Out[46]:
In [47]:
np_array_1.dtype
Out[47]:
In [48]:
np_array_1.shape
Out[48]:
In [49]:
np_array_1.ndim
Out[49]:
In [50]:
np_array_1.size * np_array_1.itemsize
Out[50]:
In [51]:
"MB: %0.1f" % (_ / 1024 / 1024)
Out[51]:
In [52]:
np_array_1.astype(float)
Out[52]:
In [53]:
np_array_1.reshape(500, 1000)
Out[53]:
In [54]:
np_array_1.reshape(100, 1000, 5)
Out[54]:
[Figure: 3D Array]
Zhiang
By default, reshape fills the new shape in row-major order: the last dimension varies fastest and the first dimension slowest.
np_array_1.reshape(100, 1000, 5)
Conceptually, the array is first flattened into a 1-D array. Each run of 5 consecutive elements then forms a subarray, giving a 2-D array, and each group of 1000 of those 5-element subarrays forms one of the 100 blocks of the 3-D array. The total number of elements must be the same before and after reshaping.
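The fill order described above is easier to see on a small array. The sketch below uses 24 elements and a (2, 3, 4) shape, both chosen only to keep the output readable.

```python
import numpy as np

flat = np.arange(24)
cube = flat.reshape(2, 3, 4)

# The last axis varies fastest: each innermost row holds 4 consecutive values.
print(cube[0, 0])  # [0 1 2 3]
print(cube[0, 1])  # [4 5 6 7]
# Moving one step along the first axis skips 3 * 4 = 12 elements.
print(cube[1, 0])  # [12 13 14 15]

# The element count must match: 24 elements cannot fill a 5 x 5 array.
try:
    flat.reshape(5, 5)
    size_error = False
except ValueError:
    size_error = True
print(size_error)  # True
```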
In [56]:
np.array(range(10))
Out[56]:
In [57]:
np.zeros((10,), dtype=int)
Out[57]:
In [58]:
np.ones((10,))
Out[58]:
In [59]:
np.arange(4, 14)
Out[59]:
In [60]:
np.linspace(1, 10)
Out[60]:
In [61]:
np.linspace(1, 10, 37)
Out[61]:
In [62]:
np.logspace(1, 5, 5)
Out[62]:
In [63]:
np.logspace(-1, 5, 7)
Out[63]:
In [64]:
np.logspace(-1, 5, 7, base=2)
Out[64]:
In [65]:
import matplotlib.pyplot as plt
%matplotlib inline
In [66]:
randim_image = np.random.rand(50,50,3)
plt.imshow(randim_image, interpolation="nearest")
Out[66]:
In [67]:
randim_image[:,:,0] = 0
plt.imshow(randim_image, interpolation="nearest")
Out[67]:
In [68]:
randim_image[:,:,2] /= 3
plt.imshow(randim_image, interpolation="nearest")
Out[68]:
In [69]:
randim_image[::2,:,1] = 1
plt.imshow(randim_image, interpolation="nearest")
Out[69]:
In NumPy arrays, dimensionality refers to the number of axes needed to index the array, not the dimensionality of any geometrical space. For example, you can describe the locations of points in 3D space with a 2D array:
array([[0, 0, 0],
       [1, 2, 3],
       [2, 2, 2],
       [9, 9, 9]])
This array has a shape of (4, 3) and 2 dimensions. It can still describe 3D space because the length of each row (axis 1) is three, so each row can hold the x, y, and z components of a point's location. The length of axis 0 indicates the number of points (here, 4). However, that is an application of the math the code is describing, not an attribute of the array itself. In mathematics, the dimension of a vector is its length (e.g., the x, y, and z components of a 3D vector), but in NumPy any "vector" is just a 1D array of some length. The array doesn't care what the dimension of the space being described (if any) is.
numpy.arange(120).reshape((2, 3, 4, 5))
Axis 0 has 2 elements, axis 1 has 3 elements, axis 2 has 4 elements, and axis 3 has 5 elements.
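The distinction between the number of axes and the length of each axis can be checked with the ndim and shape attributes. The sketch below reuses the points array from the text; the names points, vector and four_d are illustrative.

```python
import numpy as np

# Four points in 3D space, stored in a 2D array.
points = np.array([[0, 0, 0],
                   [1, 2, 3],
                   [2, 2, 2],
                   [9, 9, 9]])
print(points.ndim)   # 2: two axes are needed to index an element
print(points.shape)  # (4, 3): 4 points, 3 components each

# A 3-component "vector" is a 1D array of length 3.
vector = np.array([1.0, 2.0, 3.0])
print(vector.ndim)   # 1
print(vector.shape)  # (3,)

# An array with four axes of lengths 2, 3, 4 and 5.
four_d = np.zeros((2, 3, 4, 5))
print(four_d.ndim)      # 4
print(four_d.shape[2])  # 4: the length of axis 2
```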