In [2]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)
Now we’ve got our array and a method or two for creating arrays. Let’s learn a bit more about the attributes and methods that arrays give us.
In [3]:
npa = np.arange(25)
In [4]:
npa
Out[4]:
In [6]:
print(type(npa))
First, let’s check out the type we’ve got: it’s an ndarray, the fundamental array type in numpy. We can also get the data type of the array, and it’s worth elaborating a bit on dtypes, or data types.
Unlike Python, which has a single built-in integer type and a single float type, numpy has a variety of fixed-size numeric types. I’ll copy and paste the list here and speak to the different types.
In [7]:
print(npa.dtype)
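| Data type | Description |
|---|---|
| bool_ | Boolean (True or False) stored as a byte |
| int_ | Default integer type (normally int64 or int32, depending on the platform) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2147483648 to 2147483647) |
| int64 | Integer (-9223372036854775808 to 9223372036854775807) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 4294967295) |
| uint64 | Unsigned integer (0 to 18446744073709551615) |
| float_ | Shorthand for float64 (removed in NumPy 2.0; use float64) |
| float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| complex64 | Complex number, represented by two 32-bit floats |
| complex128 | Complex number, represented by two 64-bit floats |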
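You don’t have to memorize the ranges and sizes of these types; numpy can report them. The short sketch below queries a few of them with `np.iinfo`, `np.finfo`, and `np.dtype(...).itemsize`; the particular types picked are just examples.

```python
import numpy as np

# Integer types: iinfo reports the representable range
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255

# Float types: finfo reports the storage size and precision
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    # bits = total storage size; eps = gap between 1.0 and the next representable value
    print(t.__name__, info.bits, info.eps)

# itemsize is the number of bytes each element occupies
print(np.dtype(np.int16).itemsize, np.dtype(np.complex128).itemsize)  # 2 16
```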
These types differ in size, range, and precision. Let’s try creating some numpy arrays with a few of them.
In [8]:
np.array(range(20))
Out[8]:
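It’s worth stating that numpy arrays can be created on the fly simply by passing a list (or another sequence, such as the range above) into the array function.
By default, numpy infers the type from the values you pass in, but you can also specify which type you would like at creation. All elements of a numpy array have to share the same type; you can’t have a mix of floats and integers, for example. If you pass in a mix, numpy converts everything to a common type that can hold all of the values, which is why the next array ends up as floats.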
In [9]:
np.array([1.0,0,2,3])
Out[9]:
In [10]:
np.array([1.0,0,2,3]).dtype
Out[10]:
In [11]:
np.array([True,2,2.0]).dtype
Out[11]:
This single-type rule gives every element the same fixed size in memory, which keeps memory allocation predictable and makes vectorization possible.
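To make the memory point concrete, here is a small sketch using `itemsize` (bytes per element) and `nbytes` (total bytes). The variable names `mixed`, `a32`, and `a8` are just illustrative.

```python
import numpy as np

# Mixing ints and floats: numpy picks one common dtype for every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64: the integers were upcast

# Every element occupies the same number of bytes, so total memory is predictable
a32 = np.arange(25, dtype=np.int32)
print(a32.itemsize, a32.nbytes)  # 4 100  (25 elements * 4 bytes)

# The same values stored as 8-bit integers use a quarter of the memory
a8 = a32.astype(np.int8)
print(a8.itemsize, a8.nbytes)  # 1 25
```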
Sometimes you can get strange types when you're not quite expecting them.
In [12]:
True == 1
Out[12]:
In [13]:
np.array([True,2,2.0])
Out[13]:
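The reason for this is that in Python, `bool` is a subclass of `int`, so `True` behaves like `1`. When it sits next to an int and a float, numpy upcasts all three to float64. A minimal sketch:

```python
import numpy as np

# In Python, bool is a subclass of int, so True behaves like 1
print(isinstance(True, int))  # True
print(True + True)            # 2

# numpy upcasts True along with the ints to the common type, float64
arr = np.array([True, 2, 2.0])
print(arr)        # [1. 2. 2.]
print(arr.dtype)  # float64
```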
If you’re unsure what type numpy will infer, just specify it with the dtype argument.
In [14]:
np.array([True, 1, 2.0], dtype=np.bool_)
Out[14]:
In [15]:
np.array([True, 1, 2.0], dtype=np.float64)  # 'float_' was an alias for float64 and is gone in NumPy 2.0
Out[15]:
In [16]:
np.array([True, 1, 2.0], dtype='uint8')
Out[16]:
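Here’s a sketch of what those three conversions produce, plus one consequence of picking a small type: uint8 only holds 0 to 255, and array arithmetic that goes past that wraps around silently rather than raising. The variable names are illustrative.

```python
import numpy as np

values = [True, 1, 2.0]

as_bool = np.array(values, dtype=np.bool_)     # any nonzero value becomes True
as_float = np.array(values, dtype=np.float64)  # True becomes 1.0
as_uint8 = np.array(values, dtype=np.uint8)    # unsigned 8-bit ints: 0..255

print(as_bool)   # all True
print(as_float)  # [1. 1. 2.]
print(as_uint8)  # [1 1 2]

# A small type has a small range: uint8 array arithmetic wraps around silently
x = np.array([250], dtype=np.uint8)
y = np.array([10], dtype=np.uint8)
print(x + y)  # [4]  because (250 + 10) % 256 == 4
```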
In [17]:
npa
Out[17]:
Numpy arrays also have some nifty attributes that make our lives a lot easier. We can get the size, which for a one-dimensional array like this one is equivalent to len in plain Python. We can also get the shape, which gives us a hint as to where we’ll be going next: multidimensional arrays.
In [18]:
npa.size
Out[18]:
In [19]:
npa.shape
Out[19]:
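The equivalence between size and len only holds in one dimension. A quick sketch, reshaping the same 25 values into a 5 x 5 grid (the name `grid` is just for illustration), shows where they diverge:

```python
import numpy as np

npa = np.arange(25)
print(npa.size, len(npa), npa.shape)  # 25 25 (25,)

# Reshape the same 25 values into a 5 x 5 grid
grid = npa.reshape(5, 5)
print(grid.size)   # 25: total number of elements
print(len(grid))   # 5: length of the first axis only
print(grid.shape)  # (5, 5)
print(grid.ndim)   # 2
```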
We can also get the minimum, the maximum, the mean, the standard deviation, and the variance, each with a single method call. This makes it easy to compute summary statistics quickly.
In [22]:
print(npa.min(), npa.max())
In [23]:
npa.std()
Out[23]:
In [24]:
npa.var()
Out[24]:
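For reference, here are those statistics for `np.arange(25)`. One detail worth knowing: `var` and `std` divide by n by default (population statistics, `ddof=0`); pass `ddof=1` if you want the sample versions that divide by n - 1.

```python
import numpy as np

npa = np.arange(25)

print(npa.min(), npa.max())  # 0 24
print(npa.mean())            # 12.0
print(npa.var())             # 52.0 (population variance, ddof=0 by default)
print(npa.std())             # square root of 52.0, about 7.21

# ddof=1 divides by n - 1 instead of n, giving the sample variance
print(npa.var(ddof=1))       # about 54.17
```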
We can also get the positions (indices) of the minimum and maximum values with argmin and argmax.
In [25]:
npa.argmin()
Out[25]:
In [26]:
npa.argmax()
Out[26]:
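Since `npa` is just 0 to 24, argmin and argmax line up with the values themselves. A less trivial sketch (the `scores` array is made up) shows that they return an index, and that with ties the first occurrence wins:

```python
import numpy as np

npa = np.arange(25)
print(npa.argmin(), npa.argmax())  # 0 24

# argmin/argmax return an index, not a value; with ties, the first occurrence wins
scores = np.array([3, 7, 1, 7, 1])
i = scores.argmax()
print(i, scores[i])     # 1 7
print(scores.argmin())  # 2
```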
In [27]:
np.arange(1,10)
Out[27]:
In [28]:
np.arange(10,1,-1)
Out[28]:
Now that we’ve seen how to play around with arrays and some of their helper methods, we can look at more ways of creating them. We’ve seen arange, but it can do more than count up from zero: as the cells above show, it can count up or down between a start and a stop. We can also fill in evenly spaced values between two numbers with linspace. Say we want 1 to 2 broken up into 5 numbers.
In [29]:
np.linspace(1,2,5)
Out[29]:
In [30]:
np.linspace(0,10,11)
Out[30]:
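One difference between these two functions trips people up: arange excludes its stop value, while linspace includes it by default. This sketch puts the calls above side by side and adds `endpoint=False` for comparison:

```python
import numpy as np

# arange(start, stop, step): stop is excluded
up = np.arange(1, 10)        # 1, 2, ..., 9
down = np.arange(10, 1, -1)  # 10, 9, ..., 2

# linspace(start, stop, num): stop is included by default
five = np.linspace(1, 2, 5)                    # 1.0, 1.25, 1.5, 1.75, 2.0
no_end = np.linspace(1, 2, 5, endpoint=False)  # 1.0, 1.2, 1.4, 1.6, 1.8

print(up, down, five, no_end, sep="\n")
```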
What about converting that to integers? We can use the astype method to convert it to an integer:
In [31]:
np.linspace(0,10,11).astype('int')
Out[31]:
or to a 32-bit float, or any other type:
In [32]:
np.linspace(0,10,11).astype(np.float32)
Out[32]:
In [33]:
np.float32(5)
Out[33]:
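The linspace values above are already whole numbers, so converting them to int loses nothing. With fractional values, note that `astype(int)` truncates toward zero rather than rounding. A sketch with made-up values:

```python
import numpy as np

vals = np.array([1.7, -1.7, 2.5])

# astype(int) truncates toward zero; it does not round
print(vals.astype(int))  # 1, -1, 2

# To round to the nearest integer first, use np.rint (ties go to the even number)
print(np.rint(vals).astype(int))  # 2, -2, 2

# astype returns a new array; the original keeps its dtype
print(vals.dtype)  # float64
```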
There is one more thing that hasn’t been listed yet and is a bit difficult to describe: nan, or “not a number”. This will certainly come up, and it can throw off your analysis. Strictly speaking, nan isn’t a separate type but a special floating-point value, and it’s produced, for example, when you divide a floating-point 0 by another floating-point 0.
In [44]:
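problem = np.array([0.])/np.array([0.])
# numpy doesn't raise an error here: it emits a RuntimeWarning about an invalid value and returns nan
# (plain Python's 0.0 / 0.0, by contrast, raises ZeroDivisionError)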
In [35]:
problem[0]
Out[35]:
In [36]:
np.nan
Out[36]:
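A few properties of nan are worth seeing in code. This sketch silences the division warning with `np.errstate`, shows that nan is a float that isn’t equal even to itself (which is why you test for it with `np.isnan` rather than `==`), and contrasts numpy with plain Python:

```python
import numpy as np

# Suppress the RuntimeWarning numpy emits for 0/0 in float arrays
with np.errstate(invalid="ignore"):
    problem = np.array([0.0]) / np.array([0.0])
print(problem)  # [nan]

# nan is a float, and it is not equal to anything, including itself
print(type(np.nan))          # <class 'float'>
print(np.nan == np.nan)      # False
print(np.isnan(problem[0]))  # True

# Plain Python raises instead of returning nan
try:
    _ = 0.0 / 0.0
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```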
In [37]:
ar = np.linspace(0,10,11).astype(np.float32)
ar
Out[37]:
In [38]:
ar[0] = np.nan
In [39]:
ar
Out[39]:
If you’ve got a nan value in your array, it will throw off your calculations: min, max, and mean all come back as nan. In numpy you’ve got to check for it explicitly.
In [40]:
ar.min()
Out[40]:
In [41]:
ar.max()
Out[41]:
In [42]:
ar.mean()
Out[42]:
We’re going to build on this knowledge, but at this point all that we need to know is that np.isnan lets us check for nan explicitly:
In [43]:
np.isnan(ar.max())
Out[43]:
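Beyond checking a single result, numpy also has nan-aware reductions (`np.nanmin`, `np.nanmax`, `np.nanmean`, and friends) that skip nan values. A sketch rebuilding the `ar` array from above:

```python
import numpy as np

ar = np.linspace(0, 10, 11).astype(np.float32)
ar[0] = np.nan

# Ordinary reductions propagate nan
print(ar.min(), ar.max(), ar.mean())  # nan nan nan

# Check for any nan before trusting a result
print(np.isnan(ar).any())  # True
print(np.isnan(ar).sum())  # 1

# nan-aware versions skip nan values
print(np.nanmin(ar), np.nanmax(ar), np.nanmean(ar))  # 1.0 10.0 5.5
```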
As I said, we’ll dive into this in more detail in the future. For now, I just want you to be aware that numpy and pandas have a representation for undefined floating-point results. In numpy you have to handle them explicitly, while pandas does some of the work for you (for example, its summary methods skip missing values by default).
Now that we understand more about array creation, some array properties, and data types, let’s dive a bit more into vectorization.