3-2 NumPy Array Basics - Helpful Methods, Shortcuts, NaN Values, and Dtypes


NumPy Array Basics - Helpful Methods and Shortcuts


In [2]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)


3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2

Now we’ve got our array, and we’ve got a method or two that we can use to create one. Let’s learn a bit more about some of the methods and attributes that arrays give us.


In [3]:
npa = np.arange(25)

In [4]:
npa


Out[4]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [6]:
print(type(npa))


<class 'numpy.ndarray'>

First, let’s check out the type we’ve got. You can see it’s an ndarray, which is the fundamental array type in NumPy. We can also get the data type of the array’s elements, and it’s worth elaborating a bit on dtypes, or data types.

Unlike Python, which has a single int type and a single float type, NumPy has a variety of fixed-size numeric types. I’ll paste the list here for reference.


In [7]:
print(npa.dtype)


int64

Data type Description

  • bool_
    • Boolean (True or False) stored as a byte
  • int_
    • Default integer type (same as C long; normally either int64 or int32)
  • intc
    • Identical to C int (normally int32 or int64)
  • intp
    • Integer used for indexing (same as C ssize_t; normally either int32 or int64)
  • int8
    • Byte (-128 to 127)
  • int16
    • Integer (-32768 to 32767)
  • int32
    • Integer (-2147483648 to 2147483647)
  • int64
    • Integer (-9223372036854775808 to 9223372036854775807)
  • uint8
    • Unsigned integer (0 to 255)
  • uint16
    • Unsigned integer (0 to 65535)
  • uint32
    • Unsigned integer (0 to 4294967295)
  • uint64
    • Unsigned integer (0 to 18446744073709551615)
  • float_
    • Shorthand for float64.
  • float16
    • Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
  • float32
    • Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
  • float64
    • Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
  • complex_
    • Shorthand for complex128.
  • complex64
    • Complex number, represented by two 32-bit floats (real and imaginary components)
  • complex128
    • Complex number, represented by two 64-bit floats (real and imaginary components)
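
To make the ranges in this list concrete, here is a small sketch that asks NumPy for the limits directly. It uses np.iinfo and np.finfo, which are not part of the original walkthrough, and the variable names are only for illustration.

```python
import numpy as np

# Integer limits: np.iinfo reports the smallest and largest representable values.
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)   # -128 127
print(uint16_info.max)                # 65535

# Float layout: np.finfo reports total bits, exponent bits, and mantissa bits.
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.nexp, info.nmant)
```

The exponent and mantissa counts printed by the loop should line up with the float16, float32, and float64 entries in the list.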

These types differ in range, precision, and memory footprint. Let’s try creating some NumPy arrays with a few of them.


In [8]:
np.array(range(20))


Out[8]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

It’s worth stating that NumPy arrays can be created on the fly simply by passing a list (or another sequence, such as a range) to the “array” function.

By default, NumPy will try to guess the type from the values you pass in, but you can also specify the type you would like on creation. All elements of a NumPy array have to share the same type; you can’t have a mix of floats and integers, for example. If you pass in a mix, NumPy upcasts everything to a type that can hold all of the values.


In [9]:
np.array([1.0,0,2,3])


Out[9]:
array([ 1.,  0.,  2.,  3.])

In [10]:
np.array([1.0,0,2,3]).dtype


Out[10]:
dtype('float64')

In [11]:
np.array([True,2,2.0]).dtype


Out[11]:
dtype('float64')

This allows for consistent memory allocation and vectorization.
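
As a rough illustration of what a single shared type buys us, the sketch below looks at how many bytes each element takes up. The itemsize and nbytes attributes and the np.result_type function are not covered in the original text, and the variable names are made up for this example.

```python
import numpy as np

mixed = np.array([True, 2, 2.0])       # bool, int, and float get upcast together
print(mixed.dtype)                     # float64
print(mixed.itemsize, mixed.nbytes)    # 8 bytes per element, 24 bytes in total

# np.result_type reports the common type NumPy would pick, without building an array.
common = np.result_type(np.bool_, np.int8, np.float32)
print(common)                          # float32
```

Because every element has the same size, NumPy can store the values in one contiguous block and apply an operation across all of them at once.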

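Sometimes this upcasting gives you types you weren't quite expecting. For example, True is equal to 1 in Python, so a boolean mixed in with numbers is quietly converted to a number.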


In [12]:
True == 1


Out[12]:
True

In [13]:
np.array([True,2,2.0])


Out[13]:
array([ 1.,  2.,  2.])

If you’re ever unsure about what kind of array you’re creating, just specify the type explicitly.


In [14]:
np.array([True, 1, 2.0], dtype='bool_')


Out[14]:
array([ True,  True,  True], dtype=bool)

In [15]:
np.array([True, 1, 2.0], dtype='float_')


Out[15]:
array([ 1.,  1.,  2.])

In [16]:
np.array([True, 1, 2.0], dtype='uint8')


Out[16]:
array([1, 1, 2], dtype=uint8)

In [17]:
npa


Out[17]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

NumPy arrays also have some nifty attributes and methods that make our lives a lot easier. We can get the size, which is the total number of elements; for a one-dimensional array like this one, that is the same as len in plain Python. We can also get the shape, which gives us a hint as to where we’ll be going next: multidimensional arrays.


In [18]:
npa.size


Out[18]:
25

In [19]:
npa.shape


Out[19]:
(25,)
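
Here is a small sketch of how size, len, and shape relate once an array has more than one dimension. It uses reshape, which has not been introduced yet, purely to get a two-dimensional array to look at; the names flat and grid are just for illustration.

```python
import numpy as np

flat = np.arange(6)
grid = flat.reshape(2, 3)   # same six values, arranged as 2 rows by 3 columns

print(flat.size, len(flat), flat.shape)   # 6 6 (6,)
print(grid.size, len(grid), grid.shape)   # 6 2 (2, 3)
```

For the two-dimensional array, len only counts the rows, while size still counts every element.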

We can also get things like the maximum, the minimum, the mean, the standard deviation, and variance. This makes it super easy to perform calculations quickly.


In [22]:
print(npa.min(), npa.max())


0 24

In [23]:
npa.std()


Out[23]:
7.2111025509279782

In [24]:
npa.var()


Out[24]:
52.0
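
As a quick sanity check on these numbers, the sketch below computes the mean and confirms that the variance is the standard deviation squared. The ddof argument shown at the end is an addition that the original text does not cover.

```python
import numpy as np

npa = np.arange(25)

print(npa.mean())                              # 12.0
print(np.isclose(npa.std() ** 2, npa.var()))   # True

# ddof=1 divides by n - 1 instead of n, which gives the sample variance.
sample_var = npa.var(ddof=1)
print(sample_var)
```

By default, std and var divide by n, which is why var returns exactly 52.0 here.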

We can also get the positions (indices) of the minimum and maximum values with argmin and argmax.


In [25]:
npa.argmin()


Out[25]:
0

In [26]:
npa.argmax()


Out[26]:
24
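
With npa the index and the value happen to be the same, so here is a sketch with a made-up array where they differ. The values and names are assumptions for illustration only.

```python
import numpy as np

scores = np.array([3, 9, 1, 9, 4])   # made-up values for illustration

low_index = scores.argmin()
high_index = scores.argmax()          # ties resolve to the first occurrence

print(low_index, scores[low_index])     # 2 1
print(high_index, scores[high_index])   # 1 9
```

The result of argmin or argmax can be used directly as an index to pull out the value itself.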

In [27]:
np.arange(1,10)


Out[27]:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
np.arange(10,1,-1)


Out[28]:
array([10,  9,  8,  7,  6,  5,  4,  3,  2])

Now that we’ve seen how we can play around with arrays and some of their helper methods, we can elaborate on the different ways of creating them. We’ve seen arange, but we can actually do some cool things with it: we can count up or down. We can also fill in evenly spaced values between two numbers with linspace. Say we want 1 to 2 broken up into 5 numbers.


In [29]:
np.linspace(1,2,5)


Out[29]:
array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])

In [30]:
np.linspace(0,10,11)


Out[30]:
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
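
The main difference between the two is that arange takes a step and leaves out the stop value, while linspace takes a count and includes it. The sketch below puts them side by side; the retstep and endpoint arguments are additions that the original text does not use.

```python
import numpy as np

# arange takes a step and excludes the stop value.
stepped = np.arange(1, 2, 0.25)
print(stepped)

# linspace takes a count and includes the stop value by default.
spaced, step = np.linspace(1, 2, 5, retstep=True)
print(spaced, step)

# endpoint=False makes linspace exclude the stop value as well.
open_ended = np.linspace(1, 2, 4, endpoint=False)
print(np.allclose(stepped, open_ended))   # True
```

When the number of points matters more than the exact spacing, linspace is usually the more convenient choice.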

What about converting that to integers? We can use the astype method to convert it.

We can convert it to an integer array:


In [31]:
np.linspace(0,10,11).astype('int')


Out[31]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

We can also convert it to a 32-bit float or any other type:


In [32]:
np.linspace(0,10,11).astype(np.float32)


Out[32]:
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.], dtype=float32)

In [33]:
np.float32(5)


Out[33]:
5.0
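
One thing to watch for with astype: converting floats to integers drops the fractional part rather than rounding. The sketch below shows the difference using some made-up values; np.rint is an addition that the original text does not mention.

```python
import numpy as np

values = np.array([0.9, 1.5, -1.7])

truncated = values.astype(int)           # drops the fractional part (toward zero)
print(truncated)                         # [ 0  1 -1]

rounded = np.rint(values).astype(int)    # round to the nearest integer first, then convert
print(rounded)                           # [ 1  2 -2]
```

The linspace example converts cleanly only because its values are already whole numbers.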

Now there is one special value that hasn’t been covered yet and is a bit difficult to describe: nan, or “not a number”. It will certainly come up, and it can throw off your analysis. A nan is technically a float, and one way to create it is to divide a floating-point zero by a floating-point zero.


In [44]:
problem = np.array([0.])/np.array([0.])
# NumPy emits a RuntimeWarning (not an error) and stores nan as the result


/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/IPython/kernel/__main__.py:1: RuntimeWarning: invalid value encountered in true_divide
  if __name__ == '__main__':

In [35]:
problem[0]


Out[35]:
nan
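
If the division is intentional and the warning is just noise, one option is to silence it for that block only. This is a minimal sketch using np.errstate, which the original text does not cover; the variable names are hypothetical.

```python
import numpy as np

zero = np.array([0.])
one = np.array([1.])

# errstate changes the warning behaviour inside the with block only.
with np.errstate(divide='ignore', invalid='ignore'):
    undefined = zero / zero   # 0/0 has no meaningful value, so the result is nan
    infinite = one / zero     # 1/0 gives inf rather than nan

print(undefined, infinite)
print(np.isnan(undefined[0]), np.isinf(infinite[0]))   # True True
```

Outside the with block, NumPy goes back to warning as usual.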

In [36]:
np.nan


Out[36]:
nan

In [37]:
ar = np.linspace(0,10,11).astype(np.float32)
ar


Out[37]:
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.], dtype=float32)

In [38]:
ar[0] = np.nan

In [39]:
ar


Out[39]:
array([ nan,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.], dtype=float32)

If you’ve got a nan value in your array, it will throw off your calculations: min, max, and mean all come back as nan. NumPy won’t skip it for you, so you’ve got to check for it explicitly.


In [40]:
ar.min()


Out[40]:
nan

In [41]:
ar.max()


Out[41]:
nan

In [42]:
ar.mean()


Out[42]:
nan

We’re going to build on this knowledge but at this point all that we need to know is that

  • all the values in an array need to have the same type and
  • nan values affect statistics derived from our arrays like the mean

In [43]:
np.isnan(ar.max())


Out[43]:
True
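
Checking explicitly can look something like the sketch below. It rebuilds the same array and shows two ways of working around the nan: masking it out with np.isnan, or using the nan-aware functions np.nanmin, np.nanmax, and np.nanmean, which are additions not covered in the original text.

```python
import numpy as np

ar = np.linspace(0, 10, 11).astype(np.float32)
ar[0] = np.nan

print(np.nan == np.nan)          # False: nan never compares equal, so use np.isnan
mask = np.isnan(ar)              # True wherever the value is nan
print(mask.sum())                # 1

# Option 1: drop the nan values before computing.
print(ar[~mask].mean())          # 5.5
# Option 2: use the nan-aware variants, which skip nan for us.
print(np.nanmin(ar), np.nanmax(ar), np.nanmean(ar))   # 1.0 10.0 5.5
```

Which option fits best depends on whether you want to keep the nan values in the array for later or get rid of them for good.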

As I said, we’ll dive into this in more detail in the future. I just want you to be aware that there is a representation of invalid or missing float values in NumPy and pandas. NumPy doesn’t handle them unless you’re explicit about it, while pandas does some of the work for you.

Now that we understand more about array creation, some array properties, and data types, let’s dive a bit more into vectorization.

