4 NumPy and Pandas

Introduction

After getting used to how Python works, it is the moment to begin getting our hands dirty with data analysis. We will study two packages: NumPy is the fundamental numeric computing and linear algebra package in Python, that allows for decent data analysis. We will learn it not only for the data analysis, but more importantly because it will be a package that will be always present in our import section as scientists. After NumPy we will go to Pandas. Pandas is a dedicated data analysis package with a lot more functionalities than NumPy, making our life much easier in terms of data visualization and manipulation. All the power of Pandas will be completely unleashed in Section 5, where we will see how to visualize information in plots.

As usual, we begin importing the necessary packages



In [1]:

    
import numpy as np
import pandas as pd
print('NumPy:', np.__version__)
print('Pandas:', pd.__version__)









    



NumPy: 1.13.3
Pandas: 0.22.0

Numeric Python (NumPy)

NumPy is an open-source add-on module to Python that provides common mathematical and numerical routines in pre-compiled, fast functions. These are growing into highly mature packages that provide functionality that meets, or perhaps exceeds, that associated with common commercial software like MATLAB. The NumPy (Numeric Python) package provides basic routines for manipulating large arrays and matrices of numeric data. The main object NumPy works with is a homogeneous multidimensional array. Despite its intimidating name, these are nothing but tables of numbers, each labelled by a tuple of indices.

We will now explore some capabilities of NumPy that will prove very useful not only for data analyisis, but throughout all our life with Python.

Creating Arrays

As mentioned before, the main object in NumPy is the array. Creating one is as easy as calling the command array



In [2]:

    
mylist = [1, 2, 3]
x = np.array(mylist)
x









    Out[2]:





array([1, 2, 3])



In [3]:

    
type(x)









    Out[3]:





numpy.ndarray

The same applies to multidimensional arrays



In [4]:

    
m = np.array([[[7, 8, 9], [10, 11, 12]], [[1, 2, 3], [4, 5, 6]]])
m









    Out[4]:





array([[[ 7,  8,  9],
        [10, 11, 12]],

       [[ 1,  2,  3],
        [ 4,  5,  6]]])

There is one restriction with respect to the use of lists: while you could create lists with data of different type, all the data in an array has to be of the same type, and it will be converted automatically.



In [5]:

    
lst = [1., 'cat']
print(type(lst[0]))

arr = np.array(lst)
print(type(arr[0]))









    



<class 'float'>
<class 'numpy.str_'>

(We will go deeper into indexing in a while)

A NumPy array has a number of dimensions (or axes). To obtain the number of axes and the size of each of them you use the command shape. For 2-dimensional arrays (matrices), the order corresponds to (rows, columns)

There are two different ways of calling the shape command, either with np.shape(arr) or arr.shape. This is not the only command that works in both formats, and we will be finding some more in our way.



In [6]:

    
print(x.shape)
print(np.shape(m))









    



(3,)
(2, 2, 3)

Special Arrays

Now we review some built-in functions that create matrices commonly used. ones and zeros return arrays of given shape and type (default is float64), filled with ones or zeros, respectively.



In [7]:

    
array_zeros=np.zeros((3, 2))
array_ones=np.ones((3, 2, 4), dtype=np.int8)



In [8]:

    
array_zeros









    Out[8]:





array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])



In [9]:

    
array_ones









    Out[9]:





array([[[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int8)

eye(d) returns a 2-D, dimension-$d$ array with ones on the diagonal and zeros elsewhere.



In [10]:

    
np.eye(3)









    Out[10]:





array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

eye can also create arrays with ones in upper and lower diagonals. To achieve this, you must call eye(d, d, k) where $k$ denotes the diagonal (positive for above the center diagonal, negative for below), or eye(d, k=num)



In [11]:

    
print(np.eye(5, 5, 2))
np.eye(5, k=-1)









    



[[ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]






    Out[11]:





array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.]])

diag, depending on the input, either extracts a diagonal from a matrix (if the input is a 2-D array), or constructs a diagonal array (if the input is a vector).



In [12]:

    
np.diag(x, 1)









    Out[12]:





array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])



In [13]:

    
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(np.diag(y))
np.diag(np.diag(y))









    



[ 1  6 11 16]






    Out[13]:





array([[ 1,  0,  0,  0],
       [ 0,  6,  0,  0],
       [ 0,  0, 11,  0],
       [ 0,  0,  0, 16]])

arange(begin, end, step) returns evenly spaced values within a given interval. Note that the beginning point is included, but not the ending.



In [14]:

    
n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n









    Out[14]:





array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])



In [15]:

    
len(n)









    Out[15]:





15

Exercise 1: Create an array of the first million of odd numbers, both with arange and using loops. Try timing both methods to see which one is faster. For that, use %timeit.



In [16]:

    
%timeit np.arange(0, 2e6, 2)

%timeit [i for i in range(2000000) if i % 2 == 0]









    



966 µs ± 55.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
207 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Similarly, linspace(begin, end, points) returns evenly spaced numbers over a specified interval. Here, instead of specifying the step, you specify the amount of points you want. Also with linspace you include the ending of the interval.



In [17]:

    
o = np.linspace(0, 30, 15)
o









    Out[17]:





array([  0.        ,   2.14285714,   4.28571429,   6.42857143,
         8.57142857,  10.71428571,  12.85714286,  15.        ,
        17.14285714,  19.28571429,  21.42857143,  23.57142857,
        25.71428571,  27.85714286,  30.        ])



In [18]:

    
len(o)









    Out[18]:





15

reshape changes the shape of an array, but not its data. This is another of the commands that can be called before or after the array.



In [19]:

    
print(n.reshape(3, 5))
np.reshape(n, (5, 3))









    



[[ 0  2  4  6  8]
 [10 12 14 16 18]
 [20 22 24 26 28]]






    Out[19]:





array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22],
       [24, 26, 28]])

Note however that, in order for these changes to be permanent, you should do a reassignment of the variable



In [20]:

    
print(n)  # After the reshapings above, the original array stays being the same

n = n.reshape(3, 5)
n   # Now that we have reassigned it is when it definitely changes shape









    



[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]






    Out[20]:





array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

Combining Arrays

The most general command for combining arrays is concatenate(arrs, d). It takes a list of arrays and concatenates them along axis $d$



In [21]:

    
p = np.ones([2, 2, 2])
p









    Out[21]:





array([[[ 1.,  1.],
        [ 1.,  1.]],

       [[ 1.,  1.],
        [ 1.,  1.]]])



In [22]:

    
np.concatenate([p, 2 * p], 0)









    Out[22]:





array([[[ 1.,  1.],
        [ 1.,  1.]],

       [[ 1.,  1.],
        [ 1.,  1.]],

       [[ 2.,  2.],
        [ 2.,  2.]],

       [[ 2.,  2.],
        [ 2.,  2.]]])



In [23]:

    
np.concatenate([p, 2 * p], 1)









    Out[23]:





array([[[ 1.,  1.],
        [ 1.,  1.],
        [ 2.,  2.],
        [ 2.,  2.]],

       [[ 1.,  1.],
        [ 1.,  1.],
        [ 2.,  2.],
        [ 2.,  2.]]])



In [24]:

    
np.concatenate([p, 2 * p], 2)









    Out[24]:





array([[[ 1.,  1.,  2.,  2.],
        [ 1.,  1.,  2.,  2.]],

       [[ 1.,  1.,  2.,  2.],
        [ 1.,  1.,  2.,  2.]]])

However, for common combinations there exist special commands. Use vstack to stack arrays in sequence vertically (row wise), hstack to stack arrays in sequence horizontally (column wise), and block to create arrays out of blocks (only available in versions 1.13.0+)



In [25]:

    
q = np.ones((2, 2))
np.vstack([q, 2 * q])









    Out[25]:





array([[ 1.,  1.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 2.,  2.]])



In [26]:

    
np.hstack([q, 2 * q])









    Out[26]:





array([[ 1.,  1.,  2.,  2.],
       [ 1.,  1.,  2.,  2.]])



In [27]:

    
np.block([[q, np.zeros((2, 2))], [np.zeros((2, 2)), 2 * q]])









    Out[27]:





array([[ 1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  2.],
       [ 0.,  0.,  2.,  2.]])

Operations

You can perform easily element-wise operations on arrays of any shape. Use the typical symbols, +, -, *, / and ** to perform element-wise addition, subtraction, multiplication, division and power.



In [28]:

    
x = np.array([1, 2, 3])
print(x)
print(x + 10)
print(3 * x)
print(1 / x)
print(x ** (-2 / 3))
print(2 ** x)









    



[1 2 3]
[11 12 13]
[3 6 9]
[ 1.          0.5         0.33333333]
[ 1.          0.62996052  0.48074986]
[2 4 8]

Also (and obviously) these symbols can be used to operate between two arrays, which must be of the same shape. If this is the case, they also do element-wise operations



In [29]:

    
y = np.arange(4, 7, 1)
print(x + y)     # [1+4, 2+5, 3+6]
print(x * y)     # [1*4, 2*5, 3*6]
print(x / y)     # [1/4, 2/5, 3/6]
print(x ** y)    # [1**4, 2**5, 3**6]









    



[5 7 9]
[ 4 10 18]
[ 0.25  0.4   0.5 ]
[  1  32 729]

For doing vector or matrix multiplication, the command to be used is dot



In [30]:

    
x.dot(y) # 1*4 + 2*5 + 3*6









    Out[30]:





32

With python 3.5 matrix multiplication got it's own operator



In [31]:

    
x@y









    Out[31]:





32



In [32]:

    
X = np.array([[i + j for i in range(3, 6)] for j in range(3)])
Y = np.diag([1, 1], 1) + np.diag([1], -2)

print('{}\n'.format(X))
print(X * Y)
np.dot(X, Y)









    



[[3 4 5]
 [4 5 6]
 [5 6 7]]

[[0 4 0]
 [0 0 6]
 [5 0 0]]






    Out[32]:





array([[5, 3, 4],
       [6, 4, 5],
       [7, 5, 6]])

Exercise 2: Take a 10x2 matrix representing $(x1,x2)$ coordinates and transform them into polar coordinates $(r,\theta)$.

Hint 1: the inverse transformation is given by $x1 = r\cos\theta$, $x2 = r\sin\theta$

Hint 2: generate random numbers with the functions in numpy.random



In [33]:

    
z = np.random.random((10, 2))
x1, x2 = z[:, 0], z[:, 1]
R = np.sqrt(x1 ** 2 + x2 ** 2)
T = np.arctan2(x2, x1)
print(R)
print(T)









    



[ 0.88919595  0.67700298  0.43657171  0.99695062  0.09369465  0.42234775
  0.9362286   0.56539581  1.01339843  0.72450275]
[ 0.08591593  1.31038962  1.40461825  0.53314706  1.36699826  1.2306954
  0.49694134  0.68438617  0.83524534  1.54853991]

Transposing

Transposition is a very important operation for linear algebra. Although NumPy is capable of correctly doing matrix-vector products correctly regardless of the orientation of the vector, it is not the case for products of matrices



In [34]:

    
Z = np.arange(0, 12, 1).reshape((4, 3))



In [35]:

    
np.dot(Z, y)









    Out[35]:





array([ 17,  62, 107, 152])



In [36]:

    
np.dot(Z, y.T)









    Out[36]:





array([ 17,  62, 107, 152])



In [37]:

    
Z.dot(X)









    Out[37]:





array([[ 14,  17,  20],
       [ 50,  62,  74],
       [ 86, 107, 128],
       [122, 152, 182]])



In [38]:

    
(Z.T).dot(X)









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-2e11f30e177d> in <module>()
----> 1 (Z.T).dot(X)

ValueError: shapes (3,4) and (3,3) not aligned: 4 (dim 1) != 3 (dim 0)

Array Methods

To not have to go from NumPy arrays to lists back and forth, NumPy contains some functions to know properties of your arrays. Actually, there are more of these functions than in standard Python.



In [39]:

    
a = np.array([-4, -2, 1, 3, 5])
print(a.max())
print(a.min())
print(a.sum())
print(a.mean())
print(a.std())









    



5
-4
3
0.6
3.26190128606

Some interesting functions are argmax and argmin, which return the index of the maximum and minimum values in the array.



In [40]:

    
print(a.argmax())
print(a.argmin())

4
0

Indexing/Slicing

We have already seen briefly that to access individual elements you use the bracket notation: array[ax_0, ax_1, ...], where the ax_i denotes the coordinate in the i-th axis. You can even use this to assign new values to your elements.



In [41]:

    
r = [4, 5, 6, 7]
print(r[2])
r[0] = 198
r









    



6






    Out[41]:





[198, 5, 6, 7]

To select a range of rows or columns you can use a colon :. A second : can be used to indicate the step size. array[start:stop:stepsize]. If you leave start (stop) blank, the selection will go from the very beginning (until the very end) of the array



In [42]:

    
s = np.arange(13)**2
print(s)
print(s[3:9])
print(s[2:10:3])
s[-5::-2]









    



[  0   1   4   9  16  25  36  49  64  81 100 121 144]
[ 9 16 25 36 49 64]
[ 4 25 64]






    Out[42]:





array([64, 36, 16,  4,  0])

The same applies to matrices or higher-dimensional arrays



In [43]:

    
r = np.arange(36).reshape((6, 6))
r









    Out[43]:





array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])



In [44]:

    
r[2:5, 1:3]









    Out[44]:





array([[13, 14],
       [19, 20],
       [25, 26]])

You can also select specific rows and columns, separated by commas



In [45]:

    
r[[1, 3, 4], 1:3]









    Out[45]:





array([[ 7,  8],
       [19, 20],
       [25, 26]])

A very useful tool is conditional indexing, where we apply a function, assignment... only to those elements of an array that satisfy some condition



In [46]:

    
r[r > 30] = 30
r









    Out[46]:





array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Exercise 3: Create a random 1-dimensional array, and find which element is closest to 0.7



In [47]:

    
Z = np.random.uniform(0,1,100)
z = 0.7
m = Z[np.abs(Z - z).argmin()]
print(m)









    



0.705902515629

Copying Data

Be very careful with copying and modifying arrays in NumPy! You will see the reason right now. Let's begin defining r2 as a slice of r



In [48]:

    
r2 = r[:3,:3]
r2









    Out[48]:





array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

And now let's set all its elements to zero



In [49]:

    
r2[:] = 0
r2









    Out[49]:





array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

When looking at r, we see that it has also been changed!



In [50]:

    
r









    Out[50]:





array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

The proper way of handling selections without modifying the original arrays is through the copy command.



In [51]:

    
r_copy = r.copy()
r_copy









    Out[51]:





array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Now we can safely modify r_copy without affecting r.



In [52]:

    
r_copy[:] = 10
print('{}\n'.format(r_copy))
r









    



[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]]







    Out[52]:





array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

Iterating Over Arrays

Finally, you can iterate over arrays in the same way as you iterate over lists



In [53]:

    
test = np.random.randint(0, 10, (4,3))
test









    Out[53]:





array([[1, 6, 0],
       [4, 8, 0],
       [2, 1, 6],
       [1, 7, 4]])

You can iterate by row:



In [54]:

    
for row in test:
    print(row)









    



[1 6 0]
[4 8 0]
[2 1 6]
[1 7 4]

Or by row index



In [55]:

    
for i in range(len(test)):
    print(test[i])









    



[1 6 0]
[4 8 0]
[2 1 6]
[1 7 4]

Or by row and index:



In [56]:

    
for i, row in enumerate(test):
    print('Row {} is {}'.format(i, row))









    



Row 0 is [1 6 0]
Row 1 is [4 8 0]
Row 2 is [2 1 6]
Row 3 is [1 7 4]

In the same way as with lists, you can use zip to iterate over multiple iterables.



In [57]:

    
test2 = test**2
test2









    Out[57]:





array([[ 1, 36,  0],
       [16, 64,  0],
       [ 4,  1, 36],
       [ 1, 49, 16]])



In [58]:

    
for i, j in zip(test, test2):
    print('{} + {} = {}'.format(i, j, i + j))









    



[1 6 0] + [ 1 36  0] = [ 2 42  0]
[4 8 0] + [16 64  0] = [20 72  0]
[2 1 6] + [ 4  1 36] = [ 6  2 42]
[1 7 4] + [ 1 49 16] = [ 2 56 20]

Exercise 4: Create a function that iterates over the columns of a 2-dimensional array



In [59]:

    
def iterate(df):
    for i, row in enumerate(df):
        shp = row.shape
        row.shape = shp + (1,)
        print('Column {} is {}'.format(i, row))

iterate(test.T)









    



Column 0 is [[1]
 [4]
 [2]
 [1]]
Column 1 is [[6]
 [8]
 [1]
 [7]]
Column 2 is [[0]
 [0]
 [6]
 [4]]

Loading and Saving Data

To load and save data NumPy has the loadtxt and savetxt commands. However, they only work for two-dimensional arrays



In [60]:

    
np.savetxt('numpytest.txt', test)
np.loadtxt('numpytest.txt')









    Out[60]:





array([[ 1.,  6.,  0.],
       [ 4.,  8.,  0.],
       [ 2.,  1.,  6.],
       [ 1.,  7.,  4.]])

Pandas

When dealing with numeric matrices and vectors in Python, NumPy makes life a lot easier. For more complex data, however, it leaves a bit to be desired. For those used to working with dedicated languages like R, doing data analysis directly with numpy feels like a step back. Fortunately, some nice folks have written the Python Data Analysis Library (a.k.a. pandas). Pandas provides an R-like DataFrame, produces high quality plots with matplotlib, and integrates nicely with other libraries that expect NumPy arrays.

Pandas works with Series of data, that then are arranged in DataFrames. A dataframe will be the object closest to an Excel spreadsheet that you will see throughout the course (but of course, given that it is integrated in Python and can be combined with so many different packages, dataframes are much more powerful than Excel spreadsheets). The data in the series can be either qualitative or quantitative data. Creating a series is as easy as creating a NumPy array from a one-dimensional list.



In [61]:

    
animals = ['Tiger', 'Bear', 'Moose']
pd.Series(animals)









    Out[61]:





0    Tiger
1     Bear
2    Moose
dtype: object



In [62]:

    
numbers = [1, 2, 3]
pd.Series(numbers)









    Out[62]:





0    1
1    2
2    3
dtype: int64

Notice that the series is indexed by default by integers. You can change this indexing by using a dictionary instead of a list for creating the series.



In [63]:

    
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s









    Out[63]:





Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

On the other hand, DataFrames can be built from two-dimensional arrays, with the ability of labelling columns and indexing the rows



In [64]:

    
u = pd.DataFrame(np.random.randn(1000,6), index=np.arange(0, 3000, 3), 
                 columns=['A', 'B', 'C', 'D', 'E', 'F'])
u









    Out[64]:







  
    
      
      A
      B
      C
      D
      E
      F
    
  
  
    
      0
      -0.761486
      -1.856963
      -0.259506
      -0.965839
      0.595335
      -1.393989
    
    
      3
      0.730582
      -1.933338
      0.159036
      -0.303658
      0.285985
      -0.015358
    
    
      6
      1.181648
      -0.308067
      0.194845
      -0.110363
      -0.377039
      -0.563322
    
    
      9
      -0.207188
      0.887225
      -0.089879
      1.535221
      -0.178550
      -0.119188
    
    
      12
      -1.452087
      -1.678056
      0.442135
      0.815065
      -0.788070
      0.865199
    
    
      15
      -0.312236
      0.405971
      -0.916694
      -1.570857
      -1.808676
      -0.601334
    
    
      18
      -0.520655
      0.520683
      -0.420893
      0.866483
      0.707625
      -0.457042
    
    
      21
      -0.763168
      1.941782
      -0.345903
      -1.078514
      -2.348245
      -0.507215
    
    
      24
      0.286458
      -0.203965
      2.202701
      0.305345
      -0.373617
      -0.516685
    
    
      27
      0.483357
      0.582012
      0.151425
      -0.280406
      -1.193848
      0.074661
    
    
      30
      0.251333
      -0.984670
      -1.126600
      1.437455
      1.144455
      0.916344
    
    
      33
      0.501224
      -1.196463
      -0.890707
      -0.330306
      1.701601
      0.195782
    
    
      36
      -0.493956
      -1.001679
      -0.944730
      0.740723
      0.359925
      1.290009
    
    
      39
      -0.616836
      0.040601
      0.839057
      0.233386
      0.572832
      -0.672458
    
    
      42
      -1.017777
      0.913523
      -2.204982
      -0.398876
      1.027856
      -0.112541
    
    
      45
      1.037946
      1.087577
      -0.024363
      -1.386254
      -0.700079
      -0.451928
    
    
      48
      0.590588
      1.912149
      0.382943
      1.103702
      -0.174166
      -0.475542
    
    
      51
      0.123948
      -1.953421
      -0.829696
      0.385270
      -1.386090
      0.290291
    
    
      54
      -0.775569
      2.293993
      -0.889644
      -0.786743
      0.471794
      0.610733
    
    
      57
      0.237778
      -2.610199
      -0.629762
      0.555422
      0.843687
      -2.165088
    
    
      60
      -0.971872
      1.174113
      0.521214
      0.634707
      0.373459
      1.213645
    
    
      63
      1.167321
      0.057173
      0.485950
      0.605712
      -0.307385
      -1.319829
    
    
      66
      0.159974
      -1.433897
      0.605769
      0.010578
      -2.978682
      -0.352156
    
    
      69
      -0.782439
      1.053819
      0.533252
      0.878424
      -0.269466
      0.495181
    
    
      72
      0.891193
      0.364128
      0.178855
      -1.640823
      1.809076
      -0.263593
    
    
      75
      0.051243
      -0.263917
      0.640288
      -2.158057
      -0.508163
      1.132559
    
    
      78
      -1.096922
      -1.401145
      0.833753
      -1.807646
      -1.168890
      1.390710
    
    
      81
      -0.988110
      -0.369140
      -0.237166
      -1.303915
      0.888377
      1.142403
    
    
      84
      -0.865377
      0.323423
      0.492383
      1.370745
      -0.021754
      -1.719541
    
    
      87
      0.321882
      1.381522
      -0.674172
      -1.083496
      1.914568
      1.028375
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2910
      0.314307
      -0.431866
      1.527405
      0.601484
      -0.160716
      1.314302
    
    
      2913
      0.257940
      -0.099765
      1.157054
      -0.060946
      -0.494843
      0.341643
    
    
      2916
      -0.428302
      1.507066
      0.611729
      -0.332719
      -0.196556
      0.423443
    
    
      2919
      0.349743
      -0.282030
      0.196447
      0.002695
      0.136590
      0.401846
    
    
      2922
      -0.307028
      -0.791913
      -0.265240
      0.125383
      -1.003613
      -1.160399
    
    
      2925
      0.219177
      0.940110
      0.781867
      0.126816
      0.161123
      1.089183
    
    
      2928
      -1.196400
      -0.184507
      1.962129
      -0.343204
      1.436841
      -1.038041
    
    
      2931
      -0.272810
      0.410523
      0.541178
      -0.793986
      0.742207
      2.590795
    
    
      2934
      -0.439342
      0.136602
      -0.225461
      0.364491
      1.802063
      -0.582919
    
    
      2937
      0.225935
      1.015986
      -1.177095
      -0.971220
      0.236631
      0.891794
    
    
      2940
      -0.252398
      -0.012515
      -0.207346
      -0.493030
      -0.983243
      -0.657357
    
    
      2943
      0.868827
      -0.624654
      0.365266
      -1.029895
      -0.904947
      0.933805
    
    
      2946
      -1.673025
      -0.421382
      2.676548
      -1.360665
      -0.999639
      -0.409515
    
    
      2949
      -0.629402
      -0.742095
      0.559815
      -1.556713
      -2.251803
      -1.368577
    
    
      2952
      0.025487
      -0.217591
      0.182265
      -1.496190
      0.499133
      -0.494414
    
    
      2955
      -1.311549
      -1.553367
      -0.267299
      1.937489
      -1.462930
      0.015008
    
    
      2958
      -0.629710
      -1.414470
      0.165272
      -0.123763
      -0.774348
      2.061785
    
    
      2961
      -0.018643
      -2.001437
      -0.692074
      2.882349
      -0.027989
      -0.864119
    
    
      2964
      -1.199184
      -2.128532
      -1.322724
      -0.951057
      -0.370693
      0.145449
    
    
      2967
      0.720639
      -1.539494
      0.966309
      1.557915
      0.100207
      0.188154
    
    
      2970
      1.483731
      0.821352
      0.872577
      -1.694163
      1.452975
      1.234305
    
    
      2973
      -0.203272
      -1.026092
      -0.303214
      -0.634275
      0.454736
      0.541063
    
    
      2976
      -0.519535
      -1.405036
      -0.218642
      0.372721
      0.585361
      0.018672
    
    
      2979
      0.419742
      0.763683
      -0.111488
      -0.397157
      -1.451003
      1.081199
    
    
      2982
      1.542049
      -0.929883
      -2.279710
      0.352081
      -0.345532
      0.163429
    
    
      2985
      -0.779326
      0.424736
      0.548876
      -2.316646
      1.518206
      0.038710
    
    
      2988
      1.245536
      1.010383
      -1.802921
      -2.585215
      0.362162
      -0.180893
    
    
      2991
      0.752105
      -1.444161
      -0.507737
      0.018064
      1.158738
      -0.461061
    
    
      2994
      -0.688970
      0.508954
      -0.110612
      -0.657601
      -1.061930
      0.050378
    
    
      2997
      0.208066
      1.819959
      -1.461952
      1.949032
      -0.814686
      -0.153806
    
  

1000 rows × 6 columns

As you might have noticed, it is a bit ugly to deal with large dataframes. There are however some functions that allows to have an idea of the data in a frame.



In [65]:

    
u.head()



In [66]:

    
u.tail()



In [67]:

    
u.describe()









    Out[67]:







  
    
      
      A
      B
      C
      D
      E
      F
    
  
  
    
      count
      1000.000000
      1000.000000
      1000.000000
      1000.000000
      1000.000000
      1000.000000
    
    
      mean
      -0.050794
      0.061913
      -0.025466
      0.045786
      0.022753
      -0.060224
    
    
      std
      1.020710
      1.005581
      0.993343
      1.005364
      1.025786
      1.005457
    
    
      min
      -3.205216
      -3.110890
      -3.052732
      -2.773758
      -3.470618
      -3.867963
    
    
      25%
      -0.751474
      -0.631540
      -0.695884
      -0.640073
      -0.654438
      -0.764999
    
    
      50%
      -0.061007
      0.098579
      -0.029510
      0.047236
      0.037100
      -0.037634
    
    
      75%
      0.650328
      0.757897
      0.647189
      0.720574
      0.731039
      0.604459
    
    
      max
      3.796625
      3.126197
      2.889348
      2.913861
      3.159995
      4.216350

One can also change the maximal number of rows that is displayed:



In [68]:

    
pd.set_option('display.max_rows', 15)

Pandas can also generate random DataFrames for testing:



In [69]:

    
import pandas.util.testing as tm

tm.makeDataFrame().head()

Indexing/Slicing in Pandas

The easiest way of accessing information in a Pandas dataframe, equivalent to the way used in NumPy, is using the iloc command. With this you can also set specific values, do conditional indexing... all that we have seen before in section 2.4



In [70]:

    
u.iloc[125:132,[0, 2, 5]]

However, there are a few different ways of accessing the data in a Pandas dataframe, that typically have a more "direct" connection with the actual content fo the dataframe. Individual or sets of columns can also be accessed by their column names. Choosing one single column will give a Series, while two or more will produce a Dataset



In [71]:

    
u['A'].head()









    Out[71]:





0    -0.761486
3     0.730582
6     1.181648
9    -0.207188
12   -1.452087
Name: A, dtype: float64



In [72]:

    
u[['A', 'D']].head()

Not only that, you can access a single column without the need of brackets []



In [73]:

    
u.A.head()









    Out[73]:





0    -0.761486
3     0.730582
6     1.181648
9    -0.207188
12   -1.452087
Name: A, dtype: float64

The usual [] will select specific rows according to the row number



In [74]:

    
u[0:10][list('BCF')]

You can also choose specific rows according to their indices with the loc command



In [75]:

    
u.loc[6:15]

Or, you can access just the elements that satisfy some condition



In [76]:

    
u[u.D > 2]









    Out[76]:







  
    
      
      A
      B
      C
      D
      E
      F
    
  
  
    
      96
      -1.062619
      -0.326839
      0.698481
      2.260445
      -0.480326
      -0.030888
    
    
      231
      -0.062075
      1.700715
      0.004098
      2.265953
      0.971236
      -1.165149
    
    
      243
      -1.548629
      -0.682061
      -0.988246
      2.101512
      -0.281365
      0.165648
    
    
      294
      1.536812
      -0.065230
      0.089026
      2.173031
      -0.229206
      3.471096
    
    
      717
      1.043050
      -0.596884
      -2.254242
      2.728187
      0.597010
      1.541664
    
    
      984
      0.697663
      -0.625542
      -0.445807
      2.276466
      -0.918990
      -0.303464
    
    
      1092
      0.038387
      -0.155476
      -0.253734
      2.684383
      0.774636
      0.283066
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2313
      -0.205846
      -1.757420
      -1.705877
      2.100689
      -2.208538
      -1.132553
    
    
      2343
      -1.878500
      -1.878065
      0.689921
      2.751148
      -0.482695
      -1.619270
    
    
      2361
      -3.205216
      1.287279
      -0.571579
      2.167375
      -1.650644
      0.358632
    
    
      2430
      0.580814
      -0.215444
      -1.704029
      2.505435
      2.415872
      0.450578
    
    
      2589
      -1.093291
      -0.069399
      -0.168713
      2.109153
      0.723736
      0.105442
    
    
      2751
      2.553238
      0.060056
      -0.290016
      2.663297
      1.081204
      -0.416994
    
    
      2961
      -0.018643
      -2.001437
      -0.692074
      2.882349
      -0.027989
      -0.864119
    
  

29 rows × 6 columns



In [77]:

    
u[~(u.D > 2)]  # For the inverse of u.D > 2









    Out[77]:







  
    
      
      A
      B
      C
      D
      E
      F
    
  
  
    
      0
      -0.761486
      -1.856963
      -0.259506
      -0.965839
      0.595335
      -1.393989
    
    
      3
      0.730582
      -1.933338
      0.159036
      -0.303658
      0.285985
      -0.015358
    
    
      6
      1.181648
      -0.308067
      0.194845
      -0.110363
      -0.377039
      -0.563322
    
    
      9
      -0.207188
      0.887225
      -0.089879
      1.535221
      -0.178550
      -0.119188
    
    
      12
      -1.452087
      -1.678056
      0.442135
      0.815065
      -0.788070
      0.865199
    
    
      15
      -0.312236
      0.405971
      -0.916694
      -1.570857
      -1.808676
      -0.601334
    
    
      18
      -0.520655
      0.520683
      -0.420893
      0.866483
      0.707625
      -0.457042
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2979
      0.419742
      0.763683
      -0.111488
      -0.397157
      -1.451003
      1.081199
    
    
      2982
      1.542049
      -0.929883
      -2.279710
      0.352081
      -0.345532
      0.163429
    
    
      2985
      -0.779326
      0.424736
      0.548876
      -2.316646
      1.518206
      0.038710
    
    
      2988
      1.245536
      1.010383
      -1.802921
      -2.585215
      0.362162
      -0.180893
    
    
      2991
      0.752105
      -1.444161
      -0.507737
      0.018064
      1.158738
      -0.461061
    
    
      2994
      -0.688970
      0.508954
      -0.110612
      -0.657601
      -1.061930
      0.050378
    
    
      2997
      0.208066
      1.819959
      -1.461952
      1.949032
      -0.814686
      -0.153806
    
  

971 rows × 6 columns

Recently query has been added to DataFrame for the same purpose. While it is less powerful than logical indexing, it is often faster and shorter (when names are longer than just u):



In [78]:

    
u.query('D > 2')









    Out[78]:







  
    
      
      A
      B
      C
      D
      E
      F
    
  
  
    
      96
      -1.062619
      -0.326839
      0.698481
      2.260445
      -0.480326
      -0.030888
    
    
      231
      -0.062075
      1.700715
      0.004098
      2.265953
      0.971236
      -1.165149
    
    
      243
      -1.548629
      -0.682061
      -0.988246
      2.101512
      -0.281365
      0.165648
    
    
      294
      1.536812
      -0.065230
      0.089026
      2.173031
      -0.229206
      3.471096
    
    
      717
      1.043050
      -0.596884
      -2.254242
      2.728187
      0.597010
      1.541664
    
    
      984
      0.697663
      -0.625542
      -0.445807
      2.276466
      -0.918990
      -0.303464
    
    
      1092
      0.038387
      -0.155476
      -0.253734
      2.684383
      0.774636
      0.283066
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2313
      -0.205846
      -1.757420
      -1.705877
      2.100689
      -2.208538
      -1.132553
    
    
      2343
      -1.878500
      -1.878065
      0.689921
      2.751148
      -0.482695
      -1.619270
    
    
      2361
      -3.205216
      1.287279
      -0.571579
      2.167375
      -1.650644
      0.358632
    
    
      2430
      0.580814
      -0.215444
      -1.704029
      2.505435
      2.415872
      0.450578
    
    
      2589
      -1.093291
      -0.069399
      -0.168713
      2.109153
      0.723736
      0.105442
    
    
      2751
      2.553238
      0.060056
      -0.290016
      2.663297
      1.081204
      -0.416994
    
    
      2961
      -0.018643
      -2.001437
      -0.692074
      2.882349
      -0.027989
      -0.864119
    
  

29 rows × 6 columns

Reshaping `DataFrame`s



In [132]:

    
u.pivot(index='E', columns='G', values='A')









    Out[132]:







  
    
      G
      a
      b
      c
    
    
      E
      
      
      
    
  
  
    
      -3.470618
      -0.525781
      NaN
      NaN
    
    
      -2.978682
      NaN
      0.159974
      NaN
    
    
      -2.668974
      NaN
      -1.407538
      NaN
    
    
      -2.587076
      NaN
      0.782683
      NaN
    
    
      -2.485586
      -1.196322
      NaN
      NaN
    
    
      -2.457113
      NaN
      NaN
      0.591406
    
    
      -2.439570
      NaN
      NaN
      -0.315309
    
    
      ...
      ...
      ...
      ...
    
    
      2.471763
      NaN
      NaN
      0.588655
    
    
      2.523968
      0.968783
      NaN
      NaN
    
    
      2.528503
      NaN
      NaN
      0.737061
    
    
      2.686937
      NaN
      -0.392155
      NaN
    
    
      2.829819
      NaN
      NaN
      1.453957
    
    
      3.060963
      NaN
      1.390720
      NaN
    
    
      3.159995
      0.190599
      NaN
      NaN
    
  

1000 rows × 3 columns



In [136]:

    
u.stack()









    Out[136]:





0     A   -0.761486
      B    -1.85696
      C   -0.259506
      D   -0.965839
      E    0.595335
      F   -0.717366
      G           b
             ...   
2997  A    0.208066
      B     1.81996
      C    -1.46195
      D     1.94903
      E   -0.814686
      F    -6.50169
      G           b
Length: 7000, dtype: object



In [137]:

    
u.unstack()









    Out[137]:





A  0      -0.761486
   3       0.730582
   6        1.18165
   9      -0.207188
   12      -1.45209
   15     -0.312236
   18     -0.520655
             ...   
G  2979           a
   2982           a
   2985           a
   2988           c
   2991           b
   2994           a
   2997           b
Length: 7000, dtype: object



In [138]:

    
u.stack().unstack()









    Out[138]:







  
    
      
      A
      B
      C
      D
      E
      F
      G
    
  
  
    
      0
      -0.761486
      -1.85696
      -0.259506
      -0.965839
      0.595335
      -0.717366
      b
    
    
      3
      0.730582
      -1.93334
      0.159036
      -0.303658
      0.285985
      -65.1111
      c
    
    
      6
      1.18165
      -0.308067
      0.194845
      -0.110363
      -0.377039
      -1.77518
      c
    
    
      9
      -0.207188
      0.887225
      -0.0898789
      1.53522
      -0.17855
      -8.39008
      b
    
    
      12
      -1.45209
      -1.67806
      0.442135
      0.815065
      -0.78807
      1.1558
      a
    
    
      15
      -0.312236
      0.405971
      -0.916694
      -1.57086
      -1.80868
      -1.66297
      b
    
    
      18
      -0.520655
      0.520683
      -0.420893
      0.866483
      0.707625
      -2.18798
      a
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      2979
      0.419742
      0.763683
      -0.111488
      -0.397157
      -1.451
      0.924899
      a
    
    
      2982
      1.54205
      -0.929883
      -2.27971
      0.352081
      -0.345532
      6.11887
      a
    
    
      2985
      -0.779326
      0.424736
      0.548876
      -2.31665
      1.51821
      25.8329
      a
    
    
      2988
      1.24554
      1.01038
      -1.80292
      -2.58522
      0.362162
      -5.52813
      c
    
    
      2991
      0.752105
      -1.44416
      -0.507737
      0.018064
      1.15874
      -2.16891
      b
    
    
      2994
      -0.68897
      0.508954
      -0.110612
      -0.657601
      -1.06193
      19.8499
      a
    
    
      2997
      0.208066
      1.81996
      -1.46195
      1.94903
      -0.814686
      -6.50169
      b
    
  

1000 rows × 7 columns

Computing With `DataFrames`

You can calculate with DataFrames or their columns (which are Series) the same way you could with arrayss



In [79]:

    
u['F'] = 1 / u['F']
u['F'].head()









    Out[79]:





0     -0.717366
3    -65.111125
6     -1.775183
9     -8.390081
12     1.155803
Name: F, dtype: float64



In [80]:

    
np.mean(u)









    Out[80]:





A   -0.050794
B    0.061913
C   -0.025466
D    0.045786
E    0.022753
F   -4.038254
dtype: float64

You can apply functions to the whole dataset or specific columns with the apply command. apply acts on the whole column at a time (i.e. a Pandas Series), so you can compute things that depend on several values of the column, for instance the mean value. To apply functions in a real element-by-element basis the function applymap or Series.apply should be used.



In [81]:

    
def mn(col):
    return sum(col) / len(col)

u.apply(mn)









    Out[81]:





A   -0.050794
B    0.061913
C   -0.025466
D    0.045786
E    0.022753
F   -4.038254
dtype: float64

While most can be directly calculated (including the given example of the mean), apply also works on columns with strings or categorical data, where no mathematical operations are defined. The limit is the imagination.

Combining `DataFrames`

Something we will do quite often as scientists is combining data from different sources into one single source. This can be achieved by different commands in Pandas, depending on the actual goal we want.

To begin with, appending new rows of data is achieved by the command append.



In [82]:

    
newdata = pd.DataFrame(np.ones((5, 6)), index=np.arange(3003, 3018, 3), columns=list('ABCDEF'))
newdata



In [83]:

    
unew = u.append(newdata)
unew.tail(10)

The same result can be obtained with concat.



In [84]:

    
pd.concat([u, newdata]).tail(10)

New columns of data can just be asigned or added with the command join.



In [85]:

    
u['G'] = np.random.choice(['a', 'b', 'c'], len(u))
u.tail()

Grouping Data



In [86]:

    
for h, group in u.groupby('G'):
    print('{}: {}'.format(h, np.mean(group['F'])))









    



a: -0.0008421855752116789
b: -13.980448225745274
c: 1.5318088670645276



In [87]:

    
u.groupby('G').describe()









    Out[87]:







  
    
      
      A
      B
      ...
      E
      F
    
    
      
      count
      mean
      std
      min
      25%
      50%
      75%
      max
      count
      mean
      ...
      75%
      max
      count
      mean
      std
      min
      25%
      50%
      75%
      max
    
    
      G
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      a
      355.0
      -0.054040
      1.056119
      -3.205216
      -0.762618
      -0.024805
      0.642713
      3.796625
      355.0
      0.137639
      ...
      0.660793
      3.159995
      355.0
      -0.000842
      15.675441
      -223.960885
      -1.462605
      -0.504428
      1.587306
      77.513602
    
    
      b
      324.0
      -0.016923
      0.979483
      -2.487953
      -0.717918
      -0.057716
      0.681434
      3.356327
      324.0
      0.040709
      ...
      0.662474
      3.060963
      324.0
      -13.980448
      264.328321
      -4752.187932
      -1.289143
      -0.438322
      1.574061
      110.350988
    
    
      c
      321.0
      -0.081392
      1.023785
      -2.690009
      -0.809986
      -0.109089
      0.596933
      2.553238
      321.0
      -0.000431
      ...
      0.885457
      2.829819
      321.0
      1.531809
      44.547577
      -142.745170
      -1.451198
      -0.483571
      1.372699
      762.072499
    
  

3 rows × 48 columns



In [98]:

    
u.pivot_table(index='G', aggfunc='mean')

Loading and saving dataframes

To load and save Pandas dataframes we will use the to_csv and read_csv commands



In [107]:

    
u.to_csv('test.csv')
v = pd.read_csv('test.csv', index_col=0)
v.head()

But, as an addition, Pandas has special commands to load and save Excel spreadsheets (yay!). However, to use it you'll need the openpyxl and xlrd packages.



In [108]:

    
u.to_excel('test.xlsx', sheet_name='My sheet')
pd.read_excel('test.xlsx', 'My sheet', index_col=0).head()

Exercise 5: Download this dataset and load it, using the first column as the index. Take a look at it, and do the following things:

Choose the columns 'Identifier', 'BaseStamina', 'BaseAttack', 'BaseDefense', 'Type1' and 'Type2'
Create a function that lowercases strings and apply it to 'Type1' and 'Type2' (Extra: just capitalize the strings, i.e., leave the first letter uppercase and lowercase the rest)
Create a function that returns a Boolean value (don't be afraif by this, it is a function that returns either True or False) that tells if a Pokémon has high stamina (BaseStamina>170) or not. Store this information in a new column and show the list of Pokémon with high stamina
Show the instructor the last 15 rows of your dataset



In [109]:

    
df = pd.read_csv('https://raw.githubusercontent.com/ChihChengLiang/pokemongor/master/data-raw/pokemons.csv', 
                 index_col=0)

df = df[['Identifier', 'BaseStamina', 'BaseAttack', 'BaseDefense', 'Type1', 'Type2']]

capitalize = lambda st: st.capitalize()

for col in ['Type1', 'Type2']:
    df[col] = df[col].apply(capitalize)
    
def highstamina(x):
    return True if x > 170 else False

df['HighStamina'] = df.BaseStamina.apply(highstamina)

print(df[df['HighStamina'] == True].Identifier)

df.tail(15)









    



PkMn
31      Nidoqueen
36       Clefable
39     Jigglypuff
40     Wigglytuff
59       Arcanine
62      Poliwrath
68        Machamp
          ...    
143       Snorlax
144      Articuno
145        Zapdos
146       Moltres
149     Dragonite
150        Mewtwo
151           Mew
Name: Identifier, Length: 26, dtype: object






    Out[109]:







  
    
      
      Identifier
      BaseStamina
      BaseAttack
      BaseDefense
      Type1
      Type2
      HighStamina
    
    
      PkMn
      
      
      
      
      
      
      
    
  
  
    
      137
      Porygon
      130
      156
      158
      Normal
      None
      False
    
    
      138
      Omanyte
      70
      132
      160
      Rock
      Water
      False
    
    
      139
      Omastar
      140
      180
      202
      Rock
      Water
      False
    
    
      140
      Kabuto
      60
      148
      142
      Rock
      Water
      False
    
    
      141
      Kabutops
      120
      190
      190
      Rock
      Water
      False
    
    
      142
      Aerodactyl
      160
      182
      162
      Rock
      Flying
      False
    
    
      143
      Snorlax
      320
      180
      180
      Normal
      None
      True
    
    
      144
      Articuno
      180
      198
      242
      Ice
      Flying
      True
    
    
      145
      Zapdos
      180
      232
      194
      Electric
      Flying
      True
    
    
      146
      Moltres
      180
      242
      194
      Fire
      Flying
      True
    
    
      147
      Dratini
      82
      128
      110
      Dragon
      None
      False
    
    
      148
      Dragonair
      122
      170
      152
      Dragon
      None
      False
    
    
      149
      Dragonite
      182
      250
      212
      Dragon
      Flying
      True
    
    
      150
      Mewtwo
      212
      284
      202
      Psychic
      None
      True
    
    
      151
      Mew
      200
      220
      220
      Psychic
      None
      True

	A	B	C	D
P08BTc9mdE	-0.527469	-0.273173	-0.636832	-0.156654
pxObI8gZh6	-0.993787	-1.260044	0.165435	0.716617
S81GD7fAO0	0.649611	-0.345573	-0.476357	-0.430727
r2d2zMxxmv	-0.426132	-0.282060	-0.055078	0.638711
jaKW4OkQ0V	-0.669180	-0.251879	0.949884	-1.117098

	A	C	F
375	-0.636038	0.116078	-0.488360
378	-0.950089	-2.232674	-0.784635
381	1.168677	-0.428499	-0.069833
384	-1.268749	1.368481	0.612919
387	0.008765	-0.606459	0.171234
390	-0.212263	-0.327441	-0.592471
393	0.801593	-1.103829	-1.240144

	A	B	C	D	E	F
2985	-0.779326	0.424736	0.548876	-2.316646	1.518206	25.832926
2988	1.245536	1.010383	-1.802921	-2.585215	0.362162	-5.528130
2991	0.752105	-1.444161	-0.507737	0.018064	1.158738	-2.168912
2994	-0.688970	0.508954	-0.110612	-0.657601	-1.061930	19.849865
2997	0.208066	1.819959	-1.461952	1.949032	-0.814686	-6.501688
3003	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3006	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3009	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3012	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3015	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000

	A	B	C	D	E	F
2985	-0.779326	0.424736	0.548876	-2.316646	1.518206	25.832926
2988	1.245536	1.010383	-1.802921	-2.585215	0.362162	-5.528130
2991	0.752105	-1.444161	-0.507737	0.018064	1.158738	-2.168912
2994	-0.688970	0.508954	-0.110612	-0.657601	-1.061930	19.849865
2997	0.208066	1.819959	-1.461952	1.949032	-0.814686	-6.501688
3003	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3006	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3009	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3012	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3015	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000

	A	B	C	D	E	F
0	-0.761486	-1.856963	-0.259506	-0.965839	0.595335	-1.393989
3	0.730582	-1.933338	0.159036	-0.303658	0.285985	-0.015358
6	1.181648	-0.308067	0.194845	-0.110363	-0.377039	-0.563322
9	-0.207188	0.887225	-0.089879	1.535221	-0.178550	-0.119188
12	-1.452087	-1.678056	0.442135	0.815065	-0.788070	0.865199
15	-0.312236	0.405971	-0.916694	-1.570857	-1.808676	-0.601334
18	-0.520655	0.520683	-0.420893	0.866483	0.707625	-0.457042
21	-0.763168	1.941782	-0.345903	-1.078514	-2.348245	-0.507215
24	0.286458	-0.203965	2.202701	0.305345	-0.373617	-0.516685
27	0.483357	0.582012	0.151425	-0.280406	-1.193848	0.074661
30	0.251333	-0.984670	-1.126600	1.437455	1.144455	0.916344
33	0.501224	-1.196463	-0.890707	-0.330306	1.701601	0.195782
36	-0.493956	-1.001679	-0.944730	0.740723	0.359925	1.290009
39	-0.616836	0.040601	0.839057	0.233386	0.572832	-0.672458
42	-1.017777	0.913523	-2.204982	-0.398876	1.027856	-0.112541
45	1.037946	1.087577	-0.024363	-1.386254	-0.700079	-0.451928
48	0.590588	1.912149	0.382943	1.103702	-0.174166	-0.475542
51	0.123948	-1.953421	-0.829696	0.385270	-1.386090	0.290291
54	-0.775569	2.293993	-0.889644	-0.786743	0.471794	0.610733
57	0.237778	-2.610199	-0.629762	0.555422	0.843687	-2.165088
60	-0.971872	1.174113	0.521214	0.634707	0.373459	1.213645
63	1.167321	0.057173	0.485950	0.605712	-0.307385	-1.319829
66	0.159974	-1.433897	0.605769	0.010578	-2.978682	-0.352156
69	-0.782439	1.053819	0.533252	0.878424	-0.269466	0.495181
72	0.891193	0.364128	0.178855	-1.640823	1.809076	-0.263593
75	0.051243	-0.263917	0.640288	-2.158057	-0.508163	1.132559
78	-1.096922	-1.401145	0.833753	-1.807646	-1.168890	1.390710
81	-0.988110	-0.369140	-0.237166	-1.303915	0.888377	1.142403
84	-0.865377	0.323423	0.492383	1.370745	-0.021754	-1.719541
87	0.321882	1.381522	-0.674172	-1.083496	1.914568	1.028375
...	...	...	...	...	...	...
2910	0.314307	-0.431866	1.527405	0.601484	-0.160716	1.314302
2913	0.257940	-0.099765	1.157054	-0.060946	-0.494843	0.341643
2916	-0.428302	1.507066	0.611729	-0.332719	-0.196556	0.423443
2919	0.349743	-0.282030	0.196447	0.002695	0.136590	0.401846
2922	-0.307028	-0.791913	-0.265240	0.125383	-1.003613	-1.160399
2925	0.219177	0.940110	0.781867	0.126816	0.161123	1.089183
2928	-1.196400	-0.184507	1.962129	-0.343204	1.436841	-1.038041
2931	-0.272810	0.410523	0.541178	-0.793986	0.742207	2.590795
2934	-0.439342	0.136602	-0.225461	0.364491	1.802063	-0.582919
2937	0.225935	1.015986	-1.177095	-0.971220	0.236631	0.891794
2940	-0.252398	-0.012515	-0.207346	-0.493030	-0.983243	-0.657357
2943	0.868827	-0.624654	0.365266	-1.029895	-0.904947	0.933805
2946	-1.673025	-0.421382	2.676548	-1.360665	-0.999639	-0.409515
2949	-0.629402	-0.742095	0.559815	-1.556713	-2.251803	-1.368577
2952	0.025487	-0.217591	0.182265	-1.496190	0.499133	-0.494414
2955	-1.311549	-1.553367	-0.267299	1.937489	-1.462930	0.015008
2958	-0.629710	-1.414470	0.165272	-0.123763	-0.774348	2.061785
2961	-0.018643	-2.001437	-0.692074	2.882349	-0.027989	-0.864119
2964	-1.199184	-2.128532	-1.322724	-0.951057	-0.370693	0.145449
2967	0.720639	-1.539494	0.966309	1.557915	0.100207	0.188154
2970	1.483731	0.821352	0.872577	-1.694163	1.452975	1.234305
2973	-0.203272	-1.026092	-0.303214	-0.634275	0.454736	0.541063
2976	-0.519535	-1.405036	-0.218642	0.372721	0.585361	0.018672
2979	0.419742	0.763683	-0.111488	-0.397157	-1.451003	1.081199
2982	1.542049	-0.929883	-2.279710	0.352081	-0.345532	0.163429
2985	-0.779326	0.424736	0.548876	-2.316646	1.518206	0.038710
2988	1.245536	1.010383	-1.802921	-2.585215	0.362162	-0.180893
2991	0.752105	-1.444161	-0.507737	0.018064	1.158738	-0.461061
2994	-0.688970	0.508954	-0.110612	-0.657601	-1.061930	0.050378
2997	0.208066	1.819959	-1.461952	1.949032	-0.814686	-0.153806

	A	B	C	D	E	F
count	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000	1000.000000
mean	-0.050794	0.061913	-0.025466	0.045786	0.022753	-0.060224
std	1.020710	1.005581	0.993343	1.005364	1.025786	1.005457
min	-3.205216	-3.110890	-3.052732	-2.773758	-3.470618	-3.867963
25%	-0.751474	-0.631540	-0.695884	-0.640073	-0.654438	-0.764999
50%	-0.061007	0.098579	-0.029510	0.047236	0.037100	-0.037634
75%	0.650328	0.757897	0.647189	0.720574	0.731039	0.604459
max	3.796625	3.126197	2.889348	2.913861	3.159995	4.216350

	A	B	C	D	E	F
96	-1.062619	-0.326839	0.698481	2.260445	-0.480326	-0.030888
231	-0.062075	1.700715	0.004098	2.265953	0.971236	-1.165149
243	-1.548629	-0.682061	-0.988246	2.101512	-0.281365	0.165648
294	1.536812	-0.065230	0.089026	2.173031	-0.229206	3.471096
717	1.043050	-0.596884	-2.254242	2.728187	0.597010	1.541664
984	0.697663	-0.625542	-0.445807	2.276466	-0.918990	-0.303464
1092	0.038387	-0.155476	-0.253734	2.684383	0.774636	0.283066
...	...	...	...	...	...	...
2313	-0.205846	-1.757420	-1.705877	2.100689	-2.208538	-1.132553
2343	-1.878500	-1.878065	0.689921	2.751148	-0.482695	-1.619270
2361	-3.205216	1.287279	-0.571579	2.167375	-1.650644	0.358632
2430	0.580814	-0.215444	-1.704029	2.505435	2.415872	0.450578
2589	-1.093291	-0.069399	-0.168713	2.109153	0.723736	0.105442
2751	2.553238	0.060056	-0.290016	2.663297	1.081204	-0.416994
2961	-0.018643	-2.001437	-0.692074	2.882349	-0.027989	-0.864119

G	a	b	c
E
-3.470618	-0.525781	NaN	NaN
-2.978682	NaN	0.159974	NaN
-2.668974	NaN	-1.407538	NaN
-2.587076	NaN	0.782683	NaN
-2.485586	-1.196322	NaN	NaN
-2.457113	NaN	NaN	0.591406
-2.439570	NaN	NaN	-0.315309
...	...	...	...
2.471763	NaN	NaN	0.588655
2.523968	0.968783	NaN	NaN
2.528503	NaN	NaN	0.737061
2.686937	NaN	-0.392155	NaN
2.829819	NaN	NaN	1.453957
3.060963	NaN	1.390720	NaN
3.159995	0.190599	NaN	NaN

	A	B	C	D	E	F	G
0	-0.761486	-1.85696	-0.259506	-0.965839	0.595335	-0.717366	b
3	0.730582	-1.93334	0.159036	-0.303658	0.285985	-65.1111	c
6	1.18165	-0.308067	0.194845	-0.110363	-0.377039	-1.77518	c
9	-0.207188	0.887225	-0.0898789	1.53522	-0.17855	-8.39008	b
12	-1.45209	-1.67806	0.442135	0.815065	-0.78807	1.1558	a
15	-0.312236	0.405971	-0.916694	-1.57086	-1.80868	-1.66297	b
18	-0.520655	0.520683	-0.420893	0.866483	0.707625	-2.18798	a
...	...	...	...	...	...	...	...
2979	0.419742	0.763683	-0.111488	-0.397157	-1.451	0.924899	a
2982	1.54205	-0.929883	-2.27971	0.352081	-0.345532	6.11887	a
2985	-0.779326	0.424736	0.548876	-2.31665	1.51821	25.8329	a
2988	1.24554	1.01038	-1.80292	-2.58522	0.362162	-5.52813	c
2991	0.752105	-1.44416	-0.507737	0.018064	1.15874	-2.16891	b
2994	-0.68897	0.508954	-0.110612	-0.657601	-1.06193	19.8499	a
2997	0.208066	1.81996	-1.46195	1.94903	-0.814686	-6.50169	b

	A	B	C	D	E	F
G
a	-0.054040	0.137639	-0.028253	-0.024943	-0.038171	-0.000842
b	-0.016923	0.040709	-0.090043	0.116645	-0.014294	-13.980448
c	-0.081392	-0.000431	0.042796	0.052485	0.127525	1.531809

	Identifier	BaseStamina	BaseAttack	BaseDefense	Type1	Type2	HighStamina
PkMn
137	Porygon	130	156	158	Normal	None	False
138	Omanyte	70	132	160	Rock	Water	False
139	Omastar	140	180	202	Rock	Water	False
140	Kabuto	60	148	142	Rock	Water	False
141	Kabutops	120	190	190	Rock	Water	False
142	Aerodactyl	160	182	162	Rock	Flying	False
143	Snorlax	320	180	180	Normal	None	True
144	Articuno	180	198	242	Ice	Flying	True
145	Zapdos	180	232	194	Electric	Flying	True
146	Moltres	180	242	194	Fire	Flying	True
147	Dratini	82	128	110	Dragon	None	False
148	Dragonair	122	170	152	Dragon	None	False
149	Dragonite	182	250	212	Dragon	Flying	True
150	Mewtwo	212	284	202	Psychic	None	True
151	Mew	200	220	220	Psychic	None	True