In [1]:
import numpy as np
import pandas as pd
In [2]:
array = np.array([1,2,3,4],float)
array
Out[2]:
Here we create a 2x3 array.
In [3]:
two_dimen_array = np.array([[1,2,3],[4,5,6]], float)
two_dimen_array
Out[3]:
Now let us look at some slicing and indexing operations on NumPy arrays. To print everything from the 3rd element to the end of a vector array:
In [4]:
array[2:]
Out[4]:
In this line of code, we print all elements from the 2nd element until the end of the 1st row.
In [5]:
two_dimen_array[0, 1:]
Out[5]:
In this line, we print the 2nd row of the matrix
In [6]:
two_dimen_array[1, :]
Out[6]:
In this line, we print the 2nd column
In [7]:
two_dimen_array[:,1]
Out[7]:
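Since the outputs are easy to lose track of, here is a small self-contained sketch that repeats the slices above and stores each result under a name. The variable names on the left (tail_of_vector and so on) are made up for this illustration; the arrays are the same ones defined earlier.

```python
import numpy as np

# Same arrays as in the cells above.
array = np.array([1, 2, 3, 4], float)
two_dimen_array = np.array([[1, 2, 3], [4, 5, 6]], float)

# Illustrative names for each slice (not part of the original notebook).
tail_of_vector = array[2:]              # 3rd element to the end
first_row_tail = two_dimen_array[0, 1:] # row 0, from the 2nd element on
second_row = two_dimen_array[1, :]      # every column of row 1
second_column = two_dimen_array[:, 1]   # every row of column 1

print(tail_of_vector)
print(first_row_tail)
print(second_row)
print(second_column)
```

Note that two_dimen_array[0][1:] and two_dimen_array[0, 1:] select the same elements; the comma form does it in a single indexing step.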
Here I wanted to experiment a bit with combining slicing and arithmetic operations. This prints the result of subtracting the 2nd row of the matrix from the first three elements of the vector array, element by element.
In [8]:
array[:3] - two_dimen_array[1,:]
Out[8]:
Here we create a 2x2 array and display it. Next, we print the first two elements of the vector array.
In [9]:
two_by_two_array = np.array([[1,4],[2,5]],float)
two_by_two_array
Out[9]:
In [10]:
array[:2]
Out[10]:
I found this operation kind of weird, since I would have assumed that multiplying two arrays gives the dot product, but that is not what happens here. The * operator multiplies element by element, and NumPy broadcasts the two-element array across each row of two_by_two_array. As a result, the first element of array is multiplied by each element of the first column of two_by_two_array, and the 2nd element of array is multiplied by each element of the 2nd column.
In [11]:
two_by_two_array * array[:2]
Out[11]:
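To make the column-by-column behaviour concrete, here is a minimal sketch that computes the product once with * and once by hand. The names row, elementwise and by_hand are made up for this example; row stands in for array[:2].

```python
import numpy as np

two_by_two_array = np.array([[1, 4], [2, 5]], float)
row = np.array([1, 2], float)  # same values as array[:2]

# Element-wise product: `row` is broadcast across every row of the matrix,
# so column 0 is scaled by row[0] and column 1 by row[1].
elementwise = two_by_two_array * row

# The same thing written out one column at a time.
by_hand = np.column_stack([
    two_by_two_array[:, 0] * row[0],
    two_by_two_array[:, 1] * row[1],
])

print(elementwise)
print(np.array_equal(elementwise, by_hand))
```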
If we want to compute the dot product, we have to use NumPy's dot function. Here, we create a new array and compute its dot product with two_by_two_array.
In [12]:
array2 = np.array([1,2],float)
array2
Out[12]:
In [13]:
np.dot(array2,two_by_two_array)
Out[13]:
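The order of the arguments matters for np.dot, which the following sketch tries to show. The result names are made up for illustration, and the @ operator used in the last line assumes Python 3.5 or newer.

```python
import numpy as np

array2 = np.array([1, 2], float)
two_by_two_array = np.array([[1, 4], [2, 5]], float)

# Vector times matrix: [1*1 + 2*2, 1*4 + 2*5]
vector_times_matrix = np.dot(array2, two_by_two_array)

# Matrix times vector: [1*1 + 4*2, 2*1 + 5*2]
matrix_times_vector = np.dot(two_by_two_array, array2)

# On Python 3.5+, the @ operator is equivalent to np.dot for these shapes.
with_operator = array2 @ two_by_two_array

print(vector_times_matrix)
print(matrix_times_vector)
print(with_operator)
```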
In [14]:
series = pd.Series(['Ransford', "Hyman Sr.", 'January', 1941], index=['First Name', 'Last Name', 'Birth Month', 'Birth Year'])
series
Out[14]:
Here we create a Python dictionary named family and then convert it into a pandas DataFrame named df.
In [15]:
family = {'name': ['Ransford','Denzel'],
'Birth year': [1984, 2004],
'favorite subject': ['Math','Science']}
family
Out[15]:
In [16]:
df = pd.DataFrame(family)
df
Out[16]:
Notice that when we print out the dictionary it prints as regular text, but when we print out the DataFrame, IPython gives us a nice table. Pretty cool!!!
Now we are going to create a separate DataFrame and play around with some of its member functions.
In [17]:
frank_grades = {'subject':['Math','English','Social Studies','Science','Music','Art'],
'grades': [95,87,80,96,98,70]}
df2 = pd.DataFrame(frank_grades)
df2
Out[17]:
Notice here that we can index DataFrames by column name, just like looking up a key in a Python dictionary. pandas DataFrames and Series have a function called describe which generates some interesting statistical information. This function is very helpful for doing some initial sanity checking on a DataFrame's columns. For a numeric column we are given the number of entries (the count row), the mean, the standard deviation, the minimum and maximum, and the 25th, 50th and 75th percentiles, from which the interquartile range (IQR) can be computed.
In [18]:
df2['grades'].describe()
Out[18]:
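The describe output does not include an IQR row, but it can be derived from the two quartile rows. Here is a small sketch of that; the names summary and iqr are made up for this example, and the DataFrame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

frank_grades = {
    'subject': ['Math', 'English', 'Social Studies', 'Science', 'Music', 'Art'],
    'grades': [95, 87, 80, 96, 98, 70],
}
df2 = pd.DataFrame(frank_grades)

# describe() on a numeric column returns a Series labelled
# count, mean, std, min, 25%, 50%, 75%, max.
summary = df2['grades'].describe()

# Interquartile range: 75th percentile minus 25th percentile.
iqr = summary['75%'] - summary['25%']

print(summary)
print(iqr)
```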
Here is a way to examine the DataFrame without printing the entire thing. The first printout shows the first 2 rows of the subject column. Note that you can specify how many rows you would like to print by passing the number as a parameter to the head function; without an argument, head returns the first 5 rows. The second printout shows the function being called on the entire DataFrame object.
In [19]:
print(df2['subject'].head(2))
df2.head()
Out[19]:
Here we call tail, first on the grades column and then on the whole DataFrame object. It returns the last rows (5 by default).
In [20]:
print(df2['grades'].tail())
df2.tail()
Out[20]:
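For completeness, here is a short sketch of head and tail with and without a row count. The result names are made up for illustration, and the DataFrame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

frank_grades = {
    'subject': ['Math', 'English', 'Social Studies', 'Science', 'Music', 'Art'],
    'grades': [95, 87, 80, 96, 98, 70],
}
df2 = pd.DataFrame(frank_grades)

first_two_subjects = df2['subject'].head(2)  # first 2 rows of one column
last_five = df2.tail()                       # last 5 rows (the default)
last_three = df2.tail(3)                     # last 3 rows

print(first_two_subjects)
print(last_five)
print(last_three)
```

One thing to watch for: writing df2.tail without the parentheses does not return any rows. It only refers to the method itself, so the call needs the trailing ().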
Of course, we probably could have presented this same information in a Python script with comments, but where is the fun in that? I hope that you find this page useful.