In [1]:
a = 10
b = 20
c = "Hello"
print(a, b, c)
In [2]:
list_items = ["milk", "cereal", "banana", 22.5, [1,2,3]]  # A list can contain another list and items of different types
print(list_items)
print("3rd item in the list: ", list_items[2])  # Indexing is zero-based (starts at 0), so the 3rd item has index 2
Sets are like lists but store only unique items, and those items must be hashable (think basic data types like strings and ints, not lists; this will be explained later). Sets are very useful for checking whether an item is already present, because membership tests are fast. Items are not indexed, so they can only be added or removed. You will use sets when you want to keep track of unique items, e.g. the feature names in the data.
In [3]:
set_items = set([1,2,3, 1])  # the duplicate 1 is stored only once
print(set_items)
print("Is 1 in set_items: ", 1 in set_items)
print("Is 10 in set_items: ", 10 in set_items)
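A short sketch of the set operations described above: adding items, fast membership checks, removal, and what happens when you try to store an unhashable item. The feature names are made up for illustration.

```python
# Track unique (hypothetical) feature names as they are seen
seen_features = set()
for name in ["age", "sex", "age", "fare"]:
    seen_features.add(name)          # adding a duplicate has no effect

print(sorted(seen_features))         # ['age', 'fare', 'sex']
print("sex" in seen_features)        # True: fast membership test

seen_features.remove("fare")         # items can be removed (KeyError if absent)
print("fare" in seen_features)       # False

# Lists are mutable and therefore unhashable, so they cannot be set items
try:
    bad = {[1, 2]}
except TypeError as err:
    print("TypeError:", err)         # unhashable type: 'list'

# A tuple holding only hashable values can be stored
pairs = {(1, 2), (1, 2), (3, 4)}
print(len(pairs))                    # 2
```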
Dictionaries are like sets but also map a value to each unique item (the key). Essentially, a dictionary stores key-value pairs, which makes looking up an item fast. Think of a telephone directory or a shopping catalogue. Keys must be hashable, just like set items, but values can be anything. You will use dictionaries when you want to keep unique items together with related values, e.g. the words in the data and the number of times each one occurs.
In [4]:
item_details = {
    "milk": {
        "brand": "Amul",
        "quantity": 2.5,
        "cost": 10
    },
    "chocolate": {
        "brand": "Cadbury",
        "quantity": 1,
        "cost": 5
    },
}
print(item_details)
print("What is the brand of milk: ", item_details["milk"]["brand"])
print("What is the cost of chocolate: ", item_details["chocolate"]["cost"])
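The word-count use case mentioned above can be sketched with a plain dictionary; the sentence is made up. `dict.get` supplies a default of 0 for words not seen yet, and `collections.Counter` from the standard library does the same counting in one call.

```python
from collections import Counter

text = "the cat sat on the mat the end"

# Plain dict: key = word, value = how many times it has been seen so far
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1   # get() returns 0 for unseen words

print(counts["the"])        # 3
print("dog" in counts)      # False: membership is checked on keys

# collections.Counter builds the same word -> count mapping
counter = Counter(text.split())
print(counter.most_common(1))  # [('the', 3)]
```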
Using a function is handy when you need to repeat something over and over again. A function can take arguments and return values.
E.g. if you want to fetch tweets using different queries, you can define a function that takes a query and returns the tweets for that query. You can then just call the function with different queries rather than rewriting the whole code for getting the tweets.
In [5]:
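def get_items_from_file(filename):
    """Return one list of space-separated tokens for each line of the file."""
    data = []
    with open(filename) as fp:
        for line in fp:
            # Drop the trailing newline, then split the line on single spaces
            line = line.strip().split(" ")
            data.append(line)
    return data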
In [6]:
print("Data in file ../data/temp1.txt")
print(get_items_from_file("../data/temp1.txt"))
In [7]:
print("Data in file ../data/temp2.txt")
print(get_items_from_file("../data/temp2.txt"))
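The cells above read `../data/temp1.txt` and `../data/temp2.txt`, which are not included here. As a self-contained sketch, the snippet below repeats the same function and runs it on two small temporary files that it writes itself (the file names and contents are made up), showing one definition being reused for different inputs.

```python
import os
import tempfile

def get_items_from_file(filename):
    """Return one list of space-separated tokens for each line of the file."""
    data = []
    with open(filename) as fp:
        for line in fp:
            data.append(line.strip().split(" "))
    return data

# Write two small hypothetical files into a fresh temporary directory
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "milk 2\nbread 1\n"), ("b.txt", "tea 3\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fp:
        fp.write(text)
    paths.append(path)

# Same function, different inputs: no need to rewrite the reading logic
results = [get_items_from_file(p) for p in paths]
print(results)  # [[['milk', '2'], ['bread', '1']], [['tea', '3']]]
```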
In [8]:
from scipy.io import arff
In [9]:
data, meta = arff.loadarff("../data/iris.arff")
In [10]:
data.shape, meta
Out[10]:
In [11]:
data[0]
Out[11]:
In [12]:
import pandas as pd
In [13]:
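df_iris = pd.DataFrame(data, columns=meta.names())
# Under Python 3, scipy's loadarff returns nominal values (the class labels) as bytes; decode them to str
df_iris["class"] = df_iris["class"].apply(lambda v: v.decode("utf-8") if isinstance(v, bytes) else v)
df_iris.head()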
Out[13]:
In [14]:
print("The shape of the iris data is:", df_iris.shape)
In [15]:
print("Number of instances in each class:")
df_iris["class"].value_counts()
Out[15]:
In [16]:
df_iris["sepallength"].hist(bins=10)
Out[16]:
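Filtering data
Filtering parts of the data in pandas is easy: index the DataFrame with a boolean condition on one of its columns. If you want to modify the filtered rows, make an explicit copy of them first with .copy(); otherwise pandas may raise a SettingWithCopyWarning and your changes may not behave as expected.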
In [17]:
print("Rows with petalwidth > 2.0:")
df_iris[df_iris["petalwidth"] > 2.0]
Out[17]:
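A minimal sketch of boolean filtering on a small made-up DataFrame that mimics the iris columns (the values are not from the iris file). It shows the mask produced by a comparison, an explicit `.copy()` before modifying the filtered rows, and how to combine conditions with `&`, wrapping each condition in parentheses.

```python
import pandas as pd

# Toy rows standing in for the iris columns used above (values are made up)
toy = pd.DataFrame({
    "petalwidth": [0.2, 2.3, 1.8, 2.5],
    "class": ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-virginica"],
})

mask = toy["petalwidth"] > 2.0      # boolean Series, one True/False per row
wide = toy[mask].copy()             # explicit copy, safe to modify on its own
wide["petalwidth_mm"] = wide["petalwidth"] * 10
print(wide)

# Combine conditions with & (and) or | (or); parentheses are required
both = toy[(toy["petalwidth"] > 1.5) & (toy["class"] == "Iris-virginica")]
print(len(both))                    # 2
```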
VARIABLE DESCRIPTIONS:
survival    Survival (0 = No; 1 = Yes)
pclass      Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name        Name
sex         Sex
age         Age
sibsp       Number of Siblings/Spouses Aboard
parch       Number of Parents/Children Aboard
ticket      Ticket Number
fare        Passenger Fare
cabin       Cabin
embarked    Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES): 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower.

Age is in years; it is fractional if the age is less than one (1). If the age is estimated, it is in the form xx.5.

With respect to the family relation variables (i.e. sibsp and parch), some relations were ignored. The following are the definitions used for sibsp and parch:
Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbors in a village; however, the definitions do not support such relations.
In [18]:
df = pd.read_csv("../data/titanic.csv")
df.shape
Out[18]:
In [19]:
df.head()
Out[19]:
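The plotting cells below use columns named class, embark_town, alive and alone, which do not appear in the variable descriptions above. They look like the extra columns in seaborn's copy of the Titanic data (`seaborn.load_dataset("titanic")`); assuming that, they can be read as relabelled or derived versions of the base columns. The sketch below rebuilds them from a few made-up rows. The label mappings are assumptions based on the codes in the variable descriptions, not read from the CSV used here.

```python
import pandas as pd

# A few made-up rows with the base columns from the variable descriptions
base = pd.DataFrame({
    "survived": [0, 1, 1],
    "pclass": [3, 1, 2],
    "embarked": ["S", "C", "Q"],
    "sibsp": [1, 0, 0],
    "parch": [0, 0, 2],
})

# Assumed mappings, mirroring the codes listed in the variable descriptions
base["class"] = base["pclass"].map({1: "First", 2: "Second", 3: "Third"})
base["embark_town"] = base["embarked"].map(
    {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"})
base["alive"] = base["survived"].map({0: "no", 1: "yes"})
# Assumed definition: alone = no siblings/spouses and no parents/children aboard
base["alone"] = (base["sibsp"] + base["parch"]) == 0

print(base[["class", "embark_town", "alive", "alone"]])
```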
In [20]:
# We need the line below to show plots directly in the notebook.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
In [21]:
sns.set_style("ticks")
sns.set_context("paper")
In [22]:
colors = {
"Iris-setosa": "red",
"Iris-versicolor": "green",
"Iris-virginica": "blue",
}
plt.scatter(df_iris.petallength, df_iris.petalwidth, c=[colors[label] for label in df_iris["class"]])  # one color per point, looked up from its class
plt.xlabel("petallength")
plt.ylabel("petalwidth")
Out[22]:
In [23]:
sns.lmplot(x="petallength", y="petalwidth", hue="class", data=df_iris, fit_reg=False)
Out[23]:
In [24]:
sns.pairplot(df_iris, hue="class")
Out[24]:
In [25]:
sns.countplot(x="sex", data=df)
Out[25]:
In [26]:
sns.countplot(x="class", data=df)
Out[26]:
In [27]:
sns.countplot(x="embark_town", data=df)
Out[27]:
In [28]:
sns.countplot(x="alive", data=df)
Out[28]:
In [29]:
sns.countplot(x="alone", data=df)
Out[29]:
In [30]:
sns.lmplot(x="age", y="survived", hue="sex", data=df, fit_reg=True, logistic=True)
Out[30]:
In [31]:
sns.barplot(x="sex", y="survived", hue="embark_town", data=df)
Out[31]:
In [32]:
sns.barplot(x="sex", y="survived", hue="class", data=df)
Out[32]:
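By default `sns.barplot` draws, for each x/hue group, the mean of y plus an error bar. Because survived is coded 0/1, each bar in these plots is the survival rate of that group. The bar heights can be cross-checked with a pandas groupby; the toy rows below are made up.

```python
import pandas as pd

# Made-up passengers; survived is 0/1, so its mean per group is a survival rate
toy = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female"],
    "class": ["Third", "First", "Third", "First", "Third"],
    "survived": [0, 1, 0, 1, 1],
})

# Same grouping as barplot(x="sex", y="survived", hue="class")
rates = toy.groupby(["sex", "class"])["survived"].mean()
print(rates)

# Grouping by sex alone gives the bar heights without a hue
print(toy.groupby("sex")["survived"].mean())
```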
In [33]:
sns.barplot(x="sex", y="survived", hue=pd.cut(df.age, bins=[0,18,30,100]), data=df)
Out[33]:
In [34]:
sns.barplot(x="sex", y="survived", hue="alone", data=df)
Out[34]:
In [35]:
sns.barplot(x="sex", y="survived", hue=pd.cut(df.sibsp, bins=[0,1,2,3,10], right=False), data=df)  # left-closed bins [0, 1), [1, 2), ... so sibsp == 0 is kept
Out[35]:
In [36]:
sns.barplot(x="sex", y="survived", hue=pd.cut(df.parch, bins=[0,1,2,3,10], right=False), data=df)  # left-closed bins so parch == 0 is kept
Out[36]:
In [37]:
sns.barplot(x="sex", y="age", hue=pd.cut(df.parch, bins=[0,1,2,3,10], right=False), data=df)  # left-closed bins so parch == 0 is kept
Out[37]:
In [38]:
sns.barplot(x="sex", y="age", hue=pd.cut(df.sibsp, bins=[0,1,2,3,10], right=False), data=df)  # left-closed bins so sibsp == 0 is kept
Out[38]:
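The sibsp and parch cells above bin counts with `pd.cut`. By default `pd.cut` builds right-closed intervals, so with bins=[0, 1, 2, 3, 10] a count of 0 falls outside every bin and becomes NaN, and those rows then drop out of the hue groups. Passing `right=False` gives left-closed intervals [0, 1), [1, 2), [2, 3), [3, 10) instead. The counts below are made up.

```python
import pandas as pd

sibsp = pd.Series([0, 0, 1, 2, 4])

# Default: (0, 1], (1, 2], (2, 3], (3, 10]  -> zeros are not in any bin
default_bins = pd.cut(sibsp, bins=[0, 1, 2, 3, 10])
# right=False: [0, 1), [1, 2), [2, 3), [3, 10)  -> zeros land in [0, 1)
left_closed = pd.cut(sibsp, bins=[0, 1, 2, 3, 10], right=False)

print(default_bins.isna().sum())   # 2
print(left_closed.isna().sum())    # 0
```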
In [39]:
sns.barplot(x="sex", y="age", hue="embark_town", data=df)
Out[39]:
In [40]:
sns.barplot(x="sex", y="age", hue="class", data=df)
Out[40]:
In [ ]:
ANSWER BELOW
In [41]:
sns.barplot(x="class", y="petalwidth", hue=pd.cut(df_iris.petallength, bins=[0, 2.5, 4.5, 6.5, 10]), data=df_iris)
Out[41]:
In [ ]: