Introduction to Data Structures in Python

Variables and Assignment


In [ ]:
x = 10


In [ ]:
x = 'hello'


In [ ]:
x = 2.7


In [ ]:
y = 1.0

In [ ]:
z = x + y
print(z)
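The cells above bind the same name x to an integer, a string, and a float in turn. A minimal sketch of that rebinding, using the built-in type() to check what x refers to after each assignment (the type_after_* names are only for illustration):

```python
# A name in Python is not tied to one type; each assignment rebinds it.
x = 10
type_after_int = type(x)      # int

x = 'hello'
type_after_str = type(x)      # str

x = 2.7
type_after_float = type(x)    # float

# With x now a float, adding another float is ordinary arithmetic.
y = 1.0
z = x + y

print(type_after_int, type_after_str, type_after_float)
print(z)
```

If x were still 'hello' when the addition ran, x + y would raise a TypeError, so the order of the assignments matters.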

Lists

I want a list of numbers


In [ ]:
items = [2.7, 3.1, 69.1]

I want to print the second number in the list


In [ ]:
print(items[1])

I want to print the length of the list


In [ ]:
print(len(items))

I want to print all the numbers in the list


In [ ]:
for item in items:
    print(item)
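The list operations above can be put together in one short sketch. It reuses the same items list; the names second, last, and count are illustrative only.

```python
items = [2.7, 3.1, 69.1]

# Indexing is zero-based: index 0 is the first element, index 1 the second.
second = items[1]

# Negative indices count from the end of the list.
last = items[-1]

# len() returns the number of elements.
count = len(items)

# enumerate() yields (position, value) pairs while looping.
for position, item in enumerate(items):
    print(position, item)
```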

Introduction to Pandas

Data Frame

What is a data frame?

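A two-dimensional, labeled table: every row has an index label, and every column has a name and its own data type.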
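As a rough illustration of that structure, a data frame can be built directly from a dictionary of columns. The name sample_df and all values below are made up for this sketch; they are not the contents of sample.csv.

```python
import pandas as pd

# Hypothetical rows; the real sample.csv is not shown in this notebook.
sample_df = pd.DataFrame({
    "area": ["North", "South", "East", "West", "Central"],
    "sales2014": [120, 95, 150, 80, 110],
    "profit": [30, 12, 45, 8, 25],
    "sales2016": [140, 90, 170, 85, 115],
})

print(sample_df.shape)    # (number of rows, number of columns)
print(sample_df.dtypes)   # one data type per column
print(sample_df.head(2))  # first two rows
```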


In [ ]:
import pandas as pd

In [ ]:
df = pd.read_csv("sample.csv")

In [ ]:
df

In [ ]:
df.columns = ["area", "sales2014", "profit", "sales2016"]

In [ ]:
df.dtypes

How many rows & columns does the dataframe have?


In [ ]:
df.shape

I want to see the top 5 rows of the dataframe


In [ ]:
df.head()

In [ ]:
df1 = df.head(1)

In [ ]:
df1

I want to see the bottom 2 rows only


In [ ]:
df.tail(2)

In [ ]:
df.index

What are the column names?


In [ ]:
df.columns

Show me the values of the dataframe (excluding the column and index information)


In [ ]:
df.values
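The values attribute returns a NumPy array without the index or column labels. A small sketch on a made-up two-row frame (demo, raw, and numeric are illustrative names) shows how the array's dtype depends on the columns included:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
demo = pd.DataFrame({"area": ["North", "South"], "profit": [30, 12]})

raw = demo.values                      # NumPy array, labels dropped
numeric = demo[["profit"]].to_numpy()  # same idea, restricted to a numeric column

print(raw.dtype)      # mixed text and numbers fall back to the generic object dtype
print(numeric.dtype)  # an integer dtype
```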

Can I get a quick summary of the data in the dataframe, such as min, max, and mean?


In [ ]:
df.sales2014.describe()

I want to sort the dataframe by sales in descending order, using profit to break ties


In [ ]:
df.sort_values(by=['sales2014', "profit"], ascending=False)

I want to do the same, but based on the profit column


In [ ]:
df.sort_values(by='profit', ascending=False)

I want to find the area with least profit & sales


In [ ]:
df.sort_values(by=['sales2014', 'profit'], ascending=True)
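The sorting cells above pass either one column or a list of columns to sort_values. A minimal sketch of how a second key and the ascending flag interact, on made-up data that contains a tie in sales2014 (demo and the result names are illustrative):

```python
import pandas as pd

# Hypothetical data with a tie in sales2014 so the second key matters.
demo = pd.DataFrame({
    "area": ["North", "South", "East", "West"],
    "sales2014": [100, 150, 100, 80],
    "profit": [20, 40, 35, 5],
})

# Sort by sales2014 first; rows with equal sales are then ordered by profit.
by_sales_then_profit = demo.sort_values(by=["sales2014", "profit"], ascending=False)

# ascending also accepts one flag per key: sales descending, profit ascending.
mixed = demo.sort_values(by=["sales2014", "profit"], ascending=[False, True])

# After an ascending sort, the first row holds the smallest sales (then profit).
lowest = demo.sort_values(by=["sales2014", "profit"], ascending=True).iloc[0]

print(by_sales_then_profit)
print(mixed)
print(lowest["area"])
```

sort_values returns a new data frame and leaves the original row order untouched unless inplace=True is passed.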

I want to view the sales alone


In [ ]:
df.sales2014

I want to view just sales & profit columns, not the area names


In [ ]:
df.loc[:, ['sales2014', 'profit']]

I want the third row of the dataframe


In [ ]:
df.loc[2, :]

I want the third row, sales & profit columns


In [ ]:
df.loc[2, ['sales2014', 'profit']]

I want the rows at positions 2 and 3, and the column at position 2 only


In [ ]:
df.iloc[2:4, 2]
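The selection cells above use both loc and iloc, which treat slices differently. A short sketch on made-up data with the default integer index (demo and the result names are illustrative):

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
demo = pd.DataFrame({
    "area": ["North", "South", "East", "West", "Central"],
    "sales2014": [120, 95, 150, 80, 110],
    "profit": [30, 12, 45, 8, 25],
})

# loc is label-based and includes both ends of a slice: labels 2, 3 and 4.
by_label = demo.loc[2:4, ["sales2014", "profit"]]

# iloc is position-based and excludes the end of a slice: only position 2.
by_position = demo.iloc[2:3, 2]

# To take positions 2 and 3 with iloc, the slice has to stop at 4.
two_rows = demo.iloc[2:4, 2]

print(len(by_label), len(by_position), len(two_rows))
```

With the default index the labels and positions happen to coincide, so the only visible difference is whether the end of the slice is included.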
