Pandas

CLEPY - August Module of the month

Anurag Saxena

@_asaxena

Pandas - Python Data Analysis Library

pandas.pydata.org

Open Source

High Performance

Easy to use Data Structures and Data Analysis Tools


In [ ]:
import pandas as pd
import numpy as np

There are two data structures in pandas that you will encounter the most:

- Series

- DataFrame

Series

A one-dimensional, array-like object that holds a sequence of values together with an index labelling each value


In [ ]:
obj = pd.Series([1,3,4,5,6,7,8,9])
obj

In [ ]:
# You can define a custom index for series data

obj_c = pd.Series([1,2,4], index=['a','b','c'])
obj_c
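A custom index lets you look values up by label rather than by position. The following is a minimal sketch that reuses the same values as `obj_c`; the names `s`, `b_value`, `subset` and `greater_than_one` are introduced here only for illustration.

```python
import pandas as pd

# Same data as obj_c: three values labelled 'a', 'b', 'c'
s = pd.Series([1, 2, 4], index=['a', 'b', 'c'])

# Look up a single value by its label
b_value = s['b']

# Select several labels at once; the result is another Series
subset = s[['a', 'c']]

# Boolean filtering keeps the matching index labels
greater_than_one = s[s > 1]

print(b_value)
print(subset.tolist())
print(list(greater_than_one.index))
```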

DataFrame

Represents a tabular, spreadsheet-like data structure.

It has both a row index and a column index.

You can think of it as a dict of Series that all share the same index.

If you work with pandas, you will spend most of your time working with DataFrames.
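To make the "dict of Series" picture concrete, here is a small sketch. The column names and values (`population`, `area`) are made up for illustration. When the Series do not carry exactly the same labels, pandas aligns them on the union of the labels and fills the gaps with NaN.

```python
import pandas as pd

# Two hypothetical Series; 'area' has no entry for label 'z'
population = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
area = pd.Series([1.5, 2.5], index=['x', 'y'])

# Build a DataFrame from a dict of Series; rows are aligned by index label
df = pd.DataFrame({'population': population, 'area': area})

# Each column of the DataFrame is itself a Series
col = df['population']

print(type(col))
print(df)
```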


In [ ]:
df_1 = pd.DataFrame(np.random.randint(0,100,size=(100,4)), columns=list('ABCD'))

print(type(df_1))
df_1.head()

In [ ]:
df_1.tail(3)

In [ ]:
df_1.index

In [ ]:
df_1.columns

In [ ]:
df_1.values

In [ ]:
df_1.describe()

In [ ]:
df_1.mean()

In [ ]:
df_1

In [ ]:
df_1['A']

In [ ]:
df_1[0:3]
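The two selections above pick a column by name and rows by position. For selecting rows and columns together, pandas also offers `.iloc` (position-based) and `.loc` (label-based). A brief sketch on a small deterministic frame (the frame `df` and the result names are illustrative, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Small deterministic frame: 5 rows, columns A-D, values 0..19
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('ABCD'))

col_a = df['A']                      # single column -> Series
first_rows = df[0:3]                 # row slice by position -> DataFrame

# iloc: purely positional, end of slice is excluded
by_position = df.iloc[0:3, 0:2]

# loc: label-based, end of slice is included
by_label = df.loc[0:3, ['A', 'B']]

print(first_rows.shape, by_position.shape, by_label.shape)
```

Note that `df.loc[0:3]` returns four rows here because label slices include the end label, while `df.iloc[0:3]` returns three.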

Reading data from File

Pandas can read and write to a multitude of file formats
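As an offline sketch of the read/write round trip, the example below writes a tiny DataFrame to CSV in memory and reads it back, then serialises the same data to JSON. The frame contents and the names `buffer`, `round_trip` and `json_text` are assumptions for illustration; `to_csv`, `read_csv` and `to_json` are standard pandas calls.

```python
import io
import pandas as pd

# A tiny hypothetical frame to write out and read back
df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})

# Write CSV into an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV back into a new DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# The same data as a JSON string, one object per row
json_text = df.to_json(orient='records')

print(round_trip)
print(json_text)
```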


In [ ]:
import time
t_start = time.time()
apple_df = pd.read_csv('https://raw.githubusercontent.com/matplotlib/sample_data/master/aapl.csv')
t_end = time.time()
print(t_end - t_start)
apple_df.head()

In [ ]:
apple_describe_time_start = time.time()
apple_df.describe()
apple_describe_time_end = time.time()

In [ ]:
print(apple_describe_time_end - apple_describe_time_start)

In [ ]:
# pd.set_eng_float_format(accuracy=2, use_eng_prefix=True)
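# numeric_only=True skips non-numeric columns (e.g. a text date column);
# recent pandas versions raise a TypeError on them otherwise
apple_df.mean(numeric_only=True)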

Check out "10 minutes to pandas" to get started quickly

https://pandas.pydata.org/pandas-docs/stable/10min.html

Book to check out if you want to work on this more

Thanks

Questions?