pandas provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
pandas is well suited for many different kinds of data, including tabular data, time series, and labeled matrix data.
In this section we look at some of the basic functions that pandas provides.
Standard libraries
In [ ]:
import pandas as pd
import numpy as np
In [ ]:
s = pd.Series([1,3,5,np.nan,6,8])
s
In [ ]:
dates = pd.date_range('20130101', periods=6)
dates
In [ ]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df
Create a DataFrame by passing a dict of objects that can each be converted to a Series-like column.
In [ ]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2
In [ ]:
df2.dtypes
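The dict constructor above mixes scalars with array-like values. As a minimal sketch of what happens, assuming a hypothetical frame named `small` that is not part of the original notebook, scalars are repeated to match the length implied by the array-like entries, and each column keeps its own dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: the 3-element array sets the number of rows,
# and the scalar values are repeated on every row.
small = pd.DataFrame({
    "A": 1.0,                                    # scalar float
    "B": np.array([10, 20, 30], dtype="int32"),  # array-like, defines the length
    "C": "foo",                                  # scalar string
})

print(small)
print(small.dtypes)
```

If every value in the dict is a scalar, pandas has no length to infer and an explicit `index` has to be supplied.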
In [ ]:
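# pd.Panel was removed in pandas 0.25; a MultiIndex DataFrame holds the same 3-D data
wp_index = pd.MultiIndex.from_product(
    [['Item1', 'Item2'], pd.date_range('1/1/2000', periods=5)],
    names=['items', 'major_axis'])
wp = pd.DataFrame(np.random.randn(10, 4), index=wp_index, columns=['A', 'B', 'C', 'D'])
wp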
In [ ]:
df.head()
In [ ]:
df.tail(3)
View the index, columns, and underlying NumPy data
In [ ]:
df.index
In [ ]:
df.columns
In [ ]:
df.values
Statistical summary
In [ ]:
df.describe()
Transposing data
In [ ]:
df.T
Sorting
In [ ]:
df.sort_index(axis=1, ascending=False)
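Sorting by axis labels, as above, is one option; sorting by the values in a column is the other common case. The sketch below rebuilds a frame shaped like `df` with a fixed seed (the seed and the names `by_b` and `by_cols` are assumptions for illustration) and shows both:

```python
import numpy as np
import pandas as pd

# Rebuild a 6x4 frame like df, with a fixed seed so the result is repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Sort rows by the values in column B (ascending by default).
by_b = df.sort_values(by="B")

# Sort columns by their labels in descending order: D, C, B, A.
by_cols = df.sort_index(axis=1, ascending=False)

print(by_b)
print(by_cols.columns.tolist())
```

Both calls return a new DataFrame and leave `df` unchanged.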
Get a column
In [ ]:
df['A']
Get rows
In [ ]:
# By position (integer slice)
df[0:3]
In [ ]:
# By index label (date range, both endpoints included)
df['20130102':'20130104']
In [ ]:
df.loc[dates[0]]
In [ ]:
# Limit columns
df.loc[:,['A','B']]
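The selections above mix label-based and position-based access. A small sketch of the difference, using an integer-valued frame so the results are easy to check (the names `by_label` and `by_position` are illustrative only):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# .loc selects by label; a label slice includes both endpoints.
by_label = df.loc["20130102":"20130104", ["A", "B"]]

# .iloc selects by integer position; the stop position is excluded.
by_position = df.iloc[1:4, 0:2]

print(by_label)
print(by_label.equals(by_position))
```

Here both expressions pick the same three rows and two columns, which is why the endpoints differ between the two slices.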
In [ ]:
df_stock = pd.DataFrame({'Stocks': ["AAPL", "CA", "CTXS", "FIS", "MA"],
                         'Values': [126.17, 31.85, 65.38, 64.08, 88.72]})
df_stock
In [ ]:
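# DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead
df_stock = pd.concat([df_stock, pd.DataFrame([{"Stocks": "GOOG", "Values": 523.53}])], ignore_index=True)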
df_stock
In [ ]:
df_stock[df_stock["Values"]>65]
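The filter above passes a boolean Series to `[]`. As a hedged sketch of how such masks can be combined, the example below rebuilds the original five-row stock table and applies two conditions at once (the thresholds and the names `mask`, `mid_range`, and `selected` are chosen only for illustration):

```python
import pandas as pd

df_stock = pd.DataFrame({
    "Stocks": ["AAPL", "CA", "CTXS", "FIS", "MA"],
    "Values": [126.17, 31.85, 65.38, 64.08, 88.72],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mask = (df_stock["Values"] > 65) & (df_stock["Values"] < 100)
mid_range = df_stock[mask]

# isin() builds a mask from a list of allowed values.
selected = df_stock[df_stock["Stocks"].isin(["AAPL", "MA"])]

print(mid_range)
print(selected)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used.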
In [ ]:
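# numeric_only=True skips the string column, which pandas 2.0+ no longer drops silently
df_stock.mean(numeric_only=True)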
In [ ]:
# Per column
df.mean()
In [ ]:
# Per row
df.mean(axis=1)
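To make the axis convention concrete, here is a small sketch with hand-checkable numbers (the frame `df_small` and the Series `with_nan` are hypothetical and not part of the original notebook):

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# axis=0 (the default) collapses the rows: one mean per column.
col_means = df_small.mean(axis=0)

# axis=1 collapses the columns: one mean per row.
row_means = df_small.mean(axis=1)

# Missing values are skipped by default (skipna=True).
with_nan = pd.Series([1.0, np.nan, 3.0])
nan_mean = with_nan.mean()

print(col_means)
print(row_means)
print(nan_mean)
```

The result of `mean` is indexed by whatever axis was kept: column labels for `axis=0`, row labels for `axis=1`.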
In [ ]:
big_dates = pd.date_range('20130101', periods=60000)
big_dates
big_df = pd.DataFrame(np.random.randn(60000,4), index=big_dates, columns=list('ABCD'))
big_df
In [ ]:
big_df['20200102':'20200104']
In [ ]:
big_df.loc['20200102':'20200104']
In [ ]:
%timeit big_df['20200102':'20200104']
%timeit big_df.loc['20200102':'20200104']
In [ ]:
big_df[30000:30003]
In [ ]:
big_df.iloc[30000:30003]
In [ ]:
%timeit big_df[30000:30003]
%timeit big_df.iloc[30000:30003]
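`%timeit` is an IPython magic and is only available inside a notebook or IPython shell. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead (the repetition count of 200 and the names `t_plain` and `t_iloc` are arbitrary choices for illustration):

```python
import timeit

import numpy as np
import pandas as pd

big_dates = pd.date_range("20130101", periods=60000)
big_df = pd.DataFrame(np.random.randn(60000, 4), index=big_dates, columns=list("ABCD"))

# Total time in seconds for 200 repetitions of each positional slice.
t_plain = timeit.timeit(lambda: big_df[30000:30003], number=200)
t_iloc = timeit.timeit(lambda: big_df.iloc[30000:30003], number=200)

print(f"[]    : {t_plain / 200 * 1e6:.1f} microseconds per call")
print(f".iloc : {t_iloc / 200 * 1e6:.1f} microseconds per call")
```

Absolute timings depend on the machine and the pandas version, so the numbers are best read as a relative comparison on one system.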