pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
In [1]:
import pandas as pd
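pandas is well suited for many different kinds of data:
- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational or statistical data sets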
The two primary data structures in pandas are:
- Series (1-dimensional)
- DataFrame (2-dimensional)
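As a minimal illustration (the names and ages are borrowed from the dog data used later in this notebook), a Series is a one-dimensional labeled array, and each column of a DataFrame is itself a Series that shares the frame's row index:

```python
import pandas as pd

# A Series: one-dimensional values with labels (here an explicit string index)
ages = pd.Series([3, 8, 6], index=['Pedro', 'Clementine', 'Norah'], name='age')
print(ages['Norah'])  # label-based access -> 6

# A DataFrame: two-dimensional; every column is a Series sharing one row index
df = pd.DataFrame(
    {
        'breed': ['Doberman', 'Golden Retriever', 'Great Dane'],
        'age': [3, 8, 6],
    },
    index=['Pedro', 'Clementine', 'Norah'],
)

print(type(df['age']))                   # <class 'pandas.core.series.Series'>
print(df['age'].index.equals(df.index))  # True
```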
In [2]:
import pandas as pd
import numpy as np
ndarray = np.array(['a', 'b', 'c', 'd'])
series = pd.Series(ndarray)  # no index given, so pandas assigns 0, 1, 2, 3
print(series)
In [3]:
dog_data = [
    ['Pedro', 'Doberman', 3],
    ['Clementine', 'Golden Retriever', 8],
    ['Norah', 'Great Dane', 6],
    ['Mabel', 'Australian Shepherd', 1],
    ['Bear', 'Maltese', 4],
    ['Bill', 'Great Dane', 10],
]
In [4]:
dog_df=pd.DataFrame(dog_data,columns=['name','breed','age'])
dog_df
Out[4]:
In [5]:
print(type(dog_df['age'].iloc[0]))
In [6]:
dog_df.head()
Out[6]:
In [7]:
dog_df.tail(3)
Out[7]:
In [8]:
dog_df.shape
Out[8]:
In [9]:
len(dog_df)
Out[9]:
In [10]:
dog_df.columns
Out[10]:
In [11]:
dog_df.dtypes
Out[11]:
In [12]:
dog_df.values
Out[12]:
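The pandas documentation recommends DataFrame.to_numpy() over .values for getting a NumPy array. A small sketch (using two rows of the dog data) of how column types affect the result: mixing strings and integers forces a single common dtype, object, while selecting only numeric columns keeps an integer dtype:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Bear', 'Maltese', 4]],
    columns=['name', 'breed', 'age'],
)

# Mixed string/int columns are combined into one array of dtype object
arr = dog_df.to_numpy()
print(arr.dtype, arr.shape)

# Selecting only the numeric column keeps an integer dtype (typically int64)
ages = dog_df[['age']].to_numpy()
print(ages.dtype, ages.shape)
```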
DataFrame.describe(percentiles=None, include=None, exclude=None)
In [13]:
dog_df.describe()
Out[13]:
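The percentiles and include parameters in the signature above change what describe() reports. A short sketch on the same dog data: custom percentiles always get the median (50%) added, and include='all' also summarizes string columns with count, unique, top (most frequent value) and freq:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Clementine', 'Golden Retriever', 8],
     ['Norah', 'Great Dane', 6], ['Mabel', 'Australian Shepherd', 1],
     ['Bear', 'Maltese', 4], ['Bill', 'Great Dane', 10]],
    columns=['name', 'breed', 'age'],
)

# Only the numeric 'age' column is summarized; 50% is inserted automatically
numeric_summary = dog_df.describe(percentiles=[0.1, 0.9])
print(numeric_summary)

# include='all' adds the object columns as well
full_summary = dog_df.describe(include='all')
print(full_summary.loc[['count', 'unique', 'top', 'freq'], ['breed']])
```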
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
In [14]:
dog_df['breed'].value_counts()
Out[14]:
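The normalize and ascending parameters of value_counts() are easy to try on the breed column alone. In this sketch, normalize=True returns relative frequencies that sum to 1.0, and ascending=True lists the rarest values first:

```python
import pandas as pd

breeds = pd.Series(
    ['Doberman', 'Golden Retriever', 'Great Dane',
     'Australian Shepherd', 'Maltese', 'Great Dane'],
    name='breed',
)

counts = breeds.value_counts()                 # absolute counts, most frequent first
shares = breeds.value_counts(normalize=True)   # proportions instead of counts
rarest_first = breeds.value_counts(ascending=True)

print(counts)
print(shares)
```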
In [15]:
dog_df[['name','age']]
Out[15]:
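The double brackets above matter: a single column label returns a Series, while a list of labels (even a one-element list) returns a DataFrame. A minimal check on two rows of the dog data:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Bear', 'Maltese', 4]],
    columns=['name', 'breed', 'age'],
)

age_series = dog_df['age']             # one label -> Series
name_age_df = dog_df[['name', 'age']]  # list of labels -> DataFrame
single_col_df = dog_df[['age']]        # one-element list -> still a DataFrame
```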
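iloc selects rows and columns purely by integer position. Allowed inputs are:
- An integer, e.g. 5
- A list or array of integers, e.g. [4, 3, 0]
- A slice object with ints, e.g. 1:7
- A boolean array
- A callable with one argument (the calling Series or DataFrame) that returns valid output for indexing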
In [16]:
dog_df.iloc[2:4]
Out[16]:
In [17]:
dog_df.iloc[1:4, 0:2]
Out[17]:
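A sketch of the iloc input kinds on the dog data, plus the label-based loc for contrast. Two details worth noting: iloc needs a plain boolean array (hence .to_numpy() on the mask), and a loc slice includes its end label while an iloc slice excludes its end position:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Clementine', 'Golden Retriever', 8],
     ['Norah', 'Great Dane', 6], ['Mabel', 'Australian Shepherd', 1],
     ['Bear', 'Maltese', 4], ['Bill', 'Great Dane', 10]],
    columns=['name', 'breed', 'age'],
)

row = dog_df.iloc[2]                     # integer -> one row as a Series
rows = dog_df.iloc[[0, 5]]               # list of integers -> DataFrame
window = dog_df.iloc[1:4, 0:2]           # slices: positions 1-3, columns 0-1
mask = (dog_df['age'] > 5).to_numpy()    # plain boolean array, same length as the frame
old_dogs = dog_df.iloc[mask]
last_two = dog_df.iloc[lambda df: [len(df) - 2, len(df) - 1]]  # callable returning positions

# loc is label-based; the slice 1:3 includes label 3, and 'name':'breed' includes 'breed'
by_label = dog_df.loc[1:3, 'name':'breed']
```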
In [18]:
dog_df[dog_df['breed'].isin(['Great Dane', 'Maltese'])]
Out[18]:
In [19]:
dog_df[dog_df['name']=='Norah']
Out[19]:
In [20]:
dog_df[(dog_df['name']=='Bill') & (dog_df['breed']=='Great Dane')]
Out[20]:
In [21]:
dog_df[dog_df['age']<5]
Out[21]:
In [22]:
dog_df[dog_df['breed'].str.contains('G')]
Out[22]:
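The filters above can be extended with | (or) and ~ (not). Each comparison needs its own parentheses, because in Python the & and | operators bind more tightly than ==, < and >. A sketch on the same data, also showing that str.contains is case-sensitive by default and that DataFrame.query can express the same filter as a string:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Clementine', 'Golden Retriever', 8],
     ['Norah', 'Great Dane', 6], ['Mabel', 'Australian Shepherd', 1],
     ['Bear', 'Maltese', 4], ['Bill', 'Great Dane', 10]],
    columns=['name', 'breed', 'age'],
)

# OR of two conditions, each wrapped in parentheses
great_danes_or_young = dog_df[(dog_df['breed'] == 'Great Dane') | (dog_df['age'] < 3)]

# NOT: invert a boolean mask with ~
not_great_danes = dog_df[~(dog_df['breed'] == 'Great Dane')]

# Lowercase 'd' only vs. 'd' or 'D'
has_d_case_sensitive = dog_df[dog_df['breed'].str.contains('d')]
has_d_any_case = dog_df[dog_df['breed'].str.contains('d', case=False)]

# Same rows as dog_df[dog_df['age'] < 5]
young = dog_df.query('age < 5')
```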
In [23]:
owner_data=[['Bilbo','Pedro'],['Gandalf','Bear'],['Sam','Bill']]
owner_df=pd.DataFrame(owner_data,columns=['owner_name','dog_name'])
In [24]:
df=pd.merge(owner_df,dog_df,left_on='dog_name',right_on='name',how='inner')
In [25]:
df
Out[25]:
More details on merge parameters:
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
Merge method | SQL Join Name | Description |
---|---|---|
left | LEFT OUTER JOIN | Use keys from left frame only |
right | RIGHT OUTER JOIN | Use keys from right frame only |
outer | FULL OUTER JOIN | Use union of keys from both frames |
inner | INNER JOIN | Use intersection of keys from both frames |
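To see which merge method kept which rows, merge also accepts indicator=True, which adds a categorical _merge column with the values 'left_only', 'right_only' or 'both'. The validate argument checks key uniqueness and raises a MergeError on violation. A sketch using the owner and dog tables from this notebook:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Clementine', 'Golden Retriever', 8],
     ['Norah', 'Great Dane', 6], ['Mabel', 'Australian Shepherd', 1],
     ['Bear', 'Maltese', 4], ['Bill', 'Great Dane', 10]],
    columns=['name', 'breed', 'age'],
)
owner_df = pd.DataFrame(
    [['Bilbo', 'Pedro'], ['Gandalf', 'Bear'], ['Sam', 'Bill']],
    columns=['owner_name', 'dog_name'],
)

outer_df = owner_df.merge(
    dog_df,
    left_on='dog_name',
    right_on='name',
    how='outer',
    indicator=True,         # records where each row's key was found
    validate='one_to_one',  # each key must be unique on both sides
)
print(outer_df[['owner_name', 'name', '_merge']])
```

Dogs without an owner show up as 'right_only'; filtering on _merge reproduces the left, right and inner results from this single outer merge.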
In [26]:
inner_df = owner_df.merge(dog_df, left_on='dog_name', right_on='name', how='inner')
In [27]:
inner_df
Out[27]:
In [28]:
inner_df=inner_df.drop(['name'],axis=1)
In [29]:
inner_df
Out[29]:
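Merging with left_on and right_on keeps both key columns, which is why the duplicated name column has to be dropped afterwards. One alternative, sketched here, is to rename the key first so both frames share a column name and merge with on=, which keeps a single key column. Both variants are shown for comparison:

```python
import pandas as pd

dog_df = pd.DataFrame(
    [['Pedro', 'Doberman', 3], ['Clementine', 'Golden Retriever', 8],
     ['Norah', 'Great Dane', 6], ['Mabel', 'Australian Shepherd', 1],
     ['Bear', 'Maltese', 4], ['Bill', 'Great Dane', 10]],
    columns=['name', 'breed', 'age'],
)
owner_df = pd.DataFrame(
    [['Bilbo', 'Pedro'], ['Gandalf', 'Bear'], ['Sam', 'Bill']],
    columns=['owner_name', 'dog_name'],
)

# Rename first, then merge on the shared column name
merged = owner_df.rename(columns={'dog_name': 'name'}).merge(dog_df, on='name', how='inner')

# Merge on differently named keys, then drop the redundant column
dropped = owner_df.merge(dog_df, left_on='dog_name', right_on='name', how='inner').drop(columns=['name'])

print(merged.columns.tolist())
print(dropped.columns.tolist())
```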
In [30]:
left_df = owner_df.merge(dog_df, left_on='dog_name', right_on='name', how='left')
In [31]:
left_df
Out[31]:
In [32]:
right_df = owner_df.merge(dog_df, left_on='dog_name', right_on='name', how='right')
In [33]:
right_df
Out[33]:
In [34]:
outer_df = owner_df.merge(dog_df, left_on='dog_name', right_on='name', how='outer')
In [35]:
outer_df
Out[35]:
In [36]:
df=df.drop(['name'],axis=1)
In [37]:
df
Out[37]:
In [38]:
import matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
In [39]:
# Will allow us to embed images in the notebook
%matplotlib inline
In [40]:
plot_df = pd.DataFrame({
'col1': [1, 3, 2, 4],
'col2': [3, 6, 5, 1],
'col3': [4, 7, 6, 2],
})
matplotlib.pyplot.plot(*args, scalex=True, scaley=True, data=None, **kwargs)
In [41]:
plot_df.plot()
Out[41]:
In [42]:
plot_df.plot(kind='box')
Out[42]:
In [43]:
plot_df.plot(kind='bar')
Out[43]:
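DataFrame.plot returns a matplotlib Axes and accepts an ax= argument, so several chart kinds can share one figure. The sketch below assumes the non-interactive Agg backend so it also runs outside a notebook, and writes the figure to an in-memory PNG buffer rather than to a file:

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

plot_df = pd.DataFrame({
    'col1': [1, 3, 2, 4],
    'col2': [3, 6, 5, 1],
    'col3': [4, 7, 6, 2],
})

# One row of three panels; each DataFrame.plot call draws into the Axes given by ax=
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, kind in zip(axes, ['line', 'box', 'bar']):
    plot_df.plot(kind=kind, ax=ax, title=kind)

fig.tight_layout()

# Save to memory instead of disk, then release the figure
buf = io.BytesIO()
fig.savefig(buf, format='png')
plt.close(fig)
```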