(from the pandas docs: https://pandas.pydata.org/pandas-docs/stable/indexing.html)
The axis labeling information in pandas objects serves many purposes. Data can be accessed through it with:
indexing operators: []
attribute operators: .
Example:
In [3]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df["A"] #indexing
Out[3]:
In [4]:
df.A #attribute
Out[4]:
In [5]:
type(df.A)
Out[5]:
In [6]:
df.A[0]
Out[6]:
In [7]:
df[["A","B"]]
Out[7]:
In [9]:
type(df[["A","B"]])
Out[9]:
The most robust way to slice DataFrames is with the .loc and .iloc indexers; however, plain [] slicing also works on a Series:
In [11]:
s = df["A"]
s[:5]
Out[11]:
In [18]:
s[::2]
Out[18]:
In [19]:
s[::-1]
Out[19]:
Watch out: plain [] indexing is inconsistent on a DataFrame. A slice such as df[:3] selects rows, while a label such as df["A"] selects a column. That is why .loc is said to provide more coherent behaviour.
In [22]:
df[:3] # for convenience as it is a common use
Out[22]:
In [21]:
df["A"]
Out[21]:
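The asymmetry between the two cells above can be made concrete with a small sketch. The frame `demo` and its string row labels are hypothetical, chosen only so that labels and positions differ; the calls themselves are the standard `[]`, `.loc` and `.iloc` accessors.

```python
import pandas as pd

# Hypothetical frame with string row labels, so labels and positions differ.
demo = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
                    index=["w", "x", "y", "z"])

col = demo["A"]                # a single label inside [] selects a column (Series)
rows = demo[:2]                # a slice inside [] selects rows by position
by_label = demo.loc["w":"y"]   # .loc slices by label, end label included
by_pos = demo.iloc[0:2]        # .iloc slices by position, end excluded
```

Note that the label slice returns three rows while the positional slices return two, which is one reason to prefer the explicit indexers.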
The .loc attribute is the primary access method. The following are valid inputs:
In [24]:
df.loc[:,"B":]
Out[24]:
In [30]:
df.loc[4:,"C":] = 0
df
Out[30]:
Boolean indexing
In [42]:
df.loc[:,"A"]>0
Out[42]:
In [43]:
df.loc[df.loc[:,"A"]>0]
Out[43]:
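Boolean masks can also be combined. The following is a minimal sketch on a hypothetical frame `demo` with fixed values (not the random `df` above), showing the element-wise operators `&`, `|` and `~`.

```python
import pandas as pd

# Hypothetical data with known signs, so the result is predictable.
demo = pd.DataFrame({"A": [-1.0, 2.0, 3.0, -4.0],
                     "B": [1.0, -2.0, 3.0, 4.0]})

# Each comparison needs its own parentheses: & and | bind tighter than > and <.
both_positive = demo.loc[(demo["A"] > 0) & (demo["B"] > 0)]
either_negative = demo.loc[(demo["A"] < 0) | (demo["B"] < 0)]

# A row mask can be paired with a column label in the same .loc call.
b_where_a_not_positive = demo.loc[~(demo["A"] > 0), "B"]
```

Python's `and`, `or` and `not` do not work here because they ask for a single truth value rather than operating element by element.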
Accessing by position with .iloc
In [44]:
df.iloc[0,0]
Out[44]:
In [45]:
df.iloc[3:,2:]
Out[45]:
In [46]:
df.iloc[[0,1,3],[1,3]]
Out[46]:
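With the default 0..n-1 index used in this notebook, `.loc` and `.iloc` often give the same rows, which can hide the difference. A small sketch, assuming a hypothetical frame `demo` whose integer labels are deliberately shuffled:

```python
import pandas as pd

# Hypothetical frame: row labels are integers, but not in positional order.
demo = pd.DataFrame({"A": [10, 20, 30]}, index=[2, 0, 1])

first_by_position = demo.iloc[0, 0]    # first row, first column
row_labelled_zero = demo.loc[0, "A"]   # the row whose label is 0
```

Here `first_by_position` is 10 and `row_labelled_zero` is 20, because `.iloc` counts positions while `.loc` looks up labels.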
Selection by callable
In [48]:
df.loc[:,lambda df: df.columns == "A"]
Out[48]:
Selection by isin
In [51]:
df["X"] = range(0, df.shape[0])
df
Out[51]:
In [53]:
df[df["X"].isin([0,2])]
Out[53]:
The where() Method
In [54]:
df.where(df["A"]>0)
Out[54]:
In [56]:
df.where(df["A"]>0,100)
Out[56]:
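The difference between boolean filtering and `where()` is easier to see on fixed data. This is a sketch on a hypothetical frame `demo`; it assumes, as in the cells above, that a Series condition is applied row by row across all columns.

```python
import pandas as pd

# Hypothetical data: only the middle row has a positive "A".
demo = pd.DataFrame({"A": [-1.0, 2.0, -3.0],
                     "B": [10.0, 20.0, 30.0]})

filtered = demo[demo["A"] > 0]              # drops rows where the condition is False
masked = demo.where(demo["A"] > 0)          # keeps every row, fills failing rows with NaN
replaced = demo.where(demo["A"] > 0, 100)   # same shape, fills failing rows with 100
```

`filtered` has one row, while `masked` and `replaced` keep the original shape, which is useful when the result must stay aligned with the source frame.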
Over series:
In [58]:
s * 2
Out[58]:
In [59]:
s.max()
Out[59]:
In [60]:
np.max(s)
Out[60]:
In [62]:
s.apply(np.max)
Out[62]:
In [64]:
def multiply_by_2(x):
    return x*2
s.apply(multiply_by_2)
Out[64]:
In [65]:
s.apply(lambda x: x*2)
Out[65]:
In [67]:
s.map(lambda x: x*2)
Out[67]:
In [70]:
mydict={2:"a"}
df["X"].map(mydict)
Out[70]:
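Mapping with a dictionary leaves gaps for any value that is not a key. A minimal sketch, using a hypothetical Series `codes` and a `lookup` dict shaped like `mydict` above:

```python
import pandas as pd

codes = pd.Series([0, 1, 2, 3])   # hypothetical stand-in for df["X"]
lookup = {2: "a"}                 # same shape as mydict above

mapped = codes.map(lookup)                     # values missing from the dict become NaN
with_default = codes.map(lookup).fillna("?")   # one possible way to supply a default
doubled = codes.map(lambda x: x * 2)           # a callable is applied element by element
```

The `"?"` placeholder is only an illustration; any fill value, or none, may suit the data better.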
Over dataframes:
In [73]:
df.apply(np.max,axis=0)
Out[73]:
In [74]:
df.apply(np.max,axis=1)
Out[74]:
In [75]:
df.applymap(lambda x: x*2) # elementwise; deprecated since pandas 2.1 in favour of df.map
Out[75]:
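The `axis` argument decides which direction the function is applied in. A small sketch on a hypothetical frame `demo` with fixed values; the `hasattr` check is an assumption added here so the elementwise call runs on both older and newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame with values chosen so row and column maxima differ.
demo = pd.DataFrame({"A": [1, 5], "B": [7, 3]})

col_max = demo.apply(np.max, axis=0)   # function receives each column: one value per column
row_max = demo.apply(np.max, axis=1)   # function receives each row: one value per row

# Elementwise application: DataFrame.map on pandas 2.1+, applymap on older versions.
if hasattr(pd.DataFrame, "map"):
    doubled = demo.map(lambda x: x * 2)
else:
    doubled = demo.applymap(lambda x: x * 2)
```

For simple arithmetic such as doubling, `demo * 2` gives the same result and is usually faster than an elementwise function call.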