Data Selection in Pandas



In [ ]:

    
import pandas as pd



In [ ]:

    
# Reference: https://stackoverflow.com/questions/28757389/pandas-loc-vs-iloc-vs-ix-vs-at-vs-iat

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])



In [ ]:

    
df

loc: selecting by row labels

A single label



In [ ]:

    
ss_Penelope = df.loc['Penelope']
ss_Penelope

Multiple labels



In [ ]:

    
df_Jane_Nick = df.loc[['Jane', 'Nick']]
df_Jane_Nick

Label slice (both endpoints are included)



In [ ]:

    
df_Aaron_to_Dean = df.loc['Aaron':'Dean']
df_Aaron_to_Dean
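
Unlike ordinary Python slices, a `loc` label slice includes its stop label, while an `iloc` slice excludes its stop position. A minimal sketch of the difference, using a reduced copy of the frame above (the name `small` is only for illustration):

```python
import pandas as pd

# Reduced copy of the example frame; `small` is an illustrative name.
small = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

by_label = small.loc['Aaron':'Dean']  # label slice: 'Dean' is included
by_position = small.iloc[2:4]         # positional slice: position 4 ('Dean') is excluded

print(list(by_label.index))     # ['Aaron', 'Penelope', 'Dean']
print(list(by_position.index))  # ['Aaron', 'Penelope']
```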

Row labels + column labels



In [ ]:

    
df_age_score_of_Jane_Nick = df.loc[['Jane', 'Nick'], ['age', 'score']]
df_age_score_of_Jane_Nick

Difference between df.loc[row, col] and df.loc[row][col]

Reference



In [ ]:

    
df.loc['Jane', 'age'] # one indexing call; reads (or writes) the value in df directly



In [ ]:

    
df.loc['Jane']['age'] # chained indexing; the intermediate df.loc['Jane'] may be a copy
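
The difference matters mostly for assignment. The sketch below assumes a small mixed-dtype frame (the name `people` is illustrative); on such a frame the row returned by `loc['Jane']` is a separate object, so a chained assignment does not reach the original data. Exact warnings depend on the pandas version, so they are silenced here.

```python
import warnings

import pandas as pd

# Illustrative two-row frame with mixed column dtypes.
people = pd.DataFrame({'age': [30, 2], 'color': ['blue', 'green']},
                      index=['Jane', 'Nick'])

# Single indexing call: the assignment goes straight into `people`.
people.loc['Jane', 'age'] = 31

# Chained indexing: people.loc['Jane'] first builds a separate Series,
# so the assignment lands on that temporary object, not on `people`.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about chained assignment
    people.loc['Jane']['age'] = 99

print(people.loc['Jane', 'age'])  # still 31
```

For writes, the single call `df.loc[row, col] = value` is the safer form.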

iloc: selecting by integer row position (index)

A single position



In [ ]:

    
ss_Penelope = df.iloc[3]
ss_Penelope

Multiple positions



In [ ]:

    
df_Jane_Nick = df.iloc[[0, 1]]
df_Jane_Nick

Position slice (the stop position is excluded)



In [ ]:

    
df_Aaron_to_Dean = df.iloc[2:-2]
df_Aaron_to_Dean

Row positions + column positions



In [ ]:

    
df_age_score_of_Jane_Nick = df.iloc[[0,1], [0,4]]
df_age_score_of_Jane_Nick

Converting between labels and integer positions (index)

Row positions -> row labels



In [ ]:

    
row_names = df.index[[1, 3]]
row_names

Row labels -> row positions



In [ ]:

    
row_idxes = [df.index.get_loc(row) for row in row_names]
row_idxes

Column positions -> column labels



In [ ]:

    
col_names = df.columns[[2, 4]]
col_names

Column labels -> column positions



In [ ]:

    
col_idxes = [df.columns.get_loc(col) for col in col_names]
col_idxes
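
As an alternative to calling `get_loc` in a loop, `Index.get_indexer` converts several labels in one call and returns -1 for any label it cannot find. A minimal sketch on a reduced frame (the names `frame`, `row_pos`, and `col_pos` are illustrative):

```python
import pandas as pd

# Reduced copy of the example frame; names are illustrative.
frame = pd.DataFrame(
    {'age': [30, 2, 12, 4],
     'color': ['blue', 'green', 'red', 'white'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope'],
)

# Convert several labels to integer positions at once.
row_pos = frame.index.get_indexer(['Nick', 'Penelope'])
col_pos = frame.columns.get_indexer(['age', 'food'])
print(row_pos.tolist(), col_pos.tolist())  # [1, 3] [0, 2]

# The positions can be fed straight into iloc.
subset = frame.iloc[row_pos, col_pos]
print(subset)
```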

Selecting with a boolean mask



In [ ]:

    
ss_age_bt_30 = df.age > 30
ss_age_bt_30



In [ ]:

    
print(ss_age_bt_30.values)



In [ ]:

    
df.loc[df.age > 30]



In [ ]:

    
df.loc[ss_age_bt_30]



In [ ]:

    
df.loc[ss_age_bt_30.values]
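
Masks can also be combined and paired with a column selection. The sketch below assumes a reduced copy of the frame above (the names `frame` and `mask` are illustrative). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `==`.

```python
import pandas as pd

# Reduced copy of the example frame; names are illustrative.
frame = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# Combine two conditions element-wise with & (and) or | (or).
mask = (frame.age > 30) & (frame.state == 'TX')

# Boolean mask for the rows, label list for the columns.
selected = frame.loc[mask, ['age']]
print(list(selected.index))  # ['Christina', 'Cornelia']

# iloc takes the mask as a plain boolean array rather than a Series.
same_rows = frame.iloc[mask.to_numpy()]
```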

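Selecting with at and iat

Similar to loc and iloc, but at and iat can only select a single scalar value; for that case they are somewhat faster.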



In [ ]:

    
df.at['Dean', 'age']



In [ ]:

    
df.iat[4, 0]
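
Both accessors can write a scalar as well as read one. A minimal sketch on a one-column copy of the frame above (the name `frame` is illustrative):

```python
import pandas as pd

# One-column copy of the example frame; `frame` is an illustrative name.
frame = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'],
)

# Write one cell by label...
frame.at['Dean', 'age'] = 33

# ...and read the same cell back by position (row 4, column 0).
print(frame.iat[4, 0])  # 33
```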

Other

Selecting the age column: case 1



In [ ]:

    
df['age'] # or df.age

Selecting the age column: case 2



In [ ]:

    
df.age

Selecting the age column: case 3



In [ ]:

    
df.iloc[:, 0]



In [ ]:

    
df[0:2]['age']



In [ ]:

    
df[0:2]
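
The plain `df[...]` operator behaves differently depending on what it receives: a string or list of strings selects columns, while a slice selects rows. A minimal sketch of the three forms on a reduced frame (all names are illustrative):

```python
import pandas as pd

# Reduced copy of the example frame; names are illustrative.
frame = pd.DataFrame(
    {'age': [30, 2, 12, 4], 'score': [4.6, 8.3, 9.0, 3.3]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope'],
)

first_two = frame[0:2]              # integer slice: rows by position
age_col = frame['age']              # single string: one column, as a Series
two_cols = frame[['age', 'score']]  # list of strings: columns, as a DataFrame

print(list(first_two.index))        # ['Jane', 'Nick']
print(list(frame[0:2]['age']))      # [30, 2]
```

Because of this overloading, `loc` and `iloc` are usually the clearer choice when rows and columns are selected together.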
