Adapted from http://pandas.pydata.org/pandas-docs/stable/
by openthings@163.com, 2016-04.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = pd.Series([1,3,5,np.nan,6,8])
s
Out[1]:
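Here is a quick check of what pandas inferred for a Series like `s`. Because `np.nan` is a float, the integer values are upcast and the Series gets dtype `float64`. The default index is a `RangeIndex` running from 0 to n-1. This is a small sketch using the same values as the cell above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# np.nan is a float, so the integers are upcast to float64
print(s.dtype)                                   # float64
# The default index is a RangeIndex 0 .. len(s) - 1
print(type(s.index).__name__, list(s.index))     # RangeIndex [0, 1, 2, 3, 4, 5]
# isna() marks the missing value at position 3
print(s.isna().tolist())
```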
Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
In [2]:
dates = pd.date_range('20130101', periods=6)
dates
Out[2]:
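When `date_range` is given only a start and a number of periods, the frequency defaults to calendar days, so `periods=6` ends on 2013-01-06. A minimal check:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# Only start and periods were given, so the frequency defaults to daily ('D')
print(dates.freqstr)                         # D
print(dates[0].date(), dates[-1].date())     # 2013-01-01 2013-01-06
print(len(dates))                            # 6
```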
In [7]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df
Out[7]:
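`np.random.randn` draws new numbers on every run, so the outputs above will differ from one execution to the next. If reproducible values are wanted, one option is a seeded NumPy `Generator`. The helper `make_frame` below is hypothetical and only wraps the same construction as the cell above.

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    # Hypothetical helper: a seeded Generator makes the random values reproducible
    rng = np.random.default_rng(seed)
    dates = pd.date_range('20130101', periods=6)
    return pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

df = make_frame(42)
print(df.shape)                    # (6, 4)
print(df.columns.tolist())         # ['A', 'B', 'C', 'D']
print(df.equals(make_frame(42)))   # True: same seed, same numbers
```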
Creating a DataFrame by passing a dict of objects that can each be converted to a Series-like column:
In [8]:
df2 = pd.DataFrame({ 'A' : 1.,
'B' : pd.Timestamp('20130102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : np.array([3] * 4,dtype='int32'),
'E' : pd.Categorical(["test","train","test","train"]),
'F' : 'foo' })
df2
Out[8]:
In [9]:
df2.dtypes
Out[9]:
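Scalars in the dict, such as `1.` and `'foo'`, are broadcast to the length of the other columns, and each column keeps its own dtype. One practical consequence is that columns can be picked by kind with `select_dtypes`. A short sketch rebuilding the same `df2`:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.0,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(['test', 'train', 'test', 'train']),
                    'F': 'foo'})

# The scalars 1.0 and 'foo' were broadcast to 4 rows
print(len(df2))                                            # 4
# Numeric columns (float64, float32, int32) selected by kind
print(list(df2.select_dtypes(include='number').columns))   # ['A', 'C', 'D']
# The categorical column stores its distinct values once
print(df2['E'].cat.categories.tolist())                    # ['test', 'train']
```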
If you’re using IPython, tab completion for column names (as well as public attributes) is automatically enabled. Here’s a subset of the attributes that will be completed:
In [13]: df2.<TAB>
The completion list includes the columns A, B, C, and D; E and F are there as well, alongside the DataFrame's many public attributes (not reproduced here for brevity).
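Completion works because a DataFrame adds column names that are valid Python identifiers to its `dir()` listing, and those names are also reachable as attributes. A column whose name contains a space can only be reached with `[]`. The frame below is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'my col': [3, 4]})

print('A' in dir(df))          # True: 'A' is a valid identifier
print('my col' in dir(df))     # False: the space rules out attribute access
print(df.A.equals(df['A']))    # True: attribute access returns the same column
```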
In [14]:
df.head()
Out[14]:
In [15]:
df.tail(3)
Out[15]:
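`head()` and `tail()` return the first and last rows, with a default of 5. A negative `n` in `head()` drops that many rows from the end instead. A small sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

print(df.head()['x'].tolist())    # [0, 1, 2, 3, 4]  (default n=5)
print(df.tail(3)['x'].tolist())   # [7, 8, 9]
# Negative n: everything except the last 2 rows
print(len(df.head(-2)))           # 8
```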
In [16]:
df.index
Out[16]:
In [17]:
df.values
Out[17]:
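In newer pandas versions (0.24 and later), the documentation recommends `DataFrame.to_numpy()` over `.values`. In both cases, when columns have different dtypes, the array uses a common dtype that can hold all of them, which may be `object`. A sketch with illustrative frames:

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})
arr = df.to_numpy()
print(arr.dtype)    # float64: int64 and float64 share float64
print(arr.shape)    # (2, 2)

# Mixing numbers and strings forces a generic object array
mixed = pd.DataFrame({'i': [1, 2], 's': ['a', 'b']})
print(mixed.to_numpy().dtype)   # object
```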
In [18]:
df.describe()
Out[18]:
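`describe()` returns one row per summary statistic and one column per numeric column. The quartiles use linear interpolation, and `std` is the sample standard deviation. A check on a tiny illustrative column:

```python
import pandas as pd

df = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0]})
summary = df.describe()

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc['mean', 'v'], summary.loc['50%', 'v'])   # 2.5 2.5
```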
In [19]:
df.T
Out[19]:
In [20]:
df.sort_index(axis=1, ascending=False)
Out[20]:
In [21]:
df.sort_values(by='B')
Out[21]:
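The two sorts work on different things. `sort_index(axis=1)` reorders column labels, while `sort_values(by=...)` reorders rows by a column's values, and each index label moves with its row. A sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'C': [0, 0, 0], 'B': [3, 1, 2]},
                  index=['x', 'y', 'z'])

# Column labels in descending order
print(df.sort_index(axis=1, ascending=False).columns.tolist())   # ['C', 'B', 'A']
# Rows ordered by column B; the row labels travel with their rows
by_b = df.sort_values(by='B')
print(by_b.index.tolist())    # ['y', 'z', 'x']
print(by_b['B'].tolist())     # [1, 2, 3]
```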
Getting
In [22]:
df['A']
Out[22]:
In [23]:
df[0:3]
Out[23]:
In [24]:
df['20130102':'20130104']
Out[24]:
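`df['A']` returns a single column as a Series, the same as `df.A`. A slice inside `[]` selects rows instead. With a `DatetimeIndex`, date strings can serve as slice bounds, and because that slice is label-based, both ends are included. A sketch with illustrative values:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': range(6), 'B': range(10, 16)}, index=dates)

print(type(df['A']).__name__)     # Series
print(len(df[0:3]))               # 3: positions 0, 1, 2 (end excluded)
# Date-string slicing is label-based, so both endpoints are kept
print(df['20130102':'20130104']['A'].tolist())   # [1, 2, 3]
```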
For getting a cross section using a label
In [25]:
df.loc[dates[0]]
Out[25]:
Selecting on a multi-axis by label
In [26]:
df.loc[:,['A','B']]
Out[26]:
Label slicing includes both endpoints:
In [27]:
df.loc['20130102':'20130104',['A','B']]
Out[27]:
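The endpoint rule is the main difference between `.loc` and `.iloc` slicing. A label slice includes its stop label, while a positional slice excludes its stop position, like Python lists. A side-by-side sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4]}, index=['a', 'b', 'c', 'd', 'e'])

# .loc: labels 'b' through 'd', stop label included
print(df.loc['b':'d', 'A'].tolist())   # [1, 2, 3]
# .iloc: positions 1 and 2, stop position 3 excluded
print(df.iloc[1:3, 0].tolist())        # [1, 2]
```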
Selecting a single row label reduces the dimensions of the returned object:
In [30]:
df.loc['20130102',['A','B']]
Out[30]:
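The shape of the result follows from the kind of key on each axis. A single label drops that axis, while a list or slice keeps it. A sketch with illustrative values:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=3)
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]}, index=dates)

row = df.loc['20130102', ['A', 'B']]                # one row label  -> Series
value = df.loc[dates[1], 'A']                       # row + column   -> scalar
block = df.loc['20130102':'20130103', ['A', 'B']]   # row slice      -> DataFrame

print(type(row).__name__, row.tolist())   # Series [2.0, 5.0]
print(value)                              # 2.0
print(block.shape)                        # (2, 2)
```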
For getting a scalar value
In [31]:
df.loc[dates[0],'A']
Out[31]:
For fast access to a scalar (equivalent to the prior method)
In [32]:
df.at[dates[0],'A']
Out[32]:
Selecting by the position of the passed integer:
In [33]:
df.iloc[3]
Out[33]:
By integer slices, acting similar to NumPy/Python
In [34]:
df.iloc[3:5,0:2]
Out[34]:
By lists of integer position locations, similar to the NumPy/Python style
In [35]:
df.iloc[[1,2,4],[0,2]]
Out[35]:
For slicing rows explicitly
In [36]:
df.iloc[1:3,:]
Out[36]:
For slicing columns explicitly
In [37]:
df.iloc[:,1:3]
Out[37]:
For getting a value explicitly
In [39]:
df.iloc[1,1]
Out[39]:
For fast access to a scalar (equivalent to the prior method)
In [40]:
df.iat[1,1]
Out[40]:
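`.at` and `.iat` are the scalar-only counterparts of `.loc` and `.iloc`. `.at` takes labels, `.iat` takes integer positions, and each returns one value. A check on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, 2.5], 'B': [3.5, 4.5]}, index=['r0', 'r1'])

# Same cell reached by label and by position
print(df.at['r1', 'B'], df.loc['r1', 'B'])   # 4.5 4.5
print(df.iat[1, 1], df.iloc[1, 1])           # 4.5 4.5
```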
Using a single column’s values to select data.
In [41]:
df[df.A > 0]
Out[41]:
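`df.A > 0` produces a boolean Series aligned on the row index, and passing it to `[]` keeps the rows where it is True. Conditions combine with `&` and `|`, and each comparison needs its own parentheses. A sketch on illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'A': [-1.0, 2.0, 0.5], 'B': [1.0, -2.0, 3.0]})

mask = df.A > 0
print(mask.tolist())              # [False, True, True]
print(df[mask].index.tolist())    # [1, 2]
# Both conditions must hold
print(df[(df.A > 0) & (df.B > 0)].index.tolist())   # [2]
```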
A where operation for getting.
In [42]:
df[df > 0]
Out[42]:
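`df[df > 0]` keeps the frame's shape and replaces every value where the condition is False with `NaN`. `DataFrame.where` does the same thing, and its `other` argument lets you supply a replacement value instead of `NaN`. A sketch on illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

masked = df[df > 0]
print(masked.shape)                          # (2, 2): shape is preserved
print(int(masked.isna().sum().sum()))        # 2 cells became NaN
# Same mask, but fill the rejected cells with 0.0
print(df.where(df > 0, 0.0).values.tolist())   # [[1.0, 0.0], [0.0, 4.0]]
```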
In [43]:
df2 = df.copy()
Add a column:
In [44]:
df2['E'] = ['one', 'one','two','three','four','three']
In [45]:
df2
Out[45]:
In [46]:
df2[df2['E'].isin(['two','four'])]
Out[46]:
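`isin()` tests each element for membership in a list and returns a boolean mask that works like any other filter. The mask can also be negated with `~`. A sketch using the same values as column `E` above:

```python
import pandas as pd

df = pd.DataFrame({'E': ['one', 'one', 'two', 'three', 'four', 'three']})

mask = df['E'].isin(['two', 'four'])
print(mask.tolist())             # [False, False, True, False, True, False]
print(df[mask].index.tolist())   # [2, 4]
# ~ negates the mask to keep everything else
print(df[~mask]['E'].tolist())   # ['one', 'one', 'three', 'three']
```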
In [48]:
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
s1
Out[48]:
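`s1` starts one day later than the 2013-01-01 `dates` index used above. Assigning such a Series as a new column aligns it on the index, not by position: dates missing from `s1` become `NaN`, and extra dates in `s1` are dropped. A sketch, assuming a fresh frame and the hypothetical column name `F`:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': range(6)}, index=dates)
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

# Values are matched by date; 2013-01-01 has no partner and 2013-01-07 is dropped
df['F'] = s1
print(df['F'].tolist())   # [nan, 1.0, 2.0, 3.0, 4.0, 5.0]
```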
Setting values by position
In [49]:
df.iat[0,1] = 0
Setting by assigning with a numpy array
In [50]:
df.loc[:,'D'] = np.array([5] * len(df))
The result of the prior setting operations
In [51]:
df
Out[51]:
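The setters mirror the getters: `.at` sets by label, `.iat` by position, and `.loc[:, col]` replaces a whole column from an array whose length matches the number of rows. A sketch on an illustrative 3x2 frame of ones:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((3, 2)), columns=['A', 'D'])

df.at[0, 'A'] = 0                          # set by label (row 0, column 'A')
df.iat[1, 0] = -1                          # set by position (row 1, column 0)
df.loc[:, 'D'] = np.array([5] * len(df))   # whole column from a same-length array

print(df['A'].tolist())   # values 0, -1, 1
print(df['D'].tolist())   # every value equals 5
```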
A where operation with setting.
In [52]:
df2 = df.copy()
In [53]:
df2[df2 > 0] = -df2
In [54]:
df2
Out[54]:
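`df2[df2 > 0] = -df2` flips the sign only where the mask is True, so every value in the result is zero or negative. Working on a copy leaves the original `df` untouched. A sketch on illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0], 'B': [0.0, 3.0]})
df2 = df.copy()           # modify a copy so df is left unchanged

df2[df2 > 0] = -df2       # negate only the positive cells
print(df2.values.tolist())       # [[-1.0, 0.0], [-2.0, -3.0]]
print(bool((df2 <= 0).all().all()))   # True
print(df['B'].tolist())          # [0.0, 3.0]: the original is intact
```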