10 Minutes to pandas: a type-along walkthrough

Working through https://pandas.pydata.org/pandas-docs/stable/10min.html by typing out each example



In [1]:

    
# Import what we need; abbreviating the names like this seems to be the common convention in this community
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Object creation

Intro to Data Structures — pandas 0.20.1 documentation
Practice creating a Series and a DataFrame



In [2]:

    
# Creating a Series by passing a list of values, letting pandas create a default integer index
# (the np.nan entry makes the resulting dtype float64)
s = pd.Series([1,3,5,np.nan,6,8])
s









    Out[2]:





0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
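
The float64 dtype in the output above comes from the np.nan entry. A minimal sketch of that effect follows; the variable names ints and with_nan are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# An all-integer list is stored with an integer dtype.
ints = pd.Series([1, 3, 5])

# np.nan is a float, so a list containing it is stored as float64.
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(ints.dtype)
print(with_nan.dtype)          # float64
print(with_nan.isna().sum())   # 1 missing value
```

isna() is one way to locate the missing entries afterwards.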



In [4]:

    
# First, create a date index
dates = pd.date_range('20130101', periods=6)
dates









    Out[4]:





DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
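
As a small aside, the same index can also be described by its end date instead of a period count. The sketch below assumes daily frequency, matching the freq='D' shown in the output; by_periods and by_end are illustrative names.

```python
import pandas as pd

# Six consecutive days, given as a start date plus a number of periods.
by_periods = pd.date_range("2013-01-01", periods=6, freq="D")

# The same six days, given as a start date plus an end date.
by_end = pd.date_range(start="2013-01-01", end="2013-01-06", freq="D")

print(by_periods.equals(by_end))  # True
```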



In [5]:

    
# Creating a DataFrame by passing a numpy array, specifying a datetime index and column labels
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df
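
Because np.random.randn draws new values on every run, the numbers in this notebook will differ from yours. One possible way to make the frame reproducible is a seeded generator, sketched below. The seed value, the helper make_frame, and the use of np.random.default_rng (available in NumPy 1.17 and later) are assumptions of this sketch, not part of the original tutorial.

```python
import numpy as np
import pandas as pd

SEED = 0  # arbitrary value chosen for this sketch


def make_frame(seed):
    """Build a 6x4 frame of standard-normal values from a seeded generator."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("20130101", periods=6)
    return pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))


first = make_frame(SEED)
second = make_frame(SEED)
print(first.equals(second))  # True: same seed, same values
```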



In [6]:

    
# Creating a DataFrame by passing a dict of objects that can be converted to series-like:
# each dict value becomes one column, and scalar values are repeated for every row

df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })
df2



In [7]:

    
# Each column of the resulting DataFrame has its own data type (dtype)

df2.dtypes









    Out[7]:





A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
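
To illustrate how the mixed dtypes above can be used, here is a small sketch that rebuilds a reduced version of df2 and keeps only its numeric columns with select_dtypes. The reduced column set and the name numeric are assumptions made for brevity.

```python
import numpy as np
import pandas as pd

# A reduced version of df2: scalars ("A", "F") are repeated for each of the 4 rows.
df2 = pd.DataFrame({
    "A": 1.0,
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "F": "foo",
})

print(len(df2))  # 4

# Keep only the numeric columns; the string column "F" is dropped.
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))  # ['A', 'C', 'D']
```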



In [11]:

    
# If you're using IPython, tab completion for column names (as well as public attributes) is automatically enabled.
# In IPython, typing df2. and pressing TAB offers the column names as completion candidates.
# In Jupyter Notebook, pressing TAB shows the candidates in a list box.

Viewing Data

Essential Basic Functionality — pandas 0.20.1 documentation
Practice viewing data



In [15]:

    
# See the top and bottom rows of the frame
df.head()



In [14]:

    
df.tail(3)
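
A quick sketch of the row counts involved: head() and tail() return 5 rows unless a count is passed. The ten-row frame and its column name x are made up for this illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"x": np.arange(10)})

print(len(frame.head()))            # 5 (the default count)
print(len(frame.tail(3)))           # 3
print(frame.tail(3)["x"].tolist())  # [7, 8, 9]
```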



In [20]:

    
# Display the index, the columns, and the underlying numpy data

print(df.index)
print(df.columns)
print(df.values)









    



DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
Index(['A', 'B', 'C', 'D'], dtype='object')
[[ 0.67380202  0.1176613  -0.26892769 -0.47658929]
 [-0.8492771   1.25455745 -0.78829475 -0.93090736]
 [-0.99142854  0.2524888  -1.63663823  0.66014099]
 [-0.31149185  1.78676205 -0.27191572 -0.99270547]
 [ 1.39007055 -1.08938607  2.13814835  1.36508903]
 [ 1.80154377  0.72719875  0.86513779 -0.62864099]]
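
The tutorial version followed here uses df.values. Newer pandas releases (0.24 and later) also provide DataFrame.to_numpy(), which the current documentation recommends for the same purpose. A small sketch, using a made-up two-column frame, shows that both return the same array:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

arr = frame.to_numpy()  # assumes pandas >= 0.24
print(arr.shape)                            # (2, 2)
print(np.array_equal(arr, frame.values))    # True
```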



In [24]:

    
# describe() shows a quick statistical summary of your data
# (count, mean, standard deviation, minimum, quartiles including the median, maximum)

df.describe()
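
The "50%" row of the summary is the median. A minimal check on a small made-up frame (the names frame and summary are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
summary = frame.describe()

# The row labelled "50%" holds each column's median.
print(summary.loc["50%", "a"], frame["a"].median())  # 2.5 2.5
print(summary.loc["mean", "b"])                      # 25.0
```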



In [25]:

    
# Transposing your data (swapping the index and the columns)

df.T









    Out[25]:







  
    
      
	2013-01-01 00:00:00	2013-01-02 00:00:00	2013-01-03 00:00:00	2013-01-04 00:00:00	2013-01-05 00:00:00	2013-01-06 00:00:00
A	0.673802	-0.849277	-0.991429	-0.311492	1.390071	1.801544
B	0.117661	1.254557	0.252489	1.786762	-1.089386	0.727199
C	-0.268928	-0.788295	-1.636638	-0.271916	2.138148	0.865138
D	-0.476589	-0.930907	0.660141	-0.992705	1.365089	-0.628641
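
A tiny sketch of what the transpose does to the labels, using a made-up 2x2 frame: the column labels become the index and the index becomes the column labels, and transposing twice gives back the original.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])
t = frame.T

print(list(t.index))     # ['A', 'B']
print(list(t.columns))   # ['x', 'y']
print(t.T.equals(frame)) # True
```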



In [27]:

    
# Sorting by an axis
# axis=1 sorts by the column labels (horizontally); ascending=False gives the order D, C, B, A

df.sort_index(axis=1, ascending=False)
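
For comparison, the sketch below sorts a small made-up frame along both axes: axis=1 reorders columns by their labels, axis=0 reorders rows by the index. The frame and the names by_columns and by_rows are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["r2", "r1"],
    columns=["A", "B", "C"],
)

by_columns = frame.sort_index(axis=1, ascending=False)  # sort column labels, descending
by_rows = frame.sort_index(axis=0)                      # sort row labels, ascending

print(list(by_columns.columns))  # ['C', 'B', 'A']
print(list(by_rows.index))       # ['r1', 'r2']
```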



In [28]:

    
# Sorting by values (here, by column B)

df.sort_values(by='B')
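
A short sketch of sort_values on a made-up frame, showing that rows keep their original index labels and that ascending=False reverses the order:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, -1.0, 2.0]})

asc = frame.sort_values(by="B")
desc = frame.sort_values(by="B", ascending=False)

print(asc.index.tolist())   # [1, 0, 2]
print(desc["B"].tolist())   # [2.0, 0.5, -1.0]
```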



Rendered table outputs for the cells above

Output of df.describe() (In [24]):

	A	B	C	D
count	6.000000	6.000000	6.000000	6.000000
mean	0.285536	0.508214	0.006252	-0.167269
std	1.178466	1.000992	1.324097	0.959767
min	-0.991429	-1.089386	-1.636638	-0.992705
25%	-0.714831	0.151368	-0.659200	-0.855341
50%	0.181155	0.489844	-0.270422	-0.552615
75%	1.211003	1.122718	0.581621	0.375958
max	1.801544	1.786762	2.138148	1.365089

Output of df (In [5]):

	A	B	C	D
2013-01-01	0.673802	0.117661	-0.268928	-0.476589
2013-01-02	-0.849277	1.254557	-0.788295	-0.930907
2013-01-03	-0.991429	0.252489	-1.636638	0.660141
2013-01-04	-0.311492	1.786762	-0.271916	-0.992705
2013-01-05	1.390071	-1.089386	2.138148	1.365089
2013-01-06	1.801544	0.727199	0.865138	-0.628641

Output of df2 (In [6]):

	A	B	C	D	E	F
0	1.0	2013-01-02	1.0	3	test	foo
1	1.0	2013-01-02	1.0	3	train	foo
2	1.0	2013-01-02	1.0	3	test	foo
3	1.0	2013-01-02	1.0	3	train	foo