# pandas

This notebook records some tips for the pandas module.


In [1]:
import pandas as pd
import numpy as np

Create dataframe

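Create a 10-row, 4-column dataframe of random integers from 0 to 99 (the upper bound of np.random.randint is exclusive), with columns labelled A to D.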


In [2]:
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
df


Out[2]:
A B C D
0 48 20 45 57
1 59 78 37 89
2 58 56 71 42
3 39 33 86 2
4 75 89 56 1
5 77 45 11 23
6 44 55 38 22
7 30 37 99 93
8 60 80 80 11
9 53 64 82 68
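Because no seed is set, the values shown above change on every run. If reproducible values are wanted, one option is NumPy's Generator API with a fixed seed. This is a minimal sketch rather than part of the original notebook; the helper name `make_frame` and the seed value 0 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def make_frame(seed):
    """Build a 10x4 frame of integers in [0, 100) from a seeded generator."""
    rng = np.random.default_rng(seed)            # seeded random generator
    values = rng.integers(0, 100, size=(10, 4))  # upper bound is exclusive
    return pd.DataFrame(values, columns=list("ABCD"))


df_a = make_frame(0)
df_b = make_frame(0)

# The same seed yields identical frames.
print(df_a.equals(df_b))
```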

.loc

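Use .loc to select rows and columns by label. The labels are the values of the index (for rows) and the column names (for columns). Unlike ordinary Python slicing, a .loc slice includes its end label: in the cell below, 3:9:2 returns rows 3, 5, 7 and 9, and 'B': returns every column from B onward.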


In [3]:
df.loc[3:9:2, 'B':]


Out[3]:
B C D
3 33 86 2
5 45 11 23
7 37 99 93
9 64 82 68
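The inclusive end label is easier to see next to position-based slicing with .iloc, which excludes the stop position like a normal Python slice. The sketch below uses a small hypothetical frame `demo` with string row labels (not part of the original notebook) so that labels and positions cannot be confused.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration.
demo = pd.DataFrame(
    {"B": [20, 78, 56], "C": [45, 37, 71], "D": [57, 89, 42]},
    index=["x", "y", "z"],
)

# Label slice: both endpoints are included, so rows "x" and "y" are returned.
by_label = demo.loc["x":"y", "B":"C"]

# Position slice: the stop position is excluded, so only row "x" is returned.
by_position = demo.iloc[0:1, 0:2]

print(by_label)
print(by_position)
```

With the default integer index used in df, labels and positions happen to coincide, which is why the difference only shows up at the end of the slice.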

Change index and to_datetime

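Use pd.to_datetime to convert the 'time' column from strings to pandas datetime values, then set the result as the index of the dataframe. In the second cell below, df2.pop('time') removes the column from df2 and returns it, so the time values end up only in the index and not as a duplicate column.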


In [4]:
df2 = pd.DataFrame([['2017-01-01', 253, 234], ['2017-02-04', 283, 333], ['2017-02-11', 3, 55]], columns=['time', 'data1', 'data2'])
df2


Out[4]:
time data1 data2
0 2017-01-01 253 234
1 2017-02-04 283 333
2 2017-02-11 3 55

In [5]:
df2.index = pd.to_datetime(df2.pop('time'))
df2


Out[5]:
data1 data2
time
2017-01-01 253 234
2017-02-04 283 333
2017-02-11 3 55
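One reason to use a datetime index is that .loc can then select rows by date strings. The following sketch rebuilds df2 as above and shows two such selections; the variable names `feb` and `window` are assumptions for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(
    [["2017-01-01", 253, 234], ["2017-02-04", 283, 333], ["2017-02-11", 3, 55]],
    columns=["time", "data1", "data2"],
)
df2.index = pd.to_datetime(df2.pop("time"))

# Partial-string indexing: every row whose timestamp falls in February 2017.
feb = df2.loc["2017-02"]

# Label slice between two dates; as with any .loc slice, both ends are included.
window = df2.loc["2017-01-01":"2017-02-04"]

print(feb)
print(window)
```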