Pandas Data Structures

Series

From the book Python for Data Analysis: A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.


In [43]:
import pandas as pd
from pandas import Series, DataFrame

series_1 = Series([-2, -1, 0, 1, 2, 3, 4, 5])
series_1


Out[43]:
0   -2
1   -1
2    0
3    1
4    2
5    3
6    4
7    5
dtype: int64

In [44]:
series_1.values


Out[44]:
array([-2, -1,  0,  1,  2,  3,  4,  5])

In [45]:
series_1.index


Out[45]:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

In [46]:
series_2 = Series([1, 2, 3], index=['a', 'b', 'c'])
series_2


Out[46]:
a    1
b    2
c    3
dtype: int64

In [47]:
series_2.index


Out[47]:
Index([u'a', u'b', u'c'], dtype='object')

In [48]:
series_2['a']


Out[48]:
1

In [49]:
series_2[['a', 'b']]


Out[49]:
a    1
b    2
dtype: int64

In [50]:
series_2[series_2 > 1]


Out[50]:
b    2
c    3
dtype: int64

In [51]:
series_2 * 2


Out[51]:
a    2
b    4
c    6
dtype: int64

Using a dict for Series


In [52]:
numbers_1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_3 = Series(numbers_1)
series_3


Out[52]:
a    1
b    2
c    3
d    4
e    5
dtype: int64

In [53]:
numbers_2_index = ['a', 'b', 'c', 'd']
numbers_2 = {'a': 1, 'b': 2, 'c': 3}
series_4 = Series(numbers_2, index=numbers_2_index)
series_4


Out[53]:
a     1
b     2
c     3
d   NaN
dtype: float64

In [54]:
pd.isnull(series_4) # same as series_4.isnull()


Out[54]:
a    False
b    False
c    False
d     True
dtype: bool

In [55]:
pd.notnull(series_4)


Out[55]:
a     True
b     True
c     True
d    False
dtype: bool

In [57]:
series_3 + series_4


Out[57]:
a     2
b     4
c     6
d   NaN
e   NaN
dtype: float64
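
The NaN entries above come from index alignment: arithmetic matches values by label, and a label that is missing (or NaN) on either side yields NaN. A minimal sketch of one way to avoid that, assuming zero is an acceptable default for the missing side; `s3`, `s4`, `plain_sum` and `filled_sum` are illustrative names:

```python
import pandas as pd

# Same data as series_3 and series_4.
s3 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
s4 = pd.Series({'a': 1, 'b': 2, 'c': 3}, index=['a', 'b', 'c', 'd'])

# The + operator aligns on labels; 'd' (NaN in s4) and 'e' (absent in s4) give NaN.
plain_sum = s3 + s4

# Series.add with fill_value substitutes 0 where only one side is missing.
filled_sum = s3.add(s4, fill_value=0)

print(plain_sum)
print(filled_sum)
```

If a value is missing on both sides, the result stays missing even with `fill_value`.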

In [61]:
series_4.name = 'numbers'
series_4.index.name = 'letter'
series_4


Out[61]:
letter
a     1
b     2
c     3
d   NaN
Name: numbers, dtype: float64

In [63]:
series_4.index = ['1', '2', '3', 'x'] # update index
series_4


Out[63]:
1     1
2     2
3     3
x   NaN
Name: numbers, dtype: float64

DataFrame

From the book Python for Data Analysis: A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type. It can be thought of as a dict of Series.


In [65]:
data_1 = {
    'pet': ['Toffee', 'Candy', 'Cake', 'Sussy'],
    'age': [3, 1, 2, 4],
}
frame_1 = DataFrame(data_1)
frame_1


Out[65]:
   age     pet
0    3  Toffee
1    1   Candy
2    2    Cake
3    4   Sussy
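
The "dict of Series" view can be made literal. A minimal sketch, assuming two Series with partially overlapping labels; `pets`, `ages` and `frame` are illustrative names, not part of the notebook above:

```python
import pandas as pd

# Two Series with different indexes: 'three' has a pet but no age.
pets = pd.Series(['Toffee', 'Candy', 'Cake'], index=['one', 'two', 'three'])
ages = pd.Series([3, 1], index=['one', 'two'])

# Each dict key becomes a column; rows are the union of the Series indexes.
frame = pd.DataFrame({'pet': pets, 'age': ages})

print(frame)
```

Labels present in only one Series get NaN in the other column, the same alignment rule that applies to Series arithmetic.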

In [66]:
frame_1['age'] # dict-like notation


Out[66]:
0    3
1    1
2    2
3    4
Name: age, dtype: int64

In [67]:
frame_1.pet # attribute


Out[67]:
0    Toffee
1     Candy
2      Cake
3     Sussy
Name: pet, dtype: object
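
Attribute access is convenient but has limits. A minimal sketch, assuming a hypothetical frame with a column named `count` (which collides with a DataFrame method) and a new attribute `toy`:

```python
import warnings

import pandas as pd

frame = pd.DataFrame({'pet': ['Toffee', 'Candy'], 'age': [3, 1], 'count': [2, 5]})

# 'count' is also a DataFrame method, so attribute access returns the method.
attr_is_method = callable(frame.count)

# Bracket notation always returns the column.
count_column = frame['count']

# Assigning to an attribute that is not an existing column does not create one.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about this pattern
    frame.toy = 'bone'
has_toy_before = 'toy' in frame.columns

# Bracket assignment does create the column.
frame['toy'] = 'bone'
has_toy_after = 'toy' in frame.columns

print(attr_is_method, has_toy_before, has_toy_after)
```

Attribute assignment only updates a column that already exists; bracket notation is the safer default for both reading and writing.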

In [68]:
frame_2 = DataFrame(data_1, columns=['pet', 'age', 'toy'], index=['one', 'two', 'three', 'four'])
frame_2


Out[68]:
          pet  age  toy
one    Toffee    3  NaN
two     Candy    1  NaN
three    Cake    2  NaN
four    Sussy    4  NaN

In [69]:
frame_2.loc['four'] # row, selected by label


Out[69]:
pet    Sussy
age        4
toy      NaN
Name: four, dtype: object
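
Rows can be selected by label or by integer position. A minimal sketch, assuming a frame shaped like frame_2; `frame`, `row_by_label`, `row_by_position` and `cell` are illustrative names:

```python
import pandas as pd

frame = pd.DataFrame(
    {'pet': ['Toffee', 'Candy', 'Cake', 'Sussy'], 'age': [3, 1, 2, 4]},
    index=['one', 'two', 'three', 'four'],
)

# .loc selects by label.
row_by_label = frame.loc['four']

# .iloc selects by integer position (0-based), so 3 is the last row here.
row_by_position = frame.iloc[3]

# .loc also accepts a (row label, column label) pair for a single cell.
cell = frame.loc['two', 'pet']

print(row_by_label)
print(cell)
```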

In [70]:
frame_2.toy = 'bone'
frame_2


Out[70]:
          pet  age   toy
one    Toffee    3  bone
two     Candy    1  bone
three    Cake    2  bone
four    Sussy    4  bone

In [73]:
frame_2.toy = Series(['bone', None, 'bone', 'bone'], index=['one', 'two', 'three', 'four'])
frame_2


Out[73]:
          pet  age   toy
one    Toffee    3  bone
two     Candy    1  None
three    Cake    2  bone
four    Sussy    4  bone

In [74]:
frame_2['likes_bone_toy'] = frame_2.toy == 'bone'
frame_2


Out[74]:
          pet  age   toy  likes_bone_toy
one    Toffee    3  bone            True
two     Candy    1  None           False
three    Cake    2  bone            True
four    Sussy    4  bone            True

In [75]:
frame_2.T # transpose


Out[75]:
                   one    two  three   four
pet             Toffee  Candy   Cake  Sussy
age                  3      1      2      4
toy               bone   None   bone   bone
likes_bone_toy    True  False   True   True

In [76]:
frame_2.columns.name = 'Number'
frame_2.index.name = 'Pet'
frame_2


Out[76]:
Number     pet  age   toy  likes_bone_toy
Pet
one     Toffee    3  bone            True
two      Candy    1  None           False
three     Cake    2  bone            True
four     Sussy    4  bone            True

In [77]:
frame_2.values


Out[77]:
array([['Toffee', 3, 'bone', True],
       ['Candy', 1, None, False],
       ['Cake', 2, 'bone', True],
       ['Sussy', 4, 'bone', True]], dtype=object)

In [78]:
frame_2.index


Out[78]:
Index([u'one', u'two', u'three', u'four'], dtype='object', name=u'Pet')

In [79]:
'one' in frame_2.index


Out[79]:
True

In [83]:
frame_2.reindex(['four', 'three', 'two', 'one'], fill_value=0)


Out[83]:
Number     pet  age   toy  likes_bone_toy
Pet
four     Sussy    4  bone            True
three     Cake    2  bone            True
two      Candy    1  None           False
one     Toffee    3  bone            True
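
In the reindex call above every requested label already exists, so fill_value has no visible effect. A minimal sketch of where it matters, assuming a hypothetical Series `ages` and a new label 'five':

```python
import pandas as pd

ages = pd.Series([3, 1, 2, 4], index=['one', 'two', 'three', 'four'])

# 'five' is not in the original index, so it is filled with fill_value.
with_fill = ages.reindex(['four', 'five', 'one'], fill_value=0)

# Without fill_value the new label gets NaN instead.
without_fill = ages.reindex(['four', 'five', 'one'])

print(with_fill)
print(without_fill)
```

reindex returns a new object in both cases; `ages` itself is not modified.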

Interpolation / filling of values when reindexing (Series)


In [88]:
fill_numbers = Series(['blue', 'yellow', 'green'], index=[0, 4, 8])
fill_numbers.reindex(range(12), method='ffill')


Out[88]:
0       blue
1       blue
2       blue
3       blue
4     yellow
5     yellow
6     yellow
7     yellow
8      green
9      green
10     green
11     green
dtype: object
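
Forward fill is one of several options. A minimal sketch of backward fill and of capping the fill length with `limit`, assuming the same data as fill_numbers; `colors`, `backward` and `limited` are illustrative names:

```python
import pandas as pd

colors = pd.Series(['blue', 'yellow', 'green'], index=[0, 4, 8])

# bfill takes the next valid value; labels after the last one (9..11) stay NaN.
backward = colors.reindex(range(12), method='bfill')

# limit=1 forward-fills at most one consecutive new label after each value.
limited = colors.reindex(range(12), method='ffill', limit=1)

print(backward)
print(limited)
```

Filling with `method` requires the original index to be sorted (monotonic), which holds here.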