Python for Data Analysis Lightning Tutorials is a series of tutorials in Data Analysis, Statistics, and Graphics using Python. The Pandas Cookbook series of tutorials provides recipes for common tasks and moves on to more advanced topics in statistics and time series analysis.
Created by Alfred Essa, Aug 8, 2013
Note: The IPython Notebook and data files can be found on my GitHub site: github/alfredessa
In [118]:
import pandas as pd
import numpy as np
In [119]:
# series constructor with data as a list of integers
s1 = pd.Series([33, 19, 15, 89, 11, -5, 9])
In [120]:
# if no index is passed to the Series constructor, the default index is the integers 0 to n-1
s1
Out[120]:
In [121]:
# type of series is pandas series
type(s1)
Out[121]:
In [122]:
# retrieve the values of the series
s1.values
Out[122]:
In [123]:
# type of data values is NumPy ndarray
type(s1.values)
Out[123]:
In [124]:
# retrieve the index of the series
s1.index
Out[124]:
In [125]:
# think of a series as a mapping from index to values
s1
Out[125]:
In [126]:
# define the data and index as lists
data1 = [33, 19, 15, 89, 11, -5, 9]
index1 = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
In [127]:
# create series
s2 = pd.Series(data1, index=index1)
In [128]:
s2
Out[128]:
In [129]:
# verify index
s2.index
Out[129]:
In [130]:
# we can also give meaningful labels to the series data and the index
s2.name='Daily Temperatures'
s2.index.name='Weekday'
In [131]:
s2
Out[131]:
In [ ]:
# the second data element in the list is a float
data2 = [33, 19.3, 15, 89, 11, -5, 9]
In [132]:
s3 = pd.Series(data2, index=index1)
In [133]:
# all the data elements are of type float
s3
Out[133]:
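A Series holds a single dtype, so one float in the data upcasts every element. The following is a minimal sketch of that behaviour; the names ints, mixed and truncated are hypothetical and used only for illustration.

```python
import pandas as pd

# hypothetical example data, not taken from the notebook above
ints = pd.Series([33, 19, 15])
mixed = pd.Series([33, 19.3, 15])

print(ints.dtype)   # an integer dtype
print(mixed.dtype)  # float64, because 19.3 forces every value to float

# astype converts explicitly; float -> int drops the fractional part
truncated = mixed.astype(int)
print(truncated.tolist())
```

Checking .dtype after construction is a cheap way to catch an unintended upcast.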
In [ ]:
dict1 = {'Mon': 33, 'Tue': 19, 'Wed': 15, 'Thu': 89, 'Fri': 11, 'Sat': -5, 'Sun': 9}
In [134]:
s4 = pd.Series(dict1)
In [135]:
s4
Out[135]:
In [ ]:
The most general representation of a Series is as an ordered key-value store.
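As a rough illustration of the key-value view, the sketch below looks up a value by key and then adds two Series whose keys only partly overlap. The Series names (weekday, other) and their values are made up for this example.

```python
import pandas as pd

# hypothetical data for illustration
weekday = pd.Series({'Mon': 33, 'Tue': 19, 'Wed': 15})
other = pd.Series({'Wed': 1, 'Mon': 2, 'Thu': 3})

# dictionary-style access by key
print(weekday['Tue'])
print('Tue' in weekday)

# unlike a dict, arithmetic aligns on keys;
# keys present in only one Series produce NaN in the result
total = weekday + other
print(total)
```

Alignment by label, rather than by position, is what distinguishes a Series from a plain NumPy array.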
In [137]:
# vectorized operations
s4 * 2
Out[137]:
In [138]:
np.log(s4)
Out[138]:
Note: NaN (not a number) is the standard missing-data marker in pandas. It shows up here because the logarithm of a negative value (Sat is -5) is undefined.
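A short sketch of how missing values can be detected and skipped, assuming a small made-up Series named temps. np.errstate is used only to silence the NumPy warning raised by the log of a negative number.

```python
import numpy as np
import pandas as pd

# hypothetical data for illustration
temps = pd.Series({'Fri': 11, 'Sat': -5, 'Sun': 9})

# the log of a negative number is undefined, so 'Sat' becomes NaN
with np.errstate(invalid='ignore'):
    logs = np.log(temps)

print(logs.isna())    # boolean mask marking the missing entries
print(logs.dropna())  # copy with the missing entries removed
print(logs.mean())    # reductions skip NaN by default
```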
In [139]:
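# slice using index labels; both endpoints are included
# note: rows are returned only if 'Thu' comes before 'Wed' in the index
# (older pandas sorted dict keys, newer versions keep insertion order)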
s4['Thu':'Wed']
Out[139]:
In [140]:
# slice using position
s4[1:3]
Out[140]:
In [141]:
# retrieve value using offset (iloc makes the positional lookup explicit)
s4.iloc[1]
Out[141]:
In [142]:
# set value using offset (iloc makes the positional assignment explicit)
s4.iloc[1] = 199
In [143]:
s4
Out[143]:
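Label-based and position-based access follow different slicing rules. The sketch below contrasts them using .loc and .iloc on a made-up Series; the variable names are hypothetical.

```python
import pandas as pd

# hypothetical data for illustration
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
temps = pd.Series([33, 19, 15, 89, 11], index=days)

# label slice: both endpoints are included
by_label = temps.loc['Tue':'Thu']
print(list(by_label.index))

# position slice: the end position is excluded, as with Python lists
by_position = temps.iloc[1:3]
print(list(by_position.index))

# the same accessors work for assignment
temps.iloc[1] = 199     # second element, whatever its label
temps.loc['Fri'] = 0    # element labelled 'Fri', whatever its position
print(temps)
```

Using .loc and .iloc avoids any ambiguity about whether an integer is meant as a label or as a position.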
In [144]:
# Series is a valid argument to most NumPy functions (in current pandas it wraps an ndarray
# rather than subclassing it) and also provides its own statistical methods - median
s4
Out[144]:
In [145]:
s4.median()
Out[145]:
In [146]:
# maximum
s4.max()
Out[146]:
In [147]:
# cumsum
s4.cumsum()
Out[147]:
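To illustrate how a Series interacts with NumPy, the sketch below passes one to an element-wise function and to a reduction. The data and names (temps, roots, middle, running) are invented for this example.

```python
import numpy as np
import pandas as pd

# hypothetical data for illustration
temps = pd.Series([33, 19, 15, 89, 11],
                  index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])

# element-wise NumPy function: the result is a Series with the same index
roots = np.sqrt(temps)
print(roots)

# NumPy reduction: the result is a single number
middle = np.median(temps)
print(middle, temps.median())

# the Series method keeps the index as well
running = temps.cumsum()
print(running)
```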
In [148]:
# looping over the values together with their positions
for i, v in enumerate(s4):
    print(i, v)
In [149]:
# list comprehension can be used to create a new list
new_list = [x**2 for x in s4]
In [150]:
new_list
Out[150]:
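A list comprehension returns a plain Python list and drops the index. As a sketch of the alternative, the vectorized expression below gives the same numbers while keeping the labels; the data and names are hypothetical.

```python
import pandas as pd

# hypothetical data for illustration
temps = pd.Series([33, 19, 15], index=['Mon', 'Tue', 'Wed'])

# list comprehension: plain list, labels are lost
squares_list = [x ** 2 for x in temps]
print(squares_list)

# vectorized operation: a new Series with the original labels
squares_series = temps ** 2
print(squares_series)
```

The vectorized form is usually the better choice when the result feeds further pandas operations.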
In [151]:
# is the key in the index?
'Sun' in s4
Out[151]:
In [152]:
Out[152]:
In [153]:
# retrieve value using key or index
s4['Tue']
Out[153]:
In [154]:
# assignment using key
s4['Tue']=200
In [155]:
s4
Out[155]:
In [157]:
# looping over keys and values, as with a dictionary
# (items() replaces iteritems(), which was removed in pandas 2.0)
for k, v in s4.items():
    print(k, v)
In [ ]:
from IPython.core.display import HTML
HTML("<iframe src=http://pandas.pydata.org/pandas-docs/dev/dsintro.html#series width=800 height=350></iframe>")
In [ ]:
!pwd
In [ ]:
# 50 uniform random numbers in [0, 1)
np.random.rand(50)
In [ ]: