Pandas Intro

From this tutorial

What is pandas? Why use Pandas?

Pandas is an open source Python library. As with most programming languages, Python lets you manipulate data very easily. This means that my days of cleaning data almost by hand are behind me, now that I have moved into the green fields of programming. But after cleaning the data, you still need a program to load it. For me that was good old proprietary Stata, which is very user friendly but expensive! Most user-friendly programs are expensive too. After many months of “trial” versions, hacking my way into several more trials, and even considering buying a pirated copy of the program, I decided that it was going to be much easier to just learn Python.

Data Structures

Pandas has two data structures. Both are built on top of NumPy, which makes them fast.

Structure    Description
Series       One-dimensional object, similar to an array. It assigns a label index to each item.
DataFrame    Tabular data structure with rows and columns.
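To make the NumPy connection concrete, here is a minimal sketch. The variable names (`numbers`, `values`, `doubled`) are only illustrative. It shows that a numeric Series hands back a NumPy array and that arithmetic is applied to all items at once.

```python
import numpy as np
import pandas as pd

# A numeric Series stores its values in a NumPy array.
numbers = pd.Series([1.5, 2.5, 4.0])
values = numbers.to_numpy()

print(type(values))
print(values.dtype)

# Arithmetic is applied to every item at once, with no explicit Python loop.
doubled = numbers * 2
print(doubled.tolist())
```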

Series

A Series is a one-dimensional object similar to an array, a list, or a column in a table. It has a labeled index for each item. By default, the index runs from 0 to N-1, where N is the number of items.


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', 50)  # full option name; the short 'max_columns' can be ambiguous in newer pandas
%matplotlib inline

In [4]:
series = pd.Series([1, "number", 6, "Happy Series!"])
series


Out[4]:
0                1
1           number
2                6
3    Happy Series!
dtype: object
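The default index can be replaced with your own labels through the `index` argument. A small sketch, with made-up values and labels:

```python
import pandas as pd

# Without an index argument, the labels are 0, 1, 2, ...
default_index = pd.Series([10, 20, 30])

# With an index argument, each item gets the label we choose.
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(list(default_index.index))
print(list(labeled.index))
print(labeled["b"])
```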

Series from a dictionary:


In [5]:
dictionary = {'Favorite Food': 'mexican', 'Favorite city': 'Portland', 'Hometown': 'Mexico City'}
favorite = pd.Series(dictionary)
favorite


Out[5]:
Favorite Food        mexican
Favorite city       Portland
Hometown         Mexico City
dtype: object
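The dictionary keys become the index labels. As a sketch of a related option, passing `index` together with the dictionary picks and orders the keys. A label that is not in the dictionary (here the hypothetical 'Favorite color') comes back as a missing value.

```python
import pandas as pd

dictionary = {'Favorite Food': 'mexican', 'Favorite city': 'Portland', 'Hometown': 'Mexico City'}

# Choose which keys to keep and in what order.
# 'Favorite color' is not in the dictionary, so its value is NaN.
selected = pd.Series(dictionary, index=['Hometown', 'Favorite city', 'Favorite color'])

print(selected)
print(selected.notnull().tolist())
```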

Accessing an item from a Series:


In [7]:
favorite['Favorite Food']


Out[7]:
'mexican'
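Besides square brackets, a Series can be read by label with `.loc`, by position with `.iloc`, or with `.get` when the label might not exist. A short sketch that rebuilds the same Series (the 'Favorite color' label is made up to show the fallback):

```python
import pandas as pd

favorite = pd.Series({'Favorite Food': 'mexican',
                      'Favorite city': 'Portland',
                      'Hometown': 'Mexico City'})

by_label = favorite.loc['Hometown']      # lookup by index label
by_position = favorite.iloc[0]           # lookup by integer position
fallback = favorite.get('Favorite color', 'unknown')  # default instead of a KeyError

print(by_label, by_position, fallback)
```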

Boolean indexing for selection:


In [10]:
favorite[favorite=='mexican']


Out[10]:
Favorite Food    mexican
dtype: object
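What happens here is that the comparison first builds a Series of True/False values, and that mask then picks the matching items. A sketch that shows the mask on its own and combines two conditions with `|` (or):

```python
import pandas as pd

favorite = pd.Series({'Favorite Food': 'mexican',
                      'Favorite city': 'Portland',
                      'Hometown': 'Mexico City'})

# The comparison returns one boolean per item.
mask = favorite == 'mexican'
print(mask.tolist())

# Conditions can be combined; each one needs its own parentheses.
combined = favorite[(favorite == 'mexican') | (favorite == 'Portland')]
print(combined)
```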

The notnull function. The Series below also contains an `other` entry, apparently added in a cell that is not shown:


In [16]:
favorite.notnull()


Out[16]:
Favorite Food    True
Favorite city    True
Hometown         True
other            True
dtype: bool

In [18]:
favorite[favorite.notnull()]


Out[18]:
Favorite Food        mexican
Favorite city       Portland
Hometown         Mexico City
other                      0
dtype: object
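Since every item above is non-null, the filter keeps everything. A sketch with an invented `ratings` Series that does have a missing value makes the effect easier to see:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the rating for 'pizza' is missing.
ratings = pd.Series({'tacos': 5.0, 'pizza': np.nan, 'ramen': 4.0})

present = ratings.notnull()
print(present.tolist())

# Keep only the items that have a value.
known = ratings[present]
print(known)

# dropna() gives the same result in one step.
same = ratings.dropna()
```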

DataFrame

To create a DataFrame, we can pass a dictionary of lists into the DataFrame constructor.


In [19]:
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])
football


Out[19]:
   year     team  wins  losses
0  2010    Bears    11       5
1  2011    Bears     8       8
2  2012    Bears    10       6
3  2011  Packers    15       1
4  2012  Packers    11       5
5  2010    Lions     6      10
6  2011    Lions    10       6
7  2012    Lions     4      12
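Each column of the DataFrame is itself a Series, so the selection ideas from the Series section carry over. A sketch using the same data (the variable names `wins`, `bears` and `winning` are only illustrative):

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])

# Selecting one column returns a Series.
wins = football['wins']
print(type(wins))

# Boolean indexing selects rows.
bears = football[football['team'] == 'Bears']
winning = football[football['wins'] > football['losses']]

print(football.shape)
print(bears)
print(winning)
```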