Pandas Intro

What is pandas? Why use pandas?

Pandas is an open source Python library. As with most programming languages, Python lets you manipulate data very easily. This means that my days of wrangling data almost by hand are behind me, now that I have moved into the green fields of programming. But after cleaning the data, you still need a program to load it. For me, good old proprietary Stata was very user friendly, but it is expensive! Most user-friendly programs are expensive too. After going through many months of “trial” versions and hacking my way into several more trials, and after even considering buying a pirated copy of the program, I decided that it would be much easier to just learn Python.

Data Structures

Pandas has two data structures. Both are built on top of NumPy, which makes them fast.

Structure	Description
Series	One-dimensional object, similar to an array. It assigns a label index to each item.
DataFrame	Tabular data structure with rows and columns.
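
To make the NumPy connection a bit more concrete, here is a minimal sketch. The numbers are made up for illustration. It shows that arithmetic on a Series applies to every item at once, and that the underlying data can be retrieved as a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical numbers, used only to illustrate the idea.
values = pd.Series([1.5, 2.0, 3.5])

# Arithmetic applies to every item at once, with no explicit Python loop.
doubled = values * 2

# The underlying data can be pulled out as a NumPy array.
array = values.to_numpy()

print(doubled.tolist())  # [3.0, 4.0, 7.0]
print(type(array))       # <class 'numpy.ndarray'>
```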

Series

A Series is a one-dimensional object similar to an array, a list, or a column in a table. It has a labeled index for each item. By default, the index labels run from 0 to N-1, where N is the number of items.



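In [2]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Show up to 50 columns when displaying a DataFrame
pd.set_option('display.max_columns', 50)
# IPython magic: render matplotlib plots inline in the notebook
%matplotlib inline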



In [4]:

series = pd.Series([1, "number", 6, "Happy Series!"])
series

Out[4]:

0                1
1           number
2                6
3    Happy Series!
dtype: object
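
The default integer labels can be replaced with your own. The sketch below uses made-up names and scores (they are not from the data above) and passes them through the `index` argument of `pd.Series`.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration.
scores = pd.Series([90, 75, 82], index=["ana", "luis", "mia"])

# Without an explicit index, pandas falls back to integer labels.
default_scores = pd.Series([90, 75, 82])

print(scores.index.tolist())          # ['ana', 'luis', 'mia']
print(default_scores.index.tolist())  # [0, 1, 2]
```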

Series from a dictionary:



In [5]:

dictionary = {'Favorite Food': 'mexican', 'Favorite city': 'Portland', 'Hometown': 'Mexico City'}
favorite = pd.Series(dictionary)
favorite

Out[5]:

Favorite Food        mexican
Favorite city       Portland
Hometown         Mexico City
dtype: object

Accessing an item from a Series:



In [7]:

favorite['Favorite Food']

Out[7]:

'mexican'
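
Besides square brackets, a Series can also be read by label with `.loc`, by position with `.iloc`, or with `.get` when the label might not exist. This is a small sketch that rebuilds the same Series. The label `'Favorite color'` is a made-up example of a missing key.

```python
import pandas as pd

favorite = pd.Series({'Favorite Food': 'mexican',
                      'Favorite city': 'Portland',
                      'Hometown': 'Mexico City'})

# Access by label.
by_label = favorite.loc['Hometown']

# Access by position (0 is the first item).
by_position = favorite.iloc[0]

# .get returns a fallback instead of raising KeyError for a missing label.
# 'Favorite color' is a hypothetical label that is not in the Series.
missing = favorite.get('Favorite color', 'unknown')

print(by_label)     # Mexico City
print(by_position)  # mexican
print(missing)      # unknown
```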

Boolean indexing for selection:



In [10]:

favorite[favorite=='mexican']

Out[10]:

Favorite Food    mexican
dtype: object
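
Boolean indexing works in two steps: the comparison builds a Series of True/False values (a mask), and the mask then picks the matching items. Here is a rough sketch of those two steps with numbers. The Series name `wins` and the threshold of 10 are assumptions made for the example.

```python
import pandas as pd

# Wins per year for one team; the threshold below is an arbitrary choice.
wins = pd.Series([11, 8, 10], index=[2010, 2011, 2012])

# Step 1: the comparison yields a boolean mask with the same index.
mask = wins >= 10

# Step 2: indexing with the mask keeps only the items where it is True.
winning_years = wins[mask]

print(mask.tolist())                 # [True, False, True]
print(winning_years.index.tolist())  # [2010, 2012]
```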

The notnull() function, which flags the items that are not missing:



In [16]:

favorite.notnull()

Out[16]:

Favorite Food    True
Favorite city    True
Hometown         True
other            True
dtype: bool



In [18]:

favorite[favorite.notnull()]

Out[18]:

Favorite Food        mexican
Favorite city       Portland
Hometown         Mexico City
other                      0
dtype: object
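
Since every item above is present, the filter does not remove anything. The following sketch uses a separate, made-up Series called `answers` with one missing value, to show what `notnull()` does when there is something to drop.

```python
import pandas as pd

# Hypothetical Series with one missing entry (None).
answers = pd.Series({'Favorite Food': 'mexican',
                     'Favorite city': None,
                     'Hometown': 'Mexico City'})

# True where a value is present, False where it is missing.
present = answers.notnull()

# Keep only the items that have a value.
cleaned = answers[present]

print(present.tolist())        # [True, False, True]
print(cleaned.index.tolist())  # ['Favorite Food', 'Hometown']
```

For this simple case, `answers.dropna()` should give the same result in a single call.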

DataFrame

To create a DataFrame, we can pass a dictionary of lists into the DataFrame constructor.



In [19]:

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])
football

	year	team	wins	losses
0	2010	Bears	11	5
1	2011	Bears	8	8
2	2012	Bears	10	6
3	2011	Packers	15	1
4	2012	Packers	11	5
5	2010	Lions	6	10
6	2011	Lions	10	6
7	2012	Lions	4	12
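
Once the DataFrame exists, the Series ideas above carry over. As a minimal sketch, the block below rebuilds the same table and shows that a single column comes back as a Series and that boolean indexing selects rows. The variable names `wins` and `bears` are my own.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])

# Selecting one column returns a Series.
wins = football['wins']

# A boolean mask on a column selects the matching rows.
bears = football[football['team'] == 'Bears']

print(type(wins).__name__)  # Series
print(len(bears))           # 3
print(bears['wins'].sum())  # 29
```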