Software Engineering for Data Scientists

Manipulating Data with Python

DATA 515 A

Today's Objectives

0. Cloning LectureNotes

1. Opening & Navigating the Jupyter Notebook

2. Data type basics

3. Loading data with pandas

4. Cleaning and Manipulating data with pandas

5. Visualizing data with pandas & matplotlib

0. Cloning Lecture Notes

The course materials are maintained on GitHub. The next lecture will discuss GitHub in detail. For now, the minimal instructions below will give you access to today's lecture materials.

  1. Open a terminal session
  2. Type 'git clone https://github.com/UWSEDS/LectureNotes.git'
  3. Wait until the download is complete
  4. cd LectureNotes
  5. cd 02_Procedural_Python

1. Opening and Navigating the Jupyter Notebook

We will start today with the interactive environment that we will be using often through the course: the Jupyter Notebook.

We will walk through the following steps together:

  1. Download miniconda (be sure to get Version 3.6) and install it on your system (hopefully you have done this before coming to class)

  2. Use the conda command-line tool to update your package listing and install the Jupyter notebook:

    Update conda's listing of packages for your system:

    $ conda update conda

    Install the Jupyter notebook and all its requirements:

    $ conda install jupyter notebook
  3. Navigate to the directory containing the course material. For example:

    $ cd LectureNotes/02_Procedural_Python

    You should see a number of files in the directory, including these:

    $ ls
  4. Type jupyter notebook in the terminal to start the notebook

    $ jupyter notebook

    If everything has worked correctly, it should automatically launch your default browser

  5. Click on Lecture-Python-And-Data.ipynb to open the notebook containing the content for this lecture.

With that, you're set up to use the Jupyter notebook!

2. Data Types Basics

2.1 Data type theory

  • Components with the same capabilities are of the same type.
    • For example, the numbers 2 and 200 are both integers.
  • A type is defined recursively. Some examples.
    • A list is a collection of objects that can be indexed by position.
    • A list of integers contains an integer at each position.
  • A type has a set of supported operations. For example:
    • Integers can be added
    • Strings can be concatenated
    • A table can find the name of its columns
      • What type is returned from the operation?
  • In Python, members (components and operations) are indicated by a '.'
    • If a is a list, then a.append(1) adds 1 to the list.
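The points above can be tried directly in a notebook cell. The following is a minimal sketch using only built-in types; the variable names are illustrative and do not come from the lecture.

```python
# Values with the same capabilities share a type.
print(type(2) is type(200))        # both are int

# Each type supports its own operations.
int_sum = 2 + 200                  # integers can be added
greeting = "data" + " science"     # strings can be concatenated

# A type can be built from other types: here, a list of integers.
numbers = [1, 2, 3]
print(type(numbers), type(numbers[0]))

# Members are reached with '.'; append is a method of list.
numbers.append(4)
print(int_sum, greeting, numbers)
```

Note that the result of an operation has a type of its own: adding two integers gives an integer, while append changes the list in place and returns None.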

2.2 Primitive types

The primitive types are integers, floats, strings, and booleans.

2.2.1 Integers


In [71]:
# Integer arithmetic
1 + 1


Out[71]:
2

In [72]:
# Integer division versus floating point division
print(6 // 4, 6 / 4)


1 1.5

2.2.2 Floats


In [73]:
# Have the full set of "calculator functions" but need the numpy package
import numpy as np
print (6.0 * 3, np.sin(2*np.pi))


18.0 -2.4492935982947064e-16

In [74]:
# Floats can have a null value called nan, not a number
a = np.nan
3*a


Out[74]:
nan
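One property of nan is easy to trip over: it does not compare equal to anything, including itself. The sketch below uses the standard-library math module rather than numpy so that it runs without extra packages; np.nan is also a plain Python float, so the same behavior should apply to it.

```python
import math

missing = math.nan            # a float "not a number" value, like np.nan

# Arithmetic with nan yields nan.
result = 3 * missing

# nan is not equal to anything, including itself, so test with isnan.
equal_to_itself = (missing == missing)
detected = math.isnan(result)
print(result, equal_to_itself, detected)
```

This is why missing values are usually detected with a dedicated function such as math.isnan or np.isnan rather than with ==.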

2.2.3 Strings


In [75]:
# Can concatenate, substring, find, count, ...

In [76]:
a = "The lazy"
b = "brown fox"
print ("Concatenation: ", a + b)
print ("First three letters: " + a[0:3])
print ("Index of 'z': " + str(a.find('z')))


Concatenation:  The lazybrown fox
First three letters: The
Index of 'z': 6

2.3 Tuples

A tuple is an ordered sequence of objects. Tuples cannot be changed; they are immutable.


In [77]:
a_tuple = (1, 'ab', (1,2))
a_tuple


Out[77]:
(1, 'ab', (1, 2))

In [78]:
a_tuple[2]


Out[78]:
(1, 2)
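To see what immutability means in practice, the following sketch tries to change an element of the tuple defined above. The name longer_tuple is introduced here only for illustration.

```python
a_tuple = (1, 'ab', (1, 2))

try:
    a_tuple[0] = 99            # item assignment is not supported
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Building a new tuple is allowed; the original is untouched.
longer_tuple = a_tuple + (3,)
print(a_tuple, longer_tuple)
```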

2.4 Lists

A list is an ordered sequence of objects that can be changed.


In [79]:
a_list = [1, 'a', [1,2]]

In [80]:
a_list[0]


Out[80]:
1

In [81]:
a_list.append(2)
a_list


Out[81]:
[1, 'a', [1, 2], 2]

In [82]:
a_list


Out[82]:
[1, 'a', [1, 2], 2]

In [83]:
dir(a_list)


Out[83]:
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [84]:
help (a_list)


Help on list object:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.n
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __reversed__(...)
 |      L.__reversed__() -- return a reverse iterator over the list
 |  
 |  __rmul__(self, value, /)
 |      Return self*value.
 |  
 |  __setitem__(self, key, value, /)
 |      Set self[key] to value.
 |  
 |  __sizeof__(...)
 |      L.__sizeof__() -- size of L in memory, in bytes
 |  
 |  append(...)
 |      L.append(object) -> None -- append object to end
 |  
 |  clear(...)
 |      L.clear() -> None -- remove all items from L
 |  
 |  copy(...)
 |      L.copy() -> list -- a shallow copy of L
 |  
 |  count(...)
 |      L.count(value) -> integer -- return number of occurrences of value
 |  
 |  extend(...)
 |      L.extend(iterable) -> None -- extend list by appending elements from the iterable
 |  
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value.
 |      Raises ValueError if the value is not present.
 |  
 |  insert(...)
 |      L.insert(index, object) -- insert object before index
 |  
 |  pop(...)
 |      L.pop([index]) -> item -- remove and return item at index (default last).
 |      Raises IndexError if list is empty or index is out of range.
 |  
 |  remove(...)
 |      L.remove(value) -> None -- remove first occurrence of value.
 |      Raises ValueError if the value is not present.
 |  
 |  reverse(...)
 |      L.reverse() -- reverse *IN PLACE*
 |  
 |  sort(...)
 |      L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __hash__ = None


In [85]:
a_list.count(1)


Out[85]:
1

2.5 Dictionaries

A dictionary associates a key with a value. A value can be any object, even another dictionary.


In [86]:
dessert_dict = {}  # Empty dictionary
dessert_dict['Dave'] = "Cake"
dessert_dict["Joe"] = ["Cake", "Pie"]
print (dessert_dict)


{'Dave': 'Cake', 'Joe': ['Cake', 'Pie']}

In [87]:
dessert_dict["Dave"]


Out[87]:
'Cake'

In [88]:
# Assigning to a key that does not exist yet adds it (no error is raised)
dessert_dict["Bernease"] = {}
dessert_dict


Out[88]:
{'Bernease': {}, 'Dave': 'Cake', 'Joe': ['Cake', 'Pie']}

In [89]:
dessert_dict["Bernease"] = {"Favorite": ["sorbet", "cobbler"], "Dislike": "Brownies"}
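Once values are nested, it helps to know how to reach into them and how to handle a key that is absent. The sketch below rebuilds the dictionary from the cells above; the key "Alice" is made up for the example.

```python
dessert_dict = {
    "Dave": "Cake",
    "Joe": ["Cake", "Pie"],
    "Bernease": {"Favorite": ["sorbet", "cobbler"], "Dislike": "Brownies"},
}

# Chain square brackets to reach into a nested value.
first_favorite = dessert_dict["Bernease"]["Favorite"][0]

# A missing key raises KeyError with []; get() returns a default instead.
unknown = dessert_dict.get("Alice", "no entry")   # "Alice" is a made-up key

# Loop over key/value pairs.
for name, dessert in dessert_dict.items():
    print(name, "->", dessert)

print(first_favorite, unknown)
```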

2.6 A Shakespearean Detour: "What's in a Name?"

Deep vs. Shallow Copies

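A deep copy can be manipulated separately from the original. Plain assignment, by contrast, makes no copy: the new name refers to the same object as the original. (A true shallow copy, as made by copy.copy, duplicates only the outer container and shares the objects inside it.)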


In [90]:
# A first name shell game
first_int = 1
second_int = first_int
second_int += 1
second_int


Out[90]:
2

In [91]:
# What is first_int?
first_int


Out[91]:
1

In [92]:
# A second name shell game
a_list = ['a', 'aa', 'aaa']
b_list = a_list
b_list.append('bb')
b_list


Out[92]:
['a', 'aa', 'aaa', 'bb']

In [93]:
# What is a_list?
a_list


Out[93]:
['a', 'aa', 'aaa', 'bb']

In [94]:
# Create a deep copy
import copy
# A second name shell game
a_list = ['a', 'aa', 'aaa']
b_list = copy.deepcopy(a_list)
b_list.append('bb')
print("b_list = %s" % str(b_list))
print("a_list = %s" % str(a_list))


b_list = ['a', 'aa', 'aaa', 'bb']
a_list = ['a', 'aa', 'aaa']

Key insight: Deep vs. Shallow Copies

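  • A deep copy can be manipulated separately from the original.
  • Assignment makes no copy: both names refer to the same object, so a change made through one name is visible through the other.
  • A shallow copy (copy.copy) duplicates only the outer container; mutable objects nested inside it are still shared.
  • Immutable objects such as integers and strings cannot be changed in place, so an operation like += binds the name to a new object and the original is unaffected.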
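Python's standard copy module distinguishes between plain assignment, copy.copy, and copy.deepcopy. The difference only becomes visible when the list contains another mutable object. The sketch below uses a nested list chosen for illustration.

```python
import copy

original = ['a', ['x', 'y']]

alias = original                    # assignment: a second name, no copy
shallow = copy.copy(original)       # new outer list, shared inner list
deep = copy.deepcopy(original)      # fully independent copy

original.append('b')                # visible through alias only
original[1].append('z')             # visible through alias and shallow

print("alias  :", alias)
print("shallow:", shallow)
print("deep   :", deep)
```

Only the deep copy is unaffected by both changes; the shallow copy keeps its own outer list but still shares the inner one.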

Name Resolution

The most common errors that you'll see in your Python code are:

  • NameError
  • AttributeError

A common error when using the bash shell is command not found.

Name resolution: Associating a name with code or data.

Resolving a name in the bash shell is done by searching the directories in the PATH environment variable. The first executable with the name is run.


In [95]:
# Example 1 of name resolution in python
var = 10
def func(val):
    var = val + 1
    return val

In [96]:
# What is returned?
print("func(2) = %d" % func(2))
# What is var?
print("var = %d" % var)


func(2) = 2
var = 10

In [97]:
# Example 2 of name resolution in python
var = 10
def func(val):
    return val + var

In [98]:
# What is returned?
print("func(2) = %d" % func(2))
# What is var?
print("var = %d" % var)


func(2) = 12
var = 10

Insights on python name resolution

  • Names are assigned within a context.
  • Context changes with the function and module.
    • Assigning a name in a function creates a new name that is local to that function.
    • Referencing a name that is not assigned in the function uses the existing name from the enclosing context (for example, the module).
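The two rules above, and the two errors named at the start of this section, can be reproduced in a few lines. This is a minimal sketch; the names counter, read_only, shadow, and undefined_name are hypothetical.

```python
counter = 10

def read_only():
    # counter is not assigned here, so the existing outer name is used.
    return counter + 1

def shadow():
    counter = 0            # creates a new local name; the outer one is untouched
    return counter

results = (read_only(), shadow(), counter)
print(results)

# NameError: the name cannot be resolved in any context.
try:
    undefined_name
except NameError as err:
    name_error = str(err)
    print("NameError:", name_error)

# AttributeError: the object exists, but it has no member with that name.
try:
    [1, 2].push(3)         # lists have append, not push
except AttributeError as err:
    attribute_error = str(err)
    print("AttributeError:", attribute_error)
```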

2.7 Object Essentials

Objects are a "packaging" of data and code. Almost all Python entities are objects.


In [99]:
# A list and a dict are objects.
# dict has been implemented so that you see its values when you type
# the instance name.
# This is done with many python objects, like list.
a_dict = {'a': [1, 2], 'b': [3, 4, 5]}
a_dict


Out[99]:
{'a': [1, 2], 'b': [3, 4, 5]}

In [100]:
# You access the data and methods (codes) associated with an object by
# using the "." operator. These are referred to collectively
# as attributes. Methods are followed by parentheses;
# values (properties) are not.
a_dict.keys()


Out[100]:
dict_keys(['a', 'b'])

In [101]:
# You can discover the attributes of an object using "dir"
dir(a_dict)


Out[101]:
['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']
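To make the idea of "packaging data and code" concrete, here is a small sketch of a user-defined object. The Trip class and its fields are hypothetical and are not part of the lecture materials.

```python
class Trip:
    """A hypothetical record of one bike trip (illustration only)."""

    def __init__(self, start_station, seconds):
        # Data: values stored on the instance.
        self.start_station = start_station
        self.seconds = seconds

    def minutes(self):
        # Code: a method that works on the instance's data.
        return self.seconds / 60


trip = Trip("2nd Ave & Spring St", 990)
print(trip.start_station)     # attribute: no parentheses
print(trip.minutes())         # method: called with parentheses

# dir() lists our attributes alongside the built-in double-underscore ones.
public_names = [name for name in dir(trip) if not name.startswith('_')]
print(public_names)
```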

2.8 Summary


type        description
primitive   int, float, string, bool
tuple       An immutable collection of ordered objects
list        A mutable collection of ordered objects
dictionary  A mutable collection of named objects
object      A packaging of code and data

3. Python's Data Science Ecosystem

With this simple Python computation experience under our belt, we can now move to doing some more interesting analysis.

In addition to Python's built-in modules, there are also many often-used third-party modules that are core tools for doing data science with Python. Some of the most important ones are:

numpy: Numerical Python

NumPy is short for "Numerical Python", and contains tools for efficient manipulation of arrays of data. If you have used other computational tools like IDL or MATLAB, NumPy should feel very familiar.

scipy: Scientific Python

SciPy is short for "Scientific Python", and contains a wide range of functionality for accomplishing common scientific tasks, such as optimization/minimization, numerical integration, interpolation, and much more. We will not look closely at SciPy today, but we will use its functionality later in the course.

pandas: Labeled Data Manipulation in Python

Pandas is short for "Panel Data", and contains tools for doing more advanced manipulation of labeled data in Python, in particular with a columnar data structure called a DataFrame. If you've used the R statistical language (and in particular the so-called "Hadley Stack"), much of the functionality in Pandas should feel very familiar.

matplotlib: Visualization in Python

Matplotlib started out as a MATLAB plotting clone in Python, and has grown from there in the 15 years since its creation. It is the most popular data visualization tool currently in the Python data world (though other recent packages are starting to encroach on its monopoly).

Installing Pandas & friends

Because the above packages are not included in Python itself, you need to install them separately. While it is possible to install these from source (compiling the C and/or Fortran code that does the heavy lifting under the hood) it is much easier to use a package manager like conda. All it takes is to run

$ conda install numpy scipy pandas matplotlib

and (so long as your conda setup is working) the packages will be downloaded and installed on your system.
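One way to confirm that the installation worked is to ask Python whether it can locate each package. The sketch below uses only the standard library (importlib.util.find_spec), so it runs even if a package is missing; the status dictionary is just an illustrative name.

```python
import importlib.util

packages = ["numpy", "scipy", "pandas", "matplotlib"]

# find_spec returns None when a top-level package cannot be found.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name:12s} {'installed' if installed else 'MISSING'}")
```

If any package is reported as missing, rerunning the conda install command above in the same environment that launches the notebook is usually the first thing to try.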

4. Introduction to DataFrames

What are the elements of a table?


In [102]:
# Pandas DataFrames as table elements
import pandas as pd

What operations do we perform on tables?


In [103]:
df = pd.DataFrame({'A': [1,2,3], 'B': [2, 4, 6], 'ccc': [1.0, 33, 4]})
df


Out[103]:
   A  B   ccc
0  1  2   1.0
1  2  4  33.0
2  3  6   4.0

In [104]:
sub_df = df[['A', 'ccc']]
sub_df


Out[104]:
   A   ccc
0  1   1.0
1  2  33.0
2  3   4.0

In [105]:
df['A'] + 2*df['B']


Out[105]:
0     5
1    10
2    15
dtype: int64

In [106]:
# Operations on a Pandas DataFrame
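A few common table operations are adding a computed column, selecting the rows that meet a condition, and summarizing a column. The sketch below applies them to the same small table as above, under the separate name demo_df so that the lecture's df is left alone. It assumes pandas is installed.

```python
import pandas as pd

demo_df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 4, 6], 'ccc': [1.0, 33, 4]})

# Add a column computed from existing columns.
demo_df['total'] = demo_df['A'] + 2 * demo_df['B']

# Keep only the rows that satisfy a condition (boolean mask).
big_rows = demo_df[demo_df['ccc'] > 2]

# Summarize a column.
mean_b = demo_df['B'].mean()

print(demo_df)
print(big_rows)
print(mean_b)
```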

5. Manipulating Data with DataFrames

Downloading the data

Shell commands can be run from the notebook by preceding them with an exclamation point:


In [107]:
!ls


Lecture-Python-And-Data.ipynb		pronto.csv
procedural_programming_in_python.ipynb

Uncomment this to download the data:


In [108]:
#!curl -o pronto.csv https://data.seattle.gov/api/views/tw7j-dfaw/rows.csv?accessType=DOWNLOAD

Loading Data into a DataFrame

Because we'll use pandas so much, we often import it under a shortened name using the import ... as ... pattern:


In [109]:
import pandas as pd
df = pd.read_csv('pronto.csv')

In [110]:
type(df)


Out[110]:
pandas.core.frame.DataFrame

In [111]:
len(df)


Out[111]:
275091

The read_csv function used above reads the comma-separated-value data into a DataFrame.

Note: strings in Python can be defined either with double quotes or single quotes

Viewing Pandas DataFrames

The head() and tail() methods show us the first and last rows of the data:


In [112]:
df.head()


Out[112]:
trip_id starttime stoptime bikeid tripduration from_station_name to_station_name from_station_id to_station_id usertype gender birthyear
0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 985.935 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1960.0
1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 926.375 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1970.0
2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 883.831 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1988.0
3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 865.937 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1977.0
4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 923.923 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1971.0

In [113]:
df.columns


Out[113]:
Index(['trip_id', 'starttime', 'stoptime', 'bikeid', 'tripduration',
       'from_station_name', 'to_station_name', 'from_station_id',
       'to_station_id', 'usertype', 'gender', 'birthyear'],
      dtype='object')

The shape attribute shows us the number of rows and columns:


In [114]:
df.shape


Out[114]:
(275091, 12)

The columns attribute gives us the column names

The index attribute gives us the index names

The dtypes attribute gives the data types of each column:


In [115]:
df.dtypes


Out[115]:
trip_id                int64
starttime             object
stoptime              object
bikeid                object
tripduration         float64
from_station_name     object
to_station_name       object
from_station_id       object
to_station_id         object
usertype              object
gender                object
birthyear            float64
dtype: object
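The starttime and stoptime columns have dtype object, meaning they are stored as plain strings. One possible next step, sketched below on two values copied from the df.head() output, is to convert them with pd.to_datetime. The sample DataFrame and the minutes column are illustrative names, and the format string is inferred from the values shown above.

```python
import pandas as pd

# Two values copied from the starttime/stoptime columns shown by df.head().
sample = pd.DataFrame({
    'starttime': ['10/13/2014 10:31:00 AM', '10/13/2014 10:32:00 AM'],
    'stoptime':  ['10/13/2014 10:48:00 AM', '10/13/2014 10:48:00 AM'],
})

# An explicit format avoids guessing how the strings are laid out.
time_format = '%m/%d/%Y %I:%M:%S %p'
for column in ['starttime', 'stoptime']:
    sample[column] = pd.to_datetime(sample[column], format=time_format)

# Datetime columns support arithmetic and the .dt accessor.
elapsed = sample['stoptime'] - sample['starttime']
sample['minutes'] = elapsed.dt.total_seconds() / 60

print(sample.dtypes)
print(sample['starttime'].dt.hour.tolist(), sample['minutes'].tolist())
```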

Sophisticated Data Manipulation

Here we'll cover some key features of manipulating data with pandas.

Access columns by name using square-bracket indexing:


In [116]:
df_small = df['stoptime']

In [117]:
type(df_small)


Out[117]:
pandas.core.series.Series
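Indexing with a single column name returns a Series, while indexing with a list of names returns a DataFrame, even when the list has only one entry. The sketch below shows the difference on a tiny stand-in table (small is a hypothetical name, with values taken from the df.head() output) and assumes pandas is installed.

```python
import pandas as pd

small = pd.DataFrame({
    'stoptime': ['10/13/2014 10:48:00 AM', '10/13/2014 10:49:00 AM'],
    'bikeid': ['SEA00298', 'SEA00202'],
})

one_column = small['stoptime']        # single name: a Series
sub_table = small[['stoptime']]       # list of names: a one-column DataFrame

print(type(one_column).__name__, type(sub_table).__name__)

# head(n) previews a few values instead of listing the whole column.
print(one_column.head(1).tolist())
```

On a column with hundreds of thousands of rows, previewing with head() is usually more practical than calling tolist() on the full Series.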

In [118]:
df_small.tolist()


Out[118]:
['10/13/2014 10:48:00 AM',
 '10/13/2014 10:48:00 AM',
 '10/13/2014 10:48:00 AM',
 '10/13/2014 10:48:00 AM',
 '10/13/2014 10:49:00 AM',
 '10/13/2014 10:47:00 AM',
 '10/13/2014 11:45:00 AM',
 '10/13/2014 11:45:00 AM',
 '10/13/2014 11:45:00 AM',
 '10/13/2014 11:45:00 AM',
 '10/13/2014 11:45:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:47:00 AM',
 '10/13/2014 11:49:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:49:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:52:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:55:00 AM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 12:02:00 PM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 11:51:00 AM',
 '10/13/2014 11:59:00 AM',
 '10/13/2014 12:00:00 PM',
 '10/13/2014 11:59:00 AM',
 '10/13/2014 11:59:00 AM',
 '10/13/2014 11:59:00 AM',
 '10/13/2014 12:06:00 PM',
 '10/13/2014 12:01:00 PM',
 '10/13/2014 12:04:00 PM',
 '10/13/2014 12:01:00 PM',
 '10/13/2014 12:11:00 PM',
 '10/13/2014 12:11:00 PM',
 '10/13/2014 12:04:00 PM',
 '10/13/2014 12:01:00 PM',
 '10/13/2014 12:04:00 PM',
 '10/13/2014 12:05:00 PM',
 '10/13/2014 12:04:00 PM',
 '10/13/2014 11:55:00 AM',
 '10/13/2014 12:02:00 PM',
 '10/13/2014 12:39:00 PM',
 '10/13/2014 12:39:00 PM',
 '10/13/2014 12:05:00 PM',
 '10/13/2014 12:39:00 PM',
 '10/13/2014 12:19:00 PM',
 '10/13/2014 12:40:00 PM',
 '10/13/2014 12:20:00 PM',
 '10/13/2014 02:28:00 PM',
 '10/13/2014 12:10:00 PM',
 '10/13/2014 12:13:00 PM',
 '10/13/2014 12:13:00 PM',
 '10/13/2014 12:16:00 PM',
 '10/13/2014 12:45:00 PM',
 '10/13/2014 07:07:00 PM',
 '10/13/2014 12:16:00 PM',
 '10/13/2014 12:25:00 PM',
 '10/13/2014 12:20:00 PM',
 '10/13/2014 12:31:00 PM',
 '10/13/2014 12:31:00 PM',
 '10/13/2014 12:31:00 PM',
 '10/13/2014 12:22:00 PM',
 '10/13/2014 12:21:00 PM',
 '10/13/2014 12:34:00 PM',
 '10/13/2014 12:31:00 PM',
 '10/13/2014 12:26:00 PM',
 '10/13/2014 12:32:00 PM',
 '10/13/2014 12:57:00 PM',
 '10/13/2014 12:57:00 PM',
 '10/13/2014 01:21:00 PM',
 '10/13/2014 12:36:00 PM',
 '10/13/2014 12:40:00 PM',
 '10/13/2014 12:40:00 PM',
 '10/13/2014 12:47:00 PM',
 '10/13/2014 12:46:00 PM',
 '10/13/2014 12:45:00 PM',
 '10/13/2014 12:48:00 PM',
 '10/13/2014 12:47:00 PM',
 '10/13/2014 12:51:00 PM',
 '10/13/2014 12:58:00 PM',
 '10/13/2014 12:52:00 PM',
 '10/13/2014 12:58:00 PM',
 '10/13/2014 12:59:00 PM',
 '10/13/2014 01:15:00 PM',
 '10/13/2014 01:04:00 PM',
 '10/13/2014 01:18:00 PM',
 '10/13/2014 01:16:00 PM',
 '10/13/2014 01:13:00 PM',
 '10/13/2014 01:10:00 PM',
 '10/13/2014 01:12:00 PM',
 '10/13/2014 01:16:00 PM',
 '10/13/2014 01:14:00 PM',
 '10/13/2014 01:30:00 PM',
 '10/13/2014 01:22:00 PM',
 '10/13/2014 01:22:00 PM',
 '10/13/2014 01:15:00 PM',
 '10/13/2014 01:09:00 PM',
 '10/13/2014 01:18:00 PM',
 '10/13/2014 01:27:00 PM',
 '10/13/2014 01:24:00 PM',
 '10/13/2014 01:30:00 PM',
 '10/13/2014 01:33:00 PM',
 '10/13/2014 01:33:00 PM',
 '10/13/2014 01:33:00 PM',
 '10/13/2014 01:42:00 PM',
 '10/13/2014 01:42:00 PM',
 '10/13/2014 01:42:00 PM',
 '10/13/2014 12:47:00 PM',
 '10/13/2014 12:49:00 PM',
 '10/13/2014 01:44:00 PM',
 '10/13/2014 02:29:00 PM',
 '10/13/2014 01:52:00 PM',
 '10/13/2014 02:22:00 PM',
 '10/13/2014 01:45:00 PM',
 '10/13/2014 01:55:00 PM',
 '10/13/2014 01:56:00 PM',
 '10/13/2014 02:03:00 PM',
 '10/13/2014 01:52:00 PM',
 '10/13/2014 02:09:00 PM',
 '10/13/2014 02:09:00 PM',
 '10/13/2014 01:49:00 PM',
 '10/13/2014 02:09:00 PM',
 '10/13/2014 01:56:00 PM',
 '10/13/2014 01:59:00 PM',
 '10/13/2014 02:08:00 PM',
 '10/13/2014 02:05:00 PM',
 '10/13/2014 02:12:00 PM',
 '10/13/2014 02:07:00 PM',
 '10/13/2014 02:35:00 PM',
 '10/13/2014 02:35:00 PM',
 '10/13/2014 02:07:00 PM',
 '10/13/2014 02:11:00 PM',
 '10/13/2014 02:12:00 PM',
 '10/13/2014 02:35:00 PM',
 '10/13/2014 02:18:00 PM',
 '10/13/2014 02:18:00 PM',
 '10/13/2014 02:18:00 PM',
 '10/13/2014 02:18:00 PM',
 '10/13/2014 02:39:00 PM',
 '10/13/2014 02:39:00 PM',
 '10/13/2014 02:16:00 PM',
 '10/13/2014 02:39:00 PM',
 '10/13/2014 02:39:00 PM',
 '10/13/2014 03:08:00 PM',
 '10/13/2014 02:38:00 PM',
 '10/13/2014 02:38:00 PM',
 '10/13/2014 02:20:00 PM',
 '10/13/2014 02:24:00 PM',
 '10/13/2014 02:38:00 PM',
 '10/13/2014 02:26:00 PM',
 '10/13/2014 02:34:00 PM',
 '10/13/2014 02:34:00 PM',
 '10/13/2014 02:31:00 PM',
 '10/13/2014 02:26:00 PM',
 '10/13/2014 02:29:00 PM',
 '10/13/2014 02:48:00 PM',
 '10/13/2014 02:48:00 PM',
 '10/13/2014 02:51:00 PM',
 '10/13/2014 02:37:00 PM',
 '10/13/2014 02:33:00 PM',
 '10/13/2014 02:34:00 PM',
 '10/13/2014 02:31:00 PM',
 '10/13/2014 02:30:00 PM',
 '10/13/2014 02:49:00 PM',
 '10/13/2014 02:40:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 02:39:00 PM',
 '10/13/2014 02:49:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 02:55:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 03:01:00 PM',
 '10/13/2014 02:54:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 02:56:00 PM',
 '10/13/2014 03:09:00 PM',
 '10/13/2014 03:10:00 PM',
 '10/13/2014 05:59:00 PM',
 '10/13/2014 03:02:00 PM',
 '10/13/2014 03:13:00 PM',
 '10/13/2014 03:10:00 PM',
 '10/13/2014 03:16:00 PM',
 '10/13/2014 03:11:00 PM',
 '10/13/2014 03:11:00 PM',
 '10/13/2014 03:23:00 PM',
 '10/13/2014 03:23:00 PM',
 '10/13/2014 03:25:00 PM',
 '10/13/2014 03:53:00 PM',
 '10/13/2014 03:47:00 PM',
 '10/13/2014 03:20:00 PM',
 '10/13/2014 03:53:00 PM',
 '10/13/2014 03:57:00 PM',
 '10/13/2014 05:16:00 PM',
 '10/13/2014 03:27:00 PM',
 '10/13/2014 03:36:00 PM',
 '10/13/2014 03:50:00 PM',
 '10/13/2014 03:48:00 PM',
 '10/13/2014 03:37:00 PM',
 '10/13/2014 03:47:00 PM',
 '10/13/2014 03:48:00 PM',
 '10/13/2014 03:48:00 PM',
 '10/13/2014 03:57:00 PM',
 '10/13/2014 03:59:00 PM',
 '10/13/2014 04:05:00 PM',
 '10/13/2014 04:08:00 PM',
 '10/13/2014 04:07:00 PM',
 '10/13/2014 03:57:00 PM',
 '10/13/2014 04:00:00 PM',
 '10/13/2014 04:16:00 PM',
 '10/13/2014 05:08:00 PM',
 '10/13/2014 04:22:00 PM',
 '10/13/2014 04:10:00 PM',
 '10/13/2014 04:10:00 PM',
 '10/13/2014 04:05:00 PM',
 '10/13/2014 04:05:00 PM',
 '10/13/2014 04:13:00 PM',
 '10/13/2014 04:20:00 PM',
 '10/13/2014 04:10:00 PM',
 '10/13/2014 04:15:00 PM',
 '10/13/2014 04:15:00 PM',
 '10/13/2014 04:30:00 PM',
 '10/13/2014 04:19:00 PM',
 '10/13/2014 04:28:00 PM',
 '10/13/2014 04:24:00 PM',
 '10/13/2014 04:47:00 PM',
 '10/13/2014 04:46:00 PM',
 '10/13/2014 04:22:00 PM',
 '10/13/2014 04:24:00 PM',
 '10/13/2014 04:22:00 PM',
 '10/13/2014 04:42:00 PM',
 '10/13/2014 04:37:00 PM',
 '10/13/2014 04:37:00 PM',
 '10/13/2014 04:47:00 PM',
 '10/13/2014 04:32:00 PM',
 '10/13/2014 04:47:00 PM',
 '10/13/2014 04:48:00 PM',
 '10/13/2014 04:40:00 PM',
 '10/13/2014 04:33:00 PM',
 '10/13/2014 04:33:00 PM',
 '10/13/2014 04:39:00 PM',
 '10/13/2014 04:42:00 PM',
 '10/13/2014 04:51:00 PM',
 '10/13/2014 04:51:00 PM',
 '10/13/2014 04:45:00 PM',
 '10/13/2014 04:48:00 PM',
 '10/13/2014 04:50:00 PM',
 '10/13/2014 04:58:00 PM',
 '10/13/2014 05:11:00 PM',
 '10/13/2014 05:14:00 PM',
 '10/13/2014 04:54:00 PM',
 '10/13/2014 05:14:00 PM',
 '10/13/2014 05:14:00 PM',
 '10/13/2014 05:03:00 PM',
 '10/13/2014 05:07:00 PM',
 '10/13/2014 05:14:00 PM',
 '10/13/2014 05:19:00 PM',
 '10/13/2014 05:01:00 PM',
 '10/13/2014 05:12:00 PM',
 '10/13/2014 05:16:00 PM',
 '10/13/2014 05:10:00 PM',
 '10/13/2014 05:19:00 PM',
 '10/13/2014 05:20:00 PM',
 '10/13/2014 05:19:00 PM',
 '10/13/2014 05:19:00 PM',
 '10/13/2014 05:16:00 PM',
 '10/13/2014 05:07:00 PM',
 '10/13/2014 05:21:00 PM',
 '10/13/2014 05:20:00 PM',
 '10/13/2014 05:18:00 PM',
 '10/13/2014 05:20:00 PM',
 '10/13/2014 05:15:00 PM',
 '10/13/2014 05:15:00 PM',
 '10/13/2014 05:27:00 PM',
 '10/13/2014 05:20:00 PM',
 '10/13/2014 05:25:00 PM',
 '10/13/2014 05:23:00 PM',
 '10/13/2014 05:32:00 PM',
 '10/13/2014 05:31:00 PM',
 '10/13/2014 05:30:00 PM',
 '10/13/2014 05:39:00 PM',
 '10/13/2014 05:43:00 PM',
 '10/13/2014 05:52:00 PM',
 '10/13/2014 05:52:00 PM',
 '10/13/2014 05:32:00 PM',
 '10/13/2014 05:52:00 PM',
 '10/13/2014 06:09:00 PM',
 '10/13/2014 05:52:00 PM',
 '10/13/2014 05:53:00 PM',
 '10/13/2014 05:47:00 PM',
 '10/13/2014 05:47:00 PM',
 '10/13/2014 05:47:00 PM',
 '10/13/2014 05:48:00 PM',
 '10/13/2014 05:55:00 PM',
 '10/13/2014 05:58:00 PM',
 '10/13/2014 05:51:00 PM',
 '10/13/2014 05:44:00 PM',
 '10/13/2014 05:55:00 PM',
 '10/13/2014 05:49:00 PM',
 '10/13/2014 06:06:00 PM',
 '10/13/2014 05:52:00 PM',
 '10/13/2014 05:58:00 PM',
 '10/13/2014 05:49:00 PM',
 '10/13/2014 06:00:00 PM',
 '10/13/2014 06:02:00 PM',
 '10/13/2014 06:11:00 PM',
 '10/13/2014 06:01:00 PM',
 '10/13/2014 06:03:00 PM',
 '10/13/2014 06:18:00 PM',
 '10/13/2014 06:03:00 PM',
 '10/13/2014 07:28:00 PM',
 '10/13/2014 06:27:00 PM',
 '10/13/2014 06:27:00 PM',
 '10/13/2014 06:07:00 PM',
 '10/13/2014 06:28:00 PM',
 '10/13/2014 06:19:00 PM',
 '10/13/2014 06:27:00 PM',
 '10/13/2014 06:08:00 PM',
 '10/13/2014 06:15:00 PM',
 '10/13/2014 06:11:00 PM',
 '10/13/2014 06:16:00 PM',
 '10/13/2014 06:13:00 PM',
 '10/13/2014 06:13:00 PM',
 '10/13/2014 06:34:00 PM',
 '10/13/2014 06:19:00 PM',
 '10/13/2014 06:30:00 PM',
 '10/13/2014 06:35:00 PM',
 '10/13/2014 06:28:00 PM',
 '10/13/2014 06:31:00 PM',
 '10/13/2014 06:30:00 PM',
 '10/13/2014 06:43:00 PM',
 '10/13/2014 06:32:00 PM',
 '10/13/2014 06:44:00 PM',
 '10/13/2014 06:44:00 PM',
 '10/13/2014 06:33:00 PM',
 '10/13/2014 06:46:00 PM',
 '10/13/2014 06:46:00 PM',
 '10/13/2014 06:38:00 PM',
 '10/13/2014 06:46:00 PM',
 '10/13/2014 06:46:00 PM',
 '10/13/2014 06:38:00 PM',
 '10/13/2014 06:43:00 PM',
 '10/13/2014 06:43:00 PM',
 '10/13/2014 06:54:00 PM',
 '10/13/2014 06:49:00 PM',
 '10/13/2014 06:54:00 PM',
 '10/13/2014 07:10:00 PM',
 '10/13/2014 07:47:00 PM',
 '10/13/2014 07:48:00 PM',
 '10/13/2014 06:53:00 PM',
 '10/13/2014 07:05:00 PM',
 '10/13/2014 06:54:00 PM',
 '10/13/2014 07:05:00 PM',
 '10/13/2014 07:19:00 PM',
 '10/13/2014 07:47:00 PM',
 '10/13/2014 07:46:00 PM',
 '10/13/2014 07:39:00 PM',
 '10/13/2014 07:39:00 PM',
 '10/13/2014 07:47:00 PM',
 '10/13/2014 07:46:00 PM',
 '10/13/2014 07:56:00 PM',
 '10/13/2014 07:45:00 PM',
 '10/13/2014 08:12:00 PM',
 '10/13/2014 08:15:00 PM',
 '10/13/2014 08:14:00 PM',
 '10/13/2014 08:08:00 PM',
 '10/13/2014 08:22:00 PM',
 '10/13/2014 08:19:00 PM',
 '10/13/2014 08:28:00 PM',
 '10/13/2014 08:33:00 PM',
 '10/13/2014 09:24:00 PM',
 '10/13/2014 08:52:00 PM',
 '10/13/2014 08:38:00 PM',
 '10/13/2014 08:52:00 PM',
 '10/13/2014 08:57:00 PM',
 '10/13/2014 08:49:00 PM',
 '10/13/2014 09:18:00 PM',
 '10/13/2014 09:26:00 PM',
 '10/13/2014 09:11:00 PM',
 '10/13/2014 09:21:00 PM',
 '10/13/2014 09:20:00 PM',
 '10/13/2014 09:39:00 PM',
 '10/13/2014 09:28:00 PM',
 '10/13/2014 09:41:00 PM',
 '10/13/2014 09:54:00 PM',
 '10/13/2014 09:59:00 PM',
 '10/13/2014 09:57:00 PM',
 '10/13/2014 10:05:00 PM',
 '10/13/2014 10:08:00 PM',
 '10/13/2014 10:22:00 PM',
 '10/13/2014 10:32:00 PM',
 '10/13/2014 11:17:00 PM',
 '10/13/2014 11:15:00 PM',
 '10/14/2014 06:52:00 AM',
 '10/14/2014 05:57:00 AM',
 '10/14/2014 06:14:00 AM',
 '10/14/2014 06:36:00 AM',
 '10/14/2014 06:42:00 AM',
 '10/14/2014 06:50:00 AM',
 '10/14/2014 06:50:00 AM',
 '10/14/2014 07:21:00 AM',
 '10/14/2014 07:28:00 AM',
 '10/14/2014 07:29:00 AM',
 '10/14/2014 07:36:00 AM',
 '10/14/2014 07:34:00 AM',
 '10/14/2014 07:34:00 AM',
 '10/14/2014 07:39:00 AM',
 '10/14/2014 07:50:00 AM',
 '10/14/2014 07:50:00 AM',
 '10/14/2014 07:48:00 AM',
 '10/14/2014 07:45:00 AM',
 '10/14/2014 07:48:00 AM',
 '10/14/2014 07:58:00 AM',
 '10/14/2014 08:00:00 AM',
 '10/14/2014 07:50:00 AM',
 '10/14/2014 07:55:00 AM',
 '10/14/2014 08:05:00 AM',
 '10/14/2014 08:01:00 AM',
 '10/14/2014 08:18:00 AM',
 '10/14/2014 08:03:00 AM',
 '10/14/2014 08:14:00 AM',
 '10/14/2014 08:06:00 AM',
 '10/14/2014 08:06:00 AM',
 '10/14/2014 08:09:00 AM',
 '10/14/2014 08:12:00 AM',
 '10/14/2014 08:16:00 AM',
 '10/14/2014 08:11:00 AM',
 '10/14/2014 08:09:00 AM',
 '10/14/2014 08:14:00 AM',
 '10/14/2014 08:19:00 AM',
 '10/14/2014 08:37:00 AM',
 '10/14/2014 08:19:00 AM',
 '10/14/2014 10:11:00 AM',
 '10/14/2014 08:17:00 AM',
 '10/14/2014 08:29:00 AM',
 '10/14/2014 08:26:00 AM',
 '10/14/2014 08:30:00 AM',
 '10/14/2014 08:41:00 AM',
 '10/14/2014 08:31:00 AM',
 '10/14/2014 08:33:00 AM',
 '10/14/2014 08:33:00 AM',
 '10/14/2014 08:38:00 AM',
 '10/14/2014 08:44:00 AM',
 '10/14/2014 08:39:00 AM',
 '10/14/2014 08:54:00 AM',
 '10/14/2014 08:46:00 AM',
 '10/14/2014 08:45:00 AM',
 '10/14/2014 08:48:00 AM',
 '10/14/2014 09:00:00 AM',
 '10/14/2014 09:14:00 AM',
 '10/14/2014 08:50:00 AM',
 '10/14/2014 08:59:00 AM',
 '10/14/2014 08:54:00 AM',
 '10/14/2014 09:17:00 AM',
 '10/14/2014 09:26:00 AM',
 '10/14/2014 09:04:00 AM',
 '10/14/2014 09:02:00 AM',
 '10/14/2014 09:18:00 AM',
 '10/14/2014 09:10:00 AM',
 '10/14/2014 09:17:00 AM',
 '10/14/2014 01:20:00 PM',
 '10/14/2014 09:26:00 AM',
 '10/14/2014 09:20:00 AM',
 '10/14/2014 09:30:00 AM',
 '10/14/2014 09:37:00 AM',
 '10/14/2014 09:32:00 AM',
 '10/14/2014 09:38:00 AM',
 '10/14/2014 09:41:00 AM',
 '10/14/2014 09:40:00 AM',
 '10/14/2014 09:45:00 AM',
 '10/14/2014 09:40:00 AM',
 '10/14/2014 10:15:00 AM',
 '10/14/2014 10:15:00 AM',
 '10/14/2014 09:44:00 AM',
 '10/14/2014 09:54:00 AM',
 '10/14/2014 11:59:00 AM',
 '10/14/2014 09:54:00 AM',
 '10/14/2014 10:00:00 AM',
 '10/14/2014 10:10:00 AM',
 '10/14/2014 10:05:00 AM',
 '10/14/2014 10:01:00 AM',
 '10/14/2014 10:08:00 AM',
 '10/14/2014 10:09:00 AM',
 '10/14/2014 10:27:00 AM',
 '10/14/2014 10:08:00 AM',
 '10/14/2014 10:17:00 AM',
 '10/14/2014 10:10:00 AM',
 '10/14/2014 10:30:00 AM',
 '10/14/2014 10:30:00 AM',
 '10/14/2014 10:23:00 AM',
 '10/14/2014 10:24:00 AM',
 '10/14/2014 10:34:00 AM',
 '10/14/2014 10:53:00 AM',
 '10/14/2014 10:53:00 AM',
 '10/14/2014 10:51:00 AM',
 '10/14/2014 10:48:00 AM',
 '10/14/2014 10:53:00 AM',
 '10/14/2014 05:19:00 PM',
 '10/14/2014 11:01:00 AM',
 '10/14/2014 11:01:00 AM',
 '10/14/2014 11:07:00 AM',
 '10/14/2014 11:27:00 AM',
 '10/14/2014 11:25:00 AM',
 '10/14/2014 11:13:00 AM',
 '10/14/2014 11:13:00 AM',
 '10/14/2014 11:17:00 AM',
 '10/14/2014 11:18:00 AM',
 '10/14/2014 11:18:00 AM',
 '10/14/2014 11:20:00 AM',
 '10/14/2014 11:27:00 AM',
 '10/14/2014 11:27:00 AM',
 '10/14/2014 11:27:00 AM',
 '10/14/2014 11:32:00 AM',
 '10/14/2014 11:26:00 AM',
 '10/14/2014 11:35:00 AM',
 '10/14/2014 12:16:00 PM',
 '10/14/2014 12:16:00 PM',
 '10/14/2014 11:38:00 AM',
 '10/14/2014 11:43:00 AM',
 '10/14/2014 11:43:00 AM',
 '10/14/2014 11:43:00 AM',
 '10/14/2014 11:54:00 AM',
 '10/14/2014 11:54:00 AM',
 '10/14/2014 11:54:00 AM',
 '10/14/2014 11:54:00 AM',
 '10/14/2014 11:49:00 AM',
 '10/14/2014 11:55:00 AM',
 '10/14/2014 11:54:00 AM',
 '10/14/2014 11:57:00 AM',
 '10/14/2014 11:56:00 AM',
 '10/14/2014 12:02:00 PM',
 '10/14/2014 12:06:00 PM',
 '10/14/2014 11:56:00 AM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 02:11:00 PM',
 '10/14/2014 02:11:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 11:58:00 AM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:00:00 PM',
 '10/14/2014 12:26:00 PM',
 '10/14/2014 12:13:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:10:00 PM',
 '10/14/2014 12:11:00 PM',
 '10/14/2014 12:17:00 PM',
 '10/14/2014 12:21:00 PM',
 '10/14/2014 12:54:00 PM',
 '10/14/2014 12:45:00 PM',
 '10/14/2014 01:17:00 PM',
 '10/14/2014 12:21:00 PM',
 '10/14/2014 12:26:00 PM',
 '10/14/2014 12:44:00 PM',
 '10/14/2014 12:29:00 PM',
 '10/14/2014 12:30:00 PM',
 '10/14/2014 12:29:00 PM',
 '10/14/2014 12:37:00 PM',
 '10/14/2014 12:54:00 PM',
 '10/14/2014 12:38:00 PM',
 '10/14/2014 12:54:00 PM',
 '10/14/2014 12:35:00 PM',
 '10/14/2014 12:41:00 PM',
 '10/14/2014 12:34:00 PM',
 '10/14/2014 12:46:00 PM',
 '10/14/2014 12:49:00 PM',
 '10/14/2014 01:09:00 PM',
 '10/14/2014 12:40:00 PM',
 '10/14/2014 12:44:00 PM',
 '10/14/2014 01:09:00 PM',
 '10/14/2014 12:52:00 PM',
 '10/14/2014 12:55:00 PM',
 '10/14/2014 12:58:00 PM',
 '10/14/2014 12:58:00 PM',
 '10/14/2014 12:55:00 PM',
 '10/14/2014 12:54:00 PM',
 '10/14/2014 12:58:00 PM',
 '10/14/2014 01:02:00 PM',
 '10/14/2014 01:01:00 PM',
 '10/14/2014 12:59:00 PM',
 '10/14/2014 12:59:00 PM',
 '10/14/2014 01:10:00 PM',
 '10/14/2014 01:11:00 PM',
 '10/14/2014 01:01:00 PM',
 '10/14/2014 01:10:00 PM',
 '10/14/2014 02:05:00 PM',
 '10/14/2014 01:09:00 PM',
 '10/14/2014 01:31:00 PM',
 '10/14/2014 01:22:00 PM',
 '10/14/2014 01:38:00 PM',
 '10/14/2014 01:10:00 PM',
 '10/14/2014 01:20:00 PM',
 '10/14/2014 01:10:00 PM',
 '10/14/2014 01:10:00 PM',
 '10/14/2014 01:16:00 PM',
 '10/14/2014 01:17:00 PM',
 '10/14/2014 01:19:00 PM',
 '10/14/2014 01:17:00 PM',
 '10/14/2014 01:27:00 PM',
 '10/14/2014 01:30:00 PM',
 '10/14/2014 01:29:00 PM',
 '10/14/2014 01:23:00 PM',
 '10/14/2014 01:26:00 PM',
 '10/14/2014 01:30:00 PM',
 '10/14/2014 01:30:00 PM',
 '10/14/2014 01:26:00 PM',
 '10/14/2014 01:26:00 PM',
 '10/14/2014 01:25:00 PM',
 '10/14/2014 01:30:00 PM',
 '10/14/2014 01:25:00 PM',
 '10/14/2014 01:38:00 PM',
 '10/14/2014 01:42:00 PM',
 '10/14/2014 01:33:00 PM',
 '10/14/2014 01:29:00 PM',
 '10/14/2014 01:39:00 PM',
 '10/14/2014 02:15:00 PM',
 '10/14/2014 01:43:00 PM',
 '10/14/2014 01:41:00 PM',
 '10/14/2014 01:40:00 PM',
 '10/14/2014 01:43:00 PM',
 '10/14/2014 01:43:00 PM',
 '10/14/2014 03:52:00 PM',
 '10/14/2014 01:42:00 PM',
 '10/14/2014 01:56:00 PM',
 '10/14/2014 01:53:00 PM',
 '10/14/2014 01:41:00 PM',
 '10/14/2014 01:53:00 PM',
 '10/14/2014 01:55:00 PM',
 '10/14/2014 01:51:00 PM',
 '10/14/2014 02:10:00 PM',
 '10/14/2014 01:43:00 PM',
 '10/14/2014 01:44:00 PM',
 '10/14/2014 01:54:00 PM',
 '10/14/2014 01:54:00 PM',
 '10/14/2014 01:56:00 PM',
 '10/14/2014 02:27:00 PM',
 '10/14/2014 02:09:00 PM',
 '10/14/2014 02:30:00 PM',
 '10/14/2014 02:29:00 PM',
 '10/14/2014 02:10:00 PM',
 '10/14/2014 02:38:00 PM',
 '10/14/2014 02:18:00 PM',
 '10/14/2014 02:38:00 PM',
 '10/14/2014 02:16:00 PM',
 '10/14/2014 02:22:00 PM',
 '10/14/2014 02:26:00 PM',
 '10/14/2014 02:32:00 PM',
 '10/14/2014 03:51:00 PM',
 '10/14/2014 02:22:00 PM',
 '10/14/2014 02:35:00 PM',
 '10/14/2014 02:40:00 PM',
 '10/14/2014 02:31:00 PM',
 '10/14/2014 02:36:00 PM',
 '10/14/2014 02:38:00 PM',
 '10/14/2014 02:52:00 PM',
 '10/14/2014 02:54:00 PM',
 '10/14/2014 02:52:00 PM',
 '10/14/2014 03:01:00 PM',
 '10/14/2014 03:00:00 PM',
 '10/14/2014 03:09:00 PM',
 '10/14/2014 03:09:00 PM',
 '10/14/2014 02:57:00 PM',
 '10/14/2014 02:57:00 PM',
 '10/14/2014 03:03:00 PM',
 '10/14/2014 03:06:00 PM',
 '10/14/2014 03:10:00 PM',
 '10/14/2014 03:17:00 PM',
 '10/14/2014 03:17:00 PM',
 '10/14/2014 03:20:00 PM',
 '10/14/2014 03:21:00 PM',
 '10/14/2014 03:32:00 PM',
 '10/14/2014 03:42:00 PM',
 '10/14/2014 03:42:00 PM',
 '10/14/2014 03:34:00 PM',
 '10/14/2014 03:46:00 PM',
 '10/14/2014 03:45:00 PM',
 '10/14/2014 03:47:00 PM',
 '10/14/2014 03:52:00 PM',
 '10/14/2014 03:46:00 PM',
 '10/14/2014 03:46:00 PM',
 '10/14/2014 03:48:00 PM',
 '10/14/2014 04:23:00 PM',
 '10/14/2014 03:49:00 PM',
 '10/14/2014 03:48:00 PM',
 '10/14/2014 06:41:00 PM',
 '10/14/2014 03:48:00 PM',
 '10/14/2014 04:50:00 PM',
 '10/14/2014 04:00:00 PM',
 '10/14/2014 04:25:00 PM',
 '10/14/2014 06:40:00 PM',
 '10/14/2014 04:24:00 PM',
 '10/14/2014 04:25:00 PM',
 '10/14/2014 04:00:00 PM',
 '10/14/2014 04:25:00 PM',
 '10/14/2014 04:25:00 PM',
 '10/14/2014 03:57:00 PM',
 '10/14/2014 04:09:00 PM',
 '10/14/2014 04:25:00 PM',
 '10/14/2014 04:21:00 PM',
 '10/14/2014 04:21:00 PM',
 '10/14/2014 04:27:00 PM',
 '10/14/2014 04:08:00 PM',
 '10/14/2014 04:14:00 PM',
 '10/14/2014 04:17:00 PM',
 '10/14/2014 04:16:00 PM',
 '10/14/2014 05:54:00 PM',
 '10/14/2014 04:20:00 PM',
 '10/14/2014 04:41:00 PM',
 '10/14/2014 04:41:00 PM',
 '10/14/2014 04:26:00 PM',
 '10/14/2014 04:47:00 PM',
 '10/14/2014 04:39:00 PM',
 '10/14/2014 04:49:00 PM',
 '10/14/2014 04:40:00 PM',
 '10/14/2014 04:43:00 PM',
 '10/14/2014 04:46:00 PM',
 '10/14/2014 04:51:00 PM',
 '10/14/2014 04:52:00 PM',
 '10/14/2014 04:50:00 PM',
 '10/14/2014 04:50:00 PM',
 '10/14/2014 05:07:00 PM',
 '10/14/2014 05:18:00 PM',
 '10/14/2014 05:04:00 PM',
 '10/14/2014 05:01:00 PM',
 '10/14/2014 05:03:00 PM',
 '10/14/2014 04:58:00 PM',
 '10/14/2014 05:09:00 PM',
 '10/14/2014 05:08:00 PM',
 '10/14/2014 05:12:00 PM',
 '10/14/2014 05:11:00 PM',
 '10/14/2014 05:13:00 PM',
 '10/14/2014 05:07:00 PM',
 '10/14/2014 05:14:00 PM',
 '10/14/2014 05:21:00 PM',
 '10/14/2014 05:56:00 PM',
 '10/14/2014 05:25:00 PM',
 '10/14/2014 05:24:00 PM',
 '10/14/2014 05:18:00 PM',
 '10/14/2014 05:33:00 PM',
 '10/14/2014 05:26:00 PM',
 '10/14/2014 05:22:00 PM',
 '10/14/2014 05:34:00 PM',
 '10/14/2014 05:32:00 PM',
 '10/14/2014 06:55:00 PM',
 '10/14/2014 05:29:00 PM',
 '10/14/2014 05:24:00 PM',
 '10/14/2014 05:32:00 PM',
 '10/14/2014 05:30:00 PM',
 '10/14/2014 05:42:00 PM',
 '10/14/2014 05:41:00 PM',
 '10/14/2014 05:36:00 PM',
 '10/14/2014 05:37:00 PM',
 '10/14/2014 05:45:00 PM',
 '10/14/2014 05:49:00 PM',
 '10/14/2014 05:45:00 PM',
 '10/14/2014 05:44:00 PM',
 '10/14/2014 05:54:00 PM',
 '10/14/2014 05:51:00 PM',
 '10/14/2014 05:52:00 PM',
 '10/14/2014 05:47:00 PM',
 '10/14/2014 05:56:00 PM',
 '10/14/2014 05:48:00 PM',
 '10/14/2014 05:56:00 PM',
 '10/14/2014 05:56:00 PM',
 '10/14/2014 05:54:00 PM',
 '10/14/2014 05:54:00 PM',
 '10/14/2014 06:07:00 PM',
 '10/14/2014 06:10:00 PM',
 '10/14/2014 06:07:00 PM',
 '10/14/2014 06:02:00 PM',
 '10/14/2014 06:09:00 PM',
 '10/14/2014 06:04:00 PM',
 '10/14/2014 06:02:00 PM',
 '10/14/2014 07:03:00 PM',
 '10/14/2014 06:36:00 PM',
 '10/14/2014 06:06:00 PM',
 '10/14/2014 06:06:00 PM',
 '10/14/2014 06:09:00 PM',
 '10/14/2014 06:14:00 PM',
 '10/14/2014 07:04:00 PM',
 '10/14/2014 07:03:00 PM',
 '10/14/2014 06:10:00 PM',
 '10/14/2014 06:47:00 PM',
 '10/14/2014 06:24:00 PM',
 '10/14/2014 06:35:00 PM',
 '10/14/2014 06:35:00 PM',
 '10/14/2014 06:27:00 PM',
 '10/14/2014 06:27:00 PM',
 '10/14/2014 06:28:00 PM',
 '10/14/2014 06:26:00 PM',
 '10/14/2014 06:30:00 PM',
 '10/14/2014 06:29:00 PM',
 '10/14/2014 06:32:00 PM',
 '10/14/2014 06:35:00 PM',
 '10/14/2014 06:37:00 PM',
 '10/14/2014 06:30:00 PM',
 '10/14/2014 06:35:00 PM',
 '10/14/2014 06:29:00 PM',
 '10/14/2014 06:39:00 PM',
 '10/14/2014 06:34:00 PM',
 '10/14/2014 06:53:00 PM',
 '10/14/2014 06:44:00 PM',
 '10/14/2014 06:40:00 PM',
 '10/14/2014 07:09:00 PM',
 '10/14/2014 06:45:00 PM',
 '10/14/2014 06:51:00 PM',
 '10/14/2014 06:49:00 PM',
 '10/14/2014 06:52:00 PM',
 '10/14/2014 06:53:00 PM',
 '10/14/2014 06:55:00 PM',
 '10/14/2014 06:55:00 PM',
 '10/14/2014 07:02:00 PM',
 '10/14/2014 06:59:00 PM',
 '10/14/2014 06:59:00 PM',
 '10/14/2014 06:59:00 PM',
 '10/14/2014 07:03:00 PM',
 '10/14/2014 07:07:00 PM',
 '10/14/2014 07:05:00 PM',
 '10/14/2014 07:22:00 PM',
 '10/14/2014 07:12:00 PM',
 '10/14/2014 07:28:00 PM',
 '10/14/2014 07:28:00 PM',
 '10/14/2014 07:25:00 PM',
 '10/14/2014 07:16:00 PM',
 '10/14/2014 07:36:00 PM',
 '10/14/2014 07:24:00 PM',
 '10/14/2014 07:29:00 PM',
 '10/14/2014 07:47:00 PM',
 '10/14/2014 07:44:00 PM',
 '10/14/2014 07:44:00 PM',
 '10/14/2014 07:48:00 PM',
 '10/14/2014 07:48:00 PM',
 '10/14/2014 07:42:00 PM',
 '10/14/2014 07:45:00 PM',
 '10/14/2014 07:55:00 PM',
 '10/14/2014 07:48:00 PM',
 '10/14/2014 07:51:00 PM',
 '10/14/2014 08:26:00 PM',
 '10/14/2014 08:26:00 PM',
 '10/14/2014 07:51:00 PM',
 '10/14/2014 07:53:00 PM',
 '10/14/2014 09:27:00 PM',
 '10/14/2014 09:37:00 PM',
 '10/14/2014 09:26:00 PM',
 '10/14/2014 08:11:00 PM',
 '10/14/2014 08:36:00 PM',
 '10/14/2014 08:17:00 PM',
 '10/14/2014 08:36:00 PM',
 '10/14/2014 08:36:00 PM',
 '10/14/2014 08:14:00 PM',
 '10/14/2014 08:22:00 PM',
 '10/14/2014 08:22:00 PM',
 '10/14/2014 08:42:00 PM',
 '10/14/2014 08:32:00 PM',
 '10/14/2014 08:49:00 PM',
 '10/14/2014 09:39:00 PM',
 '10/14/2014 08:57:00 PM',
 '10/14/2014 09:01:00 PM',
 '10/14/2014 09:02:00 PM',
 '10/14/2014 09:04:00 PM',
 '10/14/2014 09:04:00 PM',
 '10/14/2014 09:08:00 PM',
 '10/14/2014 09:06:00 PM',
 '10/14/2014 09:16:00 PM',
 '10/14/2014 09:23:00 PM',
 '10/14/2014 09:19:00 PM',
 '10/14/2014 10:46:00 PM',
 '10/14/2014 09:26:00 PM',
 '10/14/2014 09:26:00 PM',
 '10/14/2014 09:33:00 PM',
 '10/14/2014 09:43:00 PM',
 '10/14/2014 09:46:00 PM',
 '10/14/2014 09:48:00 PM',
 '10/14/2014 10:02:00 PM',
 '10/14/2014 10:16:00 PM',
 '10/14/2014 10:30:00 PM',
 '10/14/2014 10:30:00 PM',
 '10/15/2014 04:29:00 AM',
 '10/15/2014 04:29:00 AM',
 '10/14/2014 10:50:00 PM',
 '10/14/2014 10:55:00 PM',
 '10/14/2014 11:20:00 PM',
 '10/15/2014 12:01:00 AM',
 '10/15/2014 12:18:00 AM',
 '10/15/2014 12:22:00 AM',
 '10/15/2014 12:53:00 AM',
 '10/15/2014 01:13:00 AM',
 '10/15/2014 01:59:00 AM',
 '10/15/2014 02:12:00 AM',
 '10/15/2014 06:16:00 AM',
 '10/15/2014 06:20:00 AM',
 '10/15/2014 06:47:00 AM',
 '10/15/2014 06:40:00 AM',
 '10/15/2014 06:44:00 AM',
 '10/15/2014 06:39:00 AM',
 '10/15/2014 06:47:00 AM',
 '10/15/2014 07:03:00 AM',
 '10/15/2014 07:21:00 AM',
 '10/15/2014 07:24:00 AM',
 '10/15/2014 07:22:00 AM',
 '10/15/2014 07:44:00 AM',
 '10/15/2014 07:49:00 AM',
 '10/15/2014 07:52:00 AM',
 '10/15/2014 08:01:00 AM',
 '10/15/2014 08:02:00 AM',
 '10/15/2014 08:15:00 AM',
 '10/15/2014 08:16:00 AM',
 '10/15/2014 08:13:00 AM',
 '10/15/2014 08:18:00 AM',
 '10/15/2014 08:19:00 AM',
 '10/15/2014 08:38:00 AM',
 '10/15/2014 08:35:00 AM',
 '10/15/2014 08:35:00 AM',
 '10/15/2014 08:47:00 AM',
 '10/15/2014 08:36:00 AM',
 '10/15/2014 08:52:00 AM',
 '10/15/2014 08:56:00 AM',
 '10/15/2014 09:16:00 AM',
 '10/15/2014 09:16:00 AM',
 '10/15/2014 09:09:00 AM',
 '10/15/2014 09:14:00 AM',
 '10/15/2014 09:26:00 AM',
 '10/15/2014 09:19:00 AM',
 '10/15/2014 09:23:00 AM',
 '10/15/2014 09:28:00 AM',
 '10/15/2014 09:31:00 AM',
 '10/15/2014 09:34:00 AM',
 '10/15/2014 09:27:00 AM',
 '10/15/2014 09:39:00 AM',
 '10/15/2014 09:29:00 AM',
 '10/15/2014 09:37:00 AM',
 '10/15/2014 09:30:00 AM',
 '10/15/2014 09:39:00 AM',
 '10/15/2014 09:57:00 AM',
 '10/15/2014 09:54:00 AM',
 '10/15/2014 10:01:00 AM',
 '10/15/2014 09:57:00 AM',
 '10/15/2014 09:58:00 AM',
 '10/15/2014 10:09:00 AM',
 '10/15/2014 10:15:00 AM',
 '10/15/2014 10:22:00 AM',
 '10/15/2014 10:08:00 AM',
 '10/15/2014 10:13:00 AM',
 '10/15/2014 10:15:00 AM',
 '10/15/2014 10:41:00 AM',
 '10/15/2014 10:28:00 AM',
 '10/15/2014 10:33:00 AM',
 '10/15/2014 10:38:00 AM',
 '10/15/2014 10:34:00 AM',
 '10/15/2014 10:42:00 AM',
 '10/15/2014 10:37:00 AM',
 '10/15/2014 11:06:00 AM',
 '10/15/2014 11:06:00 AM',
 '10/15/2014 10:49:00 AM',
 '10/15/2014 10:52:00 AM',
 '10/15/2014 10:53:00 AM',
 '10/15/2014 11:11:00 AM',
 '10/15/2014 11:03:00 AM',
 '10/15/2014 11:05:00 AM',
 '10/15/2014 11:03:00 AM',
 '10/15/2014 11:03:00 AM',
 '10/15/2014 11:05:00 AM',
 '10/15/2014 11:07:00 AM',
 '10/15/2014 11:24:00 AM',
 '10/15/2014 11:29:00 AM',
 '10/15/2014 11:32:00 AM',
 '10/15/2014 11:21:00 AM',
 '10/15/2014 11:42:00 AM',
 '10/15/2014 11:34:00 AM',
 '10/15/2014 11:42:00 AM',
 '10/15/2014 11:53:00 AM',
 '10/15/2014 11:44:00 AM',
 '10/15/2014 12:01:00 PM',
 '10/15/2014 11:52:00 AM',
 '10/15/2014 12:05:00 PM',
 '10/15/2014 12:05:00 PM',
 '10/15/2014 12:40:00 PM',
 '10/15/2014 12:40:00 PM',
 '10/15/2014 12:52:00 PM',
 '10/15/2014 12:34:00 PM',
 '10/15/2014 12:32:00 PM',
 '10/15/2014 12:38:00 PM',
 '10/15/2014 12:53:00 PM',
 ...]

Mathematical operations on columns happen element-wise:


In [119]:
trip_duration_hours = df['tripduration']/3600
trip_duration_hours[:3]


Out[119]:
0    0.273871
1    0.257326
2    0.245509
Name: tripduration, dtype: float64

In [120]:
df['trip_duration_hours'] = df['tripduration']/3600

In [121]:
del df['trip_duration_hours']

In [122]:
df.head()


Out[122]:
trip_id starttime stoptime bikeid tripduration from_station_name to_station_name from_station_id to_station_id usertype gender birthyear
0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 985.935 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1960.0
1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 926.375 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1970.0
2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 883.831 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1988.0
3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 865.937 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1977.0
4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 923.923 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1971.0

In [123]:
df.loc[[0,1],:]


Out[123]:
trip_id starttime stoptime bikeid tripduration from_station_name to_station_name from_station_id to_station_id usertype gender birthyear
0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 985.935 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1960.0
1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 926.375 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1970.0

In [124]:
# Keep only the rows whose trip lasted longer than 10000 seconds
df_long_trips = df[df['tripduration'] > 10000]

In [125]:
# Same selection in two steps: build a boolean mask, then index with it
sel = df['tripduration'] > 10000
df_long_trips = df[sel]

In [126]:
len(df)


Out[126]:
275091

In [127]:
# Make a copy of a slice
df_subset = df[['starttime', 'stoptime']].copy()
df_subset['trip_hours'] = df['tripduration']/3600

Columns can be created (or overwritten) with the assignment operator. Let's create a tripminutes column with the number of minutes for each trip.
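
A minimal sketch of column assignment follows. It uses a tiny stand-in frame rather than the full dataset: the name `toy` is only for illustration, and the three durations are copied from the first rows of `df.head()` above.

```python
import pandas as pd

# Tiny stand-in for the trip data; durations are in seconds.
toy = pd.DataFrame({'tripduration': [985.935, 926.375, 883.831]})

# Assigning to a new column name creates the column ...
toy['tripminutes'] = toy['tripduration'] / 60

# ... and assigning to an existing name overwrites it.
toy['tripminutes'] = toy['tripminutes'].round(1)
print(toy)
```

The same two assignments should work unchanged on the full `df`.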

More complicated mathematical operations can be done with tools in the numpy package:
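
As one possible illustration, NumPy functions such as `np.log10` can be applied directly to a column. The frame `toy` and its round-number durations are made up for this sketch and are not taken from the trip data.

```python
import numpy as np
import pandas as pd

# Hypothetical durations in seconds, chosen so the logarithms are easy to read.
toy = pd.DataFrame({'tripduration': [100.0, 1000.0, 10000.0]})

# NumPy functions work element-wise on a column and return a pandas Series
# that keeps the original index.
toy['log_duration'] = np.log10(toy['tripduration'])
print(toy)
```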

Working with Times

One trick to know when working with columns of times is that the pandas DatetimeIndex provides a convenient interface for them.

For a dataset of this size, using pd.to_datetime and specifying the date format can make things much faster. From the strftime reference, we see that the pronto data has format "%m/%d/%Y %I:%M:%S %p".

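(Note: you can also use infer_datetime_format=True in most cases to automatically infer the correct format, though due to a bug it doesn't work when AM/PM are present. In pandas 2.0 and later this argument is deprecated, because format inference is the default behavior.)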
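
A small sketch of parsing with an explicit format is shown here. The three strings are timestamps in the same style as the pronto data, and the variable names `times` and `parsed` are only illustrative.

```python
import pandas as pd

# A few timestamp strings in the same layout as the starttime/stoptime columns.
times = pd.Series([
    '10/13/2014 10:31:00 AM',
    '10/13/2014 10:48:00 AM',
    '10/14/2014 01:20:00 PM',
])

# Giving the format explicitly avoids per-value format guessing.
parsed = pd.to_datetime(times, format='%m/%d/%Y %I:%M:%S %p')
print(parsed)
```

On the full dataset the same call would be applied to `df['starttime']` or `df['stoptime']`.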

With a DatetimeIndex, we can extract the hour of the day, the day of the week, the month, and a wide range of other views of the time:
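
For instance, the attributes `hour`, `dayofweek`, and `month` can be read straight off a DatetimeIndex. This is a sketch on two hand-picked timestamps, not on the full dataset; the name `times` is illustrative.

```python
import pandas as pd

# Parse two timestamps and wrap them in a DatetimeIndex.
times = pd.DatetimeIndex(pd.to_datetime(
    ['10/13/2014 10:31:00 AM', '10/14/2014 01:20:00 PM'],
    format='%m/%d/%Y %I:%M:%S %p'))

print(times.hour)       # hour of the day, 0-23
print(times.dayofweek)  # Monday=0 ... Sunday=6
print(times.month)      # month number, 1-12
```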

Simple Grouping of Data

The real power of Pandas comes in its tools for grouping and aggregating data. Here we'll look at value counts and the basics of group-by operations.

Value Counts

Pandas includes an array of useful functionality for manipulating and analyzing tabular data. We'll take a look at two of these here.

The value_counts method of a pandas Series returns the number of times each unique value occurs in that column.

We can use it, for example, to break down rides by gender:


In [ ]:
#
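
One way this cell might be filled in is sketched below on a small made-up column; on the real data the call would be `df['gender'].value_counts()`. The frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up gender column, including one missing value.
toy = pd.DataFrame({'gender': ['Male', 'Male', 'Female', 'Female', 'Male', None]})

# Count how often each unique value occurs; missing values are dropped by default.
counts = toy['gender'].value_counts()
print(counts)
```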

Or to break down rides by age:


In [ ]:
#
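
A possible sketch for the age breakdown follows. The data only records `birthyear`, so an age has to be derived from it; the reference year 2015 used here is an assumption for illustration, as are the frame `toy` and its values.

```python
import pandas as pd

# Made-up birth years in the same float format as the birthyear column.
toy = pd.DataFrame({'birthyear': [1960.0, 1970.0, 1988.0, 1977.0, 1988.0]})

reference_year = 2015  # assumed year for computing ages; not taken from the data
age = reference_year - toy['birthyear']

# Number of riders at each age, most common age first.
age_counts = age.value_counts()
print(age_counts)
```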

By default, the result is sorted by count (the values) rather than by the index. Use sort=False to turn this behavior off:


In [ ]:
#
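
The difference can be sketched on a small made-up Series (the name `years` and its values are illustrative). The sketch also shows `sort_index()`, which is a separate method and an option when the result should be ordered by the index instead.

```python
import pandas as pd

years = pd.Series([1988, 1960, 1988, 1970, 1988, 1960])

by_count = years.value_counts()             # default: most frequent value first
unsorted = years.value_counts(sort=False)   # same counts, not sorted by count
by_index = years.value_counts().sort_index()  # ordered by the index (the year)

print(by_count)
print(unsorted)
print(by_index)
```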

We can explore other things as well: day of week, hour of day, etc.


In [ ]:
#

Group-by Operation

One of the killer features of the pandas DataFrame is the ability to do group-by operations: split the rows into groups by key, apply a function to each group, and combine the results. You can visualize the group-by like this (image borrowed from the Python Data Science Handbook):
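
To make the idea concrete, here is a sketch that performs the split, apply, and combine steps by hand and then compares the result with `groupby`. The frame `toy` and its durations are made up for illustration; only the station ids are borrowed from the data shown in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'from_station_id': ['CBD-06', 'PS-04', 'CBD-06', 'PS-04', 'BT-01'],
    'tripduration': [100.0, 200.0, 300.0, 400.0, 500.0],
})

# Manual version of the three steps.
pieces = {}
for station in toy['from_station_id'].unique():
    subset = toy[toy['from_station_id'] == station]   # split
    pieces[station] = subset['tripduration'].mean()   # apply
manual = pd.Series(pieces).sort_index()               # combine

# The same computation with groupby.
grouped = toy.groupby('from_station_id')['tripduration'].mean()

print(manual)
print(grouped)
```

Both results should hold the same mean duration per station; `groupby` simply does the bookkeeping for you.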


In [58]:
df.head()


Out[58]:
trip_id starttime stoptime bikeid tripduration from_station_name to_station_name from_station_id to_station_id usertype gender birthyear
0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 985.935 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1960.0
1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 926.375 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1970.0
2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 883.831 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1988.0
3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 865.937 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Female 1977.0
4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 923.923 2nd Ave & Spring St Occidental Park / Occidental Ave S & S Washing... CBD-06 PS-04 Member Male 1971.0

In [59]:
df_count = df.groupby(['from_station_id']).count()
df_count.head()


Out[59]:
trip_id starttime stoptime bikeid tripduration from_station_name to_station_name to_station_id usertype gender birthyear
from_station_id
BT-01 10463 10463 10463 10463 10463 10463 10463 10463 10463 4162 4162
BT-03 7334 7334 7334 7334 7334 7334 7334 7334 7334 4862 4862
BT-04 4666 4666 4666 4666 4666 4666 4666 4666 4666 3424 3424
BT-05 5699 5699 5699 5699 5699 5699 5699 5699 5699 2975 2975
BT-06 150 150 150 150 150 150 150 150 150 130 130

In [60]:
ser_count = df_count['trip_id']
type(ser_count)


Out[60]:
pandas.core.series.Series

In [61]:
ser_count.sort_values()


Out[61]:
from_station_id
BT-06       150
UW-01       480
WF-03       646
UW-12       689
CD-01       958
SLU-21     1114
UW-10      1175
UW-11      1237
UD-02      1417
DPD-03     1423
SLU-22     1748
UW-07      1905
UW-02      2002
CH-16      2089
FH-01      2349
UW-06      2383
UD-07      2429
SLU-20     2452
ID-04      2474
UW-04      2688
CBD-07     3263
EL-05      3400
CBD-04     3440
SLU-18     3461
UD-04      3534
EL-01      3604
CH-06      3765
UD-01      3889
PS-05      3969
FH-04      4208
BT-04      4666
CBD-03     4822
DPD-01     4822
CBD-06     4911
SLU-16     5045
CBD-05     5068
SLU-04     5226
CH-09      5246
PS-04      5409
BT-05      5699
SLU-23     5739
EL-03      5788
CH-12      5857
CH-03      6218
WF-04      6271
SLU-07     6339
CH-01      6409
CH-15      6550
CH-05      6948
SLU-02     7018
SLU-01     7084
SLU-19     7285
BT-03      7334
CH-02      8546
CH-08      8573
CBD-13     9067
SLU-15     9741
BT-01     10463
CH-07     11568
WF-01     13038
Name: trip_id, dtype: int64

In [62]:
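# Double brackets keep a one-column DataFrame (single brackets would give a Series,
# which has no columns to rename)
df_count1 = df_count[['trip_id']]
df_count2 = df_count1.rename(columns={'trip_id': 'count'})
df_count2['new'] = 1
df_count2.head()


Out[62]:
count new
from_station_id
BT-01 10463 1
BT-03 7334 1
BT-04 4666 1
BT-05 5699 1
BT-06 150 1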

In [63]:
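# numeric_only=True skips the string columns, which recent pandas versions refuse to average
df_mean = df.groupby(['from_station_id']).mean(numeric_only=True)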
df_mean.head()


Out[63]:
trip_id tripduration birthyear
from_station_id
BT-01 147831.009844 1375.031203 1980.131427
BT-03 139404.294655 1019.200684 1976.505142
BT-04 157992.809687 891.095897 1979.877044
BT-05 139283.572381 1199.949481 1975.937479
BT-06 291807.953333 659.770547 1975.830769

In [64]:
dfgroup = df.groupby(['from_station_id'])
dfgroup.groups


Out[64]:
{'BT-01': Int64Index([   217,    227,    228,    282,    283,    310,    326,    327,
                329,    331,
             ...
             274971, 274973, 274974, 274975, 274976, 274979, 275032, 275033,
             275075, 275076],
            dtype='int64', length=10463),
 'BT-03': Int64Index([    87,     88,    230,    261,    366,    407,    414,    439,
                453,    754,
             ...
             268122, 268181, 268307, 268318, 268319, 268391, 268392, 268467,
             268527, 268528],
            dtype='int64', length=7334),
 'BT-04': Int64Index([    66,     67,     94,    104,    108,    166,    233,    259,
                322,    333,
             ...
             274350, 274361, 274424, 274704, 274789, 274970, 275009, 275064,
             275065, 275083],
            dtype='int64', length=4666),
 'BT-05': Int64Index([   110,    413,    426,    513,    585,    618,    744,    753,
                795,   1003,
             ...
             274547, 274605, 274610, 274621, 274817, 274847, 274910, 274911,
             275029, 275034],
            dtype='int64', length=5699),
 'BT-06': Int64Index([268581, 268642, 268667, 268718, 268735, 268781, 268897, 268903,
             268914, 268961,
             ...
             274777, 274778, 274865, 274931, 274937, 274949, 274951, 274956,
             274962, 274965],
            dtype='int64', length=150),
 'CBD-03': Int64Index([   118,    119,    164,    229,    275,    285,    328,    339,
                356,    357,
             ...
             274574, 274594, 274599, 274622, 274672, 274743, 274774, 274810,
             274915, 275053],
            dtype='int64', length=4822),
 'CBD-04': Int64Index([105392, 105458, 105467, 105472, 105614, 105615, 105835, 105836,
             105855, 105858,
             ...
             274730, 274922, 274924, 274925, 274926, 274927, 274952, 274958,
             274983, 275080],
            dtype='int64', length=3440),
 'CBD-05': Int64Index([    54,     79,     95,    148,    149,    150,    151,    165,
                211,    219,
             ...
             274081, 274085, 274086, 274138, 274139, 274623, 274624, 274692,
             274814, 274906],
            dtype='int64', length=5068),
 'CBD-06': Int64Index([     0,      1,      2,      3,      4,      5,     63,     68,
                 70,     71,
             ...
             274534, 274544, 274570, 274671, 274744, 274765, 274797, 274836,
             274838, 274980],
            dtype='int64', length=4911),
 'CBD-07': Int64Index([    42,     69,     78,    141,    196,    269,    478,    510,
                522,    542,
             ...
             274317, 274413, 274439, 274467, 274579, 274673, 274688, 274752,
             274879, 274907],
            dtype='int64', length=3263),
 'CBD-13': Int64Index([    99,    139,    198,    249,    276,    334,    381,    388,
                424,    442,
             ...
             274444, 274535, 274555, 274613, 274726, 274780, 274903, 274916,
             274966, 274968],
            dtype='int64', length=9067),
 'CD-01': Int64Index([ 68531,  68532,  69169,  69170,  69954,  70367,  70529,  70546,
              70621,  70622,
             ...
             224878, 225305, 225422, 225427, 225492, 225493, 225587, 225666,
             226277, 226597],
            dtype='int64', length=958),
 'CH-01': Int64Index([   256,    355,    382,    416,    437,    444,    502,    503,
                600,    679,
             ...
             274585, 274632, 274641, 274656, 274670, 274827, 274829, 274871,
             274933, 274995],
            dtype='int64', length=6409),
 'CH-02': Int64Index([    55,     56,     58,     83,    113,    126,    127,    154,
                162,    163,
             ...
             274450, 274488, 274593, 274660, 274668, 274868, 274917, 275067,
             275081, 275082],
            dtype='int64', length=8546),
 'CH-03': Int64Index([   290,    417,    428,    435,    436,    452,    494,    516,
                640,    665,
             ...
             274625, 274650, 274658, 274693, 274822, 274844, 274872, 274893,
             274894, 274950],
            dtype='int64', length=6218),
 'CH-05': Int64Index([   134,    195,    205,    248,    250,    251,    315,    324,
                337,    390,
             ...
             274457, 274597, 274714, 274850, 274855, 274873, 274889, 274932,
             274936, 274993],
            dtype='int64', length=6948),
 'CH-06': Int64Index([   212,    253,    277,    278,    279,    403,    449,    504,
                684,    881,
             ...
             274679, 274745, 274746, 274750, 274751, 274781, 274788, 274834,
             274839, 274875],
            dtype='int64', length=3765),
 'CH-07': Int64Index([   146,    210,    299,    341,    374,    377,    401,    415,
                431,    466,
             ...
             274832, 274846, 274890, 274891, 274892, 274955, 275002, 275049,
             275060, 275077],
            dtype='int64', length=11568),
 'CH-08': Int64Index([   120,    136,    144,    158,    159,    242,    262,    294,
                311,    321,
             ...
             274824, 274848, 274853, 274904, 274905, 274935, 274943, 274982,
             275006, 275090],
            dtype='int64', length=8573),
 'CH-09': Int64Index([   101,    168,    222,    349,    380,    467,    567,    578,
                628,    647,
             ...
             274357, 274466, 274468, 274475, 274500, 274595, 274611, 274732,
             274895, 275066],
            dtype='int64', length=5246),
 'CH-12': Int64Index([   319,    384,    411,    441,    451,    462,    540,    554,
                558,    605,
             ...
             274577, 274603, 274609, 274615, 274631, 274680, 275050, 275056,
             275088, 275089],
            dtype='int64', length=5857),
 'CH-15': Int64Index([   109,    160,    244,    340,    402,    430,    459,    468,
                723,    724,
             ...
             274674, 274756, 274791, 274812, 274826, 274840, 274852, 274857,
             274921, 274969],
            dtype='int64', length=6550),
 'CH-16': Int64Index([175075, 175093, 175108, 175126, 175127, 175131, 175136, 175144,
             175145, 175210,
             ...
             274629, 274805, 274816, 274825, 274854, 274920, 275035, 275042,
             275051, 275072],
            dtype='int64', length=2089),
 'DPD-01': Int64Index([    59,     91,     93,    289,    568,    644,    667,    740,
                839,    974,
             ...
             274530, 274604, 274665, 274681, 274798, 274823, 274939, 274957,
             274961, 275026],
            dtype='int64', length=4822),
 'DPD-03': Int64Index([   131,    197,    345,    347,    727,    844,   1075,   1144,
               1347,   1430,
             ...
             273047, 273048, 274048, 274092, 274776, 274800, 275001, 275003,
             275004, 275005],
            dtype='int64', length=1423),
 'EL-01': Int64Index([   199,    400,    700,    702,    715,    716,    769,   1175,
               1350,   1351,
             ...
             274676, 274718, 274731, 274770, 274837, 274928, 274929, 274948,
             274959, 275038],
            dtype='int64', length=3604),
 'EL-03': Int64Index([   344,    358,    360,    425,    492,    583,    927,   1027,
               1071,   1110,
             ...
             274757, 274861, 274862, 274882, 274984, 274988, 274990, 274994,
             275031, 275052],
            dtype='int64', length=5788),
 'EL-05': Int64Index([   200,    201,    447,    456,    488,    615,    646,    694,
                763,    858,
             ...
             274019, 274157, 274162, 274253, 274368, 274477, 274584, 274725,
             274877, 274878],
            dtype='int64', length=3400),
 'FH-01': Int64Index([   100,    231,    325,    330,    373,    386,    455,    485,
                505,    521,
             ...
             173748, 173988, 174253, 174384, 174549, 174647, 174657, 174690,
             174986, 175005],
            dtype='int64', length=2349),
 'FH-04': Int64Index([   364,    371,    392,    396,    460,    482,    529,    950,
                970,    984,
             ...
             274428, 274519, 274600, 274640, 274646, 274648, 274849, 274851,
             274964, 275000],
            dtype='int64', length=4208),
 'ID-04': Int64Index([    89,    123,    155,    156,    169,    170,    214,    223,
                237,    309,
             ...
             274353, 274445, 274548, 274792, 274930, 275014, 275057, 275058,
             275084, 275085],
            dtype='int64', length=2474),
 'PS-04': Int64Index([     6,      7,      8,      9,     10,     11,     12,     13,
                 14,     15,
             ...
             274446, 274471, 274572, 274734, 274766, 274874, 274901, 274902,
             274944, 275068],
            dtype='int64', length=5409),
 'PS-05': Int64Index([    45,     49,     53,     57,     90,    130,    202,    218,
                246,    247,
             ...
             274633, 274634, 274635, 274666, 274820, 274828, 274978, 274987,
             275022, 275073],
            dtype='int64', length=3969),
 'SLU-01': Int64Index([   111,    142,    143,    147,    152,    153,    220,    308,
                312,    370,
             ...
             274686, 274722, 274768, 274769, 274883, 274884, 274997, 275041,
             275059, 275071],
            dtype='int64', length=7084),
 'SLU-02': Int64Index([   137,    181,    296,    397,    427,    458,    464,    500,
                530,    531,
             ...
             274678, 274684, 274687, 274747, 274753, 274833, 274835, 274885,
             274899, 274991],
            dtype='int64', length=7018),
 'SLU-04': Int64Index([   213,    245,    273,    288,    291,    295,    316,    432,
                589,    639,
             ...
             274185, 274309, 274311, 274415, 274493, 274711, 274887, 274941,
             275027, 275048],
            dtype='int64', length=5226),
 'SLU-07': Int64Index([   368,    454,    551,    552,    575,    577,    633,    648,
                735,    741,
             ...
             274607, 274608, 274647, 274715, 274716, 274771, 274845, 274898,
             274946, 275040],
            dtype='int64', length=6339),
 'SLU-15': Int64Index([   102,    178,    232,    243,    284,    287,    292,    313,
                318,    338,
             ...
             274808, 274863, 274897, 274923, 274953, 274967, 274985, 275036,
             275037, 275061],
            dtype='int64', length=9741),
 'SLU-16': Int64Index([   391,    406,    420,    448,    486,    487,    532,    536,
                537,    538,
             ...
             274099, 274103, 274296, 274442, 274602, 274720, 274763, 274764,
             274867, 274896],
            dtype='int64', length=5045),
 'SLU-18': Int64Index([   103,    320,    359,    446,    544,    556,    565,    566,
                591,    614,
             ...
             209477, 209600, 209625, 209663, 209671, 209907, 209917, 209918,
             209929, 210002],
            dtype='int64', length=3461),
 'SLU-19': Int64Index([   129,    280,    304,    350,    351,    353,    354,    457,
                493,    564,
             ...
             274407, 274512, 274590, 274651, 274701, 274702, 274703, 274841,
             274963, 275062],
            dtype='int64', length=7285),
 'SLU-20': Int64Index([ 79307,  79441,  79473,  79584,  79657,  79658,  79659,  79864,
              79868,  79994,
             ...
             273606, 274539, 274540, 274561, 274606, 274758, 274806, 274807,
             274918, 275039],
            dtype='int64', length=2452),
 'SLU-21': Int64Index([133364, 133365, 133388, 133620, 133621, 133744, 133745, 134178,
             134179, 135016,
             ...
             274136, 274193, 274497, 274612, 274698, 274801, 274960, 275030,
             275043, 275074],
            dtype='int64', length=1114),
 'SLU-22': Int64Index([210885, 210897, 210898, 210899, 210913, 210918, 211084, 211085,
             211264, 211318,
             ...
             274525, 274652, 274669, 274683, 274699, 274772, 274803, 274804,
             274842, 274866],
            dtype='int64', length=1748),
 'SLU-23': Int64Index([   192,    206,    224,    225,    226,    305,    306,    549,
                550,    635,
             ...
             275010, 275011, 275012, 275016, 275017, 275018, 275019, 275020,
             275021, 275023],
            dtype='int64', length=5739),
 'UD-01': Int64Index([    60,     61,     76,    177,    182,    208,    608,    942,
                943,   1054,
             ...
             274123, 274124, 274158, 274460, 274575, 274700, 274705, 274869,
             274881, 275025],
            dtype='int64', length=3889),
 'UD-02': Int64Index([    92,     97,    183,    193,    204,    240,    241,    543,
                654,    655,
             ...
             274286, 274287, 274369, 274418, 274502, 274815, 274919, 275024,
             275086, 275087],
            dtype='int64', length=1417),
 'UD-04': Int64Index([    96,    161,    184,    188,    260,    372,    499,    611,
                678,    891,
             ...
             274104, 274105, 274160, 274259, 274264, 274283, 274400, 274528,
             274659, 274870],
            dtype='int64', length=3534),
 'UD-07': Int64Index([   115,    116,    281,    469,    669,    696,    738,    904,
                963,   1040,
             ...
             273080, 273086, 273331, 273359, 273545, 273783, 274165, 274175,
             274281, 274759],
            dtype='int64', length=2429),
 'UW-01': Int64Index([   730,   1691,   1759,   2124,   2383,   2746,   3087,   3356,
               3404,   3510,
             ...
             142135, 142136, 142249, 142254, 142259, 143101, 144918, 145571,
             147714, 147773],
            dtype='int64', length=480),
 'UW-02': Int64Index([    72,     73,     74,     80,    421,    857,    964,   1026,
               1183,   1433,
             ...
             274078, 274079, 274088, 274089, 274300, 274301, 274537, 274586,
             274998, 275063],
            dtype='int64', length=2002),
 'UW-04': Int64Index([   187,    343,    375,    463,    477,    580,    673,    762,
                781,    833,
             ...
             274811, 274856, 274858, 274876, 274888, 274996, 275045, 275054,
             275069, 275070],
            dtype='int64', length=2688),
 'UW-06': Int64Index([   167,    272,    385,    631,    774,    951,   1011,   1048,
               1078,   1083,
             ...
             274454, 274499, 274562, 274617, 274742, 274773, 274786, 274802,
             274809, 274945],
            dtype='int64', length=2383),
 'UW-07': Int64Index([   121,    122,    215,    216,    365,    367,    404,    721,
               1090,   1141,
             ...
             273936, 273944, 273961, 274498, 274571, 274576, 274721, 274727,
             274947, 275046],
            dtype='int64', length=1905),
 'UW-10': Int64Index([   105,    124,    128,    314,    619,    896,    934,    935,
                999,   1006,
             ...
             238201, 238453, 238514, 238816, 238854, 239274, 239545, 240295,
             240446, 240775],
            dtype='int64', length=1175),
 'UW-11': Int64Index([150250, 150776, 151044, 151373, 151690, 152037, 152061, 153903,
             153941, 154905,
             ...
             273385, 273607, 273760, 273892, 273904, 274016, 274017, 274018,
             274473, 274675],
            dtype='int64', length=1237),
 'UW-12': Int64Index([241157, 241173, 241175, 241194, 241208, 241245, 241292, 241403,
             241435, 241447,
             ...
             274748, 274794, 274795, 274843, 274859, 274864, 274938, 274940,
             274999, 275055],
            dtype='int64', length=689),
 'WF-01': Int64Index([   133,    135,    297,    298,    300,    302,    307,    369,
                475,    514,
             ...
             274972, 274977, 274989, 275013, 275015, 275028, 275044, 275047,
             275078, 275079],
            dtype='int64', length=13038),
 'WF-03': Int64Index([226781, 226784, 226827, 227100, 227321, 227322, 227569, 227570,
             227768, 227769,
             ...
             274097, 274100, 274306, 274307, 274325, 274383, 274667, 274708,
             274909, 274934],
            dtype='int64', length=646),
 'WF-04': Int64Index([    64,     65,    132,    157,    203,    207,    236,    264,
                266,    267,
             ...
             274382, 274630, 274637, 274697, 274707, 274709, 274880, 274913,
             274981, 274986],
            dtype='int64', length=6271)}

The simplest version of a groupby follows the pattern below. You can use almost any aggregation function you wish (mean, median, sum, minimum, maximum, standard deviation, count, etc.):

<data object>.groupby(<grouping values>).<aggregate>()

For example, we can group by gender and find the average of all numerical columns:


In [ ]:
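
A minimal sketch of this pattern, using a small made-up table in place of the trip data. The column names gender and tripduration mirror the ones used in this lecture; birthyear and all of the values are invented for illustration:

```python
import pandas as pd

# Toy stand-in for the trip data; every value here is invented
trips = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female", "Other"],
    "tripduration": [600.0, 900.0, 300.0, 1200.0, 450.0],
    "birthyear": [1980, 1990, 1985, 1975, 1992],
})

# Group the rows by gender, then average each numerical column per group.
# numeric_only=True skips text columns, which matters on the full dataset.
mean_by_gender = trips.groupby("gender").mean(numeric_only=True)
print(mean_by_gender)
```

The result is a dataframe indexed by the grouping values, with one row per gender.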

It's also possible to index the grouped object as if it were a dataframe:


In [ ]:
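
As a hedged illustration on invented data, selecting a column from the grouped object before aggregating might look like this (the variable names are assumptions made for this sketch):

```python
import pandas as pd

# Toy stand-in for the trip data; every value here is invented
trips = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "tripduration": [600.0, 900.0, 300.0, 1200.0],
    "birthyear": [1980, 1990, 1985, 1975],
})

grouped = trips.groupby("gender")

# Index the grouped object with a column name, then aggregate only that column
median_duration = grouped["tripduration"].median()
print(median_duration)
```

Because only one column is selected, the result is a series rather than a dataframe.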

You can even group by multiple values: for example we can look at the trip duration by time of day and by gender:


In [ ]:
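
A minimal sketch of grouping by two values at once, on invented data. The hour column is an assumption standing in for whatever time-of-day column your dataframe has:

```python
import pandas as pd

# Toy stand-in for the trip data; "hour" is a hypothetical time-of-day column
trips = pd.DataFrame({
    "hour": [8, 8, 8, 17, 17, 17],
    "gender": ["Male", "Female", "Male", "Male", "Female", "Female"],
    "tripduration": [600.0, 900.0, 400.0, 300.0, 1200.0, 800.0],
})

# Passing a list of columns groups by every combination of their values
duration_by_hour_gender = trips.groupby(["hour", "gender"])["tripduration"].mean()
print(duration_by_hour_gender)
```

The result is a series whose index has two levels, one per grouping column.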

The unstack() operation can help make sense of this type of multiply-grouped data. What this technically does is split a multiple-valued index into an index plus columns:


In [ ]:
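
The following sketch shows the effect of unstack() on a two-level grouping. The data are invented, and the hour column is a hypothetical stand-in for a time-of-day column:

```python
import pandas as pd

# Toy stand-in for the trip data; "hour" is a hypothetical time-of-day column
trips = pd.DataFrame({
    "hour": [8, 8, 8, 17, 17, 17],
    "gender": ["Male", "Female", "Male", "Male", "Female", "Female"],
    "tripduration": [600.0, 900.0, 400.0, 300.0, 1200.0, 800.0],
})

# A series indexed by (hour, gender)
grouped_mean = trips.groupby(["hour", "gender"])["tripduration"].mean()

# unstack() moves the innermost index level (gender) into the columns
table = grouped_mean.unstack()
print(table)
```

Here the hours stay in the index and each gender becomes a column, which is usually easier to read and to plot.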

Visualizing data with pandas

Of course, looking at tables of data is not very intuitive. Fortunately Pandas has many useful plotting functions built-in, all of which make use of the matplotlib library to generate plots.

Whenever you do plotting in the IPython notebook, you will want to first run this magic command which configures the notebook to work well with plots:


In [65]:
%matplotlib inline

Now we can simply call the plot() method of any series or dataframe, or a shortcut such as hist(), to get a reasonable view of the data:


In [66]:
import matplotlib.pyplot as plt
df['tripduration'].hist()


Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f6c5496b3c8>

Adjusting the Plot Style

Matplotlib has a number of plot styles you can use. For example, if you like R you might use the ggplot style:


In [ ]:
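
One way to switch styles is plt.style.use(). The sketch below applies the ggplot style to a histogram of invented durations. The Agg backend line is only there so the example also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in Jupyter, %matplotlib inline takes its place
import matplotlib.pyplot as plt
import pandas as pd

# Apply the ggplot style to every plot created from here on
plt.style.use("ggplot")

# Invented trip durations, in seconds
durations = pd.Series([300, 450, 600, 600, 900, 1200], name="tripduration")
ax = durations.hist()
```

The names of all installed styles are listed in plt.style.available.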

Other plot types

Pandas supports a range of other plotting types; you can find these by using the autocomplete on the plot method:


In [ ]:
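
Tab completion only works interactively, so as a rough non-interactive equivalent this sketch lists the public names on the plot accessor of a small invented series:

```python
import pandas as pd

# Invented trip durations, in seconds
durations = pd.Series([300, 450, 600, 900], name="tripduration")

# Public attributes of the plot accessor; roughly what typing
# `durations.plot.` followed by TAB would offer in the notebook
plot_kinds = [name for name in dir(durations.plot) if not name.startswith("_")]
print(plot_kinds)
```

The exact list depends on your pandas version.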

For example, we can create a histogram of trip durations:


In [ ]:
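
A minimal sketch using the plot.hist() method on invented durations. The bin count of 5 is an arbitrary choice for this example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in Jupyter, %matplotlib inline takes its place
import pandas as pd

# Invented trip durations, in seconds
durations = pd.Series([300, 450, 600, 600, 900, 1200], name="tripduration")

# plot.hist() bins the values and draws one bar per bin
ax = durations.plot.hist(bins=5)
```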

If you'd like to adjust the x and y limits of the plot, you can use the set_xlim() and set_ylim() methods of the resulting object:


In [ ]:
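
The plotting call returns a matplotlib axes object, and the limits are set on that object. A sketch on invented data, with arbitrary limits:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in Jupyter, %matplotlib inline takes its place
import pandas as pd

# Invented trip durations, in seconds
durations = pd.Series([300, 450, 600, 600, 900, 1200], name="tripduration")

# Keep the axes object returned by the plotting call
ax = durations.plot.hist(bins=5)

# Restrict the visible range of each axis
ax.set_xlim(0, 1000)
ax.set_ylim(0, 4)
```

Values outside the limits are still part of the data. They are simply not shown.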

Breakout: Exploring the Data

Make a plot of the total number of rides as a function of month of the year (You'll need to extract the month, use a groupby, and find the appropriate aggregation to count the number in each group).


In [ ]:

Split this plot by gender. Do you see any seasonal ridership patterns by gender?


In [ ]:

Split this plot by user type. Do you see any seasonal ridership patterns by usertype?


In [ ]:

Repeat the above three steps, counting the number of rides by time of day rather than by month.


In [ ]:

Are there any other interesting insights you can discover in the data using these tools?


In [ ]:

Using Files

  • Writing and running Python modules
  • Using Python modules in your Jupyter Notebook

In [67]:
# A script for creating a dataframe with counts of the occurrences of a column's values
df_count = df.groupby('from_station_id').count()
df_count1 = df_count[['trip_id']]
df_count2 = df_count1.rename(columns={'trip_id': 'count'})

In [68]:
df_count2.head()


Out[68]:
count
from_station_id
BT-01 10463
BT-03 7334
BT-04 4666
BT-05 5699
BT-06 150

In [69]:
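def make_table_count(df_arg, groupby_column):
    """Return a dataframe with the number of rows for each value of groupby_column."""
    df_count = df_arg.groupby(groupby_column).count()
    # Take the first remaining column from the grouped result, not from the global df
    column_name = df_count.columns[0]
    df_count1 = df_count[[column_name]]
    df_count2 = df_count1.rename(columns={column_name: 'count'})
    return df_count2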

In [70]:
dff = make_table_count(df, 'from_station_id')
dff.head()


Out[70]:
count
from_station_id
BT-01 10463
BT-03 7334
BT-04 4666
BT-05 5699
BT-06 150
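
To reuse a function like this across notebooks, one option is to move it into a module file and import it. The sketch below is only an illustration under several assumptions. The module name table_utils is invented, and the file is written to a temporary directory so the example is self-contained. In practice you would create the file in an editor next to your notebook. This version of the function also uses size() instead of count() on the first column. The two give the same numbers when that column has no missing values.

```python
import importlib
import os
import sys
import tempfile

import pandas as pd

# Source of a hypothetical module, table_utils.py (the name is invented for this sketch)
MODULE_SOURCE = '''
def make_table_count(df_arg, groupby_column):
    """Count the rows of df_arg for each value of groupby_column."""
    counts = df_arg.groupby(groupby_column).size()
    return counts.to_frame(name="count")
'''

# Write the module to a temporary directory
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "table_utils.py"), "w") as f:
    f.write(MODULE_SOURCE)

# Make the directory importable, then import the module by name
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
table_utils = importlib.import_module("table_utils")

# Invented trips standing in for the real dataframe
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "from_station_id": ["BT-01", "BT-01", "BT-03"],
})
dff = table_utils.make_table_count(trips, "from_station_id")
print(dff)
```

When the module file sits in the same directory as the notebook, a plain import table_utils is enough and the sys.path line is not needed.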