Basic intro to pandas

Pandas is a feature-rich library for working with tabular data. It's especially good at dealing with time series and hierarchical indexes. The full documentation for version 0.15.2 is available on the pandas website.

Note that you'll need to run git annex get data/2015-01-04-carto-export.csv before this will work!


In [1]:
# This is a not-very-pleasant way to look at the data
!head ../data/2015-01-04-carto-export.csv


post_id,post_title,post_date,post_excerpt,lat,lng,location_id,location_name,location_notes,street_address,city,state,zip,county,total_cost,start_year,completion_year,new_deal_agencies,new_deal_categories,artists,contractors,designers,status,menu_order,image
49119,"Snohomish County Drainage Improvements - Monroe WA",2014/10/25,""In Snohomish County, farm land conditions will be improved by a drainage system affecting seven sections of land between the cities of Snohomish and Monroe. Cooperating with the State of...",47.876538,-122.034003,,,"General marker for area within Snohomish County",,Monroe,WA,,Snohomish,,1937,,"Works Progress Administration (WPA)","Flood erosion and control, Utilities and Infrastructure, Water disposal",,,,,,
36443,"Plum Bayou Resettlement Project - Plum Bayou AR",2014/02/03,"Plum Bayou was the first settlement in Arkansas and in the United States (Arkansas Historic Preservation Program). Resettlement Administrator Rexford G. Tugwell, was present at the opening dedication...",34.295303,-91.91833,,,"Location marker approximate. Exact coordinates needed.",,"Plum Bayou",Arkansas,72182,Jefferson,,1935,1936,"Resettlement Administration (RA)","Resettlement Communities",,,,,,http://livingnewdeal.org/wp-content/uploads/2014/02/plum_bayou1_f-300x218.jpg
14921,"Temple Street Bridge - Los Angeles CA",2014/01/21,"The PWA built this large concrete bridge over Figueroa St.",34.0599094,-118.24851030000002,,,,"765-799 W Temple St","Los Angeles ",CA,90012,,,,1939,"Public Works Administration (PWA)","Roads, highways and bridges, Utilities and Infrastructure",,,,Marked,,http://livingnewdeal.org/wp-content/uploads/2013/07/Temple04-289x225.jpg
49364,"Husky Stadium Expansion - Seattle WA",2014/10/27,"The University of Washington's Husky Stadium was expanded during the 1930s as a result of WPA funding assistance and efforts. A WPA press release from Dec. 1937 announced $23,345 in funds for...",47.6503,-122.3015,2934,"University of Washington - Seattle WA",,,Seattle,WA,,,,,,"Works Progress Administration (WPA)","Parks and recreation, Stadiums",,,,,,
40496,"Nelson W. Aldrich High School - Warwick RI",2014/05/11,"A long, low Colonial Revival school with a portico and pediment. One of the last major commissions of its architects, William R. Walker & Son. Has served as a junior high school since...",41.754381,-71.41475400000002,,,,"789 Post Road",Warwick,RI,02888,Kent,,1934,1935,"Works Progress Administration (WPA)","Education, Schools",,,"William R. Walker & Son",,,
40497,"Oakland Beach School - Warwick RI",2014/05/11,"A mundane Colonial Revival structure serving the Oakland Beach neighborhood of Warwick. The architects were William R. Walker & Son of...",41.698839,-71.39910299999997,,,,"383 Oakland Beach Avenue",Warwick,RI,02889,Kent,,1933,1934,"Works Progress Administration (WPA)","Education, Schools",,,"William R. Walker & Son",,,
40498,"Municipal Utility Improvements - Auburn ME",2014/05/11,"According to an article in the Lewiston Evening Journal of January 3, 1935 by Gerald Reed, extensive utility work was undertaken in the city by a combination of the CWA, FERA, & ERA agencies....",44.0976659,-70.232664,,,,"268 Court St.",Auburn,ME,04210,,,1933,,"Civil Works Administration (CWA), Federal Emergency Relief Administration (FERA)","Public utilities and sanitation, Utilities and Infrastructure, Water disposal",,,,,,http://livingnewdeal.org/wp-content/uploads/2014/05/AuburnWS-300x214.jpg
40509,"Municipal improvements - Auburn ME",2014/05/11,"The Lewiston Evening Journal reported that by 1935, a combination of the CWA, FERA, and ERA had completed numerous work projects in Auburn Maine:  A two mile hiking trail along the Little...",44.0978509,-70.23116549999997,,,"General marker for city of Auburn.",,Auburn,ME,04210,,,1933,1935,"Civil Works Administration (CWA), Federal Emergency Relief Administration (FERA)","Education, New Deal Work Site, Parks and recreation, Public buildings, Schools, Stadiums, Trails",,,,,,
40501,"Suburban Parkway Landscaping - Warwick RI",2014/05/11,"By 1940, the tracks of the former Warwick Railroad had been removed from the center of Suburban Parkway in Oakland Beach. As a WPA project, this center strip was landscaped.",41.6868847,-71.39776840000002,,,,"Suburban Parkway",Warwick,RI,02889,Kent,,1940,,"Works Progress Administration (WPA)","Roads, highways and bridges, Utilities and Infrastructure",,,,,,

In [9]:
import pandas
# I always print these versions in my notebooks to make sure things haven't changed on me...
print(pandas.__version__)


0.15.2

In [2]:
# This is much nicer!
carto_df = pandas.read_csv('../data/2015-01-04-carto-export.csv')
carto_df.head()


Out[2]:
post_id post_title post_date post_excerpt lat lng location_id location_name location_notes street_address ... start_year completion_year new_deal_agencies new_deal_categories artists contractors designers status menu_order image
0 49119 Snohomish County Drainage Improvements - Monro... 2014/10/25 "In Snohomish County, farm land condition... 47.876538 -122.034003 NaN NaN General marker for area within Snohomish County NaN ... 1937 NaN Works Progress Administration (WPA) Flood erosion and control, Utilities and Infra... NaN NaN NaN NaN NaN NaN
1 36443 Plum Bayou Resettlement Project - Plum Bayou AR 2014/02/03 Plum Bayou was the first settlement in Arkansa... 34.295303 -91.91833 NaN NaN Location marker approximate. Exact coordinates... NaN ... 1935 1936 Resettlement Administration (RA) Resettlement Communities NaN NaN NaN NaN NaN http://livingnewdeal.org/wp-content/uploads/20...
2 14921 Temple Street Bridge - Los Angeles CA 2014/01/21 The PWA built this large concrete bridge over ... 34.0599094 -118.24851030000002 NaN NaN NaN 765-799 W Temple St ... NaN 1939 Public Works Administration (PWA) Roads, highways and bridges, Utilities and Inf... NaN NaN NaN Marked NaN http://livingnewdeal.org/wp-content/uploads/20...
3 49364 Husky Stadium Expansion - Seattle WA 2014/10/27 The University of Washington's Husky Stad... 47.6503 -122.3015 2934 University of Washington - Seattle WA NaN NaN ... NaN NaN Works Progress Administration (WPA) Parks and recreation, Stadiums NaN NaN NaN NaN NaN NaN
4 40496 Nelson W. Aldrich High School - Warwick RI 2014/05/11 A long, low Colonial Revival school with a por... 41.754381 -71.41475400000002 NaN NaN NaN 789 Post Road ... 1934 1935 Works Progress Administration (WPA) Education, Schools NaN NaN William R. Walker & Son NaN NaN NaN

5 rows × 25 columns
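
As a side note, read_csv accepts optional arguments that control how individual columns are read. The sketch below is only an illustration: the rows are made up to mimic a few columns of the export (they are not taken from the real file), and sample_csv and sample_df are hypothetical names. It assumes you want ZIP codes kept as text and post_date parsed as a date.

```python
import io
import pandas

# Made-up rows that mimic a few columns of the export; not real data.
sample_csv = io.StringIO(
    "post_id,post_date,zip,start_year\n"
    "1,2014/10/25,02888,1937\n"
    "2,2014/02/03,72182,\n"
)

sample_df = pandas.read_csv(
    sample_csv,
    dtype={"zip": str},         # read ZIP codes as text so leading zeros survive
    parse_dates=["post_date"],  # ask pandas to parse this column as datetimes
)

# start_year has a missing value, so pandas stores it as a float column.
print(sample_df.dtypes)
```

Whether to set these options up front or clean the columns after loading is a judgment call; this notebook loads everything first and inspects the result.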


In [5]:
# describe() gives summary statistics, by default only for the columns pandas treats as numeric
carto_df.describe()


Out[5]:
post_id location_id menu_order
count 8943.000000 4143.000000 983.000000
mean 24169.356144 5141.478639 1.355036
std 17733.598439 2646.650055 2.486848
min 1.000000 730.000000 1.000000
25% 7387.500000 2777.500000 1.000000
50% 21266.000000 5045.000000 1.000000
75% 41384.500000 6839.000000 1.000000
max 56469.000000 11345.000000 27.000000

That's not many columns! What's going on here?


In [6]:
carto_df.dtypes


Out[6]:
post_id                  int64
post_title              object
post_date               object
post_excerpt            object
lat                     object
lng                     object
location_id            float64
location_name           object
location_notes          object
street_address          object
city                    object
state                   object
zip                     object
county                  object
total_cost              object
start_year              object
completion_year         object
new_deal_agencies       object
new_deal_categories     object
artists                 object
contractors             object
designers               object
status                  object
menu_order             float64
image                   object
dtype: object

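So pandas has stored most of the columns with the generic object dtype (typically strings) instead of as numbers or dates, which is why describe() skipped them. Even lat, lng and start_year came through as object. We have some data cleaning to do!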
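
One possible first step, sketched on made-up values and not on the real export, is to coerce text columns to numbers. This assumes a pandas release that provides pandas.to_numeric, which is newer than the 0.15.2 used in this notebook. The names toy_df and cleaned_df are hypothetical.

```python
import pandas

# Made-up values for illustration; the real export has 25 columns.
toy_df = pandas.DataFrame({
    "post_id": [1, 2, 3],
    "lat": ["47.876538", "34.295303", "not a number"],
    "start_year": ["1937", "1935", None],
})

# describe() only summarizes numeric columns by default,
# so only post_id appears here.
print(toy_df.describe().columns.tolist())

# Coerce the text columns to numbers; entries that cannot be
# parsed become NaN instead of raising an error.
cleaned_df = toy_df.copy()
for column in ["lat", "start_year"]:
    cleaned_df[column] = pandas.to_numeric(cleaned_df[column], errors="coerce")

print(cleaned_df.dtypes)
print(cleaned_df.describe().columns.tolist())
```

Coercing silently turns bad entries into NaN, so it is worth counting the missing values before and after to see how much was lost.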

