Two-Dimensional Data Worksheet

This worksheet focuses on manipulating two-dimensional data using Python and Pandas.


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Silence SettingWithCopyWarning when assigning to a slice of a DataFrame
pd.options.mode.chained_assignment = None

In [2]:
# Create a DataFrame called twitterData from the CSV file.
# If this file is too large for your machine, a smaller data set called
# twitter1-small.csv is available in the Data folder.
twitterData = pd.read_csv('../Data/twitter1.csv', encoding='iso8859_15')

Exercise 1

Using the twitterData DataFrame and the commands we have learned so far, create a Series called tweetCounts that contains each username and the number of tweets that user posted. Next, output the top 10 "tweeters".
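One possible approach is sketched below. It runs on a small hypothetical stand-in for twitterData and assumes the usernames live in a column called Username (the name used in Exercise 2); check the real column names with twitterData.columns before adapting it.

```python
import pandas as pd

# Hypothetical stand-in for twitterData; the real file's columns may differ
tweets = pd.DataFrame({
    "Username": ["alice", "bob", "alice", "carol", "alice", "bob"],
    "Text": ["t1", "t2", "t3", "t4", "t5", "t6"],
})

# value_counts() returns a Series indexed by username,
# sorted from the most frequent value to the least frequent
tweetCounts = tweets["Username"].value_counts()

# The top 10 tweeters (fewer rows if there are fewer than 10 users)
print(tweetCounts.head(10))
```

Because value_counts already sorts in descending order, head(10) is enough; no separate sort_values call is needed.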


In [ ]:

Exercise 2

Using the original Twitter data set, create a second DataFrame called twitterSummary that contains the following columns:

  • Username
  • Friends
  • Followers

Next, add a column called ffratio that contains the ratio of friends to followers (Friends divided by Followers).
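A minimal sketch of column selection and a derived column follows, again on a small hypothetical DataFrame with the three column names listed above. The replacement of infinite values is an optional choice, not part of the exercise: pandas returns inf when a user has zero followers, and you may prefer to treat that as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original Twitter data
twitterData = pd.DataFrame({
    "Username": ["alice", "bob", "carol"],
    "Friends": [100, 50, 10],
    "Followers": [50, 0, 20],
    "Text": ["a", "b", "c"],
})

# Select a subset of columns; .copy() makes twitterSummary independent of twitterData
twitterSummary = twitterData[["Username", "Friends", "Followers"]].copy()

# Element-wise division of two columns; dividing by zero yields inf rather than an error
twitterSummary["ffratio"] = twitterSummary["Friends"] / twitterSummary["Followers"]

# Optional: treat users with zero followers as missing instead of infinite
twitterSummary["ffratio"] = twitterSummary["ffratio"].replace([np.inf, -np.inf], np.nan)

print(twitterSummary)
```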


In [ ]:

Exercise 3

In the Data folder, there is a CSV file called studentData.csv containing students and their test scores. Write a script that calculates each student's average test score and adds it as a new column to the DataFrame. The first person to raise their hand and tell me which student has the highest average test score, and what that score is, wins something.
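The key idea is a row-wise mean. The sketch below assumes a hypothetical layout (one row per student, a Name column, and one numeric column per test); studentData.csv may be organized differently, so inspect it with head() first.

```python
import pandas as pd

# Hypothetical layout: one row per student, one column per test
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cho"],
    "Test1": [90, 70, 85],
    "Test2": [80, 75, 95],
    "Test3": [70, 80, 90],
})

testColumns = ["Test1", "Test2", "Test3"]

# axis=1 averages across the columns, giving one value per row (student)
students["Average"] = students[testColumns].mean(axis=1)

# idxmax() returns the row label of the largest average
best = students.loc[students["Average"].idxmax()]
print(best["Name"], best["Average"])
```

Listing the test columns explicitly keeps the Name column (and any other non-score columns) out of the average.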


In [ ]:

Exercise 4

Using the Twitter data, find all the users with Facebook accounts and create a new column called FacebookID that contains each user's Facebook ID. The ID can be found in the URL column; for example, in http://www.facebook.com/profile.php?id=5141860 the ID is 5141860. Extract it using the str.extract method. Don't forget to remove all invalid or empty IDs.

We've already created a DataFrame for you in the cell above.
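A minimal sketch of str.extract follows, assuming the URLs follow the profile.php?id= pattern shown above; the sample rows are hypothetical. Rows whose URL is empty, points elsewhere, or has no digits after id= do not match, so str.extract returns NaN for them and dropna removes them.

```python
import pandas as pd

# Hypothetical stand-in for twitterData with a URL column
twitterData = pd.DataFrame({
    "Username": ["alice", "bob", "carol", "dave"],
    "URL": [
        "http://www.facebook.com/profile.php?id=5141860",
        "http://example.com/bob",
        None,
        "http://www.facebook.com/profile.php?id=",
    ],
})

# The capture group (\d+) keeps only the digits after "id=";
# expand=False returns a Series instead of a one-column DataFrame
twitterData["FacebookID"] = twitterData["URL"].str.extract(
    r"facebook\.com/profile\.php\?id=(\d+)", expand=False
)

# Keep only the users with a valid, non-empty Facebook ID
facebookUsers = twitterData.dropna(subset=["FacebookID"])
print(facebookUsers)
```

The extracted IDs are strings; if you need them as numbers, convert the column afterwards with astype(int).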


In [ ]: