In [1]:
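%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Silence SettingWithCopyWarning when assigning to slices of a DataFrame
pd.options.mode.chained_assignment = None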
In [2]:
# Create a DataFrame called twitterData from the CSV file.
# Note: if this overwhelms your machine, there is a smaller data set called twitter1-small.csv in the Data folder.
twitterData = pd.read_csv('../Data/twitter1.csv', encoding='iso8859_15')
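If even the small file is too much, read_csv's nrows parameter can load just the first rows of a large CSV while keeping the same encoding. A minimal sketch, using a tiny in-memory CSV in place of twitter1.csv; the user and URL columns and their values are made up for illustration:

```python
import io

import pandas as pd

# Stand-in for '../Data/twitter1.csv': a tiny ISO-8859-15 encoded CSV held in memory.
# Column names and values are hypothetical.
raw = "user,URL\nnoël,http://www.facebook.com/profile.php?id=5141860\nbob,\n".encode("iso8859_15")

# nrows limits how many data rows are parsed, which keeps memory use down on large files.
sample = pd.read_csv(io.BytesIO(raw), encoding="iso8859_15", nrows=1)
print(sample.shape)  # (1, 2)
```

With the real file you would pass the path instead of io.BytesIO(raw), for example pd.read_csv('../Data/twitter1.csv', encoding='iso8859_15', nrows=10000).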
In [ ]:
In [ ]:
In the Data folder, there is a spreadsheet called studentData.csv containing students and their test scores. Write a script that calculates each student's average test score and adds it as a new column to the DataFrame. The first person to raise their hand and tell me which student has the highest average test score, and what that score is, wins a prize.
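One possible approach is a row-wise mean across the test columns, followed by idxmax to find the top student. A minimal sketch on made-up data; it assumes one row per student with test columns named Test1, Test2, and so on, which may not match the real studentData.csv. It uses a separate name, students, so it will not overwrite anything in the notebook:

```python
import pandas as pd

# Hypothetical layout: one row per student, one column per test.
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Test1": [90, 70, 85],
    "Test2": [80, 75, 95],
    "Test3": [70, 80, 90],
})

# Select the score columns and average them row-wise (axis=1).
test_cols = [c for c in students.columns if c.startswith("Test")]
students["Average"] = students[test_cols].mean(axis=1)

# idxmax returns the index label of the highest average; loc fetches that row.
best = students.loc[students["Average"].idxmax()]
print(best["Name"], best["Average"])  # Cy 90.0
```

Selecting the score columns explicitly keeps non-numeric columns such as Name out of the mean.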
In [ ]:
Using the Twitter data, find all the users with Facebook accounts and create a new column called FacebookID containing each user's Facebook ID. A user's Facebook ID can be found in the URL column; for example, in http://www.facebook.com/profile.php?id=5141860 the ID is 5141860. Extract it with the str.extract method. Don't forget to remove all invalid or empty IDs.
We've already created the twitterData DataFrame for you in an earlier cell.
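A regular expression with one capture group can pull out the digits after id=, and dropna then discards rows where nothing matched. A minimal sketch on a made-up Series of URLs; it assumes the profile URLs follow the facebook.com/profile.php?id= pattern shown above, and uses the name tweets so it will not overwrite twitterData:

```python
import pandas as pd

# Made-up sample standing in for the URL column of twitterData.
tweets = pd.DataFrame({
    "URL": [
        "http://www.facebook.com/profile.php?id=5141860",
        "http://twitter.com/someone",
        None,
        "http://www.facebook.com/profile.php?id=",
    ]
})

# Capture one or more digits after "id="; rows that don't match (or are missing) become NaN.
tweets["FacebookID"] = tweets["URL"].str.extract(
    r"facebook\.com/profile\.php\?id=(\d+)", expand=False
)

# Keep only rows with a non-empty ID.
facebookUsers = tweets.dropna(subset=["FacebookID"])
print(facebookUsers["FacebookID"].tolist())  # ['5141860']
```

Requiring \d+ rather than \d* is what filters out the empty id= case; expand=False returns a Series because the pattern has a single group.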
In [ ]: