Fire up GraphLab Create



In [2]:

    
import graphlab

Load a tabular data set



In [3]:

    
sf = graphlab.SFrame('people-example.csv')

2016-04-05 07:43:18,699 [INFO] graphlab.cython.cy_server, 176: GraphLab Create v1.8.5 started. Logging: /tmp/graphlab_server_1459856596.log

This non-commercial license of GraphLab Create is assigned to robert.petit@emory.edu and will expire on March 28, 2017. For commercial licensing options, visit https://dato.com/buy/.

Finished parsing file /Users/rpetit/repos/rpetit3-education/coursera-ml-foundations/week01/people-example.csv

Parsing completed. Parsed 7 lines in 0.034906 secs.

------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------

Finished parsing file /Users/rpetit/repos/rpetit3-education/coursera-ml-foundations/week01/people-example.csv

Parsing completed. Parsed 7 lines in 0.014495 secs.
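
The log above says the column types were inferred from the first line of the file. The sketch below shows that idea in plain Python 3. It is not GraphLab Create's actual parser. It guesses a type per column by trying int, then float, and falling back to str. The inline CSV text is assumed to match people-example.csv (it mirrors the table shown in this notebook), and infer_type and infer_column_types are hypothetical helper names.

```python
import csv
import io

# Inline copy of the data shown in this notebook
# (assumed to match the contents of people-example.csv).
CSV_TEXT = """First Name,Last Name,Country,age
Bob,Smith,United States,24
Alice,Williams,Canada,23
Malcolm,Jone,England,22
Felix,Brown,USA,23
Alex,Cooper,Poland,23
Tod,Campbell,United States,22
Derek,Ward,Switzerland,25
"""


def infer_type(value):
    """Return int, float, or str depending on what the text parses as."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            pass
    return str


def infer_column_types(csv_text):
    """Guess one type per column from the first data row only."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)              # skip the header row
    first_row = next(reader)  # only the first data row is inspected
    return [infer_type(value) for value in first_row]


column_types = infer_column_types(CSV_TEXT)
print([t.__name__ for t in column_types])
```

Guessing from a single row is cheap but can be wrong, for example when the first value in a float column happens to be a whole number. That is why the log suggests passing a corrected list through the column_type_hints argument.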

SFrame basics



In [6]:

    
# View first few lines of table
sf









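    Out[6]:

+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
| Alice      | Williams  | Canada        | 23  |
| Malcolm    | Jone      | England       | 22  |
| Felix      | Brown     | USA           | 23  |
| Alex       | Cooper    | Poland        | 23  |
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+

[7 rows x 4 columns]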



In [7]:

    
# sf.head(n) returns the first n rows, much like Unix 'head'
sf.head(1)









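    Out[7]:

+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Bob        | Smith     | United States | 24  |
+------------+-----------+---------------+-----+

[1 rows x 4 columns]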



In [8]:

    
# sf.tail(n) returns the last n rows, much like Unix 'tail'
sf.tail(2)









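    Out[8]:

+------------+-----------+---------------+-----+
| First Name | Last Name | Country       | age |
+------------+-----------+---------------+-----+
| Tod        | Campbell  | United States | 22  |
| Derek      | Ward      | Switzerland   | 25  |
+------------+-----------+---------------+-----+

[2 rows x 4 columns]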
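
For readers more used to plain Python lists, head and tail behave much like slicing from the front or the back. The sketch below is only an analogy and does not use GraphLab Create. The rows are copied from the table above, the helper names head and tail are hypothetical, and the default of 10 rows is an assumption.

```python
# Rows from the table above as (first name, last name, country, age) tuples.
rows = [
    ("Bob", "Smith", "United States", 24),
    ("Alice", "Williams", "Canada", 23),
    ("Malcolm", "Jone", "England", 22),
    ("Felix", "Brown", "USA", 23),
    ("Alex", "Cooper", "Poland", 23),
    ("Tod", "Campbell", "United States", 22),
    ("Derek", "Ward", "Switzerland", 25),
]


def head(rows, n=10):
    """Return the first n rows (default of 10 is an assumption)."""
    return rows[:n]


def tail(rows, n=10):
    """Return the last n rows; guard n == 0 because rows[-0:] is the whole list."""
    return rows[-n:] if n > 0 else []


first_row = head(rows, 1)
last_two = tail(rows, 2)
print(first_row)
print(last_two)
```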

GraphLab Canvas



In [9]:

    
# Show the data in GraphLab Canvas
sf.show()









    



Canvas is accessible via web browser at the URL: http://localhost:53922/index.html
Opening Canvas in default web browser.



In [11]:

    
graphlab.canvas.set_target('ipynb')



In [12]:

    
sf['age'].show(view='Categorical')
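
The Canvas plot itself is not reproduced in this text export. Assuming the Categorical view is essentially a frequency count of the distinct values in the column, the same summary can be sketched in plain Python with collections.Counter. The ages list is copied from this dataset. The name age_counts is only illustrative.

```python
from collections import Counter

# Ages of the seven people in this dataset.
ages = [24, 23, 22, 23, 23, 22, 25]

# Count how often each distinct age occurs.
age_counts = Counter(ages)

# List the values from most to least frequent.
for age, count in age_counts.most_common():
    print(age, count)
```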

Inspect columns of dataset



In [13]:

    
sf['Country']









    Out[13]:





dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']



In [14]:

    
sf['age']









    Out[14]:





dtype: int
Rows: 7
[24, 23, 22, 23, 23, 22, 25]



In [16]:

    
sf['age'].mean()









    Out[16]:





23.142857142857146



In [17]:

    
sf['age'].max()









    Out[17]:





25
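
As a sanity check, the same statistics can be computed by hand in plain Python. This is a minimal sketch that does not use GraphLab Create. The ages list is copied from the output above, and mean_age and max_age are illustrative names.

```python
# Ages copied from the sf['age'] output above.
ages = [24, 23, 22, 23, 23, 22, 25]

# Mean: sum of the values divided by how many there are.
mean_age = sum(ages) / float(len(ages))

# Maximum: the largest value in the column.
max_age = max(ages)

print(mean_age)
print(max_age)
```

The last printed digits of the mean may differ slightly from the value shown above, since floating-point results depend on how the sum is accumulated.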

Create new columns in SFrame



In [21]:

    
# Deriving a new column from existing ones is often called feature engineering
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']



In [22]:

    
sf









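    Out[22]:

+------------+-----------+---------------+-----+----------------+
| First Name | Last Name | Country       | age | Full Name      |
+------------+-----------+---------------+-----+----------------+
| Bob        | Smith     | United States | 24  | Bob Smith      |
| Alice      | Williams  | Canada        | 23  | Alice Williams |
| Malcolm    | Jone      | England       | 22  | Malcolm Jone   |
| Felix      | Brown     | USA           | 23  | Felix Brown    |
| Alex       | Cooper    | Poland        | 23  | Alex Cooper    |
| Tod        | Campbell  | United States | 22  | Tod Campbell   |
| Derek      | Ward      | Switzerland   | 25  | Derek Ward     |
+------------+-----------+---------------+-----+----------------+

[7 rows x 5 columns]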
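
The column expression above combines the two name columns element by element. A plain-Python sketch of the same row-wise operation, using a list of dicts as a stand-in for the SFrame, might look like this. The people list is copied from the table above, and the list-of-dicts representation is only an assumption made for illustration.

```python
# Stand-in for the SFrame: one dict per row, names copied from the table above.
people = [
    {"First Name": "Bob", "Last Name": "Smith"},
    {"First Name": "Alice", "Last Name": "Williams"},
    {"First Name": "Malcolm", "Last Name": "Jone"},
    {"First Name": "Felix", "Last Name": "Brown"},
    {"First Name": "Alex", "Last Name": "Cooper"},
    {"First Name": "Tod", "Last Name": "Campbell"},
    {"First Name": "Derek", "Last Name": "Ward"},
]

# Add the derived column to every row.
for person in people:
    person["Full Name"] = person["First Name"] + " " + person["Last Name"]

full_names = [person["Full Name"] for person in people]
print(full_names)
```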

Use apply() for advanced data transformation



In [23]:

    
sf['Country']









    Out[23]:





dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']



In [24]:

    
sf['Country'].show()



In [25]:

    
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country



In [26]:

    
transform_country('Brazil')









    Out[26]:





'Brazil'



In [27]:

    
transform_country('USA')









    Out[27]:





'United States'



In [28]:

    
sf['Country'].apply(transform_country)









    Out[28]:





dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']



In [30]:

    
sf['Country'] = sf['Country'].apply(transform_country)
sf['Country']









    Out[30]:





dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']



In [31]:

    
sf['Country'].show()
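
If more spellings need to be unified later, a lookup table can be easier to extend than a chain of if/else branches. The sketch below is a hypothetical variant of transform_country in plain Python. COUNTRY_ALIASES and normalize_country are names introduced here, not part of the original notebook. A function like this could presumably be passed to apply() in the same way.

```python
# Map each alternative spelling to its canonical form.
# Only the 'USA' entry comes from this notebook; add more as needed.
COUNTRY_ALIASES = {"USA": "United States"}


def normalize_country(country):
    """Return the canonical name, or the input unchanged if it has no alias."""
    return COUNTRY_ALIASES.get(country, country)


# Country column as it looked before the transformation.
countries = ["United States", "Canada", "England", "USA",
             "Poland", "United States", "Switzerland"]

normalized = [normalize_country(c) for c in countries]
print(normalized)
```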
