Fire up GraphLab Create

We always start with this import line before using any part of GraphLab Create.


In [4]:
import graphlab

Load a tabular data set


In [6]:
%ls


Getting Started with SFrames.ipynb  people-example.csv

In [7]:
sf = graphlab.SFrame('test.csv')


[ERROR] unity_server launched with command ("/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/unity_server" --help) failed 
return code: 127
message: /bin/sh: symbol lookup error: /bin/sh: undefined symbol: rl_signal_event_hook

[ERROR] /bin/sh: symbol lookup error: /bin/sh: undefined symbol: rl_signal_event_hook

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-7-6465d4464f6d> in <module>()
----> 1 sf = graphlab.SFrame('test.csv')

/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/data_structures/sframe.pyc in __init__(self, data, format, _proxy)
    776             self.__proxy__ = _proxy
    777         else:
--> 778             self.__proxy__ = UnitySFrameProxy(glconnect.get_client())
    779             _format = None
    780             if (format == 'auto'):

/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/connect/main.pyc in get_client()
    286     if not is_connected():
    287         launch()
--> 288     assert is_connected(), ENGINE_START_ERROR_MESSAGE
    289     return __CLIENT__
    290 

AssertionError: Cannot connect to GraphLab Create engine. Visit https://dato.com/support for support options.

In [5]:
sf = graphlab.SFrame('people-example.csv')


[ERROR] unity_server launched with command ("/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/unity_server" --help) failed 
return code: 127
message: /bin/sh: symbol lookup error: /bin/sh: undefined symbol: rl_signal_event_hook

[ERROR] /bin/sh: symbol lookup error: /bin/sh: undefined symbol: rl_signal_event_hook

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-4df0be298ea8> in <module>()
----> 1 sf = graphlab.SFrame('people-example.csv')

/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/data_structures/sframe.pyc in __init__(self, data, format, _proxy)
    776             self.__proxy__ = _proxy
    777         else:
--> 778             self.__proxy__ = UnitySFrameProxy(glconnect.get_client())
    779             _format = None
    780             if (format == 'auto'):

/home/idwaker/Applications/miniconda3/envs/coursera-ml/lib/python2.7/site-packages/graphlab/connect/main.pyc in get_client()
    286     if not is_connected():
    287         launch()
--> 288     assert is_connected(), ENGINE_START_ERROR_MESSAGE
    289     return __CLIENT__
    290 

AssertionError: Cannot connect to GraphLab Create engine. Visit https://dato.com/support for support options.
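
Both tracebacks above fail before any CSV file is read: the engine binary is launched through /bin/sh, and the shell itself aborts with an undefined rl_signal_event_hook symbol. That symbol belongs to GNU readline, so one plausible cause (an assumption, not something the output confirms) is that the shell is loading a different readline library than the one it was built against, for example through LD_LIBRARY_PATH. A minimal check, independent of GraphLab Create, is to see whether the shell can start at all from the notebook's Python process. The helper name shell_starts is hypothetical.

```python
import os
import subprocess


def shell_starts(shell='/bin/sh'):
    """Return True if `shell -c "exit 0"` runs and exits with status 0."""
    try:
        return subprocess.call([shell, '-c', 'exit 0']) == 0
    except OSError:
        # The shell binary is missing or not executable.
        return False


# If this prints False, the problem lies in the shell environment,
# not in the CSV file or in the SFrame call.
print('shell starts: %s' % shell_starts())

# A library path inherited by the notebook is one thing worth inspecting.
print('LD_LIBRARY_PATH: %r' % os.environ.get('LD_LIBRARY_PATH'))
```

If the shell does not start, the same failure would be expected for any file name passed to graphlab.SFrame, which matches the two identical errors above.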

SFrame basics


In [6]:
sf  # we can view the first few lines of the table


Out[6]:
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]

In [8]:
sf.tail()  # view end of the table


Out[8]:
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]

GraphLab Canvas


In [9]:
# .show() visualizes any data structure in GraphLab Create
sf.show()


Canvas is accessible via web browser at the URL: http://localhost:12000/index.html

In [10]:
# If you want the Canvas visualization to show up in this notebook,
# rather than popping up a new window, add this line:
graphlab.canvas.set_target('ipynb')

In [11]:
sf['age'].show(view='Categorical')


Inspect columns of dataset


In [12]:
sf['Country']


Out[12]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [13]:
sf['age']


Out[13]:
dtype: int
Rows: 7
[24, 23, 22, 23, 23, 22, 25]

Some simple columnar operations


In [14]:
sf['age'].mean()


Out[14]:
23.142857142857146

In [15]:
sf['age'].max()


Out[15]:
25
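
To make explicit what these two column reductions compute, here is a plain-Python sketch over the values printed in Out[13]. It does not use GraphLab Create, and the variable names are illustrative only. The last digits of the mean may differ slightly from Out[14], since floating-point results depend on how the sum is accumulated.

```python
# Values copied from Out[13] above.
ages = [24, 23, 22, 23, 23, 22, 25]

# float() avoids integer truncation under Python 2, where int / int floors.
mean_age = sum(ages) / float(len(ages))
max_age = max(ages)

print(mean_age)
print(max_age)
```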

Create new columns in our SFrame


In [16]:
sf


Out[16]:
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]

In [18]:
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']

In [19]:
sf


Out[19]:
First Name  Last Name  Country        age  Full Name
Bob         Smith      United States  24   Bob Smith
Alice       Williams   Canada         23   Alice Williams
Malcolm     Jone       England        22   Malcolm Jone
Felix       Brown      USA            23   Felix Brown
Alex        Cooper     Poland         23   Alex Cooper
Tod         Campbell   United States  22   Tod Campbell
Derek       Ward       Switzerland    25   Derek Ward
[7 rows x 5 columns]

In [21]:
sf['age'] * sf['age']


Out[21]:
dtype: int
Rows: 7
[576, 529, 484, 529, 529, 484, 625]
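
The column expressions in this section work element by element: row i of the result is built from row i of each input column. A plain-Python sketch of the same idea, using lists filled with the values shown in the table above (no GraphLab Create required; the list names are illustrative):

```python
# Column values copied from the table above.
first_names = ['Bob', 'Alice', 'Malcolm', 'Felix', 'Alex', 'Tod', 'Derek']
last_names = ['Smith', 'Williams', 'Jone', 'Brown', 'Cooper', 'Campbell', 'Ward']
ages = [24, 23, 22, 23, 23, 22, 25]

# Pair up the rows and join each pair with a space, like the Full Name column.
full_names = [first + ' ' + last for first, last in zip(first_names, last_names)]

# Multiply each age by itself, like sf['age'] * sf['age'].
squared_ages = [age * age for age in ages]

print(full_names)
print(squared_ages)
```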

Use the apply function to do an advanced transformation of our data


In [22]:
sf['Country']


Out[22]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In [23]:
sf['Country'].show()



In [24]:
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

In [25]:
transform_country('Brazil')


Out[25]:
'Brazil'

In [26]:
transform_country('Brasil')


Out[26]:
'Brasil'

In [27]:
transform_country('USA')


Out[27]:
'United States'

In [28]:
sf['Country'].apply(transform_country)


PROGRESS: Using default 16 lambda workers.
PROGRESS: To maximize the degree of parallelism, add the following code to the beginning of the program:
PROGRESS: "graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 32)"
PROGRESS: Note that increasing the degree of parallelism also increases the memory footprint.
Out[28]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']

In [29]:
sf['Country'] = sf['Country'].apply(transform_country)

In [30]:
sf


Out[30]:
First Name  Last Name  Country        age  Full Name
Bob         Smith      United States  24   Bob Smith
Alice       Williams   Canada         23   Alice Williams
Malcolm     Jone       England        22   Malcolm Jone
Felix       Brown      United States  23   Felix Brown
Alex        Cooper     Poland         23   Alex Cooper
Tod         Campbell   United States  22   Tod Campbell
Derek       Ward       Switzerland    25   Derek Ward
[7 rows x 5 columns]
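
If more spellings need cleaning up later, the if/else in transform_country can be swapped for a lookup table. The sketch below is one possible variant, not part of the original notebook: COUNTRY_ALIASES and normalize_country are hypothetical names, and the table holds only the single alias seen in this data. It runs on a plain list so it can be tried without GraphLab Create.

```python
# Hypothetical alias table: maps a variant spelling to the preferred one.
COUNTRY_ALIASES = {'USA': 'United States'}


def normalize_country(country, aliases=COUNTRY_ALIASES):
    """Return the preferred spelling, or the input unchanged if no alias exists."""
    return aliases.get(country, country)


# Values copied from Out[22] above.
countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']

normalized = [normalize_country(country) for country in countries]
print(normalized)
```

Because normalize_country can be called with a single argument, it should be usable with apply in the same way as transform_country above.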
