Getting Started with Python and GraphLab Create

Python is a popular high-level programming language. It's a simple language, designed with an emphasis on code readability. If you already have programming experience, Python is easy to learn.

Installing GraphLab and Python

Follow these detailed instructions to install GraphLab Create and Python: https://turi.com/download/install-graphlab-create.html

Once you have Anaconda installed, start an IPython session. IPython is a powerful interactive shell for executing Python. You can start an IPython session by running "ipython" from the command line.

These tutorials are written as IPython notebooks. This allows you to download and run them on your own machine, either as a notebook (.ipynb) or as a Python file (.py).

Python Basics

Now it is time to execute our first Python command. We can use the print keyword to print a string.


In [1]:
print 'Hello World!'


Hello World!

In Python, single-line comments start with a #.


In [2]:
# this is a comment!

Python doesn't actually have built-in support for multiline comments. However, you can get the same effect by creating a multiline string and not assigning it to anything. Multiline strings start and end with three single quotes or three double quotes. (Single and double quotes are equivalent in Python.)
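As a small illustration of the point about quotes, the sketch below compares the same text written with each quoting style. The variable names are made up for this example, and print is called with parentheses and a single argument so that the code runs unchanged on Python 2 and Python 3.

```python
# Single and double quotes produce identical strings.
single = 'hello'
double = "hello"
print(single == double)

# Triple quotes let a string span several lines; each line break is kept as \n.
multi = '''first line
second line'''
print(multi == 'first line\nsecond line')
```

Both comparisons print True.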


In [3]:
'''
This is technically just
 a multiline string but
 usually it's used as a
 multiline comment.
'''


Out[3]:
"\nThis is technically just\n a multiline string but\n usually it's used as a\n multiline comment.\n"

Python has several built-in data types. The simple built-in types are bool, str, int, and float. These are shorthand names for boolean, string, integer, and floating-point number.

Below are examples of creating each type.


In [4]:
b = True                              # bool
s = 'This is a string'                # str
i = 4                                 # int
f = 4.1                               # float
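If you want to confirm which type a value has, the built-in type function reports it. The sketch below repeats the four assignments from the cell above so that it can run on its own; the loop variable name is arbitrary, and the parenthesized print form is used so the code works on both Python 2 and Python 3.

```python
# The same four values as in the cell above.
b = True
s = 'This is a string'
i = 4
f = 4.1

# type(x).__name__ gives the short type name as a string.
for value in (b, s, i, f):
    print(type(value).__name__)
```

This prints bool, str, int, and float, one per line.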

Python has other built-in types that are compound types (i.e., types composed of other types). The most common are list, dict, and tuple.

dict is just short for dictionary.

Below are examples of creating these types, and accessing their elements.


In [5]:
d = {'foo': 1, 'bar': 2}              # dict
l = [3,2,1]                           # list
t = (1,2,3)                           # tuple

print d['foo']
print l[2]
print t[1]


1
1
2

Tuples are like lists except they are immutable. Strings are also immutable.
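To make "immutable" concrete, here is a minimal sketch that changes a list in place and then shows the same assignment failing on a tuple. The values mirror the earlier cell; the name upper is introduced only for this example. Single-argument print(...) calls are used so the code runs on Python 2 and Python 3 alike.

```python
l = [3, 2, 1]
t = (1, 2, 3)

l[0] = 10          # lists can be changed in place
print(l)

try:
    t[0] = 10      # tuples cannot; this raises TypeError
except TypeError:
    print('tuple assignment failed')

s = 'This is a string'
upper = s.upper()  # string methods return a new string; s itself is unchanged
print(s)
print(upper)
```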

Python also has a special value called None, which represents the absence of a value. Any variable can be set to None, regardless of the type of data it held before.


In [6]:
b = None
s = None
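A common follow-up question is how to test for None. The sketch below shows the usual "is None" check; the helper function no_return is hypothetical and exists only to show that a function without a return statement gives back None. The parenthesized print form is used so the code runs on Python 2 and Python 3.

```python
b = None

# "is None" tests identity with the single None object.
print(b is None)

# None has its own type, separate from bool, str, int, and float.
print(type(None).__name__)

# A function without a return statement returns None.
def no_return():
    pass

result = no_return()
print(result is None)
```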

You can print the values of variables inside a string by using the % operator and placing %s in the string wherever a value should appear. For example:


In [7]:
print "Our float value is %s. Our int value is %s." % (f, i)


Our float value is 4.1. Our int value is 4.
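Besides %s, the % operator accepts other format codes. The sketch below, which reuses f and i from earlier, shows %d for integers and %.2f for a float with two decimal places. The names line and short are made up for this example, and print is called with a single parenthesized argument so the code runs on Python 2 and Python 3.

```python
f = 4.1
i = 4

# %s converts any value with str(); %d formats an integer;
# %.2f formats a float with two digits after the decimal point.
line = "Our float value is %.2f. Our int value is %d." % (f, i)
print(line)

# A single value does not need to be wrapped in a tuple.
short = "i is %s" % i
print(short)
```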

You create a function by using the def keyword. Here is an example of a function called add2 that takes a value called x and returns that value plus two.


In [8]:
def add2(x):
    return x + 2

add2(10)


Out[8]:
12
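As a short follow-up, the value a function returns can be stored in a variable or passed straight into another call. This sketch redefines add2 so that it runs on its own; y and z are names chosen just for the example, and the print calls use the parenthesized form that works on both Python 2 and Python 3.

```python
def add2(x):
    return x + 2

# The result of a call can be stored or passed to another call.
y = add2(10)
z = add2(add2(1.5))
print(y)
print(z)
```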

Like most programming languages, Python has if and else statements. The elif keyword is used for else-if statements. Unlike in many programming languages, whitespace is meaningful: the body of an if-statement must be indented relative to its test expression. Python doesn't use braces to delimit blocks.

You can use the "and" and "or" keywords to combine several tests into one.


In [9]:
if i == 1 and f > 4:
    print "The value of i is 1 and f is greater than 4."
elif i > 4 or f > 4:
    print "i is greater than 4 or f is greater than 4."
else:
    print "Both i and f are less than or equal to 4."


i is greater than 4 or f is greater than 4.
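To show how indentation defines nesting, here is a sketch that wraps similar tests in a function and adds an if nested one level deeper. The function name describe and the returned strings are invented for this example. Returning strings instead of printing inside the branches keeps the code identical on Python 2 and Python 3.

```python
def describe(i, f):
    # Each branch body is indented one level under its test.
    if i == 1 and f > 4:
        return "first branch"
    elif i > 4 or f > 4:
        # A nested if is indented one level further.
        if i > 4:
            return "second branch, because of i"
        return "second branch, because of f"
    else:
        return "else branch"

print(describe(4, 4.1))
print(describe(1, 4.1))
print(describe(4, 4))
```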

Python has two types of loops, for loops and while loops.

In a for-loop, there is one iteration for each element of the sequence being looped over. Note that i is the current element, not the index value.


In [10]:
for i in l:
    print i


3
2
1
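If you do need the index as well as the element, the built-in enumerate function supplies both. A minimal sketch using the same list follows; the names index and element are arbitrary, and print is called with one parenthesized argument so the code runs on Python 2 and Python 3.

```python
l = [3, 2, 1]

# enumerate yields (index, element) pairs, with the index starting at 0.
for index, element in enumerate(l):
    print("%s: %s" % (index, element))
```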

While-loops are executed as long as the given expression is True.


In [11]:
while i < 10:
    print i
    i += 1


1
2
3
4
5
6
7
8
9

Notice the use of "+=" to increment. Unlike many programming languages, Python does not have an increment (++) or decrement (--) operator.
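The same shorthand exists for other arithmetic operators. The sketch below uses a made-up variable called count; the trailing comments show its value after each line.

```python
count = 10
count += 1   # count is now 11
count -= 3   # count is now 8
count *= 2   # count is now 16
print(count)
```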

GraphLab Create Basics

First, download and install GraphLab Create by following these directions: https://turi.com/download/

To use a library, you first need to import it, like so:


In [1]:
import graphlab
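Python offers a few variations on the import statement. The sketch below demonstrates them with the standard library's math module rather than graphlab, purely so that it runs without GraphLab Create installed; the alias m is an arbitrary choice. The print calls use the parenthesized form that works on Python 2 and Python 3.

```python
import math                  # import the whole module
import math as m             # the same module under a shorter alias
from math import sqrt        # import a single name from the module

print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))
```

All three calls reach the same function and print 4.0.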

Using GraphLab Create, we can easily read in a comma-separated file.


In [2]:
sf = graphlab.SFrame.read_csv('https://static.turi.com/datasets/coursera/toy_datasets/people-example.csv')


[INFO] This commercial license of GraphLab Create is assigned to engr@turi.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-32043 - Server binary: /Users/shawnscully/Programming/new_install_test/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1439957106.log
[INFO] GraphLab Server Version: 1.5.2
PROGRESS: Downloading https://static.turi.com/datasets/coursera/toy_datasets/people-example.csv to /var/tmp/graphlab-shawnscully/32043/000000.csv
PROGRESS: Finished parsing file https://static.turi.com/datasets/coursera/toy_datasets/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.02295 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file https://static.turi.com/datasets/coursera/toy_datasets/people-example.csv
PROGRESS: Parsing completed. Parsed 7 lines in 0.018382 secs.
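To give a rough idea of what the type hints in the log mean, here is a sketch that parses a few comma-separated lines with Python's standard csv module and converts each field using the types the log reports (str, str, str, int). This is not how GraphLab Create works internally. The sample lines are typed in by hand from the rows shown in this tutorial, and the assumption that the raw file is plain, unquoted comma-separated text is mine.

```python
import csv

# Hand-copied sample: the header and the first two rows shown in this tutorial.
lines = [
    'First Name,Last Name,Country,age',
    'Bob,Smith,United States,24',
    'Alice,Williams,Canada,23',
]

reader = csv.reader(lines)
header = next(reader)

# Types as reported in the log: three str columns and one int column.
column_types = [str, str, str, int]

rows = []
for fields in reader:
    # Apply the matching converter to the text of each field.
    rows.append([convert(text) for convert, text in zip(column_types, fields)])

print(header)
print(rows)
```

After conversion the age values are integers, so arithmetic on them works, while the other fields remain strings.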

SFrame Basics


In [3]:
sf # you can view the contents


Out[3]:
First Name Last Name Country age
Bob Smith United States 24
Alice Williams Canada 23
Malcolm Jone England 22
Felix Brown USA 23
Alex Cooper Poland 23
Tod Campbell United States 22
Derek Ward Switzerland 25
[7 rows x 4 columns]

In [7]:
# you can explore summaries of the data
sf.show()


Canvas is accessible via web browser at the URL: http://localhost:57167/index.html
Opening Canvas in default web browser.

In [4]:
# you can also do this inline
graphlab.canvas.set_target('ipynb')     
sf['age'].show(view='Categorical')


Suppose we just wanted to look at a single column.


In [5]:
sf['Country']


Out[5]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

You can add columns.


In [6]:
# add a new column called "Full Name":
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']
sf


Out[6]:
First Name Last Name Country age Full Name
Bob Smith United States 24 Bob Smith
Alice Williams Canada 23 Alice Williams
Malcolm Jone England 22 Malcolm Jone
Felix Brown USA 23 Felix Brown
Alex Cooper Poland 23 Alex Cooper
Tod Campbell United States 22 Tod Campbell
Derek Ward Switzerland 25 Derek Ward
[7 rows x 5 columns]

In [7]:
# You can filter to find all rows that match a logical condition
sf[sf['Full Name'] == 'Felix Brown']


Out[7]:
First Name Last Name Country age Full Name
Felix Brown USA 23 Felix Brown
[? rows x 5 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
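The filter above works in two steps: the comparison produces one true/false value per row, and indexing with that result keeps only the rows marked true. The plain-Python sketch below imitates the idea with lists. It is an analogy, not GraphLab Create code; the names full_names, mask, and selected are invented, and only the first four names from the table are used.

```python
full_names = ['Bob Smith', 'Alice Williams', 'Malcolm Jone', 'Felix Brown']

# Step 1: compare every element, giving one True/False per row.
mask = [name == 'Felix Brown' for name in full_names]
print(mask)

# Step 2: keep only the rows whose mask entry is True.
selected = [name for name, keep in zip(full_names, mask) if keep]
print(selected)
```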

In [13]:
# You can do math
print sf['age']
print sf['age'].mean()
print sf['age'].std()
print sf['age']*2
print sf['age']+2*sf['age']


[24, 23, 22, 23, 23, 22, 25]
23.1428571429
0.989743318611
[48, 46, 44, 46, 46, 44, 50]
[72, 69, 66, 69, 69, 66, 75]
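The numbers above can be reproduced by hand with plain Python lists. In the sketch below, the printed standard deviation matches the population formula (dividing by n rather than n - 1) for these seven ages. The variable names are chosen for this example, float(...) is used so that division behaves the same on Python 2 and Python 3, and print is called with one parenthesized argument for the same reason.

```python
import math

ages = [24, 23, 22, 23, 23, 22, 25]   # the age column printed above

mean = sum(ages) / float(len(ages))

# Population standard deviation: divide the squared deviations by n, not n - 1.
variance = sum((a - mean) ** 2 for a in ages) / float(len(ages))
std = math.sqrt(variance)

print(mean)
print(std)

# Element-wise arithmetic, written out as list comprehensions.
doubled = [a * 2 for a in ages]
tripled = [a + 2 * a for a in ages]
print(doubled)
print(tripled)
```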

In [15]:
sf['Country']


Out[15]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'USA', 'Poland', 'United States', 'Switzerland']

In the Country column, notice that two values mean the same thing: "United States" and "USA".

To fix this, we can apply a function that transforms 'USA' into 'United States'.


In [14]:
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

In [16]:
sf['Country'].apply(transform_country)


Out[16]:
dtype: str
Rows: 7
['United States', 'Canada', 'England', 'United States', 'Poland', 'United States', 'Switzerland']
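Conceptually, apply calls the function once per element and collects the results into a new column. The plain-Python sketch below mimics that with a list; it is an analogy rather than GraphLab Create code, and the names countries and transformed are made up. It also shows that the original values stay unchanged until the result is assigned back.

```python
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

countries = ['United States', 'Canada', 'England', 'USA',
             'Poland', 'United States', 'Switzerland']

# Call the function once per element and collect the results in a new list.
transformed = [transform_country(c) for c in countries]
print(transformed)

# The original list is untouched until the result is assigned back.
print(countries[3])
```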

We could also have used a lambda function in the apply. Lambdas are just inline, unnamed functions. Lambdas also don't have explicit return statements: whatever the expression evaluates to is returned automatically.


In [18]:
sf['Country'] = sf['Country'].apply(lambda cur_value: 'United States' if cur_value == 'USA' else cur_value)
sf.show()
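To see that the lambda and the earlier def version behave the same, the sketch below runs both on two sample values. The name transform_lambda is introduced only for this comparison; normally a lambda is passed directly without being named. The print calls use the parenthesized form that works on Python 2 and Python 3.

```python
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

# The same logic as a lambda: a single expression whose value is returned.
transform_lambda = lambda cur_value: 'United States' if cur_value == 'USA' else cur_value

for value in ['USA', 'Canada']:
    print(transform_lambda(value) == transform_country(value))
```

Both comparisons print True.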