Python is a popular high-level programming language. It's a simple language, designed with an emphasis on code readability. If you already have programming experience, Python is easy to learn.
Follow these detailed instructions to install GraphLab Create and Python: https://turi.com/download/install-graphlab-create.html
Once you have Anaconda, start an IPython session. IPython is a powerful interactive shell for executing Python. You can start an IPython session by running "ipython" from the command line.
This tutorial is written as an IPython notebook, so you can download and run it on your own machine, either as a notebook (.ipynb) or as a Python file (.py).
Now it is time to execute our first Python command. We can use the print keyword to print a string.
In [1]:
print 'Hello World!'
In Python, single-line comments start with a #.
In [2]:
# this is a comment!
Python doesn't actually have built-in support for multiline comments. However, you can get the same effect by creating a multiline string and not assigning it to anything. Multiline strings start and end with three single quotes or three double quotes. (Single and double quotes are equivalent in Python.)
In [3]:
'''
This is technically just
a multiline string but
usually it's used as a
multiline comment.
'''
Out[3]:
Python has several built-in data types. The simple built-in types are bool, str, int, and float. These are shorthand names for boolean, string, integer, and floating-point number.
Below are examples of creating each type.
In [4]:
b = True # bool
s = 'This is a string' # str
i = 4 # int
f = 4.1 # float
Python has other built-in types that are compound types (i.e. types composed of other types). The most common are list, dict, and tuple.
dict is just short for dictionary.
Below are examples of creating these types, and accessing their elements.
In [5]:
d = {'foo': 1, 'bar': 2} # dict
l = [3,2,1] # list
t = (1,2,3) # tuple
print d['foo']
print l[2]
print t[1]
Tuples are like lists except they are immutable. Strings are also immutable.
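As a minimal sketch of what immutability means in practice, the example below tries to change one element of a list, a tuple, and a string. The variable names are hypothetical and chosen so they don't overwrite the ones defined above. Here print is called with parentheses and a single argument, which behaves the same way in Python 2 and Python 3.

```python
nums_list = [3, 2, 1]
nums_tuple = (1, 2, 3)
greeting = 'hello'

# Lists can be changed in place.
nums_list[0] = 99

# Tuples cannot: item assignment raises a TypeError.
try:
    nums_tuple[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Strings behave the same way as tuples here.
try:
    greeting[0] = 'J'
    string_changed = True
except TypeError:
    string_changed = False

print(nums_list)
print(tuple_changed)
print(string_changed)
```

To "change" a tuple or a string, you build a new one and assign it to the variable instead.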
Python also has a special value called None, which represents the absence of a value. Any variable can be set to None, regardless of the type of value it held before.
In [6]:
b = None
s = None
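A common follow-up is checking whether a variable currently holds None. The sketch below uses a hypothetical variable name and the "is" operator, which is the usual way to test for None.

```python
maybe_value = None

# None is a single special value; test for it with "is" rather than "==".
if maybe_value is None:
    status = 'no value yet'
else:
    status = 'has a value'

print(status)

# None has its own type, separate from bool, str, int, and float.
none_type_name = type(None).__name__
print(none_type_name)
```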
You can print the values of variables inside a string by placing %s placeholders in the string and using the % operator to supply the values. For example:
In [7]:
print "Our float value is %s. Our int value is %s." % (f, i)
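Besides %s, which converts any value to its string form, the % operator accepts type-specific placeholders. The sketch below uses hypothetical variable names with the same values as f and i above, and shows %d for integers and %.2f for a float rounded to two decimal places.

```python
price = 4.1
count = 4

# %.2f formats a float with two digits after the decimal point.
# %d formats an integer.
message = "Our float value is %.2f. Our int value is %d." % (price, count)
print(message)
```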
You create a function by using the def keyword. Here is an example of a function called add2 that takes a value called x and returns that value plus two.
In [8]:
def add2(x):
    return x + 2
add2(10)
Out[8]:
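For reference, here is the same function as a self-contained sketch with the indentation that Python requires for a function body. The docstring and the result variables are additions for illustration only.

```python
def add2(x):
    """Return x plus two."""
    # The indented lines form the body of the function.
    return x + 2

result_int = add2(10)
result_float = add2(0.5)

print(result_int)
print(result_float)
```

Because Python does not declare argument types, the same function works for both integers and floats.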
Like most programming languages, Python has if and else statements. The elif keyword is used for else-if statements. Unlike in a lot of programming languages, white space is meaningful: the body of an if-statement must be indented relative to its test expression. Python doesn't use braces.
You can use the "and" and "or" keywords to string together multipart tests.
In [9]:
if i == 1 and f > 4:
    print "The value of i is 1 and f is greater than 4."
elif i > 4 or f > 4:
    print "i is greater than 4 or f is greater than 4."
else:
    print "Both i and f are less than or equal to 4."
Python has two types of loops: for-loops and while-loops.
In a for-loop there is one iteration for each element of the sequence being looped over. Note that i is the current element, not the index value.
In [10]:
for i in l:
    print i
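If you need the index as well as the element, the built-in enumerate function yields both on each iteration. This is a minimal sketch with hypothetical variable names; the list has the same contents as l above.

```python
values = [3, 2, 1]
pairs = []

# enumerate yields (index, element) pairs, starting from index 0.
for index, value in enumerate(values):
    pairs.append((index, value))
    print("index %d holds %d" % (index, value))
```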
While-loops are executed as long as the given expression is True.
In [11]:
while i < 10:
    print i
    i += 1
Notice the use of "+=" to increment. Unlike a lot of programming languages, Python does not have an increment (++) or decrement (--) operator.
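The augmented assignment operators cover both directions. A short sketch with a hypothetical counter variable:

```python
counter = 0

counter += 1  # same as counter = counter + 1
counter += 1
counter -= 1  # same as counter = counter - 1

print(counter)
```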
First, download and install GraphLab-Create by following these directions: https://turi.com/download/
To use another library, you first need to import it, like so:
In [1]:
import graphlab
Using GraphLab Create, we can easily read in a comma-separated file.
In [2]:
sf = graphlab.SFrame.read_csv('https://static.turi.com/datasets/coursera/toy_datasets/people-example.csv')
In [3]:
sf # you can view the contents
Out[3]:
In [7]:
# you can explore summaries of the data
sf.show()
In [4]:
# you can also do this inline
graphlab.canvas.set_target('ipynb')
sf['age'].show(view='Categorical')
Suppose we just wanted to look at a single column.
In [5]:
sf['Country']
Out[5]:
You can add columns.
In [6]:
# add a new column called "Full Name":
sf['Full Name'] = sf['First Name'] + ' ' + sf['Last Name']
sf
Out[6]:
In [7]:
# You can filter to find all rows that match a logical condition
sf[sf['Full Name'] == 'Felix Brown']
Out[7]:
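Conceptually, the inner comparison produces one true/false value per row, and the outer indexing keeps only the rows where that value is true. The sketch below is a plain-Python analogy of that idea, not GraphLab Create code; the list of names is hypothetical apart from 'Felix Brown'.

```python
# Hypothetical stand-in for the "Full Name" column.
full_names = ['Ada Smith', 'Felix Brown', 'Bo Chen']

# Step 1: compare every element to the target, giving one boolean per row.
mask = [name == 'Felix Brown' for name in full_names]

# Step 2: keep only the rows whose mask entry is True.
matching = [name for name, keep in zip(full_names, mask) if keep]

print(mask)
print(matching)
```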
In [13]:
# You can do math
print sf['age']
print sf['age'].mean()
print sf['age'].std()
print sf['age']*2
print sf['age']+2*sf['age']
In [15]:
sf['Country']
Out[15]:
In the Country column, notice that we have two values that mean the same thing: "United States" and "USA".
To fix this, we can apply a function that transforms 'USA' into 'United States'.
In [14]:
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country
In [16]:
sf['Country'].apply(transform_country)
Out[16]:
We could also have used a lambda function in the apply. Lambdas are just inline, unnamed functions. Lambdas also don't have explicit return statements: whatever the expression evaluates to is returned automatically.
In [18]:
sf['Country'] = sf['Country'].apply(lambda cur_value: 'United States' if cur_value == 'USA' else cur_value)
sf.show()
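To see that the named function and the lambda behave identically, the sketch below runs both over a plain Python list rather than an SFrame column. The list contents and the transform_lambda, with_def, and with_lambda names are hypothetical and stand in for the real dataset.

```python
def transform_country(country):
    if country == 'USA':
        return 'United States'
    else:
        return country

# The same logic as a lambda: the conditional expression is the return value.
transform_lambda = lambda cur_value: 'United States' if cur_value == 'USA' else cur_value

# Hypothetical values, not the real dataset.
countries = ['Canada', 'USA', 'United States']

with_def = [transform_country(c) for c in countries]
with_lambda = [transform_lambda(c) for c in countries]

print(with_def)
print(with_lambda == with_def)
```

A named function is usually easier to read and reuse; a lambda is handy when the logic fits in a single expression and is only needed once.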
For more about GraphLab Create see our Getting Started with GraphLab Create Notebook or our Introduction to SFrame Notebook.