In [1]:
import graphlab as gl

In [2]:
url = 'http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv'
song = gl.SFrame.read_csv(url)


[INFO] This non-commercial license of GraphLab Create is assigned to iliassweb@gmail.com and will expire on September 22, 2016. For commercial licensing options, visit https://dato.com/buy/.

[INFO] Start server at: ipc:///tmp/graphlab_server-16203 - Server binary: /home/zax/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443512832.log
[INFO] GraphLab Server Version: 1.6.1
PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv to /var/tmp/graphlab-zax/16203/d84209ef-feb0-4d66-9dbb-6b5ce9e9937e.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv
PROGRESS: Parsing completed. Parsed 100 lines in 1.49752 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Read 637410 lines. Lines per second: 435946
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv
PROGRESS: Parsing completed. Parsed 1000000 lines in 1.71802 secs.

In [4]:
song.show()


Canvas is accessible via web browser at the URL: http://localhost:51504/index.html
Opening Canvas in default web browser.

In [5]:
song.head()


Out[5]:
song_id | title | release | artist_name | year
SOQMMHC12AB0180CB8 | Silent Night | Monster Ballads X-Mas | Faster Pussy cat | 2003
SOVFVAK12A8C1350D9 | Tanssi vaan | Karkuteillä | Karkkiautomaatti | 1995
SOGTUKN12AB017F4F1 | No One Could Ever | Butter | Hudson Mohawke | 2006
SOBNYVR12A8C13558C | Si Vos Querés | De Culo | Yerba Brava | 2003
SOHSBXH12A8C13B0DF | Tangle Of Aspens | Rene Ablaze Presents Winter Sessions ... | Der Mystic | 0
SOZVAPQ12A8C13B63C | Symphony No. 1 G minor "Sinfonie ... | Berwald: Symphonies Nos. 1/2/3/4 ... | David Montgomery | 0
SOQVRHI12A6D4FB2D7 | We Have Got Love | Strictly The Best Vol. 34 | Sasha / Turbulence | 0
SOEYRFT12AB018936C | 2 Da Beat Ch'yall | Da Bomb | Kris Kross | 1993
SOPMIYT12A6D4F851E | Goodbye | Danny Boy | Joseph Locke | 0
SOJCFMH12A8C13B0C2 | Mama_ mama can't you see ? ... | March to cadence with the US marines ... | The Sun Harbor's Chorus-Documentary Recordings ... | 0
[10 rows x 5 columns]


In [6]:
song['year'].mean()


Out[6]:
1030.3256520000118
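
A mean of about 1030 is not a plausible release year. The head() output above shows several rows with year 0, which looks like a placeholder for an unknown year (this is an assumption; the notebook does not say so). The sketch below illustrates the effect in plain Python rather than with the GraphLab API, using only the ten years printed by head(); the helper name mean is introduced here for illustration.

```python
# Years from the ten rows shown by song.head(); 0 is assumed to mean "unknown".
years = [2003, 1995, 2006, 2003, 0, 0, 0, 1993, 0, 0]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / float(len(values))


mean_all = mean(years)                        # placeholder zeros included
mean_known = mean(y for y in years if y > 0)  # placeholder zeros excluded

print(mean_all)
print(mean_known)
```

On the full SFrame, the same idea would amount to filtering out the zero rows before taking the mean, for example with a boolean mask on the year column.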

Transforming columns


In [9]:
# Add a column with the number of space-separated words in each title.
song['num_words'] = song['title'].apply(lambda x: len(x.split(' ')))
song


Out[9]:
song_id | title | release | artist_name | year | num_words
SOQMMHC12AB0180CB8 | Silent Night | Monster Ballads X-Mas | Faster Pussy cat | 2003 | 2
SOVFVAK12A8C1350D9 | Tanssi vaan | Karkuteillä | Karkkiautomaatti | 1995 | 2
SOGTUKN12AB017F4F1 | No One Could Ever | Butter | Hudson Mohawke | 2006 | 4
SOBNYVR12A8C13558C | Si Vos Querés | De Culo | Yerba Brava | 2003 | 3
SOHSBXH12A8C13B0DF | Tangle Of Aspens | Rene Ablaze Presents Winter Sessions ... | Der Mystic | 0 | 3
SOZVAPQ12A8C13B63C | Symphony No. 1 G minor "Sinfonie ... | Berwald: Symphonies Nos. 1/2/3/4 ... | David Montgomery | 0 | 9
SOQVRHI12A6D4FB2D7 | We Have Got Love | Strictly The Best Vol. 34 | Sasha / Turbulence | 0 | 4
SOEYRFT12AB018936C | 2 Da Beat Ch'yall | Da Bomb | Kris Kross | 1993 | 4
SOPMIYT12A6D4F851E | Goodbye | Danny Boy | Joseph Locke | 0 | 1
SOJCFMH12A8C13B0C2 | Mama_ mama can't you see ? ... | March to cadence with the US marines ... | The Sun Harbor's Chorus-Documentary Recordings ... | 0 | 6
[1000000 rows x 6 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
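
One detail of the word count above: split(' ') cuts on every single space, so leading, trailing, or doubled spaces produce empty strings that are counted as words, whereas split() with no argument treats any run of whitespace as one separator. The sketch below compares the two in plain Python; the helper names and the "messy" title are made up for illustration and do not come from the dataset.

```python
def count_words_single_space(title):
    """Word count as in the cell above: split on every single space."""
    return len(title.split(' '))


def count_words_whitespace(title):
    """Word count that treats any run of whitespace as one separator."""
    return len(title.split())


clean = 'No One Could Ever'
messy = ' No  One Could Ever '  # made-up variant with extra spaces

print(count_words_single_space(clean), count_words_whitespace(clean))
print(count_words_single_space(messy), count_words_whitespace(messy))
```

For titles that contain only single spaces the two counts agree, which is consistent with the values shown in the num_words column above.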

Aggregation


In [10]:
# Count the number of songs per artist.
song.groupby('artist_name', {'total': gl.aggregate.COUNT()})


Out[10]:
artist_name | total
Gary Lewis & The Playboys | 45
The Dells | 54
DJ Konnat | 1
VULTURE WHALE | 11
Martha Argerich/Kathia Buniatishvili/Dora ... | 1
S.S featuring Jp_ Mar Da Cigar Splitter ... | 1
Ian Astbury | 11
Big L / Kool G Rap | 1
Son House | 106
Lil Wayne / BG | 1
[72664 rows x 2 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
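
The group-by above produces one row per artist with the number of songs. As a rough illustration of the same logic outside GraphLab, the sketch below counts occurrences with collections.Counter from the standard library. The artist names are borrowed from the output above, but the list and its counts are a made-up sample, not the real data.

```python
from collections import Counter

# Hypothetical sample of the artist_name column (not the real data).
artist_names = [
    'Son House', 'The Dells', 'Son House',
    'Ian Astbury', 'Son House', 'The Dells',
]

# One entry per distinct artist, mapped to its number of rows.
totals = Counter(artist_names)

for artist, total in totals.most_common():
    print(artist, total)
```

The number of keys in totals plays the role of the row count of the grouped SFrame (72664 distinct artists in the output above).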

Regression model


In [13]:
url = 'http://s3.amazonaws.com/dato-datasets/regression/Housing.csv'
x = gl.SFrame.read_csv(url)
# Fit a linear regression predicting 'price' from all other columns.
model = gl.linear_regression.create(x, target='price')


PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/regression/Housing.csv to /var/tmp/graphlab-zax/16203/3e472d59-83ab-4cf8-8a1a-b8fbeb638439.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/regression/Housing.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.049366 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[int,int,int,int,int,int,str,str,str,str,str,int,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/regression/Housing.csv
PROGRESS: Parsing completed. Parsed 546 lines in 0.018493 secs.
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: Linear regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples          : 518
PROGRESS: Number of features          : 12
PROGRESS: Number of unpacked features : 12
PROGRESS: Number of coefficients    : 13
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-max_error | Validation-max_error | Training-rmse | Validation-rmse |
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
PROGRESS: | 1         | 2        | 1.003654     | 97059.006065       | 34861.599338         | 15799.058155  | 16005.847610    |
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
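
The training table reports two error metrics, max_error and rmse, on the training and validation sets. The sketch below shows how such metrics are conventionally defined, in plain Python; it assumes GraphLab uses these standard definitions, which the log itself does not confirm. The function names mirror the column labels, and the prices are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / float(len(squared)))


def max_error(actual, predicted):
    """Largest absolute difference between actual and predicted values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Made-up house prices and predictions, for illustration only.
actual = [42000.0, 38500.0, 49500.0, 60500.0]
predicted = [45000.0, 38500.0, 45500.0, 60500.0]

print(rmse(actual, predicted))
print(max_error(actual, predicted))
```

Read this way, the log says the typical prediction error is roughly 16,000 price units on both sets, while the worst single training prediction is off by about 97,000.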

In [16]:
gl.canvas.set_target('ipynb')
song.show(view="Scatter Plot", x="year", y="release")


