In [1]:
import graphlab as gl
In [2]:
# Load the Million Song metadata (song_id, title, release, artist_name, year) into an SFrame
url = 'http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv'
song = gl.SFrame.read_csv(url)
[INFO] Start server at: ipc:///tmp/graphlab_server-16203 - Server binary: /home/zax/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443512832.log
[INFO] GraphLab Server Version: 1.6.1
PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv to /var/tmp/graphlab-zax/16203/d84209ef-feb0-4d66-9dbb-6b5ce9e9937e.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv
PROGRESS: Parsing completed. Parsed 100 lines in 1.49752 secs.
------------------------------------------------------
Inferred types from first line of file as
column_type_hints=[str,str,str,str,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Read 637410 lines. Lines per second: 435946
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/millionsong/song_data.csv
PROGRESS: Parsing completed. Parsed 1000000 lines in 1.71802 secs.
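The log says the column types were inferred from the first line of the file and can be corrected through the `column_type_hints` argument of `read_csv`. The sketch below illustrates first-row type inference in general using only the standard library. It is an assumption about how such inference typically works, not GraphLab's actual implementation, and `infer_column_types` is a hypothetical helper. The sample record is the first row printed by `song.head()` further down.

```python
import csv
import io

def infer_column_types(csv_text):
    """Hypothetical helper: guess int, float or str per column from the first data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first_row = next(reader)
    types = []
    for value in first_row:
        inferred = str  # fall back to str when no numeric parse succeeds
        for candidate in (int, float):
            try:
                candidate(value)
                inferred = candidate
                break
            except ValueError:
                continue
        types.append(inferred)
    return dict(zip(header, types))

sample = (
    "song_id,title,release,artist_name,year\n"
    "SOQMMHC12AB0180CB8,Silent Night,Monster Ballads X-Mas,Faster Pussy cat,2003\n"
)
types = infer_column_types(sample)
print([t.__name__ for t in types.values()])  # ['str', 'str', 'str', 'str', 'int']
```

Because only one row is inspected, a column whose first value happens to look numeric gets a numeric type even if later rows do not fit it, which is why the log suggests passing corrected hints explicitly.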
In [4]:
song.show()
Canvas is accessible via web browser at the URL: http://localhost:51504/index.html
Opening Canvas in default web browser.
In [5]:
song.head()
Out[5]:
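+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+
| song_id            | title                                | release                                  | artist_name                                        | year |
+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+
| SOQMMHC12AB0180CB8 | Silent Night                         | Monster Ballads X-Mas                    | Faster Pussy cat                                   | 2003 |
| SOVFVAK12A8C1350D9 | Tanssi vaan                          | Karkuteillä                              | Karkkiautomaatti                                   | 1995 |
| SOGTUKN12AB017F4F1 | No One Could Ever                    | Butter                                   | Hudson Mohawke                                     | 2006 |
| SOBNYVR12A8C13558C | Si Vos Querés                        | De Culo                                  | Yerba Brava                                        | 2003 |
| SOHSBXH12A8C13B0DF | Tangle Of Aspens                     | Rene Ablaze Presents Winter Sessions ... | Der Mystic                                         | 0    |
| SOZVAPQ12A8C13B63C | Symphony No. 1 G minor "Sinfonie ... | Berwald: Symphonies Nos. 1/2/3/4 ...     | David Montgomery                                   | 0    |
| SOQVRHI12A6D4FB2D7 | We Have Got Love                     | Strictly The Best Vol. 34                | Sasha / Turbulence                                 | 0    |
| SOEYRFT12AB018936C | 2 Da Beat Ch'yall                    | Da Bomb                                  | Kris Kross                                         | 1993 |
| SOPMIYT12A6D4F851E | Goodbye                              | Danny Boy                                | Joseph Locke                                       | 0    |
| SOJCFMH12A8C13B0C2 | Mama_ mama can't you see ? ...       | March to cadence with the US marines ... | The Sun Harbor's Chorus-Documentary Recordings ... | 0    |
+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+
[10 rows x 5 columns]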
In [6]:
# Mean of the year column over all rows, including rows whose year is 0
song['year'].mean()
Out[6]:
1030.3256520000118
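A mean of roughly 1030 cannot be a real release year. Several rows in the `song.head()` output have `year` 0, which appears to stand for an unknown year, and averaging those zeros together with genuine years drags the result down. The standard-library sketch below reproduces the effect on the ten years shown in that output; treating 0 as missing is an assumption. In GraphLab Create, a logical filter such as `song[song['year'] > 0]['year'].mean()` should apply the same exclusion to the full column, though it is not run here.

```python
def mean(values):
    """Arithmetic mean; float() avoids integer division on Python 2."""
    return sum(values) / float(len(values))

# Years of the ten rows printed by song.head() above (0 apparently means unknown).
head_years = [2003, 1995, 2006, 2003, 0, 0, 0, 1993, 0, 0]

mean_all = mean(head_years)                           # zeros counted as year 0
mean_known = mean([y for y in head_years if y > 0])   # zeros treated as missing
print(mean_all, mean_known)  # 1000.0 2000.0
```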
In [9]:
# Count words in each title by splitting on single spaces
song['num_words'] = song['title'].apply(lambda x: len(x.split(' ')))
song
Out[9]:
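+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+-----------+
| song_id            | title                                | release                                  | artist_name                                        | year | num_words |
+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+-----------+
| SOQMMHC12AB0180CB8 | Silent Night                         | Monster Ballads X-Mas                    | Faster Pussy cat                                   | 2003 | 2         |
| SOVFVAK12A8C1350D9 | Tanssi vaan                          | Karkuteillä                              | Karkkiautomaatti                                   | 1995 | 2         |
| SOGTUKN12AB017F4F1 | No One Could Ever                    | Butter                                   | Hudson Mohawke                                     | 2006 | 4         |
| SOBNYVR12A8C13558C | Si Vos Querés                        | De Culo                                  | Yerba Brava                                        | 2003 | 3         |
| SOHSBXH12A8C13B0DF | Tangle Of Aspens                     | Rene Ablaze Presents Winter Sessions ... | Der Mystic                                         | 0    | 3         |
| SOZVAPQ12A8C13B63C | Symphony No. 1 G minor "Sinfonie ... | Berwald: Symphonies Nos. 1/2/3/4 ...     | David Montgomery                                   | 0    | 9         |
| SOQVRHI12A6D4FB2D7 | We Have Got Love                     | Strictly The Best Vol. 34                | Sasha / Turbulence                                 | 0    | 4         |
| SOEYRFT12AB018936C | 2 Da Beat Ch'yall                    | Da Bomb                                  | Kris Kross                                         | 1993 | 4         |
| SOPMIYT12A6D4F851E | Goodbye                              | Danny Boy                                | Joseph Locke                                       | 0    | 1         |
| SOJCFMH12A8C13B0C2 | Mama_ mama can't you see ? ...       | March to cadence with the US marines ... | The Sun Harbor's Chorus-Documentary Recordings ... | 0    | 6         |
+--------------------+--------------------------------------+------------------------------------------+----------------------------------------------------+------+-----------+
[1000000 rows x 6 columns]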
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
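`x.split(' ')` splits on every single space, so a title containing consecutive spaces yields empty strings that get counted as words, and an empty title counts as one word. Calling `split()` with no argument splits on runs of whitespace and drops the empty pieces. The comparison below uses two titles from the output above plus two hypothetical edge cases; whether the dataset actually contains such titles is not checked here.

```python
def num_words_space(title):
    # Same rule as the notebook's lambda: split on each single space.
    return len(title.split(' '))

def num_words_whitespace(title):
    # split() with no argument splits on runs of whitespace and discards empty strings.
    return len(title.split())

# "Silent Night" and "Tangle Of Aspens" come from the output above;
# the last two strings are hypothetical edge cases.
titles = ["Silent Night", "Tangle Of Aspens", "Double  space", ""]
for t in titles:
    print(repr(t), num_words_space(t), num_words_whitespace(t))
```

The two rules agree on ordinary titles and differ only on the edge cases (3 vs 2 for the double space, 1 vs 0 for the empty string).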
In [10]:
# Number of songs per artist
song.groupby('artist_name', {'total': gl.aggregate.COUNT})
Out[10]:
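+-----------------------------------------------+-------+
| artist_name                                   | total |
+-----------------------------------------------+-------+
| Gary Lewis & The Playboys                     | 45    |
| The Dells                                     | 54    |
| DJ Konnat                                     | 1     |
| VULTURE WHALE                                 | 11    |
| Martha Argerich/Kathia Buniatishvili/Dora ... | 1     |
| S.S featuring Jp_ Mar Da Cigar Splitter ...   | 1     |
| Ian Astbury                                   | 11    |
| Big L / Kool G Rap                            | 1     |
| Son House                                     | 106   |
| Lil Wayne / BG                                | 1     |
+-----------------------------------------------+-------+
[72664 rows x 2 columns]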
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
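The grouped counts above come back in no particular order. As a standard-library analogue (an illustration, not GraphLab's implementation), `collections.Counter` performs the same count-per-key aggregation and `most_common()` ranks the keys; in GraphLab Create the ranking step would presumably be a `sort` on the `total` column with `ascending=False`. The rows below are hypothetical.

```python
from collections import Counter

def count_by_artist(rows):
    """Count rows per artist_name, analogous to groupby('artist_name', COUNT)."""
    return Counter(row['artist_name'] for row in rows)

# Hypothetical rows for illustration; not taken from the dataset.
rows = [
    {'artist_name': 'Artist A', 'title': 'Song 1'},
    {'artist_name': 'Artist B', 'title': 'Song 2'},
    {'artist_name': 'Artist A', 'title': 'Song 3'},
]

counts = count_by_artist(rows)
print(counts.most_common())  # [('Artist A', 2), ('Artist B', 1)]
```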
In [13]:
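# Load the housing data and fit a linear regression that predicts 'price' from every other column
url = 'http://s3.amazonaws.com/dato-datasets/regression/Housing.csv'
houses = gl.SFrame.read_csv(url)
model = gl.linear_regression.create(houses, target='price')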
PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/regression/Housing.csv to /var/tmp/graphlab-zax/16203/3e472d59-83ab-4cf8-8a1a-b8fbeb638439.csv
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/regression/Housing.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.049366 secs.
------------------------------------------------------
Inferred types from first line of file as
column_type_hints=[int,int,int,int,int,int,str,str,str,str,str,int,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/regression/Housing.csv
PROGRESS: Parsing completed. Parsed 546 lines in 0.018493 secs.
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
PROGRESS: Linear regression:
PROGRESS: --------------------------------------------------------
PROGRESS: Number of examples : 518
PROGRESS: Number of features : 12
PROGRESS: Number of unpacked features : 12
PROGRESS: Number of coefficients : 13
PROGRESS: Starting Newton Method
PROGRESS: --------------------------------------------------------
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
PROGRESS: | Iteration | Passes   | Elapsed Time | Training-max_error | Validation-max_error | Training-rmse | Validation-rmse |
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
PROGRESS: | 1         | 2        | 1.003654     | 97059.006065       | 34861.599338         | 15799.058155  | 16005.847610    |
PROGRESS: +-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
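The progress table reports two metrics on the training rows and on the 5 percent validation split: max_error (largest absolute residual) and RMSE (root mean squared residual). The sketch below shows the usual definitions of both, together with a closed-form single-feature least-squares fit on hypothetical toy data. It is for intuition only: GraphLab fits all 12 features with Newton's method, as the log states. Because the squared-error objective is quadratic, one Newton step reaches its minimizer, which is consistent with the table finishing after a single iteration.

```python
import math

def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = slope * x + intercept with a single feature."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / float(len(squared)))

def max_error(actual, predicted):
    """Largest absolute residual."""
    return max(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_ols(xs, ys)
preds = [slope * x + intercept for x in xs]
print(slope, intercept, rmse(ys, preds), max_error(ys, preds))  # 2.0 1.0 0.0 0.0
```

On real data the residuals are nonzero; for example, actual values `[10, 20, 30]` against predictions `[12, 18, 30]` give a max_error of 2 and an RMSE of sqrt(8/3), about 1.63.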
In [16]:
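# Render Canvas views inside the notebook instead of a separate browser tab
gl.canvas.set_target('ipynb')
# Note: 'release' is a string column (inferred as str above), not a numeric one
song.show(view="Scatter Plot", x="year", y="release")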
Content source: ledrui/Predicting-House-Prices