In [1]:
IMPORT GRAPHLAB


  File "<ipython-input-1-bc0bc74d23c6>", line 1
    IMPORT GRAPHLAB
                  ^
SyntaxError: invalid syntax

In [2]:
import graphlab

In [3]:
sales = graphlab.SFrame('home_data.gl')


[INFO] GraphLab Create v1.8.2 started. Logging: C:\Users\VSAMJI~1.ASU\AppData\Local\Temp\graphlab_server_1456120959.log.0

In [4]:
sales.show(view='BoxWhisker Plot', x='zipcode', y='price')


Canvas is accessible via web browser at the URL: http://localhost:64825/index.html
Opening Canvas in default web browser.

In [6]:
filtered_sales = sales[sales['zipcode'] == '98039']

In [7]:
filtered_sales


Out[7]:
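+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
| id         | date                      | price   | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront |
+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
| 3625049014 | 2014-08-29 00:00:00+00:00 | 2950000 | 4        | 3.5       | 4860        | 23885    | 2      | 0          |
| 2540700110 | 2015-02-12 00:00:00+00:00 | 1905000 | 4        | 3.5       | 4210        | 18564    | 2      | 0          |
| 3262300940 | 2014-11-07 00:00:00+00:00 | 875000  | 3        | 1         | 1220        | 8119     | 1      | 0          |
| 3262300940 | 2015-02-10 00:00:00+00:00 | 940000  | 3        | 1         | 1220        | 8119     | 1      | 0          |
| 6447300265 | 2014-10-14 00:00:00+00:00 | 4000000 | 4        | 5.5       | 7080        | 16573    | 2      | 0          |
| 2470100110 | 2014-08-04 00:00:00+00:00 | 5570000 | 5        | 5.75      | 9200        | 35069    | 2      | 0          |
| 2210500019 | 2015-03-24 00:00:00+00:00 | 937500  | 3        | 1         | 1320        | 8500     | 1      | 0          |
| 6447300345 | 2015-04-06 00:00:00+00:00 | 1160000 | 4        | 3         | 2680        | 15438    | 2      | 0          |
| 6447300225 | 2014-11-06 00:00:00+00:00 | 1880000 | 3        | 2.75      | 2620        | 17919    | 1      | 0          |
| 2525049148 | 2014-10-07 00:00:00+00:00 | 3418800 | 5        | 5         | 5450        | 20412    | 2      | 0          |
+------------+---------------------------+---------+----------+-----------+-------------+----------+--------+------------+
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
| view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat         |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
| 0    | 3         | 12    | 4860       | 0             | 1996     | 0            | 98039   | 47.61717049 |
| 0    | 3         | 11    | 4210       | 0             | 2001     | 0            | 98039   | 47.62060082 |
| 0    | 4         | 7     | 1220       | 0             | 1955     | 0            | 98039   | 47.63281908 |
| 0    | 4         | 7     | 1220       | 0             | 1955     | 0            | 98039   | 47.63281908 |
| 0    | 3         | 12    | 5760       | 1320          | 2008     | 0            | 98039   | 47.61512031 |
| 0    | 3         | 13    | 6200       | 3000          | 2001     | 0            | 98039   | 47.62888314 |
| 0    | 4         | 7     | 1320       | 0             | 1954     | 0            | 98039   | 47.61872888 |
| 2    | 3         | 8     | 2680       | 0             | 1902     | 1956         | 98039   | 47.61089438 |
| 1    | 4         | 9     | 2620       | 0             | 1949     | 0            | 98039   | 47.61435052 |
| 0    | 3         | 11    | 5450       | 0             | 2014     | 0            | 98039   | 47.62087993 |
+------+-----------+-------+------------+---------------+----------+--------------+---------+-------------+
+---------------+---------------+------------+
| long          | sqft_living15 | sqft_lot15 |
+---------------+---------------+------------+
| -122.23040939 | 3580.0        | 16054.0    |
| -122.2245047  | 3520.0        | 18564.0    |
| -122.23554392 | 1910.0        | 8119.0     |
| -122.23554392 | 1910.0        | 8119.0     |
| -122.22420058 | 3140.0        | 15996.0    |
| -122.23346379 | 3560.0        | 24345.0    |
| -122.22643371 | 2790.0        | 10800.0    |
| -122.22582388 | 4480.0        | 14406.0    |
| -122.22772057 | 3400.0        | 14400.0    |
| -122.23726918 | 3160.0        | 17825.0    |
+---------------+---------------+------------+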
[? rows x 21 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.

In [8]:
print filtered_sales['price'].mean()


2160606.6
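The filter-then-average step in cells 6 to 8 can be sketched without GraphLab. The snippet below is a minimal plain-Python illustration, not the SFrame implementation: the helper name `mean_price_for_zipcode` and the three toy rows are hypothetical and are not taken from `home_data.gl`. As in cell 6, the zipcode is compared as a string.

```python
# Hypothetical toy rows for illustration only; not taken from home_data.gl.
rows = [
    {'zipcode': '98039', 'price': 1000000.0},
    {'zipcode': '98004', 'price': 500000.0},
    {'zipcode': '98039', 'price': 3000000.0},
]


def mean_price_for_zipcode(rows, zipcode):
    """Average the price over rows whose zipcode string matches exactly."""
    prices = [row['price'] for row in rows if row['zipcode'] == zipcode]
    if not prices:
        raise ValueError('no rows for zipcode %r' % (zipcode,))
    # float() keeps the division exact under Python 2 as well as Python 3.
    return sum(prices) / float(len(prices))


mean_98039 = mean_price_for_zipcode(rows, '98039')
```

In this sketch, passing the integer `98039` instead of the string `'98039'` would match no rows and raise the `ValueError`.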

In [11]:
filtered_sales = sales[(sales['sqft_living'] > 2000) & (sales['sqft_living'] <= 4000)]

In [12]:
filtered_sales.num_rows()


Out[12]:
9118

In [13]:
sales.num_rows()


Out[13]:
21613
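The two row counts above give the share of houses whose living area falls in the range (2000, 4000] sq. ft. A short arithmetic sketch, using only the values printed in Out[12] and Out[13]; the variable names are illustrative:

```python
rows_in_range = 9118   # Out[12]: houses with 2000 < sqft_living <= 4000
rows_total = 21613     # Out[13]: all houses in the data set

# float() avoids integer division under Python 2.
fraction_in_range = rows_in_range / float(rows_total)  # roughly 0.42
```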

In [14]:
my_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']

In [16]:
advanced_features = [
    'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode',
    'condition',      # condition of house
    'grade',          # measure of quality of construction
    'waterfront',     # waterfront property
    'view',           # type of view
    'sqft_above',     # square feet above ground
    'sqft_basement',  # square feet in basement
    'yr_built',       # the year built
    'yr_renovated',   # the year renovated
    'lat', 'long',    # the lat-long of the parcel
    'sqft_living15',  # average sq.ft. of 15 nearest neighbors
    'sqft_lot15',     # average lot size of 15 nearest neighbors
]

In [18]:
features_train_data, features_test_data = sales.random_split(.8, seed=0)
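A seeded 80/20 split can be illustrated in plain Python. This is only a sketch of the general idea (each row is assigned to the first part with probability 0.8, using a seeded generator so the split is reproducible); it is an assumption for illustration and not a description of how GraphLab implements `random_split`. The function name `sketch_random_split` is hypothetical.

```python
import random


def sketch_random_split(rows, fraction, seed):
    """Assign each row to the first list with probability `fraction`.

    The split sizes are therefore only approximately fraction / (1 - fraction).
    """
    rng = random.Random(seed)  # private generator: same seed, same split
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


train_rows, test_rows = sketch_random_split(list(range(1000)), 0.8, seed=0)
```

Every row lands in exactly one of the two lists, and rerunning with the same seed returns the same split.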

In [21]:
my_features_model = graphlab.linear_regression.create(features_train_data, target='price', features=my_features)
adv_features_model = graphlab.linear_regression.create(features_train_data, target='price', features=advanced_features)


Linear regression:
--------------------------------------------------------
Number of examples          : 16415
Number of features          : 6
Number of unpacked features : 6
Number of coefficients    : 115
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Validation-max_error | Training-rmse | Validation-rmse |
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
| 1         | 2        | 0.013013     | 3758543.779309     | 1759162.306382       | 182294.090578 | 176100.234259   |
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
SUCCESS: Optimal solution found.

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.
Linear regression:
--------------------------------------------------------
Number of examples          : 16464
Number of features          : 18
Number of unpacked features : 18
Number of coefficients    : 127
Starting Newton Method
--------------------------------------------------------
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
| Iteration | Passes   | Elapsed Time | Training-max_error | Validation-max_error | Training-rmse | Validation-rmse |
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
| 1         | 2        | 0.023023     | 3463789.565884     | 1946790.944900       | 153912.032123 | 170848.474443   |
+-----------+----------+--------------+--------------------+----------------------+---------------+-----------------+
SUCCESS: Optimal solution found.



In [22]:
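# Note: the first call scores on the training split and the second on the test
# split, so the two results below are not a like-for-like comparison.
print my_features_model.evaluate(features_train_data)
print adv_features_model.evaluate(features_test_data)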


{'max_error': 3758543.779309143, 'rmse': 181954.388214861}
{'max_error': 3559198.7690726146, 'rmse': 156925.86959612375}
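The two metrics reported above can be written out explicitly. The sketch below uses the standard definitions, RMSE as the square root of the mean squared residual and max error as the largest absolute residual, which are assumed to match what `evaluate` reports. The function names and the three toy price pairs are hypothetical and are not taken from the data set.

```python
import math


def max_error(actual, predicted):
    """Largest absolute difference between an actual and a predicted value."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / float(len(squared)))


# Hypothetical toy prices; residuals are -10000, 30000 and 0.
actual_prices = [300000.0, 450000.0, 500000.0]
predicted_prices = [310000.0, 420000.0, 500000.0]

toy_max_error = max_error(actual_prices, predicted_prices)  # 30000.0
toy_rmse = rmse(actual_prices, predicted_prices)            # about 18257.42
```

Both metrics are in the units of the target (here, price), and comparing two models is most meaningful when both are scored on the same held-out rows.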
