Regression Week 5: Feature Selection and LASSO (Interpretation)

In this notebook, you will use LASSO to select features, building on a pre-implemented solver for LASSO (using GraphLab Create, though you can use other solvers). You will:

  • Run LASSO with different L1 penalties.
  • Choose the best L1 penalty using a validation set.
  • Choose the best L1 penalty using a validation set, with an additional constraint on the size of the subset.

In the second notebook, you will implement your own LASSO solver, using coordinate descent.

Fire up GraphLab Create


In [1]:
import graphlab

Load in house sales data

The dataset contains house sales in King County, the region where the city of Seattle, WA is located.


In [2]:
sales = graphlab.SFrame('kc_house_data.gl/')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1476930985.log

Create new features


In [3]:
sales.head()


Out[3]:
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront
7129300520 2014-10-13 00:00:00+00:00 221900.0 3.0 1.0 1180.0 5650 1 0
6414100192 2014-12-09 00:00:00+00:00 538000.0 3.0 2.25 2570.0 7242 2 0
5631500400 2015-02-25 00:00:00+00:00 180000.0 2.0 1.0 770.0 10000 1 0
2487200875 2014-12-09 00:00:00+00:00 604000.0 4.0 3.0 1960.0 5000 1 0
1954400510 2015-02-18 00:00:00+00:00 510000.0 3.0 2.0 1680.0 8080 1 0
7237550310 2014-05-12 00:00:00+00:00 1225000.0 4.0 4.5 5420.0 101930 1 0
1321400060 2014-06-27 00:00:00+00:00 257500.0 3.0 2.25 1715.0 6819 2 0
2008000270 2015-01-15 00:00:00+00:00 291850.0 3.0 1.5 1060.0 9711 1 0
2414600126 2015-04-15 00:00:00+00:00 229500.0 3.0 1.0 1780.0 7470 1 0
3793500160 2015-03-12 00:00:00+00:00 323000.0 3.0 2.5 1890.0 6560 2 0
view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat
0 3 7 1180 0 1955 0 98178 47.51123398
0 3 7 2170 400 1951 1991 98125 47.72102274
0 3 6 770 0 1933 0 98028 47.73792661
0 5 7 1050 910 1965 0 98136 47.52082
0 3 8 1680 0 1987 0 98074 47.61681228
0 3 11 3890 1530 2001 0 98053 47.65611835
0 3 7 1715 0 1995 0 98003 47.30972002
0 3 7 1060 0 1963 0 98198 47.40949984
0 3 7 1050 730 1960 0 98146 47.51229381
0 3 7 1890 0 2003 0 98038 47.36840673
long sqft_living15 sqft_lot15
-122.25677536 1340.0 5650.0
-122.3188624 1690.0 7639.0
-122.23319601 2720.0 8062.0
-122.39318505 1360.0 5000.0
-122.04490059 1800.0 7503.0
-122.00528655 4760.0 101930.0
-122.32704857 2238.0 6819.0
-122.31457273 1650.0 9711.0
-122.33659507 1780.0 8113.0
-122.0308176 2390.0 7570.0
[10 rows x 21 columns]

As in Week 2, we consider features that are some transformations of inputs.


In [4]:
from math import log, sqrt
sales['sqft_living_sqrt'] = sales['sqft_living'].apply(sqrt)
sales['sqft_lot_sqrt'] = sales['sqft_lot'].apply(sqrt)
sales['bedrooms_square'] = sales['bedrooms']*sales['bedrooms']

# In the dataset, 'floors' was defined with type string, 
# so we'll convert them to float, before creating a new feature.
sales['floors'] = sales['floors'].astype(float) 
sales['floors_square'] = sales['floors']*sales['floors']
  • Squaring bedrooms will increase the separation between not many bedrooms (e.g. 1) and lots of bedrooms (e.g. 4), since 1^2 = 1 but 4^2 = 16. Consequently this variable will mostly affect houses with many bedrooms.
  • On the other hand, taking the square root of sqft_living will decrease the separation between a big house and a small house. The owner may not be exactly twice as happy about getting a house that is twice as big (see the quick numeric check below).
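
A quick numeric check makes both effects concrete: squaring stretches the gap between 1 and 4 bedrooms, while the square root compresses the gap between a 1000 and a 2000 sqft house (doubling the size multiplies the feature by only about 1.41). A minimal sketch:

In [ ]:
# Squaring amplifies large bedroom counts: the gap between 1 and 4 becomes 1 vs 16.
print 1**2, 4**2
# The square root compresses living area: doubling sqft raises the feature by only ~41%.
print sqrt(1000.0), sqrt(2000.0), sqrt(2000.0) / sqrt(1000.0)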

Learn regression weights with L1 penalty

Let us fit a model with all the features available, plus the features we just created above.


In [5]:
all_features = ['bedrooms', 'bedrooms_square',
            'bathrooms',
            'sqft_living', 'sqft_living_sqrt',
            'sqft_lot', 'sqft_lot_sqrt',
            'floors', 'floors_square',
            'waterfront', 'view', 'condition', 'grade',
            'sqft_above',
            'sqft_basement',
            'yr_built', 'yr_renovated']

Applying an L1 penalty requires adding an extra parameter (l1_penalty) to the linear regression call in GraphLab Create. (Other tools may have separate implementations of LASSO.) Note that it's important to set l2_penalty=0 to ensure we don't introduce an additional L2 penalty.


In [6]:
model_all = graphlab.linear_regression.create(sales, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=1e10)


Linear regression:
--------------------------------------------------------
Number of examples          : 21613
Number of features          : 17
Number of unpacked features : 17
Number of coefficients    : 18
Starting Accelerated Gradient (FISTA)
--------------------------------------------------------
+-----------+----------+-----------+--------------+--------------------+---------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-max_error | Training-rmse |
+-----------+----------+-----------+--------------+--------------------+---------------+
Tuning step size. First iteration could take longer than subsequent iterations.
| 1         | 2        | 0.000002  | 1.378619     | 6962915.603493     | 426631.749026 |
| 2         | 3        | 0.000002  | 1.410414     | 6843144.200219     | 392488.929838 |
| 3         | 4        | 0.000002  | 1.451617     | 6831900.032123     | 385340.166783 |
| 4         | 5        | 0.000002  | 1.483224     | 6847166.848958     | 384842.383767 |
| 5         | 6        | 0.000002  | 1.516477     | 6869667.895833     | 385998.458623 |
| 6         | 7        | 0.000002  | 1.554943     | 6847177.773672     | 380824.455891 |
+-----------+----------+-----------+--------------+--------------------+---------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.

Find what features had non-zero weight.


In [9]:
model_all.get('coefficients')[model_all.get('coefficients')['value'] > 0.0]


Out[9]:
name index value stderr
(intercept) None 274873.05595 None
bathrooms None 8468.53108691 None
sqft_living None 24.4207209824 None
sqft_living_sqrt None 350.060553386 None
grade None 842.068034898 None
sqft_above None 20.0247224171 None
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.

Note that a majority of the weights have been set to zero. So by setting an L1 penalty that's large enough, we are performing a subset selection.

QUIZ QUESTION: According to this list of weights, which of the features have been chosen?
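
Since LASSO weights can be negative as well as positive, a safer filter for the selected features compares against zero with != rather than >. A minimal sketch, reusing the coefficients SFrame from the model above:

In [ ]:
# List every feature with a nonzero weight; != 0.0 also catches negative weights.
coefficients = model_all.get('coefficients')
nonzero = coefficients[coefficients['value'] != 0.0]
nonzero.print_rows(num_rows=20)   # show all rows rather than just the head
print "Number of nonzero weights:", nonzero.num_rows()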

Selecting an L1 penalty

To find a good L1 penalty, we will explore multiple values using a validation set. Let us do a three-way split into train, validation, and test sets:

  • Split our sales data into 2 sets: training and test
  • Further split our training data into two sets: train, validation

Be very careful that you use seed = 1 to ensure you get the same answer!


In [10]:
(training_and_validation, testing) = sales.random_split(.9,seed=1) # initial train/test split
(training, validation) = training_and_validation.random_split(0.5, seed=1) # split training into train and validate

Next, we write a loop that does the following:

  • For l1_penalty in [10^1, 10^1.5, 10^2, 10^2.5, ..., 10^7] (to get this in Python, type np.logspace(1, 7, num=13).)
    • Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list.
    • Compute the RSS on VALIDATION data (here you will want to use .predict()) for that l1_penalty
  • Report which l1_penalty produced the lowest RSS on validation data.

When you call linear_regression.create() make sure you set validation_set = None.

Note: you can turn off the print out of linear_regression.create() with verbose = False


In [15]:
validation_rss_avg_list = []
best_l1_penalty = 1
min_rss = float("inf")
import numpy as np
for l1_penalty in np.logspace(1, 7, num=13):
    model = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=l1_penalty, verbose=False)
    
    # find validation error
    prediction = model.predict(validation[all_features])
    error = prediction - validation['price']
    error_squared = error * error
    rss = error_squared.sum()
    print "L1 penalty " + str(l1_penalty) + " validation rss = " + str(rss)
    
    if (rss < min_rss):
        min_rss = rss
        best_l1_penalty = l1_penalty
    validation_rss_avg_list.append(rss)


print "Best L1 penalty " + str(best_l1_penalty) + " validation rss = " + str(min_rss)
validation_rss_avg_list


L1 penalty 10.0 validation rss = 6.25766285142e+14
L1 penalty 31.6227766017 validation rss = 6.25766285362e+14
L1 penalty 100.0 validation rss = 6.25766286058e+14
L1 penalty 316.227766017 validation rss = 6.25766288257e+14
L1 penalty 1000.0 validation rss = 6.25766295212e+14
L1 penalty 3162.27766017 validation rss = 6.25766317206e+14
L1 penalty 10000.0 validation rss = 6.25766386761e+14
L1 penalty 31622.7766017 validation rss = 6.25766606749e+14
L1 penalty 100000.0 validation rss = 6.25767302792e+14
L1 penalty 316227.766017 validation rss = 6.25769507644e+14
L1 penalty 1000000.0 validation rss = 6.25776517727e+14
L1 penalty 3162277.66017 validation rss = 6.25799062845e+14
L1 penalty 10000000.0 validation rss = 6.25883719085e+14
Best L1 penalty 10.0 validation rss = 6.25766285142e+14
Out[15]:
[625766285142460.5,
 625766285362394.4,
 625766286057885.1,
 625766288257224.4,
 625766295212186.1,
 625766317206080.8,
 625766386760658.0,
 625766606749278.5,
 625767302791634.9,
 625769507643885.8,
 625776517727024.5,
 625799062845466.6,
 625883719085425.0]

In [16]:
np.logspace(1, 7, num=13)


Out[16]:
array([  1.00000000e+01,   3.16227766e+01,   1.00000000e+02,
         3.16227766e+02,   1.00000000e+03,   3.16227766e+03,
         1.00000000e+04,   3.16227766e+04,   1.00000000e+05,
         3.16227766e+05,   1.00000000e+06,   3.16227766e+06,
         1.00000000e+07])

QUIZ QUESTIONS

  1. What was the best value for the l1_penalty?
  2. What is the RSS on TEST data of the model with the best l1_penalty?

In [17]:
best_l1_penalty


Out[17]:
10.0

In [18]:
model_best = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=best_l1_penalty, verbose=False)
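
To answer the second quiz question, we can evaluate model_best on the held-out testing split in the same way the validation RSS was computed above; a minimal sketch:

In [ ]:
# RSS of the best model on the TEST set.
test_prediction = model_best.predict(testing[all_features])
test_error = test_prediction - testing['price']
test_rss = (test_error * test_error).sum()
print "Test RSS = " + str(test_rss)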

QUIZ QUESTION: Also, using this value of L1 penalty, how many nonzero weights do you have?


In [20]:
len(model_best.get('coefficients')[model_best.get('coefficients')['value'] > 0.0])


Out[20]:
18
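
Equivalently, the .nnz() method on the coefficient values counts the nonzero weights directly (the intercept is included in this count); this is the same trick the hint further below relies on:

In [ ]:
# Count nonzero coefficients (intercept included) without filtering the SFrame.
print model_best['coefficients']['value'].nnz()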

Limit the number of nonzero weights

What if we absolutely wanted to limit ourselves to, say, 7 features? This may be important if we want to derive "a rule of thumb" --- an interpretable model that has only a few features in it.

In this section, you are going to implement a simple, two-phase procedure to achieve this goal:

  1. Explore a large range of l1_penalty values to find a narrow region of l1_penalty values where models are likely to have the desired number of non-zero weights.
  2. Further explore the narrow region you found to find a good value for l1_penalty that achieves the desired sparsity. Here, we will again use a validation set to choose the best value for l1_penalty.

In [21]:
max_nonzeros = 7

Exploring the larger range of values to find a narrow range with the desired sparsity

Let's define a wide range of possible l1_penalty_values:


In [22]:
l1_penalty_values = np.logspace(8, 10, num=20)

Now, implement a loop that searches through this space of possible l1_penalty values:

  • For l1_penalty in np.logspace(8, 10, num=20):
    • Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list. When you call linear_regression.create() make sure you set validation_set = None
    • Extract the weights of the model and count the number of nonzeros. Save the number of nonzeros to a list.
      • Hint: model['coefficients']['value'] gives you an SArray with the parameters you learned. If you call the method .nnz() on it, you will find the number of non-zero parameters!

In [23]:
nnz_list = []
for l1_penalty in np.logspace(8, 10, num=20):
    model = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=l1_penalty, verbose=False)
    
    # extract number of nnz
    nnz = model['coefficients']['value'].nnz()
    
    print "L1 penalty " + str(l1_penalty) + " : # nnz = " + str(nnz)

    nnz_list.append(nnz)


nnz_list


L1 penalty 100000000.0 : # nnz = 18
L1 penalty 127427498.57 : # nnz = 18
L1 penalty 162377673.919 : # nnz = 18
L1 penalty 206913808.111 : # nnz = 18
L1 penalty 263665089.873 : # nnz = 17
L1 penalty 335981828.628 : # nnz = 17
L1 penalty 428133239.872 : # nnz = 17
L1 penalty 545559478.117 : # nnz = 17
L1 penalty 695192796.178 : # nnz = 17
L1 penalty 885866790.41 : # nnz = 16
L1 penalty 1128837891.68 : # nnz = 15
L1 penalty 1438449888.29 : # nnz = 15
L1 penalty 1832980710.83 : # nnz = 13
L1 penalty 2335721469.09 : # nnz = 12
L1 penalty 2976351441.63 : # nnz = 10
L1 penalty 3792690190.73 : # nnz = 6
L1 penalty 4832930238.57 : # nnz = 5
L1 penalty 6158482110.66 : # nnz = 3
L1 penalty 7847599703.51 : # nnz = 1
L1 penalty 10000000000.0 : # nnz = 1
Out[23]:
[18, 18, 18, 18, 17, 17, 17, 17, 17, 16, 15, 15, 13, 12, 10, 6, 5, 3, 1, 1]

Out of this large range, we want to find the two ends of our desired narrow range of l1_penalty: at the small end of the range, the l1_penalty values produce too many non-zeros, and at the large end, too few.

More formally, find:

  • The largest l1_penalty that has more non-zeros than max_nonzeros (if we pick a penalty smaller than this value, we will definitely have too many non-zero weights)
    • Store this value in the variable l1_penalty_min (we will use it later)
  • The smallest l1_penalty that has fewer non-zeros than max_nonzeros (if we pick a penalty larger than this value, we will definitely have too few non-zero weights)
    • Store this value in the variable l1_penalty_max (we will use it later)

Hint: there are many ways to do this, e.g.:

  • Programmatically within the loop above
  • Creating a list with the number of non-zeros for each value of l1_penalty and inspecting it to find the appropriate boundaries (see the sketch after the next cell).

In [25]:
l1_penalty_min = 2976351441.63
l1_penalty_max = 3792690190.73
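
The same boundaries can also be derived programmatically from the nnz counts collected above; a minimal sketch, using the fact that nnz_list is aligned with the penalties from np.logspace(8, 10, num=20):

In [ ]:
# Largest penalty that still yields MORE than max_nonzeros nonzero weights,
# and smallest penalty that yields FEWER than max_nonzeros nonzero weights.
penalties = np.logspace(8, 10, num=20)
l1_penalty_min = max(p for p, nnz in zip(penalties, nnz_list) if nnz > max_nonzeros)
l1_penalty_max = min(p for p, nnz in zip(penalties, nnz_list) if nnz < max_nonzeros)
print l1_penalty_min, l1_penalty_max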

QUIZ QUESTIONS

What values did you find for l1_penalty_min and l1_penalty_max?

Exploring the narrow range of values to find the solution with the right number of non-zeros that has the lowest RSS on the validation set

We will now explore the narrow region of l1_penalty values we found:


In [26]:
l1_penalty_values = np.linspace(l1_penalty_min,l1_penalty_max,20)

  • For l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):
    • Fit a regression model with a given l1_penalty on TRAIN data. Specify l1_penalty=l1_penalty and l2_penalty=0. in the parameter list. When you call linear_regression.create() make sure you set validation_set = None
    • Measure the RSS of the learned model on the VALIDATION set

Find the model that has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros.


In [29]:
nnz_list = []
validation_rss_avg_list = []
best_l1_penalty = 1
min_rss = float("inf")
import numpy as np
for l1_penalty in np.linspace(l1_penalty_min,l1_penalty_max,20):
    model = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=l1_penalty, verbose=False)
    
    # find validation error
    prediction = model.predict(validation[all_features])
    error = prediction - validation['price']
    error_squared = error * error
    rss = error_squared.sum()
    print "L1 penalty " + str(l1_penalty) + " validation rss = " + str(rss)
    
    # extract number of nnz
    nnz = model['coefficients']['value'].nnz()
    
    print "L1 penalty " + str(l1_penalty) + " : # nnz = " + str(nnz)

    nnz_list.append(nnz)
    
    print "----------------------------------------------------------"
    
    if (nnz == max_nonzeros and rss < min_rss):
        min_rss = rss
        best_l1_penalty = l1_penalty
    validation_rss_avg_list.append(rss)

print "Best L1 penalty " + str(best_l1_penalty) + " validation rss = " + str(min_rss)


L1 penalty 2976351441.63 validation rss = 9.66925692362e+14
L1 penalty 2976351441.63 : # nnz = 10
----------------------------------------------------------
L1 penalty 3019316638.95 validation rss = 9.74019450085e+14
L1 penalty 3019316638.95 : # nnz = 10
----------------------------------------------------------
L1 penalty 3062281836.27 validation rss = 9.81188367942e+14
L1 penalty 3062281836.27 : # nnz = 10
----------------------------------------------------------
L1 penalty 3105247033.59 validation rss = 9.89328342459e+14
L1 penalty 3105247033.59 : # nnz = 10
----------------------------------------------------------
L1 penalty 3148212230.91 validation rss = 9.98783211266e+14
L1 penalty 3148212230.91 : # nnz = 10
----------------------------------------------------------
L1 penalty 3191177428.24 validation rss = 1.00847716702e+15
L1 penalty 3191177428.24 : # nnz = 10
----------------------------------------------------------
L1 penalty 3234142625.56 validation rss = 1.01829878055e+15
L1 penalty 3234142625.56 : # nnz = 10
----------------------------------------------------------
L1 penalty 3277107822.88 validation rss = 1.02824799221e+15
L1 penalty 3277107822.88 : # nnz = 10
----------------------------------------------------------
L1 penalty 3320073020.2 validation rss = 1.03461690923e+15
L1 penalty 3320073020.2 : # nnz = 8
----------------------------------------------------------
L1 penalty 3363038217.52 validation rss = 1.03855473594e+15
L1 penalty 3363038217.52 : # nnz = 8
----------------------------------------------------------
L1 penalty 3406003414.84 validation rss = 1.04323723787e+15
L1 penalty 3406003414.84 : # nnz = 8
----------------------------------------------------------
L1 penalty 3448968612.16 validation rss = 1.04693748875e+15
L1 penalty 3448968612.16 : # nnz = 7
----------------------------------------------------------
L1 penalty 3491933809.48 validation rss = 1.05114762561e+15
L1 penalty 3491933809.48 : # nnz = 7
----------------------------------------------------------
L1 penalty 3534899006.8 validation rss = 1.05599273534e+15
L1 penalty 3534899006.8 : # nnz = 7
----------------------------------------------------------
L1 penalty 3577864204.12 validation rss = 1.06079953176e+15
L1 penalty 3577864204.12 : # nnz = 7
----------------------------------------------------------
L1 penalty 3620829401.45 validation rss = 1.0657076895e+15
L1 penalty 3620829401.45 : # nnz = 6
----------------------------------------------------------
L1 penalty 3663794598.77 validation rss = 1.06946433543e+15
L1 penalty 3663794598.77 : # nnz = 6
----------------------------------------------------------
L1 penalty 3706759796.09 validation rss = 1.07350454959e+15
L1 penalty 3706759796.09 : # nnz = 6
----------------------------------------------------------
L1 penalty 3749724993.41 validation rss = 1.07763277558e+15
L1 penalty 3749724993.41 : # nnz = 6
----------------------------------------------------------
L1 penalty 3792690190.73 validation rss = 1.08186759232e+15
L1 penalty 3792690190.73 : # nnz = 6
----------------------------------------------------------
Best L1 penalty 3448968612.16 validation rss = 1.04693748875e+15

QUIZ QUESTIONS

  1. What value of l1_penalty in our narrow range has the lowest RSS on the VALIDATION set and has sparsity equal to max_nonzeros?
  2. What features in this model have non-zero coefficients?

In [31]:
model_best = graphlab.linear_regression.create(training, target='price', features=all_features,
                                              validation_set=None, 
                                              l2_penalty=0., l1_penalty=best_l1_penalty, verbose=False)
model_best.get('coefficients')[model_best.get('coefficients')['value'] > 0.0]


Out[31]:
name index value stderr
(intercept) None 222253.192544 None
bedrooms None 661.722717782 None
bathrooms None 15873.9572593 None
sqft_living None 32.4102214513 None
sqft_living_sqrt None 690.114773313 None
grade None 2899.42026975 None
sqft_above None 30.0115753022 None
[? rows x 4 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force materialization.
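
As a final sanity check, the selected model should have exactly max_nonzeros nonzero weights, with the intercept counting toward that total; a minimal sketch:

In [ ]:
# Verify the sparsity of the final model: expect max_nonzeros (here, 7).
print model_best['coefficients']['value'].nnz()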

In [ ]: