Every algorithm is exposed via an `Estimator`, which can be imported as

```
from sklearn.<family> import <model>
```

For example, for linear regression:

```
from sklearn.linear_model import LinearRegression
lm_model = LinearRegression(<estimator parameters>)
```

**Estimator parameters** are provided as arguments when you instantiate an Estimator. Scikit-learn provides sensible defaults for all of them.
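
As a quick sketch (the hyperparameters shown, `fit_intercept` and `n_jobs`, are real `LinearRegression` arguments; the values are only examples):

```
from sklearn.linear_model import LinearRegression

lm_default = LinearRegression()                   # rely on the defaults
lm_custom = LinearRegression(fit_intercept=True,  # fit an intercept term (also the default)
                             n_jobs=-1)           # use all available CPU cores
```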

In Scikit-learn, Estimators are designed around a few principles:

- **consistency**: all estimators share a common interface
- **inspection**: the hyperparameters you set when you instantiate an estimator are available for inspection as properties of that object
- **limited hierarchy**: only the algorithms are represented as Python objects; training data, results and parameter names follow standard Python or NumPy / Pandas types
- **composition**: many workflows can be achieved as a series of more fundamental algorithms
- **sensible defaults**: you guessed it
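
To illustrate the **inspection** principle, every estimator exposes a `get_params()` method, and each hyperparameter is also a plain attribute of the object:

```
from sklearn.linear_model import LinearRegression

lm_model = LinearRegression()
print(lm_model.get_params())   # dict of all hyperparameters and their current values
print(lm_model.fit_intercept)  # each hyperparameter is also a plain attribute
```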

The typical supervised workflow is as follows (an end-to-end sketch follows the list):

- choose a class of model
- instantiate a model from the class by specifying hyperparameters to its constructor
- arrange data into `X` and `y`, and split them for training and testing
- fit / learn the model on the training data by calling the `fit()` method
- predict new values by calling the `predict()` method
- evaluate results
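
Put together, the steps above look like this; the data here is synthetic, generated purely for illustration:

```
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# synthetic data: y is roughly 3x plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x.ravel() + rng.normal(scale=0.5, size=100)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

lm_model = LinearRegression()           # choose and instantiate the model
lm_model.fit(x_train, y_train)          # learn from the training data
y_predicted = lm_model.predict(x_test)  # predict new values
print(lm_model.score(x_test, y_test))   # evaluate (R^2 on the test set)
```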

To split the input data into train and test sets, use

```
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
```

to split it into a 70% training set and a 30% test set. This method splits the dependent and independent attributes together, so matching rows stay paired and the predictions can be validated against the held-out targets.
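
By default the split is shuffled randomly on every call; if you need a reproducible split, pass the optional `random_state` argument:

```
# the same seed always yields the same partition
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
                                                    random_state=42)
```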

The general syntax is `model.fit(independent_train, dependent_train)`. Thus

```
lm_model.fit(x_train, y_train)
```

In the case of **unsupervised models** there is no target `y` to learn from, so `fit()` takes only the data. Hence

```
model.fit(x_train)
```
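
For instance, with a clustering model such as `KMeans` (chosen here purely as an illustration):

```
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3)  # hyperparameter: number of clusters
kmeans.fit(x_train)            # no y argument: there are no labels
print(kmeans.labels_)          # learned cluster assignment for each sample
```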

In Scikit-Learn, by convention, all model parameters that were learned during the `fit()` process have trailing underscores; for example, in this linear model we have `model.coef_` and `model.intercept_`.
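
For example, after fitting the linear model above:

```
lm_model.fit(x_train, y_train)
print(lm_model.coef_)       # learned coefficient(s), one per feature
print(lm_model.intercept_)  # learned intercept
```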

A `model.score()` method returns a score indicating how well the model fits the given data; for `LinearRegression` this is the R² coefficient of determination, where 1.0 is a perfect fit. Note this is useful to understand the influence of **underfitting** and **overfitting** of training data.
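
One quick sketch: compare the score on the training data with the score on held-out test data; a training score far above the test score suggests overfitting:

```
print(lm_model.score(x_train, y_train))  # fit quality on the training data
print(lm_model.score(x_test, y_test))    # fit quality on unseen data
```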

Use `model.predict(<independent_test_data>)`. Thus, for linear regression:

```
y_predicted = lm_model.predict(x_test)
```

In the case of **classification** problems, you also get a `model.predict_proba()` method, which returns the probabilities for each class; `model.predict()` returns the class with the highest probability.
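
A minimal sketch using `LogisticRegression` on a toy dataset (both chosen purely for illustration):

```
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression().fit(X, y)

print(clf.predict_proba(X[:3]))  # probability of each class, per sample
print(clf.predict(X[:3]))        # the class with the highest probability
```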

Relevant in **unsupervised** models, `model.transform()` is used to transform input data into a new basis. Some models combine the fitting and transformation in one step using the `model.fit_transform()` method.
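
A minimal sketch with `PCA` on toy data (the data and the number of components are only examples):

```
import numpy as np
from sklearn.decomposition import PCA

x_toy = np.random.default_rng(0).normal(size=(100, 3))  # toy data, 3 features
pca = PCA(n_components=2)
x_reduced = pca.fit_transform(x_toy)  # fit the new basis, then project onto it
# equivalent to: pca.fit(x_toy); x_reduced = pca.transform(x_toy)
print(x_reduced.shape)  # (100, 2)
```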

You can obtain the **MAE** (Mean Absolute Error), **MSE** (Mean Squared Error) and **RMSE** (Root Mean Squared Error) from the `metrics` module.

```
from sklearn import metrics
import numpy as np

metrics.mean_absolute_error(y_test, y_predicted)          # MAE
metrics.mean_squared_error(y_test, y_predicted)           # MSE
np.sqrt(metrics.mean_squared_error(y_test, y_predicted))  # RMSE
```
