Title: Make Simulated Data For Regression
Slug: make_simulated_data_for_regression
Summary: Make a simulated dataset for regression using scikit-learn.
Date: 2017-01-16 12:00
Category: Machine Learning
Tags: Basics
Authors: Chris Albon
In [1]:
import pandas as pd
from sklearn.datasets import make_regression
In [2]:
# Generate features, outputs, and true coefficients for 100 samples,
features, output, coef = make_regression(n_samples = 100,
                                         # three features
                                         n_features = 3,
                                         # where only two features are useful,
                                         n_informative = 2,
                                         # a single target value per observation
                                         n_targets = 1,
                                         # 0.0 standard deviation of the Gaussian noise
                                         noise = 0.0,
                                         # return the true coefficients used to generate the data
                                         coef = True)
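The call above draws new random values on every run, so the tables will differ each time. A minimal sketch of one way to make the data repeatable, assuming that is what you want: pass `random_state`, which is a real `make_regression` parameter. The seed value `1` and the `_a` / `_b` variable names are arbitrary choices for illustration, not part of the original recipe.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same settings as the recipe, plus a fixed seed (the value 1 is arbitrary)
settings = dict(n_samples=100, n_features=3, n_informative=2,
                n_targets=1, noise=0.0, coef=True, random_state=1)

# Two independent calls with the same seed
X_a, y_a, coef_a = make_regression(**settings)
X_b, y_b, coef_b = make_regression(**settings)

# The two calls should produce identical arrays
print(np.array_equal(X_a, X_b), np.array_equal(y_a, y_b))

# Shapes: (samples, features), (samples,), (features,)
print(X_a.shape, y_a.shape, coef_a.shape)
```

Leaving `random_state` unset, as in the recipe, is fine when you only need throwaway data.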
In [3]:
# View the first five rows of the features
pd.DataFrame(features, columns=['Store 1', 'Store 2', 'Store 3']).head()
Out[3]:
In [4]:
# View the first five rows of the output
pd.DataFrame(output, columns=['Sales']).head()
Out[4]:
In [5]:
# View the actual, true coefficients used to generate the data
pd.DataFrame(coef, columns=['True Coefficient Values'])
Out[5]:
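Because `noise` is 0.0 and the default `bias` is 0.0, the output should be an exact linear combination of the features weighted by the true coefficients. The sketch below checks that relationship and counts the non-zero coefficients, which should match `n_informative`. The `random_state=0` seed and the `reconstructed` variable are assumptions added for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same settings as the recipe; the seed is an arbitrary addition
features, output, coef = make_regression(n_samples=100,
                                         n_features=3,
                                         n_informative=2,
                                         n_targets=1,
                                         noise=0.0,
                                         coef=True,
                                         random_state=0)

# With no noise and no bias, output should equal features times coef
reconstructed = features @ coef
print(np.allclose(reconstructed, output))

# Only the informative features should carry a non-zero coefficient
print(np.count_nonzero(coef))
```

With a positive `noise` value the first check would generally fail, since Gaussian noise is added to the output after the linear combination is computed.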