In [1]:
!pip install .
In [2]:
import fastfsr
In [3]:
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
In the original paper, the authors used the R package leaps for best subset selection. Python has no corresponding package or function, so we implemented our own regression subset selection.
The function takes the predictors X and the response Y and returns the result of best subset selection.
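The idea behind best subset selection can be shown with a minimal brute-force sketch. For each model size k, it fits ordinary least squares with an intercept on every k-column subset and keeps the subset with the smallest residual sum of squares (RSS). The helper `best_subset_rss` is hypothetical and is not the `fastfsr.reg_subset` implementation or its return format. It also checks all 2^p - 1 subsets. R's leaps uses a more efficient branch-and-bound search, so this version is only practical when p is small.

```python
from itertools import combinations

import numpy as np


def best_subset_rss(X, y):
    """Hypothetical exhaustive best subset selection (not fastfsr's API).

    For each model size k = 1..p, fit OLS with an intercept on every
    k-column subset of X and keep the subset with the smallest RSS.
    Returns a dict mapping k -> (tuple of column indices, RSS).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        best_cols, best_rss = None, np.inf
        for cols in combinations(range(p), k):
            # Design matrix: an intercept column plus the chosen predictors
            design = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            rss = float(resid @ resid)
            if rss < best_rss:
                best_cols, best_rss = cols, rss
        best[k] = (best_cols, best_rss)
    return best


# Usage on synthetic data where only columns 0 and 2 matter
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = 2 * X_demo[:, 0] - 3 * X_demo[:, 2] + 0.01 * rng.normal(size=50)
best = best_subset_rss(X_demo, y_demo)
for k, (cols, rss) in best.items():
    print(k, cols, round(rss, 4))
```

The best RSS can only stay the same or decrease as k grows. For that reason, a criterion that penalizes model size is still needed to choose among the per-size winners.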
In [4]:
ncaa = pd.read_csv("http://www4.stat.ncsu.edu/~boos/var.select/ncaa.data2.txt",
                   sep=r"\s+")  # whitespace-delimited file
# All columns except the last are predictors; the last column is the response
x = ncaa.iloc[:, :-1]
y = ncaa.iloc[:, -1]
x.head()
Out[4]:
In [5]:
fastfsr.reg_subset(x, y)
Out[5]:
In [ ]: