Fri can be installed via the Python Package Index (PyPI).
If you have pip installed, just execute the command
pip install fri
to get the newest stable version.
The dependencies should be installed and checked automatically. If you have problems installing, please open an issue at our tracker.
To install a bleeding-edge development version of FRI, you can clone the GitHub repository using
git clone git@github.com:lpfann/fri.git
and then check out the dev branch: git checkout dev.
We use poetry for dependency management.
Run
poetry install
in the cloned repository to install fri in a virtualenv.
To check if everything works as intended, you can use pytest to run the unit tests.
Just run the command
poetry run pytest
in the main project folder.
In [1]:
import numpy as np
# fixed Seed for demonstration
STATE = np.random.RandomState(123)
from fri import genClassificationData
We want to create a small data set with a few features.
Because we want to showcase all-relevant feature selection, we generate both strongly and weakly relevant features.
In [2]:
n = 300
features = 6
strongly_relevant = 2
weakly_relevant = 2
In [3]:
X, y = genClassificationData(n_samples=n,
                             n_features=features,
                             n_strel=strongly_relevant,
                             n_redundant=weakly_relevant,
                             random_state=STATE)
The method also prints out the parameters again.
In [4]:
X.shape
Out[4]:
(300, 6)
We created a binary classification set with 6 features, of which 2 are strongly relevant and 2 are weakly relevant.
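As a quick sanity check, you can inspect the label distribution with plain NumPy (np is already imported above):
# count how many samples fall into each class
labels, counts = np.unique(y, return_counts=True)
print(labels, counts)
Before fitting, we standardize the features to zero mean and unit variance.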
In [5]:
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
In [6]:
import fri
fri provides a convenience class fri.FRI to create a model.
fri.FRI needs the type of problem as its first argument, of type ProblemName.
Depending on the problem you want to analyze, pick one of the available models from ProblemName.
In [7]:
list(fri.ProblemName)
Out[7]:
Because we have classification data, we use ProblemName.CLASSIFICATION to instantiate our model.
In [10]:
fri_model = fri.FRI(fri.ProblemName.CLASSIFICATION,
                    loss_slack=0.2,
                    w_l1_slack=0.2,
                    random_state=STATE)
In [11]:
fri_model
Out[11]:
Apart from the two slack parameters set above, all other parameters keep their default values.
In [12]:
fri_model.fit(X_scaled,y)
Out[12]:
The resulting feature relevance bounds are saved in the interval_ variable.
In [13]:
fri_model.interval_
Out[13]:
If you want to print out the intervals together with their relevance classes, use the print_interval_with_class() function.
In [14]:
print(fri_model.print_interval_with_class())
The bounds are grouped into one sublist of [lower bound, upper bound] per feature.
To access the relevance bounds for feature 2 we would use
In [15]:
fri_model.interval_[2]
Out[15]:
The relevance classes are saved in the corresponding variable relevance_classes_:
In [16]:
fri_model.relevance_classes_
Out[16]:
2 denotes strongly relevant features, 1 weakly relevant and 0 irrelevant.
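If you prefer explicit lists of feature indices per class, a small sketch like the following works, assuming relevance_classes_ is array-like:
# group feature indices by their relevance class (2 = strong, 1 = weak, 0 = irrelevant)
classes = np.asarray(fri_model.relevance_classes_)
for cls, name in [(2, "strongly relevant"), (1, "weakly relevant"), (0, "irrelevant")]:
    print(name, np.where(classes == cls)[0])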
In [17]:
# Import plot function
from fri.plot import plot_relevance_bars
import matplotlib.pyplot as plt
%matplotlib inline
# Create new figure, where we can put an axis on
fig, ax = plt.subplots(1, 1,figsize=(6,3))
# plot the bars on the axis, colored according to fri
out = plot_relevance_bars(ax,fri_model.interval_,classes=fri_model.relevance_classes_)
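Besides the unconstrained bounds, FRI can compute relevance bounds when some features are fixed to user-given relevance values. The constraints are collected in a dictionary mapping the feature index to the desired value; here we fix the feature with index 2 to its lower bound.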
In [18]:
preset = {}
In [19]:
preset[2] = fri_model.interval_[2, 0]
We use the function constrained_intervals.
Note: we need to fit the model before we can use this function. We already did that, so we are fine.
In [20]:
const_ints = fri_model.constrained_intervals(preset=preset)
In [21]:
const_ints
Out[21]:
Feature 3 (index 2) is set to its minimum (at 0).
How does it look visually?
In [22]:
fig, ax = plt.subplots(1, 1,figsize=(6,3))
out = plot_relevance_bars(ax, const_ints)
Feature 3 is reduced to its minimum (no contribution).
In turn, its correlated partner, feature 4 (index 3), had to take its maximum contribution.
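If we want to see the internal parameters of the baseline model that FRI fits, we can recreate the model with verbose=True.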
In [23]:
fri_model = fri.FRI(fri.ProblemName.CLASSIFICATION, verbose=True, random_state=STATE)
In [24]:
fri_model.fit(X_scaled,y)
Out[24]:
This prints out the parameters of the baseline model.
One can also see the best hyperparameters selected by grid search and the training score of the model (score).
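Computing the relevance bounds for all features can also be parallelized by setting n_jobs; a value of -1 uses all available CPU cores, following the usual scikit-learn convention.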
In [25]:
fri_model = fri.FRI(fri.ProblemName.CLASSIFICATION,
                    n_jobs=-1,
                    verbose=1,
                    random_state=STATE)
In [26]:
fri_model.fit(X_scaled,y)
Out[26]: