Prove that the KL divergence $KL(p,q)$ for any two distributions $p, q$ is non-negative.
Hint: Show that $$\min_{q} KL(p,q) \geq 0$$
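One way to make the hint concrete (a sketch only, written for discrete distributions): apply Jensen's inequality to the concave function $\log$,
$$-KL(p,q) = \sum_x p(x) \log \frac{q(x)}{p(x)} \leq \log \sum_x p(x)\,\frac{q(x)}{p(x)} = \log \sum_x q(x) = \log 1 = 0,$$
so $KL(p,q) \geq 0$ for every $q$, hence $\min_q KL(p,q) \geq 0$, with the minimum attained at $q = p$.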
Prove that two random variables $X$ and $Y$ are independent if and only if $I(X,Y) = 0$, where $I$ denotes the mutual information.
Hint: In the last problem, we only almost proved $$KL(p,q) = 0 \iff p = q,$$ but we swear it's true.
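It may also help to notice that mutual information is itself a KL divergence between the joint distribution and the product of the marginals:
$$I(X,Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = KL\big(p(x,y),\, p(x)\,p(y)\big),$$
so $I(X,Y) = 0$ holds exactly when $p(x,y) = p(x)\,p(y)$ for all $x, y$, which is the definition of independence.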
Prove that $$I(X,Y) = H(X)$$ if $Y$ is a deterministic, one-to-one function of $X$.
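A standard identity that may be useful here (stated without proof):
$$I(X,Y) = H(X) - H(X \mid Y),$$
so the problem reduces to showing $H(X \mid Y) = 0$ when $Y$ is a deterministic, one-to-one function of $X$ (knowing $Y$ pins down $X$ exactly).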
Let's explore cases where decision trees perform better than linear models, and cases where they perform worse.
Note: we're using the function 'plot_decision_regions' from the mlxtend library; you can install it with pip or conda, or just copy and paste the function into a cell from here.
$ conda install -c rasbt mlxtend
In [1]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
import numpy as np
import matplotlib.pyplot as plt
# plot_decision_regions lives in mlxtend.plotting in recent releases (older ones had it in mlxtend.evaluate)
from mlxtend.plotting import plot_decision_regions
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
In [2]:
X = np.array([
    [-1, -1],
    [-1,  1],
    [ 1, -1],
    [ 1,  1]
])
# Truth tables over the four inputs above (-1 = False, 1 = True):
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])
In [3]:
lr = LogisticRegression(C=100000)
for label, y in [('and', y_and), ('xor', y_xor)]:
    lr.fit(X, y)
    plot_decision_regions(X, y, lr)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('linear model fitting "{}"'.format(label))
    plt.show()
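As an aside (not part of the assignment), the usual fix for XOR with a linear model is feature engineering: adding the interaction term $x_1 x_2$ makes the classes linearly separable. A minimal sketch, reusing X and y_xor from above:
In [ ]:
# Sketch: append the product feature x1*x2 as a third column, then refit the
# same high-C logistic regression; XOR becomes separable in this 3-d space.
X_cross = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
lr_cross = LogisticRegression(C=100000)
lr_cross.fit(X_cross, y_xor)
print(lr_cross.score(X_cross, y_xor))  # expected to be 1.0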
In [4]:
tree = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)
for label, y in [('and', y_and), ('xor', y_xor)]:
    tree.fit(X, y)
    plot_decision_regions(X, y, tree)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('decision tree model fitting "{}"'.format(label))
    plt.show()
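If you're curious which splits the tree actually learned (optional; `tree` was last fit on the XOR labels in the loop above), scikit-learn can print the fitted rules:
In [ ]:
from sklearn.tree import export_text
# Show the split thresholds and leaf predictions of the fitted tree.
print(export_text(tree, feature_names=['x1', 'x2']))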
In [5]:
from sklearn.metrics import accuracy_score

X_linear_wins = np.array([
    # place 2d samples here, each value between -1 and 1
])
y_linear_wins = np.array([
    # place class label 0 or 1 for each 2d point here
])

# Uncomment the code below to test whether your dataset is more accurately predicted
# by a linear model than by the depth-2 tree defined above.
# for label, model in [('linear', lr), ('tree', tree)]:
#     model.fit(X_linear_wins, y_linear_wins)
#     plot_decision_regions(X_linear_wins, y_linear_wins, model)
#     plt.xlabel('x')
#     plt.ylabel('y')
#     title = "{}: accuracy {:.2f}".format(label, accuracy_score(y_linear_wins, model.predict(X_linear_wins)))
#     plt.title(title)
#     plt.show()
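If you want a feel for the kind of dataset that favors the linear model (this is only one possible construction, not the intended answer), try points labeled by a diagonal boundary: a line through the origin is something logistic regression represents exactly, while a shallow tree can only staircase it with axis-aligned splits. For example:
In [ ]:
# Sketch of one possible X_linear_wins / y_linear_wins: random points in [-1, 1]^2
# labeled by which side of the diagonal x1 + x2 = 0 they fall on.
rng = np.random.RandomState(0)
X_diag = rng.uniform(-1, 1, size=(200, 2))
y_diag = (X_diag[:, 0] + X_diag[:, 1] > 0).astype(int)
Plugging a dataset like this into the commented comparison above, the linear model should fit it nearly perfectly while the depth-2 tree lags behind.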
In [6]:
X_needs_depth = np.array([
    # place 2d samples here, each value between -1 and 1
])
y_needs_depth = np.array([
    # place class label 0 or 1 for each 2d point here
])

# Uncomment the code below to compare a depth-2 tree against a depth-4 tree.
# tree_d2 = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)
# tree_d4 = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
# tree_d2.fit(X_needs_depth, y_needs_depth)
# plot_decision_regions(X_needs_depth, y_needs_depth, tree_d2)
# plt.title("depth 2 fit: {:.2f}".format(accuracy_score(y_needs_depth, tree_d2.predict(X_needs_depth))))
# plt.show()
# tree_d4.fit(X_needs_depth, y_needs_depth)
# plot_decision_regions(X_needs_depth, y_needs_depth, tree_d4)
# plt.title("depth 4 fit: {:.2f}".format(accuracy_score(y_needs_depth, tree_d4.predict(X_needs_depth))))
# plt.show()
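Again only for intuition (not the intended answer): one pattern a depth-2 tree cannot capture but a depth-4 tree can is a class forming a centered square, since isolating it requires four axis-aligned constraints on the same leaf. A sketch:
In [ ]:
# Sketch of one possible X_needs_depth / y_needs_depth: class 1 is the square
# |x1| < 0.5 and |x2| < 0.5, which needs roughly four splits to carve out exactly.
rng = np.random.RandomState(1)
X_sq = rng.uniform(-1, 1, size=(400, 2))
y_sq = ((np.abs(X_sq[:, 0]) < 0.5) & (np.abs(X_sq[:, 1]) < 0.5)).astype(int)
With data like this, the depth-4 tree should fit the square almost exactly, while the depth-2 tree is forced to give up on one or more of its sides.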