Single decision trees tend to overfit, leading to poor predictive performance. Tree ensembles (random forests, gradient boosting) perform well, but are black-box models. In this notebook, we investigate whether smoothing the predictions of a decision tree can produce a traceable, white-box model with improved predictive accuracy.

In a smoothed regression tree, the node value $s_n$ is defined recursively:

$$ s_n = \begin{cases} y_n & \text{$n$ is the root}\\ \frac{w_n y_n + v_{ss}\, s_p}{w_n + v_{ss}} & \text{otherwise} \end{cases} $$

where $y_n$ is the mean of the targets in node $n$, $s_p$ is the smoothed value of the parent of node $n$, $v_{ss}$ is the virtual sample size (a free parameter of the model), and $w_n$ is the total weight of the data in node $n$, or the count if the tree is unweighted.

Smoothed classification trees are similar, but operate on class probabilities instead of on the mean of the targets.
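To make the recursion concrete, here is a minimal sketch of the smoothing pass, computed top-down over a plain tree of nodes. The `Node` class and its field names are illustrative only, not arboretum's actual internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    y_mean: float                      # mean of the targets in this node (y_n)
    weight: float                      # total weight of data in this node (w_n)
    children: List["Node"] = field(default_factory=list)
    smoothed: Optional[float] = None   # s_n, filled in by smooth()

def smooth(node: Node, vss: float, parent_smoothed: Optional[float] = None) -> None:
    """Fill in s_n top-down: the root keeps its own mean; every other node
    shrinks its mean toward the parent's smoothed value, as if the parent
    had contributed vss virtual samples."""
    if parent_smoothed is None:
        node.smoothed = node.y_mean    # root: s_n = y_n
    else:                              # s_n = (w_n * y_n + vss * s_p) / (w_n + vss)
        node.smoothed = (node.weight * node.y_mean + vss * parent_smoothed) / (node.weight + vss)
    for child in node.children:
        smooth(child, vss, node.smoothed)
```

A light leaf is pulled strongly toward its parent, while a heavy leaf barely moves; a classification tree would apply the same blend elementwise to the class-probability vector.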

In [3]:
```python
from arboretum.datasets import load_diabetes
xtr, ytr, xte, yte = load_diabetes()
xtr.shape, xte.shape

```
Out[3]:

In [4]:
```python
from sklearn.metrics import mean_squared_error as mse
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from arboretum import SmoothRegressionTree
from sklearn.model_selection import GridSearchCV
dtr = DecisionTreeRegressor(min_samples_leaf=5)                   # plain tree baseline
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5)  # black-box ensemble baseline
mytree = SmoothRegressionTree(vss=5, min_leaf=5)                  # smoothed tree under test

```
In [5]:
```python
dtr.fit(xtr, ytr)
pred = dtr.predict(xte)
mse(yte, pred)

```
Out[5]:

In [6]:
```python
mytree.fit(xtr, ytr)
pred = mytree.predict(xte)
mse(yte, pred)

```
Out[6]:

In [7]:
```python
rf.fit(xtr, ytr)
pred = rf.predict(xte)
mse(yte, pred)

```
Out[7]:

In [8]:
```python
params = {'min_samples_leaf': [5, 10, 20, 50, 100], 'max_depth': [2, 4, 8, 16, None]}
gcv = GridSearchCV(dtr, params, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
gcv.best_score_, gcv.best_params_

```
Out[8]:

Note that the best score is negative: `GridSearchCV` maximizes the scoring function. And on the test set, that estimator gets:
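Since `best_score_` holds the maximized negative MSE, negating it recovers the cross-validated MSE:

```python
cv_mse = -gcv.best_score_  # cross-validated MSE of the best parameter setting
```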

In [9]:
```python
pred = gcv.predict(xte)
mse(yte, pred)

```
Out[9]:

Now we run the same search for the smoothed tree, first with the `vss` and `min_leaf` parameter set and then with `vss` and `max_depth`.

In [10]:
```python
myparams = {'min_leaf': [5, 10, 20, 50, 100], 'vss': [5, 10, 20, 50, 100]}
gcv = GridSearchCV(mytree, myparams, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
mypred = gcv.predict(xte)
mse(yte, mypred), gcv.best_score_, gcv.best_params_

```
Out[10]:

That's about the same. Next we'll try it with the other parameter set:

In [11]:
```python
myparams = {'max_depth': [2, 4, 8, 16, None], 'vss': [5, 10, 20, 50, 100]}
gcv = GridSearchCV(mytree, myparams, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
mypred = gcv.predict(xte)
mse(yte, mypred), gcv.best_score_, gcv.best_params_

```
Out[11]:

Next we repeat the experiment on a noisier dataset, the ALS data:

In [14]:
```python
from arboretum.datasets import load_als
xtr, ytr, xte, yte = load_als()
xtr.shape, xte.shape

```
Out[14]:

In [15]:
```python
# baseline: MSE of always predicting the training-set mean
mse(yte, 0 * yte + ytr.mean())

```
Out[15]:

In [16]:
```python
dtr.fit(xtr, ytr)
pred = dtr.predict(xte)
mse(yte, pred)

```
Out[16]:

In [17]:
```python
mytree.fit(xtr, ytr)
pred = mytree.predict(xte)
mse(yte, pred)

```
Out[17]:

In [18]:
```python
rf.n_estimators = 100
rf.fit(xtr, ytr)
pred = rf.predict(xte)
mse(yte, pred)

```
Out[18]:

Smoothing is applied at prediction time, so `vss` can be changed without refitting `arboretum.SmoothRegressionTree`. For this noisy data, much higher smoothing values are better.

In [19]:
```python
mytree.vss = 100
pred = mytree.predict(xte)
mse(yte, pred)

```
Out[19]:
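This works because the blend only involves quantities stored at fit time, so prediction can apply the formula on the fly with whatever `vss` is currently set. Here is a sketch of such a prediction-time pass, reusing the illustrative `Node` above and an assumed `split(node, x)` helper that picks the child containing a sample:

```python
def predict_one(root: Node, x, vss: float, split) -> float:
    """Walk from the root to a leaf, maintaining the smoothed value s_n
    along the path instead of precomputing it for every node."""
    s = root.y_mean                    # s_root = y_root
    node = root
    while node.children:
        node = split(node, x)          # descend to the child containing x
        s = (node.weight * node.y_mean + vss * s) / (node.weight + vss)
    return s
```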

As before, we compare a smoothed tree with `vss` and one other control parameter to a regular tree with two control parameters.

In [24]:
```python
params = {'min_samples_leaf': [5, 10, 20, 50, 100, 200, 400], 'max_depth': [2, 4, 8, 16, None]}
gcv = GridSearchCV(dtr, params, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
pred = gcv.predict(xte)
mse(yte, pred), gcv.best_score_, gcv.best_params_

```
Out[24]:

In [26]:
```python
myparams = {'min_leaf': [5, 10, 20, 50, 100], 'vss': [5, 10, 20, 50, 100, 200, 400]}
gcv = GridSearchCV(mytree, myparams, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
mypred = gcv.predict(xte)
mse(yte, mypred), gcv.best_score_, gcv.best_params_

```
Out[26]:

In [27]:
```python
myparams = {'max_depth': [2, 4, 8, 16, None], 'vss': [5, 10, 20, 50, 100, 200, 400]}
gcv = GridSearchCV(mytree, myparams, scoring='neg_mean_squared_error')
gcv.fit(xtr, ytr)
mypred = gcv.predict(xte)
mse(yte, mypred), gcv.best_score_, gcv.best_params_

```
Out[27]:

Finally, we try smoothed trees as the base estimator of a random forest:

In [29]:
```python
from arboretum import RFRegressor
myrf = RFRegressor()
myrf.base_estimator = mytree
myrf.fit(xtr[:10], ytr[:10])  # quick fit on a small slice to check the setup

```
Out[29]:

In [30]:
```python
myrf.n_trees = 100
myrf.fit(xtr, ytr)
pred = myrf.predict(xte)
mse(yte, pred)

```
Out[30]:

For comparison, here is the regular random forest again:

In [31]:
```python
rf.fit(xtr, ytr)
pred = rf.predict(xte)
mse(yte, pred)

```
Out[31]:

So it looks like smoothing doesn't help in an RF model.

Smoothed trees initially looked promising; however, closer investigation indicates that the early positive results came from better control of overfitting, thanks to the extra control parameter. When two-parameter models are compared after a grid search, smoothed trees are no better than regular trees. In summary, well-tuned decision trees perform as well as smoothed trees, given the same number of control parameters.