Help on DecisionTreeClassifier in module sklearn.tree.tree object:
class DecisionTreeClassifier(BaseDecisionTree, sklearn.base.ClassifierMixin)
| A decision tree classifier.
| Parameters
| ----------
| criterion : string, optional (default="gini")
| The function to measure the quality of a split. Supported criteria are
| "gini" for the Gini impurity and "entropy" for the information gain.
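| As a rough illustration of the two criteria, the sketch below computes the
| Gini impurity, the entropy, and the impurity reduction of one candidate
| split in plain Python. The helper names (`gini`, `entropy`,
| `impurity_decrease`) are hypothetical and are not part of scikit-learn;
| the library's own implementation is not shown here.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def impurity_decrease(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    return (
        impurity(parent)
        - len(left) / n * impurity(left)
        - len(right) / n * impurity(right)
    )


# A perfectly separating split of a balanced two-class node.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
gini_gain = impurity_decrease(parent, left, right, gini)
entropy_gain = impurity_decrease(parent, left, right, entropy)  # information gain
```

| With "entropy", the quantity returned by `impurity_decrease` is the
| information gain of the split.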
| splitter : string, optional (default="best")
| The strategy used to choose the split at each node. Supported
| strategies are "best" to choose the best split and "random" to choose
| the best random split.
| max_features : int, float, string or None, optional (default=None)
| The number of features to consider when looking for the best split:
| - If int, then consider `max_features` features at each split.
| - If float, then `max_features` is a fraction and
| `int(max_features * n_features)` features are considered at each
| split.
| - If "auto", then `max_features=sqrt(n_features)`.
| - If "sqrt", then `max_features=sqrt(n_features)`.
| - If "log2", then `max_features=log2(n_features)`.
| - If None, then `max_features=n_features`.
| Note: the search for a split does not stop until at least one
| valid partition of the node samples is found, even if this requires
| inspecting more than ``max_features`` features.
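| The rules above can be sketched as a small function. This is a minimal
| sketch and not the library's code: `resolve_max_features` is a
| hypothetical name, and truncating to an integer with a floor of 1 for
| the "sqrt" and "log2" cases is an assumption.

```python
import math


def resolve_max_features(max_features, n_features):
    """Return the number of features examined per split (illustrative only)."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # Assumption: truncate to an int and never go below one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # A float is treated as a fraction of the available features.
    return int(max_features * n_features)


n_features = 16
resolved = {
    spec: resolve_max_features(spec, n_features)
    for spec in (None, "auto", "sqrt", "log2", 3, 0.5)
}
```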
| max_depth : int or None, optional (default=None)
| The maximum depth of the tree. If None, then nodes are expanded until
| all leaves are pure or until all leaves contain fewer than
| min_samples_split samples.
| Ignored if ``max_leaf_nodes`` is not None.
| min_samples_split : int, optional (default=2)
| The minimum number of samples required to split an internal node.
| min_samples_leaf : int, optional (default=1)
| The minimum number of samples required to be at a leaf node.
| min_weight_fraction_leaf : float, optional (default=0.)
| The minimum weighted fraction of the input samples required to be at a
| leaf node.
| max_leaf_nodes : int or None, optional (default=None)
| Grow a tree with ``max_leaf_nodes`` in best-first fashion.
| Best nodes are defined by their relative reduction in impurity.
| If None then unlimited number of leaf nodes.
| If not None then ``max_depth`` will be ignored.
| class_weight : dict, list of dicts, "auto" or None, optional (default=None)
| Weights associated with classes in the form ``{class_label: weight}``.
| If not given, all classes are assumed to have weight one. For
| multi-output problems, a list of dicts can be provided in the same
| order as the columns of y.
| The "auto" mode uses the values of y to automatically adjust
| weights inversely proportional to class frequencies in the input data.
| For multi-output, the weights of each column of y will be multiplied.
| Note that these weights will be multiplied with sample_weight (passed
| through the fit method) if sample_weight is specified.
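| To make "inversely proportional to class frequencies" concrete, here is
| a minimal single-output sketch. The function name and the scaling
| constant `n_samples / (n_classes * count)` are assumptions chosen for
| illustration; only the inverse proportionality and the multiplication
| with sample_weight come from the description above.

```python
from collections import Counter


def inverse_frequency_weights(y):
    """Map each class label to a weight inversely proportional to its frequency.

    Assumed scaling: n_samples / (n_classes * count). Any other constant
    factor would preserve the same proportions between classes.
    """
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {
        label: n_samples / (n_classes * count) for label, count in counts.items()
    }


y = [0, 0, 0, 1]
class_weight = inverse_frequency_weights(y)

# Effective per-sample weight: class weight times the user-supplied sample_weight.
sample_weight = [1.0, 1.0, 2.0, 1.0]
effective = [class_weight[label] * w for label, w in zip(y, sample_weight)]
```

| Here the rare class 1 receives three times the weight of class 0,
| matching the 3:1 ratio of their frequencies.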
| random_state : int, RandomState instance or None, optional (default=None)
| If int, random_state is the seed used by the random number generator;
| If RandomState instance, random_state is the random number generator;
| If None, the random number generator is the RandomState instance used
| by `np.random`.
| Attributes
| ----------
| tree_ : Tree object
| The underlying Tree object.
| max_features_ : int
| The inferred value of max_features.
| classes_ : array of shape = [n_classes] or a list of such arrays
| The class labels (single output problem),
| or a list of arrays of class labels (multi-output problem).
| n_classes_ : int or list
| The number of classes (for single output problems),
| or a list containing the number of classes for each
| output (for multi-output problems).
| feature_importances_ : array of shape = [n_features]
| The feature importances. The higher, the more important the
| feature. The importance of a feature is computed as the (normalized)
| total reduction of the criterion brought by that feature. It is also
| known as the Gini importance [4]_.
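| The "(normalized) total reduction of the criterion" can be sketched as
| follows. The input format, a list of `(feature_index, reduction)` pairs
| with one pair per internal node, is hypothetical and chosen only for
| illustration; it is not how the Tree object stores its nodes.

```python
def feature_importances(splits, n_features):
    """Sum each feature's impurity reductions, then normalize to sum to 1.

    `splits` is a hypothetical list of (feature_index, weighted_reduction)
    pairs, one per internal node of a fitted tree.
    """
    totals = [0.0] * n_features
    for feature, reduction in splits:
        totals[feature] += reduction
    norm = sum(totals)
    # A tree with no splits has nothing to normalize.
    return [t / norm for t in totals] if norm > 0 else totals


# Three internal nodes: feature 0 is used twice, feature 2 once, feature 1 never.
importances = feature_importances([(0, 0.3), (2, 0.1), (0, 0.1)], n_features=3)
```

| A feature that is never used for a split gets an importance of zero.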
| See also
| --------
| DecisionTreeRegressor
| References
| ----------
| .. [1]
| .. [2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, "Classification
| and Regression Trees", Wadsworth, Belmont, CA, 1984.
| .. [3] T. Hastie, R. Tibshirani and J. Friedman. "Elements of Statistical
| Learning", Springer, 2009.
| .. [4] L. Breiman, and A. Cutler, "Random Forests",
| Examples
| --------
| >>> from sklearn.datasets import load_iris
| >>> from sklearn.cross_validation import cross_val_score
| >>> from sklearn.tree import DecisionTreeClassifier
| >>> clf = DecisionTreeClassifier(random_state=0)
| >>> iris = load_iris()
| >>> cross_val_score(clf, iris.data, iris.target, cv=10)
| ... # doctest: +SKIP
| ...
| array([ 1. , 0.93..., 0.86..., 0.93..., 0.93...,
| 0.93..., 0.93..., 1. , 0.93..., 1. ])
| Method resolution order:
| DecisionTreeClassifier
| BaseDecisionTree
| abc.NewBase
| sklearn.base.BaseEstimator
| sklearn.feature_selection.from_model._LearntSelectorMixin
| sklearn.base.TransformerMixin
| sklearn.base.ClassifierMixin
| __builtin__.object
| Methods defined here:
| __init__(self, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, class_weight=None)
| predict_log_proba(self, X)
| Predict class log-probabilities of the input samples X.
| Parameters
| ----------
| X : array-like or sparse matrix of shape = [n_samples, n_features]
| The input samples. Internally, it will be converted to
| ``dtype=np.float32`` and if a sparse matrix is provided
| to a sparse ``csr_matrix``.
| Returns
| -------
| p : array of shape = [n_samples, n_classes], or a list of n_outputs
| such arrays if n_outputs > 1.
| The class log-probabilities of the input samples. The order of the
| classes corresponds to that in the attribute `classes_`.
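| A log-probability is the natural logarithm of the corresponding class
| probability. The sketch below (with a hypothetical helper name) shows
| one consequence worth keeping in mind: a class with probability zero in
| a leaf maps to negative infinity.

```python
import math


def log_proba(proba_row):
    """Element-wise natural log; a zero probability maps to -inf."""
    return [math.log(p) if p > 0 else float("-inf") for p in proba_row]


# One sample whose leaf contains two of the three classes in equal parts.
row = log_proba([0.5, 0.5, 0.0])
```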
| predict_proba(self, X, check_input=True)
| Predict class probabilities of the input samples X.
| The predicted class probability is the fraction of samples of the same
| class in a leaf.
| Parameters
| ----------
| X : array-like or sparse matrix of shape = [n_samples, n_features]
| The input samples. Internally, it will be converted to
| ``dtype=np.float32`` and if a sparse matrix is provided
| to a sparse ``csr_matrix``.
| check_input : boolean, (default=True)
| Allows bypassing several input checks.
| Don't use this parameter unless you know what you are doing.
| Returns
| -------
| p : array of shape = [n_samples, n_classes], or a list of n_outputs
| such arrays if n_outputs > 1.
| The class probabilities of the input samples. The order of the
| classes corresponds to that in the attribute `classes_`.
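| The "fraction of samples of the same class in a leaf" can be
| illustrated with a short sketch. It assumes unweighted samples and a
| single output; `leaf_proba` and the example labels are hypothetical.

```python
from collections import Counter


def leaf_proba(leaf_labels, classes):
    """Fraction of the leaf's training samples per class, ordered as `classes`."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return [counts[c] / n for c in classes]


classes = ["setosa", "versicolor", "virginica"]  # plays the role of classes_
leaf = ["versicolor", "versicolor", "versicolor", "virginica"]
proba = leaf_proba(leaf, classes)
```

| Every sample that reaches this leaf at prediction time would receive
| the same probability vector, here 0 for the first class, 0.75 for the
| second, and 0.25 for the third.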
| ----------------------------------------------------------------------
| Data and other attributes defined here:
| __abstractmethods__ = frozenset([])
| ----------------------------------------------------------------------
| Methods inherited from BaseDecisionTree:
| fit(self, X, y, sample_weight=None, check_input=True)
| Build a decision tree from the training set (X, y).
| Parameters
| ----------
| X : array-like or sparse matrix, shape = [n_samples, n_features]
| The training input samples. Internally, it will be converted to
| ``dtype=np.float32`` and if a sparse matrix is provided
| to a sparse ``csc_matrix``.
| y : array-like, shape = [n_samples] or [n_samples, n_outputs]
| The target values (class labels in classification, real numbers in
| regression). In the regression case, use ``dtype=np.float64`` and
| ``order='C'`` for maximum efficiency.
| sample_weight : array-like, shape = [n_samples] or None
| Sample weights. If None, then samples are equally weighted. Splits
| that would create child nodes with net zero or negative weight are
| ignored while searching for a split in each node. In the case of
| classification, splits are also ignored if they would result in any
| single class carrying a negative weight in either child node.
| check_input : boolean, (default=True)
| Allows bypassing several input checks.
| Don't use this parameter unless you know what you are doing.
| Returns
| -------
| self : object
| Returns self.
| predict(self, X, check_input=True)
| Predict class or regression value for X.
| For a classification model, the predicted class for each sample in X is
| returned. For a regression model, the predicted value based on X is
| returned.
| Parameters
| ----------
| X : array-like or sparse matrix of shape = [n_samples, n_features]
| The input samples. Internally, it will be converted to
| ``dtype=np.float32`` and if a sparse matrix is provided
| to a sparse ``csr_matrix``.
| check_input : boolean, (default=True)
| Allows bypassing several input checks.
| Don't use this parameter unless you know what you are doing.
| Returns
| -------
| y : array of shape = [n_samples] or [n_samples, n_outputs]
| The predicted classes, or the predicted values.
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseDecisionTree:
| feature_importances_
| Return the feature importances.
| The importance of a feature is computed as the (normalized) total
| reduction of the criterion brought by that feature.
| It is also known as the Gini importance.
| Returns
| -------
| feature_importances_ : array, shape = [n_features]
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.BaseEstimator:
| __repr__(self)
| get_params(self, deep=True)
| Get parameters for this estimator.
| Parameters
| ----------
| deep : boolean, optional
| If True, will return the parameters for this estimator and
| contained subobjects that are estimators.
| Returns
| -------
| params : mapping of string to any
| Parameter names mapped to their values.
| set_params(self, **params)
| Set the parameters of this estimator.
| The method works on simple estimators as well as on nested objects
| (such as pipelines). The latter have parameters of the form
| ``<component>__<parameter>`` so that it's possible to update each
| component of a nested object.
| Returns
| -------
| self
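| The ``<component>__<parameter>`` convention can be illustrated with a
| tiny parser. This is only a sketch of the naming scheme, not of how
| set_params is implemented; `split_param_name` and the component name
| `tree` are hypothetical.

```python
def split_param_name(name):
    """Split 'component__parameter' at the first double underscore.

    Returns (None, name) when the name addresses the estimator itself.
    """
    component, sep, parameter = name.partition("__")
    return (component, parameter) if sep else (None, name)


direct = split_param_name("max_depth")          # parameter of the estimator itself
nested = split_param_name("tree__max_depth")    # parameter of a component named "tree"
deeper = split_param_name("outer__tree__max_depth")
```

| Splitting only at the first separator leaves the remainder to be
| resolved by the named component, which is what allows deeper nesting.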
| ----------------------------------------------------------------------
| Data descriptors inherited from sklearn.base.BaseEstimator:
| __dict__
| dictionary for instance variables (if defined)
| __weakref__
| list of weak references to the object (if defined)
| ----------------------------------------------------------------------
| Methods inherited from sklearn.feature_selection.from_model._LearntSelectorMixin:
| transform(self, X, threshold=None)
| Reduce X to its most important features.
| Uses ``coef_`` or ``feature_importances_`` to determine the most
| important features. For models with a ``coef_`` for each class, the
| absolute sum over the classes is used.
| Parameters
| ----------
| X : array or scipy sparse matrix of shape [n_samples, n_features]
| The input samples.
| threshold : string, float or None, optional (default=None)
| The threshold value to use for feature selection. Features whose
| importance is greater than or equal to it are kept while the others are
| discarded. If "median" (resp. "mean"), then the threshold value is
| the median (resp. the mean) of the feature importances. A scaling
| factor (e.g., "1.25*mean") may also be used. If None and if
| available, the object attribute ``threshold`` is used. Otherwise,
| "mean" is used by default.
| Returns
| -------
| X_r : array of shape [n_samples, n_selected_features]
| The input samples with only the selected features.
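| The threshold rules can be sketched in plain Python as below. The
| function names are hypothetical, the parsing of scaled strings such as
| "1.25*mean" is an assumption about the format, and the fallback to an
| object attribute ``threshold`` is left out for brevity.

```python
import statistics


def resolve_threshold(threshold, importances):
    """Turn a threshold spec into a number (illustrative parsing only)."""
    if threshold is None:
        threshold = "mean"  # documented default when nothing else is set
    if isinstance(threshold, str):
        scale, reference = 1.0, threshold
        if "*" in threshold:
            factor, reference = threshold.split("*")
            scale = float(factor.strip())
            reference = reference.strip()
        if reference == "median":
            return scale * statistics.median(importances)
        if reference == "mean":
            return scale * statistics.mean(importances)
        raise ValueError("expected 'mean' or 'median', got %r" % reference)
    return float(threshold)


def select_features(importances, threshold=None):
    """Indices of features whose importance is >= the resolved threshold."""
    cutoff = resolve_threshold(threshold, importances)
    return [i for i, imp in enumerate(importances) if imp >= cutoff]


importances = [0.5, 0.3, 0.15, 0.05]
kept_mean = select_features(importances)                 # cutoff: the mean
kept_scaled = select_features(importances, "1.25*mean")  # cutoff: 1.25 times the mean
kept_median = select_features(importances, "median")     # cutoff: the median
kept_fixed = select_features(importances, 0.1)           # cutoff: a fixed float
```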
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.TransformerMixin:
| fit_transform(self, X, y=None, **fit_params)
| Fit to data, then transform it.
| Fits transformer to X and y with optional parameters fit_params
| and returns a transformed version of X.
| Parameters
| ----------
| X : numpy array of shape [n_samples, n_features]
| Training set.
| y : numpy array of shape [n_samples]
| Target values.
| Returns
| -------
| X_new : numpy array of shape [n_samples, n_features_new]
| Transformed array.
| ----------------------------------------------------------------------
| Methods inherited from sklearn.base.ClassifierMixin:
| score(self, X, y, sample_weight=None)
| Returns the mean accuracy on the given test data and labels.
| In multi-label classification, this is the subset accuracy
| which is a harsh metric since you require for each sample that
| each label set be correctly predicted.
| Parameters
| ----------
| X : array-like, shape = (n_samples, n_features)
| Test samples.
| y : array-like, shape = (n_samples) or (n_samples, n_outputs)
| True labels for X.
| sample_weight : array-like, shape = [n_samples], optional
| Sample weights.
| Returns
| -------
| score : float
| Mean accuracy of self.predict(X) with respect to y.
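| To see why subset accuracy is called a harsh metric, the sketch below
| compares it with a per-label accuracy on a tiny multi-label example.
| Both helper functions are hypothetical and written for illustration;
| they are not scikit-learn functions.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly, for comparison."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(1 for a, b in pairs if a == b) / len(pairs)


# Three samples with two binary labels each; one label of sample 2 is wrong.
y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]
strict = subset_accuracy(y_true, y_pred)       # 2 of 3 samples fully correct
lenient = per_label_accuracy(y_true, y_pred)   # 5 of 6 labels correct
```

| A single wrong label makes the whole sample count as an error under
| subset accuracy, so `strict` is lower than `lenient`.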