Overfitting and pruning

Using the algorithm described above, we can train a decision tree that will perfectly classify training examples, assuming the examples are separable. However, if the dataset contains noise, this tree will overfit to the data and show poor test accuracy.
The following figure shows a noisy dataset with a linear relation between a feature x and the label y. The figure also shows a decision tree trained on this dataset without any type of regularization. This model correctly predicts all the training examples (the model's predictions match the training examples). However, on a new dataset containing the same linear pattern and a different noise instance, the model would perform poorly.
Figure 12. A noisy dataset.
To limit overfitting a decision tree, apply one or both of the following regularization criteria while training the decision tree:

- Set a maximum depth: Prevent the decision tree from growing past a maximum depth.
- Set a minimum number of examples per leaf: A leaf containing fewer than a certain number of examples is not split further.
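For instance, here is a minimal sketch of both criteria on synthetic noisy data. It uses scikit-learn rather than YDF, and the dataset and hyperparameter values are illustrative, not the ones behind the figures on this page:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(200, 1))
y = 2 * x[:, 0] + rng.normal(scale=0.3, size=200)   # linear pattern + noise
x_train, y_train, x_test, y_test = x[:100], y[:100], x[100:], y[100:]

# Unregularized tree: fits the training noise almost perfectly.
overfit = DecisionTreeRegressor().fit(x_train, y_train)

# Regularized tree: bounded depth and a minimum number of examples per leaf.
regularized = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10).fit(x_train, y_train)

print("unregularized:", overfit.score(x_train, y_train), overfit.score(x_test, y_test))
print("regularized:  ", regularized.score(x_train, y_train), regularized.score(x_test, y_test))
```

The unregularized tree scores nearly perfectly on the training split but worse on the held-out split, while the regularized tree gives up some training accuracy for better generalization.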
The following figure illustrates the effect of requiring a minimum number of examples per leaf: the model captures less of the noise.
Figure 13. Differing minimum number of examples per leaf.
You can also regularize after training by selectively removing (pruning) certain branches, that is, by converting certain non-leaf nodes to leaves. A common solution to select the branches to remove is to use a validation dataset. That is, if removing a branch improves the quality of the model on the validation dataset, then the branch is removed.
The following drawing illustrates this idea. Here, we test if the validation accuracy of the decision tree would be improved if the non-leaf green node was turned into a leaf; that is, pruning the orange nodes.
Figure 14. Pruning a condition and its children into a leaf.
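To make the rule concrete, here is a small self-contained sketch of this kind of validation-based pruning on a toy tree represented as nested dicts. Everything in it (the tree, the thresholds, the validation examples) is invented for illustration and is unrelated to the figures on this page:

```python
def predict(node, example):
    """Route an example down the tree until a leaf is reached."""
    while "leaf" not in node:
        branch = "left" if example[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def accuracy(root, dataset):
    return sum(predict(root, x) == y for x, y in dataset) / len(dataset)

def prune(node, root, validation):
    """Bottom-up: replace a non-leaf node by a leaf if doing so improves
    accuracy on the validation dataset."""
    if "leaf" in node:
        return
    prune(node["left"], root, validation)
    prune(node["right"], root, validation)
    before = accuracy(root, validation)
    subtree = dict(node)                 # keep the subtree in case we roll back
    node.clear()
    node["leaf"] = subtree["majority"]   # the leaf predicts the subtree's majority label
    if accuracy(root, validation) <= before:  # no improvement: restore the subtree
        node.clear()
        node.update(subtree)

# Toy overfit tree; "majority" stores the most common training label under
# each non-leaf node, so we know what its replacement leaf would predict.
tree = {
    "feature": "x", "threshold": 0.5, "majority": 0,
    "left": {"leaf": 0},
    "right": {
        "feature": "x", "threshold": 0.7, "majority": 1,
        "left": {"leaf": 0},   # a split that only fits training noise
        "right": {"leaf": 1},
    },
}
validation = [({"x": 0.3}, 0), ({"x": 0.6}, 1), ({"x": 0.8}, 1)]
prune(tree, tree, validation)
print(tree)  # the noisy split under the right branch is collapsed into a leaf
```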
The following figure illustrates the effect of using 20% of the dataset as validation to prune the decision tree:
Figure 15. Using 20% of the dataset to prune the decision tree.
Note that using a validation dataset reduces the number of examples available for the initial training of the decision tree.
Many model creators apply multiple criteria. For example, you could do all of the following:

- Apply a minimum number of examples per leaf.
- Apply a maximum tree depth to limit the size of the decision tree.
- Prune the decision tree with a validation dataset.
In YDF, the learning algorithms are pre-configured with default values for all the pruning hyperparameters. For example, here are the default values for two pruning hyperparameters:
- The minimum number of examples is 5 (min_examples = 5).
- 10% of the training dataset is retained for validation-based pruning (validation_ratio = 0.1).

You can disable pruning with the validation dataset by setting validation_ratio=0.0.
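As a rough sketch (the CSV path and the "label" column name are placeholders), training a single decision tree in YDF with these defaults, and then with several criteria combined and validation-based pruning disabled, might look like the following:

```python
import pandas as pd
import ydf  # pip install ydf

train_df = pd.read_csv("train.csv")  # placeholder dataset with a "label" column

# Default pruning hyperparameters: min_examples=5 and validation_ratio=0.1
# (10% of train_df is held out to prune the tree after it is grown).
model = ydf.CartLearner(label="label").train(train_df)

# Combining several criteria and disabling validation-based pruning.
unpruned_model = ydf.CartLearner(
    label="label",
    min_examples=5,
    max_depth=8,          # illustrative value
    validation_ratio=0.0, # no validation split, no pruning
).train(train_df)
```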
Those criteria introduce new hyperparameters that need to be tuned (for example, the maximum tree depth), often with automated hyperparameter tuning. Decision trees are generally fast enough to train to use hyperparameter tuning with cross-validation. For example, on a dataset with "n" examples:

- Divide the training examples into p non-overlapping groups; for example, p=10.
- For each candidate hyperparameter value, train a decision tree on p-1 of the groups, evaluate it on the remaining group, repeat for each of the p groups, and average the evaluations.
- Select the hyperparameter value with the best average evaluation.
- Train a final decision tree on all "n" examples using the selected hyperparameter value.
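Here is a sketch of that procedure using scikit-learn's cross-validated grid search rather than YDF; the dataset and candidate values are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)  # stand-in dataset with "n" examples

# Candidate hyperparameter values (illustrative).
grid = {"max_depth": [3, 5, 7, 9], "min_samples_leaf": [5, 10, 20]}

# cv=10 divides the examples into p=10 groups, trains on 9 groups and
# evaluates on the held-out one, then averages the evaluations per candidate.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=10)
search.fit(x, y)

print(search.best_params_, search.best_score_)
# With the default refit=True, search.best_estimator_ is the final tree
# retrained on all "n" examples with the selected hyperparameters.
```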
In this section we discussed the ways decision trees limit overfitting. Despite these methods, underfitting and overfitting are major weaknesses of decision trees. Decision forests introduce new methods to limit overfitting, which we will see later.
Direct decision tree interpretation

Decision trees are easily interpretable. That said, changing even a few examples can completely change the structure, and therefore the interpretation, of the decision tree.
Note: Especially when the dataset contains many similar features, the learned decision tree is only one of multiple more-or-less equivalent decision trees that fit the data.

Because of the way decision trees are built, partitioning the training examples, one can use a decision tree to interpret the dataset itself (as opposed to the model). Each leaf represents a particular corner of the dataset.
In YDF, you can look at the trees with the model.describe() function. You can also access and plot an individual tree with model.get_tree(). See YDF's model inspection tutorial for more details.
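For example, a quick inspection sketch, reusing the model trained in the earlier YDF snippet. Passing a tree index to model.get_tree() is an assumption here, with index 0 used because a CART model contains a single tree:

```python
model.describe()          # in a notebook, displays a report including the tree structure

tree = model.get_tree(0)  # programmatic access to the model's single tree
print(tree)
```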
However, indirect interpretation is also informative.