
Exact splitter for binary classification with numerical features


In this unit, you'll explore the simplest and most common splitter algorithm, which creates conditions of the form $\mathrm{feature}_i \geq t$ in the following setting:

Assume a set of $n$ examples with a numerical feature and a binary label, for example "orange" or "blue". Formally, let's describe the dataset $D$ as:

$$D = \{(x_i,y_i)\}_{i\in[1,n]}$$

where:

  * $x_i$ is the value of the numerical feature for example $i$.
  * $y_i$ is the binary classification label of example $i$ (for example, "orange" or "blue").

Our objective is to find a threshold value $t$ such that dividing the examples $D$ into the groups $T(rue)$ and $F(alse)$ according to the $x_i \geq t$ condition improves the separation of the labels; for example, more "orange" examples in $T$ and more "blue" examples in $F$.
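To make this concrete, here is a minimal Python sketch with a made-up toy dataset (the values and names are hypothetical, not from this course):

```python
# Hypothetical toy dataset of (feature value x_i, label y_i) pairs.
examples = [(1.0, "blue"), (2.0, "blue"), (3.0, "blue"),
            (4.0, "orange"), (5.0, "orange"), (6.0, "orange")]

t = 3.5  # one candidate threshold

# Split D into T (x_i >= t) and F (x_i < t).
T = [(x, y) for (x, y) in examples if x >= t]
F = [(x, y) for (x, y) in examples if x < t]

print(T)  # only "orange" examples: the split separates the labels perfectly
print(F)  # only "blue" examples
```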

Shannon entropy is a measure of disorder. For a binary label, the entropy is zero when all examples have the same label and maximal when the two labels are equally represented:

Figure 8. Three different entropy levels.
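For illustration, a minimal Python sketch of binary entropy (a hypothetical helper using the natural logarithm; not part of this course's code):

```python
import math

def binary_entropy(p):
    """Shannon entropy of a binary label distribution with positive ratio p."""
    if p == 0.0 or p == 1.0:
        return 0.0  # pure set: no disorder
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

print(binary_entropy(0.0))  # 0.0: all examples share the same label
print(binary_entropy(0.5))  # ~0.693 (natural log): maximal disorder
print(binary_entropy(0.9))  # low entropy: the set is almost pure
```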

Formally, we want to find a condition that decreases the weighted sum of the entropies of the label distributions in $T$ and $F$. The corresponding score is the information gain: the difference between the entropy of $D$ and the weighted entropy of {$T$,$F$}.

The following figure shows a bad split, in which the entropy remains high and the information gain low:

Figure 9. A bad split does not reduce the entropy of the label.

By contrast, the following figure shows a better split in which the entropy becomes low (and the information gain high):

Figure 10. A good split reduces the entropy of the label.

Formally:

\[\begin{eqnarray} T & = & \{ (x_i,y_i) | (x_i,y_i) \in D \ \textrm{with} \ x_i \ge t \} \\[12pt] F & = & \{ (x_i,y_i) | (x_i,y_i) \in D \ \textrm{with} \ x_i \lt t \} \\[12pt] R(X) & = & \frac{\lvert \{ (x,y) | (x,y) \in X \ \textrm{and} \ y = \mathrm{pos} \} \rvert}{\lvert X \rvert} \\[12pt] H(X) & = & - p \log p - (1 - p) \log (1-p) \ \textrm{with} \ p = R(X)\\[12pt] IG(D,T,F) & = & H(D) - \frac {\lvert T\rvert} {\lvert D \rvert } H(T) - \frac {\lvert F \rvert} {\lvert D \rvert } H(F) \end{eqnarray}\]

where:

  * $IG(D,T,F)$ is the information gain of splitting $D$ into $T$ and $F$.
  * $H(X)$ is the entropy of the set $X$.
  * $\lvert X \rvert$ is the number of elements in the set $X$.
  * $t$ is the threshold value.
  * $\mathrm{pos}$ is the label value treated as positive (for example, "orange").
  * $R(X)$ is the ratio of positive label values in $X$.
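A minimal Python sketch of these formulas, assuming "orange" is treated as the positive label (the function names are hypothetical):

```python
import math

def entropy(labels, pos="orange"):
    """H(X) for a list of binary labels; pos is the label treated as positive."""
    p = sum(1 for y in labels if y == pos) / len(labels)  # R(X)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def information_gain(D, t, pos="orange"):
    """IG(D, T, F) for the split induced by the condition x_i >= t."""
    T = [y for (x, y) in D if x >= t]
    F = [y for (x, y) in D if x < t]
    if not T or not F:
        return 0.0  # degenerate split: all examples on one side
    return (entropy([y for (_, y) in D], pos)
            - len(T) / len(D) * entropy(T, pos)
            - len(F) / len(D) * entropy(F, pos))

D = [(1.0, "blue"), (2.0, "blue"), (3.0, "orange"), (4.0, "orange")]
print(information_gain(D, t=2.5))  # perfect split: the gain equals H(D)
```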

In the following example, we consider a binary classification dataset with a single numerical feature $x$. The following figure shows, for different values of the threshold $t$ (x-axis):

  1. The histogram of the feature $x$.
  2. The ratio of "blue" examples in the $D$, $T$, and $F$ sets according to the threshold value.
  3. The entropy in $D$, $T$ and $F$.
  4. The information gain; that is, the entropy delta between $D$ and {$T$,$F$} weighted by the number of examples.

Figure 11. Four threshold plots.

These plots show that the information gain peaks at a particular threshold value; the exact splitter selects the candidate threshold $t$ that maximizes the information gain.

The candidate values for $t$ in the set of real numbers ($\mathbb{R}$) are infinite. However, given a finite number of examples, only a finite number of divisions of $D$ into $T$ and $F$ exist. Therefore, only a finite number of values of $t$ are meaningful to test.

A classical approach is to sort the values $x_i$ in increasing order $x_{s(i)}$ such that:

$$ x_{s(i)} \leq x_{s(i+1)} $$

Then, test $t$ for every value halfway between consecutive sorted values of $x_i$. For example, suppose you have 1,000 floating-point values of a particular feature. After sorting, suppose the first two values are 8.5 and 8.7. In this case, the first threshold value to test would be 8.6.

Formally, we consider the following candidate values for $t$:

$$ X = \left\{ \frac {x_{s(i)} + x_{s(i + 1)}} {2} \lvert x_{s(i)} \ne x_{s(i+1)} \right\} $$
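Putting the pieces together, the following Python sketch of the exact splitter sorts the feature values, evaluates the information gain at the midpoint between each pair of distinct consecutive values, and keeps the threshold with the highest gain (a naive illustration with hypothetical names, not the reference implementation):

```python
import math

def entropy(labels, pos="orange"):
    p = sum(1 for y in labels if y == pos) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def exact_splitter(D, pos="orange"):
    """Return (best threshold, best information gain) for the condition x >= t."""
    xs = sorted(x for (x, _) in D)               # O(n log n)
    h_d = entropy([y for (_, y) in D], pos)
    best_t, best_gain = None, 0.0
    for a, b in zip(xs, xs[1:]):
        if a == b:
            continue                             # identical values induce the same split
        t = (a + b) / 2.0                        # midpoint candidate threshold
        T = [y for (x, y) in D if x >= t]
        F = [y for (x, y) in D if x < t]
        gain = (h_d - len(T) / len(D) * entropy(T, pos)
                    - len(F) / len(D) * entropy(F, pos))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

D = [(8.5, "blue"), (8.7, "blue"), (9.1, "orange"), (9.4, "orange")]
print(exact_splitter(D))  # the first candidate tested is (8.5 + 8.7) / 2 = 8.6
```

Note that this naive sketch rebuilds $T$ and $F$ from scratch for every candidate, so it runs in quadratic time; a practical implementation sweeps the sorted examples once and updates the label counts incrementally, which keeps the per-node cost at the $\mathcal{O}(n \log n)$ quoted below.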

The time complexity of this algorithm is $\mathcal{O} ( n \log n) $ with $n$ the number of examples in the node (because of the sorting of the feature values). When applied on a decision tree, the splitter algorithm is applied to each node and each feature. Note that each node receives approximately half of its parent's examples. Therefore, according to the master theorem, the time complexity of training a decision tree with this splitter is:

$$ \mathcal{O} ( m n \log^2 n ) $$

where:

  * $m$ is the number of features.
  * $n$ is the number of training examples.
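As a rough sketch of where this bound comes from (assuming, as above, that each split sends roughly half of the examples to each child), the per-node cost over all features is $\mathcal{O}(m\,n \log n)$, which gives the recurrence

$$ T(n) = 2\,T\!\left(\frac{n}{2}\right) + \mathcal{O}(m\,n \log n) $$

and, by the extended case of the master theorem (extra work of the form $\Theta(n \log n)$), the solution

$$ T(n) = \mathcal{O}(m\,n \log^2 n). $$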

In this algorithm, the values of the features do not matter; only their order matters. For this reason, this algorithm works independently of the scale or the distribution of the feature values. This is why we do not need to normalize or scale the numerical features when training a decision tree.
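For instance, applying a strictly increasing transformation to the feature leaves the reachable splits unchanged, because the sorted order of the examples is preserved (a small illustrative sketch with made-up values):

```python
import math

x = [10.0, 20.0, 300.0, 4000.0]
x_log = [math.log(v) for v in x]  # hypothetical monotonic rescaling of the feature

# The sorted order of the examples, and therefore the set of possible
# {T, F} partitions the splitter can produce, is identical before and
# after the transformation; only the numeric threshold values differ.
print(sorted(range(len(x)), key=lambda i: x[i]))       # [0, 1, 2, 3]
print(sorted(range(len(x)), key=lambda i: x_log[i]))   # [0, 1, 2, 3]
```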

