Showing content from https://scikit-learn.org/dev/modules/../auto_examples/ensemble/plot_voting_decision_regions.html below:

Visualizing the probabilistic predictions of a VotingClassifier — scikit-learn 1.8.dev0 documentation

Visualizing the probabilistic predictions of a VotingClassifier

Plot the class probabilities predicted on a toy dataset by three different classifiers and averaged by the VotingClassifier.

First, three linear classifiers are initialized. Two are spline models with interaction terms, one using constant extrapolation and the other using periodic extrapolation. The third uses a Nystroem kernel approximation with the default “rbf” kernel.

In the first part of this example, these three classifiers are used to demonstrate soft-voting using VotingClassifier with weighted average. We set weights=[2, 1, 3], meaning the constant extrapolation spline model’s predictions are weighted twice as much as the periodic spline model’s, and the Nystroem model’s predictions are weighted three times as much as the periodic spline.
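To make the weighting concrete, here is a small sketch of a weighted soft vote in plain Python. The probability vectors are made-up illustrative values, not outputs of the fitted models, and `weighted_soft_vote` is a hypothetical helper rather than a scikit-learn function.

```python
# Illustrative (made-up) probabilities for one sample, one row per
# classifier, ordered as [P(class 0), P(class 1)].
probas = [
    [0.2, 0.8],  # hypothetical constant-extrapolation spline model
    [0.9, 0.1],  # hypothetical periodic spline model
    [0.4, 0.6],  # hypothetical Nystroem model
]
weights = [2, 1, 3]


def weighted_soft_vote(probas, weights):
    """Return the weighted average of per-classifier probability vectors."""
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]


avg = weighted_soft_vote(probas, weights)
print(avg)
```

Each averaged entry is a convex combination of the individual probabilities, so the result still sums to 1.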

The second part demonstrates how soft predictions can be converted into hard predictions.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

We first generate a noisy XOR dataset, which is a binary classification task.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

n_samples = 500
rng = np.random.default_rng(0)
feature_names = ["Feature #0", "Feature #1"]
common_scatter_plot_params = dict(
    cmap=ListedColormap(["tab:red", "tab:blue"]),
    edgecolor="white",
    linewidth=1,
)

xor = pd.DataFrame(
    np.random.RandomState(0).uniform(low=-1, high=1, size=(n_samples, 2)),
    columns=feature_names,
)
noise = rng.normal(loc=0, scale=0.1, size=(n_samples, 2))
target_xor = np.logical_xor(
    xor["Feature #0"] + noise[:, 0] > 0, xor["Feature #1"] + noise[:, 1] > 0
)

X = xor[feature_names]
y = target_xor.astype(np.int32)

fig, ax = plt.subplots()
ax.scatter(X["Feature #0"], X["Feature #1"], c=y, **common_scatter_plot_params)
ax.set_title("The XOR dataset")
plt.show()

Because the XOR dataset is not linearly separable, tree-based models would often be preferred. However, appropriate feature engineering combined with a linear model can yield effective results, with the added benefit of producing better-calibrated probabilities for samples located in the transition regions affected by noise.
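One way to see why interaction terms help: in the noise-free case, the XOR label depends only on the sign of the product of the two features, so a single engineered feature makes the problem linearly separable. The sketch below checks this with the standard library only; the sample size and seed are arbitrary choices for illustration.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility of this sketch

points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]

# Noise-free XOR label: exactly one of the two features is positive.
xor_labels = [(x0 > 0) != (x1 > 0) for x0, x1 in points]

# A linear rule on the single interaction feature x0 * x1.
# (The two rules could only differ for a coordinate exactly equal to 0.)
interaction_rule = [x0 * x1 < 0 for x0, x1 in points]

labels_match = xor_labels == interaction_rule
print(labels_match)
```

With the label noise used in this example, the rule is no longer exact near the axes, which is where calibrated probabilities become useful.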

We define and fit the models on the whole dataset.

from sklearn.ensemble import VotingClassifier
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer, StandardScaler

clf1 = make_pipeline(
    SplineTransformer(degree=2, n_knots=2),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
clf2 = make_pipeline(
    SplineTransformer(
        degree=2,
        n_knots=4,
        extrapolation="periodic",
        include_bias=True,
    ),
    PolynomialFeatures(interaction_only=True),
    LogisticRegression(C=10),
)
clf3 = make_pipeline(
    StandardScaler(),
    Nystroem(gamma=2, random_state=0),
    LogisticRegression(C=10),
)
weights = [2, 1, 3]
eclf = VotingClassifier(
    estimators=[
        ("constant splines model", clf1),
        ("periodic splines model", clf2),
        ("nystroem model", clf3),
    ],
    voting="soft",
    weights=weights,
)

clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
eclf.fit(X, y)

VotingClassifier(estimators=[('constant splines model',
                              Pipeline(steps=[('splinetransformer',
                                               SplineTransformer(degree=2,
                                                                 n_knots=2)),
                                              ('polynomialfeatures',
                                               PolynomialFeatures(interaction_only=True)),
                                              ('logisticregression',
                                               LogisticRegression(C=10))])),
                             ('periodic splines model',
                              Pipeline(steps=[('splinetransformer',
                                               SplineTransformer(degree=2,
                                                                 extrapolation='periodic',
                                                                 n_knots=4)),
                                              ('polynomialfeatures',
                                               PolynomialFeatures(interaction_only=True)),
                                              ('logisticregression',
                                               LogisticRegression(C=10))])),
                             ('nystroem model',
                              Pipeline(steps=[('standardscaler',
                                               StandardScaler()),
                                              ('nystroem',
                                               Nystroem(gamma=2,
                                                        random_state=0)),
                                              ('logisticregression',
                                               LogisticRegression(C=10))]))],
                 voting='soft', weights=[2, 1, 3])


Finally, we use DecisionBoundaryDisplay to plot the predicted probabilities. With a diverging colormap (such as "RdBu"), darker colors correspond to predict_proba close to either 0 or 1, and white corresponds to predict_proba of 0.5.

from itertools import product

from sklearn.inspection import DecisionBoundaryDisplay

fig, axarr = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(10, 8))
for idx, clf, title in zip(
    product([0, 1], [0, 1]),
    [clf1, clf2, clf3, eclf],
    [
        "Splines with\nconstant extrapolation",
        "Splines with\nperiodic extrapolation",
        "RBF Nystroem",
        "Soft Voting",
    ],
):
    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict_proba",
        plot_method="pcolormesh",
        cmap="RdBu",
        alpha=0.8,
        ax=axarr[idx[0], idx[1]],
    )
    axarr[idx[0], idx[1]].scatter(
        X["Feature #0"],
        X["Feature #1"],
        c=y,
        **common_scatter_plot_params,
    )
    axarr[idx[0], idx[1]].set_title(title)
    fig.colorbar(disp.surface_, ax=axarr[idx[0], idx[1]], label="Probability estimate")

plt.show()

As a sanity check, we can verify for a given sample that the probability predicted by the VotingClassifier is indeed the weighted average of the individual classifiers’ soft-predictions.

In the case of binary classification such as in the present example, the predict_proba arrays contain the probability of belonging to class 0 (here in red) as the first entry, and the probability of belonging to class 1 (here in blue) as the second entry.

test_sample = pd.DataFrame({"Feature #0": [-0.5], "Feature #1": [1.5]})
predict_probas = [est.predict_proba(test_sample).ravel() for est in eclf.estimators_]
for (est_name, _), est_probas in zip(eclf.estimators, predict_probas):
    print(f"{est_name}'s predicted probabilities: {est_probas}")

constant splines model's predicted probabilities: [0.11272662 0.88727338]
periodic splines model's predicted probabilities: [0.99726573 0.00273427]
nystroem model's predicted probabilities: [0.3185838 0.6814162]

print(
    "Weighted average of soft-predictions: "
    f"{np.dot(weights, predict_probas) / np.sum(weights)}"
)

Weighted average of soft-predictions: [0.3630784 0.6369216]

The manually computed weighted average above matches the probabilities produced by the VotingClassifier:

print(
    "Predicted probability of VotingClassifier: "
    f"{eclf.predict_proba(test_sample).ravel()}"
)

Predicted probability of VotingClassifier: [0.3630784 0.6369216]

To convert soft predictions into hard predictions when weights are provided, the weighted average of the predicted probabilities is computed for each class. The final class label is then the class with the highest average probability, which corresponds to the default threshold at predict_proba=0.5 in the case of binary classification.

print(
    "Class with the highest weighted average of soft-predictions: "
    f"{np.argmax(np.dot(weights, predict_probas) / np.sum(weights))}"
)

Class with the highest weighted average of soft-predictions: 1

This is equivalent to the output of VotingClassifier’s predict method:

print(f"Predicted class of VotingClassifier: {eclf.predict(test_sample).ravel()}")

Predicted class of VotingClassifier: [1]
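The equivalence between taking the most probable class and thresholding at 0.5 can be checked with a short sketch. Both helpers below are hypothetical illustrations in plain Python, not scikit-learn functions; ties are resolved in favor of class 0, as np.argmax does.

```python
def argmax_label(proba):
    """Index of the largest probability (the first index wins ties)."""
    return max(range(len(proba)), key=lambda k: proba[k])


def threshold_label(proba, threshold=0.5):
    """Predict class 1 when its probability is strictly above the threshold."""
    return int(proba[1] > threshold)


# A few binary probability vectors, including the soft vote printed above.
candidates = [[1.0 - p1, p1] for p1 in (0.0, 0.2, 0.5, 0.6369216, 1.0)]
agreement = [argmax_label(p) == threshold_label(p) for p in candidates]
print(all(agreement))
```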

As with any other probabilistic classifier, soft votes can be thresholded. This lets you set the probability at which the positive class is predicted, instead of simply selecting the class with the highest predicted probability.

from sklearn.model_selection import FixedThresholdClassifier

eclf_other_threshold = FixedThresholdClassifier(
    eclf, threshold=0.7, response_method="predict_proba"
).fit(X, y)
print(
    "Predicted class of thresholded VotingClassifier: "
    f"{eclf_other_threshold.predict(test_sample)}"
)

Predicted class of thresholded VotingClassifier: [0]
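The change in the predicted label follows directly from the soft vote printed earlier. The sketch below reproduces the decision by hand; `predict_with_threshold` is a hypothetical helper, and the assumption that class 1 is predicted once its probability reaches the threshold (rather than strictly exceeds it) makes no difference for this sample.

```python
# Soft vote of the VotingClassifier for the test sample, as printed above.
soft_vote = [0.3630784, 0.6369216]


def predict_with_threshold(proba, threshold):
    """Predict class 1 when its probability reaches the threshold (assumed rule)."""
    return int(proba[1] >= threshold)


default_label = predict_with_threshold(soft_vote, 0.5)
strict_label = predict_with_threshold(soft_vote, 0.7)
print(default_label, strict_label)
```

The probability of class 1 lies between 0.5 and 0.7, so raising the threshold to 0.7 flips the prediction from class 1 to class 0.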
