Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative integer feature values such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.

If some of your features are continuous, you need to bin them, for example by using KBinsDiscretizer.
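As a rough illustration of that preprocessing step, the sketch below bins continuous features into ordinal, non-negative integer codes before scoring them; the random data and the choice of n_bins=3 are made up for the example.

import numpy as np
from sklearn.feature_selection import chi2
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical continuous features and class labels.
rng = np.random.RandomState(0)
X_continuous = rng.uniform(size=(20, 2))
y = rng.randint(0, 2, size=20)

# Bin each feature into 3 ordinal codes (0, 1, 2), which are
# non-negative integers and therefore valid input for chi2.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(X_continuous)

chi2_stats, p_values = chi2(X_binned, y)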
Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
Read more in the User Guide.
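For the selection step described above, a minimal sketch using SelectKBest to keep the k=2 highest-scoring features (reusing the count data from the Examples section below; the choice of k is arbitrary here):

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Non-negative count features and class labels.
X = np.array([[1, 1, 3],
              [0, 1, 5],
              [5, 4, 1],
              [6, 6, 2],
              [1, 4, 0],
              [0, 0, 0]])
y = np.array([1, 1, 0, 0, 2, 2])

# Keep the 2 features with the highest chi-squared scores.
selector = SelectKBest(chi2, k=2)
X_selected = selector.fit_transform(X, y)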
Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Sample vectors.
y : array-like of shape (n_samples,)
Target vector (class labels).
Returns
chi2 : ndarray of shape (n_features,)
Chi2 statistics for each feature.
p_values : ndarray of shape (n_features,)
P-values for each feature.
See also
f_classif
ANOVA F-value between label/feature for classification tasks.
f_regression
F-value between label/feature for regression tasks.
Notes
Complexity of this algorithm is O(n_classes * n_features).
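That cost comes from the observed and expected count tables, each of shape (n_classes, n_features). A minimal sketch of the computation, assuming dense input and at least one nonzero value per feature (not the library's actual implementation, which also handles sparse matrices and edge cases):

import numpy as np
from scipy.stats import chi2 as chi2_dist

def chi2_sketch(X, y):
    X = np.asarray(X, dtype=float)
    # One-hot encode the class labels: shape (n_samples, n_classes).
    classes = np.unique(y)
    Y = (np.asarray(y)[:, None] == classes[None, :]).astype(float)

    # Observed per-class feature totals: shape (n_classes, n_features).
    observed = Y.T @ X

    # Expected totals if each feature were independent of the class.
    feature_count = X.sum(axis=0)
    class_prob = Y.mean(axis=0)
    expected = np.outer(class_prob, feature_count)

    # Chi-squared statistic and p-value, with n_classes - 1
    # degrees of freedom per feature.
    stat = ((observed - expected) ** 2 / expected).sum(axis=0)
    p = chi2_dist.sf(stat, df=len(classes) - 1)
    return stat, p

On the data from the Examples section, this reproduces the statistics shown below.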
Examples
>>> import numpy as np
>>> from sklearn.feature_selection import chi2
>>> X = np.array([[1, 1, 3],
...               [0, 1, 5],
...               [5, 4, 1],
...               [6, 6, 2],
...               [1, 4, 0],
...               [0, 0, 0]])
>>> y = np.array([1, 1, 0, 0, 2, 2])
>>> chi2_stats, p_values = chi2(X, y)
>>> chi2_stats
array([15.3,  6.5,  8.9])
>>> p_values
array([0.000456, 0.0387, 0.0116])