Feature Selection in Python with Scikit-Learn

Last Updated : 23 Jul, 2025

Feature selection is a crucial step in the machine learning pipeline. It involves selecting the most important features from your dataset to improve model performance and reduce computational cost. In this article, we will explore various techniques for feature selection in Python using the Scikit-Learn library.

What is Feature Selection?

Feature selection is the process of identifying and selecting a subset of relevant features for use in model construction. The goal is to enhance the model's performance by reducing overfitting, improving accuracy, and reducing training time.

Why is Feature Selection Important?

Feature selection offers several benefits:

- Reduced overfitting: removing irrelevant features gives the model less noise to fit.
- Improved accuracy: models trained only on informative features often generalize better.
- Reduced training time: fewer features mean less data to process.

Types of Feature Selection Methods

Feature selection methods can be broadly classified into three categories:

- Filter methods: score each feature with a statistical test (e.g., the chi-square test), independently of any model.
- Wrapper methods: search for a good feature subset by repeatedly training a model (e.g., recursive feature elimination).
- Embedded methods: perform selection as part of model training itself (e.g., feature importances from tree-based models).

Feature Selection Techniques with Scikit-Learn

Scikit-Learn provides several tools for feature selection, including:

- SelectKBest for univariate selection with statistical tests
- RFE for recursive feature elimination
- feature_importances_ from tree-based models such as RandomForestClassifier

Practical Implementation of Feature Selection with Scikit-Learn

Let's implement these feature selection techniques using Scikit-Learn.

Data Preparation:

First, let's load a dataset and split it into features and target variables.

Python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
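
As a quick sanity check, you can confirm the split sizes: the iris dataset has 150 rows, so a 30% test split leaves 105 rows for training and 45 for testing.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 150 rows total: 105 train / 45 test, 4 feature columns each
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```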

Method 1: Univariate Selection

We'll use SelectKBest with the chi-square test to select the top 2 features.

Python
from sklearn.feature_selection import SelectKBest, chi2

# Apply SelectKBest with chi2
select_k_best = SelectKBest(score_func=chi2, k=2)
X_train_k_best = select_k_best.fit_transform(X_train, y_train)

print("Selected features:", X_train.columns[select_k_best.get_support()])

Output:

Selected features: Index(['petal length (cm)', 'petal width (cm)'], dtype='object')
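
To see why those two features were chosen, you can inspect the per-feature chi-square scores that SelectKBest stores in its scores_ attribute after fitting — a small sketch continuing the example above:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

select_k_best = SelectKBest(score_func=chi2, k=2)
select_k_best.fit(X_train, y_train)

# Chi-square score per feature: higher means stronger dependence on the target
scores = pd.Series(select_k_best.scores_, index=X_train.columns)
print(scores.sort_values(ascending=False))
```

The two petal measurements score far above the sepal measurements, which is why k=2 keeps exactly those columns.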
Method 2: Recursive Feature Elimination

Next, we'll use RFE with a logistic regression model to select the top 2 features.

Python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Apply RFE with logistic regression
model = LogisticRegression(max_iter=200)  # raise max_iter so the solver converges without warnings
rfe = RFE(model, n_features_to_select=2)
X_train_rfe = rfe.fit_transform(X_train, y_train)

print("Selected features:", X_train.columns[rfe.get_support()])

Output:

Selected features: Index(['petal length (cm)', 'petal width (cm)'], dtype='object')
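
RFE also records the order in which features were eliminated in its ranking_ attribute: the selected features get rank 1, and higher ranks were dropped earlier. A small sketch continuing the example:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rfe = RFE(LogisticRegression(max_iter=200), n_features_to_select=2)
rfe.fit(X_train, y_train)

# Rank 1 = kept; larger ranks were eliminated in earlier rounds
ranking = pd.Series(rfe.ranking_, index=X_train.columns)
print(ranking.sort_values())
```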
Method 3: Tree-Based Feature Importance

Finally, we'll use a random forest classifier to determine feature importance.

Python
from sklearn.ensemble import RandomForestClassifier

# Train random forest and get feature importances
model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_

# Display feature importances
feature_importances = pd.Series(importances, index=X_train.columns)
print(feature_importances.sort_values(ascending=False))

Output:

petal length (cm)    0.480141
petal width (cm)     0.378693
sepal length (cm)    0.092960
sepal width (cm)     0.048206
dtype: float64
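
Beyond inspecting importances by hand, scikit-learn's SelectFromModel can turn them into a reusable selector. A minimal sketch using the mean importance as the cutoff (the random_state here is only an assumption added for reproducibility):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Keep only features whose importance exceeds the mean importance
selector = SelectFromModel(RandomForestClassifier(random_state=42), threshold="mean")
X_train_selected = selector.fit_transform(X_train, y_train)

print("Selected features:", list(X_train.columns[selector.get_support()]))
print("Reduced shape:", X_train_selected.shape)
```

With the importances above, only the two petal features clear the mean threshold, so the transformed training set keeps two columns.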
Conclusion

Feature selection is an essential part of the machine learning workflow. By selecting the most relevant features, we can build more efficient and accurate models. Scikit-Learn provides a variety of tools to help with feature selection, including univariate selection, recursive feature elimination, and feature importance from tree-based models. Implementing these techniques can significantly improve your model's performance and computational efficiency.

By following the steps outlined in this article, you can effectively perform feature selection in Python using Scikit-Learn, enhancing your machine learning projects and achieving better results.



RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4