Displaying Pipelines

The default configuration for displaying a pipeline in a Jupyter Notebook is 'diagram', which corresponds to set_config(display='diagram'). To deactivate the HTML representation, use set_config(display='text').
To see more detailed steps in the visualization of the pipeline, click on the steps in the pipeline.
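The display setting can also be inspected and changed temporarily. The following is a minimal sketch, assuming a scikit-learn release that exposes get_config and config_context next to set_config; the variable names are illustrative only.

```python
from sklearn import config_context, get_config, set_config

# Switch the global display mode to plain text.
set_config(display="text")
mode_after_set = get_config()["display"]

# Temporarily switch back to the diagram inside a context manager.
with config_context(display="diagram"):
    mode_inside = get_config()["display"]

# Leaving the block restores the previous global value.
mode_after_block = get_config()["display"]

# Put back the default so that later cells render diagrams again.
set_config(display="diagram")
print(mode_after_set, mode_inside, mode_after_block, get_config()["display"])
```

Using config_context keeps the change local to one block, which avoids having to remember to restore the global setting afterwards.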
# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

Displaying a Pipeline with a Preprocessing Step and Classifier

This section constructs a Pipeline with a preprocessing step, StandardScaler, and a classifier, LogisticRegression, and displays its visual representation.
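The construction described above might look like the following sketch. The step names preprocessing and classifier are taken from the representation printed in this section; the rest is an assumption about how the pipeline was assembled.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, estimator) pair; the names appear in the diagram.
steps = [
    ("preprocessing", StandardScaler()),
    ("classifier", LogisticRegression()),
]
pipe = Pipeline(steps)

# List the step names in order.
print(list(pipe.named_steps))
```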
The diagram is rendered by default, because display='diagram' is the default setting.
from sklearn import set_config

set_config(display="diagram")
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('preprocessing', StandardScaler()), ('classifier', LogisticRegression())])
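Outside a notebook, the same diagram can be obtained as an HTML string. This is a minimal sketch, assuming sklearn.utils.estimator_html_repr is available in the installed release; the output file name pipeline.html and the temporary directory are illustrative choices.

```python
import tempfile
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

pipe = Pipeline(
    [("preprocessing", StandardScaler()), ("classifier", LogisticRegression())]
)

# Render the diagram as a standalone HTML fragment.
html = estimator_html_repr(pipe)

# Write it to a file that can be opened in any browser.
out_path = Path(tempfile.mkdtemp()) / "pipeline.html"
out_path.write_text(html, encoding="utf-8")
print(out_path)
```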
To view the text representation of the pipeline, change the setting to display='text'.
Pipeline(steps=[('preprocessing', StandardScaler()), ('classifier', LogisticRegression())])
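To check what a notebook front end would receive in each mode, one option is to call the rich display hook directly. This sketch assumes that estimators implement _repr_mimebundle_, which is an implementation detail of the notebook integration rather than a documented public API, so it may change between releases.

```python
from sklearn import config_context
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(
    [("preprocessing", StandardScaler()), ("classifier", LogisticRegression())]
)

# In text mode only the plain-text representation is offered.
with config_context(display="text"):
    text_bundle = pipe._repr_mimebundle_()

# In diagram mode an HTML representation is offered as well.
with config_context(display="diagram"):
    diagram_bundle = pipe._repr_mimebundle_()

print(sorted(text_bundle))
print(sorted(diagram_bundle))
```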
Put back the default display with set_config(display="diagram").

Displaying a Pipeline Chaining Multiple Preprocessing Steps & Classifier

This section constructs a Pipeline with multiple preprocessing steps, PolynomialFeatures and StandardScaler, and a classifier step, LogisticRegression, and displays its visual representation.
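A pipeline matching this description could be built as in the sketch below. The step names and the parameters degree=3 and C=2.0 are taken from the representation printed in this section; the surrounding code is an assumption.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Steps run in order: scale, expand to polynomial features, then classify.
steps = [
    ("standard_scaler", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=3)),
    ("classifier", LogisticRegression(C=2.0)),
]
pipe = Pipeline(steps)
pipe  # in a notebook, this renders the diagram
```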
Pipeline(steps=[('standard_scaler', StandardScaler()), ('polynomial', PolynomialFeatures(degree=3)), ('classifier', LogisticRegression(C=2.0))])

This section constructs a Pipeline with a dimensionality reduction step, PCA, and a classifier, SVC, and displays its visual representation.
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

steps = [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
pipe = Pipeline(steps)
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('reduce_dim', PCA(n_components=4)), ('classifier', SVC(kernel='linear'))])
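The details that the diagram reveals on click can also be read programmatically. A minimal sketch, assuming the Pipeline indexing and slicing behaviour of current releases; the variable names are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline(
    [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
)

# Access a step by name or by position.
reducer = pipe["reduce_dim"]
classifier = pipe[-1]

# Slicing returns a new Pipeline holding the selected steps.
preprocessing_only = pipe[:-1]

print(reducer.n_components, classifier.kernel, len(preprocessing_only))
```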

This section constructs a complex Pipeline with a ColumnTransformer and a classifier, LogisticRegression, and displays its visual representation.
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))
pipe  # click on the diagram below to see the details of each step
Pipeline(steps=[('columntransformer', ColumnTransformer(transformers=[('categorical', Pipeline(steps=[('imputation_constant', SimpleImputer(fill_value='missing', strategy='constant')), ('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['state', 'gender']), ('numerical', Pipeline(steps=[('imputation_mean', SimpleImputer()), ('scaler', StandardScaler())]), ['age', 'weight'])])), ('logisticregression', LogisticRegression(max_iter=500))])
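The step names columntransformer and logisticregression in the representation above come from make_pipeline, which names each step after its lowercased class name. The sketch below rebuilds a pipeline of the same shape and lists some nested parameter names; it is an illustration under that assumption, and the value 0.5 passed to set_params is arbitrary.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    [("imputation_mean", SimpleImputer(strategy="mean")), ("scaler", StandardScaler())]
)
categorical_preprocessor = Pipeline(
    [
        ("imputation_constant", SimpleImputer(fill_value="missing", strategy="constant")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)
pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

# Step names are generated automatically from the class names.
step_names = list(pipe.named_steps)

# Nested parameters are addressed with double underscores.
params = pipe.get_params()
scaler_keys = sorted(k for k in params if k.startswith("columntransformer__numerical__scaler__"))

# The same naming scheme is used to change a nested parameter.
pipe.set_params(logisticregression__C=0.5)
print(step_names)
print(scaler_keys)
```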
[Interactive diagram: columntransformer (ColumnTransformer) with the categorical and numerical sub-pipelines, followed by LogisticRegression. Expanding a step lists its parameters.]

Displaying a Grid Search over a Pipeline with a Classifier

This section constructs a GridSearchCV over a Pipeline with RandomForestClassifier and displays its visual representation.
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

pipe = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier())]
)

param_grid = {
    "classifier__n_estimators": [200, 500],
    "classifier__max_features": ["sqrt", "log2"],
    "classifier__max_depth": [4, 5, 6, 7, 8],
    "classifier__criterion": ["gini", "entropy"],
}

grid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=1)
grid_search  # click on the diagram below to see the details of each step

GridSearchCV(estimator=Pipeline(steps=[('preprocessor', ColumnTransformer(transformers=[('categorical', Pipeline(steps=[('imputation_constant', SimpleImputer(fill_value='missing', strategy='constant')), ('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['state', 'gender']), ('numerical', Pipeline(steps=[('imputation_mean', SimpleImputer()), ('scaler', StandardScaler())]), ['age', 'weight'])])), ('classifier', RandomForestClassifier())]), n_jobs=1, param_grid={'classifier__criterion': ['gini', 'entropy'], 'classifier__max_depth': [4, 5, 6, 7, 8], 'classifier__max_features': ['sqrt', 'log2'], 'classifier__n_estimators': [200, 500]})
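Before launching a search, it can help to confirm that every grid key addresses a real pipeline parameter and to count the candidates. The sketch below is a simplified stand-in: it swaps the ColumnTransformer for a plain StandardScaler under the same step name, and restricts max_features to 'sqrt' and 'log2', the string values accepted by current releases. Both choices are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simplified stand-in for the pipeline of this section.
pipe = Pipeline(
    [("preprocessor", StandardScaler()), ("classifier", RandomForestClassifier())]
)

param_grid = {
    "classifier__n_estimators": [200, 500],
    "classifier__max_features": ["sqrt", "log2"],
    "classifier__max_depth": [4, 5, 6, 7, 8],
    "classifier__criterion": ["gini", "entropy"],
}

# Every key must match a parameter name reported by get_params().
valid_names = pipe.get_params()
unknown_keys = [key for key in param_grid if key not in valid_names]

# The number of candidates is the product of the list lengths.
n_candidates = len(ParameterGrid(param_grid))
print(unknown_keys, n_candidates)
```

Each candidate is fitted once per cross-validation fold, so the candidate count gives a quick estimate of the cost of the search.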
[Interactive diagram: preprocessor (ColumnTransformer) with the categorical and numerical sub-pipelines, followed by RandomForestClassifier. Expanding a step lists its parameters.]

Total running time of the script: (0 minutes 0.124 seconds)