Showing content from https://github.com/mlr-org/mlr3filters below:

mlr-org/mlr3filters: Filter-based feature selection for mlr3

Package website: release | dev

{mlr3filters} adds feature selection filters to mlr3. The implemented filters can be used stand-alone, or as part of a machine learning pipeline in combination with mlr3pipelines and the filter operator.

Wrapper methods for feature selection are implemented in mlr3fselect. Learners that support the extraction of feature importance scores can be combined with a filter from this package for embedded feature selection.

CRAN version

install.packages("mlr3filters")

Development version

remotes::install_github("mlr-org/mlr3filters")

set.seed(1)
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
head(as.data.table(filter$calculate(task)))
##    feature     score
## 1:     V11 0.2811368
## 2:     V12 0.2429182
## 3:     V10 0.2327018
## 4:     V49 0.2312622
## 5:      V9 0.2308442
## 6:     V48 0.2062784
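
Used stand-alone, the scores can drive a manual feature selection. The following is a minimal sketch, assuming that `filter$scores` holds the scores as a named vector sorted in decreasing order; the cut-off of five features and the variable name `top5` are arbitrary choices made for illustration.

```r
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
filter$calculate(task)

# names of the five highest-scoring features (cut-off chosen for illustration)
top5 = head(names(filter$scores), 5)

# restrict the task to those features; this modifies `task` in place
task$select(top5)
print(task$feature_names)
```

Because `task$select()` changes the task by reference, clone the task first (`task$clone()`) if the full feature set is still needed afterwards.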

| Name | Label | Task Types | Feature Types | Package |
|---|---|---|---|---|
| anova | ANOVA F-Test | Classif | Integer, Numeric | stats |
| auc | Area Under the ROC Curve Score | Classif | Integer, Numeric | mlr3measures |
| carscore | Correlation-Adjusted coRrelation Score | Regr | Logical, Integer, Numeric | care |
| carsurvscore | Correlation-Adjusted coRrelation Survival Score | Surv | Integer, Numeric | carSurv, mlr3proba |
| cmim | Minimal Conditional Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| correlation | Correlation | Regr | Integer, Numeric | stats |
| disr | Double Input Symmetrical Relevance | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| find_correlation | Correlation-based Score | Universal | Integer, Numeric | stats |
| importance | Importance Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| information_gain | Information Gain | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| jmi | Joint Mutual Information | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| jmim | Minimal Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| kruskal_test | Kruskal-Wallis Test | Classif | Integer, Numeric | stats |
| mim | Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| mrmr | Minimum Redundancy Maximal Relevancy | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| njmim | Minimal Normalised Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| performance | Predictive Performance | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| permutation | Permutation Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| relief | RELIEF | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| selected_features | Embedded Feature Selection | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| univariate_cox | Univariate Cox Survival Score | Surv | Integer, Numeric, Logical | survival |
| variance | Variance | Universal | Integer, Numeric | stats |

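The list of filters can also be looked up at runtime. The sketch below assumes the `mlr_filters` dictionary exported by the package and the `feature_types` and `packages` fields of a filter object; the exact keys and columns returned depend on the installed version.

```r
library("mlr3")
library("mlr3filters")

# keys of all filters registered in the dictionary
keys = mlr_filters$keys()
print(keys)

# tabular overview of the dictionary
print(as.data.table(mlr_filters))

# metadata of a single filter
f = flt("anova")
print(f$feature_types)
print(f$packages)
```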
Variable Importance Filters

The following learners allow the extraction of variable importance and therefore are supported by FilterImportance:

## [1] "classif.featureless" "classif.ranger"      "classif.rpart"      
## [4] "classif.xgboost"     "regr.featureless"    "regr.ranger"        
## [7] "regr.rpart"          "regr.xgboost"

If your learner is not listed here but can extract variable importance from the fitted model, it has most likely not yet been integrated into the mlr3learners package or the extra learner extension. Please open an issue so we can add your package.
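
A list like the one above can be reproduced for the learners installed locally. This is a sketch that assumes the `properties` list column returned by `as.data.table(mlr_learners)`; the result depends on which learner packages are loaded, so with only mlr3 attached it will be shorter than the list shown here. The names `tab`, `has_importance`, and `importance_learners` are illustrative.

```r
library("mlr3")

# one row per registered learner; `properties` is a list column
tab = as.data.table(mlr_learners)

# flag learners that advertise the "importance" property
has_importance = vapply(tab$properties,
  function(p) "importance" %in% p, logical(1))

importance_learners = tab$key[has_importance]
print(importance_learners)
```

Loading mlr3learners (or another learner extension) before running the snippet adds that package's learners to the dictionary.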

Some learners need to have their variable importance measure “activated” during learner creation. For example, to use the “impurity” measure of Random Forest via the {ranger} package:

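task = tsk("iris")
# set the importance measure at construction; assigning a new list to
# `param_set$values` afterwards would discard `seed`
# (exact scores depend on the seed and the package versions)
learner = lrn("classif.ranger", seed = 42, importance = "impurity")

filter = flt("importance", learner = learner)
filter$calculate(task)
head(as.data.table(filter), 3)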
##         feature     score
## 1: Petal.Length 44.682462
## 2:  Petal.Width 43.113031
## 3: Sepal.Length  9.039099

FilterPerformance is a univariate filter that calls resample() once per predictor variable, using that variable as the only feature, and ranks the variables by the resulting score of the supplied measure. Any learner can be passed to this filter; classif.rpart is the default. Regression learners can be passed as well if the task is of type “regr”.
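
A minimal sketch of the performance filter follows. The learner, the 3-fold cross-validation, and the accuracy measure are choices made for this illustration and are passed explicitly rather than relying on defaults.

```r
library("mlr3")
library("mlr3filters")

set.seed(1)
task = tsk("iris")

# one resampling run per feature, scored by classification accuracy
filter = flt("performance",
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"))

filter$calculate(task)
scores = as.data.table(filter)
print(scores)
```

Since every feature triggers its own resampling run, the runtime grows linearly with the number of features; a cheap learner and a small number of folds keep the filter practical on wide data.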

Filter-based Feature Selection

In many cases filtering is only one step in the modeling pipeline. To select features based on filter values, one can use PipeOpFilter from mlr3pipelines.

library(mlr3pipelines)
task = tsk("spam")

# the `filter.frac` should be tuned
graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))

learner = as_learner(graph)
rr = resample(task, learner, rsmp("holdout"))
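
To illustrate why `filter.frac` matters, the sketch below compares a few hand-picked values on identical cross-validation folds. It is a manual comparison for illustration only, not a substitute for proper tuning; the candidate values, the 3-fold resampling, and the names `fracs` and `errors` are assumptions of this example.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

set.seed(1)
task = tsk("spam")

# fix the folds once so that all candidates see the same splits
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

fracs = c(0.25, 0.5, 0.75)
errors = vapply(fracs, function(frac) {
  graph = po("filter", filter = flt("auc"), filter.frac = frac) %>>%
    po("learner", lrn("classif.rpart"))
  rr = resample(task, as_learner(graph), resampling)
  # mean classification error across the folds
  rr$aggregate(msr("classif.ce"))
}, numeric(1))

print(data.frame(filter.frac = fracs, classif.ce = errors))
```

Because the filter is part of the graph, its scores are recomputed on each training split, so the feature selection does not see the test data.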
