All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, SelectorParams, Params, HasFeaturesCol, HasLabelCol, HasOutputCol, DefaultParamsWritable, Identifiable, MLWritable
Chi-Squared feature selection, which selects categorical features to use for predicting a categorical label. The selector supports several selection methods: numTopFeatures, percentile, fpr, fdr, and fwe.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr uses the Benjamini-Hochberg procedure (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose features so that the expected false discovery rate of the selection stays below a threshold.
- fwe chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
By default, the selection method is numTopFeatures, with the default number of top features set to 50.
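Every selection method above works from per-feature p-values produced by a chi-squared test, which compares the observed counts of each (feature value, label) pair with the counts expected if feature and label were independent. The following is an independent Java sketch of that statistic, not Spark's implementation; the class ChiSquareSketch is hypothetical, it assumes a contingency table with no all-zero row or column, and it stops at the statistic and degrees of freedom because converting them into a p-value needs the chi-squared CDF from a statistics library.

```java
/**
 * Hypothetical sketch: Pearson's chi-squared test of independence between one
 * categorical feature and a categorical label, given their contingency table.
 */
final class ChiSquareSketch {

    /** counts[i][j] = number of rows with feature value i and label value j. */
    static double statistic(long[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSums = new double[rows];
        double[] colSums = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSums[i] += counts[i][j];
                colSums[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double stat = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count if feature and label were independent.
                // Assumes no all-zero row or column, so expected > 0.
                double expected = rowSums[i] * colSums[j] / total;
                double diff = counts[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    /** (number of feature values - 1) * (number of label values - 1). */
    static int degreesOfFreedom(long[][] counts) {
        return (counts.length - 1) * (counts[0].length - 1);
    }

    public static void main(String[] args) {
        long[][] table = {{10, 20}, {30, 40}};
        System.out.println(statistic(table));        // about 0.794 (exactly 50/63)
        System.out.println(degreesOfFreedom(table)); // 1
    }
}
```

A larger statistic for the same degrees of freedom means a smaller p-value, so that feature ranks higher under numTopFeatures and percentile and is more likely to pass the fpr, fdr, and fwe thresholds.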
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructors
Deprecated.
Creates a copy of this instance with the same UID and some extra params.
The upper bound of the expected false discovery rate.
Param for features column name.
Deprecated.
Fits a model to the input data.
The highest p-value for features to be kept.
The upper bound of the expected family-wise error rate.
Param for label column name.
Number of features that the selector will select, ordered by ascending p-value.
Param for output column name.
Percentile of features that the selector will select, ordered by ascending p-value.
Deprecated.
Check transform validity and derive the output schema from the input schema.
Deprecated.
An immutable unique ID for the object and its derivatives.
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Deprecated.
public ChiSqSelector()
Deprecated.
Deprecated.
Deprecated.
Deprecated.
An immutable unique ID for the object and its derivatives.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Deprecated.
Fits a model to the input data.
Parameters: dataset - (undocumented)
Deprecated.
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
Parameters: schema - (undocumented)
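To make the schema step concrete, here is a hypothetical Java sketch that models a schema as an ordered list of column names. It assumes the check requires the features and label columns to be present and the output column to be absent, and it derives the output schema by appending outputCol; the class SchemaSketch is invented for illustration, and a real Spark schema also carries column types, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: validate column names, then derive the output "schema". */
final class SchemaSketch {

    static List<String> transformSchema(List<String> inputColumns, String featuresCol,
                                        String labelCol, String outputCol) {
        // Validity checks come first: required input columns must exist...
        require(inputColumns.contains(featuresCol), "Missing features column: " + featuresCol);
        require(inputColumns.contains(labelCol), "Missing label column: " + labelCol);
        // ...and the new column must not overwrite an existing one.
        require(!inputColumns.contains(outputCol), "Output column already exists: " + outputCol);

        // Derived schema: every input column, followed by the output column.
        List<String> outputColumns = new ArrayList<>(inputColumns);
        outputColumns.add(outputCol);
        return outputColumns;
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        // Prints [features, label, selected]
        System.out.println(transformSchema(List.of("features", "label"), "features", "label", "selected"));
    }
}
```

Doing these checks on the schema alone means a misconfigured selector fails before any data is read.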
Deprecated.
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
The upper bound of the expected false discovery rate. Only applicable when selectorType = "fdr". Default value is 0.05.
Specified by: fdr in interface SelectorParams
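Under selectorType = "fdr", the fdr value plays the role of alpha in the Benjamini-Hochberg step-up procedure: sort the p-values, find the largest rank k with p_(k) <= alpha * k / n, and keep the k features with the smallest p-values. A minimal Java sketch over precomputed per-feature p-values follows; the class BenjaminiHochbergSketch is hypothetical, and it is not claimed to match Spark's implementation in tie handling or rounding.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of Benjamini-Hochberg selection over per-feature p-values. */
final class BenjaminiHochbergSketch {

    /** Returns the indices of the selected features in ascending index order. */
    static int[] select(double[] pValues, double fdr) {
        int n = pValues.length;
        // Feature indices ordered by ascending p-value (stable sort keeps ties in index order).
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> pValues[idx]));

        // Step-up rule: largest 1-based rank k with p_(k) <= fdr * k / n.
        int cutoff = 0;
        for (int k = 1; k <= n; k++) {
            if (pValues[order[k - 1]] <= fdr * k / n) {
                cutoff = k;
            }
        }

        // Keep every feature ranked at or above the cutoff, including ones that
        // individually missed their own rank's threshold.
        int[] selected = new int[cutoff];
        for (int k = 0; k < cutoff; k++) {
            selected[k] = order[k];
        }
        Arrays.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        double[] pValues = {0.5, 0.035, 0.01, 0.03};
        System.out.println(Arrays.toString(select(pValues, 0.05))); // [1, 2, 3]
    }
}
```

With alpha = 0.05, the feature with p = 0.03 fails its own rank-2 threshold of 0.025 but is still kept, because the rank-3 feature (p = 0.035 <= 0.0375) moves the cutoff past it. That step-up behavior is what separates this procedure from a fixed per-feature threshold.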
Param for features column name.
Specified by: featuresCol in interface HasFeaturesCol
The highest p-value for features to be kept. Only applicable when selectorType = "fpr". Default value is 0.05.
Specified by: fpr in interface SelectorParams
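With selectorType = "fpr" the rule is a plain cutoff applied to each feature's p-value independently. The Java sketch below assumes the p-values have already been computed by the chi-squared test and uses a strict comparison to mirror "below a threshold"; the class FprSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch of selectorType = "fpr": keep p-values below the threshold. */
final class FprSketch {

    static int[] select(double[] pValues, double fpr) {
        // Each feature is tested on its own; no correction for the number of features.
        return IntStream.range(0, pValues.length)
                .filter(i -> pValues[i] < fpr)
                .toArray();
    }

    public static void main(String[] args) {
        double[] pValues = {0.2, 0.01, 0.07, 0.049};
        System.out.println(Arrays.toString(select(pValues, 0.05))); // [1, 3]
    }
}
```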
The upper bound of the expected family-wise error rate. Only applicable when selectorType = "fwe". Default value is 0.05.
Specified by: fwe in interface SelectorParams
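The fwe rule differs from a plain p-value cutoff only in that the threshold is divided by the number of features (a Bonferroni-style correction), as the class description states. A hypothetical Java sketch, again taking precomputed p-values and using a strict comparison; the class FweSketch is invented for illustration:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch of selectorType = "fwe": threshold scaled by 1 / numFeatures. */
final class FweSketch {

    static int[] select(double[] pValues, double fwe) {
        // Scale the threshold by 1 / numFeatures.
        double threshold = fwe / pValues.length;
        return IntStream.range(0, pValues.length)
                .filter(i -> pValues[i] < threshold)
                .toArray();
    }

    public static void main(String[] args) {
        double[] pValues = {0.2, 0.01, 0.07, 0.049};
        System.out.println(Arrays.toString(select(pValues, 0.05))); // [1]
    }
}
```

Here the threshold becomes 0.05 / 4 = 0.0125, so only feature 1 survives, whereas an unscaled cutoff of 0.05 would also keep feature 3 (p = 0.049).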
Param for label column name.
Specified by: labelCol in interface HasLabelCol
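Number of features that the selector will select, ordered by ascending p-value. Only applicable when selectorType = "numTopFeatures". Default value is 50.
Specified by: numTopFeatures in interface SelectorParams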
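numTopFeatures ranks features by ascending p-value and keeps the first numTopFeatures of them. The Java sketch below is hypothetical (the class NumTopFeaturesSketch is invented); it assumes that asking for more features than exist simply keeps all of them and that ties keep the lower feature index first, neither of which is stated in the parameter description.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of selectorType = "numTopFeatures". */
final class NumTopFeaturesSketch {

    static int[] select(double[] pValues, int numTopFeatures) {
        Integer[] order = new Integer[pValues.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Ascending p-value; Arrays.sort on objects is stable, so ties keep index order.
        Arrays.sort(order, Comparator.comparingDouble(idx -> pValues[idx]));
        // Assumption: asking for more features than exist keeps all of them.
        int k = Math.min(numTopFeatures, pValues.length);
        int[] selected = new int[k];
        for (int j = 0; j < k; j++) {
            selected[j] = order[j];
        }
        Arrays.sort(selected); // report in original feature order
        return selected;
    }

    public static void main(String[] args) {
        double[] pValues = {0.2, 0.01, 0.07, 0.049};
        System.out.println(Arrays.toString(select(pValues, 2)));  // [1, 3]
        System.out.println(Arrays.toString(select(pValues, 50))); // [0, 1, 2, 3]
    }
}
```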
Param for output column name.
Specified by: outputCol in interface HasOutputCol
Percentile of features that the selector will select, ordered by ascending p-value. Only applicable when selectorType = "percentile". Default value is 0.1.
Specified by: percentile in interface SelectorParams
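percentile turns a fraction of the feature count into a number of features and then ranks by ascending p-value, like numTopFeatures. How that count is rounded is not stated here; the hypothetical Java helper below (PercentileSketch is an invented name) assumes truncation toward zero, so with the default of 0.1 a 50-feature input keeps 5 features and a 9-feature input keeps none.

```java
/** Hypothetical sketch of the feature count implied by selectorType = "percentile". */
final class PercentileSketch {

    /** Number of top-ranked features kept (assumes truncation toward zero). */
    static int numSelected(int numFeatures, double percentile) {
        // The cast truncates toward zero, e.g. 25 * 0.1 = 2.5 -> 2.
        return (int) (numFeatures * percentile);
    }

    public static void main(String[] args) {
        System.out.println(numSelected(50, 0.1)); // 5
        System.out.println(numSelected(25, 0.1)); // 2
        System.out.println(numSelected(9, 0.1));  // 0
    }
}
```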
The selector type. Supported options: "numTopFeatures" (default), "percentile", "fpr", "fdr", "fwe".
Specified by: selectorType in interface SelectorParams
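Each selectorType consults exactly one of the parameters above, the one that shares its name. The hypothetical Java check below (SelectorTypeCheck is an invented name) rejects unsupported selector types and validates only the active parameter. The ranges it enforces (numTopFeatures at least 1, the others within [0, 1]) are assumptions drawn from what each parameter means, not copied from Spark's validators.

```java
/** Hypothetical check of selectorType and the single parameter it uses. */
final class SelectorTypeCheck {

    static void validate(String selectorType, int numTopFeatures, double percentile,
                         double fpr, double fdr, double fwe) {
        switch (selectorType) {
            case "numTopFeatures":
                require(numTopFeatures >= 1, "numTopFeatures must be at least 1");
                break;
            case "percentile":
                require(inUnitInterval(percentile), "percentile must be in [0, 1]");
                break;
            case "fpr":
                require(inUnitInterval(fpr), "fpr must be in [0, 1]");
                break;
            case "fdr":
                require(inUnitInterval(fdr), "fdr must be in [0, 1]");
                break;
            case "fwe":
                require(inUnitInterval(fwe), "fwe must be in [0, 1]");
                break;
            default:
                throw new IllegalArgumentException("Unsupported selectorType: " + selectorType);
        }
    }

    private static boolean inUnitInterval(double v) {
        return v >= 0.0 && v <= 1.0;
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        // Defaults from the parameter descriptions: 50, 0.1, 0.05, 0.05, 0.05.
        validate("numTopFeatures", 50, 0.1, 0.05, 0.05, 0.05); // passes
        try {
            validate("chi2", 50, 0.1, 0.05, 0.05, 0.05);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unsupported selectorType: chi2
        }
    }
}
```

Checking only the active parameter means an out-of-range value for an unused parameter (say, fwe while selectorType is "fdr") is ignored in this sketch.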