All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, TargetEncoderBase, Params, HasHandleInvalid, HasInputCol, HasInputCols, HasLabelCol, HasOutputCol, HasOutputCols, DefaultParamsWritable, Identifiable, MLWritable
Target Encoding maps a column of categorical indices into a numerical feature derived from the target.
When handleInvalid is configured to 'keep', previously unseen values of a feature are mapped to the dataset's overall statistics.
When 'targetType' is configured to 'binary', categories are encoded as the conditional probability of the target given that category (bin counting). When 'targetType' is configured to 'continuous', categories are encoded as the average of the target given that category (mean encoding).
Parameter 'smoothing' controls how in-category statistics and overall statistics are weighted.
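The weighting between in-category and overall statistics can be sketched in plain Java, without Spark. This is a minimal illustration, not Spark's implementation: it assumes the common shrinkage form weight = n / (n + smoothing), where n is the number of rows in the category, and the names TargetEncodingSketch and encode are hypothetical. For a binary target with 0/1 labels, the per-category mean equals the conditional probability of the positive class, so the same arithmetic covers both target types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: illustrates smoothed target encoding.
class TargetEncodingSketch {

    // Returns a map from category index to its encoded value.
    // Assumed formula: weight * categoryMean + (1 - weight) * globalMean,
    // with weight = n / (n + smoothing).
    static Map<Integer, Double> encode(int[] categories, double[] labels, double smoothing) {
        if (categories.length != labels.length) {
            throw new IllegalArgumentException("categories and labels must have the same length");
        }
        Map<Integer, Double> sums = new HashMap<>();
        Map<Integer, Integer> counts = new HashMap<>();
        double globalSum = 0.0;
        for (int i = 0; i < categories.length; i++) {
            sums.merge(categories[i], labels[i], Double::sum);
            counts.merge(categories[i], 1, Integer::sum);
            globalSum += labels[i];
        }
        double globalMean = globalSum / labels.length;

        Map<Integer, Double> encodings = new HashMap<>();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            int n = entry.getValue();
            double categoryMean = sums.get(entry.getKey()) / n;
            // smoothing = 0 trusts the category fully; larger values pull toward the global mean.
            double weight = n / (n + smoothing);
            encodings.put(entry.getKey(), weight * categoryMean + (1.0 - weight) * globalMean);
        }
        return encodings;
    }

    public static void main(String[] args) {
        int[] categories = {0, 0, 0, 1};
        double[] labels = {1.0, 1.0, 0.0, 0.0};
        System.out.println(encode(categories, labels, 0.0)); // no smoothing
        System.out.println(encode(categories, labels, 1.0)); // shrink toward the global mean
    }
}
```

With smoothing set to 1.0 in this toy data, category 0 (three rows, mean 2/3) moves to roughly 0.625 and category 1 (one row, mean 0) moves to roughly 0.25, because the global mean is 0.5 and the smaller category is shrunk harder.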
See also: StringIndexer, for converting categorical values into category indices.
Note: With the inputCols and outputCols params, input/output cols come in pairs, specified by the order in the arrays, and each pair is treated independently.
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructors
TargetEncoder()
Methods
copy: Creates a copy of this instance with the same UID and some extra params.
fit: Fits a model to the input data.
handleInvalid: Param for how to handle invalid data during transform().
inputCol: Param for input column name.
inputCols: Param for input column names.
labelCol: Param for label column name.
outputCol: Param for output column name.
outputCols: Param for output column names.
transformSchema: Check transform validity and derive the output schema from the input schema.
uid: An immutable unique ID for the object and its derivatives.
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
public TargetEncoder()
Param for how to handle invalid data during transform(). Options are 'keep' (invalid data presented as an extra categorical feature) or 'error' (throw an error). Note that this Param is only used during transform; during fitting, invalid data will result in an error. Default: "error"
Specified by: handleInvalid in interface HasHandleInvalid
Specified by: handleInvalid in interface TargetEncoderBase
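The two handleInvalid options can be illustrated with a small lookup in plain Java, combining them with the class description's statement that 'keep' maps unseen values to the dataset's overall statistics. This is a sketch rather than Spark's code: the class, method, and parameter names are hypothetical, and the exception type is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: resolves one category at transform time.
class HandleInvalidSketch {

    // encodings: category index -> value learned during fitting.
    // globalStat: overall statistic of the target over the fitting data.
    static double lookup(Map<Integer, Double> encodings, double globalStat,
                         int category, String handleInvalid) {
        Double encoded = encodings.get(category);
        if (encoded != null) {
            return encoded; // category was seen during fitting
        }
        if ("keep".equals(handleInvalid)) {
            return globalStat; // unseen category falls back to the overall statistic
        }
        if ("error".equals(handleInvalid)) {
            throw new IllegalArgumentException("Unseen category: " + category);
        }
        throw new IllegalArgumentException("Unsupported handleInvalid value: " + handleInvalid);
    }

    public static void main(String[] args) {
        Map<Integer, Double> encodings = new HashMap<>();
        encodings.put(0, 0.625);
        encodings.put(1, 0.25);
        System.out.println(lookup(encodings, 0.5, 1, "keep")); // seen category
        System.out.println(lookup(encodings, 0.5, 7, "keep")); // unseen category
    }
}
```

With "error", the same call for category 7 would throw instead of returning the fallback, which matches the documented default.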
Specified by: targetType in interface TargetEncoderBase
Specified by: smoothing in interface TargetEncoderBase
Param for output column names.
Specified by: outputCols in interface HasOutputCols
Param for output column name.
Specified by: outputCol in interface HasOutputCol
Param for input column names.
Specified by: inputCols in interface HasInputCols
Param for input column name.
Specified by: inputCol in interface HasInputCol
Param for label column name.
Specified by: labelCol in interface HasLabelCol
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
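The distinction between checks on a single parameter value and checks on how parameters interact can be sketched in plain Java. The specific rules below (a non-negative smoothing value, equal lengths for the input and output column arrays, unique output names) are illustrative assumptions, not a transcription of what TargetEncoder.transformSchema verifies, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Spark: separates per-parameter checks
// from checks that involve more than one parameter.
class ParamInteractionSketch {

    // Single-parameter check: depends on this value only.
    static void validateSmoothing(double smoothing) {
        if (!(smoothing >= 0.0)) { // also rejects NaN
            throw new IllegalArgumentException("smoothing must be >= 0, got " + smoothing);
        }
    }

    // Interaction check: only meaningful when both arrays are examined together.
    static void validateColumnPairs(String[] inputCols, String[] outputCols) {
        if (inputCols.length != outputCols.length) {
            throw new IllegalArgumentException(
                "inputCols has " + inputCols.length + " entries but outputCols has "
                    + outputCols.length);
        }
        Set<String> seen = new HashSet<>();
        for (String name : outputCols) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Duplicate output column: " + name);
            }
        }
    }

    public static void main(String[] args) {
        validateSmoothing(1.0);
        validateColumnPairs(new String[]{"a", "b"}, new String[]{"a_enc", "b_enc"});
        System.out.println("parameters are consistent");
    }
}
```

Running the interaction check before any data is touched means a misconfigured estimator fails fast, which is the ordering the description above recommends.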
Fits a model to the input data.
Specified by: fit in class Estimator<TargetEncoderModel>
Parameters: dataset - (undocumented)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<TargetEncoderModel>
Parameters: extra - (undocumented)