All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, OneHotEncoderBase, Params, HasHandleInvalid, HasInputCol, HasInputCols, HasOutputCol, HasOutputCols, DefaultParamsWritable, Identifiable, MLWritable
A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to the output vector [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it would make the vector entries sum to one and hence be linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
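The mapping described above can be sketched in plain Java. This is an illustrative re-implementation, not Spark's code: the class name OneHotSketch, the method encode, and the parameter numCategories are hypothetical, and dense arrays are used only for readability.

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and this is not Spark's implementation. */
final class OneHotSketch {

    /**
     * Encodes a category index as a dense one-hot vector.
     * With dropLast = true the vector has numCategories - 1 entries,
     * so the last category is represented by an all-zeros vector.
     */
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("Category index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vector = new double[size];
        // The dropped last category has no slot, so nothing is set for it.
        if (index < size) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

The first two calls reproduce the 5-category example from the description; the third shows the same input with dropLast disabled.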
See also: StringIndexer, for converting categorical values into category indices.
Note: When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.
Note: When encoding multiple columns using the inputCols and outputCols params, input and output columns come in pairs, specified by their order in the arrays, and each pair is treated independently.
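How handleInvalid and dropLast interact can be sketched as follows. This is a minimal plain-Java illustration under stated assumptions, not Spark's implementation: the class HandleInvalidSketch, the method encode, and the parameter numCategories are hypothetical, and any index outside [0, numCategories) is treated as invalid.

```java
/** Illustrative sketch only; names are hypothetical and this is not Spark's implementation. */
final class HandleInvalidSketch {

    /**
     * Encodes a category index, treating any index outside [0, numCategories) as invalid.
     * With handleInvalid = "keep", invalid values go to an extra last category;
     * with any other value (for example "error") they raise an exception.
     */
    static double[] encode(int index, int numCategories, boolean dropLast, String handleInvalid) {
        boolean keep = "keep".equals(handleInvalid);
        boolean valid = index >= 0 && index < numCategories;
        if (!valid && !keep) {
            throw new IllegalArgumentException("Invalid category index: " + index);
        }
        // Invalid values are mapped to the extra category appended after the real ones.
        int effectiveIndex = valid ? index : numCategories;
        int categories = keep ? numCategories + 1 : numCategories;
        int size = dropLast ? categories - 1 : categories;
        double[] vector = new double[size];
        // When dropLast removes the extra category, invalid values stay all zeros.
        if (effectiveIndex < size) {
            vector[effectiveIndex] = 1.0;
        }
        return vector;
    }
}
```

With 5 categories, 'keep', and dropLast set to true, the vector has 5 entries: the valid index 4 gets its own slot, while an invalid value is the dropped last category and encodes as all zeros.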
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructors
OneHotEncoder()
Methods
copy: Creates a copy of this instance with the same UID and some extra params.
dropLast: Whether to drop the last category in the encoded vector (default: true)
fit: Fits a model to the input data.
handleInvalid: Param for how to handle invalid data during transform().
inputCol: Param for input column name.
inputCols: Param for input column names.
outputCol: Param for output column name.
outputCols: Param for output column names.
transformSchema: Check transform validity and derive the output schema from the input schema.
uid: An immutable unique ID for the object and its derivatives.
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Methods inherited from interface org.apache.spark.ml.util.MLWritable: save
Methods inherited from interface org.apache.spark.ml.param.Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
public OneHotEncoder()
Param for how to handle invalid data during transform(). Options are 'keep' (invalid data presented as an extra categorical feature) or 'error' (throw an error). Note that this Param is only used during transform; during fitting, invalid data will result in an error. Default: "error"
Specified by: handleInvalid in interface HasHandleInvalid
Specified by: handleInvalid in interface OneHotEncoderBase
Whether to drop the last category in the encoded vector (default: true)
Specified by: dropLast in interface OneHotEncoderBase
Param for output column names.
Specified by: outputCols in interface HasOutputCols
Param for output column name.
Specified by: outputCol in interface HasOutputCol
Param for input column names.
Specified by: inputCols in interface HasInputCols
Param for input column name.
Specified by: inputCol in interface HasInputCol
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
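A parameter interaction check of the kind described for transformSchema might look like the sketch below. The specific rules are assumptions chosen for illustration (exactly one of the single-column or multi-column params is set, and inputCols and outputCols have the same length); the page does not list the checks Spark actually performs. The class ParamInteractionSketch and the method checkColumnParams are hypothetical.

```java
import java.util.List;

/** Illustrative sketch only; the rules below are assumptions, not Spark's documented checks. */
final class ParamInteractionSketch {

    /**
     * Validates how the column params relate to each other.
     * A null argument stands for "param not set".
     */
    static void checkColumnParams(String inputCol, String outputCol,
                                  List<String> inputCols, List<String> outputCols) {
        boolean single = inputCol != null;
        boolean multi = inputCols != null;
        // Assumed rule: single-column and multi-column params are mutually exclusive.
        if (single == multi) {
            throw new IllegalArgumentException("Exactly one of inputCol and inputCols must be set");
        }
        if (single && outputCol == null) {
            throw new IllegalArgumentException("outputCol must be set when inputCol is set");
        }
        // Assumed rule: columns are paired by position, so the arrays must line up.
        if (multi && (outputCols == null || inputCols.size() != outputCols.size())) {
            throw new IllegalArgumentException("inputCols and outputCols must have the same length");
        }
    }
}
```

Checks like these depend on more than one parameter at once, which is why they would belong in schema validation rather than in a per-parameter validator.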
Fits a model to the input data.
Specified by: fit in class Estimator<OneHotEncoderModel>
Parameters: dataset - (undocumented)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<OneHotEncoderModel>
Parameters: extra - (undocumented)