
Showing content from http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/OneHotEncoder.html below:

OneHotEncoder (Spark 4.0.0 JavaDoc)

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, OneHotEncoderBase, Params, HasHandleInvalid, HasInputCol, HasInputCols, HasOutputCol, HasOutputCols, DefaultParamsWritable, Identifiable, MLWritable

A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to an output vector of [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it would make the vector entries sum to one, and hence linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
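The mapping described above can be sketched in plain Java without any Spark dependency. This is only an illustration of the documented behavior, not Spark's implementation: the class name OneHotSketch and the method encode are hypothetical, and the sketch returns a dense array for readability, whereas Spark's output vectors are sparse.

```java
import java.util.Arrays;

// Hypothetical illustration of the documented mapping; not part of Spark.
final class OneHotSketch {

    // Encodes a category index as a dense one-hot array.
    // With dropLast = true the last category has no slot, so it encodes as all zeros.
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vector = new double[size];
        if (index < size) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // 5 categories, dropLast = true (the documented default)
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        // Same input with dropLast = false keeps a slot for the last category
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

With dropLast disabled, every valid input produces a vector whose entries sum to one, which is the linear dependence the default avoids.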

Note:
This is different from scikit-learn's OneHotEncoder, which keeps all categories. The output vectors are sparse.

When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.
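A minimal plain-Java sketch of how the 'keep' setting interacts with dropLast, based only on the description above. The class KeepInvalidSketch and its method are hypothetical, and treating any index at or beyond the number of valid categories as "invalid" is a simplifying assumption made for this illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of handleInvalid = 'keep'; not part of Spark.
final class KeepInvalidSketch {

    // numValidCategories is the number of real categories. 'keep' appends one
    // extra category for invalid values, placed last at index numValidCategories.
    static double[] encode(int index, int numValidCategories, boolean dropLast) {
        if (index < 0) {
            throw new IllegalArgumentException("negative index: " + index);
        }
        // Assumption: any index beyond the valid range counts as invalid.
        int effectiveIndex = Math.min(index, numValidCategories);
        int totalCategories = numValidCategories + 1; // valid categories + invalid bucket
        int size = dropLast ? totalCategories - 1 : totalCategories;
        double[] vector = new double[size];
        if (effectiveIndex < size) {
            vector[effectiveIndex] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // 5 valid categories; 7 is an invalid index
        System.out.println(Arrays.toString(encode(7, 5, true)));  // [0.0, 0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(7, 5, false))); // [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
        // The last valid category keeps its own slot, because the dropped
        // category is the invalid bucket
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

Under this reading, the category dropped by dropLast is the invalid bucket rather than the last valid category, which is why invalid values come out as all zeros.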

When encoding multiple columns using the inputCols and outputCols params, input and output columns come in pairs, specified by their order in the arrays, and each pair is treated independently.

