Showing content from http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.mllib.feature.HashingTF.html below:

HashingTF — PySpark 4.0.0 documentation

HashingTF
class pyspark.mllib.feature.HashingTF(numFeatures=1048576)

Maps a sequence of terms to their term frequencies using the hashing trick.

New in version 1.2.0.

Parameters
numFeatures : int, optional

number of features (default: 2^20)

Notes

The terms must be hashable (they cannot be dict/set/list…).
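The hashability requirement can be checked up front with plain Python. The sketch below assumes "hashable" means accepted by Python's built-in `hash()`; the helper name `is_hashable` is hypothetical and is not part of PySpark.

```python
def is_hashable(term) -> bool:
    """Return True if Python's built-in hash() accepts the term."""
    try:
        hash(term)
    except TypeError:
        return False
    return True


# Strings, numbers and tuples of hashable items pass;
# lists, dicts and sets do not.
candidates = ["spark", 42, ("a", "b"), ["a", "b"], {"a": 1}, {"a"}]
results = [is_hashable(c) for c in candidates]
print(results)  # [True, True, True, False, False, False]
```

A term stored as a list can usually be converted to a tuple before it is passed in.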

Examples

>>> htf = HashingTF(100)
>>> doc = "a a b b c d".split(" ")
>>> htf.transform(doc)
SparseVector(100, {...})
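To make the hashing trick concrete, here is a minimal pure-Python sketch of the idea. It is not PySpark's implementation: the helper names `stable_index` and `term_frequencies` are hypothetical, and `zlib.crc32` is an assumed stand-in hash, chosen only because it gives the same bucket for a term on every run.

```python
import zlib
from collections import Counter


def stable_index(term: str, num_features: int) -> int:
    """Map a term to a bucket in [0, num_features) with a stable hash."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def term_frequencies(document, num_features=100):
    """Count how many terms of the document land in each bucket."""
    counts = Counter(stable_index(term, num_features) for term in document)
    # Store floats, as a term frequency vector would.
    return {index: float(count) for index, count in sorted(counts.items())}


doc = "a a b b c d".split(" ")
tf = term_frequencies(doc, num_features=100)
print(tf)  # sparse mapping: bucket index -> term count
```

Only the non-zero buckets are stored, which is why the result is naturally sparse. The counts always sum to the number of terms in the document.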

Methods

indexOf(term)

Returns the index of the input term.

setBinary(value)

If True, the term frequency vector is binary: every non-zero term count is set to 1 (default: False).

transform(document)

Transforms an input document (a list of terms) into a term frequency vector, or transforms an RDD of documents into an RDD of term frequency vectors.

Methods Documentation

indexOf(term)

Returns the index of the input term.

New in version 1.2.0.
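Because the index is a hash reduced to a fixed number of buckets, distinct terms can share an index. The sketch below illustrates this with a hypothetical `stable_index` helper built on `zlib.crc32`, which is an assumption and not the hash PySpark uses. Using fewer buckets than distinct terms guarantees at least one collision.

```python
import zlib


def stable_index(term: str, num_features: int) -> int:
    """Map a term to a bucket in [0, num_features) with a stable hash."""
    return zlib.crc32(term.encode("utf-8")) % num_features


terms = ["a", "b", "c", "d", "e"]
num_features = 4  # fewer buckets than distinct terms forces a collision

# Group the terms by the bucket they land in.
buckets = {}
for term in terms:
    buckets.setdefault(stable_index(term, num_features), []).append(term)

# Buckets that hold more than one term are collisions.
collisions = {i: ts for i, ts in buckets.items() if len(ts) > 1}
print(collisions)
```

A larger `numFeatures` makes collisions less likely at the cost of a longer vector.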

setBinary(value)

If True, the term frequency vector is binary: every non-zero term count is set to 1 (default: False).

New in version 2.0.0.
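The difference between counting and binary mode can be sketched in plain Python. This is an illustration under assumptions, not the library code: `stable_index` and `term_frequencies` are hypothetical helpers, and `zlib.crc32` stands in for the real hash function.

```python
import zlib


def stable_index(term: str, num_features: int) -> int:
    """Map a term to a bucket in [0, num_features) with a stable hash."""
    return zlib.crc32(term.encode("utf-8")) % num_features


def term_frequencies(document, num_features=100, binary=False):
    """Build a bucket -> value mapping; binary mode caps every value at 1."""
    freq = {}
    for term in document:
        i = stable_index(term, num_features)
        freq[i] = 1.0 if binary else freq.get(i, 0.0) + 1.0
    return freq


doc = "a a b b c d".split(" ")
counts = term_frequencies(doc)              # repeated terms accumulate
flags = term_frequencies(doc, binary=True)  # presence only
print(counts)
print(flags)
```

Both modes populate the same buckets. Only the stored values differ, which suits models that expect presence indicators instead of counts.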

transform(document)

Transforms an input document (a list of terms) into a term frequency vector, or transforms an RDD of documents into an RDD of term frequency vectors.

New in version 1.2.0.

