
Source: http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.mllib.feature.HashingTF.html

HashingTF

class pyspark.mllib.feature.HashingTF(numFeatures=1048576)

Maps a sequence of terms to their term frequencies using the hashing trick.

New in version 1.2.0.
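The hashing trick can be sketched in plain Python as below. This is an illustrative re-implementation, not PySpark's code. It assumes a term's index is its hash reduced modulo numFeatures, and it uses zlib.crc32 (an assumption, chosen only so results are reproducible across runs) in place of whatever hash PySpark applies. The helpers term_index and hashing_tf are hypothetical.

```python
import zlib
from collections import defaultdict


def term_index(term, num_features):
    # Stable hash (CRC-32 of the UTF-8 bytes) reduced modulo the vector size.
    # Assumption: PySpark's own hash function may differ.
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashing_tf(document, num_features=1 << 20):
    """Hypothetical helper: map a list of terms to a sparse {index: count} dict."""
    freq = defaultdict(float)
    for term in document:
        # Each occurrence adds 1.0 to the slot its term hashes to.
        freq[term_index(term, num_features)] += 1.0
    return dict(freq)


doc = "a a b b c d".split(" ")
tf = hashing_tf(doc, num_features=100)
print(sum(tf.values()))  # 6.0: every term occurrence is counted exactly once
```

Only the non-zero slots are stored, which is why the result is naturally a sparse vector; if two distinct terms land on the same index, their counts are added together. Note that 1 << 20 equals the default numFeatures of 1048576.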

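Parameters

numFeatures : int, optional
    Number of features, i.e. the length of the output term frequency vectors (default: 2^20 = 1048576).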

Notes

The terms must be hashable (they cannot be dicts, sets, lists, etc.).
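The restriction follows from the hashing step: an index can only be computed for a term that can be hashed, and Python refuses to hash mutable containers. A quick check with the built-in hash, assuming PySpark relies on Python hashing of each term; is_valid_term is a hypothetical helper, not part of PySpark.

```python
def is_valid_term(term):
    """Hypothetical helper: True if `term` can be hashed, and so can be indexed."""
    try:
        hash(term)
    except TypeError:
        return False
    return True


print(is_valid_term("spark"))   # True: strings are hashable
print(is_valid_term(("a", 1)))  # True: tuples of hashable items are hashable
print(is_valid_term(["a", 1]))  # False: lists are mutable and unhashable
print(is_valid_term({"a": 1}))  # False: dicts are unhashable
print(is_valid_term({"a"}))     # False: sets are unhashable
```

If a document's terms are themselves lists (for example, n-grams), converting each one to a tuple or a joined string makes it hashable.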

Examples

>>> from pyspark.mllib.feature import HashingTF
>>> htf = HashingTF(100)
>>> doc = "a a b b c d".split(" ")
>>> htf.transform(doc)
SparseVector(100, {...})

Methods

indexOf(term): Returns the index of the input term.

setBinary(value): If True, the term frequency vector will be binary, with every non-zero term count set to 1 (default: False).

transform(document): Transforms the input document (a list of terms) into a term frequency vector, or an RDD of documents into an RDD of term frequency vectors.

Methods Documentation

indexOf(term)

Returns the index of the input term in the term frequency vector, an integer in [0, numFeatures).

New in version 1.2.0.
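Because the index is reduced modulo numFeatures, it always falls in [0, numFeatures), and when there are more distinct terms than slots, some terms must share an index (a hash collision). A small sketch, assuming index = hash modulo numFeatures with zlib.crc32 standing in for PySpark's hash; index_of is a hypothetical helper.

```python
import zlib


def index_of(term, num_features):
    # Assumed scheme: stable CRC-32 hash reduced modulo the number of features.
    return zlib.crc32(term.encode("utf-8")) % num_features


terms = ["apple", "banana", "cherry"]
indices = [index_of(t, 2) for t in terms]
print(indices)
# Three distinct terms but only two slots: by the pigeonhole principle,
# at least two of them must share an index.
print(len(set(indices)) < len(terms))  # True
```

This is one reason the default numFeatures is large (2^20): a longer vector makes collisions rarer, at the cost of a higher-dimensional feature space.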

setBinary(value)

If True, the term frequency vector will be binary: every non-zero term count is set to 1 (default: False).

New in version 2.0.0.
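The effect of binary mode on a count vector can be sketched as follows; to_binary is a hypothetical helper operating on a sparse {index: count} dict, not part of PySpark.

```python
def to_binary(freq):
    """Hypothetical helper: set every non-zero term count to 1.0 and drop zeros."""
    return {i: 1.0 for i, count in freq.items() if count != 0}


counts = {3: 2.0, 7: 1.0, 42: 5.0}
print(to_binary(counts))  # {3: 1.0, 7: 1.0, 42: 1.0}
```

With the real class, binary mode is switched on by calling htf.setBinary(True) before transform, so each document vector records only which hashed terms are present, not how often they occur.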

transform(document)

Transforms the input document (a list of terms) into a term frequency vector, or transforms an RDD of documents into an RDD of term frequency vectors.

New in version 1.2.0.
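Conceptually, the RDD case applies the same per-document transform to every element of the RDD. The plain-Python sketch below mirrors that with a local list of documents standing in for an RDD (an assumption made so the example runs without a Spark cluster). term_frequencies is a hypothetical helper, and zlib.crc32 stands in for PySpark's hash.

```python
import zlib
from collections import Counter


def term_frequencies(document, num_features=1 << 20):
    """Hypothetical helper: one document (list of terms) -> sparse {index: count}."""
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in document)
    return {i: float(c) for i, c in counts.items()}


corpus = [
    "a a b b c d".split(" "),
    "spark mllib feature hashing".split(" "),
]
# A local list comprehension stands in for mapping over an RDD, which Spark
# would instead evaluate in parallel across the cluster.
vectors = [term_frequencies(doc, num_features=100) for doc in corpus]
print(len(vectors))  # 2: one term frequency vector per document
```

With PySpark itself, the equivalent call is htf.transform(rdd), where each element of rdd is a list of terms; the result is an RDD of term frequency vectors.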
