Showing content from http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/CountVectorizerParams.html below:

CountVectorizerParams (Spark 4.0.0 JavaDoc)

All Superinterfaces: HasInputCol, HasOutputCol, Identifiable, Params, Serializable

All Known Implementing Classes: CountVectorizer, CountVectorizerModel

Method Summary
- binary(): Binary toggle to control the output vector values.
- boolean getBinary()
- double getMaxDF()
- double getMinDF()
- double getMinTF()
- int getVocabSize()
- maxDF(): Specifies the maximum number of different documents a term could appear in to be included in the vocabulary.
- minDF(): Specifies the minimum number of different documents a term must appear in to be included in the vocabulary.
- minTF(): Filter to ignore rare words in a document.
- validateAndTransformSchema: Validates and transforms the input schema.
- vocabSize(): Max size of the vocabulary.

Methods inherited from interface org.apache.spark.ml.param.Params: clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Method Details
- binary
  Binary toggle to control the output vector values. If True, all nonzero counts (after minTF filter applied) are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts. Default: false
  
  Returns:
  
  (undocumented)
- getBinary
  boolean getBinary()
- getMaxDF
  double getMaxDF()
- getMinDF
  double getMinDF()
- getMinTF
  double getMinTF()
- getVocabSize
  int getVocabSize()
- maxDF
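  Specifies the maximum number of different documents a term could appear in to be included in the vocabulary.
  
  Default: 2^63 - 1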
  
  Returns:
  
  (undocumented)
- minDF
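  Specifies the minimum number of different documents a term must appear in to be included in the vocabulary.
  
  Default: 1.0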
  
  Returns:
  
  (undocumented)
- minTF
  Filter to ignore rare words in a document. For each document, terms with frequency/count less than the given threshold are ignored. If this is an integer greater than or equal to 1, then this specifies a count (of times the term must appear in the document); if this is a double in [0,1), then this specifies a fraction (out of the document's token count).
  
  Note that the parameter is only used in transform of CountVectorizerModel and does not affect fitting.
  
  Default: 1.0
  
  Returns:
  
  (undocumented)
- validateAndTransformSchema
  Validates and transforms the input schema.
- vocabSize
  Max size of the vocabulary.
  
  Default: 2^18
  
  Returns:
  
  (undocumented)
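
The `minTF` and `binary` descriptions above can be read as a two-step rule applied to each document's term counts: first drop terms below the threshold, then optionally flatten the surviving counts to 1. The following is a minimal plain-Java sketch of that reading, not Spark's implementation. The class `MinTfSketch` and its method names are hypothetical and introduced only for illustration, and the sketch assumes every token is in the vocabulary, so it skips the vocabulary lookup that `CountVectorizerModel` performs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the documented minTF / binary semantics.
// This is NOT Spark code and does not depend on Spark.
final class MinTfSketch {

    // A value >= 1 is read as an absolute count; a value in [0, 1) is read
    // as a fraction of the document's token count.
    static double effectiveThreshold(double minTF, long tokenCount) {
        return minTF >= 1.0 ? minTF : minTF * tokenCount;
    }

    // Counts the tokens of one document, ignores terms whose count is less
    // than the effective threshold, and, if binary is true, sets every
    // remaining nonzero count to 1.
    static Map<String, Double> countTerms(List<String> tokens, double minTF, boolean binary) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1.0, Double::sum);
        }

        double threshold = effectiveThreshold(minTF, tokens.size());

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold) {
                result.put(entry.getKey(), binary ? 1.0 : entry.getValue());
            }
        }
        return result;
    }
}
```

Under these assumptions, for the six-token document `[a, b, a, c, a, b]`, calling `MinTfSketch.countTerms(tokens, 2.0, false)` keeps `a` and `b` with their raw counts, while `minTF = 0.5` resolves to a threshold of 3 tokens and keeps only `a`. With `binary = true`, the surviving terms all map to 1.0, which matches the note that the toggle is applied after the `minTF` filter.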
