org.apache.spark.mllib.feature.ChiSqSelector
All Implemented Interfaces: Serializable
Creates a ChiSquared feature selector. The selector supports five selection methods: numTopFeatures, percentile, fpr, fdr and fwe.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar, but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
- fdr uses the Benjamini-Hochberg procedure (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose features so that the expected false discovery rate of the selection stays below a threshold.
- fwe chooses all features whose p-values are below a threshold scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
By default, the selection method is numTopFeatures, with the number of top features set to 50.
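To make the five selection rules concrete, the sketch below applies each of them to an array of precomputed p-values (one per feature). It is a hypothetical illustration, not Spark's implementation: the class and method names are invented, and ranking by ascending p-value for numTopFeatures and percentile is an assumption (ranking by descending chi-squared statistic is the other common choice).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Spark): the ChiSqSelector selection rules
// applied to precomputed per-feature p-values. Each method returns the indices
// of the selected features in ascending index order.
final class ChiSqSelectionRules {

    // Feature indices sorted by ascending p-value (most significant first).
    static Integer[] rankByPValue(double[] pValues) {
        Integer[] idx = new Integer[pValues.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> pValues[i]));
        return idx;
    }

    // Keep the first k ranked indices, then sort them by feature index.
    static int[] takeSorted(Integer[] ranked, int k) {
        int[] out = new int[Math.max(0, Math.min(k, ranked.length))];
        for (int i = 0; i < out.length; i++) out[i] = ranked[i];
        Arrays.sort(out);
        return out;
    }

    // numTopFeatures: a fixed number of the most significant features.
    static int[] numTopFeatures(double[] p, int n) {
        return takeSorted(rankByPValue(p), n);
    }

    // percentile: a fraction of all features, truncated to an integer count.
    static int[] percentile(double[] p, double fraction) {
        return takeSorted(rankByPValue(p), (int) (p.length * fraction));
    }

    // All features whose p-value is strictly below the threshold.
    static int[] belowThreshold(double[] p, double threshold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < p.length; i++) if (p[i] < threshold) out.add(i);
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // fpr: unadjusted threshold.
    static int[] fpr(double[] p, double alpha) {
        return belowThreshold(p, alpha);
    }

    // fwe: threshold scaled by 1/numFeatures (Bonferroni-style).
    static int[] fwe(double[] p, double alpha) {
        return belowThreshold(p, alpha / p.length);
    }

    // fdr: Benjamini-Hochberg step-up. Find the largest rank k (1-based) with
    // p_(k) <= alpha * k / m and keep the k features with the smallest p-values.
    static int[] fdr(double[] p, double alpha) {
        Integer[] ranked = rankByPValue(p);
        int m = p.length;
        int kMax = 0;
        for (int k = 1; k <= m; k++) {
            if (p[ranked[k - 1]] <= alpha * k / m) kMax = k;
        }
        return takeSorted(ranked, kMax);
    }

    public static void main(String[] args) {
        double[] p = {0.001, 0.20, 0.008, 0.025, 0.65};
        System.out.println(Arrays.toString(numTopFeatures(p, 4))); // [0, 1, 2, 3]
        System.out.println(Arrays.toString(percentile(p, 0.5)));   // [0, 2]
        System.out.println(Arrays.toString(fpr(p, 0.05)));         // [0, 2, 3]
        System.out.println(Arrays.toString(fdr(p, 0.05)));         // [0, 2, 3]
        System.out.println(Arrays.toString(fwe(p, 0.05)));         // [0, 2]
    }
}
```

With the same nominal threshold of 0.05, fwe is the strictest rule here (effective threshold 0.01), while fdr admits feature 3 because its p-value 0.025 passes the third Benjamini-Hochberg cut-off of 0.05 * 3 / 5 = 0.03.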
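Constructors
ChiSqSelector()
ChiSqSelector(int numTopFeatures): equivalent to calling this() followed by setNumTopFeatures(numTopFeatures).
Methods
double fdr()
ChiSqSelectorModel fit(RDD<LabeledPoint> data): Returns a ChiSquared feature selector.
double fpr()
double fwe()
int numTopFeatures()
double percentile()
static String[] supportedSelectorTypes(): Set of selector types that ChiSqSelector supports.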
public ChiSqSelector()
public ChiSqSelector(int numTopFeatures)
Equivalent to calling this() followed by setNumTopFeatures(numTopFeatures).
Parameters:
numTopFeatures - the number of top features to select
public static String[] supportedSelectorTypes()
Set of selector types that ChiSqSelector supports (numTopFeatures, percentile, fpr, fdr, fwe).
public int numTopFeatures()
public double percentile()
public double fpr()
public double fdr()
public double fwe()
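public ChiSqSelectorModel fit(RDD<LabeledPoint> data)
Returns a ChiSquared feature selector.
Parameters:
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features are treated as categorical, with each distinct value forming its own category, so apply a feature discretizer before calling this method.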
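The reason discretization matters can be seen from how a chi-squared statistic is built: every distinct feature value becomes a row of a contingency table against the label. The sketch below computes Pearson's statistic and its degrees of freedom for a single feature. It is a hypothetical, stdlib-only illustration, not Spark's code; the class and method names are invented, and converting the statistic to a p-value (which requires the chi-squared distribution's CDF) is omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Spark): Pearson's chi-squared statistic for
// one feature against the label, using each distinct feature value as a category.
final class ChiSqStatistic {

    static double statistic(double[] feature, double[] label) {
        if (feature.length == 0 || feature.length != label.length) {
            throw new IllegalArgumentException("feature and label must be non-empty and of equal length");
        }
        int n = feature.length;
        // Contingency table: feature value -> (label -> observed count).
        Map<Double, Map<Double, Integer>> observed = new HashMap<>();
        Map<Double, Integer> featureTotals = new HashMap<>();
        Map<Double, Integer> labelTotals = new HashMap<>();
        for (int i = 0; i < n; i++) {
            observed.computeIfAbsent(feature[i], v -> new HashMap<>()).merge(label[i], 1, Integer::sum);
            featureTotals.merge(feature[i], 1, Integer::sum);
            labelTotals.merge(label[i], 1, Integer::sum);
        }
        // Sum (observed - expected)^2 / expected over every cell, including empty ones.
        double chi2 = 0.0;
        for (Map.Entry<Double, Integer> f : featureTotals.entrySet()) {
            Map<Double, Integer> row = observed.get(f.getKey());
            for (Map.Entry<Double, Integer> l : labelTotals.entrySet()) {
                double expected = f.getValue().doubleValue() * l.getValue() / n;
                double obs = row.getOrDefault(l.getKey(), 0);
                chi2 += (obs - expected) * (obs - expected) / expected;
            }
        }
        return chi2;
    }

    // Degrees of freedom: (distinct feature values - 1) * (distinct labels - 1).
    static int degreesOfFreedom(double[] feature, double[] label) {
        long f = Arrays.stream(feature).distinct().count();
        long l = Arrays.stream(label).distinct().count();
        return (int) ((f - 1) * (l - 1));
    }

    public static void main(String[] args) {
        double[] label = {0, 0, 1, 1};
        double[] informative = {0, 0, 1, 1}; // value determines the label
        double[] noise = {0, 1, 0, 1};       // value independent of the label
        System.out.println(statistic(informative, label));        // 4.0
        System.out.println(statistic(noise, label));              // 0.0
        System.out.println(degreesOfFreedom(informative, label)); // 1
    }
}
```

An undiscretized real-valued feature typically has nearly as many distinct values as rows, which inflates the degrees of freedom and leaves each cell with only a handful of observations; bucketing the values first keeps the table small and the test meaningful.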