RFormula (Spark 4.0.0 JavaDoc)

All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, RFormulaBase, Params, HasFeaturesCol, HasHandleInvalid, HasLabelCol, DefaultParamsWritable, Identifiable, MLWritable

Implements the transforms required for fitting a dataset against an R model formula. Currently a limited subset of the R operators is supported, including '~', '.', ':', '+', '-', '*' and '^'. See also the R formula documentation: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/formula.html

The basic operators are:
- '~' separates target and terms
- '+' concatenates terms; "+ 0" means removing the intercept
- '-' removes a term; "- 1" means removing the intercept
- ':' interaction (multiplication for numeric values, or binarized categorical values)
- '.' all columns except the target
- '*' factor crossing, which includes the terms and the interactions between them
- '^' factor crossing to a specified degree
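The crossing operators '*' and '^' can be pictured as expanding a sum of terms into main effects plus interactions. The following is a minimal sketch of that expansion rule only, assuming plain term names. It is not Spark's formula parser, and the class FormulaExpansion and its methods cross and crossToDegree are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of R-style factor crossing; not Spark's parser.
public class FormulaExpansion {

    // Expands (t1 + t2 + ... + tn)^degree into every interaction of at most
    // `degree` distinct terms, joined with ':' as in R formulas.
    static List<String> crossToDegree(List<String> terms, int degree) {
        List<String> result = new ArrayList<>();
        // Enumerate interactions by size so that main effects come first.
        for (int size = 1; size <= Math.min(degree, terms.size()); size++) {
            combine(terms, size, 0, new ArrayList<>(), result);
        }
        return result;
    }

    // t1 * t2 * ... * tn is full crossing: all interactions up to size n.
    static List<String> cross(List<String> terms) {
        return crossToDegree(terms, terms.size());
    }

    // Recursively collects every combination of `size` terms, in input order,
    // starting from index `start`.
    private static void combine(List<String> terms, int size, int start,
                                List<String> current, List<String> out) {
        if (current.size() == size) {
            out.add(String.join(":", current));
            return;
        }
        for (int i = start; i < terms.size(); i++) {
            current.add(terms.get(i));
            combine(terms, size, i + 1, current, out);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(cross(List.of("a", "b")));                 // [a, b, a:b]
        System.out.println(crossToDegree(List.of("a", "b"), 2));      // [a, b, a:b]
        System.out.println(crossToDegree(List.of("a", "b", "c"), 2)); // [a, b, c, a:b, a:c, b:c]
    }
}
```

For two terms, a * b and (a + b)^2 produce the same expansion; they diverge once there are more terms than the chosen degree.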

Suppose a and b are double columns. The following simple examples illustrate the effect of RFormula:
- y ~ a + b means model y ~ w0 + w1 * a + w2 * b, where w0 is the intercept and w1, w2 are coefficients.
- y ~ a + b + a:b - 1 means model y ~ w1 * a + w2 * b + w3 * a * b, where w1, w2, w3 are coefficients.
- y ~ a * b means model y ~ w0 + w1 * a + w2 * b + w3 * a * b, where w0 is the intercept and w1, w2, w3 are coefficients.
- y ~ (a + b)^2 means model y ~ w0 + w1 * a + w2 * b + w3 * a * b, where w0 is the intercept and w1, w2, w3 are coefficients.
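To make the model for y ~ a * b concrete, the sketch below builds the term values a, b and a * b for one row and evaluates w0 + w1 * a + w2 * b + w3 * a * b. It is an illustration of the arithmetic only, assuming made-up coefficient values; the class InteractionModel and its methods are hypothetical and are not part of Spark.

```java
// Hypothetical illustration of the model implied by y ~ a * b (or y ~ (a + b)^2).
// The coefficient values used in main are made up for the example.
public class InteractionModel {

    // Term values for a, b and a:b; for numeric columns ':' is multiplication.
    static double[] featureRow(double a, double b) {
        return new double[] {a, b, a * b};
    }

    // Computes w0 + w[0] * features[0] + w[1] * features[1] + ...
    static double predict(double w0, double[] w, double[] features) {
        double y = w0;
        for (int i = 0; i < features.length; i++) {
            y += w[i] * features[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[] row = featureRow(2.0, 3.0);     // {2.0, 3.0, 6.0}
        double[] w = {0.5, -1.0, 0.25};          // hypothetical w1, w2, w3
        System.out.println(predict(1.0, w, row)); // 1 + 1 - 3 + 1.5 = 0.5
    }
}
```

Passing w0 = 0.0 gives the intercept-free model of y ~ a + b + a:b - 1, which uses the same three terms.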

RFormula produces a vector column of features and a double or string label column. As when formulas are used in R for linear regression, string input columns are one-hot encoded and numeric columns are cast to doubles. If the label column is of type string, it is first transformed to double with StringIndexer. If the label column does not exist in the DataFrame, the output label column is created from the response variable specified in the formula.
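The feature assembly described above (one-hot encode a string column, cast a numeric column to double, concatenate the results into one vector) can be sketched in plain Java. This is a simplified illustration under stated assumptions: categories are indexed in first-seen order and no reference category is dropped, so Spark's own category ordering and encoding rules are not reproduced. The class OneHotSketch, its methods, and the example column values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical sketch of assembling features for a formula such as
// y ~ country + hour, where country is a string column and hour is numeric.
// Not Spark's implementation: category order and dropping rules differ.
public class OneHotSketch {

    // Assigns each distinct category an index in first-seen order.
    static Map<String, Integer> index(List<String> values) {
        Map<String, Integer> idx = new LinkedHashMap<>();
        for (String v : values) {
            idx.putIfAbsent(v, idx.size());
        }
        return idx;
    }

    // One-hot encodes `category`, then appends the numeric value cast to double.
    static double[] features(String category, int numeric, Map<String, Integer> idx) {
        double[] out = new double[idx.size() + 1];
        out[idx.get(category)] = 1.0;
        out[idx.size()] = (double) numeric;
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> idx = index(List.of("US", "CA", "NZ"));
        System.out.println(Arrays.toString(features("CA", 12, idx))); // [0.0, 1.0, 0.0, 12.0]
    }
}
```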

